• ### On the zeros of the spectrogram of white noise(1708.00082)

July 31, 2017 stat.ME
In a recent paper, Flandrin [2015] has proposed filtering based on the zeros of a spectrogram, using the short-time Fourier transform and a Gaussian window. His results are based on empirical observations on the distribution of the zeros of the spectrogram of white Gaussian noise. These zeros tend to be uniformly spread over the time-frequency plane, and not to clutter. Our contributions are threefold: we rigorously define the zeros of the spectrogram of continuous white Gaussian noise, we explicitly characterize their statistical distribution, and we investigate the computational and statistical underpinnings of the practical implementation of signal detection based on the statistics of spectrogram zeros. In particular, we stress that the zeros of spectrograms of white Gaussian noise correspond to zeros of Gaussian analytic functions, a topic of recent independent mathematical interest [Hough et al., 2009].
• ### Zonotope hit-and-run for efficient sampling from projection DPPs(1705.10498)

June 15, 2017 stat.CO, cs.LG, stat.ML
Determinantal point processes (DPPs) are distributions over sets of items that model diversity using kernels. Their applications in machine learning include summary extraction and recommendation systems. Yet, the cost of sampling from a DPP is prohibitive in large-scale applications, which has triggered an effort towards efficient approximate samplers. We build a novel MCMC sampler that combines ideas from combinatorial geometry, linear programming, and Monte Carlo methods to sample from DPPs with a fixed sample cardinality, also called projection DPPs. Our sampler leverages the ability of the hit-and-run MCMC kernel to efficiently move across convex bodies. Previous theoretical results yield a fast mixing time of our chain when targeting a distribution that is close to a projection DPP, but not a DPP in general. Our empirical results demonstrate that this extends to sampling projection DPPs, i.e., our sampler is more sample-efficient than previous approaches which in turn translates to faster convergence when dealing with costly-to-evaluate functions, such as summary extraction in our experiments.
• ### Monte Carlo with Determinantal Point Processes(1605.00361)

June 15, 2019 math.CA, math.PR, stat.CO, stat.ME
We show that repulsive random variables can yield Monte Carlo methods with faster convergence rates than the typical $N^{-1/2}$, where $N$ is the number of integrand evaluations. More precisely, we propose stochastic numerical quadratures involving determinantal point processes associated with multivariate orthogonal polynomials, and we obtain root mean square errors that decrease as $N^{-(1+1/d)/2}$, where $d$ is the dimension of the ambient space. First, we prove a central limit theorem (CLT) for the linear statistics of a class of determinantal point processes, when the reference measure is a product measure supported on a hypercube, which satisfies the Nevai-class regularity condition, a result which may be of independent interest. Next, we introduce a Monte Carlo method based on these determinantal point processes, and prove a CLT with explicit limiting variance for the quadrature error, when the reference measure satisfies a stronger regularity condition. As a corollary, by taking a specific reference measure and using a construction similar to importance sampling, we obtain a general Monte Carlo method, which applies to any measure with continuously derivable density. Loosely speaking, our method can be interpreted as a stochastic counterpart to Gaussian quadrature, which, at the price of some convergence rate, is easily generalizable to any dimension and has a more explicit error term.
• ### Concentration inequalities for sampling without replacement(1309.4029)

July 27, 2015 math.ST, stat.TH
Concentration inequalities quantify the deviation of a random variable from a fixed value. In spite of numerous applications, such as opinion surveys or ecological counting procedures, few concentration results are known for the setting of sampling without replacement from a finite population. Until now, the best general concentration inequality has been a Hoeffding inequality due to Serfling [Ann. Statist. 2 (1974) 39-48]. In this paper, we first improve on the fundamental result of Serfling [Ann. Statist. 2 (1974) 39-48], and further extend it to obtain a Bernstein concentration bound for sampling without replacement. We then derive an empirical version of our bound that does not require the variance to be known to the user.
• ### Adaptive MCMC with online relabeling(1210.2601)

July 27, 2015 math.PR, stat.CO, math.ST, stat.TH, stat.ME
When targeting a distribution that is artificially invariant under some permutations, Markov chain Monte Carlo (MCMC) algorithms face the label-switching problem, rendering marginal inference particularly cumbersome. Such a situation arises, for example, in the Bayesian analysis of finite mixture models. Adaptive MCMC algorithms such as adaptive Metropolis (AM), which self-calibrates its proposal distribution using an online estimate of the covariance matrix of the target, are no exception. To address the label-switching issue, relabeling algorithms associate a permutation to each MCMC sample, trying to obtain reasonable marginals. In the case of adaptive Metropolis (Bernoulli 7 (2001) 223-242), an online relabeling strategy is required. This paper is devoted to the AMOR algorithm, a provably consistent variant of AM that can cope with the label-switching problem. The idea is to nest relabeling steps within the MCMC algorithm based on the estimation of a single covariance matrix that is used both for adapting the covariance of the proposal distribution in the Metropolis algorithm step and for online relabeling. We compare the behavior of AMOR to similar relabeling methods. In the case of compactly supported target distributions, we prove a strong law of large numbers for AMOR and its ergodicity. These are the first results on the consistency of an online relabeling algorithm to our knowledge. The proof underlines latent relations between relabeling and vector quantization.
• ### Inference for determinantal point processes without spectral knowledge(1507.01154)

July 4, 2015 stat.CO, stat.ML
Determinantal point processes (DPPs) are point process models that naturally encode diversity between the points of a given realization, through a positive definite kernel $K$. DPPs possess desirable properties, such as exact sampling or analyticity of the moments, but learning the parameters of kernel $K$ through likelihood-based inference is not straightforward. First, the kernel that appears in the likelihood is not $K$, but another kernel $L$ related to $K$ through an often intractable spectral decomposition. This issue is typically bypassed in machine learning by directly parametrizing the kernel $L$, at the price of some interpretability of the model parameters. We follow this approach here. Second, the likelihood has an intractable normalizing constant, which takes the form of a large determinant in the case of a DPP over a finite set of objects, and the form of a Fredholm determinant in the case of a DPP over a continuous domain. Our main contribution is to derive bounds on the likelihood of a DPP, both for finite and continuous domains. Unlike previous work, our bounds are cheap to evaluate since they do not rely on approximating the spectrum of a large matrix or an operator. Through usual arguments, these bounds thus yield cheap variational inference and moderately expensive exact Markov chain Monte Carlo inference methods for DPPs.
• ### On Markov chain Monte Carlo methods for tall data(1505.02827)

May 11, 2015 stat.CO, stat.ME, stat.ML
Markov chain Monte Carlo methods are often deemed too computationally intensive to be of any practical use for big data applications, and in particular for inference on datasets containing a large number $n$ of individual data points, also known as tall datasets. In scenarios where data are assumed independent, various approaches to scale up the Metropolis-Hastings algorithm in a Bayesian inference context have been recently proposed in machine learning and computational statistics. These approaches can be grouped into two categories: divide-and-conquer approaches and, subsampling-based algorithms. The aims of this article are as follows. First, we present a comprehensive review of the existing literature, commenting on the underlying assumptions and theoretical guarantees of each method. Second, by leveraging our understanding of these limitations, we propose an original subsampling-based approach which samples from a distribution provably close to the posterior distribution of interest, yet can require less than $O(n)$ data point likelihood evaluations at each iteration for certain statistical models in favourable scenarios. Finally, we have only been able so far to propose subsampling-based methods which display good performance in scenarios where the Bernstein-von Mises approximation of the target posterior distribution is excellent. It remains an open challenge to develop such methods in scenarios where the Bernstein-von Mises approximation is poor.
• Contributions of the Pierre Auger Collaboration to the 33rd International Cosmic Ray Conference, Rio de Janeiro, Brazil, July 2013