
It is well known in many settings that reversible Langevin diffusions in
confining potentials converge to equilibrium exponentially fast. Adding
irreversible perturbations to the drift of a Langevin diffusion that maintain
the same invariant measure accelerates its convergence to stationarity. Many
existing works thus advocate the use of such nonreversible dynamics for
sampling. When implementing Markov Chain Monte Carlo algorithms (MCMC) using
time discretisations of such Stochastic Differential Equations (SDEs), one can
append the discretization with the usual MetropolisHastings acceptreject step
and this is often done in practice because the acceptreject step eliminates
bias. On the other hand, such a step makes the resulting chain reversible. It
is not known whether adding the acceptreject step preserves the faster mixing
properties of the nonreversible dynamics. In this paper, we address this gap
between theory and practice by analyzing the optimal scaling of MCMC algorithms
constructed from proposal moves that are timestep Euler discretisations of an
irreversible SDE, for high dimensional Gaussian target measures. We call the
resulting algorithm the \imala, in comparison to the classical MALA algorithm
(here {\em ip} is for irreversible proposal). In order to quantify how the cost
of the algorithm scales with the dimension $N$, we prove invariance principles
for the appropriately rescaled chain. In contrast to the usual MALA algorithm,
we show that there could be two regimes asymptotically: (i) a diffusive regime,
as in the MALA algorithm and (ii) a ``fluid" regime where the limit is an
ordinary differential equation. We provide concrete examples where the limit is
a diffusion, as in the standard MALA, but with provably higher limiting
acceptance probabilities. Numerical results are also given corroborating the
theory.

As it has become common to use many computer cores in routine applications,
finding good ways to parallelize popular algorithms has become increasingly
important. In this paper, we present a parallelization scheme for Markov chain
Monte Carlo (MCMC) methods based on spectral clustering of the underlying state
space, generalizing earlier work on parallelization of MCMC methods by state
space partitioning. We show empirically that this approach speeds up MCMC
sampling for multimodal distributions and that it can be usefully applied in
greater generality than several related algorithms. Our algorithm converges
under reasonable conditions to an `optimal' MCMC algorithm. We also show that
our approach can be asymptotically far more efficient than naive
parallelization, even in situations such as completely flat target
distributions where no unique optimal algorithm exists. Finally, we combine
theoretical and empirical bounds to provide practical guidance on the choice of
tuning parameters.

Determining the total variation mixing time of Kac's random walk on the
special orthogonal group $\mathrm{SO}(n)$ has been a longstanding open
problem. In this paper, we construct a novel nonMarkovian coupling for
bounding this mixing time. The analysis of our coupling entails controlling the
smallest singular value of a certain random matrix with highly dependent
entries. The dependence of the entries in our matrix makes it notamenable to
existing techniques in random matrix theory. To circumvent this difficulty, we
extend some recent bounds on the smallest singular values of matrices with
independent entries to our setting. These bounds imply that the mixing time of
Kac's walk on the group $\mathrm{SO}(n)$ is between $C_{1} n^{2}$ and $C_{2}
n^{4} \log(n)$ for some explicit constants $0 < C_{1}, C_{2} < \infty$,
substantially improving on the bound of $O(n^{5} \log(n)^{2})$ by Jiang. Our
methods may also be applied to other high dimensional Gibbs samplers with
constraints and thus are of independent interest. In addition to giving
analytical bounds on the mixing time, our approach allows us to compute
rigorous estimates of the mixing time by simulating the eigenvalues of a random
matrix.

Two common concerns raised in analyses of randomized experiments are (i)
appropriately handling issues of noncompliance, and (ii) appropriately
adjusting for multiple tests (e.g., on multiple outcomes or subgroups).
Although simple intentiontotreat (ITT) and Bonferroni methods are valid in
terms of type I error, they can each lead to a substantial loss of power; when
employing both simultaneously, the total loss may be severe. Alternatives exist
to address each concern. Here we propose an analysis method for experiments
involving both features that merges posterior predictive $p$values for
complier causal effects with randomizationbased multiple comparisons
adjustments; the results are valid familywise tests that are doubly
advantageous: more powerful than both those based on standard ITT statistics
and those using traditional multiple comparison adjustments. The operating
characteristics and advantages of our method are demonstrated through a series
of simulated experiments and an analysis of the United States Job Training
Partnership Act (JTPA) Study, where our methods lead to different conclusions
regarding the significance of estimated JTPA effects.

Twocomponent mixture priors provide a traditional way to induce sparsity in
highdimensional Bayes models. However, several aspects of such a prior,
including computational complexities in highdimensions, interpretation of
exact zeros and nonsparse posterior summaries under standard loss functions,
has motivated an amazing variety of continuous shrinkage priors, which can be
expressed as globallocal scale mixtures of Gaussians. Interestingly, we
demonstrate that many commonly used shrinkage priors, including the Bayesian
Lasso, do not have adequate posterior concentration in highdimensional
settings.

Determining the mixing time of Kac's random walk on the sphere
$\mathrm{S}^{n1}$ is a longstanding open problem. We show that the total
variation mixing time of Kac's walk on $\mathrm{S}^{n1}$ is between
$\frac{1}{2} \, n \log(n)$ and $200 \,n \log(n)$. Our bound is thus optimal up
to a constant factor, improving on the bestknown upper bound of $O(n^{5}
\log(n)^{2})$ due to Jiang. Our main tool is a `nonMarkovian' coupling
recently introduced by the second author for obtaining the convergence rates of
certain high dimensional Gibbs samplers in continuous state spaces.

It is well known that the ratio of two independent standard Gaussian random
variables follows a Cauchy distribution. Any convex combination of independent
standard Cauchy random variables also follows a Cauchy distribution. In a
recent joint work, the author proved a surprising multivariate generalization
of the above facts. Fix $m > 1$ and let $\Sigma$ be a $m\times m$ positive
semidefinite matrix. Let $X,Y \sim \mathrm{N}(0,\Sigma)$ be independent
vectors. Let $\vec{w}=(w_1, \dots, w_m)$ be a vector of nonnegative numbers
with $\sum_{j=1}^m w_j = 1.$ The author proved recently that the random
variable $$ Z = \sum_{j=1}^m w_j\frac{X_j}{Y_j}\; $$ also has the standard
Cauchy distribution. In this note, we provide some more understanding of this
result and give a number of natural generalizations. In particular, we observe
that if $(X,Y)$ have the same marginal distribution, they need neither be
independent nor be jointly normal for $Z$ to be Cauchy distributed. In fact,
our calculations suggest that joint normality of $(X,Y)$ may be the only
instance in which they can be independent. Our results also give a method to
construct copulas of Cauchy distributions.

Volume limitations and low yield thresholds of biological fluids have led to
widespread use of passive microparticle rheology. The meansquareddisplacement
(MSD) statistics of bead position time series (bead paths) are either applied
directly to determine the creep compliance [Xu et al (1998)] or transformed to
determine dynamic storage and loss moduli [Mason & Weitz (1995)]. A prevalent
hurdle arises when there is a nondiffusive experimental drift in the data.
Commensurate with the magnitude of drift relative to diffusive mobility,
quantified by a P\'eclet number, the MSD statistics are distorted, and thus the
path data must be "corrected" for drift. The standard approach is to estimate
and subtract the drift from particle paths, and then calculate MSD statistics.
We present an alternative, parametric approach using maximum likelihood
estimation that simultaneously fits drift and diffusive model parameters from
the path data; the MSD statistics (and consequently the compliance and dynamic
moduli) then follow directly from the bestfit model. We illustrate and compare
both methods on simulated path data over a range of P\'eclet numbers, where
exact answers are known. We choose fractional Brownian motion as the numerical
model because it affords tunable, subdiffusive MSD statistics consistent with
typical 30 second long, experimental observations of microbeads in several
biological fluids. Finally, we apply and compare both methods on data from
human bronchial epithelial cell culture mucus.

Many finitestate reversible Markov chains can be naturally decomposed into
"projection" and "restriction" chains. In this paper we provide bounds on the
total variation mixing times of the original chain in terms of the mixing
properties of these related chains. This paper is in the tradition of existing
bounds on Poincare and logSobolev constants of Markov chains in terms of
similar decompositions [JSTV04, MR02, MR06, MY09]. Our proofs are simple,
relying largely on recent results relating hitting and mixing times of
reversible Markov chains [PS13, Oli12]. We describe situations in which our
results give substantially better bounds than those obtained by applying
existing decomposition results and provide examples for illustration.

Stateoftheart techniques in passive particletracking microscopy provide
highresolution path trajectories of diverse foreign particles in biological
fluids. For particles on the order of 1 micron diameter, these paths are
generally inconsistent with simple Brownian motion. Yet, despite an abundance
of data confirming these findings and their wideranging scientific
implications, stochastic modeling of the complex particle motion has received
comparatively little attention. Even among posited models, there is virtually
no literature on likelihoodbased inference, model comparisons, and other
quantitative assessments. In this article, we develop a rigorous and
computationally efficient Bayesian methodology to address this gap. We analyze
two of the most prevalent candidate models for 30 second paths of 1 micron
diameter tracer particles in human lung mucus: fractional Brownian motion (fBM)
and a Generalized Langevin Equation (GLE) consistent with viscoelastic theory.
Our model comparisons distinctly favor GLE over fBM, with the former describing
the data remarkably well up to the timescales for which we have reliable
information.

The Cauchy distribution is usually presented as a mathematical curiosity, an
exception to the Law of Large Numbers, or even as an "Evil" distribution in
some introductory courses. It therefore surprised us when Drton and Xiao (2016)
proved the following result for $m=2$ and conjectured it for $m\ge 3$. Let $X=
(X_1,..., X_m)$ and $Y = (Y_1, ...,Y_m)$ be i.i.d $N(0,\Sigma)$, where
$\Sigma=\{\sigma_{ij}\}\ge 0$ is an $m\times m$ and \textit{arbitrary}
covariance matrix with $\sigma_{jj}>0$ for all $1\leq j\leq m$. Then $$Z =
\sum_{j=1}^m w_j \frac{X_j}{Y_j} \ \sim \mathrm{Cauchy}(0,1),$$ as long as
$w=(w_1,..., w_m) $ is independent of $(X, Y)$, $w_j\ge 0, j=1,..., m$, and
$\sum_{j=1}^m w_j=1$. In this note, we present an elementary proof of this
conjecture for any $m \geq 2$ by linking $Z$ to a geometric characterization of
Cauchy(0,1) given in Willams (1969). This general result is essential to the
large sample behavior of Wald tests in many applications such as factor models
and contingency tables. It also leads to other unexpected results such as $$
\sum_{i=1}^m\sum_{j=1}^m \frac{w_iw_j\sigma_{ij}}{X_iX_j} \sim
{\text{L\'{e}vy}}(0, 1). $$ This generalizes the "super Cauchy phenomenon" that
the average of $m$ i.i.d. standard L\'evy variables (i.e., inverse chisquared
variables with one degree of freedom) has the same distribution as that of a
single standard L\'evy variable multiplied by $m$ (which is obtained by taking
$w_j=1/m$ and $\Sigma$ to be the identity matrix).

We construct a new framework for accelerating Markov chain Monte Carlo in
posterior sampling problems where standard methods are limited by the
computational cost of the likelihood, or of numerical models embedded therein.
Our approach introduces local approximations of these models into the
MetropolisHastings kernel, borrowing ideas from deterministic approximation
theory, optimization, and experimental design. Previous efforts at integrating
approximate models into inference typically sacrifice either the sampler's
exactness or efficiency; our work seeks to address these limitations by
exploiting useful convergence characteristics of local approximations. We prove
the ergodicity of our approximate Markov chain, showing that it samples
asymptotically from the \emph{exact} posterior distribution of interest. We
describe variations of the algorithm that employ either local polynomial
approximations or local Gaussian process regressors. Our theoretical results
reinforce the key observation underlying this paper: when the likelihood has
some \emph{local} regularity, the number of model evaluations per MCMC step can
be greatly reduced without biasing the Monte Carlo average. Numerical
experiments demonstrate multiple orderofmagnitude reductions in the number of
forward model evaluations used in representative ODE and PDE inference
problems, with both synthetic and real data.

It has historically been a challenge to perform Bayesian inference in a
designbased survey context. The present paper develops a Bayesian model for
sampling inference in the presence of inverseprobability weights. We use a
hierarchical approach in which we model the distribution of the weights of the
nonsampled units in the population and simultaneously include them as
predictors in a nonparametric Gaussian process regression. We use simulation
studies to evaluate the performance of our procedure and compare it to the
classical designbased estimator. We apply our method to the Fragile Family and
Child Wellbeing Study. Our studies find the Bayesian nonparametric finite
population estimator to be more robust than the classical designbased
estimator without loss in efficiency, which works because we induce
regularization for small cells and thus this is a way of automatically
smoothing the highly variable weights.

In many modern applications, difficulty in evaluating the posterior density
makes performing even a single MCMC step slow. This difficulty can be caused by
intractable likelihood functions, but also appears for routine problems with
large data sets. Many researchers have responded by running approximate
versions of MCMC algorithms. In this note, we develop quantitative bounds for
showing the ergodicity of these approximate samplers. We then use these bounds
to study the biasvariance tradeoff of approximate MCMC algorithms. We apply
our results to simple versions of recently proposed algorithms, including a
variant of the "austerity" framework of Korratikara et al.

In this paper, we investigate Gaussian process regression models where inputs
are subject to measurement error. In spatial statistics, input measurement
errors occur when the geographical locations of observed data are not known
exactly. Such sources of error are not special cases of "nugget" or microscale
variation, and require alternative methods for both interpolation and parameter
estimation. Gaussian process models do not straightforwardly extend to
incorporate input measurement error, and simply ignoring noise in the input
space can lead to poor performance for both prediction and parameter inference.
We review and extend existing theory on prediction and estimation in the
presence of location errors, and show that ignoring location errors may lead to
Kriging that is not "selfefficient". We also introduce a Markov Chain Monte
Carlo (MCMC) approach using the Hybrid Monte Carlo algorithm that obtains
optimal (minimum MSE) predictions, and discuss situations that lead to
multimodality of the target distribution and/or poor chain mixing. Through
simulation study and analysis of global air temperature data, we show that
appropriate methods for incorporating location measurement error are essential
to valid inference in this regime.

In the AGEMAP genomics study, researchers were interested in detecting genes
related to age in a variety of tissue types. After not finding many agerelated
genes in some of the analyzed tissue types, the study was criticized for having
low power. It is possible that the low power is due to the presence of
important unmeasured variables, and indeed we find that a latent factor model
appears to explain substantial variability not captured by measured covariates.
We propose including the estimated latent factors in a multiple regression
model. The key difficulty in doing so is assigning appropriate degrees of
freedom to the estimated factors to obtain unbiased error variance estimators
and enable valid hypothesis testing. When the number of responses is large
relative to the sample size, treating the estimated factors like observed
covariates leads to a downward bias in the variance estimates. Many adhoc
solutions to this problem have been proposed in the literature without the
backup of a careful theoretical analysis. Using recent results from random
matrix theory, we derive a simple, easy to use expression for degrees of
freedom. Our estimate gives a principled alternative to adhoc approaches in
common use. Extensive simulation results show excellent agreement between the
proposed estimator and its theoretical value. Applying our methodology to the
AGEMAP genomics study, we found an order of magnitude increase in the number of
significant genes. Although we focus on the AGEMAP study, the methods developed
in this paper are widely applicable to other multivariate models, and thus are
of independent interest.

In this paper, we study the detection boundary for minimax hypothesis testing
in the context of highdimensional, sparse binary regression models. Motivated
by genetic sequencing association studies for rare variant effects, we
investigate the complexity of the hypothesis testing problem when the design
matrix is sparse. We observe a new phenomenon in the behavior of detection
boundary which does not occur in the case of Gaussian linear regression. We
derive the detection boundary as a function of two components: a design matrix
sparsity index and signal strength, each of which is a function of the sparsity
of the alternative. For any alternative, if the design matrix sparsity index is
too high, any test is asymptotically powerless irrespective of the magnitude of
signal strength. For binary design matrices with the sparsity index that is not
too high, our results are parallel to those in the Gaussian case. In this
context, we derive detection boundaries for both dense and sparse regimes. For
the dense regime, we show that the generalized likelihood ratio is rate
optimal; for the sparse regime, we propose an extended Higher Criticism Test
and show it is rate optimal and sharp. We illustrate the finite sample
properties of the theoretical results using simulation studies.

We study a kinetically constrained Ising process (KCIP) associated with a
graph G and density parameter p; this process is an interacting particle system
with state space $\{0,1\}^{G}$. The stationary distribution of the KCIP Markov
chain is the Binomial($G, p$) distribution on the number of particles,
conditioned on having at least one particle. The `constraint' in the name of
the process refers to the rule that a vertex cannot change its state unless it
has at least one neighbour in state `1'. The KCIP has been proposed by
statistical physicists as a model for the glass transition, and more recently
as a simple algorithm for data storage in computer networks. In this note, we
study the mixing time of this process on the torus $G = \mathbb{Z}_{L}^{d}$, $d
\geq 3$, in the lowdensity regime $p = \frac{c}{n}$ for arbitrary $0 < c < 1$;
this regime is the subject of a conjecture of Aldous and is natural in the
context of computer networks. Our results provide a counterexample to Aldous'
conjecture, suggest a natural modifcation of the conjecture, and show that this
modifcation is correct up to logarithmic factors. The methods developed in this
paper also provide a strategy for tackling Aldous' conjecture for other graphs.

Lung tumor tracking for radiotherapy requires realtime, multiplestep ahead
forecasting of a quasiperiodic time series recording instantaneous tumor
locations. We introduce a locationmixture autoregressive (LMAR) process that
admits multimodal conditional distributions, fast approximate inference using
the EM algorithm and accurate multiplestep ahead predictive distributions.
LMAR outperforms several commonly used methods in terms of outofsample
prediction accuracy using clinical data from lung tumor patients. With its
superior predictive performance and realtime computation, the LMAR model could
be effectively implemented for use in current tumor tracking systems.

Adaptive Markov chains are an important class of Monte Carlo methods for
sampling from probability distributions. The time evolution of adaptive
algorithms depends on past samples, and thus these algorithms are
nonMarkovian. Although there has been previous work establishing conditions
for their ergodicity, not much is known theoretically about their finite sample
properties. In this paper, using a notion of discrete Ricci curvature for
Markov kernels introduced by Ollivier, we establish concentration inequalities
and finite sample bounds for a class of adaptive Markov chains. After
establishing some general results, we give quantitative bounds for
`multilevel' adaptive algorithms such as the equienergy sampler. We also
provide the first rigorous proofs that the finite sample properties of an
equienergy sampler are superior to those of related parallel tempering and
MetropolisHastings samplers after a learning period comparable to their mixing
times.

Sparse Bayesian factor models are routinely implemented for parsimonious
dependence modeling and dimensionality reduction in highdimensional
applications. We provide theoretical understanding of such Bayesian procedures
in terms of posterior convergence rates in inferring highdimensional
covariance matrices where the dimension can be larger than the sample size.
Under relevant sparsity assumptions on the true covariance matrix, we show that
commonlyused point mass mixture priors on the factor loadings lead to
consistent estimation in the operator norm even when $p\gg n$. One of our major
contributions is to develop a new class of continuous shrinkage priors and
provide insights into their concentration around sparse vectors. Using such
priors for the factor loadings, we obtain similar rate of convergence as
obtained with point mass mixture priors. To obtain the convergence rates, we
construct test functions to separate points in the space of highdimensional
covariance matrices using insights from random matrix theory; the tools
developed may be of independent interest. We also derive minimax rates and show
that the Bayesian posterior rates of convergence coincide with the minimax
rates upto a $\sqrt{\log n}$ term.

In this paper we prove the universality of covariance matrices of the form
$H_{N\times N}={X}^{\dagger}X$ where $X$ is an ${M\times N}$ rectangular matrix
with independent real valued entries $x_{ij}$ satisfying $\mathbb{E}x_{ij}=0$
and $\mathbb{E}x^2_{ij}={\frac{1}{M}}$, $N$, $M\to \infty$. Furthermore it is
assumed that these entries have subexponential tails or sufficiently high
number of moments. We will study the asymptotics in the regime
$N/M=d_N\in(0,\infty),\lim_{N\to\infty}d_N\neq0,\infty$. Our main result is the
edge universality of the sample covariance matrix at both edges of the
spectrum. In the case $\lim_{N\to\infty}d_N=1$, we only focus on the largest
eigenvalue. Our proof is based on a novel version of the Green function
comparison theorem for data matrices with dependent entries. En route to
proving edge universality, we establish that the Stieltjes transform of the
empirical eigenvalue distribution of $H$ is given by the MarcenkoPastur law
uniformly up to the edges of the spectrum with an error of order $(N\eta)^{1}$
where $\eta$ is the imaginary part of the spectral parameter in the Stieltjes
transform. Combining these results with existing techniques we also show bulk
universality of covariance matrices. All our results hold for both real and
complex valued entries.

Consider a probability measure on a Hilbert space defined via its density
with respect to a Gaussian. The purpose of this paper is to demonstrate that an
appropriately defined Markov chain, which is reversible with respect to the
measure in question, exhibits a diffusion limit to a noisy gradient flow, also
reversible with respect to the same measure. The Markov chain is defined by
applying a MetropolisHastings acceptreject mechanism to an OrnsteinUhlenbeck
proposal which is itself reversible with respect to the underlying Gaussian
measure. The resulting noisy gradient flow is a stochastic partial differential
equation driven by a Wiener process with spatial correlation given by the
underlying Gaussian structure.

We describe a new MCMC method optimized for the sampling of probability
measures on Hilbert space which have a density with respect to a Gaussian; such
measures arise in the Bayesian approach to inverse problems, and in conditioned
diffusions. Our algorithm is based on two key design principles: (i) algorithms
which are welldefined in infinite dimensions result in methods which do not
suffer from the curse of dimensionality when they are applied to approximations
of the infinite dimensional target measure on $\bbR^N$; (ii) nonreversible
algorithms can have better mixing properties compared to their reversible
counterparts. The method we introduce is based on the hybrid Monte Carlo
algorithm, tailored to incorporate these two design principles. The main result
of this paper states that the new algorithm, appropriately rescaled, converges
weakly to a second order Langevin diffusion on Hilbert space; as a consequence
the algorithm explores the approximate target measures on $\bbR^N$ in a number
of steps which is independent of $N$. We also present the underlying theory for
the limiting nonreversible diffusion on Hilbert space, including
characterization of the invariant measure, and we describe numerical
simulations demonstrating that the proposed method has favourable mixing
properties as an MCMC algorithm.

Penalized regression methods, such as $L_1$ regularization, are routinely
used in highdimensional applications, and there is a rich literature on
optimality properties under sparsity assumptions. In the Bayesian paradigm,
sparsity is routinely induced through twocomponent mixture priors having a
probability mass at zero, but such priors encounter daunting computational
problems in high dimensions. This has motivated an amazing variety of
continuous shrinkage priors, which can be expressed as globallocal scale
mixtures of Gaussians, facilitating computation. In sharp contrast to the
frequentist literature, little is known about the properties of such priors and
the convergence and concentration of the corresponding posterior distribution.
In this article, we propose a new class of DirichletLaplace (DL) priors,
which possess optimal posterior concentration and lead to efficient posterior
computation exploiting results from normalized random measure theory. Finite
sample performance of DirichletLaplace priors relative to alternatives is
assessed in simulated and real data examples.