
The horseshoe prior is frequently employed in Bayesian analysis of
highdimensional models, and has been shown to achieve minimax optimal risk
properties when the truth is sparse. While optimizationbased algorithms for
the extremely popular Lasso and elastic net procedures can scale to dimension
in the hundreds of thousands, algorithms for the horseshoe that use Markov
chain Monte Carlo (MCMC) for computation are limited to problems an order of
magnitude smaller. This is due to high computational cost per step and growth
of the variance of timeaveraging estimators as a function of dimension. We
propose two new MCMC algorithms for computation in these models that have
improved performance compared to existing alternatives. One of the algorithms
also approximates an expensive matrix product to give orders of magnitude
speedup in highdimensional applications. We prove that the exact algorithm is
geometrically ergodic, and give guarantees for the accuracy of the approximate
algorithm using perturbation theory. Versions of the approximation algorithm
that gradually decrease the approximation error as the chain extends are shown
to be exact. The scalability of the algorithm is illustrated in simulations
with problem size as large as $N=5,000$ observations and $p=50,000$ predictors,
and an application to a genomewide association study with $N=2,267$ and
$p=98,385$. The empirical results also show that the new algorithm yields
estimates with lower mean squared error, intervals with better coverage, and
elucidates features of the posterior that were often missed by previous
algorithms in high dimensions, including bimodality of posterior marginals
indicating uncertainty about which covariates belong in the model.

A variety of methods have been proposed for inference about extreme
dependence for multivariate or spatiallyindexed stochastic processes and time
series. Most of these proceed by first transforming data to some specific
extreme value marginal distribution, often the unit Fr\'echet, then fitting a
family of maxstable processes to the transformed data and exploring dependence
within the framework of that model. The marginal transformation, model
selection, and model fitting are all possible sources of misspecification in
this approach.
We propose an alternative modelfree approach, based on the idea that
substantial information on the strength of tail dependence and its temporal
structure are encoded in the distribution of the waiting times between
exceedances of high thresholds at different locations. We propose quantifying
the strength of extremal dependence and assessing uncertainty by using
statistics based on these waiting times. The method does not rely on any
specific underlying model for the process, nor on asymptotic distribution
theory. The method is illustrated by applications to climatological, financial,
and electrophysiology data.
To put the proposed approach within the context of the existing literature,
we construct a class of spacetimeindexed stochastic processes whose waiting
time distributions are available in closed form by endowing the support points
in de Haan's spectral representation of maxstable processes with random birth
times, velocities, and lifetimes, and applying Smith's model to these
processes. We show that waiting times in this model are stochatically
decreasing in mean speed, and the sample mean of the waiting times obeys a
central limit theorem with a uniform convergence rate under mild conditions.
This indicates that our procedure can be implemented in this setting using
standard $t$ statistics and associated hypothesis tests.

Recovery of population size history from molecular sequence data is an
important problem in population genetics. Inference commonly relies on a
coalescent model linking the population size history to genealogies. The high
computational cost of estimating parameters from these models usually compels
researchers to select a subset of the available data or to rely on
nonsufficient summary statistics for statistical inference. We consider the
problem of recovering the true population size history from two possible
alternatives on the basis of coalescent time data. We give exact expressions
for the probability of selecting the correct alternative in a variety of
biologically interesting cases as a function of the separation between the
alternative size histories, the number of individuals, loci, and the sampling
times. The results are applied to human population history. This work has
significant implications for optimal design when the inferential goal is to
test scientific hypotheses about population size trajectories in coalescent
models with and without recombination.

A/B testing is ubiquitous within the machine learning and data science
operations of internet companies. Generically, the idea is to perform a
statistical test of the hypothesis that a new feature is better than the
existing platformfor example, it results in higher revenue. If the p value
for the test is below some predefined thresholdoften, 0.05the new
feature is implemented. The difficulty of choosing an appropriate threshold has
been noted before, particularly because dependent tests are often done
sequentially, leading some to propose control of the false discovery rate (FDR)
rather than use of a single, universal threshold. However, it is still
necessary to make an arbitrary choice of the level at which to control FDR.
Here we suggest a decisiontheoretic approach to determining whether to adopt a
new feature, which enables automated selection of an appropriate threshold. Our
method has the basic ingredients of any decisiontheory problem: a loss
function, action space, and a notion of optimality, for which we choose Bayes
risk. However, the loss function and the action space differ from the typical
choices made in the literature, which has focused on the theory of point
estimation. We give some basic results for Bayesoptimal thresholding rules for
the feature adoption decision, and give some examples using eBay data. The
results suggest that the 0.05 pvalue threshold may be too conservative in some
settings, but that its widespread use may reflect an adhoc means of
controlling multiplicity in the common case of repeatedly testing variants of
an experiment when the threshold is not reached.

There has been considerable interest in making Bayesian inference more
scalable. In big data settings, most literature focuses on reducing the
computing time per iteration, with less focused on reducing the number of
iterations needed in Markov chain Monte Carlo (MCMC). This article focuses on
data augmentation MCMC (DAMCMC), a widely used technique. DAMCMC samples tend
to become highly autocorrelated in large data samples, due to a miscalibration
problem in which conditional posterior distributions given augmented data are
too concentrated. This makes it necessary to collect very long MCMC paths to
obtain acceptably low MC error. To combat this inefficiency, we propose a
family of calibrated data augmentation algorithms, which appropriately adjust
the variance of conditional posterior distributions. A MetropolisHastings step
is used to eliminate bias in the stationary distribution of the resulting
sampler. Compared to existing alternatives, this approach can dramatically
reduce MC error by reducing autocorrelation and increasing the effective number
of DAMCMC samples per computing time. The approach is simple and applicable to
a broad variety of existing data augmentation algorithms, and we focus on three
popular models: probit, logistic and Poisson loglinear. Dramatic gains in
computational efficiency are shown in applications.

The Markov Chain Monte Carlo method is the dominant paradigm for posterior
computation in Bayesian analysis. It is common to control computation time by
making approximations to the Markov transition kernel. Comparatively little
attention has been paid to computational optimality in these approximating
Markov Chains, or when such approximations are justified relative to obtaining
shorter paths from the exact kernel. We give simple, sharp bounds for uniform
approximations of uniformly mixing Markov chains. We then suggest a notion of
optimality that incorporates computation time and approximation error, and use
our bounds to make generalizations about properties of good approximations in
the uniformly mixing setting. The relevance of these properties is demonstrated
in applications to a minibatchingbased approximate MCMC algorithm for large
$n$ logistic regression and lowrank approximations for Gaussian processes.

Many modern applications collect highly imbalanced categorical data, with
some categories relatively rare. Bayesian hierarchical models combat data
sparsity by borrowing information, while also quantifying uncertainty. However,
posterior computation presents a fundamental barrier to routine use; a single
class of algorithms does not work well in all settings and practitioners waste
time trying different types of MCMC approaches. This article was motivated by
an application to quantitative advertising in which we encountered extremely
poor computational performance for common data augmentation MCMC algorithms but
obtained excellent performance for adaptive Metropolis. To obtain a deeper
understanding of this behavior, we give strong theory results on computational
complexity in an infinitely imbalanced asymptotic regime. Our results show
computational complexity of Metropolis is logarithmic in sample size, while
data augmentation is polynomial in sample size. The root cause of poor
performance of data augmentation is a discrepancy between the rates at which
the target density and MCMC step sizes concentrate. In general, MCMC algorithms
that have a similar discrepancy will fail in large samples  a result with
substantial practical impact.

This simple note lays out a few observations which are well known in many
ways but may not have been said in quite this way before. The basic idea is
that when comparing two different Markov chains it is useful to couple them is
such a way that they agree as often as possible. We construct such a coupling
and analyze it by a simple dominating chain which registers if the two
processes agree or disagree. We find that this imagery is useful when thinking
about such problems. We are particularly interested in comparing the invariant
measures and long time averages of the processes. However, since the paths
agree for long runs, it also provides estimates on various stopping times such
as hitting or exit times. We also show that certain bounds are tight. Finally,
we provide a simple application to a Markov Chain Monte Carlo algorithm and
show numerically that the results of the paper show a good level of
approximation at considerable speed up by using an approximating chain rather
than the original sampling chain.

There has been substantial recent interest in record linkage, attempting to
group the records pertaining to the same entities from a large database lacking
unique identifiers. This can be viewed as a type of "microclustering," with few
observations per cluster and a very large number of clusters. A variety of
methods have been proposed, but there is a lack of literature providing
theoretical guarantees on performance. We show that the problem is
fundamentally hard from a theoretical perspective, and even in idealized cases,
accurate entity resolution is effectively impossible when the number of
entities is small relative to the number of records and/or the separation among
records from different entities is not extremely large. To characterize the
fundamental difficulty, we focus on entity resolution based on multivariate
Gaussian mixture models, but our conclusions apply broadly and are supported by
simulation studies inspired by human rights applications. These results suggest
conservatism in interpretation of the results of record linkage, support
collection of additional data to more accurately disambiguate the entities, and
motivate a focus on coarser inference. For example, results from a simulation
study suggest that sometimes one may obtain accurate results for population
size estimation even when fine scale entity resolution is inaccurate.

Predictive modeling is increasingly being employed to assist human
decisionmakers. One purported advantage of replacing or augmenting human
judgment with computer models in high stakes settings such as sentencing,
hiring, policing, college admissions, and parole decisions is the perceived
"neutrality" of computers. It is argued that because computer models do not
hold personal prejudice, the predictions they produce will be equally free from
prejudice. There is growing recognition that employing algorithms does not
remove the potential for bias, and can even amplify it if the training data
were generated by a process that is itself biased. In this paper, we provide a
probabilistic notion of algorithmic bias. We propose a method to eliminate bias
from predictive models by removing all information regarding protected
variables from the data to which the models will ultimately be trained. Unlike
previous work in this area, our framework is general enough to accommodate data
on any measurement scale. Motivated by models currently in use in the criminal
justice system that inform decisions on pretrial release and parole, we apply
our proposed method to a dataset on the criminal histories of individuals at
the time of sentencing to produce "raceneutral" predictions of rearrest. In
the process, we demonstrate that a common approach to creating "raceneutral"
models omitting race as a covariate still results in racially disparate
predictions. We then demonstrate that the application of our proposed method to
these data removes racial disparities from predictions with minimal impact on
predictive accuracy.

Capturerecapture methods aim to estimate the size of a closed population on
the basis of multiple incomplete enumerations of individuals. In many
applications, the individual probability of being recorded is heterogeneous in
the population. Previous studies have suggested that it is not possible to
reliably estimate the total population size when capture heterogeneity exists.
Here we approach population estimation in the presence of capture heterogeneity
as a latent length biased nonparametric density estimation problem on the unit
interval. We show that in this setting it is generally impossible to estimate
the density on the entire unit interval in finite samples, and that estimators
of the population size have high and sometimes unbounded risk when the density
has significant mass near zero. As an alternative, we propose estimating the
population of individuals with capture probability exceeding some threshold. We
provide methods for selecting an appropriate threshold, and show that this
approach results in estimators with substantially lower risk than estimators of
the total population size, with correspondingly smaller uncertainty, even when
the parameter of interest is the total population. The alternative paradigm is
demonstrated in extensive simulation studies and an application to snowshoe
hare multiple recapture data.

In contingency table analysis, sparse data is frequently encountered for even
modest numbers of variables, resulting in nonexistence of maximum likelihood
estimates. A common solution is to obtain regularized estimates of the
parameters of a loglinear model. Bayesian methods provide a coherent approach
to regularization, but are often computationally intensive. Conjugate priors
ease computational demands, but the conjugate DiaconisYlvisaker priors for the
parameters of loglinear models do not give rise to closed form credible
regions, complicating posterior inference. Here we derive the optimal Gaussian
approximation to the posterior for loglinear models with DiaconisYlvisaker
priors, and provide convergence rate and finitesample bounds for the
KullbackLeibler divergence between the exact posterior and the optimal
Gaussian approximation. We demonstrate empirically in simulations and a real
data application that the approximation is highly accurate, even in relatively
small samples. The proposed approximation provides a computationally scalable
and principled approach to regularized estimation and approximate Bayesian
inference for loglinear models.

Contingency table analysis routinely relies on log linear models, with latent
structure analysis providing a common alternative. Latent structure models lead
to a low rank tensor factorization of the probability mass function for
multivariate categorical data, while log linear models achieve dimensionality
reduction through sparsity. Little is known about the relationship between
these notions of dimensionality reduction in the two paradigms. We derive
several results relating the support of a loglinear model to the nonnegative
rank of the associated probability tensor. Motivated by these findings, we
propose a new collapsed Tucker class of tensor decompositions, which bridge
existing PARAFAC and Tucker decompositions, providing a more flexible framework
for parsimoniously characterizing multivariate categorical data. Taking a
Bayesian approach to inference, we illustrate advantages of the new
decompositions in simulations and an application to functional disability data.