
Hamiltonian Monte Carlo has emerged as a standard tool for posterior
computation. In this article, we present an extension that can efficiently
explore target distributions with discontinuous densities. Our extension in
particular enables efficient sampling from ordinal parameters though embedding
of probability mass functions into continuous spaces. We motivate our approach
through a theory of discontinuous Hamiltonian dynamics and develop a
corresponding numerical solver. The proposed solver is the first of its kind,
with a remarkable ability to exactly preserve the Hamiltonian. We apply our
algorithm to challenging posterior inference problems to demonstrate its wide
applicability and competitive performance.

There is wide interest in studying how the distribution of a continuous
response changes with a predictor. We are motivated by environmental
applications in which the predictor is the dose of an exposure and the response
is a health outcome. A main focus in these studies is inference on dose levels
associated with a given increase in risk relative to a baseline. Popular
methods either dichotomize the continuous response or focus on modeling changes
with the dose in the expectation of the outcome. Such choices may lead to
information loss and provide inaccurate inference on doseresponse
relationships. We instead propose a Bayesian convex mixture regression model
that allows the entire distribution of the health outcome to be unknown and
changing with the dose. To balance flexibility and parsimony, we rely on a
mixture model for the density at the extreme doses, and express the conditional
density at each intermediate dose via a convex combination of these extremal
densities. This representation generalizes classical doseresponse models for
quantitative outcomes, and provides a more parsimonious, but still powerful,
formulation compared to nonparametric methods, thereby improving
interpretability and efficiency in inference on risk functions. A Markov chain
Monte Carlo algorithm for posterior inference is developed, and the benefits of
our methods are outlined in simulations, along with a study on the impact of
DDT exposure on gestational age.

There is increasing interest in learning a set of small outcomerelevant
subgraphs in networkpredictor regression. The extracted signal subgraphs can
greatly improve the interpretation of the association between the network
predictor and the response. In brain connectomics, the brain network for an
individual corresponds to a set of interconnections among brain regions and
there is a strong interest in linking the brain connectome to human cognitive
traits. Modern neuroimaging technology allows a very fine segmentation of the
brain, producing very large structural brain networks. Therefore, accurate and
efficient methods for identifying a set of small predictive subgraphs become
crucial, leading to discovery of key interconnected brain regions related to
the trait and important insights on the mechanism of variation in human
cognitive traits. We propose a symmetric bilinear model with $L_1$ penalty to
search for small clique subgraphs that contain useful information about the
response. A coordinate descent algorithm is developed to estimate the model
where we derive analytical solutions for a sequence of conditional convex
optimizations. Application of this method on human connectome and language
comprehension data shows interesting discovery of relevant interconnections
among several small sets of brain regions and better predictive performance
than competitors.

This article focuses on the problem of studying shared and
individualspecific structure in replicated networks or graphvalued data. In
particular, the observed data consist of $n$ graphs, $G_i, i=1,\ldots,n$, with
each graph consisting of a collection of edges between $V$ nodes. In brain
connectomics, the graph for an individual corresponds to a set of
interconnections among brain regions. Such data can be organized as a $V \times
V$ binary adjacency matrix $A_i$ for each $i$, with ones indicating an edge
between a pair of nodes and zeros indicating no edge. When nodes have a shared
meaning across replicates $i=1,\ldots,n$, it becomes of substantial interest to
study similarities and differences in the adjacency matrices. To address this
problem, we propose a method to estimate a common structure and lowdimensional
individualspecific deviations from replicated networks. The proposed Multiple
GRAph Factorization (MGRAF) model relies on a logistic regression mapping
combined with a hierarchical eigenvalue decomposition. We develop an efficient
algorithm for estimation and study basic properties of our approach. Simulation
studies show excellent operating characteristics and we apply the method to
human brain connectomics data.

This paper proposes Bayesian mosaic, a parallelizable composite posterior,
for scalable Bayesian inference on a broad class of multivariate discrete data
models. Sampling is embarrassingly parallel since Bayesian mosaic is a
multiplication of component posteriors that can be independently sampled from.
Analogous to composite likelihood methods, these component posteriors are based
on univariate or bivariate marginal densities. Utilizing the fact that the
score functions of these densities are unbiased, we show that Bayesian mosaic
is consistent and asymptotically normal under mild conditions. Since the
evaluation of univariate or bivariate marginal densities can rely on numerical
integration, sampling from Bayesian mosaic bypasses the traditional data
augmented Markov chain Monte Carlo (DAMCMC) method, which has a provably slow
mixing rate when data are imbalanced. Moreover, we show that sampling from
Bayesian mosaic has better scalability to large sample size than DAMCMC. The
method is evaluated via simulation studies and an application on a citation
count dataset.

Dirichlet process mixture (DPM) models tend to produce many small clusters
regardless of whether they are needed to accurately characterize the data 
this is particularly true for large data sets. However, interpretability,
parsimony, data storage and communication costs all are hampered by having
overly many clusters. We propose a powered Chinese restaurant process to limit
this kind of problem and penalize over clustering. The method is illustrated
using some simulation examples and data with large and small sample size
including MNIST and the Old Faithful Geyser data.

We propose a class of intrinsic Gaussian processes (inGPs) for
interpolation, regression and classification on manifolds with a primary focus
on complex constrained domains or irregular shaped spaces arising as subsets or
submanifolds of R, R2, R3 and beyond. For example, inGPs can accommodate
spatial domains arising as complex subsets of Euclidean space. inGPs respect
the potentially complex boundary or interior conditions as well as the
intrinsic geometry of the spaces. The key novelty of the proposed approach is
to utilise the relationship between heat kernels and the transition density of
Brownian motion on manifolds for constructing and approximating valid and
computationally feasible covariance kernels. This enables inGPs to be
practically applied in great generality, while existing approaches for
smoothing on constrained domains are limited to simple special cases. The broad
utilities of the inGP approach is illustrated through simulation studies and
data examples.

The Markov Chain Monte Carlo method is the dominant paradigm for posterior
computation in Bayesian analysis. It is common to control computation time by
making approximations to the Markov transition kernel. Comparatively little
attention has been paid to computational optimality in these approximating
Markov Chains, or when such approximations are justified relative to obtaining
shorter paths from the exact kernel. We give simple, sharp bounds for uniform
approximations of uniformly mixing Markov chains. We then suggest a notion of
optimality that incorporates computation time and approximation error, and use
our bounds to make generalizations about properties of good approximations in
the uniformly mixing setting. The relevance of these properties is demonstrated
in applications to a minibatchingbased approximate MCMC algorithm for large
$n$ logistic regression and lowrank approximations for Gaussian processes.

Gaussian processes (GPs) are very widely used for modeling of unknown
functions or surfaces in applications ranging from regression to classification
to spatial processes. Although there is an increasingly vast literature on
applications, methods, theory and algorithms related to GPs, the overwhelming
majority of this literature focuses on the case in which the input domain
corresponds to a Euclidean space. However, particularly in recent years with
the increasing collection of complex data, it is commonly the case that the
input domain does not have such a simple form. For example, it is common for
the inputs to be restricted to a nonEuclidean manifold, a case which forms the
motivation for this article. In particular, we propose a general extrinsic
framework for GP modeling on manifolds, which relies on embedding of the
manifold into a Euclidean space and then constructing extrinsic kernels for GPs
on their images. These extrinsic Gaussian processes (eGPs) are used as prior
distributions for unknown functions in Bayesian inferences. Our approach is
simple and general, and we show that the eGPs inherit fine theoretical
properties from GP models in Euclidean spaces. We consider applications of our
models to regression and classification problems with predictors lying in a
large class of manifolds, including spheres, planar shape spaces, a space of
positive definite matrices, and Grassmannians. Our models can be readily used
by practitioners in biological sciences for various regression and
classification problems, such as disease diagnosis or detection. Our work is
also likely to have impact in spatial statistics when spatial locations are on
the sphere or other geometric spaces.

Hamiltonian Monte Carlo (HMC) has become routinely used for sampling from
posterior distributions. Its extension Riemann manifold HMC (RMHMC) modifies
the proposal kernel through distortion of local distances by a Riemannian
metric. The performance depends critically on the choice of metric, with the
Fisher information providing the standard choice. In this article, we propose a
new class of metrics aimed at improving HMC's performance on multimodal target
distributions. We refer to the proposed approach as geometrically tempered HMC
(GTHMC) due to its connection to other tempering methods. We establish a
geometric theory behind RMHMC to motivate GTHMC and characterize its
theoretical properties. Moreover, we develop a novel variable step size
integrator for simulating Hamiltonian dynamics to improve on the usual
St\"{o}rmerVerlet integrator which suffers from numerical instability in GTHMC
settings. We illustrate GTHMC through simulations, demonstrating generality and
substantial gains over standard HMC implementations in terms of effective
sample sizes.

Hamiltonian Monte Carlo (HMC) and related algorithms have become routinely
used in Bayesian computation. In this article, we present a simple and provably
accurate method to improve the efficiency of HMC and related algorithms with
essentially no extra computational cost. This is achieved by {recycling the
intermediate states along simulated trajectories of Hamiltonian dynamics.
Standard algorithms use only the end points of trajectories, wastefully
discarding all the intermediate states. Compared to the alternative methods for
utilizing the intermediate states, our algorithm is simpler to apply in
practice and requires little programming effort beyond the usual
implementations of HMC and related algorithms. Our algorithm applies
straightforwardly to the noUturn sampler, arguably the most popular variant
of HMC. Through a variety of experiments, we demonstrate that our recycling
algorithm yields substantial computational efficiency gains.

Nonlinear latent variable models have become increasingly popular in a
variety of applications. However, there has been little study on theoretical
properties of these models. In this article, we study rates of posterior
contraction in univariate density estimation for a class of nonlinear latent
variable models where unobserved U(0,1) latent variables are related to the
response variables via a random nonlinear regression with an additive error.
Our approach relies on characterizing the space of densities induced by the
above model as kernel convolutions with a general class of continuous mixing
measures. The literature on posterior rates of contraction in density
estimation almost entirely focuses on finite or countably infinite mixture
models. We develop approximation results for our class of continuous mixing
measures. Using an appropriate Gaussian process prior on the unknown regression
function, we obtain the optimal frequentist rate up to a logarithmic factor
under standard regularity conditions on the true density.

Ordinary least squares (OLS) is the default method for fitting linear models,
but is not applicable for problems with dimensionality larger than the sample
size. For these problems, we advocate the use of a generalized version of OLS
motivated by ridge regression, and propose two novel threestep algorithms
involving least squares fitting and hard thresholding. The algorithms are
methodologically simple to understand intuitively, computationally easy to
implement efficiently, and theoretically appealing for choosing models
consistently. Numerical exercises comparing our methods with penalizationbased
approaches in simulations and data analyses illustrate the great potential of
the proposed algorithms.

Hybrid Monte Carlo (HMC) generates samples from a prescribed probability
distribution in a configuration space by simulating Hamiltonian dynamics,
followed by the Metropolis (Hastings) acceptance/rejection step. Compressible
HMC (CHMC) generalizes HMC to a situation in which the dynamics is reversible
but not necessarily Hamiltonian. This article presents a framework to further
extend the algorithm. Within the existing framework, each trajectory of the
dynamics must be integrated for the same amount of (random) time to generate a
valid Metropolis proposal. Our generalized acceptance/rejection mechanism
allows a more deliberate choice of the integration time for each trajectory.
The proposed algorithm in particular enables an effective application of
variable step size integrators to HMCtype sampling algorithms based on
reversible dynamics. The potential of our framework is further demonstrated by
another extension of HMC which reduces the wasted computations due to unstable
numerical approximations and corresponding rejected proposals.

Fitting statistical models is computationally challenging when the sample
size or the dimension of the dataset is huge. An attractive approach for
downscaling the problem size is to first partition the dataset into subsets
and then fit using distributed algorithms. The dataset can be partitioned
either horizontally (in the sample space) or vertically (in the feature space).
While the majority of the literature focuses on sample space partitioning,
feature space partitioning is more effective when $p\gg n$. Existing methods
for partitioning features, however, are either vulnerable to high correlations
or inefficient in reducing the model dimension. In this paper, we solve these
problems through a new embarrassingly parallel framework named DECO for
distributed variable selection and parameter estimation. In DECO, variables are
first partitioned and allocated to $m$ distributed workers. The decorrelated
subset data within each worker are then fitted via any algorithm designed for
highdimensional problems. We show that by incorporating the decorrelation
step, DECO can achieve consistent variable selection and parameter estimation
on each subset with (almost) no assumptions. In addition, the convergence rate
is nearly minimax optimal for both sparse and weakly sparse models and does NOT
depend on the partition number $m$. Extensive numerical experiments are
provided to illustrate the performance of the new framework.

We present a data augmentation scheme to perform Markov chain Monte Carlo
inference for models where data generation involves a rejection sampling
algorithm. Our idea, which seems to be missing in the literature, is a simple
scheme to instantiate the rejected proposals preceding each data point. The
resulting joint probability over observed and rejected variables can be much
simpler than the marginal distribution over the observed variables, which often
involves intractable integrals. We consider three problems, the first being the
modeling of flowcytometry measurements subject to truncation. The second is a
Bayesian analysis of the matrix Langevin distribution on the Stiefel manifold,
and the third, Bayesian inference for a nonparametric Gaussian process density
model. The latter two are instances of problems where Markov chain Monte Carlo
inference is doublyintractable. Our experiments demonstrate superior
performance over stateoftheart sampling algorithms for such problems.

Although Bayesian density estimation using discrete mixtures has good
performance in modest dimensions, there is a lack of statistical and
computational scalability to highdimensional multivariate cases. To combat the
curse of dimensionality, it is necessary to assume the data are concentrated
near a lowerdimensional subspace. However, Bayesian methods for learning this
subspace along with the density of the data scale poorly computationally. To
solve this problem, we propose an empirical Bayes approach, which estimates a
multiscale dictionary using geometric multiresolution analysis in a first
stage. We use this dictionary within a multiscale mixture model, which allows
uncertainty in component allocation, mixture weights and scaling factors over a
binary tree. A computational algorithm is proposed, which scales efficiently to
massive dimensional problems. We provide some theoretical support for this
geometric density estimation (GEODE) method, and illustrate the performance
through simulated and real data examples.

For massive data sets, efficient computation commonly relies on distributed
algorithms that store and process subsets of the data on different machines,
minimizing communication costs. Our focus is on regression and classification
problems involving many features. A variety of distributed algorithms have been
proposed in this context, but challenges arise in defining an algorithm with
low communication, theoretical guarantees and excellent practical performance
in general settings. We propose a MEdian Selection Subset AGgregation Estimator
(message) algorithm, which attempts to solve these problems. The algorithm
applies feature selection in parallel for each subset using Lasso or another
method, calculates the `median' feature inclusion index, estimates coefficients
for the selected features in parallel for each subset, and then averages these
estimates. The algorithm is simple, involves very minimal communication, scales
efficiently in both sample and feature size, and has theoretical guarantees. In
particular, we show model selection consistency and coefficient estimation
efficiency. Extensive experiments show excellent performance in variable
selection, estimation, prediction, and computation time relative to usual
competitors.

Sparse Bayesian factor models are routinely implemented for parsimonious
dependence modeling and dimensionality reduction in highdimensional
applications. We provide theoretical understanding of such Bayesian procedures
in terms of posterior convergence rates in inferring highdimensional
covariance matrices where the dimension can be larger than the sample size.
Under relevant sparsity assumptions on the true covariance matrix, we show that
commonlyused point mass mixture priors on the factor loadings lead to
consistent estimation in the operator norm even when $p\gg n$. One of our major
contributions is to develop a new class of continuous shrinkage priors and
provide insights into their concentration around sparse vectors. Using such
priors for the factor loadings, we obtain similar rate of convergence as
obtained with point mass mixture priors. To obtain the convergence rates, we
construct test functions to separate points in the space of highdimensional
covariance matrices using insights from random matrix theory; the tools
developed may be of independent interest. We also derive minimax rates and show
that the Bayesian posterior rates of convergence coincide with the minimax
rates upto a $\sqrt{\log n}$ term.

In nonparametric regression problems involving multiple predictors, there is
typically interest in estimating an anisotropic multivariate regression surface
in the important predictors while discarding the unimportant ones. Our focus is
on defining a Bayesian procedure that leads to the minimax optimal rate of
posterior contraction (up to a log factor) adapting to the unknown dimension
and anisotropic smoothness of the true surface. We propose such an approach
based on a Gaussian process prior with dimensionspecific scalings, which are
assigned carefullychosen hyperpriors. We additionally show that using a
homogenous Gaussian process with a single bandwidth leads to a suboptimal rate
in anisotropic cases.

We study the behavior of the posterior distribution in highdimensional
Bayesian Gaussian linear regression models having $p\gg n$, with $p$ the number
of predictors and $n$ the sample size. Our focus is on obtaining quantitative
finite sample bounds ensuring sufficient posterior probability assigned in
neighborhoods of the true regression coefficient vector, $\beta^0$, with high
probability. We assume that $\beta^0$ is approximately $S$sparse and obtain
universal bounds, which provide insight into the role of the prior in
controlling concentration of the posterior. Based on these finite sample
bounds, we examine the implied asymptotic contraction rates for several
examples showing that sparselystructured and heavytail shrinkage priors
exhibit rapid contraction rates. We also demonstrate that a stronger result
holds for the UniformGaussian\footnote[2]{A binary vector of indicators
($\gamma$) is drawn from the uniform distribution on the set of binary
sequences with exactly $S$ ones, and then each $\beta_i\sim\mathcal{N}(0,V^2)$
if $\gamma_i=1$ and $\beta_i=0$ if $\gamma_i=0$.} prior. These types of finite
sample bounds provide guidelines for designing and evaluating priors for
highdimensional problems.

An extremely common bottleneck encountered in statistical learning algorithms
is inversion of huge covariance matrices, examples being in evaluating Gaussian
likelihoods for a large number of data points. We propose general parallel
algorithms for inverting positive definite matrices, which are nearly rank
deficient. Such matrix inversions are needed in Gaussian process computations,
among other settings, and remain a bottleneck even with the increasing
literature on low rank approximations. We propose a general class of algorithms
for parallelizing computations to dramatically speed up computation time by
orders of magnitude exploiting multicore architectures. We implement our
algorithm on a cloud computing platform, providing pseudo and actual code. The
algorithm can be easily implemented on any multicore parallel computing
resource. Some illustrations are provided to give a flavor for the gains and
what becomes possible in freeing up this bottleneck.

It has become routine to collect data that are structured as multiway arrays
(tensors). There is an enormous literature on low rank and sparse matrix
factorizations, but limited consideration of extensions to the tensor case in
statistics. The most common low rank tensor factorization relies on parallel
factor analysis (PARAFAC), which expresses a rank $k$ tensor as a sum of rank
one tensors. When observations are only available for a tiny subset of the
cells of a big tensor, the low rank assumption is not sufficient and PARAFAC
has poor performance. We induce an additional layer of dimension reduction by
allowing the effective rank to vary across dimensions of the table. For
concreteness, we focus on a contingency table application. Taking a Bayesian
approach, we place priors on terms in the factorization and develop an
efficient Gibbs sampler for posterior computation. Theory is provided showing
posterior concentration rates in highdimensional settings, and the methods are
shown to have excellent performance in simulations and several real data
applications.

The preservation of our cultural heritage is of paramount importance. Thanks
to recent developments in digital acquisition techniques, powerful image
analysis algorithms are developed which can be useful noninvasive tools to
assist in the restoration and preservation of art. In this paper we propose a
semisupervised crack detection method that can be used for highdimensional
acquisitions of paintings coming from different modalities. Our dataset
consists of a recently acquired collection of images of the Ghent Altarpiece
(1432), one of Northern Europe's most important art masterpieces. Our goal is
to build a classifier that is able to discern crack pixels from the background
consisting of noncrack pixels, making optimal use of the information that is
provided by each modality. To accomplish this we employ a recently developed
nonparametric Bayesian classifier, that uses tensor factorizations to
characterize any conditional probability. A prior is placed on the parameters
of the factorization such that every possible interaction between predictors is
allowed while still identifying a sparse subset among these predictors. The
proposed Bayesian classifier, which we will refer to as conditional Bayesian
tensor factorization or CBTF, is assessed by visually comparing classification
results with the Random Forest (RF) algorithm.

We propose a generalized double Pareto prior for Bayesian shrinkage
estimation and inferences in linear models. The prior can be obtained via a
scale mixture of Laplace or normal distributions, forming a bridge between the
Laplace and NormalJeffreys' priors. While it has a spike at zero like the
Laplace density, it also has a Student's $t$like tail behavior. Bayesian
computation is straightforward via a simple Gibbs sampling algorithm. We
investigate the properties of the maximum a posteriori estimator, as sparse
estimation plays an important role in many problems, reveal connections with
some wellestablished regularization procedures, and show some asymptotic
results. The performance of the prior is tested through simulations and an
application.