
While the problem of estimating a probability density function (pdf) from its
observations is classical, the estimation under additional shape constraints is
both important and challenging. We introduce an efficient, geometric approach
for estimating a pdf given the number of its modes. This approach explores the
space of constrained pdfs using an action of the diffeomorphism group that
preserves their shapes. It starts with an initial template, with the desired
number of modes and arbitrarily chosen heights at the critical points, and
transforms it via: (1) composition by diffeomorphisms and (2) normalization to
obtain the final density estimate. The search for the optimal diffeomorphism is
performed under the maximum-likelihood criterion and is accomplished by mapping
diffeomorphisms to the tangent space of a Hilbert sphere, a vector space whose
elements can be expressed using an orthogonal basis. This framework is first
applied to shape-constrained univariate, unconditional pdf estimation and then
extended to conditional pdf estimation. We derive asymptotic convergence rates
of the estimator and demonstrate this approach using a synthetic dataset
involving speed distributions for different traffic flows on Californian
driveways.
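
The two-step transformation described above (composition with a diffeomorphism,
then normalization) can be sketched numerically. This is a minimal illustration
under stated assumptions, not the estimator itself: the bimodal `template`, the
single cosine basis function, and the coefficient value 0.3 are all hypothetical
choices, and no likelihood maximization is performed.

```python
import numpy as np


def trapezoid(y, x):
    """Trapezoidal rule, written out so the sketch does not depend on the NumPy version."""
    return float(np.sum((y[1:] + y[:-1]) * np.diff(x)) / 2.0)


def template(t):
    """Hypothetical bimodal template on [0, 1] with arbitrarily chosen heights."""
    bump_1 = np.exp(-((t - 0.3) ** 2) / (2 * 0.05 ** 2))
    bump_2 = 0.6 * np.exp(-((t - 0.7) ** 2) / (2 * 0.05 ** 2))
    return bump_1 + bump_2 + 0.05


def diffeo_from_tangent(coef, t):
    """Map one tangent-space coefficient to an increasing map gamma of [0, 1]."""
    # Tangent vector at the constant function 1: integrates to zero, L2 norm |coef|.
    v = coef * np.sqrt(2.0) * np.cos(2 * np.pi * t)
    norm = abs(coef)
    # Exponential map on the unit Hilbert sphere; q plays the role of sqrt(gamma').
    q = np.ones_like(t) if norm == 0 else np.cos(norm) + np.sin(norm) * v / norm
    dgamma = q ** 2
    steps = (dgamma[1:] + dgamma[:-1]) * np.diff(t) / 2.0
    gamma = np.concatenate(([0.0], np.cumsum(steps)))
    return gamma / gamma[-1]  # remove quadrature error so that gamma(1) = 1


def count_modes(f):
    """Count interior local maxima of a sampled function."""
    d = np.diff(f)
    return int(np.sum((d[:-1] > 0) & (d[1:] <= 0)))


t = np.linspace(0.0, 1.0, 2001)
gamma = diffeo_from_tangent(0.3, t)        # step (1): build the diffeomorphism
warped = template(gamma)                   # step (1): compose template with gamma
density = warped / trapezoid(warped, t)    # step (2): normalize to a pdf
```

Because `gamma` is strictly increasing for this coefficient, the warped density
keeps the two modes of the template while their locations move.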

A fundamental problem in network analysis is clustering the nodes into groups
which share a similar connectivity pattern. Existing algorithms for community
detection assume the knowledge of the number of clusters or estimate it a
priori using various selection criteria and subsequently estimate the community
structure. Ignoring the uncertainty in the first stage may lead to erroneous
clustering, particularly when the community structure is vague. We instead
propose a coherent probabilistic framework for simultaneous estimation of the
number of communities and the community structure, adapting recently developed
Bayesian nonparametric techniques to network models. An efficient Markov chain
Monte Carlo (MCMC) algorithm is proposed which obviates the need to perform
reversible jump MCMC on the number of clusters. The methodology is shown to
outperform recently developed community detection algorithms in a variety of
synthetic data examples and in benchmark real datasets. Using an appropriate
metric on the space of all configurations, we develop non-asymptotic Bayes risk
bounds even when the number of clusters is unknown. En route, we develop
concentration properties of nonlinear functions of Bernoulli random variables,
which may be of independent interest.

We propose a family of variational approximations to Bayesian posterior
distributions, called $\alpha$-VB, with provable statistical guarantees. The
standard variational approximation is a special case of $\alpha$-VB with
$\alpha=1$. When $\alpha \in(0,1]$, a novel class of variational inequalities
is developed for linking the Bayes risk under the variational approximation to
the objective function in the variational optimization problem, implying that
maximizing the evidence lower bound in variational inference has the effect of
minimizing the Bayes risk within the variational density family. Operating in a
frequentist setup, the variational inequalities imply that point estimates
constructed from the $\alpha$-VB procedure converge at an optimal rate to the
true parameter in a wide range of problems. We illustrate our general theory
with a number of examples, including the mean-field variational approximation
to (low)-high-dimensional Bayesian linear regression with spike and slab
priors, mixture of Gaussian models, latent Dirichlet allocation, and (mixture
of) Gaussian variational approximation in regular parametric models.

The article addresses a long-standing open problem on the justification of
using variational Bayes methods for parameter estimation. We provide general
conditions for obtaining optimal risk bounds for point estimates acquired from
mean-field variational Bayesian inference. The conditions pertain to the
existence of certain test functions for the distance metric on the parameter
space and minimal assumptions on the prior. A general recipe for verification
of the conditions is outlined which is broadly applicable to existing Bayesian
models with or without latent variables. As illustrations, specific
applications to Latent Dirichlet Allocation and Gaussian mixture models are
discussed.

Gaussian process (GP) regression is a powerful interpolation technique due to
its flexibility in capturing nonlinearity. In this paper, we provide a general
framework for understanding the frequentist coverage of pointwise and
simultaneous Bayesian credible sets in GP regression. As an intermediate
result, we develop a Bernstein-von Mises type result under the supremum norm in
random design GP regression. Identifying both the mean and covariance function
of the posterior distribution of the Gaussian process as regularized
$M$-estimators, we show that the sampling distribution of the posterior mean
function and the centered posterior distribution can be respectively
approximated by two population level GPs. By developing a comparison inequality
between two GPs, we provide exact characterization of frequentist coverage
probabilities of Bayesian pointwise credible intervals and simultaneous
credible bands of the regression function. Our results show that inference
based on GP regression tends to be conservative; when the prior is
undersmoothed, the resulting credible intervals and bands have minimax-optimal
sizes, with their frequentist coverage converging to a nondegenerate value
between their nominal level and one. As a byproduct of our theory, we show that
GP regression also yields a minimax-optimal posterior contraction rate
relative to the supremum norm, which provides positive evidence on the
long-standing problem of the optimal supremum norm contraction rate in GP
regression.
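
For readers less familiar with the objects involved, the posterior mean and
pointwise credible intervals of the regression function can be computed as in
the sketch below. It is an illustration under assumptions that are not from the
text: a squared-exponential kernel with fixed hyperparameters, a known noise
variance, a sine test function, and a uniform random design. It does not
reproduce the coverage analysis itself.

```python
import numpy as np


def sq_exp_kernel(a, b, length_scale=0.2, variance=1.0):
    """Squared-exponential covariance between two sets of 1-D inputs."""
    diff = a[:, None] - b[None, :]
    return variance * np.exp(-0.5 * (diff / length_scale) ** 2)


def gp_posterior(x_train, y_train, x_test, noise_var=0.01):
    """Posterior mean and pointwise variance of the latent regression function."""
    K = sq_exp_kernel(x_train, x_train) + noise_var * np.eye(len(x_train))
    K_s = sq_exp_kernel(x_train, x_test)
    L = np.linalg.cholesky(K)
    # alpha = K^{-1} y, computed with two triangular-system solves.
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    mean = K_s.T @ alpha
    V = np.linalg.solve(L, K_s)
    var = np.diag(sq_exp_kernel(x_test, x_test)) - np.sum(V ** 2, axis=0)
    return mean, np.maximum(var, 0.0)  # clip tiny negative round-off


rng = np.random.default_rng(0)
x_train = rng.uniform(0.0, 1.0, size=50)                   # random design
y_train = np.sin(2 * np.pi * x_train) + 0.1 * rng.standard_normal(50)
x_test = np.linspace(0.0, 1.0, 101)

mean, var = gp_posterior(x_train, y_train, x_test)
half_width = 1.96 * np.sqrt(var)                           # nominal 95% level
lower, upper = mean - half_width, mean + half_width
```

The frequentist coverage studied above concerns how often intervals such as
`[lower, upper]` contain the true function value over repeated samples, which
in general differs from the nominal 95% level.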

In this article, we propose new Bayesian methods for selecting and estimating
a sparse coefficient vector for a skewed heteroscedastic response. Our novel
Bayesian procedures effectively estimate the median and other quantile
functions, accommodate a non-local prior for regression effects without
compromising ease of implementation via sampling-based tools, and
asymptotically select the true set of predictors even when the number of
covariates increases at the same order as the sample size. We also extend our
method to deal with some observations with very large errors. Via simulation
studies and a reanalysis of a medical cost study with a large number of
potential predictors, we illustrate the ease of implementation and other
practical advantages of our approach compared to existing methods for such
studies.

We propose a method for estimating a covariance matrix that can be
represented as a sum of a low-rank matrix and a diagonal matrix. The proposed
method compresses high-dimensional data, computes the sample covariance in the
compressed space, and lifts it back to the ambient space via a decompression
operation. A salient feature of our approach relative to existing literature on
combining sparsity and low-rank structures in covariance matrix estimation is
that we do not require the low-rank component to be sparse. A principled
framework for estimating the compressed dimension using Stein's Unbiased Risk
Estimation theory is demonstrated. Experimental simulation results demonstrate
the efficacy and scalability of our proposed approach.

Nonlinear latent variable models have become increasingly popular in a
variety of applications. However, there has been little study of the theoretical
properties of these models. In this article, we study rates of posterior
contraction in univariate density estimation for a class of nonlinear latent
variable models where unobserved U(0,1) latent variables are related to the
response variables via a random nonlinear regression with an additive error.
Our approach relies on characterizing the space of densities induced by the
above model as kernel convolutions with a general class of continuous mixing
measures. The literature on posterior rates of contraction in density
estimation almost entirely focuses on finite or countably infinite mixture
models. We develop approximation results for our class of continuous mixing
measures. Using an appropriate Gaussian process prior on the unknown regression
function, we obtain the optimal frequentist rate up to a logarithmic factor
under standard regularity conditions on the true density.
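
In symbols, the model class described above can be written as follows. This is
a sketch under an added assumption: the additive error is taken to be Gaussian
with variance $\sigma^2$ purely for illustration, and the notation $\mu$,
$\phi_\sigma$ and $\nu_\mu$ is introduced here rather than taken from the text.

```latex
\begin{align*}
  y_i &= \mu(u_i) + \epsilon_i, \qquad
  u_i \sim \mathrm{U}(0,1), \qquad
  \epsilon_i \sim \mathrm{N}(0, \sigma^2), \\
  f_{\mu,\sigma}(y) &= \int_0^1 \phi_\sigma\bigl(y - \mu(u)\bigr)\, du
  = \int \phi_\sigma(y - z)\, d\nu_\mu(z),
\end{align*}
```

where $\phi_\sigma$ is the $\mathrm{N}(0,\sigma^2)$ density and $\nu_\mu$ is the
distribution of $\mu(u)$ for $u \sim \mathrm{U}(0,1)$. The second expression
shows the induced density as a kernel convolution with the mixing measure
$\nu_\mu$, which is continuous whenever $\mu$ is a non-constant smooth function.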

We introduce a geometric approach for estimating a probability density
function (pdf) given its samples. The procedure involves obtaining an initial
estimate of the pdf and then transforming it via a warping function to reach
the final estimate. The initial estimate is intended to be computationally
fast, albeit suboptimal, but its warping creates a larger, flexible class of
density functions, resulting in substantially improved estimation. The warping
is accomplished by mapping diffeomorphic functions to the tangent space of a
Hilbert sphere, a vector space whose elements can be expressed using an
orthogonal basis. Using a truncated basis expansion, we estimate the optimal
warping and, thus, the optimal density estimate. This framework is introduced
for univariate, unconditional pdf estimation and then extended to conditional
pdf estimation. The approach avoids many of the computational pitfalls
associated with current methods without sacrificing estimation performance. In
the presence of irrelevant predictors, the approach achieves both statistical and
computational efficiency compared to classical approaches for conditional
density estimation. We derive asymptotic convergence rates of the density
estimator and demonstrate this approach using synthetic datasets and a case
study on the association of a toxic metabolite with preterm birth.

In this article, we investigate large sample properties of model selection
procedures in a general Bayesian framework when a closed form expression of the
marginal likelihood function is not available or a local asymptotic quadratic
approximation of the log-likelihood function does not exist. Under appropriate
identifiability assumptions on the true model, we provide sufficient conditions
for a Bayesian model selection procedure to be consistent and obey the Occam's
razor phenomenon, i.e., the probability of selecting the "smallest" model that
contains the truth tends to one as the sample size goes to infinity. In order
to show that a Bayesian model selection procedure selects the smallest model
containing the truth, we impose a prior anti-concentration condition, requiring
the prior mass assigned by large models to a neighborhood of the truth to be
sufficiently small. In a more general setting where the strong model
identifiability assumption may not hold, we introduce the notion of local
Bayesian complexity and develop oracle inequalities for Bayesian model
selection procedures. Our Bayesian oracle inequality characterizes a tradeoff
between the approximation error and a Bayesian characterization of the local
complexity of the model, illustrating the adaptive nature of averaging-based
Bayesian procedures towards achieving an optimal rate of posterior convergence.
Specific applications of the model selection theory are discussed in the
context of high-dimensional nonparametric regression and density regression
where the regression function or the conditional density is assumed to depend
on a fixed subset of predictors. As a result of independent interest, we
propose a general technique for obtaining upper bounds of certain small ball
probability of stationary Gaussian processes.

We propose a distributed computing framework, based on a divide and conquer
strategy and hierarchical modeling, to accelerate posterior inference for
high-dimensional Bayesian factor models. Our approach distributes the task of
high-dimensional covariance matrix estimation to multiple cores, solves each
subproblem separately via a latent factor model, and then combines these
estimates to produce a global estimate of the covariance matrix. Existing
divide and conquer methods focus exclusively on dividing the total number of
observations $n$ into subsamples while keeping the dimension $p$ fixed. Our
approach is novel in this regard: it includes all of the $n$ samples in each
subproblem and, instead, splits the dimension $p$ into smaller subsets for each
subproblem. The subproblems themselves can be challenging to solve when $p$ is
large due to the dependencies across dimensions. To circumvent this issue, we
specify a novel hierarchical structure on the latent factors that allows for
flexible dependencies across dimensions, while still maintaining computational
efficiency. Our approach is readily parallelizable and is shown to be several
orders of magnitude more computationally efficient than fitting a full factor
model. We report the performance of our method in
synthetic examples and a genomics application.

We develop a finite-sample goodness-of-fit test for \emph{latent-variable}
block models for networks and test it on simulated and real data sets. The main
building block for the latent block assignment model test is the exact test for
the model with observed block assignments. The latter is implemented using
algebraic statistics. While we focus on three variants of the stochastic block
model, the methodology extends to any mixture of log-linear models on discrete
data.

We consider the problem of multivariate density deconvolution when the
interest lies in estimating the distribution of a vector-valued random variable
but precise measurements of the variable of interest are not available,
observations being contaminated with additive measurement errors. The existing
sparse literature on the problem assumes the density of the measurement errors
to be completely known. We propose robust Bayesian semiparametric multivariate
deconvolution approaches when the measurement error density is not known but
replicated proxies are available for each unobserved value of the random
vector. Additionally, we allow the variability of the measurement errors to
depend on the associated unobserved value of the vector of interest through
unknown relationships which also automatically includes the case of
multivariate multiplicative measurement errors. Basic properties of finite
mixture models, multivariate normal kernels and exchangeable priors are
exploited in many novel ways to meet the modeling and computational challenges.
Theoretical results that show the flexibility of the proposed methods are
provided. We illustrate the efficiency of the proposed methods in recovering
the true density of interest through simulation experiments. The methodology is
applied to estimate the joint consumption pattern of different dietary
components from contaminated 24-hour recalls.

We consider the fractional posterior distribution that is obtained by
updating a prior distribution via Bayes theorem with a fractional likelihood
function, a usual likelihood function raised to a fractional power. First, we
analyze the contraction property of the fractional posterior in a general
misspecified framework. Our contraction results only require a prior mass
condition on a certain Kullback-Leibler (KL) neighborhood of the true parameter
(or the KL divergence minimizer in the misspecified case), and obviate
constructions of test functions and sieves commonly used in the literature for
analyzing the contraction property of a regular posterior. We show through a
counterexample that some condition controlling the complexity of the parameter
space is necessary for the regular posterior to contract, rendering additional
flexibility on the choice of the prior for the fractional posterior. Second, we
derive a novel Bayesian oracle inequality based on a PAC-Bayes inequality in
misspecified models. Our derivation reveals several advantages of
averaging-based Bayesian procedures over optimization-based frequentist
procedures. As an
application of the Bayesian oracle inequality, we derive a sharp oracle
inequality in the convex regression problem under an arbitrary dimension. We
also illustrate the theory in Gaussian process regression and density
estimation problems.
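
The fractional posterior is easy to write down in a conjugate example. The
sketch below assumes a normal model with known variance and a normal prior on
the mean; these modeling choices, the function name and the default values are
illustrative assumptions. Raising the likelihood to the power `alpha` simply
replaces the sample size `n` by `alpha * n` in the usual conjugate update.

```python
import numpy as np


def fractional_posterior_normal(y, alpha, sigma2=1.0, prior_mean=0.0, prior_var=10.0):
    """Fractional posterior of a normal mean (known variance sigma2).

    The posterior is proportional to likelihood**alpha times the prior, which
    is again normal; the function returns its mean and variance.
    """
    n = len(y)
    precision = 1.0 / prior_var + alpha * n / sigma2
    mean = (prior_mean / prior_var + alpha * np.sum(y) / sigma2) / precision
    return mean, 1.0 / precision


y = np.array([1.0, 2.0, 3.0])
mean_full, var_full = fractional_posterior_normal(y, alpha=1.0)   # regular posterior
mean_half, var_half = fractional_posterior_normal(y, alpha=0.5)   # fractional posterior
```

With `alpha < 1` the data are down-weighted, so the fractional posterior is
more spread out than the regular one and its mean sits closer to the prior
mean.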

Additive nonparametric regression models provide an attractive tool for
variable selection in high dimensions when the relationship between the
response and predictors is complex. They offer greater flexibility compared to
parametric nonlinear regression models and better interpretability and
scalability than fully nonparametric regression models. However, achieving
sparsity simultaneously in the number of nonparametric components as well as in
the variables within each nonparametric component poses a stiff computational
challenge. In this article, we develop a novel Bayesian additive regression
model using a combination of hard and soft shrinkages to separately control the
number of additive components and the variables within each component. An
efficient algorithm is developed to select the important variables and
estimate the interaction network. Excellent performance is obtained in
simulated and real data examples.

Two-component mixture priors provide a traditional way to induce sparsity in
high-dimensional Bayes models. However, several aspects of such a prior,
including computational complexities in high dimensions, interpretation of
exact zeros and non-sparse posterior summaries under standard loss functions,
have motivated a wide variety of continuous shrinkage priors, which can be
expressed as global-local scale mixtures of Gaussians. Interestingly, we
demonstrate that many commonly used shrinkage priors, including the Bayesian
Lasso, do not have adequate posterior concentration in high-dimensional
settings.
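
As a concrete instance of a scale mixture of Gaussians, the Laplace prior
underlying the Bayesian Lasso can be sampled by mixing normal draws over
exponentially distributed variances. The sketch below is a simplified
illustration: the global scale is fixed at one, the usual conditioning on the
error variance is omitted, and the function name and rate value are
hypothetical.

```python
import numpy as np


def sample_laplace_scale_mixture(lam, size, rng):
    """Draw from a Laplace(rate=lam) prior as a scale mixture of Gaussians."""
    # Local variances: psi_j ~ Exponential with rate lam^2 / 2.
    # NumPy parameterizes the exponential by its scale, which is 1 / rate.
    psi = rng.exponential(scale=2.0 / lam ** 2, size=size)
    # Conditionally Gaussian coefficients: theta_j | psi_j ~ N(0, psi_j).
    return rng.normal(loc=0.0, scale=np.sqrt(psi))


rng = np.random.default_rng(1)
lam = 2.0
theta = sample_laplace_scale_mixture(lam, size=200_000, rng=rng)

# For a Laplace(rate=lam) distribution, Var(theta) = 2 / lam^2 and E|theta| = 1 / lam.
empirical_var = float(np.var(theta))
empirical_abs_mean = float(np.mean(np.abs(theta)))
```

Other global-local priors follow the same two-stage pattern with different
mixing distributions on the local and global scales.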

We consider a nonparametric Bayesian model for conditional densities. The
model is a finite mixture of normal distributions with covariate-dependent
multinomial logit mixing probabilities. A prior for the number of mixture
components is specified on positive integers. The marginal distribution of
covariates is not modeled. We study asymptotic frequentist behavior of the
posterior in this model. Specifically, we show that when the true conditional
density has a certain smoothness level, then the posterior contraction rate
around the truth is equal up to a log factor to the frequentist minimax rate of
estimation. An extension to the case when the covariate space is unbounded is
also established. As our result holds without a priori knowledge of the
smoothness level of the true density, the established posterior contraction
rates are adaptive. Moreover, we show that the rate is not affected by
inclusion of irrelevant covariates in the model. In Monte Carlo simulations, a
version of the model compares favorably to a cross-validated kernel conditional
density estimator.
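
The structure of such a conditional density can be sketched as follows. This
is a generic illustration, not the exact parametrization studied above: the
scalar covariate, the logits that are linear in `x`, the constant component
means and standard deviations, and all numerical values are assumptions made
for the example.

```python
import numpy as np


def softmax(a):
    """Numerically stable multinomial logit probabilities."""
    e = np.exp(a - np.max(a))
    return e / np.sum(e)


def conditional_density(y, x, intercepts, slopes, means, sds):
    """Evaluate f(y | x) for a normal mixture with covariate-dependent weights."""
    weights = softmax(intercepts + slopes * x)          # mixing probabilities at x
    z = (y[:, None] - means[None, :]) / sds[None, :]
    kernels = np.exp(-0.5 * z ** 2) / (sds[None, :] * np.sqrt(2 * np.pi))
    return kernels @ weights, weights


intercepts = np.array([0.0, 0.5, -0.5])
slopes = np.array([-2.0, 0.0, 2.0])
means = np.array([-2.0, 0.0, 2.0])
sds = np.array([0.5, 1.0, 0.7])
y_grid = np.linspace(-10.0, 10.0, 4001)

dens_lo, w_lo = conditional_density(y_grid, -1.0, intercepts, slopes, means, sds)
dens_hi, w_hi = conditional_density(y_grid, 1.0, intercepts, slopes, means, sds)
```

Moving `x` from -1 to 1 shifts mass from the first component to the third, so
the shape of the conditional density changes with the covariate even though
the component parameters stay fixed.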

We study posterior rates of contraction in Gaussian process regression with
unbounded covariate domain. Our argument relies on developing a Gaussian
approximation to the posterior of the leading coefficients of a
Karhunen-Lo\`{e}ve expansion of the Gaussian process. The salient feature of
our result is deriving such an approximation in the $L^2$ Wasserstein distance
and relating the speed of the approximation to the posterior contraction rate
using a coupling argument. Specific illustrations are provided for the Gaussian
or squared-exponential covariance kernel.

With the advent of structured data in the form of social networks, genetic
circuits and protein interaction networks, statistical analysis of networks has
gained popularity over recent years. The stochastic block model constitutes a
classical cluster-exhibiting random graph model for networks. There is a
substantial amount of literature devoted to proposing strategies for estimating
and inferring parameters of the model, both from classical and Bayesian
viewpoints. Unlike the classical counterpart, there is however a dearth of
theoretical results on the accuracy of estimation in the Bayesian setting. In
this article, we undertake a theoretical investigation of the posterior
distribution of the parameters in a stochastic block model. In particular, we
show that one obtains optimal rates of posterior convergence with routinely
used multinomial-Dirichlet priors on cluster indicators and uniform priors on
the probabilities of the random edge indicators. En route, we develop geometric
embedding techniques to exploit the lower dimensional structure of the
parameter space which may be of independent interest.
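
For concreteness, an undirected stochastic block model can be simulated as in
the sketch below. The block labels `z`, the edge-probability matrix `Q` and
the network size are illustrative assumptions; in the Bayesian setting
described above, the labels and probabilities would instead be given priors
and inferred from the adjacency matrix.

```python
import numpy as np


def simulate_sbm(z, Q, rng):
    """Simulate a symmetric adjacency matrix with no self-loops.

    z : integer block label for each node
    Q : symmetric matrix of between-block edge probabilities
    """
    n = len(z)
    P = Q[np.ix_(z, z)]                        # n x n matrix of edge probabilities
    upper = np.triu(rng.random((n, n)) < P, k=1)
    return (upper | upper.T).astype(int)       # mirror the upper triangle


rng = np.random.default_rng(2)
z = np.repeat([0, 1], 50)                      # two blocks of 50 nodes each
Q = np.array([[0.6, 0.1],
              [0.1, 0.5]])
A = simulate_sbm(z, Q, rng)

# Empirical edge densities within block 0 and between the two blocks.
within_0 = A[:50, :50][np.triu_indices(50, k=1)].mean()
between = A[:50, 50:].mean()
```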

Unsupervised clustering of curves according to their shapes is an important
problem with broad scientific applications. The existing model-based clustering
techniques either rely on simple probability models (e.g., Gaussian) that are
not generally valid for shape analysis or assume a known number of clusters. We
develop an efficient Bayesian method to cluster curve data using an elastic
shape metric that is based on joint registration and comparison of shapes of
curves. The elastic-inner product matrix obtained from the data is modeled
using a Wishart distribution whose parameters are assigned carefully chosen
prior distributions to allow for automatic inference on the number of clusters.
The posterior is sampled through an efficient Markov chain Monte Carlo procedure
based on the Chinese restaurant process to infer (1) the posterior distribution
on the number of clusters, and (2) clustering configuration of shapes. This
method is demonstrated on a variety of synthetic data and real data examples on
protein structure analysis, cell shape analysis in microscopy images, and
clustering of shapes from the MPEG7 database.

Variable selection has received widespread attention over the last decade as
we routinely encounter high-throughput datasets in complex biological and
environmental research. Most Bayesian variable selection methods are restricted
to mixture priors having separate components for characterizing the signal and
the noise. However, such priors encounter computational issues in high
dimensions. This has motivated continuous shrinkage priors that resemble the
two-component priors while facilitating computation and interpretability. While
such priors are widely used for estimating high-dimensional sparse vectors,
selecting a subset of variables remains a daunting task. In this article, we
propose a general approach for variable selection with shrinkage priors. The
presence of very few tuning parameters makes our method attractive in
comparison to ad hoc thresholding approaches. The approach is not limited to
continuous shrinkage priors and can be used with any
shrinkage prior. Theoretical properties for near-collinear design matrices are
investigated and the method is shown to have good performance in a wide range
of synthetic data examples.

In Bayesian nonparametric models, Gaussian processes provide a popular prior
choice for regression function estimation. Existing literature on the
theoretical investigation of the resulting posterior distribution almost
exclusively assumes a fixed design for covariates. The only random design result
we are aware of (van der Vaart & van Zanten, 2011) assumes the assigned
Gaussian process to be supported on the smoothness class specified by the true
function with probability one. This is a fairly restrictive assumption as it
essentially rules out the Gaussian process prior with a squared exponential
kernel when modeling rougher functions. In this article, we show that an
appropriate rescaling of the above Gaussian process leads to a rate-optimal
posterior distribution even when the covariates are independently realized from
a known density on a compact set. The proofs are based on deriving sharp
concentration inequalities for frequentist kernel estimators; the results might
be of independent interest.

Lung tumor tracking for radiotherapy requires real-time, multiple-step-ahead
forecasting of a quasi-periodic time series recording instantaneous tumor
locations. We introduce a location-mixture autoregressive (LMAR) process that
admits multimodal conditional distributions, fast approximate inference using
the EM algorithm and accurate multiple-step-ahead predictive distributions.
LMAR outperforms several commonly used methods in terms of out-of-sample
prediction accuracy using clinical data from lung tumor patients. With its
superior predictive performance and real-time computation, the LMAR model could
be effectively implemented for use in current tumor tracking systems.

Sparse Bayesian factor models are routinely implemented for parsimonious
dependence modeling and dimensionality reduction in high-dimensional
applications. We provide theoretical understanding of such Bayesian procedures
in terms of posterior convergence rates in inferring high-dimensional
covariance matrices where the dimension can be larger than the sample size.
Under relevant sparsity assumptions on the true covariance matrix, we show that
commonly used point mass mixture priors on the factor loadings lead to
consistent estimation in the operator norm even when $p\gg n$. One of our major
contributions is to develop a new class of continuous shrinkage priors and
provide insights into their concentration around sparse vectors. Using such
priors for the factor loadings, we obtain a similar rate of convergence as
obtained with point mass mixture priors. To obtain the convergence rates, we
construct test functions to separate points in the space of high-dimensional
covariance matrices using insights from random matrix theory; the tools
developed may be of independent interest. We also derive minimax rates and show
that the Bayesian posterior rates of convergence coincide with the minimax
rates up to a $\sqrt{\log n}$ term.

In nonparametric regression problems involving multiple predictors, there is
typically interest in estimating an anisotropic multivariate regression surface
in the important predictors while discarding the unimportant ones. Our focus is
on defining a Bayesian procedure that leads to the minimax optimal rate of
posterior contraction (up to a log factor) adapting to the unknown dimension
and anisotropic smoothness of the true surface. We propose such an approach
based on a Gaussian process prior with dimension-specific scalings, which are
assigned carefully chosen hyperpriors. We additionally show that using a
homogeneous Gaussian process with a single bandwidth leads to a suboptimal rate
in anisotropic cases.