
This article considers variational approximations of the posterior
distribution in a high-dimensional state space model. The variational
approximation is a multivariate Gaussian density, in which the variational
parameters to be optimized are a mean vector and a covariance matrix. The
number of parameters in the covariance matrix grows as the square of the number
of model parameters, so it is necessary to find simple yet effective
parametrizations of the covariance structure when the number of model
parameters is large. The joint posterior distribution over the high-dimensional
state vectors is approximated using a dynamic factor model, with Markovian
dependence in time and a factor covariance structure for the states. This gives
a reduced dimension description of the dependence structure for the states, as
well as a temporal conditional independence structure similar to that in the
true posterior. We illustrate our approach in two high-dimensional applications
which are challenging for Markov chain Monte Carlo sampling. The first is a
spatiotemporal model for the spread of the Eurasian Collared-Dove across North
America. The second is a multivariate stochastic volatility model for financial
returns via a Wishart process.

We consider the problem of learning a Gaussian variational approximation to
the posterior distribution for a high-dimensional parameter, where we impose
sparsity in the precision matrix to reflect appropriate conditional
independence structure in the model. Incorporating sparsity in the precision
matrix allows the Gaussian variational distribution to be both flexible and
parsimonious, and the sparsity is achieved through parameterization in terms of
the Cholesky factor. Efficient stochastic gradient methods which make
appropriate use of gradient information for the target distribution are
developed for the optimization. We consider alternative estimators of the
stochastic gradients which have lower variation and are more stable. Our
approach is illustrated using generalized linear mixed models and state space
models for time series.
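
The Cholesky-based parameterization described above can be illustrated with a
minimal sketch. It assumes the precision matrix is written as T T^T with T
lower triangular; the function name, the bidiagonal example and the dense
solver are illustrative choices, not the authors' implementation. For a banded
T the precision matrix T T^T is banded too, which encodes conditional
independence between components that are far apart.

```python
import numpy as np

def sample_sparse_precision_gaussian(mu, T, n_samples, rng):
    """Draw samples from N(mu, (T T^T)^{-1}) with T lower triangular.

    If z ~ N(0, I), then theta = mu + T^{-T} z has precision matrix T T^T.
    """
    dim = mu.shape[0]
    z = rng.standard_normal((dim, n_samples))
    # Solve T^T x = z. A dense solver is used for brevity; a banded or
    # triangular solver would exploit the sparsity of T.
    x = np.linalg.solve(T.T, z)
    return mu[:, None] + x

# Hypothetical example: a lower-bidiagonal T gives a tridiagonal precision.
d = 5
T = np.eye(d) + np.diag(-0.5 * np.ones(d - 1), k=-1)
omega = T @ T.T  # precision matrix implied by T
```

In this example T has 9 nonzero entries, compared with 15 for a dense lower
triangular factor of the same dimension, and the saving grows with the
dimension.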

It can be important in Bayesian analyses of complex models to construct
informative prior distributions which reflect knowledge external to the data at
hand. Nevertheless, how much prior information an analyst is able to elicit
from an expert for use in prior construction will be limited for practical
reasons, with checks for model adequacy and prior-data conflict an essential
part of the process of model building and sensitivity analysis. This paper
develops effective numerical methods for exploring reasonable choices of a
prior distribution from a parametric class, when prior information is specified
in the form of some limited constraints on prior predictive distributions, and
where these prior predictive distributions are analytically intractable. The
methods developed may be thought of as a novel application of the ideas of
history matching, a technique developed in the literature on assessment of
computer models. We illustrate the approach in the context of logistic
regression and sparse signal shrinkage prior distributions for high-dimensional
linear models.

Variational approximation methods have proven to be useful for scaling
Bayesian computations to large data sets and highly parametrized models.
Applying variational methods involves solving an optimization problem, and
recent research in this area has focused on stochastic gradient ascent methods
as a general approach to implementation. Here variational approximation is
considered for a posterior distribution in high dimensions using a Gaussian
approximating family. Gaussian variational approximation with an unrestricted
covariance matrix can be computationally burdensome in many problems because
the number of elements in the covariance matrix increases quadratically with
the dimension of the model parameter. To circumvent this problem,
low-dimensional factor covariance structures are considered. General stochastic
gradient approaches to efficiently perform the optimization are described, with
gradient estimates obtained using the so-called "reparametrization trick". The
end result is a flexible and efficient approach to high-dimensional Gaussian
variational approximation, which we illustrate using eight real datasets.
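
A minimal sketch of the reparametrization trick with a factor covariance
structure follows. It assumes the covariance takes the form B B^T + diag(d^2),
with B a p x k loading matrix and d a vector of p standard deviations; this
specific form, the function name and the example sizes are assumptions made
for illustration rather than details stated in the abstract.

```python
import numpy as np

def sample_factor_gaussian(mu, B, d, n_samples, rng):
    """Draw theta = mu + B z + d * eps with z ~ N(0, I_k), eps ~ N(0, I_p).

    The draws have mean mu and covariance B B^T + diag(d**2). Because theta is
    a deterministic function of (mu, B, d) given the noise (z, eps), gradients
    with respect to the variational parameters can pass through the sample.
    """
    p, k = B.shape
    z = rng.standard_normal((n_samples, k))
    eps = rng.standard_normal((n_samples, p))
    return mu + z @ B.T + eps * d

# Parameter counts for a hypothetical problem with p = 50 and k = 3 factors.
p, k = 50, 3
n_full = p * (p + 1) // 2   # unrestricted covariance: 1275 free elements
n_factor = p * k + p        # loadings B plus diagonal d: 200 elements
```

The count for the factor form ignores any identifiability constraints on B, so
it is an upper bound on the number of free parameters.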

When using complex Bayesian models to combine information, checking the
consistency of the information being combined is good statistical practice.
Here a new method is developed for detecting prior-data conflicts in Bayesian
models based on comparing the observed value of a prior-to-posterior divergence
to its distribution under the prior predictive distribution for the data. The
divergence measure used in our model check is a measure of how much beliefs
have changed from prior to posterior, and can be thought of as a measure of the
overall size of a relative belief function. It is shown that the proposed
method is intuitive, has desirable properties, can be extended to hierarchical
settings, and is related asymptotically to Jeffreys' and reference prior
distributions. In the case where calculations are difficult, the use of
variational approximations as a way of relieving the computational burden is
suggested. The methods are compared in a number of examples with an alternative
but closely related approach in the literature based on the prior predictive
distribution of a minimal sufficient statistic.
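
The check described above can be sketched in a case where everything is
available in closed form. The example assumes a normal model with known
standard deviation sigma, a N(m0, s0^2) prior on the mean, and the
Kullback-Leibler divergence as the prior-to-posterior divergence; the model,
the divergence and the function names are illustrative assumptions, not the
settings used in the paper.

```python
import numpy as np

def kl_posterior_to_prior(ybar, n, sigma, m0, s0):
    """KL divergence from the N(m0, s0^2) prior to the normal posterior."""
    sn2 = 1.0 / (1.0 / s0**2 + n / sigma**2)           # posterior variance
    mn = sn2 * (m0 / s0**2 + n * ybar / sigma**2)      # posterior mean
    return np.log(s0 / np.sqrt(sn2)) + (sn2 + (mn - m0) ** 2) / (2 * s0**2) - 0.5

def conflict_pvalue(ybar_obs, n, sigma, m0, s0, n_rep, rng):
    """Prior predictive tail probability of the observed divergence."""
    # Simulate the sample mean from its prior predictive distribution.
    theta = rng.normal(m0, s0, size=n_rep)
    ybar_rep = rng.normal(theta, sigma / np.sqrt(n))
    kl_rep = kl_posterior_to_prior(ybar_rep, n, sigma, m0, s0)
    kl_obs = kl_posterior_to_prior(ybar_obs, n, sigma, m0, s0)
    # A small value means the data moved beliefs more than the prior expects.
    return np.mean(kl_rep >= kl_obs)
```

A sample mean close to the prior mean gives a tail probability near one,
whereas a sample mean many prior predictive standard deviations away gives a
value near zero, signalling a conflict.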

Synthetic likelihood is an attractive approach to likelihood-free inference
when an approximately Gaussian summary statistic for the data, informative for
inference about the parameters, is available. The synthetic likelihood method
derives an approximate likelihood function from a plug-in normal density
estimate for the summary statistic, with plug-in mean and covariance matrix
obtained by Monte Carlo simulation from the model. In this article, we develop
alternatives to Markov chain Monte Carlo implementations of Bayesian synthetic
likelihoods with reduced computational overheads. Our approach uses stochastic
gradient variational inference methods for posterior approximation in the
synthetic likelihood context, employing unbiased estimates of the log
likelihood. We compare the new method with a related likelihood-free
variational inference technique in the literature, while at the same time
improving the implementation of that approach in a number of ways. These new
algorithms are feasible to implement in situations which are challenging for
conventional approximate Bayesian computation (ABC) methods, in terms of the
dimensionality of the parameter and summary statistic.
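
The plug-in construction described above can be sketched as follows. This is
the basic plug-in estimate of the synthetic log likelihood, not the unbiased
estimator that the article employs, and the toy simulator (twenty normal
observations summarized by their mean and log variance) is a hypothetical
stand-in for a real model.

```python
import numpy as np

def simulate_summaries(theta, n_sim, rng):
    """Hypothetical simulator: summaries of 20 draws from N(theta, 1)."""
    data = theta + rng.standard_normal((n_sim, 20))
    return np.column_stack([data.mean(axis=1), np.log(data.var(axis=1, ddof=1))])

def synthetic_loglik(theta, s_obs, simulator, n_sim, rng):
    """Plug-in Gaussian log density of the observed summary statistic."""
    S = simulator(theta, n_sim, rng)                 # shape (n_sim, d)
    mu_hat = S.mean(axis=0)                          # plug-in mean
    sigma_hat = np.atleast_2d(np.cov(S, rowvar=False))  # plug-in covariance
    diff = s_obs - mu_hat
    _, logdet = np.linalg.slogdet(sigma_hat)
    quad = diff @ np.linalg.solve(sigma_hat, diff)
    d = s_obs.shape[0]
    return -0.5 * (d * np.log(2 * np.pi) + logdet + quad)
```

Each evaluation requires n_sim fresh model simulations, which is the overhead
that makes cheaper alternatives to Markov chain Monte Carlo attractive here.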

Variational Bayes (VB) is rapidly becoming a popular tool for Bayesian
inference in statistical modeling. However, the existing VB algorithms are
restricted to cases where the likelihood is tractable, which precludes the use
of VB in many interesting situations such as in state space models and in
approximate Bayesian computation (ABC), where application of VB methods was
previously impossible. This paper extends the scope of application of VB to
cases where the likelihood is intractable, but can be estimated unbiasedly. The
proposed VB method therefore makes it possible to carry out Bayesian inference
in many statistical applications, including state space models and ABC. The
method is generic in the sense that it can be applied to almost all statistical
models without requiring too much model-based derivation, which is a drawback
of many existing VB algorithms. We also show how the proposed method can be
used to obtain highly accurate VB approximations of marginal posterior
distributions.

Approximate Bayesian computation (ABC) refers to a family of inference
methods used in the Bayesian analysis of complex models where evaluation of the
likelihood is difficult. Conventional ABC methods often suffer from the curse
of dimensionality, and a marginal adjustment strategy was recently introduced
in the literature to improve the performance of ABC algorithms in
high-dimensional problems. The marginal adjustment approach is extended using a
Gaussian copula approximation. The method first estimates the bivariate
posterior for each pair of parameters separately using a two-dimensional Gaussian
copula, and then combines these estimates together to estimate the joint
posterior. The approximation works well in large sample settings when the
posterior is approximately normal, but also works well in many cases which are
far from that situation due to the nonparametric estimation of the marginal
posterior distributions. If each bivariate posterior distribution can be well
estimated with a low-dimensional ABC analysis then this Gaussian copula method
can extend ABC methods to problems of high dimension. The method also results
in an analytic expression for the approximate posterior which is useful for
many purposes such as approximation of the likelihood itself. This method is
illustrated with several examples.
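
The step of combining pairwise estimates into a joint dependence structure can
be sketched as below. The sketch assumes that each pair of parameters comes
with its own set of draws (standing in for the output of a separate
low-dimensional ABC analysis), converts each margin to normal scores through
its ranks, and fills one entry of the copula correlation matrix per pair. The
function names and the rank-based estimator are illustrative assumptions, and
a matrix assembled this way is not guaranteed to be positive definite.

```python
from statistics import NormalDist

import numpy as np

def normal_scores(x):
    """Map a sample to normal scores through its empirical ranks."""
    n = x.size
    ranks = np.argsort(np.argsort(x)) + 1      # ranks 1..n, assuming no ties
    inv_cdf = NormalDist().inv_cdf
    return np.array([inv_cdf(r / (n + 1)) for r in ranks])

def pairwise_copula_correlation(samples):
    """Assemble a correlation matrix from one bivariate estimate per pair.

    samples[(i, j)] is an (n, 2) array of draws for parameters i < j.
    """
    dim = 1 + max(j for _, j in samples)
    corr = np.eye(dim)
    for (i, j), draws in samples.items():
        zi = normal_scores(draws[:, 0])
        zj = normal_scores(draws[:, 1])
        corr[i, j] = corr[j, i] = np.corrcoef(zi, zj)[0, 1]
    return corr
```

Because only ranks enter the correlation estimate, monotone transformations of
individual parameters leave the estimated copula unchanged, which is why the
margins can be handled separately by nonparametric estimates.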

Flexible regression methods where interest centres on the way that the whole
distribution of a response vector changes with covariates are very useful in
some applications. A recently developed technique in this regard uses the
matrix-variate Dirichlet process as a prior for a mixing distribution on a
coefficient in a multivariate linear regression model. The method is
attractive, particularly in the multivariate setting, for the convenient way
that it allows for borrowing strength across different component regressions
and for its computational simplicity and tractability. The purpose of the
present article is to develop fast online variational Bayes approaches to
fitting this model and to investigate how they perform compared to MCMC and
batch variational methods in a number of scenarios.

Sliced Sudoku-based space-filling designs and, more generally, quasi-sliced
orthogonal array-based space-filling designs are useful experimental designs in
several contexts, including computer experiments with categorical in addition
to quantitative inputs and cross-validation. Here, we provide a straightforward
construction of doubly orthogonal quasi-Sudoku Latin squares which can be used
to generate sliced space-filling designs which achieve uniformity in one- and
two-dimensional projections for both the full design and each slice. A
construction of quasi-sliced orthogonal arrays based on these constructed
doubly orthogonal quasi-Sudoku Latin squares is also provided and can, in turn,
be used to generate sliced space-filling designs which achieve uniformity in
one- and two-dimensional projections for the full design and uniformity in
two-dimensional projections for each slice. These constructions are very
practical to implement and yield a spectrum of design sizes and numbers of
factors not currently broadly available.

We develop a fast variational approximation scheme for Gaussian process (GP)
regression, where the spectrum of the covariance function is subjected to a
sparse approximation. Our approach enables uncertainty in covariance function
hyperparameters to be treated without using Monte Carlo methods and is robust
to overfitting. Our article makes three contributions. First, we present a
variational Bayes algorithm for fitting sparse spectrum GP regression models
that uses nonconjugate variational message passing to derive fast and efficient
updates. Second, we propose a novel adaptive neighbourhood technique for
obtaining predictive inference that is effective in dealing with
nonstationarity. Regression is performed locally at each point to be predicted
and the neighbourhood is determined using a measure defined based on
lengthscales estimated from an initial fit. Weighting dimensions according to
lengthscales, this downweights variables of little relevance, leading to
automatic variable selection and improved prediction. Third, we introduce a
technique for accelerating convergence in nonconjugate variational message
passing by adapting step sizes in the direction of the natural gradient of the
lower bound. Our adaptive strategy can be easily implemented and empirical
results indicate significant speedups.
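
The idea of a sparse approximation to the spectrum of a covariance function
can be sketched with a finite set of trigonometric basis functions. The sketch
assumes a squared exponential covariance with unit variance and a single
lengthscale, and draws the spectral frequencies at random; how the spectral
points are chosen or learned in the article is not reproduced here, and all
names are illustrative.

```python
import numpy as np

def sparse_spectrum_features(X, freqs):
    """Map inputs to 2m trigonometric features; Phi @ Phi.T approximates K."""
    proj = X @ freqs.T                               # shape (n, m)
    m = freqs.shape[0]
    return np.hstack([np.cos(proj), np.sin(proj)]) / np.sqrt(m)

def squared_exponential(X, lengthscale):
    """Exact squared exponential covariance matrix with unit variance."""
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    return np.exp(-0.5 * sq / lengthscale**2)

# Hypothetical usage: m frequencies drawn from the spectral density, which
# for this covariance function is Gaussian with standard deviation 1/lengthscale.
rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(8, 2))
lengthscale = 0.8
freqs = rng.standard_normal((2000, 2)) / lengthscale
Phi = sparse_spectrum_features(X, freqs)
K_approx = Phi @ Phi.T                               # compare with squared_exponential
```

With the features in hand, GP regression reduces to Bayesian linear regression
on 2m basis functions, so the cost scales with m rather than cubically in the
number of observations.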

We propose a novel Bayesian nonparametric method for hierarchical modelling
on a set of related density functions, where grouped data in the form of
samples from each density function are available. Borrowing strength across the
groups is a major challenge in this context. To address this problem, we
introduce a hierarchically structured prior, defined over a set of univariate
density functions, using convenient transformations of Gaussian processes.
Inference is performed through approximate Bayesian computation (ABC), via a
novel functional regression adjustment. The performance of the proposed method
is illustrated via a simulation study and an analysis of rural high school exam
performance in Brazil.

In stochastic variational inference, the variational Bayes objective function
is optimized using stochastic gradient approximation, where gradients computed
on small random subsets of data are used to approximate the true gradient over
the whole data set. This enables complex models to be fit to large data sets as
data can be processed in minibatches. In this article, we extend stochastic
variational inference for conjugate-exponential models to nonconjugate models
and present a stochastic nonconjugate variational message passing algorithm for
fitting generalized linear mixed models that is scalable to large data sets. In
addition, we show that diagnostics for prior-likelihood conflict, which are
useful for Bayesian model criticism, can be obtained from nonconjugate
variational message passing automatically, as an alternative to
simulation-based Markov chain Monte Carlo methods. Finally, we demonstrate that
for moderate-sized data sets, convergence can be accelerated by using the
stochastic version of nonconjugate variational message passing in the initial
stage of optimization before switching to the standard version.
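
The minibatch gradient approximation described at the start of this abstract
can be sketched for a simple case. The example assumes a Gaussian linear
regression log likelihood with unit noise variance, chosen only because its
gradient is easy to write down; it is not the message passing algorithm of the
article, and the function names are illustrative.

```python
import numpy as np

def full_gradient(beta, X, y):
    """Gradient of the Gaussian log likelihood over the whole data set."""
    return X.T @ (y - X @ beta)

def minibatch_gradient(beta, X, y, batch_size, rng):
    """Unbiased estimate of full_gradient from a random subset of rows."""
    n = X.shape[0]
    idx = rng.choice(n, size=batch_size, replace=False)
    # Rescale by n / batch_size so the expectation equals the full gradient.
    return (n / batch_size) * (X[idx].T @ (y[idx] - X[idx] @ beta))
```

Each call touches only batch_size rows of the data, and averaging over the
random choice of rows recovers the full gradient, which is what justifies
using the estimate inside a stochastic gradient optimizer.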

The effects of different parametrizations on the convergence of Bayesian
computational algorithms for hierarchical models are well explored. Techniques
such as centering, noncentering and partial noncentering can be used to
accelerate convergence in MCMC and EM algorithms but are still not well studied
for variational Bayes (VB) methods. As a fast deterministic approach to
posterior approximation, VB is attracting increasing interest due to its
suitability for large high-dimensional data. Use of different parametrizations
for VB has not only computational but also statistical implications, as
different parametrizations are associated with different factorized posterior
approximations. We examine the use of partially noncentered parametrizations in
VB for generalized linear mixed models (GLMMs). Our paper makes four
contributions. First, we show how to implement an algorithm called nonconjugate
variational message passing for GLMMs. Second, we show that the partially
noncentered parametrization can adapt to the quantity of information in the
data and determine a parametrization close to optimal. Third, we show that
partial noncentering can accelerate convergence and produce more accurate
posterior approximations than centering or noncentering. Finally, we
demonstrate how the variational lower bound, produced as part of the
computation, can be useful for model selection.

Mixtures of linear mixed models (MLMMs) are useful for clustering grouped
data and can be estimated by likelihood maximization through the EM algorithm.
The conventional approach to determining a suitable number of components is to
compare different mixture models using penalized log-likelihood criteria such
as BIC. We propose fitting MLMMs with variational methods which can perform
parameter estimation and model selection simultaneously. A variational
approximation is described where the variational lower bound and parameter
updates are in closed form, allowing fast evaluation. A new variational greedy
algorithm is developed for model selection and learning of the mixture
components. This approach allows an automatic initialization of the algorithm
and returns a plausible number of mixture components automatically. In cases of
weak identifiability of certain model parameters, we use hierarchical centering
to reparametrize the model and show empirically that there is a gain in
efficiency by variational algorithms similar to that in MCMC algorithms.
Related to this, we prove that the approximate rate of convergence of
variational algorithms by Gaussian approximation is equal to that of the
corresponding Gibbs sampler which suggests that reparametrizations can lead to
improved convergence in variational algorithms as well.

Modern statistical applications involving large data sets have focused
attention on statistical methodologies which are both efficient computationally
and able to deal with the screening of large numbers of different candidate
models. Here we consider computationally efficient variational Bayes approaches
to inference in high-dimensional heteroscedastic linear regression, where both
the mean and variance are described in terms of linear functions of the
predictors and where the number of predictors can be larger than the sample
size. We derive a closed form variational lower bound on the log marginal
likelihood useful for model selection, and propose a novel fast greedy search
algorithm on the model space which makes use of one step optimization updates
to the variational lower bound in the current model for screening large numbers
of candidate predictor variables for inclusion/exclusion in a computationally
thrifty way. We show that the model search strategy we suggest is related to
widely used orthogonal matching pursuit algorithms for model search but yields
a framework for potentially extending these algorithms to more complex models.
The methodology is applied in simulations and in two real examples involving
prediction for food constituents using NIR technology and prediction of disease
progression in diabetes.
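
For orientation, the orthogonal matching pursuit algorithm to which the search
strategy above is related can be sketched as follows. This is the standard
least squares version for a homoscedastic linear model, not the variational
greedy search proposed in the article, and the function name is illustrative.

```python
import numpy as np

def orthogonal_matching_pursuit(X, y, n_select):
    """Greedily select n_select columns of X to explain y."""
    norms = np.linalg.norm(X, axis=0)
    active = []
    coef = np.zeros(0)
    residual = y.copy()
    for _ in range(n_select):
        # Score each candidate by its normalized correlation with the residual.
        scores = np.abs(X.T @ residual) / norms
        if active:
            scores[active] = -np.inf          # never reselect a predictor
        active.append(int(np.argmax(scores)))
        # Refit by least squares on the active set and update the residual.
        coef, *_ = np.linalg.lstsq(X[:, active], y, rcond=None)
        residual = y - X[:, active] @ coef
    return active, coef
```

Each step screens every candidate predictor with a single matrix-vector
product, which keeps the search cheap even when the number of predictors
exceeds the sample size.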

Model selection is an important activity in modern data analysis and the
conventional Bayesian approach to this problem involves calculation of marginal
likelihoods for different models, together with diagnostics which examine
specific aspects of model fit. Calculating the marginal likelihood is a
difficult computational problem. Our article proposes some extensions of the
Laplace approximation for this task that are related to copula models and which
are easy to apply. Variations which can be used both with and without
simulation from the posterior distribution are considered, as well as use of
the approximations with bridge sampling and in random effects models with a
large number of latent variables. The use of a t-copula to obtain higher
accuracy when multivariate dependence is not well captured by a Gaussian copula
is also discussed.
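
As background, the basic Laplace approximation to the log marginal likelihood,
which the article extends, can be sketched in a model where the exact answer
is known. The example assumes Poisson counts with a Gamma(a, b) prior on the
rate (shape a, rate b); the model, the data and the function names are
illustrative assumptions, and the copula-based extensions are not shown.

```python
import math

import numpy as np

def log_joint(lam, y, a, b):
    """Log of likelihood times prior at rate lam."""
    s, n = y.sum(), y.size
    log_lik = s * math.log(lam) - n * lam - sum(math.lgamma(v + 1) for v in y)
    log_prior = a * math.log(b) - math.lgamma(a) + (a - 1) * math.log(lam) - b * lam
    return log_lik + log_prior

def laplace_log_marginal(y, a, b):
    """Laplace approximation: Gaussian integral around the posterior mode."""
    s, n = y.sum(), y.size
    lam_hat = (s + a - 1) / (n + b)           # posterior mode
    neg_hess = (s + a - 1) / lam_hat**2       # minus d^2/dlam^2 of the log joint
    return (log_joint(lam_hat, y, a, b)
            + 0.5 * math.log(2 * math.pi) - 0.5 * math.log(neg_hess))

def exact_log_marginal(y, a, b):
    """Closed-form log marginal likelihood for the conjugate model."""
    s, n = y.sum(), y.size
    return (a * math.log(b) - math.lgamma(a) - sum(math.lgamma(v + 1) for v in y)
            + math.lgamma(s + a) - (s + a) * math.log(n + b))

# Hypothetical data: ten counts with a Gamma(2, 1) prior on the rate.
y = np.array([3, 5, 4, 6, 2, 4, 5, 3, 4, 4])
approx = laplace_log_marginal(y, 2.0, 1.0)
exact = exact_log_marginal(y, 2.0, 1.0)
```

In this conjugate example the two values differ only by the error of
Stirling's approximation to a gamma function, so the gap shrinks as the total
count grows.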