
Sampling from posterior distributions using Markov chain Monte Carlo (MCMC)
methods can require an exhaustive number of iterations, particularly when the
posterior is multimodal as the MCMC sampler can become trapped in a local mode
for a large number of iterations. In this paper, we introduce the
pseudoextended MCMC method as a simple approach for improving the mixing of
the MCMC sampler for multimodal posterior distributions. The pseudoextended
method augments the statespace of the posterior using pseudosamples as
auxiliary variables. On the extended space, the modes of the posterior are
connected, which allows the MCMC sampler to easily move between wellseparated
posterior modes. We demonstrate that the pseudoextended approach delivers
improved MCMC sampling over the Hamiltonian Monte Carlo algorithm on
multimodal posteriors, including Boltzmann machines and models with
sparsityinducing priors.

Gaussian process modulated Poisson processes provide a flexible framework for
modelling spatiotemporal point patterns. So far this had been restricted to one
dimension, binning to a predetermined grid, or small data sets of up to a few
thousand data points. Here we introduce Cox process inference based on Fourier
features. This sparse representation induces global rather than local
constraints on the function space and is computationally efficient. This allows
us to formulate a gridfree approximation that scales well with the number of
data points and the size of the domain. We demonstrate that this allows MCMC
approximations to the nonGaussian posterior. We also find that, in practice,
Fourier features have more consistent optimization behavior than previous
approaches. Our approximate Bayesian method can fit over 100,000 events with
complex spatiotemporal patterns in three dimensions on a single GPU.

The natural gradient method has been used effectively in conjugate Gaussian
process models, but the nonconjugate case has been largely unexplored. We
examine how natural gradients can be used in nonconjugate stochastic settings,
together with hyperparameter learning. We conclude that the natural gradient
can significantly improve performance in terms of wallclock time. For
illconditioned posteriors the benefit of the natural gradient method is
especially pronounced, and we demonstrate a practical setting where ordinary
gradients are unusable. We show how natural gradients can be computed
efficiently and automatically in any parameterization, using automatic
differentiation. Our code is integrated into the GPflow package.

We present a practical way of introducing convolutional structure into
Gaussian processes, making them more suited to highdimensional inputs like
images. The main contribution of our work is the construction of an
interdomain inducing point approximation that is welltailored to the
convolutional kernel. This allows us to gain the generalisation benefit of a
convolutional kernel, together with fast but accurate posterior inference. We
investigate several variations of the convolutional kernel, and apply it to
MNIST and CIFAR10, which have both been known to be challenging for Gaussian
processes. We also show how the marginal likelihood can be used to find an
optimal weighting between convolutional and RBF kernels to further improve
performance. We hope that this illustration of the usefulness of a marginal
likelihood will help automate discovering architectures in larger models.

Missing data and noisy observations pose significant challenges for reliably
predicting events from irregularly sampled multivariate time series
(longitudinal) data. Imputation methods, which are typically used for
completing the data prior to event prediction, lack a principled mechanism to
account for the uncertainty due to missingness. Alternatively, stateoftheart
joint modeling techniques can be used for jointly modeling the longitudinal and
event data and compute event probabilities conditioned on the longitudinal
observations. These approaches, however, make strong parametric assumptions and
do not easily scale to multivariate signals with many observations. Our
proposed approach consists of several key innovations. First, we develop a
flexible and scalable joint model based upon sparse multipleoutput Gaussian
processes. Unlike stateoftheart joint models, the proposed model can explain
highly challenging structure including nonGaussian noise while scaling to
large data. Second, we derive an optimal policy for predicting events using the
distribution of the event occurrence estimated by the joint model. The derived
policy tradesoff the cost of a delayed detection versus incorrect assessments
and abstains from making decisions when the estimated event probability does
not satisfy the derived confidence criteria. Experiments on a large dataset
show that the proposed framework significantly outperforms stateoftheart
techniques in event prediction.

The Gaussian process state space model (GPSSM) is a nonlinear dynamical
system, where unknown transition and/or measurement mappings are described by
GPs. Most research in GPSSMs has focussed on the state estimation problem.
However, the key challenge in GPSSMs has not been satisfactorily addressed yet:
system identification. To address this challenge, we impose a structured
Gaussian variational posterior distribution over the latent states, which is
parameterised by a recognition model in the form of a bidirectional recurrent
neural network. Inference with this structure allows us to recover a posterior
smoothed over the entire sequence(s) of data. We provide a practical algorithm
for efficiently computing a lower bound on the marginal likelihood using the
reparameterisation trick. This additionally allows arbitrary kernels to be used
within the GPSSM. We demonstrate that we can efficiently generate plausible
future trajectories of the system we seek to model with the GPSSM, requiring
only a small number of interactions with the true system.

This work brings together two powerful concepts in Gaussian processes: the
variational approach to sparse approximation and the spectral representation of
Gaussian processes. This gives rise to an approximation that inherits the
benefits of the variational approach but with the representational power and
computational scalability of spectral representations. The work hinges on a key
result that there exist spectral features related to a finite domain of the
Gaussian process which exhibit almostindependent covariances. We derive these
expressions for Matern kernels in one dimension, and generalize to more
dimensions using kernels with specific structures. Under the assumption of
additive Gaussian noise, our method requires only a single pass through the
dataset, making for very fast and accurate computation. We fit a model to 4
million training points in just a few minutes on a standard laptop. With
nonconjugate likelihoods, our MCMC scheme reduces the cost of computation from
O(NM2) (for a sparse Gaussian process) to O(NM) per iteration, where N is the
number of data and M is the number of features.

GPflow is a Gaussian process library that uses TensorFlow for its core
computations and Python for its front end. The distinguishing features of
GPflow are that it uses variational inference as the primary approximation
method, provides concise code through the use of automatic differentiation, has
been engineered with a particular emphasis on software testing and is able to
exploit GPU hardware.

We consider the problem of detecting and quantifying the periodic component
of a function given noisecorrupted observations of a limited number of
input/output tuples. Our approach is based on Gaussian process regression which
provides a flexible nonparametric framework for modelling periodic data. We
introduce a novel decomposition of the covariance function as the sum of
periodic and aperiodic kernels. This decomposition allows for the creation of
submodels which capture the periodic nature of the signal and its complement.
To quantify the periodicity of the signal, we derive a periodicity ratio which
reflects the uncertainty in the fitted submodels. Although the method can be
applied to many kernels, we give a special emphasis to the Mat\'ern family,
from the expression of the reproducing kernel Hilbert space inner product to
the implementation of the associated periodic kernels in a Gaussian process
toolkit. The proposed method is illustrated by considering the detection of
periodically expressed genes in the arabidopsis genome.

Gaussian process models are flexible, Bayesian nonparametric approaches to
regression. Properties of multivariate Gaussians mean that they can be combined
linearly in the manner of additive models and via a link function (like in
generalized linear models) to handle nonGaussian data. However, the link
function formalism is restrictive, link functions are always invertible and
must convert a parameter of interest to a linear combination of the underlying
processes. There are many likelihoods and models where a nonlinear combination
is more appropriate. We term these more general models Chained Gaussian
Processes: the transformation of the GPs to the likelihood parameters will not
generally be invertible, and that implies that linearisation would only be
possible with multiple (localized) links, i.e. a chain. We develop an
approximate inference procedure for Chained GPs that is scalable and applicable
to any factorized likelihood. We demonstrate the approximation on a range of
likelihood functions.

The variational framework for learning inducing variables (Titsias, 2009a)
has had a large impact on the Gaussian process literature. The framework may be
interpreted as minimizing a rigorously defined KullbackLeibler divergence
between the approximating and posterior processes. To our knowledge this
connection has thus far gone unremarked in the literature. In this paper we
give a substantial generalization of the literature on this topic. We give a
new proof of the result for infinite index sets which allows inducing points
that are not data points and likelihoods that depend on all function values. We
then discuss augmented index sets and show that, contrary to previous works,
marginal consistency of augmentation is not enough to guarantee consistency of
variational inference with the original model. We then characterize an extra
condition where such a guarantee is obtainable. Finally we show how our
framework sheds light on interdomain sparse approximations and sparse
approximations for Cox processes.

This paper is due to appear as a chapter of the forthcoming Handbook of
Approximate Bayesian Computation (ABC) by S. Sisson, L. Fan, and M. Beaumont.
We describe the challenge of calibrating climate simulators, and discuss the
differences in emphasis in climate science compared to many of the more
traditional ABC application areas. The primary difficulty is how to do
inference with a computationally expensive simulator which we can only afford
to run a small number of times, and we describe how Gaussian process emulators
are used as surrogate models in this case. We introduce the idea of history
matching, which is a nonprobabilistic calibration method, which divides the
parameter space into (not im)plausible and implausible regions. History
matching can be shown to be a special case of ABC, but with a greater emphasis
on defining realistic simulator discrepancy bounds, and using these to define
tolerances and metrics. We describe a design approach for choosing parameter
values at which to run the simulator, and illustrate the approach on a toy
climate model, showing that with careful design we can find the plausible
region with a very small number of model evaluations. Finally, we describe how
calibrated GENIE1 (an earth system model of intermediate complexity)
predictions have been used, and why it is important to accurately characterise
parametric uncertainty.

Motivation: Assigning RNAseq reads to their transcript of origin is a
fundamental task in transcript expression estimation. Where ambiguities in
assignments exist due to transcripts sharing sequence, e.g. alternative
isoforms or alleles, the problem can be solved through probabilistic inference.
Bayesian methods have been shown to provide accurate transcript abundance
estimates compared to competing methods. However, exact Bayesian inference is
intractable and approximate methods such as Markov chain Monte Carlo (MCMC) and
Variational Bayes (VB) are typically used. While providing a high degree of
accuracy and modelling flexibility, standard implementations can be
prohibitively slow for large datasets and complex transcriptome annotations.
Results: We propose a novel approximate inference scheme based on VB and
apply it to an existing model of transcript expression inference from RNAseq
data. Recent advances in VB algorithmics are used to improve the convergence of
the algorithm beyond the standard Variational Bayes Expectation Maximisation
(VBEM) algorithm. We apply our algorithm to simulated and biological datasets,
demonstrating a significant increase in speed with only very small loss in
accuracy of expression level estimation. We carry out a comparative study
against seven popular alternative methods and demonstrate that our new
algorithm provides excellent accuracy and interreplicate consistency while
remaining competitive in computation time.
Availability: The methods were implemented in R and C++, and are available as
part of the BitSeq project at \url{https://github.com/BitSeq}. The method is
also available through the BitSeq Bioconductor package. The source code to
reproduce all simulation results can be accessed via
\url{https://github.com/BitSeq/BitSeqVB_benchmarking}.

Gaussian process (GP) models form a core part of probabilistic machine
learning. Considerable research effort has been made into attacking three
issues with GP models: how to compute efficiently when the number of data is
large; how to approximate the posterior when the likelihood is not Gaussian and
how to estimate covariance function parameter posteriors. This paper
simultaneously addresses these, using a variational approximation to the
posterior which is sparse in support of the function but otherwise freeform.
The result is a Hybrid MonteCarlo sampling scheme which allows for a
nonGaussian approximation over the function values and covariance parameters
simultaneously, with efficient computations based on inducingpoint sparse GPs.
Code to replicate each experiment in this paper will be available shortly.

The Gaussian process latent variable model (GPLVM) is a popular approach to
nonlinear probabilistic dimensionality reduction. One design choice for the
model is the number of latent variables. We present a spike and slab prior for
the GPLVM and propose an efficient variational inference procedure that gives
a lower bound of the log marginal likelihood. The new model provides a more
principled approach for selecting latent dimensions than the standard way of
thresholding the lengthscale parameters. The effectiveness of our approach is
demonstrated through experiments on real and simulated data. Further, we extend
multiview Gaussian processes that rely on sharing latent dimensions (known as
manifold relevance determination) with spike and slab priors. This allows a
more principled approach for selecting a subset of the latent space for each
view of data. The extended model outperforms the previous stateoftheart when
applied to a crossmodal multimedia retrieval task.

Motivation: The mapping of RNAseq reads to their transcripts of origin is a
fundamental task in transcript expression estimation and differential
expression scoring. Where ambiguities in mapping exist due to transcripts
sharing sequence, e.g. alternative isoforms or alleles, the problem becomes an
instance of nontrivial probabilistic inference. Bayesian inference in such a
problem is intractable and approximate methods must be used such as Markov
chain Monte Carlo (MCMC) and Variational Bayes. Standard implementations of
these methods can be prohibitively slow for large datasets and complex gene
models.
Results: We propose an approximate inference scheme based on Variational
Bayes applied to an existing model of transcript expression inference from
RNAseq data. We apply recent advances in Variational Bayes algorithmics to
improve the convergence of the algorithm beyond the standard variational
expectationmaximisation approach. We apply our algorithm to simulated and
biological datasets, demonstrating that the increase in speed requires only a
small tradeoff in accuracy of expression level estimation.
Availability: The methods were implemented in R and C++, and are available as
part of the BitSeq project at https://code.google.com/p/bitseq/. The methods
will be made available through the BitSeq Bioconductor package at the next
stable release.

Deep Gaussian processes provide a flexible approach to probabilistic
modelling of data using either supervised or unsupervised learning. For
tractable inference approximations to the marginal likelihood of the model must
be made. The original approach to approximate inference in these models used
variational compression to allow for approximate variational marginalization of
the hidden variables leading to a lower bound on the marginal likelihood of the
model [Damianou and Lawrence, 2013]. In this paper we extend this idea with a
nested variational compression. The resulting lower bound on the likelihood can
be easily parallelized or adapted for stochastic variational inference.

Gaussian process classification is a popular method with a number of
appealing properties. We show how to scale the model within a variational
inducing point framework, outperforming the state of the art on benchmark
datasets. Importantly, the variational formulation can be exploited to allow
classification in problems with millions of data points, as we demonstrate in
experiments.

In this work, we present an extension of Gaussian process (GP) models with
sophisticated parallelization and GPU acceleration. The parallelization scheme
arises naturally from the modular computational structure w.r.t. datapoints in
the sparse Gaussian process formulation. Additionally, the computational
bottleneck is implemented with GPU acceleration for further speed up. Combining
both techniques allows applying Gaussian process models to millions of
datapoints. The efficiency of our algorithm is demonstrated with a synthetic
dataset. Its source code has been integrated into our popular software library
GPy.

In this publication, we combine two Bayesian nonparametric models: the
Gaussian Process (GP) and the Dirichlet Process (DP). Our innovation in the GP
model is to introduce a variation on the GP prior which enables us to model
structured timeseries data, i.e. data containing groups where we wish to model
inter and intragroup variability. Our innovation in the DP model is an
implementation of a new fast collapsed variational inference procedure which
enables us to optimize our variationala pproximation significantly faster than
standard VB approaches. In a biological time series application we show how our
model better captures salient features of the data, leading to better
consistency with existing biological classifications, while the associated
inference algorithm provides a twofold speedup over EMbased variational
inference.

We introduce stochastic variational inference for Gaussian process models.
This enables the application of Gaussian process (GP) models to data sets
containing millions of data points. We show how GPs can be vari ationally
decomposed to depend on a set of globally relevant inducing variables which
factorize the model in the necessary manner to perform variational inference.
Our ap proach is readily extended to models with nonGaussian likelihoods and
latent variable models based around Gaussian processes. We demonstrate the
approach on a simple toy problem and two real world data sets.

We present a general method for deriving collapsed variational inference
algo rithms for probabilistic models in the conjugate exponential family. Our
method unifies many existing approaches to collapsed variational inference. Our
collapsed variational inference leads to a new lower bound on the marginal
likelihood. We exploit the information geometry of the bound to derive much
faster optimization methods based on conjugate gradients for these models. Our
approach is very general and is easily applied to any model where the mean
field update equations have been derived. Empirically we show significant
speedups for probabilistic models optimized using our bound.