
We investigate the optimization of two probabilistic generative models with
binary latent variables using a novel variational EM approach. The approach
distinguishes itself from previous variational approaches by using latent
states as variational parameters. Here we use efficient and general-purpose
sampling procedures to vary the latent states, and investigate the "black box"
applicability of the resulting optimization procedure. For general-purpose
applicability, samples are drawn from approximate marginal distributions of the
considered generative model as well as from the model's prior distribution. As
such, variational sampling is defined in a generic form, and is directly
executable for a given model. As a proof of concept, we then apply the novel
procedure (A) to Binary Sparse Coding (a model with continuous observables),
and (B) to basic Sigmoid Belief Networks (which are models with binary
observables). Numerical experiments verify that the investigated approach
efficiently as well as effectively increases a variational free energy
objective without requiring any additional analytical steps.

We propose a class of intrinsic Gaussian processes (inGPs) for
interpolation, regression and classification on manifolds with a primary focus
on complex constrained domains or irregularly shaped spaces arising as subsets or
submanifolds of $\mathbb{R}$, $\mathbb{R}^2$, $\mathbb{R}^3$ and beyond. For example, inGPs can accommodate
spatial domains arising as complex subsets of Euclidean space. inGPs respect
the potentially complex boundary or interior conditions as well as the
intrinsic geometry of the spaces. The key novelty of the proposed approach is
to utilise the relationship between heat kernels and the transition density of
Brownian motion on manifolds for constructing and approximating valid and
computationally feasible covariance kernels. This enables inGPs to be
practically applied in great generality, while existing approaches for
smoothing on constrained domains are limited to simple special cases. The broad
utility of the inGP approach is illustrated through simulation studies and
data examples.

Often in machine learning, data are collected as a combination of multiple
conditions, e.g., the voice recordings of multiple persons, each labeled with
an ID. How could we build a model that captures the latent information related
to these conditions and generalizes to a new one from little data? We present a
new model called Latent Variable Multiple Output Gaussian Processes (LVMOGP)
that jointly models multiple conditions for regression and generalizes to a new
condition with a few data points at test time. LVMOGP infers the
posteriors of Gaussian processes together with a latent space representing the
information about different conditions. We derive an efficient variational
inference method for LVMOGP whose computational complexity is as low as that of
sparse Gaussian processes. We show that LVMOGP significantly outperforms
related Gaussian process methods on various tasks with both synthetic and real
data.

Bayesian optimization (BO) has emerged during the last few years as an
effective approach to optimizing black-box functions where direct queries of
the objective are expensive. In this paper we consider the case where direct
access to the function is not possible, but information about user preferences
is. Such scenarios arise in problems where human preferences are modeled, such
as A/B tests or recommender systems. We present a new framework for this
scenario, which we call Preferential Bayesian Optimization (PBO). It allows us
to find the optimum of a latent function that can only be queried through
pairwise comparisons, the so-called duels. PBO extends the applicability of
standard BO ideas and generalizes previous discrete dueling approaches by
modeling the probability of the winner of each duel by means of a Gaussian
process model with a Bernoulli likelihood. The latent preference function is
used to define a family of acquisition functions that extend usual policies
used in BO. We illustrate the benefits of PBO in a variety of experiments,
showing that PBO needs drastically fewer comparisons for finding the optimum.
According to our experiments, the way of modeling correlations in PBO is key in
obtaining this advantage.
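
The duel model described above can be sketched in a few lines. This is only an illustration under stated assumptions: the abstract does not give the link function, so a logistic link on the difference of latent values is assumed here, lower latent values are assumed to be preferred, and the toy latent function and all names (`duel_win_probability`, `duel_log_likelihood`, `scores`) are hypothetical.

```python
import numpy as np

def duel_win_probability(f_x, f_x_prime):
    """P(x wins against x'), assuming a logistic link on f(x') - f(x).

    Assumption: lower latent values are preferred (minimisation).
    """
    return 1.0 / (1.0 + np.exp(-(f_x_prime - f_x)))

def duel_log_likelihood(f, duels, outcomes):
    """Bernoulli log-likelihood of duel outcomes given latent values f.

    duels[k] = (i, j) indexes the two contestants; outcomes[k] = 1 if i won.
    """
    p = duel_win_probability(f[duels[:, 0]], f[duels[:, 1]])
    return float(np.sum(outcomes * np.log(p) + (1.0 - outcomes) * np.log1p(-p)))

# Toy latent function on a grid (hypothetical; never observed directly in practice).
x = np.linspace(0.0, 1.0, 11)
f = (x - 0.3) ** 2

# Three duels with noiseless outcomes: the contestant with the lower f wins.
duels = np.array([[3, 0], [3, 10], [0, 10]])
outcomes = (f[duels[:, 0]] < f[duels[:, 1]]).astype(float)
ll = duel_log_likelihood(f, duels, outcomes)

# Average win probability of each point against every point on the grid.
scores = duel_win_probability(f[:, None], f[None, :]).mean(axis=1)
best_index = int(np.argmax(scores))
```

Under these assumptions the point with the highest average win probability coincides with the minimiser of the latent function, which is why a score built from pairwise comparisons alone can stand in for direct function evaluations.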

Quantitative modeling of the post-transcriptional regulation process is a
challenging problem in systems biology. A mechanical model of the regulatory
process needs to be able to describe the available spatiotemporal protein
concentration and mRNA expression data and recover the continuous
spatiotemporal fields. Rigorous methods are required to identify model
parameters. A promising approach to these difficulties is to use a Gaussian
process as a prior distribution over the latent function of protein
concentration and mRNA expression. In this study, we consider a partial
differential equation mechanical model with differential operators and a latent
function. Since the operators at stake are linear, the information from the
physical model can be encoded into the kernel function. Hybrid Monte Carlo
methods are employed to carry out Bayesian inference of the partial
differential equation parameters and Gaussian process kernel parameters. The
spatiotemporal fields of protein concentration and mRNA expression are
reconstructed without explicitly solving the partial differential equation.

We propose a nonparametric procedure to achieve fast inference in generative
graphical models when the number of latent states is very large. The approach
is based on iterative latent variable preselection, where we alternate between
learning a 'selection function' to reveal the relevant latent variables, and
using it to obtain a compact approximation of the posterior distribution for
EM. This can make inference possible where the number of possible latent states
is, e.g., exponential in the number of latent variables, whereas an exact
approach would be computationally infeasible. We learn the selection function
entirely from the observed data and current EM state via Gaussian process
regression. This is in contrast to earlier approaches, where selection
functions were manually designed for each problem setting. We show that our
approach performs as well as these bespoke selection functions on a wide
variety of inference problems: in particular, for the challenging case of a
hierarchical model for object localization with occlusion, we achieve results
that match a customized state-of-the-art selection method, at a far lower
computational cost.

Unsupervised learning on imbalanced data is challenging because, when given
imbalanced data, current models are often dominated by the majority category and
ignore the categories with small amounts of data. We develop a latent variable
model that can cope with imbalanced data by dividing the latent space into a
shared space and a private space. Based on Gaussian Process Latent Variable
Models, we propose a new kernel formulation that enables the separation of
the latent space, and we derive an efficient variational inference method. The
performance of our model is demonstrated with an imbalanced medical image
dataset.

We develop a scalable deep nonparametric generative model by augmenting deep
Gaussian processes with a recognition model. Inference is performed in a novel
scalable variational framework where the variational posterior distributions
are reparametrized through a multilayer perceptron. The key aspect of this
reformulation is that it prevents the proliferation of variational parameters,
which otherwise grow linearly with the sample size. We derive a new
formulation of the variational lower bound that allows us to distribute most of
the computation in a way that enables us to handle datasets of the size of
mainstream deep learning tasks. We show the efficacy of the method on a variety
of challenges including deep unsupervised learning and deep Bayesian
optimization.

We define Recurrent Gaussian Processes (RGP) models, a general family of
Bayesian nonparametric models with recurrent GP priors which are able to learn
dynamical patterns from sequential data. Similar to Recurrent Neural Networks
(RNNs), RGPs can have different formulations for their internal states,
distinct inference methods and be extended with deep structures. In this
context, we propose a novel deep RGP model whose autoregressive states are
latent, thereby performing representation and dynamical learning
simultaneously. To fully exploit the Bayesian nature of the RGP model we
develop the Recurrent Variational Bayes (REVARB) framework, which enables
efficient inference and strong regularization through coherent propagation of
uncertainty across the RGP layers and states. We also introduce an RGP extension
where variational parameters are greatly reduced by being reparametrized
through RNN-based sequential recognition models. We apply our model to the
tasks of nonlinear system identification and human motion modeling. The
promising results obtained indicate that our RGP model maintains its high
flexibility while avoiding overfitting and remaining applicable even
when larger datasets are not available.

The popularity of Bayesian optimization methods for efficient exploration of
parameter spaces has led to a series of papers applying Gaussian processes as
surrogates in the optimization of functions. However, most proposed approaches
only allow the exploration of the parameter space to occur sequentially. Often,
it is desirable to simultaneously propose batches of parameter values to
explore. This is particularly the case when large parallel processing
facilities are available. These facilities could be computational or physical
facets of the process being optimized. For example, in biological experiments many
experimental setups allow several samples to be processed simultaneously.
Batch methods, however, require modeling of the interaction between the
evaluations in the batch, which can be expensive in complex scenarios. We
investigate a simple heuristic based on an estimate of the Lipschitz constant
that captures the most important aspect of this interaction (i.e. local
repulsion) at negligible computational overhead. The resulting algorithm
compares well, in running time, with much more elaborate alternatives. The
approach assumes that the function of interest, $f$, is a Lipschitz continuous
function. A wrap-loop around the acquisition function is used to collect
batches of points of a certain size, minimizing the non-parallelizable
computational effort. The speedup of our method with respect to previous
approaches is significant in a set of computationally expensive experiments.
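
The wrap-loop with local repulsion can be sketched as follows. The abstract does not give the functional form of the heuristic, so everything below is an illustrative assumption: the erfc-shaped penaliser, the Lipschitz estimate taken as the largest gradient magnitude of a surrogate mean, the toy surrogate (`mu`, `sigma`), the acquisition `acq`, and the names `local_penalizer` and `select_batch`. Maximisation is assumed.

```python
import math
import numpy as np

_erfc = np.vectorize(math.erfc)  # element-wise complementary error function

def local_penalizer(x, x_j, mu_j, sigma_j, lipschitz, best):
    """Soft exclusion zone around x_j (assumed form, values in (0, 1)).

    Close to x_j the factor is small; far away it tends to 1, so the
    acquisition is only repelled locally.
    """
    z = (lipschitz * np.abs(x - x_j) - best + mu_j) / (math.sqrt(2.0) * sigma_j)
    return 0.5 * _erfc(-z)

def select_batch(x, acq, mu, sigma, batch_size):
    """Greedy wrap-loop: pick the maximiser, penalise around it, repeat."""
    lipschitz = float(np.max(np.abs(np.gradient(mu, x))))  # crude estimate of L
    best = float(np.max(mu))                               # estimate of max f
    penalized = acq.copy()
    batch = []
    for _ in range(batch_size):
        j = int(np.argmax(penalized))
        batch.append(j)
        penalized = penalized * local_penalizer(
            x, x[j], mu[j], sigma[j], lipschitz, best
        )
    return batch

# Toy 1-D surrogate on a grid (hypothetical stand-in for a fitted GP).
x = np.linspace(0.0, 1.0, 201)
mu = np.exp(-((x - 0.5) ** 2) / 0.02)
sigma = np.full_like(x, 0.1)
acq = mu + sigma  # positive acquisition, so multiplicative penalties are meaningful

batch = select_batch(x, acq, mu, sigma, batch_size=3)
```

Each new point costs only one multiplication of the acquisition by a cheap factor, with no model refit inside the loop, which is the sense in which the non-parallelizable effort stays small in this sketch.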

The Gaussian process latent variable model (GPLVM) is a popular approach to
nonlinear probabilistic dimensionality reduction. One design choice for the
model is the number of latent variables. We present a spike and slab prior for
the GPLVM and propose an efficient variational inference procedure that gives
a lower bound of the log marginal likelihood. The new model provides a more
principled approach for selecting latent dimensions than the standard way of
thresholding the lengthscale parameters. The effectiveness of our approach is
demonstrated through experiments on real and simulated data. Further, we extend
multi-view Gaussian processes that rely on sharing latent dimensions (known as
manifold relevance determination) with spike and slab priors. This allows a
more principled approach for selecting a subset of the latent space for each
view of data. The extended model outperforms the previous state-of-the-art when
applied to a cross-modal multimedia retrieval task.

In this work, we present an extension of Gaussian process (GP) models with
sophisticated parallelization and GPU acceleration. The parallelization scheme
arises naturally from the modular computational structure w.r.t. datapoints in
the sparse Gaussian process formulation. Additionally, the computational
bottleneck is implemented with GPU acceleration for further speed-up. Combining
both techniques allows applying Gaussian process models to millions of
datapoints. The efficiency of our algorithm is demonstrated with a synthetic
dataset. Its source code has been integrated into our popular software library
GPy.

We study the task of cleaning scanned text documents that are strongly
corrupted by dirt such as manual line strokes, spilled ink, etc. We aim at
autonomously removing dirt from a single letter-size page based only on the
information the page contains. Our approach, therefore, has to learn character
representations without supervision and requires a mechanism to distinguish
learned representations from irregular patterns. To learn character
representations, we use a probabilistic generative model parameterizing pattern
features, feature variances, the features' planar arrangements, and pattern
frequencies. The latent variables of the model describe pattern class, pattern
position, and the presence or absence of individual pattern features. The model
parameters are optimized using a novel variational EM approximation. After
learning, the parameters represent, independently of their absolute position,
planar feature arrangements and their variances. A quality measure defined
based on the learned representation then allows for an autonomous
discrimination between regular character patterns and the irregular patterns
making up the dirt. The irregular patterns can thus be removed to clean the
document. For a full Latin alphabet we found that a single page does not
contain sufficiently many character examples. However, we show that a page
containing a lower number of character types, even if heavily corrupted by
dirt, can be cleaned efficiently and autonomously based solely on the
structural regularity of the characters it contains. In different examples
using characters from different alphabets, we demonstrate the generality of the
approach and discuss its implications for future developments.

We present an alternative computational method for determining the
permitted LS spectral terms arising from $l^N$ electronic configurations. This
method makes the direct calculation of LS terms possible. Using only basic
algebra, we derive our theory from the LS-coupling scheme and the Pauli exclusion
principle. As an application, we have performed the most complete set of
calculations to date of the spectral terms arising from $l^N$ electronic
configurations, and representative results are shown. As another
application, to deducing LS-coupling rules, we deduce the famous Even Rule for
two equivalent electrons and derive a new simple rule for three equivalent
electrons.
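
For orientation, the permitted LS terms of a small $l^N$ configuration can be checked by brute force. The sketch below uses the textbook microstate-counting procedure, not the direct method of this work: it enumerates every Pauli-allowed occupation of spin-orbitals, tabulates $(M_L, M_S)$, and peels off terms from the highest $M_L$ downwards. The function names `ls_terms` and `term_symbol` are hypothetical, and spins are stored as $2S$ to keep the arithmetic in integers.

```python
from collections import Counter
from itertools import combinations

LETTERS = "SPDFGHIKLMNOQ"  # spectroscopic letters for L = 0, 1, 2, ... (J is skipped)

def ls_terms(l, n):
    """Return {(L, 2S): multiplicity} for n equivalent electrons of orbital l."""
    # A spin-orbital is (m_l, 2*m_s); choosing distinct ones enforces Pauli exclusion.
    orbitals = [(ml, ms2) for ml in range(-l, l + 1) for ms2 in (1, -1)]
    counts = Counter()
    for occupied in combinations(orbitals, n):
        total_ml = sum(ml for ml, _ in occupied)
        total_ms2 = sum(ms2 for _, ms2 in occupied)
        counts[(total_ml, total_ms2)] += 1

    terms = {}
    while any(c > 0 for c in counts.values()):
        # The highest remaining M_L, then the highest M_S at that M_L, fixes a term.
        big_l = max(ml for (ml, _), c in counts.items() if c > 0)
        s2 = max(ms2 for (ml, ms2), c in counts.items() if c > 0 and ml == big_l)
        mult = counts[(big_l, s2)]
        terms[(big_l, s2)] = mult
        # Remove every microstate belonging to this term.
        for ml in range(-big_l, big_l + 1):
            for ms2 in range(-s2, s2 + 1, 2):
                counts[(ml, ms2)] -= mult
    return terms

def term_symbol(big_l, s2):
    """Format (L, 2S) as a term symbol such as '3P' (multiplicity 2S+1, letter L)."""
    return f"{s2 + 1}{LETTERS[big_l]}"

p2 = sorted(term_symbol(L, s2) for (L, s2) in ls_terms(1, 2))
d2 = sorted(term_symbol(L, s2) for (L, s2) in ls_terms(2, 2))
p3 = sorted(term_symbol(L, s2) for (L, s2) in ls_terms(1, 3))
```

For two equivalent electrons every term produced this way has $L + S$ even, which is how the even rule is usually stated. The enumeration grows combinatorially with $N$, so it serves as a cross-check for small configurations rather than a substitute for a direct calculation.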