• We investigate the optimization of two probabilistic generative models with binary latent variables using a novel variational EM approach. The approach distinguishes itself from previous variational approaches by using latent states as variational parameters. Here we use efficient and general purpose sampling procedures to vary the latent states, and investigate the "black box" applicability of the resulting optimization procedure. For general purpose applicability, samples are drawn from approximate marginal distributions of the considered generative model as well as from the model's prior distribution. As such, variational sampling is defined in a generic form, and is directly executable for a given model. As a proof of concept, we then apply the novel procedure (A) to Binary Sparse Coding (a model with continuous observables), and (B) to basic Sigmoid Belief Networks (which are models with binary observables). Numerical experiments verify that the investigated approach efficiently as well as effectively increases a variational free energy objective without requiring any additional analytical steps.
  • We propose a class of intrinsic Gaussian processes (in-GPs) for interpolation, regression and classification on manifolds with a primary focus on complex constrained domains or irregular shaped spaces arising as subsets or submanifolds of R, R2, R3 and beyond. For example, in-GPs can accommodate spatial domains arising as complex subsets of Euclidean space. in-GPs respect the potentially complex boundary or interior conditions as well as the intrinsic geometry of the spaces. The key novelty of the proposed approach is to utilise the relationship between heat kernels and the transition density of Brownian motion on manifolds for constructing and approximating valid and computationally feasible covariance kernels. This enables in-GPs to be practically applied in great generality, while existing approaches for smoothing on constrained domains are limited to simple special cases. The broad utilities of the in-GP approach is illustrated through simulation studies and data examples.
  • Often in machine learning, data are collected as a combination of multiple conditions, e.g., the voice recordings of multiple persons, each labeled with an ID. How could we build a model that captures the latent information related to these conditions and generalize to a new one with few data? We present a new model called Latent Variable Multiple Output Gaussian Processes (LVMOGP) and that allows to jointly model multiple conditions for regression and generalize to a new condition with a few data points at test time. LVMOGP infers the posteriors of Gaussian processes together with a latent space representing the information about different conditions. We derive an efficient variational inference method for LVMOGP, of which the computational complexity is as low as sparse Gaussian processes. We show that LVMOGP significantly outperforms related Gaussian process methods on various tasks with both synthetic and real data.
  • Bayesian optimization (BO) has emerged during the last few years as an effective approach to optimizing black-box functions where direct queries of the objective are expensive. In this paper we consider the case where direct access to the function is not possible, but information about user preferences is. Such scenarios arise in problems where human preferences are modeled, such as A/B tests or recommender systems. We present a new framework for this scenario that we call Preferential Bayesian Optimization (PBO) which allows us to find the optimum of a latent function that can only be queried through pairwise comparisons, the so-called duels. PBO extends the applicability of standard BO ideas and generalizes previous discrete dueling approaches by modeling the probability of the winner of each duel by means of a Gaussian process model with a Bernoulli likelihood. The latent preference function is used to define a family of acquisition functions that extend usual policies used in BO. We illustrate the benefits of PBO in a variety of experiments, showing that PBO needs drastically fewer comparisons for finding the optimum. According to our experiments, the way of modeling correlations in PBO is key in obtaining this advantage.
  • Quantitative modeling of post-transcriptional regulation process is a challenging problem in systems biology. A mechanical model of the regulatory process needs to be able to describe the available spatio-temporal protein concentration and mRNA expression data and recover the continuous spatio-temporal fields. Rigorous methods are required to identify model parameters. A promising approach to deal with these difficulties is proposed using Gaussian process as a prior distribution over the latent function of protein concentration and mRNA expression. In this study, we consider a partial differential equation mechanical model with differential operators and latent function. Since the operators at stake are linear, the information from the physical model can be encoded into the kernel function. Hybrid Monte Carlo methods are employed to carry out Bayesian inference of the partial differential equation parameters and Gaussian process kernel parameters. The spatio-temporal field of protein concentration and mRNA expression are reconstructed without explicitly solving the partial differential equation.
  • We propose a nonparametric procedure to achieve fast inference in generative graphical models when the number of latent states is very large. The approach is based on iterative latent variable preselection, where we alternate between learning a 'selection function' to reveal the relevant latent variables, and use this to obtain a compact approximation of the posterior distribution for EM; this can make inference possible where the number of possible latent states is e.g. exponential in the number of latent variables, whereas an exact approach would be computationally unfeasible. We learn the selection function entirely from the observed data and current EM state via Gaussian process regression. This is by contrast with earlier approaches, where selection functions were manually-designed for each problem setting. We show that our approach performs as well as these bespoke selection functions on a wide variety of inference problems: in particular, for the challenging case of a hierarchical model for object localization with occlusion, we achieve results that match a customized state-of-the-art selection method, at a far lower computational cost.
  • Unsupervised learning on imbalanced data is challenging because, when given imbalanced data, current model is often dominated by the major category and ignores the categories with small amount of data. We develop a latent variable model that can cope with imbalanced data by dividing the latent space into a shared space and a private space. Based on Gaussian Process Latent Variable Models, we propose a new kernel formulation that enables the separation of latent space and derives an efficient variational inference method. The performance of our model is demonstrated with an imbalanced medical image dataset.
  • We develop a scalable deep non-parametric generative model by augmenting deep Gaussian processes with a recognition model. Inference is performed in a novel scalable variational framework where the variational posterior distributions are reparametrized through a multilayer perceptron. The key aspect of this reformulation is that it prevents the proliferation of variational parameters which otherwise grow linearly in proportion to the sample size. We derive a new formulation of the variational lower bound that allows us to distribute most of the computation in a way that enables to handle datasets of the size of mainstream deep learning tasks. We show the efficacy of the method on a variety of challenges including deep unsupervised learning and deep Bayesian optimization.
  • We define Recurrent Gaussian Processes (RGP) models, a general family of Bayesian nonparametric models with recurrent GP priors which are able to learn dynamical patterns from sequential data. Similar to Recurrent Neural Networks (RNNs), RGPs can have different formulations for their internal states, distinct inference methods and be extended with deep structures. In such context, we propose a novel deep RGP model whose autoregressive states are latent, thereby performing representation and dynamical learning simultaneously. To fully exploit the Bayesian nature of the RGP model we develop the Recurrent Variational Bayes (REVARB) framework, which enables efficient inference and strong regularization through coherent propagation of uncertainty across the RGP layers and states. We also introduce a RGP extension where variational parameters are greatly reduced by being reparametrized through RNN-based sequential recognition models. We apply our model to the tasks of nonlinear system identification and human motion modeling. The promising obtained results indicate that our RGP model maintains its highly flexibility while being able to avoid overfitting and being applicable even when larger datasets are not available.
  • The popularity of Bayesian optimization methods for efficient exploration of parameter spaces has lead to a series of papers applying Gaussian processes as surrogates in the optimization of functions. However, most proposed approaches only allow the exploration of the parameter space to occur sequentially. Often, it is desirable to simultaneously propose batches of parameter values to explore. This is particularly the case when large parallel processing facilities are available. These facilities could be computational or physical facets of the process being optimized. E.g. in biological experiments many experimental set ups allow several samples to be simultaneously processed. Batch methods, however, require modeling of the interaction between the evaluations in the batch, which can be expensive in complex scenarios. We investigate a simple heuristic based on an estimate of the Lipschitz constant that captures the most important aspect of this interaction (i.e. local repulsion) at negligible computational overhead. The resulting algorithm compares well, in running time, with much more elaborate alternatives. The approach assumes that the function of interest, $f$, is a Lipschitz continuous function. A wrap-loop around the acquisition function is used to collect batches of points of certain size minimizing the non-parallelizable computational effort. The speed-up of our method with respect to previous approaches is significant in a set of computationally expensive experiments.
  • The Gaussian process latent variable model (GP-LVM) is a popular approach to non-linear probabilistic dimensionality reduction. One design choice for the model is the number of latent variables. We present a spike and slab prior for the GP-LVM and propose an efficient variational inference procedure that gives a lower bound of the log marginal likelihood. The new model provides a more principled approach for selecting latent dimensions than the standard way of thresholding the length-scale parameters. The effectiveness of our approach is demonstrated through experiments on real and simulated data. Further, we extend multi-view Gaussian processes that rely on sharing latent dimensions (known as manifold relevance determination) with spike and slab priors. This allows a more principled approach for selecting a subset of the latent space for each view of data. The extended model outperforms the previous state-of-the-art when applied to a cross-modal multimedia retrieval task.
  • In this work, we present an extension of Gaussian process (GP) models with sophisticated parallelization and GPU acceleration. The parallelization scheme arises naturally from the modular computational structure w.r.t. datapoints in the sparse Gaussian process formulation. Additionally, the computational bottleneck is implemented with GPU acceleration for further speed up. Combining both techniques allows applying Gaussian process models to millions of datapoints. The efficiency of our algorithm is demonstrated with a synthetic dataset. Its source code has been integrated into our popular software library GPy.
  • We study the task of cleaning scanned text documents that are strongly corrupted by dirt such as manual line strokes, spilled ink etc. We aim at autonomously removing dirt from a single letter-size page based only on the information the page contains. Our approach, therefore, has to learn character representations without supervision and requires a mechanism to distinguish learned representations from irregular patterns. To learn character representations, we use a probabilistic generative model parameterizing pattern features, feature variances, the features' planar arrangements, and pattern frequencies. The latent variables of the model describe pattern class, pattern position, and the presence or absence of individual pattern features. The model parameters are optimized using a novel variational EM approximation. After learning, the parameters represent, independently of their absolute position, planar feature arrangements and their variances. A quality measure defined based on the learned representation then allows for an autonomous discrimination between regular character patterns and the irregular patterns making up the dirt. The irregular patterns can thus be removed to clean the document. For a full Latin alphabet we found that a single page does not contain sufficiently many character examples. However, even if heavily corrupted by dirt, we show that a page containing a lower number of character types can efficiently and autonomously be cleaned solely based on the structural regularity of the characters it contains. In different examples using characters from different alphabets, we demonstrate generality of the approach and discuss its implications for future developments.
  • We presented an alternative computational method for determining the permitted LS spectral terms arising from $l^N$ electronic configurations. This method makes the direct calculation of LS terms possible. Using only basic algebra, we derived our theory from LS-coupling scheme and Pauli exclusion principle. As an application, we have performed the most complete set of calculations to date of the spectral terms arising from $l^N$ electronic configurations, and the representative results were shown. As another application on deducing LS-coupling rules, for two equivalent electrons, we deduced the famous Even Rule; for three equivalent electrons, we derived a new simple rule.