
There is a widespread need for techniques that can discover structure from
time series data. Recently introduced techniques such as Automatic Bayesian
Covariance Discovery (ABCD) provide a way to find structure within a single
time series by searching through a space of covariance kernels that is
generated using a simple grammar. While ABCD can identify a broad class of
temporal patterns, it is difficult to extend and can be brittle in practice.
This paper shows how to extend ABCD by formulating it in terms of probabilistic
program synthesis. The key technical ideas are to (i) represent models using
abstract syntax trees for a domainspecific probabilistic language, and (ii)
represent the time series model prior, likelihood, and search strategy using
probabilistic programs in a sufficiently expressive language. The final
probabilistic program is written in under 70 lines of probabilistic code in
Venture. The paper demonstrates an application to time series clustering that
involves a nonparametric extension to ABCD, experiments for interpolation and
extrapolation on realworld econometric data, and improvements in accuracy over
both nonparametric and standard regression baselines.

Databases are widespread, yet extracting relevant data can be difficult.
Without substantial domain knowledge, multivariate search queries often return
sparse or uninformative results. This paper introduces an approach for
searching structured data based on probabilistic programming and nonparametric
Bayes. Users specify queries in a probabilistic language that combines standard
SQL database search operators with an information theoretic ranking function
called predictive relevance. Predictive relevance can be calculated by a fast
sparse matrix algorithm based on posterior samples from CrossCat, a
nonparametric Bayesian model for highdimensional, heterogeneouslytyped data
tables. The result is a flexible search technique that applies to a broad class
of information retrieval problems, which we integrate into BayesDB, a
probabilistic programming platform for probabilistic data analysis. This paper
demonstrates applications to databases of US colleges, global macroeconomic
indicators of public health, and classic cars. We found that human evaluators
often prefer the results from probabilistic search to results from a standard
baseline.

Datasets with hundreds of variables and many missing values are commonplace.
In this setting, it is both statistically and computationally challenging to
detect true predictive relationships between variables and also to suppress
false positives. This paper proposes an approach that combines probabilistic
programming, information theory, and nonparametric Bayes. It shows how to use
Bayesian nonparametric modeling to (i) build an ensemble of joint probability
models for all the variables; (ii) efficiently detect marginal independencies;
and (iii) estimate the conditional mutual information between arbitrary subsets
of variables, subject to a broad class of constraints. Users can access these
capabilities using BayesDB, a probabilistic programming platform for
probabilistic data analysis, by writing queries in a simple, SQLlike language.
This paper demonstrates empirically that the method can (i) detect
contextspecific (in)dependencies on challenging synthetic problems and (ii)
yield improved sensitivity and specificity over baselines from statistics and
machine learning, on a realworld database of over 300 sparsely observed
indicators of macroeconomic development and public health.

Probabilistic techniques are central to data analysis, but different
approaches can be difficult to apply, combine, and compare. This paper
introduces composable generative population models (CGPMs), a computational
abstraction that extends directed graphical models and can be used to describe
and compose a broad class of probabilistic data analysis techniques. Examples
include hierarchical Bayesian models, multivariate kernel methods,
discriminative machine learning, clustering algorithms, dimensionality
reduction, and arbitrary probabilistic programs. We also demonstrate the
integration of CGPMs into BayesDB, a probabilistic programming platform that
can express data analysis tasks using a modeling language and a structured
query language. The practical value is illustrated in two ways. First, CGPMs
are used in an analysis that identifies satellite data records which probably
violate Kepler's Third Law, by composing causal probabilistic programs with
nonparametric Bayes in under 50 lines of probabilistic code. Second, for
several representative data analysis tasks, we report on lines of code and
accuracy measurements of various CGPMs, plus comparisons with standard baseline
solutions from Python and MATLAB libraries.

Is it possible to make statistical inference broadly accessible to
nonstatisticians without sacrificing mathematical rigor or inference quality?
This paper describes BayesDB, a probabilistic programming platform that aims to
enable users to query the probable implications of their data as directly as
SQL databases enable them to query the data itself. This paper focuses on four
aspects of BayesDB: (i) BQL, an SQLlike query language for Bayesian data
analysis, that answers queries by averaging over an implicit space of
probabilistic models; (ii) techniques for implementing BQL using a broad class
of multivariate probabilistic models; (iii) a semiparametric Bayesian
modelbuilder that auomatically builds ensembles of factorial mixture models to
serve as baselines; and (iv) MML, a "metamodeling" language for imposing
qualitative constraints on the modelbuilder and combining baseline models with
custom algorithmic and statistical models that can be implemented in external
software. BayesDB is illustrated using three applications: cleaning and
exploring a public database of Earth satellites; assessing the evidence for
temporal dependence between macroeconomic indicators; and analyzing a salary
survey.

Approximate inference in highdimensional, discrete probabilistic models is a
central problem in computational statistics and machine learning. This paper
describes discrete particle variational inference (DPVI), a new approach that
combines key strengths of Monte Carlo, variational and searchbased techniques.
DPVI is based on a novel family of particlebased variational approximations
that can be fit using simple, fast, deterministic search techniques. Like Monte
Carlo, DPVI can handle multiple modes, and yields exact results in a
welldefined limit. Like unstructured meanfield, DPVI is based on optimizing a
lower bound on the partition function; when this quantity is not of intrinsic
interest, it facilitates convergence assessment and debugging. Like both Monte
Carlo and combinatorial search, DPVI can take advantage of factorization,
sequential structure, and custom search operators. This paper defines DPVI
particlebased approximation family and partition function lower bounds, along
with the sequential DPVI and local DPVI algorithm templates for optimizing
them. DPVI is illustrated and evaluated via experiments on lattice Markov
Random Fields, nonparametric Bayesian mixtures and blockmodels, and parametric
as well as nonparametric hidden Markov models. Results include applications to
realworld spikesorting and relational modeling problems, and show that DPVI
can offer appealing time/accuracy tradeoffs as compared to multiple
alternatives.

There is a widespread need for statistical methods that can analyze
highdimensional datasets with out imposing restrictive or opaque modeling
assumptions. This paper describes a domaingeneral data analysis method called
CrossCat. CrossCat infers multiple nonoverlapping views of the data, each
consisting of a subset of the variables, and uses a separate nonparametric
mixture to model each view. CrossCat is based on approximately Bayesian
inference in a hierarchical, nonparamet ric model for data tables. This model
consists of a Dirichlet process mixture over the columns of a data table in
which each mixture component is itself an independent Dirichlet process mixture
over the rows; the inner mixture components are simple parametric models whose
form depends on the types of data in the table. CrossCat combines strengths of
mixture modeling and Bayesian net work structure learning. Like mixture
modeling, CrossCat can model a broad class of distributions by positing latent
variables, and produces representations that can be efficiently conditioned and
sampled from for prediction. Like Bayesian networks, CrossCat represents the
dependencies and independencies between variables, and thus remains accurate
when there are multiple statistical signals. Inference is done via a scalable
Gibbs sampling scheme; this paper shows that it works well in practice. This
paper also includes empirical results on heterogeneous tabular data of up to 10
million cells, such as hospital cost and quality measures, voting records,
unemployment rates, gene expression measurements, and images of handwritten
digits. CrossCat infers structure that is consistent with accepted findings and
commonsense knowledge in multiple domains and yields predictive accuracy
competitive with generative, discriminative, and modelfree alternatives.

We introduce and demonstrate a new approach to inference in expressive
probabilistic programming languages based on particle Markov chain Monte Carlo.
Our approach is simple to implement and easy to parallelize. It applies to
Turingcomplete probabilistic programming languages and supports accurate
inference in models that make use of complex control flow, including stochastic
recursion. It also includes primitives from Bayesian nonparametric statistics.
Our experiments show that this approach can be more efficient than previously
introduced singlesite MetropolisHastings methods.

Models of complex systems are often formalized as sequential software
simulators: computationally intensive programs that iteratively build up
probable system configurations given parameters and initial conditions. These
simulators enable modelers to capture effects that are difficult to
characterize analytically or summarize statistically. However, in many
realworld applications, these simulations need to be inverted to match the
observed data. This typically requires the custom design, derivation and
implementation of sophisticated inversion algorithms. Here we give a framework
for inverting a broad class of complex software simulators via probabilistic
programming and automatic inference, using under 20 lines of probabilistic
code. Our approach is based on a formulation of inversion as approximate
inference in a simple sequential probabilistic model. We implement four
inference strategies, including MetropolisHastings, a sequentialized
MetropolisHastings scheme, and a particle Markov chain Monte Carlo scheme,
requiring 4 or fewer lines of probabilistic code each. We demonstrate our
framework by applying it to invert a real geological software simulator from
the oil and gas industry.

Probabilistic programming languages can simplify the development of machine
learning techniques, but only if inference is sufficiently scalable.
Unfortunately, Bayesian parameter estimation for highly coupled models such as
regressions and statespace models still scales poorly; each MCMC transition
takes linear time in the number of observations. This paper describes a
sublineartime algorithm for making MetropolisHastings (MH) updates to latent
variables in probabilistic programs. The approach generalizes recently
introduced approximate MH techniques: instead of subsampling data items assumed
to be independent, it subsamples edges in a dynamically constructed graphical
model. It thus applies to a broader class of problems and interoperates with
other generalpurpose inference techniques. Empirical results, including
confirmation of sublinear pertransition scaling, are presented for Bayesian
logistic regression, nonlinear classification via joint Dirichlet process
mixtures, and parameter estimation for stochastic volatility models (with state
estimation via particle MCMC). All three applications use the same
implementation, and each requires under 20 lines of probabilistic code.

Particle Markov chain Monte Carlo techniques rank among current
stateoftheart methods for probabilistic program inference. A drawback of
these techniques is that they rely on importance resampling, which results in
degenerate particle trajectories and a low effective sample size for variables
sampled early in a program. We here develop a formalism to adapt ancestor
resampling, a technique that mitigates particle degeneracy, to the
probabilistic programming setting. We present empirical results that
demonstrate nontrivial performance gains.

We introduce Church, a universal language for describing stochastic
generative processes. Church is based on the Lisp model of lambda calculus,
containing a pure Lisp as its deterministic subset. The semantics of Church is
defined in terms of evaluation histories and conditional distributions on such
histories. Church also includes a novel language construct, the stochastic
memoizer, which enables simple description of many complex nonparametric
models. We illustrate language features through several examples, including: a
generalized Bayes net in which parameters cluster over trials, infinite PCFGs,
planning by inference, and various nonparametric clustering models. Finally,
we show how to implement query on any Church program, exactly and
approximately, using Monte Carlo techniques.

We describe Venture, an interactive virtual machine for probabilistic
programming that aims to be sufficiently expressive, extensible, and efficient
for generalpurpose use. Like Church, probabilistic models and inference
problems in Venture are specified via a Turingcomplete, higherorder
probabilistic language descended from Lisp. Unlike Church, Venture also
provides a compositional language for custom inference strategies built out of
scalable exact and approximate techniques. We also describe four key aspects of
Venture's implementation that build on ideas from probabilistic graphical
models. First, we describe the stochastic procedure interface (SPI) that
specifies and encapsulates primitive random variables. The SPI supports custom
control flow, higherorder probabilistic procedures, partially exchangeable
sequences and ``likelihoodfree'' stochastic simulators. It also supports
external models that do inference over latent variables hidden from Venture.
Second, we describe probabilistic execution traces (PETs), which represent
execution histories of Venture programs. PETs capture conditional dependencies,
existential dependencies and exchangeable coupling. Third, we describe
partitions of execution histories called scaffolds that factor global inference
problems into coherent subproblems. Finally, we describe a family of
stochastic regeneration algorithms for efficiently modifying PET fragments
contained within scaffolds. Stochastic regeneration linear runtime scaling in
cases where many previous approaches scaled quadratically. We show how to use
stochastic regeneration and the SPI to implement generalpurpose inference
strategies such as MetropolisHastings, Gibbs sampling, and blocked proposals
based on particle Markov chain Monte Carlo and meanfield variational inference
techniques.

The brain interprets ambiguous sensory information faster and more reliably
than modern computers, using neurons that are slower and less reliable than
logic gates. But Bayesian inference, which underpins many computational models
of perception and cognition, appears computationally challenging even given
modern transistor speeds and energy budgets. The computational principles and
structures needed to narrow this gap are unknown. Here we show how to build
fast Bayesian computing machines using intentionally stochastic, digital parts,
narrowing this efficiency gap by multiple orders of magnitude. We find that by
connecting stochastic digital components according to simple mathematical
rules, one can build massively parallel, low precision circuits that solve
Bayesian inference problems and are compatible with the Poisson firing
statistics of cortical neurons. We evaluate circuits for depth and motion
perception, perceptual learning and causal reasoning, each performing inference
over 10,000+ latent variables in real time  a 1,000x speed advantage over
commodity microprocessors. These results suggest a new role for randomness in
the engineering and reverseengineering of intelligent computation.

Traditional approaches to Bayes net structure learning typically assume
little regularity in graph structure other than sparseness. However, in many
cases, we expect more systematicity: variables in realworld systems often
group into classes that predict the kinds of probabilistic dependencies they
participate in. Here we capture this form of prior knowledge in a hierarchical
Bayesian framework, and exploit it to enable structure learning and type
discovery from small datasets. Specifically, we present a nonparametric
generative model for directed acyclic graphs as a prior for Bayes net structure
learning. Our model assumes that variables come in one or more classes and that
the prior probability of an edge existing between two variables is a function
only of their classes. We derive an MCMC algorithm for simultaneous inference
of the number of classes, the class assignments of variables, and the Bayes net
structure over variables. For several realistic, sparse datasets, we show that
the bias towards systematicity of connections provided by our model yields more
accurate learned networks than a traditional, uniform prior approach, and that
the classes found by our model are appropriate.