• We investigate the low-dimensional structure of deterministic transformations between random variables, i.e., transport maps between probability measures. In the context of statistics and machine learning, these transformations can be used to couple a tractable "reference" measure (e.g., a standard Gaussian) with a target measure of interest. Direct simulation from the desired measure can then be achieved by pushing forward reference samples through the map. Yet characterizing such a map---e.g., representing and evaluating it---grows challenging in high dimensions. The central contribution of this paper is to establish a link between the Markov properties of the target measure and the existence of low-dimensional couplings, induced by transport maps that are sparse and/or decomposable. Our analysis not only facilitates the construction of transformations in high-dimensional settings, but also suggests new inference methodologies for continuous non-Gaussian graphical models. For instance, in the context of nonlinear state-space models, we describe new variational algorithms for filtering, smoothing, and sequential parameter inference. These algorithms can be understood as the natural generalization---to the non-Gaussian case---of the square-root Rauch-Tung-Striebel Gaussian smoother.
  • We propose a gradient-based method for detecting and exploiting low-dimensional input parameter dependence of multivariate functions. The methodology consists in minimizing an upper bound, obtained by Poincar\'e-type inequalities, on the approximation error. The resulting method can be used to approximate vector-valued functions (e.g., functions taking values in $\mathbb{R}^n$ or functions taking values in function spaces) and generalizes the notion of active subspaces associated with scalar-valued functions. A comparison with the truncated Karhunen-Lo\`eve decomposition shows that using gradients of the function can yield more effective dimension reduction. Numerical examples reveal that the choice of norm on the codomain of the function can have a significant impact on the function's low-dimensional approximation.
  • Performing Bayesian inference via Markov chain Monte Carlo (MCMC) can be exceedingly expensive when posterior evaluations invoke the evaluation of a computationally expensive model, such as a system of partial differential equations. In recent work [Conrad et al. JASA 2016, arXiv:1402.1694], we described a framework for constructing and refining local approximations of such models during an MCMC simulation. These posterior--adapted approximations harness regularity of the model to reduce the computational cost of inference while preserving asymptotic exactness of the Markov chain. Here we describe two extensions of that work. First, we prove that samplers running in parallel can collaboratively construct a shared posterior approximation while ensuring ergodicity of each associated chain, providing a novel opportunity for exploiting parallel computation in MCMC. Second, focusing on the Metropolis--adjusted Langevin algorithm, we describe how a proposal distribution can successfully employ gradients and other relevant information extracted from the approximation. We investigate the practical performance of our strategies using two challenging inference problems, the first in subsurface hydrology and the second in glaciology. Using local approximations constructed via parallel chains, we successfully reduce the run time needed to characterize the posterior distributions in these problems from days to hours and from months to days, respectively, dramatically improving the tractability of Bayesian inference.
  • We introduce a new framework for efficient sampling from complex probability distributions, using a combination of optimal transport maps and the Metropolis-Hastings rule. The core idea is to use continuous transportation to transform typical Metropolis proposal mechanisms (e.g., random walks, Langevin methods) into non-Gaussian proposal distributions that can more effectively explore the target density. Our approach adaptively constructs a lower triangular transport map-an approximation of the Knothe-Rosenblatt rearrangement-using information from previous MCMC states, via the solution of an optimization problem. This optimization problem is convex regardless of the form of the target distribution. It is solved efficiently using a Newton method that requires no gradient information from the target probability distribution; the target distribution is instead represented via samples. Sequential updates enable efficient and parallelizable adaptation of the map even for large numbers of samples. We show that this approach uses inexact or truncated maps to produce an adaptive MCMC algorithm that is ergodic for the exact target distribution. Numerical demonstrations on a range of parameter inference problems show order-of-magnitude speedups over standard MCMC techniques, measured by the number of effectively independent samples produced per target density evaluation and per unit of wallclock time.
  • In this article we develop a new sequential Monte Carlo (SMC) method for multilevel (ML) Monte Carlo estimation. In particular, the method can be used to estimate expectations with respect to a target probability distribution over an infinite-dimensional and non-compact space as given, for example, by a Bayesian inverse problem with Gaussian random field prior. Under suitable assumptions the MLSMC method has the optimal $O(\epsilon^{-2})$ bound on the cost to obtain a mean-square error of $O(\epsilon^2)$. The algorithm is accelerated by dimension-independent likelihood-informed (DILI) proposals designed for Gaussian priors, leveraging a novel variation which uses empirical sample covariance information in lieu of Hessian information, hence eliminating the requirement for gradient evaluations. The efficiency of the algorithm is illustrated on two examples: inversion of noisy pressure measurements in a PDE model of Darcy flow to recover the posterior distribution of the permeability field, and inversion of noisy measurements of the solution of an SDE to recover the posterior path measure.
  • We propose optimal dimensionality reduction techniques for the solution of goal-oriented linear-Gaussian inverse problems, where the quantity of interest (QoI) is a function of the inversion parameters. These approximations are suitable for large-scale applications. In particular, we study the approximation of the posterior covariance of the QoI as a low-rank negative update of its prior covariance, and prove optimality of this update with respect to the natural geodesic distance on the manifold of symmetric positive definite matrices. Assuming exact knowledge of the posterior mean of the QoI, the optimality results extend to optimality in distribution with respect to the Kullback-Leibler divergence and the Hellinger distance between the associated distributions. We also propose approximation of the posterior mean of the QoI as a low-rank linear function of the data, and prove optimality of this approximation with respect to a weighted Bayes risk. Both of these optimal approximations avoid the explicit computation of the full posterior distribution of the parameters and instead focus on directions that are well informed by the data and relevant to the QoI. These directions stem from a balance among all the components of the goal-oriented inverse problem: prior information, forward model, measurement noise, and ultimate goals. We illustrate the theory using a high-dimensional inverse problem in heat transfer.
  • A priori dimension reduction is a widely adopted technique for reducing the computational complexity of stationary inverse problems. In this setting, the solution of an inverse problem is parameterized by a low-dimensional basis that is often obtained from the truncated Karhunen-Loeve expansion of the prior distribution. For high-dimensional inverse problems equipped with smoothing priors, this technique can lead to drastic reductions in parameter dimension and significant computational savings. In this paper, we extend the concept of a priori dimension reduction to non-stationary inverse problems, in which the goal is to sequentially infer the state of a dynamical system. Our approach proceeds in an offline-online fashion. We first identify a low-dimensional subspace in the state space before solving the inverse problem (the offline phase), using either the method of "snapshots" or regularized covariance estimation. Then this subspace is used to reduce the computational complexity of various filtering algorithms - including the Kalman filter, extended Kalman filter, and ensemble Kalman filter - within a novel subspace-constrained Bayesian prediction-and-update procedure (the online phase). We demonstrate the performance of our new dimension reduction approach on various numerical examples. In some test cases, our approach reduces the dimensionality of the original problem by orders of magnitude and yields up to two orders of magnitude in computational savings.
  • In many inverse problems, model parameters cannot be precisely determined from observational data. Bayesian inference provides a mechanism for capturing the resulting parameter uncertainty, but typically at a high computational cost. This work introduces a multiscale decomposition that exploits conditional independence across scales, when present in certain classes of inverse problems, to decouple Bayesian inference into two stages: (1) a computationally tractable coarse-scale inference problem; and (2) a mapping of the low-dimensional coarse-scale posterior distribution into the original high-dimensional parameter space. This decomposition relies on a characterization of the non-Gaussian joint distribution of coarse- and fine-scale quantities via optimal transport maps. We demonstrate our approach on a sequence of inverse problems arising in subsurface flow, using the multiscale finite element method to discretize the steady state pressure equation. We compare the multiscale strategy with full-dimensional Markov chain Monte Carlo on a problem of moderate dimension (100 parameters) and then use it to infer a conductivity field described by over 10,000 parameters.
  • We present the fundamentals of a measure transport approach to sampling. The idea is to construct a deterministic coupling---i.e., a transport map---between a complex "target" probability measure of interest and a simpler reference measure. Given a transport map, one can generate arbitrarily many independent and unweighted samples from the target simply by pushing forward reference samples through the map. We consider two different and complementary scenarios: first, when only evaluations of the unnormalized target density are available, and second, when the target distribution is known only through a finite collection of samples. We show that in both settings the desired transports can be characterized as the solutions of variational problems. We then address practical issues associated with the optimization--based construction of transports: choosing finite-dimensional parameterizations of the map, enforcing monotonicity, quantifying the error of approximate transports, and refining approximate transports by enriching the corresponding approximation spaces. Approximate transports can also be used to "Gaussianize" complex distributions and thus precondition conventional asymptotically exact sampling schemes. We place the measure transport approach in broader context, describing connections with other optimization--based samplers, with inference and density estimation schemes using optimal transport, and with alternative transformation--based approaches to simulation. We also sketch current work aimed at the construction of transport maps in high dimensions, exploiting essential features of the target distribution (e.g., conditional independence, low-rank structure). The approaches and algorithms presented here have direct applications to Bayesian computation and to broader problems of stochastic simulation.
  • In the Bayesian approach to inverse problems, data are often informative, relative to the prior, only on a low-dimensional subspace of the parameter space. Significant computational savings can be achieved by using this subspace to characterize and approximate the posterior distribution of the parameters. We first investigate approximation of the posterior covariance matrix as a low-rank update of the prior covariance matrix. We prove optimality of a particular update, based on the leading eigendirections of the matrix pencil defined by the Hessian of the negative log-likelihood and the prior precision, for a broad class of loss functions. This class includes the F\"{o}rstner metric for symmetric positive definite matrices, as well as the Kullback-Leibler divergence and the Hellinger distance between the associated distributions. We also propose two fast approximations of the posterior mean and prove their optimality with respect to a weighted Bayes risk under squared-error loss. These approximations are deployed in an offline-online manner, where a more costly but data-independent offline calculation is followed by fast online evaluations. As a result, these approximations are particularly useful when repeated posterior mean evaluations are required for multiple data sets. We demonstrate our theoretical results with several numerical examples, including high-dimensional X-ray tomography and an inverse heat conduction problem. In both of these examples, the intrinsic low-dimensional structure of the inference problem can be exploited while producing results that are essentially indistinguishable from solutions computed in the full space.