
We investigate the lowdimensional structure of deterministic transformations
between random variables, i.e., transport maps between probability measures. In
the context of statistics and machine learning, these transformations can be
used to couple a tractable "reference" measure (e.g., a standard Gaussian) with
a target measure of interest. Direct simulation from the desired measure can
then be achieved by pushing forward reference samples through the map. Yet
characterizing such a mape.g., representing and evaluating itgrows
challenging in high dimensions. The central contribution of this paper is to
establish a link between the Markov properties of the target measure and the
existence of lowdimensional couplings, induced by transport maps that are
sparse and/or decomposable. Our analysis not only facilitates the construction
of transformations in highdimensional settings, but also suggests new
inference methodologies for continuous nonGaussian graphical models. For
instance, in the context of nonlinear statespace models, we describe new
variational algorithms for filtering, smoothing, and sequential parameter
inference. These algorithms can be understood as the natural
generalizationto the nonGaussian caseof the squareroot
RauchTungStriebel Gaussian smoother.

We propose a gradientbased method for detecting and exploiting
lowdimensional input parameter dependence of multivariate functions. The
methodology consists in minimizing an upper bound, obtained by Poincar\'etype
inequalities, on the approximation error. The resulting method can be used to
approximate vectorvalued functions (e.g., functions taking values in
$\mathbb{R}^n$ or functions taking values in function spaces) and generalizes
the notion of active subspaces associated with scalarvalued functions. A
comparison with the truncated KarhunenLo\`eve decomposition shows that using
gradients of the function can yield more effective dimension reduction.
Numerical examples reveal that the choice of norm on the codomain of the
function can have a significant impact on the function's lowdimensional
approximation.

Performing Bayesian inference via Markov chain Monte Carlo (MCMC) can be
exceedingly expensive when posterior evaluations invoke the evaluation of a
computationally expensive model, such as a system of partial differential
equations. In recent work [Conrad et al. JASA 2016, arXiv:1402.1694], we
described a framework for constructing and refining local approximations of
such models during an MCMC simulation. These posterioradapted approximations
harness regularity of the model to reduce the computational cost of inference
while preserving asymptotic exactness of the Markov chain. Here we describe two
extensions of that work. First, we prove that samplers running in parallel can
collaboratively construct a shared posterior approximation while ensuring
ergodicity of each associated chain, providing a novel opportunity for
exploiting parallel computation in MCMC. Second, focusing on the
Metropolisadjusted Langevin algorithm, we describe how a proposal
distribution can successfully employ gradients and other relevant information
extracted from the approximation. We investigate the practical performance of
our strategies using two challenging inference problems, the first in
subsurface hydrology and the second in glaciology. Using local approximations
constructed via parallel chains, we successfully reduce the run time needed to
characterize the posterior distributions in these problems from days to hours
and from months to days, respectively, dramatically improving the tractability
of Bayesian inference.

We introduce a new framework for efficient sampling from complex probability
distributions, using a combination of optimal transport maps and the
MetropolisHastings rule. The core idea is to use continuous transportation to
transform typical Metropolis proposal mechanisms (e.g., random walks, Langevin
methods) into nonGaussian proposal distributions that can more effectively
explore the target density. Our approach adaptively constructs a lower
triangular transport mapan approximation of the KnotheRosenblatt
rearrangementusing information from previous MCMC states, via the solution of
an optimization problem. This optimization problem is convex regardless of the
form of the target distribution. It is solved efficiently using a Newton method
that requires no gradient information from the target probability distribution;
the target distribution is instead represented via samples. Sequential updates
enable efficient and parallelizable adaptation of the map even for large
numbers of samples. We show that this approach uses inexact or truncated maps
to produce an adaptive MCMC algorithm that is ergodic for the exact target
distribution. Numerical demonstrations on a range of parameter inference
problems show orderofmagnitude speedups over standard MCMC techniques,
measured by the number of effectively independent samples produced per target
density evaluation and per unit of wallclock time.

In this article we develop a new sequential Monte Carlo (SMC) method for
multilevel (ML) Monte Carlo estimation. In particular, the method can be used
to estimate expectations with respect to a target probability distribution over
an infinitedimensional and noncompact space as given, for example, by a
Bayesian inverse problem with Gaussian random field prior. Under suitable
assumptions the MLSMC method has the optimal $O(\epsilon^{2})$ bound on the
cost to obtain a meansquare error of $O(\epsilon^2)$. The algorithm is
accelerated by dimensionindependent likelihoodinformed (DILI) proposals
designed for Gaussian priors, leveraging a novel variation which uses empirical
sample covariance information in lieu of Hessian information, hence eliminating
the requirement for gradient evaluations. The efficiency of the algorithm is
illustrated on two examples: inversion of noisy pressure measurements in a PDE
model of Darcy flow to recover the posterior distribution of the permeability
field, and inversion of noisy measurements of the solution of an SDE to recover
the posterior path measure.

We propose optimal dimensionality reduction techniques for the solution of
goaloriented linearGaussian inverse problems, where the quantity of interest
(QoI) is a function of the inversion parameters. These approximations are
suitable for largescale applications. In particular, we study the
approximation of the posterior covariance of the QoI as a lowrank negative
update of its prior covariance, and prove optimality of this update with
respect to the natural geodesic distance on the manifold of symmetric positive
definite matrices. Assuming exact knowledge of the posterior mean of the QoI,
the optimality results extend to optimality in distribution with respect to the
KullbackLeibler divergence and the Hellinger distance between the associated
distributions. We also propose approximation of the posterior mean of the QoI
as a lowrank linear function of the data, and prove optimality of this
approximation with respect to a weighted Bayes risk. Both of these optimal
approximations avoid the explicit computation of the full posterior
distribution of the parameters and instead focus on directions that are well
informed by the data and relevant to the QoI. These directions stem from a
balance among all the components of the goaloriented inverse problem: prior
information, forward model, measurement noise, and ultimate goals. We
illustrate the theory using a highdimensional inverse problem in heat
transfer.

A priori dimension reduction is a widely adopted technique for reducing the
computational complexity of stationary inverse problems. In this setting, the
solution of an inverse problem is parameterized by a lowdimensional basis that
is often obtained from the truncated KarhunenLoeve expansion of the prior
distribution. For highdimensional inverse problems equipped with smoothing
priors, this technique can lead to drastic reductions in parameter dimension
and significant computational savings.
In this paper, we extend the concept of a priori dimension reduction to
nonstationary inverse problems, in which the goal is to sequentially infer the
state of a dynamical system. Our approach proceeds in an offlineonline
fashion. We first identify a lowdimensional subspace in the state space before
solving the inverse problem (the offline phase), using either the method of
"snapshots" or regularized covariance estimation. Then this subspace is used to
reduce the computational complexity of various filtering algorithms  including
the Kalman filter, extended Kalman filter, and ensemble Kalman filter  within
a novel subspaceconstrained Bayesian predictionandupdate procedure (the
online phase). We demonstrate the performance of our new dimension reduction
approach on various numerical examples. In some test cases, our approach
reduces the dimensionality of the original problem by orders of magnitude and
yields up to two orders of magnitude in computational savings.

In many inverse problems, model parameters cannot be precisely determined
from observational data. Bayesian inference provides a mechanism for capturing
the resulting parameter uncertainty, but typically at a high computational
cost. This work introduces a multiscale decomposition that exploits conditional
independence across scales, when present in certain classes of inverse
problems, to decouple Bayesian inference into two stages: (1) a computationally
tractable coarsescale inference problem; and (2) a mapping of the
lowdimensional coarsescale posterior distribution into the original
highdimensional parameter space. This decomposition relies on a
characterization of the nonGaussian joint distribution of coarse and
finescale quantities via optimal transport maps. We demonstrate our approach
on a sequence of inverse problems arising in subsurface flow, using the
multiscale finite element method to discretize the steady state pressure
equation. We compare the multiscale strategy with fulldimensional Markov chain
Monte Carlo on a problem of moderate dimension (100 parameters) and then use it
to infer a conductivity field described by over 10,000 parameters.

We present the fundamentals of a measure transport approach to sampling. The
idea is to construct a deterministic couplingi.e., a transport mapbetween
a complex "target" probability measure of interest and a simpler reference
measure. Given a transport map, one can generate arbitrarily many independent
and unweighted samples from the target simply by pushing forward reference
samples through the map. We consider two different and complementary scenarios:
first, when only evaluations of the unnormalized target density are available,
and second, when the target distribution is known only through a finite
collection of samples. We show that in both settings the desired transports can
be characterized as the solutions of variational problems. We then address
practical issues associated with the optimizationbased construction of
transports: choosing finitedimensional parameterizations of the map, enforcing
monotonicity, quantifying the error of approximate transports, and refining
approximate transports by enriching the corresponding approximation spaces.
Approximate transports can also be used to "Gaussianize" complex distributions
and thus precondition conventional asymptotically exact sampling schemes. We
place the measure transport approach in broader context, describing connections
with other optimizationbased samplers, with inference and density estimation
schemes using optimal transport, and with alternative transformationbased
approaches to simulation. We also sketch current work aimed at the construction
of transport maps in high dimensions, exploiting essential features of the
target distribution (e.g., conditional independence, lowrank structure). The
approaches and algorithms presented here have direct applications to Bayesian
computation and to broader problems of stochastic simulation.

In the Bayesian approach to inverse problems, data are often informative,
relative to the prior, only on a lowdimensional subspace of the parameter
space. Significant computational savings can be achieved by using this subspace
to characterize and approximate the posterior distribution of the parameters.
We first investigate approximation of the posterior covariance matrix as a
lowrank update of the prior covariance matrix. We prove optimality of a
particular update, based on the leading eigendirections of the matrix pencil
defined by the Hessian of the negative loglikelihood and the prior precision,
for a broad class of loss functions. This class includes the F\"{o}rstner
metric for symmetric positive definite matrices, as well as the
KullbackLeibler divergence and the Hellinger distance between the associated
distributions. We also propose two fast approximations of the posterior mean
and prove their optimality with respect to a weighted Bayes risk under
squarederror loss. These approximations are deployed in an offlineonline
manner, where a more costly but dataindependent offline calculation is
followed by fast online evaluations. As a result, these approximations are
particularly useful when repeated posterior mean evaluations are required for
multiple data sets. We demonstrate our theoretical results with several
numerical examples, including highdimensional Xray tomography and an inverse
heat conduction problem. In both of these examples, the intrinsic
lowdimensional structure of the inference problem can be exploited while
producing results that are essentially indistinguishable from solutions
computed in the full space.