
Dense kernel matrices $\Theta \in \mathbb{R}^{N \times N}$ obtained from
point evaluations of a covariance function $G$ at locations $\{ x_{i} \}_{1
\leq i \leq N}$ arise in statistics, machine learning, and numerical analysis.
For covariance functions that are Green's functions of elliptic boundary value
problems and homogeneouslydistributed sampling points, we show how to identify
a subset $S \subset \{ 1 , \dots , N \}^2$, with $\# S = O ( N \log (N)
\log^{d} ( N /\epsilon ) )$, such that the zero fillin incomplete Cholesky
factorisation of the sparse matrix $\Theta_{ij} 1_{( i, j ) \in S}$ is an
$\epsilon$approximation of $\Theta$. This factorisation can provably be
obtained in complexity $O ( N \log( N ) \log^{d}( N /\epsilon) )$ in space and
$O ( N \log^{2}( N ) \log^{2d}( N /\epsilon) )$ in time; we further present
numerical evidence that $d$ can be taken to be the intrinsic dimension of the
data set rather than that of the ambient space. The algorithm only needs to
know the spatial configuration of the $x_{i}$ and does not require an analytic
representation of $G$. Furthermore, this factorization straightforwardly
provides an approximate sparse PCA with optimal rate of convergence in the
operator norm. Hence, by using only subsampling and the incomplete Cholesky
factorization, we obtain, at nearly linear complexity, the compression,
inversion and approximate PCA of a large class of covariance matrices. By
inverting the order of the Cholesky factorization we also obtain a solver for
elliptic PDE with complexity $O ( N \log^{d}( N /\epsilon) )$ in space and $O (
N \log^{2d}( N /\epsilon) )$ in time.

Implicit schemes are popular methods for the integration of time dependent
PDEs such as hyperbolic and parabolic PDEs. However the necessity to solve
corresponding linear systems at each time step constitutes a complexity
bottleneck in their application to PDEs with rough coefficients. We present a
generalization of gamblets introduced in \cite{OwhadiMultigrid:2015} enabling
the resolution of these implicit systems in nearlinear complexity and provide
rigorous apriori error bounds on the resulting numerical approximations of
hyperbolic and parabolic PDEs. These generalized gamblets induce a
multiresolution decomposition of the solution space that is adapted to both the
underlying (hyperbolic and parabolic) PDE (and the system of ODEs resulting
from space discretization) and to the timesteps of the numerical scheme.

Can the theory that reality is a simulation be tested? We investigate this
question based on the assumption that if the system performing the simulation
is finite (i.e. has limited resources), then to achieve low computational
complexity, such a system would, as in a video game, render content (reality)
only at the moment that information becomes available for observation by a
player and not at the moment of detection by a machine (that would be part of
the simulation and whose detection would also be part of the internal
computation performed by the Virtual Reality server before rendering content to
the player). Guided by this principle we describe conceptual wave/particle
duality experiments aimed at testing the simulation theory.

We show how the discovery of robust scalable numerical solvers for arbitrary
bounded linear operators can be automated as a Game Theory problem by
reformulating the process of computing with partial information and limited
resources as that of playing underlying hierarchies of adversarial information
games. When the solution space is a Banach space $B$ endowed with a quadratic
norm $\\cdot\$, the optimal measure (mixed strategy) for such games (e.g. the
adversarial recovery of $u\in B$, given partial measurements $[\phi_i, u]$ with
$\phi_i\in B^*$, using relative error in $\\cdot\$norm as a loss) is a
centered Gaussian field $\xi$ solely determined by the norm $\\cdot\$, whose
conditioning (on measurements) produces optimal bets. When measurements are
hierarchical, the process of conditioning this Gaussian field produces a
hierarchy of elementary bets (gamblets). These gamblets generalize the notion
of Wavelets and Wannier functions in the sense that they are adapted to the
norm $\\cdot\$ and induce a multiresolution decomposition of $B$ that is
adapted to the eigensubspaces of the operator defining the norm $\\cdot\$.
When the operator is localized, we show that the resulting gamblets are
localized both in space and frequency and introduce the Fast Gamblet Transform
(FGT) with rigorous accuracy and (nearlinear) complexity estimates. As the FFT
can be used to solve and diagonalize arbitrary PDEs with constant coefficients,
the FGT can be used to decompose a wide range of continuous linear operators
(including arbitrary continuous linear bijections from $H^s_0$ to $H^{s}$ or
to $L^2$) into a sequence of independent linear systems with uniformly bounded
condition numbers and leads to $\mathcal{O}(N \operatorname{polylog} N)$
solvers and eigenspace adapted Multiresolution Analysis (resulting in near
linear complexity approximation of all eigensubspaces).

We introduce a nearlinear complexity (geometric and meshless/algebraic)
multigrid/multiresolution method for PDEs with rough ($L^\infty$) coefficients
with rigorous apriori accuracy and performance estimates. The method is
discovered through a decision/game theory formulation of the problems of (1)
identifying restriction and interpolation operators (2) recovering a signal
from incomplete measurements based on norm constraints on its image under a
linear operator (3) gambling on the value of the solution of the PDE based on a
hierarchy of nested measurements of its solution or source term. The resulting
elementary gambles form a hierarchy of (deterministic) basis functions of
$H^1_0(\Omega)$ (gamblets) that (1) are orthogonal across subscales/subbands
with respect to the scalar product induced by the energy norm of the PDE (2)
enable sparse compression of the solution space in $H^1_0(\Omega)$ (3) induce
an orthogonal multiresolution operator decomposition. The operating diagram of
the multigrid method is that of an inverted pyramid in which gamblets are
computed locally (by virtue of their exponential decay), hierarchically (from
fine to coarse scales) and the PDE is decomposed into a hierarchy of
independent linear systems with uniformly bounded condition numbers. The
resulting algorithm is parallelizable both in space (via localization) and in
bandwith/subscale (subscales can be computed independently from each other).
Although the method is deterministic it has a natural Bayesian interpretation
under the measure of probability emerging (as a mixed strategy) from the
information game formulation and multiresolution approximations form a
martingale with respect to the filtration induced by the hierarchy of nested
measurements.

We demonstrate that a reproducing kernel Hilbert or Banach space of functions
on a separable absolute Borel space or an analytic subset of a Polish space is
separable if it possesses a Borel measurable feature map.

The practical implementation of Bayesian inference requires numerical
approximation when closedform expressions are not available. What types of
accuracy (convergence) of the numerical approximations guarantee robustness and
what types do not? In particular, is the recursive application of Bayes' rule
robust when subsequent data or posteriors are approximated? When the prior is
the push forward of a distribution by the map induced by the solution of a PDE,
in which norm should that solution be approximated? Motivated by such
questions, we investigate the sensitivity of the distribution of posterior
distributions (i.e. posterior distributionvalued random variables, randomized
through the data) with respect to perturbations of the prior and data
generating distributions in the limit when the number of data points grows
towards infinity.

We show that, for the space of Borel probability measures on a Borel subset
of a Polish metric space, the extreme points of the Prokhorov,
MongeWasserstein and Kantorovich metric balls about a measure whose support
has at most n points, consist of measures whose supports have at most n+2
points. Moreover, we use the Strassen and KantorovichRubinstein duality
theorems to develop representations of supersets of the extreme points based on
linear programming, and then develop these representations towards the goal of
their efficient computation.

The past century has seen a steady increase in the need of estimating and
predicting complex systems and making (possibly critical) decisions with
limited information. Although computers have made possible the numerical
evaluation of sophisticated statistical models, these models are still designed
\emph{by humans} because there is currently no known recipe or algorithm for
dividing the design of a statistical model into a sequence of arithmetic
operations. Indeed enabling computers to \emph{think} as \emph{humans} have the
ability to do when faced with uncertainty is challenging in several major ways:
(1) Finding optimal statistical models remains to be formulated as a well posed
problem when information on the system of interest is incomplete and comes in
the form of a complex combination of sample data, partial knowledge of
constitutive relations and a limited description of the distribution of input
random variables. (2) The space of admissible scenarios along with the space of
relevant information, assumptions, and/or beliefs, tend to be infinite
dimensional, whereas calculus on a computer is necessarily discrete and finite.
With this purpose, this paper explores the foundations of a rigorous framework
for the scientific computation of optimal statistical estimators/models and
reviews their connections with Decision Theory, Machine Learning, Bayesian
Inference, Stochastic Optimization, Robust Optimization, Optimal Uncertainty
Quantification and Information Based Complexity.

We consider the temporal homogenization of linear ODEs of the form
$\dot{x}=Ax+\epsilon P(t)x+f(t)$, where $P(t)$ is periodic and $\epsilon$ is
small. Using a 2scale expansion approach, we obtain the longtime
approximation $x(t)\approx \exp(At) \left( \Omega(t)+\int_0^t \exp(A \tau)
f(\tau) \, d\tau \right)$, where $\Omega$ solves the cell problem
$\dot{\Omega}=\epsilon B \Omega + \epsilon F(t)$ with an effective matrix $B$
and an explicitlyknown $F(t)$. We provide necessary and sufficient condition
for the accuracy of the approximation (over a $\mathcal{O}(\epsilon^{1})$
timescale), and show how $B$ can be computed (at a cost independent of
$\epsilon$). As a direct application, we investigate the possibility of using
RLC circuits to harvest the energy contained in small scale oscillations of
ambient electromagnetic fields (such as Schumann resonances). Although a RLC
circuit parametrically coupled to the field may achieve such energy extraction
via parametric resonance, its resistance $R$ needs to be smaller than a
threshold $\kappa$ proportional to the fluctuations of the field, thereby
limiting practical applications. We show that if $n$ RLC circuits are
appropriately coupled via mutual capacitances or inductances, then energy
extraction can be achieved when the resistance of each circuit is smaller than
$n\kappa$. Hence, if the resistance of each circuit has a nonzero fixed value,
energy extraction can be made possible through the coupling of a sufficiently
large number $n$ of circuits ($n\approx 1000$ for the first mode of Schumann
resonances and contemporary values of capacitances, inductances and
resistances). The theory is also applied to the control of the oscillation
amplitude of a (damped) oscillator.

For a Gaussian measure on a separable Hilbert space with covariance operator
$C$, we show that the family of conditional measures associated with
conditioning on a closed subspace $S^{\perp}$ are Gaussian with covariance
operator the short $\mathcal{S}(C)$ of the operator $C$ to $S$. We provide two
proofs. The first uses the theory of Gaussian Hilbert spaces and a
characterization of the shorted operator by Andersen and Trapp. The second uses
recent developments by Corach, Maestripieri and Stojanoff on the relationship
between the shorted operator and $C$symmetric oblique projections onto
$S^{\perp}$. To obtain the assertion when such projections do not exist, we
develop an approximation result for the shorted operator by showing, for any
positive operator $A$, how to construct a sequence of approximating operators
$A^{n}$ which possess $A^{n}$symmetric oblique projections onto $S^{\perp}$
such that the sequence of shorted operators $\mathcal{S}(A^{n})$ converges to
$\mathcal{S}(A)$ in the weak operator topology. This result combined with the
martingale convergence of random variables associated with the corresponding
approximations $C^{n}$ establishes the main assertion in general. Moreover, it
in turn strengthens the approximation theorem for shorted operator when the
operator is trace class; then the sequence of shorted operators
$\mathcal{S}(A^{n})$ converges to $\mathcal{S}(A)$ in trace norm.

Numerical homogenization, i.e. the finitedimensional approximation of
solution spaces of PDEs with arbitrary rough coefficients, requires the
identification of accurate basis elements. These basis elements are oftentimes
found after a laborious process of scientific investigation and plain
guesswork. Can this identification problem be facilitated? Is there a general
recipe/decision framework for guiding the design of basis elements? We suggest
that the answer to the above questions could be positive based on the
reformulation of numerical homogenization as a Bayesian Inference problem in
which a given PDE with rough coefficients (or multiscale operator) is excited
with noise (random right hand side/source term) and one tries to estimate the
value of the solution at a given point based on a finite number of
observations. We apply this reformulation to the identification of bases for
the numerical homogenization of arbitrary integrodifferential equations and
show that these bases have optimal recovery properties. In particular we show
how Rough Polyharmonic Splines can be rediscovered as the optimal solution of
a Gaussian filtering problem.

Optimal uncertainty quantification (OUQ) is a framework for numerical
extremecase analysis of stochastic systems with imperfect knowledge of the
underlying probability distribution. This paper presents sufficient conditions
under which an OUQ problem can be reformulated as a finitedimensional convex
optimization problem, for which efficient numerical solutions can be obtained.
The sufficient conditions include that the objective function is piecewise
concave and the constraints are piecewise convex. In particular, we show that
piecewise concave objective functions may appear in applications where the
objective is defined by the optimal value of a parameterized linear program.

With the advent of highperformance computing, Bayesian methods are
increasingly popular tools for the quantification of uncertainty throughout
science and industry. Since these methods impact the making of sometimes
critical decisions in increasingly complicated contexts, the sensitivity of
their posterior conclusions with respect to the underlying models and prior
beliefs is a pressing question for which there currently exist positive and
negative results. We report new results suggesting that, although Bayesian
methods are robust when the number of possible outcomes is finite or when only
a finite number of marginals of the datagenerating distribution are unknown,
they could be generically brittle when applied to continuous systems (and their
discretizations) with finite information on the datagenerating distribution.
If closeness is defined in terms of the total variation metric or the matching
of a finite system of generalized moments, then (1) two practitioners who use
arbitrarily close models and observe the same (possibly arbitrarily large
amount of) data may reach opposite conclusions; and (2) any given prior and
model can be slightly perturbed to achieve any desired posterior conclusions.
The mechanism causing brittlenss/robustness suggests that learning and
robustness are antagonistic requirements and raises the question of a missing
stability condition for using Bayesian Inference in a continuous world under
finite information.

We derive, in the classical framework of Bayesian sensitivity analysis,
optimal lower and upper bounds on posterior values obtained from Bayesian
models that exactly capture an arbitrarily large number of finitedimensional
marginals of the datagenerating distribution and/or that are as close as
desired to the datagenerating distribution in the Prokhorov or total variation
metrics; these bounds show that such models may still make the largest possible
prediction error after conditioning on an arbitrarily large number of sample
data measured at finite precision. These results are obtained through the
development of a reduction calculus for optimization problems over measures on
spaces of measures. We use this calculus to investigate the mechanisms that
generate brittleness/robustness and, in particular, we observe that learning
and robustness are antagonistic properties. It is now well understood that the
numerical resolution of PDEs requires the satisfaction of specific stability
conditions. Is there a missing stability condition for using Bayesian inference
in a continuous world under finite information?

We show that symplectic and linearlyimplicit integrators proposed by [Zhang
and Skeel, 1997] are variational linearizations of Newmark methods. When used
in conjunction with penalty methods (i.e., methods that replace constraints by
stiff potentials), these integrators permit coarse timestepping of
holonomically constrained mechanical systems and bypass the resolution of
nonlinear systems. Although penalty methods are widely employed, an explicit
link to Lagrange multiplier approaches appears to be lacking; such a link is
now provided (in the context of twoscale flow convergence [Tao, Owhadi and
Marsden, 2010]). The variational formulation also allows efficient simulations
of mechanical systems on Lie groups.

The incorporation of priors in the Optimal Uncertainty Quantification (OUQ)
framework \cite{OSSMO:2011} reveals brittleness in Bayesian inference; a model
may share an arbitrarily large number of finitedimensional marginals with, or
be arbitrarily close (in Prokhorov or total variation metrics) to, the
datagenerating distribution and still make the largest possible prediction
error after conditioning on an arbitrarily large number of samples. The initial
purpose of this paper is to unwrap this brittleness mechanism by providing (i)
a quantitative version of the Brittleness Theorem of \cite{BayesOUQ} and (ii) a
detailed and comprehensive analysis of its application to the revealing example
of estimating the mean of a random variable on the unit interval $[0,1]$ using
priors that exactly capture the distribution of an arbitrarily large number of
Hausdorff moments.
However, in doing so, we discovered that the free parameter associated with
Markov and Kre\u{\i}n's canonical representations of truncated Hausdorff
moments generates reproducing kernel identities corresponding to reproducing
kernel Hilbert spaces of polynomials.
Furthermore, these reproducing identities lead to biorthogonal systems of
Selberg integral formulas.
This process of discovery appears to be generic: whereas Karlin and Shapley
used Selberg's integral formula to first compute the volume of the Hausdorff
moment space (the polytope defined by the first $n$ moments of a probability
measure on the interval $[0,1]$), we observe that the computation of that
volume along with higher order moments of the uniform measure on the moment
space, using different finitedimensional representations of subsets of the
infinitedimensional set of probability measures on $[0,1]$ representing the
first $n$ moments, leads to families of equalities corresponding to classical
and new Selberg identities.

We introduce a new variational method for the numerical homogenization of
divergence form elliptic, parabolic and hyperbolic equations with arbitrary
rough ($L^\infty$) coefficients. Our method does not rely on concepts of
ergodicity or scaleseparation but on compactness properties of the solution
space and a new variational approach to homogenization. The approximation space
is generated by an interpolation basis (over scattered points forming a mesh of
resolution $H$) minimizing the $L^2$ norm of the source terms; its
(pre)computation involves minimizing $\mathcal{O}(H^{d})$ quadratic (cell)
problems on (super)localized subdomains of size $\mathcal{O}(H \ln (1/ H))$.
The resulting localized linear systems remain sparse and banded. The resulting
interpolation basis functions are biharmonic for $d\leq 3$, and polyharmonic
for $d\geq 4$, for the operator $\diiv(a\nabla \cdot)$ and can be seen as a
generalization of polyharmonic splines to differential operators with arbitrary
rough coefficients. The accuracy of the method ($\mathcal{O}(H)$ in energy norm
and independent from aspect ratios of the mesh formed by the scattered points)
is established via the introduction of a new class of higherorder Poincar\'{e}
inequalities. The method bypasses (pre)computations on the full domain and
naturally generalizes to time dependent problems, it also provides a natural
solution to the inverse problem of recovering the solution of a divergence form
elliptic equation from a finite number of point measurements.

We study the internal resonance, energy transfer, activation mechanism, and
control of a model of DNA division via parametric resonance. While the system
is robust to noise, this study shows that it is sensitive to specific fine
scale modes and frequencies that could be targeted by low intensity
electromagnetic fields for triggering and controlling the division. The DNA
model is a chain of pendula in a Morse potential. While the (possibly
parametrically excited) system has a large number of degrees of freedom and a
large number of intrinsic time scales, global and slow variables can be
identified by (i) first reducing its dynamic to two modes exchanging energy
between each other and (ii) averaging the dynamic of the reduced system with
respect to the phase of the fastest mode. Surprisingly the global and slow
dynamic of the system remains Hamiltonian (despite the parametric excitation)
and the study of its associated effective potential shows how parametric
excitation can turn the unstable open state into a stable one. Numerical
experiments support the accuracy of the timeaveraged reduced Hamiltonian in
capturing the global and slow dynamic of the full system.

We propose a rigorous framework for Uncertainty Quantification (UQ) in which
the UQ objectives and the assumptions/information set are brought to the
forefront. This framework, which we call \emph{Optimal Uncertainty
Quantification} (OUQ), is based on the observation that, given a set of
assumptions and information about the problem, there exist optimal bounds on
uncertainties: these are obtained as values of welldefined optimization
problems corresponding to extremizing probabilities of failure, or of
deviations, subject to the constraints imposed by the scenarios compatible with
the assumptions and information. In particular, this framework does not
implicitly impose inappropriate assumptions, nor does it repudiate relevant
information. Although OUQ optimization problems are extremely large, we show
that under general conditions they have finitedimensional reductions. As an
application, we develop \emph{Optimal Concentration Inequalities} (OCI) of
Hoeffding and McDiarmid type. Surprisingly, these results show that
uncertainties in input parameters, which propagate to output uncertainties in
the classical sensitivity analysis paradigm, may fail to do so if the transfer
functions (or probability distributions) are imperfectly known. We show how,
for hierarchical structures, this phenomenon may lead to the nonpropagation of
uncertainties or information across scales. In addition, a general algorithmic
framework is developed for OUQ and is tested on the Caltech surrogate model for
hypervelocity impact and on the seismic safety assessment of truss structures,
suggesting the feasibility of the framework for important complex systems. The
introduction of this paper provides both an overview of the paper and a
selfcontained minitutorial about basic concepts and issues of UQ.

We construct finitedimensional approximations of solution spaces of
divergence form operators with $L^\infty$coefficients. Our method does not
rely on concepts of ergodicity or scaleseparation, but on the property that
the solution space of these operators is compactly embedded in $H^1$ if source
terms are in the unit ball of $L^2$ instead of the unit ball of $H^{1}$.
Approximation spaces are generated by solving elliptic PDEs on localized
subdomains with source terms corresponding to approximation bases for $H^2$.
The $H^1$error estimates show that $\mathcal{O}(h^{d})$dimensional spaces
with basis elements localized to subdomains of diameter $\mathcal{O}(h^\alpha
\ln \frac{1}{h})$ (with $\alpha \in [1/2,1)$) result in an
$\mathcal{O}(h^{22\alpha})$ accuracy for elliptic, parabolic and hyperbolic
problems. For highcontrast media, the accuracy of the method is preserved
provided that localized subdomains contain buffer zones of width
$\mathcal{O}(h^\alpha \ln \frac{1}{h})$ where the contrast of the medium
remains bounded. The proposed method can naturally be generalized to vectorial
equations (such as elastodynamics).

We present a multiscale integrator for Hamiltonian systems with slowly
varying quadratic stiff potentials that uses coarse timesteps (analogous to
what the impulse method uses for constant quadratic stiff potentials). This
method is based on the highlynontrivial introduction of two efficient
symplectic schemes for exponentiations of matrices that only require O(n)
matrix multiplications operations at each coarse time step for a preset small
number n. The proposed integrator is shown to be (i) uniformly convergent on
positions; (ii) symplectic in both slow and fast variables; (iii) well adapted
to high dimensional systems. Our framework also provides a general method for
iteratively exponentiating a slowly varying sequence of (possibly high
dimensional) matrices in an efficient way.

We present a new class of integrators for stiff PDEs. These integrators are
generalizations of FLow AVeraging integratORS (FLAVORS) for stiff ODEs and SDEs
introduced in [Tao, Owhadi and Marsden 2010] with the following properties: (i)
Multiscale: they are based on flow averaging and have a computational cost
determined by mesoscopic steps in space and time instead of microscopic steps
in space and time; (ii) Versatile: the method is based on averaging the flows
of the given PDEs (which may have hidden slow and fast processes). This
bypasses the need for identifying explicitly (or numerically) the slow
variables or reduced effective PDEs; (iii) Nonintrusive: A preexisting
numerical scheme resolving the microscopic time scale can be used as a black
box and easily turned into one of the integrators in this paper by turning the
large coefficients on over a microscopic timescale and off during a mesoscopic
timescale; (iv) Convergent over two scales: strongly over slow processes and in
the sense of measures over fast ones; (v) Structurepreserving: for stiff
Hamiltonian PDEs (possibly on manifolds), they can be made to be
multisymplectic, symmetrypreserving (symmetries are group actions that leave
the system invariant) in all variables and variational.

In this contribution, we develop a variational integrator for the simulation
of (stochastic and multiscale) electric circuits. When considering the dynamics
of an electrical circuit, one is faced with three special situations: 1. The
system involves external (control) forcing through external (controlled)
voltage sources and resistors. 2. The system is constrained via the Kirchhoff
current (KCL) and voltage laws (KVL). 3. The Lagrangian is degenerate. Based on
a geometric setting, an appropriate variational formulation is presented to
model the circuit from which the equations of motion are derived. A
timediscrete variational formulation provides an iteration scheme for the
simulation of the electric circuit. Dependent on the discretization, the
intrinsic degeneracy of the system can be canceled for the discrete variational
scheme. In this way, a variational integrator is constructed that gains several
advantages compared to standard integration tools for circuits; in particular,
a comparison to BDF methods (which are usually the method of choice for the
simulation of electric circuits) shows that even for simple LCR circuits, a
better energy behavior and frequency spectrum preservation can be observed
using the developed variational integrator.

We consider a random variable $X$ that takes values in a (possibly
infinitedimensional) topological vector space $\mathcal{X}$. We show that,
with respect to an appropriate "normal distance" on $\mathcal{X}$,
concentration inequalities for linear and nonlinear functions of $X$ are
equivalent. This normal distance corresponds naturally to the concentration
rate in classical concentration results such as Gaussian concentration and
concentration on the Euclidean and Hamming cubes. Under suitable assumptions on
the roundness of the sets of interest, the concentration inequalities so
obtained are asymptotically optimal in the highdimensional limit.