
Stochastic gradient descent in continuous time (SGDCT) provides a
computationally efficient method for the statistical learning of
continuous-time models, which are widely used in science, engineering, and
finance. The SGDCT algorithm follows a (noisy) descent direction along a
continuous stream of data. The parameter updates occur in continuous time and
satisfy a stochastic differential equation. This paper analyzes the asymptotic
convergence rate of the SGDCT algorithm by proving a central limit theorem
(CLT) for strongly convex objective functions and, under slightly stronger
conditions, for nonconvex objective functions as well. An $L^{p}$ convergence
rate is also proven for the algorithm in the strongly convex case. The
mathematical analysis lies at the intersection of stochastic analysis and
statistical learning.
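The following is a minimal sketch of an Euler discretisation of SGDCT. It is not the authors' implementation. The test problem is an assumption: estimating the mean-reversion rate of an Ornstein-Uhlenbeck process $dX_t = -\theta^* X_t\,dt + \sigma\,dW_t$ with model drift $f(x;\theta) = -\theta x$. We assume the standard SGDCT form $d\theta_t = \alpha_t \nabla_\theta f(X_t;\theta_t)\,\sigma^{-2}\,(dX_t - f(X_t;\theta_t)\,dt)$ with learning rate $\alpha_t = C/(C_0 + t)$. The function name `sgdct_ou`, the step size, and all constants are illustrative choices.

```python
import math
import random


def sgdct_ou(theta_true=1.0, sigma=0.5, theta0=0.0, T=2000.0, dt=0.01, seed=0):
    """Euler-discretised SGDCT estimate of the OU mean-reversion rate (sketch).

    Data (simulated here by Euler-Maruyama): dX = -theta_true * X dt + sigma dW.
    Model drift: f(x; theta) = -theta * x, so grad_theta f(x; theta) = -x.
    Assumed update: d theta = alpha_t * grad_theta f * sigma^-2 * (dX - f dt).
    """
    rng = random.Random(seed)
    x, theta, t = 1.0, theta0, 0.0
    sqrt_dt = math.sqrt(dt)
    while t < T:
        alpha = 10.0 / (10.0 + t)  # assumed schedule alpha_t = C / (C0 + t)
        # next increment of the observed continuous data stream
        dx = -theta_true * x * dt + sigma * sqrt_dt * rng.gauss(0.0, 1.0)
        # innovation dX - f(X; theta) dt drives the noisy descent step
        innovation = dx - (-theta * x) * dt
        theta += alpha * (-x) / sigma**2 * innovation
        x += dx
        t += dt
    return theta
```

With a decreasing learning rate, the estimate settles near $\theta^*$. Its remaining random fluctuation shrinks as $t$ grows, and this is the kind of fluctuation a central limit theorem quantifies.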

The two-dimensional incompressible Navier-Stokes equation on $D_\delta := [0,
2\pi\delta] \times [0, 2\pi]$ with $\delta \approx 1$, periodic boundary
conditions, and viscosity $0 < \nu \ll 1$ is considered. Bars and dipoles, two
explicitly given quasi-stationary states of the system, evolve on the time
scale $\mathcal{O}(e^{-\nu t})$ and have been shown to play a key role in its
long-time evolution. Of particular interest is the role that $\delta$ plays in
selecting which of these two states is observed. Recent numerical studies
suggest that, after a transient period of rapid decay of the high Fourier
modes, the bar state will be selected if $\delta \neq 1$, while the dipole will
be selected if $\delta = 1$. Our results support this claim and seek to
mathematically formalize it. We consider the system in Fourier space, project
it onto a center manifold consisting of the lowest eight Fourier modes, and use
this as a model to study the selection of bars and dipoles. It is shown for
this ODE model that the value of $\delta$ controls the behavior of the
asymptotic ratio of the low modes, thus determining the likelihood of observing
a bar state or dipole after an initial transient period. Moreover, in our
model, for all $\delta \approx 1$, there is an initial time period in which the
high modes decay at the rapid rate $\mathcal{O}(e^{-t/\nu})$, while the low
modes evolve at the slower $\mathcal{O}(e^{-\nu t})$ rate. The results for the
ODE model are proven using energy estimates and invariant manifolds and further
supported by formal asymptotic expansions and numerics.

We study multiscale integrator numerical schemes for a class of stiff
stochastic differential equations (SDEs). We consider multiscale SDEs with
potentially multiple attractors that behave as diffusions on graphs as the
stiffness parameter goes to its limit. Classical numerical discretization
schemes, such as the Euler-Maruyama scheme, become unstable as the stiffness
parameter converges to its limit, and appropriate multiscale integrators can
correct for this. We rigorously establish the convergence of the numerical
method to the related diffusion on a graph, identifying the appropriate choice
of discretization parameters. Theoretical results are supplemented by numerical
studies in the recently developing area of introducing irreversibility into
Langevin samplers in order to accelerate convergence to equilibrium.
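A toy linear example already shows the instability described above. The example is an assumption chosen for illustration and is much simpler than the graph-diffusion setting of the paper. For $dX_t = -\varepsilon^{-1} X_t\,dt + \sigma\,dW_t$, one Euler-Maruyama step of size $h$ multiplies the state by $1 - h/\varepsilon$. A fixed step size therefore becomes unstable once $\varepsilon < h/2$. The sketch does not implement the multiscale integrator itself, and the function name is hypothetical.

```python
import math
import random


def euler_maruyama_stiff_ou(eps, h, n_steps, x0=1.0, sigma=0.0, seed=0):
    """Euler-Maruyama for the stiff toy SDE dX = -(1/eps) X dt + sigma dW.

    Each step is X_{k+1} = (1 - h/eps) X_k + sigma * sqrt(h) * xi_k, so the
    drift part is stable only when |1 - h/eps| < 1, i.e. when h < 2 * eps.
    """
    rng = random.Random(seed)
    x = x0
    for _ in range(n_steps):
        x = (1.0 - h / eps) * x + sigma * math.sqrt(h) * rng.gauss(0.0, 1.0)
    return x
```

Take $h = 0.1$ and 50 steps with the noise switched off. Then $\varepsilon = 1$ gives $0.9^{50} \approx 5 \times 10^{-3}$, while $\varepsilon = 0.01$ gives $(-9)^{50} \approx 5 \times 10^{47}$.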

We study statistical inference for small-noise-perturbed multiscale dynamical
systems under the assumption that we observe a single time series from the slow
process only. We construct estimators for both averaging and homogenization
regimes, based on an appropriate misspecified model motivated by a second-order
stochastic Taylor expansion of the slow process with respect to a function of
the timescale separation parameter. In the case of a fixed number of
observations, we establish consistency, asymptotic normality, and asymptotic
statistical efficiency of a minimum contrast estimator (MCE), the limiting
variance having been identified explicitly; we furthermore establish
consistency and asymptotic normality of a simplified minimum contrast
estimator (SMCE), which, however, is not efficient in general. These results are
then extended to the case of high-frequency observations under a condition
restricting the rate at which the number of observations may grow vis-\`a-vis
the separation of scales. Numerical simulations illustrate the theoretical
results.

High-dimensional PDEs have been a long-standing computational challenge. We
propose to solve high-dimensional PDEs by approximating the solution with a
deep neural network which is trained to satisfy the differential operator,
initial condition, and boundary conditions. Our algorithm is mesh-free, which is
key since meshes become infeasible in higher dimensions. Instead of forming a
mesh, the neural network is trained on batches of randomly sampled time and
space points. The algorithm is tested on a class of high-dimensional free
boundary PDEs, which we are able to accurately solve in up to $200$ dimensions.
The algorithm is also tested on a high-dimensional Hamilton-Jacobi-Bellman PDE
and Burgers' equation. The deep learning algorithm approximates the general
solution to Burgers' equation for a continuum of different boundary
conditions and physical conditions (which can be viewed as a high-dimensional
space). We call the algorithm a "Deep Galerkin Method (DGM)" since it is
similar in spirit to Galerkin methods, with the solution approximated by a
neural network instead of a linear combination of basis functions. In addition,
we prove a theorem regarding the approximation power of neural networks for a
class of quasilinear parabolic PDEs.
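The mesh-free training objective can be illustrated without a neural network. The sketch below assumes the $d$-dimensional heat equation $u_t = \Delta u$ on $[0,1]\times[0,1]^d$ as a stand-in PDE. It estimates the interior part of a DGM-style loss, the mean squared residual of the differential operator, at uniformly sampled time and space points. Central finite differences stand in for the exact differentiation a network would allow, and the initial- and boundary-condition terms are omitted. `heat_residual_loss` is a hypothetical helper.

```python
import random


def heat_residual_loss(u, d, n_points=256, h=1e-3, seed=0):
    """Monte Carlo estimate of E[(u_t - Laplacian u)^2] at random points.

    Points (t, x) are drawn uniformly from [0, 1] x [0, 1]^d, so no mesh is
    formed. Derivatives use central finite differences (an assumption of this
    sketch, not part of the method described above).
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_points):
        t = rng.random()
        x = [rng.random() for _ in range(d)]
        u0 = u(t, x)
        u_t = (u(t + h, x) - u(t - h, x)) / (2.0 * h)
        lap = 0.0
        for i in range(d):
            xp = x.copy()
            xp[i] += h
            xm = x.copy()
            xm[i] -= h
            lap += (u(t, xp) - 2.0 * u0 + u(t, xm)) / h**2
        total += (u_t - lap) ** 2
    return total / n_points
```

The exact solution $u(t,x) = |x|^2 + 2dt$ drives this loss to zero up to rounding. By contrast, $u(t,x) = |x|^2$ leaves a residual of $-2d$ at every point. In DGM, the candidate $u$ would be a network, and this quantity would be minimized by stochastic gradient descent over fresh batches of points.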

Machine learning has revolutionized fields such as image, text, and speech
recognition. There is also growing interest in applying machine and deep
learning ideas in engineering, robotics, biotechnology, and finance. Despite
their immense success in practice, there is limited mathematical understanding
of neural networks. We mathematically study neural networks in the asymptotic
regime of simultaneously (A) large network sizes and (B) large numbers of
stochastic gradient descent training iterations. We rigorously prove that the
empirical distribution of the neural network parameters converges to the
solution of a nonlinear partial differential equation. This result can be
considered a law of large numbers for neural networks. In addition, a
consequence of our analysis is that the trained parameters of the neural
network asymptotically become independent, a property which is commonly called
"propagation of chaos".

This paper studies the effects of price impact upon optimal investment, as
well as the pricing of, and demand for, derivative contracts. Assuming market
makers have exponential preferences, we show for general utility functions that
a large investor's optimal investment problem with price impact can be
re-expressed as a constrained optimization problem in a fictitious market without
price impact. While typically the (random) constraint set is neither closed nor
convex, in several important cases of interest, the constraint is nonbinding.
In these instances, we explicitly identify optimal demands for derivative
contracts, and state three notions of an arbitrage-free price. Due to price
impact, even if a price is not arbitrage-free, the resulting arbitrage
opportunity only exists for limited position sizes, and might be ignored
because of hedging considerations. Lastly, in a segmented market where large
investors interact with local market makers, we show equilibrium positions in
derivative contracts are inversely proportional to the market makers'
representative risk aversion. Thus, large positions endogenously arise either
as market makers approach risk neutrality, or as the number of market makers
becomes large.

We provide a unifying treatment of pathwise moderate deviations for models
commonly used in financial applications, and for related integrated
functionals. Suitable scaling allows us to transfer these results into
small-time, large-time and tail asymptotics for diffusions, as well as for
option prices and realised variances. In passing, we highlight some intuitive
relationships between moderate deviations rate functions and their large
deviations counterparts; these turn out to be useful for numerical purposes, as
large deviations rate functions are often difficult to compute.
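A textbook i.i.d. case shows the kind of relationship meant here. It is assumed purely for illustration and differs from the pathwise setting of the paper. For the sample mean of Bernoulli($p$) variables, the Cram\'{e}r rate function is $I(x) = x\log\frac{x}{p} + (1-x)\log\frac{1-x}{1-p}$. Its second-order expansion at $x = p$ is $\frac{(x-p)^2}{2p(1-p)}$, which is the moderate deviations rate in this case. The function names are hypothetical.

```python
import math


def bernoulli_ld_rate(x, p):
    """Cramer large deviations rate function for the mean of Bernoulli(p) draws (0 < x < 1)."""
    return x * math.log(x / p) + (1 - x) * math.log((1 - x) / (1 - p))


def bernoulli_md_rate(x, p):
    """Quadratic moderate deviations rate (x - p)^2 / (2 * variance)."""
    return (x - p) ** 2 / (2 * p * (1 - p))
```

Near $p$, the two rates agree and their ratio tends to 1. Far from $p$, the quadratic is only a crude proxy. The cheaper moderate deviations rate can therefore serve as a first approximation to the harder large deviations one.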

We study a large deviation principle for a system of stochastic
reaction-diffusion equations (SRDEs) with a separation of fast and slow
components and small noise in the slow component. The derivation of the large
deviation principle is based on the weak convergence method in infinite
dimensions, which results in studying averaging for controlled SRDEs. By
appropriate choice of the parameters, the fast process and the associated
control that arises from the weak convergence method decouple from each other.
We show that in this decoupling case one can use the weak convergence method to
characterize the limiting process via a "viable pair" that captures the
limiting controlled dynamics and the effective invariant measure
simultaneously. The characterization of the limit of the controlled slow-fast
processes in terms of a viable pair enables us to obtain a variational
representation of the large deviation action functional. Due to the
infinite-dimensional nature of our setup, the proof of tightness, the analysis
of the limit process, and in particular the proof of the large deviations
lower bound are considerably more delicate here than in the
finite-dimensional situation. Smoothness properties of optimal controls in
infinite dimensions (a necessary step for the large deviations lower bound)
need to be established. We emphasize that many issues present in the
infinite-dimensional case are completely absent in finite dimensions.

In this article, we address the issues that come up in the design of
importance sampling schemes for rare events associated to stochastic dynamical
systems. We focus on the issue of metastability and on the effect of multiple
scales. We discuss why seemingly reasonable schemes that follow large
deviations optimal paths may perform poorly in practice, even though they are
asymptotically optimal. Pre-asymptotic optimality is important when one deals
with metastable dynamics, and we discuss possible ways to address this
issue. Moreover, we discuss how the effect of multiple scales (either in
periodic or random environments) on the efficient design of importance sampling
should be addressed. We discuss the mathematical and practical issues that
arise, how to overcome some of them, and future challenges.
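The basic mechanism can be sketched in the simplest static setting. This is not one of the dynamical schemes discussed above. To estimate $P(Z > a)$ for $Z \sim N(0,1)$, sample from the shifted law $N(a,1)$, which is the large-deviations-guided change of measure in this case. Then reweight by the likelihood ratio $e^{-az + a^2/2}$. `tail_prob_is` is a hypothetical helper. For metastable or multiscale dynamics, such asymptotically motivated choices can behave much worse pre-asymptotically, as explained above.

```python
import math
import random


def tail_prob_is(a, n=20000, seed=0):
    """Importance sampling estimate of P(Z > a), Z ~ N(0, 1), via samples from N(a, 1).

    Returns (estimate, estimated relative error). The likelihood ratio
    dP/dQ at z is exp(-a z + a^2 / 2).
    """
    rng = random.Random(seed)
    vals = []
    for _ in range(n):
        z = rng.gauss(a, 1.0)
        vals.append(math.exp(-a * z + 0.5 * a * a) if z > a else 0.0)
    mean = sum(vals) / n
    var = sum((v - mean) ** 2 for v in vals) / (n - 1)
    return mean, math.sqrt(var / n) / mean
```

For $a = 4$, the target is about $3.2 \times 10^{-5}$. Plain Monte Carlo with $2 \times 10^4$ samples would often see no hits at all. The reweighted estimator, in contrast, has a relative error well under 5 percent.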

We consider partially observed multiscale diffusion models that are specified
up to an unknown vector parameter. We establish for a very general class of
test functions that the filter of the original model converges to a filter of
reduced dimension. Then, this result is used to justify statistical estimation
for the unknown parameters of interest based on the model of reduced dimension
but using the original available data. This allows one to learn the unknown
parameters of interest while working in lower dimensions, as opposed to working
with the original high-dimensional system. Simulation studies support and
illustrate the theoretical results.

The goal of this paper is to study organized flocking behavior and systemic
risk in heterogeneous mean-field interacting diffusions. We illustrate in a
number of case studies the effect of heterogeneity on systemic risk in the
system, i.e., the risk that several agents default simultaneously
as a result of interconnections. We also investigate the effect of
heterogeneity on the "flocking behavior" of different agents, i.e., when agents
with different dynamics end up following very similar paths that closely track
the mean behavior of the system. Using Laplace asymptotics, we derive an
asymptotic formula for the tail of the loss distribution as the number of
agents grows to infinity. This characterizes the tail of the loss distribution
and the effect of the heterogeneity of the network on the tail loss
probability.
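A Monte Carlo sketch makes the notions of heterogeneity and loss concrete. The model below is an assumption for illustration, not necessarily the one analyzed in the paper:

- agents move in a double-well potential and are pulled toward the empirical mean;
- agents are split into groups with different (hypothetical) parameters `(n, h, theta, sigma)`;
- an agent counts as defaulted if it ends in the right-hand well.

Repeating the simulation over many seeds gives an empirical loss distribution, the kind of object whose tail is characterized asymptotically above.

```python
import math
import random


def simulate_loss_fraction(groups, T=5.0, dt=0.01, seed=0):
    """Euler-Maruyama simulation of heterogeneous mean-field diffusions (sketch).

    Assumed model: for agent i in a group with parameters (n, h, theta, sigma),
        dX_i = -h * V'(X_i) dt + theta * (Xbar - X_i) dt + sigma dW_i,
    with double-well V(x) = x^4/4 - x^2/2 and Xbar the empirical mean of all
    agents. Agents start in the "healthy" well at -1; the returned loss is the
    fraction of agents in the "default" well (X_i > 0) at time T.
    """
    rng = random.Random(seed)
    params = [(h, theta, sigma) for n, h, theta, sigma in groups for _ in range(n)]
    x = [-1.0] * len(params)
    sqrt_dt = math.sqrt(dt)
    for _ in range(int(round(T / dt))):
        xbar = sum(x) / len(x)  # mean-field interaction term
        x = [xi + (-h * (xi**3 - xi) + theta * (xbar - xi)) * dt
             + sigma * sqrt_dt * rng.gauss(0.0, 1.0)
             for xi, (h, theta, sigma) in zip(x, params)]
    return sum(1 for xi in x if xi > 0.0) / len(x)
```

For example, `groups = [(50, 1.0, 1.0, 0.5), (50, 1.0, 0.2, 1.0)]` mixes a tightly coupled, low-noise group with a loosely coupled, noisier one.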

In this paper, we prove the moderate deviations principle (MDP) for a general
system of slow-fast dynamics. We provide a unified approach, based on weak
convergence ideas and stochastic control arguments, that covers both the
averaging and the homogenization regimes. We allow the coefficients to be
defined on the whole space, not just the torus, and allow the noises driving the
slow and fast processes to be arbitrarily correlated. As in the large deviations
case, the methodology that we follow allows the construction of provably
efficient Monte Carlo methods for rare events that fall into the moderate
deviations regime.

The goal of this paper is to develop provably efficient importance sampling
Monte Carlo methods for the estimation of rare events within the class of
linear stochastic partial differential equations (SPDEs). We find that if a
spectral gap of appropriate size exists, then one can identify a
lower-dimensional manifold where the rare event takes place. This allows one to
build importance sampling changes of measure that perform provably well even
pre-asymptotically (i.e., for small but nonzero noise size) without
degrading in performance due to infinite dimensionality or due to long
simulation time horizons. Simulation studies supplement and illustrate the
theoretical results.

Stochastic gradient descent in continuous time (SGDCT) provides a
computationally efficient method for the statistical learning of
continuous-time models, which are widely used in science, engineering, and
finance. The SGDCT algorithm follows a (noisy) descent direction along a
continuous stream of data. SGDCT performs an online parameter update in
continuous time, with the parameter updates $\theta_t$ satisfying a stochastic
differential equation. We prove that $\lim_{t \rightarrow \infty} \nabla \bar
g(\theta_t) = 0$ where $\bar g$ is a natural objective function for the
estimation of the continuous-time dynamics. The convergence proof leverages
ergodicity by using an appropriate Poisson equation to help describe the
evolution of the parameters for large times. SGDCT can also be used to solve
continuous-time optimization problems, such as pricing American options. For
certain continuous-time problems, SGDCT has some promising advantages compared
to a traditional stochastic gradient descent algorithm. As an example
application, SGDCT is combined with a deep neural network to price
high-dimensional American
options (up to 100 dimensions).

In this paper we consider a fractional stochastic volatility model, that is, a
model in which the volatility may exhibit long-range dependent or
rough/anti-persistent behavior. We propose a dynamic sequential Monte Carlo
methodology that is applicable to both long-memory and anti-persistent processes
in order to estimate the volatility as well as the unknown parameters of the
model. We establish a central limit theorem for the state and parameter filters
and we study asymptotic properties (consistency and asymptotic normality) for
the filter. We illustrate our results with a simulation study and we apply our
method to estimating the volatility and the parameters of a long-range
dependent model for S&P 500 data.
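The sketch below shows a generic bootstrap particle filter for readers less familiar with sequential Monte Carlo. It is a swapped-in baseline, not the paper's dynamic methodology. The model is an assumed short-memory log-volatility model, $h_t = \mu + \phi(h_{t-1}-\mu) + \sigma_\eta \eta_t$ and $y_t = e^{h_t/2}\varepsilon_t$, with known parameters. The sketch does not capture long-range dependence or roughness, and it omits parameter estimation. Names and constants are illustrative.

```python
import math
import random


def bootstrap_filter(ys, mu, phi, sigma_eta, n_particles=500, seed=0):
    """Bootstrap particle filter for an assumed AR(1) log-volatility model.

    State:       h_t = mu + phi * (h_{t-1} - mu) + sigma_eta * eta_t
    Observation: y_t = exp(h_t / 2) * eps_t, with eta_t, eps_t ~ N(0, 1).
    Returns filtered means E[h_t | y_1..y_t] and a log-likelihood estimate.
    """
    rng = random.Random(seed)
    sd0 = sigma_eta / math.sqrt(1.0 - phi**2)  # stationary std of h (needs |phi| < 1)
    parts = [mu + sd0 * rng.gauss(0.0, 1.0) for _ in range(n_particles)]
    means, loglik = [], 0.0
    for y in ys:
        # 1. propagate each particle through the state transition
        parts = [mu + phi * (h - mu) + sigma_eta * rng.gauss(0.0, 1.0) for h in parts]
        # 2. weight by the observation density N(y; 0, exp(h)), in log space
        logw = [-0.5 * (math.log(2.0 * math.pi) + h + y * y * math.exp(-h)) for h in parts]
        m = max(logw)
        w = [math.exp(lw - m) for lw in logw]
        s = sum(w)
        loglik += m + math.log(s / n_particles)
        means.append(sum(wi * h for wi, h in zip(w, parts)) / s)
        # 3. multinomial resampling
        parts = rng.choices(parts, weights=w, k=n_particles)
    return means, loglik
```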

We consider a general class of non-gradient hypoelliptic Langevin diffusions
and study two related questions. The first one is large deviations for
hypoelliptic multiscale diffusions. The second one is small mass asymptotics of
the invariant measure corresponding to hypoelliptic Langevin operators and of
related hypoelliptic Poisson equations. The invariant measure corresponding to
the hypoelliptic problem and appropriate hypoelliptic Poisson equations enter
the large deviations rate function due to the multiscale effects. Based on the
small mass asymptotics we derive that the large deviations behavior of the
multiscale hypoelliptic diffusion is consistent with the large deviations
behavior of its overdamped counterpart. Additionally, we rigorously obtain
asymptotic expansions, with respect to the mass parameter, of the density of
the invariant measure and of the solutions to hypoelliptic Poisson equations,
characterizing the order of convergence. The proof of convergence of
invariant measures is of independent interest, as it involves an improvement of
the hypocoercivity result for the kinetic Fokker-Planck equation. We do not
restrict attention to gradient drifts, and our proof provides explicit
information on how the bounds of interest depend on the mass
parameter.

It is well known in many settings that reversible Langevin diffusions in
confining potentials converge to equilibrium exponentially fast. Adding
irreversible perturbations to the drift of a Langevin diffusion that maintain
the same invariant measure accelerates its convergence to stationarity. Many
existing works thus advocate the use of such nonreversible dynamics for
sampling. When implementing Markov chain Monte Carlo (MCMC) algorithms using
time discretisations of such stochastic differential equations (SDEs), one can
append the usual Metropolis-Hastings accept-reject step to the discretisation,
and this is often done in practice because the accept-reject step eliminates
bias. On the other hand, such a step makes the resulting chain reversible. It
is not known whether adding the accept-reject step preserves the faster mixing
properties of the nonreversible dynamics. In this paper, we address this gap
between theory and practice by analyzing the optimal scaling of MCMC algorithms
constructed from proposal moves that are time-step Euler discretisations of an
irreversible SDE, for high-dimensional Gaussian target measures. We call the
resulting algorithm the \imala, in comparison to the classical MALA algorithm
(here {\em ip} stands for irreversible proposal). In order to quantify how the cost
of the algorithm scales with the dimension $N$, we prove invariance principles
for the appropriately rescaled chain. In contrast to the usual MALA algorithm,
we show that two regimes can arise asymptotically: (i) a diffusive regime,
as in the MALA algorithm, and (ii) a ``fluid'' regime where the limit is an
ordinary differential equation. We provide concrete examples where the limit is
a diffusion, as in the standard MALA, but with provably higher limiting
acceptance probabilities. Numerical results are also given corroborating the
theory.
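The proposal construction can be sketched for an assumed standard Gaussian target in $\mathbb{R}^N$, for which $\nabla \log \pi(x) = -x$. The proposal is an Euler step of size $h$ for the irreversible SDE $dX_t = (I + S)\nabla\log\pi(X_t)\,dt + \sqrt{2}\,dW_t$ with a skew-symmetric $S$. A Metropolis-Hastings accept-reject step follows. The particular $S$ (rotations on coordinate pairs, scaled by `gamma`), the step size, and the function names are illustrative choices. The dimension-dependent scaling of $h$ studied in the paper is not shown.

```python
import math
import random


def skew(v, gamma):
    """Apply S = gamma * blockdiag([[0, 1], [-1, 0]], ...) to v (S^T = -S)."""
    out = [0.0] * len(v)
    for k in range(0, len(v) - 1, 2):
        out[k] = gamma * v[k + 1]
        out[k + 1] = -gamma * v[k]
    return out


def drift(x, gamma):
    """(I + S) grad log pi(x) for the standard Gaussian target, grad log pi(x) = -x."""
    g = [-xi for xi in x]
    return [gi + si for gi, si in zip(g, skew(g, gamma))]


def log_q(y, x, h, gamma):
    """Log density, up to a constant, of proposing y from x with one Euler step."""
    mean = [xi + h * di for xi, di in zip(x, drift(x, gamma))]
    return -sum((yi - mi) ** 2 for yi, mi in zip(y, mean)) / (4.0 * h)


def ip_mala(n_dim, n_iter, h, gamma, seed=0):
    """Irreversible-proposal MALA sketch; returns (acceptance rate, mean of x_0^2)."""
    rng = random.Random(seed)
    x = [0.0] * n_dim
    accepted, second_moment = 0, 0.0
    for _ in range(n_iter):
        mean = [xi + h * di for xi, di in zip(x, drift(x, gamma))]
        y = [mi + math.sqrt(2.0 * h) * rng.gauss(0.0, 1.0) for mi in mean]
        log_pi_x = -0.5 * sum(xi * xi for xi in x)
        log_pi_y = -0.5 * sum(yi * yi for yi in y)
        # Metropolis-Hastings ratio pi(y) q(y -> x) / (pi(x) q(x -> y))
        log_alpha = log_pi_y + log_q(x, y, h, gamma) - log_pi_x - log_q(y, x, h, gamma)
        if rng.random() < math.exp(min(0.0, log_alpha)):
            x = y
            accepted += 1
        second_moment += x[0] ** 2
    return accepted / n_iter, second_moment / n_iter
```

With `gamma = 0`, the sketch reduces to standard MALA. In both cases, the accept-reject step makes the chain exactly $\pi$-invariant, so the running second moment should approach 1.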

In this paper, we study one-dimensional Markov processes with spatial delay.
Since the seminal work of Feller, we know that virtually any one-dimensional,
strong, homogeneous, continuous Markov process can be uniquely characterized
via its infinitesimal generator and the generator's domain of definition.
Unlike standard diffusions like Brownian motion, processes with spatial delay
spend positive time at a single point of space. Interestingly, the set of times
that a delay process spends at its delay point is nowhere dense and forms a
positive-measure Cantor set. The domain of definition of the generator has
restrictions involving second derivatives. In this article we provide a
pathwise characterization for processes with delay in terms of an SDE and an
occupation time formula involving the symmetric local time. This
characterization provides an explicit Doob-Meyer decomposition, demonstrating
that such processes are semimartingales and that all of stochastic calculus,
including the It\^{o} formula and the Girsanov formula, applies. We also establish an
occupation time formula linking the time that the process spends at a delay
point with its symmetric local time there. Finally, a physical example of a
stochastic dynamical system with delay is presented and analyzed.

We study utility indifference prices and optimal purchasing quantities for a
contingent claim, in an incomplete semimartingale market, in the presence of
vanishing hedging errors and/or risk aversion. Assuming that the average
indifference price converges to a well-defined limit, we prove that optimally
taken positions become large in absolute value at a specific rate. We draw
motivation from and make connections to Large Deviations theory, and in
particular, the celebrated G\"{a}rtner-Ellis theorem. We analyze a series of
well-studied examples where this limiting behavior occurs, such as fixed
markets with vanishing risk aversion, the basis risk model with high
correlation, models of large markets with vanishing trading restrictions and
the Black-Scholes-Merton model with either vanishing default probabilities or
vanishing transaction costs. Lastly, we show that the large claim regime could
naturally arise in partial equilibrium models.

We study statistical inference for small-noise-perturbed multiscale dynamical
systems. We prove consistency, asymptotic normality, and convergence of all
scaled moments of an appropriately constructed maximum likelihood estimator
(MLE) for a parameter of interest, identifying precisely its limiting variance.
We allow full dependence of coefficients on both slow and fast processes, which
take values in the full Euclidean space; coefficients in the equation for the
slow process need not be bounded and there is no assumption of periodic
dependence. The results provide a theoretical basis for calibration of
small-noise-perturbed multiscale dynamical systems. Data from numerical
simulations are presented to illustrate the theory.

In Monte Carlo methods, the Markov processes used to sample a given target
distribution usually satisfy detailed balance, i.e., they are time-reversible.
However, relatively recent results have demonstrated that appropriate
reversible and irreversible perturbations can accelerate convergence to
equilibrium. In this paper we present some general design principles which
apply to general Markov processes. Working with the generator of Markov
processes, we prove that for some of the most commonly used performance
criteria, i.e., spectral gap, asymptotic variance and large deviation
functionals, sampling is improved for appropriate reversible and irreversible
perturbations of some initially given reversible sampler. Moreover we provide
specific constructions for such reversible and irreversible perturbations for
various commonly used Markov processes, such as Markov chains and diffusions.
In the case of diffusions, we make the discussion more specific using the large
deviations rate function as a measure of performance.
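A two-dimensional Gaussian example shows how an irreversible perturbation can improve a spectral-gap-type criterion. The example is assumed for illustration. Compare two dynamics:

- the reversible diffusion $dX_t = -AX_t\,dt + \sqrt{2}\,dW_t$ with $A = \mathrm{diag}(1, a)$;
- its perturbation $dX_t = -(I + \gamma J)AX_t\,dt + \sqrt{2}\,dW_t$ with $J = \begin{pmatrix}0&1\\-1&0\end{pmatrix}$.

Because $J$ is skew-symmetric, both keep the Gaussian invariant measure with covariance $A^{-1}$. The mean $E[X_t] = e^{-Mt}x_0$, with $M = (I+\gamma J)A$, decays at the rate given by the smallest real part of the eigenvalues of $M$. The function name is hypothetical.

```python
import cmath


def ou_decay_rate(a, gamma):
    """Slowest decay rate of E[X_t] for dX = -(I + gamma J) A X dt + sqrt(2) dW.

    A = diag(1, a), J = [[0, 1], [-1, 0]], so M = (I + gamma J) A = [[1, gamma*a],
    [-gamma, a]]; the rate is the smallest real part of the eigenvalues of M.
    """
    tr = 1.0 + a
    det = a * (1.0 + gamma**2)
    disc = cmath.sqrt(tr * tr - 4.0 * det)  # complex once gamma is large enough
    eigs = ((tr + disc) / 2.0, (tr - disc) / 2.0)
    return min(e.real for e in eigs)
```

For $a = 0.1$, the reversible rate is $0.1$. Setting $\gamma = 2$ makes the eigenvalues complex with real part $(1+a)/2 = 0.55$.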

We study utility indifference prices and optimal purchasing quantities for a
non-traded contingent claim in an incomplete semimartingale market with
vanishing hedging errors. We make connections with the theory of large
deviations. We concentrate on sequences of semi-complete markets where in the
$n^{th}$ market, the claim $B_n$ admits the decomposition $B_n = D_n+Y_n$.
Here, $D_n$ is replicable by trading in the underlying assets $S_n$, but $Y_n$
is independent of $S_n$. Under broad conditions, we may assume that $Y_n$
vanishes in accordance with a large deviations principle as $n$ grows. In this
setting, for an exponential investor, we identify the limit of the average
indifference price $p_n(q_n)$, for $q_n$ units of $B_n$, as $n\rightarrow
\infty$. We show that if $q_n\rightarrow\infty$, the limiting price typically
differs from the price obtained by assuming bounded positions
$\sup_nq_n<\infty$, and the difference is explicitly identifiable using large
deviations theory. Furthermore, we show that optimal purchase quantities occur
at the large deviations scaling, and hence large positions arise endogenously
in this setting.
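A minimal setting, assumed for illustration, shows how the average price depends on position size. An exponential investor with risk aversion $\gamma$ and zero interest rate buys $q$ units of a discrete claim $Y$ that is independent of the traded assets. The replicable part $D_n$ is omitted. Under these assumptions, the average buyer indifference price is the normalized cumulant generating function $p(q) = -\frac{1}{\gamma q}\log E[e^{-\gamma q Y}]$. This is the kind of quantity the G\"{a}rtner-Ellis theorem is built on. `avg_indifference_price` is a hypothetical helper.

```python
import math


def avg_indifference_price(q, gamma, outcomes, probs):
    """Per-unit buyer indifference price for q > 0 units of a discrete claim Y.

    Assumes exponential utility with risk aversion gamma, zero interest rate,
    and Y independent of the traded assets, so that
    p(q) = -(1 / (gamma * q)) * log E[exp(-gamma * q * Y)].
    """
    z = [-gamma * q * y for y in outcomes]
    m = max(z)  # log-sum-exp shift to avoid overflow for large q
    log_mgf = m + math.log(sum(p * math.exp(zi - m) for p, zi in zip(probs, z)))
    return -log_mgf / (gamma * q)
```

As $q \to 0$, the average price tends to $E[Y]$. As $q \to \infty$, it tends to the smallest value $Y$ takes with positive probability. The price at large positions therefore differs from the small-position price.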

We consider systems of stochastic differential equations with multiple scales
and small noise and assume that the coefficients of the equations are ergodic
and stationary random fields. Our goal is to construct provably efficient
importance sampling Monte Carlo methods that allow efficient computation of
rare event probabilities or expectations of functionals that can be associated
with rare events. Standard Monte Carlo algorithms perform poorly in the small
noise limit, and hence fast simulation algorithms become relevant. The presence
of multiple scales complicates the design and the analysis of efficient
importance sampling schemes. An additional complication is the randomness of
the environment. We construct explicit changes of measure that are proven to
be logarithmically asymptotically efficient with probability one with respect to
the random environment (i.e., in the quenched sense). Numerical simulations
support the theoretical results.

We discuss importance sampling schemes for the estimation of finite time exit
probabilities of small noise diffusions that involve escape from an
equilibrium. A factor that complicates the analysis is that rest points are
included in the domain of interest. We build importance sampling schemes with
provably good performance both pre-asymptotically, that is, for fixed size of
the noise, and asymptotically, that is, as the size of the noise goes to zero,
and that do not degrade as the time horizon gets large. Simulation studies
demonstrate the theoretical results.