• ### Stochastic Gradient Descent in Continuous Time: A Central Limit Theorem(1710.04273)

June 17, 2019 math.PR, math.ST, stat.TH, q-fin.CP, stat.ML
Stochastic gradient descent in continuous time (SGDCT) provides a computationally efficient method for the statistical learning of continuous-time models, which are widely used in science, engineering, and finance. The SGDCT algorithm follows a (noisy) descent direction along a continuous stream of data. The parameter updates occur in continuous time and satisfy a stochastic differential equation. This paper analyzes the asymptotic convergence rate of the SGDCT algorithm by proving a central limit theorem (CLT) for strongly convex objective functions and, under slightly stronger conditions, for non-convex objective functions as well. An $L^{p}$ convergence rate is also proven for the algorithm in the strongly convex case. The mathematical analysis lies at the intersection of stochastic analysis and statistical learning.
• ### Selection of quasi-stationary states in the Navier-Stokes equation on the torus(1701.04850)

Oct. 16, 2018 math.DS
The two dimensional incompressible Navier-Stokes equation on $D_\delta := [0, 2\pi\delta] \times [0, 2\pi]$ with $\delta \approx 1$, periodic boundary conditions, and viscosity $0 < \nu \ll 1$ is considered. Bars and dipoles, two explicitly given quasi-stationary states of the system, evolve on the time scale $\mathcal{O}(e^{-\nu t})$ and have been shown to play a key role in its long-time evolution. Of particular interest is the role that $\delta$ plays in selecting which of these two states is observed. Recent numerical studies suggest that, after a transient period of rapid decay of the high Fourier modes, the bar state will be selected if $\delta \neq 1$, while the dipole will be selected if $\delta = 1$. Our results support this claim and seek to mathematically formalize it. We consider the system in Fourier space, project it onto a center manifold consisting of the lowest eight Fourier modes, and use this as a model to study the selection of bars and dipoles. It is shown for this ODE model that the value of $\delta$ controls the behavior of the asymptotic ratio of the low modes, thus determining the likelihood of observing a bar state or dipole after an initial transient period. Moreover, in our model, for all $\delta \approx 1$, there is an initial time period in which the high modes decay at the rapid rate $\mathcal{O}(e^{-t/\nu})$, while the low modes evolve at the slower $\mathcal{O}(e^{-\nu t})$ rate. The results for the ODE model are proven using energy estimates and invariant manifolds and further supported by formal asymptotic expansions and numerics.
• ### Analysis of multiscale integrators for multiple attractors and irreversible Langevin samplers(1606.09539)

Oct. 9, 2018 math.NA, math.PR, stat.ME
We study multiscale integrator numerical schemes for a class of stiff stochastic differential equations (SDEs). We consider multiscale SDEs with potentially multiple attractors that behave as diffusions on graphs as the stiffness parameter goes to its limit. Classical numerical discretization schemes, such as the Euler-Maruyama scheme, become unstable as the stiffness parameter converges to its limit, and appropriate multiscale integrators can correct for this. We rigorously establish the convergence of the numerical method to the related diffusion on the graph, identifying the appropriate choice of discretization parameters. Theoretical results are supplemented by numerical studies on the recently developing problem of introducing irreversibility in Langevin samplers in order to accelerate convergence to equilibrium.
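The instability of the Euler-Maruyama scheme mentioned in this abstract can be illustrated on a toy problem. The sketch below is not the multiscale integrator of the paper; it assumes a hypothetical fast Ornstein-Uhlenbeck component $dY = -(Y/\epsilon)\,dt + \sqrt{2/\epsilon}\,dW$ and only shows that the explicit scheme blows up once the step size is large relative to $\epsilon$. The function name and all parameter values are illustrative assumptions.

```python
import math
import random


def euler_maruyama_fast_ou(eps, dt, n_steps, y0=1.0, seed=0):
    """Explicit Euler-Maruyama for dY = -(Y/eps) dt + sqrt(2/eps) dW.

    Each step multiplies the previous value by (1 - dt/eps), so the
    recursion is only stable when dt/eps < 2.
    """
    rng = random.Random(seed)
    y = y0
    for _ in range(n_steps):
        dw = rng.gauss(0.0, math.sqrt(dt))  # Brownian increment over dt
        y = y - (y / eps) * dt + math.sqrt(2.0 / eps) * dw
    return y


if __name__ == "__main__":
    eps = 1e-3
    # Step resolved relative to the fast scale: iterates stay bounded.
    print(euler_maruyama_fast_ou(eps, dt=0.5 * eps, n_steps=2000))
    # Step much larger than eps: amplification factor |1 - 10| = 9 per step.
    print(euler_maruyama_fast_ou(eps, dt=10.0 * eps, n_steps=50))
```

Resolving the fast scale forces `dt` to shrink with `eps`, which is the cost that multiscale integrators are designed to avoid.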
• ### Discrete-Time Statistical Inference for Multiscale Diffusions(1709.02223)

Sept. 12, 2018 math.PR, math.ST, stat.TH, stat.ME, stat.AP
We study statistical inference for small-noise-perturbed multiscale dynamical systems under the assumption that we observe a single time series from the slow process only. We construct estimators for both averaging and homogenization regimes, based on an appropriate misspecified model motivated by a second-order stochastic Taylor expansion of the slow process with respect to a function of the time-scale separation parameter. In the case of a fixed number of observations, we establish consistency, asymptotic normality, and asymptotic statistical efficiency of a minimum contrast estimator (MCE), the limiting variance having been identified explicitly; we furthermore establish consistency and asymptotic normality of a simplified minimum contrast estimator (SMCE), which is however not in general efficient. These results are then extended to the case of high-frequency observations under a condition restricting the rate at which the number of observations may grow vis-à-vis the separation of scales. Numerical simulations illustrate the theoretical results.
• ### DGM: A deep learning algorithm for solving partial differential equations(1708.07469)

Sept. 5, 2018 math.NA, q-fin.CP, stat.ML, q-fin.MF
High-dimensional PDEs have been a longstanding computational challenge. We propose to solve high-dimensional PDEs by approximating the solution with a deep neural network which is trained to satisfy the differential operator, initial condition, and boundary conditions. Our algorithm is meshfree, which is key since meshes become infeasible in higher dimensions. Instead of forming a mesh, the neural network is trained on batches of randomly sampled time and space points. The algorithm is tested on a class of high-dimensional free boundary PDEs, which we are able to accurately solve in up to $200$ dimensions. The algorithm is also tested on a high-dimensional Hamilton-Jacobi-Bellman PDE and Burgers' equation. The deep learning algorithm approximates the general solution to the Burgers' equation for a continuum of different boundary conditions and physical conditions (which can be viewed as a high-dimensional space). We call the algorithm a "Deep Galerkin Method (DGM)" since it is similar in spirit to Galerkin methods, with the solution approximated by a neural network instead of a linear combination of basis functions. In addition, we prove a theorem regarding the approximation power of neural networks for a class of quasilinear parabolic PDEs.
• ### Mean Field Analysis of Neural Networks(1805.01053)

May 2, 2018 math.PR
Machine learning has revolutionized fields such as image, text, and speech recognition. There's also growing interest in applying machine and deep learning ideas in engineering, robotics, biotechnology, and finance. Despite their immense success in practice, there is limited mathematical understanding of neural networks. We mathematically study neural networks in the asymptotic regime of simultaneously (A) large network sizes and (B) large numbers of stochastic gradient descent training iterations. We rigorously prove that the empirical distribution of the neural network parameters converges to the solution of a nonlinear partial differential equation. This result can be considered a law of large numbers for neural networks. In addition, a consequence of our analysis is that the trained parameters of the neural network asymptotically become independent, a property which is commonly called "propagation of chaos".
• ### Optimal Investment and Derivative Demand Under Price Impact(1804.09151)

April 24, 2018 math.OC, q-fin.MF
This paper studies the effects of price impact upon optimal investment, as well as the pricing of, and demand for, derivative contracts. Assuming market makers have exponential preferences, we show for general utility functions that a large investor's optimal investment problem with price impact can be re-expressed as a constrained optimization problem in a fictitious market without price impact. While typically the (random) constraint set is neither closed nor convex, in several important cases of interest, the constraint is non-binding. In these instances, we explicitly identify optimal demands for derivative contracts, and state three notions of an arbitrage free price. Due to price impact, even if a price is not arbitrage free, the resulting arbitrage opportunity only exists for limited position sizes, and might be ignored because of hedging considerations. Lastly, in a segmented market where large investors interact with local market makers, we show equilibrium positions in derivative contracts are inversely proportional to the market makers' representative risk aversion. Thus, large positions endogenously arise either as market makers approach risk neutrality, or as the number of market makers becomes large.
• ### Pathwise moderate deviations for option pricing(1803.04483)

March 12, 2018 q-fin.PR, math.PR, q-fin.MF
We provide a unifying treatment of pathwise moderate deviations for models commonly used in financial applications, and for related integrated functionals. Suitable scaling allows us to transfer these results into small-time, large-time and tail asymptotics for diffusions, as well as for option prices and realised variances. In passing, we highlight some intuitive relationships between moderate deviations rate functions and their large deviations counterparts; these turn out to be useful for numerical purposes, as large deviations rate functions are often difficult to compute.
• ### Large deviations and averaging for systems of slow-fast stochastic reaction-diffusion equations(1710.02618)

April 30, 2019 math.PR
We study a large deviation principle for a system of stochastic reaction-diffusion equations (SRDEs) with a separation of fast and slow components and small noise in the slow component. The derivation of the large deviation principle is based on the weak convergence method in infinite dimensions, which results in studying averaging for controlled SRDEs. By appropriate choice of the parameters, the fast process and the associated control that arises from the weak convergence method decouple from each other. We show that in this decoupling case one can use the weak convergence method to characterize the limiting process via a "viable pair" that captures the limiting controlled dynamics and the effective invariant measure simultaneously. The characterization of the limit of the controlled slow-fast processes in terms of a viable pair enables us to obtain a variational representation of the large deviation action functional. Due to the infinite-dimensional nature of our set-up, the proof of tightness as well as the analysis of the limit process and in particular the proof of the large deviations lower bound is considerably more delicate here than in the finite-dimensional situation. Smoothness properties of optimal controls in infinite dimensions (a necessary step for the large deviations lower bound) need to be established. We emphasize that many issues that are present in the infinite-dimensional case are completely absent in finite dimensions.
• ### Importance sampling for metastable and multiscale dynamical systems(1707.08868)

July 27, 2017 math.PR, stat.ME, math.OC
In this article, we address the issues that come up in the design of importance sampling schemes for rare events associated to stochastic dynamical systems. We focus on the issue of metastability and on the effect of multiple scales. We discuss why seemingly reasonable schemes that follow large deviations optimal paths may perform poorly in practice, even though they are asymptotically optimal. Pre-asymptotic optimality is important when one deals with metastable dynamics, and we discuss possible ways to address this issue. Moreover, we discuss how the effect of the multiple scales (either in periodic or random environments) on the efficient design of importance sampling should be addressed. We discuss the mathematical and practical issues that come up, how to overcome some of them, and future challenges.
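For readers unfamiliar with importance sampling, the basic mechanism behind the schemes discussed in this abstract can be sketched in a static setting. The example below is not one of the dynamic schemes of the article; it assumes the simplest possible rare event, $P(Z > a)$ for a standard normal $Z$, and uses a mean-shift change of measure with likelihood-ratio reweighting. The function names and the choice of shift are illustrative assumptions.

```python
import math
import random


def tail_prob_importance_sampling(level, n_samples, seed=0):
    """Estimate P(Z > level) for Z ~ N(0, 1) by sampling from N(level, 1)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        x = rng.gauss(level, 1.0)  # sample under the shifted measure
        if x > level:
            # Likelihood ratio phi(x) / phi(x - level) of N(0,1) over N(level,1)
            total += math.exp(-level * x + 0.5 * level * level)
    return total / n_samples


def tail_prob_exact(level):
    """Closed-form Gaussian tail probability, used only as a reference."""
    return 0.5 * math.erfc(level / math.sqrt(2.0))


if __name__ == "__main__":
    a = 4.0
    print(tail_prob_importance_sampling(a, 20000), tail_prob_exact(a))
```

Plain Monte Carlo with the same number of samples would typically see no hits at this level. The point made in the abstract is that for dynamical systems with metastability, an analogous change of measure that is asymptotically optimal can still behave poorly before the asymptotic regime is reached.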
• ### Dimension Reduction in Statistical Estimation of Partially Observed Multiscale Processes(1607.06158)

June 26, 2017 math.PR, math.ST, stat.TH, stat.ME, q-fin.ST
We consider partially observed multiscale diffusion models that are specified up to an unknown vector parameter. We establish for a very general class of test functions that the filter of the original model converges to a filter of reduced dimension. Then, this result is used to justify statistical estimation for the unknown parameters of interest based on the model of reduced dimension but using the original available data. This allows one to learn the unknown parameters of interest while working in lower dimensions, as opposed to working with the original high-dimensional system. Simulation studies support and illustrate the theoretical results.
• ### The effect of heterogeneity on flocking behavior and systemic risk(1607.08287)

June 8, 2017 q-fin.RM, math.PR, q-fin.PM
The goal of this paper is to study organized flocking behavior and systemic risk in heterogeneous mean-field interacting diffusions. We illustrate in a number of case studies the effect of heterogeneity in the behavior of systemic risk in the system, i.e., the risk that several agents default simultaneously as a result of interconnections. We also investigate the effect of heterogeneity on the "flocking behavior" of different agents, i.e., when agents with different dynamics end up following very similar paths and follow closely the mean behavior of the system. Using Laplace asymptotics, we derive an asymptotic formula for the tail of the loss distribution as the number of agents grows to infinity. This characterizes the tail of the loss distribution and the effect of the heterogeneity of the network on the tail loss probability.
• ### Moderate deviations principle for systems of slow-fast diffusions(1611.05903)

June 1, 2017 math.PR
In this paper, we prove the moderate deviations principle (MDP) for a general system of slow-fast dynamics. We provide a unified approach, based on weak convergence ideas and stochastic control arguments, that covers both the averaging and the homogenization regimes. We allow the coefficients to be in the whole space and not just the torus and allow the noises driving the slow and fast processes to be correlated arbitrarily. Similar to the large deviation case, the methodology that we follow allows construction of provably efficient Monte Carlo methods for rare events that fall into the moderate deviations regime.
• ### Rare event simulation via importance sampling for linear SPDE's(1609.04365)

May 4, 2017 math.PR, stat.ME, math.OC
The goal of this paper is to develop provably efficient importance sampling Monte Carlo methods for the estimation of rare events within the class of linear stochastic partial differential equations (SPDEs). We find that if a spectral gap of appropriate size exists, then one can identify a lower dimensional manifold where the rare event takes place. This allows one to build importance sampling changes of measures that perform provably well even pre-asymptotically (i.e. for small but non-zero size of the noise) without degrading in performance due to infinite dimensionality or due to long simulation time horizons. Simulation studies supplement and illustrate the theoretical results.
• ### Stochastic Gradient Descent in Continuous Time(1611.05545)

April 17, 2017 math.PR, math.ST, stat.TH, math.OC, stat.ML
Stochastic gradient descent in continuous time (SGDCT) provides a computationally efficient method for the statistical learning of continuous-time models, which are widely used in science, engineering, and finance. The SGDCT algorithm follows a (noisy) descent direction along a continuous stream of data. SGDCT performs an online parameter update in continuous time, with the parameter updates $\theta_t$ satisfying a stochastic differential equation. We prove that $\lim_{t \rightarrow \infty} \nabla \bar g(\theta_t) = 0$ where $\bar g$ is a natural objective function for the estimation of the continuous-time dynamics. The convergence proof leverages ergodicity by using an appropriate Poisson equation to help describe the evolution of the parameters for large times. SGDCT can also be used to solve continuous-time optimization problems, such as American options. For certain continuous-time problems, SGDCT has some promising advantages compared to a traditional stochastic gradient descent algorithm. As an example application, SGDCT is combined with a deep neural network to price high-dimensional American options (up to 100 dimensions).
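A time-discretized sketch may help make the continuous-time update concrete. It assumes a hypothetical one-parameter model $dX_t = -\theta^* X_t\,dt + dW_t$ with model drift $f(x,\theta) = -\theta x$ and unit diffusion, and uses an update of the form $d\theta_t = \alpha_t \nabla_\theta f(X_t,\theta_t)\,(dX_t - f(X_t,\theta_t)\,dt)$ with a decaying learning rate. This specific model, the learning-rate schedule, and the Euler discretization are assumptions for illustration and are not taken from the abstract.

```python
import math
import random


def sgdct_ou(theta_true=2.0, theta0=0.0, dt=0.01, t_end=400.0,
             lr_scale=10.0, seed=0):
    """Online estimate of theta in dX = -theta*X dt + dW from one data stream."""
    rng = random.Random(seed)
    x, theta, t = 0.0, theta0, 0.0
    n_steps = int(round(t_end / dt))
    for _ in range(n_steps):
        dw = rng.gauss(0.0, math.sqrt(dt))
        dx = -theta_true * x * dt + dw   # observed data increment
        alpha = lr_scale / (1.0 + t)     # decaying learning rate (assumed form)
        model_drift = -theta * x         # f(x, theta)
        grad_f = -x                      # derivative of f with respect to theta
        # Noisy descent step driven by the residual dX - f(X, theta) dt
        theta += alpha * grad_f * (dx - model_drift * dt)
        x += dx
        t += dt
    return theta


if __name__ == "__main__":
    print(sgdct_ou())  # should end up in the neighbourhood of theta_true
```

The parameter is updated at every increment of the data stream rather than after a full pass over a stored data set, which is the online feature the abstract emphasizes.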
• ### Sequential Monte Carlo for fractional Stochastic Volatility Models(1508.02651)

Feb. 25, 2017 stat.CO, math.ST, stat.TH, stat.ME
In this paper we consider a fractional stochastic volatility model, that is, a model in which the volatility may exhibit a long-range dependent or a rough/antipersistent behavior. We propose a dynamic sequential Monte Carlo methodology that is applicable to both long memory and antipersistent processes in order to estimate the volatility as well as the unknown parameters of the model. We establish a central limit theorem for the state and parameter filters and we study asymptotic properties (consistency and asymptotic normality) for the filter. We illustrate our results with a simulation study and we apply our method to estimating the volatility and the parameters of a long-range dependent model for S&P 500 data.
• ### Hypoelliptic multiscale Langevin diffusions: Large deviations, invariant measures and small mass asymptotics(1506.06181)

Feb. 23, 2017 math.AP, math.PR, math.DS
We consider a general class of non-gradient hypoelliptic Langevin diffusions and study two related questions. The first one is large deviations for hypoelliptic multiscale diffusions. The second one is small mass asymptotics of the invariant measure corresponding to hypoelliptic Langevin operators and of related hypoelliptic Poisson equations. The invariant measure corresponding to the hypoelliptic problem and appropriate hypoelliptic Poisson equations enter the large deviations rate function due to the multiscale effects. Based on the small mass asymptotics we derive that the large deviations behavior of the multiscale hypoelliptic diffusion is consistent with the large deviations behavior of its overdamped counterpart. Additionally, we rigorously obtain an asymptotic expansion of the solution to the related density of the invariant measure and to hypoelliptic Poisson equations with respect to the mass parameter, characterizing the order of convergence. The proof of convergence of invariant measures is of independent interest, as it involves an improvement of the hypocoercivity result for the kinetic Fokker-Planck equation. We do not restrict attention to gradient drifts and our proof provides explicit information on the dependence of the bounds of interest in terms of the mass parameter.
• ### Optimal Scaling of the MALA algorithm with Irreversible Proposals for Gaussian targets(1702.01777)

July 1, 2019 math.PR, math.ST, stat.TH, stat.ME
It is well known in many settings that reversible Langevin diffusions in confining potentials converge to equilibrium exponentially fast. Adding irreversible perturbations to the drift of a Langevin diffusion that maintain the same invariant measure accelerates its convergence to stationarity. Many existing works thus advocate the use of such non-reversible dynamics for sampling. When implementing Markov Chain Monte Carlo algorithms (MCMC) using time discretisations of such Stochastic Differential Equations (SDEs), one can append the discretization with the usual Metropolis-Hastings accept-reject step and this is often done in practice because the accept-reject step eliminates bias. On the other hand, such a step makes the resulting chain reversible. It is not known whether adding the accept-reject step preserves the faster mixing properties of the non-reversible dynamics. In this paper, we address this gap between theory and practice by analyzing the optimal scaling of MCMC algorithms constructed from proposal moves that are time-step Euler discretisations of an irreversible SDE, for high dimensional Gaussian target measures. We call the resulting algorithm the ipMALA, in comparison to the classical MALA algorithm (here "ip" is for irreversible proposal). In order to quantify how the cost of the algorithm scales with the dimension $N$, we prove invariance principles for the appropriately rescaled chain. In contrast to the usual MALA algorithm, we show that there could be two regimes asymptotically: (i) a diffusive regime, as in the MALA algorithm and (ii) a "fluid" regime where the limit is an ordinary differential equation. We provide concrete examples where the limit is a diffusion, as in the standard MALA, but with provably higher limiting acceptance probabilities. Numerical results are also given corroborating the theory.
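The construction described in this abstract (an Euler step of an irreversible SDE as the proposal, followed by a Metropolis-Hastings accept-reject step) can be sketched for a two-dimensional standard Gaussian target. The sketch assumes a proposal drift $(I + \gamma J)\nabla \log \pi(x)$ with the antisymmetric matrix $J = \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix}$; the step size `h`, the strength `gamma`, and the function name are hypothetical choices, and the scaling analysis of the paper is not reproduced.

```python
import math
import random


def ip_mala_gaussian(n_steps, h=0.2, gamma=0.5, seed=0):
    """Metropolis-Hastings chain on R^2 targeting N(0, I), with proposals
    given by one Euler step of an SDE with drift (I + gamma*J) grad log pi."""
    rng = random.Random(seed)

    def log_target(x):
        return -0.5 * (x[0] ** 2 + x[1] ** 2)

    def drift(x):
        g0, g1 = -x[0], -x[1]  # grad log pi(x) for the standard Gaussian
        return (g0 + gamma * g1, g1 - gamma * g0)  # apply I + gamma*J

    def log_q(x, y):
        # Log density (up to a constant) of proposing y from x
        b = drift(x)
        d0 = y[0] - x[0] - h * b[0]
        d1 = y[1] - x[1] - h * b[1]
        return -(d0 * d0 + d1 * d1) / (4.0 * h)

    x = (0.0, 0.0)
    scale = math.sqrt(2.0 * h)
    samples, accepted = [], 0
    for _ in range(n_steps):
        b = drift(x)
        y = (x[0] + h * b[0] + scale * rng.gauss(0.0, 1.0),
             x[1] + h * b[1] + scale * rng.gauss(0.0, 1.0))
        log_ratio = (log_target(y) + log_q(y, x)
                     - log_target(x) - log_q(x, y))
        # Accept with probability min(1, exp(log_ratio))
        if rng.random() < math.exp(min(0.0, log_ratio)):
            x = y
            accepted += 1
        samples.append(x)
    return samples, accepted / n_steps


if __name__ == "__main__":
    chain, rate = ip_mala_gaussian(20000)
    second_moment = sum(s[0] ** 2 for s in chain) / len(chain)
    print(second_moment, rate)
```

Setting `gamma=0.0` recovers the usual MALA proposal, which gives a simple way to compare acceptance rates between the two under the same step size.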
• ### Markov processes with spatial delay: path space characterization, occupation time and properties(1601.03759)

Oct. 5, 2016 math.PR
In this paper, we study one dimensional Markov processes with spatial delay. Since the seminal work of Feller, we know that virtually any one dimensional, strong, homogeneous, continuous Markov process can be uniquely characterized via its infinitesimal generator and the generator's domain of definition. Unlike standard diffusions like Brownian motion, processes with spatial delay spend positive time at a single point of space. Interestingly, the set of times that a delay process spends at its delay point is nowhere dense and forms a positive measure Cantor set. The domain of definition of the generator has restrictions involving second derivatives. In this article we provide a pathwise characterization for processes with delay in terms of an SDE and an occupation time formula involving the symmetric local time. This characterization provides an explicit Doob-Meyer decomposition, demonstrating that such processes are semi-martingales and that all of stochastic calculus, including the Itô formula and the Girsanov formula, applies. We also establish an occupation time formula linking the time that the process spends at a delay point with its symmetric local time there. A physical example of a stochastic dynamical system with delay is lastly presented and analyzed.
• ### The pricing of contingent claims and optimal positions in asymptotically complete markets(1509.06210)

Sept. 22, 2016 math.PR, q-fin.MF
We study utility indifference prices and optimal purchasing quantities for a contingent claim, in an incomplete semi-martingale market, in the presence of vanishing hedging errors and/or risk aversion. Assuming that the average indifference price converges to a well defined limit, we prove that optimally taken positions become large in absolute value at a specific rate. We draw motivation from and make connections to Large Deviations theory, and in particular, the celebrated Gärtner-Ellis theorem. We analyze a series of well studied examples where this limiting behavior occurs, such as fixed markets with vanishing risk aversion, the basis risk model with high correlation, models of large markets with vanishing trading restrictions and the Black-Scholes-Merton model with either vanishing default probabilities or vanishing transaction costs. Lastly, we show that the large claim regime could naturally arise in partial equilibrium models.
• ### Statistical Inference for Perturbed Multiscale Dynamical Systems(1504.07645)

June 15, 2016 math.PR, math.ST, stat.TH
We study statistical inference for small-noise-perturbed multiscale dynamical systems. We prove consistency, asymptotic normality, and convergence of all scaled moments of an appropriately-constructed maximum likelihood estimator (MLE) for a parameter of interest, identifying precisely its limiting variance. We allow full dependence of coefficients on both slow and fast processes, which take values in the full Euclidean space; coefficients in the equation for the slow process need not be bounded and there is no assumption of periodic dependence. The results provide a theoretical basis for calibration of small-noise-perturbed multiscale dynamical systems. Data from numerical simulations are presented to illustrate the theory.
• ### Improving the convergence of reversible samplers(1601.08118)

June 9, 2016 math-ph, math.MP, math.PR, stat.ME
In Monte-Carlo methods the Markov processes used to sample a given target distribution usually satisfy detailed balance, i.e. they are time-reversible. However, relatively recent results have demonstrated that appropriate reversible and irreversible perturbations can accelerate convergence to equilibrium. In this paper we present some general design principles which apply to general Markov processes. Working with the generator of Markov processes, we prove that for some of the most commonly used performance criteria, i.e., spectral gap, asymptotic variance and large deviation functionals, sampling is improved for appropriate reversible and irreversible perturbations of some initially given reversible sampler. Moreover we provide specific constructions for such reversible and irreversible perturbations for various commonly used Markov processes, such as Markov chains and diffusions. In the case of diffusions, we make the discussion more specific using the large deviations rate function as a measure of performance.
• ### Indifference pricing for Contingent Claims: Large Deviations Effects(1410.0384)

Feb. 11, 2016 math.PR, q-fin.MF
We study utility indifference prices and optimal purchasing quantities for a non-traded contingent claim in an incomplete semi-martingale market with vanishing hedging errors. We make connections with the theory of large deviations. We concentrate on sequences of semi-complete markets where in the $n^{th}$ market, the claim $B_n$ admits the decomposition $B_n = D_n+Y_n$. Here, $D_n$ is replicable by trading in the underlying assets $S_n$, but $Y_n$ is independent of $S_n$. Under broad conditions, we may assume that $Y_n$ vanishes in accordance with a large deviations principle as $n$ grows. In this setting, for an exponential investor, we identify the limit of the average indifference price $p_n(q_n)$, for $q_n$ units of $B_n$, as $n\rightarrow \infty$. We show that if $|q_n|\rightarrow\infty$, the limiting price typically differs from the price obtained by assuming bounded positions $\sup_n|q_n|<\infty$, and the difference is explicitly identifiable using large deviations theory. Furthermore, we show that optimal purchase quantities occur at the large deviations scaling, and hence large positions arise endogenously in this setting.
• ### Rare event simulation for multiscale diffusions in random environments(1410.0386)

Sept. 28, 2015 math.PR, stat.ME
We consider systems of stochastic differential equations with multiple scales and small noise and assume that the coefficients of the equations are ergodic and stationary random fields. Our goal is to construct provably-efficient importance sampling Monte Carlo methods that allow efficient computation of rare event probabilities or expectations of functionals that can be associated with rare events. Standard Monte Carlo algorithms perform poorly in the small noise limit and hence fast simulation algorithms become relevant. The presence of multiple scales complicates the design and the analysis of efficient importance sampling schemes. An additional complication is the randomness of the environment. We construct explicit changes of measures that are proven to be logarithmic asymptotically efficient with probability one with respect to the random environment (i.e., in the quenched sense). Numerical simulations support the theoretical results.
• ### Escaping from an attractor: Importance sampling and rest points I(1303.0450)

Sept. 9, 2015 math.PR, math.OC
We discuss importance sampling schemes for the estimation of finite time exit probabilities of small noise diffusions that involve escape from an equilibrium. A factor that complicates the analysis is that rest points are included in the domain of interest. We build importance sampling schemes with provably good performance both pre-asymptotically, that is, for fixed size of the noise, and asymptotically, that is, as the size of the noise goes to zero, and that do not degrade as the time horizon gets large. Simulation studies demonstrate the theoretical results.