• Containing the recent West African outbreak of Ebola virus (EBOV) required the deployment of substantial global resources. Operationally, health workers and surveillance teams treated cases, collected genetic samples, and tracked case contacts. Despite the substantial progress in analyzing and modeling EBOV epidemiological data, a complete characterization of the spatiotemporal spread of Ebola cases remains a challenge. In this work, we offer a novel perspective on the EBOV epidemic that utilizes virus genome sequences to inform population-level, spatial models. Calibrated to phylogenetic linkages, these dynamic spatial models provide unique insight into the disease mobility of EBOV in Sierra Leone. Further, we developed a model selection framework that identifies important epidemiological variables influencing the spatiotemporal propagation of EBOV. Consistent with other investigations, our results show that the spread of EBOV during the beginning and middle portions of the epidemic strongly depended on the size of and distance between populations. Our analysis also revealed a substantial decline in the dependence on population size at the end of the epidemic, coinciding with the large-scale intervention campaign: Operation Western Area Surge. More generally, we believe this framework, pairing molecular diagnostics with dynamic models, has the potential to be a powerful forecasting tool and to offer operationally relevant guidance for surveillance and sampling strategies during an epidemic.
  • We develop an algorithm for model selection which allows for the consideration of a combinatorially large number of candidate models governing a dynamical system. The innovation circumvents a disadvantage of standard model selection, which typically limits the number of candidate models considered due to the intractability of computing information criteria. Using a recently developed sparse identification of nonlinear dynamics algorithm, the sub-selection of candidate models near the Pareto frontier allows for a tractable computation of Akaike information criterion (AIC) or Bayesian information criterion (BIC) scores for the remaining candidate models. The information criteria hierarchically rank the most informative models, enabling the automatic and principled selection of the model with the strongest support in relation to the time series data. Specifically, we show that AIC scores place each candidate model in the {\em strong support}, {\em weak support} or {\em no support} category. The method correctly identifies several canonical dynamical systems, including an SEIR (susceptible-exposed-infectious-recovered) disease model and the Lorenz equations, giving the correct dynamical system as the only candidate model with strong support.
  • We propose a sparse regression method capable of discovering the governing partial differential equation(s) of a given system from time series measurements in the spatial domain. The regression framework relies on sparsity-promoting techniques to select the nonlinear and partial derivative terms of the governing equations that most accurately represent the data, bypassing a combinatorially large search through all possible candidate models. The method balances model complexity and regression accuracy by selecting a parsimonious model via Pareto analysis. Time series measurements can be made in an Eulerian framework where the sensors are fixed spatially, or in a Lagrangian framework where the sensors move with the dynamics. The method is computationally efficient, robust, and demonstrated to work on a variety of canonical problems of mathematical physics including Navier-Stokes, the quantum harmonic oscillator, and the diffusion equation. Moreover, the method is capable of disambiguating between potentially non-unique dynamical terms by using multiple time series taken with different initial data. Thus for a traveling wave, the method can distinguish between a linear wave equation and the Korteweg-de Vries equation, for instance. The method provides a promising new technique for discovering governing equations and physical laws in parametrized spatio-temporal systems where first-principles derivations are intractable.
  • Understanding the interplay of order and disorder in chaotic systems is a central challenge in modern quantitative science. We present a universal, data-driven decomposition of chaos as an intermittently forced linear system. This work combines Takens' delay embedding with modern Koopman operator theory and sparse regression to obtain linear representations of strongly nonlinear dynamics. The result is a decomposition of chaotic dynamics into a linear model in the leading delay coordinates with forcing by low energy delay coordinates; we call this the Hankel alternative view of Koopman (HAVOK) analysis. This analysis is applied to the canonical Lorenz system, as well as to real-world examples such as the Earth's magnetic field reversal, and data from electrocardiogram, electroencephalogram, and measles outbreaks. In each case, the forcing statistics are non-Gaussian, with long tails corresponding to rare events that trigger intermittent switching and bursting phenomena; this forcing is highly predictive, providing a clear signature that precedes these events. Moreover, the activity of the forcing signal demarcates large coherent regions of phase space where the dynamics are approximately linear from those that are strongly nonlinear.
  • We consider the application of Koopman theory to nonlinear partial differential equations. We demonstrate that the observables chosen for constructing the Koopman operator are critical for enabling an accurate approximation to the nonlinear dynamics. If such observables can be found, then the dynamic mode decomposition algorithm can be enacted to compute a finite-dimensional approximation of the Koopman operator, including its eigenfunctions, eigenvalues and Koopman modes. Judiciously chosen observables lead to physically interpretable spatio-temporal features of the complex system under consideration and provide a connection to manifold learning methods. We demonstrate the impact of observable selection, including kernel methods, and construction of the Koopman operator on two canonical, nonlinear PDEs: Burgers' equation and the nonlinear Schr\"odinger equation. These examples serve to highlight the most pressing and critical challenge of Koopman theory: a principled way to select appropriate observables.
  • Inferring the structure and dynamics of network models is critical to understanding the functionality and control of complex systems, such as metabolic and regulatory biological networks. The increasing quality and quantity of experimental data enable statistical approaches based on information theory for model selection and goodness-of-fit metrics. We propose an alternative method to infer networked nonlinear dynamical systems by using sparsity-promoting $\ell_1$ optimization to select a subset of nonlinear interactions representing dynamics on a fully connected network. Our method generalizes the sparse identification of nonlinear dynamics (SINDy) algorithm to dynamical systems with rational function nonlinearities, such as biological networks. We show that dynamical systems with rational nonlinearities may be cast in an implicit form, where the equations may be identified in the null-space of a library of mixed nonlinearities including the state and derivative terms; this approach applies more generally to implicit dynamical systems beyond those containing rational nonlinearities. This method, implicit-SINDy, succeeds in inferring three canonical biological models: Michaelis-Menten enzyme kinetics, the regulatory network for competence in bacteria, and the metabolic network for yeast glycolysis.
  • Identifying governing equations from data is a critical step in the modeling and control of complex dynamical systems. Here, we investigate the data-driven identification of nonlinear dynamical systems with inputs and forcing using regression methods, including sparse regression. Specifically, we generalize the sparse identification of nonlinear dynamics (SINDy) algorithm to include external inputs and feedback control. This method is demonstrated on examples including the Lotka-Volterra predator--prey model and the Lorenz system with forcing and control. We also connect the present algorithm with dynamic mode decomposition (DMD) and Koopman operator theory to provide a broader context.
  • We develop a new generalization of Koopman operator theory that incorporates the effects of inputs and control. Koopman spectral analysis is a theoretical tool for the analysis of nonlinear dynamical systems. Moreover, Koopman is intimately connected to Dynamic Mode Decomposition (DMD), a method that discovers spatial-temporal coherent modes from data, connects local-linear analysis to nonlinear operator theory, and importantly creates an equation-free architecture allowing investigation of complex systems. In actuated systems, standard Koopman analysis and DMD are incapable of producing input-output models; moreover, the dynamics and the modes will be corrupted by external forcing. Our new theoretical developments extend Koopman operator theory to allow for systems with nonlinear input-output characteristics. We show how this generalization is rigorously connected to, and generalizes, a recent development called Dynamic Mode Decomposition with control (DMDc). We demonstrate this new theory on nonlinear dynamical systems, including a standard Susceptible-Infectious-Recovered model with relevance to the analysis of infectious disease data with mass vaccination (actuation).
  • In this work, we explore finite-dimensional linear representations of nonlinear dynamical systems by restricting the Koopman operator to an invariant subspace. The Koopman operator is an infinite-dimensional linear operator that evolves observable functions of the state-space of a dynamical system [Koopman 1931, PNAS]. Dominant terms in the Koopman expansion are typically computed using dynamic mode decomposition (DMD). DMD uses linear measurements of the state variables, and it has recently been shown that this may be too restrictive for nonlinear systems [Williams et al. 2015, JNLS]. Choosing nonlinear observable functions to form an invariant subspace where it is possible to obtain linear models, especially those that are useful for control, is an open challenge. Here, we investigate the choice of observable functions for Koopman analysis that enable the use of optimal linear control techniques on nonlinear problems. First, to include a cost on the state of the system, as in linear quadratic regulator (LQR) control, it is helpful to include these states in the observable subspace, as in DMD. However, we find that this is only possible when there is a single isolated fixed point, as systems with multiple fixed points or more complicated attractors are not globally topologically conjugate to a finite-dimensional linear system, and cannot be represented by a finite-dimensional linear Koopman subspace that includes the state. We then present a data-driven strategy to identify relevant observable functions for Koopman analysis using a new algorithm to determine terms in a dynamical system by sparse regression of the data in a nonlinear function space [Brunton et al. 2015, arxiv]; we show how this algorithm is related to DMD. Finally, we demonstrate how to design optimal control laws for nonlinear systems using techniques from linear optimal control on Koopman invariant subspaces.
  • The ability to discover physical laws and governing equations from data is one of humankind's greatest intellectual achievements. A quantitative understanding of dynamic constraints and balances in nature has facilitated rapid development of knowledge and enabled advanced technological achievements, including aircraft, combustion engines, satellites, and electrical power. In this work, we combine sparsity-promoting techniques and machine learning with nonlinear dynamical systems to discover governing physical equations from measurement data. The only assumption about the structure of the model is that there are only a few important terms that govern the dynamics, so that the equations are sparse in the space of possible functions; this assumption holds for many physical systems. In particular, we use sparse regression to determine the fewest terms in the dynamic governing equations required to accurately represent the data. The resulting models are parsimonious, balancing model complexity with descriptive ability while avoiding overfitting. We demonstrate the algorithm on a wide range of problems, from simple canonical systems, including linear and nonlinear oscillators and the chaotic Lorenz system, to the fluid vortex shedding behind an obstacle. The fluid example illustrates the ability of this method to discover the underlying dynamics of a system that took experts in the community nearly 30 years to resolve. We also show that this method generalizes to parameterized, time-varying, or externally forced systems.
  • We develop a new method which extends Dynamic Mode Decomposition (DMD) to incorporate the effect of control to extract low-order models from high-dimensional, complex systems. DMD finds spatial-temporal coherent modes, connects local-linear analysis to nonlinear operator theory, and provides an equation-free architecture which is compatible with compressive sensing. In actuated systems, DMD is incapable of producing an input-output model; moreover, the dynamics and the modes will be corrupted by external forcing. Our new method, Dynamic Mode Decomposition with control (DMDc), capitalizes on all of the advantages of DMD and provides the additional innovation of being able to disambiguate between the underlying dynamics and the effects of actuation, resulting in accurate input-output models. The method is data-driven in that it does not require knowledge of the underlying governing equations, only snapshots of state and actuation data from historical, experimental, or black-box simulations. We demonstrate the method on high-dimensional dynamical systems, including a model with relevance to the analysis of infectious disease data with mass vaccination (actuation).
  • This work develops compressive sampling strategies for computing the dynamic mode decomposition (DMD) from heavily subsampled or output-projected data. The resulting DMD eigenvalues are equal to DMD eigenvalues from the full-state data. It is then possible to reconstruct full-state DMD eigenvectors using $\ell_1$-minimization or greedy algorithms. If full-state snapshots are available, it may be computationally beneficial to compress the data, compute a compressed DMD, and then reconstruct full-state modes by applying the projected DMD transforms to full-state snapshots. These results rely on a number of theoretical advances. First, we establish connections between the full-state and projected DMD. Next, we demonstrate the invariance of the DMD algorithm to left and right unitary transformations. When data and modes are sparse in some transform basis, we show a similar invariance of DMD to measurement matrices that satisfy the so-called restricted isometry principle from compressive sampling. We demonstrate the success of this architecture on two model systems. In the first example, we construct a spatial signal from a sparse vector of Fourier coefficients with a linear dynamical system driving the coefficients. In the second example, we consider the double gyre flow field, which is a model for chaotic mixing in the ocean.
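
The claim in the last abstract, that DMD eigenvalues computed from output-projected data match those from full-state data, can be checked numerically on a toy problem. The following is a minimal sketch under stated assumptions, not the paper's implementation: the data come from a synthetic rank-4 linear system, the measurement matrix `C` is dense Gaussian, and every name (`dmd_eigs`, `rot`, `spectrum_gap`, `Phi`, `Lam`, `C`) and every size is hypothetical. It covers only the eigenvalue step and omits the $\ell_1$ reconstruction of full-state modes.

```python
import numpy as np


def dmd_eigs(X, Xp, r):
    """Eigenvalues of the rank-r DMD operator fitted so that Xp ~ A X."""
    U, s, Vh = np.linalg.svd(X, full_matrices=False)
    U, s, Vh = U[:, :r], s[:r], Vh[:r, :]
    # U^* Xp V S^{-1}: dividing by s scales column j by 1 / s[j].
    A_tilde = (U.conj().T @ Xp @ Vh.conj().T) / s
    return np.linalg.eigvals(A_tilde)


def rot(rho, theta):
    """2x2 real block with eigenvalues rho * exp(+/- i * theta)."""
    c, s = np.cos(theta), np.sin(theta)
    return rho * np.array([[c, -s], [s, c]])


def spectrum_gap(a, b):
    """Largest distance from an eigenvalue in a to its nearest one in b."""
    return np.abs(a[:, None] - b[None, :]).min(axis=1).max()


rng = np.random.default_rng(0)
n, r, m, p = 200, 4, 60, 12  # state dim, rank, snapshot pairs, measurements

# Assumed toy dynamics: two decaying oscillations in a 4-dim latent space.
Lam = np.zeros((r, r))
Lam[:2, :2] = rot(0.95, 0.3)
Lam[2:, 2:] = rot(0.80, 1.1)
true_eigs = np.linalg.eigvals(Lam)

# Latent trajectory z_{k+1} = Lam z_k, lifted to n dimensions by Phi.
Z = np.empty((r, m + 1))
Z[:, 0] = 1.0
for k in range(m):
    Z[:, k + 1] = Lam @ Z[:, k]
Phi = rng.standard_normal((n, r))
snapshots = Phi @ Z
X, Xp = snapshots[:, :-1], snapshots[:, 1:]

# Output-projected data: p << n random linear measurements of each snapshot.
C = rng.standard_normal((p, n))

full_eigs = dmd_eigs(X, Xp, r)
comp_eigs = dmd_eigs(C @ X, C @ Xp, r)

print("gap between full and compressed spectra:",
      spectrum_gap(full_eigs, comp_eigs))
print("gap between full and true spectra:",
      spectrum_gap(full_eigs, true_eigs))
```

In this sketch the agreement follows from the data being exactly rank `r` and from `C @ Phi` having full column rank, which a Gaussian `C` with `p >= r` satisfies with probability one. Noisy or only approximately low-rank data would give approximate rather than exact agreement.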