
Containing the recent West African outbreak of Ebola virus (EBOV) required
the deployment of substantial global resources. Operationally, health workers
and surveillance teams treated cases, collected genetic samples, and tracked
case contacts. Despite the substantial progress in analyzing and modeling EBOV
epidemiological data, a complete characterization of the spatiotemporal spread
of Ebola cases remains a challenge. In this work, we offer a novel perspective
on the EBOV epidemic that utilizes virus genome sequences to inform
population-level spatial models. Calibrated to phylogenetic linkages, these
dynamic spatial models provide unique insight into the disease mobility of EBOV
in Sierra Leone. Further, we developed a model selection framework that
identifies important epidemiological variables influencing the spatiotemporal
propagation of EBOV. Consistent with other investigations, our results show
that the spread of EBOV during the beginning and middle portions of the
epidemic strongly depended on the size of and distance between populations. Our
analysis also revealed a substantial decline in the dependence on population
size at the end of the epidemic, coinciding with the large-scale intervention
campaign: Operation Western Area Surge. More generally, we believe this
framework, pairing molecular diagnostics with dynamic models, has the potential
to be a powerful forecasting tool along with offering operationally relevant
guidance for surveillance and sampling strategies during an epidemic.

We develop an algorithm for model selection which allows for the
consideration of a combinatorially large number of candidate models governing a
dynamical system. The innovation circumvents a disadvantage of standard model
selection which typically limits the number of candidate models considered due to
the intractability of computing information criteria. Using a recently
developed sparse identification of nonlinear dynamics algorithm, the
subselection of candidate models near the Pareto frontier allows for a
tractable computation of AIC (Akaike information criterion) or BIC (Bayesian
information criterion) scores for the remaining candidate models. The
information criteria hierarchically rank the most informative models, enabling
the automatic and principled selection of the model with the strongest support
in relation to the time series data. Specifically, we show that AIC scores
place each candidate model in the {\em strong support}, {\em weak support} or
{\em no support} category. The method correctly identifies several canonical
dynamical systems, including an SEIR (susceptible-exposed-infectious-recovered)
disease model and the Lorenz equations, giving the correct dynamical system as
the only candidate model with strong support.
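The ranking step can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the three candidate models, the test signal, and the support cutoffs (a difference of at most 2 for strong support, at most 10 for weak support) are hypothetical choices, and the sparse-regression subselection near the Pareto frontier is skipped because the candidate list is already tiny.

```python
import numpy as np

rng = np.random.default_rng(0)
t = np.linspace(0.0, 2.0, 200)
x = np.exp(-2.0 * t)                                   # trajectory of dx/dt = -2x
dxdt = -2.0 * x + 0.01 * rng.standard_normal(t.size)   # noisy derivative data

# Hypothetical candidate models: each is a set of library columns for dx/dt.
candidates = {
    "constant": np.column_stack([np.ones_like(x)]),
    "linear": np.column_stack([x]),
    "linear+quadratic": np.column_stack([x, x**2]),
}

def aic(theta, target):
    """AIC of a least-squares fit with Gaussian residuals: n*log(RSS/n) + 2k."""
    coeffs, *_ = np.linalg.lstsq(theta, target, rcond=None)
    rss = np.sum((target - theta @ coeffs) ** 2)
    n, k = theta.shape
    return n * np.log(rss / n) + 2 * k

scores = {name: aic(theta, dxdt) for name, theta in candidates.items()}
best = min(scores.values())
delta = {name: score - best for name, score in scores.items()}

def support(d):
    # Assumed cutoffs for illustration; they are not taken from the abstract.
    if d <= 2.0:
        return "strong"
    if d <= 10.0:
        return "weak"
    return "none"

categories = {name: support(d) for name, d in delta.items()}
for name in candidates:
    print(f"{name:18s} delta AIC = {delta[name]:9.2f}  support = {categories[name]}")
```

Only the AIC difference relative to the best-scoring candidate matters here, which is why the scores are shifted before being binned.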

We propose a sparse regression method capable of discovering the governing
partial differential equation(s) of a given system from time series measurements
in the spatial domain. The regression framework relies on sparsity-promoting
techniques to select the nonlinear and partial derivative terms of the
governing equations that most accurately represent the data, bypassing a
combinatorially large search through all possible candidate models. The method
balances model complexity and regression accuracy by selecting a parsimonious
model via Pareto analysis. Time series measurements can be made in an Eulerian
framework where the sensors are fixed spatially, or in a Lagrangian framework
where the sensors move with the dynamics. The method is computationally
efficient, robust, and demonstrated to work on a variety of canonical problems
of mathematical physics including Navier-Stokes, the quantum harmonic
oscillator, and the diffusion equation. Moreover, the method is capable of
disambiguating between potentially non-unique dynamical terms by using multiple
time series taken with different initial data. Thus for a traveling wave, the
method can distinguish between a linear wave equation and the Korteweg-de Vries
equation, for instance. The method provides a promising new technique for
discovering governing equations and physical laws in parametrized
spatiotemporal systems where first-principles derivations are intractable.
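A minimal sketch of the idea for the diffusion equation, under stated assumptions: the test solution, the four-term library, the finite-difference derivatives, and the sequentially thresholded least-squares routine `stlsq` (with its threshold of 0.05) are illustrative choices, not necessarily the sparsity-promoting solver used by the authors.

```python
import numpy as np

# Assumed test problem: u_t = nu * u_xx on a periodic domain, two Fourier modes.
nu = 0.5
x = np.linspace(0.0, 2.0 * np.pi, 128, endpoint=False)
t = np.linspace(0.0, 1.0, 101)
dx, dt = x[1] - x[0], t[1] - t[0]
X, T = np.meshgrid(x, t, indexing="ij")
u = np.exp(-nu * T) * np.sin(X) + 0.5 * np.exp(-4.0 * nu * T) * np.sin(2.0 * X)

# Finite-difference derivatives (periodic in x, second-order accurate in t).
u_t = np.gradient(u, dt, axis=1, edge_order=2)
u_x = (np.roll(u, -1, axis=0) - np.roll(u, 1, axis=0)) / (2.0 * dx)
u_xx = (np.roll(u, -1, axis=0) - 2.0 * u + np.roll(u, 1, axis=0)) / dx**2

# Candidate library of linear, derivative, and nonlinear terms.
names = ["u", "u_x", "u_xx", "u*u_x"]
theta = np.column_stack([c.ravel() for c in (u, u_x, u_xx, u * u_x)])
target = u_t.ravel()

def stlsq(theta, target, threshold, n_iter=10):
    """Sequentially thresholded least squares (hypothetical helper)."""
    xi = np.linalg.lstsq(theta, target, rcond=None)[0]
    for _ in range(n_iter):
        small = np.abs(xi) < threshold
        xi[small] = 0.0
        if (~small).any():
            # Refit using only the terms that survived the threshold.
            xi[~small] = np.linalg.lstsq(theta[:, ~small], target, rcond=None)[0]
    return xi

xi = stlsq(theta, target, threshold=0.05)
model = {name: coeff for name, coeff in zip(names, xi) if coeff != 0.0}
print("u_t =", " + ".join(f"{c:.3f} {n}" for n, c in model.items()))
```

Two spatial modes with different decay rates are used so that the `u` and `u_xx` columns are not collinear; with a single mode the regression could not tell a linear decay term from diffusion, which is the kind of ambiguity the abstract resolves with multiple initial conditions.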

Understanding the interplay of order and disorder in chaotic systems is a
central challenge in modern quantitative science. We present a universal,
data-driven decomposition of chaos as an intermittently forced linear system.
This work combines Takens' delay embedding with modern Koopman operator theory
and sparse regression to obtain linear representations of strongly nonlinear
dynamics. The result is a decomposition of chaotic dynamics into a linear model
in the leading delay coordinates with forcing by low-energy delay coordinates;
we call this the Hankel alternative view of Koopman (HAVOK) analysis. This
analysis is applied to the canonical Lorenz system, as well as to real-world
examples such as the Earth's magnetic field reversal, and data from
electrocardiogram, electroencephalogram, and measles outbreaks. In each case,
the forcing statistics are non-Gaussian, with long tails corresponding to rare
events that trigger intermittent switching and bursting phenomena; this forcing
is highly predictive, providing a clear signature that precedes these events.
Moreover, the activity of the forcing signal demarcates large coherent regions
of phase space where the dynamics are approximately linear from those that are
strongly nonlinear.
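The pipeline described above (delay embedding, singular value decomposition, and a linear regression with the last retained delay coordinate acting as forcing) can be sketched on the Lorenz system. The integrator, step size, number of delays `q`, and rank `r` below are assumptions made for illustration, and plain least squares stands in for the sparse regression mentioned in the abstract.

```python
import numpy as np

def lorenz(s, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    x, y, z = s
    return np.array([sigma * (y - x), x * (rho - z) - y, x * y - beta * z])

def rk4(f, s0, dt, n_steps):
    """Fixed-step fourth-order Runge-Kutta integrator."""
    traj = np.empty((n_steps + 1, s0.size))
    traj[0] = s0
    for k in range(n_steps):
        s = traj[k]
        k1 = f(s)
        k2 = f(s + 0.5 * dt * k1)
        k3 = f(s + 0.5 * dt * k2)
        k4 = f(s + dt * k3)
        traj[k + 1] = s + dt / 6.0 * (k1 + 2.0 * k2 + 2.0 * k3 + k4)
    return traj

dt = 0.01
x = rk4(lorenz, np.array([-8.0, 8.0, 27.0]), dt, 5000)[:, 0]  # one measured variable

q, r = 100, 15                      # assumed number of delays and truncation rank
n_cols = x.size - q + 1
H = np.array([x[i:i + n_cols] for i in range(q)])  # Hankel matrix, H[i, j] = x[i + j]

U, S, Vt = np.linalg.svd(H, full_matrices=False)
V = Vt[:r].T                        # delay coordinates v_1 .. v_r as columns
dV = np.gradient(V, dt, axis=0)     # finite-difference time derivatives

# Regress the derivatives of v_1 .. v_{r-1} on all r coordinates.
coeffs, *_ = np.linalg.lstsq(V, dV[:, : r - 1], rcond=None)
A = coeffs[: r - 1].T               # linear dynamics on the leading coordinates
B = coeffs[r - 1 :].T               # coupling to the forcing coordinate v_r
forcing = V[:, r - 1]               # intermittent forcing signal
```

In this arrangement, `forcing` is the signal whose statistics and bursts would be inspected for the rare-event signature discussed in the abstract.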

We consider the application of Koopman theory to nonlinear partial
differential equations. We demonstrate that the observables chosen for
constructing the Koopman operator are critical for enabling an accurate
approximation to the nonlinear dynamics. If such observables can be found, then
the dynamic mode decomposition algorithm can be enacted to compute a
finite-dimensional approximation of the Koopman operator, including its
eigenfunctions, eigenvalues and Koopman modes. Judiciously chosen observables
lead to physically interpretable spatiotemporal features of the complex system
under consideration and provide a connection to manifold learning methods. We
demonstrate the impact of observable selection, including kernel methods, and
construction of the Koopman operator on two canonical, nonlinear PDEs: Burgers'
equation and the nonlinear Schr\"odinger equation. These examples serve to
highlight the most pressing and critical challenge of Koopman theory: a
principled way to select appropriate observables.

Inferring the structure and dynamics of network models is critical to
understanding the functionality and control of complex systems, such as
metabolic and regulatory biological networks. The increasing quality and
quantity of experimental data enable statistical approaches based on
information theory for model selection and goodness-of-fit metrics. We propose
an alternative method to infer networked nonlinear dynamical systems by using
sparsity-promoting $\ell_1$ optimization to select a subset of nonlinear
interactions representing dynamics on a fully connected network. Our method
generalizes the sparse identification of nonlinear dynamics (SINDy) algorithm
to dynamical systems with rational function nonlinearities, such as biological
networks. We show that dynamical systems with rational nonlinearities may be
cast in an implicit form, where the equations may be identified in the
nullspace of a library of mixed nonlinearities including the state and
derivative terms; this approach applies more generally to implicit dynamical
systems beyond those containing rational nonlinearities. This method,
implicit-SINDy, succeeds in inferring three canonical biological models:
Michaelis-Menten enzyme kinetics, the regulatory network for competence in
bacteria, and the metabolic network for yeast glycolysis.
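To make the null-space formulation concrete, consider Michaelis-Menten kinetics written as $\dot{x} = j - V_{\max} x / (K_m + x)$. Multiplying through by $(K_m + x)$ gives the implicit relation $j K_m + (j - V_{\max}) x - K_m \dot{x} - x \dot{x} = 0$, so the coefficient vector lies in the null space of a library with columns $1, x, \dot{x}, x\dot{x}$. The sketch below is a simplification under stated assumptions: the parameter values are illustrative, derivatives are taken as exact, and the library is chosen small enough that the null space is one-dimensional, so an SVD suffices. With a larger library the null space grows and the sparsest vector in it must be searched for, which this sketch does not attempt.

```python
import numpy as np

# Assumed parameter values, for illustration only.
j, vmax, km = 0.6, 1.5, 0.3

x = np.linspace(0.1, 2.0, 50)
xdot = j - vmax * x / (km + x)          # exact derivatives (no noise)

# Mixed library containing state and derivative terms: [1, x, xdot, x*xdot].
theta = np.column_stack([np.ones_like(x), x, xdot, x * xdot])

# The right singular vector with the smallest singular value spans the null space.
_, sing_vals, vt = np.linalg.svd(theta)
xi = vt[-1]
xi = xi / -xi[3]                        # normalize so the x*xdot coefficient is -1

# Read the parameters back off the implicit relation.
km_est = -xi[2]
j_est = xi[0] / km_est
vmax_est = j_est - xi[1]
print(km_est, j_est, vmax_est)
```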

Identifying governing equations from data is a critical step in the modeling
and control of complex dynamical systems. Here, we investigate the data-driven
identification of nonlinear dynamical systems with inputs and forcing using
regression methods, including sparse regression. Specifically, we generalize
the sparse identification of nonlinear dynamics (SINDy) algorithm to include
external inputs and feedback control. This method is demonstrated on examples
including the Lotka-Volterra predator-prey model and the Lorenz system with
forcing and control. We also connect the present algorithm with the dynamic
mode decomposition (DMD) and Koopman operator theory to provide a broader
context.

We develop a new generalization of Koopman operator theory that incorporates
the effects of inputs and control. Koopman spectral analysis is a theoretical
tool for the analysis of nonlinear dynamical systems. Moreover, Koopman is
intimately connected to Dynamic Mode Decomposition (DMD), a method that
discovers spatial-temporal coherent modes from data, connects local-linear
analysis to nonlinear operator theory, and importantly creates an equation-free
architecture allowing investigation of complex systems. In actuated systems,
standard Koopman analysis and DMD are incapable of producing input-output
models; moreover, the dynamics and the modes will be corrupted by external
forcing. Our new theoretical developments extend Koopman operator theory to
allow for systems with nonlinear input-output characteristics. We show how this
generalization is rigorously connected to, and generalizes, a recent development
called Dynamic Mode Decomposition with control (DMDc). We demonstrate this new
theory on nonlinear dynamical systems, including a standard
Susceptible-Infectious-Recovered model with relevance to the analysis of
infectious disease data with mass vaccination (actuation).

In this work, we explore finite-dimensional linear representations of
nonlinear dynamical systems by restricting the Koopman operator to an invariant
subspace. The Koopman operator is an infinite-dimensional linear operator that
evolves observable functions of the state space of a dynamical system [Koopman
1931, PNAS]. Dominant terms in the Koopman expansion are typically computed
using dynamic mode decomposition (DMD). DMD uses linear measurements of the
state variables, and it has recently been shown that this may be too
restrictive for nonlinear systems [Williams et al. 2015, JNLS]. Choosing
nonlinear observable functions to form an invariant subspace where it is
possible to obtain linear models, especially those that are useful for control,
is an open challenge.
Here, we investigate the choice of observable functions for Koopman analysis
that enable the use of optimal linear control techniques on nonlinear problems.
First, to include a cost on the state of the system, as in linear quadratic
regulator (LQR) control, it is helpful to include these states in the
observable subspace, as in DMD. However, we find that this is only possible
when there is a single isolated fixed point, as systems with multiple fixed
points or more complicated attractors are not globally topologically conjugate
to a finite-dimensional linear system, and cannot be represented by a
finite-dimensional linear Koopman subspace that includes the state. We then
present a data-driven strategy to identify relevant observable functions for
Koopman analysis using a new algorithm to determine terms in a dynamical system
by sparse regression of the data in a nonlinear function space [Brunton et al.
2015, arXiv]; we show how this algorithm is related to DMD. Finally, we
demonstrate how to design optimal control laws for nonlinear systems using
techniques from linear optimal control on Koopman invariant subspaces.
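As a concrete instance of a Koopman-invariant subspace that contains the state, consider the system $\dot{x}_1 = \mu x_1$, $\dot{x}_2 = \lambda (x_2 - x_1^2)$, which has a single fixed point. This system and the parameter values below are assumptions chosen for illustration. With observables $y = (x_1, x_2, x_1^2)$ the dynamics close linearly, $\dot{y} = K y$, and the sketch checks this numerically through the chain rule $\nabla g(x)\, f(x) = K g(x)$.

```python
import numpy as np

mu, lam = -0.05, -1.0                   # assumed parameter values

def f(x):
    """Nonlinear vector field with a single fixed point at the origin."""
    return np.array([mu * x[0], lam * (x[1] - x[0] ** 2)])

def g(x):
    """Observables spanning the invariant subspace: (x1, x2, x1^2)."""
    return np.array([x[0], x[1], x[0] ** 2])

def dg(x):
    """Jacobian of the observables with respect to the state."""
    return np.array([[1.0, 0.0], [0.0, 1.0], [2.0 * x[0], 0.0]])

# Finite-dimensional Koopman generator restricted to span{x1, x2, x1^2}.
K = np.array([
    [mu, 0.0, 0.0],
    [0.0, lam, -lam],
    [0.0, 0.0, 2.0 * mu],
])

rng = np.random.default_rng(1)
states = rng.uniform(-2.0, 2.0, size=(50, 2))
max_residual = max(np.abs(dg(x) @ f(x) - K @ g(x)).max() for x in states)
print("largest residual of d/dt g(x) - K g(x):", max_residual)
```

Because the first two observables are the state itself, a quadratic cost on the state can be written directly in terms of $y$, which is what makes linear tools such as LQR applicable to the lifted system $\dot{y} = K y$.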

The ability to discover physical laws and governing equations from data is
one of humankind's greatest intellectual achievements. A quantitative
understanding of dynamic constraints and balances in nature has facilitated
rapid development of knowledge and enabled advanced technological achievements,
including aircraft, combustion engines, satellites, and electrical power. In
this work, we combine sparsity-promoting techniques and machine learning with
nonlinear dynamical systems to discover governing physical equations from
measurement data. The only assumption about the structure of the model is that
there are only a few important terms that govern the dynamics, so that the
equations are sparse in the space of possible functions; this assumption holds
for many physical systems. In particular, we use sparse regression to determine
the fewest terms in the dynamic governing equations required to accurately
represent the data. The resulting models are parsimonious, balancing model
complexity with descriptive ability while avoiding overfitting. We demonstrate
the algorithm on a wide range of problems, from simple canonical systems,
including linear and nonlinear oscillators and the chaotic Lorenz system, to
the fluid vortex shedding behind an obstacle. The fluid example illustrates the
ability of this method to discover the underlying dynamics of a system that
took experts in the community nearly 30 years to resolve. We also show that
this method generalizes to parameterized, time-varying, or externally forced
systems.
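The sparse regression step can be sketched on the Lorenz system. The assumptions here are illustrative: derivative measurements are taken to be available (simulated as the exact right-hand side plus small noise), the library holds polynomials up to degree two, and sparsity is enforced by sequentially thresholded least squares with a threshold of 0.1 in the hypothetical helper `stlsq`.

```python
import numpy as np

def lorenz(s):
    x, y, z = s[..., 0], s[..., 1], s[..., 2]
    return np.stack([10.0 * (y - x), x * (28.0 - z) - y, x * y - (8.0 / 3.0) * z], axis=-1)

def rk4(f, s0, dt, n_steps):
    """Fixed-step fourth-order Runge-Kutta integrator."""
    traj = np.empty((n_steps + 1, s0.size))
    traj[0] = s0
    for k in range(n_steps):
        s = traj[k]
        k1 = f(s)
        k2 = f(s + 0.5 * dt * k1)
        k3 = f(s + 0.5 * dt * k2)
        k4 = f(s + dt * k3)
        traj[k + 1] = s + dt / 6.0 * (k1 + 2.0 * k2 + 2.0 * k3 + k4)
    return traj

rng = np.random.default_rng(0)
states = rk4(lorenz, np.array([-8.0, 8.0, 27.0]), 0.01, 5000)
derivs = lorenz(states) + 1e-3 * rng.standard_normal(states.shape)  # assumed measured

x, y, z = states.T
names = ["1", "x", "y", "z", "x^2", "xy", "xz", "y^2", "yz", "z^2"]
theta = np.column_stack([np.ones_like(x), x, y, z, x * x, x * y, x * z, y * y, y * z, z * z])

def stlsq(theta, targets, threshold=0.1, n_iter=10):
    """Sequentially thresholded least squares, one column per state equation."""
    xi = np.linalg.lstsq(theta, targets, rcond=None)[0]
    for _ in range(n_iter):
        xi[np.abs(xi) < threshold] = 0.0
        for col in range(targets.shape[1]):
            keep = xi[:, col] != 0.0
            if keep.any():
                xi[keep, col] = np.linalg.lstsq(theta[:, keep], targets[:, col], rcond=None)[0]
    return xi

xi = stlsq(theta, derivs)
for col, lhs in enumerate(["dx/dt", "dy/dt", "dz/dt"]):
    terms = [f"{xi[i, col]:+.3f} {names[i]}" for i in range(len(names)) if xi[i, col] != 0.0]
    print(lhs, "=", " ".join(terms))
```

The threshold plays the role of the complexity knob: raising it prunes more terms, lowering it admits more, and a value between the noise level and the smallest true coefficient is needed for the correct structure to emerge.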

We develop a new method which extends Dynamic Mode Decomposition (DMD) to
incorporate the effect of control to extract low-order models from
high-dimensional, complex systems. DMD finds spatial-temporal coherent modes,
connects local-linear analysis to nonlinear operator theory, and provides an
equation-free architecture which is compatible with compressive sensing. In
actuated systems, DMD is incapable of producing an input-output model;
moreover, the dynamics and the modes will be corrupted by external forcing. Our
new method, Dynamic Mode Decomposition with control (DMDc), capitalizes on all
of the advantages of DMD and provides the additional innovation of being able
to disambiguate between the underlying dynamics and the effects of actuation,
resulting in accurate input-output models. The method is data-driven in that it
does not require knowledge of the underlying governing equations, only
snapshots of state and actuation data from historical, experimental, or
black-box simulations. We demonstrate the method on high-dimensional dynamical
systems, including a model with relevance to the analysis of infectious disease
data with mass vaccination (actuation).
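The core regression can be sketched on a small linear system $x_{k+1} = A x_k + B u_k$. This is a simplified illustration under stated assumptions: the matrices `A_true` and `B_true` and the random input sequence are made up, the state is only two-dimensional, and the rank truncation used for high-dimensional data is omitted, so the fit reduces to one pseudoinverse of the stacked state and input snapshots.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed ground-truth system, used only to generate snapshot data.
A_true = np.array([[0.9, 0.2], [-0.1, 0.8]])
B_true = np.array([[1.0], [0.5]])

m = 50
X = np.zeros((2, m + 1))
X[:, 0] = [1.0, -1.0]
U = rng.standard_normal((1, m))          # actuation snapshots
for k in range(m):
    X[:, k + 1] = A_true @ X[:, k] + B_true @ U[:, k]

X0, X1 = X[:, :-1], X[:, 1:]

# DMD with control: regress X1 on the stacked state and input snapshots.
omega = np.vstack([X0, U])
G = X1 @ np.linalg.pinv(omega)
A_est, B_est = G[:, :2], G[:, 2:]

# Plain DMD ignores the inputs, so its operator absorbs the forcing
# and in general differs from A_true.
A_dmd = X1 @ np.linalg.pinv(X0)

print("DMDc error in A:", np.abs(A_est - A_true).max())
print("DMD  error in A:", np.abs(A_dmd - A_true).max())
```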

This work develops compressive sampling strategies for computing the dynamic
mode decomposition (DMD) from heavily subsampled or output-projected data. The
resulting DMD eigenvalues are equal to DMD eigenvalues from the full-state
data. It is then possible to reconstruct full-state DMD eigenvectors using
$\ell_1$-minimization or greedy algorithms. If full-state snapshots are
available, it may be computationally beneficial to compress the data, compute a
compressed DMD, and then reconstruct full-state modes by applying the projected
DMD transforms to full-state snapshots.
These results rely on a number of theoretical advances. First, we establish
connections between the full-state and projected DMD. Next, we demonstrate the
invariance of the DMD algorithm to left and right unitary transformations. When
data and modes are sparse in some transform basis, we show a similar invariance
of DMD to measurement matrices that satisfy the so-called restricted isometry
principle from compressive sampling. We demonstrate the success of this
architecture on two model systems. In the first example, we construct a spatial
signal from a sparse vector of Fourier coefficients with a linear dynamical
system driving the coefficients. In the second example, we consider the double
gyre flow field, which is a model for chaotic mixing in the ocean.
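The eigenvalue claim can be checked numerically on a toy version of the first example. Everything specific below is an assumption for illustration: the four Fourier modes, the decay rates and rotation angles, the Gaussian measurement matrix `C`, and the sizes `n`, `m`, `p`, `r`. The sketch only compares eigenvalues; reconstructing full-state modes by $\ell_1$-minimization is not attempted.

```python
import numpy as np

def rot(rho, theta):
    """Scaled 2x2 rotation with eigenvalues rho * exp(+/- i*theta)."""
    c, s = np.cos(theta), np.sin(theta)
    return rho * np.array([[c, -s], [s, c]])

def dmd_eigs(X0, X1, r):
    """Eigenvalues of the rank-r DMD operator built from snapshot pairs."""
    U, s, Vh = np.linalg.svd(X0, full_matrices=False)
    U, s, Vh = U[:, :r], s[:r], Vh[:r]
    A_tilde = U.conj().T @ X1 @ Vh.conj().T / s   # divides column j by s[j]
    return np.linalg.eigvals(A_tilde)

rng = np.random.default_rng(0)
n, m, p, r = 256, 60, 16, 4              # state dim, snapshots, measurements, rank

# Spatial signal that is sparse in the Fourier basis (four active modes).
grid = np.arange(n) / n
Phi = np.column_stack([
    np.cos(2 * np.pi * 5 * grid), np.sin(2 * np.pi * 5 * grid),
    np.cos(2 * np.pi * 12 * grid), np.sin(2 * np.pi * 12 * grid),
])

# Linear dynamical system driving the Fourier coefficients.
A_z = np.zeros((4, 4))
A_z[:2, :2] = rot(0.98, 0.3)
A_z[2:, 2:] = rot(0.95, 1.1)
Z = np.empty((4, m))
Z[:, 0] = [1.0, 0.0, 1.0, 0.0]
for k in range(m - 1):
    Z[:, k + 1] = A_z @ Z[:, k]

X = Phi @ Z                              # full-state snapshots
C = rng.standard_normal((p, n))          # random measurement matrix
Y = C @ X                                # compressed snapshots

eigs_full = dmd_eigs(X[:, :-1], X[:, 1:], r)
eigs_comp = dmd_eigs(Y[:, :-1], Y[:, 1:], r)

# Match each full-state eigenvalue to its nearest compressed eigenvalue.
mismatch = np.abs(eigs_full[:, None] - eigs_comp[None, :]).min(axis=1).max()
print("largest eigenvalue mismatch:", mismatch)
```

Eigenvalues are matched by nearest neighbor rather than by sorting, since complex-conjugate pairs can swap order under round-off.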