
We present a systematic coarse-graining (CG) strategy for many-particle
molecular systems based on cluster expansion techniques. We construct a
hierarchy of coarse-grained Hamiltonians with interaction potentials consisting
of two-, three- and higher-body interactions. The accuracy of the derived cluster
expansion based on interatomic potentials is examined over a range of
temperatures and densities and compared to direct computation of the pair potential
of mean force. The coarse-grained simulations are compared against the detailed
all-atom data on the basis of structural properties. We give
specific examples for methane and ethane molecules in which the coarse-grained
variable is the center of mass of the molecule. We investigate different
temperature and density regimes, and we examine differences between the methane
and ethane systems. Results show that the cluster expansion formalism can
provide accurate effective pair and three-body CG potentials
in the high-$T$ and low-$\rho$ regimes. In the liquid regime the three-body
effective CG potentials give a small improvement over the typical pair CG
ones; however, significantly better results require
even higher-order terms.
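
As a minimal illustration of the two ingredients named above, the sketch below maps all-atom coordinates to molecular centers of mass and converts a radial distribution function $g(r)$ into a pair potential of mean force by Boltzmann inversion, $W(r) = -k_B T \ln g(r)$. The function names, array layouts and units are assumptions made for this example, not taken from the paper, and $W(r)$ coincides with an effective pair CG potential only in the low-density limit.

```python
import numpy as np

def center_of_mass(positions, masses):
    """Map all-atom coordinates to one CG site per molecule.

    positions: array of shape (n_molecules, n_atoms, 3) (assumed layout)
    masses:    array of shape (n_atoms,), same atom ordering in every molecule
    """
    masses = np.asarray(masses, dtype=float)
    positions = np.asarray(positions, dtype=float)
    # Mass-weighted average over the atom axis.
    return np.einsum("a,mad->md", masses, positions) / masses.sum()

def pair_pmf_from_rdf(g, kT):
    """Boltzmann inversion W(r) = -kT * ln g(r); W is +inf where g(r) = 0."""
    g = np.asarray(g, dtype=float)
    with np.errstate(divide="ignore"):
        return -kT * np.log(g)
```

For example, `center_of_mass(xyz, [12.011, 1.008, 1.008, 1.008, 1.008])` would return one site per methane molecule, assuming the carbon atom is listed first in `xyz`.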

In this paper, we discuss information-theoretic tools for obtaining optimized
coarse-grained molecular models for both equilibrium and non-equilibrium
molecular dynamics. The latter are ubiquitous in physicochemical and biological
applications, where they are typically associated with coupling mechanisms,
multi-physics and/or boundary conditions. In general, non-equilibrium steady
states are not known explicitly, as they do not necessarily have a Gibbs
structure.
The presented approach compares the microscopic behavior of molecular systems
to parametric and non-parametric coarse-grained models using the relative entropy
between distributions on the path space, and sets up a corresponding path-space
variational inference problem. The methods can become entirely
data-driven when the microscopic dynamics are replaced with corresponding
correlated data in the form of time series. Furthermore, we present connections
and generalizations of force matching methods in coarse-graining with
path-space information methods, and we demonstrate the enhanced
transferability of information-based parameterizations to general observables
due to information inequalities.
We further discuss methodological connections between information-based
coarse-graining of molecular systems and variational inference methods
primarily developed in the machine learning community. However, we note that
the work presented here addresses variational inference for correlated time
series due to the focus on dynamics. The applicability of the proposed methods
is demonstrated on high-dimensional stochastic processes given by Langevin,
overdamped and driven Langevin dynamics of interacting particles.
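
To make the path-space relative entropy concrete, the following sketch treats the simplest case we can state with confidence: two scalar diffusions $dX = b(X)\,dt + \sigma\,dW$ with the same constant $\sigma$, for which Girsanov's theorem gives the relative entropy rate $\frac{1}{2\sigma^2}\,\mathbb{E}_\mu[(b - \tilde b)^2]$. The reference drift, the one-parameter model family and all names are hypothetical choices for illustration. The example assumes the reference drift is known and omits the coarse-graining map, so it is a simplification of the setting described above rather than the paper's method.

```python
import numpy as np

def simulate_overdamped(drift, sigma, x0, dt, n_steps, rng):
    """Euler-Maruyama time series for dX = drift(X) dt + sigma dW (scalar state)."""
    x = np.empty(n_steps + 1)
    x[0] = x0
    noise = rng.standard_normal(n_steps) * sigma * np.sqrt(dt)
    for k in range(n_steps):
        x[k + 1] = x[k] + drift(x[k]) * dt + noise[k]
    return x

def relative_entropy_rate(traj, drift_ref, drift_model, sigma):
    """Time-average estimate of E[(b_ref - b_model)^2] / (2 sigma^2)."""
    diff = drift_ref(traj) - drift_model(traj)
    return np.mean(diff ** 2) / (2.0 * sigma ** 2)

rng = np.random.default_rng(0)
sigma = 1.0

def ref_drift(x):
    # Hypothetical reference dynamics: a harmonic well.
    return -x

traj = simulate_overdamped(ref_drift, sigma, 0.0, 1e-2, 20_000, rng)

# Scan a hypothetical one-parameter model family b_theta(x) = -theta * x
# and keep the parameter with the smallest estimated rate.
thetas = np.linspace(0.5, 1.5, 11)
rers = [
    relative_entropy_rate(traj, ref_drift, lambda x, t=t: -t * x, sigma)
    for t in thetas
]
best_theta = thetas[int(np.argmin(rers))]
```

Because the estimate is a mean squared difference of drifts along the time series, minimizing it over `thetas` is a least-squares fit of the model force to the reference force, which is the sense in which force matching and path-space methods meet in this simple setting.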

Using the probabilistic language of conditional expectations, we reformulate
the force matching method for coarse-graining of molecular systems as a
projection onto spaces of coarse observables. A practical outcome of this
probabilistic description is the link of the force matching method with
thermodynamic integration. This connection provides a way to systematically
construct a local mean force in order to optimally approximate the potential of
mean force through force matching. We introduce a generalized force matching
condition for the local mean force, in the sense that it allows the approximation
of the potential of mean force under both linear and nonlinear coarse-graining
mappings (e.g., reaction coordinates, end-to-end length of chains).
Furthermore, we study the equivalence of force matching with relative entropy
minimization, which we derive for general nonlinear coarse-graining maps. We
present in detail the generalized force matching condition through applications
to specific examples in molecular systems.
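
The projection view can be sketched numerically: among all functions of the coarse variable $\xi$, the conditional expectation $\mathbb{E}[h \mid \xi]$ minimizes the mean squared error to a fine-scale observable $h$, so a least-squares fit over a finite basis approximates it. In the sketch below the data are synthetic, and the basis, the "true" coarse force and the noise model are assumptions chosen for illustration. It does not reproduce the paper's construction of the local mean force.

```python
import numpy as np

def fit_force_matching(xi, h, basis):
    """Coefficients theta minimizing mean (h - sum_k theta_k * phi_k(xi))^2."""
    design = np.column_stack([phi(xi) for phi in basis])
    theta, *_ = np.linalg.lstsq(design, h, rcond=None)
    return theta

rng = np.random.default_rng(1)
xi = rng.standard_normal(20_000)            # samples of the coarse variable
true_force = 2.0 * xi - 0.5 * xi ** 3       # hypothetical E[h | xi]
# Fine-scale fluctuations with zero conditional mean (synthetic assumption).
h = true_force + rng.standard_normal(xi.size)

basis = [lambda s: s, lambda s: s ** 3]     # assumed basis functions
theta = fit_force_matching(xi, h, basis)    # close to (2.0, -0.5) up to sampling error
```

A linear-in-parameters basis keeps the fit a single linear solve; a nonlinear parameterization of the coarse force would instead require an iterative optimizer.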

In this paper we extend the parametric sensitivity analysis (SA) methodology
proposed in Ref. [Y. Pantazis and M. A. Katsoulakis, J. Chem. Phys. 138, 054115
(2013)] to continuous-time and continuous-space Markov processes represented by
stochastic differential equations and, particularly, stochastic molecular
dynamics as described by the Langevin equation. The utilized SA method is based
on the computation of the information-theoretic (and thermodynamic) quantity of
relative entropy rate (RER) and the associated Fisher information matrix (FIM)
between path distributions. A major advantage of the pathwise SA method is that
between path distributions. A major advantage of the pathwise SA method is that
both RER and pathwise FIM depend only on averages of the force field; therefore,
they are tractable and computable as ergodic averages from a single run of the
molecular dynamics simulation, both in equilibrium and in non-equilibrium steady
state regimes. We validate the performance of the extended SA method on two
different molecular stochastic systems, a standard Lennard-Jones fluid and an
all-atom methane liquid, and compare the obtained parameter sensitivities with
parameter sensitivities on three popular and well-studied observable functions,
namely, the radial distribution function, the mean squared displacement and the
pressure. Results show that the RER-based sensitivities are highly correlated
with the observable-based sensitivities.
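
The statement that RER and pathwise FIM depend only on averages of the force field can be illustrated with a small sketch. It assumes a scalar coordinate, a constant noise amplitude $\sigma$, a parameterized force $F_\theta(q)$, and the forms $\mathrm{RER} \approx \frac{1}{2\sigma^2}\langle (F_{\theta+\epsilon} - F_\theta)^2 \rangle$ and $\mathrm{FIM} \approx \frac{1}{\sigma^2}\langle \nabla_\theta F_\theta\, \nabla_\theta F_\theta^{T} \rangle$, where the averages run over configurations sampled along one trajectory. The toy force, the finite-difference Jacobian and all names are hypothetical and stand in for the Lennard-Jones and methane force fields studied in the paper.

```python
import numpy as np

def force_jacobian(force, theta, q, h=1e-6):
    """Central-difference d force / d theta, shape (n_samples, n_params)."""
    theta = np.asarray(theta, dtype=float)
    cols = []
    for i in range(theta.size):
        step = np.zeros_like(theta)
        step[i] = h
        cols.append((force(q, theta + step) - force(q, theta - step)) / (2.0 * h))
    return np.column_stack(cols)

def pathwise_fim(force, theta, q, noise_sigma):
    """Sample average of (grad_theta F)(grad_theta F)^T / sigma^2."""
    jac = force_jacobian(force, theta, q)
    return jac.T @ jac / (q.size * noise_sigma ** 2)

def relative_entropy_rate(force, theta, eps, q, noise_sigma):
    """Sample average of (F_{theta+eps} - F_theta)^2 / (2 sigma^2)."""
    theta = np.asarray(theta, dtype=float)
    diff = force(q, theta + eps) - force(q, theta)
    return 0.5 * np.mean(diff ** 2) / noise_sigma ** 2

def toy_force(q, theta):
    # Hypothetical anharmonic force with parameters theta = (k, c).
    return -theta[0] * q - theta[1] * q ** 3
```

For small perturbations `eps`, `relative_entropy_rate` agrees with `0.5 * eps @ fim @ eps` up to higher-order terms, so a single set of sampled configurations `q` yields sensitivities in every parameter direction.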