-
Due to the scarcity of quantitative details about biological phenomena,
quantitative modeling in systems biology can be compromised, especially at the
subcellular scale. One way to get around this is qualitative modeling because
it requires few to no quantitative information. One of the most popular
qualitative modeling approaches is the Boolean network formalism. However,
Boolean models allow variables to take only two values, which can be too
simplistic in some cases. The present work proposes a modeling approach derived
from Boolean networks where continuous logical operators are used and where
edges can be tuned. Using continuous logical operators allows variables to be
more finely valued while remaining qualitative. To consider that some
biological interactions can be slower or weaker than other ones, edge states
are also computed in order to modulate in speed and strength the signal they
convey. The proposed formalism is illustrated on a toy network coming from the
epidermal growth factor receptor signaling pathway. The obtained simulations
show that continuous results are produced, thus allowing finer analysis. The
simulations also show that modulating the signal conveyed by the edges allows
to incorporate knowledge about the interactions they model. The goal is to
provide enhancements in the ability of qualitative models to simulate the
dynamics of biological networks while limiting the need of quantitative
information.
-
We describe and analyse Levenberg-Marquardt methods for solving systems of
nonlinear equations. More specifically, we propose an adaptive formula for the
Levenberg-Marquardt parameter and analyse the local convergence of the method
under H\"{o}lder metric subregularity of the function defining the equation and
H\"older continuity of its gradient mapping. Further, we analyse the local
convergence of the method under the additional assumption that the
\L{}ojasiewicz gradient inequality holds. We finally report encouraging
numerical results confirming the theoretical findings for the problem of
computing moiety conserved steady states in biochemical reaction networks. This
problem can be cast as finding a solution of a system of nonlinear equations,
where the associated mapping satisfies the \L{}ojasiewicz gradient inequality
assumption.
-
We introduce a tensor-based clustering method to extract sparse,
low-dimensional structure from high-dimensional, multi-indexed datasets. This
framework is designed to enable detection of clusters of data in the presence
of structural requirements which we encode as algebraic constraints in a linear
program. Our clustering method is general and can be tailored to a variety of
applications in science and industry. We illustrate our method on a collection
of experiments measuring the response of genetically diverse breast cancer cell
lines to an array of ligands. Each experiment consists of a cell line-ligand
combination, and contains time-course measurements of the early-signalling
kinases MAPK and AKT at two different ligand dose levels. By imposing
appropriate structural constraints and respecting the multi-indexed structure
of the data, the analysis of clusters can be optimized for biological
interpretation and therapeutic understanding. We then perform a systematic,
large-scale exploration of mechanistic models of MAPK-AKT crosstalk for each
cluster. This analysis allows us to quantify the heterogeneity of breast cancer
cell subtypes, and leads to hypotheses about the signalling mechanisms that
mediate the response of the cell lines to ligands.
-
We map a class of well-mixed stochastic models of biochemical feedback in
steady state to the mean-field Ising model near the critical point. The mapping
provides an effective temperature, magnetic field, order parameter, and heat
capacity that can be extracted from biological data without fitting or
knowledge of the underlying molecular details. We demonstrate this procedure on
fluorescence data from mouse T cells, which reveals distinctions between how
the cells respond to different drugs. We also show that the heat capacity
allows inference of absolute molecule number from fluorescence intensity. We
explain this result in terms of the underlying fluctuations and demonstrate the
generality of our work.
-
Temporal variations in biological systems and more generally in natural
sciences are typically modelled as a set of Ordinary, Partial, or Stochastic
Differential or Difference Equations. Algorithms for learning the structure and
the parameters of a dynamical system are distinguished based on whether time is
discrete or continuous, observations are time-series or time-course, and
whether the system is deterministic or stochastic, however, there is no
approach able to handle the various types of dynamical systems simultaneously.
In this paper, we present a unified approach to infer both the structure and
the parameters of nonlinear dynamical systems of any type under the restriction
of being linear with respect to the unknown parameters. Our approach, which is
named Unified Sparse Dynamics Learning (USDL), constitutes of two steps. First,
an atemporal system of equations is derived through the application of the weak
formulation. Then, assuming a sparse representation for the dynamical system,
we show that the inference problem can be expressed as a sparse signal recovery
problem, allowing the application of an extensive body of algorithms and
theoretical results. Results on simulated data demonstrate the efficacy and
superiority of the USDL algorithm as a function of the experimental setup
(sample size, number of time measurements, number of interventions, noise
level). Additionally, USDL's accuracy significantly correlates with theoretical
metrics such as the exact recovery coefficient. On real single-cell data, the
proposed approach is able to induce high-confidence subgraphs of the signaling
pathway. USDL algorithm has been integrated in SCENERY
(\url{http://scenery.csd.uoc.gr/}); an online tool for single-cell mass
cytometry analytics.
-
The advent of high--throughput transcription profiling technologies has
enabled identification of genes and pathways associated with disease, providing
new avenues for precision medicine. A key challenge is to analyze this data in
the context of the regulatory networks and pathways that control cellular
processes, while still obtaining insights that can be used to design new
diagnostic and therapeutic interventions. While classical differential
expression analysis provides specific and hence targetable gene-level insights,
it does not include any systems-level information. On the other hand, pathway
analyses integrate systems-level information with expression data, but are
often limited in their ability to indicate specific molecular targets. We
introduce GeneSurrounder, an analysis method that takes into account the
complex structure of interaction networks to identify specific genes that
disrupt pathway activity in a disease-specific manner. GeneSurrounder
integrates transcriptomic data and pathway network information in a novel
two-step procedure to detect genes that (i) appear to influence the expression
of other genes local to it in the network and (ii) are part of a subnetwork of
differentially expressed genes. Combined, this evidence can be used to pinpoint
specific genes that have a mechanistic role in the phenotype of interest.
Applying GeneSurrounder to three distinct ovarian cancer studies using a global
KEGG network, we show that our method is able to identify biologically relevant
genes and genes missed by single-gene association tests, integrate pathway and
expression data, and yield more consistent results across multiple studies of
the same phenotype than competing methods.
-
This paper proposes a control theoretic framework to model and analyze the
self-organized pattern formation of molecular concentrations in biomolecular
communication networks, emerging applications in synthetic biology. In
biomolecular communication networks, bionanomachines, or biological cells,
communicate with each other using a cell-to-cell communication mechanism
mediated by a diffusible signaling molecule, thereby the dynamics of molecular
concentrations are approximately modeled as a reaction-diffusion system with a
single diffuser. We first introduce a feedback model representation of the
reaction-diffusion system and provide a systematic local stability/instability
analysis tool using the root locus of the feedback system. The instability
analysis then allows us to analytically derive the conditions for the
self-organized spatial pattern formation, or Turing pattern formation, of the
bionanomachines. We propose a novel synthetic biocircuit motif called
activator-repressor-diffuser system and show that it is one of the minimum
biomolecular circuits that admit self-organized patterns over cell population.
-
In this paper we investigate the complexity of model selection and model
testing for dynamical systems with toric steady states. Such systems frequently
arise in the study of chemical reaction networks. We do this by formulating
these tasks as a constrained optimization problem in Euclidean space. This
optimization problem is known as a Euclidean distance problem; the complexity
of solving this problem is measured by an invariant called the Euclidean
distance (ED) degree. We determine closed-form expressions for the ED degree of
the steady states of several families of chemical reaction networks with toric
steady states and arbitrarily many reactions. To illustrate the utility of this
work we show how the ED degree can be used as a tool for estimating the
computational cost of solving the model testing and model selection problems.
-
Transcription factors (TFs) exert their regulatory action by binding to DNA
with specific sequence preferences. However, different TFs can partially share
their binding sequences due to their common evolutionary origin. This
`redundancy' of binding defines a way of organizing TFs in `motif families' by
grouping TFs with similar binding preferences. Since these ultimately define
the TF target genes, the motif family organization entails information about
the structure of transcriptional regulation as it has been shaped by evolution.
Focusing on the human TF repertoire, we show that a one-parameter evolutionary
model of the Birth-Death-Innovation type can explain the TF empirical
ripartition in motif families, and allows to highlight the relevant
evolutionary forces at the origin of this organization. Moreover, the model
allows to pinpoint few deviations from the neutral scenario it assumes: three
over-expanded families (including HOX and FOX genes), a set of `singleton' TFs
for which duplication seems to be selected against, and a higher-than-average
rate of diversification of the binding preferences of TFs with a Zinc Finger
DNA binding domain. Finally, a comparison of the TF motif family organization
in different eukaryotic species suggests an increase of redundancy of binding
with organism complexity.
-
Controlling stochastic reactions networks is a challenging problem with
important implications in various fields such as systems and synthetic biology.
Various regulation motifs have been discovered or posited over the recent
years, the most recent one being the so-called Antithetic Integral Control
(AIC) motif in Briat et al. (Cell Systems, 2016). Several favorable properties
for the AIC motif have been demonstrated for classes of reaction networks that
satisfy certain irreducibility, ergodicity and output controllability
conditions. Here we address the problem of verifying these conditions for large
sets of reaction networks with fixed topology using two different approaches.
The first one is quantitative and relies on the notion of interval matrices
while the second one is qualitative and is based on sign properties of
matrices. The obtained results lie in the same spirit as those obtained in
Briat et al. (Cell Systems, 2016) where properties of reaction networks are
independently characterized in terms of control theoretic concepts, linear
programming conditions and graph theoretic conditions.
-
The stochastic dynamics of biochemical networks are usually modelled with the
chemical master equation (CME). The stationary distributions of CMEs are seldom
solvable analytically, and numerical methods typically produce estimates with
uncontrolled errors. Here, we introduce mathematical programming approaches
that yield approximations of these distributions with computable error bounds
which enable the verification of their accuracy. First, we use semidefinite
programming to compute increasingly tighter upper and lower bounds on the
moments of the stationary distributions for networks with rational
propensities. Second, we use these moment bounds to formulate linear programs
that yield convergent upper and lower bounds on the stationary distributions
themselves, their marginals and stationary averages. The bounds obtained also
provide a computational test for the uniqueness of the distribution. In the
unique case, the bounds form an approximation of the stationary distribution
with a computable bound on its error. In the non-unique case, our approach
yields converging approximations of the ergodic distributions. We illustrate
our methodology through several biochemical examples taken from the literature:
Schl\"ogl's model for a chemical bifurcation, a two-dimensional toggle switch,
a model for bursty gene expression, and a dimerisation model with multiple
stationary distributions.
-
The formalism of partial information decomposition provides independent or
non-overlapping components constituting total information content provided by a
set of source variables about the target variable. These components are
recognised as unique information, synergistic information and, redundant
information. The metric of net synergy, conceived as the difference between
synergistic and redundant information, is capable of detecting synergy,
redundancy and, information independence among stochastic variables. And it can
be quantified, as it is done here, using appropriate combinations of different
Shannon mutual information terms. Utilisation of such a metric in network
motifs with the nodes representing different biochemical species, involved in
information sharing, uncovers rich store for interesting results. In the
current study, we make use of this formalism to obtain a comprehensive
understanding of the relative information processing mechanism in a diamond
motif and two of its sub-motifs namely bifurcation and integration motif
embedded within the diamond motif. The emerging patterns of synergy and
redundancy and their effective contribution towards ensuring high fidelity
information transmission are duly compared in the sub-motifs and independent
motifs (bifurcation and integration). In this context, the crucial roles played
by various time scales and activation coefficients in the network topologies
are especially emphasised. We show that the origin of synergy and redundancy in
information transmission can be physically justified by decomposing diamond
motif into bifurcation and integration motif.
-
Sensitivity analysis of biochemical reactions aims at quantifying the
dependence of the reaction dynamics on the reaction rates. The computation of
the parameter sensitivities, however, poses many computational challenges when
taking stochastic noise into account. This paper proposes a new finite
difference method for efficiently computing sensitivities of biochemical
reactions. We employ propensity bounds of reactions to couple the simulation of
the nominal and perturbed processes. The exactness of the simulation is
reserved by applying the rejection-based mechanism. For each simulation step,
the nominal and perturbed processes under our coupling strategy are
synchronized and often jump together, increasing their positive correlation and
hence reducing the variance of the estimator. The distinctive feature of our
approach in comparison with existing coupling approaches is that it only needs
to maintain a single data structure storing propensity bounds of reactions
during the simulation of the nominal and perturbed processes. Our approach
allows to computing sensitivities of many reaction rates simultaneously.
Moreover, the data structure does not require to be updated frequently, hence
improving the computational cost. This feature is especially useful when
applied to large reaction networks. We benchmark our method on biological
reaction models to prove its applicability and efficiency.
-
In the immune system, T cells can quickly discriminate between foreign and
self ligands with high accuracy. There is evidence that T-cells achieve this
remarkable performance utilizing a network architecture based on a
generalization of kinetic proofreading (KPR). KPR-based mechanisms actively
consume energy to increase the specificity beyond what is possible in
equilibrium.An important theoretical question that arises is to understand the
trade-offs and fundamental limits on accuracy, speed, and dissipation (energy
consumption) in KPR and its generalization. Here, we revisit this question
through numerical simulations where we simultaneously measure the speed,
accuracy, and energy consumption of the KPR and adaptive sorting networks for
different parameter choices. Our simulations highlight the existence of a
'feasible operating regime' in the speed-energy-accuracy plane where T-cells
can quickly differentiate between foreign and self ligands at reasonable energy
expenditure. We give general arguments for why we expect this feasible
operating regime to be a generic property of all KPR-based biochemical networks
and discuss implications for our understanding of the T cell receptor circuit.
-
Model-based prediction of stochastic noise in biomolecular reactions often
resorts to approximation with unknown precision. As a result, unexpected
stochastic fluctuation causes a headache for the designers of biomolecular
circuits. This paper proposes a convex optimization approach to quantifying the
steady state moments of molecular copy counts with theoretical rigor. We show
that the stochastic moments lie in a convex semi-algebraic set specified by
linear matrix inequalities. Thus, the upper and the lower bounds of some
moments can be computed by a semidefinite program. Using a protein dimerization
process as an example, we demonstrate that the proposed method can precisely
predict the mean and the variance of the copy number of the monomer protein.
-
Investigations of topological uniqueness of gene interaction networks in
cancer cells are essential for understanding this disease. Based on the random
matrix theory, we study the distribution of the nearest neighbor level spacings
$P(s)$ of interaction matrices for gene networks in human cancer cells. The
interaction matrices are computed using the Cancer Network Galaxy (TCNG)
database, which is a repository of gene interactions inferred by a Bayesian
network model. 256 NCBI GEO entries regarding gene expressions in human cancer
cells have been selected for the Bayesian network calculations in TCNG. We
observe the Wigner distribution of $P(s)$ when the gene networks are dense
networks that have more than $\sim 38,000$ edges. In the opposite case, when
the networks have smaller numbers of edges, the distribution $P(s)$ becomes the
Poisson distribution. We investigate relevance of $P(s)$ both to the size of
the networks and to edge frequencies that manifest reliance of the inferred
gene interactions.
-
Biological systems are driven by intricate interactions among the complex
array of molecules that comprise the cell. Many methods have been developed to
reconstruct network models of those interactions. These methods often draw on
large numbers of samples with measured gene expression profiles to infer
connections between genes (or gene products). The result is an aggregate
network model representing a single estimate for the likelihood of each
interaction, or "edge," in the network. While informative, aggregate models
fail to capture the heterogeneity that is represented in any population. Here
we propose a method to reverse engineer sample-specific networks from aggregate
network models. We demonstrate the accuracy and applicability of our approach
in several data sets, including simulated data, microarray expression data from
synchronized yeast cells, and RNA-seq data collected from human lymphoblastoid
cell lines. We show that these sample-specific networks can be used to study
changes in network topology across time and to characterize shifts in gene
regulation that may not be apparent in expression data. We believe the ability
to generate sample-specific networks will greatly facilitate the application of
network methods to the increasingly large, complex, and heterogeneous
multi-omic data sets that are currently being generated, and ultimately support
the emerging field of precision network medicine.
-
Phenotypical variability in the absence of genetic variation often reflects
complex energetic landscapes associated with underlying gene regulatory
networks (GRNs). In this view, different phenotypes are associated with
alternative states of complex nonlinear systems: stable attractors in
deterministic models or modes of stationary distributions in stochastic
descriptions. We provide theoretical and practical characterizations of these
landscapes, specifically focusing on stochastic slow promoter kinetics, a time
scale relevant when transcription factor binding and unbinding are affected by
epigenetic processes like DNA methylation and chromatin remodeling. In this
case, largely unexplored except for numerical simulations, adiabatic
approximations of promoter kinetics are not appropriate. In contrast to the
existing literature, we provide rigorous analytic characterizations of multiple
modes. A general formal approach gives insight into the influence of parameters
and the prediction of how changes in GRN wiring, for example through mutations
or artificial interventions, impact the possible number, location, and
likelihood of alternative states. We adapt tools from the mathematical field of
singular perturbation theory to represent stationary distributions of Chemical
Master Equations for GRNs as mixtures of Poisson distributions and obtain
explicit formulas for the locations and probabilities of metastable states as a
function of the parameters describing the system. As illustrations, the theory
is used to tease out the role of cooperative binding in stochastic models in
comparison to deterministic models, and applications are given to various model
systems, such as toggle switches in isolation or in communicating populations,
a synthetic oscillator and a trans-differentiation network.
-
We consider stochastically modeled reaction networks and prove that if a
constant solution to the Kolmogorov forward equation decays fast enough
relatively to the transition rates, then the model is non-explosive. In
particular, complex balanced reaction networks are non-explosive.
-
The graph-related symmetries of a reaction network give rise to certain
special equilibria (such as complex balanced equilibria) in deterministic
models of dynamics of the reaction network. Correspondingly, in the stochastic
setting, when modeled as a continuous-time Markov chain, these symmetries give
rise to certain special stationary measures. Previous work by Anderson, Craciun
and Kurtz identified stationary distributions of a complex balanced network;
later Cappelletti and Wiuf developed the notion of complex balancing for
stochastic systems. We define and establish the relations between reaction
balanced measure, complex balanced measure, reaction vector balanced measure,
and cycle balanced measure and prove that with mild additional hypotheses, the
former two are stationary distributions. Furthermore, in spirit of earlier work
by Joshi, we give sufficient conditions under which detailed balance of the
stationary distribution of Markov chain models implies the existence of
positive detailed balance equilibria for the related deterministic reaction
network model. Finally, we provide a complete map of the implications between
balancing properties of deterministic and corresponding stochastic reaction
systems, such as complex balance, reaction balance, reaction vector balance and
cycle balance.
-
To estimate the time, many organisms, ranging from cyanobacteria to animals,
employ a circadian clock which is based on a limit-cycle oscillator that can
tick autonomously with a nearly 24h period. Yet, a limit-cycle oscillator is
not essential for knowing the time, as exemplified by bacteria that possess an
'hourglass': a system that when forced by an oscillatory light input exhibits
robust oscillations from which the organism can infer the time, but that in the
absence of driving relaxes to a stable fixed point. Here, using models of the
Kai system of cyanobacteria, we compare a limit- cycle oscillator with two
hourglass models, one that without driving relaxes exponentially and one that
does so in an oscillatory fashion. In the limit of low input-noise, all three
systems are equally informative on time, yet in the regime of high input-noise
the limit-cycle oscillator is far superior. The same behavior is found in the
Stuart-Landau model, indicating that our result is universal.
-
Many biological activities are induced by cellular chemical reactions of
diffusing reactants. The dynamics of such systems can be captured by stochastic
reaction networks. A recent numerical study has shown that diffusion can
significantly enhance the fluctuations in gene regulatory networks. However,
the general relation between diffusion and stochastic system dynamics remains
unveiled. Here we examine the general relation and find the universal law under
which the steady-state distribution in complex balanced networks is
diffusion-independent. Here, complex balance is the non-equilibrium
generalization of detailed balance. We also find that for a diffusion-included
network with a Poisson-like steady-state distribution, the diffusion can be
ignored at steady-state. We then derive a necessary and sufficient condition
for such steady-state network distributions. Moreover, we exactly prove that
the stochastic dynamics of linear reaction networks are unaffected by diffusion
at any arbitrary time. Our findings shed light on the fundamental question of
when diffusion can be neglected, or (if non-negligible) its effects on the
stochastic dynamics of the reaction network.
-
Networks are ubiquitous in biology where they encode connectivity patterns at
all scales of organization, from molecular to the biome. However, biological
networks are noisy due to the limitations of the technology used to generate
them as well as inherent variation within samples. The presence of high levels
of noise can hamper discovery of patterns and dynamics encapsulated by these
networks. Here we propose Network Enhancement (NE), a novel method for
improving the signal-to-noise ratio of undirected, weighted networks, and
thereby improving the performance of downstream analysis. NE applies a novel
operator that induces sparsity and leverages higher-order network structures to
remove weak edges and enhance real connections. This iterative approach has a
closed-form solution at convergence with desirable performance properties. We
demonstrate the effectiveness of NE in denoising biological networks for
several challenging yet important problems. Our experiments show that NE
improves gene function prediction by denoising interaction networks from 22
human tissues. Further, we use NE to interpret noisy Hi-C contact maps from the
human genome and demonstrate its utility across varying degrees of data
quality. Finally, when applied to fine-grained species identification, NE
outperforms alternative approaches by a significant margin. Taken together, our
results indicate that NE is widely applicable for denoising weighted biological
networks, especially when they contain high levels of noise.
-
Logical models offer a simple but powerful means to understand the complex
dynamics of biochemical regulation, without the need to estimate kinetic
parameters. However, even simple automata components can lead to collective
dynamics that are computationally intractable when aggregated into networks. In
previous work we demonstrated that automata network models of biochemical
regulation are highly canalizing, whereby many variable states and their
groupings are redundant (Marques-Pita and Rocha, 2013). The precise charting
and measurement of such canalization simplifies these models, making even very
large networks amenable to analysis. Moreover, canalization plays an important
role in the control, robustness, modularity and criticality of Boolean network
dynamics, especially those used to model biochemical regulation (Gates and
Rocha, 2016; Gates et al., 2016; Manicka, 2017). Here we describe a new
publicly-available Python package that provides the necessary tools to extract,
measure, and visualize canalizing redundancy present in Boolean network models.
It extracts the pathways most effective in controlling dynamics in these
models, including their effective graph and dynamics canalizing map, as well as
other tools to uncover minimum sets of control variables.
-
Infection by many viruses begins with fusion of viral and cellular lipid
membranes, followed by entry of viral contents into the target cell and
ultimately, after many biochemical steps, integration of viral DNA into that of
the host cell. The early steps of membrane fusion and viral capsid entry are
mediated by adsorption to the cell surface, and receptor and coreceptor
binding. HIV-1 specifically targets CD4+ helper T-cells of the human immune
system and binds to the receptor CD4 and coreceptor CCR5 before fusion is
initiated. Previous experiments have been performed using a cell line
(293-Affinofile) in which the expression of CD4 and CCR5 concentration were
independently controlled. After exposure to HIV-1 of various strains, the
resulting infectivity was measured through the fraction of infected cells. To
design and evaluate the effectiveness of drug therapies that target the
inhibition of the entry processes, an accurate functional relationship between
the CD4/CCR5 concentrations and infectivity is desired in order to more
quantitatively analyze experimental data. We propose three kinetic models
describing the possible mechanistic processes involved in HIV entry and fit
their predictions to infectivity measurements, contrasting and comparing
different outcomes. Our approach allows interpretation of the clustering of
infectivity of different strains of HIV-1 in the space of mechanistic kinetic
parameters. Our model fitting also allows inference of nontrivial
stoichiometries of receptor and coreceptor binding and provides a framework
through which to quantitatively investigate the effectiveness of fusion
inhibitors and neutralizing antibodies.