-
Due to the scarcity of quantitative details about biological phenomena,
quantitative modeling in systems biology can be compromised, especially at the
subcellular scale. One way to get around this is qualitative modeling because
it requires few to no quantitative information. One of the most popular
qualitative modeling approaches is the Boolean network formalism. However,
Boolean models allow variables to take only two values, which can be too
simplistic in some cases. The present work proposes a modeling approach derived
from Boolean networks where continuous logical operators are used and where
edges can be tuned. Using continuous logical operators allows variables to be
more finely valued while remaining qualitative. To consider that some
biological interactions can be slower or weaker than other ones, edge states
are also computed in order to modulate in speed and strength the signal they
convey. The proposed formalism is illustrated on a toy network coming from the
epidermal growth factor receptor signaling pathway. The obtained simulations
show that continuous results are produced, thus allowing finer analysis. The
simulations also show that modulating the signal conveyed by the edges allows
to incorporate knowledge about the interactions they model. The goal is to
provide enhancements in the ability of qualitative models to simulate the
dynamics of biological networks while limiting the need of quantitative
information.
-
Estimation of the number of species or unobserved classes from a random
sample of the underlying population is a ubiquitous problem in statistics. In
classical settings, the size of the sample is usually small. New technologies
such as high-throughput DNA sequencing have allowed for the sampling of
extremely large and heterogeneous populations at scales not previously
attainable or even considered. New algorithms are required that take advantage
of the size of the data to account for heterogeneity, but are also sufficiently
fast and scale well with large data. We present a non-parametric moment-based
estimator that is both computationally efficient and is sufficiently flexible
to account for heterogeneity in the abundances of underlying population. This
estimator is based on an extension of a popular moment-based lower bound (Chao,
1984), originally developed by Harris (1959) but unattainable due to the lack
of economical algorithms to solve the system of nonlinear equation required for
estimation. We apply results from the classical moment problem to show that
solutions can be obtained efficiently, allowing for estimators that are
simultaneously conservative and use more information. This is critical for
modern genomic applications, where there may be many large experiments that
require the application of species estimation. We present applications of our
estimator to estimating T-Cell receptor repertoire and dropout in single cell
RNA-seq experiments.
-
In usual demographic analysis, force of mortality is a function of one
variable, that is, of age. In this article bi-variate and multivariate force of
mortality functions are introduced for the first time to explain mortality
differentials. The pattern of mortality in a population is one of the strong
influencing factors in determining the life expectancies at various ages in the
population. Considering univariate functions of age only to understand the
human mortality data without associating with other variables could lead to
incomplete analysis. The reasons behind declining forces of mortality globally
could be studied using the proposed functions. Other applications of
multivariate forces of mortality could be in actuarial sciences.
-
Cryo-electron microscopy provides 2-D projection images of the 3-D electron
scattering intensity of many instances of the particle under study (e.g., a
virus). Both symmetry (rotational point groups) and heterogeneity are important
aspects of biological particles and both aspects can be combined by describing
the electron scattering intensity of the particle as a stochastic process with
a symmetric probability law and therefore symmetric moments. A maximum
likelihood estimator implemented by an expectation-maximization algorithm is
described which estimates the unknown statistics of the electron scattering
intensity stochastic process from images of instances of the particle. The
algorithm is demonstrated on the bacteriophage HK97 and the virus N$\omega$V.
The results are contrasted with existing algorithms which assume that each
instance of the particle has the symmetry rather than the less restrictive
assumption that the probability law has the symmetry.
-
We introduce a tensor-based clustering method to extract sparse,
low-dimensional structure from high-dimensional, multi-indexed datasets. This
framework is designed to enable detection of clusters of data in the presence
of structural requirements which we encode as algebraic constraints in a linear
program. Our clustering method is general and can be tailored to a variety of
applications in science and industry. We illustrate our method on a collection
of experiments measuring the response of genetically diverse breast cancer cell
lines to an array of ligands. Each experiment consists of a cell line-ligand
combination, and contains time-course measurements of the early-signalling
kinases MAPK and AKT at two different ligand dose levels. By imposing
appropriate structural constraints and respecting the multi-indexed structure
of the data, the analysis of clusters can be optimized for biological
interpretation and therapeutic understanding. We then perform a systematic,
large-scale exploration of mechanistic models of MAPK-AKT crosstalk for each
cluster. This analysis allows us to quantify the heterogeneity of breast cancer
cell subtypes, and leads to hypotheses about the signalling mechanisms that
mediate the response of the cell lines to ligands.
-
Pathway Tools is a bioinformatics software environment with a broad set of
capabilities. The software provides genome-informatics tools such as a genome
browser, sequence alignments, a genome-variant analyzer, and
comparative-genomics operations. It offers metabolic-informatics tools, such as
metabolic reconstruction, quantitative metabolic modeling, prediction of
reaction atom mappings, and metabolic route search. Pathway Tools also provides
regulatory-informatics tools, such as the ability to represent and visualize a
wide range of regulatory interactions. The software creates and manages a type
of organism-specific database called a Pathway/Genome Database (PGDB), which
the software enables database curators to interactively edit. It supports web
publishing of PGDBs and provides a large number of query, visualization, and
omics-data analysis tools. Scientists around the world have created more than
9,800 PGDBs by using Pathway Tools, many of which are curated databases for
important model organisms. Those PGDBs can be exchanged using a peer-to-peer
database-sharing system called the PGDB Registry.
-
The development of chemical reaction models aids understanding and prediction
in areas ranging from biology to electrochemistry and combustion. A systematic
approach to building reaction network models uses observational data not only
to estimate unknown parameters, but also to learn model structure. Bayesian
inference provides a natural approach to this data-driven construction of
models. Yet traditional Bayesian model inference methodologies that numerically
evaluate the evidence for each model are often infeasible for nonlinear
reaction network inference, as the number of plausible models can be
combinatorially large. Alternative approaches based on model-space sampling can
enable large-scale network inference, but their realization presents many
challenges. In this paper, we present new computational methods that make
large-scale nonlinear network inference tractable. First, we exploit the
topology of networks describing potential interactions among chemical species
to design improved "between-model" proposals for reversible-jump Markov chain
Monte Carlo. Second, we introduce a sensitivity-based determination of move
types which, when combined with network-aware proposals, yields significant
additional gains in sampling performance. These algorithms are demonstrated on
inference problems drawn from systems biology, with nonlinear differential
equation models of species interactions.
-
The step of expert taxa recognition currently slows down the response time of
many bioassessments. Shifting to quicker and cheaper state-of-the-art machine
learning approaches is still met with expert scepticism towards the ability and
logic of machines. In our study, we investigate both the differences in
accuracy and in the identification logic of taxonomic experts and machines. We
propose a systematic approach utilizing deep Convolutional Neural Nets with the
transfer learning paradigm and extensively evaluate it over a multi-pose
taxonomic dataset with hierarchical labels specifically created for this
comparison. We also study the prediction accuracy on different ranks of
taxonomic hierarchy in detail. Our results revealed that human experts using
actual specimens yield the lowest classification error ($\overline{CE}=6.1\%$).
However, a much faster, automated approach using deep Convolutional Neural Nets
comes close to human accuracy ($\overline{CE}=11.4\%$). Contrary to previous
findings in the literature, we find that for machines following a typical flat
classification approach commonly used in machine learning performs better than
forcing machines to adopt a hierarchical, local per parent node approach used
by human taxonomic experts. Finally, we publicly share our unique dataset to
serve as a public benchmark dataset in this field.
-
The advent of high--throughput transcription profiling technologies has
enabled identification of genes and pathways associated with disease, providing
new avenues for precision medicine. A key challenge is to analyze this data in
the context of the regulatory networks and pathways that control cellular
processes, while still obtaining insights that can be used to design new
diagnostic and therapeutic interventions. While classical differential
expression analysis provides specific and hence targetable gene-level insights,
it does not include any systems-level information. On the other hand, pathway
analyses integrate systems-level information with expression data, but are
often limited in their ability to indicate specific molecular targets. We
introduce GeneSurrounder, an analysis method that takes into account the
complex structure of interaction networks to identify specific genes that
disrupt pathway activity in a disease-specific manner. GeneSurrounder
integrates transcriptomic data and pathway network information in a novel
two-step procedure to detect genes that (i) appear to influence the expression
of other genes local to it in the network and (ii) are part of a subnetwork of
differentially expressed genes. Combined, this evidence can be used to pinpoint
specific genes that have a mechanistic role in the phenotype of interest.
Applying GeneSurrounder to three distinct ovarian cancer studies using a global
KEGG network, we show that our method is able to identify biologically relevant
genes and genes missed by single-gene association tests, integrate pathway and
expression data, and yield more consistent results across multiple studies of
the same phenotype than competing methods.
-
Meaningful laws of nature must be independent of the units employed to
measure the variables. The principle of similitude (Rayleigh 1915) or
dimensional homogeneity, states that only commensurable quantities (ones having
the same dimension) may be compared, therefore, meaningful laws of nature must
be homogeneous equations in their various units of measurement, a result which
was formalized in the $\rm \Pi$ theorem (Vaschy 1892; Buckingham 1914).
However, most relations in allometry do not satisfy this basic requirement,
including the `3/4 Law' (Kleiber 1932) that relates the basal metabolic rate
and body mass, which it is sometimes claimed to be the most fundamental
biological rate (Brown et al. 2004) and the closest to a law in life sciences
(West \& Brown 2004). Using the $\rm \Pi$ theorem, here we show that it is
possible to construct a unique homogeneous equation for the metabolic rates, in
agreement with data in the literature. We find that the variations in the
dependence of the metabolic rates on body mass are secondary, coming from
variations in the allometric dependence of the heart frequencies. This includes
not only different classes of animals (mammals, birds, invertebrates) but also
different exercise conditions (basal and maximal). Our results demonstrate that
most of the differences found in the allometric exponents (White et al. 2007)
are due to compare incommensurable quantities and that our dimensionally
homogenous formula, unify these differences into a single formulation. We
discuss the ecological implications of this new formulation in the context of
the Malthusian's, Fenchel's and the total energy consumed in a lifespan
relations.
-
There is a growing awareness that catastrophic phenomena in biology and
medicine can be mathematically represented in terms of saddle-node
bifurcations. In particular, the term `tipping', or critical transition has in
recent years entered the discourse of the general public in relation to
ecology, medicine, and public health. The saddle-node bifurcation and its
associated theory of catastrophe as put forth by Thom and Zeeman has seen
applications in a wide range of fields including molecular biophysics,
mesoscopic physics, and climate science. In this paper, we investigate a simple
model of a non-autonomous system with a time-dependent parameter $p(\tau)$ and
its corresponding `dynamic' (time-dependent) saddle-node bifurcation by the
modern theory of non-autonomous dynamical systems. We show that the actual
point of no return for a system undergoing tipping can be significantly delayed
in comparison to the {\em breaking time} $\hat{\tau}$ at which the
corresponding autonomous system with a time-independent parameter $p_{a}=
p(\hat{\tau})$ undergoes a bifurcation. A dimensionless parameter
$\alpha=\lambda p_0^3V^{-2}$ is introduced, in which $\lambda$ is the curvature
of the autonomous saddle-node bifurcation according to parameter $p(\tau)$,
which has an initial value of $p_{0}$ and a constant rate of change $V$. We
find that the breaking time $\hat{\tau}$ is always less than the actual point
of no return $\tau^*$ after which the critical transition is irreversible;
specifically, the relation $\tau^*-\hat{\tau}\simeq 2.338(\lambda
V)^{-\frac{1}{3}}$ is analytically obtained. For a system with a small $\lambda
V$, there exists a significant window of opportunity $(\hat{\tau},\tau^*)$
during which rapid reversal of the environment can save the system from
catastrophe.
-
In this paper we investigate the complexity of model selection and model
testing for dynamical systems with toric steady states. Such systems frequently
arise in the study of chemical reaction networks. We do this by formulating
these tasks as a constrained optimization problem in Euclidean space. This
optimization problem is known as a Euclidean distance problem; the complexity
of solving this problem is measured by an invariant called the Euclidean
distance (ED) degree. We determine closed-form expressions for the ED degree of
the steady states of several families of chemical reaction networks with toric
steady states and arbitrarily many reactions. To illustrate the utility of this
work we show how the ED degree can be used as a tool for estimating the
computational cost of solving the model testing and model selection problems.
-
The biomechanics of the human body gives subjects a high degree of freedom in
how they can execute movement. Nevertheless, subjects exhibit regularity in
their movement patterns. One way to account for this regularity is to suppose
that subjects select movement trajectories that are optimal in some sense. We
adopt the principle that human movements are optimal and develop a general
model for human movement patters that uses variational methods in the form of
optimal control theory to calculate trajectories of movement trajectories of
the body. We find that in this approach a constant of the motion that arises
from the model and which plays a role in the optimal control model that is
analogous to the role that the mechanical energy plays in classical physics. We
illustrate how this approach works in practice by using it to develop a model
of walking gait, making all the derivations and calculations in detail. We
finally show that this optimal control model of walking gait recovers in an
appropriate limit an existing model of walking gait which has been shown to
provide good estimates of many observed characteristics of walking gait.
-
Human movements are physical processes combining the classical mechanics of
the human body moving in space and the biomechanics of the muscles generating
the forces acting on the body under sophisticated sensory-motor control. The
characterization of the performance of human movements is a problem with
important applications in clinical and sports research. One way to characterize
movement performance is through measures of energy efficiency that relate the
mechanical energy of the body and metabolic energy expended by the muscles.
Such a characterization provides information about the performance of a
movement insofar as subjects select movements with the aim of maximizing the
energy efficiency. We examine the case of the energy efficiency of asynchronous
arm-cranking doing external mechanical work, that is, using the arms to turn an
asynchronous arm-crank that performs external mechanical work. We construct a
metabolic energy model and use it estimate how cranking may be performed with
maximum energy efficiency, and recover the intuitive result that for larger
external forces the crank-handles should be placed as far from the center of
the crank as is comfortable for the subject to turn. We further examine
mechanical advantage in asynchronous arm-cranking by constructing an idealized
system that is driven by a crank and which involves an adjustable mechanical
advantage, and analyze the case in which the avg. frequency is fixed and derive
the mechanical advantages that maximize energy efficiency.
-
Human movements are physical processes combining the classical mechanics of
the human body moving in space and the biomechanics of the muscles generating
the forces acting on the body under sophisticated sensory-motor control. One
way to characterize movement performance is through measures of energy
efficiency that relate the mechanical energy of the body and metabolic energy
expended by the muscles. We expect the practical utility of such measures to be
greater when human subjects execute movements that maximize energy efficiency.
We therefore seek to understand if and when subjects select movements with that
maximizing energy efficiency. We proceed using a model-based approach to
describe movements which perform a task requiring the body to add or remove
external mechanical work to or from an object. We use the specific example of
walking gaits doing external mechanical work by pulling a cart, and estimate
the relationship between the avg. walking speed and avg. step length. In the
limit where no external work is done, we find that the estimated maximum energy
efficiency walking gait is much slower than the walking gaits healthy adults
typically select. We then modify the situation of the walking gait by
introducing an idealized mechanical device that creates an adjustable
mechanical advantage. The walking gaits that maximize the energy efficiency
using the optimal mechanical advantage are again much slower than the walking
gaits healthy adults typically select. We finally modify the situation so that
the avg. walking speed is fixed and derive the pattern of the avg. step length
and mechanical advantage that maximize energy efficiency.
-
The biomechanics of the human body allow humans a range of possible ways of
executing movements to attain specific goals. This range of movement is limited
by a number of mechanical, biomechanical, or cognitive constraints. Shifts in
these limits result in changes available possible movements from which a
subject can select and can affect which movements a subject selects. Therefore
by understanding the limits on the range of movement we can come to a better
understanding of declines in movement performance due to disease or aging. In
this project, we look at how models for the limits on the range of movement can
be derived in a principled manner from a model of the movement. Using the
example of normal walking gaits, we develop a lower limit on the avg. walking
speed by examining the process by which the body restores mechanical energy
lost during walking, and we develop an upper limit on the avg. step length by
examining the forces the body can exert doing external mechanical work, in this
case, pulling a cart. Making slight changes to the model for normal walking
gaits, we develop a model of very slow walking gaits with avg. walking speeds
below the lower limit on normal walking gaits but that also has a lower limit
on the avg. walking speed. We note that the lowest avg. walking speeds observed
clinically fall into the range of very slow walking gaits so defined, and argue
that forms of bipedal locomotion with still lower speeds should be considered
distinct from walking gaits.
-
The biomechanics of the human body allow humans a range of possible ways of
executing movements to attain specific goals. Nevertheless, humans exhibit
significant patterns in how they execute movements. We propose that the
observed patterns of human movement arise because subjects select those ways to
execute movements that are, in a rigorous sense, optimal. In this project, we
show how this proposition can guide the development of computational models of
movement selection and thereby account for human movement patterns. We proceed
by first developing a movement utility formalism that operationalizes the
concept of a best or optimal way of executing a movement using a utility
function so that the problem of movement selection becomes the problem of
finding the movement that maximizes the utility function. Since the movement
utility formalism includes a contribution of the metabolic energy of the
movement (maximum utility movements try to minimize metabolic energy), we also
develop a metabolic energy formalism that we can use to construct estimators of
the metabolic energies of particular movements. We then show how we can
construct an estimator for the metabolic energies of normal walking gaits and
we use that estimator to construct a movement utility model of the selection of
normal walking gaits and show that the relationship between avg. walking speed
and avg. step length predicted by this model agrees with observation. We
conclude by proposing a physical mechanism that a subject might use to estimate
the metabolic energy of a movement in practice.
-
While the use of technology to provide accurate and objective measurements of
human movement performance is presently an area of great interest, efforts to
quantify the performance of movement are hampered by the lack of a principled
model that describes how a subject goes about making a movement. We put forward
a principled mathematical formalism that describes human movements using an
optimal control model in which the subject controls the jerk of the movement.
We construct the formalism by assuming that the movement a subject chooses to
make is better than the alternatives. We quantify the relative quality of
movements mathematically by specifying a cost functional that assigns a
numerical value to every possible movement; the subject makes the movement that
minimizes the cost functional. We develop the mathematical structure of
movements that minimize a cost functional, and observe that this development
parallels the development of analytical mechanics from the Principle of Least
Action. We derive a constant of the motion for human movements that plays a
role that is analogous to the role that the energy plays in classical
mechanics. We apply the formalism to the description of two movements: (1)
rapid, targeted movements of a computer mouse, and (2) finger-tapping, and show
that the constant of the motion that we have derived provides a useful value
with which we can characterize the performance of the movements. In the case of
rapid, targeted movements of a computer mouse, we show how the model of human
movement that we have developed can be made to agree with Fitts' law, and we
show how Fitts' law is related to the constant of the motion that we have
derived. We finally show that solutions exist within the model of human
movements that exhibit an oscillatory character reminiscent of tremor.
-
The lasso and elastic net linear regression models impose a
double-exponential prior distribution on the model parameters to achieve
regression shrinkage and variable selection, allowing the inference of robust
models from large data sets. However, there has been limited success in
deriving estimates for the full posterior distribution of regression
coefficients in these models, due to a need to evaluate analytically
intractable partition function integrals. Here, the Fourier transform is used
to express these integrals as complex-valued oscillatory integrals over
"regression frequencies". This results in an analytic expansion and stationary
phase approximation for the partition functions of the Bayesian lasso and
elastic net, where the non-differentiability of the double-exponential prior
has so far eluded such an approach. Use of this approximation leads to highly
accurate numerical estimates for the expectation values and marginal posterior
distributions of the regression coefficients, and allows for Bayesian inference
of much higher dimensional models than previously possible.
-
In this work, we consider the problem of estimating summary statistics to
characterise biochemical reaction networks of interest. Such networks are often
described using the framework of the Chemical Master Equation (CME). For
physically-realistic models, the CME is widely considered to be analytically
intractable. A variety of Monte Carlo algorithms have therefore been developed
to explore the dynamics of such networks empirically. Amongst them is the
multi-level method, which uses estimates from multiple ensembles of sample
paths of different accuracies to estimate a summary statistic of interest. {In
this work, we develop the multi-level method in two directions: (1) to increase
the robustness, reliability and performance of the multi-level method, we
implement an improved variance reduction method for generating the sample paths
of each ensemble; and (2) to improve computational performance, we demonstrate
the successful use of a different mechanism for choosing which ensembles should
be included in the multi-level algorithm.
-
We present a continuous model for structural brain connectivity based on the
Poisson point process. The model treats each streamline curve in a tractography
as an observed event in connectome space, here a product space of cortical
white matter boundaries. We approximate the model parameter via kernel density
estimation. To deal with the heavy computational burden, we develop a fast
parameter estimation method by pre-computing associated Legendre products of
the data, leveraging properties of the spherical heat kernel. We show how our
approach can be used to assess the quality of cortical parcellations with
respect to connectivty. We further present empirical results that suggest the
discrete connectomes derived from our model have substantially higher
test-retest reliability compared to standard methods.
-
Light microscopy as well as image acquisition and processing suffer from
physical and technical prejudices which preclude a correct interpretation of
biological observations which can be reflected in, e.g., medical and
pharmacological praxis. Using the examples of a diffracting microbead and
fluorescently labelled tissue, this article clarifies some ignored aspects of
image build-up in the light microscope and introduce algorithms for maximal
extraction of information from the 3D microscopic experiments. We provided a
correct set-up of the microscope and we sought a voxel (3D pixel) called an
electromagnetic centroid which localizes the information about the object. In
diffraction imaging and light emission, this voxel shows a minimal intensity
change in two consecutive optical cuts. This approach further enabled us to
identify z-stack of a DAPI-stained tissue section where at least one object of
a relevant fluorescent marker was in focus. The spatial corrections (overlaps)
of the DAPI-labelled region with in-focus autofluorescent regions then enabled
us to co-localize these three regions in the optimal way when considering
physical laws and information theory. We demonstrate that superresolution down
to the Nobelish level can be obtained from commonplace widefield bright-field
and fluorescence microscopy and bring new perspectives on co-localization in
fluorescent microscopy.
-
The color sensation evoked by an object depends on both the spectral power
distribution of the illumination and the reflectance properties of the object
being illuminated. The color sensation can be characterized by three
color-space values, such as XYZ, RGB, HSV, L*a*b*, etc. It is straightforward
to compute the three values given the illuminant and reflectance curves. The
converse process of computing a reflectance curve given the color-space values
and the illuminant is complicated by the fact that an infinite number of
different reflectance curves can give rise to a single set of color-space
values (metamerism). This paper presents five algorithms for generating a
reflectance curve from a specified sRGB triplet, written for a general
audience. The algorithms are designed to generate reflectance curves that are
similar to those found with naturally occurring colored objects. The computed
reflectance curves are compared to a database of thousands of reflectance
curves measured from paints and pigments available both commercially and in
nature, and the similarity is quantified. One particularly useful application
of these algorithms is in the field of computer graphics, where modeling color
transformations sometimes requires wavelength-specific information, such as
when modeling subtractive color mixture.
-
Biologists have long sought a way to explain how statistical properties of
genetic sequences emerged and are maintained through evolution. On the one
hand, non-random structures at different scales indicate a complex genome
organisation. On the other hand, single-strand symmetry has been scrutinised
using neutral models in which correlations are not considered or irrelevant,
contrary to empirical evidence. Different studies investigated these two
statistical features separately, reaching minimal consensus despite sustained
efforts. Here we unravel previously unknown symmetries in genetic sequences,
which are organized hierarchically through scales in which non-random
structures are known to be present. These observations are confirmed through
the statistical analysis of the human genome and explained through a simple
domain model. These results suggest that domain models which account for the
cumulative action of mobile elements can explain simultaneously non-random
structures and symmetries in genetic sequences.
-
We introduce and study a set of training-free methods of
information-theoretic and algorithmic complexity nature applied to DNA
sequences to identify their potential capabilities to determine nucleosomal
binding sites. We test our measures on well-studied genomic sequences of
different sizes drawn from different sources. The measures reveal the known in
vivo versus in vitro predictive discrepancies and uncover their potential to
pinpoint (high) nucleosome occupancy. We explore different possible signals
within and beyond the nucleosome length and find that complexity indices are
informative of nucleosome occupancy. We compare against the gold standard
(Kaplan model) and find similar and complementary results with the main
difference that our sequence complexity approach. For example, for high
occupancy, complexity-based scores outperform the Kaplan model for predicting
binding representing a significant advancement in predicting the highest
nucleosome occupancy following a training-free approach.