• The evolution of microbial and viral organisms often generates clonal interference, a mode of competition between genetic clades within a population. In this paper, we show that interference strongly constrains the genetic and phenotypic complexity of evolving systems. Our analysis uses biophysically grounded evolutionary models for an organism's quantitative molecular phenotypes, such as fold stability and enzymatic activity of genes. We find a generic mode of asexual evolution called phenotypic interference with strong implications for systems biology: it couples the stability and function of individual genes to the population's global speed of evolution. This mode occurs over a wide range of evolutionary parameters appropriate for microbial populations. It generates selection against genome complexity, because the fitness cost of mutations increases faster than linearly with the number of genes. Recombination can generate a distinct mode of sexual evolution that eliminates the superlinear cost. We show that positive selection can drive a transition from asexual to facultative sexual evolution, providing a specific, biophysically grounded scenario for the evolution of sex. In a broader context, our analysis suggests that the systems biology of microbial organisms is strongly intertwined with their mode of evolution.
  • Trait differences between species may be attributable to natural selection. However, quantifying the strength of evidence for selection acting on a particular trait is a difficult task. Here we develop a population-genetic test for selection acting on a quantitative trait which is based on multiple-line crosses. We show that using multiple lines increases both the power and the scope of selection inference. First, a test based on three or more lines detects selection with strongly increased statistical significance, and we show explicitly how the sensitivity of the test depends on the number of lines. Second, a multiple-line test makes it possible to distinguish different lineage-specific selection scenarios. Our analytical results are complemented by extensive numerical simulations. We then apply the multiple-line test to QTL data on floral character traits in plant species of the Mimulus genus and on photoperiodic traits in different maize strains, where we find signatures of lineage-specific selection not seen in a two-line test.
  • Gene expression levels are important molecular quantitative traits that link genotypes to molecular functions and fitness. In Drosophila, population-genetic studies in recent years have revealed substantial adaptive evolution at the genomic level. However, the evolutionary modes of gene expression have remained controversial. Here we present evidence that adaptation dominates the evolution of gene expression levels in flies. We show that 63% of the observed expression divergence across seven Drosophila species consists of adaptive changes driven by directional selection. Our results are derived from the variation of expression within species and the time-resolved divergence across a family of related species, using a new inference method for selection. We identify functional classes of adaptively regulated genes, as well as sex-specific adaptation occurring predominantly in males. Our analysis opens a new avenue to map system-wide selection on molecular quantitative traits independently of their genetic basis.
  • The 2014 epidemic of the Ebola virus is governed by a genetically diverse viral population. In the early Sierra Leone outbreak, a recent study has identified new mutations that generate genetically distinct sequence clades. Here we find evidence that major Sierra Leone clades have systematic differences in growth rate and reproduction number. If this growth heterogeneity remains stable, it will generate major shifts in clade frequencies and influence the overall epidemic dynamics on time scales within the current outbreak. Our method is based on simple summary statistics of clade growth, which can be inferred from genealogical trees with an underlying clade-specific birth-death model of the infection dynamics. This method can be used to perform real-time tracking of an evolving epidemic and identify emerging clades of epidemiological or evolutionary significance.
  • Recent studies have consistently inferred high rates of adaptive molecular evolution between Drosophila species. At the same time, the Drosophila genome evolves under different rates of recombination, which results in partial genetic linkage between alleles at neighboring genomic loci. Here we analyze how linkage correlations affect adaptive evolution. We develop a new inference method for adaptation that takes into account the effect on an allele at a focal site caused by neighboring deleterious alleles (background selection) and by neighboring adaptive substitutions (hitchhiking). Using complete genome sequence data and fine-scale recombination maps, we infer a highly heterogeneous scenario of adaptation in Drosophila. In high-recombining regions, about 50% of all amino acid substitutions are adaptive, together with about 20% of all substitutions in proximal intergenic regions. In low-recombining regions, only a small fraction of the amino acid substitutions are adaptive, while hitchhiking accounts for the majority of these changes. Hitchhiking of deleterious alleles generates a substantial collateral cost of adaptation, leading to a fitness decline of about 30/2N per gene and per million years in the lowest-recombining regions. Our results show how recombination shapes rate and efficacy of the adaptive dynamics in eukaryotic genomes.
  • Molecular phenotypes link genomic information with organismic functions, fitness, and evolution. Quantitative traits are complex phenotypes that depend on multiple genomic loci. In this paper, we study the adaptive evolution of a quantitative trait under time-dependent selection, which arises from environmental changes or through fitness interactions with other co-evolving phenotypes. We analyze a model of trait evolution under mutations and genetic drift in a single-peak fitness seascape. The fitness peak performs a constrained random walk in the trait amplitude, which determines the time-dependent trait optimum in a given population. We derive analytical expressions for the distribution of the time-dependent trait divergence between populations and of the trait diversity within populations. Based on this solution, we develop a method to infer adaptive evolution of quantitative traits. Specifically, we show that the ratio of the average trait divergence and the diversity is a universal function of evolutionary time, which predicts the stabilizing strength and the driving rate of the fitness seascape. From an information-theoretic point of view, this function measures the macro-evolutionary entropy in a population ensemble, which determines the predictability of the evolutionary process. Our solution also quantifies two key characteristics of adapting populations: the cumulative fitness flux, which measures the total amount of adaptation, and the adaptive load, which is the fitness cost due to a population's lag behind the fitness peak.
  • Molecular traits, such as gene expression levels or protein binding affinities, are increasingly accessible to quantitative measurement by modern high-throughput techniques. Such traits measure molecular functions and, from an evolutionary point of view, are important as targets of natural selection. We review recent developments in evolutionary theory and experiments that are expected to become building blocks of a quantitative genetics of molecular traits. We focus on universal evolutionary characteristics: these are largely independent of a trait's genetic basis, which is often at least partially unknown. We show that universal measurements can be used to infer selection on a quantitative trait, which determines its evolutionary mode of conservation or adaptation. Furthermore, universality is closely linked to predictability of trait evolution across lineages. We argue that universal trait statistics extends over a range of cellular scales and opens new avenues of quantitative evolutionary systems biology.
  • This paper addresses the statistical significance of structures in random data: Given a set of vectors and a measure of mutual similarity, how likely does a subset of these vectors form a cluster with enhanced similarity among its elements? The computation of this cluster p-value for randomly distributed vectors is mapped onto a well-defined problem of statistical mechanics. We solve this problem analytically, establishing a connection between the physics of quenched disorder and multiple testing statistics in clustering and related problems. In an application to gene expression data, we find a remarkable link between the statistical significance of a cluster and the functional relationships between its genes.
  • Sequence alignment forms the basis of many methods for functional annotation by phylogenetic comparison, but becomes unreliable in the `twilight' regions of high sequence divergence and short gene length. Here we perform a cross-species comparison of two herpesviruses, VZV and KSHV, with a hybrid method called graph alignment. The method is based jointly on the similarity of protein interaction networks and on sequence similarity. In our alignment, we find open reading frames for which interaction similarity concurs with a low level of sequence similarity, thus confirming the evolutionary relationship. In addition, we find high levels of interaction similarity between open reading frames without any detectable sequence similarity. The functional predictions derived from this alignment are consistent with genomic position and gene expression data.
  • An important part of the analysis of bio-molecular networks is to detect different functional units. Different functions are reflected in a different evolutionary dynamics, and hence in different statistical characteristics of network parts. In this sense, the {\em global statistics} of a biological network, e.g., its connectivity distribution, provides a background, and {\em local deviations} from this background signal functional units. In the computational analysis of biological networks, we thus typically have to discriminate between different statistical models governing different parts of the dataset. The nature of these models depends on the biological question asked. We illustrate this rationale here with three examples: identification of functional parts as highly connected \textit{network clusters}, finding \textit{network motifs}, which occur in a similar form at different places in the network, and the analysis of \textit{cross-species network correlations}, which reflect evolutionary dynamics between species.
  • Complex interactions between genes or proteins contribute a substantial part to phenotypic evolution. Here we develop an evolutionarily grounded method for the cross-species analysis of interaction networks by {\em alignment}, which maps bona fide functional relationships between genes in different organisms. Network alignment is based on a scoring function measuring mutual similarities between networks taking into account their interaction patterns as well as sequence similarities between their nodes. High-scoring alignments and optimal alignment parameters are inferred by a systematic Bayesian analysis. We apply this method to analyze the evolution of co-expression networks between human and mouse. We find evidence for significant conservation of gene expression clusters and give network-based predictions of gene function. We discuss examples where cross-species functional relationships between genes do not concur with sequence similarity.
  • We study secondary structures of random RNA molecules by means of a renormalized field theory based on an expansion in the sequence disorder. We show that there is a continuous phase transition from a molten phase at higher temperatures to a low-temperature glass phase. The primary freezing occurs above the critical temperature, with local islands of stable folds forming within the molten phase. The size of these islands defines the correlation length of the transition. Our results include critical exponents at the transition and in the glass phase.
  • We study the stochastic dynamics of sequences evolving by single site mutations, segmental duplications, deletions, and random insertions. These processes are relevant for the evolution of genomic DNA. They define a universality class of non-equilibrium 1D expansion-randomization systems with generic stationary long-range correlations in a regime of growing sequence length. We obtain explicitly the two-point correlation function of the sequence composition and the distribution function of the composition bias in sequences of finite length. The characteristic exponent $\chi$ of these quantities is determined by the ratio of two effective rates, which are explicitly calculated for several specific sequence evolution dynamics of the universality class. Depending on the value of $\chi$, we find two different scaling regimes, which are distinguished by the detectability of the initial composition bias. All analytic results are accurately verified by numerical simulations. We also discuss the non-stationary build-up and decay of correlations, as well as more complex evolutionary scenarios, where the rates of the processes vary in time. Our findings provide a possible example for the emergence of universality in molecular biology.
  • This is the first of two papers where we discuss the limits imposed by competition on the biodiversity of species communities. In this first paper we study the coexistence of competing species at the fixed point of population dynamic equations. For many simple models, this imposes a limit on the width of the productivity distribution, which is more severe the more diverse the ecosystem is (Chesson, 1994). Here we review and generalize this analysis, beyond the ``mean-field''-like approximation of the competition matrix used in previous works, and extend it to structured food webs. In all cases analysed, we obtain qualitatively similar relations between biodiversity and competition: the narrower the productivity distribution is, the more species can stably coexist. We discuss how this result, considered together with environmental fluctuations, limits the maximal biodiversity that a trophic level can host.
  • This is the second of two papers dedicated to the relationship between population models of competition and biodiversity. Here we consider species assembly models where the population dynamics is kept far from fixed points through the continuous introduction of new species, and generalize to such models the coexistence condition derived for systems at the fixed point. The ecological overlap between species with shared prey, which we define here, provides a quantitative measure of the effective interspecies competition and of the trophic network topology. We obtain distributions of the overlap from simulations of a new model based both on immigration and speciation, and show that they are in good agreement with those measured for three large natural food webs. As discussed in the first paper, rapid environmental fluctuations, interacting with the condition for coexistence of competing species, limit the maximal biodiversity that a trophic level can host. This horizontal limitation to biodiversity is here combined with either dissipation of energy or growth of fluctuations, which in our model limit the length of food webs in the vertical direction. These ingredients yield an effective model of food webs that produces a biodiversity profile with a maximum at an intermediate trophic level, in agreement with field studies.
  • We study a minimal model for genome evolution whose elementary processes are single site mutation, duplication and deletion of sequence regions and insertion of random segments. These processes are found to generate long-range correlations in the composition of letters as long as the sequence length is growing, i.e., the combined rates of duplications and insertions are higher than the deletion rate. For constant sequence length, on the other hand, all initial correlations decay exponentially. These results are obtained analytically and by simulations. They are compared with the long-range correlations observed in genomic DNA, and the implications for genome evolution are discussed.
  • Interaction networks are of central importance in post-genomic molecular biology, with increasing amounts of data becoming available by high-throughput methods. Examples are gene regulatory networks or protein interaction maps. The main challenge in the analysis of these data is to read off biological functions from the topology of the network. Topological motifs, i.e., patterns occurring repeatedly at different positions in the network, have recently been identified as basic modules of molecular information processing. In this paper, we discuss motifs derived from families of mutually similar but not necessarily identical patterns. We establish a statistical model for the occurrence of such motifs, from which we derive a scoring function for their statistical significance. Based on this scoring function, we develop a search algorithm for topological motifs called graph alignment, a procedure with some analogies to sequence alignment. The algorithm is applied to the gene regulation network of E. coli.
  • The regulation of a gene depends on the binding of transcription factors to specific sites located in the regulatory region of the gene. The generation of these binding sites and of cooperativity between them are essential building blocks in the evolution of complex regulatory networks. We study a theoretical model for the sequence evolution of binding sites by point mutations. The approach is based on biophysical models for the binding of transcription factors to DNA. Hence we derive empirically grounded fitness landscapes, which enter a population genetics model including mutations, genetic drift, and selection. We show that the selection for factor binding generically leads to specific correlations between nucleotide frequencies at different positions of a binding site. We demonstrate the possibility of rapid adaptive evolution generating a new binding site for a given transcription factor by point mutations. The evolutionary time required is estimated in terms of the neutral (background) mutation rate, the selection coefficient, and the effective population size. The efficiency of binding site formation is seen to depend on two joint conditions: the binding site motif must be short enough and the promoter region must be long enough. These constraints on promoter architecture are indeed seen in eukaryotic systems. Furthermore, we analyse the adaptive evolution of genetic switches and of signal integration through binding cooperativity between different sites. Experimental tests of this picture involving the statistics of polymorphisms and phylogenies of sites are discussed.
  • The structure of molecular networks derives from dynamical processes on evolutionary time scales. For protein interaction networks, global statistical features of their structure can now be inferred consistently from several large-throughput datasets. Understanding the underlying evolutionary dynamics is crucial for discerning random parts of the network from biologically important properties shaped by natural selection. We present a detailed statistical analysis of the protein interactions in Saccharomyces cerevisiae based on several large-throughput datasets. Protein pairs resulting from gene duplications are used as tracers into the evolutionary past of the network. From this analysis, we infer rate estimates for two key evolutionary processes shaping the network: (i) gene duplications and (ii) gain and loss of interactions through mutations in existing proteins, which are referred to as link dynamics. Importantly, the link dynamics is asymmetric, i.e., the evolutionary steps are mutations in just one of the binding partners. The link turnover is shown to be much faster than gene duplications. According to this model, the link dynamics is the dominant evolutionary force shaping the statistical structure of the network, while the slower gene duplication dynamics mainly affects its size. Specifically, the model predicts (i) a broad distribution of the connectivities (i.e., the number of binding partners of a protein) and (ii) correlations between the connectivities of interacting proteins.
  • Modes of speciation have been the subject of a century's debate. Traditionally, most speciations are believed to be caused by spatial separation of populations (allopatry). Recent observations (Meyer 1990, Schliewen 1994, Schliewen 2001, Rico 2002) and models (MaynardSmith 1966, Antonovics 1971, Dickinson 1973, Rosenzweig 1978, Turner 1995, Noest 1997, Geritz 1998, Kondrashov 1999, Dieckmann 1999, Doebeli 2000, Slatkin 1980) show that speciation can also take place in sympatry. We discuss a comprehensive model of coupled differentiation in phenotype, mating, and space, showing that spatial segregation can be an induced process following a sympatric differentiation. This is found to be a generic mechanism of adaptation to heterogeneous environments, for which we propose the term diapatric speciation (Greek). It explains the ubiquitous spatial patching of newly formed species, despite their sympatric origin (Schliewen 1994, Schliewen 2001, Rico 2002).
  • We develop a statistical theory of networks. A network is a set of vertices and links given by its adjacency matrix $\c$, and the relevant statistical ensembles are defined in terms of a partition function $Z=\sum_{\c} \exp {[}-\beta \H(\c) {]}$. The simplest cases are uncorrelated random networks such as the well-known Erd\"os-R\'enyi graphs. Here we study more general interactions $\H(\c)$ which lead to {\em correlations}, for example, between the connectivities of adjacent vertices. In particular, such correlations occur in {\em optimized} networks described by partition functions in the limit $\beta \to \infty$. They are argued to be a crucial signature of evolutionary design in biological networks.
  • Quantum Game Theory (cond-mat/0206093)

    June 6, 2002 cond-mat.stat-mech, q-bio
    A systematic theory is introduced that describes stochastic effects in game theory. In a biological context, such effects are relevant for the evolution of finite populations with frequency-dependent selection. They are characterized by quantum Nash equilibria, a generalization of the well-known Nash equilibrium points in classical game theory. The implications of this theory for biological systems are discussed in detail.
  • We study the statistics of ecosystems with a variable number of co-evolving species. The species interact in two ways: by prey-predator relationships and by direct competition with similar kinds. The interaction coefficients change slowly through successful adaptations and speciations. We treat them as quenched random variables. These interactions determine long-term topological features of the species network, which are found to agree with those of biological systems.
  • Semi-flexible manifolds such as fluid membranes or semi-flexible polymers undergo delocalization transitions if they are subject to attractive interactions. We study manifolds with short-ranged interactions by field-theoretic methods based on the operator product expansion of local interaction fields. We apply this approach to manifolds in a random potential. Randomness is always relevant for fluid membranes, while for semi-flexible polymers there is a first order transition to the strong coupling regime at a finite temperature.
  • Mutual correlation between segments of DNA or protein sequences can be detected by Smith-Waterman local alignments. We present a statistical analysis of alignment of such sequences, based on a recent scaling theory. A new fidelity measure is introduced and shown to capture the significance of the local alignment, i.e., the extent to which the correlated subsequences are correctly identified. It is demonstrated how the fidelity may be optimized in the space of penalty parameters using only the alignment score data of a single sequence pair.
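
As an illustration of the Smith-Waterman local alignment named in the last abstract, the following is a minimal sketch and not the authors' implementation. It assumes a simple scoring scheme with one match score, one mismatch score, and a linear gap penalty; the function name `smith_waterman_score` and the default parameter values are hypothetical choices made for this example. The fidelity measure and the scaling theory discussed in the abstract are not reproduced here.

```python
def smith_waterman_score(a, b, match=1.0, mismatch=-1.0, gap=-2.0):
    """Best local alignment score of sequences a and b, with its end cell.

    Assumed scoring (illustrative only): `match` for equal letters,
    `mismatch` for unequal letters, and a linear penalty `gap` per gap.
    """
    rows, cols = len(a) + 1, len(b) + 1
    # H[i][j] is the best score of a local alignment ending at a[i-1], b[j-1].
    H = [[0.0] * cols for _ in range(rows)]
    best, best_pos = 0.0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            s = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(
                0.0,                   # start a new local alignment
                H[i - 1][j - 1] + s,   # extend by a (mis)match
                H[i - 1][j] + gap,     # gap in b
                H[i][j - 1] + gap,     # gap in a
            )
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)
    return best, best_pos


if __name__ == "__main__":
    # Two sequences sharing the segment "ACG" inside unrelated flanks.
    print(smith_waterman_score("TTACGTT", "GGACGGG"))
```

Scanning the score over a grid of `mismatch` and `gap` values for a single sequence pair is the kind of penalty-parameter exploration the abstract refers to, although the optimization criterion used there (the fidelity) would have to be supplied separately.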