• Phylogenetic comparative methods explore the relationships between quantitative traits adjusting for shared evolutionary history. This adjustment often occurs through a Brownian diffusion process along the branches of the phylogeny that generates model residuals or the traits themselves. For high-dimensional traits, inferring all pair-wise correlations within the multivariate diffusion is limiting. To circumvent this problem, we propose phylogenetic factor analysis (PFA) that assumes a small unknown number of independent evolutionary factors arise along the phylogeny and these factors generate clusters of dependent traits. Set in a Bayesian framework, PFA provides measures of uncertainty on the factor number and groupings, combines both continuous and discrete traits, integrates over missing measurements and incorporates phylogenetic uncertainty with the help of molecular sequences. We develop Gibbs samplers based on dynamic programming to estimate the PFA posterior distribution, over three-fold faster than for multivariate diffusion and a further order-of-magnitude more efficiently in the presence of latent traits. We further propose a novel marginal likelihood estimator for previously impractical models with discrete data and find that PFA also provides a better fit than multivariate diffusion in evolutionary questions in columbine flower development, placental reproduction transitions and triggerfish fin morphometry.
  • Effective population size characterizes the genetic variability in a population and is a parameter of paramount importance in population genetics. Kingman's coalescent process enables inference of past population dynamics directly from molecular sequence data, and researchers have developed a number of flexible coalescent-based models for Bayesian nonparametric estimation of the effective population size as a function of time. A major goal of demographic reconstruction is understanding the association between the effective population size and potential explanatory factors. Building upon Bayesian nonparametric coalescent-based approaches, we introduce a flexible framework that incorporates time-varying covariates through Gaussian Markov random fields. To approximate the posterior distribution, we adapt efficient Markov chain Monte Carlo algorithms designed for highly structured Gaussian models. Incorporating covariates into the demographic inference framework enables the modeling of associations between the effective population size and covariates while accounting for uncertainty in population histories. Furthermore, it can lead to more precise estimates of population dynamics. We apply our model to four examples. We reconstruct the demographic history of raccoon rabies in North America and find a significant association with the spatiotemporal spread of the outbreak. Next, we examine the effective population size trajectory of the DENV-4 virus in Puerto Rico along with viral isolate count data and find similar cyclic patterns. We compare the population history of the HIV-1 CRF02_AG clade in Cameroon with HIV incidence and prevalence data and find that the effective population size is more reflective of incidence rate. Finally, we explore the hypothesis that the population dynamics of musk ox during the Late Quaternary period were related to climate change.
  • Understanding the processes that give rise to quantitative measurements associated with molecular sequence data remains an important issue in statistical phylogenetics. Examples of such measurements include geographic coordinates in the context of phylogeography and phenotypic traits in the context of comparative studies. A popular approach is to model the evolution of continuously varying traits as a Brownian diffusion process. However, standard Brownian diffusion is quite restrictive and may not accurately characterize certain trait evolutionary processes. Here, we relax one of the major restrictions of standard Brownian diffusion by incorporating a nontrivial estimable drift into the process. We introduce a relaxed drift diffusion model for the evolution of multivariate continuously varying traits along a phylogenetic tree via Brownian diffusion with drift. Notably, the relaxed drift model accommodates branch-specific variation of drift rates while preserving model identifiability. We implement the relaxed drift model in a Bayesian inference framework to simultaneously reconstruct the evolutionary histories of molecular sequence data and associated multivariate continuous trait data, and provide tools to visualize evolutionary reconstructions. We illustrate our approach in three viral examples. In the first two, we examine the spatiotemporal spread of HIV-1 in central Africa and West Nile virus in North America and show that a relaxed drift approach uncovers a clearer, more detailed picture of the dynamics of viral dispersal than standard Brownian diffusion. Finally, we study antigenic evolution in the context of HIV-1 resistance to three broadly neutralizing antibodies. Our analysis reveals evidence of a continuous drift at the HIV-1 population level towards enhanced resistance to neutralization by the VRC01 monoclonal antibody over the course of the epidemic.
  • Understanding which phenotypic traits are consistently correlated throughout evolution is a highly pertinent problem in modern evolutionary biology. Here, we propose a multivariate phylogenetic latent liability model for assessing the correlation between multiple types of data, while simultaneously controlling for their unknown shared evolutionary history informed through molecular sequences. The latent formulation enables us to consider in a single model combinations of continuous traits, discrete binary traits and discrete traits with multiple ordered and unordered states. Previous approaches have entertained a single data type generally along a fixed history, precluding estimation of correlation between traits and ignoring uncertainty in the history. We implement our model in a Bayesian phylogenetic framework, and discuss inference techniques for hypothesis testing. Finally, we showcase the method through applications to columbine flower morphology, antibiotic resistance in Salmonella and epitope evolution in influenza.
  • Although foot-and-mouth disease virus (FMDV) incidence has decreased in South America over the last years, the pathogen still circulates in the region and the risk of re-emergence in previously FMDV-free areas is a veterinary public health concern. In this paper we merge environmental, epidemiological and genetic data to reconstruct spatiotemporal patterns and determinants of FMDV serotypes A and O dispersal in South America. Our dating analysis suggests that serotype A emerged in South America around 1930, while serotype O emerged around 1990. The rate of evolution for serotype A was significantly higher compared to serotype O. Phylogeographic inference identified two well-connected sub networks of viral flow, one including Venezuela, Colombia and Ecuador; another including Brazil, Uruguay and Argentina. The spread of serotype A was best described by geographic distances, while trade of live cattle was the predictor that best explained serotype O spread. Our findings show that the two serotypes have different underlying evolutionary and spatial dynamics and may pose different threats to control programmes. Key-words: Phylogeography, foot-and-mouth disease virus, South America, animal trade.
  • When estimating a phylogeny from a multiple sequence alignment, researchers often assume the absence of recombination. However, if recombination is present, then tree estimation and all downstream analyses will be impacted, because different segments of the sequence alignment support different phylogenies. Similarly, convergent selective pressures at the molecular level can also lead to phylogenetic tree incongruence across the sequence alignment. Current methods for detection of phylogenetic incongruence are not equipped to distinguish between these two different mechanisms and assume that the incongruence is a result of recombination or other horizontal transfer of genetic information. We propose a new recombination detection method that can make this distinction, based on synonymous codon substitution distances. Although some power is lost by discarding the information contained in the nonsynonymous substitutions, our new method has lower false positive probabilities than the comparable recombination detection method when the phylogenetic incongruence signal is due to convergent evolution. We apply our method to three empirical examples, where we analyze: 1) sequences from a transmission network of the human immunodeficiency virus, 2) tlpB gene sequences from a geographically diverse set of 38 Helicobacter pylori strains, and 3) Hepatitis C virus sequences sampled longitudinally from one patient.
  • Influenza viruses undergo continual antigenic evolution allowing mutant viruses to evade host immunity acquired to previous virus strains. Antigenic phenotype is often assessed through pairwise measurement of cross-reactivity between influenza strains using the hemagglutination inhibition (HI) assay. Here, we extend previous approaches to antigenic cartography, and simultaneously characterize antigenic and genetic evolution by modeling the diffusion of antigenic phenotype over a shared virus phylogeny. Using HI data from influenza lineages A/H3N2, A/H1N1, B/Victoria and B/Yamagata, we determine patterns of antigenic drift across viral lineages, showing that A/H3N2 evolves faster and in a more punctuated fashion than other influenza lineages. We also show that year-to-year antigenic drift appears to drive incidence patterns within each influenza lineage. This work makes possible substantial future advances in investigating the dynamics of influenza and other antigenically-variable pathogens by providing a model that intimately combines molecular and antigenic evolution.
  • Background: Simulated nucleotide or amino acid sequences are frequently used to assess the performance of phylogenetic reconstruction methods. BEAST, a Bayesian statistical framework that focuses on reconstructing time-calibrated molecular evolutionary processes, supports a wide array of evolutionary models, but lacked matching machinery for simulation of character evolution along phylogenies. Results: We present a flexible Monte Carlo simulation tool, called piBUSS, that employs the BEAGLE high performance library for phylogenetic computations within BEAST to rapidly generate large sequence alignments under complex evolutionary models. piBUSS sports a user-friendly graphical user interface (GUI) that allows combining a rich array of models across an arbitrary number of partitions. A command-line interface mirrors the options available through the GUI and facilitates scripting in large-scale simulation studies. Analogous to BEAST model and analysis setup, more advanced simulation options are supported through an extensible markup language (XML) specification, which in addition to generating sequence output, also allows users to combine simulation and analysis in a single BEAST run. Conclusions: piBUSS offers a unique combination of flexibility and ease-of-use for sequence simulation under realistic evolutionary scenarios. Through different interfaces, piBUSS supports simulation studies ranging from modest endeavors for illustrative purposes to complex and large-scale assessments of evolutionary inference procedures. The software aims at implementing new models and data types that are continuously being developed as part of BEAST/BEAGLE.
  • Molecular phylogenetic and phylogeographic reconstructions generally assume time-homogeneous substitution processes. Motivated by computational convenience, this assumption sacrifices biological realism and offers little opportunity to uncover the temporal dynamics in evolutionary histories. Here, we extend and generalize an evolutionary approach that relaxes the time-homogeneous process assumption by allowing the specification of different infinitesimal substitution rate matrices across different time intervals, called epochs, along the evolutionary history. We focus on an epoch model implementation in a Bayesian inference framework that offers great modeling flexibility in drawing inference about any discrete data type characterized as a continuous-time Markov chain, including phylogeographic traits. To alleviate the computational burden that the additional temporal heterogeneity imposes, we adopt a massively parallel approach that achieves both fine- and coarse-grain parallelization of the computations across branches that accommodate epoch transitions, making extensive use of graphics processing units. Through synthetic examples, we assess model performance in recovering evolutionary parameters from data generated according to different evolutionary scenarios that comprise different numbers of epochs for both nucleotide and codon substitution processes. We illustrate the usefulness of our inference framework in two different applications to empirical data sets: the selection dynamics on within-host HIV populations throughout infection and the seasonality of global influenza circulation. In both cases, our epoch model captures key features of temporal heterogeneity that remained difficult to test using ad hoc procedures.
  • Global mobility flow data are at the heart of spatial epidemiological models used to predict infectious disease behavior but this wealth of data on human mobility has been largely neglected by reconstructions of pathogen evolutionary dynamics using viral genetic data. Although stochastic models of viral evolution may potentially be informed by such data, a major challenge lies in deciding which mobility processes are critical and to what extent they contribute to shaping contemporaneous distributions of pathogen diversity. Here, we develop a framework to integrate predictors of viral diffusion with phylogeographic inference and estimate human influenza H3N2 migration history while simultaneously testing and quantifying the factors that underly it. We provide evidence for air travel governing the global dynamics of human influenza whereas other processes act at a more local scale.