• We investigate the rates of drug resistance acquisition in a natural population using molecular epidemiological data from Bolivia. First, we study the rate of direct acquisition of double resistance from the double sensitive state within patients and compare it to the rates of evolution to single resistance. In particular, we address whether or not double resistance can evolve directly from a double sensitive state within a given host. Second, we aim to understand whether the differences in mutation rates to rifampicin and isoniazid resistance translate to the epidemiological scale. Third, we estimate the proportion of MDR TB cases that are due to the transmission of MDR strains compared to acquisition of resistance through evolution. To address these problems we develop a model of TB transmission in which we track the evolution of resistance to two drugs and the evolution of VNTR loci. However, the available data is incomplete, in that it is recorded only {for a fraction of the population and} at a single point in time. The likelihood function induced by the proposed model is computationally prohibitive to evaluate and accordingly impractical to work with directly. We therefore approach statistical inference using approximate Bayesian computation techniques.
  • Accurate estimation of evolutionary distances between taxa is important for many phylogenetic reconstruction methods. In the case of bacteria, distances can be estimated using a range of different evolutionary models, from single nucleotide polymorphisms to large-scale genome rearrangements. In the case of sequence evolution models (such as the Jukes-Cantor model and associated metric) have been used to correct pairwise distances. Similar correction methods for genome rearrangement processes are required to improve inference. Current attempts at correction fall into 3 categories: Empirical computational studies, Bayesian/MCMC approaches, and combinatorial approaches. Here we introduce a maximum likelihood estimator for the inversion distance between a pair of genomes, using the group-theoretic approach to modelling inversions introduced recently. This MLE functions as a corrected distance: in particular, we show that because of the way sequences of inversions interact with each other, it is quite possible for minimal distance and MLE distance to differently order the distances of two genomes from a third. This has obvious implications for the use of minimal distance in phylogeny reconstruction. The work also tackles the above problem allowing free rotation of the genome. Generally a frame of reference is locked, and all computation made accordingly. This work incorporates the action of the dihedral group so that distance estimates are free from any a priori frame of reference.
  • Modellers of large scale genome rearrangement events, in which segments of DNA are inverted, moved, swapped, or even inserted or deleted, have found a natural syntax in the language of permutations. Despite this, there has been a wide range of modelling choices, assumptions and interpretations that make navigating the literature a significant challenge. Indeed, even authors of papers that use permutations to model genome rearrangement can struggle to interpret each others' work, because of subtle differences in basic assumptions that are often deeply ingrained (and consequently sometimes not even mentioned). In this paper, we describe the different ways in which permutations have been used to model genomes and genome rearrangement events, presenting some features and limitations of each approach, and show how the various models are related. This paper will help researchers navigate the landscape of genome rearrangement models, and make it easier for authors to present clear and consistent models.
  • We present a search algorithm for constructing embeddings and deciding isomorphisms of semigroups, working with their multiplication tables. The algorithm is used for enumerating diagram semigroups up to isomorphism and for finding minimal degree representations.
  • Confidence nets, that is, collections of confidence intervals that fill out the parameter space and whose exact parameter coverage can be computed, are familiar in nonparametric statistics. Here, the distributional assumptions are based on invariance under the action of a finite reflection group. Exact confidence nets are exhibited for a single parameter, based on the root system of the group. The main result is a formula for the generating function of the coverage interval probabilities. The proof makes use of the theory of "buildings" and the Chevalley factorization theorem for the length distribution on Cayley graphs of finite reflection groups.
  • Many models of genome rearrangement involve operations (e.g. inversions and translocations) that are self-inverse, and hence generate a group acting on the space of genomes. This gives a correspondence between genome arrangements and the elements of a group, and consequently, between evolutionary paths and walks on the Cayley graph. Many common methods for phylogeny reconstruction rely on calculating the minimal distance between two genomes; this omits much of the other information available from the Cayley graph. In this paper we begin an exploration of some of this additional information, in particular describing the phylogeny as a Steiner tree within the Cayley graph, and exploring the "interval" between two genomes. While motivated by problems in systematic biology, many of these ideas are of independent group-theoretic interest.
  • A binary phylogenetic network may or may not be obtainable from a tree by the addition of directed edges (arcs) between tree arcs. Here, we establish a precise and easily tested criterion (based on `2-SAT') that efficiently determines whether or not any given network can be realized in this way. Moreover, the proof provides a polynomial-time algorithm for finding one or more trees (when they exist) on which the network can be based. A number of interesting consequences are presented as corollaries; these lead to some further relevant questions and observations, which we outline in the conclusion.
  • Diagram semigroups are interesting algebraic and combinatorial objects, several types of them originating from questions in computer science and in physics. Here we describe diagram semigroups in a general framework and extend our computational knowledge of them. The generated data set is replete with surprising observations raising many open questions for further theoretical research.
  • Establishing a distance between genomes is a significant problem in computational genomics, because its solution can be used to establish evolutionary relationships including phylogeny. The "double cut and join" (DCJ) model of chromosomal rearrangement proposed by Yancopoulos et al. has received attention as it can model inversions, translocations, fusion and fission on a multichromosomal genome that may contain both linear and circular chromosomes. In this paper, we realize the DCJ operator as a group action on the space of multichromosomal genomes. We study this group action, deriving some properties of the group and finding group-theoretic analogues for the key results in the DCJ theory.
  • Hybrid evolution and horizontal gene transfer (HGT) are processes where evolutionary relationships may more accurately be described by a reticulated network than by a tree. In such a network, there will often be several paths between any two extant species, reflecting the possible pathways that genetic material may have been passed down from a common ancestor to these species. These paths will typically have different lengths but an `average distance' can still be calculated between any two taxa. In this article, we ask whether this average distance is able to distinguish reticulate evolution from pure tree-like evolution. We consider two types of reticulation networks: hybridization networks and HGT networks. For the former, we establish a general result which shows that average distances between extant taxa can appear tree-like, but only under a single hybridization event near the root; in all other cases, the two forms of evolution can be distinguished by average distances. For HGT networks, we demonstrate some analogous but more intricate results.
  • The variation in genome arrangements among bacterial taxa is largely due to the process of inversion. Recent studies indicate that not all inversions are equally probable, suggesting, for instance, that shorter inversions are more frequent than longer, and those that move the terminus of replication are less probable than those that do not. Current methods for establishing the inversion distance between two bacterial genomes are unable to incorporate such information. In this paper we suggest a group-theoretic framework that in principle can take these constraints into account. In particular, we show that by lifting the problem from circular permutations to the affine symmetric group, the inversion distance can be found in polynomial time for a model in which inversions are restricted to acting on two regions. This requires the proof of new results in group theory, and suggests a vein of new combinatorial problems concerning permutation groups on which group theorists will be needed to collaborate with biologists. We apply the new method to inferring distances and phylogenies for published Yersinia pestis data.
  • Rearrangements of bacterial chromosomes can be studied mathematically at several levels, most prominently at a local, or sequence level, as well as at a topological level. The biological changes involved locally are inversions, deletions, and transpositions, while topologically they are knotting and catenation. These two modelling approaches share some surprising algebraic features related to braid groups and Coxeter groups. The structural approach that is at the core of algebra has long found applications in sciences such as physics and analytical chemistry, but only in a small number of ways so far in biology. And yet there are examples where an algebraic viewpoint may capture a deeper structure behind biological phenomena. This article discusses a family of biological problems in bacterial genome evolution for which this may be the case, and raises the prospect that the tools developed by algebraists over the last century might provide insight to this area of evolutionary biology. .
  • The extension of majorization (also called the rearrangement ordering), to more general groups than the symmetric (permutation) group, is referred to as $G$-majorization. There are strong results in the case that $G$ is a reflection group and this paper builds on this theory in the direction of subgroups, normal subgroups, quotient groups and extensions. The implications for fundamental cones and order-preserving functions are studied. The main example considered is the hyperoctahedral group, which, acting on a vector in $\mathbb R^n$, permutes and changes the signs of components.