• We introduce a tensor-based clustering method to extract sparse, low-dimensional structure from high-dimensional, multi-indexed datasets. This framework is designed to enable detection of clusters of data in the presence of structural requirements which we encode as algebraic constraints in a linear program. Our clustering method is general and can be tailored to a variety of applications in science and industry. We illustrate our method on a collection of experiments measuring the response of genetically diverse breast cancer cell lines to an array of ligands. Each experiment consists of a cell line-ligand combination, and contains time-course measurements of the early-signalling kinases MAPK and AKT at two different ligand dose levels. By imposing appropriate structural constraints and respecting the multi-indexed structure of the data, the analysis of clusters can be optimized for biological interpretation and therapeutic understanding. We then perform a systematic, large-scale exploration of mechanistic models of MAPK-AKT crosstalk for each cluster. This analysis allows us to quantify the heterogeneity of breast cancer cell subtypes, and leads to hypotheses about the signalling mechanisms that mediate the response of the cell lines to ligands.
  • Network theory provides a useful framework for studying interconnected systems of interacting agents. Many networked systems evolve continuously in time, but most existing methods for analyzing time-dependent networks rely on discrete or discretized time. In this paper, we propose a novel approach for studying networks that evolve in continuous time by distinguishing between interactions, which we model as discrete contacts, and 'ties', which represent strengths of relationships as functions of time. To illustrate our framework of tie-decay networks, we show how to examine, in a mathematically tractable and computationally efficient way, important (i.e., 'central') nodes in networks in which tie strengths decay in time after individuals interact. As a concrete illustration, we introduce a continuous-time generalization of PageRank centrality and apply it to a network of retweets during the 2012 National Health Service controversy in the United Kingdom. Our work also provides guidance for similar generalizations of other tools from network theory to continuous-time networks with tie decay, including for applications to streaming data.
  • Cells adapt their metabolic fluxes in response to changes in the environment. We present a framework for the systematic construction of flux-based graphs derived from organism-wide metabolic networks. Our graphs encode the directionality of metabolic fluxes via edges that represent the flow of metabolites from source to target reactions. The methodology can be applied in the absence of a specific biological context by modelling fluxes probabilistically, or can be tailored to different environmental conditions by incorporating flux distributions computed through constraint-based approaches such as Flux Balance Analysis. We illustrate our approach on the central carbon metabolism of Escherichia coli and on a metabolic model of human hepatocytes. The flux-dependent graphs under various environmental conditions and genetic perturbations exhibit systemic changes in their topological and community structure, which capture the re-routing of metabolic fluxes and the varying importance of specific reactions and pathways. By integrating constraint-based models and tools from network science, our framework allows the study of context-specific metabolic responses at a system level beyond standard pathway descriptions.
  • We examine the relationship between social structure and sentiment through the analysis of a large collection of tweets about the Irish Marriage Referendum of 2015. We obtain the sentiment of every tweet with the hashtags #marref and #marriageref that was posted in the days leading up to the referendum, and construct networks to aggregate sentiment and use it to study the interactions among users. Our results show that the sentiment of mention tweets posted by users is correlated with the sentiment of received mentions, and that there are significantly more connections between users with similar sentiment scores than between users with opposite scores in the mention and follower networks. We combine the community structure of the two networks with the activity level of the users and sentiment scores to find groups of users who support voting 'yes' or 'no' in the referendum. There were numerous conversations between users on opposing sides of the debate in the absence of follower connections, which suggests that there were efforts by some users to establish dialogue and debate across ideological divisions. Our analysis shows that social structure can be integrated successfully with sentiment to analyse and understand the disposition of social media users. These results have potential applications in the integration of data and meta-data to study opinion dynamics, public opinion modelling, and polling.
  • Social media are being increasingly used for health promotion, yet the landscape of users, messages and interactions in such fora is poorly understood. Studies of social media and diabetes have focused mostly on patients or on the public agencies addressing the disease, but have not looked broadly at all the participants or the diversity of content they contribute. We study Twitter conversations about diabetes through the systematic analysis of 2.5 million tweets collected over 8 months and the interactions between their authors. We address three questions: (1) what themes arise in these tweets; (2) who are the most influential users; and (3) which types of users contribute to which themes? We answer these questions using a mixed-methods approach, integrating techniques from anthropology, network science and information retrieval such as thematic coding, temporal network analysis, and community and topic detection. Diabetes-related tweets fall within broad thematic groups: health information, news, social interaction, and commercial. At the same time, humorous messages and references to popular culture appear consistently, more than any other type of tweet. We classify authors according to their temporal 'hub' and 'authority' scores. Whereas the hub landscape is diffuse and fluid over time, top authorities are highly persistent across time and comprise bloggers, advocacy groups and NGOs related to diabetes, as well as for-profit entities without specific diabetes expertise. Top authorities fall into seven interest communities as derived from their Twitter follower network. Our findings have implications for public health professionals and policy makers who seek to use social media as an engagement tool and to inform policy design.
  • Cellular signal transduction usually involves activation cascades, the sequential activation of a series of proteins following the reception of an input signal. Here we study the classic model of weakly activated cascades and obtain analytical solutions for a variety of inputs. We show that in the special but important case of optimal-gain cascades (i.e., when the deactivation rates are identical) the downstream output of the cascade can be represented exactly as a lumped nonlinear module containing an incomplete gamma function with real parameters that depend on the rates and length of the cascade, as well as parameters of the input signal. The expressions obtained can be applied to the non-identical case when the deactivation rates are random to capture the variability in the cascade outputs. We also show that cascades can be rearranged so that blocks with similar rates can be lumped and represented through our nonlinear modules. Our results can be used both to represent cascades in computational models of differential equations and to fit data efficiently, by reducing the number of equations and parameters involved. In particular, the length of the cascade appears as a real-valued parameter and can thus be fitted in the same manner as Hill coefficients. Finally, we show how the obtained nonlinear modules can be used instead of delay differential equations to model delays in signal transduction.
  • We exploit flow propagation on the directed neuronal network of the nematode Caenorhabditis elegans to reveal dynamically relevant features of its connectome. We find flow-based groupings of neurons at different levels of granularity, which we relate to functional and anatomical constituents of its nervous system. A systematic in silico evaluation of the full set of single and double neuron ablations is used to identify deletions that induce the most severe disruptions of the multi-resolution flow structure. Such ablations are linked to functionally relevant neurons, and suggest potential candidates for further in vivo investigation. In addition, we use the directional patterns of incoming and outgoing network flows at all scales to identify flow profiles for the neurons in the connectome, without imposing a priori categories. The four flow roles identified are linked to signal propagation motivated by biological input-response scenarios.
  • Directionality is a crucial ingredient in many complex networks in which information, energy or influence is transmitted. In such directed networks, analysing flows (and not only the strength of connections) is essential for revealing important features of the network that might go undetected if the orientation of connections is ignored. We showcase here a flow-based approach for community detection in networks through the study of the network of the most influential Twitter users during the 2011 riots in England. Firstly, we use directed Markov Stability to extract descriptions of the network at different levels of coarseness in terms of interest communities, i.e., groups of nodes within which flows of information are contained and reinforced. Such interest communities reveal user groupings according to location, profession, employer, and topic. The study of flows also allows us to generate an interest distance, which affords a personalised view of the attention in the network as viewed from the vantage point of any given user. Secondly, we analyse the profiles of incoming and outgoing long-range flows with a combined approach of role-based similarity and the novel relaxed minimum spanning tree algorithm to reveal that the users in the network can be classified into five roles. These flow roles go beyond the standard leader/follower dichotomy and differ from classifications based on regular/structural equivalence. We then show that the interest communities fall into distinct informational organigrams characterised by a different mix of user roles reflecting the quality of dialogue within them. Our generic framework can be used to provide insight into how flows are generated, distributed, preserved and consumed in directed networks.
  • We present a framework to cluster nodes in directed networks according to their roles by combining Role-Based Similarity (RBS) and Markov Stability, two techniques based on flows. First we compute the RBS matrix, which contains the pairwise similarities between nodes according to the scaled number of in- and out-directed paths of different lengths. The weighted RBS similarity matrix is then transformed into an undirected similarity network using the Relaxed Minimum-Spanning Tree (RMST) algorithm, which uses the geometric structure of the RBS matrix to unblur the network, such that edges between nodes with high, direct RBS are preserved. Finally, we partition the RMST similarity network into role-communities of nodes at all scales using Markov Stability to find a robust set of roles in the network. We showcase our framework through a biological and a man-made network.
  • Motivation: Estimating parameters from data is a key stage of the modelling process, particularly in biological systems where many parameters need to be estimated from sparse and noisy data sets. Over the years, a variety of heuristics have been proposed to solve this complex optimisation problem, with good results in some cases yet with limitations in the biological setting. Results: In this work, we develop an algorithm for model parameter fitting that combines ideas from evolutionary algorithms, sequential Monte Carlo and direct search optimisation. Our method performs well even when the order of magnitude and/or the range of the parameters is unknown. The method iteratively refines a sequence of parameter distributions through local optimisation combined with partial resampling from a historical prior defined over the support of all previous iterations. We exemplify our method with biological models using both simulated and real experimental data and estimate the parameters efficiently even in the absence of a priori knowledge about the parameters.
  • We present a dynamical model for rewiring and attachment in bipartite networks in which edges are added between nodes that belong to catalogs that can be either fixed or growing in size. The model is motivated by an empirical study of data from the video rental service Netflix, which invites its users to give ratings to the videos available in its catalog. We find that the distribution of the number of ratings given by users and that of the number of ratings received by videos both follow a power law with an exponential cutoff. We also examine the activity patterns of Netflix users and find bursts of intense video-rating activity followed by long periods of inactivity. We derive ordinary differential equations to model the acquisition of edges by the nodes over time and obtain the corresponding time-dependent degree distributions. We then compare our results with the Netflix data and find good agreement. We conclude with a discussion of how catalog models can be used to study systems in which agents are forced to choose, rate, or prioritize their interactions from a very large set of options.
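
The last abstract reports degree distributions that follow a power law with an exponential cutoff. As a rough illustration of that functional form only, the sketch below evaluates a normalised distribution p(k) proportional to k^(-gamma) * exp(-k / kappa) on k = 1, ..., k_max. The function name, the truncation at k_max, and the values of gamma and kappa are assumptions made for this example; they are not taken from the paper or fitted to the Netflix data.

```python
import numpy as np


def power_law_with_cutoff(k_max, gamma, kappa):
    """Return degrees k = 1..k_max and a normalised pmf
    p(k) proportional to k**(-gamma) * exp(-k / kappa).

    gamma (power-law exponent) and kappa (cutoff scale) are
    hypothetical illustration parameters, not fitted values.
    """
    k = np.arange(1, k_max + 1, dtype=float)
    weights = k ** (-gamma) * np.exp(-k / kappa)
    return k, weights / weights.sum()


# Illustrative parameter values only.
k, p = power_law_with_cutoff(k_max=10_000, gamma=1.5, kappa=500.0)

# For k well below kappa the exponential factor is close to 1 and the
# curve is nearly a pure power law; for k above kappa it decays faster.
slope_small_k = np.log(p[9] / p[4]) / np.log(k[9] / k[4])
slope_large_k = np.log(p[4999] / p[2499]) / np.log(k[4999] / k[2499])
```

On a log-log plot, `slope_small_k` is close to `-gamma`, whereas `slope_large_k` is markedly steeper, which is the signature of the exponential cutoff.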