• The neural network is a powerful computing framework that has been exploited by biological evolution and by humans for solving diverse problems. Although the computational capabilities of neural networks are determined by their structure, the current understanding of the relationships between a neural network's architecture and function is still primitive. Here we reveal that a neural network's modular architecture plays a vital role in determining the neural dynamics and memory performance of networks of threshold neurons. In particular, we demonstrate that there exists an optimal modularity for memory performance, where a balance between local cohesion and global connectivity is established, allowing optimally modular networks to remember longer. Our results suggest that insights from dynamical analysis of neural networks and information spreading processes can be leveraged to better design neural networks and may shed light on the brain's modular organization.
  • In this final chapter, we consider the state of the art for spreading in social systems and discuss the future of the field. As part of this reflection, we identify a set of key challenges ahead. The challenges include the following questions: how can we improve the quality, quantity, extent, and accessibility of datasets? How can we extract more information from limited datasets? How can we take individual cognition and decision-making processes into account? How can we incorporate other complexities of real contagion processes? Finally, how can we translate research into positive real-world impact? In the following, we provide more context for each of these open questions.
  • Clustering is a central approach for unsupervised learning. After clustering is applied, the most fundamental analysis is to quantitatively compare clusterings. Such comparisons are crucial for the evaluation of clustering methods as well as other tasks such as consensus clustering. It is often argued that, in order to establish a baseline, clustering similarity should be assessed in the context of a random ensemble of clusterings. The prevailing assumption for the random clustering ensemble is the permutation model in which the number and sizes of clusters are fixed. However, this assumption does not necessarily hold in practice; for example, multiple runs of K-means clustering return clusterings with a fixed number of clusters, while the cluster size distribution varies greatly. Here, we derive corrected variants of two clustering similarity measures (the Rand index and Mutual Information) in the context of two random clustering ensembles in which the number and sizes of clusters vary. In addition, we study the impact of one-sided comparisons in the scenario with a reference clustering. The consequences of different random models are illustrated using synthetic examples, handwriting recognition, and gene expression data. We demonstrate that the choice of random model can have a drastic impact on the ranking of similar clustering pairs, and the evaluation of a clustering method with respect to a random baseline; thus, the choice of random clustering model should be carefully justified.
  • We investigate the association between musical chords and lyrics by analyzing a large dataset of user-contributed guitar tablatures. Motivated by the idea that the emotional content of chords is reflected in the words used in corresponding lyrics, we analyze associations between lyrics and chord categories. We also examine the usage patterns of chords and lyrics in different musical genres, historical eras, and geographical regions. Our overall results confirm a previously known association between Major chords and positive valence. We also report a wide variation in this association across regions, genres, and eras. Our results suggest the possible existence of different emotional associations for other types of chords.
  • Clustering is one of the most universal approaches for understanding complex data. A pivotal aspect of clustering analysis is quantitatively comparing clusterings; clustering comparison is the basis for many tasks such as clustering evaluation, consensus clustering, and tracking the temporal evolution of clusters. In particular, the extrinsic evaluation of clustering methods requires comparing the uncovered clusterings to planted clusterings or known metadata. Yet, as we demonstrate, existing clustering comparison measures have critical biases which undermine their usefulness, and no measure accommodates both overlapping and hierarchical clusterings. Here we unify the comparison of disjoint, overlapping, and hierarchically structured clusterings by proposing a new element-centric framework: elements are compared based on the relationships induced by the cluster structure, as opposed to the traditional cluster-centric philosophy. We demonstrate that, in contrast to standard clustering similarity measures, our framework does not suffer from critical biases and naturally provides unique insights into how the clusterings differ. We illustrate the strengths of our framework by revealing new insights into the organization of clusters in two applications: the improved classification of schizophrenia based on the overlapping and hierarchical community structure of fMRI brain networks, and the disentanglement of various social homophily factors in Facebook social networks. The universality of clustering suggests far-reaching impact of our framework throughout all areas of science.
  • Metrics derived from Twitter and other social media---often referred to as altmetrics---are increasingly used to estimate the broader social impacts of scholarship. Such efforts, however, may produce highly misleading results, as the entities that participate in conversations about science on these platforms are largely unknown. For instance, if altmetric activities are generated mainly by scientists, do they really capture the broader social impacts of science? Here we present a systematic approach to identifying and analyzing scientists on Twitter. Our method can identify scientists across many disciplines, without relying on external bibliographic data, and can be easily adapted to identify other stakeholder groups in science. We investigate the demographics, sharing behaviors, and interconnectivity of the identified scientists. We find that Twitter has been employed by scholars across the disciplinary spectrum, with an over-representation of social and computer and information scientists; under-representation of mathematical, physical, and life scientists; and a better representation of women compared to scholarly publishing. Analysis of the sharing of URLs reveals a distinct imprint of scholarly sites, yet only a small fraction of shared URLs are science-related. We find an assortative mixing with respect to disciplines in the networks between scientists, suggesting the maintenance of disciplinary walls in social media. Our work contributes to the literature both methodologically and conceptually---we provide new methods for disambiguating and identifying particular actors on social media and describing the behaviors of scientists, thus providing foundational information for the construction and use of indicators on the basis of social media metrics.
  • Online social media and games are increasingly replacing offline social activities. Social media is now an indispensable mode of communication; online gaming is not only a genuine social activity but also a popular spectator sport. With support for anonymity and larger audiences, online interaction shrinks social and geographical barriers. Despite such benefits, social disparities such as gender inequality persist in online social media. In particular, online gaming communities have been criticized for persistent gender disparities and objectification. As gaming evolves into a social platform, persistence of gender disparity is a pressing question. Yet, there are few large-scale, systematic studies of gender inequality and objectification in social gaming platforms. Here we analyze more than one billion chat messages from Twitch, a social game-streaming platform, to study how the gender of streamers is associated with the nature of conversation. Using a combination of computational text analysis methods, we show that gendered conversation and objectification are prevalent in chats. Female streamers receive significantly more objectifying comments while male streamers receive more game-related comments. This difference is more pronounced for popular streamers. There also exists a large number of users who post only on female or male streams. Employing a neural vector-space embedding (paragraph vector) method, we analyze gendered chat messages and create prediction models that (i) identify the gender of streamers based on messages posted in the channel and (ii) identify the gender a viewer prefers to watch based on their chat messages. Our findings suggest that disparities in social game-streaming platforms are a nuanced phenomenon that involves the gender of streamers as well as those who produce gendered and game-related conversation.
  • Online communication channels, especially social web platforms, are rapidly replacing traditional ones. Online platforms allow users to overcome physical barriers, enabling worldwide participation. However, the power of online communication bears an important negative consequence --- we are exposed to too much information to process. Too many participants, for example, can turn online public spaces into noisy, overcrowded fora where no meaningful conversation can be held. Here we analyze a large dataset of public chat logs from Twitch, a popular video streaming platform, in order to examine how information overload affects online group communication. We measure structural and textual features of conversations such as user output, interaction, and information content per message across a wide range of information loads. Our analysis reveals the existence of a transition from a conversational state to a cacophony --- a state of overload with lower user participation, more copy-pasted messages, and less information per message. These results hold both on average and at the individual level for the majority of users. This study provides a quantitative basis for further studies of the social effects of information overload, and may guide the design of more resilient online communication systems.
  • Human history has been marked by social instability and conflict, often driven by the irreconcilability of opposing sets of beliefs, ideologies, and religious dogmas. The dynamics of belief systems has been studied mainly from two distinct perspectives, namely how cognitive biases lead to individual belief rigidity and how social influence leads to social conformity. Here we propose a unifying framework that connects cognitive and social forces together in order to study the dynamics of societal belief evolution. Each individual is endowed with a network of interacting beliefs that evolves through interaction with other individuals in a social network. The adoption of beliefs is affected by both internal coherence and social conformity. Our framework explains how social instabilities can arise in otherwise homogeneous populations, how small numbers of zealots with highly coherent beliefs can overturn societal consensus, and how belief rigidity protects fringe groups and cults against invasion from mainstream beliefs, allowing them to persist and even thrive in larger societies. Our results suggest that strong consensus may be insufficient to guarantee social stability, that the cognitive coherence of belief-systems is vital in determining their ability to spread, and that coherent belief-systems may pose a serious problem for resolving social polarization, due to their ability to prevent consensus even under high levels of social exposure. We therefore argue that the inclusion of cognitive factors into a social model is crucial in providing a more complete picture of collective human dynamics.
  • We investigate critical behaviors of a social contagion model on weighted networks. An edge-weight compartmental approach is applied to analyze the weighted social contagion on strongly heterogeneous networks with skewed degree and weight distributions. We find that degree heterogeneity can not only alter the nature of the contagion transition from discontinuous to continuous but can also enhance or hamper the size of adoption, depending on the unit transmission probability. We also show that the heterogeneity of the weight distribution always hinders social contagions and does not alter the transition type.
  • We investigate the possibility of global optimization-based overlapping community detection, using the link community framework. We first show that partition density, the original quality function used in the link community detection method, is not suitable as a quality function for global optimization because it prefers breaking communities into triangles except in highly limited conditions. We analytically derive those conditions and confirm them with computational results on direct optimization of various synthetic and real-world networks. To overcome this limitation, we propose alternative approaches combining the weighted line graph transformation and existing quality functions for node-based communities. We suggest a new line graph weighting scheme, a normalized Jaccard index. Computational results show that community detection using the weighted line graphs generated with the normalized Jaccard index leads to a more accurate community structure.
  • Complex networks have recently attracted much interest due to their prevalence in nature and our daily lives [1, 2]. A critical property of a network is its resilience to random breakdown and failure [3-6], typically studied as a percolation problem [7-9] or by modeling cascading failures [10-12]. Many complex systems, from power grids and the Internet to the brain and society [13-15], can be modeled using modular networks comprised of small, densely connected groups of nodes [16, 17]. These modules often overlap, with network elements belonging to multiple modules [18, 19]. Yet existing work on robustness has not considered the role of overlapping, modular structure. Here we study the robustness of these systems to the failure of elements. We show analytically and empirically that it is possible for the modules themselves to become uncoupled or non-overlapping well before the network disintegrates. If overlapping modular organization plays a role in overall functionality, networks may be far more vulnerable than predicted by conventional percolation theory.
  • Propelled by the increasing availability of large-scale high-quality data, advanced data modeling and analysis techniques are enabling novel and significant scientific understanding of a wide range of complex social, natural, and technological systems. These developments also provide opportunities for studying cultural systems and phenomena -- which can be said to refer to all products of human creativity and way of life. An important characteristic of a cultural product is that it does not exist in isolation from others, but forms an intricate web of connections on many levels. In the creation and dissemination of cultural products and artworks in particular, collaboration and communication of ideas play an essential role, which can be captured in the heterogeneous network of the creators and practitioners of art. In this paper we propose novel methods to analyze and uncover meaningful patterns from such a network using the network of western classical musicians constructed from large-scale, comprehensive Compact Disc recording data. We characterize the complex patterns in the network landscape of collaboration between musicians across multiple scales ranging from the macroscopic to the mesoscopic and microscopic that represent the diversity of cultural styles and the individuality of the artists.
  • In this paper we propose a weighted symmetric binary matrix factorization (wSBMF) framework to detect overlapping communities in bipartite networks, which describe relationships between two types of nodes. Our method improves performance by recognizing the distinction between two types of missing edges---ones among the nodes in each node type and the others between two node types. Our method can also explicitly assign community membership and distinguish outliers from overlapping nodes, as well as incorporate existing knowledge on the network. We propose a generalized partition density for bipartite networks as a quality function, which identifies the most appropriate number of communities. The experimental results on both synthetic and real-world networks demonstrate the effectiveness of our method.
  • We investigate the impact of community structure on information diffusion with the linear threshold model. Our results demonstrate that modular structure may have counter-intuitive effects on information diffusion when social reinforcement is present. We show that strong communities can facilitate global diffusion by enhancing local, intra-community spreading. Using both analytic approaches and numerical simulations, we demonstrate the existence of an optimal network modularity, where global diffusion requires the minimal number of early adopters.
  • Science is increasingly dominated by teams. Understanding patterns of scientific collaboration and their impacts on the productivity and evolution of disciplines is crucial to understanding scientific processes. Electronic bibliography offers a unique opportunity to map and investigate the nature of scientific collaboration. Recent work has demonstrated a counter-intuitive organizational pattern of scientific collaboration networks: densely interconnected local clusters consist of weak ties, whereas strong ties play the role of connecting different clusters. This pattern contrasts with that of many other types of networks, where strong ties form communities while weak ties connect different communities. Although there are many models for collaboration networks, no model reproduces this pattern. In this paper, we present an evolution model of collaboration networks, which reproduces many properties of real-world collaboration networks, including the organization of tie strengths, skewed degree and weight distribution, high clustering and assortative mixing.
  • We investigate the predictability of successful memes using their early spreading patterns in the underlying social networks. We propose and analyze a comprehensive set of features and develop an accurate model to predict future popularity of a meme given its early spreading patterns. Our paper provides the first comprehensive comparison of existing predictive frameworks. We categorize our features into three groups: influence of early adopters, community concentration, and characteristics of adoption time series. We find that features based on community structure are the most powerful predictors of future success. We also find that early popularity of a meme is not a good predictor of its future popularity, contrary to common belief. Our methods outperform other approaches, particularly in the task of detecting very popular or unpopular memes.
  • How does network structure affect diffusion? Recent studies suggest that the answer depends on the type of contagion. Complex contagions, unlike infectious diseases (simple contagions), are affected by social reinforcement and homophily. Hence, the spread within highly clustered communities is enhanced, while diffusion across communities is hampered. A common hypothesis is that memes and behaviors are complex contagions. We show that, while most memes indeed behave like complex contagions, a few viral memes spread across many communities, like diseases. We demonstrate that the future popularity of a meme can be predicted by quantifying its early spreading pattern in terms of community concentration. The more communities a meme permeates, the more viral it is. We present a practical method to translate data about community structure into predictive knowledge about what information will spread widely. This connection may lead to significant advances in computational social science, social media analytics, and marketing applications.
  • Food occupies a central position in every culture and it is therefore of great interest to understand the evolution of food culture. The advent of the World Wide Web and online recipe repositories has begun to provide unprecedented opportunities for data-driven, quantitative study of food culture. Here we harness an online database documenting recipes from various Chinese regional cuisines and investigate the similarity of regional cuisines in terms of geography and climate. We find that geographical proximity, rather than climatic proximity, is a crucial factor that determines the similarity of regional cuisines. We develop a model of regional cuisine evolution that provides helpful clues to understand the evolution of cuisines and cultures.
  • Discovering overlapping community structures is a crucial step to understanding the structure and dynamics of many networks. In this paper we develop a symmetric binary matrix factorization model (SBMF) to identify overlapping communities. Our model allows us not only to assign community memberships explicitly to nodes, but also to distinguish outliers from overlapping nodes. In addition, we propose a modified partition density to evaluate the quality of community structures. We use this to determine the most appropriate number of communities. We evaluate our methods using both synthetic benchmarks and real world networks, demonstrating the effectiveness of our approach.
  • The cultural diversity of culinary practice, as illustrated by the variety of regional cuisines, raises the question of whether there are any general patterns that determine the ingredient combinations used in food today or principles that transcend individual tastes and recipes. We introduce a flavor network that captures the flavor compounds shared by culinary ingredients. Western cuisines show a tendency to use ingredient pairs that share many flavor compounds, supporting the so-called food pairing hypothesis. By contrast, East Asian cuisines tend to avoid compound-sharing ingredients. Given the increasing availability of information on food preparation, our data-driven investigation opens new avenues towards a systematic understanding of culinary practice.
  • Networks have become a key approach to understanding systems of interacting objects, unifying the study of diverse phenomena including biological organisms and human society. One crucial step when studying the structure and dynamics of networks is to identify communities: groups of related nodes that correspond to functional subunits such as protein complexes or social spheres. Communities in networks often overlap such that nodes simultaneously belong to several groups. Meanwhile, many networks are known to possess hierarchical organization, where communities are recursively grouped into a hierarchical structure. However, the fact that many real networks have communities with pervasive overlap, where each and every node belongs to more than one group, has the consequence that a global hierarchy of nodes cannot capture the relationships between overlapping groups. Here we reinvent communities as groups of links rather than nodes and show that this unorthodox approach successfully reconciles the antagonistic organizing principles of overlapping communities and hierarchy. In contrast to the existing literature, which has entirely focused on grouping nodes, link communities naturally incorporate overlap while revealing hierarchical organization. We find relevant link communities in many networks, including major biological networks such as protein-protein interaction and metabolic networks, and show that a large social network contains hierarchically organized community structures spanning inner-city to regional scales while maintaining pervasive overlap. Our results imply that link communities are fundamental building blocks that reveal overlap and hierarchical organization in networks to be two aspects of the same phenomenon.
  • Social network analysis has long been an enduring topic of sociology. However, until the era of information technology, the availability of data, mainly collected by the traditional method of personal survey, was highly limited and prevented large-scale analysis. Recently, the exploding amount of automatically generated data has completely changed the pattern of research. For instance, the enormous amount of data from so-called high-throughput biological experiments has introduced a systematic or network viewpoint to traditional biology. Then, is "high-throughput" sociological data generation possible? Google, which has become one of the most influential symbols of the new Internet paradigm within the last ten years, might provide torrents of data sources for such study in this (now and forthcoming) digital era. We investigate social networks between people by extracting information on the Web and introduce new tools for the analysis of such networks in the context of statistical physics of complex systems or socio-physics. As a concrete and illustrative example, the members of the 109th United States Senate are analyzed and it is demonstrated that the methods of construction and analysis are applicable to various other weighted networks.
  • We study the non-equilibrium phase transition in a model for epidemic spreading on scale-free networks. The model consists of two particle species $A$ and $B$, and the coupling between them is taken to be asymmetric; $A$ induces $B$ while $B$ suppresses $A$. This model describes the spreading of an epidemic on networks equipped with a reactive immune system. We present analytic results on the phase diagram and the critical behavior, which depends on the degree exponent $\gamma$ of the underlying scale-free networks. Numerical simulation results that support the analytic results are also presented.
  • To find out the role of the wiring cost in the organization of the neural network of the nematode \textit{Caenorhabditis elegans} (\textit{C. elegans}), we build the neuronal map of \textit{C. elegans} based on geometrical positions of neurons and define the cost as inter-neuronal Euclidean distance \textit{d}. We show that the wiring probability decays exponentially as a function of \textit{d}. Using the edge exchanging method and the component placement optimization scheme, we show that positions of neurons are not randomly distributed but organized to reduce the total wiring cost. Furthermore, we numerically study the trade-off between the wiring cost and the performance of the Hopfield model on the neural network.
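
As a rough illustration of the wiring-cost idea in the last abstract, the sketch below defines the cost of a placement as the sum of Euclidean distances over connected pairs and compares it with placements in which the same positions are randomly reassigned among the nodes. This is not the edge exchanging method or the component placement optimization scheme named in the abstract; it is a simpler shuffle baseline chosen for illustration. The function names and the toy chain network are hypothetical and not taken from the paper.

```python
import math
import random

def wiring_cost(positions, edges):
    """Total wiring cost: sum of Euclidean distances over connected pairs."""
    return sum(math.dist(positions[u], positions[v]) for u, v in edges)

def shuffled_cost(positions, edges, rng):
    """Wiring cost after randomly reassigning the same positions to nodes."""
    nodes = list(positions)
    coords = [positions[n] for n in nodes]
    rng.shuffle(coords)
    return wiring_cost(dict(zip(nodes, coords)), edges)

# Toy data (hypothetical): a chain of 6 nodes laid out in order along a line,
# so every connection spans exactly one unit of distance.
positions = {i: (float(i), 0.0) for i in range(6)}
edges = [(i, i + 1) for i in range(5)]

rng = random.Random(0)  # fixed seed for reproducibility
actual = wiring_cost(positions, edges)
baseline = [shuffled_cost(positions, edges, rng) for _ in range(200)]
mean_baseline = sum(baseline) / len(baseline)

print(f"actual cost: {actual:.2f}")
print(f"mean cost over shuffled placements: {mean_baseline:.2f}")
```

In this toy case the ordered layout is a minimum-cost placement, so no shuffled placement can cost less. With real neuronal coordinates, an actual cost well below the shuffled distribution would be consistent with positions being organized to reduce wiring.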