• Countries tend to diversify their exports by entering products that are related to their current exports. Yet this average behavior is not representative of every diversification path. In this paper, we introduce a method to identify periods when countries enter unrelated products. We analyze the economic diversification paths of 93 countries between 1965 and 2014 and find that countries enter unrelated products in only about 7.2% of all observations. We find that countries enter more unrelated products when they are at an intermediate level of economic development, and when they have higher levels of human capital. Finally, we ask whether countries entering more unrelated products grow faster than those entering only related products. The data shows that countries that enter more unrelated activities experience a small but significant increase in future economic growth, compared to countries with a similar level of income, human capital, capital stock per worker, and economic complexity.
  • Countries and cities are likely to enter economic activities that are related to those that are already present in them. Yet, while these path dependencies are universally acknowledged, we lack an understanding of the diversification strategies that can optimally balance the development of related and unrelated activities. Here, we develop algorithms to identify the activities that are optimal to target at each time step. We find that the strategies that minimize the total time needed to diversify an economy target highly connected activities during a narrow and specific time window. We compare the strategies suggested by our model with the strategies followed by countries in the diversification of their exports and research activities, finding that countries follow strategies that are close to the ones suggested by the model. These findings add to our understanding of economic diversification and also to our general understanding of diffusion in networks.
  • Recently, we uploaded to arXiv a paper entitled "Improving the Economic Complexity Index." There, we compared three metrics of the knowledge intensity of an economy: the original metric we published in 2009 (the Economic Complexity Index, or ECI), a variation of the metric proposed in 2012, and a variation we called ECI+. It was brought to our attention that the definition of ECI+ was equivalent to the variation of the metric proposed in 2012. We have verified this claim and found that, while the equations are not exactly the same, they are similar enough that failing to recognize the equivalence was our own oversight. More importantly, we now ask: how many variations of the original ECI work? In this paper, we provide a simple unifying framework to explore multiple variations of ECI, including both the original 2009 ECI and the 2012 variation. We found that a large fraction of variations have similar predictive power, indicating that the chance of finding a variation of ECI that works, after the seminal 2009 measure, is surprisingly high. In fact, more than 28 percent of these variations have a predictive power that is within 90 percent of the maximum for any variation. These findings show that, once the idea of measuring economic complexity was out, creating a variation with similar predictive power (like the ones proposed in 2012) was trivial (roughly a 1 in 3 shot). The results also show that using export data to measure the knowledge intensity of an economy is robust and works for multiple functional forms. Moreover, the fact that multiple variations of the 2009 ECI perform close to the maximum tells us that no variation of ECI will perform substantially better. This suggests that research efforts should focus on uncovering the mechanisms that contribute to the diffusion and accumulation of productive knowledge rather than on exploring small variations of existing measures.
  • Communication technologies, from printing to social media, affect our historical records by changing the way ideas are spread and recorded. Yet finding statistical instruments to address the endogeneity of this relationship has been problematic. Here we use a city's distance to Mainz as an instrument for the introduction of the printing press in European cities, together with data on nearly 50 thousand biographies, to show that cities that adopted printing earlier were more likely to be the birthplace of a famous scientist or artist in the years after the introduction of printing. At the global scale, we find that the introduction of printing is associated with a significant and discontinuous increase in the number of biographies available from people born after the introduction of printing. We extend these findings to more recent communication technologies by showing that the number of radios and televisions in a country correlates with the number of performing artists and sports players from that country who reached global fame, even after controlling for GDP and population and including country and year fixed effects. These findings support the hypothesis that the introduction of communication technologies shifts historical records toward the content best suited to each technology.
  • How much knowledge is there in an economy? In recent years, data on the mix of products that countries export has been used to construct measures of economic complexity that estimate the knowledge available in an economy and predict future economic growth. Here we introduce a new and simpler metric of economic complexity (ECI+) that measures the total exports of an economy corrected by how difficult it is to export each product. We use data from 1973 to 2013 to compare the ability of ECI+, the Economic Complexity Index (ECI), and Fitness complexity to predict future economic growth using 5-, 10-, and 20-year panels in a pooled OLS, a random effects model, and a fixed effects model. We find that ECI+ outperforms ECI and Fitness in its ability to predict economic growth and in the consistency of its estimators across most econometric specifications. On average, a one-standard-deviation increase in ECI+ is associated with an increase in annualized growth of about 4% to 5%. We then combine ECI+ with measures of physical capital, human capital, and institutions to find a robust model of economic growth. The ability of ECI+ to predict growth, and the value of its coefficient, are robust to these controls. We also find that human capital, political stability, and control of corruption are positively associated with future economic growth, and that income is negatively associated with growth, in agreement with the traditional growth literature. Finally, we use ECI+ to generate economic growth predictions for the next 20 years and compare these predictions with the ones obtained using ECI and Fitness. These findings improve the methods available to estimate the knowledge intensity of economies and predict future economic growth.
  • Industrial development is the process by which economies learn how to produce new products and services. But how do economies learn? And who do they learn from? The literature on economic geography and economic development has emphasized two learning channels: inter-industry learning, which involves learning from related industries, and inter-regional learning, which involves learning from neighboring regions. Here we use 25 years of data describing the evolution of China's economy between 1990 and 2015, a period when China multiplied its GDP per capita by a factor of ten, to explore how Chinese provinces diversified their economies. First, we show that the probability that a province will develop a new industry increases with the number of related industries already present in that province, a fact suggestive of inter-industry learning. Also, we show that the probability that a province will develop an industry increases with the number of neighboring provinces that are developed in that industry, a fact suggestive of inter-regional learning. Moreover, we find that the combination of these two channels exhibits diminishing returns, meaning that the contribution of either learning channel is redundant when the other one is present. Finally, we address endogeneity concerns by using the introduction of high-speed rail as an instrument to isolate the effects of inter-regional learning. Our difference-in-differences (DID) analysis reveals that the introduction of high-speed rail increased the industrial similarity of pairs of provinces connected by it. Also, industries in provinces connected by high-speed rail increased their productivity when the connection linked them to other provinces where that industry was already present. These findings suggest that inter-regional and inter-industry learning played a role in China's great economic expansion.
  • Recent work has shown that a country's productive structure constrains its level of economic growth and income inequality. Here, we compare the productive structure of countries in Latin America and the Caribbean (LAC) with that of China and other High-Performing Asian Economies (HPAE) to expose the increasing gap in their productive capabilities. Moreover, we use the product space and the Product Gini Index to reveal the structural constraints on income inequality. Our network maps reveal that HPAE have managed to diversify into products typically produced by countries with low levels of income inequality, while LAC economies have remained dependent on products related to high levels of income inequality. We also introduce the Xgini, a coefficient that captures the constraints on income inequality imposed by the mix of products a country makes. Finally, we argue that LAC countries need to emphasize a smart combination of social and economic policies to overcome the structural constraints for inclusive growth.
  • Computer vision methods that quantify the perception of urban environments are increasingly used to study the relationship between a city's physical appearance and the behavior and health of its residents. Yet the throughput of current methods is too limited to quantify the perception of cities across the world. To tackle this challenge, we introduce a new crowdsourced dataset containing 110,988 images from 56 cities, and 1,170,000 pairwise comparisons provided by 81,630 online volunteers along six perceptual attributes: safe, lively, boring, wealthy, depressing, and beautiful. Using this data, we train a Siamese-like convolutional neural architecture, which learns from a joint classification and ranking loss, to predict human judgments of pairwise image comparisons. Our results show that crowdsourcing combined with neural networks can produce urban perception data at the global scale.
  • Policy makers, urban planners, architects, sociologists, and economists are interested in creating urban areas that are both lively and safe. But are the safety and liveliness of neighborhoods independent characteristics? Or are they just two sides of the same coin? In a world where people avoid unsafe-looking places, neighborhoods that look unsafe will be less lively and will fail to harness the natural surveillance of human activity. But in a world where the preference for safe-looking neighborhoods is small, the connection between the perception of safety and liveliness will be either weak or nonexistent. In this paper we explore the connection between the levels of activity and the perception of safety of neighborhoods in two major Italian cities by combining mobile phone data (as a proxy for activity or liveliness) with scores of perceived safety estimated using a Convolutional Neural Network trained on a dataset of Google Street View images scored using a crowdsourced visual perception survey. We find that (i) safer-looking neighborhoods are more active than expected from their population density, employee density, and distance to the city centre; and (ii) the correlation between the appearance of safety and activity is positive, strong, and significant for females and people over 50, but negative for people under 30, suggesting that the behavioral impact of perception depends on the demographics of the population. Finally, we use occlusion techniques to identify the urban features that contribute to the appearance of safety, finding that greenery and street-facing windows contribute to a positive appearance of safety (in agreement with Oscar Newman's defensible space theory). These results suggest that urban appearance modulates levels of human activity and, consequently, a neighborhood's rate of natural surveillance.
  • Neighborhoods populated by amenities, such as restaurants, cafes, and libraries, are considered a key property of desirable cities. Yet, despite the global enthusiasm for amenity-rich neighborhoods, little is known about the empirical laws governing the co-location of amenities at the neighborhood scale. Here, we contribute to our understanding of the naturally occurring neighborhood-scale agglomerations of amenities observed in cities by using a dataset summarizing the precise location of millions of amenities. We use this dataset to build the network of co-location of amenities, or Amenity Space, by first introducing a clustering algorithm to identify neighborhoods, and then using the identified neighborhoods to map the probability that two amenities will be co-located in one of them. Finally, we use the Amenity Space to build a recommender system that identifies the amenities that are missing in a neighborhood given its current pattern of specialization. This opens the door for the construction of amenity recommendation algorithms that can be used to evaluate neighborhoods and inform their improvement and development.
  • For decades, the study of networks has been divided between the efforts of social scientists and natural scientists, two groups of scholars who often do not see eye to eye. In this review I present an effort to translate the work conducted by scholars on each of these academic fronts for the other, hoping to continue unifying what has become a diverging body of literature. I argue that social and natural scientists fail to see eye to eye because they have diverging academic goals. Social scientists focus on explaining how context-specific social and economic mechanisms drive the structure of networks and how networks shape social and economic outcomes. By contrast, natural scientists focus primarily on modeling network characteristics that are independent of context, since their aim is to identify universal characteristics of systems instead of context-specific mechanisms. In the following pages I discuss the differences between these two literatures by summarizing the parallel theories advanced to explain link formation and the applications used by scholars in each field to justify their approach to network science. I conclude by providing an outlook on how these literatures can be further unified.
  • In recent years scholars have built maps of science by connecting the academic fields that cite each other, are cited together, or cite a similar literature. But since scholars cannot always publish in the fields they cite, or that cite them, these science maps are only rough proxies for the potential of a scholar, organization, or country to enter a new academic field. Here we use a large dataset of scholarly publications disambiguated at the individual level to create a map of science, or research space, where links connect pairs of fields based on the probability that an individual has published in both of them. We find that the research space is a significantly more accurate predictor of the fields that individuals and organizations will enter in the future than citation-based science maps. At the country level, however, the research space and citation-based science maps are equally accurate. These findings show that data on career trajectories (the set of fields that individuals have previously published in) provide more accurate predictors of future research output for more focalized units, such as individuals or organizations, than citation-based science maps.
  • We present the Pantheon 1.0 dataset: a manually verified dataset of individuals that have transcended linguistic, temporal, and geographic boundaries. The Pantheon 1.0 dataset includes the 11,341 biographies present in more than 25 languages in Wikipedia and is enriched with: (i) manually verified demographic information (place and date of birth, gender); (ii) a taxonomy of occupations classifying each biography at three levels of aggregation; and (iii) two measures of global popularity: the number of languages in which a biography is present in Wikipedia (L), and the Historical Popularity Index (HPI), a metric that combines information on L, time since birth, and page views (2008-2013). We compare the Pantheon 1.0 dataset to data from the 2003 book Human Accomplishment, and also to external measures of accomplishment in individual games and sports: tennis, swimming, car racing, and chess. In all of these cases we find that measures of popularity (L and HPI) correlate highly with individual accomplishment, suggesting that measures of global popularity proxy the historical impact of individuals.
  • The compartmental models used to study epidemic spreading often assume the same susceptibility for all individuals and are, therefore, agnostic about the effects that differences in susceptibility can have on epidemic spreading. Here we show that, for the SIS model, differential susceptibility can make networks more vulnerable to the spread of diseases when the correlation between a node's degree and its susceptibility is positive, and less vulnerable when this correlation is negative. Moreover, we show that networks become more likely to contain a pocket of infection when individuals are more likely to connect with others that have similar susceptibility (i.e., when the network is segregated). These results show that failing to include differential susceptibility in epidemic models can lead to a systematic overestimation or underestimation of fundamental epidemic parameters when the structure of the network is not independent of the susceptibility of the nodes, or when there are correlations between the susceptibility of connected individuals.
  • In economic systems, the mix of products that countries make or export has been shown to be a strong leading indicator of economic growth. Hence, methods to characterize and predict the structure of the network connecting countries to the products that they export are relevant for understanding the dynamics of economic development. Here we study the presence and absence of industries at the global and national levels and show that these networks are significantly nested. This means that the less filled rows and columns of these networks' adjacency matrices tend to be subsets of the fuller rows and columns. Moreover, we show that nestedness remains relatively stable as the matrices become more filled over time, and that this occurs because industries that deviate from the networks' nestedness are biased to disappear, while missing industries whose appearance would reduce... wait
  • What are East Africa's industrial opportunities? In this article we explore this question by using the Product Space to study the productive structure of five south-east African countries: Kenya, Mozambique, Rwanda, Tanzania and Zambia. The Product Space is a network connecting products that tend to be exported by the same sets of countries. Since countries are more likely to develop products that are close by in the Product Space to the ones that they already produce, the Product Space can be used to help anticipate a country's industrial opportunities. Our results suggest that the most natural avenue for future product diversification for these five south-east African nations resides in the agricultural sector, since all of these nations appear to have productive structures that are pre-adapted to the production of many agricultural products that none of them are currently exporting. We conclude this paper by exploring the potential benefits of further regional economic integration by doing an exercise in which we pull together the productive structures of these five countries. This exercise shows that the products that become more accessible in the combined economy are once again predominantly agricultural. These results suggest that while diversification into all sectors should remain an important long-term goal of the region, the path towards increased diversification in the near future may well lie in a more empowered and diverse agricultural sector.
  • Much of the analysis of economic growth has focused on the study of aggregate output. Here, we deviate from this tradition and look instead at the structure of output embodied in the network connecting countries to the products that they export. We characterize this network using four structural features: the negative relationship between the diversification of a country and the average ubiquity of its exports, and the non-normal distributions for product ubiquity, country diversification, and product co-export. We model the structure of the network by assuming that products require a large number of non-tradable inputs, or capabilities, and that countries differ in the completeness of the set of capabilities they have. We solve the model assuming that the probability that a country has a capability and that a product requires a capability are constant, and calibrate it to the data to find that it accounts well for all of the network features except for the heterogeneity in the distribution of country diversification. In light of the model, this is evidence of a large heterogeneity in the distribution of capabilities across countries. Finally, we show that the model implies that the increase in diversification that is expected from the accumulation of a small number of capabilities is small for countries that have a few of them and large for those with many. This implies that the forces that help drive divergence in product diversity increase with the complexity of the global economy when capabilities travel poorly.
  • For Adam Smith, wealth was related to the division of labor. As people and firms specialize in different activities, economic efficiency increases, suggesting that development is associated with an increase in the number of individual activities and with the complexity that emerges from the interactions between them. Here we develop a view of economic growth and development that gives a central role to the complexity of a country's economy by interpreting trade data as a bipartite network in which countries are connected to the products they export, and show that it is possible to quantify the complexity of a country's economy by characterizing the structure of this network. Furthermore, we show that the measures of complexity we derive are correlated with a country's level of income, and that deviations from this relationship are predictive of future growth. This suggests that countries tend to converge to the level of income dictated by the complexity of their productive structures, indicating that development efforts should focus on generating the conditions that would allow complexity to emerge in order to generate sustained growth and prosperity.
  • The use of networks to integrate different genetic, proteomic, and metabolic datasets has been proposed as a viable path toward elucidating the origins of specific diseases. Here we introduce a new phenotypic database summarizing correlations obtained from the disease history of more than 30 million patients in a Phenotypic Disease Network (PDN). We present evidence that the structure of the PDN is relevant to the understanding of illness progression by showing that (1) patients develop diseases close in the network to those they already have; (2) the progression of disease along the links of the network is different for patients of different genders and ethnicities; (3) patients diagnosed with diseases which are more highly connected in the PDN tend to die sooner than those affected by less connected diseases; and (4) diseases that tend to be preceded by others in the PDN tend to be more connected than diseases that precede other illnesses, and are associated with higher degrees of mortality. Our findings show that disease progression can be represented and studied using network methods, offering the potential to enhance our understanding of the origin and evolution of human diseases. The dataset introduced here, released concurrently with this publication, represents the largest relational phenotypic resource publicly available to the research community.
  • The empirical study of network dynamics has been limited by the lack of longitudinal data. Here we introduce a quantitative indicator of link persistence to explore the correlations between the structure of a mobile phone network and the persistence of its links. We show that persistent links tend to be reciprocal and are more common for people with low degree and high clustering. We study the redundancy of the associations between persistence, degree, clustering and reciprocity and show that reciprocity is the strongest predictor of tie persistence. The method presented can be easily adapted to characterize the dynamics of other networks and can be used to identify the links that are most likely to survive in the future.
  • An analytical approach to network dynamics is used to show that, when agents copy their state randomly, the network reaches a stationary state in which the distribution of states is independent of the agents' degree. The effects of network topology on the process are characterized by introducing a quantity called influence and studying its behavior for scale-free and random networks. We show that, for this model, degree-averaged means are constant in time regardless of the number of states involved.
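The abstract above concerns agents that copy their state at random and degree-averaged means that stay constant in time. As a minimal sketch, assuming a node-update copying rule (pick an agent uniformly at random, then copy the state of a uniformly random neighbor, as in the voter model), the code below computes the exact expected one-step change in the share of agents holding a given state. It illustrates one way to read the result: the degree-weighted share is conserved in expectation, while the plain share need not be. The update rule, the function name `expected_change`, and the star-graph example are illustrative assumptions, not the paper's formulation, and the paper's "influence" quantity is not reproduced.

```python
from fractions import Fraction


def expected_change(adj, states, target, weighted=True):
    """Exact expected one-step change in the share of agents holding `target`.

    Assumed (hypothetical) update rule: choose an agent i uniformly at random,
    then i copies the state of a uniformly random neighbor. With weighted=True
    each agent counts with its degree; otherwise every agent counts once.
    Assumes an undirected graph (adjacency lists) with at least one edge.
    """
    n = len(adj)
    weight = {i: (len(nbrs) if weighted else 1) for i, nbrs in adj.items()}
    total = sum(weight.values())
    change = Fraction(0)
    for i, nbrs in adj.items():
        for j in nbrs:
            # Probability of picking agent i and then its neighbor j.
            p = Fraction(1, n * len(nbrs))
            before = int(states[i] == target)
            after = int(states[j] == target)
            # Agent i switches from `before` to `after`; its weight moves with it.
            change += p * Fraction(weight[i] * (after - before), total)
    return change


# Star graph: hub 0 holds state "A", the three leaves hold state "B".
star = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}
states = {0: "A", 1: "B", 2: "B", 3: "B"}
print(expected_change(star, states, "A", weighted=True))   # 0
print(expected_change(star, states, "A", weighted=False))  # 1/8
```

The weighted cancellation is general for undirected graphs under this rule: each edge (i, j) contributes (s_j - s_i)/(N K) when i copies j and (s_i - s_j)/(N K) when j copies i, where N is the number of agents and K the sum of degrees, so the two terms sum to zero. Exact fractions are used so the zero is not obscured by rounding.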
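Several of the abstracts above (on the division of labor, on ECI+, and on variations of ECI) rely on the Economic Complexity Index computed from the bipartite network connecting countries to the products they export. The sketch below assumes the eigenvector formulation commonly used for ECI: from a binary matrix M (M[c][p] = 1 when country c exports product p, e.g. with revealed comparative advantage of at least 1), build M-tilde with entries sum_p M[c][p] M[d][p] / (k_c k_p), where k_c is a country's diversity and k_p a product's ubiquity, and take the standardized eigenvector of its second-largest eigenvalue. It uses plain power iteration instead of a linear-algebra library. The function name `economic_complexity_index`, the use of the population standard deviation, and the toy matrix are illustrative assumptions; neither the 2009 method of reflections, ECI+, nor Fitness is reproduced here.

```python
from statistics import fmean, pstdev


def economic_complexity_index(M, iterations=500):
    """Eigenvector-style ECI sketch for a binary country-by-product matrix M.

    M[c][p] == 1 means country c exports product p. Returns
    (eci_scores, eigenvalue), where eigenvalue estimates the
    second-largest eigenvalue of M-tilde.
    """
    n_c, n_p = len(M), len(M[0])
    diversity = [sum(row) for row in M]                                # k_c
    ubiquity = [sum(M[c][p] for c in range(n_c)) for p in range(n_p)]  # k_p
    if min(diversity) == 0:
        raise ValueError("every country must export at least one product")
    products = [p for p in range(n_p) if ubiquity[p] > 0]  # skip unexported products

    # M-tilde[c][d] = sum_p M[c][p] * M[d][p] / (k_c * k_p). Each row sums to 1,
    # so the all-ones vector is a trivial eigenvector with eigenvalue 1.
    mt = [[sum(M[c][p] * M[d][p] / ubiquity[p] for p in products) / diversity[c]
           for d in range(n_c)] for c in range(n_c)]
    total_k = sum(diversity)

    def remove_trivial(v):
        # Subtracting the diversity-weighted mean projects out the trivial
        # eigenvector; M-tilde maps this subspace to itself.
        shift = sum(k * x for k, x in zip(diversity, v)) / total_k
        return [x - shift for x in v]

    v = remove_trivial([float(k) for k in diversity])  # start from diversity
    eigenvalue = 0.0
    for _ in range(iterations):
        w = remove_trivial([sum(mt[c][d] * v[d] for d in range(n_c)) for c in range(n_c)])
        norm_v = sum(x * x for x in v) ** 0.5
        norm_w = sum(x * x for x in w) ** 0.5
        if norm_v == 0 or norm_w == 0:
            break
        eigenvalue = norm_w / norm_v  # eigenvalues of M-tilde are >= 0
        v = [x / norm_w for x in w]

    mean, sd = fmean(v), pstdev(v)
    if sd == 0:
        raise ValueError("degenerate input: no non-trivial eigenvector found")
    eci = [(x - mean) / sd for x in v]
    # The eigenvector sign is arbitrary; orient it to correlate positively with diversity.
    mean_k = fmean(diversity)
    if sum(e * (k - mean_k) for e, k in zip(eci, diversity)) < 0:
        eci = [-e for e in eci]
    return eci, eigenvalue


nested = [[1, 1, 1, 1], [1, 1, 1, 0], [1, 1, 0, 0], [1, 0, 0, 0]]
scores, lam = economic_complexity_index(nested)
print(round(lam, 6), [round(s, 4) for s in scores])
# 0.25 [1.3416, 0.4472, -0.4472, -1.3416]
```

On this perfectly nested toy matrix, the second eigenvalue of M-tilde is 1/4 with eigenvector proportional to (1, 0, -1, -2), so ECI decreases monotonically with diversity. Power iteration is adequate here because M-tilde is similar to a positive semidefinite matrix, so the dominant eigenvalue left after removing the trivial one is the second-largest; for large matrices a dedicated eigensolver would be the usual choice.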