• Online social media are information resources that can have a transformative power in society. While the Web was envisioned as an equalizing force that allows everyone to access information, the digital divide prevents large numbers of people from being present online. Online social media in particular are prone to gender inequality, an important issue given the link between social media use and employment. Understanding gender inequality in social media is a challenging task due to the necessity of data sources that can provide large-scale measurements across multiple countries. Here we show how the Facebook Gender Divide (FGD), a metric based on aggregated statistics of more than 1.4 billion users in 217 countries, explains various aspects of worldwide gender inequality. Our analysis shows that the FGD encodes gender equality indices in education, health, and economic opportunity. We find gender differences in network externalities that suggest that using social media has an added value for women. Furthermore, we find that low values of the FGD are associated with increases in economic gender equality. Our results suggest that online social networks, while suffering evident gender imbalance, may lower the barriers that women have to access informational resources and help to narrow the economic gender gap.
  • AI researchers employ not only the scientific method, but also methodology from mathematics and engineering. However, the use of the scientific method - specifically hypothesis testing - in AI is typically conducted in service of engineering objectives. Growing interest in topics such as fairness and algorithmic bias shows that engineering-focused questions comprise only a subset of the important questions about AI systems. This results in the AI Knowledge Gap: the number of unique AI systems grows faster than the number of studies that characterize these systems' behavior. To close this gap, we argue that the study of AI could benefit from the greater inclusion of researchers who are well positioned to formulate and test hypotheses about the behavior of AI systems. We examine the barriers preventing social and behavioral scientists from conducting such studies. Our diagnosis suggests that accelerating the scientific study of AI systems requires new incentives for academia and industry, mediated by new tools and institutions. To address these needs, we propose a two-sided marketplace called TuringBox. On one side, AI contributors upload existing and novel algorithms to be studied scientifically by others. On the other side, AI examiners develop and post machine intelligence tasks designed to evaluate and characterize algorithmic behavior. We discuss this market's potential to democratize the scientific study of AI behavior, and thus narrow the AI Knowledge Gap.
  • Since Alan Turing envisioned Artificial Intelligence (AI) [1], a major driving force behind technical progress has been competition with human cognition. Historical milestones have been frequently associated with computers matching or outperforming humans in difficult cognitive tasks (e.g. face recognition [2], personality classification [3], driving cars [4], or playing video games [5]), or defeating humans in strategic zero-sum encounters (e.g. Chess [6], Checkers [7], Jeopardy! [8], Poker [9], or Go [10]). In contrast, less attention has been given to developing autonomous machines that establish mutually cooperative relationships with people who may not share the machine's preferences. A main challenge has been that human cooperation does not require sheer computational power, but rather relies on intuition [11], cultural norms [12], emotions and signals [13, 14, 15, 16], and pre-evolved dispositions toward cooperation [17], common-sense mechanisms that are difficult to encode in machines for arbitrary contexts. Here, we combine a state-of-the-art machine-learning algorithm with novel mechanisms for generating and acting on signals to produce a new learning algorithm that cooperates with people and other machines at levels that rival human cooperation in a variety of two-player repeated stochastic games. This is the first general-purpose algorithm that is capable, given a description of a previously unseen game environment, of learning to cooperate with people within short timescales in scenarios previously unanticipated by algorithm designers. This is achieved without complex opponent modeling or higher-order theories of mind, thus showing that flexible, fast, and general human-machine cooperation is computationally achievable using a non-trivial, but ultimately simple, set of algorithmic mechanisms.
  • The analysis of the creation, mutation, and propagation of social media content on the Internet is an essential problem in computational social science, affecting areas ranging from marketing to political mobilization. A first step towards understanding the evolution of images online is the analysis of rapidly modifying and propagating memetic imagery or `memes'. However, a pitfall in proceeding with such an investigation is the current incapability to produce a robust semantic space for such imagery, capable of understanding differences in Image Macros. In this study, we provide a first step in the systematic study of image evolution on the Internet, by proposing an algorithm based on sparse representations and deep learning to decouple various types of content in such images and produce a rich semantic embedding. We demonstrate the benefits of our approach on a variety of tasks pertaining to memes and Image Macros, such as image clustering, image retrieval, topic prediction and virality prediction, surpassing the existing methods on each. In addition to its utility on quantitative tasks, our method opens up the possibility of obtaining the first large-scale understanding of the evolution and propagation of memetic imagery.
  • Hurricane Sandy was one of the deadliest and costliest hurricanes of the past few decades. Many states experienced significant power outages; however, many people used social media to communicate while having limited or no access to traditional information sources. In this study, we explored the evolution of various communication patterns using machine learning techniques and determined user concerns that emerged over the course of Hurricane Sandy. The original data included ~52M tweets coming from ~13M users between October 14, 2012 and November 12, 2012. We ran a topic model on ~763K tweets from the 4,029 most frequent users, who tweeted about Sandy at least 100 times. We identified 250 well-defined communication patterns based on perplexity. Conversations of the most frequent and relevant users indicate the evolution of numerous storm-phase (warning, response, and recovery) specific topics. People were also concerned about storm location and time, media coverage, and activities of political leaders and celebrities. We also present each relevant keyword that contributed to one particular pattern of user concerns. Such keywords would be particularly meaningful in targeted information spreading and effective crisis communication in similar major disasters. Each of these words can also be helpful for efficient hash-tagging to reach a target audience as needed via social media. The pattern recognition approach of this study can be used in identifying real-time user needs in future crises.
  • The city has proven to be the most successful form of human agglomeration and provides wide employment opportunities for its dwellers. As advances in robotics and artificial intelligence revive concerns about the impact of automation on jobs, a question looms: How will automation affect employment in cities? Here, we provide a comparative picture of the impact of automation across U.S. urban areas. Small cities will undertake greater adjustments, such as worker displacement and job content substitutions. We demonstrate that large cities exhibit increased occupational and skill specialization due to increased abundance of managerial and technical professions. These occupations are not easily automatable, and, thus, reduce the potential impact of automation in large cities. Our results pass several robustness checks including potential errors in the estimation of occupational automation and sub-sampling of occupations. Our study provides the first empirical law connecting two societal forces: urban agglomeration and automation's impact on employment.
  • Modeling and predicting the popularity of online content is a significant problem for the practice of information dissemination, advertising, and consumption. Recent work analyzing massive datasets advances our understanding of popularity, but one major gap remains: to precisely quantify the relationship between the popularity of an online item and the external promotions it receives. This work supplies the missing link between exogenous inputs from public social media platforms, such as Twitter, and endogenous responses within the content platform, such as YouTube. We develop a novel mathematical model, the Hawkes intensity process, which can explain the complex popularity history of each video according to its type of content, network of diffusion, and sensitivity to promotion. Our model supplies a prototypical description of videos, called an endo-exo map. This map explains popularity as the result of an extrinsic factor (the amount of promotion that the video receives from the outside world) acting upon two intrinsic factors: sensitivity to promotion and inherent virality. We use this model to forecast future popularity given promotions on a large 5-month feed of the most-tweeted videos, and find that it lowers the average error by 28.6% relative to approaches based on popularity history. Finally, we can identify videos that have a high potential to become viral, as well as those for which promotions will have hardly any effect.
  • We conduct the largest ever investigation into the relationship between meteorological conditions and the sentiment of human expressions. To do this, we employ over three and a half billion social media posts from tens of millions of individuals from both Facebook and Twitter between 2009 and 2016. We find that cold temperatures, hot temperatures, precipitation, narrower daily temperature ranges, humidity, and cloud cover are all associated with worsened expressions of sentiment, even when excluding weather-related posts. We compare the magnitude of our estimates with the effect sizes associated with notable historical events occurring within our data.
  • Constitutions help define domestic political orders, but are known to be influenced by two international mechanisms: one that reflects global temporal trends in legal development, and another that reflects international network dynamics such as shared colonial history. We introduce the provision space: the growing set of all legal provisions existing in the world's constitutions over time. Through this we uncover a third mechanism influencing constitutional change: hierarchical dependencies between legal provisions, under which the adoption of essential, fundamental provisions precedes more advanced provisions. This third mechanism appears to play an especially important role in the emergence of new political rights, and may therefore provide a useful roadmap for advocates of those rights. We further characterise each legal provision in terms of the strength of these mechanisms.
  • Location-based social network data offers the promise of collecting data from a large base of users over a longer span of time at negligible cost. While several studies have applied social network data to activity and mobility analysis, a comparison with travel diaries and general statistics has been lacking. In this paper, we analysed geo-referenced Twitter activities from a large number of users in Singapore and neighbouring countries. By combining this data, population statistics and travel diaries and applying clustering techniques, we addressed detection of activity locations, as well as spatial separation and transitions between these locations. Kernel density estimation performs best to detect activity locations due to the scattered nature of the Twitter data; more activity locations are detected per user than reported in the travel survey. The descriptive analysis shows that determining home locations is more difficult than detecting work locations for most planning zones. Spatial separations between activity locations detected from Twitter data, reported in a travel survey, and captured by public transport smart card data are mostly similarly distributed, but also show relevant differences for very short and very long distances. This also holds for the transitions between zones. Whether the differences between Twitter data and other data sources stem from differences in the population sub-sample, clustering methodology, or whether social networks are being used significantly more at specific locations must be determined by further research. Despite these shortcomings, location-based social network data offers a promising data source for insights into activity locations and mobility patterns, especially for regions where travel survey data is not readily available.
  • Many people use social media to seek information during disasters while lacking access to traditional information sources. In this study, we analyze Twitter data to understand information spreading activities of social media users during Hurricane Sandy. We create multiple subgraphs of Twitter users based on activity levels and analyze network properties of the subgraphs. We observe that user information sharing activity follows a power-law distribution, suggesting the existence of a few highly active nodes in disseminating information and many other nodes being less active. We also observe close enough connected components and isolates at all levels of activity, and networks become less transitive, but more assortative for larger subgraphs. We also analyze the association between user activities and characteristics that may influence user behavior to spread information during a crisis. Users become more active in spreading information if they are centrally placed in the network, less eccentric, and have higher degrees. Our analysis provides insights on how to exploit user characteristics and network properties to spread information or limit the spreading of misinformation during a crisis event.
  • Superintelligence is a hypothetical agent that possesses intelligence far surpassing that of the brightest and most gifted human minds. In light of recent advances in machine intelligence, a number of scientists, philosophers and technologists have revived the discussion about the potential catastrophic risks entailed by such an entity. In this article, we trace the origins and development of the neo-fear of superintelligence, and some of the major proposals for its containment. We argue that such containment is, in principle, impossible, due to fundamental limits inherent to computing itself. Assuming that a superintelligence will contain a program that includes all the programs that can be executed by a universal Turing machine on input potentially as complex as the state of the world, strict containment requires simulations of such a program, something theoretically (and practically) infeasible.
  • The cumulative effect of collective online participation has an important and adverse impact on individual privacy. As an online system evolves over time, new digital traces of individual behavior may uncover previously hidden statistical links between an individual's past actions and her private traits. To quantify this effect, we analyze the evolution of individual privacy loss by studying the edit history of Wikipedia over 13 years, including more than 117,523 different users performing 188,805,088 edits. We trace each Wikipedia contributor using apparently harmless features, such as the number of edits performed on predefined broad categories in a given time period (e.g. Mathematics, Culture or Nature). We show that even at this unspecific level of behavior description, it is possible to use off-the-shelf machine learning algorithms to uncover usually undisclosed personal traits, such as gender, religion or education. We provide empirical evidence that the prediction accuracy for almost all private traits consistently improves over time. Surprisingly, the prediction performance for users who stopped editing after a given time still improves. The activities performed by new users seem to have contributed more to this effect than additional activities from existing (but still active) users. Insights from this work should help users, system designers, and policy makers understand and make long-term design choices in online content creation systems.
  • Social influence has been shown to create significant unpredictability in cultural markets, providing one potential explanation for why experts routinely fail at predicting the commercial success of cultural products. To counteract the difficulty of making accurate predictions, "measure and react" strategies have been advocated, but finding a concrete strategy that scales for very large markets has remained elusive so far. Here we propose a "measure and optimize" strategy based on an optimization policy that uses product quality, appeal, and social influence to maximize expected profits in the market at each decision point. Our computational experiments show that our policy leverages social influence to produce significant performance benefits for the market, while our theoretical analysis proves that our policy outperforms in expectation any policy not displaying social information. Our results contrast with earlier work which focused on showing the unpredictability and inequalities created by social influence. Not only do we show for the first time that dynamically showing consumers positive social information under our policy increases the expected performance of the seller in cultural markets, but we also show that, in reasonable settings, our policy does not introduce significant unpredictability and identifies "blockbusters". Overall, these results shed new light on the nature of social influence and how it can be leveraged for the benefit of the market.
  • Could social media data aid in disaster response and damage assessment? Countries face both an increasing frequency and intensity of natural disasters due to climate change. And during such events, citizens are turning to social media platforms for disaster-related communication and information. Social media improves situational awareness, facilitates dissemination of emergency information, enables early warning systems, and helps coordinate relief efforts. Additionally, spatiotemporal distribution of disaster-related messages helps with real-time monitoring and assessment of the disaster itself. Here we present a multiscale analysis of Twitter activity before, during, and after Hurricane Sandy. We examine the online response of 50 metropolitan areas of the United States and find a strong relationship between proximity to Sandy's path and hurricane-related social media activity. We show that real and perceived threats -- together with the physical disaster effects -- are directly observable through the intensity and composition of Twitter's message stream. We demonstrate that per-capita Twitter activity strongly correlates with the per-capita economic damage inflicted by the hurricane. Our findings suggest that massive online social networks can be used for rapid assessment ("nowcasting") of damage caused by a large-scale disaster.
  • Understanding the long-term impact that changes in a city's transportation infrastructure have on its spatial interactions remains a challenge. The difficulty arises from the fact that the real impact may not be revealed in static or aggregated mobility measures, as these are remarkably robust to perturbations. More generally, the lack of longitudinal, cross-sectional data demonstrating the evolution of spatial interactions at a meaningful urban scale also hinders us from evaluating the sensitivity of movement indicators, limiting our capacity to understand the evolution of urban mobility in depth. Using very large mobility records distributed over three years we quantify the impact of the completion of a metro line extension: the circle line (CCL) in Singapore. We find that the commonly used movement indicators are almost identical before and after the project was completed. However, in comparing the temporal community structure across years, we do observe significant differences in the spatial reorganization of the affected geographical areas. The completion of CCL enables travelers to re-identify their desired destinations collectively with lower transport cost, making the community structure more consistent. These changes in locality are dynamic, and characterized over short time-scales, offering us a different approach to identify and analyze the long-term impact of new infrastructures on cities and their evolution dynamics.
  • Recent widespread adoption of electronic and pervasive technologies has enabled the study of human behavior at an unprecedented level, uncovering universal patterns underlying human activity, mobility, and inter-personal communication. In the present work, we investigate whether deviations from these universal patterns may reveal information about the socio-economic status of geographical regions. We quantify the extent to which deviations in diurnal rhythm, mobility patterns, and communication styles across regions relate to their unemployment incidence. For this we examine a country-scale publicly articulated social media dataset, where we quantify individual behavioral features from over 145 million geo-located messages distributed among more than 340 different Spanish economic regions, inferred by computing communities of cohesive mobility fluxes. We find that regions exhibiting more diverse mobility fluxes, earlier diurnal rhythms, and more correct grammatical styles display lower unemployment rates. As a result, we provide a simple model able to produce accurate, easily interpretable reconstruction of regional unemployment incidence from their social-media digital fingerprints alone. Our results show that cost-effective economic indicators can be built based on publicly available social media datasets.
  • Motivated by applications in retail, online advertising, and cultural markets, this paper studies how to find the optimal assortment and positioning of products subject to a capacity constraint. We prove that the optimal assortment and positioning can be found in polynomial time for a multinomial logit model capturing utilities, position bias, and social influence. Moreover, in a dynamic market, we show that the policy that applies the optimal assortment and positioning and leverages social influence outperforms in expectation any policy not using social influence.
  • Information flow during catastrophic events is a critical aspect of disaster management. Modern communication platforms, in particular online social networks, provide an opportunity to study such flow, and a means to derive early-warning sensors, improving emergency preparedness and response. Performance of the social networks sensor method, based on topological and behavioural properties derived from the "friendship paradox", is studied here for over 50 million Twitter messages posted before, during, and after Hurricane Sandy. We find that differences in users' network centrality effectively translate into a moderate awareness advantage (up to 26 hours), and that geo-location of users within or outside of the hurricane-affected area plays a significant role in determining the scale of such advantage. Emotional response appears to be universal regardless of the position in the network topology, and displays characteristic, easily detectable patterns, opening a possibility of implementing a simple "sentiment sensing" technique to detect and locate disasters.
  • Physical contact remains difficult to trace in large metropolitan networks, though it is a key vehicle for the transmission of contagious outbreaks. Co-presence encounters during daily transit use provide us with a city-scale time-resolved physical contact network, consisting of 1 billion contacts among 3 million transit users. Here, we study the advantage that knowledge of such co-presence structures may provide for early detection of contagious outbreaks. We first examine the "friend sensor" scheme, a simple but universal strategy requiring only local information, and demonstrate that it provides significant early detection of simulated outbreaks. Taking advantage of the full network structure, we then identify advanced "global sensor sets", obtaining substantial savings in early-warning time over the friend sensor scheme. Individuals with the highest number of encounters are the most efficient sensors, with performance comparable to individuals with the highest travel frequency, exploratory behavior and structural centrality. An efficiency balance emerges when testing the dependency on sensor size and evaluating sensor reliability; we find that substantial and reliable lead-time could be attained by monitoring only 0.01% of the population with the highest degree.
  • While Artificial Intelligence has successfully outperformed humans in complex combinatorial games (such as chess and checkers), humans have retained their supremacy in social interactions that require intuition and adaptation, such as cooperation and coordination games. Despite significant advances in learning algorithms, most algorithms adapt at time scales that are not relevant for interactions with humans, and therefore the advances in AI on this front have remained of a more theoretical nature. This has also hindered the experimental evaluation of how these algorithms perform against humans, as the length of experiments needed to evaluate them is beyond what humans are reasonably expected to endure (max 100 repetitions). This scenario is rapidly changing, as recent algorithms are able to converge to their functional regimes in shorter time-scales. Additionally, this shift opens up possibilities for experimental investigation: where do humans stand compared with these new algorithms? We evaluate humans experimentally against a representative element of these fast-converging algorithms. Our results indicate that the performance of at least one of these algorithms is comparable to, and even exceeds, the performance of people.
  • Social mobilization, the ability to mobilize large numbers of people via social networks to achieve highly distributed tasks, has received significant attention in recent times. This growing capability, facilitated by modern communication technology, is highly relevant to endeavors which require the search for individuals that possess rare information or skill, such as finding medical doctors during disasters, or searching for missing people. An open question remains as to whether, in time-critical situations, people are able to recruit in a targeted manner, or whether they resort to so-called blind search, recruiting as many acquaintances as possible via broadcast communication. To explore this question, we examine data from our recent success in the U.S. State Department's Tag Challenge, which required locating and photographing 5 target persons in 5 different cities in the United States and Europe in less than 12 hours, based only on a single mug-shot. We find that people are able to consistently route information in a targeted fashion even under increasing time pressure. We derive an analytical model for global mobilization and use it to quantify the extent to which people were targeting others during recruitment. Our model estimates that approximately 1 in 3 messages were targeted during the most time-sensitive period of the challenge. This is a novel observation at such short temporal scales, and opens opportunities for devising viral incentive schemes that provide distance- or time-sensitive rewards to approach the target geography more rapidly, with applications in multiple areas from emergency preparedness to political mobilization.
  • Crowdsourcing offers unprecedented potential for solving tasks efficiently by tapping into the skills of large groups of people. A salient feature of crowdsourcing---its openness of entry---makes it vulnerable to malicious behavior. Such behavior took place in a number of recent popular crowdsourcing competitions. We provide game-theoretic analysis of a fundamental tradeoff between the potential for increased productivity and the possibility of being set back by malicious behavior. Our results show that in crowdsourcing competitions malicious behavior is the norm, not the anomaly---a result contrary to the conventional wisdom in the area. Counterintuitively, making the attacks more costly does not deter them but leads to a less desirable outcome. These findings have cautionary implications for the design of crowdsourcing competitions.
  • The Internet has enabled the emergence of collective problem solving, also known as crowdsourcing, as a viable option for solving complex tasks. However, the openness of crowdsourcing presents a challenge because solutions obtained by it can be sabotaged, stolen, and manipulated at a low cost for the attacker. We extend a previously proposed crowdsourcing dilemma game to an iterated game to address this problem. We enumerate pure evolutionarily stable strategies within the class of so-called reactive strategies, i.e., those depending on the last action of the opponent. Among the 4096 possible reactive strategies, we find 16 strategies, each of which is stable in some parameter regions. Repeated encounters of the players can improve social welfare when the damage inflicted by an attack and the cost of attack are both small. Under the current framework, repeated interactions do not really ameliorate the crowdsourcing dilemma in a majority of the parameter space.
  • Peer punishment of free-riders (defectors) is a key mechanism for promoting cooperation in society. However, it is highly unstable since some cooperators may contribute to a common project but refuse to punish defectors. Centralized sanctioning institutions (for example, tax-funded police and criminal courts) can solve this problem by punishing both defectors and cooperators who refuse to punish. These institutions have been shown to emerge naturally through social learning and then displace all other forms of punishment, including peer punishment. However, this result provokes a number of questions. If centralized sanctioning is so successful, then why do many highly authoritarian states suffer from low levels of cooperation? Why do states with high levels of public good provision tend to rely more on citizen-driven peer punishment? And what happens if centralized institutions can be circumvented by individual acts of bribery? Here, we consider how corruption influences the evolution of cooperation and punishment. Our model shows that the effectiveness of centralized punishment in promoting cooperation breaks down when some actors in the model are allowed to bribe centralized authorities. Counterintuitively, increasing the sanctioning power of the central institution makes things even worse, since this prevents peer punishers from playing a role in maintaining cooperation. As a result, a weaker centralized authority is actually more effective because it allows peer punishment to restore cooperation in the presence of corruption. Our results provide an evolutionary rationale for why public goods provision rarely flourishes in polities that rely only on strong centralized institutions. Instead, cooperation requires both decentralized and centralized enforcement. These results help to explain why citizen participation is a fundamental necessity for policing the commons.
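
The "friend sensor" idea that appears in two of the abstracts above builds on the friendship paradox: a node reached by following a random edge tends to have a higher degree than a node picked uniformly at random. The following is a minimal sketch of that effect only, assuming a small hypothetical undirected graph; the edge list `EDGES` and the function names are illustrative and are not taken from any of the papers.

```python
from collections import defaultdict


def degrees(edges):
    """Return a dict mapping each node to its degree in an undirected graph."""
    deg = defaultdict(int)
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return dict(deg)


def mean_degree(edges):
    """Average degree of a node chosen uniformly at random."""
    deg = degrees(edges)
    return sum(deg.values()) / len(deg)


def mean_neighbor_degree(edges):
    """Average degree of the node at a randomly chosen edge endpoint.

    A node of degree d appears as an endpoint d times, so this equals
    sum(d^2) / sum(d), which is never smaller than the plain mean degree.
    """
    deg = degrees(edges)
    return sum(d * d for d in deg.values()) / sum(deg.values())


# Hypothetical toy graph: a hub (node 0) with four neighbors, plus one extra edge.
EDGES = [(0, 1), (0, 2), (0, 3), (0, 4), (1, 2)]

if __name__ == "__main__":
    print("mean degree of a random node:    ", mean_degree(EDGES))
    print("mean degree of a random neighbor:", mean_neighbor_degree(EDGES))
```

On this toy graph the random-neighbor average exceeds the random-node average, which is why monitoring friends of randomly chosen individuals can, in principle, reach better-connected nodes using only local information.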