• Bayesian matrix factorization (BMF) is a powerful tool for producing low-rank representations of matrices and for predicting missing values and providing confidence intervals. Scaling up the posterior inference for massive-scale matrices is challenging and requires distributing both data and computation over many workers, making communication the main computational bottleneck. Embarrassingly parallel inference would remove the communication needed, by using completely independent computations on different data subsets, but it suffers from the inherent unidentifiability of BMF solutions. We introduce a hierarchical decomposition of the joint posterior distribution, which couples the subset inferences, allowing for embarrassingly parallel computations in a sequence of at most three stages. Using an efficient approximate implementation, we show improvements empirically on both real and simulated data. Our distributed approach is able to achieve a speed-up of almost an order of magnitude over the full posterior, with a negligible effect on predictive accuracy. Our method outperforms state-of-the-art embarrassingly parallel MCMC methods in accuracy, and achieves results competitive to other available distributed and parallel implementations of BMF.
  • Collaborations such as Wikipedia are a key part of the value of the modern Internet. At the same time there is concern that these collaborations are threatened by high levels of member turnover. In this paper we borrow ideas from topic analysis to editor activity on Wikipedia over time into a latent space that offers an insight into the evolving patterns of editor behavior. This latent space representation reveals a number of different categories of editor (e.g. content experts, social networkers) and we show that it does provide a signal that predicts an editor's departure from the community. We also show that long term editors gradually diversify their participation by shifting edit preference from one or two namespaces to multiple namespaces and experience relatively soft evolution in their editor profiles, while short term editors generally distribute their contribution randomly among the namespaces and experience considerably fluctuated evolution in their editor profiles.
  • One of the more disruptive reforms associated with the modern Internet is the emergence of online communities working together on knowledge artefacts such as Wikipedia and OpenStreetMap. Recently it has become clear that these initiatives are vulnerable because of problems with membership turnover. This study presents a longitudinal analysis of 891 WikiProjects where we model the impact of member turnover and social capital losses on project productivity. By examining social capital losses we attempt to provide a more nuanced analysis of member turnover. In this context social capital is modelled from a social network perspective where the loss of more central members has more impact. We find that only a small proportion of WikiProjects are in a relatively healthy state with low levels of membership turnover and social capital losses. The results show that the relationship between social capital losses and project performance is U-shaped, and that member withdrawal has significant negative effect on project outcomes. The results also support the mediation of turnover rate and network density on the curvilinear relationship.
  • In this paper we address the challenge of assessing the quality of Wikipedia pages using scores derived from edit contribution and contributor authoritativeness measures. The hypothesis is that pages with significant contributions from authoritative contributors are likely to be high-quality pages. Contributions are quantified using edit longevity measures and contributor authoritativeness is scored using centrality metrics in either the Wikipedia talk or co-author networks. The results suggest that it is useful to take into account the contributor authoritativeness when assessing the information quality of Wikipedia content. The percentile visualization of the quality scores provides some insights about the anomalous articles, and can be used to help Wikipedia editors to identify Start and Stub articles that are of relatively good quality.