-
Online social media are information resources that can have a transformative
power in society. While the Web was envisioned as an equalizing force that
allows everyone to access information, the digital divide prevents large
numbers of people from being present online. Online social media in particular
are prone to gender inequality, an important issue given the link between
social media use and employment. Understanding gender inequality in social
media is a challenging task due to the necessity of data sources that can
provide large-scale measurements across multiple countries. Here we show how
the Facebook Gender Divide (FGD), a metric based on aggregated statistics of
more than 1.4 billion users in 217 countries, explains various aspects of
worldwide gender inequality. Our analysis shows that the FGD encodes gender
equality indices in education, health, and economic opportunity. We find gender
differences in network externalities that suggest that using social media has
an added value for women. Furthermore, we find that low values of the FGD are
associated with increases in economic gender equality. Our results suggest that
online social networks, while suffering evident gender imbalance, may lower the
barriers that women have to access informational resources and help to narrow
the economic gender gap.
-
Crowd employment is a new form of short-term and flexible employment which
has emerged during the past decade. In order to understand this new form of
employment, it is crucial to illuminate the underlying motivations of the
workforce involved in it. This paper introduces the Multidimensional
Crowdworker Motivation Scale (MCMS), a scale for measuring the motivation of
crowdworkers on micro-task platforms. The MCMS is theoretically grounded in
self-determination theory and tailored specifically to the context of paid
crowdsourced micro-labor. The scale measures the motivation of crowdworkers
along six motivational dimensions, ranging from amotivation to intrinsic
motivation. We validated the MCMS on data collected in ten countries and three
income groups. Factor analyses demonstrated that the MCMS's six dimensions
showed good model fit, validity, and reliability. Furthermore, our measurement
invariance tests showed that motivations measured with the MCMS are comparable
across countries and income groups, and we present a first cross-country
comparison of crowdworker motivations. This work constitutes an important first
step towards understanding the motivations of the international crowd
workforce.
-
The association between light and psychological states has a long history and
permeates our language. LIVEIA (Light-based Immersive Visualization Environment
for Imaginative Actualization) is a new immersive, interactive technology that
uses physical light as a metaphor for visualizing people's inner lives and
relationships. This paper outlines its educational value, as a tool for
understanding and explaining aspects of how people think and interact, and its
potential therapeutic value as a form of art therapy in which the artwork has
straightforwardly interpretable symbolic meanings.
-
In many domain applications, a continuous timeline of human locations is
critical; for example for understanding possible locations where a disease may
spread, or the flow of traffic. While data sources such as GPS trackers or Call
Data Records are temporally-rich, they are expensive, often not publicly
available or garnered only in select locations, restricting their wide use.
Conversely, geo-located social media data are publicly and freely available,
but present challenges especially for full timeline inference due to their
sparse nature. We propose a stochastic framework, Intermediate Location
Computing (ILC) which uses prior knowledge about human mobility patterns to
predict every missing location from an individual's social media timeline. We
compare ILC with a state-of-the-art RNN baseline as well as methods that are
optimized for next-location prediction only. For three major cities, ILC
predicts the top 1 location for all missing locations in a timeline, at 1 and
2-hour resolution, with up to 77.2% accuracy (up to 6% better accuracy than all
compared methods). ILC also outperforms the RNN in low-data settings,
both with a very small number of users (under 50) and
with more users but sparser timelines. In general, the RNN model
needs a higher number of users to achieve the same performance as ILC. Overall,
this work illustrates the tradeoff between prior knowledge of heuristics and
more data, for an important societal problem of filling in entire timelines
using freely available, but sparse social media data.
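
As a rough illustration of filling gaps between two observed locations with a mobility prior, the sketch below uses a first-order Markov chain. This is an assumption made here for illustration and is not the ILC framework itself; the transition matrix `P`, the three location labels, and all function names are hypothetical. For each missing time step it picks the location with the highest probability given the observed start and end points.

```python
def forward(P, start, steps):
    """Distribution over locations after `steps` transitions from `start`."""
    n = len(P)
    dist = [0.0] * n
    dist[start] = 1.0
    for _ in range(steps):
        dist = [sum(dist[i] * P[i][j] for i in range(n)) for j in range(n)]
    return dist


def backward(P, end, steps):
    """Probability, from each location, of being at `end` after `steps` transitions."""
    n = len(P)
    reach = [0.0] * n
    reach[end] = 1.0
    for _ in range(steps):
        reach = [sum(P[i][j] * reach[j] for j in range(n)) for i in range(n)]
    return reach


def fill_gap(P, start, end, gap):
    """Most probable location at each of the gap - 1 missing steps.

    `start` is observed at step 0 and `end` at step `gap`. Each step is
    decided independently (per-step marginal), not as a joint path.
    """
    filled = []
    for t in range(1, gap):
        fwd = forward(P, start, t)
        bwd = backward(P, end, gap - t)
        scores = [f * b for f, b in zip(fwd, bwd)]
        filled.append(max(range(len(P)), key=scores.__getitem__))
    return filled


# Hypothetical prior: 0 = home, 1 = transit, 2 = work (rows sum to 1).
P = [
    [0.6, 0.4, 0.0],
    [0.1, 0.3, 0.6],
    [0.0, 0.3, 0.7],
]
missing = fill_gap(P, start=0, end=2, gap=2)
```

With this prior, a user seen at home and then at work two steps later can only have passed through transit, since home has zero probability of jumping straight to work.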
-
Recent progress in applying complex network theory to problems in quantum
information has resulted in a beneficial crossover. Complex network methods
have successfully been applied to transport and entanglement models while
information physics is setting the stage for a theory of complex systems with
quantum information-inspired methods. Novel quantum induced effects have been
predicted in random graphs---where edges represent entangled links---and
quantum computer algorithms have been proposed to offer enhancement for several
network problems. Here we review the results at the cutting edge, pinpointing
the similarities and the differences found at the intersection of these two
fields.
-
Characterizing human values is a topic deeply interwoven with the sciences,
humanities, art, and many other human endeavors. In recent years, a number of
thinkers have argued that accelerating trends in computer science, cognitive
science, and related disciplines foreshadow the creation of intelligent
machines which meet and ultimately surpass the cognitive abilities of human
beings, thereby entangling an understanding of human values with future
technological development. Contemporary research accomplishments suggest
sophisticated AI systems becoming widespread and responsible for managing many
aspects of the modern world, from preemptively planning users' travel schedules
and logistics, to fully autonomous vehicles, to domestic robots assisting in
daily living. The extrapolation of these trends has been most forcefully
described in the context of a hypothetical "intelligence explosion," in which
the capabilities of an intelligent software agent would rapidly increase due to
the presence of feedback loops unavailable to biological organisms. The
possibility of superintelligent agents, or simply the widespread deployment of
sophisticated, autonomous AI systems, highlights an important theoretical
problem: the need to separate the cognitive and rational capacities of an agent
from the fundamental goal structure, or value system, which constrains and
guides the agent's actions. The "value alignment problem" is to specify a goal
structure for autonomous agents compatible with human values. In this brief
article, we suggest that recent ideas from affective neuroscience and related
disciplines aimed at characterizing neurological and behavioral universals in
the mammalian class provide important conceptual foundations relevant to
describing human values. We argue that the notion of "mammalian value systems"
points to a potential avenue for fundamental research in AI safety and AI
ethics.
-
We propose a camera-based assistive text reading framework to help blind
persons read text labels and product packaging from hand-held objects in their
daily life. To isolate the object from cluttered backgrounds or other surrounding
objects in the camera view, we first propose an efficient and effective
motion-based method to define a region of interest (ROI) in the video by asking
the user to shake the object. This scheme extracts the moving object region with a
mixture-of-Gaussians-based background subtraction technique. In the extracted
ROI, text localization and recognition are conducted to acquire text details.
To automatically localize the text regions within the object ROI, we offer a novel
text localization algorithm that learns gradient features of stroke
orientations and distributions of edge pixels in an AdaBoost model. Text
characters in the localized text regions are then binarized and recognized by
off-the-shelf optical character recognition (OCR) software. The recognized text
codes are converted into audio output for the blind users. Performance of the
proposed text localization algorithm is quantitatively evaluated on the ICDAR-2003
and ICDAR-2011 Robust Reading Datasets. Experimental results demonstrate that
our algorithm achieves state-of-the-art performance. The
proof-of-concept prototype is also evaluated on a dataset collected with ten
blind persons to assess the effectiveness of the scheme. We explore the user
interface issues and robustness of the algorithm in extracting and reading text
from different objects with complex backgrounds.
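
The motion-based ROI step can be sketched roughly as follows. Note the simplification: this keeps a single running Gaussian (mean and variance) per pixel rather than a full mixture of Gaussians, so it is not the technique named in the abstract but a reduced stand-in. The function names, the thresholds `alpha`, `k`, and `min_var`, and the tiny grayscale frames are all assumptions made for illustration.

```python
def update_background(mean, var, frame, alpha=0.05, k=2.5, min_var=4.0):
    """Return a foreground mask for `frame` and update the model in place.

    A pixel is foreground when it lies more than k standard deviations
    from its running mean. Only background pixels update the model.
    """
    mask = []
    for r, row in enumerate(frame):
        mask_row = []
        for c, value in enumerate(row):
            diff = value - mean[r][c]
            is_foreground = diff * diff > (k * k) * max(var[r][c], min_var)
            mask_row.append(is_foreground)
            if not is_foreground:
                mean[r][c] += alpha * diff
                var[r][c] += alpha * (diff * diff - var[r][c])
        mask.append(mask_row)
    return mask


def bounding_box(mask):
    """Smallest (top, left, bottom, right) box covering all foreground pixels."""
    rows = [r for r, row in enumerate(mask) if any(row)]
    cols = [c for row in mask for c, flag in enumerate(row) if flag]
    if not rows:
        return None
    return (min(rows), min(cols), max(rows), max(cols))
```

Feeding a few static frames and then a frame in which a small block of pixels changes sharply yields a box around that block, which would play the role of the ROI passed on to text localization.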
-
We describe the course of a hackathon dedicated to the development of
linguistic tools for Tibetan Buddhist studies. Over a period of five days, a
group of seventeen scholars, scientists, and students developed and compared
algorithms for intertextual alignment and text classification, along with some
basic language tools, including a stemmer and word segmenter.
-
In this work we describe Spike, a simple MATLAB-based language for
creating randomized multiple-choice questions with minimal effort. This language
has been successfully tested at Flinders University by the author in a number
of mathematics topics including Numerical Analysis, Abstract Algebra and
Partial Differential Equations.
The open source code of Spike is available at:
https://github.com/NurullaAzamov/Spike.
Enquiries about Spike should be sent to azamovnurulla@gmail.com
-
We present a general approach to automating ethical decisions, drawing on
machine learning and computational social choice. In a nutshell, we propose to
learn a model of societal preferences, and, when faced with a specific ethical
dilemma at runtime, efficiently aggregate those preferences to identify a
desirable choice. We provide a concrete algorithm that instantiates our
approach; some of its crucial steps are informed by a new theory of
swap-dominance efficient voting rules. Finally, we implement and evaluate a
system for ethical decision making in the autonomous vehicle domain, using
preference data collected from 1.3 million people through the Moral Machine
website.
-
The Internet of Mobile Things encompasses stream data being generated by
sensors, network communications that pull and push these data streams, as well
as running processing and analytics that can effectively leverage actionable
information for transportation planning, management, and business advantage.
Edge computing emerges as a new paradigm that decentralizes the communication,
computation, control and storage resources from the cloud to the edge of the
network. This paper proposes an edge computing platform where mobile edge nodes
are physical devices deployed on a transit bus where descriptive analytics is
used to uncover meaningful patterns from real-time transit data streams. An
application experiment is used to evaluate the advantages and disadvantages of
our proposed platform to support descriptive analytics at a mobile edge node
and generate actionable information to transit managers.
-
Smartphones have become ubiquitous in our home and work environments;
however, users normally rely on explicit but inefficient identification
processes in a controlled environment. Therefore, when a device is stolen, a
thief can have access to the owner's personal information and services against
the stored passwords. As a result of this potential scenario, this work
proposes an automatic legitimate user identification system based on gait
biometrics extracted from user walking patterns captured by a smartphone. A set
of preprocessing schemes is applied to calibrate noisy and invalid samples and
augment the gait-induced time and frequency domain features, which are then further
optimized using a non-linear unsupervised feature selection method. The
selected features create an underlying gait biometric representation able to
discriminate among individuals and identify them uniquely. Different
classifiers (i.e. Support Vector Machine (SVM), K-Nearest Neighbors (KNN),
Bagging, and Extreme Learning Machine (ELM)) are adopted to achieve accurate
legitimate user identification. Extensive experiments on a group of 16
individuals in an indoor environment show the effectiveness of the proposed
solution: with 5 to 70 samples per window, KNN and bagging classifiers
achieve 87-99% accuracy, 82-98% for ELM, and 81-94% for SVM. The
proposed pipeline achieves a 100% true positive and 0% false-negative
rate for almost all classifiers.
-
In recent years, the rapid spread of mobile devices has created a vast
amount of mobile data. However, shallow-structure models such as the support
vector machine (SVM) have difficulty dealing with the high-dimensional data
produced as mobile networks develop. In this paper, we analyze mobile data to
predict human trajectories in order to understand human mobility patterns via a
deep-structure model called "DeepSpace". To the best of our knowledge, this is
the first time that a deep learning approach has been applied to predicting human
trajectories. Furthermore, we extend the vanilla convolutional neural network
(CNN) into an online learning system that can deal with a continuous
mobile data stream. "DeepSpace" consists of two different
prediction models corresponding to different scales in space (the coarse
prediction model and the fine prediction models). These two models constitute a
hierarchical structure, which enables the whole architecture to be run in
parallel. Finally, we test our model on the data usage detail records
(UDRs) from the mobile cellular network in a city in southeastern China,
instead of the call detail records (CDRs) that are widely used by others.
The experimental results show that "DeepSpace" is promising for human
trajectory prediction.
-
The Internet of Things (IoT) represents a comprehensive environment that
consists of a large number of smart devices interconnecting heterogeneous
physical objects to the Internet. Many domains such as logistics,
manufacturing, agriculture, urban computing, home automation, ambient assisted
living and various ubiquitous computing applications have utilised IoT
technologies. Meanwhile, Business Process Management Systems (BPMS) have become
a successful and efficient solution for coordinated management and optimised
utilisation of resources/entities. However, past BPMS have not considered many
issues they will face in managing large scale connected heterogeneous IoT
entities. Without fully understanding the behaviour, capability and state of
the IoT entities, the BPMS can fail to manage the IoT integrated information
systems. In this paper, we analyse existing BPMS for IoT and identify their
limitations and drawbacks from a Mobile Cloud Computing perspective.
We then discuss a number of open challenges in BPMS for IoT.
-
Machine learning systems are increasingly used to support public sector
decision-making across a variety of sectors. Given concerns around
accountability in these domains, and amidst accusations of intentional or
unintentional bias, there have been increased calls for transparency of these
technologies. Few, however, have considered how logics and practices concerning
transparency have been understood by those involved in the machine learning
systems already being piloted and deployed in public bodies today. This short
paper distils insights about transparency on the ground from interviews with 27
such actors, largely public servants and relevant contractors, across 5 OECD
countries. Considering transparency and opacity in relation to trust and
buy-in, better decision-making, and the avoidance of gaming, it seeks to
provide useful insights for those hoping to develop socio-technical approaches
to transparency that might be useful to practitioners on-the-ground.
An extended, archival version of this paper is available as Veale M., Van
Kleek M., & Binns R. (2018). "Fairness and accountability design needs for
algorithmic support in high-stakes public sector decision-making." Proceedings
of the 2018 CHI Conference on Human Factors in Computing Systems (CHI'18),
http://doi.org/10.1145/3173574.3174014.
-
Many societal decision problems lie in high-dimensional continuous spaces not
amenable to the voting techniques common for their discrete or
single-dimensional counterparts. These problems are typically discretized
before running an election or decided upon through negotiation by
representatives. We propose an algorithm called Iterative Local Voting for
collective decision-making in this setting. In this algorithm, voters are
sequentially sampled and asked to modify a candidate solution within some local
neighborhood of its current value, as defined by a ball in some chosen norm,
with the size of the ball shrinking at a specified rate.
We first prove the convergence of this algorithm under appropriate choices of
neighborhoods to Pareto optimal solutions with desirable fairness properties in
certain natural settings: when the voters' utilities can be expressed in terms
of some form of distance from their ideal solution, and when these utilities
are additively decomposable across dimensions. In many of these cases, we
obtain convergence to the societal welfare maximizing solution.
We then describe an experiment in which we test our algorithm for the
decision of the U.S. Federal Budget on Mechanical Turk with over 2,000 workers,
employing neighborhoods defined by L1, L2, and L∞ balls.
We make several observations that inform future
implementations of such a procedure.
-
The increasing availability and adoption of shared vehicles as an alternative
to personally-owned cars presents ample opportunities for achieving more
efficient transportation in cities. With private cars spending on average
over 95% of their time parked, one of the possible benefits of shared mobility
is the reduced need for parking space. While widely discussed, a systematic
quantification of these benefits as a function of mobility demand and sharing
models is still mostly lacking in the literature. As a first step in this
direction, this paper focuses on a type of private mobility which, although
specific, is a major contributor to traffic congestion and parking needs,
namely, home-work commuting. We develop a data-driven methodology for
estimating commuter parking needs in different shared mobility models,
including a model where self-driving vehicles are used to partially compensate for the
flow imbalance typical of commuting and to further reduce parking infrastructure
at the expense of increased traveled kilometers. We consider the city of
Singapore as a case study, and produce very encouraging results showing that
the gradual transition to shared mobility models will bring tangible reductions
in parking infrastructure. In the forward-looking, self-driving vehicle
scenario, our analysis suggests that a reduction of up to 50% in parking needs can
be achieved at the expense of an increase in total traveled kilometers of less than
2%.
-
The Shannon-Weaver model of linear information transmission is extended with
two loops potentially generating redundancies: (i) meaning is provided locally
to the information from the perspective of hindsight, and (ii) meanings can be
codified differently and then refer to other horizons of meaning. Thus, three
layers are distinguished: variations in the communications, historical
organization at each moment of time, and evolutionary self-organization of the
codes of communication over time. Furthermore, the codes of communication can
functionally be different and then the system is both horizontally and
vertically differentiated. All these subdynamics operate in parallel and
necessarily generate uncertainty. However, meaningful information can be
considered as the specific selection of a signal from the noise; the codes of
communication are social constructs that can generate redundancy by giving
different meanings to the same information. Reflexively, one can translate
among codes in more elaborate discourses. The second (instantiating) layer can
be operationalized in terms of semantic maps using the vector space model; the
third in terms of mutual redundancy among the latent dimensions of the vector
space. Using Blaise Cronin's œuvre, the different operations of the three
layers are demonstrated empirically.
-
Trends change rapidly in today's world, prompting this key question: What is
the mechanism behind the emergence of new trends? By representing real-world
dynamic systems as complex networks, the emergence of new trends can be
symbolized by vertices that "shine." That is, at a specific time interval in a
network's life, certain vertices become increasingly connected to other
vertices. This process creates new high-degree vertices, i.e., network stars.
Thus, to study trends, we must look at how networks evolve over time and
determine how the stars behave. In our research, we constructed the largest
publicly available network evolution dataset to date, which contains 38,000
real-world networks and 2.5 million graphs. Then, we performed the first
precise wide-scale analysis of the evolution of networks with various scales.
Three primary observations resulted: (a) links are most prevalent among
vertices that join a network at a similar time; (b) the rate at which new vertices
join a network is a central factor in molding a network's topology; and (c) the
emergence of network stars (high-degree vertices) is correlated with
fast-growing networks. We applied our learnings to develop a flexible
network-generation model based on large-scale, real-world data. This model
gives a better understanding of how stars rise and fall within networks, and is
applicable to dynamic systems both in nature and society.
-
A large number of statistical decision problems in the social sciences and
beyond can be framed as a (contextual) multi-armed bandit problem. However, it
is notoriously hard to develop and evaluate policies that tackle this type of
problem, and to use such policies in applied studies. To address this issue,
this paper introduces StreamingBandit, a Python web application for developing
and testing bandit policies in field studies. StreamingBandit can sequentially
select treatments using (online) policies in real time. Once StreamingBandit is
implemented in an applied context, different policies can be tested, altered,
nested, and compared. StreamingBandit makes it easy to apply a multitude of
bandit policies for sequential allocation in field experiments, and allows for
the quick development and re-use of novel policies. In this article, we detail
the implementation logic of StreamingBandit and provide several examples of its
use.
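
To make the idea of an online bandit policy concrete, here is a minimal sketch of Thompson sampling for binary rewards, written in plain Python. It does not use or reproduce the StreamingBandit API: the class `ThompsonSampler`, the methods `choose_arm` and `update`, and the simulated success rates are all hypothetical names and values chosen for illustration.

```python
import random


class ThompsonSampler:
    """Beta-Bernoulli Thompson sampling over a fixed set of treatments."""

    def __init__(self, n_arms, seed=0):
        self.rng = random.Random(seed)
        self.successes = [1] * n_arms  # Beta(1, 1) uniform prior per arm
        self.failures = [1] * n_arms

    def choose_arm(self):
        """Draw one sample per arm from its posterior and pick the largest."""
        draws = [
            self.rng.betavariate(a, b)
            for a, b in zip(self.successes, self.failures)
        ]
        return max(range(len(draws)), key=draws.__getitem__)

    def update(self, arm, reward):
        """Record a binary reward for the arm that was played."""
        if reward:
            self.successes[arm] += 1
        else:
            self.failures[arm] += 1


def simulate(true_rates, rounds=2000, seed=0):
    """Run the policy against simulated Bernoulli arms; return pull counts."""
    policy = ThompsonSampler(len(true_rates), seed)
    env = random.Random(seed + 1)
    pulls = [0] * len(true_rates)
    for _ in range(rounds):
        arm = policy.choose_arm()
        reward = env.random() < true_rates[arm]
        policy.update(arm, reward)
        pulls[arm] += 1
    return pulls


pulls = simulate([0.2, 0.8])
```

The split into a "choose" step and an "update" step mirrors how a sequential field experiment works: a treatment is selected when a participant arrives, and the outcome is fed back whenever it becomes available.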
-
We propose a hybrid model of differential privacy that considers a
combination of regular and opt-in users who desire the differential privacy
guarantees of the local privacy model and the trusted curator model,
respectively. We demonstrate that within this model, it is possible to design a
new type of blended algorithm for the task of privately computing the head of a
search log. This blended approach provides significant improvements in the
utility of obtained data compared to related work while providing users with
their desired privacy guarantees. Specifically, on two large search click data
sets, comprising 1.75 and 16 GB respectively, our approach attains NDCG values
exceeding 95% across a range of privacy budget values.
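
The two privacy models mentioned above can be contrasted with a small sketch. It shows randomized response as a standard local-model mechanism and Laplace noise on a count as a standard trusted-curator mechanism; it is not the blended algorithm proposed in the paper. All function names and the simulated data are assumptions for illustration.

```python
import math
import random


def randomized_response(bit, epsilon, rng):
    """Local model: each user reports the true bit with probability
    e^eps / (1 + e^eps), otherwise the flipped bit."""
    p_truth = math.exp(epsilon) / (1.0 + math.exp(epsilon))
    return bit if rng.random() < p_truth else 1 - bit


def estimate_fraction(reports, epsilon):
    """Debias the observed fraction of 1-reports to estimate the true fraction."""
    p_truth = math.exp(epsilon) / (1.0 + math.exp(epsilon))
    observed = sum(reports) / len(reports)
    return (observed - (1.0 - p_truth)) / (2.0 * p_truth - 1.0)


def laplace_count(true_count, epsilon, rng):
    """Trusted-curator model: add Laplace noise with scale 1 / epsilon to a
    count of sensitivity 1. The difference of two exponential draws with the
    same rate is Laplace distributed."""
    scale = 1.0 / epsilon
    noise = rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)
    return true_count + noise


rng = random.Random(0)
bits = [1] * 6000 + [0] * 14000  # simulated users, true fraction 0.3
reports = [randomized_response(b, 1.0, rng) for b in bits]
local_estimate = estimate_fraction(reports, 1.0)
curator_estimate = laplace_count(sum(bits), 1.0, rng) / len(bits)
```

At the same privacy budget, the curator-side estimate is typically far less noisy than the local one, which is the kind of utility gap a hybrid model can try to exploit with its opt-in users.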
-
Security surveillance is one of the most important issues in smart cities,
especially in an era of terrorism. Deploying a number of (video) cameras is a
common surveillance approach. Given the never-ending power offered by vehicles
to metropolises, exploiting vehicle traffic to design camera placement
strategies could potentially facilitate security surveillance. This article
constitutes the first effort toward building the linkage between vehicle
traffic and security surveillance, which is a critical problem for smart
cities. We expect that our study could inform decisions about surveillance
camera placement and foster more research into principled approaches to security
surveillance that benefit our physical-world life. Code has been made publicly
available.
-
What do college students reveal to their peers on social media under complete
anonymity? Do their campus environments relate to the topics of their
disclosure? To answer these questions, I analyze Facebook confessions pages.
Popular on hundreds of college campuses, these pages allow students to
anonymously post personal confessions on a public community forum. In this
preliminary research note, I analyze several explanatory factors of online
student confessional behavior. Aggregating nearly 200,000 confessions posts
spanning a period of 3 years, I combine Latent Dirichlet Allocation (LDA) with
human verification through Mechanical Turk to scalably identify topics in these
online confessions. Where possible, I also link posts to real-world news events
parsed from Twitter. I find that confessions mentioning socioeconomics as well
as mental and physical health occur more often at top-ranking, expensive
private colleges. While event-related confessions most often mention timely
school-related events, many mention global and domestic events outside of the
local campus sphere. Results suggest that undergraduates from different
campuses disclose about topics such as race, socioeconomics, and politics
differently, but in aggregate, post in similar patterns over time.
Additionally, results confirm that anonymous Facebook confessors receive
support for confessions on important, but taboo topics such as health and
socioeconomic status.
-
We provide an overview of PSI ("a Private data Sharing Interface"), a system
we are developing to enable researchers in the social sciences and other fields
to share and explore privacy-sensitive datasets with the strong privacy
protections of differential privacy.
-
Online media outlets, in a bid to expand their reach and subsequently
increase revenue through ad monetisation, have begun adopting clickbait
techniques to lure readers to click on articles. These articles often fail to fulfill
the promise made by the headline. Traditional methods for clickbait detection
have relied heavily on feature engineering which, in turn, is dependent on the
dataset it is built for. The application of neural networks for this task has
only been explored partially. We propose a novel approach considering all
information found in a social media post. We train a bidirectional LSTM with an
attention mechanism to learn the extent to which a word contributes to the
post's clickbait score in a differential manner. We also employ a Siamese net
to capture the similarity between source and target information. Information
gleaned from images has not been considered in previous approaches. We learn
image embeddings from large amounts of data using Convolutional Neural Networks
to add another layer of complexity to our model. Finally, we concatenate the
outputs from the three separate components and feed them as input to a fully
connected layer. We conduct experiments over a test corpus of 19538 social
media posts, attaining an F1 score of 65.37% on the dataset and bettering the
previous state of the art as well as other proposed approaches, feature
engineering or otherwise.