Modern neural networks are often augmented with an attention mechanism, which
tells the network where to focus within the input. We propose in this paper a
new framework for sparse and structured attention, building upon a smoothed max
operator. We show that the gradient of this operator defines a mapping from
real values to probabilities, suitable as an attention mechanism. Our framework
includes softmax and a slight generalization of the recently proposed sparsemax
as special cases. However, we also show how our framework can incorporate
modern structured penalties, resulting in more interpretable attention
mechanisms that focus on entire segments or groups of an input. We derive
efficient algorithms to compute the forward and backward passes of our
attention mechanisms, enabling their use in a neural network trained with
backpropagation. To showcase their potential as a drop-in replacement for
existing ones, we evaluate our attention mechanisms on three large-scale tasks:
textual entailment, machine translation, and sentence summarization. Our
attention mechanisms improve interpretability without sacrificing performance;
notably, on textual entailment and summarization, we outperform the standard
attention mechanisms based on softmax and sparsemax.
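
As a rough illustration of the two special cases named above, the sketch below maps real-valued scores to probabilities with softmax and with plain sparsemax (the Euclidean projection of the scores onto the probability simplex). It is a minimal pure-Python sketch, not the paper's implementation: the function names are ours, and it covers neither the generalized sparsemax nor the structured penalties.

```python
import math


def softmax(scores):
    """Dense mapping: every input receives strictly positive probability."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def sparsemax(scores):
    """Sparse mapping: Euclidean projection of scores onto the simplex."""
    z = sorted(scores, reverse=True)
    cumsum = 0.0
    tau = 0.0
    for k, zk in enumerate(z, start=1):
        cumsum += zk
        if 1 + k * zk > cumsum:
            # The k largest scores are still in the support; update threshold.
            tau = (cumsum - 1) / k
        else:
            break
    # Scores at or below the threshold get exactly zero probability.
    return [max(s - tau, 0.0) for s in scores]


scores = [1.0, 0.8, 0.1]
dense = softmax(scores)
sparse = sparsemax(scores)
```

With these scores, softmax spreads mass over all three inputs, whereas sparsemax assigns exactly zero to the lowest-scoring one. This exact sparsity is what makes the resulting attention weights easier to inspect.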
Structured prediction requires searching over a combinatorial number of
structures. To tackle it, we introduce SparseMAP, a new method for sparse
structured inference, together with corresponding loss functions. SparseMAP
inference is able to automatically select only a few global structures: it is
situated between MAP inference, which picks a single structure, and marginal
inference, which assigns probability mass to all structures, including
implausible ones. Importantly, SparseMAP can be computed using only calls to a
MAP oracle, hence it is applicable even to problems where marginal inference is
intractable, such as linear assignment. Moreover, thanks to the solution
sparsity, gradient backpropagation is efficient regardless of the structure.
SparseMAP thus enables us to augment deep neural networks with generic and
sparse structured hidden layers. Experiments in dependency parsing and natural
language inference reveal competitive accuracy, improved interpretability, and
the ability to capture natural language ambiguities, which is attractive for
Factorization machines and polynomial networks are supervised polynomial
models based on an efficient low-rank decomposition. We extend these models to
the multi-output setting, i.e., for learning vector-valued functions, with
application to multi-class or multi-task problems. We cast this as the problem
of learning a 3-way tensor whose slices share a common decomposition and
propose a convex formulation of that problem. We then develop an efficient
conditional gradient algorithm and prove its global convergence, despite the
fact that it involves a non-convex hidden unit selection step. On
classification tasks, we show that our algorithm achieves excellent accuracy
with much sparser models than existing methods. On recommendation system tasks,
we show how to combine our algorithm with a reduction from ordinal regression
to multi-output classification and show that the resulting algorithm
outperforms existing baselines in terms of ranking accuracy.
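
To make the "efficient low-rank decomposition" concrete, the sketch below evaluates a standard single-output, second-order factorization machine in two ways: by summing over all feature pairs, and through the well-known reformulation whose cost is linear in the number of features. It illustrates only the base model, not the multi-output convex formulation or the conditional gradient algorithm proposed here; the function names and toy parameters are illustrative assumptions.

```python
def fm_predict_naive(x, w0, w, V):
    """Second-order FM: w0 + <w, x> + sum_{i<j} <V[i], V[j]> x[i] x[j]."""
    n = len(x)
    y = w0 + sum(w[i] * x[i] for i in range(n))
    for i in range(n):
        for j in range(i + 1, n):
            dot = sum(vi * vj for vi, vj in zip(V[i], V[j]))
            y += dot * x[i] * x[j]
    return y


def fm_predict_fast(x, w0, w, V):
    """Same value in O(n * k) time, where k is the rank of V."""
    n, k = len(x), len(V[0])
    y = w0 + sum(w[i] * x[i] for i in range(n))
    for f in range(k):
        s = sum(V[i][f] * x[i] for i in range(n))
        s_sq = sum((V[i][f] * x[i]) ** 2 for i in range(n))
        # (sum)^2 minus the sum of squares leaves twice the pairwise terms.
        y += 0.5 * (s * s - s_sq)
    return y


# Toy parameters (hypothetical values, chosen only for illustration).
x = [1.0, 2.0, 0.0, 3.0]
w0 = 0.5
w = [0.1, -0.2, 0.3, 0.0]
V = [[0.1, 0.2], [0.3, -0.1], [0.5, 0.5], [-0.2, 0.4]]

y_naive = fm_predict_naive(x, w0, w, V)
y_fast = fm_predict_fast(x, w0, w, V)
```

Both functions return the same prediction. The second one avoids the quadratic loop over feature pairs, which is what makes low-rank polynomial models practical on high-dimensional sparse inputs.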
We propose a novel factor graph model for argument mining, designed for
settings in which the argumentative relations in a document do not necessarily
form a tree structure. (This is the case in over 20% of the web comments
dataset we release.) Our model jointly learns elementary unit type
classification and argumentative relation prediction. Moreover, our model
supports SVM and RNN parametrizations, can enforce structure constraints (e.g.,
transitivity), and can express dependencies between adjacent relations and
propositions. Our approaches outperform unstructured baselines in both web
comments and argumentative essay datasets.
Group discussions are essential for organizing every aspect of modern life,
from faculty meetings to senate debates, from grant review panels to papal
conclaves. While costly in terms of time and organization effort, group
discussions are commonly seen as a way of reaching better decisions compared to
solutions that do not require coordination between the individuals (e.g.
voting)---through discussion, the sum becomes greater than the parts. However,
this assumption is not irrefutable: anecdotal evidence of wasteful discussions
abounds, and in our own experiments we find that over 30% of discussions are
We propose a framework for analyzing conversational dynamics in order to
determine whether a given task-oriented discussion is worth having or not. We
exploit conversational patterns reflecting the flow of ideas and the balance
between the participants, as well as their linguistic choices. We apply this
framework to conversations naturally occurring in an online collaborative world
exploration game developed and deployed to support this research. Using this
setting, we show that linguistic cues and conversational patterns extracted
from the first 20 seconds of a team discussion are predictive of whether it
will be a wasteful or a productive one.
Changing someone's opinion is arguably one of the most important challenges
of social interaction. The underlying process proves difficult to study: it is
hard to know how someone's opinions are formed and whether and how someone's
views shift. Fortunately, ChangeMyView, an active community on Reddit, provides
a platform where users present their own opinions and reasoning, invite others
to contest them, and acknowledge when the ensuing discussions change their
original views. In this work, we study these interactions to understand the
mechanisms behind persuasion.
We find that persuasive arguments are characterized by interesting patterns
of interaction dynamics, such as participant entry-order and degree of
back-and-forth exchange. Furthermore, by comparing similar counterarguments to
the same opinion, we show that language factors play an essential role. In
particular, the interplay between the language of the opinion holder and that
of the counterargument provides highly predictive cues of persuasiveness.
Finally, since even in this favorable setting people may not be persuaded, we
investigate the problem of determining whether someone's opinion is susceptible
to being changed at all. For this more difficult task, we show that stylistic
choices in how the opinion is expressed carry predictive power.
Interpersonal relations are fickle, with close friendships often dissolving
into enmity. In this work, we explore linguistic cues that presage such
transitions by studying dyadic interactions in an online strategy game where
players form alliances and break those alliances through betrayal. We
characterize friendships that are unlikely to last and examine temporal
patterns that foretell betrayal.
We reveal that subtle signs of imminent betrayal are encoded in the
conversational patterns of the dyad, even if the victim is not aware of the
relationship's fate. In particular, we find that lasting friendships exhibit a
form of balance that manifests itself through language. In contrast, sudden
changes in the balance of certain conversational attributes---such as positive
sentiment, politeness, or focus on future planning---signal impending betrayal.
Given the extremely large pool of events and stories available, media outlets
need to focus on a subset of issues and aspects to convey to their audience.
Outlets are often accused of exhibiting a systematic bias in this selection
process, with different outlets portraying different versions of reality.
However, in the absence of objective measures and empirical evidence, the
direction and extent of systematicity remain widely disputed.
In this paper we propose a framework based on quoting patterns for
quantifying and characterizing the degree to which media outlets exhibit
systematic bias. We apply this framework to a massive dataset of news articles
spanning the six years of Obama's presidency and all of his speeches, and
reveal that a systematic pattern does indeed emerge from the outlets' quoting
behavior. Moreover, we show that this pattern can be successfully exploited in
an unsupervised prediction setting, to determine which new quotes an outlet
will select to broadcast. By encoding bias patterns in a low-rank space we
provide an analysis of the structure of political media coverage. This reveals
a latent media bias space that aligns surprisingly well with political ideology
and outlet type. A linguistic analysis exposes striking differences across
these latent dimensions, showing how the different types of media outlets
portray different realities even when reporting on the same events. For
example, outlets mapped to the mainstream conservative side of the latent space
focus on quotes that portray a presidential persona disproportionately
characterized by negativity.
Scikit-learn is an increasingly popular machine learning library. Written
in Python, it is designed to be simple and efficient, accessible to
non-experts, and reusable in various contexts. In this paper, we present and
discuss our design choices for the application programming interface (API) of
the project. In particular, we describe the simple and elegant interface shared
by all learning and processing units in the library and then discuss its
advantages in terms of composition and reusability. The paper also comments on
implementation details specific to the Python ecosystem and analyzes obstacles
faced by users and developers of the library.
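
The shared interface and its benefit for composition can be sketched without the library itself. The toy classes below are hypothetical stand-ins written in plain Python, not scikit-learn code. They follow conventions the library is known for: `fit` learns from data and returns `self`, learned attributes carry a trailing underscore, transformers expose `transform`, and predictors expose `predict`. The `Chain` class is an assumed, simplified analogue of a pipeline.

```python
class MeanCenterer:
    """Toy transformer: subtracts the per-column mean learned in fit."""

    def fit(self, X, y=None):
        n = len(X)
        self.mean_ = [sum(col) / n for col in zip(*X)]
        return self

    def transform(self, X):
        return [[v - m for v, m in zip(row, self.mean_)] for row in X]


class CentroidClassifier:
    """Toy predictor: assigns each sample to the nearest class centroid."""

    def fit(self, X, y):
        groups = {}
        for row, label in zip(X, y):
            groups.setdefault(label, []).append(row)
        self.centroids_ = {
            label: [sum(col) / len(rows) for col in zip(*rows)]
            for label, rows in groups.items()
        }
        return self

    def predict(self, X):
        def dist2(a, b):
            return sum((p - q) ** 2 for p, q in zip(a, b))

        return [
            min(self.centroids_, key=lambda c: dist2(row, self.centroids_[c]))
            for row in X
        ]


class Chain:
    """Toy composition: transformers followed by a final predictor."""

    def __init__(self, steps):
        self.steps = steps

    def fit(self, X, y=None):
        for step in self.steps[:-1]:
            X = step.fit(X, y).transform(X)
        self.steps[-1].fit(X, y)
        return self

    def predict(self, X):
        for step in self.steps[:-1]:
            X = step.transform(X)
        return self.steps[-1].predict(X)


X_train = [[0.0, 0.0], [0.2, 0.1], [5.0, 5.0], [5.1, 4.9]]
y_train = ["a", "a", "b", "b"]
model = Chain([MeanCenterer(), CentroidClassifier()]).fit(X_train, y_train)
predictions = model.predict([[0.1, 0.0], [4.8, 5.2]])
```

Because every unit answers to the same small set of methods, `Chain` needs no knowledge of what its steps do, and the composed object is itself usable wherever a single estimator would be.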