
This paper aims at formulating the issue of ranking multivariate unlabeled
observations depending on their degree of abnormality as an unsupervised
statistical learning task. In the 1d situation, this problem is usually
tackled by means of tail estimation techniques: univariate observations are
viewed as all the more `abnormal' as they are located far in the tail(s) of the
underlying probability distribution. It would be desirable as well to dispose
of a scalar valued `scoring' function allowing for comparing the degree of
abnormality of multivariate observations. Here we formulate the issue of
scoring anomalies as a Mestimation problem by means of a novel functional
performance criterion, referred to as the Mass Volume curve (MV curve in
short), whose optimal elements are strictly increasing transforms of the
density almost everywhere on the support of the density. We first study the
statistical estimation of the MV curve of a given scoring function and we
provide a strategy to build confidence regions using a smoothed bootstrap
approach. Optimization of this functional criterion over the set of piecewise
constant scoring functions is next tackled. This boils down to estimating a
sequence of empirical minimum volume sets whose levels are chosen adaptively
from the data, so as to adjust to the variations of the optimal MV curve, while
controling the bias of its approximation by a stepwise curve. Generalization
bounds are then established for the difference in sup norm between the MV curve
of the empirical scoring function thus obtained and the optimal MV curve.

Originally motivated by default risk management applications, this paper
investigates a novel problem, referred to as the profitable bandit problem
here. At each step, an agent chooses a subset of the K possible actions. For
each action chosen, she then receives the sum of a random number of rewards.
Her objective is to maximize her cumulated earnings. We adapt and study three
wellknown strategies in this purpose, that were proved to be most efficient in
other settings: klUCB, BayesUCB and Thompson Sampling. For each of them, we
prove a finite time regret bound which, together with a lower bound we obtain
as well, establishes asymptotic optimality. Our goal is also to compare these
three strategies from a theoretical and empirical perspective both at the same
time. We give simple, selfcontained proofs that emphasize their similarities,
as well as their differences. While both Bayesian strategies are automatically
adapted to the geometry of information, the numerical experiments carried out
show a slight advantage for Thompson Sampling in practice.

We formulate a supervised learning problem, referred to as continuous
ranking, where a continuous realvalued label Y is assigned to an observable
r.v. X taking its values in a feature space $\mathcal{X}$ and the goal is to
order all possible observations x in $\mathcal{X}$ by means of a scoring
function $s:\mathcal{X}\rightarrow \mathbb{R}$ so that s(X) and Y tend to
increase or decrease together with highest probability. This problem
generalizes bi/multipartite ranking to a certain extent and the task of
finding optimal scoring functions s(x) can be naturally cast as optimization of
a dedicated functional criterion, called the IROC curve here, or as
maximization of the Kendall ${\tau}$ related to the pair (s(X), Y ). From the
theoretical side, we describe the optimal elements of this problem and provide
statistical guarantees for empirical Kendall ${\tau}$ maximization under
appropriate conditions for the class of scoring function candidates. We also
propose a recursive statistical learning algorithm tailored to empirical IROC
curve optimization and producing a piecewise constant scoring function that is
fully described by an oriented binary tree. Preliminary numerical experiments
highlight the difference in nature between regression and continuous ranking
and provide strong empirical evidence of the performance of empirical
optimizers of the criteria proposed.

This paper is devoted to the study of the max Karmed bandit problem, which
consists in sequentially allocating resources in order to detect extreme
values. Our contribution is twofold. We first significantly refine the analysis
of the ExtremeHunter algorithm carried out in Carpentier and Valko (2014), and
next propose an alternative approach, showing that, remarkably, Extreme Bandits
can be reduced to a classical version of the bandit problem to a certain
extent. Beyond the formal analysis, these two approaches are compared through
numerical experiments.

This paper is devoted to establishing exponential bounds for the
probabilities of deviation of a sample sum from its expectation, when the
variables involved in the summation are obtained by sampling in a finite
population according to a rejective scheme, generalizing sampling without
replacement, and by using an appropriate normalization. In contrast to Poisson
sampling, classical deviation inequalities in the i.i.d. setting do not
straightforwardly apply to sample sums related to rejective schemes, due to the
inherent dependence structure of the sampled points. We show here how to
overcome this difficulty, by combining the formulation of rejective sampling as
Poisson sampling conditioned upon the sample size with the Escher
transformation. In particular, the Bennett/Bernstein type bounds established
highlight the effect of the asymptotic variance of the (properly standardized)
sample weighted sum and are shown to be much more accurate than those based on
the negative association property shared by the terms involved in the
summation. Beyond its interest in itself, such a result for rejective sampling
is crucial, insofar as it can be extended to many other sampling schemes,
namely those that can be accurately approximated by rejective plans in the
sense of the total variation distance.

In decentralized networks (of sensors, connected objects, etc.), there is an
important need for efficient algorithms to optimize a global cost function, for
instance to learn a global model from the local data collected by each
computing unit. In this paper, we address the problem of decentralized
minimization of pairwise functions of the data points, where these points are
distributed over the nodes of a graph defining the communication topology of
the network. This general problem finds applications in ranking, distance
metric learning and graph inference, among others. We propose new gossip
algorithms based on dual averaging which aims at solving such problems both in
synchronous and asynchronous settings. The proposed framework is flexible
enough to deal with constrained and regularized variants of the optimization
problem. Our theoretical analysis reveals that the proposed algorithms preserve
the convergence rate of centralized dual averaging up to an additive bias term.
We present numerical simulations on Area Under the ROC Curve (AUC) maximization
and metric learning problems which illustrate the practical interest of our
approach.

In a wide range of statistical learning problems such as ranking, clustering
or metric learning among others, the risk is accurately estimated by
$U$statistics of degree $d\geq 1$, i.e. functionals of the training data with
low variance that take the form of averages over $k$tuples. From a
computational perspective, the calculation of such statistics is highly
expensive even for a moderate sample size $n$, as it requires averaging
$O(n^d)$ terms. This makes learning procedures relying on the optimization of
such data functionals hardly feasible in practice. It is the major goal of this
paper to show that, strikingly, such empirical risks can be replaced by
drastically computationally simpler MonteCarlo estimates based on $O(n)$ terms
only, usually referred to as incomplete $U$statistics, without damaging the
$O_{\mathbb{P}}(1/\sqrt{n})$ learning rate of Empirical Risk Minimization (ERM)
procedures. For this purpose, we establish uniform deviation results describing
the error made when approximating a $U$process by its incomplete version under
appropriate complexity assumptions. Extensions to model selection, fast rate
situations and various sampling techniques are also considered, as well as an
application to stochastic gradient descent for ERM. Finally, numerical examples
are displayed in order to provide strong empirical evidence that the approach
we promote largely surpasses more naive subsampling techniques.

Extremes play a special role in Anomaly Detection. Beyond inference and
simulation purposes, probabilistic tools borrowed from Extreme Value Theory
(EVT), such as the angular measure, can also be used to design novel
statistical learning methods for Anomaly Detection/ranking. This paper proposes
a new algorithm based on multivariate EVT to learn how to rank observations in
a high dimensional space with respect to their degree of 'abnormality'. The
procedure relies on an original dimensionreduction technique in the extreme
domain that possibly produces a sparse representation of multivariate extremes
and allows to gain insight into the dependence structure thereof, escaping the
curse of dimensionality. The representation output by the unsupervised
methodology we propose here can be combined with any Anomaly Detection
technique tailored to nonextreme data. As it performs linearly with the
dimension and almost linearly in the data (in O(dn log n)), it fits to large
scale problems. The approach in this paper is novel in that EVT has never been
used in its multivariate version in the field of Anomaly Detection.
Illustrative experimental results provide strong empirical evidence of the
relevance of our approach.

Capturing the dependence structure of multivariate extreme events is a major
concern in many fields involving the management of risks stemming from multiple
sources, e.g. portfolio monitoring, insurance, environmental risk management
and anomaly detection. One convenient (nonparametric) characterization of
extremal dependence in the framework of multivariate Extreme Value Theory (EVT)
is the angular measure, which provides direct information about the probable
'directions' of extremes, that is, the relative contribution of each
feature/coordinate of the 'largest' observations. Modeling the angular measure
in high dimensional problems is a major challenge for the multivariate analysis
of rare events. The present paper proposes a novel methodology aiming at
exhibiting a sparsity pattern within the dependence structure of extremes. This
is done by estimating the amount of mass spread by the angular measure on
representative sets of directions, corresponding to specific subcones of
$R^d\_+$. This dimension reduction technique paves the way towards scaling up
existing multivariate EVT methods. Beyond a nonasymptotic study providing a
theoretical validity framework for our method, we propose as a direct
application a first anomaly detection algorithm based on multivariate EVT.
This algorithm builds a sparse 'normal profile' of extreme behaviours, to be
confronted with new (possibly abnormal) extreme observations. Illustrative
experimental results provide strong empirical evidence of the relevance of our
approach.

Efficient and robust algorithms for decentralized estimation in networks are
essential to many distributed systems. Whereas distributed estimation of sample
mean statistics has been the subject of a good deal of attention, computation
of $U$statistics, relying on more expensive averaging over pairs of
observations, is a less investigated area. Yet, such data functionals are
essential to describe global properties of a statistical population, with
important examples including Area Under the Curve, empirical variance, Gini
mean difference and withincluster point scatter. This paper proposes new
synchronous and asynchronous randomized gossip algorithms which simultaneously
propagate data across the network and maintain local estimates of the
$U$statistic of interest. We establish convergence rate bounds of $O(1/t)$ and
$O(\log t / t)$ for the synchronous and asynchronous cases respectively, where
$t$ is the number of iterations, with explicit data and network dependent
terms. Beyond favorable comparisons in terms of rate analysis, numerical
experiments provide empirical evidence the proposed algorithms surpasses the
previously introduced approach.

In recommendation systems, one is interested in the ranking of the predicted
items as opposed to other losses such as the mean squared error. Although a
variety of ways to evaluate rankings exist in the literature, here we focus on
the Area Under the ROC Curve (AUC) as it widely used and has a strong
theoretical underpinning. In practical recommendation, only items at the top of
the ranked list are presented to the users. With this in mind, we propose a
class of objective functions over matrix factorisations which primarily
represent a smooth surrogate for the real AUC, and in a special case we show
how to prioritise the top of the list. The objectives are differentiable and
optimised through a carefully designed stochastic gradientdescentbased
algorithm which scales linearly with the size of the data. In the special case
of square loss we show how to improve computational complexity by leveraging
previously computed measures. To understand theoretically the underlying matrix
factorisation approaches we study both the consistency of the loss functions
with respect to AUC, and generalisation using Rademacher theory. The resulting
generalisation analysis gives strong motivation for the optimisation under
study. Finally, we provide computation results as to the efficacy of the
proposed method using synthetic and real data.

Assessing the probability of occurrence of extreme events is a crucial issue
in various fields like finance, insurance, telecommunication or environmental
sciences. In a multivariate framework, the tail dependence is characterized by
the socalled stable tail dependence function (STDF). Learning this structure
is the keystone of multivariate extremes. Although extensive studies have
proved consistency and asymptotic normality for the empirical version of the
STDF, nonasymptotic bounds are still missing. The main purpose of this paper
is to fill this gap. Taking advantage of adapted VCtype concentration
inequalities, upper bounds are derived with expected rate of convergence in
O(k^1/2). The concentration tools involved in this analysis rely on a more
general study of maximal deviations in low probability regions, and thus
directly apply to the classification of extreme data.

The Cuban contacttracing detection system set up in 1986 allowed the
reconstruction and analysis of the sexual network underlying the epidemic
(5,389 vertices and 4,073 edges, giant component of 2,386 nodes and 3,168
edges), shedding light onto the spread of HIV and the role of contacttracing.
Clustering based on modularity optimization provides a better visualization and
understanding of the network, in combination with the study of covariates. The
graph has a globally low but heterogeneous density, with clusters of high
intraconnectivity but low interconnectivity. Though descriptive, our results
pave the way for incorporating structure when studying stochastic SIR epidemics
spreading on social networks.

Learning how to rank multivariate unlabeled observations depending on their
degree of abnormality/novelty is a crucial problem in a wide range of
applications. In practice, it generally consists in building a real valued
"scoring" function on the feature space so as to quantify to which extent
observations should be considered as abnormal. In the 1d situation,
measurements are generally considered as "abnormal" when they are remote from
central measures such as the mean or the median. Anomaly detection then relies
on tail analysis of the variable of interest. Extensions to the multivariate
setting are far from straightforward and it is precisely the main purpose of
this paper to introduce a novel and convenient (functional) criterion for
measuring the performance of a scoring function regarding the anomaly ranking
task, referred to as the ExcessMass curve (EM curve). In addition, an adaptive
algorithm for building a scoring function based on unlabeled data X1 , . . . ,
Xn with a nearly optimal EM is proposed and is analyzed from a statistical
perspective.

In certain situations that shall be undoubtedly more and more common in the
Big Data era, the datasets available are so massive that computing statistics
over the full sample is hardly feasible, if not unfeasible. A natural approach
in this context consists in using survey schemes and substituting the "full
data" statistics with their counterparts based on the resulting random samples,
of manageable size. It is the main purpose of this paper to investigate the
impact of survey sampling with unequal inclusion probabilities on stochastic
gradient descentbased Mestimation methods in largescale statistical and
machinelearning problems. Precisely, we prove that, in presence of some a
priori information, one may significantly increase asymptotic accuracy when
choosing appropriate first order inclusion probabilities, without affecting
complexity. These striking results are described here by limit theorems and are
also illustrated by numerical experiments.

This article is devoted to the study of tail index estimation based on i.i.d.
multivariate observations, drawn from a standard heavytailed distribution,
i.e. of which 1d Paretolike marginals share the same tail index. A
multivariate Central Limit Theorem for a random vector, whose components
correspond to (possibly dependent) Hill estimators of the common shape index
alpha, is established under mild conditions. Motivated by the statistical
analysis of extremal spatial data in particular, we introduce the concept of
(standard) heavytailed random field of tail index alpha and show how this
limit result can be used in order to build an estimator of alpha with small
asymptotic mean squared error, through a proper convex linear combination of
the coordinates. Beyond asymptotic results, simulation experiments illustrating
the relevance of the approach promoted are also presented.

Incomplete rankings on a set of items $\{1,\; \ldots,\; n\}$ are orderings of
the form $a_{1}\prec\dots\prec a_{k}$, with $\{a_{1},\dots
a_{k}\}\subset\{1,\dots,n\}$ and $k < n$. Though they arise in many modern
applications, only a few methods have been introduced to manipulate them, most
of them consisting in representing any incomplete ranking by the set of all its
possible linear extensions on $\{1,\; \ldots,\; n\}$. It is the major purpose
of this paper to introduce a completely novel approach, which allows to treat
incomplete rankings directly, representing them as injective words over $\{1,\;
\ldots,\; n\}$. Unexpectedly, operations on incomplete rankings have very
simple equivalents in this setting and the topological structure of the complex
of injective words can be interpretated in a simple fashion from the
perspective of ranking. We exploit this connection here and use recent results
from algebraic topology to construct a multiresolution analysis and develop a
wavelet framework for incomplete rankings. Though purely combinatorial, this
construction relies on the same ideas underlying multiresolution analysis on a
Euclidean space, and permits to localize the information related to rankings on
each subset of items. It can be viewed as a crucial step toward nonlinear
approximation of distributions of incomplete rankings and paves the way for
many statistical applications, including preference data analysis and the
design of recommender systems.

Partitioning a graph into groups of vertices such that those within each
group are more densely connected than vertices assigned to different groups,
known as graph clustering, is often used to gain insight into the organisation
of large scale networks and for visualisation purposes. Whereas a large number
of dedicated techniques have been recently proposed for static graphs, the
design of online graph clustering methods tailored for evolving networks is a
challenging problem, and much less documented in the literature. Motivated by
the broad variety of applications concerned, ranging from the study of
biological networks to the analysis of networks of scientific references
through the exploration of communications networks such as the World Wide Web,
it is the main purpose of this paper to introduce a novel, computationally
efficient, approach to graph clustering in the evolutionary context. Namely,
the method promoted in this article can be viewed as an incremental eigenvalue
solution for the spectral clustering method described by Ng. et al. (2001). The
incremental eigenvalue solution is a general technique for finding the
approximate eigenvectors of a symmetric matrix given a change. As well as
outlining the approach in detail, we present a theoretical bound on the quality
of the approximate eigenvectors using perturbation theory. We then derive a
novel spectral clustering algorithm called Incremental Approximate Spectral
Clustering (IASC). The IASC algorithm is simple to implement and its efficacy
is demonstrated on both synthetic and real datasets modelling the evolution of
a HIV epidemic, a citation network and the purchase history graph of an
ecommerce website.

It is the main goal of this paper to propose a novel method to perform matrix
completion online. Motivated by a wide variety of applications, ranging from
the design of recommender systems to sensor network localization through
seismic data reconstruction, we consider the matrix completion problem when
entries of the matrix of interest are observed gradually. Precisely, we place
ourselves in the situation where the predictive rule should be refined
incrementally, rather than recomputed from scratch each time the sample of
observed entries increases. The extension of existing matrix completion methods
to the sequential prediction context is indeed a major issue in the Big Data
era, and yet little addressed in the literature. The algorithm promoted in this
article builds upon the Soft Impute approach introduced in Mazumder et al.
(2010). The major novelty essentially arises from the use of a randomised
technique for both computing and updating the Singular Value Decomposition
(SVD) involved in the algorithm. Though of disarming simplicity, the method
proposed turns out to be very efficient, while requiring reduced computations.
Several numerical experiments based on real datasets illustrating its
performance are displayed, together with preliminary results giving it a
theoretical basis.

It is the main goal of this article to address the bipartite ranking issue
from the perspective of functional data analysis (FDA). Given a training set of
independent realizations of a (possibly sampled) secondorder random function
with a (locally) smooth autocorrelation structure and to which a binary label
is randomly assigned, the objective is to learn a scoring function s with
optimal ROC curve. Based on linear/nonlinear waveletbased approximations, it
is shown how to select compact finite dimensional representations of the input
curves adaptively, in order to build accurate ranking rules, using recent
advances in the ranking problem for multivariate data with binary feedback.
Beyond theoretical considerations, the performance of the learning methods for
functional bipartite ranking proposed in this paper are illustrated by
numerical experiments.

This article focuses, in the context of epidemic models, on rare events that
may possibly correspond to crisis situations from the perspective of Public
Health. In general, no close analytic form for their occurrence probabilities
is available and crude MonteCarlo procedures fail. We show how recent
intensive computer simulation techniques, such as interacting branching
particle methods, can be used for estimation purposes, as well as for
generating model paths that correspond to realizations of such events.
Applications of these simulationbased methods to several epidemic models are
also considered and discussed thoroughly.

Model selection is a crucial issue in machinelearning and a wide variety of
penalisation methods (with possibly data dependent complexity penalties) have
recently been introduced for this purpose. However their empirical performance
is generally not well documented in the literature. It is the goal of this
paper to investigate to which extent such recent techniques can be successfully
used for the tuning of both the regularisation and kernel parameters in support
vector regression (SVR) and the complexity measure in regression trees (CART).
This task is traditionally solved via Vfold crossvalidation (VFCV), which
gives efficient results for a reasonable computational cost. A disadvantage
however of VFCV is that the procedure is known to provide an asymptotically
suboptimal risk estimate as the number of examples tends to infinity. Recently,
a penalisation procedure called Vfold penalisation has been proposed to
improve on VFCV, supported by theoretical arguments. Here we report on an
extensive set of experiments comparing Vfold penalisation and VFCV for
SVR/CART calibration on several benchmark datasets. We highlight cases in which
VFCV and Vfold penalisation provide poor estimates of the risk respectively
and introduce a modified penalisation technique to reduce the estimation error.

In this paper, we investigate, how information about a common food born
health hazard, known as Campylobacter, spreads once it was delivered to a
random sample of individuals in France. The central question addressed here is
how individual characteristics and the various aspects of social network
influence the spread of information. A key claim of our paper is that
information diffusion processes occur in a patterned network of social ties of
heterogeneous actors. Our percolation models show that the characteristics of
the recipients of the information matter as much if not more than the
characteristics of the sender of the information in deciding whether the
information will be transmitted through a particular tie. We also found that at
least for this particular advisory, it is not the perceived need of the
recipients for the information that matters but their general interest in the
topic.

This paper describes a graph visualization methodology based on hierarchical
maximal modularity clustering, with interactive and significant coarsening and
refining possibilities. An application of this method to HIV epidemic analysis
in Cuba is outlined.

We show how an interactive graph visualization method based on maximal
modularity clustering can be used to explore a large epidemic network. The
visual representation is used to display statistical tests results that expose
the relations between the propagation of HIV in a sexual contact network and
the sexual orientation of the patients.