A well-know drawback of l_1-penalized estimators is the systematic shrinkage
of the large coefficients towards zero. A simple remedy is to treat Lasso as a
model-selection procedure and to perform a second refitting step on the
selected support. In this work we formalize the notion of refitting and provide
oracle bounds for arbitrary refitting procedures of the Lasso solution. One of
the most widely used refitting techniques which is based on Least-Squares may
bring a problem of interpretability, since the signs of the refitted estimator
might be flipped with respect to the original estimator. This problem arises
from the fact that the Least-Squares refitting considers only the support of
the Lasso solution, avoiding any information about signs or amplitudes. To this
end we define a sign consistent refitting as an arbitrary refitting procedure,
preserving the signs of the first step Lasso solution and provide Oracle
inequalities for such estimators. Finally, we consider special refitting
strategies: Bregman Lasso and Boosted Lasso. Bregman Lasso has a fruitful
property to converge to the Sign-Least-Squares refitting (Least-Squares with
sign constraints), which provides with greater interpretability. We
additionally study the Bregman Lasso refitting in the case of orthogonal
design, providing with simple intuition behind the proposed method. Boosted
Lasso, in contrast, considers information about magnitudes of the first Lasso
step and allows to develop better oracle rates for prediction. Finally, we
conduct an extensive numerical study to show advantages of one approach over
others in different synthetic and semi-real scenarios.
The multi-label classification framework, where each observation can be
associated with a set of labels, has generated a tremendous amount of attention
over recent years. The modern multi-label problems are typically large-scale in
terms of number of observations, features and labels, and the amount of labels
can even be comparable with the amount of observations. In this context,
different remedies have been proposed to overcome the curse of dimensionality.
In this work, we aim at exploiting the output sparsity by introducing a new
loss, called the sparse weighted Hamming loss. This proposed loss can be seen
as a weighted version of classical ones, where active and inactive labels are
weighted separately. Leveraging the influence of sparsity in the loss function,
we provide improved generalization bounds for the empirical risk minimizer, a
suitable property for large-scale problems. For this new loss, we derive rates
of convergence linear in the underlying output-sparsity rather than linear in
the number of labels. In practice, minimizing the associated risk can be
performed efficiently by using convex surrogates and modern convex optimization
algorithms. We provide experiments on various real-world datasets demonstrating
the pertinence of our approach when compared to non-weighted techniques.