
A well-known drawback of $\ell_1$-penalized estimators is the systematic shrinkage
of the large coefficients towards zero. A simple remedy is to treat the Lasso as a
model-selection procedure and to perform a second refitting step on the
selected support. In this work we formalize the notion of refitting and provide
oracle bounds for arbitrary refitting procedures of the Lasso solution. One of
the most widely used refitting techniques, which is based on least squares, may
cause a problem of interpretability, since the signs of the refitted estimator
might be flipped with respect to the original estimator. This problem arises
because least-squares refitting considers only the support of
the Lasso solution and ignores any information about signs or amplitudes. To
address this, we define a sign-consistent refitting as an arbitrary refitting
procedure that preserves the signs of the first-step Lasso solution, and we
provide oracle inequalities for such estimators. We then consider special refitting
strategies: Bregman Lasso and Boosted Lasso. Bregman Lasso has the useful
property of converging to the Sign-Least-Squares refitting (least squares with
sign constraints), which provides greater interpretability. We
additionally study the Bregman Lasso refitting in the case of orthogonal
design, which provides simple intuition behind the proposed method. Boosted
Lasso, in contrast, uses information about the magnitudes of the first Lasso
step and allows us to derive better oracle rates for prediction. Finally, we
conduct an extensive numerical study to show the advantages of one approach over
the others in different synthetic and semi-real scenarios.
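
The shrinkage that refitting is meant to remove is easiest to see in the orthogonal-design case. The sketch below is illustrative only and is not the Bregman Lasso or the Boosted Lasso. It assumes an orthonormal design ($X^\top X/n = I$) and the objective $\frac{1}{2n}\|y - X\beta\|_2^2 + \lambda\|\beta\|_1$, under which the Lasso reduces to soft-thresholding of $z = X^\top y/n$ and least-squares refitting on the support simply returns $z_j$ on the selected coordinates. The function names and the numbers are hypothetical.

```python
def soft_threshold(z, lam):
    """Lasso coefficient for one coordinate under an orthonormal design (assumption)."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def lasso_orthogonal(z, lam):
    """Coordinate-wise soft-thresholding of z = X^T y / n."""
    return [soft_threshold(zj, lam) for zj in z]


def ls_refit_on_support(z, lasso_coefs):
    """Least squares restricted to the Lasso support; under an orthonormal
    design this returns z_j on the support and 0 elsewhere."""
    return [zj if bj != 0.0 else 0.0 for zj, bj in zip(z, lasso_coefs)]


# Hypothetical values of z = X^T y / n and of the tuning parameter.
z = [3.0, -2.0, 0.4, -0.1]
lam = 0.5

lasso = lasso_orthogonal(z, lam)        # large coefficients are shrunk by lam
refit = ls_refit_on_support(z, lasso)   # shrinkage removed on the support

print(lasso)  # [2.5, -1.5, 0.0, 0.0]
print(refit)  # [3.0, -2.0, 0.0, 0.0]
```

In this special case the refitted coefficients always keep the signs of the Lasso solution, so sign flips can only occur when the design is not orthogonal.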

The multi-label classification framework, where each observation can be
associated with a set of labels, has attracted a tremendous amount of attention
over recent years. Modern multi-label problems are typically large-scale in
terms of the number of observations, features and labels, and the number of labels
can even be comparable to the number of observations. In this context,
different remedies have been proposed to overcome the curse of dimensionality.
In this work, we aim at exploiting the output sparsity by introducing a new
loss, called the sparse weighted Hamming loss. This proposed loss can be seen
as a weighted version of classical ones, where active and inactive labels are
weighted separately. Leveraging the influence of sparsity in the loss function,
we provide improved generalization bounds for the empirical risk minimizer, a
suitable property for large-scale problems. For this new loss, we derive rates
of convergence linear in the underlying output sparsity rather than linear in
the number of labels. In practice, minimizing the associated risk can be
performed efficiently by using convex surrogates and modern convex optimization
algorithms. We provide experiments on various real-world datasets demonstrating
the pertinence of our approach when compared to non-weighted techniques.
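
To make the idea of weighting active and inactive labels separately concrete, here is a minimal sketch of a weighted Hamming-type loss. The exact definition and the choice of weights in the sparse weighted Hamming loss are not given above, so the function `weighted_hamming_loss` and its parameters `w_active` and `w_inactive` are assumptions made purely for illustration.

```python
def weighted_hamming_loss(y_true, y_pred, w_active=1.0, w_inactive=1.0):
    """Hamming-type loss on binary label vectors with separate weights
    (illustrative assumption, not the paper's exact loss)."""
    if len(y_true) != len(y_pred):
        raise ValueError("label vectors must have the same length")
    loss = 0.0
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == 0:
            loss += w_active      # an active label was missed
        elif t == 0 and p == 1:
            loss += w_inactive    # an inactive label was predicted
    return loss


# Hypothetical sparse label vector: 2 active labels out of 6.
y_true = [1, 0, 0, 0, 1, 0]
y_pred = [0, 0, 1, 0, 1, 0]

print(weighted_hamming_loss(y_true, y_pred))                                # 2.0
print(weighted_hamming_loss(y_true, y_pred, w_active=2.0, w_inactive=0.5))  # 2.5
```

With equal unit weights the function counts mismatches, which is the classical (unnormalized) Hamming loss; unequal weights let errors on the few active labels count differently from errors on the many inactive ones.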

Although the Lasso has been extensively studied, the relationship between its
prediction performance and the correlations of the covariates is not fully
understood. In this paper, we give new insights into this relationship in the
context of multiple linear regression. We show, in particular, that the
incorporation of a simple correlation measure into the tuning parameter can
lead to a nearly optimal prediction performance of the Lasso even for highly
correlated covariates. However, we also reveal that for moderately correlated
covariates, the prediction performance of the Lasso can be mediocre
irrespective of the choice of the tuning parameter. We finally show that our
results also lead to near-optimal rates for the least-squares estimator with
total variation penalty.

We study how correlations in the design matrix influence Lasso prediction.
First, we argue that the higher the correlations are, the smaller the optimal
tuning parameter is. This implies in particular that the standard tuning
parameters, which do not depend on the design matrix, are not favorable.
Furthermore, we argue that Lasso prediction works well for any degree of
correlations if suitable tuning parameters are chosen. We study these two
subjects theoretically as well as with simulations.

We consider a linear regression problem in a high-dimensional setting where
the number of covariates $p$ can be much larger than the sample size $n$. In
such a situation, one often assumes sparsity of the regression vector,
\textit{i.e.}, the regression vector contains many zero components. We propose a
Lasso-type estimator $\hat{\beta}^{Quad}$ (where '$Quad$' stands for quadratic)
which is based on two penalty terms. The first one is the $\ell_1$ norm of the
regression coefficients used to exploit the sparsity of the regression as done
by the Lasso estimator, whereas the second is a quadratic penalty term
introduced to capture some additional information on the setting of the
problem. We detail two special cases: the Elastic-Net $\hat{\beta}^{EN}$, which
deals with sparse problems where correlations between variables may exist; and
the Smooth-Lasso $\hat{\beta}^{SL}$, which addresses sparse problems where
successive regression coefficients are known to vary slowly (in some
situations, this can also be interpreted in terms of correlations between
successive variables). From a theoretical point of view, we establish variable
selection consistency results and show that $\hat{\beta}^{Quad}$ achieves a
Sparsity Inequality, \textit{i.e.}, a bound in terms of the number of nonzero
components of the 'true' regression vector. These results are provided under a
weaker assumption on the Gram matrix than the one used by the Lasso. In some
situations this guarantees a significant improvement over the Lasso.
Furthermore, a simulation study is conducted and shows that the S-Lasso
$\hat{\beta}^{SL}$ performs better than known methods such as the Lasso, the
Elastic-Net $\hat{\beta}^{EN}$, and the Fused-Lasso with respect to
estimation accuracy. This is especially the case when the regression vector is
'smooth', \textit{i.e.}, when the variations between successive coefficients of
the unknown parameter of the regression are small. The study also reveals that
the theoretical calibration of the tuning parameters and the one based on 10-fold
cross-validation yield two S-Lasso solutions with close performance.
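
The difference between the two special cases can be illustrated through their quadratic penalty terms. The exact quadratic forms are not stated above, so the sketch below makes two assumptions: the Elastic-Net term is the squared $\ell_2$ norm $\sum_j \beta_j^2$, and the Smooth-Lasso term is the sum of squared successive differences $\sum_{j\ge 2} (\beta_j - \beta_{j-1})^2$, chosen to match the description of slowly varying successive coefficients. All function names are hypothetical.

```python
def l1_penalty(beta):
    """The l1 norm shared by the Lasso-type estimators."""
    return sum(abs(b) for b in beta)


def elastic_net_quadratic(beta):
    """Assumed Elastic-Net quadratic term: squared l2 norm."""
    return sum(b * b for b in beta)


def smooth_lasso_quadratic(beta):
    """Assumed Smooth-Lasso quadratic term: squared successive differences."""
    return sum((beta[j] - beta[j - 1]) ** 2 for j in range(1, len(beta)))


# Two hypothetical vectors with identical l1 and l2 norms.
smooth = [1.0, 1.0, 1.0, 1.0]
rough = [1.0, -1.0, 1.0, -1.0]

for name, beta in (("smooth", smooth), ("rough", rough)):
    print(name, l1_penalty(beta), elastic_net_quadratic(beta),
          smooth_lasso_quadratic(beta))
# smooth 4.0 4.0 0.0
# rough 4.0 4.0 12.0
```

Under these assumptions the $\ell_1$ and Elastic-Net terms cannot tell the two vectors apart, while the successive-difference term penalizes only the rapidly varying one.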

Conformal predictors, introduced by Vovk et al. (2005), serve to build
prediction intervals by exploiting a notion of conformity of the new data point
with previously observed data. In the present paper, we propose a novel method
for constructing prediction intervals for the response variable in multivariate
linear models. The main emphasis is on sparse linear models, where only a few of
the covariates have a significant influence on the response variable even if
their number is very large. Our approach is based on combining the principle of
conformal prediction with the $\ell_1$-penalized least squares estimator
(LASSO). The resulting confidence set depends on a parameter $\epsilon>0$ and
has a coverage probability larger than or equal to $1-\epsilon$. The numerical
experiments reported in the paper show that the length of the confidence set is
small. Furthermore, as a by-product of the proposed approach, we provide a
data-driven procedure for choosing the LASSO penalty. The selection power of
the method is illustrated on simulated data.
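
As a rough illustration of how conformity scores turn into an interval with coverage at least $1-\epsilon$, the sketch below uses split conformal prediction. This is a simpler, well-known variant and not the construction proposed above. The predictor `predict` is a fixed hypothetical function standing in for a fitted Lasso model, and the calibration data are made up.

```python
import math


def split_conformal_interval(predict, calib_x, calib_y, x_new, eps):
    """Split conformal interval around predict(x_new), using absolute
    residuals on a held-out calibration set as conformity scores."""
    scores = sorted(abs(y - predict(x)) for x, y in zip(calib_x, calib_y))
    n = len(scores)
    # Rank of the calibration score used as the interval half-width.
    k = math.ceil((n + 1) * (1 - eps))
    if k > n:
        # Too few calibration points for this eps: only the trivial set is valid.
        return (-math.inf, math.inf)
    q = scores[k - 1]
    center = predict(x_new)
    return (center - q, center + q)


def predict(x):
    """Hypothetical stand-in for a fitted sparse linear model."""
    return 2.0 * x


# Made-up calibration data; absolute residuals are
# 0.5, 1.0, 0.25, 2.0, 0.75, 1.5, 0.125.
calib_x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
calib_y = [2.5, 3.0, 6.25, 10.0, 9.25, 13.5, 13.875]

interval = split_conformal_interval(predict, calib_x, calib_y, 10.0, eps=0.25)
print(interval)  # (18.5, 21.5)
```

When the calibration points and the new point are exchangeable, an interval built this way covers the new response with probability at least $1-\epsilon$, whatever the quality of the underlying predictor; a poor predictor only makes the interval wider.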

We consider the linear regression problem. We propose the S-Lasso procedure
to estimate the unknown regression parameters. This estimator enjoys sparsity
of the representation while taking into account correlation between successive
covariates (or predictors). The study covers the case when $p\gg n$, i.e., the
number of covariates is much larger than the number of observations. From a
theoretical point of view, for fixed $p$, we establish asymptotic normality and
variable selection consistency results for our procedure. When $p\geq n$, we
provide variable selection consistency results and show that the S-Lasso
achieves a Sparsity Inequality, i.e., a bound in terms of the number of nonzero
components of the oracle vector. It appears that the S-Lasso has good variable
selection properties compared to its competitors. Furthermore, we provide an
estimator of the effective degrees of freedom of the S-Lasso estimator. A
simulation study shows that the S-Lasso performs better than the Lasso as far
as variable selection is concerned, especially when high correlations between
successive covariates exist. This procedure also appears to be a good
challenger to the Elastic-Net (Zou and Hastie, 2005).