
We consider a linear regression problem in a high-dimensional setting where
the number of covariates $p$ can be much larger than the sample size $n$. In
such a situation, one often assumes sparsity of the regression vector, \textit{i.e.},
the regression vector contains many zero components. We propose a
Lasso-type estimator $\hat{\beta}^{Quad}$ (where '$Quad$' stands for quadratic)
which is based on two penalty terms. The first one is the $\ell_1$ norm of the
regression coefficients used to exploit the sparsity of the regression as done
by the Lasso estimator, whereas the second is a quadratic penalty term
introduced to capture some additional information on the setting of the
problem. We detail two special cases: the Elastic-Net $\hat{\beta}^{EN}$, which
deals with sparse problems where correlations between variables may exist, and
the Smooth-Lasso $\hat{\beta}^{SL}$, which is designed for sparse problems where
successive regression coefficients are known to vary slowly (in some
situations, this can also be interpreted in terms of correlations between
successive variables). From a theoretical point of view, we establish variable
selection consistency results and show that $\hat{\beta}^{Quad}$ achieves a
Sparsity Inequality, \textit{i.e.}, a bound in terms of the number of nonzero
components of the 'true' regression vector. These results are provided under a
weaker assumption on the Gram matrix than the one used by the Lasso. In some
situations this guarantees a significant improvement over the Lasso.
Furthermore, a simulation study shows that the S-Lasso
$\hat{\beta}^{SL}$ performs better than known methods such as the Lasso, the
Elastic-Net $\hat{\beta}^{EN}$, and the Fused-Lasso with respect to
estimation accuracy. This is especially the case when the regression vector is
'smooth', \textit{i.e.}, when the variations between successive coefficients of
the unknown regression parameter are small. The study also reveals that
the theoretical calibration of the tuning parameters and the one based on 10-fold
cross-validation yield two S-Lasso solutions with similar performance.
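As a rough illustration, the sketch below assumes the criterion takes the form $\|Y - X\beta\|_n^2 + \lambda\|\beta\|_1 + \mu\,\beta^T M \beta$, with $\|\cdot\|_n^2 = \frac{1}{n}\|\cdot\|_2^2$ and $M$ symmetric positive semi-definite. Under that assumption, $M = I$ gives an Elastic-Net-type penalty. $M = D^T D$, with $D$ the first-difference matrix, gives the Smooth-Lasso-type penalty $\sum_{j=2}^p (\beta_j - \beta_{j-1})^2$. The solver is plain cyclic coordinate descent with soft-thresholding, chosen here only for brevity; it is not presented as the authors' algorithm. The names `quad_lasso` and `difference_penalty` are hypothetical, and the tuning values in the usage lines are arbitrary rather than the paper's calibration.

```python
import numpy as np


def soft_threshold(z, t):
    """Scalar soft-thresholding: sign(z) * max(|z| - t, 0)."""
    return np.sign(z) * max(abs(z) - t, 0.0)


def difference_penalty(p):
    """Return M = D^T D, so that beta^T M beta = sum_{j>=2} (beta_j - beta_{j-1})^2."""
    D = np.diff(np.eye(p), axis=0)  # rows e_{j+1} - e_j, shape (p-1, p)
    return D.T @ D


def quad_lasso(X, y, lam, mu, M, n_iter=5000, tol=1e-12):
    """Hypothetical cyclic coordinate descent for
        (1/n) * ||y - X beta||^2 + lam * ||beta||_1 + mu * beta^T M beta,
    where M is symmetric positive semi-definite (an assumed form of the criterion).
    Assumes no column of X is identically zero, so that a > 0 below."""
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n          # (1/n) * ||X_j||^2
    resid = np.asarray(y, dtype=float).copy()  # y - X beta, with beta = 0
    for _ in range(n_iter):
        max_change = 0.0
        for j in range(p):
            old = beta[j]
            # In beta_j alone the criterion is a * beta_j^2 - 2 * b * beta_j + lam * |beta_j|,
            # with b = (1/n) X_j^T (partial residual) - mu * sum_{k != j} M_jk beta_k.
            b = X[:, j] @ resid / n + col_sq[j] * old - mu * (M[j] @ beta - M[j, j] * old)
            a = col_sq[j] + mu * M[j, j]
            new = soft_threshold(b, lam / 2.0) / a
            if new != old:
                resid -= X[:, j] * (new - old)  # keep resid = y - X beta up to date
                beta[j] = new
                max_change = max(max_change, abs(new - old))
        if max_change < tol:
            break
    return beta


# Usage on synthetic data with a piecewise-constant ("smooth") true vector.
rng = np.random.default_rng(0)
n, p = 50, 10
X = rng.standard_normal((n, p))
beta_true = np.array([0, 0, 1, 1, 1, 1, 0, 0, 0, 0], dtype=float)
y = X @ beta_true + 0.1 * rng.standard_normal(n)
beta_sl = quad_lasso(X, y, lam=0.1, mu=0.5, M=difference_penalty(p))  # Smooth-Lasso-type
beta_en = quad_lasso(X, y, lam=0.1, mu=0.5, M=np.eye(p))              # Elastic-Net-type
```

Both special cases share one solver because only $M$ changes. The difference penalty couples neighbouring coefficients, which is what pulls successive estimates towards each other when the true vector is smooth.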

Oracle inequalities and variable selection properties for the Lasso in linear
models have been established under a variety of different assumptions on the
design matrix. We show in this paper how the different conditions and concepts
relate to each other. The restricted eigenvalue condition (Bickel et al., 2009)
or the slightly weaker compatibility condition (van de Geer, 2007) are
sufficient for oracle results. We argue that both these conditions allow for a
fairly general class of design matrices. Hence, optimality of the Lasso for
prediction and estimation holds in more general situations than coherence
(Bunea et al., 2007b,c) or restricted isometry (Candès and Tao, 2005)
assumptions would suggest.
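The compatibility condition is the weaker of the two, and this can be seen by writing both for a fixed active set $S$ with $|S| = s$ and $\hat\Sigma = X^T X / n$. The normalization below, including the cone constant 3, is an assumption made for this sketch; the cited papers state the conditions in their own forms. Both conditions are required for all $\delta$ in the cone $\|\delta_{S^c}\|_1 \le 3\|\delta_S\|_1$:
\[
\text{RE:}\quad \|\delta_S\|_2^2 \le \frac{\delta^T \hat\Sigma \delta}{\kappa^2},
\qquad
\text{compatibility:}\quad \|\delta_S\|_1^2 \le \frac{s\,\delta^T \hat\Sigma \delta}{\phi_0^2}.
\]
If the restricted eigenvalue condition holds with $\kappa > 0$, the Cauchy-Schwarz inequality gives, for every $\delta$ in the cone,
\[
\|\delta_S\|_1^2 \;\le\; s\,\|\delta_S\|_2^2 \;\le\; \frac{s\,\delta^T \hat\Sigma \delta}{\kappa^2},
\]
so the compatibility condition holds with $\phi_0 = \kappa$. The reverse implication does not follow from this argument, because the first inequality can be far from tight.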

We consider high-dimensional generalized linear models with Lipschitz loss
functions, and prove a non-asymptotic oracle inequality for the empirical risk
minimizer with Lasso penalty. The penalty is based on the coefficients in the
linear predictor, after normalization with the empirical norm. The examples
include logistic regression, density estimation and classification with hinge
loss. Least squares regression is also discussed.
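For the logistic-regression example, one possible reading of "normalization with the empirical norm" is a weighted $\ell_1$ penalty $\lambda \sum_j \hat\sigma_j |\beta_j|$ with $\hat\sigma_j^2 = \frac{1}{n}\sum_i x_{ij}^2$. The sketch below makes three assumptions: this reading of the penalty, labels $y_i \in \{-1, +1\}$, and no intercept. It minimizes the empirical logistic risk plus that penalty by proximal gradient descent (ISTA), a generic solver picked for illustration. The name `weighted_l1_logistic` is hypothetical. The logistic loss $\log(1 + e^{-yf})$ is 1-Lipschitz in $f$, which is the kind of property the abstract relies on. It is also smooth, which makes plain gradient steps possible; the hinge loss would need a different solver.

```python
import numpy as np


def sigmoid(z):
    """Logistic function 1 / (1 + exp(-z)), written via tanh to avoid overflow."""
    return 0.5 * (1.0 + np.tanh(0.5 * z))


def weighted_l1_logistic(X, y, lam, n_iter=20000, tol=1e-12):
    """Hypothetical ISTA (proximal gradient) solver for
        (1/n) * sum_i log(1 + exp(-y_i * x_i^T beta)) + lam * sum_j w_j * |beta_j|,
    with labels y_i in {-1, +1}, no intercept, and w_j = sqrt((1/n) * sum_i x_ij^2)."""
    n, p = X.shape
    w = np.sqrt((X ** 2).mean(axis=0))  # empirical norms of the columns
    # The logistic curvature is at most 1/4, so ||X||_2^2 / (4n) bounds the
    # Lipschitz constant of the gradient and 1/L is a safe step size.
    L = np.linalg.norm(X, 2) ** 2 / (4.0 * n)
    step = 1.0 / L
    beta = np.zeros(p)
    for _ in range(n_iter):
        margins = y * (X @ beta)
        grad = -(X.T @ (y * sigmoid(-margins))) / n
        z = beta - step * grad
        # Proximal map of the weighted l1 penalty: coordinate-wise soft-thresholding.
        new = np.sign(z) * np.maximum(np.abs(z) - step * lam * w, 0.0)
        done = np.max(np.abs(new - beta)) < tol
        beta = new
        if done:
            break
    return beta


# Usage on synthetic data; columns on different scales make the weights matter.
rng = np.random.default_rng(0)
n, p = 200, 5
X = rng.standard_normal((n, p)) * np.array([1.0, 2.0, 0.5, 1.0, 1.5])
y = np.where(rng.random(n) < sigmoid(X @ np.array([1.5, 0.0, -2.0, 0.0, 0.0])), 1.0, -1.0)
beta_hat = weighted_l1_logistic(X, y, lam=0.02)
```

Scaling each coefficient's penalty by its column's empirical norm means that rescaling a covariate does not change which variables are selected.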

We study a high-dimensional generalized linear model and penalized empirical
risk minimization with $\ell_1$ penalty. Our aim is to provide a nontrivial
illustration that non-asymptotic bounds for the estimator can be obtained
without relying on the chaining technique and/or the peeling device.