• ### The Smooth-Lasso and other $\ell_1+\ell_2$-penalized methods(1003.4885)

Oct. 7, 2011 math.ST, stat.TH
We consider a linear regression problem in a high dimensional setting where the number of covariates $p$ can be much larger than the sample size $n$. In such a situation, one often assumes sparsity of the regression vector, \textit i.e., the regression vector contains many zero components. We propose a Lasso-type estimator $\hat{\beta}^{Quad}$ (where '$Quad$' stands for quadratic) which is based on two penalty terms. The first one is the $\ell_1$ norm of the regression coefficients used to exploit the sparsity of the regression as done by the Lasso estimator, whereas the second is a quadratic penalty term introduced to capture some additional information on the setting of the problem. We detail two special cases: the Elastic-Net $\hat{\beta}^{EN}$, which deals with sparse problems where correlations between variables may exist; and the Smooth-Lasso $\hat{\beta}^{SL}$, which responds to sparse problems where successive regression coefficients are known to vary slowly (in some situations, this can also be interpreted in terms of correlations between successive variables). From a theoretical point of view, we establish variable selection consistency results and show that $\hat{\beta}^{Quad}$ achieves a Sparsity Inequality, \textit i.e., a bound in terms of the number of non-zero components of the 'true' regression vector. These results are provided under a weaker assumption on the Gram matrix than the one used by the Lasso. In some situations this guarantees a significant improvement over the Lasso. Furthermore, a simulation study is conducted and shows that the S-Lasso $\hat{\beta}^{SL}$ performs better than known methods as the Lasso, the Elastic-Net $\hat{\beta}^{EN}$, and the Fused-Lasso with respect to the estimation accuracy. This is especially the case when the regression vector is 'smooth', \textit i.e., when the variations between successive coefficients of the unknown parameter of the regression are small. The study also reveals that the theoretical calibration of the tuning parameters and the one based on 10 fold cross validation imply two S-Lasso solutions with close performance.
• ### On the conditions used to prove oracle results for the Lasso(0910.0722)

Oct. 5, 2009 math.ST, stat.TH, stat.ML
Oracle inequalities and variable selection properties for the Lasso in linear models have been established under a variety of different assumptions on the design matrix. We show in this paper how the different conditions and concepts relate to each other. The restricted eigenvalue condition (Bickel et al., 2009) or the slightly weaker compatibility condition (van de Geer, 2007) are sufficient for oracle results. We argue that both these conditions allow for a fairly general class of design matrices. Hence, optimality of the Lasso for prediction and estimation holds for more general situations than what it appears from coherence (Bunea et al, 2007b,c) or restricted isometry (Candes and Tao, 2005) assumptions.
• ### High-dimensional generalized linear models and the lasso(0804.0703)

April 4, 2008 math.ST, stat.TH
We consider high-dimensional generalized linear models with Lipschitz loss functions, and prove a nonasymptotic oracle inequality for the empirical risk minimizer with Lasso penalty. The penalty is based on the coefficients in the linear predictor, after normalization with the empirical norm. The examples include logistic regression, density estimation and classification with hinge loss. Least squares regression is also discussed.
• ### On non-asymptotic bounds for estimation in generalized linear models with highly correlated design(0709.0844)

Sept. 6, 2007 math.ST, stat.TH
We study a high-dimensional generalized linear model and penalized empirical risk minimization with $\ell_1$ penalty. Our aim is to provide a non-trivial illustration that non-asymptotic bounds for the estimator can be obtained without relying on the chaining technique and/or the peeling device.