• ### The noise barrier and the large signal bias of the Lasso and other convex estimators(1804.01230)

April 4, 2018 math.ST, stat.TH
Convex estimators such as the Lasso, the matrix Lasso and the group Lasso have been studied extensively in the last two decades, demonstrating great success in both theory and practice. Two quantities are introduced, the noise barrier and the large scale bias, that provides insights on the performance of these convex regularized estimators. It is now well understood that the Lasso achieves fast prediction rates, provided that the correlations of the design satisfy some Restricted Eigenvalue or Compatibility condition, and provided that the tuning parameter is large enough. Using the two quantities introduced in the paper, we show that the compatibility condition on the design matrix is actually unavoidable to achieve fast prediction rates with the Lasso. The Lasso must incur a loss due to the correlations of the design matrix, measured in terms of the compatibility constant. This results holds for any design matrix, any active subset of covariates, and any tuning parameter. It is now well known that the Lasso enjoys a dimension reduction property: the prediction error is of order $\lambda\sqrt k$ where $k$ is the sparsity; even if the ambient dimension $p$ is much larger than $k$. Such results require that the tuning parameters is greater than some universal threshold. We characterize sharp phase transitions for the tuning parameter of the Lasso around a critical threshold dependent on $k$. If $\lambda$ is equal or larger than this critical threshold, the Lasso is minimax over $k$-sparse target vectors. If $\lambda$ is equal or smaller than critical threshold, the Lasso incurs a loss of order $\sigma\sqrt k$ --which corresponds to a model of size $k$-- even if the target vector has fewer than $k$ nonzero coefficients. Remarkably, the lower bounds obtained in the paper also apply to random, data-driven tuning parameters. The results extend to convex penalties beyond the Lasso.
• ### Optimistic lower bounds for convex regularized least-squares(1703.01332)

Oct. 6, 2017 math.ST, stat.TH
Minimax lower bounds are pessimistic in nature: for any given estimator, minimax lower bounds yield the existence of a worst-case target vector $\beta^*_{worst}$ for which the prediction error of the given estimator is bounded from below. However, minimax lower bounds shed no light on the prediction error of the given estimator for target vectors different than $\beta^*_{worst}$. A characterization of the prediction error of any convex regularized least-squares is given. This characterization provide both a lower bound and an upper bound on the prediction error. This produces lower bounds that are applicable for any target vector and not only for a single, worst-case $\beta^*_{worst}$. Finally, these lower and upper bounds on the prediction error are applied to the Lasso is sparse linear regression. We obtain a lower bound involving the compatibility constant for any tuning parameter, matching upper and lower bounds for the universal choice of the tuning parameter, and a lower bound for the Lasso with small tuning parameter.
• ### Localized Gaussian width of $M$-convex hulls with applications to Lasso and convex aggregation(1705.10696)

Sept. 26, 2017 math.ST, stat.TH
Upper and lower bounds are derived for the Gaussian mean width of the intersection of a convex hull of $M$ points with an Euclidean ball of a given radius. The upper bound holds for any collection of extreme point bounded in Euclidean norm. The upper bound and the lower bound match up to a multiplicative constant whenever the extreme points satisfy a one sided Restricted Isometry Property. This bound is then applied to study the Lasso estimator in fixed-design regression, the Empirical Risk Minimizer in the anisotropic persistence problem, and the convex aggregation problem in density estimation.
• ### A sharp oracle inequality for Graph-Slope(1706.06977)

June 21, 2017 math.ST, stat.TH
Following recent success on the analysis of the Slope estimator, we provide a sharp oracle inequality in term of prediction error for Graph-Slope, a generalization of Slope to signals observed over a graph. In addition to improving upon best results obtained so far for the Total Variation denoiser (also referred to as Graph-Lasso or Generalized Lasso), we propose an efficient algorithm to compute Graph-Slope. The proposed algorithm is obtained by applying the forward-backward method to the dual formulation of the Graph-Slope optimization problem. We also provide experiments showing the interest of the method.