• ### A Practical Scheme and Fast Algorithm to Tune the Lasso With Optimality Guarantees (1410.0247)

Nov. 8, 2016 math.ST, stat.TH, stat.ME
We introduce a novel scheme for choosing the regularization parameter of the Lasso in high-dimensional linear regression. This scheme, inspired by Lepski's method for bandwidth selection in nonparametric regression, comes with both optimal finite-sample guarantees and a fast algorithm. In particular, for any design matrix such that the Lasso has low sup-norm error under an "oracle choice" of the regularization parameter, we show that our method matches the oracle performance up to a small constant factor and that it can be implemented by performing simple tests along a single Lasso path. Applying the Lasso to simulated and real data, we find that our scheme can be faster and more accurate than standard schemes such as cross-validation.
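The idea of testing along a single path can be sketched in Python under stated assumptions. The objective is taken as $\|y - X\beta\|_2^2/n + \lambda\|\beta\|_1$, the Lasso is solved by plain coordinate descent with warm starts from the previous (larger) $\lambda$, and a value $\lambda$ is accepted while $\|\hat\beta_{\lambda} - \hat\beta_{\lambda'}\|_\infty/(\lambda + \lambda') \le C$ holds for every already accepted $\lambda' > \lambda$. The constant $C = 0.75$ and the helper names `lasso_cd` and `select_lambda` are illustrative assumptions, not the authors' implementation.

```python
import numpy as np


def soft_threshold(z, t):
    """Scalar soft-thresholding operator."""
    return np.sign(z) * max(abs(z) - t, 0.0)


def lasso_cd(X, y, lam, beta0=None, n_iter=200):
    """Coordinate descent for (1/n)||y - X b||_2^2 + lam * ||b||_1 (assumed scaling)."""
    n, p = X.shape
    beta = np.zeros(p) if beta0 is None else beta0.copy()
    col_sq = (X ** 2).sum(axis=0) / n
    r = y - X @ beta  # current residual
    for _ in range(n_iter):
        for j in range(p):
            r += X[:, j] * beta[j]  # remove coordinate j from the fit
            rho = X[:, j] @ r / n
            beta[j] = soft_threshold(rho, lam / 2) / col_sq[j]
            r -= X[:, j] * beta[j]  # add the updated coordinate back
    return beta


def select_lambda(X, y, lambdas, C=0.75):
    """Walk down one Lasso path (largest lambda first) and stop at the first
    lambda whose estimate is too far, in sup-norm, from an accepted one."""
    grid = np.sort(np.asarray(lambdas, dtype=float))[::-1]
    accepted = []  # (lambda, beta) pairs that passed all tests so far
    beta = None
    for lam in grid:
        beta = lasso_cd(X, y, lam, beta0=beta)  # warm start from previous fit
        if any(np.max(np.abs(beta - b)) / (lam + l) > C for l, b in accepted):
            break
        accepted.append((lam, beta))
    return accepted[-1][0]  # smallest accepted lambda
```

Because the set of accepted values is closed upwards, stopping at the first failure returns the smallest $\lambda$ for which all pairwise tests above it pass, so only one pass over the path is needed.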
• ### Adaptive wavelet multivariate regression with errors in variables (1601.02762)

Jan. 12, 2016 math.ST, stat.TH
In the multidimensional setting, we consider the errors-in-variables model. We aim to estimate the unknown nonparametric multivariate regression function when the covariates are observed with errors. We devise an adaptive estimator based on wavelet projection kernels and a deconvolution operator. We propose an automatic, fully data-driven procedure to select the wavelet resolution level. We obtain an oracle inequality and optimal rates of convergence over anisotropic Hölder classes. Our theoretical results are illustrated by simulations.
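The role of the deconvolution operator can be illustrated with a simpler, swapped-in tool: a one-dimensional kernel (not wavelet) deconvolution estimator. Assuming a Gaussian kernel $K$ and Laplace$(0, b)$ measurement errors with known $b$, whose characteristic function is $1/(1 + b^2 t^2)$, the deconvolution kernel has the closed form $K^*(u) = K(u)\,[1 - (b/h)^2(u^2 - 1)]$. The sketch below, with hypothetical function names, plugs it into a ratio estimator of the regression function from contaminated covariates $W_i = X_i + \delta_i$; it does not implement the paper's wavelet projection or its resolution-level selection.

```python
import numpy as np


def deconv_kernel(u, h, b):
    """Deconvolution kernel for a Gaussian K and Laplace(0, b) errors:
    K*(u) = K(u) * (1 - (b/h)^2 * (u^2 - 1))."""
    gauss = np.exp(-0.5 * u ** 2) / np.sqrt(2.0 * np.pi)
    return gauss * (1.0 - (b / h) ** 2 * (u ** 2 - 1.0))


def deconv_regression(x0, W, Y, h, b):
    """Ratio estimator of E[Y | X = x0] from contaminated covariates W = X + delta."""
    k = deconv_kernel((x0 - W) / h, h, b)
    return np.sum(k * Y) / np.sum(k)


# Usage on simulated data (error scale b = 0.1 assumed known)
rng = np.random.default_rng(1)
X = rng.uniform(-1.0, 1.0, 2000)
W = X + rng.laplace(0.0, 0.1, X.size)
Y = X ** 2 + 0.1 * rng.standard_normal(X.size)
estimate = deconv_regression(0.5, W, Y, h=0.2, b=0.1)  # target value: f(0.5) = 0.25
```

With $b = 0$ the kernel reduces to the Gaussian one and the estimator becomes the usual Nadaraya-Watson estimator; the correction term inflates the variance as $b/h$ grows, which is why the smoothing parameter must be tuned to the error distribution.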
• ### Bandwidth selection in kernel empirical risk minimization via the gradient (1401.6882)

Aug. 18, 2015 math.ST, stat.TH
In this paper, we deal with the data-driven selection of multidimensional and possibly anisotropic bandwidths in the general framework of kernel empirical risk minimization. We propose a universal selection rule, which leads to optimal adaptive results in a large variety of statistical models, such as nonparametric robust regression and statistical learning with errors in variables. These results are stated for smooth loss functions, where the gradient of the risk is a good criterion for measuring the performance of our estimators. The selection rule consists of a comparison of gradient empirical risks. It can be viewed as a nontrivial extension of the so-called Goldenshluger-Lepski method to nonlinear estimators. Furthermore, one main advantage of our selection rule is that it does not depend on the Hessian matrix of the risk, which is usually involved in standard adaptive procedures.
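For orientation, the classical Goldenshluger-Lepski comparison that this rule extends can be sketched for a pointwise Gaussian-kernel density estimate. There the doubly smoothed estimator $\hat f_{h,h'}$ is again a kernel estimate, with bandwidth $\sqrt{h^2 + h'^2}$, because two Gaussian kernels convolve to a Gaussian. The rule picks $\hat h = \arg\min_h \{B(h) + V(h)\}$ with $B(h) = \max_{h'} (|\hat f_{h,h'}(x_0) - \hat f_{h'}(x_0)| - V(h'))_+$. This is the linear-estimator version, not the paper's gradient-based rule; the majorant $V(h) = \kappa\sqrt{\log n/(nh)}$, the value of $\kappa$ and the function names are hypothetical.

```python
import numpy as np


def kde_at(x0, data, h):
    """Gaussian-kernel density estimate at the single point x0."""
    u = (x0 - data) / h
    return np.mean(np.exp(-0.5 * u ** 2)) / (h * np.sqrt(2.0 * np.pi))


def gl_bandwidth(x0, data, bandwidths, kappa=1.0):
    """Goldenshluger-Lepski choice: minimize B(h) + V(h) over the bandwidth grid."""
    n = len(data)
    # Hypothetical variance majorant V(h)
    V = {h: kappa * np.sqrt(np.log(n) / (n * h)) for h in bandwidths}
    fhat = {h: kde_at(x0, data, h) for h in bandwidths}
    best_h, best_crit = None, np.inf
    for h in bandwidths:
        # Bias proxy: |f_{h,h'} - f_{h'}| - V(h'), where f_{h,h'} uses bandwidth sqrt(h^2 + h'^2)
        B = max(
            max(abs(kde_at(x0, data, np.hypot(h, hp)) - fhat[hp]) - V[hp], 0.0)
            for hp in bandwidths
        )
        crit = B + V[h]
        if crit < best_crit:
            best_h, best_crit = h, crit
    return best_h
```

For nonlinear M-estimators no such closed-form $\hat f_{h,h'}$ exists, which is the gap the gradient-based comparison in the abstract is designed to fill.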
• ### A robust, adaptive M-estimator for pointwise estimation in heteroscedastic regression (1207.4447)

July 9, 2014 math.ST, stat.TH
We introduce a robust and fully adaptive method for pointwise estimation in heteroscedastic regression. We allow for noise and design distributions that are unknown and satisfy only very weak assumptions. In particular, we do not impose moment conditions on the noise distribution. Moreover, we do not require a positive density for the design distribution. In a first step, we study the consistency of local polynomial M-estimators built from a contrast and a kernel. Afterwards, minimax results are established over one-dimensional Hölder spaces for degenerate designs. We then choose the contrast and the kernel that minimize an empirical variance term and demonstrate that the corresponding M-estimator is adaptive with respect to the noise and design distributions and adaptive (Huber) minimax for contamination models. In a second step, we additionally choose a data-driven bandwidth via Lepski's method. This leads to an M-estimator that is adaptive with respect to the noise and design distributions and, additionally, adaptive with respect to the smoothness of an isotropic, multivariate, locally polynomial target function. These results are also extended to anisotropic, locally constant target functions. In particular, our data-driven approach provides a level of robustness that adapts to the noise, contamination, and outliers.
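The bandwidth step can be sketched under simplifying assumptions: a local constant Huber M-estimator with a uniform kernel and fixed unit scale, computed by iteratively reweighted averaging, combined with Lepski's rule that selects the largest bandwidth $h$ whose estimate stays within $\kappa\sqrt{\log n/(nh')}$ of the estimate at every smaller bandwidth $h'$. The Huber contrast, the threshold form, $\kappa$ and the function names are assumptions for illustration; the data-driven choice of contrast and kernel described above is not shown.

```python
import numpy as np


def huber_local_constant(x0, X, Y, h, c=1.345, n_iter=50):
    """Local constant Huber M-estimate at x0 (uniform kernel of half-width h, unit scale)."""
    y = Y[np.abs(X - x0) <= h]
    if y.size == 0:
        return np.nan
    theta = np.median(y)  # robust starting value
    for _ in range(n_iter):
        r = y - theta
        w = np.minimum(1.0, c / np.maximum(np.abs(r), 1e-12))  # Huber IRLS weights
        theta = np.sum(w * y) / np.sum(w)
    return theta


def lepski_bandwidth(x0, X, Y, bandwidths, kappa=2.0):
    """Largest h whose estimate is within thr(h') of the estimate at every smaller h'."""
    hs = sorted(bandwidths)
    n = len(X)
    est = {h: huber_local_constant(x0, X, Y, h) for h in hs}
    thr = {h: kappa * np.sqrt(np.log(n) / (n * h)) for h in hs}  # hypothetical threshold
    chosen = hs[0]
    for i, h in enumerate(hs):
        if all(abs(est[h] - est[hp]) <= thr[hp] for hp in hs[:i]):
            chosen = h
        else:
            break
    return chosen
```

Because the Huber weights bound each residual's influence, the comparison remains meaningful under heavy-tailed noise such as Cauchy errors, where local means would not.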
The problem of adaptive noisy clustering is investigated. Given a set of noisy observations $Z_i=X_i+\epsilon_i$, $i=1,\dots,n$, the goal is to design clusters associated with the law of the $X_i$'s, which have an unknown density $f$ with respect to the Lebesgue measure. Since we observe a corrupted sample, a direct approach such as the popular $k$-means is not suitable in this case. In this paper, we propose a noisy $k$-means minimization, which is based on the $k$-means loss function and a deconvolution estimator of the density $f$. This approach, however, depends on a bandwidth involved in the deconvolution kernel. Fast rates of convergence for the excess risk are obtained for a particular choice of the bandwidth, which depends on the smoothness of the density $f$. We then turn to the main issue of the paper: the data-driven choice of the bandwidth. We state an adaptive upper bound for a new selection rule, called ERC (Empirical Risk Comparison). This selection rule is based on Lepski's principle, where empirical risks associated with different bandwidths are compared. Finally, we illustrate that this adaptive rule can be used in many statistical problems of $M$-estimation where the empirical risk depends on a nuisance parameter.
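A one-dimensional sketch of noisy $k$-means with an ERC-style bandwidth choice follows, under assumptions not taken from the abstract: Laplace$(0, b)$ noise with known $b$, a Gaussian deconvolution kernel with closed form $K^*(u) = K(u)\,[1 - (b/h)^2(u^2 - 1)]$, a density estimate clipped at zero and discretized on a grid, weighted Lloyd iterations for the minimization, and the comparison rule $\hat h = \max\{h : \hat R_{h'}(\hat c_h) - \hat R_{h'}(\hat c_{h'}) \le \delta(h')\ \text{for all}\ h' \le h\}$ with a hypothetical threshold $\delta(h) = \kappa/\sqrt{nh^5}$. All function names are hypothetical, and the paper's exact form of ERC may differ.

```python
import numpy as np


def deconv_density(grid, Z, h, b):
    """Deconvolution density estimate on a grid (Gaussian K, Laplace(0, b) noise), clipped at 0."""
    u = (grid[:, None] - Z[None, :]) / h
    k = np.exp(-0.5 * u ** 2) / np.sqrt(2.0 * np.pi) * (1.0 - (b / h) ** 2 * (u ** 2 - 1.0))
    return np.clip(k.mean(axis=1) / h, 0.0, None)


def kmeans_risk(centers, grid, dens):
    """Discretized k-means risk: integral of min_j (x - c_j)^2 * f_h(x) dx."""
    d2 = np.min((grid[:, None] - centers[None, :]) ** 2, axis=1)
    return np.sum(d2 * dens) * (grid[1] - grid[0])


def noisy_kmeans(grid, dens, k, n_iter=100):
    """Weighted Lloyd iterations on the grid, weighted by the density estimate."""
    centers = np.linspace(grid[0], grid[-1], k + 2)[1:-1]  # evenly spaced start
    for _ in range(n_iter):
        labels = np.argmin(np.abs(grid[:, None] - centers[None, :]), axis=1)
        for j in range(k):
            w = dens[labels == j]
            if w.sum() > 0:
                centers[j] = np.sum(w * grid[labels == j]) / w.sum()
    return np.sort(centers)


def erc_bandwidth(grid, Z, b, bandwidths, k, kappa=0.05):
    """ERC-style rule: largest h whose centers are near-optimal for every smaller h'."""
    hs = sorted(bandwidths)
    n = len(Z)
    dens = {h: deconv_density(grid, Z, h, b) for h in hs}
    cents = {h: noisy_kmeans(grid, dens[h], k) for h in hs}
    delta = {h: kappa / np.sqrt(n * h ** 5) for h in hs}  # hypothetical threshold
    chosen = hs[0]
    for i, h in enumerate(hs):
        if all(
            kmeans_risk(cents[h], grid, dens[hp]) - kmeans_risk(cents[hp], grid, dens[hp])
            <= delta[hp]
            for hp in hs[:i]
        ):
            chosen = h
        else:
            break
    return chosen
```

Comparing risks, rather than the centers themselves, keeps the rule meaningful when several center configurations are nearly equally good.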
This paper deals with nonparametric estimation in the heteroscedastic regression model $Y_i=f(X_i)+\xi_i$, $i=1,\dots,n$, with incomplete information, i.e., each real random variable $\xi_i$ has a density $g_i$ that is unknown to the statistician. The aim is to estimate the regression function $f$ at a given point. Using a local polynomial M-estimator, denoted $\hat f^h$, and applying Lepski's procedure for bandwidth selection, we construct an estimator $\hat f^{\hat h}$ that is adaptive over the collection of isotropic Hölder classes. In particular, we establish new exponential inequalities to control deviations of local M-estimators, which allow us to construct the minimax estimator. The advantage of this estimator is that it does not depend on the densities of the random errors; we only assume that these probability density functions are symmetric and monotonically decreasing on $\mathbb{R}_+$. Notably, our estimator is robust to extreme values of the noise.
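The local polynomial M-estimator can be sketched for degree one (local linear) in one dimension. The sketch assumes a Huber contrast with threshold $c = 1.345$ and unit scale, Epanechnikov kernel weights, and iteratively reweighted least squares; these are illustrative choices, not necessarily the paper's contrast, and the bandwidth $h$ is taken as given rather than chosen by Lepski's procedure. The value $\hat f^h(x_0)$ is the intercept of the locally fitted line, and the function name is hypothetical.

```python
import numpy as np


def local_linear_huber(x0, X, Y, h, c=1.345, n_iter=100):
    """Local linear Huber M-estimate of f(x0): intercept of a kernel-weighted robust line."""
    kern = np.maximum(1.0 - ((X - x0) / h) ** 2, 0.0)  # Epanechnikov weights (unnormalized)
    m = kern > 0
    D = np.column_stack([np.ones(m.sum()), X[m] - x0])  # local design: [1, x - x0]
    y, k = Y[m], kern[m]
    # Start from the kernel-weighted least-squares fit
    coef = np.linalg.lstsq(D * np.sqrt(k)[:, None], y * np.sqrt(k), rcond=None)[0]
    for _ in range(n_iter):
        r = y - D @ coef
        w = k * np.minimum(1.0, c / np.maximum(np.abs(r), 1e-12))  # kernel x Huber weights
        sw = np.sqrt(w)
        coef = np.linalg.lstsq(D * sw[:, None], y * sw, rcond=None)[0]
    return coef[0]
```

A single gross outlier placed at $x_0$ should move this fit only slightly, since the Huber weights cap each observation's influence, whereas it would dominate a local least-squares fit.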