
Consider the standard nonparametric regression model and estimate the regression
function by penalized least squares. In this article, we study the tradeoff
between closeness to the true function and complexity penalization of the
estimator, where complexity is described by a seminorm on a class of functions.
First, we present an exponential concentration inequality showing that this
tradeoff, evaluated at the penalized least squares estimator, concentrates
around a nonrandom quantity that depends on the problem under consideration.
Then, under some conditions and for a proper choice of the tuning parameter, we
obtain bounds for this nonrandom quantity. We illustrate our results with
examples that include the smoothing spline estimator.
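As a rough illustration of the tradeoff, here is a minimal Python/NumPy sketch. It assumes sorted, equally spaced design points and replaces the smoothing-spline seminorm $I^2(f) = \int (f'')^2$ by a discrete stand-in, the sum of squared second differences of the fitted values (a Whittaker-type smoother, not a true smoothing spline). The estimator minimizes $\|y - f\|_n^2 + \lambda^2 I^2(f)$, and the helper `tradeoff` evaluates $\|\hat f - f^0\|_n^2 + \lambda^2 I^2(\hat f)$, which is our assumed reading of the quantity studied above. All function names are hypothetical.

```python
import numpy as np


def second_difference_matrix(n):
    """(n-2) x n matrix D with (D f)_i = f_i - 2 f_{i+1} + f_{i+2}."""
    D = np.zeros((n - 2, n))
    for i in range(n - 2):
        D[i, i:i + 3] = [1.0, -2.0, 1.0]
    return D


def penalized_least_squares(y, lam):
    """Minimize (1/n)||y - f||^2 + lam^2 ||D f||^2 over f in R^n.

    Setting the gradient to zero gives (I + n lam^2 D^T D) f = y.
    Assumes the design points are sorted and equally spaced.
    """
    n = len(y)
    D = second_difference_matrix(n)
    A = np.eye(n) + n * lam**2 * D.T @ D
    return np.linalg.solve(A, y)


def tradeoff(f_hat, f0, lam):
    """||f_hat - f0||_n^2 + lam^2 I^2(f_hat), with I(f) = ||D f|| (assumed)."""
    D = second_difference_matrix(len(f_hat))
    return np.mean((f_hat - f0) ** 2) + lam**2 * np.sum((D @ f_hat) ** 2)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = np.linspace(0.0, 1.0, 200)
    f0 = np.sin(2 * np.pi * x)                 # true regression function
    y = f0 + 0.3 * rng.standard_normal(x.size)  # noisy observations
    for lam in [0.1, 1.0, 10.0]:
        f_hat = penalized_least_squares(y, lam)
        print(f"lambda={lam}: tradeoff={tradeoff(f_hat, f0, lam):.4f}")
```

Since second differences vanish on linear sequences, linear data are reproduced exactly for every $\lambda$, and a larger $\lambda$ pulls the fit toward a straight line.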

This study contributes to lower bounds for empirical compatibility constants or
empirical restricted eigenvalues, which are important in compressed sensing and
in the theory of $\ell_1$-regularized estimators. Let $X$ be an $n \times p$ data
matrix whose rows are independent copies of a $p$-dimensional random variable.
Let $\hat \Sigma := X^T X / n$ be the inner
product matrix. We show that the quadratic forms $u^T \hat \Sigma u$ are lower
bounded by a value converging to one, uniformly over the set of vectors $u$
with $u^T \Sigma_0 u = 1$ and $\ell_1$-norm at most $M$. Here
$\Sigma_0 := {\bf E} \hat \Sigma$ is the theoretical inner product matrix, which
we assume to exist. The constant $M$ is required to be of small order
$\sqrt{n / \log p}$. We moreover assume $m$th-order isotropy for some $m > 2$ and
subexponential tails or moments up to order $\log p$ for the entries of $X$.
As a consequence, we obtain convergence of the empirical compatibility constant
to its theoretical counterpart, and similarly for the empirical restricted
eigenvalue. If the data matrix $X$ is first normalized so that all its columns
have equal length, we obtain lower bounds assuming only isotropy and no further
moment conditions on its entries. The isotropy condition is shown to hold in
certain martingale situations.
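The objects in this paragraph can be sketched numerically in Python/NumPy. The sketch below assumes Gaussian rows with $\Sigma_0 = I$ (so isotropy holds trivially), builds $\hat\Sigma = X^T X / n$, and evaluates $u^T \hat\Sigma u$ over randomly drawn $s$-sparse vectors with $u^T u = 1$, whose $\ell_1$-norm is at most $\sqrt{s}$ by Cauchy-Schwarz. Random sampling only probes the set: the minimum over the samples is an upper estimate of the uniform infimum, not the infimum itself. The helper `normalize_columns` illustrates the column normalization mentioned above. All names and parameter values are illustrative assumptions.

```python
import numpy as np


def gram(X):
    """Empirical inner product matrix Sigma_hat = X^T X / n."""
    n = X.shape[0]
    return X.T @ X / n


def normalize_columns(X):
    """Rescale columns to squared Euclidean length n,
    so every diagonal entry of the resulting Sigma_hat equals one."""
    n = X.shape[0]
    return X * (np.sqrt(n) / np.linalg.norm(X, axis=0))


def random_sparse_unit_vectors(p, s, k, rng):
    """k random vectors with s nonzero entries and u^T u = 1.

    With Sigma_0 = I this gives u^T Sigma_0 u = 1, and the l1-norm is at
    most sqrt(s), so these vectors lie in the set with M = sqrt(s).
    """
    U = np.zeros((k, p))
    for j in range(k):
        support = rng.choice(p, size=s, replace=False)
        U[j, support] = rng.standard_normal(s)
        U[j] /= np.linalg.norm(U[j])
    return U


rng = np.random.default_rng(0)
n, p, s = 2000, 100, 5
X = rng.standard_normal((n, p))  # rows i.i.d. N(0, I): Sigma_0 = I (assumed)
Sigma_hat = gram(X)
U = random_sparse_unit_vectors(p, s, 500, rng)
# u^T Sigma_hat u for every sampled row u of U
quad = np.einsum("kp,pq,kq->k", U, Sigma_hat, U)
print("min over sampled u:", quad.min(), "max:", quad.max())
```

Increasing $n$ with $p$ and $s$ fixed should push the sampled quadratic forms closer to one, in line with the lower bound converging to one.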

We consider an additive regression model consisting of two components $f^0$
and $g^0$, where the first component $f^0$ is in some sense "smoother" than the
second, $g^0$. Smoothness is here described in terms of a seminorm on the class
of regression functions. We use a penalized least squares estimator $(\hat f,
\hat g)$ of $(f^0, g^0)$ and show that the rate of convergence of $\hat f$ is
faster than that of $\hat g$. In fact, both rates are
generally as fast as in the case where one of the two components is known. The
theory is illustrated by a simulation study. Our proofs rely on recent results
from empirical process theory.
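A minimal Python/NumPy sketch of a two-component penalized least squares fit. It assumes that $f^0$ and $g^0$ are functions of two different covariates $x$ and $z$, that both roughness seminorms are replaced by sums of squared second differences of the fitted values taken in covariate order (ignoring spacing), and that the two hypothetical tuning parameters `lam_f` and `lam_g` are chosen by hand. Because a constant can be moved freely between the two components, the sketch adds the centering constraint $\sum_i g(z_i) = 0$. This is an illustration under those assumptions, not necessarily the estimator analyzed above.

```python
import numpy as np


def second_difference_matrix(order):
    """Second differences of fitted values taken in the order given by
    `order` (indices that sort the covariate). Ignores the spacing."""
    n = len(order)
    D = np.zeros((n - 2, n))
    for i in range(n - 2):
        D[i, order[i:i + 3]] = [1.0, -2.0, 1.0]
    return D


def additive_pls(y, x, z, lam_f, lam_g):
    """Minimize ||y - f - g||^2 + lam_f^2 ||D_x f||^2 + lam_g^2 ||D_z g||^2
    over value vectors f (function of x) and g (function of z).

    The extra row (sum g)^2 pins down the constant shared by f and g;
    at the minimizer it forces sum(g) = 0 exactly.
    """
    n = len(y)
    Dx = second_difference_matrix(np.argsort(x))
    Dz = second_difference_matrix(np.argsort(z))
    I = np.eye(n)
    A = np.vstack([
        np.hstack([I, I]),                              # data fit: f + g ~ y
        np.hstack([lam_f * Dx, np.zeros_like(Dx)]),     # roughness of f
        np.hstack([np.zeros_like(Dz), lam_g * Dz]),     # roughness of g
        np.hstack([np.zeros(n), np.ones(n)])[None, :],  # centering of g
    ])
    b = np.concatenate([y, np.zeros(2 * (n - 2) + 1)])
    coef, *_ = np.linalg.lstsq(A, b, rcond=None)
    return coef[:n], coef[n:]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 200
    x = rng.uniform(size=n)
    z = rng.uniform(size=n)
    f0 = np.sin(2 * np.pi * x)   # "smoother" component (assumed example)
    g0 = np.sin(8 * np.pi * z)   # rougher component
    g0 -= g0.mean()
    y = f0 + g0 + 0.3 * rng.standard_normal(n)
    f_hat, g_hat = additive_pls(y, x, z, lam_f=100.0, lam_g=10.0)
    print(f"||f_hat - f0||_n^2 = {np.mean((f_hat - f0) ** 2):.4f}")
    print(f"||g_hat - g0||_n^2 = {np.mean((g_hat - g0) ** 2):.4f}")
```

Stacking the fit, penalty, and centering rows into one least squares problem keeps the sketch short; for large $n$ a banded or sparse solver would be the more practical choice.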