
We consider the highdimensional linear regression model $Y = X \beta^0 +
\epsilon$ with Gaussian noise $\epsilon$ and Gaussian random design $X$. We
assume that $\Sigma:= E X^T X / n$ is nonsingular and write its inverse as
$\Theta := \Sigma^{1}$. The parameter of interest is the first component
$\beta_1^0$ of $\beta^0$. We show that in the highdimensional case the
asymptotic variance of a debiased Lasso estimator can be smaller than
$\Theta_{1,1}$. For some special such cases we establish asymptotic efficiency.
The conditions include $\beta^0$ being sparse and the first column $\Theta_1$
of $\Theta$ being not sparse. These conditions depend on whether $\Sigma$ is
known or not.

We present upper and lower bounds for the prediction error of the Lasso. For
the case of random Gaussian design, we show that under mild conditions the
prediction error of the Lasso is up to smaller order terms dominated by the
prediction error of its noiseless counterpart. We then provide exact
expressions for the prediction error of the latter, in terms of compatibility
constants. Here, we assume the active components of the underlying regression
function satisfy some "betamin" condition. For the case of fixed design, we
provide upper and lower bounds, again in terms of compatibility constants. As
an example, we give an up to a logarithmic term tight bound for the least
squares estimator with total variation penalty.

Many statistical estimation procedures lead to nonconvex optimization
problems. Algorithms to solve these are often guaranteed to output a stationary
point of the optimization problem. Oracle inequalities are an important
theoretical instrument to asses the statistical performance of an estimator.
Oracle results have focused on the theoretical properties of the uncomputable
(global) minimum or maximum. In the present work a general framework used for
convex optimization problems to derive oracle inequalities for stationary
points is extended. A main new ingredient of these oracle inequalities is that
they are sharp: they show closeness to the best approximation within the model
plus a remainder term. We apply this framework to different estimation
problems.

Sparse principal component analysis (sPCA) has become one of the most widely
used techniques for dimensionality reduction in highdimensional datasets. The
main challenge underlying sPCA is to estimate the first vector of loadings of
the population covariance matrix, provided that only a certain number of
loadings are nonzero. In this paper, we propose confidence intervals for
individual loadings and for the largest eigenvalue of the population covariance
matrix. Given an independent sample $X^i \in\mathbb R^p, i = 1,...,n,$
generated from an unknown distribution with an unknown covariance matrix
$\Sigma_0$, our aim is to estimate the first vector of loadings and the largest
eigenvalue of $\Sigma_0$ in a setting where $p\gg n$. Next to the
highdimensionality, another challenge lies in the inherent nonconvexity of
the problem. We base our methodology on a Lassopenalized Mestimator which,
despite nonconvexity, may be solved by a polynomialtime algorithm such as
coordinate or gradient descent. We show that our estimator achieves the minimax
optimal rates in $\ell_1$ and $\ell_2$norm. We identify the bias in the
Lassobased estimator and propose a debiased sparse PCA estimator for the
vector of loadings and for the largest eigenvalue of the covariance matrix
$\Sigma_0$. Our main results provide theoretical guarantees for asymptotic
normality of the debiased estimator. The major conditions we impose are
sparsity in the first eigenvector of small order $\sqrt{n}/\log p$ and sparsity
of the same order in the columns of the inverse Hessian matrix of the
population risk.

We provide a selected overview of methodology and theory for estimation and
inference on the edge weights in highdimensional directed and undirected
Gaussian graphical models. For undirected graphical models, two main explicit
constructions are provided: one based on a global method that maximizes the
joint likelihood (the graphical Lasso) and one based on a local (nodewise)
method that sequentially applies the Lasso to estimate the neighbourhood of
each node. The proposed estimators lead to confidence intervals for edge
weights and recovery of the edge structure. We evaluate their empirical
performance in an extensive simulation study. The theoretical guarantees for
the methods are achieved under a sparsity condition relative to the sample size
and regularity conditions. For directed acyclic graphs, we apply similar ideas
to construct confidence intervals for edge weights, when the directed acyclic
graph is identifiable.

Asymptotic lower bounds for estimation play a fundamental role in assessing
the quality of statistical procedures. In this paper we propose a framework for
obtaining semiparametric efficiency bounds for sparse highdimensional models,
where the dimension of the parameter is larger than the sample size. We adopt a
semiparametric point of view: we concentrate on one dimensional functions of a
highdimensional parameter. We follow two different approaches to reach the
lower bounds: asymptotic Cram\'erRao bounds and Le Cam's type of analysis.
Both these approaches allow us to define a class of asymptotically unbiased or
"regular" estimators for which a lower bound is derived. Consequently, we show
that certain estimators obtained by desparsifying (or debiasing) an
$\ell_1$penalized Mestimator are asymptotically unbiased and achieve the
lower bound on the variance: thus in this sense they are asymptotically
efficient. The paper discusses in detail the linear regression model and the
Gaussian graphical model.

Many results have been proved for various nuclear norm penalized estimators
of the uniform sampling matrix completion problem. However, most of these
estimators are not robust: in most of the cases the quadratic loss function and
its modifications are used. We consider robust nuclear norm penalized
estimators using two wellknown robust loss functions: the absolute value loss
and the Huber loss. Under several conditions on the sparsity of the problem
(i.e. the rank of the parameter matrix) and on the regularity of the risk
function sharp and nonsharp oracle inequalities for these estimators are shown
to hold with high probability. As a consequence, the asymptotic behavior of the
estimators is derived. Similar error bounds are obtained under the assumption
of weak sparsity, i.e. the case where the matrix is assumed to be only
approximately lowrank. In all our results we consider a highdimensional
setting. In this case, this means that we assume $n\leq pq$. Finally, various
simulations confirm our theoretical results.

In the setting of highdimensional linear regression models, we propose two
frameworks for constructing pointwise and group confidence sets for penalized
estimators which incorporate prior knowledge about the organization of the
nonzero coefficients. This is done by desparsifying the estimator as in van de
Geer et al. [18] and van de Geer and Stucky [17], then using an appropriate
estimator for the precision matrix $\Theta$. In order to estimate the precision
matrix a corresponding structured matrix norm penalty has to be introduced.
After normalization the result is an asymptotic pivot.
The asymptotic behavior is studied and simulations are added to study the
differences between the two schemes.

We consider the Lasso for a noiseless experiment where one has observations
$X \beta^0$ and uses the penalized version of basis pursuit. We compute for
some special designs the compatibility constant, a quantity closely related to
the restricted eigenvalue. We moreover show the dependence of the (penalized)
prediction error on this compatibility constant. This exercise illustrates that
compatibility is necessarily entering into the bounds for the (penalized)
prediction error and that the bounds in the literature therefore are  up to
constants  tight. We also give conditions that show that in the noisy case the
dominating term for the prediction error is given by the prediction error of
the noiseless case.

Consider the standard nonparametric regression model and take as estimator
the penalized least squares function. In this article, we study the tradeoff
between closeness to the true function and complexity penalization of the
estimator, where complexity is described by a seminorm on a class of functions.
First, we present an exponential concentration inequality revealing the
concentration behavior of the tradeoff of the penalized least squares
estimator around a nonrandom quantity, where such quantity depends on the
problem under consideration. Then, under some conditions and for the proper
choice of the tuning parameter, we obtain bounds for this nonrandom quantity.
We illustrate our results with some examples that include the smoothing splines
estimator.

We study asymptotically normal estimation and confidence regions for
lowdimensional parameters in highdimensional sparse models. Our approach is
based on the $\ell_1$penalized Mestimator which is used for construction of a
bias corrected estimator. We show that the proposed estimator is asymptotically
normal, under a sparsity assumption on the highdimensional parameter,
smoothness conditions on the expected loss and an entropy condition. This leads
to uniformly valid confidence regions and hypothesis testing for
lowdimensional parameters. The present approach is different in that it allows
for treatment of loss functions that we not sufficiently differentiable, such
as quantile loss, Huber loss or hinge loss functions. We also provide new
results for estimation of the inverse Fisher information matrix, which is
necessary for the construction of the proposed estimator. We formulate our
results for general models under highlevel conditions, but investigate these
conditions in detail for generalized linear models and provide mild sufficient
conditions. As particular examples, we investigate the case of quantile loss
and Huber loss in linear regression and demonstrate the performance of the
estimators in a simulation study and on real datasets from genomewide
association studies. We further investigate the case of logistic regression and
illustrate the performance of the estimator on simulated and real data.

We propose methodology for estimation of sparse precision matrices and
statistical inference for their lowdimensional parameters in a
highdimensional setting where the number of parameters $p$ can be much larger
than the sample size. We show that the novel estimator achieves minimax rates
in supremum norm and the lowdimensional components of the estimator have a
Gaussian limiting distribution. These results hold uniformly over the class of
precision matrices with row sparsity of small order $\sqrt{n}/\log p$ and
spectrum uniformly bounded, under a subGaussian tail assumption on the margins
of the true underlying distribution. Consequently, our results lead to
uniformly valid confidence regions for lowdimensional parameters of the
precision matrix. Thresholding the estimator leads to variable selection
without imposing irrepresentability conditions. The performance of the method
is demonstrated in a simulation study and on real data.

We study a set of regularization methods for highdimensional linear
regression models. These penalized estimators have the square root of the
residual sum of squared errors as loss function, and any weakly decomposable
norm as penalty function. This fit measure is chosen because of its property
that the estimator does not depend on the unknown standard deviation of the
noise. On the other hand, a generalized weakly decomposable norm penalty is
very useful in being able to deal with different underlying sparsity
structures. We can choose a different sparsity inducing norm depending on how
we want to interpret the unknown parameter vector $\beta$. Structured sparsity
norms, as defined in Micchelli et al. [18], are special cases of weakly
decomposable norms, therefore we also include the square root LASSO (Belloni et
al. [3]), the group square root LASSO (Bunea et al. [10]) and a new method
called the square root SLOPE (in a similar fashion to the SLOPE from Bogdan et
al. [6]). For this collection of estimators our results provide sharp oracle
inequalities with the KarushKuhnTucker conditions. We discuss some examples
of estimators. Based on a simulation we illustrate some advantages of the
square root SLOPE.

Rates of convergence for empirical risk minimizers have been well studied in
the literature. In this paper, we aim to provide a complementary set of
results, in particular by showing that after normalization, the risk of the
empirical minimizer concentrates on a single point. Such results have been
established by~\cite{chatterjee2014new} for constrained estimators in the
normal sequence model. We first generalize and sharpen this result to
regularized least squares with convex penalties, making use of a "direct"
argument based on Borell's theorem. We then study generalizations to other loss
functions, including the negative loglikelihood for exponential families
combined with a strictly convex regularization penalty. The results in this
general setting are based on more "indirect" arguments as well as on
concentration inequalities for maxima of empirical processes.

We study a highdimensional regression model. Aim is to construct a
confidence set for a given group of regression coefficients, treating all other
regression coefficients as nuisance parameters. We apply a onestep procedure
with the squareroot Lasso as initial estimator and a multivariate squareroot
Lasso for constructing a surrogate Fisher information matrix. The multivariate
squareroot Lasso is based on nuclear norm loss with $\ell_1$penalty. We show
that this procedure leads to an asymptotically $\chi^2$distributed pivot, with
a remainder term depending only on the $\ell_1$error of the initial estimator.
We show that under $\ell_1$sparsity conditions on the regression coefficients
$\beta^0$ the squareroot Lasso produces to a consistent estimator of the noise
variance and we establish sharp oracle inequalities which show that the
remainder term is small under further sparsity conditions on $\beta^0$ and
compatibility conditions on the design.

We propose methodology for statistical inference for lowdimensional
parameters of sparse precision matrices in a highdimensional setting. Our
method leads to a nonsparse estimator of the precision matrix whose entries
have a Gaussian limiting distribution. Asymptotic properties of the novel
estimator are analyzed for the case of subGaussian observations under a
sparsity assumption on the entries of the true precision matrix and regularity
conditions. Thresholding the desparsified estimator gives guarantees for edge
selection in the associated graphical model. Performance of the proposed method
is illustrated in a simulation study.

We consider highdimensional inference when the assumed linear model is
misspecified. We describe some correct interpretations and corresponding
sufficient assumptions for valid asymptotic inference of the model parameters,
which still have a useful meaning when the model is misspecified. We largely
focus on the desparsified Lasso procedure but we also indicate some
implications for (multiple) sample splitting techniques. In view of available
methods and software, our results contribute to robustness considerations with
respect to model misspecification.

This study aims at contributing to lower bounds for empirical compatibility
constants or empirical restricted eigenvalues. This is of importance in
compressed sensing and theory for $\ell_1$regularized estimators. Let $X$ be
an $n \times p$ data matrix with rows being independent copies of a
$p$dimensional random variable. Let $\hat \Sigma := X^T X / n$ be the inner
product matrix. We show that the quadratic forms $u^T \hat \Sigma u$ are lower
bounded by a value converging to one, uniformly over the set of vectors $u$
with $u^T \Sigma_0 u $ equal to one and $\ell_1$norm at most $M$. Here
$\Sigma_0 := {\bf E} \hat \Sigma$ is the theoretical inner product matrix which
we assume to exist. The constant $M$ is required to be of small order $\sqrt {n
/ \log p}$. We assume moreover $m$th order isotropy for some $m >2$ and
subexponential tails or moments up to order $\log p$ for the entries in $X$.
As a consequence we obtain convergence of the empirical compatibility constant
to its theoretical counterpart, and similarly for the empirical restricted
eigenvalue. If the data matrix $X$ is first normalized so that its columns all
have equal length we obtain lower bounds assuming only isotropy and no further
moment conditions on its entries. The isotropy condition is shown to hold for
certain martingale situations.

While effective concentration inequalities for suprema of empirical processes
exist under boundedness or strict tail assumptions, no comparable results have
been available under considerably weaker assumptions. In this paper, we derive
concentration inequalities assuming only low moments for an envelope of the
empirical process. These concentration inequalities are beneficial even when
the envelope is much larger than the single functions under consideration.

These lecture notes consist of three chapters. In the first chapter we
present oracle inequalities for the prediction error of the Lasso and
squareroot Lasso and briefly describe the scaled Lasso. In the second chapter
we establish asymptotic linearity of a desparsified Lasso. This implies
asymptotic normality under certain conditions and therefore can be used to
construct confidence intervals for parameters of interest. A similar line of
reasoning can be invoked to derive bounds in supnorm for the Lasso and
asymptotic linearity of desparsified estimators of a precision matrix. In the
third chapter we consider chaining and the more general generic chaining method
developed by Talagrand. This allows one to bound suprema of random processes.
Concentration inequalities are refined probability inequalities, mostly again
for suprema of random processes. We combine the two. We prove a deviation
inequality directly using (generic) chaining.

We propose a general method for constructing confidence intervals and
statistical tests for single or lowdimensional components of a large parameter
vector in a highdimensional model. It can be easily adjusted for multiplicity
taking dependence among tests into account. For linear models, our method is
essentially the same as in Zhang and Zhang [J. R. Stat. Soc. Ser. B Stat.
Methodol. 76 (2014) 217242]: we analyze its asymptotic properties and
establish its asymptotic optimality in terms of semiparametric efficiency. Our
method naturally extends to generalized linear models with convex loss
functions. We develop the corresponding theory which includes a careful
analysis for Gaussian, subGaussian and bounded correlated designs.

Discussion of "A significance test for the lasso" by Richard Lockhart,
Jonathan Taylor, Ryan J. Tibshirani, Robert Tibshirani [arXiv:1301.7161].

We consider an additive regression model consisting of two components $f^0$
and $g^0$, where the first component $f^0$ is in some sense "smoother" than the
second $g^0$. Smoothness is here described in terms of a seminorm on the class
of regression functions. We use a penalized least squares estimator $(\hat f,
\hat g)$ of $(f^0, g^0)$ and show that the rate of convergence for $\hat f $ is
faster than the rate of convergence for $\hat g$. In fact, both rates are
generally as fast as in the case where one of the two components is known. The
theory is illustrated by a simulation study. Our proofs rely on recent results
from empirical process theory.

Censored data are quite common in statistics and have been studied in depth
in the last years. In this paper we consider censored highdimensional data.
Highdimensional models are in some way more complex than their
lowdimensional versions, therefore some different techniques are required. For
the linear case appropriate estimators based on penalized regression, have been
developed in the last years. In particular in sparse contexts the
$l_1$penalised regression (also known as LASSO) performs very well. Only few
theoretical work was done in order to analyse censored linear models in a
highdimensional context.
We therefore consider a highdimensional censored linear model, where the
response variable is leftcensored. We propose a new estimator, which aims to
work with highdimensional linear censored data. Theoretical nonasymptotic
oracle inequalities are derived.

We examine the rate of convergence of the Lasso estimator of lower
dimensional components of the highdimensional parameter. Under bounds on the
$\ell_1$norm on the worst possible subdirection these rates are of order
$\sqrt {J \log p / n }$ where $p$ is the total number of parameters, $J
\subset \{ 1, \ldots, p \}$ represents a subset of the parameters and $n$ is
the number of observations. We also derive rates in supnorm in terms of the
rate of convergence in $\ell_1$norm. The irrepresentable condition on a set
$J$ requires that the $\ell_1$norm of the worst possible subdirection is
sufficiently smaller than one. In that case sharp oracle results can be
obtained. Moreover, if the coefficients in $J$ are small enough the Lasso will
put these coefficients to zero. This extends known results which say that the
irrepresentable condition on the inactive set (the set where coefficients are
exactly zero) implies no false positives. We further show that by
desparsifying one obtains fast rates in supremum norm without conditions on
the worst possible subdirection. The main assumption here is that approximate
sparsity is of order $o (\sqrt n / \log p )$. The results are extended to
Mestimation with $\ell_1$penalty for generalized linear models and
exponential families for example. For the graphical Lasso this leads to an
extension of known results to the case where the precision matrix is only
approximately sparse. The bounds we provide are nonasymptotic but we also
present asymptotic formulations for ease of interpretation.