
Consider the multivariate nonparametric regression model. It is shown that
estimators based on sparsely connected deep neural networks with ReLU
activation function and properly chosen network architecture achieve the
minimax rates of convergence (up to $\log n$-factors) under a general
composition assumption on the regression function. The framework includes many
well-studied structural constraints such as (generalized) additive models.
While there is a lot of flexibility in the network architecture, the tuning
parameter is the sparsity of the network. Specifically, we consider large
networks whose number of potential network parameters exceeds the sample size.
The analysis gives some insights into why multilayer feedforward neural
networks perform well in practice. Interestingly, for the ReLU activation function
the depth (number of layers) of the neural network architectures plays an
important role and our theory suggests that for nonparametric regression,
scaling the network depth with the sample size is natural. It is also shown
that under the composition assumption wavelet estimators can only achieve
suboptimal rates.
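
To make the role of network sparsity concrete, the following is a minimal sketch (not taken from the paper) of a feedforward ReLU network evaluated in plain Python, together with a count of its nonzero parameters. The names `forward` and `count_nonzero` and the toy two-layer network are illustrative assumptions.

```python
def relu(v):
    """Apply the ReLU activation max(0, x) componentwise."""
    return [max(0.0, x) for x in v]

def affine(W, b, v):
    """Compute W v + b for a weight matrix W (list of rows) and bias b."""
    return [sum(w * x for w, x in zip(row, v)) + bi for row, bi in zip(W, b)]

def forward(layers, x):
    """Evaluate a ReLU network; no activation after the last layer."""
    h = list(x)
    for W, b in layers[:-1]:
        h = relu(affine(W, b, h))
    W, b = layers[-1]
    return affine(W, b, h)

def count_nonzero(layers):
    """Number of nonzero weights and biases (the sparsity of the network)."""
    total = 0
    for W, b in layers:
        total += sum(1 for row in W for w in row if w != 0.0)
        total += sum(1 for bi in b if bi != 0.0)
    return total

# Toy network computing |x| = relu(x) + relu(-x).
abs_net = [
    ([[1.0], [-1.0]], [0.0, 0.0]),  # hidden layer with two units
    ([[1.0, 1.0]], [0.0]),          # linear output layer
]
```

Here `forward(abs_net, [-3.0])` evaluates the absolute value at -3, and `count_nonzero(abs_net)` counts the four active weights; in a sparsely connected network this count is kept small relative to the total number of potential parameters.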

Given a sample of a Poisson point process with intensity $\lambda_f(x,y) = n
\mathbf{1}(f(x) \leq y),$ we study recovery of the boundary function $f$ from a
nonparametric Bayes perspective. Because of the irregularity of this model, the
analysis is nonstandard. We establish a general result for the posterior
contraction rate with respect to the $L^1$-norm based on entropy and one-sided
small probability bounds. From this, specific posterior contraction results are
derived for Gaussian process priors and priors based on random wavelet series.
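
As an illustration of the observation model, the sketch below simulates such a point process on a bounded window by thinning a homogeneous Poisson process. The truncation of the window to heights between `y_min` and `y_max` (with `y_min` assumed to lie below the boundary), the helper names, and the example boundary function are assumptions made for illustration only.

```python
import math
import random

def poisson_draw(mean, rng):
    """Draw a Poisson(mean) variable by counting unit-rate exponential arrivals."""
    count, total = 0, rng.expovariate(1.0)
    while total <= mean:
        count += 1
        total += rng.expovariate(1.0)
    return count

def simulate_boundary_ppp(f, n, y_min, y_max, rng):
    """Sample points with intensity n * 1(f(x) <= y) on [0, 1] x [y_min, y_max]."""
    # Homogeneous process with intensity n on the whole window ...
    n_points = poisson_draw(n * (y_max - y_min), rng)
    candidates = [(rng.random(), rng.uniform(y_min, y_max)) for _ in range(n_points)]
    # ... thinned to the region on or above the graph of f.
    return [(x, y) for x, y in candidates if f(x) <= y]

f = lambda x: 0.5 + 0.25 * math.sin(2 * math.pi * x)  # example boundary (assumed)
points = simulate_boundary_ppp(f, n=200, y_min=0.0, y_max=1.0, rng=random.Random(1))
```

All simulated points lie on or above the graph of $f$, so the lower envelope of the point cloud carries the information about the boundary.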

It is well-known that density estimation on the unit interval is
asymptotically equivalent to a Gaussian white noise experiment, provided the
densities have H\"older smoothness larger than $1/2$ and are uniformly bounded
away from zero. We derive matching lower and constructive upper bounds for the
Le Cam deficiencies between these experiments, with explicit dependence on both
the sample size and the size of the densities in the parameter space. As a
consequence, we derive sharp conditions on how small the densities can be for
asymptotic equivalence to hold. The related case of Poisson intensity
estimation is also treated.

Deep neural networks (DNNs) generate much richer function spaces than shallow
networks. Since the function spaces induced by shallow networks have several
approximation-theoretic drawbacks, this does not, however, necessarily explain
the success of deep networks. In this article we take another route by comparing
the expressive power of DNNs with ReLU activation function to piecewise linear
spline methods. We show that MARS (multivariate adaptive regression splines) is
improperly learnable by DNNs in the sense that for any given function that can be
expressed as a function in MARS with $M$ parameters there exists a multilayer
neural network with $O(M \log (M/\varepsilon))$ parameters that approximates
this function up to sup-norm error $\varepsilon.$ We show a similar result for
expansions with respect to the Faber-Schauder system. Based on this, we derive
risk comparison inequalities that bound the statistical risk of fitting a
neural network by the statistical risk of spline-based methods. This shows that
deep networks perform better or only slightly worse than the considered spline
methods. We provide a constructive proof for the function approximations.
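
The logarithmic dependence on $1/\varepsilon$ in such parameter counts can be illustrated with a well-known sawtooth construction from the ReLU approximation literature, which approximates $x^2$ on $[0,1]$ with an error that decreases geometrically in the depth. This is a hedged sketch of that general technique, not necessarily the construction used in the paper; the function names are illustrative.

```python
def relu(x):
    return max(0.0, x)

def tent(x):
    """Tent map on [0, 1] from two ReLU units: 2x for x <= 1/2, 2 - 2x otherwise."""
    return 2.0 * relu(x) - 4.0 * relu(x - 0.5)

def approx_square(x, depth):
    """Approximate x**2 on [0, 1] by x - sum_{k=1}^{depth} tent^{(k)}(x) / 4**k."""
    result, g = x, x
    for k in range(1, depth + 1):
        g = tent(g)              # k-fold composition of the tent map
        result -= g / 4.0 ** k
    return result

def max_error(depth, grid_size=1000):
    """Largest error of approx_square over an equispaced grid in [0, 1]."""
    return max(abs(approx_square(i / grid_size, depth) - (i / grid_size) ** 2)
               for i in range(grid_size + 1))
```

For depth $m$ the approximation is the piecewise linear interpolant of $x^2$ on a dyadic grid with mesh $2^{-m}$, so the sup-norm error is at most $2^{-2m-2}$. Each additional level adds a constant number of ReLU units and reduces this bound by a factor of four.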

The random coefficients model is an extension of the linear regression model
that allows for unobserved heterogeneity in the population by modeling the
regression coefficients as random variables. Given data from this model, the
statistical challenge is to recover information about the joint density of the
random coefficients, which is a multivariate and ill-posed problem. Because of
the curse of dimensionality and the ill-posedness, pointwise nonparametric
estimation of the joint density is difficult and suffers from slow convergence
rates. Larger features, such as an increase of the density along some direction
or a well-accentuated mode, can, however, be detected much more easily from data
by means of statistical tests. In this article, we follow this strategy and
construct tests and confidence statements for qualitative features of the joint
density, such as increases, decreases and modes. We propose a multiple testing
approach based on aggregating single tests which are designed to extract shape
information on fixed scales and directions. Using recent tools for Gaussian
approximations of multivariate empirical processes, we derive expressions for
the critical value. We apply our method to simulated and real data.

It is well-known that density estimation on the unit interval is
asymptotically equivalent to a Gaussian white noise experiment, provided the
densities are sufficiently smooth and uniformly bounded away from zero. We show
that a uniform lower bound, whose size we sharply characterize, is in general
necessary for asymptotic equivalence to hold.

We investigate the regularity of the positive roots of a nonnegative
function of one variable. A modified H\"older space $\mathcal{F}^\beta$ is
introduced such that if $f\in \mathcal{F}^\beta$ then $f^\alpha \in C^{\alpha
\beta}$. This provides sufficient conditions to overcome the usual limitation
in the square root case ($\alpha = 1/2$) for H\"older functions that $f^{1/2}$
need be no more than $C^1$ in general. We also derive bounds on the wavelet
coefficients of $f^\alpha$, which provide a finer understanding of its local
regularity.

We study a class of statistical inverse problems with nonlinear pointwise
operators motivated by concrete statistical applications. A two-step procedure
is proposed, where the first step smoothes the data and inverts the
nonlinearity. This reduces the initial nonlinear problem to a linear inverse
problem with deterministic noise, which is then solved in a second step. The
noise reduction step is based on wavelet thresholding and is shown to be
minimax optimal (up to logarithmic factors) in a pointwise function-dependent
sense. Our analysis is based on a modified notion of H\"older smoothness scales
that are natural in this setting.

We investigate the problem of deriving posterior concentration rates under
different loss functions in nonparametric Bayes. We first provide a lower bound
on posterior coverages of shrinking neighbourhoods that relates the metric or
loss under which the shrinking neighbourhood is considered, and an intrinsic
premetric linked to frequentist separation rates. In the Gaussian white noise
model, we construct feasible priors based on a spike and slab procedure
reminiscent of wavelet thresholding that achieve adaptive rates of contraction
under $L^2$ or $L^{\infty}$ metrics when the underlying parameter belongs to a
collection of H\"{o}lder balls and that moreover achieve our lower bound. We
analyse the consequences in terms of asymptotic behaviour of posterior credible
balls as well as frequentist minimax adaptive estimation. Our results are
complemented by an upper bound for the contraction rate under an arbitrary loss
in a generic regular experiment. The upper bound is attained for certain sieve
priors and allows us to extend our results to density estimation.

We study full Bayesian procedures for high-dimensional linear regression
under sparsity constraints. The prior is a mixture of point masses at zero and
continuous distributions. Under compatibility conditions on the design matrix,
the posterior distribution is shown to contract at the optimal rate for
recovery of the unknown sparse vector, and to give optimal prediction of the
response vector. It is also shown to select the correct sparse model, or at
least the coefficients that are significantly different from zero. The
asymptotic shape of the posterior distribution is characterized and employed in
the construction and study of credible sets for uncertainty quantification.

The first Bayesian results for the sparse normal means problem were proven
for spike-and-slab priors. However, these priors are less convenient from a
computational point of view. Meanwhile, a large number of continuous
shrinkage priors have been proposed. Many of these shrinkage priors can be
written as a scale mixture of normals, which makes them particularly easy to
implement. We propose general conditions on the prior on the local variance in
scale mixtures of normals, such that posterior contraction at the minimax rate
is assured. The conditions require tails at least as heavy as Laplace, but not
too heavy, and a large amount of mass around zero relative to the tails, more
so as the sparsity increases. These conditions give some general guidelines for
choosing a shrinkage prior for estimation under a nearly black sparsity
assumption. We verify these conditions for the class of priors considered by
Ghosh and Chakrabarti (2015), which includes the horseshoe and the
normal-exponential-gamma priors, and for the horseshoe+, the inverse-Gaussian
prior, the normal-gamma prior, and the spike-and-slab Lasso, and thus extend
the number of shrinkage priors which are known to lead to posterior contraction
at the minimax estimation rate.
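
Sampling from a scale mixture of normals is straightforward, which is part of what makes these priors convenient in practice. As a hedged sketch, the code below draws from the horseshoe prior in its usual representation $\theta_i \mid \lambda_i \sim N(0, \lambda_i^2 \tau^2)$ with standard half-Cauchy local scales $\lambda_i$. The global scale `tau` and the function names are illustrative assumptions, not notation from the paper.

```python
import math
import random

def half_cauchy(rng):
    """Draw from the standard half-Cauchy distribution via the inverse CDF."""
    return abs(math.tan(math.pi * (rng.random() - 0.5)))

def sample_horseshoe(size, tau, rng):
    """Draw `size` coordinates: local scale first, then a normal given that scale."""
    draws = []
    for _ in range(size):
        lam = half_cauchy(rng)                   # local scale lambda_i
        draws.append(rng.gauss(0.0, lam * tau))  # theta_i given lambda_i
    return draws

theta = sample_horseshoe(1000, tau=0.1, rng=random.Random(0))
```

Other scale mixtures are obtained by swapping `half_cauchy` for a different distribution on the local scale; the conditions in the abstract concern exactly that mixing distribution.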

Consider nonparametric function estimation under $L^p$-loss. The minimax rate
for estimation of the regression function over a H\"older ball with smoothness
index $\beta$ is $n^{-\beta/(2\beta+1)}$ if $1\leq p<\infty$ and $(n/\log
n)^{-\beta/(2\beta+1)}$ if $p=\infty.$ There are many known procedures that
either attain this rate for $p=\infty$ but are suboptimal by a $\log n$ factor
in the case $p<\infty$ or the other way around. In this article, we construct
an estimator that simultaneously achieves the optimal rates under $L^p$-risk
for all $1\leq p\leq \infty$ without prior knowledge of $\beta.$ In contrast to
classical wavelet thresholding methods that kill small empirical wavelet
coefficients and keep large ones, it is essential for simultaneous adaptation
that on each resolution level, the largest empirical wavelet coefficients are
truncated. This leads to a completely different point of view on wavelet
thresholding. The crucial part in the construction of the estimator is the size
of the truncation level which is linked to the unknown smoothness index.
Although estimation of the smoothness index is known to be a difficult task,
there is a data-driven choice of the truncation level that is sufficiently
precise for our purpose.
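
For a feeling of how the two rates differ, the following small sketch evaluates $n^{-\beta/(2\beta+1)}$ for finite $p$ and $(n/\log n)^{-\beta/(2\beta+1)}$ for $p=\infty$, ignoring constants. The function name `minimax_rate` is an illustrative assumption.

```python
import math

def minimax_rate(n, beta, p):
    """Rate n^{-beta/(2 beta + 1)} for finite p, (n / log n)^{-beta/(2 beta + 1)} for p = inf."""
    exponent = -beta / (2.0 * beta + 1.0)
    base = n / math.log(n) if math.isinf(p) else float(n)
    return base ** exponent
```

The ratio of the two rates is $(\log n)^{\beta/(2\beta+1)}$, which is the logarithmic factor that a procedure tuned for one loss typically loses under the other.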

Consider estimation of the regression function based on a model with
equidistant design and measurement errors generated from a fractional Gaussian
noise process. In previous literature, this model has been heuristically linked
to an experiment, where the antiderivative of the regression function is
continuously observed under additive perturbation by a fractional Brownian
motion. Based on a reformulation of the problem using reproducing kernel
Hilbert spaces, we derive abstract approximation conditions on function spaces
under which asymptotic equivalence between these models can be established and
show that the conditions are satisfied for certain Sobolev balls exceeding some
minimal smoothness. Furthermore, we construct a sequence space representation
and provide necessary conditions for asymptotic equivalence to hold.

Mimicking the maximum likelihood estimator, we construct first order
Cram\'er-Rao efficient and explicitly computable estimators for the scale
parameter $\sigma^2$ in the model $Z_{i,n}=\sigma
n^{-\beta}X_i+Y_i$, $i=1,\ldots,n$, $\beta>0$, with independent, stationary Gaussian
processes $(X_i)_{i\in\mathbb{N}}$ and $(Y_i)_{i\in\mathbb{N}}$, where
$(X_i)_{i\in\mathbb{N}}$ possibly exhibits long-range dependence. In a second
part, closed-form expressions for the asymptotic behavior of the corresponding
Fisher information are derived. Our main finding is that depending on the
behavior of the spectral densities at zero, the Fisher information has
asymptotically two different scaling regimes, which are separated by a sharp
phase transition. The most prominent example included in our analysis is the
Fisher information for the scaling factor of a high-frequency sample of
fractional Brownian motion under additive noise.

We develop further the spot volatility estimator introduced in Hoffmann, Munk
and Schmidt-Hieber (2012) from a practical point of view and make it useful for
the analysis of high-frequency financial data. In a first part, we adjust the
estimator substantially in order to achieve good finite sample performance and
to overcome difficulties arising from violations of the additive microstructure
noise model (e.g. jumps, rounding errors). These modifications are justified by
simulations. The second part is devoted to investigating the behavior of
volatility in response to macroeconomic events. We give evidence that the spot
volatility of Euro-BUND futures is considerably higher during press conferences
of the European Central Bank. As an outlook, we present an estimator for the
spot covolatility of two different prices.

We derive multiscale statistics for deconvolution in order to detect
qualitative features of the unknown density. An important example covered
within this framework is to test for local monotonicity on all scales
simultaneously. We investigate the moderately ill-posed setting, where the
Fourier transform of the error density in the deconvolution model is of
polynomial decay. For multiscale testing, we consider a calibration, motivated
by the modulus of continuity of Brownian motion. We investigate the performance
of our results from both a theoretical and a simulation-based point of view. A
major consequence of our work is that the detection of qualitative features of
a density in a deconvolution problem is a feasible task, although the minimax
rates for pointwise estimation are very slow.

We study nonparametric estimation of the diffusion coefficient from discrete
data, when the observations are blurred by additional noise. Such problems have
been studied over the last 10 years in several application fields, in
particular in high-frequency financial data modelling, but mainly from a
parametric and semiparametric point of view. This paper addresses the
nonparametric estimation of the path of the (possibly stochastic) diffusion
coefficient in a relatively general setting. By developing pre-averaging
techniques combined with wavelet thresholding, we construct adaptive estimators
that achieve a nearly optimal rate within a large scale of smoothness
constraints of Besov type. Since the diffusion coefficient is usually genuinely
random, we propose a new criterion to assess the quality of estimation; we
retrieve the usual minimax theory when this approach is restricted to a
deterministic diffusion coefficient. In particular, we take advantage of recent
results of Reiss [33] on asymptotic equivalence between a Gaussian diffusion
with additive noise and a Gaussian white noise model, in order to prove a sharp
lower bound.

We consider the models $Y_{i,n}=\int_0^{i/n}
\sigma(s)dW_s+\tau(i/n)\epsilon_{i,n}$ and $\tilde
Y_{i,n}=\sigma(i/n)W_{i/n}+\tau(i/n)\epsilon_{i,n}$, $i=1,\ldots,n$, where $W_t$
denotes a standard Brownian motion and the $\epsilon_{i,n}$ are centered i.i.d.
random variables with $E(\epsilon_{i,n}^2)=1$ and finite fourth moment.
Furthermore, $\sigma$ and $\tau$ are unknown deterministic functions, and $W_t$ and
$(\epsilon_{1,n},\ldots,\epsilon_{n,n})$ are assumed to be independent processes.
Based on a spectral decomposition of the covariance structures, we derive series
estimators for $\sigma^2$ and $\tau^2$ and investigate the rate of convergence of
their MISE depending on the smoothness of $\sigma^2$ and $\tau^2$. To this end,
specific basis functions and their corresponding Sobolev ellipsoids are introduced,
and we show that our estimators are optimal in the minimax sense. Our work is
motivated by microstructure noise models. Our major finding is that the
microstructure noise $\epsilon_{i,n}$ introduces an additional degree of
ill-posedness of $1/2$, irrespective of the tail behavior of $\epsilon_{i,n}$. The method is
illustrated by a small numerical study.
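
To illustrate the first model, the sketch below simulates the observations with an Euler discretization of the stochastic integral. The Gaussian choice for the noise, the example functions passed as `sigma` and `tau`, and the function name are assumptions for illustration; the model itself only requires centered noise with unit variance and finite fourth moment.

```python
import math
import random

def simulate_model(sigma, tau, n, rng):
    """Simulate Y_{i,n} = int_0^{i/n} sigma(s) dW_s + tau(i/n) eps_{i,n}, i = 1..n."""
    observations, integral = [], 0.0
    for i in range(1, n + 1):
        # Euler step: sigma at the left endpoint times a Brownian increment.
        integral += sigma((i - 1) / n) * rng.gauss(0.0, math.sqrt(1.0 / n))
        eps = rng.gauss(0.0, 1.0)  # assumed Gaussian noise with unit variance
        observations.append(integral + tau(i / n) * eps)
    return observations

y = simulate_model(lambda s: 1.0 + s, lambda t: 0.1, n=500, rng=random.Random(42))
```

The increments of the integral part have standard deviation of order $n^{-1/2}$, while the noise term stays of constant order, which gives some intuition for why recovering $\sigma^2$ from such data is an ill-posed problem.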

In this paper we derive lower bounds in the minimax sense for estimation of the
instantaneous volatility when the diffusion-type part cannot be observed directly
but only under additional Gaussian noise. Three different models are
considered. Our technique is based on a general inequality for the Kullback-Leibler
divergence of multivariate normal random variables and spectral analysis of the
processes. The derived lower bounds are indeed optimal. Upper bounds can be
found in Munk and Schmidt-Hieber [18]. Our major finding is that the Gaussian
microstructure noise introduces an additional degree of ill-posedness for each
model, respectively.