• Consider the multivariate nonparametric regression model. It is shown that estimators based on sparsely connected deep neural networks with ReLU activation function and properly chosen network architecture achieve the minimax rates of convergence (up to $\log n$-factors) under a general composition assumption on the regression function. The framework includes many well-studied structural constraints such as (generalized) additive models. While there is a lot of flexibility in the network architecture, the tuning parameter is the sparsity of the network. Specifically, we consider large networks with the number of potential network parameters exceeding the sample size. The analysis gives some insights into why multilayer feedforward neural networks perform well in practice. Interestingly, for ReLU activation function the depth (number of layers) of the neural network architectures plays an important role, and our theory suggests that for nonparametric regression, scaling the network depth with the sample size is natural. It is also shown that under the composition assumption wavelet estimators can only achieve suboptimal rates.
  • Given a sample of a Poisson point process with intensity $\lambda_f(x,y) = n \mathbf{1}(f(x) \leq y),$ we study recovery of the boundary function $f$ from a nonparametric Bayes perspective. Because of the irregularity of this model, the analysis is non-standard. We establish a general result for the posterior contraction rate with respect to the $L^1$-norm based on entropy and one-sided small probability bounds. From this, specific posterior contraction results are derived for Gaussian process priors and priors based on random wavelet series.
  • It is well-known that density estimation on the unit interval is asymptotically equivalent to a Gaussian white noise experiment, provided the densities have H\"older smoothness larger than $1/2$ and are uniformly bounded away from zero. We derive matching lower and constructive upper bounds for the Le Cam deficiencies between these experiments, with explicit dependence on both the sample size and the size of the densities in the parameter space. As a consequence, we derive sharp conditions on how small the densities can be for asymptotic equivalence to hold. The related case of Poisson intensity estimation is also treated.
  • Deep neural networks (DNNs) generate much richer function spaces than shallow networks. Since the function spaces induced by shallow networks have several approximation-theoretic drawbacks, this does not, however, necessarily explain the success of deep networks. In this article we take another route by comparing the expressive power of DNNs with ReLU activation function to piecewise linear spline methods. We show that MARS (multivariate adaptive regression splines) is improperly learnable by DNNs in the sense that for any given function that can be expressed as a function in MARS with $M$ parameters there exists a multilayer neural network with $O(M \log (M/\varepsilon))$ parameters that approximates this function up to sup-norm error $\varepsilon.$ We show a similar result for expansions with respect to the Faber-Schauder system. Based on this, we derive risk comparison inequalities that bound the statistical risk of fitting a neural network by the statistical risk of spline-based methods. This shows that deep networks perform better than, or only slightly worse than, the considered spline methods. We provide a constructive proof for the function approximations.
  • The random coefficients model is an extension of the linear regression model that allows for unobserved heterogeneity in the population by modeling the regression coefficients as random variables. Given data from this model, the statistical challenge is to recover information about the joint density of the random coefficients, which is a multivariate and ill-posed problem. Because of the curse of dimensionality and the ill-posedness, pointwise nonparametric estimation of the joint density is difficult and suffers from slow convergence rates. Larger features, such as an increase of the density along some direction or a well-accentuated mode, can, however, be detected much more easily from data by means of statistical tests. In this article, we follow this strategy and construct tests and confidence statements for qualitative features of the joint density, such as increases, decreases and modes. We propose a multiple testing approach based on aggregating single tests which are designed to extract shape information on fixed scales and directions. Using recent tools for Gaussian approximations of multivariate empirical processes, we derive expressions for the critical value. We apply our method to simulated and real data.
  • It is well-known that density estimation on the unit interval is asymptotically equivalent to a Gaussian white noise experiment, provided the densities are sufficiently smooth and uniformly bounded away from zero. We show that a uniform lower bound, whose size we sharply characterize, is in general necessary for asymptotic equivalence to hold.
  • We investigate the regularity of the positive roots of a non-negative function of one variable. A modified H\"older space $\mathcal{F}^\beta$ is introduced such that if $f\in \mathcal{F}^\beta$ then $f^\alpha \in C^{\alpha \beta}$. This provides sufficient conditions to overcome the usual limitation in the square root case ($\alpha = 1/2$) for H\"older functions that $f^{1/2}$ need be no more than $C^1$ in general. We also derive bounds on the wavelet coefficients of $f^\alpha$, which provide a finer understanding of its local regularity.
  • We study a class of statistical inverse problems with non-linear pointwise operators motivated by concrete statistical applications. A two-step procedure is proposed, where the first step smooths the data and inverts the non-linearity. This reduces the initial non-linear problem to a linear inverse problem with deterministic noise, which is then solved in a second step. The noise reduction step is based on wavelet thresholding and is shown to be minimax optimal (up to logarithmic factors) in a pointwise function-dependent sense. Our analysis is based on a modified notion of H\"older smoothness scales that are natural in this setting.
  • We investigate the problem of deriving posterior concentration rates under different loss functions in nonparametric Bayes. We first provide a lower bound on posterior coverages of shrinking neighbourhoods that relates the metric or loss under which the shrinking neighbourhood is considered to an intrinsic pre-metric linked to frequentist separation rates. In the Gaussian white noise model, we construct feasible priors based on a spike and slab procedure reminiscent of wavelet thresholding that achieve adaptive rates of contraction under $L^2$ or $L^{\infty}$ metrics when the underlying parameter belongs to a collection of H\"{o}lder balls and that moreover achieve our lower bound. We analyse the consequences in terms of asymptotic behaviour of posterior credible balls as well as frequentist minimax adaptive estimation. Our results are complemented by an upper bound for the contraction rate under an arbitrary loss in a generic regular experiment. The upper bound is attained for certain sieve priors and allows us to extend our results to density estimation.
  • We study full Bayesian procedures for high-dimensional linear regression under sparsity constraints. The prior is a mixture of point masses at zero and continuous distributions. Under compatibility conditions on the design matrix, the posterior distribution is shown to contract at the optimal rate for recovery of the unknown sparse vector, and to give optimal prediction of the response vector. It is also shown to select the correct sparse model, or at least the coefficients that are significantly different from zero. The asymptotic shape of the posterior distribution is characterized and employed in the construction and study of credible sets for uncertainty quantification.
  • The first Bayesian results for the sparse normal means problem were proven for spike-and-slab priors. However, these priors are less convenient from a computational point of view. Meanwhile, a large number of continuous shrinkage priors has been proposed. Many of these shrinkage priors can be written as a scale mixture of normals, which makes them particularly easy to implement. We propose general conditions on the prior on the local variance in scale mixtures of normals, such that posterior contraction at the minimax rate is assured. The conditions require tails at least as heavy as Laplace, but not too heavy, and a large amount of mass around zero relative to the tails, more so as the sparsity increases. These conditions give some general guidelines for choosing a shrinkage prior for estimation under a nearly black sparsity assumption. We verify these conditions for the class of priors considered by Ghosh and Chakrabarti (2015), which includes the horseshoe and the normal-exponential gamma priors, and for the horseshoe+, the inverse-Gaussian prior, the normal-gamma prior, and the spike-and-slab Lasso, and thus extend the number of shrinkage priors which are known to lead to posterior contraction at the minimax estimation rate.
  • Consider nonparametric function estimation under $L^p$-loss. The minimax rate for estimation of the regression function over a H\"older ball with smoothness index $\beta$ is $n^{-\beta/(2\beta+1)}$ if $1\leq p<\infty$ and $(n/\log n)^{-\beta/(2\beta+1)}$ if $p=\infty.$ There are many known procedures that either attain this rate for $p=\infty$ but are suboptimal by a $\log n$ factor in the case $p<\infty$, or the other way around. In this article, we construct an estimator that simultaneously achieves the optimal rates under $L^p$-risk for all $1\leq p\leq \infty$ without prior knowledge of $\beta.$ In contrast to classical wavelet thresholding methods that kill small empirical wavelet coefficients and keep large ones, it is essential for simultaneous adaptation that on each resolution level, the largest empirical wavelet coefficients are truncated. This leads to a completely different point of view on wavelet thresholding. The crucial part in the construction of the estimator is the choice of the truncation level, which is linked to the unknown smoothness index. Although estimation of the smoothness index is known to be a difficult task, there is a data-driven choice of the truncation level that is sufficiently precise for our purpose.
  • Consider estimation of the regression function based on a model with equidistant design and measurement errors generated from a fractional Gaussian noise process. In previous literature, this model has been heuristically linked to an experiment, where the anti-derivative of the regression function is continuously observed under additive perturbation by a fractional Brownian motion. Based on a reformulation of the problem using reproducing kernel Hilbert spaces, we derive abstract approximation conditions on function spaces under which asymptotic equivalence between these models can be established and show that the conditions are satisfied for certain Sobolev balls exceeding some minimal smoothness. Furthermore, we construct a sequence space representation and provide necessary conditions for asymptotic equivalence to hold.
  • Mimicking the maximum likelihood estimator, we construct first-order Cram\'er-Rao efficient and explicitly computable estimators for the scale parameter $\sigma^2$ in the model $Z_{i,n}=\sigma n^{-\beta}X_i+Y_i$, $i=1,\ldots,n$, $\beta>0$, with independent, stationary Gaussian processes $(X_i)_{i\in\mathbb{N}}$ and $(Y_i)_{i\in\mathbb{N}}$, where $(X_i)_{i\in\mathbb{N}}$ may exhibit long-range dependence. In a second part, closed-form expressions for the asymptotic behavior of the corresponding Fisher information are derived. Our main finding is that depending on the behavior of the spectral densities at zero, the Fisher information has asymptotically two different scaling regimes, which are separated by a sharp phase transition. The most prominent example included in our analysis is the Fisher information for the scaling factor of a high-frequency sample of fractional Brownian motion under additive noise.
  • We develop further the spot volatility estimator introduced in Hoffmann, Munk and Schmidt-Hieber (2012) from a practical point of view and make it useful for the analysis of high-frequency financial data. In a first part, we adjust the estimator substantially in order to achieve good finite sample performance and to overcome difficulties arising from violations of the additive microstructure noise model (e.g. jumps, rounding errors). These modifications are justified by simulations. The second part is devoted to investigating the behavior of volatility in response to macroeconomic events. We give evidence that the spot volatility of Euro-BUND futures is considerably higher during press conferences of the European Central Bank. As an outlook, we present an estimator for the spot covolatility of two different prices.
  • We derive multiscale statistics for deconvolution in order to detect qualitative features of the unknown density. An important example covered within this framework is to test for local monotonicity on all scales simultaneously. We investigate the moderately ill-posed setting, where the Fourier transform of the error density in the deconvolution model is of polynomial decay. For multiscale testing, we consider a calibration motivated by the modulus of continuity of Brownian motion. We investigate the performance of our results from both a theoretical and a simulation-based point of view. A major consequence of our work is that the detection of qualitative features of a density in a deconvolution problem is a doable task, although the minimax rates for pointwise estimation are very slow.
  • We study nonparametric estimation of the diffusion coefficient from discrete data, when the observations are blurred by additional noise. Such problems have been studied over the last ten years in several fields of application, in particular in the modelling of high-frequency financial data, but mainly from a parametric and semiparametric point of view. This paper addresses the nonparametric estimation of the path of the (possibly stochastic) diffusion coefficient in a relatively general setting. By developing pre-averaging techniques combined with wavelet thresholding, we construct adaptive estimators that achieve a nearly optimal rate within a large scale of smoothness constraints of Besov type. Since the diffusion coefficient is usually genuinely random, we propose a new criterion to assess the quality of estimation; we retrieve the usual minimax theory when this approach is restricted to a deterministic diffusion coefficient. In particular, we take advantage of recent results of Reiss [33] on asymptotic equivalence between a Gaussian diffusion with additive noise and the Gaussian white noise model in order to prove a sharp lower bound.
  • We consider the models $Y_{i,n}=\int_0^{i/n} \sigma(s)\,dW_s+\tau(i/n)\epsilon_{i,n}$ and $\tilde Y_{i,n}=\sigma(i/n)W_{i/n}+\tau(i/n)\epsilon_{i,n}$, $i=1,\ldots,n$, where $W_t$ denotes a standard Brownian motion and $\epsilon_{i,n}$ are centered i.i.d. random variables with $E(\epsilon_{i,n}^2)=1$ and finite fourth moment. Furthermore, $\sigma$ and $\tau$ are unknown deterministic functions, and $W_t$ and $(\epsilon_{1,n},\ldots,\epsilon_{n,n})$ are assumed to be independent processes. Based on a spectral decomposition of the covariance structures, we derive series estimators for $\sigma^2$ and $\tau^2$ and investigate the rate of convergence of their MISE depending on the smoothness of $\sigma^2$ and $\tau^2$. To this end, specific basis functions and their corresponding Sobolev ellipsoids are introduced, and we show that our estimators are optimal in the minimax sense. Our work is motivated by microstructure noise models. Our major finding is that the microstructure noise $\epsilon_{i,n}$ introduces an additional degree of ill-posedness of $1/2$, irrespective of the tail behavior of $\epsilon_{i,n}$. The method is illustrated by a small numerical study.
  • In this paper we derive lower bounds in the minimax sense for estimation of the instantaneous volatility when the diffusion-type part cannot be observed directly but only under additional Gaussian noise. Three different models are considered. Our technique is based on a general inequality for the Kullback-Leibler divergence of multivariate normal random variables and on spectral analysis of the processes. The derived lower bounds are indeed optimal. Upper bounds can be found in Munk and Schmidt-Hieber [18]. Our major finding is that the Gaussian microstructure noise introduces an additional degree of ill-posedness for each model, respectively.
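The Faber-Schauder result in the MARS abstract relies on piecewise linear building blocks being easy to express with ReLU units. The sketch below illustrates only the simplest part of this, under our own conventions: the hat function ($2x$ on $[0,1/2]$, $2(1-x)$ on $[1/2,1]$, $0$ elsewhere) is exactly a ReLU network with one hidden layer of three units, and the Faber-Schauder partial sum up to level $J$ interpolates $f$ at the dyadic points $k2^{-(J+1)}$. All function names are hypothetical. This is not the constructive proof from the article, and it does not address the products of hinge functions appearing in MARS, which are not piecewise linear and can only be approximated.

```python
import numpy as np


def relu(x):
    return np.maximum(x, 0.0)


def hat(x):
    """Faber-Schauder mother function: 2x on [0, 1/2], 2(1 - x) on [1/2, 1], 0 elsewhere.
    Written exactly as a one-hidden-layer ReLU network with three units."""
    return 2.0 * relu(x) - 4.0 * relu(x - 0.5) + 2.0 * relu(x - 1.0)


def faber_schauder_coeffs(f, J):
    """Coefficients c[j][k] for levels j = 0..J: value of f at the midpoint of the
    dyadic cell [k 2^-j, (k+1) 2^-j] minus the average of f at the two endpoints."""
    coeffs = []
    for j in range(J + 1):
        k = np.arange(2 ** j)
        left, right = k / 2 ** j, (k + 1) / 2 ** j
        mid = (2 * k + 1) / 2 ** (j + 1)
        coeffs.append(f(mid) - 0.5 * (f(left) + f(right)))
    return coeffs


def faber_schauder_sum(f, coeffs, x):
    """Linear interpolant between (0, f(0)) and (1, f(1)) plus the hat terms
    c[j][k] * hat(2^j x - k); each hat term is a three-unit ReLU network."""
    out = f(0.0) * (1.0 - x) + f(1.0) * x
    for j, c in enumerate(coeffs):
        for k, c_jk in enumerate(c):
            out = out + c_jk * hat(2 ** j * x - k)
    return out


# Usage with an arbitrary smooth test function (illustration only).
f = lambda x: np.sin(3 * x) + x ** 2
J = 4
coeffs = faber_schauder_coeffs(f, J)
grid = np.arange(2 ** (J + 1) + 1) / 2 ** (J + 1)
print(np.max(np.abs(faber_schauder_sum(f, coeffs, grid) - f(grid))))
```

Because the partial sum interpolates $f$ on the dyadic grid, the printed error is at rounding-error level; between grid points the error is that of piecewise linear interpolation.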
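The simultaneous-adaptation abstract contrasts classical thresholding, which kills small empirical wavelet coefficients, with truncating the largest coefficients on each resolution level. A minimal sketch of the two rules on Haar coefficients follows. We assume here that "truncation" means clipping each coefficient on level $j$ to $[-T_j, T_j]$, and we plug in a hypothetical level $T_j = C\,2^{-j(\beta+1/2)}$ with $\beta$ treated as known, mimicking the usual coefficient decay over H\"older balls. The article's data-driven choice of the truncation level is not reproduced, and all function names are illustrative.

```python
import numpy as np


def haar_forward(x):
    """Orthonormal Haar transform of a signal of length 2^J.
    Returns the scaling coefficient and the detail arrays ordered coarse to fine,
    so that details[j] holds the 2^j coefficients of resolution level j."""
    a = np.asarray(x, dtype=float)
    details = []
    while a.size > 1:
        even, odd = a[0::2], a[1::2]
        details.append((even - odd) / np.sqrt(2.0))
        a = (even + odd) / np.sqrt(2.0)
    return a, details[::-1]


def haar_inverse(a, details):
    """Invert haar_forward (details ordered coarse to fine)."""
    a = np.asarray(a, dtype=float)
    for d in details:
        out = np.empty(2 * a.size)
        out[0::2] = (a + d) / np.sqrt(2.0)
        out[1::2] = (a - d) / np.sqrt(2.0)
        a = out
    return a


def hard_threshold(details, t):
    """Classical rule: kill coefficients with |d| <= t[j], keep the large ones."""
    return [np.where(np.abs(d) > tj, d, 0.0) for d, tj in zip(details, t)]


def truncate_levelwise(details, T):
    """Truncation as assumed in this sketch: clip every coefficient on level j to
    [-T[j], T[j]], so small coefficients are kept and only the largest are capped."""
    return [np.clip(d, -Tj, Tj) for d, Tj in zip(details, T)]


# Illustration on simulated data; beta and C are hypothetical inputs, whereas the
# abstract describes a data-driven choice of the truncation level.
rng = np.random.default_rng(1)
n = 2 ** 10
x = np.arange(n) / n
y = np.sin(2 * np.pi * x) + 0.1 * rng.standard_normal(n)
a, d = haar_forward(y / np.sqrt(n))  # rescaled so coefficients roughly match L^2 ones
beta, C = 1.0, 1.0
T = [C * 2.0 ** (-j * (beta + 0.5)) for j in range(len(d))]
fit = np.sqrt(n) * haar_inverse(a, truncate_levelwise(d, T))
```

The two rules act on opposite ends of the coefficient distribution: `hard_threshold` changes only coefficients below the threshold, while `truncate_levelwise` changes only those above $T_j$.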
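For the first microstructure noise model above, $Y_{i,n}=\int_0^{i/n} \sigma(s)\,dW_s+\tau(i/n)\epsilon_{i,n}$, a short simulation shows why the noise term changes the problem. The sketch below generates $Y_{i,n}$ under assumptions of our own (Euler discretization of the stochastic integral, Gaussian $\epsilon_{i,n}$, constant $\sigma$ and $\tau$) and computes the naive sum of squared increments. For constant $\sigma$ and $\tau$ each increment has variance $\sigma^2/n + 2\tau^2$, so the sum is of order $\sigma^2 + 2n\tau^2$ and is dominated by the noise. This is a standard illustration, not the series estimator described in the abstract, and the function name is hypothetical.

```python
import numpy as np


def simulate_microstructure(sigma, tau, n, rng):
    """Simulate Y_{i,n} = int_0^{i/n} sigma(s) dW_s + tau(i/n) eps_{i,n}, i = 1, ..., n.

    sigma and tau must be vectorized functions on [0, 1]. The stochastic integral is
    approximated by an Euler sum with sigma at the left end of each cell (exact when
    sigma is constant). eps_{i,n} is standard normal here, an assumption: the model
    only asks for centered i.i.d. noise with unit variance and finite fourth moment."""
    t = np.arange(1, n + 1) / n
    dW = rng.standard_normal(n) * np.sqrt(1.0 / n)
    X = np.cumsum(sigma(t - 1.0 / n) * dW)
    eps = rng.standard_normal(n)
    return t, X + tau(t) * eps


# Constant sigma = 1 and tau = 0.1 (illustrative values).
rng = np.random.default_rng(0)
n = 10_000
t, Y = simulate_microstructure(lambda s: np.ones_like(s), lambda s: 0.1 * np.ones_like(s), n, rng)

# Naive sum of squared increments: its mean is (n - 1) * (sigma^2 / n + 2 tau^2),
# roughly 201 here, far from the integrated volatility int_0^1 sigma^2 ds = 1.
rv = np.sum(np.diff(Y) ** 2)
print(rv)
```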