• ### Jackknife multiplier bootstrap: finite sample approximations to the $U$-process supremum with applications(1708.02705)

Feb. 14, 2019 math.PR, math.ST, stat.TH, stat.ME
This paper is concerned with finite sample approximations to the supremum of a non-degenerate $U$-process of a general order indexed by a function class. We are primarily interested in situations where the function class as well as the underlying distribution change with the sample size, and the $U$-process itself is not weakly convergent as a process. Such situations arise in a variety of modern statistical problems. We first consider Gaussian approximations, namely, approximating the $U$-process supremum by the supremum of a Gaussian process, and derive coupling and Kolmogorov distance bounds. Such Gaussian approximations are, however, often not directly applicable in statistical problems since the covariance function of the approximating Gaussian process is unknown. This motivates us to study bootstrap-type approximations to the $U$-process supremum. We propose a novel jackknife multiplier bootstrap (JMB) tailored to the $U$-process, and derive coupling and Kolmogorov distance bounds for the proposed JMB method. All these results are non-asymptotic, and established under fairly general conditions on function classes and underlying distributions. Key technical tools in the proofs are new local maximal inequalities for $U$-processes, which may be useful in other problems. We also discuss applications of the general approximation results to testing for qualitative features of nonparametric functions based on generalized local $U$-processes.
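As a rough illustration of the idea behind a jackknife multiplier bootstrap, the following minimal sketch handles only the simplest case: an order-2 $U$-statistic, a finite class of symmetric kernels, and scalar data. The function name `jmb_sup_draws`, the example kernels, and the normalization (no rescaling by the order of the $U$-statistic) are assumptions made for this sketch and are not taken from the paper. Each kernel's Hájek projection is estimated by a leave-one-out (jackknife) average, and the centered estimates are then perturbed by i.i.d. standard Gaussian multipliers.

```python
import numpy as np

def jmb_sup_draws(x, kernels, n_boot=500, seed=0):
    """Illustrative sketch (hypothetical helper, order-2 U-statistics only).

    x       : 1-D array of n observations
    kernels : list of symmetric functions h(a, b) that broadcast over arrays
    Returns the U-statistics and bootstrap draws of the supremum over kernels.
    """
    rng = np.random.default_rng(seed)
    n = x.shape[0]
    g_hat = np.empty((len(kernels), n))
    for k, h in enumerate(kernels):
        H = h(x[:, None], x[None, :])                  # n x n matrix of h(X_i, X_j)
        # Jackknife estimate of the Hajek projection at X_i: average over j != i.
        g_hat[k] = (H.sum(axis=1) - np.diag(H)) / (n - 1)
    u_stat = g_hat.mean(axis=1)                        # equals the usual U-statistic
    centred = g_hat - u_stat[:, None]
    xi = rng.standard_normal((n_boot, n))              # Gaussian multipliers
    # Multiplier process evaluated at each kernel, then maximised over kernels.
    draws = (xi @ centred.T / np.sqrt(n)).max(axis=1)
    return u_stat, draws

# Usage: two kernels, the variance kernel and the Gini mean difference kernel.
rng = np.random.default_rng(1)
x = rng.standard_normal(100)
kernels = [lambda a, b: 0.5 * (a - b) ** 2, lambda a, b: np.abs(a - b)]
u_stat, draws = jmb_sup_draws(x, kernels)
crit = np.quantile(draws, 0.95)                        # bootstrap critical value
```

The paper's setting is far more general (arbitrary order, function classes that change with the sample size), so this should be read only as a toy instance of the construction.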
• ### Inference on causal and structural parameters using many moment inequalities(1312.7614)

Oct. 18, 2018 math.ST, stat.TH, stat.AP, econ.EM
This paper considers the problem of testing many moment inequalities where the number of moment inequalities, denoted by $p$, is possibly much larger than the sample size $n$. There is a variety of economic applications where solving this problem allows one to carry out inference on causal and structural parameters; a notable example is the market structure model of Ciliberto and Tamer (2009) where $p=2^{m+1}$ with $m$ being the number of firms that could possibly enter the market. We consider the test statistic given by the maximum of $p$ Studentized (or $t$-type) inequality-specific statistics, and analyze various ways to compute critical values for the test statistic. Specifically, we consider critical values based upon (i) the union bound combined with a moderate deviation inequality for self-normalized sums, (ii) the multiplier and empirical bootstraps, and (iii) two-step and three-step variants of (i) and (ii) by incorporating the selection of uninformative inequalities that are far from being binding and a novel selection of weakly informative inequalities that are potentially binding but do not provide first order information. We prove validity of these methods, showing that under mild conditions, they lead to tests with the error in size decreasing polynomially in $n$ while allowing for $p$ being much larger than $n$; indeed, $p$ can be of order $\exp (n^{c})$ for some $c > 0$. Importantly, all these results hold without any restriction on the correlation structure between the $p$ Studentized statistics, and also hold uniformly with respect to suitably large classes of underlying distributions. Moreover, in the online supplement, we show validity of a test based on the block multiplier bootstrap in the case of dependent data under some general mixing conditions.
• ### On frequentist coverage errors of Bayesian credible sets in high dimensions(1803.03450)

March 9, 2018 math.ST, stat.TH
In this paper, we study frequentist coverage errors of Bayesian credible sets for an approximately linear regression model with (moderately) high dimensional regressors, where the dimension of the regressors may increase with but is smaller than the sample size. Specifically, we consider Bayesian inference on the slope vector by fitting a Gaussian distribution on the error term and putting priors on the slope vector together with the error variance. The Gaussian specification on the error distribution may be incorrect, so that we work with quasi-likelihoods. Under this setup, we derive finite sample bounds on frequentist coverage errors of Bayesian credible rectangles. Derivation of those bounds builds on a novel Berry-Esseen type bound on quasi-posterior distributions and recent results on high-dimensional CLT on hyper-rectangles. We use this general result to quantify coverage errors of Castillo-Nickl and $L^{\infty}$-credible bands for Gaussian white noise models, linear inverse problems, and (possibly non-Gaussian) nonparametric regression models. In particular, we show that Bayesian credible bands for those nonparametric models have coverage errors decaying polynomially fast in the sample size, implying advantages of Bayesian credible bands over confidence bands based on extreme value theory.
• ### Gaussian approximations and multiplier bootstrap for maxima of sums of high-dimensional random vectors(1212.6906)

Jan. 23, 2018 math.PR, math.ST, stat.TH, econ.EM
We derive a Gaussian approximation result for the maximum of a sum of high-dimensional random vectors. Specifically, we establish conditions under which the distribution of the maximum is approximated by that of the maximum of a sum of the Gaussian random vectors with the same covariance matrices as the original vectors. This result applies when the dimension of random vectors ($p$) is large compared to the sample size ($n$); in fact, $p$ can be much larger than $n$, without restricting correlations of the coordinates of these vectors. We also show that the distribution of the maximum of a sum of the random vectors with unknown covariance matrices can be consistently estimated by the distribution of the maximum of a sum of the conditional Gaussian random vectors obtained by multiplying the original vectors with i.i.d. Gaussian multipliers. This is the Gaussian multiplier (or wild) bootstrap procedure. Here too, $p$ can be large or even much larger than $n$. These distributional approximations, either Gaussian or conditional Gaussian, yield a high-quality approximation to the distribution of the original maximum, often with approximation error decreasing polynomially in the sample size, and hence are of interest in many applications. We demonstrate how our Gaussian approximations and the multiplier bootstrap can be used for modern high-dimensional estimation, multiple hypothesis testing, and adaptive specification testing. All these results contain nonasymptotic bounds on approximation errors.
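The Gaussian multiplier (wild) bootstrap described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `multiplier_bootstrap_max` is hypothetical, and the data are centered at the sample mean because the true mean is unknown in practice. Each bootstrap draw multiplies the centered rows by i.i.d. standard Gaussian multipliers, sums them, scales by $n^{-1/2}$, and takes the maximum over the $p$ coordinates.

```python
import numpy as np

def multiplier_bootstrap_max(X, n_boot=1000, seed=0):
    """Illustrative sketch (hypothetical helper).

    X : (n, p) array of observations.
    Returns n_boot draws of max_j n^{-1/2} sum_i e_i (X_ij - mean_j),
    where e_1, ..., e_n are i.i.d. N(0, 1) multipliers.
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape
    Xc = X - X.mean(axis=0)                   # centre each coordinate (assumption)
    e = rng.standard_normal((n_boot, n))      # one row of multipliers per draw
    return (e @ Xc / np.sqrt(n)).max(axis=1)  # maximum over the p coordinates

# Usage: p is larger than n here; the critical value is a bootstrap quantile.
rng = np.random.default_rng(1)
X = rng.standard_normal((100, 400))
draws = multiplier_bootstrap_max(X, n_boot=500)
crit = np.quantile(draws, 0.95)
```

Conditional on the data, each draw is the maximum of a Gaussian vector whose covariance is the sample covariance of the rows of `X`, which is what lets the bootstrap stand in for the unknown covariance matrix.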
• ### Uniform Post Selection Inference for LAD Regression and Other Z-estimation problems(1304.0282)

Jan. 22, 2018 math.ST, stat.TH, stat.ME, econ.EM
We develop uniformly valid confidence regions for regression coefficients in a high-dimensional sparse median regression model with homoscedastic errors. Our methods are based on a moment equation that is immunized against non-regular estimation of the nuisance part of the median regression function by using Neyman's orthogonalization. We establish that the resulting instrumental median regression estimator of a target regression coefficient is asymptotically normally distributed uniformly with respect to the underlying sparse model and is semi-parametrically efficient. We also generalize our method to a general non-smooth Z-estimation framework with the number of target parameters $p_1$ being possibly much larger than the sample size $n$. We extend Huber's results on asymptotic normality to this setting, demonstrating uniform asymptotic normality of the proposed estimators over $p_1$-dimensional rectangles, constructing simultaneous confidence bands on all of the $p_1$ target parameters, and establishing asymptotic validity of the bands uniformly over underlying approximately sparse models. Keywords: Instrument; Post-selection inference; Sparsity; Neyman's Orthogonal Score test; Uniformly valid inference; Z-estimation.
• ### Uniform confidence bands in deconvolution with unknown error distribution(1608.02251)

July 22, 2017 math.ST, stat.TH
This paper develops a method to construct uniform confidence bands in deconvolution when the error distribution is unknown. We mainly focus on the baseline setting where an auxiliary sample from the error distribution is available and the error density is ordinary smooth. The auxiliary sample may directly come from validation data, or can be constructed from panel data with a symmetric error distribution. We also present extensions of the results on confidence bands to the case of super-smooth error densities. Simulation studies demonstrate the performance of the multiplier bootstrap confidence band in the finite sample. We apply our method to the Outer Continental Shelf (OCS) Auction Data and draw confidence bands for the density of common values of mineral rights on oil and gas tracts. Finally, we present an application of our main theoretical result specifically to additive fixed-effect panel data models. As an empirical illustration of the panel data analysis, we draw confidence bands for the density of the total factor productivity in a manufacturing industry in Chile.
• ### Bootstrap confidence bands for spectral estimation of Lévy densities under high-frequency observations(1705.00586)

May 29, 2017 math.ST, stat.TH
This paper develops bootstrap methods to construct uniform confidence bands for nonparametric spectral estimation of Lévy densities under high-frequency observations. We assume that we observe $n$ discrete observations at frequency $1/\Delta > 0$, and work with the high-frequency setup where $\Delta = \Delta_{n} \to 0$ and $n\Delta \to \infty$ as $n \to \infty$. We employ a spectral (or Fourier-based) estimator of the Lévy density, and develop novel implementations of Gaussian multiplier (or wild) and empirical (or Efron's) bootstraps to construct confidence bands for the spectral estimator on a compact set that does not intersect the origin. We provide conditions under which the proposed confidence bands are asymptotically valid. Our confidence bands are shown to be asymptotically valid for a wide class of Lévy processes. We also develop a practical method for bandwidth selection, and conduct simulation studies to investigate the finite sample performance of the proposed confidence bands.
• ### A simple method to construct confidence bands in functional linear regression(1612.07490)

May 1, 2017 math.ST, stat.TH
This paper develops a simple method to construct confidence bands, centered at a principal component analysis (PCA) based estimator, for the slope function in a functional linear regression model with a scalar response variable and a functional predictor variable. The PCA-based estimator is a series estimator with estimated basis functions, and so construction of valid confidence bands for it is a non-trivial challenge. We propose a confidence band that aims at covering the slope function at "most" of points with a prespecified probability (level), and prove its asymptotic validity under suitable regularity conditions. Importantly, this is the first paper that derives confidence bands having theoretical justifications for the PCA-based estimator. We also propose a practical method to choose the cut-off level used in PCA-based estimation, and conduct numerical studies to verify the finite sample performance of the proposed confidence band. Finally, we apply our methodology to spectrometric data, and discuss extensions of our methodology to cases where additional vector-valued regressors are present.
• ### PCA-based estimation for functional linear regression with functional responses(1609.00286)

March 22, 2017 math.ST, stat.TH
This paper studies a regression model where both predictor and response variables are random functions. We consider a functional linear model where the conditional mean of the response variable at each time point is given by a linear functional of the predictor variable. In this paper, we are interested in estimation of the integral kernel $b(s,t)$ of the conditional expectation operator, where $s$ is an output variable while $t$ is a variable that interacts with the predictor variable. This problem is an ill-posed inverse problem, and we consider two estimators based on the functional principal component analysis (PCA). We show that under suitable regularity conditions, an estimator based on the single truncation attains the convergence rate for the integrated squared error that is characterized by smoothness of the function $b (s,t)$ in $t$ together with the decay rate of the eigenvalues of the covariance operator, but the rate does not depend on smoothness of $b(s,t)$ in $s$. This rate is shown to be minimax optimal, and consequently smoothness of $b(s,t)$ in $s$ does not affect difficulty of estimating $b$. We also consider an alternative estimator based on the double truncation, and provide conditions under which the alternative estimator attains the optimal rate. We conduct simulations to verify the performance of PCA-based estimators in the finite sample. Finally, we apply our estimators to investigate the relation between the lifetime pattern of working hours and total income, and the relation between the electricity spot price and the wind power infeed.
• ### Uniform confidence bands for nonparametric errors-in-variables regression(1702.03377)

June 13, 2019 math.ST, stat.TH
This paper develops a method to construct uniform confidence bands for a nonparametric regression function where a predictor variable is subject to a measurement error. We allow for the distribution of the measurement error to be unknown, but assume the availability of validation data or repeated measurements on the latent predictor variable. The proposed confidence band builds on the deconvolution kernel estimation and a novel application of the multiplier bootstrap method. We establish asymptotic validity of the proposed confidence band. To our knowledge, this is the first paper to derive asymptotically valid uniform confidence bands for nonparametric errors-in-variables regression.
• ### Valid Post-Selection Inference in High-Dimensional Approximately Sparse Quantile Regression Models(1312.7186)

June 23, 2016 math.ST, stat.TH, econ.EM
This work proposes new inference methods for a regression coefficient of interest in a (heterogeneous) quantile regression model. We consider a high-dimensional model where the number of regressors potentially exceeds the sample size but a subset of them suffices to construct a reasonable approximation to the conditional quantile function. The proposed methods are (explicitly or implicitly) based on orthogonal score functions that protect against moderate model selection mistakes, which are often inevitable in the approximately sparse model considered in the present paper. We establish the uniform validity of the proposed confidence regions for the quantile regression coefficient. Importantly, these methods directly apply to more than one variable and a continuum of quantile indices. In addition, the performance of the proposed methods is illustrated through Monte-Carlo experiments and an empirical example, dealing with risk factors in childhood malnutrition.
• ### Central Limit Theorems and Bootstrap in High Dimensions(1412.3661)

March 8, 2016 math.ST, stat.TH
This paper derives central limit and bootstrap theorems for probabilities that sums of centered high-dimensional random vectors hit hyperrectangles and sparsely convex sets. Specifically, we derive Gaussian and bootstrap approximations for probabilities $\Pr(n^{-1/2}\sum_{i=1}^n X_i\in A)$ where $X_1,\dots,X_n$ are independent random vectors in $\mathbb{R}^p$ and $A$ is a hyperrectangle, or, more generally, a sparsely convex set, and show that the approximation error converges to zero even if $p=p_n\to \infty$ as $n \to \infty$ and $p \gg n$; in particular, $p$ can be as large as $O(e^{Cn^c})$ for some constants $c,C>0$. The result holds uniformly over all hyperrectangles, or more generally, sparsely convex sets, and does not require any restriction on the correlation structure among coordinates of $X_i$. Sparsely convex sets are sets that can be represented as intersections of many convex sets whose indicator functions depend only on a small subset of their arguments, with hyperrectangles being a special case.
• ### Empirical and multiplier bootstraps for suprema of empirical processes of increasing complexity, and related Gaussian couplings(1502.00352)

Sept. 6, 2015 math.ST, stat.TH
We derive strong approximations to the supremum of the non-centered empirical process indexed by a possibly unbounded VC-type class of functions by the suprema of the Gaussian and bootstrap processes. The bounds of these approximations are non-asymptotic, which allows us to work with classes of functions whose complexity increases with the sample size. The construction of couplings is not of the Hungarian type and is instead based on the Slepian-Stein methods and Gaussian comparison inequalities. The increasing complexity of classes of functions and non-centrality of the processes make the results useful for applications in modern nonparametric statistics (Giné and Nickl, 2015), in particular allowing us to study the power properties of nonparametric tests using Gaussian and bootstrap approximations.
• ### Some New Asymptotic Theory for Least Squares Series: Pointwise and Uniform Results(1212.0442)

June 17, 2015 stat.ME, econ.EM
In applications it is common that the exact form of a conditional expectation is unknown and having flexible functional forms can lead to improvements. The series method offers this flexibility by approximating the unknown function based on $k$ basis functions, where $k$ is allowed to grow with the sample size $n$. We consider series estimators for the conditional mean in light of: (i) sharp LLNs for matrices derived from the noncommutative Khinchin inequalities, (ii) bounds on the Lebesgue factor that controls the ratio between the $L^\infty$ and $L_2$-norms of approximation errors, (iii) maximal inequalities for processes whose entropy integrals diverge, and (iv) strong approximations to series-type processes. These technical tools allow us to contribute to the series literature, specifically the seminal work of Newey (1997), as follows. First, we weaken the condition on the number $k$ of approximating functions used in series estimation from the typical $k^2/n \to 0$ to $k/n \to 0$, up to log factors, which was available only for spline series before. Second, we derive $L_2$ rates and pointwise central limit theorem results when the approximation error vanishes. Under an incorrectly specified model, i.e. when the approximation error does not vanish, analogous results are also shown. Third, under stronger conditions we derive uniform rates and functional central limit theorems that hold whether or not the approximation error vanishes. That is, we derive the strong approximation for the entire estimate of the nonparametric function. We derive uniform rates, Gaussian approximations, and uniform confidence bands for a wide collection of linear functionals of the conditional expectation function.
• ### Anti-concentration and honest, adaptive confidence bands(1303.7152)

Sept. 23, 2014 math.ST, stat.TH
Modern construction of uniform confidence bands for nonparametric densities (and other functions) often relies on the classical Smirnov-Bickel-Rosenblatt (SBR) condition; see, for example, Giné and Nickl [Probab. Theory Related Fields 143 (2009) 569-596]. This condition requires the existence of a limit distribution of an extreme value type for the supremum of a studentized empirical process (equivalently, for the supremum of a Gaussian process with the same covariance function as that of the studentized empirical process). The principal contribution of this paper is to remove the need for this classical condition. We show that a considerably weaker sufficient condition is derived from an anti-concentration property of the supremum of the approximating Gaussian process, and we derive an inequality leading to such a property for separable Gaussian processes. We refer to the new condition as a generalized SBR condition. Our new result shows that the supremum does not concentrate too fast around any value. We then apply this result to derive a Gaussian multiplier bootstrap procedure for constructing honest confidence bands for nonparametric density estimators (this result can be applied in other nonparametric problems as well). An essential advantage of our approach is that it applies generically even in those cases where the limit distribution of the supremum of the studentized empirical process does not exist (or is unknown). This is of particular importance in problems where resolution levels or other tuning parameters have been chosen in a data-driven fashion, which is needed for adaptive constructions of the confidence bands. Finally, of independent interest is our introduction of a new, practical version of Lepski's method, which computes the optimal, nonconservative resolution levels via a Gaussian multiplier bootstrap method.
• ### Gaussian approximation of suprema of empirical processes(1212.6885)

Aug. 17, 2014 math.PR, math.ST, stat.TH
This paper develops a new direct approach to approximating suprema of general empirical processes by a sequence of suprema of Gaussian processes, without taking the route of approximating whole empirical processes in the sup-norm. We prove an abstract approximation theorem applicable to a wide variety of statistical problems, such as construction of uniform confidence bands for functions. Notably, the bound in the main approximation theorem is nonasymptotic and the theorem allows for functions that index the empirical process to be unbounded and have entropy divergent with the sample size. The proof of the approximation theorem builds on a new coupling inequality for maxima of sums of random vectors, the proof of which depends on an effective use of Stein's method for normal approximation, and some new empirical process techniques. We study applications of this approximation theorem to local and series empirical processes arising in nonparametric estimation via kernel and series methods, where the classes of functions change with the sample size and are non-Donsker. Importantly, our new technique is able to prove the Gaussian approximation for the supremum type statistics under weak regularity conditions, especially concerning the bandwidth and the number of series functions, in those examples.
• ### Comparison and anti-concentration bounds for maxima of Gaussian random vectors(1301.4807)

April 13, 2014 math.PR, math.ST, stat.TH
Slepian and Sudakov-Fernique type inequalities, which compare expectations of maxima of Gaussian random vectors under certain restrictions on the covariance matrices, play an important role in probability theory, especially in empirical process and extreme value theories. Here we give explicit comparisons of expectations of smooth functions and distribution functions of maxima of Gaussian random vectors without any restriction on the covariance matrices. We also establish an anti-concentration inequality for the maximum of a Gaussian random vector, which derives a useful upper bound on the Lévy concentration function for the Gaussian maximum. The bound is dimension-free and applies to vectors with arbitrary covariance matrices. This anti-concentration inequality plays a crucial role in establishing bounds on the Kolmogorov distance between maxima of Gaussian random vectors. These results have immediate applications in mathematical statistics. As an example of application, we establish a conditional multiplier central limit theorem for maxima of sums of independent random vectors where the dimension of the vectors is possibly much larger than the sample size.
• ### Estimation and inference for linear panel data models under misspecification when both $n$ and $T$ are large(1403.2085)

March 11, 2014 math.ST, stat.TH
This paper considers fixed effects (FE) estimation for linear panel data models under possible model misspecification when both the number of individuals, $n$, and the number of time periods, $T$, are large. We first clarify the probability limit of the FE estimator and argue that this probability limit can be regarded as a pseudo-true parameter. We then establish the asymptotic distributional properties of the FE estimator around the pseudo-true parameter when $n$ and $T$ jointly go to infinity. Notably, we show that the FE estimator suffers from the incidental parameters bias of which the top order is $O(T^{-1})$, and even after the incidental parameters bias is completely removed, the rate of convergence of the FE estimator depends on the degree of model misspecification and is either $(nT)^{-1/2}$ or $n^{-1/2}$. Second, we establish asymptotically valid inference on the (pseudo-true) parameter. Specifically, we derive the asymptotic properties of the clustered covariance matrix (CCM) estimator and the cross section bootstrap, and show that they are robust to model misspecification. This establishes a rigorous theoretical ground for the use of the CCM estimator and the cross section bootstrap when model misspecification and the incidental parameters bias (in the coefficient estimate) are present. We conduct Monte Carlo simulations to evaluate the finite sample performance of the estimators and inference methods, together with a simple application to the unemployment dynamics in the U.S.
• ### Quasi-Bayesian analysis of nonparametric instrumental variables models(1204.2108)

Nov. 20, 2013 math.ST, stat.TH, stat.ME
This paper aims at developing a quasi-Bayesian analysis of the nonparametric instrumental variables model, with a focus on the asymptotic properties of quasi-posterior distributions. In this paper, instead of imposing a distributional assumption on the data generating process, we consider a quasi-likelihood induced from the conditional moment restriction, and put priors on the function-valued parameter. We call the resulting posterior the quasi-posterior, which corresponds to the "Gibbs posterior" in the literature. Here we focus on priors constructed on slowly growing finite-dimensional sieves. We derive rates of contraction and a nonparametric Bernstein-von Mises type result for the quasi-posterior distribution, and rates of convergence for the quasi-Bayes estimator defined by the posterior expectation. We show that, with priors suitably chosen, the quasi-posterior distribution (the quasi-Bayes estimator) attains the minimax optimal rate of contraction (convergence, resp.). These results greatly sharpen the previous related work.
• ### Estimation in functional linear quantile regression(1202.4850)

Feb. 27, 2013 math.ST, stat.TH, stat.ME
This paper studies estimation in functional linear quantile regression in which the dependent variable is scalar while the covariate is a function, and the conditional quantile for each fixed quantile index is modeled as a linear functional of the covariate. Here we suppose that covariates are discretely observed and sampling points may differ across subjects, where the number of measurements per subject increases with the sample size. Also, we allow the quantile index to vary over a given subset of the open unit interval, so the slope function is a function of two variables: (typically) time and quantile index. Likewise, the conditional quantile function is a function of the quantile index and the covariate. We consider an estimator for the slope function based on the principal component basis. An estimator for the conditional quantile function is obtained by a plug-in method. Since the so-constructed plug-in estimator does not necessarily satisfy the monotonicity constraint with respect to the quantile index, we also consider a class of monotonized estimators for the conditional quantile function. We establish rates of convergence for these estimators under suitable norms, showing that these rates are optimal in a minimax sense under some smoothness assumptions on the covariance kernel of the covariate and the slope function. Empirical choice of the cutoff level is studied by using simulations.
• ### Two-step estimation of high dimensional additive models(1207.5313)

Jan. 29, 2013 math.ST, stat.TH, stat.ME
This paper investigates the two-step estimation of a high dimensional additive regression model, in which the number of nonparametric additive components is potentially larger than the sample size but the number of significant additive components is sufficiently small. The approach investigated consists of two steps. The first step implements the variable selection, typically by the group Lasso, and the second step applies the penalized least squares estimation with Sobolev penalties to the selected additive components. Such a procedure is computationally simple to implement and, in our numerical experiments, works reasonably well. Despite its intuitive nature, the theoretical properties of this two-step procedure have to be carefully analyzed, since the effect of the first step variable selection is random, and generally it may contain redundant additive components and at the same time miss significant additive components. This paper derives a generic performance bound on the two-step estimation procedure allowing for these situations, and studies in detail the overall performance when the first step variable selection is implemented by the group Lasso.
• ### Group Lasso for high dimensional sparse quantile regression models(1103.1458)

March 25, 2011 math.ST, stat.TH, stat.ME
This paper studies the statistical properties of the group Lasso estimator for high dimensional sparse quantile regression models where the number of explanatory variables (or the number of groups of explanatory variables) is possibly much larger than the sample size while the number of variables in "active" groups is sufficiently small. We establish a non-asymptotic bound on the $\ell_{2}$-estimation error of the estimator. This bound explains situations under which the group Lasso estimator is potentially superior/inferior to the $\ell_{1}$-penalized quantile regression estimator in terms of the estimation error. We also propose a data-dependent choice of the tuning parameter to make the method more practical, by extending the original proposal of Belloni and Chernozhukov (2011) for the $\ell_{1}$-penalized quantile regression estimator. As an application, we analyze high dimensional additive quantile regression models. We show that under a set of suitable regularity conditions, the group Lasso estimator can attain the convergence rate arbitrarily close to the oracle rate. Finally, we conduct simulation experiments to examine our theoretical results.