• A multivariate quantile regression model with a factor structure is proposed to mine data with many responses of interest. The factor structure is allowed to vary with the quantile levels, which makes our framework more flexible than the classical factor models. The model is estimated with nuclear norm regularization in order to accommodate the high dimensionality of the data, but the resulting optimization problem can only be solved efficiently in an approximate manner by off-the-shelf optimization methods. Such a scenario is often seen when the empirical risk is non-smooth or the numerical procedure involves expensive subroutines such as singular value decomposition. To ensure that the approximate estimator accurately estimates the model, sufficient conditions on the optimization error and non-asymptotic error bounds are established to characterize the risk of the proposed estimator. A numerical procedure that provably achieves small approximation error is proposed. The merits of our model and the proposed numerical procedures are demonstrated through Monte Carlo experiments and an application to finance involving a large pool of asset returns.
  • The increased availability of massive data sets provides a unique opportunity to discover subtle patterns in their distributions, but also imposes overwhelming computational challenges. To fully utilize the information contained in big data, we propose a two-step procedure: (i) estimate conditional quantile functions at different levels in a parallel computing environment; (ii) construct a conditional quantile regression process through projection based on these estimated quantile curves. Our general quantile regression framework covers both linear models with fixed or growing dimension and series approximation models. We prove that the proposed procedure does not sacrifice any statistical inferential accuracy provided that the numbers of distributed computing units and quantile levels are chosen properly. In particular, a sharp upper bound for the former and a sharp lower bound for the latter are derived to capture the minimal computational cost from a statistical perspective. As an important application, statistical inference on conditional distribution functions is considered. Moreover, we propose computationally efficient approaches to conducting inference in the distributed estimation setting described above. Those approaches directly utilize the availability of estimators from sub-samples and can be carried out at almost no additional computational cost. Simulations confirm our statistical inferential theory.
  • A collection of quantile curves provides a complete picture of conditional distributions. Properly centered and scaled versions of estimated curves at various quantile levels give rise to the so-called quantile regression process (QRP). In this paper, we establish weak convergence of QRP in a general series approximation framework, which includes linear models with increasing dimension, nonparametric models and partial linear models. An interesting consequence is obtained in the last class of models, where parametric and nonparametric estimators are shown to be asymptotically independent. Applications of our general process convergence results include the construction of non-crossing quantile curves and the estimation of conditional distribution functions. As a result of independent interest, we obtain a series of Bahadur representations with exponential bounds for tail probabilities of all remainder terms. Bounds of this kind are potentially useful in analyzing statistical inference procedures under a divide-and-conquer setup.
  • We focus on the construction of confidence corridors for multivariate nonparametric generalized quantile regression functions. This construction is based on asymptotic results for the maximal deviation between a suitable nonparametric estimator and the true function of interest, which follow after a series of approximation steps including a Bahadur representation, a new strong approximation theorem and exponential tail inequalities for Gaussian random fields. As a byproduct, we also obtain confidence corridors for the regression function in classical mean regression. In order to deal with the problem of slowly decreasing error in coverage probability of the asymptotic confidence corridors, which results in meager coverage for small sample sizes, a simple bootstrap procedure is designed based on the leading term of the Bahadur representation. The finite sample properties of both procedures are investigated by means of a simulation study, and it is demonstrated that the bootstrap procedure considerably outperforms the asymptotic bands in terms of coverage accuracy. Finally, the bootstrap confidence corridors are used to study the efficacy of the National Supported Work Demonstration, which is a randomized employment enhancement program launched in the 1970s. This article has supplementary materials.
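
The two-step distributed procedure described in the second abstract above, (i) estimate quantiles at several levels on separate sub-samples and (ii) combine them into a quantile process, can be illustrated with a toy sketch. This is not the authors' implementation. It assumes an intercept-only model (so each "quantile regression" reduces to a sample quantile), averages the sub-sample estimates, and uses piecewise-linear interpolation in the quantile level as a simple stand-in for the projection step. The function names `sample_quantile`, `distributed_quantiles` and `quantile_process` are hypothetical.

```python
import math
import random
from bisect import bisect_right


def sample_quantile(sorted_xs, tau):
    """Empirical tau-quantile of a sorted sample.

    For an intercept-only model this order statistic is a minimizer
    of the quantile regression check loss.
    """
    n = len(sorted_xs)
    idx = min(n - 1, max(0, math.ceil(tau * n) - 1))
    return sorted_xs[idx]


def distributed_quantiles(data, n_units, taus):
    """Step (i): estimate every level on each sub-sample, then average.

    Assumption: sub-samples are formed by striding through the data;
    in practice each chunk would live on its own computing unit.
    """
    chunks = [sorted(data[s::n_units]) for s in range(n_units)]
    return [
        sum(sample_quantile(chunk, tau) for chunk in chunks) / n_units
        for tau in taus
    ]


def quantile_process(taus, estimates):
    """Step (ii): extend the pointwise estimates to a curve in tau.

    Assumption: linear interpolation between neighbouring levels is
    used here in place of a spline projection.
    """
    def q(tau):
        if not taus[0] <= tau <= taus[-1]:
            raise ValueError("tau lies outside the estimated grid")
        j = min(bisect_right(taus, tau), len(taus) - 1)
        t0, t1 = taus[j - 1], taus[j]
        w = (tau - t0) / (t1 - t0)
        return (1 - w) * estimates[j - 1] + w * estimates[j]

    return q


random.seed(0)
data = [random.gauss(0.0, 1.0) for _ in range(20000)]
taus = [k / 10 for k in range(1, 10)]
estimates = distributed_quantiles(data, n_units=20, taus=taus)
q = quantile_process(taus, estimates)
print(q(0.5), q(0.75))
```

In this sketch the number of sub-samples (`n_units`) and the number of quantile levels (`len(taus)`) play the roles of the two tuning quantities whose proper choice the abstract says is needed to preserve inferential accuracy. The values used here are arbitrary.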