• ### Optimal Two-Step Prediction in Regression(1410.5014)

June 6, 2017 math.ST, stat.TH, stat.ME
High-dimensional prediction typically comprises two steps: variable selection and subsequent least-squares refitting on the selected variables. However, the standard variable selection procedures, such as the lasso, hinge on tuning parameters that need to be calibrated. Cross-validation, the most popular calibration scheme, is computationally costly and lacks finite sample guarantees. In this paper, we introduce an alternative scheme that is easy to implement and both computationally and theoretically efficient.
• ### The middle-scale asymptotics of Wishart matrices(1705.03510)

May 9, 2017 math.PR, math.ST, stat.TH
We study the behavior of a real $p$-dimensional Wishart random matrix with $n$ degrees of freedom when $n,p\rightarrow\infty$ but $p/n\rightarrow 0$. We establish the existence of phase transitions when $p$ grows at the order $n^{(K+1)/(K+3)}$ for every $K\in\mathbb{N}$, and derive expressions for approximating densities between every two phase transitions. To do this, we make use of a novel tool we call the G-transform of a distribution, which is closely related to the characteristic function. We also derive an extension of the $t$-distribution to real symmetric matrices, which naturally appears as the conjugate distribution to the Wishart under a G-transformation, and show that its empirical spectral distribution obeys a semicircle law when $p/n\rightarrow 0$. Finally, we discuss how the phase transitions of the Wishart distribution might originate from changes in rates of convergence of symmetric $t$ statistics.
• ### On the Domain of Attraction of a Tracy-Widom Law with Applications to Testing Multiple Largest Roots(1510.08873)

Oct. 29, 2015 math.ST, stat.TH
The greatest root statistic arises as the test statistic in several multivariate analysis settings. Suppose there is a global null hypothesis that consists of different independent sub-null hypotheses, and suppose the greatest root statistic is used as the test statistic for each sub-null hypothesis. Such problems may arise when conducting a batch MANOVA or several batches of pairwise testing for equality of covariance matrices. Using the union-intersection testing approach and by letting the problem dimension tend to infinity faster than the number of batches, we show that the global null can be tested using a Gumbel distribution to approximate the critical values. Although the theoretical results are asymptotic, simulation studies indicate that the approximations are very good even for small to moderate dimensions. The results are general and can be applied in any setting where the greatest root statistic is used, not just for the two methods we use for illustrative purposes.
• ### Improved Second Order Estimation in the Singular Multivariate Normal Model(1509.02451)

Sept. 8, 2015 math.ST, stat.TH
We consider the problem of estimating covariance and precision matrices, and their associated discriminant coefficients, from normal data when the rank of the covariance matrix is strictly smaller than its dimension and the available sample size. Using unbiased risk estimation, we construct novel estimators by minimizing upper bounds on the difference in risk over several classes. Our proposed estimators are empirically shown to offer substantial improvement over classical approaches.
• ### Noise Estimation in the Spiked Covariance Model(1408.6440)

Aug. 27, 2014 math.ST, stat.TH, stat.ME
The problem of estimating a spiked covariance matrix in high dimensions under Frobenius loss, and the parallel problem of estimating the noise in spiked PCA, are investigated. We propose an estimator of the noise parameter by minimizing an unbiased estimator of the invariant Frobenius risk using calculus of variations. The resulting estimator is shown, using random matrix theory, to be strongly consistent and essentially asymptotically normal and minimax for the noise estimation problem. We use this construction to obtain a robust spiked covariance matrix estimator with consistent eigenvalues.
• ### Improved multivariate normal mean estimation with unknown covariance when p is greater than n(1302.6746)

Feb. 27, 2013 math.ST, stat.TH
We consider the problem of estimating the mean vector of a p-variate normal $(\theta,\Sigma)$ distribution under invariant quadratic loss, $(\delta-\theta)'\Sigma^{-1}(\delta-\theta)$, when the covariance is unknown. We propose a new class of estimators that dominate the usual estimator $\delta^0(X)=X$. The proposed estimators of $\theta$ depend upon X and an independent Wishart matrix S with n degrees of freedom; however, S is singular almost surely when p>n. The proof of domination involves the development of some new unbiased estimators of risk for the p>n setting. We also find some relationships between the amount of domination and the magnitudes of n and p.
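
The singularity of S when p>n, mentioned in the last abstract above, can be checked numerically. The following is a minimal sketch, not taken from the paper: it assumes an identity scale matrix, and the helper name `sample_wishart` and the values of n and p are made up for illustration. It also computes a Moore-Penrose pseudoinverse as one common stand-in for the inverse that does not exist; whether the proposed estimators use it is not stated in the abstract.

```python
import numpy as np

def sample_wishart(n, p, rng):
    """Draw S = Z'Z, where Z is n x p with i.i.d. N(0, 1) entries.

    This is a Wishart matrix with n degrees of freedom and identity
    scale (an assumption made for this illustration).
    """
    z = rng.standard_normal((n, p))
    return z.T @ z

rng = np.random.default_rng(0)
n, p = 5, 12  # hypothetical sizes with p > n

s = sample_wishart(n, p, rng)

# S is p x p, but its rank cannot exceed n, so it is singular when p > n.
rank = np.linalg.matrix_rank(s)

# The ordinary inverse does not exist; the pseudoinverse is well defined.
# An explicit cutoff keeps numerically zero singular values from being inverted.
s_pinv = np.linalg.pinv(s, rcond=1e-10)

print(f"shape = {s.shape}, rank = {rank}")
```

With p > n the printed rank equals n rather than p, which is the almost-sure singularity the abstract refers to.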