• Many statistical models seek relationship between variables via subspaces of reduced dimensions. For instance, in factor models, variables are roughly distributed around a low dimensional subspace determined by the loading matrix; in mixed linear regression models, the coefficient vectors for different mixtures form a subspace that captures all regression functions; in multiple index models, the effect of covariates is summarized by the effective dimension reduction space. Such subspaces are typically unknown, and good estimates are crucial for data visualization, dimension reduction, diagnostics and estimation of unknown parameters. Usually, we can estimate these subspaces by computing moments from data. Often, there are many ways to estimate a subspace, by using moments of different orders, transformed moments, etc. A natural question is: how can we combine all these moment conditions and achieve optimality for subspace estimation? In this paper, we formulate our problem as estimation of an unknown subspace $\mathcal{S}$ of dimension $r$, given a set of overidentifying vectors $\{ \mathrm{\bf v}_\ell \}_{\ell=1}^m$ (namely $m \ge r$) that satisfy $\mathbb{E} \mathrm{\bf v}_{\ell} \in \mathcal{S}$ and have the form $$ \mathrm{\bf v}_\ell = \frac{1}{n} \sum_{i=1}^n \mathrm{\bf f}_\ell(\mathbf{x}_i, y_i), $$ where data are i.i.d. and each function $\mathrm{\bf f}_\ell$ is known. By exploiting certain covariance information related to $\mathrm{\bf v}_\ell$, our estimator of $\mathcal{S}$ uses an optimal weighting matrix and achieves the smallest asymptotic error, in terms of canonical angles. The analysis is based on the generalized method of moments that is tailored to our problem. Our method is applied to aforementioned models and distributed estimation of heterogeneous datasets, and may be potentially extended to analyze matrix completion, neural nets, among others.
  • Recovering low-rank structures via eigenvector perturbation analysis is a common problem in statistical machine learning, such as in factor analysis, community detection, ranking, matrix completion, among others. While a large variety of bounds are available for average errors between empirical and population statistics of eigenvectors, few results are tight for entrywise analyses, which are critical for a number of problems such as community detection. This paper investigates entrywise behaviors of eigenvectors for a large class of random matrices whose expectations are low-rank, which helps settle the conjecture in Abbe et al. (2014b) that the spectral algorithm achieves exact recovery in the stochastic block model without any trimming or cleaning steps. The key is a first-order approximation of eigenvectors under the $\ell_\infty$ norm: $$u_k \approx \frac{A u_k^*}{\lambda_k^*},$$ where $\{u_k\}$ and $\{u_k^*\}$ are eigenvectors of a random matrix $A$ and its expectation $\mathbb{E} A$, respectively. The fact that the approximation is both tight and linear in $A$ facilitates sharp comparisons between $u_k$ and $u_k^*$. In particular, it allows for comparing the signs of $u_k$ and $u_k^*$ even if $\| u_k - u_k^*\|_{\infty}$ is large. The results are further extended to perturbations of eigenspaces, yielding new $\ell_\infty$-type bounds for synchronization ($\mathbb{Z}_2$-spiked Wigner model) and noisy matrix completion.
  • In statistics and machine learning, people are often interested in the eigenvectors (or singular vectors) of certain matrices (e.g. covariance matrices, data matrices, etc). However, those matrices are usually perturbed by noises or statistical errors, either from random sampling or structural patterns. One usually employs Davis-Kahan $\sin \theta$ theorem to bound the difference between the eigenvectors of a matrix $A$ and those of a perturbed matrix $\widetilde{A} = A + E$, in terms of $\ell_2$ norm. In this paper, we prove that when $A$ is a low-rank and incoherent matrix, the $\ell_{\infty}$ norm perturbation bound of singular vectors (or eigenvectors in the symmetric case) is smaller by a factor of $\sqrt{d_1}$ or $\sqrt{d_2}$ for left and right vectors, where $d_1$ and $d_2$ are the matrix dimensions. The power of this new perturbation result is shown in robust covariance estimation, particularly when random variables have heavy tails. There, we propose new robust covariance estimators and establish their asymptotic properties using the newly developed perturbation bound. Our theoretical results are verified through extensive numerical experiments.
  • The problem of phase synchronization is to estimate the phases (angles) of a complex unit-modulus vector $z$ from their noisy pairwise relative measurements $C = zz^* + \sigma W$, where $W$ is a complex-valued Gaussian random matrix. The maximum likelihood estimator (MLE) is a solution to a unit-modulus constrained quadratic programming problem, which is nonconvex. Existing works have proposed polynomial-time algorithms such as a semidefinite relaxation (SDP) approach or the generalized power method (GPM) to solve it. Numerical experiments suggest both of these methods succeed with high probability for $\sigma$ up to $\tilde{\mathcal{O}}(n^{1/2})$, yet, existing analyses only confirm this observation for $\sigma$ up to $\mathcal{O}(n^{1/4})$. In this paper, we bridge the gap, by proving SDP is tight for $\sigma = \mathcal{O}(\sqrt{n /\log n})$, and GPM converges to the global optimum under the same regime. Moreover, we establish a linear convergence rate for GPM, and derive a tighter $\ell_\infty$ bound for the MLE. A novel technique we develop in this paper is to track (theoretically) $n$ closely related sequences of iterates, in addition to the sequence of iterates GPM actually produces. As a by-product, we obtain an $\ell_\infty$ perturbation bound for leading eigenvectors. Our result also confirms intuitions that use techniques from statistical mechanics.
  • Rayleigh-Schr\"{o}dinger perturbation theory is a well-known theory in quantum mechanics and it offers useful characterization of eigenvectors of a perturbed matrix. Suppose $A$ and perturbation $E$ are both Hermitian matrices, $A^t = A + tE$, $\{\lambda_j\}_{j=1}^n$ are eigenvalues of $A$ in descending order, and $u_1, u^t_1$ are leading eigenvectors of $A$ and $A^t$. Rayleigh-Schr\"{o}dinger theory shows asymptotically, $\langle u^t_1, u_j \rangle \propto t / (\lambda_1 - \lambda_j)$ where $ t = o(1)$. However, the asymptotic theory does not apply to larger $t$; in particular, it fails when $ t \| E \|_2 > \lambda_1 - \lambda_2$. In this paper, we present a nonasymptotic theory with $E$ being a random matrix. We prove that, when $t = 1$ and $E$ has independent and centered subgaussian entries above its diagonal, with high probability, \begin{equation*} | \langle u^1_1, u_j \rangle | = O(\sqrt{\log n} / (\lambda_1 - \lambda_j)), \end{equation*} for all $j>1$ simultaneously, under a condition on eigenvalues of $A$ that involves all gaps $\lambda_1 - \lambda_j$. This bound is valid, even in cases where $\| E \|_2 \gg \lambda_1 - \lambda_2$. The result is optimal, except for a log term. It also leads to an improvement of Davis-Kahan theorem.
  • In this paper, we study robust covariance estimation under the approximate factor model with observed factors. We propose a novel framework to first estimate the initial joint covariance matrix of the observed data and the factors, and then use it to recover the covariance matrix of the observed data. We prove that once the initial matrix estimator is good enough to maintain the element-wise optimal rate, the whole procedure will generate an estimated covariance with desired properties. For data with only bounded fourth moments, we propose to use Huber loss minimization to give the initial joint covariance estimation. This approach is applicable to a much wider range of distributions, including sub-Gaussian and elliptical distributions. We also present an asymptotic result for Huber's M-estimator with a diverging parameter. The conclusions are demonstrated by extensive simulations and real data analysis.
  • We consider accurately answering smooth queries while preserving differential privacy. A query is said to be $K$-smooth if it is specified by a function defined on $[-1,1]^d$ whose partial derivatives up to order $K$ are all bounded. We develop an $\epsilon$-differentially private mechanism for the class of $K$-smooth queries. The major advantage of the algorithm is that it outputs a synthetic database. In real applications, a synthetic database output is appealing. Our mechanism achieves an accuracy of $O (n^{-\frac{K}{2d+K}}/\epsilon )$, and runs in polynomial time. We also generalize the mechanism to preserve $(\epsilon, \delta)$-differential privacy with slightly improved accuracy. Extensive experiments on benchmark datasets demonstrate that the mechanisms have good accuracy and are efficient.