
Largescale multiple testing with highly correlated test statistics arises
frequently in many scientific research. Incorporating correlation information
in estimating false discovery proportion has attracted increasing attention in
recent years. When the covariance matrix of test statistics is known, Fan, Han
& Gu (2012) provided a consistent estimate of False Discovery Proportion (FDP)
under arbitrary dependence structure. However, the covariance matrix is often
unknown in many applications and such dependence information has to be
estimated before estimating FDP (Efron, 2010). The estimation accuracy can
greatly affect the convergence result of FDP or even violate its consistency.
In the current paper, we provide methodological modification and theoretical
investigations for estimation of FDP with unknown covariance. First we develop
requirements for estimates of eigenvalues and eigenvectors such that we can
obtain a consistent estimate of FDP. Secondly we give conditions on the
dependence structures such that the estimate of FDP is consistent. Such
dependence structures include sparse covariance matrices, which have been
popularly considered in the contemporary random matrix theory. When data are
sampled from an approximate factor model, which encompasses most practical
situations, we provide a consistent estimate of FDP via exploiting this
specific dependence structure. The results are further demonstrated by
simulation studies and some real data applications.

Measuring conditional dependence is an important topic in statistics with
broad applications including graphical models. Under a factor model setting, a
new conditional dependence measure based on projection is proposed. The
corresponding conditional independence test is developed with the asymptotic
null distribution unveiled where the number of factors could be
highdimensional. It is also shown that the new test has control over the
asymptotic significance level and can be calculated efficiently. A generic
method for building dependency graphs without Gaussian assumption using the new
test is elaborated. Numerical results and real data analysis show the
superiority of the new method.

Big data can easily be contaminated by outliers or contain variables with
heavytailed distributions, which makes many conventional methods inadequate.
To address this challenge, we propose the adaptive Huber regression for robust
estimation and inference. The key observation is that the robustification
parameter should adapt to the sample size, dimension and moments for optimal
tradeoff between bias and robustness. Our theoretical framework deals with
heavytailed distributions with bounded $(1+\delta)$th moment for any $\delta
> 0$. We establish a sharp phase transition for robust estimation of regression
parameters in both low and high dimensions: when $\delta \geq 1$, the estimator
admits a subGaussiantype deviation bound without subGaussian assumptions on
the data, while only a slower rate is available in the regime $0<\delta< 1$.
Furthermore, this transition is smooth and optimal. In addition, we extend the
methodology to allow both heavytailed predictors and observation noise.
Simulation studies lend further support to the theory. In a genetic study of
cancer cell lines that exhibit heavytailedness, the proposed methods are shown
to be more robust and predictive.

We study factor models augmented by observed covariates that have explanatory
powers on the unknown factors. In financial factor models, the unknown factors
can be reasonably well explained by a few observable proxies, such as the
FamaFrench factors. In diffusion index forecasts, identified factors are
strongly related to several directly measurable economic variables such as
consumptionwealth variable, financial ratios, and term spread. With those
covariates, both the factors and loadings are identifiable up to a rotation
matrix even only with a finite dimension. To incorporate the explanatory power
of these covariates, we propose a smoothed principal component analysis (PCA):
(i) regress the data onto the observed covariates, and (ii) take the principal
components of the fitted data to estimate the loadings and factors. This allows
us to accurately estimate the percentage of both explained and unexplained
components in factors and thus to assess the explanatory power of covariates.
We show that both the estimated factors and loadings can be estimated with
improved rates of convergence compared to the benchmark method. The degree of
improvement depends on the strength of the signals, representing the
explanatory power of the covariates on the factors. The proposed estimator is
robust to possibly heavytailed distributions. We apply the model to forecast
US bond risk premia, and find that the observed macroeconomic characteristics
contain strong explanatory powers of the factors. The gain of forecast is more
substantial when the characteristics are incorporated to estimate the common
factors than directly used for forecasts.

This paper studies model selection consistency for high dimensional sparse
regression when data exhibits both crosssectional and serial dependency. Most
commonlyused model selection methods fail to consistently recover the true
model when the covariates are highly correlated. Motivated by econometric
studies, we consider the case where covariate dependence can be reduced through
factor model, and propose a consistent strategy named FactorAdjusted
Regularized Model Selection (FarmSelect). By separating the latent factors from
idiosyncratic components, we transform the problem from model selection with
highly correlated covariates to that with weakly correlated variables. Model
selection consistency as well as optimal rates of convergence are obtained
under mild conditions. Numerical studies demonstrate the nice finite sample
performance in terms of both model selection and outofsample prediction.
Moreover, our method is flexible in a sense that it pays no price for weakly
correlated and uncorrelated cases. Our method is applicable to a wide range of
high dimensional sparse regression problems. An Rpackage FarmSelect is also
provided for implementation.

This paper is concerned with the problem of top$K$ ranking from pairwise
comparisons. Given a collection of $n$ items and a few pairwise comparisons
across them, one wishes to identify the set of $K$ items that receive the
highest ranks. To tackle this problem, we adopt the logistic parametric model
 the BradleyTerryLuce model, where each item is assigned a latent
preference score, and where the outcome of each pairwise comparison depends
solely on the relative scores of the two items involved. Recent works have made
significant progress towards characterizing the performance (e.g. the mean
square error for estimating the scores) of several classical methods, including
the spectral method and the maximum likelihood estimator (MLE). However, where
they stand regarding top$K$ ranking remains unsettled.
We demonstrate that under a natural random sampling model, the spectral
method alone, or the regularized MLE alone, is minimax optimal in terms of the
sample complexity  the number of paired comparisons needed to ensure exact
top$K$ identification, for the fixed dynamic range regime. This is
accomplished via optimal control of the entrywise error of the score estimates.
We complement our theoretical studies by numerical experiments, confirming that
both methods yield low entrywise errors for estimating the underlying scores.
Our theory is established via a novel leaveoneout trick, which proves
effective for analyzing both iterative and noniterative procedures. Along the
way, we derive an elementary eigenvector perturbation bound for probability
transition matrices, which parallels the DavisKahan $\sin\Theta$ theorem for
symmetric matrices. This also allows us to close the gap between the $\ell_2$
error upper bound for the spectral method and the minimax lower limit.

Many statistical models seek relationship between variables via subspaces of
reduced dimensions. For instance, in factor models, variables are roughly
distributed around a low dimensional subspace determined by the loading matrix;
in mixed linear regression models, the coefficient vectors for different
mixtures form a subspace that captures all regression functions; in multiple
index models, the effect of covariates is summarized by the effective dimension
reduction space.
Such subspaces are typically unknown, and good estimates are crucial for data
visualization, dimension reduction, diagnostics and estimation of unknown
parameters. Usually, we can estimate these subspaces by computing moments from
data. Often, there are many ways to estimate a subspace, by using moments of
different orders, transformed moments, etc. A natural question is: how can we
combine all these moment conditions and achieve optimality for subspace
estimation?
In this paper, we formulate our problem as estimation of an unknown subspace
$\mathcal{S}$ of dimension $r$, given a set of overidentifying vectors $\{
\mathrm{\bf v}_\ell \}_{\ell=1}^m$ (namely $m \ge r$) that satisfy $\mathbb{E}
\mathrm{\bf v}_{\ell} \in \mathcal{S}$ and have the form $$ \mathrm{\bf v}_\ell
= \frac{1}{n} \sum_{i=1}^n \mathrm{\bf f}_\ell(\mathbf{x}_i, y_i), $$ where
data are i.i.d. and each function $\mathrm{\bf f}_\ell$ is known. By exploiting
certain covariance information related to $\mathrm{\bf v}_\ell$, our estimator
of $\mathcal{S}$ uses an optimal weighting matrix and achieves the smallest
asymptotic error, in terms of canonical angles. The analysis is based on the
generalized method of moments that is tailored to our problem. Our method is
applied to aforementioned models and distributed estimation of heterogeneous
datasets, and may be potentially extended to analyze matrix completion, neural
nets, among others.

This paper considers the problem of solving systems of quadratic equations,
namely, recovering an object of interest
$\mathbf{x}^{\natural}\in\mathbb{R}^{n}$ from $m$ quadratic equations/samples
$y_{i}=(\mathbf{a}_{i}^{\top}\mathbf{x}^{\natural})^{2}$, $1\leq i\leq m$. This
problem, also dubbed as phase retrieval, spans multiple domains including
physical sciences and machine learning.
We investigate the efficiency of gradient descent (or Wirtinger flow)
designed for the nonconvex least squares problem. We prove that under Gaussian
designs, gradient descent  when randomly initialized  yields an
$\epsilon$accurate solution in $O\big(\log n+\log(1/\epsilon)\big)$ iterations
given nearly minimal samples, thus achieving nearoptimal computational and
sample complexities at once. This provides the first global convergence
guarantee concerning vanilla gradient descent for phase retrieval, without the
need of (i) carefullydesigned initialization, (ii) sample splitting, or (iii)
sophisticated saddlepoint escaping schemes. All of these are achieved by
exploiting the statistical models in analyzing optimization algorithms, via a
leaveoneout approach that enables the decoupling of certain statistical
dependency between the gradient descent iterates and the data.

This paper studies hypothesis testing and confidence interval construction in
highdimensional linear models with possible nonsparse structures. For a given
component of the parameter vector, we show that the difficulty of the problem
depends on the sparsity of the corresponding row of the precision matrix of the
covariates, not the sparsity of the model itself. We develop new concepts of
uniform and essentially uniform nontestability that allow the study of
limitations of tests across a broad set of alternatives. Uniform
nontestability identifies an extensive collection of alternatives such that
the power of any test, against any alternative in this group, is asymptotically
at most equal to the nominal size, whereas minimaxity shows the existence of
one particularly "bad" alternative. Implications of the new constructions
include new minimax testability results that in sharp contrast to existing
results do not depend on the sparsity of the model parameters. We identify new
tradeoffs between testability and feature correlation. In particular, we show
that in models with weak feature correlations minimax lower bound can be
attained by a confidence interval whose width has the parametric rate
regardless of the size of the model sparsity.

We establish the counterpart of Hoeffding's lemma for Markov dependent random
variables. Specifically, if a stationary Markov chain $\{X_i\}_{i \ge 1}$ with
invariant measure $\pi$ admits an $\mathcal{L}_2(\pi)$spectral gap
$1\lambda$, then for any bounded functions $f_i: x \mapsto [a_i,b_i]$, the sum
of $f_i(X_i)$ is subGaussian with variance proxy $\frac{1+\lambda}{1\lambda}
\cdot \sum_i \frac{(b_ia_i)^2}{4}$. The counterpart of Hoeffding's inequality
immediately follows. Our results assume none of reversibility, countable state
space and timehomogeneity of Markov chains. They are optimal in terms of the
multiplicative coefficient $(1+\lambda)/(1\lambda)$, and cover Hoeffding's
lemma and inequality for independent random variables as special cases with
$\lambda = 0$. We illustrate the utility of these results by applying them to
six problems in statistics and machine learning. They are linear regression,
lasso regression, sparse covariance matrix estimation with Markovdependent
samples; Markov chain Monte Carlo estimation; respondence driven sampling; and
multiarmed bandit problems with Markovian rewards.

Principal component analysis (PCA) is fundamental to statistical machine
learning. It extracts latent principal factors that contribute to the most
variation of the data. When data are stored across multiple machines, however,
communication cost can prohibit the computation of PCA in a central location
and distributed algorithms for PCA are thus needed. This paper proposes and
studies a distributed PCA algorithm: each node machine computes the top $K$
eigenvectors and transmits them to the central server; the central server then
aggregates the information from all the node machines and conducts a PCA based
on the aggregated information. We investigate the bias and variance for the
resulting distributed estimator of the top $K$ eigenvectors. In particular, we
show that for distributions with symmetric innovation, the empirical top
eigenspaces are unbiased and hence the distributed PCA is "unbiased". We derive
the rate of convergence for distributed PCA estimators, which depends
explicitly on the effective rank of covariance, eigengap, and the number of
machines. We show that when the number of machines is not unreasonably large,
the distributed PCA performs as well as the whole sample PCA, even without full
access of whole data. The theoretical results are verified by an extensive
simulation study. We also extend our analysis to the heterogeneous case where
the population covariance matrices are different across local machines but
share similar top eigenstructures.

Big data is transforming our world, revolutionizing operations and analytics
everywhere, from financial engineering to biomedical sciences. The complexity
of big data often makes dimension reduction techniques necessary before
conducting statistical inference. Principal component analysis, commonly
referred to as PCA, has become an essential tool for multivariate data analysis
and unsupervised dimension reduction, the goal of which is to find a lower
dimensional subspace that captures most of the variation in the dataset. This
article provides an overview of methodological and theoretical developments of
PCA over the last decade, with focus on its applications to big data analytics.
We first review the mathematical formulation of PCA and its theoretical
development from the view point of perturbation analysis. We then briefly
discuss the relationship between PCA and factor analysis as well as its
applications to large covariance estimation and multiple testing. PCA also
finds important applications in many modern machine learning problems, and we
focus on community detection, ranking, mixture model and manifold learning in
this paper.

Recovering lowrank structures via eigenvector perturbation analysis is a
common problem in statistical machine learning, such as in factor analysis,
community detection, ranking, matrix completion, among others. While a large
variety of bounds are available for average errors between empirical and
population statistics of eigenvectors, few results are tight for entrywise
analyses, which are critical for a number of problems such as community
detection.
This paper investigates entrywise behaviors of eigenvectors for a large class
of random matrices whose expectations are lowrank, which helps settle the
conjecture in Abbe et al. (2014b) that the spectral algorithm achieves exact
recovery in the stochastic block model without any trimming or cleaning steps.
The key is a firstorder approximation of eigenvectors under the $\ell_\infty$
norm: $$u_k \approx \frac{A u_k^*}{\lambda_k^*},$$ where $\{u_k\}$ and
$\{u_k^*\}$ are eigenvectors of a random matrix $A$ and its expectation
$\mathbb{E} A$, respectively. The fact that the approximation is both tight and
linear in $A$ facilitates sharp comparisons between $u_k$ and $u_k^*$. In
particular, it allows for comparing the signs of $u_k$ and $u_k^*$ even if $\
u_k  u_k^*\_{\infty}$ is large. The results are further extended to
perturbations of eigenspaces, yielding new $\ell_\infty$type bounds for
synchronization ($\mathbb{Z}_2$spiked Wigner model) and noisy matrix
completion.

Penalized estimation principle is fundamental to highdimensional problems.
In the literature, it has been extensively and successfully applied to various
models with only structural parameters. As a contrast, in this paper, we apply
this penalization principle to a linear regression model with a
finitedimensional vector of structural parameters and a highdimensional
vector of sparse incidental parameters. For the estimators of the structural
parameters, we derive their consistency and asymptotic normality, which reveals
an oracle property. However, the penalized estimators for the incidental
parameters possess only partial selection consistency but not consistency. This
is an interesting partial consistency phenomenon: the structural parameters are
consistently estimated while the incidental ones cannot. For the structural
parameters, also considered is an alternative twostep penalized estimator,
which has fewer possible asymptotic distributions and thus is more suitable for
statistical inferences. We further extend the methods and results to the case
where the dimension of the structural parameter vector diverges with but slower
than the sample size. A datadriven approach for selecting a penalty
regularization parameter is provided. The finitesample performance of the
penalized estimators for the structural parameters is evaluated by simulations
and a real data set is analyzed.

Over the last two decades, many exciting variable selection methods have been
developed for finding a small group of covariates that are associated with the
response from a large pool. Can the discoveries from these data mining
approaches be spurious due to high dimensionality and limited sample size? Can
our fundamental assumptions about the exogeneity of the covariates needed for
such variable selection be validated with the data? To answer these questions,
we need to derive the distributions of the maximum spurious correlations given
a certain number of predictors, namely, the distribution of the correlation of
a response variable $Y$ with the best $s$ linear combinations of $p$ covariates
$\mathbf{X}$, even when $\mathbf{X}$ and $Y$ are independent. When the
covariance matrix of $\mathbf{X}$ possesses the restricted eigenvalue property,
we derive such distributions for both a finite $s$ and a diverging $s$, using
Gaussian approximation and empirical process techniques. However, such a
distribution depends on the unknown covariance matrix of $\mathbf{X}$. Hence,
we use the multiplier bootstrap procedure to approximate the unknown
distributions and establish the consistency of such a simple bootstrap
approach. The results are further extended to the situation where the residuals
are from regularized fits. Our approach is then used to construct the upper
confidence limit for the maximum spurious correlation and to test the
exogeneity of the covariates. The former provides a baseline for guarding
against false discoveries and the latter tests whether our fundamental
assumptions for highdimensional model selection are statistically valid. Our
techniques and results are illustrated with both numerical examples and real
data analysis.

In statistics and machine learning, people are often interested in the
eigenvectors (or singular vectors) of certain matrices (e.g. covariance
matrices, data matrices, etc). However, those matrices are usually perturbed by
noises or statistical errors, either from random sampling or structural
patterns. One usually employs DavisKahan $\sin \theta$ theorem to bound the
difference between the eigenvectors of a matrix $A$ and those of a perturbed
matrix $\widetilde{A} = A + E$, in terms of $\ell_2$ norm. In this paper, we
prove that when $A$ is a lowrank and incoherent matrix, the $\ell_{\infty}$
norm perturbation bound of singular vectors (or eigenvectors in the symmetric
case) is smaller by a factor of $\sqrt{d_1}$ or $\sqrt{d_2}$ for left and right
vectors, where $d_1$ and $d_2$ are the matrix dimensions. The power of this new
perturbation result is shown in robust covariance estimation, particularly when
random variables have heavy tails. There, we propose new robust covariance
estimators and establish their asymptotic properties using the newly developed
perturbation bound. Our theoretical results are verified through extensive
numerical experiments.

This paper introduces a simple principle for robust highdimensional
statistical inference via an appropriate shrinkage on the data. This widens the
scope of highdimensional techniques, reducing the moment conditions from
subexponential or subGaussian distributions to merely bounded second or
fourth moment. As an illustration of this principle, we focus on robust
estimation of the lowrank matrix $\Theta^*$ from the trace regression model
$Y=Tr (\Theta^{*T}X) +\epsilon$. It encompasses four popular problems: sparse
linear models, compressed sensing, matrix completion and multitask regression.
We propose to apply penalized leastsquares approach to appropriately truncated
or shrunk data. Under only bounded $2+\delta$ moment condition on the response,
the proposed robust methodology yields an estimator that possesses the same
statistical error rates as previous literature with subGaussian errors. For
sparse linear models and multitasking regression, we further allow the design
to have only bounded fourth moment and obtain the same statistical rates,
again, by appropriate shrinkage of the design matrix. As a byproduct, we give a
robust covariance matrix estimator and establish its concentration inequality
in terms of the spectral norm when the random samples have only bounded fourth
moment. Extensive simulations have been carried out to support our theories.

We propose a computational framework named iterative local adaptive
majorizeminimization (ILAMM) to simultaneously control algorithmic complexity
and statistical error when fitting high dimensional models. ILAMM is a
twostage algorithmic implementation of the local linear approximation to a
family of folded concave penalized quasilikelihood. The first stage solves a
convex program with a crude precision tolerance to obtain a coarse initial
estimator, which is further refined in the second stage by iteratively solving
a sequence of convex programs with smaller precision tolerances. Theoretically,
we establish a phase transition: the first stage has a sublinear iteration
complexity, while the second stage achieves an improved linear rate of
convergence. Though this framework is completely algorithmic, it provides
solutions with optimal statistical performances and controlled algorithmic
complexity for a large family of nonconvex optimization problems. The iteration
effects on statistical errors are clearly demonstrated via a contraction
property. Our theory relies on a localized version of the sparse/restricted
eigenvalue condition, which allows us to analyze a large family of loss and
penalty functions and provide optimality guarantees under very weak assumptions
(For example, ILAMM requires much weaker minimal signal strength than other
procedures). Thorough numerical results are provided to support the obtained
theory.

Factor modeling is an essential tool for exploring intrinsic dependence
structures among highdimensional random variables. Much progress has been made
for estimating the covariance matrix from a highdimensional factor model.
However, the blessing of dimensionality has not yet been fully embraced in the
literature: much of the available data is often ignored in constructing
covariance matrix estimates. If our goal is to accurately estimate a covariance
matrix of a set of targeted variables, shall we employ additional data, which
are beyond the variables of interest, in the estimation? In this paper, we
provide sufficient conditions for an affirmative answer, and further quantify
its gain in terms of Fisher information and convergence rate. In fact, even an
oraclelike result (as if all the factors were known) can be achieved when a
sufficiently large number of variables is used. The idea of utilizing data as
much as possible brings computational challenges. A divideandconquer
algorithm is thus proposed to alleviate the computational burden, and also
shown not to sacrifice any statistical accuracy in comparison with a pooled
analysis. Simulation studies further confirm our advocacy for the use of full
data, and demonstrate the effectiveness of the above algorithm. Our proposal is
applied to a microarray data example that shows empirical benefits of using
more data.

Many data mining and statistical machine learning algorithms have been
developed to select a subset of covariates to associate with a response
variable. Spurious discoveries can easily arise in highdimensional data
analysis due to enormous possibilities of such selections. How can we know
statistically our discoveries better than those by chance? In this paper, we
define a measure of goodness of spurious fit, which shows how good a response
variable can be fitted by an optimally selected subset of covariates under the
null model, and propose a simple and effective LAMM algorithm to compute it. It
coincides with the maximum spurious correlation for linear models and can be
regarded as a generalized maximum spurious correlation. We derive the
asymptotic distribution of such goodness of spurious fit for generalized linear
models and $L_1$ regression. Such an asymptotic distribution depends on the
sample size, ambient dimension, the number of variables used in the fit, and
the covariance information. It can be consistently estimated by multiplier
bootstrapping and used as a benchmark to guard against spurious discoveries. It
can also be applied to model selection, which considers only candidate models
with goodness of fits better than those by spurious fits. The theory and method
are convincingly illustrated by simulated examples and an application to the
binary outcomes from German Neuroblastoma Trials.

Motivated by the problem of colocalization analysis in fluorescence
microscopic imaging, we study in this paper structured detection of correlated
regions between two random processes observed on a common domain. We argue that
although intuitive, direct use of the maximum loglikelihood statistic suffers
from potential bias and substantially reduced power, and introduce a simple
sizebased normalization to overcome this problem. We show that scanning with
the proposed sizecorrected likelihood ratio statistics leads to optimal
correlation detection over a large collection of structured correlation
detection problems.

Heterogeneity is an unwanted variation when analyzing aggregated datasets
from multiple sources. Though different methods have been proposed for
heterogeneity adjustment, no systematic theory exists to justify these methods.
In this work, we propose a generic framework named ALPHA (short for Adaptive
Lowrank Principal Heterogeneity Adjustment) to model, estimate, and adjust
heterogeneity from the original data. Once the heterogeneity is adjusted, we
are able to remove the biases of batch effects and to enhance the inferential
power by aggregating the homogeneous residuals from multiple sources. Under a
pervasive assumption that the latent heterogeneity factors simultaneously
affect a large fraction of observed variables, we provide a rigorous theory to
justify the proposed framework. Our framework also allows the incorporation of
informative covariates and appeals to the "Bless of Dimensionality". As an
illustrative application of this generic framework, we consider a problem of
estimating highdimensional precision matrix for graphical model inference
based on multiple datasets. We also provide thorough numerical studies on both
synthetic datasets and a brain imaging dataset to demonstrate the efficacy of
the developed theory and methods.

In this paper, we study robust covariance estimation under the approximate
factor model with observed factors. We propose a novel framework to first
estimate the initial joint covariance matrix of the observed data and the
factors, and then use it to recover the covariance matrix of the observed data.
We prove that once the initial matrix estimator is good enough to maintain the
elementwise optimal rate, the whole procedure will generate an estimated
covariance with desired properties. For data with only bounded fourth moments,
we propose to use Huber loss minimization to give the initial joint covariance
estimation. This approach is applicable to a much wider range of distributions,
including subGaussian and elliptical distributions. We also present an
asymptotic result for Huber's Mestimator with a diverging parameter. The
conclusions are demonstrated by extensive simulations and real data analysis.

This paper introduces a Projected Principal Component Analysis
(ProjectedPCA), which employs principal component analysis to the projected
(smoothed) data matrix onto a given linear space spanned by covariates. When it
applies to highdimensional factor analysis, the projection removes noise
components. We show that the unobserved latent factors can be more accurately
estimated than the conventional PCA if the projection is genuine, or more
precisely, when the factor loading matrices are related to the projected linear
space. When the dimensionality is large, the factors can be estimated
accurately even when the sample size is finite. We propose a flexible
semiparametric factor model, which decomposes the factor loading matrix into
the component that can be explained by subjectspecific covariates and the
orthogonal residual component. The covariates' effects on the factor loadings
are further modeled by the additive model via sieve approximations. By using
the newly proposed ProjectedPCA, the rates of convergence of the smooth factor
loading matrices are obtained, which are much faster than those of the
conventional factor analysis. The convergence is achieved even when the sample
size is finite and is particularly appealing in the
highdimensionlowsamplesize situation. This leads us to developing
nonparametric tests on whether observed covariates have explaining powers on
the loadings and whether they fully explain the loadings. The proposed method
is illustrated by both simulated data and the returns of the components of the
S&P 500 index.

We consider forecasting a single time series when there is a large number of
predictors and a possible nonlinear effect. The dimensionality was first
reduced via a highdimensional (approximate) factor model implemented by the
principal component analysis. Using the extracted factors, we develop a novel
forecasting method called the sufficient forecasting, which provides a set of
sufficient predictive indices, inferred from highdimensional predictors, to
deliver additional predictive power. The projected principal component analysis
will be employed to enhance the accuracy of inferred factors when a
semiparametric (approximate) factor model is assumed. Our method is also
applicable to crosssectional sufficient regression using extracted factors.
The connection between the sufficient forecasting and the deep learning
architecture is explicitly stated. The sufficient forecasting correctly
estimates projection indices of the underlying factors even in the presence of
a nonparametric forecasting function. The proposed method extends the
sufficient dimension reduction to highdimensional regimes by condensing the
crosssectional information through factor models. We derive asymptotic
properties for the estimate of the central subspace spanned by these projection
directions as well as the estimates of the sufficient predictive indices. We
further show that the natural method of running multiple regression of target
on estimated factors yields a linear estimate that actually falls into this
central subspace. Our method and theory allow the number of predictors to be
larger than the number of observations. We finally demonstrate that the
sufficient forecasting improves upon the linear forecasting in both simulation
studies and an empirical study of forecasting macroeconomic variables.