
The purpose of this paper is to reinvestigate the estimation of multiple
factor models by relaxing the convention that the number of factors is small.
We first obtain the collection of all possible factors and we provide a
simultaneous test, security by security, of which factors are significant.
Since the collection of risk factors selected for investigation is large and
highly correlated, we use dimension reduction methods, including the Least
Absolute Shrinkage and Selection Operator (LASSO) and prototype clustering, to
perform the investigation. For comparison with the existing literature, we
compare the multifactor model's performance with the Fama-French 5-factor
model. We find that both the Fama-French 5-factor and the multifactor model
are consistent with the behavior of "large-time scale" security returns. In a
goodness-of-fit test comparing the Fama-French 5-factor with the multifactor
model, the multifactor model has a substantially larger adjusted $R^{2}$.
Robustness tests confirm that the multifactor model provides a reasonable
characterization of security returns.
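
Because the two models being compared use different numbers of factors, the adjusted $R^{2}$ is the relevant yardstick: it penalizes each additional regressor. A minimal sketch of the standard formula follows; the function name and the numbers are hypothetical and are not results from the paper.

```python
def adjusted_r2(r2: float, n_obs: int, n_factors: int) -> float:
    """Adjusted R^2 for a regression with an intercept and n_factors regressors."""
    if n_obs - n_factors - 1 <= 0:
        raise ValueError("need n_obs > n_factors + 1")
    return 1.0 - (1.0 - r2) * (n_obs - 1) / (n_obs - n_factors - 1)


# Hypothetical inputs, for illustration only (not taken from the paper).
five_factor = adjusted_r2(r2=0.40, n_obs=120, n_factors=5)
many_factor = adjusted_r2(r2=0.55, n_obs=120, n_factors=20)
```

A model with more factors only wins on this measure if its raw fit improves by enough to offset the penalty for the extra regressors.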

A new empirical Bayes approach to variable selection in the context of
generalized linear models is developed. The proposed algorithm scales to
situations in which the number of putative explanatory variables is very large,
possibly much larger than the number of responses. The coefficients in the
linear predictor are modeled as a three-component mixture allowing the
explanatory variables to have a random positive effect on the response, a
random negative effect, or no effect. A key assumption is that only a small
(but unknown) fraction of the candidate variables have a nonzero effect. This
assumption, in addition to treating the coefficients as random effects,
facilitates an approach that is computationally efficient. In particular, the
number of parameters that have to be estimated is small, and remains constant
regardless of the number of explanatory variables. The model parameters are
estimated using a modified form of the EM algorithm which is scalable, and
leads to significantly faster convergence compared with simulation-based fully
Bayesian methods.
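
One generic way to write such a three-component mixture is sketched below. The symbols $p_-$, $p_0$, $p_+$, $F_-$, $F_+$ are notation assumed here for illustration; the abstract does not specify the component distributions.

\[
\beta_j \sim p_-\,F_- \;+\; p_0\,\delta_0 \;+\; p_+\,F_+,
\qquad p_- + p_0 + p_+ = 1,
\]

where $\delta_0$ is a point mass at zero, $F_-$ and $F_+$ are the distributions of the negative and positive effects, and the sparsity assumption corresponds to $p_0$ being close to 1. The parameters to estimate are then the mixing proportions and the parameters of $F_-$ and $F_+$, a set whose size does not grow with the number of explanatory variables.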

We study the behavior of a real $p$-dimensional Wishart random matrix with
$n$ degrees of freedom when $n,p\rightarrow\infty$ but $p/n\rightarrow 0$. We
establish the existence of phase transitions when $p$ grows at the order
$n^{(K+1)/(K+3)}$ for every $K\in\mathbb{N}$, and derive expressions for
approximating densities between every two phase transitions. To do this, we
make use of a novel tool we call the G-transform of a distribution, which is
closely related to the characteristic function. We also derive an extension of
the $t$-distribution to the real symmetric matrices, which naturally appears as
the conjugate distribution to the Wishart under a G-transformation, and show
its empirical spectral distribution obeys a semicircle law when $p/n\rightarrow
0$. Finally, we discuss how the phase transitions of the Wishart distribution
might originate from changes in rates of convergence of symmetric $t$
statistics.

Ensembles of decision trees are known to perform well on many problems, but
are not interpretable. In contrast to existing explanations of tree ensembles
that explain relationships between features and predictions, we propose an
alternative approach to interpreting tree ensembles by surfacing representative
points for each class, in which we explain a prediction by presenting points
with similar predictions (prototypes). We introduce a new distance for
Gradient Boosted Tree models, and propose new prototype selection methods with
theoretical guarantees, with the flexibility to choose a different number of
prototypes in each class. We demonstrate our methods on random forests and
gradient boosted trees, showing that the prototypes we find perform as well as or
even better than the original tree ensemble when used as a nearest-prototype
classifier. We also present a use case of debugging dataset errors using our
proposed methods.
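
A nearest-prototype classifier assigns a new point the label of its closest prototype. The sketch below illustrates only that rule: it uses plain Euclidean distance as a stand-in, whereas the paper proposes its own distance for tree ensembles, which is not reproduced here. The function name and the sample prototypes are hypothetical.

```python
import math


def nearest_prototype_predict(x, prototypes):
    """Return the label of the prototype closest to x.

    prototypes: list of (point, label) pairs.
    Distance: Euclidean (an assumption made for this sketch; the paper
    defines a different, tree-ensemble-based distance).
    """
    best_label, best_dist = None, math.inf
    for point, label in prototypes:
        d = math.dist(x, point)
        if d < best_dist:
            best_label, best_dist = label, d
    return best_label


# Hypothetical prototypes: two for class "a", one for class "b",
# showing that classes may have different numbers of prototypes.
protos = [((0.0, 0.0), "a"), ((1.0, 0.0), "a"), ((5.0, 5.0), "b")]
label = nearest_prototype_predict((0.2, 0.1), protos)
```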

The greatest root statistic arises as the test statistic in several
multivariate analysis settings. Suppose there is a global null hypothesis that
consists of different independent sub-null hypotheses, and suppose the greatest
root statistic is used as the test statistic for each sub-null hypothesis. Such
problems may arise when conducting a batch MANOVA or several batches of
pairwise testing for equality of covariance matrices. Using the
union-intersection testing approach and by letting the problem dimension tend
to infinity faster than the number of batches, we show that the global null can
be tested using a Gumbel distribution to approximate the critical values.
Although the theoretical results are asymptotic, simulation studies indicate
that the approximations are very good even for small to moderate dimensions.
The results are general and can be applied in any setting where the greatest
root statistic is used, not just for the two methods we use for illustrative
purposes.
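
Once the maximum of the batch statistics has been centered and scaled, the Gumbel approximation gives critical values in closed form. The following is a minimal sketch of that last step only. The centering and scaling constants come from the paper's asymptotic result and are not reproduced here, so the input `normalized_max` is assumed to be already normalized; all function names are hypothetical.

```python
import math


def gumbel_cdf(x, loc=0.0, scale=1.0):
    """CDF of the Gumbel (maximum) distribution."""
    return math.exp(-math.exp(-(x - loc) / scale))


def gumbel_critical_value(alpha, loc=0.0, scale=1.0):
    """Upper-alpha critical value: the (1 - alpha) quantile of the Gumbel law."""
    return loc - scale * math.log(-math.log(1.0 - alpha))


def reject_global_null(normalized_max, alpha=0.05):
    """Reject the global null when the normalized maximum is too large.

    normalized_max is assumed to be the largest greatest-root statistic
    across batches, already centered and scaled (hypothetical input).
    """
    return normalized_max > gumbel_critical_value(alpha)
```

At the 5% level the standard Gumbel critical value is roughly 2.97, so a normalized maximum of 3.5 would lead to rejection under this sketch.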

We develop a model-based empirical Bayes approach to variable selection
problems in which the number of predictors is very large, possibly much larger
than the number of responses (the so-called 'large p, small n' problem). We
consider the multiple linear regression setting, where the response is assumed
to be a continuous variable and it is a linear function of the predictors plus
error. The explanatory variables in the linear model can have a positive effect
on the response, a negative effect, or no effect. We model the effects of the
linear predictors as a three-component mixture in which a key assumption is
that only a small (unknown) fraction of the candidate predictors have a
nonzero effect on the response variable. By treating the coefficients as
random effects we develop an approach that is computationally efficient because
the number of parameters that have to be estimated is small, and remains
constant regardless of the number of explanatory variables. The model
parameters are estimated using the EM algorithm which is scalable and leads to
significantly faster convergence, compared with simulation-based methods.

We consider the problem of estimating covariance and precision matrices, and
their associated discriminant coefficients, from normal data when the rank of
the covariance matrix is strictly smaller than its dimension and the available
sample size. Using unbiased risk estimation, we construct novel estimators by
minimizing upper bounds on the difference in risk over several classes. Our
proposed estimators are empirically demonstrated to offer substantial
improvement over classical approaches.

We investigate the difference between using an $\ell_1$ penalty versus an
$\ell_1$ constraint in generalized eigenvalue problems, such as principal
component analysis and discriminant analysis. Our main finding is that an
$\ell_1$ penalty may fail to provide very sparse solutions; a severe
disadvantage for variable selection that can be remedied by using an $\ell_1$
constraint. Our claims are supported both by empirical evidence and theoretical
analysis. Finally, we illustrate the advantages of an $\ell_1$ constraint in
the context of discriminant analysis and principal component analysis.
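
To make the distinction concrete, the two formulations can be written side by side for a generalized eigenvalue problem. The notation ($A$, $B$, $\lambda$, $\tau$) is assumed here for illustration and may differ from the paper's.

\[
\text{penalized:}\quad \max_{v}\; v^{\top}A v - \lambda\|v\|_1
\quad\text{subject to}\quad v^{\top}B v \le 1,
\]
\[
\text{constrained:}\quad \max_{v}\; v^{\top}A v
\quad\text{subject to}\quad v^{\top}B v \le 1,\;\; \|v\|_1 \le \tau .
\]

Because the objective $v^{\top}A v$ is maximized rather than minimized, these problems are not convex, so the usual one-to-one correspondence between a penalty level $\lambda$ and a constraint level $\tau$ need not hold.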

This article considers the problem of sparse estimation of canonical vectors
in linear discriminant analysis when $p\gg N$. Several methods have been
proposed in the literature that estimate one canonical vector in the two-group
case. However, $G-1$ canonical vectors can be considered if the number of
groups is $G$. In the multi-group context, it is common to estimate canonical
vectors in a sequential fashion. Moreover, separate prior estimation of the
covariance structure is often required. We propose a novel methodology for
direct estimation of canonical vectors. In contrast to existing techniques, the
proposed method estimates all canonical vectors at once, performs variable
selection across all the vectors and comes with theoretical guarantees on the
variable selection and classification consistency. First, we highlight the fact
that in the $N>p$ setting the canonical vectors can be expressed in a closed
form up to an orthogonal transformation. Secondly, we propose an extension of
this form to the $p\gg N$ setting and achieve feature selection by using a
group penalty. The resulting optimization problem is convex and can be solved
using a block-coordinate descent algorithm. The practical performance of the
method is evaluated through simulation studies as well as real data
applications.

It is well known that in a supervised classification setting when the number
of features is smaller than the number of observations, Fisher's linear
discriminant rule is asymptotically Bayes. However, there are numerous modern
applications where classification is needed in the high-dimensional setting.
Naive implementation of Fisher's rule in this case fails to provide good
results because the sample covariance matrix is singular. Moreover, by
constructing a classifier that relies on all features the interpretation of the
results is challenging. Our goal is to provide robust classification that
relies only on a small subset of important features and accounts for the
underlying correlation structure. We apply a lasso-type penalty to the
discriminant vector to ensure sparsity of the solution and use a shrinkage type
estimator for the covariance matrix. The resulting optimization problem is
solved using an iterative coordinate ascent algorithm. Furthermore, we analyze
the effect of nonconvexity on the sparsity level of the solution and highlight
the difference between the penalized and the constrained versions of the
problem. The simulation results show that the proposed method performs
favorably in comparison to alternatives. The method is used to classify
leukemia patients based on DNA methylation features.

We investigate the problem of estimating a spiked covariance matrix in high
dimensions under Frobenius loss, together with the parallel problem of estimating
the noise in spiked PCA. We propose an estimator of the noise parameter by minimizing
an unbiased estimator of the invariant Frobenius risk using calculus of
variations. The resulting estimator is shown, using random matrix theory, to be
strongly consistent and essentially asymptotically normal and minimax for the
noise estimation problem. We use this construction to build a robust
spiked covariance matrix estimator with consistent eigenvalues.

In this article, we develop a modern perspective on Akaike's Information
Criterion and Mallows' Cp for model selection. Despite the differences in
their respective motivation, they are equivalent in the special case of
Gaussian linear regression. In this case they are also equivalent to a third
criterion, an unbiased estimator of the quadratic prediction loss, derived from
loss estimation theory. Our first contribution is to provide an explicit link
between loss estimation and model selection through a new oracle inequality. We
then show that the form of the unbiased estimator of the quadratic prediction
loss under a Gaussian assumption still holds under a more general
distributional assumption, the family of spherically symmetric distributions.
One of the features of our results is that our criterion does not rely on the
specificity of the distribution, but only on its spherical symmetry. Also, this
family of laws offers some dependence property between the observations, a
case not often studied.
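
The equivalence in the Gaussian case can be seen directly when the variance is known. The notation below is assumed for illustration: $y = X\beta + \varepsilon$ with $\varepsilon \sim N(0, \sigma^2 I_n)$, $\sigma^2$ known, and $\hat{y}_k$ the least squares fit of a submodel with $k$ coefficients.

\[
C_p = \frac{\|y - \hat{y}_k\|^2}{\sigma^2} - n + 2k,
\qquad
\mathrm{AIC} = n\log(2\pi\sigma^2) + \frac{\|y - \hat{y}_k\|^2}{\sigma^2} + 2k,
\]
\[
\delta_0 = \|y - \hat{y}_k\|^2 + (2k - n)\,\sigma^2,
\]

where $\delta_0$ is unbiased for the quadratic prediction loss $\|\hat{y}_k - X\beta\|^2$. Then $\delta_0 = \sigma^2 C_p$ and $\mathrm{AIC} = C_p + n + n\log(2\pi\sigma^2)$, so all three criteria rank candidate submodels identically.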

We consider the problem of estimating the mean vector of a $p$-variate normal
$(\theta,\Sigma)$ distribution under invariant quadratic loss,
$(\delta-\theta)'\Sigma^{-1}(\delta-\theta)$, when the covariance is unknown.
We propose a new class of estimators that dominate the usual estimator
$\delta^0(X)=X$. The proposed estimators of $\theta$ depend upon X and an
independent Wishart matrix S with n degrees of freedom; however, S is singular
almost surely when p>n. The proof of domination involves the development of
some new unbiased estimators of risk for the p>n setting. We also find some
relationships between the amount of domination and the magnitudes of n and p.

Let $X$ be a random vector with distribution $P_{\theta}$ where $\theta$ is
an unknown parameter. When estimating $\theta$ by some estimator $\varphi(X)$
under a loss function $L(\theta,\varphi)$, classical decision theory advocates
that such a decision rule should be used if it has suitable properties with
respect to the frequentist risk $R(\theta,\varphi)$. However, after having
observed $X=x$, instances arise in practice in which $\varphi$ is to be
accompanied by an assessment of its loss, $L(\theta,\varphi(x))$, which is
unobservable since $\theta$ is unknown. A common approach to this assessment is
to consider estimation of $L(\theta,\varphi(x))$ by an estimator $\delta$,
called a loss estimator. We present an expository development of loss
estimation with substantial emphasis on the setting where the distributional
context is normal and its extension to the case where the underlying
distribution is spherically symmetric. Our overview covers improved loss
estimators for least squares but primarily focuses on shrinkage estimators.
Bayes estimation is also considered and comparisons are made with unbiased
estimation.

In this paper, we propose a general class of algorithms for optimizing an
extensive variety of nonsmoothly penalized objective functions that satisfy
certain regularity conditions. The proposed framework utilizes the
majorization-minimization (MM) algorithm as its core optimization engine. The
resulting algorithms rely on iterated soft-thresholding, implemented
componentwise, allowing for fast, stable updating that avoids the need for any
high-dimensional matrix inversion. We establish a local convergence theory for
this class of algorithms under weaker assumptions than previously considered in
the statistical literature. We also demonstrate the exceptional effectiveness
of new acceleration methods, originally proposed for the EM algorithm, in this
class of problems. Simulation results and a microarray data example are
provided to demonstrate the algorithm's capabilities and versatility.
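
As an illustration of iterated componentwise soft-thresholding, here is a minimal sketch for the simplest special case, the lasso objective $\frac{1}{2}\|y - Xb\|^2 + \lambda\|b\|_1$. It majorizes the quadratic term with a constant curvature bound, so each update needs no matrix inversion. This is not the paper's general algorithm or its acceleration scheme; the function names and the choice of bound are assumptions made for this example.

```python
def soft_threshold(z, t):
    """Soft-thresholding operator: sign(z) * max(|z| - t, 0)."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def lasso_mm(X, y, lam, n_iter=500):
    """Minimize 0.5 * ||y - X b||^2 + lam * ||b||_1 by MM iterations."""
    n, p = len(X), len(X[0])
    # Any L >= largest eigenvalue of X'X gives a valid majorizer;
    # the squared Frobenius norm is a crude but safe upper bound.
    L = sum(v * v for row in X for v in row)
    b = [0.0] * p
    for _ in range(n_iter):
        # Residuals and gradient of the smooth part at the current b.
        resid = [y[i] - sum(X[i][j] * b[j] for j in range(p)) for i in range(n)]
        grad = [-sum(X[i][j] * resid[i] for i in range(n)) for j in range(p)]
        # Minimizing the majorizer reduces to componentwise soft-thresholding.
        b = [soft_threshold(b[j] - grad[j] / L, lam / L) for j in range(p)]
    return b


# Orthonormal toy design, where the lasso solution is soft_threshold(y, lam).
b_hat = lasso_mm([[1.0, 0.0], [0.0, 1.0]], [3.0, 0.5], lam=1.0)
```

A tighter curvature bound would give larger steps and faster convergence; the crude bound is used here only to keep the sketch free of eigenvalue computations.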

A two-groups mixed-effects model for the comparison of (normalized)
microarray data from two treatment groups is considered. Most competing
parametric methods that have appeared in the literature are obtained as special
cases or by minor modification of the proposed model. Approximate maximum
likelihood fitting is accomplished via a fast and scalable algorithm, which we
call LEMMA (Laplace approximated EM Microarray Analysis). The posterior odds of
treatment $\times$ gene interactions, derived from the model, involve shrinkage
estimates of both the interactions and of the gene-specific error variances.
Genes are classified as being associated with treatment based on the posterior
odds and the local false discovery rate (f.d.r.) with a fixed cutoff. Our
model-based approach also allows one to declare the non-null status of a gene
by controlling the false discovery rate (FDR). It is shown in a detailed
simulation study that the approach outperforms well-known competitors. We also
apply the proposed methodology to two previously analyzed microarray examples.
Extensions of the proposed method to paired treatments and multiple treatments
are also discussed.
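
One common way to turn local false discovery rates into a rule that controls the overall FDR is to select the largest set of genes whose average local f.d.r. stays below the target level. The sketch below shows that generic rule; it is not claimed to be the exact procedure used in LEMMA, and the function name and input values are hypothetical.

```python
def select_by_fdr(local_fdr, alpha):
    """Return indices of genes declared non-null.

    Genes are ranked by local f.d.r.; the largest leading set whose
    average local f.d.r. is at most alpha is selected.
    """
    order = sorted(range(len(local_fdr)), key=lambda i: local_fdr[i])
    selected, running = [], 0.0
    for k, i in enumerate(order, start=1):
        running += local_fdr[i]
        # Running averages are nondecreasing because values are sorted,
        # so it is safe to stop at the first violation.
        if running / k > alpha:
            break
        selected.append(i)
    return selected


# Hypothetical local f.d.r. values for four genes.
chosen = select_by_fdr([0.01, 0.9, 0.05, 0.3], alpha=0.1)
```

By contrast, a fixed cutoff on the local f.d.r. alone would select every gene below the threshold regardless of the average over the selected set.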

Born in New Zealand, Shayle Robert Searle earned a bachelor's degree (1949)
and a master's degree (1950) from Victoria University, Wellington, New Zealand.
After working for an actuary, Searle went to Cambridge University where he
earned a Diploma in mathematical statistics in 1953. Searle won a Fulbright
travel award to Cornell University, where he earned a doctorate in animal
breeding, with a strong minor in statistics in 1959, studying under Professor
Charles Henderson. In 1962, Cornell invited Searle to work in the university's
computing center, and he soon joined the faculty as an assistant professor of
biological statistics. He was promoted to associate professor in 1965, and
became a professor of biological statistics in 1970. Searle has also been a
visiting professor at Texas A&M University, Florida State University,
Universit\"{a}t Augsburg and the University of Auckland. He has published
several statistics textbooks and has authored more than 165 papers. Searle is a
Fellow of the American Statistical Association, the Royal Statistical Society,
and he is an elected member of the International Statistical Institute. He also
has received the prestigious Alexander von Humboldt U.S. Senior Scientist
Award, is an Honorary Fellow of the Royal Society of New Zealand and was
recently awarded the D.Sc. Honoris Causa by his alma mater, Victoria University
of Wellington, New Zealand.

Traditional methods for covariate adjustment of treatment means in designed
experiments are inherently conditional on the observed covariate values. In
order to develop a coherent general methodology for analysis of covariance, we
propose a multivariate variance components model for the joint distribution of
the response and covariates. It is shown that, if the design is orthogonal with
respect to (random) blocking factors, then appropriate adjustments to treatment
means can be made using the univariate variance components model obtained by
conditioning on the observed covariate values. However, it is revealed that
some widely used models are incorrectly specified, leading to biased estimates
and incorrect standard errors. The approach clarifies some issues that have
been the source of ongoing confusion in the statistics literature.