We consider the estimation and inference of graphical models that
characterize the dependency structure of high-dimensional tensor-valued data.
To facilitate the estimation of the precision matrix corresponding to each way
of the tensor, we assume the data follow a tensor normal distribution whose
covariance has a Kronecker product structure. A critical challenge in the
estimation and inference of this model is the fact that its penalized maximum
likelihood estimation involves minimizing a non-convex objective function. To
address it, this paper makes two contributions: (i) In spite of the
non-convexity of this estimation problem, we prove that an alternating
minimization algorithm, which iteratively estimates each sparse precision
matrix while fixing the others, attains an estimator with an optimal
statistical rate of convergence. (ii) We propose a de-biased statistical
inference procedure for testing hypotheses on the true support of the sparse
precision matrices, and employ it for testing a growing number of hypothesis
with false discovery rate (FDR) control. The asymptotic normality of our test
statistic and the consistency of FDR control procedure are established. Our
theoretical results are backed up by thorough numerical studies and our real
applications on neuroimaging studies of Autism spectrum disorder and users'
advertising click analysis bring new scientific findings and business insights.
The proposed methods are encoded into a publicly available R package Tlasso.
Dynamic tensor data are becoming prevalent in numerous applications. Existing
tensor clustering methods either fail to account for the dynamic nature of the
data, or are inapplicable to a general-order tensor. Also there is often a gap
between statistical guarantee and computational efficiency for existing tensor
clustering solutions. In this article, we aim to bridge this gap by proposing a
new dynamic tensor clustering method, which takes into account both sparsity
and fusion structures, and enjoys strong statistical guarantees as well as high
computational efficiency. Our proposal is based upon a new structured tensor
factorization that encourages both sparsity and smoothness in parameters along
the specified tensor modes. Computationally, we develop a highly efficient
optimization algorithm that benefits from substantial dimension reduction. In
theory, we first establish a non-asymptotic error bound for the estimator from
the structured tensor factorization. Built upon this error bound, we then
derive the rate of convergence of the estimated cluster centers, and show that
the estimated clusters recover the true cluster structures with a high
probability. Moreover, our proposed method can be naturally extended to
co-clustering of multiple modes of the tensor data. The efficacy of our
approach is illustrated via simulations and a brain dynamic functional
connectivity analysis from an Autism spectrum disorder study.
Cluster analysis is a fundamental tool for pattern discovery of complex
heterogeneous data. Prevalent clustering methods mainly focus on vector or
matrix-variate data and are not applicable to general-order tensors, which
arise frequently in modern scientific and business applications. Moreover,
there is a gap between statistical guarantees and computational efficiency for
existing tensor clustering solutions due to the nature of their non-convex
formulations. In this work, we bridge this gap by developing a provable convex
formulation of tensor co-clustering. Our convex co-clustering (CoCo) estimator
enjoys stability guarantees and is both computationally and storage efficient.
We further establish a non-asymptotic error bound for the CoCo estimator, which
reveals a surprising "blessing of dimensionality" phenomenon that does not
exist in vector or matrix-variate cluster analysis. Our theoretical findings
are supported by extensive simulated studies. Finally, we apply the CoCo
estimator to the cluster analysis of advertisement click tensor data from a
major online company. Our clustering results provide meaningful business
insights to improve advertising effectiveness.
We consider joint estimation of multiple graphical models arising from
heterogeneous and high-dimensional observations. Unlike most previous
approaches which assume that the cluster structure is given in advance, an
appealing feature of our method is to learn cluster structure while estimating
heterogeneous graphical models. This is achieved via a high dimensional version
of Expectation Conditional Maximization (ECM) algorithm (Meng and Rubin, 1993).
A joint graphical lasso penalty is imposed on the conditional maximization step
to extract both homogeneity and heterogeneity components across all clusters.
Our algorithm is computationally efficient due to fast sparse learning routines
and can be implemented without unsupervised learning knowledge. The superior
performance of our method is demonstrated by extensive experiments and its
application to a Glioblastoma cancer dataset reveals some new insights in
understanding the Glioblastoma cancer. In theory, a non-asymptotic error bound
is established for the output directly from our high dimensional ECM algorithm,
and it consists of two quantities: statistical error (statistical accuracy) and
optimization error (computational complexity). Such a result gives a
theoretical guideline in terminating our ECM iterations.
Convex clustering, a convex relaxation of k-means clustering and hierarchical
clustering, has drawn recent attentions since it nicely addresses the
instability issue of traditional nonconvex clustering methods. Although its
computational and statistical properties have been recently studied, the
performance of convex clustering has not yet been investigated in the
high-dimensional clustering scenario, where the data contains a large number of
features and many of them carry no information about the clustering structure.
In this paper, we demonstrate that the performance of convex clustering could
be distorted when the uninformative features are included in the clustering. To
overcome it, we introduce a new clustering method, referred to as Sparse Convex
Clustering, to simultaneously cluster observations and conduct feature
selection. The key idea is to formulate convex clustering in a form of
regularization, with an adaptive group-lasso penalty term on cluster centers.
In order to optimally balance the tradeoff between the cluster fitting and
sparsity, a tuning criterion based on clustering stability is developed. In
theory, we provide an unbiased estimator for the degrees of freedom of the
proposed sparse convex clustering method. Finally, the effectiveness of the
sparse convex clustering is examined through a variety of numerical experiments
and a real data application.
Stability is an important aspect of a classification procedure because
unstable predictions can potentially reduce users' trust in a classification
system and also harm the reproducibility of scientific conclusions. The major
goal of our work is to introduce a novel concept of classification instability,
i.e., decision boundary instability (DBI), and incorporate it with the
generalization error (GE) as a standard for selecting the most accurate and
stable classifier. Specifically, we implement a two-stage algorithm: (i)
initially select a subset of classifiers whose estimated GEs are not
significantly different from the minimal estimated GE among all the candidate
classifiers; (ii) the optimal classifier is chosen as the one achieving the
minimal DBI among the subset selected in stage (i). This general selection
principle applies to both linear and nonlinear classifiers. Large-margin
classifiers are used as a prototypical example to illustrate the above idea.
Our selection method is shown to be consistent in the sense that the optimal
classifier simultaneously achieves the minimal GE and the minimal DBI. Various
simulations and real examples further demonstrate the advantage of our method
over several alternative approaches.
Motivated by applications in neuroimaging analysis, we propose a new
regression model with a tensor response and a vector predictor. The model
embeds two key sparse structures: element-wise sparsity and low-rankness. It
can handle both a general and a symmetric tensor response, and thus is
applicable to both structural and functional neuroimaging data. We formulate
the model parameter estimation as a non-convex optimization problem, and
develop an efficient alternating updating algorithm. We establish a
non-asymptotic estimation error bound for the actual estimator obtained from
the proposed algorithm. This error bound reveals an interesting interaction
between the computational efficiency and the statistical rate of convergence.
Based on this general error bound, we further obtain an optimal estimation
error rate when the distribution of the error tensor is Gaussian. We illustrate
the efficacy of our model through intensive simulations and an analysis of the
Autism spectrum disorder neuroimaging data.
We propose a novel sparse tensor decomposition method, namely Tensor
Truncated Power (TTP) method, that incorporates variable selection into the
estimation of decomposition components. The sparsity is achieved via an
efficient truncation step embedded in the tensor power iteration. Our method
applies to a broad family of high dimensional latent variable models, including
high dimensional Gaussian mixture and mixtures of sparse regressions. A
thorough theoretical investigation is further conducted. In particular, we show
that the final decomposition estimator is guaranteed to achieve a local
statistical rate, and further strengthen it to the global statistical rate by
introducing a proper initialization procedure. In high dimensional regimes, the
obtained statistical rate significantly improves those shown in the existing
non-sparse decomposition methods. The empirical advantages of TTP are confirmed
in extensive simulated results and two real applications of click-through rate
prediction and high-dimensional gene clustering.