• We consider the estimation and inference of graphical models that characterize the dependency structure of high-dimensional tensor-valued data. To facilitate the estimation of the precision matrix corresponding to each way of the tensor, we assume the data follow a tensor normal distribution whose covariance has a Kronecker product structure. A critical challenge in the estimation and inference of this model is the fact that its penalized maximum likelihood estimation involves minimizing a non-convex objective function. To address it, this paper makes two contributions: (i) In spite of the non-convexity of this estimation problem, we prove that an alternating minimization algorithm, which iteratively estimates each sparse precision matrix while fixing the others, attains an estimator with an optimal statistical rate of convergence. (ii) We propose a de-biased statistical inference procedure for testing hypotheses on the true support of the sparse precision matrices, and employ it for testing a growing number of hypothesis with false discovery rate (FDR) control. The asymptotic normality of our test statistic and the consistency of FDR control procedure are established. Our theoretical results are backed up by thorough numerical studies and our real applications on neuroimaging studies of Autism spectrum disorder and users' advertising click analysis bring new scientific findings and business insights. The proposed methods are encoded into a publicly available R package Tlasso.
  • Dynamic tensor data are becoming prevalent in numerous applications. Existing tensor clustering methods either fail to account for the dynamic nature of the data, or are inapplicable to a general-order tensor. Also there is often a gap between statistical guarantee and computational efficiency for existing tensor clustering solutions. In this article, we aim to bridge this gap by proposing a new dynamic tensor clustering method, which takes into account both sparsity and fusion structures, and enjoys strong statistical guarantees as well as high computational efficiency. Our proposal is based upon a new structured tensor factorization that encourages both sparsity and smoothness in parameters along the specified tensor modes. Computationally, we develop a highly efficient optimization algorithm that benefits from substantial dimension reduction. In theory, we first establish a non-asymptotic error bound for the estimator from the structured tensor factorization. Built upon this error bound, we then derive the rate of convergence of the estimated cluster centers, and show that the estimated clusters recover the true cluster structures with a high probability. Moreover, our proposed method can be naturally extended to co-clustering of multiple modes of the tensor data. The efficacy of our approach is illustrated via simulations and a brain dynamic functional connectivity analysis from an Autism spectrum disorder study.
  • Cluster analysis is a fundamental tool for pattern discovery of complex heterogeneous data. Prevalent clustering methods mainly focus on vector or matrix-variate data and are not applicable to general-order tensors, which arise frequently in modern scientific and business applications. Moreover, there is a gap between statistical guarantees and computational efficiency for existing tensor clustering solutions due to the nature of their non-convex formulations. In this work, we bridge this gap by developing a provable convex formulation of tensor co-clustering. Our convex co-clustering (CoCo) estimator enjoys stability guarantees and is both computationally and storage efficient. We further establish a non-asymptotic error bound for the CoCo estimator, which reveals a surprising "blessing of dimensionality" phenomenon that does not exist in vector or matrix-variate cluster analysis. Our theoretical findings are supported by extensive simulated studies. Finally, we apply the CoCo estimator to the cluster analysis of advertisement click tensor data from a major online company. Our clustering results provide meaningful business insights to improve advertising effectiveness.
  • We consider joint estimation of multiple graphical models arising from heterogeneous and high-dimensional observations. Unlike most previous approaches which assume that the cluster structure is given in advance, an appealing feature of our method is to learn cluster structure while estimating heterogeneous graphical models. This is achieved via a high dimensional version of Expectation Conditional Maximization (ECM) algorithm (Meng and Rubin, 1993). A joint graphical lasso penalty is imposed on the conditional maximization step to extract both homogeneity and heterogeneity components across all clusters. Our algorithm is computationally efficient due to fast sparse learning routines and can be implemented without unsupervised learning knowledge. The superior performance of our method is demonstrated by extensive experiments and its application to a Glioblastoma cancer dataset reveals some new insights in understanding the Glioblastoma cancer. In theory, a non-asymptotic error bound is established for the output directly from our high dimensional ECM algorithm, and it consists of two quantities: statistical error (statistical accuracy) and optimization error (computational complexity). Such a result gives a theoretical guideline in terminating our ECM iterations.
  • Convex clustering, a convex relaxation of k-means clustering and hierarchical clustering, has drawn recent attentions since it nicely addresses the instability issue of traditional nonconvex clustering methods. Although its computational and statistical properties have been recently studied, the performance of convex clustering has not yet been investigated in the high-dimensional clustering scenario, where the data contains a large number of features and many of them carry no information about the clustering structure. In this paper, we demonstrate that the performance of convex clustering could be distorted when the uninformative features are included in the clustering. To overcome it, we introduce a new clustering method, referred to as Sparse Convex Clustering, to simultaneously cluster observations and conduct feature selection. The key idea is to formulate convex clustering in a form of regularization, with an adaptive group-lasso penalty term on cluster centers. In order to optimally balance the tradeoff between the cluster fitting and sparsity, a tuning criterion based on clustering stability is developed. In theory, we provide an unbiased estimator for the degrees of freedom of the proposed sparse convex clustering method. Finally, the effectiveness of the sparse convex clustering is examined through a variety of numerical experiments and a real data application.
  • Stability is an important aspect of a classification procedure because unstable predictions can potentially reduce users' trust in a classification system and also harm the reproducibility of scientific conclusions. The major goal of our work is to introduce a novel concept of classification instability, i.e., decision boundary instability (DBI), and incorporate it with the generalization error (GE) as a standard for selecting the most accurate and stable classifier. Specifically, we implement a two-stage algorithm: (i) initially select a subset of classifiers whose estimated GEs are not significantly different from the minimal estimated GE among all the candidate classifiers; (ii) the optimal classifier is chosen as the one achieving the minimal DBI among the subset selected in stage (i). This general selection principle applies to both linear and nonlinear classifiers. Large-margin classifiers are used as a prototypical example to illustrate the above idea. Our selection method is shown to be consistent in the sense that the optimal classifier simultaneously achieves the minimal GE and the minimal DBI. Various simulations and real examples further demonstrate the advantage of our method over several alternative approaches.
  • Motivated by applications in neuroimaging analysis, we propose a new regression model with a tensor response and a vector predictor. The model embeds two key sparse structures: element-wise sparsity and low-rankness. It can handle both a general and a symmetric tensor response, and thus is applicable to both structural and functional neuroimaging data. We formulate the model parameter estimation as a non-convex optimization problem, and develop an efficient alternating updating algorithm. We establish a non-asymptotic estimation error bound for the actual estimator obtained from the proposed algorithm. This error bound reveals an interesting interaction between the computational efficiency and the statistical rate of convergence. Based on this general error bound, we further obtain an optimal estimation error rate when the distribution of the error tensor is Gaussian. We illustrate the efficacy of our model through intensive simulations and an analysis of the Autism spectrum disorder neuroimaging data.
  • We propose a novel sparse tensor decomposition method, namely Tensor Truncated Power (TTP) method, that incorporates variable selection into the estimation of decomposition components. The sparsity is achieved via an efficient truncation step embedded in the tensor power iteration. Our method applies to a broad family of high dimensional latent variable models, including high dimensional Gaussian mixture and mixtures of sparse regressions. A thorough theoretical investigation is further conducted. In particular, we show that the final decomposition estimator is guaranteed to achieve a local statistical rate, and further strengthen it to the global statistical rate by introducing a proper initialization procedure. In high dimensional regimes, the obtained statistical rate significantly improves those shown in the existing non-sparse decomposition methods. The empirical advantages of TTP are confirmed in extensive simulated results and two real applications of click-through rate prediction and high-dimensional gene clustering.