• ### Combinatorial Inference for Graphical Models(1608.03045)

Feb. 13, 2018 math.ST, stat.TH, stat.ML
We propose a new family of combinatorial inference problems for graphical models. Unlike classical statistical inference where the main interest is point estimation or parameter testing, combinatorial inference aims at testing the global structure of the underlying graph. Examples include testing the graph connectivity, the presence of a cycle of certain size, or the maximum degree of the graph. To begin with, we develop a unified theory for the fundamental limits of a large family of combinatorial inference problems. We propose new concepts including structural packing and buffer entropies to characterize how the complexity of combinatorial graph structures impacts the corresponding minimax lower bounds. On the other hand, we propose a family of novel and practical structural testing algorithms to match the lower bounds. We provide thorough numerical results on both synthetic graphical models and brain networks to illustrate the usefulness of these proposed methods.
• ### Post-Regularization Inference for Time-Varying Nonparanormal Graphical Models(1512.08298)

Feb. 12, 2018 stat.ML
We propose a novel class of time-varying nonparanormal graphical models, which allows us to model high dimensional heavy-tailed systems and the evolution of their latent network structures. Under this model, we develop statistical tests for presence of edges both locally at a fixed index value and globally over a range of values. The tests are developed for a high-dimensional regime, are robust to model selection mistakes and do not require commonly assumed minimum signal strength. The testing procedures are based on a high dimensional, debiasing-free moment estimator, which uses a novel kernel smoothed Kendall's tau correlation matrix as an input statistic. The estimator consistently estimates the latent inverse Pearson correlation matrix uniformly in both the index variable and kernel bandwidth. Its rate of convergence is shown to be minimax optimal. Our method is supported by thorough numerical simulations and an application to a neural imaging data set.
• ### Kernel Meets Sieve: Post-Regularization Confidence Bands for Sparse Additive Model(1503.02978)

Feb. 12, 2018 math.ST, stat.TH, stat.ML
We develop a novel procedure for constructing confidence bands for components of a sparse additive model. Our procedure is based on a new kernel-sieve hybrid estimator that combines two most popular nonparametric estimation methods in the literature, the kernel regression and the spline method, and is of interest in its own right. Existing methods for fitting sparse additive model are primarily based on sieve estimators, while the literature on confidence bands for nonparametric models are primarily based upon kernel or local polynomial estimators. Our kernel-sieve hybrid estimator combines the best of both worlds and allows us to provide a simple procedure for constructing confidence bands in high-dimensional sparse additive models. We prove that the confidence bands are asymptotically honest by studying approximation with a Gaussian process. Thorough numerical results on both synthetic data and real-world neuroscience data are provided to demonstrate the efficacy of the theory.
• ### Symmetry, Saddle Points, and Global Optimization Landscape of Nonconvex Matrix Factorization(1612.09296)

Jan. 20, 2018 math.OC, cs.LG, stat.ML
We propose a general theory for studying the \xl{landscape} of nonconvex \xl{optimization} with underlying symmetric structures \tz{for a class of machine learning problems (e.g., low-rank matrix factorization, phase retrieval, and deep linear neural networks)}. In specific, we characterize the locations of stationary points and the null space of Hessian matrices \xl{of the objective function} via the lens of invariant groups\removed{for associated optimization problems, including low-rank matrix factorization, phase retrieval, and deep linear neural networks}. As a major motivating example, we apply the proposed general theory to characterize the global \xl{landscape} of the \xl{nonconvex optimization in} low-rank matrix factorization problem. In particular, we illustrate how the rotational symmetry group gives rise to infinitely many nonisolated strict saddle points and equivalent global minima of the objective function. By explicitly identifying all stationary points, we divide the entire parameter space into three regions: ($\cR_1$) the region containing the neighborhoods of all strict saddle points, where the objective has negative curvatures; ($\cR_2$) the region containing neighborhoods of all global minima, where the objective enjoys strong convexity along certain directions; and ($\cR_3$) the complement of the above regions, where the gradient has sufficiently large magnitudes. We further extend our result to the matrix sensing problem. Such global landscape implies strong global convergence guarantees for popular iterative algorithms with arbitrary initial solutions.
• ### Inter-Subject Analysis: Inferring Sparse Interactions with Dense Intra-Graphs(1709.07036)

Sept. 20, 2017 math.ST, stat.TH, stat.ME, stat.ML
We develop a new modeling framework for Inter-Subject Analysis (ISA). The goal of ISA is to explore the dependency structure between different subjects with the intra-subject dependency as nuisance. It has important applications in neuroscience to explore the functional connectivity between brain regions under natural stimuli. Our framework is based on the Gaussian graphical models, under which ISA can be converted to the problem of estimation and inference of the inter-subject precision matrix. The main statistical challenge is that we do not impose sparsity constraint on the whole precision matrix and we only assume the inter-subject part is sparse. For estimation, we propose to estimate an alternative parameter to get around the non-sparse issue and it can achieve asymptotic consistency even if the intra-subject dependency is dense. For inference, we propose an "untangle and chord" procedure to de-bias our estimator. It is valid without the sparsity assumption on the inverse Hessian of the log-likelihood function. This inferential method is general and can be applied to many other statistical problems, thus it is of independent theoretical interest. Numerical experiments on both simulated and brain imaging data validate our methods and theory.
• ### Adaptive Inferential Method for Monotone Graph Invariants(1707.09114)

July 28, 2017 math.ST, stat.TH, stat.ML
We consider the problem of undirected graphical model inference. In many applications, instead of perfectly recovering the unknown graph structure, a more realistic goal is to infer some graph invariants (e.g., the maximum degree, the number of connected subgraphs, the number of isolated nodes). In this paper, we propose a new inferential framework for testing nested multiple hypotheses and constructing confidence intervals of the unknown graph invariants under undirected graphical models. Compared to perfect graph recovery, our methods require significantly weaker conditions. This paper makes two major contributions: (i) Methodologically, for testing nested multiple hypotheses, we propose a skip-down algorithm on the whole family of monotone graph invariants (The invariants which are non-decreasing under addition of edges). We further show that the same skip-down algorithm also provides valid confidence intervals for the targeted graph invariants. (ii) Theoretically, we prove that the length of the obtained confidence intervals are optimal and adaptive to the unknown signal strength. We also prove generic lower bounds for the confidence interval length for various invariants. Numerical results on both synthetic simulations and a brain imaging dataset are provided to illustrate the usefulness of the proposed method.
• ### Provable Sparse Tensor Decomposition(1502.01425)

May 2, 2016 stat.ML
We propose a novel sparse tensor decomposition method, namely Tensor Truncated Power (TTP) method, that incorporates variable selection into the estimation of decomposition components. The sparsity is achieved via an efficient truncation step embedded in the tensor power iteration. Our method applies to a broad family of high dimensional latent variable models, including high dimensional Gaussian mixture and mixtures of sparse regressions. A thorough theoretical investigation is further conducted. In particular, we show that the final decomposition estimator is guaranteed to achieve a local statistical rate, and further strengthen it to the global statistical rate by introducing a proper initialization procedure. In high dimensional regimes, the obtained statistical rate significantly improves those shown in the existing non-sparse decomposition methods. The empirical advantages of TTP are confirmed in extensive simulated results and two real applications of click-through rate prediction and high-dimensional gene clustering.
• ### Nonparametric Heterogeneity Testing For Massive Data(1601.06212)

Jan. 23, 2016 math.ST, stat.TH
A massive dataset often consists of a growing number of (potentially) heterogeneous sub-populations. This paper is concerned about testing various forms of heterogeneity arising from massive data. In a general nonparametric framework, a set of testing procedures are designed to accommodate a growing number of sub-populations, denoted as $s$, with computational feasibility. In theory, their null limit distributions are derived as being nearly Chi-square with diverging degrees of freedom as long as $s$ does not grow too fast. Interestingly, we find that a lower bound on $s$ needs to be set for obtaining a sufficiently powerful testing result, so-called "blessing of aggregation." As a by-produc, a type of homogeneity testing is also proposed with a test statistic being aggregated over all sub-populations. Numerical results are presented to support our theory.
• ### Distributed Estimation and Inference with Statistical Guarantees(1509.05457)

Sept. 17, 2015 math.ST, stat.TH
This paper studies hypothesis testing and parameter estimation in the context of the divide and conquer algorithm. In a unified likelihood based framework, we propose new test statistics and point estimators obtained by aggregating various statistics from $k$ subsamples of size $n/k$, where $n$ is the sample size. In both low dimensional and high dimensional settings, we address the important question of how to choose $k$ as $n$ grows large, providing a theoretical upper bound on $k$ such that the information loss due to the divide and conquer algorithm is negligible. In other words, the resulting estimators have the same inferential efficiencies and estimation rates as a practically infeasible oracle with access to the full sample. Thorough numerical results are provided to back up the theory.
• ### Graphical Fermat's Principle and Triangle-Free Graph Estimation(1504.06026)

April 23, 2015 stat.ML
We consider the problem of estimating undirected triangle-free graphs of high dimensional distributions. Triangle-free graphs form a rich graph family which allows arbitrary loopy structures but 3-cliques. For inferential tractability, we propose a graphical Fermat's principle to regularize the distribution family. Such principle enforces the existence of a distribution-dependent pseudo-metric such that any two nodes have a smaller distance than that of two other nodes who have a geodesic path include these two nodes. Guided by this principle, we show that a greedy strategy is able to recover the true graph. The resulting algorithm only requires a pairwise distance matrix as input and is computationally even more efficient than calculating the minimum spanning tree. We consider graph estimation problems under different settings, including discrete and nonparametric distribution families. Thorough numerical results are provided to illustrate the usefulness of the proposed method.