• We consider the estimation and inference of graphical models that characterize the dependency structure of high-dimensional tensor-valued data. To facilitate the estimation of the precision matrix corresponding to each way of the tensor, we assume the data follow a tensor normal distribution whose covariance has a Kronecker product structure. A critical challenge in the estimation and inference of this model is the fact that its penalized maximum likelihood estimation involves minimizing a non-convex objective function. To address it, this paper makes two contributions: (i) In spite of the non-convexity of this estimation problem, we prove that an alternating minimization algorithm, which iteratively estimates each sparse precision matrix while fixing the others, attains an estimator with an optimal statistical rate of convergence. (ii) We propose a de-biased statistical inference procedure for testing hypotheses on the true support of the sparse precision matrices, and employ it for testing a growing number of hypotheses with false discovery rate (FDR) control. The asymptotic normality of our test statistic and the consistency of the FDR control procedure are established. Our theoretical results are backed up by thorough numerical studies, and our real applications on neuroimaging studies of autism spectrum disorder and users' advertising click analysis bring new scientific findings and business insights. The proposed methods are implemented in a publicly available R package, Tlasso.
  • Measuring corporate default risk is broadly important in economics and finance. Quantitative methods have been developed to predictively assess future corporate default probabilities. However, the more difficult yet crucial problem of evaluating the uncertainties associated with the default predictions remains little explored. In this paper, we attempt to fill this gap by developing a procedure for quantifying the level of associated uncertainties upon carefully disentangling multiple contributing sources. Our framework effectively incorporates broad information from historical default data, corporates' financial records, and macroeconomic conditions by a) characterizing the default mechanism, and b) capturing the future dynamics of various features contributing to the default mechanism. Our procedure overcomes the major challenges in this large-scale statistical inference problem and makes it practically feasible by using parsimonious models, innovative methods, and modern computational facilities. By predicting the marketwide total number of defaults and assessing the associated uncertainties, our method can also be applied for evaluating the aggregated market credit risk level. Upon analyzing a US market data set, we demonstrate that the level of uncertainties associated with default risk assessments is indeed substantial. More informatively, we also find that the level of uncertainties associated with the default risk predictions is correlated with the level of default risks, indicating potential for new scopes in practical applications including improving the accuracy of default risk assessments.
  • Graph classification is a fundamental but challenging problem due to the non-Euclidean property of graphs. In this work, we jointly leverage the powerful representation ability of random walks and the success of the standard convolutional neural network (CNN) to propose a random walk based convolutional network, called walk-steered convolution (WSC). Unlike existing graph CNNs with deterministic neighbor searching, we randomly sample multi-scale walk fields by using random walks, which adapts more flexibly to the scale of the graph. To encode each scale's walk field, which consists of several walk paths, we characterize the directions of the walk field by multiple Gaussian models so as to better mirror standard CNNs on images. Each Gaussian implicitly defines a direction, and together they encode the spatial layout of walks after the gradient is projected onto the space of Gaussian parameters. Further, a graph coarsening layer using dynamical clustering is stacked upon the Gaussian encoding to capture high-level semantics of the graph. Comprehensive evaluations on several public datasets demonstrate the superiority of our proposed graph learning method over other state-of-the-art methods for graph classification.
  • A new low-profile planar Eleven antenna is designed for optimal MIMO performance as a wideband MIMO antenna for micro base-stations in future wireless communication systems. The design objective has been to optimize both the reflection coefficient at the input port of the antenna and the 1-bitstream and 2-bitstream MIMO efficiency of the antenna at the same time, in both the Rich Isotropic MultiPath (RIMP) and Random Line-of-Sight (Random-LOS) environments. The planar Eleven antenna can be operated in 2-, 4-, and 8-port modes with slight modifications. The optimization is performed using genetic algorithms. The effects of polarization deficiencies and antenna total embedded efficiency on the MIMO performance of the antenna are further studied. A prototype of the antenna has been fabricated and the design has been verified by measurements against the simulations.
  • The performance of 5G wireless communication systems, employing Massive-MIMO at millimeter-wave frequencies, is most likely measured only in Over-The-Air (OTA) setups. It is proposed to perform OTA measurements in two limiting environments of Rich Isotropic MultiPath (RIMP) and Random Line-of-Sight (Random-LOS) instead of a typical or representative channel. In the present paper, we present a back-of-the-envelope investigation of the impact of scattering on the frequency dependence of the signal fading statistics in the 500 MHz-100 GHz band. We introduce a simple model for a generic scattering environment by using randomly distributed resonant scatterers to investigate the impact of the size of the scattering environment, the scatterer density, and the number of scatterers on the signal variability in terms of the Rician K-factor as a function of frequency. The simplified model is also verified against full-wave simulation using the Method of Moments (MoM).
  • Cluster analysis is a fundamental tool for pattern discovery of complex heterogeneous data. Prevalent clustering methods mainly focus on vector or matrix-variate data and are not applicable to general-order tensors, which arise frequently in modern scientific and business applications. Moreover, there is a gap between statistical guarantees and computational efficiency for existing tensor clustering solutions due to the nature of their non-convex formulations. In this work, we bridge this gap by developing a provable convex formulation of tensor co-clustering. Our convex co-clustering (CoCo) estimator enjoys stability guarantees and is both computationally and storage efficient. We further establish a non-asymptotic error bound for the CoCo estimator, which reveals a surprising "blessing of dimensionality" phenomenon that does not exist in vector or matrix-variate cluster analysis. Our theoretical findings are supported by extensive simulated studies. Finally, we apply the CoCo estimator to the cluster analysis of advertisement click tensor data from a major online company. Our clustering results provide meaningful business insights to improve advertising effectiveness.
  • Variations of human body skeletons may be considered as dynamic graphs, which are a generic data representation for numerous real-world applications. In this paper, we propose a spatio-temporal graph convolution (STGC) approach that combines the successes of local convolutional filtering with the sequence learning ability of autoregressive moving average. To encode dynamic graphs, the constructed multi-scale local graph convolution filters, consisting of matrices of local receptive fields and signal mappings, are recursively performed on structured graph data in the temporal and spatial domains. The proposed model is generic and principled as it can be generalized into other dynamic models. We theoretically prove the stability of STGC and provide an upper bound on the signal transformation to be learnt. Further, the proposed recursive model can be stacked into a multi-layer architecture. To evaluate our model, we conduct extensive experiments on four benchmark skeleton-based action datasets, including the large-scale challenging NTU RGB+D. The experimental results demonstrate the effectiveness of our proposed model and the improvement over the state-of-the-art.
  • In the past decades, intensive efforts have been devoted to designing various loss functions and metric forms for the metric learning problem. These improvements have shown promising results when the test data is similar to the training data. However, the trained models often fail to produce reliable distances on ambiguous test pairs due to the distribution bias between the training set and the test set. To address this problem, Adversarial Metric Learning (AML) is proposed in this paper, which automatically generates adversarial pairs to remedy the distribution bias and facilitate robust metric learning. Specifically, AML consists of two adversarial stages, i.e. confusion and distinguishment. In the confusion stage, ambiguous but critical adversarial data pairs are adaptively generated to mislead the learned metric. In the distinguishment stage, a metric is exhaustively learned to distinguish both the adversarial pairs and the original training pairs as well as possible. Thanks to the challenges posed by the confusion stage in this competing process, the AML model is able to learn difficult cases that are not contained in the original training pairs, so the discriminability of AML can be significantly improved. The entire model is formulated as an optimization framework, whose global convergence is theoretically proved. The experimental results on toy data and practical datasets clearly demonstrate the superiority of AML over representative state-of-the-art metric learning methodologies.
  • Based on an analysis that reveals the equivalence of modern networks, we find that both ResNet and DenseNet are essentially derived from the same "dense topology", and they differ only in the form of connection -- addition (dubbed "inner link") vs. concatenation (dubbed "outer link"). However, both forms of connection have their own strengths and weaknesses. To combine their advantages and avoid certain limitations on representation learning, we present a highly efficient and modularized Mixed Link Network (MixNet), which is equipped with flexible inner link and outer link modules. Consequently, ResNet, DenseNet and Dual Path Network (DPN) can each be regarded as a special case of MixNet. Furthermore, we demonstrate that MixNets can achieve superior parameter efficiency over the state-of-the-art architectures on many competitive datasets like CIFAR-10/100, SVHN and ImageNet.
  • Deep neural networks have recently been shown to achieve highly competitive performance in many computer vision tasks due to their ability to explore a much larger hypothesis space. However, since most deep architectures like stacked RNNs tend to suffer from the vanishing-gradient and overfitting problems, their effects are still understudied in many NLP tasks. Inspired by this, we propose a novel multi-layer RNN model called densely connected bidirectional long short-term memory (DC-Bi-LSTM) in this paper, which essentially represents each layer by the concatenation of its hidden state and all preceding layers' hidden states, followed by recursively passing each layer's representation to all subsequent layers. We evaluate our proposed model on five benchmark datasets of sentence classification. DC-Bi-LSTM with depth up to 20 can be successfully trained and obtains significant improvements over the traditional Bi-LSTM with the same or even fewer parameters. Moreover, our model has promising performance compared with the state-of-the-art approaches.
  • Single image super resolution is a very important computer vision task, with a wide range of applications. In recent years, the depth of super-resolution models has been constantly increasing, but this has brought only a small increase in performance at the cost of a huge amount of computation and memory consumption. In this work, in order to make super resolution models more effective, we propose a novel single image super resolution method via recursive squeeze and excitation networks (SESR). By introducing the squeeze and excitation module, our SESR can model the interdependencies and relationships between channels, which makes our model more efficient. In addition, the recursive structure and progressive reconstruction method in our model minimize the layers and parameters and enable SESR to simultaneously train multi-scale super resolution in a single model. After evaluation on four benchmark test sets, our model is shown to outperform the state-of-the-art methods in terms of speed and accuracy.
  • We propose to form a two-component effective field theory from L = (L_ce + L_ch)/2, where L_ce is the Lagrangian of composite electrons with a Chern-Simons term, and L_ch is the particle-hole conjugate of L_ce - the Lagrangian of composite holes. In the theory, the two-component fermion field phi is a composite particle-hole spinor coupled to an emergent effective gauge field in the presence of a background electromagnetic field. The Chern-Simons terms for both the composite electrons and composite holes are exactly cancelled out, and a 1/2 pseudospin degree of freedom, which responds to the emergent gauge field the same way as the real spin to the electromagnetic field, emerges automatically. Furthermore, the composite particle-hole spinor theory has exactly the same form as the non-relativistic limit of the massless Dirac composite fermion theory after being expanded to the four-component form and with a mass term added.
  • This paper first answers the question "why do the two most powerful techniques Dropout and Batch Normalization (BN) often lead to a worse performance when they are combined together?" in both theoretical and statistical aspects. Theoretically, we find that Dropout shifts the variance of a specific neural unit when we transfer the state of that network from train to test. However, BN maintains its statistical variance, which is accumulated from the entire learning procedure, in the test phase. The inconsistency of that variance (we call this "variance shift") causes unstable numerical behavior in inference that ultimately leads to more erroneous predictions when Dropout is applied before BN. Thorough experiments on DenseNet, ResNet, ResNeXt and Wide ResNet confirm our findings. According to the uncovered mechanism, we next explore several strategies that modify Dropout and try to overcome the limitations of their combination by avoiding the variance shift risks.
  • The particle-hole (PH) symmetry at half-filled Landau level requires the relationship between the flux number N_phi and the particle number N on a sphere to be exactly N_phi - 2(N-1) = 1. The wave functions of composite fermions with 1/2 "orbital spin", which contributes to the shift "1" in the N_phi and N relationship, are proposed, shown to be PH symmetric, and validated with exact finite system results. It is shown that the many-body composite electron and composite hole wave functions at half-filling can be formed from the two components of the same spinor wave function of a massless Dirac fermion at zero magnetic field. It is further shown that away from half-filling, the many-body composite electron wave function at filling factor nu and its PH conjugated composite hole wave function at 1-nu can be formed from the two components of the very same spinor wave functions of a massless Dirac fermion at non-zero magnetic field. This relationship leads to the proposal of a very simple Dirac composite fermion effective field theory, where the two-component Dirac fermion field is a particle-hole spinor field coupled to the same emergent gauge field, with one field component describing the composite electrons and the other describing the PH conjugated composite holes. As such, the density of the Dirac spinor field is the density sum of the composite electron and hole field components, and therefore is equal to the degeneracy of the lowest Landau level. On the other hand, the charge density coupled to the external magnetic field is the density difference between the composite electron and hole field components, and is therefore neutral at exactly half-filling. It is shown that the proposed particle-hole spinor effective field theory gives essentially the same electromagnetic responses as Son's Dirac composite fermion theory does.
  • The motion analysis of human skeletons is crucial for human action recognition, which is one of the most active topics in computer vision. In this paper, we propose a fully end-to-end action-attending graphic neural network (A$^2$GNN) for skeleton-based action recognition, in which each irregular skeleton is structured as an undirected attribute graph. To extract high-level semantic representation from skeletons, we perform local spectral graph filtering on the constructed attribute graphs, analogous to the standard image convolution operation. Considering that not all joints are informative for action analysis, we design an action-attending layer to detect the salient action units (AUs) by adaptively weighting skeletal joints. Herein the filtering responses are parameterized into a weighting function that is independent of the order of input nodes. To further encode continuous motion variations, the deep features learnt from skeletal graphs are gathered along consecutive temporal slices and then fed into a recurrent gated network. Finally, the spectral graph filtering, action-attending and recurrent temporal encoding are integrated and trained jointly for the sake of robust action recognition as well as the intelligibility of human actions. To evaluate our A$^2$GNN, we conduct extensive experiments on four benchmark skeleton-based action datasets, including the large-scale challenging NTU RGB+D dataset. The experimental results demonstrate that our network achieves state-of-the-art performance.
  • Complex bufferless networks such as on-chip networks and optical burst switching networks have not received enough attention in network science. In complex bufferless networks, the store-and-forward mechanism is not applicable, since the network nodes are not allowed to buffer data packets. In this paper, we study the data transmission process in complex bufferless networks from the perspective of network science. Specifically, we use the Price model to generate the underlying network topological structures. We propose a delivery queue based deflection mechanism, which accompanies the efficient routing protocol, to transmit data packets in bufferless networks. We investigate the average deflection times, packet loss rate, and average arrival time, and how the network topological structure and some other factors affect these transmission performances. Our work provides some clues for the architecture and routing design of bufferless networks.
  • Recently, very deep convolutional neural networks (CNNs) have been attracting considerable attention in image restoration. However, as the depth grows, the long-term dependency problem is rarely realized for these very deep models, which results in the prior states/layers having little influence on the subsequent ones. Motivated by the fact that human thoughts have persistency, we propose a very deep persistent memory network (MemNet) that introduces a memory block, consisting of a recursive unit and a gate unit, to explicitly mine persistent memory through an adaptive learning process. The recursive unit learns multi-level representations of the current state under different receptive fields. The representations and the outputs from the previous memory blocks are concatenated and sent to the gate unit, which adaptively controls how much of the previous states should be reserved, and decides how much of the current state should be stored. We apply MemNet to three image restoration tasks, i.e., image denoising, super-resolution and JPEG deblocking. Comprehensive experiments demonstrate the necessity of the MemNet and its consistent superiority on all three tasks over the state of the art. Code is available at https://github.com/tyshiwo/MemNet.
  • Existing block-diagonal representation research mainly focuses on casting block-diagonal regularization on training data, while little attention is dedicated to concurrently learning block-diagonal representations of both training and test data. In this paper, we propose a discriminative block-diagonal low-rank representation (BDLRR) method for recognition. In particular, BDLRR is formulated as a joint optimization problem of shrinking the unfavorable representation from off-block-diagonal elements and strengthening the compact block-diagonal representation under the semi-supervised framework of low-rank representation. To this end, we first impose penalty constraints on the negative representation to eliminate the correlation between different classes such that the incoherence criterion of the extra-class representation is boosted. Moreover, a constructed subspace model is developed to enhance the self-expressive power of training samples and further build the representation bridge between the training and test samples, such that the coherence of the learned intra-class representation is consistently heightened. Finally, the resulting optimization problem is solved by employing an alternative optimization strategy, and a simple recognition algorithm on the learned representation is utilized for final prediction. Extensive experimental results demonstrate that the proposed method achieves superb recognition results on four face image datasets, three character datasets, and the fifteen scene multi-categories dataset. It not only shows superior potential on image recognition but also outperforms state-of-the-art methods.
  • For human pose estimation in monocular images, joint occlusions and overlapping upon human bodies often result in deviated pose predictions. Under these circumstances, biologically implausible pose predictions may be produced. In contrast, human vision is able to predict poses by exploiting geometric constraints of joint inter-connectivity. To address the problem by incorporating priors about the structure of human bodies, we propose a novel structure-aware convolutional network to implicitly take such priors into account during training of the deep network. Explicit learning of such constraints is typically challenging. Instead, we design discriminators to distinguish the real poses from the fake ones (such as biologically implausible ones). If the pose generator (G) generates results that the discriminator fails to distinguish from real ones, the network successfully learns the priors.
  • We propose a game-theoretic framework that incorporates both incomplete information and general ambiguity attitudes on factors external to all players. Our starting point is players' preferences on payoff-distribution vectors, essentially mappings from states of the world to distributions of payoffs to be received by players. There are two ways in which equilibria for this preference game can be defined. When the preferences possess ever more features, we can gradually add ever more structures to the game. These include real-valued utility-like functions over payoff-distribution vectors, sets of probabilistic priors over states of the world, and eventually the traditional expected-utility framework involving one single prior. We establish equilibrium existence results, show the upper hemi-continuity of equilibrium sets over changing ambiguity attitudes, and uncover relations between the two versions of equilibria. Some attention is paid to the enterprising game, in which players exhibit ambiguity seeking attitudes while betting optimistically on the favorable resolution of ambiguities. The two solution concepts are unified at this game's pure equilibria, whose existence is guaranteed when strategic complementarities are present. The current framework can be applied to settings like auctions involving ambiguity on competitors' assessments of item worths.
  • For dynamic situations where the evolution of a player's state is influenced by his own action as well as other players' states and actions, we show that equilibria derived for nonatomic games (NGs) can be used by their large finite counterparts to achieve near-equilibrium performances. We focus on the case with quite general spaces but also with independently generated shocks driving random actions and state transitions. The NG equilibria we consider are random state-to-action maps that pay no attention to players' external environments. They are adoptable by a variety of real situations where awareness of other players' states can be anywhere between full and non-existent. Transient results here also form the basis of a link between an NG's stationary equilibrium (SE) and good stationary profiles for large finite games.
  • We propose a derivative operator formed as a function of derivatives of the electron coordinates. When the derivative operator is applied to the Laughlin wave function, two new wave functions in the lowest Landau level at filling factor 1/2 are generated. For systems of 4, 6, and 8 electrons in spherical geometry, it is shown that the first wave function has nearly unity overlap with the particle-hole conjugate of the Moore-Read Pfaffian wave function, and therefore together with the Moore-Read Pfaffian state forms a particle-hole conjugate pair. The second wave function has essentially perfect particle-hole symmetry itself, with a positive parity when the number of electron pairs N/2 is an even integer and a negative parity when N/2 is an odd integer. An equivalent form suggests the first wave function forms an f-wave pairing of composite fermions, and the second wave function forms a p-wave pairing. The corresponding non-Abelian statistics quasiparticle wave functions are also proposed.
  • Recurrent neural networks (RNNs) have achieved state-of-the-art performances in many natural language processing tasks, such as language modeling and machine translation. However, when the vocabulary is large, the RNN model will become very big (e.g., possibly beyond the memory capacity of a GPU device) and its training will become very inefficient. In this work, we propose a novel technique to tackle this challenge. The key idea is to use 2-Component (2C) shared embedding for word representations. We allocate every word in the vocabulary into a table, each row of which is associated with a vector, and each column associated with another vector. Depending on its position in the table, a word is jointly represented by two components: a row vector and a column vector. Since the words in the same row share the row vector and the words in the same column share the column vector, we only need $2 \sqrt{|V|}$ vectors to represent a vocabulary of $|V|$ unique words, which is far fewer than the $|V|$ vectors required by existing approaches. Based on the 2-Component shared embedding, we design a new RNN algorithm and evaluate it using the language modeling task on several benchmark datasets. The results show that our algorithm significantly reduces the model size and speeds up the training process, without sacrificing accuracy (it achieves similar, if not better, perplexity as compared to state-of-the-art language models). Remarkably, on the One-Billion-Word benchmark dataset, our algorithm achieves comparable perplexity to previous language models, whilst reducing the model size by a factor of 40-100, and speeding up the training process by a factor of 2. We name our proposed algorithm LightRNN to reflect its very small model size and very high training speed.
  • Information entropy has been proved to be an effective tool to quantify the structural importance of complex networks. In previous work (Xu et al., 2016), we measured the contribution of a path in link prediction with information entropy. In this paper, we further quantify the contribution of a path with both path entropy and path weight, and propose a weighted prediction index based on the contributions of paths, namely Weighted Path Entropy (WPE), to improve the prediction accuracy in weighted networks. Empirical experiments on six weighted real-world networks show that WPE achieves higher prediction accuracy than three typical weighted indices.
  • An eigenfunction method is applied to reduce the regular projective representations (Reps) of finite groups to obtain their irreducible projective Reps. Anti-unitary groups are treated specially, where the decoupled factor systems and modified Schur's lemma are introduced. We discuss the applications of irreducible Reps in many-body physics. It is shown that in symmetry protected topological phases, geometric defects or symmetry defects may carry projective Reps of the symmetry group, while in symmetry enriched topological phases, intrinsic excitations (such as spinons or visons) may carry projective Reps of the symmetry group. We also discuss the applications of projective Reps in problems related to spectrum degeneracy, such as the search for models without a sign problem in quantum Monte Carlo simulations.
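
The vector count quoted in the LightRNN abstract above (2 sqrt(|V|) shared vectors instead of |V|) can be illustrated with a minimal sketch. The function names and the row-major placement of words are assumptions made only for illustration; the abstract does not say how words are assigned to table cells, and this is not the authors' implementation.

```python
import math
from typing import Tuple


def table_side(vocab_size: int) -> int:
    """Smallest side length n of a square table with n * n >= vocab_size."""
    n = math.isqrt(vocab_size)
    return n if n * n == vocab_size else n + 1


def word_to_cell(word_id: int, side: int) -> Tuple[int, int]:
    """Hypothetical row-major placement: return (row, column) of a word id."""
    return divmod(word_id, side)


def count_vectors(vocab_size: int) -> Tuple[int, int]:
    """Return (shared row + column vectors, one-vector-per-word baseline)."""
    side = table_side(vocab_size)
    return 2 * side, vocab_size


vocab_size = 10_000
side = table_side(vocab_size)                   # 100 rows and 100 columns
row, col = word_to_cell(4_217, side)            # word 4217 sits at row 42, column 17
shared, baseline = count_vectors(vocab_size)    # 200 shared vectors vs 10000
print(side, (row, col), shared, baseline)
```

Each word is then represented by the pair (row vector, column vector) of its cell, so two words in the same row share one component and differ in the other.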