
We consider the estimation and inference of graphical models that
characterize the dependency structure of high-dimensional tensor-valued data.
To facilitate the estimation of the precision matrix corresponding to each way
of the tensor, we assume the data follow a tensor normal distribution whose
covariance has a Kronecker product structure. A critical challenge in the
estimation and inference of this model is the fact that its penalized maximum
likelihood estimation involves minimizing a nonconvex objective function. To
address it, this paper makes two contributions: (i) In spite of the
nonconvexity of this estimation problem, we prove that an alternating
minimization algorithm, which iteratively estimates each sparse precision
matrix while fixing the others, attains an estimator with an optimal
statistical rate of convergence. (ii) We propose a debiased statistical
inference procedure for testing hypotheses on the true support of the sparse
precision matrices, and employ it for testing a growing number of hypotheses
with false discovery rate (FDR) control. The asymptotic normality of our test
statistic and the consistency of the FDR control procedure are established. Our
theoretical results are backed up by thorough numerical studies, and our
applications to neuroimaging studies of autism spectrum disorder and to users'
advertising click analysis bring new scientific findings and business insights.
The proposed methods are implemented in a publicly available R package, Tlasso.
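
To make the Kronecker product assumption more concrete, the sketch below builds the covariance of a vectorized two-way array from two small per-mode matrices and compares the number of entries in an unstructured covariance with the number in the per-mode factors. The function names and the example dimensions are illustrative assumptions, not part of the Tlasso package.

```python
from math import prod


def kron(a, b):
    """Kronecker product of two matrices given as lists of rows."""
    return [
        [x * y for x in a_row for y in b_row]
        for a_row in a
        for b_row in b
    ]


def covariance_entry_counts(mode_dims):
    """Return (entries in an unstructured covariance, entries in per-mode factors)."""
    total = prod(mode_dims)
    unstructured = total * total
    kronecker = sum(m * m for m in mode_dims)
    return unstructured, kronecker


if __name__ == "__main__":
    # Hypothetical mode sizes, chosen only for illustration.
    print(covariance_entry_counts([10, 20, 30]))
```

For a three-way tensor with modes of size 10, 20 and 30, the structured model stores one small matrix per mode instead of a single 6000 x 6000 matrix, which is what makes estimating each precision matrix in turn feasible.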

Measuring the corporate default risk is broadly important in economics and
finance. Quantitative methods have been developed to predictively assess future
corporate default probabilities. However, as a more difficult yet crucial
problem, evaluating the uncertainties associated with the default predictions
remains little explored. In this paper, we attempt to fill this gap by
developing a procedure for quantifying the level of associated uncertainties
upon carefully disentangling multiple contributing sources. Our framework
effectively incorporates broad information from historical default data,
corporates' financial records, and macroeconomic conditions by a)
characterizing the default mechanism, and b) capturing the future dynamics of
various features contributing to the default mechanism. Our procedure overcomes
the major challenges in this large-scale statistical inference problem and
makes it practically feasible by using parsimonious models, innovative methods,
and modern computational facilities. By predicting the market-wide total number
of defaults and assessing the associated uncertainties, our method can also be
applied for evaluating the aggregated market credit risk level. Upon analyzing
a US market data set, we demonstrate that the level of uncertainties associated
with default risk assessments is indeed substantial. More informatively, we
also find that the level of uncertainties associated with the default risk
predictions is correlated with the level of default risks, indicating potential
for new practical applications, including improving the accuracy of
default risk assessments.

Graph classification is a fundamental but challenging problem due to the
non-Euclidean nature of graphs. In this work, we jointly leverage the powerful
representation ability of random walks and the success of standard
convolutional neural networks (CNNs) to propose a random-walk-based
convolutional network, called walk-steered convolution (WSC). Unlike existing
graph CNNs with deterministic neighbor searching, we randomly sample
multi-scale walk fields by using random walks, which is more flexible with
respect to the scale of the graph. To encode each-scale walk field consisting
of several walk paths, we characterize the directions of the walk field by
multiple Gaussian models so as to better mirror standard CNNs on images.
Each Gaussian implicitly defines a direction, and together they encode
the spatial layout of walks after gradient projection onto the space of
Gaussian parameters. Further, a graph coarsening layer using dynamical
clustering is stacked upon the Gaussian encoding to capture high-level
semantics of the graph. Comprehensive evaluations on several public datasets
demonstrate the superiority of our proposed graph learning method over other
state-of-the-art methods for graph classification.

A new low-profile planar Eleven antenna is designed for optimal MIMO
performance as a wideband MIMO antenna for micro base-stations in future
wireless communication systems. The design objective has been to optimize both
the reflection coefficient at the input port of the antenna and the 1-bitstream
and 2-bitstream MIMO efficiency of the antenna at the same time, in both the
Rich Isotropic MultiPath (RIMP) and Random Line-of-Sight (Random-LOS)
environments. The planar Eleven antenna can be operated in 2-, 4-, and 8-port
modes with slight modifications. The optimization is performed using genetic
algorithms. The effects of polarization deficiencies and antenna total embedded
efficiency on the MIMO performance of the antenna are further studied. A
prototype of the antenna has been fabricated and the design has been verified
by measurements against the simulations.

The performance of 5G wireless communication systems, employing Massive MIMO
at millimeter-wave frequencies, is most likely measured only in Over-The-Air
(OTA) setups. It is proposed to perform OTA measurements in two limiting
environments of Rich Isotropic MultiPath (RIMP) and Random Line-of-Sight
(Random-LOS) instead of a typical or representative channel. In the present
paper, we present a back-of-the-envelope investigation of the impact of
scattering on the frequency dependence of the signal fading statistics in the
500 MHz-100 GHz band. We introduce a simple model for a generic scattering
environment by using randomly distributed resonant scatterers to investigate
the impact of the size of the scattering environment, the scatterer density,
and the number of scatterers on the signal variability in terms of the Rician
K-factor as a function of frequency. The simplified model is also verified
against full-wave simulation using the Method of Moments (MoM).

Cluster analysis is a fundamental tool for pattern discovery of complex
heterogeneous data. Prevalent clustering methods mainly focus on vector or
matrix-variate data and are not applicable to general-order tensors, which
arise frequently in modern scientific and business applications. Moreover,
there is a gap between statistical guarantees and computational efficiency for
existing tensor clustering solutions due to the nature of their nonconvex
formulations. In this work, we bridge this gap by developing a provable convex
formulation of tensor co-clustering. Our convex co-clustering (CoCo) estimator
enjoys stability guarantees and is both computationally and storage efficient.
We further establish a non-asymptotic error bound for the CoCo estimator, which
reveals a surprising "blessing of dimensionality" phenomenon that does not
exist in vector or matrix-variate cluster analysis. Our theoretical findings
are supported by extensive simulation studies. Finally, we apply the CoCo
estimator to the cluster analysis of advertisement click tensor data from a
major online company. Our clustering results provide meaningful business
insights to improve advertising effectiveness.

Variations of human body skeletons may be considered as dynamic graphs, which
are a generic data representation for numerous real-world applications. In this
paper, we propose a spatio-temporal graph convolution (STGC) approach that
combines the successes of local convolutional filtering and the sequence
learning ability of autoregressive moving average models. To encode dynamic
graphs, the constructed multi-scale local graph convolution filters, consisting
of matrices of local receptive fields and signal mappings, are recursively
applied to structured graph data in the temporal and spatial domains. The
proposed model is generic and principled, as it can be generalized to other
dynamic models. We theoretically prove the stability of STGC and provide an
upper bound on the signal transformation to be learnt. Further, the proposed
recursive model can be stacked into a multi-layer architecture. To evaluate our
model, we conduct extensive experiments on four benchmark skeleton-based action
datasets, including the large-scale challenging NTU RGB+D. The experimental
results demonstrate the effectiveness of our proposed model and the improvement
over the state of the art.

In the past decades, intensive efforts have been devoted to designing various
loss functions and metric forms for the metric learning problem. These improvements have
shown promising results when the test data is similar to the training data.
However, the trained models often fail to produce reliable distances on the
ambiguous test pairs due to the distribution bias between the training set and
the test set. To address this problem, Adversarial Metric Learning (AML) is
proposed in this paper, which automatically generates adversarial pairs to
remedy the distribution bias and facilitate robust metric learning.
Specifically, AML consists of two adversarial stages, i.e., confusion and
distinguishment. In the confusion stage, ambiguous but critical adversarial
data pairs are adaptively generated to mislead the learned metric. In the
distinguishment stage, a metric is exhaustively learned to distinguish both the
adversarial pairs and the original training pairs. Thanks to the challenges
posed by the confusion stage in this competing process, the AML model is able
to capture difficult cases that are not contained in the original training
pairs, so the discriminability of AML can be significantly improved. The entire
model is formulated as an optimization framework, whose global convergence is
theoretically proved. The experimental results on toy data and practical
datasets clearly demonstrate the superiority of AML over representative
state-of-the-art metric learning methodologies.

Based on an analysis that reveals the equivalence of modern networks, we
find that both ResNet and DenseNet are essentially derived from the same "dense
topology", yet they differ only in the form of connection: addition (dubbed
"inner link") vs. concatenation (dubbed "outer link"). However, both forms
of connection have their own strengths and weaknesses. To combine their
advantages and avoid certain limitations on representation learning, we present
a highly efficient and modularized Mixed Link Network (MixNet) which is
equipped with flexible inner link and outer link modules. Consequently, ResNet,
DenseNet and Dual Path Network (DPN) can each be regarded as a special case of
MixNet. Furthermore, we demonstrate that MixNets can achieve
superior parameter efficiency over the state-of-the-art architectures on
many competitive datasets like CIFAR-10/100, SVHN and ImageNet.
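
The difference between the two link types can be sketched on plain feature lists. This is a deliberately simplified illustration: the function names are hypothetical, each list element stands in for one channel, and the actual MixNet modules operate on convolutional feature maps rather than on Python lists.

```python
def inner_link(state, new_features):
    """Addition: merge new features into existing channels; width is unchanged."""
    if len(state) != len(new_features):
        raise ValueError("addition needs matching widths")
    return [s + f for s, f in zip(state, new_features)]


def outer_link(state, new_features):
    """Concatenation: append new features as extra channels; width grows."""
    return list(state) + list(new_features)


if __name__ == "__main__":
    state = [1.0, 2.0]
    new = [10.0, 20.0]
    print(inner_link(state, new))  # same width as the input state
    print(outer_link(state, new))  # width is the sum of both inputs
```

Addition keeps the representation width fixed while overwriting earlier content, whereas concatenation preserves earlier features at the cost of a growing width; mixing the two is the trade-off the abstract describes.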

Deep neural networks have recently been shown to achieve highly competitive
performance in many computer vision tasks due to their ability to explore
a much larger hypothesis space. However, since most deep architectures like
stacked RNNs tend to suffer from the vanishing-gradient and overfitting
problems, their effects are still understudied in many NLP tasks. Inspired by
this, we propose a novel multi-layer RNN model called densely connected
bidirectional long short-term memory (DC-Bi-LSTM) in this paper, which
essentially represents each layer by the concatenation of its hidden state and
all preceding layers' hidden states, followed by recursively passing each
layer's representation to all subsequent layers. We evaluate our proposed model
on five benchmark datasets of sentence classification. DC-Bi-LSTM with depth up
to 20 can be successfully trained and obtains significant improvements over the
traditional Bi-LSTM with the same or even fewer parameters. Moreover, our model
has promising performance compared with the state-of-the-art approaches.

Single image super-resolution is a very important computer vision task with
a wide range of applications. In recent years, the depth of
super-resolution models has been constantly increasing, but the small
gains in performance have come with a huge amount of computation and memory
consumption. In this work, in order to make super-resolution models more
effective, we propose a novel single image super-resolution method via
recursive squeeze and excitation networks (SESR). By introducing the squeeze
and excitation module, our SESR can model the interdependencies and
relationships between channels, which makes our model more efficient. In
addition, the recursive structure and progressive reconstruction method in our
model minimize the layers and parameters and enable SESR to simultaneously
train multi-scale super-resolution in a single model. After evaluation on four
benchmark test sets, our model is shown to outperform the state-of-the-art
methods in terms of speed and accuracy.

We propose to form a two-component effective field theory from L = (L_ce +
L_ch)/2, where L_ce is the Lagrangian of composite electrons with a
Chern-Simons term, and L_ch is the particle-hole conjugate of L_ce, i.e., the
Lagrangian of composite holes. In the theory, the two-component fermion field
phi is a composite particle-hole spinor coupled to an emergent effective gauge
field in the presence of a background electromagnetic field. The Chern-Simons
terms for both the composite electrons and composite holes are exactly
cancelled out, and a 1/2 pseudospin degree of freedom, which responds to the
emergent gauge field the same way as the real spin to the electromagnetic
field, emerges automatically. Furthermore, the composite particle-hole spinor
theory has exactly the same form as the non-relativistic limit of the massless
Dirac composite fermion theory after being expanded to the four-component form
and with a mass term added.

This paper first answers the question "why do the two most powerful
techniques Dropout and Batch Normalization (BN) often lead to a worse
performance when they are combined?" from both theoretical and
statistical perspectives. Theoretically, we find that Dropout would shift the
variance of a specific neural unit when we transfer the state of that network
from train to test. However, BN would maintain its statistical variance, which
is accumulated from the entire learning procedure, in the test phase. The
inconsistency of that variance (we name this scheme "variance shift") causes
unstable numerical behavior in inference that ultimately leads to more erroneous
predictions when Dropout is applied before BN. Thorough experiments on
DenseNet, ResNet, ResNeXt and Wide ResNet confirm our findings. According to
the uncovered mechanism, we next explore several strategies that modify
Dropout and try to overcome the limitations of their combination by avoiding
the variance shift risks.
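
The variance shift can be illustrated numerically. The sketch below assumes "inverted" Dropout (kept units are scaled by 1/p during training, and the layer is the identity at test time) and a unit whose input has mean c and variance v, drawn independently of the Dropout mask. Under those assumptions the training-time variance is (c^2 + v)/p - c^2, while the test-time variance is simply v. The function names, the Gaussian input and the sample size are illustrative choices, not taken from the paper.

```python
import random


def train_variance(mean, var, keep_prob):
    """Closed-form variance of a unit after inverted Dropout in training mode."""
    return (mean * mean + var) / keep_prob - mean * mean


def simulate_train_variance(mean, var, keep_prob, n=100_000, seed=0):
    """Monte Carlo estimate of the same quantity, using Gaussian inputs."""
    rng = random.Random(seed)
    std = var ** 0.5
    samples = [
        rng.gauss(mean, std) / keep_prob if rng.random() < keep_prob else 0.0
        for _ in range(n)
    ]
    avg = sum(samples) / n
    return sum((s - avg) ** 2 for s in samples) / n


if __name__ == "__main__":
    # With zero mean, unit variance and keep probability 0.5, the variance
    # seen in training is twice the variance the unit has at test time.
    print(train_variance(0.0, 1.0, 0.5))
    print(simulate_train_variance(0.0, 1.0, 0.5))
```

A BN layer placed after such a unit accumulates the training-time variance, so at test time it normalizes with a statistic that no longer matches its input.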

The particle-hole (PH) symmetry at the half-filled Landau level requires the
relationship between the flux number N_phi and the particle number N on a
sphere to be exactly N_phi - 2(N-1) = 1. The wave functions of composite
fermions with 1/2 "orbital spin", which contributes to the shift "1" in the
N_phi and N relationship, are proposed, shown to be PH symmetric, and validated
with exact finite system results. It is shown that the many-body composite
electron and composite hole wave functions at half-filling can be formed from
the two components of the same spinor wave function of a massless Dirac fermion
at zero magnetic field. It is further shown that away from half-filling, the
many-body composite electron wave function at filling factor nu and its PH
conjugated composite hole wave function at 1-nu can be formed from the two
components of the very same spinor wave functions of a massless Dirac fermion
at non-zero magnetic field. This relationship leads to the proposal of a very
simple Dirac composite fermion effective field theory, where the two-component
Dirac fermion field is a particle-hole spinor field coupled to the same
emergent gauge field, with one field component describing the composite
electrons and the other describing the PH conjugated composite holes. As such,
the density of the Dirac spinor field is the density sum of the composite
electron and hole field components, and therefore is equal to the degeneracy of
the lowest Landau level. On the other hand, the charge density coupled to the
external magnetic field is the density difference between the composite
electron and hole field components, and is therefore neutral at exactly
half-filling. It is shown that the proposed particle-hole spinor effective
field theory gives essentially the same electromagnetic responses as Son's
Dirac composite fermion theory does.
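
As a quick consistency check on the flux-particle relation N_phi - 2(N-1) = 1, one can use the standard fact that the lowest Landau level on a sphere pierced by N_phi flux quanta contains N_phi + 1 orbitals. The worked value N = 4 below is an illustrative choice, not a result quoted from the paper.

```latex
\begin{align}
N_\phi - 2(N-1) = 1 \;&\Longrightarrow\; N_\phi = 2N - 1, \\
N_{\text{orbitals}} &= N_\phi + 1 = 2N, \\
\nu = \frac{N}{N_{\text{orbitals}}} &= \frac{1}{2}.
\end{align}
```

For N = 4 this gives N_phi = 7 and 8 orbitals, so exactly half of the orbitals are occupied and the numbers of electrons and holes coincide, as PH symmetry requires.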

The motion analysis of human skeletons is crucial for human action
recognition, which is one of the most active topics in computer vision. In this
paper, we propose a fully end-to-end action-attending graphic neural network
(A$^2$GNN) for skeleton-based action recognition, in which each irregular
skeleton is structured as an undirected attribute graph. To extract high-level
semantic representations from skeletons, we perform local spectral graph
filtering on the constructed attribute graphs, analogous to the standard image
convolution operation. Considering that not all joints are informative for
action analysis, we design an action-attending layer to detect the salient
action units (AUs) by adaptively weighting skeletal joints. Herein the filtering
responses are parameterized into a weighting function independent of the order
of input nodes. To further encode continuous motion variations, the deep
features learnt from skeletal graphs are gathered along consecutive temporal
slices and then fed into a recurrent gated network. Finally, the spectral graph
filtering, action-attending and recurrent temporal encoding are integrated
and jointly trained for the sake of robust action recognition as well as
the intelligibility of human actions. To evaluate our A$^2$GNN, we conduct
extensive experiments on four benchmark skeleton-based action datasets,
including the large-scale challenging NTU RGB+D dataset. The experimental
results demonstrate that our network achieves state-of-the-art
performance.

Complex bufferless networks such as on-chip networks and optical burst
switching networks have not received enough attention in network science. In
complex bufferless networks, the store and forward mechanism is not applicable,
since the network nodes are not allowed to buffer data packets. In this paper,
we study the data transmission process in complex bufferless networks from the
perspective of network science. Specifically, we use the Price model to
generate the underlying network topological structures. We propose a
delivery-queue-based deflection mechanism, which accompanies the efficient
routing protocol, to transmit data packets in bufferless networks. We
investigate the average deflection times, packet loss rate, and average arrival
time, and how the network topological structure and some other factors affect
these performance measures. Our work provides some clues for the architecture and routing
design of bufferless networks.

Recently, very deep convolutional neural networks (CNNs) have been attracting
considerable attention in image restoration. However, as the depth grows, the
long-term dependency problem is rarely realized for these very deep models,
which results in the prior states/layers having little influence on the
subsequent ones. Motivated by the fact that human thoughts have persistency, we
propose a very deep persistent memory network (MemNet) that introduces a memory
block, consisting of a recursive unit and a gate unit, to explicitly mine
persistent memory through an adaptive learning process. The recursive unit
learns multi-level representations of the current state under different
receptive fields. The representations and the outputs from the previous memory
blocks are concatenated and sent to the gate unit, which adaptively controls
how much of the previous states should be reserved, and decides how much of the
current state should be stored. We apply MemNet to three image restoration
tasks, i.e., image denoising, super-resolution and JPEG deblocking.
Comprehensive experiments demonstrate the necessity of the MemNet and its
unanimous superiority on all three tasks over the state of the art. Code is
available at https://github.com/tyshiwo/MemNet.

Existing block-diagonal representation research mainly focuses on imposing
block-diagonal regularization on training data, while little attention is
devoted to concurrently learning block-diagonal representations of both
training and test data. In this paper, we propose a discriminative
block-diagonal low-rank representation (BDLRR) method for recognition. In
particular, the elaborate BDLRR is formulated as a joint optimization problem
of shrinking the unfavorable representation from off-block-diagonal elements
and strengthening the compact block-diagonal representation under the
semi-supervised framework of low-rank representation. To this end, we first
impose penalty constraints on the negative representation to eliminate the
correlation between different classes such that the incoherence criterion of
the extra-class representation is boosted. Moreover, a constructed subspace
model is developed to enhance the self-expressive power of training samples and
further build the representation bridge between the training and test samples,
such that the coherence of the learned intra-class representation is
consistently heightened. Finally, the resulting optimization problem is solved
elegantly by employing an alternating optimization strategy, and a simple
recognition algorithm on the learned representation is utilized for final
prediction. Extensive experimental results demonstrate that the proposed method
achieves superb recognition results on four face image datasets, three
character datasets, and the fifteen scene multi-categories dataset. It not only
shows superior potential on image recognition but also outperforms
state-of-the-art methods.

For human pose estimation in monocular images, joint occlusions and
overlapping upon human bodies often result in deviated pose predictions. Under
these circumstances, biologically implausible pose predictions may be produced.
In contrast, human vision is able to predict poses by exploiting geometric
constraints of joint interconnectivity. To address the problem by
incorporating priors about the structure of human bodies, we propose a novel
structure-aware convolutional network to implicitly take such priors into
account during training of the deep network. Explicit learning of such
constraints is typically challenging. Instead, we design discriminators to
distinguish the real poses from the fake ones (such as biologically implausible
ones). If the pose generator (G) generates results that the discriminator fails
to distinguish from real ones, the network successfully learns the priors.

We propose a game-theoretic framework that incorporates both incomplete
information and general ambiguity attitudes on factors external to all players.
Our starting point is players' preferences on payoff-distribution vectors,
essentially mappings from states of the world to distributions of payoffs to be
received by players. There are two ways in which equilibria for this preference
game can be defined. When the preferences possess ever more features, we can
gradually add ever more structures to the game. These include real-valued
utility-like functions over payoff-distribution vectors, sets of probabilistic
priors over states of the world, and eventually the traditional
expected-utility framework involving one single prior. We establish equilibrium
existence results, show the upper hemicontinuity of equilibrium sets over
changing ambiguity attitudes, and uncover relations between the two versions of
equilibria. Some attention is paid to the enterprising game, in which players
exhibit ambiguity-seeking attitudes while betting optimistically on the
favorable resolution of ambiguities. The two solution concepts are unified at
this game's pure equilibria, whose existence is guaranteed when strategic
complementarities are present. The current framework can be applied to settings
like auctions involving ambiguity on competitors' assessments of item worths.

For dynamic situations where the evolution of a player's state is influenced
by his own action as well as other players' states and actions, we show that
equilibria derived for nonatomic games (NGs) can be used by their large finite
counterparts to achieve near-equilibrium performances. We focus on the case
with quite general spaces but also with independently generated shocks driving
random actions and state transitions. The NG equilibria we consider are random
state-to-action maps that pay no attention to players' external environments.
They are adoptable by a variety of real situations where awareness of other
players' states can be anywhere between full and nonexistent. Transient
results here also form the basis of a link between an NG's stationary
equilibrium (SE) and good stationary profiles for large finite games.

We propose a derivative operator formed as a function of derivatives of the
electron coordinates. When the derivative operator is applied to the Laughlin
wave function, two new wave functions in the lowest Landau level at filling
factor 1/2 are generated. For systems of 4, 6, and 8 electrons in spherical
geometry, it is shown that the first wave function has nearly unity overlap
with the particle-hole conjugate of the Moore-Read Pfaffian wave function,
and therefore together with the Moore-Read Pfaffian state forms a particle-hole
conjugate pair. The second wave function has essentially perfect particle-hole
symmetry itself, with a positive parity when the number of electron pairs N/2
is an even integer and a negative parity when N/2 is an odd integer. An
equivalent form suggests the first wave function forms an f-wave pairing of
composite fermions, and the second wave function forms a p-wave pairing. The
corresponding non-Abelian statistics quasiparticle wave functions are also
proposed.

Recurrent neural networks (RNNs) have achieved state-of-the-art performances
in many natural language processing tasks, such as language modeling and
machine translation. However, when the vocabulary is large, the RNN model will
become very big (e.g., possibly beyond the memory capacity of a GPU device) and
its training will become very inefficient. In this work, we propose a novel
technique to tackle this challenge. The key idea is to use 2-Component (2C)
shared embedding for word representations. We allocate every word in the
vocabulary into a table, each row of which is associated with a vector, and
each column associated with another vector. Depending on its position in the
table, a word is jointly represented by two components: a row vector and a
column vector. Since the words in the same row share the row vector and the
words in the same column share the column vector, we only need $2 \sqrt{V}$
vectors to represent a vocabulary of $V$ unique words, which is far fewer
than the $V$ vectors required by existing approaches. Based on the
2-Component shared embedding, we design a new RNN algorithm and evaluate it
using the language modeling task on several benchmark datasets. The results
show that our algorithm significantly reduces the model size and speeds up the
training process, without sacrificing accuracy (it achieves similar, if not
better, perplexity as compared to state-of-the-art language models).
Remarkably, on the One-Billion-Word benchmark dataset, our algorithm achieves
comparable perplexity to previous language models, whilst reducing the model
size by a factor of 40-100, and speeding up the training process by a factor of
2. We name our proposed algorithm \emph{LightRNN} to reflect its very small
model size and very high training speed.
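
The table layout behind the 2-Component shared embedding can be sketched as follows. This is a minimal illustration only: the function names, the row-major assignment of word ids to cells, and the choice of a square table with side ceil(sqrt(V)) are assumptions, since the abstract does not say how words are allocated to table positions.

```python
import math


def table_side(vocab_size):
    """Smallest n such that an n x n table can hold vocab_size words."""
    n = math.isqrt(vocab_size)
    return n if n * n == vocab_size else n + 1


def word_position(word_id, vocab_size):
    """Map a word id to its (row, column) cell, filling the table row by row."""
    if not 0 <= word_id < vocab_size:
        raise ValueError("word_id out of range")
    return divmod(word_id, table_side(vocab_size))


def shared_vector_count(vocab_size):
    """One vector per row plus one per column, i.e. about 2 * sqrt(V)."""
    return 2 * table_side(vocab_size)


if __name__ == "__main__":
    V = 1_000_000  # hypothetical vocabulary size
    row, col = word_position(12_345, V)
    print(row, col, shared_vector_count(V))
```

Each word is then represented by the pair (row vector, column vector) at its cell, so two words in the same row share one component and differ only in the other.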

Information entropy has been proved to be an effective tool to quantify the
structural importance of complex networks. In previous work (Xu et al., 2016
\cite{xu2016}), we measured the contribution of a path in link prediction with
information entropy. In this paper, we further quantify the contribution of a
path with both path entropy and path weight, and propose a weighted prediction
index based on the contributions of paths, namely Weighted Path Entropy (WPE),
to improve the prediction accuracy in weighted networks. Empirical experiments
on six weighted real-world networks show that WPE achieves higher prediction
accuracy than three typical weighted indices.

An eigenfunction method is applied to reduce the regular projective
representations (Reps) of finite groups to obtain their irreducible projective
Reps. Anti-unitary groups are treated specially, where the decoupled factor
systems and a modified Schur's lemma are introduced. We discuss the applications
of irreducible Reps in many-body physics. It is shown that in symmetry
protected topological phases, geometric defects or symmetry defects may carry
projective Rep of the symmetry group; while in symmetry enriched topological
phases, intrinsic excitations (such as spinons or visons) may carry projective
Rep of the symmetry group. We also discuss the applications of projective Reps
in problems related to spectrum degeneracy, such as the search for models
without the sign problem in quantum Monte Carlo simulations.