
In this paper, we study the problem of designing efficient convolutional
neural network architectures with the aim of eliminating the redundancy in
convolution kernels. In addition to structured sparse kernels, low-rank kernels
and the product of low-rank kernels, the product of structured sparse kernels,
which is a framework for interpreting the recently-developed interleaved group
convolutions (IGC) and its variants (e.g., Xception), has been attracting
increasing interest.
Motivated by the observation that the convolutions contained in a group
convolution in IGC can be further decomposed in the same manner, we present a
modularized building block, {IGCV$2$:} interleaved structured sparse
convolutions. It generalizes interleaved group convolutions, which are composed
of two structured sparse kernels, to the product of more structured sparse
kernels, further eliminating the redundancy. We present the complementary
condition and the balance condition to guide the design of structured sparse
kernels, obtaining a balance among three aspects: model size, computational
complexity and classification accuracy. Experimental results demonstrate a
better balance among these three aspects compared to interleaved group
convolutions and Xception, and competitive performance compared to other
state-of-the-art architecture design methods.
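
The complementary condition can be checked on channel connectivity alone. The sketch below is an illustration under stated assumptions, not the paper's code: each group convolution is reduced to a 0/1 channel-connectivity matrix (spatial extent ignored), the interleaving permutation is assumed to be a simple "transpose" shuffle, and all function names are hypothetical. The composed kernel is treated as complementary when every output channel reaches every input channel through exactly one path, i.e. the product of the sparse matrices is all ones.

```python
from typing import List

Matrix = List[List[int]]


def block_diag_mask(num_groups: int, group_size: int) -> Matrix:
    """0/1 channel connectivity of a group convolution: input channel j
    feeds output channel i only when both lie in the same group."""
    n = num_groups * group_size
    return [[1 if i // group_size == j // group_size else 0 for j in range(n)]
            for i in range(n)]


def shuffle_mask(num_groups: int, group_size: int) -> Matrix:
    """Permutation moving channel g*group_size + m (member m of group g)
    to position m*num_groups + g, so each new group of size num_groups
    collects exactly one channel from every old group."""
    n = num_groups * group_size
    p = [[0] * n for _ in range(n)]
    for g in range(num_groups):
        for m in range(group_size):
            p[m * num_groups + g][g * group_size + m] = 1
    return p


def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def is_complementary(product: Matrix) -> bool:
    """True when every output channel reaches every input channel
    through exactly one path of the composed sparse kernels."""
    return all(v == 1 for row in product for v in row)


# Two-kernel case (IGC): L primary groups of M channels, then M secondary groups of L.
L, M = 2, 3
primary = block_diag_mask(L, M)
secondary = block_diag_mask(M, L)
dense = matmul(secondary, matmul(shuffle_mask(L, M), primary))
print(is_complementary(dense))                        # True
print(is_complementary(matmul(secondary, primary)))   # False: no interleaving
```

For chains of more than two structured sparse kernels, the same check applies to the longer matrix product.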

Semantic matching of natural language sentences, or identifying the
relationship between two sentences, is a core research problem underlying many
natural language tasks. Depending on whether training data is available, prior
research has proposed both unsupervised distance-based schemes and supervised
deep learning schemes for sentence matching. However, previous approaches
either omit or fail to fully utilize the ordered, hierarchical, and flexible
structures of language objects, as well as the interactions between them. In
this paper, we propose Hierarchical Sentence Factorization, a technique to
factorize a sentence into a hierarchical representation, with the components at
each scale reordered into a "predicate-argument" form. The proposed
sentence factorization technique leads to: 1) a new
unsupervised distance metric which calculates the semantic distance between a
pair of text snippets by solving a penalized optimal transport problem while
preserving the logical relationship of words in the reordered sentences, and 2)
new multiscale deep learning models for supervised semantic training, based on
factorized sentence hierarchies. We apply our techniques to text-pair
similarity estimation and text-pair relationship classification tasks, based on
multiple datasets such as STSbenchmark, the Microsoft Research paraphrase
identification (MSRP) dataset, the SICK dataset, etc. Extensive experiments
show that the proposed hierarchical sentence factorization can be used to
significantly improve the performance of existing unsupervised distance-based
metrics as well as multiple supervised deep learning models based on the
convolutional neural network (CNN) and long short-term memory (LSTM).
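
The unsupervised metric rests on optimal transport between the words of two text snippets. As a generic illustration, not the paper's formulation, the sketch below computes an entropy-regularised transport cost with Sinkhorn iterations. It assumes uniform word weights, a squared Euclidean ground cost on toy one-dimensional "word vectors", and a hypothetical `order_weight` term that penalises matching words at very different relative positions, standing in loosely for the paper's order-preserving penalty.

```python
import math
from typing import Sequence


def sinkhorn_distance(x: Sequence[Sequence[float]], y: Sequence[Sequence[float]],
                      order_weight: float = 0.0, eps: float = 0.5,
                      iters: int = 200) -> float:
    """Entropy-regularised OT cost between two bags of word vectors."""
    n, m = len(x), len(y)
    # Ground cost: squared Euclidean distance between word vectors, plus an
    # optional (hypothetical) penalty on mismatched relative positions.
    cost = [[sum((a - b) ** 2 for a, b in zip(x[i], y[j]))
             + order_weight * ((i + 0.5) / n - (j + 0.5) / m) ** 2
             for j in range(m)] for i in range(n)]
    kernel = [[math.exp(-c / eps) for c in row] for row in cost]
    a, b = [1.0 / n] * n, [1.0 / m] * m   # uniform word weights
    u, v = [1.0] * n, [1.0] * m
    for _ in range(iters):
        # Alternately rescale rows and columns to match the marginals a and b.
        u = [a[i] / sum(kernel[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(kernel[i][j] * u[i] for i in range(n)) for j in range(m)]
    # Transport plan P_ij = u_i K_ij v_j; return the transport cost <P, C>.
    return sum(u[i] * kernel[i][j] * v[j] * cost[i][j]
               for i in range(n) for j in range(m))


x = [[0.0], [1.0]]
z = [[2.0], [3.0]]
print(sinkhorn_distance(x, x) < sinkhorn_distance(x, z))   # True
```

A smaller `eps` moves the result closer to exact optimal transport but slows convergence and can underflow `math.exp`; a log-domain implementation is the usual remedy. The entropic blur is also why the distance of a snippet to itself is small but not zero.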

By applying the delicate \textit{a priori} estimates for the equations of
$(\Phi,\Gamma)$, which were introduced in previous work, we obtain some
multiscale regularity criteria of the swirl component $u^{\theta}$ for the 3D
axisymmetric Navier-Stokes equations. In particular, the solution
$\mathbf{u}$ can be continued beyond the time $T$, provided that $u^{\theta}$
satisfies $$ u^{\theta} \in
L^{p}_{T}L^{q_{v}}_{v}L^{q_{h},w}_{h},~~\frac{2}{p}+\frac{1}{q_{v}}+\frac{2}{q_{h}}\leq
1, ~2<q_{h}\leq\infty,~\frac{1}{q_{v}}+\frac{2}{q_{h}}<1. $$

Identifying the relationship between two text objects is a core research
problem underlying many natural language processing tasks. A wide range of deep
learning schemes have been proposed for text matching, mainly focusing on
sentence matching, question answering or query-document matching. We point out
that existing approaches do not perform well at matching long documents, which
is critical, for example, to AI-based news article understanding and event or
story formation. The reason is that these methods either omit or fail to fully
utilize complicated semantic structures in long documents. In this paper, we
propose a graph approach to text matching, especially targeting long document
matching, such as identifying whether two news articles report the same event
in the real world, possibly with different narratives. We propose the Concept
Interaction Graph to yield a graph representation for a document, with vertices
representing different concepts, each being one or a group of coherent keywords
in the document, and with edges representing the interactions between different
concepts, connected by sentences in the document. Based on the graph
representation of document pairs, we further propose a Siamese Encoded Graph
Convolutional Network that learns vertex representations through a Siamese
neural network and aggregates the vertex features through Graph Convolutional
Networks to generate the matching result. Extensive evaluation of the proposed
approach on two labeled news article datasets created at Tencent for its
intelligent news products shows that the proposed graph approach to long
document matching significantly outperforms a wide range of state-of-the-art
methods.
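
The aggregation step uses graph convolution over the concept graph. The sketch below shows one propagation layer in the widely used symmetric-normalised form ReLU(D^{-1/2}(A + I)D^{-1/2} H W) followed by a mean readout. Both the GCN variant and the readout are assumptions, the vertex features stand in for the output of the Siamese encoder (not reproduced here), and the weight matrix is a hypothetical toy value.

```python
import math
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def gcn_layer(adj: Matrix, h: Matrix, w: Matrix) -> Matrix:
    """One propagation step ReLU(D^{-1/2} (A + I) D^{-1/2} H W)."""
    n = len(adj)
    # Self-loops let each concept vertex keep its own features.
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
             for i in range(n)]
    deg = [sum(row) for row in a_hat]
    norm = [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]
    out = matmul(matmul(norm, h), w)
    return [[max(0.0, v) for v in row] for row in out]


def mean_readout(h: Matrix) -> List[float]:
    """Average vertex features into one graph-level vector for matching."""
    return [sum(col) / len(h) for col in zip(*h)]


# Toy concept graph: two concepts linked by shared sentences (edge weight 1).
adj = [[0.0, 1.0], [1.0, 0.0]]
h = [[1.0], [3.0]]            # per-vertex matching features (toy values)
w = [[1.0]]                   # hypothetical learned weight
print(gcn_layer(adj, h, w))                  # [[2.0], [2.0]]
print(mean_readout(gcn_layer(adj, h, w)))    # [2.0]
```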

Recently, Ising superconductors, which possess in-plane upper critical fields
much larger than the Pauli limit field, have been under intense experimental study.
Many monolayer or few-layer transition metal dichalcogenides have been shown to be
Ising superconductors. In this work, we show that in a wide range of
experimentally accessible regimes where the in-plane magnetic field is higher
than the Pauli limit field but lower than $H_{c2}$, a 2H-structure monolayer
NbSe$_2$ (or similarly TaS$_2$) becomes a nodal topological superconductor. The
bulk nodal points appear on the $\Gamma$-$M$ lines of the Brillouin zone where
the Ising SOC vanishes. The nodal points are connected by Majorana flat bands,
similar to the Weyl points being connected by surface Fermi arcs in Weyl
semimetals. The Majorana flat bands are associated with a large number of zero-energy
Majorana fermion edge modes which induce spin-triplet Cooper pairs. This
work demonstrates an experimentally feasible way to realise Majorana fermions
in a nodal topological superconductor, without any fine tuning of experimental
parameters.

It is widely accepted that convolutional neural networks play an important role
in learning excellent features for image classification and recognition.
However, traditionally they only connect adjacent layers, which limits the
integration of multiscale information. To further improve their performance,
we present a concatenating framework of shortcut convolutional neural networks.
This framework concatenates multiscale features via shortcut connections into
the fully-connected layer that directly feeds the output layer. We conduct a
large number of experiments to investigate the performance of the shortcut
convolutional neural networks on many benchmark visual datasets for different
tasks. The datasets include AR, FERET, FaceScrub and CelebA for gender
classification, CUReT for texture classification, MNIST for digit recognition,
and CIFAR-10 for object recognition. Experimental results show that the
shortcut convolutional neural networks achieve better results than the
traditional ones on these tasks, and are more stable across different settings of
pooling schemes, activation functions, optimizations, initializations, kernel
numbers and kernel sizes.
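
The concatenating step can be pictured with plain lists. This sketch assumes that the (pooled) feature maps of each selected stage are simply flattened and joined before the fully-connected layer; the paper may pool or select stages differently, and the stage shapes and weights are toy values.

```python
from typing import List, Sequence

FeatureMap = List[List[float]]          # one channel, height x width


def flatten(channels: Sequence[FeatureMap]) -> List[float]:
    return [v for fm in channels for row in fm for v in row]


def shortcut_features(stage_outputs: Sequence[Sequence[FeatureMap]]) -> List[float]:
    """Concatenate features from every chosen stage (shallow to deep)."""
    features: List[float] = []
    for channels in stage_outputs:
        features.extend(flatten(channels))
    return features


def fully_connected(x: Sequence[float], weights: Sequence[Sequence[float]],
                    bias: Sequence[float]) -> List[float]:
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


# Toy outputs: an early stage (2 channels, 4x4) and a later one (4 channels, 2x2).
early = [[[1.0] * 4 for _ in range(4)] for _ in range(2)]
late = [[[0.5] * 2 for _ in range(2)] for _ in range(4)]
feats = shortcut_features([early, late])
print(len(feats))                                      # 48
print(fully_connected(feats, [[1.0] * 48], [0.0]))     # [40.0]
```

Because early stages keep finer spatial detail and later stages more abstract features, the fully-connected layer sees both scales at once, which is the multiscale integration the paragraph describes.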

This work reports an experimental study on an antiferromagnetic honeycomb
lattice of MnPS$_3$ that couples the valley degree of freedom to a macroscopic
antiferromagnetic order. The crystal structure of MnPS$_3$ is identified by
high-resolution scanning transmission electron microscopy. Layer-dependent
angle-resolved polarized Raman fingerprints of the MnPS$_3$ crystal are
obtained, and the Raman peak at 383 cm$^{-1}$ exhibits 100% polarity.
The temperature dependence of the anisotropic magnetic susceptibility of the MnPS$_3$
crystal is measured with a superconducting quantum interference device. Magnetic
parameters such as the effective magnetic moment and the exchange interaction are
extracted from the mean field approximation model. Ambipolar electronic
transport channels in MnPS$_3$ are realized by the liquid gating technique. The
conducting channel of MnPS$_3$ offers a unique platform for exploring
spin/valleytronics and magnetic orders in the 2D limit.
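
A common way to extract such parameters is a Curie-Weiss fit of the paramagnetic susceptibility, chi = C/(T - theta), followed by mu_eff of roughly sqrt(8C) Bohr magnetons in CGS molar units. The sketch below assumes that this is the mean field model meant (the paper may add a temperature-independent term or use a different model), uses hypothetical noise-free data, and leaves out the exchange constant, whose relation to theta depends on the Hamiltonian convention.

```python
import math
from typing import Sequence, Tuple


def curie_weiss_fit(temps: Sequence[float],
                    chi: Sequence[float]) -> Tuple[float, float, float]:
    """Least-squares fit of 1/chi = (T - theta) / C in the paramagnetic range.

    Returns (C, theta, mu_eff); chi in emu mol^-1 Oe^-1, T in K,
    mu_eff in Bohr magnetons via mu_eff ~ sqrt(8 C).
    """
    inv = [1.0 / c for c in chi]
    n = len(temps)
    t_mean = sum(temps) / n
    y_mean = sum(inv) / n
    slope = (sum((t - t_mean) * (y - y_mean) for t, y in zip(temps, inv))
             / sum((t - t_mean) ** 2 for t in temps))
    intercept = y_mean - slope * t_mean
    curie_c = 1.0 / slope             # slope = 1/C
    theta = -intercept * curie_c      # intercept = -theta/C
    return curie_c, theta, math.sqrt(8.0 * curie_c)


# Synthetic check: spin-only Mn2+ (S = 5/2, g = 2) gives C = 4.375, and a
# hypothetical theta = -200 K; noise-free data should be recovered exactly.
temps = [300.0 + 10.0 * i for i in range(11)]
chi = [4.375 / (t + 200.0) for t in temps]
c_fit, theta_fit, mu_fit = curie_weiss_fit(temps, chi)
print(round(c_fit, 3), round(theta_fit, 1), round(mu_fit, 2))   # 4.375 -200.0 5.92
```

The fit window should lie in the paramagnetic regime well above the ordering temperature; a negative theta signals antiferromagnetic coupling.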

In this paper, we present a simple and modularized neural network
architecture, named interleaved group convolutional neural networks (IGCNets).
The main point lies in a novel building block, a pair of two successive
interleaved group convolutions: primary group convolution and secondary group
convolution. The two group convolutions are complementary: (i) the convolution
on each partition in primary group convolution is a spatial convolution, while
on each partition in secondary group convolution, the convolution is a
pointwise convolution; (ii) the channels in the same secondary partition come
from different primary partitions. We discuss one representative advantage:
the block is wider than a regular convolution while preserving the number of
parameters and the computational complexity. We also show that regular
convolutions, group convolution with summation fusion, and the Xception block
are special cases of interleaved group convolutions. Empirical results over
standard benchmarks, CIFAR-$10$, CIFAR-$100$, SVHN and ImageNet, demonstrate
that our networks use parameters and computation more efficiently, with
similar or higher accuracy.
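
The width claim can be checked by counting weights. Assume a primary group convolution with L partitions of M channels and k x k kernels (kernel area S), followed by a pointwise secondary group convolution with M partitions of L channels, so the width is C = LM. The block then has S L M^2 + M L^2 = C(SM + L) weights, versus S C^2 for a regular convolution of width C. Biases and normalisation layers are ignored, and the function names are illustrative.

```python
def regular_conv_params(channels: int, kernel_area: int = 9) -> int:
    """Weights of a regular k x k convolution with C input and C output channels."""
    return kernel_area * channels * channels


def igc_params(num_primary: int, primary_size: int, kernel_area: int = 9) -> int:
    """Primary: L groups of M channels with k x k kernels.
    Secondary: M groups of L channels with 1 x 1 kernels. Width C = L * M."""
    primary = num_primary * kernel_area * primary_size ** 2
    secondary = primary_size * num_primary ** 2
    return primary + secondary


# A 3x3 regular convolution of width 64 versus an IGC block of width 12 * 16 = 192.
print(regular_conv_params(64))   # 36864
print(igc_params(12, 16))        # 29952
```

With stride 1, the multiply-accumulate count per spatial position equals the weight count, so the same comparison covers computational complexity.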

Nearest neighbor search is the problem of finding the data points in a
database whose distances to the query point are the smallest.
Learning to hash is one of the major solutions to this problem and has been
widely studied recently. In this paper, we present a comprehensive survey of
the learning to hash algorithms, categorize them according to the manner of
preserving the similarities into pairwise similarity preserving, multiwise
similarity preserving, implicit similarity preserving, and quantization,
and discuss their relations. We separate quantization from pairwise similarity
preserving because its objective function is very different, even though quantization, as
we show, can be derived from preserving the pairwise similarities. In addition,
we present the evaluation protocols and a general performance analysis, and
point out that the quantization algorithms perform better in terms of
search accuracy, search time cost, and space cost. Finally, we introduce a few
emerging topics.
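
To make the quantization category concrete, here is a sketch of product quantization with asymmetric distance computation (ADC): each database vector is stored as one codeword index per subspace, and a query is compared through precomputed lookup tables. The codebooks are assumed to be already learned (typically by k-means in each subspace, omitted here) and are toy values; the function names are illustrative.

```python
from typing import List, Sequence

Vector = Sequence[float]


def split(x: Vector, num_sub: int) -> List[List[float]]:
    d = len(x) // num_sub
    return [list(x[i * d:(i + 1) * d]) for i in range(num_sub)]


def sq_dist(a: Vector, b: Vector) -> float:
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def pq_encode(x: Vector, codebooks: Sequence[Sequence[Vector]]) -> List[int]:
    """Replace each sub-vector by the index of its nearest codeword."""
    return [min(range(len(cb)), key=lambda k: sq_dist(sub, cb[k]))
            for sub, cb in zip(split(x, len(codebooks)), codebooks)]


def adc_tables(query: Vector, codebooks: Sequence[Sequence[Vector]]) -> List[List[float]]:
    """Distances from each query sub-vector to every codeword, computed once."""
    return [[sq_dist(sub, c) for c in cb]
            for sub, cb in zip(split(query, len(codebooks)), codebooks)]


def adc_distance(code: Sequence[int], tables: Sequence[Sequence[float]]) -> float:
    """Approximate squared distance: one table lookup per subspace."""
    return sum(table[k] for table, k in zip(tables, code))


codebooks = [[[0.0, 0.0], [1.0, 1.0]],    # subspace 1 (toy, not learned)
             [[0.0, 0.0], [2.0, 2.0]]]    # subspace 2
database = {"a": [0.1, 0.0, 1.9, 2.1], "b": [0.9, 1.1, 0.2, 0.0]}
codes = {name: pq_encode(vec, codebooks) for name, vec in database.items()}
tables = adc_tables([0.0, 0.0, 2.0, 2.0], codebooks)
print(codes)                                                            # {'a': [0, 1], 'b': [1, 0]}
print(min(codes, key=lambda name: adc_distance(codes[name], tables)))   # a
```

Storage per vector is a few small integers, and search cost per vector is one lookup per subspace, which illustrates the space and time advantages the survey reports for quantization.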

Hierarchical C@MoS2@C hollow spheres, with the active MoS2 nanosheets
sandwiched between carbon layers, have been produced by means of a modified template
method. The process applies polydopamine (PDA) layers, which inhibit morphology
change of the template and thereby enforce the hollow microsphere structure. In
addition, PDA forms complexes with the Mo precursor, leading to in-situ
growth of MoS2 on its surface and preventing the nanosheets from agglomerating.
It also supplies the carbon that finally sandwiches the 100-150 nm thin MoS2
spheres. The resulting hierarchically structured material provides a stable
microstructure in which carbon layers strongly linked to MoS2 offer efficient
pathways for electron and ion transfer, and concomitantly buffer the volume
changes inevitably appearing during the charge-discharge process.
Carbon-sandwiched MoS2-based electrodes exhibit a high specific capacity of
approximately 900 mA h g$^{-1}$ after 50 cycles at 0.1 C, excellent cycling
stability up to 200 cycles, and superior rate performance. The versatile
synthesis method reported here offers a general route to design hollow sandwich
structures with a variety of different active materials.

We demonstrate that a charge density wave (CDW) phase transition occurs on the
surface of electronically doped multilayer graphene when the Fermi level
approaches the M points (also known as van Hove singularities, where the density
of states diverges) in the Brillouin zone of the graphene band structure. The
occurrence of such CDW phase transitions is supported by both electrical
transport and optical measurements in electrostatically doped
multilayer graphene. The CDW transition is accompanied by a sudden change
of the graphene channel resistance at T$_m$ = 100 K, as well as the splitting of the
Raman G peak (1580 cm$^{-1}$). The splitting of the Raman G peak indicates the
lifting of the in-plane optical phonon branch degeneracy, and the non-degenerate
phonon branches are correlated with the lattice reconstruction of graphene
during the CDW phase transition.

In this paper, we consider the global well-posedness problem of the
isentropic compressible Navier-Stokes equations in the whole space $\R^N$ with
$N\ge2$. In order to better reflect the characteristics of the dispersion
equation, we make full use of the role of the frequency in the integrability
and regularity of the solution, and prove that the isentropic compressible
Navier-Stokes equations admit global solutions when the initial data are close
to a stable equilibrium in the sense of a suitable hybrid Besov norm. As a
consequence, our results allow initial velocities whose potential part
$\Pe^\bot u_0$ has an arbitrarily large $\dot{B}^{\frac{N}{2}-1}_{2,1}$ norm and
is highly oscillating. The proof relies heavily on the dispersive estimates for the
system of acoustics, and a careful study of the nonlinear terms.

In this paper, we present a novel deep learning approach, deeply-fused nets.
The central idea of our approach is deep fusion, i.e., combining the intermediate
representations of base networks, where the fused output serves as the input of
the remaining part of each base network, and performing such combinations deeply
over several intermediate representations. The resulting deeply fused net
enjoys several benefits. First, it is able to learn multiscale representations,
as it enjoys the benefits of more base networks than the initial group, since
other groups of base networks could form the same fused network. Second, in our
suggested fused net formed by one deep and one shallow base network, the flows
of information from the earlier intermediate layer of the deep base network
to the output and from the input to the later intermediate layer of the deep
base network are both improved. Last, the deep and shallow base networks are
jointly learnt and can benefit from each other. More interestingly, the
essential depth of a fused net composed from a deep base network and a shallow
base network is reduced, because the fused net could be composed from a less
deep base network, and thus training the fused net is less difficult than
training the initial deep base network. Empirical results demonstrate that our
approach achieves superior performance over two closely-related methods, ResNet
and Highway, and competitive performance compared to the state of the art.
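
The deep fusion pattern can be sketched with scalar toy blocks. This assumes fusion by summation and one shallow layer per deep stage; the block functions are hypothetical stand-ins for learned layers, not the paper's architecture.

```python
from typing import Callable, Sequence

Block = Callable[[float], float]


def deeply_fused_forward(x: float,
                         deep_stages: Sequence[Sequence[Block]],
                         shallow_stages: Sequence[Block]) -> float:
    """Run a deep and a shallow base network side by side and fuse
    (here: sum) their intermediate outputs after every stage; the fused
    value is the input to the next stage of BOTH networks."""
    fused = x
    for deep_blocks, shallow_block in zip(deep_stages, shallow_stages):
        deep_out = fused
        for block in deep_blocks:           # several layers in the deep branch
            deep_out = block(deep_out)
        shallow_out = shallow_block(fused)  # one layer in the shallow branch
        fused = deep_out + shallow_out      # fusion by summation (assumed)
    return fused


def double(v: float) -> float:   # toy stand-ins for learned layers
    return 2.0 * v


def shift(v: float) -> float:
    return v + 1.0


print(deeply_fused_forward(1.0, [[double, shift], [double, shift]], [shift, shift]))  # 17.0
```

Every stage offers a deep and a shallow path, so the same computation also contains base networks that mix the two. One example is the all-shallow path of depth 2 here, which is the sense in which more base networks than the initial pair are implicitly combined.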

The inference procedure for the mean of a stationary time series is usually
quite different under various model assumptions, because the partial sum process
behaves differently depending on whether the time series is short- or long-range
dependent, or whether it has a light- or heavy-tailed marginal distribution. In
the current paper, we develop an asymptotic theory for self-normalized
block sampling, and prove that the corresponding block sampling method can
provide a unified inference approach for the aforementioned different
situations, in the sense that it does not require the {\em a priori} estimation
of auxiliary parameters. Monte Carlo simulations are presented to illustrate
its finite-sample performance. The R function implementing the method is
available from the authors.
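
The reason no auxiliary parameters are needed is that the numerator and the self-normalizer scale with block length at the same (unknown) rate, so the rate cancels in their ratio. The sketch below illustrates this with one common self-normalizer built from centred partial sums. The normalizer, the quantile convention and the function names are assumptions, not the authors' R function, and the sine series is only a deterministic toy input.

```python
import math
from typing import List, Sequence, Tuple


def self_normalizer(y: Sequence[float]) -> float:
    """W(y) = sqrt(sum_t (S_t - t * mean)^2) / m^{3/2}, S_t = partial sums."""
    m = len(y)
    mean = sum(y) / m
    partial, acc = 0.0, 0.0
    for t, value in enumerate(y, start=1):
        partial += value
        acc += (partial - t * mean) ** 2
    return math.sqrt(acc) / m ** 1.5


def block_sampling_ci(y: Sequence[float], block_len: int,
                      alpha: float = 0.05) -> Tuple[float, float]:
    """Confidence interval for the mean from self-normalized block statistics."""
    y = list(y)
    n = len(y)
    mean = sum(y) / n
    stats: List[float] = []
    for start in range(n - block_len + 1):          # overlapping blocks
        block = y[start:start + block_len]
        w = self_normalizer(block)
        if w > 0.0:
            # Same self-normalized form as the full-sample statistic,
            # centred at the full-sample mean.
            stats.append((sum(block) / block_len - mean) / w)
    stats.sort()
    lo_q = stats[math.floor(alpha / 2 * (len(stats) - 1))]
    hi_q = stats[math.ceil((1 - alpha / 2) * (len(stats) - 1))]
    w_full = self_normalizer(y)
    return mean - hi_q * w_full, mean - lo_q * w_full


series = [math.sin(0.7 * t) for t in range(200)]
print(block_sampling_ci(series, block_len=20))
```

The block length still has to grow with n while remaining small relative to it; the sketch simply fixes it.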

In this paper, we investigate the global well-posedness of the 3D
inhomogeneous incompressible Navier-Stokes system with axisymmetric initial
data. We prove global well-posedness provided that
$$\Big\|\frac{a_{0}}{r}\Big\|_{\infty} \textrm{ and } \|u_{0}^{\theta}\|_{3} \textrm{
are sufficiently small}.
$$
Furthermore, if $\mathbf{u}_0\in L^1$ and $ru^\theta_0\in L^1\cap L^2$, we
have \begin{equation*} \|u^{\theta}(t)\|_{2}^{2}+\langle t\rangle \|\nabla
(u^{\theta}\mathbf{e}_{\theta})(t)\|_{2}^{2}+t\langle
t\rangle\big(\|u_{t}^{\theta}(t)\|_{2}^{2}+\|\Delta(u^{\theta}\mathbf{e}_{\theta})(t)\|_{2}^{2}\big)
\leq C \langle t\rangle^{-\frac{5}{2}},\ \forall\ t>0. \end{equation*}

We describe tunable optical sawtooth and zigzag lattices for ultracold atoms.
Making use of the superlattice generated by light beams with commensurate
wavelengths, tunable geometries including zigzag and sawtooth configurations can
be realised. We provide an experimentally feasible method to fully control
inter- ($t$) and intra- ($t'$) unit-cell tunnelling in zigzag and sawtooth lattices.
We analyse the conversion of the lattice geometry from zigzag to sawtooth, and
show that a nearly flat band is attainable in the sawtooth configuration by
tuning the lattice parameters. The bandwidth of the first excited band
can be reduced to as little as 2$\%$ of the ground-band bandwidth for a wide range of
lattice settings. A nearly flat band available in a tunable sawtooth lattice would offer
a versatile platform for the study of interaction-driven quantum many-body
states with ultracold atoms.
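
The flat-band mechanism can be seen in the simplest tight-binding model of a sawtooth chain. The sketch uses its own hypothetical notation: `t_base` is the base-to-base hopping, `t_apex` is the base-to-apex hopping, and there is no energy offset between sites. These names are not mapped onto the paper's $t$ and $t'$. In this idealised two-band model the excited band becomes exactly flat at `t_apex = sqrt(2) * t_base`. The paper's 2$\%$ figure refers to its realistic optical potential, which this sketch does not model.

```python
import math
from typing import Tuple


def sawtooth_bands(t_base: float, t_apex: float, k: float) -> Tuple[float, float]:
    """Eigenvalues of the 2x2 Bloch Hamiltonian (lattice constant 1)
    H(k) = [[-2 t_base cos k, -2 t_apex cos(k/2)],
            [-2 t_apex cos(k/2), 0]]."""
    a = -2.0 * t_base * math.cos(k)
    b = -2.0 * t_apex * math.cos(k / 2.0)
    centre = a / 2.0
    radius = math.sqrt((a / 2.0) ** 2 + b ** 2)
    return centre - radius, centre + radius


def bandwidth_ratio(t_base: float, t_apex: float, samples: int = 401) -> float:
    """Width of the upper (excited) band divided by that of the lower band."""
    ks = [-math.pi + 2.0 * math.pi * i / (samples - 1) for i in range(samples)]
    lower = [sawtooth_bands(t_base, t_apex, k)[0] for k in ks]
    upper = [sawtooth_bands(t_base, t_apex, k)[1] for k in ks]
    return (max(upper) - min(upper)) / (max(lower) - min(lower))


print(bandwidth_ratio(1.0, 1.0) > 0.1)                  # True: dispersive
print(bandwidth_ratio(1.0, math.sqrt(2.0)) < 1e-9)      # True: flat excited band
```

At the flat-band point the upper band sits at $2t_{\mathrm{base}}$ for every $k$, while the lower band disperses as $-2t_{\mathrm{base}}(1+\cos k)$.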

In this paper, we study the three-dimensional axisymmetric Navier-Stokes
system with nonzero swirl. By establishing a new key inequality for the pair
$(\frac{\omega^{r}}{r},\frac{\omega^{\theta}}{r})$, we get several Prodi-Serrin
type regularity criteria based on the angular velocity $u^\theta$. Moreover,
we obtain global well-posedness if the initial angular velocity
$u_{0}^{\theta}$ is appropriately small in the critical space $L^{3}(\R^{3})$.
Furthermore, we also get several Prodi-Serrin type regularity criteria based on
one component of the solution, say $\omega^3$ or $u^3$.

This paper considers a general class of nonparametric time series regression
models where the regression function can be time-dependent. We establish an
asymptotic theory for estimates of the time-varying regression functions. For
this general class of models, an important practical issue is to determine
whether the regression function needs to be modeled as nonlinear and
time-varying. To tackle this, we propose an information criterion and prove its
selection consistency. The results are applied to U.S. Treasury interest
rate data.

Rating prediction is a basic problem in recommender systems, and one of the
most widely used methods is Factorization Machines (FM). However, traditional
matrix factorization methods fail to exploit implicit feedback,
which has been shown to be important in rating prediction. In this
work, we consider a specific setting, movie rating prediction, where we
assume that a user's watching history has a big influence on his/her rating behavior on
an item. We introduce two models, Latent Dirichlet Allocation (LDA) and
word2vec, both of which achieve state-of-the-art results in learning latent
features. Based on these, we propose two feature-based models. One is the
Topic-based FM Model, which provides implicit feedback to the matrix
factorization. The other is the Vector-based FM Model, which expresses the order
information of the watching history. Empirical results on three datasets demonstrate that
our method performs better than the baseline model and confirm that the
Vector-based FM Model usually works better, as it contains the order information.
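
Both proposed models feed extra features into a standard second-order Factorization Machine. The sketch shows the FM prediction with the usual linear-time rewriting of the pairwise term, checked against the direct pairwise sum. The feature layout (user one-hot, movie one-hot, then topic proportions of the watching history) and all parameter values are hypothetical illustrations, not the paper's configuration.

```python
from itertools import combinations
from typing import Dict, Sequence

SparseX = Dict[int, float]   # feature index -> value (non-zeros only)


def fm_predict(x: SparseX, w0: float, w: Sequence[float],
               v: Sequence[Sequence[float]]) -> float:
    """Second-order FM in O(k * nnz(x)) using
    sum_{i<j} <v_i, v_j> x_i x_j = 1/2 sum_f [(sum_i v_if x_i)^2 - sum_i v_if^2 x_i^2]."""
    linear = sum(w[i] * xi for i, xi in x.items())
    pairwise = 0.0
    for f in range(len(v[0])):
        s = sum(v[i][f] * xi for i, xi in x.items())
        s_sq = sum((v[i][f] * xi) ** 2 for i, xi in x.items())
        pairwise += 0.5 * (s * s - s_sq)
    return w0 + linear + pairwise


def fm_predict_naive(x: SparseX, w0: float, w: Sequence[float],
                     v: Sequence[Sequence[float]]) -> float:
    """Direct O(k * nnz^2) sum over feature pairs, for checking."""
    linear = sum(w[i] * xi for i, xi in x.items())
    pairwise = sum(sum(a * b for a, b in zip(v[i], v[j])) * x[i] * x[j]
                   for i, j in combinations(sorted(x), 2))
    return w0 + linear + pairwise


# Hypothetical layout: 0-1 user one-hot, 2-3 movie one-hot,
# 4-5 topic proportions of the user's watching history (e.g. from LDA).
x = {0: 1.0, 3: 1.0, 4: 0.3, 5: 0.7}
w = [0.1, -0.2, 0.05, 0.3, 0.2, -0.1]
v = [[0.1, 0.2], [0.0, -0.1], [0.3, 0.1], [-0.2, 0.4], [0.5, 0.0], [0.1, 0.1]]
print(abs(fm_predict(x, 3.5, w, v) - fm_predict_naive(x, 3.5, w, v)) < 1e-12)  # True
```

An order-aware variant would replace the topic proportions with embeddings built from the watching sequence; the FM machinery itself stays the same.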

In this article, we consider the global well-posedness of the 3D
incompressible inhomogeneous Navier-Stokes equations with a class of large
velocities. More precisely, assuming $a_0 \in
\dot{B}_{q,1}^{\frac{3}{q}}(\mathbb{R}^3)$ and $u_0=(u_0^h,u_0^3)\in
\dot{B}_{p,1}^{-1+\frac{3}{p}}(\mathbb{R}^3)$ for $p,q \in (1,6)$ with
$\sup(\frac{1}{p}, \frac{1}{q})\leq\frac{1}{3}+ \inf (\frac{1}{p},
\frac{1}{q})$, we prove that if
$C\|a_0\|_{\dot{B}_{q,1}^{\frac{3}{q}}}^{\alpha}\big(\|u_0^3\|_{\dot{B}_{p,1}^{-1+\frac{3}{p}}}/\mu+1\big)\leq1$
and
$\frac{C}{\mu}\big(\|u_0^h\|_{\dot{B}_{p,1}^{-1+\frac{3}{p}}}+\|u_0^3\|_{\dot{B}_{p,1}^{-1+\frac{3}{p}}}^{1-\alpha}\|u_0^h\|_{\dot{B}_{p,1}^{-1+\frac{3}{p}}}^{\alpha}\big)\leq
1$, then the system has a unique global solution
$a\in\widetilde{\mathcal{C}}([0,\infty);\dot{B}_{q,1}^{\frac{3}{q}}(\mathbb{R}^3))$,
$u\in\widetilde{\mathcal{C}}([0,\infty);\dot{B}_{p,1}^{-1+\frac{3}{p}}(\mathbb{R}^3))\cap
L^1(\mathbb{R}^+;\dot{B}_{p,1}^{1+\frac{3}{p}}(\mathbb{R}^3))$. This improves the
recent result of M. Paicu and P. Zhang (J. Funct. Anal. 262 (2012) 3556-3584),
in that the exponential form of the initial smallness condition there is
replaced here by a polynomial form.

In this paper, we provide a much simplified proof of the main result in [Lin,
Xu, Zhang, arXiv:1302.5877] concerning the global existence and uniqueness of
smooth solutions to the Cauchy problem for a 2D incompressible viscous and
non-resistive MHD system under the assumption that the initial data are close
to some equilibrium states. Besides the classical energy method, the
interpolating inequalities and the algebraic structure of the equations coming
from the incompressibility of the fluid are crucial in our arguments. We
combine the energy estimates with the $L^\infty$ estimates for time slices to
deduce the key $L^1$-in-time estimates, which are responsible for the
global-in-time existence.

In this paper, we provide a much simplified proof of the main result in [Lin
and Zhang, Comm. Pure Appl. Math., 67 (2014), 531-580] concerning the global
existence and uniqueness of smooth solutions to the Cauchy problem for a 3D
incompressible complex fluid model under the assumption that the initial data
are close to some equilibrium states. Besides the classical energy method, the
interpolating inequalities and the algebraic structure of the equations coming
from the incompressibility of the fluid are crucial in our arguments. We
combine the energy estimates with the $L^\infty$ estimates for time slices to
deduce the key $L^1$-in-time estimates, which are responsible for the
global-in-time existence.

The paper considers the block sampling method for long-range dependent
processes. Our theory generalizes earlier results by Hall, Jing and Lahiri (1998)
on functionals of Gaussian processes and Nordman and Lahiri (2005) on linear
processes. In particular, we allow nonlinear transforms of linear processes.
Under suitable conditions on physical dependence measures, we prove the
validity of the block sampling method. The problem of estimating the
self-similar index is also studied.

In this paper we consider the local and global well-posedness of the
density-dependent incompressible viscoelastic fluids. We first study some
linear models associated with the incompressible viscoelastic system. Then we
approximate the system by a sequence of ordinary differential equations by
means of the Friedrichs method, and obtain uniform estimates for these
approximate solutions. Using compactness arguments, we obtain local existence,
up to extracting a subsequence, by means of Ascoli's lemma. With the help of
small data conditions and hybrid Besov spaces, we finally derive global
existence.

We consider parameter estimation, hypothesis testing and variable selection
for partially time-varying coefficient models. Our asymptotic theory has the
useful feature that it allows dependent, non-stationary error and covariate
processes. With a two-stage method, the parametric component can be estimated
with an $n^{1/2}$-convergence rate. A simulation-assisted hypothesis testing
procedure is proposed for testing significance and parameter constancy. We
further propose an information criterion that can consistently select the true
set of significant predictors. Our method is applied to autoregressive models
with time-varying coefficients. Simulation results and a real data application
are provided.
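
The nonparametric ingredient of such models is a kernel-weighted fit in rescaled time. The sketch estimates a single time-varying coefficient in the assumed model $y_i = x_i \beta(i/n) + e_i$ by local linear least squares with an Epanechnikov kernel. The paper's two-stage procedure for the parametric component, its testing procedure and its handling of dependent, non-stationary errors are not reproduced. The kernel, the bandwidth and the function names are illustrative choices.

```python
from typing import Sequence


def epanechnikov(z: float) -> float:
    return 0.75 * (1.0 - z * z) if abs(z) <= 1.0 else 0.0


def local_linear_beta(y: Sequence[float], x: Sequence[float],
                      u: float, bandwidth: float) -> float:
    """Estimate beta(u) in y_i = x_i * beta(i/n) + e_i by kernel-weighted
    least squares on (x_i, x_i * (i/n - u)), i.e. a local linear fit in time."""
    n = len(y)
    s11 = s12 = s22 = r1 = r2 = 0.0
    for i in range(n):
        d = (i + 1) / n - u                      # rescaled time minus target
        k = epanechnikov(d / bandwidth)
        if k == 0.0:
            continue
        z1, z2 = x[i], x[i] * d                  # regressors for beta(u), beta'(u)
        s11 += k * z1 * z1
        s12 += k * z1 * z2
        s22 += k * z2 * z2
        r1 += k * z1 * y[i]
        r2 += k * z2 * y[i]
    det = s11 * s22 - s12 * s12
    return (s22 * r1 - s12 * r2) / det           # Cramer's rule for beta(u)


# Noise-free check: beta(t) = 1 + 2t is linear, so the local linear fit is exact.
n = 200
x = [1.0 + (i % 3) for i in range(n)]
y = [x[i] * (1.0 + 2.0 * (i + 1) / n) for i in range(n)]
print(round(local_linear_beta(y, x, 0.5, 0.1), 6))   # 2.0
```

The local linear form is used rather than a local constant one because it reproduces linear trends in $\beta$ exactly and behaves better near the ends of the sample.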