• In this paper, we study the problem of designing efficient convolutional neural network architectures with the aim of eliminating the redundancy in convolution kernels. In addition to structured sparse kernels, low-rank kernels and the product of low-rank kernels, the product of structured sparse kernels, which is a framework for interpreting the recently developed interleaved group convolutions (IGC) and their variants (e.g., Xception), has been attracting increasing interest. Motivated by the observation that the convolutions contained in a group convolution in IGC can be further decomposed in the same manner, we present a modularized building block, IGCV$2$: interleaved structured sparse convolutions. It generalizes interleaved group convolutions, which are composed of two structured sparse kernels, to the product of more structured sparse kernels, further eliminating the redundancy. We present the complementary condition and the balance condition to guide the design of structured sparse kernels, obtaining a balance among three aspects: model size, computation complexity and classification accuracy. Experimental results demonstrate the advantage in balancing these three aspects compared to interleaved group convolutions and Xception, and competitive performance compared to other state-of-the-art architecture design methods.
  • Semantic matching of natural language sentences, or identifying the relationship between two sentences, is a core research problem underlying many natural language tasks. Depending on whether training data is available, prior research has proposed both unsupervised distance-based schemes and supervised deep learning schemes for sentence matching. However, previous approaches either omit or fail to fully utilize the ordered, hierarchical, and flexible structures of language objects, as well as the interactions between them. In this paper, we propose Hierarchical Sentence Factorization, a technique to factorize a sentence into a hierarchical representation, with the components at each scale reordered into a "predicate-argument" form. The proposed sentence factorization technique leads to: 1) a new unsupervised distance metric that calculates the semantic distance between a pair of text snippets by solving a penalized optimal transport problem while preserving the logical relationship of words in the reordered sentences, and 2) new multi-scale deep learning models for supervised semantic training based on factorized sentence hierarchies. We apply our techniques to text-pair similarity estimation and text-pair relationship classification tasks on multiple datasets, including STSbenchmark, the Microsoft Research paraphrase identification (MSRP) dataset and the SICK dataset. Extensive experiments show that the proposed hierarchical sentence factorization can be used to significantly improve the performance of existing unsupervised distance-based metrics as well as multiple supervised deep learning models based on convolutional neural networks (CNN) and long short-term memory (LSTM).
  • By applying the delicate \textit{a priori} estimates for the equations of $(\Phi,\Gamma)$, which were introduced in previous work, we obtain several multi-scale regularity criteria for the swirl component $u^{\theta}$ of the 3D axisymmetric Navier-Stokes equations. In particular, the solution $\mathbf{u}$ can be continued beyond time $T$ provided that $u^{\theta}$ satisfies $$ u^{\theta} \in L^{p}_{T}L^{q_{v}}_{v}L^{q_{h},w}_{h},~~\frac{2}{p}+\frac{1}{q_{v}}+\frac{2}{q_{h}}\leq 1, ~2<q_{h}\leq\infty,~\frac{1}{q_{v}}+\frac{2}{q_{h}}<1. $$
  • Identifying the relationship between two text objects is a core research problem underlying many natural language processing tasks. A wide range of deep learning schemes have been proposed for text matching, mainly focusing on sentence matching, question answering or query-document matching. We point out that existing approaches do not perform well at matching long documents, which is critical, for example, to AI-based news article understanding and event or story formation. The reason is that these methods either omit or fail to fully utilize the complicated semantic structures in long documents. In this paper, we propose a graph approach to text matching, especially targeting long document matching, such as identifying whether two news articles report the same real-world event, possibly with different narratives. We propose the Concept Interaction Graph to yield a graph representation of a document, with vertices representing different concepts, each being one keyword or a group of coherent keywords in the document, and with edges representing the interactions between different concepts, connected by sentences in the document. Based on the graph representation of document pairs, we further propose a Siamese Encoded Graph Convolutional Network that learns vertex representations through a Siamese neural network and aggregates the vertex features through Graph Convolutional Networks to generate the matching result. Extensive evaluation of the proposed approach on two labeled news article datasets created at Tencent for its intelligent news products shows that the proposed graph approach to long document matching significantly outperforms a wide range of state-of-the-art methods.
  • Recently, Ising superconductors, which possess in-plane upper critical fields much larger than the Pauli limit field, have been under intense experimental study. Many monolayer or few-layer transition metal dichalcogenides have been shown to be Ising superconductors. In this work, we show that in a wide range of experimentally accessible regimes where the in-plane magnetic field is higher than the Pauli limit field but lower than $H_{c2}$, a 2H-structure monolayer NbSe$_2$, or similarly TaS$_2$, becomes a nodal topological superconductor. The bulk nodal points appear on the $\Gamma- M$ lines of the Brillouin zone, where the Ising SOC vanishes. The nodal points are connected by Majorana flat bands, similar to the way Weyl points are connected by surface Fermi arcs in Weyl semimetals. The Majorana flat bands are associated with a large number of zero-energy Majorana fermion edge modes which induce spin-triplet Cooper pairs. This work demonstrates an experimentally feasible way to realise Majorana fermions in nodal topological superconductors without any fine tuning of experimental parameters.
  • It is well accepted that convolutional neural networks play an important role in learning excellent features for image classification and recognition. However, they traditionally connect only adjacent layers, which limits the integration of multi-scale information. To further improve their performance, we present a concatenating framework of shortcut convolutional neural networks. This framework concatenates multi-scale features through shortcut connections to the fully-connected layer that feeds directly into the output layer. We conduct extensive experiments to investigate the performance of shortcut convolutional neural networks on many benchmark visual datasets for different tasks: AR, FERET, FaceScrub and CelebA for gender classification, CUReT for texture classification, MNIST for digit recognition, and CIFAR-10 for object recognition. Experimental results show that shortcut convolutional neural networks achieve better results than traditional ones on these tasks, with more stability across different settings of pooling schemes, activation functions, optimizations, initializations, kernel numbers and kernel sizes.
  • This work reports an experimental study of the antiferromagnetic honeycomb lattice MnPS$_3$, which couples the valley degree of freedom to a macroscopic antiferromagnetic order. The crystal structure of MnPS$_3$ is identified by high-resolution scanning transmission electron microscopy. Layer-dependent, angle-resolved polarized Raman fingerprints of the MnPS$_3$ crystal are obtained, and the Raman peak at 383 cm$^{-1}$ exhibits 100% polarity. The temperature dependence of the anisotropic magnetic susceptibility of MnPS$_3$ crystals is measured with a superconducting quantum interference device. Magnetic parameters such as the effective magnetic moment and the exchange interaction are extracted using the mean-field approximation model. Ambipolar electronic transport channels in MnPS$_3$ are realized by the liquid gating technique. The conducting channel of MnPS$_3$ offers a unique platform for exploring spin/valleytronics and magnetic orders in the 2D limit.
  • In this paper, we present a simple and modularized neural network architecture, named interleaved group convolutional neural networks (IGCNets). The main point lies in a novel building block: a pair of two successive interleaved group convolutions, namely primary group convolution and secondary group convolution. The two group convolutions are complementary: (i) the convolution on each partition in primary group convolution is a spatial convolution, while on each partition in secondary group convolution, the convolution is a point-wise convolution; (ii) the channels in the same secondary partition come from different primary partitions. We discuss one representative advantage: the block is wider than a regular convolution while the number of parameters and the computation complexity are preserved. We also show that regular convolutions, group convolution with summation fusion, and the Xception block are special cases of interleaved group convolutions. Empirical results on standard benchmarks, CIFAR-$10$, CIFAR-$100$, SVHN and ImageNet, demonstrate that our networks are more efficient in using parameters and computation complexity, with similar or higher accuracy.
  • Nearest neighbor search is the problem of finding the data points in a database whose distances to a query point are the smallest. Learning to hash is one of the major solutions to this problem and has been widely studied recently. In this paper, we present a comprehensive survey of learning to hash algorithms, categorize them according to the manner in which they preserve similarities into pairwise similarity preserving, multiwise similarity preserving, implicit similarity preserving, and quantization, and discuss their relations. We separate quantization from pairwise similarity preserving because its objective function is very different, although quantization, as we show, can be derived from preserving the pairwise similarities. In addition, we present the evaluation protocols and a general performance analysis, and point out that the quantization algorithms perform best in terms of search accuracy, search time cost and space cost. Finally, we introduce a few emerging topics.
  • Hierarchical C@MoS2@C hollow spheres, with the active MoS2 nanosheets sandwiched between carbon layers, have been produced by a modified template method. The process applies polydopamine (PDA) layers, which inhibit morphological changes of the template, thereby enforcing the hollow microsphere structure. In addition, PDA forms complexes with the Mo precursor, leading to an in-situ growth of MoS2 on its surface and preventing the nanosheets from agglomerating. It also supplies the carbon that finally sandwiches the 100-150 nm thin MoS2 spheres. The resulting hierarchically structured material provides a stable microstructure in which carbon layers strongly linked to MoS2 offer efficient pathways for electron and ion transfer and concomitantly buffer the volume changes that inevitably occur during the charge-discharge process. Carbon-sandwiched MoS2-based electrodes exhibit a high specific capacity of approximately 900 mA h g$^{-1}$ after 50 cycles at 0.1 C, excellent cycling stability up to 200 cycles, and superior rate performance. The versatile synthesis method reported here offers a general route to design hollow sandwich structures with a variety of different active materials.
  • We demonstrate that a charge density wave (CDW) phase transition occurs on the surface of electronically doped multilayer graphene when the Fermi level approaches the M points (also known as van Hove singularities, where the density of states diverges) in the Brillouin zone of the graphene band structure. The occurrence of such CDW phase transitions is supported by both electrical transport and optical measurements on electrostatically doped multilayer graphene. The CDW transition is accompanied by a sudden change in the graphene channel resistance at T$_m$ = 100 K, as well as a splitting of the Raman G peak (1580 cm$^{-1}$). The splitting of the Raman G peak indicates the lifting of the in-plane optical phonon branch degeneracy, and the non-degenerate phonon branches are correlated with the lattice reconstruction of graphene, i.e., the CDW phase transition.
  • In this paper, we consider the global well-posedness problem of the isentropic compressible Navier-Stokes equations in the whole space $\mathbb{R}^N$ with $N\ge2$. In order to better reflect the dispersive character of the equations, we make full use of the role of the frequency in the integrability and regularity of the solution, and prove that the isentropic compressible Navier-Stokes equations admit global solutions when the initial data are close to a stable equilibrium in the sense of a suitable hybrid Besov norm. As a consequence, our results allow initial velocities whose potential part $\Pe^\bot u_0$ is large and highly oscillating, with an arbitrarily large $\dot{B}^{\frac{N}{2}-1}_{2,1}$ norm. The proof relies heavily on the dispersive estimates for the system of acoustics and a careful study of the nonlinear terms.
  • In this paper, we present a novel deep learning approach, deeply-fused nets. The central idea of our approach is deep fusion, i.e., combining the intermediate representations of base networks, where the fused output serves as the input of the remaining part of each base network, and performing such combinations deeply over several intermediate representations. The resulting deeply fused net enjoys several benefits. First, it is able to learn multi-scale representations, as it benefits from more base networks than the initial group: the same fused network could also be formed from other base networks. Second, in our suggested fused net formed by one deep and one shallow base network, the flows of information from the earlier intermediate layer of the deep base network to the output and from the input to the later intermediate layer of the deep base network are both improved. Last, the deep and shallow base networks are jointly learned and can benefit from each other. More interestingly, the essential depth of a fused net composed of a deep base network and a shallow base network is reduced, because the fused net could be composed of a less deep base network, and thus training the fused net is less difficult than training the initial deep base network. Empirical results demonstrate that our approach achieves superior performance over two closely related methods, ResNet and Highway, and competitive performance compared to the state of the art.
  • The inference procedure for the mean of a stationary time series usually differs considerably under different model assumptions, because the partial sum process behaves differently depending on whether the time series is short- or long-range dependent, and whether it has a light- or heavy-tailed marginal distribution. In the current paper, we develop an asymptotic theory for self-normalized block sampling and prove that the corresponding block sampling method provides a unified inference approach for the aforementioned situations, in the sense that it does not require the {\em a priori} estimation of auxiliary parameters. Monte Carlo simulations are presented to illustrate its finite-sample performance. The R function implementing the method is available from the authors.
  • In this paper, we investigate the global well-posedness of the 3-D inhomogeneous incompressible Navier-Stokes system with axisymmetric initial data. We prove global well-posedness provided that $$\|\frac{a_{0}}{r}\|_{\infty} \textrm{ and } \|u_{0}^{\theta}\|_{3} \textrm{ are sufficiently small}. $$ Furthermore, if $\mathbf{u}_0\in L^1$ and $ru^\theta_0\in L^1\cap L^2$, we have \begin{equation*} \|u^{\theta}(t)\|_{2}^{2}+\langle t\rangle \|\nabla (u^{\theta}\mathbf{e}_{\theta})(t)\|_{2}^{2}+t\langle t\rangle(\|u_{t}^{\theta}(t)\|_{2}^{2}+\|\Delta(u^{\theta}\mathbf{e}_{\theta})(t)\|_{2}^{2}) \leq C \langle t\rangle^{-\frac{5}{2}},\ \forall\ t>0. \end{equation*}
  • We describe tunable optical sawtooth and zigzag lattices for ultracold atoms. Making use of the superlattice generated by light beams with commensurate wavelengths, tunable geometries including zigzag and sawtooth configurations can be realised. We provide an experimentally feasible method to fully control inter- ($t$) and intra- ($t'$) unit-cell tunnelling in zigzag and sawtooth lattices. We analyse the conversion of the lattice geometry from zigzag to sawtooth, and show that a nearly flat band is attainable in the sawtooth configuration by tuning the lattice parameters. The bandwidth of the first excited band can be reduced to as little as 2$\%$ of the ground bandwidth for a wide range of lattice settings. A nearly flat band available in a tunable sawtooth lattice would offer a versatile platform for the study of interaction-driven quantum many-body states with ultracold atoms.
  • In this paper, we study the three-dimensional axisymmetric Navier-Stokes system with nonzero swirl. By establishing a new key inequality for the pair $(\frac{\omega^{r}}{r},\frac{\omega^{\theta}}{r})$, we obtain several Prodi-Serrin type regularity criteria based on the angular velocity $u^\theta$. Moreover, we obtain a global well-posedness result if the initial angular velocity $u_{0}^{\theta}$ is appropriately small in the critical space $L^{3}(\mathbb{R}^{3})$. Furthermore, we obtain several Prodi-Serrin type regularity criteria based on one component of the solution, say $\omega^3$ or $u^3$.
  • This paper considers a general class of nonparametric time series regression models where the regression function can be time-dependent. We establish an asymptotic theory for estimates of the time-varying regression functions. For this general class of models, an important practical issue is to determine whether the regression function actually needs to be modeled as nonlinear and time-varying. To tackle this, we propose an information criterion and prove its selection consistency. The results are applied to U.S. Treasury interest rate data.
  • Rating prediction is a basic problem in recommender systems, and one of the most widely used methods is Factorization Machines (FM). However, traditional matrix factorization methods fail to exploit implicit feedback, which has been shown to be important in the rating prediction problem. In this work, we consider a specific situation, movie rating prediction, where we assume that a user's watching history strongly influences his/her rating behavior on an item. We introduce two models, Latent Dirichlet Allocation (LDA) and word2vec, both of which achieve state-of-the-art results in learning latent features. Based on them, we propose two feature-based models. One is the Topic-based FM Model, which provides implicit feedback to the matrix factorization. The other is the Vector-based FM Model, which expresses the order information of the watching history. Empirical results on three datasets demonstrate that our method performs better than the baseline model and confirm that the Vector-based FM Model usually works better, as it captures the order information.
  • In this article, we consider the global well-posedness of the 3-D incompressible inhomogeneous Navier-Stokes equations with a class of large velocities. More precisely, assuming $a_0 \in \dot{B}_{q,1}^{\frac{3}{q}}(\mathbb{R}^3)$ and $u_0=(u_0^h,u_0^3)\in \dot{B}_{p,1}^{-1+\frac{3}{p}}(\mathbb{R}^3)$ for $p,q \in (1,6)$ with $\sup(\frac{1}{p}, \frac{1}{q})\leq\frac{1}{3}+ \inf (\frac{1}{p}, \frac{1}{q})$, we prove that if $C\|a_0\|_{\dot{B}_{q,1}^{\frac{3}{q}}}^{\alpha}(\|u_0^3\|_{\dot{B}_{p,1}^{-1+\frac{3}{p}}}/{\mu}+1)\leq1$ and $\frac{C}{\mu}(\|u_0^h\|_{\dot{B}_{p,1}^{-1+\frac{3}{p}}}+\|u_0^3\|_{\dot{B}_{p,1}^{-1+\frac{3}{p}}}^{1-\alpha}\|u_0^h\|_{\dot{B}_{p,1}^{-1+\frac{3}{p}}}^{\alpha})\leq 1$, then the system has a unique global solution $a\in\widetilde{\mathcal{C}}([0,\infty);\dot{B}_{q,1}^{\frac{3}{q}}(\mathbb{R}^3))$, $u\in\widetilde{\mathcal{C}}([0,\infty);\dot{B}_{p,1}^{-1+\frac{3}{p}}(\mathbb{R}^3))\cap L^1(\mathbb{R}^+;\dot{B}_{p,1}^{1+\frac{3}{p}}(\mathbb{R}^3))$. This improves the recent result of M. Paicu and P. Zhang (J. Funct. Anal. 262 (2012) 3556-3584), in that the exponential form of the initial smallness condition is replaced by a polynomial form.
  • In this paper, we provide a much simplified proof of the main result in [Lin, Xu, Zhang, arXiv:1302.5877] concerning the global existence and uniqueness of smooth solutions to the Cauchy problem for a 2D incompressible viscous and non-resistive MHD system under the assumption that the initial data are close to some equilibrium states. Besides the classical energy method, the interpolation inequalities and the algebraic structure of the equations arising from the incompressibility of the fluid are crucial in our arguments. We combine the energy estimates with the $L^\infty$ estimates for time slices to deduce the key $L^1$-in-time estimates, which are responsible for the global-in-time existence.
  • In this paper, we provide a much simplified proof of the main result in [Lin and Zhang, Comm. Pure Appl. Math.,67(2014), 531--580] concerning the global existence and uniqueness of smooth solutions to the Cauchy problem for a 3D incompressible complex fluid model under the assumption that the initial data are close to some equilibrium states. Besides the classical energy method, the interpolation inequalities and the algebraic structure of the equations arising from the incompressibility of the fluid are crucial in our arguments. We combine the energy estimates with the $L^\infty$ estimates for time slices to deduce the key $L^1$-in-time estimates, which are responsible for the global-in-time existence.
  • The paper considers the block sampling method for long-range dependent processes. Our theory generalizes earlier ones by Hall, Jing and Lahiri (1998) on functionals of Gaussian processes and Nordman and Lahiri (2005) on linear processes. In particular, we allow nonlinear transforms of linear processes. Under suitable conditions on physical dependence measures, we prove the validity of the block sampling method. The problem of estimating the self-similar index is also studied.
  • In this paper, we consider the local and global well-posedness of density-dependent incompressible viscoelastic fluids. We first study some linear models associated with the incompressible viscoelastic system. Then we approximate the system by a sequence of ordinary differential equations by means of the Friedrichs method, and obtain uniform estimates for these approximate solutions. Using compactness arguments, we obtain local existence, up to extraction of a subsequence, by means of Ascoli's lemma. With the help of small data conditions and hybrid Besov spaces, we finally derive global existence.
  • We consider parameter estimation, hypothesis testing and variable selection for partially time-varying coefficient models. Our asymptotic theory has the useful feature that it allows dependent, nonstationary error and covariate processes. With a two-stage method, the parametric component can be estimated with an $n^{1/2}$-convergence rate. A simulation-assisted hypothesis testing procedure is proposed for testing significance and parameter constancy. We further propose an information criterion that can consistently select the true set of significant predictors. Our method is applied to autoregressive models with time-varying coefficients. Simulation results and a real data application are provided.
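The parameter saving claimed in the IGCNets abstract above can be checked with simple arithmetic. The sketch below uses our own notation, which is an assumption and not necessarily the paper's: a primary group convolution with L partitions of M channels each, a secondary point-wise group convolution with M partitions of L channels each, and `kernel_area` = 9 for a 3×3 kernel; biases are ignored. The helper `widest_igc` is hypothetical and exists only for illustration. `interleave` shows the channel permutation that makes every secondary partition draw one channel from each primary partition.

```python
from typing import List

# Assumed notation: L primary partitions of M channels (C = L * M channels),
# kernel_area = spatial kernel size (9 for 3x3). Bias terms are ignored.


def regular_conv_params(channels: int, kernel_area: int) -> int:
    """Weights of a regular convolution mapping `channels` -> `channels`."""
    return channels * channels * kernel_area


def igc_params(L: int, M: int, kernel_area: int) -> int:
    """Weights of one IGC block: primary spatial group conv + secondary 1x1 group conv."""
    primary = L * (M * M * kernel_area)  # L groups, each an M -> M spatial conv
    secondary = M * (L * L)              # M groups, each an L -> L point-wise conv
    return primary + secondary


def interleave(L: int, M: int) -> List[int]:
    """perm[j] is the primary output channel that feeds secondary input channel j.

    Primary channel l*M + m (partition l, position m) goes to secondary
    channel m*L + l (partition m, position l), so each secondary partition
    receives exactly one channel from every primary partition.
    """
    perm = [0] * (L * M)
    for l in range(L):
        for m in range(M):
            perm[m * L + l] = l * M + m
    return perm


def widest_igc(L: int, kernel_area: int, budget: int) -> int:
    """Largest M (starting from 1) whose IGC block stays within `budget` weights."""
    M = 1
    while igc_params(L, M + 1, kernel_area) <= budget:
        M += 1
    return M


if __name__ == "__main__":
    budget = regular_conv_params(64, 9)  # regular 3x3 conv with 64 channels
    M = widest_igc(L=2, kernel_area=9, budget=budget)
    print(budget, M, 2 * M, igc_params(2, M, 9))
```

Under these assumptions, `widest_igc(2, 9, 36864)` returns 45, so an IGC block with L = 2 is 90 channels wide and uses 36,630 weights, slightly fewer than the 36,864 of a 64-channel regular 3×3 convolution.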
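The Topic-based and Vector-based FM models in the rating-prediction abstract above build on the standard second-order Factorization Machine. The following is a minimal sketch of that standard model only, not of the paper's feature construction. It uses the well-known reformulation of the pairwise term that costs O(kn) instead of O(kn²), and includes a direct pairwise sum for checking. The feature layout in the usage block (one-hot user and movie indicators followed by implicit-feedback weights) and all numeric values are hypothetical.

```python
from typing import Sequence


def fm_predict(x: Sequence[float], w0: float, w: Sequence[float],
               V: Sequence[Sequence[float]]) -> float:
    """Second-order FM: w0 + sum_i w_i x_i + sum_{i<j} <v_i, v_j> x_i x_j.

    The pairwise term is computed in O(k n) via
    0.5 * sum_f [(sum_i v_if x_i)^2 - sum_i v_if^2 x_i^2].
    """
    n, k = len(x), len(V[0])
    linear = w0 + sum(w[i] * x[i] for i in range(n))
    pairwise = 0.0
    for f in range(k):
        s = sum(V[i][f] * x[i] for i in range(n))
        s_sq = sum((V[i][f] * x[i]) ** 2 for i in range(n))
        pairwise += 0.5 * (s * s - s_sq)
    return linear + pairwise


def fm_predict_naive(x: Sequence[float], w0: float, w: Sequence[float],
                     V: Sequence[Sequence[float]]) -> float:
    """Direct O(k n^2) evaluation of the same model, used as a reference."""
    n, k = len(x), len(V[0])
    total = w0 + sum(w[i] * x[i] for i in range(n))
    for i in range(n):
        for j in range(i + 1, n):
            total += sum(V[i][f] * V[j][f] for f in range(k)) * x[i] * x[j]
    return total


if __name__ == "__main__":
    # Hypothetical features: [user0, user1, movie0, movie1, topicA, topicB]
    x = [1.0, 0.0, 0.0, 1.0, 0.7, 0.3]
    w0, w = 3.5, [0.1, -0.1, 0.2, -0.2, 0.05, 0.02]
    V = [[0.1, 0.2], [0.0, -0.1], [0.3, 0.1], [-0.2, 0.4], [0.1, 0.1], [0.2, -0.3]]
    print(fm_predict(x, w0, w, V), fm_predict_naive(x, w0, w, V))
```

Both functions compute the same quantity; the O(kn) form is what makes FM practical for the sparse, high-dimensional feature vectors typical of rating prediction.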
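The Siamese Encoded Graph Convolutional Network in the long-document matching abstract above aggregates vertex features with graph convolutions. The sketch below shows one standard symmetric-normalized GCN layer (the Kipf and Welling formulation) followed by a mean-pool readout, in plain Python. Both choices are assumptions and may differ from the layers used in the paper; the vertex features stand in for the per-concept matching vectors that the Siamese encoder would produce, and the example graph and weights are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]


def matmul(A: Matrix, B: Matrix) -> Matrix:
    """Plain-Python matrix product A @ B."""
    return [[sum(A[i][p] * B[p][j] for p in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]


def gcn_layer(adj: Matrix, H: Matrix, W: Matrix) -> Matrix:
    """ReLU(D^{-1/2} (A + I) D^{-1/2} H W), with D the degree matrix of A + I."""
    n = len(adj)
    # Add self-loops so every vertex keeps its own features (and degree >= 1).
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
             for i in range(n)]
    deg = [sum(row) for row in a_hat]
    norm = [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]
    out = matmul(matmul(norm, H), W)
    return [[max(0.0, v) for v in row] for row in out]


def mean_readout(H: Matrix) -> List[float]:
    """Average vertex features into one graph-level vector."""
    n = len(H)
    return [sum(row[j] for row in H) / n for j in range(len(H[0]))]


if __name__ == "__main__":
    # Hypothetical 3-concept graph (a path 0-1-2) with 2-d matching features.
    adj = [[0.0, 1.0, 0.0], [1.0, 0.0, 1.0], [0.0, 1.0, 0.0]]
    H = [[0.9, 0.1], [0.4, 0.6], [0.8, 0.2]]
    W = [[1.0, -0.5], [0.5, 1.0]]
    print(mean_readout(gcn_layer(adj, H, W)))
```

The graph-level vector from `mean_readout` would then feed a classifier that outputs the match decision; that final step is omitted here.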
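The learning-to-hash survey above is concerned with how hash functions are learned. At search time, the database is commonly ranked by Hamming distance between binary codes. The sketch below shows only that search stage. It encodes vectors with fixed sign-of-projection hash functions, which is a random-projection stand-in rather than a learned method, and the projection vectors and data are hypothetical.

```python
from typing import List, Sequence


def encode(x: Sequence[float], projections: Sequence[Sequence[float]]) -> int:
    """Pack sign(p . x) for each projection p into an int (bit i <- projection i)."""
    code = 0
    for bit, p in enumerate(projections):
        if sum(pi * xi for pi, xi in zip(p, x)) >= 0:
            code |= 1 << bit
    return code


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two codes."""
    return bin(a ^ b).count("1")


def hamming_rank(query_code: int, db_codes: Sequence[int]) -> List[int]:
    """Database indices sorted by Hamming distance to the query, ties broken by index."""
    return sorted(range(len(db_codes)),
                  key=lambda i: (hamming(query_code, db_codes[i]), i))


if __name__ == "__main__":
    # Hypothetical fixed projections and data, for illustration only.
    projections = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [1.0, -1.0]]
    database = [[0.9, 1.1], [-1.0, 0.2], [1.0, -0.8]]
    db_codes = [encode(v, projections) for v in database]
    query = encode([1.0, 0.9], projections)
    print(hamming_rank(query, db_codes))
```

A learned method would replace `projections` (and possibly the sign function) with hash functions fitted to preserve the chosen notion of similarity; the ranking step stays the same.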
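The flat band mentioned in the optical sawtooth-lattice abstract above can be illustrated with a generic single-band tight-binding model. This is an assumption for illustration, not the optical-lattice band calculation of the paper. Each unit cell has a base site A and an apex site B; base-to-base hopping is $t$ and base-to-apex hopping is $t'$, entering the Bloch Hamiltonian as $H_{AA}(k) = 2t\cos k$, $H_{BB} = 0$ and $|H_{AB}(k)|^2 = 2t'^2(1+\cos k)$. For this textbook model the lower band is exactly flat at $t' = \sqrt{2}\,t$. The mapping of $t$ and $t'$ here need not match the inter- and intra-cell tunnellings defined in the abstract.

```python
import math
from typing import Tuple


def sawtooth_bands(t: float, t_prime: float, k: float) -> Tuple[float, float]:
    """Eigenvalues (lower, upper) of the assumed 2x2 Bloch Hamiltonian at momentum k.

    Model: H_AA = 2 t cos k, H_BB = 0, |H_AB|^2 = 2 t'^2 (1 + cos k).
    """
    diag = 2.0 * t * math.cos(k)
    off_sq = 2.0 * t_prime ** 2 * (1.0 + math.cos(k))
    centre = diag / 2.0                    # (H_AA + H_BB) / 2
    radius = math.sqrt(centre ** 2 + off_sq)  # sqrt(((H_AA - H_BB)/2)^2 + |H_AB|^2)
    return centre - radius, centre + radius


def bandwidths(t: float, t_prime: float, n_k: int = 201) -> Tuple[float, float]:
    """Widths of the lower and upper bands sampled on k in [-pi, pi]."""
    lower, upper = [], []
    for i in range(n_k):
        k = -math.pi + 2.0 * math.pi * i / (n_k - 1)
        lo, hi = sawtooth_bands(t, t_prime, k)
        lower.append(lo)
        upper.append(hi)
    return max(lower) - min(lower), max(upper) - min(upper)


if __name__ == "__main__":
    for ratio in (1.0, 1.2, math.sqrt(2.0), 1.6):
        w_lo, w_hi = bandwidths(1.0, ratio)
        print(f"t'/t = {ratio:.3f}: lower width {w_lo:.3e}, upper width {w_hi:.3f}")
```

With $t = 1$, the lower band at $t' = \sqrt{2}$ stays at $-2t$ for every $k$ (its sampled width is at the level of rounding error) while the upper band spans $4t$; at $t' = t$ the lower band is clearly dispersive. Scanning `ratio` mimics, in this toy model, the idea of reaching a nearly flat band by tuning lattice parameters.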