• Let $V=\bigotimes_{k=1}^{N} V_{k}$ be the Hilbert space of $N$ spin-$j$ particles, where each single-particle space $V_{k}$ has dimension $d=2j+1$. We fix an orthonormal basis $\{|m_i\rangle\}$ for each $V_{k}$, with weights $m_i\in \{-j,\ldots,j\}$. Let $V_{(w)}$ be the subspace of $V$ with constant weight $w$, with an orthonormal basis $\{|m_1,\ldots,m_N\rangle\}$ subject to $\sum_k m_k=w$. We show that the combinatorial properties of the constant-weight condition impose strong constraints on the reduced density matrices of any vector $|\psi\rangle$ in the constant-weight subspace, which limit the possible entanglement structures of $|\psi\rangle$. Our results find applications in the overlapping quantum marginal problems, quantum error-correcting codes, and the spin-network structures in quantum gravity.
  • We prove the Landau-Ginzburg/Calabi-Yau correspondence between the Gromov-Witten theory of each elliptic orbifold curve and its Fan-Jarvis-Ruan-Witten theory counterpart via modularity. We show that the correlation functions in these two enumerative theories are different representations of the same set of quasi-modular forms, expanded around different points on the upper-half plane. We relate these two representations by the Cayley transform.
  • In this paper, we propose an improved quantitative evaluation framework for Generative Adversarial Networks (GANs) that generate domain-specific images, improving on conventional evaluation methods at two levels: the feature representation and the evaluation metric. Unlike most existing evaluation frameworks, which transfer the representation of an ImageNet Inception model to map images onto the feature space, our framework uses a specialized encoder to acquire a fine-grained domain-specific representation. Moreover, for datasets with multiple classes, we propose the Class-Aware Frechet Distance (CAFD), which employs a Gaussian mixture model on the feature space to better fit the multi-manifold feature distribution. Experiments and analysis at both the feature level and the image level were conducted to demonstrate the improvements of our proposed framework over the recently proposed state-of-the-art FID method. To the best of our knowledge, we are the first to provide counterexamples where FID gives results inconsistent with human judgments. The experiments show that our framework overcomes the shortcomings of FID and improves robustness. Code will be made available.
  • We experimentally simulate spin networks -- a fundamental description of quantum spacetime at the Planck level. We achieve this by simulating quantum tetrahedra and their interactions. The tensor product of these quantum tetrahedra comprises spin networks. In this initial attempt to study quantum spacetime by quantum information processing, on a four-qubit nuclear magnetic resonance quantum simulator, we simulate the basic module -- comprising five quantum tetrahedra -- of the interactions of quantum spacetime. By measuring the geometric properties of the corresponding quantum tetrahedra and simulating their interactions, our experiment serves as the basic module that represents the Feynman diagram vertex in the spin-network formulation of quantum spacetime.
  • The Jaynes-Cummings model is solved with the raising and lowering (shift) operators by using the matrix-diagonalizing technique. Bell nonlocality is also found to be ubiquitous in the excitation states of the model.
  • Predicting traffic conditions has recently been explored as a way to relieve traffic congestion. Several pioneering approaches have been proposed based on traffic observations of the target location as well as its adjacent regions, but they obtain somewhat limited accuracy because they do not mine the road topology. To address the effect attenuation problem, we propose to take into account the traffic of surrounding locations (a wider range than the adjacent regions). We propose an end-to-end framework called DeepTransport, in which Convolutional Neural Networks (CNN) and Recurrent Neural Networks (RNN) are utilized to obtain spatial-temporal traffic information within a transport network topology. In addition, an attention mechanism is introduced to align spatial and temporal information. Moreover, we constructed and released a large real-world traffic condition dataset with 5-minute resolution. Our experiments on this dataset demonstrate that our method captures the complex relationships in the temporal and spatial domains. It significantly outperforms traditional statistical methods and a state-of-the-art deep learning method.
  • This paper describes our solution for the video recognition task of the ActivityNet Kinetics challenge, which ranked 1st. Most existing state-of-the-art video recognition approaches favor an end-to-end pipeline. One exception is the framework of DevNet. The merit of DevNet is that it first uses the video data to learn a network (i.e. fine-tuning or training from scratch). Instead of directly using the end-to-end classification scores (e.g. softmax scores), it extracts the features from the learned network and then feeds them into off-the-shelf machine learning models to conduct video classification. However, the effectiveness of this line of work has long been ignored and underestimated. In this submission, we use this strategy extensively. In particular, we investigate four temporal modeling approaches using the learned features: Multi-group Shifting Attention Network, Temporal Xception Network, Multi-stream Sequence Model and Fast-Forward Sequence Model. Experimental results on the challenging Kinetics dataset demonstrate that our proposed temporal modeling approaches can significantly improve on existing approaches in large-scale video recognition tasks. Most remarkably, our best single Multi-group Shifting Attention Network achieves 77.7% top-1 accuracy and 93.2% top-5 accuracy on the validation set.
  • This paper describes our solution for the video recognition task of the Google Cloud and YouTube-8M Video Understanding Challenge, which ranked 3rd. Because the challenge provides pre-extracted visual and audio features instead of the raw videos, we mainly investigate various temporal modeling approaches to aggregate the frame-level features for multi-label video recognition. Our system contains three major components: a two-stream sequence model, a fast-forward sequence model and temporal residual neural networks. Experimental results on the challenging YouTube-8M dataset demonstrate that our proposed temporal modeling approaches can significantly improve on existing temporal modeling approaches in large-scale video recognition tasks. Notably, our fast-forward LSTM with a depth of 7 layers achieves 82.75% in terms of GAP@20 on the Kaggle Public test set.
  • How self-loops on vertices affect quantum walks is an interesting issue, and self-loops play important roles in quantum walk based algorithms. However, the original model, which adjusts the effect of self-loops by changing their number, has limitations. For example, the effect of self-loops cannot be adjusted continuously, because their number must be an integer. In this paper, we propose a model of an adjustable self-loop on discrete-time quantum walks, whose weight is controlled by a real parameter in the coin operator. The proposed method not only generalises the situations where the number of self-loops is an integer, but also provides a way to adjust the weight of the self-loop continuously. It enhances the potential of self-loops in applications. For instance, we improve the success rate of the quantum walk based search on a $20\times20$ two-dimensional lattice from $23.6\%$ to $97.2\%$ with the proposed method. Moreover, the success rate of the search, which scales only as $O(1/\log{N})$ before the improvement, even increases slightly with the size of the lattice after it. To the best of our knowledge, this is the first time that such an improvement has been achieved on the quantum walk based spatial search.
  • The study of epidemic spreading on populations of networked individuals has recently seen a great deal of significant progress. A common feature of all past studies is, however, that there is only one peak of infected density in each epidemic spreading episode. In contrast, real data from different cities around the world suggest that, besides the major single-peak pattern of infected density, there is a finite probability of a pattern made of two (or multiple) peaks. We show that the latter feature is fully distinctive of a multilayered network of interactions, and reveal that a two-peak pattern emerges from the different time delays at which the epidemic spreads between the two layers. Further, we show that the essential ingredients are different degree distributions in the two layers and a weak coupling condition between the layers themselves. Moreover, an edge-based theory is developed which fully explains all numerical results. Our findings may therefore be of significance for protecting against secondary disasters of epidemics, which are definitely undesired in real life.
  • The GKZ system for the Hesse pencil of elliptic curves has more solutions than the period integrals. In this work we give different realizations and interpretations of the extra solution, in terms of oscillating integral, Eichler integral, chain integral on the elliptic curve, limit of a period of a certain compact Calabi-Yau threefold geometry, etc. We also highlight the role played by the orbifold singularity on the moduli space and its relation to the GKZ system.
  • Deep Neural Networks (DNNs) have provably enhanced the state-of-the-art Neural Machine Translation (NMT) with their capability in modeling complex functions and capturing complex linguistic structures. However, NMT systems with deep architectures in their encoder or decoder RNNs often suffer from severe gradient diffusion due to the non-linear recurrent activations, which often makes the optimization much more difficult. To address this problem, we propose novel linear associative units (LAU) to reduce the gradient propagation length inside the recurrent unit. Unlike conventional approaches (LSTM unit and GRU), LAUs utilize linear associative connections between the input and output of the recurrent unit, which allow unimpeded information flow through both the space and time directions. The model is quite simple, but it is surprisingly effective. Our empirical study on Chinese-English translation shows that our model with proper configuration can improve by 11.7 BLEU upon Groundhog and the best reported results in the same setting. On the WMT14 English-German task and a larger WMT14 English-French task, our model achieves results comparable with the state-of-the-art.
  • We prove that the ancestor Gromov-Witten correlation functions of one-dimensional compact Calabi-Yau orbifolds are quasi-modular forms. This includes the pillowcase orbifold, which cannot yet be handled by using Milanov-Ruan's B-model technique. We first show that genus zero modularity is obtained from the phenomenon that the system of WDVV equations is essentially equivalent to the set of Ramanujan identities satisfied by the generators of the ring of quasi-modular forms for a certain modular group associated to the orbifold curve. Higher genus modularity then follows by using tautological relations.
  • In this paper, we propose a new correlated and individual multi-modal deep learning (CIMDL) method for RGB-D object recognition. Unlike most conventional RGB-D object recognition methods, which extract features from the RGB and depth channels individually, our CIMDL jointly learns feature representations from raw RGB-D data with a pair of deep neural networks, so that the sharable and modal-specific information can be simultaneously exploited. Specifically, we construct a pair of deep convolutional neural networks (CNNs) for the RGB and depth data, and concatenate them at the top layer of the network with a loss function which learns a new feature space where both the correlated part and the individual part of the RGB-D information are well modelled. The parameters of the whole network are updated by using the back-propagation criterion. Experimental results on two widely used RGB-D object image benchmark datasets clearly show that our method outperforms state-of-the-art methods.
  • Study of the production of pairs of top quarks in association with a Higgs boson is one of the primary goals of the Large Hadron Collider over the next decade, as measurements of this process may help us to understand whether the uniquely large mass of the top quark plays a special role in electroweak symmetry breaking. Higgs bosons decay predominantly to $b\bar{b}$, yielding signatures for the signal that are similar to $t\bar{t}$ + jets with heavy flavor. Though particularly challenging to study due to the similar kinematics between signal and background events, such final states ($t\bar{t} b \bar{b}$) are an important channel for studying the top quark Yukawa coupling. This paper presents a systematic study of machine learning (ML) methods for detecting $t\bar{t}h$ in the $h \rightarrow b\bar{b}$ decay channel. Among the eight ML methods tested, we show that two models, extreme gradient boosted trees and neural network models, outperform alternative methods. We further study the effectiveness of ML algorithms by investigating the impact of feature set and data size, as well as the structure of the models. While an extended feature set and larger training sets expectedly lead to improved performance, shallow models deliver comparable or better performance than their deeper counterparts. Our study suggests that ensembles of trees and neurons, not necessarily deep, work effectively for the problem of $t\bar{t}h$ detection.
  • While question answering (QA) with neural networks, i.e. neural QA, has achieved promising results in recent years, the lack of a large-scale real-world QA dataset is still a challenge for developing and evaluating neural QA systems. To alleviate this problem, we propose a large-scale human-annotated real-world QA dataset, WebQA, with more than 42k questions and 556k pieces of evidence. As existing neural QA methods treat QA either as a sequence generation or as a classification/ranking problem, they face the challenges of expensive softmax computation, handling of unseen answers, or a separate candidate answer generation component. In this work, we cast neural QA as a sequence labeling problem and propose an end-to-end sequence labeling model, which overcomes all the above challenges. Experimental results on WebQA show that our model outperforms the baselines significantly with an F1 score of 74.69% with word-based input, and the performance drops by only 3.72 F1 points with the more challenging character-based input.
  • Neural machine translation (NMT) aims at solving machine translation (MT) problems using neural networks and has exhibited promising results in recent years. However, most of the existing NMT models are shallow and there is still a performance gap between a single NMT model and the best conventional MT system. In this work, we introduce a new type of linear connections, named fast-forward connections, based on deep Long Short-Term Memory (LSTM) networks, and an interleaved bi-directional architecture for stacking the LSTM layers. Fast-forward connections play an essential role in propagating the gradients and building a deep topology of depth 16. On the WMT'14 English-to-French task, we achieve BLEU=37.7 with a single attention model, which outperforms the corresponding single shallow model by 6.2 BLEU points. This is the first time that a single NMT model achieves state-of-the-art performance and outperforms the best conventional model by 0.7 BLEU points. We can still achieve BLEU=36.3 even without using an attention mechanism. After special handling of unknown words and model ensembling, we obtain the best score reported to date on this task with BLEU=40.4. Our models are also validated on the more difficult WMT'14 English-to-German task.
  • The reduced density matrices of a many-body quantum system form a convex set, whose three-dimensional projection $\Theta$ is convex in $\mathbb{R}^3$. The boundary $\partial\Theta$ of $\Theta$ may exhibit nontrivial geometry, in particular ruled surfaces. Two physical mechanisms are known for the origins of ruled surfaces: symmetry breaking and gaplessness. In this work, we study the emergence of ruled surfaces for systems with local Hamiltonians in infinite spatial dimension, where the reduced density matrices are known to be separable as a consequence of the quantum de Finetti theorem. This allows us to identify the reduced density matrix geometry with the joint product numerical range $\Pi$ of the Hamiltonian interaction terms. We focus on the case where the interaction terms have certain structures, such that ruled surfaces emerge naturally when taking the convex hull of $\Pi$. We show that a ruled surface on $\partial\Theta$ sitting in $\Pi$ has a gapless origin; otherwise it has a symmetry breaking origin. As an example, we demonstrate that a famous ruled surface, known as the oloid, is a possible shape of $\Theta$, with two boundary pieces of symmetry breaking origin separated by two gapless lines.
  • In this expository note we discuss some arithmetic aspects of the mirror symmetry for plane cubic curves. We also explain how the Picard-Fuchs equation can be used to reveal part of these arithmetic properties. The application of Picard-Fuchs equations in studying the genus zero Gromov-Witten invariants of more general Calabi-Yau varieties and the Weil-Petersson geometry on their moduli spaces will also be discussed.
  • Cr$_2$AlC materials were irradiated with 7 MeV Xe$^{26+}$ ions and 500 keV He$^{2+}$ ions at room temperature. A structural transition with an increased $c$ lattice parameter and a decreased $a$ lattice parameter occurs after irradiation to doses above 1 dpa. Nevertheless, the modified structure is stable up to the dose of 5.2 dpa without obvious lattice disorder. The three samples irradiated to doses above 1 dpa have comparable lattice parameters and hardness values, suggesting a saturation of irradiation effects in Cr$_2$AlC. The structural transition and the saturation of irradiation effects are ascribed to irradiation-induced antisite defects (Cr$_{\mathrm{Al}}$ and Al$_{\mathrm{Cr}}$) and C interstitials, which is supported by calculations of the formation energies of various defects in Cr$_2$AlC. The irradiation-induced antisite defects and C interstitials may be critical to understanding the excellent resistance of MAX phases to irradiation-induced amorphization.
  • Due to its indefiniteness and poor spectral properties, the discretized linear algebraic system of the vector Laplacian obtained by mixed finite element methods is hard to solve. A block diagonal preconditioner has been developed and shown to be effective by Arnold, Falk, and Winther [Acta Numerica, 15:1--155, 2006]. The purpose of this paper is to propose alternative and effective block diagonal and block triangular preconditioners for solving this saddle point system. A variable V-cycle multigrid method with the standard point-wise Gauss-Seidel smoother is proved to be a good preconditioner for a discrete vector Laplacian operator. This multigrid solver is further used to build preconditioners for the saddle point systems of the vector Laplacian and the Maxwell equations with the divergence-free constraint. The major benefit of our approach is that the point-wise Gauss-Seidel smoother is more algebraic and can be easily implemented as a black-box smoother.
  • Person re-identification aims at matching pedestrians observed from non-overlapping camera views. Feature descriptors and metric learning are two significant problems in person re-identification. A discriminative metric learning method should be capable of exploiting complex nonlinear transformations due to the large variations in feature space. In this paper, we propose a nonlinear local metric learning (NLML) method to improve the state-of-the-art performance of person re-identification on public datasets. Motivated by the fact that local metric learning has been introduced to handle data which vary locally and that deep neural networks have shown outstanding capability in exploiting the nonlinearity of samples, we combine the merits of both local metric learning and deep neural networks to learn multiple sets of nonlinear transformations. By enforcing a margin between the distances of positive pedestrian image pairs and the distances of negative pairs in the transformed feature subspace, discriminative information can be effectively exploited in the developed neural networks. Our experiments show that the proposed NLML method achieves state-of-the-art results on the widely used VIPeR, GRID, and CUHK 01 datasets.
  • In this paper, we present the mQA model, which is able to answer questions about the content of an image. The answer can be a sentence, a phrase or a single word. Our model contains four components: a Long Short-Term Memory (LSTM) to extract the question representation, a Convolutional Neural Network (CNN) to extract the visual representation, an LSTM for storing the linguistic context in an answer, and a fusing component to combine the information from the first three components and generate the answer. We construct a Freestyle Multilingual Image Question Answering (FM-IQA) dataset to train and evaluate our mQA model. It contains over 150,000 images and 310,000 freestyle Chinese question-answer pairs and their English translations. The quality of the generated answers of our mQA model on this dataset is evaluated by human judges through a Turing Test. Specifically, we mix the answers provided by humans and our model. The human judges need to distinguish our model from the human. They will also provide a score (i.e. 0, 1, 2, the larger the better) indicating the quality of the answer. We propose strategies to monitor the quality of this evaluation process. The experiments show that in 64.7% of cases, the human judges cannot distinguish our model from humans. The average score is 1.454 (1.918 for human). The details of this work, including the FM-IQA dataset, can be found on the project page: http://idl.baidu.com/FM-IQA.html
  • Conventional single image based localization methods usually fail to localize a querying image when there exist large variations between the querying image and the pre-built scene. To address this, we propose an image-set querying based localization approach. When the localization by a single image fails to work, the system will ask the user to capture more auxiliary images. First, a local 3D model is established for the querying image set. Then, the pose of the querying image set is estimated by solving a nonlinear optimization problem, which aims to match the local 3D model against the pre-built scene. Experiments have shown the effectiveness and feasibility of the proposed approach.
  • We review the polynomial structure of the topological string partition functions as solutions to the holomorphic anomaly equations. We also explain the connection between the ring of propagators defined from special K\"ahler geometry and the ring of almost-holomorphic modular forms defined on modular curves.
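
As a small illustration of the constant-weight subspace $V_{(w)}$ defined in the first abstract above, the following sketch enumerates the basis labels $(m_1,\ldots,m_N)$ with $\sum_k m_k=w$ by brute force. The function name `constant_weight_basis` and the convention of passing $2j$ and $2w$ as integers (so that half-integer spins are handled exactly) are assumptions made here for illustration; they do not come from the paper.

```python
from fractions import Fraction
from itertools import product


def constant_weight_basis(n_spins, twice_j, twice_w):
    """List basis labels (m_1, ..., m_N) of V_(w) with sum_k m_k = w.

    Hypothetical helper: spins and weights are passed as doubled
    integers (2j and 2w) so half-integer values are exact.
    """
    # Doubled single-particle weights: -2j, -2j+2, ..., 2j.
    doubled_weights = range(-twice_j, twice_j + 1, 2)
    basis = []
    # Brute force over all d^N product-basis labels; fine for small N only.
    for labels in product(doubled_weights, repeat=n_spins):
        if sum(labels) == twice_w:
            basis.append(tuple(Fraction(m, 2) for m in labels))
    return basis


if __name__ == "__main__":
    # Three spin-1 particles with total weight w = 0.
    states = constant_weight_basis(n_spins=3, twice_j=2, twice_w=0)
    print(len(states), "basis states")
    for state in states:
        print(state)
```

The enumeration visits all $d^N$ product states, so it is only meant to make the definition concrete for small $N$; it says nothing about the reduced density matrices themselves.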