• Layer-structured materials are often considered good candidates for thermoelectric materials because they tend to exhibit intrinsically low thermal conductivity as a result of atomic interlayer interactions. The electrical properties of layer-structured materials can be easily tuned using various methods, such as band modification and intercalation. We report that TiNBr, a member of the layer-structured metal nitride halide system MNX (M = Ti, Zr, Hf; X = Cl, Br, I), exhibits an ultrahigh Seebeck coefficient of 2215 $\mu V/K$ at 300 K. The value of the dimensionless figure of merit, ZT, along the A axis can be as high as 0.661 at 800 K, corresponding to a lattice thermal conductivity as low as 1.34 W/(m K). The low ${\kappa_l}$ of TiNBr is associated with a collectively low phonon group velocity ($2.05\times 10^3 $ m/s on average) and large phonon anharmonicity that can be quantified using the Gr\"uneisen parameter and three-phonon processes. Animation of the atomic motion shows that the highly anharmonic modes mainly involve the motion of N atoms, and the charge density difference reveals that the N atoms become polarized with the merging of anharmonicity. Moreover, the fitting procedure of the energy-displacement curve verifies that in addition to the three-phonon processes, the fourth-order anharmonic effect is also important in the integral anharmonicity of TiNBr. Our work is the first study of the thermoelectric properties of TiNBr and may help establish a connection between the low lattice thermal conductivity and the behavior of phonon vibrational modes.
  • We study active object tracking, where a tracker takes as input the visual observation (i.e., frame sequence) and produces the camera control signal (e.g., move forward, turn left, etc.). Conventional methods tackle the tracking and the camera control separately, which makes them challenging to tune jointly. They also require much human effort for labeling and many expensive trial-and-error runs in the real world. To address these issues, we propose, in this paper, an end-to-end solution via deep reinforcement learning, where a ConvNet-LSTM function approximator is adopted for the direct frame-to-action prediction. We further propose an environment augmentation technique and a customized reward function, which are crucial for successful training. The tracker trained in simulators (ViZDoom, Unreal Engine) shows good generalization in the case of unseen object moving paths, unseen object appearances, unseen backgrounds, and distracting objects. It can restore tracking when occasionally losing the target. With the experiments over the VOT dataset, we also find that the tracking ability, obtained solely from simulators, can potentially transfer to real-world scenarios.
  • In this paper we develop a method to solve evolution equations on Gelfand triples with time-fractional derivative based on monotonicity techniques. Applications include deterministic and stochastic quasi-linear partial differential equations with time-fractional derivatives, including time-fractional (stochastic) porous media equations (including the case where the Laplace operator is also fractional) and $p$-Laplace equations as special cases.
  • This paper studies the Tensor Robust Principal Component Analysis (TRPCA) problem, which extends the known Robust PCA (Candes et al. 2011) to the tensor case. Our model is based on a new tensor Singular Value Decomposition (t-SVD) (Kilmer and Martin 2011) and its induced tensor tubal rank and tensor nuclear norm. Consider that we have a 3-way tensor ${\mathcal{X}}\in\mathbb{R}^{n_1\times n_2\times n_3}$ such that ${\mathcal{X}}={\mathcal{L}}_0+{\mathcal{E}}_0$, where ${\mathcal{L}}_0$ has low tubal rank and ${\mathcal{E}}_0$ is sparse. Is it possible to recover both components? In this work, we prove that under certain suitable assumptions, we can recover both the low-rank and the sparse components exactly by simply solving a convex program whose objective is a weighted combination of the tensor nuclear norm and the $\ell_1$-norm, i.e., $\min_{{\mathcal{L}},\ {\mathcal{E}}} \ \|{{\mathcal{L}}}\|_*+\lambda\|{{\mathcal{E}}}\|_1, \ \text{s.t.} \ {\mathcal{X}}={\mathcal{L}}+{\mathcal{E}}$, where $\lambda= {1}/{\sqrt{\max(n_1,n_2)n_3}}$. Interestingly, TRPCA involves RPCA as a special case when $n_3=1$ and thus it is a simple and elegant tensor extension of RPCA. Numerical experiments also verify our theory, and the application to image denoising demonstrates the effectiveness of our method.
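As a small illustration of the weighting parameter in the convex program above, the following sketch computes $\lambda = 1/\sqrt{\max(n_1,n_2)\,n_3}$ for a given tensor shape. The function name `trpca_lambda` is hypothetical and not taken from the paper; the only formula used is the one stated in the abstract. It does not solve the convex program itself.

```python
import math


def trpca_lambda(n1: int, n2: int, n3: int) -> float:
    """Weight on the l1 term for an n1 x n2 x n3 tensor.

    Implements lambda = 1 / sqrt(max(n1, n2) * n3) as stated in the abstract.
    The function name is an illustrative assumption.
    """
    if min(n1, n2, n3) < 1:
        raise ValueError("all tensor dimensions must be positive")
    return 1.0 / math.sqrt(max(n1, n2) * n3)


if __name__ == "__main__":
    # A 100 x 50 x 4 tensor, then the matrix case n3 = 1.
    print(trpca_lambda(100, 50, 4))
    print(trpca_lambda(100, 50, 1))
```

Setting `n3 = 1` gives $1/\sqrt{\max(n_1,n_2)}$, which matches the abstract's remark that matrix RPCA is recovered as a special case.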
  • In this paper, we propose a deep learning approach to tackle the automatic summarization tasks by incorporating topic information into the convolutional sequence-to-sequence (ConvS2S) model and using self-critical sequence training (SCST) for optimization. Through jointly attending to topics and word-level alignment, our approach can improve coherence, diversity, and informativeness of generated summaries via a biased probability generation mechanism. On the other hand, reinforcement training, like SCST, directly optimizes the proposed model with respect to the non-differentiable metric ROUGE, which also avoids the exposure bias during inference. We carry out the experimental evaluation with state-of-the-art methods over the Gigaword, DUC-2004, and LCSTS datasets. The empirical results demonstrate the superiority of our proposed method in the abstractive summarization.
  • Human motion prediction aims at generating future frames of human motion based on an observed sequence of skeletons. Recent methods employ the latest hidden states of a recurrent neural network (RNN) to encode the historical skeletons, which can only address short-term prediction. In this work, we propose a motion context model that summarizes the historical human motion with respect to the current prediction. A modified highway unit (MHU) is proposed for efficiently eliminating motionless joints and estimating the next pose given the motion context. Furthermore, we enhance the motion dynamics by minimizing the Gram matrix loss for long-term motion prediction. Experimental results show that the proposed model can forecast future human movements and yields superior performance over related state-of-the-art approaches. Moreover, specifying the motion context with the activity labels enables our model to perform human motion transfer.
  • In a previous work, we found that an object's gravity can be regarded as its buoyancy in space when it displaces gravitons. However, this seems to contradict the pointlike elementary particle in the standard model. In this work, by combining Klein's curled fifth dimension and Newman's complex space, we find that a particle has a micro complex horizon, which can be described by the complex Kerr-Newman metrics in a 6-D complex space, the 3-D imaginary subspace of which is curled in the points of its 3-D real subspace (and vice versa). A particle can appear as a pointlike particle or a particle with non-zero volume in different slices of the 6-D complex space. The ring singularity of a particle is hidden in one slice of its complex horizon. As two phases of a complex black hole, a particle and a black hole can be transformed into each other through a phase transformation.
  • In this letter, by combining the holographic principle with the graviton Bose-Einstein condensates hypothesis of gravitational backgrounds, we provide a theory of gravity, which provides some kinetic details of how the gravitational coupling between matter and spacetime works. The effective radial potential energy of an object in a gravitational field is found to be the sum of the interfacial energy caused by its micro horizon and the energy required to make room for it by displacing gravitons. A version of Archimedes' principle for gravity can be described as "the effective internal energy of the gravitons that a body displaces is equal to the work by multiplying the gravity exerted on it and its distance to the centre of gravity."
  • Color names are often made up of multiple words. As a task in natural language understanding, we investigate in depth the capacity of neural networks based on sums of word embeddings (SOWE), recurrence (LSTM and GRU based RNNs) and convolution (CNN) to estimate colors from sequences of terms. We consider both point and distribution estimates of color. We argue that the latter has a particular value as there is no clear agreement between people as to what a particular color describes -- different people have a different idea of what it means to be ``very dark orange'', for example. Surprisingly, despite its simplicity, the sum of word embeddings generally performs the best on almost all evaluations.
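To make the sum-of-word-embeddings (SOWE) input representation mentioned above concrete, here is a minimal sketch that adds the embedding vectors of the words in a color name. The three-dimensional vectors in `TOY_EMBEDDINGS` and the function name are made-up assumptions for illustration; the paper's actual embeddings and the network that maps the summed vector to a color are not reproduced here.

```python
from typing import Dict, List, Sequence

# Hypothetical toy embeddings; real word embeddings have far more dimensions.
TOY_EMBEDDINGS: Dict[str, List[float]] = {
    "very": [0.5, 0.0, -0.25],
    "dark": [-1.0, 0.5, 0.25],
    "orange": [1.0, 0.75, -0.5],
}


def sum_of_word_embeddings(
    tokens: Sequence[str], embeddings: Dict[str, List[float]]
) -> List[float]:
    """Return the element-wise sum of the embedding vectors of all tokens."""
    if not tokens:
        raise ValueError("at least one token is required")
    total = [0.0] * len(embeddings[tokens[0]])
    for token in tokens:
        for i, value in enumerate(embeddings[token]):
            total[i] += value
    return total


if __name__ == "__main__":
    print(sum_of_word_embeddings(["very", "dark", "orange"], TOY_EMBEDDINGS))
```

Because addition is commutative, this representation ignores word order, which is part of what makes it simpler than the recurrent and convolutional alternatives named in the abstract.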
  • Since their discovery by SDO/AIA in EUV, rapid (phase speeds of 1000 km/s), quasi-periodic, fast-mode propagating wave trains (QFPs) have been observed accompanying many solar flares. They typically propagate in funnel-like structures associated with the expanding magnetic field topology of the active regions (ARs). The waves provide information on the associated flare pulsations and the magnetic structure through coronal seismology. The reported waves usually originate from a single localized source associated with the flare. Here, we report the first detection of counter-propagating QFPs associated with two neighboring flares on 2013 May 22, apparently connected by large-scale, trans-equatorial coronal loops. We present the first results of a 3D MHD model of counter-propagating QFPs in an idealized bi-polar AR. We investigate the excitation, propagation, nonlinearity, and interaction of the counter-propagating waves for a range of key model parameters, such as the properties of the sources and the background magnetic structure. In addition to QFPs, we also find evidence of trapped fast (kink) and slow mode waves associated with the event. We apply coronal seismology to determine the magnetic field strength in an oscillating loop during the event. Our model results are in qualitative agreement with the AIA-observed counter-propagating waves and are used to identify the various MHD wave modes associated with the observed event, providing insights into their linear and nonlinear interactions. Our observations provide the first direct evidence of counter-propagating fast magnetosonic waves that can potentially lead to turbulent cascade and carry significant energy flux for coronal heating in low-corona magnetic structures.
  • An experiment for $p(^{14}\rm{C}$,$^{14}\rm{C}^{*}\rightarrow^{10}\rm{Be}+\alpha)\mathit{p}$ inelastic excitation and decay was performed in inverse kinematics at a beam energy of 25.3 MeV/u. A series of $^{14}\rm{C}$ excited states, including a new one at 18.3(1) MeV, were observed which decay to various states of the final nucleus of $^{10}\rm{Be}$. A specially designed telescope-system, installed around the zero degree, played an essential role in detecting the resonant states near the $\alpha$-separation threshold. A state at 14.1(1) MeV is clearly identified, being consistent with the predicted band-head of the molecular rotational band characterized by the $\pi$-bond linear-chain-configuration. Further clarification of the properties of this exotic state is suggested by using appropriate reaction tools.
  • In this paper, the Milstein method is used to approximate invariant measures of stochastic differential equations with commutative noise. The decay rate of the transition probability kernel generated by the Milstein method to the unique invariant measure of the method is observed to be exponential with respect to the time variable. The convergence rate of the numerical invariant measure to the underlying one is shown to be one. Numerical simulations are presented to demonstrate the theoretical results.
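For readers unfamiliar with the scheme named above, the following is a minimal sketch of the Milstein update for a scalar SDE $dX = a(X)\,dt + b(X)\,dW$, where a single noise source is trivially commutative. The function name and the example coefficients are illustrative assumptions, not the equations studied in the paper, and the sketch makes no claim about invariant measures.

```python
import math
import random
from typing import Callable, List


def milstein_path(
    x0: float,
    drift: Callable[[float], float],
    diffusion: Callable[[float], float],
    diffusion_dx: Callable[[float], float],
    step: float,
    n_steps: int,
    rng: random.Random,
) -> List[float]:
    """Simulate one path of dX = a(X) dt + b(X) dW with the Milstein scheme.

    Update: X <- X + a*h + b*dW + 0.5 * b * b' * (dW**2 - h),
    where b' (diffusion_dx) is the derivative of b with respect to x.
    """
    x = x0
    path = [x]
    for _ in range(n_steps):
        dw = rng.gauss(0.0, math.sqrt(step))  # Brownian increment ~ N(0, h)
        b = diffusion(x)
        x = x + drift(x) * step + b * dw + 0.5 * b * diffusion_dx(x) * (dw * dw - step)
        path.append(x)
    return path


if __name__ == "__main__":
    # Hypothetical coefficients: a(x) = -x, b(x) = 1 + 0.5 x, so b'(x) = 0.5.
    path = milstein_path(
        1.0, lambda x: -x, lambda x: 1.0 + 0.5 * x, lambda x: 0.5,
        step=0.01, n_steps=1000, rng=random.Random(0),
    )
    print(path[-1])
```

The last term is what distinguishes Milstein from Euler-Maruyama; when the diffusion coefficient is constant, its derivative vanishes and the two schemes coincide.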
  • Recently, the booming fashion sector and its huge potential benefits have attracted tremendous attention from many research communities. In particular, increasing research efforts have been dedicated to complementary clothing matching, as matching clothes to make a suitable outfit has become a daily headache for many people, especially those who do not have a sense of aesthetics. Thanks to the remarkable success of neural networks in various applications such as image classification and speech recognition, researchers are able to adopt data-driven learning methods to analyze fashion items. Nevertheless, existing studies overlook the rich valuable knowledge (rules) accumulated in the fashion domain, especially the rules regarding clothing matching. Towards this end, in this work, we shed light on complementary clothing matching by integrating advanced deep neural networks and the rich fashion domain knowledge. Considering that the rules can be fuzzy and different rules may have different confidence levels for different samples, we present a neural compatibility modeling scheme with attentive knowledge distillation based on the teacher-student network scheme. Extensive experiments on a real-world dataset show the superiority of our model over several state-of-the-art baselines. Based upon the comparisons, we observe certain fashion insights that add value to the fashion matching study. As a byproduct, we released the code and the parameters involved to benefit other researchers.
  • Salient object detection, which aims to identify and locate the most salient pixels or regions in images, has been attracting more and more interest due to its various real-world applications. However, this vision task is quite challenging, especially under complex image scenes. Inspired by the intrinsic reflection of natural images, in this paper we propose a novel feature learning framework for large-scale salient object detection. Specifically, we design a symmetrical fully convolutional network (SFCN) to learn complementary saliency features under the guidance of lossless feature reflection. The location information, together with contextual and semantic information, of salient objects is jointly utilized to supervise the proposed network for more accurate saliency predictions. In addition, to overcome the blurry boundary problem, we propose a new structural loss function to learn clear object boundaries and spatially consistent saliency. The coarse prediction results are effectively refined by this structural information for performance improvements. Extensive experiments on seven saliency detection datasets demonstrate that our approach achieves consistently superior performance and outperforms the very recent state-of-the-art methods.
  • In this work, we study 3D object detection from RGB-D data in both indoor and outdoor scenes. While previous methods focus on images or 3D voxels, often obscuring natural 3D patterns and invariances of 3D data, we directly operate on raw point clouds by popping up RGB-D scans. However, a key challenge of this approach is how to efficiently localize objects in point clouds of large-scale scenes (region proposal). Instead of solely relying on 3D proposals, our method leverages both mature 2D object detectors and advanced 3D deep learning for object localization, achieving efficiency as well as high recall for even small objects. Benefiting from learning directly in raw point clouds, our method is also able to precisely estimate 3D bounding boxes even under strong occlusion or with very sparse points. Evaluated on KITTI and SUN RGB-D 3D detection benchmarks, our method outperforms the state of the art by remarkable margins while having real-time capability.
  • We investigate the pair-production of right-handed neutrinos via the Standard Model (SM) Higgs boson in a gauged $B-L$ model. The right-handed neutrinos with a mass of a few tens of GeV, generating viable light neutrino masses via the seesaw mechanism, naturally exhibit displaced vertices and distinctive signatures at the LHC and proposed lepton colliders. The production rate of the right-handed neutrinos depends on the mixing between the SM Higgs and the exotic Higgs associated with the $B-L$ breaking, whereas their decay length depends on the active-sterile neutrino mixing. We focus on the displaced leptonic final states arising from such a process, and analyze the sensitivity reach of the LHC and proposed lepton colliders in probing the active-sterile neutrino mixing. We show that mixing to muons as small as $V_{\mu N} \approx 10^{-7}$ can be probed at the LHC with 100 fb$^{-1}$ and at proposed lepton colliders with 5000 fb$^{-1}$. The future high luminosity run at the LHC and the proposed MATHUSLA detector may further improve this reach by an order of magnitude.
  • The linearly constrained nonconvex nonsmooth program has drawn much attention over the last few years due to its broad modeling power in machine learning. A variety of important problems, including deep learning, matrix factorization and phase retrieval, can be reformulated as the problem of optimizing a highly nonconvex and nonsmooth objective function with some linear constraints. However, it is challenging to solve a linearly constrained nonconvex nonsmooth program, which is much more complicated than its unconstrained counterpart. In fact, the feasible region is a polyhedron, where a simple projection is intractable in general, and moreover, the per-iteration cost is extremely expensive in real scenarios, where the dimension of the decision variable is high. Therefore, developing a provable and practical algorithm for solving linearly constrained nonconvex nonsmooth programs has been recognized as a promising direction. In this paper, we develop an incremental path-following splitting algorithm, denoted as \textsf{IPFS}, with a theoretical guarantee and a low computational cost. Specifically, we show that this algorithm converges to an $\epsilon$-approximate stationary solution within $O(1/\epsilon)$ iterations with very low per-iteration cost. To the best of our knowledge, this is the first incremental method to solve linearly constrained nonconvex nonsmooth programs with a theoretical guarantee. Experiments conducted on the constrained concave penalized linear regression (CCPLR) and nonconvex support vector machine (NCSVM) demonstrate that the proposed algorithm is more effective and stable than other competing methods.
  • In this paper, we consider the Tensor Robust Principal Component Analysis (TRPCA) problem, which aims to exactly recover the low-rank and sparse components from their sum. Our model is based on the recently proposed tensor-tensor product (or t-product) [13]. Induced by the t-product, we first rigorously deduce the tensor spectral norm, tensor nuclear norm, and tensor average rank, and show that the tensor nuclear norm is the convex envelope of the tensor average rank within the unit ball of the tensor spectral norm. These definitions, their relationships and properties are consistent with matrix cases. Equipped with the new tensor nuclear norm, we then solve the TRPCA problem by solving a convex program and provide the theoretical guarantee for the exact recovery. Our TRPCA model and recovery guarantee include matrix RPCA as a special case. Numerical experiments verify our results, and the applications to image recovery and background modeling problems demonstrate the effectiveness of our method.
  • Nowadays, billions of videos are online ready to be viewed and shared. Among an enormous volume of videos, some popular ones are widely viewed by online users while the majority attract little attention. Furthermore, within each video, different segments may attract significantly different numbers of views. This phenomenon leads to a challenging yet important problem, namely fine-grained video attractiveness prediction. However, one major obstacle for such a challenging problem is that no suitable benchmark dataset currently exists. To this end, we construct the first fine-grained video attractiveness dataset (FVAD), which is collected from one of the most popular video websites in the world. In total, the constructed FVAD consists of 1,019 drama episodes with 780.6 hours covering different categories and a wide variety of video contents. Apart from the large number of videos, hundreds of millions of user behaviors during watching videos are also included, such as "view counts", "fast-forward", "fast-rewind", and so on, where "view counts" reflects the video attractiveness while other engagements capture the interactions between the viewers and videos. First, we demonstrate that video attractiveness and different engagements present different relationships. Second, FVAD provides us with an opportunity to study the fine-grained video attractiveness prediction problem. We design different sequential models to perform video attractiveness prediction by relying solely on video contents. The sequential models exploit the multimodal relationships between visual and audio components of the video contents at different levels. Experimental results demonstrate the effectiveness of our proposed sequential models with different visual and audio representations, the necessity of incorporating the two modalities, and the complementary behaviors of the sequential prediction models at different levels.
  • Recently, caption generation with an encoder-decoder framework has been extensively studied and applied in different domains, such as image captioning, code captioning, and so on. In this paper, we propose a novel architecture, namely Auto-Reconstructor Network (ARNet), which, coupled with the conventional encoder-decoder framework, works in an end-to-end fashion to generate captions. ARNet aims at reconstructing the previous hidden state with the present one, besides behaving as the input-dependent transition operator. Therefore, ARNet encourages the current hidden state to embed more information from the previous one, which can help regularize the transition dynamics of recurrent neural networks (RNNs). Extensive experimental results show that our proposed ARNet boosts the performance over the existing encoder-decoder models on both image captioning and source code captioning tasks. Additionally, ARNet remarkably reduces the discrepancy between training and inference processes for caption generation. Furthermore, the performance on permuted sequential MNIST demonstrates that ARNet can effectively regularize RNNs, especially on modeling long-term dependencies. Our code is available at: https://github.com/chenxinpeng/ARNet
  • In this paper, we establish the large deviation principles, with respect to the weak convergence topology and the stronger Wasserstein metrics, for the empirical measure under the mean field Gibbs measure, under the strong exponential integrability condition for the negative part of the interaction potential.
  • We propose an end-to-end deep learning architecture that produces a 3D shape in triangular mesh from a single color image. Limited by the nature of deep neural networks, previous methods usually represent a 3D shape in volume or point cloud, and it is non-trivial to convert them to the more ready-to-use mesh model. Unlike the existing methods, our network represents 3D mesh in a graph-based convolutional neural network and produces correct geometry by progressively deforming an ellipsoid, leveraging perceptual features extracted from the input image. We adopt a coarse-to-fine strategy to make the whole deformation procedure stable, and define various mesh-related losses to capture properties of different levels to guarantee visually appealing and physically accurate 3D geometry. Extensive experiments show that our method not only qualitatively produces mesh models with better details, but also achieves higher 3D shape estimation accuracy compared to the state of the art.
  • Thanks to the success of deep learning, cross-modal retrieval has made significant progress recently. However, there still remains a crucial bottleneck: how to bridge the modality gap to further enhance the retrieval accuracy. In this paper, we propose a self-supervised adversarial hashing (\textbf{SSAH}) approach, which lies among the early attempts to incorporate adversarial learning into cross-modal hashing in a self-supervised fashion. The primary contribution of this work is that two adversarial networks are leveraged to maximize the semantic correlation and consistency of the representations between different modalities. In addition, we harness a self-supervised semantic network to discover high-level semantic information in the form of multi-label annotations. Such information guides the feature learning process and preserves the modality relationships in both the common semantic space and the Hamming space. Extensive experiments carried out on three benchmark datasets validate that the proposed SSAH surpasses the state-of-the-art methods.
  • Leveraging the disparity information from both left and right views is crucial for stereo disparity estimation. Left-right consistency check is an effective way to enhance the disparity estimation by referring to the information from the opposite view. However, the conventional left-right consistency check is an isolated post-processing step and heavily hand-crafted. This paper proposes a novel left-right comparative recurrent model to perform left-right consistency checking jointly with disparity estimation. At each recurrent step, the model produces disparity results for both views, and then performs online left-right comparison to identify the mismatched regions, which are likely to contain erroneously labeled pixels. A soft attention mechanism is introduced, which employs the learned error maps to guide the model to selectively focus on refining the unreliable regions at the next recurrent step. In this way, the generated disparity maps are progressively improved by the proposed recurrent model. Extensive evaluations on KITTI 2015, Scene Flow and Middlebury benchmarks validate the effectiveness of our model, demonstrating that state-of-the-art stereo disparity estimation results can be achieved by this new model.
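The conventional left-right consistency check that the abstract above contrasts with its learned model can be sketched as a simple post-processing step. The sketch assumes rectified stereo and the convention that a left-image pixel at column x with disparity d matches the right-image pixel at column x - d; the function name and the threshold of one pixel are illustrative assumptions, and this is not the paper's recurrent model.

```python
from typing import List


def left_right_consistency(
    disp_left: List[List[float]],
    disp_right: List[List[float]],
    threshold: float = 1.0,
) -> List[List[bool]]:
    """Return a mask that is True where the left disparity passes the check.

    A left pixel (y, x) with disparity d is compared against the right-view
    disparity at (y, x - d). Pixels whose match falls outside the image, or
    whose two disparities differ by more than `threshold`, are marked False.
    """
    height, width = len(disp_left), len(disp_left[0])
    mask = []
    for y in range(height):
        row = []
        for x in range(width):
            d = disp_left[y][x]
            x_right = int(round(x - d))  # matching column in the right view
            if 0 <= x_right < width:
                row.append(abs(d - disp_right[y][x_right]) <= threshold)
            else:
                row.append(False)
        mask.append(row)
    return mask


if __name__ == "__main__":
    left = [[2.0] * 6 for _ in range(2)]
    right = [[2.0] * 6 for _ in range(2)]
    right[1][2] = 5.0  # inject an inconsistent right-view disparity
    for mask_row in left_right_consistency(left, right):
        print(mask_row)
```

Pixels flagged False here correspond to the "mismatched regions" that a hand-crafted pipeline would typically discard or re-fill afterwards.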
  • Recently, much progress has been made in image captioning, and an encoder-decoder framework has achieved outstanding performance for this task. In this paper, we propose an extension of the encoder-decoder framework by adding a component called the guiding network. The guiding network models the attribute properties of input images, and its output is leveraged to compose the input of the decoder at each time step. The guiding network can be plugged into the current encoder-decoder framework and trained in an end-to-end manner. Hence, the guiding vector can be adaptively learned according to the signal from the decoder, enabling it to embed information from both image and language. Additionally, discriminative supervision can be employed to further improve the quality of guidance. The advantages of our proposed approach are verified by experiments carried out on the MS COCO dataset.