• We present a real-time method for synthesizing highly complex human motions using a novel training regime we call the auto-conditioned Recurrent Neural Network (acRNN). Recently, researchers have attempted to synthesize new motion by using autoregressive techniques, but existing methods tend to freeze or diverge after a couple of seconds due to an accumulation of errors that are fed back into the network. Furthermore, such methods have only been shown to be reliable for relatively simple human motions, such as walking or running. In contrast, our approach can synthesize arbitrary motions with highly complex styles, including dances or martial arts in addition to locomotion. The acRNN is able to accomplish this by explicitly accounting for autoregressive noise accumulation during training. To our knowledge, our work is the first to demonstrate the ability to generate over 18,000 continuous frames (300 seconds) of new complex human motion in different styles.
  • 3D face reconstruction from a single image is a classical and challenging problem, with wide applications in many areas. Inspired by recent works in face animation from RGB-D or monocular video inputs, we develop a novel method for reconstructing 3D faces from unconstrained 2D images, using a coarse-to-fine optimization strategy. First, a smooth coarse 3D face is generated from an example-based bilinear face model, by aligning the projection of 3D face landmarks with 2D landmarks detected from the input image. Afterwards, using local corrective deformation fields, the coarse 3D face is refined using photometric consistency constraints, resulting in a medium face shape. Finally, a shape-from-shading method is applied on the medium face to recover fine geometric details. Our method outperforms state-of-the-art approaches in terms of accuracy and detail recovery, which is demonstrated in extensive experiments using real world models and publicly available datasets.
  • In this paper, we consider the quantum MHD equations in which both the viscosity coefficient and the magnetic diffusion coefficient depend on the density. We prove the global existence of weak solutions and perform the lower Planck limit in a 3-dimensional torus for large initial data. The global existence is shown by using the Faedo-Galerkin method and weak compactness techniques for the adiabatic exponent $\gamma>1$.
  • We propose DoubleFusion, a new real-time system that combines volumetric dynamic reconstruction with data-driven template fitting to simultaneously reconstruct detailed geometry, non-rigid motion and the inner human body shape from a single depth camera. One of the key contributions of this method is a double layer representation consisting of a complete parametric body shape inside, and a gradually fused outer surface layer. A pre-defined node graph on the body surface parameterizes the non-rigid deformations near the body, and a free-form dynamically changing graph parameterizes the outer surface layer far from the body, which allows more general reconstruction. We further propose a joint motion tracking method based on the double layer representation to enable robust and fast motion tracking performance. Moreover, the inner body shape is optimized online and forced to fit inside the outer surface layer. Overall, our method enables increasingly denoised, detailed and complete surface reconstructions, fast motion tracking performance and plausible inner body shape reconstruction in real-time. In particular, experiments show improved fast motion tracking and loop closure performance on more challenging scenarios.
  • We study the task of image inpainting, which is to fill in the missing region of an incomplete image with plausible contents. To this end, we propose a learning-based approach to generate visually coherent completion given a high-resolution image with missing components. In order to overcome the difficulty of directly learning the distribution of high-dimensional image data, we divide the task into two separate steps, inference and translation, and model each step with a deep neural network. We also use simple heuristics to guide the matching of textures from the boundary to the hole. We show that, by using such techniques, inpainting reduces to the problem of learning two image-feature translation functions of much smaller dimensionality. We evaluate our method on several public datasets and show that we not only generate results of comparable or better visual quality, but are orders of magnitude faster than previous state-of-the-art methods.
  • Neural network training relies on our ability to find "good" minimizers of highly non-convex loss functions. It is well known that certain network architecture designs (e.g., skip connections) produce loss functions that are easier to train, and well-chosen training parameters (batch size, learning rate, optimizer) produce minimizers that generalize better. However, the reasons for these differences, and their effect on the underlying loss landscape, are not well understood. In this paper, we explore the structure of neural loss functions, and the effect of loss landscapes on generalization, using a range of visualization methods. First, we introduce a simple "filter normalization" method that helps us visualize loss function curvature, and make meaningful side-by-side comparisons between loss functions. Then, using a variety of visualizations, we explore how network architecture affects the loss landscape, and how training parameters affect the shape of minimizers.
  • End-to-end (E2E) automatic speech recognition (ASR) systems directly map acoustics to words using a unified model. Previous works mostly focus on E2E training of a single model which integrates the acoustic and language models into a whole. Although E2E training benefits from sequence modeling and simplified decoding pipelines, a large amount of transcribed acoustic data is usually required, and traditional acoustic and language modelling techniques cannot be utilized. In this paper, a novel modular training framework for E2E ASR is proposed to separately train neural acoustic and language models during the training stage, while still performing end-to-end inference in the decoding stage. Here, an acoustics-to-phoneme model (A2P) and a phoneme-to-word model (P2W) are trained using acoustic data and text data respectively. A phone synchronous decoding (PSD) module is inserted between A2P and P2W to reduce sequence lengths without precision loss. Finally, the modules are integrated into an acoustics-to-word model (A2W) and jointly optimized using acoustic data to retain the advantage of sequence modeling. Experiments on a 300-hour Switchboard task show significant improvement over the direct A2W model. The efficiency in both training and decoding also benefits from the proposed method.
  • By using solid-state reactions, we successfully synthesize new oxyselenides CsV$_2$Se$_{2-x}$O (x = 0, 0.5). These compounds, which contain V$_2$O planar layers with a square lattice, crystallize in the CeCr$_2$Si$_2$C structure with the space group $P4/mmm$. Another new compound, V$_2$Se$_2$O, which crystallizes in space group $I4/mmm$, is fabricated by topochemical deintercalation of cesium from CsV$_2$Se$_2$O powder with iodine in tetrahydrofuran (THF). Resistivity measurements show semiconducting behavior for CsV$_2$Se$_2$O, metallic behavior for CsV$_2$Se$_{1.5}$O, and an insulating feature for V$_2$Se$_2$O. A charge- or spin-density-wave-like anomaly has been observed at 168 K for CsV$_2$Se$_2$O and at 150 K for CsV$_2$Se$_{1.5}$O. These anomalies are also confirmed by magnetic susceptibility measurements. The resistivity in V$_2$Se$_2$O exhibits an anomalous log(1/$T$) temperature dependence, which is similar to the case in the parent phase or very underdoped cuprates, indicating the involvement of strong correlation. Magnetic susceptibility measurements show that the magnetic moment per V-site in V$_2$Se$_2$O is much larger than that of CsV$_2$Se$_{2-x}$O, which again suggests a correlation-induced localization effect in the former.
  • Boson sampling is a well-defined task that is strongly believed to be intractable for classical computers, but can be efficiently solved by a specific quantum simulator. However, an outstanding problem for large-scale experimental boson sampling is scalability. Here we report an experiment on boson sampling with photon loss, and demonstrate that boson sampling with a few photons lost can increase the sampling rate. Our experiment uses a quantum-dot-micropillar single-photon source demultiplexed into up to seven input ports of a 16*16 mode ultra-low-loss photonic circuit, and we detect three-, four- and five-fold coincidence counts. We implement and validate lossy boson sampling with one and two photons lost, and obtain sampling rates of 187 kHz, 13.6 kHz, and 0.78 kHz for five-, six- and seven-photon boson sampling with two photons lost, which are 9.4, 13.9, and 18.0 times faster than standard boson sampling, respectively. Our experiment shows an approach to significantly enhance the sampling rate of multiphoton boson sampling.
  • Understanding the surrounding environment of the vehicle is still one of the challenges for autonomous driving. This paper addresses 360-degree road scene semantic segmentation using surround view cameras, which are widely equipped in existing production cars. First, in order to address the large distortion problem in fisheye images, Restricted Deformable Convolution (RDC) is proposed for semantic segmentation, which can effectively model geometric transformations by learning the shapes of convolutional filters conditioned on the input feature map. Second, in order to obtain a large-scale training set of surround view images, a novel method called zoom augmentation is proposed to transform conventional images to fisheye images. Finally, an RDC-based semantic segmentation model is built. The model is trained for real-world surround view images through a multi-task learning architecture by combining real-world images with transformed images. Experiments demonstrate the effectiveness of the RDC in handling images with large distortions, and the proposed approach shows good performance using surround view cameras with the help of the transformed images.
  • A thermal ghost imaging scheme between two distant parties is proposed and experimentally demonstrated over long-distance optical fibers. In the scheme, the weak thermal light is split into two paths. Photons in one path are spatially diffused according to their frequencies by a spatial dispersion component, then illuminate the object and record its spatial transmission information. Photons in the other path are temporally diffused by a temporal dispersion component. Through coincidence measurement between photons of the two paths, the object can be imaged by ghost imaging, based on the frequency correlation between photons in the two paths. In the experiment, the weak thermal light source is prepared by spontaneous four-wave mixing in a silicon waveguide. The temporal dispersion is introduced by 50 km of single-mode fiber, which can also be regarded as a fiber link. Experimental results show that this scheme can be realized over long-distance optical fibers.
  • Superconducting nanowire single photon detectors (SNSPDs) have advanced various frontier scientific and technological fields such as quantum key distribution and deep space communications. However, limited by available cooling technology, all past experimental demonstrations have had ground-based applications. In this work we demonstrate an SNSPD system using a hybrid cryocooler compatible with space applications. With a minimum operational temperature of 2.8 K, this SNSPD system presents a maximum system detection efficiency of over 50% and a timing jitter of 48 ps, which paves the way for various space applications.
  • Measuring the performance of solar energy and heat transfer systems requires a lot of time, economic cost and manpower. Meanwhile, directly predicting their performance is challenging due to their complicated internal structures. Fortunately, a knowledge-based machine learning method can provide a promising prediction and optimization strategy for the performance of energy systems. In this Chapter, the authors will show how they utilize machine learning models trained on a large experimental database to perform precise prediction and optimization on a solar water heater (SWH) system. A new energy system optimization strategy based on a high-throughput screening (HTS) process is proposed. This Chapter consists of: i) comparative studies on a variety of machine learning models (artificial neural networks (ANNs), support vector machine (SVM) and extreme learning machine (ELM)) to predict the performance of SWHs; ii) development of ANN-based software to assist quick prediction; and iii) introduction of a computational HTS method to design a high-performance SWH system.
  • Quantum mechanics provides means of generating genuine randomness that is impossible with deterministic classical processes. Remarkably, the unpredictability of randomness can be certified in a self-testing manner that is independent of implementation devices. Here, we present an experimental demonstration of self-testing quantum random number generation based on a detection-loophole-free Bell test with entangled photons. In the randomness analysis, without the assumption of independent identical distribution, we consider the worst-case scenario in which the adversary launches the most powerful attacks against the quantum adversary. After considering statistical fluctuations and applying an 80 Gb $\times$ 45.6 Mb Toeplitz matrix hashing, we achieve a final random bit rate of 114 bits/s, with a failure probability less than $10^{-5}$. Such self-testing random number generators mark a critical step towards realistic applications in cryptography and fundamental physics tests.
  • Covert communication offers a method to transmit messages in such a way that it is not possible to detect that the communication is happening at all. In this work, we report an experimental demonstration of covert communication that is provably secure against unbounded quantum adversaries. The covert communication is carried out over 10 km of optical fiber, addressing the challenges associated with transmission over metropolitan distances. We deploy the protocol in a dense wavelength-division multiplexing infrastructure, where our system has to coexist with a co-propagating C-band classical channel. The noise from the classical channel allows us to perform covert communication in a neighbouring channel. We perform an optimization of all protocol parameters and report the transmission of three different messages with varying levels of security. Our results showcase the feasibility of secure covert communication in a practical setting, with several possible future improvements from both theory and experiment.
  • A quantum money scheme enables a trusted bank to provide untrusted users with verifiable quantum banknotes that cannot be forged. In this work, we report an experimental demonstration of the preparation and verification of unforgeable quantum banknotes. We employ a security analysis that takes experimental imperfections fully into account. We measure a total of $3.6\times 10^6$ states in one verification round, limiting the forging probability to $10^{-7}$ based on the security analysis. Our results demonstrate the feasibility of preparing and verifying quantum banknotes using currently available experimental techniques.
  • Although deep learning models are highly effective for various learning tasks, their high computational costs prohibit deployment in scenarios where either memory or computational resources are limited. In this paper, we focus on compressing and accelerating deep models with network weights represented by very small numbers of bits, referred to as extremely low bit neural networks. We model this problem as a discretely constrained optimization problem. Borrowing the idea from the Alternating Direction Method of Multipliers (ADMM), we decouple the continuous parameters from the discrete constraints of the network, and cast the original hard problem into several subproblems. We propose to solve these subproblems using extragradient and iterative quantization algorithms that lead to considerably faster convergence compared to conventional optimization methods. Extensive experiments on image recognition and object detection verify that the proposed algorithm is more effective than state-of-the-art approaches when it comes to extremely low bit neural networks.
  • Supervised speech separation uses supervised learning algorithms to learn a mapping from an input noisy signal to an output target. With the fast development of deep learning, supervised separation has become the most important direction in the speech separation area in recent years. For a supervised algorithm, the training target has a significant impact on performance. The ideal ratio mask is a commonly used training target, which can improve the speech intelligibility and quality of the separated speech. However, it does not take into account the correlation between noise and clean speech. In this paper, we use the optimal ratio mask as the training target of the deep neural network (DNN) for speech separation. The experiments are carried out under various noise environments and signal-to-noise ratio (SNR) conditions. The results show that the optimal ratio mask outperforms other training targets in general.
  • We present a minimalistic but effective neural network that computes dense facial correspondences in highly unconstrained RGB images. Our network learns a per-pixel flow and a matchability mask between 2D input photographs of a person and the projection of a textured 3D face model. To train such a network, we generate a massive dataset of synthetic faces with dense labels using renderings of a morphable face model with variations in pose, expressions, lighting, and occlusions. We found that a training refinement using real photographs is required to drastically improve the ability to handle real images. When combined with a facial detection and 3D face fitting step, we show that our approach outperforms the state-of-the-art face alignment methods in terms of accuracy and speed. By directly estimating dense correspondences, we do not rely on the full visibility of sparse facial landmarks and are not limited to the model space of regression-based approaches. We also assess our method on video frames and demonstrate successful per-frame processing under extreme pose variations, occlusions, and lighting conditions. Compared to existing 3D facial tracking techniques, our fitting does not rely on previous frames or frontal facial initialization and is robust to imperfect face detections.
  • Both reverberation and additive noise degrade speech quality and intelligibility. The weighted prediction error (WPE) method performs well on dereverberation but has limitations. First, WPE does not consider the influence of additive noise, which degrades the performance of dereverberation. Second, it relies on a time-consuming iterative process, and there is no guarantee of, or widely accepted criterion for, its convergence. In this paper, we integrate a deep neural network (DNN) into WPE for dereverberation and denoising. The DNN is used to suppress the background noise to meet the noise-free assumption of WPE. Meanwhile, the DNN is applied to directly predict the spectral variance of the target speech so that WPE works without iteration. The experimental results show that the proposed method significantly improves speech quality and runs fast.
  • The interior penalty methods using $C^0$ Lagrange elements ($C^0$IPG) developed in the last decade for fourth order problems are an interesting topic in academia at present. In this paper, we discuss an adaptive $C^0$IPG method for the Helmholtz transmission eigenvalue problem. We give a posteriori error indicators for the primal and dual eigenfunctions, and prove their reliability and efficiency. We also give an a posteriori error indicator for eigenvalues and design a $C^0$IPG adaptive algorithm. Numerical experiments show that this algorithm is efficient and attains the optimal convergence rate.
  • Currently, deep neural networks are deployed on low-power embedded devices by first training a full-precision model using powerful computing hardware, and then deriving a corresponding low-precision model for efficient inference on such systems. However, training models directly with coarsely quantized weights is a key step towards learning on embedded platforms that have limited computing resources, memory capacity, and power consumption. Numerous recent publications have studied methods for training quantized networks, but these studies have mostly been empirical. In this work, we investigate training methods for quantized neural networks from a theoretical viewpoint. We first explore accuracy guarantees for training methods under convexity assumptions. We then look at the behavior of algorithms for non-convex problems, and we show that training algorithms that exploit high-precision representations have an important annealing property that purely quantized training methods lack, which explains many of the observed empirical differences between these types of algorithms.
  • In this paper we give a proof of Enomoto's conjecture for graphs of sufficiently large order. Enomoto's conjecture states that, if $G$ is a graph of order $n$ with minimum degree $\delta(G)\geq \frac{n}{2}+1$, then for any pair of vertices $x$, $y$ in $G$, there is a Hamiltonian cycle $C$ of $G$ such that $d_C(x,y)=\lfloor \frac{n}{2}\rfloor$. The main tools of our proof are the Regularity Lemma of Szemer\'edi and the Blow-up Lemma of Koml\'os et al.
  • Machine comprehension (MC) style question answering is a representative problem in natural language processing. Previous methods rarely spend time on improving the encoding layer, especially the embedding of the syntactic information and named entities of the words, which are crucial to the quality of encoding. Moreover, existing attention methods either represent each query word as a vector or use a single vector to represent the whole query sentence; neither can properly weight the key words in the query sentence. In this paper, we introduce a novel neural network architecture called Multi-layer Embedding with Memory Network (MEMEN) for the machine reading task. In the encoding layer, we apply the classic skip-gram model to the syntactic and semantic information of the words to train a new kind of embedding layer. We also propose a memory network of full-orientation matching of the query and passage to catch more pivotal information. Experiments show that our model has competitive results, in terms of both precision and efficiency, on the Stanford Question Answering Dataset (SQuAD) among all published results and achieves state-of-the-art results on the TriviaQA dataset.
  • The "digital Michelangelo project" was a seminal computer vision project in the early 2000s that pushed the capabilities of acquisition systems and involved multiple people from diverse fields, many of whom are now leaders in industry and academia. Reviewing this project with modern eyes provides us with the opportunity to reflect on several issues, relevant now as then to the field of computer vision and research in general, that go beyond the technical aspects of the work. This article was written in the context of a reading group competition at the week-long International Computer Vision Summer School 2017 (ICVSS) in Sicily, Italy. To deepen the participants' understanding of computer vision and to foster a sense of community, various reading groups were tasked to highlight important lessons which may be learned from provided literature, going beyond the contents of the paper. This report is the winning entry of this guided discourse (Fig. 1). The authors closely examined the origins, fruits and, most importantly, lessons about research in general which may be distilled from the "digital Michelangelo project". Discussions leading to this report were held within the group as well as with Hao Li, the group mentor.
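
The Toeplitz matrix hashing step named in the quantum random number generation abstract above can be illustrated with a small sketch. This is a minimal, unoptimized illustration and not the authors' implementation: the function name `toeplitz_hash`, the bit-list representation, and the diagonal indexing convention are assumptions made here. A matrix of the size quoted in that abstract would need a far more efficient implementation than this double loop.

```python
import random


def toeplitz_hash(raw_bits, seed_bits, out_len):
    """Multiply a binary Toeplitz matrix by raw_bits over GF(2).

    The out_len x n matrix is defined by n + out_len - 1 seed bits.
    Assumed convention: T[i][j] = seed_bits[i - j + n - 1], so every
    diagonal of the matrix holds a single constant seed bit.
    """
    n = len(raw_bits)
    if len(seed_bits) != n + out_len - 1:
        raise ValueError("seed must contain n + out_len - 1 bits")
    out = []
    for i in range(out_len):
        acc = 0
        for j in range(n):
            # AND is multiplication and XOR is addition in GF(2).
            acc ^= seed_bits[i - j + n - 1] & raw_bits[j]
        out.append(acc)
    return out


if __name__ == "__main__":
    rng = random.Random(0)  # illustrative only; not a secure seed source
    raw = [rng.getrandbits(1) for _ in range(16)]
    seed = [rng.getrandbits(1) for _ in range(16 + 4 - 1)]
    print(toeplitz_hash(raw, seed, 4))
```

Because the map is linear over GF(2), hashing the XOR of two inputs gives the XOR of their hashes, which is a convenient sanity check for any implementation.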
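
To make the contrast drawn in the speech separation abstract above concrete, the sketch below compares, for a single time-frequency bin, the ideal ratio mask with a ratio mask that includes the speech-noise cross term. The formulas are the commonly cited definitions and are assumptions here, since the abstract does not state them; the function names, the `beta` exponent, and the `eps` guard are hypothetical choices for illustration.

```python
def ideal_ratio_mask(s, n, beta=0.5, eps=1e-12):
    """Ratio of speech power to (speech + noise) power, raised to beta.

    s and n are the complex spectral values of clean speech and noise
    in one time-frequency bin. The cross term between them is ignored.
    """
    s, n = complex(s), complex(n)
    ps, pn = abs(s) ** 2, abs(n) ** 2
    return (ps / (ps + pn + eps)) ** beta


def optimal_ratio_mask(s, n, eps=1e-12):
    """Ratio mask that keeps the cross term Re(s * conj(n)).

    Assumed form: (|s|^2 + c) / (|s|^2 + |n|^2 + 2c), c = Re(s * conj(n)).
    The denominator equals |s + n|^2, the power of the noisy mixture.
    """
    s, n = complex(s), complex(n)
    ps, pn = abs(s) ** 2, abs(n) ** 2
    cross = (s * n.conjugate()).real
    return (ps + cross) / (ps + pn + 2.0 * cross + eps)


if __name__ == "__main__":
    s, n = 2 + 0j, 1 + 0j  # speech and noise in phase (fully correlated)
    print(ideal_ratio_mask(s, n, beta=1.0), optimal_ratio_mask(s, n))
```

When speech and noise are uncorrelated in a bin (cross term zero), the second mask reduces to the power-ratio form of the first with `beta=1`; the two differ only where the cross term is non-zero.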