• Grasping skill is a major ability that a wide number of real-life applications require for robotisation. State-of-the-art robotic grasping methods perform prediction of object grasp locations based on deep neural networks which require huge amount of labeled data for training and prove impracticable in robotics. In this paper, we propose to generate a large scale synthetic dataset with ground truth, which we refer to as the Jacquard grasping dataset. Specifically, the proposed Jacquard dataset builds on a subset of ShapeNet with its numerous shape objects and features at a scale of millions varied object grasping positions for a large diversity of more than 11k objects. Beyond the simulation of the grasping scene with the underlying object, the ground truth for a successful grasping position is also based on tentatives of a simulated grasping robot. We carried out experiments using an off-the-shelf CNN, with three different evaluation metrics, including real grasping robot trials. The results show that Jacquard enables much better generalization skills than a human labeled dataset thanks to its diversity of objects and grasping positions. For the purpose of reproducible research in robotics, we are releasing along with the Jacquard dataset a web interface for researchers to evaluate the successfulness of their grasping position detections using our dataset.
  • Deep CNN-based object detection systems have achieved remarkable success on several large-scale object detection benchmarks. However, training such detectors requires a large number of labeled bounding boxes, which are more difficult to obtain than image-level annotations. Previous work addresses this issue by transforming image-level classifiers into object detectors. This is done by modeling the differences between the two on categories with both image-level and bounding box annotations, and transferring this information to convert classifiers to detectors for categories without bounding box annotations. We improve this previous work by incorporating knowledge about object similarities from visual and semantic domains during the transfer process. The intuition behind our proposed method is that visually and semantically similar categories should exhibit more common transferable properties than dissimilar categories, e.g. a better detector would result by transforming the differences between a dog classifier and a dog detector onto the cat class, than would by transforming from the violin class. Experimental results on the challenging ILSVRC2013 detection dataset demonstrate that each of our proposed object similarity based knowledge transfer methods outperforms the baseline methods. We found strong evidence that visual similarity and semantic relatedness are complementary for the task, and when combined notably improve detection, achieving state-of-the-art detection performance in a semi-supervised setting.
  • Meaningful facial parts can convey key cues for both facial action unit detection and expression prediction. Textured 3D face scan can provide both detailed 3D geometric shape and 2D texture appearance cues of the face which are beneficial for Facial Expression Recognition (FER). However, accurate facial parts extraction as well as their fusion are challenging tasks. In this paper, a novel system for 3D FER is designed based on accurate facial parts extraction and deep feature fusion of facial parts. In particular, each textured 3D face scan is firstly represented as a 2D texture map and a depth map with one-to-one dense correspondence. Then, the facial parts of both texture map and depth map are extracted using a novel 4-stage process consists of facial landmark localization, facial rotation correction, facial resizing, facial parts bounding box extraction and post-processing procedures. Finally, deep fusion Convolutional Neural Networks (CNNs) features of all facial parts are learned from both texture maps and depth maps, respectively and nonlinear SVMs are used for expression prediction. Experiments are conducted on the BU-3DFE database, demonstrating the effectiveness of combing different facial parts, texture and depth cues and reporting the state-of-the-art results in comparison with all existing methods under the same setting.
  • In the era of Internet of Things (IoT) technologies the potential for privacy invasion is becoming a major concern especially in regards to healthcare data and Ambient Assisted Living (AAL) environments. Systems that offer AAL technologies make extensive use of personal data in order to provide services that are context-aware and personalized. This makes privacy preservation a very important issue especially since the users are not always aware of the privacy risks they could face. A lot of progress has been made in the deep learning field, however, there has been lack of research on privacy preservation of sensitive personal data with the use of deep learning. In this paper we focus on a Long Short Term Memory (LSTM) Encoder-Decoder, which is a principal component of deep learning, and propose a new encoding technique that allows the creation of different AAL data views, depending on the access level of the end user and the information they require access to. The efficiency and effectiveness of the proposed method are demonstrated with experiments on a simulated AAL dataset. Qualitatively, we show that the proposed model learns privacy operations such as disclosure, deletion and generalization and can perform encoding and decoding of the data with almost perfect recovery.
  • Domain adaptation (DA) is transfer learning which aims to learn an effective predictor on target data from source data despite data distribution mismatch between source and target. We present in this paper a novel unsupervised DA method for cross-domain visual recognition which simultaneously optimizes the three terms of a theoretically established error bound. Specifically, the proposed DA method iteratively searches a latent shared feature subspace where not only the divergence of data distributions between the source domain and the target domain is decreased as most state-of-the-art DA methods do, but also the inter-class distances are increased to facilitate discriminative learning. Moreover, the proposed DA method sparsely regresses class labels from the features achieved in the shared subspace while minimizing the prediction errors on the source data and ensuring label consistency between source and target. Data outliers are also accounted for to further avoid negative knowledge transfer. Comprehensive experiments and in-depth analysis verify the effectiveness of the proposed DA method which consistently outperforms the state-of-the-art DA methods on standard DA benchmarks, i.e., 12 cross-domain image classification tasks.
  • Correctly estimating the discrepancy between two data distributions has always been an important task in Machine Learning. Recently, Cuturi proposed the Sinkhorn distance which makes use of an approximate Optimal Transport cost between two distributions as a distance to describe distribution discrepancy. Although it has been successfully adopted in various machine learning applications (e.g. in Natural Language Processing and Computer Vision) since then, the Sinkhorn distance also suffers from two unnegligible limitations. The first one is that the Sinkhorn distance only gives an approximation of the real Wasserstein distance, the second one is the `divide by zero' problem which often occurs during matrix scaling when setting the entropy regularization coefficient to a small value. In this paper, we introduce a new Brenier approach for calculating a more accurate Wasserstein distance between two discrete distributions, this approach successfully avoids the two limitations shown above for Sinkhorn distance and gives an alternative way for estimating distribution discrepancy.
  • A number of pattern recognition tasks, \textit{e.g.}, face verification, can be boiled down to classification or clustering of unit length directional feature vectors whose distance can be simply computed by their angle. In this paper, we propose the von Mises-Fisher (vMF) mixture model as the theoretical foundation for an effective deep-learning of such directional features and derive a novel vMF Mixture Loss and its corresponding vMF deep features. The proposed vMF feature learning achieves the characteristics of discriminative learning, \textit{i.e.}, compacting the instances of the same class while increasing the distance of instances from different classes. Moreover, it subsumes a number of popular loss functions as well as an effective method in deep learning, namely normalization. We conduct extensive experiments on face verification using 4 different challenging face datasets, \textit{i.e.}, LFW, YouTube faces, CACD and IJB-A. Results show the effectiveness and excellent generalization ability of the proposed approach as it achieves state-of-the-art results on the LFW, YouTube faces and CACD datasets and competitive results on the IJB-A dataset.
  • Domain adaptation (DA) aims to generalize a learning model across training and testing data despite the mismatch of their data distributions. In light of a theoretical estimation of upper error bound, we argue in this paper that an effective DA method should 1) search a shared feature subspace where source and target data are not only aligned in terms of distributions as most state of the art DA methods do, but also discriminative in that instances of different classes are well separated; 2) account for the geometric structure of the underlying data manifold when inferring data labels on the target domain. In comparison with a baseline DA method which only cares about data distribution alignment between source and target, we derive three different DA models, namely CDDA, GA-DA, and DGA-DA, to highlight the contribution of Close yet Discriminative DA(CDDA) based on 1), Geometry Aware DA (GA-DA) based on 2), and finally Discriminative and Geometry Aware DA (DGA-DA) implementing jointly 1) and 2). Using both synthetic and real data, we show the effectiveness of the proposed approach which consistently outperforms state of the art DA methods over 36 image classification DA tasks through 6 popular benchmarks. We further carry out in-depth analysis of the proposed DA method in quantifying the contribution of each term of our DA model and provide insights into the proposed DA methods in visualizing both real and synthetic data.
  • Compact acceleration of a tightly collimated relativistic electron beam with high charge from a laser-plasma interaction has many unique applications. However, currently the well-known schemes, including laser wakefield acceleration from gases and vacuum laser acceleration from solids, often produce electron beams either with low charge or with large divergence angles. In this work, we report the generation of highly collimated electron beams with a divergence angle of a few degrees, quasi-monoenergetic spectra peaked at the MeV level, and extremely high charge ($\sim$100 nC) via a powerful sub-ps laser pulse interacting with a solid target in grazing incidence. Particle-in-cell simulations illustrate a new direct laser acceleration scenario, in which the self-filamentation is triggered in a large-scale near-critical-density plasma and electron bunches are accelerated periodically and collimated by the ultra-intense electromagnetic field. The energy density of such electron beams in high-Z materials reaches to $\sim10^{12} \mathrm{J/m^{3}}$, making it a promising tool to drive warm or even hot dense matter states.
  • 2D face analysis techniques, such as face landmarking, face recognition and face verification, are reasonably dependent on illumination conditions which are usually uncontrolled and unpredictable in the real world. An illumination robust preprocessing method thus remains a significant challenge in reliable face analysis. In this paper we propose a novel approach for improving lighting normalization through building the underlying reflectance model which characterizes interactions between skin surface, lighting source and camera sensor, and elaborates the formation of face color appearance. Specifically, the proposed illumination processing pipeline enables the generation of Chromaticity Intrinsic Image (CII) in a log chromaticity space which is robust to illumination variations. Moreover, as an advantage over most prevailing methods, a photo-realistic color face image is subsequently reconstructed which eliminates a wide variety of shadows whilst retaining the color information and identity details. Experimental results under different scenarios and using various face databases show the effectiveness of the proposed approach to deal with lighting variations, including both soft and hard shadows, in face recognition.
  • 3D face dense tracking aims to find dense inter-frame correspondences in a sequence of 3D face scans and constitutes a powerful tool for many face analysis tasks, e.g., 3D dynamic facial expression analysis. The majority of the existing methods just fit a 3D face surface or model to a 3D target surface without considering temporal information between frames. In this paper, we propose a novel method for densely tracking sequences of 3D face scans, which ex- tends the non-rigid ICP algorithm by adding a novel specific criterion for temporal information. A novel fitting framework is presented for automatically tracking a full sequence of 3D face scans. The results of experiments carried out on the BU4D-FE database are promising, showing that the proposed algorithm outperforms state-of-the-art algorithms for 3D face dense tracking.
  • Heterogeneous face recognition between color image and depth image is a much desired capacity for real world applications where shape information is looked upon as merely involved in gallery. In this paper, we propose a cross-modal deep learning method as an effective and efficient workaround for this challenge. Specifically, we begin with learning two convolutional neural networks (CNNs) to extract 2D and 2.5D face features individually. Once trained, they can serve as pre-trained models for another two-way CNN which explores the correlated part between color and depth for heterogeneous matching. Compared with most conventional cross-modal approaches, our method additionally conducts accurate depth image reconstruction from single color image with Conditional Generative Adversarial Nets (cGAN), and further enhances the recognition performance by fusing multi-modal matching results. Through both qualitative and quantitative experiments on benchmark FRGC 2D/3D face database, we demonstrate that the proposed pipeline outperforms state-of-the-art performance on heterogeneous face recognition and ensures a drastically efficient on-line stage.
  • Training a Deep Neural Network (DNN) from scratch requires a large amount of labeled data. For a classification task where only small amount of training data is available, a common solution is to perform fine-tuning on a DNN which is pre-trained with related source data. This consecutive training process is time consuming and does not consider explicitly the relatedness between different source and target tasks. In this paper, we propose a novel method to jointly fine-tune a Deep Neural Network with source data and target data. By adding an Optimal Transport loss (OT loss) between source and target classifier predictions as a constraint on the source classifier, the proposed Joint Transfer Learning Network (JTLN) can effectively learn useful knowledge for target classification from source data. Furthermore, by using different kind of metric as cost matrix for the OT loss, JTLN can incorporate different prior knowledge about the relatedness between target categories and source categories. We carried out experiments with JTLN based on Alexnet on image classification datasets and the results verify the effectiveness of the proposed JTLN in comparison with standard consecutive fine-tuning. This Joint Transfer Learning with OT loss is general and can also be applied to other kind of Neural Networks.
  • Domain adaptation (DA) is transfer learning which aims to leverage labeled data in a related source domain to achieve informed knowledge transfer and help the classification of unlabeled data in a target domain. In this paper, we propose a novel DA method, namely Robust Data Geometric Structure Aligned, Close yet Discriminative Domain Adaptation (RSA-CDDA), which brings closer, in a latent joint subspace, both source and target data distributions, and aligns inherent hidden source and target data geometric structures while performing discriminative DA in repulsing both interclass source and target data. The proposed method performs domain adaptation between source and target in solving a unified model, which incorporates data distribution constraints, in particular via a nonparametric distance, i.e., Maximum Mean Discrepancy (MMD), as well as constraints on inherent hidden data geometric structure segmentation and alignment between source and target, through low rank and sparse representation. RSA-CDDA achieves the search of a joint subspace in solving the proposed unified model through iterative optimization, alternating Rayleigh quotient algorithm and inexact augmented Lagrange multiplier algorithm. Extensive experiments carried out on standard DA benchmarks, i.e., 16 cross-domain image classification tasks, verify the effectiveness of the proposed method, which consistently outperforms the state-of-the-art methods.
  • Domain adaptation is transfer learning which aims to generalize a learning model across training and testing data with different distributions. Most previous research tackle this problem in seeking a shared feature representation between source and target domains while reducing the mismatch of their data distributions. In this paper, we propose a close yet discriminative domain adaptation method, namely CDDA, which generates a latent feature representation with two interesting properties. First, the discrepancy between the source and target domain, measured in terms of both marginal and conditional probability distribution via Maximum Mean Discrepancy is minimized so as to attract two domains close to each other. More importantly, we also design a repulsive force term, which maximizes the distances between each label dependent sub-domain to all others so as to drag different class dependent sub-domains far away from each other and thereby increase the discriminative power of the adapted domain. Moreover, given the fact that the underlying data manifold could have complex geometric structure, we further propose the constraints of label smoothness and geometric structure consistency for label propagation. Extensive experiments are conducted on 36 cross-domain image classification tasks over four public datasets. The comprehensive results show that the proposed method consistently outperforms the state-of-the-art methods with significant margins.
  • Face recognition (FR) methods report significant performance by adopting the convolutional neural network (CNN) based learning methods. Although CNNs are mostly trained by optimizing the softmax loss, the recent trend shows an improvement of accuracy with different strategies, such as task-specific CNN learning with different loss functions, fine-tuning on target dataset, metric learning and concatenating features from multiple CNNs. Incorporating these tasks obviously requires additional efforts. Moreover, it demotivates the discovery of efficient CNN models for FR which are trained only with identity labels. We focus on this fact and propose an easily trainable and single CNN based FR method. Our CNN model exploits the residual learning framework. Additionally, it uses normalized features to compute the loss. Our extensive experiments show excellent generalization on different datasets. We obtain very competitive and state-of-the-art results on the LFW, IJB-A, YouTube faces and CACD datasets.
  • Nuclear fusion reactions are the most important processes in nature to power stars and produce new elements, and lie at the center of the understanding of nucleosynthesis in the universe. It is critically important to study the reactions in full plasma environments that are close to true astrophysical conditions. By using laser-driven counter-streaming collisionless plasmas, we studied the fusion D$+$D$\rightarrow n +^3$He in a Gamow-like window around 27 keV. The results show that astrophysical nuclear reaction yield can be modulated significantly by the self-generated electromagnetic fields and the collective motion of the plasma. This plasma-version mini-collider may provide a novel tool for studies of astrophysics-interested nuclear reactions in plasma with tunable energies in earth-based laboratories.
  • In this paper, we present a novel approach to automatic 3D Facial Expression Recognition (FER) based on deep representation of facial 3D geometric and 2D photometric attributes. A 3D face is firstly represented by its geometric and photometric attributes, including the geometry map, normal maps, normalized curvature map and texture map. These maps are then fed into a pre-trained deep convolutional neural network to generate the deep representation. Then the facial expression prediction is simplyachieved by training linear SVMs over the deep representation for different maps and fusing these SVM scores. The visualizations show that the deep representation provides a complete and highly discriminative coding scheme for 3D faces. Comprehensive experiments on the BU-3DFE database demonstrate that the proposed deep representation can outperform the widely used hand-crafted descriptors (i.e., LBP, SIFT, HOG, Gabor) and the state-of-art approaches under the same experimental protocols.
  • With ever increasing computing power and data storage capacity, the potential for large digital video libraries is growing rapidly.However, the massive use of video for the moment is limited by its opaque characteristics. Indeed, a user who has to handle and retrieve sequentially needs too much time in order to find out segments of interest within a video. Therefore, providing an environment both convenient and efficient for video storing and retrieval, especially for content-based searching as this exists in traditional textbased database systems, has been the focus of recent and important efforts of a large research community In this paper, we propose a new automatic video scene segmentation method that explores two main video features; these are spatial-temporal relationship and rhythm of shots. The experimental evidence we obtained from a 80 minutevideo showed that our prototype provides very high accuracy for video segmentation.
  • We report a laser wakefield acceleration of electron beams up to 130 MeV from laser-driven 4-mm long nitrogen gas jet. By using a moderate laser intensity (3.5*10^18 W.cm^(-2)) and relatively low plasma densities (0.8*10^18 cm^(-3) to 2.7*10^18 cm^(-3)) we have achieved a stable regime for laser propagation and consequently a stable generation of electron beams. We experimentally studied the dependence of the drive laser energy on the laser-plasma channel and electron beam parameters. The quality of the generated electron beams is discussed within the framework of the ionization-induced injection mechanism.
  • X-ray devices are far superior to optical ones for providing nanometre spatial and attosecond temporal resolutions. Such resolution is indispensable in biology, medicine, physics, material sciences, and their applications. A bright ultrafast coherent X-ray source is highly desirable, for example, for the diffractive imaging of individual large molecules, viruses, or cells. Here we demonstrate experimentally a new compact X-ray source involving high-order harmonics produced by a relativistic-irradiance femtosecond laser in a gas target. In our first implementation using a 9 Terawatt laser, coherent soft X-rays are emitted with a comb-like spectrum reaching the 'water window' range. The generation mechanism is robust being based on phenomena inherent in relativistic laser plasmas: self-focusing, nonlinear wave generation accompanied by electron density singularities, and collective radiation by a compact electric charge. The formation of singularities (electron density spikes) is described by the elegant mathematical catastrophe theory, which explains sudden changes in various complex systems, from physics to social sciences. The new X-ray source has advantageous scalings, as the maximum harmonic order is proportional to the cube of the laser amplitude enhanced by relativistic self-focusing in plasma. This allows straightforward extension of the coherent X-ray generation to the keV and tens of keV spectral regions. The implemented X-ray source is remarkably easily accessible: the requirements for the laser can be met in a university-scale laboratory, the gas jet is a replenishable debris-free target, and the harmonics emanate directly from the gas jet without additional devices. Our results open the way to a compact coherent ultrashort brilliant X-ray source with single shot and high-repetition rate capabilities, suitable for numerous applications and diagnostics in many research fields.