• Sound event detection (SED) aims to detect what sound events happen in an audio clip and when they occur. Segmenting sound events in the time-frequency (T-F) domain is called T-F segmentation. Many supervised SED algorithms rely on strongly labelled data, which provide the onset and offset times of sound events. However, many audio tagging datasets are weakly labelled; that is, only the presence or absence of sound events is known, without their onset and offset times. In this paper, we propose an SED and T-F segmentation framework trained with weakly labelled data. In the training stage, we propose a segmentation mapping applied to a T-F representation of an audio clip to obtain T-F segmentation masks of sound events. We then apply a classification mapping to each T-F segmentation mask to estimate the presence probability of each sound event. The segmentation mapping and the classification mapping are trained jointly. For T-F segmentation, T-F segmentation masks are obtained by presenting a T-F representation of an audio clip to the trained segmentation mapping. For SED, predicted onset and offset times are obtained from the T-F segmentation masks. We propose to model the segmentation mapping using a convolutional neural network and the classification mapping using global weighted rank pooling (GWRP). As a byproduct, separated waveforms of sound events can be obtained from their corresponding T-F segmentation masks. Experiments on the remixed DCASE 2013 dataset show that the proposed method obtains an area under the curve (AUC) score of 0.948 in audio tagging and 0.893 in sound event detection, outperforming a deep neural network baseline, which scores 0.719 and 0.616, respectively.
  • Dense video captioning is a newly emerging task that aims to both localize and describe all events in a video. We identify and tackle two challenges in this task: (1) how to utilize both past and future context for accurate event proposal prediction, and (2) how to construct informative input to the decoder for generating natural event descriptions. First, previous works predominantly generate temporal event proposals in the forward direction, which neglects future video context. We propose a bidirectional proposal method that effectively exploits both past and future context to make proposal predictions. Second, in previous works, different events ending at (nearly) the same time are indistinguishable, resulting in identical captions. We solve this problem by representing each event with an attentive fusion of hidden states from the proposal module and video contents (e.g., C3D features). We further propose a novel context gating mechanism to dynamically balance the contributions of the current event and its surrounding context. We empirically show that our attentively fused event representation is superior to the proposal hidden states or video contents alone. By coupling the proposal and captioning modules in one unified framework, our model outperforms state-of-the-art methods on the ActivityNet Captions dataset with a relative gain of over 100% (the Meteor score increases from 4.82 to 9.65).
  • The controllability of ultracold atomic gases has reached an unprecedented level, allowing the experimental realization of the long-sought Thouless pump, which can be interpreted as a dynamical quantum Hall effect. On the other hand, Weyl semimetals and Weyl nodal line semimetals, whose band structures have touching points and touching rings, respectively, have sparked tremendous interest in various fields over the past few years. Here, we show that dynamical Weyl points and dynamical 4D Weyl nodal rings, which are protected by the first Chern number on a parameter surface formed by quasi-momentum and time, emerge in two-dimensional and three-dimensional systems, respectively. We find that topological pumping occurs in these systems, but the number of pumped particles is not quantized and can be continuously tuned over a wide range by controlling experimental parameters. We also propose an experimental scheme to realize the dynamical Weyl points and 4D Weyl nodal rings and to observe the corresponding topological pumping in cold atomic gases.
  • Source separation (SS) aims to separate individual sources from an audio recording. Sound event detection (SED) aims to detect sound events in an audio recording. We propose a joint separation-classification (JSC) model trained only on weakly labelled audio data; that is, only the tags of an audio recording are known, but the times of the events are unknown. First, we propose a separation mapping from the time-frequency (T-F) representation of an audio recording to the T-F segmentation masks of the audio events. Second, a classification mapping is built from each T-F segmentation mask to the presence probability of each audio event. In the source separation stage, the sources of audio events and the times of sound events can be obtained from the T-F segmentation masks. The proposed method achieves an equal error rate (EER) of 0.14 in SED, outperforming a deep neural network baseline with an EER of 0.29. A source separation signal-to-distortion ratio (SDR) of 8.08 dB is obtained by using global weighted rank pooling (GWRP) as the probability mapping, outperforming the global max pooling (GMP) based probability mapping, which gives an SDR of 0.03 dB. The source code of our work is publicly available.
  • This paper investigates the classification of the Audio Set dataset. Audio Set is a large-scale weakly labelled dataset of sound clips. Previous work used multiple instance learning (MIL) to classify weakly labelled data. In MIL, a bag consists of several instances, and a bag is labelled positive if at least one instance in the bag is positive. A bag is labelled negative if all the instances in the bag are negative. We propose an attention model to tackle the MIL problem and explain this attention model from a novel probabilistic perspective. We define a probability space on each bag, where each instance in the bag has a trainable probability measure for each class. The classification of a bag is then the expectation of the classification output of the instances in the bag with respect to the learned probability measure. Experimental results show that our proposed attention model, implemented as a fully connected deep neural network, obtains an mAP of 0.327 on the Audio Set dataset, outperforming Google's baseline of 0.314 and a recurrent neural network at 0.325.
  • A single atomic slice of α-tin, stanene, has been predicted to host the quantum spin Hall effect at room temperature, offering an ideal platform to study low-dimensional and topological physics. While recent research has focused intensively on monolayer stanene, the quantum size effect in few-layer stanene could profoundly change material properties but remains unexplored. By exploring the layer degree of freedom, we unexpectedly discover superconductivity in few-layer stanene, down to a bilayer, grown on PbTe, whereas bulk α-tin is not superconducting. Through substrate engineering, we further realize a transition from a single-band to a two-band superconductor with a doubling of the transition temperature. In situ angle-resolved photoemission spectroscopy (ARPES), together with first-principles calculations, elucidates the corresponding band structure. Interestingly, the theory also indicates the existence of a topologically nontrivial band. Our experimental findings open up novel strategies for constructing two-dimensional topological superconductors.
  • A superconducting quantum interference device (SQUID) usually consists of two Josephson junctions, and the interference between them is modulated by a magnetic flux. In this work, we propose an electrically modulated SQUID consisting of a single Josephson junction coupled through a time-reversal-symmetry-breaking Weyl semimetal thin film. For a low Fermi energy, the Josephson current is mediated only by Fermi arc surface states and has an arbitrary ground-state phase difference φ₀, which is directly proportional to the product of the transverse electric field and the cross-sectional area of the junction. For a suitable Fermi energy, the bulk states make comparable contributions to the Josephson current, with the current-phase relation of a 0-junction. The interference between the surface channel and the bulk channel results in an electrically modulated SQUID with a single Josephson junction, which provides an experimental proposal to identify magnetic Weyl semimetals and may have potential applications in superconducting quantum computation.
  • The search for exotic topological effects of phonons has attracted enormous interest for both fundamental science and practical applications. By studying phonons in a Kekulé lattice, we find a new type of pseudospin characterized by quantized Berry phases and pseudoangular momenta, which gives rise to various novel topological effects, including topologically protected pseudospin-polarized interface states and a phonon pseudospin Hall effect. We further demonstrate a pseudospin-contrasting optical selection rule and a pseudospin Zeeman effect, giving a complete generation-manipulation-detection paradigm for the phonon pseudospin. The pseudospin- and topology-related physics revealed for phonons is general and applicable to electrons, photons and other particles.
  • In this paper, we present a gated convolutional neural network and a temporal attention-based localization method for audio classification, which won 1st place in the large-scale weakly supervised sound event detection task of the Detection and Classification of Acoustic Scenes and Events (DCASE) 2017 challenge. The audio clips in this task, which are extracted from YouTube videos, are manually labeled with one or a few audio tags but without timestamps of the audio events; such data are referred to as weakly labeled data. Two subtasks are defined in this challenge: audio tagging and sound event detection using this weakly labeled data. We propose a convolutional recurrent neural network (CRNN) with learnable gated linear unit (GLU) non-linearities applied to the log Mel spectrogram. In addition, we propose a temporal attention method along the frames to predict the locations of each audio event in a chunk from the weakly labeled data. As a team, we ranked 1st and 2nd in these two subtasks of the DCASE 2017 challenge, with an F-value of 55.6% and an error rate of 0.73, respectively.
  • The inverse mapping of the generator of generative adversarial nets (GANs) has great potential value. Hence, some works have constructed the inverse function of the generator by direct learning or adversarial learning. While the results are encouraging, the problem is highly challenging, and existing ways of training inverse models of GANs have many disadvantages, such as being hard to train or performing poorly. For these reasons, we propose a new approach that trains an inverse generator (IG) model by using it as the encoder, and the pre-trained generator (G) as the decoder, of an AutoEncoder network. In the proposed model, the difference between the input and the output of the AutoEncoder, both of which are images generated by the pre-trained GAN generator, is directly minimized. This optimization method overcomes the difficulty of training an inverse model of a function that is not one-to-one. We also apply the inverse model of GAN generators to image searching and translation. The experimental results show that the proposed approach works better than traditional approaches in image searching.
  • In this technical report, we present several methods for Task 4 of the Detection and Classification of Acoustic Scenes and Events 2017 (DCASE2017) challenge. This task evaluates systems for the large-scale detection of sound events using weakly labeled training data. The data are YouTube video excerpts focusing on transportation and warning sounds because of their industrial applications. There are two subtasks: audio tagging and sound event detection from weakly labeled data. A convolutional neural network (CNN) and a gated recurrent unit (GRU) based recurrent neural network (RNN) are adopted as our basic framework. We propose a learnable gating activation function for selecting informative local features. An attention-based scheme is used to localize specific events in a weakly supervised mode. A new batch-level balancing strategy is also proposed to tackle the data imbalance problem. Fusing the posteriors of different systems is found to improve performance. In summary, on the development set we obtain an F-value of 61% for the audio tagging subtask and an error rate (ER) of 0.73 for the sound event detection subtask, whereas the official multilayer perceptron (MLP) based baseline obtained only a 13.1% F-value for audio tagging and an ER of 1.02 for sound event detection.
  • Unconventional fermions with high degeneracies in three dimensions, beyond Weyl and Dirac fermions, have sparked tremendous interest in condensed matter physics. Here, we study quantum Hall effects (QHEs) in a two-dimensional (2D) unconventional fermion system with a pair of gapped spin-1 fermions. We find that the infinitely many zero-energy Landau levels (LLs) of the gapless case develop into a series of bands, leading to a novel QHE phenomenon: as the Fermi surface moves toward zero energy, the Hall conductance first decreases (or increases) to zero and then revives as an infinite ladder of fine steps, and when the Fermi surface moves across zero energy, the Hall conductance suddenly reverses sign owing to a Van Hove singularity. We further investigate these peculiar QHEs in a dice model with a pair of spin-1 fermions, and the results agree well with those of the continuum model.
  • The quantum anomalous Hall effect, an exotic topological state first predicted theoretically by Haldane and recently observed experimentally, has attracted enormous interest for low-power-consumption electronics. In this work, we derive a Schrödinger-like equation for phonons, in which topology-related quantities, time-reversal symmetry and its breaking can be introduced naturally, as for electrons. Furthermore, we propose a phononic analog of the Haldane model, which gives rise to novel quantum (anomalous) Hall-like phonon states characterized by one-way gapless edge modes that are immune to scattering. These topologically nontrivial phonon states are useful not only for conducting phonons without dissipation but also for designing highly efficient phononic devices, such as an ideal phonon diode, which could find important applications in future phononics.
  • Phonons, the collective excitations of lattice vibrations, are the main heat carriers in solids. Tremendous effort has been devoted to investigating phonons and related properties, giving rise to the intriguing field of phononics, which is of great importance to many practical applications, including heat dissipation, thermal barrier coatings, thermoelectrics and thermal control devices. Meanwhile, research on topology-related physics, recognized by the 2016 Nobel Prize in Physics, has led to the discovery of various exotic quantum states of matter, including the quantum (anomalous/spin) Hall [Q(A/S)H] effects, topological insulators/semimetals and topological superconductors. An emerging research field brings topological concepts into phononics, creating a new paradigm: "topological phononics". In this Perspective, we briefly introduce this emerging field and discuss the use of novel quantum degrees of freedom, such as the Berry phase and topology, to manipulate phonons in unprecedented ways.
  • Existing research on block-diagonal representation mainly focuses on imposing block-diagonal regularization on training data, while little attention has been paid to concurrently learning block-diagonal representations of both training and test data. In this paper, we propose a discriminative block-diagonal low-rank representation (BDLRR) method for recognition. In particular, BDLRR is formulated as a joint optimization problem that shrinks the unfavorable off-block-diagonal elements of the representation and strengthens the compact block-diagonal representation under the semi-supervised framework of low-rank representation. To this end, we first impose penalty constraints on the negative representation to eliminate the correlation between different classes, so that the incoherence criterion of the extra-class representation is boosted. Moreover, a constructed subspace model is developed to enhance the self-expressive power of training samples and further build a representation bridge between the training and test samples, so that the coherence of the learned intra-class representation is consistently heightened. Finally, the resulting optimization problem is solved with an alternating optimization strategy, and a simple recognition algorithm on the learned representation is used for final prediction. Extensive experimental results demonstrate that the proposed method achieves superb recognition results on four face image datasets, three character datasets, and the fifteen scene categories dataset. It not only shows superior potential for image recognition but also outperforms state-of-the-art methods.
  • In domain adaptation, maximum mean discrepancy (MMD) has been widely adopted as a discrepancy metric between the distributions of source and target domains. However, existing MMD-based domain adaptation methods generally ignore changes in class prior distributions, i.e., class weight bias across domains. This problem is ubiquitous in domain adaptation yet remains open; it can be caused by changes in sample selection criteria and application scenarios. We show that MMD cannot account for class weight bias, which degrades domain adaptation performance. To address this issue, we propose a weighted MMD model. Specifically, we introduce class-specific auxiliary weights into the original MMD to exploit the class prior probabilities of the source and target domains; the challenge is that class labels in the target domain are unavailable. To account for this, our weighted MMD model introduces an auxiliary weight for each class in the source domain, and we propose a classification EM algorithm that alternates between assigning pseudo-labels, estimating auxiliary weights and updating model parameters. Extensive experiments demonstrate the superiority of our weighted MMD over conventional MMD for domain adaptation.
  • We propose a multi-objective framework to learn both secondary targets, which are not directly related to the intended task of speech enhancement (SE), and the primary target, the clean log-power spectra (LPS) features, which are used directly to construct the enhanced speech signals. In deep neural network (DNN) based SE, we introduce an auxiliary structure to learn secondary continuous features, such as mel-frequency cepstral coefficients (MFCCs), and categorical information, such as the ideal binary mask (IBM), and integrate it into the original DNN architecture for joint optimization of all the parameters. This joint estimation scheme imposes additional constraints not available in the direct prediction of LPS and potentially improves the learning of the primary target. Furthermore, the learned secondary information can be used as a byproduct for other purposes, e.g., the IBM-based post-processing in this work. A series of experiments shows that joint LPS and MFCC learning improves SE performance, and IBM-based post-processing further enhances the listening quality of the reconstructed speech.
  • Audio tagging aims to perform multi-label classification on audio chunks, and it is a newly proposed task in the Detection and Classification of Acoustic Scenes and Events 2016 (DCASE 2016) challenge. This task encourages research efforts to better analyze and understand the content of the huge amounts of audio data on the web. The difficulty in audio tagging is that only chunk-level labels are available, without frame-level labels. This paper presents a weakly supervised method that not only predicts the tags but also indicates the temporal locations of the acoustic events that occur. The attention scheme is found to be effective in identifying the important frames while ignoring the unrelated frames. The proposed framework is a deep convolutional recurrent model with two auxiliary modules: an attention module and a localization module. The proposed algorithm was evaluated on Task 4 of the DCASE 2016 challenge. State-of-the-art performance was achieved on the evaluation set, with the equal error rate (EER) reduced from 0.13 to 0.11 compared with the convolutional recurrent baseline system.
  • Environmental audio tagging is a newly proposed task to predict the presence or absence of a specific audio event in a chunk. Deep neural network (DNN) based methods have been successfully adopted for predicting audio tags in domestic audio scenes. In this paper, we propose to use a convolutional neural network (CNN) to extract robust features from mel-filter banks (MFBs), spectrograms or even raw waveforms for audio tagging. Gated recurrent unit (GRU) based recurrent neural networks (RNNs) are then cascaded to model the long-term temporal structure of the audio signal. To complement the input information, an auxiliary CNN is designed to learn from the spatial features of stereo recordings. We evaluate our proposed methods on Task 4 (audio tagging) of the Detection and Classification of Acoustic Scenes and Events 2016 (DCASE 2016) challenge. Compared with our recent DNN-based method, the proposed structure reduces the equal error rate (EER) from 0.13 to 0.11 on the development set. The spatial features further reduce the EER to 0.10. End-to-end learning on raw waveforms also achieves comparable performance. Finally, on the evaluation set, we achieve state-of-the-art performance with an EER of 0.12, while the best existing system has an EER of 0.15.
  • Deterministic all-optical control of magnetization without an applied magnetic field has been reported for different materials, such as ferrimagnetic and ferromagnetic thin films and granular recording media. These findings have challenged the understanding of all-optical helicity-dependent switching of magnetization and opened many potential applications for future magnetic information, memory and storage technologies. Here we demonstrate optical control of an antiferromagnetic layer through the exchange bias interaction, using the helicity of a femtosecond pulsed laser, in IrMn/[Co/Pt]×N antiferromagnetic/ferromagnetic heterostructures. We show controlled switching of the sign of the exchange bias field without any applied field, only by changing the helicity of the light, and quantify the influence of the laser fluence and the number of light pulses on the exchange bias control. We also present the combined effect of laser pulses and an applied magnetic field. This study opens up applications in spintronic devices, where the exchange bias phenomenon is routinely used to fix the magnetization orientation of a magnetic layer in one direction.
  • The recent experimental realization of synthetic spin-orbit coupling (SOC) opens a new avenue for exploring novel quantum states with ultracold atoms. However, in experiments generating two-dimensional SOC (e.g., of the Rashba type), a perpendicular Zeeman field, which opens a band gap at the Dirac point and induces many topological phenomena, is still lacking. Here we theoretically propose and experimentally realize a simple scheme for simultaneously generating two-dimensional SOC and a perpendicular Zeeman field in ultracold Fermi gases by tuning the polarization of three Raman lasers that couple three hyperfine ground states of the atoms. The resulting band gap opening at the Dirac point is probed using spin-injection radio-frequency spectroscopy. Our observation may pave the way for exploring topological transport and topological superfluids with exotic Majorana and Weyl fermion excitations in ultracold atoms.
  • Environmental audio tagging aims to predict only the presence or absence of certain acoustic events in the acoustic scene of interest. In this paper, we make contributions to audio tagging in two areas: acoustic modeling and feature learning. We propose to use a shrinking deep neural network (DNN) framework incorporating unsupervised feature learning to handle the multi-label classification task. For acoustic modeling, a large set of contextual frames of the chunk is fed into the DNN to perform multi-label classification of the expected tags, since only chunk-level (or utterance-level) labels, rather than frame-level labels, are available. Dropout and background-noise-aware training are also adopted to improve the generalization capability of the DNNs. For unsupervised feature learning, we propose to use a symmetric or asymmetric deep denoising auto-encoder (sDAE or aDAE) to generate new data-driven features from the Mel-filter bank (MFB) features. The new features, which are smoothed against background noise and more compact with contextual information, further improve the performance of the DNN baseline. Compared with the standard Gaussian mixture model (GMM) baseline of the DCASE 2016 audio tagging challenge, our proposed method obtains a significant equal error rate (EER) reduction from 0.21 to 0.13 on the development set. The proposed aDAE system achieves a relative EER reduction of 6.7% compared with the strong DNN baseline on the development set. Finally, the results also show that our approach obtains state-of-the-art performance with an EER of 0.15 on the evaluation set of the DCASE 2016 audio tagging task, while the first-place system in this challenge obtained an EER of 0.17.
  • A topological Lifshitz phase transition is characterized by an abrupt change in the topology of the Fermi surface under a continuous deformation of parameters. Recently, a Lifshitz transition has been predicted to separate two types of Weyl points, type-I and type-II (also called structured Weyl points), which have attracted considerable attention in various fields. Although experimental investigation of type-II Weyl points has recently progressed rapidly, observing their characteristic Lifshitz transition remains a significant challenge. Here, we propose a scheme to realize both type-I and type-II Weyl points in three-dimensional ultracold atomic gases by introducing an experimentally feasible configuration based on current spin-orbit coupling technology. In the resulting Hamiltonian, we find three degenerate points: two Weyl points, each carrying a Chern number of -1, and a four-fold degenerate point carrying a Chern number of 2. Remarkably, by continuously tuning a convenient experimental knob, all these degenerate points can transition from type-I to type-II, thereby providing an ideal platform to study different types of Weyl points and directly probe their Lifshitz phase transition.
  • Three-dimensional topological Weyl semimetals can generally support a zero-dimensional Weyl point characterized by a quantized Chern number or a one-dimensional Weyl nodal ring (or line) characterized by a quantized Berry phase in momentum space. Here, in a dissipative system with particle gain and loss, we discover a new type of topological ring, dubbed a Weyl exceptional ring, consisting of exceptional points at which two eigenstates coalesce. Such a Weyl exceptional ring is characterized by both a quantized Chern number and a quantized Berry phase, which are defined via the Riemann surface. We propose an experimental scheme to realize and measure the Weyl exceptional ring in a dissipative cold atomic gas trapped in an optical lattice.
  • We focus on interference mitigation and energy conservation within a single wireless body area network (WBAN). We adopt the two-hop communication scheme supported by the IEEE 802.15.6 standard (2012). In this paper, we propose a dynamic channel allocation scheme, namely DCAIM, to mitigate node-level interference among the coexisting regions of a WBAN. When sensors are within the communication radius of a relay, they form a relay region (RG) coordinated by that relay using time division multiple access (TDMA). In the proposed scheme, each RG creates a table of interfering sensors, which it broadcasts to its neighboring sensors. This broadcast allows each pair of RGs to create an interference set (IS). The members of an IS are then assigned orthogonal sub-channels, whereas sensors that do not belong to the IS can transmit in the same time slots. Experimental results show that our proposal mitigates node-level interference and improves node and WBAN energy savings. Compared with other schemes, our scheme performs better in all cases: node-level signal-to-interference-plus-noise ratio (SINR) improves by 11 dB, while energy consumption decreases significantly. We further present a probabilistic method and show analytically that the outage probability can be effectively minimized.
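The weakly labelled SED abstract and the joint separation-classification abstract both use global weighted rank pooling (GWRP) to map a T-F segmentation mask to a presence probability. The following is a minimal sketch. It assumes the common formulation, in which mask values are sorted in descending order, the j-th largest value is weighted by d^j for a hyperparameter d in [0, 1], and the weighted sum is normalised. The names `gwrp`, `clip_probabilities` and `d`, and the dictionary-of-masks layout, are hypothetical and are not taken from the papers.

```python
from typing import Dict, List, Sequence


def gwrp(values: Sequence[float], d: float) -> float:
    """Global weighted rank pooling of a flattened T-F mask.

    The j-th largest value (j = 0, 1, ...) gets weight d**j, and the weighted
    sum is normalised by the sum of the weights. d = 0 reduces to global max
    pooling; d = 1 reduces to global average pooling.
    """
    if len(values) == 0:
        raise ValueError("values must be non-empty")
    if not 0.0 <= d <= 1.0:
        raise ValueError("d must lie in [0, 1]")
    ranked = sorted(values, reverse=True)
    # Python defines 0.0 ** 0 == 1.0, so d = 0 keeps only the largest value.
    weights = [d ** j for j in range(len(ranked))]
    return sum(w * v for w, v in zip(weights, ranked)) / sum(weights)


def clip_probabilities(masks: Dict[str, List[List[float]]], d: float) -> Dict[str, float]:
    """Map each event's T-F mask (frames x frequency bins) to a presence probability."""
    return {
        event: gwrp([v for frame in mask for v in frame], d)
        for event, mask in masks.items()
    }


if __name__ == "__main__":
    masks = {"dog_bark": [[0.9, 0.1], [0.5, 0.0]]}
    for d in (0.0, 0.5, 1.0):
        print(d, clip_probabilities(masks, d))
```

Intermediate values of d let the clip-level probability depend on several strong T-F bins rather than only the single maximum, while still down-weighting the many low-valued bins.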
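The Audio Set abstract describes a bag-level prediction as the expectation of instance-level classification outputs under a learned probability measure over the instances in the bag. The following single-class sketch makes two illustrative assumptions: the measure is obtained by a softmax over per-instance attention logits, and instance outputs are sigmoid probabilities. These choices and all function names are not necessarily the paper's exact parameterization.

```python
import math
from typing import Sequence


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0.0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def attention_bag_probability(instance_logits: Sequence[float],
                              attention_logits: Sequence[float]) -> float:
    """Bag probability for one class: sum_i p(x_i) * f(x_i).

    f(x_i) = sigmoid(instance_logits[i]) is the instance classification output.
    p(x_i) = softmax(attention_logits)[i] is the probability measure over instances.
    """
    if not instance_logits or len(instance_logits) != len(attention_logits):
        raise ValueError("need one instance logit and one attention logit per instance")
    top = max(attention_logits)  # subtract the max before exp for numerical stability
    unnorm = [math.exp(a - top) for a in attention_logits]
    total = sum(unnorm)
    measure = [u / total for u in unnorm]
    return sum(p * sigmoid(z) for p, z in zip(measure, instance_logits))


if __name__ == "__main__":
    # Three instances (e.g. short segments of a clip); the second one dominates the measure.
    print(attention_bag_probability([-3.0, 4.0, -2.0], [0.0, 5.0, 0.0]))
```

With uniform attention, the bag output reduces to the average instance output. A peaked measure lets one confident instance determine the bag label, which matches the MIL assumption that a single positive instance makes the bag positive.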
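The gated linear unit (GLU) non-linearity mentioned in the DCASE 2017 abstracts multiplies a linear projection element-wise by a sigmoid gate computed from a second projection. This lets the network learn which local features to pass on. For readability, the sketch below uses small dense layers in pure Python. In the described systems the projections are convolutions over the log Mel spectrogram, and the helper names here are hypothetical.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def _sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0.0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def linear(x: Sequence[float], w: Matrix, b: Sequence[float]) -> List[float]:
    """Dense projection: y[j] = sum_i x[i] * w[i][j] + b[j]."""
    return [sum(xi * w[i][j] for i, xi in enumerate(x)) + b[j] for j in range(len(b))]


def glu(x: Sequence[float], w: Matrix, b: Sequence[float],
        v: Matrix, c: Sequence[float]) -> List[float]:
    """Gated linear unit: (x W + b) * sigmoid(x V + c), element-wise."""
    values = linear(x, w, b)
    gates = linear(x, v, c)
    return [a * _sigmoid(g) for a, g in zip(values, gates)]


if __name__ == "__main__":
    identity = [[1.0, 0.0], [0.0, 1.0]]
    zeros = [[0.0, 0.0], [0.0, 0.0]]
    # The first feature is half-gated, the second passes almost unchanged.
    print(glu([1.0, 2.0], identity, [0.0, 0.0], zeros, [0.0, 100.0]))
```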
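The weighted MMD abstract re-weights source samples by class before comparing the source and target distributions. The sketch below makes three assumptions, all for illustration: the auxiliary weight of class c is the ratio of its estimated target prior (from pseudo-labels) to its source prior; the kernel is a Gaussian RBF; and the estimate is the biased (V-statistic) form of the squared discrepancy. All function names are likewise illustrative. The sketch covers only the weight-estimation and discrepancy steps, not the full classification EM loop.

```python
import math
from collections import Counter
from typing import Dict, Hashable, List, Sequence

Vector = Sequence[float]


def rbf(x: Vector, y: Vector, gamma: float) -> float:
    """Gaussian RBF kernel k(x, y) = exp(-gamma * ||x - y||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))


def class_weights(source_labels: Sequence[Hashable],
                  target_pseudo_labels: Sequence[Hashable]) -> Dict[Hashable, float]:
    """alpha_c = (target prior of c estimated from pseudo-labels) / (source prior of c)."""
    ns, nt = len(source_labels), len(target_pseudo_labels)
    src, tgt = Counter(source_labels), Counter(target_pseudo_labels)
    return {c: (tgt[c] / nt) / (src[c] / ns) for c in src}


def weighted_mmd2(xs: List[Vector], ys: Sequence[Hashable], xt: List[Vector],
                  alpha: Dict[Hashable, float], gamma: float = 1.0) -> float:
    """Biased estimate of the squared class-weighted MMD with an RBF kernel."""
    ws = [alpha[y] for y in ys]
    z = sum(ws)
    if z <= 0.0:
        raise ValueError("class weights must not all be zero")
    a = [w / z for w in ws]          # re-weighted source distribution
    b = [1.0 / len(xt)] * len(xt)    # uniform target distribution

    def cross(p, xp, q, xq):
        # sum_{i,j} p_i q_j k(xp_i, xq_j)
        return sum(pi * qj * rbf(u, v, gamma) for pi, u in zip(p, xp) for qj, v in zip(q, xq))

    return cross(a, xs, a, xs) + cross(b, xt, b, xt) - 2.0 * cross(a, xs, b, xt)


if __name__ == "__main__":
    xs, ys = [[0.0], [0.0], [0.0], [5.0]], [0, 0, 0, 1]   # source: 75% class 0
    xt = [[0.0], [5.0]]                                   # target: balanced
    alpha = class_weights(ys, [0, 1])
    print(weighted_mmd2(xs, ys, xt, {0: 1.0, 1: 1.0}), weighted_mmd2(xs, ys, xt, alpha))
```

In the usage example, each class is perfectly aligned across domains, yet the unweighted MMD stays positive because the class proportions differ. Re-weighting by the estimated priors removes that spurious gap, which illustrates the class weight bias the abstract describes.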
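The multi-objective speech enhancement abstract trains a DNN to predict clean LPS together with secondary targets such as MFCCs and the ideal binary mask (IBM). The following sketches such a joint objective for one frame under these assumptions:

- the widely used IBM definition: 1 where the local SNR in dB exceeds a local criterion, 0 otherwise;
- mean squared error for the continuous targets;
- binary cross-entropy for the IBM;
- hypothetical weights `w_mfcc` and `w_ibm`.

The paper's actual loss terms and weighting may differ.

```python
import math
from typing import List, Sequence


def ideal_binary_mask(speech_power: Sequence[float], noise_power: Sequence[float],
                      lc_db: float = 0.0, eps: float = 1e-12) -> List[float]:
    """IBM for one frame: 1 where the local SNR in dB exceeds the local criterion lc_db."""
    return [1.0 if 10.0 * math.log10((s + eps) / (n + eps)) > lc_db else 0.0
            for s, n in zip(speech_power, noise_power)]


def mse(pred: Sequence[float], target: Sequence[float]) -> float:
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(target)


def bce(pred: Sequence[float], target: Sequence[float], eps: float = 1e-7) -> float:
    """Binary cross-entropy with predictions clipped to [eps, 1 - eps]."""
    total = 0.0
    for p, t in zip(pred, target):
        p = min(max(p, eps), 1.0 - eps)
        total -= t * math.log(p) + (1.0 - t) * math.log(1.0 - p)
    return total / len(target)


def joint_loss(lps_pred: Sequence[float], lps_true: Sequence[float],
               mfcc_pred: Sequence[float], mfcc_true: Sequence[float],
               ibm_pred: Sequence[float], ibm_true: Sequence[float],
               w_mfcc: float = 1.0, w_ibm: float = 1.0) -> float:
    """Primary LPS regression loss plus weighted secondary MFCC and IBM losses."""
    return (mse(lps_pred, lps_true)
            + w_mfcc * mse(mfcc_pred, mfcc_true)
            + w_ibm * bce(ibm_pred, ibm_true))


if __name__ == "__main__":
    ibm = ideal_binary_mask([4.0, 1.0], [1.0, 4.0])   # local SNRs of about +6 dB and -6 dB
    print(ibm, joint_loss([1.0, 2.0], [1.1, 1.9], [0.5], [0.4], [0.8, 0.3], ibm))
```

At test time, only the LPS output would be needed to reconstruct speech. The secondary outputs act as extra training constraints and, as with the IBM in the abstract, can be reused for post-processing.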
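Several of the audio tagging abstracts report the equal error rate (EER): the error rate at the operating point where the false positive rate equals the false negative rate. The following per-tag sketch approximates the EER by sweeping every observed score as a threshold and averaging the two rates where they are closest. The function name and the tie-handling convention (predict positive when score >= threshold) are assumptions, and official challenge scoring tools may interpolate differently.

```python
from typing import Sequence


def equal_error_rate(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Approximate EER for one tag (labels: 1 = tag present, 0 = absent)."""
    pos = sum(1 for y in labels if y == 1)
    neg = len(labels) - pos
    if pos == 0 or neg == 0:
        raise ValueError("need at least one positive and one negative example")
    best_gap, best_eer = float("inf"), 1.0
    # Try every observed score as a threshold, plus +inf (reject everything).
    for t in sorted(set(scores)) + [float("inf")]:
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        fn = sum(1 for s, y in zip(scores, labels) if s < t and y == 1)
        fpr, fnr = fp / neg, fn / pos
        if abs(fpr - fnr) < best_gap:
            best_gap, best_eer = abs(fpr - fnr), (fpr + fnr) / 2.0
    return best_eer


if __name__ == "__main__":
    print(equal_error_rate([0.9, 0.4, 0.6, 0.1], [1, 1, 0, 0]))
```

The nested loop keeps the sketch short but is quadratic in the number of clips. Sorting the scores once and accumulating counts would bring it to O(n log n).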