• We propose a deep hashing framework for sketch retrieval that, for the first time, works on a multi-million scale human sketch dataset. Leveraging on this large dataset, we explore a few sketch-specific traits that were otherwise under-studied in prior literature. Instead of following the conventional sketch recognition task, we introduce the novel problem of sketch hashing retrieval which is not only more challenging, but also offers a better testbed for large-scale sketch analysis, since: (i) more fine-grained sketch feature learning is required to accommodate the large variations in style and abstraction, and (ii) a compact binary code needs to be learned at the same time to enable efficient retrieval. Key to our network design is the embedding of unique characteristics of human sketch, where (i) a two-branch CNN-RNN architecture is adapted to explore the temporal ordering of strokes, and (ii) a novel hashing loss is specifically designed to accommodate both the temporal and abstract traits of sketches. By working with a 3.8M sketch dataset, we show that state-of-the-art hashing models specifically engineered for static images fail to perform well on temporal sketch data. Our network on the other hand not only offers the best retrieval performance on various code sizes, but also yields the best generalization performance under a zero-shot setting and when re-purposed for sketch recognition. Such superior performances effectively demonstrate the benefit of our sketch-specific design.
  • In this paper, we propose a noise robust bottleneck feature representation which is generated by an adversarial network (AN). The AN includes two cascade connected networks, an encoding network (EN) and a discriminative network (DN). Mel-frequency cepstral coefficients (MFCCs) of clean and noisy speech are used as input to the EN and the output of the EN is used as the noise robust feature. The EN and DN are trained in turn, namely, when training the DN, noise types are selected as the training labels and when training the EN, all labels are set as the same, i.e., the clean speech label, which aims to make the AN features invariant to noise and thus achieve noise robustness. We evaluate the performance of the proposed feature on a Gaussian Mixture Model-Universal Background Model based speaker verification system, and make comparison to MFCC features of speech enhanced by short-time spectral amplitude minimum mean square error (STSA-MMSE) and deep neural network-based speech enhancement (DNN-SE) methods. Experimental results on the RSR2015 database show that the proposed AN bottleneck feature (AN-BN) dramatically outperforms the STSA-MMSE and DNN-SE based MFCCs for different noise types and signal-to-noise ratios. Furthermore, the AN-BN feature is able to improve the speaker verification performance under the clean condition.
  • In this paper, we propose novel strategies for neutral vector variable decorrelation. Two fundamental invertible transformations, namely serial nonlinear transformation and parallel nonlinear transformation, are proposed to carry out the decorrelation. For a neutral vector variable, which is not multivariate Gaussian distributed, the conventional principal component analysis (PCA) cannot yield mutually independent scalar variables. With the two proposed transformations, a highly negatively correlated neutral vector can be transformed to a set of mutually independent scalar variables with the same degrees of freedom. We also evaluate the decorrelation performances for the vectors generated from a single Dirichlet distribution and a mixture of Dirichlet distributions. The mutual independence is verified with the distance correlation measurement. The advantages of the proposed decorrelation strategies are intensively studied and demonstrated with synthesized data and practical application evaluations.
  • Data analysis plays an important role in the development of intelligent energy networks (IENs). This article reviews and discusses the application of data analysis methods for energy big data. The installation of smart energy meters has provided a huge volume of data at different time resolutions, suggesting data analysis is required for clustering, demand forecasting, energy generation optimization, energy pricing, monitoring and diagnostics. The currently adopted data analysis technologies for IENs include pattern recognition, machine learning, data mining, statistics methods, etc. However, existing methods for data analysis cannot fully meet the requirements for processing the big data produced by the IENs and, therefore, more comprehensive data analysis methods are needed to handle the increasing amount of data and to mine more valuable information.
  • Sketch-based image retrieval (SBIR) is challenging due to the inherent domain-gap between sketch and photo. Compared with pixel-perfect depictions of photos, sketches are iconic renderings of the real world with highly abstract. Therefore, matching sketch and photo directly using low-level visual clues are unsufficient, since a common low-level subspace that traverses semantically across the two modalities is non-trivial to establish. Most existing SBIR studies do not directly tackle this cross-modal problem. This naturally motivates us to explore the effectiveness of cross-modal retrieval methods in SBIR, which have been applied in the image-text matching successfully. In this paper, we introduce and compare a series of state-of-the-art cross-modal subspace learning methods and benchmark them on two recently released fine-grained SBIR datasets. Through thorough examination of the experimental results, we have demonstrated that the subspace learning can effectively model the sketch-photo domain-gap. In addition we draw a few key insights to drive future research.
  • With the development of speech synthesis techniques, automatic speaker verification systems face the serious challenge of spoofing attack. In order to improve the reliability of speaker verification systems, we develop a new filter bank based cepstral feature, deep neural network filter bank cepstral coefficients (DNN-FBCC), to distinguish between natural and spoofed speech. The deep neural network filter bank is automatically generated by training a filter bank neural network (FBNN) using natural and synthetic speech. By adding restrictions on the training rules, the learned weight matrix of FBNN is band-limited and sorted by frequency, similar to the normal filter bank. Unlike the manually designed filter bank, the learned filter bank has different filter shapes in different channels, which can capture the differences between natural and synthetic speech more effectively. The experimental results on the ASVspoof {2015} database show that the Gaussian mixture model maximum-likelihood (GMM-ML) classifier trained by the new feature performs better than the state-of-the-art linear frequency cepstral coefficients (LFCC) based classifier, especially on detecting unknown attacks.