-
We propose a deep hashing framework for sketch retrieval that, for the first
time, works on a multi-million scale human sketch dataset. Leveraging on this
large dataset, we explore a few sketch-specific traits that were otherwise
under-studied in prior literature. Instead of following the conventional sketch
recognition task, we introduce the novel problem of sketch hashing retrieval
which is not only more challenging, but also offers a better testbed for
large-scale sketch analysis, since: (i) more fine-grained sketch feature
learning is required to accommodate the large variations in style and
abstraction, and (ii) a compact binary code needs to be learned at the same
time to enable efficient retrieval. Key to our network design is the embedding
of unique characteristics of human sketch, where (i) a two-branch CNN-RNN
architecture is adapted to explore the temporal ordering of strokes, and (ii) a
novel hashing loss is specifically designed to accommodate both the temporal
and abstract traits of sketches. By working with a 3.8M sketch dataset, we show
that state-of-the-art hashing models specifically engineered for static images
fail to perform well on temporal sketch data. Our network on the other hand not
only offers the best retrieval performance on various code sizes, but also
yields the best generalization performance under a zero-shot setting and when
re-purposed for sketch recognition. Such superior performances effectively
demonstrate the benefit of our sketch-specific design.
-
In this paper, we propose a noise robust bottleneck feature representation
which is generated by an adversarial network (AN). The AN includes two cascade
connected networks, an encoding network (EN) and a discriminative network (DN).
Mel-frequency cepstral coefficients (MFCCs) of clean and noisy speech are used
as input to the EN and the output of the EN is used as the noise robust
feature. The EN and DN are trained in turn, namely, when training the DN, noise
types are selected as the training labels and when training the EN, all labels
are set as the same, i.e., the clean speech label, which aims to make the AN
features invariant to noise and thus achieve noise robustness. We evaluate the
performance of the proposed feature on a Gaussian Mixture Model-Universal
Background Model based speaker verification system, and make comparison to MFCC
features of speech enhanced by short-time spectral amplitude minimum mean
square error (STSA-MMSE) and deep neural network-based speech enhancement
(DNN-SE) methods. Experimental results on the RSR2015 database show that the
proposed AN bottleneck feature (AN-BN) dramatically outperforms the STSA-MMSE
and DNN-SE based MFCCs for different noise types and signal-to-noise ratios.
Furthermore, the AN-BN feature is able to improve the speaker verification
performance under the clean condition.
-
In this paper, we propose novel strategies for neutral vector variable
decorrelation. Two fundamental invertible transformations, namely serial
nonlinear transformation and parallel nonlinear transformation, are proposed to
carry out the decorrelation. For a neutral vector variable, which is not
multivariate Gaussian distributed, the conventional principal component
analysis (PCA) cannot yield mutually independent scalar variables. With the two
proposed transformations, a highly negatively correlated neutral vector can be
transformed to a set of mutually independent scalar variables with the same
degrees of freedom. We also evaluate the decorrelation performances for the
vectors generated from a single Dirichlet distribution and a mixture of
Dirichlet distributions. The mutual independence is verified with the distance
correlation measurement. The advantages of the proposed decorrelation
strategies are intensively studied and demonstrated with synthesized data and
practical application evaluations.
-
Data analysis plays an important role in the development of intelligent
energy networks (IENs). This article reviews and discusses the application of
data analysis methods for energy big data. The installation of smart energy
meters has provided a huge volume of data at different time resolutions,
suggesting data analysis is required for clustering, demand forecasting, energy
generation optimization, energy pricing, monitoring and diagnostics. The
currently adopted data analysis technologies for IENs include pattern
recognition, machine learning, data mining, statistics methods, etc. However,
existing methods for data analysis cannot fully meet the requirements for
processing the big data produced by the IENs and, therefore, more comprehensive
data analysis methods are needed to handle the increasing amount of data and to
mine more valuable information.
-
Sketch-based image retrieval (SBIR) is challenging due to the inherent
domain-gap between sketch and photo. Compared with pixel-perfect depictions of
photos, sketches are iconic renderings of the real world with highly abstract.
Therefore, matching sketch and photo directly using low-level visual clues are
unsufficient, since a common low-level subspace that traverses semantically
across the two modalities is non-trivial to establish. Most existing SBIR
studies do not directly tackle this cross-modal problem. This naturally
motivates us to explore the effectiveness of cross-modal retrieval methods in
SBIR, which have been applied in the image-text matching successfully. In this
paper, we introduce and compare a series of state-of-the-art cross-modal
subspace learning methods and benchmark them on two recently released
fine-grained SBIR datasets. Through thorough examination of the experimental
results, we have demonstrated that the subspace learning can effectively model
the sketch-photo domain-gap. In addition we draw a few key insights to drive
future research.
-
With the development of speech synthesis techniques, automatic speaker
verification systems face the serious challenge of spoofing attack. In order to
improve the reliability of speaker verification systems, we develop a new
filter bank based cepstral feature, deep neural network filter bank cepstral
coefficients (DNN-FBCC), to distinguish between natural and spoofed speech. The
deep neural network filter bank is automatically generated by training a filter
bank neural network (FBNN) using natural and synthetic speech. By adding
restrictions on the training rules, the learned weight matrix of FBNN is
band-limited and sorted by frequency, similar to the normal filter bank. Unlike
the manually designed filter bank, the learned filter bank has different filter
shapes in different channels, which can capture the differences between natural
and synthetic speech more effectively. The experimental results on the ASVspoof
{2015} database show that the Gaussian mixture model maximum-likelihood
(GMM-ML) classifier trained by the new feature performs better than the
state-of-the-art linear frequency cepstral coefficients (LFCC) based
classifier, especially on detecting unknown attacks.