
We address the problem of using hand-drawn sketches to create exaggerated
deformations of faces in videos, such as enlarging the shape or modifying the
position of eyes or mouth. This task is formulated as a 3D face model
reconstruction and deformation problem. We first recover the facial identity
and expressions from the video by fitting a face morphable model for each
frame. At the same time, the user's editing intention is recognized from input
sketches as a set of facial modifications. Then a novel identity deformation
algorithm is proposed to transfer these facial deformations from 2D space to
the 3D facial identity directly while preserving the facial expressions. After
an optional stage for further refining the 3D face model, these changes are
propagated to the whole video with the modified identity. Both the user study
and experimental results demonstrate that our sketching framework can help
users effectively edit facial identities in videos, while high consistency and
fidelity are ensured at the same time.

Lifting is a common manual material handling task performed in workplaces. It
is considered one of the main risk factors for Work-related Musculoskeletal
Disorders. To improve workplace safety, it is necessary to assess the
musculoskeletal and biomechanical risk exposures associated with these tasks,
which requires very accurate 3D pose. Existing approaches mainly utilize
marker-based sensors to collect 3D information. However, these methods are
usually expensive to set up, time-consuming to process, and sensitive to the
surrounding environment. In this study, we propose a multi-view based deep
perceptron approach to address the aforementioned limitations. Our approach
consists of two modules: a "view-specific perceptron" network extracts rich
information independently from the image of each view, which includes both 2D
shape and hierarchical texture information, while a "multi-view integration"
network synthesizes information from all available views to predict accurate
3D pose. To fully evaluate our approach, we carried out comprehensive
experiments to compare different variants of our design. The results show that
our approach achieves performance comparable to earlier marker-based methods,
i.e., an average error of $14.72 \pm 2.96$ mm on the lifting dataset. The
results are also compared with state-of-the-art methods on the HumanEva-I
dataset, which demonstrates the superior performance of our approach.

We propose a novel method for real-time face alignment in videos based on a
recurrent encoder-decoder network model. Our proposed model predicts 2D facial
point heat maps regularized by both detection and regression loss, while
uniquely exploiting recurrent learning at both spatial and temporal dimensions.
At the spatial level, we add a feedback loop connection between the combined
output response map and the input, in order to enable iterative coarse-to-fine
face alignment using a single network model, instead of relying on traditional
cascaded model ensembles. At the temporal level, we first decouple the features
in the bottleneck of the network into temporal-variant factors, such as pose
and expression, and temporal-invariant factors, such as identity information.
Temporal recurrent learning is then applied to the decoupled temporal-variant
features. We show that such feature disentangling yields better generalization
and significantly more accurate results at test time. We perform a
comprehensive experimental analysis, showing the importance of each component
of our proposed model, as well as superior results over the state of the art
and several variations of our method on standard datasets.

In this paper, we present a deep extension of Sparse Subspace Clustering,
termed Deep Sparse Subspace Clustering (DSSC). Regularized by the unit sphere
distribution assumption for the learned deep features, DSSC can infer a new
data affinity matrix by simultaneously satisfying the sparsity principle of SSC
and the nonlinearity given by neural networks. One appealing advantage of DSSC
is that, when the original real-world data do not meet the class-specific
linear subspace distribution assumption, DSSC can employ neural networks to
make the assumption valid through their hierarchical nonlinear
transformations. To the best of our knowledge, this is among the first
deep-learning-based subspace clustering methods. Extensive experiments are
conducted on four real-world datasets to show that the proposed DSSC is
significantly superior to 12 existing methods for subspace clustering.

Deep neural networks (DNNs) trained on large-scale datasets have recently
achieved impressive improvements in face recognition. But a persistent
challenge remains to develop methods capable of handling large pose variations
that are relatively underrepresented in training data. This paper presents a
method for learning a feature representation that is invariant to pose, without
requiring extensive pose coverage in training data. We first propose to
generate non-frontal views from a single frontal face, in order to increase the
diversity of training data while preserving accurate facial details that are
critical for identity discrimination. Our next contribution is to seek a rich
embedding that encodes identity features, as well as non-identity ones such as
pose and landmark locations. Finally, we propose a new feature reconstruction
metric learning scheme to explicitly disentangle identity and pose, by
demanding alignment between the feature reconstructions through various
combinations of identity and pose features, which are obtained from two images
of the same subject. Experiments on both controlled and in-the-wild face
datasets, such as MultiPIE, 300WLP and the profile view database CFP, show that
our method consistently outperforms the state of the art, especially on images
with large head pose variations. Detailed results and resources are available at
https://sites.google.com/site/xipengcshomepage/iccv2017

The exponential growth of mobile data traffic is driving the deployment of
dense wireless networks, which will not only impose heavy backhaul burdens, but
also generate considerable power consumption. Introducing caches to the
wireless network edge is a promising and cost-effective solution to address
these challenges. In this paper, we investigate the problem of minimizing
the network power consumption of cache-enabled wireless networks, consisting of
the base station (BS) and backhaul power consumption. The objective is to
develop efficient algorithms that unify adaptive BS selection, backhaul content
assignment and multicast beamforming, while taking account of user QoS
requirements and backhaul capacity limitations. To address the NP-hardness of
the network power minimization problem, we first propose a generalized layered
group sparse beamforming (LGSBF) modeling framework, which helps to reveal the
layered sparsity structure in the beamformers. By adopting the reweighted
$\ell_{1}/\ell_{2}$-norm technique, we further develop a convex
approximation procedure for the LGSBF problem, followed by a three-stage
iterative LGSBF framework to induce the desired sparsity structure in the
beamformers. Simulation results validate the effectiveness of the proposed
algorithm in reducing the network power consumption, and demonstrate that
caching plays a more significant role in networks with higher user densities
and less power-efficient backhaul links.

Constructing a similarity graph is key to graph-oriented subspace learning and
clustering. In a similarity graph, each vertex denotes a data point and the
edge weight represents the similarity between two points. There are two
popular schemes for constructing a similarity graph, i.e., the
pairwise-distance-based scheme and the linear-representation-based scheme.
Most existing works involve only one of these schemes and suffer from some
limitations. Specifically, pairwise-distance-based methods are more sensitive
to noise and outliers than linear-representation-based methods. On the other
hand, linear-representation-based algorithms may wrongly select inter-subspace
points to represent a point, which degrades performance. In this paper, we
propose an algorithm, called Locally Linear Representation (LLR), which
integrates pairwise distance with linear representation to address these
problems. The proposed algorithm automatically encodes each data point over a
set of points that not only represent the target point with less residual
error, but are also close to the point in Euclidean space. The experimental
results show that our approach is promising in subspace learning and subspace
clustering.

Many works have shown that Frobenius-norm-based representation (FNR) is
competitive with sparse representation and nuclear-norm-based representation
(NNR) in numerous tasks such as subspace clustering. Despite the success of FNR
in experimental studies, little theoretical analysis has been provided to
understand its working mechanism. In this paper, we fill this gap by building
theoretical connections between FNR and NNR. More specifically, we prove that:
1) when the dictionary can provide enough representative capacity, FNR is
exactly NNR even though the data set contains Gaussian noise, Laplacian noise,
or sample-specified corruption; 2) otherwise, FNR and NNR are two solutions on
the column space of the dictionary.
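
As a worked illustration of the simplest instance of this connection, consider the noise-free case in which the data matrix serves as its own dictionary. The objectives below are the standard textbook forms and are an assumption here, since the abstract does not state them; the symbols $\mathbf{D}$, $\mathbf{C}$, and the skinny SVD $\mathbf{D} = \mathbf{U}_{r}\boldsymbol{\Sigma}_{r}\mathbf{V}_{r}^{\top}$ are illustrative notation. In this special case the minimum-Frobenius-norm and minimum-nuclear-norm solutions are both given by the pseudoinverse.

```latex
% Assumed setting: noise-free data, dictionary equal to the data matrix D,
% with skinny SVD D = U_r \Sigma_r V_r^{\top} (rank r).
\begin{align*}
\text{FNR:}\quad & \min_{\mathbf{C}} \|\mathbf{C}\|_{F}
    \quad \text{s.t.} \quad \mathbf{D} = \mathbf{D}\mathbf{C}, \\
\text{NNR:}\quad & \min_{\mathbf{C}} \|\mathbf{C}\|_{*}
    \quad \text{s.t.} \quad \mathbf{D} = \mathbf{D}\mathbf{C}, \\
\text{both:}\quad & \mathbf{C}^{\star} = \mathbf{D}^{\dagger}\mathbf{D}
    = \mathbf{V}_{r}\mathbf{V}_{r}^{\top}.
\end{align*}
```

The noisy cases named in the abstract (Gaussian noise, Laplacian noise, sample-specified corruption) are not covered by this sketch.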

In this paper, we address two challenging problems in unsupervised subspace
learning: 1) how to automatically identify the feature dimension of the learned
subspace (i.e., automatic subspace learning), and 2) how to learn the
underlying subspace in the presence of Gaussian noise (i.e., robust subspace
learning). We show that these two problems can be simultaneously solved by
proposing a new method (called principal coefficients embedding, PCE). For a
given data set $\mathbf{D}\in \mathds{R}^{m\times n}$, PCE recovers a clean
data set $\mathbf{D}_{0}\in \mathds{R}^{m\times n}$ from $\mathbf{D}$ and
simultaneously learns a global reconstruction relation $\mathbf{C}\in
\mathds{R}^{n\times n}$ of $\mathbf{D}_{0}$. By preserving $\mathbf{C}$ into an
$m^{\prime}$-dimensional space, the proposed method obtains a projection matrix
that can capture the latent manifold structure of $\mathbf{D}_{0}$, where
$m^{\prime}\ll m$ is automatically determined by the rank of $\mathbf{C}$ with
theoretical guarantees. PCE has three advantages: 1) it can automatically
determine the feature dimension even though data are sampled from a union of
multiple linear subspaces in the presence of Gaussian noise; 2) although the
objective function of PCE only considers Gaussian noise, experimental
results show that it is robust to non-Gaussian noise (\textit{e.g.}, random
pixel corruption) and real disguises; 3) our method has a closed-form solution
and can be calculated very fast. Extensive experimental results show the
superiority of PCE on a range of databases with respect to the classification
accuracy, robustness and efficiency.

Tracking facial points in unconstrained videos is challenging due to the
non-rigid deformation that changes over time. In this paper, we propose to
exploit incremental learning for person-specific alignment in wild conditions.
Our approach takes advantage of part-based representation and cascade
regression for robust and efficient alignment on each frame. Unlike existing
methods that usually rely on models trained offline, we incrementally update
the representation subspace and the cascade of regressors in a unified
framework to achieve personalized modeling on the fly. To alleviate the
drifting issue, the fitting results are evaluated using a deep neural network,
where well-aligned faces are picked out to incrementally update the
representation and fitting models. Both image and video datasets are employed
to validate the proposed method. The results demonstrate the superior performance
of our approach compared with existing approaches in terms of fitting accuracy
and efficiency.

We propose a novel recurrent encoder-decoder network model for real-time
video-based face alignment. Our proposed model predicts 2D facial point maps
regularized by a regression loss, while uniquely exploiting recurrent learning
at both spatial and temporal dimensions. At the spatial level, we add a
feedback loop connection between the combined output response map and the
input, in order to enable iterative coarse-to-fine face alignment using a
single network model. At the temporal level, we first decouple the features in
the bottleneck of the network into temporal-variant factors, such as pose and
expression, and temporal-invariant factors, such as identity information.
Temporal recurrent learning is then applied to the decoupled temporal-variant
features, yielding better generalization and significantly more accurate
results at test time. We perform a comprehensive experimental analysis, showing
the importance of each component of our proposed model, as well as superior
results over the state of the art on standard datasets.

As mobile services are shifting from "connection-centric" communications to
"content-centric" communications, content-centric wireless networking emerges
as a promising paradigm to evolve the current network architecture. Caching
popular content at the wireless edge, including base stations (BSs) and user
terminals (UTs), provides an effective approach to alleviate the heavy burden
on backhaul links, as well as lowering delays and deployment costs. In contrast
to wired networks, a unique characteristic of content-centric wireless networks
(CCWNs) is the mobility of mobile users. While it has rarely been considered by
existing works in caching design, user mobility contains various helpful side
information that can be exploited to improve caching efficiency at both BSs and
UTs. In this paper, we present a general framework on mobility-aware caching in
CCWNs. Key properties of user mobility patterns that are useful for content
caching are first identified, and then different design methodologies for
mobility-aware caching are proposed. Moreover, two design examples are
provided to illustrate the proposed framework in detail, and interesting
future research directions are identified.

Caching popular content at base stations is a powerful supplement to existing
limited backhaul links for accommodating the exponentially increasing mobile
data traffic. Given the limited cache budget, we investigate the cache size
allocation problem in cellular networks to maximize the user success
probability (USP), taking wireless channel statistics, backhaul capacities and
file popularity distributions into consideration. The USP is defined as the
probability that one user can successfully download its requested file either
from the local cache or via the backhaul link. We first consider a single-cell
scenario and derive a closed-form expression for the USP, which helps reveal
the impacts of various parameters, such as the file popularity distribution.
More specifically, for a highly concentrated file popularity distribution, the
required cache size is independent of the total number of files, while for a
less concentrated file popularity distribution, the required cache size scales
linearly with the total number of files. Furthermore, we study the
multi-cell scenario, and provide a bisection search algorithm to find the
optimal cache size allocation. The optimal cache size allocation is verified by
simulations, and it is shown to play a more significant role when the file
popularity distribution is less concentrated.
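
The dependence of the required cache size on how concentrated the popularity distribution is can be illustrated with a minimal sketch. This is not the paper's USP model or its multi-cell algorithm. It assumes a Zipf popularity law, caches the most popular files first, and uses the plain cache hit ratio as a stand-in for the USP, ignoring wireless channel statistics and backhaul capacity. The function names, the Zipf exponents, and the 0.9 target are all hypothetical choices made for illustration.

```python
from typing import List


def zipf_popularity(num_files: int, exponent: float) -> List[float]:
    """Assumed Zipf law: the file of rank i has weight i**(-exponent)."""
    weights = [rank ** -exponent for rank in range(1, num_files + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def hit_ratio(popularity: List[float], cache_size: int) -> float:
    """Probability that a request targets one of the cache_size top files."""
    return sum(popularity[:cache_size])


def min_cache_size(popularity: List[float], target: float) -> int:
    """Bisection for the smallest cache size whose hit ratio reaches target.

    Valid because the hit ratio is non-decreasing in the cache size.
    Assumes 0 < target <= hit ratio of caching every file.
    """
    lo, hi = 0, len(popularity)  # invariant: ratio(lo) < target <= ratio(hi)
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if hit_ratio(popularity, mid) >= target:
            hi = mid
        else:
            lo = mid
    return hi


if __name__ == "__main__":
    # Hypothetical parameters: exponent 2.0 is "highly concentrated",
    # exponent 0.5 is "less concentrated".
    for exponent in (2.0, 0.5):
        for num_files in (1000, 10000):
            size = min_cache_size(zipf_popularity(num_files, exponent), 0.9)
            print(f"exponent={exponent} files={num_files} cache_size={size}")
```

Under these assumptions, the cache size needed for a 0.9 hit ratio stays the same when the catalogue grows tenfold at exponent 2.0, but grows roughly in proportion to the catalogue at exponent 0.5, which mirrors the qualitative behaviour described above.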

Under the framework of spectral clustering, the key to subspace clustering is
building a similarity graph that describes the neighborhood relations among
data points. Some recent works build the graph using sparse, low-rank, and
$\ell_2$-norm-based representation, and have achieved state-of-the-art
performance. However, these methods suffer from the following two
limitations. First, the time complexities of these methods are at least
proportional to the cube of the data size, which makes them inefficient
for solving large-scale problems. Second, they cannot cope with out-of-sample
data that are not used to construct the similarity graph. To cluster each
out-of-sample datum, the methods have to recalculate the similarity graph and
the cluster membership of the whole data set. In this paper, we propose a
unified framework that makes representation-based subspace clustering
algorithms feasible for clustering both out-of-sample and large-scale data.
Under our framework, the large-scale problem is tackled by converting it into
an out-of-sample problem in the manner of "sampling, clustering, coding, and
classifying". Furthermore, we give an estimation for the error bounds by
treating each subspace as a point in a hyperspace. Extensive experimental
results on various benchmark data sets show that our methods outperform several
recently proposed scalable methods in clustering large-scale data sets.

Spatial Pyramid Matching (SPM) and its variants have achieved a lot of
success in image classification. The main difference among them is their
encoding schemes. For example, ScSPM incorporates Sparse Coding (SC) instead of
Vector Quantization (VQ) into the framework of SPM. Although the methods
achieve a higher recognition rate than the traditional SPM, they consume more
time to encode the local descriptors extracted from the image. In this paper,
we propose using Low Rank Representation (LRR) to encode the descriptors under
the framework of SPM. Different from SC, LRR considers the group effect among
data points instead of sparsity. Benefiting from this property, the proposed
method (i.e., LrrSPM) can offer a better performance. To further improve the
generalizability and robustness, we reformulate the rank-minimization problem
as a truncated projection problem. Extensive experimental studies show that
LrrSPM is more efficient than its counterparts (e.g., ScSPM) while achieving
competitive recognition rates on nine image data sets.

As the capacity demand of mobile applications keeps increasing, the backhaul
network is becoming a bottleneck to support high quality of experience (QoE) in
next-generation wireless networks. Content caching at base stations (BSs) is a
promising approach to alleviate the backhaul burden and reduce user-perceived
latency. In this paper, we consider a wireless caching network where all the
BSs are connected to a central controller via backhaul links. In such a
network, users can obtain the required data from candidate BSs if the data are
pre-cached. Otherwise, the user data need to be first retrieved from the
central controller to local BSs, which introduces extra delay over the
backhaul. In order to reduce the download delay, the caching placement strategy
needs to be optimized. We formulate such a design problem as the minimization
of the average download delay over user requests, subject to the caching
capacity constraint of each BS. Different from existing works, our model takes
BS cooperation in the radio access into consideration and is fully aware of the
propagation delay on the backhaul links. The design problem is a mixed-integer
programming problem and is highly complicated, and thus we relax the problem
and propose a low-complexity algorithm. Simulation results show that the
proposed algorithm can effectively determine the near-optimal caching placement
and provide significant performance gains over conventional caching placement
strategies.

Under the framework of graph-based learning, the key to robust subspace
clustering and subspace learning is to obtain a good similarity graph that
eliminates the effects of errors and retains only connections between the data
points from the same subspace (i.e., intra-subspace data points). Recent works
achieve good performance by modeling errors into their objective functions to
remove the errors from the inputs. However, these approaches face the
limitations that the structure of errors must be known a priori and a complex
convex problem must be solved. In this paper, we present a novel method to
eliminate the effects of the errors from the projection space (representation)
rather than from the input space. We first prove that $\ell_1$-, $\ell_2$-,
$\ell_{\infty}$-, and nuclear-norm-based linear projection spaces share the
property of Intra-subspace Projection Dominance (IPD), i.e., the coefficients
over intra-subspace data points are larger than those over inter-subspace data
points. Based on this property, we introduce a method to construct a sparse
similarity graph, called L2-Graph. The subspace clustering and subspace
learning algorithms are developed upon L2-Graph. Experiments show that L2-Graph
algorithms outperform the state-of-the-art methods for feature extraction,
image clustering, and motion segmentation in terms of accuracy, robustness, and
time efficiency.

Low-dimensional manifold models and sparse representation are two
well-known concise models that suggest each data point can be described by a
few characteristics. Manifold learning is usually investigated for dimension
reduction by preserving some expected local geometric structures from the
original space to a low-dimensional one. The structures are generally
determined by using pairwise distance, e.g., Euclidean distance. Alternatively,
sparse representation denotes a data point as a linear combination of the
points from the same subspace. In practical applications, however, the nearby
points in terms of pairwise distance may not belong to the same subspace, and
vice versa. Consequently, it is interesting and important to explore how to get
a better representation by integrating these two models together. To this end,
this paper proposes a novel coding algorithm, called Locality-Constrained
Collaborative Representation (LCCR), which improves the robustness and
discrimination of data representation by introducing a kind of local
consistency. The locality term derives from a biological observation that
similar inputs have similar codes. The objective function of LCCR has an
analytical solution, and it does not involve local minima. The empirical
studies based on four public facial databases, ORL, AR, Extended Yale B, and
Multiple PIE, show that LCCR is promising in recognizing human faces from
frontal views with varying expression and illumination, as well as various
corruptions and occlusions.

Sparse Subspace Clustering (SSC) has achieved state-of-the-art clustering
quality by performing spectral clustering over an $\ell^{1}$-norm-based
similarity graph. However, SSC is a transductive method that does not handle
data not used to construct the graph (out-of-sample data). For each new datum,
SSC requires solving $n$ optimization problems in $O(n)$ variables to rerun
the algorithm over the whole data set, where $n$ is the number of data points.
Therefore, it is inefficient to apply SSC in fast online clustering and
scalable graphing. In this letter, we propose an inductive spectral clustering
algorithm, called inductive Sparse Subspace Clustering (iSSC), which makes SSC
feasible for clustering out-of-sample data. iSSC adopts the assumption that
high-dimensional data actually lie on a low-dimensional manifold such that
out-of-sample data can be grouped in the embedding space learned from
in-sample data. Experimental results show that iSSC is promising in clustering
out-of-sample data.
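
To make the cost argument concrete, the commonly stated noise-free form of the SSC program is sketched below. The notation is an assumption, since the abstract does not give the objective: $\mathbf{X} = [\mathbf{x}_{1}, \dots, \mathbf{x}_{n}]$ collects the data points as columns, and $\mathbf{c}_{i} \in \mathds{R}^{n}$ is the coefficient vector of $\mathbf{x}_{i}$. One such program is solved per data point, each over $n$ unknowns, so adding a single new datum changes $\mathbf{X}$ and forces all $n$ programs to be solved again.

```latex
% Assumed noise-free SSC formulation; one program per data point x_i.
\begin{equation*}
\min_{\mathbf{c}_{i}} \|\mathbf{c}_{i}\|_{1}
\quad \text{s.t.} \quad
\mathbf{x}_{i} = \mathbf{X}\mathbf{c}_{i}, \;\; c_{ii} = 0,
\qquad i = 1, \dots, n.
\end{equation*}
% The similarity graph is then built from the stacked coefficients,
% e.g. W = |C| + |C|^{\top} with C = [c_1, ..., c_n].
```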