• We address the problem of using hand-drawn sketches to create exaggerated deformations of faces in videos, such as enlarging the shape or modifying the position of the eyes or mouth. This task is formulated as a 3D face model reconstruction and deformation problem. We first recover the facial identity and expressions from the video by fitting a face morphable model to each frame. At the same time, the user's editing intention is recognized from the input sketches as a set of facial modifications. A novel identity deformation algorithm is then proposed to transfer these facial deformations from 2D space to the 3D facial identity directly while preserving the facial expressions. After an optional stage for further refining the 3D face model, these changes are propagated to the whole video with the modified identity. Both the user study and experimental results demonstrate that our sketching framework helps users effectively edit facial identities in videos while ensuring high consistency and fidelity.
  • Lifting is a common manual material handling task performed in workplaces. It is considered one of the main risk factors for Work-related Musculoskeletal Disorders. To improve workplace safety, it is necessary to assess the musculoskeletal and biomechanical risk exposures associated with these tasks, which requires very accurate 3D pose. Existing approaches mainly utilize marker-based sensors to collect 3D information. However, these methods are usually expensive to set up, time-consuming to process, and sensitive to the surrounding environment. In this study, we propose a multi-view based deep perceptron approach to address the aforementioned limitations. Our approach consists of two modules: a "view-specific perceptron" network extracts rich information independently from the image of each view, which includes both 2D shape and hierarchical texture information; while a "multi-view integration" network synthesizes information from all available views to predict accurate 3D pose. To fully evaluate our approach, we carried out comprehensive experiments to compare different variants of our design. The results show that our approach achieves performance comparable to former marker-based methods, i.e., an average error of $14.72 \pm 2.96$ mm on the lifting dataset. The results are also compared with state-of-the-art methods on the HumanEva-I dataset, which demonstrates the superior performance of our approach.
  • We propose a novel method for real-time face alignment in videos based on a recurrent encoder-decoder network model. Our proposed model predicts 2D facial point heat maps regularized by both detection and regression loss, while uniquely exploiting recurrent learning at both spatial and temporal dimensions. At the spatial level, we add a feedback loop connection between the combined output response map and the input, in order to enable iterative coarse-to-fine face alignment using a single network model, instead of relying on traditional cascaded model ensembles. At the temporal level, we first decouple the features in the bottleneck of the network into temporal-variant factors, such as pose and expression, and temporal-invariant factors, such as identity information. Temporal recurrent learning is then applied to the decoupled temporal-variant features. We show that such feature disentangling yields better generalization and significantly more accurate results at test time. We perform a comprehensive experimental analysis, showing the importance of each component of our proposed model, as well as superior results over the state of the art and several variations of our method in standard datasets.
  • In this paper, we present a deep extension of Sparse Subspace Clustering, termed Deep Sparse Subspace Clustering (DSSC). Regularized by the unit sphere distribution assumption for the learned deep features, DSSC can infer a new data affinity matrix by simultaneously satisfying the sparsity principle of SSC and the nonlinearity given by neural networks. One appealing advantage of DSSC is that, when the original real-world data do not meet the class-specific linear subspace distribution assumption, DSSC can employ neural networks to make the assumption valid through its hierarchical nonlinear transformations. To the best of our knowledge, this is among the first deep learning based subspace clustering methods. Extensive experiments are conducted on four real-world datasets to show that the proposed DSSC is significantly superior to 12 existing methods for subspace clustering.
  • Deep neural networks (DNNs) trained on large-scale datasets have recently achieved impressive improvements in face recognition. However, a persistent challenge remains: developing methods capable of handling large pose variations that are relatively underrepresented in training data. This paper presents a method for learning a feature representation that is invariant to pose, without requiring extensive pose coverage in training data. We first propose to generate non-frontal views from a single frontal face, in order to increase the diversity of training data while preserving accurate facial details that are critical for identity discrimination. Our next contribution is to seek a rich embedding that encodes identity features, as well as non-identity ones such as pose and landmark locations. Finally, we propose a new feature reconstruction metric learning approach to explicitly disentangle identity and pose, by demanding alignment between the feature reconstructions through various combinations of identity and pose features, which are obtained from two images of the same subject. Experiments on both controlled and in-the-wild face datasets, such as MultiPIE, 300WLP and the profile view database CFP, show that our method consistently outperforms the state of the art, especially on images with large head pose variations. Detailed results and resources are available at https://sites.google.com/site/xipengcshomepage/iccv2017
  • The exponential growth of mobile data traffic is driving the deployment of dense wireless networks, which will not only impose heavy backhaul burdens, but also generate considerable power consumption. Introducing caches to the wireless network edge is a promising and cost-effective solution to address these challenges. In this paper, we investigate the problem of minimizing the network power consumption of cache-enabled wireless networks, which consists of the base station (BS) and backhaul power consumption. The objective is to develop efficient algorithms that unify adaptive BS selection, backhaul content assignment and multicast beamforming, while taking into account user QoS requirements and backhaul capacity limitations. To address the NP-hardness of the network power minimization problem, we first propose a generalized layered group sparse beamforming (LGSBF) modeling framework, which helps to reveal the layered sparsity structure in the beamformers. By adopting the reweighted $\ell_{1}/\ell_{2}$-norm technique, we further develop a convex approximation procedure for the LGSBF problem, followed by a three-stage iterative LGSBF framework to induce the desired sparsity structure in the beamformers. Simulation results validate the effectiveness of the proposed algorithm in reducing the network power consumption, and demonstrate that caching plays a more significant role in networks with higher user densities and less power-efficient backhaul links.
  • Constructing a similarity graph is a key step in graph-oriented subspace learning and clustering. In a similarity graph, each vertex denotes a data point and the edge weight represents the similarity between two points. There are two popular schemes for constructing a similarity graph, i.e., the pairwise distance based scheme and the linear representation based scheme. Most existing works involve only one of these schemes and suffer from some limitations. Specifically, pairwise distance based methods are more sensitive to noise and outliers than linear representation based methods. On the other hand, linear representation based algorithms may wrongly select inter-subspace points to represent a point, which degrades the performance. In this paper, we propose an algorithm, called Locally Linear Representation (LLR), which integrates pairwise distance with linear representation to address these problems. The proposed algorithm automatically encodes each data point over a set of points that not only represent the target point with a small residual error, but are also close to the point in Euclidean space. The experimental results show that our approach is promising in subspace learning and subspace clustering.
  • Many works have shown that Frobenius-norm based representation (FNR) is competitive with sparse representation and nuclear-norm based representation (NNR) in numerous tasks such as subspace clustering. Despite the success of FNR in experimental studies, little theoretical analysis has been provided to understand its working mechanism. In this paper, we fill this gap by building the theoretical connections between FNR and NNR. More specifically, we prove that: 1) when the dictionary can provide enough representative capacity, FNR is exactly NNR even when the data set contains Gaussian noise, Laplacian noise, or sample-specified corruption; 2) otherwise, FNR and NNR are two solutions on the column space of the dictionary.
  • In this paper, we address two challenging problems in unsupervised subspace learning: 1) how to automatically identify the feature dimension of the learned subspace (i.e., automatic subspace learning), and 2) how to learn the underlying subspace in the presence of Gaussian noise (i.e., robust subspace learning). We show that these two problems can be solved simultaneously by a new method called principal coefficients embedding (PCE). For a given data set $\mathbf{D}\in \mathds{R}^{m\times n}$, PCE recovers a clean data set $\mathbf{D}_{0}\in \mathds{R}^{m\times n}$ from $\mathbf{D}$ and simultaneously learns a global reconstruction relation $\mathbf{C}\in \mathds{R}^{n\times n}$ of $\mathbf{D}_{0}$. By preserving $\mathbf{C}$ into an $m^{\prime}$-dimensional space, the proposed method obtains a projection matrix that can capture the latent manifold structure of $\mathbf{D}_{0}$, where $m^{\prime}\ll m$ is automatically determined by the rank of $\mathbf{C}$ with theoretical guarantees. PCE has three advantages: 1) it can automatically determine the feature dimension even when the data are sampled from a union of multiple linear subspaces in the presence of Gaussian noise; 2) although the objective function of PCE only considers Gaussian noise, experimental results show that it is robust to non-Gaussian noise (e.g., random pixel corruption) and real disguises; 3) it has a closed-form solution and can be computed very fast. Extensive experimental results show the superiority of PCE on a range of databases with respect to classification accuracy, robustness and efficiency.
  • Tracking facial points in unconstrained videos is challenging due to the non-rigid deformation that changes over time. In this paper, we propose to exploit incremental learning for person-specific alignment in in-the-wild conditions. Our approach takes advantage of part-based representation and cascade regression for robust and efficient alignment on each frame. Unlike existing methods that usually rely on models trained offline, we incrementally update the representation subspace and the cascade of regressors in a unified framework to achieve personalized modeling on the fly. To alleviate the drifting issue, the fitting results are evaluated using a deep neural network, and well-aligned faces are picked out to incrementally update the representation and fitting models. Both image and video datasets are employed to validate the proposed method. The results demonstrate the superior performance of our approach compared with existing approaches in terms of fitting accuracy and efficiency.
  • We propose a novel recurrent encoder-decoder network model for real-time video-based face alignment. Our proposed model predicts 2D facial point maps regularized by a regression loss, while uniquely exploiting recurrent learning at both spatial and temporal dimensions. At the spatial level, we add a feedback loop connection between the combined output response map and the input, in order to enable iterative coarse-to-fine face alignment using a single network model. At the temporal level, we first decouple the features in the bottleneck of the network into temporal-variant factors, such as pose and expression, and temporal-invariant factors, such as identity information. Temporal recurrent learning is then applied to the decoupled temporal-variant features, yielding better generalization and significantly more accurate results at test time. We perform a comprehensive experimental analysis, showing the importance of each component of our proposed model, as well as superior results over the state-of-the-art in standard datasets.
  • As mobile services are shifting from "connection-centric" communications to "content-centric" communications, content-centric wireless networking emerges as a promising paradigm to evolve the current network architecture. Caching popular content at the wireless edge, including base stations (BSs) and user terminals (UTs), provides an effective approach to alleviate the heavy burden on backhaul links, as well as to lower delays and deployment costs. In contrast to wired networks, a unique characteristic of content-centric wireless networks (CCWNs) is the mobility of mobile users. While it has rarely been considered by existing works on caching design, user mobility contains various helpful side information that can be exploited to improve caching efficiency at both BSs and UTs. In this paper, we present a general framework for mobility-aware caching in CCWNs. Key properties of user mobility patterns that are useful for content caching are first identified, and then different design methodologies for mobility-aware caching are proposed. Moreover, two design examples are provided to illustrate the proposed framework in detail, and interesting future research directions are identified.
  • Caching popular content at base stations is a powerful supplement to existing limited backhaul links for accommodating the exponentially increasing mobile data traffic. Given the limited cache budget, we investigate the cache size allocation problem in cellular networks to maximize the user success probability (USP), taking wireless channel statistics, backhaul capacities and file popularity distributions into consideration. The USP is defined as the probability that one user can successfully download its requested file either from the local cache or via the backhaul link. We first consider a single-cell scenario and derive a closed-form expression for the USP, which helps reveal the impacts of various parameters, such as the file popularity distribution. More specifically, for a highly concentrated file popularity distribution, the required cache size is independent of the total number of files, while for a less concentrated file popularity distribution, the required cache size grows linearly with the total number of files. Furthermore, we study the multi-cell scenario, and provide a bisection search algorithm to find the optimal cache size allocation. The optimal cache size allocation is verified by simulations, and it is shown to play a more significant role when the file popularity distribution is less concentrated.
  • Under the framework of spectral clustering, the key to subspace clustering is building a similarity graph which describes the neighborhood relations among data points. Some recent works build the graph using sparse, low-rank, and $\ell_2$-norm-based representation, and have achieved state-of-the-art performance. However, these methods suffer from the following two limitations. First, the time complexities of these methods are at least proportional to the cube of the data size, which makes them inefficient for solving large-scale problems. Second, they cannot cope with out-of-sample data that are not used to construct the similarity graph. To cluster each out-of-sample datum, the methods have to recalculate the similarity graph and the cluster membership of the whole data set. In this paper, we propose a unified framework which makes representation-based subspace clustering algorithms feasible for clustering both out-of-sample and large-scale data. Under our framework, the large-scale problem is tackled by converting it into an out-of-sample problem in the manner of "sampling, clustering, coding, and classifying". Furthermore, we give an estimation of the error bounds by treating each subspace as a point in a hyperspace. Extensive experimental results on various benchmark data sets show that our methods outperform several recently proposed scalable methods in clustering large-scale data sets.
  • Spatial Pyramid Matching (SPM) and its variants have achieved a lot of success in image classification. The main difference among them is their encoding schemes. For example, ScSPM incorporates Sparse Code (SC) instead of Vector Quantization (VQ) into the framework of SPM. Although the methods achieve a higher recognition rate than the traditional SPM, they consume more time to encode the local descriptors extracted from the image. In this paper, we propose using Low Rank Representation (LRR) to encode the descriptors under the framework of SPM. Different from SC, LRR considers the group effect among data points instead of sparsity. Benefiting from this property, the proposed method (i.e., LrrSPM) can offer a better performance. To further improve the generalizability and robustness, we reformulate the rank-minimization problem as a truncated projection problem. Extensive experimental studies show that LrrSPM is more efficient than its counterparts (e.g., ScSPM) while achieving competitive recognition rates on nine image data sets.
  • As the capacity demand of mobile applications keeps increasing, the backhaul network is becoming a bottleneck to supporting high quality of experience (QoE) in next-generation wireless networks. Content caching at base stations (BSs) is a promising approach to alleviate the backhaul burden and reduce user-perceived latency. In this paper, we consider a wireless caching network where all the BSs are connected to a central controller via backhaul links. In such a network, users can obtain the required data from candidate BSs if the data are pre-cached. Otherwise, the user data need to be first retrieved from the central controller to local BSs, which introduces extra delay over the backhaul. In order to reduce the download delay, the caching placement strategy needs to be optimized. We formulate such a design problem as the minimization of the average download delay over user requests, subject to the caching capacity constraint of each BS. Unlike existing works, our model takes BS cooperation in the radio access into consideration and is fully aware of the propagation delay on the backhaul links. The design problem is a mixed integer programming problem and is highly complicated, and thus we relax the problem and propose a low-complexity algorithm. Simulation results show that the proposed algorithm can effectively determine the near-optimal caching placement and provide significant performance gains over conventional caching placement strategies.
  • Under the framework of graph-based learning, the key to robust subspace clustering and subspace learning is to obtain a good similarity graph that eliminates the effects of errors and retains only connections between the data points from the same subspace (i.e., intra-subspace data points). Recent works achieve good performance by modeling errors into their objective functions to remove the errors from the inputs. However, these approaches face the limitations that the structure of the errors must be known a priori and that a complex convex problem must be solved. In this paper, we present a novel method to eliminate the effects of the errors from the projection space (representation) rather than from the input space. We first prove that $\ell_1$-, $\ell_2$-, $\ell_{\infty}$-, and nuclear-norm based linear projection spaces share the property of Intra-subspace Projection Dominance (IPD), i.e., the coefficients over intra-subspace data points are larger than those over inter-subspace data points. Based on this property, we introduce a method to construct a sparse similarity graph, called L2-Graph. The subspace clustering and subspace learning algorithms are developed upon L2-Graph. Experiments show that L2-Graph algorithms outperform the state-of-the-art methods for feature extraction, image clustering, and motion segmentation in terms of accuracy, robustness, and time efficiency.
  • The low-dimensional manifold model and sparse representation are two well-known concise models which suggest that each data point can be described by a few characteristics. Manifold learning is usually investigated for dimension reduction by preserving some expected local geometric structures from the original space to a low-dimensional one. The structures are generally determined by using pairwise distance, e.g., Euclidean distance. Alternatively, sparse representation denotes a data point as a linear combination of the points from the same subspace. In practical applications, however, the nearby points in terms of pairwise distance may not belong to the same subspace, and vice versa. Consequently, it is interesting and important to explore how to get a better representation by integrating these two models. To this end, this paper proposes a novel coding algorithm, called Locality-Constrained Collaborative Representation (LCCR), which improves the robustness and discrimination of data representation by introducing a kind of local consistency. The locality term derives from a biological observation that similar inputs have similar codes. The objective function of LCCR has an analytical solution, and it does not involve local minima. The empirical studies based on four public facial databases, ORL, AR, Extended Yale B, and Multiple PIE, show that LCCR is promising in recognizing human faces from frontal views with varying expression and illumination, as well as various corruptions and occlusions.
  • Sparse Subspace Clustering (SSC) has achieved state-of-the-art clustering quality by performing spectral clustering over an $\ell_1$-norm based similarity graph. However, SSC is a transductive method which cannot handle data not used to construct the graph (out-of-sample data). For each new datum, SSC requires solving $n$ optimization problems in $O(n)$ variables to rerun the algorithm over the whole data set, where $n$ is the number of data points. Therefore, it is inefficient to apply SSC in fast online clustering and scalable graphing. In this letter, we propose an inductive spectral clustering algorithm, called inductive Sparse Subspace Clustering (iSSC), which makes SSC feasible for clustering out-of-sample data. iSSC adopts the assumption that high-dimensional data actually lie on a low-dimensional manifold, such that out-of-sample data can be grouped in the embedding space learned from in-sample data. Experimental results show that iSSC is promising in clustering out-of-sample data.
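
The out-of-sample idea that recurs in the subspace clustering abstracts above (encode a new point over already-clustered in-sample data, then classify it) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes the in-sample points already carry cluster labels, it swaps the $\ell_1$ coding of SSC for ridge-regularized ($\ell_2$) coding because that has a closed-form solution, and it assigns the new point to the cluster whose coefficients reconstruct it with the smallest residual. The function name `assign_out_of_sample` and the parameter `lam` are hypothetical.

```python
import numpy as np


def assign_out_of_sample(X_in, labels, y, lam=1e-3):
    """Assign a new point to one of the existing clusters.

    X_in   : (d, n) array, in-sample points stored as columns.
    labels : (n,) array, cluster id of each in-sample point.
    y      : (d,) array, the out-of-sample point.
    lam    : ridge weight (an assumed default, not taken from the abstracts).
    """
    n = X_in.shape[1]
    # Coding step: ridge-regularized least squares over all in-sample points,
    # c = (X^T X + lam * I)^{-1} X^T y.
    c = np.linalg.solve(X_in.T @ X_in + lam * np.eye(n), X_in.T @ y)

    # Classifying step: keep only the coefficients of one cluster at a time
    # and pick the cluster that reconstructs y with the smallest residual.
    best_label, best_residual = None, np.inf
    for k in np.unique(labels):
        mask = labels == k
        residual = np.linalg.norm(y - X_in[:, mask] @ c[mask])
        if residual < best_residual:
            best_label, best_residual = int(k), residual
    return best_label


# Toy data: two orthogonal 2-D subspaces of R^4, three points in each.
X_in = np.array([
    [1.0, 0.0, 1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 1.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 1.0, 0.0, 1.0],
    [0.0, 0.0, 0.0, 0.0, 1.0, 1.0],
])
labels = np.array([0, 0, 0, 1, 1, 1])
new_point = np.array([2.0, -1.0, 0.0, 0.0])  # lies in the first subspace
print(assign_out_of_sample(X_in, labels, new_point))
```

Because only one linear system over the in-sample points is solved per new point, the similarity graph never has to be rebuilt, which is the efficiency argument the abstracts make. With $\ell_1$ coding, the `np.linalg.solve` call would be replaced by a sparse solver.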