• Although tremendous strides have been made in face detection, one of the remaining open challenges is to achieve real-time speed on the CPU as well as maintain high performance, since effective models for face detection tend to be computationally prohibitive. To address this challenge, we propose a novel face detector, named FaceBoxes, with superior performance on both speed and accuracy. Specifically, our method has a lightweight yet powerful network structure that consists of the Rapidly Digested Convolutional Layers (RDCL) and the Multiple Scale Convolutional Layers (MSCL). The RDCL is designed to enable FaceBoxes to achieve real-time speed on the CPU. The MSCL aims at enriching the receptive fields and discretizing anchors over different layers to handle faces of various scales. Besides, we propose a new anchor densification strategy to make different types of anchors have the same density on the image, which significantly improves the recall rate of small faces. As a consequence, the proposed detector runs at 20 FPS on a single CPU core and 125 FPS using a GPU for VGA-resolution images. Moreover, the speed of FaceBoxes is invariant to the number of faces. We comprehensively evaluate this method and present state-of-the-art detection performance on several face detection benchmark datasets, including the AFW, PASCAL face, and FDDB. Code is available at https://github.com/sfzhang15/FaceBoxes
  • Softmax loss is arguably one of the most popular losses to train CNN models for image classification. However, recent works have exposed its limitation on feature discriminability. This paper casts a new viewpoint on the weakness of softmax loss. On the one hand, the CNN features learned using the softmax loss are often inadequately discriminative. We hence introduce a soft-margin softmax function to explicitly encourage the discrimination between different classes. On the other hand, the learned classifier of softmax loss is weak. We propose to assemble multiple such weak classifiers into a strong one, inspired by the recognition that the diversity among weak classifiers is critical to a good ensemble. To achieve the diversity, we adopt the Hilbert-Schmidt Independence Criterion (HSIC). Considering these two aspects in one framework, we design a novel loss, named Ensemble soft-Margin Softmax (EM-Softmax). Extensive experiments on benchmark datasets are conducted to show the superiority of our design over the baseline softmax loss and several state-of-the-art alternatives.
  • Face alignment, which fits a face model to an image and extracts the semantic meanings of facial pixels, has been an important topic in the computer vision community. However, most algorithms are designed for faces in small to medium poses (yaw angle smaller than 45 degrees) and thus lack the ability to align faces in large poses up to 90 degrees. The challenges are three-fold. Firstly, the commonly used landmark face model assumes that all the landmarks are visible and is therefore not suitable for large poses. Secondly, the face appearance varies more drastically across large poses, from the frontal view to the profile view. Thirdly, labelling landmarks in large poses is extremely challenging since the invisible landmarks have to be guessed. In this paper, we propose to tackle these three challenges in a new alignment framework termed 3D Dense Face Alignment (3DDFA), in which a dense 3D Morphable Model (3DMM) is fitted to the image via Cascaded Convolutional Neural Networks. We also utilize 3D information to synthesize face images in profile views to provide abundant samples for training. Experiments on the challenging AFLW database show that the proposed approach achieves significant improvements over the state-of-the-art methods.
  • For object detection, the two-stage approach (e.g., Faster R-CNN) has been achieving the highest accuracy, whereas the one-stage approach (e.g., SSD) has the advantage of high efficiency. To inherit the merits of both while overcoming their disadvantages, in this paper, we propose a novel single-shot based detector, called RefineDet, that achieves better accuracy than two-stage methods and maintains efficiency comparable to one-stage methods. RefineDet consists of two inter-connected modules, namely, the anchor refinement module and the object detection module. Specifically, the former aims to (1) filter out negative anchors to reduce the search space for the classifier, and (2) coarsely adjust the locations and sizes of anchors to provide better initialization for the subsequent regressor. The latter module takes the refined anchors as the input from the former to further improve the regression and predict multi-class labels. Meanwhile, we design a transfer connection block to transfer the features in the anchor refinement module to predict locations, sizes and class labels of objects in the object detection module. The multi-task loss function enables us to train the whole network in an end-to-end way. Extensive experiments on PASCAL VOC 2007, PASCAL VOC 2012, and MS COCO demonstrate that RefineDet achieves state-of-the-art detection accuracy with high efficiency. Code is available at https://github.com/sfzhang15/RefineDet
  • We find a new scaling invariance of the barotropic compressible Navier-Stokes equations. Then it is shown that type I singularities of solutions with $$\limsup_{t \nearrow T}|{\rm div} u(t, x)|(T - t) \leq \kappa,$$ can never happen at time $T$ for any adiabatic number $\gamma \geq 1$. Here $\kappa > 0$ does not depend on the initial data. This is achieved by proving the regularity of solutions under $$\rho(t, x) \leq \frac{M}{(T - t)^\kappa},\quad M < \infty.$$ This new scaling invariance also motivates us to construct an explicit type II blowup solution for $\gamma > 1$.
  • This paper presents a real-time face detector, named Single Shot Scale-invariant Face Detector (S$^3$FD), which performs well on various scales of faces with a single deep neural network, especially for small faces. Specifically, we try to solve the common problem that anchor-based detectors deteriorate dramatically as the objects become smaller. We make contributions in the following three aspects: 1) proposing a scale-equitable face detection framework to handle different scales of faces well. We tile anchors on a wide range of layers to ensure that all scales of faces have enough features for detection. Besides, we design anchor scales based on the effective receptive field and a proposed equal proportion interval principle; 2) improving the recall rate of small faces by a scale compensation anchor matching strategy; 3) reducing the false positive rate of small faces via a max-out background label. As a consequence, our method achieves state-of-the-art detection performance on all the common face detection benchmarks, including the AFW, PASCAL face, FDDB and WIDER FACE datasets, and can run at 36 FPS on an NVIDIA Titan X (Pascal) for VGA-resolution images.
  • We consider the conditional regularity of a mild solution $v$ to the incompressible Navier-Stokes equations in three dimensions. Let $e \in \mathbb{S}^2$ and $0 < T^\ast < \infty$. J. Chemin and P. Zhang \cite{CP} proved the regularity of $v$ on $(0,T^\ast]$ if there exists $p \in (4, 6)$ such that $$\int_0^{T^\ast}\|v\cdot e\|^p_{\dot{H}^{\frac{1}{2}+\frac{2}{p}}}dt < \infty.$$ J. Chemin, P. Zhang and Z. F. Zhang \cite{CPZ} extended the range of $p$ to $(4, \infty)$. In this article we settle the case $p \in [2, 4]$. Our proof also works for the case $p \in (4,\infty)$.
  • Color-name-based image representation has been used successfully in person re-identification because it is compact, intuitively understandable, and robust to photometric variance. However, there is a discrepancy between the underlying distribution of color names' RGB values and that of image pixels' RGB values, which may lead to inaccuracy when directly comparing them in Euclidean space. In this paper, we propose a new method named soft Gaussian mapping (SGM) to address this problem. We model the discrepancies between color names and pixels using a Gaussian and utilize the inverse of the covariance matrix to bridge the gap between them. Based on SGM, an image can be converted to several soft Gaussian maps. In each soft Gaussian map, we further seek to establish stable and robust descriptors within a local region through a max pooling operation. Then, a robust image representation based on color names is obtained by concatenating the statistical descriptors in each stripe. When labeled data are available, one discriminative subspace projection matrix is learned to build efficient representations of an image via cross-view coupling learning. Experiments on the public datasets VIPeR, PRID450S and CUHK03 demonstrate the effectiveness of our method.
  • This paper studies the inviscid limit of the two-dimensional incompressible viscoelasticity, which is a system coupling a Navier-Stokes equation with a transport equation for the deformation tensor. The existence of global smooth solutions near the equilibrium with a fixed positive viscosity has been known since the work of F. H. Lin, C. Liu, and P. Zhang in "On hydrodynamics of viscoelastic fluids". The inviscid case was solved recently by the second author Z. Lei in "Global well-posedness of incompressible elastodynamics in two dimensions". While the latter was based solely on techniques from the study of hyperbolic equations, and hence the 2D problem is in general more challenging than that in higher dimensions, the former relied crucially upon a dissipative mechanism. Indeed, after a symmetrization and a linearization around the equilibrium, the system of the incompressible viscoelasticity reduces to an incompressible system of damped wave equations for both the fluid velocity and the deformation tensor. These two approaches are not compatible. In this paper, we prove global existence of solutions, uniformly in both time $t \in [0, \infty)$ and viscosity $\mu \geq 0$. This allows us to justify in particular the vanishing viscosity limit for all time. In order to overcome difficulties coming from the incompatibility between the purely hyperbolic limiting system and the systems with additional parabolic viscous perturbations, we introduce in this paper a rather robust method which may apply to a wide class of physical systems of similar nature. Roughly speaking, the method works in the two-dimensional case whenever the hyperbolic system satisfies intrinsically a "Strong Null Condition". For dimensions not less than three, the usual null condition is sufficient for this method to work.
  • Person Re-IDentification (Re-ID) aims to match person images captured from two non-overlapping cameras. In this paper, a deep hybrid similarity learning (DHSL) method for person Re-ID based on a convolutional neural network (CNN) is proposed. In our approach, a pair of CNN-learned features is extracted simultaneously for the input image pair. Then, both the element-wise absolute difference and the element-wise multiplication of the feature pair are calculated. Finally, a hybrid similarity function is designed to measure the similarity between the feature pair, which is realized by learning a group of weight coefficients to project the element-wise absolute difference and multiplication into a similarity score. Consequently, the proposed DHSL method is able to reasonably assign parameters of feature learning and metric learning in a CNN so that the performance of person Re-ID is improved. Experiments on three challenging person Re-ID databases, QMUL GRID, VIPeR and CUHK03, illustrate that the proposed DHSL method is superior to multiple state-of-the-art person Re-ID methods.
  • In this paper, we consider the Liouville property for ancient solutions of the incompressible Navier-Stokes equations. In 2D and the 3D axially symmetric case without swirl, we prove sharp Liouville theorems for smooth ancient mild solutions: velocity fields $v$ are constants if vorticity fields satisfy a certain condition and $v$ are sublinear with respect to spatial variables, and we also give counterexamples when $v$ are linear with respect to spatial variables. The condition which vorticity fields need to satisfy is $\lim\limits_{|x|\rightarrow +\infty}|w(x,t)|=0$ and $\lim\limits_{r\rightarrow +\infty}\frac{|w|}{\sqrt{x_1^2+x_2^2}}=0$ uniformly for all $t\in(-\infty,0)$ in the 2D case and the 3D axially symmetric case without swirl, respectively. In the case when solutions are axially symmetric with nontrivial swirl, we prove that if $\Gamma=rv_\theta\in L^\infty_tL^p_x(\mathbb{R}^3\times(-\infty,0))$ where $1\leq p<\infty$, then bounded ancient mild solutions are constants.
  • Person re-identification is challenging due to the large variations of pose, illumination, occlusion and camera view. Owing to these variations, the pedestrian data is distributed as highly-curved manifolds in the feature space, despite the current convolutional neural networks (CNN)'s capability of feature extraction. However, the distribution is unknown, so it is difficult to use the geodesic distance when comparing two samples. In practice, the current deep embedding methods use the Euclidean distance for training and testing. On the other hand, the manifold learning methods suggest using the Euclidean distance in the local range, combined with the graphical relationship between samples, to approximate the geodesic distance. From this point of view, selecting suitable positive (i.e., intra-class) training samples within a local range is critical for training the CNN embedding, especially when the data has large intra-class variations. In this paper, we propose a novel moderate positive sample mining method to train robust CNN for person re-identification, dealing with the problem of large variation. In addition, we improve the learning by a metric weight constraint, so that the learned metric has a better generalization ability. Experiments show that these two strategies are effective in learning robust deep metrics for person re-identification, and accordingly our deep model significantly outperforms the state-of-the-art methods on several benchmarks of person re-identification. Therefore, the study presented in this paper may be useful in inspiring new designs of deep models for person re-identification.
  • We consider the evolution of two incompressible, immiscible fluids with different densities in porous media, known as the Muskat problem [21], which in two dimensions is analogous to the Hele-Shaw cell [26]. We establish, for a class of large and monotone initial data, the global existence of weak solutions. The proof is based on a local well-posedness result for the initial data with certain specific asymptotics at spatial infinity and a new maximum principle for the first derivative of the graph function.
  • In recent years, numerous effective multi-object tracking (MOT) methods have been developed because of the wide range of applications. Existing performance evaluations of MOT methods usually separate the object tracking step from the object detection step by using the same fixed object detection results for comparisons. In this work, we perform a comprehensive quantitative study on the effects of object detection accuracy on the overall MOT performance, using the new large-scale University at Albany DETection and tRACking (UA-DETRAC) benchmark dataset. The UA-DETRAC benchmark dataset consists of 100 challenging video sequences captured from real-world traffic scenes (over 140,000 frames with rich annotations, including occlusion, weather, vehicle category, truncation, and vehicle bounding boxes) for object detection, object tracking and MOT systems. We evaluate complete MOT systems constructed from combinations of state-of-the-art object detection and object tracking methods. Our analysis shows the complex effects of object detection accuracy on MOT system performance. Based on these observations, we propose new evaluation tools and metrics for MOT systems that consider both object detection and object tracking for comprehensive analysis.
  • For any $A > 2$, we construct solutions to the two-dimensional incompressible Euler equations on the torus $\mathbb{T}^2$ whose vorticity gradient $\nabla\omega$ grows exponentially in time: $$\|\nabla\omega(t, \cdot)\|_{L^\infty} \gtrsim e^{At},\quad \forall\ t \geq 0.$$
  • Deep neural networks usually benefit from unsupervised pre-training, e.g. auto-encoders. However, the classifier further needs supervised fine-tuning methods for good discrimination. Besides, due to the limits of full-connection, the application of auto-encoders is usually limited to small, well aligned images. In this paper, we incorporate the supervised information to propose a novel formulation, namely class-encoder, whose training objective is to reconstruct a sample from another one with an identical label. Class-encoder aims to minimize the intra-class variations in the feature space, and to learn good discriminative manifolds on a class scale. We impose the class-encoder as a constraint into the softmax for better supervised training, and extend the reconstruction to the feature level to tackle the parameter size issue and translation issue. The experiments show that the class-encoder helps to improve the performance on benchmarks of classification and face recognition. This could also be a promising direction for fast training of face recognition models.
  • This paper studies the Cauchy problem of the incompressible magnetohydrodynamic systems with or without viscosity $\nu$. Under the assumption that the initial velocity field and the displacement of the initial magnetic field from a non-zero constant are sufficiently small in certain weighted Sobolev spaces, the Cauchy problem is shown to be globally well-posed for all $\nu \geq 0$ and all space dimensions $n \geq 2$. Such a result holds true uniformly in the nonnegative viscosity parameter. The proof is based on the inherent strong null structure of the systems, which was first introduced for incompressible elastodynamics by the second author in \cite{Lei14}, and on Alinhac's ghost weight technique.
  • Object detection is a fundamental problem in image understanding. One popular solution is the R-CNN framework and its fast versions. They decompose the object detection problem into two cascaded easier tasks: 1) generating object proposals from images, 2) classifying proposals into various object categories. Although these are two relatively easier tasks, they are not solved perfectly and there is still room for improvement. In this paper, we push the "divide and conquer" solution even further by dividing each task into two sub-tasks. We call the proposed method "CRAFT" (Cascade Region-proposal-network And FasT-rcnn), which tackles each task with a carefully designed network cascade. We show that the cascade structure helps in both tasks: in proposal generation, it provides more compact and better localized object proposals; in object classification, it reduces false positives (mainly between ambiguous categories) by capturing both inter- and intra-category variances. CRAFT achieves consistent and considerable improvement over the state-of-the-art on object detection benchmarks like PASCAL VOC 07/12 and ILSVRC.
  • Masses of the three generations of charged leptons are known to satisfy Koide's mass relation. But the question remains whether such a relation exists for neutrinos. In this paper, by considering the seesaw mechanism as the mechanism generating tiny neutrino masses, we show how neutrinos satisfy Koide's mass relation, on the basis of which we systematically give exact values of not only left-handed but also right-handed neutrino masses.
  • We prove that for sufficiently small initial displacements in some weighted Sobolev space, the Cauchy problem of the systems of incompressible isotropic elastodynamics in two space dimensions admits a unique global classical solution.
  • Person re-identification aims to re-identify the probe image from a given set of images under different camera views. It is challenging due to large variations of pose, illumination, occlusion and camera view. Since the convolutional neural networks (CNN) have excellent capability of feature extraction, certain deep learning methods have been recently applied in person re-identification. However, in person re-identification, the deep networks often suffer from the over-fitting problem. In this paper, we propose a novel CNN-based method to learn a discriminative metric with good robustness to the over-fitting problem in person re-identification. Firstly, a novel deep architecture is built where the Mahalanobis metric is learned with a weight constraint. This weight constraint is used to regularize the learning, so that the learned metric has a better generalization ability. Secondly, we find that the selection of intra-class sample pairs is crucial for learning but has received little attention. To cope with the large intra-class variations in pedestrian images, we propose a novel training strategy named moderate positive mining to prevent the training process from over-fitting to the extreme samples in intra-class pairs. Experiments show that our approach significantly outperforms state-of-the-art methods on several benchmarks of person re-identification.
  • Face alignment, which fits a face model to an image and extracts the semantic meanings of facial pixels, has been an important topic in the CV community. However, most algorithms are designed for faces in small to medium poses (below 45 degrees), lacking the ability to align faces in large poses up to 90 degrees. The challenges are three-fold: Firstly, the commonly used landmark-based face model assumes that all the landmarks are visible and is therefore not suitable for profile views. Secondly, the face appearance varies more dramatically across large poses, ranging from frontal view to profile view. Thirdly, labelling landmarks in large poses is extremely challenging since the invisible landmarks have to be guessed. In this paper, we propose a solution to the three problems in a new alignment framework, called 3D Dense Face Alignment (3DDFA), in which a dense 3D face model is fitted to the image via a convolutional neural network (CNN). We also propose a method to synthesize large-scale training samples in profile views to solve the third problem of data labelling. Experiments on the challenging AFLW database show that our approach achieves significant improvements over state-of-the-art methods.
  • This article concerns the time growth of Sobolev norms of classical solutions to the 3D incompressible isotropic elastodynamics with small initial displacements.
  • Deep learning methods are powerful tools but often suffer from expensive computation and limited flexibility. An alternative is to combine light-weight models with deep representations. Although successful cases exist in several visual problems, a unified framework is absent. In this paper, we revisit two widely used approaches in computer vision, namely filtered channel features and Convolutional Neural Networks (CNN), and absorb merits from both by proposing an integrated method called Convolutional Channel Features (CCF). CCF transfers low-level features from pre-trained CNN models to feed the boosting forest model. With the combination of CNN features and boosting forest, CCF benefits from the richer capacity in feature representation compared with channel features, as well as lower cost in computation and storage compared with end-to-end CNN methods. We show that CCF serves as a good way of tailoring pre-trained CNN models to diverse tasks without fine-tuning the whole network to each task by achieving state-of-the-art performances in pedestrian detection, face detection, edge detection and object proposal generation.
  • Smooth solutions to the axi-symmetric Navier-Stokes equations obey the following maximum principle: $$\sup_{t\geq 0}\|rv^\theta(t, \cdot)\|_{L^\infty} \leq \|rv^\theta(0, \cdot)\|_{L^\infty}.$$ We prove that all solutions with initial data in $H^{\frac{1}{2}}$ are smooth globally in time if $rv^\theta$ satisfies a kind of Form Boundedness Condition (FBC) which is invariant under the natural scaling of the Navier-Stokes equations. In particular, if $rv^\theta$ satisfies \begin{equation}\nonumber \sup_{t \geq 0}|rv^\theta(t, r, z)| \leq C_\ast|\ln r|^{- 2},\ \ r \leq \delta_0 \in (0, \frac{1}{2}),\ C_\ast < \infty, \end{equation} then our FBC is satisfied. Here $\delta_0$ and $C_\ast$ are independent of both the profile and the norm of the initial data. So the gap from regularity is logarithmic in nature. We also prove the global regularity of solutions if $\|rv^\theta(0, \cdot)\|_{L^\infty}$ or $\sup_{t \geq 0}\|rv^\theta(t, \cdot)\|_{L^\infty(r \leq r_0)}$ is small, but the smallness depends on a certain dimensionless quantity of the initial data.
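
The scale invariance that the last abstract relies on can be checked in a few lines. The following is a short verification sketch, assuming the standard Navier-Stokes scaling $v_\lambda(t, x) = \lambda v(\lambda^2 t, \lambda x)$; the symbols $\lambda$, $v_\lambda$, $\Gamma$ and $\Gamma_\lambda$ are introduced here for illustration and are not taken from the abstract.

```latex
\begin{align*}
% Natural scaling of the Navier-Stokes equations, for any \lambda > 0:
v_\lambda(t, x) &= \lambda\, v(\lambda^2 t, \lambda x), \\
% In cylindrical coordinates (r, \theta, z) the swirl component scales the same way:
v_\lambda^\theta(t, r, z) &= \lambda\, v^\theta(\lambda^2 t, \lambda r, \lambda z), \\
% Hence \Gamma := r v^\theta only has its arguments rescaled:
\Gamma_\lambda(t, r, z) := r\, v_\lambda^\theta(t, r, z)
  &= (\lambda r)\, v^\theta(\lambda^2 t, \lambda r, \lambda z)
   = \Gamma(\lambda^2 t, \lambda r, \lambda z), \\
% Taking the supremum over space and time removes the rescaling:
\sup_{t \geq 0}\|\Gamma_\lambda(t, \cdot)\|_{L^\infty}
  &= \sup_{t \geq 0}\|\Gamma(t, \cdot)\|_{L^\infty}.
\end{align*}
```

Under this reading, $\|rv^\theta\|_{L^\infty}$ is a scaling-critical (dimensionless) quantity, which is consistent with the abstract stating its conditions on $rv^\theta$ rather than on $v^\theta$ alone.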