
Although tremendous strides have been made in face detection, one of the
remaining open challenges is to achieve real-time speed on the CPU as well as
maintain high performance, since effective models for face detection tend to be
computationally prohibitive. To address this challenge, we propose a novel face
detector, named FaceBoxes, with superior performance on both speed and
accuracy. Specifically, our method has a lightweight yet powerful network
structure that consists of the Rapidly Digested Convolutional Layers (RDCL) and
the Multiple Scale Convolutional Layers (MSCL). The RDCL is designed to enable
FaceBoxes to achieve real-time speed on the CPU. The MSCL aims at enriching the
receptive fields and discretizing anchors over different layers to handle faces
of various scales. Besides, we propose a new anchor densification strategy to
make different types of anchors have the same density on the image, which
significantly improves the recall rate of small faces. As a consequence, the
proposed detector runs at 20 FPS on a single CPU core and 125 FPS using a GPU
for VGA-resolution images. Moreover, the speed of FaceBoxes is invariant to the
number of faces. We comprehensively evaluate this method and present
state-of-the-art detection performance on several face detection benchmark
datasets, including the AFW, PASCAL face, and FDDB. Code is available at
https://github.com/sfzhang15/FaceBoxes
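
The anchor densification idea can be illustrated with a minimal sketch. It is not the authors' code: it assumes that anchors are tiled on a regular grid with stride `interval`, and that densifying an anchor type `n` times means replacing each anchor center with an `n x n` grid of centers spread evenly inside the stride cell. The function name `densified_centers` and the numbers in the usage line are hypothetical.

```python
import numpy as np

def densified_centers(cx, cy, interval, n):
    """Return n*n anchor centers evenly spread inside one stride cell.

    (cx, cy) is the original anchor center, `interval` the tiling stride in
    pixels, and n the densification factor (n = 1 keeps the original center).
    All names here are illustrative assumptions, not the paper's API.
    """
    # Offsets that split the cell into n equal parts along each axis,
    # centred on the original anchor center.
    offsets = (np.arange(n) + 0.5) / n * interval - interval / 2.0
    xs, ys = np.meshgrid(cx + offsets, cy + offsets)
    return np.stack([xs.ravel(), ys.ravel()], axis=1)

# Example: densify a stride-32 anchor 4 times (values chosen for illustration).
centers = densified_centers(cx=16.0, cy=16.0, interval=32, n=4)
```

Under these assumptions the densified anchors are spaced `interval / n` pixels apart, so a small anchor type can be given the same spatial density as a larger one.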

Softmax loss is arguably one of the most popular losses to train CNN models
for image classification. However, recent works have exposed its limitation on
feature discriminability. This paper casts a new viewpoint on the weakness of
softmax loss. On the one hand, the CNN features learned using the softmax loss
are often inadequately discriminative. We hence introduce a soft-margin softmax
function to explicitly encourage the discrimination between different classes.
On the other hand, the learned classifier of softmax loss is weak. We propose
to assemble multiple such weak classifiers into a strong one, inspired by the
recognition that the diversity among weak classifiers is critical to a good
ensemble. To achieve the diversity, we adopt the Hilbert-Schmidt Independence
Criterion (HSIC). Considering these two aspects in one framework, we design a
novel loss, named Ensemble soft-Margin Softmax (EM-Softmax). Extensive
experiments on benchmark datasets are conducted to show the superiority of our
design over the baseline softmax loss and several state-of-the-art
alternatives.
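
For readers unfamiliar with HSIC, the following is a minimal sketch of the standard biased empirical estimator, tr(KHLH) / (n - 1)^2, here with linear kernels. How the criterion is applied to the weak classifiers is not described in the abstract, so this only illustrates the dependence measure itself; the function name `hsic_linear` and the random data are assumptions made for the example.

```python
import numpy as np

def hsic_linear(X, Y):
    """Biased empirical HSIC between paired samples X (n x p) and Y (n x q).

    Uses linear kernels; larger values indicate stronger dependence.
    """
    n = X.shape[0]
    K = X @ X.T                             # linear kernel on X
    L = Y @ Y.T                             # linear kernel on Y
    H = np.eye(n) - np.ones((n, n)) / n     # centering matrix
    return np.trace(K @ H @ L @ H) / (n - 1) ** 2

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
dependent = hsic_linear(X, 2.0 * X)                       # Y is a function of X
independent = hsic_linear(X, rng.normal(size=(200, 3)))   # Y drawn independently
```

A diversity term built on such a measure would push the value towards the `independent` regime, i.e. towards small HSIC between ensemble members.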

Face alignment, which fits a face model to an image and extracts the semantic
meanings of facial pixels, has been an important topic in the computer vision
community. However, most algorithms are designed for faces in small to medium
poses (yaw angle smaller than 45 degrees) and lack the ability to align
faces in large poses up to 90 degrees. The challenges are threefold. Firstly,
the commonly used landmark face model assumes that all the landmarks are
visible and is therefore not suitable for large poses. Secondly, the face
appearance varies more drastically across large poses, from the frontal view to
the profile view. Thirdly, labelling landmarks in large poses is extremely
challenging since the invisible landmarks have to be guessed. In this paper, we
propose to tackle these three challenges in a new alignment framework termed
3D Dense Face Alignment (3DDFA), in which a dense 3D Morphable Model (3DMM) is
fitted to the image via Cascaded Convolutional Neural Networks. We also utilize
3D information to synthesize face images in profile views to provide abundant
samples for training. Experiments on the challenging AFLW database show that
the proposed approach achieves significant improvements over the
state-of-the-art methods.

For object detection, the two-stage approach (e.g., Faster R-CNN) has been
achieving the highest accuracy, whereas the one-stage approach (e.g., SSD) has
the advantage of high efficiency. To inherit the merits of both while
overcoming their disadvantages, in this paper, we propose a novel single-shot
based detector, called RefineDet, that achieves better accuracy than two-stage
methods and maintains efficiency comparable to one-stage methods. RefineDet
consists of two interconnected modules, namely, the anchor refinement module
and the object detection module. Specifically, the former aims to (1) filter
out negative anchors to reduce search space for the classifier, and (2)
coarsely adjust the locations and sizes of anchors to provide better
initialization for the subsequent regressor. The latter module takes the
refined anchors as the input from the former to further improve the regression
and predict multi-class labels. Meanwhile, we design a transfer connection block
to transfer the features in the anchor refinement module to predict locations,
sizes and class labels of objects in the object detection module. The
multi-task loss function enables us to train the whole network in an end-to-end
way. Extensive experiments on PASCAL VOC 2007, PASCAL VOC 2012, and MS COCO
demonstrate that RefineDet achieves state-of-the-art detection accuracy with
high efficiency. Code is available at https://github.com/sfzhang15/RefineDet
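
The first role of the anchor refinement module, filtering out negative anchors, can be sketched as follows. This is an illustration under stated assumptions rather than the released implementation: it assumes the module outputs a per-anchor background confidence, and the function name `filter_negative_anchors`, the `threshold` value and the toy boxes are all hypothetical.

```python
import numpy as np

def filter_negative_anchors(anchors, background_prob, threshold=0.99):
    """Drop anchors that a first-stage classifier marks as confident background.

    anchors: (N, 4) boxes; background_prob: (N,) background confidence in [0, 1].
    `threshold` is an illustrative value, not taken from the abstract.
    """
    keep = background_prob <= threshold
    return anchors[keep], keep

# Toy example with three anchors, two of which are confident background.
anchors = np.array([[0, 0, 10, 10], [5, 5, 20, 20], [8, 8, 30, 30]], dtype=float)
background_prob = np.array([0.999, 0.40, 0.995])
refined, keep = filter_negative_anchors(anchors, background_prob)
```

Only the surviving anchors would then be passed on to the object detection module, which is what reduces the search space for the classifier.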

We find a new scaling invariance of the barotropic compressible Navier-Stokes
equations. Then it is shown that type I singularities of solutions with
$$\limsup_{t \nearrow T}{\rm div} u(t, x)(T - t) \leq \kappa,$$ can never
happen at time $T$ for all adiabatic number $\gamma \geq 1$. Here $\kappa > 0$
doesn't depend on the initial data. This is achieved by proving the regularity
of solutions under $$\rho(t, x) \leq \frac{M}{(T - t)^\kappa},\quad M <
\infty.$$ This new scaling invariance also motivates us to construct an
explicit type II blowup solution for $\gamma > 1$.

This paper presents a real-time face detector, named Single Shot
Scale-invariant Face Detector (S$^3$FD), which performs superiorly on various
scales of faces with a single deep neural network, especially for small faces.
Specifically, we try to solve the common problem that anchor-based detectors
deteriorate dramatically as the objects become smaller. We make contributions
in the following three aspects: 1) proposing a scale-equitable face detection
framework to handle different scales of faces well. We tile anchors on a wide
range of layers to ensure that all scales of faces have enough features for
detection. Besides, we design anchor scales based on the effective receptive
field and a proposed equal proportion interval principle; 2) improving the
recall rate of small faces by a scale compensation anchor matching strategy; 3)
reducing the false positive rate of small faces via a max-out background label.
As a consequence, our method achieves state-of-the-art detection performance on
all the common face detection benchmarks, including the AFW, PASCAL face, FDDB
and WIDER FACE datasets, and can run at 36 FPS on an Nvidia Titan X (Pascal) for
VGA-resolution images.
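
One plausible reading of the max-out background label is sketched below. It assumes that several background scores are predicted per anchor, that only the largest is kept, and that it then competes with the face score in a two-way softmax; the abstract does not spell this out, so the function name `maxout_face_probability` and the toy logits are assumptions.

```python
import numpy as np

def maxout_face_probability(background_logits, face_logit):
    """Face probability when several background scores are predicted.

    background_logits: array of shape (num_anchors, num_bg) with num_bg >= 1.
    face_logit: array of shape (num_anchors,).
    """
    # Max-out: keep only the highest background score per anchor.
    background = background_logits.max(axis=1)
    # Two-way softmax between the max-out background score and the face score.
    logits = np.stack([background, face_logit], axis=1)
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
    return probs[:, 1]

p_face = maxout_face_probability(
    np.array([[0.0, 2.0, 1.0], [-1.0, -3.0, -2.0]]),
    np.array([2.0, 1.0]),
)
```

Because the strongest background response is the one that competes with the face score, an anchor is accepted as a face only if it beats every background hypothesis, which is how such a scheme would lower false positives.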

We consider the conditional regularity of the mild solution $v$ to the
incompressible Navier-Stokes equations in three dimensions. Let $e \in
\mathbb{S}^2$ and $0 < T^\ast < \infty$. J. Chemin and P. Zhang \cite{CP}
proved the regularity of $v$ on $(0,T^\ast]$ if there exists $p \in (4, 6)$
such that $$\int_0^{T^\ast}\|v\cdot e\|^p_{\dot{H}^{\frac{1}{2}+\frac{2}{p}}}dt
< \infty.$$ J. Chemin, P. Zhang and Z. F. Zhang \cite{CPZ} extended the range
of $p$ to $(4, \infty)$. In this article we settle the case $p \in [2, 4]$. Our
proof also works for the case $p \in (4,\infty)$.

Color names based image representation is successfully used in person
re-identification, due to the advantages of being compact, intuitively
understandable as well as being robust to photometric variance. However, there
is a discrepancy between the underlying distribution of color names' RGB values
and that of image pixels' RGB values, which may lead to inaccuracy when
directly comparing them in Euclidean space. In this paper, we propose a new
method named soft Gaussian mapping (SGM) to address this problem. We model the
discrepancies between color names and pixels using a Gaussian and utilize the
inverse of covariance matrix to bridge the gap between them. Based on SGM, an
image could be converted to several soft Gaussian maps. In each soft Gaussian
map, we further seek to establish stable and robust descriptors within a local
region through a max pooling operation. Then, a robust image representation
based on color names is obtained by concatenating the statistical descriptors
in each stripe. When labeled data are available, one discriminative subspace
projection matrix is learned to build efficient representations of an image via
cross-view coupling learning. Experiments on the public datasets VIPeR,
PRID450S and CUHK03 demonstrate the effectiveness of our method.

This paper studies the inviscid limit of the two-dimensional incompressible
viscoelasticity, which is a system coupling a Navier-Stokes equation with a
transport equation for the deformation tensor. The existence of global smooth
solutions near the equilibrium with a fixed positive viscosity has been known since
the work of F. H. Lin, C. Liu, and P. Zhang in "On hydrodynamics of
viscoelastic fluids". The inviscid case was solved recently by the second
author Z. Lei in "Global well-posedness of incompressible elastodynamics in
two dimensions". While the latter was solely based on the techniques from the
studies of hyperbolic equations, and hence the 2D problem is in general more
challenging than that in higher dimensions, the former relied crucially upon
a dissipative mechanism. Indeed, after a symmetrization and a linearization
around the equilibrium, the system of the incompressible viscoelasticity
reduces to an incompressible system of damped wave equations for both the fluid
velocity and the deformation tensor. These two approaches are not compatible.
In this paper, we prove global existence of solutions, uniformly in both time
$t \in [0, \infty)$ and viscosity $\mu \geq 0$. This allows us to justify in
particular the vanishing viscosity limit for all time. In order to overcome
difficulties coming from the incompatibility between the purely hyperbolic
limiting system and the systems with additional parabolic viscous
perturbations, we introduce in this paper a rather robust method which may
apply to a wide class of physical systems of similar nature. Roughly speaking,
the method works in the two-dimensional case whenever the hyperbolic system
satisfies intrinsically a "Strong Null Condition". For dimensions not less than
three, the usual null condition is sufficient for this method to work.

Person Re-IDentification (Re-ID) aims to match person images captured from
two non-overlapping cameras. In this paper, a deep hybrid similarity learning
(DHSL) method for person Re-ID based on a convolutional neural network (CNN) is
proposed. In our approach, a CNN learning feature pair for the input image pair
is simultaneously extracted. Then, both the element-wise absolute difference
and multiplication of the CNN learning feature pair are calculated. Finally, a
hybrid similarity function is designed to measure the similarity between the
feature pair, which is realized by learning a group of weight coefficients to
project the element-wise absolute difference and multiplication into a
similarity score. Consequently, the proposed DHSL method is able to reasonably
assign parameters of feature learning and metric learning in a CNN so that the
performance of person Re-ID is improved. Experiments on three challenging
person Re-ID databases, QMUL GRID, VIPeR and CUHK03, illustrate that the
proposed DHSL method is superior to multiple state-of-the-art person Re-ID
methods.
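
The hybrid similarity described above can be sketched in a few lines. This is a simplified illustration, assuming a single linear projection of the two element-wise terms; in the paper the weight coefficients are learned, whereas here `w_diff`, `w_mul` and `bias` are hand-picked hypothetical values and `hybrid_similarity` is an illustrative name.

```python
import numpy as np

def hybrid_similarity(f1, f2, w_diff, w_mul, bias=0.0):
    """Project element-wise |f1 - f2| and f1 * f2 into one similarity score."""
    diff = np.abs(f1 - f2)   # element-wise absolute difference
    mul = f1 * f2            # element-wise multiplication
    return float(w_diff @ diff + w_mul @ mul + bias)

# Hand-picked weights: penalise differences, reward agreement.
w_diff = -np.ones(2)
w_mul = np.ones(2)
same = hybrid_similarity(np.array([1.0, 2.0]), np.array([1.0, 2.0]), w_diff, w_mul)
different = hybrid_similarity(np.array([1.0, 0.0]), np.array([0.0, 1.0]), w_diff, w_mul)
```

With these toy weights, identical features score higher than orthogonal ones, which is the behaviour a learned similarity function of this form is meant to capture.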

In this paper, we consider the Liouville property for ancient solutions of
the incompressible Navier-Stokes equations. In 2D and the 3D axially symmetric
case without swirl, we prove sharp Liouville theorems for smooth ancient mild
solutions: velocity fields $v$ are constants if vorticity fields satisfy
a certain condition and $v$ are sublinear with respect to spatial variables, and
we also give counterexamples when $v$ are linear with respect to spatial
variables. The condition which vorticity fields need to satisfy is
$\lim\limits_{|x|\rightarrow +\infty}|w(x,t)|=0$ and
$\lim\limits_{r\rightarrow +\infty}\frac{|w|}{\sqrt{x_1^2+x_2^2}}=0$
uniformly for all $t\in(-\infty,0)$ in 2D and 3D axially symmetric case without
swirl, respectively.
In the case when solutions are axially symmetric with nontrivial swirl, we
prove that if $\Gamma=rv_\theta\in
L^\infty_tL^p_x(\mathbb{R}^3\times(-\infty,0))$ where $1\leq p<\infty$, then
bounded ancient mild solutions are constants.

Person re-identification is challenging due to the large variations of pose,
illumination, occlusion and camera view. Owing to these variations, the
pedestrian data is distributed as highly-curved manifolds in the feature space,
despite the feature-extraction capability of current convolutional neural
networks (CNNs). However, the distribution is unknown, so it is difficult to use the
geodesic distance when comparing two samples. In practice, the current deep
embedding methods use the Euclidean distance for training and testing. On the
other hand, the manifold learning methods suggest using the Euclidean distance
in the local range, combined with the graphical relationship between samples,
for approximating the geodesic distance. From this point of view, selecting
suitable positive (i.e. intra-class) training samples within a local range is
critical for training the CNN embedding, especially when the data has large
intra-class variations. In this paper, we propose a novel moderate positive
sample mining method to train robust CNN for person re-identification, dealing
with the problem of large variation. In addition, we improve the learning by a
metric weight constraint, so that the learned metric has a better
generalization ability. Experiments show that these two strategies are
effective in learning robust deep metrics for person re-identification, and
accordingly our deep model significantly outperforms the state-of-the-art
methods on several benchmarks of person re-identification. Therefore, the study
presented in this paper may be useful in inspiring new designs of deep models
for person re-identification.
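
One way to read "moderate positive" mining is sketched below. The selection rule is an assumption made for illustration: among the positives that lie no farther from the probe than the hardest (closest) negative, pick the farthest one, and fall back to the closest positive if none qualifies. The function name `moderate_positive` and the toy distances are hypothetical.

```python
import numpy as np

def moderate_positive(dist_pos, dist_neg):
    """Index of a 'moderate' positive for one probe sample.

    dist_pos: distances from the probe to its positive (same-identity) samples.
    dist_neg: distances from the probe to negative samples.
    Assumed rule: the hardest positive that is still within the range set by
    the hardest negative; otherwise the closest positive.
    """
    hardest_neg = dist_neg.min()
    candidates = np.flatnonzero(dist_pos <= hardest_neg)
    if candidates.size == 0:
        return int(np.argmin(dist_pos))
    return int(candidates[np.argmax(dist_pos[candidates])])

chosen = moderate_positive(np.array([0.2, 0.5, 0.9, 1.4]), np.array([1.0, 1.2]))
fallback = moderate_positive(np.array([2.0, 1.5]), np.array([1.0]))
```

Under this rule the extreme positive at distance 1.4 is skipped in favour of the one at 0.9, so training pairs stay within a local range of the probe.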

We consider the evolution of two incompressible, immiscible fluids with
different densities in porous media, known as the Muskat problem [21], which in
two dimensions is analogous to the Hele-Shaw cell [26]. We establish, for a
class of large and monotone initial data, the global existence of weak
solutions. The proof is based on a local well-posedness result for the initial
data with certain specific asymptotics at spatial infinity and a new maximum
principle for the first derivative of the graph function.

In recent years, numerous effective multi-object tracking (MOT) methods have been
developed because of the wide range of applications. Existing performance
evaluations of MOT methods usually separate the object tracking step from the
object detection step by using the same fixed object detection results for
comparisons. In this work, we perform a comprehensive quantitative study on the
effects of object detection accuracy on the overall MOT performance, using the
new large-scale University at Albany DETection and tRACking (UA-DETRAC)
benchmark dataset. The UA-DETRAC benchmark dataset consists of 100 challenging
video sequences captured from real-world traffic scenes (over 140,000 frames
with rich annotations, including occlusion, weather, vehicle category,
truncation, and vehicle bounding boxes) for object detection, object tracking
and MOT systems. We evaluate complete MOT systems constructed from combinations
of state-of-the-art object detection and object tracking methods. Our analysis
shows the complex effects of object detection accuracy on MOT system
performance. Based on these observations, we propose new evaluation tools and
metrics for MOT systems that consider both object detection and object tracking
for comprehensive analysis.

For any $A > 2$, we construct solutions to the two-dimensional incompressible
Euler equations on the torus $\mathbb{T}^2$ whose vorticity gradient
$\nabla\omega$ grows exponentially in time: $$\|\nabla\omega(t,
\cdot)\|_{L^\infty} \gtrsim e^{At},\quad \forall\ t \geq 0.$$

Deep neural networks usually benefit from unsupervised pre-training, e.g.
auto-encoders. However, the classifier further needs supervised fine-tuning
methods for good discrimination. Besides, due to the limits of full-connection,
the application of auto-encoders is usually limited to small, well aligned
images. In this paper, we incorporate the supervised information to propose a
novel formulation, namely class-encoder, whose training objective is to
reconstruct a sample from another one with the same label.
Class-encoder aims to minimize the intra-class variations in the feature space,
and to learn a good discriminative manifold on a class scale. We impose the
class-encoder as a constraint into the softmax for better supervised training,
and extend the reconstruction on feature-level to tackle the parameter size
issue and translation issue. The experiments show that the class-encoder helps
to improve the performance on benchmarks of classification and face
recognition. This could also be a promising direction for fast training of face
recognition models.

This paper studies the Cauchy problem of the incompressible
magnetohydrodynamic systems with or without viscosity $\nu$. Under the
assumption that the initial velocity field and the displacement of the initial
magnetic field from a nonzero constant are sufficiently small in certain
weighted Sobolev spaces, the Cauchy problem is shown to be globally well-posed
for all $\nu \geq 0$ and all space dimensions $n \geq 2$. Such a result holds
true uniformly in the nonnegative viscosity parameter. The proof is based on the
inherent strong null structure of the systems which was first introduced for
incompressible elastodynamics by the second author in \cite{Lei14} and
Alinhac's ghost weight technique.

Object detection is a fundamental problem in image understanding. One popular
solution is the R-CNN framework and its fast versions. They decompose the
object detection problem into two cascaded easier tasks: 1) generating object
proposals from images, 2) classifying proposals into various object categories.
Although both are relatively easier tasks, they are not
solved perfectly and there is still room for improvement. In this paper, we push
the "divide and conquer" solution even further by dividing each task into two
sub-tasks. We call the proposed method "CRAFT" (Cascade Region-proposal-network
And FasT-rcnn), which tackles each task with a carefully designed network
cascade. We show that the cascade structure helps in both tasks: in proposal
generation, it provides more compact and better localized object proposals; in
object classification, it reduces false positives (mainly between ambiguous
categories) by capturing both inter- and intra-category variances. CRAFT
achieves consistent and considerable improvement over the state-of-the-art on
object detection benchmarks like PASCAL VOC 07/12 and ILSVRC.

Masses of the three generations of charged leptons are known to completely
satisfy Koide's mass relation. But the question remains whether such a relation
exists for neutrinos. In this paper, by considering the seesaw mechanism as the
mechanism generating tiny neutrino masses, we show how neutrinos satisfy
Koide's mass relation, on the basis of which we systematically give exact
values of not only left-handed but also right-handed neutrino masses.
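
For reference, Koide's relation for the charged leptons states that Q = (m_e + m_mu + m_tau) / (sqrt(m_e) + sqrt(m_mu) + sqrt(m_tau))^2 is very close to 2/3. The short check below evaluates Q numerically; the mass values are approximate figures in MeV supplied here as assumptions (they do not appear in the abstract), and `koide_ratio` is an illustrative name.

```python
import math

def koide_ratio(m1, m2, m3):
    """Q = (m1 + m2 + m3) / (sqrt(m1) + sqrt(m2) + sqrt(m3))**2."""
    total = m1 + m2 + m3
    root_sum = math.sqrt(m1) + math.sqrt(m2) + math.sqrt(m3)
    return total / root_sum ** 2

# Approximate charged-lepton masses in MeV (assumed values, not from the text).
Q = koide_ratio(0.51099895, 105.6583755, 1776.86)
deviation = abs(Q - 2.0 / 3.0)
```

By construction Q lies between 1/3 (three equal masses) and 1 (one mass dominating), so a value of 2/3 sits exactly in the middle of the allowed range.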

We prove that for sufficiently small initial displacements in some weighted
Sobolev space, the Cauchy problem of the systems of incompressible isotropic
elastodynamics in two space dimensions admits a unique global classical
solution.

Person re-identification aims to re-identify the probe image from a given set
of images under different camera views. It is challenging due to large
variations of pose, illumination, occlusion and camera view. Since the
convolutional neural networks (CNN) have excellent capability of feature
extraction, certain deep learning methods have been recently applied in person
re-identification. However, in person re-identification, the deep networks
often suffer from the overfitting problem. In this paper, we propose a novel
CNN-based method to learn a discriminative metric with good robustness to the
overfitting problem in person re-identification. Firstly, a novel deep
architecture is built where the Mahalanobis metric is learned with a weight
constraint. This weight constraint is used to regularize the learning, so that
the learned metric has a better generalization ability. Secondly, we find that
the selection of intra-class sample pairs is crucial for learning but has
received little attention. To cope with the large intra-class variations in
pedestrian images, we propose a novel training strategy named moderate positive
mining to prevent the training process from overfitting to the extreme samples
in intra-class pairs. Experiments show that our approach significantly
outperforms state-of-the-art methods on several benchmarks of person
re-identification.

Face alignment, which fits a face model to an image and extracts the semantic
meanings of facial pixels, has been an important topic in the CV community.
However, most algorithms are designed for faces in small to medium poses (below
45 degrees), lacking the ability to align faces in large poses up to 90 degrees.
The challenges are threefold: Firstly, the commonly used landmark-based face
model assumes that all the landmarks are visible and is therefore not suitable
for profile views. Secondly, the face appearance varies more dramatically
across large poses, ranging from frontal view to profile view. Thirdly,
labelling landmarks in large poses is extremely challenging since the invisible
landmarks have to be guessed. In this paper, we propose a solution to the three
problems in a new alignment framework, called 3D Dense Face Alignment (3DDFA),
in which a dense 3D face model is fitted to the image via convolutional neural
network (CNN). We also propose a method to synthesize large-scale training
samples in profile views to solve the third problem of data labelling.
Experiments on the challenging AFLW database show that our approach achieves
significant improvements over state-of-the-art methods.

This article concerns the time growth of Sobolev norms of classical solutions
to the 3D incompressible isotropic elastodynamics with small initial
displacements.

Deep learning methods are powerful tools but often suffer from expensive
computation and limited flexibility. An alternative is to combine lightweight
models with deep representations. Although successful cases exist in several visual
problems, a unified framework is absent. In this paper, we revisit two widely
used approaches in computer vision, namely filtered channel features and
Convolutional Neural Networks (CNN), and absorb merits from both by proposing
an integrated method called Convolutional Channel Features (CCF). CCF transfers
low-level features from pre-trained CNN models to feed the boosting forest
model. With the combination of CNN features and boosting forest, CCF benefits
from the richer capacity in feature representation compared with channel
features, as well as lower cost in computation and storage compared with
end-to-end CNN methods. We show that CCF serves as a good way of tailoring
pre-trained CNN models to diverse tasks without fine-tuning the whole network
to each task by achieving state-of-the-art performances in pedestrian
detection, face detection, edge detection and object proposal generation.

Smooth solutions to the axisymmetric Navier-Stokes equations obey the
following maximum principle: $$\sup_{t\geq 0}\|rv^\theta(t, \cdot)\|_{L^\infty}
\leq \|rv^\theta(0, \cdot)\|_{L^\infty}.$$ We prove that all solutions with
initial data in $H^{\frac{1}{2}}$ are smooth globally in time if $rv^\theta$
satisfies a kind of Form Boundedness Condition (FBC) which is invariant under
the natural scaling of the Navier-Stokes equations. In particular, if
$rv^\theta$ satisfies \begin{equation}\nonumber \sup_{t \geq 0}|rv^\theta(t, r,
z)| \leq C_\ast|\ln r|^{- 2},\ \ r \leq \delta_0 \in (0, \frac{1}{2}),\ C_\ast
< \infty, \end{equation} then our FBC is satisfied. Here $\delta_0$ and
$C_\ast$ are independent of both the profile and the norm of the initial
data. So the gap from regularity is logarithmic in nature. We also prove the
global regularity of solutions if $\|rv^\theta(0, \cdot)\|_{L^\infty}$ or
$\sup_{t \geq 0}\|rv^\theta(t, \cdot)\|_{L^\infty(r \leq r_0)}$ is small but
the smallness depends on certain dimensionless quantity of the initial data.
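
For reference, the natural scaling of the Navier-Stokes equations mentioned above takes the following standard form; the symbols $\lambda$, $v_\lambda$ and $p_\lambda$ are notation introduced here for illustration and do not appear in the abstract. The second line shows why $rv^\theta$ is a scale-invariant quantity.

```latex
\begin{align*}
v_\lambda(t, x) &= \lambda\, v(\lambda^2 t, \lambda x), \qquad
p_\lambda(t, x) = \lambda^2\, p(\lambda^2 t, \lambda x), \qquad \lambda > 0, \\
(r v_\lambda^\theta)(t, r, z) &= \lambda r\, v^\theta(\lambda^2 t, \lambda r, \lambda z)
  = (r v^\theta)(\lambda^2 t, \lambda r, \lambda z).
\end{align*}
```

In particular, $\|rv_\lambda^\theta(t, \cdot)\|_{L^\infty} = \|rv^\theta(\lambda^2 t, \cdot)\|_{L^\infty}$, so the maximum principle bounds a quantity that does not change under rescaling.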