
Singular value decomposition (SVD) is the mathematical basis of principal
component analysis (PCA). Together, SVD and PCA are among the most widely used
mathematical formalisms/decompositions in machine learning, data mining, pattern
recognition, artificial intelligence, computer vision, signal processing, etc.
In recent applications, regularization has become an increasing trend. In this
paper, we present a regularized SVD (RSVD), present an efficient computational
algorithm, and provide several theoretical analyses. We show that although RSVD
is nonconvex, it has a closed-form global optimal solution. Finally, we apply
RSVD to recommender systems, and experimental results show that
RSVD outperforms SVD significantly.
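
As a minimal illustration of the opening statement only (plain PCA computed
through an SVD, not the RSVD proposed here, whose objective is not stated in
this abstract), the NumPy sketch below recovers principal directions from the
SVD of the centered data matrix. The function name `pca_via_svd` and the
synthetic data are assumptions made for illustration.

```python
import numpy as np

def pca_via_svd(X, k):
    """PCA of X (rows = samples, columns = features) via SVD.

    Hypothetical helper: returns the top-k principal directions, the
    projected data, and the variance explained by each direction.
    """
    Xc = X - X.mean(axis=0)                      # center each feature
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:k]                          # rows are principal directions
    scores = Xc @ components.T                   # coordinates in the subspace
    explained_variance = s[:k] ** 2 / (X.shape[0] - 1)
    return components, scores, explained_variance

# Synthetic data (an assumption; any real-valued matrix would do).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5)) @ rng.normal(size=(5, 5))

components, scores, var = pca_via_svd(X, k=2)

# Cross-check: the squared singular values, scaled by 1/(n-1), equal the
# largest eigenvalues of the sample covariance matrix.
eigvals = np.linalg.eigvalsh(np.cov(X, rowvar=False))[::-1]
print(np.allclose(var, eigvals[:2]))
```

The cross-check at the end is the sense in which SVD underlies PCA: the right
singular vectors of the centered data are the eigenvectors of its covariance
matrix.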

Support Vector Machine (SVM) is an efficient classification approach, which
finds a hyperplane to separate data from different classes. This hyperplane is
determined by support vectors. In existing SVM formulations, the objective
function uses the L2 norm or the L1 norm on the slack variables. The number of
support vectors is a measure of generalization error. In this work, we propose a
Minimal SVM, which uses the L0.5 norm on the slack variables. The resulting model
further reduces the number of support vectors and improves classification
performance.

In many real-world applications, data usually contain outliers. One popular
approach is to use the L2,1 norm function as a robust error/loss function.
However, the robustness of the L2,1 norm function is not well understood so far.
In this paper, we propose a new Vector Outlier Regularization (VOR) framework to
understand and analyze the robustness of the L2,1 norm function. Our VOR function
defines a data point to be an outlier if it is outside a threshold with respect
to a theoretical prediction, and regularizes it, i.e., pulls it back to the
threshold line. We then prove that the L2,1 function is the limiting case of this
VOR with the usual least square/L2 error function as the threshold shrinks to
zero. One interesting property of VOR is that how far an outlier lies away from
its theoretically predicted value does not affect the final regularization and
analysis results. This VOR property unmasks one of the most peculiar properties
of the L2,1 norm function: the effects of outliers seem to be independent of how
outlying they are; if an outlier is moved further away from the intrinsic
manifold/subspace, the final analysis results do not change. VOR provides a new
way to understand and analyze the robustness of the L2,1 norm function. Applying
VOR to matrix factorization leads to a new VORPCA model. We give a
comprehensive comparison with trace-norm based L2,1-norm PCA to demonstrate the
advantages of VORPCA.
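
For readers unfamiliar with the L2,1 norm, the sketch below computes it for a
small residual matrix and contrasts it with the usual squared L2 error. It
illustrates the two loss functions only, not the VOR framework itself. The
convention that each column is one data point, and the helper names
`l21_norm` and `squared_l2_error`, are assumptions.

```python
import numpy as np

def l21_norm(E):
    """Sum over data points (columns, by assumption) of each residual's L2 norm."""
    return np.linalg.norm(E, axis=0).sum()

def squared_l2_error(E):
    """Usual least-squares error: sum of all squared residual entries."""
    return (E ** 2).sum()

# Residuals of three data points; the first column plays the outlier.
E = np.array([[3.0, 0.0, 0.0],
              [4.0, 1.0, 0.0]])

# Same residuals with the outlier pushed 10x further away.
E_far = E.copy()
E_far[:, 0] *= 10.0

print(l21_norm(E), l21_norm(E_far))                  # column norms 5+1+0 vs 50+1+0
print(squared_l2_error(E), squared_l2_error(E_far))  # 26 vs 2501
```

The outlier's contribution grows linearly under the L2,1 norm and
quadratically under the squared L2 error, which is the usual informal
argument for the robustness of the former.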

In many real-world applications, data come with corruptions, large errors or
outliers. One popular approach is to use the L1-norm function. However, the
robustness of the L1-norm function is not well understood so far. In this paper,
we present a new outlier regularization framework to understand and analyze the
robustness of the L1-norm function. The proposed outlier regularization has two
main features. (1) A key property of outlier regularization is that
how far an outlier lies away from its theoretically predicted value does not
affect the final regularization and analysis results. (2) Another important
feature of outlier regularization is that it has an equivalent continuous
representation that closely relates to the L1 function. This provides a new way
to understand and analyze the robustness of the L1 function. We apply our outlier
regularization framework to PCA and propose an outlier regularized PCA (ORPCA)
model. Compared with the trace-norm-based robust PCA, ORPCA has several benefits:
(1) It does not suffer from singular value suppression. (2) It can retain small
high-rank components, which help retain fine details of the data. (3) ORPCA can
be computed more efficiently.
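
A classical one-dimensional analogue of property (1) can be checked directly:
the constant that minimizes the L1 error is the median, and the median does not
move when an outlier is pushed further away. The sketch below shows this
textbook fact only, not the proposed outlier regularization or ORPCA. The data
values and the helper names `l1_fit` and `l2_fit` are assumptions.

```python
import numpy as np

def l1_fit(x):
    """Constant minimizing sum |x_i - m| (the L1 error): the median."""
    return np.median(x)

def l2_fit(x):
    """Constant minimizing sum (x_i - m)^2 (the L2 error): the mean."""
    return x.mean()

x = np.array([1.0, 2.0, 3.0, 4.0, 100.0])      # last entry is an outlier
x_far = np.array([1.0, 2.0, 3.0, 4.0, 1.0e6])  # same outlier, much further away

print(l1_fit(x), l1_fit(x_far))   # 3.0 3.0
print(l2_fit(x), l2_fit(x_far))   # 22.0 200002.0
```

The L1 fit is identical in both cases, whereas the L2 fit is dragged along
with the outlier.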

In many real-world applications, image data often come with noise,
corruptions or large errors. One approach to dealing with noisy image data is to
use data recovery techniques, which aim to recover the true uncorrupted signals
from the observed noisy images. In this paper, we first introduce a novel
corruption recovery transformation (CRT) model which aims to recover multiple
(or a collection of) corrupted images using a single affine transformation.
Then, we show that the introduced CRT can be efficiently constructed through
learning from training data. Once the CRT is learned, we can recover the true
signals from new incoming/test corrupted images explicitly. As an
application, we apply our CRT to the image recognition task. Experimental results
on six image datasets demonstrate that the proposed CRT model is effective in
recovering noisy image data and thus leads to better recognition results.

Linear Discriminant Analysis (LDA) is a widely used supervised dimensionality
reduction method in computer vision and pattern recognition. In null space
based LDA (NLDA), a well-known LDA extension, the between-class distance is
maximized in the null space of the within-class scatter matrix. However, NLDA
has some limitations. Firstly, for many data sets, the null space of the
within-class scatter matrix does not exist, so NLDA is not applicable to
those data sets. Secondly, NLDA uses the arithmetic mean of between-class
distances and gives equal consideration to all between-class distances, which
lets larger between-class distances dominate the result and thus limits the
performance of NLDA. In this paper, we propose a harmonic mean based Linear
Discriminant Analysis, Multi-Class Discriminant Analysis (MCDA), for image
classification, which minimizes the reciprocal of the weighted harmonic mean of
pairwise between-class distances. More importantly, MCDA gives higher priority
to maximizing small between-class distances. MCDA can be extended to multi-label
dimension reduction. Results on 7 single-label data sets and 4 multi-label data
sets show that MCDA has consistently better performance than 10 other
single-label approaches and 4 other multi-label approaches in terms of
classification accuracy and macro- and micro-averaged F1 score.
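
The contrast between the two means can be seen on a toy set of pairwise
between-class distances. The sketch below uses the plain (unweighted) harmonic
mean, whereas MCDA uses a weighted one whose weights are not given in this
abstract. The distance values and helper names are assumptions.

```python
import numpy as np

def arithmetic_mean(d):
    return d.mean()

def harmonic_mean(d):
    """Unweighted harmonic mean; assumes all distances are positive."""
    return len(d) / (1.0 / d).sum()

# Hypothetical pairwise between-class distances: one small, two large.
d = np.array([1.0, 10.0, 10.0])
d_large_grown = np.array([1.0, 20.0, 20.0])   # double the large distances
d_small_grown = np.array([2.0, 10.0, 10.0])   # double the small distance

for name, dist in [("baseline", d),
                   ("large distances doubled", d_large_grown),
                   ("small distance doubled", d_small_grown)]:
    print(f"{name:>24}: arithmetic={arithmetic_mean(dist):.3f} "
          f"harmonic={harmonic_mean(dist):.3f}")
```

Doubling the large distances moves the arithmetic mean from 7.0 to about 13.7
but the harmonic mean only from 2.5 to about 2.7. Doubling the single small
distance barely changes the arithmetic mean but lifts the harmonic mean to
about 4.3. An objective built on the harmonic mean is therefore driven mainly
by the closest pairs of classes.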

Kernel alignment measures the degree of similarity between two kernels. In
this paper, inspired by kernel alignment, we propose a new Linear
Discriminant Analysis (LDA) formulation, kernel alignment LDA (kaLDA). We first
define two kernels, a data kernel and a class indicator kernel. The problem is to
find a subspace that maximizes the alignment between the subspace-transformed
data kernel and the class indicator kernel. Surprisingly, the kernel alignment
induced kaLDA objective function is very similar to classical LDA and can be
expressed using between-class and total scatter matrices. This can be extended to
multi-label data. We use a Stiefel-manifold gradient descent algorithm to solve
this problem. We perform experiments on 8 single-label and 6 multi-label data
sets. Results show that kaLDA has very good performance on many single-label
and multi-label problems.
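
For concreteness, the standard kernel alignment score between two kernel
matrices is their Frobenius inner product divided by the product of their
Frobenius norms. The sketch below evaluates it for a linear data kernel and a
class indicator kernel built from one-hot labels. Both kernel constructions,
the toy data, and the helper names are assumptions; the exact kernels and the
subspace optimization used by kaLDA are not specified in this abstract.

```python
import numpy as np

def kernel_alignment(K1, K2):
    """Alignment <K1, K2>_F / (||K1||_F * ||K2||_F)."""
    return np.sum(K1 * K2) / (np.linalg.norm(K1) * np.linalg.norm(K2))

def class_indicator_kernel(labels, n_classes):
    """K[i, j] = 1 if samples i and j share a class, else 0 (assumed form)."""
    Y = np.eye(n_classes)[labels]     # one-hot class indicator matrix
    return Y @ Y.T

# Toy data: rows are samples, forming two well-separated groups.
X = np.array([[1.0, 0.0],
              [1.0, 0.1],
              [0.0, 1.0],
              [0.1, 1.0]])
K_data = X @ X.T                      # linear data kernel (assumed form)

K_true = class_indicator_kernel(np.array([0, 0, 1, 1]), 2)
K_wrong = class_indicator_kernel(np.array([0, 1, 0, 1]), 2)

a_true = kernel_alignment(K_data, K_true)
a_wrong = kernel_alignment(K_data, K_wrong)
print(a_true > a_wrong)               # labels matching the groups align better
```

A subspace method in this spirit would replace `X` by a projection `X @ G` and
search over `G` for the highest alignment.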

Real-life data often include information from different channels. For
example, in computer vision, we can describe an image using different image
features, such as pixel intensity, color, HOG, GIST features, SIFT features,
etc. These different aspects of the same objects are often called multi-view
(or multi-modal) data. The low-rank regression model has proven to be an
effective learning mechanism by exploring the low-rank structure of real-life
data. But previous low-rank regression models only work on single-view data. In
this paper, we propose a multi-view low-rank regression model by imposing
low-rank constraints on the multi-view regression model. Most importantly, we
provide a closed-form solution to the multi-view low-rank regression model.
Extensive experiments on 4 multi-view datasets show that the multi-view
low-rank regression model outperforms the single-view regression model and
reveal that multi-view low-rank structure is very helpful.

Deep Learning is a very powerful machine learning model. Deep Learning trains
a large number of parameters for multiple layers and is very slow when the data
are large-scale and the architecture is large. Inspired by the shrinking
technique used to accelerate the computation of the Support Vector Machine (SVM)
algorithm and the screening technique used in LASSO, we propose a shrinking Deep
Learning with recall (sDLr) approach to speed up deep learning computation. We
evaluate sDLr using Deep Neural Network (DNN), Deep Belief Network (DBN) and
Convolutional Neural Network (CNN) models on 4 data
sets. Results show that the speedup using sDLr can reach more than 2.0 while
still giving competitive classification
performance.

For tensor decompositions such as HOSVD and ParaFac, the objective functions
are nonconvex. This implies that, theoretically, there exists a large number of
local optima: starting from different starting points, the iteratively improved
solution will converge to different local solutions. This non-uniqueness
presents a stability and reliability problem for image compression and
retrieval. In this paper, we present the results of a comprehensive
investigation of this problem. We found that, although all tensor decomposition
algorithms fail to reach a unique global solution on random data and severely
scrambled data, surprisingly, on all of several real-life data sets (even
with substantial scrambling and occlusions), HOSVD always produces the unique
global solution in the parameter region suitable to practical applications,
while ParaFac produces non-unique solutions. We provide an eigenvalue-based rule
for assessing the solution uniqueness.