
Inferring the relations between two images is an important class of tasks in
computer vision. Examples of such tasks include computing optical flow and
stereo disparity. We treat relation inference as a machine learning
problem and tackle it with neural networks. A key to the problem is learning a
representation of relations. We propose a new neural network module, the
contrast association unit (CAU), which explicitly models the relations between
two sets of input variables. Because the weights in a CAU are nonnegative, we adopt a
multiplicative update algorithm for learning these weights. Experiments show
that neural networks with CAUs are more effective in learning five fundamental
image transformations than conventional neural networks.
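
As a rough illustration of why multiplicative updates suit nonnegative weights, the sketch below applies a Lee-Seung-style multiplicative rule to a plain nonnegative least-squares problem. This is not the CAU learning rule, which the abstract does not spell out; the function name `multiplicative_nnls`, the objective, and the toy data are all assumptions made for the example.

```python
import numpy as np

def multiplicative_nnls(X, y, n_iter=200, eps=1e-12):
    """Minimize ||y - X w||^2 subject to w >= 0 (X and y nonnegative).

    Illustrative only: a generic multiplicative update, not the CAU rule.
    """
    w = np.ones(X.shape[1])      # strictly positive starting point
    numer = X.T @ y              # nonnegative part of the negative gradient
    gram = X.T @ X
    for _ in range(n_iter):
        # Each weight is rescaled by a nonnegative ratio, so it can
        # shrink toward zero but never change sign.
        w *= numer / (gram @ w + eps)
    return w

rng = np.random.default_rng(0)
X = rng.random((50, 5))                        # hypothetical nonnegative inputs
w_true = np.array([0.5, 0.0, 2.0, 1.0, 0.0])   # hypothetical target weights
y = X @ w_true
w_hat = multiplicative_nnls(X, y)
```

Because every step multiplies the current weight by a nonnegative factor, the constraint is maintained without any projection or clipping step.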

Stochastic Neighbor Embedding (SNE) methods minimize the divergence between
the similarity matrix of a high-dimensional data set and its counterpart from a
low-dimensional embedding, leading to widely applied tools for data
visualization. Despite their popularity, the current SNE methods experience a
crowding problem when the data include highly imbalanced similarities. This
implies that the data points with higher total similarity tend to get crowded
around the display center. To solve this problem, we introduce a fast
normalization method and normalize the similarity matrix to be doubly
stochastic such that all the data points have equal total similarities.
Furthermore, we show empirically and theoretically that the double
stochasticity constraint often leads to embeddings that are approximately
spherical. This suggests replacing a flat space with spheres as the embedding
space. The spherical embedding eliminates the discrepancy between the center
and the periphery in visualization, which efficiently resolves the crowding
problem. We compared the proposed method (DOSNES) with the state-of-the-art SNE
method on three real-world datasets, and the results clearly indicate that our
method is more favorable in terms of visualization quality.
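
To make the doubly stochastic normalization concrete, here is a minimal sketch that uses the classical Sinkhorn-Knopp iteration (alternating row and column scaling). The abstract does not specify its fast normalization method, so Sinkhorn-Knopp is a stand-in chosen for illustration; the function name, tolerance, and toy similarity matrix are assumptions.

```python
import numpy as np

def sinkhorn_knopp(P, n_iter=2000, tol=1e-10):
    """Scale a strictly positive matrix so rows and columns each sum to 1.

    Stand-in for the paper's normalization, which is not given here.
    """
    P = np.asarray(P, dtype=float).copy()
    for _ in range(n_iter):
        P /= P.sum(axis=1, keepdims=True)   # make every row sum to 1
        P /= P.sum(axis=0, keepdims=True)   # make every column sum to 1
        if np.abs(P.sum(axis=1) - 1.0).max() < tol:
            break
    return P

rng = np.random.default_rng(0)
S = rng.random((6, 6)) + 0.1      # hypothetical positive similarities
S = (S + S.T) / 2                 # symmetrize
S[0, 1:] *= 10                    # point 0 acts as a "hub" with much
S[1:, 0] *= 10                    # larger total similarity than the rest
D = sinkhorn_knopp(S)
```

After normalization every point carries the same total similarity, so the hub point no longer has a reason to be pulled toward the display center.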

Information divergence that measures the difference between two nonnegative
matrices or tensors has found its use in a variety of machine learning
problems. Examples are Nonnegative Matrix/Tensor Factorization, Stochastic
Neighbor Embedding, topic models, and Bayesian network optimization. The
success of such a learning task depends heavily on a suitable divergence. A
large variety of divergences have been suggested and analyzed, but very few
results are available for an objective choice of the optimal divergence for a
given task. Here we present a framework that facilitates automatic selection of
the best divergence among a given family, based on standard maximum likelihood
estimation. We first propose an approximated Tweedie distribution for the
beta-divergence family. Selecting the best beta then becomes a machine learning
problem solved by maximum likelihood. Next, we reformulate alpha-divergence in
terms of beta-divergence, which enables automatic selection of alpha by maximum
likelihood with reuse of the learning principle for beta-divergence.
Furthermore, we show the connections between gamma- and beta-divergences as well
as Rényi and alpha-divergences, such that our automatic selection framework
is extended to nonseparable divergences. Experiments on both synthetic and
real-world data demonstrate that our method can quite accurately select the
information divergence across different learning problems and various
divergence families.
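
For readers unfamiliar with the beta-divergence family, the sketch below evaluates it under the parameterization common in the NMF literature, where beta = 2 gives half the squared Euclidean distance, beta = 1 the generalized Kullback-Leibler divergence, and beta = 0 the Itakura-Saito divergence. The paper may index the family differently, and the function name and test vectors are assumptions; the maximum likelihood selection step itself is not reproduced here.

```python
import numpy as np

def beta_divergence(x, y, beta):
    """Sum of elementwise beta-divergences d_beta(x | y) for positive arrays.

    Parameterization assumed here: beta=2 Euclidean, beta=1 KL, beta=0 IS.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if beta == 1:   # limit beta -> 1: generalized Kullback-Leibler
        return float(np.sum(x * np.log(x / y) - x + y))
    if beta == 0:   # limit beta -> 0: Itakura-Saito
        return float(np.sum(x / y - np.log(x / y) - 1.0))
    numer = x**beta + (beta - 1.0) * y**beta - beta * x * y**(beta - 1.0)
    return float(np.sum(numer) / (beta * (beta - 1.0)))

x = np.array([1.0, 2.0, 3.0])     # hypothetical data
y = np.array([1.5, 1.5, 2.5])     # hypothetical model output
d_euclid = beta_divergence(x, y, 2)
d_kl = beta_divergence(x, y, 1)
d_is = beta_divergence(x, y, 0)
```

A single scalar beta thus moves continuously between several familiar cost functions, which is what makes choosing it from data a well-posed estimation problem.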

Clustering analysis by nonnegative low-rank approximations has achieved
remarkable progress in the past decade. However, most approximation approaches
in this direction are still restricted to matrix factorization. We propose a
new low-rank learning method that goes beyond matrix factorization to improve
clustering performance. The approximation is based on a two-step bipartite
random walk through virtual cluster nodes, where the approximation is formed by
only cluster assigning probabilities. Minimizing the approximation error
measured by Kullback-Leibler divergence is equivalent to maximizing the
likelihood of a discriminative model, which endows our method with a solid
probabilistic interpretation. The optimization is implemented by a relaxed
Majorization-Minimization algorithm that is advantageous in finding good local
minima. Furthermore, we point out that the regularized algorithm with Dirichlet
prior only serves as initialization. Experimental results show that the new
method has strong performance in clustering purity for various datasets,
especially for large-scale manifold data.
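
One way to read the two-step bipartite random walk is sketched below. It assumes a uniform prior over data points, so that stepping from point i to cluster k uses the assignment probability W[i, k] and stepping back from cluster k to point j uses W[j, k] divided by the column sum of W. The abstract does not give the formula, so this construction, the function name, and the toy assignment matrix are assumptions.

```python
import numpy as np

def two_step_random_walk(W):
    """Approximate similarities from cluster assignment probabilities.

    W[i, k] plays the role of P(cluster k | point i); rows sum to 1.
    Assumes a uniform prior over points (not stated in the abstract).
    """
    W = np.asarray(W, dtype=float)
    col_sums = W.sum(axis=0, keepdims=True)
    # A_hat[i, j] = sum_k W[i, k] * W[j, k] / sum_v W[v, k]
    return (W / col_sums) @ W.T

rng = np.random.default_rng(0)
W = rng.dirichlet(np.ones(3), size=8)   # 8 hypothetical points, 3 clusters
A_hat = two_step_random_walk(W)
```

Under this reading the approximation depends only on W, is symmetric, and has unit row sums, so it can be compared against a normalized similarity matrix with a Kullback-Leibler objective.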