
The affine Grassmannian is a noncompact smooth manifold that parameterizes
all affine subspaces of a fixed dimension. It is a natural generalization of
Euclidean space, points being zerodimensional affine subspaces. We will
realize the affine Grassmannian as a matrix manifold and extend Riemannian
optimization algorithms including steepest descent, Newton method, and
conjugate gradient, to realvalued functions on the affine Grassmannian. Like
their counterparts for the Grassmannian, these algorithms are in the style of
EdelmanAriasSmith  they rely only on standard numerical linear algebra
and are readily computable.

We present an elementary, selfcontained proof of Grothendieck's inequality
that unifies the real and complex cases and yields both the Krivine and
Haagerup bounds, the current bestknown explicit bounds for the real and
complex Grothendieck constants respectively. This article is intended to be
pedagogical, combining and streamlining known ideas of
LindenstraussPe{\l}czy\'nski, Krivine, and Haagerup into one that requires
only univariate calculus and basic complex variables.

We study pathconnectedness and homotopy groups of sets of tensors defined by
tensor rank, border rank, multilinear rank, as well as their symmetric
counterparts for symmetric tensors. We show that over $\mathbb{C}$, the set of
rank$r$ tensors and the set of symmetric rank$r$ symmetric tensors are both
pathconnected if $r$ is not more than the complex generic rank; these results
also extend to border rank and symmetric border rank over $\mathbb{C}$. Over
$\mathbb{R}$, the set of rank$r$ tensors is pathconnected if it has the
expected dimension but the corresponding result for symmetric rank$r$
symmetric $d$tensors depends on the order $d$: connected when $d$ is odd but
not when $d$ is even. Border rank and symmetric border rank over $\mathbb{R}$
have essentially the same pathconnectedness properties as rank and symmetric
rank over $\mathbb{R}$. When $r$ is greater than the complex generic rank, we
are unable to discern any general pattern: For example, we show that
borderrankthree tensors in $\mathbb{R}^2 \otimes \mathbb{R}^2 \otimes
\mathbb{R}^2$ fall into four connected components. For multilinear rank, the
manifold of $d$tensors of multilinear rank $(r_1,\dots,r_d)$ in
$\mathbb{C}^{n_1} \otimes \cdots \otimes \mathbb{C}^{n_d}$ is always
pathconnected, and the same is true in $\mathbb{R}^{n_1} \otimes \cdots
\otimes \mathbb{R}^{n_d}$ unless $n_i = r_i = \prod_{j \ne i} r_j$ for some
$i\in\{1, \dots, d\}$. Beyond pathconnectedness, we determine, over both
$\mathbb{R}$ and $\mathbb{C}$, the fundamental and higher homotopy groups of
the set of tensors of a fixed small rank, and, taking advantage of Bott
periodicity, those of the manifold of tensors of a fixed multilinear rank. We
also obtain analogues of these results for symmetric tensors of a fixed
symmetric rank or a fixed symmetric multilinear rank.

In problems involving approximation, completion, denoising, dimension
reduction, estimation, interpolation, modeling, order reduction, regression,
etc, we argue that the nearuniversal practice of assuming that a function,
matrix, or tensor (which we will see are all the same object in this context)
has low rank may be illjustified. There are many natural instances where the
object in question has high rank with respect to the classical notions of rank:
matrix rank, tensor rank, multilinear rank  the latter two being the most
straightforward generalizations of the former. To remedy this, we show that one
may vastly expand these classical notions of ranks: Given any undirected graph
$G$, there is a notion of $G$rank associated with $G$, which provides us with
as many different kinds of ranks as there are undirected graphs. In particular,
the popular tensor network states in physics (e.g., MPS, TTNS, PEPS, MERA) may
be regarded as functions of a specific $G$rank for various choices of $G$.
Among other things, we will see that a function, matrix, or tensor may have
very high matrix, tensor, or multilinear rank and yet very low $G$rank for
some $G$. In fact the difference is in the orders of magnitudes and the gaps
between $G$ranks and these classical ranks are arbitrarily large for some
important objects in computer science, mathematics, and physics. Furthermore,
we show that there is a $G$ such that almost every tensor has $G$rank
exponentially lower than its rank or the dimension of its ambient space.

We review the basic outline of the highly successful diffusion Monte Carlo
technique commonly used in contexts ranging from electronic structure
calculations to rare event simulation and data assimilation, and propose a new
class of randomized iterative algorithms based on similar principles to address
a variety of common tasks in numerical linear algebra. From the point of view
of numerical linear algebra, the main novelty of the Fast Randomized Iteration
schemes described in this article is that they work in either linear or
constant cost per iteration (and in total, under appropriate conditions) and
are rather versatile: we will show how they apply to solution of linear
systems, eigenvalue problems, and matrix exponentiation, in dimensions far
beyond the present limits of numerical linear algebra. While traditional
iterative methods in numerical linear algebra were created in part to deal with
instances where a matrix (of size $\mathcal{O}(n^2)$) is too big to store, the
algorithms that we propose are effective even in instances where the solution
vector itself (of size $\mathcal{O}(n)$) may be too big to store or manipulate.
In fact, our work is motivated by recent DMC based quantum Monte Carlo schemes
that have been applied to matrices as large as $10^{108} \times 10^{108}$. We
provide basic convergence results, discuss the dependence of these results on
the dimension of the system, and demonstrate dramatic cost savings on a range
of test problems.

We show that the two notions of entanglement: the maximum of the geometric
measure of entanglement and the maximum of the nuclear norm is attained for the
same states. We affirm the conjecture of HiguchiSudberry on the maximum
entangled state of four qubits. We introduce the notion of ddensity tensor for
mixed dpartite states. We show that ddensity tensor is separable if and only
if its nuclear norm is $1$. We suggest an alternating method for computing the
nuclear norm of tensors. We apply the above results to symmetric tensors. We
give many numerical examples.

The goal of cryoelectron microscopy (EM) is to reconstruct the 3dimensional
structure of a molecule from a collection of its 2dimensional projected
images. In this article, we show that the basic premise of cryoEM  patching
together 2dimensional projections to reconstruct a 3dimensional object  is
naturally one of Cech cohomology with SO(2)coefficients. We deduce that every
cryoEM reconstruction problem corresponds to an oriented circle bundle on a
simplicial complex, allowing us to classify cryoEM problems via principal
bundles. In practice, the 2dimensional images are noisy and a main task in
cryoEM is to denoise them. We will see how the aforementioned insights can be
used towards this end.

Random walks are a fundamental model in applied mathematics and are a common
example of a Markov chain. The limiting stationary distribution of the Markov
chain represents the fraction of the time spent in each state during the
stochastic process. A standard way to compute this distribution for a random
walk on a finite set of states is to compute the Perron vector of the
associated transition matrix. There are algebraic analogues of this Perron
vector in terms of transition probability tensors of higherorder Markov
chains. These vectors are nonnegative, have dimension equal to the dimension of
the state space, and sum to one and are derived by making an algebraic
substitution in the equation for the jointstationary distribution of a
higherorder Markov chains. Here, we present the spacey random walk, a
nonMarkovian stochastic process whose stationary distribution is given by the
tensor eigenvector. The process itself is a vertexreinforced random walk, and
its discrete dynamics are related to a continuous dynamical system. We analyze
the convergence properties of these dynamics and discuss numerical methods for
computing the stationary distribution. Finally, we provide several applications
of the spacey random walk model in population genetics, ranking, and clustering
data, and we use the process to analyze taxi trajectory data in New York. This
example shows definite nonMarkovian structure.

We study the semialgebraic structure of $D_r$, the set of nonnegative tensors
of nonnegative rank not more than $r$, and use the results to infer various
properties of nonnegative tensor rank. We determine all nonnegative typical
ranks for cubical nonnegative tensors and show that the direct sum conjecture
is true for nonnegative tensor rank. We show that nonnegative, real, and
complex ranks are all equal for a general nonnegative tensor of nonnegative
rank strictly less than the complex generic rank. In addition, such nonnegative
tensors always have unique nonnegative rank$r$ decompositions if the real
tensor space is $r$identifiable. We determine conditions under which a best
nonnegative rank$r$ approximation has a unique nonnegative rank$r$
decomposition: for $r \le 3$, this is always the case; for general $r$, this is
the case when the best nonnegative rank$r$ approximation does not lie on the
boundary of $D_r$. Many of our general identifiability results also apply to
real tensors and real symmetric tensors.

We show that for any given norm ball or proper cone, weak membership in its
dual ball or dual cone is polynomialtime reducible to weak membership in the
given ball or cone. A consequence is that the weak membership or membership
problem for a ball or cone is NPhard if and only if the corresponding problem
for the dual ball or cone is NPhard. In a similar vein, we show that
computation of the dual norm of a given norm is polynomialtime reducible to
computation of the given norm. This extends to convex functions satisfying a
polynomial growth condition: for such a given function, computation of its
Fenchel dual/conjugate is polynomialtime reducible to computation of the given
function. Hence the computation of a norm or a convex function of
polynomialgrowth is NPhard if and only if the computation of its dual norm or
Fenchel dual is NPhard. We discuss implications of these results on the weak
membership problem for a symmetric convex body and its polar dual, the
polynomial approximability of Mahler volume, and the weak membership problem
for the epigraph of a convex function with polynomial growth and that of its
Fenchel dual.

We resolve a basic problem on subspace distances that often arises in
applications: How can the usual Grassmann distance between equidimensional
subspaces be extended to subspaces of different dimensions? We show that a
natural solution is given by the distance of a point to a Schubert variety
within the Grassmannian. This distance reduces to the Grassmann distance when
the subspaces are equidimensional and does not depend on any embedding into a
larger ambient space. Furthermore, it has a concrete expression involving
principal angles, and is efficiently computable in numerically stable ways. Our
results are largely independent of the Grassmann distance  if desired, it
may be substituted by any other common distances between subspaces. Our
approach depends on a concrete algebraic geometric view of the Grassmannian
that parallels the differential geometric perspective that is wellestablished
in applied and computational mathematics.

We discuss a generalization of the CohnUmans method, a potent technique
developed for studying the bilinear complexity of matrix multiplication by
embedding matrices into an appropriate group algebra. We investigate how the
CohnUmans method may be used for bilinear operations other than matrix
multiplication, with algebras other than group algebras, and we relate it to
Strassen's tensor rank approach, the traditional framework for investigating
bilinear complexity. To demonstrate the utility of the generalized method, we
apply it to find the fastest algorithms for forming structured matrixvector
product, the basic operation underlying iterative algorithms for structured
matrices. The structures we study include Toeplitz, Hankel, circulant,
symmetric, skewsymmetric, fcirculant, blockToeplitzToeplitzblock,
triangular Toeplitz matrices, ToeplitzplusHankel, sparse/banded/triangular.
Except for the case of skewsymmetric matrices, for which we have only upper
bounds, the algorithms derived using the generalized CohnUmans method in all
other instances are the fastest possible in the sense of having minimum
bilinear complexity. We also apply this framework to a few other bilinear
operations including matrixmatrix, commutator, simultaneous matrix products,
and briefly discuss the relation between tensor nuclear norm and numerical
stability.

We establish several mathematical and computational properties of the nuclear
norm for higherorder tensors. We show that like tensor rank, tensor nuclear
norm is dependent on the choice of base field  the value of the nuclear norm
of a real 3tensor depends on whether we regard it as a real 3tensor or a
complex 3tensor with real entries. We show that every tensor has a nuclear
norm attaining decomposition and every symmetric tensor has a symmetric nuclear
norm attaining decomposition. There is a corresponding notion of nuclear rank
that, unlike tensor rank, is upper semicontinuous. We establish an analogue of
Banach's theorem for tensor spectral norm and Comon's conjecture for tensor
rank  for a symmetric tensor, its symmetric nuclear norm always equals its
nuclear norm. We show that computing tensor nuclear norm is NPhard in several
sense. Deciding weak membership in the nuclear norm unit ball of 3tensors is
NPhard, as is finding an $\varepsilon$approximation of nuclear norm for
3tensors. In addition, the problem of computing spectral or nuclear norm of a
4tensor is NPhard, even if we restrict the 4tensor to be biHermitian,
bisymmetric, positive semidefinite, nonnegative valued, or all of the above. We
discuss some simple polynomialtime approximation bounds. As an aside, we show
that the nuclear $(p,q)$norm of a matrix is NPhard in general but can be
computed in polynomialtime if $p=1$, $q = 1$, or $p=q=2$, with closedform
expressions for the nuclear $(1,q)$ and $(p,1)$norms.

We present explicit algorithms for computing structured matrixvector
products that are optimal in the sense of Strassen, i.e., using a provably
minimum number of multiplications. These structures include
Toeplitz/Hankel/circulant, symmetric, ToeplitzplusHankel, sparse, and
multilevel structures. The last category include \textsc{bttb}, \textsc{bhhb},
\textsc{bccb} but also any arbitrarily complicated nested structures built out
of other structures.

We show that for a nonnegative tensor, a best nonnegative rankr
approximation is almost always unique, its best rankone approximation may
always be chosen to be a best nonnegative rankone approximation, and that the
set of nonnegative tensors with nonunique best rankone approximations form an
algebraic hypersurface. We show that the last part holds true more generally
for real tensors and thereby determine a polynomial equation so that a real or
nonnegative tensor which does not satisfy this equation is guaranteed to have a
unique best rankone approximation. We also establish an analogue for real or
nonnegative symmetric tensors. In addition, we prove a singular vector variant
of the PerronFrobenius Theorem for positive tensors and apply it to show that
a best nonnegative rankr approximation of a positive tensor can never be
obtained by deflation. As an aside, we verify that the Euclidean distance (ED)
discriminants of the Segre variety and the Veronese variety are hypersurfaces
and give defining equations of these ED discriminants.

This is an elementary introduction to the Hodge Laplacian on a graph, a
higherorder generalization of the graph Laplacian. We will discuss basic
properties including cohomology and Hodge theory. The main feature of our
approach is simplicity, requiring only knowledge of linear algebra and graph
theory. We have also isolated the algebra from the topology to show that a
large part of cohomology and Hodge theory is nothing more than the linear
algebra of matrices satisfying $AB = 0$. For the remaining topological aspect,
we cast our discussions entirely in terms of graphs as opposed to lessfamiliar
topological objects like simplicial complexes.

We introduce a Bayesian model for inferring mixtures of subspaces of
different dimensions. The key challenge in such a mixture model is
specification of prior distributions over subspaces of different dimensions. We
address this challenge by embedding subspaces or Grassmann manifolds into a
sphere of relatively low dimension and specifying priors on the sphere. We
provide an efficient sampling algorithm for the posterior distribution of the
model parameters. We illustrate that a simple extension of our mixture of
subspaces model can be applied to topic modeling. We also prove posterior
consistency for the mixture of subspaces model. The utility of our approach is
demonstrated with applications to real and simulated data.

In this paper, we first extend the celebrated PageRank modification to a
higherorder Markov chain. Although this system has attractive theoretical
properties, it is computationally intractable for many interesting problems. We
next study a computationally tractable approximation to the higherorder
PageRank vector that involves a system of polynomial equations called
multilinear PageRank, which is a type of tensor PageRank vector. It is
motivated by a novel "spacey random surfer" model, where the surfer remembers
bits and pieces of history and is influenced by this information. The
underlying stochastic process is an instance of a vertexreinforced random
walk. We develop convergence theory for a simple fixedpoint method, a shifted
fixedpoint method, and a Newton iteration in a particular parameter regime. In
marked contrast to the case of the PageRank vector of a Markov chain where the
solution is always unique and easy to compute, there are parameter regimes of
multilinear PageRank where solutions are not unique and simple algorithms do
not converge. We provide a repository of these nonconvergent cases that we
encountered through exhaustive enumeration and randomly sampling that we
believe is useful for future study of the problem.

We show that every nbyn matrix is generically a product of [n/2] + 1
Toeplitz matrices and always a product of at most 2n+5 Toeplitz matrices. The
same result holds true if the word "Toeplitz" is replaced by "Hankel", and the
generic bound [n/2] + 1 is sharp. We will see that these decompositions into
Toeplitz or Hankel factors are unusual: We may not in general replace the
subspace of Toeplitz or Hankel matrices by an arbitrary (2n1)dimensional
subspace of nbyn matrices. Furthermore such decompositions do not exist if we
require the factors to be symmetric Toeplitz, persymmetric Hankel, or circulant
matrices, even if we allow an infinite number of factors. Lastly, we discuss
how the Toeplitz and Hankel decompositions of a generic matrix may be computed
by either (i) solving a system of linear and quadratic equations if the number
of factors is required to be [n/2] + 1, or (ii) Gaussian elimination in O(n^3)
time if the number of factors is allowed to be 2n.

We discuss a technique that allows blind recovery of signals or blind
identification of mixtures in instances where such recovery or identification
were previously thought to be impossible: (i) closely located or highly
correlated sources in antenna array processing, (ii) highly correlated
spreading codes in CDMA radio communication, (iii) nearly dependent spectra in
fluorescent spectroscopy. This has important implications  in the case of
antenna array processing, it allows for joint localization and extraction of
multiple sources from the measurement of a noisy mixture recorded on multiple
sensors in an entirely deterministic manner. In the case of CDMA, it allows the
possibility of having a number of users larger than the spreading gain. In the
case of fluorescent spectroscopy, it allows for detection of nearly identical
chemical constituents. The proposed technique involves the solution of a
bounded coherence lowrank multilinear approximation problem. We show that
bounded coherence allows us to establish existence and uniqueness of the
recovered solution. We will provide some statistical motivation for the
approximation problem and discuss greedy approximation bounds. To provide the
theoretical underpinnings for this technique, we develop a corresponding theory
of sparse separable decompositions of functions, including notions of rank and
nuclear norm that specialize to the usual ones for matrices and operators but
apply to also hypermatrices and tensors.

We prove that multilinear (tensor) analogues of many efficiently computable
problems in numerical linear algebra are NPhard. Our list here includes:
determining the feasibility of a system of bilinear equations, deciding whether
a 3tensor possesses a given eigenvalue, singular value, or spectral norm;
approximating an eigenvalue, eigenvector, singular vector, or the spectral
norm; and determining the rank or best rank1 approximation of a 3tensor.
Furthermore, we show that restricting these problems to symmetric tensors does
not alleviate their NPhardness. We also explain how deciding nonnegative
definiteness of a symmetric 4tensor is NPhard and how computing the
combinatorial hyperdeterminant of a 4tensor is NP, #P, and VNPhard. We
shall argue that our results provide another view of the boundary separating
the computational tractability of linear/convex problems from the
intractability of nonlinear/nonconvex ones.

We discuss how recently discovered techniques and tools from compressed
sensing can be used in tensor decompositions, with a view towards modeling
signals from multiple arrays of multiple sensors. We show that with appropriate
bounds on a measure of separation between radiating sources called coherence,
one could always guarantee the existence and uniqueness of a best rankr
approximation of the tensor representing the signal. We also deduce a
computationally feasible variant of Kruskal's uniqueness condition, where the
coherence appears as a proxy for krank. Problems of sparsest recovery with an
infinite continuous dictionary, lowestrank tensor representation, and blind
source separation are treated in a uniform fashion. The decomposition of the
measurement tensor leads to simultaneous localization and extraction of
radiating sources, in an entirely deterministic manner.

We give an elementary proof of a somewhat curious result, namely, that
deciding whether a convex function is selfconcordant is in general an
intractable problem.

In this article, we propose an algorithm, NESTALASSO, for the LASSO problem,
i.e., an underdetermined linear leastsquares problem with a 1norm constraint
on the solution. We prove under the assumption of the restricted isometry
property (RIP) and a sparsity condition on the solution, that NESTALASSO is
guaranteed to be almost always locally linearly convergent. As in the case of
the algorithm NESTA proposed by Becker, Bobin, and Candes, we rely on
Nesterov's accelerated proximal gradient method, which takes O(e^{1/2})
iterations to come within e > 0 of the optimal value. We introduce a
modification to Nesterov's method that regularly updates the proxcenter in a
provably optimal manner, and the aforementioned linear convergence is in part
due to this modification.
In the second part of this article, we attempt to solve the basis pursuit
denoising BPDN problem (i.e., approximating the minimum 1norm solution to an
underdetermined least squares problem) by using NESTALASSO in conjunction with
the Pareto rootfinding method employed by van den Berg and Friedlander in
their SPGL1 solver. The resulting algorithm is called PARNES. We provide
numerical evidence to show that it is comparable to currently available
solvers.

The process of rank aggregation is intimately intertwined with the structure
of skewsymmetric matrices. We apply recent advances in the theory and
algorithms of matrix completion to skewsymmetric matrices. This combination of
ideas produces a new method for ranking a set of items. The essence of our idea
is that a rank aggregation describes a partially filled skewsymmetric matrix.
We extend an algorithm for matrix completion to handle skewsymmetric data and
use that to extract ranks for each item. Our algorithm applies to both pairwise
comparison and rating data. Because it is based on matrix completion, it is
robust to both noise and incomplete data. We show a formal recovery result for
the noiseless case and present a detailed study of the algorithm on synthetic
data and Netflix ratings.