
A rank$r$ matrix $X \in \mathbb{R}^{m \times n}$ can be written as a product
$U V^\top$, where $U \in \mathbb{R}^{m \times r}$ and $V \in \mathbb{R}^{n
\times r}$. One could exploit this observation in optimization: e.g., consider
the minimization of a convex function $f(X)$ over rank$r$ matrices, where the
set of rank$r$ matrices is modeled via the factorization $UV^\top$. Though
such parameterization reduces the number of variables, and is more
computationally efficient (of particular interest is the case $r \ll \min\{m,
n\}$), it comes at a cost: $f(UV^\top)$ becomes a nonconvex function w.r.t.
$U$ and $V$.
We study such parameterization for optimization of generic convex objectives
$f$, and focus on firstorder, gradient descent algorithmic solutions. We
propose the BiFactored Gradient Descent (BFGD) algorithm, an efficient
firstorder method that operates on the $U, V$ factors. We show that when $f$
is (restricted) smooth, BFGD has local sublinear convergence, and linear
convergence when $f$ is both (restricted) smooth and (restricted) strongly
convex. For several key applications, we provide simple and efficient
initialization schemes that provide approximate solutions good enough for the
above convergence results to hold.

We study the projected gradient descent method on lowrank matrix problems
with a strongly convex objective. We use the BurerMonteiro factorization
approach to implicitly enforce lowrankness; such factorization introduces
nonconvexity in the objective. We focus on constraint sets that include both
positive semidefinite (PSD) constraints and specific matrix normconstraints.
Such criteria appear in quantum state tomography and phase retrieval
applications.
We show that nonconvex projected gradient descent favors local linear
convergence in the factored space. We build our theory on a novel descent
lemma, that nontrivially extends recent results on the unconstrained problem.
The resulting algorithm is Projected Factored Gradient Descent, abbreviated as
ProjFGD, and shows superior performance compared to state of the art on quantum
state tomography and sparse phase retrieval applications.

We consider the nonsquare matrix sensing problem, under restricted isometry
property (RIP) assumptions. We focus on the nonconvex formulation, where any
rank$r$ matrix $X \in \mathbb{R}^{m \times n}$ is represented as $UV^\top$,
where $U \in \mathbb{R}^{m \times r}$ and $V \in \mathbb{R}^{n \times r}$. In
this paper, we complement recent findings on the nonconvex geometry of the
analogous PSD setting [5], and show that matrix factorization does not
introduce any spurious local minima, under RIP.

We consider the problem of Robust PCA in the fully and partially observed
settings. Without corruptions, this is the wellknown matrix completion
problem. From a statistical standpoint this problem has been recently
wellstudied, and conditions on when recovery is possible (how many
observations do we need, how many corruptions can we tolerate) via
polynomialtime algorithms is by now understood. This paper presents and
analyzes a nonconvex optimization approach that greatly reduces the
computational complexity of the above problems, compared to the best available
algorithms. In particular, in the fully observed case, with $r$ denoting rank
and $d$ dimension, we reduce the complexity from
$\mathcal{O}(r^2d^2\log(1/\varepsilon))$ to
$\mathcal{O}(rd^2\log(1/\varepsilon))$  a big savings when the rank is big.
For the partially observed case, we show the complexity of our algorithm is no
more than $\mathcal{O}(r^4d \log d \log(1/\varepsilon))$. Not only is this the
bestknown runtime for a provable algorithm under partial observation, but in
the setting where $r$ is small compared to $d$, it also allows for
nearlinearin$d$ runtime that can be exploited in the fullyobserved case as
well, by simply running our algorithm on a subset of the observations.

In this paper we consider the collaborative ranking setting: a pool of users
each provides a small number of pairwise preferences between $d$ possible
items; from these we need to predict preferences of the users for items they
have not yet seen. We do so by fitting a rank $r$ score matrix to the pairwise
data, and provide two main contributions: (a) we show that an algorithm based
on convex optimization provides good generalization guarantees once each user
provides as few as $O(r\log^2 d)$ pairwise comparisons  essentially matching
the sample complexity required in the related matrix completion setting (which
uses actual numerical as opposed to pairwise information), and (b) we develop a
largescale nonconvex implementation, which we call AltSVM, that trains a
factored form of the matrix via alternating minimization (which we show reduces
to alternating SVM problems), and scales and parallelizes very well to large
problem settings. It also outperforms common baselines on many moderately large
popular collaborative filtering datasets in both NDCG and in other measures of
ranking performance.

We consider the problem of subspace clustering: given points that lie on or
near the union of many lowdimensional linear subspaces, recover the subspaces.
To this end, one first identifies sets of points close to the same subspace and
uses the sets to estimate the subspaces. As the geometric structure of the
clusters (linear subspaces) forbids proper performance of general distance
based approaches such as Kmeans, many modelspecific methods have been
proposed. In this paper, we provide new simple and efficient algorithms for
this problem. Our statistical analysis shows that the algorithms are guaranteed
exact (perfect) clustering performance under certain conditions on the number
of points and the affinity between subspaces. These conditions are weaker than
those considered in the standard statistical literature. Experimental results
on synthetic data generated from the standard unions of subspaces model
demonstrate our theory. We also show that our algorithm performs competitively
against stateoftheart algorithms on realworld applications such as motion
segmentation and face clustering, with much simpler implementation and lower
computational cost.

We introduce a distributed opportunistic scheduling (DOS) strategy, based on
two predetermined thresholds, for uplink $K$cell networks with timeinvariant
channel coefficients. Each base station (BS) opportunistically selects a mobile
station (MS) who has a large signal strength of the desired channel link among
a set of MSs generating a sufficiently small interference to other BSs. Then,
performance on the achievable throughput scaling law is analyzed. As our main
result, it is shown that the achievable sumrate scales as
$K\log(\text{SNR}\log N)$ in a high signaltonoise ratio (SNR) regime, if the
total number of users in a cell, $N$, scales faster than
$\text{SNR}^{\frac{K1}{1\epsilon}}$ for a constant $\epsilon\in(0,1)$. This
result indicates that the proposed scheme achieves the multiuser diversity gain
as well as the degreesoffreedom gain even under multicell environments.
Simulation results show that the DOS provides a better sumrate throughput over
conventional schemes.

We introduce an opportunistic interference mitigation (OIM) protocol, where a
user scheduling strategy is utilized in $K$cell uplink networks with
timeinvariant channel coefficients and base stations (BSs) having $M$
antennas. Each BS opportunistically selects a set of users who generate the
minimum interference to the other BSs. Two OIM protocols are shown according to
the number $S$ of simultaneously transmitting users per cell: opportunistic
interference nulling (OIN) and opportunistic interference alignment (OIA).
Then, their performance is analyzed in terms of degreesoffreedom (DoFs). As
our main result, it is shown that $KM$ DoFs are achievable under the OIN
protocol with $M$ selected users per cell, if the total number $N$ of users in
a cell scales at least as $\text{SNR}^{(K1)M}$. Similarly, it turns out that
the OIA scheme with $S$($<M$) selected users achieves $KS$ DoFs, if $N$ scales
faster than $\text{SNR}^{(K1)S}$. These results indicate that there exists a
tradeoff between the achievable DoFs and the minimum required $N$. By deriving
the corresponding upper bound on the DoFs, it is shown that the OIN scheme is
DoF optimal. Finally, numerical evaluation, a twostep scheduling method, and
the extension to multicarrier scenarios are shown.