
In this paper, we propose a new approach to solve lowrank tensor completion
and robust tensor PCA. Our approach is based on some novel notion of
(evenorder) tensor ranks, to be called the Mrank, the symmetric Mrank, and
the strongly symmetric Mrank. We discuss the connections between these new
tensor ranks and the CPrank and the symmetric CPrank of an evenorder tensor.
We show that the Mrank provides a reliable and easycomputable approximation
to the CPrank. As a result, we propose to replace the CPrank by the Mrank in
the lowCPrank tensor completion and robust tensor PCA. Numerical results
suggest that our new approach based on the Mrank outperforms existing methods
that are based on lownrank, tSVD and KBR approaches for solving lowrank
tensor completion and robust tensor PCA when the underlying tensor has low
CPrank.

In this paper, we consider the linearly constrained composite convex
optimization problem, whose objective is a sum of a smooth function and a
possibly nonsmooth function. We propose an inexact augmented Lagrangian (IAL)
framework for solving the problem. The stopping criterion used in solving the
augmented Lagrangian (AL) subproblem in the proposed IAL framework is weaker
and potentially much easier to check than the one used in most of the existing
IAL frameworks/methods. We analyze the global convergence and the nonergodic
convergence rate of the proposed IAL framework.

Computed tomography (CT) examinations are commonly used to predict lung
nodule malignancy in patients, which are shown to improve noninvasive early
diagnosis of lung cancer. It remains challenging for computational approaches
to achieve performance comparable to experienced radiologists. Here we present
NoduleX, a systematic approach to predict lung nodule malignancy from CT data,
based on deep learning convolutional neural networks (CNN). For training and
validation, we analyze >1000 lung nodules in images from the LIDC/IDRI cohort.
All nodules were identified and classified by four experienced thoracic
radiologists who participated in the LIDC project. NoduleX achieves high
accuracy for nodule malignancy classification, with an AUC of ~0.99. This is
commensurate with the analysis of the dataset by experienced radiologists. Our
approach, NoduleX, provides an effective framework for highly accurate nodule
malignancy prediction with the model trained on a large patient population. Our
results are replicable with software available at
http://bioinformatics.astate.edu/NoduleX.

Nonconvex and nonsmooth optimization problems are frequently encountered in
much of statistics, business, science and engineering, but they are not yet
widely recognized as a technology in the sense of scalability. A reason for
this relatively low degree of popularity is the lack of a well developed system
of theory and algorithms to support the applications, as is the case for its
convex counterpart. This paper aims to take one step in the direction of
disciplined nonconvex and nonsmooth optimization. In particular, we consider in
this paper some constrained nonconvex optimization models in block decision
variables, with or without coupled affine constraints. In the case of without
coupled constraints, we show a sublinear rate of convergence to an
$\epsilon$stationary solution in the form of variational inequality for a
generalized conditional gradient method, where the convergence rate is shown to
be dependent on the H\"olderian continuity of the gradient of the smooth part
of the objective. For the model with coupled affine constraints, we introduce
corresponding $\epsilon$stationarity conditions, and apply two proximaltype
variants of the ADMM to solve such a model, assuming the proximal ADMM updates
can be implemented for all the block variables except for the last block, for
which either a gradient step or a majorizationminimization step is
implemented. We show an iteration complexity bound of $O(1/\epsilon^2)$ to
reach an $\epsilon$stationary solution for both algorithms. Moreover, we show
that the same iteration complexity of a proximal BCD method follows
immediately. Numerical results are provided to illustrate the efficacy of the
proposed algorithms for tensor robust PCA.

The alternating direction method of multipliers (ADMM) has been successfully
applied to solve structured convex optimization problems due to its superior
practical performance. The convergence properties of the 2block ADMM have been
studied extensively in the literature. Specifically, it has been proven that
the 2block ADMM globally converges for any penalty parameter $\gamma>0$. In
this sense, the 2block ADMM allows the parameter to be free, i.e., there is no
need to restrict the value for the parameter when implementing this algorithm
in order to ensure convergence. However, for the 3block ADMM, Chen \etal
\cite{Chenadmmfailure2013} recently constructed a counterexample showing
that it can diverge if no further condition is imposed. The existing results on
studying further sufficient conditions on guaranteeing the convergence of the
3block ADMM usually require $\gamma$ to be smaller than a certain bound, which
is usually either difficult to compute or too small to make it a practical
algorithm. In this paper, we show that the 3block ADMM still globally
converges with any penalty parameter $\gamma>0$ if the third function $f_3$ in
the objective is smooth and strongly convex, and its condition number is in
$[1,1.0798)$, besides some other mild conditions. This requirement covers an
important class of problems to be called regularized least squares
decomposition (RLSD) in this paper.

In this paper we study nonconvex and nonsmooth multiblock optimization over
Riemannian manifolds with coupled linear constraints. Such optimization
problems naturally arise from machine learning, statistical learning,
compressive sensing, image processing, and tensor PCA, among others. We develop
an ADMMlike primaldual approach based on decoupled solvable subroutines such
as linearized proximal mappings. First, we introduce the optimality conditions
for the aforementioned optimization models. Then, the notion of
$\epsilon$stationary solutions is introduced as a result. The main part of the
paper is to show that the proposed algorithms enjoy an iteration complexity of
$O(1/\epsilon^2)$ to reach an $\epsilon$stationary solution. For prohibitively
largesize tensor or machine learning models, we present a samplingbased
stochastic algorithm with the same iteration complexity bound in expectation.
In case the subproblems are not analytically solvable, a feasible curvilinear
linesearch variant of the algorithm based on retraction operators is proposed.
Finally, we show specifically how the algorithms can be implemented to solve a
variety of practical problems such as the NPhard maximum bisection problem,
the $\ell_q$ regularized sparse tensor principal component analysis and the
community detection problem. Our preliminary numerical results show great
potentials of the proposed methods.

In this paper, we extend the geometric descent method recently proposed by
Bubeck, Lee and Singh to tackle nonsmooth and strongly convex composite
problems. We prove that our proposed algorithm, dubbed geometric proximal
gradient method (GeoPG), converges with a linear rate $(11/\sqrt{\kappa})$ and
thus achieves the optimal rate among firstorder methods, where $\kappa$ is the
condition number of the problem. Numerical results on linear regression and
logistic regression with elastic net regularization show that GeoPG compares
favorably with Nesterov's accelerated proximal gradient method, especially when
the problem is illconditioned.

In this paper, we propose a vector transportfree stochastic variance reduced
gradient (SVRG) method with general retraction for empirical risk minimization
over Riemannian manifold. Existing SVRG methods on manifold usually consider a
specific retraction operation, and involve additional computational costs such
as parallel transport or vector transport. The vector transportfree SVRG with
general retraction we propose in this paper handles general retraction
operations, and do not need additional computational costs mentioned above. As
a result, we name our algorithm SSVRG, where the first "S" means simple. We
analyze the iteration complexity of SSVRG for obtaining an
$\epsilon$stationary point and its local linear convergence by assuming the \L
ojasiewicz inequality, which naturally holds for PCA and holds with high
probability for matrix completion problem. We also incorporate the
BarzilaiBorwein step size and design a very practical SSVRGBB method.
Numerical results on PCA and matrix completion problems are reported to
demonstrate the efficiency of our methods.

In this paper we study stochastic quasiNewton methods for nonconvex
stochastic optimization, where we assume that noisy information about the
gradients of the objective function is available via a stochastic firstorder
oracle (SFO). We propose a general framework for such methods, for which we
prove almost sure convergence to stationary points and analyze its worstcase
iteration complexity. When a randomly chosen iterate is returned as the output
of such an algorithm, we prove that in the worstcase, the SFOcalls complexity
is $O(\epsilon^{2})$ to ensure that the expectation of the squared norm of the
gradient is smaller than the given accuracy tolerance $\epsilon$. We also
propose a specific algorithm, namely a stochastic damped LBFGS (SdLBFGS)
method, that falls under the proposed framework. {Moreover, we incorporate the
SVRG variance reduction technique into the proposed SdLBFGS method, and analyze
its SFOcalls complexity. Numerical results on a nonconvex binary
classification problem using SVM, and a multiclass classification problem using
neural networks are reported.

Given an undirected graph $\mathcal{G}=(\mathcal{N},\mathcal{E})$ of agents
$\mathcal{N}=\{1,\ldots,N\}$ connected with edges in $\mathcal{E}$, we study
how to compute an optimal decision on which there is consensus among agents and
that minimizes the sum of agentspecific private convex composite functions
$\{\Phi_i\}_{i\in\mathcal{N}}$ while respecting privacy requirements, where
$\Phi_i\triangleq \xi_i + f_i$ belongs to agent$i$. Assuming only agents
connected by an edge can communicate, we propose a distributed proximal
gradient method DPGA for consensus optimization over both unweighted and
weighted static (undirected) communication networks. In one iteration, each
agent$i$ computes the prox map of $\xi_i$ and gradient of $f_i$, and this is
followed by local communication with neighboring agents. We also study its
stochastic gradient variant, SDPGA, which can only access to noisy estimates of
$\nabla f_i$ at each agent$i$. This computational model abstracts a number of
applications in distributed sensing, machine learning and statistical
inference. We show ergodic convergence in both suboptimality error and
consensus violation for DPGA and SDPGA with rates $\mathcal{O}(1/t)$ and
$\mathcal{O}(1/\sqrt{t})$, respectively.

One of the major issues in stochastic gradient descent (SGD) methods is how
to choose an appropriate step size while running the algorithm. Since the
traditional line search technique does not apply for stochastic optimization
algorithms, the common practice in SGD is either to use a diminishing step
size, or to tune a fixed step size by hand, which can be time consuming in
practice. In this paper, we propose to use the BarzilaiBorwein (BB) method to
automatically compute step sizes for SGD and its variant: stochastic variance
reduced gradient (SVRG) method, which leads to two algorithms: SGDBB and
SVRGBB. We prove that SVRGBB converges linearly for strongly convex objective
functions. As a byproduct, we prove the linear convergence result of SVRG with
Option I proposed in [10], whose convergence result is missing in the
literature. Numerical experiments on standard data sets show that the
performance of SGDBB and SVRGBB is comparable to and sometimes even better
than SGD and SVRG with besttuned step sizes, and is superior to some advanced
SGD variants.

In this paper, we propose a class of penalty methods with stochastic
approximation for solving stochastic nonlinear programming problems. We assume
that only noisy gradients or function values of the objective function are
available via calls to a stochastic firstorder or zerothorder oracle. In each
iteration of the proposed methods, we minimize an exact penalty function which
is nonsmooth and nonconvex with only stochastic firstorder or zerothorder
information available. Stochastic approximation algorithms are presented for
solving this particular subproblem. The worstcase complexity of calls to the
stochastic firstorder (or zerothorder) oracle for the proposed penalty
methods for obtaining an $\epsilon$stochastic critical point is analyzed.

Gauge duality theory was originated by Freund [Math. Programming,
38(1):4767, 1987] and was recently further investigated by Friedlander,
Mac{\^e}do and Pong [SIAM J. Optm., 24(4):19992022, 2014]. When solving some
matrix optimization problems via gauge dual, one is usually able to avoid full
matrix decompositions such as singular value and/or eigenvalue decompositions.
In such an approach, a gauge dual problem is solved in the first stage, and
then an optimal solution to the primal problem can be recovered from the dual
optimal solution obtained in the first stage. Recently, this theory has been
applied to a class of \emph{semidefinite programming} (SDP) problems with
promising numerical results [Friedlander and Mac{\^e}do, SIAM J. Sci. Comp., to
appear, 2016]. In this paper, we establish some theoretical results on applying
the gauge duality theory to robust \emph{principal component analysis} (PCA)
and general SDP. For each problem, we present its gauge dual problem,
characterize the optimality conditions for the primaldual gauge pair, and
validate a way to recover a primal optimal solution from a dual one. These
results are extensions of [Friedlander and Mac{\^e}do, SIAM J. Sci. Comp., to
appear, 2016] from nuclear norm regularization to robust PCA and from a special
class of SDP which requires the coefficient matrix in the linear objective to
be positive definite to SDP problems without this restriction. Our results
provide further understanding in the potential advantages and disadvantages of
the gauge duality theory.

In this paper, we consider largescale linearly constrained composite convex
optimization problem, whose objective is a sum of a smooth function and a
possibly nonsmooth function. We propose a scalable
\textbf{F}rank\textbf{W}olfe based \textbf{A}ugmented \textbf{L}agrangian
(FWAL) method for solving this problem. At each iteration, the proposed FWAL
method employs the FW method (or its variants) to approximately solve the AL
subproblem {(with fixed Lagrange multiplier)} within a preselected tolerance
and then updates the Lagrange multiplier. The proposed FWAL method is well
suitable for solving largescale problems, because its computational cost per
step scales (essentially) linearly with the size of the input. We analyze the
nonergodic convergence rate of the proposed FWAL method.

The alternating direction method of multipliers (ADMM) is widely used in
solving structured convex optimization problems. Despite of its success in
practice, the convergence properties of the standard ADMM for minimizing the
sum of $N$ $(N\geq 3)$ convex functions with $N$ block variables linked by
linear constraints, have remained unclear for a very long time. In this paper,
we present convergence and convergence rate results for the standard ADMM
applied to solve $N$block $(N\geq 3)$ convex minimization problem, under the
condition that one of these functions is convex (not necessarily strongly
convex) and the other $N1$ functions are strongly convex. Specifically, in
that case the ADMM is proven to converge with rate $O(1/t)$ in a certain
ergodic sense, and $o(1/t)$ in nonergodic sense, where $t$ denotes the number
of iterations. As a byproduct, we also provide a simple proof for the $O(1/t)$
convergence rate of twoblock ADMM in terms of both objective error and
constraint violation, without assuming any condition on the penalty parameter
and strong convexity on the functions.

In this paper, we consider the problem of minimizing the sum of two convex
functions subject to linear linking constraints. The classical alternating
direction type methods usually assume that the two convex functions have
relatively easy proximal mappings. However, many problems arising from
statistics, image processing and other fields have the structure that while one
of the two functions has easy proximal mapping, the other function is smoothly
convex but does not have an easy proximal mapping. Therefore, the classical
alternating direction methods cannot be applied. To deal with the difficulty,
we propose in this paper an alternating direction method based on
extragradients. Under the assumption that the smooth function has a Lipschitz
continuous gradient, we prove that the proposed method returns an
$\epsilon$optimal solution within $O(1/\epsilon)$ iterations. We apply the
proposed method to solve a new statistical model called fused logistic
regression. Our numerical experiments show that the proposed method performs
very well when solving the test problems. We also test the performance of the
proposed method through solving the lasso problem arising from statistics and
compare the result with several existing efficient solvers for this problem;
the results are very encouraging indeed.

The alternating direction method of multipliers (ADMM) has been widely used
for solving structured convex optimization problems. In particular, the ADMM
can solve convex programs that minimize the sum of $N$ convex functions with
$N$block variables linked by some linear constraints. While the convergence of
the ADMM for $N=2$ was well established in the literature, it remained an open
problem for a long time whether or not the ADMM for $N \ge 3$ is still
convergent. Recently, it was shown in [3] that without further conditions the
ADMM for $N\ge 3$ may actually fail to converge. In this paper, we show that
under some easily verifiable and reasonable conditions the global linear
convergence of the ADMM when $N\geq 3$ can still be assured, which is important
since the ADMM is a popular method for solving large scale multiblock
optimization models and is known to perform very well in practice even when
$N\ge 3$. Our study aims to offer an explanation for this phenomenon.

The alternating direction method of multipliers (ADMM) is widely used in
solving structured convex optimization problems due to its superior practical
performance. On the theoretical side however, a counterexample was shown in [7]
indicating that the multiblock ADMM for minimizing the sum of $N$ $(N\geq 3)$
convex functions with $N$ block variables linked by linear constraints may
diverge. It is therefore of great interest to investigate further sufficient
conditions on the input side which can guarantee convergence for the
multiblock ADMM. The existing results typically require the strong convexity
on parts of the objective. In this paper, we present convergence and
convergence rate results for the multiblock ADMM applied to solve certain
$N$block $(N\geq 3)$ convex minimization problems without requiring strong
convexity. Specifically, we prove the following two results: (1) the
multiblock ADMM returns an $\epsilon$optimal solution within
$O(1/\epsilon^2)$ iterations by solving an associated perturbation to the
original problem; (2) the multiblock ADMM returns an $\epsilon$optimal
solution within $O(1/\epsilon)$ iterations when it is applied to solve a
certain sharing problem, under the condition that the augmented Lagrangian
function satisfies the KurdykaLojasiewicz property, which essentially covers
most convex optimization models except for some pathological cases.

In this paper we study stochastic quasiNewton methods for nonconvex
stochastic optimization, where we assume that only stochastic information of
the gradients of the objective function is available via a stochastic
firstorder oracle (SFO). Firstly, we propose a general framework of stochastic
quasiNewton methods for solving nonconvex stochastic optimization. The
proposed framework extends the classic quasiNewton methods working in
deterministic settings to stochastic settings, and we prove its almost sure
convergence to stationary points. Secondly, we propose a general framework for
a class of randomized stochastic quasiNewton methods, in which the number of
iterations conducted by the algorithm is a random variable. The worstcase
SFOcalls complexities of this class of methods are analyzed. Thirdly, we
present two specific methods that fall into this framework, namely stochastic
dampedBFGS method and stochastic cyclic BarzilaiBorwein method. Finally, we
report numerical results to demonstrate the efficiency of the proposed methods.

In an interference limited network, joint power and admission control (JPAC)
aims at supporting a maximum number of links at their specified signal to
interference plus noise ratio (SINR) targets while using a minimum total
transmission power. Various convex approximation deflation approaches have been
developed for the JPAC problem. In this paper, we propose an effective
polynomial time nonconvex approximation deflation approach for solving the
problem. The approach is based on the nonconvex $\ell_q$minimization
approximation of an equivalent sparse $\ell_0$minimization reformulation of
the JPAC problem where $q\in(0,1).$ We show that, for any instance of the JPAC
problem, there exists a $\bar q\in(0,1)$ such that it can be exactly solved by
solving its $\ell_q$minimization approximation problem with any $q\in(0, \bar
q]$. We also show that finding the global solution of the $\ell_q$
approximation problem is NPhard. Then, we propose a potential reduction
interiorpoint algorithm, which can return an $\epsilon$KKT solution of the
NPhard $\ell_q$minimization approximation problem in polynomial time. The
returned solution can be used to check the simultaneous supportability of all
links in the network and to guide an iterative link removal procedure,
resulting in the polynomial time nonconvex approximation deflation approach
for the JPAC problem. Numerical simulations show that the proposed approach
outperforms the existing convex approximation approaches in terms of the number
of supported links and the total transmission power, particularly exhibiting a
quite good performance in selecting which subset of links to support.

The primaldual algorithm recently proposed by Chambolle & Pock (abbreviated
as CPA) for structured convex optimization is very efficient and popular. It
was shown by Chambolle & Pock in \cite{CP11} and also by Shefi & Teboulle in
\cite{ST14} that CPA and variants are closely related to preconditioned
versions of the popular alternating direction method of multipliers
(abbreviated as ADM). In this paper, we further clarify this connection and
show that CPAs generate exactly the same sequence of points with the socalled
linearized ADM (abbreviated as LADM) applied to either the primal problem or
its Lagrangian dual, depending on different updating orders of the primal and
the dual variables in CPAs, as long as the initial points for the LADM are
properly chosen. The dependence on initial points for LADM can be relaxed by
focusing on cyclically equivalent forms of the algorithms. Furthermore, by
utilizing the fact that CPAs are applications of a general weighted proximal
point method to the mixed variational inequality formulation of the KKT system,
where the weighting matrix is positive definite under a parameter condition, we
are able to propose and analyze inertial variants of CPAs. Under certain
conditions, global pointconvergence, nonasymptotic $O(1/k)$ and asymptotic
$o(1/k)$ convergence rate of the proposed inertial CPAs can be guaranteed,
where $k$ denotes the iteration index. Finally, we demonstrate the profits
gained by introducing the inertial extrapolation step via experimental results
on compressive image reconstruction based on total variation minimization.

We consider the image denoising problem using total variation (TV)
regularization. This problem can be computationally challenging to solve due to
the nondifferentiability and nonlinearity of the regularization term. We
propose an alternating direction augmented Lagrangian (ADAL) method, based on a
new variable splitting approach that results in subproblems that can be solved
efficiently and exactly. The global convergence of the new algorithm is
established for the anisotropic TV model. For the isotropic TV model, by doing
further variable splitting, we are able to derive an ADAL method that is
globally convergent. We compare our methods with the split Bregman method
\cite{goldstein2009split},which is closely related to it, and demonstrate their
competitiveness in computational performance on a set of standard test images.

In this paper, we first propose a general inertial proximal point method for
the mixed variational inequality (VI) problem. Based on our knowledge, without
stronger assumptions, convergence rate result is not known in the literature
for inertial type proximal point methods. Under certain conditions, we are able
to establish the global convergence and a $o(1/k)$ convergence rate result
(under certain measure) of the proposed general inertial proximal point method.
We then show that the linearized alternating direction method of multipliers
(ADMM) for separable convex optimization with linear constraints is an
application of a general proximal point method, provided that the algorithmic
parameters are properly chosen. As byproducts of this finding, we establish
global convergence and $O(1/k)$ convergence rate results of the linearized ADMM
in both ergodic and nonergodic sense. In particular, by applying the proposed
inertial proximal point method for mixed VI to linearly constrained separable
convex optimization, we obtain an inertial version of the linearized ADMM for
which the global convergence is guaranteed. We also demonstrate the effect of
the inertial extrapolation step via experimental results on the compressive
principal component pursuit problem.

The composite $L_q~(0<q<1)$ minimization problem over a general polyhedron
has received various applications in machine learning, wireless communications,
image restoration, signal reconstruction, etc. This paper aims to provide a
theoretical study on this problem. Firstly, we show that for any fixed $0<q<1$,
finding the global minimizer of the problem, even its unconstrained
counterpart, is strongly NPhard. Secondly, we derive KarushKuhnTucker (KKT)
optimality conditions for local minimizers of the problem. Thirdly, we propose
a smoothing sequential quadratic programming framework for solving this
problem. The framework requires a (approximate) solution of a convex quadratic
program at each iteration. Finally, we analyze the worstcase iteration
complexity of the framework for returning an $\epsilon$KKT point; i.e., a
feasible point that satisfies a perturbed version of the derived KKT optimality
conditions. To the best of our knowledge, the proposed framework is the first
one with a worstcase iteration complexity guarantee for solving composite
$L_q$ minimization over a general polyhedron.

Consider the problem of minimizing the sum of a smooth convex function and a
separable nonsmooth convex function subject to linear coupling constraints.
Problems of this form arise in many contemporary applications including signal
processing, wireless networking and smart grid provisioning. Motivated by the
huge size of these applications, we propose a new class of first order
primaldual algorithms called the block successive upperbound minimization
method of multipliers (BSUMM) to solve this family of problems. The BSUMM
updates the primal variable blocks successively by minimizing locally tight
upperbounds of the augmented Lagrangian of the original problem, followed by a
gradient type update for the dual variable in closed form. We show that under
certain regularity conditions, and when the primal block variables are updated
in either a deterministic or a random fashion, the BSUMM converges to the set
of optimal solutions. Moreover, in the absence of linear constraints, we show
that the BSUMM, which reduces to the block successive upperbound minimization
(BSUM) method, is capable of linear convergence without strong convexity.