• Random permutation is observed to be powerful for optimization algorithms: for multi-block ADMM (alternating direction method of multipliers), while the classical cyclic version divergence, the randomly permuted version converges in practice; for BCD (block coordinate descent), the randomly permuted version is typically faster than other versions. In this paper, we provide strong theoretical evidence that random permutation has positive effects on ADMM and BCD, by analyzing randomly permuted ADMM (RP-ADMM) for solving linear systems of equations, and randomly permuted BCD (RP-BCD) for solving unconstrained quadratic problems. First, we prove that RP-ADMM converges in expectation for solving systems of linear equations. The key technical result is that the spectrum of the expected update matrix of RP-BCD lies in $(-1/3, 1)$, instead of the typical range $(-1, 1)$. Second, we establish expected convergence rates of RP-ADMM for solving linear sytems and RP-BCD for solving unconstrained quadratic problems. This expected rate of RP-BCD is $O(n)$ times better than the worst-case rate of cyclic BCD, thus establishing a gap of at least $O(n)$ between RP-BCD and cyclic BCD. To analyze RP-BCD, we propose a conjecture of a new matrix AM-GM (algebraic mean-geometric mean) inequality, and prove a weaker version of it.
  • An important step in a multi-sensor surveillance system is to estimate sensor biases from their noisy asynchronous measurements. This estimation problem is computationally challenging due to the highly nonlinear transformation between the global and local coordinate systems as well as the measurement asynchrony from different sensors. In this paper, we propose a novel nonlinear least squares (LS) formulation for the problem by assuming the existence of a reference target moving with an (unknown) constant velocity. We also propose an efficient block coordinate decent (BCD) optimization algorithm, with a judicious initialization, to solve the problem. The proposed BCD algorithm alternately updates the range and azimuth bias estimates by solving linear least squares problems and semidefinite programs (SDPs). In the absence of measurement noise, the proposed algorithm is guaranteed to find the global solution of the problem and the true biases. Simulation results show that the proposed algorithm significantly outperforms the existing approaches in terms of the root mean square error (RMSE).
  • To support multiple on-demand services over fixed communication networks, network operators must allow flexible customization and fast provision of their network resources. One effective approach to this end is network virtualization, whereby each service is mapped to a virtual subnetwork providing dedicated on-demand support to network users. In practice, each service consists of a prespecified sequence of functions, called a service function chain (SFC), while each service function in a SFC can only be provided by some given network nodes. Thus, to support a given service, we must select network function nodes according to the SFC and determine the routing strategy through the function nodes in a specified order. A crucial network slicing problem that needs to be addressed is how to optimally localize the service functions in a physical network as specified by the SFCs, subject to link and node capacity constraints. In this paper, we formulate the network slicing problem as a mixed binary linear program and establish its strong NP-hardness. Furthermore, we propose efficient penalty successive upper bound minimization (PSUM) and PSUM-R(ounding) algorithms, and two heuristic algorithms to solve the problem. Simulation results are shown to demonstrate the effectiveness of the proposed algorithms.
  • We consider the problem of reserving link capacity in a network in such a way that any of a given set of flow scenarios can be supported. In the optimal capacity reservation problem, we choose the reserved link capacities to minimize the reservation cost. This problem reduces to a large linear program, with the number of variables and constraints on the order of the number of links times the number of scenarios. Small and medium size problems are within the capabilities of generic linear program solvers. We develop a more scalable, distributed algorithm for the problem that alternates between solving (in parallel) one flow problem per scenario, and coordination steps, which connect the individual flows and the reservation capacities.
  • Matrix factorization is a popular approach for large-scale matrix completion. The optimization formulation based on matrix factorization can be solved very efficiently by standard algorithms in practice. However, due to the non-convexity caused by the factorization model, there is a limited theoretical understanding of this formulation. In this paper, we establish a theoretical guarantee for the factorization formulation to correctly recover the underlying low-rank matrix. In particular, we show that under similar conditions to those in previous works, many standard optimization algorithms converge to the global optima of a factorization formulation, and recover the true low-rank matrix. We study the local geometry of a properly regularized factorization formulation and prove that any stationary point in a certain local region is globally optimal. A major difference of our work from the existing results is that we do not need resampling in either the algorithm or its analysis. Compared to other works on nonconvex optimization, one extra difficulty lies in analyzing nonconvex constrained optimization when the constraint (or the corresponding regularizer) is not "consistent" with the gradient direction. One technical contribution is the perturbation analysis for non-symmetric matrix factorization.
  • The joint base station (BS) association and beamforming problem has been studied extensively in recent years, yet the computational complexity for even the simplest SISO case has not been fully characterized. In this paper, we consider the problems for an uplink SISO/SIMO cellular network under the max-min fairness criterion. We first prove that the problems for both the SISO and SIMO scenarios are polynomial time solvable. Secondly, we present a fixed point based binary search (BS-FP) algorithm for both SISO and SIMO scenarios whereby a QoS (Quality of Service) constrained subproblem is solved at each step by a fixed point method. Thirdly, we propose a normalized fixed point (NFP) iterative algorithm to directly solve the original problem and prove its geometric convergence to global optima. Although it is not known whether the NFP algorithm is a polynomial time algorithm, empirically it converges to the global optima orders of magnitude faster than the polynomial time algorithms, making it suitable for applications in huge-scale networks.
  • The alternating direction method of multipliers (ADMM) is widely used to solve large-scale linearly constrained optimization problems, convex or nonconvex, in many engineering fields. However there is a general lack of theoretical understanding of the algorithm when the objective function is nonconvex. In this paper we analyze the convergence of the ADMM for solving certain nonconvex consensus and sharing problems, and show that the classical ADMM converges to the set of stationary solutions, provided that the penalty parameter in the augmented Lagrangian is chosen to be sufficiently large. For the sharing problems, we show that the ADMM is convergent regardless of the number of variable blocks. Our analysis does not impose any assumptions on the iterates generated by the algorithm, and is broadly applicable to many ADMM variants involving proximal update rules and various flexible block selection rules.
  • This article presents a powerful algorithmic framework for big data optimization, called the Block Successive Upper bound Minimization (BSUM). The BSUM includes as special cases many well-known methods for analyzing massive data sets, such as the Block Coordinate Descent (BCD), the Convex-Concave Procedure (CCCP), the Block Coordinate Proximal Gradient (BCPG) method, the Nonnegative Matrix Factorization (NMF), the Expectation Maximization (EM) method and so on. In this article, various features and properties of the BSUM are discussed from the viewpoint of design flexibility, computational efficiency, parallel/distributed implementation and the required communication overhead. Illustrative examples from networking, signal processing and machine learning are presented to demonstrate the practical performance of the BSUM framework
  • In this paper we consider the dictionary learning problem for sparse representation. We first show that this problem is NP-hard by polynomial time reduction of the densest cut problem. Then, using successive convex approximation strategies, we propose efficient dictionary learning schemes to solve several practical formulations of this problem to stationary points. Unlike many existing algorithms in the literature, such as K-SVD, our proposed dictionary learning scheme is theoretically guaranteed to converge to the set of stationary points under certain mild assumptions. For the image denoising application, the performance and the efficiency of the proposed dictionary learning scheme are comparable to that of K-SVD algorithm in simulation.
  • Consider a multi-input multi-output (MIMO) downlink multi-user channel. A well-studied problem in such system is the design of linear beamformers for power minimization with the quality of service (QoS) constraints. The most representative algorithms for solving this class of problems are the so-called MMSE-SOCP algorithm [11-12] and the UDD algorithm [9]. The former is based on alternating optimization of the transmit and receive beamformers, while the latter is based on the well-known uplink-dowlink duality theory. Despite their wide applicability, the convergence (to KKT solutions) of both algorithms is still open in the literature. In this paper, we rigorously establish the convergence of these algorithms for QoS-constrained power minimization (QCPM) problem with both single stream and multiple streams per user cases. Key to our analysis is the development and analysis of a new MMSE-DUAL algorithm, which connects the MMSE-SOCP and the UDD algorithm. Our numerical experiments show that 1) all these algorithms can almost always reach points with the same objective value irrespective of initialization, 2) the MMSE-SOCP/MMSE-DUAL algorithm works well while the UDD algorithm may fail with an infeasible initialization.
  • In this paper, we provide a unified iteration complexity analysis for a family of general block coordinate descent (BCD) methods, covering popular methods such as the block coordinate gradient descent (BCGD) and the block coordinate proximal gradient (BCPG), under various different coordinate update rules. We unify these algorithms under the so-called Block Successive Upper-bound Minimization (BSUM) framework, and show that for a broad class of multi-block nonsmooth convex problems, all algorithms covered by the BSUM framework achieve a global sublinear iteration complexity of $O(1/r)$, where r is the iteration index. Moreover, for the case of block coordinate minimization (BCM) where each block is minimized exactly, we establish the sublinear convergence rate of $O(1/r)$ without per block strong convexity assumption. Further, we show that when there are only two blocks of variables, a special BSUM algorithm with Gauss-Seidel rule can be accelerated to achieve an improved rate of $O(1/r^2)$.
  • Vector space interference alignment (IA) is known to achieve high degrees of freedom (DoF) with infinite independent channel extensions, but its performance is largely unknown for a finite number of possibly dependent channel extensions. In this paper, we consider a $K$-user $M_t \times M_r$ MIMO interference channel (IC) with arbitrary number of channel extensions $T$ and arbitrary channel diversity order $L$ (i.e., each channel matrix is a generic linear combination of $L$ fixed basis matrices). We study the maximum DoF achievable via vector space IA in the single beam case (i.e. each user sends one data stream). We prove that the total number of users $K$ that can communicate interference-free using linear transceivers is upper bounded by $NL+N^2/4$, where $N = \min\{M_tT, M_rT \}$. An immediate consequence of this upper bound is that for a SISO IC the DoF in the single beam case is no more than $\min\left\{\sqrt{ 5K/4}, L + T/4\right\}$. When the channel extensions are independent, i.e. $ L$ achieves the maximum $M_r M_t T $, we show that this maximum DoF lies in $[M_r+M_t-1, M_r+M_t]$ regardless of $T$. Unlike the well-studied constant MIMO IC case, the main difficulty is how to deal with a hybrid system of equations (zero-forcing condition) and inequalities (full rank condition). Our approach combines algebraic tools that deal with equations with an induction analysis that indirectly considers the inequalities.
  • Consider the problem of minimizing the sum of a smooth (possibly non-convex) and a convex (possibly nonsmooth) function involving a large number of variables. A popular approach to solve this problem is the block coordinate descent (BCD) method whereby at each iteration only one variable block is updated while the remaining variables are held fixed. With the recent advances in the developments of the multi-core parallel processing technology, it is desirable to parallelize the BCD method by allowing multiple blocks to be updated simultaneously at each iteration of the algorithm. In this work, we propose an inexact parallel BCD approach where at each iteration, a subset of the variables is updated in parallel by minimizing convex approximations of the original objective function. We investigate the convergence of this parallel BCD method for both randomized and cyclic variable selection rules. We analyze the asymptotic and non-asymptotic convergence behavior of the algorithm for both convex and non-convex objective functions. The numerical experiments suggest that for a special case of Lasso minimization problem, the cyclic block selection rule can outperform the randomized rule.
  • We consider the problem of minimizing block-separable convex functions subject to linear constraints. While the Alternating Direction Method of Multipliers (ADMM) for two-block linear constraints has been intensively studied both theoretically and empirically, in spite of some preliminary work, effective generalizations of ADMM to multiple blocks is still unclear. In this paper, we propose a randomized block coordinate method named Parallel Direction Method of Multipliers (PDMM) to solve the optimization problems with multi-block linear constraints. PDMM randomly updates some primal and dual blocks in parallel, behaving like parallel randomized block coordinate descent. We establish the global convergence and the iteration complexity for PDMM with constant step size. We also show that PDMM can do randomized block coordinate descent on overlapping blocks. Experimental results show that PDMM performs better than state-of-the-arts methods in two applications, robust principal component analysis and overlapping group lasso.
  • In a heterogeneous network (HetNet) with a large number of low power base stations (BSs), proper user-BS association and power control is crucial to achieving desirable system performance. In this paper, we systematically study the joint BS association and power allocation problem for a downlink cellular network under the max-min fairness criterion. First, we show that this problem is NP-hard. Second, we show that the upper bound of the optimal value can be easily computed, and propose a two-stage algorithm to find a high-quality suboptimal solution. Simulation results show that the proposed algorithm is near-optimal in the high-SNR regime. Third, we show that the problem under some additional mild assumptions can be solved to global optima in polynomial time by a semi-distributed algorithm. This result is based on a transformation of the original problem to an assignment problem with gains $\log(g_{ij})$, where $\{g_{ij}\}$ are the channel gains.
  • Consider the problem of minimizing the sum of two convex functions, one being smooth and the other non-smooth. In this paper, we introduce a general class of approximate proximal splitting (APS) methods for solving such minimization problems. Methods in the APS class include many well-known algorithms such as the proximal splitting method (PSM), the block coordinate descent method (BCD) and the approximate gradient projection methods for smooth convex optimization. We establish the linear convergence of APS methods under a local error bound assumption. Since the latter is known to hold for compressive sensing and sparse group LASSO problems, our analysis implies the linear convergence of the BCD method for these problems without strong convexity assumption.
  • Motivated by applications in wireless communications, this paper develops semidefinite programming (SDP) relaxation techniques for some mixed binary quadratically constrained quadratic programs (MBQCQP) and analyzes their approximation performance. We consider both a minimization and a maximization model of this problem. For the minimization model, the objective is to find a minimum norm vector in $N$-dimensional real or complex Euclidean space, such that $M$ concave quadratic constraints and a cardinality constraint are satisfied with both binary and continuous variables. {\color{blue}By employing a special randomized rounding procedure, we show that the ratio between the norm of the optimal solution of the minimization model and its SDP relaxation is upper bounded by $\cO(Q^2(M-Q+1)+M^2)$ in the real case and by $\cO(M(M-Q+1))$ in the complex case.} For the maximization model, the goal is to find a maximum norm vector subject to a set of quadratic constraints and a cardinality constraint with both binary and continuous variables. We show that in this case the approximation ratio is bounded from below by $\cO(\epsilon/\ln(M))$ for both the real and the complex cases. Moreover, this ratio is tight up to a constant factor.
  • Consider the problem of minimizing the sum of a smooth convex function and a separable nonsmooth convex function subject to linear coupling constraints. Problems of this form arise in many contemporary applications including signal processing, wireless networking and smart grid provisioning. Motivated by the huge size of these applications, we propose a new class of first order primal-dual algorithms called the block successive upper-bound minimization method of multipliers (BSUM-M) to solve this family of problems. The BSUM-M updates the primal variable blocks successively by minimizing locally tight upper-bounds of the augmented Lagrangian of the original problem, followed by a gradient type update for the dual variable in closed form. We show that under certain regularity conditions, and when the primal block variables are updated in either a deterministic or a random fashion, the BSUM-M converges to the set of optimal solutions. Moreover, in the absence of linear constraints, we show that the BSUM-M, which reduces to the block successive upper-bound minimization (BSUM) method, is capable of linear convergence without strong convexity.
  • We consider a heterogeneous network (HetNet) of base stations (BSs) connected via a backhaul network of routers and wired/wireless links with limited capacity. The optimal provision of such networks requires proper resource allocation across the radio access links in conjunction with appropriate traffic engineering within the backhaul network. In this paper we propose an efficient algorithm for joint resource allocation across the wireless links and the flow control within the backhaul network. The proposed algorithm, which maximizes the minimum rate among all the users and/or flows, is based on a decomposition approach that leverages both the Alternating Direction Method of Multipliers (ADMM) and the weighted-MMSE (WMMSE) algorithm. We show that this algorithm is easily parallelizable and converges globally to a stationary solution of the joint optimization problem. The proposed algorithm can also be extended to deal with per-flow quality of service constraint, or to networks with multi-antenna nodes.
  • In a wireless system with a large number of distributed nodes, the quality of communication can be greatly improved by pooling the nodes to perform joint transmission/reception. In this paper, we consider the problem of optimally selecting a subset of nodes from potentially a large number of candidates to form a virtual multi-antenna system, while at the same time designing their joint linear transmission strategies. We focus on two specific application scenarios: 1) multiple single antenna transmitters cooperatively transmit to a receiver; 2) a single transmitter transmits to a receiver with the help of a number of cooperative relays. We formulate the joint node selection and beamforming problems as cardinality constrained optimization problems with both discrete variables (used for selecting cooperative nodes) and continuous variables (used for designing beamformers). For each application scenario, we first characterize the computational complexity of the joint optimization problem, and then propose novel semi-definite relaxation (SDR) techniques to obtain approximate solutions. We show that the new SDR algorithms have a guaranteed approximation performance in terms of the gap to global optimality, regardless of channel realizations. The effectiveness of the proposed algorithms is demonstrated via numerical experiments.
  • In a densely deployed heterogeneous network (HetNet), the number of pico/micro base stations (BS) can be comparable with the number of the users. To reduce the operational overhead of the HetNet, proper identification of the set of serving BSs becomes an important design issue. In this work, we show that by jointly optimizing the transceivers and determining the active set of BSs, high system resource utilization can be achieved with only a small number of BSs. In particular, we provide formulations and efficient algorithms for such joint optimization problem, under the following two common design criteria: i) minimization of the total power consumption at the BSs, and ii) maximization of the system spectrum efficiency. In both cases, we introduce a nonsmooth regularizer to facilitate the activation of the most appropriate BSs. We illustrate the efficiency and the efficacy of the proposed algorithms via extensive numerical simulations.
  • In this paper, we consider solving multiple-block separable convex minimization problems using alternating direction method of multipliers (ADMM). Motivated by the fact that the existing convergence theory for ADMM is mostly limited to the two-block case, we analyze in this paper, both theoretically and numerically, a new strategy that first transforms a multi-block problem into an equivalent two-block problem (either in the primal domain or in the dual domain) and then solves it using the standard two-block ADMM. In particular, we derive convergence results for this two-block ADMM approach to solve multi-block separable convex minimization problems, including an improved O(1/\epsilon) iteration complexity result. Moreover, we compare the numerical efficiency of this approach with the standard multi-block ADMM on several separable convex minimization problems which include basis pursuit, robust principal component analysis and latent variable Gaussian graphical model selection. The numerical results show that the multiple-block ADMM, although lacks theoretical convergence guarantees, typically outperforms two-block ADMMs.
  • Consider the problem of minimizing the expected value of a cost function parameterized by a random variable. The classical sample average approximation (SAA) method for solving this problem requires minimization of an ensemble average of the objective at each step, which can be expensive. In this paper, we propose a stochastic successive upper-bound minimization method (SSUM) which minimizes an approximate ensemble average at each iteration. To ensure convergence and to facilitate computation, we require the approximate ensemble average to be a locally tight upper-bound of the expected cost function and be easily optimized. The main contributions of this work include the development and analysis of the SSUM method as well as its applications in linear transceiver design for wireless communication networks and online dictionary learning. Moreover, using the SSUM framework, we extend the classical stochastic (sub-)gradient (SG) method to the case of minimizing a nonsmooth nonconvex objective function and establish its convergence.
  • We analyze the convergence rate of the alternating direction method of multipliers (ADMM) for minimizing the sum of two or more nonsmooth convex separable functions subject to linear constraints. Previous analysis of the ADMM typically assumes that the objective function is the sum of only two convex functions defined on two separable blocks of variables even though the algorithm works well in numerical experiments for three or more blocks. Moreover, there has been no rate of convergence analysis for the ADMM without strong convexity in the objective function. In this paper we establish the global linear convergence of the ADMM for minimizing the sum of any number of convex separable functions. This result settles a key question regarding the convergence of the ADMM when the number of blocks is more than two or if the strong convexity is absent. It also implies the linear convergence of the ADMM for several contemporary applications including LASSO, Group LASSO and Sparse Group LASSO without any strong convexity assumption. Our proof is based on estimating the distance from a dual feasible solution to the optimal dual solution set by the norm of a certain proximal residual, and by requiring the dual stepsize to be sufficiently small.
  • We consider the interference management problem in a multicell MIMO heterogenous network. Within each cell there are a large number of distributed micro/pico base stations (BSs) that can be potentially coordinated for joint transmission. To reduce coordination overhead, we consider user-centric BS clustering so that each user is served by only a small number of (potentially overlapping) BSs. Thus, given the channel state information, our objective is to jointly design the BS clustering and the linear beamformers for all BSs in the network. In this paper, we formulate this problem from a {sparse optimization} perspective, and propose an efficient algorithm that is based on iteratively solving a sequence of group LASSO problems. A novel feature of the proposed algorithm is that it performs BS clustering and beamformer design jointly rather than separately as is done in the existing approaches for partial coordinated transmission. Moreover, the cluster size can be controlled by adjusting a single penalty parameter in the nonsmooth regularized utility function. The convergence of the proposed algorithm (to a local optimal solution) is guaranteed, and its effectiveness is demonstrated via extensive simulation.