• This paper considers optimization over multiple renewal systems coupled by time average constraints. These systems act asynchronously over variable length frames. For each system, at the beginning of each renewal frame, it chooses an action which affects the duration of its own frame, the penalty, and the resource expenditure throughout the frame. The goal is to minimize the overall time average penalty subject to several overall time average resource constraints which couple these systems. This problem has applications to task processing networks, coupled Markov decision processes(MDPs) and so on. We propose a distributed algorithm so that each system can make its own decision after observing a global multiplier which is updated slot-wise. We show that this algorithm satisfies the desired constraints and achieves $\mathcal{O}(\varepsilon)$ near optimality with $\mathcal{O}(1/\varepsilon^2)$ convergence time.
  • This paper considers utility optimal power control for energy harvesting wireless devices with a finite capacity battery. The distribution information of the underlying wireless environment and harvestable energy is unknown and only outdated system state information is known at the device controller. This scenario shares similarity with Lyapunov opportunistic optimization and online learning but is different from both. By a novel combination of Zinkevich's online gradient learning technique and the drift-plus-penalty technique from Lyapunov opportunistic optimization, this paper proposes a learning-aided algorithm that achieves utility within $O(\epsilon)$ of the optimal, for any desired $\epsilon>0$, by using a battery with an $O(1/\epsilon)$ capacity. The proposed algorithm has low complexity and makes power investment decisions based on system history, without requiring knowledge of the system state or its probability distribution.
  • This paper considers the fundamental convergence time for opportunistic scheduling over time-varying channels. The channel state probabilities are unknown and algorithms must perform some type of estimation and learning while they make decisions to optimize network utility. Existing schemes can achieve a utility within $\epsilon$ of optimality, for any desired $\epsilon>0$, with convergence and adaptation times of $O(1/\epsilon^2)$. This paper shows that if the utility function is concave and smooth, then $O(\log(1/\epsilon)/\epsilon)$ convergence time is possible via an existing stochastic variation on the Frank-Wolfe algorithm, called the RUN algorithm. Next, a converse result is proven to show it is impossible for any algorithm to have convergence time better than $O(1/\epsilon)$, provided the algorithm has no a-priori knowledge of channel state probabilities. Hence, RUN is within a logarithmic factor of convergence time optimality. However, RUN has a vanishing stepsize and hence has an infinite adaptation time. Using stochastic Frank-Wolfe with a fixed stepsize yields improved $O(1/\epsilon^2)$ adaptation time, but convergence time increases to $O(1/\epsilon^2)$, similar to existing drift-plus-penalty based algorithms. This raises important open questions regarding optimal adaptation.
  • We consider multiple parallel Markov decision processes (MDPs) coupled by global constraints, where the time varying objective and constraint functions can only be observed after the decision is made. Special attention is given to how well the decision maker can perform in $T$ slots, starting from any state, compared to the best feasible randomized stationary policy in hindsight. We develop a new distributed online algorithm where each MDP makes its own decision each slot after observing a multiplier computed from past information. While the scenario is significantly more challenging than the classical online learning context, the algorithm is shown to have a tight $O(\sqrt{T})$ regret and constraint violations simultaneously. To obtain such a bound, we combine several new ingredients including ergodicity and mixing time bound in weakly coupled MDPs, a new regret analysis for online constrained optimization, a drift analysis for queue processes, and a perturbation analysis based on Farkas' Lemma.
  • This paper considers online convex optimization (OCO) with stochastic constraints, which generalizes Zinkevich's OCO over a known simple fixed set by introducing multiple stochastic functional constraints that are i.i.d. generated at each round and are disclosed to the decision maker only after the decision is made. This formulation arises naturally when decisions are restricted by stochastic environments or deterministic environments with noisy observations. It also includes many important problems as special cases, such as OCO with long term constraints, stochastic constrained convex optimization, and deterministic constrained convex optimization. To solve this problem, this paper proposes a new algorithm that achieves $O(\sqrt{T})$ expected regret and constraint violations and $O(\sqrt{T}\log(T))$ high probability regret and constraint violations. Experiments on a real-world data center scheduling problem further verify the performance of the new algorithm.
  • This paper studies the convergence time of dual gradient methods for general (possibly non-differentiable) strongly convex programs. For general convex programs, the convergence time of dual subgradient/gradient methods with simple running averages (running averages started from iteration $0$) is known to be $O(\frac{1}{\epsilon^{2}})$. This paper shows that the convergence time for general strongly convex programs is $O(\frac{1}{\epsilon})$. This paper also considers a variation of the average scheme, called the sliding running averages, and shows that if the dual function of the strongly convex program is locally quadratic (Note that the locally quadratic property is implied by the locally strongly concave property.) then the convergence time of the dual gradient method with sliding running averages is $O(\log(\frac{1}{\epsilon}))$. The convergence time analysis is further verified by numerical experiments.
  • This paper considers large scale constrained convex (possibly composite and non-separable) programs, which are usually difficult to solve by interior point methods or other Newton-type methods due to the non-smoothness or the prohibitive computation and storage complexity for Hessians and matrix inversions. Instead, they are often solved by first order gradient based methods or decomposition based methods. The conventional primal-dual subgradient method, also known as the Arrow-Hurwicz-Uzawa subgradient method, is a low complexity algorithm with an $O(1/\epsilon^2)$ convergence time. Recently, a new Lagrangian dual type algorithm with a faster $O(1/\epsilon)$ convergence time is proposed in Yu and Neely (2017). However, if the objective or constraint functions are not separable, each iteration of the Lagrangian dual type method in Yu and Neely (2017) requires to solve a unconstrained convex program, which can have huge complexity. This paper proposes a new primal-dual type algorithm with $O(1/\epsilon)$ convergence for general constrained convex programs. Each iteration of the new algorithm can be implemented in parallel with low complexity even when the original problem is composite and non-separable.
  • This paper considers convex programs with a general (possibly non-differentiable) convex objective function and Lipschitz continuous convex inequality constraint functions. A simple algorithm is developed and achieves an $O(1/t)$ convergence rate. Similar to the classical dual subgradient algorithm and the ADMM algorithm, the new algorithm has a parallel implementation when the objective and constraint functions are separable. However, the new algorithm has a faster $O(1/t)$ convergence rate compared with the best known $O(1/\sqrt{t})$ convergence rate for the dual subgradient algorithm with primal averaging. Further, it can solve convex programs with nonlinear constraints, which cannot be handled by the ADMM algorithm. The new algorithm is applied to a multipath network utility maximization problem and yields a decentralized flow control algorithm with the fast $O(1/t)$ convergence rate.
  • This paper considers dynamic transmit covariance design in point-to-point MIMO fading systems with unknown channel state distributions and inaccurate channel state information subject to both long term and short term power constraints. First, the case of instantaneous but possibly inaccurate channel state information at the transmitter (CSIT) is treated. By extending the drift-plus-penalty technique, a dynamic transmit covariance policy is developed and is shown to approach optimality with an $O(\delta)$ gap, where $\delta$ is the inaccuracy measure of CSIT, regardless of the channel state distribution and without requiring knowledge of this distribution. Next, the case of delayed and inaccurate channel state information is considered. The optimal transmit covariance solution that maximizes the ergodic capacity is fundamentally different in this case, and a different online algorithm based on convex projections is developed. The proposed algorithm for this delayed-CSIT case also has an $O(\delta)$ optimality gap, where $\delta$ is again the inaccuracy measure of CSIT.
  • This paper considers online convex optimization with time-varying constraint functions. Specifically, we have a sequence of convex objective functions $\{f_t(x)\}_{t=0}^{\infty}$ and convex constraint functions $\{g_{t,i}(x)\}_{t=0}^{\infty}$ for $i \in \{1, ..., k\}$. The functions are gradually revealed over time. For a given $\epsilon>0$, the goal is to choose points $x_t$ every step $t$, without knowing the $f_t$ and $g_{t,i}$ functions on that step, to achieve a time average at most $\epsilon$ worse than the best fixed-decision that could be chosen with hindsight, subject to the time average of the constraint functions being nonpositive. It is known that this goal is generally impossible. This paper develops an online algorithm that solves the problem with $O(1/\epsilon^2)$ convergence time in the special case when all constraint functions are nonpositive over a common subset of $\mathbb{R}^n$. Similar performance is shown in an expected sense when the common subset assumption is removed but the constraint functions are assumed to vary according to a random process that is independent and identically distributed (i.i.d.) over time slots $t \in \{0, 1, 2, \ldots\}$. Finally, in the special case when both the constraint and objective functions are i.i.d. over time slots $t$, the algorithm is shown to come within $\epsilon$ of optimality with respect to the best (possibly time-varying) causal policy that knows the full probability distribution.
  • The backpressure algorithm has been widely used as a distributed solution to the problem of joint rate control and routing in multi-hop data networks. By controlling a parameter $V$ in the algorithm, the backpressure algorithm can achieve an arbitrarily small utility optimality gap. However, this in turn brings in a large queue length at each node and hence causes large network delay. This phenomenon is known as the fundamental utility-delay tradeoff. The best known utility-delay tradeoff for general networks is $[O(1/V), O(V)]$ and is attained by a backpressure algorithm based on a drift-plus-penalty technique. This may suggest that to achieve an arbitrarily small utility optimality gap, the existing backpressure algorithms necessarily yield an arbitrarily large queue length. However, this paper proposes a new backpressure algorithm that has a vanishing utility optimality gap, so utility converges to exact optimality as the algorithm keeps running, while queue lengths are bounded throughout by a finite constant. The technique uses backpressure and drift concepts with a new method for convex programming.
  • This paper considers a cost minimization problem for data centers with N servers and randomly arriving service requests. A central router decides which server to use for each new request. Each server has three types of states (active, idle, setup) with different costs and time durations. The servers operate asynchronously over their own states and can choose one of multiple sleep modes when idle. We develop an online distributed control algorithm so that each server makes its own decisions, the request queues are bounded and the overall time average cost is near optimal with probability 1. The algorithm does not need probability information for the arrival rate or job sizes. Next, an improved algorithm that uses a single queue is developed via a "virtualization" technique which is shown to provide the same (near optimal) costs. Simulation experiments on a real data center traffic trace demonstrate the efficiency of our algorithm compared to other existing algorithms.
  • This paper considers the problem of minimizing the time average of a controlled stochastic process subject to multiple time average constraints on other related processes. The probability distribution of the random events in the system is unknown to the controller. A typical application is time average power minimization subject to network throughput constraints for different users in a network with time varying channel conditions. We show that with probability at least $1-2\delta$, the classical drift-plus-penalty algorithm provides a sample path $\mathcal{O}(\varepsilon)$ approximation to optimality with a convergence time $\mathcal{O}(\frac{1}{\varepsilon^2}\max\left\{\log^2\frac1\varepsilon\log\frac2\delta,~\log^3\frac2\delta\right\})$, where $\varepsilon>0$ is a parameter related to the algorithm. When there is only one constraint, we further show that the convergence time can be improved to $\mathcal{O}\left(\frac{1}{\varepsilon^2}\log^2\frac1\delta\right)$.
  • Traffic load-balancing in datacenters alleviates hot spots and improves network utilization. In this paper, a stable in-network load-balancing algorithm is developed in the setting of software-defined networking. A control plane configures a data plane over successive intervals of time. While the MaxWeight algorithm can be applied in this setting and offers certain throughput optimality properties, its bang-bang control structure rewards single flows on each interval and prohibits link-capacity sharing. This paper develops a new algorithm that is throughput-optimal and allows link-capacity sharing, leading to low queue occupancy. The algorithm deliberately imitates weighted fair queueing, which provides fairness and graceful interaction with TCP traffic. Inspired by insights from the analysis, a heuristic improvement is also developed to operate with practical switches and TCP flows. Simulations from a network simulator shows that the algorithm outperforms the widely-used equal-cost multipath (ECMP) technique.
  • This paper considers time-average optimization, where a decision vector is chosen every time step within a (possibly non-convex) set, and the goal is to minimize a convex function of the time averages subject to convex constraints on these averages. Such problems have applications in networking, multi-agent systems, and operations research, where decisions are constrained to a discrete set and the decision average can represent average bit rates or average agent actions. This time-average optimization extends traditional convex formulations to allow a non-convex decision set. This class of problems can be solved by Lyapunov optimization. A simple drift-based algorithm, related to a classical dual subgradient algorithm, converges to an $\epsilon$-optimal solution within $O(1/\epsilon^2)$ time steps. Further, the algorithm is shown to have a transient phase and a steady state phase which can be exploited to improve convergence rates to $O(1/\epsilon)$ and $O(1/{\epsilon^{1.5}})$ when vectors of Lagrange multipliers satisfy locally-polyhedral and locally-smooth assumptions respectively. Practically, this improved convergence suggests that decisions should be implemented after the transient period.
  • This paper considers online convex optimization over a complicated constraint set, which typically consists of multiple functional constraints and a set constraint. The conventional projection based online projection algorithm (Zinkevich, 2003) can be difficult to implement due to the potentially high computation complexity of the projection operation. In this paper, we relax the functional constraints by allowing them to be violated at each round but still requiring them to be satisfied in the long term. This type of relaxed online convex optimization (with long term constraints) was first considered in Mahdavi et al. (2012). That prior work proposes an algorithm to achieve $O(\sqrt{T})$ regret and $O(T^{3/4})$ constraint violations for general problems and another algorithm to achieve an $O(T^{2/3})$ bound for both regret and constraint violations when the constraint set can be described by a finite number of linear constraints. A recent extension in Jenatton et al. (2016) can achieve $O(T^{\max\{\beta,1-\beta\}})$ regret and $O(T^{1-\beta/2})$ constraint violations where $\beta\in (0,1)$. The current paper proposes a new simple algorithm that yields improved performance in comparison to prior works. The new algorithm achieves an $O(\sqrt{T})$ regret bound with finite constraint violations.
  • Stochastic non-smooth convex optimization constitutes a class of problems in machine learning and operations research. This paper considers minimization of a non-smooth function based on stochastic subgradients. When the function has a locally polyhedral structure, a staggered time average algorithm is proven to have O(1/T) convergence rate. A more general convergence result is proven when the locally polyhedral assumption is removed. In that case, the convergence bound depends on the curvature of the function near the minimum. Finally, the locally polyhedral assumption is shown to improve convergence beyond O(1/T) for a special case of deterministic problems.
  • This paper considers an opportunistic scheduling problem over a renewal system. A controller observes a random event at the beginning of each renewal frame and then chooses an action in response to the event, which affects the duration of the frame, the amount of resources used, and a penalty metric. The goal is to make frame-wise decisions so as to minimize the time average penalty subject to time average resource constraints. This problem has applications to task processing and communication in data networks, as well as to certain classes of Markov decision problems. We formulate the problem as a dynamic fractional program and propose an adaptive algorithm which uses an empirical accumulation as a feedback parameter. A key feature of the proposed algorithm is that it does not require knowledge of the random event statistics and potentially allows (uncountably) infinite event sets. We prove the algorithm satisfies all desired constraints and achieves $O(\epsilon)$ near optimality with probability 1.
  • This paper considers large scale constrained convex programs, which are usually not solvable by interior point methods or other Newton-type methods due to the prohibitive computation and storage complexity for Hessians and matrix inversions. Instead, large scale constrained convex programs are often solved by gradient based methods or decomposition based methods. The conventional primal-dual subgradient method, aka, Arrow-Hurwicz-Uzawa subgradient method, is a low complexity algorithm with the $O(1/\sqrt{t})$ convergence rate, where $t$ is the number of iterations. If the objective and constraint functions are separable, the Lagrangian dual type method can decompose a large scale convex program into multiple parallel small scale convex programs. The classical dual gradient algorithm is an example of Lagrangian dual type methods and has convergence rate $O(1/\sqrt{t})$. Recently, a new Lagrangian dual type algorithm with faster $O(1/t)$ convergence is proposed in Yu and Neely (2015). However, if the objective or constraint functions are not separable, each iteration of the Lagrangian dual type method in Yu and Neely (2015) requires to solve a large scale unconstrained convex program, which can have huge complexity. This paper proposes a new primal-dual type algorithm, which only involves simple gradient updates at each iteration and has the $O(1/t)$ convergence rate.
  • This paper considers optimization of power and delay in a time-varying wireless link using rateless codes. The link serves a sequence of variable-length packets. Each packet is coded and transmitted over multiple slots. Channel conditions can change from slot to slot and are unknown to the transmitter. The amount of mutual information accumulated on each slot depends on the random channel realization and the power used. The goal is to minimize average service delay subject to an average power constraint. We formulate this problem as a frame-based stochastic optimization problem and solve it via an online algorithm. We show that the subproblem within each frame is a simple integer program which can be effectively solved using a dynamic program. The optimality of this online algorithm is proved using the frame-based Lyapunov drift analysis.
  • We consider the problem of simultaneous on-demand streaming of stored video to multiple users in a multi-cell wireless network where multiple unicast streaming sessions are run in parallel and share the same frequency band. Each streaming session is formed by the sequential transmission of video "chunks," such that each chunk arrives into the corresponding user playback buffer within its playback deadline. We formulate the problem as a Network Utility Maximization (NUM) where the objective is to fairly maximize users' video streaming Quality of Experience (QoE) and then derive an iterative control policy using Lyapunov Optimization, which solves the NUM problem up to any level of accuracy and yields an online protocol with control actions at every iteration decomposing into two layers interconnected by the users' request queues : i) a video streaming adaptation layer reminiscent of DASH, implemented at each user node; ii) a transmission scheduling layer where a max-weight scheduler is implemented at each base station. The proposed chunk request scheme is a pull strategy where every user opportunistically requests video chunks from the neighboring base stations and dynamically adapts the quality of its requests based on the current size of the request queue. For the transmission scheduling component, we first describe the general max-weight scheduler and then particularize it to a wireless network where the base stations have multiuser MIMO (MU-MIMO) beamforming capabilities. We exploit the channel hardening effect of large-dimensional MIMO channels (massive MIMO) and devise a low complexity user selection scheme to solve the underlying combinatorial problem of selecting user subsets for downlink beamforming, which can be easily implemented and run independently at each base station.
  • This paper treats power-aware throughput maxi-mization in a multi-user file downloading system. Each user can receive a new file only after its previous file is finished. The file state processes for each user act as coupled Markov chains that form a generalized restless bandit system. First, an optimal algorithm is derived for the case of one user. The algorithm maximizes throughput subject to an average power constraint. Next, the one-user algorithm is extended to a low complexity heuristic for the multi-user problem. The heuristic uses a simple online index policy. In a special case with no power-constraint, the multi-user heuristic is shown to be throughput optimal. Simulations are used to demonstrate effectiveness of the heuristic in the general case. For simple cases where the optimal solution can be computed offline, the heuristic is shown to be near-optimal for a wide range of parameters.
  • We consider the optimal design of a scheduling policy for adaptive video streaming in a wireless network formed by several users and helpers. A feature of such networks is that any user is typically in the range of multiple helpers. Hence, in order to cope with user-helper association, load balancing and inter-cell interference, an efficient streaming policy should allow the users to dynamically select the helper node to download from, and determine adaptively the video quality level of the download. In order to obtain a tractable formulation, we follow a "divide and conquer" approach: i) Assuming that each video packet (chunk) is delivered within its playback delay ("smooth streaming regime"), the problem is formulated as a network utility maximization (NUM), subject to queue stability, where the network utility function is a concave and componentwise non-decreasing function of the users' video quality measure. ii) We solve the NUM problem by using a Lyapunov Drift Plus Penalty approach, obtaining a scheme that naturally decomposes into two sub-policies referred to as "congestion control" (adaptive video quality and helper station selection) and "transmission scheduling" (dynamic allocation of the helper-user physical layer transmission rates).Our solution is provably optimal with respect to the proposed NUM problem, in a strong per-sample path sense. iii) Finally, we propose a method to adaptively estimate the maximum queuing delays, such that each user can calculate its pre-buffering and re-buffering time in order to cope with the fluctuations of the queuing delays. Through simulations, we evaluate the performance of the proposed algorithm under realistic assumptions of a network with densely deployed helper nodes, and demonstrate the per-sample path optimality of the proposed solution by considering a non-stationary non-ergodic scenario with user mobility, VBR video coding.
  • This paper considers time-average stochastic optimization, where a time average decision vector, an average of decision vectors chosen in every time step from a time-varying (possibly non-convex) set, minimizes a convex objective function and satisfies convex constraints. This formulation has applications in networking and operations research. In general, time-average stochastic optimization can be solved by a Lyapunov optimization technique. This paper shows that the technique exhibits a transient phase and a steady state phase. When the problem has a unique vector of Lagrange multipliers, the convergence time can be improved. By starting the time average in the steady state the convergence times become $O(1/\epsilon)$ under a locally-polyhedral assumption and $O(1/\epsilon^{1.5})$ under a locally-non-polyhedral assumption, where $\epsilon$ denotes the proximity to the optimal objective cost. Simulations suggest that the results may hold more generally without the unique Lagrange multiplier assumption.
  • One practical open problem is the development of a distributed algorithm that achieves near-optimal utility using only a finite (and small) buffer size for queues in a stochastic network. This paper studies utility maximization (or cost minimization) in a finite-buffer regime and considers the corresponding delay and reliability (or rate of packet drops) tradeoff. A floating-queue algorithm allows the stochastic network optimization framework to be implemented with finite buffers at the cost of packet drops. Further, the buffer size requirement is significantly smaller than previous works in this area. With a finite buffer size of $B$ packets, the proposed algorithm achieves within $O(e^{-B})$ of the optimal utility while maintaining average per-hop delay of $O(B)$ and an average per-hop drop rate of $O(e^{-B})$ in steady state. From an implementation perspective, the floating-queue algorithm requires little modification of the well-known Drift-Plus-Penalty policy (including MaxWeight and Backpressure policies). As a result, the floating-queue algorithm inherits the distributed and low complexity nature of these policies.