
This paper considers optimization over multiple renewal systems coupled by
time average constraints. These systems act asynchronously over variable length
frames. For each system, at the beginning of each renewal frame, it chooses an
action which affects the duration of its own frame, the penalty, and the
resource expenditure throughout the frame. The goal is to minimize the overall
time average penalty subject to several overall time average resource
constraints which couple these systems. This problem has applications to task
processing networks, coupled Markov decision processes(MDPs) and so on. We
propose a distributed algorithm so that each system can make its own decision
after observing a global multiplier which is updated slotwise. We show that
this algorithm satisfies the desired constraints and achieves
$\mathcal{O}(\varepsilon)$ near optimality with $\mathcal{O}(1/\varepsilon^2)$
convergence time.

This paper considers utility optimal power control for energy harvesting
wireless devices with a finite capacity battery. The distribution information
of the underlying wireless environment and harvestable energy is unknown and
only outdated system state information is known at the device controller. This
scenario shares similarity with Lyapunov opportunistic optimization and online
learning but is different from both. By a novel combination of Zinkevich's
online gradient learning technique and the driftpluspenalty technique from
Lyapunov opportunistic optimization, this paper proposes a learningaided
algorithm that achieves utility within $O(\epsilon)$ of the optimal, for any
desired $\epsilon>0$, by using a battery with an $O(1/\epsilon)$ capacity. The
proposed algorithm has low complexity and makes power investment decisions
based on system history, without requiring knowledge of the system state or its
probability distribution.

This paper considers the fundamental convergence time for opportunistic
scheduling over timevarying channels. The channel state probabilities are
unknown and algorithms must perform some type of estimation and learning while
they make decisions to optimize network utility. Existing schemes can achieve a
utility within $\epsilon$ of optimality, for any desired $\epsilon>0$, with
convergence and adaptation times of $O(1/\epsilon^2)$. This paper shows that if
the utility function is concave and smooth, then $O(\log(1/\epsilon)/\epsilon)$
convergence time is possible via an existing stochastic variation on the
FrankWolfe algorithm, called the RUN algorithm. Next, a converse result is
proven to show it is impossible for any algorithm to have convergence time
better than $O(1/\epsilon)$, provided the algorithm has no apriori knowledge
of channel state probabilities. Hence, RUN is within a logarithmic factor of
convergence time optimality. However, RUN has a vanishing stepsize and hence
has an infinite adaptation time. Using stochastic FrankWolfe with a fixed
stepsize yields improved $O(1/\epsilon^2)$ adaptation time, but convergence
time increases to $O(1/\epsilon^2)$, similar to existing driftpluspenalty
based algorithms. This raises important open questions regarding optimal
adaptation.

We consider multiple parallel Markov decision processes (MDPs) coupled by
global constraints, where the time varying objective and constraint functions
can only be observed after the decision is made. Special attention is given to
how well the decision maker can perform in $T$ slots, starting from any state,
compared to the best feasible randomized stationary policy in hindsight. We
develop a new distributed online algorithm where each MDP makes its own
decision each slot after observing a multiplier computed from past information.
While the scenario is significantly more challenging than the classical online
learning context, the algorithm is shown to have a tight $O(\sqrt{T})$ regret
and constraint violations simultaneously. To obtain such a bound, we combine
several new ingredients including ergodicity and mixing time bound in weakly
coupled MDPs, a new regret analysis for online constrained optimization, a
drift analysis for queue processes, and a perturbation analysis based on
Farkas' Lemma.

This paper considers online convex optimization (OCO) with stochastic
constraints, which generalizes Zinkevich's OCO over a known simple fixed set by
introducing multiple stochastic functional constraints that are i.i.d.
generated at each round and are disclosed to the decision maker only after the
decision is made. This formulation arises naturally when decisions are
restricted by stochastic environments or deterministic environments with noisy
observations. It also includes many important problems as special cases, such
as OCO with long term constraints, stochastic constrained convex optimization,
and deterministic constrained convex optimization. To solve this problem, this
paper proposes a new algorithm that achieves $O(\sqrt{T})$ expected regret and
constraint violations and $O(\sqrt{T}\log(T))$ high probability regret and
constraint violations. Experiments on a realworld data center scheduling
problem further verify the performance of the new algorithm.

This paper studies the convergence time of dual gradient methods for general
(possibly nondifferentiable) strongly convex programs. For general convex
programs, the convergence time of dual subgradient/gradient methods with simple
running averages (running averages started from iteration $0$) is known to be
$O(\frac{1}{\epsilon^{2}})$. This paper shows that the convergence time for
general strongly convex programs is $O(\frac{1}{\epsilon})$. This paper also
considers a variation of the average scheme, called the sliding running
averages, and shows that if the dual function of the strongly convex program is
locally quadratic (Note that the locally quadratic property is implied by the
locally strongly concave property.) then the convergence time of the dual
gradient method with sliding running averages is $O(\log(\frac{1}{\epsilon}))$.
The convergence time analysis is further verified by numerical experiments.

This paper considers large scale constrained convex (possibly composite and
nonseparable) programs, which are usually difficult to solve by interior point
methods or other Newtontype methods due to the nonsmoothness or the
prohibitive computation and storage complexity for Hessians and matrix
inversions. Instead, they are often solved by first order gradient based
methods or decomposition based methods. The conventional primaldual
subgradient method, also known as the ArrowHurwiczUzawa subgradient method,
is a low complexity algorithm with an $O(1/\epsilon^2)$ convergence time.
Recently, a new Lagrangian dual type algorithm with a faster $O(1/\epsilon)$
convergence time is proposed in Yu and Neely (2017). However, if the objective
or constraint functions are not separable, each iteration of the Lagrangian
dual type method in Yu and Neely (2017) requires to solve a unconstrained
convex program, which can have huge complexity. This paper proposes a new
primaldual type algorithm with $O(1/\epsilon)$ convergence for general
constrained convex programs. Each iteration of the new algorithm can be
implemented in parallel with low complexity even when the original problem is
composite and nonseparable.

This paper considers convex programs with a general (possibly
nondifferentiable) convex objective function and Lipschitz continuous convex
inequality constraint functions. A simple algorithm is developed and achieves
an $O(1/t)$ convergence rate. Similar to the classical dual subgradient
algorithm and the ADMM algorithm, the new algorithm has a parallel
implementation when the objective and constraint functions are separable.
However, the new algorithm has a faster $O(1/t)$ convergence rate compared with
the best known $O(1/\sqrt{t})$ convergence rate for the dual subgradient
algorithm with primal averaging. Further, it can solve convex programs with
nonlinear constraints, which cannot be handled by the ADMM algorithm. The new
algorithm is applied to a multipath network utility maximization problem and
yields a decentralized flow control algorithm with the fast $O(1/t)$
convergence rate.

This paper considers dynamic transmit covariance design in pointtopoint
MIMO fading systems with unknown channel state distributions and inaccurate
channel state information subject to both long term and short term power
constraints. First, the case of instantaneous but possibly inaccurate channel
state information at the transmitter (CSIT) is treated. By extending the
driftpluspenalty technique, a dynamic transmit covariance policy is developed
and is shown to approach optimality with an $O(\delta)$ gap, where $\delta$ is
the inaccuracy measure of CSIT, regardless of the channel state distribution
and without requiring knowledge of this distribution. Next, the case of delayed
and inaccurate channel state information is considered. The optimal transmit
covariance solution that maximizes the ergodic capacity is fundamentally
different in this case, and a different online algorithm based on convex
projections is developed. The proposed algorithm for this delayedCSIT case
also has an $O(\delta)$ optimality gap, where $\delta$ is again the inaccuracy
measure of CSIT.

This paper considers online convex optimization with timevarying constraint
functions. Specifically, we have a sequence of convex objective functions
$\{f_t(x)\}_{t=0}^{\infty}$ and convex constraint functions
$\{g_{t,i}(x)\}_{t=0}^{\infty}$ for $i \in \{1, ..., k\}$. The functions are
gradually revealed over time. For a given $\epsilon>0$, the goal is to choose
points $x_t$ every step $t$, without knowing the $f_t$ and $g_{t,i}$ functions
on that step, to achieve a time average at most $\epsilon$ worse than the best
fixeddecision that could be chosen with hindsight, subject to the time average
of the constraint functions being nonpositive. It is known that this goal is
generally impossible. This paper develops an online algorithm that solves the
problem with $O(1/\epsilon^2)$ convergence time in the special case when all
constraint functions are nonpositive over a common subset of $\mathbb{R}^n$.
Similar performance is shown in an expected sense when the common subset
assumption is removed but the constraint functions are assumed to vary
according to a random process that is independent and identically distributed
(i.i.d.) over time slots $t \in \{0, 1, 2, \ldots\}$. Finally, in the special
case when both the constraint and objective functions are i.i.d. over time
slots $t$, the algorithm is shown to come within $\epsilon$ of optimality with
respect to the best (possibly timevarying) causal policy that knows the full
probability distribution.

The backpressure algorithm has been widely used as a distributed solution to
the problem of joint rate control and routing in multihop data networks. By
controlling a parameter $V$ in the algorithm, the backpressure algorithm can
achieve an arbitrarily small utility optimality gap. However, this in turn
brings in a large queue length at each node and hence causes large network
delay. This phenomenon is known as the fundamental utilitydelay tradeoff. The
best known utilitydelay tradeoff for general networks is $[O(1/V), O(V)]$ and
is attained by a backpressure algorithm based on a driftpluspenalty
technique. This may suggest that to achieve an arbitrarily small utility
optimality gap, the existing backpressure algorithms necessarily yield an
arbitrarily large queue length. However, this paper proposes a new backpressure
algorithm that has a vanishing utility optimality gap, so utility converges to
exact optimality as the algorithm keeps running, while queue lengths are
bounded throughout by a finite constant. The technique uses backpressure and
drift concepts with a new method for convex programming.

This paper considers a cost minimization problem for data centers with N
servers and randomly arriving service requests. A central router decides which
server to use for each new request. Each server has three types of states
(active, idle, setup) with different costs and time durations. The servers
operate asynchronously over their own states and can choose one of multiple
sleep modes when idle. We develop an online distributed control algorithm so
that each server makes its own decisions, the request queues are bounded and
the overall time average cost is near optimal with probability 1. The algorithm
does not need probability information for the arrival rate or job sizes. Next,
an improved algorithm that uses a single queue is developed via a
"virtualization" technique which is shown to provide the same (near optimal)
costs. Simulation experiments on a real data center traffic trace demonstrate
the efficiency of our algorithm compared to other existing algorithms.

This paper considers the problem of minimizing the time average of a
controlled stochastic process subject to multiple time average constraints on
other related processes. The probability distribution of the random events in
the system is unknown to the controller. A typical application is time average
power minimization subject to network throughput constraints for different
users in a network with time varying channel conditions. We show that with
probability at least $12\delta$, the classical driftpluspenalty algorithm
provides a sample path $\mathcal{O}(\varepsilon)$ approximation to optimality
with a convergence time
$\mathcal{O}(\frac{1}{\varepsilon^2}\max\left\{\log^2\frac1\varepsilon\log\frac2\delta,~\log^3\frac2\delta\right\})$,
where $\varepsilon>0$ is a parameter related to the algorithm. When there is
only one constraint, we further show that the convergence time can be improved
to $\mathcal{O}\left(\frac{1}{\varepsilon^2}\log^2\frac1\delta\right)$.

Traffic loadbalancing in datacenters alleviates hot spots and improves
network utilization. In this paper, a stable innetwork loadbalancing
algorithm is developed in the setting of softwaredefined networking. A control
plane configures a data plane over successive intervals of time. While the
MaxWeight algorithm can be applied in this setting and offers certain
throughput optimality properties, its bangbang control structure rewards
single flows on each interval and prohibits linkcapacity sharing. This paper
develops a new algorithm that is throughputoptimal and allows linkcapacity
sharing, leading to low queue occupancy. The algorithm deliberately imitates
weighted fair queueing, which provides fairness and graceful interaction with
TCP traffic. Inspired by insights from the analysis, a heuristic improvement is
also developed to operate with practical switches and TCP flows. Simulations
from a network simulator shows that the algorithm outperforms the widelyused
equalcost multipath (ECMP) technique.

This paper considers timeaverage optimization, where a decision vector is
chosen every time step within a (possibly nonconvex) set, and the goal is to
minimize a convex function of the time averages subject to convex constraints
on these averages. Such problems have applications in networking, multiagent
systems, and operations research, where decisions are constrained to a discrete
set and the decision average can represent average bit rates or average agent
actions. This timeaverage optimization extends traditional convex formulations
to allow a nonconvex decision set. This class of problems can be solved by
Lyapunov optimization. A simple driftbased algorithm, related to a classical
dual subgradient algorithm, converges to an $\epsilon$optimal solution within
$O(1/\epsilon^2)$ time steps. Further, the algorithm is shown to have a
transient phase and a steady state phase which can be exploited to improve
convergence rates to $O(1/\epsilon)$ and $O(1/{\epsilon^{1.5}})$ when vectors
of Lagrange multipliers satisfy locallypolyhedral and locallysmooth
assumptions respectively. Practically, this improved convergence suggests that
decisions should be implemented after the transient period.

This paper considers online convex optimization over a complicated constraint
set, which typically consists of multiple functional constraints and a set
constraint. The conventional projection based online projection algorithm
(Zinkevich, 2003) can be difficult to implement due to the potentially high
computation complexity of the projection operation. In this paper, we relax the
functional constraints by allowing them to be violated at each round but still
requiring them to be satisfied in the long term. This type of relaxed online
convex optimization (with long term constraints) was first considered in
Mahdavi et al. (2012). That prior work proposes an algorithm to achieve
$O(\sqrt{T})$ regret and $O(T^{3/4})$ constraint violations for general
problems and another algorithm to achieve an $O(T^{2/3})$ bound for both regret
and constraint violations when the constraint set can be described by a finite
number of linear constraints. A recent extension in Jenatton et al. (2016) can
achieve $O(T^{\max\{\beta,1\beta\}})$ regret and $O(T^{1\beta/2})$ constraint
violations where $\beta\in (0,1)$. The current paper proposes a new simple
algorithm that yields improved performance in comparison to prior works. The
new algorithm achieves an $O(\sqrt{T})$ regret bound with finite constraint
violations.

Stochastic nonsmooth convex optimization constitutes a class of problems in
machine learning and operations research. This paper considers minimization of
a nonsmooth function based on stochastic subgradients. When the function has a
locally polyhedral structure, a staggered time average algorithm is proven to
have O(1/T) convergence rate. A more general convergence result is proven when
the locally polyhedral assumption is removed. In that case, the convergence
bound depends on the curvature of the function near the minimum. Finally, the
locally polyhedral assumption is shown to improve convergence beyond O(1/T) for
a special case of deterministic problems.

This paper considers an opportunistic scheduling problem over a renewal
system. A controller observes a random event at the beginning of each renewal
frame and then chooses an action in response to the event, which affects the
duration of the frame, the amount of resources used, and a penalty metric. The
goal is to make framewise decisions so as to minimize the time average penalty
subject to time average resource constraints. This problem has applications to
task processing and communication in data networks, as well as to certain
classes of Markov decision problems. We formulate the problem as a dynamic
fractional program and propose an adaptive algorithm which uses an empirical
accumulation as a feedback parameter. A key feature of the proposed algorithm
is that it does not require knowledge of the random event statistics and
potentially allows (uncountably) infinite event sets. We prove the algorithm
satisfies all desired constraints and achieves $O(\epsilon)$ near optimality
with probability 1.

This paper considers large scale constrained convex programs, which are
usually not solvable by interior point methods or other Newtontype methods due
to the prohibitive computation and storage complexity for Hessians and matrix
inversions. Instead, large scale constrained convex programs are often solved
by gradient based methods or decomposition based methods. The conventional
primaldual subgradient method, aka, ArrowHurwiczUzawa subgradient method, is
a low complexity algorithm with the $O(1/\sqrt{t})$ convergence rate, where $t$
is the number of iterations. If the objective and constraint functions are
separable, the Lagrangian dual type method can decompose a large scale convex
program into multiple parallel small scale convex programs. The classical dual
gradient algorithm is an example of Lagrangian dual type methods and has
convergence rate $O(1/\sqrt{t})$. Recently, a new Lagrangian dual type
algorithm with faster $O(1/t)$ convergence is proposed in Yu and Neely (2015).
However, if the objective or constraint functions are not separable, each
iteration of the Lagrangian dual type method in Yu and Neely (2015) requires to
solve a large scale unconstrained convex program, which can have huge
complexity. This paper proposes a new primaldual type algorithm, which only
involves simple gradient updates at each iteration and has the $O(1/t)$
convergence rate.

This paper considers optimization of power and delay in a timevarying
wireless link using rateless codes. The link serves a sequence of
variablelength packets. Each packet is coded and transmitted over multiple
slots. Channel conditions can change from slot to slot and are unknown to the
transmitter. The amount of mutual information accumulated on each slot depends
on the random channel realization and the power used. The goal is to minimize
average service delay subject to an average power constraint. We formulate this
problem as a framebased stochastic optimization problem and solve it via an
online algorithm. We show that the subproblem within each frame is a simple
integer program which can be effectively solved using a dynamic program. The
optimality of this online algorithm is proved using the framebased Lyapunov
drift analysis.

We consider the problem of simultaneous ondemand streaming of stored video
to multiple users in a multicell wireless network where multiple unicast
streaming sessions are run in parallel and share the same frequency band. Each
streaming session is formed by the sequential transmission of video "chunks,"
such that each chunk arrives into the corresponding user playback buffer within
its playback deadline. We formulate the problem as a Network Utility
Maximization (NUM) where the objective is to fairly maximize users' video
streaming Quality of Experience (QoE) and then derive an iterative control
policy using Lyapunov Optimization, which solves the NUM problem up to any
level of accuracy and yields an online protocol with control actions at every
iteration decomposing into two layers interconnected by the users' request
queues : i) a video streaming adaptation layer reminiscent of DASH, implemented
at each user node; ii) a transmission scheduling layer where a maxweight
scheduler is implemented at each base station. The proposed chunk request
scheme is a pull strategy where every user opportunistically requests video
chunks from the neighboring base stations and dynamically adapts the quality of
its requests based on the current size of the request queue. For the
transmission scheduling component, we first describe the general maxweight
scheduler and then particularize it to a wireless network where the base
stations have multiuser MIMO (MUMIMO) beamforming capabilities. We exploit the
channel hardening effect of largedimensional MIMO channels (massive MIMO) and
devise a low complexity user selection scheme to solve the underlying
combinatorial problem of selecting user subsets for downlink beamforming, which
can be easily implemented and run independently at each base station.

This paper treats poweraware throughput maximization in a multiuser file
downloading system. Each user can receive a new file only after its previous
file is finished. The file state processes for each user act as coupled Markov
chains that form a generalized restless bandit system. First, an optimal
algorithm is derived for the case of one user. The algorithm maximizes
throughput subject to an average power constraint. Next, the oneuser algorithm
is extended to a low complexity heuristic for the multiuser problem. The
heuristic uses a simple online index policy. In a special case with no
powerconstraint, the multiuser heuristic is shown to be throughput optimal.
Simulations are used to demonstrate effectiveness of the heuristic in the
general case. For simple cases where the optimal solution can be computed
offline, the heuristic is shown to be nearoptimal for a wide range of
parameters.

We consider the optimal design of a scheduling policy for adaptive video
streaming in a wireless network formed by several users and helpers. A feature
of such networks is that any user is typically in the range of multiple
helpers. Hence, in order to cope with userhelper association, load balancing
and intercell interference, an efficient streaming policy should allow the
users to dynamically select the helper node to download from, and determine
adaptively the video quality level of the download. In order to obtain a
tractable formulation, we follow a "divide and conquer" approach: i) Assuming
that each video packet (chunk) is delivered within its playback delay ("smooth
streaming regime"), the problem is formulated as a network utility maximization
(NUM), subject to queue stability, where the network utility function is a
concave and componentwise nondecreasing function of the users' video quality
measure. ii) We solve the NUM problem by using a Lyapunov Drift Plus Penalty
approach, obtaining a scheme that naturally decomposes into two subpolicies
referred to as "congestion control" (adaptive video quality and helper station
selection) and "transmission scheduling" (dynamic allocation of the helperuser
physical layer transmission rates).Our solution is provably optimal with
respect to the proposed NUM problem, in a strong persample path sense. iii)
Finally, we propose a method to adaptively estimate the maximum queuing delays,
such that each user can calculate its prebuffering and rebuffering time in
order to cope with the fluctuations of the queuing delays. Through simulations,
we evaluate the performance of the proposed algorithm under realistic
assumptions of a network with densely deployed helper nodes, and demonstrate
the persample path optimality of the proposed solution by considering a
nonstationary nonergodic scenario with user mobility, VBR video coding.

This paper considers timeaverage stochastic optimization, where a time
average decision vector, an average of decision vectors chosen in every time
step from a timevarying (possibly nonconvex) set, minimizes a convex
objective function and satisfies convex constraints. This formulation has
applications in networking and operations research. In general, timeaverage
stochastic optimization can be solved by a Lyapunov optimization technique.
This paper shows that the technique exhibits a transient phase and a steady
state phase. When the problem has a unique vector of Lagrange multipliers, the
convergence time can be improved. By starting the time average in the steady
state the convergence times become $O(1/\epsilon)$ under a locallypolyhedral
assumption and $O(1/\epsilon^{1.5})$ under a locallynonpolyhedral assumption,
where $\epsilon$ denotes the proximity to the optimal objective cost.
Simulations suggest that the results may hold more generally without the unique
Lagrange multiplier assumption.

One practical open problem is the development of a distributed algorithm that
achieves nearoptimal utility using only a finite (and small) buffer size for
queues in a stochastic network. This paper studies utility maximization (or
cost minimization) in a finitebuffer regime and considers the corresponding
delay and reliability (or rate of packet drops) tradeoff. A floatingqueue
algorithm allows the stochastic network optimization framework to be
implemented with finite buffers at the cost of packet drops. Further, the
buffer size requirement is significantly smaller than previous works in this
area. With a finite buffer size of $B$ packets, the proposed algorithm achieves
within $O(e^{B})$ of the optimal utility while maintaining average perhop
delay of $O(B)$ and an average perhop drop rate of $O(e^{B})$ in steady
state. From an implementation perspective, the floatingqueue algorithm
requires little modification of the wellknown DriftPlusPenalty policy
(including MaxWeight and Backpressure policies). As a result, the
floatingqueue algorithm inherits the distributed and low complexity nature of
these policies.