
Smallcell architecture is widely adopted by cellular network operators to
increase network capacity. By reducing the size of cells, operators can pack
more (lowpower) base stations in an area to better serve the growing demands,
without causing extra interference. However, this approach suffers from low
spectrum temporal efficiency. When a cell becomes smaller and covers fewer
users, its total traffic fluctuates significantly due to insufficient traffic
aggregation and exhibiting a large "peaktomean" ratio. As operators
customarily provision spectrum for peak traffic, large traffic temporal
fluctuation inevitably leads to low spectrum temporal efficiency. In this
paper, we advocate devicetodevice (D2D) loadbalancing as a useful mechanism
to address the fundamental drawback of smallcell architecture. The idea is to
shift traffic from a congested cell to its adjacent underutilized cells by
leveraging intercell D2D communication, so that the traffic can be served
without using extra spectrum, effectively improving the spectrum temporal
efficiency. We provide theoretical modeling and analysis to characterize the
benefit of D2D load balancing, in terms of total spectrum requirements of all
individual cells. We also derive the corresponding cost, in terms of incurred
D2D traffic overhead. We carry out empirical evaluations based on realworld 4G
data traces to gauge the benefit and cost of D2D load balancing under practical
settings. The results show that D2D load balancing can reduce the spectrum
requirement by 25% as compared to the standard scenario without D2D load
balancing, at the expense of negligible 0.7% D2D traffic overhead.

Adaptive optimization algorithms, such as Adam and RMSprop, have shown better
optimization performance than stochastic gradient descent (SGD) in some
scenarios. However, recent studies show that they often lead to worse
generalization performance than SGD, especially for training deep neural
networks (DNNs). In this work, we identify the reasons that Adam generalizes
worse than SGD, and develop a variant of Adam to eliminate the generalization
gap. The proposed method, normalized directionpreserving Adam (NDAdam),
enables more precise control of the direction and step size for updating weight
vectors, leading to significantly improved generalization performance.
Following a similar rationale, we further improve the generalization
performance in classification tasks by regularizing the softmax logits. By
bridging the gap between SGD and Adam, we also hope to shed light on why
certain optimization algorithms generalize better than others.

We study a class of linear network coding (LNC) schemes, called
circularshift LNC, whose encoding operations consist of only circularshifts
and bitwise additions (XOR). Formulated as a special vector linear code over
GF($2$), an $L$dimensional circularshift linear code of degree $\delta$
restricts its local encoding kernels to be the summation of at most $\delta$
cyclic permutation matrices of size $L$. We show that on a general network, for
a certain block length $L$, every scalar linear solution over GF($2^{L1}$) can
induce an $L$dimensional circularshift linear solution with 1bit redundancy
peredge transmission. Consequently, specific to a multicast network, such a
circularshift linear solution of an arbitrary degree $\delta$ can be
efficiently constructed, which has an interesting complexity tradeoff between
encoding and decoding with different choices of $\delta$. By further proving
that circularshift LNC is insufficient to achieve the exact capacity of
certain multicast networks, we show the optimality of the efficiently
constructed circularshift linear solution in the sense that its 1bit
redundancy is inevitable. Finally, both theoretical and numerical analysis
imply that with increasing $L$, a randomly constructed circularshift linear
code has linear solvability behavior comparable to a randomly constructed
permutationbased linear code, but has shorter overheads.

We study stochastic algorithms for solving nonconvex optimization problems
with a convex yet possibly nonsmooth regularizer, which find wide applications
in many practical machine learning applications. However, compared to
asynchronous parallel stochastic gradient descent (AsynSGD), an algorithm
targeting smooth optimization, the understanding of the behavior of stochastic
algorithms for the nonsmooth regularized optimization problems is limited,
especially when the objective function is nonconvex. To fill this gap, in this
paper, we propose and analyze asynchronous parallel stochastic proximal
gradient (AsynSPG) methods, including a full update version and a blockwise
version, for nonconvex problems. We establish an ergodic convergence rate of
$O(1/\sqrt{K})$ for the proposed AsynSPG, $K$ being the number of updates made
on the model, matching the convergence rate currently known for AsynSGD (for
smooth problems). To our knowledge, this is the first work that provides
convergence rates of asynchronous parallel SPG algorithms for nonconvex
problems. Furthermore, our results are also the first to prove convergence of
any stochastic proximal methods without assuming an increasing batch size or
the use of additional variance reduction techniques. We implement the proposed
algorithms on Parameter Server and demonstrate its convergence behavior and
nearlinear speedup, as the number of workers increases, for sparse learning
problems on a realworld dataset.

Many machine learning models, including those with nonsmooth regularizers,
can be formulated as consensus optimization problems, which can be solved by
the alternating direction method of multipliers (ADMM). Many recent efforts
have been made to develop asynchronous distributed ADMM to handle large amounts
of training data. However, all existing asynchronous distributed ADMM methods
are based on full model updates and require locking all global model parameters
to handle concurrency, which essentially serializes the updates from different
workers. In this paper, we present a novel blockwise, asynchronous and
distributed ADMM algorithm, which allows different blocks of model parameters
to be updated in parallel. The lockfree blockwise algorithm may greatly
speedup sparse optimization problems, a common scenario in reality, in which
most model updates only modify a subset of all decision variables. We
theoretically prove the convergence of our proposed algorithm to stationary
points for nonconvex general form consensus problems with possibly nonsmooth
regularizers. We implement the proposed ADMM algorithm on the Parameter Server
framework and demonstrate its convergence and nearlinear speedup performance
as the number of workers increases.

Cloud containers represent a new, lightweight alternative to virtual
machines in cloud computing. A user job may be described by a container graph
that specifies the resource profile of each container and container dependence
relations. This work is the first in the cloud computing literature that
designs efficient market mechanisms for container based cloud jobs. Our design
targets simultaneously incentive compatibility, computational efficiency, and
economic efficiency. It further adapts the idea of batch online optimization
into the paradigm of mechanism design, leveraging agile creation of cloud
containers and exploiting delay tolerance of elastic cloud jobs. The new and
classic techniques we employ include: (i) compact exponential optimization for
expressing and handling nontraditional constraints that arise from container
dependence and job deadlines; (ii) the primaldual schema for designing
efficient approximation algorithms for social welfare maximization; and (iii)
posted price mechanisms for batch decision making and truthful payment design.
Theoretical analysis and tracedriven empirical evaluation verify the efficacy
of our container auction algorithms.

Nowadays largescale distributed machine learning systems have been deployed
to support various analytics and intelligence services in IT firms. To train a
large dataset and derive the prediction/inference model, e.g., a deep neural
network, multiple workers are run in parallel to train partitions of the input
dataset, and update shared model parameters. In a shared cluster handling
multiple training jobs, a fundamental issue is how to efficiently schedule jobs
and set the number of concurrent workers to run for each job, such that server
resources are maximally utilized and model training can be completed in time.
Targeting a distributed machine learning system using the parameter server
framework, we design an online algorithm for scheduling the arriving jobs and
deciding the adjusted numbers of concurrent workers and parameter servers for
each job over its course, to maximize overall utility of all jobs, contingent
on their completion times. Our online algorithm design utilizes a primaldual
framework coupled with efficient dual subroutines, achieving good longterm
performance guarantees with polynomial time complexity. Practical effectiveness
of the online algorithm is evaluated using tracedriven simulation and testbed
experiments, which demonstrate its outperformance as compared to commonly
adopted scheduling algorithms in today's cloud systems.

We study online resource allocation in a cloud computing platform, through a
posted pricing mechanism: The cloud provider publishes a unit price for each
resource type, which may vary over time; upon arrival at the cloud system, a
cloud user either takes the current prices, renting resources to execute its
job, or refuses the prices without running its job there. We design pricing
functions based on the current resource utilization ratios, in a wide array of
demandsupply relationships and resource occupation durations, and prove
worstcase competitive ratios of the pricing functions in terms of social
welfare. In the basic case of a singletype, nonrecycled resource (i.e.,
allocated resources are not later released for reuse), we prove that our
pricing function design is optimal, in that any other pricing function can only
lead to a worse competitive ratio. Insights obtained from the basic cases are
then used to generalize the pricing functions to more realistic cloud systems
with multiple types of resources, where a job occupies allocated resources for
a number of time slots till completion, upon which time the resources are
returned back to the cloud resource pool.

Matrix factorization is a popular approach to solving matrix estimation
problems based on partial observations. Existing matrix factorization is based
on least squares and aims to yield a lowrank matrix to interpret the
conditional sample means given the observations. However, in many real
applications with skewed and extreme data, least squares cannot explain their
central tendency or tail distributions, yielding undesired estimates. In this
paper, we propose \emph{expectile matrix factorization} by introducing
asymmetric least squares, a key concept in expectile regression analysis, into
the matrix factorization framework. We propose an efficient algorithm to solve
the new problem based on alternating minimization and quadratic programming. We
prove that our algorithm converges to a global optimum and exactly recovers the
true underlying lowrank matrices when noise is zero. For synthetic data with
skewed noise and a realworld dataset containing web service response times,
the proposed scheme achieves lower recovery errors than the existing matrix
factorization method based on least squares in a wide range of settings.

The growing demand in mobile Internet access calls for high capacity and
energy efficient cellular access with better cell coverage. The inband
relaying solution, proposed in LTEAdvanced, improves coverage without
requiring additional spectrum for backhauling, making its deployment more
economical and practical. However, inband relay without careful management
incurs low spectrum utilization and reduces the system capacity. We propose
auctionbased solutions that aim at dynamic spectrum resource sharing,
maximizing the utilization of precious spectrum resources. We first present a
truthful auction that ensures a theoretical performance guarantee in terms of
social welfare. Then in an extended system model that focuses on addressing the
heterogeneity of resource blocks, we design a more practical auction mechanism.
We implement our proposed auctions under large scale realworld settings.
Simulation results verify the efficacy of proposed auctions, showing
improvements in both cell coverage and spectrum efficiency.

Network Function Virtualization (NFV) is an emerging paradigm that turns
hardwaredependent implementation of network functions (i.e., middleboxes) into
software modules running on virtualized platforms, for significant cost
reduction and ease of management. Such virtual network functions (VNFs)
commonly constitute service chains, to provide network services that traffic
flows need to go through. Efficient deployment of VNFs for network service
provisioning is key to realize the NFV goals. Existing efforts on VNF placement
mostly deal with offline or onetime placement, ignoring the fundamental,
dynamic deployment and scaling need of VNFs to handle practical timevarying
traffic volumes. This work investigates dynamic placement of VNF service chains
across geodistributed datacenters to serve flows between dispersed source and
destination pairs, for operational cost minimization of the service chain
provider over the entire system span. An efficient online algorithm is
proposed, which consists of two main components: (1) A regularizationbased
approach from online learning literature to convert the offline optimal
deployment problem into a sequence of oneshot regularized problems, each to be
efficiently solved in one time slot; (2) An online dependent rounding scheme to
derive feasible integer solutions from the optimal fractional solutions of the
oneshot problems, and to guarantee a good competitive ratio of the online
algorithm over the entire time span. We verify our online algorithm with solid
theoretical analysis and tracedriven simulations under realistic settings.

Ondemand resource provisioning in cloud computing provides tailormade
resource packages (typically in the form of VMs) to meet users' demands. Public
clouds nowadays provide more and more elaborated types of VMs, but have yet to
offer the most flexible dynamic VM assembly, which is partly due to the lack of
a mature mechanism for pricing tailormade VMs on the spot. This work proposes
an efficient randomized auction mechanism based on a novel application of
smoothed analysis and randomized reduction, for dynamic VM provisioning and
pricing in geodistributed cloud data centers. This auction, to the best of our
knowledge, is the first one in literature that achieves (i) truthfulness in
expectation, (ii) polynomial running time in expectation, and (iii)
$(1\epsilon)$optimal social welfare in expectation for resource allocation,
where $\epsilon$ can be arbitrarily close to 0. Our mechanism consists of three
modules: (1) an exact algorithm to solve the NPhard social welfare
maximization problem, which runs in polynomial time in expectation, (2) a
perturbationbased randomized resource allocation scheme which produces a VM
provisioning solution that is $(1\epsilon)$optimal, and (3) an auction
mechanism that applies the perturbationbased scheme for dynamic VM
provisioning and prices the customized VMs using a randomized VCG payment, with
a guarantee in truthfulness in expectation. We validate the efficacy of the
mechanism through careful theoretical analysis and tracedriven simulations.

Vector linear network coding (LNC) is a generalization of the conventional
scalar LNC, such that the data unit transmitted on every edge is an
$L$dimensional vector of data symbols over a base field GF($q$). Vector LNC
enriches the choices of coding operations at intermediate nodes, and there is a
popular conjecture on the benefit of vector LNC over scalar LNC in terms of
alphabet size of data units: there exist (singlesource) multicast networks
that are vector linearly solvable of dimension $L$ over GF($q$) but not scalar
linearly solvable over any field of size $q' \leq q^L$. This paper introduces a
systematic way to construct such multicast networks, and subsequently establish
explicit instances to affirm the positive answer of this conjecture for
\emph{infinitely many} alphabet sizes $p^L$ with respect to an \emph{arbitrary}
prime $p$. On the other hand, this paper also presents explicit instances with
the special property that they do not have a vector linear solution of
dimension $L$ over GF(2) but have scalar linear solutions over GF($q'$) for
some $q' < 2^L$, where $q'$ can be odd or even. This discovery also unveils
that over a given base field, a multicast network that has a vector linear
solution of dimension $L$ does not necessarily have a vector linear solution of
dimension $L' > L$.

Network Function Virtualization (NFV) is a promising technology that promises
to significantly reduce the operational costs of network services by deploying
virtualized network functions (VNFs) to commodity servers in place of dedicated
hardware middleboxes. The VNFs are typically running on virtual machine
instances in a cloud infrastructure, where the virtualization technology
enables dynamic provisioning of VNF instances, to process the fluctuating
traffic that needs to go through the network functions in a network service. In
this paper, we target dynamic provisioning of enterprise network services 
expressed as one or multiple service chains  in cloud datacenters, and design
efficient online algorithms without requiring any information on future traffic
rates. The key is to decide the number of instances of each VNF type to
provision at each time, taking into consideration the server resource
capacities and traffic rates between adjacent VNFs in a service chain. In the
case of a single service chain, we discover an elegant structure of the problem
and design an efficient randomized algorithm achieving a e/(e1) competitive
ratio. For multiple concurrent service chains, an online heuristic algorithm is
proposed, which is O(1)competitive. We demonstrate the effectiveness of our
algorithms using solid theoretical analysis and tracedriven simulations.

For a (singlesource) multicast network, the size of a base field is the most
known and studied algebraic identity that is involved in characterizing its
linear solvability over the base field. In this paper, we design a new class
$\mathcal{N}$ of multicast networks and obtain an explicit formula for the
linear solvability of these networks, which involves the associated coset
numbers of a multiplicative subgroup in a base field. The concise formula turns
out to be the first that matches the topological structure of a multicast
network and algebraic identities of a field other than size. It further
facilitates us to unveil \emph{infinitely many} new multicast networks linearly
solvable over GF($q$) but not over GF($q'$) with $q < q'$, based on a subgroup
order criterion. In particular, i) for every $k\geq 2$, an instance in
$\mathcal{N}$ can be found linearly solvable over GF($2^{2k}$) but \emph{not}
over GF($2^{2k+1}$), and ii) for arbitrary distinct primes $p$ and $p'$, there
are infinitely many $k$ and $k'$ such that an instance in $\mathcal{N}$ can be
found linearly solvable over GF($p^k$) but \emph{not} over GF($p'^{k'}$) with
$p^k < p'^{k'}$. On the other hand, the construction of $\mathcal{N}$ also
leads to a new class of multicast networks with $\Theta(q^2)$ nodes and
$\Theta(q^2)$ edges, where $q \geq 5$ is the minimum field size for linear
solvability of the network.

In an acyclic multicast network, it is well known that a linear network
coding solution over GF($q$) exists when $q$ is sufficiently large. In
particular, for each prime power $q$ no smaller than the number of receivers, a
linear solution over GF($q$) can be efficiently constructed. In this work, we
reveal that a linear solution over a given finite field does \emph{not}
necessarily imply the existence of a linear solution over all larger finite
fields. Specifically, we prove by construction that: (i) For every source
dimension no smaller than 3, there is a multicast network linearly solvable
over GF(7) but not over GF(8), and another multicast network linearly solvable
over GF(16) but not over GF(17); (ii) There is a multicast network linearly
solvable over GF(5) but not over such GF($q$) that $q > 5$ is a Mersenne prime
plus 1, which can be extremely large; (iii) A multicast network linearly
solvable over GF($q^{m_1}$) and over GF($q^{m_2}$) is \emph{not} necessarily
linearly solvable over GF($q^{m_1+m_2}$); (iv) There exists a class of
multicast networks with a set $T$ of receivers such that the minimum field size
$q_{min}$ for a linear solution over GF($q_{min}$) is lower bounded by
$\Theta(\sqrt{T})$, but not every larger field than GF($q_{min}$) suffices to
yield a linear solution. The insight brought from this work is that not only
the field size, but also the order of subgroups in the multiplicative group of
a finite field affects the linear solvability of a multicast network.

IaaS clouds invest substantial capital in operating their data centers.
Reducing the cost of resource provisioning, is their forever pursuing goal.
Computing resource trading among multiple IaaS clouds provide a potential for
IaaS clouds to utilize cheaper resources to fulfill their jobs, by exploiting
the diversities of different clouds' workloads and operational costs. In this
paper, we focus on studying the IaaS clouds' cost reduction through computing
resource trading among multiple IaaS clouds. We formulate the global cost
minimization problem among multiple IaaS clouds under cooperative scenario
where each individual cloud's workload and cost information is known. Taking
into consideration jobs with disparate lengths, a nonpreemptive approximation
algorithm for leftover job migration and new job scheduling is designed. Given
to the selfishness of individual clouds, we further design a randomized double
auction mechanism to elicit clouds' truthful bidding for buying or selling
virtual machines. We evaluate our algorithms using tracedriven simulations.

Network Coding encourages information coding across a communication network.
While the necessity, benefit and complexity of network coding are sensitive to
the underlying graph structure of a network, existing theory on network coding
often treats the network topology as a black box, focusing on algebraic or
information theoretic aspects of the problem. This work aims at an indepth
examination of the relation between algebraic coding and network topologies. We
mathematically establish a series of results along the direction of: if network
coding is necessary/beneficial, or if a particular finite field is required for
coding, then the network must have a corresponding hidden structure embedded in
its underlying topology, and such embedding is computationally efficient to
verify. Specifically, we first formulate a metaconjecture, the NCMinor
Conjecture, that articulates such a connection between graph theory and network
coding, in the language of graph minors. We next prove that the NCMinor
Conjecture is almost equivalent to the Hadwiger Conjecture, which connects
graph minors with graph coloring. Such equivalence implies the existence of
$K_4$, $K_5$, $K_6$, and $K_{O(q/\log{q})}$ minors, for networks requiring
$\mathbb{F}_3$, $\mathbb{F}_4$, $\mathbb{F}_5$ and $\mathbb{F}_q$,
respectively. We finally prove that network coding can make a difference from
routing only if the network contains a $K_4$ minor, and this minor containment
result is tight. Practical implications of the above results are discussed.

By sharing resources among different cloud providers, the paradigm of
federated clouds exploits temporal availability of resources and geographical
diversity of operational costs for efficient job service. While
interoperability issues across different cloud platforms in a cloud federation
have been extensively studied, fundamental questions on cloud economics remain:
When and how should a cloud trade resources (e.g., virtual machines) with
others, such that its net profit is maximized over the long run, while a
closetooptimal social welfare in the entire federation can also be
guaranteed? To answer this question, a number of important, interrelated
decisions, including job scheduling, server provisioning and resource pricing,
should be dynamically and jointly made, while the longterm profit optimality
is pursued. In this work, we design efficient algorithms for intercloud
virtual machine (VM) trading and scheduling in a cloud federation. For VM
transactions among clouds, we design a doubleauction based mechanism that is
strategyproof, individual rational, expost budget balanced, and efficient to
execute over time. Closely combined with the auction mechanism is a dynamic VM
trading and scheduling algorithm, which carefully decides the true valuations
of VMs in the auction, optimally schedules stochastic job arrivals with
different SLAs onto the VMs, and judiciously turns on and off servers based on
the current electricity prices. Through rigorous analysis, we show that each
individual cloud, by carrying out the dynamic algorithm in the online double
auction, can achieve a timeaveraged profit arbitrarily close to the offline
optimum. Asymptotic optimality in social welfare is also achieved under
homogeneous cloud settings. We carry out tracedriven simulations to examine
the effectiveness of our algorithms and the achievable social welfare under
heterogeneous cloud settings.