• Small-cell architecture is widely adopted by cellular network operators to increase network capacity. By reducing the size of cells, operators can pack more (low-power) base stations in an area to better serve the growing demands, without causing extra interference. However, this approach suffers from low spectrum temporal efficiency. When a cell becomes smaller and covers fewer users, its total traffic fluctuates significantly due to insufficient traffic aggregation and exhibiting a large "peak-to-mean" ratio. As operators customarily provision spectrum for peak traffic, large traffic temporal fluctuation inevitably leads to low spectrum temporal efficiency. In this paper, we advocate device-to-device (D2D) load-balancing as a useful mechanism to address the fundamental drawback of small-cell architecture. The idea is to shift traffic from a congested cell to its adjacent under-utilized cells by leveraging inter-cell D2D communication, so that the traffic can be served without using extra spectrum, effectively improving the spectrum temporal efficiency. We provide theoretical modeling and analysis to characterize the benefit of D2D load balancing, in terms of total spectrum requirements of all individual cells. We also derive the corresponding cost, in terms of incurred D2D traffic overhead. We carry out empirical evaluations based on real-world 4G data traces to gauge the benefit and cost of D2D load balancing under practical settings. The results show that D2D load balancing can reduce the spectrum requirement by 25% as compared to the standard scenario without D2D load balancing, at the expense of negligible 0.7% D2D traffic overhead.
  • Adaptive optimization algorithms, such as Adam and RMSprop, have shown better optimization performance than stochastic gradient descent (SGD) in some scenarios. However, recent studies show that they often lead to worse generalization performance than SGD, especially for training deep neural networks (DNNs). In this work, we identify the reasons that Adam generalizes worse than SGD, and develop a variant of Adam to eliminate the generalization gap. The proposed method, normalized direction-preserving Adam (ND-Adam), enables more precise control of the direction and step size for updating weight vectors, leading to significantly improved generalization performance. Following a similar rationale, we further improve the generalization performance in classification tasks by regularizing the softmax logits. By bridging the gap between SGD and Adam, we also hope to shed light on why certain optimization algorithms generalize better than others.
  • We study a class of linear network coding (LNC) schemes, called circular-shift LNC, whose encoding operations consist of only circular-shifts and bit-wise additions (XOR). Formulated as a special vector linear code over GF($2$), an $L$-dimensional circular-shift linear code of degree $\delta$ restricts its local encoding kernels to be the summation of at most $\delta$ cyclic permutation matrices of size $L$. We show that on a general network, for a certain block length $L$, every scalar linear solution over GF($2^{L-1}$) can induce an $L$-dimensional circular-shift linear solution with 1-bit redundancy per-edge transmission. Consequently, specific to a multicast network, such a circular-shift linear solution of an arbitrary degree $\delta$ can be efficiently constructed, which has an interesting complexity tradeoff between encoding and decoding with different choices of $\delta$. By further proving that circular-shift LNC is insufficient to achieve the exact capacity of certain multicast networks, we show the optimality of the efficiently constructed circular-shift linear solution in the sense that its 1-bit redundancy is inevitable. Finally, both theoretical and numerical analysis imply that with increasing $L$, a randomly constructed circular-shift linear code has linear solvability behavior comparable to a randomly constructed permutation-based linear code, but has shorter overheads.
  • We study stochastic algorithms for solving non-convex optimization problems with a convex yet possibly non-smooth regularizer, which find wide applications in many practical machine learning applications. However, compared to asynchronous parallel stochastic gradient descent (AsynSGD), an algorithm targeting smooth optimization, the understanding of the behavior of stochastic algorithms for the non-smooth regularized optimization problems is limited, especially when the objective function is non-convex. To fill this gap, in this paper, we propose and analyze asynchronous parallel stochastic proximal gradient (AsynSPG) methods, including a full update version and a block-wise version, for non-convex problems. We establish an ergodic convergence rate of $O(1/\sqrt{K})$ for the proposed AsynSPG, $K$ being the number of updates made on the model, matching the convergence rate currently known for AsynSGD (for smooth problems). To our knowledge, this is the first work that provides convergence rates of asynchronous parallel SPG algorithms for non-convex problems. Furthermore, our results are also the first to prove convergence of any stochastic proximal methods without assuming an increasing batch size or the use of additional variance reduction techniques. We implement the proposed algorithms on Parameter Server and demonstrate its convergence behavior and near-linear speedup, as the number of workers increases, for sparse learning problems on a real-world dataset.
  • Many machine learning models, including those with non-smooth regularizers, can be formulated as consensus optimization problems, which can be solved by the alternating direction method of multipliers (ADMM). Many recent efforts have been made to develop asynchronous distributed ADMM to handle large amounts of training data. However, all existing asynchronous distributed ADMM methods are based on full model updates and require locking all global model parameters to handle concurrency, which essentially serializes the updates from different workers. In this paper, we present a novel block-wise, asynchronous and distributed ADMM algorithm, which allows different blocks of model parameters to be updated in parallel. The lock-free block-wise algorithm may greatly speedup sparse optimization problems, a common scenario in reality, in which most model updates only modify a subset of all decision variables. We theoretically prove the convergence of our proposed algorithm to stationary points for non-convex general form consensus problems with possibly non-smooth regularizers. We implement the proposed ADMM algorithm on the Parameter Server framework and demonstrate its convergence and near-linear speedup performance as the number of workers increases.
  • Cloud containers represent a new, light-weight alternative to virtual machines in cloud computing. A user job may be described by a container graph that specifies the resource profile of each container and container dependence relations. This work is the first in the cloud computing literature that designs efficient market mechanisms for container based cloud jobs. Our design targets simultaneously incentive compatibility, computational efficiency, and economic efficiency. It further adapts the idea of batch online optimization into the paradigm of mechanism design, leveraging agile creation of cloud containers and exploiting delay tolerance of elastic cloud jobs. The new and classic techniques we employ include: (i) compact exponential optimization for expressing and handling non-traditional constraints that arise from container dependence and job deadlines; (ii) the primal-dual schema for designing efficient approximation algorithms for social welfare maximization; and (iii) posted price mechanisms for batch decision making and truthful payment design. Theoretical analysis and trace-driven empirical evaluation verify the efficacy of our container auction algorithms.
  • Nowadays large-scale distributed machine learning systems have been deployed to support various analytics and intelligence services in IT firms. To train a large dataset and derive the prediction/inference model, e.g., a deep neural network, multiple workers are run in parallel to train partitions of the input dataset, and update shared model parameters. In a shared cluster handling multiple training jobs, a fundamental issue is how to efficiently schedule jobs and set the number of concurrent workers to run for each job, such that server resources are maximally utilized and model training can be completed in time. Targeting a distributed machine learning system using the parameter server framework, we design an online algorithm for scheduling the arriving jobs and deciding the adjusted numbers of concurrent workers and parameter servers for each job over its course, to maximize overall utility of all jobs, contingent on their completion times. Our online algorithm design utilizes a primal-dual framework coupled with efficient dual subroutines, achieving good long-term performance guarantees with polynomial time complexity. Practical effectiveness of the online algorithm is evaluated using trace-driven simulation and testbed experiments, which demonstrate its outperformance as compared to commonly adopted scheduling algorithms in today's cloud systems.
  • We study online resource allocation in a cloud computing platform, through a posted pricing mechanism: The cloud provider publishes a unit price for each resource type, which may vary over time; upon arrival at the cloud system, a cloud user either takes the current prices, renting resources to execute its job, or refuses the prices without running its job there. We design pricing functions based on the current resource utilization ratios, in a wide array of demand-supply relationships and resource occupation durations, and prove worst-case competitive ratios of the pricing functions in terms of social welfare. In the basic case of a single-type, non-recycled resource (i.e., allocated resources are not later released for reuse), we prove that our pricing function design is optimal, in that any other pricing function can only lead to a worse competitive ratio. Insights obtained from the basic cases are then used to generalize the pricing functions to more realistic cloud systems with multiple types of resources, where a job occupies allocated resources for a number of time slots till completion, upon which time the resources are returned back to the cloud resource pool.
  • Matrix factorization is a popular approach to solving matrix estimation problems based on partial observations. Existing matrix factorization is based on least squares and aims to yield a low-rank matrix to interpret the conditional sample means given the observations. However, in many real applications with skewed and extreme data, least squares cannot explain their central tendency or tail distributions, yielding undesired estimates. In this paper, we propose \emph{expectile matrix factorization} by introducing asymmetric least squares, a key concept in expectile regression analysis, into the matrix factorization framework. We propose an efficient algorithm to solve the new problem based on alternating minimization and quadratic programming. We prove that our algorithm converges to a global optimum and exactly recovers the true underlying low-rank matrices when noise is zero. For synthetic data with skewed noise and a real-world dataset containing web service response times, the proposed scheme achieves lower recovery errors than the existing matrix factorization method based on least squares in a wide range of settings.
  • The growing demand in mobile Internet access calls for high capacity and energy efficient cellular access with better cell coverage. The in-band relaying solution, proposed in LTE-Advanced, improves coverage without requiring additional spectrum for backhauling, making its deployment more economical and practical. However, in-band relay without careful management incurs low spectrum utilization and reduces the system capacity. We propose auction-based solutions that aim at dynamic spectrum resource sharing, maximizing the utilization of precious spectrum resources. We first present a truthful auction that ensures a theoretical performance guarantee in terms of social welfare. Then in an extended system model that focuses on addressing the heterogeneity of resource blocks, we design a more practical auction mechanism. We implement our proposed auctions under large scale real-world settings. Simulation results verify the efficacy of proposed auctions, showing improvements in both cell coverage and spectrum efficiency.
  • Network Function Virtualization (NFV) is an emerging paradigm that turns hardware-dependent implementation of network functions (i.e., middleboxes) into software modules running on virtualized platforms, for significant cost reduction and ease of management. Such virtual network functions (VNFs) commonly constitute service chains, to provide network services that traffic flows need to go through. Efficient deployment of VNFs for network service provisioning is key to realize the NFV goals. Existing efforts on VNF placement mostly deal with offline or one-time placement, ignoring the fundamental, dynamic deployment and scaling need of VNFs to handle practical time-varying traffic volumes. This work investigates dynamic placement of VNF service chains across geo-distributed datacenters to serve flows between dispersed source and destination pairs, for operational cost minimization of the service chain provider over the entire system span. An efficient online algorithm is proposed, which consists of two main components: (1) A regularization-based approach from online learning literature to convert the offline optimal deployment problem into a sequence of one-shot regularized problems, each to be efficiently solved in one time slot; (2) An online dependent rounding scheme to derive feasible integer solutions from the optimal fractional solutions of the one-shot problems, and to guarantee a good competitive ratio of the online algorithm over the entire time span. We verify our online algorithm with solid theoretical analysis and trace-driven simulations under realistic settings.
  • On-demand resource provisioning in cloud computing provides tailor-made resource packages (typically in the form of VMs) to meet users' demands. Public clouds nowadays provide more and more elaborated types of VMs, but have yet to offer the most flexible dynamic VM assembly, which is partly due to the lack of a mature mechanism for pricing tailor-made VMs on the spot. This work proposes an efficient randomized auction mechanism based on a novel application of smoothed analysis and randomized reduction, for dynamic VM provisioning and pricing in geo-distributed cloud data centers. This auction, to the best of our knowledge, is the first one in literature that achieves (i) truthfulness in expectation, (ii) polynomial running time in expectation, and (iii) $(1-\epsilon)$-optimal social welfare in expectation for resource allocation, where $\epsilon$ can be arbitrarily close to 0. Our mechanism consists of three modules: (1) an exact algorithm to solve the NP-hard social welfare maximization problem, which runs in polynomial time in expectation, (2) a perturbation-based randomized resource allocation scheme which produces a VM provisioning solution that is $(1-\epsilon)$-optimal, and (3) an auction mechanism that applies the perturbation-based scheme for dynamic VM provisioning and prices the customized VMs using a randomized VCG payment, with a guarantee in truthfulness in expectation. We validate the efficacy of the mechanism through careful theoretical analysis and trace-driven simulations.
  • Vector linear network coding (LNC) is a generalization of the conventional scalar LNC, such that the data unit transmitted on every edge is an $L$-dimensional vector of data symbols over a base field GF($q$). Vector LNC enriches the choices of coding operations at intermediate nodes, and there is a popular conjecture on the benefit of vector LNC over scalar LNC in terms of alphabet size of data units: there exist (single-source) multicast networks that are vector linearly solvable of dimension $L$ over GF($q$) but not scalar linearly solvable over any field of size $q' \leq q^L$. This paper introduces a systematic way to construct such multicast networks, and subsequently establish explicit instances to affirm the positive answer of this conjecture for \emph{infinitely many} alphabet sizes $p^L$ with respect to an \emph{arbitrary} prime $p$. On the other hand, this paper also presents explicit instances with the special property that they do not have a vector linear solution of dimension $L$ over GF(2) but have scalar linear solutions over GF($q'$) for some $q' < 2^L$, where $q'$ can be odd or even. This discovery also unveils that over a given base field, a multicast network that has a vector linear solution of dimension $L$ does not necessarily have a vector linear solution of dimension $L' > L$.
  • Network Function Virtualization (NFV) is a promising technology that promises to significantly reduce the operational costs of network services by deploying virtualized network functions (VNFs) to commodity servers in place of dedicated hardware middleboxes. The VNFs are typically running on virtual machine instances in a cloud infrastructure, where the virtualization technology enables dynamic provisioning of VNF instances, to process the fluctuating traffic that needs to go through the network functions in a network service. In this paper, we target dynamic provisioning of enterprise network services - expressed as one or multiple service chains - in cloud datacenters, and design efficient online algorithms without requiring any information on future traffic rates. The key is to decide the number of instances of each VNF type to provision at each time, taking into consideration the server resource capacities and traffic rates between adjacent VNFs in a service chain. In the case of a single service chain, we discover an elegant structure of the problem and design an efficient randomized algorithm achieving a e/(e-1) competitive ratio. For multiple concurrent service chains, an online heuristic algorithm is proposed, which is O(1)-competitive. We demonstrate the effectiveness of our algorithms using solid theoretical analysis and trace-driven simulations.
  • For a (single-source) multicast network, the size of a base field is the most known and studied algebraic identity that is involved in characterizing its linear solvability over the base field. In this paper, we design a new class $\mathcal{N}$ of multicast networks and obtain an explicit formula for the linear solvability of these networks, which involves the associated coset numbers of a multiplicative subgroup in a base field. The concise formula turns out to be the first that matches the topological structure of a multicast network and algebraic identities of a field other than size. It further facilitates us to unveil \emph{infinitely many} new multicast networks linearly solvable over GF($q$) but not over GF($q'$) with $q < q'$, based on a subgroup order criterion. In particular, i) for every $k\geq 2$, an instance in $\mathcal{N}$ can be found linearly solvable over GF($2^{2k}$) but \emph{not} over GF($2^{2k+1}$), and ii) for arbitrary distinct primes $p$ and $p'$, there are infinitely many $k$ and $k'$ such that an instance in $\mathcal{N}$ can be found linearly solvable over GF($p^k$) but \emph{not} over GF($p'^{k'}$) with $p^k < p'^{k'}$. On the other hand, the construction of $\mathcal{N}$ also leads to a new class of multicast networks with $\Theta(q^2)$ nodes and $\Theta(q^2)$ edges, where $q \geq 5$ is the minimum field size for linear solvability of the network.
  • In an acyclic multicast network, it is well known that a linear network coding solution over GF($q$) exists when $q$ is sufficiently large. In particular, for each prime power $q$ no smaller than the number of receivers, a linear solution over GF($q$) can be efficiently constructed. In this work, we reveal that a linear solution over a given finite field does \emph{not} necessarily imply the existence of a linear solution over all larger finite fields. Specifically, we prove by construction that: (i) For every source dimension no smaller than 3, there is a multicast network linearly solvable over GF(7) but not over GF(8), and another multicast network linearly solvable over GF(16) but not over GF(17); (ii) There is a multicast network linearly solvable over GF(5) but not over such GF($q$) that $q > 5$ is a Mersenne prime plus 1, which can be extremely large; (iii) A multicast network linearly solvable over GF($q^{m_1}$) and over GF($q^{m_2}$) is \emph{not} necessarily linearly solvable over GF($q^{m_1+m_2}$); (iv) There exists a class of multicast networks with a set $T$ of receivers such that the minimum field size $q_{min}$ for a linear solution over GF($q_{min}$) is lower bounded by $\Theta(\sqrt{|T|})$, but not every larger field than GF($q_{min}$) suffices to yield a linear solution. The insight brought from this work is that not only the field size, but also the order of subgroups in the multiplicative group of a finite field affects the linear solvability of a multicast network.
  • IaaS clouds invest substantial capital in operating their data centers. Reducing the cost of resource provisioning, is their forever pursuing goal. Computing resource trading among multiple IaaS clouds provide a potential for IaaS clouds to utilize cheaper resources to fulfill their jobs, by exploiting the diversities of different clouds' workloads and operational costs. In this paper, we focus on studying the IaaS clouds' cost reduction through computing resource trading among multiple IaaS clouds. We formulate the global cost minimization problem among multiple IaaS clouds under cooperative scenario where each individual cloud's workload and cost information is known. Taking into consideration jobs with disparate lengths, a non-preemptive approximation algorithm for leftover job migration and new job scheduling is designed. Given to the selfishness of individual clouds, we further design a randomized double auction mechanism to elicit clouds' truthful bidding for buying or selling virtual machines. We evaluate our algorithms using trace-driven simulations.
  • Network Coding encourages information coding across a communication network. While the necessity, benefit and complexity of network coding are sensitive to the underlying graph structure of a network, existing theory on network coding often treats the network topology as a black box, focusing on algebraic or information theoretic aspects of the problem. This work aims at an in-depth examination of the relation between algebraic coding and network topologies. We mathematically establish a series of results along the direction of: if network coding is necessary/beneficial, or if a particular finite field is required for coding, then the network must have a corresponding hidden structure embedded in its underlying topology, and such embedding is computationally efficient to verify. Specifically, we first formulate a meta-conjecture, the NC-Minor Conjecture, that articulates such a connection between graph theory and network coding, in the language of graph minors. We next prove that the NC-Minor Conjecture is almost equivalent to the Hadwiger Conjecture, which connects graph minors with graph coloring. Such equivalence implies the existence of $K_4$, $K_5$, $K_6$, and $K_{O(q/\log{q})}$ minors, for networks requiring $\mathbb{F}_3$, $\mathbb{F}_4$, $\mathbb{F}_5$ and $\mathbb{F}_q$, respectively. We finally prove that network coding can make a difference from routing only if the network contains a $K_4$ minor, and this minor containment result is tight. Practical implications of the above results are discussed.
  • By sharing resources among different cloud providers, the paradigm of federated clouds exploits temporal availability of resources and geographical diversity of operational costs for efficient job service. While interoperability issues across different cloud platforms in a cloud federation have been extensively studied, fundamental questions on cloud economics remain: When and how should a cloud trade resources (e.g., virtual machines) with others, such that its net profit is maximized over the long run, while a close-to-optimal social welfare in the entire federation can also be guaranteed? To answer this question, a number of important, inter-related decisions, including job scheduling, server provisioning and resource pricing, should be dynamically and jointly made, while the long-term profit optimality is pursued. In this work, we design efficient algorithms for inter-cloud virtual machine (VM) trading and scheduling in a cloud federation. For VM transactions among clouds, we design a double-auction based mechanism that is strategyproof, individual rational, ex-post budget balanced, and efficient to execute over time. Closely combined with the auction mechanism is a dynamic VM trading and scheduling algorithm, which carefully decides the true valuations of VMs in the auction, optimally schedules stochastic job arrivals with different SLAs onto the VMs, and judiciously turns on and off servers based on the current electricity prices. Through rigorous analysis, we show that each individual cloud, by carrying out the dynamic algorithm in the online double auction, can achieve a time-averaged profit arbitrarily close to the offline optimum. Asymptotic optimality in social welfare is also achieved under homogeneous cloud settings. We carry out trace-driven simulations to examine the effectiveness of our algorithms and the achievable social welfare under heterogeneous cloud settings.