• This paper considers optimization over multiple renewal systems coupled by time average constraints. These systems act asynchronously over variable length frames. For each system, at the beginning of each renewal frame, it chooses an action which affects the duration of its own frame, the penalty, and the resource expenditure throughout the frame. The goal is to minimize the overall time average penalty subject to several overall time average resource constraints which couple these systems. This problem has applications to task processing networks, coupled Markov decision processes(MDPs) and so on. We propose a distributed algorithm so that each system can make its own decision after observing a global multiplier which is updated slot-wise. We show that this algorithm satisfies the desired constraints and achieves $\mathcal{O}(\varepsilon)$ near optimality with $\mathcal{O}(1/\varepsilon^2)$ convergence time.
  • We present Rosenthal-type moment inequalities for matrix-valued U-statistics of order 2. As a corollary, we obtain new matrix concentration inequalities for U-statistics. One of our main technical tools, a version of the non-commutative Khintchine inequality for the spectral norm of the Rademacher chaos, could be of independent interest.
  • This paper introduces a general regularized thresholded least-square procedure estimating a structured signal $\theta_*\in\mathbb{R}^d$ from the following observations: $y_i = f(\langle\mathbf{x}_i, \theta_*\rangle, \xi_i),~i\in\{1,2,\cdots,N\}$, with i.i.d. heavy-tailed measurements $\{(\mathbf{x}_i,y_i)\}_{i=1}^N$. A general framework analyzing the thresholding procedure is proposed, which boils down to computing three critical radiuses of the bounding balls of the estimator. Then, we demonstrate these critical radiuses can be tightly bounded in the following two scenarios: (1) The link function $f(\cdot)$ is linear, i.e. $y = \langle\mathbf{x},\theta_*\rangle + \xi$, with $\theta_*$ being a sparse vector and $\{\mathbf{x}_i\}_{i=1}^N$ being general heavy-tailed random measurements with bounded $(20+\epsilon)$-moments. (2) The function $f(\cdot)$ is arbitrary unknown (possibly discontinuous) and $\{\mathbf{x}_i\}_{i=1}^N$ are heavy-tailed elliptical random vectors with bounded $(4+\epsilon)$-moments. In both scenarios, we show under these rather minimal bounded moment assumptions, such a procedure and corresponding analysis lead to optimal sample and error bounds with high probability in terms of the structural properties of $\theta_*$.
  • Let $Y$ be a $d$-dimensional random vector with unknown mean $\mu$ and covariance matrix $\Sigma$. This paper is motivated by the problem of designing an estimator of $\Sigma$ that admits tight deviation bounds in the operator norm under minimal assumptions on the underlying distribution, such as existence of only 4th moments of the coordinates of $Y$. To address this problem, we propose robust modifications of the operator-valued U-statistics, obtain non-asymptotic guarantees for their performance, and demonstrate the implications of these results to the covariance estimation problem under various structural assumptions.
  • We propose and analyze a new estimator of the covariance matrix that admits strong theoretical guarantees under weak assumptions on the underlying distribution, such as existence of moments of only low order. While estimation of covariance matrices corresponding to sub-Gaussian distributions is well-understood, much less in known in the case of heavy-tailed data. As K. Balasubramanian and M. Yuan write, "data from real-world experiments oftentimes tend to be corrupted with outliers and/or exhibit heavy tails. In such cases, it is not clear that those covariance matrix estimators .. remain optimal" and "..what are the other possible strategies to deal with heavy tailed distributions warrant further studies." We make a step towards answering this question and prove tight deviation inequalities for the proposed estimator that depend only on the parameters controlling the "intrinsic dimension" associated to the covariance matrix (as opposed to the dimension of the ambient space); in particular, our results are applicable in the case of high-dimensional observations.
  • Performance guarantees for compression in nonlinear models under non-Gaussian observations can be achieved through the use of distributional characteristics that are sensitive to the distance to normality, and which in particular return the value of zero under Gaussian or linear sensing. The use of these characteristics, or discrepancies, improves some previous results in this area by relaxing conditions and tightening performance bounds. In addition, these characteristics are tractable to compute when Gaussian sensing is corrupted by either additive errors or mixing.
  • We consider multiple parallel Markov decision processes (MDPs) coupled by global constraints, where the time varying objective and constraint functions can only be observed after the decision is made. Special attention is given to how well the decision maker can perform in $T$ slots, starting from any state, compared to the best feasible randomized stationary policy in hindsight. We develop a new distributed online algorithm where each MDP makes its own decision each slot after observing a multiplier computed from past information. While the scenario is significantly more challenging than the classical online learning context, the algorithm is shown to have a tight $O(\sqrt{T})$ regret and constraint violations simultaneously. To obtain such a bound, we combine several new ingredients including ergodicity and mixing time bound in weakly coupled MDPs, a new regret analysis for online constrained optimization, a drift analysis for queue processes, and a perturbation analysis based on Farkas' Lemma.
  • This paper considers online convex optimization (OCO) with stochastic constraints, which generalizes Zinkevich's OCO over a known simple fixed set by introducing multiple stochastic functional constraints that are i.i.d. generated at each round and are disclosed to the decision maker only after the decision is made. This formulation arises naturally when decisions are restricted by stochastic environments or deterministic environments with noisy observations. It also includes many important problems as special cases, such as OCO with long term constraints, stochastic constrained convex optimization, and deterministic constrained convex optimization. To solve this problem, this paper proposes a new algorithm that achieves $O(\sqrt{T})$ expected regret and constraint violations and $O(\sqrt{T}\log(T))$ high probability regret and constraint violations. Experiments on a real-world data center scheduling problem further verify the performance of the new algorithm.
  • This paper considers the recovery of group sparse signals over a multi-agent network, where the measurements are subject to sparse errors. We first investigate the robust group LASSO model and its centralized algorithm based on the alternating direction method of multipliers (ADMM), which requires a central fusion center to compute a global row-support detector. To implement it in a decentralized network environment, we then adopt dynamic average consensus strategies that enable dynamic tracking of the global row-support detector. Numerical experiments demonstrate the effectiveness of the proposed algorithms.
  • This paper considers a cost minimization problem for data centers with N servers and randomly arriving service requests. A central router decides which server to use for each new request. Each server has three types of states (active, idle, setup) with different costs and time durations. The servers operate asynchronously over their own states and can choose one of multiple sleep modes when idle. We develop an online distributed control algorithm so that each server makes its own decisions, the request queues are bounded and the overall time average cost is near optimal with probability 1. The algorithm does not need probability information for the arrival rate or job sizes. Next, an improved algorithm that uses a single queue is developed via a "virtualization" technique which is shown to provide the same (near optimal) costs. Simulation experiments on a real data center traffic trace demonstrate the efficiency of our algorithm compared to other existing algorithms.
  • This paper considers the problem of minimizing the time average of a controlled stochastic process subject to multiple time average constraints on other related processes. The probability distribution of the random events in the system is unknown to the controller. A typical application is time average power minimization subject to network throughput constraints for different users in a network with time varying channel conditions. We show that with probability at least $1-2\delta$, the classical drift-plus-penalty algorithm provides a sample path $\mathcal{O}(\varepsilon)$ approximation to optimality with a convergence time $\mathcal{O}(\frac{1}{\varepsilon^2}\max\left\{\log^2\frac1\varepsilon\log\frac2\delta,~\log^3\frac2\delta\right\})$, where $\varepsilon>0$ is a parameter related to the algorithm. When there is only one constraint, we further show that the convergence time can be improved to $\mathcal{O}\left(\frac{1}{\varepsilon^2}\log^2\frac1\delta\right)$.
  • We study high-dimensional signal recovery from non-linear measurements with design vectors having elliptically symmetric distribution. Special attention is devoted to the situation when the unknown signal belongs to a set of low statistical complexity, while both the measurements and the design vectors are heavy-tailed. We propose and analyze a new estimator that adapts to the structure of the problem, while being robust both to the possible model misspecification characterized by arbitrary non-linearity of the measurements as well as to data corruption modeled by the heavy-tailed distributions. Moreover, this estimator has low computational complexity. Our results are expressed in the form of exponential concentration inequalities for the error of the proposed estimator. On the technical side, our proofs rely on the generic chaining methods, and illustrate the power of this approach for statistical applications. Theory is supported by numerical experiments demonstrating that our estimator outperforms existing alternatives when data is heavy-tailed.
  • This paper considers an opportunistic scheduling problem over a renewal system. A controller observes a random event at the beginning of each renewal frame and then chooses an action in response to the event, which affects the duration of the frame, the amount of resources used, and a penalty metric. The goal is to make frame-wise decisions so as to minimize the time average penalty subject to time average resource constraints. This problem has applications to task processing and communication in data networks, as well as to certain classes of Markov decision problems. We formulate the problem as a dynamic fractional program and propose an adaptive algorithm which uses an empirical accumulation as a feedback parameter. A key feature of the proposed algorithm is that it does not require knowledge of the random event statistics and potentially allows (uncountably) infinite event sets. We prove the algorithm satisfies all desired constraints and achieves $O(\epsilon)$ near optimality with probability 1.
  • This paper considers optimization of power and delay in a time-varying wireless link using rateless codes. The link serves a sequence of variable-length packets. Each packet is coded and transmitted over multiple slots. Channel conditions can change from slot to slot and are unknown to the transmitter. The amount of mutual information accumulated on each slot depends on the random channel realization and the power used. The goal is to minimize average service delay subject to an average power constraint. We formulate this problem as a frame-based stochastic optimization problem and solve it via an online algorithm. We show that the subproblem within each frame is a simple integer program which can be effectively solved using a dynamic program. The optimality of this online algorithm is proved using the frame-based Lyapunov drift analysis.
  • This paper considers the problem of recovering a group sparse signal matrix $\mathbf{Y} = [\mathbf{y}_1, \cdots, \mathbf{y}_L]$ from sparsely corrupted measurements $\mathbf{M} = [\mathbf{A}_{(1)}\mathbf{y}_{1}, \cdots, \mathbf{A}_{(L)}\mathbf{y}_{L}] + \mathbf{S}$, where $\mathbf{A}_{(i)}$'s are known sensing matrices and $\mathbf{S}$ is an unknown sparse error matrix. A robust group lasso (RGL) model is proposed to recover $\mathbf{Y}$ and $\mathbf{S}$ through simultaneously minimizing the $\ell_{2,1}$-norm of $\mathbf{Y}$ and the $\ell_1$-norm of $\mathbf{S}$ under the measurement constraints. We prove that $\mathbf{Y}$ and $\mathbf{S}$ can be exactly recovered from the RGL model with a high probability for a very general class of $\mathbf{A}_{(i)}$'s.
  • This paper treats power-aware throughput maxi-mization in a multi-user file downloading system. Each user can receive a new file only after its previous file is finished. The file state processes for each user act as coupled Markov chains that form a generalized restless bandit system. First, an optimal algorithm is derived for the case of one user. The algorithm maximizes throughput subject to an average power constraint. Next, the one-user algorithm is extended to a low complexity heuristic for the multi-user problem. The heuristic uses a simple online index policy. In a special case with no power-constraint, the multi-user heuristic is shown to be throughput optimal. Simulations are used to demonstrate effectiveness of the heuristic in the general case. For simple cases where the optimal solution can be computed offline, the heuristic is shown to be near-optimal for a wide range of parameters.
  • This paper treats power-aware throughput maximization in a multi-user file downloading system. Each user can receive a new file only after its previous file is finished. The file state processes for each user act as coupled Markov chains that form a generalized restless bandit system. First, an optimal algorithm is derived for the case of one user. The algorithm maximizes throughput subject to an average power constraint. Next, the one-user algorithm is extended to a low complexity heuristic for the multi-user problem. The heuristic uses a simple online index policy and its effectiveness is shown via simulation. For simple 3-user cases where the optimal solution can be computed offline, the heuristic is shown to be near-optimal for a wide range of parameters.