
This paper establishes the first performance guarantees for a combinatorial
online algorithm that schedules stochastic, nonpreemptive jobs on unrelated
machines to minimize the expected total weighted completion time. Prior work on
unrelated machine scheduling with stochastic jobs was restricted to the offline
case, and required sophisticated linear or convex programming relaxations for
the assignment of jobs to machines. The algorithm introduced in this paper is
based on a purely combinatorial assignment of jobs to machines, hence it also
works online. The performance bounds are of the same order of magnitude as
those of earlier work, and depend linearly on an upper bound $\Delta$ on the
squared coefficient of variation of the jobs' processing times. They are
$4+2\Delta$ when there are no release dates, and $12+6\Delta$ when jobs are
released over time. For the special case of deterministic processing times,
without and with release times, this paper shows that the same combinatorial
greedy algorithm has a competitive ratio of 4 and 6, respectively. As to the
technical contribution, the paper shows for the first time how dual fitting
techniques can be used for stochastic and nonpreemptive scheduling problems.

We study the problem of recovery of matrices that are simultaneously low rank
and row and/or column sparse. Such matrices appear in recent applications in
cognitive neuroscience, imaging, computer vision, macroeconomics, and genetics.
We propose a GDT (Gradient Descent with hard Thresholding) algorithm to
efficiently recover matrices with such structure, by minimizing a biconvex
function over a nonconvex set of constraints. We show linear convergence of the
iterates obtained by GDT to a region within statistical error of an optimal
solution. As an application of our method, we consider multitask learning
problems and show that the statistical error rate obtained by GDT is near
optimal compared to minimax rate. Experiments demonstrate competitive
performance and much faster running speed compared to existing methods, on both
simulations and real data sets.

We analyse JointheShortestQueue in a contemporary scaling regime known as
the NonDegenerate Slowdown regime. JointheShortestQueue (JSQ) is a
classical load balancing policy for queueing systems with multiple parallel
servers. Parallel server queueing systems are regularly analysed and
dimensioned by diffusion approximations achieved in the HalfinWhitt scaling
regime. However, when jobs must be dispatched to a server upon arrival, we
advocate the NonDegenerate Slowdown regime (NDS) to compare different
loadbalancing rules.
In this paper we identify novel diffusion approximation and timescale
separation that provides insights into the performance of JSQ. We calculate the
price of irrevocably dispatching jobs to servers and prove this to within 15%
(in the NDS regime) of the rules that may manoeuvre jobs between servers. We
also compare ours results for the JSQ policy with the NDS approximations of
many modern load balancing policies such as IdleQueueFirst and
Powerof$d$choices policies which act as low information proxies for the JSQ
policy. Our analysis leads us to construct new rules that have identical
performance to JSQ but require less communication overhead than
powerof2choices.

We consider the problem of estimating the latent structure of a social
network based on the observed information diffusion events, or {\it cascades}.
Here for a given cascade, we only observe the times of infection for infected
nodes but not the source of the infection. Most of the existing work on this
problem has focused on estimating a diffusion matrix without any structural
assumptions on it. In this paper, we propose a novel model based on the
intuition that an information is more likely to propagate among two nodes if
they are interested in similar topics which are also prominent in the
information content. In particular, our model endows each node with an
influence vector (which measures how authoritative the node is on each topic)
and a receptivity vector (which measures how susceptible the node is for each
topic). We show how this nodetopic structure can be estimated from the
observed cascades and prove an analytical upper bound on the estimation error.
The estimated model can be used to build recommendation systems based on the
receptivity vectors, as well as for marketing based on the influence vectors.
Experiments on synthetic and real data demonstrate the improved performance and
better interpretability of our model compared to existing stateoftheart
methods.

LTE evolved Multimedia Broadcast/Multicast Service (eMBMS) is an attractive
solution for video delivery to very large groups in crowded venues. However,
deployment and management of eMBMS systems is challenging, due to the lack of
realtime feedback from the User Equipment (UEs). Therefore, we present the
Dynamic Monitoring (DyMo) system for lowoverhead feedback collection. DyMo
leverages eMBMS for broadcasting Stochastic Group Instructions to all UEs.
These instructions indicate the reporting rates as a function of the observed
Quality of Service (QoS). This simple feedback mechanism collects very limited
QoS reports from the UEs. The reports are used for network optimization,
thereby ensuring high QoS to the UEs. We present the design aspects of DyMo and
evaluate its performance analytically and via extensive simulations.
Specifically, we show that DyMo infers the optimal eMBMS settings with
extremely low overhead, while meeting strict QoS requirements under different
UE mobility patterns and presence of network component failures. For instance,
DyMo can detect the eMBMS SignaltoNoise Ratio (SNR) experienced by the 0.1%
percentile of the UEs with Root Mean Square Error (RMSE) of 0.05% with only 5
to 10 reports per second regardless of the number of UEs.

Linear elastic fracture mechanics admit analytic solutions that have low
regularity at crack tips. Current numerical methods for partial differential
equations (PDEs) of this type suffer from the constraint of such low
regularity, and fail to deliver optimal high order rate of convergence. We
approach the problem by (i) choosing an artificial interface to enclose the
center of the low regularity; and (ii) representing the solution in the
interior of artificial interface as unknown linear combination of known modes
of low regular solutions. This gives rise to an interface formulation of the
original PDE, and the linear combination are represented the interface
conditions. By enforcing the smooth component of numerical solution in the
interior domain to be approximately zero, a least square problem is obtained
for the unknown coefficients. The solution of this least square problem will
provide approximate interface conditions for the numerical solution of the PDE
in the exterior domain. The potential of our interface formulation is favorably
demonstrated by numerical experiments on 1D and 2D Poisson equations with low
regular solutions. High order numerical solutions of unknown coefficients and
PDEs are obtained. This proves the potential of the proposed interface
formulation as the theoretical basis for solving linear elastic fracture
mechanics problems. We indicate the relations between our interface formulation
and domain decomposition methods as well as a regularization strategy for the
PoissonBoltzmann equation with singular charge density.

WiFi multicast to very large groups has gained attention as a solution for
multimedia delivery in crowded areas. Yet, most recently proposed schemes do
not provide performance guarantees and none have been tested at scale. To
address the issue of providing high multicast throughput with performance
guarantees, we present the design and experimental evaluation of the Multicast
Dynamic Rate Adaptation (MuDRA) algorithm. MuDRA balances fast adaptation to
channel conditions and stability, which is essential for multimedia
applications. MuDRA relies on feedback from some nodes collected via a
lightweight protocol and dynamically adjusts the rate adaptation response
time. Our experimental evaluation of MuDRA on the ORBIT testbed with over 150
nodes shows that MuDRA outperforms other schemes and supports high throughput
multicast flows to hundreds of receivers while meeting quality requirements.
MuDRA can support multiple high quality video streams, where 90% of the nodes
report excellent or very good video quality.

The paper studies approximations and control of a processor sharing (PS)
server where the service rate depends on the number of jobs occupying the
server. The control of such a system is implemented by imposing a limit on the
number of jobs that can share the server concurrently, with the rest of the
jobs waiting in a firstinfirstout (FIFO) buffer. A desirable control scheme
should strike the right balance between efficiency (operating at a high service
rate) and parallelism (preventing small jobs from getting stuck behind large
ones).
We employ the framework of heavytraffic diffusion analysis to devise near
optimal control heuristics for such a queueing system. However, while the
literature on diffusion control of statedependent queueing systems begins with
a sequence of systems and an exogenously defined drift function, we begin with
a finite discrete PS server and propose an axiomatic recipe to explicitly
construct a sequence of statedependent PS servers which then yields a drift
function. We establish diffusion approximations and use them to obtain
insightful and closedform approximations for the original system under a
static concurrency limit control policy.
We extend our study to control policies that dynamically adjust the
concurrency limit. We provide two novel numerical algorithms to solve the
associated diffusion control problem. Our algorithms can be viewed as "average
cost" iteration: The first algorithm uses binarysearch on the average cost and
can find an $\epsilon$optimal policy in time $O\left( \log^2
\frac{1}{\epsilon} \right)$; the second algorithm uses the NewtonRaphson
method for rootfinding and requires $O\left( \log \frac{1}{\epsilon} \log\log
\frac{1}{\epsilon}\right)$ time.
Numerical experiments demonstrate the accuracy of our approximation for
choosing optimal or nearoptimal static and dynamic concurrency control
heuristics.

Motivated by the problem of packing Virtual Machines on physical servers in
the cloud, we study the problem of onedimensional online stochastic bin
packing. Items with sizes i.i.d. from an unknown distribution with integral
support arrive as a stream and must be packed on arrival in bins of size B,
also an integer. The size of an item is known when it arrives and the goal is
to minimize the waste, defined to be the total unused space in nonempty bins.
While there are many heuristics for online stochastic bin packing, all such
heuristics are either optimal for only certain classes of item size
distributions, or rely on learning the distribution. The stateoftheart Sum
of Squares heuristic (Csirik et al.) obtains sublinear (in number of items
seen) waste for distributions where the expected waste for the optimal offline
algorithm is sublinear, but has a constant factor larger waste for
distributions with linear waste under OPT. Csirik et al. solved this problem by
learning the distribution and solving an LP to inject phantom jobs in the
arrival stream.
We present two distributionagnostic bin packing heuristics that achieve
additive O(sqrt{n}) waste compared to OPT for all distributions. Our algorithms
are gradient descent on suitably defined Lagrangian relaxations of the bin
packing Linear Program. The first algorithm is very similar to the SS
algorithm, but conceptually packs the bins topdown instead of bottomup. This
motivates our second heuristic that uses a different Lagrangian relaxation to
pack bins bottomup.
Next, we consider the more general problem of online stochastic bin packing
with item departures where the time requirement of an item is only revealed
when the item departs. Our algorithms extend as is to the case of item
departures. We also briefly revisit the Best Fit heuristic which has not been
studied in the scenario of item departures yet.

We consider the bipartite matching model of customers and servers introduced
by Caldentey, Kaplan, and Weiss (Adv. Appl. Probab., 2009). Customers and
servers play symmetrical roles. There is a finite set C resp. S, of customer,
resp. server, classes. Time is discrete and at each time step, one customer and
one server arrive in the system according to a joint probability measure on
CxS, independently of the past. Also, at each time step, pairs of matched
customer and server, if they exist, depart from the system. Authorized
matchings are given by a fixed bipartite graph. A matching policy is chosen,
which decides how to match when there are several possibilities.
Customers/servers that cannot be matched are stored in a buffer. The evolution
of the model can be described by a discrete time Markov chain. We study its
stability under various admissible matching policies including: ML (Match the
Longest), MS (Match the Shortest), FIFO (match the oldest), priorities. There
exist natural necessary conditions for stability (independent of the matching
policy) defining the maximal possible stability region. For some bipartite
graphs, we prove that the stability region is indeed maximal for any admissible
matching policy. For the ML policy, we prove that the stability region is
maximal for any bipartite graph. For the MS and priority policies, we exhibit a
bipartite graph with a nonmaximal stability region.