• ### Randomized Experimental Design via Geographic Clustering(1611.03780)

Feb. 15, 2019 cs.SI, cs.DS
Web-based services often run randomized experiments to improve their products. A popular way to run these experiments is to use geographical regions as units of experimentation, since this does not require tracking of individual users or browser cookies. Since users may issue queries from multiple geographical locations, geo-regions cannot be considered independent and interference may be present in the experiment. In this paper, we study this problem, and first present GeoCUTS, a novel algorithm that forms geographical clusters to minimize interference while preserving balance in cluster size. We use a random sample of anonymized traffic from Google Search to form a graph representing user movements, then construct a geographically coherent clustering of the graph. Our main technical contribution is a statistical framework to measure the effectiveness of clusterings. Furthermore, we perform empirical evaluations showing that the performance of GeoCUTS is comparable to hand-crafted geo-regions with respect to both novel and existing metrics.
• ### Stochastic bandits robust to adversarial corruptions(1803.09353)

March 25, 2018 cs.GT, cs.DS, cs.LG, stat.ML
We introduce a new model of stochastic bandits with adversarial corruptions which aims to capture settings where most of the input follows a stochastic pattern but some fraction of it can be adversarially changed to trick the algorithm, e.g., click fraud, fake reviews and email spam. The goal of this model is to encourage the design of bandit algorithms that (i) work well in mixed adversarial and stochastic models, and (ii) whose performance deteriorates gracefully as we move from fully stochastic to fully adversarial models. In our model, the rewards for all arms are initially drawn from a distribution and are then altered by an adaptive adversary. We provide a simple algorithm whose performance gracefully degrades with the total corruption the adversary injected in the data, measured by the sum across rounds of the biggest alteration the adversary made in the data in that round; this total corruption is denoted by $C$. Our algorithm provides a guarantee that retains the optimal guarantee (up to a logarithmic term) if the input is stochastic and whose performance degrades linearly to the amount of corruption $C$, while crucially being agnostic to it. We also provide a lower bound showing that this linear degradation is necessary if the algorithm achieves optimal performance in the stochastic setting (the lower bound works even for a known amount of corruption, a special case in which our algorithm achieves optimal performance without the extra logarithm).
• ### Optimizing cluster-based randomized experiments under a monotonicity assumption(1803.02876)

March 7, 2018 stat.ME
Cluster-based randomized experiments are popular designs for mitigating the bias of standard estimators when interference is present and classical causal inference and experimental design assumptions (such as SUTVA or ITR) do not hold. Without an exact knowledge of the interference structure, it can be challenging to understand which partitioning of the experimental units is optimal to minimize the estimation bias. In the paper, we introduce a monotonicity condition under which a novel two-stage experimental design allows us to determine which of two cluster-based designs yields the least biased estimator. We then consider the setting of online advertising auctions and show that reserve price experiments verify the monotonicity condition and the proposed framework and methodology applies. We validate our findings on an advertising auction dataset.
• ### Robust Repeated Auctions under Heterogeneous Buyer Behavior(1803.00494)

March 1, 2018 cs.GT
We study revenue optimization in a repeated auction between a single seller and a single buyer. Traditionally, the design of repeated auctions requires strong modeling assumptions about the bidder behavior, such as it being myopic, infinite lookahead, or some specific form of learning behavior. Is it possible to design mechanisms which are simultaneously optimal against a multitude of possible buyer behaviors? We answer this question by designing a simple state-based mechanism that is simultaneously approximately optimal against a $k$-lookahead buyer for all $k$, a buyer who is a no-regret learner, and a buyer who is a policy-regret learner. Against each type of buyer our mechanism attains a constant fraction of the optimal revenue attainable against that type of buyer. We complement our positive results with almost tight impossibility results, showing that the revenue approximation tradeoffs achieved by our mechanism for different lookahead attitudes are near-optimal.
• ### Coresets Meet EDCS: Algorithms for Matching and Vertex Cover on Massive Graphs(1711.03076)

Feb. 27, 2018 cs.DC, cs.DS
Randomized composable coresets were introduced recently as an effective technique for solving matching and vertex cover problems in various models of computation. In this technique, one partitions the edges of a graph randomly into multiple pieces, compresses each piece into a smaller subgraph, namely a coreset, and solves the problem on the union of these coresets to find the solution. By designing small size randomized coresets, one can obtain efficient algorithms, in a black-box way, in multiple computational models including streaming, distributed communication, and the massively parallel computation (MPC) model. We develop randomized coresets of size $\widetilde{O}(n)$ that for any constant $\varepsilon > 0$, give a $(3/2+\varepsilon)$-approximation to matching and a $(3+\varepsilon)$-approximation to vertex cover. Our coresets improve upon the previously best approximation ratio of $O(1)$ for matching and $O(\log{n})$ for vertex cover. Our result for matching goes beyond a 2-approximation, which is a natural barrier for maximum matching in many models of computation. We further build on our coresets to achieve a $(1+\varepsilon)$-approximation to matching and $O(1)$-approximation to vertex cover in the MPC model with only $O(n/poly\log{(n)})$ memory per machine and $O(\log\log{(n)})$ MPC rounds. Our algorithm for vertex cover is the first $O(1)$-approximation MPC algorithm with $o(\log{n})$ rounds with $O(n)$ memory per machine and our matching algorithm improves upon the state-of-the-art $O((\log\log{n})^2)$ round algorithm of Czumaj et. al. (STOC 2018). A key technical ingredient of our paper is a novel application of edge degree constrained subgraphs (EDCS). At the heart of our proofs are new structural properties of EDCS that identify these subgraphs as sparse certificates for matchings and vertex covers which are quite robust to sampling and composition.
• ### ASYMP: Fault-tolerant Mining of Massive Graphs(1712.09731)

Dec. 28, 2017 cs.DC, cs.SI
We present ASYMP, a distributed graph processing system developed for the timely analysis of graphs with trillions of edges. ASYMP has several distinguishing features including a robust fault tolerance mechanism, a lockless architecture which scales seamlessly to thousands of machines, and efficient data access patterns to reduce per-machine overhead. ASYMP is used to analyze the largest graphs at Google, and the graphs we consider in our empirical evaluation here are, to the best of our knowledge, the largest considered in the literature. Our experimental results show that compared to previous graph processing frameworks at Google, ASYMP can scale to larger graphs, operate on more crowded clusters, and complete real-world graph mining analytic tasks faster. First, we evaluate the speed of ASYMP, where we show that across a diverse selection of graphs, it runs Connected Component 3-50x faster than state of the art implementations in MapReduce and Pregel. Then we demonstrate the scalability and parallelism of this framework: first by showing that the running time increases linearly by increasing the size of the graphs (without changing the number of machines), and then by showing the gains in running time while increasing the number of machines. Finally, we demonstrate the fault-tolerance properties for the framework, showing that inducing 50% of our machines to fail increases the running time by only 41%.
• ### Consistent Hashing with Bounded Loads(1608.01350)

July 27, 2017 cs.DS
Designing algorithms for balanced allocation of clients to servers in dynamic settings is a challenging problem for a variety of reasons. Both servers and clients may be added and/or removed from the system periodically, and the main objectives of allocation algorithms are: the uniformity of the allocation, and the number of moves after adding or removing a server or a client. The most popular solution for our dynamic settings is Consistent Hashing. However, the load balancing of consistent hashing is no better than a random assignment of clients to servers, so with $n$ of each, we expect many servers to be overloaded with $\Theta(\log n/ \log\log n)$ clients. In this paper, with $n$ clients and $n$ servers, we get a guaranteed max-load of 2 while only moving an expected constant number of clients for each update. We take an arbitrary user specified balancing parameter $c=1+\epsilon>1$. With $m$ balls and $n$ bins in the system, we want no load above $\lceil cm/n\rceil$. Meanwhile we want to bound the expected number of balls that have to be moved when a ball or server is added or removed. Compared with general lower bounds without capacity constraints, we show that in our algorithm when a ball or bin is inserted or deleted, the expected number of balls that have to be moved is increased only by a multiplicative factor $O({1\over \epsilon^2})$ for $\epsilon \le 1$ (Theorem 4) and by a factor $1+O(\frac{\log c}c)$ for $\epsilon\ge 1$ (Theorem 3). Technically, the latter bound is the most challenging to prove. It implies that we for superconstant $c$ only pay a negligible cost in extra moves. We also get the same bounds for the simpler problem where we instead of a user specified balancing parameter have a fixed bin capacity $C$ for all bins.
• ### Matroids Hitting Sets and Unsupervised Dependency Grammar Induction(1705.08992)

July 15, 2017 cs.DM, cs.DS, cs.CL
This paper formulates a novel problem on graphs: find the minimal subset of edges in a fully connected graph, such that the resulting graph contains all spanning trees for a set of specifed sub-graphs. This formulation is motivated by an un-supervised grammar induction problem from computational linguistics. We present a reduction to some known problems and algorithms from graph theory, provide computational complexity results, and describe an approximation algorithm.
• ### Overcommitment in Cloud Services -- Bin packing with Chance Constraints(1705.09335)

May 25, 2017 cs.DS
This paper considers a traditional problem of resource allocation, scheduling jobs on machines. One such recent application is cloud computing, where jobs arrive in an online fashion with capacity requirements and need to be immediately scheduled on physical machines in data centers. It is often observed that the requested capacities are not fully utilized, hence offering an opportunity to employ an overcommitment policy, i.e., selling resources beyond capacity. Setting the right overcommitment level can induce a significant cost reduction for the cloud provider, while only inducing a very low risk of violating capacity constraints. We introduce and study a model that quantifies the value of overcommitment by modeling the problem as a bin packing with chance constraints. We then propose an alternative formulation that transforms each chance constraint into a submodular function. We show that our model captures the risk pooling effect and can guide scheduling and overcommitment decisions. We also develop a family of online algorithms that are intuitive, easy to implement and provide a constant factor guarantee from optimal. Finally, we calibrate our model using realistic workload data, and test our approach in a practical setting. Our analysis and experiments illustrate the benefit of overcommitment in cloud services, and suggest a cost reduction of 1.5% to 17% depending on the provider's risk tolerance.
• ### Distributed Coverage Maximization via Sketching(1612.02327)

March 10, 2017 cs.DS
Coverage problems are central in optimization and have a wide range of applications in data mining and machine learning. While several distributed algorithms have been developed for coverage problems, the existing methods suffer from several limitations, e.g., they all achieve either suboptimal approximation guarantees or suboptimal space and memory complexities. In addition, previous algorithms developed for submodular maximization assume oracle access, and ignore the computational complexity of communicating large subsets or computing the size of the union of the subsets in this subfamily. In this paper, we develop an improved distributed algorithm for the $k$-cover and the set cover with $\lambda$ outliers problems, with almost optimal approximation guarantees, almost optimal memory complexity, and linear communication complexity running in only four rounds of computation. Finally, we perform an extensive empirical study of our algorithms on a number of publicly available real data sets, and show that using sketches of size $30$ to $600$ times smaller than the input, one can solve the coverage maximization problem with quality very close to that of the state-of-the-art single-machine algorithm.
• ### Almost Optimal Streaming Algorithms for Coverage Problems(1610.08096)

March 10, 2017 cs.DS
Maximum coverage and minimum set cover problems --collectively called coverage problems-- have been studied extensively in streaming models. However, previous research not only achieve sub-optimal approximation factors and space complexities, but also study a restricted set arrival model which makes an explicit or implicit assumption on oracle access to the sets, ignoring the complexity of reading and storing the whole set at once. In this paper, we address the above shortcomings, and present algorithms with improved approximation factor and improved space complexity, and prove that our results are almost tight. Moreover, unlike most of previous work, our results hold on a more general edge arrival model. More specifically, we present (almost) optimal approximation algorithms for maximum coverage and minimum set cover problems in the streaming model with an (almost) optimal space complexity of $\tilde{O}(n)$, i.e., the space is {\em independent of the size of the sets or the size of the ground set of elements}. These results not only improve over the best known algorithms for the set arrival model, but also are the first such algorithms for the more powerful {\em edge arrival} model. In order to achieve the above results, we introduce a new general sketching technique for coverage functions: This sketching scheme can be applied to convert an $\alpha$-approximation algorithm for a coverage problem to a $(1-\eps)\alpha$-approximation algorithm for the same problem in streaming, or RAM models. We show the significance of our sketching technique by ruling out the possibility of solving coverage problems via accessing (as a black box) a $(1 \pm \eps)$-approximate oracle (e.g., a sketch function) that estimates the coverage function on any subfamily of the sets.
• ### Optimal dynamic mechanisms with ex-post IR via bank accounts(1605.08840)

July 6, 2016 cs.GT
Lately, the problem of designing multi-stage dynamic mechanisms has been shown to be both theoretically challenging and practically important. In this paper, we consider the problem of designing revenue optimal dynamic mechanism for a setting where an auctioneer sells a set of items to a buyer in multiple stages. At each stage, there could be multiple items for sale but each item can only appear in one stage. The type of the buyer at each stage is thus a multi-dimensional vector characterizing the buyer's valuations of the items at that stage and is assumed to be stage-wise independent. In particular, we propose a novel class of mechanisms called bank account mechanisms. Roughly, a bank account mechanism is no different from any stage-wise individual mechanism except for an augmented structure called bank account, a real number for each node that summarizes the history so far. We first establish that the optimal revenue from any dynamic mechanism in this setting can be achieved by a bank account mechanism, and we provide a simple characterization of the set of incentive compatible and ex-post individually rational bank account mechanisms. Based on these characterizations, we then investigate the problem of finding the (approximately) optimal bank account mechanisms. We prove that there exists a simple, randomized bank account mechanism that approximates optimal revenue up to a constant factor. Our result is general and can accommodate previous approximation results in single-shot multi-dimensional mechanism design. Based on the previous mechanism, we further show that there exists a deterministic bank account mechanism that achieves constant-factor approximation as well. Finally, we consider the problem of computing optimal mechanisms when the type space is discrete and provide an FPTAS via linear and dynamic programming.
• ### Greedy Column Subset Selection: New Bounds and Distributed Algorithms(1605.08795)

June 11, 2016 cs.DS
The problem of column subset selection has recently attracted a large body of research, with feature selection serving as one obvious and important application. Among the techniques that have been applied to solve this problem, the greedy algorithm has been shown to be quite effective in practice. However, theoretical guarantees on its performance have not been explored thoroughly, especially in a distributed setting. In this paper, we study the greedy algorithm for the column subset selection problem from a theoretical and empirical perspective and show its effectiveness in a distributed setting. In particular, we provide an improved approximation guarantee for the greedy algorithm which we show is tight up to a constant factor, and present the first distributed implementation with provable approximation factors. We use the idea of randomized composable core-sets, developed recently in the context of submodular maximization. Finally, we validate the effectiveness of this distributed algorithm via an empirical study.
• ### Tight Bounds for Approximate Carath\'eodory and Beyond(1512.08602)

Dec. 29, 2015 cs.DS, math.OC, cs.LG
We give a deterministic nearly-linear time algorithm for approximating any point inside a convex polytope with a sparse convex combination of the polytope's vertices. Our result provides a constructive proof for the Approximate Carath\'{e}odory Problem, which states that any point inside a polytope contained in the $\ell_p$ ball of radius $D$ can be approximated to within $\epsilon$ in $\ell_p$ norm by a convex combination of only $O\left(D^2 p/\epsilon^2\right)$ vertices of the polytope for $p \geq 2$. We also show that this bound is tight, using an argument based on anti-concentration for the binomial distribution. Along the way of establishing the upper bound, we develop a technique for minimizing norms over convex sets with complicated geometry; this is achieved by running Mirror Descent on a dual convex function obtained via Sion's Theorem. As simple extensions of our method, we then provide new algorithms for submodular function minimization and SVM training. For submodular function minimization we obtain a simplification and (provable) speed-up over Wolfe's algorithm, the method commonly found to be the fastest in practice. For SVM training, we obtain $O(1/\epsilon^2)$ convergence for arbitrary kernels; each iteration only requires matrix-vector operations involving the kernel matrix, so we overcome the obstacle of having to explicitly store the kernel or compute its Cholesky factorization.
• ### Distributed Balanced Partitioning via Linear Embedding(1512.02727)

Dec. 9, 2015 cs.DC
Balanced partitioning is often a crucial first step in solving large-scale graph optimization problems, e.g., in some cases, a big graph can be chopped into pieces that fit on one machine to be processed independently before stitching the results together. In other cases, links between different parts may show up in the running time and/or network communications cost. We study a distributed balanced partitioning problem where the goal is to partition the vertices of a given graph into k pieces so as to minimize the total cut size. Our algorithm is composed of a few steps that are easily implementable in distributed computation frameworks. The algorithm first embeds nodes of the graph onto a line, and then processes nodes in a distributed manner guided by the linear embedding order. We examine various ways to find the first embedding, e.g., via a hierarchical clustering or Hilbert curves. Then we apply four different techniques including local swaps, minimum cuts on the boundaries of partitions, as well as contraction and dynamic programming. As our empirical study, we compare the above techniques with each other, and also to previous work in distributed graph algorithms, e.g., a label propagation method, FENNEL and Spinner. We report our results both on a private map graph and several public social networks, and show that our results beat previous distributed algorithms: e.g., compared to the label propagation algorithm, we report an improvement of 15-25% in the cut value. We also observe that our algorithms allow for scalable distributed implementation for any number of partitions. Finally, we apply our techniques for the Google Maps Driving Directions to minimize the number of multi-shard queries with the goal of saving in CPU usage. During live experiments, we observe an ~40% drop in the number of multi-shard queries when comparing our method with a standard geography-based method.
• ### Expanders via Local Edge Flips(1510.07768)

Oct. 27, 2015 math.PR, cs.DC, cs.NI, cs.DS
Designing distributed and scalable algorithms to improve network connectivity is a central topic in peer-to-peer networks. In this paper we focus on the following well-known problem: given an $n$-node $d$-regular network for $d=\Omega(\log n)$, we want to design a decentralized, local algorithm that transforms the graph into one that has good connectivity properties (low diameter, expansion, etc.) without affecting the sparsity of the graph. To this end, Mahlmann and Schindelhauer introduced the random "flip" transformation, where in each time step, a random pair of vertices that have an edge decide to swap a neighbor'. They conjectured that performing $O(n d)$ such flips at random would convert any connected $d$-regular graph into a $d$-regular expander graph, with high probability. However, the best known upper bound for the number of steps is roughly $O(n^{17} d^{23})$, obtained via a delicate Markov chain comparison argument. Our main result is to prove that a natural instantiation of the random flip produces an expander in at most $O(n^2 d^2 \sqrt{\log n})$ steps, with high probability. Our argument uses a potential-function analysis based on the matrix exponential, together with the recent beautiful results on the higher-order Cheeger inequality of graphs. We also show that our technique can be used to analyze another well-studied random process known as the random switch', and show that it produces an expander in $O(n d)$ steps with high probability.
• ### Randomized Composable Core-sets for Distributed Submodular Maximization(1506.06715)

June 22, 2015 cs.DC, cs.DS
An effective technique for solving optimization problems over massive data sets is to partition the data into smaller pieces, solve the problem on each piece and compute a representative solution from it, and finally obtain a solution inside the union of the representative solutions for all pieces. This technique can be captured via the concept of {\em composable core-sets}, and has been recently applied to solve diversity maximization problems as well as several clustering problems. However, for coverage and submodular maximization problems, impossibility bounds are known for this technique \cite{IMMM14}. In this paper, we focus on efficient construction of a randomized variant of composable core-sets where the above idea is applied on a {\em random clustering} of the data. We employ this technique for the coverage, monotone and non-monotone submodular maximization problems. Our results significantly improve upon the hardness results for non-randomized core-sets, and imply improved results for submodular maximization in a distributed and streaming settings. In summary, we show that a simple greedy algorithm results in a $1/3$-approximate randomized composable core-set for submodular maximization under a cardinality constraint. This is in contrast to a known $O({\log k\over \sqrt{k}})$ impossibility result for (non-randomized) composable core-set. Our result also extends to non-monotone submodular functions, and leads to the first 2-round MapReduce-based constant-factor approximation algorithm with $O(n)$ total communication complexity for either monotone or non-monotone functions. Finally, using an improved analysis technique and a new algorithm $\mathsf{PseudoGreedy}$, we present an improved $0.545$-approximation algorithm for monotone submodular maximization, which is in turn the first MapReduce-based algorithm beating factor $1/2$ in a constant number of rounds.
• ### Multiplicative Bidding in Online Advertising(1404.6727)

April 27, 2014 cs.DS
In this paper, we initiate the study of the multiplicative bidding language adopted by major Internet search companies. In multiplicative bidding, the effective bid on a particular search auction is the product of a base bid and bid adjustments that are dependent on features of the search (for example, the geographic location of the user, or the platform on which the search is conducted). We consider the task faced by the advertiser when setting these bid adjustments, and establish a foundational optimization problem that captures the core difficulty of bidding under this language. We give matching algorithmic and approximation hardness results for this problem; these results are against an information-theoretic bound, and thus have implications on the power of the multiplicative bidding language itself. Inspired by empirical studies of search engine price data, we then codify the relevant restrictions of the problem, and give further algorithmic and hardness results. Our main technical contribution is an $O(\log n)$-approximation for the case of multiplicative prices and monotone values. We also provide empirical validations of our problem restrictions, and test our algorithms on real data against natural benchmarks. Our experiments show that they perform favorably compared with the baseline.
• ### Clinching Auctions Beyond Hard Budget Constraints(1404.5000)

April 20, 2014 cs.GT
Constraints on agent's ability to pay play a major role in auction design for any setting where the magnitude of financial transactions is sufficiently large. Those constraints have been traditionally modeled in mechanism design as \emph{hard budget}, i.e., mechanism is not allowed to charge agents more than a certain amount. Yet, real auction systems (such as Google AdWords) allow more sophisticated constraints on agents' ability to pay, such as \emph{average budgets}. In this work, we investigate the design of Pareto optimal and incentive compatible auctions for agents with \emph{constrained quasi-linear utilities}, which captures more realistic models of liquidity constraints that the agents may have. Our result applies to a very general class of allocation constraints known as polymatroidal environments, encompassing many settings of interest such as multi-unit auctions, matching markets, video-on-demand and advertisement systems. Our design is based Ausubel's \emph{clinching framework}. Incentive compatibility and feasibility with respect to ability-to-pay constraints are direct consequences of the clinching framework. Pareto-optimality, on the other hand, is considerably more challenging, since the no-trade condition that characterizes it depends not only on whether agents have their budgets exhausted or not, but also on prices {at} which the goods are allocated. In order to get a handle on those prices, we introduce novel concepts of dropping prices and saturation. These concepts lead to our main structural result which is a characterization of the tight sets in the clinching auction outcome and its relation to dropping prices.
• ### Local Graph Clustering Beyond Cheeger's Inequality(1304.8132)

Nov. 7, 2013 cs.DS, cs.LG, stat.ML
Motivated by applications of large-scale graph clustering, we study random-walk-based LOCAL algorithms whose running times depend only on the size of the output cluster, rather than the entire graph. All previously known such algorithms guarantee an output conductance of $\tilde{O}(\sqrt{\phi(A)})$ when the target set $A$ has conductance $\phi(A)\in[0,1]$. In this paper, we improve it to $$\tilde{O}\bigg( \min\Big\{\sqrt{\phi(A)}, \frac{\phi(A)}{\sqrt{\mathsf{Conn}(A)}} \Big\} \bigg)\enspace,$$ where the internal connectivity parameter $\mathsf{Conn}(A) \in [0,1]$ is defined as the reciprocal of the mixing time of the random walk over the induced subgraph on $A$. For instance, using $\mathsf{Conn}(A) = \Omega(\lambda(A) / \log n)$ where $\lambda$ is the second eigenvalue of the Laplacian of the induced subgraph on $A$, our conductance guarantee can be as good as $\tilde{O}(\phi(A)/\sqrt{\lambda(A)})$. This builds an interesting connection to the recent advance of the so-called improved Cheeger's Inequality [KKL+13], which says that global spectral algorithms can provide a conductance guarantee of $O(\phi_{\mathsf{opt}}/\sqrt{\lambda_3})$ instead of $O(\sqrt{\phi_{\mathsf{opt}}})$. In addition, we provide theoretical guarantee on the clustering accuracy (in terms of precision and recall) of the output set. We also prove that our analysis is tight, and perform empirical evaluation to support our theory on both synthetic and real data. It is worth noting that, our analysis outperforms prior work when the cluster is well-connected. In fact, the better it is well-connected inside, the more significant improvement (both in terms of conductance and accuracy) we can obtain. Our results shed light on why in practice some random-walk-based algorithms perform better than its previous theory, and help guide future research about local clustering.
• ### Clinching Auctions with Online Supply(1210.1456)

Oct. 4, 2012 cs.GT
Auctions for perishable goods such as internet ad inventory need to make real-time allocation and pricing decisions as the supply of the good arrives in an online manner, without knowing the entire supply in advance. These allocation and pricing decisions get complicated when buyers have some global constraints. In this work, we consider a multi-unit model where buyers have global {\em budget} constraints, and the supply arrives in an online manner. Our main contribution is to show that for this setting there is an individually-rational, incentive-compatible and Pareto-optimal auction that allocates these units and calculates prices on the fly, without knowledge of the total supply. We do so by showing that the Adaptive Clinching Auction satisfies a {\em supply-monotonicity} property. We also analyze and discuss, using examples, how the insights gained by the allocation and payment rule can be applied to design better ad allocation heuristics in practice. Finally, while our main technical result concerns multi-unit supply, we propose a formal model of online supply that captures scenarios beyond multi-unit supply and has applications to sponsored search. We conjecture that our results for multi-unit auctions can be extended to these more general models.
• ### Yield Optimization of Display Advertising with Ad Exchange(1102.2551)

Sept. 21, 2012 cs.DS, math.OC, cs.SY
Lookahead search is perhaps the most natural and widely used game playing strategy. Given the practical importance of the method, the aim of this paper is to provide a theoretical performance examination of lookahead search in a wide variety of applications. To determine a strategy play using lookahead search}, each agent predicts multiple levels of possible re-actions to her move (via the use of a search tree), and then chooses the play that optimizes her future payoff accounting for these re-actions. There are several choices of optimization function the agents can choose, where the most appropriate choice of function will depend on the specifics of the actual game - we illustrate this in our examples. Furthermore, the type of search tree chosen by computationally-constrained agent can vary. We focus on the case where agents can evaluate only a bounded number, $k$, of moves into the future. That is, we use depth $k$ search trees and call this approach {\em k-lookahead search}. We apply our method in five well-known settings: AdWord auctions; industrial organization (Cournot's model); congestion games; valid-utility games and basic-utility games; cost-sharing network design games. We consider two questions. First, what is the expected social quality of outcome when agents apply lookahead search? Second, what interactive behaviours can be exhibited when players use lookahead search?
The spread of influence in social networks is studied in two main categories: the progressive model and the non-progressive model (see e.g. the seminal work of Kempe, Kleinberg, and Tardos in KDD 2003). While the progressive models are suitable for modeling the spread of influence in monopolistic settings, non-progressive are more appropriate for modeling non-monopolistic settings, e.g., modeling diffusion of two competing technologies over a social network. Despite the extensive work on the progressive model, non-progressive models have not been studied well. In this paper, we study the spread of influence in the non-progressive model under the strict majority threshold: given a graph $G$ with a set of initially infected nodes, each node gets infected at time $\tau$ iff a majority of its neighbors are infected at time $\tau-1$. Our goal in the \textit{MinPTS} problem is to find a minimum-cardinality initial set of infected nodes that would eventually converge to a steady state where all nodes of $G$ are infected. We prove that while the MinPTS is NP-hard for a restricted family of graphs, it admits an improved constant-factor approximation algorithm for power-law graphs. We do so by proving lower and upper bounds in terms of the minimum and maximum degree of nodes in the graph. The upper bound is achieved in turn by applying a natural greedy algorithm. Our experimental evaluation of the greedy algorithm also shows its superior performance compared to other algorithms for a set of real-world graphs as well as the random power-law graphs. Finally, we study the convergence properties of these algorithms and show that the non-progressive model converges in at most $O(|E(G)|)$ steps.