• ### High-Dimensional Bayesian Optimization via Additive Models with Overlapping Groups(1802.07028)

March 28, 2018 cs.LG, stat.ML
Bayesian optimization (BO) is a popular technique for sequential black-box function optimization, with applications including parameter tuning, robotics, environmental monitoring, and more. One of the most important challenges in BO is the development of algorithms that scale to high dimensions, which remains a key open problem despite recent progress. In this paper, we consider the approach of Kandasamy et al. (2015), in which the high-dimensional function decomposes as a sum of lower-dimensional functions on subsets of the underlying variables. In particular, we significantly generalize this approach by lifting the assumption that the subsets are disjoint, and consider additive models with arbitrary overlap among the subsets. By representing the dependencies via a graph, we deduce an efficient message passing algorithm for optimizing the acquisition function. In addition, we provide an algorithm for learning the graph from samples based on Gibbs sampling. We empirically demonstrate the effectiveness of our methods on both synthetic and real-world data.
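To make the overlapping-groups idea concrete, here is a minimal sketch, not the paper's algorithm: it assumes the simplest overlapping structure (a chain, where consecutive terms share one variable), a small discrete domain, and a known objective rather than an acquisition function. The function name `maximize_chain` and the example terms are hypothetical.

```python
def maximize_chain(pairwise, domain):
    """Maximize sum_i pairwise[i](x[i], x[i+1]) over x in domain^(len(pairwise)+1)
    using max-sum message passing along a chain (illustrative assumption)."""
    # msg[v]: best total of all terms to the right of a variable fixed at value v
    msg = {v: 0.0 for v in domain}
    back = []  # after reversal, back[i][v] is the best x[i+1] given x[i] = v
    for f in reversed(pairwise):
        new_msg, choice = {}, {}
        for v in domain:
            best = max(domain, key=lambda w: f(v, w) + msg[w])
            new_msg[v] = f(v, best) + msg[best]
            choice[v] = best
        msg = new_msg
        back.append(choice)
    back.reverse()
    # Pick the best value of the first variable, then follow the stored choices
    x0 = max(domain, key=lambda v: msg[v])
    x = [x0]
    for choice in back:
        x.append(choice[x[-1]])
    return x, msg[x0]


# Hypothetical example: f(x) = f1(x1, x2) + f2(x2, x3), where x2 is shared
domain = [0, 1, 2, 3]
terms = [
    lambda a, b: -(a - 1) ** 2 - (a - b) ** 2,
    lambda b, c: -(b - 3) ** 2 - (b - c) ** 2,
]
best_x, best_val = maximize_chain(terms, domain)
```

The cost grows linearly in the number of terms and quadratically in the domain size, whereas exhaustive search grows exponentially in the number of variables.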
• ### Robust Maximization of Non-Submodular Objectives(1802.07073)

March 14, 2018 cs.AI, cs.DS, cs.LG, stat.ML
We study the problem of maximizing a monotone set function subject to a cardinality constraint $k$ in the setting where some number of elements $\tau$ is deleted from the returned set. The focus of this work is on the worst-case adversarial setting. While there exist constant-factor guarantees when the function is submodular, there are no guarantees for non-submodular objectives. In this work, we present a new algorithm, Oblivious-Greedy, and prove the first constant-factor approximation guarantees for a wider class of non-submodular objectives. The obtained theoretical bounds are the first constant-factor bounds that also hold in the linear regime, i.e., when the number of deletions $\tau$ is linear in $k$. Our bounds depend on established parameters such as the submodularity ratio and on novel ones such as the inverse curvature. We bound these parameters for two important objectives: support selection and variance reduction. Finally, we numerically demonstrate the robust performance of Oblivious-Greedy for these two objectives on various datasets.
• ### Robust Submodular Maximization: A Non-Uniform Partitioning Approach(1706.04918)

June 15, 2017 cs.LG, stat.ML
We study the problem of maximizing a monotone submodular function subject to a cardinality constraint $k$, with the added twist that a number of items $\tau$ from the returned set may be removed. We focus on the worst-case setting considered in (Orlin et al., 2016), in which a constant-factor approximation guarantee was given for $\tau = o(\sqrt{k})$. In this paper, we solve a key open problem raised therein, presenting a new Partitioned Robust (PRo) submodular maximization algorithm that achieves the same guarantee for more general $\tau = o(k)$. Our algorithm constructs partitions consisting of buckets with exponentially increasing sizes, and applies standard submodular optimization subroutines on the buckets in order to construct the robust solution. We numerically evaluate PRo in data summarization and influence maximization, demonstrating gains over both the greedy algorithm and the algorithm of (Orlin et al., 2016).
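The robust setting can be illustrated with a small sketch. It shows only the standard greedy baseline and a brute-force evaluation of the worst-case deletion, not PRo itself; the coverage objective, the function names, and the toy data are assumptions chosen for illustration.

```python
from itertools import combinations


def coverage(sets, chosen):
    """Monotone submodular objective: number of elements covered by the chosen sets."""
    covered = set()
    for i in chosen:
        covered |= sets[i]
    return len(covered)


def greedy(sets, k):
    """Standard greedy: repeatedly add the set that maximizes the objective."""
    chosen = []
    for _ in range(k):
        rest = [i for i in range(len(sets)) if i not in chosen]
        if not rest:
            break
        chosen.append(max(rest, key=lambda i: coverage(sets, chosen + [i])))
    return chosen


def worst_case_value(sets, chosen, tau):
    """Objective value after an adversary removes the most damaging tau items."""
    return min(
        coverage(sets, [i for i in chosen if i not in removed])
        for removed in combinations(chosen, tau)
    )


# Hypothetical toy instance
sets = [{1, 2, 3, 4}, {1, 2}, {3, 4}, {5}]
solution = greedy(sets, k=2)                    # [0, 3], covering 5 elements
print(worst_case_value(sets, solution, tau=1))  # 1: removing set 0 leaves only {5}
print(worst_case_value(sets, [1, 2], tau=1))    # 2: a more balanced pick degrades less
```

On this instance the greedy solution has the higher nominal value but the lower worst-case value, which is the kind of gap that robust algorithms aim to close.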
• ### An Efficient Streaming Algorithm for the Submodular Cover Problem(1611.08574)

Nov. 25, 2016 cs.DS
We initiate the study of the classical Submodular Cover (SC) problem in the data streaming model, which we refer to as Streaming Submodular Cover (SSC). We show that any single-pass streaming algorithm using memory sublinear in the size of the stream will fail to provide any non-trivial approximation guarantees for SSC. Hence, we consider a relaxed version of SSC, where we only seek to find a partial cover. We design the first Efficient bicriteria Submodular Cover Streaming (ESC-Streaming) algorithm for this problem, and provide theoretical guarantees for its performance supported by numerical evidence. Our algorithm finds solutions that are competitive with the near-optimal offline greedy algorithm despite requiring only a single pass over the data stream. In our numerical experiments, we evaluate the performance of ESC-Streaming on active set selection and large-scale graph cover problems.
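For reference, the offline greedy baseline for a partial cover can be sketched as follows. This is not ESC-Streaming: it assumes all sets are available in memory, uses set coverage as the submodular function, and the name `greedy_partial_cover` and the parameter `eps` (the fraction of the target that may be left uncovered) are hypothetical.

```python
def greedy_partial_cover(sets, target, eps=0.0):
    """Pick sets greedily until at least (1 - eps) * target elements are covered."""
    covered, chosen = set(), []
    while len(covered) < (1 - eps) * target:
        # Choose the set with the largest marginal gain in coverage
        best = max(range(len(sets)), key=lambda i: len(sets[i] - covered))
        if not sets[best] - covered:
            break  # no set adds anything new, so the target is unreachable
        chosen.append(best)
        covered |= sets[best]
    return chosen, len(covered)


# Hypothetical toy instance over the universe {1, ..., 6}
sets = [{1, 2, 3}, {3, 4}, {5}, {6}, {1, 6}]
full = greedy_partial_cover(sets, target=6)              # needs four sets
partial = greedy_partial_cover(sets, target=6, eps=0.5)  # one set already suffices
```

Relaxing the coverage requirement can shrink the solution considerably, which is what makes the bicriteria formulation attractive.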
• ### Truncated Variance Reduction: A Unified Approach to Bayesian Optimization and Level-Set Estimation(1610.07379)

Oct. 24, 2016 cs.IT, math.IT, cs.LG, stat.ML
We present a new algorithm, truncated variance reduction (TruVaR), that treats Bayesian optimization (BO) and level-set estimation (LSE) with Gaussian processes in a unified fashion. The algorithm greedily shrinks a sum of truncated variances within a set of potential maximizers (BO) or unclassified points (LSE), which is updated based on confidence bounds. TruVaR is effective in several important settings that are typically non-trivial to incorporate into myopic algorithms, including pointwise costs and heteroscedastic noise. We provide a general theoretical guarantee for TruVaR covering these aspects, and use it to recover and strengthen existing results on BO and LSE. Moreover, we provide a new result for a setting where one can select from a number of noise levels having associated costs. We demonstrate the effectiveness of the algorithm on both synthetic and real-world data sets.
• ### Learning-based Compressive Subsampling(1510.06188)

March 28, 2016 cs.IT, math.IT, cs.LG, stat.ML
The problem of recovering a structured signal $\mathbf{x} \in \mathbb{C}^p$ from a set of dimensionality-reduced linear measurements $\mathbf{b} = \mathbf{A}\mathbf{x}$ arises in a variety of applications, such as medical imaging, spectroscopy, Fourier optics, and computerized tomography. Due to computational and storage complexity or physical constraints imposed by the problem, the measurement matrix $\mathbf{A} \in \mathbb{C}^{n \times p}$ is often of the form $\mathbf{A} = \mathbf{P}_{\Omega}\boldsymbol{\Psi}$ for some orthonormal basis matrix $\boldsymbol{\Psi}\in \mathbb{C}^{p \times p}$ and subsampling operator $\mathbf{P}_{\Omega}: \mathbb{C}^{p} \rightarrow \mathbb{C}^{n}$ that selects the rows indexed by $\Omega$. This raises the fundamental question of how best to choose the index set $\Omega$ in order to optimize the recovery performance. Previous approaches to addressing this question rely on non-uniform *random* subsampling using application-specific knowledge of the structure of $\mathbf{x}$. In this paper, we instead take a principled learning-based approach in which a *fixed* index set is chosen based on a set of training signals $\mathbf{x}_1,\dotsc,\mathbf{x}_m$. We formulate combinatorial optimization problems seeking to maximize the energy captured in these signals in an average-case or worst-case sense, and we show that these can be efficiently solved either exactly or approximately via the identification of modularity and submodularity structures. We provide both deterministic and statistical theoretical guarantees showing how the resulting measurement matrices perform on signals differing from the training signals, and we provide numerical examples showing our approach to be effective on a variety of data sets.
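As an illustration of the modular case, consider the average-case objective written as $\sum_{j=1}^{m} \|\mathbf{P}_{\Omega}\boldsymbol{\Psi}\mathbf{x}_j\|_2^2 = \sum_{i \in \Omega} \sum_{j=1}^{m} |(\boldsymbol{\Psi}\mathbf{x}_j)_i|^2$. Under this reading, each index contributes independently, so keeping the $n$ indices with the largest total energy is optimal. The sketch below assumes the transform coefficients $\boldsymbol{\Psi}\mathbf{x}_j$ have already been computed; `select_indices` and the toy data are hypothetical.

```python
def select_indices(coeffs, n):
    """Return the n indices with the largest total energy across training signals.

    coeffs[j][i] is assumed to hold the i-th transform coefficient of training
    signal j, i.e. (Psi x_j)_i, computed beforehand.
    """
    p = len(coeffs[0])
    # Total energy captured by each index, summed over all training signals
    energy = [sum(abs(c[i]) ** 2 for c in coeffs) for i in range(p)]
    order = sorted(range(p), key=lambda i: energy[i], reverse=True)
    return sorted(order[:n])


# Hypothetical coefficients for m = 2 training signals with p = 4
coeffs = [
    [3, 0, 2j, 0],
    [2, 1, 0, 0],
]
omega = select_indices(coeffs, n=2)  # per-index energies are 13, 1, 4, 0
```

The worst-case variant does not decompose this way, which is presumably where the submodularity-based approximations mentioned in the abstract come in.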
• ### Time-Varying Gaussian Process Bandit Optimization(1601.06650)

Jan. 25, 2016 cs.LG, stat.ML
We consider the sequential Bayesian optimization problem with bandit feedback, adopting a formulation that allows for the reward function to vary with time. We model the reward function using a Gaussian process whose evolution obeys a simple Markov model. We introduce two natural extensions of the classical Gaussian process upper confidence bound (GP-UCB) algorithm. The first, R-GP-UCB, resets GP-UCB at regular intervals. The second, TV-GP-UCB, instead forgets about old data in a smooth fashion. Our main contribution consists of novel regret bounds for these algorithms, providing an explicit characterization of the trade-off between the time horizon and the rate at which the function varies. We illustrate the performance of the algorithms on both synthetic and real data, and we find the gradual forgetting of TV-GP-UCB to perform favorably compared to the sharp resetting of R-GP-UCB. Moreover, both algorithms significantly outperform classical GP-UCB, which treats stale and fresh data equally.
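One simple way to picture smooth forgetting is to multiply a spatial kernel by a factor that decays with the time gap between two observations. The sketch below is an assumption made for illustration, not a statement of the algorithm from the abstract: it uses a one-dimensional RBF kernel, a decay of $(1-\epsilon)^{|t-t'|/2}$, and the hypothetical names `rbf` and `time_decayed_kernel`.

```python
import math


def rbf(x, y, lengthscale=1.0):
    """Squared-exponential kernel on scalars (assumed spatial kernel)."""
    return math.exp(-((x - y) ** 2) / (2 * lengthscale ** 2))


def time_decayed_kernel(xs, eps):
    """Kernel matrix with K[i][j] = rbf(x_i, x_j) * (1 - eps) ** (|i - j| / 2).

    xs[i] is the point queried at time step i; eps in [0, 1] is an assumed
    forgetting rate, with eps = 0 recovering the time-invariant kernel.
    """
    T = len(xs)
    return [
        [rbf(xs[i], xs[j]) * (1 - eps) ** (abs(i - j) / 2) for j in range(T)]
        for i in range(T)
    ]


# Querying the same point three times: correlation decays only through time
K = time_decayed_kernel([0.0, 0.0, 0.0], eps=0.19)
```

With this weighting, old observations are never discarded outright; they simply carry less information about the current function, in contrast to a hard reset.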
• ### Near-Optimally Teaching the Crowd to Classify(1402.2092)

March 7, 2014 cs.LG
How should we present training examples to learners to teach them classification rules? This is a natural problem when training workers for crowdsourcing labeling tasks, and is also motivated by challenges in data-driven online education. We propose a natural stochastic model of the learners, modeling them as randomly switching among hypotheses based on observed feedback. We then develop STRICT, an efficient algorithm for selecting examples to teach to workers. Our solution greedily maximizes a submodular surrogate objective function in order to select examples to show to the learners. We prove that our strategy is competitive with the optimal teaching policy. Moreover, for the special case of linear separators, we prove that an exponential reduction in error probability can be achieved. Our experiments on simulated workers as well as three real image annotation tasks on Amazon Mechanical Turk show the effectiveness of our teaching algorithm.