
Bayesian optimization (BO) is a popular technique for sequential blackbox
function optimization, with applications including parameter tuning, robotics,
environmental monitoring, and more. One of the most important challenges in BO
is the development of algorithms that scale to high dimensions, which remains a
key open problem despite recent progress. In this paper, we consider the
approach of Kandasamy et al. (2015), in which the highdimensional function
decomposes as a sum of lowerdimensional functions on subsets of the underlying
variables. In particular, we significantly generalize this approach by lifting
the assumption that the subsets are disjoint, and consider additive models with
arbitrary overlap among the subsets. By representing the dependencies via a
graph, we deduce an efficient message passing algorithm for optimizing the
acquisition function. In addition, we provide an algorithm for learning the
graph from samples based on Gibbs sampling. We empirically demonstrate the
effectiveness of our methods on both synthetic and realworld data.

We study the problem of maximizing a monotone set function subject to a
cardinality constraint $k$ in the setting where some number of elements $\tau$
is deleted from the returned set. The focus of this work is on the worstcase
adversarial setting. While there exist constantfactor guarantees when the
function is submodular, there are no guarantees for nonsubmodular objectives.
In this work, we present a new algorithm ObliviousGreedy and prove the first
constantfactor approximation guarantees for a wider class of nonsubmodular
objectives. The obtained theoretical bounds are the first constantfactor
bounds that also hold in the linear regime, i.e. when the number of deletions
$\tau$ is linear in $k$. Our bounds depend on established parameters such as
the submodularity ratio and some novel ones such as the inverse curvature. We
bound these parameters for two important objectives including support selection
and variance reduction. Finally, we numerically demonstrate the robust
performance of ObliviousGreedy for these two objectives on various datasets.

We study the problem of maximizing a monotone submodular function subject to
a cardinality constraint $k$, with the added twist that a number of items
$\tau$ from the returned set may be removed. We focus on the worstcase setting
considered in (Orlin et al., 2016), in which a constantfactor approximation
guarantee was given for $\tau = o(\sqrt{k})$. In this paper, we solve a key
open problem raised therein, presenting a new Partitioned Robust (PRo)
submodular maximization algorithm that achieves the same guarantee for more
general $\tau = o(k)$. Our algorithm constructs partitions consisting of
buckets with exponentially increasing sizes, and applies standard submodular
optimization subroutines on the buckets in order to construct the robust
solution. We numerically demonstrate the performance of PRo in data
summarization and influence maximization, demonstrating gains over both the
greedy algorithm and the algorithm of (Orlin et al., 2016).

We initiate the study of the classical Submodular Cover (SC) problem in the
data streaming model which we refer to as the Streaming Submodular Cover (SSC).
We show that any single pass streaming algorithm using sublinear memory in the
size of the stream will fail to provide any nontrivial approximation
guarantees for SSC. Hence, we consider a relaxed version of SSC, where we only
seek to find a partial cover.
We design the first Efficient bicriteria Submodular Cover Streaming
(ESCStreaming) algorithm for this problem, and provide theoretical guarantees
for its performance supported by numerical evidence. Our algorithm finds
solutions that are competitive with the nearoptimal offline greedy algorithm
despite requiring only a single pass over the data stream. In our numerical
experiments, we evaluate the performance of ESCStreaming on active set
selection and largescale graph cover problems.

We present a new algorithm, truncated variance reduction (TruVaR), that
treats Bayesian optimization (BO) and levelset estimation (LSE) with Gaussian
processes in a unified fashion. The algorithm greedily shrinks a sum of
truncated variances within a set of potential maximizers (BO) or unclassified
points (LSE), which is updated based on confidence bounds. TruVaR is effective
in several important settings that are typically nontrivial to incorporate
into myopic algorithms, including pointwise costs and heteroscedastic noise. We
provide a general theoretical guarantee for TruVaR covering these aspects, and
use it to recover and strengthen existing results on BO and LSE. Moreover, we
provide a new result for a setting where one can select from a number of noise
levels having associated costs. We demonstrate the effectiveness of the
algorithm on both synthetic and realworld data sets.

The problem of recovering a structured signal $\mathbf{x} \in \mathbb{C}^p$
from a set of dimensionalityreduced linear measurements $\mathbf{b} = \mathbf
{A}\mathbf {x}$ arises in a variety of applications, such as medical imaging,
spectroscopy, Fourier optics, and computerized tomography. Due to computational
and storage complexity or physical constraints imposed by the problem, the
measurement matrix $\mathbf{A} \in \mathbb{C}^{n \times p}$ is often of the
form $\mathbf{A} = \mathbf{P}_{\Omega}\boldsymbol{\Psi}$ for some orthonormal
basis matrix $\boldsymbol{\Psi}\in \mathbb{C}^{p \times p}$ and subsampling
operator $\mathbf{P}_{\Omega}: \mathbb{C}^{p} \rightarrow \mathbb{C}^{n}$ that
selects the rows indexed by $\Omega$. This raises the fundamental question of
how best to choose the index set $\Omega$ in order to optimize the recovery
performance. Previous approaches to addressing this question rely on
nonuniform \emph{random} subsampling using applicationspecific knowledge of
the structure of $\mathbf{x}$. In this paper, we instead take a principled
learningbased approach in which a \emph{fixed} index set is chosen based on a
set of training signals $\mathbf{x}_1,\dotsc,\mathbf{x}_m$. We formulate
combinatorial optimization problems seeking to maximize the energy captured in
these signals in an averagecase or worstcase sense, and we show that these
can be efficiently solved either exactly or approximately via the
identification of modularity and submodularity structures. We provide both
deterministic and statistical theoretical guarantees showing how the resulting
measurement matrices perform on signals differing from the training signals,
and we provide numerical examples showing our approach to be effective on a
variety of data sets.

We consider the sequential Bayesian optimization problem with bandit
feedback, adopting a formulation that allows for the reward function to vary
with time. We model the reward function using a Gaussian process whose
evolution obeys a simple Markov model. We introduce two natural extensions of
the classical Gaussian process upper confidence bound (GPUCB) algorithm. The
first, RGPUCB, resets GPUCB at regular intervals. The second, TVGPUCB,
instead forgets about old data in a smooth fashion. Our main contribution
comprises of novel regret bounds for these algorithms, providing an explicit
characterization of the tradeoff between the time horizon and the rate at
which the function varies. We illustrate the performance of the algorithms on
both synthetic and real data, and we find the gradual forgetting of TVGPUCB
to perform favorably compared to the sharp resetting of RGPUCB. Moreover,
both algorithms significantly outperform classical GPUCB, since it treats
stale and fresh data equally.

How should we present training examples to learners to teach them
classification rules? This is a natural problem when training workers for
crowdsourcing labeling tasks, and is also motivated by challenges in
datadriven online education. We propose a natural stochastic model of the
learners, modeling them as randomly switching among hypotheses based on
observed feedback. We then develop STRICT, an efficient algorithm for selecting
examples to teach to workers. Our solution greedily maximizes a submodular
surrogate objective function in order to select examples to show to the
learners. We prove that our strategy is competitive with the optimal teaching
policy. Moreover, for the special case of linear separators, we prove that an
exponential reduction in error probability can be achieved. Our experiments on
simulated workers as well as three real image annotation tasks on Amazon
Mechanical Turk show the effectiveness of our teaching algorithm.