
We study the detection problem of finding planted solutions in random
instances of flat satisfiability problems, a generalization of boolean
satisfiability formulas. We describe the properties of random instances of flat
satisfiability, as well of the optimal rates of detection of the associated
hypothesis testing problem. We also study the performance of an algorithmically
efficient testing procedure. We introduce a modification of our model, the
light planting of solutions, and show that it is as hard as the problem of
learning parity with noise. This hints strongly at the difficulty of detecting
planted flat satisfiability for a wide class of tests.

We consider the problem of link prediction, based on partial observation of a
large network, and on side information associated to its vertices. The
generative model is formulated as a matrix logistic regression. The performance
of the model is analysed in a highdimensional regime under a structural
assumption. The minimax rate for the Frobeniusnorm risk is established and a
combinatorial estimator based on the penalised maximum likelihood approach is
shown to achieve it. Furthermore, it is shown that this rate cannot be attained
by any (randomised) algorithm computable in polynomial time under a
computational complexity assumption.

We consider the problem of bandit optimization, inspired by stochastic
optimization and online learning problems with bandit feedback. In this
problem, the objective is to minimize a global loss function of all the
actions, not necessarily a cumulative loss. This framework allows us to study a
very general class of problems, with applications in statistics, machine
learning, and other fields. To solve this problem, we analyze the
UpperConfidence FrankWolfe algorithm, inspired by techniques for bandits and
convex optimization. We give theoretical guarantees for the performance of this
algorithm over various classes of functions, and discuss the optimality of
these results.

We consider the problem associated to recovering the block structure of an
Ising model given independent observations on the binary hypercube. This new
model, called the Ising blockmodel, is a perturbation of the mean field
approximation of the Ising model known as the CurieWeiss model: the sites are
partitioned into two blocks of equal size and the interaction between those of
the same block is stronger than across blocks, to account for more order within
each block. We study probabilistic, statistical and computational aspects of
this model in the highdimensional case when the number of sites may be much
larger than the sample size.

In recent years, sparse principal component analysis has emerged as an
extremely popular dimension reduction technique for highdimensional data. The
theoretical challenge, in the simplest case, is to estimate the leading
eigenvector of a population covariance matrix under the assumption that this
eigenvector is sparse. An impressive range of estimators have been proposed;
some of these are fast to compute, while others are known to achieve the
minimax optimal rate over certain Gaussian or subGaussian classes. In this
paper, we show that, under a widelybelieved assumption from computational
complexity theory, there is a fundamental tradeoff between statistical and
computational performance in this problem. More precisely, working with new,
larger classes satisfying a restricted covariance concentration condition, we
show that there is an effective sample size regime in which no randomised
polynomial time algorithm can achieve the minimax optimal rate. We also study
the theoretical performance of a (polynomial time) variant of the wellknown
semidefinite relaxation estimator, revealing a subtle interplay between
statistical and computational efficiency.

The restricted isometry property (RIP) for design matrices gives guarantees
for optimal recovery in sparse linear models. It is of high interest in
compressed sensing and statistical learning. This property is particularly
important for computationally efficient recovery methods. As a consequence,
even though it is in general NPhard to check that RIP holds, there have been
substantial efforts to find tractable proxies for it. These would allow the
construction of RIP matrices and the polynomialtime verification of RIP given
an arbitrary matrix. We consider the framework of averagecase certifiers, that
never wrongly declare that a matrix is RIP, while being often correct for
random instances. While there are such functions which are tractable in a
suboptimal parameter regime, we show that this is a computationally hard task
in any better regime. Our results are based on a new, weaker assumption on the
problem of detecting dense subgraphs.

We study the problem of detecting planted solutions in a random
satisfiability formula. Adopting the formalism of hypothesis testing in
statistical analysis, we describe the minimax optimal rates of detection. Our
analysis relies on the study of the number of satisfying assignments, for which
we prove new results. We also address algorithmic issues, and give a
computationally efficient test with optimal statistical performance. This
result is compared to an averagecase hypothesis on the hardness of refuting
satisfiability of random formulas.

Statistical estimation in many contemporary settings involves the
acquisition, analysis, and aggregation of datasets from multiple sources, which
can have significant differences in character and in value. Due to these
variations, the effectiveness of employing a given resource (e.g., a sensing
device or computing power) for gathering or processing data from a particular
source depends on the nature of that source. As a result, the appropriate
division and assignment of a collection of resources to a set of data sources
can substantially impact the overall performance of an inferential strategy. In
this expository article, we adopt a general view of the notion of a resource
and its effect on the quality of a data source, and we describe a framework for
the allocation of a given set of resources to a collection of sources in order
to optimize a specified metric of statistical efficiency. We discuss several
stylized examples involving inferential tasks such as parameter estimation and
hypothesis testing based on heterogeneous data sources, in which optimal
allocations can be computed either in closed form or via efficient numerical
procedures based on convex optimization.

We perform a finite sample analysis of the detection levels for sparse
principal components of a highdimensional covariance matrix. Our minimax
optimal test is based on a sparse eigenvalue statistic. Alas, computing this
test is known to be NPcomplete in general, and we describe a computationally
efficient alternative test using convex relaxations. Our relaxation is also
proved to detect sparse principal components at near optimal detection levels,
and it performs well on simulated datasets. Moreover, using polynomial time
reductions from theoretical computer science, we bring significant evidence
that our results cannot be improved, thus revealing an inherent trade off
between statistical and computational performance.

In the context of sparse principal component detection, we bring evidence
towards the existence of a statistical price to pay for computational
efficiency. We measure the performance of a test by the smallest signal
strength that it can detect and we propose a computationally efficient method
based on semidefinite programming. We also prove that the statistical
performance of this test cannot be strictly improved by any computationally
efficient method. Our results can be viewed as complexity theoretic lower
bounds conditionally on the assumptions that some instances of the planted
clique problem cannot be solved in randomized polynomial time.