
We consider rankone nonsymmetric tensor estimation and derive simple
formulas for the mutual information. We start by the order 2 problem, namely
matrix factorization. We treat it completely in a simpler fashion than previous
proofs using a new type of interpolation method developed in [1]. We then show
how to harness the structure in "layers" of tensor estimation in order to
obtain a formula for the mutual information for the order 3 problem from the
knowledge of the formula for the order 2 problem, still using the same kind of
interpolation. Our proof technique straightforwardly generalizes and allows to
rigorously obtain the mutual information at any order in a recursive way.

Sparse superposition codes, or sparse regression codes, constitute a new
class of codes which was first introduced for communication over the additive
white Gaussian noise (AWGN) channel. It has been shown that such codes are
capacityachieving over the AWGN channel under optimal maximumlikelihood
decoding as well as under various efficient iterative decoding schemes equipped
with power allocation or spatially coupled constructions. Here, we generalize
the analysis of these codes to a much broader setting that includes all
memoryless channels. We show, for a large class of memoryless channels, that
spatial coupling allows an efficient decoder, based on the generalized
approximate messagepassing (GAMP) algorithm, to reach the potential (or Bayes
optimal) threshold of the underlying (or uncoupled) code ensemble. Moreover, we
argue that spatially coupled sparse superposition codes universally achieve
capacity under GAMP decoding by showing, through analytical computations, that
the error floor vanishes and the potential threshold tends to capacity as one
of the code parameter goes to infinity. Furthermore, we provide a closed form
formula for the algorithmic threshold of the underlying code ensemble in terms
of a Fisher information. Relating an algorithmic threshold to a Fisher
information has theoretical as well as practical importance. Our proof relies
on the state evolution analysis and uses the potential method developed in the
theory of lowdensity paritycheck (LDPC) codes and compressed sensing.

Generalized linear models (GLMs) arise in highdimensional machine learning,
statistics, communications and signal processing. In this paper we analyze GLMs
when the data matrix is random, as relevant in problems such as compressed
sensing, errorcorrecting codes or benchmark models in neural networks. We
evaluate the mutual information (or "free entropy") from which we deduce the
Bayesoptimal estimation and generalization errors. Our analysis applies to the
highdimensional limit where both the number of samples and the dimension are
large and their ratio is fixed. Nonrigorous predictions for the optimal errors
existed for special cases of GLMs, e.g. for the perceptron, in the field of
statistical physics based on the socalled replica method. Our present paper
rigorously establishes those decades old conjectures and brings forward their
algorithmic interpretation in terms of performance of the generalized
approximate messagepassing algorithm. Furthermore, we tightly characterize,
for many learning problems, regions of parameters for which this algorithm
achieves the optimal performance, and locate the associated sharp phase
transitions separating learnable and nonlearnable regions. We believe that
this random version of GLMs can serve as a challenging benchmark for
multipurpose algorithms. This paper is divided in two parts that can be read
independently: The first part (main part) presents the model and main results,
discusses some applications and sketches the main ideas of the proof. The
second part (supplementary informations) is much more detailed and provides
more examples as well as all the proofs.

In recent years important progress has been achieved towards proving the
validity of the replica predictions for the (asymptotic) mutual information (or
"free energy") in Bayesian inference problems. The proof techniques that have
emerged appear to be quite general, despite they have been worked out on a
casebycase basis. Unfortunately, a common point between all these schemes is
their relatively high level of technicality. We present a new proof scheme that
is quite straightforward with respect to the previous ones. We call it the
adaptive interpolation method because it can be seen as an extension of the
interpolation method developped by Guerra and Toninelli in the context of spin
glasses, with an interpolation path that is adaptive. In order to illustrate
our method we show how to prove the replica formula for three nontrivial
inference problems. The first one is symmetric rankone matrix estimation (or
factorisation), which is the simplest problem considered here and the one for
which the method is presented in full details. Then we generalize to symmetric
tensor estimation and random linear estimation. We believe that the present
method has a much wider range of applicability and also sheds new insights on
the reasons for the validity of replica formulas in Bayesian inference.

There has been definite progress recently in proving the variational
singleletter formula given by the heuristic replica method for various
estimation problems. In particular, the replica formula for the mutual
information in the case of noisy linear estimation with random i.i.d. matrices,
a problem with applications ranging from compressed sensing to statistics, has
been proven rigorously. In this contribution we go beyond the restrictive
i.i.d. matrix assumption and discuss the formula proposed by Takeda, Uda,
Kabashima and later by Tulino, Verdu, Caire and Shamai who used the replica
method. Using the recently introduced adaptive interpolation method and random
matrix theory, we prove this formula for a relevant large subclass of
rotationally invariant matrices.

Consider random linear estimation with Gaussian measurement matrices and
noise. One can compute infinitesimal variations of the mutual information under
infinitesimal variations of the signaltonoise ratio or of the measurement
rate. We discuss how each variation is related to the minimum meansquare error
and deduce that the two variations are directly connected through a very simple
identity. The main technical ingredient is a new interpolation method called
"subextensive interpolation method". We use it to provide a new proof of an
IMMSE relation recently found by Reeves and Pfister [1] when the measurement
rate is varied. Our proof makes it clear that this relation is intimately
related to another IMMSE relation also recently proved in [2]. One can
directly verify that the identity relating the two types of variation of mutual
information is indeed consistent with the one letter replica symmetric formula
for the mutual information, first derived by Tanaka [3] for binary signals, and
recently proved in more generality in [1,2,4,5] (by independent methods).
However our proof is independent of any knowledge of Tanaka's formula.

We consider the estimation of a signal from the knowledge of its noisy linear
random Gaussian projections. A few examples where this problem is relevant are
compressed sensing, sparse superposition codes, and code division multiple
access. There has been a number of works considering the mutual information for
this problem using the replica method from statistical physics. Here we put
these considerations on a firm rigorous basis. First, we show, using a
GuerraToninelli type interpolation, that the replica formula yields an upper
bound to the exact mutual information. Secondly, for many relevant practical
cases, we present a converse lower bound via a method that uses spatial
coupling, state evolution analysis and the IMMSE theorem. This yields a single
letter formula for the mutual information and the minimalmeansquare error for
random Gaussian linear estimation of all discrete bounded signals. In addition,
we prove that the low complexity approximate messagepassing algorithm is
optimal outside of the socalled hard phase, in the sense that it
asymptotically reaches the minimalmeansquare error. In this work spatial
coupling is used primarily as a proof technique. However our results also prove
two important features of spatially coupled noisy linear random Gaussian
estimation. First there is no algorithmically hard phase. This means that for
such systems approximate messagepassing always reaches the minimalmeansquare
error. Secondly, in a proper limit the mutual information associated to such
systems is the same as the one of uncoupled linear random Gaussian estimation.

Sparse superposition (SS) codes were originally proposed as a
capacityachieving communication scheme over the additive white Gaussian noise
channel (AWGNC) [1]. Very recently, it was discovered that these codes are
universal, in the sense that they achieve capacity over any memoryless channel
under generalized approximate messagepassing (GAMP) decoding [2], although
this decoder has never been stated for SS codes. In this contribution we
introduce the GAMP decoder for SS codes, we confirm empirically the
universality of this communication scheme through its study on various channels
and we provide the main analysis tools: state evolution and potential. We also
compare the performance of GAMP with the Bayesoptimal MMSE decoder. We
empirically illustrate that despite the presence of a phase transition
preventing GAMP to reach the optimal performance, spatial coupling allows to
boost the performance that eventually tends to capacity in a proper limit. We
also prove that, in contrast with the AWGNC case, SS codes for binary input
channels have a vanishing error floor in the limit of large codewords.
Moreover, the performance of Hadamardbased encoders is assessed for practical
implementations.

We study the approximate messagepassing decoder for sparse superposition
coding on the additive white Gaussian noise channel and extend our preliminary
work [1]. We use heuristic statisticalphysicsbased tools such as the cavity
and the replica methods for the statistical analysis of the scheme. While
superposition codes asymptotically reach the Shannon capacity, we show that our
iterative decoder is limited by a phase transition similar to the one that
happens in Low Density Parity check codes. We consider two solutions to this
problem, that both allow to reach the Shannon capacity: i) a power allocation
strategy and ii) the use of spatial coupling, a novelty for these codes that
appears to be promising. We present in particular simulations suggesting that
spatial coupling is more robust and allows for better reconstruction at finite
code lengths. Finally, we show empirically that the use of a fast
Hadamardbased operator allows for an efficient reconstruction, both in terms
of computational time and memory, and the ability to deal with very large
messages.

We consider the estimation of a signal from the knowledge of its noisy linear
random Gaussian projections, a problem relevant in compressed sensing, sparse
superposition codes or code division multiple access just to cite few. There
has been a number of works considering the mutual information for this problem
using the heuristic replica method from statistical physics. Here we put these
considerations on a firm rigorous basis. First, we show, using a Guerratype
interpolation, that the replica formula yields an upper bound to the exact
mutual information. Secondly, for many relevant practical cases, we present a
converse lower bound via a method that uses spatial coupling, state evolution
analysis and the IMMSE theorem. This yields, in particular, a single letter
formula for the mutual information and the minimalmeansquare error for random
Gaussian linear estimation of all discrete bounded signals.

Factorizing lowrank matrices has many applications in machine learning and
statistics. For probabilistic models in the Bayes optimal setting, a general
expression for the mutual information has been proposed using heuristic
statistical physics computations, and proven in few specific cases. Here, we
show how to rigorously prove the conjectured formula for the symmetric rankone
case. This allows to express the minimal meansquareerror and to characterize
the detectability phase transitions in a large set of estimation problems
ranging from community detection to sparse PCA. We also show that for a large
set of parameters, an iterative algorithm called approximate messagepassing is
Bayes optimal. There exists, however, a gap between what currently known
polynomial algorithms can do and what is expected information theoretically.
Additionally, the proof technique has an interest of its own and exploits three
essential ingredients: the interpolation method introduced in statistical
physics by Guerra, the analysis of the approximate messagepassing algorithm
and the theory of spatial coupling and threshold saturation in coding. Our
approach is generic and applicable to other open problems in statistical
estimation where heuristic statistical physics predictions are available.

We recently proved threshold saturation for spatially coupled sparse
superposition codes on the additive white Gaussian noise channel. Here we
generalize our analysis to a much broader setting. We show for any memoryless
channel that spatial coupling allows generalized approximate messagepassing
(GAMP) decoding to reach the potential (or Bayes optimal) threshold of the code
ensemble. Moreover in the large input alphabet size limit: i) the GAMP
algorithmic threshold of the underlying (or uncoupled) code ensemble is simply
expressed as a Fisher information; ii) the potential threshold tends to
Shannon's capacity. Although we focus on coding for sake of coherence with our
previous results, the framework and methods are very general and hold for a
wide class of generalized estimation problems with random linear mixing.

Recently, a new class of codes, called sparse superposition or sparse
regression codes, has been proposed for communication over the AWGN channel. It
has been proven that they achieve capacity using power allocation and various
forms of iterative decoding. Empirical evidence has also strongly suggested
that the codes achieve capacity when spatial coupling and approximate message
passing decoding are used, without need of power allocation. In this note we
prove that state evolution (which tracks message passing) indeed saturates the
potential threshold of the underlying code ensemble, which approaches in a
proper limit the optimal threshold. Our proof uses ideas developed in the
theory of lowdensity paritycheck codes and compressive sensing.

Reconstruction of images from noisy linear measurements is a core problem in
image processing, for which convex optimization methods based on total
variation (TV) minimization have been the longstanding stateoftheart. We
present an alternative probabilistic reconstruction procedure based on
approximate messagepassing, Scampi, which operates in the compressive regime,
where the inverse imaging problem is underdetermined. While the proposed method
is related to the recently proposed GrAMPA algorithm of Borgerding, Schniter,
and Rangan, we further develop the probabilistic approach to compressive
imaging by introducing an expectationmaximizaiton learning of model
parameters, making the Scampi robust to model uncertainties. Additionally, our
numerical experiments indicate that Scampi can provide reconstruction
performance superior to both GrAMPA as well as convex approaches to TV
reconstruction. Finally, through exhaustive bestcase experiments, we show that
in many cases the maximal performance of both Scampi and convex TV can be quite
close, even though the approaches are a prori distinct. The theoretical reasons
for this correspondence remain an open question. Nevertheless, the proposed
algorithm remains more practical, as it requires far less parameter tuning to
perform optimally.

This thesis is interested in the application of statistical physics methods
and inference to sparse linear estimation problems. The main tools are the
graphical models and approximate messagepassing algorithm together with the
cavity method. We will also use the replica method of statistical physics of
disordered systems which allows to associate to the studied problems a cost
function referred as the potential of free entropy in physics. It allows to
predict the different phases of typical complexity of the problem as a function
of external parameters such as the noise level or the number of measurements
one has about the signal: the inference can be typically easy, hard or
impossible. We will see that the hard phase corresponds to a regime of
coexistence of the actual solution together with another unwanted solution of
the message passing equations. In this phase, it represents a metastable state
which is not the true equilibrium solution. This phenomenon can be linked to
supercooled water blocked in the liquid state below its freezing critical
temperature. We will use a method that allows to overcome the metastability
mimicing the strategy adopted by nature itself for supercooled water: the
nucleation and spatial coupling. In supercooled water, a weak localized
perturbation is enough to create a crystal nucleus that will propagate in all
the medium thanks to the physical couplings between closeby atoms. The same
process will help the algorithm to find the signal, thanks to the introduction
of a nucleus containing local information about the signal. It will then spread
as a "reconstruction wave" similar to the crystal in the water. After an
introduction to statistical inference and sparse linear estimation, we will
introduce the necessary tools. Then we will move to applications of these
notions to signal processing and coding theory problems.

We study the behavior of Approximate MessagePassing, a solver for linear
sparse estimation problems such as compressed sensing, when the i.i.d matrices
for which it has been specifically designed are replaced by structured
operators, such as Fourier and Hadamard ones. We show empirically that after
proper randomization, the structure of the operators does not significantly
affect the performances of the solver. Furthermore, for some specially designed
spatially coupled operators, this allows a computationally fast and memory
efficient reconstruction in compressed sensing up to the
informationtheoretical limit. We also show how this approach can be applied to
sparse superposition codes, allowing the Approximate MessagePassing decoder to
perform at large rates for moderate block length.

These are notes from the lecture of R\"udiger Urbanke given at the autumn
school "Statistical Physics, Optimization, Inference, and MessagePassing
Algorithms", that took place in Les Houches, France from Monday September 30th,
2013, till Friday October 11th, 2013. The school was organized by Florent
Krzakala from UPMC and ENS Paris, Federico RicciTersenghi from La Sapienza
Roma, Lenka Zdeborov\`a from CEA Saclay and CNRS, and Riccardo Zecchina from
Politecnico Torino. The first three sections cover the basics of polar codes
and low density parity check codes. In the last three sections, we see how the
spatial coupling helps belief propagation decoding.

Superposition codes are efficient for the Additive White Gaussian Noise
channel. We provide here a replica analysis of the performances of these codes
for large signals. We also consider a Bayesian Approximate Message Passing
decoder based on a beliefpropagation approach, and discuss its performance
using the density evolution technic. Our main findings are 1) for the sizes we
can access, the messagepassing decoder outperforms other decoders studied in
the literature 2) its performance is limited by a sharp phase transition and 3)
while these codes reach capacity as $B$ (a crucial parameter in the code)
increases, the performance of the message passing decoder worsen as the phase
transition goes to lower rates.

We revisit the classical hardcore model, also known as independent set and
dual to vertex cover problem, where one puts particles with a firstneighbor
hardcore repulsion on the vertices of a random graph. Although the case of
random graphs with small and very large average degrees respectively are quite
well understood, they yield qualitatively different results and our aim here is
to reconciliate these two cases. We revisit results that can be obtained using
the (heuristic) cavity method and show that it provides a closedform
conjecture for the exact density of the densest packing on random regular
graphs with degree K>=20, and that for K>16 the nature of the phase transition
is the same as for large K. This also shows that the hardcode model is the
simplest meanfield lattice model for structural glasses and jamming.

We revisit the error correction scheme of realvalued signals when the
codeword is corrupted by gross errors on a fraction of entries and a small
noise on all the entries. Combining the recent developments of approximate
message passing and the spatiallycoupled measurement matrix in compressed
sensing we show that the error correction and its robustness towards noise can
be enhanced considerably. We discuss the performance in the large signal limit
using previous results on state evolution, as well as for finite size signals
through numerical simulations. Even for relatively small sizes, the approach
proposed here outperforms convexrelaxationbased decoders.

Compressed sensing is designed to measure sparse signals directly in a
compressed form. However, most signals of interest are only "approximately
sparse", i.e. even though the signal contains only a small fraction of relevant
(large) components the other components are not strictly equal to zero, but are
only close to zero. In this paper we model the approximately sparse signal with
a Gaussian distribution of small components, and we study its compressed
sensing with dense random matrices. We use replica calculations to determine
the meansquared error of the Bayesoptimal reconstruction for such signals, as
a function of the variance of the small components, the density of large
components and the measurement rate. We then use the GAMP algorithm and we
quantify the region of parameters for which this algorithm achieves optimality
(for large systems). Finally, we show that in the region where the GAMP for the
homogeneous measurement matrices is not optimal, a special "seeding" design of
a spatiallycoupled measurement matrix allows to restore optimality.