
This paper investigates the behavior of the MinSum message passing scheme to
solve systems of linear equations in the Laplacian matrices of graphs and to
compute electric flows. Voltage and flow problems involve the minimization of
quadratic functions and are fundamental primitives that arise in several
domains. Algorithms that have been proposed are typically centralized and
involve multiple graphtheoretic constructions or sampling mechanisms that make
them difficult to implement and analyze. On the other hand, message passing
routines are distributed, simple, and easy to implement. In this paper we
establish a framework to analyze MinSum to solve voltage and flow problems. We
characterize the error committed by the algorithm on general weighted graphs in
terms of hitting times of random walks defined on the computation trees that
support the operations of the algorithms with time. For $d$regular graphs with
equal weights, we show that the convergence of the algorithms is controlled by
the total variation distance between the distributions of nonbacktracking
random walks defined on the original graph that start from neighboring nodes.
The framework that we introduce extends the analysis of MinSum to settings
where the contraction arguments previously considered in the literature (based
on the assumption of walk summability or scaled diagonal dominance) can not be
used, possibly in the presence of constraints.

In probability theory and statistics notions of correlation among random
variables, decay of correlation, and biasvariance tradeoff are fundamental.
In this work we introduce analogous notions in optimization, and we show their
usefulness in a concrete setting. We propose a general notion of correlation
among variables in optimization procedures that is based on the sensitivity of
optimal points upon (possibly finite) perturbations. We present a canonical
instance in network optimization (the mincost network flow problem) that
exhibits locality, i.e., a setting where the correlation decays as a function
of the graphtheoretical distance in the network. In the case of warmstart
reoptimization, we develop a general approach to localize a given optimization
routine in order to exploit locality. We show that the localization mechanism
is responsible for introducing a bias in the original algorithm, and that the
biasvariance tradeoff that emerges can be exploited to minimize the
computational complexity required to reach a prescribed level of error
accuracy. We provide numerical evidence to support our claims.

This paper studies the performance of sparse regression codes for lossy
compression with the squarederror distortion criterion. In a sparse regression
code, codewords are linear combinations of subsets of columns of a design
matrix. It is shown that with minimumdistance encoding, sparse regression
codes achieve the Shannon ratedistortion function for i.i.d. Gaussian sources
$R^*(D)$ as well as the optimal excessdistortion exponent. This completes a
previous result which showed that $R^*(D)$ and the optimal exponent were
achievable for distortions below a certain threshold. The proof of the
ratedistortion result is based on the second moment method, a popular
technique to show that a nonnegative random variable $X$ is strictly positive
with high probability. In our context, $X$ is the number of codewords within
target distortion $D$ of the source sequence. We first identify the reason
behind the failure of the standard second moment method for certain
distortions, and illustrate the different failure modes via a stylized example.
We then use a refinement of the second moment method to show that $R^*(D)$ is
achievable for all distortion values. Finally, the refinement technique is
applied to Suen's correlation inequality to prove the achievability of the
optimal Gaussian excessdistortion exponent.

We apply the MinSum messagepassing protocol to solve the consensus problem
in distributed optimization. We show that while the ordinary MinSum algorithm
does not converge, a modified version of it known as Splitting yields
convergence to the problem solution. We prove that a proper choice of the
tuning parameters allows MinSum Splitting to yield subdiffusive accelerated
convergence rates, matching the rates obtained by shiftregister methods. The
acceleration scheme embodied by MinSum Splitting for the consensus problem
bears similarities with lifted Markov chains techniques and with multistep
first order methods in convex optimization.

We investigate the fundamental principles that drive the development of
scalable algorithms for network optimization. Despite the significant amount of
work on parallel and decentralized algorithms in the optimization community,
the methods that have been proposed typically rely on strict separability
assumptions for objective function and constraints. Beside sparsity, these
methods typically do not exploit the strength of the interaction between
variables in the system. We propose a notion of correlation in constrained
optimization that is based on the sensitivity of the optimal solution upon
perturbations of the constraints. We develop a general theory of sensitivity of
optimizers the extends beyond the infinitesimal setting. We present instances
in network optimization where the correlation decays exponentially fast with
respect to the natural distance in the network, and we design algorithms that
can exploit this decay to yield dimensionfree optimization. Our results are
the first of their kind, and open new possibilities in the theory of local
algorithms.

We study a new class of codes for lossy compression with the squarederror
distortion criterion, designed using the statistical framework of
highdimensional linear regression. Codewords are linear combinations of
subsets of columns of a design matrix. Called a Sparse Superposition or Sparse
Regression codebook, this structure is motivated by an analogous construction
proposed recently by Barron and Joseph for communication over an AWGN channel.
For i.i.d Gaussian sources and minimumdistance encoding, we show that such a
code can attain the Shannon ratedistortion function with the optimal error
exponent, for all distortions below a specified value. It is also shown that
sparse regression codes are robust in the following sense: a codebook designed
to compress an i.i.d Gaussian source of variance $\sigma^2$ with
(squarederror) distortion $D$ can compress any ergodic source of variance less
than $\sigma^2$ to within distortion $D$. Thus the sparse regression ensemble
retains many of the good covering properties of the i.i.d random Gaussian
ensemble, while having having a compact representation in terms of a matrix
whose size is a loworder polynomial in the blocklength.

Linear regression models depend directly on the design matrix and its
properties. Techniques that efficiently estimate model coefficients by
partitioning rows of the design matrix are increasingly popular for largescale
problems because they fit well with modern parallel computing architectures. We
propose a simple measure of {\em concordance} between a design matrix and a
subset of its rows that estimates how well a subset captures the
variancecovariance structure of a larger data set. We illustrate the use of
this measure in a heuristic method for selecting row partition sizes that
balance statistical and computational efficiency goals in realworld problems.

We propose computationally efficient encoders and decoders for lossy
compression using a Sparse Regression Code. The codebook is defined by a design
matrix and codewords are structured linear combinations of columns of this
matrix. The proposed encoding algorithm sequentially chooses columns of the
design matrix to successively approximate the source sequence. It is shown to
achieve the optimal distortionrate function for i.i.d Gaussian sources under
the squarederror distortion criterion. For a given rate, the parameters of the
design matrix can be varied to trade off distortion performance with encoding
complexity. An example of such a tradeoff as a function of the block length n
is the following. With computational resource (space or time) per source sample
of O((n/\log n)^2), for a fixed distortionlevel above the Gaussian
distortionrate function, the probability of excess distortion decays
exponentially in n. The Sparse Regression Code is robust in the following
sense: for any ergodic source, the proposed encoder achieves the optimal
distortionrate function of an i.i.d Gaussian source with the same variance.
Simulations show that the encoder has good empirical performance, especially at
low and moderate rates.

This paper considers a binary channel with deletions and insertions, where
each input bit is transformed in one of the following ways: it is deleted with
probability d, or an extra bit is added after it with probability i, or it is
transmitted unmodified with probability 1di. A computable lower bound on the
capacity of this channel is derived. The transformation of the input sequence
by the channel may be viewed in terms of runs as follows: some runs of the
input sequence get shorter/longer, some runs get deleted, and some new runs are
added. It is difficult for the decoder to synchronize the channel output
sequence to the transmitted codeword mainly due to deleted runs and new
inserted runs.
The main idea is a mutual information decomposition in terms of the rate
achieved by a suboptimal decoder that determines the positions of the deleted
and inserted runs in addition to decoding the transmitted codeword. The mutual
information between the channel input and output sequences is expressed as the
sum of the rate achieved by this decoder and the rate loss due to its
suboptimality. Obtaining computable lower bounds on each of these quantities
yields a lower bound on the capacity. The bounds proposed in this paper provide
the first characterization of achievable rates for channels with general
insertions, and for channels with both deletions and insertions. For the
special case of the deletion channel, the proposed bound improves on the
previous best lower bound for deletion probabilities up to 0.3.

Many storage channels admit reading and rewriting of the content at a given
cost. We consider rewritable channels with a hidden state which models the
unknown characteristics of the memory cell. In addition to mitigating the
effect of the write noise, rewrites can help the write controller obtain a
better estimate of the hidden state. The paper has two contributions. The first
is a lower bound on the capacity of a general rewritable channel with hidden
state. The lower bound is obtained using a coding scheme that combines
GelfandPinsker coding with superposition coding. The rewritable AWGN channel
is discussed as an example. The second contribution is a simple coding scheme
for a rewritable channel where the write noise and hidden state are both
uniformly distributed. It is shown that this scheme is asymptotically optimal
as the number of rewrites gets large.

We address the question of convergence in the loopy belief propagation (LBP)
algorithm. Specifically, we relate convergence of LBP to the existence of a
weak limit for a sequence of Gibbs measures defined on the LBP s associated
computation tree.Using tools FROM the theory OF Gibbs measures we develop
easily testable sufficient conditions FOR convergence.The failure OF
convergence OF LBP implies the existence OF multiple phases FOR the associated
Gibbs specification.These results give new insight INTO the mechanics OF the
algorithm.

We study a new class of codes for Gaussian multiterminal source and channel
coding. These codes are designed using the statistical framework of
highdimensional linear regression and are called Sparse Superposition or
Sparse Regression codes. Codewords are linear combinations of subsets of
columns of a design matrix. These codes were recently introduced by Barron and
Joseph and shown to achieve the channel capacity of AWGN channels with
computationally feasible decoding. They have also recently been shown to
achieve the optimal ratedistortion function for Gaussian sources. In this
paper, we demonstrate how to implement random binning and superposition coding
using sparse regression codes. In particular, with minimumdistance
encoding/decoding it is shown that sparse regression codes attain the optimal
informationtheoretic limits for a variety of multiterminal source and channel
coding problems.

Gaussian belief propagation (GaBP) is an iterative algorithm for computing
the mean of a multivariate Gaussian distribution, or equivalently, the minimum
of a multivariate positive definite quadratic function. Sufficient conditions,
such as walksummability, that guarantee the convergence and correctness of
GaBP are known, but GaBP may fail to converge to the correct solution given an
arbitrary positive definite quadratic function. As was observed in previous
work, the GaBP algorithm fails to converge if the computation trees produced by
the algorithm are not positive definite. In this work, we will show that the
failure modes of the GaBP algorithm can be understood via graph covers, and we
prove that a parameterized generalization of the minsum algorithm can be used
to ensure that the computation trees remain positive definite whenever the
input matrix is positive definite. We demonstrate that the resulting algorithm
is closely related to other iterative schemes for quadratic minimization such
as the GaussSeidel and Jacobi algorithms. Finally, we observe, empirically,
that there always exists a choice of parameters such that the above
generalization of the GaBP algorithm converges.

The maxproduct algorithm, a local messagepassing scheme that attempts to
compute the most probable assignment (MAP) of a given probability distribution,
has been successfully employed as a method of approximate inference for
applications arising in coding theory, computer vision, and machine learning.
However, the maxproduct algorithm is not guaranteed to converge to the MAP
assignment, and if it does, is not guaranteed to recover the MAP assignment.
Alternative convergent messagepassing schemes have been proposed to overcome
these difficulties. This work provides a systematic study of such
messagepassing algorithms that extends the known results by exhibiting new
sufficient conditions for convergence to local and/or global optima, providing
a combinatorial characterization of these optima based on graph covers, and
describing a new convergent and correct messagepassing algorithm whose
derivation unifies many of the known convergent messagepassing algorithms.
While convergent and correct messagepassing algorithms represent a step
forward in the analysis of maxproduct style messagepassing algorithms, the
conditions needed to guarantee convergence to a global optimum can be too
restrictive in both theory and practice. This limitation of convergent and
correct messagepassing schemes is characterized by graph covers and
illustrated by example.

The paper provides a simpler method for proving a delicate inequality that
was used by Achlioptis and Naor to establish asymptotic concentration for
chromatic numbers of ErdosRenyi random graphs. The simplifications come from
two new ideas. The first involves a sharpened form of a piece of statistical
folklore regarding goodnessoffit tests for twoway tables of Poisson counts
under linear conditioning constraints. The second idea takes the form of a new
inequality that controls the extreme tails of the distribution of a quadratic
form in independent Poissons random variables.

Variable length communication over a compound channel with feedback is
considered. Traditionally, capacity of a compound channel without feedback is
defined as the maximum rate that is determined before the start of
communication such that communication is reliable. This traditional definition
is pessimistic. In the presence of feedback, an opportunistic definition is
given. Capacity is defined as the maximum rate that is determined at the end of
communication such that communication is reliable. Thus, the transmission rate
can adapt to the channel chosen by nature. Under this definition, feedback
communication over a compound channel is conceptually similar to multiterminal
communication. Transmission rate is a vector rather than a scalar; channel
capacity is a region rather than a scalar; error exponent is a region rather
than a scalar. In this paper, variable length communication over a compound
channel with feedback is formulated, its opportunistic capacity region is
characterized, and lower bounds for its error exponent region are provided..

In this paper, we propose capacityachieving communication schemes for
Gaussian finitestate Markov channels (FSMCs) subject to an average channel
input power constraint, under the assumption that the transmitters can have
access to delayed noiseless output feedback as well as instantaneous or delayed
channel state information (CSI). We show that the proposed schemes reveals
connections between feedback communication and feedback control.

Inference of the network structure (e.g., routing topology) and dynamics
(e.g., link performance) is an essential component in many network design and
management tasks. In this paper we propose a new, general framework for
analyzing and designing routing topology and link performance inference
algorithms using ideas and tools from phylogenetic inference in evolutionary
biology. The framework is applicable to a variety of measurement techniques.
Based on the framework we introduce and develop several polynomialtime
distancebased inference algorithms with provable performance. We provide
sufficient conditions for the correctness of the algorithms. We show that the
algorithms are consistent (return correct topology and link performance with an
increasing sample size) and robust (can tolerate a certain level of measurement
errors). In addition, we establish certain optimality properties of the
algorithms (i.e., they achieve the optimal $l_\infty$radius) and demonstrate
their effectiveness via model simulation.

The error exponent of Markov channels with feedback is studied in the
variablelength blockcoding setting. Burnashev's classic result is extended
and a single letter characterization for the reliability function of
finitestate Markov channels is presented, under the assumption that the
channel state is causally observed both at the transmitter and at the receiver
side. Tools from stochastic control theory are used in order to treat channels
with intersymbol interference. In particular the convex analytical approach to
Markov decision processes is adopted to handle problems with stopping time
horizons arising from variablelength coding schemes.

For a stationary additive Gaussiannoise channel with a rational noise power
spectrum of a finiteorder $L$, we derive two new results for the feedback
capacity under an average channel input power constraint. First, we show that a
very simple feedbackdependent GaussMarkov source achieves the feedback
capacity, and that KalmanBucy filtering is optimal for processing the
feedback. Based on these results, we develop a new method for optimizing the
channel inputs for achieving the CoverPombra blocklength$n$ feedback
capacity by using a dynamic programming approach that decomposes the
computation into $n$ sequentially identical optimization problems where each
stage involves optimizing $O(L^2)$ variables. Second, we derive the explicit
maximal information rate for stationary feedbackdependent sources. In general,
evaluating the maximal information rate for stationary sources requires solving
only a few equations by simple nonlinear programming. For firstorder
autoregressive and/or moving average (ARMA) noise channels, this optimization
admits a closed form maximal information rate formula. The maximal information
rate for stationary sources is a lower bound on the feedback capacity, and it
equals the feedback capacity if the longstanding conjecture, that stationary
sources achieve the feedback capacity, holds.

We introduce a general framework for treating channels with memory and
feedback. First, we generalize Massey's concept of directed information and use
it to characterize the feedback capacity of general channels. Second, we
present coding results for Markov channels. This requires determining
appropriate sufficient statistics at the encoder and decoder. Third, a dynamic
programming framework for computing the capacity of Markov channels is
presented. Fourth, it is shown that the average cost optimality equation (ACOE)
can be viewed as an implicit singleletter characterization of the capacity.
Fifth, scenarios with simple sufficient statistics are described.