• This paper investigates the behavior of the Min-Sum message passing scheme to solve systems of linear equations in the Laplacian matrices of graphs and to compute electric flows. Voltage and flow problems involve the minimization of quadratic functions and are fundamental primitives that arise in several domains. Algorithms that have been proposed are typically centralized and involve multiple graph-theoretic constructions or sampling mechanisms that make them difficult to implement and analyze. On the other hand, message passing routines are distributed, simple, and easy to implement. In this paper we establish a framework to analyze Min-Sum to solve voltage and flow problems. We characterize the error committed by the algorithm on general weighted graphs in terms of hitting times of random walks defined on the computation trees that support the operations of the algorithms with time. For $d$-regular graphs with equal weights, we show that the convergence of the algorithms is controlled by the total variation distance between the distributions of non-backtracking random walks defined on the original graph that start from neighboring nodes. The framework that we introduce extends the analysis of Min-Sum to settings where the contraction arguments previously considered in the literature (based on the assumption of walk summability or scaled diagonal dominance) can not be used, possibly in the presence of constraints.
  • In probability theory and statistics notions of correlation among random variables, decay of correlation, and bias-variance trade-off are fundamental. In this work we introduce analogous notions in optimization, and we show their usefulness in a concrete setting. We propose a general notion of correlation among variables in optimization procedures that is based on the sensitivity of optimal points upon (possibly finite) perturbations. We present a canonical instance in network optimization (the min-cost network flow problem) that exhibits locality, i.e., a setting where the correlation decays as a function of the graph-theoretical distance in the network. In the case of warm-start reoptimization, we develop a general approach to localize a given optimization routine in order to exploit locality. We show that the localization mechanism is responsible for introducing a bias in the original algorithm, and that the bias-variance trade-off that emerges can be exploited to minimize the computational complexity required to reach a prescribed level of error accuracy. We provide numerical evidence to support our claims.
  • This paper studies the performance of sparse regression codes for lossy compression with the squared-error distortion criterion. In a sparse regression code, codewords are linear combinations of subsets of columns of a design matrix. It is shown that with minimum-distance encoding, sparse regression codes achieve the Shannon rate-distortion function for i.i.d. Gaussian sources $R^*(D)$ as well as the optimal excess-distortion exponent. This completes a previous result which showed that $R^*(D)$ and the optimal exponent were achievable for distortions below a certain threshold. The proof of the rate-distortion result is based on the second moment method, a popular technique to show that a non-negative random variable $X$ is strictly positive with high probability. In our context, $X$ is the number of codewords within target distortion $D$ of the source sequence. We first identify the reason behind the failure of the standard second moment method for certain distortions, and illustrate the different failure modes via a stylized example. We then use a refinement of the second moment method to show that $R^*(D)$ is achievable for all distortion values. Finally, the refinement technique is applied to Suen's correlation inequality to prove the achievability of the optimal Gaussian excess-distortion exponent.
  • We apply the Min-Sum message-passing protocol to solve the consensus problem in distributed optimization. We show that while the ordinary Min-Sum algorithm does not converge, a modified version of it known as Splitting yields convergence to the problem solution. We prove that a proper choice of the tuning parameters allows Min-Sum Splitting to yield subdiffusive accelerated convergence rates, matching the rates obtained by shift-register methods. The acceleration scheme embodied by Min-Sum Splitting for the consensus problem bears similarities with lifted Markov chains techniques and with multi-step first order methods in convex optimization.
  • We investigate the fundamental principles that drive the development of scalable algorithms for network optimization. Despite the significant amount of work on parallel and decentralized algorithms in the optimization community, the methods that have been proposed typically rely on strict separability assumptions for objective function and constraints. Beside sparsity, these methods typically do not exploit the strength of the interaction between variables in the system. We propose a notion of correlation in constrained optimization that is based on the sensitivity of the optimal solution upon perturbations of the constraints. We develop a general theory of sensitivity of optimizers the extends beyond the infinitesimal setting. We present instances in network optimization where the correlation decays exponentially fast with respect to the natural distance in the network, and we design algorithms that can exploit this decay to yield dimension-free optimization. Our results are the first of their kind, and open new possibilities in the theory of local algorithms.
  • We study a new class of codes for lossy compression with the squared-error distortion criterion, designed using the statistical framework of high-dimensional linear regression. Codewords are linear combinations of subsets of columns of a design matrix. Called a Sparse Superposition or Sparse Regression codebook, this structure is motivated by an analogous construction proposed recently by Barron and Joseph for communication over an AWGN channel. For i.i.d Gaussian sources and minimum-distance encoding, we show that such a code can attain the Shannon rate-distortion function with the optimal error exponent, for all distortions below a specified value. It is also shown that sparse regression codes are robust in the following sense: a codebook designed to compress an i.i.d Gaussian source of variance $\sigma^2$ with (squared-error) distortion $D$ can compress any ergodic source of variance less than $\sigma^2$ to within distortion $D$. Thus the sparse regression ensemble retains many of the good covering properties of the i.i.d random Gaussian ensemble, while having having a compact representation in terms of a matrix whose size is a low-order polynomial in the block-length.
  • Linear regression models depend directly on the design matrix and its properties. Techniques that efficiently estimate model coefficients by partitioning rows of the design matrix are increasingly popular for large-scale problems because they fit well with modern parallel computing architectures. We propose a simple measure of {\em concordance} between a design matrix and a subset of its rows that estimates how well a subset captures the variance-covariance structure of a larger data set. We illustrate the use of this measure in a heuristic method for selecting row partition sizes that balance statistical and computational efficiency goals in real-world problems.
  • We propose computationally efficient encoders and decoders for lossy compression using a Sparse Regression Code. The codebook is defined by a design matrix and codewords are structured linear combinations of columns of this matrix. The proposed encoding algorithm sequentially chooses columns of the design matrix to successively approximate the source sequence. It is shown to achieve the optimal distortion-rate function for i.i.d Gaussian sources under the squared-error distortion criterion. For a given rate, the parameters of the design matrix can be varied to trade off distortion performance with encoding complexity. An example of such a trade-off as a function of the block length n is the following. With computational resource (space or time) per source sample of O((n/\log n)^2), for a fixed distortion-level above the Gaussian distortion-rate function, the probability of excess distortion decays exponentially in n. The Sparse Regression Code is robust in the following sense: for any ergodic source, the proposed encoder achieves the optimal distortion-rate function of an i.i.d Gaussian source with the same variance. Simulations show that the encoder has good empirical performance, especially at low and moderate rates.
  • This paper considers a binary channel with deletions and insertions, where each input bit is transformed in one of the following ways: it is deleted with probability d, or an extra bit is added after it with probability i, or it is transmitted unmodified with probability 1-d-i. A computable lower bound on the capacity of this channel is derived. The transformation of the input sequence by the channel may be viewed in terms of runs as follows: some runs of the input sequence get shorter/longer, some runs get deleted, and some new runs are added. It is difficult for the decoder to synchronize the channel output sequence to the transmitted codeword mainly due to deleted runs and new inserted runs. The main idea is a mutual information decomposition in terms of the rate achieved by a sub-optimal decoder that determines the positions of the deleted and inserted runs in addition to decoding the transmitted codeword. The mutual information between the channel input and output sequences is expressed as the sum of the rate achieved by this decoder and the rate loss due to its sub-optimality. Obtaining computable lower bounds on each of these quantities yields a lower bound on the capacity. The bounds proposed in this paper provide the first characterization of achievable rates for channels with general insertions, and for channels with both deletions and insertions. For the special case of the deletion channel, the proposed bound improves on the previous best lower bound for deletion probabilities up to 0.3.
  • Many storage channels admit reading and rewriting of the content at a given cost. We consider rewritable channels with a hidden state which models the unknown characteristics of the memory cell. In addition to mitigating the effect of the write noise, rewrites can help the write controller obtain a better estimate of the hidden state. The paper has two contributions. The first is a lower bound on the capacity of a general rewritable channel with hidden state. The lower bound is obtained using a coding scheme that combines Gelfand-Pinsker coding with superposition coding. The rewritable AWGN channel is discussed as an example. The second contribution is a simple coding scheme for a rewritable channel where the write noise and hidden state are both uniformly distributed. It is shown that this scheme is asymptotically optimal as the number of rewrites gets large.
  • We address the question of convergence in the loopy belief propagation (LBP) algorithm. Specifically, we relate convergence of LBP to the existence of a weak limit for a sequence of Gibbs measures defined on the LBP s associated computation tree.Using tools FROM the theory OF Gibbs measures we develop easily testable sufficient conditions FOR convergence.The failure OF convergence OF LBP implies the existence OF multiple phases FOR the associated Gibbs specification.These results give new insight INTO the mechanics OF the algorithm.
  • We study a new class of codes for Gaussian multi-terminal source and channel coding. These codes are designed using the statistical framework of high-dimensional linear regression and are called Sparse Superposition or Sparse Regression codes. Codewords are linear combinations of subsets of columns of a design matrix. These codes were recently introduced by Barron and Joseph and shown to achieve the channel capacity of AWGN channels with computationally feasible decoding. They have also recently been shown to achieve the optimal rate-distortion function for Gaussian sources. In this paper, we demonstrate how to implement random binning and superposition coding using sparse regression codes. In particular, with minimum-distance encoding/decoding it is shown that sparse regression codes attain the optimal information-theoretic limits for a variety of multi-terminal source and channel coding problems.
  • Gaussian belief propagation (GaBP) is an iterative algorithm for computing the mean of a multivariate Gaussian distribution, or equivalently, the minimum of a multivariate positive definite quadratic function. Sufficient conditions, such as walk-summability, that guarantee the convergence and correctness of GaBP are known, but GaBP may fail to converge to the correct solution given an arbitrary positive definite quadratic function. As was observed in previous work, the GaBP algorithm fails to converge if the computation trees produced by the algorithm are not positive definite. In this work, we will show that the failure modes of the GaBP algorithm can be understood via graph covers, and we prove that a parameterized generalization of the min-sum algorithm can be used to ensure that the computation trees remain positive definite whenever the input matrix is positive definite. We demonstrate that the resulting algorithm is closely related to other iterative schemes for quadratic minimization such as the Gauss-Seidel and Jacobi algorithms. Finally, we observe, empirically, that there always exists a choice of parameters such that the above generalization of the GaBP algorithm converges.
  • The max-product algorithm, a local message-passing scheme that attempts to compute the most probable assignment (MAP) of a given probability distribution, has been successfully employed as a method of approximate inference for applications arising in coding theory, computer vision, and machine learning. However, the max-product algorithm is not guaranteed to converge to the MAP assignment, and if it does, is not guaranteed to recover the MAP assignment. Alternative convergent message-passing schemes have been proposed to overcome these difficulties. This work provides a systematic study of such message-passing algorithms that extends the known results by exhibiting new sufficient conditions for convergence to local and/or global optima, providing a combinatorial characterization of these optima based on graph covers, and describing a new convergent and correct message-passing algorithm whose derivation unifies many of the known convergent message-passing algorithms. While convergent and correct message-passing algorithms represent a step forward in the analysis of max-product style message-passing algorithms, the conditions needed to guarantee convergence to a global optimum can be too restrictive in both theory and practice. This limitation of convergent and correct message-passing schemes is characterized by graph covers and illustrated by example.
  • The paper provides a simpler method for proving a delicate inequality that was used by Achlioptis and Naor to establish asymptotic concentration for chromatic numbers of Erdos-Renyi random graphs. The simplifications come from two new ideas. The first involves a sharpened form of a piece of statistical folklore regarding goodness-of-fit tests for two-way tables of Poisson counts under linear conditioning constraints. The second idea takes the form of a new inequality that controls the extreme tails of the distribution of a quadratic form in independent Poissons random variables.
  • Variable length communication over a compound channel with feedback is considered. Traditionally, capacity of a compound channel without feedback is defined as the maximum rate that is determined before the start of communication such that communication is reliable. This traditional definition is pessimistic. In the presence of feedback, an opportunistic definition is given. Capacity is defined as the maximum rate that is determined at the end of communication such that communication is reliable. Thus, the transmission rate can adapt to the channel chosen by nature. Under this definition, feedback communication over a compound channel is conceptually similar to multi-terminal communication. Transmission rate is a vector rather than a scalar; channel capacity is a region rather than a scalar; error exponent is a region rather than a scalar. In this paper, variable length communication over a compound channel with feedback is formulated, its opportunistic capacity region is characterized, and lower bounds for its error exponent region are provided..
  • In this paper, we propose capacity-achieving communication schemes for Gaussian finite-state Markov channels (FSMCs) subject to an average channel input power constraint, under the assumption that the transmitters can have access to delayed noiseless output feedback as well as instantaneous or delayed channel state information (CSI). We show that the proposed schemes reveals connections between feedback communication and feedback control.
  • Inference of the network structure (e.g., routing topology) and dynamics (e.g., link performance) is an essential component in many network design and management tasks. In this paper we propose a new, general framework for analyzing and designing routing topology and link performance inference algorithms using ideas and tools from phylogenetic inference in evolutionary biology. The framework is applicable to a variety of measurement techniques. Based on the framework we introduce and develop several polynomial-time distance-based inference algorithms with provable performance. We provide sufficient conditions for the correctness of the algorithms. We show that the algorithms are consistent (return correct topology and link performance with an increasing sample size) and robust (can tolerate a certain level of measurement errors). In addition, we establish certain optimality properties of the algorithms (i.e., they achieve the optimal $l_\infty$-radius) and demonstrate their effectiveness via model simulation.
  • The error exponent of Markov channels with feedback is studied in the variable-length block-coding setting. Burnashev's classic result is extended and a single letter characterization for the reliability function of finite-state Markov channels is presented, under the assumption that the channel state is causally observed both at the transmitter and at the receiver side. Tools from stochastic control theory are used in order to treat channels with intersymbol interference. In particular the convex analytical approach to Markov decision processes is adopted to handle problems with stopping time horizons arising from variable-length coding schemes.
  • For a stationary additive Gaussian-noise channel with a rational noise power spectrum of a finite-order $L$, we derive two new results for the feedback capacity under an average channel input power constraint. First, we show that a very simple feedback-dependent Gauss-Markov source achieves the feedback capacity, and that Kalman-Bucy filtering is optimal for processing the feedback. Based on these results, we develop a new method for optimizing the channel inputs for achieving the Cover-Pombra block-length-$n$ feedback capacity by using a dynamic programming approach that decomposes the computation into $n$ sequentially identical optimization problems where each stage involves optimizing $O(L^2)$ variables. Second, we derive the explicit maximal information rate for stationary feedback-dependent sources. In general, evaluating the maximal information rate for stationary sources requires solving only a few equations by simple non-linear programming. For first-order autoregressive and/or moving average (ARMA) noise channels, this optimization admits a closed form maximal information rate formula. The maximal information rate for stationary sources is a lower bound on the feedback capacity, and it equals the feedback capacity if the long-standing conjecture, that stationary sources achieve the feedback capacity, holds.
  • We introduce a general framework for treating channels with memory and feedback. First, we generalize Massey's concept of directed information and use it to characterize the feedback capacity of general channels. Second, we present coding results for Markov channels. This requires determining appropriate sufficient statistics at the encoder and decoder. Third, a dynamic programming framework for computing the capacity of Markov channels is presented. Fourth, it is shown that the average cost optimality equation (ACOE) can be viewed as an implicit single-letter characterization of the capacity. Fifth, scenarios with simple sufficient statistics are described.