• We study an open discrete-time queueing network that models the collection of data in a multi-hop sensor network. We assume data is generated at the sensor nodes as a discrete-time Bernoulli process. All nodes in the network maintain a queue and relay data, which is to be finally collected by a designated sink. We prove that the resulting multi-dimensional Markov chain representing the queue size of nodes has two behavior regimes depending on the value of the rate of data generation. In particular, we show that there is a non-trivial critical value of data rate below which the chain is ergodic and converges to a stationary distribution and above which it is non-ergodic, i.e., the queues at the nodes grow in an unbounded manner. We show that the rate of convergence to stationarity is geometric in the sub-critical regime. We also show the connections of this process to a class of Laplacian systems of equations whose solutions include the important problem of finding the effective resistance between two nodes, a subroutine that has been widely used to develop efficient algorithms for a number of computational problems. Hence our work provides the theoretical basis for a new class of distributed algorithms for these problems.
  • We study in-network computation on general network topologies. Specifically, we are given the description of a function, and a network with distinct nodes at which the operands of the function are made available, and a designated sink where the computed value of the function is to be consumed. We want to compute the function during the process of moving the data towards the sink. Such settings have been studied in the literature, but mainly for symmetric functions, e.g. average, parity etc., which have the specific property that the output is invariant to permutation of the operands. To the best of our knowledge, we present the first fully decentralised algorithms for arbitrary functions, which we model as those functions whose computation schema is structured as a binary tree. We propose two algorithms, Fixed Random-Compute and Flexible Random-Compute, for this problem, both of which use simple random walks on the network as their basic primitive. Assuming a stochastic model for the generation of streams of data at each source, we provide a lower and an upper bound on the rate at which Fixed Random-Compute can compute the stream of associated function values. Note that the lower bound on rate though computed for our algorithm serves as a general lower bound for the function computation problem and to the best of our knowledge is first such lower bound for asymmetric functions. We also provide upper bounds on the average time taken to compute the function, characterising this time in terms of the fundamental parameters of the random walk on the network: the hitting time in the case of Fixed Random-Compute, and the mixing time in the case of Flexible Random-Compute.
  • Given a capacitated communication network $\mathcal{N}$ and a function f that needs to be computed on $\mathcal{N},$ we study the problem of generating a computation and communication schedule in $\mathcal{N}$ to maximize the rate of computation of f. Shah et. al.[IEEE Journal of Selected Areas in Communication, 2013] studied this problem when the computation schema $\mathcal{G}$ for f is a tree. We define the notion of a schedule when $\mathcal{G}$ is a general DAG and show that finding an optimal schedule is equivalent to finding the solution of a packing LP. We prove that approximating the maximum rate is MAX SNP-hard by looking at the packing LP. For this packing LP we prove that solving the separation oracle of its dual is equivalent to solving the LP. The separation oracle of the dual reduces to the problem of finding minimum cost embedding given $\mathcal{N},\mathcal{G},$ which we prove to be MAX SNP-hard even when $\mathcal{G}$ has bounded degree and bounded edge weights and $\mathcal{N}$ has just three vertices. We present a polynomial time algorithm to compute the maximum rate of function computation when $\mathcal{N}$ has two vertices by reducing the problem to a version of submodular function minimization problem. For the general $\mathcal{N}$ we study restricted class of schedules and its equivalent packing LP. We observe that for this packing LP also the separation oracle of its dual reduces to finding minimum cost embedding. A version of this minimum cost embedding problem has been studied in literature. We present a quadratic integer program for the minimum cost embedding problem and its linear programming relaxation based on earthmover metric. We also present some approximate algorithms for special classes of $\mathcal{G}.$
  • We consider optimal distributed computation of a given function of distributed data. The input (data) nodes and the sink node that receives the function form a connected network that is described by an undirected weighted network graph. The algorithm to compute the given function is described by a weighted directed acyclic graph and is called the computation graph. An embedding defines the computation communication sequence that obtains the function at the sink. Two kinds of optimal embeddings are sought, the embedding that---(1)~minimizes delay in obtaining function at sink, and (2)~minimizes cost of one instance of computation of function. This abstraction is motivated by three applications---in-network computation over sensor networks, operator placement in distributed databases, and module placement in distributed computing. We first show that obtaining minimum-delay and minimum-cost embeddings are both NP-complete problems and that cost minimization is actually MAX SNP-hard. Next, we consider specific forms of the computation graph for which polynomial time solutions are possible. When the computation graph is a tree, a polynomial time algorithm to obtain the minimum delay embedding is described. Next, for the case when the function is described by a layered graph we describe an algorithm that obtains the minimum cost embedding in polynomial time. This algorithm can also be used to obtain an approximation for delay minimization. We then consider bounded treewidth computation graphs and give an algorithm to obtain the minimum cost embedding in polynomial time.
  • We consider the problem of estimating functions of distributed data using a distributed algorithm over a network. The extant literature on computing functions in distributed networks such as wired and wireless sensor networks and peer-to-peer networks deals with computing linear functions of the distributed data when the alphabet size of the data values is small, O(1). We describe a distributed randomized algorithm to estimate a class of non-linear functions of the distributed data which is over a large alphabet. We consider three types of networks: point-to-point networks with gossip based communication, random planar networks in the connectivity regime and random planar networks in the percolating regime both of which use the slotted Aloha communication protocol. For each network type, we estimate the scaled $k$-th frequency moments, for $k \geq 2$. Specifically, for every $k \geq 2,$ we give a distributed randomized algorithm that computes, with probability $(1-\delta),$ an $\epsilon$-approximation of the scaled $k$-th frequency moment, $F_k/N^k$, using time $O(M^{1-\frac{1}{k-1}} T)$ and $O(M^{1-\frac{1}{k-1}} \log N \log (\delta^{-1})/\epsilon^2)$ bits of transmission per communication step. Here, $N$ is the number of nodes in the network, $T$ is the information spreading time and $M=o(N)$ is the alphabet size.