
We study an open discretetime queueing network that models the collection of
data in a multihop sensor network. We assume data is generated at the sensor
nodes as a discretetime Bernoulli process. All nodes in the network maintain a
queue and relay data, which is to be finally collected by a designated sink. We
prove that the resulting multidimensional Markov chain representing the queue
size of nodes has two behavior regimes depending on the value of the rate of
data generation. In particular, we show that there is a nontrivial critical
value of data rate below which the chain is ergodic and converges to a
stationary distribution and above which it is nonergodic, i.e., the queues at
the nodes grow in an unbounded manner. We show that the rate of convergence to
stationarity is geometric in the subcritical regime. We also show the
connections of this process to a class of Laplacian systems of equations whose
solutions include the important problem of finding the effective resistance
between two nodes, a subroutine that has been widely used to develop efficient
algorithms for a number of computational problems. Hence our work provides the
theoretical basis for a new class of distributed algorithms for these problems.

We study innetwork computation on general network topologies. Specifically,
we are given the description of a function, and a network with distinct nodes
at which the operands of the function are made available, and a designated sink
where the computed value of the function is to be consumed. We want to compute
the function during the process of moving the data towards the sink. Such
settings have been studied in the literature, but mainly for symmetric
functions, e.g. average, parity etc., which have the specific property that the
output is invariant to permutation of the operands. To the best of our
knowledge, we present the first fully decentralised algorithms for arbitrary
functions, which we model as those functions whose computation schema is
structured as a binary tree. We propose two algorithms, Fixed RandomCompute
and Flexible RandomCompute, for this problem, both of which use simple random
walks on the network as their basic primitive. Assuming a stochastic model for
the generation of streams of data at each source, we provide a lower and an
upper bound on the rate at which Fixed RandomCompute can compute the stream of
associated function values. Note that the lower bound on rate though computed
for our algorithm serves as a general lower bound for the function computation
problem and to the best of our knowledge is first such lower bound for
asymmetric functions. We also provide upper bounds on the average time taken to
compute the function, characterising this time in terms of the fundamental
parameters of the random walk on the network: the hitting time in the case of
Fixed RandomCompute, and the mixing time in the case of Flexible
RandomCompute.

Given a capacitated communication network $\mathcal{N}$ and a function f that
needs to be computed on $\mathcal{N},$ we study the problem of generating a
computation and communication schedule in $\mathcal{N}$ to maximize the rate of
computation of f. Shah et. al.[IEEE Journal of Selected Areas in Communication,
2013] studied this problem when the computation schema $\mathcal{G}$ for f is a
tree. We define the notion of a schedule when $\mathcal{G}$ is a general DAG
and show that finding an optimal schedule is equivalent to finding the solution
of a packing LP. We prove that approximating the maximum rate is MAX SNPhard
by looking at the packing LP. For this packing LP we prove that solving the
separation oracle of its dual is equivalent to solving the LP. The separation
oracle of the dual reduces to the problem of finding minimum cost embedding
given $\mathcal{N},\mathcal{G},$ which we prove to be MAX SNPhard even when
$\mathcal{G}$ has bounded degree and bounded edge weights and $\mathcal{N}$ has
just three vertices. We present a polynomial time algorithm to compute the
maximum rate of function computation when $\mathcal{N}$ has two vertices by
reducing the problem to a version of submodular function minimization problem.
For the general $\mathcal{N}$ we study restricted class of schedules and its
equivalent packing LP. We observe that for this packing LP also the separation
oracle of its dual reduces to finding minimum cost embedding. A version of this
minimum cost embedding problem has been studied in literature. We present a
quadratic integer program for the minimum cost embedding problem and its linear
programming relaxation based on earthmover metric. We also present some
approximate algorithms for special classes of $\mathcal{G}.$

We consider optimal distributed computation of a given function of
distributed data. The input (data) nodes and the sink node that receives the
function form a connected network that is described by an undirected weighted
network graph. The algorithm to compute the given function is described by a
weighted directed acyclic graph and is called the computation graph. An
embedding defines the computation communication sequence that obtains the
function at the sink. Two kinds of optimal embeddings are sought, the embedding
that(1)~minimizes delay in obtaining function at sink, and (2)~minimizes
cost of one instance of computation of function. This abstraction is motivated
by three applicationsinnetwork computation over sensor networks, operator
placement in distributed databases, and module placement in distributed
computing.
We first show that obtaining minimumdelay and minimumcost embeddings are
both NPcomplete problems and that cost minimization is actually MAX SNPhard.
Next, we consider specific forms of the computation graph for which polynomial
time solutions are possible. When the computation graph is a tree, a polynomial
time algorithm to obtain the minimum delay embedding is described. Next, for
the case when the function is described by a layered graph we describe an
algorithm that obtains the minimum cost embedding in polynomial time. This
algorithm can also be used to obtain an approximation for delay minimization.
We then consider bounded treewidth computation graphs and give an algorithm to
obtain the minimum cost embedding in polynomial time.

We consider the problem of estimating functions of distributed data using a
distributed algorithm over a network. The extant literature on computing
functions in distributed networks such as wired and wireless sensor networks
and peertopeer networks deals with computing linear functions of the
distributed data when the alphabet size of the data values is small, O(1). We
describe a distributed randomized algorithm to estimate a class of nonlinear
functions of the distributed data which is over a large alphabet. We consider
three types of networks: pointtopoint networks with gossip based
communication, random planar networks in the connectivity regime and random
planar networks in the percolating regime both of which use the slotted Aloha
communication protocol. For each network type, we estimate the scaled $k$th
frequency moments, for $k \geq 2$. Specifically, for every $k \geq 2,$ we give
a distributed randomized algorithm that computes, with probability
$(1\delta),$ an $\epsilon$approximation of the scaled $k$th frequency
moment, $F_k/N^k$, using time $O(M^{1\frac{1}{k1}} T)$ and
$O(M^{1\frac{1}{k1}} \log N \log (\delta^{1})/\epsilon^2)$ bits of
transmission per communication step. Here, $N$ is the number of nodes in the
network, $T$ is the information spreading time and $M=o(N)$ is the alphabet
size.