
Processing data collected by a network of agents often boils down to solving
an optimization problem. The distributed nature of these problems calls for
methods that are, themselves, distributed. While most collaborative learning
problems require agents to reach a common (or consensus) model, there are
situations in which the consensus solution may not be optimal. For instance,
agents may want to reach a compromise between agreeing with their neighbors and
minimizing a personal loss function. We present DJAM, a Jacobilike distributed
algorithm for learning personalized models. This method is
implementationfriendly: it has no hyperparameters that need tuning, it is
asynchronous, and its updates only require singleneighbor interactions. We
prove that DJAM converges with probability one to the solution, provided that
the personal loss functions are strongly convex and have Lipschitz gradient. We
then give evidence that DJAM is on par with stateoftheart methods: our
method reaches a solution with error similar to the error of a carefully tuned
ADMM in about the same number of singleneighbor interactions.

We construct a novel preorder on the set of nodes of a simple undirected
graph. We prove that the preorder (induced by the topology of the graph) is
preserved, e.g., by the logistic dynamical system (both in discrete and
continuous time). Moreover, the underlying equivalence relation of the preorder
corresponds to the coarsest equitable partition (CEP). This will further imply
that the logistic dynamical system on a graph preserves its coarsest equitable
partition. The results provide a nontrivial invariant set for the logistic and
the like dynamical systems, as we show. We note that our construct provides a
functional characterization for the CEP as an alternative to the pure set
theoretical iterated degree sequences characterization. The construct and
results presented might have independent interest for analysis on graphs or
qualitative analysis of dynamical systems over networks.

We establish a functional weak law of large numbers for observable
macroscopic state variables of interacting particle systems (e.g., voter and
contact processes) over fast timevarying sparse random networks of
interactions. We show that, as the number of agents $N$ grows large, the
proportion of agents $\left(\overline{Y}_{k}^{N}(t)\right)$ at a certain state
$k$ converges in distribution  or, more precisely, weakly with respect to the
uniform topology on the space of \emph{c\`adl\`ag} sample paths  to the
solution of an ordinary differential equation over any compact interval
$\left[0,T\right]$. Although the limiting process is Markov, the prelimit
processes, i.e., the normalized macrostate vector processes
$\left(\mathbf{\overline{Y}}^{N}(t)\right)=\left(\overline{Y}_{1}^{N}(t),\ldots,\overline{Y}_{K}^{N}(t)\right)$,
are nonMarkov as they are tied to the \emph{highdimensional} microscopic
state of the system, which precludes the direct application of standard
arguments for establishing weak convergence. The techniques developed in the
paper for establishing weak convergence might be of independent interest.

Fictitious play (FP) is a canonical gametheoretic learning algorithm which
has been deployed extensively in decentralized control scenarios. However
standard treatments of FP, and of many other gametheoretic models, assume
rather idealistic conditions which rarely hold in realistic control scenarios.
This paper considers a broad class of best response learning algorithms, that
we refer to as FPtype algorithms. In such an algorithm, given some (possibly
limited) information about the history of actions, each individual forecasts
the future play and chooses a (myopic) best action given their forecast. We
provide a unifed analysis of the behavior of FPtype algorithms under an
important class of perturbations, thus demonstrating robustness to deviations
from the idealistic operating conditions that have been previously assumed.
This robustness result is then used to derive convergence results for two
controlrelevant relaxations of standard gametheoretic applications:
distributed (networkbased) implementation without full observability and
asynchronous deployment (including in continuous time). In each case the
results follow as a direct consequence of the main robustness result.

We address the sensor network localization problem given noisy range
measurements between pairs of nodes. We approach the nonconvex
maximumlikelihood formulation via a known simple convex relaxation. We exploit
its favorable optimization properties to the full to obtain an approach that:
is completely distributed, has a simple implementation at each node, and
capitalizes on an optimal gradient method to attain fast convergence. We offer
a parallel but also an asynchronous flavor, both with theoretical convergence
guarantees and iteration complexity analysis. Experimental results establish
leading performance. Our algorithms top the accuracy of a comparable state of
the art method by one order of magnitude, using one order of magnitude fewer
communications.

With the help of an inband fullduplex relay station, it is possible to
simultaneously transmit and receive signals from multiple users. The
performance of such system can be greatly increased when the relay station is
equipped with a large number of antennas on both transmitter and receiver
sides. In this paper, we exploit the use of massive arrays to effectively
suppress the loopback interference (LI) of a decodeandforward relay (DF) and
evaluate the performance of the endtoend (e2e) transmission. This paper
assumes imperfect channel state information is available at the relay and
designs a minimum meansquare error (MMSE) filter to mitigate the interference.
Subsequently, we adopt zeroforcing (ZF) filters for both detection and
beamforming. The performance of such system is evaluated in terms of bit error
rate (BER) at both relay and destinations, and an optimal choice for the
transmission power at the relay is shown. We then propose a complexity
efficient optimal power allocation (OPA) algorithm that, using the channel
statistics, computes the minimum power that satisfies the rate constraints of
each pair. The results obtained via simulation show that when both MMSE
filtering and OPA method are used, better values for the energy efficiency are
attained.

We find large deviations rates for consensusbased distributed inference for
directed networks. When the topology is deterministic, we establish the large
deviations principle and find exactly the corresponding rate function, equal at
all nodes. We show that the dependence of the rate function on the stochastic
weight matrix associated with the network is fully captured by its left
eigenvector corresponding to the unit eigenvalue. Further, when the sensors'
observations are Gaussian, the rate function admits a closed form expression.
Motivated by these observations, we formulate the optimal network design
problem of finding the left eigenvector which achieves the highest value of the
rate function, for a given target accuracy. This eigenvector therefore
minimizes the time that the inference algorithm needs to reach the desired
accuracy. For Gaussian observations, we show that the network design problem
can be formulated as a semidefinite (convex) program, and hence can be solved
efficiently. When observations are identically distributed across agents, the
system exhibits an interesting property: the graph of the rate function always
lies between the graphs of the rate function of an isolated node and the rate
function of a fusion center that has access to all observations. We prove that
this fundamental property holds even when the topology and the associated
system matrices change randomly over time, with arbitrary distribution. Due to
generality of its assumptions, the latter result requires more subtle
techniques than the standard large deviations tools, contributing to the
general theory of large deviations.

The paper studies the highly prototypical Fictitious Play (FP) algorithm, as
well as a broad class of learning processes based on bestresponse dynamics,
that we refer to as FPtype algorithms. A wellknown shortcoming of FP is that,
while players may learn an equilibrium strategy in some abstract sense, there
are no guarantees that the periodbyperiod strategies generated by the
algorithm actually converge to equilibrium themselves. This issue is
fundamentally related to the discontinuous nature of the best response
correspondence and is inherited by many FPtype algorithms. Not only does it
cause problems in the interpretation of such algorithms as a mechanism for
economic and social learning, but it also greatly diminishes the practical
value of these algorithms for use in distributed control. We refer to forms of
learning in which players learn equilibria in some abstract sense only (to be
defined more precisely in the paper) as weak learning, and we refer to forms of
learning where players' periodbyperiod strategies converge to equilibrium as
strong learning. An approach is presented for modifying an FPtype algorithm
that achieves weak learning in order to construct a variant that achieves
strong learning. Theoretical convergence results are proved.

Empirical Centroid Fictitious Play (ECFP) is a generalization of the
wellknown Fictitious Play (FP) algorithm designed for implementation in
largescale games. In ECFP, the set of players is subdivided into equivalence
classes with players in the same class possessing similar properties. Players
choose a nextstage action by tracking and responding to aggregate statistics
related to each equivalence class. This setup alleviates the difficult task of
tracking and responding to the statistical behavior of every individual player,
as is the case in traditional FP. Aside from ECFP, many useful modifications
have been proposed to classical FP, e.g., rules allowing for networkbased
implementation, increased computational efficiency, and stronger forms of
learning. Such modifications tend to be of great practical value; however,
their effectiveness relies heavily on two fundamental properties of FP:
robustness to alterations in the empirical distribution step size process, and
robustness to bestresponse perturbations. The main contribution of the paper
is to show that similar robustness properties also hold for the ECFP algorithm.
This result serves as a first step in enabling practical modifications to ECFP,
similar to those already developed for FP.

The paper is concerned with distributed learning in largescale games. The
wellknown fictitious play (FP) algorithm is addressed, which, despite
theoretical convergence results, might be impractical to implement in
largescale settings due to intense computation and communication requirements.
An adaptation of the FP algorithm, designated as the empirical centroid
fictitious play (ECFP), is presented. In ECFP players respond to the centroid
of all players' actions rather than track and respond to the individual actions
of every player. Convergence of the ECFP algorithm in terms of average
empirical frequency (a notion made precise in the paper) to a subset of the
Nash equilibria is proven under the assumption that the game is a potential
game with permutation invariant potential function. A more general formulation
of ECFP is then given (which subsumes FP as a special case) and convergence
results are given for the class of potential games. Furthermore, a distributed
formulation of the ECFP algorithm is presented, in which, players endowed with
a (possibly sparse) preassigned communication graph, engage in local,
nonstrategic information exchange to eventually agree on a common equilibrium.
Convergence results are proven for the distributed ECFP algorithm.

We propose a simple, stable and distributed algorithm which directly
optimizes the nonconvex maximum likelihood criterion for sensor network
localization, with no need to tune any free parameter. We reformulate the
problem to obtain a gradient Lipschitz cost; by shifting to this cost function
we enable a MajorizationMinimization (MM) approach based on quadratic upper
bounds that decouple across nodes; the resulting algorithm happens to be
distributed, with all nodes working in parallel. Our method inherits the MM
stability: each communication cuts down the cost function. Numerical
simulations indicate that the proposed approach tops the performance of the
state of the art algorithm, both in accuracy and communication cost.

We study distributed optimization where nodes cooperatively minimize the sum
of their individual, locally known, convex costs $f_i(x)$'s, $x \in {\mathbb
R}^d$ is global. Distributed augmented Lagrangian (AL) methods have good
empirical performance on several signal processing and learning applications,
but there is limited understanding of their convergence rates and how it
depends on the underlying network. This paper establishes globally linear
(geometric) convergence rates of a class of deterministic and randomized
distributed AL methods, when the $f_i$'s are twice continuously differentiable
and have a bounded Hessian. We give explicit dependence of the convergence
rates on the underlying network parameters. Simulations illustrate our
analytical findings.

We study distributed optimization problems when $N$ nodes minimize the sum of
their individual costs subject to a common vector variable. The costs are
convex, have Lipschitz continuous gradient (with constant $L$), and bounded
gradient. We propose two fast distributed gradient algorithms based on the
centralized Nesterov gradient algorithm and establish their convergence rates
in terms of the pernode communications $\mathcal{K}$ and the pernode gradient
evaluations $k$. Our first method, Distributed Nesterov Gradient, achieves
rates $O\left({\log \mathcal{K}}/{\mathcal{K}}\right)$ and $O\left({\log
k}/{k}\right)$. Our second method, Distributed Nesterov gradient with Consensus
iterations, assumes at all nodes knowledge of $L$ and $\mu(W)$  the second
largest singular value of the $N \times N$ doubly stochastic weight matrix $W$.
It achieves rates $O\left({1}/{\mathcal{K}^{2\xi}}\right)$ and
$O\left({1}/{k^2}\right)$ ($\xi>0$ arbitrarily small). Further, we give with
both methods explicit dependence of the convergence constants on $N$ and $W$.
Simulation examples illustrate our findings.

We consider distributed optimization in random networks where N nodes
cooperatively minimize the sum \sum_{i=1}^N f_i(x) of their individual convex
costs. Existing literature proposes distributed gradientlike methods that are
computationally cheap and resilient to link failures, but have slow convergence
rates. In this paper, we propose accelerated distributed gradient methods that:
1) are resilient to link failures; 2) computationally cheap; and 3) improve
convergence rates over other gradient methods. We model the network by a
sequence of independent, identically distributed random matrices {W(k)} drawn
from the set of symmetric, stochastic matrices with positive diagonals. The
network is connected on average and the cost functions are convex,
differentiable, with Lipschitz continuous and bounded gradients. We design two
distributed Nesterovlike gradient methods that modify the DNG and DNC
methods that we proposed for static networks. We prove their convergence rates
in terms of the expected optimality gap at the cost function. Let k and K be
the number of pernode gradient evaluations and pernode communications,
respectively. Then the modified DNG achieves rates O(log k/k) and O(\log K/K),
and the modified DNC rates O(1/k^2) and O(1/K^{2\xi}), where \xi>0 is
arbitrarily small. For comparison, the standard distributed gradient method
cannot do better than \Omega(1/k^{2/3}) and \Omega(1/K^{2/3}), on the same
class of cost functions (even for static networks). Simulation examples
illustrate our analytical findings.

Single virus epidemics over complete networks are widely explored in the
literature as the fraction of infected nodes is, under appropriate microscopic
modeling of the virus infection, a Markov process. With noncomplete networks,
this macroscopic variable is no longer Markov. In this paper, we study virus
diffusion, in particular, multivirus epidemics, over noncomplete stochastic
networks. We focus on multipartite networks. In companying work
http://arxiv.org/abs/1306.6198, we show that the peertopeer local random
rules of virus infection lead, in the limit of large multipartite networks, to
the emergence of structured dynamics at the macroscale. The exact fluid limit
evolution of the fraction of nodes infected by each virus strain across islands
obeys a set of nonlinear coupled differential equations, see
http://arxiv.org/abs/1306.6198. In this paper, we develop methods to analyze
the qualitative behavior of these limiting dynamics, establishing conditions on
the virus micro characteristics and network structure under which a virus
persists or a natural selection phenomenon is observed.

Epidemics in large complete networks is well established. In contrast, we
consider epidemics in noncomplete networks. We establish the fluid limit
macroscopic dynamics of a multivirus spread over a multipartite network as the
number of nodes at each partite or island grows large. The virus spread follows
a peertopeer random rule of infection in line with the Harris contact
process. The model conforms to an SIS (susceptibleinfectedsusceptible) type,
where a node is either infected or it is healthy and prone to be infected. The
local (at node level) random infection model induces the emergence of
structured dynamics at the macroscale. Namely, we prove that, as the
multipartite network grows large, the normalized Markov jump vector process
$\left(\bar{\mathbf{Y}}^\mathbf{N}(t)\right) =
\left(\bar{Y}_1^\mathbf{N}(t),\ldots, \bar{Y}_M^\mathbf{N}(t)\right)$
collecting the fraction of infected nodes at each island $i=1,\ldots,M$,
converges weakly (with respect to the Skorokhod topology on the space of
\emph{c\`{a}dl\`{a}g} sample paths) to the solution of an $M$dimensional
vector nonlinear coupled ordinary differential equation. In the case of
multivirus diffusion with $K\in\mathbb{N}$ distinct strains of virus, the
Markov jurmp matrix process $\left(\bar{\mathbf{Y}}^\mathbf{N}(t)\right)$,
stacking the fraction of nodes infected with virus type $j$, $j=1,\ldots,K$, at
each island $i=1,\ldots,M$, converges weakly as well to the solution of a
$\left(K\times M\right)$dimensional vector differential equation that is also
characterized.

This paper considers the problem of filter design with secrecy constraints,
where two legitimate parties (Alice and Bob) communicate in the presence of an
eavesdropper (Eve), over a Gaussian multipleinputmultipleoutput (MIMO)
wiretap channel. This problem involves designing, subject to a power
constraint, the transmit and the receive filters which minimize the
meansquared error (MSE) between the legitimate parties whilst assuring that
the eavesdropper MSE remains above a certain threshold. We consider a general
MIMO Gaussian wiretap scenario, where the legitimate receiver uses a linear
ZeroForcing (ZF) filter and the eavesdropper receiver uses either a ZF or an
optimal linear Wiener filter. We provide a characterization of the optimal
filter designs by demonstrating the convexity of the optimization problems. We
also provide generalizations of the filter designs from the scenario where the
channel state is known to all the parties to the scenario where there is
uncertainty in the channel state. A set of numerical results illustrates the
performance of the novel filter designs, including the robustness to channel
modeling errors. In particular, we assess the efficacy of the designs in
guaranteeing not only a certain MSE level at the eavesdropper, but also in
limiting the error probability at the eavesdropper. We also assess the impact
of the filter designs on the achievable secrecy rates. The penalty induced by
the fact that the eavesdropper may use the optimal nonlinear receive filter
rather than the optimal linear one is also explored in the paper.

We present DCOOLNET, a scalable distributed innetwork algorithm for sensor
network localization based on noisy range measurements. DCOOLNET operates by
parallel, collaborative message passing between singlehop neighbor sensors,
and involves simple computations at each node. It stems from an application of
the majorizationminimization (MM) framework to the nonconvex optimization
problem at hand, and capitalizes on a novel convex majorizer. The proposed
majorizer is endowed with several desirable properties and represents a key
contribution of this work. It is a more accurate match to the underlying
nonconvex cost function than popular MM quadratic majorizers, and is readily
amenable to distributed minimization via the alternating direction method of
multipliers (ADMM). Moreover, it allows for lowcomplexity, fast Nesterov
gradient methods to tackle the ADMM subproblems induced at each node. Computer
simulations show that DCOOLNET achieves comparable or better sensor position
accuracies than a stateofart method which, furthermore, is not parallel.

We establish the large deviations asymptotic performance (error exponent) of
consensus+innovations distributed detection over random networks with generic
(nonGaussian) sensor observations. At each time instant, sensors 1) combine
theirs with the decision variables of their neighbors (consensus) and 2)
assimilate their new observations (innovations). This paper shows for general
nonGaussian distributions that consensus+innovations distributed detection
exhibits a phase transition behavior with respect to the network degree of
connectivity. Above a threshold, distributed is as good as centralized, with
the same optimal asymptotic detection performance, but, below the threshold,
distributed detection is suboptimal with respect to centralized detection. We
determine this threshold and quantify the performance loss below threshold.
Finally, we show the dependence of the threshold and performance on the
distribution of the observations: distributed detectors over the same random
network, but with different observations' distributions, for example, Gaussian,
Laplace, or quantized, may have different asymptotic performance, even when the
corresponding centralized detectors have the same asymptotic performance.

Distributed consensus and other linear systems with system stochastic
matrices $W_k$ emerge in various settings, like opinion formation in social
networks, rendezvous of robots, and distributed inference in sensor networks.
The matrices $W_k$ are often random, due to, e.g., random packet dropouts in
wireless sensor networks. Key in analyzing the performance of such systems is
studying convergence of matrix products $W_kW_{k1}... W_1$. In this paper, we
find the exact exponential rate $I$ for the convergence in probability of the
product of such matrices when time $k$ grows large, under the assumption that
the $W_k$'s are symmetric and independent identically distributed in time.
Further, for commonly used random models like with gossip and link failure, we
show that the rate $I$ is found by solving a mincut problem and, hence, easily
computable. Finally, we apply our results to optimally allocate the sensors'
transmission power in consensus+innovations distributed detection.

This work considers the problem of locating a single source from noisy range
measurements to a set of nodes in a wireless sensor network. We propose two new
techniques that we designate as Source Localization with Nuclear Norm (SLNN)
and Source Localization with l1norm (SLl1), which extend to arbitrary real
dimensions, including 3D, our prior work on 2D source localization formulated
in the complex plane. Broadly, our approach is based on formulating a
MaximumLikelihood (ML) estimation problem for the source position, and then
using convex relaxation techniques to obtain a semidefinite program (SDP) that
can be globally and efficiently solved. SLNN directly approximates the Gaussian
ML solution, and the relaxation is shown to be tighter than in other methods in
the same class. We present an analysis of the convexity properties of the
constraint set for the 2D complex version of SLNN (SLCP) to justify the
observed tightness of the relaxation. In terms of global accuracy of
localization, SLNN outperforms stateoftheart optimizationbased methods with
either iterative or closedform formulations. We propose the SLl1 algorithm to
address the Laplacian noise case, which models the presence of outliers in
range measurements. We overcome the nondifferentiability of the Laplacian
likelihood function by rewriting the ML problem as an exact weighted version of
the Gaussian case, and compare two solution strategies. One of them is
iterative, based on block coordinate descent, and uses SLNN as a subprocessing
block. The other, attaining only slightly worse performance, is noniterative
and based on an SDP relaxation of the weighted ML problem.

We study the large deviations performance of consensus+innovations
distributed detection over noisy networks, where sensors at a time step k
cooperate with immediate neighbors (consensus) and assimilate their new
observations (innovation.) We show that, even under noisy communication,
\emph{all sensors} can achieve exponential decay e^{k C_{\mathrm{dis}}} of the
detection error probability, even when certain (or most) sensors cannot detect
the event of interest in isolation. We achieve this by designing a single time
scale stochastic approximation type distributed detector with the optimal
weight sequence {\alpha_k}, by which sensors weigh their neighbors' messages.
The optimal design of {\alpha_k} balances the opposing effects of communication
noise and information flow from neighbors: larger, slowly decaying \alpha_k
improves information flow but injects more communication noise. Further, we
quantify the best achievable C_{\mathrm{dis}} as a function of the sensing
signal and noise, communication noise, and network connectivity. Finally, we
find a threshold on the communication noise power below which a sensor that can
detect the event in isolation still improves its detection by cooperation
through noisy links.

We study distributed optimization in networked systems, where nodes cooperate
to find the optimal quantity of common interest, x=x^\star. The objective
function of the corresponding optimization problem is the sum of private (known
only by a node,) convex, nodes' objectives and each node imposes a private
convex constraint on the allowed values of x. We solve this problem for generic
connected network topologies with asymmetric random link failures with a novel
distributed, decentralized algorithm. We refer to this algorithm as ALG
(augmented Lagrangian gossiping,) and to its variants as ALMG (augmented
Lagrangian multi neighbor gossiping) and ALBG (augmented Lagrangian broadcast
gossiping.) The ALG algorithm is based on the augmented Lagrangian dual
function. Dual variables are updated by the standard method of multipliers, at
a slow time scale. To update the primal variables, we propose a novel,
GaussSeidel type, randomized algorithm, at a fast time scale. ALG uses
unidirectional gossip communication, only between immediate neighbors in the
network and is resilient to random link failures. For networks with reliable
communication (i.e., no failures,) the simplified, ALBG (augmented Lagrangian
broadcast gossiping) algorithm reduces communication, computation and data
storage cost. We prove convergence for all proposed algorithms and demonstrate
by simulations the effectiveness on two applications: l_1regularized logistic
regression for classification and cooperative spectrum sensing for cognitive
radio networks.

We study the large deviations performance, i.e., the exponential decay rate
of the error probability, of distributed detection algorithms over random
networks. At each time step $k$ each sensor: 1) averages its decision variable
with the neighbors' decision variables; and 2) accounts onthefly for its new
observation. We show that distributed detection exhibits a "phase change"
behavior. When the rate of network information flow (the speed of averaging) is
above a threshold, then distributed detection is asymptotically equivalent to
the optimal centralized detection, i.e., the exponential decay rate of the
error probability for distributed detection equals the Chernoff information.
When the rate of information flow is below a threshold, distributed detection
achieves only a fraction of the Chernoff information rate; we quantify this
achievable rate as a function of the network rate of information flow.
Simulation examples demonstrate our theoretical findings on the behavior of
distributed detection over random networks.

We consider the problem of sensor selection for event detection in wireless
sensor networks (WSNs). We want to choose a subset of p out of n sensors that
yields the best detection performance. As the sensor selection optimality
criteria, we propose the KullbackLeibler and Chernoff distances between the
distributions of the selected measurements under the two hypothesis. We
formulate the maxmin robust sensor selection problem to cope with the
uncertainties in distribution means. We prove that the sensor selection problem
is NP hard, for both KullbackLeibler and Chernoff criteria. To (sub)optimally
solve the sensor selection problem, we propose an algorithm of affordable
complexity. Extensive numerical simulations on moderate size problem instances
(when the optimum by exhaustive search is feasible to compute) demonstrate the
algorithm's near optimality in a very large portion of problem instances. For
larger problems, extensive simulations demonstrate that our algorithm
outperforms random searches, once an upper bound on computational time is set.
We corroborate numerically the validity of the KullbackLeibler and Chernoff
sensor selection criteria, by showing that they lead to sensor selections
nearly optimal both in the NeymanPearson and Bayes sense.