
Motivated by applications arising from large scale optimization and machine
learning, we consider stochastic quasiNewton (SQN) methods for solving
unconstrained convex optimization problems. The convergence analysis of the SQN
methods, both full and limitedmemory variants, require the objective function
to be strongly convex. However, this assumption is fairly restrictive and does
not hold for applications such as minimizing the logistic regression loss
function. To the best of our knowledge, no rate statements currently exist for
SQN methods in the absence of such an assumption. Also, among the existing
firstorder methods for addressing stochastic optimization problems with merely
convex objectives, those equipped with provable convergence rates employ
averaging. However, this averaging technique has a detrimental impact on
inducing sparsity. Motivated by these gaps, the main contributions of the paper
are as follows: (i) Addressing large scale stochastic optimization problems, we
develop an iteratively regularized stochastic limitedmemory BFGS (IRSLBFGS)
algorithm, where the stepsize, regularization parameter, and the Hessian
inverse approximation matrix are updated iteratively. We establish the
convergence to an optimal solution of the original problem both in an
almostsure and mean senses. We derive the convergence rate in terms of the
objective function's values and show that it is of the order
$\mathcal{O}\left(k^{\left(\frac{1}{3}\epsilon\right)}\right)$, where
$\epsilon$ is an arbitrary small positive scalar; (ii) In deterministic regime,
we show that the regularized limitedmemory BFGS algorithm displays a rate of
the order $\mathcal{O}\left(\frac{1}{k^{1 \epsilon'}}\right)$, where
$\epsilon'$ is an arbitrary small positive scalar. We present our numerical
experiments performed on a large scale text classification problem.

In this paper, we develop a class of decentralized algorithms for solving a
convex resource allocation problem in a network of $n$ agents, where the agent
objectives are decoupled while the resource constraints are coupled. The agents
communicate over a connected undirected graph, and they want to collaboratively
determine a solution to the overall network problem, while each agent only
communicates with its neighbors. We first study the connection between the
decentralized resource allocation problem and the decentralized consensus
optimization problem. Then, using a class of algorithms for solving consensus
optimization problems, we propose a novel class of decentralized schemes for
solving resource allocation problems in a distributed manner. Specifically, we
first propose an algorithm for solving the resource allocation problem with an
$o(1/k)$ convergence rate guarantee when the agents' objective functions are
generally convex (could be nondifferentiable) and per agent local convex
constraints are allowed; We then propose a gradientbased algorithm for solving
the resource allocation problem when per agent local constraints are absent and
show that such scheme can achieve geometric rate when the objective functions
are strongly convex and have Lipschitz continuous gradients. We have also
provided scalability/network dependency analysis. Based on these two
algorithms, we have further proposed a gradient projectionbased algorithm
which can handle smooth objective and simple constraints more efficiently.
Numerical experiments demonstrates the viability and performance of all the
proposed algorithms.

In this paper, we study the problem of distributed multiagent optimization
over a network, where each agent possesses a local cost function that is smooth
and strongly convex. The global objective is to find a common solution that
minimizes the average of all cost functions. Assuming agents only have access
to unbiased estimates of the gradients of their local cost functions, we
consider a distributed stochastic gradient tracking method. We show that, in
expectation, the iterates generated by each agent are attracted to a
neighborhood of the optimal solution, where they accumulate exponentially fast
(under a constant step size choice). More importantly, the limiting (expected)
error bounds on the distance of the iterates from the optimal solution decrease
with the network size, which is a comparable performance to a centralized
stochastic gradient algorithm. Numerical examples further demonstrate the
effectiveness of the method.

We propose and analyze a new stochastic gradient method, which we call
Stochastic Unbiased Curvatureaided Gra dient (SUCAG), for finite sum
optimization problems. SUCAG constitutes an unbiased total gradient tracking
technique that uses Hessian information to accelerate convergence. We an alyze
our method under the general asynchronous model of computation, in which
functions are selected infinitely often, but with delays that can grow
sublinearly. For strongly convex problems, we establish linear convergence for
the SUCAG method. When the initialization point is sufficiently close to the
optimal solution, the established convergence rate is only dependent on the
condition number of the problem, making it strictly faster than the known rate
for the SAGA method.
Furthermore, we describe a Markovdriven approach of implementing the SUCAG
method in a distributed asynchronous multiagent setting, via gossiping along a
random walk on the communication graph. We show that our analysis applies as
long as the undirected graph is connected and, notably, establishes an
asymptotic linear convergence rate that is robust to the graph topology.
Numerical results demonstrate the merit of our algorithm over existing methods.

We propose a new classoptimal algorithm for the distributed computation of
Wasserstein Barycenters over networks. Assuming that each node in a graph has a
probability distribution, we prove that every node is able to reach the
barycenter of all distributions held in the network by using local interactions
compliant with the topology of the graph. We show the minimum number of
communication rounds required for the proposed method to achieve arbitrary
relative precision both in the optimality of the solution and the consensus
among all agents for undirected fixed networks.

In decentralized optimization, nodes cooperate to minimize an overall
objective function that is the sum (or average) of pernode private objective
functions. Algorithms interleave local computations with communication among
all or a subset of the nodes. Motivated by a variety of
applicationsdistributed estimation in sensor networks, fitting models to
massive data sets, and distributed control of multirobot systems, to name a
fewsignificant advances have been made towards the development of robust,
practical algorithms with theoretical performance guarantees. This paper
presents an overview of recent work in this area. In general, rates of
convergence depend not only on the number of nodes involved and the desired
level of accuracy, but also on the structure and nature of the network over
which nodes communicate (e.g., whether links are directed or undirected, static
or timevarying). We survey the stateoftheart algorithms and their analyses
tailored to these different scenarios, highlighting the role of the network
topology.

This paper considers a discretetime opinion dynamics model in which each
individual's susceptibility to being influenced by others is dependent on her
current opinion. We assume that the social network has timevarying topology
and that the opinions are scalars on a continuous interval. We first propose a
general opinion dynamics model based on the DeGroot model, with a general
function to describe the functional dependence of each individual's
susceptibility on her own opinion, and show that this general model is
analogous to the FriedkinJohnsen model, which assumes a constant
susceptibility for each individual. We then consider two specific functions in
which the individual's susceptibility depends on the \emph{polarity} of her
opinion, and provide motivating social examples. First, we consider stubborn
positives, who have reduced susceptibility if their opinions are at one end of
the interval and increased susceptibility if their opinions are at the opposite
end. A court jury is used as a motivating example. Second, we consider stubborn
neutrals, who have reduced susceptibility when their opinions are in the middle
of the spectrum, and our motivating examples are social networks discussing
established social norms or institutionalized behavior. For each specific
susceptibility model, we establish the initial and graph topology conditions in
which consensus is reached, and develop necessary and sufficient conditions on
the initial conditions for the final consensus value to be at either extreme of
the opinion interval. Simulations are provided to show the effects of the
susceptibility function when compared to the DeGroot model.

Epidemic processes are used commonly for modeling and analysis of biological
networks, computer networks, and human contact networks. The idea of competing
viruses has been explored recently, motivated by the spread of different ideas
along different social networks. Previous studies of competitive viruses have
focused only on two viruses and on static graph structures. In this paper, we
consider multiple competing viruses over static and dynamic graph structures,
and investigate the eradication and propagation of diseases in these systems.
Stability analysis for the class of models we consider is performed and an
antidote control technique is proposed.

We consider the problem of distributed learning, where a network of agents
collectively aim to agree on a hypothesis that best explains a set of
distributed observations of conditionally independent random processes. We
propose a distributed algorithm and establish consistency, as well as a
nonasymptotic, explicit and geometric convergence rate for the concentration
of the beliefs around the set of optimal hypotheses. Additionally, if the
agents interact over static networks, we provide an improved learning protocol
with better scalability with respect to the number of nodes in the network.

We study the problem of cooperative inference where a group of agents
interact over a network and seek to estimate a joint parameter that best
explains a set of observations. Agents do not know the network topology or the
observations of other agents. We explore a variational interpretation of the
Bayesian posterior density, and its relation to the stochastic mirror descent
algorithm, to propose a new distributed learning algorithm. We show that, under
appropriate assumptions, the beliefs generated by the proposed algorithm
concentrate around the true parameter exponentially fast. We provide explicit
nonasymptotic bounds for the convergence rate. Moreover, we develop explicit
and computationally efficient algorithms for observation models belonging to
exponential families.

We present a distributed (nonBayesian) learning algorithm for the problem of
parameter estimation with Gaussian noise. The algorithm is expressed as
explicit updates on the parameters of the Gaussian beliefs (i.e. means and
precision). We show a convergence rate of $O(1/k)$ with the constant term
depending on the number of agents and the topology of the network. Moreover, we
show almost sure convergence to the optimal solution of the estimation problem
for the general case of timevarying directed graphs.

We overview some results on distributed learning with focus on a family of
recently proposed algorithms known as nonBayesian social learning. We consider
different approaches to the distributed learning problem and its algorithmic
solutions for the case of finitely many hypotheses. The original centralized
problem is discussed at first, and then followed by a generalization to the
distributed setting. The results on convergence and convergence rate are
presented for both asymptotic and finite time regimes. Various extensions are
discussed such as those dealing with directed timevarying networks, Nesterov's
acceleration technique and a continuum sets of hypothesis.

A recent algorithmic family for distributed optimization, DIGing's, have been
shown to have geometric convergence over timevarying undirected/directed
graphs. Nevertheless, an identical stepsize for all agents is needed. In this
paper, we study the convergence rates of the AdaptThenCombine (ATC) variation
of the DIGing algorithm under uncoordinated stepsizes. We show that the ATC
variation of DIGing algorithm converges geometrically fast even if the
stepsizes are different among the agents. In addition, our analysis implies
that the ATC structure can accelerate convergence compared to the distributed
gradient descent (DGD) structure which has been used in the original DIGing
algorithm.

The spread of viruses in biological networks, computer networks, and human
contact networks can have devastating effects; developing and analyzing
mathematical models of these systems can be insightful and lead to societal
benefits. Prior research has focused mainly on network models with static graph
structures, however the systems being modeled typically have dynamic graph
structures. Therefore to better understand and analyze virus spread, further
study is required. In this paper, we consider virus spread models over networks
with dynamic graph structures, and investigate the behavior of diseases in
these systems. A stability analysis of epidemic processes over timevarying
networks is performed, examining conditions for the disease free equilibrium,
in both the deterministic and stochastic cases. We present simulation results,
propose a number of corollaries based on these simulations, and discuss
quarantine control via simulation.

The problem of least squares regression of a $d$dimensional unknown
parameter is considered. A stochastic gradient descent based algorithm with
weighted iterateaveraging that uses a single pass over the data is studied and
its convergence rate is analyzed. We first consider a bounded constraint set of
the unknown parameter. Under some standard regularity assumptions, we provide
an explicit $O(1/k)$ upper bound on the convergence rate, depending on the
variance (due to the additive noise in the measurements) and the size of the
constraint set. We show that the variance term dominates the error and
decreases with rate $1/k$, while the term which is related to the size of the
constraint set decreases with rate $\log k/k^2$. We then compare the asymptotic
ratio $\rho$ between the convergence rate of the proposed scheme and the
empirical risk minimizer (ERM) as the number of iterations approaches infinity.
We show that $\rho\leq 4$ under some mild conditions for all $d\geq 1$. We
further improve the upper bound by showing that $\rho\leq 4/3$ for the case of
$d=1$ and unbounded parameter set. Simulation results demonstrate strong
performance of the algorithm as compared to existing methods, and coincide with
$\rho\leq 4/3$ even for large $d$ in practice.

We consider a class of Nash games, termed as aggregative games, being played
over a networked system. In an aggregative game, a player's objective is a
function of the aggregate of all the players' decisions. Every player maintains
an estimate of this aggregate, and the players exchange this information with
their local neighbors over a connected network. We study distributed
synchronous and asynchronous algorithms for information exchange and
equilibrium computation over such a network. Under standard conditions, we
establish the almostsure convergence of the obtained sequences to the
equilibrium point. We also consider extensions of our schemes to aggregative
games where the players' objectives are coupled through a more general form of
aggregate function. Finally, we present numerical results that demonstrate the
performance of the proposed schemes.

We consider a decentralized online convex optimization problem in a network
of agents, where each agent controls only a coordinate (or a part) of the
global decision vector. For such a problem, we propose two decentralized
variants (ODAC and ODAPS) of Nesterov's primaldual algorithm with dual
averaging. In ODAC, to mitigate the disagreements on the primalvector
updates, the agents implement a generalization of the local
informationexchange dynamics recently proposed by Li and Marden over a static
undirected graph. In ODAPS, the agents implement the broadcastbased pushsum
dynamics over a timevarying sequence of uniformly connected digraphs. We show
that the regret bounds in both cases have sublinear growth of $O(\sqrt{T})$,
with the time horizon $T$, when the stepsize is of the form $1/\sqrt{t}$ and
the objective functions are Lipschitzcontinuous convex functions with
Lipschitz gradients. We also implement the proposed algorithms on a sensor
network to complement our theoretical analysis.

We consider a distributed learning setup where a network of agents
sequentially access realizations of a set of random variables with unknown
distributions. The network objective is to find a parametrized distribution
that best describes their joint observations in the sense of the
KullbackLeibler divergence. Apart from recent efforts in the literature, we
analyze the case of countably many hypotheses and the case of a continuum of
hypotheses. We provide nonasymptotic bounds for the concentration rate of the
agents' beliefs around the correct hypothesis in terms of the number of agents,
the network parameters, and the learning abilities of the agents. Additionally,
we provide a novel motivation for a general set of distributed NonBayesian
update rules as instances of the distributed stochastic mirror descent
algorithm.

Motivated by the spread of opinions on different social networks, we study a
distributed continuoustime bivirus model for a system of groups of
individuals. An indepth stability analysis is performed for more general
models than have been previously considered, for the healthy and epidemic
states. In addition, we investigate sensitivity properties of some nontrivial
equilibria and obtain an impossibility result for distributed feedback control.

Motivated by applications in optimization and machine learning, we consider
stochastic quasiNewton (SQN) methods for solving stochastic optimization
problems. In the literature, the convergence analysis of these algorithms
relies on strong convexity of the objective function. To our knowledge, no
theoretical analysis is provided for the rate statements in the absence of this
assumption. Motivated by this gap, we allow the objective function to be merely
convex and we develop a cyclic regularized SQN method where the gradient
mapping and the Hessian approximation matrix are both regularized at each
iteration and are updated in a cyclic manner. We show that, under suitable
assumptions on the stepsize and regularization parameters, the objective
function value converges to the optimal objective function of the original
problem in both almost sure and the expected senses. For each case, a class of
feasible sequences that guarantees the convergence is provided. Moreover, the
rate of convergence in terms of the objective function value is derived. Our
empirical analysis on a binary classification problem shows that the proposed
scheme performs well compared to both classic regularization SQN schemes and
stochastic approximation method.

Spectral clustering approaches have led to wellaccepted algorithms for
finding accurate clusters in a given dataset. However, their application to
largescale datasets has been hindered by computational complexity of
eigenvalue decompositions. Several algorithms have been proposed in the recent
past to accelerate spectral clustering, however they compromise on the accuracy
of the spectral clustering to achieve faster speed. In this paper, we propose a
novel spectral clustering algorithm based on a mixing process on a graph.
Unlike the existing spectral clustering algorithms, our algorithm does not
require computing eigenvectors. Specifically, it finds the equivalent of a
linear combination of eigenvectors of the normalized similarity matrix weighted
with corresponding eigenvalues. This linear combination is then used to
partition the dataset into meaningful clusters. Simulations on real datasets
show that partitioning datasets based on such linear combinations of
eigenvectors achieves better accuracy than standard spectral clustering methods
as the number of clusters increase. Our algorithm can easily be implemented in
a distributed setting.

Traditionally, stochastic approximation schemes for SVIs have relied on
strong monotonicity and Lipschitzian properties of the underlying map. In
contrast, we consider monotone stochastic variational inequality (SVI) problems
where the strong monotonicity and Lipschitzian assumptions on the mappings are
weakened. In the first part of the paper, to address such shortcomings, a
regularized smoothed SA (RSSA) scheme is developed wherein the stepsize,
smoothing, and regularization parameters are diminishing sequences updated
after every iteration. Under suitable assumptions on the sequences, we show
that the algorithm generates iterates that converge to a solution in an almost
sure sense, extending the results in [16] to the nonLipschitzian regime.
Motivated by the need to develop nonasymptotic rate statements, in the second
part of the paper, we develop a variant of the RSSA scheme, denoted by
aRSSA$_r$, in which we employ a weighted iterateaveraging, parametrized by a
scalar $r$ where $r = 1$ provides us with the standard averaging scheme. We
make several contributions in this context: First, we show that the gap
function associated with the sequences by the aRSSA$_r$ scheme tends to zero
when the parameter sequences are chosen appropriately. Second, we show that the
gap function associated with the averaged sequence diminishes to zero at the
optimal rate $\cal{O}(1/\sqrt{K})$ after $K$ steps when smoothing and
regularization are suppressed and $r < 1$, thus improving the rate statement
for the standard averaging which admits a rate of $\cal{O}(\ln(K)/\sqrt{K})$.
Third, we develop a windowbased variant of this scheme that also displays the
optimal rate for $r < 1$. Notably, we prove the superiority of the scheme with
$r < 1$ with its counterpart with $r=1$ in terms of the constant factor of the
error bound when the size of the averaging window is sufficiently large.

We propose a new belief update rule for Distributed NonBayesian learning in
timevarying directed graphs, where a group of agents tries to collectively
identify a hypothesis that best describes a sequence of observed data. We show
that the proposed update rule, inspired by the PushSum algorithm, is
consistent, moreover we provide an explicit characterization of its convergence
rate. Our main result states that, after a transient time, all agents will
concentrate their beliefs at a network independent rate. Network independent
rates were not available for other consensus based distributed learning
algorithms.

We present and analyze a computational hybrid architecture for performing
multiagent optimization. The optimization problems under consideration have
convex objective and constraint functions with mild smoothness conditions
imposed on them. For such problems, we provide a primaldual algorithm
implemented in the hybrid architecture, which consists of a decentralized
network of agents into which centralized information is occasionally injected,
and we establish its convergence properties. To accomplish this, a central
cloud computer aggregates global information, carries out computations of the
dual variables based on this information, and then distributes the updated dual
variables to the agents. The agents update their (primal) state variables and
also communicate among themselves with each agent sharing and receiving state
information with some number of its neighbors. Throughout, communications with
the cloud are not assumed to be synchronous or instantaneous, and communication
delays are explicitly accounted for in the modeling and analysis of the system.
Experimental results are presented to support the theoretical developments
made.

We study the problem of distributed hypothesis testing with a network of
agents where some agents repeatedly gain access to information about the
correct hypothesis. The group objective is to globally agree on a joint
hypothesis that best describes the observed data at all the nodes. We assume
that the agents can interact with their neighbors in an unknown sequence of
timevarying directed graphs. Following the pioneering work of Jadbabaie,
Molavi, Sandroni, and TahbazSalehi, we propose local learning dynamics which
combine Bayesian updates at each node with a local aggregation rule of private
agent signals. We show that these learning dynamics drive all agents to the set
of hypotheses which best explain the data collected at all nodes as long as the
sequence of interconnection graphs is uniformly strongly connected. Our main
result establishes a nonasymptotic, explicit, geometric convergence rate for
the learning dynamic.