
Federated learning poses new statistical and systems challenges in training
machine learning models over distributed networks of devices. In this work, we
show that multitask learning is naturally suited to handle the statistical
challenges of this setting, and propose a novel systemsaware optimization
method, MOCHA, that is robust to practical systems issues. Our method and
theory for the first time consider issues of high communication cost,
stragglers, and fault tolerance for distributed multitask learning. The
resulting method achieves significant speedups compared to alternatives in the
federated setting, as we demonstrate through simulations on realworld
federated datasets.

Generative Adversarial Networks (GANs) are one of the most practical
strategies to learn data distributions. A popular GAN formulation is based on
the use of Wasserstein distance as a metric between probability distributions.
Unfortunately, minimizing the Wasserstein distance between the data
distribution and the generative model distribution is a challenging problem as
its objective is nonconvex, nonsmooth, and even hard to compute. In this
work, we propose to use a smooth approximation of the Wasserstein GANs. We show
that this smooth approximation is close to the original objective. Moreover,
obtaining gradient information of this approximate formulation is
computationally effortless and hence one can easily apply first order
optimization methods to optimize this objective. Based on this observation, we
proposed a class of algorithms with guaranteed theoretical convergence to
stationarity. Unlike the original nonsmooth objective, our proposed algorithm
only requires solving the discriminator to approximate optimality. We applied
our method to learning Gaussian mixtures on a grid and also to learning MNIST
digits. Our method allows the use of powerful cost functions based on latent
representations of the data, where this latent representation could also be
optimized adversarially.

Consider the problem of minimizing the sum of two convex functions, one being
smooth and the other nonsmooth. In this paper, we introduce a general class of
approximate proximal splitting (APS) methods for solving such minimization
problems. Methods in the APS class include many wellknown algorithms such as
the proximal splitting method (PSM), the block coordinate descent method (BCD)
and the approximate gradient projection methods for smooth convex optimization.
We establish the linear convergence of APS methods under a local error bound
assumption. Since the latter is known to hold for compressive sensing and
sparse group LASSO problems, our analysis implies the linear convergence of the
BCD method for these problems without strong convexity assumption.

Consider the problem of minimizing the expected value of a cost function
parameterized by a random variable. The classical sample average approximation
(SAA) method for solving this problem requires minimization of an ensemble
average of the objective at each step, which can be expensive. In this paper,
we propose a stochastic successive upperbound minimization method (SSUM) which
minimizes an approximate ensemble average at each iteration. To ensure
convergence and to facilitate computation, we require the approximate ensemble
average to be a locally tight upperbound of the expected cost function and be
easily optimized. The main contributions of this work include the development
and analysis of the SSUM method as well as its applications in linear
transceiver design for wireless communication networks and online dictionary
learning. Moreover, using the SSUM framework, we extend the classical
stochastic (sub)gradient (SG) method to the case of minimizing a nonsmooth
nonconvex objective function and establish its convergence.

Consider a MIMO interference channel whereby each transmitter and receiver
are equipped with multiple antennas. The basic problem is to design optimal
linear transceivers (or beamformers) that can maximize system throughput. The
recent work [1] suggests that optimal beamformers should maximize the total
degrees of freedom and achieve interference alignment in high SNR. In this
paper we first consider the interference alignment problem in spatial domain
and prove that the problem of maximizing the total degrees of freedom for a
given MIMO interference channel is NPhard. Furthermore, we show that even
checking the achievability of a given tuple of degrees of freedom for all
receivers is NPhard when each receiver is equipped with at least three
antennas. Interestingly, the same problem becomes polynomial time solvable when
each transmit/receive node is equipped with no more than two antennas. Finally,
we propose a distributed algorithm for transmit covariance matrix design, while
assuming each receiver uses a linear MMSE beamformer. The simulation results
show that the proposed algorithm outperforms the existing interference
alignment algorithms in terms of system throughput.