• Federated learning poses new statistical and systems challenges in training machine learning models over distributed networks of devices. In this work, we show that multi-task learning is naturally suited to handle the statistical challenges of this setting, and propose a novel systems-aware optimization method, MOCHA, that is robust to practical systems issues. Our method and theory for the first time consider issues of high communication cost, stragglers, and fault tolerance for distributed multi-task learning. The resulting method achieves significant speedups compared to alternatives in the federated setting, as we demonstrate through simulations on real-world federated datasets.
  • Generative Adversarial Networks (GANs) are one of the most practical strategies to learn data distributions. A popular GAN formulation is based on the use of Wasserstein distance as a metric between probability distributions. Unfortunately, minimizing the Wasserstein distance between the data distribution and the generative model distribution is a challenging problem as its objective is non-convex, non-smooth, and even hard to compute. In this work, we propose to use a smooth approximation of the Wasserstein GANs. We show that this smooth approximation is close to the original objective. Moreover, obtaining gradient information of this approximate formulation is computationally effortless and hence one can easily apply first order optimization methods to optimize this objective. Based on this observation, we proposed a class of algorithms with guaranteed theoretical convergence to stationarity. Unlike the original non-smooth objective, our proposed algorithm only requires solving the discriminator to approximate optimality. We applied our method to learning Gaussian mixtures on a grid and also to learning MNIST digits. Our method allows the use of powerful cost functions based on latent representations of the data, where this latent representation could also be optimized adversarially.
  • Consider the problem of minimizing the sum of two convex functions, one being smooth and the other non-smooth. In this paper, we introduce a general class of approximate proximal splitting (APS) methods for solving such minimization problems. Methods in the APS class include many well-known algorithms such as the proximal splitting method (PSM), the block coordinate descent method (BCD) and the approximate gradient projection methods for smooth convex optimization. We establish the linear convergence of APS methods under a local error bound assumption. Since the latter is known to hold for compressive sensing and sparse group LASSO problems, our analysis implies the linear convergence of the BCD method for these problems without strong convexity assumption.
  • Consider the problem of minimizing the expected value of a cost function parameterized by a random variable. The classical sample average approximation (SAA) method for solving this problem requires minimization of an ensemble average of the objective at each step, which can be expensive. In this paper, we propose a stochastic successive upper-bound minimization method (SSUM) which minimizes an approximate ensemble average at each iteration. To ensure convergence and to facilitate computation, we require the approximate ensemble average to be a locally tight upper-bound of the expected cost function and be easily optimized. The main contributions of this work include the development and analysis of the SSUM method as well as its applications in linear transceiver design for wireless communication networks and online dictionary learning. Moreover, using the SSUM framework, we extend the classical stochastic (sub-)gradient (SG) method to the case of minimizing a nonsmooth nonconvex objective function and establish its convergence.
  • Consider a MIMO interference channel whereby each transmitter and receiver are equipped with multiple antennas. The basic problem is to design optimal linear transceivers (or beamformers) that can maximize system throughput. The recent work [1] suggests that optimal beamformers should maximize the total degrees of freedom and achieve interference alignment in high SNR. In this paper we first consider the interference alignment problem in spatial domain and prove that the problem of maximizing the total degrees of freedom for a given MIMO interference channel is NP-hard. Furthermore, we show that even checking the achievability of a given tuple of degrees of freedom for all receivers is NP-hard when each receiver is equipped with at least three antennas. Interestingly, the same problem becomes polynomial time solvable when each transmit/receive node is equipped with no more than two antennas. Finally, we propose a distributed algorithm for transmit covariance matrix design, while assuming each receiver uses a linear MMSE beamformer. The simulation results show that the proposed algorithm outperforms the existing interference alignment algorithms in terms of system throughput.