• While Generative Adversarial Networks (GANs) have demonstrated promising performance on multiple vision tasks, their learning dynamics are not yet well understood, both in theory and in practice. To address this issue, we study GAN dynamics in a simple yet rich parametric model that exhibits several of the common problematic convergence behaviors such as vanishing gradients, mode collapse, and diverging or oscillatory behavior. In spite of the non-convex nature of our model, we are able to perform a rigorous theoretical analysis of its convergence behavior. Our analysis reveals an interesting dichotomy: a GAN with an optimal discriminator provably converges, while first order approximations of the discriminator steps lead to unstable GAN dynamics and mode collapse. Our result suggests that using first order discriminator steps (the de-facto standard in most existing GAN setups) might be one of the factors that makes GAN training challenging in practice.
  • Machine learning models are often susceptible to adversarial perturbations of their inputs. Even small perturbations can cause state-of-the-art classifiers with high "standard" accuracy to produce an incorrect prediction with high confidence. To better understand this phenomenon, we study adversarially robust learning from the viewpoint of generalization. We show that already in a simple natural data model, the sample complexity of robust learning can be significantly larger than that of "standard" learning. This gap is information theoretic and holds irrespective of the training algorithm or the model family. We complement our theoretical results with experiments on popular image classification datasets and show that a similar gap exists here as well. We postulate that the difficulty of training robust classifiers stems, at least partially, from this inherently larger sample complexity.
  • A fundamental, and still largely unanswered, question in the context of Generative Adversarial Networks (GANs) is whether GANs are actually able to capture the key characteristics of the datasets they are trained on. The current approaches to examining this issue require significant human supervision, such as visual inspection of sampled images, and often offer only fairly limited scalability. In this paper, we propose new techniques that employ a classification-based perspective to evaluate synthetic GAN distributions and their capability to accurately reflect the essential properties of the training data. These techniques require only minimal human supervision and can easily be scaled and adapted to evaluate a variety of state-of-the-art GANs on large, popular datasets. Our analysis indicates that GANs have significant problems in reproducing the more distributional properties of the training dataset. In particular, when seen through the lens of classification, the diversity of GAN data is orders of magnitude less than that of the original data.
  • We show that simple transformations, namely translations and rotations alone, are sufficient to fool neural network-based vision models on a significant fraction of inputs. This is in sharp contrast to previous work that relied on more complicated optimization approaches that are unlikely to appear outside of a truly adversarial setting. Moreover, fooling rotations and translations are easy to find and require only a few black-box queries to the target model. Overall, our findings emphasize the need for designing robust classifiers even in natural, benign contexts.
  • For over a decade now we have been witnessing the success of {\em massive parallel computation} (MPC) frameworks, such as MapReduce, Hadoop, Dryad, or Spark. One of the reasons for their success is the fact that these frameworks are able to accurately capture the nature of large-scale computation. In particular, compared to the classic distributed algorithms or PRAM models, these frameworks allow for much more local computation. The fundamental question that arises in this context is though: can we leverage this additional power to obtain even faster parallel algorithms? A prominent example here is the {\em maximum matching} problem---one of the most classic graph problems. It is well known that in the PRAM model one can compute a 2-approximate maximum matching in $O(\log{n})$ rounds. However, the exact complexity of this problem in the MPC framework is still far from understood. Lattanzi et al. showed that if each machine has $n^{1+\Omega(1)}$ memory, this problem can also be solved $2$-approximately in a constant number of rounds. These techniques, as well as the approaches developed in the follow up work, seem though to get stuck in a fundamental way at roughly $O(\log{n})$ rounds once we enter the near-linear memory regime. It is thus entirely possible that in this regime, which captures in particular the case of sparse graph computations, the best MPC round complexity matches what one can already get in the PRAM model, without the need to take advantage of the extra local computation power. In this paper, we finally refute that perplexing possibility. That is, we break the above $O(\log n)$ round complexity bound even in the case of {\em slightly sublinear} memory per machine. In fact, our improvement here is {\em almost exponential}: we are able to deliver a $(2+\epsilon)$-approximation to maximum matching, for any fixed constant $\epsilon>0$, in $O((\log \log n)^2)$ rounds.
  • In this paper, we study matrix scaling and balancing, which are fundamental problems in scientific computing, with a long line of work on them that dates back to the 1960s. We provide algorithms for both these problems that, ignoring logarithmic factors involving the dimension of the input matrix and the size of its entries, both run in time $\widetilde{O}\left(m\log \kappa \log^2 (1/\epsilon)\right)$ where $\epsilon$ is the amount of error we are willing to tolerate. Here, $\kappa$ represents the ratio between the largest and the smallest entries of the optimal scalings. This implies that our algorithms run in nearly-linear time whenever $\kappa$ is quasi-polynomial, which includes, in particular, the case of strictly positive matrices. We complement our results by providing a separate algorithm that uses an interior-point method and runs in time $\widetilde{O}(m^{3/2} \log (1/\epsilon))$. In order to establish these results, we develop a new second-order optimization framework that enables us to treat both problems in a unified and principled manner. This framework identifies a certain generalization of linear system solving that we can use to efficiently minimize a broad class of functions, which we call second-order robust. We then show that in the context of the specific functions capturing matrix scaling and balancing, we can leverage and generalize the work on Laplacian system solving to make the algorithms obtained via this framework very efficient.
  • Recent work has demonstrated that deep neural networks are vulnerable to adversarial examples---inputs that are almost indistinguishable from natural data and yet classified incorrectly by the network. In fact, some of the latest findings suggest that the existence of adversarial attacks may be an inherent weakness of deep learning models. To address this problem, we study the adversarial robustness of neural networks through the lens of robust optimization. This approach provides us with a broad and unifying view on much of the prior work on this topic. Its principled nature also enables us to identify methods for both training and attacking neural networks that are reliable and, in a certain sense, universal. In particular, they specify a concrete security guarantee that would protect against any adversary. These methods let us train networks with significantly improved resistance to a wide range of adversarial attacks. They also suggest the notion of security against a first-order adversary as a natural and broad security guarantee. We believe that robustness against such well-defined classes of adversaries is an important stepping stone towards fully resistant deep learning models. Code and pre-trained models are available at https://github.com/MadryLab/mnist_challenge and https://github.com/MadryLab/cifar10_challenge.
  • We present an $\tilde{O}\left(m^{\frac{10}{7}}U^{\frac{1}{7}}\right)$-time algorithm for the maximum $s$-$t$ flow problem and the minimum $s$-$t$ cut problem in directed graphs with $m$ arcs and largest integer capacity $U$. This matches the running time of the $\tilde{O}\left((mU)^{\frac{10}{7}}\right)$-time algorithm of M\k{a}dry (FOCS 2013) in the unit-capacity case, and improves over it, as well as over the $\tilde{O}\left(m \sqrt{n} \log U\right)$-time algorithm of Lee and Sidford (FOCS 2014), whenever $U$ is moderately large and the graph is sufficiently sparse. By well-known reductions, this also gives similar running time improvements for the maximum-cardinality bipartite $b$-matching problem. One of the advantages of our algorithm is that it is significantly simpler than the ones presented in Madry (FOCS 2013) and Lee and Sidford (FOCS 2014). In particular, these algorithms employ a sophisticated interior-point method framework, while our algorithm is cast directly in the classic augmenting path setting that almost all the combinatorial maximum flow algorithms use. At a high level, the presented algorithm takes a primal dual approach in which each iteration uses electrical flows computations both to find an augmenting $s$-$t$ flow in the current residual graph and to update the dual solution. We show that by maintain certain careful coupling of these primal and dual solutions we are always guaranteed to make significant progress.
  • We present and study the Static-Routing-Resiliency problem, motivated by routing on the Internet: Given a graph $G$, a unique destination vertex $d$, and an integer constant $c>0$, does there exist a static and destination-based routing scheme such that the correct delivery of packets from any source $s$ to the destination $d$ is guaranteed so long as (1) no more than $c$ edges fail and (2) there exists a physical path from $s$ to $d$? We embark upon a systematic exploration of this fundamental question in a variety of models (deterministic routing, randomized routing, with packet-duplication, with packet-header-rewriting) and present both positive and negative results that relate the edge-connectivity of a graph, i.e., the minimum number of edges whose deletion partitions $G$, to its resiliency.
  • In this paper, we study a set of combinatorial optimization problems on weighted graphs: the shortest path problem with negative weights, the weighted perfect bipartite matching problem, the unit-capacity minimum-cost maximum flow problem and the weighted perfect bipartite $b$-matching problem under the assumption that $\Vert b\Vert_1=O(m)$. We show that each one of these four problems can be solved in $\tilde{O}(m^{10/7}\log W)$ time, where $W$ is the absolute maximum weight of an edge in the graph, which gives the first in over 25 years polynomial improvement in their sparse-graph time complexity. At a high level, our algorithms build on the interior-point method-based framework developed by Madry (FOCS 2013) for solving unit-capacity maximum flow problem. We develop a refined way to analyze this framework, as well as provide new variants of the underlying preconditioning and perturbation techniques. Consequently, we are able to extend the whole interior-point method-based approach to make it applicable in the weighted graph regime.
  • We present a new algorithm for generating a uniformly random spanning tree in an undirected graph. Our algorithm samples such a tree in expected $\tilde{O}(m^{4/3})$ time. This improves over the best previously known bound of $\min(\tilde{O}(m\sqrt{n}),O(n^{\omega}))$ -- that follows from the work of Kelner and M\k{a}dry [FOCS'09] and of Colbourn et al. [J. Algorithms'96] -- whenever the input graph is sufficiently sparse. At a high level, our result stems from carefully exploiting the interplay of random spanning trees, random walks, and the notion of effective resistance, as well as from devising a way to algorithmically relate these concepts to the combinatorial structure of the graph. This involves, in particular, establishing a new connection between the effective resistance metric and the cut structure of the underlying graph.
  • We present an $\tilde{O}(m^{10/7})=\tilde{O}(m^{1.43})$-time algorithm for the maximum s-t flow and the minimum s-t cut problems in directed graphs with unit capacities. This is the first improvement over the sparse-graph case of the long-standing $O(m \min(\sqrt{m},n^{2/3}))$ time bound due to Even and Tarjan [EvenT75]. By well-known reductions, this also establishes an $\tilde{O}(m^{10/7})$-time algorithm for the maximum-cardinality bipartite matching problem. That, in turn, gives an improvement over the celebrated celebrated $O(m \sqrt{n})$ time bound of Hopcroft and Karp [HK73] whenever the input graph is sufficiently sparse.
  • We study theoretical runtime guarantees for a class of optimization problems that occur in a wide variety of inference problems. these problems are motivated by the lasso framework and have applications in machine learning and computer vision. Our work shows a close connection between these problems and core questions in algorithmic graph theory. While this connection demonstrates the difficulties of obtaining runtime guarantees, it also suggests an approach of using techniques originally developed for graph algorithms. We then show that most of these problems can be formulated as a grouped least squares problem, and give efficient algorithms for this formulation. Our algorithms rely on routines for solving quadratic minimization problems, which in turn are equivalent to solving linear systems. Finally we present some experimental results on applying our approximation algorithm to image processing problems.
  • We give the first polylogarithmic-competitive randomized online algorithm for the $k$-server problem on an arbitrary finite metric space. In particular, our algorithm achieves a competitive ratio of O(log^3 n log^2 k log log n) for any metric space on n points. Our algorithm improves upon the deterministic (2k-1)-competitive algorithm of Koutsoupias and Papadimitriou [J.ACM'95] whenever n is sub-exponential in k.
  • We present a general method of designing fast approximation algorithms for cut-based minimization problems in undirected graphs. In particular, we develop a technique that given any such problem that can be approximated quickly on trees, allows approximating it almost as quickly on general graphs while only losing a poly-logarithmic factor in the approximation guarantee. To illustrate the applicability of our paradigm, we focus our attention on the undirected sparsest cut problem with general demands and the balanced separator problem. By a simple use of our framework, we obtain poly-logarithmic approximation algorithms for these problems that run in time close to linear. The main tool behind our result is an efficient procedure that decomposes general graphs into simpler ones while approximately preserving the cut-flow structure. This decomposition is inspired by the cut-based graph decomposition of R\"acke that was developed in the context of oblivious routing schemes, as well as, by the construction of the ultrasparsifiers due to Spielman and Teng that was employed to preconditioning symmetric diagonally-dominant matrices.
  • We introduce a new approach to computing an approximately maximum s-t flow in a capacitated, undirected graph. This flow is computed by solving a sequence of electrical flow problems. Each electrical flow is given by the solution of a system of linear equations in a Laplacian matrix, and thus may be approximately computed in nearly-linear time. Using this approach, we develop the fastest known algorithm for computing approximately maximum s-t flows. For a graph having n vertices and m edges, our algorithm computes a (1-\epsilon)-approximately maximum s-t flow in time \tilde{O}(mn^{1/3} \epsilon^{-11/3}). A dual version of our approach computes a (1+\epsilon)-approximately minimum s-t cut in time \tilde{O}(m+n^{4/3}\eps^{-8/3}), which is the fastest known algorithm for this problem as well. Previously, the best dependence on m and n was achieved by the algorithm of Goldberg and Rao (J. ACM 1998), which can be used to compute approximately maximum s-t flows in time \tilde{O}(m\sqrt{n}\epsilon^{-1}), and approximately minimum s-t cuts in time \tilde{O}(m+n^{3/2}\epsilon^{-3}).
  • We combine the work of Garg and Konemann, and Fleischer with ideas from dynamic graph algorithms to obtain faster (1-eps)-approximation schemes for various versions of the multicommodity flow problem. In particular, if eps is moderately small and the size of every number used in the input instance is polynomially bounded, the running times of our algorithms match - up to poly-logarithmic factors and some provably optimal terms - the Omega(mn) flow-decomposition barrier for single-commodity flow.
  • In this paper, we set forth a new algorithm for generating approximately uniformly random spanning trees in undirected graphs. We show how to sample from a distribution that is within a multiplicative $(1+\delta)$ of uniform in expected time $\TO(m\sqrt{n}\log 1/\delta)$. This improves the sparse graph case of the best previously known worst-case bound of $O(\min \{mn, n^{2.376}\})$, which has stood for twenty years. To achieve this goal, we exploit the connection between random walks on graphs and electrical networks, and we use this to introduce a new approach to the problem that integrates discrete random walk-based techniques with continuous linear algebraic methods. We believe that our use of electrical networks and sparse linear system solvers in conjunction with random walks and combinatorial partitioning techniques is a useful paradigm that will find further applications in algorithmic graph theory.
  • We give a 7/9 - Approximation Algorithm for the Maximum Traveling Salesman Problem.
  • It is well known that unconditionally secure bit commitment is impossible even in the quantum world. In this paper a weak variant of quantum bit commitment, introduced independently by Aharonov et al. [STOC, 2000] and Hardy and Kent [Phys. Rev. Lett. 92 (2004)] is investigated. In this variant, the parties require some nonzero probability of detecting a cheating, i.e. if Bob, who commits a bit b to Alice, changes his mind during the revealing phase then Alice detects the cheating with a positive probability (we call this property binding); and if Alice gains information about the committed bit before the revealing phase then Bob discovers this with positive probability (sealing). In our paper we give quantum bit commitment scheme that is simultaneously binding and sealing and we show that if a cheating gives epsilon advantage to a malicious Alice then Bob can detect the cheating with a probability Omega(epsilon^2). If Bob cheats then Alice's probability of detecting the cheating is greater than some fixed constant lambda>0. This improves the probabilities of cheating detections shown by Hardy and Kent and the scheme by Aharonov et al. who presented a protocol that is either binding or sealing, but not simultaneously both. To construct a cheat sensitive quantum bit commitment scheme we use a protocol for a weak quantum one-out-of-two oblivious transfer.