
While Generative Adversarial Networks (GANs) have demonstrated promising
performance on multiple vision tasks, their learning dynamics are not yet well
understood, both in theory and in practice. To address this issue, we study GAN
dynamics in a simple yet rich parametric model that exhibits several of the
common problematic convergence behaviors such as vanishing gradients, mode
collapse, and diverging or oscillatory behavior. In spite of the nonconvex
nature of our model, we are able to perform a rigorous theoretical analysis of
its convergence behavior. Our analysis reveals an interesting dichotomy: a GAN
with an optimal discriminator provably converges, while first order
approximations of the discriminator steps lead to unstable GAN dynamics and
mode collapse. Our result suggests that using first order discriminator steps
(the defacto standard in most existing GAN setups) might be one of the factors
that makes GAN training challenging in practice.

Machine learning models are often susceptible to adversarial perturbations of
their inputs. Even small perturbations can cause stateoftheart classifiers
with high "standard" accuracy to produce an incorrect prediction with high
confidence. To better understand this phenomenon, we study adversarially robust
learning from the viewpoint of generalization. We show that already in a simple
natural data model, the sample complexity of robust learning can be
significantly larger than that of "standard" learning. This gap is information
theoretic and holds irrespective of the training algorithm or the model family.
We complement our theoretical results with experiments on popular image
classification datasets and show that a similar gap exists here as well. We
postulate that the difficulty of training robust classifiers stems, at least
partially, from this inherently larger sample complexity.

A fundamental, and still largely unanswered, question in the context of
Generative Adversarial Networks (GANs) is whether GANs are actually able to
capture the key characteristics of the datasets they are trained on. The
current approaches to examining this issue require significant human
supervision, such as visual inspection of sampled images, and often offer only
fairly limited scalability. In this paper, we propose new techniques that
employ a classificationbased perspective to evaluate synthetic GAN
distributions and their capability to accurately reflect the essential
properties of the training data. These techniques require only minimal human
supervision and can easily be scaled and adapted to evaluate a variety of
stateoftheart GANs on large, popular datasets. Our analysis indicates that
GANs have significant problems in reproducing the more distributional
properties of the training dataset. In particular, when seen through the lens
of classification, the diversity of GAN data is orders of magnitude less than
that of the original data.

We show that simple transformations, namely translations and rotations alone,
are sufficient to fool neural networkbased vision models on a significant
fraction of inputs. This is in sharp contrast to previous work that relied on
more complicated optimization approaches that are unlikely to appear outside of
a truly adversarial setting. Moreover, fooling rotations and translations are
easy to find and require only a few blackbox queries to the target model.
Overall, our findings emphasize the need for designing robust classifiers even
in natural, benign contexts.

For over a decade now we have been witnessing the success of {\em massive
parallel computation} (MPC) frameworks, such as MapReduce, Hadoop, Dryad, or
Spark. One of the reasons for their success is the fact that these frameworks
are able to accurately capture the nature of largescale computation. In
particular, compared to the classic distributed algorithms or PRAM models,
these frameworks allow for much more local computation. The fundamental
question that arises in this context is though: can we leverage this additional
power to obtain even faster parallel algorithms?
A prominent example here is the {\em maximum matching} problemone of the
most classic graph problems. It is well known that in the PRAM model one can
compute a 2approximate maximum matching in $O(\log{n})$ rounds. However, the
exact complexity of this problem in the MPC framework is still far from
understood. Lattanzi et al. showed that if each machine has $n^{1+\Omega(1)}$
memory, this problem can also be solved $2$approximately in a constant number
of rounds. These techniques, as well as the approaches developed in the follow
up work, seem though to get stuck in a fundamental way at roughly $O(\log{n})$
rounds once we enter the nearlinear memory regime. It is thus entirely
possible that in this regime, which captures in particular the case of sparse
graph computations, the best MPC round complexity matches what one can already
get in the PRAM model, without the need to take advantage of the extra local
computation power.
In this paper, we finally refute that perplexing possibility. That is, we
break the above $O(\log n)$ round complexity bound even in the case of {\em
slightly sublinear} memory per machine. In fact, our improvement here is {\em
almost exponential}: we are able to deliver a $(2+\epsilon)$approximation to
maximum matching, for any fixed constant $\epsilon>0$, in $O((\log \log n)^2)$
rounds.

In this paper, we study matrix scaling and balancing, which are fundamental
problems in scientific computing, with a long line of work on them that dates
back to the 1960s. We provide algorithms for both these problems that, ignoring
logarithmic factors involving the dimension of the input matrix and the size of
its entries, both run in time $\widetilde{O}\left(m\log \kappa \log^2
(1/\epsilon)\right)$ where $\epsilon$ is the amount of error we are willing to
tolerate. Here, $\kappa$ represents the ratio between the largest and the
smallest entries of the optimal scalings. This implies that our algorithms run
in nearlylinear time whenever $\kappa$ is quasipolynomial, which includes, in
particular, the case of strictly positive matrices. We complement our results
by providing a separate algorithm that uses an interiorpoint method and runs
in time $\widetilde{O}(m^{3/2} \log (1/\epsilon))$.
In order to establish these results, we develop a new secondorder
optimization framework that enables us to treat both problems in a unified and
principled manner. This framework identifies a certain generalization of linear
system solving that we can use to efficiently minimize a broad class of
functions, which we call secondorder robust. We then show that in the context
of the specific functions capturing matrix scaling and balancing, we can
leverage and generalize the work on Laplacian system solving to make the
algorithms obtained via this framework very efficient.

Recent work has demonstrated that deep neural networks are vulnerable to
adversarial examplesinputs that are almost indistinguishable from natural
data and yet classified incorrectly by the network. In fact, some of the latest
findings suggest that the existence of adversarial attacks may be an inherent
weakness of deep learning models. To address this problem, we study the
adversarial robustness of neural networks through the lens of robust
optimization. This approach provides us with a broad and unifying view on much
of the prior work on this topic. Its principled nature also enables us to
identify methods for both training and attacking neural networks that are
reliable and, in a certain sense, universal. In particular, they specify a
concrete security guarantee that would protect against any adversary. These
methods let us train networks with significantly improved resistance to a wide
range of adversarial attacks. They also suggest the notion of security against
a firstorder adversary as a natural and broad security guarantee. We believe
that robustness against such welldefined classes of adversaries is an
important stepping stone towards fully resistant deep learning models. Code and
pretrained models are available at https://github.com/MadryLab/mnist_challenge
and https://github.com/MadryLab/cifar10_challenge.

We present an $\tilde{O}\left(m^{\frac{10}{7}}U^{\frac{1}{7}}\right)$time
algorithm for the maximum $s$$t$ flow problem and the minimum $s$$t$ cut
problem in directed graphs with $m$ arcs and largest integer capacity $U$. This
matches the running time of the
$\tilde{O}\left((mU)^{\frac{10}{7}}\right)$time algorithm of M\k{a}dry (FOCS
2013) in the unitcapacity case, and improves over it, as well as over the
$\tilde{O}\left(m \sqrt{n} \log U\right)$time algorithm of Lee and Sidford
(FOCS 2014), whenever $U$ is moderately large and the graph is sufficiently
sparse. By wellknown reductions, this also gives similar running time
improvements for the maximumcardinality bipartite $b$matching problem.
One of the advantages of our algorithm is that it is significantly simpler
than the ones presented in Madry (FOCS 2013) and Lee and Sidford (FOCS 2014).
In particular, these algorithms employ a sophisticated interiorpoint method
framework, while our algorithm is cast directly in the classic augmenting path
setting that almost all the combinatorial maximum flow algorithms use. At a
high level, the presented algorithm takes a primal dual approach in which each
iteration uses electrical flows computations both to find an augmenting $s$$t$
flow in the current residual graph and to update the dual solution. We show
that by maintain certain careful coupling of these primal and dual solutions we
are always guaranteed to make significant progress.

We present and study the StaticRoutingResiliency problem, motivated by
routing on the Internet: Given a graph $G$, a unique destination vertex $d$,
and an integer constant $c>0$, does there exist a static and destinationbased
routing scheme such that the correct delivery of packets from any source $s$ to
the destination $d$ is guaranteed so long as (1) no more than $c$ edges fail
and (2) there exists a physical path from $s$ to $d$? We embark upon a
systematic exploration of this fundamental question in a variety of models
(deterministic routing, randomized routing, with packetduplication, with
packetheaderrewriting) and present both positive and negative results that
relate the edgeconnectivity of a graph, i.e., the minimum number of edges
whose deletion partitions $G$, to its resiliency.

In this paper, we study a set of combinatorial optimization problems on
weighted graphs: the shortest path problem with negative weights, the weighted
perfect bipartite matching problem, the unitcapacity minimumcost maximum flow
problem and the weighted perfect bipartite $b$matching problem under the
assumption that $\Vert b\Vert_1=O(m)$. We show that each one of these four
problems can be solved in $\tilde{O}(m^{10/7}\log W)$ time, where $W$ is the
absolute maximum weight of an edge in the graph, which gives the first in over
25 years polynomial improvement in their sparsegraph time complexity.
At a high level, our algorithms build on the interiorpoint methodbased
framework developed by Madry (FOCS 2013) for solving unitcapacity maximum flow
problem. We develop a refined way to analyze this framework, as well as provide
new variants of the underlying preconditioning and perturbation techniques.
Consequently, we are able to extend the whole interiorpoint methodbased
approach to make it applicable in the weighted graph regime.

We present a new algorithm for generating a uniformly random spanning tree in
an undirected graph. Our algorithm samples such a tree in expected
$\tilde{O}(m^{4/3})$ time. This improves over the best previously known bound
of $\min(\tilde{O}(m\sqrt{n}),O(n^{\omega}))$  that follows from the work of
Kelner and M\k{a}dry [FOCS'09] and of Colbourn et al. [J. Algorithms'96] 
whenever the input graph is sufficiently sparse.
At a high level, our result stems from carefully exploiting the interplay of
random spanning trees, random walks, and the notion of effective resistance, as
well as from devising a way to algorithmically relate these concepts to the
combinatorial structure of the graph. This involves, in particular,
establishing a new connection between the effective resistance metric and the
cut structure of the underlying graph.

We present an $\tilde{O}(m^{10/7})=\tilde{O}(m^{1.43})$time algorithm for
the maximum st flow and the minimum st cut problems in directed graphs with
unit capacities. This is the first improvement over the sparsegraph case of
the longstanding $O(m \min(\sqrt{m},n^{2/3}))$ time bound due to Even and
Tarjan [EvenT75]. By wellknown reductions, this also establishes an
$\tilde{O}(m^{10/7})$time algorithm for the maximumcardinality bipartite
matching problem. That, in turn, gives an improvement over the celebrated
celebrated $O(m \sqrt{n})$ time bound of Hopcroft and Karp [HK73] whenever the
input graph is sufficiently sparse.

We study theoretical runtime guarantees for a class of optimization problems
that occur in a wide variety of inference problems. these problems are
motivated by the lasso framework and have applications in machine learning and
computer vision.
Our work shows a close connection between these problems and core questions
in algorithmic graph theory. While this connection demonstrates the
difficulties of obtaining runtime guarantees, it also suggests an approach of
using techniques originally developed for graph algorithms.
We then show that most of these problems can be formulated as a grouped least
squares problem, and give efficient algorithms for this formulation. Our
algorithms rely on routines for solving quadratic minimization problems, which
in turn are equivalent to solving linear systems. Finally we present some
experimental results on applying our approximation algorithm to image
processing problems.

We give the first polylogarithmiccompetitive randomized online algorithm for
the $k$server problem on an arbitrary finite metric space. In particular, our
algorithm achieves a competitive ratio of O(log^3 n log^2 k log log n) for any
metric space on n points. Our algorithm improves upon the deterministic
(2k1)competitive algorithm of Koutsoupias and Papadimitriou [J.ACM'95]
whenever n is subexponential in k.

We present a general method of designing fast approximation algorithms for
cutbased minimization problems in undirected graphs. In particular, we develop
a technique that given any such problem that can be approximated quickly on
trees, allows approximating it almost as quickly on general graphs while only
losing a polylogarithmic factor in the approximation guarantee.
To illustrate the applicability of our paradigm, we focus our attention on
the undirected sparsest cut problem with general demands and the balanced
separator problem. By a simple use of our framework, we obtain polylogarithmic
approximation algorithms for these problems that run in time close to linear.
The main tool behind our result is an efficient procedure that decomposes
general graphs into simpler ones while approximately preserving the cutflow
structure. This decomposition is inspired by the cutbased graph decomposition
of R\"acke that was developed in the context of oblivious routing schemes, as
well as, by the construction of the ultrasparsifiers due to Spielman and Teng
that was employed to preconditioning symmetric diagonallydominant matrices.

We introduce a new approach to computing an approximately maximum st flow in
a capacitated, undirected graph. This flow is computed by solving a sequence of
electrical flow problems. Each electrical flow is given by the solution of a
system of linear equations in a Laplacian matrix, and thus may be approximately
computed in nearlylinear time.
Using this approach, we develop the fastest known algorithm for computing
approximately maximum st flows. For a graph having n vertices and m edges, our
algorithm computes a (1\epsilon)approximately maximum st flow in time
\tilde{O}(mn^{1/3} \epsilon^{11/3}). A dual version of our approach computes a
(1+\epsilon)approximately minimum st cut in time
\tilde{O}(m+n^{4/3}\eps^{8/3}), which is the fastest known algorithm for this
problem as well. Previously, the best dependence on m and n was achieved by the
algorithm of Goldberg and Rao (J. ACM 1998), which can be used to compute
approximately maximum st flows in time \tilde{O}(m\sqrt{n}\epsilon^{1}), and
approximately minimum st cuts in time \tilde{O}(m+n^{3/2}\epsilon^{3}).

We combine the work of Garg and Konemann, and Fleischer with ideas from
dynamic graph algorithms to obtain faster (1eps)approximation schemes for
various versions of the multicommodity flow problem. In particular, if eps is
moderately small and the size of every number used in the input instance is
polynomially bounded, the running times of our algorithms match  up to
polylogarithmic factors and some provably optimal terms  the Omega(mn)
flowdecomposition barrier for singlecommodity flow.

In this paper, we set forth a new algorithm for generating approximately
uniformly random spanning trees in undirected graphs. We show how to sample
from a distribution that is within a multiplicative $(1+\delta)$ of uniform in
expected time $\TO(m\sqrt{n}\log 1/\delta)$. This improves the sparse graph
case of the best previously known worstcase bound of $O(\min \{mn,
n^{2.376}\})$, which has stood for twenty years.
To achieve this goal, we exploit the connection between random walks on
graphs and electrical networks, and we use this to introduce a new approach to
the problem that integrates discrete random walkbased techniques with
continuous linear algebraic methods. We believe that our use of electrical
networks and sparse linear system solvers in conjunction with random walks and
combinatorial partitioning techniques is a useful paradigm that will find
further applications in algorithmic graph theory.

We give a 7/9  Approximation Algorithm for the Maximum Traveling Salesman
Problem.

It is well known that unconditionally secure bit commitment is impossible
even in the quantum world. In this paper a weak variant of quantum bit
commitment, introduced independently by Aharonov et al. [STOC, 2000] and Hardy
and Kent [Phys. Rev. Lett. 92 (2004)] is investigated. In this variant, the
parties require some nonzero probability of detecting a cheating, i.e. if Bob,
who commits a bit b to Alice, changes his mind during the revealing phase then
Alice detects the cheating with a positive probability (we call this property
binding); and if Alice gains information about the committed bit before the
revealing phase then Bob discovers this with positive probability (sealing). In
our paper we give quantum bit commitment scheme that is simultaneously binding
and sealing and we show that if a cheating gives epsilon advantage to a
malicious Alice then Bob can detect the cheating with a probability
Omega(epsilon^2). If Bob cheats then Alice's probability of detecting the
cheating is greater than some fixed constant lambda>0. This improves the
probabilities of cheating detections shown by Hardy and Kent and the scheme by
Aharonov et al. who presented a protocol that is either binding or sealing, but
not simultaneously both. To construct a cheat sensitive quantum bit commitment
scheme we use a protocol for a weak quantum oneoutoftwo oblivious transfer.