• CleverHans is a software library that provides standardized reference implementations of adversarial example construction techniques and adversarial training. The library may be used to develop more robust machine learning models and to provide standardized benchmarks of models' performance in the adversarial setting. Benchmarks constructed without a standardized implementation of adversarial example construction are not comparable to each other, because a good result may indicate a robust model or it may merely indicate a weak implementation of the adversarial example construction procedure. This technical report is structured as follows. Section 1 provides an overview of adversarial examples in machine learning and of the CleverHans software. Section 2 presents the core functionalities of the library: namely the attacks based on adversarial examples and defenses to improve the robustness of machine learning models to these attacks. Section 3 describes how to report benchmark results using the library. Section 4 describes the versioning system.
  • Autoregressive sequence models based on deep neural networks, such as RNNs, Wavenet and the Transformer attain state-of-the-art results on many tasks. However, they are difficult to parallelize and are thus slow at processing long sequences. RNNs lack parallelism both during training and decoding, while architectures like WaveNet and Transformer are much more parallelizable during training, yet still operate sequentially during decoding. Inspired by [arxiv:1711.00937], we present a method to extend sequence models using discrete latent variables that makes decoding much more parallelizable. We first auto-encode the target sequence into a shorter sequence of discrete latent variables, which at inference time is generated autoregressively, and finally decode the output sequence from this shorter latent sequence in parallel. To this end, we introduce a novel method for constructing a sequence of discrete latent variables and compare it with previously introduced methods. Finally, we evaluate our model end-to-end on the task of neural machine translation, where it is an order of magnitude faster at decoding than comparable autoregressive models. While lower in BLEU than purely autoregressive models, our model achieves higher scores than previously proposed non-autoregressive translation models.
  • We present a method to create universal, robust, targeted adversarial image patches in the real world. The patches are universal because they can be used to attack any scene, robust because they work under a wide variety of transformations, and targeted because they can cause a classifier to output any target class. These adversarial patches can be printed, added to any scene, photographed, and presented to image classifiers; even when the patches are small, they cause the classifiers to ignore the other items in the scene and report a chosen target class.
  • We study reinforcement learning under model misspecification, where we do not have access to the true environment but only to a reasonably close approximation to it. We address this problem by extending the framework of robust MDPs to the model-free Reinforcement Learning setting, where we do not have access to the model parameters, but can only sample states from it. We define robust versions of Q-learning, SARSA, and TD-learning and prove convergence to an approximately optimal robust policy and approximate value function respectively. We scale up the robust algorithms to large MDPs via function approximation and prove convergence under two different settings. We prove convergence of robust approximate policy iteration and robust approximate value iteration for linear architectures (under mild assumptions). We also define a robust loss function, the mean squared robust projected Bellman error and give stochastic gradient descent algorithms that are guaranteed to converge to a local minimum.
  • Despite recent advances, memory-augmented deep neural networks are still limited when it comes to life-long and one-shot learning, especially in remembering rare events. We present a large-scale life-long memory module for use in deep learning. The module exploits fast nearest-neighbor algorithms for efficiency and thus scales to large memory sizes. Except for the nearest-neighbor query, the module is fully differentiable and trained end-to-end with no extra supervision. It operates in a life-long manner, i.e., without the need to reset it during training. Our memory module can be easily added to any part of a supervised neural network. To show its versatility we add it to a number of networks, from simple convolutional ones tested on image classification to deep sequence-to-sequence and recurrent-convolutional models. In all cases, the enhanced network gains the ability to remember and do life-long one-shot learning. Our module remembers training examples shown many thousands of steps in the past and it can successfully generalize from them. We set new state-of-the-art for one-shot learning on the Omniglot dataset and demonstrate, for the first time, life-long one-shot learning in recurrent neural networks on a large-scale machine translation task.
  • Yannakakis showed that the matching problem does not have a small symmetric linear program. Rothvo{\ss} recently proved that any, not necessarily symmetric, linear program also has exponential size. It is natural to ask whether the matching problem can be expressed compactly in a framework such as semidefinite programming (SDP) that is more powerful than linear programming but still allows efficient optimization. We answer this question negatively for symmetric SDPs: any symmetric SDP for the matching problem has exponential size. We also show that an O(k)-round Lasserre SDP relaxation for the metric traveling salesperson problem yields at least as good an approximation as any symmetric SDP relaxation of size $n^k$. The key technical ingredient underlying both these results is an upper bound on the degree needed to derive polynomial identities that hold over the space of matchings or traveling salesperson tours.
  • We study the cost function for hierarchical clusterings introduced by [arXiv:1510.05043] where hierarchies are treated as first-class objects rather than deriving their cost from projections into flat clusters. It was also shown in [arXiv:1510.05043] that a top-down algorithm returns a hierarchical clustering of cost at most $O\left(\alpha_n \log n\right)$ times the cost of the optimal hierarchical clustering, where $\alpha_n$ is the approximation ratio of the Sparsest Cut subroutine used. Thus using the best known approximation algorithm for Sparsest Cut due to Arora-Rao-Vazirani, the top down algorithm returns a hierarchical clustering of cost at most $O\left(\log^{3/2} n\right)$ times the cost of the optimal solution. We improve this by giving an $O(\log{n})$-approximation algorithm for this problem. Our main technical ingredients are a combinatorial characterization of ultrametrics induced by this cost function, deriving an Integer Linear Programming (ILP) formulation for this family of ultrametrics, and showing how to iteratively round an LP relaxation of this formulation by using the idea of \emph{sphere growing} which has been extensively used in the context of graph partitioning. We also prove that our algorithm returns an $O(\log{n})$-approximate hierarchical clustering for a generalization of this cost function also studied in [arXiv:1510.05043]. Experiments show that the hierarchies found by using the ILP formulation as well as our rounding algorithm often have better projections into flat clusters than the standard linkage based algorithms. We also give constant factor inapproximability results for this problem.
  • We generalize the reduction mechanism for linear programming problems and semidefinite programming problems from [arXiv:1410.8816] in two ways 1) relaxing the requirement of affineness and 2) extending to fractional optimization problems. As applications we provide several new LP-hardness and SDP-hardness results, e.g., for the SparsestCut problem, the BalancedSeparator problem, the MaxCut problem and the Matching problem on 3-regular graphs. We also provide a new, very strong Lasserre integrality gap result for the IndependentSet problem, which is strictly greater than the best known LP approximation, showing that the Lasserre hierarchy does not always provide the tightest SDP relaxation.
  • We consider the problem of deterministically factoring a univariate polynomial over a finite field under the assumption of the Extended Riemann Hypothesis (ERH). This work builds upon the line of approach first explored by Gao in $2001$. The general approach has been to implicitly construct a graph with the roots as vertices and the edges formed by some polynomial time computable relation. The algorithm then fails to factor a polynomial if this associated graph turns out to be \emph{regular}. In the first part of our work we strengthen the edge relation so that the resulting set of graphs we obtain are subgraphs of Gao's graphs, all of which must be \emph{regular}. In the second part of our work we strengthen the regularity condition of these graphs. This is accomplished by finding a parallel between their algorithms and the $1$-dimensional Weisfeiler-Leman algorithm for solving the Graph Isomorphism problem. We observe that the general principle behind their algorithms is to separate the roots by computing the $1$-dimensional Weisfeiler-Leman approximation to the orbits of this graph. This leads us to the natural question of whether this approximation may be improved. We then go on to show how to implicitly compute the $2$-dimensional Weisfeiler-Leman approximation of the orbits of these graphs. The polynomials that our algorithm fails to factor form graphs that are \emph{strongly regular} and their set of adjacency matrices forms a combinatorial structure called an \emph{Association scheme}.