
CleverHans is a software library that provides standardized reference
implementations of adversarial example construction techniques and adversarial
training. The library may be used to develop more robust machine learning
models and to provide standardized benchmarks of models' performance in the
adversarial setting. Benchmarks constructed without a standardized
implementation of adversarial example construction are not comparable to each
other, because a good result may indicate a robust model or it may merely
indicate a weak implementation of the adversarial example construction
procedure.
This technical report is structured as follows. Section 1 provides an
overview of adversarial examples in machine learning and of the CleverHans
software. Section 2 presents the core functionalities of the library: namely
the attacks based on adversarial examples and defenses to improve the
robustness of machine learning models to these attacks. Section 3 describes how
to report benchmark results using the library. Section 4 describes the
versioning system.

Autoregressive sequence models based on deep neural networks, such as RNNs,
Wavenet and the Transformer attain stateoftheart results on many tasks.
However, they are difficult to parallelize and are thus slow at processing long
sequences. RNNs lack parallelism both during training and decoding, while
architectures like WaveNet and Transformer are much more parallelizable during
training, yet still operate sequentially during decoding.
Inspired by [arxiv:1711.00937], we present a method to extend sequence models
using discrete latent variables that makes decoding much more parallelizable.
We first autoencode the target sequence into a shorter sequence of discrete
latent variables, which at inference time is generated autoregressively, and
finally decode the output sequence from this shorter latent sequence in
parallel. To this end, we introduce a novel method for constructing a sequence
of discrete latent variables and compare it with previously introduced methods.
Finally, we evaluate our model endtoend on the task of neural machine
translation, where it is an order of magnitude faster at decoding than
comparable autoregressive models. While lower in BLEU than purely
autoregressive models, our model achieves higher scores than previously
proposed nonautoregressive translation models.

We present a method to create universal, robust, targeted adversarial image
patches in the real world. The patches are universal because they can be used
to attack any scene, robust because they work under a wide variety of
transformations, and targeted because they can cause a classifier to output any
target class. These adversarial patches can be printed, added to any scene,
photographed, and presented to image classifiers; even when the patches are
small, they cause the classifiers to ignore the other items in the scene and
report a chosen target class.

We study reinforcement learning under model misspecification, where we do not
have access to the true environment but only to a reasonably close
approximation to it. We address this problem by extending the framework of
robust MDPs to the modelfree Reinforcement Learning setting, where we do not
have access to the model parameters, but can only sample states from it. We
define robust versions of Qlearning, SARSA, and TDlearning and prove
convergence to an approximately optimal robust policy and approximate value
function respectively. We scale up the robust algorithms to large MDPs via
function approximation and prove convergence under two different settings. We
prove convergence of robust approximate policy iteration and robust approximate
value iteration for linear architectures (under mild assumptions). We also
define a robust loss function, the mean squared robust projected Bellman error
and give stochastic gradient descent algorithms that are guaranteed to converge
to a local minimum.

Despite recent advances, memoryaugmented deep neural networks are still
limited when it comes to lifelong and oneshot learning, especially in
remembering rare events. We present a largescale lifelong memory module for
use in deep learning. The module exploits fast nearestneighbor algorithms for
efficiency and thus scales to large memory sizes. Except for the
nearestneighbor query, the module is fully differentiable and trained
endtoend with no extra supervision. It operates in a lifelong manner, i.e.,
without the need to reset it during training.
Our memory module can be easily added to any part of a supervised neural
network. To show its versatility we add it to a number of networks, from simple
convolutional ones tested on image classification to deep sequencetosequence
and recurrentconvolutional models. In all cases, the enhanced network gains
the ability to remember and do lifelong oneshot learning. Our module
remembers training examples shown many thousands of steps in the past and it
can successfully generalize from them. We set new stateoftheart for oneshot
learning on the Omniglot dataset and demonstrate, for the first time, lifelong
oneshot learning in recurrent neural networks on a largescale machine
translation task.

Yannakakis showed that the matching problem does not have a small symmetric
linear program. Rothvo{\ss} recently proved that any, not necessarily
symmetric, linear program also has exponential size. It is natural to ask
whether the matching problem can be expressed compactly in a framework such as
semidefinite programming (SDP) that is more powerful than linear programming
but still allows efficient optimization. We answer this question negatively for
symmetric SDPs: any symmetric SDP for the matching problem has exponential
size.
We also show that an O(k)round Lasserre SDP relaxation for the metric
traveling salesperson problem yields at least as good an approximation as any
symmetric SDP relaxation of size $n^k$.
The key technical ingredient underlying both these results is an upper bound
on the degree needed to derive polynomial identities that hold over the space
of matchings or traveling salesperson tours.

We study the cost function for hierarchical clusterings introduced by
[arXiv:1510.05043] where hierarchies are treated as firstclass objects rather
than deriving their cost from projections into flat clusters. It was also shown
in [arXiv:1510.05043] that a topdown algorithm returns a hierarchical
clustering of cost at most $O\left(\alpha_n \log n\right)$ times the cost of
the optimal hierarchical clustering, where $\alpha_n$ is the approximation
ratio of the Sparsest Cut subroutine used. Thus using the best known
approximation algorithm for Sparsest Cut due to AroraRaoVazirani, the top
down algorithm returns a hierarchical clustering of cost at most
$O\left(\log^{3/2} n\right)$ times the cost of the optimal solution. We improve
this by giving an $O(\log{n})$approximation algorithm for this problem. Our
main technical ingredients are a combinatorial characterization of ultrametrics
induced by this cost function, deriving an Integer Linear Programming (ILP)
formulation for this family of ultrametrics, and showing how to iteratively
round an LP relaxation of this formulation by using the idea of \emph{sphere
growing} which has been extensively used in the context of graph partitioning.
We also prove that our algorithm returns an $O(\log{n})$approximate
hierarchical clustering for a generalization of this cost function also studied
in [arXiv:1510.05043]. Experiments show that the hierarchies found by using the
ILP formulation as well as our rounding algorithm often have better projections
into flat clusters than the standard linkage based algorithms. We also give
constant factor inapproximability results for this problem.

We generalize the reduction mechanism for linear programming problems and
semidefinite programming problems from [arXiv:1410.8816] in two ways 1)
relaxing the requirement of affineness and 2) extending to fractional
optimization problems. As applications we provide several new LPhardness and
SDPhardness results, e.g., for the SparsestCut problem, the BalancedSeparator
problem, the MaxCut problem and the Matching problem on 3regular graphs. We
also provide a new, very strong Lasserre integrality gap result for the
IndependentSet problem, which is strictly greater than the best known LP
approximation, showing that the Lasserre hierarchy does not always provide the
tightest SDP relaxation.

We consider the problem of deterministically factoring a univariate
polynomial over a finite field under the assumption of the Extended Riemann
Hypothesis (ERH). This work builds upon the line of approach first explored by
Gao in $2001$. The general approach has been to implicitly construct a graph
with the roots as vertices and the edges formed by some polynomial time
computable relation. The algorithm then fails to factor a polynomial if this
associated graph turns out to be \emph{regular}. In the first part of our work
we strengthen the edge relation so that the resulting set of graphs we obtain
are subgraphs of Gao's graphs, all of which must be \emph{regular}.
In the second part of our work we strengthen the regularity condition of
these graphs. This is accomplished by finding a parallel between their
algorithms and the $1$dimensional WeisfeilerLeman algorithm for solving the
Graph Isomorphism problem. We observe that the general principle behind their
algorithms is to separate the roots by computing the $1$dimensional
WeisfeilerLeman approximation to the orbits of this graph. This leads us to
the natural question of whether this approximation may be improved. We then go
on to show how to implicitly compute the $2$dimensional WeisfeilerLeman
approximation of the orbits of these graphs. The polynomials that our algorithm
fails to factor form graphs that are \emph{strongly regular} and their set of
adjacency matrices forms a combinatorial structure called an \emph{Association
scheme}.