
Animals execute goal-directed behaviours despite the limited range and scope
of their sensors. To cope, they explore environments and store memories
maintaining estimates of important information that is not presently available.
Recently, progress has been made with artificial intelligence (AI) agents that
learn to perform tasks from sensory input, even at a human level, by merging
reinforcement learning (RL) algorithms with deep neural networks, and the
excitement surrounding these results has led to the pursuit of related ideas as
explanations of nonhuman animal learning. However, we demonstrate that
contemporary RL algorithms struggle to solve simple tasks when enough
information is concealed from the sensors of the agent, a property called
"partial observability". An obvious requirement for handling partially observed
tasks is access to extensive memory, but we show memory is not enough; it is
critical that the right information be stored in the right format. We develop a
model, the Memory, RL, and Inference Network (MERLIN), in which memory
formation is guided by a process of predictive modeling. MERLIN facilitates the
solution of tasks in 3D virtual reality environments for which partial
observability is severe and memories must be maintained over long durations.
Our model demonstrates a single learning agent architecture that can solve
canonical behavioural tasks in psychology and neurobiology without strong
simplifying assumptions about the dimensionality of sensory input or the
duration of experiences.

Deep autoregressive models have shown state-of-the-art performance in density
estimation for natural images on large-scale datasets such as ImageNet.
However, such models require many thousands of gradient-based weight updates
and unique image examples for training. Ideally, the models would rapidly learn
visual concepts from only a handful of examples, similar to the manner in which
humans learn across many vision tasks. In this paper, we show how 1) neural
attention and 2) meta-learning techniques can be used in combination with
autoregressive models to enable effective few-shot density estimation. Our
proposed modifications to PixelCNN result in state-of-the-art few-shot density
estimation on the Omniglot dataset. Furthermore, we visualize the learned
attention policy and find that it learns intuitive algorithms for simple tasks
such as image mirroring on ImageNet and handwriting on Omniglot without
supervision. Finally, we extend the model to natural images and demonstrate
few-shot image generation on the Stanford Online Products dataset.
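
As a rough illustration of the neural attention ingredient, the sketch below applies soft dot-product attention over a small support set: a query is scored against each support key, the scores are normalized with a softmax, and the support values are averaged with those weights. The function names `softmax` and `attend`, the dot-product scoring rule, and the toy vectors are assumptions made for illustration; the abstract does not specify how attention is wired into PixelCNN.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(query, keys, values):
    """Soft attention: weight each support value by query-key similarity."""
    # Dot-product score between the query and every support key (assumed scoring rule).
    scores = [sum(q * k for q, k in zip(query, key)) for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    # Context vector: attention-weighted average of the support values.
    context = [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]
    return weights, context

# Toy support set: three "examples", each with a key and a value vector.
keys = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
values = [[10.0, 0.0], [0.0, 10.0], [-10.0, 0.0]]
query = [5.0, 0.0]  # most similar to the first key

weights, context = attend(query, keys, values)
```

In a few-shot setting, a context vector of this kind could condition the generation of each pixel on the handful of support images, so that new concepts are handled by lookup rather than by further weight updates.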

A key challenge in model-based reinforcement learning (RL) is to synthesize
computationally efficient and accurate environment models. We show that
carefully designed generative models that learn and operate on compact state
representations, so-called state-space models, substantially reduce the
computational costs for predicting outcomes of sequences of actions. Extensive
experiments establish that state-space models accurately capture the dynamics
of Atari games from the Arcade Learning Environment from raw pixels. The
computational speed-up of state-space models, obtained while maintaining high
accuracy, makes their application in RL feasible: we demonstrate that agents which query
these models for decision making outperform strong model-free baselines on the
game MSPACMAN, demonstrating the potential of using learned environment models
for planning.
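
The computational argument can be made concrete with a toy sketch: an observation is encoded once into a compact latent state, and the outcome of an action sequence is then predicted entirely in that latent space, without rendering intermediate frames. Everything here is hypothetical, including the `encode`, `transition`, and `reward` functions and their hand-picked linear forms; in the setting described above these components would be learned. The exhaustive comparison in `best_plan` is a simple stand-in for the learned querying mentioned in the abstract.

```python
def encode(observation):
    """Hypothetical encoder: compress an observation into a 2-d latent state."""
    n = len(observation)
    half = n // 2
    return [sum(observation[:half]) / half, sum(observation[half:]) / (n - half)]

def transition(state, action):
    """Hypothetical latent dynamics: a fixed linear update plus the action."""
    return [0.9 * state[0] + action, 0.5 * state[1] + 0.1 * state[0]]

def reward(state):
    """Hypothetical reward head that reads the latent state directly."""
    return state[0] - state[1]

def rollout(observation, actions):
    """Predict rewards for an action sequence without decoding any frames."""
    state = encode(observation)  # the only step that touches raw pixels
    rewards = []
    for action in actions:
        state = transition(state, action)
        rewards.append(reward(state))
    return rewards

def best_plan(observation, candidate_plans):
    """Pick the candidate action sequence with the highest predicted return."""
    return max(candidate_plans, key=lambda plan: sum(rollout(observation, plan)))

frame = [0.0] * 8  # stand-in for a flattened image
plans = [[1.0, 1.0, 1.0], [-1.0, -1.0, -1.0], [1.0, -1.0, 1.0]]
chosen = best_plan(frame, plans)
```

Because each rollout step operates on a two-element state rather than a full frame, the cost of evaluating a plan scales with the latent dimension and the plan length, not with the image size.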

Reasoning about objects, relations, and physics is central to human
intelligence, and a key goal of artificial intelligence. Here we introduce the
interaction network, a model which can reason about how objects in complex
systems interact, supporting dynamical predictions, as well as inferences about
the abstract properties of the system. Our model takes graphs as input,
performs object- and relation-centric reasoning in a way that is analogous to a
simulation, and is implemented using deep neural networks. We evaluate its
ability to reason about several challenging physical domains: n-body problems,
rigid-body collision, and non-rigid dynamics. Our results show it can be
trained to accurately simulate the physical trajectories of dozens of objects
over thousands of time steps, estimate abstract quantities such as energy, and
generalize automatically to systems with different numbers and configurations
of objects and relations. Our interaction network implementation is the first
general-purpose, learnable physics engine, and a powerful general framework for
reasoning about objects and relations in a wide variety of complex real-world
domains.
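
The object- and relation-centric computation can be sketched as a single step over a graph: a relational function is applied to every (sender, receiver) edge, the resulting effects are summed per receiver, and an object function updates each node. In the sketch below both functions are hand-written, spring-like stand-ins (an assumption for illustration); in the model described above they are deep neural networks, and all names and the one-dimensional state layout are hypothetical.

```python
def relation_effect(sender, receiver, stiffness):
    """Hypothetical relational function: spring-like effect of sender on receiver."""
    return stiffness * (sender["pos"] - receiver["pos"])

def object_update(obj, total_effect, dt):
    """Hypothetical object function: integrate the aggregated effect as a force."""
    vel = obj["vel"] + dt * total_effect / obj["mass"]
    pos = obj["pos"] + dt * vel
    return {"pos": pos, "vel": vel, "mass": obj["mass"]}

def interaction_step(objects, relations, dt=0.1):
    """One step: per-relation effects, summed per receiver, then per-object update."""
    effects = [0.0] * len(objects)
    for sender, receiver, stiffness in relations:
        effects[receiver] += relation_effect(objects[sender], objects[receiver], stiffness)
    return [object_update(o, e, dt) for o, e in zip(objects, effects)]

# Graph input: objects are nodes, relations are directed edges
# given as (sender index, receiver index, relation attribute).
objects = [
    {"pos": 0.0, "vel": 0.0, "mass": 1.0},
    {"pos": 1.0, "vel": 0.0, "mass": 1.0},
    {"pos": 3.0, "vel": 0.0, "mass": 2.0},
]
relations = [(0, 1, 1.0), (1, 0, 1.0), (1, 2, 1.0), (2, 1, 1.0)]

next_objects = interaction_step(objects, relations)
```

Since the same two functions are shared across all edges and nodes, `interaction_step` accepts graphs with any number of objects and relations, which is the structural property behind the generalization claim above.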

We consider the problem of density estimation on Riemannian manifolds.
Density estimation on manifolds has many applications in fluid mechanics,
optics and plasma physics and it appears often when dealing with angular
variables (such as those used in protein folding, robot limbs, gene expression) and
in general directional statistics. In spite of the multitude of algorithms
available for density estimation in the Euclidean spaces $\mathbf{R}^n$ that
scale to large n (e.g. normalizing flows, kernel methods and variational
approximations), most of these methods are not immediately suitable for density
estimation in more general Riemannian manifolds. We revisit techniques related
to homeomorphisms from differential geometry for projecting densities to
submanifolds and use them to generalize the idea of normalizing flows to more
general Riemannian manifolds. The resulting algorithm is scalable, simple to
implement and suitable for use with automatic differentiation. We demonstrate
concrete examples of this method on the n-sphere $\mathbf{S}^n$.

General unsupervised learning is a long-standing conceptual problem in
machine learning. Supervised learning is successful because it can be solved by
the minimization of the training error cost function. Unsupervised learning is
not as successful, because the unsupervised objective may be unrelated to the
supervised task of interest. For example, density modelling and
reconstruction have often been used for unsupervised learning, but they have not
produced the sought-after performance gains, because they have no knowledge of
the supervised tasks.
In this paper, we present an unsupervised cost function which we name the
Output Distribution Matching (ODM) cost, which measures a divergence between
the distribution of predictions and the distribution of labels. The ODM cost is
appealing because it is consistent with the supervised cost in the following
sense: a perfect supervised classifier is also perfect according to the ODM
cost. Therefore, by aggressively optimizing the ODM cost, we are almost
guaranteed to improve our supervised performance whenever the space of possible
predictions is exponentially large.
We demonstrate that the ODM cost works well on a number of small and
semi-artificial datasets using no (or almost no) labelled training cases.
Finally, we show that the ODM cost can be used for one-shot domain adaptation,
which allows the model to classify inputs that differ from the input
distribution in significant ways without the need for prior exposure to the new
domain.
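
To make the ODM cost concrete, the sketch below compares the empirical distribution of a model's predictions on unlabelled inputs with the empirical distribution of an unpaired sample of labels. The abstract only says that the cost measures "a divergence"; the use of KL divergence here, the restriction to single-class marginals, and all names (`distribution`, `kl_divergence`, `odm_cost`) are assumptions made for illustration.

```python
import math
from collections import Counter

def distribution(items, classes):
    """Empirical distribution of `items` over a fixed list of classes."""
    counts = Counter(items)
    return [counts[c] / len(items) for c in classes]

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for discrete distributions; eps guards against log(0)."""
    return sum(pi * math.log(pi / max(qi, eps)) for pi, qi in zip(p, q) if pi > 0)

def odm_cost(predictions, label_sample, classes):
    """Divergence between the label distribution and the prediction distribution."""
    return kl_divergence(
        distribution(label_sample, classes),
        distribution(predictions, classes),
    )

classes = ["a", "b", "c"]
labels = ["a", "a", "b", "c", "a", "b"]   # unpaired sample of labels
# Predictions whose class frequencies match the labels, as a perfect classifier's would.
matched = ["b", "a", "a", "a", "c", "b"]
collapsed = ["a"] * 6                     # degenerate model: always predicts "a"

cost_matched = odm_cost(matched, labels, classes)
cost_collapsed = odm_cost(collapsed, labels, classes)
```

With only three classes, a zero cost does not imply correct predictions, since many wrong labelings share the same marginals. This is consistent with the abstract's proviso that the guarantee applies when the space of possible predictions is exponentially large, for instance when the outputs are sequences.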