
In a physical neural system, learning rules must be local both in space and
time. In order for learning to occur, nonlocal information must be
communicated to the deep synapses through a communication channel, the deep
learning channel. We identify several possible architectures for this learning
channel (Bidirectional, Conjoined, Twin, Distinct) and six symmetry challenges:
1) symmetry of architectures; 2) symmetry of weights; 3) symmetry of neurons;
4) symmetry of derivatives; 5) symmetry of processing; and 6) symmetry of
learning rules. Random backpropagation (RBP) addresses the second and third
symmetries, and some of its variations, such as skipped RBP (SRBP), address the
first and the fourth symmetries. Here we address the last two desirable
symmetries showing through simulations that they can be achieved and that the
learning channel is particularly robust to symmetry variations. Specifically,
random backpropagation and its variations can be performed with the same
nonlinear neurons used in the main input-output forward channel, and the
connections in the learning channel can be adapted using the same algorithm
used in the forward channel, removing the need for any specialized hardware in
the learning channel. Finally, we provide mathematical results in simple cases
showing that the learning equations in the forward and backward channels
converge to fixed points, for almost any initial conditions. In symmetric
architectures, if the weights in both channels are small at initialization,
adaptation in both channels leads to weights that are essentially symmetric
during and after learning. Biological connections are discussed.

Random backpropagation (RBP) is a variant of the backpropagation algorithm
for training neural networks, where the transposes of the forward matrices are
replaced by fixed random matrices in the calculation of the weight updates. It
is remarkable both because of its effectiveness, in spite of using random
matrices to communicate error information, and because it completely removes
the taxing requirement of maintaining symmetric weights in a physical neural
system. To better understand random backpropagation, we first connect it to the
notions of local learning and learning channels. Through this connection, we
derive several alternatives to RBP, including skipped RBP (SRBP), adaptive RBP
(ARBP), sparse RBP, and their combinations (e.g., ASRBP), and analyze their
computational complexity. We then study their behavior through simulations
using the MNIST and CIFAR-10 benchmark datasets. These simulations show that
most of these variants work robustly, almost as well as backpropagation, and
that multiplication by the derivatives of the activation functions is
important. As a follow-up, we also study the low end of the number of bits
required to communicate error information over the learning channel. We then
provide partial intuitive explanations for some of the remarkable properties of
RBP and its variations. Finally, we prove several mathematical results,
including the convergence to fixed points of linear chains of arbitrary length,
the convergence to fixed points of linear autoencoders with decorrelated data,
the long-term existence of solutions for linear systems with a single hidden
layer and convergence in special cases, and the convergence to fixed points of
nonlinear chains, when the derivative of the activation functions is included.
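
The difference between backpropagation and RBP can be sketched for a single hidden layer. The snippet below is a minimal illustration, not the authors' code: the layer sizes, the tanh hidden units, the linear output, the squared-error loss, and all names (`hidden_delta`, `B`, `lr`) are assumptions made for the example. The only change between the two updates is the feedback matrix: the transpose of the forward weights for backpropagation, a fixed random matrix for RBP.

```python
import numpy as np


def tanh_prime(a):
    """Derivative of tanh evaluated at pre-activation a."""
    return 1.0 - np.tanh(a) ** 2


def hidden_delta(err, a1, feedback):
    """Hidden-layer error signal: the output error sent back through
    `feedback`, multiplied by the derivative of the hidden activation."""
    return (feedback @ err) * tanh_prime(a1)


rng = np.random.default_rng(0)
n_in, n_hid, n_out = 4, 5, 3                      # hypothetical layer sizes
W1 = rng.normal(scale=0.1, size=(n_hid, n_in))    # forward weights, layer 1
W2 = rng.normal(scale=0.1, size=(n_out, n_hid))   # forward weights, layer 2
B = rng.normal(scale=0.1, size=(n_hid, n_out))    # fixed random feedback matrix

x = rng.normal(size=n_in)      # one input example
t = rng.normal(size=n_out)     # its target

a1 = W1 @ x                    # hidden pre-activation
h = np.tanh(a1)                # hidden activity
y = W2 @ h                     # linear output
err = t - y                    # output error for the loss 0.5 * ||t - y||^2

lr = 0.1
dW2 = lr * np.outer(err, h)                                   # same in both schemes
dW1_bp = lr * np.outer(hidden_delta(err, a1, W2.T), x)        # backpropagation
dW1_rbp = lr * np.outer(hidden_delta(err, a1, B), x)          # random backpropagation
```

In this sketch the top-layer update is identical in both schemes; only the deep update depends on how the error is communicated, and the derivative factor `tanh_prime(a1)` is kept in both.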

Antihydrogen is at the forefront of antimatter research at the CERN
Antiproton Decelerator. Experiments aiming to test the fundamental CPT symmetry
and antigravity effects require the efficient detection of antihydrogen
annihilation events, which is performed using highly granular tracking
detectors installed around an antimatter trap. Improving the efficiency of the
antihydrogen annihilation detection plays a central role in the final
sensitivity of the experiments. We propose deep learning as a novel technique
to analyze antihydrogen annihilation data, and compare its performance with a
traditional track and vertex reconstruction method. We report that the deep
learning approach yields significant improvement, tripling event coverage while
simultaneously improving performance by over 5% in terms of Area Under Curve
(AUC).
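
For readers unfamiliar with the metric, AUC can be read as the probability that a randomly chosen signal event receives a higher classifier score than a randomly chosen background event. The sketch below computes it directly from that definition; the function name and the toy scores are illustrative assumptions and are unrelated to the antihydrogen data.

```python
def auc(signal_scores, background_scores):
    """Area under the ROC curve via pairwise comparison.

    Counts the fraction of (signal, background) pairs in which the signal
    event scores higher; ties count as half a win.
    """
    wins = 0.0
    for s in signal_scores:
        for b in background_scores:
            if s > b:
                wins += 1.0
            elif s == b:
                wins += 0.5
    return wins / (len(signal_scores) * len(background_scores))


# Toy scores: three of the four signal/background pairs are ordered correctly.
example = auc([0.9, 0.4], [0.6, 0.1])
```

The pairwise form is quadratic in the number of events and is meant only to make the definition concrete; a rank-based computation would be used on real data.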

We describe a strategy for constructing a neural network jet substructure
tagger which powerfully discriminates boosted decay signals while remaining
largely uncorrelated with the jet mass. This reduces the impact of systematic
uncertainties in background modeling while enhancing signal purity, resulting
in improved discovery significance relative to existing taggers. The network is
trained using an adversarial strategy, resulting in a tagger that learns to
balance classification accuracy with decorrelation. As a benchmark scenario, we
consider the case where large-radius jets originating from a boosted resonance
decay are discriminated from a background of nonresonant quark and gluon jets.
We show that in the presence of systematic uncertainties on the background
rate, our adversarially trained, decorrelated tagger considerably outperforms a
conventionally trained neural network, despite having a slightly worse
signal-background separation power. We generalize the adversarial training
technique to include a parametric dependence on the signal hypothesis, training
a single network that provides optimized, interpolatable decorrelated jet
tagging across a continuous range of hypothetical resonance masses, after
training on discrete choices of the signal mass.

Experiments in particle physics produce enormous quantities of data that must
be analyzed and interpreted by teams of physicists. This analysis is often
exploratory, where scientists are unable to enumerate the possible types of
signal prior to performing the experiment. Thus, tools for summarizing,
clustering, visualizing and classifying high-dimensional data are essential. In
this work, we show that meaningful physical content can be revealed by
transforming the raw data into a learned high-level representation using deep
neural networks, with measurements taken at the Daya Bay Neutrino Experiment as
a case study. We further show how convolutional deep neural networks can
provide an effective classification filter with greater than 97% accuracy
across different classes of physics events, significantly better than other
machine learning approaches.

In a physical neural system, where storage and processing are intimately
intertwined, the rules for adjusting the synaptic weights can only depend on
variables that are available locally, such as the activity of the pre- and
postsynaptic neurons, resulting in local learning rules. A systematic
framework for studying the space of local learning rules is obtained by first
specifying the nature of the local variables, and then the functional form that
ties them together into each learning rule. Such a framework also enables the
systematic discovery of new learning rules and exploration of relationships
between learning rules and group symmetries. We study polynomial local learning
rules stratified by their degree and analyze their behavior and capabilities in
both linear and nonlinear units and networks. Stacking local learning rules in
deep feedforward networks leads to deep local learning. While deep local
learning can learn interesting representations, it cannot learn complex
input-output functions, even when targets are available for the top layer.
Learning complex input-output functions requires local deep learning where
target information is communicated to the deep layers through a backward
learning channel. The nature of the communicated information about the targets
and the structure of the learning channel partition the space of learning
algorithms. We estimate the learning channel capacity associated with several
algorithms and show that backpropagation outperforms them by simultaneously
maximizing the information rate and minimizing the computational cost, even in
recurrent networks. The theory clarifies the concept of Hebbian learning,
establishes the power and limitations of local learning rules, introduces the
learning channel which enables a formal analysis of the optimality of
backpropagation, and explains the sparsity of the space of learning rules
discovered so far.
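
Two familiar rules illustrate what "polynomial local learning rules stratified by their degree" can look like. The sketch below is an assumption-laden example rather than the paper's framework: it takes the local variables to be the presynaptic activity `x`, the postsynaptic activity `y`, and the weight `w`, and shows simple Hebb (degree 2) next to Oja's rule (degree 3). The function names and the learning rate `lr` are hypothetical.

```python
def hebb_update(w, x, y, lr):
    """Simple Hebb rule: delta w_i = lr * x_i * y (degree 2 in local variables)."""
    return [wi + lr * xi * y for wi, xi in zip(w, x)]


def oja_update(w, x, y, lr):
    """Oja's rule: delta w_i = lr * y * (x_i - y * w_i) (degree 3)."""
    return [wi + lr * y * (xi - y * wi) for wi, xi in zip(w, x)]


# Each synapse uses only its own weight and the activity of the two
# neurons it connects; no target or error signal appears in either rule.
w_hebb = hebb_update([0.0, 0.0], [1.0, 2.0], 0.5, 1.0)
w_oja = oja_update([1.0, 0.0], [1.0, 2.0], 1.0, 0.5)
```

Neither rule receives target information, which is the sense in which stacking such rules gives deep local learning rather than local deep learning.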

Computing $k$-Nearest Neighbors (KNN) is one of the core kernels used in many
machine learning, data mining and scientific computing applications. Although
kd-tree-based $O(\log n)$ algorithms have been proposed for computing KNN, due
to their inherent sequentiality, linear algorithms are used in practice.
This limits the applicability of such methods to millions of data points, with
limited scalability for Big Data analytics challenges in the scientific domain.
In this paper, we present parallel and highly optimized kd-tree-based KNN
algorithms (both construction and querying) suitable for distributed
architectures. Our algorithm includes novel approaches for pruning search space
and improving load balancing and partitioning among nodes and threads. Using
TB-sized datasets from three science applications (astrophysics, plasma
physics, and particle physics), we show that our implementation can construct
a kd-tree of 189 billion particles in 48 seconds using $\sim$50,000 cores.
We also demonstrate computation of KNN of 19 billion queries in 12 seconds. We
demonstrate almost linear speedup both for shared and distributed memory
computers. Our algorithms outperform earlier implementations by more than an
order of magnitude, thereby radically improving the applicability of our
implementation to state-of-the-art Big Data analytics problems. In addition, we
showcase performance and scalability on the recently released Intel Xeon Phi
processor showing that our algorithm scales well even on massively parallel
architectures.
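
The basic serial idea behind kd-tree KNN, construction by median splits and querying with search-space pruning, can be sketched as follows. This is a single-threaded textbook version written for illustration; it is not the distributed algorithm described above, and the function names and sample points are assumptions.

```python
import heapq


def build_kdtree(points, depth=0):
    """Build a kd-tree by splitting at the median along cycling axes."""
    if not points:
        return None
    axis = depth % len(points[0])
    pts = sorted(points, key=lambda p: p[axis])
    mid = len(pts) // 2
    return {
        "point": pts[mid],
        "axis": axis,
        "left": build_kdtree(pts[:mid], depth + 1),
        "right": build_kdtree(pts[mid + 1:], depth + 1),
    }


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def _search(node, query, k, heap):
    """Collect the k nearest points in `heap`, a max-heap on distance
    stored as (negated squared distance, point)."""
    if node is None:
        return
    d = sq_dist(node["point"], query)
    if len(heap) < k:
        heapq.heappush(heap, (-d, node["point"]))
    elif d < -heap[0][0]:
        heapq.heapreplace(heap, (-d, node["point"]))
    diff = query[node["axis"]] - node["point"][node["axis"]]
    near, far = (node["left"], node["right"]) if diff < 0 else (node["right"], node["left"])
    _search(near, query, k, heap)
    # Prune: the far subtree can only help if the splitting plane is
    # closer to the query than the current k-th nearest neighbor.
    if len(heap) < k or diff * diff < -heap[0][0]:
        _search(far, query, k, heap)


def k_nearest(tree, query, k):
    """Return the k nearest points to `query`, closest first."""
    heap = []
    _search(tree, query, k, heap)
    return [p for _, p in sorted(heap, reverse=True)]


points = [(2, 3), (5, 4), (9, 6), (4, 7), (8, 1), (7, 2)]
tree = build_kdtree(points)
neighbors = k_nearest(tree, (9, 2), 2)
```

The pruning test in `_search` is what lets a query skip whole subtrees; distributing construction and queries across nodes and balancing the load, which is the subject of the work described above, is outside the scope of this sketch.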

Theano is a Python library that allows users to define, optimize, and evaluate
mathematical expressions involving multidimensional arrays efficiently. Since
its introduction, it has been one of the most used CPU and GPU mathematical
compilers, especially in the machine learning community, and has shown steady
performance improvements. Theano has been actively and continuously developed
since 2008; multiple frameworks have been built on top of it, and it has been
used to produce many state-of-the-art machine learning models.
The present article is structured as follows. Section I provides an overview
of the Theano software and its community. Section II presents the principal
features of Theano and how to use them, and compares them with other similar
projects. Section III focuses on recently introduced functionalities and
improvements. Section IV compares the performance of Theano against Torch7 and
TensorFlow on several machine learning models. Section V discusses current
limitations of Theano and potential ways of improving it.

At the extreme energies of the Large Hadron Collider, massive particles can
be produced at such high velocities that their hadronic decays are collimated
and the resulting jets overlap. Deducing whether the substructure of an
observed jet is due to a low-mass single particle or due to multiple decay
objects of a massive particle is an important problem in the analysis of
collider data. Traditional approaches have relied on expert features designed
to detect energy deposition patterns in the calorimeter, but the complexity of
the data makes this task an excellent candidate for the application of machine
learning tools. The data collected by the detector can be treated as a
two-dimensional image, lending itself to the natural application of image
classification techniques. In this work, we apply deep neural networks with a
mixture of locally connected and fully connected nodes. Our experiments
demonstrate that without the aid of expert features, such networks match or
modestly outperform the current state-of-the-art approach for discriminating
between jets from single hadronic particles and overlapping jets from pairs of
collimated hadronic particles, and that such performance gains persist in the
presence of pileup interactions.

We investigate a new structure for machine learning classifiers applied to
problems in high-energy physics by expanding the inputs to include not only
measured features but also physics parameters. The physics parameters represent
a smoothly varying learning task, and the resulting parameterized classifier
can smoothly interpolate between them and replace sets of classifiers trained
at individual values. This simplifies the training process and gives improved
performance at intermediate values, even for complex problems requiring deep
learning. Applications include tools parameterized in terms of theoretical
model parameters, such as the mass of a particle, which allow for a single
network to provide improved discrimination across a range of masses. This
concept is simple to implement and allows for optimized interpolatable results.
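
The input expansion can be sketched in a few lines. In this hedged example the helper `parameterized_inputs`, the feature values, and the mass values 500, 750 and 1000 are all hypothetical; the point is only that the physics parameter becomes one more input column, so a single network can be trained on several parameter values and evaluated at an intermediate one.

```python
import numpy as np


def parameterized_inputs(features, theta):
    """Append the physics parameter `theta` as an extra input column."""
    features = np.asarray(features, dtype=float)
    theta_col = np.full((features.shape[0], 1), float(theta))
    return np.hstack([features, theta_col])


# Two measured features per event (toy values).
events = np.array([[0.2, 1.5],
                   [0.7, 0.3]])

# Training set: the same network sees examples generated at several
# discrete values of the parameter (here, hypothetical masses).
X_train = np.vstack([
    parameterized_inputs(events, 500.0),
    parameterized_inputs(events, 1000.0),
])

# Evaluation: the parameter is set to an intermediate value that was
# never used in training, and the network is asked to interpolate.
X_eval = parameterized_inputs(events, 750.0)
```

Any classifier that accepts a fixed-width input can consume `X_train` and `X_eval` unchanged, which is why the approach replaces a set of per-mass classifiers with one network.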

Artificial neural networks typically have a fixed, nonlinear activation
function at each neuron. We have designed a novel form of piecewise linear
activation function that is learned independently for each neuron using
gradient descent. With this adaptive activation function, we are able to
improve upon deep neural network architectures composed of static rectified
linear units, achieving state-of-the-art performance on CIFAR-10 (7.51%),
CIFAR-100 (30.83%), and a benchmark from high-energy physics involving Higgs
boson decay modes.
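
One way to realize a learnable piecewise linear activation is a rectifier plus a sum of hinge functions whose slopes and hinge locations are per-neuron parameters. The abstract does not give the functional form, so the parameterization below, h(x) = max(0, x) + sum_s a_s max(0, -x + b_s), and the names `apl` and `apl_grad_a` are assumptions made for this sketch.

```python
def apl(x, a, b):
    """Piecewise linear activation with learnable slopes `a` and hinge
    locations `b` (one pair per hinge):

        h(x) = max(0, x) + sum_s a[s] * max(0, -x + b[s])
    """
    return max(0.0, x) + sum(
        a_s * max(0.0, -x + b_s) for a_s, b_s in zip(a, b)
    )


def apl_grad_a(x, b):
    """Gradient of h(x) with respect to each slope a[s]; this is the
    quantity gradient descent uses to adapt the shape of the activation."""
    return [max(0.0, -x + b_s) for b_s in b]


# With no hinges the unit reduces to an ordinary rectified linear unit.
relu_like = apl(2.0, [], [])
# One hinge at 0 with slope 0.5 changes the response to negative inputs.
with_hinge = apl(-1.0, [0.5], [0.0])
```

Because `a` and `b` are stored per neuron, each neuron can learn a different shape while the rest of the network is trained in the usual way.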

The Higgs boson is thought to provide the interaction that imparts mass to
the fundamental fermions, but while measurements at the Large Hadron Collider
(LHC) are consistent with this hypothesis, current analysis techniques lack the
statistical power to cross the traditional 5$\sigma$ significance barrier
without more data. \emph{Deep learning} techniques have the potential to
increase the statistical power of this analysis by \emph{automatically}
learning complex, high-level data representations. In this work, deep neural
networks are used to detect the decay of the Higgs to a pair of tau leptons. A
Bayesian optimization algorithm is used to tune the network architecture and
training algorithm hyperparameters, resulting in a deep network of eight
nonlinear processing layers that improves upon the performance of shallow
classifiers even without the use of features specifically engineered by
physicists for this application. The improvement in discovery significance is
equivalent to an increase in the accumulated dataset of 25\%.
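
To put the 25\% figure in perspective, a rough conversion to significance is possible under a simple counting approximation. The scaling below is an assumption for illustration, not a result from this work: it supposes a statistics-limited measurement in which signal and background yields $s$ and $b$ both grow linearly with the accumulated dataset $L$.

```latex
% Assumption: counting experiment with s, b \propto L and Z \approx s / \sqrt{b}.
\[
  Z \approx \frac{s}{\sqrt{b}} \propto \sqrt{L}
  \quad\Longrightarrow\quad
  \frac{Z(1.25\,L)}{Z(L)} = \sqrt{1.25} \approx 1.12 .
\]
```

Under that approximation, a 25\% larger dataset corresponds to roughly a 12\% gain in discovery significance at fixed analysis technique.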

Collisions at high-energy particle colliders are a traditionally fruitful
source of exotic particle discoveries. Finding these rare particles requires
solving difficult signal-versus-background classification problems, hence
machine learning approaches are often used. Standard approaches have relied on
`shallow' machine learning models that have a limited capacity to learn complex
nonlinear functions of the inputs, and rely on a painstaking search through
manually constructed nonlinear features. Progress on this problem has slowed,
as a variety of techniques have shown equivalent performance. Recent advances
in the field of deep learning make it possible to learn more complex functions
and better discriminate between signal and background classes. Using benchmark
datasets, we show that deep learning methods need no manually constructed
inputs and yet improve the classification metric by as much as 8\% over the
best current approaches. This demonstrates that deep learning approaches can
improve the power of collider searches for exotic particles.