
An outstanding problem in neuroscience is to understand how information is
integrated across the many modules of the brain. While classic
information-theoretic measures have transformed our understanding of
feedforward information processing in the brain's sensory periphery, comparable
measures for information flow in the massively recurrent networks of the rest
of the brain have been lacking. To address this, recent work in information
theory has produced a sound measure of network-wide "integrated information,"
which can be estimated from time-series data. However, a computational hurdle has
stymied attempts to measure large-scale information integration in real brains.
Specifically, the measurement of integrated information involves a
combinatorial search for the informational "weakest link" of a network, a
process whose computation time explodes super-exponentially with network size.
Here, we show that spectral clustering, applied to the correlation matrix of
time-series data, provides an approximate but robust solution to the search for
the informational weakest link of large networks. This reduces the
computation time for integrated information in large systems from longer than
the lifespan of the universe to just minutes. We evaluate this solution in
brain-like systems of coupled oscillators as well as in high-density
electrocorticography data from two macaque monkeys, and show that the
informational "weakest link" of the monkey cortex splits posterior sensory
areas from anterior association areas. Finally, we use our solution to provide
evidence in support of the longstanding hypothesis that information
integration is maximized by networks with a high global efficiency, and that
modular network structures promote the segregation of information.
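
As a rough illustration of the idea, the following is a minimal sketch of a spectral bipartition of a correlation matrix. It is not the authors' implementation: the use of absolute correlations as affinities, the symmetric normalized Laplacian, and the sign split on the second eigenvector are assumptions chosen for simplicity, and the function name `spectral_bipartition` is hypothetical. The resulting split would only be a candidate for the informational weakest link; integrated information itself is not computed here.

```python
import numpy as np

def spectral_bipartition(data):
    """Split channels into two groups from their correlation matrix.

    data: array of shape (n_channels, n_samples).
    Returns a boolean array with one label per channel.
    """
    # Affinity between channels: absolute Pearson correlation (an assumption).
    affinity = np.abs(np.corrcoef(data))
    np.fill_diagonal(affinity, 0.0)

    # Symmetric normalized graph Laplacian: I - D^{-1/2} A D^{-1/2}.
    degree = affinity.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(degree)
    laplacian = np.eye(len(degree)) - d_inv_sqrt[:, None] * affinity * d_inv_sqrt[None, :]

    # eigh returns eigenvalues in ascending order; the second eigenvector
    # (Fiedler vector) gives a relaxed solution to the minimum normalized cut.
    _, eigvecs = np.linalg.eigh(laplacian)
    fiedler = eigvecs[:, 1] * d_inv_sqrt
    return fiedler > 0

# Toy data: two groups of four channels, each group driven by its own latent signal.
rng = np.random.default_rng(0)
latent = rng.standard_normal((2, 2000))
group_a = latent[0] + 0.5 * rng.standard_normal((4, 2000))
group_b = latent[1] + 0.5 * rng.standard_normal((4, 2000))
labels = spectral_bipartition(np.vstack([group_a, group_b]))
print(labels)
```

The cost of this step is dominated by one eigendecomposition of an n-by-n matrix, in contrast to an exhaustive search over all partitions.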

Finding overcomplete latent representations of data has applications in data
analysis, signal processing, machine learning, theoretical neuroscience and
many other fields. In an overcomplete representation, the number of latent
features exceeds the data dimensionality, which is useful when the data is
undersampled by the measurements (compressed sensing, information bottlenecks
in neural systems) or composed from multiple complete sets of linear features,
each spanning the data space. Independent Components Analysis (ICA) is a linear
technique for learning sparse latent representations, which typically has a
lower computational cost than sparse coding, its nonlinear, recurrent
counterpart. While ICA is well suited to finding complete representations, we show
that overcompleteness poses a challenge to existing ICA algorithms.
Specifically, the coherence control in existing ICA algorithms, necessary to
prevent the formation of duplicate dictionary features, is ill-suited in the
overcomplete case. We show that in this case several existing ICA algorithms
have undesirable global minima that maximize coherence. Further, by comparing
ICA algorithms on synthetic data and natural images to the computationally more
expensive sparse coding solution, we show that the coherence control biases the
exploration of the data manifold, sometimes yielding suboptimal solutions. We
provide a theoretical explanation of these failures and, based on the theory,
propose improved overcomplete ICA algorithms. All told, this study contributes
new insights into and methods for coherence control for linear ICA, some of
which are applicable to many other, potentially nonlinear, unsupervised
learning methods.
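
To make the notion of coherence concrete, the sketch below computes the mutual coherence of a dictionary (the largest absolute inner product between distinct unit-norm features) and compares it with the Welch bound, a standard lower bound on the coherence of any overcomplete set of unit vectors. This is an illustration only, not one of the coherence-control costs analyzed in the study; the function names `mutual_coherence` and `welch_bound` are hypothetical.

```python
import numpy as np

def mutual_coherence(dictionary):
    """Largest absolute cosine between two distinct columns (features)."""
    cols = dictionary / np.linalg.norm(dictionary, axis=0, keepdims=True)
    gram = np.abs(cols.T @ cols)
    np.fill_diagonal(gram, 0.0)  # ignore each feature's similarity to itself
    return gram.max()

def welch_bound(n_dims, n_features):
    """Lower bound on mutual coherence when n_features >= n_dims."""
    return np.sqrt((n_features - n_dims) / (n_dims * (n_features - 1)))

# Complete, orthonormal dictionary: features can be fully incoherent.
complete = np.eye(3)

# Overcomplete dictionary with a duplicated feature: maximal coherence.
duplicated = np.hstack([np.eye(3), np.eye(3)[:, :1]])

# Three unit vectors in the plane, 120 degrees apart: coherence meets the Welch bound.
angles = np.deg2rad([90.0, 210.0, 330.0])
three_in_plane = np.vstack([np.cos(angles), np.sin(angles)])

print(mutual_coherence(complete))
print(mutual_coherence(duplicated))
print(mutual_coherence(three_in_plane), welch_bound(2, 3))
```

In the complete case zero coherence is attainable, whereas in the overcomplete case the bound is strictly positive, so a penalty that simply drives coherence toward zero cannot be satisfied exactly.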

To accommodate structured approaches to neural computation, we propose a
class of recurrent neural networks for indexing and storing sequences of
symbols or analog data vectors. These networks with randomized input weights
and orthogonal recurrent weights implement coding principles previously
described in vector symbolic architectures (VSA), and leverage properties of
reservoir computing. In general, the storage in reservoir computing is lossy
and crosstalk noise limits the retrieval accuracy and information capacity. A
novel theory to optimize memory performance in such networks is presented and
compared with simulation experiments. The theory describes linear readout of
analog data, and readout with winner-take-all error correction of symbolic data
as proposed in VSA models. We find that diverse VSA models from the literature
have universal performance properties, which are superior to what previous
analyses predicted. Further, we propose novel VSA models with the statistically
optimal Wiener filter in the readout that exhibit much higher information
capacity, in particular for storing analog data.
The presented theory also applies to memory buffers, networks with gradual
forgetting, which can operate on infinite data streams without memory overflow.
Interestingly, we find that different forgetting mechanisms, such as
attenuating recurrent weights or neural nonlinearities, produce very similar
behavior if the forgetting time constants are aligned. Such models exhibit
extensive capacity when their forgetting time constant is optimized for given
noise conditions and network size. These results enable the design of new types
of VSA models for the online processing of data streams.
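
The storage and readout scheme described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the models analyzed in the paper: the input weights are a random bipolar codebook, the orthogonal recurrent matrix is taken to be a random permutation, and there is no forgetting. All names (`codebook`, `perm`, `encode`, `decode`) and parameter values are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n_neurons, n_symbols, seq_len = 1000, 26, 10

# Random input weights: one bipolar code vector per symbol (columns).
codebook = rng.choice([-1.0, 1.0], size=(n_neurons, n_symbols))

# Orthogonal recurrent weights, here a random permutation of the neurons.
perm = rng.permutation(n_neurons)
inverse_perm = np.argsort(perm)

def encode(sequence):
    """Superpose the sequence in one state vector: x <- P x + c_symbol."""
    state = np.zeros(n_neurons)
    for symbol in sequence:
        state = state[perm] + codebook[:, symbol]
    return state

def decode(state, lag):
    """Recall the symbol presented `lag` steps before the last one."""
    for _ in range(lag):
        state = state[inverse_perm]  # undo one recurrent step
    # Linear readout against the codebook, then winner-take-all.
    return int(np.argmax(codebook.T @ state))

sequence = [int(s) for s in rng.integers(0, n_symbols, size=seq_len)]
state = encode(sequence)
recalled = [decode(state, lag=seq_len - 1 - k) for k in range(seq_len)]
print(sequence)
print(recalled)
```

Each readout contains the target code vector plus crosstalk from the other stored items, so retrieval errors become more likely as the sequence length grows relative to the number of neurons.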

To understand cognitive reasoning in the brain, it has been proposed that
symbols and compositions of symbols are represented by activity patterns
(vectors) in a large population of neurons. Formal models implementing this
idea [Plate 2003], [Kanerva 2009], [Gayler 2003], [Eliasmith 2012] include a
reversible superposition operation for representing with a single vector an
entire set of symbols or an ordered sequence of symbols. If the representation
space is high-dimensional, large sets of symbols can be superposed and
individually retrieved. However, crosstalk noise limits the accuracy of
retrieval and information capacity. To understand information processing in the
brain and to design artificial neural systems for cognitive reasoning, a theory
of this superposition operation is essential. Here, such a theory is presented.
The superposition operations in different existing models are mapped to linear
neural networks with unitary recurrent matrices, in which retrieval accuracy
can be analyzed by a single equation. We show that networks representing
information in superposition can achieve a channel capacity of about half a bit
per neuron, a significant fraction of the total available entropy. Going beyond
existing models, superposition operations with recency effects are proposed
that avoid catastrophic forgetting when representing the history of infinite
data streams. These novel models correspond to recurrent networks with
non-unitary matrices or with nonlinear neurons, and can be analyzed and
optimized with an extension of our theory.

Sparse coding or sparse dictionary learning has been widely used to recover
underlying structure in many kinds of natural data. Here, we provide conditions
guaranteeing when this recovery is universal; that is, when sparse codes and
dictionaries are unique (up to natural symmetries). Our main tool is a useful
lemma in combinatorial matrix theory that allows us to derive bounds on the
sample sizes guaranteeing such uniqueness under various assumptions for how
training data are generated. Whenever the conditions of one of our theorems are
met, any sparsity-constrained learning algorithm that succeeds in
reconstructing the data recovers the original sparse codes and dictionary. We
also discuss potential applications to neuroscience and data analysis.

Although exploratory behaviors are ubiquitous in the animal kingdom, their
computational underpinnings are still largely unknown. Behavioral Psychology
has identified learning as a primary drive underlying many exploratory
behaviors. Exploration is seen as a means for an animal to gather sensory data
useful for reducing its ignorance about the environment. While related problems
have been addressed in Data Mining and Reinforcement Learning, the
computational modeling of learning-driven exploration by embodied agents is
largely unrepresented.
Here, we propose a computational theory for learning-driven exploration based
on the concept of missing information that allows an agent to identify
informative actions using Bayesian inference. We demonstrate that when
embodiment constraints are high, agents must actively coordinate their actions
to learn efficiently. Compared to earlier approaches, our exploration policy
yields more efficient learning across a range of worlds with diverse
structures. The improved learning in turn affords greater success in general
tasks including navigation and reward gathering. We conclude by discussing how
the proposed theory relates to previous information-theoretic objectives of
behavior, such as predictive information and the free energy principle, and how
it might contribute to a general theory of exploratory behavior.
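
One way to picture how an agent might rank actions by how informative they are is sketched below. The modeling choices are assumptions made for illustration and are not taken from the abstract: outcomes of each action are categorical, the agent keeps Dirichlet pseudo-counts per action, and the expected information gain is the expected Kullback-Leibler divergence between the updated and the current predictive distribution. The function names are hypothetical.

```python
import numpy as np

def predictive(counts):
    """Posterior predictive distribution from Dirichlet pseudo-counts."""
    return counts / counts.sum()

def expected_information_gain(counts):
    """Expected KL divergence (bits) from current to updated prediction.

    counts: positive pseudo-counts over the outcomes of one action.
    """
    current = predictive(counts)
    gain = 0.0
    for outcome in range(len(counts)):
        updated_counts = counts.copy()
        updated_counts[outcome] += 1.0  # imagine observing this outcome
        updated = predictive(updated_counts)
        kl = np.sum(updated * np.log2(updated / current))
        gain += current[outcome] * kl   # weight by predicted probability
    return gain

# Pseudo-counts for two actions: one barely explored, one well explored.
rarely_tried = np.array([1.0, 1.0, 1.0])
often_tried = np.array([34.0, 34.0, 34.0])

gains = {
    "rarely_tried": expected_information_gain(rarely_tried),
    "often_tried": expected_information_gain(often_tried),
}
best_action = max(gains, key=gains.get)
print(gains, best_action)
```

Under these assumptions the greedy choice favors the action whose outcomes are least well known. Such a one-step rule does not by itself capture the coordination of actions over time that embodiment constraints require.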

Sparse coding networks, which utilize unsupervised learning to maximize
coding efficiency, have successfully reproduced response properties found in
primary visual cortex [Olshausen and Field 1996]. However, conventional sparse
coding models require that the coding circuit can fully sample the sensory data
in a one-to-one fashion, a requirement not supported by experimental data from
the thalamocortical projection. To relieve these strict wiring requirements,
we propose a sparse coding network constructed by introducing synaptic learning
in the framework of compressed sensing. We demonstrate that the new model
evolves biologically realistic spatially smooth receptive fields despite the
fact that the feedforward connectivity subsamples the input and thus the
learning has to rely on an impoverished and distorted account of the original
visual data. Further, we demonstrate that the model could form a general scheme
of cortical communication: it can form meaningful representations in a
secondary sensory area, which receives input from the primary sensory area
through a "compressing" corticocortical projection. Finally, we prove that our
model belongs to a new class of sparse coding algorithms in which recurrent
connections are essential in forming the spatial receptive fields.

A new algorithm is proposed for a) unsupervised learning of sparse
representations from subsampled measurements and b) estimating the parameters
required for linearly reconstructing signals from the sparse codes. We verify
that the new algorithm performs efficient data compression on par with the
recent method of compressive sampling. Further, we demonstrate that the
algorithm performs robustly when stacked in several stages or when applied in
undercomplete or overcomplete situations. The new algorithm can explain how
neural populations in the brain that receive subsampled input through fiber
bottlenecks are able to form coherent response properties.

Thalamic relay cells fire action potentials that transmit information from
retina to cortex. The amount of information that spike trains encode is usually
estimated from the precision of spike timing with respect to the stimulus.
Sensory input, however, is only one factor that influences neural activity. For
example, intrinsic dynamics, such as oscillations of networks of neurons, also
modulate firing patterns. Here, we asked if retinal oscillations might help to
convey information to neurons downstream. Specifically, we made whole-cell
recordings from relay cells to reveal retinal inputs (EPSPs) and thalamic
outputs (spikes) and analyzed these events with information theory. Our results
show that thalamic spike trains operate as two multiplexed channels. One
channel, which occupies a low frequency band (<30 Hz), is encoded by average
firing rate with respect to the stimulus and carries information about local
changes in the image over time. The other operates in the gamma frequency band
(40-80 Hz) and is encoded by spike time relative to the retinal oscillations.
Because these oscillations involve extensive areas of the retina, it is likely
that the second channel transmits information about global features of the
visual scene. At times, the second channel conveyed even more information than
the first.

Periodic neural activity not locked to the stimulus or to motor responses is
usually ignored. Here, we present new tools for modeling and quantifying the
information transmission based on periodic neural activity that occurs with
quasi-random phase relative to the stimulus. We propose a model to reproduce
characteristic features of oscillatory spike trains, such as histograms of
interspike intervals and phase locking of spikes to an oscillatory influence.
The proposed model is based on an inhomogeneous Gamma process governed by a
density function that is a product of the usual stimulus-dependent rate and a
quasi-periodic function. Further, we present an analysis method generalizing
the direct method (Rieke et al., 1999; Brenner et al., 2000) to assess the
information content in such data. We demonstrate these tools on recordings from
relay cells in the lateral geniculate nucleus of the cat.
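
A spike train of the kind described by the model can be simulated with the time-rescaling construction sketched below. This is an illustrative sketch only: the stimulus rate, the strictly periodic 60 Hz modulation (standing in for a quasi-periodic function), the Gamma shape parameter, and the function name `simulate_gamma_spikes` are all assumptions, not values or code from the paper.

```python
import numpy as np

def simulate_gamma_spikes(rate, dt, shape, rng):
    """Inhomogeneous Gamma process via time rescaling.

    rate: non-negative intensity (Hz) sampled every dt seconds.
    shape: Gamma shape parameter (1 gives a Poisson process).
    """
    cumulative = np.cumsum(rate) * dt  # integrated intensity up to each bin
    total = cumulative[-1]

    # Draw unit-mean Gamma intervals in rescaled time until the end is reached.
    rescaled_times = []
    t = rng.gamma(shape, 1.0 / shape)
    while t < total:
        rescaled_times.append(t)
        t += rng.gamma(shape, 1.0 / shape)

    # Map rescaled spike times back to real time through the cumulative intensity.
    bins = np.searchsorted(cumulative, np.array(rescaled_times))
    return bins * dt

rng = np.random.default_rng(1)
dt, duration, osc_freq = 1e-4, 10.0, 60.0
time = np.arange(0.0, duration, dt)

# Density = stimulus-dependent rate x oscillatory modulation (both assumed).
stimulus_rate = 40.0 + 20.0 * np.sin(2.0 * np.pi * 2.0 * time)
oscillation = 1.0 + 0.8 * np.cos(2.0 * np.pi * osc_freq * time)
spikes = simulate_gamma_spikes(stimulus_rate * oscillation, dt, shape=4.0, rng=rng)

# Phase locking to the oscillation, measured as vector strength.
phases = 2.0 * np.pi * osc_freq * spikes
vector_strength = np.abs(np.mean(np.exp(1j * phases)))
print(len(spikes), vector_strength)
```

Histograms of `np.diff(spikes)` and of `phases` modulo 2*pi give the interspike-interval distribution and the phase-locking profile of the simulated train.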