• An outstanding problem in neuroscience is to understand how information is integrated across the many modules of the brain. While classic information-theoretic measures have transformed our understanding of feedforward information processing in the brain's sensory periphery, comparable measures for information flow in the massively recurrent networks of the rest of the brain have been lacking. To address this, recent work in information theory has produced a sound measure of network-wide "integrated information," which can be estimated from time-series data. However, a computational hurdle has stymied attempts to measure large-scale information integration in real brains. Specifically, the measurement of integrated information involves a combinatorial search for the informational "weakest link" of a network, a process whose computation time explodes super-exponentially with network size. Here, we show that spectral clustering, applied to the correlation matrix of time-series data, provides an approximate but robust solution to the search for the informational weakest link of large networks. This reduces the computation time for integrated information in large systems from longer than the lifespan of the universe to just minutes. We evaluate this solution in brain-like systems of coupled oscillators as well as in high-density electrocorticography data from two macaque monkeys, and show that the informational "weakest link" of the monkey cortex splits posterior sensory areas from anterior association areas. Finally, we use our solution to provide evidence in support of the long-standing hypothesis that information integration is maximized by networks with a high global efficiency, and that modular network structures promote the segregation of information.
  • Finding overcomplete latent representations of data has applications in data analysis, signal processing, machine learning, theoretical neuroscience and many other fields. In an overcomplete representation, the number of latent features exceeds the data dimensionality, which is useful when the data is undersampled by the measurements (compressed sensing, information bottlenecks in neural systems) or composed from multiple complete sets of linear features, each spanning the data space. Independent Components Analysis (ICA) is a linear technique for learning sparse latent representations, which typically has a lower computational cost than sparse coding, its nonlinear, recurrent counterpart. While ICA is well suited for finding complete representations, we show that overcompleteness poses a challenge to existing ICA algorithms. Specifically, the coherence control in existing ICA algorithms, necessary to prevent the formation of duplicate dictionary features, is ill-suited to the overcomplete case. We show that in this case several existing ICA algorithms have undesirable global minima that maximize coherence. Further, by comparing ICA algorithms on synthetic data and natural images to the computationally more expensive sparse coding solution, we show that the coherence control biases the exploration of the data manifold, sometimes yielding suboptimal solutions. We provide a theoretical explanation of these failures and, based on the theory, propose improved overcomplete ICA algorithms. All told, this study contributes new insights into and methods for coherence control for linear ICA, some of which are applicable to many other, potentially nonlinear, unsupervised learning methods.
  • To accommodate structured approaches to neural computation, we propose a class of recurrent neural networks for indexing and storing sequences of symbols or analog data vectors. These networks, with randomized input weights and orthogonal recurrent weights, implement coding principles previously described in vector symbolic architectures (VSA), and leverage properties of reservoir computing. In general, the storage in reservoir computing is lossy, and crosstalk noise limits the retrieval accuracy and information capacity. A novel theory to optimize memory performance in such networks is presented and compared with simulation experiments. The theory describes linear readout of analog data, and readout with winner-take-all error correction of symbolic data as proposed in VSA models. We find that diverse VSA models from the literature have universal performance properties, which are superior to what previous analyses predicted. Further, we propose novel VSA models with the statistically optimal Wiener filter in the readout that exhibit much higher information capacity, in particular for storing analog data. The presented theory also applies to memory buffers, networks with gradual forgetting, which can operate on infinite data streams without memory overflow. Interestingly, we find that different forgetting mechanisms, such as attenuating recurrent weights or neural nonlinearities, produce very similar behavior if the forgetting time constants are aligned. Such models exhibit extensive capacity when their forgetting time constant is optimized for given noise conditions and network size. These results enable the design of new types of VSA models for the online processing of data streams.
  • To understand cognitive reasoning in the brain, it has been proposed that symbols and compositions of symbols are represented by activity patterns (vectors) in a large population of neurons. Formal models implementing this idea [Plate 2003], [Kanerva 2009], [Gayler 2003], [Eliasmith 2012] include a reversible superposition operation for representing with a single vector an entire set of symbols or an ordered sequence of symbols. If the representation space is high-dimensional, large sets of symbols can be superposed and individually retrieved. However, crosstalk noise limits the accuracy of retrieval and information capacity. To understand information processing in the brain and to design artificial neural systems for cognitive reasoning, a theory of this superposition operation is essential. Here, such a theory is presented. The superposition operations in different existing models are mapped to linear neural networks with unitary recurrent matrices, in which retrieval accuracy can be analyzed by a single equation. We show that networks representing information in superposition can achieve a channel capacity of about half a bit per neuron, a significant fraction of the total available entropy. Going beyond existing models, superposition operations with recency effects are proposed that avoid catastrophic forgetting when representing the history of infinite data streams. These novel models correspond to recurrent networks with non-unitary matrices or with nonlinear neurons, and can be analyzed and optimized with an extension of our theory.
  • Sparse coding or sparse dictionary learning has been widely used to recover underlying structure in many kinds of natural data. Here, we provide conditions guaranteeing when this recovery is universal; that is, when sparse codes and dictionaries are unique (up to natural symmetries). Our main tool is a useful lemma in combinatorial matrix theory that allows us to derive bounds on the sample sizes guaranteeing such uniqueness under various assumptions for how training data are generated. Whenever the conditions of one of our theorems are met, any sparsity-constrained learning algorithm that succeeds in reconstructing the data recovers the original sparse codes and dictionary. We also discuss potential applications to neuroscience and data analysis.
  • Although exploratory behaviors are ubiquitous in the animal kingdom, their computational underpinnings are still largely unknown. Behavioral Psychology has identified learning as a primary drive underlying many exploratory behaviors. Exploration is seen as a means for an animal to gather sensory data useful for reducing its ignorance about the environment. While related problems have been addressed in Data Mining and Reinforcement Learning, the computational modeling of learning-driven exploration by embodied agents is largely unrepresented. Here, we propose a computational theory for learning-driven exploration based on the concept of missing information that allows an agent to identify informative actions using Bayesian inference. We demonstrate that when embodiment constraints are high, agents must actively coordinate their actions to learn efficiently. Compared to earlier approaches, our exploration policy yields more efficient learning across a range of worlds with diverse structures. The improved learning in turn affords greater success in general tasks including navigation and reward gathering. We conclude by discussing how the proposed theory relates to previous information-theoretic objectives of behavior, such as predictive information and the free energy principle, and how it might contribute to a general theory of exploratory behavior.
  • Sparse coding networks, which utilize unsupervised learning to maximize coding efficiency, have successfully reproduced response properties found in primary visual cortex [Olshausen and Field 1996]. However, conventional sparse coding models require that the coding circuit can fully sample the sensory data in a one-to-one fashion, a requirement not supported by experimental data from the thalamo-cortical projection. To relieve these strict wiring requirements, we propose a sparse coding network constructed by introducing synaptic learning in the framework of compressed sensing. We demonstrate that the new model evolves biologically realistic, spatially smooth receptive fields despite the fact that the feedforward connectivity subsamples the input and thus the learning has to rely on an impoverished and distorted account of the original visual data. Further, we demonstrate that the model could form a general scheme of cortical communication: it can form meaningful representations in a secondary sensory area, which receives input from the primary sensory area through a "compressing" cortico-cortical projection. Finally, we prove that our model belongs to a new class of sparse coding algorithms in which recurrent connections are essential in forming the spatial receptive fields.
  • A new algorithm is proposed for a) unsupervised learning of sparse representations from subsampled measurements and b) estimating the parameters required for linearly reconstructing signals from the sparse codes. We verify that the new algorithm performs efficient data compression on par with the recent method of compressive sampling. Further, we demonstrate that the algorithm performs robustly when stacked in several stages or when applied in undercomplete or overcomplete situations. The new algorithm can explain how neural populations in the brain that receive subsampled input through fiber bottlenecks are able to form coherent response properties.
  • Thalamic relay cells fire action potentials that transmit information from retina to cortex. The amount of information that spike trains encode is usually estimated from the precision of spike timing with respect to the stimulus. Sensory input, however, is only one factor that influences neural activity. For example, intrinsic dynamics, such as oscillations of networks of neurons, also modulate firing patterns. Here, we asked if retinal oscillations might help to convey information to neurons downstream. Specifically, we made whole-cell recordings from relay cells to reveal retinal inputs (EPSPs) and thalamic outputs (spikes) and analyzed these events with information theory. Our results show that thalamic spike trains operate as two multiplexed channels. One channel, which occupies a low frequency band (<30 Hz), is encoded by average firing rate with respect to the stimulus and carries information about local changes in the image over time. The other operates in the gamma frequency band (40-80 Hz) and is encoded by spike timing relative to the retinal oscillations. Because these oscillations involve extensive areas of the retina, it is likely that the second channel transmits information about global features of the visual scene. At times, the second channel conveyed even more information than the first.
  • Periodic neural activity not locked to the stimulus or to motor responses is usually ignored. Here, we present new tools for modeling and quantifying information transmission based on periodic neural activity that occurs with quasi-random phase relative to the stimulus. We propose a model to reproduce characteristic features of oscillatory spike trains, such as histograms of inter-spike intervals and phase locking of spikes to an oscillatory influence. The proposed model is based on an inhomogeneous Gamma process governed by a density function that is a product of the usual stimulus-dependent rate and a quasi-periodic function. Further, we present an analysis method generalizing the direct method (Rieke et al., 1999; Brenner et al., 2000) to assess the information content in such data. We demonstrate these tools on recordings from relay cells in the lateral geniculate nucleus of the cat.
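
The inhomogeneous Gamma process described in the last abstract can be illustrated with a small simulation. The sketch below is not the authors' implementation. It assumes a sinusoidal modulation of the form 1 + depth * cos(2π f t + phase) as a stand-in for the quasi-periodic function, and it uses the standard time-rescaling construction: the rate is integrated on a time grid, and a spike is emitted whenever the integral since the last spike exceeds a Gamma-distributed threshold with mean 1. All function names and parameter values (`modulated_rate`, `simulate_gamma_spikes`, `vector_strength`, the 40 Hz base rate, the 60 Hz oscillation) are hypothetical choices made for illustration.

```python
import math
import random


def modulated_rate(t, base_rate, freq_hz, depth, phase):
    """Stimulus-dependent rate times a periodic factor.

    The cosine form is an assumption for illustration; depth must lie in
    [0, 1] so that the rate stays non-negative.
    """
    return base_rate(t) * (1.0 + depth * math.cos(2.0 * math.pi * freq_hz * t + phase))


def simulate_gamma_spikes(rate_fn, duration, shape, dt=1e-4, rng=None):
    """Simulate an inhomogeneous Gamma process by time rescaling.

    The rate is integrated on a grid of step dt. A spike is placed at the end
    of the step in which the integral since the previous spike reaches a
    threshold drawn from a Gamma distribution with the given shape and mean 1.
    At most one spike is emitted per time step.
    """
    if rng is None:
        rng = random.Random()
    spikes = []
    threshold = rng.gammavariate(shape, 1.0 / shape)  # mean = shape * scale = 1
    accumulated = 0.0
    for i in range(int(round(duration / dt))):
        accumulated += rate_fn(i * dt) * dt
        if accumulated >= threshold:
            spikes.append((i + 1) * dt)
            accumulated = 0.0
            threshold = rng.gammavariate(shape, 1.0 / shape)
    return spikes


def vector_strength(spikes, freq_hz, phase):
    """Phase locking of spikes to the oscillation (0 = none, 1 = perfect)."""
    if not spikes:
        return 0.0
    angles = [2.0 * math.pi * freq_hz * t + phase for t in spikes]
    c = sum(math.cos(a) for a in angles) / len(angles)
    s = sum(math.sin(a) for a in angles) / len(angles)
    return math.hypot(c, s)


# Example: constant 40 Hz stimulus-driven rate, 60 Hz oscillation whose phase
# is drawn at random, i.e. not locked to the stimulus.
rng = random.Random(0)
phase = rng.uniform(0.0, 2.0 * math.pi)
rate = lambda t: modulated_rate(t, lambda _t: 40.0, 60.0, 0.8, phase)
spikes = simulate_gamma_spikes(rate, duration=10.0, shape=4.0, rng=rng)
print(len(spikes), round(vector_strength(spikes, 60.0, phase), 2))
```

A shape parameter of 1 reduces the process to an inhomogeneous Poisson process, while larger values give more regular inter-spike intervals. Redrawing the phase on every simulated trial is one simple way to mimic an oscillation that is periodic but not locked to the stimulus.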