• Learning, especially rapid learning, is critical for survival. However, learning is hard: a large number of synaptic weights must be set based on noisy, often ambiguous, sensory information. In such a high-noise regime, keeping track of probability distributions over weights - not just point estimates - is the optimal strategy. Here we hypothesize that synapses take that optimal strategy: they do not store just the mean weight; they also store their degree of uncertainty - in essence, they put error bars on the weights. They then use that uncertainty to adjust their learning rates, with higher uncertainty resulting in higher learning rates. We also make a second, independent, hypothesis: synapses communicate their uncertainty by linking it to variability, with more uncertainty leading to more variability. More concretely, the value of a synaptic weight at a given time is a sample from its probability distribution. These two hypotheses cast synaptic plasticity as a problem of Bayesian inference, and thus provide a normative view of learning. They are consistent with known learning rules, offer an explanation for the large variability in the size of post-synaptic potentials, and make several falsifiable experimental predictions.
  • Sensory neurons give highly variable responses to stimulation, which can limit the amount of stimulus information available to downstream circuits. Much work has investigated the factors that affect the amount of information encoded in these population responses, leading to insights about the role of covariability among neurons, tuning curve shape, etc. However, the informativeness of neural responses is not the only relevant feature of population codes; of potentially equal importance is how robustly that information propagates to downstream structures. For instance, to quantify the retina's performance, one must consider not only the informativeness of the optic nerve responses, but also the amount of information that survives the spike-generating nonlinearity and noise corruption in the next stage of processing, the lateral geniculate nucleus. Our study identifies the set of covariance structures for the upstream cells that optimize the ability of information to propagate through noisy, nonlinear circuits. Within this optimal family are covariances with "differential correlations", which are known to reduce the information encoded in neural population activities. Thus, covariance structures that maximize information in neural population codes, and those that maximize the ability of this information to propagate, can be very different.
  • Zipf's law, which states that the probability of an observation is inversely proportional to its rank, has been observed in many domains. While there are models that explain Zipf's law in each of them, those explanations are typically domain-specific. Recently, methods from statistical physics were used to show that a fairly broad class of models does provide a general explanation of Zipf's law. This explanation rests on the observation that real-world data is often generated from underlying causes, known as latent variables. Those latent variables mix together multiple models that do not obey Zipf's law, giving a model that does. Here we extend that work both theoretically and empirically. Theoretically, we provide a far simpler and more intuitive explanation of Zipf's law, which at the same time considerably extends the class of models to which this explanation can apply. Furthermore, we also give methods for verifying whether this explanation applies to a particular dataset. Empirically, these advances allowed us to extend this explanation to important classes of data, including word frequencies (the first domain in which Zipf's law was discovered), data with variable sequence length, and multi-neuron spiking activity.
  • Activity in neocortex exhibits a range of behaviors, from irregular to temporally precise, and from weakly to strongly correlated. So far there has been no single theoretical framework that could explain all these behaviors, leaving open the possibility that they are a signature of radically different mechanisms. Here, we suggest that this is not the case. Instead, we show that a single theory can account for a broad spectrum of experimental observations, including specifics such as the fine temporal details of subthreshold cross-correlations. For the model underlying our theory, we need only assume a small number of well-established properties common to all local cortical networks. When these assumptions are combined with realistically structured input, they produce exactly the repertoire of behaviors that is observed experimentally, and lead to a number of testable predictions.
  • When an action potential is transmitted to a postsynaptic neuron, a small change in the postsynaptic neuron's membrane potential occurs. These small changes, known as postsynaptic potentials (PSPs), are highly variable, and current models assume that this variability is corrupting noise. In contrast, we show that this variability could have an important computational role: representing a synapse's uncertainty about the optimal synaptic weight (i.e. the best possible setting for the synaptic weight). We show that this link between uncertainty and variability, which we call synaptic sampling, leads to more accurate estimates of the uncertainty in task-relevant quantities, leading to more effective decision making. Synaptic sampling makes four predictions, all of which have some experimental support. First, the more variable a synapse is, the more it should change during LTP protocols. Second, variability should increase as the presynaptic firing rate falls. Third, PSP variance should be proportional to PSP mean. Fourth, variability should increase with distance from the cell soma. We provide support for the first two predictions by reanalysing existing datasets, and we find preexisting data in support of the last two predictions.
  • Correlations among spikes, both on the same neuron and across neurons, are ubiquitous in the brain. For example, cross-correlograms can have large peaks, at least in the periphery, and smaller -- but still non-negligible -- ones in cortex, and auto-correlograms almost always exhibit non-trivial temporal structure at a range of timescales. Although this has been known for over forty years, it's still not clear what role these correlations play in the brain -- and, indeed, whether they play any role at all. The goal of this chapter is to shed light on this issue by reviewing some of the work on this subject.
  • A fundamental problem in neuroscience is understanding how working memory -- the ability to store information at intermediate timescales, like tens of seconds -- is implemented in realistic neuronal networks. The most likely candidate mechanism is the attractor network, and a great deal of effort has gone toward investigating it theoretically. Yet, despite almost a quarter century of intense work, attractor networks are not fully understood. In particular, there are still two unanswered questions. First, how is it that attractor networks exhibit irregular firing, as is observed experimentally during working memory tasks? And second, how many memories can be stored under biologically realistic conditions? Here we answer both questions by studying an attractor neural network in which inhibition and excitation balance each other. Using mean field analysis, we derive a three-variable description of attractor networks. From this description it follows that irregular firing can exist only if the number of neurons involved in a memory is large. The same mean field analysis also shows that the number of memories that can be stored in a network scales with the number of excitatory connections, a result that has been suggested for simple models but never shown for realistic ones. Both of these predictions are verified using simulations with large networks of spiking neurons.
  • We would like to know whether the statistics of neuronal responses vary across cortical areas. We examined stimulus-elicited spike count response distributions in V1 and IT cortices of awake monkeys. In both areas the distribution of spike counts for each stimulus was well-described by a Gaussian, with the log of the variance in the spike count linearly related to the log of the mean spike count. Two significant differences in response characteristics were found: both the range of spike counts and the slope of the log(variance) vs. log(mean) regression were larger in V1 than in IT. However, neurons in the two areas transmitted approximately the same amount of information about the stimuli, and had about the same channel capacity (the maximum possible transmitted information given noise in the responses). These results suggest that neurons in V1 use more variable signals over a larger dynamic range than neurons in IT, which use less variable signals over a smaller dynamic range. The two coding strategies are approximately equally effective at transmitting information.
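
The log(variance) vs. log(mean) regression mentioned in the last abstract can be illustrated with a minimal Python sketch. It is not the authors' analysis code. The function names (`simulate_counts`, `fit_log_variance_vs_log_mean`), the power-law form variance = a * mean^b, and all parameter values are assumptions chosen for illustration, and the data are synthetic Gaussian "spike counts" rather than recordings.

```python
import numpy as np


def simulate_counts(means, a, b, n_trials, seed=0):
    """Draw synthetic Gaussian 'spike counts' (hypothetical data).

    Assumes variance = a * mean**b for each stimulus, so that
    log(variance) = log(a) + b * log(mean). Returns an array of
    shape (n_stimuli, n_trials).
    """
    rng = np.random.default_rng(seed)
    means = np.asarray(means, dtype=float)
    stds = np.sqrt(a * means ** b)
    return rng.normal(means[:, None], stds[:, None],
                      size=(means.size, n_trials))


def fit_log_variance_vs_log_mean(counts):
    """Least-squares fit of log(variance) against log(mean).

    `counts` has one row per stimulus and one column per trial.
    Returns (slope, intercept) of the fitted line.
    """
    counts = np.asarray(counts, dtype=float)
    mean = counts.mean(axis=1)
    var = counts.var(axis=1, ddof=1)  # unbiased sample variance
    slope, intercept = np.polyfit(np.log(mean), np.log(var), 1)
    return slope, intercept


if __name__ == "__main__":
    counts = simulate_counts(np.linspace(2, 50, 10), a=1.5, b=1.2,
                             n_trials=5000)
    slope, intercept = fit_log_variance_vs_log_mean(counts)
    print(f"slope={slope:.2f}, intercept={intercept:.2f}")
```

In this sketch a larger fitted slope means the variance grows faster with the mean count, which is the sense in which the abstract describes V1 responses as more variable than IT responses.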