
Learning, especially rapid learning, is critical for survival. However,
learning is hard: a large number of synaptic weights must be set based on
noisy, often ambiguous, sensory information. In such a high-noise regime,
keeping track of probability distributions over weights, not just point
estimates, is the optimal strategy. Here we hypothesize that synapses take
that optimal strategy: they do not store just the mean weight; they also store
their degree of uncertainty (in essence, they put error bars on the weights).
They then use that uncertainty to adjust their learning rates, with higher
uncertainty resulting in higher learning rates. We also make a second,
independent, hypothesis: synapses communicate their uncertainty by linking it
to variability, with more uncertainty leading to more variability. More
concretely, the value of a synaptic weight at a given time is a sample from its
probability distribution. These two hypotheses cast synaptic plasticity as a
problem of Bayesian inference, and thus provide a normative view of learning.
They are consistent with known learning rules, offer an explanation for the
large variability in the size of postsynaptic potentials, and make several
falsifiable experimental predictions.
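
As a minimal sketch of the two hypotheses, assume a single weight, a linear Gaussian model of the target signal, and a scalar Kalman-style update. These modelling choices, the function names, and the parameter `noise_var` are illustrative assumptions, not the learning rule derived in the paper. The sketch shows the effective learning rate (the gain) growing with the stored variance, and the expressed weight being drawn as a sample from the stored distribution.

```python
import numpy as np

def bayesian_synapse_update(mu, s2, x, y, noise_var=1.0):
    """One Bayesian update of a single weight under y = w * x + Gaussian noise.

    mu, s2 : current mean and variance of the belief about the weight
    x, y   : presynaptic input and target signal (hypothetical names)
    Returns the new mean, the new variance, and the gain (learning rate).
    """
    gain = s2 * x / (x * x * s2 + noise_var)   # larger s2 gives a larger gain
    new_mu = mu + gain * (y - mu * x)          # error-driven change in the mean
    new_s2 = s2 - gain * x * s2                # uncertainty shrinks with data
    return new_mu, new_s2, gain

def sample_weight(mu, s2, rng):
    """Second hypothesis: the expressed weight is a sample from the belief."""
    return rng.normal(mu, np.sqrt(s2))

rng = np.random.default_rng(0)
true_w = 2.0          # assumed "optimal" weight for this toy example
mu, s2 = 0.0, 4.0     # broad initial belief
for _ in range(200):
    x = rng.normal()
    y = true_w * x + rng.normal()
    mu, s2, gain = bayesian_synapse_update(mu, s2, x, y)

w_expressed = sample_weight(mu, s2, rng)
```

In this toy setting the variance, and with it both the learning rate and the trial-to-trial variability of `w_expressed`, falls as evidence accumulates.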

Sensory neurons give highly variable responses to stimulation, which can
limit the amount of stimulus information available to downstream circuits. Much
work has investigated the factors that affect the amount of information encoded
in these population responses, leading to insights about the role of
covariability among neurons, tuning curve shape, etc. However, the
informativeness of neural responses is not the only relevant feature of
population codes; of potentially equal importance is how robustly that
information propagates to downstream structures. For instance, to quantify the
retina's performance, one must consider not only the informativeness of the
optic nerve responses, but also the amount of information that survives the
spike-generating nonlinearity and noise corruption in the next stage of
processing, the lateral geniculate nucleus. Our study identifies the set of
covariance structures for the upstream cells that optimize the ability of
information to propagate through noisy, nonlinear circuits. Within this optimal
family are covariances with "differential correlations", which are known to
reduce the information encoded in neural population activities. Thus,
covariance structures that maximize information in neural population codes, and
those that maximize the ability of this information to propagate, can be very
different.
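
The statement that differential correlations reduce encoded information can be illustrated with linear Fisher information, I = f'^T Σ^{-1} f'. The sketch below assumes a random baseline covariance and a random tuning-curve derivative `f_prime` (both hypothetical, chosen only for illustration) and adds a rank-one term `eps * f' f'^T` to the covariance. By the Sherman-Morrison identity the information then drops from I0 to I0 / (1 + eps * I0). It does not reproduce the propagation analysis of the study itself.

```python
import numpy as np

def linear_fisher_information(f_prime, cov):
    """Linear Fisher information f'^T cov^{-1} f' for a population code."""
    return float(f_prime @ np.linalg.solve(cov, f_prime))

rng = np.random.default_rng(1)
n = 50                                   # number of neurons (assumed)
f_prime = rng.normal(size=n)             # tuning-curve derivatives (assumed)
a = rng.normal(size=(n, n))
cov0 = a @ a.T / n + np.eye(n)           # positive-definite baseline covariance

eps = 0.1                                # strength of differential correlations
cov_diff = cov0 + eps * np.outer(f_prime, f_prime)

info0 = linear_fisher_information(f_prime, cov0)
info_diff = linear_fisher_information(f_prime, cov_diff)
predicted = info0 / (1.0 + eps * info0)  # closed form from Sherman-Morrison
```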

Zipf's law, which states that the probability of an observation is inversely
proportional to its rank, has been observed in many domains. While there are
models that explain Zipf's law in each of them, those explanations are
typically domain specific. Recently, methods from statistical physics were used
to show that a fairly broad class of models does provide a general explanation
of Zipf's law. This explanation rests on the observation that real-world data
is often generated from underlying causes, known as latent variables. Those
latent variables mix together multiple models that do not obey Zipf's law,
giving a model that does. Here we extend that work both theoretically and
empirically. Theoretically, we provide a far simpler and more intuitive
explanation of Zipf's law, which at the same time considerably extends the
class of models to which this explanation can apply. Furthermore, we also give
methods for verifying whether this explanation applies to a particular dataset.
Empirically, these advances allowed us to extend this explanation to important
classes of data, including word frequencies (the first domain in which Zipf's
law was discovered), data with variable sequence length, and multi-neuron
spiking activity.
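
Zipf's law as stated above (probability inversely proportional to rank) corresponds to a slope of -1 on a log-log plot of frequency against rank. The following is a rough sketch of that basic check, using an ordinary least-squares fit. The function name `zipf_slope` is hypothetical, and this is a simple diagnostic, not the verification method developed in the paper.

```python
import numpy as np

def zipf_slope(counts):
    """Slope of log(frequency) against log(rank); Zipf's law predicts about -1."""
    counts = np.sort(np.asarray(counts, dtype=float))[::-1]  # most frequent first
    counts = counts[counts > 0]                               # log needs positives
    ranks = np.arange(1, counts.size + 1)
    slope, _intercept = np.polyfit(np.log(ranks), np.log(counts), 1)
    return slope

# Synthetic counts that follow Zipf's law exactly, and a flat control.
ideal_counts = 1000.0 / np.arange(1, 101)
flat_counts = np.ones(50)

slope_ideal = zipf_slope(ideal_counts)
slope_flat = zipf_slope(flat_counts)
```

On real data a least-squares fit on log-log axes is only a first look, since the low-frequency tail is noisy and dominates the number of points.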

Activity in neocortex exhibits a range of behaviors, from irregular to
temporally precise, and from weakly to strongly correlated. So far there has
been no single theoretical framework that could explain all these behaviors,
leaving open the possibility that they are a signature of radically different
mechanisms. Here, we suggest that this is not the case. Instead, we show that a
single theory can account for a broad spectrum of experimental observations,
including specifics such as the fine temporal details of subthreshold
cross-correlations. For the model underlying our theory, we need only assume a
small number of well-established properties common to all local cortical
networks. When these assumptions are combined with realistically structured
input, they produce exactly the repertoire of behaviors that is observed
experimentally, and lead to a number of testable predictions.

When an action potential is transmitted to a postsynaptic neuron, a small
change in the postsynaptic neuron's membrane potential occurs. These small
changes, known as postsynaptic potentials (PSPs), are highly variable, and
current models assume that this variability is corrupting noise. In contrast,
we show that this variability could have an important computational role:
representing a synapse's uncertainty about the optimal synaptic weight (i.e.,
the best possible setting for the synaptic weight). We show that this link
between uncertainty and variability, which we call synaptic sampling, leads to
more accurate estimates of the uncertainty in task-relevant quantities, and
hence to more effective decision making. Synaptic sampling makes four predictions,
all of which have some experimental support. First, the more variable a synapse
is, the more it should change during LTP protocols. Second, variability should
increase as the presynaptic firing rate falls. Third, PSP variance should be
proportional to PSP mean. Fourth, variability should increase with distance
from the cell soma. We provide support for the first two predictions by
reanalysing existing datasets, and we find pre-existing data in support of the
last two predictions.

Correlations among spikes, both on the same neuron and across neurons, are
ubiquitous in the brain. For example, cross-correlograms can have large peaks,
at least in the periphery, and smaller (but still non-negligible) ones in
cortex, and autocorrelograms almost always exhibit nontrivial temporal
structure at a range of timescales. Although this has been known for over forty
years, it's still not clear what role these correlations play in the brain
or, indeed, whether they play any role at all. The goal of this chapter is to
shed light on this issue by reviewing some of the work on this subject.

A fundamental problem in neuroscience is understanding how working memory
(the ability to store information at intermediate timescales, like tens of
seconds) is implemented in realistic neuronal networks. The most likely
candidate mechanism is the attractor network, and a great deal of effort has
gone toward investigating it theoretically. Yet, despite almost a quarter
century of intense work, attractor networks are not fully understood. In
particular, there are still two unanswered questions. First, how is it that
attractor networks exhibit irregular firing, as is observed experimentally
during working memory tasks? And second, how many memories can be stored under
biologically realistic conditions? Here we answer both questions by studying an
attractor neural network in which inhibition and excitation balance each other.
Using mean field analysis, we derive a three-variable description of attractor
networks. From this description it follows that irregular firing can exist only
if the number of neurons involved in a memory is large. The same mean field
analysis also shows that the number of memories that can be stored in a network
scales with the number of excitatory connections, a result that has been
suggested for simple models but never shown for realistic ones. Both of these
predictions are verified using simulations with large networks of spiking
neurons.

We would like to know whether the statistics of neuronal responses vary
across cortical areas. We examined stimulus-elicited spike count response
distributions in V1 and IT cortices of awake monkeys. In both areas the
distribution of spike counts for each stimulus was well described by a
Gaussian, with the log of the variance in the spike count linearly related to
the log of the mean spike count. Two significant differences in response
characteristics were found: both the range of spike counts and the slope of the
log(variance) vs. log(mean) regression were larger in V1 than in IT. However,
neurons in the two areas transmitted approximately the same amount of
information about the stimuli, and had about the same channel capacity (the
maximum possible transmitted information given noise in the responses). These
results suggest that neurons in V1 use more variable signals over a larger
dynamic range than neurons in IT, which use less variable signals over a
smaller dynamic range. The two coding strategies are approximately equally
effective in transmitting information.
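
The log(variance) vs. log(mean) regression described above can be sketched as follows. The data here are synthetic Poisson counts, used only because their variance equals their mean, so the fitted slope should be close to 1. The array shapes, the function name `log_variance_regression`, and the Poisson assumption are illustrative and are not taken from the recordings analysed in the study.

```python
import numpy as np

def log_variance_regression(spike_counts):
    """Fit log(variance) = intercept + slope * log(mean) across stimuli.

    spike_counts : array of shape (n_trials, n_stimuli)
    """
    means = spike_counts.mean(axis=0)
    variances = spike_counts.var(axis=0, ddof=1)   # unbiased variance per stimulus
    keep = (means > 0) & (variances > 0)           # logs require positive values
    slope, intercept = np.polyfit(np.log(means[keep]), np.log(variances[keep]), 1)
    return slope, intercept

rng = np.random.default_rng(2)
stimulus_means = np.linspace(2.0, 50.0, 20)        # assumed mean count per stimulus
spike_counts = rng.poisson(lam=stimulus_means, size=(2000, 20))

slope, intercept = log_variance_regression(spike_counts)
```

Comparing the fitted `slope` between two areas is then a matter of running the same fit on each area's counts.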