
Here we study polysemy as a potential learning bias in vocabulary learning in
children. Words of low polysemy could be preferred as they reduce the
disambiguation effort for the listener. However, such preference could be a
side effect of another bias: the preference of children for nouns in
combination with the lower polysemy of nouns with respect to other
part-of-speech categories. Our results show that mean polysemy in children
increases over time in two phases, i.e. a fast growth till the 31st month
followed by a slower tendency towards adult speech. In contrast, this evolution
is not found in adults interacting with children. This suggests that children
have a preference for non-polysemous words in their early stages of vocabulary
acquisition. Interestingly, the evolutionary pattern described above weakens
when controlling for syntactic category (noun, verb, adjective or adverb) but
it does not disappear completely, suggesting that it could result from
a combination of a standalone bias for low polysemy and a preference for nouns.

A comment on "Neurophysiological dynamics of phrase-structure building during
sentence processing" by Nelson et al. (2017), Proceedings of the National
Academy of Sciences USA 114(18), E3669-E3678.

In his pioneering research, G. K. Zipf observed that more frequent words tend
to have more meanings, and showed that the number of meanings of a word grows
as the square root of its frequency. He derived this relationship from two
assumptions: that words follow Zipf's law for word frequencies (a power law
dependency between frequency and rank) and Zipf's law of meaning distribution
(a power law dependency between number of meanings and rank). Here we show that
a single assumption on the joint probability of a word and a meaning suffices
to infer Zipf's meaning-frequency law or relaxed versions. Interestingly, this
assumption can be justified as the outcome of a biased random walk in the
process of mental exploration.
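
The two-assumption argument attributed to Zipf above can be written out as a
short worked derivation. The symbols (f for frequency, \mu for number of
meanings, r for rank, and the exponents \alpha, \gamma, \delta) are notation
assumed here for illustration; the abstract states the argument only in words.

```latex
f \propto r^{-\alpha}, \qquad \mu \propto r^{-\gamma}
\;\Longrightarrow\; r \propto f^{-1/\alpha}
\;\Longrightarrow\; \mu \propto f^{\delta}, \qquad \delta = \frac{\gamma}{\alpha}.
```

Under the additional assumption that \alpha = 1 and \gamma = 1/2, this gives
\delta = 1/2, i.e. the square-root dependency mentioned above.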

The syntactic structure of a sentence can be modelled as a tree, where
vertices correspond to words and edges indicate syntactic dependencies. It has
been claimed recurrently that the number of edge crossings in real sentences is
small. However, a baseline or null hypothesis has been lacking. Here we
quantify the amount of crossings of real sentences and compare it to the
predictions of a series of baselines. We conclude that crossings are really
scarce in real sentences. Their scarcity is unexpected given the hubiness of the
trees. Indeed, real sentences are close to linear trees, where the potential
number of crossings is maximized.
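
As an illustration of the quantity being measured, the following minimal sketch
counts edge crossings for a given linear ordering of the words. The function
name and the input format (an edge list plus a vertex-to-position mapping) are
assumptions made for this example, not the authors' implementation.

```python
from itertools import combinations


def count_crossings(edges, position):
    """Count pairs of edges that cross when drawn above the sentence.

    edges: list of (u, v) vertex pairs (hypothetical input format).
    position: dict mapping each vertex to its position in the sentence.
    """
    # Represent every edge by the (left, right) positions of its endpoints.
    spans = [tuple(sorted((position[u], position[v]))) for u, v in edges]
    crossings = 0
    for (a, b), (c, d) in combinations(spans, 2):
        # Two edges cross when exactly one endpoint of one edge lies
        # strictly inside the span of the other.
        if a < c < b < d or c < a < d < b:
            crossings += 1
    return crossings


path_edges = [(0, 1), (1, 2), (2, 3)]
natural = {0: 0, 1: 1, 2: 2, 3: 3}    # vertices in the order 0 1 2 3
shuffled = {0: 0, 2: 1, 1: 2, 3: 3}   # vertices in the order 0 2 1 3
```

Edges that share a vertex can never satisfy the strict inequalities, so they
are never counted as crossing.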

The minimization of the length of syntactic dependencies is a
well-established principle of word order and the basis of a mathematical theory
of word order. Here we complete that theory from the perspective of information
theory, adding a competing word order principle: the maximization of
predictability of a target element. These two principles are in conflict: to
maximize the predictability of the head, the head should appear last, which
maximizes the costs with respect to dependency length minimization. The
implications of such a broad theoretical framework to understand the
optimality, diversity and evolution of the six possible orderings of subject,
object and verb are reviewed.

A family of information theoretic models of communication was introduced more
than a decade ago to explain the origins of Zipf's law for word frequencies.
The family is based on a combination of two information theoretic principles:
maximization of mutual information between forms and meanings and minimization
of form entropy. The family also sheds light on the origins of three other
patterns: the principle of contrast, a related vocabulary learning bias and the
meaning-frequency law. Here two important components of the family, namely the
information theoretic principles and the energy function that combines them
linearly, are reviewed from the perspective of psycholinguistics, language
learning, information theory and synergetic linguistics. The minimization of
this linear function is linked to the problem of compression of standard
information theory and might be tuned by self-organization.
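
A linear combination of the two principles can be sketched as follows. The
specific form used here, -lam * I(forms, meanings) + (1 - lam) * H(forms) with
a weight lam between 0 and 1, is an assumption made for illustration (the
abstract says only that the principles are combined linearly), and all names
are hypothetical.

```python
import math


def entropy(probabilities):
    """Shannon entropy in bits; zero-probability entries are skipped."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)


def energy(joint, lam):
    """Assumed energy: -lam * I(forms, meanings) + (1 - lam) * H(forms).

    joint[i][j] is the joint probability of form i and meaning j.
    lam is a hypothetical weight in [0, 1].
    """
    p_form = [sum(row) for row in joint]
    p_meaning = [sum(column) for column in zip(*joint)]
    h_form = entropy(p_form)
    h_joint = entropy([p for row in joint for p in row])
    mutual_information = h_form + entropy(p_meaning) - h_joint
    return -lam * mutual_information + (1 - lam) * h_form


one_to_one = [[0.5, 0.0], [0.0, 0.5]]  # each form has its own meaning
single_form = [[0.5, 0.5]]             # one form covers both meanings
```

In this toy setting the one-to-one mapping has lower energy when lam is above
1/2, and the two configurations tie at lam = 1/2.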

Entropy is a fundamental property of a repertoire. Here, we present an
efficient algorithm to estimate the entropy of types with the help of Zhang's
estimator. The algorithm takes advantage of the fact that the number of
different frequencies in a text is in general much smaller than the number of
types. We justify the convenience of the algorithm by means of an analysis of
the statistical properties of texts from more than 1000 languages. Our work
opens up various possibilities for future research.
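
The idea of iterating over distinct frequencies rather than over types can be
sketched as below. Note that this example swaps in the naive plug-in entropy
estimator in place of Zhang's estimator, purely to show the grouping trick;
the function names are hypothetical.

```python
import math
from collections import Counter


def plugin_entropy_by_type(tokens):
    """Plug-in entropy (in nats) with one term per type."""
    counts = Counter(tokens)
    n = sum(counts.values())
    return -sum((f / n) * math.log(f / n) for f in counts.values())


def plugin_entropy_by_frequency(tokens):
    """Same estimate, with one term per distinct frequency.

    spectrum maps a frequency f to the number of types that occur f times,
    so the loop has as many iterations as there are distinct frequencies.
    """
    counts = Counter(tokens)
    n = sum(counts.values())
    spectrum = Counter(counts.values())
    return -sum(m * (f / n) * math.log(f / n) for f, m in spectrum.items())


sample = "a b a c b a d e".split()
```

Here the sample has five types but only three distinct frequencies (3, 2 and
1), so the second function sums three terms instead of five.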

In a recent article, Christiansen and Chater (2016) present a fundamental
constraint on language, i.e. a now-or-never bottleneck that arises from our
fleeting memory, and explore its implications, e.g., chunk-and-pass processing,
outlining a framework that promises to unify different areas of research. Here
we explore additional support for this constraint and suggest further
connections from quantitative linguistics and information theory.

Comment on "Dependency distance: a new perspective on syntactic patterns in
natural language" by Haitao Liu et al.

The structure of a sentence can be represented as a network where vertices
are words and edges indicate syntactic dependencies. Interestingly, crossing
syntactic dependencies have been observed to be infrequent in human languages.
This leads to the question of whether the scarcity of crossings in languages
arises from an independent and specific constraint on crossings. We provide
statistical evidence suggesting that this is not the case, as the proportion of
dependency crossings of sentences from a wide range of languages can be
accurately estimated by a simple predictor based on a null hypothesis on the
local probability that two dependencies cross given their lengths. The relative
error of this predictor never exceeds 5% on average, whereas the error of a
baseline predictor assuming a random ordering of the words of a sentence is at
least 6 times greater. Our results suggest that the low frequency of crossings
in natural languages originates neither from hidden knowledge of language nor
from the undesirability of crossings per se, but is a mere side effect of the
principle of dependency length minimization.

It has been hypothesized that the rather small number of crossings in real
syntactic dependency trees is a side effect of pressure for dependency length
minimization. Here we answer a related important research question: what would
be the expected number of crossings if the natural order of a sentence was lost
and replaced by a random ordering? We show that this number depends only on the
number of vertices of the dependency tree (the sentence length) and the second
moment about zero of vertex degrees. The expected number of crossings is
minimum for a star tree (crossings are impossible) and maximum for a linear
tree (the number of crossings is of the order of the square of the sequence
length).
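
The dependence on sentence length and the second moment of degrees can be
sketched in code. The closed form used below, (n / 6) * (n - 1 - <k^2>), is not
given in the abstract; it is assumed here from the observation that two edges
without a shared vertex cross with probability 1/3 under a uniformly random
ordering, and it is checked against brute-force enumeration. All function
names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations, permutations


def expected_crossings(edges):
    """Assumed closed form: (n / 6) * (n - 1 - <k^2>) for a tree."""
    n = len(edges) + 1
    degree = {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    second_moment = Fraction(sum(k * k for k in degree.values()), n)
    return Fraction(n, 6) * (n - 1 - second_moment)


def brute_force_mean_crossings(edges):
    """Average number of crossings over all orderings (small trees only)."""
    vertices = sorted({x for edge in edges for x in edge})
    total, orderings = 0, 0
    for perm in permutations(range(len(vertices))):
        pos = dict(zip(vertices, perm))
        spans = [tuple(sorted((pos[u], pos[v]))) for u, v in edges]
        total += sum(
            1
            for (a, b), (c, d) in combinations(spans, 2)
            if a < c < b < d or c < a < d < b
        )
        orderings += 1
    return Fraction(total, orderings)


star = [(0, 1), (0, 2), (0, 3), (0, 4)]
linear = [(0, 1), (1, 2), (2, 3), (3, 4)]
```

For five vertices the star tree yields zero expected crossings and the linear
tree yields one, consistent with the minimum and maximum described above.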

More than 30 years ago, Shiloach published an algorithm to solve the minimum
linear arrangement problem for undirected trees. Here we fix a small error in
the original version of the algorithm and discuss its effect on subsequent
literature. We also improve some aspects of the notation.

Vocabulary learning by children can be characterized by many biases. When
encountering a new word, children, as well as adults, are biased towards
assuming that it means something totally different from the words that they
already know. To the best of our knowledge, the first mathematical proof of the
optimality of this bias is presented here. First, it is shown that this bias is
a particular case of the maximization of mutual information between words and
meanings. Second, the optimality is proven within a more general information
theoretic framework where mutual information maximization competes with other
information theoretic principles. The bias is a prediction from modern
information theory. The relationship between information theoretic principles
and the principles of contrast and mutual exclusivity is also shown.

Vocalizations and, less often, gestures have been the object of linguistic
research over decades. However, the development of a general theory of
communication with human language as a particular case requires a clear
understanding of the organization of communication through other means.
Infochemicals are chemical compounds that carry information and are employed by
small organisms that cannot emit acoustic signals of optimal frequency to
achieve successful communication. Here the distribution of infochemicals across
species is investigated when they are ranked by their degree, i.e. the number of
species with which each is associated (because those species produce it or are
sensitive to it). The quality of the fit of different functions to the dependency between
degree and rank is evaluated with a penalty for the number of parameters of the
function. Surprisingly, a double Zipf (a Zipf distribution with two regimes
with a different exponent each) is the model yielding the best fit although it
is the function with the largest number of parameters. This suggests that the
worldwide repertoire of infochemicals contains a chemical nucleus shared by
many species and reminiscent of the core vocabularies found for human language
in dictionaries or large corpora.

According to Zipf's meaning-frequency law, words that are more frequent tend
to have more meanings. Here it is shown that a linear dependency between the
frequency of a form and its number of meanings is found in a family of models
of Zipf's law for word frequencies. This is evidence for a weak version of the
meaning-frequency law. Interestingly, that weak law (a) is not an inevitable
property of the assumptions of the family and (b) is found at least in the
narrow regime where those models exhibit Zipf's law for word frequencies.

Here we sketch a new derivation of Zipf's law for word frequencies based on
optimal coding. The structure of the derivation is reminiscent of Mandelbrot's
random typing model but it has multiple advantages over random typing: (1) it
starts from realistic cognitive pressures, (2) it does not require fine tuning
of parameters, and (3) it sheds light on the origins of other statistical laws
of language and thus can lead to a compact theory of linguistic laws. Our
findings suggest that the recurrence of Zipf's law in human languages could
originate from pressure for easy and fast communication.

The syntactic structure of sentences exhibits a striking regularity:
dependencies tend to not cross when drawn above the sentence. We investigate
two competing explanations. The traditional hypothesis is that this trend
arises from an independent principle of syntax that reduces crossings
practically to zero. An alternative to this view is the hypothesis that
crossings are a side effect of dependency lengths, i.e. sentences with shorter
dependency lengths should tend to have fewer crossings. We are able to reject
the traditional view in the majority of languages considered. The alternative
hypothesis can lead to a more parsimonious theory of language.

The minimum linear arrangement problem on a network consists of finding the
minimum sum of edge lengths that can be achieved when the vertices are arranged
linearly. Although there are algorithms to solve this problem on trees in
polynomial time, they have remained theoretical and have not been implemented
in practical contexts to our knowledge. Here we use one of those algorithms to
investigate the growth of this sum as a function of the size of the tree in
uniformly random trees. We show that this sum is bounded above by its value in
a star tree. We also show that the mean edge length grows logarithmically in
optimal linear arrangements, in stark contrast to the linear growth that is
expected on optimal arrangements of star trees or on random linear
arrangements.
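
For very small trees, the minimum sum of edge lengths can be found by
exhaustive search, which makes the contrast between star trees and other trees
easy to inspect. This brute-force sketch is not one of the polynomial-time
algorithms mentioned above; it is a stand-in for illustration only, and the
function names are hypothetical.

```python
from itertools import permutations


def sum_edge_lengths(edges, position):
    """Sum of |position(u) - position(v)| over all edges."""
    return sum(abs(position[u] - position[v]) for u, v in edges)


def min_linear_arrangement_cost(edges):
    """Minimum sum of edge lengths over all orderings (factorial time)."""
    vertices = sorted({x for edge in edges for x in edge})
    return min(
        sum_edge_lengths(edges, dict(zip(vertices, perm)))
        for perm in permutations(range(len(vertices)))
    )


star = [(0, 1), (0, 2), (0, 3), (0, 4)]
linear = [(0, 1), (1, 2), (2, 3), (3, 4)]
spider = [(0, 1), (1, 2), (2, 3), (2, 4)]
```

On these five-vertex trees the star tree has the largest minimum cost, in line
with the upper bound stated above.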

A commentary on the article "Large-scale evidence of dependency length
minimization in 37 languages" by Futrell, Mahowald & Gibson (PNAS 2015 112 (33)
10336-10341).

Word order evolution has been hypothesized to be constrained by a word order
permutation ring: transitions involving orders that are closer in the
permutation ring are more likely. The hypothesis can be seen as a particular
case of Kauffman's adjacent possible in word order evolution. Here we consider
the problem of the association of the six possible orders of S, V and O to
yield a couple of primary alternating orders as a window to word order
evolution. We evaluate the suitability of various competing hypotheses to
predict one member of the couple from the other with the help of information
theoretic model selection. Our ensemble of models includes a six-way model that
is based on the word order permutation ring (Kauffman's adjacent possible) and
another model based on the dual two-way of standard typology, that reduces word
order to basic order preferences (e.g., a preference for SV over VS and
another for SO over OS). Our analysis indicates that the permutation ring
yields the best model when favoring parsimony strongly, providing support for
Kauffman's general view and a six-way typology.

Zipf's law is a fundamental paradigm in the statistics of written and spoken
natural language as well as in other communication systems. We raise the
question of the elementary units for which Zipf's law should hold in the most
natural way, studying its validity for plain word forms and for the
corresponding lemma forms. In order to have as homogeneous sources as possible,
we analyze some of the longest literary texts ever written, comprising four
different languages, with different levels of morphological complexity. In all
cases Zipf's law is fulfilled, in the sense that a power-law distribution of
word or lemma frequencies is valid for several orders of magnitude. We
investigate the extent to which the word-lemma transformation preserves two
parameters of Zipf's law: the exponent and the low-frequency cutoff. We are
not able to demonstrate a strict invariance of the tail, as for a few texts
both exponents deviate significantly, but we conclude that the exponents are
very similar, despite the remarkable transformation that going from words to
lemmas represents, considerably affecting all ranges of frequencies. In
contrast, the low-frequency cutoffs are less stable.

Here we respond to some comments by Alday concerning headedness in linguistic
theory and the validity of the assumptions of a mathematical model for word
order. For brevity, we focus only on two assumptions: the unit of measurement
of dependency length and the monotonicity of the cost of a dependency as a
function of its length. We also revise the implicit psychological bias in
Alday's comments. Notwithstanding, Alday is indicating the path for linguistic
research with his unusual concerns about parsimony from multiple dimensions.

It is well known that the length of a syntactic dependency determines its
online memory cost. Thus, the problem of the placement of a head and its
dependents (complements or modifiers) that minimizes online memory is
equivalent to the problem of the minimum linear arrangement of a star tree.
However, how that length is translated into cognitive cost is not known. This
study shows that the online memory cost is minimized when the head is placed at
the center, regardless of the function that transforms length into cost,
provided only that this function is strictly monotonically increasing. Online
memory defines a quasiconvex adaptive landscape with a single central minimum
if the number of elements is odd and two central minima if that number is even.
We discuss various aspects of the dynamics of word order of subject (S), verb
(V) and object (O) from a complex systems perspective and suggest that word
orders tend to evolve by swapping adjacent constituents from an initial or
early SOV configuration that is attracted towards a central word order by
online memory minimization. We also suggest that the stability of SVO is due to
at least two factors, the quasiconvex shape of the adaptive landscape in the
online memory dimension and online memory adaptations that avoid regression to
SOV. Although OVS is also optimal for placing the verb at the center, its low
frequency is explained by its long distance to the seminal SOV in the
permutation space.
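
The claim that a central head minimizes cost for any strictly increasing cost
function can be explored numerically with a minimal sketch. The function names
and the particular cost functions tried below are assumptions chosen for
illustration.

```python
import math


def star_cost(n, head_position, g):
    """Total cost of a star tree on n positions with the head at head_position.

    g maps a dependency length (a positive integer) to its cost.
    """
    return sum(g(abs(head_position - p)) for p in range(n) if p != head_position)


def optimal_head_positions(n, g):
    """Return every 0-based head position whose total cost is minimal."""
    costs = [star_cost(n, h, g) for h in range(n)]
    best = min(costs)
    # isclose guards against rounding differences between mirror positions.
    return [h for h, c in enumerate(costs) if math.isclose(c, best)]


cost_functions = [lambda d: d, lambda d: d * d, math.sqrt, lambda d: 2 ** d]
```

With three elements (a verb and two dependents), every cost function above
selects position 1, the middle slot, which corresponds to the verb-medial
orders SVO and OVS discussed above.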

The syntactic structure of a sentence can be modeled as a tree where vertices
are words and edges indicate syntactic dependencies between words. It is
well known that those edges normally do not cross when drawn over the sentence.
Here a new null hypothesis for the number of edge crossings of a sentence is
presented. That null hypothesis takes into account the length of the pair of
edges that may cross and predicts the relative number of crossings in random
trees with a small error, suggesting that neither a ban on crossings nor a
principle of minimization of crossings is needed in general to explain the
origins of non-crossing dependencies. Our work paves the way for more powerful
null hypotheses to investigate the origins of non-crossing dependencies in nature.

The use of null hypotheses (in a statistical sense) is common in hard
sciences but not in theoretical linguistics. Here the null hypothesis that the
low frequency of syntactic dependency crossings is expected by an arbitrary
ordering of words is rejected. It is shown that this would require star
dependency structures, which are both unrealistic and too restrictive. The
hypothesis of the limited resources of the human brain is revisited. Stronger
null hypotheses taking into account actual dependency lengths for the
likelihood of crossings are presented. Those hypotheses suggest that crossings
are likely to decrease when dependencies are shortened. A hypothesis based on
pressure to reduce dependency lengths is more parsimonious than a principle of
minimization of crossings or a grammatical ban that is totally dissociated from
the general and non-linguistic principle of economy.