Mandelbrot's Model for Zipf's Law: Can Mandelbrot's Model Explain Zipf's Law for Language?
D. Yu. Manin
Journal of Quantitative Linguistics
2009, Volume 16, Number 3, pp. 274–285
DOI: 10.1080/09296170902850358
ABSTRACT
Zipf’s law states that if words of a language are sorted in the order of decreasing
frequency of usage, a word’s frequency is inversely proportional to its rank, or sequence
number in the list. The Zipf-Mandelbrot law is a more general formula that provides a
better fit in the low-rank region. Among several models aimed at explaining this effect,
Mandelbrot’s model is one of the best known. It derives Zipf’s law as a result of the
optimization of information/cost ratio, but leads to an unrealistic view of texts as random
character sequences. In this article, a new modification of the model is proposed that is
free from this drawback and allows the optimal information/cost ratio to be achieved via
language evolution. It is demonstrated that the Zipf-Mandelbrot formula follows from
this model, but its two parameters are not independent. As a result, the formula cannot
convincingly be fitted to the actual word frequency distributions.
INTRODUCTION
Zipf’s law (Zipf, 1949) may be one of the most enigmatic and controversial regularities known in linguistics. In its most straightforward form, it states that if words of a language are ranked in the order of decreasing frequency in texts, the frequency is inversely proportional to the rank (sequence number in the list),

$f_k \propto k^{-B}, \qquad (1)$

with the exponent $B$ close to unity.
Let the probability of word $w_k$ (its frequency normalized to unity) be $p_k$, and let the cost of producing word $w_k$ be $C_k$. It makes sense to leave the function $C_k$ unspecified for as long as possible. The word's information content, or entropy, is related to its frequency $p_k$ as $H_k = -\log_2 p_k$. The average cost per word is given by

$C = \sum_k p_k C_k, \qquad (3)$

and the average information per word by

$H = \sum_k p_k H_k = -\sum_k p_k \log_2 p_k. \qquad (4)$

One can now ask what frequency distribution $\{p_k\}$ satisfying $\sum_k p_k = 1$ will minimize the cost ratio $C^* = C/H$.
We can use the standard method of Lagrange multipliers to find the minimum of $C^*$, given the normalization constraint on $p_k$:

$\frac{\partial}{\partial p_k}\left(C^* + \lambda \sum_j p_j\right) = 0. \qquad (5)$

Since $\partial C/\partial p_k = C_k$ and $\partial H/\partial p_k = -(\log_2 p_k + 1/\ln 2)$, this gives

$\frac{C_k}{H} + \frac{C}{H^2}\left(\log_2 p_k + \frac{1}{\ln 2}\right) + \lambda = 0, \quad \forall k. \qquad (6)$

Solving (6) for $p_k$,

$p_k = \lambda' 2^{-H C_k / C}, \qquad (7)$

where we denoted

$\lambda' = 2^{-\lambda H^2 / C - 1/\ln 2}. \qquad (8)$
Now suppose the cost function has the form

$C_k = C_0 \log_2 k, \qquad (9)$

which leads to

$p_k = \lambda' k^{-B}, \qquad B = H\frac{C_0}{C} \qquad (10)$
(note that $C \propto C_0$, so $C_0/C$ does not depend on $C_0$). How could one justify
Equation (9)? In Mandelbrot’s original formulation, as we already
mentioned, the cost of a word was assumed to be proportional to its length;
thus the only way to get the logarithmic dependency on the rank is to assume
that the number of distinct words grows exponentially with length. It is not
necessary in this formulation to postulate that any combination of letters of a
given length is equally probable, but even this weaker requirement is not
realistic for natural languages, as demonstrated by Figure 2.
There is, however, a much more plausible argument in favour of the
desired Ansatz (9), which does not depend on any assumptions about
word length at all. Suppose words are stored in some kind of an
addressable memory. For simplicity, one can imagine a linear array of
memory cells, each containing one word. Then, the cost of retrieving the
word in the kth cell can be assumed to be proportional to the length of its
address, that is to the minimum number of bits (or neuron firings, say)
needed to specify the address. And this is precisely $\log_2 k$. Of course, this
does not depend on memory being in any real sense ‘‘linear’’.
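For illustration (a sketch not taken from the article), the number of bits needed to write out the address $k$ indeed grows as $\log_2 k$:

import math

# Bits needed to specify the address of the k-th cell in a hypothetical
# linear array: floor(log2 k) + 1, i.e. roughly log2 k.
for k in (1, 2, 10, 100, 10_000, 1_000_000):
    print(k, k.bit_length(), round(math.log2(k), 2))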
It is important to note that this is not just a different justification, because with it the optimality model is no longer equivalent to the random typing model. Let us now proceed to solving (10). From the normalization condition for frequencies, we get

$p_k = \frac{1}{\zeta(B)}\, k^{-B}, \qquad (11)$

where $\zeta$ is the Riemann zeta function, $\zeta(s) = \sum_{n=1}^{\infty} n^{-s}$. But this is not the end of the story, since $B$ is related to $H$ and $C$ via Equation (10), and they
in turn depend on $B$ via $p_k$. This amounts to an equation for the power law exponent $B$, which thus is not arbitrary. By substituting (11) back into (3) and (4), we get

$C = \frac{C_0}{\zeta(B)} \sum_{k=1}^{\infty} k^{-B} \log_2 k, \qquad (12)$

$H = \frac{B}{\zeta(B)} \sum_{k=1}^{\infty} k^{-B} \log_2\!\left(k\, \zeta(B)^{1/B}\right). \qquad (13)$
Substituting (12) into (13) gives $H = BC/C_0 + \log_2 \zeta(B)$, so the constraint $B = HC_0/C$ from (10) reduces to $\zeta(B) = 1$, which cannot be satisfied by any finite $B > 1$ and is only approached in the limit $B \to \infty$. This is a degenerate solution, since it means that the minimum cost per unit information is achieved when there is only one word in use, and both cost and information vanish.
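This can be checked numerically. The following sketch (not part of the article; the truncation length and $C_0$ are arbitrary choices) evaluates the truncated sums (12) and (13) and compares $HC_0/C$ with $B$:

import numpy as np

def required_exponent(B, C0=1.0, N=1_000_000):
    # Truncated evaluation of (12) and (13) for the power-law distribution (11).
    k = np.arange(1, N + 1, dtype=float)
    w = k ** (-B)
    zeta_B = w.sum()                                                 # truncated zeta(B)
    C = (C0 / zeta_B) * np.sum(w * np.log2(k))                       # eq. (12)
    H = (B / zeta_B) * np.sum(w * np.log2(k * zeta_B ** (1.0 / B)))  # eq. (13)
    return H * C0 / C                                                # must equal B at the optimum

for B in (1.5, 2.0, 3.0):
    print(f"B = {B}: H*C0/C = {required_exponent(B):.4f}")

# H*C0/C exceeds B by (C0/C)*log2(zeta(B)) > 0, so B = H*C0/C has no finite solution.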
This conclusion is borne out by a simple numerical simulation. Recall
that in Section 2, we noted that cost ratio optimization can be achieved
via local dynamics. Namely, if speakers notice that a word’s individual
information/cost ratio is below average, they start using it less, and
conversely, if the ratio is favourable, the word’s frequency increases. It is
hard to tell a priori whether this process would converge to a stationary
distribution, so a numerical simulation was performed. The following
algorithm implements this dynamics:
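A minimal Python sketch of this dynamics is given below (the vocabulary size, step size, and the shifted cost $C_k = C_0 \log_2(k+1)$ are illustrative choices, not the article's original listing):

import numpy as np

# Words whose individual information/cost ratio H_k/C_k is below the average
# H/C lose frequency, words above it gain; frequencies are re-normalized.
def simulate(n_words=100, C0=1.0, step=0.01, n_steps=20_000, seed=0):
    rng = np.random.default_rng(seed)
    p = rng.random(n_words)
    p /= p.sum()
    cost = C0 * np.log2(np.arange(2, n_words + 2))    # C_k = C0*log2(k+1), avoids zero cost
    for _ in range(n_steps):
        h = -np.log2(np.clip(p, 1e-300, None))        # H_k = -log2 p_k
        H, C = np.sum(p * h), np.sum(p * cost)        # average information (4) and cost (3)
        p *= 1.0 + step * np.sign(h / cost - H / C)   # favour above-average ratio
        p /= p.sum()
    return p

p = simulate()
print("words with non-negligible frequency:", int((p > 1e-6).sum()))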
This procedure quickly leads to a state where all frequencies but one are zero.
We have seen that the Ansatz (9) does not, in the end, lead to the desired result. It is probably this problem that prompted Mandelbrot to propose a modification to Zipf's law. In his own words,

. . . it seems worth pointing out that it has not been obtained by "mere curve fitting": in attempting to explain the first approximation law, $i(r,k) = (1/10)k\,r^{-1}$, I invariably obtained the more general second
approximation, and only later did I realize that this more general
formula was necessary and basically sufficient to fit the empirical data.
(Mandelbrot, 1966, p. 356)
In the present formulation, the corresponding modification of the cost Ansatz (9) is

$C_k = C_0 \log_2(k + k_0). \qquad (14)$

It looks rather natural if we again imagine the linear memory, but this time with the first $k_0$ cells not occupied by useful words. Substitution of (14) into (7) yields the Zipf-Mandelbrot law

$p_k = \frac{1}{\zeta(B, 1 + k_0)}\,(k + k_0)^{-B}, \qquad (15)$

where $\zeta$ is now the Hurwitz zeta function, $\zeta(s, q) = \sum_{n=0}^{\infty} (n + q)^{-s}$.
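For illustration (the parameter values below are arbitrary, not fitted), formula (15) can be evaluated directly, using SciPy's Hurwitz zeta function for the normalization:

from scipy.special import zeta   # zeta(s, q) is the Hurwitz zeta function

# Zipf-Mandelbrot distribution (15) with illustrative parameters B and k0.
B, k0 = 1.2, 10
norm = zeta(B, 1 + k0)
for k in (1, 2, 5, 10, 100, 1000):
    print(k, f"{(k + k0) ** (-B) / norm:.5f}")

# For k << k0 the values change slowly (the low-rank flattening);
# for k >> k0 they approach the pure power law k**(-B).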
The Zipf-Mandelbrot formula has the potential of correctly approximating not only the power law, but also the initial, low-rank range of the real frequency distributions, which flatten out at $k \lesssim 10$ or so. But remember again that the second part of (10), $B = HC_0/C$, needs to be satisfied, which means that parameters $k_0$ and $B$ are not independent. This is rarely, if ever, mentioned in the literature, even though it is a rather important constraint. Substituting (15) into (3) and (4) and noting that
$\frac{\partial}{\partial s}\,\zeta(s, q) = -\sum_{n=0}^{\infty} (n + q)^{-s} \ln(n + q), \qquad (16)$
we obtain

$C = -\frac{C_0}{\ln 2}\,\frac{\zeta'(B, 1 + k_0)}{\zeta(B, 1 + k_0)}, \qquad (17)$

$H = -\frac{B}{\ln 2}\,\frac{\zeta'(B, 1 + k_0)}{\zeta(B, 1 + k_0)} + \log_2 \zeta(B, 1 + k_0), \qquad (18)$

where the prime denotes differentiation with respect to the first argument. The constraint

$B = HC_0/C \qquad (19)$

then becomes

$B = B - \ln 2\,\frac{\log_2 \zeta(B, 1 + k_0)}{\left(\ln \zeta(B, 1 + k_0)\right)'}, \qquad (20)$

that is,

$\zeta(B, 1 + k_0) = 1. \qquad (21)$
Note that

$\sum_{n=1}^{k_0} n^{-B} = O(k_0^{\epsilon}) \quad \text{for any } \epsilon > 0, \qquad (24)$

whence, for $B$ close to 1, the constraint (21) can only be satisfied by extremely large values of $k_0$, far larger than the values of order 10 needed to reproduce the observed flattening of real frequency distributions at low ranks (Figure 3).
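The dependence between the two parameters can be illustrated numerically. The sketch below (assuming SciPy, and treating $k_0$ as a continuous parameter) solves the constraint (21) for $k_0$ at a few values of $B$:

from scipy.optimize import brentq
from scipy.special import zeta

# Solve zeta(B, 1 + k0) = 1, i.e. constraint (21), for k0 at a few exponents B.
def k0_for(B):
    return brentq(lambda k0: zeta(B, 1.0 + k0) - 1.0, 0.0, 1e12)

for B in (2.0, 1.5, 1.2, 1.1):
    print(f"B = {B}: k0 ~ {k0_for(B):.3g}")

# The closer B is to 1, the larger the k0 required by (21).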
Fig. 3. Zipf-Mandelbrot law with different values of $k_0$. Real frequency distribution (not to scale) and Zipf's law are shown for comparison.
REFERENCES
Ferrer i Cancho, R. (2005). The variation of Zipf’s law in human language. European
Physical Journal B, 44, 249–257.
Li, W. (1992). Random texts exhibit Zipf’s-law-like word frequency distribution. IEEE
Transactions on Information Theory, 38(6), 1842–1845.
Mandelbrot, B. (1953). An informational theory of the statistical structure of languages.
In W. Jackson (Ed.), Communication Theory (pp. 486–502). Woburn, MA:
Butterworth.
Mandelbrot, B. (1966). Information theory and psycholinguistics: A theory of word frequencies. In P. F. Lazarsfeld & N. W. Henry (Eds), Readings in Mathematical Social Sciences (pp. 350–368). Cambridge: MIT Press.
Mandelbrot, B. (1982). The Fractal Geometry of Nature. New York: Freeman.
Manin, D. Yu. (2008). Zipf’s law and avoidance of excessive synonymy. Cognitive Science