Professional Documents
Culture Documents
Stratification in Texts
Stratification in Texts
Research
Report
Series
Stratification in Texts
Gabriel Altmann1, Ioan-Iovitz Popescu2,
Dan Zotta2
1
Ludenscheid and 2Bucharest
CDMTCS-433
February 2013
Stratification in texts
Gabriel Altmann, Ldenscheid
Ioan-Iovitz Popescu, Bucharest1
Dan Zotta, Bucharest
Abstract. Stratification in texts is a process analogous to those in nature and culture. Though one cannot identify the individual strata in every case, it is possible to show the rise of this phenomenon in
mathematical terms and apply the resulting formulas to examples from textology and music. It allows
also to study the evolution of a writer, text sort, language or music.
(1)
f(x) = C + y(x),
(2)
Stratification in texts
used successfully to rank-frequency sequences proposed as an alternative to Zipfs law which
does not capture stratification (cf. Popescu, Altmann, Khler 2010). The derivatives of (2) are
(4)
where k1
p = - (k1 + k2)
q = (k1k2)
we get the standard form of the 2nd order linear homogeneous ordinary differential equation
with constant coefficients
(5)
y'' + py' + qy = 0.
Conversely, let's start from this equation
y'' + py' + qy = 0
where p and q are real numbers, and look for a solution
y = exp(kx).
Inserting it into the above equation we have
(k2 + pk + q)exp(kx) = 0
or, because exp(kx) is never zero, we obtain the so called characteristic equation
k2 + pk + q = 0
with the discriminant
= p2 - 4q
If
> 0, the characteristic equation has two real and distinct solutions, k1 and k2, given by
k1 = (-p +
k2 = (-p -
)/2
) / 2,
p = (k1 + k2)
q = k 1 k2 .
To conclude, the fitting function consisting of two exponential components represents the
solution for the case
of the 2nd order linear homogeneous ordinary differential equation with constant coefficients, see more, for instance, at http://www.efunda.com/math/ode/
linearode_consthomo.cfm
The generalization is straightforward: the fitting function consisting of n exponential
components represents the solution of the nth order linear homogeneous ordinary differential
equation with constant coefficients, for the case when all solutions of the characteristic equation are real and distinct numbers.
The above solution of the stratification problem has the advantage of telling us the
number of strata of the given unit in the given text (cf. Popescu, Altmann, Khler (2010);
ltmann (2009); Popescu, MartinkovRendekov, Altmann (2012)). However, it does not enable us to identify the strata.
Take as an example the word form frequency in Goethes poem Erlknig ranked in
decreasing order as shown in Table 1.
Table 1
Ranked word form frequencies in Erlknig by Goethe
fx
fx
fx
fx
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
11
9
9
7
6
6
5
5
4
4
4
4
4
4
4
3
3
3
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
2
2
2
2
2
2
2
2
1
1
1
1
1
1
1
1
1
1
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
Stratification in texts
19
20
21
22
23
24
25
26
27
28
29
30
31
3
3
3
2
2
2
2
2
2
2
2
2
2
50
51
52
53
54
55
56
57
58
59
60
61
62
1
1
1
1
1
1
1
1
1
1
1
1
1
81
82
83
84
85
86
87
88
89
90
91
92
93
1
1
1
1
1
1
1
1
1
1
1
1
1
112
113
114
115
116
117
118
119
120
121
122
123
124
1
1
1
1
1
1
1
1
1
1
1
1
1
If we fit the data with a function having a sum of three exponential functions in its expression,
that is with
(6)
we obtain the results presented in Figure 1 with the determination coefficient R2 = 0.9824.
Stratification in texts
Figure 4. Fitting the pitch rank-frequencies in Mozarts Sonata A major K.331 with a function
of type (6) indicates three strata
References
Beliankou A., Khler R. (2010). The distribution of parts-of-speach in Russian texts. Glottometrics 20, 59-69.
Fan, F., Altmann G. (2010). On meaning diversification in English. In: Sprachlehrforschung:
Theorie und Empirie. Festschrift fr Rdiger Grotjahn: 223-233. Frankfurt: Lang.
(1997). Lectures on text theory. Prague: Oriental Institute.
Kelih, E. (2009). Graphemhufigkeiten in slawischen Sprachen: Stetige Modelle, Glottometrics 18, 52-68.
Knight, R. (2013). Laws Governing Rank Frequency and Stratification in English Texts.
Glottometrics 25 (present issue).
Khler, R. (2005). Synergetic linguistics. In: Khler, R., Altmann, G., Piotrowski, R.G. (eds.),
Quantitative Linguistics. An International Handbook: 760-774. Berlin: de Gruyter.
Khler, R. (2006). The frequency distribution of the lengths of length sequences. In: Genzor,
J., Buckov, M. (eds.), Favete linguis. Studies in honour of Victor Krupa: 145-152. Bratislava: Slovak Academic Press.
Stratification in texts
Khler, R. (2008). Word length in text. A study in the syntagmatic dimension. In: MisloviJazyk a jazykoveda v pohybe: 416-421. Bratislava: VEDA: Vydavatel'stvo SAV.
Khler, R. (2008). Sequences of linguistic quantities. Report on a new unit of investigation.
Glottotheory 1(1), 115-119.
Laufer, J., Nemcov E. (2009). Diversifikation deutscher morphologischer Klassen in SMS.
Glottometrics 18, 13-25.
Martinkov, Z., Popescu, I.Some problems of musical texts. Glotometrics 16, 80-110
Nemcov, E., Popescu, I.-I., Altmann, G. (2001). Word associations in French. In: Berndt,
A., Bcker, J. (eds.), Sprachlehrforschung: Theorie und Empirie: 223-237. Frankfurt:
Lang.
Popescu, I.-I., Altmann, G., Khler, R. (2010). Zipf's law another view. Quality and
Quantity 44(4), 713-731.
Popescu, I.(2011). On stratification in poetry. Glottometrics 21,
54-59.
Popescu, I.(2009). Aspects of word frequencies, Ldenscheid:
RAM.
Popescu, I.-I., Martinkov-Rendekov Z., Altmann G. (2012). Stratification in musical
texts based on rank-frequency distribution of tone pitches, Glottometrics 24, 25-40.
Sanada H., Altmann G. (2009). Diversification of postpositions in Japanese, Glottometrics
19, 70-79.
Tuzzi, A., Popescu, I.-I., Altmann, G. (2010). Quantitative analysis of Italian texts, Ldenscheid: RAM.