Download as pdf or txt
Download as pdf or txt
You are on page 1of 6

A MORPHOLOGICAL PROCESSOR FOR MODERN GREEK

Angela Ralli Eleni Galiotou


- Universit~ de Montreal, Montreal, - National Documentation Center Prj.,
Quebec, Canada National Hellenic Research Foundation,
- EUROTRA - GR, Athens, Greece
Athens, Greece - EUROTRA - GR,
Athens, Greece

ABSTRACT tion process.


In this paper, we present a morphological pro- b. The validation Pules that determine all
cessor for Modern Greek. possible combinations of the entry with other
morphemes.
From the linguistic point of view, we tr5, to
elucidate the complexity of the inflectional sy- 4. A list of phonemes described as sets of featu-
stem using a lexical model which follows the res. The same file contains also a set of phonolo-
mecent work by Lieber, 1980, Selkirk 1982, Kipar- gical rules generating lexical phonological phe-
sky 1982, and others. nomena. These rules govern permissible correspon-
dences between the form of entries listed in the
The implementation is based on the concept of
dictionary and the form they develop when they
"validation grammars" (Coumtin 1977).
are combined in sequences of morphemes.
The morphological processing is controlled by a
These files are used both for analysis and ge-
finite automaton and it combines
neration. The process of the present morphological
a. a dictionary containing the stems for a analysis consists of parsing an input of inflected
representative fragment of Modern Greek and all words with respect to the word grammar. Stems
the inflectional affixes with associated to the appropriate morpho-syntactic in-
formation will be the output of the parsing.
b. a grammar which camries out the transmis-
sion of the linguistic information needed for the The process of generation of a given inflected
processing. The words are structured by concate- word consists of
nating a stem with an inflectional part. In cer-
a. determining its stem by a morphological
tain cases, phonological rules are added to the
analysis.
grammar in order to capture lexical phonological
phenomena. b. Generating all or a subset of the permis-
sible word forms.
i. Intu'oduction-Ovemview
For the needs of this presentation, lexical
Our processor is intended to provide an analy- items have been transcribed in a semi-phonological
sis as well as a generation for every derived item manner. According to this transcription,all greek
of the greek lexicon. It covers both inflectional vowels written as double character are kept as
and derivational morphology but for the time such:
being only inflection has been treated.
(1) Gmaphemes Phonemes
Greek is the only language tested so far.
Nevertheless, we hope that our system is general
enough to b e of use to other languages since the o~ ~ oi
formal and computational aspect of "validation ~u ~ ai
grammars" and finite automata has already been OH ~ oy
used for French (c.f. Courtin et al. 1976, Galio-
Moreover, the sounds [i] and [o~ written in Greek
tou 1983).
as n and ~ respectively are transcribed as i:
The system is built around the following data and o:. The transcription of the last two vowels
files: reminds of their ancient greek status as long
vowels.
I.A "dictionary" holding morphemes associated to
morpho-syntactic information. As far as accent is concerned, we decided to
exclude this aspect from the present form of the
2.A "model" file containing items which act as
processor. Accentuation in Greek is a linguistic
reference to every morphematic entry in order to
problem which has not been solved as yet. We are
determine what kind of process the entry under-
working on this matter and we hope to implement
goes.
accent in the near future.
3.A word grammar which governs permissible word
The morphological processing is controlled by
structures. The rules that can apply to an entry
a finite automaton I with the help of the dictio-
are divided in
T--F~r-a detailed discussion on the control auto-
a. a "basic initial rule" acting as a recogni-
maton, c.f.Courtin et al 1969.

26
namy and the word grammar which controls word for- affixes belong to The endings of verbal types that
marion and carries out the transmission of The are aspectually marked.
linguistic information needed for the processing.
In certain cases, the gPammar makes use of phono- (4) Infl * affix Infl
logical rules in order To capture lexlcal phonolo- ame
Example: 7mapsame + 7rap s
gical phenomena such as insertion, deletion and BP
"we wTote .... write" ~erf~
change. pl
The processor is implemented in TURBO-PROLO~ pastJ
(version 1.0) running under MS-DOS (version 3.10)
on an IBM-XT with 640 kB main memory. It consists
Note that the stem 7rap is listed in the dictiona-
of an analysis and a generation sub-module.
ry as ymaf. The consonant [f~ is changed to [p]
because of the [s3 that follows. The phonological
2. Linguistic assumptions rule in ouestion is lexical and it applies to the
The theoPetical fPamework underlying the morpheme boundary. As such, the rule is morpholo-
linsuistic aspects of the project is that of Gene- gically conditioned and ~r allows exceptions~
rative Morphology, in particular the recent work
When verbal types do not contain an aspectual
by Lieber 1980, Selkirk 1982, Kiparsky 1982 and
marker, Infl refers to a single affix.
others.
In developing our system, we have adopted the 3.1 The dictionary structure
proposals made in Ralli's study on Greek Morpholo-
In our system, The dictionary consists of a se-
gY (Ph.D.diss., 1987). Therefore, we assume that
quence of entries each in the form of a Prolog
the greek lexicon contains a list of entries
term.
(dictionary) and a grammap which combines morpholo-
gy with phonology. The dictionary is morpheme It has to be noted that no significant semantic
based. It contains stems and affixes which ape information is present in our entries because that
associated with the following infor~nation fields. field is still unexploited. Similarly, The syntac-
tic information concerning subcategorization pro-
a. The string in its basic phonological form.
perties of lexical entries is not taken into
b. Reference to possible allomorphic varia- account.
tions of The string which are not productively ge-
The dictionary also contains information That
nerated by rule.
perTniTs the "linking" with the grammar. So, apart
c. Specifications of grammatical category and from the linguistic information mentioned in
other morpho-syntactic features that characterize section 2, every entry of the dictionary contains
the particular entries. also
d. The meaning. a. a list of rules that permit the use of a
particular entry (rules That have the entry as
e. Diacritic marks which are integers permit-
Their Terminal symbol).
ring the correct mapping between the stem and the
affix where this cannot be done by rule. b. a list of validatio~ rules (rules that can
be applied after each use of that entry).
(i) Stem Affix
As far as morphology is concerned, forms can be
arranged into classes. We choose arbitrarily an
vivli 3 + o3 "book" (neut, nom,sg)
element of this class called a "model" and every
stem in the dictionary refers to a model. Morpho-
krat 4 ÷ os 4 "state" (neuT,nom,sg)
logical information is found at the model level.
In this way, the size of the dictionary is signi-
In our work, diacritic marks replace the tradition-
ficantly reduced.
al use of declensions and conjugations which fail
to divide nouns and verbs in inflectional classes. The model file consists also of sequences of
entries, each in the form of a Prolog term. Each
The inflectional structure of words is handled
model includes information concerning
by a grammar which assigns a binary tree structure
to the words in question. The rules are of the form a. The form of the string,
b. the "basic initial mule" which identifies
(2) Word ÷ stem Infl,
the string,
where, Word and stem are lexical categories and c. the possible diacritic mark,
Infl indicates the inflectional ending. For nomi- d. the set of morpho-syntactic features,
nal stems, Infl corresponds to a single affix
marked for number and case. e. the validation rules which substitute word
formation rules.
(3) Infl ~ affix
3.2 Examples from the dictionary
Example: 6romos ÷ 6rom-os (nom, sg)
"street" Example of a dictionary entry:

For verbs, the constituent Infl refers either 2For a detailed study of lexical Dhonological ru-
to one or to two affixes. In the latter case, Two les, c.f. Kiparsky 1982/83.

27
Stem Model List of tions which ame redefined after the application of
allomor~hs each rule. In our system, validation rules consist
of a list of PPolog clauses.
dict ("papa%yr", "vivli",
Transmission concerns the grammatical category
"window" "book"
and other morpho-syntactic features.
Model en%Ty of the example above Linguistically, we regard stems to be the head
of inflectedwords. As such, they contribute to
Entmy Boln.R. Diac. Feat. Valid.
the categorial specifications of the words. More-
stem ("vivli", ~init], ~], [n,neut], [nll,nl2] over, all morpho-syntactic features of inflectio-
nal affixes ape also copied to the word. In word
We did not write separate dictionary entries for structures built in the form of a tree, features
affixes because each affix is a model on its own. ape percolated to the mother node according to the
Therefore, information associated with an affix Percolation Principle as it was formulated by
model must cover all unpredictable information Selkirk.
listed within the corresponding dictionary entry.
Instead of a "basic initial rule", every affix mo- (i) Percolation Principle (Selkirk 1982)
del refers to a set of rules that govern the com-
a. If a head has a feature specification [aFi],
bination of the affix with a particular stem. An
affix that terminates a word is identified by an a~u, its mother node must be specified [aFi] and
vice versa.
empty set of validation rules.
b. If a non head has a feature specification
Example of an affix model uSfj] and the head has the feature specification
EnVy Rules Diac. Feat. Val. Fjj, then the mother node must have the feature
specification ~Fj]. (page 76).
af("o", [n12,a4], [3], [nom,sg] , [])
The principle in question is incorporated in
4. The gmammam our validation Pules where, for each inflected
word, it is determined which features are taken
In order to carry out the processing we use a from the stem and which come from the affix.
"validation grammar" as defined in Cour~in 1977.
(2) Example of a validation mule
4.1 Review of validation g~e,,,a~s
rule(nil,Stem, ,StFeat, ,
A validation grammar GV is a 4-tuple Affix,[],[fFeat,A~al
GV=(VTv , SV, gV, E), where, Result,[],ResFeat,AfVal):-
concat(Stem,Affix,Result),
VTV = a vocabulary of terminal symbols. append_list(StFeat,AfFeat,ResFeat)

E=a subset of the set of i n t e g e r s . where, "concat" is a Prolog predicate performing


SV @ ~(E) and is called axiom the concatenation of two strings and "append list"
is a Prolog predicate performing the concatenat-
~V=a finite set of production rules. ion of two lists.
However, accoDding to Ralli's study, features
A production is an element of the application are not only percolated To words from stems and
affixes. Feature values may also be inserted to
E ÷ VTV X@(E) certain underspecified environments. For instance,
Productions are of the form when an inflected word fails to take certain fea-
tures fl~om both the stem and the ending, the rule
i ÷ a[jl ..... jq] or then takes over the role of adding them. Consider
i ÷ a[O], where i e E, the verbal form 71"afo: "I write". It takes the ca-
tegory value from the stem (TTaf-) and the featu-
Dl'J ..... jq] e @(E~, a ~ Vrv res of person and number from the affix (-o:). It
is clear that at this point, 7Taro: is underspeci-
Property 1 fled because besides the values of person and num-
ber, greek verbal forms must be characterized by
A validation Krammar is equivalent to a re~ul~v
grammar since they generate the same language. aspect, tense and voice. Following this, we assume
Consequently, there is a finite automaton that re- that specific values of the last three attributes
cognizes the strings generated by a validation are inserted by the rule governing the combination
grammar. of the stem ymaf- with the ending -o:.

P~oper, ty 2 (3) Rule generating 7mafo:

The number of production rules of a validation rule(vll,Stem,[],StFeat,_,


grammar is less than or equal to the number of Affix,[],AfFeat, AfVal,
production rules of its equivalent regular grammar. Result,[],ResFeat, AfVal):-
Concat(Stem,Affix,Result),
4.2 Contmol, Transmission and phonological changes feat ins(StFeat~[non__perf,present,
-- activeJ,AfFeat,ResFeat)
Contr~l is carried out with the help of valida-

28
IT is worth noting that a validation rule can insert the feature value "active".
also take into account instances of morpho-phono-
In this way, we obtain:
logical phenomena.
(i) e-yraf-a "I was writing"
#.2.1 Morpho-phonological insertion
~Taf-ame "We were writing"
In Greek, in several cases, transition elements but not ee-yraf-ame
appear at a morpheme boundary between Two consti-
Tuents (c.f.Ralli 1987). Both the insertion and the 5. The Process
phonological form of the elements are always con-
The analysis of a word form is carried out in-
ditioned by the morphological environment.
dependently of its syntactic environment. Conse-
Nominal as well as verbal inflection undergo quently, the analyzer will provide the set of all
morpho-phonological insertion depending on the possible analyses.
kind of stem that is involved in the process. An
In order to program and store the automaton,we
example of morpho-phonological insertion is the
perform a splitting of its transitions and each
verbal thematic vowel.
transition is represented by a rule.
(i) Stem Th.V. Af
(1) avli: "yard" (nom/acc singular)
yraf o mai "I am written"
dictionamy entries
yraf e Tai "It is written"
diet( "avl", "avl", [] )
Similarly, in certain nouns and adjectives, a
vowel appears in singular, between the stem and model ant:ties
the inflection. stem( "avl", [init], [l'l,
In,fern] , ~nll,n12",n21,n22 ,n23] )
(2) Stem Th.V. Af
af(" ",[n21,n23,n32,n33,a21,a23],
tami a s "cashier" [],[],[])
foiti:t i: s "univ. student"
Transitions
Insertion is not the only morphophonological
Rule STring Resulting Feat.,
phenomenon. Val.
s%Ting
4.2.2 Morpho-phonological change init "avl" "avl" cat=n
gd=fem
As already mentioned in section 2, verbal in- diec= [i]
flecZion undergoes morphophonological changes on
val= [ nll ,nl2 ,n21,
the stem and/or the affix during the construction
of aspectually marked verbal types. Rules perfor- n22,n23]
ming phonological changes are applied cyclically n21 " " "avli :" cst:n
each time the appropriate lexical string is formed. gd=fem
Phonological rules take into account a list of num:sg
phonemes described as sets of distinctive features. case:nom
In our system, phonemes are listed as Prolog terms.
n23 " " "avli :" cat=n
Phonological rules are listed as Prolog clauses.
gd:fem
Take for example the form 6e-s-ame "we tied". num:sg
The stem 6e- is listed in the dictionary as 6en-. case:ace
The validation rule authorizing the concatenation
The rule init starts the analysis by taking
of 6en- and -s- demands the application of a lexi-
every information from the dictionary level. The
cal phonological rule responsible for the deletion
stem "avl" is validated by rules n2! and n23,
of the final Inl.
among others, which will also authorize the use
of a 0-affix. Moreover, they perform morpho-pho-
~.2.3 The augment rule
nological insertion of the transition element -i:
It is generally accepted that augment in Modern during the concatenation of "avl" and " ". The
Greek must be considered as a phonological element resulting string is avli: i n both cases. These
introduced in the appropriate morphological envi- rules also perform feature insertions. Rule n21
ronment. That is, an e- is prefixed to forms marked inserts feature values [nominative] and [singular]
for past in which it is always accentuated. Given while n23 inserts feature values ~ccusative] and
the fact that accentuation is not treated here, we [singular_~ .
decided to divide verbal stems in marked and un-
The analysis of the form avli: is completed in
marked for augment. Once a verbal item is built,
27 hundredths of a second (cpu time).
the e- is added at the beginning of the form in
singular and third person plural only if the stem As already mentioned the system is reversible.
carries the feature [aug]. In order to generate all possible forms of avli:
In our system, the augment rule, listed also as we apply all validation rules of the stem "avl"
and thus we obtain:
a Prolog clause, is activated by validation rules
authorizing the concatenation of a verbal stem and
a verbal affix marked for past. The same rules

29
" " n21

./ g d = f e m ~ .
"avl" init

- string="avli :"

/Final
state
cat=n ~
gd=fem ~ /
diao=[1]
val= [nll,nl2 ,n21, n22, n23]
s Ircing= "avl"
" " n23
cat=n
gd=fem
case=acc
num=sg
string="avli:"

FiEume i: T~ansition graph of the automaton

(2) avli: (fem,nom,sg) REFERENCES


avli:s (fem,gen,sg) A r o n o f f , M. 1976 Word F o r m a t i o n i n G e n e r a t i v e
avli: (fem,acc,sg) Grammam, Linguistic Inquiry, Monograph i., M.I.T.
avles (fem,nom,pl) Press
avlo:n (fem,gen,pl)
avles (fem,acc,pl) Babiniotis, G. 1972 The Greek Verb, Athens,
Greece
The generation of all possible forms of avl-~:)
is completed in 43 hundredths of a second (cpu Chomsky, N. and M. Halle 1968 The Sound Pattern
time). of English, Hamper and Row, New York
As an example of processing of a verbal form Courtin, J. 1977 AlgTorithmes pour le traite-
we mention the analysis of 5e-s-ame "we tied" ment interactif des langues naturelles, Th~se d'
discussed in section 4.2.2 which is completed in Etat, Universit~ de Grenoble I, Grenoble, France.
50 hundredths of a second (cpu time), while the Courtin, J., Dujardin D. and Grandjean E. 1976
generation of all possible forms of 5en-(o:) "to Editeur lexicographique pou_r les langues naturel-
tie" is completed in i second and 59 hundredths
les, Document Interne, I.R.M.A, Grenoble, France.
(cpu time).
Courtin, J., Rieu J.L. and Szgall P. 1969 Un
5. Conclusion m~talangage pour l'analyse morphologique, Docu-
ment interne, C.E.T.A, Grenoble, France
In this paper, a morphological processor has
been presented that is capable of handling lexical Galiotou E. 1983 Construction d'un Analyseur
phonological phenomena. Future developments aim at Morphologique du Franqai~ en Foll-Prolog, M~moire
implementing a friendly user language and comple- D.E.A., Universit~ de Grenoble II, Grenoble,
ting the user interface. We also plan to produce France.
an implementation under UNIX, probably in C,which Kiparsky, P. 1982 Lexical Morphology and Pho-
will hopefully become a component of an integrated
nology, in Linguistic Society of Korea (Ed.),
natural language processing system for Greek.
Linguistics in the Mozn~ing Calm, Hanshin Publish-
ing Co, Seoul.
ACKNOWLEDGEMENTS
Kiparsky, P. 1983 Word Formation and the Lexi-
Our participation in the Conference was finan- con, in F. Ingemann (ed.) Proceedings of the 1982
ced partially by the EUROTRA-GR project and par- Mid-America Linguistics Conference, Univ. of Kan-
tially by the National Hellenic Research Founda- sas, Lawrence
tion. Koutsoudas, A. 1962 Verb Morphology of Modern
The realization of the project was made possi- Greek: a descriptive analysis, The Hague
ble thanks to the infrastructure provided by the Lieber, R. 1980 On the Organization of the Le-
National Documentation Center project at the
xicon P h . D . dissertation, M.I.T.
N.H.R.F.
Malikouti-Drachman, A. 1970 Transformational
We would like to thank Prof. A. Koutsoudas and Morphology of the Greek Noun, Athens, Greece
Prof. Th. Alevizos for their help and support.
Mohanan, K.P. 1982 Lexical Phonology, P h . D .
Special thanks go to Dr. J. Kontos for his va-
dissertation, M.I.T.
luable guidance, comments and encouragement.

30
Ralli, A. 1984 Verbal Morphology and the Theory
of Lexicon Proceedings of the 5th meeting of Lin-
guistics, Univ. of Thessaloniki, Greece (in Greek)
Ralli, A. 1986 Derivation vs Inflection Pro-
ceedings of the 7th meeting of Linguistics, Univ.
of Thessaloniki, Greece (in Greek)
Ralli, A. 19877 La morphologie verbale grecque,
Ph. D. dissertation Universitg de Montrgal, Mont-
real, Quebec, Canada
Selkirk, E. 1982 The Syntax of Womds, Linguis-
tic Inquiry Monograph, M.I.T-Press
Williams, E. 1981 On the notions "lexically
relazed" and "head of the word", Linguiszic In-
quiry, 12(2).
Warburton, I. 1970 On the Verb in Model-n Greek
Language Science Monographs, Volume 4 The Hague:
Mouton, Bloomlngton, Indiana University.

3/

You might also like