Stemmer of Kemant

UNIVERSITY OF GONDAR
FACULTY OF NATURAL AND COMPUTATIONAL SCIENCE
DEPARTMENT OF INFORMATION TECHNOLOGY
DEVELOPING A STEMMER FOR KEMANTNEY TEXT
BY
SEMALGN ESHETE ABERRA
JANUARY, 2015
GONDAR
A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE
REQUIREMENT FOR THE DEGREE OF MASTER OF SCIENCE IN
INFORMATION TECHNOLOGY
By
SEMALGN ESHETE
Names and Signatures of Members of the Examining Board
Name Title Signature Date
_______________________Chairperson _____ ______
_______________________Advisor ______ ______
_______________________Examiner ______ ______

ACKNOWLEDGEMENT
First of all, I would like to thank the “Almighty God” and his mother “St. Mary”, who made this work possible.
Without the help and support of several people, this thesis could not have been written. I would like to express my gratitude to everyone who contributed to this research in one way or another. I would especially like to thank my advisor, Million Meshesha (PhD), for his incredible and continuous support, which enabled me to complete this thesis and bring it to its present shape.
I cannot list all the people who showed me brotherly and parental affection during my thesis, and I thank them all. Some people, however, must be mentioned specifically:
Above all, I thank Womber Muluneh Mersha, who was an outstanding informant and caretaker and who provided many of the materials required for my thesis. Sadly, Womber Muluneh Mersha passed away a few months ago; I offer here my heartfelt recognition and thanks.
I am most grateful to Kefale Mamo, Wolde Ferede, Doctor Girma M., Bereket Abew and Mezgebu Ayele, who were friendly and cooperative and who deserve special gratitude for their all-round assistance.
My special gratitude also goes to Abuhay Tadese and W/amanuale Zenebe for their uninterrupted encouragement and for contributing their expert knowledge of the Kemantney language.
Last, but not least, I wish to express my deep appreciation to my family, whose unfailing and consistent help has a special place in this study. Their forbearance and assistance contributed in no small measure to the timely completion of this thesis. It would have been extremely difficult for me to finish this study without their untiring encouragement, and they deserve credit for being my strength and my great love.
Finally, I want to thank my colleagues, friends and everyone who has always provided material help, encouragement and moral support in the course of the work on this thesis.
January, 2015
Semalgn Eshete
University of Gondar
Gondar, Ethiopia
TABLE OF CONTENTS
ACKNOWLEDGEMENT --------------------------------------------------------------------------------------------------- I
TABLE OF CONTENTS -------------------------------------------------------------------------------------------------- III
LIST OF TABLES ---------------------------------------------------------------------------------------------------------- XI
LIST OF FIGURES -------------------------------------------------------------------------------------------------------- XI
LIST OF ACRONYMS AND ABBREVIATIONS ---------------------------------------------------------------------- XII
ABSTRACT -------------------------------------------------------------------------------------------------------------- XIII
CHAPTER ONE ------------------------------------------------------------------------------------------------------------ 1
INTRODUCTION ---------------------------------------------------------------------------------------------------- 1
1.1. BACKGROUND OF THE STUDY ------------------------------------------------------------------- 1
1.2. MOTIVATION ---------------------------------------------------------------------------------------- 3
1.3. KEMANTNEY LANGUAGE AND ITS CLASSIFICATION ---------------------------------------- 4
1.4. STATEMENT OF THE PROBLEM ------------------------------------------------------------------ 6
1.5. OBJECTIVE OF THE STUDY ---------------------------------------------------------------------- 12
1.5.1. GENERAL OBJECTIVE -------------------------------------------------------------------------- 12
1.5.2. SPECIFIC OBJECTIVES -------------------------------------------------------------------------- 12
1.6. SCOPE AND LIMITATION OF THE STUDY ----------------------------------------------------- 13
1.7. METHODOLOGY ----------------------------------------------------------------------------------- 13
1.7.1. LITERATURE REVIEW ---------------------------------------------------------------------------- 14
1.7.2. DATA SOURCES --------------------------------------------------------------------------------- 14
1.7.3. EXPERIMENTATION METHOD ---------------------------------------------------------------- 14
1.7.4. TESTING PROCEDURE ------------------------------------------------------------------------- 15
1.8. SIGNIFICANCE OF THE STUDY ------------------------------------------------------------------ 16
1.9. ORGANIZATION OF THE THESIS---------------------------------------------------------------- 16
CHAPTER TWO --------------------------------------------------------------------------------------------------------- 18
REVIEW OF RELATED LITERATURE ---------------------------------------------------------------------------- 18
2.1. OVERVIEW OF STEMMING---------------------------------------------------------------------- 18
2.2. STEMMING TECHNIQUES ----------------------------------------------------------------------- 19
AFFIX REMOVAL METHOD ---------------------------------------------------------------------------- 21
SUCCESSOR VARIETY METHOD ---------------------------------------------------------------------- 22
TABLE LOOKUP METHOD ------------------------------------------------------------------------------ 24
N-GRAM METHOD -------------------------------------------------------------------------------------- 25
2.3. CLASSIFICATION OF STEMMING ALGORITHMS -------------------------------------------- 27
2.3.1. RULE BASED APPROACH ---------------------------------------------------------------------- 27
2.3.2. STATISTICAL APPROACH ---------------------------------------------------------------------- 29
2.4. RELATED WORKS ---------------------------------------------------------------------------------- 35
2.4.1. ENGLISH LANGUAGE STEMMERS----------------------------------------------------------- 35
LOVINS STEMMING ALGORITHM ---------------------------------------------------------------- 36
DAWSON STEMMING ALGORITHM ------------------------------------------------------------- 36
PORTER STEMMING ALGORITHM --------------------------------------------------------------- 37
PAICE/HUSK STEMMING ALGORITHM -------------------------------------------------------- 38
KROVETZ STEMMING ALGORITHM -------------------------------------------------------------- 38
2.4.2. BON: FIRST PERSIAN STEMMER ------------------------------------------------------------------ 39
2.4.3. ARABIC STEMMING ALGORITHMS --------------------------------------------------------- 39
2.4.4. STEMMING ALGORITHMS FOR ETHIOPIAN LANGUAGES ---------------------------------- 40
SILT'E STEMMER ------------------------------------------------------------------------------------- 40
WOLAYTA STEMMERS ------------------------------------------------------------------------------ 41
OROMO STEMMERS -------------------------------------------------------------------------------- 42
TIGRIGNA STEMMERS ------------------------------------------------------------------------------ 43
AMHARIC STEMMERS ------------------------------------------------------------------------------ 44
CHAPTER THREE ------------------------------------------------------------------------------------------------------- 46
MORPHOLOGY OF KEMANTNEY LANGUAGE -------------------------------------------------------------- 46
3.1. MORPHOLOGY------------------------------------------------------------------------------------- 46
3.2. WORD FORMATION IN KEMANTNEY --------------------------------------------------------- 47
3.3. INFLECTIONAL SUFFIXES OF KEMANTNEY--------------------------------------------------- 49
3.3.1. NOUNS ------------------------------------------------------------------------------------------- 49
3.3.1.1. ACTION NOMINALS ---------------------------------------------------------------------- 49
3.3.1.2. GENDER ------------------------------------------------------------------------------------ 50
3.3.1.3. NUMBER ----------------------------------------------------------------------------------- 50
3.3.1.4. CASE SYSTEM------------------------------------------------------------------------------ 55
NOMINATIVE CASE ------------------------------------------------------------------------------ 55
ACCUSATIVE CASE ------------------------------------------------------------------------------- 57
GENITIVE CASE ----------------------------------------------------------------------------------- 59
OBLIQUE CASE ------------------------------------------------------------------------------------ 61
3.3.1.5. NOMINAL DERIVATION ----------------------------------------------------------------- 62
INFINITIVAL NOMINAL -------------------------------------------------------------------------- 62
ABSTRACT NOMINALS -------------------------------------------------------------------------- 64
AGENTIVE NOMINALS -------------------------------------------------------------------------- 65
3.3.2. PRONOUNS -------------------------------------------------------------------------------------- 65
3.3.2.1. SUBJECT PERSONAL PRONOUNS ----------------------------------------------------- 65
3.3.2.2. OBJECT PRONOUNS --------------------------------------------------------------------- 66
3.3.2.3. POSSESSIVE PRONOUNS ---------------------------------------------------------------- 66
INDEPENDENT POSSESSIVE PRONOUNS ---------------------------------------------------- 66
BOUND POSSESSIVE PRONOUNS (POSSESSIVE ADJECTIVES) -------------------------- 68
3.3.2.4. DEMONSTRATIVE PRONOUNS -------------------------------------------------------- 69
3.3.2.5. REFLEXIVE PRONOUNS------------------------------------------------------------------ 69
3.3.3. ADJECTIVES -------------------------------------------------------------------------------------- 70
3.3.4. VERB ---------------------------------------------------------------------------------------------- 71
3.3.4.1. STEM ---------------------------------------------------------------------------------------- 71
3.3.4.2. VERB INFLECTIONS----------------------------------------------------------------------- 71
PERSON, NUMBER AND GENDER INFLECTIONS------------------------------------------- 72
3.3.4.3. VERB TO BE -------------------------------------------------------------------------------- 72
3.3.4.4. TENSE AND ASPECT ---------------------------------------------------------------------- 74
3.3.4.4.1. PERFECTIVE ASPECT ---------------------------------------------------------------- 74
SIMPLE PAST ---------------------------------------------------------------------------------- 74
PRESENT PERFECT---------------------------------------------------------------------------- 75
PAST PERFECT --------------------------------------------------------------------------------- 77
3.3.4.4.2. IMPERFECTIVE ASPECT ------------------------------------------------------------ 78
PRESENT/FUTURE ---------------------------------------------------------------------------- 78
PROGRESSIVE (DURATIVE) ASPECT ------------------------------------------------------ 79
3.3.4.5. VERBAL EXTENSIONS -------------------------------------------------------------------- 80
PASSIVE -------------------------------------------------------------------------------------------- 81
CAUSATIVE ---------------------------------------------------------------------------------------- 81
ADJUTATIVE --------------------------------------------------------------------------------------- 82
FREQUENTATIVE --------------------------------------------------------------------------------- 83
RECIPROCAL --------------------------------------------------------------------------------------- 84
EMPHATIC ----------------------------------------------------------------------------------------- 85
3.3.4.6. MOOD -------------------------------------------------------------------------------------- 87
GERUNDIVE --------------------------------------------------------------------------------------- 87
JUSSIVE --------------------------------------------------------------------------------------------- 88
IMPERATIVE --------------------------------------------------------------------------------------- 89
CONDITIONAL ------------------------------------------------------------------------------------ 91
3.3.4.7. INTERROGATIVE -------------------------------------------------------------------------- 92
3.3.4.8. NEGATION --------------------------------------------------------------------------------- 96
3.3.4.9. EMBEDDED VERBS ----------------------------------------------------------------------- 97
RELATIVE PARADIGM --------------------------------------------------------------------------- 98
SUBORDINATOR /-ኘ/---------------------------------------------------------------------------- 98
SUBORDINATOR /-ትዝ/ ------------------------------------------------------------------------- 99
ADVERBIAL SUBORDINATOR /-ኙ/----------------------------------------------------------- 100
3.3.5. ADVERB ----------------------------------------------------------------------------------------- 100
3.3.6. QUESTION WORDS --------------------------------------------------------------------------- 101
3.3.7. CONNECTIVES---------------------------------------------------------------------------------- 102
3.3.8. NOUN PHRASE --------------------------------------------------------------------------------- 104
3.3.9. EMBEDDED CLAUSES ------------------------------------------------------------------------- 105
3.3.10. SEMANTIC LOAD (POLYSEMY) ------------------------------------------------------------ 107
3.3.11. VOCABULARY TEST -------------------------------------------------------------------------- 107
3.4. SUMMARY ----------------------------------------------------------------------------------------- 108
CHAPTER FOUR ------------------------------------------------------------------------------------------------------- 110
IMPLEMENTATION AND EXPERIMENTAL RESULT -------------------------------------------------------- 110
4.1. MORPHOLOGICAL PREPROCESSING --------------------------------------------------------- 110
4.1.1. TOKENIZATION -------------------------------------------------------------------------------- 111
4.2. COMPILATION OF AFFIXES --------------------------------------------------------------------- 112
4.3. THE PROPOSED STEMMER--------------------------------------------------------------------- 113
4.4. RULES FOR REMOVING SUFFIXES ------------------------------------------------------------ 115
4.5. IMPLEMENTATION OF THE STEMMER ------------------------------------------------------ 116
4.6. EVALUATION OF THE STEMMER ------------------------------------------------------------- 117
CHAPTER FIVE --------------------------------------------------------------------------------------------------------- 122
CONCLUSION AND RECOMMENDATIONS ---------------------------------------------------------------- 122
5.1. CONCLUSION ------------------------------------------------------------------------------------- 122
5.2. RECOMMENDATIONS ------------------------------------------------------------------------------ 123
REFERENCES ----------------------------------------------------------------------------------------------------------- 125
LIST OF TABLES
Table 3.1: Sound of Kemantney language-------------------------------------------------------------------------48
Table 3.2: Sample list of action nominals---------------------------------------------------------------------------49
Table 3.3: Sample list of Kemantney nouns------------------------------------------------------------------------51
Table 3.4: Imperative---------------------------------------------------------------------------------------------------90
Table 3.5: Negation of imperfect aspect---------------------------------------------------------------------------97
Table 3.6: Semantic load (Polysemy) -----------------------------------------------------------------------------107
Table 4.1: List of rules constructed for stemming--------------------------------------------------------115-116
Table 4.2: Correct stem and incorrect stem---------------------------------------------------------------------119
Table 4.3: Sample of under-stemmed and over-stemmed Kemantney words------------119-120
LIST OF FIGURES
Figure 2.1: Approaches for designing a stemmer for a given language-----------------------------------20
Figure 2.2: Classification of stemming algorithm----------------------------------------------------------------27
Figure 4.1: Flow chart for the general suffix removal algorithm-------------------------------------------114
Figure 4.2: Sample Python code for suffix removal------------------------------------------------------------117
LIST OF ACRONYMS AND ABBREVIATIONS
AC     Accusative
AG     Agent
AUX    Auxiliary
CON    Conditional
EMP    Emphatic
F      Female
FS     Female Singular
IM     Imperative
IMP    Imperfect
INF    Infinitive
IR     Information Retrieval
M      Male
MS     Male Singular
N+Q    Noun + Quantifier
NEG    Negative
NLP    Natural Language Processing
OBJ    Objective
P      Postposition
PL     Plural
PO     Possessive
POL    Polite
PRO    Progressive
PS     Past
RE     Relative
RED    Reduplication
S      Singular
SOV    Subject Object Verb
SUB    Subordinator
ABSTRACT
This study presents the development of a stemmer for Kemantney text. Stemming is a natural language processing technique that identifies the stem or root of morphologically conflated words by handling their inflectional and derivational affixes. It reduces the morphological variants of a word to a common form: a stemmer takes an inflected word as input and returns its root word as output.
In this study, an attempt is made to model the language and to develop an automatic procedure for conflation. Scholars indicate that suffixation is the main word-formation process in Kemantney. Accordingly, the Kemantney stemmer follows a hybrid approach: an iterative stemming algorithm followed by a rule-based post-processor that enhances its performance. The prototype stemmer is developed in the Python programming language.
The results of the study are promising for developing an applicable stemmer for Kemantney text. The stemmer was evaluated with an error-counting method that counts under-stemming and over-stemming errors. It achieves an average accuracy of 93.65% and reduces dictionary size by 66.13%. Of the words tested, 5.39% are under-stemmed and 0.95% are over-stemmed, so most errors are due to under-stemming. Stemming words with single-letter suffixes is the most difficult task in the suffix removal process. These errors arise from the large number of exception words in Kemantney and from the highly inflectional character of the language. One of the biggest difficulties in building this stemming algorithm was that nearly every rule formulated has exceptions. A way to handle exception cases in the language therefore still needs to be found.
CHAPTER ONE
INTRODUCTION
1.1. BACKGROUND OF THE STUDY
The world is changing rapidly, especially in the area of information technology. Information technology involves an understanding of the technology infrastructure that underpins much of today’s life; an understanding of the tools that technology provides and their interaction with this infrastructure; and an understanding of the legal, social, economic and public policy issues that shape the development of the infrastructure and the applications and use of the technologies [8].
Information, on the other hand, deals with content and communication. It encompasses
authoring, information finding and organization, the research process, and information analysis,
assessment and evaluation. The content in question here can take many forms: text, images,
video, audio, computer simulations, and multi-media interactive works. Content can also serve
many purposes: news, art, entertainment, education, research and scholarship, advertising,
politics, commerce, and documents and records that structure activities of everyday business and
personal life. In an increasingly technological society, the means of authoring, information finding
and organization and research, and even information use are increasingly mediated by
information technology [8].
Information retrieval is an important tool for searching for relevant information, and stemming is the process of reducing inflected and derived word variants to their stem [8]. Morphology is the branch of linguistics that studies and describes how words are formed in a language, including inflection. Inflection characterizes the changes in word form that accompany case, gender, number and person. Affix removal stemmers apply a set of transformation rules to each word, trying to cut off known prefixes or suffixes. Stemmers that eliminate all affixes give good results for the conflation and normalization of unitary variants. They conflate morphologically similar terms into a single term without performing a complete morphological analysis. Stemming algorithms have the advantage of reducing the number of distinct terms, and hence the size of the index, which makes information retrieval faster [28], [37].
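The conflation and size-reduction effects described above can be shown with a minimal single-pass suffix stripper. The sketch also reports how much the vocabulary of distinct terms shrinks, which is the kind of dictionary-size reduction reported in the abstract. The suffix list, the three-character minimum stem length and the English sample words are illustrative assumptions. They are not the Kemantney rules developed in this thesis.

```python
# Minimal affix-removal (suffix-stripping) sketch with a vocabulary-reduction measure.
# SUFFIXES and MIN_STEM are illustrative assumptions, not Kemantney rules.
SUFFIXES = ["ation", "ion", "ings", "ing", "ers", "er", "ed", "es", "s"]
MIN_STEM = 3  # never cut a word down to fewer than this many characters


def strip_suffix(word: str) -> str:
    """Remove the longest listed suffix that leaves at least MIN_STEM characters."""
    for suffix in sorted(SUFFIXES, key=len, reverse=True):
        if word.endswith(suffix) and len(word) - len(suffix) >= MIN_STEM:
            return word[: -len(suffix)]
    return word


def vocabulary_reduction(words: list[str]) -> float:
    """Percentage by which the number of distinct terms drops after stemming."""
    before = len(set(words))
    after = len({strip_suffix(w) for w in words})
    return 100.0 * (before - after) / before


if __name__ == "__main__":
    sample = ["connect", "connected", "connecting", "connects", "connection",
              "teach", "teacher", "teachers", "teaching"]
    print(sorted({strip_suffix(w) for w in sample}))
    print(f"{vocabulary_reduction(sample):.2f}%")
```

For the nine sample words the script prints `['connect', 'teach']` and a reduction of `77.78%`. The minimum stem length stops short words such as "sing" from being cut down to a single letter.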
Stemming is part of the composite process of extracting words from text and turning them into index terms and query terms in an IR system. It reduces the morphological variants of words to a common form: given an inflected word as input, the stemmer processes it and outputs the root word. The inflected word may be singular or plural, or it may contain other affixes.
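The role of stemming in producing index and query terms can be sketched with a toy inverted index. Documents and queries pass through the same stemmer, so a query word matches documents that contain any of its morphological variants. The `crude_stem` function is a hypothetical placeholder with a three-item English suffix list. It stands in for a real stemmer and is not the thesis's method.

```python
import re
from collections import defaultdict


def crude_stem(word: str) -> str:
    """Hypothetical placeholder stemmer: strip one common English suffix."""
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def tokenize(text: str) -> list[str]:
    return re.findall(r"[a-z]+", text.lower())


def build_index(docs: dict[int, str]) -> dict[str, set[int]]:
    """Map each stemmed index term to the ids of the documents containing it."""
    index: dict[str, set[int]] = defaultdict(set)
    for doc_id, text in docs.items():
        for token in tokenize(text):
            index[crude_stem(token)].add(doc_id)
    return dict(index)


def search(index: dict[str, set[int]], query: str) -> set[int]:
    """Return the documents that contain every stemmed query term."""
    terms = [crude_stem(t) for t in tokenize(query)]
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


docs = {1: "The farmer planted seeds",
        2: "Planting trees helps farmers",
        3: "Seeds are sold here"}
index = build_index(docs)
print(search(index, "planting seed"))
```

The query "planting seed" returns `{1}`, even though document 1 contains only "planted" and "seeds". Without stemming, neither query word would match that document.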
A stemming algorithm (SA) is a computational procedure that reduces all words with the same root (or, if prefixes are left untouched, the same stem) to a common form. It is a word transformation in which a word may be stripped of some suffixes without losing its core semantic content. Stemming is useful in many areas of computational linguistics and information retrieval. Various stemming techniques exist, and the choice among them depends on the application in which the stemmer is to be used. Morphological variants can be conflated by either manual or automated approaches.
Porter [37] proposed a suffix-stripping algorithm that is perhaps the most widely used algorithm for English stemming. Removing suffixes automatically is especially useful in information retrieval. The algorithm is rule based and is best suited to less inflectional languages such as English.
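Porter's rules are conditioned on the measure m of a candidate stem. The measure counts vowel-consonant sequences in the pattern [C](VC)^m[V]. The sketch below implements only the measure, step 1a and one step 1b rule, following Porter's definitions of consonant and vowel. It is not a full Porter stemmer; in practice a complete, tested implementation (for example NLTK's `PorterStemmer`) should be preferred.

```python
VOWELS = set("aeiou")


def is_consonant(word: str, i: int) -> bool:
    """Porter's definition: a letter other than a, e, i, o, u, and other
    than a 'y' that is preceded by a consonant."""
    ch = word[i]
    if ch in VOWELS:
        return False
    if ch == "y":
        return i == 0 or not is_consonant(word, i - 1)
    return True


def measure(stem: str) -> int:
    """Porter's m: the number of VC sequences in the form [C](VC)^m[V]."""
    pattern = "".join("c" if is_consonant(stem, i) else "v" for i in range(len(stem)))
    # collapse runs of the same type, e.g. "ccvvcc" -> "cvc"
    collapsed = "".join(t for i, t in enumerate(pattern) if i == 0 or t != pattern[i - 1])
    return collapsed.count("vc")


def step_1a(word: str) -> str:
    """Step 1a: SSES -> SS, IES -> I, SS -> SS, S -> (nothing)."""
    for suffix, replacement in (("sses", "ss"), ("ies", "i"), ("ss", "ss"), ("s", "")):
        if word.endswith(suffix):
            return word[: len(word) - len(suffix)] + replacement
    return word


def eed_rule(word: str) -> str:
    """One step 1b rule, (m>0) EED -> EE, showing how m conditions a rule."""
    if word.endswith("eed") and measure(word[:-3]) > 0:
        return word[:-1]
    return word


for w in ("caresses", "ponies", "caress", "cats"):
    print(w, "->", step_1a(w))
print(eed_rule("agreed"), eed_rule("feed"))
```

The condition m > 0 is what lets "agreed" become "agree" while "feed" stays unchanged: the stem "f" has measure 0.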
Paice [47] proposed an evaluation method for stemming algorithms. This method outlines an
approach to stemmer evaluation which is based on detecting and counting the actual under- and
over-stemming errors committed during stemming of word samples derived from actual texts.
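Paice's approach groups sample words into concept groups, meaning sets of words that ought to receive the same stem. Under-stemming and over-stemming errors are then counted against these groups. The sketch below takes a pairwise view of these counts:

- Two words of the same group that receive different stems count as an unachieved merge (under-stemming).
- Two words of different groups that receive the same stem count as a wrong merge (over-stemming).

The understemming index UI and overstemming index OI are the corresponding ratios. The concept groups and the two truncation stemmers are illustrative assumptions. The per-word error percentages reported for the Kemantney stemmer are a different, simpler count.

```python
from itertools import combinations
from typing import Callable


def paice_indices(groups: list[list[str]], stem: Callable[[str], str]) -> tuple[float, float]:
    """Return (UI, OI) by counting word pairs.

    UI = same-group pairs given different stems / all same-group pairs
    OI = cross-group pairs given the same stem / all cross-group pairs
    """
    labelled = [(word, g) for g, words in enumerate(groups) for word in words]
    desired_merges = unachieved_merges = desired_non_merges = wrong_merges = 0
    for (w1, g1), (w2, g2) in combinations(labelled, 2):
        same_stem = stem(w1) == stem(w2)
        if g1 == g2:                                  # pair should be conflated
            desired_merges += 1
            unachieved_merges += int(not same_stem)   # under-stemming error
        else:                                         # pair should stay apart
            desired_non_merges += 1
            wrong_merges += int(same_stem)            # over-stemming error
    ui = unachieved_merges / desired_merges if desired_merges else 0.0
    oi = wrong_merges / desired_non_merges if desired_non_merges else 0.0
    return ui, oi


GROUPS = [["connect", "connected", "connection"],
          ["policy", "policies"],
          ["police", "policeman"]]
print(paice_indices(GROUPS, lambda w: w[:4]))  # aggressive truncation
print(paice_indices(GROUPS, lambda w: w[:6]))  # lighter truncation
```

Truncating to four letters gives `(0.0, 0.25)`, because "policy" and "police" are wrongly merged. Truncating to six letters gives `(0.2, 0.0)`, because "policy" and "policies" are split apart. This is the usual trade-off between the two error types.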
John [58] proposed a suffix-removal and word-conflation method that follows the longest-match principle and has perhaps the most comprehensive list of English suffixes (with transformation rules), about 1200 entries. The suffixes are stored in reversed order and indexed by their length and last letter. The rules define whether a suffix that is found can be removed: for example, if the remaining part of the word is not shorter than N symbols, or if the suffix is preceded by a particular sequence of characters.
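The table organisation described above can be sketched as follows:

- Suffixes are stored reversed and indexed by (length, last letter).
- Candidate suffixes are tried from the longest to the shortest.
- Each suffix carries a minimum stem length and a condition on the letter that precedes it.

The six rules below are hypothetical examples, not entries from the original list of about 1200 suffixes.

```python
from collections import defaultdict

# Hypothetical rules: (suffix, minimum stem length >= 1, letters that must not precede it)
RULES = [
    ("ational", 3, ""),
    ("ation", 3, ""),
    ("ness", 3, ""),
    ("ing", 3, ""),
    ("ed", 3, ""),
    ("s", 2, "su"),  # keep the "s" of words like "glass" or "bonus"
]


def build_table(rules):
    """Index rules by (suffix length, last letter); suffixes are stored reversed."""
    table = defaultdict(list)
    for suffix, min_stem, forbidden in rules:
        table[(len(suffix), suffix[-1])].append((suffix[::-1], min_stem, forbidden))
    return table


TABLE = build_table(RULES)
MAX_LEN = max(len(suffix) for suffix, _, _ in RULES)


def longest_match_strip(word: str) -> str:
    """Remove the longest suffix whose conditions hold; otherwise return the word."""
    if not word:
        return word
    reversed_word = word[::-1]
    for length in range(min(MAX_LEN, len(word)), 0, -1):
        for rev_suffix, min_stem, forbidden in TABLE.get((length, word[-1]), []):
            if not reversed_word.startswith(rev_suffix):
                continue
            stem = word[:-length]
            if len(stem) >= min_stem and stem[-1] not in forbidden:
                return stem
    return word


for w in ("darkness", "information", "walking", "glass", "bonus", "cats", "sing"):
    print(w, "->", longest_match_strip(w))
```

Indexing by length and last letter means that only a handful of candidate suffixes are compared for each word. Because the suffixes are reversed, each check is a prefix test on the reversed word.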
Massimo and Nicola [62] proposed a novel statistical method for stemmer generation based on hidden Markov models (HMMs). It needs no prior linguistic knowledge and no manually created training set; instead, it uses unsupervised training that can be performed at indexing time. HMMs are finite-state automata whose transitions are defined by probability functions. Since the probability of each path can be computed, the most probable path in the automaton graph can be found. Each character of a word is associated with a state. The authors divided all possible states into two groups (roots and suffixes) and two categories: initial (roots only) and final (roots or suffixes). Transitions between states model the word-building process. For any given word, the most probable path from initial to final states yields the split point (a transition from roots to suffixes), and the sequence of characters before this point is taken as the stem.
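The split-point idea can be illustrated with a toy two-group model. Paths start in a root state, may cross once into suffix states, and never return to the root. Under these constraints the Viterbi search only has to track the best path ending in each group. All probabilities below are hand-set, hypothetical values: the root group emits every letter with probability 1/26, and the suffix group favours "i", "n" and "g". The method described above learns its parameters without supervision, which this sketch does not attempt.

```python
import math

# Hand-set, hypothetical parameters (not trained).
P_ROOT_TO_ROOT = 0.8    # stay inside the root part
P_ROOT_TO_SUFFIX = 0.2  # cross the split point; suffix states never return to root
SUFFIX_EMISSION = {"i": 0.3, "n": 0.3, "g": 0.3}  # remaining 0.1 shared by 23 letters


def root_emit(ch: str) -> float:
    return 1 / 26  # roots: every letter equally likely


def suffix_emit(ch: str) -> float:
    return SUFFIX_EMISSION.get(ch, 0.1 / 23)


def viterbi_split(word: str) -> tuple[str, str]:
    """Most probable (root, suffix) split of a lowercase word under the toy model."""
    if not word:
        return word, ""
    root_score = math.log(root_emit(word[0]))  # every path starts in a root state
    suffix_score = -math.inf                   # no path is in a suffix state yet
    split_at = len(word)
    for i in range(1, len(word)):
        # best way to be in a suffix state at position i: enter now or continue
        enter = root_score + math.log(P_ROOT_TO_SUFFIX)
        if enter >= suffix_score:
            suffix_score, split_at = enter, i
        suffix_score += math.log(suffix_emit(word[i]))
        root_score += math.log(P_ROOT_TO_ROOT) + math.log(root_emit(word[i]))
    if suffix_score > root_score:
        return word[:split_at], word[split_at:]
    return word, ""


print(viterbi_split("walking"))
print(viterbi_split("walked"))
```

With these parameters "walking" splits into `('walk', 'ing')`. "walked" is left whole, because the toy suffix group gives "e" and "d" such low probability. The split points are entirely a product of the emission probabilities.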
Xu and Croft [56] proposed an approach that corrects crude stemming results using the statistical properties of the corpus. The basic idea is to generate equivalence classes of words with a classical stemmer and then to separate some conflated words again based on their co-occurrence in the corpus. This also helps prevent well-known incorrect conflations made by Porter's algorithm, such as "policy"/"police", since these two words rarely co-occur.
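Xu and Croft's refinement can be sketched in three steps:

1. Form equivalence classes with a base stemmer.
2. Link two words of a class only if they co-occur often enough.
3. Take the connected components as the refined classes.

The five-character truncation stemmer, the document-level score n_ab / (n_a + n_b) and the threshold of 0.1 are simplifying assumptions. Xu and Croft's own association measure and text windows differ in detail.

```python
from collections import defaultdict
from itertools import combinations
from typing import Callable


def truncate5(word: str) -> str:
    """Crude base stemmer: keep the first five characters (over-conflates)."""
    return word[:5]


def _find(parent: dict[str, str], w: str) -> str:
    """Union-find root lookup with path halving."""
    while parent[w] != w:
        parent[w] = parent[parent[w]]
        w = parent[w]
    return w


def refine_classes(docs: list[list[str]], base_stem: Callable[[str], str],
                   threshold: float = 0.1) -> list[list[str]]:
    """Split base-stemmer classes: link two class members when
    n_ab / (n_a + n_b) >= threshold (document frequencies, shared documents)."""
    doc_sets = [set(d) for d in docs]
    vocab = set().union(*doc_sets)
    df = {w: sum(w in d for d in doc_sets) for w in vocab}
    classes = defaultdict(list)
    for w in sorted(vocab):
        classes[base_stem(w)].append(w)

    refined = []
    for members in classes.values():
        parent = {w: w for w in members}
        for a, b in combinations(members, 2):
            n_ab = sum(a in d and b in d for d in doc_sets)
            if n_ab / (df[a] + df[b]) >= threshold:
                parent[_find(parent, a)] = _find(parent, b)
        components = defaultdict(list)
        for w in members:
            components[_find(parent, w)].append(w)
        refined.extend(sorted(c) for c in components.values())
    return sorted(refined)


DOCS = [
    ["police", "arrested", "suspect"],
    ["police", "officer", "patrol"],
    ["new", "policy", "announced"],
    ["policy", "review"],
    ["connect", "cable", "connected"],
    ["connected", "network"],
    ["connect", "router"],
]
print([c for c in refine_classes(DOCS, truncate5) if c[0][:5] in {"polic", "conne"}])
```

The truncation stemmer puts "police" and "policy" into one class. They never share a document, so the refinement separates them, while "connect" and "connected" stay together.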
1.2. MOTIVATION
Stemming is a well-known area of information retrieval, and a stemmer can benefit systems that distribute the Kemantney language and culture and transmit them to other communities. Information retrieval in a multilingual world is the future of the information age. Since its invention, information retrieval has spread to many application domains worldwide, including language. Language is, among other things, a means of communicating information, and index terms are units of language used as tools for communicating factual information. This is the view of language, information and index terms taken in this study. An index term is an expression that describes the contents of a text and guides a user to the information. Stemming and related natural language processing tasks are greatly needed for fast, accurate and valid information sharing.
Information retrieval environments already include stemmers for many languages, and these tools capture the morphology of languages that communities use in their daily life. Therefore, technology-based language services prepared for a community should meet its expectations and have a positive impact on the quality of the services available to the people.
Accordingly, the motivation of the current work is to develop a stemming algorithm for Kemantney text. Such a stemmer can help the new generation of Kemantney people, and others, to understand the language more easily. It can also support quality analysis of the language's morphology and the transmission of that morphology from one generation to the next.
1.3. KEMANTNEY LANGUAGE AND ITS CLASSIFICATION
Ethiopia has 83 different languages and up to 200 different dialects. The Ethiopian languages are divided into three major groups: Cushitic, Omotic and Semitic languages [18].
The Semitic languages are spoken in northern, central and eastern Ethiopia (mainly in the Tigray, Amhara and Harar regions and the northern part of the Southern Peoples' State). They use the Ge'ez script, which is unique to the country and consists of 33 letters, each with 7 orders, making a total of 231 characters. Languages that use the Ge'ez script include Adarigna, Amharigna, Argobba, Biral, Gafat, Ge'ez, Guragigna, the Chaha group (Chaha, Muher, Ezha, Gumer, Gura), the Inor group (Inor, Enner, Endegegna, Gyeto, Mesemes), the Silt'e group (Silt'e, Ulbareg, Enneqor, Walane), the Soddo group (Soddo, Gogot, Galila), Tigrigna and Zay [11].
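Each Ge'ez character encodes both a consonant and a vowel order. A stemmer for Ge'ez-script text may therefore need to separate the two, because when a suffix vowel fuses with the last consonant of a stem, the morpheme boundary falls inside a single character. Unicode arranges the Ethiopic block, which starts at U+1200, in rows of eight code points per consonant, with the seven orders in the first seven positions. The order of a character can therefore be computed arithmetically. The sketch assumes this regular layout. The Unicode block also contains consonants beyond the 33 basic letters and some irregular rows (such as the labialised series), which the sketch does not validate.

```python
ETHIOPIC_START = 0x1200  # first code point of the Unicode Ethiopic block ('ሀ')
SYLLABLE_END = 0x158     # offset past the last full row of syllables handled here


def fidel_parts(ch: str) -> tuple[int, int]:
    """Return (consonant row, vowel order 1-7) for a regular Ethiopic syllable.

    Assumes the regular layout: 8 code points per consonant, orders 1-7 in
    positions 0-6. Irregular rows are not validated.
    """
    offset = ord(ch) - ETHIOPIC_START
    if not 0 <= offset < SYLLABLE_END or offset % 8 == 7:
        raise ValueError(f"{ch!r} is not a basic seven-order syllable")
    return offset // 8, offset % 8 + 1


def to_order(ch: str, order: int) -> str:
    """Return the character with the same consonant as ch but the given order."""
    if not 1 <= order <= 7:
        raise ValueError("order must be between 1 and 7")
    row, _ = fidel_parts(ch)
    return chr(ETHIOPIC_START + row * 8 + order - 1)


print(fidel_parts("ሀ"), fidel_parts("ህ"), fidel_parts("ኘ"), fidel_parts("ኙ"))
print(to_order("ና", 6))
```

For example, `to_order("ና", 6)` returns the sixth-order form "ን" of the same consonant. A stemmer could use this to rewrite a stem-final syllable after its suffix vowel has been removed.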
The Omotic languages are predominantly spoken between the lakes of the southern Rift Valley and the Omo River. They include Anfillo, Ari, Bambassi, Basketto, Bench, Boro, Chara, Dime, Dizzi, Dorze, Gamo-Gofa, Ganza, Hammer-Banna, Hozo, Kachama-Ganjule, Kara, Kefa, Kore, Male, Melo, Mocha, Nayi, Oyda, Shakacho, Sheko, Welaytta (Welamo), Yemsa and Zayse-Zergulla. The largest ethnic and linguistic groups in Ethiopia are the Oromos, Amharas and Tigrayans [18].
The Cushitic languages are mostly spoken in central, southern and eastern Ethiopia (mainly in the Afar, Oromia and Somali regions). Cushitic languages are written in either the Roman alphabet or the Ge'ez script. For example, Oromo and Somali are written in the Roman alphabet, whereas Kemantney texts, such as those used in this study, are written in the Ge'ez script. Cushitic languages include Afarigna, Agewigna, Alaba, Arbore, Awngi, Baiso, Burji, Bussa, Daasanech, Gawwada, Gedeo, Hadiyya, Kambatta, Kemantney, Konso, Kunfal, Libido, Oromigna, Saho, Sidamigna, Somaligna, Tsamai, Werize and Xamtanga [18].
Kemantney is the original language of the Kemantney people of the Semien Gondar Zone, Ethiopia. The Kemantney people speak a dialect of “Agew”, a Cushitic language. According to Gamst [17], all “Agew” speak mutually intelligible dialects of an “Agew language”, and almost all “Agew” are bilingual speakers of Amharic and/or Tigrigna. However, according to the well-attested linguistic literature on the Cushitic language family, Kemantney is an independent language of the “Central Cushitic” or “Agew” family.
Kemantney (Western Agew) and its sister languages together constitute the Central Cushitic language family, traditionally called Agew (a classification based on morphological criteria). The sister languages are Bilen of Eritrea (Northern Agew); Xamt'anga of Wollo (Eastern Agew), spoken in the Wag Ximra zone of northern Ethiopia; and Awngi of Gojjam (Southern Agew). The Awngi language is spoken mainly in the Awngi zone and in different districts of the Metekel zone, together with Gumuz and Shinasha. Bilen (from the self-name for the language, bəlin) is spoken in Eritrea. It is the northernmost of the Agaw (or Central Cushitic) family of languages, whose other members are spoken entirely in Ethiopia. Like all the Agaw languages, Bilin has an extremely complex morphology [52].
1.4. STATEMENT OF THE PROBLEM
According to the 1994 Population and Housing Census of Ethiopia, the ethnic Kemantney population was 172,327, roughly eight times Gamst's [17] estimate of 20,000 to 25,000. Of this population, 1,625 are first-language speakers and 3,450 are second-language speakers, giving a total of 5,075 Kemantney speakers [5]. All speakers of the language are older than 30 years, and more than 75% are older than 50 years [5]. In September 2000 E.C. (Ethiopian calendar), the Chilga Woreda Notice Office reported in its newspaper that 10,000 people speak the language. Both figures represent only a small fraction of the ethnic population; the current study takes the latest figure.
This situation shows that Kemantney is a highly endangered language. It is spoken by a small and elderly fraction of the Kemantney people in northern Ethiopia, mainly in the Semien Gondar Zone. Its area extends from Chilga woreda to Kirakir, north of Lake Tana, and includes the woredas of Lay Armachiho, Qwara, Dembiya, Gondar City, Gondar Zuria, Metemma and Wogera. The language belongs to the western subsection of the Agaw or Central Cushitic languages.
The language has a person-marking system that distinguishes first person singular and plural; second person singular, polite and plural; and third person masculine, feminine and plural. In verbs, all inflectional categories are marked by suffixes [1]. Zelealem identifies three aspect forms in Kemantney: perfective, imperfective and progressive.
Surafel [63] notes that text is the main form used for communicating information and knowledge. Technically speaking, text is any string of language, usually one that is more than one sentence long. Text is composed of symbols from a finite alphabet. Text is created everywhere, in many forms (paper and electronic) and in many languages. Every language has its own grammatical rules for expressing concepts by constructing statements from words. Depending on tense, gender and number, the same word can be inflected with affixes (prefixes, suffixes and infixes) to produce word variants.
Stemming algorithms, or stemmers, are used to group words based on semantic similarity. There are several types of stemming algorithms, of which affix removal algorithms are the most common. Affix removal algorithms strip extra characters attached at the end of a word (suffixes) or at its beginning (prefixes), producing a root form called a stem that often closely approximates the root morpheme of the word. Stemming algorithms are used in many types of language processing and text analysis systems and are widely used in information retrieval. The main purpose of stemming is to reduce the different grammatical forms of a word (noun, adjective, verb, adverb, etc.) to its root form.
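To make the idea concrete, the sketch below strips at most one prefix and one suffix from a word. It is a minimal illustration only, not the Kemantney stemmer developed in this thesis: the affix lists PREFIXES and SUFFIXES are hypothetical English examples, and the three-letter minimum stem length is an assumed guard against over-stripping.

```python
# Hypothetical English affix lists, for illustration only (not Kemantney affixes).
PREFIXES = ("re", "un")
SUFFIXES = ("ing", "ed", "s")


def strip_affixes(word, prefixes=PREFIXES, suffixes=SUFFIXES, min_stem=3):
    """Remove at most one prefix and one suffix, keeping at least min_stem letters."""
    for prefix in prefixes:
        if word.startswith(prefix) and len(word) - len(prefix) >= min_stem:
            word = word[len(prefix):]
            break  # only one prefix is removed
    for suffix in suffixes:
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem:
            word = word[: -len(suffix)]
            break  # only one suffix is removed
    return word


print(strip_affixes("reconnected"))  # connect
print(strip_affixes("unlocking"))    # lock
print(strip_affixes("red"))          # red
```

Without the length check, a short word such as "red" would lose its "re" or "ed" and be reduced to a single letter.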
Alemayehu and Willett [3] note that stemming plays an important role in identifying a word stem from a full word by removing inflectional and derivational affixes, and there has thus been much interest in algorithms for this purpose. This interest is likely to increase further as more types of text-processing application become widely important.
The history and grammatical structure of the Kemantney language have been studied by several researchers.
The Kemant are an ethnic group (or, to use Gamst’s term, a ‘Caucasoid people’) residing in the North Gondar Zone of the Amhara Region; they speak a Cushitic language and practice a ‘Pagan-Hebraic’ religion. According to the anthropologist Gamst, the Kemant belong to the Agaw branch [4].
“Both Agew and Kemant peoples conduct plough agriculture and are difficult to distinguish from their Amhara neighbors in their everyday life.” During the second half of the 19th century, Stern described the Kemant: “They are, as a body, an industrious, energetic, and active race, residing in districts where they have fine pasture for their cattle, and fertile soil to reward their field labor” [4].
The population of the Kemant, according to the 1984 Population and Housing Census, was 169,169. According to the 1994 Census, it rose to 172,327. However, this number may not have reflected the exact size of the population, because at that time conditions were not yet favorable for people to identify themselves as Kemant. In both censuses, the Kemant ranked eighteenth among the ethnic groups of Ethiopia. Although there may have been discrepancies between the census figures and the actual population size, the Kemant had always been recognized as a distinct people [4].
Paradoxically, the Transitional Government of Ethiopia did not include the Kemant among the
ethnic groups eligible to establish national/regional self-government. This law was essential during
the transitional period and was a prelude to the federal constitution [4].
Unfortunately, and to the disadvantage of the Kemant, the 2007 Population and Housing Census did not count them as a separate ethnic group, although it counted eighty-five ethnic groups. Although no official census figures for the Kemant exist at present, the group of Kemant struggling for recognition and self-determination estimates the population to be well over 900,000, which would rank them 12th in population size among Ethiopian ethnic groups [4].
Although language and religion are the main elements of ethnicity, the Kemant place more emphasis on their common ancestry. Because of their ancestry, they are distinct from others, for “Traditional Kemant believed they had ‘pure blood’ and that their ancestors had been ‘white’ or ‘pure’ (chewa)” [4].
Some Kemant claim that the Kemant would have been recognized and would have enjoyed self-determination had they lived in a region other than Amhara, which is dominated by the Amhara people, who held political, economic, cultural, linguistic and religious dominance over the entire country. They argue that this hegemony continues to be exercised over the Kemant people in particular. As evidence, they point out that the Amhara Regional State was not willing to include the Kemant in the list of ethnic groups of the region that it submitted to the Central Statistics Office in 2007, whereas the Southern Nations, Nationalities and Peoples Regional State had included ethnic groups of its region that are not represented in the House of Federation [4].
The Kemant are considered Amhara, although they are actually a distinct people. The Kemant have been an ethnically, religiously and linguistically distinct people for a long time, and because of this distinction they have been victims of stigma, exclusion and marginalization for the last seven centuries [4].
The Kemant have their own language, Kemantney, which places them within the Agaw language group. Although it was once widely spoken in the North Gondar region, very few of the population speak it today. The reasons for the decline in the number of Kemantney speakers in favor of Amharic include the following: primary schools were (and are) run only in Amharic; Kemantney was associated with the ‘stigmatized’ Kemant identity because of the traditional religion, Hege-Lebona (literally ‘belief in the heart’), not only by the dominant Amhara but also by Kemant converts; and, to find jobs and other opportunities in government offices, it was necessary to avoid this stigma by speaking Amharic and following Christianity. As was the case with other ethnic groups of the country, “for non-Amhara the learning of Amharic and the adoption of Amhara culture, tradition and religion were necessary steps to develop a career within the state administration” [4].
In other words, the negative attitude towards Hege-Lebona had negative implications for the Kemant language (Kemantney) [4].
Tourny [64] notes that the Kemant Agaw people are considered the original inhabitants of central-northern Ethiopia. Living in the Gondar area, the historical ‘Kemantland’ of Gamst [17], they have been progressively, then massively, Christianized and Amharized over the last century. Nowadays, fewer than one percent of the 170,000 Kemant people (1998 census) have preserved their ancestral language and beliefs. Tourny’s own observations during three field trips in the Gondar area (1999, 2002 and 2007) confirm Zelealem’s assessment [2].
The use of the Ge’ez term Kedus to designate a prayer used in both rituals, as well as the sharing of bread, should also be questioned [64].
In the Kemant language, sowasu is the term for ‘ritual music’. Listening to (and watching) different audio (and video) recordings of Kedassie reveals some general elements [64].
Nowadays, one of the main difficulties in doing fieldwork among the Kemant is the problem of language. Fewer and fewer people speak Kemantney (Afro-Asiatic, Cushitic, Central, Western): 1) there are probably no more monolingual speakers of Kemant; 2) very few are Kemant-Amharic bilinguals; 3) all, including the priests, are native Amharic speakers [2]. Under such conditions, the translations given of the prayers are often rough, because the priests do not know the exact Amharic equivalents or do not know the meaning of the archaic terms used in the ritual. Besides the ageing of the population concerned, another problem is the lack of tangible structures strong enough to ensure the survival of the tradition. The priests’ lapses of memory and contradictions can be explained by the weak transmission of the tradition between generations and the absence of written sources [64].
The Kemant are the original inhabitants of north-central Ethiopia [17]. Their historical land stretched from north of Lake Tana, the source of the Abay River (Blue Nile), to the north-western rural areas around Gonder town. Chilga, Metema and Lay Armachiho were the historic places [65].
Since the mid-1950s, they have migrated to areas inhabited by Amharas and established settlements there. Nowadays, the Kemant reside in the highlands north and north-west of Gonder town. The deputy chairperson of the Interim Committee stated that the Kemant inhabit eight contiguous woredas in North Gonder Zone: Quara, Chilga, Lay Armachiho, Dembiya and Metemma, portions of Wogera, and parts of Gondar and Gonder Zuria Woreda [65].
According to the 1984 and 1994 Ethiopian Population and Housing Censuses, the total population of the Kemant was 169,169 and 172,327, ranking 17th and 10th in population size respectively. The census reports were, however, unlikely to reflect the exact population size of the time or any demographic transition [65].
At present, precise population figures are difficult to determine, because the Kemant were not counted in the latest national census of 2007. Unofficial estimates range from 300,000 to 600,000 to 1 million. These numbers must be viewed critically, because political elites might overestimate them to attract public attention. Thus, the population size remains a major political and academic issue [65].
To the researcher’s knowledge, no research has been conducted to design a stemmer for the Kemantney language.
As noted above, Kemantney is a highly endangered language spoken by a small and elderly fraction of the Kemant people in northern Ethiopia, mainly in the Semien Gondar Zone. There is a need to promote the language so that it gains wide acceptance in the community. One means of doing so is to automate the processing of the language with the help of natural language processing.
This research therefore aims to develop a stemming algorithm that conflates morphological
variants of words in Kemantney text so as to initiate further research towards natural language
processing and information retrieval.
To this end, the following research questions are investigated and answered.
• What are the properties of words and the word formation processes in the Kemantney language?
• How can a suitable stemming algorithm be designed for conflating Kemantney word variants into their stems?
• What level of performance does the stemmer developed for Kemantney achieve?
1.5. OBJECTIVE OF THE STUDY
The general objective and specific objectives of the current study are presented as follows:
1.5.1. GENERAL OBJECTIVE
The general objective of this research is to identify the inflectional and derivational morphology of the Kemantney language and to develop a stemmer for the language that identifies the root/stem of word variants.
1.5.2. SPECIFIC OBJECTIVES
The specific objectives of the research are:
• To review different stemming algorithms that have been developed for other languages
• To identify the properties of words in the Kemantney language in order to become familiar with the different aspects of the language
• To select and prepare Kemantney text for the experiment
• To select a suitable algorithm for designing the Kemantney stemmer
• To develop a prototype stemmer for Kemantney text
• To evaluate the performance of the prototype stemmer
1.6. SCOPE AND LIMITATION OF THE STUDY
In this study, a stemming algorithm was developed for Kemantney that conflates only inflectional morphology, with specific emphasis on the suffixes attached to word variants. A rule-based, iterative stemmer is used to remove the affixes of Kemantney words. The study focuses on suffixes because prefixes are rarely observed in the language, whereas suffixes are the most commonly occurring affixes in Kemantney text. Kemantney nouns are not handled in this research, because their compound and irregular forms require a more advanced stemming algorithm. Hence, a detailed analysis of the morphology of the language (including the formation of inflectional and some derivational forms) is necessary to improve the performance of the stemmer. Selection of text is, therefore, an important component in developing a stemmer. For the purpose of this research, text representative of the language was used: sample text from different disciplines was collected from a textbook, a PhD thesis on the Kemantney language, manuals and three newspapers.
The main limitation of the study is the absence of a standard text corpus for experimentation, which prevents a comprehensive evaluation of the Kemantney stemmer developed in this study.
1.7. METHODOLOGY
This section describes the methodology followed to develop the stemming algorithm for the Kemantney language.
1.7.1. LITERATURE REVIEW
Because studying the language is an important component in developing the stemmer, a literature survey was made to gather information and to understand the subject. Furthermore, knowledgeable individuals, especially from the Agaw (Central Cushitic) branch of Ethiopian languages, were consulted on the stemming of Kemantney text.
1.7.2. DATA SOURCES
A large text corpus is one of the main input resources in IR research. Corpus linguistics is a growing discipline that applies analytical results on the morphological behavior of language and on collocation, a language feature that occurs when particular words are used together for purely grammatical reasons. According to Blair [12], the process of representing texts for retrieval is fundamentally a linguistic process, and the problem of describing texts for retrieval is, first and foremost, a problem of how language is used. Thus, any theory of text representation presupposes a theory of language and meaning. Selection of text is, therefore, an important component in developing a stemmer. To study the Kemantney language, the following sources were collected: the sociolinguistic and grammatical study of language replacement [1]; the sociolinguistic survey report of the Kemant (Qimant) language of Ethiopia [2]; the Qemant Agew of Ethiopia: a study in culture change of a Pagan Hebraic culture [17]; ‘Kedassie’: A Kemant (Ethiopian Agaw) ritual [64]; Kemant (ness): the quest for identity and autonomy in Ethiopian federal polity [65]; Kemant nationality identity and self-rule question, a request letter to the House of Federation [66]; a manual document; and Internet sources. These soft-copy documents, together with How to transfer from one generation to another generation, were used as a representative sample for studying the morphological behavior of the language and compiling the Kemantney text used to develop the stemmer.
1.7.3. EXPERIMENTATION METHOD
After a detailed study of the language’s morphology, a stemmer based on a hybrid approach was developed using the sample text. First, a list of affixes was compiled; the characteristics of the affixes were then used to guide the development of the stemmer. Of the approaches considered, the iterative and rule-based approaches were selected. Python 3.4.1 was used to develop the prototype. Python is a programming language that lets you work more quickly and integrate your systems more effectively. It was selected because it has good string-manipulation capabilities and support for documentation strings (docstrings), is a multi-paradigm language, is easy to learn and powerful, and offers high-level data structures and a simple but effective approach to object-oriented programming [60].
1.7.4. TESTING PROCEDURE
Kemantney has both inflectional and derivational morphology, so stemming involves stripping both inflectional and derivational suffixes. The algorithm was developed by analyzing the morphological and word formation rules of the language. The stemmer was evaluated for correctness, using manual error counting, and for compression, using the dictionary reduction method.
The stemmer was tested on a relatively small text collection compiled from the sources described above. Manual error counting was found to be appropriate and was used to test the stemmer: under-stemming and over-stemming errors in the output were counted by hand, and these counts were used to compare the number of words that were not conflated correctly with the number that were. The accuracy of the stemmer in stripping suffixes was then determined by qualitative analysis of its output.
The dictionary reduction method was also used to test the compression power of the stemmer. This shows that using a stemmer for Kemantney brings a significant reduction in dictionary size as a result of conflating variant words to the same stem. The evaluation of the final stemmer reveals a significant difference between stemming and non-stemming for Kemantney in terms of word compression.
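The dictionary reduction measure can be sketched as follows. This is an illustration under assumptions, not the evaluation script of this thesis: the percentage reduction is taken here as (distinct words minus distinct stems) divided by distinct words, and toy_stem is a hypothetical English suffix stripper standing in for the Kemantney stemmer.

```python
def toy_stem(word):
    """Hypothetical stand-in stemmer: strips one English suffix (illustration only)."""
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def dictionary_reduction(words, stem):
    """Return (distinct words, distinct stems, percentage reduction in dictionary size)."""
    vocabulary = set(words)
    stems = {stem(word) for word in vocabulary}
    reduction = 100.0 * (len(vocabulary) - len(stems)) / len(vocabulary)
    return len(vocabulary), len(stems), reduction


text = "connect connected connecting connects run connected"
print(dictionary_reduction(text.split(), toy_stem))  # (5, 2, 60.0)
```

Here five distinct word forms collapse to two stems, a 60% reduction in dictionary size; the same count applied to the Kemantney test collection gives the compression figure.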
1.8. SIGNIFICANCE OF THE STUDY
This research work is done in partial fulfillment of the requirements for the degree of Master of Science in Information Technology. Stemming is well known in the Natural Language Processing (NLP), Information Retrieval (IR) and Text Mining (TM) research areas as an essential preprocessing step for tasks such as text and document retrieval, document clustering, classification, information extraction and other content-related applications. Stemming also benefits an IR system: better recall can be achieved, since query words are matched with their variants in the documents, and stemming decreases the size of the overall term vocabulary, which leads to significant efficiency benefits in speed and memory requirements due to the decreased size of the term index and the lower dimensionality of term vectors [25].
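The recall and vocabulary effects described above can be illustrated with a tiny inverted index keyed by stems. The sketch is an assumption-laden illustration: toy_stem is a hypothetical English suffix stripper, and the three documents are invented examples.

```python
from collections import defaultdict


def toy_stem(word):
    """Hypothetical stand-in stemmer: strips one English suffix (illustration only)."""
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def no_stem(word):
    """Identity function: index the surface forms unchanged."""
    return word


def build_index(docs, stem):
    """Map each (stemmed) term to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in text.lower().split():
            index[stem(token)].add(doc_id)
    return index


def search(index, query, stem):
    """Return ids of documents containing any (stemmed) query term."""
    hits = set()
    for token in query.lower().split():
        hits |= index.get(stem(token), set())
    return hits


docs = {1: "connected networks", 2: "connecting people", 3: "running fast"}
stemmed, plain = build_index(docs, toy_stem), build_index(docs, no_stem)
print(search(stemmed, "connects", toy_stem), len(stemmed))  # {1, 2} 5
print(search(plain, "connects", no_stem), len(plain))       # set() 6
```

The stemmed index has fewer terms and matches the query "connects" to both "connected" and "connecting", whereas the unstemmed index finds nothing. Note that "running" becomes "runn", a non-word stem, which is acceptable because only consistent conflation matters.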
The study also helps the researcher to keep pace with global technological change, to help reduce existing problems, to apply what has been learnt in theory and practice, and to become familiar with real-life problems and gain experience with technology, while the document can serve as reference material for future work. Stemming algorithms have the advantage of reducing the corpus size, thus making information delivery faster. In addition to being an academic exercise to fulfill the requirements of the program, this research is expected to produce results that show the applicability of stemming algorithms to the Kemantney language. The results of the research can be used as input to the development of computer-based techniques for Kemantney text. The output of this thesis can also be used as a starting point for further investigation into the development of automatic text-processing systems for the Kemantney language.
1.9. ORGANIZATION OF THE THESIS
This thesis is divided into five chapters. The first chapter is an introduction to the research environment. It presents the importance of stemming in an IR environment and the need to develop a stemming algorithm for conflating variants of a word in the Kemantney language. The Kemantney language and its classification, the statement of the problem and its justification, the significance of the study, the methodology, and the scope and limitation of the study are also discussed in this chapter.
Chapter two gives an overview of stemming and reviews stemming techniques in general and stemming algorithms in particular. Approaches to stemming and types of stemmers are discussed, and stemming algorithms developed for other languages are also reviewed.
Kemantney morphology is reviewed in chapter three. The inflectional and some derivational morphology of the language are the main concerns of this chapter. Word formation processes for Kemantney nouns, pronouns, adjectives, conjunctions, adverbs and verbs are also presented in detail.
The implementation of the stemming algorithm for Kemantney text and the experimental results are discussed in the fourth chapter. The compiled affix (suffix) list is presented in this chapter, together with the approach employed to develop the stemmer and the reasons for its selection.
The last chapter, chapter five, presents conclusions drawn from the findings and recommendations for future research.
CHAPTER TWO
REVIEW OF RELATED LITERATURE
Stemming is the process of reducing words to their grammatical root form [40], [56]. It is used only to identify morphological variants. Stemming algorithms are language dependent, and stemmers have been developed for a number of languages in the world. They have proven successful in reducing words with the same stem to a common form, as evidenced by the work of many researchers [25].
2.1. OVERVIEW OF STEMMING
Conflation, defined as the act of fusing or combining, is the general term for the process of matching morphological term variants. It maps term variants to a single form, usually a unique well-formed root for each word [28].
Conflation is an important facility in any text retrieval system, because it allows the system to find words in the database that are not identical to, but match, the vocabulary that the user used [48]. The most common example of a conflation technique is a stemming algorithm [26], [28], which handles only morphological relationships between words. Many conflation techniques have been described for handling particular types of word variant, and an appropriate conflation procedure must be chosen according to the type of word variants to be processed. Morphological variants are conflated by the use of a stemming algorithm. Such procedures require either removal of the longest matching suffix in a single pass or iterative removal of suffixes, together with a detailed specification of the removal rules [26], [30], [38], [48].
Stemming algorithms range from very simple procedures that just remove plurals and past and present participles to very complex techniques that include all morphological rules. Stemming is the process of reducing inflected or derived words to their stem, base or root form. The stem need not be identical to the morphological root of the word. It is usually sufficient that related words map to the same stem, even if this stem is not in itself a valid root.
Stemming has long been addressed as a problem in information retrieval. The process of stemming, often called conflation, is useful in search engines for query
expansion or indexing and other natural language processing problems. In information retrieval,
the relationship between a query and a document is determined primarily by the number and
frequency of terms which they have in common. Unfortunately, words have many morphological
variants which will not be recognized by term-matching algorithms without some form of natural
language processing. In most cases, these variants have similar semantic interpretations and can
be treated as equivalent for information retrieval (as opposed to linguistic) applications.
Therefore, a number of stemming or conflation algorithms have been developed for IR in order to
reduce morphological variants to their root form. Stemming programs are commonly referred to
as stemming algorithms or stemmers [16].
2.2. STEMMING TECHNIQUES
Stemming is a computational procedure that identifies word variants and reduces them to a single canonical form [26]. Conflation of morphological variants can be accomplished by either manual or automated approaches [25]. Manual conflation is usually performed during the search by truncating letters from the right-hand end of a word. In the manual technique, a human being, who decides the correct stem for each word, performs the evaluation process. It is performed on query words but not on documents. During truncation, two things can happen: over-truncation and under-truncation. Over-truncation occurs when too short a stem remains after truncation and may result in totally unrelated words being truncated to the same stem; under-truncation, on the other hand, arises if too short a suffix is removed and may result in related terms being described by different stems, as with 'computers' being truncated to 'computer', rather than 'comput*' (which would also include words such as 'computing' and 'computational') [39].
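Right-hand truncation can be sketched as prefix matching against the index vocabulary. In this minimal illustration the vocabulary is invented and truncation_search is a hypothetical helper; it shows under-truncation ('computer*'), the intended truncation ('comput*') and over-truncation ('comp*') for the example above.

```python
def truncation_search(pattern, vocabulary):
    """Return the vocabulary terms matched by a right-truncated pattern such as 'comput*'."""
    if pattern.endswith("*"):
        stem = pattern[:-1]
        return sorted(term for term in vocabulary if term.startswith(stem))
    return sorted(term for term in vocabulary if term == pattern)  # exact match


vocabulary = ["company", "compass", "compute", "computer", "computers",
              "computing", "computational"]

print(truncation_search("computer*", vocabulary))  # under-truncation: misses 'computing'
print(truncation_search("comput*", vocabulary))    # all five 'comput' forms
print(truncation_search("comp*", vocabulary))      # over-truncation: adds 'company', 'compass'
```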
Automatic conflation, on the other hand, is effected by means of a stemming algorithm. Such algorithms are usually designed to handle morphological variants within a language. A stemming algorithm is an automated procedure that reduces words with the same stem to a common form, usually by removing derivational and inflectional suffixes from each word [16].
As presented in Figure 2.1, Frakes [28] distinguished four types of stemming algorithms: affix
removal, table lookup, successor variety, and n-grams.
Figure 2.1: Approaches for designing a stemmer for a given language [21]
There are several criteria for judging stemmers: correctness, retrieval effectiveness and compression performance. There are two ways in which stemming can be incorrect: over-stemming and under-stemming. When a term is over-stemmed, too much of it is removed. Over-stemming can cause unrelated terms to be conflated; its effect on IR performance is the retrieval of non-relevant documents. For example, the terms ‘legal’ and ‘legging’ are derived from two unrelated roots, but due to over-stemming both may be stemmed to ‘leg’, which may yield incorrect results. Under-stemming is the removal of too little of a term, and it prevents related terms from being conflated. For example, the terms ‘absorption’ and ‘absorbing’ are derived from the same root word ‘absorb’, but due to under-stemming they may not be reduced to the same stem. The effect of under-stemming on IR performance is that relevant documents will not be retrieved. Stemmers can also be judged on their retrieval effectiveness, usually measured with recall (how many of the relevant documents are retrieved) and precision (how many of the retrieved documents are relevant), and on their speed, size and so on. Finally, they can be rated on their compression performance. Stemmers for IR are not usually judged on the basis of linguistic correctness, though the stems they produce are usually very similar to root morphemes, as described below.
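One way to make over- and under-stemming countable is to compare the stemmer's output with hand-assigned concept groups, pair by pair. The sketch below assumes such a gold grouping (the group labels are hypothetical) and counts a pair as under-stemmed when the two words share a group but receive different stems, and as over-stemmed when they receive the same stem but belong to different groups. It reuses the ‘legal’/‘legging’ and ‘absorption’/‘absorbing’ examples; it is a pairwise sketch, not the manual counting procedure of this thesis.

```python
from itertools import combinations


def count_stemming_errors(gold, stems):
    """Count (under-stemmed pairs, over-stemmed pairs).

    gold:  word -> concept group label, assigned by hand
    stems: word -> stem produced by the stemmer
    """
    under = over = 0
    for a, b in combinations(sorted(gold), 2):
        same_group = gold[a] == gold[b]
        same_stem = stems[a] == stems[b]
        if same_group and not same_stem:
            under += 1  # related words left apart
        elif same_stem and not same_group:
            over += 1   # unrelated words conflated
    return under, over


gold = {"legal": "law", "legging": "leg", "absorption": "absorb", "absorbing": "absorb"}
stems = {"legal": "leg", "legging": "leg", "absorption": "absorpt", "absorbing": "absorb"}
print(count_stemming_errors(gold, stems))  # (1, 1)
```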
AFFIX REMOVAL METHOD
Affix removal methods (the strategy followed in this research work) remove suffixes from words so as to convert them into a common stem form. Most of the stemmers currently in use follow this approach to conflation. Savoy [30] further notes that the design of such stemming procedures is based mainly on one of two principles (or manners of operation): iteration or longest match.
An iterative stemming algorithm is based on the fact that suffixes are attached to stems one after the other. It is a recursive procedure that removes the suffixes one at a time, starting at the end of a word and working towards its beginning. For instance, a word such as willingness might have “-ness” removed in the first iteration and “-ing” in the second [36], [39].
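The willingness example can be reproduced with a short loop that removes one suffix per pass until no suffix in the list matches. The suffix list and the three-letter minimum stem length are illustrative assumptions, not the Kemantney rules developed in this thesis.

```python
SUFFIXES = ("ness", "ing", "ly", "s")  # hypothetical English suffix list


def iterative_stem(word, suffixes=SUFFIXES, min_stem=3):
    """Remove one matching suffix per pass; stop when no suffix can be removed.

    Returns the stem and the suffixes removed, in the order they were stripped.
    """
    removed = []
    changed = True
    while changed:
        changed = False
        for suffix in suffixes:
            if word.endswith(suffix) and len(word) - len(suffix) >= min_stem:
                word = word[: -len(suffix)]
                removed.append(suffix)
                changed = True
                break  # start a new pass on the shortened word
    return word, removed


print(iterative_stem("willingness"))  # ('will', ['ness', 'ing'])
```

Each pass works on the output of the previous one, so stacked suffixes are peeled off from the end of the word inwards without having to list their combinations.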
In longest-match algorithms, when more than one suffix matches the end of the word, the longest one is removed. This requires, however, the compilation of all possible suffixes. To reduce programming complexity, the list of suffixes is sorted in decreasing order of length. The procedure then scans through the suffix list in order of decreasing length: the longer endings are scanned first, and if no match is found, the shorter ones are scanned. Longest-match algorithms are often easier to program but require a much longer suffix dictionary, since frequent combinations of short suffixes must be included. The Lovins stemmer is one of the best examples of a longest-match stemming algorithm [36], [39].
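A longest-match pass can be sketched as below. The suffix list is a hypothetical English example; note that the combined ending "ingness" has to be listed explicitly, which is exactly why longest-match dictionaries grow large. The two-letter minimum stem is an assumed guard, and the full Lovins rule set is not reproduced here.

```python
SUFFIXES = ["s", "ing", "ness", "ingness"]  # hypothetical; combined endings listed explicitly


def longest_match_stem(word, suffixes=SUFFIXES, min_stem=2):
    """Remove the single longest suffix that matches, in one pass."""
    for suffix in sorted(suffixes, key=len, reverse=True):  # longest endings first
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem:
            return word[: -len(suffix)]
    return word  # no suffix matched


print(longest_match_stem("willingness"))  # will
print(longest_match_stem("readings"))     # reading (only one suffix is removed)
```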
SUCCESSOR VARIETY METHOD
Successor variety stemmers use the frequencies of letter sequences in a body of text as the basis
of stemming. Successor variety stemming is done based on the determination of morpheme
boundaries, uses knowledge from structural linguistics, and is more complex than affix removal
stemming algorithms. The successor variety algorithm has the advantage of not requiring affix
removal rules that are based on the morphological structure of a language. However, the
effectiveness of this algorithm depends on the corpus and on threshold values used in word
segmentation [20].
Two criteria are used to evaluate segmentation methods: the number of correct segment cuts divided by the total number of cuts, and the number of correct segment cuts divided by the total number of true boundaries [20].
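These two criteria are the precision and recall of the cut positions. A minimal sketch, assuming cuts are represented as sets of character offsets (the boundary offsets for "willingness" below are illustrative):

```python
def cut_scores(predicted, true):
    """Score predicted cut positions against true morpheme boundaries.

    precision = correct cuts / all predicted cuts
    recall    = correct cuts / all true boundaries
    """
    correct = len(predicted & true)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(true) if true else 0.0
    return precision, recall


true_cuts = {4, 7}  # will|ing|ness
print(cut_scores({4, 9}, true_cuts))  # (0.5, 0.5)
print(cut_scores({4}, true_cuts))     # (1.0, 0.5)
```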
After segmenting, if the first segment occurs in more than 12 words in the corpus, it is probably a
prefix [20].
The successor variety of substrings of a term will decrease as more characters are added until a segment boundary is reached. The stemming method based on this observation uses letters in place of phonemes, and a body of text in place of phonemically transcribed words. It scans the word to be stemmed and finds the cut point where the successor variety increases sharply. This information is used to identify stems.
In less formal terms, the successor variety of a string is the number of different characters that follow it in words in some body of text. Consider, for example, a body of text consisting of the following words:
back, beach, body, backward, boy
To determine the successor varieties for "battle," the following process would be used. The first letter of battle is "b." "b" is followed in the text body by three characters: "a," "e," and "o." Thus, the successor variety of "b" is three. The next successor variety for battle would be one, since only "c" follows "ba" in the text. In practice, this process is carried out using a large body of text [20].
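The battle example can be checked with a few lines of code. In this sketch, which is an illustration rather than the Hafer and Weiss implementation, a word that ends exactly at the prefix contributes no successor (some formulations count an end-of-word marker as an extra successor).

```python
def successor_variety(prefix, corpus):
    """Number of distinct letters that follow `prefix` in the corpus words."""
    return len({word[len(prefix)] for word in corpus
                if word.startswith(prefix) and len(word) > len(prefix)})


def successor_varieties(word, corpus):
    """Successor variety of every prefix of `word`, from length 1 upwards."""
    return [successor_variety(word[:i], corpus) for i in range(1, len(word) + 1)]


corpus = ["back", "beach", "body", "backward", "boy"]
print(successor_varieties("battle", corpus))  # [3, 1, 0, 0, 0, 0]
```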
Once the successor varieties for a given word have been derived, this information must be used to
segment the word. Hafer and Weiss [20] discuss four ways of doing this.
Using the cutoff method, some cutoff value is selected for successor varieties, and a boundary is identified whenever the cutoff value is reached. The problem with this method is how to select the cutoff value: if it is too small, incorrect cuts will be made; if it is too large, correct cuts will be missed.
With the peak and plateau method, a segment break is made after a character whose successor
variety exceeds that of the character immediately preceding it and the character immediately
following it. This method removes the need for the cutoff value to be selected.
In the complete word method, a break is made after a segment if the segment is a complete word
in the corpus.
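The complete word method needs only a vocabulary check. The description does not say whether the tested segment runs from the start of the word or from the previous cut; this sketch assumes the latter, and the small corpus and function name are hypothetical.

```python
def complete_word_segments(word, corpus):
    """Cut after a segment whenever that segment is itself a word in the corpus."""
    vocabulary = set(corpus)
    segments, start = [], 0
    for end in range(1, len(word)):  # a cut after the last letter would be meaningless
        if word[start:end] in vocabulary:
            segments.append(word[start:end])
            start = end
    segments.append(word[start:])
    return segments


corpus = ["able", "ape", "beatable", "fixable", "read", "readable",
          "reading", "reads", "red", "rope", "ripe"]
print(complete_word_segments("readable", corpus))  # ['read', 'able']
```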
The entropy method takes advantage of the distribution of successor variety letters. The method
works as follows. Let $|D_{\alpha i}|$ be the number of words in a text body beginning with the
length-$i$ sequence of letters $\alpha_i$, and let $|D_{\alpha ij}|$ be the number of words in
$D_{\alpha i}$ with the successor $j$. The probability that a member of $D_{\alpha i}$ has the
successor $j$ is given by $|D_{\alpha ij}| / |D_{\alpha i}|$. The entropy of $|D_{\alpha i}|$ is

$$H_{\alpha i} = -\sum_{j=1}^{26} \frac{|D_{\alpha ij}|}{|D_{\alpha i}|} \log_2 \frac{|D_{\alpha ij}|}{|D_{\alpha i}|}$$

Using this equation, a set of entropy measures can be determined for a word. A set of entropy
measures for predecessors can also be defined similarly. A cutoff value is selected, and a boundary
is identified whenever the cutoff value is reached.
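The entropy formula can be evaluated directly for every prefix of a word. This sketch assumes the same small hypothetical corpus as before and sums only over letters that actually occur, since terms with a zero count contribute nothing (0 log 0 is taken as 0); words that end exactly at the prefix belong to $D_{\alpha i}$ but have no letter successor. The function name is illustrative.

```python
import math
from collections import Counter


def successor_entropies(word, corpus):
    """Entropy H_{alpha i} of the successor distribution for each prefix alpha_i of `word`."""
    entropies = []
    for i in range(1, len(word) + 1):
        prefix = word[:i]
        d_alpha_i = [w for w in corpus if w.startswith(prefix)]  # the set D_{alpha i}
        # |D_{alpha i j}|: how many of those words continue with letter j
        counts = Counter(w[i] for w in d_alpha_i if len(w) > i)
        total = len(d_alpha_i)
        probabilities = [n / total for n in counts.values()]
        entropies.append(sum((-p * math.log2(p) for p in probabilities), 0.0))
    return entropies


corpus = ["able", "ape", "beatable", "fixable", "read", "readable",
          "reading", "reads", "red", "rope", "ripe"]
print([round(h, 2) for h in successor_entropies("readable", corpus)])
# [1.15, 0.72, 0.0, 1.5, 0.0, 0.0, 0.0, 0.0]
```

A boundary would then be placed wherever the entropy reaches the chosen cutoff; here the value 1.5 after "read" stands out.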
The successor variety stemming process has three parts [20]: first, determine the successor
varieties for a word; second, segment the word using one of the methods above; finally, select one
of the segments as the stem.
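The three parts can be chained into one small function. This is a sketch under stated assumptions: successor varieties count letters only, segmentation uses the peak and plateau rule, and the selection step simply keeps the first segment, which is a hypothetical placeholder rather than the selection rule used by Hafer and Weiss. The corpus and function names are illustrative.

```python
def successor_varieties(word, corpus):
    # Part 1: successor variety of every prefix (letters only).
    return [len({w[i] for w in corpus if w.startswith(word[:i]) and len(w) > i})
            for i in range(1, len(word) + 1)]


def peak_and_plateau(word, sv):
    # Part 2: cut after letters whose variety exceeds both neighbours'.
    cuts = [i + 1 for i in range(1, len(word) - 1) if sv[i - 1] < sv[i] > sv[i + 1]]
    bounds = [0] + cuts + [len(word)]
    return [word[a:b] for a, b in zip(bounds, bounds[1:])]


def successor_variety_stem(word, corpus):
    segments = peak_and_plateau(word, successor_varieties(word, corpus))
    # Part 3 (hypothetical selection rule): keep the first segment as the stem.
    return segments[0]


corpus = ["able", "ape", "beatable", "fixable", "read", "readable",
          "reading", "reads", "red", "rope", "ripe"]
for w in ["readable", "reading", "reads"]:
    print(w, "->", successor_variety_stem(w, corpus))  # each prints "read"
```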
TABLE LOOKUP METHOD
Frakes & Baeza-Yates [15] defined the dictionary-based technique as one way to do stemming by
storing a table of all index terms and their stems. A dictionary technique depends mainly on
creating a very large dictionary which stores words found in natural texts with their corresponding
morphological parts, such as stems, roots, and affixes. This technique lists all the words that exist
in a specific language together with the variants created by attaching different affixes to the root
word, and each word has a unique entry in a lookup table that maps it to its stem. Terms from
queries and indexes can then be stemmed via table lookup. Using a B-tree or a hash table, such
lookups are very fast. For example, presented, presentable, and presenting can all be stemmed to
the common stem present.
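A table lookup stemmer reduces to a key-value lookup. In the sketch below the table contents and the function name are hypothetical, and returning unknown words unchanged is an assumed policy; Python's built-in `dict` is itself a hash table, so each lookup takes roughly constant time on average.

```python
# Hypothetical stem table; a real one would be built manually for the whole vocabulary.
STEM_TABLE = {
    "present": "present",
    "presented": "present",
    "presentable": "present",
    "presenting": "present",
}


def lookup_stem(word, table=STEM_TABLE):
    """Return the stored stem, or the word itself if it is not in the table."""
    return table.get(word.lower(), word)


print(lookup_stem("presenting"))  # present
print(lookup_stem("presently"))   # presently (not stored, so left unchanged)
```

The second call shows the main limitation of this approach: only words already stored in the table can be stemmed.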
A B-tree is used when the dataset is too large to be held in main memory and alternative methods
are required; it is useful when data resides in external storage. External storage typically refers to
some kind of disk system, such as the hard disk found in most desktop computers or servers.
Dictionaries for very large files typically reside on secondary storage, such as a disk. The dictionary
is implemented as an index to the actual file and contains the key and record address of data.
Dictionaries are data structures that support search, insert, and delete operations, and one of the
most effective representations of a dictionary is a hash table. A hash table is simply an array that
is addressed via a hash function: typically, a simple function is applied to the key to determine its
place in the dictionary. Hash tables are therefore a simple and effective way to implement
dictionaries.
Under the B-tree or hash table approach, the stemming knowledge is prepared manually: the stem
of each word is defined in advance and stored in some kind of structured form, typically a database
table that already contains the roots. The lookup approach is simple and fast, and it generates
accurate stems for every word that has been stored.
There are problems with this approach [21]. First, building the lookup table requires extensive
work on the language, and the table may still miss some exceptional cases. Second, it requires a
large database and hence a large amount of manual work. Third, it can only stem words that have
previously been stored in the database. Finally, the space needed for storage tends to grow as the
corpus expands, which can make the search process inefficient.
N-GRAM METHOD
N-gram stemming has been reported to underperform other stemmers and requires a significant
amount of memory and storage for its index, but its ability to work with an arbitrary language
makes it useful for many applications. Because the N-gram technique requires no prior linguistic
knowledge about the text being analyzed, it has been assumed to be language-independent. It is
based on the identification of bi-grams and tri-grams, and it is more a technique for term
clustering than for stemming [42].
According to Bethlehem [68], bi-grams and tri-grams are used as index terms for both document
and query text. Documents and queries are then matched using these sets of index terms, and
similarity values can be calculated to quantify the degree of matching.
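The similarity measure itself is not specified here, so the sketch below assumes Dice's coefficient, a common choice for n-gram matching: twice the number of shared unique bi-grams divided by the total number of unique bi-grams in the two terms. The function names are illustrative.

```python
def bigrams(term):
    """Set of adjacent character bi-grams of a term."""
    return {term[i:i + 2] for i in range(len(term) - 1)}


def dice_similarity(a, b):
    """Dice coefficient over unique bi-grams: 2|A & B| / (|A| + |B|)."""
    x, y = bigrams(a), bigrams(b)
    if not x and not y:
        return 0.0
    return 2 * len(x & y) / (len(x) + len(y))


print(dice_similarity("statistics", "statistical"))  # 0.8 (6 shared of 7 + 8 unique bi-grams)
```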
Two methods of forming n-grams are the adjacent and non-adjacent methods, which differ in
whether the constituent characters of the resulting n-gram are adjacent to each other in the
source word, sentence, etc. For example, with the term COMPUTER, we can form character
tri-grams from non-adjacent characters: COM, COP, COU, COT, and so on. This process results in
many more tri-grams. The most common n-gram formation method for information retrieval
reported in the literature is the adjacent method [68].
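The difference in the number of tri-grams can be made concrete. This sketch assumes that the non-adjacent method takes every order-preserving choice of three letters, which matches the listed COM, COP, COU, COT; the function names are illustrative.

```python
from itertools import combinations


def non_adjacent_trigrams(word):
    """All order-preserving choices of three characters, adjacent or not."""
    return ["".join(chars) for chars in combinations(word, 3)]


def adjacent_trigrams(word):
    """Only runs of three consecutive characters."""
    return [word[i:i + 3] for i in range(len(word) - 2)]


print(non_adjacent_trigrams("COMPUTER")[:4])  # ['COM', 'COP', 'COU', 'COT']
print(len(adjacent_trigrams("COMPUTER")), len(non_adjacent_trigrams("COMPUTER")))  # 6 56
```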
In forming the n-grams, the slicing process starts from the first character of the word and
combines n consecutive characters to form an n-gram. It then moves to the second character and
repeats the grouping, then to the third character, and so on, until the last n characters form the
final n-gram.
For example, for the word RESEARCH, if n is set to 2, the following bi-grams can be generated:
RE ES SE EA AR RC CH
If n is set to 3, the following tri-grams can be generated:
RES ESE SEA EAR ARC RCH
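The sliding-window procedure for adjacent n-grams is a one-line list comprehension; the function name is illustrative.

```python
def adjacent_ngrams(word, n):
    """Slide a window of n consecutive characters from the first to the last position."""
    return [word[i:i + n] for i in range(len(word) - n + 1)]


print(" ".join(adjacent_ngrams("RESEARCH", 2)))  # RE ES SE EA AR RC CH
print(" ".join(adjacent_ngrams("RESEARCH", 3)))  # RES ESE SEA EAR ARC RCH
```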

2.3. CLASSIFICATION OF STEMMING ALGORITHMS
Stemming algorithms can be broadly classified into two categories [21], namely the rule-based
approach and the statistical approach.
Figure 2.2: Classification of stemming algorithms
A rule-based stemmer encodes language-specific rules, whereas a statistical stemmer employs
statistical information from a large corpus of a given language to learn its morphology.
2.3.1. RULE BASED APPROACH
In a rule-based approach (the approach followed in this research work), language-specific rules are
encoded and stemming is performed based on these rules. In this approach, various conditions are
specified for converting a word to its derivational stem, a list of all valid stems is given, and there
are also exception rules that handle the exceptional cases. In the Lovins stemmer, stemming
comprises two phases [21]. In the first phase, the algorithm retrieves the stem from a word by
removing its longest possible ending, matching the endings against a list of suffixes stored in the
computer; in the second phase, spelling exceptions are handled. For example, the word “absorption”
is reduced to the stem “absorpt” and “absorbing” to the stem “absorb”. The problem of spelling
exceptions arises in this case when we try to match the two stems “absorpt” and “absorb”. Such
exceptions are handled very carefully by introducing recoding and partial matching techniques in
the stemmer as post-stemming procedures.
Recoding [16] occurs immediately after the removal of an ending and makes whatever changes are
needed at the end of the resulting stem to allow the ultimate matching of varying stems. These
changes may involve turning one stem into another (e.g. the rule rpt → rb changes absorpt to
absorb), changing both stems involved by recoding their terminal consonants to some neutral
element (absorb → absorƋ, absorpt → absorƋ), or removing some of these letters entirely, that is,
changing them to nullity (absorb → absor, absorpt → absor).
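The two phases can be illustrated with a toy version of the algorithm. The three endings and the single recoding rule below are hypothetical miniatures, not Lovins' full ending list or transformation table, and the minimum stem length of 3 is taken from the description of the Lovins stemmer later in this chapter.

```python
# Hypothetical miniature versions of Lovins' ending list and recoding rules.
ENDINGS = sorted(["ion", "ing", "s"], key=len, reverse=True)  # longest endings first
RECODINGS = [("rpt", "rb")]


def lovins_like_stem(word, min_stem=3):
    # Phase 1: remove the longest matching ending, keeping at least `min_stem` letters.
    for ending in ENDINGS:
        if word.endswith(ending) and len(word) - len(ending) >= min_stem:
            word = word[:-len(ending)]
            break
    # Phase 2: recode the end of the stem so that variant stems can match.
    for old, new in RECODINGS:
        if word.endswith(old):
            word = word[:-len(old)] + new
            break
    return word


print(lovins_like_stem("absorption"), lovins_like_stem("absorbing"))  # absorb absorb
```

Without phase 2, the two words would end as "absorpt" and "absorb" and would not match.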
The main difference between recoding and partial matching is that a recoding procedure is part
of the stemming algorithm, whereas a partial matching procedure is applied to the output of the
stemming algorithm, where the stems derived from the catalogue terms are searched for matches
to the user’s query.
Apart from Lovins’ method, another rule-based method was given by M. F. Porter, which comprises
a set of conditional rules [37]. These conditions are applied either to the stem, to the suffix, or to
the stated rules.
As pointed out by Sharma [21], the rule-based approach has the following advantages.
• Rule-based stemmers are fast; that is, little computation time is needed to find a stem.
• Rule-based stemmers give very good retrieval results for English, especially for suffix
conflation.
On the other hand, the rule-based approach has the following disadvantages.
• One of its main disadvantages is that extensive language expertise is needed to build such
stemmers.
• The procedure used in this approach handles individual words: it has no access to
information about their grammatical and semantic relations with one another.
• Storage is required for the rules used to extract stems and for the exceptional cases.
• These stemmers may over-stem or under-stem words.
2.3.2. STATISTICAL APPROACH
Statistical stemming is an effective and popular approach in information retrieval. Some recent
studies show that statistical stemmers are good alternatives to rule-based stemmers. Their
advantage lies in the fact that they do not require language expertise; instead, they employ
statistical information from a large corpus of a given language to learn the morphology of words.
Much research has been done on statistical stemming methods, such as Yet Another Suffix
Stripper and the graph-based stemmer [21].
The most popular stemmers encode a large number of language-specific rules built up over a
long period of time, and such stemmers with comprehensive rules are available for only a few
languages. In the absence of extensive linguistic resources for certain languages, statistical
language processing methods have been successfully used to improve the performance of IR
systems. Yet Another Suffix Stripper (YASS) is one such statistics-based, language-independent
stemmer. Its performance is comparable to that of Porter’s and Lovins’ stemmers, both in terms of
average precision and the total number of relevant documents retrieved, which makes it a way to
address the challenge of retrieval from languages with poor resources.
In this approach, a set of string distance measures is defined, and complete-linkage clustering is
used to discover equivalence classes from the lexicon, without relying on linguistic expertise.
Retrieval experiments by the authors on English, French, and Bengali datasets show that the
approach is effective for languages that are primarily suffixing in nature. The clusters are created
using a hierarchical approach and the distance measures; the resulting clusters are then considered
as equivalence classes, and their centroids as the stems.
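Complete-linkage clustering over a lexicon can be sketched as follows. YASS's own distance measures are not reproduced here: `prefix_distance` is a hypothetical stand-in that is small when two words share a long prefix, the threshold of 0.5 is an assumption, and each class is represented by its longest common prefix rather than a centroid. Complete linkage merges two clusters only when even their farthest pair of members is close enough.

```python
import os


def prefix_distance(a, b):
    """Hypothetical distance: 0 when one word is a prefix of the other, larger when they diverge early."""
    common = len(os.path.commonprefix([a, b]))
    return 1 - common / min(len(a), len(b))


def complete_linkage(words, distance, threshold):
    clusters = [[w] for w in words]
    while True:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Complete linkage: the distance between clusters is their farthest pair.
                d = max(distance(a, b) for a in clusters[i] for b in clusters[j])
                if d <= threshold and (best is None or d < best[0]):
                    best = (d, i, j)
        if best is None:  # no pair of clusters is close enough to merge
            return clusters
        _, i, j = best
        clusters[i].extend(clusters.pop(j))  # j > i, so index i is unaffected


words = ["connect", "connected", "connecting", "connection",
         "consider", "considered", "considering"]
for cluster in complete_linkage(words, prefix_distance, threshold=0.5):
    print(os.path.commonprefix(cluster), sorted(cluster))
```

For this toy lexicon the two printed classes have the common prefixes "connect" and "consider".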
Graph Based Stemmer (GRAS) is a graph-based, language-independent stemming algorithm for
information retrieval. The following features make this algorithm attractive and useful: retrieval
effectiveness, generality (that is, its language-independent nature), and low computational cost.
The steps followed in this approach can be summarized as follows:
1. Find long common prefixes among the word pairs present in the documents. For this,
consider word pairs of the form W1 = PS1 and W2 = PS2, where P is the long common
prefix of W1 and W2.
2. The suffixes S1 and S2 should form a valid suffix pair, i.e. other word pairs also have a common
initial part followed by these suffixes, such that W’1 = P’S1 and W’2 = P’S2. S1 and S2 are
a candidate suffix pair if a large number of word pairs are of this form. Thus, suffixes are
considered in pairs rather than individually.
3. Look for pairs that are morphologically related, i.e. pairs that share a non-empty common prefix
and whose suffix pair is a valid candidate suffix pair.
4. Model these word relationships with a graph, where nodes represent the words and edges
connect related words.
5. Identify the pivot node, i.e. a node that is connected by edges to a large number of other
nodes.
6. In the final step, put a word that is connected to a pivot in the same class as the pivot if it
shares many common neighbours with the pivot.
Once such word classes are formed, stemming is done by mapping all the words in a class to the
pivot for that class. This stemming algorithm has outperformed a rule-based stemmer, statistical
stemmers (YASS, Linguistica [19]), and a baseline strategy [21].
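The six steps can be condensed into a short, simplified sketch. The parameters are hypothetical: `l` (minimum common-prefix length), `alpha` (minimum frequency of a valid suffix pair), and `delta` (cohesion threshold). The cohesion score (1 + shared neighbours) / degree is an assumed way of measuring "many common neighbours", and ties between equally connected pivots are broken alphabetically. The toy vocabulary is invented.

```python
import os
from collections import Counter
from itertools import combinations


def gras_like_stems(words, l=3, alpha=2, delta=0.6):
    """Map each word to the pivot of its class (simplified sketch of the GRAS steps)."""
    vocabulary = sorted(set(words))

    # Steps 1-2: suffix pairs of word pairs sharing a common prefix of at least l letters.
    suffix_pairs = {}
    for a, b in combinations(vocabulary, 2):
        k = len(os.path.commonprefix([a, b]))
        if k >= l:
            suffix_pairs[(a, b)] = tuple(sorted((a[k:], b[k:])))
    frequency = Counter(suffix_pairs.values())

    # Steps 3-4: words are nodes; related words (frequent suffix pair) are joined by edges.
    graph = {w: set() for w in vocabulary}
    for (a, b), pair in suffix_pairs.items():
        if frequency[pair] >= alpha:
            graph[a].add(b)
            graph[b].add(a)

    # Steps 5-6: take the best-connected remaining node as pivot and absorb
    # neighbours that share enough neighbours with it; repeat on what is left.
    stems, remaining = {}, set(vocabulary)
    while remaining:
        pivot = max(sorted(remaining), key=lambda w: len(graph[w] & remaining))
        members = {pivot}
        for v in graph[pivot] & remaining:
            cohesion = (1 + len(graph[pivot] & graph[v])) / len(graph[v])
            if cohesion >= delta:
                members.add(v)
        for w in members:
            stems[w] = pivot
        remaining -= members
    return stems


words = ["walk", "walks", "walked", "walking", "talk", "talks", "talked",
         "talking", "jump", "jumps", "jumped"]
stems = gras_like_stems(words)
print(stems["walking"], stems["talked"], stems["jumps"])  # walk talk jump
```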
This approach yields its best retrieval results for suffixing languages, i.e. languages that are
morphologically more complex than English, such as French, Portuguese, Hindi, Marathi, and
Bengali [21].
As pointed out by Sharma [21], the advantages of the statistical approach include the following.
• These stemmers are useful for languages with scarce resources. For example, Asian languages
are heavily used in the Asian subcontinent, but very little research has been done on them.
• They are considered recall-enhancing devices, as they increase recall.
On the other hand, the statistical approach has the following disadvantages.
• Most statistical stemmers perform their statistical analysis on a sample of the actual corpus.
As the sample size decreases, the likelihood of covering most morphological variants also
decreases; naturally, this results in a stemmer with poorer coverage.
• For the Bengali lexicon, there are a few instances where two semantically different terms fall
in the same cluster due to their string similarity. For example, Akram (the name of a
cricketer from Pakistan) and akraman (to attack) fall in the same cluster, as they share a
significant prefix. Such cases might lead to unsatisfactory results.
• Statistical stemmers are time consuming, because for these stemmers to work we need
complete language coverage in terms of the morphology of words, their variants, etc.
The hybrid approach combines two or more stemming algorithms, or combines the statistical and
rule-based approaches; examples include inflectional and derivational methods (Krovetz, Xerox),
corpus-based stemming, and context-sensitive stemming [73].
Inflectional and Derivational Methods: this approach to stemming involves both inflectional and
derivational morphological analysis. A very large corpus is needed to develop these types of
stemmers, and hence they are also corpus-based stemmers. In the inflectional case, the word
variants are related to language-specific syntactic variations such as plural, gender, and case,
whereas in the derivational case the word variants are related to the part of speech (POS) of the
word in the sentence where it occurs [73].
Krovetz Stemmer (KSTEM): the Krovetz stemmer was presented in 1993 by Robert Krovetz [14]
and is a linguistic lexical validation stemmer. Since it is based on the inflectional property of words
and the language syntax, it is very complicated in nature. It effectively and accurately removes
inflectional suffixes in three steps: transforming the plurals of a word to its singular form,
converting the past tense of a word to its present tense and removing the suffix ‘ing’.
The conversion process first removes the suffix and then, by checking a dictionary for any
recoding, returns the stem as a word. The dictionary lookup also performs any transformations
required by spelling exceptions, and it converts any stem produced into a real word whose
meaning can be understood.
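The three inflectional steps combined with dictionary validation can be sketched as below. The lexicon, the candidate rules (including the un-doubling of a final consonant before "-ing"), and the function name are all hypothetical; KSTEM's actual rules and dictionary are far more extensive.

```python
# Hypothetical miniature lexicon; KSTEM itself relies on a full English dictionary.
LEXICON = {"city", "cat", "walk", "run", "make", "use"}


def krovetz_like_stem(word, lexicon=LEXICON):
    """Try plural, past-tense and -ing removal, accepting a candidate only if it is in the lexicon."""
    candidates = []
    if word.endswith("ies"):
        candidates.append(word[:-3] + "y")           # cities -> city
    elif word.endswith("s") and not word.endswith("ss"):
        candidates.append(word[:-1])                 # cats -> cat
    if word.endswith("ed"):
        candidates += [word[:-2], word[:-1]]         # walked -> walk, used -> use
    if word.endswith("ing"):
        base = word[:-3]
        candidates += [base, base + "e"]             # making -> make
        if len(base) > 1 and base[-1] == base[-2]:
            candidates.append(base[:-1])             # running -> run
    for candidate in candidates:
        if candidate in lexicon:
            return candidate
    return word  # words missing from the lexicon are left unchanged


for w in ["cities", "walked", "used", "making", "running", "jumped"]:
    print(w, "->", krovetz_like_stem(w))
```

The last example ("jumped", whose base form is missing from the toy lexicon) is returned unchanged, illustrating the dependence on the lexicon.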
The strength of derivational and inflectional analysis lies in its ability to produce morphologically
correct stems, to cope with exceptions, and to process prefixes as well as suffixes. Since this stemmer
does not find the stems for all word variants, it can be used as a pre-stemmer before actually
applying a stemming algorithm. This would increase the speed and effectiveness of the main
stemmer. Compared to Porter and Paice/Husk, this is a very light stemmer. The Krovetz stemmer
attempts to increase accuracy and robustness by treating spelling errors and meaningless stems.
If the input document is large, this stemmer becomes weak and does not perform very
effectively. The major and obvious flaw in dictionary-based algorithms is their inability to cope
with words that are not in the lexicon. Also, a lexicon must be manually created in advance,
which requires significant effort. This stemmer does not consistently produce good recall and
precision [73].
Xerox Inflectional and Derivational Analyzer: the linguistics groups at Xerox have developed a
number of linguistic tools for English that can be used in information retrieval. In particular, they
have produced an English lexical database that provides a morphological analysis of any word in
the lexicon and identifies its base form. Xerox linguists have developed lexical databases for English
and some other languages that can analyze and generate inflectional and derivational morphology.
The inflectional database reduces each surface word to the form that can be found in the
dictionary, as follows [71]: nouns to the singular (e.g. children → child), verbs to the infinitive (e.g.
understood → understand), adjectives to the positive form (e.g. best → good), and pronouns to the
nominative (e.g. whom → who) [73].
The derivational database reduces surface forms to stems which are related to the original in both
form and semantics. For example, ‘government’ stems to ‘govern’ while ‘department’ is not
reduced to ‘depart’ since the two forms have different meanings. All stems are valid English terms,
and irregular forms are handled correctly. The derivational process uses both suffix and prefix
removal, unlike most conventional stemming algorithms which rely solely on suffix removal [73].
The databases are constructed using finite state transducers, which promotes very efficient
storage and access. This technology also allows the conflation process to act in reverse, generating
all conceivable surface forms from a single base form. The database starts with a lexicon of about
77 thousand base forms from which it can generate roughly half a million surface forms [73].
The advantages of this stemmer are that it works well with large documents and also removes
prefixes wherever applicable. All stems are valid words, since a lexical database providing a
morphological analysis of any word in the lexicon is available for stemming. It has proved to work
better than the Krovetz stemmer on a large corpus.
The disadvantage is that the output depends on the lexical database which may not be exhaustive.
Since this method is based on a lexicon, it cannot correctly stem words that are not part of the
lexicon. This stemmer has not been implemented successfully for many other languages.
Dependence on the lexicon makes it a language dependent stemmer.
Corpus Based Stemmer: this method of stemming was proposed by Xu and Croft in their paper
“Corpus-based stemming using cooccurrences of word variants” [56]. They have suggested an
approach which tries to overcome some of the drawbacks of the Porter stemmer. For example, the
words ‘policy’ and ‘police’ are conflated even though they have different meanings, while the words
‘index’ and ‘indices’ are not conflated even though they have the same root. The Porter stemmer also
generates stems which are not real words like ‘iteration’ becomes ‘iter’ and ‘general’ becomes
‘gener’. Another problem is that while some stemming algorithms may be suitable for one corpus,
they will produce too many errors on another.
Corpus-based stemming refers to the automatic modification of conflation classes (words that
have resulted in a common stem) to suit the characteristics of a given text corpus using statistical
methods. The basic hypothesis is that word forms that should be conflated for a given corpus will
co-occur in documents from that corpus. Using this concept, some of the over-stemming and
under-stemming drawbacks are resolved; e.g. ‘policy’ and ‘police’ will no longer be conflated [73].
The advantage of this method is it can potentially avoid making conflations that are not
appropriate for a given corpus and the result is an actual word and not an incomplete stem.
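The co-occurrence hypothesis can be illustrated on a toy collection. Xu and Croft's actual statistic and class-splitting procedure are not reproduced here: the document-level Dice score, the threshold of 0.3, the greedy regrouping, and the five invented documents are all hypothetical choices for this sketch.

```python
def cooccurrence(a, b, documents):
    """Hypothetical score: 2 * |docs with both| / (|docs with a| + |docs with b|)."""
    docs_a = {i for i, doc in enumerate(documents) if a in doc}
    docs_b = {i for i, doc in enumerate(documents) if b in doc}
    if not docs_a or not docs_b:
        return 0.0
    return 2 * len(docs_a & docs_b) / (len(docs_a) + len(docs_b))


def split_class(conflation_class, documents, threshold=0.3):
    """Greedily regroup a conflation class so every pair in a group co-occurs enough."""
    groups = []
    for word in conflation_class:
        for group in groups:
            if all(cooccurrence(word, other, documents) >= threshold for other in group):
                group.append(word)
                break
        else:
            groups.append([word])
    return groups


documents = [
    {"police", "officer", "arrest"},
    {"police", "arrest"},
    {"policy", "government"},
    {"policy", "policies", "government"},
    {"policies", "reform"},
]
print(split_class(["police", "policy", "policies"], documents))
# [['police'], ['policy', 'policies']]
```

Because "police" never co-occurs with "policy" in these documents, it is separated from the class that a suffix-stripping stemmer would have formed.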
The disadvantage is that the statistical measure has to be developed separately for every corpus,
and the processing time increases because two stemming algorithms are applied first, before this
method is used.
Context Sensitive Stemmer: this is a very interesting method of stemming. Unlike the usual
method, where stemming is done before a document is indexed, here, for a web search,
context-sensitive analysis is done on the query side using statistical modeling. This method was
proposed by Peng et al. [72]. For the words of the input query, the morphological variants that
would be useful for the search are predicted before the query is submitted to the search engine.
This dramatically reduces the number of bad expansions, which in turn reduces the cost of
additional computation and improves precision at the same time. After the predicted word
variants have been derived from the query, context-sensitive document matching is done for
these variants. This conservative strategy serves as a safeguard against spurious stemming, and it
turns out to be very important for improving precision.
The advantage of this stemmer is it improves selective word expansion on the query side and
conservative word occurrence matching on the document side.
The disadvantage is the processing time and the complex nature of the stemmer. There can be
errors in finding the noun phrases in the query and the proximity words.
2.4. RELATED WORKS
There are a number of stemming algorithms developed for different languages. The approaches
and techniques used in these algorithms especially for Arabic, English and other languages are
presented as follows.
2.4.1. ENGLISH LANGUAGE STEMMERS
There are many kinds of stemming algorithm available for English language. Some of them are the
following:
LOVINS STEMMING ALGORITHM
The Lovins stemming algorithm was first presented in 1968 by Julie Beth Lovins of the
Massachusetts Institute of Technology [16]. It was the first stemmer to be published, was extremely
well developed considering the date of its release, and has been the main influence on a large
amount of later work in the area. The Lovins stemmer is a single-pass, context-sensitive,
longest-match stemmer. This early stemmer was targeted at both the IR and computational
linguistics areas of stemming. Lovins’ rule list was derived by processing and studying a word
sample; perhaps, if this process were repeated with a much larger sample, a more satisfactory rule
list could be derived. There are also known problems regarding the reformation of words. This
process uses the recoding rules to reform the stems into words, to ensure that they match the
stems of other words with similar meanings. The main problem with this process is that it has
been found to be highly unreliable: it frequently fails to form words from the stems or to match
the stems of words with similar meanings.
The Lovins stemmer removes a maximum of one suffix from a word, due to its nature as a single-pass
algorithm. It uses a list of about 294 different suffixes and removes the longest suffix
attached to the word, ensuring that the stem after the suffix has been removed is always at least 3
characters long. Then the ending of the stem may be reformed (e.g., by un-doubling a final
consonant if applicable), by referring to a list of recoding transformations [16].
DAWSON STEMMING ALGORITHM
The Dawson stemmer was developed by J. L. Dawson of the Literary and Linguistic Computing Centre
at Cambridge University in 1974 [58]. It is a complex, linguistically targeted stemmer that is strongly
based on the Lovins stemmer and makes use of an iterative longest-match approach. Initially,
Dawson used the list of 260 English suffixes with associated removal condition codes given by
Lovins. After correcting the list, he brought the total up to about 1200 suffixes [16]. To avoid
problems of storage and processing time, the suffixes and their condition code numbers are stored
in reverse order and indexed by length and final letter. Dawson does not use the recoding
technique in his algorithm to handle stems; instead, he uses an extension of the partial matching
procedure also defined in the Lovins paper. The basic principle of Dawson’s algorithm is that if two
stems match up to a certain number of characters and the remaining characters of each stem
belong to the same stem-ending class, then the two stems are of the same form [58].
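The partial matching principle can be sketched as a pairwise test. The two ending classes and the minimum match length of 4 are hypothetical values chosen for illustration; Dawson's actual stem-ending classes are not reproduced.

```python
import os

# Hypothetical stem-ending classes: endings in one class are treated as equivalent.
ENDING_CLASSES = [{"b", "pt"}, {"d", "s"}]


def same_stem(a, b, min_match=4):
    """Partial match: a shared prefix of min_match letters plus equivalent remaining endings."""
    k = len(os.path.commonprefix([a, b]))
    if k < min_match:
        return False
    rest_a, rest_b = a[k:], b[k:]
    return rest_a == rest_b or any(rest_a in c and rest_b in c for c in ENDING_CLASSES)


print(same_stem("absorb", "absorpt"))  # True
print(same_stem("extend", "extens"))   # True
print(same_stem("absorb", "abstain"))  # False (shared prefix too short)
```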
PORTER STEMMING ALGORITHM
The Porter stemmer is a conflation stemmer developed by Martin Porter at the University of
Cambridge in 1980 [37]. The stemmer is based on the idea that the suffixes in the English language
(approximately 1200) are mostly made up of a combination of smaller and simpler suffixes.
This stemmer is a linear step stemmer [37]. Specifically, it has five steps applying rules within each
step. Within each step, if a suffix rule matched to a word, then the conditions attached to that rule
are tested on what would be the resulting stem, if that suffix was removed, in the way defined by
the rule.
For example, such a condition may be that the number of vowel sequences followed by a
consonant sequence in the stem (the measure) must be greater than one for the rule to be applied.
Once a rule passes its conditions and is accepted, the rule fires, the suffix is removed, and
control moves to the next step. If the rule is not accepted, the next rule in the step is tested,
until either a rule from that step fires and control passes to the next step, or there are no more
rules in that step and control moves to the next step. This process continues for all five steps,
the resultant stem being returned by the stemmer after control has been passed from step five
[37].
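The measure condition can be made concrete. In Porter's definition, a word is viewed as [C](VC)^m[V], where C and V are runs of consonants and vowels and "y" counts as a vowel when it follows a consonant. The sketch computes m and applies one conditional rule of the form "(m > 1) EMENT →", which removes "ement" only when the remaining stem has a measure greater than one. The function names are illustrative, and only this single rule is shown, not the full five-step algorithm.

```python
def is_consonant(word, i):
    """Porter's letter classes: a, e, i, o, u are vowels; y is a vowel after a consonant."""
    ch = word[i]
    if ch in "aeiou":
        return False
    if ch == "y":
        return i == 0 or not is_consonant(word, i - 1)
    return True


def measure(stem):
    """m in [C](VC)^m[V]: map letters to c/v, collapse runs, count 'vc' pairs."""
    pattern = "".join("c" if is_consonant(stem, i) else "v" for i in range(len(stem)))
    collapsed = "".join(ch for i, ch in enumerate(pattern) if i == 0 or ch != pattern[i - 1])
    return collapsed.count("vc")


def apply_ement_rule(word):
    """Conditional rule '(m > 1) EMENT ->': strip 'ement' only if the stem's measure exceeds 1."""
    if word.endswith("ement") and measure(word[:-5]) > 1:
        return word[:-5]
    return word


print([measure(w) for w in ["tree", "trouble", "troubles", "private"]])  # [0, 1, 2, 2]
print(apply_ement_rule("replacement"), apply_ement_rule("cement"))      # replac cement
```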
PAICE/HUSK STEMMING ALGORITHM
The Paice/Husk stemmer was developed by Chris Paice at Lancaster University in the late 1980s,
and was originally implemented with assistance from Gareth Husk [47]. The Paice/Husk stemmer
is a simple iterative stemmer; that is, it removes the endings from a word in an indefinite
number of steps. The stemmer uses a separate rule file, which is first read into an array or list. This
file is divided into a series of sections, each section corresponding to a letter of the alphabet. The
section for a given letter, say "e", contains the rules for all endings ending with "e", the sections
being ordered alphabetically. An index can thus be built, leading from the last letter of the word to
be stemmed to the first rule for that letter [47].
When a word is to be processed, the stemmer takes its last letter and uses the index to find the
first rule for that letter. The rule is examined and is accepted if (1) it specifies an ending that
matches the last letters of the word, (2) any special conditions for that rule are satisfied (e.g. the
so-called 'intact' condition, which ensures that the rule is only fired if no other rules have yet been
applied to the word), and (3) application of the rule would not result in a stem shorter than a
specified length or without a vowel. If a rule is accepted, it is applied to the word. If it is not
accepted, the rule index is incremented by one and the next rule is tried. However, if the next rule
belongs to a different letter's section, i.e. its ending does not end with the last letter of the word,
this implies that no ending can be removed, and so the process terminates [47].
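The control flow described above (rules indexed by last letter, acceptance conditions, and repeated passes) can be sketched with a tiny, invented rule table; it is not Paice's actual rule file. Each hypothetical rule records its ending, a replacement, whether it requires the word to be intact, and whether stemming continues after it fires; the minimum stem length of 3 is an assumed value.

```python
from collections import defaultdict

# Hypothetical rules: (ending, replacement, intact_only, continue_after_firing)
RULES = [
    ("ness", "", False, True),
    ("ful", "", False, True),
    ("ing", "", False, True),
    ("ies", "y", False, False),
    ("s", "", True, True),
]

# Index the rules by the last letter of their ending, longest endings first.
INDEX = defaultdict(list)
for rule in sorted(RULES, key=lambda r: len(r[0]), reverse=True):
    INDEX[rule[0][-1]].append(rule)


def acceptable(stem, min_length=3):
    """Reject stems that are too short or contain no vowel."""
    return len(stem) >= min_length and any(ch in "aeiouy" for ch in stem)


def paice_husk_like(word):
    intact = True  # no rule has fired yet
    while word:
        for ending, replacement, intact_only, keep_going in INDEX.get(word[-1], []):
            if not word.endswith(ending) or (intact_only and not intact):
                continue  # rule does not apply: try the next rule for this letter
            candidate = word[:-len(ending)] + replacement
            if acceptable(candidate):
                word, intact = candidate, False
                if keep_going:
                    break  # restart with the new last letter
                return word
        else:
            return word  # no rule for this letter fired: stop
    return word


print(paice_husk_like("hopefulness"), paice_husk_like("ponies"), paice_husk_like("sings"))
# hope pony sing
```

In the last example "-ing" is not removed from "sing", because the resulting stem would be shorter than the assumed minimum length.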
KROVETZ STEMMING ALGORITHM
The Krovetz stemmer was developed by Bob Krovetz, at the University of Massachusetts, in 1993
[14]. It is quite a 'light' stemmer, as it makes use of inflectional linguistic morphology. The area of
morphology (the internal structure of words) can be broken down into two subclasses, inflectional
and derivational. Inflectional morphology describes predictable changes a word undergoes as a
result of syntax (the plural and possessive form for nouns, and the past tense and progressive
form for verbs are the most common in English).
These changes have no effect on a word’s ‘part-of-speech’ (a noun still remains a noun after
pluralizations). In contrast, changes of derivational morphology may or may not affect a word’s
meaning. Although English has relatively weak morphology, languages such as
Hungarian and Hebrew have stronger morphology, where thousands of variants may exist for a
given word [14].
2.4.2. BON: FIRST PERSIAN STEMMER
Persian is an Indo-European language in which there are few stems; other words are constructed
by adding prefixes and suffixes to stems. Bon is an affix removal stemmer [74].
Affix removal algorithms remove suffixes and/or prefixes from terms, leaving a stem, and they
sometimes also transform the resulting stem. A simple example of an affix removal stemmer is one
that removes plurals from terms. However, there are many exceptions in removing Persian affixes.
Persian verbs are inflected, since they include person, number, and tense. Therefore, Bon has a
dictionary of infinitives (and, for exceptions, the present tense of an infinitive). Moreover,
infinitives (verbs) in Persian can be simple, compound, or phrasal [74].
At least one space can be found between the components of compound or phrasal infinitives in
Persian. Like most stemmers currently in use, Bon is an iterative longest-match stemmer. An
iterative longest-match stemmer removes the longest possible string of characters from a word
according to a set of rules, and this process is repeated until no more characters can be removed.
Even after this process is complete, the stems may not be correct [74].
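The iterative longest-match procedure can be sketched independently of Persian. The English-like suffix list and the minimum stem length of 3 are hypothetical, and Bon's actual Persian rules and infinitive dictionary are not reproduced.

```python
# Hypothetical suffix list; Bon's actual Persian rules are not reproduced here.
SUFFIXES = ["ation", "ness", "ful", "al", "s"]


def iterative_longest_match(word, suffixes=SUFFIXES, min_stem=3):
    """Repeatedly strip the longest matching suffix until no rule applies."""
    by_length = sorted(suffixes, key=len, reverse=True)
    while True:
        for suffix in by_length:
            if word.endswith(suffix) and len(word) - len(suffix) >= min_stem:
                word = word[:-len(suffix)]
                break  # a suffix was removed: start again on the shorter word
        else:
            return word  # nothing more can be removed


print(iterative_longest_match("hopefulness"))      # hope
print(iterative_longest_match("sensationalness"))  # sen
```

The second example shows how repeated removal can overshoot: after the process completes, "sen" is not a correct stem.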
2.4.3. ARABIC STEMMING ALGORITHMS
Arabic is highly productive, both derivationally and inflectionally [61], and has a complex
morphological structure. Some Arabic natural language processing applications require the basic
form of the word (root or stem) to be most effective; therefore, stemming is a necessity. Several
stemming approaches have been applied to Arabic. The morphological complexity of Arabic makes
it particularly difficult to develop natural language processing applications for Arabic information
retrieval.
Several stemming algorithms for Arabic have been proposed based on different principles and
each produces different sets of stem classifications. The most common approaches used in Arabic
stemming are the light and the root based stemmers. Root-based stemming is based on removing
all attached prefixes and suffixes in an attempt to extract the root of a given Arabic surface word.
Several morphological analyzers have been developed for Arabic, e.g. Khoja and Garside [61]. Light
stemming is used not to produce the linguistic root of a given Arabic surface form, but to remove
the most frequent suffixes and prefixes. The most common affixes include duals and plurals
for masculine and feminine, possessive forms, definite articles, and pronouns. Several light
stemmers have been developed, all based on suffix and prefix removal and normalization.
Examples of light stemmers include Aljlayl & Frieder’s stemmer (Aljlayl), Darwish’s Al-Stem, and
Larkey et al.'s UMass stemmer [29].
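Light stemming amounts to stripping at most one frequent prefix and one frequent suffix. The affix lists below are a small illustrative subset chosen for this sketch (the definite article ال and its combinations with و and ب, the conjunction و, and some common dual, plural, pronoun, and feminine endings); they are not the exact lists of Aljlayl, Al-Stem, or the UMass stemmer, and the minimum remaining length of 3 is an assumption.

```python
# A small, illustrative subset of affixes of the kind Arabic light stemmers strip (assumed lists).
PREFIXES = ["وال", "بال", "ال", "و"]
SUFFIXES = ["ات", "ان", "ون", "ين", "ها", "ة"]


def light_stem(word, min_len=3):
    """Remove at most one prefix and one suffix (longest first), keeping min_len letters."""
    for p in sorted(PREFIXES, key=len, reverse=True):
        if word.startswith(p) and len(word) - len(p) >= min_len:
            word = word[len(p):]
            break
    for s in sorted(SUFFIXES, key=len, reverse=True):
        if word.endswith(s) and len(word) - len(s) >= min_len:
            word = word[:-len(s)]
            break
    return word


print(light_stem("المكتبات"))  # مكتب  (definite article and feminine plural removed)
print(light_stem("والكتاب"))   # كتاب  ("and the" removed)
```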
2.4.4. STEMMING ALGORITHMS FOR ETHIOPIAN LANGUAGES
Stemming is in most cases language dependent: to develop a stemming algorithm, one has to know
the morphology of the language or work with a linguistic expert. For most Ethiopian languages,
including Kemantney, no such automated systems exist [54].
SILT'E STEMMER
Silt’e belongs to the Semitic language group. Muzyn [31] developed a stemmer for Silt’e. Developing a stemmer for a language requires analyzing, understanding and modeling the language in terms of word formation. Morphology plays an important role in guiding the development of the stemmer, since it describes the nature and characteristics of affixes in Silt’e words. Like other Semitic languages, Silt’e has a grammatical system based on a root-pattern structure: consonants bear the basic meaning, while vowels form different patterns. Stems are built from consonantal roots before other word forms are built. Silt’e uses affixation and reduplication to derive different word forms from stems; the common affixes are prefixes, suffixes, and infixes. Silt’e makes extensive use of affix concatenation, which can result in relatively long words that often carry as much semantic information as a whole English phrase, clause or sentence. As a result of this complex morphological structure, a single Silt’e word can have a very large number of variants.
To design the stemmer, sample text documents were collected from different sources, and a research paper describing the morphology of Silt’e was also used; the affixes and stop words were compiled from this paper and from the sample text. The stemmer developed in that study is iterative and uses context-sensitive and recoding rules that remove prefixes, suffixes and letter reduplication (type 1 and type 2). In the experiment, the stripping procedures were applied in the following order: prefix, suffix, and finally letter reduplication.
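The order of operations described above can be sketched as follows. This is an illustrative Python sketch under stated assumptions: the affixes are hypothetical Latin-transliterated placeholders rather than the Silt’e affixes compiled in [31], `MIN_STEM` is an assumed context condition, and collapsing repeated letters is a simplified stand-in for the two reduplication rules.

```python
# Ordered affix stripping: prefix -> suffix -> letter reduplication.
# All affixes are HYPOTHETICAL transliterated placeholders.
PREFIXES = ["ye", "te"]
SUFFIXES = ["och", "na", "t"]
MIN_STEM = 3  # assumed minimum stem length (context-sensitive condition)


def strip_prefixes(word: str) -> str:
    """Iteratively remove the longest matching prefix while the stem stays long enough."""
    changed = True
    while changed:
        changed = False
        for p in sorted(PREFIXES, key=len, reverse=True):
            if word.startswith(p) and len(word) - len(p) >= MIN_STEM:
                word, changed = word[len(p):], True
                break
    return word


def strip_suffixes(word: str) -> str:
    """Iteratively remove the longest matching suffix while the stem stays long enough."""
    changed = True
    while changed:
        changed = False
        for s in sorted(SUFFIXES, key=len, reverse=True):
            if word.endswith(s) and len(word) - len(s) >= MIN_STEM:
                word, changed = word[: -len(s)], True
                break
    return word


def collapse_reduplication(word: str) -> str:
    """Simplified reduplication rule: collapse immediately repeated letters."""
    out = []
    for ch in word:
        if not out or out[-1] != ch:
            out.append(ch)
    return "".join(out)


def stem(word: str) -> str:
    return collapse_reduplication(strip_suffixes(strip_prefixes(word)))
```

For the toy string "tesebbernat", the suffix loop removes "t" and then "na" before the doubled "bb" is collapsed, giving "seber".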
WOLAYTTA STEMMERS
Wolaytta is categorized under the Omotic language family. Lemma [39] developed a stemmer for Wolaytta based on the morphological nature of the language. As stated in his paper, Wolaytta relies on suffixation to form the different forms of a given word, and concatenation of suffixes is common. As a result, two or more suffixes may be concatenated and attached to a word. The number of possible combinations can be very large, which makes it difficult to compile a complete list of concatenations. Moreover, because one suffix attaches to another, concatenation produces long suffixes. Hence, iteratively removing each base suffix one at a time was considered the best choice, and the algorithm adopted an iterative approach to stemming Wolaytta text.
The stemmer developed for Wolaytta text is context sensitive. Lemma employed a semi-automatic means to compile the list of possible suffixes. In compiling the suffix dictionary, the words in the sample text were first written in reverse order. The reversed list of words was then sorted, the frequencies of matching substrings were identified, and finally the substrings that occur more than once were selected as suffixes. In operation, a word is given as input to the stemmer, and the algorithm checks whether a suffix from the list is attached to it. Suffixes are iteratively stripped from the word, and after the necessary conditions have been applied, the final word is taken as the stem.
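The semi-automatic suffix compilation and the iterative stripping described above might look roughly like this in Python. The parameters `max_len`, `min_count` and `min_stem` are assumptions, the candidate list would still need manual checking as in the original study, and the test words are made-up strings rather than Wolaytta data.

```python
from collections import Counter


def compile_suffixes(words, max_len=4, min_count=2):
    """Reverse each distinct word, sort the reversed list and count shared
    beginnings (i.e. shared endings of the original words); keep those found
    in at least `min_count` words as candidate suffixes for manual review."""
    counts = Counter()
    # sorting is not needed for counting; it mirrors the described procedure
    for rev in sorted(w[::-1] for w in set(words)):
        # leave at least one letter in front of any candidate suffix
        for n in range(1, min(max_len, len(rev) - 1) + 1):
            counts[rev[:n]] += 1
    return {s[::-1] for s, c in counts.items() if c >= min_count}


def iterative_stem(word, suffixes, min_stem=3):
    """Strip the longest matching suffix again and again while at least
    `min_stem` letters remain (the context-sensitive condition)."""
    ordered = sorted(suffixes, key=len, reverse=True)
    changed = True
    while changed:
        changed = False
        for s in ordered:
            if word.endswith(s) and len(word) - len(s) >= min_stem:
                word, changed = word[: -len(s)], True
                break
    return word
```

With the made-up list ["abcxy", "defxy", "ghixy", "abcz"], `compile_suffixes` returns {"y", "xy"}, and `iterative_stem("abcyxy", {"y", "xy"})` strips "xy" and then "y", giving "abc".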
OROMO STEMMERS
Oromigna is categorized under the Cushitic language family. One stemmer for Oromigna is the one developed by Wakshum [55]. It uses a suffix table in combination with rules that strip the suffix from a given word by looking up the longest matching suffix in the suffix list. The 342 suffixes were compiled automatically by counting and sorting the most frequent word endings, and other linguistically valid suffixes were added manually. The stemmer finds the longest suffix that matches the end of a given word and removes it.
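A longest-match lookup of this kind can be sketched as below. The four-entry suffix set is a hypothetical sample standing in for the 342-entry table of [55], and `min_stem` is an assumed minimum stem length.

```python
# HYPOTHETICAL sample of Afan Oromo suffixes (the table in [55] has 342).
SUFFIXES = {"oota", "ota", "ni", "n"}
MAX_SUFFIX = max(len(s) for s in SUFFIXES)


def longest_match_stem(word: str, min_stem: int = 2) -> str:
    """Strip the single longest suffix in the table that matches the word ending."""
    longest = min(MAX_SUFFIX, len(word) - min_stem)
    for n in range(longest, 0, -1):  # try long endings before short ones
        if word[-n:] in SUFFIXES:
            return word[:-n]
    return word
```

Looking endings up in a set, from the longest possible length downwards, means that "oota" is found before the shorter "ota" without scanning the whole table.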
Debela Tesfaye and Ermias Abebe [6] developed an Oromigna stemmer by addressing some problems of the stemmer developed by Wakshum M. Their stemmer adopts several concepts from the Porter stemmer: the measure, the arrangement of rules in clusters, and the analysis of word formation based on the nature of word endings. This Afan Oromo stemmer is based on a series of steps, each of which removes a certain type of affix by way of substitution rules. These rules apply only when certain conditions hold, e.g. the resulting stem must have a certain minimal length. Most rules have a condition based on the measure, which is the number of vowel-consonant sequences present in the resulting stem. This condition prevents letters that look like a suffix but are actually part of the stem from being removed.
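The measure condition can be sketched in Python as follows, assuming the Afan Oromo (Qubee) vowel letters a, e, i, o, u and treating every other letter as a consonant. `strip_if_measure` is a hypothetical helper showing how a rule would use the measure; it is not taken from [6].

```python
VOWELS = set("aeiou")  # assumption: all other letters count as consonants


def measure(stem: str) -> int:
    """Porter's measure m for a stem of the form [C](VC)^m[V]:
    the number of vowel-to-consonant transitions."""
    m = 0
    prev_is_vowel = False
    for ch in stem.lower():
        is_vowel = ch in VOWELS
        if prev_is_vowel and not is_vowel:
            m += 1
        prev_is_vowel = is_vowel
    return m


def strip_if_measure(word: str, suffix: str, min_m: int = 1) -> str:
    """Remove `suffix` only if the remaining stem has measure >= min_m."""
    if word.endswith(suffix):
        rest = word[: -len(suffix)]
        if measure(rest) >= min_m:
            return rest
    return word
```

The condition is what stops a word that merely looks like a suffix (here "oota" on its own) from being reduced to nothing.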
TIGRIGNA STEMMERS
Tigrigna is categorized under the Semitic language family. The first attempt to develop a Tigrigna stemmer was made by Girma Berhe [44]. The algorithm uses an iterative approach, and when two affixes match the word, it removes the longer one. As Tigrigna is a morphologically complex language, a context-sensitive stemmer was considered appropriate. The stemmer applies five steps to remove affixes. The first step takes the word to be stemmed as input and removes double-letter reduplication. The second step takes the output of the first step and removes a prefix-suffix pair: if the word matches a prefix-suffix pair and the remaining string is longer than the minimum length, both the prefix and the suffix are removed. The third step takes the output of prefix-suffix stripping and removes prefixes; it checks for a match in the prefix list and counts the length of the remaining string. The fourth step accepts the output of the previous step and checks whether the word matches any entry in the suffix list; if it does and the remaining string is longer than the minimum length, the suffix is removed. In the last step, the algorithm removes single-letter reduplication. A recoding rule is applied after each step to check for spelling exceptions and make readjustments.
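The five steps can be arranged as a pipeline, as in this Python sketch. The affix lists, `MIN_LEN`, the empty `RECODE` table and the reading of "double" versus "single" letter reduplication are assumptions made for illustration, and the test words are transliterated toy strings rather than Tigrigna data from [44].

```python
# Five-step stemming pipeline (illustrative sketch).
# All affix lists, MIN_LEN and RECODE are HYPOTHETICAL.
PREFIX_SUFFIX_PAIRS = [("zi", "u")]
PREFIXES = ["te", "ze"]
SUFFIXES = ["tat", "at", "u"]
MIN_LEN = 2   # a remaining string must be longer than this
RECODE = {}   # hypothetical spelling readjustments: {ending: replacement}


def recode(word: str) -> str:
    """Recoding rule applied after every step (no-op while RECODE is empty)."""
    for old, new in RECODE.items():
        if word.endswith(old):
            return word[: -len(old)] + new
    return word


def remove_double_reduplication(word: str) -> str:
    """Step 1: collapse the first immediately repeated letter pair ("abab" -> "ab")."""
    for i in range(len(word) - 3):
        if word[i:i + 2] == word[i + 2:i + 4]:
            return word[:i + 2] + word[i + 4:]
    return word


def remove_prefix_suffix_pair(word: str) -> str:
    """Step 2: strip a listed prefix-suffix pair if enough of the word remains."""
    for pre, suf in PREFIX_SUFFIX_PAIRS:
        if word.startswith(pre) and word.endswith(suf):
            rest = word[len(pre): len(word) - len(suf)]
            if len(rest) > MIN_LEN:
                return rest
    return word


def remove_prefix(word: str) -> str:
    """Step 3: strip the longest matching prefix."""
    for pre in sorted(PREFIXES, key=len, reverse=True):
        if word.startswith(pre) and len(word) - len(pre) > MIN_LEN:
            return word[len(pre):]
    return word


def remove_suffix(word: str) -> str:
    """Step 4: strip the longest matching suffix."""
    for suf in sorted(SUFFIXES, key=len, reverse=True):
        if word.endswith(suf) and len(word) - len(suf) > MIN_LEN:
            return word[: -len(suf)]
    return word


def remove_single_reduplication(word: str) -> str:
    """Step 5: collapse immediately repeated letters ("bb" -> "b")."""
    out = []
    for ch in word:
        if not out or out[-1] != ch:
            out.append(ch)
    return "".join(out)


STEPS = (remove_double_reduplication, remove_prefix_suffix_pair,
         remove_prefix, remove_suffix, remove_single_reduplication)


def stem(word: str) -> str:
    for step in STEPS:
        word = recode(step(word))
    return word
```

For "zikababu", step 1 collapses "abab" to "ab" and step 2 then removes the pair "zi...u", leaving "kab".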
Another Tigrigna stemmer was developed by Yonas Fisseha [54], who presents a rule-based stemming algorithm for Tigrigna. The algorithm works through a set of steps, each composed of a collection of rules. Each rule specifies the affixes to be removed, the minimum length allowed for the stem, and a list of exceptions. Because Tigrigna has many exceptions to any stemming rule, the researcher took these exceptions into account in designing the stemmer. An analysis of the inflectional and derivational affixes of the language was necessary for this work, and the stemmer was designed around a new classification of words according to their affixes. Stemming is performed by a rule-based algorithm that removes affixes. The researcher undertook the work because past research on Tigrigna stemming is limited; after analyzing Tigrigna grammatical rules, he chose to remove both inflectional and derivational affixes and designed a new rule set for the stemmer. The goal of the research was to develop and document a new rule-based stemmer for Tigrigna.
AMHARIC STEMMERS
Amharic is categorized under the Semitic language family and has very rich morphology. The main earlier contribution to stemming Amharic is the work by Nega [32], which investigated the effect of stemming on information retrieval. Nega Alemayehu and Peter Willett [3] developed a stemmer for information retrieval purposes, in a manner analogous to that used previously for a Slovene stemmer [36]. Specifically, the algorithm first identifies a set of stop words and then a set of affixes associated with the remaining content-bearing words, and the characteristics of the resulting affixes were used to guide the development of the stemmer. The stemmer removes affixes by iterative procedures that employ a minimum stem length, recoding rules and context-sensitive rules, with prefixes removed before suffixes. Once the stem of a word is obtained, the root is obtained by stripping all the remaining vowels. These studies also indicated a positive effect of using stemmed forms in information retrieval: Alemayehu and Willett [3] compared the performance of word-based, stem-based and root-based retrieval, and showed better recall for stem- and root-based retrieval than for word-based retrieval.
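Because vowels are fused into the syllables of the Ethiopic script, one way to picture "stripping the remaining vowels" is to rewrite every syllable in its sixth-order (vowelless) form. The sketch below does this in Python under an assumption about the Unicode Ethiopic layout (each consonant occupies a row of eight code points beginning at a multiple of eight, with the sixth order at offset 5). It is not the procedure of [3], which may have operated on a transliteration, and it discards labialization (ሏ becomes ል).

```python
# Assumes the regular Unicode Ethiopic layout: each consonant row has 8 code
# points starting at a multiple of 8, and the 6th order sits at offset 5.
ETHIOPIC_ROWS = ((0x1200, 0x1357), (0x2DA0, 0x2DDE))


def to_sixth_order(ch: str) -> str:
    """Map any syllable of a regular Ethiopic row to its 6th-order form."""
    cp = ord(ch)
    if any(lo <= cp <= hi for lo, hi in ETHIOPIC_ROWS):
        return chr(cp - cp % 8 + 5)
    return ch  # non-Ethiopic characters are left unchanged


def root_from_stem(stem: str) -> str:
    """'Strip the vowels' of a stem written in Ethiopic script."""
    return "".join(to_sixth_order(ch) for ch in stem)
```

For example, ሰበረ 'he broke' and ሰባሪ 'breaker' both map to ስብር.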
The other Amharic stemmer was developed by Atelach Alemu and Lars Asker [7]. The stemmer
finds all possible segmentations of a given word according to the morphological rules of the
language and then selects the most likely prefix and suffix for the word based on corpus statistics.
It strips off the prefix and suffix and then tries to look up the remaining stem (or alternatively,
some morphologically motivated variants of it) in a dictionary to verify that it is a possible stem of
the word.
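The segment, rank and verify idea can be sketched as follows. The affix sets, the frequency table (including a count for the empty affix) and the one-word dictionary are hypothetical transliterated toy data, and scoring by the sum of affix frequencies is an assumed simplification of the corpus statistics used in [7]; morphologically motivated variants of the stem are not generated.

```python
from itertools import product


def segmentations(word, prefixes, suffixes):
    """Yield every (prefix, stem, suffix) split using a listed affix or none."""
    for p, s in product(sorted(prefixes | {""}), sorted(suffixes | {""})):
        if word.startswith(p) and word.endswith(s) and len(p) + len(s) < len(word):
            yield p, word[len(p): len(word) - len(s)], s


def best_stem(word, prefixes, suffixes, affix_freq, dictionary):
    """Rank splits by the corpus frequency of their affixes and return the
    best-ranked stem that the dictionary confirms; otherwise the word itself."""
    def score(split):
        p, _, s = split
        return affix_freq.get(p, 0) + affix_freq.get(s, 0)

    ranked = sorted(segmentations(word, prefixes, suffixes),
                    key=lambda split: (-score(split), split[1]))
    for _, stem, _ in ranked:
        if stem in dictionary:
            return stem
    return word
```

With prefixes {"be", "ye"}, suffixes {"och", "u"} and a dictionary containing only "bet", the split ye + bet + och ranks highest for "yebetoch" and is confirmed by the dictionary.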
As described above, there have been a number of efforts to develop stemming algorithms for local languages. However, to the researcher's knowledge, no attempt has been made to develop a stemmer for Kemantney. Hence, the aim of the current study is to investigate the characteristics of Kemantney word formation and to develop a stemmer for the language.
CHAPTER THREE
MORPHOLOGY OF KEMANTNEY LANGUAGE
Linguistics is concerned with languages and their structures. It studies languages at different levels, such as sounds, words and sentences, and has different branches that deal with specific features of a language. The four major branches of linguistics commonly described in the literature are phonology, syntax, semantics and morphology [23].
Phonology concerns sounds (phones) and their features. The general understanding in linguistics is that human utterances are produced from distinct sounds that, in combination, form words and other meaningful linguistic units [23]. Hence, phonology also tries to identify the rules that govern the combination and co-occurrence of phones in words and sentences. Syntax concerns the combination of words into phrases and of phrases into sentences, and the rules that govern these combinations. Semantics is concerned with the meanings of words and sentences and their interpretation. Morphology, on the other hand, concerns words and their internal structure [19]: it deals with the constituents of words and the rules that govern their co-occurrence. This study concentrates specifically on morphology, whose principal concern is words and their internal structure [53].
3.1. MORPHOLOGY
Morphology plays an important role in guiding the development of the stemmer, since it describes the nature and characteristics of suffixes in Kemantney words. Developing a stemming algorithm for a language such as Kemantney therefore requires analyzing, understanding and modeling how words are formed in that language.
A single Kemantney word can convey several different messages. This shows the importance of studying Kemantney morphology for the purposes of modeling it, promoting it and developing automatic procedures for the language. Accordingly, this study examines the inflectional and derivational morphology of the language, since it is the nature and characteristics of suffixation that guide the development of the stemming algorithm.
As discussed by Trost [53], there are two kinds of morphology: inflectional and derivational. Inflectional morphology is concerned with inflectional changes in words, where word stems are combined with grammatical markers for features such as person, gender, number, tense, case and mood. Inflectional changes do not change the part of speech. Derivational morphology deals with changes that alter the class of a word (its part of speech; for example, noun → verb or verb → adjective), and it is less productive than inflectional morphology.
A morpheme is the smallest linguistic unit that has a meaning or grammatical function in the language.
3.2. WORD FORMATION IN KEMANTNEY
Zelealem [1] mentions the following syntactic features of Cushitic that give the modern Ethio-Semitic languages their peculiarly non-Semitic character:
• Pre- and postpositions (Cushitic has postpositions, Semitic has prepositions);
• Clause-final position of main verbs and auxiliaries;
• Subordinate clauses precede main clauses;
• Modifiers (adjectives, relative clauses and genitive phrases) precede the nouns they modify.
Kemantney has the general Cushitic word order. Most words in Kemantney end in vowels, and all vowels appear in word-medial and word-final positions. Declarative sentences are predominantly Subject-Object-Verb (SOV). The features that Kemantney adopted from Amharic, especially in syntax, which are distinctively Amharic rather than Agew, are not very striking, because the replacing language (Amharic) has adopted the structure of the replaced language, so that the two have equivalent systems. The adjutative is confused with the causative. Reduplication seems to be forgotten faster than other grammatical categories such as suffixation; hence, the frequentative and the reciprocal show striking dissimilarity among informants. The gerundive is simplified, and as a result speakers use bare stems without the gerundive marker. Nominal and adjectival derivations have become closed classes, so derivational suffixes are limited to a few words only. Case simplification is another area where terminal speakers show variation; the nominative and genitive cases, for example, often appear without their case markers. Plural marking, which was heterogeneous from the outset, shows a high degree of deviation: terminal speakers tend toward uniformity, using the suffix /-እክ/ indiscriminately [1].
The vowel inventory of Kemantney consists of the following front, central and back vowels:
Front    Central    Back
ኢ        እ          ኡ
ኤ        ኧ          ኦ
         ኣ
The sounds of Kemantney are represented by the letters shown in Table 3.1.
Table 3.1: Sounds of Kemantney
Fluent speakers produce these sounds accurately, whereas passive speakers have difficulty producing them.
3.3. INFLECTIONAL SUFFIXES OF KEMANTNEY
3.3.1. NOUNS
According to Zelealem [1], Kemantney nouns are either simple or derived. While simple nouns are abundant, derived ones are exceedingly rare.
3.3.1.1. ACTION NOMINALS
There are some native action nominals derived from verbal stems.
Verb root Nominal Verb gloss Nominal gloss
ዋኝⷐር- ዋኝⷐር Ask Question
ወንተርሽ- ወንተርሽ Answer Answer
ክልኝ- ክልኝ Dance Dance
ዋጘርት- ዋጘርት Play Game
ሹም- ሹም Fast Fasting
ሻጝ- ሻጝ Urinate Urine
ዅይ- ------ Eat Eating
ጋኝ- ------ Run Running
ጃⷕ- ------ Drink Drinking
Table 3.2: Sample list of action nominal
The possible and most frequent forms of the action nominal are identical in form to the corresponding verb stems and imperatives: ዋኝⷐር ‘ask! (2S)’, ወንተርሽ ‘answer! (2S)’, ክልኝ ‘dance! (2S)’, ሹም ‘fast! (2S)’, ዋጘርት ‘play! (2S)’, ሻጝ ‘urinate! (2S)’, ጃⷕ ‘drink! (2S)’.
There are plenty of examples derived with the suffix /-ኣ/ as in ኯሰንት-‘use a pillow’ → ኯሰንት-ኣ
‘pillow’; ጏዝ-‘farm’ → ጏዝ-ኣ ‘farm’ ; ትⷕዝ- ‘smoke’ → ትⷕዝ-ኣ ‘smoke’; ⷕይር- ‘smell’ → ⷕይር-ኣ ‘smell’;
አደይት- ‘borrow’ → አደይት-ኣ ‘loan’.
Nevertheless, this derivational pattern does not hold for all such nominals. For example, one would expect ጋኝ, ዅይ and ጃⷕ to serve as the action nominals for 'run', 'eat' and 'drink' respectively, but no corresponding nominal is listed for them in Table 3.2.
3.3.1.2. GENDER
Kemantney nouns have a two-term gender system: masculine and feminine. The distinction is shown in the singular. As in many other languages, natural gender operates, as in, for instance, አባ 'father' vs. ገና 'mother', ዘን 'brother' vs. ሸን 'sister', ቴር 'aunt' vs. አግ 'uncle', and ቢራ 'ox' vs. ከማ 'cow'. Gender is also expressed with the words ጕርዋ 'male' and ይውና 'female', as in ጕርዋ/ጕሬይር 'a male person (man)' and ይውኔዅራ 'a female person (woman)'.
In nouns with quantifiers, the masculine is indicated by /-ጝ/ and the feminine by /-ይ/, which are identical to the relative markers.
ላ-ይ ሻሹና One-F chick = one chick (F) ላ-ጝ ግዝኝ One-M dog = One dog (M)
ላ-ጝ ሻሹኒ One-M chick = one chick (M) ላ-ይ ይር One-F person = One person (F)
ላ-ይ ድⶓራ One-F donkey = one donkey (F) ላ-ጝ ይር One-M person = One person (M)
ላ-ጝ ድⶓሪ One-M donkey = one donkey (M) ላ-ይ ፍንትራ One-F goat = One goat (F)
ላ-ይ ግዝኝ One-F dog = one dog (F) ላ-ጝ ፍንትሪ One-M goat = One goat (M)
In the second person singular (2S), gender is distinguished neither in pronouns nor in verbs. The distinction between the 3FS and 3MS pronouns is also neutralized in the speech of most terminal speakers.
3.3.1.3. NUMBER
Kemantney nouns distinguish two numbers: singular and plural.
Singular Plural Singular Plural
በልጋ Barley በልገክ Barley ዲባ Mountain ድብክ Mountains
ሽብካ Hair ሽብከክ Hairs ክብና Forest ክብንክ Forests
ገር Calf ገርክ Calves ክርና Stone ክርንክ Stones
ሻንካ Grass ሻንክክ Grasses ካና Wood ካንክ Woods
እንሽዋ Mouse እንሽክ Mice አሻ Leaf አሽክ Leaves
ትዀና Bedbug ትዀንት Bedbugs ጋጛ Cliff ጋጝክ Cliffs
አርግ Bed አርግክ Beds ውላጛ Field ውላጝክ Fields
ኵራ River ኵርክ Rivers እኝዅ Ear እኝዀክ Ears
Table 3.3: Sample list of Kemantney noun
Plural marking in Kemantney is quite heterogeneous.
1. By eliding the final vowel (ኣ → Ø) and then by adding /-እክ/:
Singular Plural Singular Plural
በይላ Mule በይልክ Mules ዲርዋ Hen ዲርውክ Hens
ከተማ Town ከተምክ Towns ዲⷓ Poor ዲⷕክ Poors
ፈርዛ Horse ፈርዝክ Horses ጏርዋ Road ጏርውክ Roads
ይውና Female ይውንክ Females ጕርዋ Male ጕርውክ Males
2. By suffixing /-ክ(እ)/:
Singular Plural
ከው Village ከውክ Villages
ግዝኝ Dog ግዝኝክ Dogs
ንኝ House ንኝክ Houses
ሰይኝ Cloth ሰይኝክ Clothes
ደብር Church ደብርከ Churches
ሽኙ Name ሽኙክ Names
3. By eliding the final syllable (here -ያ) and then suffixing /-ክ/:
Singular Plural
ሻሊያ Knife ሻሊክ Knives
ዳሚያ Cat ዳሚክ Cats
እኝጊያ Horn እኝጊክ Horns
4. By suffixing /-ት(እ)/:
Singular Plural
ዘን Brother ዘንት Brethren /Brothers
ሸን Sister ሸንት Sisters
አብ Guest አብት Guests
5. By suffixing /-ተት/ to nouns denoting paired items:
Singular Plural
ይል Eye ይልተት Eyes
ናን Hand ናንተት Hands
6. By eliding the final vowel and suffixing /-ኧክ/:
Singular Plural
ክምብ Stick ክምበክ Sticks
7. By changing the final vowel (ኣ → እ):
Singular Plural
ፍንትራ Goat ፍንትር Goats
ማⷕላ Friend ማⷕል Friends
ከማ Cow ከም Cows
8. By changing the stem-final consonant to /ል/ and suffixing /-ት(እ)/:
Singular Plural
ቢራ Ox ቢልት Oxen
ድⶓራ Donkey ድⶓልት Donkeys
ገር Calf ገልት Calves
9. By suffixing /-ኣን/:
Singular Plural
እርኵ Tooth እርኳን Teeth
10. By final vowel change /ኣ→እ/:
Singular Plural
ጏዘንታ Farmer ጏዘንት Farmers
ኪንሸንታ Teacher ኪንሸንት Teachers
11. By partial reduplication and adding -ን:
Singular Plural
ልኵ Foot ልኯኯን Feet
12. By way of gemination:
Singular Plural
ዅራ Child ዅርራ Children
13. By suppletion:
Singular Plural
ይር Person እይየን Persons
14. Most borrowed nouns and adjectives use /-ኧን/:
Singular Plural
ሰይጣን Devil ሰይጣነን Devils
ጣት Finger ጣተን Fingers
ረባሽ Naughty ረባሸን Naughty
ጏበዝ Clever ጏበዘን Clever
ⷐብት Cattle ⷕብተን Cattle
ⷓብታም Rich ⷓብታመን Rich
Most adjectives show the plural through complete reduplication as shown below.
Complete reduplication:
Singular Plural
ሽⶕይ Small ሽⶕይ-ሽⶕይ Small
ፋራጝ Big ፋራጝ-ፋራጝ Big
ለገዛጝ Tall ለገዛጝ-ለገዛጝ Tall
Complete reduplication and suffixing /-ኧን/ or /-ኣ/:
Singular Plural
ግሴር Short ግሴረንግሴረን Short
ሸመን Black ሸመናሸመና Black
A few adjectives form their plural by suffixing /-ኧው/:
Singular Plural
ይዘን Fat ይዘነው Fat
አዚ New አዘው New
The main plural marking processes are suffixation and reduplication. The rest include vowel
changes, vowel addition, consonant change, gemination and suppletion.
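Several of the plural patterns above can be undone mechanically, which is what a stemmer needs in order to conflate singular and plural forms. The Python sketch below assumes the regular Unicode Ethiopic layout (rows of eight code points starting at multiples of eight, with the 1st, 4th and 6th orders at offsets 0, 3 and 5) and an assumed minimum stem length of two. It handles complete reduplication and patterns 1, 2, 6 and 14, and its final normalization step also covers patterns 7 and 10; syllable deletion, /-ት/ and /-ተት/, consonant change, gemination and suppletion are not handled. It also over-stems some words: ሸመን 'black' already ends in a 1st-order syllable plus ን and would be reduced to ሸም.

```python
# Conflating singular and plural noun forms (illustrative sketch).
# Assumes the regular Unicode Ethiopic layout: each consonant row has 8 code
# points starting at a multiple of 8; offsets 0, 3 and 5 are the 1st (-ä),
# 4th (-a) and 6th (bare consonant or -ə) orders.
ETHIOPIC_ROWS = ((0x1200, 0x1357), (0x2DA0, 0x2DDE))


def order(ch: str):
    """Return the 0-based order offset of an Ethiopic syllable, else None."""
    cp = ord(ch)
    if any(lo <= cp <= hi for lo, hi in ETHIOPIC_ROWS):
        return cp % 8
    return None


def to_sixth(ch: str) -> str:
    """Rewrite a 1st- or 4th-order syllable in its 6th-order form."""
    o = order(ch)
    return chr(ord(ch) - o + 5) if o in (0, 3) else ch


def undo_full_reduplication(word: str) -> str:
    """Reduce a completely reduplicated adjective (X-X or XX) to X."""
    w = word.replace("-", "")
    half = len(w) // 2
    if len(w) % 2 == 0 and half >= 2 and w[:half] == w[half:]:
        return w[:half]
    return word


def conflate_plural(word: str, min_stem: int = 2) -> str:
    w = undo_full_reduplication(word)
    if not w:
        return w
    # Pattern 14 (-ኧን): final ን after a 1st-order syllable, e.g. ጣተን -> ጣት
    if len(w) > min_stem and w[-1] == "ን" and order(w[-2]) == 0:
        return w[:-2] + to_sixth(w[-2])
    # Patterns 1, 2 and 6 (-እክ, -ክ(እ), -ኧክ): drop a final ክ
    if len(w) > min_stem and w[-1] == "ክ":
        w = w[:-1]
    # Normalize a final 1st/4th-order syllable to the 6th order, so that
    # e.g. ሽብካ and ሽብከ(ክ) meet; this also covers patterns 7 and 10
    return w[:-1] + to_sixth(w[-1])
```

For example, በይላ 'mule' and በይልክ 'mules' both reduce to በይል, and ግሴር 'short' and ግሴረንግሴረን both reduce to ግሴር.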
3.3.1.4. CASE SYSTEM
NOMINATIVE CASE
In Kemantney, the nominative (NOM) case is morphologically indicated by the suffix /-ኢ/ in
definite nouns.
Kemantney words:
Absolutive Nominative Absolutive Nominative
ፈርዛ ፈርዛ-ኢ Horse ⷓሸና ⷓሸና-ኢ Thief
ዅራ ዅራ-ኢ Child ሲያ ሲያ-ኢ Meat
ጃና ጃና-ኢ Elephant ገመና ገመና-ኢ Lion
Kemantney verbal nouns:
Absolutive Nominative
ⷓልና ⷓልና-ኢ Seeing
ጋኝና ጋኝና-ኢ Running
ዅና ዅና-ኢ Eating
Amharic loanwords:
Absolutive Nominative
አሳ አሳ-ኢ Fish
ካሳ ካሳ-ኢ Kasa (proper noun)
ካራ ካራ-ኢ Knife
ከተማ ከተማ-ኢ Town
As the examples above show, Kemantney nouns that end in /-ኣ/ drop this vowel and take the nominative marker /-ኢ/. This is a typical Cushitic feature: root- or stem-final vowels are dropped before a vowel-initial suffix, since vowel sequences are not allowed.
When a noun ends in a vowel other than /-ኣ/ or in a consonant, the nominative marker /-ኢ/ does not appear, as the following examples show:
ይር አዅ ጃⷕ-እⶖ = man water drink-PS The/A man drank water.
ግዝኝ ሲያ ዅይ-እⶖ = dog meat eat-PS The/A dog ate meat.
ገር ሸብ ጟብ-እⶖ = calf milk suck-PS The/A calf sucked milk.
ከልሜድ ከማ ⷓሸንት-እⶖ = shepherd cow steal-PS The/A shepherd stole a cow.
Personal and demonstrative pronouns provide evidence for this nominative marking rule, since their form prevents them from taking the suffix /-ኢ/. Proper nouns illustrate the same phenomenon: whereas ካሳ, ለማ and አሰፋ become ካሳ-ኢ, ለማ-ኢ and አሰፋ-ኢ respectively in the nominative, nouns that end in a consonant or in a vowel other than /ኣ/ are unmarked.
Nominative marking is also conditioned by gender, since only masculine nouns show the nominative marker /-ኢ/.
ድⶓራ አውኝ-እ-ት = donkey bray-PS-3F The/A donkey (F) brayed.
ድⶓራ-ኢ አውኝ-እⶖ = donkey-NOM bray-PS The donkey (M) brayed.
ፈርዛ ይር-እስ ደድ-እ-ት = horse tread on-PS-3F The /A horse (F) trod on the man.
ፈርዛ-ኢ ይር-እስ ደድ-እⶖ = horse-NOM tread on-PS The horse (M) trod on the man.
Only masculine nouns which end in /ኣ/ take the nominative marker.
In Kemantney, definiteness and case are inseparable, and the indefinite is the unmarked form. Definiteness and indefiniteness have no morphological marker of their own in either singular or plural nouns.
ይር ት-እⶖ = ይር ትⶖ The/A man came
ግዝኝ ከውይ-ኣጝ = ግዝኝ ከውያጝ The/A dog barked
ኵራ-ኢ ካግ-እⶖ = ኵሪ ካግⶖ The river dried
ይውኔ ዅራ ፈይ-ኢ-ት = ይውኔ ዅራ ፈዪት The/A woman went
እይየን ት-ን-እⶖ = እይየን ትንⶖ (The) people came
ግዝኝ-ክ በብ-ን-እⶖ = ግዝኝክ በብንⶖ (The) dogs barked
ኵራ-እክ ካግ-ን-እⶖ = ኵርክ ካግንⶖ (The) rivers dried
ይውን ፈይ-ን-እⶖ = ይውን ፈይንⶖ (The) women went
ACCUSATIVE CASE
Two suffixes indicate the accusative case: /-(እ)ት/ and /-(እ)ስ/. The vowel /እ/ is inserted when the noun to which the case suffix is attached ends in a consonant, to avoid impermissible clusters. As in the nominative, a noun takes an accusative suffix if it is definite. In feminine nouns the accusative is marked with /-(እ)ት/, and singular object personal pronouns take /-(እ)ት/ like feminine nouns. In masculine and plural nouns the accusative suffix is /-(እ)ስ/, and plural personal pronouns take /-(እ)ስ/ like masculine and plural nouns.
ይር ዛፍ-እስ ከብ-እⶖ = man tree-AC cut-PS = The man cut the tree.
ኳንት-ኢ ንኝ-እስ ⷐትይ-ኣጝ = ice-NOM house-AC puncture-PS =The ice pierced the house.
ገን-ኣጝ ይር ዅርራ-ስ ገውት-እⶖ = be old-RE man children-AC praise-PS = The old man praised the
children.
ⷕብት-ኧን ሻንካ-ስ ድኙ-እን-እⶖ = cattle-PL grass-AC finish-PL-PS = The cattle finished the grass.
አን ይ-ገና-ት አንጅኝ ⷓል-እⶖ = I my-mother-AC yesterday see-PS = I saw my mother yesterday.
ይውኔ ዅራ ኒሽ-ሰይኝ-እስ ⷐሽ-እ-ት = female child her-cloth-AC wash-PS-3F = The woman washed her
clothes.
A. Demonstrative pronouns take the accusative suffixes /-(እ)ስ/ and /-(እ)ት/.
Near Far
እን-ስ = እንስ (ይህን) This (M) ይን-ስ = ይንስ (ያን) That (M)
እን-ት = እንት (ይችን) This (F) ይን-ት = ይንት (ያችን) That (F)
እንደው-እስ = እንደውስ (እነዚህን) These (PL) ይንደው-እስ = ይንደውስ (እነዚያን) Those (PL)
B. Reflexive pronouns show the case suffix in both the base and the reflexive forms.
ይ-ት ይ-አⶓይ-እስ = my-AC my-head-AC I myself
ኵ-ት ቲ-አⶓይ-እስ = your-AC your-head-AC You yourself
ኒ-ት ኒ-አⶓይ-እስ = his-AC his-head-AC He himself
ኒ-ት ኒሽ-አⶓይ-እስ = her-AC her-head-AC She herself
አኔው-እስ አነ-አⶓይ-እስ = our-AC our-head-AC We ourselves
እንተዴው-እስ እንተ-አⶓይ-እስ = your-AC your-head-AC You yourselves
ናይዴው-እስ ና-አⶓይ-እስ = their-AC their-head-AC They themselves
C. Numerals show the accusative case suffix.
ላጝ-እት = ላጝት The one (feminine or small one) ነኘይኝ-እስ = ነኘይኝስ The twenty
ኒኛ-ት = ኒኛት The two ሳⶓይኝ-እስ = ሳⶓይኝስ The thirty
ሴጟ-ት = ሴጟት The three አርበ-ስ = አርበስ The forty
ሽካ-ት = ሽካት The ten ሊⷕ-እስ = ሊⷕስ The hundred
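A stemmer has to recognise the accusative suffixes described above. In Ethiopic script the epenthetic /እ/ is not written as a separate letter (it is carried by the preceding sixth-order letter), so attaching or removing the suffix is a matter of one final letter. The Python sketch below assumes a minimum stem length of two letters. It cannot tell an accusative ት or ስ from a stem-final one (e.g. ዲንት- 'swim') or from the plural /-ት/ of pattern 4, so a real rule would need a context condition or an exception list, and short pronouns such as ይት are better handled by a stop-word list.

```python
def add_accusative(noun: str, feminine: bool) -> str:
    """Attach -(እ)ት (feminine) or -(እ)ስ (masculine/plural).
    The epenthetic እ needs no separate letter in Ethiopic script."""
    return noun + ("ት" if feminine else "ስ")


def strip_accusative(word: str, min_stem: int = 2) -> str:
    """Remove a final ስ or ት if at least `min_stem` letters remain."""
    if word and word[-1] in ("ስ", "ት") and len(word) - 1 >= min_stem:
        return word[:-1]
    return word
```

For example, `add_accusative("ዛፍ", False)` gives ዛፍስ, matching ዛፍ-እስ above.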
GENITIVE CASE
The third form of morphological case is the genitive case. Like other modifying elements in noun
phrases, genitive noun phrases (NPs) precede the head they modify. The genitive endings are /-ኢ/,
/-ኧ/, /-ኤይ/, /-Ø/ and /-ድ/.
A. The following are examples with semantic relations of source and purpose and with the
endings /-ኢ/, /-Ø/, and /-ኧ/.
ሽⷕ-ኢ ንኝ = ሽⷒ ንኝ (የጭቃ ቤት) House of mud
ንኝ-Ø ሺⷓ = ንኝ ሺⷓ Mud for house
ሻንክ-ኢ ንኝ = ሻንኪ ንኝ (የሳር ቤት) House of grass
ንኝ-Ø ሻንካ = ንኝ ሻንካ Grass for house
ሳጝ-ኢ ሚዝ = ሳጚ ሚዝ (የማር ጠጅ) Mead of honey
ሚዝ-ኧ ሳጝያ = ሚዘ ሳጝያ Honey for mead
በልግ-ኢ ስላጛ = በልጊ ስላጛ (የገብስ ጠላ) Tella of barley
ስላጝ-ኢ በልጋ = ስላጚ በልጋ Barley for tella
ካን-ኢ አርግ = ካኒ አርግ (የእንጨት አልጋ) Bed of wood
አርግ-ኢ ካና = አርጊ ካና Wood for bed
ሻጝ-ኢ ክምብ = ሻጚ ክምብ (የብረት ዱላ) Stick of iron
ክምብ-ኢ ሻጛ = ክምቢ ሻጛ Iron for stick
B. Some genitive constructions show the suffix /-ኤይ/.
ከማ (ላም) + ጊያ (ቀንድ) → ከማ-ኤይ ጊያ Horn of cow
አሸር (አጋም) + ኵራ (ወንዝ) → አሸር-ኤይ ኵራ River of thorn
ዲርዋ (ደሮ) + ላባ → ዲርዋ-ኤይ ላባ Feather of hen
C. When a noun ends in a consonant cluster and therefore takes the epenthetic vowel /-እ/, the genitive is marked by a zero morpheme.
ክምብ-Ø + ሻጛ → ክምብ ሻጛ Stick of iron
ሰርግ-Ø + ክልኝ (ዘፈን) → ሰርግ ክልኝ Dance of wedding
ካርት-Ø (ሩቅ) + ዘመድ → ካርት ዘመድ Distant relative
D. Some genitive constructions show a vocalic change from /ኣ/ to /ኧ/.
በልጋ-ኧ (ገብስ) + ክርመና (ክምር) → በልገ ክርመና Heap of barley
ማይላ-ኧ (ማሽላ) + ስላጛ (ጠላ) → ማይለ ስላጛ Beer of sorghum
ዳውሻ-ኧ (ዳጉሳ) + ትⷕዛ (አረቂ) → ዳውሸ ትⷕዛ Alcohol of millet
E. The majority of genitive constructions in Kemantney show the suffix /-ኢ/ which is similar to
the nominative case marker.
ምራዋ-ኢ (እባብ) + አⶓይ (እራስ) → ምራዊ አⶓይ Head of snake
ዲርዋ-ኢ (ደሮ) + ይል (ዓይን) → ዲርዊ ይል Eye of hen
ንኝ-ኢ (ቤት) + በላ (በር) → ንኚ በላ Door of house
ኳራ-ኢ (ፀሐይ) + ትው (መግቢያ) → ኳሪ ትው Entrance of sun (west)
ኳራ-ኢ (ፀሐይ) + ፊያጝ (መዉጫ) → ኳሪ ፊያጝ Rise of sun (east)
F. Most Kemantney compound nouns have a similar structure and show /-ኢ/, as in the following examples.
ክመንታ-ኢ (ክማንት) + ታሪክ → ክመንቲ ታሪክ Story of Kemantney
ክብና-ኢ (ዱር/ደን) + እንስሲ (እንስሳ) → ክብኒ እንስሲ Animal of forest (wild animal)
ግርጋ-ኢ (ቀን) + ስርጝ (ስራ) → ግርኪ ስርጝ Labour of day (daily labour)
G. Some genitive noun phrases show no genitive marker at all, as in the following examples:
ጏንደር-Ø ይር A man of Gondar ⷕብተን-Ø መንጋ Herd of cattle
ብር-Ø ሳንቲም Coin of silver ክመም-Ø ሻዅ Spice of sauce
ጏኝ-Ø አዅ Water of well ሰይኝ-Ø ሱክ Shop of cloth
H. With proper nouns, the suffixes /-ድ/ and /-ኤይ/ appear on the possessor noun as possessive genitives. The morpheme /-ድ/ is suffixed to nouns that end in consonants, whereas its variant /-ኤይ/ appears with nouns that end in vowels.
ቢተዋ-ኤይ ሸመርጊና The spear of Bitwa
ካሳ-ኤይ ከማ The cow of Kasa
አዳም-ድ ሰይኝ The cloth of Adam
በላይ-ድ ከምብ The stick of Belay
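The choice between /-ድ/ and /-ኤይ/ depends only on whether the possessor ends in a consonant or a vowel, which can be approximated from the script. This Python sketch assumes the regular Unicode Ethiopic layout (rows of eight code points starting at multiples of eight, sixth order at offset 5) and treats a final sixth-order letter as consonant-final; since the sixth order also writes a consonant followed by /እ/, this is a heuristic rather than a phonological test.

```python
# Assumes the regular Unicode Ethiopic layout (rows of 8 code points).
ETHIOPIC_ROWS = ((0x1200, 0x1357), (0x2DA0, 0x2DDE))


def order(ch: str):
    """Return the 0-based order offset of an Ethiopic syllable, else None."""
    cp = ord(ch)
    if any(lo <= cp <= hi for lo, hi in ETHIOPIC_ROWS):
        return cp % 8
    return None


def possessive_genitive(name: str) -> str:
    """Attach the possessive genitive to a proper noun (segmented notation).
    Heuristic: a final 6th-order letter counts as consonant-final (-ድ);
    any other final letter counts as vowel-final (-ኤይ)."""
    return name + ("-ድ" if order(name[-1]) == 5 else "-ኤይ")
```

The function reproduces the four examples above: አዳም-ድ and በላይ-ድ, but ካሳ-ኤይ and ቢተዋ-ኤይ.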
I. Demonstrative pronouns show genitive case.
እን-ዝⶖ ጋላ = this-of is = It is of this (M) እን-ዝ ከው = እንዝ ከው = Of this (M) village
ይን-ዝⶖ ጋላ = that-of is = It is of that (M) ይን-ዝ ከው = ይንዝ ከው = Of that (M) village
እን-ሽⶖ ጋላ = this-F-of is = It is of this (F) እን-ሽ ከው = እንሽ ከው = Of this (F) village
ይን-ሽⶖ ጋላ = that-F-of is = It is of that (F) ይን-ሽ ከው = ይንሽ ከው = Of that (F) village
እንደው-ዝⶖ ጋላ = these-of is = It is of these እንደው-ዝ ከው = እንደውዝ ከው = Of these (PL) villages
ይንደው-ዝⶖ ጋላ = those-of is = It is of those ይንደው-ዝ ከው = ይንደውዝ ከው = Of those (PL) villages
OBLIQUE CASE
Kemantney, as is usual in SOV languages, has postpositions, which assign oblique cases to the nouns to which they are suffixed.
As noted by Tucker and Bryan [57], it is difficult to draw a clear distinction between postpositions and case endings in Cushitic languages, since the same elements are considered case suffixes by some Cushiticists and postpositions by others. Kemantney has the following postpositions.
ፈርዛ-ዝ = ፈርዛዝ By horse ጎንደር-(ዳ)ጛሽ = ጎንደር(ዳ)ጛሽ Up to Gondar
ይ-ዝ = ይዝ For me ይ-ወ = ይወ To me
ባር-እዝ = ባርዝ In the sea ጎንደር-ወ = ጎንደርወ To Gondar
ፌንስትረ-ዝ = ፌንስትረዝ Through the window ከተሚ-ዳይ = ከተሚዳይ Near the town
ⷓሸነ-ዝ = ⷓሸነዝ About the thief አመር-አጛሽ = አመርአጛሽ Until tomorrow
ሸመርጊነ-ዝ = ሸመርጊነዝ With a spear ኵሪ-ዳይ-ኢል = ኵሪዳዪል By the side of the river
መሽ-እዝ = መሽዝ In winter አብን-ድ(ክ) = አብንድ(ክ) With the guest
Simple sentences in Kemantney follow the general Cushitic word order; declarative sentences are predominantly SOV.
አን ኒ-ት ⷐል-እⶖ = I he-Ac see-PS I saw him
አን ሲያ-ስ ዅይ-እⶖ = I meat-Ac eat-PS I ate the meat
ታየ ኒ ምዚ-ስ ዅይ-እⶖ = Taye his-lunch-Ac eat-PS Taye ate his lunch
ኒ ካነ-ስ ከብ-እⶖ = He tree-Ac cut-PS He cut the tree
3.3.1.5. NOMINAL DERIVATION
INFINITIVAL NOMINAL
A. The infinitive is derived from the verb by suffixing /-(እ)ና/ to the verb stem.
ኵይ- Kill ኵ-ና = ኵና To kill ፈለገ- Desire ፈለገ-ና = ፈለገና To desire
ዅይ- Eat ዅ-ና = ዅና To eat ክልኝ- Dance ክልኝ-ና = ክልኝና To dance
ጃⷕ- Drink ጃⷕ-ና = ጃⷕና To drink ዲንት- Swim ዲንት-እና = ዲንትና To swim
ጏዝ- Plough ጏዝ-ና = ጏዝና To plough ገንጅ- Sleep ገንጅ-እና = ገንጅና To sleep
At word and morpheme boundaries, /ይ/ is deleted and the vowel /እ/ is fronted.
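The infinitive formation can be sketched as below. The condition under which /ይ/ is deleted is an assumed reading of the boundary rule: ይ is dropped when it follows a sixth-order syllable (ኵይ- → ኵና), but kept after other orders, since በይ- keeps it in በይ-ና in the negative infinitives under B. The order test assumes the regular Unicode Ethiopic layout, and the fronting of /እ/ is not modelled in the script.

```python
# Assumes the regular Unicode Ethiopic layout (rows of 8 code points).
ETHIOPIC_ROWS = ((0x1200, 0x1357), (0x2DA0, 0x2DDE))


def order(ch: str):
    """Return the 0-based order offset of an Ethiopic syllable, else None."""
    cp = ord(ch)
    if any(lo <= cp <= hi for lo, hi in ETHIOPIC_ROWS):
        return cp % 8
    return None


def infinitive(verb_stem: str) -> str:
    """Verb stem + -(እ)ና, dropping a final ይ that follows a 6th-order syllable."""
    if len(verb_stem) >= 2 and verb_stem[-1] == "ይ" and order(verb_stem[-2]) == 5:
        verb_stem = verb_stem[:-1]
    # the epenthetic እ of -(እ)ና is carried by the preceding 6th-order letter
    return verb_stem + "ና"
```

Under this reading, ኵይ- and ዅይ- give ኵና and ዅና, while ዲንት- simply gives ዲንትና.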
B. The negative infinitive ('not to') is expressed by a combination of two verbs: a main verb and a helping verb. The main verb takes the negative suffix /-(እ)ግ/, which appears with embedded verbs, and the helping verb በይ- 'give up/dismiss' takes the infinitive marker /-(እ)ና/.
Positive infinitive Negative infinitive
ዋስ-ና To hear ዋስ-ግ በይ-ና = hear-NEG give up-INF To give up without hearing
ገንጅ-እና To sleep ገንጅ-እግ በይ-ና = sleep-NEG give up-INF To give up without sleeping
ክልኝ-ና To dance ክልኝ-ግ በይ-ና = dance-NEG give up-INF To give up without dancing
C. Kemantney infinitive forms do not inflect for person, number or gender, but they do inflect for case.
አን ጋኝ-ነ-ስ ኒ አⷕ-ኧኵ = I run-INF-AC he know-IMP He knows my running
እንት ጋኝ-ነ-ስ ኒ አⷕ-ኧኵ = you run-INF-AC he know-IMP He knows your running
ኒ ጋኝ-ነ-ስ ኒ አⷕ-ኧኵ = she run-INF-AC he know-IMP He knows her running
ኒ ጋኝ-ነ-ስ ኒ አⷕ-ኧኵ = he run-INF-AC he know-IMP He knows his running
አኔው ጋኝ-ነ-ስ ኒ አⷕ-ኧኵ = we run-INF-AC he know-IMP He knows our running
እንተዴው ጋኝ-ነ-ስ ኒ አⷕ-ኧኵ = you run-INF-AC he know-IMP He knows your (PL) running
ናይዴው ጋኝ-ነ-ስ ኒ አⷕ-ኧኵ = they run-INF-AC he know-IMP He knows their running
Infinitival nominals inflect for the accusative case. The vowel /ኣ/ of the infinitive marker changes to /ኧ/, as is always the case at morpheme boundaries.
D. As nominals, infinitives also take the bound possessive pronouns and the nominative case.
ይ ጋኝ-ን-ኢ ዋስ-ስ-እⶖ = my-run-INF-NOM hear-PA-PS My running was heard
ቲ ጋኝ-ን-ኢ ዋስ-ስ-እⶖ = your-run-INF-NOM hear-PA-PS Your running was heard
ኒ ጋኝ-ን-ኢ ዋስ-ስ-እⶖ = his-run-INF-NOM hear-PA-PS His running was heard
ኒሽ ጋኝ-ን-ኢ ዋስ-ስ-እⶖ = her-run-INF-NOM hear-PA-PS Her running was heard
አነ ጋኝ-ን-ኢ ዋስ-ስ-እⶖ = our-run-INF-NOM hear-PA-PS Our running was heard
እንተ ጋኝ-ን-ኢ ዋስ-ስ-እⶖ = your-run-INF-NOM hear-PA-PS Your (PL) running was heard
ና ጋኝ-ን-ኢ ዋስ-ስ-እⶖ = their-run-INF-NOM hear-PA-PS Their running was heard
The /ኣ/ of the infinitive marker is deleted before /-ኢ/ due to the impermissibility of vowel
sequencing.
ABSTRACT NOMINALS
Abstract (AB) nominals are derived from simple nouns by suffixing /-ነይ/
ሸጛ → ሸጝ-ነይ Mud → Mudness ይውና → ይውን-ነይ Female → Femaleness
ዅራ → ዅር-ነይ Child → Childhood ዲⷓ → ዲⷕ-ነይ Poor → Poorness
ጕርዋ → ጕርው-ነይ Male → Maleness ⷓሸንታ → ⷓሸን-ነይ Thief → Stealing
This pattern is retained only by some active terminal speakers. Passive terminal speakers use the simple nominal forms for the abstract, whereas others use borrowings from Amharic.
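Both the derivation and its inverse, which is what a stemmer needs, can be sketched in Python. The sketch assumes the regular Unicode Ethiopic layout (4th order at offset 3 and 6th order at offset 5 of each eight-code-point row) and an assumed minimum stem length of two; irregular forms such as ⷓሸንታ → ⷓሸን-ነይ, where a whole syllable is lost, are not covered.

```python
# Assumes the regular Unicode Ethiopic layout (rows of 8 code points).
ETHIOPIC_ROWS = ((0x1200, 0x1357), (0x2DA0, 0x2DDE))


def order(ch: str):
    """Return the 0-based order offset of an Ethiopic syllable, else None."""
    cp = ord(ch)
    if any(lo <= cp <= hi for lo, hi in ETHIOPIC_ROWS):
        return cp % 8
    return None


def abstract_nominal(noun: str) -> str:
    """Drop a noun-final -a (4th order -> 6th order) and add -ነይ."""
    last = noun[-1]
    if order(last) == 3:
        last = chr(ord(last) + 2)
    return noun[:-1] + last + "ነይ"


def strip_abstract(word: str, min_stem: int = 2) -> str:
    """Stemming direction: remove a final -ነይ if enough of the word remains."""
    if word.endswith("ነይ") and len(word) - 2 >= min_stem:
        return word[:-2]
    return word
```

For example, ዅራ 'child' yields ዅርነይ 'childhood', and stripping -ነይ from ዅርነይ gives back the bare stem ዅር.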
Like other nouns, abstract nominals show the following possessive paradigm.
ቲ-ዅር-ነይ = your-child-AB Your (SG) childhood
ኒ-ዅር-ነይ = his-child-AB His childhood
ኒሽ-ዅር-ነይ = her-child-AB Her childhood
አነ-ዅር-ነይ = our-child-AB Our childhood
እንተ-ዅር-ነይ = your-child-AB Your (PL) childhood
ና-ዅር-ነይ = their-child-AB Their childhood
AGENTIVE NOMINALS
Agentive nominal derivation is by far the most productive of these processes and is still well
retained. Its pattern is [verb stem + -ኧንታ], as the following examples illustrate.
ጏዝ- Farm ጏዘንታ Farmer
አⷕ- Know አⷐንታ Knowledgeable
ኪንሽ- Teach ኪንሸንታ Teacher
ⷓሸንት- Steal ⷓሸ-ኧና/-ኧንታ = ⷓሸና/ⷓሸንታ Thief
ክልኝ- Dance ክልኘንታ Dancer
ሽው- Beg ሽወንታ Beggar
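The agentive pattern can be sketched in the same way. In writing, the /ኧ/ of the suffix fuses with the stem-final consonant, so a 6th-order syllable becomes 1st order (ጏዝ- → ጏዘንታ). The Python sketch below makes the same assumption about the Unicode row layout of the Ethiopic blocks, and its function names are hypothetical. It derives the regular forms and also inverts them for stemming. The irregular ⷓሸንታ 'thief' is not handled. A real stemmer would also need to check that a word ending in -ንታ is actually an agentive.

```python
# Sketch: agentive nominals, [verb stem + -ኧንታ], on Ethiopic script.
# Assumption: regular rows are 8 code points wide, vowel orders at offsets 0-6.
REGULAR_RANGES = [(0x1200, 0x135A), (0x2DA0, 0x2DDE)]


def shift_order(ch: str, order: int) -> str:
    """Rewrite syllable `ch` in vowel order `order` (1-7) of its own row."""
    cp = ord(ch)
    for start, end in REGULAR_RANGES:
        if start <= cp <= end:
            return chr(cp - (cp - start) % 8 + order - 1)
    raise ValueError(f"{ch!r} is not in a regular Ethiopic syllable row")


def derive_agentive(stem: str) -> str:
    """Stem-final 6th-order syllable -> 1st order (fused /ኧ/), then -ንታ."""
    return stem[:-1] + shift_order(stem[-1], 1) + "ንታ"


def agentive_to_stem(noun: str) -> str:
    """Inverse for a stemmer: strip -ንታ and restore the 6th order."""
    if not noun.endswith("ንታ") or len(noun) < 3:
        raise ValueError(f"{noun!r} is not a regular agentive form")
    base = noun[:-2]
    return base[:-1] + shift_order(base[-1], 6)


print(derive_agentive("ክልኝ"), agentive_to_stem("ሽወንታ"))
```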
3.3.2. PRONOUNS
According to Zelealem [1], Kemantney pronouns include the following subject personal
pronouns, object pronouns and possessive pronouns:
3.3.2.1. SUBJECT PERSONAL PRONOUNS
There are seven subject personal pronouns in the Kemantney language.

Number Person
Singular First አን (እኔ) I
Second እንት (አንተ/አንች) You
ናይ (እርሰዎ/እርሳቸዉ) You (POL)
Third ኒ (እሱ) He
ኒ (እሷ) She
Plural First አኔው (እኛ) We
Second እንተእንዴው (እናንተ) You
Third ናይዴው (እነሱ) They
3.3.2.2. OBJECT PRONOUNS
The object pronouns of Kemantney are listed below. They are formed with the object suffixes /-(እ)ት/
(singular) and /-(እ)ስ/ (plural).
Number Person
Singular First ይ-ት = ይት (እኔን) Me
Second ኵ-ት = ኵት (አንተን/አንችን) You
ና-ት = ናት(እርሰዎን/እርሳቸዉን) You (POL)
Third ኒ-ት = ኒት (እሱን) Him
ኒ-ት = ኒት (እሷን) Her
Plural First አኔው-እስ = አኔውስ (እኛን) Us
Second እንተእንዴው-እስ = እንተእንዴውስ (እናንተን) You
Third ናይዴው-እስ = ናይዴውስ (እነሱን) Them
The basic shapes of the plural, second polite, third male singular and third female singular object
pronouns are similar to those of the corresponding subject pronouns, whereas the first singular and
second singular forms differ from their subject pronouns. As with the subject pronouns, gender is
not distinguished in either the second singular or the third singular object pronouns.
3.3.2.3. POSSESSIVE PRONOUNS
Kemantney has two sets of possessive personal pronouns: the independent possessive pronouns
and the bound possessive pronouns.
INDEPENDENT POSSESSIVE PRONOUNS
The independent possessive pronouns of Kemantney are the following:
Number Person
Singular First ይ-እⶖ = ይⶖ (የኔ) Mine
Second ቲ-እⶖ = ቲⶖ (ያንተ/ያንች) Yours
ና-ኣጝ = ናጝ (የእርሰዎ/የእርሳቸዉ) Yours (POL)
Third ኒ-እⶖ = ኒⶖ (የእርሱ) His
ኒሽ-እⶖ = ኒሽⶖ (የርሷ) Hers
Plural First አነ-ኣጝ = አናጝ (የኛ) Ours
Second እንተ-ኣጝ = እንታጝ (የናንተ) Yours
Third ና-ኣጝ = ናጝ (የነርሱ) Theirs
As shown above, unlike the subject and object pronouns, the possessive pronouns show gender
distinction in the third person singular. The independent possessive pronouns consist of the bound
pronominal followed by the suffix /-እⶖ/ in the singular or /-ኣጝ/ in the plural. In the singular, the
initial vowel of the suffix is deleted when the base ends in a vowel, resulting in ይⶖ, ቲⶖ, ኒⶖ and
ኒሽⶖ. In the plural forms, the final vowel of the base is dropped, resulting in አናጝ, እንታጝ and ናጝ.
In both cases, the deletion is due to the impermissibility of vowel sequences.
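The two deletion rules can be stated as string operations on the written forms. The Python sketch below is hypothetical: the function name is an illustration, and it assumes the same Unicode row layout of Ethiopic syllables (vowel orders 1 to 7 at offsets 0 to 6 of an 8-code-point row). In the singular, it drops the suffix-initial vowel letter for every base, as the listed forms do. In the plural, it rewrites the base-final syllable in its 4th (/a/) order before adding the rest of the suffix.

```python
# Sketch: fusing a bound pronominal with /-እⶖ/ (singular) or /-ኣጝ/ (plural).
# Assumption: regular Ethiopic rows are 8 code points wide.
REGULAR_RANGES = [(0x1200, 0x135A), (0x2DA0, 0x2DDE)]
SINGULAR_SUFFIX = "እⶖ"
PLURAL_SUFFIX = "ኣጝ"


def shift_order(ch: str, order: int) -> str:
    """Rewrite syllable `ch` in vowel order `order` (1-7) of its own row."""
    cp = ord(ch)
    for start, end in REGULAR_RANGES:
        if start <= cp <= end:
            return chr(cp - (cp - start) % 8 + order - 1)
    raise ValueError(f"{ch!r} is not in a regular Ethiopic syllable row")


def attach_possessive(base: str, suffix: str) -> str:
    if suffix == SINGULAR_SUFFIX:
        # Suffix-initial vowel deleted: ይ + እⶖ -> ይⶖ
        return base + suffix[1:]
    if suffix == PLURAL_SUFFIX:
        # Base-final vowel gives way to /a/: አነ + ኣጝ -> አናጝ
        return base[:-1] + shift_order(base[-1], 4) + suffix[1:]
    raise ValueError(f"unknown possessive suffix {suffix!r}")


print(attach_possessive("ኒሽ", SINGULAR_SUFFIX), attach_possessive("እንተ", PLURAL_SUFFIX))
```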
ይን ቢራ-ኢ ይ-እⶖ ጋላ = that ox-NOM my-PO he is = That ox is mine

ይን ቢራ-ኢ ቲ-እⶖ ጋላ = that ox-NOM your-PO he is = That ox is yours (M/F)
ይን ቢራ-ኢ ና-ኣጝ ጋላ = that ox-NOM your-PO he is = That ox is yours (POL)
ይን ቢራ-ኢ ኒ-እⶖ ጋላ = that ox-NOM his-PO he is = That ox is his
ይን ቢራ-ኢ ኒሽ-እⶖ ጋላ = that ox-NOM her-PO he is = That ox is hers
ይን ቢራ-ኢ አነ-ኣጝ ጋላ = that ox-NOM our-PO he is = That ox is ours
ይን ቢራ-ኢ እንተ-ኣጝ ጋላ = that ox-NOM your-PO he is = That ox is yours
ይን ቢራ-ኢ ና-ኣጝ ጋላ = that ox-NOM their-PO he is = That ox is theirs
The possessive suffixes /-እⶖ/ and /-ኣጝ/ are used for singular and plural possessors respectively when
the possessed noun is masculine. When the possessed noun is feminine, the suffixes change to /-እይ/
and /-ኧይ/ in the singular and plural respectively.
ይን ከማ ይ-እይ ጋይላ = that cow my-PO she is = That cow is mine

ይን ከማ ቲ-እይ ጋይላ = that cow your-PO she is = That cow is yours (M/F)
ይን ከማ ና-ኧይ ጋይላ = that cow your-PO she is = That cow is yours (POL)
ይን ከማ ኒ-እይ ጋይላ = that cow his-PO she is = That cow is his
ይን ከማ ኒሽ-እይ ጋይላ = that cow her-PO she is = That cow is hers
ይን ከማ አነ-ኧይ ጋይላ = that cow our-PO she is = That cow is ours
ይን ከማ እንተ-ኧይ ጋይላ = that cow your-PO she is = That cow is yours
ይን ከማ ና-ኧይ ጋይላ = that cow their-PO she is = That cow is theirs
The suffixes /-እⶖ/ ~ /-ኣጝ/ and /-እይ/ ~ /-ኧይ/ are homophonous with the main clause past tense
marker and the relativized suffixes in the third male singular and third female singular
respectively.
BOUND POSSESSIVE PRONOUNS (POSSESSIVE ADJECTIVES)
The bound possessive pronouns of Kemantney are the following:
Number Person
Singular First ይ (የኔ) My
Second ቲ (ያንተ/ያንች) Your
ና (የእርሰዎ/የእርሳቸዉ) Your (POL)
Third ኒ (የሱ) His
ኒሽ (የሷ) Her
Plural First አነ (የኛ) Our
Second እንተ (የናንተ) Your
Third ና (የነሱ) Their
As shown in the above table, except for the second person singular and the third person female
singular, the bound possessive pronouns are similar to the basic forms of the object pronouns.
3.3.2.4. DEMONSTRATIVE PRONOUNS
According to Zelealem [1] and Hetzron [10], Agew languages make gender distinctions in
demonstratives. In Kemantney, however, as in the personal pronouns, gender distinction is
neutralized in demonstratives. As with the pronouns, the informants produced the feminine
demonstrative forms እኒ: 'this (F)' and ይኒ: 'that (F)', which contrast with the masculine እኒ 'this (M)'
and ይኒ 'that (M)' respectively.
Number Near Far
Singular እኒ This ይኒ That
Plural እንደው These ይንደው Those
3.3.2.5. REFLEXIVE PRONOUNS
Reflexive pronouns are formed with /-አⶓይ/ 'head' attached to the bound reflexive pronominal
forms, which are identical with the bound possessive pronouns. They occur after the following
subject pronouns and are used for emphasis.
አን ይ-አⶓይ (እኔ እራሴ) = I my-head = I myself
እንት ቲ-አⶓይ (አንተ እራስህ/አንች እራስሽ) = You your-head = you (M/F) yourself
ናይ ና-አⶓይ (እርሰዎ/እርሳቸዉ እራሳቸው) = You your-head = You (POL) yourself
ኒ ኒ-አⶓይ (እሱ እራሱ) = he his-head = He himself
ኒ ኒሽ-አⶓይ (እሷ እራሷ) = she her-head = She herself
አኔው አነ-አⶓይ (እኛ እራሳችን) = we our-head = We ourselves
እንተዴው እንተ-አⶓይ (እናንተ እራሳቹህ) = you your-head = You yourselves
ናይዴው ና-አⶓይ (እነሱ እራሳችው) = they their-head = They themselves
3.3.3. ADJECTIVES
According to Zelealem [1], non-derived adjectives (ADJ) are very rare in Kemantney, as they are in
other Agew languages [67]. The only simple adjectives still used by the majority of the informants
are አዚ (new) and ሸመና (black). Adjectival meanings are expressed with relativized verbs; as a result,
Kemantney is an adjectival-verb language in the typology of [43].
Derived adjectives exist, but they are also rare. They are derived from verbal bases by suffixing the
vowel /-ኣ/, which is the same morpheme used in the derivation of nouns.
ጅከክ- Be heavy ጅከካ Heavy
ጕⶖይ- Be mad ጕⶖያ Mad
ካርት- Be far ካርት Far
ተይት- Be near ተይት Near
ናⷕት- Become many ናⷕት Many
In the first two examples, where the verb stem ends in a single consonant, the adjectival suffix
/-ኣ/ appears in the corresponding adjectives. When the verb stem ends in a consonant cluster, as in
the other three examples, /-ኣ/ is replaced word-finally by the epenthetic vowel /እ/.
In Kemantney, the deficiency of lexical adjectives is compensated for by clauses. In other words,
relatives are used as adjectival expressions where proper adjectives are lacking.
ገን- Become old ገን-ኣጝ = ገናጝ One who is old
ለገዝ- Become tall ለገዝ-ኣጝ = ለገዛጝ That which is tall
Such base forms can occur as predicates in structures like the following:
አን ሸር-ኣር ጋይል = አን ሸራር ጋይል = I am good ሸር-ኧው ጋጝልል/ላ = ሸረው ጋጝልል/ላ = We are good
እንት ሸር-ኣር ጋይላ = እንት ሸራር ጋይላ = You are good ሸር-ኧው ጋጝይልላ = ሸረው ጋጝይልላ = You are good
ኒ ሸር-ኣጝ ጋላ = ኒ ሸራጝ ጋላ = He is good ሸር-ኧው ጋጝልላ = ሸረው ጋጝልላ = They are good
ሸር-ኤይ ጋይላ = ሸሬይ ጋይላ = She is good
3.3.4. VERB
According to Zelealem [1] and Hetzron [67], the prefix conjugations of verbs (V) in Agew languages
are the most archaic and are still preserved in Awngi and Xamt'anga. In Kemantney they have been
lost, and only suffix conjugation remains. As Dimmendaal [13] notes, a reduction in language use is
accompanied by a reduction in structure, as a result of which speakers use approximations. This is
evident in the morphological structure of Kemantney as a replaced language.
3.3.4.1. STEM
The verb is the most complex category in Kemantney. The stem carries the lexical meaning. The
inflections are suffixes marking person, number, gender, tense and mood. The following are
examples of verbs:
ዅይ- (ብላ/ቢይ) Eat ላⶖ- (ና/ናይ) Come ፈሽ-(ዉሰድ) Take
ጃⷕ- (ጟጣ/ጭ) Drink ትምብ- (ተነስ/ሽ) Stand ገመር-(ተናገር) Speak
ኪንት- (መማር) Learn ጋኝ- (መሮጥ) Run ፊ-(ሂድ/ጅ) Go out
ገንጅ- (ተኛ/ኚ) Sleep ኵይ- (ግደል) Kill ሻር-(ፃፋ) Write
ተኯሰም- (ተቀመጥ/ጭ) Sit ይ- (አለ/አለች) Say
3.3.4.2. VERB INFLECTIONS
As mentioned in Hetzron [10], [67], the Agew verb has an extremely rich inflectional system.
Palmer [34] reports having counted up to ten thousand verb forms in Bilen. As in most other
Cushitic languages, all Kemantney verbs are suffixing. The system shows inflections of person,
number, gender, tense and mood [1].
PERSON, NUMBER AND GENDER INFLECTIONS
Although it is not always easy to tell the exact number of distinct verb forms and their respective
inflections, in the majority of cases there are five formally distinct conjugational paradigms
corresponding to the seven forms of personal pronouns. The five forms are: first singular and third
male singular; second male singular and second female singular; second polite, first plural and
third plural; second plural; and third female singular.
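This many-to-one mapping matters for a suffix-stripping stemmer, because an ending alone often cannot identify the subject. The grouping can be kept as a lookup table, as in the hypothetical Python sketch below. The person codes and the class labels A to E are arbitrary.

```python
# Sketch: the five conjugational paradigms grouped by person category.
# Labels A-E are arbitrary; the grouping follows the text above.
PARADIGM_CLASS = {
    "1S": "A", "3MS": "A",
    "2MS": "B", "2FS": "B",
    "2POL": "C", "1PL": "C", "3PL": "C",
    "2PL": "D",
    "3FS": "E",
}


def same_ending(person_a: str, person_b: str) -> bool:
    """True if the two person categories share a conjugational ending."""
    return PARADIGM_CLASS[person_a] == PARADIGM_CLASS[person_b]


print(same_ending("1S", "3MS"), same_ending("2PL", "3PL"))
```

For example, the simple past forms ዋስ-Ø-እⶖ (1S and 3MS) and ዋስ-እን-እⶖ (1PL and 3PL) each cover two persons.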
3.3.4.3. VERB TO BE
The verb 'to be' in Kemantney has an irregular conjugational pattern. There is a high degree of
simplification, with all conjugations reduced to just the third male singular form ጋላ 'is'.
አን ጋይል (እኔ ነኝ) I am

እንት ጋይላ (አንቺ ነሽ) You (F) are
እንት ጋይላ (አንተ ነህ) You (M) are
ናይ ጋጝልላ (እርሰዎ/እርሳቸዉ ናቸዉ) You (POL) are
ኒ ጋላ (እሱ ነዉ) He is
ኒ ጋይላ (እሷ ናት) She is
አኔው ጋጝልል/ጋጝልላ (እኛ ነን) We are
እንተእንዴው ጋጝይልላ (እናንተ ናቹህ) You are
ናይዴው ጋጝልላ (እነሱ ናቸዉ) They are
The affirmative copula 'be(come)' has the following forms.
አጝ-ኧኵ = be(come)-IMP I become/I will become
አጝ-እይ-ኧኵ = be(come)-2P-IMP You (F) become/You will become
አጝ-እይ-ኧኵ = be(come)-2P-IMP You (M) become/You will become
አጝ-ኧኵ-እን = be(come)-IMP-PL You (POL) become/You (POL) will become
አጝ-ኧኵ = be(come)-IMP He becomes/He will become
አጝ-ኧ-ቲ = be(come)-IMP-F She becomes/She will become
አጝ-ን-ኧኵ = be(come)-PL-IMP We become/We will become
አጝ-እይ-ን-ኧኵ = be(come)-2P-PL-IMP You become/You will become
አጝ-ኧኵ-እን = be(come)-IMP-PL They become/They will become
The second polite and third plural forms in the table above are unexpected. Instead of the expected
form አጝ-ን-ኧኵ = አጝነኵ, the plural and tense markers metathesize, resulting in
አጝ-ኧኵ-እን = አጘኵን.
A. The negative copula of the verb 'to be'.
አጝ-ኧ-ል = be(come)-IMP-NEG I do not become/I will not become

አጝ-እይ-ኧ-ላ = be(come)-2P-IMP-NEG You do not become /You will not become (M)
አጝ-እይ-ኧ-ላ = be(come)-2P-IMP-NEG You do not become/You will not become (F)
አጝ-ኧ-ል-ላ = be(come)-IMP-PL-NEG You (POL) do not become/You (POL) will not become
አጝ-ኧ-ላ = be(come)-IMP-NEG He does not become/He will not become
አጝ-እይ-ኧ-ላ = be(come)-3F-IMP-NEG She does not become/She will not become
አጝ-ን-ኧ-ል = be(come)-1PL-IMP-NEG We do not become/We will not become
አጝ-እይ-ኧ-ል-ላ = be(come)-2P-IMP-PL-NEG You do not become/You will not become
አጝ-ኧ-ል-ላ = be(come)-IMP-3PL-NEG They do not become/They will not become
In the negative paradigm, the 1S and 1PL drop the final vowel of the negative marker.
B. In the past tense, the copula is conjugated in the same way as finite verbs.
አን ስምብ-Ø-ኤⶖ = I be-1S-PS I was
እንት ስምብ-እይ-ኤⶖ = you be-2MS-PS You were
እንት ስምብ-እይ-ኤⶖ = you be-2FS-PS You were
ናይ ስምብ-እን-ኤⶖ = you (POL) be-PL-PS You (POL) were
ኒ ስምብ-Ø-ኤⶖ = he be-3MS-PS He was
ኒ ስምብ-ኢ-ቲ = she be-PS-F She was
አኔው ስምብ-እን-ኤⶖ = we be-PL-PS We were
እንተዴው ስምብ-ኢ-ን-ኤⶖ = you (PL) be-2P-PL-PS You were
ናይዴው ስምብ-እን-ኤⶖ = they be-PL-PS They were
The past copula ስምብ- and the verb ስም- 'live' can be used interchangeably. As in other
conjugational patterns, /-Ø/ marks the 1S and 3MS; /-ኢ/ or /-(እ)ይ/ marks the 2P; /-(እ)ን/ is the plural
marker, which also marks politeness; and /-ቲ/ is the third person feminine suffix.
3.3.4.4. TENSE AND ASPECT
Kemantney has different tenses and aspects. The aspects are perfect, imperfect and
progressive. The perfect aspect is used for the simple past, the remote past (pluperfect or past
perfect) and the present perfect, with temporal distinctions. The imperfect is used for the habitual
present and the future. The progressive aspect is either past progressive or present progressive [1].
The morphological properties of the present and future tenses provide good grounds for
treating these forms as non-past tense and imperfect aspect [1].
3.3.4.4.1. PERFECTIVE ASPECT
SIMPLE PAST
This tense expresses actions that started and were completed in the past, that is, before the time of
the utterance.
አን ዋስ-Ø-እⶖ = I hear-1S-PS I heard
እንት ዋስ-ይ-እⶖ = you hear-2S-PS You heard
ኒ ዋስ-Ø-እⶖ = he hear-3MS-PS He heard
ኒ ዋስ-እ-ት = she hear-PS-3F She heard
አኔው ዋስ-እን-እⶖ = we hear-1PL-PS We heard
እንተዴው ዋስ-ይ-እን-እⶖ = you hear-2P-2PL-PS You heard
ናይዴው ዋስ-እን-እⶖ = they hear-3PL-PS They heard
There are principally two variants of the past tense marker: /-እ/ in the 3FS and /-እⶖ/ elsewhere. In
rare instances, when the verb stem consists of a single syllable ending in /ይ/, as in ፊይ-ኣጝ = ፊያጝ
'went', ቢይ-ኣጝ = ቢያጝ 'left' and ሸይ-ኣጝ = ሸያጝ 'held', the tense marker in the first singular and third
singular is /-ኣጝ/, which is identical with the third male singular relative form. In other forms, for
instance ይዀይ-እⶖ = ይዀይⶖ 'laughed' and ድውይ-እⶖ = ድውይⶖ 'told', the tense marker remains /-እⶖ/.
All tense and aspect markers occur immediately following the person, number and gender
markers.
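Because the glosses in this section are already segmented, a stemmer's suffix table can be tried on them directly. The Python sketch below uses a hypothetical marker table built only from the simple past paradigm and the two allomorphs described above. It strips the markers and returns the stem together with its tags. It does not handle fused forms such as ፊያጝ, where /-ኣጝ/ merges with the stem in writing.

```python
# Sketch: read the simple-past markers off a hyphen-segmented form.
# The table is hypothetical and limited to the paradigm shown above.
PAST_MARKERS = {
    "Ø": None,    # zero person marker of 1S and 3MS
    "ይ": "2P",
    "እን": "PL",
    "እⶖ": "PS",   # past tense, all persons except 3FS
    "እ": "PS",    # past tense variant in the 3FS
    "ት": "3F",
}


def analyse_past(form: str) -> tuple[str, list[str]]:
    stem, *suffixes = form.split("-")
    tags = []
    for morpheme in suffixes:
        if morpheme not in PAST_MARKERS:
            raise ValueError(f"unexpected morpheme {morpheme!r} in {form!r}")
        if PAST_MARKERS[morpheme] is not None:
            tags.append(PAST_MARKERS[morpheme])
    return stem, tags


print(analyse_past("ዋስ-ይ-እን-እⶖ"))
```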
PRESENT PERFECT
This tense expresses an action that started sometime in the past and was completed in the recent
past. The form consists of the stem, the auxiliary and the imperfective marker.
Table one:
አን ዋስ-ዋን-ኧኵ = I hear-AUX-IMP I have heard

እንት ዋስ-ይ-ኣን-ይ-ኧኵ = you hear-2P-AUX-2P-IMP You (M/F) have heard
ናይ ዋስ-እን-ዋን-ኧኵ = you (POL) hear-PL-AUX-IMP You (POL) have heard
ኒ ዋስ-ዋን-ኧኵ = he hear-AUX-IMP He has heard
ኒ ዋስ-ይ-ኣን-ኧ-ት = she hear-3P-AUX-IMP-F She has heard
አኔው ዋስ-ን-ዋን-ን-ኧኵ = we hear-PL-AUX-PL-IMP We have heard
እንተዴው ዋስ-ይ-ን-ዋን-ይ-ኧኵ-እን = you hear-2P-PL-AUX-2P-IMP-PL You have heard
ናይዴው ዋስ-ን-ዋን-ኧኵ-እን = they hear-PL-AUX-IMP-PL They have heard
The internal structure of the verb in the present perfect is [stem + inflection + AUX +
inflection + aspect]. Unlike in other paradigms, the number and aspect markers exchange positions
in the second plural and third plural. The first segment of the AUX, /ዋ/, is deleted after /ይ/ in the
second singular and third female singular. The imperfective markers /-ኧ/ (in the third female) and
/-ኧኵ/ (elsewhere) consistently mark imperfective aspect, while the auxiliary /-ዋን/ ~ /-ኣን/ marks
tense.
Another feature of the verb in the present perfect tense and imperfective aspect is the repetition
of the plural marker in the first plural, second plural and third plural. This phenomenon is related
to the property of the auxiliary.
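The [stem + inflection + AUX + inflection + aspect] structure suggests a simple way to recover the main-verb stem: find the auxiliary and keep what comes before it. The Python sketch below is hypothetical and assumes hyphen-segmented input. It treats /ዋን/, /ኣን/ and the underlying /እዋን/ as the auxiliary variants. The search starts after the first morpheme, so the independent verb ዋን- used as a stem is not mistaken for the auxiliary.

```python
# Sketch: split a segmented present perfect form at the auxiliary.
AUX_VARIANTS = {"ዋን", "ኣን", "እዋን"}


def split_present_perfect(form: str):
    """Return (main verb morphemes, auxiliary, morphemes after the AUX)."""
    morphemes = form.split("-")
    for i in range(1, len(morphemes)):
        if morphemes[i] in AUX_VARIANTS:
            return morphemes[:i], morphemes[i], morphemes[i + 1:]
    raise ValueError(f"no auxiliary in {form!r}")


main, aux, rest = split_present_perfect("ዋስ-ይ-ኣን-ይ-ኧኵ")
print(main[0], aux, rest)
```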
As mentioned above, ዋን is an auxiliary and at the same time a verb with the following conjugation.
Table two
አን ዋን-ኧኵ = have/present/have to-IMP I have/I am present/I have to

እንት ዋን-ይ-ኧኵ = have/present/have to-2P-IMP You have/You are present/You have to
ኒ ዋን-ኧኵ = has/present/has to-IMP He has/He is present/He has to
ኒ ዋን-ኧ-ት = has/present/has to-IMP-3F She has/She is present/She has to
አኔው ዋን-ን-ኧኵ = have/present/have to-PL-IMP We have/We are present/We have to
እንተዴው ዋን-ይ-ኧኵ-እን = have/present/have to-2P-IMP-PL You have/You are present/You have to
ናይዴው ዋን-ኧኵ-እን = have/present/have to-IMP-PL They have/They are present/They have to
Although the expected 2PL and 3PL forms in table two are ዋን-ይ-እን-ኧኵ and ዋን-እን-ኧኵ, the
tense/aspect and plural suffixes metathesize.
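The metathesis can be stated as a swap of two adjacent morphemes. A stemmer may need to undo it before matching suffixes in their underlying order. In the Python sketch below, the function names are hypothetical. The rule keys on the /-እን/ shape of the plural used in the underlying forms above, so the 1PL form ዋን-ን-ኧኵ, which does not metathesize, is left unchanged.

```python
# Sketch: underlying PL-IMP order <-> surface IMP-PL order.
PLURAL, IMPERFECT = "እን", "ኧኵ"


def metathesize(morphemes: list[str]) -> list[str]:
    """Underlying ...-እን-ኧኵ becomes surface ...-ኧኵ-እን (first match only)."""
    out = list(morphemes)
    for i in range(len(out) - 1):
        if out[i] == PLURAL and out[i + 1] == IMPERFECT:
            out[i], out[i + 1] = IMPERFECT, PLURAL
            break
    return out


def undo_metathesis(morphemes: list[str]) -> list[str]:
    """Surface ...-ኧኵ-እን back to underlying ...-እን-ኧኵ (first match only)."""
    out = list(morphemes)
    for i in range(len(out) - 1):
        if out[i] == IMPERFECT and out[i + 1] == PLURAL:
            out[i], out[i + 1] = PLURAL, IMPERFECT
            break
    return out


print("-".join(metathesize(["ዋን", "ይ", "እን", "ኧኵ"])))
```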
As table two shows, the auxiliary in the verbs of table one is an exact replica of the independent
verb ዋን- 'have', 'exist' or 'have to'.
The following table shows the underlying representation of table one where ዋን appears in its full
conjugation.
አን ዋስ-ዋን-ኧኵ = I hear-AUX-IMP I have heard

እንት ዋስ-ይ-እዋን-ይ-ኧኵ = you hear-2P-AUX-2P-IMP You (M/F) have heard
ኒ ዋስ-ዋን-ኧኵ = he hear-AUX-IMP He has heard
ኒ ዋስ-ይ-እዋን-ኧ-ት = she hear-3P-AUX-IMP-F She has heard
አኔው ዋስ-ን-እዋን-ን-ኧኵ = we hear-PL-AUX-PL-IMP We have heard
እንተዴው ዋስ-ይ-እን-ዋን-ይ-ኧኵ-እን = you hear-2P-PL-AUX-2P-IMP-PL You have heard
ናይዴው ዋስ-ን-እዋን-ኧኵ-እን = they hear-PL-AUX-IMP-PL They have heard
PAST PERFECT
The past perfect expresses actions that started and were completed in the remote past. It is
formed by combining a gerundive (GE) form of the verb with the copula ስምብ 'be'.
አን ክዝ-ወ ስምብ-Ø-ኤⶖ = I sell-GE be-1S-PS I had sold

እንት ክዝ-ይ-ኧ ስምብ-እይ-ኤⶖ = you sell-2P-GE be-2P-PS You had sold
ናይ ክዝ-ን-ወ ስምብ-እን-ኤⶖ = you sell-PL-GE be-PL-PS You (POL) had sold
ኒ ክዝ-ወ ስምብ-Ø-ኤⶖ = he sell-GE be-3MS-PS He had sold
ኒ ክዝ-ይ-ኧ ስምብ-ኢ-ቲ = she sell-3F-GE be-PS-F She had sold
አኔው ክዝ-ን-ወ ስምብ-ን-ኤⶖ = we sell-PL-GE be-1PL-PS We had sold
እንተዴው ክዝ-ይ-ን-ወ ስምብ-ኢ-ን-ኤⶖ = you sell-2P-PL-GE be-2P-PL-PS You had sold
ናይዴው ክዝ-ን-ወ ስምብ-ን-ኤⶖ = they sell-PL-GE be-PL-PS They had sold
Both the gerundive stem and the copula inflect for person, gender and number. Tense, however, is
indicated by the copula. As in the present perfect, the /ው/ of the gerundive /ወ/ is deleted in the
second singular and third female singular after the person marker /ይ/. As the examples show, the
past marker on the copula in the 3F is the front high vowel /-ኢ/, which is a variant of the mid-high
vowel /-እ/ seen in the simple past.
3.3.4.4.2. IMPERFECTIVE ASPECT
PRESENT/FUTURE
A habitual action (present) and an action that will take place sometime after the moment of speech
(future) have the same form. Both show the mid-central vowel /-ኧ/ in the third female singular and
/-ኧኵ/ elsewhere. The vowel /ኧ/ is the imperfect marker in subordinate verbs for all persons.
አን ⷓል-Ø-ኧኵ = I see-1P-IMP I see/ I will see
እንት ⷓል-ይ-ኧኵ = you see-2P-IMP You see/ You will see
ኒ ⷓል-Ø-ኧኵ = he see-3P-IMP He sees/He will see
ኒ ⷓል-ኧ-ት = she see-IMP-F She sees/She will see
አኔው ⷓል-ን-ኧኵ = we see-PL-IMP We see/ We will see
እንተዴው ⷓል-ይ-ኧኵ-እን = you see-2P-IMP-PL You see/ You will see
ናይዴው ⷓል-ኧኵ-እን = they see-IMP-PL They see/ They will see
As in the present perfect, the plural and imperfect markers exchange positions in the
present/future tense. Some terminal speakers use the underlying patterns ⷓል-ይ-እን-ኧኵ and ⷓል-ን-ኧኵ
for the second plural and third plural respectively at the surface level. However, competent
speakers of the language, who consistently metathesize the two inflectional elements, rejected such
patterns. Incidentally, the first plural and third plural, which have the same form in the past, are
distinguished in the present/future tense. To distinguish the present from the future, adverbs of
time can be used. These include ናን 'now', ንኝ 'today', etc. for the present, and አመር 'tomorrow',
ሻጞ 'next year', etc. for the future.
PROGRESSIVE (DURATIVE) ASPECT
This aspect denotes the continuation of an action either sometime in the past (past progressive) or
during the time of utterance (present progressive). In both cases, the progressive (PRO) aspect is
indicated by suffixing /-ሳብ/.
አን ዋስ-ኧ-ሳብ ጋይል = I hear-IMP-PRO I am I am hearing
እንት ዋስ-ይ-ኧ-ሳብ ጋይላ = you hear-2P-IMP-PRO you are You are hearing
ናይ ዋስ-ኧን-ኧ-ሳብ ጋጝልላ = you hear-PL-IMP-PRO you are You are hearing
ኒ ዋስ-ኧ-ሳብ ጋላ = he hear-IMP-PRO he is He is hearing
ኒ ዋስ-ይ-ኧ-ሳብ ጋይላ = she hear-3F-IMP-PRO she is She is hearing
አኔው ዋስ-እን-ኧ-ሳብ ጋጝልል/ላ = we hear-PL-IMP-PRO we are We are hearing
እንተዴው ዋስ-ይ-ኧን-ኧ-ሳብ ጋጝይልላ = you hear-2P-PL-IMP-PRO you are You are hearing
ናይዴው ዋስ-ኧን-ኧ-ሳብ ጋጝልላ = they hear-PL-IMP-PRO they are They are hearing
The morpheme /-ሳብ/, suffixed to the main verb, together with the helping verb in the present,
denotes the present progressive. In addition to the progressive marker, the main verb is inflected
with the imperfective marker /ኧ/.
The past progressive is indicated by the same inflection in a similar pattern. The only difference is
that the helping verb is in the past.
አን ዋስ-ኧ-ሳብ ስምብ-ኤⶖ = I hear-IMP-PRO be-PS I was hearing

እንት ዋስ-ይ-ኧ-ሳብ ስምብ-ይ-ኤⶖ = you hear-2P-IMP-PRO be-2P-PS You were hearing
ናይ ዋስ-ይ-ኧን-ኧ-ሳብ ስምብ-ን-ኤⶖ = you hear-2P-PL-IMP-PRO be-PL-PS You were hearing
ኒ ዋስ-ኧ-ሳብ ስምብ-ኤⶖ = he hear-IMP-PRO be-PS He was hearing
ኒ ዋስ-ይ-ኧ-ሳብ ስምብ-ኢ-ቲ = she hear-3F-IMP-PRO be-PS-3F She was hearing
አኔው ዋስ-እን-ኧ-ሳብ ስምብ-ን-ኤⶖ = we hear-PL-IMP-PRO be-PL-PS We were hearing
እንተዴው ዋስ-ይ-ኧን-ኧ-ሳብ ስምብ-ኢ-ን-ኤⶖ = you hear-2P-PL-IMP-PRO be-2P-PL-PS You were hearing
ናይዴው ዋስ-ኧን-ኧ-ሳብ ስምብ-ን-ኤⶖ = they hear-PL-IMP-PRO be-PL-PS They were hearing
The shape of the main verb in the past progressive is identical with that of the present
progressive. The morpheme /-ሳብ/ and the helping verb designate the past progressive.
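Since progressive forms are periphrastic, a stemmer has to look at two words. The Python sketch below is a hypothetical helper that assumes hyphen-segmented input. It reports the progressive when /-ሳብ/ is present on the main verb and decides between past and present from the helping verb. It treats the copula ስምብ- as past and any other helper as a present copula, which is an assumption based only on the paradigms in this section.

```python
from typing import Optional

PROGRESSIVE = "ሳብ"   # progressive suffix on the main verb
PAST_COPULA = "ስምብ"  # past helping verb


def classify_progressive(clause: str) -> Optional[tuple[str, str]]:
    """Return (stem, label) for a progressive clause, else None."""
    words = clause.split()
    main = words[0].split("-")
    if PROGRESSIVE not in main:
        return None
    if len(words) < 2:
        return main[0], "progressive"   # no helper: tense undecided
    helper = words[1].split("-")[0]
    tense = "past" if helper == PAST_COPULA else "present"
    return main[0], f"{tense} progressive"


print(classify_progressive("ዋስ-ኧ-ሳብ ስምብ-ኤⶖ"))
```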
3.3.4.5. VERBAL EXTENSIONS
PASSIVE
The passive (PA) is predominantly indicated by the suffixes /-(እ)ስ/ and /-(እ)ት/.
አር-ኢ ኒ-ዝ ዅይ-ስ-እⶖ = grain-NOM he-by eat-PA-PS = The food was eaten by him.
ደብር እንተዴው-ዝ ታብ-ስ-እዋን-ኧኵ = church you (PL)-by fence-PA-AUX-IMP = The church has been
fenced by you.
ሳማ-ኢ ይ-ዝ ይው-ስ-እⶖ = money-NOM me-by give-PA-PS = The money was given by me.
ዛፍ-ኧን እይየን-እዝ ታከለ-ስ-እን-እⶖ = tree-PL people-by plant-PA-PL-PS = The trees were planted by the
people.
ጀግር-ኢ ናይዴው-እዝ ⷓል-ስ-እⶖ = monkey-NOM they-by see-PA-PS = The monkey was seen by them.
ፍንትራ አኔው-እዝ ትኝ-ስ-እ-ት = goat we-by find-PA-PS-3FS = The goat was found by us.
ናይዴው ፎሊስ-እዝ ጟረጟረ-ስ-እን-እⶖ = they police-by suspect-PA-PL-PS = They were suspected by the
police.
CAUSATIVE
According to Zelealem [1], the causative (CA) marker is /-(እ)ሽ/. Appleyard [9] and Conti Rossini
(1912) identify the same morpheme, plus /-(እ)ዝ/. As with the passive, the causative immediately
follows the verb stem. Conti Rossini (1912) sketched the first grammatical description of Kemantney,
but his work could not be found or accessed for the current study.
ይር ከሽኝ-ሽ-እⶖ = man call-CA-PS The man caused to call
ፈንኪያ ሸወ-ሽ-እⶖ = soldier arrest-CA-PS The soldier caused to arrest
ይር ኵይ-እሽ-እⶖ = man kill-CA-PS The man caused to kill
ኒ ይ-ት ⷓሻⷕ-ሽ-እⶖ = he me-AC punish-CA-PS He caused me to be punished
ዅራ-ኢ ገረፈ-ሽ-እⶖ = child-NOM flagellate-CA-PS The boy caused to flagellate
Causativization is a kind of transitivization: some intransitive verbs become transitive by adding
/-ሽ/.
Intransitive verb Transitive verb

ኯንኯል- Take a mouthful ኯንኯልሽ Give a mouthful
አቨር- Be thin አቨርሽ Make thin
በጀጅ- Be fat በጀጅሽ Make fat
ጅልው- Turn ጅልውሽ Make turn
ADJUTATIVE
This form is denoted by reduplicating the causative /-ሽ/.
ኒ ናይዴው-እስ አረ-ስ ዅይ-ሽ-እሽ-እⶖ = he they-AC food-AC eat-CA-RED-PS = He helped them eat the
food.
አኔው ኒ-ት ካነ-ስ ለሽ-እሽ-እሽ-እን-እⶖ = we he-AC wood-AC bring-CA-RED-PL-PS = We helped him bring
the wood.
አን ኒሽ አርግ-ስ ምⶖ-እሽ-እሽ-እⶖ = I her-bed-AC carry-CA-RED-PS = I helped her carry the bed.
ታየ አኔው-ስ ምረወ-ስ ኵይ-እሽ-እሽ-እⶖ = Taye we-AC snake-AC kill-CA-RED-PS = Taye helped us kill the
snake.
ኒ ኒ-ት አር-ስ አሽ-እሽ-እሽ-እⶖ = he he-AC crop-AC harvest-CA-RED-PS = He helped him harvest the crop.
ናይዴው ይ-ት ንኝ-ስ ሰር-እሽ-እሽ-እን-እⶖ = they me-AC house-AC build-CA-RED-3PL-PS = They helped me
build the house.
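For stemming, the passive, causative and adjutative all sit immediately after the stem, so they can be read from the first suffix slot. The Python sketch below is hypothetical and works on hyphen-segmented forms. It leaves out the passive allomorph /-(እ)ት/, because /-ት/ also marks the third female singular directly after the stem in the jussive (ክልኝ-ት-እወ).

```python
from typing import Optional

PASSIVE = {"ስ", "እስ"}
CAUSATIVE = {"ሽ", "እሽ"}


def verbal_extension(form: str) -> tuple[str, Optional[str]]:
    """Return the stem and the extension found right after it (or None)."""
    stem, *rest = form.split("-")
    if len(rest) >= 2 and rest[0] in CAUSATIVE and rest[1] in CAUSATIVE:
        return stem, "ADJUTATIVE"   # reduplicated causative
    if rest and rest[0] in CAUSATIVE:
        return stem, "CAUSATIVE"
    if rest and rest[0] in PASSIVE:
        return stem, "PASSIVE"
    return stem, None


print(verbal_extension("ዅይ-ሽ-እሽ-እⶖ"))
```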
FREQUENTATIVE
A. One way of expressing the frequentative is to reduplicate the verb stem and insert the linking
vowel /-ኧ/ between the two copies. The attenuative is formed in the same way.
ከል Break ከል-ኧ-ከል-እⶖ = break-LINK-break-PS Broke again and again
ጃⷕ Drink ጃⷕ-ኧ-ጃⷕ-እⶖ = drink-LINK-drink-PS Drank again and again
ብዝ Open ብዝ-ኧ-ብዝ-እⶖ = open-LINK-open-PS Opened again and again
ታም Taste ታም-ኧ-ታም-እⶖ = taste-LINK-taste-PS Tasted again and again
B. The other way of showing the frequentative is by reduplicating the verb stem and by adding a
helping verb to carry the inflectional elements.
ከል-ከል ሸብ-እⶖ = break-break do-PS Did break and break
ጃⷕ-ጃⷕ ሸብ-እⶖ = drink-drink do-PS Did drink and drink
ብዝ-ብዝ ሸብ-እⶖ = open-open do-PS Did open and open
ታም-ታም ሸብ-እⶖ = taste-taste do-PS Did taste and taste
C. When the verb is tri-radical, the frequentative is designated by reduplicating the penultimate
radical.
አኔው መረ-ረ-ጥ-ን-እⶖ = we choose-RED-choose-PL-PS = We did select and select.

ኒ ክል-ል-ኝ-እ-ት = she dance-RED-dance-PS-3F = She spoiled the house.
ናይዴው ይር-ስ በተ-ተ-ን-ን-እⶖ = they man-AC disperse-RED-disperse-PL-PS = They dispersed the people
again and again.
ካሳ ኒ ንኝ-ስ ለወ-ወ-ይ-እⶖ = Kasa his-house-AC change-RED-change-PS = Kasa changed his house again
and again.
D. Frequency of actions is indicated by reduplicating time expressions like ሰን 'Monday' and አመይ
'year'.
ኒ ሰን ሰን ቲይ-ኧኵ = he Monday Monday come-IMP He comes on Mondays

ኒ አመይ አመይ-እዝ ቲይ-ኧኵ = he year year- by come-IMP He comes yearly
ኒ ናን-ኢር ናን-ኢር ዅይ-ኧኵ = he now-too now-too eat-IMP He eats now and then
According to Zelealem [1], the frequentative appears to be uncommon in Agew languages. In
Kemantney, it is most likely a recent introduction from Amharic.
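Patterns A and B copy the whole stem, so a stemmer can collapse the copy to a single stem. The Python sketch below uses a hypothetical function name and recognises only these two patterns on hyphen-segmented input. Pattern C (reduplication of the penultimate radical, as in መረ-ረ-ጥ) and the reduplicated time expressions of D are not handled.

```python
from typing import Optional

LINK = "ኧ"     # linking vowel of pattern A
HELPER = "ሸብ"  # helping verb 'do' of pattern B


def collapse_frequentative(form: str) -> tuple[Optional[str], Optional[str]]:
    """Return (stem, pattern) for patterns A and B, else (None, None)."""
    words = form.split()
    first = words[0].split("-")
    # Pattern A: stem-ኧ-stem-inflection, e.g. ከል-ኧ-ከል-እⶖ
    if len(first) >= 3 and first[1] == LINK and first[0] == first[2]:
        return first[0], "A"
    # Pattern B: stem-stem followed by the inflected helper, e.g. ብዝ-ብዝ ሸብ-እⶖ
    if (len(words) == 2 and len(first) == 2 and first[0] == first[1]
            and words[1].split("-")[0] == HELPER):
        return first[0], "B"
    return None, None


print(collapse_frequentative("ታም-ኧ-ታም-እⶖ"))
```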
RECIPROCAL
There are different ways of expressing reciprocity in Kemantney. Reduplication is the major one.
But the reduplication process is sensitive to the syllable structure of the verb.
A. In monosyllabic verb stems, reciprocity is marked by total reduplication of the stem, insertion
of the linking vowel /-ኧ/ and suffixation of the passive /-ስ/ to both copies of the stem.
አኔው ⷓል-ስ-ኧ-ⷓል-ስ-እን-እⶖ = we see-PA-LINK-RED-PA-PL-PS We saw each other
አኔው ፋጝ-ስ-ኧ-ፋጝ-ስ-እን-እⶖ = we marry-PA-LINK-RED-PA-PL-PS We married each other
እንተዴው ⷓል-ስ-ኧ-ⷓል-ስ-ኢ-ን-እⶖ = you see-PA-LINK-RED-PA-2P-PL-PS You (PL) saw each other
ናይዴው ⷓል-ስ-ኧ-ⷓል-ስ-እን-እⶖ = they see-PA-LINK-RED-PA-PL-PS They saw each other
እንተዴው ፋጝ-ስ-ኧ-ፋጝ-ስ-ኢ-ን-እⶖ = you marry-PA-LINK-RED-PA-2P-PL-PS You married each other
ናይዴው ፋጝ-ስ-ኧ-ፋጝ-ስ-እን-እⶖ = they marry-PA-LINK-RED-PA-PL-PS They married each other
B. Another way of showing reciprocity in verb stems with a CVCVC syllable structure is by
reduplicating the second CV syllable and suffixing the passive /-ስ/.
አኔው ከበ-በን-ስ-እን-እⶖ = we bear-bear-PA-PL-PS We gave birth to each other's child.

እንተዴው ይከ-ከል-ስ-ኢ-እን-ኧኵ = you love-RED-PA-2P-PL-IMP You love each other.
C. In Kemantney, reciprocity is also expressed by using the bound possessive pronouns of the first
plural, second plural and third plural plus the reflexive pronoun.
አነ-ኒሽ-ኒሽ = አነኒሽኒሽ = we each other.
እንተ-ኒሽ-ኒሽ = እንተኒሽኒሽ = You each other.
ና-ኒሽ-ኒሽ = ናኒሽኒሽ = They each other.
አኔው አነ-ኒሽ-ኒሽ ከበ-በን-ስ-እን-እⶖ = we 1PL (PO) each other bear (RED)-PA-PL-PS = We gave birth
to each other's child.
እንተዴው እንተ-ኒሽ-ኒሽ ይከ-ከል-ስ-ኢ-እን-ኧኵ-እን = you 2PL (PO) each other love-RED-PA-2P-PL-IMP-PL = You love each other.
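Pattern A of the reciprocal can be recognised in much the same way as the frequentative: the stem and the passive suffix are copied around the linking vowel. The Python sketch below is a hypothetical helper for hyphen-segmented forms and covers only pattern A. The CVCVC pattern B and the pronominal pattern C are not handled.

```python
from typing import Optional


def reciprocal_stem(form: str) -> Optional[str]:
    """Return the stem of a pattern-A reciprocal (stem-ስ-ኧ-stem-ስ-...), else None."""
    m = form.split("-")
    if (len(m) >= 5 and m[1] == "ስ" and m[2] == "ኧ"
            and m[3] == m[0] and m[4] == "ስ"):
        return m[0]
    return None


print(reciprocal_stem("ፋጝ-ስ-ኧ-ፋጝ-ስ-እን-እⶖ"))
```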
EMPHATIC
A. The morpheme /-ገን/ which is attached to verbs shows emphasis. It is equivalent to the
Amharic /-እኮ/, which also shows the same function.
አን ስላጛ ጃⷕ-እⶖ-ገን = I beer drink-PS-EMP I did drink beer
እንት ስላጛ ጃⷕ-እይ-እⶖ-ገን = you beer drink-2P-PS-EMP You did drink beer
ኒ ስላጛ ጃⷕ-እⶖ-ገን = he beer drink-PS-EMP He did drink beer
ኒ ስላጛ ጃⷕ-እ-ት-ገን = she beer drink-PS-3F-EMP She did drink beer
አኔው ስላጛ ጃⷕ-ን-እⶖ-ገን = we beer drink-PL-PS-EMP We did drink beer
እንተዴው ስላጛ ጃⷕ-እይ-ን-እⶖ-ገን = you beer drink-2P-PL-PS-EMP You did drink beer
ናይዴው ስላጛ ጃⷕ-ን-እⶖ-ገን = they beer drink-PL-PS-EMP They did drink beer
B. The other nominal emphatic marker is /-(እ)ር/ which is equivalent to the Amharic /-ም/.
ታየ-ር በላይ-ር ዋስ-ን-እⶖ = Taye-too Belay-too hear-PL-PS = Taye too, Belay too heard.
ሚዝ-እስ-እር ጃⷕ-ስ-እⶖ = mead-AC-too drink-PA-PS = The mead too was drunk.
አነ-ኳንኳ-ኢ-ር ድዝ-እⶖ = our-language-NOM-too perish-PS = Our language too perished.
ኒ ገንጅ-እይ-ኧ-ድ አን-እር ገንጅ-ኧኵ = she sleep-3F-IMP-if I-too sleep-IMP = If she sleeps, I will sleep
too.
C. Cleft constructions are also used to show emphasis
ንኝ ጋላ ተኯስ-ስ-ኣጝ = house is burn-PA-RE It is the house which was burnt
ኒ-ት ጋላ ኒ ⷓል-ስ-ኤይ = he-AC is she see-RE It is him that she saw
አን ጋይል ይዅም ፋጝ-ኣር = I am last year marry-RE It is me who married last year
አንጅኝ ጋላ ናይዴው ፈሽ-ን-ኧው = yesterday is they take-PL-RE It is yesterday that they took
As these examples show, the emphasized element appears first, followed by the copula and then by
the relative verb. In normal utterances, however, the head noun follows its relative modifier.
D. The other particle which shows emphasis is /-ጌ/. It is equivalent to the Amharic እንጅ.
ዋስ-ኢው-ጌ = hear-JU-EMP Let me hear
ዋስ-ጌ = hear-EMP You hear
ዋስ-ድ-እወ-ጌ = hear-3M-JU-EMP Let him hear
ዋስ-ት-እወ-ጌ = hear-3F-JU-EMP Let her hear
ዋስ-ን-እወ-ጌ = hear-PL-JU-EMP Let us hear
ዋስ-ኧ-ጌ = hear-IMP-EMP You hear
ዋስ-ድ-እን-ወ-ጌ = hear-3P-PL-JU-EMP Let them hear
“The most frequently used emphatic marker is /-ጌ/, followed by /-(እ)ር/.”
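In these examples, the emphatic markers occur at the right edge of the word, so a stemmer can strip them before any inflectional suffix. The following Python sketch assumes hyphen-segmented input. The function name is hypothetical, and the marker set contains only /-ገን/, /-ጌ/ and /-(እ)ር/ as described above. Cleft constructions are syntactic and are ignored.

```python
from typing import Optional

EMPHATIC = {"ገን", "ጌ", "ር", "እር"}


def strip_emphatic(form: str) -> tuple[str, Optional[str]]:
    """Split off a word-final emphatic morpheme, if there is one."""
    head, sep, last = form.rpartition("-")
    if sep and last in EMPHATIC:
        return head, last
    return form, None


print(strip_emphatic("ጃⷕ-እⶖ-ገን"))
```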
3.3.4.6. MOOD
GERUNDIVE
The gerundive (GE) form shows that an action precedes another action designated by the main
verb. The predominantly used form is /-(እ)ወ/. Its variant /-ኧ/ appears after the suffix /-ይ/ in the
second singular and third female, where /ወ/ is deleted.
አን ⷓሸንት-እወ = I steal-GE I having stolen
እንት ⷓሸንት-እይ-ኧ = you steal-2P-GE You having stolen
ኒ ⷓሸንት-እወ = he steal-GE He having stolen
ኒ ⷓሸንት-እይ-ኧ = she steal-F-GE She having stolen
አኔው ⷓሸንት-እን-ወ = we steal-PL-GE We having stolen
እንተዴው ⷓሸንት-ኢ-ን-ወ = you steal-2P-PL-GE You having stolen
ናይዴው ⷓሸንት-እን-ወ = they steal-PL-GE They having stolen
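The distribution of the gerundive allomorphs in the paradigm above can be expressed as a small rule. In this hypothetical Python sketch, the epenthetic /እ/ of /-(እ)ወ/ is added only when no agreement marker precedes, which matches the ⷓሸንት- paradigm. The past perfect form ክዝ-ወ shows that the epenthesis also depends on the stem, and the sketch does not model that.

```python
def gerundive(stem: str, agreement: tuple[str, ...] = ()) -> str:
    """Attach the gerundive marker after any person/number markers."""
    if agreement and agreement[-1].endswith("ይ"):
        marker = "ኧ"    # /ወ/ deleted after /-ይ/ (2S, 3F)
    elif agreement:
        marker = "ወ"    # after another marker, e.g. ...-እን-ወ
    else:
        marker = "እወ"   # no agreement marker (1S, 3MS)
    return "-".join([stem, *agreement, marker])


print(gerundive("ⷓሸንት", ("እይ",)))
```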
The gerundive is not always overtly marked, especially in the speech of passive speakers. In such
speech it is designated by a zero morpheme for all persons, as in the following sample list.
አን ጋኝ-Ø = I run-GE I having run
እንት ጋኝ-ይ-Ø = you run-2P-GE You having run
ኒ ጋኝ-Ø = he run-GE He having run
ኒ ጋኝ-ይ-Ø = she run-3F-GE She having run
አኔው ጋኝ-እን-Ø = we run-PL-GE We having run
እንተዴው ጋኝ-ይ-እን-Ø = you run-2P-PL-GE You having run
ናይዴው ጋኝ-እን-Ø = they run-PL-GE They having run
JUSSIVE
The jussive (JU) has the following forms in all persons except the second.
አን ክልኝ-ኢው = I dance-JU Let me dance

ኒ ክልኝ-ድ-እወ = he dance-3M-JU Let him dance
ኒ ክልኝ-ት-እወ = she dance-3F-JU Let her dance
አኔው ክልኝ-ን-እወ = we dance-PL-JU Let us dance
ናይዴው ክልኝ-ድ-እን-ወ = they dance-3P-PL-JU Let them dance
አን ኪንት-ኢው = I learn-JU Let me learn
ኒ ኪንት-እድ-እወ = he learn-3M-JU Let him learn
ኒ ኪንት-እት-እወ = she learn-3F-JU Let her learn
አኔው ኪንት-ን-እወ = we learn-PL-JU Let us learn
ናይዴው ኪንት-እድ-እን-ወ = they learn-3P-PL-JU Let them learn
The jussive suffixes are /-ኢው/ for the first person singular and /-(እ)ወ/ for the other persons, which
makes the jussive similar to the gerundive. Unlike in the other verb paradigms, the suffix /-(እ)ድ/
marks the third male singular and third plural in the jussive. In the third female singular, /-እድ/
changes to its voiceless counterpart /-እት/.
The negative jussive, on the other hand, has the following form.
አን ኪንት-ኧ-ል-Ø = I learn-IMP-NEG-JU Let me not learn/I do not learn
ኒ ኪንት-እግ-ኢን = he learn-NEG-JU Let him not learn
ኒ ኪንት-እክ-ኢን = she learn-NEG-JU Let her not learn
አኔው ኪንት-እን-ኧ-ል-Ø = we learn-PL-IMP-NEG-JU Let us not learn /We do not learn
ናይዴው ኪንት-እግ-እን-ኢን = they learn-NEG-PL-JU Let them not learn
In the negative, the jussive marker is /-ኢን/, though not in all persons: in the first person singular
and plural, it is a zero morpheme. The first and third persons also differ in the negative particle.
In the first person, negation is marked by /-ል/, whereas in the third person it is marked by
/-ግ/ ~ /-ክ/, the negative marker that appears in subordinate verbs.
IMPERATIVE
As shown in Table 3.4, the imperative is expressed by a complete change of the verb (suppletion), by
suffixation of /-ኣ/, by the bare stem, or by a combination of the first two.
Positive Negative
ት- Come
ላⶖ- You (M) come ት-እት-ኣ = come-NEG-IM You (M) do not come
ላⶖ- You (F) come ት-እት-ኣ = come-NEG-IM You (F) do not come
ላⶖ-ኣ You (PL) come ት-እት-እን-ኣ = come-NEG-PL-IM You (PL) do not come
Positive Negative
ለሽ- Take
አስ- You (M) take ለሽ-እት-ኣ = take-NEG-IM You (M) do not take
አስ- You (F) take ለሽ-እት-ኣ = take-NEG-IM You (F) do not take
አስ-ኣ You (PL) take ለሽ-እት-እን-ኣ = take-NEG-PL-IM You (PL) do not take
Positive Negative
ⷓል- Look
ⷓል- You (M) look ⷓል-እት-ኣ = look-NEG-IM You (M) do not look
ⷓል- You (F) look ⷓል-እት-ኣ = look-NEG-IM You (F) do not look
ⷓል-ኣ You (PL) look ⷓል-እት-እን-ኣ = look-NEG-PL-IM You (PL) do not look
Positive Negative
ገንጀ- sleep
ገንጅ- You (M) sleep ገንጅ-እት-ኣ = sleep-NEG-IM You (M) do not sleep
ገንጅ- You (F) sleep ገንጅ-እት-ኣ = sleep-NEG-IM You (F) do not sleep
ገንጅ-ኣ You (PL) sleep ገንጅ-እት-እን-ኣ = sleep-NEG-PL-IM You (PL) do not sleep
Table 3.4: Imperative
As the table shows, in the affirmative the imperative ending appears only in the plural, whereas in
the negative it also appears in the singular. Irregular suppletive forms are found in the verbs ት-
'come' and ለሽ- 'take', whose imperative forms are ላⶖ 'come' and አስ 'take', respectively. The most
common imperative marker is thus a zero morpheme in the singular and /-ኣ/ in the plural.
In the second person polite form (2POL), the imperative marker is /-ኢን/.
ⷓል-ን-ኢን = see-PL-IM You look!
ገንጅ-ን-ኢን = sleep-PL-IM You sleep!
Unlike the 2PL, where the plural marker /-ን/ is dropped, the 2POL retains it. As the negative forms in
Table 3.4 show, the negative marker in the imperative is /-(እ)ት/ in both the singular and the plural.
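The imperative facts above reduce to a few decisions: suppletion applies only in the affirmative, the singular takes a zero marker, the plural takes /-ኣ/, and the negative adds /-(እ)ት/ (plus /-እን/ in the plural) before /-ኣ/ on the ordinary stem. The following Python sketch encodes these decisions over the hyphen-segmented notation of Table 3.4. The function name `imperative` and the `number` codes are hypothetical, and treating the polite form without suppletion is an assumption, since the 2POL is illustrated only with regular verbs.

```python
# Suppletive affirmative stems recorded in Table 3.4; other verbs use the bare stem.
SUPPLETIVE = {"ት": "ላⶖ", "ለሽ": "አስ"}


def imperative(stem, number="SG", negative=False):
    """Build a hyphen-segmented second person imperative (hypothetical helper).

    number: 'SG' (masculine or feminine singular), 'PL', or 'POL'
    (polite form, affirmative only).
    """
    if negative:
        if number not in ("SG", "PL"):
            raise ValueError("only SG and PL negatives are documented")
        # Negatives keep the ordinary stem: stem-እት[-እን]-ኣ.
        parts = [stem, "እት"] + (["እን"] if number == "PL" else []) + ["ኣ"]
        return "-".join(parts)
    if number == "SG":
        return SUPPLETIVE.get(stem, stem)          # zero marker
    if number == "PL":
        return SUPPLETIVE.get(stem, stem) + "-ኣ"   # plural /-ኣ/
    if number == "POL":
        # Assumption: 2POL is attested only for regular verbs (ⷓል, ገንጅ),
        # so suppletion is not applied here.
        return stem + "-ን-ኢን"
    raise ValueError("unknown number: %r" % number)
```

For example, `imperative("ት", "PL")` gives ላⶖ-ኣ, while `imperative("ት", "PL", negative=True)` gives ት-እት-እን-ኣ, reproducing the contrast between the suppletive affirmative and the regular negative.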
CONDITIONAL
A. The suffix /-ድ(ክ)/ marks the conditional or hypothetical mood, which is equivalent to 'if' in English
and /ከ-/ in Amharic. It is suffixed to the verb of the protasis clause. Like other subordinate
verbs, conditional verbs are inflected for person, number and tense/aspect.
አን ዋስ-ኧ-ድ(ክ) = I hear-IMP-CON If I hear
እንት ዋስ-ይ-ኧ-ድ(ክ) = you hear-2P-IMP-CON If you hear
ኒ ዋስ-ኧ-ድ(ክ) = he hear-IMP-CON If he hears
ኒ ዋስ-ይ-ኧ-ድ(ክ) = she hear-3F-IMP-CON If she hears
አኔው ዋስ-ን-ኧ-ድ(ክ) = we hear-PL-IMP-CON If we hear
እንተዴው ዋስ-ይ-እን-ኧ-ድ(ክ) = you hear-2P-PL-IMP-CON If you hear
ናይዴው ዋስ-ን-ኧ-ድ(ክ) = they hear-PL-IMP-CON If they hear
B. The conditional can also be expressed by the suffix /-ን/, which can substitute for /-ድ(ክ)/ without
any difference in meaning.
አን ፋጝ-ኧ-ን ሸራጝ ጋላ = I marry-IMP-CON good it is It is good if I marry
እንት ፋጝ-እይ-ኧ-ን ሸራጝ ጋላ = you marry-2P-IMP-CON good it is It is good if you marry
ኒ ፋጝ-ኧ-ን ሸራጝ ጋላ = he marry-IMP-CON good it is It is good if he marries
ኒ ፋጝ-እይ-ኧ-ን ሸራጝ ጋላ = she marry-3F-IMP-CON good it is It is good if she marries
አኔው ፋጝ-ን-ኧ-ን ሸራጝ ጋላ = we marry-PL-IMP-CON good it is It is good if we marry
እንተዴው ፋጝ-እይ-ን-ኧ-ን ሸራጝ ጋላ = you marry-2P-PL-IMP-CON good it is It is good if you marry
ናይዴው ፋጝ-ኧን-ኧ-ን ሸራጝ ጋላ = they marry-PL-IMP-CON good it is It is good if they marry
C. Temporal relations are expressed by the suffix /-ኙ/ 'when'.
አን ጋኝ-ኧ-ኙ = I run-IMP-CON When I run
እንት ጋኝ-ይ-ኧ-ኙ = you run-2P-IMP-CON When you run
ኒ ጋኝ-ኧ-ኙ = he run-IMP-CON When he runs
ኒ ጋኝ-ይ-ኧ-ኙ = she run-3F-IMP-CON When she runs
አኔው ጋኝ-ን-ኧ-ኙ = we run-PL-IMP-CON When we run
እንተዴው ጋኝ-ይ-እን-ኧ-ኙ = you run-2P-PL-IMP-CON When you run
ናይዴው ጋኝ-ን-ኧ-ኙ = they run-PL-IMP-CON When they run
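Paradigms A and C share the same agreement chain (stem, agreement morphemes, imperfect /-ኧ/, then the subordinate marker); only the final marker differs. Paradigm B is not covered, because its 2nd person and 3rd person plural forms contain extra vowels (ፋጝ-እይ-, ፋጝ-ኧን-). The Python sketch below, with a hypothetical `AGREEMENT` table read off the examples above, generates these forms and groups the persons whose forms coincide. Such coincidences (1S = 3M, 2S = 3F, 1PL = 3PL) matter for a stemmer, since identical surface forms reduce to the same stem whatever their person.

```python
# Agreement morphemes between the stem and the imperfect vowel -ኧ, as segmented
# in the /-ድ(ክ)/ (A) and /-ኙ/ (C) paradigms above.
AGREEMENT = {
    "1S": [], "2S": ["ይ"], "3M": [], "3F": ["ይ"],
    "1PL": ["ን"], "2PL": ["ይ", "እን"], "3PL": ["ን"],
}


def subordinate(stem, person, marker):
    """Stem + agreement + imperfect -ኧ + subordinate marker, hyphen-segmented."""
    return "-".join([stem] + AGREEMENT[person] + ["ኧ", marker])


def homographs(stem, marker):
    """Group the persons whose segmented forms coincide."""
    groups = {}
    for person in AGREEMENT:
        groups.setdefault(subordinate(stem, person, marker), []).append(person)
    return {form: persons for form, persons in groups.items() if len(persons) > 1}


if __name__ == "__main__":
    print(subordinate("ዋስ", "2PL", "ድ(ክ)"))  # If you (PL) hear
    print(homographs("ጋኝ", "ኙ"))
```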
3.3.4.7. INTERROGATIVE
There are three kinds of interrogatives in Kemantney. The first is morphological, using the verbal
suffixes /-ኢ/, /-ማ/ and /-ኣ/; it applies to yes/no questions (Q).
A. Sample present/future interrogative conjugations are given below.
ጋኝ-ኢ = run-Q ጋኝ-ኢ-ማ = run-Q-Q (EMP) Do I run/will I run?
ጋኝ-ይ-ኢ = run-2P-Q ጋኝ-ይ-ኢ-ማ = run-2P-Q-Q (EMP) Do you run/will you run?
ጋኝ-ኢ = run-Q ጋኝ-ኢ-ማ = run-Q-Q (EMP) Does he run/will he run?
ጋኝ-ይ-ኢ = run-3F-Q ጋኝ-ይ-ኢ-ማ = run-3F-Q-Q (EMP) Does she run/will she run?
ጋኝ-ን-ኢ = run-PL-Q ጋኝ-ን-ኢ-ማ = run-PL-Q-Q (EMP) Do we run/will we run?
ጋኝ-ይ-ኧን-ኢ = run-2P-PL-Q ጋኝ-ይ-ኧን-ኢ-ማ = run-2P-PL-Q-Q (EMP) Do you run/will you run?
ጋኝ-ኧን-ኢ = run-PL-Q ጋኝ-ኧን-ኢ-ማ = run-PL-Q-Q (EMP) Do they run/will they run?
The interrogative marker /-ኢ/ is suffixed directly to the verb stem in the first person singular and
third person masculine singular, and to the inflected form in the other persons. In the emphatic
(EMP) forms, it is followed by the interrogative suffix /-ማ/, which is equivalent to the Amharic
interrogative marker /-ን/.
B. When the verb is in the simple past, the interrogative marker is /-ኣ/.
ጋኝ-ኣ = run-Q Did I run?

ጋኝ-ይ-ኣ = run-2P-Q Did you run?
ጋኝ-ኣ = run-Q Did he run?
ጋኝ-ይ-ኣ = run-3F-Q Did she run?
ጋኝ-ን-ኣ = run-PL-Q Did we run?
ጋኝ-እይ-ን-ኣ = run-2P-PL-Q Did you run?
ጋኝ-ን-ኣ = run-PL-Q Did they run?
C. In the present perfect, the interrogative marker remains /-ኢ/. Sample present perfect forms are the
following:
ጋኝ-ዋን-ኢ = run-AUX-Q Have I run?
ጋኝ-ይ-ኣን-ኢ = run-2P-AUX-Q Have you run?
ጋኝ-ዋን-ኢ = run-AUX-Q Has he run?
ጋኝ-ይ-ኣን-ኢ = run-3F-AUX-Q Has she run?
ጋኝ-ን-ዋን-ኢ = run-PL-AUX-Q Have we run?
ጋኝ-ይ-ን-ዋን-ኧን-ኢ = run-2P-PL-AUX-PL-Q Have you run?
ጋኝ-ን-ዋን-ኢ = run-PL-AUX-Q Have they run?
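Parts A to C amount to choosing the yes/no question suffix by tense: /-ኢ/ for the present/future and the present perfect, /-ኣ/ for the simple past, with an optional emphatic /-ማ/ after /-ኢ/ in the present/future. The Python sketch below appends the marker to an already inflected, hyphen-segmented verb. The function name and tense labels are hypothetical, and restricting the emphatic /-ማ/ to the present/future reflects only the forms illustrated above.

```python
QUESTION_MARKERS = {"present": "ኢ", "past": "ኣ", "perfect": "ኢ"}


def yes_no_question(inflected, tense, emphatic=False):
    """Append the yes/no question suffix to a hyphen-segmented inflected verb.

    tense: 'present' (present/future), 'past' (simple past) or
    'perfect' (present perfect, i.e. after the auxiliary -ዋን/-ኣን).
    """
    if tense not in QUESTION_MARKERS:
        raise ValueError("unknown tense: %r" % tense)
    form = inflected + "-" + QUESTION_MARKERS[tense]
    if emphatic:
        # Emphatic -ማ is only illustrated for present/future forms above.
        if tense != "present":
            raise ValueError("emphatic -ማ is only illustrated for present/future")
        form += "-ማ"
    return form
```

For instance, `yes_no_question("ጋኝ-ን", "past")` returns ጋኝ-ን-ኣ (Did we run?), and `yes_no_question("ጋኝ-ን-ዋን", "perfect")` returns ጋኝ-ን-ዋን-ኢ (Have we run?).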
D. The negative interrogative involves a relative verb in the imperfect tenses.
ጋኝ-ኧ-ግ-ኢር-ማ = run-IMP-NEG-RE-Q Do I not run/will I not run?
ጋኝ-ኧ-ክ-ኣር-ማ = run-IMP-NEG(2P)-RE-Q Do you not run/will you not run?
ጋኝ-ኧ-ግ-ኣጝ-ማ = run-IMP-NEG-RE-Q Does he not run/will he not run?
ጋኝ-ኧ-ክ-ኧይ-ማ = run-IMP-NEG(3F)-RE-Q Does she not run/will she not run?
ጋኝ-ኧ-ግ-እን-ኢር-ማ = run-IMP-NEG-PL-RE-Q Do we not run/will we not run?
ጋኝ-ኧ-ክ-እን-ኢር-ማ = run-IMP-NEG(2P)-PL-RE-Q Do you not run/will you not run?
ጋኝ-ኧ-ግ-ኧው-ማ = run-IMP-NEG-RE-Q Do they not run/will they not run?
E. The negative interrogative involves a relative verb in the simple past.
ጋኝ-ግ-ኢር-ማ = run-NEG-RE-Q Did I not run?
ጋኝ-ክ-ኣር-ማ = run-NEG(2P)-RE-Q Did you not run?
ጋኝ-ግ-ኣጝ-ማ = run-NEG-RE-Q Did he not run?
ጋኝ-ክ-ኧይ-ማ = run-NEG(3F)-RE-Q Did she not run?
ጋኝ-ግ-እን-ኢር-ማ = run-NEG-PL-RE-Q Did we not run?
ጋኝ-ክ-እን-ኢር-ማ = run-NEG(2P)-PL-RE-Q Did you not run?
ጋኝ-ግ-ኧው-ማ = run-NEG-RE-Q Did they not run?
F. In relative verbs, the interrogative particle is /-ማ/. In the following examples, its final vowel /ኣ/
changes to /ኧ/ (giving /-መ/) when it is followed by the enclitic /-ኒ/, which replaces the copula (CO).
ጋኝ-ኣር-መ-ኒ = run-RE-Q-CO Am I the one who will run?

ጋኝ-እይ-ኣር-መ-ኒ = run-2P-RE-Q-CO Are you the one who will run?
ጋኝ-ኣጝ-መ-ኒ = run-RE-Q-CO Is he the one who will run?
ጋኝ-እይ-ኣር-መ-ኒ = run-3F-RE-Q-CO Is she the one who will run?
ጋኝ-ን-ኣር-መ-ኒ = run-PL-RE-Q-CO Are we the ones who will run?
ጋኝ-እይ-ኧን-ኣር-መ-ኒ = run-2P-PL-RE-Q-CO Are you the ones who will run?
ጋኝ-ኧው-መ-ኒ = run-RE-Q-CO Are they the ones who will run?
Future tense
ጋኝ-ኣር-መ-ኒ = run-RE-Q-CO Will I be the one who will run?

ጋኝ-እይ-ኣር-መ-ኒ = run-2P-RE-Q-CO Will you be the one who will run?
ጋኝ-ኣጝ-መ-ኒ = run-RE-Q-CO Will he be the one who will run?
ጋኝ-እይ-ኣር-መ-ኒ = run-3F-RE-Q-CO Will she be the one who will run?
ጋኝ-ን-ኣር-መ-ኒ = run-PL-RE-Q-CO Will we be the ones who will run?
ጋኝ-እይ-ኧን-ኣር-መ-ኒ = run-2P-PL-RE-Q-CO Will you be the ones who will run?
ጋኝ-ኧው-መ-ኒ = run-RE-Q-CO Will they be the ones who will run?
G. When asking permission, the jussive forms are used with the interrogative suffix /-ማ/.
ጃⷕ-ኢው-ማ = drink-JU-Q May I drink?
ጃⷕ-ት-እው-ማ = drink-2P-JU-Q May you drink?
ጃⷕ-ድ-እወ-ማ = drink-3MS-JU-Q May he drink?
ጃⷕ-ት-እወ-ማ = drink-3F-JU-Q May she drink?
ጃⷕ-ን-እወ-ማ = drink-PL-JU-Q May we drink?
ጃⷕ-ት-እን-ወ-ማ = drink-2P-PL-JU-Q May you drink?
ጃⷕ-ድ-እን-ወ-ማ = drink-3P-PL-JU-Q May they drink?
H. The last type of yes/no question involves the question particle ዊያ 'is it so?'. This is
similar to the Amharic interrogative marker ወይ.
ጃⷕ-እⶖ ዊያ = ጃⷕⶖ ዊያ Is it so that I/he drank?
ጃⷕ-ይ-እⶖ ዊያ = ጃⷕይⶖ ዊያ Is it so that you drank?
ጃⷕ-ን-እⶖ ዊያ = ጃⷕንⶖ ዊያ Is it so that we drank?
3.3.4.8. NEGATION
Negation is marked by a special set of suffixes, which appear word-finally after the person,
number and tense/aspect markers.
Positive Negation
ትኝ-እⶖ = find-PS I found ትኝ-እ-ል = find-PS-NEG I did not find
ትኝ-ይ-እⶖ = find-2P-PS You found ትኝ-ይ-እ-ላ = find-2S-PS-NEG You did not find
ትኝ-እⶖ = find-PS He found ትኝ-እ-ላ = find-PS-NEG He did not find
ትኝ-እ-ት = find-PS-3F She found ትኝ-ይ-እ-ላ = find-F-PS-NEG She did not find
ትኝ-ን-እⶖ = find-PL-PS We found ትኝ-እ-ል-ላ/ል = find-PS-PL-NEG We did not find
ትኝ-ኢ-ን-እⶖ = find-2P-PL-PS You found ትኝ-ይ-እ-ል-ላ = find-2P-PS-PL-NEG You did not find
ትኝ-ን-እⶖ = find-PL-PS They found ትኝ-እ-ል-ላ = find-PS-PL-NEG They did not find
As the table shows, negation is marked by the morpheme /-ላ/ except in the first person singular,
where it is /-ል/. The independent negative particle እልላ 'no' is derived from this negative suffix.
The negative forms of the imperfect aspect are presented below in Table 3.5.
Positive Negation
ትኝ-ኧኵ I find ትኝ-ኧ-ል = find-IMP-NEG I do not find
ትኝ-ይ-ኧኵ You find ትኝ-ይ-ኧ-ላ = find-2S-IMP-NEG You do not find
ትኝ-ኧኵ He finds ትኝ-ኧ-ላ = find-IMP-NEG He does not find
ትኝ-ኧ-ት She finds ትኝ-ይ-ኧ-ላ = find-F-IMP-NEG She does not find
ትኝ-ን-ኧኵ We find ትኝ-ን-ኧ-ል = find-PL-IMP-NEG We do not find
ትኝ-ይ-ኧኵ-እን You find ትኝ-ይ-ኧ-ል-ላ = find-2P-IMP-PL-NEG You do not find
ትኝ-ኧኵ-እን They find ትኝ-ኧ-ል-ላ = find-IMP-PL-NEG They do not find
ትኝ-ኧኵ I will find ትኝ-ኧ-ል = find-IMP-NEG I will not find

ትኝ-ይ-ኧኵ You will find ትኝ-ይ-ኧ-ላ = find-2S-IMP-NEG You will not find
ትኝ-ኧኵ He will find ትኝ-ኧ-ላ = find-IMP-NEG He will not find
ትኝ-ኧ-ት She will find ትኝ-ይ-ኧ-ላ = find-F-IMP-NEG She will not find
ትኝ-ን-ኧኵ We will find ትኝ-ን-ኧ-ል = find-PL-IMP-NEG We will not find
ትኝ-ይ-ኧኵ-እን You will find ትኝ-ይ-ኧ-ል-ላ = find-2P-IMP-PL-NEG You will not find
ትኝ-ኧኵ-እን They will find ትኝ-ኧ-ል-ላ = find-IMP-PL-NEG They will not find
Table 3.5: Negation of imperfect aspect
The assimilation of the plural morpheme /-ን/ to the negative morpheme /-ል/ (giving -ል-ላ in the
second and third person plural) is confirmed by the first person plural in Table 3.5, where the usual
affirmative plural marker /-ን/ appears instead. Here /-ል/ is not geminated because the plural slot is
filled by /-ን/.
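The negative column of Table 3.5 can be regenerated from three choices: /-ል/ in the first person, /-ላ/ in the other singular persons, and /-ል-ላ/ in the second and third person plural, where (following the analysis above) the first /-ል/ is the assimilated plural marker. The Python sketch below is a hypothetical illustration (`negative_imperfect` and `NEG_AGREEMENT` are names introduced here); it assumes the agreement slots exactly as segmented in Table 3.5.

```python
# Agreement morphemes before the imperfect vowel -ኧ, as segmented in the
# negative column of Table 3.5.
NEG_AGREEMENT = {"1S": [], "2S": ["ይ"], "3M": [], "3F": ["ይ"],
                 "1PL": ["ን"], "2PL": ["ይ"], "3PL": []}


def negative_imperfect(stem, person):
    """Return the hyphen-segmented negative imperfect form of `stem`."""
    parts = [stem] + NEG_AGREEMENT[person] + ["ኧ"]
    if person in ("1S", "1PL"):
        parts.append("ል")        # first person negative /-ል/ (1PL keeps plural -ን)
    elif person in ("2PL", "3PL"):
        parts += ["ል", "ላ"]      # plural -ን assimilated to -ል, then negative /-ላ/
    else:
        parts.append("ላ")        # other singular persons: /-ላ/
    return "-".join(parts)


if __name__ == "__main__":
    for person in NEG_AGREEMENT:
        print(person, negative_imperfect("ትኝ", person))
```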
3.3.4.9. EMBEDDED VERBS
RELATIVE PARADIGM
In Kemantney, the relative verb precedes its head and agrees with it in number, gender and
person. The relative markers occur word-finally, after the agreement inflections. Unlike other
embedded verbs, which mark tense/aspect by the vowel /-ኧ/, relative verbs neutralize this
distinction. In the following paradigm, the morpheme /-ኣጝ/ agrees with the noun ጋባ 'thing'.
አን ዋስ-ኣጝ ጋባ = I hear-RE thing The thing that I heard

እንት ዋስ-እይ-ኣጝ ጋባ = you hear-2P-RE thing The thing that you heard
ኒ ዋስ-ኣጝ ጋባ = he hear-RE thing The thing that he heard
ኒ ዋስ-እይ-ኣጝ ጋባ = she hear-3F-RE thing The thing that she heard
አኔው ዋስ-ን-ኣጝ ጋባ = we hear-PL-RE thing The thing that we heard
እንተዴው ዋስ-እይ-ኧን-ኣጝ ጋባ = you hear-2P-PL-RE thing The thing that you heard
ናይዴው ዋስ-እን-ኣጝ ጋባ = they hear-PL-RE thing The thing that they heard
Relative verbs take the accusative case marker when the head noun is null, as in the following examples.
አን ስላጛ ጃⷕ-ኣጝ-እስ አⷕ-ኧ-ል = I beer drink-RE-AC know-IMP-NEG = I do not know who drank beer.
አን ስላጛ ጃⷕ-እይ-ኧይ-እት አⷕ-ኧ-ል = I beer drink-3F-RE-AC know-IMP-NEG = I do not know who drank beer.
አን ስላጛ ጃⷕ-ኧው-እስ አⷕ-ኧ-ል = I beer drink-RE-AC know-IMP-NEG = I do not know who drank beer.
SUBORDINATOR /-ኘ/
This morpheme is suffixed to verb stems and has a subordinating function. /-ኘ/ is equivalent
to the Amharic /እንደ-/ or the English 'in order to'.
አን ከብ-ኧ-ኘ አዘስ-ስ-እ-ል = I cut-IMP-SUB order-PA-PS-NEG (1S) = I am not ordered in order to cut.
እንት ከብ-ይ-ኧ-ኘ አዘስ-ስ-ይ-እ-ላ = you cut-2S-IMP-SUB order-PA-2S-PS-NEG (2MS) = You are not ordered in order to cut.
ኒ ከብ-ኧ-ኘ አዘስ-ስ-እ-ላ = he cut-IMP-SUB order-PA-PS-NEG (3MS) = He is not ordered in order to cut.
ኒ ከብ-ይ-ኧ-ኘ አዘስ-ስ-ይ-እ-ላ = she cut-3F-IMP-SUB order-PA-3F-PS-NEG (3F) = She is not ordered in order to cut.
አኔው ከብ-ን-ኧ-ኘ አዘስ-ስ-እ-ል-ላ/ል = we cut-PL-IMP-SUB order-PA-PS-PL-NEG (1PL) = We are not ordered in order to cut.
እንተዴው ከብ-ይ-ን-ኧ-ኘ አዘስ-ስ-ይ-እ-ል-ላ = you cut-2P-PL-IMP-SUB order-PA-2P-PS-PL-NEG (2PL) = You are not ordered in order to cut.
ናይዴው ከብ-ኧን-ኧ-ኘ አዘስ-ስ-እ-ል-ላ = they cut-PL-IMP-SUB order-PA-PS-PL-NEG (3PL) = They are not ordered in order to cut.
SUBORDINATOR /-ትዝ/
This suffix marks a condition and appears after the person, number and gender inflections.
አን ጟርሽ-ኧ-ትዝ ጋንጅ-ኧኵ = I study-IMP-SUB sleep-IMP = If I study, I will sleep.
እንት ጟርሽ-እይ-ኧ-ትዝ ጋንጅ-እይ-ኧኵ = you study-2P-IMP-SUB sleep-2P-IMP = If you study, you will sleep.
ኒ ጟርሽ-ኧ-ትዝ ጋንጅ-ኧኵ = he study-IMP-SUB sleep-IMP = If he studies, he will sleep.
ኒ ጟርሽ-እይ-ኧ-ትዝ ጋንጅ-ኧ-ት = she study-3F-IMP-SUB sleep-IMP-3F = If she studies, she will sleep.
አኔው ጟርሽ-እን-ኧ-ትዝ ጋንጅ-እን-ኧኵ = we study-PL-IMP-SUB sleep-PL-IMP = If we study, we will sleep.
እንተዴው ጟርሽ-እይ-ን-ኧ-ትዝ ጋንጅ-እይ-ን-ኧኵ = you study-2P-PL-IMP-SUB sleep-2P-PL-IMP = If you study, you will sleep.
ናይዴው ጟርሽ-እን-ኧ-ትዝ ጋንጅ-እን-ኧኵ = they study-PL-IMP-SUB sleep-PL-IMP = If they study, they will sleep.
ADVERBIAL SUBORDINATOR /-ኙ/
As mentioned under the conditional mood, this suffix appears in subordinate stative or temporal
(durative) verbs.
ኒ አን ጋኝ-ኧ-ኙ ⷓል-እⶖ = he I run-IMP-SUB see-PS = He saw me while/when I ran.
ኒ አኔው ጋኝ-ን-ኧ-ኙ ⷓል-እⶖ = he we run-PL-IMP-SUB see-PS = He saw us while/when we ran.
ኒ እንት ጋኝ-እይ-ኧ-ኙ ⷓል-እⶖ = he you run-2S-IMP-SUB see-PS = He saw you while/when you ran.
ኒ ኒ ጋኝ-ኧ-ኙ ⷓል-እⶖ = he he run-IMP-SUB see-PS = He saw him while/when he ran.
ኒ እንተዴው ጋኝ-ይ-እን-ኧ-ኙ ⷓል-እⶖ = he you run-2P-PL-IMP-SUB see-PS = He saw you while/when you ran.
ኒ ናይዴው ጋኝ-ኧን-ኧ-ኙ ⷓል-እⶖ = he they run-PL-IMP-SUB see-PS = He saw them while/when they ran.
ኒ ኒ ጋኝ-እይ-ኧ-ኙ ⷓል-እⶖ = he she run-3F-IMP-SUB see-PS = He saw her while/when she ran.
3.3.5. ADVERB
According to Zelealem [1], Kemantney has few adverbs (ADVs). They appear immediately before
verb phrases (VPs).
The most commonly used adverbs are time adverbs.
አንጅኝ Yesterday ይዅም Last year
አመር Tomorrow ሻⶓ Next year
አመር-ሊ ስየ = አመርሊ ስየ The day after tomorrow ንኝ Now
አንጅኝ-ሊ ስየ = አንጅኝሊ ስየ The day before yesterday
Some examples of time adverbs are the following:
ኒ አንጅኝ ክዝ-ስ-እⶖ = yesterday sell-PA-PS It was sold yesterday

ኒ አመር ጏዝ-ኧኵ = tomorrow plough-IMP He will plough tomorrow
ኒ አመር-ሊ ስየ ኸሽ-እⶖ = tomorrow-from farther wash-PS He will wash the day after tomorrow
ኒ አንጅኝ-ሊ ስየ ተባጝ-እⶖ = yesterday-from farther milk-PS He milked the day before yesterday
ኒ ይዅም ት-እⶖ = last year come-PS He came last year
ኒ ሻⶓ ፋጝ-ኧኵ = pass the rainy season marry-IMP He will marry next year
አነ ንኝ ⷓል-ኧል = today/now see-NEG I do not see today/now
There are few degree and manner adverbs.
ሸርሽ Very
ሰሲየሽ Immediately
ወሊሽ Quickly
Some examples of degree and manner adverbs are the following:
ሸርሽ ሸገር-እⶖ = very trouble-PS He was very troubled

ሰሲየሽ ሸወር-ስ-እⶖ = immediately disappear-PA-PS He disappeared immediately
ወሊሽ ላⶖ = fast come Come fast!
3.3.6. QUESTION WORDS
According to Zelealem [1], the following are question words in Kemantney.
ዊⶖዝ Why አው Who አውን When አዊ Whose
አውት Where ወ What ዊⷓ How many/much ዊⶖ How
Question words appear after the subject and before the predicate. However, they can
also appear sentence-initially, as shown below.
ወ-ኒ ጋቢ = what-is thing = What is the matter?

አው ገንጅ-እⶖ = who sleep-PS = Who slept?
አውት ፈ-ት-ኣር-ኒ = where go-2P-RE-are = Where are you going?
ወ ሸብ-ወ ጋኝ-እⶖ = what do-GE run-RE = Having done what did he run?
3.3.7. CONNECTIVES
In Kemantney, the form ወይር 'or' is a coordinator.
ኒ ጏዅይ-ኣጝ ወይር ሰካራም ጋላ = he be mad-RE or drunkard he is He is either mad or a drunkard

ታየ ወይር ካሳ ቲይ-ኧኵ-እን = Taye or Kasa come-IMP-PL Either Taye or Kasa will come
አን በይላ ወይር ፍርዛ ዋይት-ኧኵ = I mule or horse buy-IMP I will buy either a mule or a horse
The connective ወይ-እር is a hybrid of the Amharic ወይ 'either' and the Kemantney /-(እ)ር/ 'too'.
The discontinuous morpheme /-(እ)ዝኵ-(እ)ዝ/ is a conjunction linking nominals. It appears on both
nouns: /-(እ)ዝኵ/ with the first conjunct and /-(እ)ዝ/ with the second. Where there are more than
two conjuncts, the connective appears only on the last two, i.e. the second-to-last and the last conjunct.
ይ-ዝኵ ኒ-ዝ ስም-ን-ኣጝ-ኧስ ለወይ-ን-ኧኵ = I-and he-and live-PL-RE-AC change-PL-IMP = I and he will change our residence.
ኒ ዳሚያ ግዝኝ ዲርዋ-ዝኵ ፍንትራ-ዝ ባⷔ ሸይን-ኧኵ = he cat dog hen-and goat-and only hold-IMP = He has only a cat, a dog, a hen and a goat.
አምብረ-ዝኵ ዲንሸ-ዝ አታክልት-ኧን ጋጝልላ = cabbage-and potato-and vegetable-PL they are = Cabbage and potato are vegetables.
አይከል-እዝኵ ጎንደር-እዝ ጎረቤት-ኧን ጋጝልላ = Aykel-and Gonder-and neighbour-PL they are = Aykel and Gondar are neighbours.
Less competent speakers of the language tend to reduce /-(እ)ዝኵ-(እ)ዝ/ to /-ኵ/.
ይ-ዝኵ ኵ-ዝ አን-ኵ እንት = አንኵ እንት I and you

ቡረ-ዝኵ ጌሾ-ዝ ቡረ-ኵ ጌሾ = ቡረኵ ጌሾ Malt and hop
ሳርደ-ዝኵ ምርፋ-ዝ ሳርደ-ኵ ምርፋ = ሳርደኵ ምርፋ Blade and needle
ዅነ-ዝኵ ጃⷕ-ነ-ዝ ዅነ-ኵ ጃⷕነ = ዅነኵ ጃⷕነ Eating and drinking
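The placement rule for the discontinuous connective (only the last two conjuncts are marked) and its reduced variant can be stated compactly. The Python sketch below is a hypothetical illustration over the hyphenated notation used above; `coordinate` is a name introduced here, and the optional epenthetic /እ/ (as in አይከል-እዝኵ) is deliberately not modelled, because the examples do not make its conditioning fully clear.

```python
def coordinate(nouns, reduced=False):
    """Join nominal conjuncts with the discontinuous connective -ዝኵ ... -ዝ.

    Only the last two conjuncts are marked. With reduced=True, the pattern of
    less competent speakers is used: -ኵ on the first member of the pair only.
    The optional epenthetic እ of -(እ)ዝኵ/-(እ)ዝ is not modelled.
    """
    if len(nouns) < 2:
        raise ValueError("need at least two conjuncts")
    *unmarked, first, second = nouns
    if reduced:
        pair = [first + "-ኵ", second]
    else:
        pair = [first + "-ዝኵ", second + "-ዝ"]
    return " ".join(unmarked + pair)


if __name__ == "__main__":
    print(coordinate(["ዳሚያ", "ግዝኝ", "ዲርዋ", "ፍንትራ"]))  # a cat, a dog, a hen and a goat
    print(coordinate(["ቡረ", "ጌሾ"], reduced=True))       # malt and hop (reduced)
```

The first call reproduces ዳሚያ ግዝኝ ዲርዋ-ዝኵ ፍንትራ-ዝ from the examples above, and the second reproduces the reduced ቡረ-ኵ ጌሾ.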
The coordinating conjunction አዃር 'but' occurs between two sentences.
ኒ ይ-ት ካሽኝ-እ-ት አዃር አን ዋስ-እ-ላ = she me-OBJ call-PS-3F but I hear-PS-NEG = She called me but I did not hear.
አን ⷓል-እⶖ ኒ አዃር ⷓል-እ-ላ = I see-PS he but see-PS-NEG = I saw him but he did not see me.
አን ምረወ-ስ ታይ-ኧⶖ አዃር ኒ ኪይ-እ-ላ = I snake-AC hit-PS but he die-PS-NEG = I hit the snake but it did not die.
Another conditional subordinator is /-ኒር/ 'even though' or 'whether', which is a fusion of the
conditional subordinator /-ን/ and the particle /-ኢር/ 'too'.
ኒ ከደም ይ-Ø ገንጅ-ኧ-ን-ኢር ናን-ታጘሽ ጕይ-እ-ላ = he early say-GE sleep-IMP-SUB-too now-until wake up-PS-NEG = Even though he slept early, he did not wake up until now.
ኒ ለለማ አጝይ-ኧ-ን-ኢር ኒሽ-ሽብካ ሸበት-እዋን-ኧኵ = she baby be-IMP-SUB-too her-hair be grey-AUX-IMP = Even though she is young, her hair has become gray.
ናይዴው ካሽኝ-ኧን-ኧ-ን-ኢር ካሽኝ-ኧግ-ኧን-ኧ-ን-ኢር አን ፊይ-ኧኵ = they call-PL-IMP-if-also call-NEG-PL-IMP-if-also I go-IMP = Whether they call me or do not call me, I will go.
ኒ ሽⶖስ-ኧ-ን-ኢር ሽⶖስ-ኧግ-ኧ-ን-ኢር ሹም-ኧኵ = he be sick-IMP-if-also be sick-NEG-IMP-if-also fast-IMP = Whether he is sick or is not sick, he fasts.
3.3.8. NOUN PHRASE
Kemantney is a head-final language. In a noun phrase, the head appears following its modifiers:
demonstratives, adjectives, numerals, genitive nouns and relative clauses.
A. Simple noun phrases with noun heads include the following:
ሸመና እንጀኘንታ = black big-snake A black big-snake

ላጝ ሽሮ ዲⷓ ይር = one earnest poor man One earnest poor man
አⷕ-ኧንተ ይር = know-AG man A wise man
ይኒ ከው = that village That village
ዲርዋ-ኢ ላባ = hen-LINK feather Feather of hen
ጃⷕ-ስ-ኣጝ ስላጛ = drink-PA-RE beer The beer which was drunk
B. Quantifiers may either follow or precede their heads (N+Q or Q+N), as the following pairs show.
ዅራ ወልታ ወልተ ዅርራ Six children
እይየን ወልታ ወልተ እይየን Six persons
ቢልት ወልታ ወልተ ቢልት Six oxen
ፍንትራ ኒኛ ኒኘ ፍንትር Two goats
ንኝክ ኒኛ ኒኘ ንኝክ Two houses
ከውክ ኒኛ ኒኘ ከውክ Two villages
The N+Q pattern is used most often by fluent native speakers, which suggests that the original
pattern may be N+Q. The common occurrence of this pattern in Cushitic languages such as Bilin and
Oromo may also be good evidence for this claim. Hence, the shift to Q+N, especially among young
passive speakers, is clearly due to the influence of Amharic.
C. When a noun is modified by a numeral, it may appear either in its singular or plural form.
ኒኘ ፍንተራ/ ፍንተር Two goats ወልተ ዅራ/ዅርራ Six children
ኒኘ ቢራ/ ቢልት Two oxen ወልተ ይር/እይየን Six persons
ኒኘ ከው/ከውክ Two villages ወልተ ቢራ/ ቢልት Six oxen
D. In the following noun phrases, the noun head is preceded by a demonstrative.
እኒ ኵራ This river ይኒ ኵራ That river

እንደው ኵርክ These rivers ይንደው ኵርክ Those rivers
E. Demonstratives precede other modifiers like adjectives.
ይኒ ለገዝ-ኣጝ ይር = that be tall-RE man = That tall man

እኒ ይዘን-ኣጝ ቢራ = this be fat-RE ox = This fat ox
ይኒ ጅከክ-ኣጝ ምውት = that be heavy-RE load = That heavy load
ይኒ ለገዝ-ኣጝ ይዘን-ኣጝ ይር = that be tall-RE be fat-RE man = That tall fat man
ይኒ ሸመኒ ይዘን-ኣጝ ቢራ-ኢ = that black be fat-RE ox-NOM = That black fat ox
ይኒ ሻይ-ኣጝ ግሴይር-ኣጝ ፍንትራ-ኢ = that be white-RE be short-RE goat-NOM = That white short goat
F. When an NP contains both a relative clause and an adjective, the relative clause precedes the
adjective.
አን አንጅን ዋይት-ኣጝ አዚ ሰይኝ = I yesterday buy-RE new cloth = The new cloth which I bought yesterday.
ኒ ከብ-ኣጝ ካጋጝ ካና-ስ = he cut-RE dry wood-AC = The dry wood which he cut.
ታየ ማል-ኣጝ አዚ ስንኯት-ኢ = Taye lose-RE new axe-NOM = The new axe which Taye lost.
እንት ⷓል-ኤይ ሸመና በይላ = you see-RE black mule = The black mule which you saw.
3.3.9. EMBEDDED CLAUSES
Generally, an embedded clause precedes the main verb.
አን ኒ ቲይ-ኧ-ኘ አⷕ-ኧኵ = I he come-IMP-SUB know-IMP = I know that he will come.
እንት ኒ ⷓጝ-ኧ-ኘ አⷕ-ኧይ-ኧኵ = you he win-IMP-SUB know-2S-IMP = You know that he will win.
አን ክማንትነይ ዅራ ዋን-ኧ-ኙ ኪንት-እⶖ = I Kemantney child have-IMP-SUB learn-PS = I learned Kemantney when I was a child.
አን ንኝ-ወ ትው-ኧ-ኙ ናይዴው ትምቢይ-ን-እⶖ = I house-to enter-IMP-SUB they stand-PL-PS = When I entered the house, they stood up.
Interrogative subordinate clauses also precede the main verb, as in the following:
አን ወ ሸብ-ኣር ት-እⶖ = I what do-RE come-PS = What to do did I come?

አኔው ወ ሸብ-ን-ኣር ት-እን-ኧⶖ = we what do-PL-RE come-PL-PS = What to do did we come?
እንት ወ ሸብ-ኢይ-ኣር ት-ኢይ-ኧⶖ = you what do-2P-RE come-2P-PS = What to do did you come?
እንተዴው ወ ሸብ-ኢ-ን-ኣር ት-ኢ-ን-ኤⶖ = you (PL) what do-2P-PL-RE come-2P-PL-PS = What to do did you (PL) come?
ኒ ወ ሸብ-ኣጝ ት-እⶖ = he what do-RE come-PS = What to do did he come?
ናይዴው ወ ሸብ-ኢይ-ኧው ት-እን-እⶖ = they what do-RE come-PL-PS = What to do did they come?
ኒ ወ ሸብ-ኤይ ት-እ-ት = she what do-RE come-PS-3F = What to do did she come?
Complement clauses likewise precede the main verb, as in the following:
ኒ አን ዋይት-ኣጝ-እስ አⷕ-ኧኵ = he I buy-RE-AC know-IMP = He knows what I bought.

ኒ አኔው አውት ፈይ-ን-ኧ-ኘ ዋኝⷐር-እⶖ = he we where go-PL-IMP-SUB ask-PS = He asked where we would go.
አን አው ፉ-ኧ-ኘ ⷓል-ኧኵ = I who weep-IMP-SUB see-IMP = I will see who will weep.
ናይዴው ጏንደር ይ-ስ-ኣጝ ከው ስም-ን-ኧኵ = they Gondar say-PA-RE country live-3PL-IMP = They live in a country which is called Gondar.
The other form of subordination is the suffix /-ኘ/ (see SUBORDINATOR /-ኘ/ under embedded verbs),
which is equivalent to the Amharic /እንደ-/ 'in order to'. The embedded verbs agree with their
subjects, and, like other subordinate clauses, the /-ኘ/ clause precedes the main verb.
3.3.10. SEMANTIC LOAD (POLYSEMY)
According to Zelealem [1], semantic load is common in the Kemantney lexicon, where one word
means many things. This happens when words are lost or never existed. As a compensatory
strategy, passive terminal speakers tend to use existing words for more than one thing, whereas
active speakers use different words for different things.
ቢያ Soil, Dust, Land, Earth ሸራጝ Good, Best, Beautiful

አጝ-ኣጝ = Who-knows, Knowledge, Wizard, ፍራጝ Big, Elder, Chief, Important
አጛጝ Knowledgeable, Wise
ካና Wood, Tree, Bark, Branch ሽብካ Hair, Feather, Beard
ፊዋ Body, Breath, Soul, Life ሻጛ Iron, Gun
ፊይ Go, Go out, Climb, Migrate, Fly ታይ Hit, Knock
ሸመርጊና Spear, Gun, War, Army ኯርበይ Skin, Hide, Leather
ከብ Cut, Cross, Circumcise, Bite ሻጟ Bone marrow, Dung, Fat
ዅይ Eat, Bite, Snatch, Chew ዅራ Child, Daughter, Baby, Son
Table 3.6: Semantic load (Polysemy)
3.3.11. VOCABULARY TEST
To measure the rates of loss and retention, Zelealem [1] asked the terminal speakers to give
Kemantney equivalents for 670 Amharic words. These words include those in the Swadesh basic
word list and other content and function words: 363 nouns, 195 verbs, 79 adjectives, 26 adverbs
and 7 postpositions. All informants volunteered borrowed Amharic words to different degrees,
with or without modification. The results of the lexicostatistical test show a proficiency
continuum among the terminal speakers that coincides with the sociolinguistic variables of place
of birth and residence, occupation, age, education and sex. For example, rural dwellers show more
retention than urban dwellers. In terms of occupation, peasants and Kemantney priests are more
proficient than the others. Age is an important factor for the level of competence among speakers
of the language in general.
3.4. SUMMARY
In Kemantney morphology, inflection and some derivation involve suffixation. Plural marking in
Kemantney nouns is quite heterogeneous. The main word formation process in Kemantney is
affixation, and the affixes are mostly suffixes.
Kemantney is a highly endangered language spoken by a small, elderly fraction of the Kemantney
people, and no monolingual speakers remain today. Because Amharic is used exclusively in every
domain, it dominates the speech behaviour of terminal speakers.
According to Zelealem [1], the most striking deviation from 'good' Kemantney is the general
decline of verbal inflection. He notes that the paradigms amount to an enormous mass of verb
forms, which were still quite well mastered by elderly speakers with good knowledge of
Kemantney. The inflectional elements of the verb keep changing, and Kemantney shows
simplification and reduction in morphology.
The reduction in the morphology of the language and its highly endangered status are the main
reasons for processing Kemantney text automatically, so that the language can be transferred more
easily from one generation to the next. A stemmer contributes to this as an automated mechanism
that conflates variants of words.
In this study, based on an understanding of the characteristics of word formation in Kemantney, an
attempt is made to develop a stemming algorithm that conflates variants of Kemantney words
using an iterative approach.
CHAPTER FOUR
IMPLEMENTATION AND EXPERIMENTAL RESULT
The aim of this study is to develop a stemmer for the Kemantney language. To this end, a thorough
analysis of the word formation process in Kemantney is carried out to identify the focus of this
study. The analysis shows that most Kemantney affixes are suffixes, which are used both for
inflection and for derivation. Nouns are inflected for action nominal, gender, number and case, and
undergo nominal derivation. The main plural-marking processes for nouns are suffixation and
reduplication; the others include vowel change, vowel addition, consonant change, gemination
and suppletion. Verbs are inflected for person, number, gender, tense and mood, and all Kemantney
verb inflection is by suffixation.
A prototype stemmer for Kemantney text is developed using the Python 3.4.1 programming language.
The prototype is then tested on a test dataset, and the results are reported with the necessary
analysis and interpretation.
4.1. MORPHOLOGICAL PREPROCESSING
It has been argued that the use of morphological information is important for languages where
morphology has an important influence on pronunciation, syllabification and word stress [46].
The stream of characters in a natural language text must be broken up into distinct meaningful
units before any language processing can be performed, which makes preprocessing an important
part of all text processing. In the preprocessing stage, variant forms, character sets and file formats
can be converted so that all text, regardless of its source, is in the same format. Preprocessing must
ensure that the source text is presented to natural language programs in a form they can use. For
example, natural language programs usually need their input to be tokenized, i.e. text elements
(usually word forms or sentences) are identified and placed on separate lines of the input [50]. In the
preprocessing stage, this study addresses tokenization, normalization and stop word removal.
4.1.1. TOKENIZATION
Tokenization is the process of splitting text into its constituent tokens. Most text processing
applications, such as stemmers, operate on words and sentences. Raw texts, however, are just
sequences of characters without explicit information about word and sentence boundaries.
Before any real processing can be done, the input text needs to be segmented into linguistic units
such as words, punctuation, numbers or alphanumerics; these units are known as tokens. In this
study, words are taken as tokens. All punctuation marks, numbers and special characters are
removed from the text before the data is processed: punctuation marks are converted to spaces,
and the space is used as the word delimiter. Hence, a sequence of characters followed by a space is
identified as a word; that is, a consecutive sequence of valid characters is recognized as a word in
the tokenization process [69].
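A tokenizer following this description can be very small. The Python sketch below is an illustration rather than the prototype's code: the function name `tokenize` is hypothetical, and treating only Ethiopic-script letters as "valid characters" is an assumption (the text states only that punctuation marks, numbers and special characters are removed). Ethiopic punctuation (e.g. ። and ፣) and Ethiopic digits lie in U+1360 to U+137C and are therefore replaced by spaces together with Latin characters, ASCII digits and symbols such as the hyphen.

```python
import re

# Valid word characters (assumption): Ethiopic syllables and combining marks
# (U+1200-U+135F, i.e. stopping before the Ethiopic punctuation and digits at
# U+1360-U+137C), the Ethiopic Supplement syllables (U+1380-U+138F), and the
# Ethiopic Extended and Extended-A blocks, which contain letters such as ⷓ and ⶖ.
NON_WORD = re.compile("[^\u1200-\u135F\u1380-\u138F\u2D80-\u2DDF\uAB00-\uAB2F]+")


def tokenize(text):
    """Replace every run of non-word characters with a space, then split on spaces."""
    return NON_WORD.sub(" ", text).split()


if __name__ == "__main__":
    print(tokenize("አን በይላ ወይር ፍርዛ ዋይት-ኧኵ። 2004!"))
```

The hyphen is also treated as punctuation here, so hyphen-segmented example forms from Chapter Three would be split into their morphemes; running text, which has no hyphens, is unaffected.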
The stemmer takes a corpus as input. The sample text is collected from a textbook, a PhD thesis on
the Kemantney language [1], manuals and three newspapers (2004, 2006). Text from one textbook
(2006), the PhD thesis [1] and the manuals is used as the sample for checking the stemmer. The next
step after collecting the corpus is preprocessing, which involves tokenization.
Tokenization breaks a stream of text into words [69]. Control characters are removed together with
punctuation marks, numbers and special characters before the data is processed, and punctuation
marks are converted to spaces so that the space serves as the word delimiter.
The third step is suffix removal, the main task of the stemmer. To remove suffixes, the stemmer
checks each word against the rules defined for it and removes the suffixes iteratively. Finally, the
stemmer displays its output: the tokens, i.e. the conflated words of the language, after all suffixes
have been removed according to the stated rules.
4.2. COMPILATION OF AFFIXES
In Kemantney, word formation involves suffixes. The Kemantney stemmer should therefore not only
be able to remove suffixes but also handle the other noun plural-marking processes: reduplication,
vowel change, vowel addition, consonant change, gemination and suppletion.
As pointed out by Lemma [39] and Nega [32], the actual affix compilation depends on the nature of
the stemming algorithm: longest match or iterative. The longest-match algorithm requires all forms
of the affixes, basic and derived (concatenated), for successful stemming. An iterative approach, by
contrast, is a recursive procedure which, as its name implies, removes suffixes one at a time, starting
at the end of a word and working towards its beginning, based on the fact that suffixes are attached
to stems one after the other [37], [39], [47]. It therefore requires only a list of basic affixes.
Concatenation of suffixes is common in Kemantney: two or more suffixes may be concatenated and
attached (affixed) to a word. The list of possible combinations can therefore be very large, which
makes it difficult to compile a complete list of concatenations, and concatenation also produces long
suffix strings. Hence, iteratively removing each basic suffix one by one is considered the best
choice, and the iterative approach is adopted to develop the stemmer for Kemantney text.
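The difference between the two compilation strategies can be seen on a single form. The Python sketch below contrasts a longest-match stripper (one removal, so it needs every concatenated suffix in its list) with an iterative stripper (repeated removal of basic suffixes). Both functions are hypothetical illustrations, not the thesis's implementation. For the demonstration, the morphemes of the jussive ክልኝ-ድ-እን-ወ 'let them dance' are simply concatenated; this is an assumption for illustration only, since real Kemantney orthography fuses suffix vowels into the preceding syllable.

```python
def longest_match_strip(word, all_suffixes, min_stem=1):
    """Remove the single longest listed suffix; the list must already contain
    every basic and concatenated suffix form."""
    for suffix in sorted(all_suffixes, key=len, reverse=True):
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem:
            return word[: -len(suffix)]
    return word


def iterative_strip(word, basic_suffixes, min_stem=1):
    """Remove basic suffixes one at a time from the end until none matches."""
    ordered = sorted(basic_suffixes, key=len, reverse=True)  # prefer longer matches
    stem = word
    while True:
        for suffix in ordered:
            if stem.endswith(suffix) and len(stem) - len(suffix) >= min_stem:
                stem = stem[: -len(suffix)]
                break          # rescan from the new end of the word
        else:
            return stem        # no suffix matched: stop


if __name__ == "__main__":
    word = "ክልኝድእንወ"          # ክልኝ-ድ-እን-ወ with hyphens removed (illustrative)
    basic = {"ድ", "እን", "ወ"}
    print(longest_match_strip(word, basic))                     # only -ወ removed
    print(longest_match_strip(word, basic | {"እንወ", "ድእንወ"}))  # needs concatenations
    print(iterative_strip(word, basic))                         # basic list suffices
```

The minimum stem length mirrors the observation in Section 4.2 that a meaningful Kemantney word can be one character long. Note that iterative stripping will also cut a stem-final letter that happens to coincide with a one-letter suffix; this is the over-stemming risk discussed below, which the rule set of Section 4.4 is meant to control.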
The major drawback of the affix removal approach is its dependency on a priori knowledge of the
language's morphology. Affix removal stemmers apply a set of transformation rules to each word,
trying to cut off known suffixes. The first such algorithm was described by J.B. Lovins [16]; a few
more affix removal algorithms have been suggested since [16].
The variety of stemming algorithms naturally raises the question of how to compare them. Although
evaluation measures exist for under-stemming (removing too little of a suffix) and over-stemming
(removing too much), they are hard to use due to the lack of a standard test set (and even the
possibility of creating one is questionable) [70].
A sample list of suffixes is collected from Zelealem's sociolinguistic and grammatical study of
language replacement [1], with manual documents used as a guide.
In this research, the data are represented in Unicode, and the stemmer accepts Unicode (Ethiopic
script) input directly without transliterating it to Latin script. The length of each word, in
characters, is also computed while developing the algorithm. The minimum length of a meaningful
Kemantney word is one character.
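Because the stemmer works on Unicode text directly, word length is naturally measured in characters (Ethiopic syllables) rather than in bytes. The short Python illustration below uses words from Table 4.1; the helper name `word_length` is hypothetical.

```python
import unicodedata


def word_length(word):
    """Number of Unicode code points in `word`.

    Ethiopic syllables are encoded as single precomposed code points, so
    each fidel counts as one character.
    """
    return len(word)


for w in ["ከም", "በይልክ", "ማⷕል"]:
    # Characters versus UTF-8 bytes (three bytes per Ethiopic character).
    print(w, word_length(w), len(w.encode("utf-8")))

# The first character of 'ማⷕል' belongs to the Ethiopic block.
print(unicodedata.name("ማ"))
```

Measuring length on decoded text rather than on raw bytes is what makes a threshold such as "longer than one character" meaningful for Ethiopic script.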
4.3. THE PROPOSED STEMMER
Stemming techniques developed for English, for Semitic languages such as Amharic and Tigrigna,
and for Cushitic languages such as Oromigna have been studied, and the relevant techniques used
in these algorithms were incorporated into the Kemantney stemmer.
In this study, as depicted in Figure 4.1, an iterative approach is followed to remove suffixes. First,
each word is checked against the stop word list. If there is no match, the length of the word is
checked. If the length is greater than one, the suffix list file is opened and the word is checked for
a matching suffix. If a matching suffix is found, the suffix stripping process is performed;
otherwise, no suffix is removed.
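The steps above can be sketched in Python as follows. This is an illustrative reading, not the thesis code: the function name `stem_words` and the stop-word and suffix lists are hypothetical placeholders, stop words are assumed to be passed through unchanged, and looping back to the length check after each removal is an interpretation of the flow chart.

```python
def stem_words(words, stop_words, suffixes):
    """Apply the suffix removal flow of Figure 4.1 to a list of words."""
    stems = []
    for word in words:
        # Step 1: stop words are not stemmed (kept as-is in this sketch).
        if word in stop_words:
            stems.append(word)
            continue
        # Steps 2-4: while the word is longer than one character, look for a
        # listed suffix; remove it and check again, otherwise stop.
        while len(word) > 1:
            match = next(
                (s for s in suffixes
                 if s and word.endswith(s) and len(word) > len(s)),
                None,
            )
            if match is None:
                break
            word = word[: -len(match)]
        stems.append(word)
    return stems


# Hypothetical data: an artificial word, a one-character word, a stop word.
print(stem_words(["ከተምንክ", "ክ", "ስው"], {"ስው"}, ["ክ", "ን"]))
```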
[Flow chart: Start → Read word → End of file? (Yes: store stemmed file, End) → Word in stop word list? → Count length of word → Check the existence of suffix → Found suffix? (Yes: remove suffix) → Accept word and write it on stem file]
Figure 4.1: Flow chart for the general suffix removal algorithm
4.4. RULES FOR REMOVING SUFFIXES
In a rule-based approach, language-specific rules are encoded, and stemming is performed
according to these rules. In this study, conditions are specified for converting a word to its
derivational stem, and exception rules are used to handle irregular cases. The stemmer first
identifies common patterns in the language, that is, word endings that are replaced by other
characters during stemming. Table 4.1 lists the rules applied to handle each suffix individually
during suffix removal.
Rule | Condition | Action | Example
1 | Word ends with 'ልክ' | Remove the suffix 'ክ' and replace 'ል' with 'ላ' | በይልክ → በይላ
2 | Word ends with 'ምክ' | Remove the suffix 'ክ' and replace 'ም' with 'ሚ' | ከተምክ → ከተሚ
3 | Word ends with 'ዝክ' | Remove the suffix 'ክ' and replace 'ዝ' with 'ዛ' | ፈርዝክ → ፈርዛ
4 | Word ends with 'ንክ' | Remove the suffix 'ክ' and replace 'ን' with 'ና' | ይውንክ → ይውና
5 | Word ends with 'ውክ' | Remove the suffix 'ክ' and replace 'ው' with 'ዋ' | ዲርውክ፣ጏርውክ → ዲርዋ፣ጏርዋ
6 | Word ends with 'ⷕክ' | Remove the suffix 'ክ' and replace 'ⷕ' with 'ⷓ' | ዲⷕክ → ዲⷓ
7 | Word ends with 'ክ' | Remove the suffix 'ክ' and append 'ያ' | ሻሊክ፣ዳሚክ፣እኝጊክ → ሻሊያ፣ዳሚያ፣እኝጊያ
8 | Word ends with 'ነን' | Remove the suffix 'ን' and replace 'ነ' with 'ን' | ሰይጣነን → ሰይጣን
9 | Word ends with 'ተን' | Remove the suffix 'ን' and replace 'ተ' with 'ት' | ጣተን → ጣት
10 | Word ends with 'በክ' | Remove the suffix 'ክ' and replace 'በ' with 'ብ' | ክምበክ → ክምብ
11 | Word ends with 'ር' | Remove the suffix 'ር' and append 'ራ' | ፍንትር → ፍንትራ
12 | Word ends with 'ል' | Remove the suffix 'ል' and append 'ላ' | ማⷕል → ማⷕላ
13 | Word ends with 'ም' | Remove the suffix 'ም' and append 'ማ' | ከም → ከማ
14 | Word ends with 'ልት' | Remove the suffix 'ት' and replace 'ል' with 'ራ' | ድⶓልት፣ቢልት → ድⶓራ፣ቢራ
15 | Word ends with 'ልት' | Remove the suffix 'ት' and replace 'ል' with 'ር' | ገልት → ገር
16 | Word ends with 'ኳን' | Remove the suffix 'ን' and replace 'ኳ' with 'ኵ' | እርኳን → እርኵ
17 | Word ends with 'ት' | Remove the suffix 'ት' and append 'ታ' | ኪንሸንት፣ጏዘንት → ኪንሸንታ፣ጏዘንታ
18 | Word ends with 'ኯኯን' | Remove the suffix 'ኯን' and replace 'ኯ' with 'ኵ' | ልኯኯን → ልኵ
19 | Word ends with 'ርራ' | Remove the suffix 'ራ' and replace 'ር' with 'ራ' | ዅርራ → ዅራ
Table 4.1: List of rules constructed for stemming
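One way to encode Table 4.1 is as a list of (ending, replacement) pairs, since each rule of the form "remove suffix Y and replace X with X′" is equivalent to replacing the word ending XY with X′. The Python sketch below illustrates this under stated assumptions and is not the thesis implementation. Rule 15 shares its condition with rule 14, so its example is handled through a hypothetical exception list (`EXCEPTIONS`). Longer endings are tried before shorter ones so that, for instance, rule 10 ('በክ') is not pre-empted by the more general rule 7 ('ክ').

```python
# Table 4.1 as (ending, replacement) pairs: "remove suffix Y and replace X
# with X'" is written as replacing the word ending XY with X'.
RULES = [
    ("ልክ", "ላ"),    # rule 1
    ("ምክ", "ሚ"),    # rule 2
    ("ዝክ", "ዛ"),    # rule 3
    ("ንክ", "ና"),    # rule 4
    ("ውክ", "ዋ"),    # rule 5
    ("ⷕክ", "ⷓ"),    # rule 6
    ("ክ", "ያ"),     # rule 7: remove 'ክ', append 'ያ'
    ("ነን", "ን"),    # rule 8
    ("ተን", "ት"),    # rule 9
    ("በክ", "ብ"),    # rule 10
    ("ር", "ራ"),     # rule 11
    ("ል", "ላ"),     # rule 12
    ("ም", "ማ"),     # rule 13
    ("ልት", "ራ"),    # rule 14
    ("ኳን", "ኵ"),    # rule 16
    ("ት", "ታ"),     # rule 17
    ("ኯኯን", "ኵ"),   # rule 18
    ("ርራ", "ራ"),    # rule 19
]

# Hypothetical exception list: rule 15 ('ልት' -> 'ር') has the same condition
# as rule 14, so its example word is listed here instead.
EXCEPTIONS = {"ገልት": "ገር"}

# Try longer endings first; sorted() is stable, so rules whose endings have
# equal length keep their table order.
ORDERED_RULES = sorted(RULES, key=lambda rule: len(rule[0]), reverse=True)


def apply_rules(word):
    """Apply the first matching substitution rule, if any."""
    if word in EXCEPTIONS:
        return EXCEPTIONS[word]
    for ending, replacement in ORDERED_RULES:
        # Require at least one character before the ending.
        if word.endswith(ending) and len(word) > len(ending):
            return word[: -len(ending)] + replacement
    return word


print(apply_rules("ክምበክ"))  # ክምብ
```

The table itself does not state a precedence between overlapping rules; longest-ending-first is a design choice of this sketch.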
4.5. IMPLEMENTATION OF THE STEMMER
A program was developed in the Python programming language to implement the Kemantney
stemmer. The implemented algorithm follows the iterative approach: the suffixes in the suffix list
are checked against the word, and suffixes are removed iteratively until no listed suffix remains. A
sample of the implementation is shown in Figure 4.2.
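import codecs

def load_suffixes(path="suffix.txt"):
    # Read the whitespace-separated suffix list stored as UTF-8 text.
    with codecs.open(path, "r", encoding="utf-8") as f:
        return f.read().split()

def stem(word, suffixes):
    # Remove listed suffixes one at a time until none of them matches.
    while True:
        # Rule 4 (Table 4.1): remove 'ክ' and replace 'ን' with 'ና'.
        if word.endswith('ንክ') and len(word) > 2:
            return word[:-2] + 'ና'
        for suffix in suffixes:
            # Strip only if at least one character remains.
            if len(word) > len(suffix) and word.endswith(suffix):
                word = word[:-len(suffix)]
                break
        else:
            # No suffix matched: the remaining word is the stem.
            return word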
Figure 4.2: Sample Python code for suffix removal
4.6. EVALUATION OF THE STEMMER
Experiments were conducted to evaluate the performance of the proposed stemmer. The data were
extracted from the Kemantney corpus collected for the development of the stemmer.
Stemmers can be judged by two criteria: correctness and compression performance. Stemming can
be incorrect in two ways: over-stemming and under-stemming. When a term is over-stemmed, too
many of its characters are removed; over-stemming causes unrelated terms to be conflated, that is,
two words with different stems are reduced to the same root. Under-stemming is the removal of
too little of a term; it prevents related terms from being conflated, that is, two words that should
be reduced to the same root are not. To evaluate the performance of the stemmer, errors were
counted manually by comparing the words that were not conflated correctly with their correct
stems.
Ideally, a good Kemantney stemmer will reduce all words from the same semantic group to the
same stem. However, because of the irregularities that are prominent in the Kemantney language,
the stemmer unavoidably makes mistakes. This is true of all natural languages, and no stemmer can
be expected to work perfectly [54]. A good stemmer should produce as few over-stemming and
under-stemming errors as possible. The Kemantney stemmer is evaluated by counting these errors
in the sample texts. In this evaluation, a correctly stemmed word is any word left with no suffixes
attached.
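The manual counting described above could be partly automated once an expected stem has been recorded for each test word. The sketch below is a simplification under stated assumptions: it labels each word in isolation by comparing lengths and prefixes, whereas the definitions above concern conflation between words, and outputs produced by the character substitutions of Table 4.1 fall into an "other" bucket. The function names and toy data are hypothetical.

```python
from collections import Counter


def classify(output, expected):
    """Label one stemmer output against the manually expected stem."""
    if output == expected:
        return "correct"
    if len(output) > len(expected) and output.startswith(expected):
        return "under-stemmed"   # too little was removed
    if len(output) < len(expected) and expected.startswith(output):
        return "over-stemmed"    # too much was removed
    return "other"               # e.g. a wrong character substitution


def error_percentages(pairs):
    """Percentage of (output, expected) pairs falling into each label."""
    counts = Counter(classify(out, exp) for out, exp in pairs)
    total = len(pairs)
    return {label: 100 * n / total for label, n in counts.items()}


# Toy pairs: one of each outcome.
print(error_percentages([("ab", "ab"), ("abc", "ab"), ("a", "ab"), ("xy", "ab")]))
```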
Experimental results of stemmers for other local languages
Stemmer | Accuracy (%) | Word compression (%)
Silte stemmer | 85.71 | 34.99
Context-sensitive stemmer for Wolaytta | 90.6 | 41.2
Rule-based Tigrigna stemmer | 86.3 | not reported
Rule-based stemmer for Afaan Oromo | 94.84 | 38
The Kemantney stemmer was tested on sample text taken from one textbook, one PhD thesis,
newspapers and manuals. The experimental results of the Kemantney stemmer are presented in
Table 4.2 below.
Test set (words) | Stems | Correctly stemmed | Under-stemmed | Over-stemmed
930 | 315 | 295 (93.65%) | 17 (5.39%) | 3 (0.95%)
Percentages are computed over the 315 stems.
Table 4.2: Correct stem and incorrect stem
The accuracy of the Kemantney stemmer is measured over word types, and stop words were
excluded from the experiment.
As shown in Table 4.2, two types of errors are observed during stemming: under-stemming and
over-stemming. According to the assessment, 93.65% of the stemmed words are correct and 6.34%
are incorrect. Most of the errors are due to under-stemming, which accounts for 5.39%. These
results suggest that the stemmer is promising. However, some words are still stemmed incorrectly
because of under-stemming and over-stemming errors. Some examples of wrongly conflated words
are listed in Table 4.3.
No. | Unstemmed | Expected stem | Output | Error type
1 | ⷓልስⷓልስ | ⷓልስⷓልስና | ⷓልስⷓልስ | Under-stemming
2 | ገባጝስ | ገባጝስኖ | ገባጝስ | Under-stemming
3 | አው | አውነይ | አው | Under-stemming
4 | ድር | ድርስ | ድር | Under-stemming
5 | አⷕስ | አⷕስና | አⷕስ | Under-stemming
6 | ወከልነው | ወከል | ወከልነው | Over-stemming
7 | ማⶖርሳጝ | ማⶖርሳጝዝ | ማⶖርሳጝ | Under-stemming
8 | ሚስ | ሚስኖ | ሚስ | Under-stemming
9 | ዳድስኖ | ዳድስኖዝ | ዳድስኖ | Under-stemming
10 | ከለብት | ከለብትና | ከለብት | Under-stemming
11 | ይኖ | ይ | ይኖ | Over-stemming
12 | ⷕር | ⷕርነይ | ⷕር | Under-stemming
13 | ዳይኝትነ | ዳይኝትነዝ | ዳይኝትነ | Under-stemming
14 | ታጘተልሻር | ታጘተልሻ | ታጘተልሻር | Over-stemming
15 | ተገዳደበው | ተገዳደበውዝ | ተገዳደበው | Under-stemming
16 | ደግ | ደግነይ | ደግ | Under-stemming
17 | ⷓልናጝ | ⷓልናጝዝ | ⷓልናጝ | Under-stemming
18 | ይከከልስ | ይከከልስና | ይከከልስ | Under-stemming
19 | አሰመርሻጝ | አሰመርሻጝዝ | አሰመርሻጝ | Under-stemming
20 | ዳይጝነይ | ዳይጝነይክ | ዳይጝነይ | Under-stemming
Table 4.3: Sample of under-stemmed and over-stemmed Kemantney words
In general, the under-stemming and over-stemming errors arise because it was difficult to
formulate a complete rule set without a thorough understanding of the morphology of the
language. More conditions and rules, based on a detailed study of the language's morphology, are
required.
WORD COMPRESSION RATIO
To measure the reduction in dictionary size, the percentage of compression is calculated. The
compression rate C is computed with the following formula [31]:
$$C = \frac{W - S}{W} \times 100\%$$
where W is the total number of words and S is the number of distinct stems obtained from W.
Since 315 stems are identified out of the 930 words, the compression ratio is
(930 − 315)/930 × 100 ≈ 66.13%. In other words, the stemmer reduces the number of distinct
entries (the dictionary size) of the sample text by 66.13%. This shows that using a stemmer for
Kemantney brings a significant reduction in dictionary size as a result of conflating variant words
to the same stem.
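The compression formula is simple to reproduce. The sketch below computes C from the counts reported above and, alternatively, from a list of stems. The function names are hypothetical, and `compression_from_stems` assumes the list holds one stem per distinct input word (word type), as in the evaluation.

```python
def compression_ratio(total_words, distinct_stems):
    """Percentage compression C = (W - S) / W * 100."""
    if total_words <= 0:
        raise ValueError("total_words must be positive")
    return (total_words - distinct_stems) / total_words * 100


def compression_from_stems(stems):
    """C computed from a list holding one stem per (distinct) input word."""
    return compression_ratio(len(stems), len(set(stems)))


print(round(compression_ratio(930, 315), 2))  # 66.13
```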
CHAPTER FIVE
CONCLUSION AND RECOMMENDATIONS
5.1. CONCLUSION
Kemantney, an Agew language, is morphologically complex: a single word can have many variant
forms, and the language is rich in both inflectional and derivational morphology. Kemantney is one
of the Cushitic languages, which share a common grammatical system based on the root pattern
structure.
Word formation in Kemantney is mainly done through suffixation: suffixes create variant words and
are concatenated one after the other to create further variants of a word. Most function words are
assimilated into content-bearing words in the form of suffixes. This property plays an important
role in the morphological behavior of the language.
Commonly used stemming components, such as a suffix list, a stop word list, rule-based processing,
the iterative approach, a punctuation remover, a posting list and normalization, are employed, and
some techniques were adopted from the Porter stemmer. Stripping suffixes alone is not enough to
conflate the variant words of Kemantney to one form. Hence, the stemmer also includes additional
procedures, namely the rule-based approach to stemming, to handle the patterns of the language.
The evaluation of the final stemmer reveals a significant difference between stemmed and
unstemmed Kemantney text in terms of word compression: the word list is compressed by 66.13%.
This shows that using a stemmer for Kemantney brings a significant reduction in dictionary size as a
result of conflating variant words to the same stem. Based on the experimental results, the
performance of the stemmer is also promising, with 93.65% accuracy. Only 6.34% of the words were
stemmed incorrectly, most of them because of under-stemming.
In this study, a promising result is achieved using an iterative stemmer combined with a rule-based
approach for the Kemantney language. Many challenges in the language still affect the result of the
stemmer, and resolving them requires further investigation to identify patterns and exception
rules. The stemmer was also evaluated on a small sample data set, as no standard corpus has been
prepared for this purpose.
5.2. RECOMMENDATIONS
This stemmer is the first attempt at automated processing of the Kemantney language. I believe that
it should be improved through further research to attain a better understanding of the language,
support its long-term preservation and transmission, and hence bring the stemmer to an
operational level. Based on the findings of this study and the knowledge obtained from the
literature, the following recommendations are forwarded for future work.
• To address the over- and under-stemming errors observed in this study, an alternative approach,
such as the longest match approach, can be implemented to see whether it performs better than
the approach used in this study.
• Further investigation is needed to extend the functionality of the Kemantney stemmer to
irregular words and compound words.
• A standard corpus should be compiled for natural language processing of Kemantney in general,
and for stemmer development in particular, so that the stemmer can be evaluated on a large text
collection gathered from different sources.
• By incorporating the necessary elements, the stemmer can also be used as a component in
developing other computational tools for the language, such as a morphological analyzer, parser,
spell checker, context-sensitive rules, word frequency counter, thesaurus and dictionary.
• The procedures followed to develop the Kemantney stemmer can serve as a source for
developing stemmers for other languages of the Cushitic family.
• The stemmer has to be tested with a large amount of text to establish its real performance. To
this end, the stemmer should be applied to Kemantney texts in a search setting; only then can a
complete view of the stemming system and of the results returned for every search request be
obtained.
• The rules described in this work can be a basis for further research and can support the
development of extended stemming rules covering most of the terms in the Kemantney language.
• Evaluating the Kemantney stemmer within a Kemantney information retrieval system would
suggest the best trade-off between under-stemming and over-stemming, which can be achieved by
dropping or adding a few suffixes in the list.
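The longest-match alternative mentioned in the first recommendation can be sketched as follows, in the spirit of Lovins' algorithm [16], which removes the longest matching ending in a single step. The function name, the `min_stem` parameter and the toy suffix list are hypothetical; comparing it with the iterative stemmer of this study on the real suffix list would be the actual experiment.

```python
def longest_match_stem(word, suffixes, min_stem=1):
    """Remove the single longest listed suffix that leaves at least
    `min_stem` characters; return the word unchanged if none matches."""
    candidates = [
        s for s in suffixes
        if s and word.endswith(s) and len(word) - len(s) >= min_stem
    ]
    if not candidates:
        return word
    longest = max(candidates, key=len)  # the longest matching suffix wins
    return word[: -len(longest)]


# Hypothetical suffix list containing a combined suffix.
print(longest_match_stem("ከተምንክ", ["ክ", "ንክ"]))  # ከተም
```

Unlike the iterative approach, this variant needs combined suffixes such as "ንክ" to be listed explicitly, which is exactly the trade-off the recommendation proposes to measure.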
REFERENCES
[1] Zelealem Leyew, The Kemantney Language: A Sociolinguistic and Grammatical Study of
Language Replacement. Cologne: Rüdiger Köppe Verlag, 2003.
[2] Zelealem Leyew, Sociolinguistic Survey Report of the Kemant (Qimant) Language of Ethiopia,
2002.
[3] Nega Alemayehu and Peter Willett, The Effectiveness of Stemming for Information Retrieval in
Amharic, Program: Electronic Library and Information Systems, Vol. 37, No. 4, pp. 254-259, 2003.
[4] Belay Shibeshi, Minority rights protection in the Amhara national regional state: the case of the
Kemant people in North Gondar, Addis Ababa University, Addis Ababa, Ethiopia, January 2010.
[5] The 1994 Population and Housing Census of Ethiopia: Results for the Amhara Region. Central
Statistical Office, Addis Ababa, Ethiopia.
[6] Debela Tesfaye and Ermias Abebe, “Designing a Rule Based Stemmer for Afaan Oromo Text”, 2002.
[7] Argaw A.A. and Asker L., An Amharic stemmer: Reducing words to their citation forms, in
Proceedings of the 45th Annual Meeting of the Association for Computational Linguistics,
Workshop on Computational Approaches to Semitic Languages, pp. 104-110, ACL, Prague, Czech
Republic, 2007b.
[8] Clifford Lynch, Coalition for Networked Information, February 21, 1998. at http://www.cni.org/
[9] David L. Appleyard, "A descriptive outline of Kemant," Bulletin of the School of Oriental and
African Studies, 1975.
Page | 125
[10] Hetzron, R., the Verbal System of Southern Agew. University of California Publications, Near
Eastern Studies 12, Berkeley and Los Angeles: University of California Press, 1969.
[11] Bender, M.L. and Fleming, H. C. Non-Semitic Language. In: Bender, M., Bowen, J. D., Cooper,
R. L. and Ferguson, C. A. (eds.), Language in Ethiopia, Oxford University, 1976.
[12] Blair, David C., Maron, M. E. (1990). "Full-text information retrieval: Further analysis and
clarification." Information Processing & Management 26(3): 437-447.
http://hdl.handle.net/2027.42/28883
[13] Dimmendaal, G. J., On Language Death in Eastern Africa. In: Dorian, N. (ed.), Investigating
Obsolescence: Studies in Language Contraction and Death, 13-31. Cambridge: Cambridge
University Press, 1989.
[14] Krovetz R., Viewing Morphology as an inference process, in proceedings of the 16th annual
international ACM SIGIR conference on research and development in information retrieval, pp.
191-202, ACM New York, 1993.
[15] Frakes W., R. Baeza-Yates, Information Retrieval: Data Structures and Algorithms Englewood
Cliffs, NJ: Prentice-hall, 1992.
[16] Lovins J., Development of a Stemming Algorithm, Mechanical translation and Computational
linguistics, 11 pp. 22-31, 1968.
[17] Gamst, F.C. The Qemant Agew of Ethiopia: A Study in Culture Change of a Pagan Hebraic
Culture, 1969.
[18] Bender, M. Lionen, et al., Languages in Ethiopia. London: Oxford University Press, 1976.
Page | 126
[19] Goldsmith, John, “Linguistica: Unsupervised Learning of the Morphology of a Natural
Language,” Computational Linguistics 27(2): 153-198, 2001a.
[20] Hafer M. and Weiss S., “Word Segmentation by Letter Successor Varieties,” Information
Storage and Retrieval, 10, 371-385, 1974.
[21] Deepika Sharma, “Improved stemming approach used for text processing in information
retrieval system.” Thapar University, May 2012.
[22] David L. Appleyard, “The radical extension system of the verb in Agaw”, In Goldenberg, G.
(ed.), Ethiopian Studies, 1987.
[23] Hudson, G (1999) Essential Introductory Linguistics. Oxford, Blackwell.
[24] Khoja S. and Garside R., Stemming Arabic text, Computing department, Lancaster University,
Lancaster, 1999.
[25] Popovic and Willett, the Effectiveness of Stemming for Natural-Language Access to Slovene
Textual Data, 1992.
[26] Lennon, M., Peirce, D., Tarry, B. and Willett, P., “An evaluation of conflation algorithms for
information retrieval,” Journal of Information Science, 3, 177-183, 1981.
[27] Russel, Stuart and Peter Norvig. Artificial Intelligence: A Modern Approach. New Jersey:
Prentice Hall, 1995.
[28] Frakes, William B., Stemming Algorithms. In Frakes, William B. and Baeza-Yates, Ricardo, eds.
Information Retrieval: Data Structures & Algorithms. New Jersey: Prentice Hall PTR, 1992.
[29] M. Aljlayl and O. Frieder (2002). On Arabic Search: Improving the retrieval effectiveness via a
light stemming approach. In Proceedings of CIKM’02, VA, USA.
Page | 127
[30] Savoy, “Stemming of French Words Based on Grammatical Categories, 1993.”
[31+ Muzeyn Kedir, “Designing a stemming algorithm for Silt’e Language”, Addis Ababa University,
Addis Ababa, Ethiopia, 2004.
[32+ Nega Alemayhu, “Development of a Stemming Algorithm for Amharic Language Text
Retrieval.” Ph. D Thesis. University of Sheffield, England, 1999.
[33] Tesfaye Bayu, Automatic Morphological Analyzer for Amharic an Experiment Employing
Unsupervised Learning and Auto segmentation, 2002.
[34] Palmer, F.R., the Verb in Bilin. Bulletin of the School of Oriental and African Studies 19, pt.1,
131-59, 1957.
[35] Popovic M. and Willett P., Processing of documents and queries in a Slovene language free text
retrieval system, Literary and Linguistic Computing, 5, 182-190, 1990.
[36] Tesfaye Biru, Incorporation of relevance Data in the Term Discrimination Value. The
University of Sheffield (unpublished), 1987.
[37] Porter M., An algorithm for suffix stripping, Program, 14(3), pp. 130-137, 1980.
[38] Rijsbergen, C. J. V. (1979). Information retrieval.2nd ed. London: Butterworth.
[39] Lemma Lessa, “Development of stemming algorithm for wolaytta text.” Addis Ababa
University, Addis Ababa, Ethiopia, 2003.
[40] Baeza-Yates, Ricardo, Modern Information Retrieval. New York: McGraw Hill, 1999.
[41] Salton G. & McGill N.J, Introduction to Modern Information Retrieval, New York: McGraw-Hill,
1983.
Page | 128
[42] Mayfield J. and McNamee P., Single N-gram Stemming. In Proceedings of the 26thannual
international ACM SIGIR conference on Research and development in information retrieval, pages
415-416, Toronto, Canada, ACM Press, 2003.
[43] Schachter, P., Parts-of-Speech Systems. In: Shopen, T. (ed.), Language Typology and Syntactic
Description, 1, 3-61. Cambridge: Cambridge University Press, 1985.
[44] Girma Berhe, “A Stemming Algorithm Development for Tigrigna Language Text Documents.”
Addis Ababa University, Addis Ababa, Ethiopia, 2001.
[45] Sparck Jones and Willett, What is the role of NLP in text retrieval? In: Strzalkowski, T. (ed.),
Natural Language Information Retrieval, 1997.
[46] Taylor, P., Hidden Markov models for grapheme to phoneme conversion, 2005.
[47] Paice C., Another stemmer, in proc. Of SIGIR Forum, Vol. 24(3), pp. 56-61, 1990.
[48] Ekmekcioglu, et al., Stemming and N-gram Matching for Term Conflation in Turkish Texts at
http://informationr.net/ir/2-2/paper13.html, 1996.
[49] The Agew Languages. Afroasiatic Linguistics, 3, 31-75. Malibu: Undena Publications, 1976.
[50] R.M. Kaplan, A method for tokenizing text. Festschrift in Honor of Kimmo Koskenniemi’s 60th
anniversary. CSLI Publications, Stanford, CA, 2005.
[51] Schinke, R, et al. (1996). "A Stemming Algorithm for Latin Text Databases" In Journal of
Documentation. 52(2), 172 - 187.
[52] Appleyard, The Agew Languages: A Comparative Morphological Perspective. Proceedings of
the Eighth International Conference of Ethiopian Studies, 1, 581-592. Institute of Ethiopian Studies,
Addis Ababa University, 1988.
[53] Trost, Harald, Computational Morphology. Available at
http://www.univie.ac.at/~harald/handbook.html (2000).
[54] Yonas Fisseha, “Development of Stemming Algorithm for Tigrigna Text.” School of Information
Science, Addis Ababa University, Addis Ababa, Ethiopia, 2003.
[55] Wakshum Mekonen, “Development of Stemming Algorithm for Affan Oromo Language Text”,
Addis Ababa University, Addis Ababa, Ethiopia, 2000.
[56] Xu J. and Croft B., Corpus-based stemming using co-occurrence of word variants, ACM
Transactions on Information Systems, Vol. 16, No. 1, pp. 61-81, 1998.
[57] Tucker, A.N. and M.A. Bryan, the Cushitic Languages. Linguistic Analysis: The non-Bantu
Languages of North-eastern Africa. London: Oxford University Press, 1966.
[58] Dawson J., Suffix removal for word conflation, In Bulletin of the Association for Literary and
Linguistics computing, Vol. 2(3), 1974.
[59] Atelach Alemu, “automatic sentence parsing for Amharic text an experiment using
probabilistic context free grammars.” Addis Ababa University, Addis Ababa, Ethiopia, 2002.
[60] Learning Python: Powerful Object-Oriented Programming, 4th edition, O’Reilly Media,
September 2009.
[61] Al-Kharashi, I.A. and Evens, M.W., “Comparing Words, Stems and Roots as Index Terms in an
Arabic Information Retrieval System,” Journal of the American Society for Information Science,
45(8), 546-560, 1994.
[62] Melucci Massimo and Orio Nicola. “A novel method for stemmer generation based on hidden
Markov models”. Proceedings of the twelfth international conference on Information and
knowledge management, pp 131-138, 2003
[63] Surafel Teklu, “Automatic Categorization of Amharic News Text: A Machine Learning
Approach.” Addis Ababa University, Addis Ababa, Ethiopia, July 2003.
[64] Olivier Tourny, ‘Kedassie’: A Kemant (Ethiopian Agaw) Ritual. In Svein Ege et al. (eds.),
Proceedings of the 16th International Conference of Ethiopian Studies, Addis Ababa, Ethiopia, 2009.
[65] Yeshiwas Degu, Kemant (ness): The Quest for Identity and Autonomy in Ethiopian Federal
Polity. Mekelle University, Mekelle, Ethiopia, 2013.
[66] Interim Committee, Kemant Nationality Identity and Self-rule Question. Request Letter to the
House of Federation, Gonder, Ethiopia, 2005 E.C.
[67] Hetzron, The Agew Languages. Afroasiatic Linguistics, 3, 31-75. Malibu: Undena Publications,
1976.
[68] Bethlehem Mengistu, N-gram-Based Automatic Indexing for Amharic Text, Addis Ababa
University, Addis Ababa, Ethiopia, July 2002
[69] Jacob Perkins, Python Text Processing with NLTK 2.0 Cookbook, 2010.
[70] Ilia Smirnov, Overview of Stemming Algorithms, DePaul University, 03/12/08.
[71] Hull D. A. and Grefenstette, “A detailed analysis of English Stemming Algorithms”, XEROX
Technical Report, http://www.xrce.xerox.
[72] Fuchun Peng, Nawaaz Ahmed, Xin Li and Yumao Lu, “Context sensitive stemming for web
search,” Proceedings of the 30th annual international ACM SIGIR conference on Research and
development in information retrieval, 2007.
[73] Anjali Ganesh Jivani et al. “A Comparative Study of Stemming Algorithms”, Int. J. Comp.
Tech. Appl., Vol 2, 1930-1938.
[74] M. Tashakori , M. Meybodi , F. Oroumchian . “Bon: first Persian stemmer (2002)”. Published
in: Proceeding, EurAsia-ICT '02 Proceedings of the First EurAsian Conference on Information and
Communication Technology, pages 487-494.