Chapter 5: Morphological Analysis
5.1. Introduction
The study of words is not altogether a new field of research in linguistics, though the
definition of the term word is a long-standing problem because of its ambiguous
nature. It refers either to a string of characters as it appears in speech or writing, or to
a more abstract entity, a part of the structure of the language as represented in a dictionary.
Aronoff (1981: 07) states that words are different from sentences; their structures are much
more varied, and though there is a single principle governing the structure of most complex
words, the principle must be applied to different classes of words. Words, once formed, persist
and change; they take on idiosyncrasies, with the result that they are soon no longer generable
by a simple algorithm of any generality. All regular word-formation processes are word-
based. A new word is formed by applying a regular rule to a single already existing word.
Both the new word and the existing one are members of major lexical categories (Aronoff
1981:21).
The meaning of a word is not always determined compositionally, as in some cases the
word as a whole bears the meaning while in other cases the relationship between the meaning
of the parts of a word and the meaning of a word as a whole can be obscure. So there are
considerable difficulties pinning down any universally applicable notion of word, even when
we restrict ourselves to morphological criteria within a single language (Spencer 1991: 45).
One way to define wordhood is in terms of linguistic contrasts at the levels of phonology,
syntax and semantics. Such criteria, when developed for individual languages, may be quite
successful. There are very few semantic properties of words which distinguish them from
morphemes or phrases. Words are generally referentially opaque (Spencer 1991: 42), i.e. it is
impossible to see inside them and refer to their parts. Rules of syntax, as generally conceived,
take words as their smallest unit and compose them into phrases and sentences. Here a word
is a minimal free form, the smallest unit which can exist on its own. A language has a
particular grammar of word structure which nonetheless conforms to certain quite general
principles governing possible word structures in language (Selkirk 1983: 09).
It is also observed that some morphemes which are used within a word may not have
any meaning. Once the words are in the lexicon, the morphemes out of which these words are
formed and into which they are to be analysed, do not have constant meanings and in some
cases have no meaning at all. Moreover, there is no constant meaning for some prefixes
which can be attributed to any of them. Here starts the basic trouble with words: although
they are formed by regular rules, they change when required. It is therefore difficult to
segment the meaning of an individual word in a principled manner.
To overcome this problem Halle and Chomsky (1968) suggested that the dictionary
should contain the actual words as well as their idiosyncrasies. In their opinion a word can
convey more than it is expected to mean. These idiosyncrasies would include phonological
and syntactic exception features which a word might have. They would also include its
semantic and syntactic peculiarities which are not provided by general rules of morphology.
But the problem is that there are some words which are so idiosyncratic that their meanings
are totally divorced from what is expected by general rules. So it is difficult to explain how
such a word could mean something different from its expected meaning without damaging its rule of
generation.
The morphological analysis of words is concerned with the analysis of the morphemic
structure of words, where the type, form and functional role(s) of morphemes are investigated.
The study also involves morpho-phonemic, semantic and syntactic analysis of words along
with their inflectional, conjugational and derivational changes. From a computational point of
view, morphological analysis of words is an important part of NLP, which involves automatic
extraction of morphemic, semantic and syntactic information from words used in a sentence
so that the information can be used in higher levels of analysis. In this process each word is
analysed and separated, morphemes are pulled apart and their roles are studied. Other
properties of words such as gender, number, person, tense, aspect, modality, particles etc. are
also determined and studied.
In this chapter an effort is made to find out the morphemic structure of the Bangla
words used in the corpus texts and to see how these morphemes are used in the formation of
surface words. The chapter is arranged as follows: section 5.2 gives a brief discussion of
earlier studies in this area, section 5.3 considers word formation processes in Bangla, section
5.4 examines affixation processes, which include different markers, case terminations,
endings etc., section 5.5 briefly discusses the formation and function of post-positions,
section 5.6 analyses the process of compound word formation, section 5.7 discusses the process
of formation of reduplicated words, and section 5.8 is the conclusion.
5.2. Earlier Studies

Morphemes are generally considered the fundamental units of words. The Indo-Europeanists
studied morphemes philologically to find out the role of words in the understanding of a natural
language. The Prague School followers paid attention to phonological and morpho-phonemic
aspects of words. To the structuralists, a word is a linguistic form which bears no phonetic-
semantic resemblance to any other form (Bloomfield 1933: 161). The descriptivists related
semantics with words to investigate their difficulties in productivity. To them, morphemes are
the smallest individually meaningful elements in utterances of language which play important
roles in word formation (Hockett 1958: 123). Within the generative framework, the domain
of the word is considered not to differ from that of the sentence. The school of generative semantics
insisted that the word is fundamentally not different from any other syntactic unit. Thus, from
the Indo-Europeanists to the school of generative semantics, everyone ignored the independent
status of words.
Among individual scholars, Matthews (1974) centres on words and lexemes, gives a
traditional treatment of inflections and sandhi, considers different morphological processes
and investigates the relation of morphology with phonology and syntax along with the role of
morphology in generative grammar. Aronoff (1981) considers a morpheme as a phonetic
string which can be connected to a linguistic entity outside that string. What is important is not
its meaning, but its arbitrariness. He conceived morphology within the general framework of
transformational grammar as outlined in the works of Chomsky (1972). Bybee (1985) is
concerned with both morphology and morpho-phonemics of words. She proposes that in
morphology words should not be studied independent of meaning. The meaning of a
morpheme and the meaning of the context determine many properties of formal expression of
words.
Among the generative morphologists, Selkirk (1983) examines complex words -
compounds and those involving derivational and inflectional affixation - from a syntactic
standpoint that encompasses both the structure of words and the system of rules for
generating that structure. Jensen (1990) claims that morphemes are primarily structural units
which are typically but not necessarily meaningful. The role of morphemes in the process of
word formation can be explored by observing their structural potentials. Spencer (1991) deals
with the interface between morphology and phonology along with the types of non-
concatenative morphology. Moreover, he studies the theories of word structure including
inflectional morphology, investigates the interface between morphology and syntax and
explores the processes of grammatical relations. He considers that a morpheme can be
thought of as a cover term for various relationships which hold between words, and a more
concrete level at which words and morphemes are realised as sounds (or at least as
phonemes). Katamba (1993) introduces some notions fundamental to all morphological
discussions relating to words. He also explores the relationship between morphology, phonology
and words in generative theory. To him the morpheme is the smallest difference in shape of a
word that correlates with the smallest difference in word or sentence meaning or in
grammatical structure.
5.3. Word Formation Processes in Bangla

Within the theory of word formation, a new word is formed by performing some morpho-
phonemic operation on an already existing one. In most cases the effect of this morpho-
phonemic operation would be the addition of some affixes to the already existing word. In the
model of Aronoff (1981) the Word Formation Rules (WFRs), in their productive or synthetic
function, create new words by adding morphemes to old words. This means, in effect, that the
new word would contain the old. The meaning of the new word will also be a compositional
function of the meaning of the word it contains (Aronoff 1981: 25). In other words, the
generation of a new word simply involves making some morpho-phonemic changes to the existing
word to accommodate a new idea which is related to the idea that the already
existing word contains and conveys. Therefore, the new word would contain at least a
portion, if not all, of the inherent semantic properties of the old word.
One has to determine what sort of new words can be generated, by the process
mentioned above, in a particular language. There must be some rules of word formation for
each lexical category in the language. In Bangla, a word is a string of graphemes that appears
in print between spaces or punctuation marks following an orthographic convention. It is
observed that in Bangla, word formation rules are highly productive for major lexical
categories such as noun, pronoun, adjective and verb but less productive for other lexical
categories, namely indeclinable, adverb etc. For generation of new words, the major processes
are: (i) inflection, (ii) derivation, (iii) sandhi, (iv) compounding and (v) reduplication. The
following table (5.1) shows the processes of generation of words in Bangla.
Table 5.1.
Processes of generation of words in Bangla.
No  process                                      examples                                     glossary
1   Single morpheme (word)                       [din + 0] -> [din]                           day
2   Adding suffix (inflection)                   [din + -gulo] -> [dingulo]                   days
3   Adding case (inflection)                     [din + -er] -> [diner]                       of day
4   Adding suffix and case (inflection)          [din + -gulo + -te] -> [dingulote]           in days
5   Adding prefix (inflection)                   [su- + din] -> [sudin]                       good day
6   Adding prefix and suffix (inflection)        [su- + din + -gulo] -> [sudingulo]           good days
7   Adding prefix and case (inflection)          [su- + din + -er] -> [sudiner]               of good day
8   Adding prefix, suffix and case (inflection)  [su- + din + -gulo + -ke] -> [sudinguloke]   to good days
9   By derivation                                [din + -ik] -> [dainik]                      Daily
10  Adding case with derived form                [din + -ik + -er] -> [dainiker]              of Daily
    (inflection after derivation)
11  Adding suffix with derived form              [din + -ik + -gulo] -> [dainikgulo]          the Dailies
    (inflection after derivation)
12  Adding suffix and case with derived form     [din + -ik + -gulo + -ke] -> [dainikguloke]  to Dailies
    (inflection after derivation)
13  By sandhi                                    [din + antå] -> [dinantå]                    day's end
14  Adding case after sandhi                     [din + antå + -er] -> [dinanter]             of day's end
    (inflection after sandhi)
15  By compounding                               [din + kal] -> [dinkal]                      day and time
16  Adding case after compound                   [din + kal + -er] -> [dinkaler]              of day and time
    (inflection after compounding)
17  By reduplication                             [din + din] -> [dindin]                      day by day
18  Adding case after reduplication              [din + -e + din + -e] -> [dinedine]          day by day
    (inflection after reduplication)
In the process of word formation the use of bound morphemes with free morphemes is
always controlled by some rules applicable to morphemes. In the list above, the rule
of derivation has operated upon the root word in no. 9, and the rule of sandhi in no. 13. Therefore,
it is necessary to identify, by analysing the morpho-phonemic structure of words, which rules
would operate on them to produce new words, because existing words sometimes tend to
be resistant to any system which derives their properties using general rules.
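The interplay between plain concatenation and a stem-changing rule, as in nos. 9 to 12 of Table 5.1, may be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions: the function name `form_word`, the lookup table `DERIVED_STEMS` and the plain ASCII romanisation are introduced here for illustration and are not part of the analysis itself. The stem change (din + -ik giving dainik) is simply stored, not computed by a phonological rule.

```python
# Illustrative sketch of the word-formation steps listed in Table 5.1.
# Assumptions: forms are written in plain ASCII romanisation, and the stem
# change caused by derivation (din + -ik -> dainik) is stored in a lookup
# table instead of being derived by a morpho-phonemic rule.

DERIVED_STEMS = {("din", "ik"): "dainik"}  # hypothetical exception table


def form_word(root, prefix="", derivation="", suffixes=()):
    """Build a surface form as prefix + (derived) stem + inflectional suffixes."""
    stem = root
    if derivation:
        # Use the stored derived stem if there is one; otherwise concatenate.
        stem = DERIVED_STEMS.get((root, derivation), root + derivation)
    return prefix + stem + "".join(suffixes)


if __name__ == "__main__":
    print(form_word("din"))                                    # no. 1
    print(form_word("din", suffixes=("gulo", "te")))           # no. 4
    print(form_word("din", prefix="su", suffixes=("er",)))     # no. 7
    print(form_word("din", derivation="ik", suffixes=("gulo", "ke")))  # no. 12
```

Keeping the derived stem in a table reflects the point made above that existing words can resist general rules: the regular part is concatenation, and the irregular part is listed.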
The hypothesis that each word has its internal constituent structure implies that there
must be a Word Structure Grammar (WSG) to generate that word. We have tried to look into
the constituent structures of Bangla words belonging to different lexical categories. Each and
every word must belong to some lexical category, the exact category being determined by the
Word Formation Rules (WFRs) which produce the words (Aronoff 1981: 49). For instance, in
Bangla the suffix [-tva] produces nouns (e.g. [naritva] "femininity" etc.), while the
suffix [-ban] produces adjectives (e.g. [dåyaban] "kind" etc.).
The corpus shows that in the texts two types of words are used: (i) root words, and (ii)
inflected words. The root words are mostly free morphemes (noun, adjective, pronoun, verb
roots etc.) which have the potential to be entered in the lexicon, while inflected words are
generated by grammatical concatenation of root words and suffixes, which are
mostly bound morphemes. These bound morphemes are not generally entered in a lexicon,
though they could be entered for detailed grammatical analysis.
At the time of morphemic analysis each surface word is considered as a separate
linguistic unit with a structure and semantic content of its own. The analysis provides
information necessary for understanding the syntactic and semantic role(s) of words. Generally,
words have two entities: (i) a lexical entity which is context free, and (ii) a syntactic entity
which is context bound. At the lexical level a word can belong to multiple word-classes with
different meanings, but its final word-class and meaning are determined by its syntactic
entity. For instance, the surface word [chaRa] at the lexical level can be an adjective or an
adverb or a noun or a verb. The final word-class is determined by its context of use in a
sentence. However, some words can be ambiguous in form and meaning at both the lexical and
syntactic levels. For such cases, allocation of the final word-class depends on some constraints on
words such as their forms, their formation, their ability to change and their ability to take
affixes or post-positions etc.
In Bangla, a word is formed by putting some morphemes together with or without
applying different morpho-phonemic processes. For each part-of-speech there are sets of
words (free morphemes) and sets of affixes (bound morphemes for gender, number, person,
tense, aspect, modality as well as for case, particles, negation, affirmation, vocative
expression etc.) along with some rules for surface form generation. The main
processes of Bangla word formation are as follows:
(i) words are generated by systematic arrangement of morphemes. Adding morphemes
may cause euphonic (sandhi or pratay) changes in the final structure of words.
(ii) the word-forming morphemes are synthetic by nature, i.e. morphemes can join with
others to generate words. Among them some are highly productive (prefixes, suffixes,
case markers etc.) whereas others are less productive.
(iii) among the morphemes some are bound (affixes), which cannot be used independently
in a sentence, and some are free (root words), which can be used independently.
(iv) the number of bound morphemes is nearly fixed in the language. It can be increased
only when new morphemes are borrowed from other languages or coined within the
language.
Following traditional grammar, words generally belong to the classes noun, pronoun, adjective,
verb, adverb, indeclinable and post-position. Among these some are in root form with or
without affixes or markers, some are with inflections, some are in derived form, and some are
in derived form with inflections or affixes. Moreover, there are compound and reduplicated
words with or without inflections. For automatic parsing and tagging, the words belonging to
each lexical category are to be analysed structurally to understand the patterns of their
formation. In the following sub-sections we have tried to understand this by analysing the
words structurally.
5.3.1. Morphemic Structure of Nouns

Structurally, nouns are of two types: (i) root form without inflection, and (ii) root form with
inflection. The inflection includes suffix, case, gender, number, person and particle markers
which are used at the end of a root form. In the corpus nearly 40% of nouns are non-inflected;
most of these can take inflections if required, though their use
without inflection is also regular in the texts. However, not every inflection can be used with
every noun. There is a system of grammatical agreement between the noun and the inflection
at the time of use. Similarly, prefixes can be added to nouns following some rules of
grammatical agreement between prefixes and nouns, where the acceptability of prefixes with
nouns has to be judged. The following table (5.2) shows the processes of inflected noun
formation.
Table (5.2)
Processes of inflected noun formation
No  process                               examples                                      glossary
1   root word + 0                         [manus + -0] -> [manus]                       man
2   root word + particle                  [manus + -i] -> [manusi]                      man himself
3   root word + case                      [manus + -er] -> [manuser]                    of man
4   root word + number                    [manus + -gulo] -> [manusgulo]                men
5   root word + case + particle           [manus + -ke + -i] -> [manuskei]              to man himself
6   root word + number + particle         [manus + -era + -i] -> [manuserai]            men themselves
7   root word + number + case             [manus + -gulo + -ke] -> [manusguloke]        to men
8   root word + number + case + particle  [manus + -gulo + -ke + -i] -> [manusgulokei]  to men themselves
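The fixed order root + number + case + particle seen in Table (5.2) lends itself to a simple suffix-stripping check. The following Python sketch is offered only as an illustration: the names `ROOTS`, `SLOTS` and `segment_noun`, the one-word root list and the small marker inventory are assumptions made here, and plain ASCII romanisation is used. A candidate analysis is accepted only if the markers occur in the stated order and the remainder is a known root.

```python
from itertools import product

# Hypothetical mini-lexicon and marker inventory, taken from Table (5.2).
ROOTS = {"manus"}
SLOTS = [  # slots in their left-to-right order after the root
    ("number", ("gulo", "era")),
    ("case", ("ke", "er", "te")),
    ("particle", ("i",)),
]


def segment_noun(word):
    """Return [(slot, form), ...] for an inflected noun, or None if no analysis fits."""
    # Every slot is optional, so None is one of the choices for each slot.
    options = [(None,) + markers for _, markers in SLOTS]
    for choice in product(*options):
        suffix = "".join(m for m in choice if m)
        if not word.endswith(suffix):
            continue
        root = word[: len(word) - len(suffix)]
        if root in ROOTS:  # accept only if what remains is a known root
            analysis = [("root", root)]
            analysis += [(name, m) for (name, _), m in zip(SLOTS, choice) if m]
            return analysis
    return None


if __name__ == "__main__":
    for w in ("manus", "manuser", "manuserai", "manusgulokei", "manusike"):
        print(w, segment_noun(w))
```

A form such as manusike, where the particle would precede the case marker, receives no analysis because that order is not allowed by the slot list.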
5.3.2. Morphemic Structure of Pronouns

By simple observation three types of pronouns are found in the corpus: (i) simple pronominal
forms (e.g. [ami] "I", [tumi] "you", [se] "he" etc.) which are used in the text
without any inflection (case, number, person markers etc.), (ii) inflected pronominal forms
(e.g. [tomaderke] "to you", [segulir] "of those", [apnadigåke] "to
you" etc.) which are always inflected with case, number or person markers, and (iii) adjectival
pronominal forms (e.g. [tvadiyå] "of him", [svåkiyå] "mine", [etå] "this much"
etc.) which are mostly used as adjectives in the text without any suffix markers.
The formation of inflected pronouns is far more complicated than that of inflected
nouns because, unlike in nouns, the components of pronouns quite often change their
positions in their linear arrangement. Another difference from nouns is that a pronoun,
whether inflected or not, never takes a prefix. On the other hand, almost all pronouns take
inflections when they are used in the texts. Generally, pronouns are used as pronouns in the
texts. However, there are some instances where a pronoun is used as a noun, as in the
example given below, though this kind of use is very rare in the texts:
[tumi to dekhchi 'amir' jale atke gechå]
"I see you are trapped in the net of 'I'."
Some pronominal roots undergo morpho-phonemic changes when they take inflection.
For example, the roots [tum-] and [tu-] change into [tom-] and [to-], respectively,
whenever the plural suffix [-ra] is used with them. The example is displayed below:
singular                    plural
[tum-] : [tumi] "you"   ::  [tom-] : [tomra] "you"
[tu-]  : [tui] "you"    ::  [to-]  : [tora] "you"
The total number of pronouns in Bangla is around 750 including both non-inflected
and inflected forms. However, the number of pronoun roots is around 50 and that of suffixes
is 32, which by some rules of grammatical agreement between roots and suffixes constitute
the total list of pronominal forms. Moreover, there are some pronouns which are used as
adjectives in the texts. Among the inflected pronouns some personal and demonstrative
pronouns are most frequently used in the texts. The corpus cites a new form ([kei] "who
else") which is not found in the pronoun list. Probably, it is derived in the following way:
[ke] (someone) + [-i] (particle) -> [kei]
There is a system in the formation of inflected pronouns. Generally, a pronoun is
made of a root part and a suffix part. The suffix part includes person, number (singular/plural),
case markers and particles. In the table (5.3) below the morphemic structure of inflected
pronouns is given with examples:
Table (5.3)
Morphemic structure of inflected pronouns
no.  process                                      examples                                          glossary
1    root + 0                                     [se + -0] -> [se]                                 he
2    root + particle                              [se + -i] -> [sei]                                himself
3    root + article                               [se + -ti] -> [seti]                              that
4    root + article + particle                    [se + -ti + -i] -> [setii]                        that
5    root + particle + article                    [se + -i + -ti] -> [seiti]                        that
6    root + article + case                        [se + -ti + -ke] -> [setike]                      to that
7    root + article + case + particle             [se + -ti + -ke + -i] -> [setikei]                to that
8    root + particle + article + case             [se + -i + -ti + -ke] -> [seitike]                to that
9    root + particle + article + case + particle  [se + -i + -ti + -ke + -i] -> [seitikei]          to that
10   root + number                                [se + -guli] -> [seguli]                          those
11   root + particle + number                     [se + -i + -guli] -> [seiguli]                    those
12   root + number + case                         [se + -guli + -ir] -> [segulir]                   of those
13   root + particle + number + case              [se + -i + -guli + -ir] -> [seigulir]             of those
14   root + case                                  [taha + -ke] -> [tahake]                          to him
15   root + number + case + particle              [taha + -der + -ke + -i] -> [tahaderkei]          to them
16   root + number + article + case               [taha + -der + -ti + -te] -> [tahadertite]        to theirs
17   root + number + article + case + particle    [taha + -der + -ti + -te + -i] -> [tahadertitei]  in theirs
18   root + person                                [am + -i] -> [ami]                                I
19   root + person + particle                     [am + -i + -i] -> [amii]                          I myself
20   root + case + article                        [taha + -er + -ti] -> [taharti]                   of his
21   root + case + article + case                 [taha + -er + -ti + -ke] -> [tahartike]           to his
22   root + case + article + case + particle      [taha + -er + -ti + -ke + -i] -> [tahartikei]     of his
The analysis shows that the markers quite often shift their respective positions in the
formation of inflected pronouns. This information and the modes of their arrangement are
necessary for developing algorithms for automatic detection and parsing of pronominal forms
by machine. Similarly, when surface pronominal forms are generated, the logical and
possible mappings of the components are to be used to block the generation of wrong pronominal
forms.
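One possible way of using such mappings at generation time is to list the attested marker orders and to refuse any other order. The Python sketch below is hypothetical: the set `ATTESTED` simply transcribes the 22 patterns of Table (5.3), the function name `generate` is introduced here, and the forms are in plain ASCII romanisation. It joins markers by bare concatenation, so morpho-phonemic adjustments such as taha + -er giving tahar are deliberately not modelled.

```python
# Marker orders attested in Table (5.3), one tuple per row of the table.
ATTESTED = {
    (), ("particle",), ("article",), ("article", "particle"),
    ("particle", "article"), ("article", "case"),
    ("article", "case", "particle"), ("particle", "article", "case"),
    ("particle", "article", "case", "particle"), ("number",),
    ("particle", "number"), ("number", "case"),
    ("particle", "number", "case"), ("case",),
    ("number", "case", "particle"), ("number", "article", "case"),
    ("number", "article", "case", "particle"), ("person",),
    ("person", "particle"), ("case", "article"),
    ("case", "article", "case"), ("case", "article", "case", "particle"),
}


def generate(root, markers):
    """Concatenate root and markers, but only for an attested marker order.

    `markers` is a list of (slot, form) pairs, e.g. [("article", "ti")].
    """
    template = tuple(slot for slot, _ in markers)
    if template not in ATTESTED:
        raise ValueError(f"unattested marker order: {template}")
    return root + "".join(form for _, form in markers)


if __name__ == "__main__":
    print(generate("se", [("particle", "i"), ("article", "ti"), ("case", "ke")]))
    print(generate("taha", [("number", "der"), ("case", "ke"), ("particle", "i")]))
```

The same table can be read in the other direction for parsing: a candidate segmentation is kept only if its slot sequence belongs to `ATTESTED`.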
5.3.3. Morphemic Structure of Finite Verbs

The corpus cites two types of verbs used in the text: (i) root verbs, and (ii) conjugated verbs.
In the list of conjugated forms finite, non-finite as well as causative and gerundial verbs are
taken into consideration. Moreover, those verbs which originate from nouns by adding
verbal suffixes (e.g. [hatiyechi] "(I/we) have stolen" etc.) are also included.
Most of the verbs used in the corpus text are in their conjugated forms. Probably, the
verb roots, devoid of the information provided by verbal suffixes, are not competent enough to
denote a complete sense of action. Therefore, to give a complete sense of action as well as to
provide aspectual, temporal and other information, the roots need to use some grammatical
properties (i.e. markers of person, number, honorific, non-honorific, gender etc.) where all
this information is stored. As a result, for understanding the function of the conjugated verbs we
have to analyse their suffix parts to extract the relevant information. This helps us to process
conjugated verbs without placing emphasis on their semantic part. In the table (5.4) below the
possible morphemic structure of conjugated verbs is given:
Table (5.4)
Morphemic structure of conjugated verbs
no  process                                    examples                                           glossary
1   root + 0                                   [kår + -0] -> [kår]                                to do
2   root + person                              [kår + -i] -> [kåri]                               I do
3   root + auxiliary + person                  [kår + -ch + -i] -> [kårchi]                       I am doing
4   root + causative + auxiliary + person      [kår + -a + -cch + -i] -> [kåracchi]               causing others to do
5   root + aspect + auxiliary + person         [kår + -ite + -ch + -i] -> [kåritechi]             I am doing
6   root + tense + person                      [kår + -il + -am] -> [kårilam]                     I did
7   root + auxiliary + tense + person          [kår + -ch + -il + -am] -> [kårchilam]             I was doing
8   root + aspect + auxiliary + tense + person [kår + -iya + -ch + -il + -am] -> [kåriyachilam]   I had done
9   root + causative + gerund                  [kår + -a + -no] -> [kårano]                       doing
At least five types of grammatical information are stored within the suffix part of a
conjugated verb: aspect, auxiliary, tense, person and particle. However, not all of this information
is used with every verb root. In certain cases only one or two types of information are used,
while on other occasions all of them are used. The sequence of the markers
with regard to the root is always uniform, i.e. no marker interchanges its position with another
in respect to the root. Generally, the root takes the leftmost position, following which come
the markers of aspect, auxiliary, tense, person and particle one by one in sequential order. If any
marker is dropped from the suffix, the immediately following marker occupies its place.
However, there is one variation in this system. The particle (mostly emphatic) can shift
to the position after the aspect marker if required, though it is generally used at the final position.
This shifting of the particle marker from the last to the third position sometimes creates problems for
automatic identification and parsing. Following this method around 105 valid conjugated
verbs can be obtained from a single verb root. In fact, in our corpus we have come across
nearly 105 valid conjugated verbs of a single verb root (including simple, causative and
gerundial forms).
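The slot order described above, together with the single permitted shift of the particle, can be expressed compactly. The Python sketch below is only an illustration under assumptions: the names `SLOT_ORDER`, `order_slots` and `conjugate` are introduced here, the root is written in plain ASCII as kar, the causative and gerund slots of Table (5.4) are left out, and markers are joined by bare concatenation without any morpho-phonemic adjustment.

```python
# Fixed order of suffix slots after the verb root, as stated in the text.
SLOT_ORDER = ["aspect", "auxiliary", "tense", "person", "particle"]


def order_slots(present, particle_after_aspect=False):
    """Return the slots in `present` in surface order.

    Dropped slots simply close up. If `particle_after_aspect` is set and both
    slots are present, the particle moves to the position right after the
    aspect marker (the one variation allowed by the system).
    """
    ordered = [slot for slot in SLOT_ORDER if slot in present]
    if particle_after_aspect and "particle" in ordered and "aspect" in ordered:
        ordered.remove("particle")
        ordered.insert(ordered.index("aspect") + 1, "particle")
    return ordered


def conjugate(root, markers, particle_after_aspect=False):
    """Join root and markers (a dict slot -> form) in the required order."""
    unknown = set(markers) - set(SLOT_ORDER)
    if unknown:
        raise ValueError(f"unknown slots: {sorted(unknown)}")
    slots = order_slots(markers, particle_after_aspect)
    return root + "".join(markers[slot] for slot in slots)


if __name__ == "__main__":
    print(conjugate("kar", {"person": "i"}))
    print(conjugate("kar", {"tense": "il", "person": "am"}))
    print(conjugate("kar", {"aspect": "iya", "auxiliary": "ch",
                            "tense": "il", "person": "am"}))
    print(order_slots({"aspect", "auxiliary", "person", "particle"},
                      particle_after_aspect=True))
```

A parser built on the same slot list would have to try both particle positions, which is where the identification problem mentioned above arises.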
5.3.4. Morphemic Structure of non-finite Verbs

Non-finite verbs occupy a major portion of the total occurrence of conjugated verbs in the
corpus texts. The idea of non-finiteness is defined semantically but determined structurally
by some specific markers at the end of verb roots. For identifying non-finite verbs there
are eight markers ([-iya], [-ya], [-ye], [-e], [-ite], [-te], [-ile]
and [-le], including both sadhu and calit forms) which are used at word-final positions.
Among these, [-iya], [-ya] and [-ye] have some restrictions in distribution with
the verb roots. It is noted that [-iya] is used with verb roots ending in [-i], [-o],
[-u] and in a consonant, while [-ya] is used with verb roots ending in [-i], [-o] and
[-u] only. Similarly, [-ye] is used with verb roots ending in [-i] and [-u], while
[-e] is used with other verb roots. For the other non-finite markers there is no restriction of
distribution. The table (5.5) below presents the morphemic structure of non-finite verbs with
examples:
Table (5.5)
Morphemic structure of non-finite verbs
no  examples                     glossary
1   [kår + -e] -> [kåre]         doing
2   [di + -ye] -> [diye]         giving
3   [bål + -iya] -> [båliya]     saying
4   [ni + -ya] -> [niya]         taking
5   [dhu + -ite] -> [dhuite]     to wash
6   [cål + -te] -> [cålte]       to go
7   [kha + -ile] -> [khaile]     having eaten
8   [di + -le] -> [dile]         having given
If the markers given above and the algorithm for appropriate matching between roots
and suffix parts are followed for identifying non-finite verbs, it is hoped that almost all the
non-finite verbs would be automatically identified and parsed. In fact, one of my colleagues
has successfully identified and parsed all non-finite verbs in the corpus using this schema.
Details can be found in Chaudhuri, Dash and Kundu (1997).
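The matching between roots and non-finite markers may be pictured with the following Python sketch. It is not the algorithm of Chaudhuri, Dash and Kundu (1997), which is not reproduced here; it is a minimal illustration of the distributional restrictions stated in section 5.3.4, with hypothetical names (`allowed_markers`, `split_nonfinite`), a small assumed root list and plain ASCII romanisation.

```python
VOWELS = set("aeiou")
UNRESTRICTED = ("ite", "te", "ile", "le")  # markers with no stated restriction


def allowed_markers(root):
    """Return the set of non-finite markers compatible with the end of `root`."""
    last = root[-1]
    allowed = set(UNRESTRICTED)
    if last in "iou" or last not in VOWELS:
        allowed.add("iya")   # after -i, -o, -u or a consonant
    if last in "iou":
        allowed.add("ya")    # after -i, -o, -u only
    if last in "iu":
        allowed.add("ye")    # after -i and -u
    else:
        allowed.add("e")     # after other roots
    return allowed


def split_nonfinite(word, roots):
    """Return (root, marker) if `word` is a known root plus a compatible marker."""
    for root in roots:
        if word.startswith(root):
            marker = word[len(root):]
            if marker in allowed_markers(root):
                return root, marker
    return None


if __name__ == "__main__":
    roots = {"kar", "di", "bal", "ni", "dhu", "cal", "kha"}  # assumed root list
    for w in ("kare", "diye", "baliya", "niya", "dhuite", "calte", "khaile", "dile"):
        print(w, split_nonfinite(w, roots))
```

With this root list the eight forms of Table (5.5) are all split as in the table, while a combination excluded by the restrictions (for instance kar followed by -ye) is rejected.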
5.3.5. Morphemic Structure of Adjectives

Structurally, three types of adjectives are found in the corpus texts: (i) simple adjectives without
any addition of suffixes (e.g. [bhalå] "good", [baRå] "big", [thanda] "cold" etc.).
In this group the cardinal numbers ([ek] "one", [dui] "two", [tin] "three" etc.) and
the ordinal numbers ([pråthåm] "first", [dvitiyå] "second", [cautha] "fourth" etc.)
are included, (ii) adjectives derived from nouns or verbs by the process of derivation (e.g.
[metho] "of field", [sahure] "of city", [bhabuk] "thoughtful", [dharmik]
"religious", [bårdhitå] "extended", [ketabi] "copy book", [cålti] "running",
[ghumåntå] "sleeping", [båråniyå] "welcoming" etc.), and (iii) adjectives derived by
adding specific adjectival suffixes to nouns or verbs. In most cases these are considered as
compound adjectives (e.g. [dehåjatå] "born in body", [brhåttåmå] "largest",
[dhabåman] "running", [phålåpråsu] "result oriented", [pranabåntå] "lively",
[bittåsali] "rich", [dåyahin] "kindless" etc.).
Moreover, as with the nouns, a prefix can be added to an adjective following the rules
of grammatical agreement between the two. In the table (5.6) the morphemic structure of
adjectives is displayed with examples:
Table (5.6)
Morphemic structure of adjectives
no  process                                 examples                                 glossary
1   root adjective form                     [bhalå + -0] -> [bhalå]                  good
2   adjective + adjectival suffix           [dirghå + -tåmå] -> [dirghåtåmå]         longest
3   prefix + adjective                      [a- + purnå] -> [apurnå]                 unfulfilled
4   verb root + adjectival suffix           [cål + -ti] -> [cålti]                   running
5   noun + verb root + adjectival suffix    [råktå + makh + -a] -> [råktåmakha]      stained with blood
6   noun + noun + adjectival suffix         [yuddhå + kal + -in] -> [yuddhåkalin]    during war
7   prefix + verb + adjectival suffix       [a- + såhan + -iyå] -> [asåhaniyå]       unbearable
5.3.6. Morphemic Structure of Adverbs

Structurally, three types of adverbs are found in the corpus texts: (i) simple adverbs used in
the texts without any addition of suffix (e.g. [kådacit] "rarely", [prayåsåh]
"regularly", [mulåtåh] "mainly" etc.), (ii) adverbs derived mostly from adjectives and
nouns by adding the suffix [-e] (e.g. [dhire] "slowly", [gopåne] "secretly",
[bege] "fast", [nirjåne] "lonely" etc.), and (iii) adverbs derived by adding specific
adverbial suffixes to nouns. In most cases these are considered as compound words (e.g.
[bhalåbhabe] "well", [bhulkråme] "by mistake", [nidenpåkse] "at
least", [bhråmbåsåtå] "by mistake" etc.). The morphemic structure of adverbs is given
below in the table (5.7) with examples:
Table (5.7)
Morphemic structures of adverbs
no  process                              examples                                glossary
1   root adverb form                     [tåkhån + -0] -> [tåkhån]               then
2   noun + adverbial suffix              [kach + -e] -> [kache]                  near
3   adjective + adverbial suffix         [dhir + -e] -> [dhire]                  slowly
4   adjective + noun + adverbial suffix  [bises + bhab + -e] -> [bisesbhabe]     specially
The analysis of adverbs in the corpus shows that they generally modify the action denoted
by verbs in the sentences. Moreover, some post-positional words like [apeksa],
[kårtrk], [dvara], [dårun], [jånyå], [jånye], [tåre], [prati], [nyay],
[båråbår], [såhit], [bånam], [babåd], [båi], [bina] etc. are
sometimes used with nouns, pronouns or adjectives to denote adverbial sense.
5.4. Affixation
Affixes are bound morphemes which are never used independently in the text without support
of a root word. They are considered as derivational morphemes which have the function of
creating a new word out of another existing word (Spencer 1991: 39). However, in Bangla,
they include both inflectional and derivational affixes having meaning of their own by which
they can affect the semantic property of a word. In the word formation processes these affixes
are recursive in nature. They can be used before, after or within nouns, pronouns, adjectives,
verbs, adverbs or compounds. Traditionally, affixes are divided into three groups: prefixes,
infixes and suffixes. In Bangla, prefixes are fewer in number than suffixes, while infixes are
very few. The details of each group of affixes are given in the following
sections.
5.4.1. Prefixes
A prefix is used in front of a word and is considered as an integrated part of a word.
Generally, prefixes have some fixed meaning attributed to them but now it is claimed that
prefixes have no constant meaning which can be attributed to any of them (Aronoff 1981: 13).
They have an important role in word formation process as the use of a prefix can add an extra
shade of meaning to the existing word.
The total number of prefixes used in Bangla is around 70. Among them, 20 prefixes are
inherited from Sanskrit. This number is almost fixed and all the forms are more
or less regularly used in the language. Around 30 prefixes come from native sources
or are borrowed from neighbouring languages. The number of prefixes of foreign origin
is also around 30. All these forms are borrowed either from the Persian group of
languages (mostly Arabic or Persian) or from European languages (mostly English) by
the process of lexical borrowing. This number is not fixed, as it can grow with
further lexical borrowing from different languages. Table 5.8 below gives the
full list of prefixes found in the corpus.
Table 5.8
The list of prefixes found in the corpus
Sanskrit 'Slfe- [ati-] vs#- [adhi-] vspj; [anu-] ®r*f- [ap-] safR- [api-] W- [ab-] \Sjfe-
[abhi-] vsjt [a-] [ut-] [up&-] ’p- [dur-] R- [ni-] Rl- [nir-] ■rat- [p&ra-]
■'#- [pfiri-] 2f- [pr&-] 2lRi- [prfiti-] R- [bi-] - [s&m&-] 3J- [su-]
Bangla 3T- [a-] srar- [aj-] ®RT- [ana-] v5Jt- [a-] sit*t- [ag-] 3i1$- [at-] Wg- [aR-] ^T-
[un&-] ^ra- [upar-] viMR- [upri-] f- [ku-] *f<3- [gand&-] fw- [chmic-] qt- [na-
] R- [ni-] f%5- [nit-] [pach-] "#3- [pati-] ■'tW- [pas-] [pich-] R- [bi-
] tra- [bh&r-] [mig-] m- [majh-] fe- [mit-] m- [ram-] sr- [s^-]
[su-] 5t- [ha-]
Foreign W9- [ab-] v5Jbr- [am-] W- [kar-] [khas-] f'f- [khus-] «1*f- [khol-] ra-
[glr-] 'ra^T- [dfib^l-] ra- [dfir-] Jtf- [na-] R^t- [nim-] R- [phi-] [phul-] 3-
[b&-] ^t- [b&d-] <3- [be-] sra- [sab-] ra- [Mr-] [haph-] era- [hed-]
There are some words found in the corpus texts like sra [atS.], [an], [antHh],
[ud], [et&], fra [ciri], W3 [tM], W [t&t], crcit [t&tha], (ss [te], far [tri], rat [d£$], \ [du], $
[duh], ft [dvi], H [n&], Rs [nih], ras [pific&], ra [p&r], ‘‘JsTt [pure], 2ft^ [prak], ft7 [phri], W [be],
W [yltha], stf© [sat], ^ [sv&] etc. which have the potentiality to be considered as prefixes
because on most occasions they play the role of a prefix of the words in the text. However, in
the traditional grammar these forms are not considered as such. Among the prefixes the forms
like v5r- [a-], [a-], f- [ku-], [dur-], [na-], fra- [nir-], 21- [pr&-], w- [be-], *T- [si-] etc.
are highly productive in nature because they can be added to any noun or adjective to generate
a new word having a different sense or meaning. Moreover, forms like vsjfe [ati], 'SFJ [anu], ^
[dur], rat [p&ra], 2ife [pr&ti], 5R [sSmS], 'Silt [at], Sit'S [aR], ]§ra [up&r], [upri], «Tf [na], ‘‘It?
[pach], -'ttfo [pati], *IH [pas], f*t? [pich], m [bhSx], [majh], [ram], ^ [khas], ^
[khus], <3IW [khos], tsg?T [dabal], f*T [phul], [bid], ^ [sab], W [haph], C^5 [hed] etc. can
be used as independent words in the text. In their independent use they take case
terminations and other endings just as regular words in the language do.
The use of a prefix can change the lexical category or part-of-speech of a word. The
newly formed word can be either opposite in meaning or can change the meaning of the
existing word. A set of examples where a word is changed in meaning after using different
prefixes is given in the table (5.9).
Table 5.9
Change of meaning of words by using prefixes
Prefixed word              Gloss
অধিগত [ådhigåtå]           under control
অনাগত [ånagåtå]            has not come yet
অনুগত [ånugåtå]            obedient
অপগত [åpågåtå]             spoiled
অবগত [åbågåtå]             known
অভিগত [åbhigåtå]           informed
আগত [agåtå]                appeared
উদ্গত [udgåtå]             invoked
উপগত [upågåtå]             copulated
নির্গত [nirgåtå]            extricated
পরাগত [påragåtå]           taken by others
প্রগত [prågåtå]             advanced
প্রত্যাগত [pråtyagåtå]       returned
বিগত [bigåtå]              passed
দুর্গত [durgåtå]            distressed
স্বগত [svågåtå]            by own
সুগত [sugåtå]              with majestic gait
সংগত [såmgåtå]            appropriate
5.4.2. Suffixes
For the purpose of easy computability and processing of words, the term suffix is used here in
its broadest possible sense. Here, all those forms are considered as suffixes which are added
immediately after the root (base) word. It is observed that, in the case of inflected words, the
determination of parts-of-speech of words primarily depends on the suffix part of words. Two
parallel examples are given to elucidate the idea.
English : [[[[establish]fv ment]n arian]adj ism]n

Bangla : [[[[[mår]fv ånå]n sila]adj derå]n ke]n
The examples show that a valid word can be formed at each block and the part-of-
speech of the word will be determined depending on the last part of the word (specified by
bold subscript in each block). In this parameter, there are four types of suffixes in Bangla.
They are noun suffixes, verb suffixes, adjective suffixes and adverb suffixes.
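The layered structure of the English example above can be sketched in a few lines of Python. This is only an illustration: the category labels (fv, n, adj) are read from the bracketed example, the label for the third layer is an assumption, and the function names are hypothetical.

```python
# Each element is paired with the category it gives to the word built so far.
# Labels follow the bracketed English example; "adj" for "arian" is assumed.
ENGLISH = [("establish", "fv"), ("ment", "n"), ("arian", "adj"), ("ism", "n")]


def layers(morphs):
    """Yield (word_so_far, part_of_speech) for every intermediate valid word."""
    word = ""
    for form, category in morphs:
        word += form
        yield word, category


def part_of_speech(morphs):
    """The last attached element decides the category of the whole word."""
    return morphs[-1][1]


for word, pos in layers(ENGLISH):
    print(word, pos)
```

Each printed line is a valid word at one block of the bracketing, and the category printed beside it is the one contributed by the last element attached.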
5.4.2.1. Suffixes for Nouns
The list of suffixes for nouns encompasses the total set of inflections used for nouns. These
suffixes include gender, number, person, particle markers and case termination. Because of
their bound nature, they cannot be used independently in the text. From the corpus it is found
that the number of total suffixes used with nouns is around 75. Among them singular markers
are 5, plural markers are 41, gender markers are 25 and particle markers are 3. These
inflections are used at the end of words.
The concept of person is absent for nouns. Therefore, all nouns are considered to
belong to third person only. For denoting number, nouns have two types of suffixes: singular
(5) and plural (41). To denote duality of nouns, some adjectives with dual sense (উভয় [ubhåy]
"both", যৌথ [yauthå] "both" etc.) are used with nouns. Sometimes, some cardinal adjectives
(দু [du], দুই [dui], দুটি [duti] "two" etc.) are used before nouns to denote the same sense of duality. The
suffixes denoting singularity are used both for animate and inanimate nouns whereas the
suffixes denoting plurality have some specifications in use. The suffixes denoting singularity
are: -টি [-ti], -টা [-ta], -টে [-te], -খানি [-khani] and -খানা [-khana].
Out of total number of plural suffix markers, 13 markers are used for animate nouns,
17 markers are used for inanimate nouns and the remaining 11 markers are used for both
animate and inanimate nouns. The suffixes denoting plurality of nouns are given below:
Animate   : -কুল [-kul], -গণ [-gån], -জন [-jån], -দিগ [-digå], -দের [-der], -পাল [-pal],
            -বর্গ [-bårgå], -বৃন্দ [-brndå], -ব্রাত [-bratå], -মহল [-måhål],
            -যূথ [-yuthå], -রা [-ra], -সংঘ [-sånghå]
Inanimate : -াদি [-adi], -াবলি [-abåli], -গাছ [-gach], -গাছা [-gacha], -গাছি [-gachi],
            -গুচ্ছ [-gucchå], -গ্রাম [-gram], -জাল [-jal], -দাম [-dam], -চয় [-cåy],
            -নিকর [-nikår], -নিচয় [-nicåy], -পুঞ্জ [-puñjå], -ব্রজ [-bråjå],
            -মালা [-mala], -রাজি [-raji], -রাশি [-rasi]
Common    : -েরা [-era], -গুলি [-guli], -গুলা [-gula], -গুলো [-gulo], -দল [-dål],
            -মণ্ডল [-måndål], -মণ্ডলী [-måndålī], -শ্রেণি [-sreni], -সকল [-såkål], -সব [-såb],
            -সমূহ [-såmuhå]
The Bangla nouns have retained some specific gender markers which are mostly
borrowed from Sanskrit and Persian. Unlike Hindi, the gender markers of
nouns in Bangla have no impact on the conjugation system of verbs because Bangla does not
have any grammatical gender. The gender value is mentioned by an adjective which precedes
the noun, e.g. মেয়ে মানুষ [meye manus] "women" or মহিলা কবি [måhila kåbi] "poetess" etc.
Because of their bound nature, these markers are always dependent on base nouns. However,
in the corpus texts, some specific gender markers (nearly 20) are found. The analysis shows
that these gender markers are mostly feminine (nearly 15), though there are a few specific
markers for masculine (nearly 5) and both these are used for Bangla adjectives and nouns.
The feminine gender markers found in the corpus texts are: -া [-a], -ানি/-ানী [-ani/-anī],
-িনি/-িনী [-ini/-inī], -ি/-ী [-i/-ī], -িকা [-ika], -িয়া [-iya], -িমা [-ima], -নি [-ni], -ত্রী [-trī],
-বতী [-båtī], -ান [-an], -সী [-sī], -মতী [-måtī] etc.
In the corpus texts, two types of particles are found: (i) emphatic, and (ii) negative.
Emphatic particles (ই [i] and ও [o]) are generally used immediately after nouns without any
blank space in between. Moreover, an emphatic marker sometimes becomes a part of a word
by changing into an allograph (e.g. etc.). This kind of mutation takes place
only when a word-final consonant grapheme is without any allograph and non-vocalic in
utterance. The negative particle markers, however, are very rarely used with nouns.
The use of case termination is a unique property of nouns. Because of the inflectional
nature of the language, some extra information regarding case relations is supplied with
nouns by the use of case terminations. Generally, only one case marker is used at a time with a
noun (e.g. বালক + -ের [balåk + -er] "of boy"), though on some occasions more than
one case marker is used at a time with a single noun. For instance, the noun
লোকেদেরকে [lokederke] "to the people" (লোক + -ে + -দের + -কে [lok + -e + -der + -ke]) has three
case markers: (i) -ে [-e] (accusative), (ii) -দের [-der] (genitive with a sense of plurality), and
(iii) -কে [-ke] (accusative). From the corpus texts it is counted that the case terminations used
for nouns are around 32 in number. These markers can be used with a noun depending on its
structure and case in the sentence. Table 5.10 below lists the case markers for nouns.
Table 5.10
Case markers for nouns
Case          Markers
Nominative    [-0], -ে [-e], -য়ে [-ye], -য় [-y], -তে [-te]
Accusative    [-0], -ে [-e], -য়ে [-ye], -য় [-y], -কে [-ke], -রে [-re]
Instrumental  [-0], -ে [-e], -য় [-y]
Dative        [-0], -কে [-ke], -তে [-te]
Ablative      [-0], -ে [-e], -কে [-ke], -তে [-te]
Genitive      [-0], -র [-r], -ার [-ar], -ের [-er], -কের [-ker], -কার [-kar]
Locative      [-0], -ে [-e], -য়ে [-ye], -য় [-y], -তে [-te]
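Table 5.10 shows that the same marker often serves several cases. As a minimal sketch, the table can be inverted so that a marker maps to every case it may signal. The ASCII romanization, the use of "0" for the null marker and all names in the code are assumptions made only for this illustration.

```python
from collections import defaultdict

# Case markers as read from Table 5.10, in simplified ASCII romanization
# (an assumption of this sketch); "0" stands for the null marker.
CASE_MARKERS = {
    "Nominative":   ["0", "e", "ye", "y", "te"],
    "Accusative":   ["0", "e", "ye", "y", "ke", "re"],
    "Instrumental": ["0", "e", "y"],
    "Dative":       ["0", "ke", "te"],
    "Ablative":     ["0", "e", "ke", "te"],
    "Genitive":     ["0", "r", "ar", "er", "ker", "kar"],
    "Locative":     ["0", "e", "ye", "y", "te"],
}


def invert(table):
    """Map each marker to the list of cases in which it occurs."""
    index = defaultdict(list)
    for case, markers in table.items():
        for marker in markers:
            index[marker].append(case)
    return dict(index)


CASES_FOR = invert(CASE_MARKERS)

print(CASES_FOR["te"])  # several candidate cases
print(CASES_FOR["er"])  # ['Genitive']
print(sum(len(m) for m in CASE_MARKERS.values()))  # 32 table entries in all
```

A marker such as -te is compatible with four cases, so the marker alone cannot decide the case of a noun; only the genitive markers with -r are unambiguous in this table.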
Generally, the case markers perform the inflectional roles of the words. Therefore, in
most cases, the addition of a case marker after a word-final character does not cause any
notable morpho-phonemic change in the word. However, for the case marker -ের [-er] it is
observed that the first character (-ে [-e]) is (i) retained if the last character of the noun is a
vowel, consonant or cluster grapheme (e.g. কাক + -ের > কাকের [kak + -er > kaker]), or (ii)
dropped if the last character of the noun is a vowel allograph (e.g. মাথা + -ের > মাথার [matha +
-er > mathar], গতি + -ের > গতির [gåti + -er > gåtir], নদী + -ের > নদীর [nådī + -er > nådīr],
মধু + -ের > মধুর [mådhu + -er > mådhur], বধূ + -ের > বধূর [bådhū + -er > bådhūr],
ছেলে + -ের > ছেলের [chele + -er > cheler], আলো + -ের > আলোর [alo + -er > alor] etc.). The
examples show that at the time of using genitive case markers the final character of the noun
generally dominates in generation of a surface inflected form.
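The two conditions stated above for the genitive marker can be sketched as a small character-based rule. The sketch assumes Unicode text, treats the dependent vowel signs of the Bengali block as the "vowel allographs" of the description, and implements nothing beyond those two conditions; the function and constant names are hypothetical.

```python
# Dependent vowel signs of the Bengali Unicode block (aa, i, ii, u, uu,
# vocalic r, e, ai, o, au), taken here as the "vowel allographs" of the text.
VOWEL_SIGNS = set("\u09be\u09bf\u09c0\u09c1\u09c2\u09c3\u09c7\u09c8\u09cb\u09cc")

E_SIGN = "\u09c7"  # vowel sign e, the first character of the marker -er
RA = "\u09b0"      # letter ra, the second character of the marker -er


def genitive(noun: str) -> str:
    """Attach the genitive marker -er following the two stated conditions."""
    if not noun:
        raise ValueError("empty noun")
    if noun[-1] in VOWEL_SIGNS:
        # (ii) final vowel allograph: the e-sign is dropped, only ra is added
        return noun + RA
    # (i) any other final grapheme: the full marker is retained
    return noun + E_SIGN + RA


for word in ["কাক", "মাথা", "নদী", "আলো"]:
    print(word, genitive(word))
```

Only the final character of the noun is inspected, which reflects the observation that the word-final grapheme decides the surface form of the genitive.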
5.4.2.2. Suffixes for Pronouns
The system of using suffixes with nouns is applicable to pronouns also. The lists of
pronominal suffixes include person, gender, number, particle markers and case termination.
The pronouns have three persons (first, second and third), though the number of person
markers is 2 ([-0] (null) and -ি [-i]). Of these, -ি [-i] denotes singularity of person and
is used with all personal pronoun roots irrespective of first, second or third person, such as
আমি [ami] "I", তুমি [tumi] "you", আপনি [apni] "you", তিনি [tini] "he" etc. Moreover, it does not
cause any morpho-phonemic change in the surface pronominal forms.
For denoting number, pronouns have two types of suffix: singular and plural. These forms
generally indicate the nominative case as there are very few specific nominative case markers
for pronouns. Pronouns have a peculiarity in the use of suffix markers for number. It is found that
a pronominal form denoting singularity can take a plural suffix (e.g. আমারগুলো [amar(sg) +
-gulo(pl)] "those of mine" etc.), while a pronominal form denoting plurality can take a singular
suffix (e.g. তাদেরটা [tader(pl) + -ta(sg)] "theirs" etc.), which leads us to consider আমারগুলো
[amargulo] as a plural form, and তাদেরটা [taderta] as a singular one. Therefore, the actual
number denoted by a pronominal form is determined by the number denoted by the suffix.
This information helps in automatic determination of the number of surface pronominal forms.
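The principle that the outermost number suffix decides the number of a pronominal form can be sketched as a longest-suffix lookup. The suffix inventory is the one listed in Table 5.11, written in simplified ASCII romanization; the romanization and the function name are assumptions of this sketch, and no other suffix types are handled.

```python
# Number suffixes for pronouns (Table 5.11), in simplified ASCII romanization.
SINGULAR = ["ta", "ti", "te", "tuku", "khan", "khana", "khani"]
PLURAL = ["era", "ra", "kata", "kati", "guli", "gula", "gulo", "diga", "der"]


def number_of(form):
    """Return 'singular' or 'plural' from the final number suffix, else None."""
    candidates = [(s, "singular") for s in SINGULAR]
    candidates += [(s, "plural") for s in PLURAL]
    # Try longer suffixes first so that e.g. "gulo" is never cut down to "lo".
    candidates.sort(key=lambda pair: len(pair[0]), reverse=True)
    for suffix, number in candidates:
        if form.endswith(suffix) and len(form) > len(suffix):
            return number
    return None


print(number_of("amargulo"))  # plural, although amar is a singular form
print(number_of("taderta"))   # singular, although tader is a plural form
```

The inner material of the form (amar, tader) is ignored on purpose: only the final suffix is consulted, as the description above requires.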
The notable feature of the suffixes denoting singularity is that none of these is used
with personal (human) pronoun roots except সে [se], where it denotes a non-human object as
in সেটুকু [setuku] "that only", সেটি [seti] "that" etc. Among singular suffixes, -খানা [-khana] and
-খানি [-khani] are sometimes used separately, detached from the pronominal roots, as in সে খানা
[se khana] "that", কোন খানি [kon khani] "which" etc. To denote duality the word উভয়
[ubhåy] "both" is used as a pronoun in the language. Moreover, some cardinal adjectives (like
দুটি [duti], দুটো [duto], দুটা [duta] "two" etc.) are used after pronouns, as in সেদুটি [seduti], সেদুটো
[seduto], সেদুটা [seduta] "those two" etc.
The suffixes for plurality are used both with personal and impersonal pronominal
roots, though in case of personal pronominal roots like মু- [mu-], তু- [tu-] and তুম- [tum-],
they cause a morpho-phonemic change in the surface forms. Among them, -কটা [-kåta] and -কটি
[-kåti] are probably derived from কয়েকটা [kåyekta] and কয়েকটি [kåyekti] "some" respectively.
Moreover, the plural forms, like the singular forms, make no distinction between
personal and impersonal pronouns. The list of pronoun suffixes denoting number is
given in Table (5.11).
Table 5.11
Suffixes denoting number for pronouns
Singular  -টা [-ta], -টি [-ti], -টে [-te], -টুকু [-tuku], -খান [-khan], -খানা [-khana],
          -খানি [-khani]
Plural    -েরা [-era], -রা [-ra], -কটা [-kåta], -কটি [-kåti], -গুলি [-guli],
          -গুলা [-gula], -গুলো [-gulo], -দিগ [-digå], -দের [-der]
The concept of gender division is also irrelevant for pronouns in Bangla because there
are no specific gender markers for the pronominal forms. Moreover, gender has no impact
on the verbal conjugation in the language.
The particles used for nouns are also used with pronouns in the language. Among
them, emphatic particles (ই [i] and ও [o]) are generally used after pronouns without any blank
space in between, while negative particles (না [na], নি [ni] and নে [ne] "no") are used neither
with nouns nor with pronouns. Moreover, similar to nouns, the particles sometimes become a
part of a surface pronominal form (e.g. [kei < keu + -i] "whom" or
[kono < konå + -o] "any" etc.).
The use of case markers with pronouns is almost similar to that with nouns in the
language. However, for pronouns, markers are available only for the nominative, accusative,
genitive and locative cases, while for nouns markers are available for all seven cases.
Moreover, similar to nouns, the markers for the accusative case are used to denote the dative case
also. Another interesting point is that the marker -দের [-der] (probably derived from -দিগের
[-diger] < -দিগ [-digå(pl)] + -ের [-er(gen)]) is used with both nouns and pronouns as a genitive
plural marker, as in তাদের [tader] "of them" etc. The list of case markers used with pronouns is
given in Table (5.12).
Table 5.12
Case markers used with pronouns
Nominative  [-0], -ে [-e], -য় [-y], -কে [-ke], -রে [-re]
Accusative  -েরে [-ere], -কে [-ke], -য় [-y], -ে [-e], -েরে [-ere], -রে [-re]
Genitive    -ার [-ar], -ের [-er], -র [-r], -দের [-der]
Locative    -েতে [-ete], -য় [-y], -তে [-te], -ে [-e], -িতে [-ite]
5.4.2.3. Suffixes for Finite Verbs
After structural analysis of the conjugated verbs, it is observed that the verb suffixes include
aspect, auxiliary, tense, number, person markers and particles. However, not all of these
suffixes are used with every conjugated verb. Their use mostly depends on the
grammatical sub-classes (person, number, tense etc.) of the verbs.
The aspect marker is an integrated part of conjugated verbs. It is also called the
continuative because the presence of this marker in the verb forms implies a sense of
continuation, progressiveness, completeness, possibility or recurrence of an action.
According to the OED (1995), aspect is (gram.) a verbal category of a form expressing inception,
duration or completion. Similarly, Chatterji (1926/1993: 821) observes:
"The force of affixed themes was to indicate the aspect or nature of the action
whether it was progressive or transitory, iterative or intensive or indefinite".
The aspect markers used with the conjugated verb forms in Bangla are: -িতে [-ite] (e.g.
করিতেছি [kåritechi] "I/we are doing"), -িয়া [-iya] (e.g. শুনিয়াছ [suniyachå] "you have heard"),
-োয়া [-oya] (e.g. ধোয়াতাম [dhoyatam] "I/we caused to wash"), -য়া [-ya] (e.g. নিয়াছি [niyachi]
"I/we have taken"), -য়ে [-ye] (e.g. দিয়েছি [diyechi] "I/we have given"). They are used
immediately after the verb root in the conjugated forms. However, the forms can vary
depending on the structure of the verb root.
The auxiliary markers are generally used immediately after the aspect markers in a
conjugated verb form. They have two forms: -ছ [-ch] is used with verb roots ending in a
consonant grapheme, as in করছি [kårchi] "I am/we are doing", while -চ্ছ [-cch] is used with
verb roots ending in [-å], -া [-a], -ি [-i] and -ু [-u] (e.g. হচ্ছে [håcche] "being done", খাচ্ছি
[khacchi] "I am/we are eating", দিচ্ছি [dicchi] "I am/we are giving" and ধুচ্ছে [dhucche] "s/he is
washing" etc.). These markers are quite helpful for automatic identification of the conjugated
verb forms.
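The choice between the two auxiliary forms depends only on the final grapheme of the verb root, which can be sketched as follows. The sketch works on simplified ASCII romanization (so the two a-vowels fall together as "a"), and the function names and the split of each form into root, auxiliary and ending are assumptions made for illustration.

```python
# Root-final vowels in simplified ASCII romanization (an assumption of this
# sketch): "a" covers both a-vowels of the description, plus "i" and "u".
VOWELS = set("aiu")


def auxiliary_for(root):
    """-cch after a vowel-final root, -ch after a consonant-final root."""
    return "cch" if root[-1] in VOWELS else "ch"


def progressive(root, ending):
    """Join root, auxiliary marker and person ending into one surface form."""
    return root + auxiliary_for(root) + ending


print(progressive("kar", "i"))  # karchi
print(progressive("kha", "i"))  # khacchi
print(progressive("dhu", "e"))  # dhucche
```

Read in the other direction, the presence of -ch or -cch directly after a known root is a cue that a surface form is a conjugated verb.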
The tense is a deictic category that places a situation in time with respect to the
moment of speech, or occasionally with respect to some other pre-established point in time
(Bybee 1985: 21). It is observed that nouns usually refer to time-stable entities, while verbs
refer to situations that are not time-stable. Thus it is the verb that needs to be placed in time if
the event or situation is to be placed in time, since the entities involved in the situation
usually exist both prior to and after the situation referred to. However, this distinction in the
form of tense does not affect the meaning of the verb (Bybee 1985: 22), since the situation
referred to by the verb remains the same whether it is said to occur in the present or the past
or will be occurring in the future.
In Bangla there are three specific tense markers for the conjugated verb forms: (i) -িল / -ল
[-ilå / -lå] for past tense (e.g. করিল / করল [kårilå / kårlå] etc.), (ii) -িত / -ত [-itå / -tå]
for habitual past tense (e.g. করিত / করত [kåritå / kårtå] etc.), and (iii) -িব / -ব [-ibå / -bå] for
future tense (e.g. করিব / করব [kåribå / kårbå] etc.). However, there is no specific tense marker
for denoting present tense. These tense markers are always attached to the verb roots.
Moreover, these forms are not number dependent, i.e. the forms do not differ depending on
the number of the subject. The concept of person plays a very important role in the conjugated
verb forms because the final forms of the verbs change depending on this property.
Table (5.13) cites the different forms of person markers.
Table 5.13
Person markers for the verb forms
Person      Present     Past          Future       Habitual      Imperative
1st         -ি [-i]      -লাম [-lam]    -ব [-bå]      -তাম [-tam]    [0]
2nd         -ো [-o]      -লে [-le]      -বে [-be]     -তে [-te]      -ো [-o]
2nd (n-hn)  [-0]        -লি [-li]      -বি [-bi]     -তিস [-tis]    -িস [-is]
2nd (hn)    -েন [-en]    -লেন [-len]    -বেন [-ben]   -তেন [-ten]    -ুন [-un]
3rd (n-hn)  -ে [-e]      -ল [-lå]       -বে [-be]     -ত [-tå]       -ুক [-uk]
3rd (hn)    -েন [-en]    -লেন [-len]    -বেন [-ben]   -তেন [-ten]    -ুন [-un]
In case of conjugated verbs, each person marker (1st, 2nd and 3rd) is different from
the others. Even in case of the 2nd person, the markers vary depending on whether the person
is general, honorific or non-honorific. The same variation between honorific and
non-honorific is noted for the 3rd person also. Moreover, the person markers also vary
depending on the tense of the verbs.
Both types of Bangla particles are very often used with the conjugated verbs. Among
them the emphatic particles (ই [i] and ও [o]) are generally used immediately after the
conjugated verbs without any space in between, as in করেই [kårei] or করেও [kåreo] "also does"
etc. However, on some occasions, they are used even in the middle of a conjugated verb form,
as in করেইছি [kåreichi] or করেওছি [kåreochi] "I/we also have done" etc. The other
emphatic particle (না [na] "indeed") is structurally the same as the negative particle না [na]
"no", but functionally different from it. This particle has its origin in the Sanskrit form
নাম [namå] "indeed". This form is mostly used in the sense of emphasis in the sentence, as in
তুমি না খুব ভাল [tumi na khub bhalå] "you are indeed very good" or তুমি না যাবে বলেছিলে? [tumi na
yabe bålechile?] "you indeed wanted to go" etc.
The negative particles (না [na], নি [ni] and নে [ne] "no") are used in the sense of
negation in the sentence. They are generally used after the verb forms. The way these
particles are written in the text creates some problems for machine analysis of words. It is
found that না [na] and নি [ni] are sometimes used after the verb form with a blank space, as in
যাব না [yabå na] "I/we will not go" or খাই নি [khai ni] "I have not eaten" etc. But on some
occasions, they are attached to the verb forms without any blank space in between, as in
হবেনা [håbena] "will not" or দেখিনি [dekhini] "I have not seen" etc. The other form (নে [ne]) is
always used with a verb form without a space in between, as in জানিনে [janine] "I don't know"
etc. However, this form is very rarely used in prose text, and the purpose of its use is to
evoke a poetic flavour in the statement.
In case of a conditional statement the negative particle না [na] is used before the
non-finite verb, as in না বলে [na båle] "not saying" etc. Here also the negative particle না [na] is
sometimes attached to the word and sometimes detached from it with a blank space
in between. The other two forms (নি [ni] and নে [ne]) are not generally used in this type of
context in the text.
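The inconsistent spacing of the post-verbal negative particles described above can be handled before analysis by detaching an attached particle from its verb. The following is a minimal sketch under stated assumptions: forms are in simplified ASCII romanization, the small set of verb forms is a hypothetical stand-in for a full verb lexicon or verb analyser, and the pre-verbal use in conditional statements is not covered.

```python
NEGATIVE_PARTICLES = ("na", "ni", "ne")

# Hypothetical mini-lexicon of conjugated verb forms, for illustration only.
VERB_FORMS = {"habe", "dekhi", "jani", "yaba", "khai"}


def split_negative(token):
    """Detach a final negative particle if the remainder is a known verb form."""
    for particle in NEGATIVE_PARTICLES:
        stem = token[: -len(particle)]
        if token.endswith(particle) and stem in VERB_FORMS:
            return [stem, particle]
    return [token]


def normalize(tokens):
    """Give attached and detached spellings the same token sequence."""
    result = []
    for token in tokens:
        result.extend(split_negative(token))
    return result


print(normalize(["habena"]))      # ['habe', 'na']
print(normalize(["yaba", "na"]))  # ['yaba', 'na']
```

The lexicon check is what prevents an ordinary word that merely ends in na, ni or ne from being split by mistake.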
5.4.2.4. Suffixes for Adjectives
After structural analysis of the adjective forms in the corpus text it is found that a majority of
the nouns or adjectives use specific adjectival suffixes to generate surface adjective forms.
These adjectival suffixes denote gender and adjectival quality of the words. Adjectival
suffixes are of two types.
The first type of adjectival suffixes are bound morphemes in nature and therefore are
not generally used as a separate surface word form in the text. They are generally used with
nouns for generating a surface adjective form. The addition of a suffix to the word
generally causes a morpho-phonemic change either by derivation (pråtyåy) or by sandhi.
These can be called primary suffixes (derivational in nature). In the corpus text we have
found nearly 60 such suffixes. (Details are given in Appendix - ).
The second type of adjectival suffixes are also bound morphemes in nature and, therefore,
are never used as separate surface word forms in the text. However, these are compounding in
nature because joining of these forms with any noun or adjective results in producing an
adjectival compound. They can be called secondary suffixes (inflectional in nature). Among
these suffixes some are highly productive while some are less productive. The addition of
these at the end of a word may cause a morpho-phonemic change. There is a kind of mapping
for their joining with nouns or adjectives. In the corpus text around 50 such suffixes are
found. (Details are given in Appendix - II).
Besides, there are some simple as well as participial adjective forms (mostly derived
from nouns and verbs through passivisation) which are also used for generation of compound
adjectives. These forms (around 200 in number) are free morphemes by nature as they can be
used as separate surface word forms in the text. Moreover, like secondary suffixes, they are
compounding in nature because joining of these forms with any noun or adjective results in
producing a compound adjective. (Details are given in Appendix - III).
5.4.3. Infixes
Probably, Bangla has no infix form in the true sense of the term. However, it has a form (-ী-
[-ī-]) which is used in the middle of a word whenever an adjectival suffix is added to a noun
by derivation. For instance:
স্তূপ [stup] + কৃত [krtå] → স্তূপীকৃত [stup-ī-krtå] "heaped",

ঘন [ghånå] + ভূত [bhutå] → ঘনীভূত [ghån-ī-bhutå] "condensed"
Moreover, some verbal nouns (gerunds) have a link vowel (-া- [-a-]) inserted
between two morphemes while the second part of the form takes the affix -ি [-i] (Chatterji
1993: 1048), as in জানাজানি [janajani] "knowing", মারামারি [maramari] "fighting" etc. For
convenience of morphological processing these markers are considered as infixes in the
surface word forms. Whenever such words are considered for character-based analysis,
these markers are extracted as infixes from the surface word forms.
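The treatment of the link vowel as an infix in the reduplicated verbal nouns can be sketched as a simple pattern check: a surface form is accepted if it equals root + link vowel + root + final affix. The sketch uses simplified ASCII romanization and hypothetical function names, and it covers only this reduplicated pattern, not the [-ī-] forms.

```python
LINK_VOWEL = "a"   # the link vowel treated as an infix
FINAL_AFFIX = "i"  # the affix taken by the second part of the form


def reduplicated_noun(root):
    """Build the verbal noun: root + link vowel + root + final affix."""
    return root + LINK_VOWEL + root + FINAL_AFFIX


def extract_infix(surface):
    """Return (root, infix, root, affix) if the form fits the pattern, else None."""
    body_length = len(surface) - len(LINK_VOWEL) - len(FINAL_AFFIX)
    if body_length <= 0 or body_length % 2:
        return None
    root = surface[: body_length // 2]
    if surface == reduplicated_noun(root):
        return (root, LINK_VOWEL, root, FINAL_AFFIX)
    return None


print(extract_infix("janajani"))  # ('jan', 'a', 'jan', 'i')
print(extract_infix("maramari"))  # ('mar', 'a', 'mar', 'i')
```

Checking the whole form against the regenerated pattern, rather than searching for the vowel alone, avoids treating every word-internal a as an infix.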
5.5. Post-positions
Post-positions are those words which, when used immediately after some words, develop a case
relation with them. They mostly retain their phrasal character and remain distinct as
detached words. They are equivalent to case markers because they are used to denote case relations
which could have been expressed by some specific case markers. The examples below
show that in Bangla, on some occasions, the case relation can be denoted either by a case
marker or by a post-position:
(a) সে ছুরি দিয়ে আপেল কাটে or সে ছুরিতে আপেল কাটে "He cuts apple with a knife"
[se churi diye apel kate] or [se churite apel kate]
(b) সে জলের মাঝে লাফ দিল or সে জলে লাফ দিল "He jumped into the water"
[se jåler majhe laph dilå] or [se jåle laph dilå]
(c) সে ঘরের ভিতরে ঢুকে গেল or সে ঘরে ঢুকে গেল "He entered into the room"
[se ghårer bhitåre dhuke gelå] or [se ghåre dhuke gelå]
The examples given above imply that in Bangla, case-markers and post-positions are
functionally the same or serve the same morpho-syntactic roles. At least, in the above examples,
the replacement of post-positions with case markers does not markedly change the meaning
of the sentences, which, however, is not true for all cases. It can be assumed that, probably, in
earlier times, the case markers were used to denote the same functions that the
post-positions do at present.
In the study of Bangla post-positions, Chatterji (1993) has discussed their origin along
with their historical changes in different stages of language development, Sen (1992) has
given a list of two types of post-positional word along with their usage in the language, while
Sarkar and Basu (1994) have provided lists of post-positions along with examples to show
how they are used in the language.
It is not yet clearly known whether suffixes are generated from post-positions or vice
versa. Chatterji (1993: 766) says that the post-positions originated by a process of
simplification when the case markers were lost and “the speech began to employ the accusative,
dative, ablative or locative form of suitable nouns (with the sense of location, vicinity,
direction, connexion, purpose or power) along with the principal noun which retained its
original inflexion”. But Bhattacharja (1998: 136) assumes and explains with examples that
the suffixes were actually derived from the post-positions. On this view, post-positions are
the analytical properties of the language which, in course of time, have changed into inflectional
properties and attached to the preceding words.
In inflectional languages like Sanskrit or Latin, the grammatical properties
(prefixes, suffixes, case markers, etc.) are mostly attached to the lexical morphemes.
Sometimes, they cause some morpho-phonemic changes to the lexical items through
different phonological processes. But in analytical languages the suffixes and post-positions
are generally kept separate from the lexical morphemes. The Bangla language has the
properties of both language types, as it has the grammatical properties of the inflectional
languages as well as post-positions like the analytical languages.
In generative grammar the post-positions are not considered as lexical morphemes,
because they are not free morphemes like nouns, verbs or adjectives. It is true that these
post-positions are not free morphemes like nouns, verbs or adjectives, but they are also not
bound morphemes like suffixes (case, gender, person markers etc.). The argument for
considering post-positions as nouns is that they can retain their lexical nature as
nouns in the language. Moreover, like nouns they can take case markers. For
instance:
(i) পাশের গ্রামের ছেলে "boy of the near-by village"

[paser gramer chele]
(ii) কাছের রাস্তায় "on the near-by road"

[kacher rastay]
Here the post-positions (পাশে [pase] and কাছে [kache]) are used in the sentences as
nouns with case-markers -ের [-er] and -ে [-e], respectively.
In Bangla, almost all the postpositions are derived from nouns or verbs with their
respective specific meanings. The following examples show how they are derived from the two
sources:
(i) Post-positions derived from nouns:

অগ্রে < অগ্র [ågre < ågrå], উপরে < উপর [upåre < upår], ঊর্ধ্বে < ঊর্ধ্ব [urdhve < urdhvå],
কাছে < কাছ [kache < kach], কারণে < কারণ [karåne < karån], তফাতে < তফাত [tåphate <
tåphat], দিকে < দিক [dike < dik], নিকটে < নিকট [nikåte < nikåt], নিচে < নিচ [nice < nic],
নিম্নে < নিম্ন [nimne < nimnå], ধারে < ধার [dhare < dhar], পক্ষে < পক্ষ [påkse < påkså],
পশ্চাতে < পশ্চাৎ [påscate < påscat], পার্শ্বে < পার্শ্ব [parsve < parsvå], পাশে < পাশ [pase <
pas], পিছনে < পিছন [pichåne < pichån], পিছে < পিছ [piche < pich], প্রান্তে < প্রান্ত [prante
< prantå], ফলে < ফল [phåle < phål], বদলে < বদল [bådåle < bådål], বাইরে < বাইর [baire
< bair], বাহিরে < বাহির [bahire < bahir], ভিতরে < ভিতর [bhitåre < bhitår], মধ্যে < মধ্য
[mådhye < mådhyå], মাঝে < মাঝ [majhe < majh], সঙ্গে < সঙ্গ [sånge < sångå], সম্মুখে <
সম্মুখ [såmmukhe < såmmukh], সাথে < সাথ [sathe < sath] etc.
(ii) Post-positions derived from verbs:

করে/করিয়া < √কর্ [kåre/kåriya < √kår], ছাড়া < √ছাড়্ [chaRa < √chaR], ধরে/ধরিয়া < √ধর্
[dhåre/dhåriya < √dhår], দিয়ে/দিয়া < √দি [diye/diya < √di], লাগি/লাগিয়া < √লাগ্
[lagi/lagiya < √lag], চেয়ে/চাইতে < √চা [ceye/caite < √ca], থেকে < √থাক্ [theke <
√thak], হতে/হইতে < √হ [håte/håite < √hå] etc.
Considering their sources of origin, in standard Bangla grammar the postpositions
(around 60 in number) are divided into two groups: nominal post-positions and participial
post-positions. From the corpus it is found that generally no postposition is used after
nouns in the nominative, accusative and genitive cases. For other cases (namely instrumental,
dative, ablative and locative) the postpositions are used after nouns. Nominal post-positions
are mostly derived from nouns and are used after nouns which are in the genitive case, while
participial postpositions are mostly non-finite in form, derived from verb roots and used after
words which are mostly without any case marker. Table (5.14) below cites the most
frequently used post-positions with different cases:
Table 5.14
Postpositions for different cases
Case          Postpositions
Instrumental  [kare] [kariya] [k&rtrk] [kar&ne] [dvara] [d&run] [diya] [diye]
Dative        [j&ny&] [j&nye] [tare] [pr&ti] [b&d&le] [b&nam] [bab&d] [lagi] [lagiya]
Ablative      [caite] [ceye] [chaRa] [theke] [diye] [diya] [b&i] [bina] [h&ite] [hate]
Locative      [agre] [apeksa] [abhimukhe] [age] [up&re] [urdhe] [kache] [t&phate] [dike]
              [nik&te] [nice] [nimne] [nyay] [dh&re] [dh&riya] [dhare] [p&kse] [p&scate]
              [pane] [parsve] [pase] [pich&ne] [piche] [prante] [phale] [b&rabar] [baire]
              [bahire] [bhit&r] [bhit&re] [m&dhye] [majhe] [sange] [s&mmukhe] [s&hit]
              [sathe] [samne]
5.6. Compound Words

Compounding is a process where new words are produced by combining two (rarely more)
other words (or stems). It is a very important way of adding to the word stock (lexicon) in any
language. A compound word contains at least two bases which are both words, or at any rate,
root morphemes (Katamba 1993: 54). Bloomfield (1933: 233) has made a general observation
while commenting on compounds:
“Linguists often make the mistake of taking for granted the universal existence
of whatever types of compound words are current in their own language. It is
true that the main types of compound words in various languages are
somewhat similar, but this similarity is worthy of notice; moreover, the
details, and especially the restrictions, vary in different languages. The
differences are great enough to prevent our setting up any scheme of
classification that would fit all languages.”
Compounding in English has been studied by many scholars, among whom Bloomfield (1933),
Jespersen (1929), Bloch and Trager (1942), Hockett (1958) and Marchand (1969) are worth
mentioning. All these studies were primarily descriptive, with less
emphasis on the generative aspect of compounds. The scholars agree in holding that a
compound is a kind of word with the combination of at least two lexical items. Among the
generative morphologists, Selkirk (1983) probes the structure and headedness of compounds,
Jensen (1990) classifies compounding constituents, Spencer (1991) examines the processes
involved in synthetic compounds while Katamba (1993) observes the phonological and
syntactic aspects of compounding.
In many respects compounding represents the interface between morphology and syntax.
Compounding is prototypically the concatenation of words to generate other words. It is a
generative process in which we must look at morphology and syntax (or something else), as
well as at the problem of how to define the notion of ‘word’ (Spencer 1991: 309). When two
words (as opposed to roots) are compounded, each is a ‘minimal free form’ by definition. But
is the resulting compound a word? If we regard the compounding process as essentially
syntactic (as we are at liberty to do), then the answer is presumably ‘no’; if compounding
is a morphological process, the answer will be ‘yes’ (Spencer 1991: 43).
In English, a compound noun may consist of a noun, adjective, preposition, or verb on

the left and a noun on the right; a compound adjective may consist of a noun, adjective or
preposition followed by an adjective; and a compound verb may consist of a preposition
followed by a verb (Selkirk 1983: 14). Besides this lexical categorisation of components of
the compound, the English compounds are also analysed from three different angles namely,
morphological or structural, syntactic and functional or semantic.
There are two types of compound words in English from the structural or morphological
point of view: (i) the synthetic compounds, whose second member is either derived by adding
a verbal suffix (-ing, -er, -ed etc.) or is a past participle form of a verb, e.g. breath
taking, watch maker, man made, time killed etc., and (ii) the primary compounds (Marchand
1969), which encompass all other types.
From the syntactic point of view, the relation between the participating components of a
compound is defined as either (i) syntactic, where the components force the compound to act
like a phrase, adding an extra dimension to the compound, e.g. black bird, white cap etc.;
or (ii) asyntactic, where the participating components develop no grammatical relation among
themselves within the compound structure, e.g. door-knob or green field etc.; or (iii)
semi-syntactic, where the participating components have some kind of syntactic relation, but
not like that of a phrase, e.g. house keep <=> keep house etc.
From the syntactic point of view, compounds have two sets of characteristic properties
(Spencer 1991: 310).
(i) The first set makes compounding resemble a syntactic process in that it is typically
recursive. The elements of a compound may have relations to each other which
resemble the relations holding between the constituents of a sentence. For example,
(a) mass literacy
(b) mass literacy campaign
(c) mass literacy campaign programme...
(d) mass literacy campaign programme committee...
(ii) The second set brings compounding closer to word formation. It points out that
compounds have a constituent structure, which in general, depends on the way the
compound is built up. For example the compound 'mass literacy campaign' can be
analysed as:
(a) [mass [literacy campaign]]
(b) [[mass literacy] campaign]
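The two analyses above are simply the two binary constituent structures available for a
three-word sequence. As a minimal sketch (the function name bracketings is ours and is not
part of the analysis system described in this thesis), every such structure for a recursive
compound can be enumerated as follows:

```python
def bracketings(words):
    """Return every binary constituent structure over a list of words."""
    if len(words) == 1:
        return [words[0]]
    results = []
    # Try every position at which the sequence can be cut into a left and a right part.
    for split in range(1, len(words)):
        for left in bracketings(words[:split]):
            for right in bracketings(words[split:]):
                results.append(f"[{left} {right}]")
    return results


for parse in bracketings(["mass", "literacy", "campaign"]):
    print(parse)
```

For the three-word compound this prints exactly the structures (a) and (b); a four-word
compound such as 'mass literacy campaign programme' already has five structures, which
illustrates why recursion makes compounds hard to analyse from the surface string alone.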
From the functional or semantic point of view, the compounds are identified as (i) the
endocentric compounds, where one component stands for the whole or functions as a head.
Most English compounds are of this type. In English the right component gives the basic
meaning of the compound as a whole, as in mailman or blackbird. Here, the modifier
element of the compound has the function of attributing a property to the head, much like the
function of an attributive adjective; and (ii) the exocentric compounds, where none of the
components can be substituted for the whole compound. Neither component can be called the
head of the construction. Such compounds are similar to b&huvrīhi (reciprocal) compounds in
Bangla, e.g. pickpocket, lazybones, sweetheart etc. The compound sweetheart does not
imply a kind of heart which is sweet in taste but a fiancé (Jensen 1990: 99). In these
compounds one can isolate a predicate-type element (pick, lazy, sweet) and an argument-type
element (pocket, bones, heart) (Spencer 1991: 311).
The Bangla compounds were probably first discussed by William Carey (1805). A detailed
classification of Bangla compounds, along with their formative and semantic analysis, was
given by Chatterji (1993); the basic differences between the Bangla and the Sanskrit
compounds are briefly discussed by Sen (1992); an exhaustive discussion of the methods of
compound formation is given by Sarkar and Basu (1994); both t&ts&m& and non-t&ts&m&
compounds are dealt with by Chakravarty (1974); a comparative study of the Sanskrit and the
Greek compounds, with clear indications for Bangla compound analysis, is presented by
Banerjee (1997); a descriptive study of the Bangla compounds is presented by Bhattacharya
(1983) in her unpublished doctoral dissertation; while Chaki (1996) has tried to explain
what a compound is and the significance of the respective head names.
According to the scholars, a mere congregation of words into one form will not make a
compound (s&mas); there must be some syntactic and semantic connection between the
compounded words. If the words are not syntactically and semantically related to each
other, they cannot form a compound (Banerjee 1997: 07). Moreover, Bangla compounds should be
discussed between morphology and syntax, because a compound is a transformed version of a
sentence in which at least two words are juxtaposed in such a way that their semantic
relationship is never understood from the structure (Bhattacharya 1997: 53).
The only arguable deficiency of the above discussions is that none of the scholars has
discussed in detail what types of changes take place in the structure of the participating
lexical items (components) after compound formation, or whether the lexical categories of
the items involved are changed in the final output. Here we will consider only the way
compounds are formed, in relation to the formation of word structure and its relevance to
the theory of word structure. These aspects are taken into consideration because they bear
on the automatic processing of compounds by computer. Their syntactic and semantic
characteristics, however, are outside the present framework of the thesis.
5.6.1. Compound Nouns

There is a considerable degree of inconsistency in the orthographic representation of
compounds in Bangla. Some very well established compounds are written as a single word with
or without a hyphen (e.g. [rajbaRi] "palace", [raja-badsa] "kings and monarchs" etc.),
while many other compounds are not conventionally identified as such by the orthography.
Thus, a compound form sometimes appears as two different words separated
by a blank space (e.g. [mach bhaja]) and sometimes as a single hyphenated word
(e.g. [mach-bhaja] "fish fry").
Moreover, by sandhi, two separate words can merge, thereby losing a character (basic
or allographic) of a word. For instance, the compound word বিদ্যালয় [vidyal&y] "school" is
formed by joining বিদ্যা [vidya] "knowledge" and আলয় [al&y] "house", where the last
character (the allograph া [a]) of the first word and the first character (the vowel আ [a])
of the last word are merged together. Similarly, the compound word নীলোৎপল [nīlotp&l] "blue
lotus" is formed by joining the word নীল [nīl] "blue" and the word উৎপল [utp&l] "lotus",
where the last character (the inter-vocalic [&]) of the first word and the first character
(the vowel উ [u]) of the last word are merged to form the allograph ো [o], which is joined
with the last character ল [l] of the first word. Such loss or change of a character is
possible if the components pass through a process of morpho-phonemic change.
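The two mergers described above can be sketched as a small rule table. This is only an
illustration under stated assumptions: the words are given in the bracketed transliteration
used in this chapter, the inherent vowel of the first word is written out explicitly as "&"
(so "blue" is entered as "nil&"), and the names SANDHI_RULES and join_with_sandhi are
hypothetical. A real analyser would need the full set of sandhi rules and would have to
operate on the Bangla characters themselves.

```python
# (final character of first word, initial character of second word) -> merged character.
# Only the two mergers discussed in the text are listed; this is not a complete rule set.
SANDHI_RULES = {
    ("a", "a"): "a",  # vidya + al&y  -> vidyal&y
    ("&", "u"): "o",  # nil&  + utp&l -> nilotp&l
}


def join_with_sandhi(first, second):
    """Join two transliterated words, merging the boundary vowels when a rule applies."""
    if not first or not second:
        return first + second
    merged = SANDHI_RULES.get((first[-1], second[0]))
    if merged is None:
        # No rule for this boundary: plain concatenation.
        return first + second
    return first[:-1] + merged + second[1:]


print(join_with_sandhi("vidya", "al&y"))
print(join_with_sandhi("nil&", "utp&l"))
```

Analysis runs in the opposite direction: to recognise such a compound, a program would have
to try undoing each rule at every position of the word and check both resulting halves
against a lexicon, which is one reason why orthography alone is a poor guide.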
Therefore, it is clear that orthographic conventions are a poor guide to compound

word identification. So, for proper understanding (and for automatic processing by computer)
of the Bangla compounds the following features are taken into consideration:
(i) a compound must have at least two lexical items (components) which may or may not
be used as an individual lexical item in the text.
(ii) participating components can be separated by a space, but taken together they
definitely denote some extra information, idea or meaning which cannot be gathered
from the meanings of the participating lexical items simply put together.
(iii) the compounds resemble single words because they are often lexicalised. They are
often subject to the semantic drift associated with stored words, which means that their
meaning becomes non-compositional or even totally idiosyncratic. For instance,
compounds like [sulapani] or [vīnapani] no longer indicate a person having a
spear/lyre in his/her hand. Rather they denote Lord Shiva or Goddess Saraswati
respectively, who have been associated with these compounds from time immemorial
through mythological events or beliefs.
(iv) there are often lexical restrictions on which compounds may be formed in Bangla.
For instance, one can write [s&bdah&] "burning of a dead body" or [s&bsadhana]
"meditating over a dead body" but cannot say *[m&Radah&] or *[m&Rasadh&na], though
the respective meanings of these two compounds would be identical with those of the
earlier two examples. In a similar fashion, one can say [brstipat] "rain fall",
[baripat] "rain fall", [tusarpat] "snow fall", [sisirpat] "dew fall", [asrupat]
"tears shed" or even [r&kt&pat] "blood shed", but cannot say *[jalpat], *[nīrpat],
*[panipat] or *[kannapat] etc., though in all cases the meaning is nearly the same,
denoting "fall or shedding of rain or water or tears" etc.
(v) the formation of compounds in Bangla is also controlled by the distribution of the
components (lexical items). For instance, two of the most recurrent lexical items in
Bangla are [cokh] "eye" and [j&l] "water", but because the two items differ in
distribution we cannot have the compound form *[cokhj&l] "tears" (the form
[cokher j&l] is in regular use), though other forms such as [aksij&l], [aksibari],
[aksinīr], [amkhij&l], [amkhinīr] etc., which have the same meaning, are quite common
in the language.
(vi) morphological integrity is another feature of the compounds. The constituents of
compounds cannot be split up by other words or phrases. It is not uncommon for
an element of a compound to be used so frequently, and for the compounds containing it
to become so lexicalised, that the element loses its status as an independent word and
becomes a clitic or an affix (Spencer 1991: 313).
(vii) a morpho-phonemic process can operate in the formation of compounds in Bangla.
Generally, there are sets of sandhi rules which are applied for the purpose. For
instance, compounds like [m&haus&dhi] "great medicine", [yab&jjīv&n] "whole life",
[ady&nt&] "from beginning to end", [raj&rsi] "king living like monk", [y&theccha]
"as one likes" etc. are produced by applying sandhi rules.
In standard Bangla grammar the compounds are analysed from a semantic rather than a
structural point of view. Six types of compounds are identified, namely copulative
(dv&ndv&), determinative (t&tpurus), numeral (dvigu), descriptive (k&rm&dhar&y), adverbial
(aby&yībhab) and reciprocal (b&huvrīhi) compounds. Patanjali, in his M&habhasy& (2/1/6),
has given a semantic interpretation of the Sanskrit compounds which seems appropriate to
the Bangla compounds. According to his analysis, in copulative (dv&ndv&) compounds both
components are of equal importance; in adverbial (aby&yībhab) and numeral (dvigu) compounds
the first component is important and carries the semantic load of the compound; in
determinative (t&tpurus) and descriptive (k&rm&dhar&y) compounds the last component carries
all the importance; whereas in reciprocal (b&huvrīhi) compounds something other than the
involved components becomes important.
In the history of the study of compounds in Bangla we find that Chatterji (1993: 176) has
semantically divided compounds into three main divisions: collective, determinative and
descriptive. The determinative compound is again divided into three sub-groups:
determinatives with one element governing another (t&tpurus), appositional determinatives
(k&rm&dhar&y) and numeral determinatives (dvigu). The collective compound includes
copulative (dv&ndv&) compounds and other similar words, the determinative compound includes
'up&p&d-t&tpurus', 'aluk-t&tpurus', 'n&n-t&tpurus', 'pradi-s&mas', 'nity&-s&mas',
'aby&yībhab' and 'supsupa', while the descriptive compound includes reciprocal compounds.
At the time of morphological analysis of the compounds for automatic processing by

computer it is observed that four types of structural variations take place in the process of
formation of compounds in Bangla.
Type 1: none of the components is inflected; only the bare stems are combined in
compounds, e.g. [bhal&m&nd&] "good or bad", [sadakal&] "black and white" etc.
Type 2: both the components are inflected, e.g. [hatebajare] "in markets and other
places", [p&theghate] "in road or fields" etc.
Type 3: the first component is inflected while the last one is intact, e.g.
[hatekh&Ri] "holding chalk in hand" (first step of formal education),
[gayeh&lud] "smearing turmeric paste on the body" (a ritual in wedding) etc., and
Type 4: the last component is inflected while the first one is intact, e.g.
[lalpeRe] "with red border", [ch&Rihate] "with cane in hand" etc.
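The four structural types depend only on which of the two components carries an inflection.
A minimal sketch of that decision follows, assuming that each component has already been
segmented into a stem and a suffix (an empty suffix meaning a bare stem). The function name
compound_type is hypothetical, and the segmentation step itself, which is the hard part, is
not shown.

```python
def compound_type(first_suffix, last_suffix):
    """Return the structural type (1-4) of a two-component compound.

    Each argument is the suffix found on that component; "" means a bare stem.
    """
    first_inflected = bool(first_suffix)
    last_inflected = bool(last_suffix)
    if not first_inflected and not last_inflected:
        return 1  # bare stem + bare stem
    if first_inflected and last_inflected:
        return 2  # both components inflected
    if first_inflected:
        return 3  # only the first component inflected
    return 4      # only the last component inflected


# Suffix pairs for four of the examples in the text (segmentation assumed, not computed):
print(compound_type("", ""))    # bhal& + m&nd&
print(compound_type("e", "e"))  # hat-e + bajar-e
print(compound_type("e", ""))   # hat-e + kh&Ri
print(compound_type("", "e"))   # lal + peR-e
```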
The compounds display a type of word structure which is made up of two or more
constituents, each belonging to one of the lexical categories noun, adjective, verb or
adverb. Compounds of three or more words are usually produced by combining a compound with
another word or with another compound. For instance, the compound [hajarhatkalī] "the
Goddess Kali who has a garland of a thousand cut-off hands" consists of three words, where
the first two words produce a compound and the last word is added to that compound in the
second stage. The regular standard process of compound formation in Bangla is given in
table (5.15) below with examples:
Table (5.15)
Process of compound formation in Bangla
No.  Process                          Example                                      Gloss
1    noun + noun                      [din + kal] -> [dinkal]                      days and times
2    noun + adjective                 [bed&na + bidhur] -> [bed&nabidhur]          full of sorrow
3    noun + adjective (adj < verb)    [cal + bhaja] -> [calbhaja]                  parched rice
4    adjective + noun                 [chot& + lok] -> [chot&lok]                  mean person
5    adjective + adjective            [k&thin + kom&l] -> [k&thinkom&l]            hard and soft
6    adjective (cardinal) + noun      [do + t&la] -> [dot&la]                      second floor
7    finite verb + finite verb        [c&la + phera] -> [c&laphera]                moving around
8    prefix + noun                    [ku- + n&j&r] -> [kun&j&r]                   bad look
9    prefix + adjective               [su- + c&tur] -> [suc&tur]                   very intelligent
10   prefix + adjective               [a- + cena] -> [acena]                       unknown
     (derived from verb)
The prefixes have an important role in compound word formation in Bangla. It is noted
that almost all the prefixes are productive in compound formation. Moreover, the addition
of a prefix can affect the meaning of a word as well as change the word class of the
compound. For instance, a noun changes into an adjective by the addition of a prefix. The
following analysis shows some examples compiled from the corpus text.
[[[a-]pre m&r&n]n]adj "till death"
[[[phi-]pre b&ch&r]n]adj "every year"
[[[phul-]pre hata]n]adj "full sleeve"
[[[be-]pre iman]n]adj "shameless"
[[[s&-]pre bandh&b]n]adv "with friends"
[[[h&r-]pre roj]n]adv "everyday"
[[[ha-]pre gh&r]n]adj "homeless"
[[[haph-]pre hata]n]adj "half sleeve"
5.6.2. Compound Verbs
Probably, the English term compound has become synonymous with the Bangla term s&mas.
Therefore, the study of compound words is mainly centred on the noun and adjective
compounds available in the language. Quite logically, the study of compound verbs is
considered a separate area of investigation having little connection with compound nouns
or adjectives. There have been some efforts to discuss compound verbs in Bangla, isolating
them from the general scope of compounds. Chatterji (1926/1993) has considered compound
verbs as “a remarkable idiomatic use of verb roots in connexion with a noun or a verbal
conjunctive or participle” (1993: 1049). He has observed (1993: 1050):
“... these compound verbs supply to some extent the want of modal and
temporal affixes, and are as characteristic of the modern Indo-Aryan speeches
as the ‘aspects’ of the verb in the Slav languages”
He has also noted that the inflected root is properly an auxiliary one which is modified
by a preceding noun or participle. Moreover, he has classified the compound verbs according
to their semantic or aspectual peculiarities and the usage of the auxiliary or subsidiary
verbs attached to the preceding verb. Sarkar (1976) has given a detailed structural
description of the compound verbs with rules for compound verb generation. Dasgupta (1977)
has defined compound verbs and has pointed out their constructional homonymy, finally
modifying the traditional definition of compound verbs. Dakshi (1998) has analysed the
compound verbs to show how aspectual function is part and parcel of the Bangla verb
structure.
A compound verb in Bangla, like a compound noun, is generally made of two constituents.
One is the pole, which generally occurs in the left-hand position, and the other is the
vector, which occurs in the right-hand position. In compound verb formation the pole plays
the predominant semantic role, while the vector mildly modifies its sense. Usually the pole
is followed by the vector, but there are rare exceptions to this rule. Note that both the
pole and
the vector are made of verb roots. “Of the two constituents of a compound verb, the vector
is inflected for tense, mood, aspect, degree of honour, and person, while the pole
invariably ends in [-e]” (Dasgupta 1977: 69). But Chatterji has shown that the pole can end
in [-ye], [-e], [-ite] and [-te] if the verb is an infinitive, and in [-iya] if the verb is
a gerund (Chatterji 1993: 1050). Sarkar (1994: 211) has cited examples where the pole ends
in [-iye], [-ye], [-e], [-ite] or [-te]. Dakshi’s (1998: 52) list adds one more ending for
the pole, namely [-iya], to the existing list.
Usually, a compound verb can be converted into a single-word verb without any great change
in the sense it denotes (Chaudhuri et al. 1997). However, there are exceptions where such
simplification cannot be done. This is particularly so in the future and in the past
perfect or continuous tenses. Also, in a Bangla sentence, a verb with an infinite suffix
marker is reduplicated to represent continuity of action. For the purpose of automatic
identification of compound words, the computer needs to examine a few sequential words in a
sentence. There is a small subset (usually 25) of verb roots which are generally used as
vectors in compound verb formation: √[ach], √[an], √[ca], √[as], √[beRa], √[b&s], √[c&l],
√[dekh], √[damRa], √[de], √[ne], √[ya], √[oth], √[phel], √[p&R], √[r&], √[lag], √[tol],
√[rakh], √[thak], √[pa], √[par], √[mar] etc. However, the analysis of the syntactic aspect
of the compound verbs is beyond the scope of this thesis.
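The observation that the computer must look at a few sequential words can be sketched as a
scan over adjacent word pairs: a pole candidate is a word with one of the endings reported
above, and a vector candidate is a following word that begins with one of the listed roots.
The sketch below works on transliterated tokens; the names POLE_ENDINGS, VECTOR_ROOTS and
find_compound_verbs are hypothetical, the root list is abbreviated, and the sample sentence
is our own illustration rather than a corpus citation. Since very many Bangla words end in
[-e], the test over-generates heavily and could serve only as a first filter before a
lexicon lookup.

```python
# Pole endings reported in the literature cited above, longest first.
POLE_ENDINGS = ("iya", "iye", "ite", "ye", "te", "e")

# A subset of the vector roots listed in the text (transliterated).
VECTOR_ROOTS = ["ach", "an", "ca", "as", "beRa", "b&s", "c&l", "dekh", "de", "ne", "ya",
                "oth", "phel", "p&R", "r&", "lag", "tol", "rakh", "thak", "pa", "par", "mar"]

# Try longer roots first so that "par" is preferred over "pa", "dekh" over "de", etc.
_ROOTS_LONGEST_FIRST = sorted(VECTOR_ROOTS, key=len, reverse=True)


def find_compound_verbs(tokens):
    """Return (pole, vector, vector_root) for each adjacent pair that looks like a compound verb."""
    candidates = []
    for pole, vector in zip(tokens, tokens[1:]):
        if not pole.endswith(POLE_ENDINGS):
            continue
        for root in _ROOTS_LONGEST_FIRST:
            if vector.startswith(root):
                candidates.append((pole, vector, root))
                break
    return candidates


# Illustrative token sequence (hypothetical, not taken from the corpus).
print(find_compound_verbs(["se", "kajta", "k&re", "phell&"]))
```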
5.7. Reduplicated Words

The process of reduplication is another method of word formation by which the sense of
continuation of an action or the sense of multiplicity of items can be expressed. Traditionally,
it is defined as the process of repetition of all parts of a morpheme or a word to express a
morphological category. It is observed that in reduplication process, some part of a base is
repeated, either to the left of the root as a prefix, or to the right as a suffix or occasionally in
the middle of the root as an infix.
In recent years it has attracted a good deal of interest among generative phonologists
and morphologists because it has both morphological and phonological aspects, which are
important to morpho-phonological studies of lexemes (Spencer 1991: 151). Probably this is
what led Sapir (1921: 76) to comment:
“... nothing is more natural than the prevalence of reduplication, in other
words, the repetition of all or part of the radical element. The process is
generally employed, with self-evident symbolism, to indicate such concepts as
distribution, plurality, repetition, customary activity, increase in size, added
intensity, continuance.”
However, Spencer holds a slightly different view of reduplication. While Sapir emphasises
the semantic role of reduplication, Spencer emphasises the addition of morphemes to the
base. He therefore observes (1991: 13):
“The interesting thing about reduplication is that it involves adding material,
just like any other form of affixation, but the identity of the added material is
partially or wholly determined by the base. Thus we have a form of affixation
which looks much more like some sort of ‘process’ which applied to the base
rather than a simple concatenation of one morpheme with another.”
The process of reduplication of words is not a very common feature in English, so there
has been very little discussion of it. Gleason (1970: 90) gives a short description of
partial reduplication, which is more common than complete reduplication in English. Aronoff
(1981: 73-78) discusses reduplication from the phonological aspect and considers whether any
rule of generative morphology can be applied to the generation and analysis of reduplicated
words. McCarthy (1982: 20-50) discusses reduplication in Semitic and other languages in
some detail. Bybee (1985: 152) shows how reduplication in Songhai, Massai, Vietnamese
and Tongan can signal diminutive meaning, though such meaning does not modify the
temporal contours of a situation. Jensen (1990: 68-71) discusses reduplication of different
kinds in Quebec French, Indonesian Malay and Ilocano, a language of the Philippine Islands.
Spencer (1991: 13) considers reduplication as another form of affixation and cites examples
from Tagalog, another Philippine language. He sums up the hypotheses of other investigators
such as Downing (1977), Bauer (1983), Marantz (1982) and Levin (1983), with different
syllabic analyses of reduplicated forms (150-156). Katamba (1993: 180-197) considers
reduplication
as a process whereby an affix is realised by phonological material borrowed from the base. He
shows how the semantic property of the base is changed with reduplication and tests whether
reduplication is nothing more than constituent copying. Moreover, he presents a theoretical
approach that offers some insight into these reduplication phenomena that does not involve
constituent copying.
There are two types of reduplicated words in English. One is complete reduplication,
where an entire word is reduplicated. It is considered as compounding, in which the word
is compounded with itself, as in goody-goody, pooh-pooh, thick-thick etc., where the second
part is simply a repetition of the first part of the word. The other is partial
reduplication, where only a part is reduplicated. The reduplicated part may be prefixed,
suffixed or infixed to the original word. For example, in reduplicated words like
dilly-dally, bit-bat, hum-drum, riff-raff, sing-song, roly-poly etc., the second part is
generated either phonetically or orthographically from the first part.
In Bangla, the reduplication of words is a very common phenomenon, as found in the texts
of the corpus. Because of the orthographic variation prevalent in the language, the
reduplicated words can be written with or without a blank space between the two
components. In most cases reduplication takes place to the right of the root, as a suffix.
The material reduplicated can be a whole word, a whole morpheme, a syllable or sequence of
syllables, or simply a string of consonant or vowel characters which does not form any
particular prosodic constituent (syllable, root, morpheme etc.). It can occur with almost
all the word classes except pronouns and indeclinables. In most cases it signifies
plurality of objects or continuity of some action. Moreover, echo words and onomatopoeic
words, which are quite prevalent in the language, are also considered as reduplicated words
in Bangla. Chatterji (1993, 1995) discusses reduplication of words which are semantically
related, citing examples of reduplication of verbs only; Sarkar and Basu (1994) investigate
different types of reduplication with examples; while Chaki (1996) considers it from both
structural and semantic points of view.
Structurally, three types of reduplicated words are mentioned by Chatterji (1995) and by
Sarkar and Basu (1994). However, Chaki (1996) has mentioned four types. All these types of
reduplication are given below with examples. The last one is Chaki’s addition.
(i) repetition of the same word, e.g. [din din] "day by day", [hasi hasi] "smiling",
[b&ch&r b&ch&r] "every year", [lal lal] "red" etc.,
(ii) addition of a semantically similar or almost similar word to the base word, e.g.
[alig&li] "lane and by-lane", [cupcap] "silently", [curicamari] "stealing",
[calculo] "status" etc.,
(iii) addition of some onomatopoeic words to the base word, e.g. [j&lt&l] "water etc.",
[machtach] "fish and others", [luciphuci] "luchi and others", [k&that&tha] "speech
and others" etc., and
(iv) simple onomatopoeic words, e.g. [jh&njh&n] "jingling", [kh&lkh&l] "sound of
laughter", [dhupdhup] "sound of falling" etc.
The reduplicated words found in the corpus are analysed from a structural point of view,
so that the machine can identify and process them automatically without considering their
semantic properties. Reduplication in Bangla is generally found with nouns, verbs (finite
or non-finite), adjectives and adverbs. In the morphological analysis of reduplicated nouns
it is noted that a noun is reduplicated in six ways:
(i) the base form is doubled while neither the first nor the second component of the final
form has any suffix marker, e.g. [gh&r gh&r] "every house", [din din] "regular",
[meye meye] "like a girl" etc.
(ii) the base form is doubled in such a way that the second component of the final form is
not a noun but an echo of the first noun, e.g. [j&lt&l] "water and others",
[machtach] "fish and others", [b&it&i] "books and others" etc.
(iii) the base form is doubled while both the first and the second component of the final
form take the affix [-e], e.g. [j&ne j&ne] "person by person", [dike dike] "in every
direction", [gh&re gh&re] "in every house" etc.
(iv) the base form is doubled while both the first and the second component of the final
form take the affix [-ay], e.g. [k&thay k&thay] "in every word", [d&rjay d&rjay] "in
every door", [rastay rastay] "on every road" etc.
(v) the base form is doubled while both the first and the second component of the final
form take the affix [-te], e.g. [n&dīte n&dīte] "in rivers",
[baRite baRite] "in every house", [gaRite gaRite] "in cars", [s&khīte s&khīte] "between
two girls" etc. and
(vi) the base form is doubled, the two parts being connected by a link vowel [-a-], while
the second component of the final form takes the affix [-i] (Chatterji 1993: 1048), e.g.
[hatahati] "fighting with hands", [lathalathi] "kicking" etc.
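The six patterns above can be told apart from surface form alone, which is the point of a
purely structural analysis. The following is a minimal sketch on transliterated components,
not the procedure actually used for the corpus: the function name
classify_noun_reduplication and the optional known_stems lexicon are hypothetical. The
lexicon is needed because a bare stem that happens to end in [-e], such as [meye], is
otherwise indistinguishable from a form carrying the affix [-e].

```python
def classify_noun_reduplication(first, second, known_stems=frozenset()):
    """Return the pattern label ("i".."vi") for a pair of components, or None."""
    if first == second:
        if first in known_stems:
            return "i"    # bare stem doubled
        if first.endswith("te"):
            return "v"    # both components carry -te
        if first.endswith("ay"):
            return "iv"   # both components carry -ay
        if first.endswith("e"):
            return "iii"  # both components carry -e
        return "i"
    # Link vowel -a- on the first part, affix -i on the second: hata + hati.
    if first.endswith("a") and second.endswith("i") and first[:-1] == second[:-1]:
        return "vi"
    # Echo form: same length, differing only in the initial character (j&l / t&l).
    # Romanised digraph initials such as "bh" are not handled by this simple test.
    if len(first) == len(second) and first[1:] == second[1:]:
        return "ii"
    return None


for pair in [("gh&r", "gh&r"), ("j&l", "t&l"), ("dike", "dike"),
             ("k&thay", "k&thay"), ("baRite", "baRite"), ("hata", "hati")]:
    print(pair, classify_noun_reduplication(*pair))
```

Without a stem lexicon, [meye meye] is assigned to pattern (iii) rather than (i), which is
one concrete source of the ambiguity in surface forms.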
The morphological analysis of reduplicated verbs (finite and non-finite) in Bangla

shows that the verbs can be reduplicated in three ways.
(i) both the verb roots may be identical or different in form but both the verb roots can
have:
(a) the [-0] suffix as in [c&l c&l] "walk" etc.
(b) the suffix -fbnf [-iya] as in Q%T [suniya suniya] "hearing" etc.
(c) the suffix -CO [-e] as in [bale bale] "saying" etc.
(d) the suffix -foe® [-ite] as in of%® ©ftt® [sunite sunite] "hearing" etc.
(e) the suffix -C® [-te] as in Mt® M® [dekhte dekhte] "seeing" etc.
(f) the suffix [-b&] as in [sunb£ sunba] "I will hear" etc.
(g) the suffix -C4 [-be] as in [bSlbe b&lbe] "you will say" etc.
(h) the suffix -°T [-1&/] as in [sunil sunl&] "he heard" etc.
(i) the suffix -C°T [-le] as in ^C=T [karle kSrle] "you did" etc.
(j) the suffix -focsr [-iye] as in Ciftoi [dekhiye dekhiye] "showing" etc.
(k) the suffix -OtH [-ay] as in [sonay bcilay] "hearing and saying"
(l) the suffix -OTc*Tf [-ano] as in (MM CM HIM I [dekhano sonano] "showing and
hearing" etc.
(ii) there is another set of reduplicated verbs where first form is a verb root while the
second one is not a verb but an echo form of the first verb form. These forms can use
the verbal suffixes in the said manner mentioned above, e.g. =*JT5T M [khay day] "eats
and does others things", ^ [lute pute] "plundering and doing other things", oftoi
[gutiye sutiye] "folding and other things" etc.
(iii) the verb root is doubled, the two parts are connected by a link vowel -আ- [-a-], and the
second part of the final form takes the suffix -ই [-i] (Chatterji 1993: 1048). The roots
are genuine verb roots, while the infix and the suffix are different. The final output can be
considered either a verb or a noun. From the grammatical point of view the final output is a
verb because it originates from two verb roots. But from the lexical point of view the
final output is a noun because the form can be inflected with a case marker, which is a
property of nouns, e.g. জানাজানি [janajani] "knowing", মারামারি [maramari] "fighting",
টানাটানি [tanatani] "pulling" etc.
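Type (i) can be illustrated with a small root-and-suffix check. The sketch below is written in Python and rests on several assumptions: a simplified ASCII romanisation of the bracketed forms, a toy root lexicon (ROOTS) read off the examples above, and a subset of the listed suffixes. The names ROOTS, SUFFIXES, analyse_half and analyse_reduplicated_verb are hypothetical and do not come from the study itself.

```python
# Toy lexicon and suffix list taken from the examples under (i); "" is the
# null suffix of (a). Both are illustrative assumptions, not the study's data.
ROOTS = {"cal", "sun", "bal", "dekh", "kar"}
SUFFIXES = ["", "iya", "ite", "iye", "te", "be", "le", "e"]


def analyse_half(word):
    """Return every (root, suffix) split of word licensed by the lexicon."""
    splits = []
    for suffix in SUFFIXES:
        if word.endswith(suffix):
            root = word[:len(word) - len(suffix)]
            if root in ROOTS:
                splits.append((root, suffix))
    return splits


def analyse_reduplicated_verb(form):
    """Analyse a two-word form; keep splits where both halves share a suffix."""
    parts = form.split()
    if len(parts) != 2:
        return []
    first, second = analyse_half(parts[0]), analyse_half(parts[1])
    return [(r1, r2, s1) for r1, s1 in first for r2, s2 in second if s1 == s2]


if __name__ == "__main__":
    for form in ("cal cal", "bale bale", "sunite sunite", "lute pute"):
        print(form, "->", analyse_reduplicated_verb(form))
```

Checking each candidate root against a lexicon is what stops [bale bale] from being cut as ba + -le. Echo forms of type (ii), such as [lute pute], return no analysis here because their second half is not a listed root; they would need separate handling.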
The morphological analysis of reduplicated adjectives in Bangla shows that the
adjective can be reduplicated in two ways:
(i) neither word has a marker, e.g. লাল লাল [lal lal] "red", ছোট ছোট [chotʌ chotʌ]
"small", ভাল ভাল [bhalʌ bhalʌ] "good" etc. and
(ii) the second component has the marker -এ [-e], e.g. ধবধবে [dhʌbdhʌbe] "milky white",
কুচকুচে [kuckuce] "jet black" etc.
The morphological analysis of reduplicated adverbs in Bangla shows that the adverb
can be reduplicated in five ways.
(i) the adverb is doubled without any marker on either component, e.g. কখন কখন
[kʌkhʌn kʌkhʌn] "sometimes", যেমন যেমন [yemʌn yemʌn] "such" etc.,
(ii) both the forms have the marker -এ [-e] at the end, e.g. মাঝে মাঝে [majhe majhe]
"occasionally", আশে পাশে [ase pase] "nearby" etc.,
(iii) both the forms have the marker -আয় [-ay] at the end, e.g. বেলায় বেলায় [belay belay]
"before time", মাথায় মাথায় [mathay mathay] "just up to the mark" etc.,
(iv) the first word has the ending -আ [-a] while the second word has the ending -ই [-i],
e.g. খোলাখুলি [kholakhuli] "openly", মিছামিছি [michamichi] "falsely" etc. and
(v) the first word has the ending -ও [-o] while the second word has the ending -ই [-i],
e.g. মুখোমুখি [mukhomukhi] "face to face", পিঠোপিঠি [pithopithi] "one after another" etc.
This process of analysis has helped us to identify reduplicated words in the corpus
with a certain degree of accuracy. However, on some occasions they are parsed in two ways
because of ambiguities in their surface forms.
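The five adverb patterns can be written as surface tests on the two halves of a form. The following is a minimal sketch in Python, assuming ASCII romanisation and assuming that the form has already been split into its two halves; the function name adverb_patterns is hypothetical. It returns every pattern that matches, because a purely surface-based check can accept one form under more than one pattern. This is offered only as one plausible source of double parses, not as a description of the parser used in this study.

```python
def adverb_patterns(first, second):
    """Return the labels of all adverb patterns (i)-(v) matched by the halves."""
    matches = []
    if first == second:
        matches.append("i")    # plain doubling, no marker required
    if first.endswith("e") and second.endswith("e"):
        matches.append("ii")   # both halves end in -e
    if first.endswith("ay") and second.endswith("ay"):
        matches.append("iii")  # both halves end in -ay
    if first.endswith("a") and second.endswith("i"):
        matches.append("iv")   # first half in -a, second half in -i
    if first.endswith("o") and second.endswith("i"):
        matches.append("v")    # first half in -o, second half in -i
    return matches


if __name__ == "__main__":
    examples = [("majhe", "majhe"), ("ase", "pase"), ("belay", "belay"),
                ("khola", "khuli"), ("mukho", "mukhi")]
    for first, second in examples:
        print(first, second, "->", adverb_patterns(first, second))
```

Here [majhe majhe] matches both (i) and (ii) on the surface, so a decision between them needs lexical information about whether the final -e is a marker or part of the stem.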
5.8. Conclusion
In this chapter an effort has been made to analyse words morphologically, with all their
markers and inflections, in order to capture the actual orthographic representation of the words
used in the texts of the Bangla corpus. An effort has also been made to observe the variations in
morphological make-up whenever these words undergo any kind of morpho-phonemic
change. Only a few sample words and examples are used here for study. As the
research is entirely grapheme-based (because the study is based on printed texts), it is
necessary to know how the graphemes behave at the time of inflection, derivation and
sandhi, and when case and other markers are attached. The analysis and results will be used
in Chapter 7, where the processes of parsing surface words are investigated.
*****
