11 - Chapter 5 PDF
5.1. Introduction
The study of words is not an altogether new field of research in linguistics, though the
definition of the term word is a long-standing problem because of its ambiguous nature.
The term refers either to a string of characters as it appears in speech or writing, or to
a more abstract entity, a part of the structure of the language as represented in a dictionary.
Aronoff (1981: 07) states that words are different from sentences; their structures are much
more varied, and though there is a single principle governing the structure of most complex
words, the principle must be applied to different classes of words. Words, once formed, persist
and change; they take on idiosyncrasies, with the result that they are soon no longer generable
by a simple algorithm of any generality. All regular word-formation processes are word-
based. A new word is formed by applying a regular rule to a single already existing word.
Both the new word and the existing one are members of major lexical categories (Aronoff
1981:21).
The meaning of a word is not always determined compositionally: in some cases the
word as a whole bears the meaning, while in other cases the relationship between the meaning
of the parts of a word and the meaning of the word as a whole can be obscure. So there are
considerable difficulties pinning down any universally applicable notion of word, even when
we restrict ourselves to morphological criteria within a single language (Spencer 1991: 45).
One way to define wordhood is in terms of linguistic contrasts in phonology, syntax and
semantics. Such criteria, when developed for individual languages, may be quite
successful. There are very few semantic properties of words which will distinguish them from
morphemes or phrases. Words are generally referentially opaque (Spencer 1991: 42), i.e. it is
impossible to see inside them and refer to their parts. Rules of syntax, as generally conceived,
take words as their smallest unit and compose them into phrases and sentences. Here a word
is a minimal free form, the smallest unit which can exist on its own. A language has a
particular grammar of word structure which nonetheless conforms to certain quite general
principles governing possible word structures in language (Selkirk 1983: 09).
It is also observed that some morphemes which are used within a word may not have
any meaning. Once the words are in the lexicon, the morphemes out of which these words are
formed, and into which they are to be analysed, do not have constant meanings and in some
cases have no meaning at all. Moreover, some prefixes have no constant meaning which can
be attributed to them. Here starts the basic trouble with words: though formed by regular
rules, they change when required. Therefore it is difficult to segment the meaning of an
individual word in a principled manner.
To overcome this problem Halle and Chomsky (1968) suggested that the dictionary
should contain the actual words as well as their idiosyncrasies. In their opinion a word can
convey more than it is expected to mean. These idiosyncrasies would include phonological
and syntactic exception features which a word might have. They would also include its
semantic and syntactic peculiarities which are not provided by general rules of morphology.
But the problem is that some words are so idiosyncratic that their meanings are totally
divorced from what is expected by general rules. So it is difficult to explain how such a
word could mean something different from its expected meaning without damaging the rule
that generates it.
In this chapter an effort is made to find out the morphemic structure of the Bangla
words used in the corpus texts and to see how these morphemes are used in the formation of
surface words. The chapter is arranged as follows: section 5.2. gives a brief discussion of
earlier studies in this area, section 5.3. considers word formation processes in Bangla, section
5.4. examines affixation processes, which include different markers, case terminations,
endings etc., section 5.5. briefly discusses the formation and function of the post-positions,
section 5.6. analyses the process of compound word formation, section 5.7. discusses the
process of formation of reduplicated words, and section 5.8. is the conclusion.
Among individuals, Matthews (1974) centres on words and lexemes, gives a
traditional treatment of inflections and sandhi, considers different morphological processes
and investigates the relation of morphology with phonology and syntax along with the role of
morphology in generative grammar. Aronoff (1981) considers a morpheme as a phonetic
string which can be connected to a linguistic entity outside that string. What is important is not
its meaning, but its arbitrariness. He conceived morphology in general framework of
One has to determine what sort of new words can be generated, in the process
mentioned above, in a particular language. There must be some rules of word formation for
each lexical category in the language. In Bangla, a word is a string of graphemes that appears
in print between spaces or punctuation marks, following an orthographic convention. It is
observed that in Bangla, word formation rules are highly productive for major lexical
categories such as noun, pronoun, adjective and verb but less productive for other lexical
categories, namely indeclinable, adverb etc. For the generation of new words, the major
processes are: (i) inflection, (ii) derivation, (iii) sandhi, (iv) compounding and
(v) reduplication. Table 5.1 below shows the processes of generation of words in Bangla.
Table 5.1.
Processes of generation of words in Bangla.
No.  Process                                                                 Example                                      Glossary
1    Single morpheme (word)                                                  [din + 0] → [din]                            day
2    Adding suffix (inflection)                                              [din + -gulo] → [dingulo]                    days
3    Adding case (inflection)                                                [din + -er] → [diner]                        of day
4    Adding suffix and case (inflection)                                     [din + -gulo + -te] → [dingulote]            in days
5    Adding prefix (inflection)                                              [su- + din] → [sudin]                        good day
6    Adding prefix and suffix (inflection)                                   [su- + din + -gulo] → [sudingulo]            good days
7    Adding prefix and case (inflection)                                     [su- + din + -er] → [sudiner]                of good day
8    Adding prefix, suffix and case (inflection)                             [su- + din + -gulo + -ke] → [sudinguloke]    to good days
9    By derivation                                                           [din + -ik] → [dainik]                       daily
10   Adding case with derived form (inflection after derivation)             [din + -ik + -er] → [dainiker]               of daily
11   Adding suffix with derived form (inflection after derivation)           [din + -ik + -gulo] → [dainikgulo]           the dailies
12   Adding suffix and case with derived form (inflection after derivation)  [din + -ik + -gulo + -ke] → [dainikguloke]   to dailies
13   By sandhi                                                               [din + ant&] → [dinant&]                     day's end
14   Adding case after sandhi (inflection after sandhi)                      [din + ant& + -er] → [dinanter]              of day's end
15   By compounding                                                          [din + kal] → [dinkal]                       day and time
16   Adding case after compound (inflection after compounding)               [din + kal + -er] → [dinkaler]               of day and time
17   By reduplication                                                        [din + din] → [dindin]                       day by day
18   Adding case after reduplication (inflection after reduplication)        [din + -e + din + -e] → [dinedine]           day by day
In the process of word formation, the use of bound morphemes with free morphemes is
always controlled by rules applicable to those morphemes. In the list above, the rule of
derivation has operated upon the root word in no. 9, and the rule of sandhi in no. 13.
Therefore, it is necessary to identify, by analysing the morpho-phonemic structure of words,
which rules would operate on them to produce new words, because existing words sometimes
tend to be resistant to any system which derives their properties using general rules.
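
The contrast between plain concatenation and rule-governed stem change can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `build`, the table `DERIVED_STEMS` and the plain ASCII romanisation are assumptions made for the example and are not part of the analysis reported in this chapter. Only the stem change of no. 9 in Table 5.1 is modelled.

```python
# Stem changes that simple concatenation cannot produce.
# Only row 9 of Table 5.1 (din + -ik -> dainik) is listed here (illustrative).
DERIVED_STEMS = {("din", "ik"): "dainik"}


def build(root, prefix="", derivation="", suffixes=()):
    """Assemble a surface form from a root and optional bound morphemes.

    Inflectional markers are simply concatenated; a derivational suffix is
    first looked up in DERIVED_STEMS so that a rule-governed stem change
    can be applied before inflection.
    """
    stem = DERIVED_STEMS.get((root, derivation), root + derivation)
    return prefix + stem + "".join(suffixes)


# Rows 4, 8 and 12 of Table 5.1
print(build("din", suffixes=("gulo", "te")))
print(build("din", prefix="su", suffixes=("gulo", "ke")))
print(build("din", derivation="ik", suffixes=("gulo", "ke")))
```

Forms produced by sandhi (no. 13) would need a comparable rule table of their own, which is why the morpho-phonemic analysis has to precede generation.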
The hypothesis that each word has its internal constituent structure implies that there
must be a Word Structure Grammar (WSG) to generate that word. We have tried to look into
the constituent structures of Bangla words belonging to different lexical categories. Each and
every word must belong to some lexical category, the exact category being determined by the
Word Formation Rules (WFRs) which produce the words (Aronoff 1981: 49). For instance, in
Bangla the suffix [-tva] produces nouns (e.g. [naritva] "femininity" etc.), while the
suffix [-ban] produces adjectives (e.g. [d&yaban] "kind" etc.).
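
As a rough illustration of how a WFR fixes the lexical category of its output, the two suffixes just mentioned can be placed in a lookup table. The names `CATEGORY_BY_SUFFIX` and `category_of`, and the ASCII romanisation of the examples, are assumptions made for this sketch; a real table would cover the full suffix inventory.

```python
# Category assigned by the word formation rule that adds each suffix.
# Only the two suffixes mentioned in the text are listed (illustrative).
CATEGORY_BY_SUFFIX = {"tva": "noun", "ban": "adjective"}


def category_of(word):
    """Return the category implied by the word-final suffix, or None."""
    # Try longer suffixes first so that a short suffix cannot shadow a long one.
    for suffix in sorted(CATEGORY_BY_SUFFIX, key=len, reverse=True):
        if word.endswith(suffix) and len(word) > len(suffix):
            return CATEGORY_BY_SUFFIX[suffix]
    return None


print(category_of("naritva"))   # noun
print(category_of("dayaban"))   # adjective
```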
The corpus shows that two types of words are used in the texts: (i) root words, and (ii)
inflected words. The root words are mostly free morphemes (noun, adjective, pronoun, verb
roots etc.) which have the potential to be entered in the lexicon, while inflected words are
generated by grammatical concatenation of root words with suffixes, which are mostly
bound morphemes. These bound morphemes are not generally entered in a lexicon,
though they could be entered for detailed grammatical analysis.
(ii) the word forming morphemes are synthetic by nature i.e. morphemes can join with
others to generate words. Among them some are highly productive (prefixes, suffixes,
case markers etc.) whereas others are less productive.
(iii) among the morphemes some are bound (affixes) which cannot be used independently
in a sentence and some are free (root words) which can be used independently.
(iv) the number of bound morphemes is nearly fixed in the language. It can be increased
only when new morphemes are borrowed from other languages or coined within the
language.
Following traditional grammar, words generally belong to the categories noun, pronoun, adjective,
verb, adverb, indeclinable and post-position. Among these some are in root form with or
without affixes or markers, some are with inflections, some are in derived form, and some are
in derived form with inflections or affixes. Moreover, there are compound and reduplicated
words with or without inflections. For automatic parsing and tagging, the words belonging to
each lexical category are to be analysed structurally to understand the patterns of their
formation. In the following sub-sections we have tried to understand this by analysing the
words structurally.
The formation of inflected pronouns is far more complicated than that of inflected
nouns, because unlike nouns, the components of pronouns quite often change their
positions in their linear arrangement. Another difference from nouns is that a pronoun,
whether inflected or not, never uses a prefix. On the other hand, almost all pronouns use
inflections when they are used in the texts. Generally, pronouns are used as pronouns in the
texts. However, there are some instances where a pronoun is used as a noun in the text, such
as the one given below, though this kind of use is very rare in the texts:
Some pronominal roots undergo morpho-phonemic changes when they take inflection.
For example, the roots [tum-] and [tu-] change into [tom-] and [to-], respectively,
whenever the plural suffix [-ra] is used with them. The examples are displayed below:
singular                      plural
[tum-] : [tumi] "you"   ::   [tom-] : [tomra] "you"
[tu-]  : [tui] "you"    ::   [to-]  : [tora] "you"
The total number of pronouns in Bangla is around 750, including both non-inflected
and inflected forms. However, the number of pronoun roots is around 50 and that of suffixes
is 32, which, by some rules of grammatical agreement between roots and suffixes, constitute
the total list of pronominal forms. Moreover, there are some pronouns which are used as
adjectives in the texts. Among the inflected pronouns, some personal and demonstrative
pronouns are the most frequently used in the texts. The corpus contains a new form ([kei] "who
else") which is not found in the pronoun list. Probably, it is derived in the following way:
Table (5.3)
Morphemic structure of inflected pronouns
No.  Process                                      Example                                           Glossary
1    root + 0                                     [se + -0] → [se]                                  he
2    root + particle                              [se + -i] → [sei]                                 himself
3    root + article                               [se + -ti] → [seti]                               that
4    root + article + particle                    [se + -ti + -i] → [setii]                         that
5    root + particle + article                    [se + -i + -ti] → [seiti]                         that
6    root + article + case                        [se + -ti + -ke] → [setike]                       to that
7    root + article + case + particle             [se + -ti + -ke + -i] → [setikei]                 to that
8    root + particle + article + case             [se + -i + -ti + -ke] → [seitike]                 to that
9    root + particle + article + case + particle  [se + -i + -ti + -ke + -i] → [seitikei]           to that
10   root + number                                [se + -guli] → [seguli]                           those
11   root + particle + number                     [se + -i + -guli] → [seiguli]                     those
12   root + number + case                         [se + -guli + -ir] → [segulir]                    of those
13   root + particle + number + case              [se + -i + -guli + -ir] → [seigulir]              of those
14   root + case                                  [taha + -ke] → [tahake]                           to him
15   root + number + case + particle              [taha + -der + -ke + -i] → [tahaderkei]           to them
16   root + number + article + case               [taha + -der + -ti + -te] → [tahadertite]         to theirs
17   root + number + article + case + particle    [taha + -der + -ti + -te + -i] → [tahadertitei]   in theirs
18   root + person                                [am + -i] → [ami]                                 I
19   root + person + particle                     [am + -i + -i] → [amii]                           I myself
20   root + case + article                        [taha + -er + -ti] → [taharti]                    of his
21   root + case + article + case                 [taha + -er + -ti + -ke] → [tahartike]            to his
22   root + case + article + case + particle      [taha + -er + -ti + -ke + -i] → [tahartikei]      of his
The analysis shows that the markers quite often shift their respective positions in the
formation of inflected pronouns. This information and the modes of their arrangement are
necessary for developing algorithms for automatic detection and parsing of pronominal forms
by machine. Similarly, when surface pronominal forms are generated, the logical and
possible mappings of the components are to be used to stop generation of wrong pronominal
forms.
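
A minimal sketch of such a parser is given below. It tries every way of splitting a form into a known root followed by known markers, so that it accepts both orders of article and particle attested in Table 5.3. The names `ROOTS`, `SUFFIXES` and `segment` are hypothetical, the inventories are a small sample taken from Table 5.3 rather than the full lists, and only one label is stored for [-i] although the table also uses it as a person marker.

```python
# Small sample inventories taken from Table 5.3 (illustrative, not complete).
ROOTS = {"se", "taha", "am"}
SUFFIXES = {
    "i": "particle",
    "ti": "article",
    "ke": "case",
    "te": "case",
    "guli": "number",
    "der": "number",
}


def segment(form):
    """Return every split of `form` into a known root plus known markers."""
    results = []

    def extend(rest, parts):
        if not rest:                      # whole form consumed: valid parse
            results.append(parts)
            return
        for suffix, label in SUFFIXES.items():
            if rest.startswith(suffix):
                extend(rest[len(suffix):], parts + [(suffix, label)])

    for root in ROOTS:
        if form.startswith(root):
            extend(form[len(root):], [(root, "root")])
    return results


for parse in segment("seitikei"):
    print(" + ".join(f"{piece}({label})" for piece, label in parse))
```

Forms whose components have fused, such as [taharti] from [taha + -er + -ti], are not recognised by plain string matching; they would need the altered shape of the marker to be listed as well.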
Most of the verbs used in the corpus texts are in their conjugated forms. Probably, the
verb roots, devoid of the information provided by verbal suffixes, are not competent enough to
denote a complete sense of action. Therefore, to give a complete sense of action as well as to
provide aspectual, temporal and other information, the roots need to use some grammatical
markers (i.e. markers of person, number, honorific, non-honorific, gender etc.) in which all
this information is stored. As a result, to understand the function of the conjugated verbs we
have to analyse their suffix parts to extract the relevant information. This helps us to process
conjugated verbs without placing emphasis on their semantic part. Table (5.4) below gives the
possible morphemic structures of conjugated verbs:
Table (5.4)
Morphemic structure of conjugated verbs
No.  Process                                      Example                                           Glossary
1    root + 0                                     [k&r + -0] → [k&r]                                to do
2    root + person                                [k&r + -i] → [k&ri]                               I do
3    root + auxiliary + person                    [k&r + -ch + -i] → [k&rchi]                       I am doing
4    root + causative + auxiliary + person        [k&r + -a + -cch + -i] → [k&racchi]               causing others to do
5    root + aspect + auxiliary + person           [k&r + -ite + -ch + -i] → [k&ritechi]             I am doing
6    root + tense + person                        [k&r + -il + -am] → [k&rilam]                     I did
7    root + auxiliary + tense + person            [k&r + -ch + -il + -am] → [k&rchilam]             I was doing
8    root + aspect + auxiliary + tense + person   [k&r + -iya + -ch + -il + -am] → [k&riyachilam]   I had done
9    root + causative + gerund                    [k&r + -a + -no] → [k&rano]                       doing
At least five types of grammatical information are stored within the suffix part of a
conjugated verb: aspect, auxiliary, tense, person and particle. However, not all of this
information is used with every verb root. In certain cases only one or two types are used,
while on other occasions all of them are used. The sequence of the markers
with regard to the root is always uniform, i.e. no marker interchanges its position with another
in respect of the root. Generally, the root takes the leftmost position, following which come
the markers of aspect, auxiliary, tense, person and particle, one by one in sequential order. If
any marker is dropped from the suffix, the immediately following marker occupies its place.
However, there is one variation in this system. The particle (mostly emphatic) can shift
to the position after the aspect marker if required, though it is generally used at the final
position. This shifting of the particle marker from the last to the third position sometimes
creates problems for automatic identification and parsing. Following this method, around 105
valid conjugated verbs can be obtained from a single verb root. In fact, in our corpus we have
come across nearly 105 valid conjugated forms of a single verb root (including simple,
causative and gerundial forms).
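
The ordering constraint described above, together with its single exception, can be expressed as a small check on a sequence of marker labels. The sketch below is hypothetical: the names `CANONICAL_ORDER` and `is_valid_marker_sequence` are introduced only for illustration, and the causative and gerund markers of Table 5.4 are left out because the paragraph above does not place them in the sequence.

```python
# Order of the markers after the root, as stated in the text.
CANONICAL_ORDER = ["aspect", "auxiliary", "tense", "person", "particle"]
RANK = {slot: position for position, slot in enumerate(CANONICAL_ORDER)}


def _ascending(markers):
    """True if the markers appear strictly in canonical order."""
    ranks = [RANK[m] for m in markers]
    return all(a < b for a, b in zip(ranks, ranks[1:]))


def is_valid_marker_sequence(markers):
    """Check a list of marker labels that follow a verb root."""
    if any(m not in RANK for m in markers):
        return False
    if len(set(markers)) != len(markers):   # each marker type at most once
        return False
    if _ascending(markers):
        return True
    # The one permitted variation: the particle directly after the aspect marker.
    if "particle" in markers and "aspect" in markers:
        i = markers.index("particle")
        if i > 0 and markers[i - 1] == "aspect":
            return _ascending(markers[:i] + markers[i + 1:])
    return False


print(is_valid_marker_sequence(["aspect", "auxiliary", "tense", "person"]))
print(is_valid_marker_sequence(["aspect", "particle", "auxiliary", "person"]))
print(is_valid_marker_sequence(["person", "tense"]))
```

Dropped markers need no special treatment here, since any subsequence of the canonical order is itself in ascending order.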
Table (5.5)
Morphemic structure of non-finite verbs
No.  Example                         Glossary
1    [k&r + -e] → [k&re]             doing
2    [di + -ye] → [diye]             giving
3    [b&l + -iya] → [b&liya]         saying
4    [ni + -ya] → [niya]             taking
5    [dhu + -ite] → [dhuite]         to wash
6    [c&l + -te] → [c&lte]           to go
7    [kha + -ile] → [khaile]         having eaten
8    [di + -le] → [dile]             having given
If the markers given above and the algorithm for appropriate matching between roots
and suffix parts are followed for identifying non-finite verbs, it is hoped that almost all the
non-finite verbs would be automatically identified and parsed. In fact, one of my colleagues
has successfully identified and parsed all non-finite verbs in the corpus using this schema.
Details can be found in Chaudhuri, Dash and Kundu (1997).
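
The matching of roots and suffix parts can be pictured with the sketch below. It is not the schema of Chaudhuri, Dash and Kundu (1997), whose details are not reproduced here; it is a simple longest-suffix match with a root check, written with hypothetical names (`NON_FINITE_SUFFIXES`, `VERB_ROOTS`, `split_non_finite`) and a plain ASCII romanisation of the forms in Table 5.5.

```python
# The eight non-finite markers of Table 5.5, in ASCII romanisation.
NON_FINITE_SUFFIXES = ["e", "ye", "iya", "ya", "ite", "te", "ile", "le"]
# Sample root lexicon (illustrative; a real lexicon would be much larger).
VERB_ROOTS = {"kar", "di", "bal", "ni", "dhu", "cal", "kha"}


def split_non_finite(form):
    """Return (root, suffix) if `form` is a known root plus a non-finite marker."""
    # Longer markers are tried first; the root check rejects wrong splits,
    # e.g. "niya" is not n + iya because "n" is not a root.
    for suffix in sorted(NON_FINITE_SUFFIXES, key=len, reverse=True):
        if form.endswith(suffix):
            root = form[: -len(suffix)]
            if root in VERB_ROOTS:
                return root, suffix
    return None


for form in ["kare", "niya", "dhuite", "dile"]:
    print(form, split_non_finite(form))
```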
Moreover, as with the nouns, a prefix can be added to an adjective following the rules
of grammatical agreement between the two. In the table (5.6) the morphemic structure of
adjectives is displayed with examples:
Table (5.6)
Morphemic structure of adjectives
No.  Process                                    Example                                     Glossary
1    root adjective form + -0                   [bhal& + -0] → [bhal&]                      good
2    adjective + adjectival suffix              [dirgha + -t&m&] → [dirghat&m&]             longest
3    prefix + adjective                         [a- + purna] → [apurna]                     unfulfilled
4    verb root + adjectival suffix              [c&l + -ti] → [c&lti]                       running
5    noun + verb root + adjectival suffix       [r&kt& + makh + -a] → [r&kt&makha]          stained with blood
6    noun + noun + adjectival suffix            [yuddh& + kal + -in] → [yuddh&kalin]        during war
7    prefix + verb + adjectival suffix          [a- + s&han + -iy&] → [as&haniy&]           unbearable
Table (5.7)
Morphemic structures of adverbs
No.  Process                                Example                                  Glossary
1    root adverb form + -0                  [t&kh&n + -0] → [t&kh&n]                 then
2    noun + adverbial suffix                [kach + -e] → [kache]                    near
3    adjective + adverbial suffix           [dhir + -e] → [dhire]                    slowly
4    adjective + noun + adverbial suffix    [bises + bhab + -e] → [bisesbhabe]       specially
The analysis of adverbs in the corpus shows that they generally modify the action denoted
by verbs in the sentences. Moreover, some post-positional words like [apeksa], [k&rtrk],
[dvara], [d&run], [j&ny&], [janye], [tAre], [prati], [nyay], [b&rab&r], [sAhit], [bSnam],
[bab&d], [bAi], [bina] etc. are sometimes used with nouns, pronouns or adjectives to denote
an adverbial sense.
5.4. Affixation
Affixes are bound morphemes which are never used independently in the text without the
support of a root word. They are considered as derivational morphemes which have the function
of creating a new word out of another existing word (Spencer 1991: 39). However, in Bangla,
they include both inflectional and derivational affixes, which have meanings of their own by
which they can affect the semantic properties of a word. In the word formation processes these
affixes are recursive in nature. They can be used before, after or within nouns, pronouns,
adjectives, verbs, adverbs or compounds. Traditionally affixes are divided into three groups:
prefix, infix and suffix. In Bangla the number of prefixes is smaller than that of suffixes,
while infixes are very few. The details of each group of affixes are given in the following
sections.
5.4.1. Prefixes
A prefix is used in front of a word and is considered as an integral part of the word.
Generally, prefixes are taken to have some fixed meaning attributed to them, but it is now
claimed that prefixes have no constant meaning which can be attributed to any of them
(Aronoff 1981: 13). They have an important role in the word formation process, as the use of a
prefix can add an extra shade of meaning to the existing word.
The total number of prefixes used in Bangla is around 70. Among them, the number of
prefixes inherited from Sanskrit is 20. This number is almost fixed and all the forms are more
or less regularly used in the language. The number of native prefixes is around 30; these are
mostly collected from native sources or borrowed from neighbouring languages. The number
of prefixes of foreign origin is also around 30. All these forms are borrowed either from the
Persian group of languages (mostly Arabic or Persian) or from European languages (mostly
English) by the process of lexical borrowing. This number is not fixed, as it can increase with
further lexical borrowing from different languages. Table 5.8 below gives the total list of
prefixes found in the corpus.
Table 5.8
The list of prefixes found in the corpus
Sanskrit  [ati-] [adhi-] [anu-] [ap-] [api-] [ab-] [abhi-] [a-] [ut-] [up&-] [dur-] [ni-]
          [nir-] [p&ra-] [pfiri-] [pr&-] [prfiti-] [bi-] [s&m&-] [su-]
Bangla    [a-] [aj-] [ana-] [a-] [ag-] [at-] [aR-] [un&-] [upar-] [upri-] [ku-] [gand&-]
          [chmic-] [na-] [ni-] [nit-] [pach-] [pati-] [pas-] [pich-] [bi-] [bh&r-] [mig-]
          [majh-] [mit-] [ram-] [s^-] [su-] [ha-]
Foreign   [ab-] [am-] [kar-] [khas-] [khus-] [khol-] [glr-] [dfib^l-] [dfir-] [na-] [nim-]
          [phi-] [phul-] [b&-] [b&d-] [be-] [sab-] [Mr-] [haph-] [hed-]
There are some words found in the corpus texts like [atS.], [an], [antHh], [ud], [et&],
[ciri], [tM], [t&t], [t&tha], [te], [tri], [d£$], [du], [duh], [dvi], [n&], [nih], [pific&],
[p&r], [pure], [prak], [phri], [be], [yltha], [sat], [sv&] etc. which have the potential to be
considered as prefixes, because on most occasions they play the role of a prefix of the words
in the text. However, in traditional grammar these forms are not considered as such. Among
the prefixes, forms like [a-], [a-], [ku-], [dur-], [na-], [nir-], [pr&-], [be-], [si-] etc.
are highly productive in nature because they can be added to any noun or adjective to generate
a new word having a different sense or meaning. Moreover, forms like [ati], [anu], [dur],
[p&ra], [pr&ti], [sSmS], [at], [aR], [up&r], [upri], [na], [pach], [pati], [pas], [pich],
[bhSx], [majh], [ram], [khas], [khus], [khos], [dabal], [phul], [bid], [sab], [haph], [hed]
etc. can be used as independent words in the text. In their individual use they are equally
capable of taking case terminations and other endings, like regular words in the language.
The use of a prefix can change the lexical category or part-of-speech of a word. The
newly formed word can be either opposite in meaning to the existing word or otherwise
changed in meaning. A set of examples where a word changes in meaning after taking different
prefixes is given in table (5.9).
Table 5.9
Change of meaning of words by using prefixes
Prefixed word   Glossary          Prefixed word   Glossary
[adhigM]        under control     [anag&ti]       has not come yet
[anugM]         obedient          [apagM]         spoiled
[ab&gM]         known             [abhigM]        informed
5.4.2. Suffixes
For the purpose of easy computability and processing of words, the term suffix is used here in
its broadest possible sense. Here, all those forms are considered as suffixes which are added
immediately after the root (base) word. It is observed that in the case of inflected words, the
determination of the part-of-speech of a word primarily depends on its suffix part. Two
parallel examples are given to elucidate the idea.
The examples show that a valid word can be formed at each block and that the part-of-
speech of the word is determined by the last part of the word (specified by
bold subscript in each block). On this basis, there are four types of suffixes in Bangla:
noun suffixes, verb suffixes, adjective suffixes and adverb suffixes.
The list of suffixes for nouns encompasses the total set of inflections used for nouns. These
suffixes include gender, number, person and particle markers and case terminations. Because
of their bound nature, they cannot be used independently in the text. From the corpus it is
found that the total number of suffixes used with nouns is around 75. Among them singular
markers are 5, plural markers are 41, gender markers are 25 and particle markers are 3. These
inflections are used at the end of words.
The concept of person is absent for nouns. Therefore, all nouns are considered to
belong to third person only. For denoting number, nouns have two types of suffixes: singular
(5) and plural (41). To denote duality of nouns, some adjectives with dual sense ([ubhay]
"both", [^authi] "both" etc.) are used with nouns. Sometimes, some cardinal adjectives
([du], [dui], [duti] "two" etc.) are used before nouns to denote the same sense of duality. The
suffixes denoting singularity are used both for animate and inanimate nouns, whereas the
suffixes denoting plurality have some specifications in use. The suffixes denoting singularity
are: [-ti], [-ta], [-te], [-khani] and [-khana].
Out of total number of plural suffix markers, 13 markers are used for animate nouns,
17 markers are used for inanimate nouns and the remaining 11 markers are used for both
animate and inanimate nouns. The suffixes denoting plurality of nouns are given below:
Animate   : [-kul], [-gin], [-jin], [-dig&], [-der], [-pal], [-bSrgi], [-brndi], [-bratH],
            [-m&hal], [-^utha], [-ra], [-s&nghi]
Inanimate : [-adi], [-ab&li], [-gach], [-gacha], [-gachi], [-gucchi], [-gram], [-jal],
            [-dam], [-cdy], [-nik&r], [-nic&y], [-pufij&], [-br§jA], [-mala], [-raji],
            [-rasi]
Common    : [-era], [-guli], [-gula], [-gulo], [-dAl], [-m&ndil], [-m&nd&ll], [-sreni],
            [-sik&I], [-s&b], [-samuh&]
The Bangla nouns have retained some specific gender markers which are mostly
borrowed from the Sanskrit and the Persian languages. Unlike Hindi, the gender markers in
nouns in Bangla have no impact on the conjugation systems of verbs, because Bangla does not
have any grammatical gender. The gender value is mentioned by an adjective which precedes
the noun, e.g. [meye manus] "women" or [m&hila k&bi] "poetess" etc.
Because of their bound nature, these markers are always dependent on base nouns. However,
in the corpus texts, some specific gender markers (nearly 20) are found. The analysis shows
that these gender markers are mostly feminine (nearly 15), though there are a few specific
markers for masculine (nearly 5), and both these are used for Bangla adjectives and nouns.
The feminine gender markers found in the corpus texts are: [-a], [-ani/-anl], [-ini/-inl],
[-i/-I], [-ika], [-iya], [-ima], [-ni], [-trl], [-Mtl], [-an], [-si], [-mitl] etc.
In the corpus texts, two types of particles are found: (i) emphatic, and (ii) negative.
Emphatic particles ([i] and [o]) are generally used immediately after nouns without any
blank space in between. Moreover, an emphatic marker sometimes becomes a part of a word
by changing into an allograph (e.g. etc.). This kind of mutation takes place
only when a word-final consonant grapheme is without any allograph and non-vocalic in
utterance. The negative particle markers, however, are very rarely used with nouns.
The use of case terminations is a unique property of nouns. Because of the inflectional
nature of the language, some extra information regarding case relations is supplied with
nouns by the use of case terminations. Generally, only one case marker is used at a time with a
noun (e.g. [bal&k + -er] "of boy"), though on some occasions more than one case marker is
observed to be used at a time with a single noun. For instance, the noun
[lokederke] "to the people" ([lok + -e + -der + -ke]) has three
case markers: (i) [-e] (accusative), (ii) [-der] (genitive with a sense of plurality), and
(iii) [-ke] (accusative). From the corpus texts it is counted that the case terminations used
for nouns are around 32 in number. These markers can be used with a noun depending on its
structure and case in the sentence. Table 5.10 below lists the case markers for nouns.
Table 5.10
Case markers for nouns
Case           Markers
Nominative     [-0], [-e], [-ye], [-y], [-te]
Accusative     [-0], [-e], [-ye], [-y], [-ke], [-re]
Instrumental   [-0], [-e], [-y]
Dative         [-0], [-ke], [-te]
Ablative       [-0], [-e], [-ke], [-te]
Genitive       [-0], [-r], [-ar], [-er], [-ker], [-kar]
Locative       [-0], [-e], [-ye], [-y], [-te]
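
Since the same marker recurs under several cases, a parser can at best propose a set of candidate cases for a marker and leave the choice to sentence context. The sketch below transcribes Table 5.10 into a dictionary (the null marker is written "0") and inverts it; the names `CASE_MARKERS` and `cases_for` are assumptions made for this illustration.

```python
# Table 5.10 as data; "0" stands for the null marker.
CASE_MARKERS = {
    "Nominative":   ["0", "e", "ye", "y", "te"],
    "Accusative":   ["0", "e", "ye", "y", "ke", "re"],
    "Instrumental": ["0", "e", "y"],
    "Dative":       ["0", "ke", "te"],
    "Ablative":     ["0", "e", "ke", "te"],
    "Genitive":     ["0", "r", "ar", "er", "ker", "kar"],
    "Locative":     ["0", "e", "ye", "y", "te"],
}


def cases_for(marker):
    """Return every case under which `marker` is listed in Table 5.10."""
    return [case for case, markers in CASE_MARKERS.items() if marker in markers]


print(cases_for("er"))   # listed under one case only
print(cases_for("te"))   # listed under several cases
print(sum(len(markers) for markers in CASE_MARKERS.values()))   # 32 entries in all
```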
Generally, the case markers perform the inflectional roles of the words. Therefore, in
most cases, the addition of a case marker to a word-final character does not cause any
notable morpho-phonemic change in the word. However, for the case marker [-er] it is
observed that (i) the first character ([-e]) is retained if the last character of the noun is a
vowel, consonant or cluster grapheme (e.g. [kak + -er > kaker]), or (ii) it is
dropped if the last character of the noun is a vowel allograph (e.g. [matha + -er > mathar],
[g&ti + -er > g&tir], [n&di + -er > n&dir], [m&dhu + -er > m&dhur],
[b&dhu + -er > b&dhur], [chele + -er > cheler], [alo + -er > alor] etc.). The examples show
that at the time of using genitive case markers the final character of the noun generally
dominates in the generation of a surface inflected form.
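
Because the rule refers to graphemes, it is most naturally sketched on Unicode Bengali text, where a vowel allograph corresponds to a dependent vowel sign. The sketch below is an assumption-laden illustration: the function name `genitive` is hypothetical, the test words are spelled out by code point from the transliterations above, and nouns ending in a full (independent) vowel letter are deliberately not handled, since the text gives no written example for them.

```python
# Dependent vowel signs ("vowel allographs") of the Unicode Bengali block.
VOWEL_SIGNS = set("\u09be\u09bf\u09c0\u09c1\u09c2\u09c3\u09c7\u09c8\u09cb\u09cc")
# Independent vowel letters occupy U+0985..U+0994.
INDEPENDENT_VOWELS = {chr(cp) for cp in range(0x0985, 0x0995)}
E_SIGN = "\u09c7"   # vowel sign e
RA = "\u09b0"       # letter ra


def genitive(noun):
    """Attach the genitive marker [-er] to a noun written in Bengali script."""
    last = noun[-1]
    if last in VOWEL_SIGNS:
        # Case (ii): the [-e] of the marker is dropped after a vowel allograph.
        return noun + RA
    if last in INDEPENDENT_VOWELS:
        # Not modelled in this sketch (no written example in the text).
        raise ValueError("nouns ending in a full vowel letter are not handled")
    # Case (i), consonant-final nouns: the [-e] of the marker is retained.
    return noun + E_SIGN + RA


kak = "\u0995\u09be\u0995"            # [kak]
matha = "\u09ae\u09be\u09a5\u09be"    # [matha]
print(genitive(kak))      # kak + vowel sign e + ra   ([kaker])
print(genitive(matha))    # matha + ra                ([mathar])
```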
The system of using suffixes with nouns is applicable to pronouns also. The lists of
pronominal suffixes include person, gender, number and particle markers and case
terminations. The pronouns have three persons (first, second and third), though the number of
person markers is 2 ([-0] (null) and [-i]). Of these, [-i] denotes singularity of person and
is used with all personal pronoun roots irrespective of first, second or third person, as in
[ami] "I", [tumi] "you", [apni] "you", [tini] "he" etc. Moreover, it does not
cause any morpho-phonemic change in the surface pronominal forms.
For denoting number, the language has two types of suffix: singular and plural. These forms
generally indicate the nominative case, as there are very few specific nominative case markers
for pronouns. Pronouns show a peculiarity in their use of number suffixes. A pronominal form
denoting singularity can take a plural suffix (e.g. আমারগুলো [amar(sg) + -gulo(pl)] "those of
mine" etc.), while a pronominal form denoting plurality can take a singular suffix
(e.g. তাদেরটা [tader(pl) + -ta(sg)] "theirs" etc.), which leads us to consider আমারগুলো
[amargulo] a plural form and তাদেরটা [taderta] a singular one. Therefore, the actual
number denoted by a pronominal form is determined by the number denoted by the suffix.
This information helps in the automatic determination of the number of surface pronominal forms.
The notable feature of the suffixes denoting singularity is that none of them is used
with personal (human) pronoun roots except সে [se], where it denotes a non-human object, as
in সেটুকু [setuku] "that only", সেটি [seti] "that" etc. Among the singular suffixes, খানা
[-khana] and খানি [-khani] are sometimes written detached from the pronominal root, as in
সে খানা [se khana] "that", কোন খানি [kon khani] "which" etc. To denote duality the word উভয়
[ubh&y] "both" is used as a pronoun in the language. Moreover, some cardinal adjectives
(দুটি [duti], দুটো [duto], দুটা [duta] "two" etc.) are used after pronouns, as in সেদুটি
[seduti], সেদুটো [seduto], সেদুটা [seduta] "those two" etc.
The suffixes for plurality are used with both personal and impersonal pronominal
roots, though with personal pronominal roots like মু- [mu-], তু- [tu-] and তুম- [tum-]
they cause a morpho-phonemic change in the surface forms. Among them, -কটা [-k&ta] and
-কটি [-k&ti] are probably derived from কয়েকটা [k&yekta] and কয়েকটি [k&yekti] "some"
respectively. Moreover, the plural forms, like the singular forms, make no discrimination
when used with personal and impersonal pronouns. The list of pronoun suffixes denoting
number is given in table (5.11).
Table 5.11
Suffixes denoting number for pronouns
Singular    -টা [-ta], -টি [-ti], -টে [-te], -টুকু [-tuku], -খান [-khan], -খানা [-khana],
            -খানি [-khani]
Plural      -েরা [-era], -রা [-ra], -কটা [-k&ta], -কটি [-k&ti], -গুলি [-guli],
            -গুলা [-gula], -গুলো [-gulo], -দিগ [-dig&], -দের [-der]
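The observation that the suffix, not the pronominal base, decides the number of a surface form lends itself to a simple longest-suffix lookup. The sketch below is an assumption-laden illustration, not the procedure used in the thesis: it works on romanised forms, uses only a subset of the suffixes in Table 5.11, and the name `pronoun_number` is hypothetical.

```python
# Romanised subset of the number suffixes in Table 5.11 (illustrative only).
NUMBER_SUFFIXES = {
    "singular": ["ta", "ti", "tuku", "khan", "khana", "khani"],
    "plural": ["era", "ra", "guli", "gula", "gulo", "der"],
}


def pronoun_number(form):
    """Return 'singular', 'plural' or None from the longest matching suffix."""
    best = None  # (suffix length, number label) of the best match so far
    for number, suffixes in NUMBER_SUFFIXES.items():
        for suffix in suffixes:
            # Require some base material before the suffix.
            if form.endswith(suffix) and len(form) > len(suffix):
                if best is None or len(suffix) > best[0]:
                    best = (len(suffix), number)
    return best[1] if best else None


if __name__ == "__main__":
    for form in ["amargulo", "taderta", "tader", "setuku", "ami"]:
        print(form, "->", pronoun_number(form))
```

Note that `taderta` comes out as singular even though its base `tader` is plural, which mirrors the point made in the text.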
The concept of gender division is also irrelevant for pronouns in Bangla because there
are no specific gender markers for the pronominal forms. Moreover, gender has no impact
on verbal conjugation in the language.
The particles used for nouns are also used with pronouns in the language. Among
them, the emphatic particles (ই [i] and ও [o]) are generally used after pronouns without any
blank space in between, while the negative particles (না [na], নি [ni] and নে [ne] "no") are
used neither with nouns nor with pronouns. Moreover, as with nouns, the particles sometimes
become part of a surface pronominal form (e.g. [kei < keu + -i] "whom" or
[kono < koni + -o] "any" etc.).
The use of case markers with pronouns is broadly similar to that with nouns in the
language. However, for pronouns, markers are available only for the nominative, accusative,
genitive and locative cases, while for nouns markers are available for all seven cases.
Moreover, as with nouns, the markers for the accusative case are also used to denote the
dative case. Another interesting point is that the marker -দের [-der] (probably derived from
-দিগের [-diger] < -দিগ [-dig&(pl)] + -ের [-er](gen)) is used with both nouns and pronouns as
a genitive plural marker, as in তাদের [tader] "of them" etc. The list of case markers used
with pronouns is given in table (5.12).
Table 5.12
Case markers used with pronouns
Nominative  [-0], -ে [-e], -য় [-y], -কে [-ke], -রে [-re]
Accusative  -েরে [-ere], -কে [-ke], -য় [-y], -ে [-e], -েরে [-ere], -রে [-re]
Genitive    -ার [-ar], -ের [-er], -র [-r], -দের [-der]
Locative    -েতে [-ete], -য় [-y], -তে [-te], -ে [-e], -িতে [-ite]
After structural analysis of the conjugated verbs, it is observed that the verb suffixes include
aspect, auxiliary, tense, number, person markers and particles. However, not all of these
suffixes are used with every conjugated verb. Their use mostly depends on the
grammatical sub-classes (person, number, tense etc.) of the verbs.
"The force of affixed themes was to indicate the aspect or nature of the action
whether it was progressive or transitory, iterative or intensive or indefinite".
The aspect markers used with the conjugated verb forms in Bangla are: -িতে [-ite] (e.g.
করিতেছি [k&ritechi] "I/we are doing"), -িয়া [-iya] (e.g. শুনিয়াছ [suniyach&] "you have heard"),
-োয়া [-oya] (e.g. ধোয়াতাম [dhoyatam] "I/we caused to wash"), -য়া [-ya] (e.g. নিয়াছি [niyachi]
"I/we have taken"), -য়ে [-ye] (e.g. দিয়েছি [diyechi] "I/we have given"). They are used
immediately after the verb root in the conjugated forms. However, the forms can vary
depending on the structure of the verb root.
The auxiliary markers are generally used immediately after the aspect markers in a
conjugated verb form. They have two forms: -ছ [-ch] is used with verb roots ending in a
consonant grapheme, as in করছি [k&rchi] "I am/we are doing", while -চ্ছ [-cch] is used with
verb roots ending in [-&], -া [-a], -ি [-i] and -ু [-u] (e.g. হচ্ছে [h&cche] "being done",
খাচ্ছি [khacchi] "I am/we are eating", দিচ্ছি [dicchi] "I am/we are giving" and ধুচ্ছে [dhucche]
"s/he is washing" etc.). These markers are quite helpful for the automatic identification of
conjugated verb forms.
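The choice between the two auxiliary forms depends only on the final character of the verb root, so it can be stated as a one-line test. The following sketch assumes romanised roots in which `&` stands for the inherent vowel, as in the transliterations above; the function name `progressive_form` and the way the person marker is passed in are illustrative assumptions.

```python
# Vowel characters of the romanisation assumed here ('&' is the inherent vowel).
VOWELS = set("aeiou&")


def progressive_form(root, person_marker):
    """Join root, auxiliary marker and person marker into a surface form."""
    # -cch follows a vowel-final root, -ch follows a consonant-final root.
    auxiliary = "cch" if root[-1] in VOWELS else "ch"
    return root + auxiliary + person_marker


if __name__ == "__main__":
    for root, marker in [("k&r", "i"), ("kha", "i"), ("di", "i"), ("dhu", "e")]:
        print(root, "+", marker, "->", progressive_form(root, marker))
```

Read in reverse, the same test is what makes the markers useful for identification: a surface form containing `cch` points to a vowel-final root.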
The tense is a deictic category that places a situation in time with respect to the
moment of speech, or occasionally with respect to some other pre-established point in time
(Bybee 1985: 21). It is observed that nouns usually refer to time-stable entities, while verbs
refer to situations that are not time-stable. Thus it is the verb that needs to be placed in time if
the event or situation is to be placed in time, since the entities involved in the situation
usually exist both prior to and after the referred to situation. However, this distinction in the
form of tense does not affect the meaning of the verb (Bybee 1985: 22), since the situation
referred to by the verb remains the same whether it is said to occur in the present or the past
or will be occurring in the future.
In Bangla there are three specific tense markers for the conjugated verb forms:
(i) -িল / -ল [-il& / -l&] for past tense (e.g. করিল / করল [k&ril& / k&rl&] etc.),
(ii) -িত / -ত [-it& / -t&] for habitual past tense (e.g. করিত / করত [k&rit& / k&rt&] etc.), and
(iii) -িব / -ব [-ib& / -b&] for future tense (e.g. করিব / করব [k&rib& / k&rb&] etc.).
However, there is no specific tense marker for the present tense. These tense markers are
always attached to the verb roots. Moreover, these forms are not number dependent, i.e. the
forms do not differ depending on the number of the subject. The concept of person plays a
very important role in the conjugated verb forms because the final forms of the verbs change
depending on this property. Table (5.13) lists the different forms of person markers.
Table 5.13
Person markers for the verb forms
Person       Present      Past          Future        Habitual       Imperative
1st          -ি [-i]      -লাম [-lam]   -ব [-b&]      -তাম [-tam]    [-0]
2nd          -ো [-o]      -লে [-le]     -বে [-be]     -তে [-te]      -ো [-o]
2nd (n-hn)   [-0]         -লি [-li]     -বি [-bi]     -তিস [-tis]    -িস [-is]
2nd (hn)     -েন [-en]    -লেন [-len]   -বেন [-ben]   -তেন [-ten]    -ুন [-un]
3rd (n-hn)   -ে [-e]      -ল [-l&]      -বে [-be]     -ত [-t&]       -ুক [-uk]
3rd (hn)     -েন [-en]    -লেন [-len]   -বেন [-ben]   -তেন [-ten]    -ুন [-un]
In conjugated verbs, each person marker (1st, 2nd and 3rd) differs from the
others. Even within the 2nd person, the markers vary depending on whether the person
is general, honorific or non-honorific. The same variation between honorific and
non-honorific is noted for the 3rd person also. Moreover, the person markers also vary
depending on the tense of the verbs.
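Because the person markers vary with both person and tense, a word-final marker can be used to propose tense and person candidates for a conjugated form. The sketch below is a hedged illustration only: it uses a romanised subset of Table 5.13, the name `analyse_verb` is hypothetical, and a longest-match strategy is assumed rather than taken from the thesis.

```python
# Romanised subset of the person markers in Table 5.13 (illustrative only).
PERSON_MARKERS = {
    "i": [("present", "1st")],
    "o": [("present", "2nd"), ("imperative", "2nd")],
    "e": [("present", "3rd (n-hn)")],
    "en": [("present", "2nd (hn)"), ("present", "3rd (hn)")],
    "lam": [("past", "1st")],
    "le": [("past", "2nd")],
    "len": [("past", "2nd (hn)"), ("past", "3rd (hn)")],
    "be": [("future", "2nd"), ("future", "3rd (n-hn)")],
    "ben": [("future", "2nd (hn)"), ("future", "3rd (hn)")],
    "tam": [("habitual", "1st")],
    "ten": [("habitual", "2nd (hn)"), ("habitual", "3rd (hn)")],
}


def analyse_verb(form):
    """Strip the longest known person marker; return (stem, candidates)."""
    for marker in sorted(PERSON_MARKERS, key=len, reverse=True):
        if form.endswith(marker) and len(form) > len(marker):
            return form[: -len(marker)], PERSON_MARKERS[marker]
    return form, []


if __name__ == "__main__":
    for form in ["k&rlam", "k&rben", "k&ri"]:
        print(form, "->", analyse_verb(form))
```

Where the table lists the same marker for the honorific 2nd and 3rd persons, the lookup returns both candidates, so any final choice would have to come from sentence context.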
Both types of Bangla particles are very often used with the conjugated verbs. Among
them the emphatic particles (ই [i] and ও [o]) are generally used immediately after the
conjugated verbs without any space in between, as in করেই [k&rei] or করেও [k&reo] "also does"
etc. However, on some occasions, they are used even in the middle of a conjugated verb form,
as in করেইছি [k&reichi] or করেওছি [k&reochi] "I/we also have done" etc. The other
emphatic particle (না [na] "indeed") is structurally the same as the negative particle না [na]
"no", but functionally different from it. This particle has its origin in the Sanskrit form
নাম [nam&] "indeed". It is mostly used in the sense of emphasis in the sentence, as in
তুমি না খুব ভাল [tumi na khub bhala] "you are indeed very good" or তুমি না যাবে বলেছিলে?
[tumi na yabe b&lechile?] "you indeed wanted to go" etc.
The negative particles (না [na], নি [ni] and নে [ne] "no") are used in the sense of
negation in the sentence. They are generally used after the verb forms. The way these
particles are written in the text creates some problems for the analysis of words by the
machine. It is found that না [na] and নি [ni] are sometimes used after the verb form with a
blank space, as in যাব না [yab& na] "I/we will not go" or খাই নি [khai ni] "I have not eaten"
etc. But on some occasions, they are attached to the verb forms without any blank space in
between, as in হবেনা [habena] "will not" or দেখিনি [dekhini] "I have not seen" etc. The other
form (নে [ne]) is always attached to a verb form without a space in between, as in জানিনে
[janine] "I don't know" etc. However, this form is very rarely used in prose text, and the
purpose of its use is to evoke a poetic flavour in the statement.
In a conditional statement the negative particle না [na] is used before the
non-finite verb, as in না বলে [na b&le] "not saying" etc. Here also the negative particle
না [na] is sometimes attached to the word and sometimes detached from it with a blank space
in between. The other two forms (নি [ni] and নে [ne]) are not generally used in this type of
context in the text.
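Since the negative particles may be written either attached to or detached from the verb, one way to reduce the problem for a machine is to bring both spellings to a single token sequence before analysis. The sketch below assumes romanised tokens and a hypothetical set of known verb forms (`verb_forms`); it is one possible normalisation, not the method of the thesis.

```python
NEGATIVE_PARTICLES = ("na", "ni", "ne")


def detach_negatives(tokens, verb_forms):
    """Split an attached negative particle off a token if a known verb remains."""
    normalised = []
    for token in tokens:
        for particle in NEGATIVE_PARTICLES:
            stem = token[: -len(particle)]
            # Split only when what is left is a known verb form, so that
            # ordinary words which merely end in -na/-ni/-ne stay intact.
            if token.endswith(particle) and stem in verb_forms:
                normalised.extend([stem, particle])
                break
        else:
            normalised.append(token)
    return normalised


if __name__ == "__main__":
    verbs = {"dekhi", "khai", "jani"}  # hypothetical verb-form lexicon
    print(detach_negatives(["ami", "dekhini"], verbs))
    print(detach_negatives(["khai", "ni"], verbs))
```

The lexicon check is the important design choice: without it, every word ending in one of the three particles would be split.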
After structural analysis of the adjective forms in the corpus text it is found that a majority of
the nouns or adjectives use specific adjectival suffixes to generate surface adjective forms.
These adjectival suffixes denote gender and adjectival quality of the words. Adjectival
suffixes are of two types.
The first type of adjectival suffixes are bound morphemes in nature and therefore are
not generally used as separate surface word forms in the text. They are generally used with
nouns for generating a surface adjective form. The addition of a suffix to the word
generally causes a morpho-phonemic change either by derivation (pr&tay) or by sandhi.
These can be called primary suffixes (derivational in nature). In the corpus text we have
found nearly 60 such suffixes. (Details are given in Appendix - ).
The second type of adjectival suffixes are also bound morphemes in nature and therefore
are never used as separate surface word forms in the text. However, these are compounding in
nature because joining these forms with any noun or adjective produces an
adjectival compound. They can be called secondary suffixes (inflectional in nature). Among
these suffixes some are highly productive while some are less productive. The addition of
these at the end of a word may cause a morpho-phonemic change. There is a kind of mapping
for their joining with nouns or adjectives. In the corpus text around 50 such suffixes are
found. (Details are given in Appendix - II).
Besides, there are some simple as well as participial adjective forms (mostly derived
from nouns and verbs through passivisation) which are also used for generating compound
adjectives. These forms (around 200 in number) are free morphemes by nature, as they can be
used as separate surface word forms in the text. Moreover, like secondary suffixes, they are
compounding in nature because joining these forms with any noun or adjective
produces a compound adjective. (Details are given in Appendix - III).
5.4.3. Infixes
Probably, Bangla has no infix form in the true sense of the term. However, it has a form
(-ী- [-ī-]) which is used in the middle of a word whenever an adjectival suffix is added to
a noun by derivation. For instance:
Moreover, some verbal nouns (gerunds) have a link vowel (-া- [-a-]) inserted
between two morphemes while the second part of the form takes the affix -ি [-i] (Chatterji
1993: 1048), as in জানাজানি [janajani] "knowing", মারামারি [maramari] "fighting" etc. For
convenience of morphological processing, these markers are considered infixes in the
surface word forms. Whenever such words are considered for character-based analysis,
these markers are extracted as infixes from the surface word forms.
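The link vowel in such gerunds follows a regular shape (base + -a- + base + -i), so extracting it can be sketched with a backreference. The code below works on romanised forms, and the pattern and the name `split_link_vowel` are assumptions made for illustration; a character-based analysis of Bangla script would need the corresponding allographs instead.

```python
import re

# base + link vowel 'a' + the same base again + affix 'i'
LINK_VOWEL_PATTERN = re.compile(r"^(\w+?)a\1i$")


def split_link_vowel(form):
    """Return (base, 'a', base, 'i') for forms such as 'janajani', else None."""
    match = LINK_VOWEL_PATTERN.match(form)
    if match is None:
        return None
    base = match.group(1)
    return (base, "a", base, "i")


if __name__ == "__main__":
    for form in ["janajani", "maramari", "dinkal"]:
        print(form, "->", split_link_vowel(form))
```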
5.5. Post-positions
Post-positions are those which, when used immediately after some words, develop a case
relation with them. They mostly retain their phrasal character and remain distinct as
detached words. They are equivalent to case markers because they are used to denote case
relations which could have been expressed by specific case markers. The examples below
show that in Bangla, on some occasions, a case relation can be denoted either by a case
marker or by a post-position:
(a) সে ছুরি দিয়ে আপেল কাটে or সে ছুরিতে আপেল কাটে "He cuts an apple with a knife"
[se churi diye apel kate] or [se churite apel kate]
(b) সে জলের মাঝে লাফ দিল or সে জলে লাফ দিল "He jumped into the water"
[se j&ler majhe laph dil&] or [se j&le laph dil&]
The examples given above imply that in Bangla, case markers and post-positions are
functionally the same, or serve the same morpho-syntactic roles. At least in the above
examples, replacing the post-positions with case markers does not markedly change the
meaning of the sentences, which, however, is not true in all cases. It can be assumed that,
probably, in earlier times the case markers were used to perform the same functions that the
post-positions perform at present.
In the study of Bangla post-positions, Chatterji (1993) has discussed their origin along
with their historical changes in different stages of language development, Sen (1992) has
given a list of two types of post-positional word along with their usage in the language, while
Sarkar and Basu (1994) have provided lists of post-positions along with examples to show
how they are used in the language.
It is not yet clearly known whether suffixes are generated from post-positions or vice
versa. Chatterji (1993: 766) says that post-positions originated through a process of
simplification when the case markers were lost and “the speech began to employ the accusative,
dative, ablative or locative form of suitable nouns (with the sense of location, vicinity,
direction, connexion, purpose or power) along with the principal noun which retained its
original inflexion”. But Bhattacharja (1998: 136) assumes, and explains with examples, that
the suffixes were actually derived from the post-positions. On this view, post-positions are
analytical properties of the language which, in the course of time, have changed into
inflectional properties and become attached to the preceding words.
Here the post-positions (পাশে [pase] and কাছে [kache]) are used in the sentences as
nouns with the case markers -ের [-er] and -ে [-e], respectively.
In Bangla, almost all the post-positions are derived from nouns or verbs with their
respective specific meanings. The following examples show how they are derived from two
sources:
Table 5.14
Post positions for different cases
Instrumental  করে [kare], করিয়া [kariya], কর্তৃক [k&rtrk], কারণে [kar&ne], দ্বারা [dvara],
              দরুন [d&run], দিয়া [diya], দিয়ে [diye]
Dative        জন্য [j&ny&], জন্যে [j&nye], তরে [tare], প্রতি [pr&ti], বদলে [b&d&le],
              বনাম [b&nam], বাবদ [bab&d], লাগি [lagi], লাগিয়া [lagiya]
Ablative      চাইতে [caite], চেয়ে [ceye], ছাড়া [chaRa], থেকে [theke], দিয়ে [diye],
              দিয়া [diya], বই [b&i], বিনা [bina], হইতে [h&ite], হতে [hate]
Locative      অগ্রে [agre], অপেক্ষা [apeksa], অভিমুখে [abhimukhe], আগে [age], উপরে [up&re],
              ঊর্ধ্বে [urdhe], কাছে [kache], তফাতে [t&phate], দিকে [dike], নিকটে [nik&te],
              নীচে [nice], নিম্নে [nimne], ন্যায় [nyay], ধরে [dh&re], ধরিয়া [dh&riya],
              ধারে [dhare], পক্ষে [p&kse], পশ্চাতে [p&scate], পানে [pane], পার্শ্বে [parsve],
              পাশে [pase], পিছনে [pich&ne], পিছে [piche], প্রান্তে [prante], ফলে [phale],
              বরাবর [b&rabar], বাইরে [baire], বাহিরে [bahire], ভিতর [bhit&r], ভিতরে [bhit&re],
              মধ্যে [m&dhye], মাঝে [majhe], সঙ্গে [sange], সম্মুখে [s&mmukhe], সহিত [s&hit],
              সাথে [sathe], সামনে [samne]
“Linguists often make the mistake of taking for granted the universal existence
of whatever types of compound words are current in their own language. It is
true that the main types of compound words in various languages are
somewhat similar, but this similarity is worthy of notice; moreover, the
details, and especially the restrictions, vary in different languages. The
differences are great enough to prevent our setting up any scheme of
classification that would fit all languages.”
There are two types of compound words in English from a structural or morphological
point of view: (i) the synthetic compounds, whose second member is either derived by adding
a verbal suffix (-ing, -er, -ed etc.) or is a past participle form of a verb, e.g. breath taking,
watch maker, man made, time killed etc., and (ii) the primary compounds (Marchand 1969),
which encompass all other types.
From syntactical point of view the compounds have two sets of characteristic
properties (Spencer 1991: 310).
(i) The first set makes compounding resemble a syntactic process in that it is typically
recursive. The elements of a compound may have relations to each other which
resemble the relations holding between the constituents of a sentence. For example,
(ii) The second set brings compounding closer to word formation. It points out that
compounds have a constituent structure, which in general, depends on the way the
compound is built up. For example the compound 'mass literacy campaign' can be
analysed as:
(a) [mass [literacy campaign]]
(b) [[mass literacy] campaign]
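The two analyses of 'mass literacy campaign' are simply the two possible binary bracketings of a three-word sequence. As a small illustration (the function name `bracketings` is hypothetical, and the assumption that every constituent is binary is ours), all such structures can be enumerated recursively:

```python
def bracketings(words):
    """Return every binary constituent structure over a list of words."""
    if len(words) == 1:
        return [words[0]]
    structures = []
    # Try every split point; each side is bracketed recursively.
    for split in range(1, len(words)):
        for left in bracketings(words[:split]):
            for right in bracketings(words[split:]):
                structures.append(f"[{left} {right}]")
    return structures


if __name__ == "__main__":
    for structure in bracketings(["mass", "literacy", "campaign"]):
        print(structure)
```

The number of structures grows quickly with the number of components (five for four words), which is one reason why the intended analysis cannot be read off the word string alone.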
From a functional or semantic point of view, the compounds are identified as (i) the
endocentric compounds, where one component stands for the whole or functions as a head.
Most English compounds are of this type. In English the right component gives the basic
meaning of the compound as a whole, as in mailman or blackbird. Here, the modifier
element of the compound has the function of attributing a property to the head, much like the
function of an attributive adjective, and (ii) the exocentric compounds, where none of the
components can be substituted for the whole compound. Neither component can be called the
head of the construction. Such compounds are similar to b&huvrīhi (reciprocal) compounds in
Bangla, e.g. pickpocket, lazybones, sweetheart etc. The compound sweetheart does not
imply a kind of heart which is sweet in taste but a fiancé (Jensen 1990: 99). In these
compounds one can isolate a predicate-type element (pick, lazy, sweet) and an argument-type
element (pocket, bones, heart) (Spencer 1991: 311).
The Bangla compounds were probably first discussed by William Carey (1805).
A detailed classification of Bangla compounds along with their formative and semantic
analysis was given by Chatterji (1993); the basic differences between the Bangla and the
Sanskrit compounds are briefly discussed by Sen (1992); an exhaustive discussion of the
methods of compound formation is given by Sarkar and Basu (1994); both t&ts&m& and
non-t&ts&m& compounds are dealt with by Chakravarty (1974); a comparative study of the
Sanskrit and the Greek compounds with clear indications for Bangla compound analysis is
presented by Banerjee (1997); a descriptive study of the Bangla compounds is presented by
Bhattacharya (1983) in her unpublished doctoral dissertation; while Chaki (1996) has tried to
answer what a compound is and the significance of the respective head names.
According to these scholars, a mere congregation of words into one form will not make
a compound (s&mas); there must be some amount of syntactic and semantic connection
between the compounded words. If the words are not syntactically and semantically related to
each other, they cannot form a compound (Banerjee 1997: 07). Moreover, Bangla compounds
should be discussed in between morphology and syntax because a compound is a transformed
version of a sentence of which at least two words are juxtaposed in such a way that their
semantic relationship is never understood from the structure (Bhattacharya 1997: 53).
The only arguable deficiency of the above discussions is that none of the scholars has
elaborately discussed what types of changes take place in the structure of the participating
lexical items (components) after compound formation, or whether the lexical categories of the
involving items (components) are changed in the final output. Here we will consider only the
way of their formation in relation to the formation of word structure along with their
relevance to the theory of word structure. These aspects will be taken for consideration as this
is related to automatic processing of compounds by computer. However, their syntactic and
semantic characteristics are not aimed at in the present framework of the thesis.
by a blank space (e.g. মাছ ভাজা [mach bhaja]) and sometimes as a single hyphenated word
(e.g. মাছ-ভাজা [mach-bhaja] "fish fry").
Moreover, by sandhi, two separate words can merge, thereby losing a character (basic
or allographic) of a word. For instance, the compound word বিদ্যালয় [vidyal&y] "school" is
actually formed by adding বিদ্যা [vidya] "knowledge" and আলয় [al&y] "house", where the last
character (the allograph া [a]) of the first word and the first character (the vowel আ [a])
of the last word are merged together. Similarly, the compound word নীলোৎপল [nīlotp&l] "blue
lotus" is formed by joining the word নীল [nīl] "blue" and the word উৎপল [utp&l] "lotus",
where the last character (inter-vocalic [&]) of the first word and the first character (the
vowel উ [u]) of the last word are merged to form the allograph ো [o], which is joined with
the last character [l] of the first word. Such loss or change of a character is possible if
the components pass through a process of morpho-phonemic change.
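The two mergers described above can be written as a small rule table applied at the boundary between the components. The sketch is deliberately minimal and rests on assumptions: it works on romanised strings in which the inherent vowel `&` is written out at the end of the first word, it contains only the two rules named in the text, and `sandhi_join` is a hypothetical name.

```python
# (last character of first word, first character of second word) -> merged result.
# Only the two mergers mentioned in the text are listed here.
SANDHI_RULES = {
    ("a", "a"): "a",  # vidya + al&y -> vidyal&y
    ("&", "u"): "o",  # nil& + utp&l -> nilotp&l
}


def sandhi_join(first, second):
    """Join two components, merging the boundary characters if a rule applies."""
    merged = SANDHI_RULES.get((first[-1], second[0]))
    if merged is None:
        return first + second  # no sandhi: plain concatenation
    return first[:-1] + merged + second[1:]


if __name__ == "__main__":
    print(sandhi_join("vidya", "al&y"))
    print(sandhi_join("nil&", "utp&l"))
    print(sandhi_join("din", "kal"))
```

For analysis rather than generation the table would have to be read in reverse, which is harder because one merged character can correspond to more than one underlying pair.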
(i) a compound must have at least two lexical items (components), which may or may not
be used as individual lexical items in the text.
(ii) participating components can be separated with a space in between, but taken together
they definitely denote some extra information, idea or meaning which cannot
be gathered from the meanings of the participating lexical items put together.
(iii) the compounds resemble single words because they are often lexicalised. They are
often subject to the semantic drift associated with stored words, which means that their
meaning becomes non-compositional or even totally idiosyncratic. For instance, the
compounds শূলপাণি [sulapani] and বীণাপাণি [vīnapani] no longer indicate a person
having a spear/lyre in his/her hand. Rather they denote Lord Shiva and Goddess
Saraswati respectively, who have been associated with these compounds from time
immemorial through mythological events or beliefs.
(iv) there are often lexical restrictions on which compounds may be formed
in Bangla. For instance, one can write শবদাহ [s&bdah&] "burning of a dead
body" or শবসাধনা [s&bsadhana] "meditating over a dead body" but cannot say
[m&Radah&] or [m&Rasadhana], though the respective meanings of these two
compounds would be identical with the earlier two examples. In a similar fashion, one can
say বৃষ্টিপাত [brstipat] "rain fall", বারিপাত [baripat] "rain fall", তুষারপাত [tusarpat] "snow
fall", শিশিরপাত [sisirpat] "dew fall", অশ্রুপাত [asrupat] "tears shed" or even
রক্তপাত [r&kt&pat] "blood shed" but cannot say [jalpat], [nīrpat],
[panipat] or [kannapat] etc., though in all cases the meaning is nearly the same,
denoting the "fall or shedding of rain, water or tears" etc.
In standard Bangla grammar the compounds are analysed from a semantic rather than a
structural point of view. Six types of compounds are identified, namely copulative (dv&ndv&),
determinative (t&tpurus), numeral (dvigu), descriptive (k&rm&dhar&y), adverbial (aby&yībhab)
and reciprocal (b&huvrīhi) compounds. Patanjali in his M&habhasy& (2/1/6) has given a
semantic interpretation of the Sanskrit compounds which seems appropriate to the Bangla
compounds. According to his analysis, in copulative (dv&ndv&) compounds both
components are of equal importance; in adverbial (aby&yībhab) and numeral (dvigu) compounds
the first component is important and carries the semantic load of the compound; in
determinative (t&tpurus) and descriptive (k&rm&dhar&y) compounds the last component carries
all the importance; whereas in reciprocal (b&huvrīhi) compounds something other than the
involved components becomes important.
In the history of the study of compounds in Bangla we find that Chatterji (1993: 176) has
semantically divided compounds into three main divisions: collective, determinative and
descriptive. The determinative compound is again divided into three sub-groups: determinatives
with one element governing another (t&tpurus), appositional determinatives (k&rm&dhar&y)
and numeral determinatives (dvigu). The collective compound includes copulative (dv&ndv&)
compounds and similar other words, the determinative compound includes 'up&p&d-t&tpurus',
'aluk-t&tpurus', 'n&n-t&tpurus', 'pradi-s&mas', 'nity&-s&mas', 'aby&yībhab' and 'supsupa',
while the descriptive compound includes reciprocal compounds.
Type 1: none of the components is inflected; only the bare stems are combined in
compounds, e.g. ভালমন্দ [bhal&m&nd&] "good or bad", সাদাকাল [sadakal&] "black
and white" etc.
Type 2: both the components are inflected, e.g. হাটেবাজারে [hatebajare] "in markets and
other places", পথেঘাটে [p&theghate] "in road or fields" etc.
Type 3: the first component is inflected while the last one is intact, e.g.
হাতেখড়ি [hatekh&Ri] "holding chalk in hand" (first step of formal education),
গায়েহলুদ [gayeh&lud] "smearing turmeric paste on the body" (a wedding ritual) etc.,
and
Type 4: the last component is inflected while the first one is intact, e.g. লালপেড়ে
[lalpeRe] "with red border", ছড়িহাতে [ch&Rihate] "with cane in hand" etc.
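The four types differ only in which component carries an inflection, so once a compound has been split the type follows from two yes/no tests. The sketch below assumes romanised components and a small, hypothetical stem lexicon (`STEMS`); treating every form that is not a listed stem as inflected is a crude stand-in for real morphological analysis.

```python
# Hypothetical stem lexicon in romanised form; a real system would need a dictionary.
STEMS = {"bhal&", "m&nd&", "hat", "bajar", "kh&Ri", "lal"}

# (first component inflected?, last component inflected?) -> type number
COMPOUND_TYPES = {
    (False, False): 1,
    (True, True): 2,
    (True, False): 3,
    (False, True): 4,
}


def is_inflected(component, stems=STEMS):
    """Crude test: a component counts as inflected if it is not a bare stem."""
    return component not in stems


def compound_type(first, last, stems=STEMS):
    """Classify a two-part compound into Type 1 to Type 4."""
    return COMPOUND_TYPES[(is_inflected(first, stems), is_inflected(last, stems))]


if __name__ == "__main__":
    print(compound_type("bhal&", "m&nd&"))  # both bare stems
    print(compound_type("hate", "bajare"))  # both inflected
```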
The compounds display a type of word structure which is made up of two or more
constituents, each belonging to one of the lexical categories: noun, adjective, verb or adverb.
Compounds of three or more words are usually produced by combining a compound with
another word or with another compound. For instance, the compound হাজারহাতকালী
[hajarhatkalī] "the Goddess Kali who has a garland of a thousand cut-off hands" consists of
three words, where the first two words produce a compound and the last word is added to
that compound in the second stage. The regular process of compound formation in
Bangla is given in table (5.15) below with examples:
Table (5.15)
Process of compound formation in Bangla
No.  Process                          Example                                                        Gloss
1    noun + noun                      দিন + কাল → দিনকাল [din + kal] → [dinkal]                      days and times
2    noun + adjective                 বেদনা + বিধুর → বেদনাবিধুর [bed&na + bidhur] → [bed&nabidhur]    full of sorrow
3    noun + adjective (adj < verb)    চাল + ভাজা → চালভাজা [cal + bhaja] → [calbhaja]                 parched rice
4    adjective + noun                 ছোট + লোক → ছোটলোক [chot& + lok] → [chot&lok]                  mean person
5    adjective + adjective            কঠিন + কোমল → কঠিনকোমল [k&thin + kom&l] → [k&thinkom&l]        hard and soft
6    adjective (cardinal) + noun      দো + তলা → দোতলা [do + tala] → [dotala]                        second floor
7    finite verb + finite verb        চলা + ফেরা → চলাফেরা [cala + phera] → [calaphera]               moving around
8    prefix + noun                    কু- + নজর → কুনজর [ku- + n&j&r] → [kun&j&r]                    bad look
9    prefix + adjective               সু- + চতুর → সুচতুর [su- + c&tur] → [suc&tur]                   very intelligent
Probably, the English term compound has become synonymous with the Bangla term s&mas.
Therefore, the study of compound words is mainly centred on the noun and adjective
compounds available in the language. Quite logically, the study of compound verbs is
considered a separate area of investigation having little connection with compound nouns
or adjectives. There have been some efforts to discuss compound verbs in Bangla in isolation
from the general scope of compounds. Chatterji (1926/1993) has considered compound verbs
as “a remarkable idiomatic use of verb roots in connexion with a noun or a verbal conjunctive
or participle” (1993: 1049). He has observed (1993: 1050):
“... these compound verbs supply to some extent the want of modal and
temporal affixes, and are as characteristic of the modern Indo-Aryan speeches
as the ‘aspects’ of the verb in the Slav languages”
He has also noted that the inflected root is properly an auxiliary one which is modified
by a preceding noun or by a participle. Moreover, he has classified the compound verbs in
accordance with their semantic or aspectual peculiarities and the usage of the auxiliary or the
subsidiary verbs attached to the preceding verb. Sarkar (1976) has given a detailed structural
description of the compound verbs with rules for compound verb generation. Dasgupta (1977)
has defined compound verbs and has pointed out their constructional homonymy, finally
modifying the traditional definition of compound verbs. Dakshi (1998) has analysed the
compound verbs to show how aspectual function is part and parcel of the Bangla verb
structure.
the vector are made of verb roots. “Of the two constituents of a compound verb, the vector is
inflected for tense, mood, aspect, degree of honour, and person, while the pole invariably
ends in -ে [-e]” (Dasgupta 1977: 69). But Chatterji has shown that the pole can end in
-য়ে [-ye], -ে [-e], -িতে [-ite] and -তে [-te] if the verb is with an infinitive, and in
-িয়া [-iya] if the verb is with a gerund (Chatterji 1993: 1050). Sarkar (1994: 211) has cited
examples where the pole ends in -িয়ে [-iye], -য়ে [-ye], -ে [-e], -িতে [-ite] or -তে [-te].
Dakshi’s (1998: 52) list adds one more ending for the pole, namely -িয়া [-iya], to the
existing list.
Usually, a compound verb can be converted into a single-word verb without any major
change in the sense it denotes (Chaudhuri et al. 1997). However, there are
exceptions where such simplification cannot be done. This is particularly so in future and
past perfect or continuous tense situations. Also, in a Bangla sentence, a verb with an
infinitive suffix marker is reduplicated to represent continuity of action. For the purpose of
automatic identification of compound words, the computer needs to observe a few sequential
words in a sentence. There is a small subset (usually 25) of verb roots which are generally
used as the vector in compound verb formation: √আছ [ach], √আন [an], √চা [ca], √আস [as],
√বেড়া [beRa], √বস [b&s], √চল [c&l], √দেখ [dekh], √দাঁড়া [dãRa], √দে [de], √নে [ne],
√যা [ya], √ওঠ [oth], √ফেল [phel], √পড় [p&R], √র [r&], √লাগ [lag], √তোল [tol],
√রাখ [rakh], √থাক [thak], √পা [pa], √পার [par], √মার [mar] etc. However, the analysis of
the syntactic aspect of compound verbs is beyond the scope of this thesis.
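The idea of observing a few sequential words can be illustrated with a scan over adjacent token pairs: a candidate pole must carry one of the pole endings cited above, and the following word must begin with one of the vector roots. This is a rough sketch on romanised tokens, using only part of the root list; the name `find_compound_verbs` is hypothetical, and the test over-generates because short roots and the bare ending -e match many unrelated words.

```python
# Pole endings cited in the text, romanised.
POLE_ENDINGS = ("iya", "iye", "ite", "ye", "te", "e")

# Romanised subset of the vector roots listed in the text (illustrative only).
VECTOR_ROOTS = (
    "ach", "an", "as", "b&s", "c&l", "dekh", "de", "ne", "oth", "phel",
    "p&R", "lag", "tol", "rakh", "thak", "pa", "par", "mar",
)


def find_compound_verbs(tokens):
    """Return (pole, vector) candidates among adjacent tokens."""
    candidates = []
    for pole, vector in zip(tokens, tokens[1:]):
        # str.endswith and str.startswith both accept a tuple of alternatives.
        if pole.endswith(POLE_ENDINGS) and vector.startswith(VECTOR_ROOTS):
            candidates.append((pole, vector))
    return candidates


if __name__ == "__main__":
    print(find_compound_verbs(["ami", "kheye", "phellam"]))
```

A practical identifier would add a check that the first word really is a verb form, for example against a lexicon of verb roots.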
In recent years reduplication has created a good deal of interest among generative
phonologists and morphologists because it has both morphological and phonological aspects
which are important to morpho-phonological studies of lexemes (Spencer 1991: 151). Probably
this has influenced Sapir (1921: 76) to comment:
However, Spencer holds a slightly different view about reduplication. While Sapir
gives emphasis on the semantic role of reduplication, Spencer emphasises on the additions of
morphemes with the base. Therefore, he (1991: 13) observes:
There are two types of reduplicated words in English. One is complete reduplication,
where an entire word is reduplicated. It is considered a form of compounding, in which the
word is compounded with itself, as in goody-goody, pooh-pooh, thick-thick etc., where the
second part is simply a repetition of the first. The other is partial reduplication, where
only a part is reduplicated. The reduplicated part may be prefixed, suffixed or infixed to the
original word. For example, in reduplicated words like dilly-dally, bit-bat, hum-drum,
riff-raff, sing-song, roly-poly etc., the second part is generated either phonetically or
orthographically from the first part.
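The distinction between complete and partial reduplication can be sketched in Python for hyphenated English forms. The function names and the overlap threshold (shared letters covering at least half of the shorter part) are assumptions chosen for illustration; they are a rough orthographic heuristic, not a linguistic definition.

```python
def common_prefix_len(a, b):
    """Number of leading characters that a and b share."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def classify_reduplication(word):
    """Return 'complete', 'partial' or None for a hyphenated word."""
    parts = word.lower().split("-")
    if len(parts) != 2 or not all(parts):
        return None
    first, second = parts
    if first == second:
        return "complete"
    # Count shared letters at the start, then at the end of what remains.
    p = common_prefix_len(first, second)
    s = common_prefix_len(first[p:][::-1], second[p:][::-1])
    # Assumed threshold: the shared material covers half of the shorter part.
    if 2 * (p + s) >= min(len(first), len(second)):
        return "partial"
    return None


for w in ["goody-goody", "dilly-dally", "hum-drum", "well-known"]:
    print(w, classify_reduplication(w))
```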
Structurally, three types of reduplicated words are mentioned by Chatterji (1995), and
Sarkar and Basu (1994). However, Chaki (1996) has mentioned four types. All these types of
reduplication are listed below with examples. The last one is Chaki's addition.
(i) repetition of the same word, e.g. [din din] "day by day", [hasi hasi] "smiling",
[bɔchɔr bɔchɔr] "every year", [lal lal] "red" etc.,
(ii) addition of a semantically similar or almost similar word to the base word, e.g.
[aligɔli] "lanes and by-lanes", [cupcap] "silently", [curicamari] "stealing",
[calculo] "status" etc.,
(iii) addition of some onomatopoeic words to the base word, e.g. [jɔltɔl] "water etc.",
[machtach] "fish and others", [luciphuci] "luchi and others",
[kɔthatɔtha] "speech and others" etc. and
The reduplicated words found in the corpus are analysed from a structural point
of view, so that the machine can identify and process them automatically without
considering their semantic properties. Reduplication in Bangla is generally found with
nouns, verbs (finite or non-finite), adjectives and adverbs. In the morphological analysis
of reduplicated nouns it is noted that an inflected noun is reduplicated in six ways:
(i) the base form is doubled while neither the first nor the second component of the final
form has any suffix marker, e.g. [ghɔr ghɔr] "every house", [din din] "regular",
[meye meye] "like a girl" etc.
(ii) the base form is doubled in such a way that the second component of the final form is
not a noun but an echo of the first noun, e.g. [jɔltɔl] "water and others",
[machtach] "fish and others", [bɔitɔi] "books and others" etc.
(iii) the base form is doubled while both the first and the second component of the final
form take the affix [-e], e.g. [jɔne jɔne] "person by person", [dike dike] "in every
direction", [ghɔre ghɔre] "in every house" etc.
(iv) the base form is doubled while both the first and the second component of the final
form take the affix [-ay], e.g. [kɔthay kɔthay] "in every word", [dɔrjay dɔrjay] "in
every door", [rastay rastay] "on every road" etc.
(v) the base form is doubled while both the first and the second component of the final
form take the affix [-te], e.g. [nɔdite nɔdite] "in rivers", [baRite baRite] "in every
house", [gaRite gaRite] "in cars", [sɔkhite sɔkhite] "between two girls" etc. and
(vi) the base form is doubled, the two parts being connected by a link vowel [-a-], while
the second component of the final form takes the affix [-i] (Chatterji 1993: 1048), e.g.
[hatahati] "fighting with hands", [lathalathi] "kicking" etc.
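The six noun patterns can be approximated by purely structural tests on romanised forms. The Python sketch below is illustrative only: the function name and labels are assumptions, the echo test handles only echoes that replace a single initial letter, and without a lexicon it cannot tell a base that happens to end in [-e] from a base carrying the affix [-e].

```python
import re

# Endings from patterns (iii)-(v); longest first so that "-te" is tried before "-e".
SUFFIXES = ("te", "ay", "e")


def classify_noun_reduplication(surface):
    """Return a pattern label for a romanised reduplicated noun, or None."""
    parts = surface.split()
    if len(parts) == 2:
        first, second = parts
        if first != second:
            return None
        for suffix in SUFFIXES:
            if len(first) > len(suffix) and first.endswith(suffix):
                return "doubled with -" + suffix
        return "bare doubling"
    if len(parts) == 1:
        form = parts[0]
        # Pattern (vi): base + a + base + i, e.g. hat-a-hat-i.
        if re.fullmatch(r"(.+)a\1i", form):
            return "link vowel -a- with suffix -i"
        # Pattern (ii), simplified: two equal halves differing in the first letter.
        half, odd = divmod(len(form), 2)
        if (not odd and half >= 2 and form[:half] != form[half:]
                and form[1:half] == form[half + 1:]):
            return "echo of the first noun"
    return None


for s in ["din din", "dike dike", "rastay rastay", "baRite baRite",
          "hatahati", "machtach"]:
    print(s, "->", classify_noun_reduplication(s))
```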
Reduplicated verbs show three patterns:
(i) the two verb roots may be identical or different in form, but both can take:
(a) the zero suffix [-Ø] as in [cɔl cɔl] "walk" etc.
(b) the suffix [-iya] as in [suniya suniya] "hearing" etc.
(c) the suffix [-e] as in [bɔle bɔle] "saying" etc.
(d) the suffix [-ite] as in [sunite sunite] "hearing" etc.
(e) the suffix [-te] as in [dekhte dekhte] "seeing" etc.
(f) the suffix [-bɔ] as in [sunbɔ sunbɔ] "I will hear" etc.
(g) the suffix [-be] as in [bɔlbe bɔlbe] "you will say" etc.
(h) the suffix [-lɔ] as in [sunlɔ sunlɔ] "he heard" etc.
(i) the suffix [-le] as in [kɔrle kɔrle] "you did" etc.
(j) the suffix [-iye] as in [dekhiye dekhiye] "showing" etc.
(k) the suffix [-ay] as in [sonay bɔlay] "hearing and saying" etc.
(l) the suffix [-ano] as in [dekhano sonano] "showing and hearing" etc.
(ii) there is another set of reduplicated verbs where the first form is a verb root while the
second is not a verb but an echo form of the first. These forms can take the verbal
suffixes in the manner described above, e.g. [khay day] "eats and does other things",
[lute pute] "plundering and doing other things", [gutiye sutiye] "folding and other
things" etc.
(iii) the verb root is doubled, the two parts are connected by a link vowel [-a-], and the
second part of the final form takes the suffix [-i] (Chatterji 1993: 1048). The roots
are genuine verb roots, while the infix and suffix are different. The final output can
be considered either a verb or a noun. From the grammatical point of view it is a verb
because it originates from two verb roots. From the lexical point of view, however, it
is a noun because the form can be inflected with case markers, which is a property of
nouns, e.g. [janajani] "knowing", [maramari] "fighting", [tanatani] "pulling" etc.
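For pattern (i), one way to recover the shared suffix of a reduplicated verb pair is to try the listed suffixes from longest to shortest. The Python sketch below is an illustration under stated assumptions: the function name is hypothetical, the suffixes of (f) and (h) are left out, and the returned stems are simply what remains of the surface form, which need not coincide with dictionary roots.

```python
# Romanised suffixes from pattern (i), longest first; (f) and (h) are left out
# and the zero suffix of (a) is handled separately below.
VERB_SUFFIXES = ("iya", "iye", "ite", "ano", "te", "le", "be", "ay", "e")


def split_reduplicated_verb(first, second):
    """Return (stem1, stem2, suffix) if both words share a listed suffix."""
    for suffix in VERB_SUFFIXES:
        if (first.endswith(suffix) and second.endswith(suffix)
                and len(first) > len(suffix) and len(second) > len(suffix)):
            cut = len(suffix)
            return first[:-cut], second[:-cut], suffix
    if first == second:
        # No listed suffix found: treat the pair as a doubled bare root.
        return first, second, ""
    return None


print(split_reduplicated_verb("dekhte", "dekhte"))
print(split_reduplicated_verb("dekhano", "sonano"))
```

Because the stems may differ, the same function covers pairs with different roots as well as echo forms such as [khay day].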
Reduplicated adjectives show two patterns:
(i) both the words have no marker, e.g. [lal lal] "red", [chotɔ chotɔ] "small",
[bhalɔ bhalɔ] "good" etc. and
(ii) the second component has the marker [-e], e.g. [dhɔbdhɔbe] "milky white",
[kuckuce] "jet black" etc.
The morphological analysis of reduplicated adverbs in Bangla shows that an adverb
can be reduplicated in five ways:
(i) the adverb is doubled without any marker on either component, e.g.
[kɔkhɔn kɔkhɔn] "sometimes", [yemɔn yemɔn] "such" etc.,
(ii) both the forms have the marker [-e] at the end, e.g. [majhe majhe]
"occasionally", [ase pase] "nearby" etc.,
(iii) both the forms have the marker [-ay] at the end, e.g. [belay belay]
"before time", [mathay mathay] "just up to the mark" etc.,
(iv) the first word has the ending [-a] while the second word has the ending [-i],
e.g. [kholakhuli] "openly", [michamichi] "falsely" etc. and
(v) the first word has the ending [-o] while the second word has the ending [-i],
e.g. [mukhomukhi] "face to face", [pithopithi] "one after another" etc.
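Patterns (iv) and (v) can be tested structurally by splitting a fused form into two halves and comparing their endings. In the Python sketch below, the comparison of consonant skeletons (so that [khola] and [khuli] count as the same base) is an assumption introduced for illustration, as are the function names; it is not a procedure described in the source.

```python
VOWELS = set("aeiou")


def skeleton(s):
    """Consonant skeleton of a romanised string (vowel letters removed)."""
    return "".join(ch for ch in s if ch not in VOWELS)


def ending_alternation(form):
    """Return 'a...i' or 'o...i' for a fused form of pattern (iv) or (v)."""
    half, odd = divmod(len(form), 2)
    if odd or half < 2:
        return None
    first, second = form[:half], form[half:]
    if (second.endswith("i") and first[-1] in "ao"
            and skeleton(first) == skeleton(second)):
        return first[-1] + "...i"
    return None


for f in ["kholakhuli", "michamichi", "mukhomukhi", "pithopithi"]:
    print(f, ending_alternation(f))
```

The same test also accepts noun forms such as [hatahati], so the word category has to be decided by other means.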
This process of analysis has helped us to identify reduplicated words in the corpus
with a certain degree of accuracy. However, on some occasions they are doubly parsed
because of ambiguities in their surface forms.
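One way such double parsing could arise is sketched below in Python: a purely structural matcher that returns every pattern a doubled form satisfies. Whether this is the exact source of ambiguity met in the corpus is an assumption; the function name and labels are hypothetical.

```python
def all_parses(first, second):
    """Return every structural reading of a doubled pair of identical words."""
    parses = []
    if first == second:
        # Without a lexicon, any doubled word may be a bare base.
        parses.append("bare doubling")
        for suffix in ("te", "ay", "e"):
            if len(first) > len(suffix) and first.endswith(suffix):
                parses.append("base + -" + suffix)
    return parses


# [meye meye] is a bare doubling, yet its final letter also looks like
# the affix [-e], so two readings are returned.
print(all_parses("meye", "meye"))
print(all_parses("din", "din"))
```

A lexicon lookup on the candidate base would be one possible way to discard the spurious reading.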
5.8. Conclusion
In this chapter an effort has been made to analyse words morphologically, with all their
markers and inflections, in order to capture the actual orthographic representation of the
words used in the texts of the Bangla corpus. An effort has also been made to observe the
variations in morphological makeup whenever these words undergo any kind of
morpho-phonemic change. Only a few sample words and examples are used here. As the
research is entirely grapheme based (because the study is based on printed texts), it is
necessary to know how graphemes behave at the time of inflection, derivation and sandhi,
and when case and other markers are attached. The analysis and results will be used
in chapter 7, where the parsing of surface words is investigated.
*****