Dissertation-May Version - Edited

A Diachronic and a Synchronic
Approach to Romani Language
Romani (also known pejoratively in English as Gipsy) is an Indo-Aryan language
spoken in Europe, the Americas, and Australia. It is recognized as a minority
language in twenty-two European countries (i.e. Albania, Bosnia and Herzegovina,
Bulgaria, Finland, France, Germany, Greece, Hungary, Italy, Kosovo, Macedonia, the
Netherlands, Norway, Poland, Portugal, Romania, Russia, Slovakia, Serbia, Sweden,
Ukraine, and the United Kingdom) and Colombia. It enjoys official status in the small
Macedonian municipality of Šuto Orizari since Roma outnumber the autochthonous
Macedonian population in that area.
A report written by Jean-Pierre Liégeois and Nicolae Gheorghe (1999) estimates that
the total Roma population of Europe ranges between 7,000,000 and 8,500,000, of
whom approximately 5,000,000 are native Romani speakers, having a Romani dialect
as their mother tongue. According to the aforementioned report, Romania has the
largest Romani population, ranging between 1,800,000 and 2,500,000 persons, of
whom 40% are native Romani speakers.
The autonyms used by Roma for self-designation are Rom (‘Romani man’) and Rromni
(‘Romani woman’), while the language is known as rromani ćhib (‘Romani
language’) or amari ćhib (‘our language’) inside the community (Matras, 2004: 5). It
is to be noted that Rom and its derivatives are generic and that Roma make use of
many other ethnonyms when referring to themselves. I shall group these ethnonyms
following Matras (2004: 5):
• ethnic designations referring to traditional trades, where the actual ethnonym
is an exonym borrowed into Romani from Romanian, Turkish or Hungarian. E.g.:
Kelderaś (Romanian căldărar ‘kettle-maker’), Ursari (Romanian ursar
‘bear-trainer’), Ćurari (Romanian ciurar ‘sieve-makers’), Lovari (‘horse-dealer’,
from Hungarian ló ‘horse’ followed by the Romanian-derived agentive nominalizer
–ari), Bugurʒi (Turkish burgucu ‘drill-makers’), Sepeći (Turkish sepetçi
‘basket-weavers’). Nevertheless, some exonyms were translated into Romani
and adopted for self-identification, leading to internalized and fully accepted
professionyms such as kïkavǎri (‘Kalderash man’), rićhinari (‘Ursari man’);
• ethnic designations that represent endonyms, originating in Romani. E.g.:
Rrom, Kale, Manuś, Sinte. Kale (masculine singular kalo) is an endonym
derived from the colour term kalo (‘black’) that is used by some Roma groups
from Finland, where it is spelled Kaale, and Iberia, where it is spelled Calo.
Manuś is an endonym used by Roma from France and is derived from
the Romani word for ‘man, human being’. Sinte is another endonym, used for
self-designation by German Roma and etymologically related to the name
of the Sindh river in India, also known by its Latin name Indus;
• several other ethnic designations related to various aspects such as settledness
(Arli/Erli from Turkish yerli ‘settled’), region of settlement (Bergitka Roma
from German Berg ‘mountain’, referring to the southern Polish highlands),
religion (Xoraxane Roma from Romani xoraxaj ‘Turkish/Muslim man’),
country of settlement (Polska Roma from Polish polska ‘Polish’), place of
origin (Mačvaja from the Serbian district of Mačva, a Romani group that is
currently settled in the USA, Russia, and Sweden).
Although Roma use a wide range of self-designations, the most widespread is
rrom. As Matras (2004: 15) points out, it is etymologically related to ḍom, the name
of a population found in the Hunza valley in Pakistan, ḍomba, an ethnic group scattered
across Northern India, dom, a population of Indo-Aryan origin found in Syria, Jordan
and Palestine, and lom, an Indo-Aryan group from Armenia. According to the Online
Etymology Dictionary (www.etymonline.com), all these ethnonyms ultimately derive
from the Sanskrit word ḍombas meaning ‘male member of a low caste of musicians’.
1.1 Romani Language - A Diachronic Approach
As far as the ethnogenesis of the Romani people and the formation of their language
are concerned, two hypotheses have been put forth. One of them belongs to Matras
(2004), while the other belongs to Hancock (2006). I shall discuss them in turn.
Hancock’s hypothesis claims that the Romani people have a military origin and that
their language is a military koiné. He also rejects the idea of a single migration
wave out of India followed by a split into three sister languages, i.e. Domari,
Lomavren and Romani, after passing through Persian-speaking territories. He bases
his hypothesis on lexical data, after performing a thorough analysis of the shared and
non-shared Iranian word-stock present in each language today. What he found were
three distinct layers of lexical innovation pointing to different migration waves. He
also considers that the Romani people must have left India in the sixth century and
entered Iranian territories before 700 AD, taking into account the scarcity of Arabic
loanwords in Romani.
On the other hand, Matras (2004) supports the so-called traditionalist hypothesis. In
other words, he favours the idea of a single migration wave out of India, claiming the
unity of the Proto-Romani language and stating that the ancestors of present-day
Romanies belonged to a caste of service providers. Unlike Hancock (2006), he does
not refer to the military reasons (i.e. the Islamic invasions of India) that triggered the
departure of the Proto-Romanies, nor does he offer an account of
the mismatches found in the Iranian layer of the three related Indo-Aryan languages, i.e.
Romani, Domari, and Lomavren. In view of the arguments brought by
Hancock (2006), Hancock’s hypothesis appears far more plausible.
1.1.1 Proto-Romani
Although the exact location of the Proto-Romanies on the Indian subcontinent cannot
be accurately determined, some hypotheses have been advanced on the basis of linguistic
evidence. Courthiade (n.d.) follows Ralph Turner’s seminal work published in 1927
in stating that Proto-Romani was initially a Central Indian language originating in the
area of Uttar Pradesh. The ancestors of Proto-Romani are the various prakrits spoken on
the Indian subcontinent at the beginning of the Common Era. These prakrits evolved
from Sanskrit, a formal register of the Old Indo-Aryan dialects spoken in ancient India.
Courthiade (n.d.) notes that the main features that signal the development of
Sanskrit into Prakrit phonology are the following:
• reduction of initial and medial consonant clusters, with gemination in the
case of the latter. The Old Indo-Aryan (OIA) initial consonant clusters /stʰ, sv, ɕv,
spʰ, gr/ changed into /tʰ, s, pʰ, g/ in the Middle Indo-Aryan (MIA) period and the OIA
medial consonant clusters /pt, kt, gn, rp, rṇ, rk/ evolved into MIA geminates
/tt, kk, gg, pp, ṇṇ/. For instance, the OIA Past Participle form supta (‘slept’)
turned into sutta in MIA, while the OIA lexeme śvaśrū (‘mother-in-law’)
turned into MIA sassu ‘id.’;
• voicing of the simple intervocalic plosive /t/ which turned into /d/ in Śauraseni
Prakrit. For example, OIA śata ‘hundred’ became sada ‘id.’ in this language;
• preservation of OIA /r/ in Middle Indo-Aryan. For instance, in words such as
the OIA lexeme rāʒā (‘lord’), the quality of the rhotic was preserved (cf.
MIA rāʒa (‘king’));
• affrication of the Sanskrit palatal approximant /j/ in Śauraseni and of the
voiceless palato-alveolar fricative /ʃ/ in initial position in other MIA dialects.
E.g.: OIA yuvatī > Śaur. ʒuvadi ‘woman’, OIA śāba > Ardhamāgadhi ćhāva
‘young of an animal’;
• voicing of the intervocalic simple retroflex voiceless plosive /ṭ/ in prakrits. For
example, OIA akṣoṭa ‘walnut’ turned into akkhoḍa ‘id.’ in MIA idioms;
• continuation of OIA retroflex /ḍ/ and of geminated retroflex consonant
clusters /ṭṭ, ṇḍ/ in MIA dialects. Thus, OIA aṇḍa (‘egg’) was preserved as
such in MIA languages;
• vocalization of the OIA voiced alveolar fricative trill /r̥/ in interconsonantal
environment. It changed to /u/ before bilabials and labiodentals and to /i/
before other types of consonants. E.g.: mr̥ta > MIA muda ‘dead-PP’, ghr̥ta >
Śaur. ghida ‘butter’;
• the OIA consonant cluster /kṣ/ underwent affrication to /ʧʰ/ in word-initial
position and reduction to /k/ with a subsequent aspiration to /kʰ/ in initial
position plus gemination (kkʰ) in medial position. E.g.: OIA r̥kṣa > Śaur.
riććha ‘bear’, OIA rukṣah > Śaur. rukkha ‘tree’;
• coalescence to /o/ of the OIA cluster –ava-. E.g.: OIA lavaṇa ‘salt’ > MIA
loṇa ‘id.’;
• deocclusivation of OIA medial voiceless aspirated plosive /kʰ/ in MIA
dialects. E.g.: mukha > muha ‘face’.
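Some of the changes listed above can be read as ordered rewrite rules. The sketch below is only an illustration under simplifying assumptions: the names RULES and oia_to_mia are hypothetical, only five rules are encoded, long vowels are ignored, and the rule ś > s is inferred from the example śata > sada rather than taken from the list itself. It reproduces four of the examples cited above and makes no claim about any other word.

```python
import re
import unicodedata

# Ordered (pattern, replacement) pairs. Each pair mirrors one change from the
# list above. Non-ASCII letters are written as escapes: \u015b is ś.
# Assumption: only short vowels are treated as "intervocalic" context.
RULES = [
    (r"pt", "tt"),                        # medial cluster /pt/ > geminate /tt/
    (r"ava", "o"),                        # -ava- coalesces to /o/
    (r"(?<=[aeiou])t(?=[aeiou])", "d"),   # intervocalic /t/ > /d/ (Sauraseni)
    ("\u015b", "s"),                      # ś > s, inferred from śata > sada
    (r"(?<=[aeiou])kh(?=[aeiou])", "h"),  # medial /kh/ > /h/
]


def oia_to_mia(word: str) -> str:
    """Apply the toy rule set, in order, to a romanized OIA form."""
    form = unicodedata.normalize("NFC", word)
    for pattern, replacement in RULES:
        form = re.sub(pattern, replacement, form)
    return form


if __name__ == "__main__":
    for oia in ["supta", "\u015bata", "lava\u1e47a", "mukha"]:
        print(oia, ">", oia_to_mia(oia))
```

Rule order matters in such a sketch: the voicing rule only sees a single /t/ between vowels, so the geminate in sutta is left untouched.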
Proto-Romani emerged from the Śauraseni Prakrit brought by the Proto-Romanies from
India to the Byzantine Empire. Following Brenda Lanel Higgie’s (1994)
reconstruction of Proto-Romani phonology, the ancestor of present-day Romani
dialects had five vowels /a, e, i, o, u/ and showed a contrast based on vowel length. It
evinced the following phonological features:
• preservation of the voiced aspirated plosives /bʰ, gʰ, dʰ/ inherited from
Sanskrit. For instance, the Sanskrit word bhrātār (‘brother’) evolved into
Proto-Romani *bhral (‘brother’). Present-day Romani dialects evince devoicing
of the initial aspirated plosive (SIR phral);
• the vowel /a/ underwent centralization in Proto-Romani, followed by fronting
to /e/. Thus, Old Indo-Aryan daśa (‘ten’) turned into /dʌʃ/ (‘id.’) in
Proto-Romani;
• degemination of MIA medial retroflex plosive /ṭṭ/. Thus, MIA peṭṭa (‘belly’)
turned into PR *peṭ. Later on, the degeminated retroflex plosive evolved into a
rhotic (ɽ) cf. perr in Modern Romani dialects;
• absence of fortition. The initial voiced Sanskrit fricative /v/ was preserved in
Proto-Romani. Therefore, Sanskrit vasati (‘to sit’) turned into Proto-Romani
*vesh- (‘id.’) and underwent fortition in later stages of evolution (SIR beśel
‘to sit, to dwell’);
• no prothetic yot in words such as *akhi (‘eye’), *eka (‘one’). Both lexemes
display prothetic yot in contemporary Romani dialects (SIR jakh and jekh);
• conservation of initial non-aspirated stops. Proto-Romani preserves initial
stops inherited from OIA (Sanskrit dugdha > PR *dudh ‘milk’), which
underwent devoicing in later stages with subsequent transfer of aspiration
(SIR thud ‘milk’);
• preservation of OIA and MIA initial retroflex /ḍ/ and of the cluster /ṇḍ/. Thus,
the ethnic designation of the Romani people was *ḍom in Proto-Romani,
deriving from the Sanskrit word ḍomba. In later stages of evolution, the stop
turned into a rhotic /ɽ/;
• palatalization of the Sanskrit voiceless fricative /s/. This feature is shared with
present-day Romani. The intervocalic /s/ in the Sanskrit word vasati (‘to sit’)
turned into /ʃ/ in the Proto-Romani root *vesh- and present-day Romani beśel;
• lateralization of MIA simple intervocalic /d/, which is a further development
of the intervocalic /t/ present in OIA dialects. For instance, OIA gita ‘song of
praise’ turned into *gila ‘song’ in PR. It is also shared with today’s dialects;
• velarization of OIA initial voiceless aspirated plosive /kʰ/, probably under
Persian influence. For instance, OIA and MIA khara ‘donkey’ became PR
*xar /ˈxar/ ‘id.’
The table below adapted from Higgie’s 1994 PhD dissertation shows the inventory of
Proto-Romani consonants. What is different from present-day Romani is that the
voiceless aspirates /kʰ, pʰ/ were voiced in PR and the consonants /ḍ, ṭ/ were retroflex
at that time, their development into a retroflex rhotic /ɽ/ being characteristic of ER.
Table x Proto-Romani Consonant Phonemes adapted from Higgie (1994: 113)

Labial Dental Palatal Retroflex Velar Glottal
Stop p k
b g
bh t gh
m n d
Fricative s sh h
Affricate c
j
Liquid l
r
Glide v
The morphology of Proto-Romani was entirely Indo-Aryan. Unfortunately, no
attested forms exist and linguists can only reconstruct them using the tools of
comparative linguistics. Thus, Matras (2004: 19) reconstructs the Oblique
demonstratives in the Masculine singular and Feminine singular as *otas
and *ota respectively. These forms evolved into oles and ola in present-day Romani
dialects. In his paper entitled The Quest for a Proto-Romani Infinitive, Michael
Beníšek (2010) also reconstructs the first and second-person plural pronouns,
suggesting the forms *am (‘we’) and *tum (‘you’), the present-day Nominative forms
ame and tume being a subsequent development from the Oblique forms amen and
tumen, which lost the final nasal consonant. The syntax of Proto-Romani was also
purely Indo-Aryan, evincing some structures that are similar to constructions found in
Modern Indo-Aryan languages. Beníšek (2010) also attempts to reconstruct some
aspects of Proto-Romani syntax, bringing evidence from both diachronic and
synchronic data. He posits the existence of a predicative participle of necessity
(Gerundive) that has parallels in Old Indo-Aryan languages. The following examples
are taken from Beníšek’s paper (2010: 59):
(1) a. *ghuṛo na khā-b-o
horse:nom.sg.m neg eat-gerve-sg.m
‘A horse is not to be eaten.’
b. *agg thov-ib-i
fire:nom.sg.f set-gerve-sg.f
‘The fire is to be set.’
c. *kaṣṭa ān-ib-e
wood:nom.pl.m bring-gerve-pl.m
‘Sticks of wood are to be brought.’
These participles were characterized by gender, number and case agreement with
their Nominative subject. Another construction that Beníšek posits has a Dative
subject and a Gerundive expressing necessity, which agrees with the object in case,
gender, and number. The example below is taken from Beníšek’s paper (2010: 62):
(2) *am-en mās kin-ib-o (si)
we-dat meat:nom.sg.m buy-gerve-sg.m (cop)
‘We have to buy meat.’
Other constructions that Beníšek (2010) reconstructs are imperfective participles,
which were similar to those found in Hindi, absolutive participles that are
encountered in modern Romani dialects and Indo-Aryan languages, and a modal verb
expressing possibility that vanished a long time ago. Finally, he also reconstructs
verbal nouns that were replaced by finite complementation with the complementizer
te, triggered by Greek contact in the Early Romani period. The examples below are
taken from the aforementioned paper (Beníšek, 2010: 80):
(3) a. *Me džabo kamam.
I-nom go-gerve want-sg.1st pers.
‘I want to go.’
b. *Man džabo si.
I-dat go-gerve is
‘I have to go.’
As far as the lexicon of Proto-Romani is concerned, its core was Indo-Aryan.
However, it also featured a number of Iranian loanwords, mainly from Kurdish,
Ossetian and Persian. Beníšek (2008: 69-70) cites tover (‘axe’, Kurdish tavar), daj
(‘mother’, Kurdish dāy), arde/orde (‘here’, Ossetian ardəm and ordəm) and mom
(‘wax’, coming from Persian). Matras (2002: 23) establishes the etymology of some
other Romani words that entered the language in this period: ambrol (‘pear’) and
avdǐn (‘honey’) coming from Persian, baxt (‘luck’) and sir (‘garlic’) coming from
both Persian and Kurdish. The privative suffix bi- and the interfix (-u-) used in the
formation of numerals are also of Persian origin. Apart from the Iranian word-stock,
Armenian loanwords also entered its lexicon. Words such as bov (‘oven’), grast
(‘horse’), patǐv (‘honour, respect’) and thagar (‘king, emperor’) are often cited as
being of Armenian origin. The arrival of the Proto-Romanies in the Byzantine Empire put
an end to the Proto-Romani period and gave way to the Early Romani period.
1.1.2 Early Romani
In Markedness and Language Change (2006: 68), Viktor Elšík and Yaron Matras
view Early Romani as the phase that preceded the dispersion of the Romanies
throughout Europe with a subsequent split into dialect families. According to this
source, the earliest attestation of the Romanies in the Balkans dates from the late
fourteenth century.
At the phonological level, what makes Early Romani different from Proto-Romani is
the smaller number of retroflex consonants. As Matras and Elšík (2006: 70) notice, a
small number of retroflex sounds (/ḍ, ṇ/) preserved their Middle Indo-Aryan quality,
while some others underwent a shift. Perhaps the most important phonological
feature characterizing Early Romani is rhotacism, which led to the emergence of a
retroflex rhotic /ɽ/ derived from the previous word-initial retroflex voiced plosive /ḍ/
and from the medial retroflex consonant clusters /-ṇḍ-, -ṭṭ-/. Another seminal feature
is the devoicing of the word-initial voiced aspirated plosives /bʰ, dʰ, gʰ/.
Apart from the Indo-Aryan consonant inventory, the Iranian and later on Greek
loanwords brought into the language six other phonemes (v, f, z, c, dz, ž), as the
aforementioned authors (2006: 71) point out. As for vowels, Matras and Elšík (2006:
71) show that the ER vowel system converged with that of Late Greek, yielding a
five-vowel system (/a, e, i, o, u/) evincing no length opposition. Subsequent developments
consist of the introduction of central vowels /ə, y/, the emergence of vowel length in
some dialects and forward shift of stress under the influence of contact languages.
Other evolutions typical of the post-Early Romani period include allophonic
palatalizations of dental stops and velars before the vowel /i/ and affrication, which
can be encountered in some dialects. Finally, subsequent phenomena such as
palatalization around iotacized segments, aspiration of /s/ to /h/ via debuccalization,
apocope of word-final /s/, reduction of initial /a/ and prothetic /j, v, a/ also
characterized the post-Early Romani period. The table below, adapted from the
aforementioned source, presents the consonant inventory of Early Romani.
Table x The Consonant Inventory of Early Romani adapted from Matras and Elšík
(2006: 70)
Places of articulation: Bilabial, Labiodental, Alveolar, Palato-alveolar, Retroflex,
Palatal, Velar, Uvular, Glottal
Plosive, voiceless: p, t, (t’), k
Plosive, voiceless aspirated: ph, th, kh
Plosive, voiced: b, d, (d’), (ḍ), g
Nasal: m, n, (ṇ), /ŋ/
Affricate, voiceless: c, č
Affricate, voiceless aspirated: čh
Affricate, voiced: (dz), dž
Fricative, voiceless: f, s, š, x, h
Fricative, voiced: v, z, (ž), (ř)
Trill: r, (ṛ)
Lateral: l, (ḷ)
Approximant: j
The respects in which the consonant inventory of Early Romani differs from that
of Proto-Romani are the absence of the voiced aspirated plosives
/dʰ, bʰ, gʰ/, which underwent devoicing in Early Romani (/tʰ, pʰ, kʰ/), the emergence
of a retroflex rhotic /ɽ/, the introduction of a number of affricates /dz, dʒ,
ʧ/ and fricatives /f, v, z/ and the appearance of a voiceless velar fricative /x/.
Nevertheless, both Proto-Romani and Early Romani evince incomplete development
of the inherited retroflex consonants /ḍ, ṇ/.
Although its phonology showed signs of non-Indo-Aryan influence (mainly Greek
and to a lesser extent Iranian), its morphology remained essentially Indo-Aryan.
Matras and Elšík (2006: 71) show that Early Romani evinced two genders, masculine
and feminine, and two numbers, singular and plural. Definite articles, adjectives, and
demonstratives had to agree with the head noun and agreement was also marked on
Genitive attributes, leading to a double case structure that is characteristic of Indo-
Aryan languages. Thus, ‘the boy’s father’ was čhav-es-ker-o dad in Early Romani,
while ‘the boy’s mother’ was čhav-es-ker-i daj. (Matras and Elšík, 2006:71) Gender
agreement underwent plural neutralization, case marking being sensitive to animacy.
The Early Romani case-marking system inherited and preserved the late Indo-Aryan
system of three layers, out of which Layer I is the most archaic, being closest to the
noun stem, drawing the distinction between Nominative and Oblique cases, while
showing sensitivity to declension patterns. Early Romani was unique among the
Indo-Aryan languages since it displayed both inherited or oikoclitic declension
classes, derived from Old Indo-Aryan/Middle Indo-Aryan nominal derivational markers
in the Nominative and remnants of the Old Indo-Aryan/Middle Indo-Aryan Genitive in
the Oblique cases, and borrowed or xenoclitic declension classes, the latter being
the outcome of the integration of Greek Nominative inflections. The Nominative case
characterized the subject as well as the inanimate direct object. The Oblique was
meant for marking the animate direct object, the internal and external possessor and
the recipient of the verb ‘to give’. Layer II suffixes marked the Dative (-ke/ge),
Locative (-te/-de), Ablative (-tar/-dar), Sociative (-sa/) and Genitive cases (-ker/-ger)
(Matras and Elšík, 2006: 73). Greek morphology further penetrated the Early Romani
morphological system through the adoption of the singular morpheme (-imos) and the
plural morpheme (-mata) used in the formation of abstract nouns. Other functional
borrowings include the suffix –to used for ordinal numerals, the suffix –imen used in
the formation of past participles from loan verbs, the interfixes –is-, -in- and –iz- used
for loan verb adaptation and the adjectival suffix itiko > icko. However, the main
departure from the Indo-Aryan linguistic norm consists of the emergence of a
proclitic definite article (o/i, masculine/feminine singular).
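The layered case marking described above can be pictured as simple suffix stacking. The sketch below is illustrative only: the names LAYER_II, AGREEMENT, case_form and genitive_attribute are hypothetical, only the voiceless variants of the Layer II suffixes are listed, the Sociative is left out, and the agreement endings -o and -i are taken from the two cited examples alone. It is not a full account of Early Romani declension.

```python
# Layer II suffixes as cited in the text (voiceless variants only; the
# conditions for the voiced variants are not modelled here).
LAYER_II = {
    "dative": "ke",
    "locative": "te",
    "ablative": "tar",
    "genitive": "ker",
}

# Agreement endings on the genitive attribute, keyed by the gender of the
# head noun. Assumption: only the two singular endings seen in the examples.
AGREEMENT = {"m": "o", "f": "i"}


def case_form(stem: str, oblique: str, case: str) -> str:
    """Stack a Layer II suffix on top of the Layer I oblique marker."""
    return "-".join([stem, oblique, LAYER_II[case]])


def genitive_attribute(stem: str, oblique: str, head_gender: str) -> str:
    """Genitive form that additionally agrees with its head noun."""
    return case_form(stem, oblique, "genitive") + "-" + AGREEMENT[head_gender]


if __name__ == "__main__":
    # 'the boy's father' and 'the boy's mother'
    print(genitive_attribute("\u010dhav", "es", "m"), "dad")
    print(genitive_attribute("\u010dhav", "es", "f"), "daj")
```

The double marking is visible in the output: the possessor carries its own case (oblique plus genitive) and, on top of that, an ending dictated by the possessed noun.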
Early Romani possessed a set of valency-changing morphemes inherited from its
Middle Indo-Aryan ancestor (-av- and –ar-) evincing restricted productivity.
However, they played a major role in the formation of causatives and transitive verbs
from other parts of speech. Another valency-increasing morpheme emerged (i.e.
–ker), being the result of the grammaticalization of the verb ker- (‘to do’). There was
also a valency-decreasing morpheme –jov- originating in the verb ov- (‘to become’)
which was used to derive middles from transitive participles and inchoatives from
adjectives and nouns. A seminal feature of Early Romani verbal morphology was the
integration of Greek verbs together with their inflection class markers (–ín- and –íz-),
whose purpose was tense marking. Thus, Greek-derived verbs kept on carrying Greek
tense inflection when used in Early Romani (jiríz- ‘to go’, jirís- ‘to have gone’, graf-
‘to write’, graps- ‘to have written’). Greek verbs were integrated into the Early Romani
lexicon by taking into account and signaling Greek-based inflection class affiliation, a
mechanism that was also applied to verbs coming from subsequent contact languages.
This phenomenon faded out gradually after the Early Romani period, as the number
of Greek verbal roots slowly declined, while the number of verbal borrowings coming
from other contact languages in the post-Early Romani period gradually increased.
According to Matras and Elšík (2006: 84), the main source of dissimilarity between
Early Romani and the New Indo-Aryan languages is its syntax. While Proto-Romani
was a verb-final language, Early Romani was a verb-medial language. Its Middle
Indo-Aryan syntactic system underwent a process of Balkanization fueled by Greek.
The main pieces of evidence are the abandonment of modal infinitives (which were
widespread in Proto-Romani) in favor of finite complementation, the development of
adverbial subordination, the emergence of kaj as a general subordinator and
relativizer and the appearance of a split between factual complementation (for which
kaj was used) and non-factual complementation (for which te was used).
As for its lexicon, Early Romani featured an impressive number of Greek loanwords.
In his book entitled Romani: A Linguistic Introduction, Matras (2004) provides the
etymology of the Greek borrowings that entered the language during this stage: words
related to kinship (papu(s) ‘grandfather’ from Gk. papús, mami ‘grandmother’ from
Gk. mamí), to the human body and emotions (kokalo ‘bone’ from Gk. kókkalo, xoli
‘anger’ from Gk. xolí), to time (kurko ‘Sunday, week’ from Gk. kiriakí, tehara
‘tomorrow’ from Gk. tachiá), numerals (efta ‘seven’, oxto ‘eight’, enja ‘nine’),
adverbs and particles (pale ‘again’ from Gk. pále, panda ‘more’ from Gk. pánta
‘always’, komi ‘still’ from Gk. akómi, (v)orta ‘straight ahead’ from Gk. orthá). Matras
(2004: 22) estimates that the number of Greek roots in present-day dialects is around
250 lexemes, their number having been much higher during the Early Romani period,
due to intensive and prolonged language contact with Greek.
1.1.3 Romani in the context of Indo-Aryan languages
There have been many debates among linguists related to the position of Romani
among Indo-Aryan languages. Turner (1926) cited in Matras (2004: 33) claims that
Romani was initially a Central Indo-Aryan language that experienced influence from
Dardic languages. Five pieces of evidence support this hypothesis: preservation of
the cluster dental + rhotic inherited from OIA (cf. Romani drakh ‘grape’, but Hindi
dākh ‘id.’); conservation of the OIA cluster labial + rhotic (cf. Romani phral
‘brother’, but Hindi bhāi ‘id.’); continuation of the OIA medial cluster sibilant + dental
(cf. Romani vast ‘hand’, but Hindi hāth ‘id.’); inheritance of two sibilants (/s,
ʃ/) out of the three sibilants found in OIA (cf. Romani sap ‘snake’, Hindi sā̃p ‘id.’,
Romani berś ‘year’ but Hindi baras ‘id.’); retention of the OIA cluster nasal + dental
(cf. Romani dand ‘tooth’, but Hindi dā̃t ‘id.’). All these archaisms are shared with
Northwestern NIA languages such as Kashmiri and Lahnda. Nevertheless, Turner
(1926) cited in Matras (2004: 32) shows that Romani shares with Central IA
languages several phonological innovations, which are rendered below.
Table x Romani Innovations Shared with Central Indo-Aryan Languages
OIA    Northwest    Southwest    Central    Romani    East    South
r̥      ri           a            i, u       i, u      a?      a
tv     tt           tt           pp         pp        pp?     tt?
sm     sp, ss       mh?          mh         mh        mh      mh?
kṣ     cc̣ḥ          cch          kkh        kkh       kkh     cch?
y-     y-           j-           j-         j-        j-      y-
As can be seen in the table above, Romani and Central IA languages share a
considerable number of phonological innovations. Thus, OIA /r̥/ undergoes
vocalization in both Romani (cf. mulo ‘dead’) and Hindi (cf. muā ‘id.’). Elision of
medial consonants is also characteristic of both Romani and the aforementioned
language group. Thus, OIA lavaṇa (‘salt’) became lon ‘id.’ in Romani and lōna ‘id.’
in Hindi, the latter example evincing compensatory vowel lengthening. Affrication of
the OIA palatal approximant /j/ is another feature shared with Central IA
languages. For instance, OIA yūkā ‘louse’ became ʒuv ‘id.’ in Romani and jū̃ ‘id.’ in
Hindi. OIA medial clusters also experienced reduction followed by gemination in
MIA and degemination in NIA languages. Therefore, OIA sarpa ‘snake’ became
sappa in MIA dialects, and turned into Romani sap ‘id.’ and Hindi sā̃p ‘id.’ Cluster
simplification affected even initial and final clusters such as –itvana, which turned
into -ippaṇa in MIA dialects and became –ipen in Romani and –pan in Hindi. There
are still three phonological innovations which may be viewed as either common
innovations or independent developments (Matras, 2004: 32):
• fortition of OIA initial /v/ to /b/. Thus, OIA varṣa ‘rainy season’ became
berś ‘year’ in Romani and baras ‘id.’ in Hindi;
• opening of the vowel /u/ to the vowel /o/. Therefore, the OIA root ru(v)- ‘to
cry’ turned into rov- ‘id.’ in Romani and ro- ‘id.’ in Hindi;
• OIA medial /m/ turned into /v/ in Central IA languages. For example, OIA
grāma ‘countryside’ turned into Romani gav ‘village’ and Hindi gā̃v ‘id.’
1.2 Romani Language - A Synchronic Approach
What followed the Early Romani period was a stage of dispersion and the
emergence of Modern Romani dialects. This section presents the reader with a
synchronic approach to Romani, explaining its main phonological, morphological,
syntactic and lexical aspects. The first part critically assesses various
theories regarding dialects and dialect families. The second part is divided into two
subsections: the former tackles the four major dialects spoken in Romania (Kalderash,
Ursari, Transylvanian Carpathian and Xoraxane), while the latter approaches the main
Romani dialects spoken in Europe. The last part is devoted to the so-called
Para-Romani varieties, which display Indo-Aryan vocabulary, but whose syntax and
morphology are identical to those of the host languages.
1.2.1 Dialect Classifications
There have been many attempts at classifying Romani dialects. The first scholar who
took up the task of describing dialect splits in Romani is Miklosich (1872-1880). He
described a gradual migration wave through Europe, out of which individual dialects
and dialect families stemmed and underwent influences from the host languages as
well as individual linguistic change. His model of dialect classification took into
account the language of the majority population of a particular country. Therefore, he
labels the Romani dialects as Hungarian Romani, German Romani, French Romani
etc. What can be noticed in this terminology is the emphasis placed on the role of
the contact language in the formation of various dialects.
The next scholar who tried to provide a classification of Romani dialects is Bernard
Gilliat-Smith (1915), who described the Romani dialects of the north-eastern parts of
Bulgaria. He identified two linguistic groups that were differentiated by religion (one
group was Orthodox, while the other was Muslim), by professional criteria (nomadic
versus settled) and major host language (Bulgarian versus Turkish). By looking at
these data, Gilliat-Smith further reached the conclusion that one of those groups,
which he labeled Vlax, originated in a Romanian-speaking territory and had occupied
an area that had already been inhabited by a non-Vlax Romani-speaking community.
His dialectological model was seminal for the field of Romani dialectology, laying
the basis of modern Romani dialect classification, since many other scholars picked up
his terminology in the subsequent years and started to draw a neat distinction between
Vlax and non-Vlax Romani dialects. His work also led to the dissemination of the
idea that Vlax Roma migrated out of Romania at a certain point and settled in
different regions at different moments in the course of history.
There is no doubt that the dialectological model that has had the strongest impact in
the field of Romani linguistics is the one provided by the Occitan linguist Marcel
Courthiade (1995) cited in Sarău (1998:139). According to the French linguist, Romani
dialects can be classified into three layers, out of which the first one is the most
archaic, while the other two are more recent.
Finally, a more refined classification is provided by Boretzky (2003), who
distinguishes between Northern Vlax dialects (Romanian Kalderash Romani and
Hungarian-dominated Lovari Romani) and Southern Vlax dialects which comprise
Balkan Vlax Romani dialects.
The table below adapted from Sarău (1998:139) attempts to provide an overall picture
of the main linguistic features of the aforementioned three layers of development.
Table x Parameters of the Layers of Linguistic Evolution from Sarău (1998: 139)
Parameter | Layer I | Layer II | Layer III
the vowel of the first-person singular perfect ending | the back vowels /o, u/ or the iotacized counterparts /jo, ju/. E.g.: phirdóm, phirdiom, phirdium ‘I have walked’ | Idem Layer III | the front vowel /e/. E.g.: phirdém ‘I have walked’
alteration/preservation of the suffix –ni for singular nouns | E.g.: paní ‘water’, khoní ‘tallow’, kuní ‘elbow’ | Idem Layer III | paj, paji ‘water’, khoj, khoji ‘tallow’, kuj, kuji ‘elbow’
reduction of the aspirated affricate /ʧʰ/ to the fricative /ʃ/ | not present. E.g.: ćhib ‘tongue, language’ | Idem Layer III | śib (Kalderash Romani śib) ‘tongue, language’
reduction of the affricate /dʒ/ to the fricative /ʒ/ | not present. E.g. ğenó ‘individual, guy, person’ | Idem Layer III | źenó ‘individual, guy, male person’
the abstract nominalizer for singular nouns | -pe(n), -pa / -be(n), –ba. E.g. guglipe(n)/guglipa, guglibe(n)/gugliba ‘sweetness’ | Idem Layer I | -mos. E.g. guglimos ‘sweetness’
the comparative adverb for adjectives and adverbs | po ‘more’ | po/maj ‘more’ | maj ‘more’
typical Romanian loanwords | absent from the sub-Danubian linguistic area | Idem Layer III in Romanian-speaking territories | E.g.: lùmia ‘world’, tràjo ‘life’, vòrba ‘word’, gïndisarel ‘to think’
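The parameters in the table can be read as a small diagnostic. The following Python sketch is only an illustration and is not part of Sarău's or Courthiade's model: the dictionary keys and the value labels (such as "preserved" or "reduced") are shorthand invented here, the comparative adverb and the loanword parameters are left out, and actual dialects may mix features from several layers. It returns the layers whose table values are compatible with a set of observed features.

```python
# Feature values per layer, transcribed from the table (Sarau 1998: 139).
# Keys and labels are hypothetical shorthand chosen for this sketch.
LAYERS = {
    "I": {
        "perfect_1sg": "o/u",      # phirdóm, phirdiom, phirdium
        "ni_suffix": "preserved",  # paní, khoní, kuní
        "affricates": "preserved", # ćhib, ğenó
        "nominalizer": "pe/be",    # guglipe(n), guglibe(n)
    },
    "II": {
        "perfect_1sg": "e",        # "Idem Layer III" in the table
        "ni_suffix": "altered",
        "affricates": "reduced",
        "nominalizer": "pe/be",    # "Idem Layer I" in the table
    },
    "III": {
        "perfect_1sg": "e",        # phirdém
        "ni_suffix": "altered",    # paj, khoj, kuj
        "affricates": "reduced",   # śib, źenó
        "nominalizer": "mos",      # guglimos
    },
}


def compatible_layers(observed):
    """Return the layers that agree with every observed feature value."""
    return [
        name
        for name, features in LAYERS.items()
        if all(features.get(key) == value for key, value in observed.items())
    ]


# A perfect in -em alone does not separate Layer II from Layer III;
# adding the abstract nominalizer does.
print(compatible_layers({"perfect_1sg": "e"}))
print(compatible_layers({"perfect_1sg": "e", "nominalizer": "mos"}))
```

An empty result signals a feature combination that the table does not predict, which is what one would expect for varieties affected by dialect mixing.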
As for their geographical distribution, Sarău (1998:140) shows that the speakers of
the First Layer are the Mećkari, Kaburʒi, Erli, Xanduri, Drindari and Arli living in the
Balkans; the Ursari from Moldavia; the Spojtori found in Dobrudja, the extreme south of
Romania and other parts of the Balkans; the Carpathian Roma found in Transylvania and
Maramureș, Slovakia, the Czech Republic, Hungary, Ukraine (Galicia) and southern
Poland; the Abruzzian Roma from Italy; the Kaale Roma from Finland; and the
Sinte-Manouche living in Germany, Austria, France, Italy, and Slovenia. The speakers of
the Second Layer are the Ćergari from Montenegro, Bosnia, and Northern Albania, the
Serbian Gurbeti, the Ʒambazi from Macedonia, and the Fićiri and Filipidzi from Greece.
The Third Layer comprises the speakers of Kalderaś living in Romania, Russia, Ukraine,
Belgium, France, Spain, Sweden, Australia, Latin America, the United States and Canada,
and the Lovara from Hungary, Germany, Poland, Slovakia, and the Czech Republic.
Some observations regarding the aforementioned theory have to be made. First of all,
the parameters listed above are only the most striking ones, and many others can be
added in favour of this theory. Second, because of interdialect contact, some Romani
speakers have brought into their idioms foreign features typical of dialect mixing.
Finally, these features reflect sociohistorical conditions and are partly the outcome
of contact with the host languages.
1.2.2 An Overview of the Romani Dialects Spoken in Romania
As can be seen above, four Romani dialects are spoken in Romania: Kalderash Romani,
whose speakers are spread over the entire country; Ursari Romani, spoken in Moldavia;
Xoraxane Romani, spoken along the Danube, near the Bulgarian border; and Carpathian
Romani, spoken in Transylvania. Xoraxane, Ursari, and Carpathian Romani belong to the
most archaic layer, i.e. the First Layer, while Kalderash Romani belongs to the most
recent layer, i.e. the Third Layer. This subsection offers an outline of the main
features of these four dialects, relying heavily on Sarău’s 1998 book.
1.2.2.1 Kalderash Romani
At the phonological level, Sarău (1998:142-146) identifies the following features:
 dejotation of the first pers.sg. Perfect ending. E.g.: (me) kerdem < kerdjom ‘(I)
made’;
 palatalization of the affix –ni in masculine nouns. E.g.: khoj ‘tallow’, kuj
‘elbow’;
 reduction of the voiceless aspirated affricate /ʧʰ/ to the fricative /ʃ/ and the
voiced affricate /dʒ/ to the fricative /ʒ/. E.g.: ćhon > śon ‘month, moon’, ʒi > źi
‘until’;
 affrication of aspirated alveolar stops before front vowels /i, e/, the outcome
being a voiceless palato-alveolar sibilant affricate /ʧ/. E.g.: kathé > kaćhé
‘here’;
 affrication of the voiced palatal stop /ɟ/ followed by /i/. E.g.: luludǐ > lulugí
‘flower’;
 prothetic /a/ in verbs and nouns. E.g.: abiáv ‘wedding’, arësël ‘to arrive’;
 the allophones /ə, ɨ/ of the front vowels /e, i/. E.g.: kerel > kërël ‘to make, to
do’, briśind > brïśïnd ‘rain’;
 vowel umlaut in the case of the diphthong /aj/ before palatals. E.g.: daj > dej
‘mother’;
 presence of the voiced retroflex flap /ɽ/ in some items. E.g.: le rrom ‘the Roma
men’;
 the diphthong /ea/ in the theme of feminine plural nouns. E.g.: o pharimós le
buteángo ‘the difficulty of things’;
 vocalization of word-final /v/, becoming an approximant /w/. E.g.: gav > gaw
‘village’.
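Three of the changes listed above can be sketched as ordered rewrite rules over the spelling used in this chapter. The Python fragment below is a rough illustration under stated assumptions: it operates on orthography rather than on phonemes, the rule list KALDERASH_RULES and the function name apply_rules are hypothetical, and it ignores the conditioning environments and the remaining changes in the list.

```python
import re

# Ordered (pattern, replacement) pairs; an orthographic approximation only.
KALDERASH_RULES = [
    (r"ćh", "ś"),  # /ʧʰ/ > /ʃ/, cf. ćhon > śon
    (r"ʒ", "ź"),   # /dʒ/ > /ʒ/, cf. ʒi > źi
    (r"v$", "w"),  # word-final /v/ > /w/, cf. gav > gaw
]


def apply_rules(word, rules=KALDERASH_RULES):
    """Apply each rewrite rule in order and return the resulting form."""
    for pattern, replacement in rules:
        word = re.sub(pattern, replacement, word)
    return word


for form in ["ćhon", "ʒi", "gav"]:
    print(form, ">", apply_rules(form))
```

Because the rules apply in sequence, their order would matter as soon as one change feeds another; the three rules chosen here do not interact.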
At the morphological level, Sarău (1998:146-151) identifies the following features:
 presence of the Greek-derived abstract nominaliser –mos. E.g.: tërnimós ‘youth’;
 the Greek-derived nominaliser used for integrating xenoclitic nouns is –ó(s).
E.g.: o dóftoro ‘the doctor’;
 borrowed singular feminine nouns end in –a. E.g.: pílda ‘example’, sínia
‘table’;
 borrowed masculine plural nouns end in the Romanian-derived suffixes –uri
and –e/-eá/. E.g.: ćéro(s)/ćéruri ‘sky/skies’, o ućeníko/äl ućeníće(a)
‘apprentice/apprentices’;
 borrowed feminine nouns end in the plural marker –e. E.g.: pílde ‘examples’;
 the definite articles are o (for masculine nouns), e (for feminine nouns), and äl
(for plural masculine and feminine nouns). The forms for the Oblique cases are lé
(for masculine nouns) and e/lé (for feminine nouns). For inanimate nouns, the
articles are o (for masculine nouns), e (for feminine nouns), and äl (for both
genders) in both the Nominative and the Oblique cases;
 zero plural marking (ø) of nouns ending in a consonant. E.g.: äl (le) dandø
‘the teeth’;
 both analytic and synthetic ways of forming the future. The former is built
with the aid of the verb avéla (‘to come, to be’), while the latter is built with
the particle –a. E.g.: ći avéla te merën ‘(they) are not going to die’, arakhëla
‘(he/she) will find’;
 two interfixes are used for adapting foreign verbs, i.e. –sav- and –sar-. The
former is used for adapting intransitive verbs, while the latter is used for
adapting transitive verbs. E.g.: mutisávël ‘(he/she) is moving (into a new
house)’ versus mutisarël ‘(he/she) is moving (something)’;
 short Genitive forms in –ko/ki/ke/ke. E.g.: sapänqo ‘of the snakes’;
 specific negations are ći and níci, both meaning ‘no’;
 invariable reflexive pronoun pe ‘himself, herself’ accompanying the reflexive
verb. E.g.: arakhën pe ‘they meet (each other)’;
 Romanian-derived comparative and superlative prefix of adjectives and
adverbs maj (‘more’);
 presence of the inherited cumulative conjunction vi (‘also, too, as well’);
 the most widespread conjunction is thaj and its variants haj, t(h)a ‘and’;
 the Greek-derived suffix –to helps in the formation of ordinal numerals. E.g.:
jékhto ‘the first one’;
 cardinal numerals (10-19) are built up with the help of the Persian-derived
interfix –u-. Cardinal numerals above 19 are constructed with the help of the
inherited conjunction thaj. E.g.: deśujékh ‘eleven’, biś thaj duj ‘twenty-two’.
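The numeral pattern described in the last item can be sketched in Python. This is a minimal illustration, assuming a deliberately partial lexicon that contains only the stems inferable from the examples cited above (jekh, duj, deś, biś), with stress marks omitted; the names UNITS, TENS and cardinal are hypothetical, and forms produced for numerals not cited in the text remain unverified.

```python
# Partial lexicon: only stems inferable from the examples in this section.
UNITS = {1: "jekh", 2: "duj"}
TENS = {10: "deś", 20: "biś"}


def cardinal(n):
    """Compose a cardinal numeral; raises KeyError if a stem is not in the lexicon."""
    ten, unit = (n // 10) * 10, n % 10
    if ten == 0:
        return UNITS[unit]
    if unit == 0:
        return TENS[ten]
    if ten == 10:
        # 11-19: the interfix -u- joins the ten and the unit.
        return TENS[10] + "u" + UNITS[unit]
    # Above 19: the conjunction thaj links the ten and the unit.
    return f"{TENS[ten]} thaj {UNITS[unit]}"


print(cardinal(11))
print(cardinal(22))
```

Keeping the two formation strategies in separate branches mirrors the description in the text: one morphological (the interfix), the other syntactic (the conjunction).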
At the level of syntax, I could identify the following characteristics:
 postnominal adjectives, the adjective following the head noun. E.g.: o manuś
źungalo ‘the ugly man’;
 analytic Genitives, the Genitive following the head. E.g.: o pharimós le
buteángo ‘the difficulty of things’;
 postnominal demonstratives. E.g.: e gažeske kodoleske ‘for that man’.
At the lexical level, Sarău (1998: 151-152) notices a substantial influx of Romanian
loanwords belonging to the lexical fields of kinship ((e) kumnáta ‘(the) sister-in-law’,
(o) sókro ‘(the) father-in-law’, (e) sókra ‘(the) mother-in-law’), fauna ((o)
magári ‘(the) ass’, (o) kokostïrko ‘stork’, (o) hïrco ‘(the) mouse’), flora ((e) buriáca
‘(the) mushroom’), tools ((o) ćiokáno ‘(the) hammer’), and landscape ((o) kïmpo ‘(the)
field, (the) plain’).
1.2.2.2 Ursari Romani
This dialect was formerly spoken in the Balkans; the Ursari Roma later migrated to
Romania and settled in Moldavia. Its speakers were initially bear-trainers, as the
ethnonym Ursari indicates, but they went through a process of professional
reconversion. Ursari Romani is characterized by a large number of South Slavic
borrowings, especially from Bulgarian and Macedonian. At the level of phonology,
Sarău (1998: 142-146) presents the reader with the following features:
 the diphthong -eó- and the back vowel /o/ in the first-person singular Perfect
ending. E.g.: (me) uśtineom ‘(I) have climbed’, (me) dikhlóm ‘I have seen’;
 preservation of the suffix –ni in Nominative singular masculine and feminine
nouns. E.g.: paní ‘water’, khoní ‘tallow’, kuní ‘elbow’;
 preservation of the affricate /ʧʰ/. E.g.: ćhib ‘language, tongue’, ćhon ‘month,
moon’;
 reduction of the affricate /dʒ/ to the fricative /ʒ/. E.g.: penźarel ‘to know’;
 sporadic neutralization of aspiration. E.g.: phral > pral;
 preservation of voiceless aspirated alveolar plosives /tʰ/. E.g.: them ‘country’;
 preservation of voiced palatal stop /ɟ/ followed by the vowel /i/. E.g.: luludǐ
‘flower’;
 no prothesis in verbs or nouns. E.g.: biáv ‘wedding’, resél ‘to arrive’;
 sporadic devoicing of final consonants. E.g.: dand > dant ‘tooth’;
 vocalization of final /v/. E.g.: gav > gaw ‘village’, keráva > keráwa ‘(I) will
make’;
 absence of the allophones /ə, ɨ/ of the vowels /e, i/ found in Kalderash
Romani;
 absence of the voiced retroflex apical flap /ɽ/ found in Kalderash Romani;
 absence of vowel umlaut in the diphthong /aj/. E.g.: ćhaj ‘Romani girl’, daj
‘mother’;
 syncope in pronouns and indefinites. E.g.: saworó > saró ‘all, entire (m.sg.)’,
saworí > sarí ‘all, entire (f.sg.)’, saworé > saré ‘all, entire (m./f.pl.)’, mirí >
mi ‘my (f.sg.)’, miró > mo ‘my (m.sg.)’, miré > me ‘my (m./f.pl)’ tiró > to
‘your (m.sg.)’, tirí > ti ‘your (f.sg.)’, tiré > te ‘your (m./f.pl.)’;
 presence of the vowel /e/ in the theme of plural feminine nouns. E.g.: pirénca
‘with pots’, ćiriklén ‘birds-acc.’;
 metathesis and insertion of non-etymological aspiration in demonstrative
pronouns. E.g.: adavkha < kadava ‘this (f.sg.)’;
 preservation of consonants in adjectival and participial forms;
 regressive assimilation of the consonant /v/ in final position in first person
singular Present Indicative forms. E.g.: (me) dikháv tut > (me) dikháp tut ‘(I)
see you’, (me) dav tu > (me) dap tu ‘(I) give you’.
At the level of morphology, Sarău (1998:146-151) identifies the following features:
 presence of the abstract nominaliser –pé and sporadically the voiced
correspondent –bé. E.g.: śudripe ‘coolness’, ternipe ‘youth’, patiabé ‘trust,
belief’, xabé ‘food’;
 Romanian-derived comparative and superlative prefix of adjectives and
adverbs maj (‘more’), also found in Kalderash Romani;
 rare instances of zero plural marking (ø). E.g.: ol dandø ‘the teeth’;
 presence of the specific negation ni ‘no’;
 cardinal numerals built with the help of the interfix –u-. E.g.: deśujékh
‘eleven’;
 ordinal numerals are partially preserved. E.g.: jekhtó ‘the first’, dújto ‘the
second’;
 long unsyncopated Genitives. E.g.: sapés/koro ‘of the snake’, sapés/kiri ‘of
the snake’, sapés/kere ‘of the snake’, sapén/goro ‘of the snakes’, sapén/giri
‘of the snakes’, sapén/gere ‘of the snakes’, sapén/gere ‘of the snakes’;
 both synthetic and analytic future. The former is built with the particle –a,
while the latter is built with the particle ka, derived from the verb kamel (‘to
want’). E.g.: arakhása ‘(we) will find’, ka arakhás ‘(we) are going to find’;
 synthetic reflexive verbs are not accompanied by reflexive pronouns and end
in the suffix –eól. E.g.: śuneól ‘it is heard’;
 analytic reflexive verbs are accompanied by the invariable reflexive pronoun
pe(s). E.g.: arakhel pes ‘(he/she) meets’, arakhén pes ‘they meet (each
other)’;
 the suffix used for loan verb adaptation is –zel. E.g.: daśtizél ‘can, be able to’;
 the definite articles are o (for masculine nouns), e (for feminine nouns) ol (for
plural masculine and feminine nouns). The article of Oblique cases is lé for
both masculine and feminine animate nouns. For inanimate nouns, the articles
are o (for masculine nouns) e (for feminine nouns) and ol (for both genders)
irrespective of case.
Its lexicon shows considerable dialectal variation. It evinces Romanian loanwords
expressing kinship (e.g. kumnáta ‘sister-in-law’, sókra ‘mother-in-law’, sókros
‘father-in-law’), flora and fauna (bárza ‘stork’, paparúda ‘butterfly’, albina ‘bee’,
kukurúzi ‘Indian corn’), metals (kositóri ‘tin’, aráma ‘copper’) and landscape (kïmpos
‘plain’). South Slavic borrowings such as zéleno ‘green’, dïlgo ‘long’, níśto ‘nothing’,
ráno ‘morning’ are also pervasive.
1.2.2.3 Xoraxane Romani
Xoraxane Romani is spoken along the course of the Danube in the proximity of
the Bulgarian border and Dobrudja. At the level of the lexicon, one can notice a
considerable number of Turkish loanwords and, to a lesser extent, Bulgarian
borrowings. Its speakers are bilingual in Romani and Turkish and are Muslim. Most
of the time, they identify as Turkish and celebrate many Turkish holidays. At the
level of phonology, Xoraxane Romani evinces the following features outlined by
Sarău (1998:142-146):
 the first-person singular Perfect ending features the back vowel /o/. E.g.: (me)
kerdóm ‘I have made’;
 apocope of consonants in word-final position. E.g.: grast > gras ‘horse’;
 incomplete palatalization of the suffix –ni in Nominative singular masculine
and feminine nouns. E.g.: pají ‘water’, khojí ‘tallow’, kují ‘elbow’;
 devoicing of final consonants in Nominative singular nouns. E.g.: dad > dat
‘father’;
 affrication of the voiceless palatal stop /c/ before the vowel /i/, which turns
into a voiceless alveolar sibilant affricate /ts/. E.g.: kirmalo > cirmalo
‘wormy’;
 preservation of the voiceless aspirated affricate /ʧʰ/, a feature shared with
Ursari Romani. E.g.: ćhuri ‘knife’;
 reduction of the affricate /dʒ/ to the fricative /ʒ/, a feature shared with both
Ursari and Kalderash Romani. E.g.: źi ‘until’, źamutró ‘son-in-law’;
 palatalization of the voiced palatal stop /ɟ/ and the voiceless palato-alveolar
affricate /ʧ/. E.g.: luludĭ [lulugí] > luluzí ‘flower’, paramìći > paramìci ‘story,
fairy tale’;
 no prothesis in verbs or nouns, a feature shared with Ursari Romani. E.g.:
śúnla ‘to hear, to listen’, beáw ‘wedding’;
 preservation of the diphthong /aj/, a feature shared with Ursari Romani. E.g.:
daj ‘mother’;
 presence of vowel /e/ and sometimes the diphthong /iě/ in the theme of plural
feminine nouns. E.g.: o rromnién ‘the Romani women-ACC.’;
 metathesis of phonological sequences featuring the phoneme /r/. E.g.: berś >
breś ‘year’, barvaló > bavraló ‘rich’;
 absence of the allophones /ə, ɨ/ of the vowels /e, i/;
 palatalization of consonants in feminine adjectival and participial forms. E.g.:
lolí > lojí, ‘red’, laźawdi > laźawźi ‘ashamed’;
 presence of the voiced retroflex apical flap /ɽ/ in some lexical items. E.g.:
rromanéste ‘in a Romani manner’, (i) xuxúrr ‘mushroom’.
At the morphological level, Sarău (1998: 146-151) identifies the following features:
 the most widespread abstract nominaliser is –mós, a feature shared with
Kalderash Romani. E.g.: śudrimós ‘coolness’, ternimós ‘youth’. Only a
handful of nouns employ the Indic-derived abstract nominaliser –bé. E.g.:
xabé ‘food’;
 short Genitive forms. E.g.: sapésqo ‘of the snake’, sapénqo ‘of the snakes’;
 the masculine definite article is invariably o (e.g.: o gras ‘the horse’, o grasta
‘the horses’), the feminine form is i for singular animate and inanimate nouns,
and o for plural Nominative and singular and plural Oblique cases. E.g.: i
leléka ‘the stork’, o lelékes ‘the storks’, o bará ‘the gardens-ACC.’, o daján
‘the mothers-ACC.’;
 cardinal numerals are built with the help of the Persian-derived interfix –u-.
E.g.: deśueftá ‘seventeen’;
 ordinal numerals are wholesale Romanian borrowings. E.g.: al dóilea gras
‘the second horse’, a tréja daj ‘the third mother’;
 analytic way of expressing the future with the aid of the particle kam derived
from the modal verb kaméla (‘to wish, to desire’). E.g.: me kam keráw ‘I am
going to make’;
 synthetic reflexive verbs end in the palatalized suffix –zól. E.g.: kerzól ‘to
become, to turn into’;
 analytic reflexive verbs are accompanied by the reflexive pronouns pes (3rd
person singular) and pumén (3rd person plural), the latter built by analogy with
the 2nd person plural pronoun tume (‘you’);
 the Greek-derived interfix –sar- helps the integration of loan verbs. E.g.:
lućisárla ‘(it) shines’;
 the adverb of the comparative and superlative forms of adjectives and adverbs
is the South Slavic-derived lexical item po ‘more’;
 the most widespread negation is na (‘no’). However, other negations are used
i.e. nákha (‘no’) and ni (‘not even’).
At the lexical level, a thorough analysis of the data presented by Sarău (1998:151-155)
shows that Xoraxane Romani preserves several Indo-Aryan lexemes that were
lost in the aforementioned dialects ((o) sastró ‘father-in-law’, (i) sasúj ‘mother-in-law’)
and Greek lexical items also lost in the previous dialects (vázi ‘plain, field’), alongside
Turkish lexemes ((o) kïzáj ‘child’, (i) źeśtaj ‘sister-in-law’, (o) śakás ‘joke’). It also
shows that this dialect lost some old Indo-Aryan lexical items ((o) truśúl ‘cross’, (e)
patrin ‘leaf’, (ś)uźo ‘pure’) and some pre-Balkanic loanwords ((o) xer ‘ass’, (o)
ambról ‘pear’, (o) wurdón ‘cart’) and replaced them with Romanian borrowings, i.e.
(i) krúćea, (i) frúnza, kurátos, (o) magári, (i) péra and (i) karúca.
1.2.2.4 Transylvanian Carpathian Romani
This dialect is spoken in Crișana, Maramureș and other parts of Transylvania. At the
level of phonology, Sarău (1998: 124-125) identifies the following features:
 reduction of the affricate /dʒ/ to the fricative /ʒ/ in native words and Romanian
borrowings. E.g.: gaği > gaźi ‘non-Romani woman’, /ɨntseleʒiˈnel/ ‘to
understand’ < (Romanian) /ɨntseˈledʒe/. This feature is likely to be the
outcome of interdialect contact with Kalderash Romani;
 affrication of /c/ before /i/. E.g.: lésќoro > lésćiro ‘his’, tikno > cikno ‘little-
M.SG.’;
 syncope in nouns, adjectives, and Genitive forms. E.g.: bimóskoro > bimósk-ro
‘dumb’;
 vocalization of word-final /v/. E.g.: (me) duśáw ‘I am milking’, ruw ‘he-wolf’;
 regressive assimilation of the consonant /v/ in final position in first person
singular Present Indicative forms. E.g.: (me) vazdav tut > (me) vadzap tut ‘(I)
am helping you’;
 presence of the open-mid front unrounded vowel /ɛ/ in some varieties, which
has phonemic value, marking the opposition between singular and plural
nouns. E.g.: ɛ̀rmɛ ‘a head of cabbage’ versus ɛ̀rme ‘heads of cabbage’ (Sumi,
2016);
 aspiration of the fricative /s/ in the initial position and intervocalic context via
debuccalization. E.g.: maresa > marɛhɛ ‘(you) will beat/strike’;
 emergence of the allophones /ɨ, ə/ of the vowels /i, e/ under Kalderash
influence. E.g.: briśind > brïśïnd ‘rain’, berś > bërś ‘year’.
At the level of morphology, Sarău (1998:125-126) notices the following features:
 presence of the suffix –i used for adapting foreign nouns e.g. salivári ‘bridle’;
 the diphthong -ió- in the first-person singular Perfect ending. E.g.: me
duślióm ‘I have milked’;
 a specific modal verb kámpel expressing necessity, derived from kamel pes ‘to
be wanted, desired’. E.g.: kadá léske kámpel! ‘This is what he deserves!’;
 presence of Romanian-derived agentive nominalisers –íca and –tóri. E.g.:
manglitoríca ‘female beggar’, manglitóri ‘male beggar’. This feature can
represent an instance of Kalderash Romani influence;
 extensive use of the Indic-derived abstract nominaliser –bó e.g. ćioribó ‘theft’;
 plural oikoclitic abstract nouns receive the Indic-derived plural marker –e,
unlike in Kalderash, where the Greek-derived plural marker –mata is used e.g.
gulibe ‘sweets’;
 xenoclitic participles derived with the Greek suffix –imen exhibit plural
forms, unlike in Vlax dialects where they are invariable. E.g.: Lala hin la bala
feștymena. ‘She has dyed her hair’ (Daniel-Samuel Petrilă, personal
communication);
 the forms of the article are o for masculine animate and inanimate nouns, e for
feminine animate and inanimate nouns, o for all genders in plural Nominative
and le for all genders and numbers in Oblique cases.
The dialect spoken in Romania evinces Hungarian, Romanian and South Slavic loanwords.
Some examples are gedèśi or vònato (‘train’), ćegebigo (‘snail’), hegedűve (‘violin’),
bìstoś (‘sure’), śàrkany (‘dragon’) and kapalinav (‘(I) hoe’), of Hungarian origin;
ïnźero (‘angel’), leźinaw (‘(I) choose’), lulàva (‘pipe’) and vrämä (‘time’), taken from
Romanian; and dósta (‘enough’), xoláva (‘trousers’), źámba (‘frog’), źélta (‘bank-note’),
zéleno (‘green’) and gláźa (‘glass’), borrowed from South Slavic languages.
1.2.3 A Sketch of European Romani Dialects
The aim of this subsection is to present a brief sketch of the main Romani dialects
spoken in Europe that will be dealt with in the last chapter, entitled Levels of
Codification. The dialects that will be tackled are Arli Romani, Sinti Romani, Welsh
Romani, and Rumelian Romani.
1.2.3.1 Arli Romani
Arli Romani is a dialect belonging to the first Layer of linguistic development. It is
spoken in Macedonia and widely across the Balkans. The main linguistic features of
this dialect were outlined by Friedman (n.d.: 1-8):
 final stress in native words, penultimate stress in nominal and verbal
inflection, in Hellenisms and other recent loanwords. E.g.: dženó ‘male
person’, Dative dženéske, vogí ‘soul, belly’, Dative vogéske, džépo ‘pocket’,
Dative džepóske, sfíri ‘hammer’, Dative sfiríske;
 emergence of the vowel /y/ in some Xoraxane Macedonian Arli dialects under
Albanian and Turkish influence, and of schwa. E.g.: dikhljum > dikhlüm
/dikʰˈlym/, sastrən ‘iron’;
 it has five main vowels (/a/, /e/, /i/, /o/, /u/), evincing no opposition of length;
 a great number of palatalizations. E.g.: čhumindžum ‘(I) have kissed’, biandža
‘(she) gave birth’, kerdža ‘(he/she) made’, pačav ‘(I) believe, think’;
 prothetic /v/ and /j/ in nouns and adjectives. E.g.: vogí ‘soul, belly’, javer
‘other-SG.’;
 neutralization of aspiration in word-final position and before another
consonant. E.g.: jek ‘one’, pučel ‘(he/she) asks’;
 apocope of final /s/ in masculine singular Accusative nouns. E.g.: čhaves >
čhave ‘Romani guy-acc.’;
 instrumental forms ending in –r (-çar) in Plural nouns;
 reduction to a glide of morphological /s/ in intervocalic environment
(-e[j]a/-a[j]a) in Singular nouns;
 analytic Future forms made up with the grammaticalized particle ka, deriving
from a lexical verb kamel (‘to want, to wish, to desire’). E.g.: ka ovel man ‘(I)
shall have’;
 Genitives are usually long and unsyncopated. However, some Arli dialects
evince short singular masculine Genitive forms. E.g.: me phraleske kheresko
vudar ‘the door of my brother’s house’;
 presence of the Slavic-derived comparative and superlative adverbs (po- and
naj-) used for forming comparative and superlative forms of adjectives and
adverbs. However, the older suffix –eder used for deriving comparative forms
is also attested;
 extensive use of the abstract nominaliser –pe/-be. E.g.: ternipe ‘youth’, maribe
‘war’.
1.2.3.2 Sinti Romani
Sinti Romani is a group of Romani dialects spoken in Western Europe. It has the
following branches: Sinti Gačkane spoken in Germany, Sinti Estrexarja spoken in
Austria, Piedmontese Sinti spoken in Northern Italy, Lalere Sinti spoken in
Hildesheim, Abruzzian Sinti spoken in Southern Italy and Manouche spoken in
France. It is an archaic group, belonging to the First Layer of development. The main
features of these dialects were outlined by Holzinger (1995):
 betacism under German influence: *phabaj > phāvi ‘apple’, *birta > virta
‘pub’;
 fortition of final etymological /v/. E.g.: patíb < *pativ ‘duty’, jōb < *jov ‘he’;
 fronting to /e/ of vowel /a/. E.g.: šuker < śukar ‘handsome’, sonekáj <
sonakaj ‘gold’;
 prothetic /j/ in third-person personal pronouns. E.g.: jōb ‘he’, joi ‘she’, jon
‘they’;
 devoicing of word-final consonants. E.g.: *ʒav >*ʒab > džap ‘I am going’;
 voicing of word-initial consonants. E.g.: brindževau ‘(I) know’, ginom ‘(I)
bought’;
 syncope of /e/ in unstressed position: krel < *kerel ‘to do’, gjom < *geljom
‘(I) went’;
 vocalization of word-final /v/. E.g.: brindževau ‘(I) know’, ginau ‘(I) tell’;
 aspiration of the fricative /s/ via debuccalization. E.g.: his < *sas ‘was-3rd.pers.sg.’;
 aphaeresis. E.g.: vrī < avri ‘out’, prē < opre ‘up’, drē < andre ‘in’, men <
amen ‘us’;
 presence of the back vowel /o/ in the 1st pers. sg. Perfect ending. E.g.: dajom
‘(I) gave’, gjóm ‘(I) went’;
 Italian loanwords in Sinti varieties spoken in Northern Italy. E.g.: kontán
‘satisfied’, óni ‘every’, sémpar ‘always, forever’, ke ‘whom’, kun ‘with’
(Franzese, 2002);
 the relative clause always follows the modified noun. The relativizer is kai;
 existence of subject clitics (lo/li/le), which Holzinger (1993:290-308) cited in
Matras (1999:149) defines as ‘highly contiguous referential markers’;
 presence of resumptive pronouns of animate direct and indirect objects. E.g.:
Ko rom, kai brindževau les. ‘The man that I know.’;
 the Passive is formed with the verb va- (‘to come’) plus the Perfect Participle,
while states are expressed with the verb ‘to be’ plus the Perfect Participle. E.g.:
O vūder his phandlo. ‘The door was closed.’;
 auxiliaries appear in final position in subordinate clauses. E.g.:
Job ginas drē, kai koi ta koi o mūlo tšido hi.
he counted in that there and there a dead laid is
‘He recounted that there is a dead man at such and such a place.’
1.2.3.3 Welsh Romani
Welsh Romani was a dialect spoken in Wales that went extinct in the course of the
twentieth century. The main source of linguistic data is John Sampson’s
monumental work The Dialect of the Gypsies of Wales (1926), which documents the
following features:
 emergence of phonemes derived from language contact with Welsh, noted by
Sampson as <l̥, r̥, w̥>. E.g.: l̥okō ‘light’, r̥ōd- ‘to seek’, w̥aspa ‘wasp’;
 emergence of phonemes derived from contact with English such as /ɒ, θ, ð, w/.
E.g.: tåp ‘on’, munþos ‘mouth’, gloðera ‘soldier’, wēva ‘wave’;
 reduction of the aspirated affricate /ʧʰ/ to the fricative /ʃ/. E.g.: śōn ‘month’;
 simplification of clusters /vd, vn/ through assimilation. E.g.: gurunī < guruvni
‘cow’;
 metathesis of clusters featuring /r/. E.g.: karfin > krafnī ‘nail’, ruzalō < zuralo
‘strong’;
 palatalization of /d/ followed by the front vowel /i/. E.g.: klidin > klizín
‘key’, goźalō < godălo ‘wise’;
 paragogic /n/ and /v/: būtī > būtīn ‘work’, čuri > čurin ‘knife’, pānī > pānín
‘water’;
 extensive use of the suffix –ben/-pen. E.g.: bārvalipen ‘wealth’, dūriben
‘distance’;
 extensive use of the borrowed suffix –mos for deriving abstract nouns from
both inherited roots (cf. asavimos 'laughter') and loan verbs (cf. bignimos
‘beginning’);
 long Present Indicative forms ending in the particle –a, which expresses
futurity in Vlax Romani. E.g.: Ĵunáva ī vaverén ‘(I) know the others.’;
 presence of the Romanian-derived nominalizer –ārī(s). E.g.: kirčimāris ‘inn-
keeper’;
 both short and long unsyncopated Genitives. E.g.: dadéskī grasnī ‘father’s
mare’, dadéskō grai ‘father’s horse’, vasteŋerī ‘of the hands’, vavereskō ‘of
the other’.
1.2.3.4 Rumelian Romani
This dialect belongs to the so-called Balkan Romani dialect family, being spoken by
the Romani people in Turkish Rumelia. The most famous scholar who did extensive
research on this dialect is Alexander Paspati (1870). The most important features are
the following:
 preservation of the affricates /dʒ, ʧ/. E.g.: tcharó ‘plate’, gadjò ‘foreigner’;
 palatalization. E.g.: penghiás < phendăs ‘(he) said’;
 analytic Future made up with the particle kama-. E.g.: kamakeràva ‘(I) shall
make’;
 long and unsyncopated Genitives e.g. sappéskoro múi ‘the snake’s mouth’;
 the masculine definite article is o (e.g. o tchavó 'the Romani lad'), the feminine
definite article is i, its plural form is o for both genders (e.g. o trin borià ‘the
three brides’), while the Oblique article is e for both genders. E.g.: e kherés
‘the ass-acc.’;
 productive use of the abstract nominalizer –pé/-bé. E.g.: maribé ‘fight’,
putchipé ‘question’;
 rare instances of zero plural marking (ø). E.g.: sávvore o rómø ‘all the Roma’;
 the diphthong /iǒ/ present in Past Indicative forms. E.g.: diklióm ‘(I) have seen’;
 long Present Indicative forms ending in –a, a particle that expresses futurity in
Vlax Romani. E.g.: teréla panj khér ‘(he) owns five asses’, kináva matchó ‘(I)
buy fish’;
 sporadic dejotation of the feminine plural morpheme. E.g.: patriná < patriniá
‘leaves’.
1.2.4 Para-Romani Language Varieties
These mixed languages, also called pogadi ćhib (‘broken language’) by Marcel
Courthiade, feature Indo-Aryan lexicon and non-Romani phonology, morphology and
syntax. In the book entitled Romani in Contact: The History, Structure and Sociology
of a Language (Matras, 1994), Peter Bakker (1994:136) mentions that the following
languages gave rise to these mixed languages: Swedish, Norwegian, German,
Catalan, Portuguese, English, Basque, Spanish, Greek, Persian, Turkish and
Armenian. The main Para-Romani languages are: Caló, with its two branches
(Portuguese Caló, spoken in Brazil, and Andalusian Caló, spoken in Spain and Portugal);
Catalonian Para-Romani, once spoken in Catalonia; Scando-Romani, a cover term for
the mixed Traveler Danish, Tavringer Romani and Traveler Norwegian varieties;
Erromintxela, spoken in the Basque Country; Angloromani, spoken in the United
Kingdom by the English Romanies; Helleno-Romani, a cover term for Finikas
Romika, Janina Para-Romani and Konitsa Para-Romani, spoken by the Greek Romanies;
Romano-Serbian, a mixed Serbian-based language variety spoken by some Serbian
Roma; Laiuse Romani, an Estonian-based variety once spoken in Estonia; Bohemian
Romani, a mixed Czech-Romani language formerly spoken in Bohemia; Crimean
Romani, a Turkish-based language variety spoken by Crimean Roma; and Kurbetcha,
a Cypriot Turkish-based variety spoken by Cypriot Roma.
Some hypotheses mentioned by Bakker (1994:126) regarding the genesis of these
mixed languages are the conscious creation of a mixed idiom, relexification to language
intertwining, and the rescue of a dying language by preserving its lexicon while
borrowing the phonology and morphosyntax of the host language on a wholesale scale.
Contents
1.1 Romani Language-A Diachronic Approach
1.1.1 Proto-Romani
1.1.2 Early Romani
1.1.3 Romani in the context of Indo-Aryan languages
1.2 Romani Language - A Synchronic Approach
1.2.1 Dialect Classifications
1.2.2 An Overview of the Romani Dialects Spoken in Romania
1.2.3 A Sketch of European Romani Dialects
1.2.4 Para-Romani Language Varieties