
Unified Parts of Speech (POS) Standard in Indian Languages
- Draft Standard - Version 1.0

Department of Information Technology
Ministry of Communications & Information Technology
Govt. of India

Copyright@TDIL

CONTENTS

1. INTRODUCTION
2. SCOPE
3. TERMINOLOGY
3.1 POS Tag
3.2 XML Schema
3.3 Metadata
4. WHAT IS A POS TAG
5. REQUIREMENTS OF A POS TAG
5.1 Need of XML Schema in designing common POS format
6. POS TAG SET FOR INDIAN LANGUAGES
7. XML INTERNATIONALIZATION BEST PRACTICES
7.1 What is Internationalization Tag Set (ITS)
8. XML SCHEMA
9. METADATA ON POS
10. ONE TO ONE MAPPING LABELS IN POS SCHEMA
11. POS SCHEMA BLOCK DIAGRAM
12. DRAFT POS SCHEMA FOR INDIAN LANGUAGES USING XML
13. ONE TO ONE MAPPING LABELS FOR INDIAN LANGUAGES
14. ALGORITHM FOR SELECTION OF NODES
15. REFERENCE BASED IMPLEMENTATION
16. REFERENCE

ANNEXURES

A. Language Code Table

1. INTRODUCTION

Part-of-speech (POS) tagging, which labels each word as a noun, pronoun, verb,
demonstrative, etc., is one of the key building blocks for developing Natural Language
Processing applications. This POS schema is based on W3C XML Internationalization best
practices, ISO 639-3 language codes for language identification, ISO 12620:1999 for
metadata definition, and a one-to-one mapping table for all the labels used in the POS schema.

This document sets out the structural part of the XML Schema definition language and
shows how to build an XML POS schema for tagging. It introduces the nature of XML
Schemas and the abstract data model of the XML POS schema, along with the other
terminology used throughout the document. It also specifies the precise semantics of each
component of the abstract model and the representation of each component in XML. The
document contains a block diagram that shows the flow of creating the XML schema for
POS tagging, and an algorithm that incorporates metadata as per ISO 12620:1999.

2. SCOPE
A common, unified, XML-based POS schema for Indian languages has been formulated on
the basis of W3C Internationalization best practices. The schema has been developed
to take into account the NLP requirements of Web-based services in Indian languages.
This standard specifies the XML POS schema for tagging and discusses the labels that
can be used in it.

3. TERMINOLOGY

3.1 POS Tag: A POS tag is a label that marks the part of speech of a word. A
Part-Of-Speech Tagger (POS Tagger) is a piece of software that reads text in some
language and assigns a part of speech to each word.

3.2 XML Schema: XML Schemas express shared vocabularies and allow machines to
carry out rules made by people. A schema defines a class of XML documents, so the
term "instance document" is often used to describe an XML document that conforms to
a particular schema.

3.3 Metadata: Metadata describes how and when and by whom a particular set of data
was collected, and how the data is formatted.

4. WHAT IS A POS TAG


A Part-Of-Speech Tagger (POS Tagger) is a piece of software that reads text in some
language and assigns a part of speech to each word. Parts of speech include nouns, verbs,
adverbs, adjectives, pronouns, conjunctions and their sub-categories.

The input to a tagging algorithm is a string of words of a natural language sentence and a
specified tag set (a finite list of Part-of-speech tags). The output is a single best POS tag
for each word.
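
The input/output contract described above can be illustrated with a minimal sketch. The lexicon-lookup strategy is an assumption made only for illustration (this document does not prescribe a tagging algorithm); the words and tags are taken from the Hindi table in Section 6, and the names LEXICON, TAG_SET and tag_sentence are hypothetical.

```python
from typing import Dict, List, Tuple

# Toy lexicon using example words and annotations from the Hindi table.
# A real tagger would use context and statistics; lookup is an assumption here.
LEXICON: Dict[str, str] = {
    "kitaaba": "N__NN",
    "mohan": "N__NNP",
    "ne": "PSP",
    "sundara": "JJ",
    "hai": "V__VAUX",
    "aur": "CC__CCD",
}

# The specified tag set: a finite list of POS tags (here, only those in use).
TAG_SET = set(LEXICON.values()) | {"RD__UNK"}


def tag_sentence(words: List[str]) -> List[Tuple[str, str]]:
    """Return exactly one (word, tag) pair for each input word."""
    tagged = []
    for word in words:
        # Words missing from the lexicon fall back to the Unknown residual tag.
        tag = LEXICON.get(word.lower(), "RD__UNK")
        tagged.append((word, tag))
    return tagged


result = tag_sentence(["Mohan", "ne", "kitaaba", "xyz"])
print(result)
```

The sketch only shows the shape of the interface: a list of words and a tag set go in, and a single tag per word comes out.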

5. REQUIREMENTS OF A POS TAG

A POS tagger can be used as a pre-processor. Text indexing and retrieval use POS
information. POS taggers are used for building tagged corpora and Machine Translation
systems. Speech processing uses POS tags to decide pronunciation.
A POS tagger is also needed to identify the tag for words that could not be analysed
by the morphological analyser. If the morphological analyser gives multiple tags for a
word, the tagger can be used to resolve the ambiguity.
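
The disambiguation role described above can be sketched as follows. This is a toy illustration under stated assumptions: the CANDIDATES table stands in for the output of a morphological analyser, and the rule "prefer the demonstrative reading when the next word is a noun" is a hypothetical heuristic, not part of this standard. The word "vaha" is used because the Hindi table lists it both as a personal pronoun (PR__PRP) and as a deictic demonstrative (DM__DMD).

```python
from typing import Dict, List

# Hypothetical morphological-analyser output: candidate tags per word.
CANDIDATES: Dict[str, List[str]] = {
    "vaha": ["PR__PRP", "DM__DMD"],  # ambiguous
    "kitaaba": ["N__NN"],
    "gayaa": ["V__VM"],
}


def resolve(words: List[str]) -> List[str]:
    """Pick one tag per word, using the next word as context when ambiguous."""
    tags = []
    for i, word in enumerate(words):
        # Words the analyser could not analyse get the Unknown residual tag.
        options = CANDIDATES.get(word, ["RD__UNK"])
        if len(options) == 1:
            tags.append(options[0])
            continue
        next_options = CANDIDATES.get(words[i + 1], []) if i + 1 < len(words) else []
        next_is_noun = any(t == "N" or t.startswith("N__") for t in next_options)
        # Toy heuristic (assumption): a demonstrative directly precedes a noun.
        if next_is_noun and "DM__DMD" in options:
            tags.append("DM__DMD")
        else:
            tags.append(options[0])
    return tags


print(resolve(["vaha", "kitaaba"]))
print(resolve(["vaha", "gayaa"]))
```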

5.1 NEED OF XML SCHEMA IN DESIGNING COMMON POS FORMAT

XML is used to create the POS tag set in order to standardize the POS tag framework
for all Indian languages.
The main benefits of using XML for the POS tag set for Indian languages (ILs) are:
• It supports multilingual documents and Unicode.
• XML allows developers to add extra information to a format without breaking
applications.
• XML documents can be stored without the involvement of a database administrator,
because they carry their own metadata in the form of tags and attributes.
• The tree structure of XML documents allows documents to be compared and
aggregated efficiently, element by element.
• XML documents can consist of nested elements that are distributed over multiple
remote servers.
• It is easier to convert data between different data types.
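
As an illustration of the first three points, the sketch below stores a tagged Hindi sentence as Unicode XML using Python's standard xml.etree.ElementTree module. The element names "sentence" and "token" and the attribute "pos" are hypothetical and are not taken from the schema defined in this standard; "hin" is the ISO 639-3 code for Hindi, and the words and tags come from the Hindi table in Section 6.

```python
import xml.etree.ElementTree as ET

# Namespace URI reserved for xml:* attributes such as xml:lang.
XML_NS = "http://www.w3.org/XML/1998/namespace"


def build_tagged_sentence(lang, tokens):
    """Build a <sentence> element holding one <token pos="..."> per word."""
    sentence = ET.Element("sentence")  # hypothetical element name
    sentence.set("{%s}lang" % XML_NS, lang)  # serialised as xml:lang
    for word, tag in tokens:
        token = ET.SubElement(sentence, "token", {"pos": tag})
        token.text = word
    return sentence


# kitaaba sundara hai, written in Devanagari.
tokens = [("किताब", "N__NN"), ("सुंदर", "JJ"), ("है", "V__VAUX")]
sentence = build_tagged_sentence("hin", tokens)

# Serialise to UTF-8 bytes and parse back: the tags and attributes carry
# the metadata, so no external description of the data is required.
xml_bytes = ET.tostring(sentence, encoding="utf-8")
parsed = ET.fromstring(xml_bytes)
recovered = [(t.text, t.get("pos")) for t in parsed.findall("token")]
print(recovered)
```

Because applications read only the elements and attributes they know, a later version could add a further attribute to each token without breaking this reader.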

6. POS TAG SET FOR INDIAN LANGUAGES


POS Categories and Labels

In the Category column, top-level categories are flush left, level 1 subtypes are indented once and level 2 subtypes are indented twice.

Sl. No  Category              Label  Annotation      Remarks
                                     Convention**
1       Noun                  N      N
1.1       Common              NN     N__NN
1.2       Proper              NNP    N__NNP
1.3       Verbal              NNV    N__NNV          The verbal noun subtype is only for languages such as Tamil and Malayalam
1.4       Nloc                NST    N__NST
2       Pronoun               PR     PR
2.1       Personal            PRP    PR__PRP
2.2       Reflexive           PRF    PR__PRF
2.3       Relative            PRL    PR__PRL
2.4       Reciprocal          PRC    PR__PRC
2.5       Wh-word             PRQ    PR__PRQ
2.6       Indefinite          PRI    PR__PRI
3       Demonstrative         DM     DM
3.1       Deictic             DMD    DM__DMD
3.2       Relative            DMR    DM__DMR
3.3       Wh-word             DMQ    DM__DMQ
3.4       Indefinite          DMI    DM__DMI
4       Verb                  V      V
4.1       Main                VM     V__VM
4.1.1       Finite            VF     V__VM__VF
4.1.2       Non-finite        VNF    V__VM__VNF
4.1.3       Infinitive        VINF   V__VM__VINF
4.1.4       Gerund            VNG    V__VM__VNG
4.2       Verbal              VN     V__VN           paTittam, naTattam, naTanam
4.2       Auxiliary           VAUX   V__VAUX
4.2.1       Finite            VF     V__VAUX__VF
4.2.2       Non-finite        VNF    V__VAUX__VNF
4.2.3       Infinitive        VINF   V__VAUX__VINF
4.2.4       Gerund            VNG    V__VAUX__VNG
4.2.5       Participle noun   VNP    V__VAUX__VNP
5       Adjective             JJ
6       Adverb                RB                     Only manner adverbs
7       Postposition          PSP
8       Conjunction           CC     CC
8.1       Co-ordinator        CCD    CC__CCD
8.2       Subordinator        CCS    CC__CCS
8.2.1       Quotative         UT     CC__CCS__UT
9       Particles             RP     RP
9.1       Default             RPD    RP__RPD
9.2       Classifier          CL     RP__CL
9.3       Interjection        INJ    RP__INJ
9.4       Intensifier         INTF   RP__INTF
9.5       Negation            NEG    RP__NEG
10      Quantifiers           QT     QT
10.1      General             QTF    QT__QTF
10.2      Cardinals           QTC    QT__QTC
10.3      Ordinals            QTO    QT__QTO
11      Residuals             RD     RD
11.1      Foreign word        RDF    RD__RDF         A word written in a script other than the script of the original text
11.2      Symbol              SYM    RD__SYM         For symbols such as $, & etc.
11.3      Punctuation         PUNC   RD__PUNC        Only for punctuation
11.4      Unknown             UNK    RD__UNK
11.5      Echowords           ECH    RD__ECH

POS for Hindi

In the Category column, top-level categories are flush left and level 1 subtypes are indented once.

Sl. No  Category              Label  Annotation      Examples                                  Remarks
                                     Convention**
1       Noun                  N      N               ladakaa, raajaa, kitaaba
1.1       Common              NN     N__NN           kitaaba, kalama, cashmaa
1.2       Proper              NNP    N__NNP          Mohan, ravi, rashmi
1.4       Nloc                NST    N__NST          Uupara, niice, aage, piiche
2       Pronoun               PR     PR              Yaha, vaha, jo
2.1       Personal            PRP    PR__PRP         Vaha, main, tuma, ve
2.2       Reflexive           PRF    PR__PRF         Apanaa, swayam, khuda
2.3       Relative            PRL    PR__PRL         Jo, jis, jab, jahaaM
2.4       Reciprocal          PRC    PR__PRC         Paraspara, aapasa
2.5       Wh-word             PRQ    PR__PRQ         Kauna, kab, kahaaM
2.6       Indefinite          PRI    PR__PRI         Koii, kis
3       Demonstrative         DM     DM              Vaha, jo, yaha
3.1       Deictic             DMD    DM__DMD         Vaha, yaha
3.2       Relative            DMR    DM__DMR         jo, jis
3.3       Wh-word             DMQ    DM__DMQ         kis, kaun
3.4       Indefinite          DMI    DM__DMI         KoI, kis
4       Verb                  V      V               giraa, gayaa, sonaa, haMstaa, hai, rahaa
4.1       Main                VM     V__VM           giraa, gayaa, sonaa, haMstaa
4.2       Auxiliary           VAUX   V__VAUX         hai, rahaa, huaa
5       Adjective             JJ     JJ              sundara, acchaa, baRaa
6       Adverb                RB     RB              jaldii, teza
7       Postposition          PSP    PSP             ne, ko, se, mein
8       Conjunction           CC     CC              aur, agar, tathaa, kyonki
8.1       Co-ordinator        CCD    CC__CCD         aur, balki, parantu
8.2       Subordinator        CCS    CC__CCS         Agar, kyonki, to, ki
9       Particles             RP     RP              to, bhii, hii
9.1       Default             RPD    RP__RPD         to, bhii, hii
9.3       Interjection        INJ    RP__INJ         are, he, o
9.4       Intensifier         INTF   RP__INTF        bahuta, behada
9.5       Negation            NEG    RP__NEG         nahiin, mata, binaa
10      Quantifiers           QT     QT              thoRaa, bahuta, kucha, eka, pahalaa
10.1      General             QTF    QT__QTF         thoRaa, bahuta, kucha
10.2      Cardinals           QTC    QT__QTC         eka, do, tiina
10.3      Ordinals            QTO    QT__QTO         pahalaa, duusaraa
11      Residuals             RD     RD
11.1      Foreign word        RDF    RD__RDF                                                   A word written in a script other than the script of the original text
11.2      Symbol              SYM    RD__SYM         $, &, *, (, )                             For symbols such as $, & etc.
11.3      Punctuation         PUNC   RD__PUNC        ., : ;                                    Only for punctuation
11.4      Unknown             UNK    RD__UNK
11.5      Echowords           ECH    RD__ECH         (Paanii-) vaanii, (khaanaa-) vaanaa
** The annotation is to be done using the lowest level tag of the type hierarchy. Once the
lower level tag is selected, the higher level tags should be stored automatically.
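
The rule in the note above (annotate with the lowest-level tag, store the higher-level tags automatically) can be sketched as follows. The function names expand_annotation and top_level are hypothetical; the only fact relied on is that the annotation convention in the tables joins the labels along the hierarchy with a double underscore.

```python
from typing import List

SEPARATOR = "__"  # the annotation convention joins hierarchy labels with "__"


def expand_annotation(annotation: str) -> List[str]:
    """Return the annotation of every level from the top-level tag
    down to the lowest-level tag that the annotator selected."""
    labels = annotation.split(SEPARATOR)
    return [SEPARATOR.join(labels[: i + 1]) for i in range(len(labels))]


def top_level(annotation: str) -> str:
    """Return the top-level category label of an annotation."""
    return annotation.split(SEPARATOR)[0]


print(expand_annotation("V__VM__VF"))  # → ['V', 'V__VM', 'V__VM__VF']
print(expand_annotation("JJ"))         # → ['JJ']
print(top_level("RD__PUNC"))           # → RD
```

With this approach the annotator records only the lowest-level annotation, and the higher levels are derived from it rather than entered by hand.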

POS for Punjabi

Sl. No  Category (Top level / Subtype level 1 / Subtype level 2)  Label  Annotation Convention**  Examples  Remarks
1 Noun N N ਘਰ ਿਕਤਾਬ Gara kiwAba
kahANI
ਕਹਾਣੀ ਸਡ਼ sadZaka

1.1 Common NN N__NN ਘਰ ਿਕਤਾਬ Gara kiwAba


kahANI
ਕਹਾਣੀ ਸਡ਼ sadZaka

1.2 Proper NNP N__NNP ਹਰਿਵੰ ਦਰ haraviMxara


xiYlI

ਿਦੱ ਲੀ wAjamahila

ਤਾਜਮਿਹਲ
1.4 Nloc NST N__NST �ਤੇ ਥੱ ਲੇ ਅੱ ਗੇ uYwe WaYle
aYge piYCe
ਿਪੱ ਛੇ
2 Pronoun PR PR ਮ� ਤੂੰ ਉਹ ਇਹ mEz wUM uha
iha jo
ਜੋ
2.1 Personal PRP PR__PRP ਮ� ਤੁੰ ਉਹ mEz wuM uha

2.2 Reflexive PRF PR__PRF ਆਪਣਾ ਆਪ ApaNA Apa


Kuxa
ਖੁਦ
2.3 Relative PRL PR__PRL ਜੋ, ਿਜਸ jo jisa jihadZA
jaxoz
ਿਜਹਡ਼, ਜਦ�,
2.4 Reciprocal PRC PR__PRC ਆਪਸ Apasa

2.5 Wh-word PRQ PR__PRQ ਕੌ ਣ ਕਦ� ਿਕੱ ਥੇ kONa kaxoz


kiYWe
2.6 Indefinite PRI PR__PRI ਕੋਈ, ਿਕਸ koI kisa

3 Demonstrative DM DM ਉਹ ਜੋ ਇਹ uha jo iha

3.1 Deictic DMD DM__DMD ਇਹ ਉਹ iha uha

3.2 Relative DMR DM__DMR ਜੋ ਿਜਸ jo jisa

3.3 Wh-word DMQ DM__DMQ ਕੌ ਣ kONa

3.4 Indefinite DMI DM__DMI ਕੋਈ ਿਕਸ koI kisa

4 Verb V V ਆਇਆ ਜਾ AiA jA karaxA


mArAzgA
ਕਰਦਾ rahiMxA
ਮਾਰ�ਗਾ
ਰਿਹੰ ਦਾ
4.1 Main VM V__VM ਆਇਆ ਜਾ AiA jA karaxA
mArAzgA
ਕਰਦਾ rahiMxA
ਮਾਰ�ਗਾ
ਰਿਹੰ ਦਾ
4.1.2 Non-finite VNF V__VM__VNF ਜ�ਿਦਆਂ jAzxiAz
AuzxiAz
ਆ�ਿਦਆਂ karaxiAz

ਕਰਿਦਆਂ ਖਾਕੇ KAke jAke


ਜਾਕੇ
4.1.3 Infinitive VINF V__VM__VINF ਿਗਆਂ giAz, AiAz,
kariAz
ਆਇਆਂ
ਕਿਰਆਂ
4.1.4 Gerund VNG V__VM__VNG ਜਾਣ� ਖਾਣ� ਪੀਣ� jANoz KANoz
pINoz
ਮਰਨ� maranoz

4.2 Auxiliary VAUX V__VAUX ਹੈ ਸੀ ਸਿਕਆ hE sI sakiA


hoiA
ਹੋਇਆ
5 Adjective JJ ਸੋਹਣਾ ਚੰ ਗਾ sohaNA
caMgA
ਮਾਡ਼ਾ ਕਾਲ਼ mAdZA kAA
6 Adverb RB ਹੌਲ਼ੀ ਕਾਹਲੀ hOI kAhalI

7 Postposition PSP ਨ� ਨੂੰ ਤ� ਨਾਲ ne nUM woz


nAla
8 Conjunction CC CC ਅਤੇ ਿਕ�ਿਕ awe kiuzki
agara ki sagoz
ਅਗਰ ਿਕ ਸਗ�
8.1 Co-ordinator CCD CC__CCD ਅਤੇ ਜ� awe jAz

8.2 Subordinator CCS CC__CCS ਿਕ�ਿਕ ਿਕ ਜੋ kiuzki ki jo


wAz
ਤ�
9 Particles RP RP ਵੀ ਤ� ਹੀ vI wAz hI

9.1 Default RPD RP__RPD ਵੀ ਤ� ਹੀ vI wAz hI

9.2 Classifier CL RP__CL Not required

9.3 Interjection INJ RP__INJ ਉਏ ਅਿਡ਼ਆ ue adZiA nI


janAba
ਨੀ ਜਨਾਬ
9.4 Intensifier INTF RP__INTF ਬਹੁਤ ਬਡ਼ bahuwa
badZA
9.5 Negation NEG RP__NEG ਨਹ� ਨਾ ਿਬਨ� nahIz nA
binAz vagEra
ਵਗੈਰ
10 Quantifiers QT QT ਥੋਡ਼ਾ ਬਹੁਤ WodZA
bahuwA kAPI
ਕਾਫੀ ਕੁਝ ਇੱ ਕ kuJa iYka

ਪਿਹਲਾ pahilA

10.1 General QTF QT__QTF ਥੋਡ਼ਾ ਬਹੁਤ WodZA


bahuwA kAPI
ਕਾਫੀ ਕੁਝ kuJa
10.2 Cardinals QTC QT__QTC ਇੱ ਕ ਦੋ ਿਤੰ ਨ iYka xo wiMna

10.3 Ordinals QTO QT__QTO ਪਿਹਲਾ ਦੂਜਾ pahilA xUjA

11 Residuals RD RD
11.1 Foreign word RDF RD__RDF A word written
in script other
than the script
of the original
text
11.2 Symbol SYM RD__SYM $, &, *, (, ) For symbols
such as $, &
etc
11.3 Punctuation PUNC RD__PUNC ., : ; Only for
punctuations
11.4 Unknown UNK RD__UNK
11.5 Echowords ECH RD__ECH (ਪਾਣੀ-) ਧਾਣੀ (pANI-) XANI
(cAha-) cUha
(ਚਾਹ-) ਚੂਹ
** The annotation is to be done using the lowest level tag of the type hierarchy. Once the
lower level tag is selected, the higher level tags should be stored automatically.

Tagset for Dravidian Languages (Telugu, Kannada, Malayalam and Tamil)

In the Category column, top-level categories are flush left, level 1 subtypes are indented once and level 2 subtypes are indented twice.

Sl. No  Category              Label  Annotation      Remarks
                                     Convention**
1       Noun                  N      N
1.1       Common              NN     N__NN
1.2       Proper              NNP    N__NNP
1.3       Nloc                NST    N__NST
2       Pronoun               PR     PR
2.1       Personal            PRP    PR__PRP
2.2       Reflexive           PRF    PR__PRF
2.3       Relative            PRL    PR__PRL
2.4       Reciprocal          PRC    PR__PRC
2.5       Wh-word             PRQ    PR__PRQ
3       Demonstrative         DM     DM
3.1       Deictic             DMD    DM__DMD
3.2       Relative            DMR    DM__DMR
3.3       Wh-word             DMQ    DM__DMQ
4       Verb                  V      V
4.1       Main                VM     V__VM
4.1.1       Finite            VF     V__VM__VF
4.1.2       Non-finite        VNF    V__VM__VNF
4.1.3       Infinitive        VINF   V__VM__VINF
4.1.4       Gerund            VNG    V__VM__VNG
4.2       Verbal noun         NNV    N__NNV          Verbal noun
4.3       Auxiliary           VAUX   V__VAUX
4.3.1       Non-finite        VNF    V__VAUX__VNF
4.3.2       Infinitive        VINF   V__VAUX__VINF
5       Adjective             JJ
6       Adverb                RB                     Only manner adverbs
7       Postposition          PSP
8       Conjunction           CC     CC
8.1       Co-ordinator        CCD    CC__CCD
8.2       Subordinator        CCS    CC__CCS
8.2.1       Quotative         UT     CC__CCS__UT
9       Particles             RP     RP
9.1       Default             RPD    RP__RPD
9.2       Classifier          CL     RP__CL
9.3       Interjection        INJ    RP__INJ
9.4       Intensifier         INTF   RP__INTF
9.5       Negation            NEG    RP__NEG
10      Quantifiers           QT     QT
10.1      General             QTF    QT__QTF
10.2      Cardinals           QTC    QT__QTC
10.3      Ordinals            QTO    QT__QTO
11      Residuals             RD     RD
11.1      Foreign word        RDF    RD__RDF         A word written in a script other than the script of the original text
11.2      Symbol              SYM    RD__SYM         For symbols such as $, & etc.
11.3      Punctuation         PUNC   RD__PUNC        Only for punctuation
11.4      Unknown             UNK    RD__UNK
11.5      Echowords           ECH    RD__ECH
** The annotation is to be done using the lowest level tag of the type hierarchy. Once the
lower level tag is selected, the higher level tags should be stored automatically.

POS for Tamil

In the Category column, top-level categories are flush left, level 1 subtypes are indented once and level 2 subtypes are indented twice.

Sl. No  Category              Label  Annotation      Examples                                  Remarks
                                     Convention**
1       Noun                  N      N               paiyan, raajaa, puttakam
1.1       Common              NN     N__NN           puttakam, kaNNaaTi, paTam
1.2       Proper              NNP    N__NNP          moohan, ravi, maalati
1.3       Nloc                NST    N__NST          meel, kiiz, mun, pin
2       Pronoun               PR     PR              itu, atu, avan
2.1       Personal            PRP    PR__PRP         naan, nii, avaL, avarkaL
2.2       Reflexive           PRF    PR__PRF         taan
2.3       Relative            PRL    PR__PRL         yaar, etu, eppootu, enkee
2.4       Reciprocal          PRC    PR__PRC         oruvarukoruvar, avanavan, parasparam
2.5       Wh-word             PRQ    PR__PRQ         yaarum, yaaraavatu, yaaroo, etuvum
3       Demonstrative         DM     DM              a-, i-, e-
3.1       Deictic             DMD    DM__DMD         anta, inta, enta
3.2       Relative            DMR    DM__DMR         enta
3.3       Wh-word             DMQ    DM__DMQ         enta, yaar, eetaavatu, yaaraavatu
4       Verb                  V      V               vizu, poo, tuunku, aaku
4.1       Main                VM     V__VM           vizu, poo, tuunku, ciri
4.1.1       Finite            VF     V__VM__VF       vizuntaan, pooneen, cirittaaL
4.1.2       Non-finite        VNF    V__VM__VNF      vizunta, poonaal
4.1.3       Infinitive        VINF   V__VM__VINF     viza, pooka, cirikka
4.1.4       Gerund            VNG    V__VM__VNG      vizutal, cirittal, tuunkutal
4.2       Verbal              VN     V__VN           paTippu, naTai, naTattai, ceykai
4.3       Auxiliary           VAUX   V__VAUX         aakum, veeNTum, muTiyum
5       Adjective             JJ                     iniya, periya, azakaana
6       Adverb                RB                     veekamaaka, viraivaaka
7       Postposition          PSP                    paRRi, kuRittu, viTa
8       Conjunction           CC     CC              maRRum, eenenRaal, aanaal
8.1       Co-ordinator        CCD    CC__CCD         -um (raamanum), maRRum, aanaal, allatu    -um is a co-ordinator which can be added to noun and verb.
8.2       Subordinator        CCS    CC__CCS         enRu, ena, enpatu, enRaal
8.2.1       Quotative         UT     CC__CCS__UT     enRu, ena
9       Particles             RP     RP              maTTUm, kuuTa
9.1       Default             RPD    RP__RPD         maTTUm, kuuTa
9.2       Classifier          CL     RP__CL                                                    Not required
9.3       Interjection        INJ    RP__INJ         ayyoo, teey, aamaam
9.4       Intensifier         INTF   RP__INTF        ati, veku, mika
9.5       Negation            NEG    RP__NEG         illai
10      Quantifiers           QT     QT              koncam, niRaiya, oru, mutal
10.1      General             QTF    QT__QTF         koncam, niRaiya
10.2      Cardinals           QTC    QT__QTC         onRu, iraNTu
10.3      Ordinals            QTO    QT__QTO         mutal, iraNTaam
11      Residuals             RD     RD
11.1      Foreign word        RDF    RD__RDF                                                   A word written in a script other than the script of the original text
11.2      Symbol              SYM    RD__SYM         $, &, *, (, ), ruu.                       For symbols such as $, & etc.
11.3      Punctuation         PUNC   RD__PUNC        ., : ;                                    Only for punctuation
11.4      Unknown             UNK    RD__UNK
11.5      Echowords           ECH    RD__ECH         vaNTi kiNTi, paal kiil

POS for Malayalam

Sl. No  Category (Top level / Subtype level 1 / Subtype level 2)  Label  Annotation Convention**  Examples  Examples in Malayalam
1 Noun N N avan

mOhan

vItu

1.1 Common NN N__NN vItu,

vellam,

pattam

1.2 Proper NNP N__NNP mOhan, േമാഹ൯


ravi,
sIta രവി
സീത
1.3 Nloc NST N__NST mEle, േമെല
tAze,
munpil, താെഴ
pinnil
മുന്പി
പിന്ന
2 Pronoun PR PR avan,aval,at അവ൯
u,itu,
അവള,
അത,
ഇത
2.1 Personal PRP PR__PRP naan, nii, ഞാ൯,നീ,
avaL, avar
അവള,
അവ൪
2.2 Reflexive PRF PR__PRF tanne-taan തെന
ത്൯
2.3 Relative PRL PR__PRL aaro, ആേരാ
2.4 Reciprocal PRC PR__PRC tammiltammi തമ്മി
l,
parasparam തമ്മ

പരസ്
രം
2.5 Wh-word PRQ PR__PRQ aaru, evan ആര,
എവ൯,
3 Demonstrative DM DM aa-, ii-, ആ, ഈ
3.1 Deictic DMD DM__DMD atu, itu അത,
ഇത,
3.2 Relative DMR DM__DMR eetu ഏത
3.3 Wh-word DMQ DM__DMQ eetu, ennane ഏത,
എങ്ങ
4 Verb V V pO, kazhi, േപാ,
Annu,ciri
കഴി
ആണ(Cop
ula), ചിരി
4.1 Main VM V__VM pO, kazhi, േപാ,
cirri,Annu(c
opula) കഴി,
ആണ,
(copula),
ചിരി
4.1.1 Finite VF V__VM__VF pOyi, േപായി,
cirikkum,
kazhikkunnu ചിരി
Akunnu(copu
ക്ക,
la)
കഴിക്
ന്,
ആകുന്
(copula)
4.1.2 Non-finite VNF V__VM__VNF pOya, േപായ,
ciricca,
kazhicca ചിരിച,
കഴിച,
4.1.3 Infinitive VINF V__VM__VINF pOkku, േപാക്,
cirikkukayAl
kazhikkee, ചിരിക്
varAn/varuv
കയാല,
An

കഴിക്,
വരാ൯/
വരുവാ

4.2 Verbal VN V__VN paTittam, പഠിത്


naTattam,
naTanam നടത്
നടനം
4.3 Auxiliary VAUX V__VAUX kolluka, െകാല��ക
talluka,
kAnuka, ,
nOkkuka തല��ക,
കാണുക,
േനാക്

5 Adjective JJ valiya, വലിയ,
ceRiya
azakulla െചറിയ,
അഴകു

6 Adverb RB veegam, േവഗം,
ativeegam,
kUtutal. അതിേവ
ഗം,
കൂടുതല
7 Postposition PSP paRRi, kUte, പറ്,
കൂെട
8 Conjunction CC CC pakshe, പെക,
, enniTTum,
ennAl,ennalu
m, enkilum എന്നിട
,

എന്ന,

എന്

ലും

എങ്കില
8.1 Co-ordinator CCD CC__CCD -um ഉം
(rAmanum)
pakshe, (രാമനും)
പെക,

8.2 Subordinator CCS CC__CCS ennu, enna, എന്,


ennAl
എന,
എന്ന
8.2.1 Quotative UT CC__CCS__UT ennu, enna എന്,
എന,
9 Particles RP RP kute,mAtram കൂെട,
മാ്രത
9.1 Default RPD RP__RPD mAtram മാ്രത
9.2 Classifier CL RP__CL peer േപ൪
9.3 Interjection INJ RP__INJ ayyoo, അേയ്,
9.4 Intensifier INTF RP__INTF pala, valare, പല,
വളെര
9.5 Negation NEG RP__NEG illa, alla ഇല�,
അല�
10 Quantifiers QT QT kuracchu, കുറച്,
niraccu, oru,
dharalam
നിറച്,

ഒരു,
ധാരാളം
10.1 General QTF QT__QTF kuraccu, കുറച്,
niraccu,
dharalam
നിറച്,

ധാരാളം

10.2 Cardinals QTC QT__QTC onnu,rantu ഒന്,


രണ്
10.3 Ordinals QTO QT__QTO onnAm,ranta ഒന്ന,
m

രണ്ട
11 Residuals RD RD
11.1 Foreign word RDF RD__RDF
11.2 Symbol SYM RD__SYM $, &, *, (, ), $, &, *, (, ),
ruu.
രൂ
11.3 Punctuation PUNC RD__PUNC ., : ; ., : ;
11.4 Unknown UNK RD__UNK
11.5 Echowords ECH RD__ECH

POS for Bangla

In the Category column, top-level categories are flush left, level 1 subtypes are indented once and level 2 subtypes are indented twice.

Sl. No  Category              Label  Annotation      Examples                        Remarks
                                     Convention**
1       Noun                  N      N
1.1       Common              NN     N__NN           kalama, cashmaa
1.2       Proper              NNP    N__NNP          Mohan, ravi, rashmi
1.4       Nloc                NST    N__NST          upare, niche, bhitara
2       Pronoun               PR     PR
2.1       Personal            PRP    PR__PRP         se, tumi, AmAra
2.2       Reflexive           PRF    PR__PRF         nijera
2.3       Relative            PRL    PR__PRL         ye, yakhana, yena, yAra
2.4       Reciprocal          PRC    PR__PRC         paraspara
2.5       Wh-word             PRQ    PR__PRQ         ke, kakhana, kena, kAra
2.6       Indefinite          PRI    PR__PRI         keu
3       Demonstrative         DM     DM              Vaha, jo, yaha
3.1       Deictic             DMD    DM__DMD         sei, oi, o, se
3.2       Relative            DMR    DM__DMR         ye, yei
3.3       Wh-word             DMQ    DM__DMQ         kono
3.4       Indefinite          DMI    DM__DMI         keu
4       Verb                  V      V
4.1       Main                VM     V__VM
4.1.1       Finite            VF     V__VM__VF       karachhilAma, yAba, khAYa
4.1.2       Non-finite        VNF    V__VM__VNF      kare, kheYe, karale, khete
4.1.3       Infinitive        VINF   V__VM__VINF     karate, khete, yete
4.1.4       Gerund            VNG    V__VM__VNG      yAoYa, AsA, khelA, karA
4.2       Auxiliary           VAUX   V__VAUX         chhila, habe, chAi
5       Adjective             JJ                     sundara, bhAla, lAla
6       Adverb                RB                     tA.DAtA.Di, Aste, haThAt
7       Postposition          PSP                    theke, abadhI, madhye, diYe
8       Conjunction           CC     CC
8.1       Co-ordinator        CCD    CC__CCD         Ara, eba.n, athabA, kimbA
8.2       Subordinator        CCS    CC__CCS         ye, kintu, noile, tAhale
8.2.1       Quotative         UT     CC__CCS__UT     ----                            Not required
9       Particles             RP     RP
9.1       Default             RPD    RP__RPD         to, ye
9.2       Classifier          CL     RP__CL          jana, khAnA
9.3       Interjection        INJ    RP__INJ         Are, ei, hAya
9.4       Intensifier         INTF   RP__INTF        bhiShaNa, khuba, sA~NghAtika
9.5       Negation            NEG    RP__NEG         nA, naYa, chhA.DA
10      Quantifiers           QT     QT
10.1      General             QTF    QT__QTF         kichhu, alpa, aneka
10.2      Cardinals           QTC    QT__QTC         eka, dui, tina
10.3      Ordinals            QTO    QT__QTO         prathama, paYalA, dvitIYa
11      Residuals             RD     RD
11.1      Foreign word        RDF    RD__RDF                                         A word written in a script other than the script of the original text
11.2      Symbol              SYM    RD__SYM         $, &, *, (, )                   For symbols such as $, & etc.
11.3      Punctuation         PUNC   RD__PUNC        ., : ;                          Only for punctuation
11.4      Unknown             UNK    RD__UNK
11.5      Echowords           ECH    RD__ECH         jala Tala, khAbAra dAbAra
** The annotation is to be done using the lowest level tag of the type hierarchy. Once the
lower level tag is selected, the higher level tags should be stored automatically.

POS for Marathi

Sl. No  Category (Top level / Subtype level 1 / Subtype level 2)  Label  Annotation Convention**  Examples  Remarks
1 Noun N N मुलगा
(mulagaa-
boy),

राजा (raajaa-
king),

पस
ु ्तक
(pustaka-
book)

1.1 Common NN N__NN पस


ु ्तक
(pustaka-
book), लेखणी
(lekhaNi-
pen), चष्मा
(chashmaa-
goggles )

1.2 Proper NNP N__NNP मोहन


(Mohan), रवी
(Ravi), रश्मी
(Rashmi)
1.3 Verbal NNV N__NNV NA Not
Required
1.4 Nloc NST N__NST वर(var- up), Where it is
separate it is
खाल�(khaalee-
NST
down),
पढ
ु े (pudhe-
ahead),
मागे(maage-
back)
2 Pronoun PR PR येथे(yethe-
here), तेथे
(tethe-there),

जो(jo-who),
तो(to-he)
2.1 Personal PRP PR__PRP तो(to-he),
मी(mee-I),
त(ू tu-you),
ते(te-they),
तम
ु ्ह(tumhi-
you)

2.2 Reflexive PRF PR__PRF स्वत(swatha-


myself),
आपण(aapana-
oursleves)
2.3 Relative PRL PR__PRL जो(jo-who),
ज्यान(jyaane-
who),
जेव्ह(jevhaa-
while),
िजथे(jeethe-
where)

2.4 Reciprocal PRC PR__PRC परस्प(Parasp


ara-
reciprocally ),
एकमेक(ekmek
- mutually)
2.5 Wh-word PRQ PR__PRQ कोण(kona-
who),
केव्ह(kevha-
when),
कुठे (kuthe-
where)

2.6 Indefinite कोणी(kona


3 Demonstrative DM DM तो(to-he),
हा(haa-this),
जो(jo-who)

3.1 Deictic DMD DM__DMD इथे(ithe-here),


�तथे(tithe-
there)
3.2 Relative DMR DM__DMR जो(jo-who)
ज्यान(jyane-
who)
3.3 Wh-word DMQ DM__DMQ कोणता(konta-
which),
कोणी(kona-
who),

4 Verb V V (padalaa-fell
down),
गेला(gelaa-
went),
झोपला(jhopala
a-slept),
आहे(aahe-is),

4.1 Main VM V__VM पडला


(padalaa-fell
down),
गेला(gelaa-
went),
झोपला(jhopala
a-slept),
आहे(aahe-is),

4.1. Finite VF V__VM__VF - This subtype


1 WILL NOT
be used for
Hindi as
Hindi does
not have
enough
information
at the word
level.
4.1. Non-finite VNF V__VM__VNF - --do--
2
4.1. Infinitive VINF V__VM__VINF - --do--
3
4.1. Gerund VNG V__VM__VNG --do--

4
4.2 Auxiliary VAUX V__VAUX आहे (is),
लागला
(started),
5 Adjective JJ सदुं र(sundara-
beautiful),
चांगला(chaang
alaa-good),
मोठा(moThaa-
big)
6 Adverb RB लवकर(lavakar
- fast ),
हळूहळू(haLuuh
aLuu-slowly)
7 Postposition PSP Not in Marathi
8 Conjunction CC CC आ�ण(aaNi-
and),
कारण(kaaraN-
because)
8.1 Co-ordinator CCD CC__CCD आ�ण(aaNi-
and),
पण(paNa-
but), परं तु
(parantu-but)
8.2 Subordinator CCS CC__CCS कारण क�
(kaaraN-
because of),
का क�(kaaraN
kii-because
of), जर-
तर(jara-tara-
if-then)
8.2. Quotative UT CC__CCS__UT असा, म्हणू
1
9 Particles RP RP तर(tara),
9.1 Default RPD RP__RPD तर(tara) (then)
9.2 Classifier CL RP__CL Not required
9.3 Interjection INJ RP__INJ अरे रे!(arere),

ओहो!(oho-
oh!)
9.4 Intensifier INTF RP__INTF खूप(khoop-
lot, very ),
बराच(baraach-
too much),
अ�तशय(atisha
ya- too much,
very)
9.5 Negation NEG RP__NEG नको(nako-
not), न(na-
Na)
10 Quantifiers QT QT थोडे(thode-
few),
जास्(jaasta-
lot),
काह�(kaahi-
few), एक(eka-
one),
प�हला(pahilaa-
first),
10.1 General QTF QT__QTF थोडे thoDe-
few),
जास्(jaasta-
lot),
काह�(kaahi-
few)
10.2 Cardinals QTC QT__QTC एक(eka-one),
दोन(dona-two)
10.3 Ordinals QTO QT__QTO प�हला(pahilaa-
first),
दसु रा(dusaraa-
second)
11 Residuals RD RD
11.1 Foreign word RDF RD__RDF A word
written in
script other
than the
script of the
original text

11.2 Symbol SYM RD__SYM $, &, *, (, ) For symbols


such as $, &
etc
11.3 Punctuation PUNC RD__PUNC ., : ; Only for
punctuations
11.4 Unknown UNK RD__UNK
11.5 Echowords ECH RD__ECH जेवण�बवण(jev
anbivaNa-
meal/dinner),
डोके�बके(Doke
bike- head)
(Paanii-)
vaanii,
(khaanaa-)
vaanaa
** The annotation is to be done using the lowest level tag of the type hierarchy. Once the
lower level tag is selected, the higher level tags should be stored automatically.

POS for Gujarati

In the Category column, top-level categories are flush left and level 1 subtypes are indented once.

Sl. No  Category              Label  Annotation      Examples (with English glosses)
                                     Convention**
1       Noun                  N      N
1.1       Common              NN     N__NN           kalam, chashmA ‘pen’, ‘spectacles’
1.2       Proper              NNP    N__NNP          mohan, ravI ‘Mohan’, ‘Ravi’
1.3       Nloc                NST    N__NST          upar, nIche, ahIM ‘up’, ‘down’, ‘in front’
2       Pronoun               PR     PR
2.1       Personal            PRP    PR__PRP         huM, tuM, te ‘me’, ‘you’, ‘he/she’
2.2       Reflexive           PRF    PR__PRF         pote, jAte, svayam ‘herself/himself’
2.3       Relative            PRL    PR__PRL         je, te, jyAM ‘who’, ‘where’
2.4       Reciprocal          PRC    PR__PRC         aras-paras, paraspar ‘mutually’, ‘each other’
2.5       Wh-word             PRQ    PR__PRQ         koN, kyAre, kyAM ‘who’, ‘when’, ‘where’
2.6       Indefinite                                 koI, kaIMK, kashuM ‘someone’, ‘something’
3       Demonstrative         DM     DM
3.1       Deictic             DMD    DM__DMD         A ‘this’
3.2       Relative            DMR    DM__DMR         je, jeNe ‘which/who’, ‘whom’
3.3       Wh-word             DMQ    DM__DMQ         koN, shuM, kem ‘who’, ‘what’, ‘why’
3.4       Indefinite                                 koI, kaIMK, kashuM ‘someone’, ‘something’
4       Verb                  V      V
4.1       Main                VM     V__VM           khAshe, khAdhu ‘will eat’, ‘ate’
4.2       Auxiliary           VAUX   V__VAUX         chhe, hatuM, karyuM ‘is’, ‘was’, ‘did’
5       Adjective             JJ
6       Adverb                RB
7       Postposition          PSP
8       Conjunction           CC     CC
8.1       Co-ordinator        CCD    CC__CCD         ane, ke ‘and’, ‘or’
8.2       Subordinator        CCS    CC__CCS         tethI, evuM, kAraNke ‘so’, ‘like that’, ‘because’
9       Particles             RP     RP
9.1       Default             RPD    RP__RPD         paNa, ja, tO ‘but’, emph, topic
9.2       Interjection        INJ    RP__INJ         hE !!, arrrE !!, O !!
9.3       Intensifier         INTF   RP__INTF        bahu, ghaNuM ‘very’, ‘much’
9.4       Negation            NEG    RP__NEG         nahi, na ‘no’
10      Quantifiers           QT     QT
10.1      General             QTF    QT__QTF         thoduM, ghaNuM ‘little’, ‘much’
10.2      Cardinals           QTC    QT__QTC         eka, be, traN ‘one’, ‘two’, ‘three’
10.3      Ordinals            QTO    QT__QTO         paheluM, bIjI ‘first’ (neu), ‘second’ (fem)
11      Residuals             RD     RD
11.1      Foreign word        RDF    RD__RDF         tv, perasitemol
11.2      Symbol              SYM    RD__SYM         $, *, &
11.3      Punctuation         PUNC   RD__PUNC        , : ; {} ()
11.4      Unknown             UNK    RD__UNK
11.5      Echowords           ECH    RD__ECH         kAm-bAm, pANi-bANi ‘work and the like’, ‘water and the like’

POS for Konkani

Sl. No  Category (Top level / Subtype level 1 / Subtype level 2)  Label  Annotation Convention**  Examples  Remarks
1 Noun N N

1.1 Common NN N__NN पस


ु ्त ,रु ,आंबो ,
माड

1.2 Proper NNP N__NNP रामायण, बायबल,


कुराण, ग�य,
क�कणी, क�पला
1.3 Nloc NST N__NST भायर, भीतर, वयर,
सकयल
2 Pronoun PR PR
2.1 Personal PRP PR__PRP हांव, त,ूं तो, त� , ते,
त्य, तम
ु च� , आमच� ,
तांचे
2.2 Reflexive PRF PR__PRF आपूण, स्वत

2.3 Relative PRL PR__PRL जातूंत, जो


2.4 Reciprocal PRC PR__PRC एकामेकाक, आपसांत
2.5 Wh-word PRQ PR__PRQ कोण, �कते, खंयचो
2.6 Indefinite कोणूय, �कत� य,
खयच� य
3 Demonstrative DM DM
3.1 Deictic DMD DM__DMD तो, हो
3.2 Relative DMR DM__DMR जो
3.3 Wh-word DMQ DM__DMQ कोण� , कसल�
3.4 Indefinite कोणाच� य, कसल� य
4 Verb V V
4.1 Main VM V__VM येवप

4.1. Finite VF V__VM__VF आयलो, आयला,


1 आ�यल्ल

4.1. Non- VNF V__VM__VNF येतकच, येवन,


Finite
2 आ�यल्लया, येवंक,
येवपाक, येवपाच� ,
येवच�

4.1. Infinitive VINF V__VM__VINF आसूं, व्हर , केल्या


3

4.1. Gerund VNG V__VM__VNG खावप, वचप,


4 खावपी, जेवपी,
समजुपी
4.2 Auxiliary VAUX V__VAUX NA
4.2. Finite V__VAUX__VF केल्ल� आस, आयला
1
आसत
4.2. Non- V__VAUX__VN करता जायत, करता
2 Finite F
आसतलो, येतीत
5 Adjective JJ सोबीत, संद
ु र
6 Adverb RB फाल्या, सवकास,

अश�
7 Postposition PSP खातीर, पासत, बगर,
कडेन, लागीं
8 Conjunction CC CC
8.1 Co-ordinator CCD CC__CCD आनी, वा
8.2 Subordinator CCS CC__CCS जाल्या, जर-तर,
दे खन
ू , म्हणल्य,
पण
ु न

8.2. Quotative UT CC__CCS__UT अश�, क�
1
9 Particles RP RP
9.1 Default RPD RP__RPD बी, आद�, इत्या�
9.2 Classifier CL RP__CL (पांच) जाण
9.3 Interjection INJ RP__INJ आरे , चप

9.4 Intensifier INTF RP__INTF उपाट, भरपूर
9.5 Negation NEG RP__NEG ना, न्य
10 Quantifiers QT QT
10.1 General QTF QT__QTF थोडे, चड, कांय, खब

10.2 Cardinals QTC QT__QTC एक, दोन
10.3 Ordinals QTO QT__QTO पयल� , दस
ु र�
11 Residuals RD RD
11.1 Foreign word RDF RD__RDF
11.2 Symbol SYM RD__SYM &, $
11.3 Punctuation PUNC RD__PUNC .,?-/
11.4 Unknown UNK RD__UNK
11.5 Echowords ECH RD__ECH जोवण-�बवण

POS for Maithili

Sl. No  Category (Top level / Subtype level 1 / Subtype level 2)  Label  Annotation Convention**  Examples  Remarks
1 Noun N N

1.1 Common NN N__NN पोथी, कलम,


पं�डत, खवास

1.2 Proper NNP N__NNP अरु, �दनेश,


अतल

1.3 Nloc NST N__NST आग,ू पीछू,
ऊपर, नीचा,
एखन, आब,
बीच, कतहु
2 Pronoun PR PR
2.1 Personal PRP PR__PRP त�, हम, ई, ओ,
अहाँ
2.2 Reflexive PRF PR__PRF अपना, अपने,
स्वय, स्वयंमे
2.3 Relative PRL PR__PRL जे, िजनका,
िजनकर, जकरा
2.4 Reciprocal PRC PR__PRC एक-दोसरक�,
आपस, परस्प
2.5 Wh-word PRQ PR__PRQ के, क�, कथी
ककर

Indefinite केओ, �कछु/


�कउछ, कोनो
3 Demonstrative DM DM
3.1 Deictic DMD DM__DMD ओ, ई, ऊ

3.2 Relative DMR DM__DMR जे, जा�ह

3.3 Wh-word DMQ DM__DMQ के, क�, कोन

Indefinite केओ, �कछु/

�कउछ, कोनो
4 Verb V V
4.1 Main VM V__VM चलबैत, रौपेत,
पढइत, खाइत,
सत
ु त
ै , हँसत

4.2 Auxiliary VAUX V__VAUX अ�छ, छल,


होएब, �थक
5 Adjective JJ नीक, मोटका,
ललक�,
6 Adverb RB भने, अनायास,
क्र:,
एकाएक,
अवश्, पन
ु ः
फेर
7 Postposition PSP सँ, क�, लेल
8 Conjunction CC CC
8.1 Co-ordinator CCD CC__CCD आओर, परं च,
मद
ु ा, वा
8.2 Subordinator CCS CC__CCS जँ, तँ, �क, य�द

9 Particles RP RP
9.1 Default RPD RP__RPD भ�र, यौ, हौ, रौ

9.2 Classifier CL RP__CL टा, गोट, गो


9.3 Interjection INJ RP__INJ ओह-ओ, अहा,
वाह, हा
9.4 Intensifier INTF RP__INTF बहुत, बेसी,
खब
ू , �नतान्
9.5 Negation NEG RP__NEG न, न�ह, ज�ु न
10 Quantifiers QT QT
10.1 General QTF QT__QTF कनेक, बहुत,
�कछु

10.2 Cardinals QTC QT__QTC एक, एकटा,


दईु , बीसगोट,

तीन, चा�र
10.3 Ordinals QTO QT__QTO प�हल, दोसर,
तेसर, चा�रम
11 Residuals RD RD
11.1 Foreign word RDF RD__RDF A word
written in
script other
than the
script of the
original text
11.2 Symbol SYM RD__SYM $, , *, (, ) For symbols
such as $, &
etc
11.3 Punctuation PUNC RD__PUNC ., : ; Only for
punctuations
11.4 Unknown UNK RD__UNK
11.5 Echowords ECH RD__ECH जलखे (तलखे),
म�ट (स�ट)
** The annotation is to be done using the lowest level tag of the type hierarchy. Once the
lower level tag is selected, the higher level tags should be stored automatically.

POS for Urdu

Sl. No  Category (Top level / Subtype level 1 / Subtype level 2)  Label  Annotation Convention**  Examples  Remarks

1 Noun N N ،(laRkaa) ‫ﻟﮍﮐﺎ‬

(ism-‫)ﺍﺳﻢ‬ ،((raajaa ‫ﺭﺍﺟﺎ‬

(kitaab) ‫ﮐﺘﺎﺏ‬

1.1 Common NN N__NN ،(kitaab) ‫ﮐﺘﺎﺏ‬

-‫)ﻧﮑﺮﻩ‬ ،(qalam) ‫ﻗﻠﻢ‬


(nakeraa
(cashma) ‫ﭼﺸﻤہ‬

1.2 Proper NNP N__NNP ،Mohan))‫ﻣﻮﮨﻦ‬

-‫)ﻣﻌﺮﻓہ‬ ‫ﺭﺷﻤﯽ‬

((m‘aarefa ،(Rashmi)

(Ravi) ‫ﺭﻭی‬

1.3 Verbal NNV N__NNV ،(jalan) ‫ﺟﻠﻦ‬ May be


considered
‫)ﺣﺎﺻﻞ‬ ،(calan) ‫ﭼﻠﻦ‬
for Urdu-
–‫ﻣﺼﺪﺭ‬
،(bahaao) ‫ﺑﮩﺎﺅ‬ Hindi, too.
haasil-e-
(masdar ‫ﺑﻨﺎﻭٹ‬
(banaavat)

1.4 Nloc NST N__NST ،(upar) ‫ﺍﻭﭘﺮ‬

( zarf-‫)ﻅﺮﻑ‬ ،(niice) ‫ﻧﻴﭽﮯ‬

،(aage) ‫ﺁﮔﮯ‬

(piiche) ‫ﭘﻴﭽﻬﮯ‬

2 Pronoun PR PR ،(yih) ‫ﻳہ‬

(zamiir-‫)ﺿﻤﻴﺮ‬ ،(voh) ‫ﻭﻩ‬

(jo) ‫ﺟﻮ‬

2.1 Personal PRP PR__PRP ،(voh) ‫ﻭﻩ‬ In Urdu,


unlike
‫)ﺿﻤﻴﺮ‬ ،(tum) ‫ﺗﻢ‬
Hindi, voh
-‫ﺷﺨﺼﯽ‬
(maim) ‫ﻣﻴﮟ‬ is used both
zamiir-e- for singular
(shakhsii and plural.

2.2 Reflexive PRF PR__PRF ،(apnaa) ‫ﺍﭘﻨﺎ‬

( ‫ﺿﻤﻴﺮ‬ ،(khud) ‫ﺧﻮﺩ‬


‫ﻣﻌﮑﻮﺳﯽ‬-
‫ﺍﭘﻨﮯ ﺁپ‬
zamiir-e-
m‘aakoosii (apne aap)
)

2.3 Relative PRL PR__PRL ،(jo) ‫ﺟﻮ‬

( ‫ﺿﻤﻴﺮ‬ ،(jab) ‫ﺟﺐ‬


‫ﻣﻮﺻﻮﻟہ‬- ،(jis)‫ﺟﺲ‬
zamiir-e- (jahaM) ‫ﺟﮩﺎں‬
mausoolaa(

2.4 Reciprocal PRC PR__PRC ،(baaham) ‫ﺑﺎﮨﻢ‬


‫ﺩﺭﻣﻴﺎﻥ‬
(‫ﺿﻤﻴﺮ ﺭﺍﺟﻊ‬- ،(darmiyaan)
zamiir-e-
raaje‘) (aapas) ‫ﺁﭘﺲ‬

2.5 Wh-word PRQ PR__PRQ ،(kaun) ‫ﮐﻮﻥ‬

( ‫ﺿﻤﻴﺮ‬ ،(kab) ‫ﮐﺐ‬


‫ﺍﺳﺘﻔﮩﺎﻣﻴہ‬-
(kahaaM) ‫ﮐﮩﺎں‬
zamiir-e-
istafhaamiy
aa)

3 Demonstrative DM DM ،(yih) ‫ﻳہ‬

(‫ﺿﻤﻴﺮ ﺍﺷﺎﺭﻩ‬- ،(voh) ‫ﻭﻩ‬


zamiir-e-
،(inn) ‫ﺍﻥ‬
ishaaraa)
(unn) ‫ﺍﻥ‬

3.1 Deictic DMD DM__DMD ،(yih) ‫ﻳہ‬

-‫)ﺍﺷﺎﺭے‬ ،(voh) ‫ﻭﻩ‬


(ishaare

3.2 Relative DMR DM__DMR ،(jo) ‫ﺟﻮ‬

( ‫ﺿﻤﻴﺮ ﺍﺷﺎﺭﻩ‬ (jis)‫ﺟﺲ‬


‫ﻣﻮﺻﻮﻟہ‬-
zamiir-e-
ishaaraa
mausoolaa)

3.3 Wh-word (ضمیر اشارہ استفہامیہ - zamiir-e-ishaaraa istafhaamiyaa) DMQ DM__DMQ Examples: کون (kaun), کس (kis), کتنا (kitnaa)
Remarks: According to Urdu grammar, words like koi, kisi, kuch do not come under Wh-word; they are used for an indefinite person. For them, another category (subtype), i.e., tankiir (indefinitive), is used. The following words are also placed under this category: chand, b‘aaz, fulaan, sab, bahut. Can we have a category/subtype like indefinitive demonstrative (DMI)?

4 Verb V V ،(giraa) ‫ﮔﺮﺍ‬

(f‘el-‫)ﻓﻌﻞ‬ ،(gayaa) ‫ﮔﻴﺎ‬

،(sonaa) ‫ﺳﻮﻧﺎ‬

(haMstaa) ‫ﮨﻨﺴﺘﺎ‬

4.1 Main VM V__VM ،(giraa) ‫ﮔﺮﺍ‬

،(gayaa) ‫ﮔﻴﺎ‬

،(sonaa) ‫ﺳﻮﻧﺎ‬

(haMstaa) ‫ﮨﻨﺴﺘﺎ‬

4.1.1 Finite (محدود - mahdood) VF V__VM__VF Remarks: This subtype WILL NOT be used for Hindi, as Hindi does not have enough information at the word level.

4.1.2 Nonfinite (غیرمحدود - ghair mahdood) VNF V__VM__VNF Remarks: -- do --

4.1.3 Infinitive (مصدر - masdar) VINF V__VM__VINF Remarks: -- do --

4.1.4 Gerund (حاصل مصدر - haasil-e-masdar) VNG V__VM__VNG Remarks: -- do --

4.2 Auxiliary VAUX V__VAUX ،(hai) ‫ﮨﮯ‬

-‫)ﻓﻌﻞ ﺍﻣﺪﺍﺩی‬ ،(rahaa) ‫ﺭﮨﺎ‬


f‘el-e-
(huaa) ‫ﮨﻮﺍ‬
(imdaadi

5 Adjective JJ ،(dilkash) ‫ﺩﻟﮑﺶ‬


،(safed) ‫ﺳﻔﻴﺪ‬
(sifat-‫)ﺻﻔﺖ‬
،(siyaah) ‫ﺳﻴﺎﻩ‬

،(cauRaa) ‫ﭼﻮڑﺍ‬

(uuMcaa) ‫ﺍﻭﻧﭽﺎ‬

6 Adverb RB ،(tez) ‫ﺗﻴﺰ‬

-‫)ﻣﺘﻌﻠﻖ ﻓﻌﻞ‬ jald)) ‫ﺟﻠﺪ‬


mut‘alliq-e-
(f‘el

7 Postposition PSP ‫ ﻧﮯ‬،(se) ‫ﺳﮯ‬


،(ko) ‫ ﮐﻮ‬،(ne)
jaar--‫)ﺟﺎﺭﻣﻮﺧﺮ‬
(e-moakkhar (meiM) ‫ﻣﻴﮟ‬

8 Conjunction CC CC ،(aur) ‫ﺍﻭﺭ‬

(atf‘-‫)ﻋﻄﻒ‬ ،(agar) ‫ﺍﮔﺮ‬

‫ﮐﻴﻮں ﮐہ‬
(kyoMki)


8.1 Co- CCD CC__CCD ،(aur) ‫ﺍﻭﺭ‬


ordinator
،(voh) ‫ﻭﻩ‬
-‫)ﺣﺮﻑ ﻭﺻﻞ‬
،(yaa) ‫ﻳﺎ‬
(harf-e-vasl
،(ki) ‫ﮐہ‬

(balki) ‫ﺑﻠﮑہ‬

8.2 Subordinat CCS CC__CCS ،(agar) ‫ﺍﮔﺮ‬


or
‫ﮐﻴﻮں ﮐہ‬
-‫)ﺗﺎﺑﻊ ﮐﻨﻨﺪﻩ‬ ،(kyoMki)
taab‘e (to) ‫ﺗﻮ‬
(kunindaa

8.2.1 Quotativ UT CC__CCS__UT Not


e required

-‫)ﺍﻗﺘﺒﺎﺳﯽ‬
iqtabaas
(ii

9 Particles RP RP ،(to) ‫ﺗﻮ‬

(haaliyaa-‫)ﺣﺎﻟﻴہ‬ ،(hii) ‫ﮨﯽ‬

(bhii) ‫ﺑﻬﯽ‬

9.1 Default RPD RP__RPD ،(to) ‫ﺗﻮ‬

(‫ڈﻳﻔﺎﻟﭧ‬- ،(hii) ‫ﮨﯽ‬


Default)
(bhii) ‫ﺑﻬﯽ‬

9.2 Classifier CL RP__CL Not


required
-‫)ﺩﺭﺟہ ﺑﻨﺪ‬
(darja band

9.3 Interjection INJ RP__INJ ،e)) ‫ﺍے‬

-‫)ﻓﺠﺎﺋﻴہ‬ ،(o) ‫ﺍﻭ‬


(fajaa’iyaa
،(are) ‫ﺍﺭے‬

،(jii) ‫ﺟﯽ‬

،(ahaa) ‫ﺍﮨﺎ‬

(vaah) ‫ﻭﺍﻩ‬

9.4 Intensifier INTF RP__INTF ،(bahut) ‫ﺑﮩﺖ‬


-‫)ﺣﺮﻑ ﺗﺎﮐﻴﺪ‬ ،(behad) ‫ﺑﮯ ﺣﺪ‬


harf-e-
،(albattaa) ‫ﺍﻟﺒﺘہ‬
(taakiid
،(zaroor) ‫ﺿﺮﻭﺭ‬
‫ﺧﺒﺮﺩﺍﺭ‬
(khabardaar)

9.5 Negation NEG RP__NEG ،(na) ‫ﻧہ‬

-‫)ﺣﺮﻑ ﻧﮩﯽ‬ (nahiiM) ‫ﻧﮩﻴﮟ‬


harf-e-
(nahii

10 Quantifiers QT QT ،(cand) ‫ﭼﻨﺪ‬

-‫)ﮐﻤﻴﺖ ﻧﻤﺎ‬ ‫ﻣﺘﻌﺪﺩ‬


kamiiyat
(muta’addad)
(numaa
،(qaliil) ‫ﻗﻠﻴﻞ‬

(kasiir) ‫ﮐﺜﻴﺮ‬

10.1 General QTF QT__QTF ،(thoRaa) ‫ﺗﻬﻮڑﺍ‬

(aam‘ -‫)ﻋﺎﻡ‬ ،(bahut) ‫ﺑﮩﺖ‬


(kuch)‫ﮐﭽﻬ‬

10.2 Cardinals QTC QT__QTC ،(Ek) ‫ﺍﻳﮏ‬

-‫)ﺍﻋﺪﺍﺩ ﻣﻄﻠﻖ‬ ،(do) ‫ﺩﻭ‬


-a‘adaad
(tiin)‫ﺗﻴﻦ‬
(e-mutlaq

10.3 Ordinals QTO QT__QTO ،(avval) ‫ﺍﻭﻝ‬

-‫)ﺗﺮﺗﻴﺒﯽ ﺍﻋﺪﺍﺩ‬ ،(doam) ‫ﺩﻭﻡ‬


tartiibii
،(pahalaa) ‫ﭘﮩﻼ‬
(a‘adaad ‫ﺩﻭﺳﺮﺍ‬
(duusaraa)

11 Residuals RD RD

baaqi -‫)ﺑﺎﻗﯽ ﻣﺎﻧﺪﻩ‬


(maandaa

11.1 Foreign word (بدیسی لفظ - bidesii lafz) RDF RD__RDF Remarks: A word written in a script other than the script of the original text.

11.2 Symbol (علامت - ‘alaamat) SYM RD__SYM Examples: $, &, *, (, ) Remarks: Such symbols are not used in Urdu. They are written out as ڈالر (dollar), پاونڈ (pound), etc.

11.3 Punctuatio PUNC RD__PUNC ‫ ۔‬،, ،: ،; Only for


n
Punctuation
-‫)ﺍﻭﻗﺎﻑ‬ s
(auqaaf

11.4 Unknown UNK RD__UNK

naa -‫)ﻧﺎﻣﻌﻠﻮﻡ‬
(m‘aaloom

11.5 Echowords ECH RD__ECH ‫( ِﻭﻝ‬-‫)ﺩﻝ‬

‫)ﮔﻮﻧﺞ ﺩﺍﺭ‬ ،dil-) vil)


-‫ﺍﻟﻔﺎﻅ‬
‫( ﻭﻳﺎﺭ‬-‫)ﭘﻴﺎﺭ‬
goonjdar
(lafz ،pyaar-) vyaar)

‫(ﻭﺍﺋﮯ‬-‫)ﭼﺎﺋﮯ‬

caa‘e-) vaa‘e)

** The annotation is to be done using the lowest level tag of the type hierarchy. Once the lower
level tag is selected, the higher level tags should be stored automatically.
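
The annotation convention in the footnote above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `PARENT` table and the function name `full_annotation` are hypothetical, and the table lists just a subset of the tags shown in the tables above. Given the lowest-level tag, the sketch walks up the hierarchy and joins the levels with a double underscore, as in the Annotation Convention column.

```python
# Parent of each tag in the type hierarchy (None marks a top-level tag).
# Assumption: only a subset of the tag set is listed here for illustration.
PARENT = {
    "N": None, "NN": "N", "NNP": "N", "NNV": "N", "NST": "N",
    "PR": None, "PRP": "PR", "PRF": "PR",
    "V": None, "VM": "V", "VAUX": "V",
    "VF": "VM", "VNF": "VM", "VINF": "VM", "VNG": "VM",
    "CC": None, "CCD": "CC", "CCS": "CC", "UT": "CCS",
}


def full_annotation(tag: str) -> str:
    """Expand a lowest-level tag into the stored label, e.g. 'VF' -> 'V__VM__VF'."""
    if tag not in PARENT:
        raise KeyError(f"unknown tag: {tag}")
    chain = []
    current = tag
    while current is not None:
        chain.append(current)          # collect the tag, then climb to its parent
        current = PARENT[current]
    return "__".join(reversed(chain))  # top-level tag first


print(full_annotation("NNP"))
print(full_annotation("VF"))
print(full_annotation("UT"))
```

An annotator would then only ever choose the leaf tag; the higher levels follow from the table.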


7. XML INTERNATIONALIZATION BEST PRACTICES


To make the common POS Schema for Indian Languages completely interoperable,
extensible and web-enabled, the W3C XML Internationalization best-practice guidelines
and the ISO metadata standard are adopted in the above framework.

7.1 WHAT IS INTERNATIONALIZATION TAG SET (ITS)

ITS is a technology to easily create XML which is internationalized and can be localized
effectively.

ITS for Schema developers:


Schema developers will find proposals for attribute and element names to include in their
new schema (also called the "host vocabulary"). Using these names makes the concepts
they represent easier for both schema users and processors to recognize. [For more details
http://www.w3.org/TR/2007/REC-its-20070403/]

Main Attributes:
xml:lang: mark-up for natural language labelling; defined for the root element of the
document and for any element where a change of language may occur.
its:dir: mark-up to specify text direction; defined for the root element of the document
and for any element that has text content.
its:translateRule: indicates which elements and attributes should be translated; these
elements identify which elements have non-translatable content.
its:withinTextRule: provides information related to text segmentation; these elements
indicate which elements should be treated as part of their parents, or as a nested but
independent run of text.
xml:id: mark-up for unique identifiers; elements with translatable content can be
associated with a unique identifier.
its:locNote: mark-up for notes to localizers; allows content authors to provide
localization-related notes as attribute values, or to point to the location of the relevant
note text. [For more details http://www.w3.org/TR/xml-i18n-bp/]
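
As a rough illustration of these attributes on instance data, the Python sketch below builds a small XML fragment with `xml:lang`, `xml:id` and `its:dir` set. The element names `corpus` and `sentence`, the function name `make_sentence` and the language code value are assumptions made for the example and are not part of the draft schema; the two namespace URIs are the standard XML and ITS 1.0 namespaces.

```python
import xml.etree.ElementTree as ET

XML_NS = "http://www.w3.org/XML/1998/namespace"  # namespace of xml:lang and xml:id
ITS_NS = "http://www.w3.org/2005/11/its"         # ITS 1.0 namespace

ET.register_namespace("its", ITS_NS)


def make_sentence(text: str, lang: str, direction: str, ident: str) -> ET.Element:
    """Build a hypothetical <sentence> element carrying language, direction and id."""
    if direction not in ("ltr", "rtl", "lro", "rlo"):
        raise ValueError("its:dir must be one of ltr, rtl, lro, rlo")
    elem = ET.Element("sentence")
    elem.set(f"{{{XML_NS}}}lang", lang)      # natural language labelling
    elem.set(f"{{{XML_NS}}}id", ident)       # unique identifier
    elem.set(f"{{{ITS_NS}}}dir", direction)  # text direction
    elem.text = text
    return elem


# Root element: default language plus the ITS version marker.
root = ET.Element("corpus")
root.set(f"{{{XML_NS}}}lang", "en")
root.set(f"{{{ITS_NS}}}version", "1.0")

# An Urdu word (kitaab) is written right to left, so its:dir is "rtl".
root.append(make_sentence("کتاب", "ur", "rtl", "s1"))

print(ET.tostring(root, encoding="unicode"))
```

Setting the language on the root and overriding it only where the language changes follows the pattern described for `xml:lang` above.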

8. XML SCHEMA
XML Schemas express shared vocabularies and allow machines to carry out rules made
by people. A schema defines a class of XML documents, so the term "instance document"
is often used to describe an XML document that conforms to a particular schema. XML
Schema provides a means for defining the structure, content and semantics of XML
documents. [For more details http://www.w3.org/TR/1999/NOTE-xml-schema-req-19990215]


9. METADATA ON POS

Metadata:
Metadata describes how, when and by whom a particular set of data was collected,
and how the data is formatted. It is essential for understanding information stored in data
warehouses and has become increasingly important in XML-based Web applications.

XML Metadata:
In XML, metadata is built into the document. Every element has a tag that identifies
where the data is stored in the document. Descriptive tags give the document structure
and indicate what the data means, but only partly: a tag supplies just a name, which is
meaningful only to someone who already understands what the element or attribute
represents.

METADATA AS PER ISO 12620:1999

Metadata ()
{

<?xml version="1.0"?>
<datasm-categorySelection xmlns="http://www.isocat.org/ns/dcif" dcif-version="1.0">
  <globalInformation>............</globalInformation>
  <languageSection>
    <language>en</language>
    <identifier>............</identifier>
    <version>1.0.0</version>
    <registrationStatus>standard</registrationStatus> <!-- registered as a standard -->
    <origin>ISO 12620:1999
      <author>................</author>
      <domain>............</domain>
    </origin>
    <creation>
      <creationDate>1999-01-01</creationDate>
    </creation>
    <descriptionSection>
      <definitionClass>
        <definition xml:lang="en">.......................</definition>
        <source>ISO 12620:1999</source>
      </definitionClass>
    </descriptionSection>
  </languageSection>
</datasm-categorySelection>
}
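
To show how a tool might read such a record, the Python sketch below parses a reduced stand-in for the metadata block and extracts a few fields. It is a sketch under assumptions: the `RECORD` string keeps the element names and namespace printed above but drops the elided fields, and the function name `read_metadata` and the keys of the returned dictionary are hypothetical.

```python
import xml.etree.ElementTree as ET

DCIF_NS = "http://www.isocat.org/ns/dcif"  # default namespace used in the record

# Reduced stand-in for the metadata record; elided fields are left out.
RECORD = """
<datasm-categorySelection xmlns="http://www.isocat.org/ns/dcif" dcif-version="1.0">
  <languageSection>
    <language>en</language>
    <version>1.0.0</version>
    <registrationStatus>standard</registrationStatus>
    <creation><creationDate>1999-01-01</creationDate></creation>
    <descriptionSection>
      <definitionClass>
        <source>ISO 12620:1999</source>
      </definitionClass>
    </descriptionSection>
  </languageSection>
</datasm-categorySelection>
"""


def read_metadata(xml_text: str) -> dict:
    """Return selected fields of the first languageSection as a dictionary."""
    root = ET.fromstring(xml_text.strip())
    ns = {"d": DCIF_NS}
    section = root.find("d:languageSection", ns)
    if section is None:
        raise ValueError("no languageSection found")

    def text_of(path: str):
        node = section.find(path, ns)
        return node.text.strip() if node is not None and node.text else None

    return {
        "language": text_of("d:language"),
        "version": text_of("d:version"),
        "status": text_of("d:registrationStatus"),
        "created": text_of("d:creation/d:creationDate"),
        "source": text_of("d:descriptionSection/d:definitionClass/d:source"),
    }


print(read_metadata(RECORD))
```

Because the record declares a default namespace, every lookup has to be namespace-qualified, which is why the paths carry the `d:` prefix.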

10. ONE TO ONE MAPPING LABELS IN POS SCHEMA

To develop a common framework for an XML-based POS schema across all 22 Indian
Languages, the labels defined in the POS Schema for English must have a one-to-one
mapping to labels in each Indian Language. The XML schema needs to have a complete
tree structure, as depicted in the figure below:


The common XML Schema would select a particular Indian Language, and the Schema
then needs to be transformed into the POS Schema for that particular language. The
language-specific POS Schema could be enabled by switching a particular branch of the
tree structure 'off'. This is represented schematically under the next heading, POS
Schema Block Diagram.

11. POS SCHEMA BLOCK DIAGRAM

[Block diagram, flow of steps:]

Start (Raw Corpora)
-> Declare Metadata
-> Declare POS Schema
-> Select Script (Devanagari, Malayalam, Bangla, Perso-Arabic, ...; n=12)
-> Select Language (Hindi, Malayalam, Bodo, Kashmiri, ...; n=22)
-> Display (Metadata)
-> Call (POS Schema)
-> Display (Desired Nodes)
-> Hide (remaining nodes)
-> End
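
The select-and-hide steps of this flow can be sketched in Python as follows. This is a hedged illustration only: the dictionary layout, the names `SCRIPT_LANGUAGES`, `ALWAYS_SHOWN` and `select_labels`, and the plain attribute key `cat` are assumptions; the script and language pairs and the three-letter language prefixes are the ones used elsewhere in this document.

```python
# Script -> languages written in it (assumption: only pairs named in this document).
SCRIPT_LANGUAGES = {
    "Devanagari": {"hin", "brx", "kok"},
    "Malayalam": {"mal"},
    "Perso-Arabic": {"kas"},
    "Bangla": {"asm"},
    "Gujarati": {"guj"},
}

# The English category name, the tag and the Hindi label are always displayed.
ALWAYS_SHOWN = ("cat", "tag", "hin-cat")

# One node of the common schema with labels for several languages.
NOUN = {
    "cat": "noun",
    "tag": "N",
    "hin-cat": "संज्ञा",
    "brx-cat": "मुंमा",
    "mal-cat": "നാമം",
    "kok-cat": "नाम",
}


def select_labels(node: dict, script: str, language: str) -> dict:
    """Keep the always-shown attributes plus the selected language; hide the rest."""
    if language not in SCRIPT_LANGUAGES.get(script, set()):
        raise ValueError(f"{language!r} is not listed under script {script!r}")
    wanted = set(ALWAYS_SHOWN) | {f"{language}-cat"}
    return {key: value for key, value in node.items() if key in wanted}


print(select_labels(NOUN, "Devanagari", "brx"))
print(select_labels(NOUN, "Malayalam", "mal"))
```

Filtering attributes rather than deleting branches keeps the common schema intact, so a different language can be selected later from the same data.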


12. DRAFT POS SCHEMA FOR INDIAN LANGUAGES USING XML

POS Schema ()
{

<?xml version="1.0" encoding="UTF-8"?>


<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
<file Desc>

<titleStmt>

<title>POS tag in multilingual language</title>

<script>.................. </script>

<language>multilingual</language>

<label language>……………..</label language>

<type>multimodal</type>

[Languages taken: Hindi, Bodo, Malayalam, Kashmiri, Assamese, Konkani, Gujarati]

--------------------------------------Noun Block--------------------------------------
<xs:element name="cat" POS cat=”noun” hin-cat=”सं�ा” brx-cat=”मंम
ु ा” mal-cat=”നാമം”

kas-cat=”‫ ” ﻧﺎ ُﻭﺕ‬asm-cat=”িবেশষয” kok-cat=”नाम” guj-cat”સંજ્” tag=”N”>

<xs:attribute name="type" subcat="common” hin-cat=”जा�तवाचक” brx-cat=”फोलेर


�दिन्थग” mal-cat=”സാമാന� നാമം” kas-cat=”‫ ” ﻋﺎﻡ‬asm-cat=”জািতবাচক” kok-

cat=”जातवाचक नाम” guj-cat”�િતવાચક” tag=”NN">

<xs:attribute name="type" subcat ="Proper” hin-cat=”व्य��वाच” brx-cat=”मुं


�दिन्थग” mal-cat=”സംജ് നാമം” kas-cat=”‫ ” ﺧﺎﺹ‬asm-cat=”বয্ি�বাচ” kok-

cat=”व्य��वाच नाम” guj-cat”વ્ય�ક્તવા” tag=”NNP">

<xs:attribute name="type" subcat ="Verbal” hin-cat=”�क्रयामू” brx-cat=”हाबा


�दिन्थग” kas-cat=”‫ﺍﻭﺗٲﻭۍ‬ ٛ ” asm-cat=”ি�য়াবাচক” kok-cat=”�क्रयामू नाम” guj-
ٕ ‫ﮐﺮ‬
cat”�ક્રયાવા” tag=”NNV">


<xs:attribute name="type" subcat ="Nloc” hin-cat=”दे श-काल सापे�” brx-cat=”थाव�न


�दिन्थग मुंमा” mal-cat=”ആധാരിക നാമം” kas-cat=”‫ ” ﻧﺎﻭﺗ ٕہ ﺟﺎﻳ ِہ ﮨﺎﻭ‬asm-cat=”�ানবাচক”

kok-cat=”थळसापे�-काळ- नाम” guj-cat”સ્થાનવાચ” tag=”NST">

-------------------------------------Pronoun Block-----------------------------------
<xs:element name="cat" POS cat=”Pronoun” hin-cat=”सवर्ना” brx-cat=”मुंराइ” mal-
cat=”സരവവ്നാ” kas-cat=”‫ ” ﭘَﺮﻧﺎ ُﻭﺕ‬asm-cat=”সবর্না” kok-cat=”सवर्ना” guj-

cat”સવર્ના” tag=”PR”>

<xs:attribute name="type" subcat ="Personal” hin-cat=”व्य��वाच” brx-cat=”संबुं


�दिन्थग” mal-cat=”പുരുഷ സരവവ്നാ” kas-cat=”‫ ” ﺷﺨﺼﻴٲﺗﯽ‬asm-cat=”বয্ি�বাচ”

kok-cat=”पर
ु ूश सवर्न” guj-cat”�ુ�ુષવાચક” tag=”PRP">

<xs:attribute name="type" subcat ="Reflexive” hin-cat=”�नजवाचक” brx-cat=”गाव


�दिन्थग” mal-cat=”നിചവാചി സരവവ്നാ” kas-cat=”‫ ” ﻣﺎﮐﻮﺳﯽ‬asm-cat=”আত্মবা”

kok-cat=”आत्मवाच सवर्ना” guj-cat”પ્રિત�બ��” tag=”PRF">

<xs:attribute name="type" subcat ="Reciprocal” hin-cat=”पारस्प�र” brx-


cat=”गावज� गाव सोमोन्द” mal-cat=”സംബന്ധവാ സരവവ്നാ” kas-cat=”‫” ﺑﺎﮨﻤﯽ‬

asm-cat=”পাৰ�িৰক” kok-cat=”संबंद� सवर्ना” guj-cat”પરસ્પરવાચ” tag=”PRC">

<xs:attribute name="type" subcat ="Relative” hin-cat=”सम्बन वाचक” brx-


cat=”सोमोन्दो �दिन्थ” mal-cat=”പാരസ്പി സരവവ്നാ” kas-cat=”‫ ” ﺭٲﺑِﺘٲﻭۍ‬asm-

cat=”স��বাচক” kok-cat=”एकमेक� सवर्ना” guj-cat”સાપેક” tag=”PRL">

<xs:attribute name="type" subcat ="Wh-words” hin-cat=”प्र�वा” brx-cat=”स��थ


�दिन्थग” mal-cat=”േചാദ�വാചി സരവവ്നാ” kas-cat=”‫ ” ک ﻟﻔﻆ‬asm-cat=”��েবাধক

সবর্না” kok-cat=”प्रस्न सवर्ना” guj-cat”પ્ર�ાથર્વ” tag=”PRQ">


----------------------------------Demonstrative Block------------------------------
<xs:element name="cat" POS cat=”Demonstrative” hin-cat=”�नष्चयवाच” brx-cat=”थाव�न
�दिन्थग” mal-cat=”നിരേദശകം” kas-cat=”‫ﺮﻧﺎﻭﺗۍ‬
ٕ َ‫ ” ﮨﺎ َﻭﻥ ﭘ‬asm-cat=”িনেদর ্শেবাধ” kok-
cat=”दशर्” guj-cat”દશર્ક” tag=”DM”>

<xs:attribute name="type" subcat =" Deictic” hin-cat=”” brx-cat=”�थ �दिन्थग” mal-


cat=”്രപത� സൂചകം” kas-cat=”‫ ” ﻭٲﻧﻴٲﻭۍ‬asm-cat=”�তয্� িনেদর্” kok-cat=”” guj-
cat”ઉલ્લેખદશર” tag=”DMD">

<xs:attribute name="type" subcat ="Relative” hin-cat=”सम्बन वाचक” brx-


cat=”सोमोन्दो �दिन्थ” mal-cat=”സംബന്ധവാ നിരേദശകം” kas-cat=”‫” ﺭٲﺑﺘٲﻭۍ‬

asm-cat=”স��বাচক” kok-cat =”संबंद� दशर्” guj-cat”સાપેક” tag=”DMR">

<xs:attribute name="type" subcat ="Wh-words” hin-cat=”प्र�वा” brx-cat=”म


स��थ �दिन्थग” mal-cat=”േചാദ�വാചി നിരേദശകം” kas-cat=”‫ ” ک ﻟﻔﻆ‬asm-

cat=”��েবাধক অবয্” kok-cat=”प्रस्न दशर्” guj-cat”પ્ર�વા” tag=”DMQ">

-------------------------------------Verb Block---------------------------------------
<xs:element name="cat" POS cat=”Verb” hin-cat=”�क्र” brx-cat=”थाइजा” mal-cat=”്രകി”

ٚ ” asm-cat=”ি�য়া” kok-cat=”�क्रया” guj-cat”આખ્યા” tag=”V”>


kas-cat=”‫ﮐﺮﺍ ُﻭﺕ‬

<xs:attribute name="type" subcat ="Auxiliary Verb” hin-cat=”सहायक �क्र” brx-


cat=”लेङाइ थाइजा” mal-cat=”സഹായക ്രകി” kas-cat=”‫ﮐﺮﺍﻭﺕ‬
ُ ‫ ” ڈﮐﻬ ٕہ‬asm-
cat=”সহায়কাৰী ি�য়া” kok-cat=”पालवी �क्रया” guj-cat”” tag=”VAUX">

<xs:attribute name="type" subcat ="Main Verb” hin-cat=”मुख्य �क्” brx-cat=”गुबै


थाइजा” mal-cat=”്രപധാ ്രകി” kas-cat=”‫ ” ﺭﺍے ﮐﺮﺍ ُﻭﺕ‬asm-cat=”মুখয্ ি�য়” kok-

cat=”मुखेल �क्रया” guj-cat”�ુખ્” tag=”VM">

<xs:attribute name="subtype" subcat ="Finite” hin-cat=”प�र�मत” brx-


cat=”जाफुंजा थाइजा” mal-cat=”പൂരണ ്രകി” kas-cat=”‫ﺸﺮ ﮨﺎﻭ‬
ٕ ‫ ” ِﮨ‬asm-cat=”সমািপকা”
kok-cat=”�न�ीत �क्रया” guj-cat”� ૂણર” tag=”VF">


<xs:attribute name="subtype" subcat ="Infinitive” hin-cat=”अनंत” brx-


cat=”जाफु�ङ थाइजा” mal-cat=”്രകിയാരൂപ” kas-cat=”‫ﺸﺮ ﮐﻬﺎﻭ‬
ٕ ‫ ” ِﮨ‬asm-cat=”অসমািপকা”
kok-cat=”सादारण रू” guj-cat”હ�ત્વથ” tag=”VINF">

<xs:attribute name="subtype" subcat ="Gerund” hin-cat=”�क्रयावा” brx-


cat=”जाफुबाय थानाय �दिन्थग” kas-cat=”‫ﮐﺮﺍﻭﺗ ٕہ ﻧﺎ ُﻭﺕ‬
ٛ ” asm-cat=”িনিমত্তাথর্ক স” kok-
cat=”�क्रयावा नाम” guj-cat”વતર્માન�ૃદન” tag=”VNG">

<xs:attribute name="subtype" subcat ="Non-Finite” hin-cat=”गैर प�र�मत”


brx-cat=”जाफु�ङ थाइजा” mal-cat=”അപൂരണ ്രകി” kas-cat=”‫ﺸﺮ ﮨﺎﻭ‬
ٕ ‫ ” ﻧﺎ ِﮨ‬asm-
cat=”অসমািপকা” kok-cat=”अ�न�ीत �क्रया” guj-cat”અ� ૂણર” tag=”VNF">

------------------------------------Adjective Block----------------------------------
<xs:element name="cat" POS cat=”Adjective” hin-cat=”�वशे�ण” brx-cat=”थाइला�ल” mal-
cat=”നാമ വിേശഷണം” kas-cat=”‫ ” ﺑﺎ ُﻭﺕ‬asm-cat=”িবেশষণ” kok-cat=”�वशेशण” guj-

cat”િવશેષણ” tag=”JJ”>

---------------------------------------Adverb Block----------------------------------
<xs:element name="cat" POS cat=”Adverb” hin-cat=”�क्र �वशे�ण” brx-cat=”थाइजा�न
थाइला�ल” mal-cat=”്രകിയ വിേശഷണം” kas-cat=”‫ ” ﻟَﮕ ٕہ ﺑٲﺵ‬asm-cat=”ি�য়া িবেশষণ”

kok-cat=”�क्रया�वशे” guj-cat”�ક્રયાિવશે” tag=”RB”>

-----------------------------------Post Position Block-------------------------------


<xs:element name="cat" POS cat=”Post Position” hin-cat=”परसगर” brx-cat=”सोदोब उन
महर�थ” mal-cat=”അനു്രപേയാഗ” kas-cat=”‫ﭘﻮﺕ ﺟﺎے‬
ٚ ” asm-cat=”অনুসগর” kok-
cat=”संबंद� अव्य” guj-cat”અ�ુગો” tag=”PSP”>

------------------------------------Conjunction Block-------------------------------
<xs:element name="cat" POS cat=”Conjunction” hin-cat=”योजक” brx-cat=”दाजाब महर�थ”
mal-cat=”സമുച്ച” kas-cat= ”‫ ” ﻭﺍﮢ َﻮﻥ‬asm-cat=”সংেযাজক” kok-cat=”जोड अव्य” guj-

cat”સંયોજકો” tag=”CC”>


<xs:attribute name="type" subcat ="Co-ordinator” hin-cat=”समन्वय” brx-


cat=”लोगो महर” mal-cat=”ഏേകാപിത സമുച്ച” kas-cat=”‫ ” ﻭﺍﮢُﺖ‬asm-

cat=”সম�য়ক” kok-cat=”समानाधीकरण जोड अव्य” guj-cat”સહ�ક્રયાદશ” tag=”CCD">

<xs:attribute name="type" subcat ="Subordinator” hin-cat=”” brx-cat=”लेङाइ लोगो


महर” mal-cat=”ആശ്ചര�സൂ സമുച്ച” kas-cat=”‫ ” ﺗﺤﺘ ُﻮﻥ‬asm-cat=”” kok-

cat=”आश्र जोड अव्य” guj-cat”ગૌણ�ક્રયાદશ” tag=”CCS">

<xs:attribute name="subtype" subcat ="Quotative” hin-cat=”उ��-वाचक” mal-


cat=”ഉദ്ധാരണവാ സമുച്ച” brx-cat=”मुंख’�थ” kas-cat= ”‫ ” َﺩﭘَﻦ ﻧِﺸﺎﻧ ٕہ‬asm-cat=””
kok-cat=”अवतरणअथ�- उतर” guj-cat”” tag=”UT">

------------------------------------Particles Block------------------------------------
<xs:element name="cat" POS cat=”Particles” hin-cat=”अव्य” brx-cat=”महर�थ” mal-
cat=”നിപാദം” kas-cat=”‫ ” ﮢﻮﮢ ٕہ َﻭ ٕﻧﺘۍ‬asm-cat=”আনুষংিগক অবয্” kok-cat=”अव्य” guj-

cat”િનપાત” tag=”RP”>

<xs:attribute name="type" subcat ="Default” hin-cat=”व्य�तक” brx-cat=”गोरोिन्”


mal-cat=”സാമാന�ം” kas-cat=”‫ ” ِڈﻓﺎﻟﭧ‬asm-cat=”” kok-cat=”सरभरस अव्य” guj-

cat”સ્વયં �” tag=”RPD">

<xs:attribute name="type" subcat ="Classifier” hin-cat=”वग�कारक” brx-cat=”�थ


�दिन्थग्रा दाजा” mal-cat=”വരഗ്ഗ” kas-cat=”‫ ” َﻭﺭ ٕﮔﮩﺎ‬asm-cat=”িনিদর ্�তাবাচক সগ” kok-
cat=”वगर् अव्य” guj-cat”” tag=”CL">

<xs:attribute name="type" subcat ="Interjection” hin-cat=”�वस्मया�दबोध” brx-


cat=”सोमोनांनाय �दिन्थग” mal-cat=”വ�ാേക്ഷപ” kas-cat=”‫ ” ژﻫﮣُﺖ‬asm-
cat=”িব�য়েবাধক” kok-cat=”उमाळी अव्य” guj-cat”” tag=”INJ">

<xs:attribute name="type" subcat ="Negation” hin-cat=”नकारात्म” brx-cat=”न�ङ


�दिन्थग” mal-cat=”നിേഷദം” kas-cat=”‫ ” ﻧَہ ﮐٲﺭۍ‬asm-cat=”নঞাথর্” kok-cat=”न्हयकार

अव्य” guj-cat”નકારદશર્” tag=”NEG">


<xs:attribute name="type" subcat ="Intensifier” hin-cat=”तीव्” brx-cat=”गुन


�दिन्थग” mal-cat=”തീ്ര നിപാദം” kas-cat=”‫ ” ﺷﺪﺕ ﮨﺎﺭ‬asm-cat=”” kok-cat=”तीव्रका

अव्य” guj-cat”માત્ર ા�ૂ” tag=”INTF">

------------------------------------Quantifiers Block--------------------------------
<xs:element name="cat" POS cat=”Quantifiers” hin-cat=”संख्यावाच” brx-cat=”�बबां
�दिन्थग” mal-cat=”സംഖ�ാവാചി i” kas-cat=”‫ ﻨﺪ‬ٛ‫ ” ﮔﺮﻳ‬asm-cat=”পিৰমাণবাচক” kok-

cat=”संख्यादशर” guj-cat”પ�રમાણ� ૂચકો” tag=”QT”>

<xs:attribute name="type" subcat ="General” hin-cat=”सामान्” brx-cat=”सरासनस्”


mal-cat=”െപാതുസംഖ�ാവാചി” kas-cat=”‫ ” ﻋﻤﻮﻣﯽ‬asm-cat=”সাধাৰণ” kok-

cat=”सामान्” guj-cat”સામાન્” tag=”QTF">

<xs:attribute name="type" subcat ="Cardinals” hin-cat=”गणनासच


ू क” brx-cat=”गब
ु ै
�बसान” mal-cat=”അടിസ്ഥ സംഖ�ാവാചി” kas-cat=”‫ ﻨﺪ‬ٚ‫ﺁﻧﮑﻮﻧ ٕہ ﮔﺮﻳ‬ٛ ” asm-

cat=”সংখয্াবাচ” kok-cat=”संख्यावाच” guj-cat”સંખ્યાવાચ” tag=”QTC">

<xs:attribute name="type" subcat ="Ordinals” hin-cat=”क्रमसू” brx-cat=”फा�र


�बसान” mal-cat=”കരമ്മവാ” kas-cat=”‫ ﻨﺪ‬ٚ‫ ” ٴﻭﻧۍ ﮔﺮﻳ‬asm-cat=”�মবাচক সংখয্াবাচক

শ�” kok-cat=”क्रमवा” guj-cat”ક્રમવા” tag=”QTO">

------------------------------------Residuals Block----------------------------------
<xs:element name="cat" POS cat=”Residuals” hin-cat=”अवशेष” brx-cat=”आद्” mal-

cat=”അവശിഷ്ടപ” kas-cat=”‫ ” ﺑﺎﻗﻴٲﺗﯽ‬asm-cat=”” kok-cat=”हे र” guj-cat”શેષ” tag=”RD”>

<xs:attribute name="type" subcat ="Foreign word” hin-cat=”�वदे शी शब्” brx-


cat=”गब
ु न
ु हादरा�र सोदोब” mal-cat=”അന�ഭാഷാപദം” kas-cat=”‫ ” ﻏٲﺭ ُﻣﻠﮑﯽ ﻟَﻔﻆ‬asm-

cat=”িবেদশী শ�” kok-cat=”�वदे शी” guj-cat”પરદ� શી શબ્દ” tag=”RDF">

<xs:attribute name="type" subcat ="Symbol” hin-cat=”प्रत” brx-cat=”नेस�न” mal-

cat=”ചിഹ്” kas-cat=”‫ ” ﻋﻼ َﻣﺖ‬asm-cat=”�তীক” ki=”कुर” guj-cat”સંક�ત” tag=”SYM">


<xs:attribute name="type" subcat ="Unknown” hin-cat=”अ�ात” brx-cat=”�म�थ�य”


mal-cat=”ഇതരപദം” kas-cat=”‫ ” ﺍَﺯﻭﻥ‬asm-cat=”অ�াত” kok-cat=”अनवळखी” guj-

cat”અ�ણ્ય શબ્દ” tag=”UNK">

<xs:attribute name="type" subcat ="Punctuation” hin-cat=”�वरामा�द-�च�” brx-


cat=”थाद’�सन खािन्” mal-cat=”വിരാമ ചിഹ്” kas-cat=”‫ﮩﺠ َﻮﻥ‬
ِ َ‫ ” ﻟ‬asm-cat=”যিত
িচন” kok-cat=”�वरामकूर” guj-cat”િવરામ�ચહ્” tag=”PUNC">

<xs:attribute name="type" subcat ="Echowords” hin-cat=”प्र�तध्-शब्” brx-


cat=”�रंखां सोदोब” mal-cat=”മാെറ്റാലിവാ” kas-cat=”‫ﭘﻮﺕ ُﺩﻧۍ ﻟﻔﻆ‬
ٚ ” asm-
cat=”�নয্াত্মক” kok-cat=”पडसाद� उतरां” guj-cat”અ�ુરણનાત્મ” tag=”ECH">

</xs:attribute>

</xs:element> </xs:schema>
}
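
For testing tools against the draft, it can help to have a small well-formed rendering of one block. The Python sketch below builds the Noun block with nested subtype elements and derives the annotation labels from the nesting. It is an assumption-laden sketch, not the schema itself: the element names `category` and `type`, the attribute `name`, and the helpers `node` and `annotation_labels` are hypothetical, and only the `tag` and `xxx-cat` attribute names follow the draft.

```python
import xml.etree.ElementTree as ET


def node(kind: str, name: str, tag: str, **labels: str) -> ET.Element:
    """Create an element with a tag code and per-language labels (e.g. hin='...')."""
    attrs = {"name": name, "tag": tag}
    attrs.update({f"{lang}-cat": label for lang, label in labels.items()})
    return ET.Element(kind, attrs)


# Noun block: the top-level category with two of its subtypes nested inside it.
noun = node("category", "noun", "N", hin="संज्ञा", kok="नाम")
noun.append(node("type", "common", "NN", hin="जातिवाचक", kok="जातवाचक नाम"))
noun.append(node("type", "proper", "NNP", hin="व्यक्तिवाचक"))


def annotation_labels(elem: ET.Element, prefix: str = ""):
    """Yield the hierarchical label of elem and of every nested subtype."""
    label = f"{prefix}__{elem.get('tag')}" if prefix else elem.get("tag")
    yield label
    for child in elem:
        yield from annotation_labels(child, label)


print(ET.tostring(noun, encoding="unicode"))
print(list(annotation_labels(noun)))
```

Nesting the subtypes inside their parent makes the hierarchy explicit in the markup, so the higher-level tags never need to be stored by hand.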


13. ONE TO ONE MAPPING LABELS FOR INDIAN LANGUAGES

To incorporate this facility in the XML Schema, a common one-to-one mapping table for
the labels has been developed, as presented in Table 1, Table 2 and Table 3.

Languages: Hindi, Punjabi, Urdu, Gujarati, Oriya, Bengali


S.No English Hindi Punjabi Urdu Gujarati Oriya Bengali
1 Noun सं�ा ਨ�ਵ ‫ﺍﺳﻢ‬ સંજ્ ସଂଞା িবেশষয
common जा�तवाचक ਆਮ ‫ﻧﮑﺮﻩ‬ �િતવાચક ଜାତିବାଚକ জািতবাচক
Proper व्य��वाच ਖਾਸ ‫ﻣﻌﺮﻓہ‬ વ્ય�ક્તવા ବ୍ୟକ୍ତିବାଚକ বয্ি�বাচ
Verbal �क्रयामू / ਿਕਿਰਆਮੂਲਕ ‫ﺣﺎﺻﻞ‬ �ક્રયાવા ি�য়ামূলক
‫ﻣﺼﺪﺭ‬
कृदं त କ୍ରିୟାବାଚକ
Nloc दे श-काल ਸਿਥਤੀ ਸੂਚਕ ‫ﻅﺮﻑ‬ સ્થાનવાચ ଦେଶ-କାଳ �ানবাচক
सापे� ସାପେକ୍ଷ
2 Pronoun सवर्ना ਪੜਨ�ਵ ‫ﺿﻤﻴﺮ‬ સવર્ના ସର୍ବନାମ সবর্না
Personal व्य��वाच ਪੁਰਖਵਾਚੀ ‫ﺿﻤﻴﺮ‬ �ુ�ુષવાચક ବ୍ୟକ୍ତିବାଚକ বয্ি�বাচ
‫ﺷﺨﺼﯽ‬
Reflexive �नजवाचक ਿਨਜਵਾਚੀ ‫ﺿﻤﻴﺮ‬ પ્રિત�બ�� ଆତ୍ମବାଚକ আত্মবা
‫ﻣﻌﮑﻮﺳﯽ‬
Reciprocal पारस्प�र ਪਰਸਪਰੀ
‫ﺿﻤﻴﺮ‬ પરસ્પરવાચ ପାରସ୍ପାରିକ বয্িতহা
‫ﺭﺍﺟﻊ‬
Relative संबध
ं - वाचक ਸੰ ਬੰ ਧਵਾਚੀ ‫ﺿﻤﻴﺮ‬ સાપેક ସଂବନ୍ଧବାଚକ স��বাচক
‫ﻣﻮﺻﻮﻟہ‬
Wh-words प्र�वा ਪ�ਸ਼ਨਵਾਚੀ ‫ﺿﻤﻴﺮ‬ પ્ર�ાથર્વ ପ୍ରଶ୍ନବାଚକ ��বাচক
‫ﺍﺳﺘﻔﮩﺎﻣﻴہ‬
Indefinite अ�न�यवाचक NA NA અિનિ�ત સવર્ના NA অিনেদর ্শ
3 Demonstrative �न�यवाचक/ ਸੰ ਕੇਤਵਾਚੀ ‫ﺍﺷﺎﺭے‬ દશર્ક ନିଶ୍ଚୟବାଚକ/ସ িনেদর ্শ
संकेतवाचक ଂକେତବାଚକ
Deictic �नद� शी ਪ�ਤੱਖ ਪ�ਮਾਣਵਾਚੀ ‫ﺍﺷﺎﺭﻩ‬ ઉલ્લેખદશર �তয্� িনেদর্

Relative संबंधवाचक ਸੰ ਬੰ ਧਵਾਚੀ ‫ﺍﺷﺎﺭﻩ‬ સાપેક ସଂବନ୍ଧବାଚକ স��বাচক


‫ﻣﻮﺻﻮﻝ‬
Wh-words प्र�वा ਪ�ਸ਼ਨਵਾਚੀ ‫ﺍﺷﺎﺭﻩ‬ પ્ર�વા ପ୍ରଶ୍ନବାଚକ ��বাচক
‫ﺍﺳﺘﻔﮩﺎﻣﻴہ‬

Indefinite अ�न�यवाचक NA NA અિનિ�ત સવર્ના NA অিনেদর ্শ


4 Verb �क्र ਿਕਿਰਆ ‫ﻓﻌﻞ‬ આખ્યા କ୍ରିୟା ি�য়া
Auxiliary सहायक ਸਹਾਇਕ ‫ﻓﻌﻞ ﺍﻣﺪﺍﺩی‬ સહાયકાર� েগৗণ ি�য়া
Verb
�क्र ਿਕਿਰਆ �ક્ર ସହାୟକ କ୍ରିୟା
Main मुख्य�क्र ਮੁੱ ਖ ਿਕਿਰਆ ‫ﻓﻌﻞ‬ �ુખ્ ମୁଖ୍ୟ କ୍ରିୟା মুখয্ ি�য়াপ
Verb ‫ﻻﺯﻡ‬
Finite प�र�मत ਕਾਲਕੀ ‫ﻓﻌﻞ‬ � ૂણર ପରିମିତ সমািপকা
‫ﻣﺤﺪﻭﺩ‬
Infinitive �क्रयाथर्क स ਅਿਮਤ ‫ﻣﺼﺪﺭ‬ હ�ત્વથ ଅନନ୍ତ অপূণর্ ি�য়


Gerund �क्रयावा ਿਕਿਰਆਵਾਚੀ ‫ﺣﺎﺻﻞ‬ વતર્માન�ૃદન କ୍ରିୟାବାଚକ �েযাজক


‫ﻣﺼﺪﺭ‬ ি�য়া
Non-Finite गैर-प�र�मत ਅਕਾਲਕੀ ‫ﻓﻌﻞ ﻏﻴﺮ‬ અ� ૂણર ଅପରିମିତ অসমািপকা
‫ﻣﺤﺪﻭﺩ‬
Participle कृदं त परक नाम NA NA NA NA ি�য়াজাত
Noun িবেশষয
5 Adjective �वशेषण ਿਵਸ਼ੇਸ਼ਣ ‫ﺻﻔﺖ‬ િવશેષણ ବିଶେଷଣ িবেশষণ
6 Adverb �क्र-�वशेषण ਿਕਿਰਆ ਿਵਸ਼ੇਸ਼ਣ ‫ﻣﺘﻌﻠّﻖ ﻓﻌﻞ‬ �ક્રયાિવશે କ୍ରିୟା-ବିଶେଷଣ ি�য়া-িবেশষণ
7 Post परसगर ਸਬੰ ਧਕ ‫ﺟﺎﺭ ﻣﻮ ّﺧﺮ‬ અ�ુગો ପରସର୍ଗ পরসগর
Position
8 Conjunction
योजक ਯੋਜਕ ‫ﺣﺮﻑ ﻋﻄﻒ‬ સંયોજકો ସଂଯୋଜକ সংেযাগমূলক
Co-ordinator
समन्वय ਸਮਾਨ ਯੋਜਕ ‫ﺣﺮﻑ ﻭﺻﻞ‬ સહ�ક્રયાદશ সম�য়ক
ସମନ୍ୟକ
Subordinator
अधीनस् ਅਧੀਨ ਯੋਜਕ ‫ﺣﺮﻑ‬ ગૌણ�ક્રયાદશ শতর ্ সংেযাজ
‫ﺗﺎﺑﻊ ﮐﻨﻨﺪﻩ‬
Quotative उ��-वाचक ਕਥਨਵਾਚੀ ‫ﺣﺮﻑ‬ NA ଉକ୍ତିବାଚକ উি�বাচক
‫ﺍﻗﺘﺒﺎﺳﯽ‬

9 Particles अव्य ਿਨਪਾਤ ‫ﺣﺮﻑ‬ િનપાત ଅବ୍ୟୟ / অবয্


‫ﭘﺎﺑﻨﺪ‬/‫ﺣﺎﻟﻴہ‬
ନିପାତ
Default व्य�तक ਤਰੁਟੀਵਾਚਕ ‫ﺣﺮﻑ‬ સ્વયં � ବ୍ୟତିକ୍ରମ সাধারণ
‫ڈﻳﻔﺎﻟﭧ‬ অবয্
Classifier वग�कारक ਵਰਗੀਿਕ�ਤ ‫ﺣﺮﻑ‬ NA ବର୍ଗୀକାରକ বগর্বাচ
‫ﺩﺭﺟہ ﺑﻨﺪ‬
Interjection �वस्मया�दबोध ਿਵਸਮਕ ‫ﺣﺮﻑ ﻓﺠﺎﺋﻴہ‬ િવસ્મયઆ�દ ବିସ୍ମୟ ବୋଧକ িব�য়ািদেবাধক
બોધક
Negation नकारात्म ਨ�ਹਵਾਚੀ ‫ﺣﺮﻑ ﻧﮩﯽ‬ નકારદશર્ ନିଷେଧାତ୍ମକ নঞথর্
Intensifier तीव् ਤੀਬਰਤਾਵਾਚੀ ‫ﺣﺮﻑ ﺗﺎﮐﻴﺪ‬ માત્ર ા�ૂ ତୀବ୍ରତାବାଚକ তী�তােবাধক
10 Quantifiers
संख्यावाच ਸੰ ਿਖਆਵਾਚੀ ‫ﮐﻤﻴﺖ ﻧﻤﺎ‬ પ�રમાણ� ૂચકો ସଂଖ୍ୟାବାଚୀ পিরমাণবাচক
General सामान् ਸਧਾਰਨ ‫ ﻋﺎﻡ‬/‫ﻋﻤﻮﻣﯽ‬ સામાન્ ସାମାନ୍ୟ সাধারণ
Cardinals गणनासच
ू क ਿਗਣਤੀਸੂਚਕ ‫ﺍﻋﺪﺍﺩ ﻣﻄﻠﻖ‬ સંખ્યાવાચ ଗଣନାସୂଚକ সংখয্াবাচ
Ordinals क्रमसू ਕ�ਮਸੂਚਕ ‫ﺗﺮﺗﻴﺒﯽ ﺍﻋﺪﺍﺩ‬ ક્રમવા କ୍ରମସୂଚକ �মবাচক
11 Residuals अवशेष ਬਾਕੀ ‫ﺑﺎﻗﯽ ﻣﺎﻧﺪﻩ‬ શેષ ଅବଶେଷ অবিশ� পদ
Foreign �वदे शी शब् ਿਵਦੇਸ਼ੀ ਸ਼ਬਦ ‫ﺑﻴﺮﻭﻧﯽ ﻟﻔﻆ‬ પરદ� શી શબ્દ ବିଦେଶୀ ଶବ୍ଦ িবেদশী শ�
word
Symbol प्रत ਸੰ ਕੇਤ ‫ﻋﻼ َﻣﺖ‬ સંક�ત ପ୍ରତୀକ �তীক
Unknown अ�ात ਅਿਗਆਤ ‫ﻧﺎﻣﻌﻠﻮﻡ‬ અ�ણ્ય શબ્દ ଅଞାତ অ�াত
Punctuation �वरामा�द-�च� ਿਵਸ਼ਰਾਮ ਿਚੰ ਨ� ‫ﺍﻭﻗﺎﻑ‬ િવરામ�ચહ્ ବିରାମ ଚିହ୍ନ যিতিচ�
Echowords प्र�तध्-शब् ਪ�ਿਤਧੁਨੀ ਸ਼ਬਦ ‫ﮔﻮﻧﺞ ﺩﺍﺭ‬ અ�ુરણનાત્મ ପ୍ରତିଧ୍ନୀ অনুকার শ�
‫ﺍﻟﻔﺎﻅ‬


Languages: Assamese, Bodo, Kashmiri (Urdu Script), Kashmiri (Hindi Script), Marathi
S.No English Hindi Assamese Bodo Kashmiri Kashmiri Marathi
(Hindi)
1 Noun सं�ा িবেশষয मुंमा ‫ﻧﺎ ُﻭﺕ‬ नावुत नाम
common जा�तवाचक জািতবাচক फोलेर �दिन्थग ‫ﻋﺎﻡ‬ आम सामान्य
नाम
Proper व्य��वाच বয্ি�বাচ मंु �दिन्थग ‫ﺧﺎﺹ‬ ख़ास विशेष नाम
Verbal �क्रयामू ি�য়াবাচক हाबा �दिन्थग ‫ﺍﻭﺗٲﻭۍ‬ ٛ
ٕ ‫ﮐﺮ‬ क्रावतां धातुसाधित
नाम
/ कृदं त
Nloc दे श-काल �ানবাচক थाव�न �दिन्थग मुंमा ‫ﻧﺎﻭﺗ ٕہ ﺟﺎﻳ ِہ ﮨﺎﻭ‬ नाव त देश
जा�य हाव कालवाचक
सापे�
नाम
2 Pronoun सवर्ना সবর্না मंरु ाइ ‫ﭘَﺮﻧﺎ ُﻭﺕ‬ पर नावत
ु सर्वनाम
Personal व्य��वाच বয্ি�বাচ संबुं �दिन्थग ‫ﺷﺨﺼﻴٲﺗﯽ‬ शिख्सयांत पुरुषवाचक

Reflexive �नजवाचक আত্মবা गाव �दिन्थग ‫ﻣﺎﮐﻮﺳﯽ‬ माकूसी आत्मवाचक

Reciprocal पारस्प�र পাৰ�িৰক गावज� गाव ‫ﺑﺎﮨﻤﯽ‬ बा�हमी/ पारस्पारिक

सोमोन्द बो�हमी
Relative संबंध- वाचक স��বাচক सोमोन्दो ‫ﺭٲﺑِﺘٲﻭۍ‬ रो�बतांव् संबंधवाची

�दिन्थग
प्र�वा स��थ �दिन्थग ‫ک ﻟﻔﻆ‬ क-लफ़् प्रश्नार्थक
Wh-words ��েবাধক
সবর্না
Indefinite अ�न�यवाचक

3 Demonstrative �न�यवाच/ িনেদর ্শেবাধ थाव�न �दिन्थग ‫ﺮﻧﺎﻭﺗۍ‬


ٕ َ‫ﮨﺎ َﻭﻥ ﭘ‬ हावन दर्शक
संकेतवाचक परनावुत्
Deictic �नद� शी �তয্� �थ �दिन्थग ‫ﻭٲﻧﻴٲﻭۍ‬ वोनयोव्
িনেদর ্শ
Relative सम्बन স��বাচক सोमोन्दो �दिन्थ ‫ﺭٲﺑﺘٲﻭۍ‬ रोबतांत् संबंधवाच/
संबंधदर्शक
वाचक
Wh-words प्र�वा ��েবাধক म स��थ �दिन्थग ‫ک ﻟﻔﻆ‬ क-लफ़् प्रश्नार्थक
অবয্

Indefinite अ�न�यवाचक NA NA NA NA NA
4 Verb �क्र ি�য়া थाइजा ٚ
‫ﮐﺮﺍ ُﻭﺕ‬ क्राव क्रियापद
Auxiliary सहायक সহায়কাৰী
लेङाइ थाइजा ‫ڈﮐﻬ ٕہ ﮐﺮﺍ ُﻭﺕ‬ डख क्राव सहायकारी
Verb ি�য়া क्रियापद
�क्र
Main Verb मुख्य মুখয্ ি�য় गुबै थाइजा ‫ﺭﺍے ﮐﺮﺍ ُﻭﺕ‬ राय क्राव मुख्य
�क्र क्रियापद
Finite प�र�मत সমািপকা जाफुंजा थाइजा ‫ﺸﺮ ﮨﺎﻭ‬
ٕ ‫ِﮨ‬ �हशर आख्यात
हाव क्रियारूप


Infinitive अनंत অসমািপকা जाफु�ङ थाइजा ‫ﺸﺮ ﮐﻬﺎﻭ‬


ٕ ‫ِﮨ‬ �हशर खाव भाववाचक
कृदंत
Gerund �क्रयावा িনিমত্তাথর जाफुबाय थानाय ٛ
‫ﮐﺮﺍﻭﺗ ٕہ ﻧﺎ ُﻭﺕ‬ क्राव विभक्तिक्षम
সং�া कृदंतरूप
�दिन्थग नावत

Non-Finite गैर-प�र�मत অসমািপকা जाफु�ङ थाइजा ‫ﺸﺮ ﮨﺎﻭ‬
ٕ ‫ﻧﺎ ِﮨ‬ ना �हशर आख्यातेतर
हाव क्रियारूप
Participle कृदं त परक NA NA NA NA NA
Noun नाम
5 Adjective �वशेषण িবেশষণ थाइला�ल ‫ﺑﺎ ُﻭﺕ‬ बावुत विशेषण
6 Adverb �क्र-�वशेषण ি�য়া थाइजा�न थाइला�ल ‫ﻟَﮕ ٕہ ﺑٲﺵ‬ लग बांश क्रियाविशेषण

িবেশষণ
7 Post परसगर অনুসগর सोदोब उन महर�थ ‫ﭘﻮﺕ ﺟﺎے‬ٚ पोत अंत्यस्थान
Position
जाय
8 Conjunction योजक সংেযাজক दाजाब महर�थ ‫ﻭﺍﮢ َﻮﻥ‬ राटवन उभयान्वयी
अव्यय
Co-ordinator समन्वय সম�য়ক लोगो महर ‫ﻭﺍﮢُﺖ‬ वाटत/ NA
वाटथ
Subordinator अधीनस् NA लेङाइ लोगो महर ‫ﺗﺤﺘ ُﻮﻥ‬ तहतून NA
Quotative उ��-वाचक NA मुंख’�थ ‫َﺩﭘَﻦ ﻧِﺸﺎﻧ ٕہ‬ दपन उद्गारवाचक

�नशान
अव्य महर�थ ‫ﮢﻮﮢ ٕہ َﻭ ٕﻧﺘۍ‬ टोट वनत्
আনুষংিগক
9 Particles
অবয্
अव्यय/
निपात
Default व्य�तक गोरोिन् ‫ِڈﻓﺎﻟﭧ‬ �डफाल् सामान्य
वग�कारक �थ �दिन्थग् ‫َﻭﺭ ٕﮔﮩﺎ‬ वरगहा NA
Classifier িনিদর ্�তাবাচক
সগর
दाजाबदा
Interjection �वस्मया� িব�য়েবাধক सोमोनांनाय ‫ژﻫﮣُﺖ‬ छटत/ विस्मयवाचक

बोधक �दिन्थग छटथ


Negation नकारात्म নঞাথর্ न�ङ �दिन्थग ‫ﻧَہ ﮐٲﺭۍ‬ नकांरय निषेधात्मक

Intensifier तीव् गुन �दिन्थग ‫ﺷﺪﺕ ﮨﺎﺭ‬ शदत हाव तीव्रतावाचक

10 Quantifiers संख्यावाच পিৰমাণবাচক �बबां �दिन्थग ‫ ﻨﺪ‬ٛ‫ﮔﺮﻳ‬ ग्रे संख्यावाचक

General सामान् সাধাৰণ सरासनस् ‫ﻋﻤﻮﻣﯽ‬ अमूमी सामन्य


Cardinals गणनासच
ू क সংখয্াবাচ गुबै �बसान ‫ ﻨﺪ‬ٚ‫ﺁﻧﮑﻮﻧ ٕہ ﮔﺮﻳ‬ٛ ओकँवन गणनावाचक

ग्रॆ
Ordinals क्रमसू �মবাচক
फा�र �बसान ‫ ﻨﺪ‬ٚ‫ٴﻭﻧۍ ﮔﺮﻳ‬ वेन्य क्रमवाचक
সংখয্াবাচক
শ� ग्रॆ
11 Residuals अवशेष NA आद् ‫ﺑﺎﻗﻴٲﺗﯽ‬ बाक़यांती शेष


Foreign �वदे शी িবেদশী শ� गुबुन हादरा�र ‫ﻏٲﺭ ُﻣﻠﮑﯽ‬ गोर मुल्क� विदेशी शब्द
word ‫ﻟَﻔﻆ‬ लफ़ुज़
शब् सोदोब
Symbol प्रत �তীক नेस�न ‫ﻋﻼ َﻣﺖ‬ अलामत चिन्ह
Unknown अ�ात অ�াত �म�थ�य ‫ﺍَﺯﻭﻥ‬ अज़ोन अज्ञात
Punctuation �वरामा�द-�च� যিত িচন थाद’�सन खािन् ِ َ‫ﻟ‬
‫ﮩﺠ َﻮﻥ‬ लहिजवन विरामचिन्हे

Echowords प्र�तध्-शब् �নয্াত্মক


�रंखां सोदोब ‫ﭘﻮﺕ ﺩُﻧۍ ﻟﻔﻆ‬ٚ पॊत दे न् नादानुकारी/
लफ़ज़ अभ्यस्त


Languages: Telugu, Malayalam, Tamil, Konkani


S.No. English Hindi Telugu Malayalam Tamil Konkani
1 Noun सं�ा సంజఞ നാമം பெயர் नाम
common जा�तवाचक జతవచకం സാമാന� നാമം பொதுப் जातवाचक नाम
பெயர்
Proper व्य��वाच వయకతవచకం സംജ് നാമം சிறப்புப் व्य��वाच
பெயர்
नाम
Verbal �क्रयामू / కరయమ�లకం NA தொழில் �क्रयामू नाम
कृदं त பெயர்
Nloc दे श-काल దశ-కల సపకషకం ആധാരിക നാമം இடப் பெயர் थळ-काळ-
सापे� सापे� नाम
2 Pronoun सवर्ना సరవనమం സരവവ്നാ பதிலீடுப் सवर्ना
பெயர்
व्य��वाच पुरूश सवर्न
പുരുഷ மூவிடப்பெய
Personal వయకతవచకం സരവവ്നാ
Reflexive �नजवाचक ఆతమరథకం നിചവാചി தற்சுட்டுப் आत्मवाच
സരവവ്നാ பதிலீடுப்
सवर्ना
பெயர்
Reciprocal पारस्प�र పరసపరకం
സംബന്ധവാ பரஸ்பர संबंद� सवर्ना
സരവവ്നാ
பதிலீடுப்
பெயர்
Relative संबंध- वाचक సంబంధ-వచకం പാരസ്പി இணைப்பு एकमेक�
സരവവ്നാ பதிலீடுப்
सवर्ना
பெயர்
Wh-words प्र�वा పశర నవచకం േചാദ�വാചി வினாச் சொல் प्रस्न
സരവവ്നാ
सवर्ना
Indefinite अ�न�यवाचक NA சுட்டு अ�नि�त सवर्ना
3 Demonstrative �न�यवाचक/ నరదశకవచకం നിരേദശകം நேர்ச்சுட்டு दशर्
संकेतवाचक
�नद� शी दशर्क उत
്രപത� സൂചകം சுட்டு
Deictic నరదషట
பதிலீடுப்
பெயர்
Relative संबंधवाचक సంబంధ-వచకం സംബന്ധ வினாச் சொல் संबंद� दशर्
ചി നിരേദശകം
Wh-words प्र�वा పశర నవచకం േചാദ�വാചി வினை प्रस्न दशर्
നിരേദശകം
Indefinite अ�न�यवाचक NA NA துணை வினை अ�नि�त सवर्ना
4 Verb �क्र కరయ ്രകി முதன்மை �क्रया
வினை
Auxiliary सहायक �क्र సహయక కరయ സഹായക ്രകി முற்று வினை पालवी �क्रया
Verb Auxiliary Finite
(पूण् पालवी


�क्रया)
Auxiliary Non
Finite
(अपणू ् पालवी

�क्रया)
Main Verb मुख्य �क् మ�ఖయ కరయ ്രപധാ ്രകി குறை எச்சம் मुखेल �क्रया
Finite प�र�मत సమపక പൂരണ ്രകി வினைப் பெயர் �न�ीत �क्रया
Infinitive �क्रयाथर्क स త�మ�ననరథకం ്രകിയാരൂപ வினை எச்சம் सादारण रू
Gerund �क्रयावा కరయవచకం NA பெயரடை �क्रयावा नाम
Non-Finite गैर-प�र�मत అసమపక
അപൂരണ ്രകി வினையடை अ�न�ीत
�क्रया
Participle कृदं त परक नाम NA NA பின்னுருபு NA
Noun
�वशेषण �वशेशण
നാമ വിേശഷണം இணைப்புச்
5 Adjective వశషణం
சொல்
�क्र-�वशेषण �क्रया�वशे
്രകിയ இணை
6 Adverb కరయవశషణం വിേശഷണം
இணைப்புச்
சொல்
7 Post परसगर పరసరగ അനു്രപേയ சார்பு संबंद� अव्य
Position ഗം இணைப்புச்
சொல்
8 Conjunction योजक సమ�చఛయం സമുച്ച நிரப்பு जोड अव्य
இடைச்சொல்
Co-ordinator
समन्वय సమనధకరణం ഏേകാപിത இடைச்சொல் समानाधीकरण जोड
സമുച്ച अव्य
Subordinator
अधीनस् వయధకరణం ആശ്ചര� முன்னிருப்பு आश्र जोड
ചക
अव्य
സമുച്ച
Quotative उ��-वाचक అనుకరకం ഉദ്ധാരണ இனப்பிரிப்பு अवतरणअथ�- उतर
ചി സമുച്ച ஒட்டு
9 Particles अव्य అవయయం നിപാദം வியப்பிடைச் अव्य
சொல்
Default व्य�तक వయతకరమం സാമാന�ം எதிர்மறை सरभरस अव्य
Classifier वग�कारक వరగకరకం വരഗ്ഗ மிகுவிப்பான் वगर् अव्य
Interjection �वस्मया�दबोध వసమయదబ� ధకం വ�ാേക്ഷപ அளவையடை उमाळी अव्य
Negation नकारात्म నకరతమకం നിേഷദം பொது न्हयकार अव्य
Intensifier तीव् అతశయరథకం തീ്ര നിപാദം எண்ணுப் तीव्रका अव्य
பெயர்
10 Quantifiers संख्यावाच సంఖయవచకం സംഖ�ാവാചി எண்ணு संख्यादशर
முறைப் பெயர்


General सामान् సమనయం െപാതുസം எஞ்சியவை सामान्


ഖ�ാവാചി
Cardinals गणनासूचक గణనసూచకం അടിസ്ഥ அயல் சொல் संख्यावाच
സംഖ�ാവാചി
Ordinals क्रमसू కరమసూచకం കരമ്മവാ குறியீடு क्रमवा
11 Residuals अवशेष అవశషం
അവശിഷ്ടപ தெரியாதது हे र
Foreign �वदे शी शब् వదశ శబదం
അന�ഭാഷാപദം நிறுத்தற்குறி �वदे शी
word யீடு
Symbol प्रत సంకతం ചിഹ് இரட்டைக்கிள कुर
வி
Unknown अ�ात అజఞత ഇതരപദം NA अनवळखी
Punctuation �वरामा�द-�च� వరమం വിരാമ ചിഹ് NA �वरामकूर
Echo-words प्र�तध्-शब् పరతధవన-శబంద മാെറ്റാലിവാ NA पडसाद� उतरां
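
A one-to-one mapping table of this kind can be held as a simple lookup structure. The Python sketch below is an illustration under assumptions: the names `MAPPING`, `label_for` and `missing_labels` are hypothetical, the three-letter codes `tam` and `mar` are ISO 639-3 codes chosen for the example, and only two rows are copied from the tables above. An "NA" entry falls back to the English label, and a helper lists the gaps that break the one-to-one mapping.

```python
# tag -> language code -> label; "NA" marks a category the language does not use.
# Assumption: only two rows are shown, copied from the mapping tables.
MAPPING = {
    "N": {"eng": "Noun", "mal": "നാമം", "tam": "பெயர்", "kok": "नाम", "mar": "नाम"},
    "NNV": {"eng": "Verbal", "mal": "NA", "tam": "தொழில் பெயர்"},
}


def label_for(tag: str, lang: str, fallback: str = "eng") -> str:
    """Return the label of tag in lang, or the fallback label when it is NA or absent."""
    row = MAPPING[tag]
    label = row.get(lang, "NA")
    return row[fallback] if label == "NA" else label


def missing_labels(languages):
    """List (tag, language) pairs that have no label, i.e. gaps in the mapping."""
    return sorted(
        (tag, lang)
        for tag, row in MAPPING.items()
        for lang in languages
        if row.get(lang, "NA") == "NA"
    )


print(label_for("N", "tam"))
print(label_for("NNV", "mal"))
print(missing_labels(["mal", "tam"]))
```

Running `missing_labels` over all languages would give a quick checklist of the cells still marked NA in the three tables.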


14. ALGORITHM FOR SELECTION OF NODES

If script is Devanagari then

If language is Hindi then

Display (Metadata)

Call (POS Schema)

Display (English and Hindi Nodes)

Hide (remaining nodes)

Eg: {

<xs:element name="cat" POS cat=”noun” hin-cat=”सं�ा”tag=”N”>

<xs:attribute name="type" subcat="common” hin-cat=”जा�तवाचक” tag=”NN">

<xs:attribute name="type" subcat ="Proper” hin-cat=”व्य��वाच” tag=”NNP">

<xs:element name="cat" POS cat=”Pronoun” hin-cat=”सवर्ना” tag=”PR”>

<xs:attribute name="type" subcat ="Personal” hin-cat=”व्य��वाच” tag=”PRP">

<xs:attribute name="type" subcat ="Reflexive” hin-cat=”�नजवाचक” tag=”PRF">

……………………………………………..

End if

If language is Bodo then

Call (POS Schema)

Display (English, Hindi and Bodo Nodes)

Hide (remaining nodes)

Eg: {

<xs:element name="cat" POS cat=”noun” hin-cat=”सं�ा” brx-cat=”मुंमा” tag=”N”>


<xs:attribute name="type" subcat="common” hin-cat=”जा�तवाचक” brx-cat=”फोलेर


�दिन्थग” tag=”NN">

<xs:attribute name="type" subcat ="Proper” hin-cat=”व्य��वाच” brx-cat=”मुं


�दिन्थग” tag=”NNP">

<xs:element name="cat" POS cat=”Pronoun” hin-cat=”सवर्ना” brx-cat=”मुंराइ” tag=”PR”>

<xs:attribute name="type" subcat ="Personal” hin-cat=”व्य��वाच” brx-cat=”संबुं


�दिन्थग” tag=”PRP">

<xs:attribute name="type" subcat ="Reflexive” hin-cat=”�नजवाचक” brx-cat=”गाव


�दिन्थग” tag=”PRF">

……………………………………………..

End if

If language is Konkani then

Call (POS Schema)

Display (English, Hindi and Konkani Nodes)

Hide (remaining nodes)

Eg: {

<xs:element name="cat" POS cat=”noun” hin-cat=”सं�ा” kok-cat=”नाम” tag=”N”>

<xs:attribute name="type" subcat="common” hin-cat=”जा�तवाचक” kok-


cat=”जातवाचक नाम” tag=”NN">

<xs:attribute name="type" subcat ="Proper” hin-cat=”व्य��वाच” kok-


cat=”व्य��वाच नाम” tag=”NNP">

<xs:element name="cat" POS cat=”Pronoun” hin-cat=”सवर्ना” kok-cat=”सवर्ना”


tag=”PR”>


<xs:attribute name="type" subcat ="Personal” hin-cat=”व्य��वाच” kok-cat=”पुरू


सवर्ना” tag=”PRP">

<xs:attribute name="type" subcat ="Reflexive” hin-cat=”�नजवाचक” kok-


cat=”आत्मवाच सवर्ना” tag=”PRF">

……………………………………………..

End if

Else If script is Malayalam (Orthographic variation) then

If language is Malayalam then

Call (POS Schema)

Display (English, Hindi and Malayalam Nodes)

Hide (remaining nodes)

Eg: {

<xs:element name="cat" POS cat="noun" hin-cat="संज्ञा" mal-cat="നാമം" tag="N">

<xs:attribute name="type" subcat="common" hin-cat="जातिवाचक" mal-cat="സാമാന്യ നാമം" tag="NN">

<xs:attribute name="type" subcat="Proper" hin-cat="व्यक्तिवाचक" mal-cat="സംജ്ഞാ നാമം" tag="NNP">

<xs:element name="cat" POS cat="Pronoun" hin-cat="सर्वनाम" mal-cat="സർവ്വനാമം" tag="PR">

<xs:attribute name="type" subcat="Personal" hin-cat="व्यक्तिवाचक" mal-cat="പുരുഷ സർവ്വനാമം" tag="PRP">

<xs:attribute name="type" subcat="Reflexive" hin-cat="निजवाचक" mal-cat="നിചവാചി സർവ്വനാമം" tag="PRF">

……………………………………………..

End if

Else If script is Perso-Arabic then

If language is Kashmiri then

Call (POS Schema)

Display (English, Hindi and Kashmiri Nodes)

Hide (remaining nodes)

Eg: {

<xs:element name="cat" POS cat="noun" hin-cat="संज्ञा" kas-cat="ناوُت" tag="N">

<xs:attribute name="type" subcat="common" hin-cat="जातिवाचक" kas-cat="عام" tag="NN">

<xs:attribute name="type" subcat="Proper" hin-cat="व्यक्तिवाचक" kas-cat="خاص" tag="NNP">

<xs:element name="cat" POS cat="Pronoun" hin-cat="सर्वनाम" kas-cat="پَرناوُت" tag="PR">

<xs:attribute name="type" subcat="Personal" hin-cat="व्यक्तिवाचक" kas-cat="شخصیٲتی" tag="PRP">

<xs:attribute name="type" subcat="Reflexive" hin-cat="निजवाचक" kas-cat="ماکوسی" tag="PRF">

……………………………………………..

End if

Else If script is Bangla then

If language is Assamese then

Call (POS Schema)

Display (English, Hindi and Assamese Nodes)

Hide (remaining nodes)

Eg: {

<xs:element name="cat" POS cat="noun" hin-cat="संज्ञा" asm-cat="বিশেষ্য" tag="N">

<xs:attribute name="type" subcat="common" hin-cat="जातिवाचक" asm-cat="জাতিবাচক" tag="NN">

<xs:attribute name="type" subcat="Proper" hin-cat="व्यक्तिवाचक" asm-cat="ব্যক্তিবাচক" tag="NNP">

<xs:element name="cat" POS cat="Pronoun" hin-cat="सर्वनाम" asm-cat="সর্বনাম" tag="PR">

<xs:attribute name="type" subcat="Personal" hin-cat="व्यक्तिवाचक" asm-cat="ব্যক্তিবাচক" tag="PRP">

<xs:attribute name="type" subcat="Reflexive" hin-cat="निजवाचक" asm-cat="আত্মবাচক" tag="PRF">

……………………………………………..

End if

Else If script is Gujarati then

If language is Gujarati then

Call (POS Schema)

Display (English, Hindi and Gujarati Nodes)

Hide (remaining nodes)

Eg: {

<xs:element name="cat" POS cat="noun" hin-cat="संज्ञा" guj-cat="સંજ્ઞા" tag="N">

<xs:attribute name="type" subcat="common" hin-cat="जातिवाचक" guj-cat="જાતિવાચક" tag="NN">

<xs:attribute name="type" subcat="Proper" hin-cat="व्यक्तिवाचक" guj-cat="વ્યક્તિવાચક" tag="NNP">

<xs:element name="cat" POS cat="Pronoun" hin-cat="सर्वनाम" guj-cat="સર્વનામ" tag="PR">

<xs:attribute name="type" subcat="Personal" hin-cat="व्यक्तिवाचक" guj-cat="પુરુષવાચક" tag="PRP">

<xs:attribute name="type" subcat="Reflexive" hin-cat="निजवाचक" guj-cat="પ્રિત�બ��" tag="PRF">

……………………………………………..

End if
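
The selection rule used throughout this algorithm (display the English, Hindi and target-language nodes, hide the rest) can be sketched in Python. This is only an illustration under stated assumptions: the `PosNode` class, the `select_labels` function and the `eng` key for the English label are hypothetical names that do not appear in the schema, and the labels are taken from the Konkani and Malayalam examples above.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class PosNode:
    """Hypothetical in-memory form of one schema node."""
    tag: str
    labels: dict[str, str]  # ISO 639-3 code -> label ("eng" is an assumed key)
    children: list[PosNode] = field(default_factory=list)


def select_labels(node: PosNode, language: str,
                  always: tuple[str, ...] = ("eng", "hin")) -> PosNode:
    """Return a copy that keeps English, Hindi and `language` labels only."""
    visible = [*always, language]
    return PosNode(
        tag=node.tag,
        labels={code: node.labels[code] for code in visible if code in node.labels},
        children=[select_labels(child, language, always) for child in node.children],
    )


# Sample data copied from the examples in this section.
NOUN = PosNode(
    tag="N",
    labels={"eng": "noun", "hin": "संज्ञा", "kok": "नाम", "mal": "നാമം"},
    children=[
        PosNode(
            tag="NN",
            labels={"eng": "common", "hin": "जातिवाचक",
                    "kok": "जातवाचक नाम", "mal": "സാമാന്യ നാമം"},
        )
    ],
)

shown = select_labels(NOUN, "kok")
print(shown.tag, shown.labels)
for child in shown.children:
    print(child.tag, child.labels)
```

Filtering a copy rather than deleting labels in place keeps the full schema intact, so the same tree can be filtered again for another language.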

15. REFERENCE BASED IMPLEMENTATION

Hindi
1. सप्तपुरियों\N_NNP के\PSP दर्शन\N_NN से\PSP मिलता\V_VM है\V_VM मोक्ष\N_NN !\RD_PUNC
2. हिंदू\N_NN धर्म\N_NN में\PSP तीर्थ\N_NN का\PSP बड़ा\JJ महत्व\N_NN है\V_VM ।\RD_PUNC
3. यूँ\RP_RPD तो\RP_RPD हर\QT_QTF तीर्थ\N_NN बड़ा\JJ और\CC_CCD अहम\JJ है\V_VM ,\RD_PUNC लेकिन\CC_CCS सात\QT_QTC स्थानों\N_NN की\PSP बड़ी\JJ महत्ता\N_NN और\CC_CCD मान्यता\N_NN है\V_VM ।\RD_PUNC
4. ये\DM_DMD सातों\QT_QTC धर्मस्थल\N_NN सात\QT_QTC नगरों\N_NN या\RP_RPD सप्तपुरियों\N_NNP के\PSP रूप\N_NN में\PSP ग्रंथों\N_NN में\PSP वर्णित\V_VM हैं\V_VAUX ।\RD_PUNC
5. ऐसा\DM_DMD कहा\V_VM गया\V_VAUX है\V_VAUX कि\CC_CCS चतुर्मास\N_NNP में\PSP इन\DM_DMD सप्तपुरियों\N_NNP का\PSP दर्शन\N_NN मोक्ष\N_NN प्रदान\N_NN करने\V_VM वाला\PSP होता\V_VM है\V_VAUX ।\RD_PUNC

Punjabi

1. ਸਪਤਪੁਰੀਆਂ\N_NN ਦੇ\PSP ਦਰਸ਼ਨ\N_NN ਨਾਲ\PSP ਿਮਲਦਾ\V_VM_VNF ਹੈ\V_VAUX ਮੋਖ\N_NN

2. ਿਹੰ ਦੂ\N_NN ਧਰਮ\N_NN ਿਵਚ\PSP ਤੀਰਥ\N_NN ਦਾ\PSP ਬਹੁਤ\QT_QTF ਮੱ ਹਤਵ\N_NN

ਹੈ\V_VAUX |\RD_PUNC

3. �ਝ\RB ਤ�\CC_CCS ਹਰ\QT_QTF ਤੀਰਥ\N_NN ਵੱ ਡਾ\JJ ਤੇ\CC_CCS ਅਿਹਮ\JJ ਹੈ\V_VAUX

,\CC_CCS ਪਰ\CC_CCS ਸੱ ਤ\QT_QTC ਸਥਾਨ�\N_NN ਦੀ\PSP ਬਹੁਤ\QT_QTF ਮਹੱ ਤਤਾ\N_NN

ਅਤੇ\CC_CCD ਮਾਨਤਾ\N_NN ਹੈ\V_VAUX |\RD_PUNC

4. ਇਹ\DM_DMD ਸੱ ਤੇ\QT_QTC ਧਰਮ\N_NN ਸਥਾਨ\N_NN ਸੱ ਤ\QT_QTC ਨਗਰ�\N_NN

ਜ�\CC_CCD ਸਪਤਪੁਰੀਆਂ\N_NN ਦੇ\PSP ਰੂਪ\N_NN ਿਵਚ\PSP ਗਰੰ ਥ�\N_NN ਿਵਚ\PSP

ਦਰਜ\N_NN ਹਨ\V_VAUX |\RD_PUNC

5. ਇੰ ਝ\V_VM_VNF ਿਕਹਾ\V_VM_VNF ਿਗਆ\V_VM_VF ਹੈ\V_VM_VNF ਿਕ\CC_CCS ਚੌਥ\ੇ QT_QTO


ਮਹੀਨ�\N_NN ਿਵਚ\PSP ਇਨ��\PSP ਸਪਤਪੁਰੀਆਂ\N_NN ਦਾ\PSP ਦਰਸ਼ਨ\N_NN ਮੋਖ\N_NN

ਪ�ਦਾਨ\N_NN ਵਾਲਾ\PSP ਹੁੰ ਦਾ\V_VM_VNF ਹੈ\V_VAUX |\RD_PUNC

Tamil

1. சப்த��கை\N_NN த�சிப்பதா\V_VM_VNG �க்த\N_NN


கிைடக்கிற\V_VM_VF .\RD_PUNC
2. இந்\N_NNP மதத்தி\N_NN �ண்ண�\JJ இடங்க\N_NN மிக�ம\RP_INTF
சிறப்\N_NN வாய்ந்த\N_NN ஆ�ம\V_VAUX .\RD_PUNC
3. ஒவ்ெவா\QT_QTC �ண்ண�யத்தல\N_NN ெப�ய�\N_NN
மற்�\CC_CCD �க்கியத்�\N_NN வாய்ந்\N_NN ஆ�ம\V_VAUX
ஆனால\CC_CCS ஏ�\QT_QTC இடங்க\N_NN மிக\RP_INTF சிறப்�\N_NN
மதிப்�\N_NN வாய்ந்ததா\V_VM_VF .\RD_PUNC
4. இந்\DM_DMD ஏ�\QT_QTC �ண்ண�யத்தலங\N_NN ஏ�\QT_QTC
நகரங்க\N_NN அல்ல\CC_CCD சப்த��க\N_NN என்\CC_CCS_UT
�த்தகங்கள\N_NN வர்ண�க்கப்\V_VM_VNF இ�க்கின்\V_VAUX
.\RD_PUNC
5. ெபௗர்ணமிய�\N_NN இந்\DM_DMD சப்த��ய�\N_NN த�சனம\N_NN
�க்திை\N_NN வழங்�கிற\V_VM_VF என்\CC_CCS_UT
ெசால்லப்ப\V_VM_VNF இ�க்கிற\V_VAUX .\RD_PUNC

Malayalam

1. ഏഴ\QT_QTC പുണൃനഗരികള\N_NN സന്ദരക്കു\V_VM_VNF


െകാണ്\RP_RPD േമാക്\N_NN ലഭിക്കു\V_VM_VF .\RD_PUNC
2. ഹിന്\N_NN മതത്ത\N_NN പുണ�സ്ഥലങ്\N_NN വലിയ\JJ
മഹത�ം\N_NN ഉണ്\V_VAUX .\RD_PUNC
3. എല�ാ\QT_QTF തീര്‍ടനസ്ഥലങ്\N_NN വലുതും\JJ ്രപധാനെപ്പട\JJ
ആണ\V_VAUX ,\RD_PUNC എങ്കില\CC_CCD ഈ\DM_DMD ഏഴ\QT_QTC
സ്ഥലങ്ങ\N_NN വലിയ\JJ േ്രശഷ്ഠതയ\N_NN ആദരവും\N_NN
ഉണ്\V_VAUX .\RD_PUNC
4. ഈ\DM_DMD ഏഴ\QT_QTC ധര്സ്ഥലങ്\N_NN ഏഴ\QT_QTC
പട്ടണ\N_NN അഥവാ\CC_CCD ഏഴ\QT_QTC പുണ�നഗരികള\N_NN
എന\CC_CCD രീതിയില\N_NN ്രഗന്ഥങ\N_NN വര്‍ച്ചിട്\V_VM_VF
.\RD_PUNC
5. ചതുരമാസത്ത\N-NNP ഈ\DM_DMD പുണ�സ്ഥലങ്ങ\N_NN
സന്ദനം\N_NNV േമാക്ഷ്രപദായകമാെ\N_NN പറഞ്ഞിട്\V_VM_VF
.\RD_PUNC

Bangla

1. স�পুির\N_NNP দশর্\N_NN কের\V_VM_VNF েমা�লাভ\N_NN হয়\V_VAUX ৷\RD_PUNC


2. িহ�ু\N_NN ধেমর\N_NN তীেথর্\N_NN যেথ�\JJ মহ�\N_NN আেছ\V_VAUX ৷\RD_PUNC
3. যিদও\PSP সম�\QT_QTF তীথর\N_NN যেথ�\JJ গু�পূণর,\N_NN তাও\PSP সাতি\QT_QTC
জায়গার\N_NN িবেশষ\JJ গু�\N_NN ও\CC_CCD মহ�\N_NN আেছ\V_VAUX ৷\RD_PUNC
4. এই\DM_DMD সাতি\QT_QTC ধমর্�\N_NN সাত\QT_QTC নগর\N_NN বা\CC_CCD স�পুির\N-
NNP নােম\N_NN পিরিচত\N_NN ৷\RD_PUNC
5. এটা\DM_DMD বলা\V_VM_VNG হয়\V_VAUX েয\RP_RPD চতুর ্দশীে\N_NN এই\DM_DMD
স�পুির\N-NNP দশর্\N_NN করেল\V_VM_VNF েমা�লাভ\N_NN হয়\V_VAUX ৷\RD_PUNC

Marathi

1. सातपरु �ंच्य\N_NNP दशर्नान\N_NN �मळतो\VM मो�\N_NN .\PUNC


2. �हंद\ू JJ धमार्मध्\N_NN तीथर्�ेत्रा\N_NN खूप\QT_QTF मह�व\N_NN आहे\VM .\PUNC
3. तसे\PR तर\RP प्रत्\QT_QTF तीथर\N_NN मह�वाचे\N_NN आ�ण\C_CCD मख
ु ्\JJ आहे\VM
पण\C_CCD सात\Q-QTC स्थानांच\N_NN मह�व\N_NN आ�ण\C_CCD मान्यत\N_NN मोठ�\JJ
आहे\VM .\PUNC
4. ह�\DM सात\Q_QTC धमर्स्थ \N_NN सात\Q_QTC नगर�\N_NN वा\C_CCD स�परु �ंच्य\NNP
रूपा\N_NN ग्रंथांमध\N_NN व�णर्लेल\JJ आहे त\VM .\PUNC
5. असे\PR म्हटल\VM गेले\VAUX आहे\VAUX क�\C_CCD चातम
ु ार्सामध्\N_NN या\C_CCD
स�परु �ंच\े NNP दशर्\N_NN मो�\N_NN दे णारे \V_VM_VNF ठरते\VM .\PUNC

Gujarati

1. સપ્ત�ુર�ઓન\N_NNP દશર્નથ\N_NN મળે \V_VM છે \V_VAUX મોક.\N_NN

2. �હ��ુધમર્મા\N_NN તીથર્�ુ\N_NN ઘ�ુ\ં QT_QTF મહત્ત\JJ છે .\V_VM

3. આમ\RP_RPD તો\RP_RPD દર� ક\DM_DMD તીથર\N_NN મહાન\JJ અને\CC_CCD મહત્ત્વ�ૂ\JJ

છે ,\V_VM પણ\CC_CCD સાત\QT_QTC સ્થાનોન\N_NN મહ�ા\N_NN અને\CC_CCD

માન્યત\N_NN છે .\V_VM

4. આ\DM_DMD સાતેય\QT_QTC ઘમર્સ્\N_NN સાત\QT_QTC નગર\N_NN અથવા\CC_CCD

સપ્ત�ુર�ઓન\N-NNP સ્વ�પ\N_NN ગ્રથોમ\N_NN વણર્વાયેલ\V_VM છે .\V_VAUX

5. એમ\RP_RPD કહ�વાય\V_VM ક�\CC_CCS ચા�ુમાર્સમા\N_NN આ\DM_DMD સપ્ત�ુર�ઓન\N-


NNP દશર્\N_NN મોક\N_NN આપનાર\V_VM હોય\V_VAUX છે .\V_VAUX

Konkani

1. स�परु �च� \N_NNP दशर्\N_NN घेतल्या\V_VM_VNF मो�\N_NN मेळटा\V_VM_VF .\RD_PUNC


2. �हंद\ू N_NNP धमा�त\N_NN �तथर्स्थान\N_NN व्ह\JJ म्हत\N_NN आसा\V_VM_VF .\RD_PUNC
3. तश�\RB पळोवपाक\V_VM_VNF गेल्या\V_VM_VNF सगल�ंच\QT_QTF �तथा�\N_NN व्ह\JJ
आनी\CC_CCD खाशेल�ं\JJ आसात\V_VM_VF पण
ू \CC_CCS सात\QT_QTC स्थळा\N_NN व्ह\JJ
आनी\CC_CCD म्हत्वाच\JJ अश�\RB मानतात\V_VM_VF .\RD_PUNC
4. ग्रंथां\N_NN �ा\DM_DMD सातय
ू \QT_QTC धमर्स्थळां \N_NN वणर्\N_NN सात\QT_QTC
नगर�\N_NN वा\CC_CCD स�परु �\N_NNP अश�\RB आसा\V_VM_VF .\RD_PUNC
5. चातम
ु ार्सां\N_NN हे \PR_PRP स�परु �च� \N_NNP दशर्\N_NN मो�\N_NN मेळोवन\V_VM_VNF
�दवपी\V_VM_VNG थारता\V_VM_VF अश�\RB मानतात\V_VM_VF .\RD_PUNC

Urdu

1. ‫\ﺳﺘﭙﻮﺭﻳﻮں‬N_NNP ‫\ﮐﯽ‬PSP ‫\ﺯﻳﺎﺭﺕ‬N_NN ‫\ﺳﮯ‬PSP ‫\ﻣﻠﺘﯽ‬V_VM ‫\ﮨﮯ‬V_VAUX ‫\ﻧﺠﺎﺕ‬N_NN


2. ‫\ﮨﻨﺪﻭ‬N_NN ‫\ﻣﺬﮨﺐ‬N_NN ‫\ﻣﻴﮟ‬PSP ‫\ﺗﻴﺮﺗﻬ‬N_NN ‫\ﮐﯽ‬PSP ‫\ﺑﮍی‬QT_QTF ‫\ﺍﮨﻤﻴﺖ‬N_NN ‫\ﮨﮯ۔‬V_VAUX
3. ‫\ﻳﻮں‬PSP ‫\ﺗﻮ‬PSP ‫\ﮨﺮ‬QT_QTF ‫\ﺗﻴﺮﺗﻬ‬N_NN ‫\ﺑﮍی‬N_NN ‫\ﺍﻭﺭ‬CC_CCD ‫\ﺍﮨﻢ‬N_NN ‫\ﮨﻴﮟ‬V_VAUX
،\RD_PUNC ‫\ﻟﻴﮑﻦ‬PSP ‫\ﺳﺎﺕ‬QT_QTC ‫\ﻣﻘﺎﻣﺎﺕ‬N_NN ‫\ﮐﯽ‬PSP ‫\ﺑﮍی‬N_NN ‫\ﻋﻈﻤﺖ‬N_NN ‫\ﺍﻭﺭ‬CC_CCD
‫\ﻣﻘﺒﻮﻟﻴﺖ‬N_NN ‫\ﮨﮯ۔‬V_VAUX
4. ‫\ﻳہ‬PR_PRP ‫\ﺳﺎﺗﻮں‬QT_QTC ‫\ﻣﺬﮨﺒﯽ‬JJ ‫\ﻣﻘﺎﻣﺎﺕ‬N_NN ‫\ﺳﺎﺕ‬QT_QTC ‫\ﺷﮩﺮﻭں‬N_NN ‫\ﻳﺎ‬CC_CCD
‫\ﺳﺎﺕ‬QT_QTC ‫\ﭘﻮﺭﻳﻮں‬N_NN ‫\ﮐﯽ‬PSP ‫\ﺷﮑﻞ‬N_NN ‫\ﻣﻴﮟ‬PSP ‫\ﮐﺘﺎﺑﻮں‬N_NN ‫\ﻣﻴﮟ‬PSP ‫\ﻣﺬﮐﻮﺭ‬JJ
‫\ﮨﻴﮟ‬V_VAUX ‫\۔‬RD_PUNC
5. ‫\ﺍﻳﺴﺎ‬DM_DMD ‫\ﮐﮩﺎ‬V_VM ‫\ﮔﻴﺎ‬V_VM ‫\ﮨﮯ‬V_VAUX ‫\ﮐہ‬CC_CCD ‫ﻣﻮﺳﻢ‬‫\ﺑﺮﺳﺎﺕ‬N_NN ‫\ﻣﻴﮟ‬PSP
‫\ﺍﻥ‬DM_DMD ‫\ﺳﺎﺗﻮں‬QT_QTC ‫\ﺷﮩﺮﻭں‬N_NN ‫\ﮐﯽ‬PSP ‫\ﺯﻳﺎﺭﺕ‬N_NN ‫\ﻧﺠﺎﺕ‬N_NN ‫\ﻓﺮﺍﮨﻢ‬JJ
‫\ﮐﺮﻧﮯ‬V_VM_VF ‫\ﻭﺍﻟﯽ‬V_VAUX ‫\ﮨﻮﺗﯽ‬V_VM ‫\ﮨﮯ۔‬V_VAUX

Oriya

1. ସପ୍ତପୁରୀଗୁଡ଼ିକN__NN ର PSP ଦର୍ଶନ NN ରୁ PSP ମୋକ୍ଷ NN ମିଳିଥାଏ N__NNV |


2. ହିନ୍ଦୁଧର୍ମN__NN ରେ PSP ତୀର୍ଥ NN ର PSP ବଡ଼ JJ ମହତ୍ NN ଅଟେ V__VAUX |
3. ଏପରିକି RP__RPD ସବୁPR__PRL ତୀର୍ଥ NN ବଡ଼ JJ ଏବଂCC__CCD ମୁଖ୍ୟ JJ ଅଟନ୍ତି V__VAUX, ପରନ୍ତୁ
CC__CCS ସାତ QT__QTC ସ୍ଥାନଗୁଡ଼ିକର N__NN ଶ୍ରେଷ୍ଠ JJ ମହନୀୟତାN__NN ଓ CC__CCD
ମାନ୍ୟତାN__NN ଅଟେ V__VAUX |
4. ଏହିPR ସାତ QT__QTC ଧର୍ମସ୍ଥଳ NN ସାତ QT__QTC ନଗରଗୁଡ଼ିକ N__NN ର PSP କିଂବା CC__CCD
ସପ୍ତପୁରୀଗୁଡ଼ିକN__NN ର PSP ରୂପ JJ ରେ PSP ଗ୍ରନ୍ଥଗୁଡ଼ିକN__NN ରେ PSP ବର୍ଣିତ N__NNV ହେଇଅଛି
V__VAUX |
5. ଏଭଳି PR କୁହାଯାଇ V__VM ଅଛି V__VAUX କି CC__CCS ଚର୍ତୁମାସ N__NN ରେ PSP ଏହି PR
ସପ୍ତପୁରୀଗୁଡ଼ିକ N__NN ର PSP ଦର୍ଶନ NN ମୋକ୍ଷ NN ପ୍ରଦାନ V__VAUX କରିବାବାଲା NN ହେଇଥାଏ
V__VAUX |
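
Most of the samples in this section attach the tag to each token with a backslash (`token\TAG`) and join the levels of a hierarchical tag with underscores (for example `V_VM_VNF`). A minimal Python sketch of reading that layout follows. The function names `parse_tagged` and `split_tag` are hypothetical, and the sample line is assembled from tokens of the Hindi sentences rather than quoted from one sentence. The Oriya sample uses a space-separated layout and is not covered by this sketch.

```python
def parse_tagged(line: str) -> list[tuple[str, str]]:
    """Split a 'token\\TAG token\\TAG ...' line into (token, tag) pairs."""
    pairs = []
    for item in line.split():
        # Split on the last backslash so the tag is whatever follows it.
        token, separator, tag = item.rpartition("\\")
        if not separator:
            raise ValueError(f"no tag separator in {item!r}")
        pairs.append((token, tag))
    return pairs


def split_tag(tag: str) -> list[str]:
    """Break a hierarchical tag such as 'V_VM_VNF' into its levels."""
    return tag.split("_")


# Illustrative line built from tokens that occur in the Hindi sample.
SAMPLE = r"हर\QT_QTF तीर्थ\N_NN बड़ा\JJ है\V_VM ।\RD_PUNC"

for token, tag in parse_tagged(SAMPLE):
    print(token, split_tag(tag))
```

Raising an error on an untagged item, instead of skipping it, makes annotation gaps visible when a file is checked.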

16. REFERENCE

1. ISO 12620:1999, Terminology and other language and content resources - Specification of data categories and management of a Data Category Registry for language resources

2. XML Schema Requirements: http://www.w3.org/TR/1999/NOTE-xml-schema-req-19990215

3. Best Practices for XML Internationalization: http://www.w3.org/TR/xml-i18n-bp/

4. Internationalization Tag Set (ITS) Version 1.0: http://www.w3.org/TR/2007/REC-its-20070403/

5. ISO 639-3, Language Codes: http://www.sil.org/iso639-3/codes.asp

ANNEXURE-1
LANGUAGE TAGS

S.No. Language Name Language Tag according to ISO 639-3
1 Hindi hin
2 Assamese asm
3 Bangla ben
4 Bodo brx
5 Dogri doi
6 Gujarati guj
7 Kannada kan
8 Kashmiri kas
9 Konkani kok
10 Maithili mai
11 Malayalam mal
12 Manipuri mni
13 Marathi mar
14 Nepali nep
15 Oriya ori
16 Punjabi pan
17 Sanskrit san
18 Santhali sat
19 Sindhi snd
20 Tamil tam
21 Telugu tel
22 Urdu urd
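
The language tags also serve as attribute prefixes in the schema examples (`hin-cat`, `kok-cat`, `guj-cat`). A small Python lookup, given here only as a sketch, shows how the table could be used to build those attribute names. The names `LANGUAGE_CODES` and `label_attribute` are hypothetical. The codes are the ISO 639-3 tags for the 22 languages listed in this annexure.

```python
# ISO 639-3 tags for the languages listed in this annexure.
LANGUAGE_CODES = {
    "Hindi": "hin", "Assamese": "asm", "Bangla": "ben", "Bodo": "brx",
    "Dogri": "doi", "Gujarati": "guj", "Kannada": "kan", "Kashmiri": "kas",
    "Konkani": "kok", "Maithili": "mai", "Malayalam": "mal", "Manipuri": "mni",
    "Marathi": "mar", "Nepali": "nep", "Oriya": "ori", "Punjabi": "pan",
    "Sanskrit": "san", "Santhali": "sat", "Sindhi": "snd", "Tamil": "tam",
    "Telugu": "tel", "Urdu": "urd",
}


def label_attribute(language: str) -> str:
    """Return the label attribute name for a language, e.g. 'kok-cat'."""
    return f"{LANGUAGE_CODES[language]}-cat"


print(label_attribute("Konkani"))
print(label_attribute("Gujarati"))
```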

CONTRIBUTORS
1. Ms. Swaran Lata, Department of Information Technology, New Delhi
2. Prof. Girish Nath Jha, JNU, New Delhi
3. Dr. Somnath Chandra, Department of Information Technology, New Delhi
4. Dipti Misra Sharma, LTRC, IIIT-H
5. Somi Ram, CDAC, NOIDA
6. Prof. Uma Maheswara Rao G, University of Hyderabad
7. Dr. Sobha L, AU-KBC, Chennai
8. Menak. S,
9. Kalika Bali, Microsoft, Bangalore
10. Prof. Pushpak Bhattacharyya, IIT-Bombay
11. Prof. Malhar Kulkarni, IIT-Bombay
12. Lata Popale, IIT-Bombay
13. Kirtida Shah, Gujarat University, Ahmedabad
14. Mona Parakh, LDCIL, Mysore
15. Jyoti Pawar, Goa University
16. Madhavi Sardesai, Goa University
17. Ramnath,
18. Aadil Kak, University of Kashmir
19. Nazima, University of Kashmir
20. Dr. Richa, LDCIL, Mysore
21. Mazhar Mehdi Hussain, JNU, New Delhi
22. Mr. Prashant Verma, W3C India, New Delhi
23. Swati Arora, W3C India, New Delhi
