
DRAVIDIAN WORDNET

S.Arulmozi
Dravidian University
29 April 2013
Tamil Thesaurus
Preliminary work on lexical semantics.
Monumental work on Tamil Thesaurus.
Ontological classification of Tamil vocabulary
Rajendran, S. (2001). tamizhc coRkaLanjciyam (in Tamil). Tamil University Publication.
Domains in Tamil Thesaurus
Tamil vocabulary is classified into four major domains:
Entities
Abstracts
Events
Relationals

Lexical Hierarchy of the Domain 'Construction'
parumaippeyarkaL 'concrete nouns'
  aHRinaippeyarkaL 'irrational nouns'
    uyirillaatavai 'non-living beings'
      uruvaakkiya maRRum patananjceyta poruTkaL 'manufactured and processed items'
        kaTTappaTTavai 'constructed'
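
The chain above can be held as a simple child-to-parent map, from which the full path of any node is recovered by walking upward. This is only a minimal sketch: the dictionary layout and the function name `path_to_root` are illustrative assumptions, not the structure actually used in the Tamil Thesaurus.

```python
# Child -> parent links for the 'Construction' chain shown above.
# The data layout is an assumption made for illustration only.
PARENT = {
    "kaTTappaTTavai": "uruvaakkiya maRRum patananjceyta poruTkaL",
    "uruvaakkiya maRRum patananjceyta poruTkaL": "uyirillaatavai",
    "uyirillaatavai": "aHRinaippeyarkaL",
    "aHRinaippeyarkaL": "parumaippeyarkaL",
}

# English glosses as given on the slide.
GLOSS = {
    "parumaippeyarkaL": "concrete nouns",
    "aHRinaippeyarkaL": "irrational nouns",
    "uyirillaatavai": "non-living beings",
    "uruvaakkiya maRRum patananjceyta poruTkaL": "manufactured and processed items",
    "kaTTappaTTavai": "constructed",
}


def path_to_root(node):
    """Return the nodes from `node` up to the top of the hierarchy."""
    path = [node]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path


if __name__ == "__main__":
    # Print the chain from the top domain down, one indent level per step.
    for depth, name in enumerate(reversed(path_to_root("kaTTappaTTavai"))):
        print("  " * depth + f"{name} '{GLOSS[name]}'")
```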
Nouns
Relation              Example
Synonymy              viiTu 'house' - illam 'house'
Hypernymy-Hyponymy    paLLi 'school' - kalviccaalai 'educational institution'
Hyponymy-Hypernymy    kalluuri 'college' - aracukkalluuri 'govt college'
Holonymy-Meronymy     ndaaRkaali 'chair' - kaal 'leg'
Meronymy-Holonymy     cakkaram 'wheel' - vaNTi 'cart'
Related Verb          paTittal 'reading' - paTi 'read'
Coordinate terms      kooyil 'temple' - macuuti 'mosque'
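
One way to picture how such relations might be coded is as directed, typed links between words, with the inverse link derived automatically. The sketch below uses the noun examples from the table; the relation names (`is_a`, `part_of`, `has_kind`, `has_part` and so on) and the functions `build_index` and `related` are hypothetical labels chosen for illustration, not the project's internal codes.

```python
from collections import defaultdict

# (source, relation, target) triples built from the noun examples above.
# Relation names are illustrative, not the codes used in Tamil WordNet.
TRIPLES = [
    ("viiTu", "synonym", "illam"),
    ("paLLi", "is_a", "kalviccaalai"),        # a school is an educational institution
    ("aracukkalluuri", "is_a", "kalluuri"),   # a govt college is a college
    ("kaal", "part_of", "ndaaRkaali"),        # a leg is part of a chair
    ("cakkaram", "part_of", "vaNTi"),         # a wheel is part of a cart
    ("paTittal", "related_verb", "paTi"),
    ("kooyil", "coordinate", "macuuti"),
]

# Each relation and the name of the link stored in the opposite direction.
INVERSE = {
    "synonym": "synonym",
    "coordinate": "coordinate",
    "is_a": "has_kind",
    "part_of": "has_part",
    "related_verb": "related_noun",
}


def build_index(triples):
    """Index every triple in both directions: word -> relation -> set of words."""
    index = defaultdict(lambda: defaultdict(set))
    for source, relation, target in triples:
        index[source][relation].add(target)
        index[target][INVERSE[relation]].add(source)
    return index


def related(index, word, relation):
    """Return the sorted words linked to `word` by `relation` (empty if none)."""
    return sorted(index.get(word, {}).get(relation, set()))


if __name__ == "__main__":
    idx = build_index(TRIPLES)
    print(related(idx, "ndaaRkaali", "has_part"))
    print(related(idx, "kalluuri", "has_kind"))
```

Storing both directions at build time keeps lookups symmetric, so a query from the whole to its parts needs no separate table from a query from the part to its whole.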
Verbs
Relation        Example
Synonym         paTi 'read' - payilu 'read'
Hypernymy       cuvai 'taste' - uNar
Troponymy       keeL 'ask' - kenjcu 'plead'
Nominal         paruku 'drink' - parukutal 'drinking'
Related Noun    kaNTupiTi 'discover' - kaNTupiTippu 'discovery'
Tamil WordNet
Objective: To build a WordNet for Tamil to
enhance machine translation
Resources: Tamil Thesaurus, Technical
Glossaries (Tamil University Publications),
Princeton English WordNet
Funding Agency: Tamil Software Development
Fund, Tamil Virtual University - 4 lacs
Time Frame: 18 months

Details
Software used
Front-end - Java
Back-end - MySQL Database
Project Deliverables
50k root words
Relationships coded
Stand-alone and web-based interface
Embedded morphological analyser
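
The slides do not describe the database layout, so the following is only a guess at what a minimal relational back-end for root words and coded relationships could look like. Every table and column name here is hypothetical, and Python's built-in `sqlite3` module is used in place of MySQL purely so that the sketch runs on its own.

```python
import sqlite3

# Hypothetical two-table layout: one row per root word, one row per coded link.
SCHEMA = """
CREATE TABLE word (
    id    INTEGER PRIMARY KEY,
    lemma TEXT NOT NULL,
    pos   TEXT NOT NULL CHECK (pos IN ('noun', 'verb', 'adjective', 'adverb'))
);
CREATE TABLE relation (
    source_id INTEGER NOT NULL REFERENCES word(id),
    target_id INTEGER NOT NULL REFERENCES word(id),
    kind      TEXT NOT NULL
);
"""


def open_demo_db():
    """Create an in-memory database holding one verb pair from the slides."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO word (id, lemma, pos) VALUES (?, ?, ?)",
        [(1, "keeL", "verb"), (2, "kenjcu", "verb")],
    )
    # kenjcu 'plead' is recorded as a troponym (manner) of keeL 'ask'.
    conn.execute("INSERT INTO relation VALUES (?, ?, ?)", (2, 1, "troponym_of"))
    conn.commit()
    return conn


def lookup(conn, lemma, kind):
    """Return the lemmas that `lemma` points to through relation `kind`."""
    rows = conn.execute(
        """
        SELECT t.lemma
        FROM relation AS r
        JOIN word AS s ON s.id = r.source_id
        JOIN word AS t ON t.id = r.target_id
        WHERE s.lemma = ? AND r.kind = ?
        ORDER BY t.lemma
        """,
        (lemma, kind),
    ).fetchall()
    return [row[0] for row in rows]


if __name__ == "__main__":
    db = open_demo_db()
    print(lookup(db, "kenjcu", "troponym_of"))
```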
Statistics
Total Words: 50497
Unique Senses: 41013

Nouns: 46710
Verbs: 2881
Adjectives: 416
Adverbs: 490
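
As a quick consistency check on the figures above, the four category counts can be summed and compared with the stated total, and each category's share computed. The variable and function names below are illustrative only.

```python
# Figures copied from the statistics above.
COUNTS = {"Nouns": 46710, "Verbs": 2881, "Adjectives": 416, "Adverbs": 490}
TOTAL_WORDS = 50497
UNIQUE_SENSES = 41013


def shares(counts):
    """Return each category's percentage of the summed counts."""
    total = sum(counts.values())
    return {name: 100.0 * value / total for name, value in counts.items()}


if __name__ == "__main__":
    # The per-category counts should add up to the reported total.
    assert sum(COUNTS.values()) == TOTAL_WORDS
    for name, pct in shares(COUNTS).items():
        print(f"{name:<11}{COUNTS[name]:>6}  {pct:5.1f}%")
    print(f"Words per unique sense: {TOTAL_WORDS / UNIQUE_SENSES:.2f}")
```

The counts do add up to 50497, and nouns account for roughly 92.5% of the entries.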

Total Words and Unique Senses (Tokens) by Nouns, Verbs, Adjectives, Adverbs (Chart)
Project Completed (2004)
http://www.nrcfosshelpline.in/code/wiki/TamilWordnet
Standalone version Tamil WordNet (Snapshot)
Web-version Tamil WordNet (Snapshot)

First Effort on Dravidian Languages
National Workshop on WordNet for Dravidian Languages
2-3 June 2003
Organized by AU-KBC Research Centre, Chennai; Central Institute of Indian Languages, Mysore; and Tamil University.
Hands-on experience on specified domain construction
Report available on the Global WordNet website
MHRD Project
Creation of Machine Translation tools and resources for English to Dravidian Languages: Pilot Study
The aim is to develop a Machine Translation (MT) system and the linguistic resources needed for the English-Dravidian language pairs (Tamil, Malayalam, Telugu and Kannada). This would facilitate the creation of rich educational content in Indian languages.
All the tools and the translation system are to be based on machine learning methodologies, so that computer science graduates and other non-linguists can immediately participate in the national mission on literacy by contributing additional tools for language translation.


Modules
Module 1: Machine Translation
Aims at developing teaching material corresponding to the tools developed, so that it can be delivered as part of the undergraduate computer science and engineering curriculum on data mining/machine learning.
This will ensure the critical mass of manpower required to sustain the translation effort needed for the national mission on education.
Module 2: Training
Aims at training 500 faculty members, selected from across the country, in machine translation methodologies using machine learning techniques.
Module 3: Dravidian WordNet
Aims at developing the Dravidian WordNet required for translation.
Total Budget
IIT Bombay                 15 lacs
Amrita University          40 lacs
Tamil University           15 lacs
University of Hyderabad    15 lacs
Dravidian University       15 lacs
Time Frame
12 months (March 30, 2009 to March 29, 2010)


Work done
Part of a one-year pilot project involving Tamil, Telugu, Malayalam and Kannada
Funding Agency: Ministry of HRD
Duration: 18 months (July 2009-Dec 2010)
Deliverable: 13k synsets
7k synsets linked to IndoWordNet, available at http://www.cfilt.iitb.ac.in/wordnet/webhwn/wn.php
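
Linking synsets across WordNets can be pictured as attaching a shared concept identifier to each synset and matching on it. The sketch below is an assumption about the general idea, not a description of how IndoWordNet stores its links: the `Synset` class, the `link_id` field, the ID value 101 and the English stand-in for the pivot side are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Synset:
    words: List[str]
    gloss: str
    link_id: Optional[int] = None  # hypothetical shared concept ID; None = not yet linked


# Tamil entries use words from the slides; the IDs are made up for illustration.
TAMIL = [
    Synset(["viiTu", "illam"], "house", link_id=101),
    Synset(["kooyil"], "temple"),  # left unlinked on purpose
]

# Stand-in for the WordNet being linked to (English glosses used as placeholders).
PIVOT = [
    Synset(["house"], "house", link_id=101),
]


def linked(synsets):
    """Return only the synsets that carry a shared concept ID."""
    return [s for s in synsets if s.link_id is not None]


def align(left, right):
    """Pair synsets from two WordNets that share the same concept ID."""
    by_id = {s.link_id: s for s in linked(right)}
    return [(s, by_id[s.link_id]) for s in linked(left) if s.link_id in by_id]


if __name__ == "__main__":
    for tamil_synset, pivot_synset in align(TAMIL, PIVOT):
        print(tamil_synset.words, "<->", pivot_synset.words)
    print(f"{len(linked(TAMIL))} of {len(TAMIL)} Tamil synsets linked")
```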



Statistics on Dravidian WordNet
Publications
'Tamil WordNet', Proceedings of the Fifth Global WordNet Conference, IIT-Bombay, 31 Jan-4 Feb 2010 (S.Rajendran)
'Building a WordNet for Dravidian Languages', Proceedings of the Fifth Global WordNet Conference, IIT-Bombay, 31 Jan-4 Feb 2010 (S.Rajendran, S.Gopakumar, V.Dhanalakshmi)
'Representation of Kinship in WordNet', Proceedings of the 9th International Tamil Internet Conference, Coimbatore, 23-27 June 2010 (S.Arulmozi)
'Polysemy in Tamil and other Indian Languages', Proceedings of the Fifth Global WordNet Conference, IIT-Bombay, 31 Jan-4 Feb 2010 (S.Arulmozi & Panchanan Mohanty)
'Telugu WordNet', Proceedings of the Fifth Global WordNet Conference, IIT-Bombay, 31 Jan-4 Feb 2010 (S.Arulmozi)


First IndoWordNet Workshop
Amrita University
11-14 June 2009
The necessity of developing linked WordNets for the different languages of India was stressed.
Challenges such as language divergence, lexical semantics, and embedding WordNet in MT and cross-lingual search applications can be overcome.
Participating groups: Hindi, Marathi, Sanskrit, Nepali, Assamese, Bodo, Manipuri, Konkani, Kashmiri, Tamil, Telugu, Malayalam, Kannada
Proposal on Indhradhanush
Dravidian WordNet
Present Project
Funded by DIT.
Links
Tamil WordNet Open Source
http://www.nrcfosshelpline.in/code/wiki/TamilWordnet
VerbNet (English)
http://verbs.colorado.edu/~mpalmer/projects/verbnet.html
Princeton English WordNet
http://wordnet.princeton.edu/
Global WordNet Association
http://www.globalwordnet.org/
WordNets in the World
http://www.globalwordnet.org/gwa/wordnet_table.htm
WordNet Bibliography
http://lit.csci.unt.edu/~wordnet/
IndoWordNet
http://www.cfilt.iitb.ac.in/wordnet/webhwn/wn.php


Thank you!