Grammar Teaching Tools For Tamil Language: July 2010





DOI: 10.1109/T4E.2010.5550056



T4E 2010

Grammar Teaching Tools for Tamil language

Dhanalakshmi Velliangiri, Anand Kumar M., Rekha R U, Soman K P
Computational Engineering and Networking
Amrita Vishwa Vidyapeetham
Coimbatore, India
{v_dhanalakshmi, m_anandkumar, kp_soman}@cb.amrita.edu

Rajendran S
Department of Linguistics
Tamil University
Thanjavur, India
raj_ushush@yahoo.com

Abstract: Grammar plays an important role in good communication. Learning the grammar rules of Tamil is very difficult because the language has a very rich, agglutinative morphological structure. Students get annoyed with the language rules and the old teaching methodology. Computer-assisted grammar teaching tools help students learn faster and better, and NLP applications are used to build such tools to enhance the curriculum. In this paper we present grammar teaching tools for Tamil at the sentence and word levels. A Part-of-Speech Tagger, Chunker and Dependency Parser for sentence-level analysis, and a Morphological Analyzer and Generator for word-level analysis, were developed using machine learning based technology. These tools are very useful for second language learners to understand word and sentence construction in a non-conceptual way. A user interface has been developed for practical use of the tools.

Keywords: Grammar; Teaching; NLP Tool; Machine Learning

I. INTRODUCTION
We are now living in a world of communication, and grammar plays an important role in good communication. But students get annoyed with the language rules and the old teaching methodology. Today, computer-aided teaching technologies are widely used to increase the learning ability of students. Students also benefit from the immediate feedback provided by computers, and most of them appreciate the self-paced learning environment. At its best, computer-aided instruction (CAI) engages student interest, motivates them to learn, and increases their personal responsibility for learning [1]. The field of Natural Language Processing is developing steadily alongside advances in Artificial Intelligence applications. Innovative NLP applications are used to build language teaching tools that enhance the teaching and learning of grammar.
We have developed NLP tools, namely a Part-of-Speech Tagger, Chunker and Dependency Parser for sentence-level analysis and a Morphological Analyzer and Generator for word-level analysis. These tools were developed using machine learning approaches. The key requirement of the machine learning process is linguistically annotated data, and such an annotated corpus is very useful for creating language teaching tools. No such annotated corpus is available for Indian languages, especially for Tamil. We have therefore developed annotated corpora for the POS Tagger, Chunker, Morphological Analyzer and Dependency Parser. Using our machine learning based NLP tools and the annotated corpora, we have developed the tools for grammar teaching.
Our automatic sentence analyzer and generator helps students learn the structural construction and generation of sentences in Tamil. When the user gives an input sentence, the system automatically analyzes the parts of speech, the phrases and the dependency relationships between the words in the sentence. It can also generate a sentence from the user's morpho-lexical input using the morphological generator. This non-conceptual way of study enhances the grammar learning ability of students.

II. TAGGED CORPUS DEVELOPMENT
An annotated corpus plays an important role in language teaching tools. Annotation is needed at the sentence level as well as at the word level. At the sentence level, we have annotated the corpus for parts of speech, phrases, and the dependency relationships between words. At the word level, morpheme annotation was done.
Part-of-speech tagging is the basic step towards building an annotated corpus at the sentence level. To create the Tamil Part-of-Speech Tagger, we grammatically tagged a corpus of 350,000 words. We collected sentences from the Dinamani newspaper, Yahoo Tamil news, Tamil short stories and similar sources, and tagged their POS categories [6]. Chunking the phrases forms the next level of tagging. We chunked the same corpus size of 350,000 words. For the Dependency Parser we manually annotated a corpus of 25,000 words.
In an agglutinative language like Tamil, words are highly inflected with morphemes. For word analysis we segmented the morphemes and tagged their morphological categories [5]. Our corpora contain 700,000 (7 lakh) morphologically tagged words. These grammatically annotated corpora are used for language teaching and learning.

III. GRAMMAR TEACHING TOOLS FOR TAMIL

In this section we discuss the NLP tools that we created for grammar learning.

• POS Tagger
• Chunker
• Dependency Parser
• Morphological Analyzer
• Morphological Generator

978-1-4244-7361-8/10/$26.00 © 2010 IEEE


Figure 1. Sentence analyzer and generator.

These tools are very useful to students for language learning. The major drawback of most language teaching software is that it is not automated: it uses a small set of predefined sentences that cannot be modified or replaced. Here we have automated the system. Using these tools we have developed a Sentence Analyzer and Generator. "Fig. 1" shows the general block diagram of the sentence analyzer and generator.

A. Part-of-Speech Tagger and Chunker
Part-of-speech tagging and chunking are fundamental processing steps for any language processing task. Part-of-speech (POS) tagging is the process of automatically annotating each word in a corpus with its syntactic category [6]. Chunking is the task of identifying and segmenting the text into syntactically correlated word groups. Both are done with machine learning techniques, where the linguistic knowledge is automatically extracted from the annotated corpus. The ability of a tool to automatically POS tag and chunk a sentence is essential for further analysis in many approaches in the field of NLP.
Chunking, the step after POS tagging, identifies the basic structural relations between groups of words in a sentence. It is essential for many NLP tasks such as structure identification, information extraction, parsing and phrase-based machine translation. The Chunker divides a sentence into its major non-overlapping phrases and attaches a label to each chunk. Chunking falls between tagging and parsing. Many machine learning techniques and algorithms are used for this task. Here we use this tool for grammar teaching: it is very useful for teaching students grammatical categories and phrases. "Fig. 2" shows the user interface of the POS Tagger and Chunker tool.

Figure 2. POS Tagger and Chunker

Example: {That Beautiful Girl}
Sentence  : அந்த அழகான பெண்
POS tag   : DET ADJ NN
Chunk tag : [ Noun Phrase ]

B. Dependency parser
Dependency grammar has until recently played a fairly marginal role both in theoretical linguistics and in natural language processing. It has largely developed as a form of syntactic representation used by traditional grammarians. The fundamental notion of dependency is based on the idea that the syntactic structure of a sentence consists of binary asymmetrical relations between the words of the sentence [3]. A dependency relation holds between a head and a dependent. Parsing, in the natural language context, is the process of recognizing the structure of an input sentence and assigning a syntactic structure to it. An important requirement of a parser is that it should be able to resolve ambiguities in various structures. "Fig. 3" shows the interface of the Dependency Parser. The input is a sentence, and the output is a dependency tree and the dependency relations.

Figure 3. Dependency Parser.
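The head and dependent relations that a dependency parser outputs can be written down as a list of arcs. The following is a minimal sketch, assuming a hypothetical `Arc` record and made-up relation labels (`det`, `mod`); it is not the output format of the authors' parser. It uses the noun phrase "that beautiful girl", taking the noun as the head of both the determiner and the adjective, and checks that every word has at most one head and that exactly one word is left as the root. It does not check for cycles.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Arc:
    head: int       # index of the head word
    dependent: int  # index of the dependent word
    label: str      # relation name (hypothetical label set)


words = ["அந்த", "அழகான", "பெண்"]          # that, beautiful, girl
arcs = [Arc(2, 0, "det"), Arc(2, 1, "mod")]  # both words depend on the noun


def find_root(n_words, arcs):
    """Return the index of the single word that has no head."""
    dependents = [a.dependent for a in arcs]
    if len(dependents) != len(set(dependents)):
        raise ValueError("a word has more than one head")
    roots = [i for i in range(n_words) if i not in dependents]
    if len(roots) != 1:
        raise ValueError("expected exactly one root")
    return roots[0]


def dependents_of(head, arcs):
    """List (dependent index, label) pairs attached to a head."""
    return [(a.dependent, a.label) for a in arcs if a.head == head]


root = find_root(len(words), arcs)
print(words[root], dependents_of(root, arcs))
```

Each arc is directed from head to dependent, which is the asymmetry the definition refers to: swapping the two indices would describe a different structure.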

Dependency parsing is also used in machine translation, grammar checking, question answering and information extraction systems. The dependency parsing approach used here relies on linguistic information to give the relationships between words. The machine learning approach is motivated by the fact that rules are learned automatically from annotated data, using learning and parsing algorithms to build models and make predictions. The motive behind this work is to find the relations between words, to move on to the next stage of word sense disambiguation, and to find predicate-argument structure. The Dependency Parser for Tamil is developed using a machine learning based parser. We have created an annotated corpus of 25,000 words for the Dependency Parser tool. This tool is very useful for students to understand the structure of a sentence and the dependency relationships between words.

C. Morphological analyzer
Morphological analysis is concerned with retrieving the structure, syntactic rules, morphological properties and meaning of a morphologically complex word. It is the process of segmenting words into morphemes and analyzing the word formation. It is a primary step for various types of text analysis in any language. It is also used in speech synthesis, speech recognition, lemmatization, noun decompounding, spell and grammar checking, and machine translation.
Our novel approach to morphological analysis is based on sequence labeling and training by kernel methods. It captures the non-linear relationships and various morphological features of natural language in a better and simpler way [5]. Our morphological analyzer and the morphologically tagged words are used to increase the vocabulary knowledge of students. From this tool, students learn about the morpheme, the smallest meaningful unit of grammar, and its morphotactic construction. In addition, students come to understand the difference between a root word and its suffixes. This is very important in the academic word learning process [2]. "Fig. 4" shows the user interface of the morphological analyzer.

Example: {studied}
படித்தான் = படி <Verb-Root>
            த்த் <Past-Tense>
            ஆன் <3SM>

D. Morphological generator
The Morphological Generator takes a lemma and a morpho-lexical description as input and gives a word form as output. It is the reverse process of the Morphological Analyzer. The morphological generator implemented here follows a new data-driven approach that is simple and efficient and does not require any rules or a morpheme dictionary.
Two options are available in the Morphological Generator tool. With the first option, the user selects morpho-lexical information along with a root word, and the tool produces the intended word form corresponding to the user's input. The second option generates all the word forms of a particular word: if the user gives the root word or lemma as input, the tool automatically generates all the possible word forms (1500 for verbs and 320 for nouns). This tool is also used in the Sentence Generator, so it can be very useful for teaching languages. Using this tool alone, students can understand the different word forms and morphological transformations of words in Tamil. The tool is also very helpful for improving vocabulary. The user interface of the Morphological Generator is shown in "Fig. 5".

Example: {eyes}
கண் + <PL-Marker> = கண்கள்

Figure 5. Morphological Generator
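To make the generator's input and output concrete, the sketch below joins a lemma with the suffixes named in a morpho-lexical description. It is a toy, rule-based illustration only: the system described in this paper is data driven and needs no rules or morpheme dictionary, so the `SUFFIXES` table, the `join` helper and its single spelling rule (a consonant carrying the pulli mark merges with a following vowel) are assumptions made for this example. The suffix shapes are taken from the two examples in the text.

```python
PULLI = "\u0bcd"  # Tamil virama: marks a consonant that carries no vowel

# Independent vowel -> dependent vowel sign (only a small subset is listed)
VOWEL_SIGN = {
    "\u0b86": "\u0bbe",  # ஆ -> ா
    "\u0b87": "\u0bbf",  # இ -> ி
    "\u0b89": "\u0bc1",  # உ -> ு
}

# Hypothetical suffix table built from the two examples in the text
SUFFIXES = {"PL-Marker": "கள்", "Past-Tense": "த்த்", "3SM": "ஆன்"}


def join(stem, suffix):
    """Attach a suffix, merging 'consonant + pulli' with a leading vowel."""
    if stem.endswith(PULLI) and suffix[:1] in VOWEL_SIGN:
        return stem[:-1] + VOWEL_SIGN[suffix[0]] + suffix[1:]
    return stem + suffix


def generate(lemma, features):
    """Build a word form from a lemma and an ordered list of feature tags."""
    word = lemma
    for tag in features:
        word = join(word, SUFFIXES[tag])
    return word


print(generate("கண்", ["PL-Marker"]))          # eyes
print(generate("படி", ["Past-Tense", "3SM"]))  # (he) studied
```

Even this two-word toy needs a spelling rule at the morpheme boundary, which gives a sense of why hand-written rule sets grow quickly for an agglutinative language and why a data-driven generator is attractive.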

Figure 4. Morphological analyzer

IV. ADVANTAGES OF NLP TOOLS

These NLP tools are also used in developing the following systems:
• Spell checker
• Information extraction and retrieval
• Simple machine translation system
• Grammar checker
• Content analysis
• Question answering system
• Automatic sentence analyzer/generator
• Dialogue system
• Knowledge representation in learning
• Automatic (random) question generation
• Automatic assessment tool
• Language teaching
• Language-based educational exercises

They help students to:
• Improve vocabulary
• Improve writing and reading skills
• Visualize the grammatical structure of a sentence
• Develop grammar knowledge
• Speed up the learning of a second language

V. CONCLUSION AND FUTURE WORK
In this paper we have described the development of NLP tools and annotated corpora which in turn aid the teaching and learning of grammar for the Tamil language. Machine learning techniques are playing an important role in the fast growth of NLP. Building on these advances, linguistic tools were developed with the grammatically annotated corpora. The NLP tools and the annotated corpora are interlinked to produce an effective tool for teaching and learning Tamil grammar at the word and sentence levels. We have also developed these tools for the Malayalam, Kannada and Telugu languages. In future, using these tools, we plan to develop a simple machine translation system from English to Dravidian languages and educational tools that let students learn languages with less effort.

ACKNOWLEDGMENT
This work was part of the project "Creation of Machine Translation Tools and resources for English to Dravidian Languages" funded by MHRD, Government of India. We would like to thank MHRD for supporting the successful completion of this work.

REFERENCES
[1] David Collins, Alan Deck, Myra McCrickard, "Computer Aided Instruction: A Study Of Student Evaluations And Academic Performance," Journal of College Teaching & Learning, Volume 5, Number 11, November 2008.
[2] Bellomo, T., "Morphological analysis and vocabulary development: Critical criteria," Reading Matrix, 9(1), pp. 44-55, April 2009. http://www.readingmatrix.com/articles/bellomo/article.pdf
[3] Joakim Nivre, "Dependency Grammar and Dependency Parsing," http://stp.lingfil.uu.se/~nivre/docs/05133.pdf
[4] Dhanalakshmi V., Padmavathy P., Anand Kumar M., Soman K.P., Rajendran S., "Chunker for Tamil," 2009 International Conference on Advances in Recent Technologies in Communication and Computing (ARTCom), pp. 436-438, 2009, IEEE Press, doi: 10.1109/ARTCom.2009.191
[5] Dhanalakshmi V., Anand Kumar M., Rekha R.U., Arun Kumar C., Soman K.P., Rajendran S., "Morphological Analyzer for Agglutinative Languages Using Machine Learning Approaches," 2009 International Conference on Advances in Recent Technologies in Communication and Computing (ARTCom), pp. 433-435, 2009, IEEE Press, doi: 10.1109/ARTCom.2009.184
[6] Dhanalakshmi V., Anandkumar M., Vijaya M.S., Loganathan R., Soman K.P., Rajendran S., "Tamil Part-of-Speech tagger based on SVMTool," in Proceedings of the COLIPS International Conference on Natural Language Processing (IALP), Chiang Mai, Thailand, 2008.
