Myanmar Lexicon: Thin Zar Phyo, Wunna Ko Ko


Myanmar Lexicon

Thin Zar Phyo, Wunna Ko Ko

May 08, 2008 JCSSE 08 1


Contents

● Introduction
● What is a Lexicon?
● Myanmar Lexicon
● Lexique Pro
● Myanmar Lexicon with Lexique Pro
● Conclusion



Introduction


● Myanmar is a Southeast Asian country.
● Burmese is the official language.
● The Burmese script is derived from the Brahmi script.
● More than 100 languages are used in Myanmar.
● The languages of Myanmar include Burmese, Karen, Mon, Shan and
other ethnic languages.



Lexicon

● A lexicon is a useful language resource for both localization
experts and computational linguists.
● It includes information about:
– the form and meaning of words and phrases
– lexical categorization
– the appropriate usage of words and phrases
– relationships between words and phrases, and between
categories of words and phrases
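
The kinds of information listed above could be held in a record like the one sketched below. This is only an illustration: the class name `LexiconEntry`, its field names, and the English sample data are assumptions made for this example, not the structure of the Myanmar Lexicon itself.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class LexiconEntry:
    """One headword with the kinds of information a lexicon records.

    Hypothetical layout, chosen for illustration only.
    """
    headword: str    # form of the word or phrase
    meaning: str     # gloss or definition
    category: str    # lexical category, e.g. part of speech
    usage: str = ""  # note on appropriate usage
    # relation name -> related headwords, e.g. "synonym" -> ["home"]
    related: Dict[str, List[str]] = field(default_factory=dict)


# Illustrative English sample data, not taken from the Myanmar Lexicon.
entry = LexiconEntry(
    headword="house",
    meaning="a building for people to live in",
    category="noun",
    usage="neutral register",
    related={"synonym": ["home"], "hypernym": ["building"]},
)

print(entry.headword, entry.category, entry.related["synonym"])
```

Keeping relationships in a mapping from relation name to headwords lets new relation types be added without changing the record layout.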



Myanmar Lexicon
● The participating organizations include:
– Ministry of Education
– Myanmar Unicode & NLP Research Center
– Myanmar Language Commission
● The lexicon includes over 35,000 headwords. This count does
not include word groupings.
● Field markers are chosen based on the Multi-Dictionary
Formatter (MDF). The lexicon consists of a set of word meanings
and their semantic relationships.
● The Myanmar Language Commission supplies the necessary
information and the collected word list.

Myanmar Lexicon (in paper format)



Lexique Pro
● It is designed for building dictionaries and lexicons, especially
for complex scripts.
● It is open-source software that runs on Windows.
● It is a product of SIL, formerly known as the Summer Institute
of Linguistics.
● The primary functions of Lexique Pro are to:
– Create a dictionary,
– View and edit an existing Shoebox/Toolbox dictionary database,
– Share a database with other computer users,
– Export a dictionary as a text document for printing, or in HTML
format for web publication.



Lexique Pro (Contd.)

● It can export the lexicon to the following file formats:
– Rich Text Format (.rtf),
– Microsoft Word 2007, Office Open XML (.docx),
– OpenOffice.org (.odt or .sxw).
● The exported dictionary takes one of three layouts:
– Alphabetical dictionary
– Classified dictionary (by category)
– Index table based on a gloss language.
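
The three layouts can be pictured as three views over the same entries. The sketch below is an assumption-laden illustration, not Lexique Pro's implementation: the function names, the tuple layout, and the English sample entries are all hypothetical.

```python
from collections import defaultdict

# Hypothetical sample entries: (headword, category, gloss).
ENTRIES = [
    ("run", "verb", "move fast"),
    ("house", "noun", "home"),
    ("apple", "noun", "fruit"),
]


def alphabetical(entries):
    """Alphabetical dictionary: entries ordered by headword."""
    # Plain sorted() compares code points; Myanmar-script data would
    # likely need a proper collation instead.
    return sorted(entries, key=lambda e: e[0])


def classified(entries):
    """Classified dictionary: headwords grouped by category."""
    groups = defaultdict(list)
    for headword, category, _gloss in sorted(entries):
        groups[category].append(headword)
    return dict(groups)


def gloss_index(entries):
    """Index table: (gloss, headword) pairs ordered by gloss."""
    return sorted((gloss, headword) for headword, _category, gloss in entries)


print(alphabetical(ENTRIES))
print(classified(ENTRIES))
print(gloss_index(ENTRIES))
```

All three views read the same entry list, so the lexicon data only has to be maintained in one place.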



Field markers of
Myanmar Lexicon
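
MDF-style databases store each entry as lines that begin with a backslash field marker. The sketch below shows one way such text could be read into records. It assumes the common MDF markers `\lx` (lexeme), `\ps` (part of speech), `\ge` (English gloss) and `\de` (English definition); the marker set actually used in the Myanmar Lexicon is not reproduced here, and the function name `parse_mdf` and the English sample data are hypothetical.

```python
def parse_mdf(text):
    """Split backslash-coded text into records, one per \\lx marker.

    Each record maps a marker name to the list of values found for it.
    """
    records = []
    current = None
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("\\"):
            continue  # this sketch ignores blank and continuation lines
        marker, _, value = line[1:].partition(" ")
        if marker == "lx":  # a new lexeme starts a new record
            current = {}
            records.append(current)
        if current is not None:
            current.setdefault(marker, []).append(value.strip())
    return records


# Illustrative sample data, not taken from the Myanmar Lexicon.
SAMPLE = r"""
\lx house
\ps n
\ge home
\de a building for people to live in

\lx run
\ps v
\ge move fast
"""

records = parse_mdf(SAMPLE)
for record in records:
    print(record["lx"][0], record["ps"][0], record["ge"][0])
```

Values are kept in lists because a marker can occur more than once within a single entry.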



Myanmar Lexicon with Lexique Pro



Categories tab in Myanmar Lexicon



Myanmar Lexicon in Microsoft Word 2007 format



Myanmar Lexicon in HTML format



Conclusion

● The developed lexicon is available on CD and is to be made
available online in the near future.
● It can support the development of statistical machine
translation, text-to-speech (TTS) and automatic speech
recognition (ASR) systems.
● It is a general-purpose lexicon, but with a full set of field
markers and part-of-speech (POS) information.
● For specialized applications, it may not be usable as is.



Thank you for your attention.

Please send your questions and comments to:
Thin Zar Phyo (myanmar.nlp5@gmail.com)

