GA-Based Machine Translation System For Sanskrit To Hindi Language
1 Introduction
A machine translation system (MTS) helps people interact with others from different
cultures without language barriers, so that they can share ideas and information and
get to know each other easily. Machine translation is a common application of Natural
Language Processing (NLP) that translates text from one language into another. India
is culturally rich, and its states have their own languages, so the main aim of an MTS
is to translate between these languages without human assistance. Sanskrit is the
primary sacred language of India [1] and an official language of one state,
Uttarakhand, whereas Hindi is one of the primary languages of India and is used
throughout the country. There is therefore a need for a translation system for these
two important languages.
M. Singh (B)
Research Labs, CSED, Thapar University, Patiala, Punjab, India
e-mail: muskaan_singh@thapar.edu
R. Kumar · I. Chana
CSED, Thapar University, Patiala, Punjab, India
The two important uses of an MTS are assimilation and dissemination. Assimilation
means that a reader can understand the general meaning of a text, even though the
translation might not be grammatically correct. Dissemination means that the
translated text must be grammatically correct so that it can be treated as publishable
content. Because languages differ in style and structure, current translation systems
face many problems. Some of these problems are:
a. Meaning of the Word: A word can have several meanings, and the same word may take
a different meaning when it is translated from one language into another. Selecting
the right word for the intended meaning is difficult, and it is a very important
part of any translation system.
b. Order of the Word: Word order is also an important factor in translation. Some
languages follow the order Subject (S), Verb (V), Object (O), while others follow
a different order. Sanskrit can be written using SVO, SOV, and VOS order.
c. Idioms: An idiom is a group of words established by usage as having a meaning
not deducible from those of the individual words. A word-by-word translation of
an idiomatic expression therefore does not convey the original meaning.
2 Related Work
A number of MTSs have been developed. Some of them are described below.
SYSTRAN, founded in 1968 by Dr. Peter Toma, is one of the oldest machine translation
companies. SYSTRAN has done extensive work for the US Department of Defense and the
European Commission. SYSTRAN has provided the technology for Yahoo! Babel Fish, among
others. It was used by Google's language tools until 2007. It is a rule-based MTS and
handles 35 different languages.
ETSTS [2–4] combines rule-based and example-based approaches to MT. Using a speech
synthesizer as a module, it converts the target sentence to speech output. The
framework is modularized into text input, grammar and spell check, token generator,
translator, parser generator module, RBMT/EBMT engine with its bilingual database,
text output, and a waveform generator.
Google Translate is a free multilingual machine translation service developed by
Google to translate text from one language into another. It is based on a statistical
approach. It offers a website interface, mobile apps for Android and iOS, and an
API that helps developers build browser extensions and software applications.
Google Translate supports more than 100 languages at various levels.
Khaled Shaalan [5] presented a rule-based approach to machine translation for
English-to-Arabic natural language processing, together with rule-based tools for
Arabic natural language. The work provides morphological analyzers and generators
and syntactic analyzers and generators.
Sandeep Warhade [6] presented the design of a phrase-based decoder for
English-to-Sanskrit translation. The work describes a phrase-based statistical
machine translation decoder with English as the source and Sanskrit as the target
language. The authors aim to improve translation quality by improving the translation
table and by preprocessing the source-language text. They discuss the major design
objective for the decoder and its performance relative to other SMT decoders.
ESSS [7] uses a rule-based translation mechanism. After English speech is converted
to text, the resulting sentences are first decomposed into words. The words are
matched against a database, which also supplies part-of-speech (POS) information to
classify each word as a noun, verb, adjective, and so on. Grammar rules are then
applied, and Sanskrit text is generated by rearranging the sentence to produce the
target-language text. This text is given as input to a waveform generator, where it
is converted to Sanskrit speech.
Microsoft provides a machine translation system based on a statistical approach,
named Bing Translator [8]. E-Trans [6] is a rule-based machine translation tool.
It is based on the formulation of synchronous context-free grammar (SCFG), a subset
of context-free grammar (CFG). The syntactic structure of the language is represented
using SCFG. The processing engine developed works in two stages: top-down and
bottom-up analysis.
MTSs fall into the categories shown in Fig. 1: direct, rule based, corpus based,
and knowledge based.
In direct MT [9], there is no intermediate representation. Translation proceeds
word by word with the help of a bilingual dictionary, followed by some syntactic
rearrangement. This method of translation is feasible only for a single language
pair. It requires little analysis of the text and no parsing. Analysis steps such as
morphological analysis, preposition handling, syntactic arrangement, and
morphological generation can be performed.
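As a minimal sketch of the direct approach described above, the following Python
snippet looks up each source token in a bilingual dictionary and then applies a single
reordering step. The three-entry lexicon, the role tags, and the function name
`direct_translate` are illustrative assumptions and do not come from any of the
systems cited here.

```python
# Toy Sanskrit-to-Hindi lexicon (hypothetical, for illustration only).
# Each entry maps a surface form to (Hindi gloss, grammatical role).
LEXICON = {
    "रामः": ("राम", "S"),
    "वनम्": ("वन", "O"),
    "गच्छति": ("जाता है", "V"),
}

# Hindi is predominantly subject-object-verb, so roles are sorted in that order.
TARGET_ORDER = {"S": 0, "O": 1, "V": 2}


def direct_translate(sentence: str) -> str:
    """Word-by-word lookup followed by a simple SOV rearrangement."""
    glossed = []
    for token in sentence.split():
        # Unknown words are passed through unchanged and placed last.
        gloss, role = LEXICON.get(token, (token, "?"))
        glossed.append((TARGET_ORDER.get(role, len(TARGET_ORDER)), gloss))
    # sorted() is stable, so tokens with the same role keep their source order.
    return " ".join(gloss for _, gloss in sorted(glossed, key=lambda p: p[0]))


print(direct_translate("गच्छति रामः वनम्"))  # राम वन जाता है
```

Because the lexicon and the reordering rule are tied to one language pair, the sketch
also illustrates why the direct method does not carry over to other pairs.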
In a rule-based MTS [9], an intermediate representation such as a parse tree may be
created. The approach relies on rules for morphology, syntax, lexical selection and
transfer, semantic analysis, and generation, and is therefore known as rule based.
Rule-based MT is of two types:
(1) Transfer-based
(2) Interlingua
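In the transfer-based approach, the source language (SL) is translated into the target
language (TL) without a language-independent intermediate representation, whereas in
the Interlingua approach an intermediate code representation is created, through which
the SL is translated into the TL by means of interlanguage codes.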
[Fig. 1: rule based (transfer, Interlingua); corpus based (example, statistical); knowledge based]
Fig. 2 MTS (stages shown: input text, tokenization, morphological analysis, parsing,
generator module, reformatting)
The proposed system translates Sanskrit into Hindi. It uses a genetic algorithm
(GA)-based generator, which improves the mapping process. The proposed work has two
main phases: (i) the initial phase and (ii) the generator phase.
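The text above does not specify the chromosome encoding, fitness function, or genetic
operators of the generator, so the following Python snippet is only a generic sketch of
how a GA could search for a mapping. It assumes that each source token has a few
candidate target words with numeric scores; the score table, the function names
`fitness` and `evolve`, and all parameter values are hypothetical.

```python
import random


def fitness(chromosome, scores):
    """Sum of the scores of the candidate chosen at each token position."""
    return sum(scores[i][gene] for i, gene in enumerate(chromosome))


def evolve(scores, pop_size=20, generations=30, mutation_rate=0.1, seed=0):
    """Search for the per-token candidate choice that maximizes fitness."""
    rng = random.Random(seed)
    n = len(scores)

    def random_chromosome():
        return [rng.randrange(len(scores[i])) for i in range(n)]

    population = [random_chromosome() for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=lambda c: fitness(c, scores), reverse=True)
        next_gen = population[:2]  # elitism: carry over the two best
        while len(next_gen) < pop_size:
            # Tournament selection: best of three randomly drawn individuals.
            p1 = max(rng.sample(population, 3), key=lambda c: fitness(c, scores))
            p2 = max(rng.sample(population, 3), key=lambda c: fitness(c, scores))
            cut = rng.randrange(1, n) if n > 1 else 0
            child = p1[:cut] + p2[cut:]  # one-point crossover
            for i in range(n):
                if rng.random() < mutation_rate:
                    child[i] = rng.randrange(len(scores[i]))  # mutation
            next_gen.append(child)
        population = next_gen
    return max(population, key=lambda c: fitness(c, scores))


# Hypothetical scores: SCORES[i][j] rates candidate j for source token i.
SCORES = [[0.9, 0.1], [0.2, 0.8, 0.5], [0.7, 0.3]]
best = evolve(SCORES)
print(best, fitness(best, SCORES))
```

Elitism keeps the best fitness from decreasing between generations. A real generator
would replace the additive score table with a fitness that reflects translation
quality.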
Phase 1: Initial Phase
This phase covers the analysis steps: acquisition of input, tokenization,
morphological analysis, and parsing. These steps play an important role in an MTS.
• Acquisition of Input: In this step, a sentence is taken as input for the translation
process. For this work, sentences are divided into three categories: (a) small,
(b) large, and (c) very large. An input sentence falls into one of these
categories.
• Tokenization: This step divides the input sentence into tokens so that the
category of the sentence can be determined. Some example sentences, with their
token counts and categories, are given in Table 1.
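The tokenization and categorization steps above can be sketched as follows. The
token-count thresholds are placeholders, since the values in Table 1 are not reproduced
here, and the names `tokenize` and `categorize` are assumptions made for this example.

```python
# Hypothetical token-count limits; these are placeholders, not Table 1 values.
SMALL_MAX = 5
LARGE_MAX = 10


def tokenize(sentence: str) -> list:
    """Split on whitespace after removing the danda (।) sentence terminator."""
    return sentence.replace("।", " ").split()


def categorize(sentence: str) -> tuple:
    """Return the tokens of a sentence and its size category."""
    tokens = tokenize(sentence)
    if len(tokens) <= SMALL_MAX:
        category = "small"
    elif len(tokens) <= LARGE_MAX:
        category = "large"
    else:
        category = "very large"
    return tokens, category


tokens, category = categorize("रामः वनम् गच्छति।")
print(len(tokens), category)  # 3 small
```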
5 Simulation Results
Fig. 3 Performance of GA-MTS (accuracy, axis range 85% to 100%, for the Small, Large,
and Very Large sentence categories)
6 Conclusion
This work presents a GA-based machine translation system (GA-MTS) for Sanskrit to
Hindi. The GA-MTS has two phases, and each phase performs different tasks. A genetic
algorithm is used in the generator phase, where it helps to map a given input to its
target language. For testing purposes, three categories of sentences were used. The
GA-MTS was tested on 300 different samples and achieved 94% accuracy on average,
which we consider a good result for Sanskrit-to-Hindi translation. In future, the
system will be tested on a larger number of samples from the complex categories,
that is, sentences of the large and very large categories, so that a more efficient
machine translation system can be developed for these two prominent languages of
India.
References
5. Mane, D. T., Devale, P. R., & Suryawanshi, S. D. (2010). A design towards English to Sanskrit
machine translation and synthesizer system using rule base approach. International Journal of
Multidisciplinary Research and Advances in Engineering (IJMRAE), 2(2), 405–414.
6. Bahadur, P., Jain, A. K., & Chauhan, D. S. (2012). EtranS-A complete framework for English
to Sanskrit machine translation. International Journal of Advanced Computer Science and
Applications, 2(1), 52–59.
7. Tahir, G. R., Asghar, S., & Masood, N. (2010). Knowledge based machine translation. In
International Conference on Information and Emerging Technologies (ICIET) (pp. 1–5), November
2010.
8. Raulji, J. K., & Saini, J. R. (2016). Sanskrit machine translation systems: A comparative
analysis. International Journal of Computer Applications, 136(1), 1–4.
9. Mishra, V., & Mishra, R. B. (2008). Study of example based English to Sanskrit machine
translation. Polibits, 37, 43–54.
10. Patil, S. P., & Kulkarni, P. P. (2014). Online handwritten Sanskrit character recognition using
support vector classification. International Journal of Engineering Research and Applications,
4(5), 82–91.
11. Shahnawaz (2015). Conversion between Hindi and Urdu. In International Conference on
Computing, Communication & Automation (pp. 309–313).