GA-Based Machine Translation System

for Sanskrit to Hindi Language

Muskaan Singh, Ravinder Kumar and Inderveer Chana

Abstract Machine translation is a prominent field of computational linguistics. Computational linguistics is the branch of science that deals with language with the help of computer science technology. In this field, all processing of natural language is done by the machine (computer). Computation is done by considering all features of the language as well as the essential properties of a sentence, such as its structure, semantics, and morphology. The machine should understand all these aspects of the language, yet past work does not address these requirements during machine translation. Current online and desktop machine translation systems ignore many aspects of the languages during translation. This gives rise to many ambiguities, and because of these ambiguities current machine translators are not able to produce correct translations. In this work, a genetic algorithm-based machine translation system is proposed for the translation of Sanskrit into Hindi, which is more efficient than the existing translation systems.

Keywords Machine translation · GA · Hindi · Sanskrit · NLP

1 Introduction

A machine translation system (MTS) helps people interact with others from different cultures without language barriers, so that they can share ideas and information and get to know each other easily. It is a common application of Natural Language Processing (NLP) that translates one language into another. India is rich in cultures, and its states have different languages. The main aim of an MTS is therefore to translate between languages without human assistance. Sanskrit is the primary sacred language of India

M. Singh (B)
Research Labs, CSED, Thapar University, Patiala, Punjab, India
e-mail: muskaan_singh@thapar.edu
R. Kumar · I. Chana
CSED, Thapar University, Patiala, Punjab, India

© Springer Nature Singapore Pte Ltd. 2019
A. Khare et al. (eds.), Recent Trends in Communication, Computing,
and Electronics, Lecture Notes in Electrical Engineering 524,
https://doi.org/10.1007/978-981-13-2685-1_40

[1] and an official language of one Indian state, Uttarakhand, whereas Hindi is also one of the primary languages of India and is used throughout the country. So, there is a need for a translation system that can translate between these two important languages.
The two important uses of an MTS are assimilation and dissemination. Assimilation means that the reader can understand the general meaning of a text, even though the translation might not be grammatically correct. Dissemination means that the translated text should be grammatically correct so that it can be treated as publishable content. Because languages differ in style and structure, current translation systems face many problems. Some of these problems are:
a. Meaning of the Word: A word can have several meanings, and the appropriate meaning may change when it is translated from one language into another. Selecting the right word for the intended meaning is difficult, and it is a very important part of any translation system.
b. Order of the Word: Word order is also an important factor in translation, because some languages follow Subject (S), Verb (V), Object (O) order while others follow a different order. Sanskrit can be written in SVO, SOV, and VOS order.
c. Idioms: An idiom is a group of words established by usage as having a meaning not deducible from those of the individual words. A literal translation of an idiomatic expression therefore does not convey the original meaning.

2 Related Work

Several MTSs have been developed. Some of them are described below.
SYSTRAN, founded by Dr. Peter Toma in 1968, is one of the oldest machine translation companies. SYSTRAN has done extensive work for the US Department of Defense and the European Commission. It provided the technology for Yahoo! Babel Fish, among others, and was used by Google's language tools until 2007. It is a rule-based MTS and handles 35 different languages.
ETSTS [2–4] combines rule-based and example-based approaches to MT. Using a speech synthesizer as a module, it converts the target sentence to speech output. The design of the system is modularized into text input, grammar and spell check, token generator, translator, parser generator module, RBMT/EBMT engine with its bilingual database, text output, and a waveform generator.
Google Translate is a free multilingual machine translation service developed by Google to translate text from one language into another. It is based on a statistical approach. It offers a website interface, mobile apps for Android and iOS, and an API that helps developers build browser extensions and software applications. Google Translate supports more than 100 languages at various levels.
Khaled Shaalan [5] presented a rule-based approach to machine translation for English-to-Arabic natural language processing, together with rule-based tools for Arabic natural language. The work provides morphological analyzers and generators as well as syntactic analyzers and generators.
Sandeep Warhade [6] presented the design of a phrase-based decoder for English-to-Sanskrit translation. The work describes a phrase-based statistical machine translation decoder with English as the source and Sanskrit as the target language. Its goal is to improve translation quality by improving the translation table and by preprocessing the source language text. The authors discuss the major design objective for the decoder and its performance relative to other SMT decoders.
ESSS [7] uses a rule-based translation mechanism. After English speech is converted to text, the resulting sentences are first decomposed into words. The words are matched against a database, which also supplies part-of-speech (POS) information to classify each word as a noun, verb, adjective, and so on. Grammar rules are then applied, and the Sanskrit text is generated by rearranging the sentence to produce the target language text. This text is given as input to a waveform generator, where it is converted to Sanskrit speech.
Microsoft provides a statistical machine translation system named Bing Translator [8]. E-Trans [6] is a rule-based machine translation tool. It is based on the formulation of synchronous context-free grammar (SCFG), a subset of context-free grammar (CFG). The syntax of the language is represented using SCFG. The processing engine works in two stages, top-down analysis and bottom-up analysis.

3 Machine Translation Systems

MTSs fall into four categories, as shown in Fig. 1: direct, rule based, corpus based, and knowledge based.
In direct MT [9], there is no intermediate representation. Translation is done word by word with the help of a bilingual dictionary, followed by some syntactic rearrangement. This method of translation is feasible only for a single language pair. It requires little analysis of the text and no parsing. Analysis techniques such as morphological analysis, preposition handling, syntactic arrangement, and morphological generation can be performed.
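As a minimal sketch of the direct approach, the word-by-word lookup can be written as a dictionary substitution. The three-entry lexicon below is a hypothetical toy example in transliteration, not the bilingual dictionary of any system discussed here, and the syntactic rearrangement step is omitted.

```python
# Toy bilingual lexicon (Sanskrit -> Hindi, transliterated).
# Illustrative only; these entries are not taken from the paper.
LEXICON = {
    "ramah": "ram",
    "vanam": "van",
    "gacchati": "jata hai",
}

def direct_translate(sentence, lexicon=LEXICON):
    """Translate word by word; unknown words are passed through unchanged."""
    return " ".join(lexicon.get(token, token) for token in sentence.split())

print(direct_translate("ramah vanam gacchati"))
```

Because the lookup treats every word independently, it cannot resolve word-sense ambiguity or reorder words, which is why direct MT is limited to closely related language pairs.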
In rule-based MTS [9], an intermediate representation such as a parse tree may be generated. The approach depends on rules for morphology, syntax, lexical selection and transfer, semantic analysis, and generation, and is therefore known as rule based. Rule-based MT can be of two types:
(1) Transfer-based
(2) Interlingua
In the transfer-based approach, the source language (SL) is converted to the target language (TL) without an intermediate representation, while in the interlingua approach an intermediate code representation is created, through which the SL is translated into the TL by means of interlingua codes.

Fig. 1 Methods of machine translation systems: direct; rule based (transfer, interlingua); corpus based (example, statistical); knowledge based
Corpus-based MTS [9, 10] requires sentence-aligned parallel text for every language pair. It cannot be used for language pairs for which such corpora do not exist. It can be further classified into statistical MT and example-based MT.
In knowledge-based MTS [11], a semantics-based approach to language analysis is introduced by artificial intelligence researchers. It requires a large knowledge base that includes both ontological and lexical knowledge. The basic AI approaches include semantic parsing, lexical decomposition into semantic networks, and resolving ambiguities.
The basic process of the MTS is shown in Fig. 2. The first step is to provide the input text to the translation system, according to the user's requirement and the capabilities of the translation system.
The next step is tokenization, which is the process of breaking the given text into smaller units called tokens. Grammatical information about these tokens is then generated by morphological analysis. After that, a parse tree is generated by the parser, which provides grammatical information with respect to context. The result is passed to the generator module, which takes semantic information from the morphological analysis module and performs mapping, followed by a search of the lexicon for the correct form of each word, based on its root word, to produce the output for the source language. Finally, the target text is generated after reformatting.
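The parsing step in this pipeline can be illustrated with a naive bottom-up (shift-reduce) recognizer over part-of-speech tags. This is a sketch under stated assumptions: the three-rule grammar and the tag names are hypothetical, and the greedy reduction strategy is far simpler than a full table-driven parser.

```python
# Hypothetical toy grammar over part-of-speech tags (not from the paper).
# Each rule is (right-hand side, left-hand side); longer rules come first.
RULES = [
    (("NP", "VP"), "S"),
    (("NP", "V"), "VP"),
    (("N",), "NP"),
]

def shift_reduce(tags):
    """Greedy shift-reduce recognition.

    Returns (accepted, reductions), where reductions lists the productions
    in the order they were applied.
    """
    stack, reductions = [], []
    for tag in tags:
        stack.append(tag)              # shift the next tag onto the stack
        reduced = True
        while reduced:                 # reduce for as long as some rule matches
            reduced = False
            for rhs, lhs in RULES:
                if tuple(stack[-len(rhs):]) == rhs:
                    stack[-len(rhs):] = [lhs]
                    reductions.append(f"{lhs} -> {' '.join(rhs)}")
                    reduced = True
                    break
    return stack == ["S"], reductions

accepted, reductions = shift_reduce(["N", "N", "V"])   # subject, object, verb
print(accepted)
for step in reductions:
    print(step)
```

Read from last to first, the recorded reductions form a rightmost derivation of the tag sequence, which is the defining property of bottom-up parsing.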

Fig. 2 MTS process: Input Text → Tokenization → Morphological Analysis → Parsing → Generator Module → Reformatting → Target Text Generation

Table 1 Input and tokenization

4 Proposed GA-Based Approach for Translator

The proposed system translates Sanskrit into Hindi. It is an efficient translator because it uses a genetic algorithm-based generator, which enhances the mapping process. The proposed work has two main phases: (i) the initial phase and (ii) the generator phase.
Phase 1: Initial Phase
This phase covers the analysis processes: acquisition of input, tokenization, morphological analysis, and parsing. These steps play an important role in an MTS.
• Acquisition of Input: In this step, a sentence is taken as input for the translation process. For this work, sentences are divided into three categories: (a) small, (b) large, and (c) very large. An input sentence falls into exactly one of these categories.
• Tokenization: This process divides the input sentence into smaller tokens, so that the category of the sentence can be determined. For example, some sentences with their token values and categories are given in Table 1.

Table 2 Example of morphological analysis

• Morphological Analysis: This process helps the translation system collect grammatical information about the input sentence from its tokens. For an input sentence with three tokens, for example, the morphological process provides grammatical information about each token, indicating the category to which the word belongs, i.e., noun, pronoun, or verb, as given in Table 2. For example, Vrkssebhyah is a noun in the ablative case, where the ablative indicates 'from, on account of, etc.'
• Parsing: Parsing analyzes the given input and confirms its grammatical correctness. The parser first scans and recognizes the words, and then recognizes the syntactic units. The two main operations performed by the parser are (i) checking and verifying the syntax against the specified syntax rules and (ii) reporting errors. In this work, a bottom-up parsing technique is used, which constructs the rightmost derivation in reverse order and decides, for every token, when a production has been found. This parsing approach can handle a large class of grammars.
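The first steps of the initial phase can be sketched as follows. The token-count thresholds are taken from Table 5, on the assumption that its "Parameter" column counts tokens. The morphological lexicon is hypothetical: only the entry for Vrkssebhyah mirrors the example in the text, and the other two entries are illustrative additions.

```python
def tokenize(sentence):
    """Split on whitespace; a deliberately simple stand-in for a real tokenizer."""
    return sentence.split()

def categorize(tokens):
    """Assign a size category using the token-count ranges of Table 5."""
    n = len(tokens)
    if 1 <= n < 3:
        return "small"
    if 3 <= n < 6:
        return "large"
    if 6 <= n <= 11:
        return "very large"
    raise ValueError(f"unsupported sentence length: {n}")

# Hypothetical morphological lexicon. Only the first entry follows the
# paper's example; the other two are illustrative assumptions.
MORPH_LEXICON = {
    "vrkssebhyah": {"pos": "noun", "case": "ablative"},
    "patrani": {"pos": "noun", "case": "nominative"},
    "patanti": {"pos": "verb"},
}

def analyze(tokens):
    """Look up grammatical information for each token."""
    return [(t, MORPH_LEXICON.get(t.lower(), {"pos": "unknown"})) for t in tokens]

tokens = tokenize("Vrkssebhyah patrani patanti")
print(categorize(tokens))
for token, info in analyze(tokens):
    print(token, info)
```

A lexicon lookup is used here only to keep the sketch short; a practical analyzer would derive the case from the word's suffix and stem rather than from a full-form table.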
Phase 2: Generator Phase
The second phase of the MTS is the generator phase. This phase performs the reverse of the initial phase and includes mapping, morphological generation, and the output process. In this work, we propose a genetic algorithm-based mapper that efficiently maps the source tokens to their respective target tokens.
• Genetic algorithm-based mapper: Mapping is the process that checks the grammatical compatibility of the source and target languages. In this work, we use a GA-based mapper that follows these steps:
– Randomly generate a population of 'k' possible solutions, where 'k' is the number of inputs.
– Determine the degree of acceptance of each solution from its fitness value, which is calculated on the basis of the grand mean.
– Select two solutions and perform crossover. We use single-point crossover, which generates new solutions.
– Alter these solutions by mutation.
– Treat the result as the current best solution and repeat the process to generate new solutions.
– Stop when the fitness value reaches its maximum or the number of solutions reaches its maximum.
• Morphological Generator: The morphological generator takes the input sentence and first obtains the root word of each token from the mapper. The GA-based mapper matches the root word against the target language and provides the exact match for it in the target language. For example, for a given input sentence, the root word is identified and the respective target word selected by the mapper is as given in Table 3.
• Output Process: This is the last step of this phase. It gathers the information from the leaves of the parse tree and generates the final output.

Table 3 Example of target tokens

Table 4 Source and target sentence
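The steps of the GA-based mapper can be sketched as below. This is an illustration under stated assumptions, not the authors' implementation: the candidate target words and their compatibility scores are hypothetical placeholders, the fitness is taken to be the mean of the per-token scores as a stand-in for the grand-mean fitness, the two fittest solutions are chosen as parents, and the population size is a free parameter.

```python
import random

# Hypothetical candidate target words for three source tokens, with
# illustrative compatibility scores in [0, 1] (not values from the paper).
CANDIDATES = [
    {"A1": 0.2, "A2": 0.9, "A3": 0.5},
    {"B1": 0.8, "B2": 0.3},
    {"C1": 0.4, "C2": 0.6, "C3": 1.0},
]

def fitness(solution):
    """Mean of the per-token scores (assumed stand-in for the grand mean)."""
    return sum(CANDIDATES[i][w] for i, w in enumerate(solution)) / len(solution)

def random_solution(rng):
    return [rng.choice(sorted(c)) for c in CANDIDATES]

def crossover(a, b, rng):
    """Single-point crossover; assumes at least two tokens per solution."""
    cut = rng.randint(1, len(a) - 1)
    return a[:cut] + b[cut:], b[:cut] + a[cut:]

def mutate(solution, rng, rate=0.2):
    """Replace each gene by a random candidate with probability `rate`."""
    return [rng.choice(sorted(CANDIDATES[i])) if rng.random() < rate else w
            for i, w in enumerate(solution)]

def ga_map(k=6, max_generations=50, seed=0):
    rng = random.Random(seed)
    population = [random_solution(rng) for _ in range(k)]
    best = max(population, key=fitness)
    max_fitness = sum(max(c.values()) for c in CANDIDATES) / len(CANDIDATES)
    for _ in range(max_generations):
        if fitness(best) >= max_fitness - 1e-9:    # fitness reached its maximum
            break
        # Select the two fittest solutions as parents (an assumption).
        p1, p2 = sorted(population, key=fitness, reverse=True)[:2]
        children = []
        while len(children) < k:
            c1, c2 = crossover(p1, p2, rng)
            children += [mutate(c1, rng), mutate(c2, rng)]
        population = children[:k]
        challenger = max(population, key=fitness)
        if fitness(challenger) > fitness(best):    # keep the current best
            best = challenger
    return best, fitness(best)

best, score = ga_map()
print(best, round(score, 3))
```

Seeding the random generator makes a run reproducible, which is convenient when comparing crossover or mutation settings.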

5 Simulation Results

In this work, a genetic algorithm-based machine translation system (GA-MTS) is proposed. The input language is Sanskrit and the target language is Hindi. Both languages are part of Indian culture: Sanskrit is the primary sacred language, whereas Hindi is an official language of India. The system supports the grammar of both languages and produces efficient results. To evaluate the effectiveness of the proposed system, 300 samples of different types of sentences were taken and analyzed using the proposed GA-MTS. Some of the results are given in Table 4.
The sentences are divided into three categories according to their size, (i) small, (ii) large, and (iii) very large, as given in Table 5, and the accuracy of the proposed MTS is analyzed for each category. We took 200 sentences from the small category and 50 each from the large and very large categories. Accuracy is calculated by comparing the translations with the actual meaning of the sentences: the translated samples are matched against the original Hindi samples, and the proportion of matches gives the accuracy of the proposed translator. The performance results are shown in Fig. 3.

Table 5 Different categories of sentences

Category     Parameter (tokens)   Number of sentences tested   Accuracy (%)
Small        ≥ 1 and < 3          200                          98
Large        ≥ 3 and < 6          50                           95
Very large   ≥ 6 and ≤ 11         50                           90

Fig. 3 Performance of GA-MTS (accuracy for the small, large, and very large categories)
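The accuracy figures can be checked with a few lines of arithmetic. The sketch below assumes that a translation counts as correct when it exactly matches its reference, which is one possible reading of the matching procedure described above. It then computes the mean accuracy over the Table 5 categories in two ways.

```python
# (category, sentences tested, reported accuracy in %) as listed in Table 5
RESULTS = [("small", 200, 98), ("large", 50, 95), ("very large", 50, 90)]

def accuracy(outputs, references):
    """Percentage of outputs that exactly match the reference (assumed metric)."""
    matches = sum(o == r for o, r in zip(outputs, references))
    return 100.0 * matches / len(references)

# Mean over the three categories, each category counted once.
unweighted = sum(acc for _, _, acc in RESULTS) / len(RESULTS)

# Mean over all 300 sentences, each category weighted by its sample count.
weighted = (sum(n * acc for _, n, acc in RESULTS)
            / sum(n for _, n, _ in RESULTS))

print(round(unweighted, 2), round(weighted, 2))
```

The per-category mean is about 94.3%, whereas weighting by the number of sentences gives about 96.2%, because two thirds of the samples come from the small category.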

6 Conclusion

This work presents a GA-based machine translation system. GA-MTS has two phases that perform different tasks. A genetic algorithm is added in the generator phase, where it helps map each input to its target language equivalent, and the GA performs efficiently in the proposed MTS. For testing, three types of sentences were used. The proposed GA-MTS was tested on 300 samples and achieves an average accuracy of 94%, which is very good for Sanskrit-to-Hindi translation. In future, the system will be tested on a larger number of samples from the complex categories, i.e., large and very large sentences, so that a more efficient machine translation system can be developed for these two prominent languages of India.

References

1. Rathod, S. G. (2014). Machine translation of natural language using different approaches: ETSTS (English to Sanskrit Translator and synthesizer). International Journal of Computer Applications, 102(15), 26–31.
2. Zhao, Y., & He, X. (2009). Using N-gram based features for machine translation system combination. In Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics, Companion Volume: Short Papers (pp. 205–208).
3. Zogheib, A. (2009). Genetic algorithm-based multi-word automatic language translation. Recent Advances in Intelligent Information Systems, 751–760.
4. Rathod, S. G., & Sondur, S. (2012). English to Sanskrit Translator and synthesizer (ETSTS). International Journal of Emerging Technology and Advanced Engineering, 2(12), 379–383.
5. Mane, D. T., Devale, P. R., & Suryawanshi, S. D. (2010). A design towards English To Sanskrit machine translation and sythesizer system using rule base approach. International Journal of Multidisciplinary Research And Advances In Engineering (IJMRAE), 2(2), 405–414.
6. Bahadur, P., Jain, A. K., & Chauhan, D. S. (2012). EtranS-A complete framework for English To Sanskrit machine translation. International Journal of Advanced Computer Science and Applications, 2(1), 52–59.
7. Tahir, G. R., Asghar, S., & Masood, N. (2010). Knowledge based machine translation. In International Conference on Information and Emerging Technologies (ICIET) (pp. 1–5), November 2010.
8. Raulji, J. K., & Saini, J. R. (2016). Sanskrit machine translation systems: A comparative analysis. International Journal of Computer Applications, 136(1), 1–4.
9. Mishra, V., & Mishra, R. B. (2008). Study of example based English to Sanskrit machine translation. Polibits, 37, 43–54.
10. Patil, S. P., & Kulkarni, P. P. (2014). Online handwritten Sanskrit character recognition using support vector classification. International Journal of Engineering Research and Applications, 4(5), 82–91.
11. Shahnawaz (2015). Conversion between Hindi and Urdu. In International Conference on Computing, Communication & Automation (pp. 309–313).