• We test multiple machine translation approaches: a supervised methodology, Transfer Learning, and a reinforcement learning approach that leverages monolingual data for Neural Machine Translation (NMT).

• We also release the collected parallel English-Sanskrit data as well as monolingual data for Sanskrit.
2 Related Work

Work by Mishra and Mishra (2009) mainly focuses on building tokenization, a POS tagger, and a Named Entity Recognition (NER) system for the Sanskrit language using a statistical machine translation approach. Mane et al. (2010) introduced a dictionary-based approach for implementing machine translation for Sanskrit by parsing and replacing source words with target words using a bilingual dictionary.

Bahadur et al. (2012) developed a machine translation system that primarily focused on the formulation of a Synchronous Context-Free Grammar (SCFG) and a subset of Context-Free Grammar (CFG). The developed model first tokenizes the input data and then matches the exact word or phrase from the dictionary; it also gathers information about the parts of speech (POS) of the input sentences. The work by Rathod (2014) implemented a rule-based and example-based approach for machine translation using a bilingual dictionary and a speech synthesizer that also converts speech to text. The designed model was capable of grammar and spell checking too. An open-source web portal3, established by the Govt. of India in 2015, collects data from domains such as primary and secondary school Sanskrit literature books. It also implements statistical machine translation algorithms and even tries to solve the Word Sense Disambiguation (WSD) problem.

Apart from the encoder-decoder model of Koul and Manvi (2019), no other work has been done on neural machine translation for Sanskrit, to the best of the authors' knowledge.

3 http://sanskrit.jnu.ac.in/shmt/index.jsp
3 Dataset

For this paper, we extracted parallel-aligned English-Sanskrit data as well as monolingual data for each language. For the parallel English-Sanskrit data, we obtained 2,100 sentences from OPUS4, the Sanskrit translation of the Bible, shlokas from the Ramayana, and more sentences from the Gita. As the data is extracted from multiple sources, sentences with the same source but multiple translations, and sentences with the same translation but various sources, were removed. Finally, a parallel dataset with 9,000 parallel lines was extracted and further divided into the standard train, test, and validation sets with a ratio of 80:10:10, respectively.

For the monolingual data, we collected the data from the Romanized version of the Mahabharata, consisting of approximately 130,000 lines, and for English we extracted the Europarl dataset (Koehn, 2005).

4 http://opus.nlpl.eu/
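To make the preprocessing concrete, the deduplication and 80:10:10 split described above could look roughly as follows. This is a minimal Python sketch under the assumption that the corpus is held as (source, translation) string pairs; the function names are ours, not from the released code:

```python
import random

def deduplicate(pairs):
    """Drop pairs whose source or translation already appeared
    (same source with multiple translations, or same translation
    from various sources), keeping the first occurrence."""
    seen_src, seen_tgt, unique = set(), set(), []
    for src, tgt in pairs:
        if src in seen_src or tgt in seen_tgt:
            continue
        seen_src.add(src)
        seen_tgt.add(tgt)
        unique.append((src, tgt))
    return unique

def split_80_10_10(pairs, seed=0):
    """Shuffle and divide into train/validation/test sets, 80:10:10."""
    random.Random(seed).shuffle(pairs)
    n = len(pairs)
    train = pairs[: int(0.8 * n)]
    valid = pairs[int(0.8 * n): int(0.9 * n)]
    test = pairs[int(0.9 * n):]
    return train, valid, test
```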
4 Proposed Methodology

Previous neural machine translation approaches for Sanskrit mainly focus on rule-based approaches and encoder-decoder mechanisms using LSTM units. The classical rule-based approach is time-consuming, requires much manual work by the linguist, and does not have good learning capabilities. In contrast, LSTM-based models tend to overfit faster and suffer from issues related to polysemy and multiple word senses (Calvo et al., 2019; Huang et al., 2011).

To handle all these issues, we first established a baseline translation model using a multi-head self-attention mechanism with an encoder-decoder architecture, as suggested by Vaswani et al. (2017). Further, we improved the baseline translator using a reinforcement learning approach by establishing language models and agents that leverage monolingual data. We further experimented with the Transfer Learning approach for machine translation to benefit from the lexically similar Hindi language, which is a resource-rich language.

4.1 Transformer Translator

Initially, the raw sentences were tokenized using the SentencePiece tokenizer (Kudo and Richardson, 2018), which is an unsupervised and language-independent tokenization method. Further, the parallel and monolingual tokenized data were used to train word vectors of length 128 using the word2vec (Mikolov et al., 2013) technique. As the transformer architecture does not maintain any word order, a positional encoding signal is mixed with the trained word vectors and given to the encoder as input.
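As a sketch of this preprocessing pipeline, the following uses the sentencepiece and gensim packages. Only the embedding length of 128 comes from the paper; the file names, vocabulary size, and the choice of gensim's word2vec implementation are illustrative assumptions:

```python
import sentencepiece as spm
from gensim.models import Word2Vec

# Train an unsupervised, language-independent subword tokenizer on the
# raw corpus (file name and vocabulary size are illustrative).
spm.SentencePieceTrainer.train(
    input="sanskrit_corpus.txt", model_prefix="sa_spm", vocab_size=8000
)
sp = spm.SentencePieceProcessor(model_file="sa_spm.model")

# Tokenize the parallel and monolingual data into subword pieces.
with open("sanskrit_corpus.txt", encoding="utf-8") as f:
    tokenized = [sp.encode(line.strip(), out_type=str) for line in f]

# Train 128-dimensional word vectors on the tokenized data with word2vec.
w2v = Word2Vec(sentences=tokenized, vector_size=128, window=5, min_count=1)
w2v.save("sa_word2vec.model")
```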
Introducing the positional encoding helps maintain the embedding information and gives the vital position information to the encoder. In the architecture, both the encoder and the decoder are formed by stacking four identical layers in the same manner as described by Vaswani et al. (2017). The encoder takes the representation of the Sanskrit tokens through word embeddings and positional encoding, which is then fed to a multi-head attention unit, where feed-forward units with residual connections are employed between every other sublayer. This signal is normalized and given to the decoder as input, along with the output embeddings, positional encoding, and masked multi-head attention. The decoder works similarly to the encoder, generating the output word by word and finally producing a sequence.
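Assuming the standard sinusoidal scheme of Vaswani et al. (2017) for the positional encoding signal mentioned above, it can be computed as in this small NumPy sketch (d_model = 128 to match the embedding length):

```python
import numpy as np

def positional_encoding(max_len, d_model=128):
    """Sinusoidal positional encoding from Vaswani et al. (2017):
    PE[pos, 2i]   = sin(pos / 10000^(2i / d_model))
    PE[pos, 2i+1] = cos(pos / 10000^(2i / d_model))"""
    pos = np.arange(max_len)[:, None]        # shape (max_len, 1)
    i = np.arange(0, d_model, 2)[None, :]    # shape (1, d_model / 2)
    angles = pos / np.power(10000.0, i / d_model)
    pe = np.zeros((max_len, d_model))
    pe[:, 0::2] = np.sin(angles)
    pe[:, 1::2] = np.cos(angles)
    return pe

# The encoding is simply added to ("mixed with") the token embeddings:
# encoder_input = word_embeddings + positional_encoding(seq_len)
```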
4.2 Reinforcement Translator

This methodology is inspired by He et al. (2016), where we used our Transformer model as the translation model and built the language model as a Recurrent Neural Network (RNN) using the monolingual data only. We define dual NMT as a combination of Sanskrit to English, considered to be the primary task, and English to Sanskrit, being the dual task. For both the primary and the dual task, we set up individual agents to perform a two-agent communication game in which they correct each other through a reinforcement learning process. The reward system is a combination of the language model reward (r1) and the communication reward (r2), which, following He et al. (2016), can be expressed using the equation:

r = α · r1 + (1 − α) · r2

where α is a hyperparameter balancing the two rewards.
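A minimal sketch of this reward computation is shown below. The two scoring functions stand in for the RNN language model and the dual (reverse) translation model, and alpha = 0.5 is an illustrative default rather than a value from the paper:

```python
def combined_reward(source_sentence, mid_sentence,
                    lm_score, back_translation_score, alpha=0.5):
    """Total reward for one step of the two-agent game, following
    He et al. (2016): a language-model reward r1 on the intermediate
    translation plus a communication reward r2 for reconstructing the
    original sentence. lm_score and back_translation_score are assumed
    callables returning log-probabilities."""
    r1 = lm_score(mid_sentence)                   # fluency in the target language
    r2 = back_translation_score(source_sentence,  # reconstruction quality
                                mid_sentence)
    return alpha * r1 + (1.0 - alpha) * r2
```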
4.3 Transfer Learning Translator

For the Transfer Learning approach, we prepared a Hindi-English NMT model using Transformers as the parent model, trained on the 1.56 million parallel sentence pairs provided by Kunchukuttan et al. (2017), and trained the Sanskrit-English NMT model as the child model. The Hindi data was first tokenized using the Indic Tokenizer (Kunchukuttan, 2020), the English data using the Moses tokenizer (Koehn et al., 2007), and the Sanskrit data using SentencePiece (Kudo and Richardson, 2018). The Hindi-English NMT model was then trained using the same training procedure as the Transformer model discussed in Section 4.1.
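Conceptually, the parent-child transfer amounts to initializing the child Sanskrit-English model with the parent Hindi-English weights before fine-tuning. The following PyTorch-style sketch illustrates the idea under the assumption that checkpoints are plain state dicts; it is not the actual NEMATUS mechanism:

```python
import torch
from torch import nn

def init_child_from_parent(child_model: nn.Module, parent_ckpt: str) -> nn.Module:
    """Initialize a Sanskrit-English child model from a Hindi-English
    parent checkpoint: copy every parameter whose name and shape match.
    Source-side embeddings typically differ (Sanskrit subword vocabulary)
    and are left at their random initialization. The checkpoint is
    assumed to hold a plain state_dict."""
    parent_state = torch.load(parent_ckpt, map_location="cpu")
    child_state = child_model.state_dict()
    for name, weight in parent_state.items():
        if name in child_state and child_state[name].shape == weight.shape:
            child_state[name] = weight
    child_model.load_state_dict(child_state)
    return child_model

# The child model is then fine-tuned on the Sanskrit-English data as usual.
```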
5 Result & Discussion

The baseline model in Section 4.1 was implemented using the OpenNMT framework (Klein et al., 2017). For the transfer learning implementation, we used the NEMATUS toolkit (Sennrich et al., 2017). The baseline model and the child model in the transfer learning approach were trained, tuned, and tested on the same data splits discussed in Section 3. For the quantitative evaluation, we used the BLEU score (Papineni et al., 2002) for the English translations generated by the models against the test set. The results obtained are shown in Table 1.

Architecture                   BLEU   Rating
1. Transformer Translator       4.6    2.4
2. Reinforcement Translator     5.8    2.9
3. Transfer Learning           18.4    3.9

Table 1: Evaluation of the different models on English translation using BLEU scores.
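The BLEU evaluation can be reproduced along the following lines; this sketch uses the sacrebleu implementation of BLEU (Papineni et al., 2002), and the file names are illustrative:

```python
import sacrebleu

# Model outputs and reference English translations for the test set,
# one sentence per line (file names are illustrative).
with open("test.hyp.en", encoding="utf-8") as f:
    hypotheses = [line.strip() for line in f]
with open("test.ref.en", encoding="utf-8") as f:
    references = [line.strip() for line in f]

# corpus_bleu expects a list of hypothesis strings and a list of
# reference streams (one stream per reference set).
bleu = sacrebleu.corpus_bleu(hypotheses, [references])
print(f"BLEU = {bleu.score:.1f}")
```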
All average ratings are shown in the last column of Table 1. The hyperparameters searched and the best values selected for the baseline model during training are listed in Table 2.

Hyperparameter       Experimented   Best
Epochs               200            200
Batch size           512, 1024      1024
Number of layers     4              4
Learning rate        Dynamic        Dynamic
Dropout              0.1            0.1
Vector dimension     128, 256       128

Table 2: Hyperparameter search for the best results.

A few observations from the results:

• The Transfer Learning approach performs best among all three models. The lexical similarity between Hindi and Sanskrit helped in achieving a better result.

• The Transformer translator performed worst, most likely due to the small and sparse dataset drawn from various domains and the large number of parameters of the model. However, reinforcement learning brought a slight improvement of 1.2 BLEU points.

• The dataset used by Koul and Manvi (2019) is different and not available in the public domain for testing, so it would not be appropriate to compare the results of Koul and Manvi (2019) with our experiments.

6 Conclusion

This work opens doors for many researchers, linguists, and students to work on and explore Sanskrit. The dataset, training subroutines, and trained models are available at: https://github.com/RavneetDTU/Improving-Neural-Machine-Translation-for-Sanskrit-English
References

Promila Bahadur, AK Jain, and DS Chauhan. 2012. EtranS: A complete framework for English to Sanskrit machine translation. In International Journal of Advanced Computer Science and Applications (IJACSA) from International Conference and Workshop on Emerging Trends in Technology. Citeseer.

Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. 2014. Neural machine translation by jointly learning to align and translate. arXiv preprint arXiv:1409.0473.

Hiram Calvo, Arturo P Rocha-Ramirez, Marco A Moreno-Armendáriz, and Carlos A Duchanoy. 2019. Toward universal word sense disambiguation using deep neural networks. IEEE Access, 7:60264–60275.

Sergey Edunov, Myle Ott, Michael Auli, and David Grangier. 2018. Understanding back-translation at scale. arXiv preprint arXiv:1808.09381.

V. K. Gupta, N. Tapaswi, and S. Jain. 2013. Knowledge representation of grammatical constructs of Sanskrit language using rule based Sanskrit language to English language machine translation. In 2013 International Conference on Advances in Technology and Engineering (ICATE), pages 1–5.

Di He, Yingce Xia, Tao Qin, Liwei Wang, Nenghai Yu, Tie-Yan Liu, and Wei-Ying Ma. 2016. Dual learning for machine translation. In Advances in Neural Information Processing Systems, pages 820–828.

Philipp Koehn, Hieu Hoang, Alexandra Birch, Chris Callison-Burch, Marcello Federico, Nicola Bertoldi, Brooke Cowan, Wade Shen, Christine Moran, Richard Zens, et al. 2007. Moses: Open source toolkit for statistical machine translation. In Proceedings of the 45th Annual Meeting of the ACL on Interactive Poster and Demonstration Sessions, pages 177–180. Association for Computational Linguistics.

Nimrita Koul and Sunilkumar S Manvi. 2019. A proposed model for neural machine translation of Sanskrit into English. International Journal of Information Technology, pages 1–7.

Taku Kudo and John Richardson. 2018. SentencePiece: A simple and language independent subword tokenizer and detokenizer for neural text processing. arXiv preprint arXiv:1808.06226.

Anoop Kunchukuttan. 2020. The IndicNLP Library. https://github.com/anoopkunchukuttan/indic_nlp_library/blob/master/docs/indicnlp.pdf.

Anoop Kunchukuttan, Pratik Mehta, and Pushpak Bhattacharyya. 2017. The IIT Bombay English-Hindi parallel corpus. arXiv preprint arXiv:1710.02855.

Rico Sennrich, Orhan Firat, Kyunghyun Cho, Alexandra Birch, Barry Haddow, Julian Hitschler, Marcin Junczys-Dowmunt, Samuel Läubli, Antonio Valerio Miceli Barone, Jozef Mokry, and Maria Nădejde. 2017. Nematus: a toolkit for neural machine translation. In Proceedings of the Software Demonstrations of the 15th Conference of the European Chapter of the Association for Computational Linguistics, pages 65–68, Valencia, Spain. Association for Computational Linguistics.

Richard S Sutton, David A McAllester, Satinder P Singh, and Yishay Mansour. 2000. Policy gradient methods for reinforcement learning with function approximation. In Advances in Neural Information Processing Systems, pages 1057–1063.

Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Łukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. In Advances in Neural Information Processing Systems, pages 5998–6008.