Sanskrit as Inter-Lingua Language in Machine Translation

Sunita Chand

Abstract This paper examines the role of Sanskrit as an inter-lingua in
multi-language machine translation. Inter-lingua based and direct transformation
based approaches have long been in use, sometimes complementing and sometimes
competing with each other. The inter-lingua based approach is efficient for
multi-lingual machine translation; for example, the Angla-Bharati system uses the
pseudo lingua for Indian languages (PLIL) as its inter-lingua for translation from
English to Indian regional languages. This paper proposes the use of Sanskrit as
an inter-lingua in multi-language machine translation.

Keywords Inter-lingua ⋅ Machine translation ⋅ Natural language ⋅ Corpus based ⋅
Sanskrit ⋅ Hindi

1 Introduction

Natural language processing evolved from the field of artificial intelligence in
order to give machines linguistic intelligence, so that they can unambiguously
interpret sentences given as input in a regional language and carry out the
desired processing. Machine translation (MT) is a task that requires knowledge of
several other disciplines, such as computational linguistics, cognitive science
and computer science. The approaches to MT can be classified as follows:

S. Chand (✉)
University of Delhi, New Delhi, India
e-mail: sunitamk@gmail.com

© Springer Science+Business Media Singapore 2017
K.R. Attele et al. (eds.), Emerging Trends in Electrical, Communications
and Information Technologies, Lecture Notes in Electrical Engineering 394,
DOI 10.1007/978-981-10-1540-3_3

Fig. 1 Translation using single inter-lingua language

Fig. 2 Translation using two intermediate languages

1.1 Direct Substitution

In this approach, the words or phrases of the source language are translated as
they are into the target language using a dictionary, e.g. Anusaaraka. This
requires a comprehensive dictionary of all words and phrases. It is unrealistic to
expect this method to cope with the complexity and ambiguity of a natural
language.
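
The limitation described above can be seen in a minimal sketch of word-for-word substitution. The dictionary entries and the function name `direct_substitute` are illustrative assumptions, not taken from any system named in this paper; the romanized Hindi is only a toy example.

```python
# Toy word-for-word substitution. The dictionary is hypothetical and tiny;
# a real system would need an entry for every word and phrase.
TOY_DICTIONARY = {
    "i": "main",
    "eat": "khata hoon",
    "mango": "aam",
}


def direct_substitute(sentence, dictionary):
    """Replace each source word by its dictionary entry, keeping source word order."""
    words = sentence.lower().split()
    # Words missing from the dictionary are passed through in angle brackets.
    return " ".join(dictionary.get(word, f"<{word}>") for word in words)


print(direct_substitute("I eat mango", TOY_DICTIONARY))
```

The output keeps the English subject-verb-object order, whereas Hindi would normally place the verb last, so even this small case shows why a dictionary alone does not handle structural differences between languages.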

1.2 Rule Based/Knowledge Based/Transfer Based Approach (RBMT)

This approach involves building a database of the rules of the source language as
well as the target language, and obtaining a parse of the source language for
mapping to the target language structure using these rules. Many systems have been
developed using this approach, e.g. Angla-Bharati, Matra and Anubaad. Its
limitation is that not all rules can be incorporated into a system, so such
systems are inadequate, provide limited coverage and sometimes produce incorrect
translations.

1.3 Corpus Based Approach (CBMT)

Corpus based systems are further divided into example based MT systems, e.g.
ANUBHARATI, and statistical MT systems (SBMT), e.g. Google Translate and Bing
Translator. These systems learn how to translate by analyzing existing human
translations (known as bilingual text corpora). Their success depends on the
availability of representative parallel corpora with wide and adequate coverage
in the domain of application.

1.4 Inter-Lingua Machine Translation

In this approach, translation between the source language and the target language
is accomplished through an intermediate language that is capable of presenting the
whole information contained in the source language sentences in unambiguous form.
The intermediate language is then translated into target language text.
This approach is very efficient for designing multilingual translation systems
with minimum additional effort. The quality and success of this approach depend
on the ‘virtues’ of the intermediate language and on the intermediate structure
obtained from the source text. PLIL is one such intermediate structure, used by
the Angla-Bharati system [1]. Other inter-linguas in use are UNL (Universal
Networking Language) [2] and that of the KANT system [3].
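
A minimal sketch of the analysis and generation steps described above, reduced to single words. The concept identifiers, the lexicon entries and the function names are assumptions made for illustration; they do not reproduce PLIL, UNL or KANT.

```python
# Each language has one lexicon that maps its words to language-neutral
# concept identifiers. All entries are hypothetical toy data.
LEXICONS = {
    "english": {"water": "CONCEPT_WATER", "fire": "CONCEPT_FIRE"},
    "hindi": {"paani": "CONCEPT_WATER", "aag": "CONCEPT_FIRE"},
    "sanskrit": {"jalam": "CONCEPT_WATER", "agnih": "CONCEPT_FIRE"},
}


def analyze(word, source):
    """Analysis step: source word -> intermediate concept."""
    return LEXICONS[source][word]


def generate(concept, target):
    """Generation step: intermediate concept -> target word."""
    for word, known_concept in LEXICONS[target].items():
        if known_concept == concept:
            return word
    raise KeyError(f"{concept} has no entry in the {target} lexicon")


def translate(word, source, target):
    """Translate by going through the intermediate representation only."""
    return generate(analyze(word, source), target)


print(translate("water", "english", "sanskrit"))
print(translate("aag", "hindi", "english"))
```

Adding a further language to this sketch means adding one lexicon, not one table per language pair, which is the "minimum additional effort" property in miniature.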

1.4.1 Types of Inter-Lingua Machine Translation Systems

A. Translation using single intermediate language.


In this approach, any source language is first translated into an intermediate
language (X), and vice versa [4]. To translate from language L1 to language L4,
the translations required are L1 → X followed by X → L4 (Fig. 1).

B. Translation using two intermediate languages


There may be cases in which a single intermediate language is not sufficient for
translation from language L1 to L2, i.e. there is no common language that can
conveniently serve as the intermediate language for both L1 and L2. Instead, L1
can be conveniently converted to X1, whereas X2 can be used as the intermediate
language for L2. If X1 and X2 are commonly used languages for which X1 to X2
conversion packages are already available, then less widely used languages can
benefit from conversion through two intermediate languages instead of having no
conversion solution at all.
Let L2 and L5 be less widely used languages that do not have a common
intermediate language. Then the L2 → L5 conversion can take place as
L2 → X1 → X2 → L5 (Fig. 2).
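
The chaining idea above can be sketched as a shortest-path search over the set of available conversion packages. The set `AVAILABLE` and the function `translation_route` are hypothetical; the language labels follow the text.

```python
from collections import deque

# Hypothetical one-way conversion packages, written as (source, target).
AVAILABLE = {
    ("L2", "X1"), ("X1", "L2"),
    ("L5", "X2"), ("X2", "L5"),
    ("X1", "X2"), ("X2", "X1"),
}


def translation_route(source, target, converters):
    """Return the shortest chain of languages from source to target, or None."""
    queue = deque([[source]])
    seen = {source}
    while queue:
        path = queue.popleft()
        if path[-1] == target:
            return path
        # Breadth-first search: extend the path by every unvisited language
        # reachable through one available converter.
        for start, end in sorted(converters):
            if start == path[-1] and end not in seen:
                seen.add(end)
                queue.append(path + [end])
    return None


print(" -> ".join(translation_route("L2", "L5", AVAILABLE)))
```

With only these converters available, the search returns the route through both intermediate languages, and it returns `None` when no chain exists.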
The rest of the paper is organized as follows: Sect. 2 gives an insight into the
features required of any language to serve as an inter-lingua. Section 3 describes
the features of the Sanskrit language that make it a strong candidate for use as
an intermediate language for machine translation. The paper ends with the
conclusion and future scope in Sect. 4.

2 Features of an Inter-Lingua

The inter-lingua based approach has advantages over other machine translation
approaches. The first advantage is that multilingual translation need not define
explicit rules for each language pair in each translation direction. Rather, each
of the N languages needs one mapping into the inter-lingua and one mapping out of
it, i.e. N mappings to translate into the inter-lingua and N mappings to translate
from it. So a total of 2N mappings are required to translate among N natural
languages. In the transfer based approach, by contrast, a separate mapping is
required in each direction for every pair of languages, resulting in a total of
N(N–1) mappings [5].
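
The two counts can be compared directly with a short calculation. The function names are chosen here for illustration; the formulas are the ones stated above.

```python
def interlingua_mappings(n):
    """One mapping into and one mapping out of the inter-lingua per language."""
    return 2 * n


def transfer_mappings(n):
    """One mapping per ordered pair of distinct languages."""
    return n * (n - 1)


for n in (2, 3, 4, 10):
    print(n, interlingua_mappings(n), transfer_mappings(n))
```

For N = 3 the two approaches need the same number of mappings (6); the inter-lingua approach needs fewer only when N exceeds 3, for example 20 instead of 90 for N = 10.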
For a language to be capable of serving as an inter-lingua, it should possess
some important characteristics. First, it should be unambiguous: each word in
the inter-lingua should be explicit in representation. Second, an inter-lingua
should be universal, i.e., it should be capable of representing the abstract
meaning of any text belonging to any language or domain. Third, an inter-lingua
should be capable of presenting the input text in its entirety, i.e., it should
be able to represent the morphological, syntactic, semantic and even pragmatic
meaning of the input text. Fourth, an inter-lingua should not be influenced by
the formal representation of the content in the source language; rather, it
should represent only the content of the input language. Fifth, an inter-lingua
should be independent of both the source language and the target language. The
analysis part of source-to-target language translation should be based on the
source language, whereas the generation part should be target specific [6].
The next section describes the features of the Sanskrit language that make it
suitable for selection as an inter-lingua.

3 Sanskrit as an Inter-Lingua Language

The entire Sanskrit grammar, known as the Ashtadhyayi, was composed by the sage
Panini with the help of fourteen distinctive sounds which, according to tradition,
he conceived from God Shiva’s damru (the small hand-drum which God Shiva holds in
His hand).
The completeness of Sanskrit grammar is illustrated by the extensiveness of its
grammatical tenses: one form for the present tense, three forms for the past
tense and two forms for the future tense. There are also exclusive
representations for the potential mood, the imperative mood, the benedictive
mood (called asheerling, which is used for indicating blessing) and the
conditional. It has separate forms for each of the three grammatical persons
(first, second and third person), and it further distinguishes among ekvachan,
dvi-vachan and bahu-vachan, i.e., whether one, two or more than two people are
referred to. In addition, the three categories of verbs, known as atmanepadi,
parasmaipadi and ubhaipadi, signify that the outcome of the action relates to
the doer, to another person, or to both, respectively.
In this way there are ninety forms of one single verb.
For example, the root word (known as dhatu) ‘kri’ means ‘to do’. Sanskrit has
ninety verb forms for it, e.g., karoti, kurutah, kurvanti, etc., whereas in
English there are only a few forms of each word, e.g., do, doing and done, as
shown in the figure below. Additional words, e.g. is, was, will, has been, have
been, had, had been etc., are added to these verb forms to distinguish the
tenses. In Sanskrit, by contrast, there are distinct single words for all kinds
of uses and situations. There are words for all three genders of nouns and
pronouns, and each word has twenty-one forms of its own to cover all
situations.
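
One way to read the count of ninety verb forms given above is as ten tense and mood categories multiplied by three persons and three numbers. The sketch below enumerates those combinations; the grouping into three lists and the placeholder labels such as "past (form 1)" are assumptions made for illustration, while the individual counts come from the text.

```python
from itertools import product

# Ten tense and mood categories, as listed in the text:
# 1 present, 3 past, 2 future, potential, imperative, benedictive, conditional.
TENSES_AND_MOODS = [
    "present",
    "past (form 1)", "past (form 2)", "past (form 3)",
    "future (form 1)", "future (form 2)",
    "potential", "imperative", "benedictive", "conditional",
]
PERSONS = ["first", "second", "third"]
NUMBERS = ["ekvachan", "dvi-vachan", "bahu-vachan"]

# Every (tense or mood, person, number) combination is one slot for a verb form.
verb_slots = list(product(TENSES_AND_MOODS, PERSONS, NUMBERS))
print(len(verb_slots))
```

Under this reading the three example forms karoti, kurutah and kurvanti would occupy the three number slots of one tense and one person.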

Regarding Sanskrit vocabulary, there is a dictionary of the root words, prefixes
and suffixes, called the dhatu path, at the end of the Ashtadhyayi. It has an
abundance of words and, furthermore, Sanskrit grammar is capable of creating any
number of new words for a new situation, concept or thing.
Hence the Sanskrit language fulfills the first requirement of an inter-lingua,
namely that its words should be explicit in representation.
The second requirement, that an inter-lingua should be universal, is supported by
the fact that the Sanskrit language has been in its perfect form for thousands of
years, since before the infancy of the earliest prime languages of the world such
as Hebrew, Greek and Latin.
These languages have adopted many words from Sanskrit and have undergone
many changes as they passed from one stage to another whereas there has never
been any kind, class or nature of change in the science of Sanskrit grammar.
The sounds of the 36 consonants and the 16 vowels of the Sanskrit language have
been precise and fixed since its inception. The words of the Sanskrit language
were never changed, improved, altered or modified in any way. All the words of
the Sanskrit language were pronounced in the past in the same way as they are
pronounced today. The Sanskrit vowel system has also been immune to any kind of
alteration. The reason for this immunity is that Sanskrit was the first language
of the world and that it attained its absolute perfection by its nature and
formation.
When a language changes its form and shape to some extent because it is spoken
by unqualified people or by people of other origins, the resulting form is known
as ‘Apbhransh’. For example, the Sanskrit word ‘matri’, with a long ‘a’ and a
soft ‘t’, became ‘mater’ in Greek and ‘mother’ in English. This indicates that
the English and Greek languages are ‘apbhransh’ forms of Sanskrit. Such
‘apbhranshas’ of Sanskrit words are found in all the languages of the world,
which proves that Sanskrit was the mother language of the world.
As such, the Sanskrit language has the capacity to represent a word at every
level of representation, from morphological to semantic and from pragmatic to
discourse representation.

4 Conclusion

Considering all the points explained above, it is quite evident that Sanskrit is
the source of all other languages of the world and not a derivation of any
language. As such, it can represent the content of any source language and
therefore qualifies as an inter-lingua that can be used in the mapping of
multiple source languages to multiple target languages. As opposed to the KANT
system [7], which produces a source F-structure as the inter-lingua, the proposed
system may be easier to implement, as each transfer from a source language to
Sanskrit will be governed by the well known grammatical rules of Sanskrit, and
the Sanskrit representation can then be transferred to any other language.

References

1. Sinha RMK, Jain A (2003) AnglaHindi: an English to Hindi machine aided translation system.
http://anglahindi.iitk.ac.in, MTS-2003
2. Dave S, Parikh J, Bhattacharyya P (2001) Inter-lingua-based English-Hindi machine translation
and language divergence. Mach Transl 16:251–304
3. Nyberg EH (1996) Controlled Language and Knowledge Based Machine Translation:
Principles and Practice. In: First International workshop on controlled language applications,
Katholieke University, Leuven, 26–27 March 1996
4. Adusumilli KK Natural languages translation using an intermediate language. IAENG Int J
Comput Sci 33:1, IJCS_33_1_20
5. Lampert A (2004) Inter-lingua in machine translation. Technical Report, 2004
6. Al Ansary S Inter-lingua-based machine translation Systems: UNL versus other inter-linguas
7. Mitamura T, Nyberg EH, Carbonell JG (1991) An efficient inter-lingua translation system for
multi-lingual document production. In: Proceedings of machine translation summit III,
Washington D.C., 2–4 July 1991
