Machine Translation1 (P20)
Machine Translation(MT)
Translation of text units from one language into another
using computers
One of the earliest NLP applications
We need not know multiple languages; we just feed the text to an MT
system and get it translated.
Need of MT
It can help overcome technological barriers.
A lot of information is available in today's world, but it is
available in only a very small subset of languages and is
beyond the reach of a significant portion of society.
This has led to a digital divide in society.
MT can be of great help in removing this divide.
Multilingual countries like India, where very few people
can understand English, are particularly in need of
MT systems to translate information from English into
local languages.
Problems in MT
There are many structural and stylistic differences among
languages, which make automatic translation a difficult
task:
Word Order: The arrangement of words in a sentence varies
across languages. E.g. in English, words are arranged in
subject-verb-object order, whereas in Indian
languages the object usually precedes the verb.
Word Sense: A word in one language may have several senses,
each of which translates into a different word in another
language. This creates a problem in target-language word
selection.
Problems in MT (contd..)
Anaphora resolution (AR), which most commonly appears as pronoun
resolution, is the problem of resolving references to earlier or later items in
the discourse.
These items are usually noun phrases representing objects in the real world,
called referents, but can also be verb phrases, whole sentences or paragraphs.
There are primarily three types of anaphora:
Example
[Figure: MT approaches, including transfer, interlingua and statistical-based]
Direct Machine Translation (DMT)
DMT systems translate directly, i.e., no
intermediate representation is used.
They carry out word-by-word translation with the help of
a bilingual dictionary, usually followed by some syntactic
rearrangement.
They take a monolithic approach towards development,
i.e. they consider all the details of one language pair.
Anusaaraka (IIIT Hyderabad) is an MT system based on the direct
approach.
Overview
Three main methodologies for Machine Translation
Direct
Transfer
Interlingual
Contd..
The general procedure for direct translation subsystems can be
summarized in the following three steps:
1. Remove morphological inflections from the words to get
the root form of the source language words.
2. Look up a bilingual dictionary to get the target-language
words corresponding to the source language words.
3. Change the word order to that which best matches the word
order of the target language, e.g. in an English-Hindi
translation system, this would involve changing
prepositions to post-positions and changing the
subject-verb-object structure to subject-object-verb.
DMT System
[Figure: Source language text -> Morphological analysis -> SL words -> Bilingual lookup (SL-TL dictionary) -> TL words -> Syntactic rearrangement -> Target language text]
Example
Consider this English sentence
Khushbu slept in the garden.
To translate this sentence into Hindi, a direct translation system will
first look up a dictionary to get target words for each word appearing
in the source-language sentence. Then the words are reordered to
match the default sentence structure of Hindi. The output of these
steps is:
Word Translation:
खुशबु सोयी में बाग
Khushbu soyi mein baag
Syntactic rearrangement:
खुशबु बाग में सोयी
Khushbu baag mein soyi
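The three steps of direct translation (morphological analysis, dictionary lookup, reordering) can be sketched for this sentence as follows. This is a minimal toy illustration, not how any real DMT system is built: the tables MORPHOLOGY, DICTIONARY and POS and the functions analyze, reorder and translate are all hypothetical and cover only this one sentence, using romanized Hindi.

```python
# Toy direct (dictionary-based) English-to-Hindi translation.
# All tables are hypothetical and hand-made for a single sentence.

# Step 1 resource: inflected form -> (root, grammatical feature).
MORPHOLOGY = {"slept": ("sleep", "past")}

# Step 2 resource: bilingual dictionary, (root, feature) -> romanized Hindi.
DICTIONARY = {
    ("sleep", "past"): "soyi",
    ("in", None): "mein",
    ("garden", None): "baag",
}

# Part-of-speech tags for the source words; unknown words default to nouns.
POS = {"slept": "verb", "in": "prep", "garden": "noun", "the": "det"}


def analyze(word):
    """Step 1: strip inflection, returning (root, feature)."""
    return MORPHOLOGY.get(word, (word, None))


def reorder(tagged):
    """Step 3: preposition -> postposition, then SVO -> SOV."""
    words = list(tagged)
    i = 0
    while i < len(words) - 1:
        # Move a preposition behind the noun that follows it.
        if words[i][1] == "prep" and words[i + 1][1] == "noun":
            words[i], words[i + 1] = words[i + 1], words[i]
            i += 2
        else:
            i += 1
    # Move the verb to the end of the clause.
    non_verbs = [w for w in words if w[1] != "verb"]
    verbs = [w for w in words if w[1] == "verb"]
    return non_verbs + verbs


def translate(sentence):
    """Return (word-by-word translation, syntactically rearranged translation)."""
    tagged = []
    for token in sentence.rstrip(".").split():
        key = token.lower()
        pos = POS.get(key, "noun")
        if pos == "det":
            continue  # Hindi has no articles, so "the" is dropped
        root, feature = analyze(key)
        # Step 2: dictionary lookup; unknown words (names) pass through unchanged.
        target = DICTIONARY.get((root, feature), token)
        tagged.append((target, pos))
    word_by_word = " ".join(w for w, _ in tagged)
    rearranged = " ".join(w for w, _ in reorder(tagged))
    return word_by_word, rearranged


print(translate("Khushbu slept in the garden."))
# ('Khushbu soyi mein baag', 'Khushbu baag mein soyi')
```

The sketch also shows why the approach is called monolithic: every table and reordering rule is specific to the English-Hindi pair.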
Contd..
Besides word ordering and preposition handling, suffix handling is also needed
to make the translation acceptable. E.g. in the following sentence we
need to change the Hindi word ladka to ladke. This is termed idiomatization.
English sentence:
[Figure: Transfer approach: Analysis -> SL representation -> Transfer -> TL representation -> Synthesis]
[Figure: Interlingua approach: SL1..SLn text -> Analysis (SL grammars) -> Interlingua representation -> Synthesis (TL grammars) -> TL1..TLn text]
[Figure: Source language -> Interlingua -> TL1, TL2, TL3, TL4, TL5, TL6]
Overview
Two major advantages of Interlingua method
2. Interlingual representations can also be used by NLP
systems for other multilingual applications
Overview
Sounds great, but due to many complexities:
Only one interlingua MT system has ever been made
operational in a commercial setting:
the KANT (Knowledge-based, Accurate Natural-language
Translation) system
Only a few have been taken beyond the research prototype stage
Statistical Machine Translation (SMT)
Deals with automatically mapping sentences in one human
language (for example French) into another human language (such
as English).
The first language is called the source and the second language is
called the target.
There are many SMT variants, depending upon how translation is
modelled.
Some approaches are in terms of a string-to-string mapping, some
use tree-to-string models, and some use tree-to-tree models.
All share in common the central idea that translation is automatic,
with models estimated from parallel corpora (source-target pairs)
and also from monolingual corpora (examples of target sentences).
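One common way to make the roles of the two kinds of corpora concrete is the noisy-channel formulation, given here as an illustration rather than as the specific model these slides have in mind. The symbols are assumptions introduced for this sketch: f is the source sentence (e.g. French) and e is a candidate target sentence (e.g. English).

```latex
\hat{e} \;=\; \arg\max_{e} P(e \mid f) \;=\; \arg\max_{e} P(f \mid e)\, P(e)
```

Here the translation model P(f | e) is estimated from parallel corpora, while the language model P(e) is estimated from monolingual target-language corpora. The denominator P(f) from Bayes' rule is dropped because it does not depend on e.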