Speech translation

Speech translation is the process by which conversational spoken phrases are instantly translated and spoken aloud in a second language. This differs from phrase translation, in which the system translates only a fixed, finite set of phrases that have been manually entered into it. Speech translation technology enables speakers of different languages to communicate, which makes it valuable for science, cross-cultural exchange and global business.

How it works
A speech translation system typically integrates three software technologies: automatic speech recognition (ASR), machine translation (MT) and voice synthesis (text-to-speech, TTS).
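The three-stage architecture can be pictured as a simple chain of functions, where the output of each module becomes the input of the next. The sketch below is a minimal illustration under stated assumptions, not the design of any actual product: the class name `SpeechTranslationPipeline`, the stage signatures and the stub stages are all hypothetical, and audio is modeled as a plain list of float samples.

```python
from dataclasses import dataclass
from typing import Callable, List

# Assumption for illustration: audio is a plain list of float samples.
Audio = List[float]


@dataclass
class SpeechTranslationPipeline:
    """Hypothetical chain of the three stages: ASR -> MT -> TTS.

    Each stage is an injected function, so any recognizer, translator or
    synthesizer with a matching signature can be plugged in.
    """

    recognize: Callable[[Audio], str]   # ASR: language-A speech -> language-A text
    translate: Callable[[str], str]     # MT: language-A text -> language-B text
    synthesize: Callable[[str], Audio]  # TTS: language-B text -> language-B speech

    def run(self, speech: Audio) -> Audio:
        source_text = self.recognize(speech)
        target_text = self.translate(source_text)
        return self.synthesize(target_text)


# Stub stages standing in for real engines (all hypothetical).
pipeline = SpeechTranslationPipeline(
    recognize=lambda audio: "hello",
    translate=lambda text: {"hello": "bonjour"}.get(text, text),
    synthesize=lambda text: [0.0] * len(text),  # one silent sample per character
)
output = pipeline.run([0.1, -0.2, 0.3])
```

Keeping the stages as interchangeable functions mirrors the modular structure described above: a better recognizer or translator can be swapped in without touching the other two modules.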
The speaker of language A speaks into a microphone, and the speech recognition module recognizes the utterance. It compares the input with a phonological model built from a large corpus of speech data from multiple speakers. The input is then converted into a string of words using the dictionary and grammar of language A, which are based on a massive corpus of text in language A.
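The recognition step can be thought of as a search for the word string that both matches the audio and fits the grammar of language A. The sketch below is a heavy simplification under stated assumptions: the acoustic log-scores (standing in for the comparison with the phonological model) are made-up numbers, and the "grammar" is a hypothetical bigram table of word-pair log-probabilities rather than a model trained on a real text corpus. Real recognizers search over vastly more hypotheses.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical bigram "grammar" for language A: log P(word | previous word).
# All probabilities are invented for illustration.
BIGRAM_LOGPROB: Dict[Tuple[str, str], float] = {
    ("<s>", "recognize"): math.log(0.2),
    ("recognize", "speech"): math.log(0.5),
    ("<s>", "wreck"): math.log(0.01),
    ("wreck", "a"): math.log(0.1),
    ("a", "nice"): math.log(0.1),
    ("nice", "beach"): math.log(0.1),
}
UNSEEN_LOGPROB = math.log(1e-6)  # fallback for word pairs missing from the table


def grammar_score(words: List[str]) -> float:
    """Sum of bigram log-probabilities, starting from a sentence-start token."""
    score = 0.0
    prev = "<s>"
    for word in words:
        score += BIGRAM_LOGPROB.get((prev, word), UNSEEN_LOGPROB)
        prev = word
    return score


def best_transcription(candidates: Dict[str, float]) -> str:
    """Pick the candidate with the highest acoustic score + grammar score.

    `candidates` maps a word string to a (hypothetical) acoustic log-score.
    """
    return max(candidates, key=lambda text: candidates[text] + grammar_score(text.split()))


# Two acoustically similar hypotheses; the second matches the audio slightly better.
hypotheses = {"recognize speech": -12.0, "wreck a nice beach": -11.5}
transcript = best_transcription(hypotheses)
```

Here the acoustic score alone would favor "wreck a nice beach", but adding the grammar score makes "recognize speech" the winner, which is the role the dictionary and grammar play in the description above.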
The machine translation module then translates the recognized string of words. Early systems replaced every word with a corresponding word in language B. Current systems do not translate word for word; instead, they take the entire context of the input into account to generate an appropriate translation.
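The weakness of the early word-for-word approach is easy to demonstrate. The sketch below assumes a tiny hypothetical English-to-French word table; it substitutes each word independently, exactly as the early systems described above did, and therefore cannot reorder words.

```python
from typing import Dict

# Hypothetical English -> French word table, for illustration only.
WORD_TABLE: Dict[str, str] = {
    "the": "le",
    "cat": "chat",
    "white": "blanc",
}


def word_for_word(sentence: str, table: Dict[str, str]) -> str:
    """Early-style MT: replace each word independently, ignoring context.

    Words missing from the table are passed through unchanged.
    """
    return " ".join(table.get(word, word) for word in sentence.lower().split())


result = word_for_word("The white cat", WORD_TABLE)  # "le blanc chat"
```

Every individual word is translated correctly, yet the output is ungrammatical: French normally places this adjective after the noun ("le chat blanc"). Producing the correct order requires looking at the sentence as a whole, which is why current systems translate using the entire context.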
The generated translation is sent to the speech synthesis module, which estimates the pronunciation and intonation matching the string of words based on a corpus of speech data in language B. Waveforms matching the text are selected from this database, and the synthesis module connects and outputs them.[1]
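The final "select and connect" step can be sketched as looking up recorded units and joining them end to end. This is a minimal illustration under stated assumptions: the unit database `UNIT_DB` and its sample values are made up, units are whole syllables rather than the smaller units real systems use, and the short linear crossfade at each seam is one simple smoothing choice, not necessarily what any particular synthesizer does. Pronunciation and intonation estimation is omitted.

```python
from typing import Dict, List

# Hypothetical unit database for language B: each unit maps to a short
# recorded waveform (list of samples). The values are made up.
UNIT_DB: Dict[str, List[float]] = {
    "bon": [0.0, 0.4, 0.8, 0.4],
    "jour": [0.2, 0.6, 0.2, 0.0],
}


def crossfade_concat(a: List[float], b: List[float], overlap: int) -> List[float]:
    """Join two waveforms, linearly blending `overlap` samples at the seam."""
    overlap = min(overlap, len(a), len(b))
    if overlap == 0:
        return a + b
    blended = []
    for i in range(overlap):
        w = (i + 1) / (overlap + 1)  # weight moves from waveform a toward b
        blended.append((1 - w) * a[len(a) - overlap + i] + w * b[i])
    return a[:-overlap] + blended + b[overlap:]


def synthesize(units: List[str], db: Dict[str, List[float]], overlap: int = 1) -> List[float]:
    """Select each unit's waveform from the database and connect them in order."""
    output: List[float] = []
    for unit in units:
        waveform = db[unit]  # raises KeyError if the unit was never recorded
        output = list(waveform) if not output else crossfade_concat(output, waveform, overlap)
    return output


speech = synthesize(["bon", "jour"], UNIT_DB)
```

With `overlap=1`, the two 4-sample units share one blended sample, giving a 7-sample output; with `overlap=0` they are simply appended.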

History
In 1983, NEC Corporation demonstrated speech translation as a concept exhibit at the ITU
Telecom World (Telecom '83).[2]
In 1999, the C-Star-2 consortium demonstrated speech-to-speech translation between five languages: English, Japanese, Italian, Korean and German.[3][4]

Features
Apart from the problems involved in text translation, speech-to-speech translation has to deal with problems of its own: the incoherence of spoken language, its looser grammatical constraints, unclear word boundaries, the correction of speech recognition errors and multiple optional inputs. On the other hand, speech-to-speech translation also has advantages over text translation: spoken language tends to have a less complex structure and a smaller vocabulary.

Research and development
Research and development has gradually progressed from relatively simple to more advanced translation. International evaluation workshops were established to support the development of speech translation technology. They allow research institutes to cooperate and compete with each other at the same time. Each workshop is run as a kind of contest: the organizers provide a common dataset, and the participating research institutes build systems that are then evaluated on it. In this way, efficient research is promoted.
The International Workshop on Spoken Language Translation (IWSLT), organized by C-STAR,
an international consortium for research on speech translation, has been held since 2004. "Every
year, the number of participating institutes increases, and it has become a key event for speech
translation research."[1]

Standards
As more countries begin to research and develop speech translation, it will be necessary to standardize interfaces and data formats so that the systems are mutually compatible. International joint research is being fostered by speech translation consortia (e.g. the C-STAR international consortium for joint research on speech translation and A-STAR for the Asia-Pacific region). They were founded as "international joint-research organization[s] to design formats of bilingual corpora that are essential to advance the research and development of this technology ... and to standardize interfaces and data formats to connect speech translation module internationally".[1]

Applications
Today, speech translation systems are used throughout the world, for example in medical facilities, schools, police departments, hotels, retail stores and factories. These systems are applicable anywhere spoken language is used to communicate. One popular application is Jibbigo, which works offline.

Challenges and future prospects
Speech translation products that instantly translate free-form, multilingual conversations in continuous speech are currently available. Challenges in accomplishing this include speaker-dependent variations in speaking style and pronunciation, which have to be dealt with in order to provide high-quality translation for all users. Moreover, in real-world use, speech recognition systems must be able to cope with external factors such as acoustic noise or speech by other speakers. Because the user does not understand the target language, a method "must be provided for the user to check whether the translation is correct, by such means as translating it again back into the user's language".[1] To erase the language barrier worldwide, many languages have to be supported. This requires speech corpora, bilingual corpora and text corpora for each of the estimated 6,000 languages in the world today.
Because collecting corpora is extremely expensive, gathering data from the Web could be an alternative to conventional methods. "Secondary use of news or other media published in
multiple languages would be an effective way to improve performance of speech translation."
However, "current copyright law does not take secondary uses such as these types of corpora
into account" and thus "it will be necessary to revise it so that it is more flexible."[1]
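The back-translation check mentioned under the challenges above (translating the output again into the user's language) can be sketched as follows. This is an illustration under stated assumptions: the function `round_trip_check` and the toy phrase tables are hypothetical, and the automatic similarity score (Python's `difflib.SequenceMatcher.ratio`) with a 0.8 cutoff is an arbitrary stand-in for the user's own judgment, not a standard method.

```python
from difflib import SequenceMatcher
from typing import Callable, Tuple


def round_trip_check(
    source: str,
    forward: Callable[[str], str],
    backward: Callable[[str], str],
    threshold: float = 0.8,
) -> Tuple[str, str, bool]:
    """Translate `source`, translate the result back, and report whether the
    back-translation is close enough to the original.

    The 0.8 threshold is an arbitrary illustrative cutoff, not a standard value.
    Returns (translation, back_translation, looks_ok).
    """
    target = forward(source)
    back = backward(target)
    similarity = SequenceMatcher(None, source.lower(), back.lower()).ratio()
    return target, back, similarity >= threshold


# Toy phrase tables standing in for real MT engines (hypothetical).
EN_FR = {"where is the station": "où est la gare"}
FR_EN = {"où est la gare": "where is the station"}

translation, back_translation, looks_ok = round_trip_check(
    "Where is the station",
    forward=lambda s: EN_FR.get(s.lower(), s),
    backward=lambda s: FR_EN.get(s.lower(), s),
)
```

In practice the back-translation itself would be shown or spoken to the user, who decides whether it matches what they meant; the boolean only illustrates how a system might flag suspicious translations automatically.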
