Mason Leary
Professor Adam Wooten
Introduction to Computer Assisted Translation
14 November 2014
The History of Speech-to-Speech Translation Technology
Speech-to-speech machine translation, also referred to as spoken language translation or voice translation, is a relatively recent development in the realm of machine translation. The concept was first presented at the 1983 ITU Telecom World conference by the Japanese company NEC Corporation (Nakamura 2). Researchers knew it would take years or even decades to implement speech-to-speech translation technology, and as a result, the Advanced Telecommunications Research Institute International (ATR) was founded in 1986 and began a project on speech-to-speech translation research. The project included researchers both in Japan and from around the world. In 1993, the first experiment in speech-to-speech translation took place, linking three sites around the world: ATR, Carnegie Mellon University, and Siemens. With the start of the ATR project, other projects began to spring up around the world.
One of these projects was the Verbmobil project in Germany.
Germany's Federal Ministry of Research and Technology funded the project with 65 million marks (approximately $41 million), and private investors from industry contributed another 31 million marks (about $19 million) (Der Spiegel 1). The project was headed by Wolfgang Wahlster with a team of around 100 colleagues; it began in 1993 and ran until 2000. The goal of the project was to create a machine that understands spoken German or Japanese and correctly translates it into English. The domain of what was to be translated was limited to reservation systems.
These core technologies needed to perform at a high level in order to enable more efficient domain-independent speech-to-speech translation systems, and the gap between human and machine translation performance needed to be narrowed. The project was a success after only three years of research, with usable systems developed for European English, European Spanish, and Mandarin Chinese.
On March 18, 2005, the U.S. Defense Advanced Research Projects Agency started a new project called GALE (Global Autonomous Language Exploitation) (Estival 179). The goal of this project was to "[eliminate] the need for linguists and analysts [and to] automatically interpret huge volumes of speech and text in multiple languages." The U.S. Government, particularly the military, needed computer software to "absorb, analyze and interpret huge volumes of speech and text in multiple languages" (Fügen et al. 218). This project was not full speech-to-speech translation but rather speech-to-text translation. The input speaker spoke either Arabic or Mandarin Chinese, and the English output came in the form of text. The output text was also consolidated and easy to understand.
Once smartphones became popular and more ubiquitous in 2007, many speech-to-speech translation system developers began to focus on this medium. Microsoft is one of these developers. With its speech recognition technologies, Microsoft is able to fine-tune its model-based training approach. Its engineers are also trying to address disfluency, the difference between the way people talk and the way they write. At Microsoft's Beijing and Redmond labs, the team has made great advances in its speech recognition technology.
One of the biggest game changers in Microsoft's speech-to-speech translation system is the use of deep neural networks. Deep neural networks are more effective because "deep architectures ... have the capacity to learn more complex models than shallow ones[, and] this expressivity and robust training algorithms allow for learning powerful object representations without the need to hand design features" (Szegedy 1).
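To make the idea of depth concrete, the short Python sketch below stacks several nonlinear layers; the layer sizes, random weights, and input vector are invented purely for illustration and do not reflect Microsoft's actual models.

    import numpy as np

    # Minimal sketch of a deep feed-forward network (illustrative only).
    # Each hidden layer applies a linear map followed by a nonlinearity;
    # stacking layers is what makes the architecture "deep" and lets it
    # represent more complex functions than a single shallow layer.

    rng = np.random.default_rng(0)

    def relu(x):
        return np.maximum(0.0, x)

    def forward(x, weights):
        h = x
        for W in weights[:-1]:
            h = relu(W @ h)        # hidden layers
        return weights[-1] @ h     # linear output layer

    # Hypothetical sizes: a 40-dimensional feature vector mapped to 10
    # classes through three hidden layers of 128 units each.
    sizes = [40, 128, 128, 128, 10]
    weights = [rng.normal(scale=0.1, size=(n_out, n_in))
               for n_in, n_out in zip(sizes[:-1], sizes[1:])]

    x = rng.normal(size=40)        # one made-up input vector
    print(forward(x, weights))     # 10 unnormalized class scores

In a real system the weights would be learned from training data rather than drawn at random; the point here is only the layered structure that the quotation describes.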
IBM is developing a speech-to-speech translation system, too. The technology consists of three parts: a speech recognition system, a text-to-text translation system, and a text-to-speech system (Hyman 17). The team of researchers is working to make all three components behave well, both individually and together. For speech recognition, the team needs to reduce the error rate and enable the system to detect unarticulated utterances; it has already improved error rates by 40% since last year. For the text-to-text system, the IBM team is trying to improve the handling of out-of-vocabulary words, i.e., dialect and slang. To this end, the team introduced a dialogue manager that prompts the speaker to clarify an unknown word. So far, the system is able to detect its inability to recognize a word 80% of the time; the team hopes to raise that to 95% over the next few years.
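As a sketch of how such a three-stage cascade fits together, the toy Python example below chains three stand-in components; the functions and the tiny word list are my own invented placeholders for illustration, not IBM's actual software.

    # Toy sketch of the speech recognition -> text-to-text -> text-to-speech
    # cascade. All three stages are invented stand-ins.

    # Stage 1: pretend the "audio" is already a transcript string.
    def recognize_speech(audio):
        return audio.lower().strip()

    # Stage 2: word-for-word lookup in a tiny made-up German-English lexicon.
    # Unknown words are flagged, loosely mirroring the dialogue manager's
    # prompt for out-of-vocabulary words described above.
    LEXICON = {"der": "the", "hund": "dog", "ist": "is", "hier": "here"}

    def translate_text(text):
        return " ".join(LEXICON.get(w, "<" + w + "?>") for w in text.split())

    # Stage 3: tag the text instead of actually synthesizing audio.
    def synthesize_speech(text):
        return "[spoken] " + text

    def speech_to_speech(audio):
        # The stages run strictly in sequence, so an error made early
        # (a misrecognized or unknown word) propagates to the spoken output.
        return synthesize_speech(translate_text(recognize_speech(audio)))

    print(speech_to_speech("Der Hund ist hier"))   # prints: [spoken] the dog is here

Because each stage consumes the previous stage's output, improving any one component, as the IBM team is doing, raises the quality of the whole pipeline.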
Google is another player in the speech-to-speech translation game. The company has been developing its machine translation system for over ten years and now offers speech-to-speech translation in its smartphone app. The app can translate between 72 languages and handles over a billion translations a day (Hyman 18). These large amounts of data, uploaded daily, help the system keep improving. Google is reluctant, however, to speculate on a timeline for fully functioning, smooth speech-to-speech translation, though the company believes it is very close.
AT&T is also working on a speech-to-speech translation system. Its team hopes to integrate all aspects of the process into one single step (AT&T). This is possible because the company already has a high-quality speech recognition system, called WATSON ASR, as well as a natural-sounding text-to-speech system, called Natural Voices. The system also weighs many recognition possibilities and constantly extracts data from large datasets in various domains to grow its corpora. It can run either in the cloud or on the device.
All components of speech-to-speech translation technology are being researched by hundreds of teams around the world.
The German word order of this sentence is very different from that of its English equivalent. The sentence also uses three different cases: nominative, dative, and accusative; I wanted to see whether the systems could properly identify them.
Jibbigo

German into English

Sentence | Rating | My Comments
1 | 2 | Speech recognition was bad the first few tries.
2 | 1 | Speech recognizer unable to interpret "Schröder."
3 | 1 | Couldn't differentiate based on context.
4 | 1 | Very bad translation.
5 | 2 | Correct translation but unidiomatic. Did not recognize the inflection denoting a question.
6 | 2 | Did not recognize my speech fully.
7 | 3 | Perfect translation.
Average: 1.7

English into German

Sentence | Rating | My Comments
1 | 2 | Syntax is right but the verb choice is unidiomatic. Proper tense used.
2 | 2 | Wrong future tense used.
Average: 2
Google Translate

German into English

Sentence | System Output | Rating | My Comments
1 | How are you? | 3 | Perfect.
2 | My name Dietrich Schröder. | 2 | I had to repeat "Schröder." Left out verb.
3 | The dog is an apple. | 1 | Couldn't differentiate based on context.
4 | She bought my dad a blue shirt. | 3 | Perfect translation.
5 | Then we are ready. | 2 | Did not recognize the inflection denoting a question. Left out particle.
6 | You'll stay here. | 1 | Did not translate particles or intonation.
7 | Tomorrow is Friday, right? | 3 | Perfect translation.
Average: 2.1

English into German

Sentence | System Output | Rating | My Comments
1 | Ich nehme den Zug Morgen um 9 a.m. | 1 | Syntax is right but the verb choice is unidiomatic. Proper tense used. Did not know what was meant by "a.m."
2 | Sie geht auf die Universität nächsten Herbst gehen. | 1 | Could not differentiate between the different uses of "to go." Used wrong preposition.
Average: 1
Voice Translator Free

German into English

Sentence | System Output | Rating | My Comments
1 | - | 3 | Perfect.
2 | I hot Dietrich Schröder. | 2 | Recognized what I said, but mistranslated "heiße" as "hot" (an alternate meaning of the word).
3 | The dog is an apple. | 1 | Couldn't differentiate based on context.
4 | She bought a blue shirt my father. | 2 | Bad syntax.
5 | Then were so done. | 1 | Bad translation.
6 | You stay here you. | 1 | Bad translation. Did not understand meaning of particles.
7 | Tomorrow is Friday, or? | 2 | Did not understand meaning of particle.
Average: 1.7

English into German

Sentence | System Output | Rating | My Comments
1 | Ich nehme den Zug Morgen um 9:00. | 2 | Syntax is right but the verb choice is unidiomatic. Proper tense used.
2 | Sie wird zur Universität gehen nächsten Herbst. | 2 | Right tense but wrong syntax.
Average: 2
Ratings Comparison

Sentence | Jibbigo | Google | Voice
German into English:
1 | 2 | 3 | 3
2 | 1 | 2 | 2
3 | 1 | 1 | 1
4 | 1 | 3 | 2
5 | 2 | 2 | 1
6 | 2 | 1 | 1
7 | 3 | 3 | 2
English into German:
1 | 2 | 1 | 2
2 | 2 | 1 | 2
Avg. 1 (German into English) | 1.7 | 2.1 | 1.7
Avg. 2 (English into German) | 2 | 1 | 2
My Conclusion:
Each of the systems I tested performed at a middling level.
Jibbigo's translations were adequate, but it handled syntax and context cues poorly. The speech recognizer had a hard time understanding my non-native, but intelligible, German accent. The speech synthesizer was also very hard to comprehend, and the user interface was not very good either.
Google Translate did well when translating from German into English but poorly when translating from English into German. The speech recognizer had very little trouble understanding my non-native German accent, and the speech synthesizer was easily comprehensible. The user interface on this app was the smoothest and easiest to use.
Voice Translator Free's translations were adequate, but it was bad at syntax and context cues. The speech recognizer worked very well, since it utilizes Google's speech recognition technology. However, the speech synthesizer was hard to comprehend, and the interface was full of glitches and not very smooth.
I would conclude that none of these three speech-to-speech apps performed clearly better than the others; all of them need improvements in machine translation, speech recognition, and speech synthesis to some extent.
Works Cited
1. "AT&T Researchers Inventing the Science Behind the Service." AT&T Labs Research.
Web. 24 Nov. 2014.
<http://www.research.att.com/projects/Speech_Translation/?fbid=8hKe8rrgmbR>.
2. "Enabling Cross-Lingual Conversations in Real Time - Microsoft Research." Enabling
Cross-Lingual Conversations in Real Time - Microsoft Research. Microsoft Research, 27
May 2014. Web. 24 Nov. 2014. <http://research.microsoft.com/en-us/news/features/translator-052714.aspx>.
3. Estival, Dominique. "The Language Translation Interface: A Perspective from the Users."
Machine Translation 19.2 (2005): 175-92. Print.
4. Fügen, Christian, Alex Waibel, and Muntsin Kolss. "Simultaneous Translation of Lectures
and Speeches." Machine Translation 21.4 (2007): 209-52. Print.
5. Hyman, Paul. "Speech-To-Speech Translations Stutter, But Researchers See Mellifluous
Future." Communications of the ACM 57.4 (2014): 16-19. Business Source Elite. Web. 22
Nov. 2014.
6. Lavie, Alon, Fabio Pianesi, and Lori Levin. "The NESPOLE! System for Multilingual
Speech Communication over the Internet." 1-10. Print.
7. "Mit Weltwissen Gefttert." Der Spiegel 1 May 1997. Print.
8. Nakamura, Satoshi. "Overcoming the Language Barrier with Speech Translation Technology."
Science & Technology Trends 31 (2009). Print.
9. "Skype Translator Demo from WPC 2014 - Microsoft Research." Skype Translator Demo
from WPC 2014 - Microsoft Research. Web. 24 Nov. 2014.
<http://research.microsoft.com/apps/video/default.aspx?id=225021>.
10. Szegedy, Christian, Alexander Toshev, and Dumitru Erhan. "Deep Neural Networks for
Object Detection." 1-9. Web.