
Computer Speech/Voice Recognition
- IBM ViaVoice® -
April 2005
IBM PC Club

Bernhard Krevet, IBC, Napa


Overview
• Definitions
• Categories of speech recognition software
• Products: Dragon NaturallySpeaking, ViaVoice
• ViaVoice® by IBM for Windows & Mac
– System Requirements 2003
– System Requirements 1994 & 1997
– Installation experience

• Resources (Web) & Comments
• Using Speech Recognition
• Demo
Speech Recognition (1/3)
... refers to the process by which a person dictates a phrase that the computer translates into typed text.

The dictated words can be interpreted as a command or stored as the words in a document.
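
The command-or-text split described above can be sketched in a few lines of Python. This is only an illustration of the idea, not how ViaVoice works internally: the command phrases and the `route_utterance` helper are hypothetical.

```python
# Hypothetical command phrases; a real product ships its own command set.
COMMANDS = {
    "new paragraph": "\n\n",
    "open quote": "\u201c",
    "close quote": "\u201d",
}

def route_utterance(utterance, document):
    """Treat a recognized phrase as a command if it matches one, else as dictated text."""
    key = utterance.strip().lower()
    if key in COMMANDS:
        document.append(COMMANDS[key])        # interpreted as a command
        return "command"
    document.append(utterance.strip() + " ")  # stored as words in the document
    return "text"

document = []
for phrase in ["Dear Sir", "new paragraph", "thank you for your letter"]:
    route_utterance(phrase, document)
print("".join(document))
```

An exact-match lookup keeps the sketch short; it also shows why a dictated phrase that happens to equal a command is ambiguous without further context.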
Speech Recognition (2/3)
What It Does
• Transform spoken words into written text or commands
• Recognize context (e.g. differentiate homonyms)
• Learn from you
• Use personal voice model
• Extensible vocabularies
• Support many languages
Speech Recognition (3/3)
What It Does Not
• Accept more than one person talking at the same time
• "Understand"
• Think / Create ideas
• Organize
• Replace a secretary
Categories of speech recognition software (1/2):

• Continuous speech, which means speaking words without pauses in between. It's not quite "natural," but it's close.

– Natural user interface, with things like natural language commands. Instead of using a set of specified commands, you would say what you want, and the computer would take the appropriate action. The programs available today aren't fully "natural" yet, especially since they usually let you be "natural" only in certain applications.

– Short training period: the software makers are looking for what is known as "speaker independence." The hope is that someday you'll be able to sit down at a strange computer and tell it what to do, or record somebody and then have the computer do the transcribing.

• Discrete speech dictation (pause between words)
Categories of speech recognition software (2/2):

• Programs geared toward specific tasks
– Speech-enabled PC apps recognize commands
– VoicePilot
– EasyVoice
– ASR (Automatic Speech Recognition)

Platforms:
• Windows
• Macintosh
• Unix
• OS/2 (comes with IBM's discrete speech engine)
Popular Speech Recognition Products

Talk To
Me !
“Dragon NaturallySpeaking 7 is the most accurate and full-
featured Dragon NaturallySpeaking ever released! Accuracy up to
99%! 15% Accuracy Improvement. Breakthroughs in speech
engine technology deliver the largest single accuracy
improvement ever for a Dragon NaturallySpeaking release.”

PC Magazine - May 2003:


"ScanSoft's Dragon NaturallySpeaking Preferred 7 makes
dictation, correction, and voice control of your PC faster and
easier than any voice recognition software yet." "...the new auto-
punctuation option worked admirably at adding commas and
period to our dictations; it should be ideal for casual dictation
such as e-mail or online chat."
ViaVoice Characteristics

IBM ViaVoice® technology, available on the Windows, Macintosh and handheld computer platforms, can afford a 'multi-modal' environment, freeing users from dependence on the mouse, keyboard and stylus for many applications.

ViaVoice® personal computer software leverages generations of IBM voice recognition research and accomplishment. ViaVoice for Windows Release 10 product family offers a complete portfolio appealing to every level of user expertise, and our ViaVoice for Mac offerings were the first continuous speech products on the Apple Macintosh platforms in the consumer marketplace.
• Windows products
– Pro USB Edition: Flagship edition, featuring a digitally-enhanced stereo headset microphone.
– Advanced Edition: Productivity tool with new command and control features.
– Standard Edition: Great dictation accuracy for the home/home office.
– Personal Edition: Introduction to natural, continuous speech recognition on the PC.

• Macintosh products
– ViaVoice for Mac OS X Edition: the sleek "Aqua" look and feel
– Simply Dictation for Mac OS X: introduction to dictation on the Mac
ViaVoice® Pro USB Windows
System Requirements (2003)
• > 300 MHz processor, > 128MB RAM
• 500MB available hard drive space
• Sound card with microphone jack, USB
• CD ROM drive (for installation)

• Windows 98SE, Me or XP
• MS Office for direct input, or any word processor with access to the clipboard (copy/paste)

• MSRP: $200 incl. headset ($100 upgrade)

Components / Prerequisites - 1996 -
Hardware:
– Pentium/100MHz processor, 24MB RAM
– Any sound card

Software:
– IBM's OS/2 WARP 4.0 ($189), which included:
  • OS/2 Speech Recognition SW
  • Headset microphone with ANC
– IBM's "Simply Speaking" for Windows 95 ($600)
– Any word processor or editor with access to the clipboard (copy/paste)
Components / Prerequisites - 1994 -
Hardware:
– 486/33MHz processor, 65MB disk space
– IBM VoiceType Dictation adapter (ISA, MC, PCMCIA)
– Unidirectional microphone
– Powered speakers or headphones

Software:
– IBM VoiceType Dictation Program Product
– Any word processor or editor with access to the clipboard (copy/paste)

Price: ~ $1000.00 for VTD HW and SW

ViaVoice® Setup
• Installation
– SW installation
– Registration of each user / language
• Training
– Human: about 90 min reading predefined texts
– Computer: about 30 min processing of personal language voice model
ViaVoice® Installation Experience January 2005

• Installation of two languages, must be same version (USB Pro)
• 560MB hard drive space
• Headset on phone jacks or USB
• Check audio levels and record sample texts to build voice models
• First dictation: many errors, need to improve voice model
• Check your voice, drink water… initially tedious correction process
• Web advice: use special SpeakPad (not Word) with open correction window
• Learn how to use the Correction Window
• File (save) sessions to give program a chance to improve the model
• Some idiosyncrasies, e.g. "OPEN-QUOTE" "CLOSE-QUOTE"
• Analyze existing documents to add specific words to vocabulary (only supports .doc & .txt, not even IBM Lotus's own WordPro .lwp)
• Manage vocabulary - OK
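
Analyzing an existing document for vocabulary amounts to finding the words the recognizer does not know yet. A minimal sketch, assuming a plain-text document and a set of already-known words; `find_new_words` and the sample vocabulary are hypothetical and do not reflect ViaVoice's actual analyzer.

```python
import re
from collections import Counter

def find_new_words(text, known_words, min_count=1):
    """Return words in text that are missing from known_words, most frequent first."""
    words = re.findall(r"[A-Za-z']+", text.lower())
    counts = Counter(w for w in words if w not in known_words)
    return [w for w, n in counts.most_common() if n >= min_count]

# Hypothetical known vocabulary and sample sentence.
known = {"the", "uses", "a", "headset", "and"}
sample = "The ViaVoice headset uses USB and the ViaVoice SpeakPad"
print(find_new_words(sample, known))
```

Raising `min_count` limits the candidates to words that recur in the document, which keeps one-off typos out of the vocabulary.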
Voice Recognition Sites – Most Popular (Yahoo)
Lernout & Hauspie - provider of speech and language products, technologies, and services,
including speech recognition, text to speech, compression, and translation.
Dragon NaturallySpeaking - family of software products that turn speech into text.
Nuance Communications - provides enterprise-level speech recognition and speaker
verification software to automate v-commerce and communications transactions.
General Magic - voice infrastructure software company that provides enterprise-class software
and supporting voice dialog design and hosting services.
SpeechWorks International - provider of speech recognition, text-to-speech (TTS), and
speaker verification for network and embedded environments.
Philips Speech Processing - large vocabulary continuous speech recognition products for
PCs. Also Digital Dictation devices and solutions for the medical and legal area.
Sensory, Inc. - low-cost integrated circuit providing speech recognition, speech synthesis,
music synthesis and 8-bit micro controller.
IBM Voice Systems - offering the ViaVoice line of speech recognition software.
Fonix Corporation - voice recognition technology featuring automatic speech recognition
(ASR) using neural network (artificial intelligence) techniques.
Conversá - develops speech-enabled software and hardware that allows users a conversational
way of interacting with their computers.
http://www.out-loud.com/

This site is intended to help people using speech recognition software, whatever the variety, and to do so without the filters of vendors. We have our own filters, of course, so please read critically.
By Susan Fulton, longtime user of speech recognition and assorted gadgets for easier, less painful computing.
http://www.voicerecognition.net/

List established in January 1996 for discussing all aspects of using voice recognition input systems. The focus is on effective use of voice recognition.
Sample topics:
• Using such systems safely, without muscle or voice strain;
• Techniques for improving recognition accuracy;
• How to set up the physical voice workstation optimally;
• General tips for effective use of voice interfaces;
• Configuration of specific systems, troubleshooting, etc.
Voice Recognition Dead End?

You've heard of killer apps? How about an app killer? This is what voice recognition has become over the years, because for the most part, it doesn't work. I see no evidence that it ever will, at least not in the sense that we can achieve true voice dictation capability.
[...]
John C. Dvorak, January 25, 2002
Using Speech Recognition
• The Verdict Is in: Speech Recognition Software Can Increase Productivity in the Legal Environment
• The Use of Speech Technologies in Mobile Computing
• Creating Access for People with Disabilities through Speech and Language Technologies
• Voice Recognition Software Versus a Traditional Transcription Service for Physician Charting in the ED
• From RSI to ROI: Speech Recognition Can Increase Productivity and Reduce Repetitive Stress Injuries in Your Workplace
• Automating the Electronic Forms Process
• Leading Electronic Medical Records Program Incorporates Dragon NaturallySpeaking® Speech Recognition
• Dragon NaturallySpeaking® Speech Recognition Benefits Not Only Doctors Who Use it, but Patients Who Rely on Them
CONVERSATION
Q. What is one divided by zero.
A. Error in line above: malformed?

Q. What is one divided by zero?
A. Error in line above: attempting division by zero.

Q. What is one divided by one.
A. Error in line above: malformed?

Q. What is one divided by one?
A. One. Problem trivial for my capacity.
Please consult supervisor before typing in further instructions.

Q. Why?
A. Question not quantified or answerable by "yes" or "no."
This is fourth malformed or erroneous question typed in during your time interval.
Consult supervisor about formation of instructions.
Cost effectiveness ratio for this machine prohibits further use by this operator until recycling code is typed in.
Sorry.

Q. Why?
A. KEYBOARD NOW BLOCKED.
ViaVoice® Demonstration

Talk To
Me !
