Professional Documents
Culture Documents
Presented By: Funda Singh Raajput: Prakash Choudhary Ajay Prakash
Presented By: Funda Singh Raajput: Prakash Choudhary Ajay Prakash
Applications:
Services for the hearing impaired
Reading email aloud
words
Text
Prosody Concatenation
Pre-processing
Text Pre-Processing
Input
– String of characters (sentence)
Output
– String of diphone symbols
Objective
– Perform sentence level analysis
• Punctuation marks
• Pauses between words
– Convert all input to corresponding diphones
Text Pre-Processing (Block Diagram)
Diphone
Dictionary
Number Converter
Replace numerals with their textual
versions
100 one hundred
Diphone
Dictionary
Acronym Converter
Replace acronyms with single letter
components
A.B.C. ABC
Diphone
Dictionary
Word Segmenter
Divide sentence into word segments
– Special delimiter to separate segments
(i.e. ‘||’)
Segments can be:
– A single word
– An acronym
– A numeral
Identify punctuation marks
Text Pre-Processing (Block Diagram)
Diphone
Dictionary
Word To Diphone Converter
(Phonetization)
Purpose
– Translate words to their diphone
representations
Resource
– Dictionary of words and their diphones
(derived from CMU phoneme database)
– Over 175,000 words supported
W-to-D Converter Cont’d
Implementation
– Binary Search Algorithm in C
– Start with whole dictionary as search range
start index, end index, middle index
– If target word alphabetically less then middle
word,
then ignore second half (i.e. end index =
middle index)
else ignore first half (i.e. start index = middle
index)
– Repeat until word found or range contains zero
words
W-to-D Converter Cont’d
Advantages
– Fast search times
• Search range decreases exponentially with
each iteration (max of 1 sec currently)
– Less complicated to implement
• Compared to indexing dictionary or
• Importing the dictionary to an internal
structure
Text Pre-Processing (Block Diagram)
Diphone
Dictionary
The Multi-Level Data Structure
Contains all necessary data for the
next sub-system:
– Word
– Diphone representation
– Prosodic parameters for each diphone
• This reflects both word-level and sentence-
level prosody
Allows for modularization
Summary
TTS System
words
Text
Prosody Concatenation
Pre-processing
System modularized