Developing Speech To Text Messaging System Using Android Platform
Supervised By
Dr. Wint Pa Pa Kyaw
Associate Professor
Candidate
Ma Htet Yi Zaw
3PhDCom - 2
Department of Computer Studies
University of Yangon
6-March-2020
Main Title :
1PhD Regular Title :
2PhD Regular Title :
3PhD Regular Title :
Contents
1. Introduction
2. Data Preparation for Building Models
3. Setting Up the Environment
4. Building Acoustic Model
5. Building Phonetic Dictionary
6. Building Language Model
7. Conclusion
Introduction
Limitations of Speech Recognition Models
Data Preparation for Building Models
• An acoustic model contains the acoustic properties for each state of
each phone.
• A phonetic dictionary contains a mapping from words to phones.
• A language model is used to restrict the word search.
Text Preparation
• A list of words likely to be spoken in messaging is selected.
Speech Corpus
• Method 1: gather speech that has already been recorded and
manually transcribe it into text.
• Method 2: create the text corpus first, then record speakers
reading the collected text.
• The second method is used here to collect daily conversational data.
• Speakers: 4 male and 4 female.
• 20 sentences of general messaging dialogs are recorded.
Transcription File
• Gives the words spoken in each recording.
• The file contains one line for each audio file used in training.
• Each line contains the text of the words spoken, followed by the
file name without its extension (such as .wav).
• The speaker's dialogue is transcribed exactly as it was recorded,
wrapped in silence tags (starting tag <s>, ending tag </s>) and
followed by the file ID that identifies the utterance. For example:
သူငယ်ချင်းရေ <s> ငါတို့ </s>, <s> မနက်ဖြန် </s>, ဆုံရအောင်
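The line layout described above can be sketched in a few lines of Python. This is a minimal sketch, assuming the usual CMUSphinx training layout of `<s> words </s> (file_id)` plus a companion list of file IDs; the file IDs (`speaker_1/file_1` and so on) and the function names are hypothetical, and the two sentences are taken from the example sentences in this presentation.

```python
def transcription_line(text: str, file_id: str) -> str:
    """Wrap one utterance in <s> ... </s> and append its file ID in parentheses."""
    words = " ".join(text.split())  # collapse any repeated whitespace
    return f"<s> {words} </s> ({file_id})"


def build_training_lists(utterances):
    """Build transcription and file-ID file contents from (file_id, text) pairs."""
    transcription = [transcription_line(text, fid) for fid, text in utterances]
    file_ids = [fid for fid, _ in utterances]
    return "\n".join(transcription) + "\n", "\n".join(file_ids) + "\n"


# Hypothetical file IDs (no .wav extension), one per recorded utterance.
utterances = [
    ("speaker_1/file_1", "မနက်ဖြန် တွေ့ရအောင်"),
    ("speaker_1/file_2", "နေကောင်းရဲ့လား"),
]

transcription_text, fileids_text = build_training_lists(utterances)
print(transcription_text, end="")
print(fileids_text, end="")
```

Keeping the file ID list and the transcription lines in the same order matters, because each line of one file is matched against the corresponding line of the other during training.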
Pronunciation Dictionary
• Maps words to their pronunciations.
• A dictionary can also contain alternative pronunciations:
a single word may have multiple pronunciations.
အပြုံး a pjoun:
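As an illustration of how alternative pronunciations can be stored, here is a minimal Python sketch. It assumes the CMUSphinx dictionary convention of one entry per line, with the second and later variants of a word written as `word(2)`, `word(3)`, and so on. The function name and the English sample entries are illustrative assumptions; only the `အပြုံး` entry comes from this presentation.

```python
def format_dictionary(pronunciations):
    """Render {word: [phone list, ...]} as dictionary lines.

    The first pronunciation uses the bare word; later variants are
    numbered as word(2), word(3), ...
    """
    lines = []
    for word in sorted(pronunciations):
        for index, phones in enumerate(pronunciations[word], start=1):
            entry = word if index == 1 else f"{word}({index})"
            lines.append(f"{entry} {' '.join(phones)}")
    return "\n".join(lines) + "\n"


# Illustrative entries: "hello" has two pronunciation variants.
sample = {
    "hello": [["HH", "AH", "L", "OW"], ["HH", "EH", "L", "OW"]],
    "world": [["W", "ER", "L", "D"]],
    "အပြုံး": [["a", "pjoun:"]],
}

print(format_dictionary(sample), end="")
```

Every phone used in the dictionary must also be known to the acoustic model, so in practice the phone set is fixed before the dictionary is written.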
Setting Up the System Environment
Hardware Requirements
• Android mobile device running version 2.2 or later.
• Processor: at least 500 MHz.
• RAM: at least 170 MB.
• SD card: at least 512 MB.
• USB debugging must be enabled on the device.
Software Requirements
• Android mobile operating system, version 2.2 or later.
• IDE tools: Eclipse or Android Studio.
• User interface: XML.
• Code behind: Java and XML.
• Internet connection: required.
CMUSphinx Toolkit
Training an Acoustic Model
Example of the Sentences in the Acoustic Model
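မင်္ဂလာပါ (Hello)
သူငယ်ချင်း (Friend)
နေကောင်းရဲ့လား (How are you?)
ဒီဟာဘယ်လောက်ကျလဲ (How much is this?)
ကားဂိတ်က ဘယ်မှာလဲ (Where is the bus stop?)
ကျေးဇူးတင်ပါတယ် (Thank you)
ထမင်း စားပြီးပြီလား (Have you eaten?)
မနက်ဖြန် တွေ့ရအောင် (Let's meet tomorrow)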
Building a Language Model
How to Build a Statistical Language Model
There are many approaches and tools for creating a statistical
language model.
The CMU language modeling toolkit will be used to create an n-gram
language model.
The output language model file is in ARPA format or binary format.
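To make the idea of an n-gram language model concrete, the following Python sketch estimates bigram probabilities from a tiny text corpus by simple counting. It is a toy illustration under stated assumptions, not the CMU toolkit: the function names and the English sample sentences are hypothetical, and the estimates are unsmoothed relative frequencies, whereas a real toolkit also applies discounting and back-off before writing the ARPA file.

```python
from collections import Counter


def train_bigram(sentences):
    """Count context words and bigrams, padding each sentence with <s> and </s>."""
    contexts = Counter()
    bigrams = Counter()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        contexts.update(tokens[:-1])             # every token that has a successor
        bigrams.update(zip(tokens, tokens[1:]))  # adjacent word pairs
    return contexts, bigrams


def bigram_prob(contexts, bigrams, prev, word):
    """Unsmoothed estimate of P(word | prev) = count(prev, word) / count(prev)."""
    if contexts[prev] == 0:
        return 0.0
    return bigrams[(prev, word)] / contexts[prev]


# Hypothetical messaging sentences used as the text corpus.
corpus = ["see you tomorrow", "see you soon", "thank you"]
contexts, bigrams = train_bigram(corpus)

print(bigram_prob(contexts, bigrams, "see", "you"))
print(bigram_prob(contexts, bigrams, "you", "tomorrow"))
```

Word pairs that never occur in the corpus get probability zero here, which is how such a model restricts the word search; smoothing in a real toolkit relaxes this so that unseen pairs remain possible but unlikely.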
Overview of Speech Recognizer
Selecting Next Set of States
Conclusion
QUESTIONS & ANSWERS
THANK YOU SO MUCH