Developing Speech To Text Messaging System Using Android Platform

Supervised By
Dr. Wint Pa Pa Kyaw
Associate Professor
Candidate
Ma Htet Yi Zaw
3PhDCom - 2
Department of Computer Studies
University of Yangon
6-March-2020
Main Title:
MYANMAR SPEECH TO TEXT SYSTEM ON ANDROID
1PhD Regular Title:
Study of Myanmar Language Acoustics Signal to Strings
1PhD Credit Title:
Modeling Approaches for Myanmar Language Speech Recognizer
Compatible Methods and Models for Myanmar Continuous Speech Recognition System
2PhD Credit Title:
Compatible Models on Speech to Text SMS Messaging System
Developing the Speech Recognition Models for Myanmar Language
3PhD Credit Title:
Developing Speech to Text Messaging System Using Android Platform
Contents
1. Introduction
2. Data Preparation for Building Models
3. Setting Up the Environment
4. Building Acoustic Model
5. Building Phonetic Dictionary
6. Building Language Model
7. Conclusion
Introduction
 A speech-to-text messaging system is a mobile application for writing SMS (Short Message Service) messages hands-free.
 It can help smartphone users send messages faster, and it gives people with disabilities who are unable to type a way to write their messages.
 To build the Myanmar-language speech recognizer, the Sphinx tools were chosen after a review of different tools.
 The data and the files/scripts required by these tools then have to be studied and prepared.
Limitations of Speech Recognition Models
 Vocabulary Size and Confusability
 Speaker Dependence and Independence
 Isolated, Discontinuous, or Continuous Speech
 Read and Spontaneous Speech
 Real-time Recognition and Recorded Samples
Data Preparation for Building Models
 An acoustic model contains the acoustic properties for each state of a phone.
 A phonetic dictionary contains a mapping from words to phones.
 A language model is used to restrict the word search.
These three entities are combined in an engine to recognize speech.
 The general data items of a typical speech recognizer are:
 Text Preparation
 Speech Corpus
 Transcription File
 Pronunciation Dictionary
 Language Model
 Phone File
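 Text Preparation
• A list of phrases that are likely to be used in messaging is selected.
မင်္ဂလာပါ (Hello)
အခု ဘယ်မှာလဲ (Where are you now?)
သူငယ်ချင်း (Friend)
အစည်းအဝေးခန်းမှာ (In the meeting room)
နေကောင်းရဲ့လား (How are you?)
ဖုန်းပြန်ခေါ်လိုက်မယ် (I will call you back)
ဒီဟာဘယ်လောက်ကျလဲ (How much is this?)
အိပ်ပြီလား (Are you asleep yet?)
ကားဂိတ်က ဘယ်မှာလဲ (Where is the bus stop?)
မနက်တွေ့မယ်လေ (See you in the morning)
ကျေးဇူးတင်ပါတယ် (Thank you)
လဘက်ရည်ဆိုင်သွားရအောင် (Let's go to the tea shop)
ထမင်း စားပြီးပြီလား (Have you eaten?)
ဒီနေ့ကျောင်းလာမှာလား (Are you coming to school today?)
မနက်ဖြန် တွေ့ရအောင် (Let's meet tomorrow)
လိပ်စာလေးပို့ပေးပါ (Please send me the address)
ဒီစာရရင် ဖုန်းပြန်ဆက်ပါ (Call back when you get this message)
ခဏနေရောက်မယ် (I will arrive shortly)
အခု မအားလို့ နောက်မှ ဖုန်းပြန်ခေါ်လိုက်မယ် (I am busy now, so I will call back later)
မုန့်ဝယ်ခဲ့ပါ (Please buy some snacks on the way)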
 Speech Corpus
• There are two ways to build a speech corpus:
• Gather speech that has already been recorded and manually transcribe it into text.
• Create the text corpus first, then record the speech by reading the collected text aloud.
• The latter method is used to collect daily conversational data.
• Speakers: 4 male and 4 female.
• Recordings: 20 sentences of general messaging dialogue.
 Transcription File
• Gives the words spoken in each recording.
• The file contains one line for each audio file used in training.
• Each line contains the text of the words spoken and the file name (without an extension such as .wav).
• The speaker's dialogue is written down exactly as it was recorded, with silence tags (start tag <s>, end tag </s>), followed by the file ID that represents the utterance. For example:
သူငယ်ချင်းရေ <s> ငါတို့ </s>, <s> မနက်ဖြန် </s>, ဆုံရအောင်
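A minimal Python sketch of how such a transcription file might be generated. It assumes the common SphinxTrain layout of one `<s> ... </s> (file_id)` line per utterance; the file IDs and the helper name `transcription_line` are hypothetical.

```python
def transcription_line(words, file_id):
    """Format one utterance as '<s> word word ... </s> (file_id)'."""
    return "<s> " + " ".join(words) + " </s> (" + file_id + ")"


# Hypothetical file IDs mapped to the words spoken in each recording.
utterances = {
    "speaker1_001": ["မနက်ဖြန်", "တွေ့ရအောင်"],
    "speaker1_002": ["နေကောင်းရဲ့လား"],
}

lines = [transcription_line(words, file_id)
         for file_id, words in utterances.items()]
print("\n".join(lines))
```

The file ID is written without the .wav extension, matching the rule stated on the slide.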
 Pronunciation Dictionary
• Maps words to pronunciations.
• A dictionary can also contain alternative pronunciations: a single word may have multiple pronunciations.
အပြုံး a pjoun:
တောင် ပြုံး taun pjoun: => taun bjoun:
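The sketch below shows one way alternative pronunciations could be written out. It assumes the Sphinx-style dictionary convention in which the second and later pronunciations of a word are marked `word(2)`, `word(3)`, and so on, and it reads the `=>` notation above as listing a second pronunciation; both are assumptions, as is the helper name `dictionary_lines`.

```python
def dictionary_lines(entries):
    """Return one 'word pronunciation' line per pronunciation.

    Alternative pronunciations get a (2), (3), ... suffix on the word.
    """
    lines = []
    for word, pronunciations in entries:
        for index, pronunciation in enumerate(pronunciations, start=1):
            key = word if index == 1 else f"{word}({index})"
            lines.append(f"{key} {pronunciation}")
    return lines


# Entries taken from the slide; syllables are joined into one word here
# because dictionary keys in this sketch must not contain spaces.
entries = [
    ("အပြုံး", ["a pjoun:"]),
    ("တောင်ပြုံး", ["taun pjoun:", "taun bjoun:"]),
]

for line in dictionary_lines(entries):
    print(line)
```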
 The general data items of a typical speech recognizer are:
 Language Model
A language model is used to restrict the word search.
Sample texts are collected for language model training.
The difficulty with such a collection is converting existing documents (such as PDFs, web pages, and scans) into spoken-text form.
That is, tags and headings must be removed, and numbers and abbreviations must be expanded to their spoken forms.
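A rough sketch of this kind of text normalization, using English stand-ins for illustration. The abbreviation table, the digit-by-digit number reading, and the function name `to_spoken_form` are all assumptions; a Myanmar system would need its own tables and number rules.

```python
import re

# Hypothetical lookup tables; a real system needs language-specific ones.
ABBREVIATIONS = {"dr.": "doctor", "no.": "number"}
DIGITS = ["zero", "one", "two", "three", "four",
          "five", "six", "seven", "eight", "nine"]


def to_spoken_form(text):
    """Strip markup tags, expand abbreviations, and spell out digits."""
    text = re.sub(r"<[^>]+>", " ", text)  # remove markup tags
    words = []
    for token in text.lower().split():
        if token in ABBREVIATIONS:
            words.append(ABBREVIATIONS[token])
            continue
        token = token.strip(".,!?")  # drop surrounding punctuation
        if re.fullmatch(r"[0-9]+", token):
            # Simplification: read numbers digit by digit.
            words.extend(DIGITS[int(d)] for d in token)
        elif token:
            words.append(token)
    return " ".join(words)


print(to_spoken_form("<p>Dr. Kyaw lives at no. 42.</p>"))
```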
Setting Up the System Environment
 Hardware Requirements
• Android mobile device running version 2.2 or later.
• Processor: at least 500 MHz.
• RAM: at least 170 MB.
• SD card: at least 512 MB.
• USB debugging must be enabled on the device.
 Software Requirements
• Android mobile operating system, version 2.2 or later.
• IDE tools: Eclipse or Android Studio.
• User interface: XML.
• Code behind: Java and XML.
• Internet: Yes.
CMUSphinx Toolkit
 State-of-the-art speech recognition algorithms for efficient speech recognition.
 The CMUSphinx toolkit is well suited to practical application development.
 Support for several languages, such as US English, UK English, French, and German, and the ability to build models for low-resourced languages such as Myanmar.
 A wide range of tools for many purposes related to speech recognition.
CMUSphinx Toolkit
 CMUSphinx contains a number of packages for different tasks and applications.
• Pocketsphinx — lightweight recognizer library written in C.
• Sphinxbase — support library required by Pocketsphinx
• Sphinx4 — adjustable, modifiable recognizer written in Java
• Sphinxtrain — acoustic model training tools
Training an Acoustic Model
 The acoustic model is trained by analyzing large corpora of labeled Myanmar-language speech.
 The Sphinxtrain tool is chosen to build the acoustic model for a new language, Myanmar.
 The trainer learns the parameters of the sound-unit models from a set of sample speech signals.
 This set is called a training database.
 The database contains the information required to extract statistics from the speech in the form of the acoustic model.
Example Sentences for the Acoustic Model
မင်္ဂလာပါ (Hello)
သူငယ်ချင်း (Friend)
နေကောင်းရဲ့လား (How are you?)
ဒီဟာဘယ်လောက်ကျလဲ (How much is this?)
ကားဂိတ်က ဘယ်မှာလဲ (Where is the bus stop?)
ကျေးဇူးတင်ပါတယ် (Thank you)
ထမင်း စားပြီးပြီလား (Have you eaten?)
မနက်ဖြန် တွေ့ရအောင် (Let's meet tomorrow)
 The speech corpus is created by recording speakers reading the texts above.
Building a Phonetic Dictionary
 A phonetic dictionary provides the system with a mapping of vocabulary words to sequences of phonemes.
 A dictionary can also contain alternative pronunciations.
က/ka
က/ga
စား/sa:
က စား/ga za:
 A dictionary should contain all the words to be recognized; otherwise the recognizer will not be able to recognize them.
 The recognizer looks for a word in both the dictionary and the language model.
 A word that is missing from the language model will not be recognized, even if it is in the dictionary.
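The requirement that a word appear in both places can be checked before decoding. The following is a small sketch of such a check; the function name and the toy dictionary and vocabulary are illustrative assumptions.

```python
def unrecognizable_words(words, dictionary, lm_vocabulary):
    """Return the words missing from the dictionary or the language model."""
    return [word for word in words
            if word not in dictionary or word not in lm_vocabulary]


# Toy data: both words are in the dictionary, but only one is in the
# language model vocabulary.
dictionary = {"ကား": ["ka:"], "ကောင်း": ["kaun:"]}
lm_vocabulary = {"ကား"}

print(unrecognizable_words(["ကား", "ကောင်း"], dictionary, lm_vocabulary))
```

Here the second word is reported because it has a pronunciation but no language model entry.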
Example of Phonetic Dictionary (Lexicon)
Part of Phonetic Dictionary

က ka.
ကာ ka
ကား ka:
ကိ ki
ကီ ki
ကောင်း kaun:
လည်းကောင်း le` kaun: => la gaun:
တောင် ပြုံး taun pjoun: => taun bjoun:
နတ် က တော် na’ ka. to => na’ ga do
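The phone file listed among the data items can be derived from a lexicon like the one above. This sketch assumes that each space-separated unit in a pronunciation counts as one sound unit and that a silence unit named `SIL` is added, as is common in SphinxTrain setups; the function name `phone_list` is hypothetical.

```python
def phone_list(lexicon):
    """Collect every distinct sound unit used in the lexicon, plus SIL."""
    units = {unit
             for pronunciations in lexicon.values()
             for pronunciation in pronunciations
             for unit in pronunciation.split()}
    units.add("SIL")  # silence unit, assumed to be required by the trainer
    return sorted(units)


# A few entries from the slide, with both pronunciations of the last word.
lexicon = {
    "ကာ": ["ka"],
    "ကား": ["ka:"],
    "ကောင်း": ["kaun:"],
    "လည်းကောင်း": ["le` kaun:", "la gaun:"],
}

print("\n".join(phone_list(lexicon)))
```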
Building a Language Model
 Language models help guide and constrain the search among alternative word hypotheses during recognition.
 The language model is an important component of the configuration: it tells the decoder which sequences of words can be recognized.
 There are several types of models:
 keyword lists
 grammars
 statistical language models
 They have different capabilities and performance properties.
Statistical Language Models
 Statistical language models contain the probabilities of words and word combinations.
 These probabilities are estimated from sample data, which automatically gives the model some flexibility.
 Every combination of words from the vocabulary is possible, although the probability of each combination will vary.
 They also require far less engineering effort than grammars.
 A language model can be stored and loaded in three different formats: text ARPA format, binary BIN format, and binary DMP format.
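To illustrate how probabilities are estimated from sample data while every word combination stays possible, here is a toy bigram model with add-one smoothing. Add-one smoothing is chosen only for simplicity and is an assumption of this sketch; language modeling toolkits generally use more refined discounting and back-off schemes.

```python
from collections import Counter


def train_bigram_model(sentences):
    """Return prob(previous, word), estimated with add-one smoothing."""
    contexts, bigrams, vocabulary = Counter(), Counter(), set()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        vocabulary.update(tokens[1:])   # every word that can be predicted
        contexts.update(tokens[:-1])    # every word that can precede another
        bigrams.update(zip(tokens, tokens[1:]))

    def prob(previous, word):
        return ((bigrams[(previous, word)] + 1)
                / (contexts[previous] + len(vocabulary)))

    return prob


prob = train_bigram_model(["see you tomorrow", "see you soon"])
print(prob("you", "tomorrow"))  # seen combination
print(prob("you", "see"))       # unseen combination, still above zero
```

The unseen pair receives a small but non-zero probability, which is what lets the decoder accept word sequences that never occurred in the sample text.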
How to Build a Statistical Language Model
 Prepare a reference text that will be used to generate the language model.
 The reference text is a set of sentences, each bounded by the sentence start and end markers <s> and </s>. More data generally produces a better language model.
 Generate the vocabulary file from the reference text.
 Edit the vocabulary file to remove unwanted words (numbers, misspellings, names).
 Generate the language model from the reference text and the vocabulary.
 Finally, convert the model to a binary format for faster loading.
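The vocabulary steps can be sketched in a few lines of Python. This is an illustration only, not the toolkit's own procedure: the function name, the `min_count` cutoff, and the rule of dropping purely numeric tokens are assumptions.

```python
import re
from collections import Counter


def build_vocabulary(reference_sentences, min_count=1):
    """Count words in the reference text and return a cleaned vocabulary."""
    counts = Counter()
    for sentence in reference_sentences:
        for word in sentence.split():
            if word not in ("<s>", "</s>"):  # markers are not vocabulary words
                counts[word] += 1
    return sorted(word for word, count in counts.items()
                  if count >= min_count
                  and not re.fullmatch(r"[0-9]+", word))  # drop numbers


reference = ["<s> see you tomorrow </s>", "<s> see you at 5 </s>"]
print(build_vocabulary(reference))
```

Misspellings and names still have to be removed by hand, as the slide notes.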
How to Build a Statistical Language Model
 There are many approaches and tools for creating a statistical language model.
 The CMU language modeling toolkit will be used to create an n-gram language model.
 The output language model file is in either ARPA or binary format.
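For orientation, the sketch below writes a unigram-only model in the text ARPA layout: a `\data\` header with n-gram counts, a `\1-grams:` section of base-10 log probabilities, and a closing `\end\` line. It is deliberately simplified and is an assumption-laden illustration: files produced by real toolkits also carry back-off weights and higher-order sections, and treat `<s>` specially.

```python
import math
from collections import Counter


def unigram_arpa(sentences):
    """Return a simplified unigram language model as ARPA-style text."""
    counts = Counter(word for sentence in sentences
                     for word in sentence.split())
    total = sum(counts.values())
    lines = ["\\data\\", f"ngram 1={len(counts)}", "", "\\1-grams:"]
    for word in sorted(counts):
        log_prob = math.log10(counts[word] / total)
        lines.append(f"{log_prob:.4f} {word}")
    lines += ["", "\\end\\"]
    return "\n".join(lines)


print(unigram_arpa(["<s> see you </s>", "<s> see you soon </s>"]))
```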
Overview of Speech Recognizer
Selecting Next Set of States
 Uses the grammar to select the next set of possible words.
 Uses the dictionary to collect the pronunciations of those words.
 Uses the acoustic model to collect the HMMs for each pronunciation.
 Uses the transition probabilities in the HMMs to select the next set of states.
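The four steps can be pictured with toy data structures. Everything in this sketch is hypothetical: the grammar, the dictionary entries, the three-states-per-phone topology, and the fixed probability threshold are stand-ins for what a trained system and a real decoder would provide.

```python
# Hypothetical toy models; real ones are produced by training.
GRAMMAR = {"<s>": ["ka", "sa"]}                 # word -> words allowed next
DICTIONARY = {"ka": [["k", "a"], ["g", "a"]],   # word -> pronunciations
              "sa": [["s", "a"]]}
STATES_PER_PHONE = 3                            # assumed HMM topology


def hmm_states(phone):
    """Stand-in for the acoustic model: name the HMM states of a phone."""
    return [f"{phone}_{i}" for i in range(STATES_PER_PHONE)]


def candidate_state_sequences(previous_word):
    """Grammar -> next words, dictionary -> pronunciations, then HMM states."""
    sequences = []
    for word in GRAMMAR.get(previous_word, []):
        for pronunciation in DICTIONARY[word]:
            states = [s for phone in pronunciation for s in hmm_states(phone)]
            sequences.append((word, states))
    return sequences


def next_states(state, transitions, threshold=0.1):
    """Keep successor states whose transition probability passes the threshold."""
    return [s for s, p in transitions.get(state, {}).items() if p >= threshold]


for word, states in candidate_state_sequences("<s>"):
    print(word, states)
print(next_states("k_0", {"k_0": {"k_0": 0.6, "k_1": 0.4, "a_0": 0.0}}))
```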
Conclusion
 The work presented above is a step towards the development of a Myanmar speech-to-text system on the Android platform.
 By incorporating three features (a large vocabulary, continuous speech capability, and speaker independence), the Myanmar speech-to-text system on Android will be developed.
 A speech recognition system requires fast machines with large storage capacity and memory for complex recognition tasks.
 In addition, the size of the vocabulary strongly affects the accuracy of recognition.
QUESTIONS & ANSWERS
THANK YOU SO MUCH
