Development and Suitability of Indian Languages Speech Database For Building Watson Based ASR System
All content following this page was uploaded by Tapabrata Mondal on 25 March 2019.
Abstract: In this paper, we discuss our efforts in the development of Indian spoken language corpora for building large vocabulary speech recognition systems using the WATSON Toolkit. The paper demonstrates that these corpora can be reduced to a varied degree for various phonemes by comparing the similarity among phonemes of different languages. We also discuss the design and methodology of collection of the speech databases and the challenges we faced during database creation. The experiments have been conducted on commonly known Indian languages, by training the ASR system with the WATSON toolkit and evaluating it with Sclite. The results of these experiments show that different Indian languages have a great similarity among their phoneme structures and phoneme sequences, and we have exploited these features to create a speech recognition system. We have also developed an algorithm for bootstrapping the phonemes of one particular language into another by mapping the phonemes of different languages. The performance of Hindi and Bangla ASR systems using these databases has been compared.
Keywords: Speech Recognition, Speech databases, Indian Languages.

1. INTRODUCTION
Researchers are currently striving hard to improve the accuracy of speech processing techniques for various applications. In the recent past, some researchers have been focusing on the development of suitable speech databases for Indian languages for developing speech recognition systems: Samudravijaya et al. [1], R.K.Agarwal [2], Chourasia et al. [3], Shweta Sinha & S.S. Agarwal [4], Srinivas Bangalore [5], Ahuja et al. [6], Maya Ingle, and Manohar Chandwani [7].
In this paper, our goal is to develop a speech recognition system that uses Indian language corpora through the WATSON Toolkit. We concentrate on languages that have great similarities for developing a large vocabulary speech recognition system. The work could benefit a large number of people working in the field of speech recognition, as we exploit our research in the study of comparison of phonemes of different languages. Indian languages are basically phonetic in nature, and there exists a one-to-one correspondence between the orthography and the pronunciation in all the sounds, barring a few exceptions.

2. ASR SYSTEM ARCHITECTURE
The architecture of the speech recognition system is shown in Fig 1.
It contains two modules: the Training Module and the Testing Module.
The Training Module generates the system model against which the test data are compared to obtain the performance percentage. The Testing Module compares the test data with the trained model and yields the 1-best hypothesis.
First of all, the Pronunciation Dictionary is created using a G2P Model (Section 5.1), which is trained with 30,000 words that are linguistically correct. Based on these words, English phonemes for the different Hindi graphemes have been generated. For creating the G2P Model, we have used Moses [8]. The pronunciation dictionary, along with the mapping dictionary (Section 3.3), represents the different possible pronunciations, or occurrences, of a word.
The Language Model has been created using a large set of text data to capture all the possibilities of occurrence of a phoneme in a word, or of a word in a sentence. Thus, the Language Model has been created to give strength to the Acoustic Models (Section 3.1).
In the Testing Module, Sclite [9] is used for evaluating the 1-best hypothesis of each word. The average of the accuracies over all words gives the overall accuracy of the speech recognition system as a word accuracy percentage.
The insertion, deletion and substitution errors can also be computed.

Fig 1. ASR SYSTEM ARCHITECTURE
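
The scoring step described in Section 2 can be illustrated with a minimal sketch. It assumes the usual definition of word accuracy, 100 * (N - S - D - I) / N, where N is the number of reference words and S, D and I are the substitution, deletion and insertion counts on a minimum-edit alignment. The function names `edit_counts` and `word_accuracy` are hypothetical; this is not Sclite's own implementation, only an approximation of the kind of counts it reports.

```python
def edit_counts(ref, hyp):
    """Count (substitutions, deletions, insertions) on a minimum-edit alignment."""
    n, m = len(ref), len(hyp)
    # best[i][j] = (edits, subs, dels, ins) for aligning ref[:i] with hyp[:j]
    best = [[(0, 0, 0, 0)] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        best[i][0] = (i, 0, i, 0)          # only deletions
    for j in range(1, m + 1):
        best[0][j] = (j, 0, 0, j)          # only insertions
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            e, s, d, k = best[i - 1][j - 1]
            if ref[i - 1] != hyp[j - 1]:
                e, s = e + 1, s + 1        # substitution
            diag = (e, s, d, k)
            e, s, d, k = best[i - 1][j]
            dele = (e + 1, s, d + 1, k)    # reference word missing in hypothesis
            e, s, d, k = best[i][j - 1]
            ins = (e + 1, s, d, k + 1)     # extra word in hypothesis
            best[i][j] = min(diag, dele, ins)
    return best[n][m][1:]


def word_accuracy(ref, hyp):
    """Word accuracy in percent, assuming 100 * (N - S - D - I) / N."""
    s, d, k = edit_counts(ref, hyp)
    return 100.0 * (len(ref) - s - d - k) / len(ref)


# Toy example with placeholder tokens (not data from the paper).
reference = "a b c d".split()
hypothesis = "a x c".split()
print(edit_counts(reference, hypothesis), word_accuracy(reference, hypothesis))
```

When several alignments have the same total number of edits, the split between substitutions, deletions and insertions depends on the tie-breaking rule, so individual counts may differ from those of a particular scoring tool even when the accuracy agrees.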
3. BUILDING ASR SYSTEMS
Typically, an ASR system comprises three major constituents: the acoustic models, the language model and the phonetic lexicons.

3.1. Acoustic Models: In this experiment, context-independent as well as context-dependent models of Hindi & Bangla have been created by borrowing phonemes from English. Context-independent models are basically mono-phone models, which take each phone as an individual sound unit. Context-dependent models, in contrast, take the probability of occurrence of one phone relative to the neighbouring phones. The data used for creating the acoustic models for Hindi and Bangla are shown in Table 1 and Table 2 respectively.
We have trained the HMM models using the Watson Toolkit [10]. For parameterization, Mel Frequency Cepstral Coefficients (MFCC) have been computed. At the time of recognition, various words are hypothesized against the speech signal.
To compute the likelihood of a word, the 1-best hypothesis of each individual word of the text data has been taken with the help of Sclite. The combined likelihood of all the phonemes represents the likelihood of the word in the acoustic models.

3.2. Language Model: For the language model, a very large set of text data is required, so that all the possibilities of occurrence of a word in Indian languages can be captured. The text data taken for the language model are shown in Table 3 and Table 4.
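
As an illustration of how a language model captures the occurrence of a word in a sentence, the following minimal sketch estimates unsmoothed bigram probabilities from a list of sentences. The paper does not state the model order or the tool used, so the bigram choice, the boundary markers `<s>` and `</s>`, and the function names `train_bigram` and `bigram_prob` are all assumptions made for this example.

```python
from collections import Counter


def train_bigram(sentences):
    """Collect history and bigram counts from whitespace-tokenized sentences."""
    histories, bigrams = Counter(), Counter()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        histories.update(tokens[:-1])             # every token that has a successor
        bigrams.update(zip(tokens, tokens[1:]))   # adjacent word pairs
    return histories, bigrams


def bigram_prob(histories, bigrams, prev, word):
    """Maximum-likelihood estimate of P(word | prev); 0.0 for unseen histories."""
    if histories[prev] == 0:
        return 0.0
    return bigrams[(prev, word)] / histories[prev]


# Toy corpus with placeholder tokens (not data from the paper).
histories, bigrams = train_bigram(["a b", "a c"])
print(bigram_prob(histories, bigrams, "a", "b"))
```

Unseen word pairs receive probability zero in this sketch, which is one reason a very large text corpus (or a smoothing method) is needed in practice.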
Corpus                     Number of Sentences   Speakers (Male / Female)
General Messages           1260                  3 (Male), 2 (Female)
Health & Tourism Corpus    41282                 2 (Male), 2 (Female)

Corpus                     Number of Sentences   Total Words   Unique Words
General Messages           1260                  65300         54324
Health & Tourism Corpus    41282                 90140         67522
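
The sentence, total word and unique word columns of a text corpus table can be obtained with a simple count. The sketch below assumes whitespace tokenization and one sentence per list entry; the paper does not describe how its counts were produced, and the function name `corpus_stats` is hypothetical.

```python
def corpus_stats(sentences):
    """Return sentence, total word and unique word counts for a text corpus."""
    total_words = 0
    vocabulary = set()
    for sentence in sentences:
        words = sentence.split()      # assumption: whitespace tokenization
        total_words += len(words)
        vocabulary.update(words)
    return {
        "sentences": len(sentences),
        "total_words": total_words,
        "unique_words": len(vocabulary),
    }


# Toy corpus with placeholder tokens (not data from the paper).
print(corpus_stats(["a b a", "b c"]))
```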
Language   Sentences from Testing Set   Accuracy (%)
Hindi      50                           57.4
Bangla     50                           64.2
Table 9. Accuracies of original Bangla sentences and their transliterated version

The system has been trained on the original Bangla sentences. Thus, the system gives better accuracy for the Bangla sentences than for their Hindi transliterated version. As the difference in accuracy is not large (64.2% versus 57.4%), the transliteration is effective. Thus, we can increase the text corpus of a particular language by using transliterated data obtained from another language.

7.5 Comparison of Hindi & Bengali ASR Model: In the experiment, we have built the acoustic model for both the Bangla corpus and its Hindi transliterated corpus.

Hindi      50                           74.2
Bangla     50                           65.6
Table 10. Comparison of Hindi & Bangla ASR

Some of the conclusions of our study are:
Female speakers perform better when the system is trained with the female voice database alone.
The accuracy of the system is better when the system is trained with a variety of speakers and speaking styles than when the corpus is simply increased from a limited number of speakers.
Native speakers perform better than non-native speakers in all conditions.
As the beam-width of training speakers increases, word accuracy also increases.
The word accuracy also increases with clock time.

We hope that the ASRs created using the database developed in this experiment will serve as baseline systems for further research in improving the accuracies in each of the languages. Our future work is focused on tuning these models and testing them using language and
acoustic models built using a much larger corpus from a large number of speakers.

10. ACKNOWLEDGEMENT
We would like to acknowledge the help and support received from Mr. Anirudhha of IIIT, Hyderabad in conducting these experiments. We are also thankful to Prof. Michael Carl of CBS, Copenhagen and to the KIIT management, in particular Dr. Harsh V. Kamrah and Mrs. Neelima Kamrah, for providing the necessary facilities, financial help and encouragement. We also thank DIETY for providing a fellowship to one of the authors, Dipti Pandey.

APPENDIX
The characterization of Hindi and Bangla phonemes has been done as follows:

CATEGORY               HINDI PHONEMES   BENGALI PHONEMES   IPA     REPRESENTATION USING ENGLISH PHONEMES
Monophthongs (Short)   अ                অ                  /ə/     AX
                       इ                ই                  /i/     I
                       उ                উ                  /u/     U
                       ऋ                ঋ                  /       RR
Monophthongs (Long)    आ                আ                  /aː/    AA
                       ई                ঈ                  /iː/    II
                       ऊ                ঊ                  /uː/    UU
                       ए                এ                  /e/     E
                       ओ                ও                  /o/     O
Diphthongs             ऐ                ঐ                  /æ/     AI
                       औ                ঔ                  /ɔː/    AU
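
The vowel rows of the appendix table can be read as a lookup from Hindi and Bengali letters to a shared set of English phoneme labels. The sketch below encodes only the rows listed above. The names `VOWEL_ROWS`, `LETTER_TO_PHONE` and `letters_to_phones` are hypothetical, and the code is an illustration of the mapping idea rather than the authors' bootstrapping algorithm.

```python
# (Hindi letter, Bengali letter, English phoneme label), copied from the appendix rows.
VOWEL_ROWS = [
    ("अ", "অ", "AX"), ("इ", "ই", "I"), ("उ", "উ", "U"), ("ऋ", "ঋ", "RR"),
    ("आ", "আ", "AA"), ("ई", "ঈ", "II"), ("ऊ", "ঊ", "UU"),
    ("ए", "এ", "E"), ("ओ", "ও", "O"),
    ("ऐ", "ঐ", "AI"), ("औ", "ঔ", "AU"),
]

# Both scripts point to the same label, so one phone set serves both languages.
LETTER_TO_PHONE = {}
for hindi, bengali, phone in VOWEL_ROWS:
    LETTER_TO_PHONE[hindi] = phone
    LETTER_TO_PHONE[bengali] = phone


def letters_to_phones(text):
    """Map each known vowel letter in text to its label; skip unknown characters."""
    return [LETTER_TO_PHONE[ch] for ch in text if ch in LETTER_TO_PHONE]


print(letters_to_phones("अआ"), letters_to_phones("অআ"))
```

Consonants, dependent vowel signs and context-dependent pronunciations are outside this sketch; a full grapheme-to-phoneme step would need the complete mapping table and the G2P model described in the paper.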