Moroccan Dialect Speech Recognition System Based on CMU SphinxTools
Abderrahim Ezzine, Hassan Satori, Mohamed Hamidi, Khalid Satori
LISAC Laboratory, Faculty of Sciences Dhar Mahraz, Sidi Mohammed Ben Abdellah University, Fès, Morocco
ezzine2abderrahim@gmail.com, hassan.satori@usmba.ac.ma, mohamed.hamidi.5@gmail.com, khalidsatori@gmail.com
Abstract— The main aim of an Automatic Speech Recognition (ASR) system is to simulate the human listener, based on a learning approach and speech data of the studied language. In this paper, we describe a Darija (Moroccan dialect) speech recognition system implemented to recognize the first ten Arabic digits spoken in Moroccan dialect, collected from 20 speakers including both males and females. The system is designed with the CMU Sphinx tools using the Hidden Markov Model method with a small data set, and Mel-frequency cepstral coefficients (MFCCs) are used in the feature extraction phase. Our best accuracy is 96.27 %, obtained with 8 GMMs.

Keywords—Speech recognition; Moroccan dialect; HMMs; MFCC; CMU Sphinx; Acoustic model; Artificial intelligence.

I. INTRODUCTION

Automatic Speech Recognition (ASR) is defined as a technology that allows a computing device to convert spoken words into readable text by way of a microphone or telephone. ASR has a wide range of applications, such as command recognition, interactive voice response and dictation, and it can be used to help people with disabilities interact with the community. It is a technology that makes life easier [1]. The principal aim of ASR research is to permit a computer to identify all words spoken by anybody, independent of vocabulary size, noise and speaker characteristics, in real time with 100% precision [2].

To build an ASR system, we need to create the Language Model (LM), the Acoustic Model (AM), and the dictionary for the target language. Unfortunately, designing an acoustic model for a specific language is expensive, unlike the LM and the dictionary, because speech data must be recorded from speakers to make the ASR speaker-independent [3]. Given the importance of ASR technology, several systems have been implemented for different languages based on the HTK [4], CMU Sphinx [5], Dragon [6] and KALDI [7] toolkits.

H. Satori and F. ElHaoussi [8] implemented an Amazigh speech system using CMU Sphinx tools based on Hidden Markov Models. The corpus includes 60 Moroccan Amazigh speakers, all native speakers of Tarifit, equally divided between male and female. Their system recognizes the digits and alphabet of the Amazigh language, and the best performance achieved is 92.89%, found with 16 Gaussian mixture models.

Ouissam et al. [9] presented an automatic speech recognition system based on Sphinx4 that detects people with voice disorders. Their project uses the Amazigh language to differentiate normal and pathological voices. Their findings were measured using combinations of 5-state HMMs with 8 Gaussian mixture distributions.

Hamidi et al. [10] presented an interactive security system based on a speech recognition technique. In their work, ASR-HMM and IVR technologies were combined to allow remote administration tasks to be managed through speech commands, with security identification through a biological voiceprint. Their findings show that the access rate is more than 80 % whereas the non-admin recognition rate is less than 6%.

The aim of this paper is to create a Darija dialect ASR system and to explore the changes that must be made in the model to adapt it to Moroccan dialect speech recognition. Our work is based on the hidden Markov model - Gaussian mixture model combination. The proposed system is designed using Carnegie Mellon University (CMU) Sphinx, a statistical, speaker-independent set of tools based on Hidden Markov Models (HMMs).

The paper is organized as follows: Section 2 presents the Moroccan dialect. Section 3 presents the Moroccan dialect speech recognition system. Section 4 shows the experimental results and Section 5 concludes the paper.
Authorized licensed use limited to: Auckland University of Technology. Downloaded on October 26,2020 at 12:44:29 UTC from IEEE Xplore. Restrictions apply.
II. MOROCCAN DIALECT

TABLE I. TEN FIRST DIGITS WITH THEIR SYLLABLES AND THEIR TRANSCRIPTION IN ENGLISH, ARABIC

Fig. 1. Block diagram of ASR System

Parameter         Value
Sampling rate     16 kHz
Number of bits    16 bits
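The recording parameters listed above can be checked automatically before training. The following is a minimal sketch, assuming the corpus is stored as WAV files; the function name `check_recording` and the file layout are hypothetical and not taken from the paper.

```python
import wave

# Corpus parameters from the table above.
EXPECTED_RATE_HZ = 16000    # sampling rate: 16 kHz
EXPECTED_SAMPLE_BYTES = 2   # 16 bits per sample = 2 bytes


def check_recording(path):
    """Return a list of mismatches between a WAV file and the corpus parameters.

    An empty list means the file matches both parameters.
    (Hypothetical helper, not part of CMU Sphinx.)
    """
    problems = []
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        width = wav.getsampwidth()
    if rate != EXPECTED_RATE_HZ:
        problems.append(f"sampling rate is {rate} Hz, expected {EXPECTED_RATE_HZ} Hz")
    if width != EXPECTED_SAMPLE_BYTES:
        problems.append(f"sample width is {8 * width} bits, expected {8 * EXPECTED_SAMPLE_BYTES} bits")
    return problems
```

Running such a check over every recording before feature extraction is one way to catch files captured with different settings, which would otherwise degrade the acoustic model silently.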
Welch algorithm. Training is the procedure of building the knowledge base by learning the Acoustic Model and Language Model used by the system [8]. Fig. 2 presents the training process.

[Figure: block diagram with the stages Pre-emphasis, Framing, Windowing, FFT, Mel frequency filtering]

The generation of the acoustic model is realized by grouping a set of input data and treating them with the SphinxTrain tool (see Fig. 8). The input data is as follows:
• Speech data
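The first front-end stages of MFCC extraction (pre-emphasis, framing and windowing) can be sketched as follows. This is an illustrative sketch only, not the CMU Sphinx front-end: the pre-emphasis coefficient 0.97, the 25 ms frame length, the 10 ms step and the Hamming window are common defaults assumed here, and all function names are hypothetical. Only the 16 kHz sampling rate comes from the paper.

```python
import math


def pre_emphasis(samples, coeff=0.97):
    """Apply y[n] = x[n] - coeff * x[n-1]; the first sample is passed through."""
    if not samples:
        return []
    return [samples[0]] + [
        samples[n] - coeff * samples[n - 1] for n in range(1, len(samples))
    ]


def frame_signal(samples, frame_len, step):
    """Split a signal into overlapping frames.

    Trailing samples that do not fill a whole frame are dropped.
    """
    return [
        samples[start:start + frame_len]
        for start in range(0, len(samples) - frame_len + 1, step)
    ]


def hamming(frame_len):
    """Hamming window coefficients for one frame."""
    return [
        0.54 - 0.46 * math.cos(2 * math.pi * n / (frame_len - 1))
        for n in range(frame_len)
    ]


def windowed_frames(samples, sample_rate=16000, frame_ms=25, step_ms=10):
    """Pre-emphasize, frame and window a signal (assumed default settings)."""
    frame_len = sample_rate * frame_ms // 1000  # 400 samples at 16 kHz
    step = sample_rate * step_ms // 1000        # 160 samples at 16 kHz
    window = hamming(frame_len)
    emphasized = pre_emphasis(samples)
    return [
        [s * w for s, w in zip(frame, window)]
        for frame in frame_signal(emphasized, frame_len, step)
    ]
```

Each windowed frame would then go through the FFT, Mel filtering and cepstral steps, which are omitted here to keep the sketch short.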
of our system we use the CMU-Cambridge statistical language modelling toolkit [20].

Fig. 4. (a) Representation of the "S" phoneme with 5 HMM states; (b) representation of the "STTA" digit with HMM states.

4) Pronunciation dictionary

The dictionary file is used as an intermediary between the AM and LM. Our pronunciation dictionary file includes the first ten Darija dialect words followed by their pronunciations. Fig. 5 represents the phonetic dictionary list used in the training [21].

TABLE III. RECOGNITION RATES TO THE TEN DARIJA DIGITS WITH THREE HMM STATES AND DIFFERENT GMMS VALUES

Moroccan Dialect digits    4 GMMs    8 GMMs    16 GMMs    32 GMMs
TSAAOD                     96.66%    93.33%    91.66%     93.33%
Average                    95.08%    96.27%    96.10%     94.58%

TABLE IV. RECOGNITION RATES TO THE TEN DARIJA DIGITS WITH FIVE HMM STATES AND DIFFERENT GMMS VALUES

Moroccan Dialect digits    4 GMMs    8 GMMs    16 GMMs    32 GMMs
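The comparison between configurations can be made explicit with a few lines of code. The sketch below uses only the average rates reported in this paper (the "Average" row of Table III for three HMM states and the five-state rates quoted in the results discussion); the variable and function names are illustrative.

```python
# Average recognition rates (%) as reported in the paper.
# Outer key: number of HMM states; inner key: number of GMMs.
average_rates = {
    3: {4: 95.08, 8: 96.27, 16: 96.10, 32: 94.58},  # Table III, "Average" row
    5: {4: 95.25, 8: 95.25, 16: 94.92, 32: 92.71},  # five-state rates from the text
}


def best_configuration(rates):
    """Return (hmm_states, gmms, rate) for the highest average rate."""
    return max(
        (
            (states, gmms, rate)
            for states, by_gmm in rates.items()
            for gmms, rate in by_gmm.items()
        ),
        key=lambda item: item[2],
    )


states, gmms, rate = best_configuration(average_rates)
print(f"best: {states} HMM states, {gmms} GMMs, {rate:.2f} %")
```

With the reported figures, the highest average is the three-state, 8-GMM configuration at 96.27 %.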
respectively. We observed that 8 GMMs give the best rate of 96.27 %. According to the analysis of the digit recognition rates, the best recognized Darija digits are STTA and KHAMSA. When we used five HMM states, the system tried to recognize 2000 samples of all 10 Moroccan dialect digits. Table IV shows the accuracy rate of the system. The performances are 95.25 %, 95.25 %, 94.92 % and 92.71 % when using 4, 8, 16 and 32 GMMs, respectively. Also, in the case of digits, the best results were found with 4 and 8 GMMs. The most frequently misrecognized Moroccan dialect digits are STTA and KHAMSA.

V. CONCLUSION

In this paper, an automatic speech recognition system for the Darija Moroccan dialect was developed. The system is implemented using CMU Sphinx tools based on HMMs with Gaussian mixtures. The corpus used in this work is not large, and the best result obtained is an accuracy of about 96.27 %, which is very encouraging.

In future work, the proposed system can be improved by using a larger vocabulary of the Darija Moroccan dialect, and we will test the performance of the system in a noisy environment.

REFERENCES

[1] Haton, J. P., Cerisara, C., Fohr, D., Laprie, Y., & Smaïli, K. (2006). Reconnaissance automatique de la parole: Du Signal à son Interprétation. Dunod.
[2] Anand, A. V., Devi, P. S., Stephen, J., & Bhadran, V. K. (2012, December). Malayalam Speech Recognition system and its application for visually impaired people. In 2012 Annual IEEE India Conference (INDICON) (pp. 619-624). IEEE.
[3] Jackson, M. (2005). Automatic speech recognition: Human computer interface for kinyarwanda language. A Project Report Submitted in Partial Fulfillment of the Requirements for the Award of the Degree Master of Science in Computer Science of Makerere University, August.
[4] Young, S. J., & Young, S. (1993). The HTK hidden Markov model toolkit: Design and philosophy. Cambridge, England: University of Cambridge, Department of Engineering.
[5] Lee, K. F., Hon, H. W., & Reddy, R. (1990). An overview of the SPHINX speech recognition system. IEEE Transactions on Acoustics, Speech, and Signal Processing, 38(1), 35-45.
[6] Baker, J. (1975). The DRAGON system--An overview. IEEE Transactions on Acoustics, Speech, and Signal Processing, 23(1), 24-29.
[7] Povey, D., Ghoshal, A., Boulianne, G., Burget, L., Glembek, O., Goel, N., ... & Silovsky, J. (2011). The Kaldi speech recognition toolkit. In IEEE 2011 Workshop on Automatic Speech Recognition and Understanding. IEEE Signal Processing Society.
[8] Satori, H., Hiyassat, H., Harti, M., & Chenfour, N. (2009). Investigation Arabic Speech Recognition using CMU Sphinx System. The International Arab Journal of Information Technology, 6(2), 186–190.
[9] Zealouk, O., Satori, H., Hamidi, M., & Satori, K. (2020). Pathological Detection Using HMM Speech Recognition-Based Amazigh Digits. In Embedded Systems and Artificial Intelligence (pp. 281-289). Springer, Singapore.
[10] Hamidi, M., Satori, H., Zealouk, O., Satori, K., & Laaidi, N. (2018, October). Interactive voice response server voice network administration using hidden markov model speech recognition system. In 2018 Second World Conference on Smart Trends in Systems, Security and Sustainability (WorldS4) (pp. 16-21). IEEE.
[11] Ennaji, M., Makhoukh, A., Es-saidy, H., Moubtassime, M., & Slaoui, S. (2004). A Grammar of Moroccan Arabic. Publications of the Faculty of Letters Dhar El Mehraz, Fès.
[12] Zealouk, O., Satori, H., Hamidi, M., Laaidi, N., & Satori, K. (2018). Vocal parameters analysis of smoker using Amazigh language. International Journal of Speech Technology, 21(1), 85-91.
[13] Zealouk, O., Hamidi, M., Satori, H., & Satori, K. (2020). Amazigh Digits Speech Recognition System Under Noise Car Environment. In Embedded Systems and Artificial Intelligence (pp. 421-428). Springer, Singapore.
[14] Hamidi, M., Satori, H., Zealouk, O., & Satori, K. (2019). Speech coding effect on Amazigh alphabet speech recognition performance. J. Adv. Res. Dyn. Control Syst, 11(2), 1392-1400.
[15] Hamidi, M., Satori, H., Zealouk, O., & Satori, K. (2020). Interactive Voice Application-Based Amazigh Speech Recognition. In Embedded Systems and Artificial Intelligence (pp. 271-279). Springer, Singapore.
[16] Barkani, F., Satori, H., Hamidi, M., Zealouk, O., & Laaidi, N. (2020). Comparative Evaluation of Speech Recognition Systems Based on Different Toolkits. In Embedded Systems and Artificial Intelligence (pp. 33-41). Springer, Singapore.
[17] Hamidi, M., Satori, H., Zealouk, O., & Satori, K. (2020). Amazigh digits through interactive speech recognition system in noisy environment. International Journal of Speech Technology, 23(1), 101-109.
[18] Addarrazi, I., Satori, H., & Satori, K. (2020). A Follow-Up Survey of Audiovisual Speech Integration Strategies. In Embedded Systems and Artificial Intelligence (pp. 635-643). Springer, Singapore.
[19] Gupta, K., & Gupta, D. (2016). An analysis on LPC, RASTA and MFCC techniques in automatic speech recognition system. In 2016 6th International Conference-Cloud System and Big Data Engineering (Confluence) (pp. 493–497). IEEE.
[20] http://www.speech.cs.cmu.edu/tools/lmtool-new.html
[21] Satori, H., Hiyassat, H., Harti, M., & Chenfour, N. (2009). Investigation Arabic Speech Recognition using CMU Sphinx System. The International Arab Journal of Information Technology, 6(2), 186–190.