Accent Classification Among Punjabi, Urdu, Pashto, Saraiki and Sindhi Accents of Urdu Language
discussions, stats, and author profiles for this publication at: https://www.researchgate.net/publication/269398058
All content following this page was uploaded by Afsheen Rafaqat Ali on 11 December 2014.
Abstract

Automatic Speech Recognition (ASR) is a key component in Human Computer Interaction (HCI) applications. The stability of ASR systems depends largely on the accent, gender and age of speakers, background noise and channel variations. In this paper, a study has been conducted to classify five different accents of the Urdu language spoken in Pakistan, i.e. Punjabi, Urdu, Pashto, Saraiki and Sindhi. Speech data has been collected from native speakers of these accents. The five accents have been classified using mel frequency cepstral coefficients (MFCCs) and formant features.

1. Introduction

The objective of an automatic speech recognition system is to convert speech into text. The performance of ASR systems depends on the training data on which the acoustic models have been trained. Speaker dependent (SD) ASR systems perform better than speaker independent (SI) ASR systems, as the training speech samples in SD ASR systems cover most of the possible acoustic variations of the targeted speakers. The acoustic variations in speakers are mainly due to age, gender and regional accents. Among these factors, accent is the leading factor that contributes to a higher error rate in ASR systems.

Accent refers to the articulation pattern that a speaker follows to produce a particular sound; it is also related to the speaker's first language, which affects the speaker's perception and production of speech. Dialect and accent do not cause much trouble in communication among humans but give poor recognition results with ASR systems. An ASR system trained on a particular accent gives poor accuracy on cross-accent speech data. It is a challenging task to build an ASR system which gives the best accuracy for all accents of a language.

Present speech recognition systems use pronunciation dictionaries, which are based on the pronunciation of words followed by a single accent group. Different accents give rise to multiple pronunciations of a word due to the use of different phonemes. But the addition of multiple pronunciations against a single word in the ASR dictionary generates additional confusions, resulting in a higher recognition error rate [1].

The most widely used approach to address the accent variation problem in speech recognition systems is to build multiple accent-dependent ASR systems. In order to send accented speech to the accent-specific speech recognizer, this approach requires a pre-processing step of accent classification.

In different geographical regions of Pakistan, 59 languages are spoken. Based on the accent and native language, there are six prominent accents used in Pakistan, namely Urdu, Punjabi, Pashto, Saraiki, Sindhi and Balochi [2]. In the past, major research has been done on Urdu literature and grammar, but no major work has been done on the acoustic analysis of the different accents of Urdu. Although less than 8% of Pakistanis speak Urdu as their first language, it is spoken and understood as a second language by almost all Pakistanis. In learning a second language, vowel articulation is the key factor to take care of [4]. In phonetics, a vowel is a sound least obstructed by any of the articulators during its articulation, and vowels are generally called the directors of the sounds [5]. Therefore, in our experiments we use vowels to classify five different accents of the Urdu language.

The rest of the paper is structured as follows: Section 2 describes the past techniques used to classify accents of different languages, Section 3 details the two methodologies followed in this paper to classify five different accents of the Urdu language, Section 4
reports the results, Section 5 discusses the results and Section 6 concludes the paper.

2. Literature Review

The performance of speech recognition systems has improved significantly over the last few years, as this field has received much attention from researchers. In the past, many experiments have been performed in order to classify Chinese, Indian, Korean and American English accents.

Recent research shows that the native accent pronunciation dictionary used in ASR systems can be transformed into an accented speech dictionary simply by using knowledge of the native language of the foreign speakers. The use of an accent-adapted dictionary to classify different Chinese and English accents reduces the speech recognition error rate by 13.5% [7, 11]. Acoustic level adaptation techniques such as MLLR, and an integration of both PDA and MLLR, are also used to trace the detailed acoustic changes in the speech of speakers due to changes in speaking speed and style [11]. The results in that paper show that the two techniques are complementary.

Another approach, used to identify Beijing, Shanghai, Guangdong and Taiwan accents from a multi-accent Mandarin corpus consisting of male and female speakers of each accent, uses a set of Gaussian mixture models (GMMs) [8, 11]. The GMMs are used to estimate the probability that the input speech comes from a particular gender and accent. This approach shows that the performance of accent-dependent systems is generally better than that of accent-independent systems.

Multivariate statistical analysis is also used to classify different accents [9]. The corpus used for the experiments consists of 4925 telephone utterances of American English spoken by native speakers of 23 different languages. The study shows that the classification results of the support vector machine are better than those of other classifiers such as ZeroR, Naive Bayes, Logistic Regression, SMO and K-Nearest Neighbors, using MFCC, FBank and LPC features.

In order to classify Chinese, Indian, Korean and Croatian accents of English speakers, the Q factor is introduced [10]. The paper reports that clustering of different accents based on formants does not help to classify accents with high accuracy.

Different geographic regions of Sweden that share common accents are grouped using MFCC feature vectors [6]. The Swedish Speech data FD5000 telephone speech database was used for these experiments, and different regions are clustered using the Bhattacharyya distance.

The above discussion shows that a lot of work has been done to classify Chinese and English accents, but no research has been carried out before to classify the different accents spoken in Pakistan. In this paper we propose a method to classify five widely spoken accents of the Urdu language in Pakistan, i.e. Punjabi, Urdu, Pashto, Saraiki and Sindhi.

3. Methodology

In this paper, we have classified the Punjabi, Urdu, Pashto, Saraiki and Sindhi accents of the Urdu language by using vowels. Based on two different acoustic features, two experiments have been conducted to classify the above-mentioned five accents. The vocabulary of the experiments consists of 139 district names of Pakistan. Proper nouns are chosen because they are language-independent words and are familiar to the speakers of all five languages. Urdu is the national and official language of Pakistan; therefore, district names can also be considered to be words/phrases of the Urdu language. The training and testing corpus consists of 760 utterances from native speakers of the above-mentioned five languages. The speaker distribution is listed in Table 1.

Table 1: Speaker distribution

Accent     Male speakers   Female speakers
Punjabi    130             77
Urdu       45              56
Pashto     14              6
Saraiki    13              5
Sindhi     5               14

The speech is recorded over a telephone channel at a sampling rate of 8 kHz with random background noise.

In Urdu, there exist 13 vowels: three short, three medial and seven long [3]. In our experiments, we used eight vowels, three short and five long, having sufficient data for analysis. The list of the eight vowels along with Urdu letters, CISAMPA and IPA symbols is given in Table 2.
Table 2: Urdu vowel list with IPA and CISAMPA symbols

Sr. #   Urdu letter   IPA   CISAMPA
1       ء،َ           ə     A
2       آ،ا           ɑ:    AA
3                     e:    AE
4       َِ            ɪ     I
5       ی             i:    II
6       و             o:    OO
7       َُ            ʊ     U
8       ُو            u:    UU

The data for each of the eight vowels listed above is given in Table 3.

Table 3: Vowel data statistics

Vowel (CISAMPA)   Punjabi   Urdu   Pashto   Saraiki   Sindhi
A                 276       109    295      80        142
AA                255       82     164      51        90
AE                69        21     37       20        20
I                 125       64     52       12        29
II                65        25     57       19        30
OO                39        16     15       7         15
U                 62        28     39       14        20
UU                36        23     22       8         13

In Experiment 1, we used formant features, which have proved to be sensitive features for classifying accents of the Chinese language [10], to classify accents of the Urdu language. Preliminary analysis of the first and second formant frequencies (F1 and F2) shows differences in the characteristics of vowels. Formants are calculated from the midpoint of each vowel, to get a standard value, using Praat, a package for the analysis of speech in phonetics. The mean and standard deviation of F1 and F2 for the five accents of each vowel have been calculated. Each accent of a vowel is compared with the remaining four accents of the same vowel and with all vowels of the Urdu accent, to find out how much these accents and vowels differ from each other.

In Experiment 2, we have used mel frequency cepstral coefficients (MFCCs) and formants (F1 and F2) as acoustic features to classify five different accents of the Urdu language by calculating the distance between them. MFCCs are commonly used as features in speech recognition. They are also common in speaker recognition, which is the task of recognizing people from their voices. The middle frame of 100 samples of each vowel is used to calculate MFCCs. These cepstral features are calculated using HTK, the Hidden Markov Model Toolkit, a portable toolkit for building and manipulating hidden Markov models. Only the first twelve MFCCs have been used here to classify accents of the Urdu language. The MFCC data is first normalized using Weka (a collection of machine learning algorithms for data mining tasks), and then the mean MFCC vector for each accent of a vowel is computed.

Using the 12-dimensional mean MFCC vector, the distance between two accents has been computed with the Euclidean distance formula. For example, the distance between the Punjabi and Urdu accents (of a vowel of the Urdu language) is computed using equation 1.

\[
d(\mathrm{Pun}, \mathrm{Urd}) = \sqrt{(\mathrm{mfcc1}_{PUN} - \mathrm{mfcc1}_{URD})^2 + \dots + (\mathrm{mfcc7}_{PUN} - \mathrm{mfcc7}_{URD})^2 + \dots + (\mathrm{mfcc12}_{PUN} - \mathrm{mfcc12}_{URD})^2} \tag{1}
\]

The distance between the Punjabi and Urdu accents of a vowel of the Urdu language using formants has been calculated using the two-dimensional Euclidean distance formula given in equation 2.

\[
d(\mathrm{Pun}, \mathrm{Urd}) = \sqrt{(F1_{PUN} - F1_{URD})^2 + (F2_{PUN} - F2_{URD})^2} \tag{2}
\]

The distances calculated using these two equations are then compared and analyzed in the following sections.

4. Classification Results

The classification results of Experiment 1 are shown below in graphs. Each vowel with the five accents is plotted in a separate graph along with the remaining vowels of the Urdu accent, and each plot shows the standard deviation of the formants (F1 and F2) of a particular vowel from their mean.

Figure 1 shows the standard deviation of F1 and F2 of the short vowel "A" (of the above-mentioned five accents) from their mean, along with the other vowels of the Urdu language.
Figure 4 shows the standard deviation of the formant frequencies of the short vowel "I" (of the above-mentioned five accents) from their mean, along with the remaining vowels of the Urdu language.
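
The plots in this section rest on the per-accent mean and standard deviation of F1 and F2 for each vowel. A minimal sketch of that summary step is given here, assuming the formant values have already been measured at the vowel midpoints. The numbers are hypothetical placeholders, not data from this study, and the use of the sample standard deviation is an assumption, since the text does not say which estimator was used.

```python
from statistics import mean, stdev
from typing import Dict, List, Tuple

Token = Tuple[float, float]  # (F1, F2) in Hz, measured at the vowel midpoint

# Hypothetical measurements for one vowel; these are NOT values from the paper.
measurements: Dict[str, List[Token]] = {
    "Punjabi": [(650.0, 1400.0), (700.0, 1500.0), (675.0, 1450.0)],
    "Urdu": [(600.0, 1350.0), (640.0, 1390.0), (620.0, 1370.0)],
}


def formant_summary(tokens: List[Token]) -> Dict[str, Tuple[float, float]]:
    """Return (mean, sample standard deviation) for F1 and F2."""
    f1_values = [f1 for f1, _ in tokens]
    f2_values = [f2 for _, f2 in tokens]
    return {
        "F1": (mean(f1_values), stdev(f1_values)),
        "F2": (mean(f2_values), stdev(f2_values)),
    }


# One summary per accent, ready to be plotted as mean with a spread around it.
summary = {accent: formant_summary(tokens) for accent, tokens in measurements.items()}
for accent, stats in summary.items():
    print(accent, stats)
```

Two accents whose mean points lie within each other's spread on both F1 and F2 would be hard to separate on formants alone, which is the kind of overlap the figures are meant to reveal.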
Figure 7: Comparison of U vowel of 5 accents and all vowels of Urdu

Figure 8 shows the standard deviation of the formant frequencies of the "UU" vowel of the five accents from their mean.

5. Discussion

Figures 1 to 8 of Experiment 1 show that, using F1 and F2, all vowels of the Urdu accent can be classified, with the exception of the short vowel I, which is confused with the long vowels II and AE of the Urdu language (shown in all Figures 1 to 8).

On the other hand, it is clear from the above graphs that, using F1 and F2, it is difficult to classify the Punjabi, Urdu, Pashto, Saraiki and Sindhi accents of the A, AA, AE, I, II, OO, U and UU vowels.

Table 4 of Experiment 2 shows that the distance calculated between two accents using MFCCs is always greater than the distance calculated using formant frequencies. Therefore, it can be concluded that, on the basis of the distance calculated using MFCCs, the probability of an accent being confused with other accents is minimal. As given in Table 4, the calculated distance between the Urdu and Saraiki accents based on formant frequency values is 0.402592, while based on MFCC values it is 1.514395, which is roughly 3.8 times the formant-based distance. Therefore, MFCCs can be used to classify the Punjabi, Urdu, Pashto, Saraiki and Sindhi accents of the Urdu language.
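
The comparison above rests on Euclidean distances between per-accent mean feature vectors: 12 mean MFCCs in one case, mean F1 and F2 in the other. The following is a minimal sketch of that computation, not the authors' implementation. The function name and the feature vectors are hypothetical placeholders; only the two Urdu-Saraiki distances are taken from the text.

```python
import math
from typing import Sequence


def euclidean_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two equal-length feature vectors."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Hypothetical normalized mean vectors; these are NOT values from the paper.
# The MFCC case uses 12 coefficients per accent, the formant case uses (F1, F2).
mfcc_pun = [0.10 * k for k in range(12)]
mfcc_urd = [0.10 * k + 0.05 for k in range(12)]
formant_pun = (0.40, 0.55)
formant_urd = (0.43, 0.51)

d_mfcc = euclidean_distance(mfcc_pun, mfcc_urd)
d_formant = euclidean_distance(formant_pun, formant_urd)

# Ratio of the two Urdu-Saraiki distances quoted in the discussion.
ratio_urdu_saraiki = 1.514395 / 0.402592

print(f"MFCC distance: {d_mfcc:.4f}, formant distance: {d_formant:.4f}")
print(f"Urdu-Saraiki MFCC/formant ratio: {ratio_urdu_saraiki:.2f}")
```

The same function serves both feature sets because only the vector length differs. For the quoted Urdu-Saraiki pair the ratio comes out at roughly 3.76.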