Elsa Speak App: Automatic Speech Recognition (ASR) For Supplementing English Pronunciation Skills

Pedagogy: Journal of English Language Teaching, (9)1: 01-14

Pedagogy: Journal of English Language Teaching

Volume 9, Number 1, June 2021
E-ISSN: 2580-1473 & P-ISSN: 2338-882X
Published by Institut Agama Islam Negeri Metro

Adhan Kholis
Universitas Nahdlatul Ulama Yogyakarta, Indonesia
Email: adhan@unu-jogja.ac.id

ARTICLE INFO
Article history:
Received November 30, 2020
Revised December 15th, 2020
Accepted March 2nd, 2021

Artificial intelligence (AI) has become a special concern in language teaching because it can assist and enhance language learning at all levels of education. It also plays a beneficial role in supplementing language teaching, as with the ELSA Speak app, an Automatic Speech Recognition (ASR) tool used for teaching pronunciation. Pronunciation concerns how students hear, voice, utter, and articulate English words in oral language, but students often pronounce words incorrectly, with the result that the uttered words carry the wrong meaning. This study aimed to implement the English Language Speech Assistant (ELSA) Speak app to improve the English pronunciation skills of higher education learners, namely the English Department students of Nahdlatul Ulama University of Yogyakarta (UNU). The data were collected using a pronunciation test and interviews. The researcher also taught in the classroom. The results showed that ELSA Speak can improve the students' pronunciation skills, as seen from the average scores obtained across the teaching cycles, which rose from two to four. ELSA Speak helped the students pronounce diverse words more easily and comprehensibly. In addition, features offered by the app, such as instant feedback, enabled the students to pronounce words precisely. In conclusion, ELSA Speak can improve the students' pronunciation skills effectively, and it can motivate the students to engage in learning to pronounce.

Keywords: artificial intelligence; ELSA Speak; pronunciation skills


How to cite: Kholis, A. (2021). Elsa Speak App: Automatic Speech Recognition (ASR) for Supplementing English Pronunciation Skills. Pedagogy: Journal of English Language Teaching, 9(1), 01-14.
DOI: 10.32332/joelt.v9i1.2723
Journal Homepage https://e-journal.metrouniv.ac.id/index.php/pedagogy
This is an open-access article under the CC BY SA license https://creativecommons.org/licenses/by-sa/4.0/

INTRODUCTION

English is a means of communication worldwide. It has become a global language (Crystal, 2003). It is regarded as an international language because most people in the world use it in daily life for many purposes: work, education, job seeking, transactions, and business. To make sure that a message reaches its listeners, people must possess competencies in English such as speaking, the skill of producing words. Speaking is one of the most vital skills in English. In the workplace, speaking competence is in high demand. People use English to get a job, for example by taking the TOEFL, TOEIC, and IELTS tests, to continue their studies, and to work in many places. Many industries require employees to be active in communication and interaction. People should be able to convince others with good spoken English.

In speaking, the main thing people must also know is how to pronounce words well. Pronunciation is one of the English skills concerning the mastery of phonetics and phonology, that is, how words are articulated and produced as sound systems. It is necessary for making the message clear and understandable to the listener. There are two main accents, British and American. People can use both or one in practice. The two differ in many words, in both sound and articulation. These differences make it difficult for people to pronounce English words. For most daily communication, such as greetings, the American model is usually a useful style because people want to create clear interaction immediately without thinking about grammatical usage. Occasionally, sentence structure is neglected by the speaker because attending to it obstructs the fluency of the utterance.

To be formally established, pronunciation should be taught in the language classroom. It becomes a subject within the English course. It includes aspects such as sounds, stress, rhythm, and intonation (Gerald, 2001). Teaching pronunciation has shifted from a narrow approach focusing on segmental features such as phonetics and phonology to a comprehensive one that stresses suprasegmental features such as stress, intonation, rhythm, and sounds (Tergujeff, 2012). For decades, pronunciation teaching has sometimes been disregarded (Hincks, 2003). Attention to students' pronunciation errors, which can inhibit effective communication, is the fundamental reason why pronunciation is urgent in the classroom. In practice, teachers have several roles when teaching it: helping students hear and make sounds, providing feedback and correction, establishing priorities, devising activities, and assessing progress (Joanne, 1988). Students hear the sounds as the teacher produces them in a native-like way. After hearing the sounds, students immediately filter them in their minds. Teachers then push students to be more active in producing words as they hear them. Feedback and correction should be given to
track students' progress. Since pronunciation is so complicated, teachers should consider the types of exercises and activities used for practice. Theory and practice in sounding out words must be balanced. Teachers should use a range of approaches and methods in teaching it so that students can produce and pronounce words well with ease. Every student wants to be able to speak like a native speaker.

Problems arise in teaching pronunciation in the classroom, from both teachers and students. Pronunciation is distinct from writing, which focuses on marks on paper that carry no sound. It is concerned with receiving voices by ear and with direct contact in face-to-face activities (J.D. O'Connor, 1998). Firstly, some teachers lack interest in the subject. Secondly, teachers' knowledge of pronunciation contrasts with their practice (Gerald, 2001). Teachers possess many theories of pronunciation; however, there is no time for practicing and producing words in daily activities, so teachers themselves find it difficult to pronounce words successfully. Teachers' average English habits can be seen when they teach in the classroom. Because English is not the primary language in Indonesia, English teachers have limited time to use it routinely. This may explain teachers' limited mastery of vocabulary. Teachers need to make a habit of speaking English fully in the classroom so that students are enthusiastic about joining the class. Sometimes, teachers still make mistakes when modeling how words sound. Indeed, students' dependency on teachers as the model who corrects and evaluates sounds in learning to pronounce still dominates (McCrocklin, 2015), whereas teachers' instruction in pronunciation classes is lacking (Baker & Burri, 2016). This leaves students little space for expressing words freely. Students may also pick up pronunciation habits from different places and people (Tlazalo Tejeda & Basurto Santos, 2014), which leaves them confused by interfering knowledge.

The native language is also a major factor in learning to pronounce English (Joanne, 1988). English has two main accents, British and American. Each has its own style and characteristics in sounding out words, making it difficult for students to predict each word. In addition, the basic problem in English pronunciation is to construct the set of boxes corresponding to the English sounds (J.D. O'Connor, 1998). There are many sets of phonetic and phonological rules that connect with each other to form one system. English letters differ from Indonesian ones. Some are fairly similar, like 'th', 'ch', and 'sh'. One letter can have more than one sound. Unlike Indonesian, English is distinctive in that the sounds of words differ from their written form. For instance, students must say the word “one” as “wan”, not as it is spelled. When students are not familiar with English words, errors and mistakes often appear. The other features
of pronunciation are minimal pairs, two words which differ by only one sound, like 'lice' and 'rice'. The best way to learn pronunciation is to build speaking habits and to bring in technology to support them.

Recently, technology has become part of people's lives for many purposes, including virtual interaction, the transfer of goods and services, business transactions, commercial jobs, and even education. It is time for the educational sector to adapt fully by modifying teaching and learning. Many schools and institutions have technology facilities such as a Learning Management System (LMS) to support the learning process. Some popular online classrooms for teaching in Distance Education (DE) are Google Classroom, Edmodo, Microsoft Teams, and Moodle. These all belong to the Massive Open Online Courses (MOOCs) used for online learning. They can appraise students' work in online meetings in the form of grade levels. The emergence of technology in education also influences and modifies teachers' teaching methods. For a long time, the teacher's role in the classroom has been vital as the model or actor for students' progress. Mostly, the method has been lecturing, giving lots of explanations in front of the class. There is no innovation in learning, and students learn passively. Conversely, with technology, teachers act as facilitators and instructors in practice. Many of teachers' tasks are finished quickly. Technology can simplify teachers' jobs in explaining courses. However, for teachers who are unfamiliar with technology, it adds new problems simply in learning to operate it. Teachers therefore need training in using technology.

Technology has come to support language teaching and learning. For many years, it took the form of chalk and blackboard. The phonograph, movies, and tape recorder have also served as means for language teaching (Ahmad, 2012). Then, Computer Assisted Language Learning (CALL) emerged to foster language learning (Fatemeh, 2014), focusing on using a computer for word processing, Excel, and PowerPoint. Then, in the era of Industrial Revolution 4.0, the internet of things (IoT) and artificial intelligence (AI) have grown very fast for many kinds of work such as browsing, business, networking, commerce, and even learning. AI refers to a machine, computer, or computer system reproducing cognitive functions of the human mind such as learning and problem-solving (Russel, as cited in Pokrivcakova, 2019). It can be a very helpful device for second language error correction (Dodigovic, 2007). It has the potential to give students more motivation and chances to engage in spoken communication in the target language (Underwood, 2017). The use of the internet and technology in English face-to-face and virtual meetings offers greater opportunities for valuable and authentic language use than are available in the class (Richards, 2015). The
networking in language learning also encourages learners to connect globally. Learners can access many resources such as e-books, PDFs, and journal articles. This can foster self-regulated or autonomous learning. Students can easily open these resources on the World Wide Web (WWW). Nowadays, Mobile Assisted Language Learning (MALL) has shaped students' learning habits and social interaction (Kannan & Munday, 2018). Most students can use mobile devices when attending classroom activities. They can practice at their own pace, at any time and wherever they are. Consequently, mobile devices are well suited to bringing students into pronunciation classes and engaging them (Fouz-González, 2020). Many iPhone and smartphone applications have great potential for rehearsing and improving aspects of English pronunciation (Fouz González, 2012).

At present, many software products and digital platforms or applications have become tools for language learning (Joy Calvo Benzies, 2017), such as the ELSA Speak software for learning to pronounce. Based in San Francisco and established in 2015 by Vu Van, it is one of the smart artificial intelligence technologies for language learning, especially for learning to speak and pronounce. This app is also categorized as Mobile-Assisted Language Learning (MALL). It makes the learning process two-way. For instance, when students pronounce words or certain sentences, the ELSA Speak system performs an analysis and gives corrective feedback. Users must register before using the app. On the initial screen of the ELSA Speak app, there are choices for the user's level of English ability. Skill stages are also available. It is an Automatic Speech Recognition (ASR) tool that can help students improve their pronunciation and speaking skills outside the classroom (Xodabande, 2017). The app is available on all mobile devices such as smartphones or Android devices; it identifies the words that people speak into the microphone and simply changes them into legible text (Liakin et al., 2015). ASR can improve quickly in terms of its accuracy in identifying spoken discourse and transcribing it into written text (Carrier, 2017). It can also simplify new ways of working on phonology and accent. Indeed, it can give direct formative assessment and feedback on accuracy. Furthermore, ASR can aid students' work on pronunciation, including the segmental features (Neri et al., 2008). In the ELSA Speak app, students can learn the scope of pronunciation, such as phonetics and phonology, and how words are written and articulated correctly.

This study focuses on the implementation of the ELSA Speak application to support students' pronunciation skills. It was conducted to determine the effect of using the ELSA Speak app in supplementing the students' pronunciation skills. The researcher used this app in learning to pronounce for the reason that it was the
automatic speech recognition tool used by most people, and for its contribution to enhancing pronunciation skills.

METHOD

This study used classroom action research (CAR) as the research design. It also combined qualitative and quantitative methods. The learning ran in cycles, each comprising planning a change, acting and observing, and reflecting on the process. The cycles were repeated until the researcher obtained the desired results. Firstly, the researcher planned to use the ELSA Speak application in teaching pronunciation, including designing and organizing the lesson plan and teaching method. Then, the researcher began applying the ELSA Speak app in the teaching and learning process. To follow the students' responses and development while using it, observation was conducted gradually. Finally, reflection on the learning was carried out to determine whether the overall learning was clear or lacking. The subjects were the second-semester English Education students of Nahdlatul Ulama University of Yogyakarta, 2018 intake, consisting of 18 students. This study was conducted during the teaching of the pronunciation and the phonetics and phonology courses. To collect the data, the researcher used instruments such as an observation checklist and a questionnaire. Test and non-test techniques were also employed. To assess the students' pronunciation skills, the test technique was applied by giving a spelling word test. For the non-test technique, the researcher interviewed about 10 English education students of Nahdlatul Ulama University of Yogyakarta about their perceptions of and responses to using the ELSA Speak app. The researcher also gave a questionnaire in the form of essay questions related to the use of the ELSA Speak app in teaching pronunciation. A pre-test and a post-test were also given to measure the students' pronunciation skills. The same test was used so that the pre-test and post-test scores could be compared. To analyze the data, the researcher used triangulation to compare the data.

RESULTS AND DISCUSSION

This section presents the research findings and the discussion. It focuses on the implementation of the ELSA Speak application in supporting the students' pronunciation skills. The findings were drawn from the cycles of learning activities in the classroom and the final post-test score. They were also supported by the results of the interview and questionnaire. They are presented in the table of pronunciation scores shown in the table of T-Test. To investigate the students' perceptions of using the ELSA Speak app, the students' answers in an interview were used. This study was conducted in semester three over about 15 meetings. It was divided into three phases or cycles,
each of which consisted of about five meetings. Each phase had the same activities, including planning a change, acting and observing, and reflecting on the learning process.

Firstly, Cycle one ran for about five meetings of teaching and learning. The researcher planned, designed, and organized the lesson plan and used the ELSA Speak app for learning to pronounce. The teaching method used was Communicative Language Teaching (CLT). In the preliminary phase, the activities began with a greeting, checking the students' attendance, and explaining the teaching materials and learning objectives. Brainstorming was always given before the main activities to engage students in the learning process. After that, the topics were explained by giving examples. Since this was a pronunciation class, the learning focused on how the students sounded out the words correctly in utterances. Drilling and training were the main activities. The researcher gave the students texts to read through before they pronounced them. Then, the researcher showed the ELSA Speak application downloaded on a smartphone and explained the benefits, functions, and features available on the app for learning to pronounce. The students then downloaded the app on Android. For the main activity, the researcher asked the students to read the whole text aloud one by one, to gauge their pronunciation skills before using the ELSA Speak app. In the middle of the lesson, it was time to use the app. The researcher gave training and instructions on operating ELSA Speak. The students could also select the level and skill stages and adjust them to their pronunciation ability. Here, the researcher observed all the students' activities while they used ELSA Speak in the classroom. The students were enthusiastic and motivated in learning. They had opportunities to learn their errors and mistakes in sounding out words from the app's responses. Correction and feedback were given immediately.

For the reflection on the learning process, the researcher assessed pronunciation through spelling words and reading texts. The aspects rated for pronunciation included accuracy in grammar, accuracy in vocabulary, fluency, appropriacy, and comprehensibility. Each aspect has its own score. In addition, to learn more about achievement, a spoken test in the form of questions and answers was applied. The performance test was administered at the end of each cycle. The results indicated that the students' pronunciation scores improved significantly. In the preliminary study, before the post-test was conducted, the average score on the students' pronunciation test was 60 for 16 students, whereas the maximum score was 70. In Cycle 1, after using the ELSA Speak app, the students' average score increased to about 72.7 for 16 students, while the maximum score was
75. This is shown in the following table.

Table 1. The students' Pronunciation Performance in Cycle one

              Accuracy in Grammar   Accuracy in Vocabulary   Fluency   Appropriacy   Comprehensibility
Preliminary   2                     2                        1         1             2
C1            3                     3                        2         2             3
* The average scores of each pronunciation indicator

The table above highlights that the students' performance in cycle one increased compared with the preliminary study.

Secondly, in Cycle two, the teaching and learning were similar to cycle one, with the researcher using Communicative Language Teaching (CLT) in teaching pronunciation. Here, the main difference was in the learning activities while using the ELSA Speak app. The students were asked to try the next stage or level, which posed more difficulties in vocabulary and practice, such as sounding out diphthongs, stressing words, and linking words to each other. This was done repeatedly to achieve better pronunciation. As a result, the students got better scores than in cycle one. The average score on the students' pronunciation test was 75 for 16 students, while the highest score was 80. The students' pronunciation scores in cycle two are presented in the following table:

Table 2. The students' Pronunciation Performance in Cycle two

              Accuracy in Grammar   Accuracy in Vocabulary   Fluency   Appropriacy   Comprehensibility
Preliminary   2                     2                        1         1             2
C1            3                     3                        2         2             3
C2            3                     3                        3         3             3
* The average scores of each pronunciation indicator

Clearly, the table above shows that the students' pronunciation achievement improved significantly, as reflected in the cycle two scores.

In the last cycle, the learning activities were still the same as in the previous one. Learning started with greeting, brainstorming, and stating the
learning objectives. More drilling and practice were given to track the students' progress in learning. Moreover, many tasks and materials were provided to stimulate the students' engagement in learning to pronounce. The skill level was raised accordingly. The students got more complex exercises in sounding out words within sentences. The researcher also gave feedback and evaluation on the learning by asking about the students' difficulties and progress. Question-and-answer sessions were also held to find out to what extent the students could pronounce words well and clearly in practice. In Cycle three, the students' scores were excellent: the average score was 80 for 16 students, and the highest score was 85. The table below presents the progress of the students' pronunciation.

Table 3. The students' Pronunciation Performance in Cycle three

              Accuracy in Grammar   Accuracy in Vocabulary   Fluency   Appropriacy   Comprehensibility
Preliminary   2                     2                        1         1             2
C1            3                     3                        2         2             3
C2            3                     3                        3         3             3
C3            4                     4                        4         4             4
* The average scores of each pronunciation indicator
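
The progression from the preliminary study through the three cycles can be summarized numerically. The following is a minimal Python sketch, not part of the original study. It uses only the average test scores reported in the text (60, 72.7, 75, 80) and the per-indicator ratings in Table 3. The function and variable names are illustrative, and the 0-100 test scale and the reading of the first rubric column as accuracy in grammar are assumptions.

```python
# Average pronunciation test scores reported in the text
# (a 0-100 scale is assumed; the source does not state the scale).
test_means = {"Preliminary": 60.0, "Cycle 1": 72.7, "Cycle 2": 75.0, "Cycle 3": 80.0}

# Per-indicator average ratings copied from Table 3, in column order:
# accuracy in grammar (assumed label), accuracy in vocabulary,
# fluency, appropriacy, comprehensibility.
rubric = {
    "Preliminary": [2, 2, 1, 1, 2],
    "Cycle 1": [3, 3, 2, 2, 3],
    "Cycle 2": [3, 3, 3, 3, 3],
    "Cycle 3": [4, 4, 4, 4, 4],
}


def stage_gains(means):
    """Return (stage, gain over previous stage, gain over first stage)."""
    stages = list(means)
    baseline = means[stages[0]]
    return [
        (cur, round(means[cur] - means[prev], 1), round(means[cur] - baseline, 1))
        for prev, cur in zip(stages, stages[1:])
    ]


def rubric_means(ratings):
    """Average the five indicator ratings for each stage."""
    return {stage: sum(values) / len(values) for stage, values in ratings.items()}


if __name__ == "__main__":
    for stage, step, total in stage_gains(test_means):
        print(f"{stage}: +{step} vs previous stage, +{total} vs preliminary")
    print(rubric_means(rubric))
```

On these figures, the average test score rises by 20 points from the preliminary study to Cycle 3, and the mean rubric rating rises from 1.6 to 4.0.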

As shown above, the students' achievement in sounding out words increased greatly. This means that the ELSA Speak app can support students in improving their pronunciation. The students found learning easier. Clearly, there are differences between conventional instruction, as teachers have long practiced it, and modern approaches such as the use of Automatic Speech Recognition (ASR) in learning to pronounce. These results are consistent with an experimental study (Elimat & AbuSeileek, 2014) which showed a significant difference between the experimental class taught using ASR and the control class taught through regular teaching. The use of ASR was more effective than the traditional mode in learning to pronounce. Moreover, research on the use of ASR in pronunciation classes (Fathi Sidig Sidgi & Jelani Shaari, 2017) obtained significant results. It claimed that ASR was very beneficial in supporting students' pronunciation skills and helping students recognize their errors and mistakes in sounding out words. Thesis research (Guskaroska, 2019) likewise reported that ASR can improve pronunciation skills, as shown by the better scores of the experimental group compared with the control group.

The findings were also obtained from the interview on the students' perceptions of using the ELSA Speak app in the pronunciation class. The students' answers are shown in the following transcript:

Researcher: [What do you get of using ELSA Speak App in learning to pronounce?]
Student 1: [I am motivated in using this app for my learning to pronounce. ELSA Speak app gave me simplicities and more facilities in learning. My utterances can be corrected and revised directly as the native said. I got lots of vocabulary and practices].
Student 2: [I am very enjoyed using ELSA Speak app in learning. It can help me revise my speaking. I can learn more English words even long sentences. Also, I can study the phonetic and phonology aspect. I can study words whenever and wherever I am].
Student 3: [This is a good application for learning to speak and to pronounce. I no longer went to abroad to meet tourist only for talking and discussion using English, but with this App I feel I talk with English people. My speaking was corrected by native. I was motivated in using it].
Student 4: [ELSA Speak app has supported me in improving my pronunciation skill. I can sound many words correctly. This app gave me chances to learn more about words, phrases, even sentences well. I was a very spirit in joining the class].

From the interview above, it was clear that the ELSA Speak application enabled the students to learn pronunciation more easily and quickly. It was especially useful in giving correction and feedback instantly. The students were motivated to learn to pronounce words. This corresponds with a study (Sarmita Samad & Aminullah, 2019) which stated that the ELSA Speak app was suitable for learning to pronounce. Many of the available content categories were good, such as programs related to English proficiency level. The features, such as vowel and consonant sounds, diphthongs, syllables, and word stress, can serve as a reference for learning. Judging from the students' participation in using the ELSA Speak app for pronunciation, their excitement was high. Their attitude toward learning can be shaped and controlled. They can learn by themselves. This statement is supported by a study (Haryadi, S & Aprianoto, 2020) which revealed that pronunciation apps can offer advantages and positive effects for independent learning and participation in the classroom.

The next findings were drawn from the students' questionnaire. 85% of the students said that they liked using ELSA Speak for learning to pronounce, and 90% of the students felt motivated and improved while using it.

The evidence mentioned highlights that the ELSA Speak app can give more stimulus and input to students in learning to pronounce and even to speak. It is
very good and convenient for students to use in the classroom. It is also a new app providing many features, such as English levels for students from beginner to advanced. The teacher can also use it at any time to support pedagogical design.

CONCLUSION

From the results and discussion in the previous section, it is clear that the ELSA Speak application can support and improve students' pronunciation skills and motivation. This was shown by the students' performance and the scores obtained. The students can hear and imitate sounds in the same way as native speakers produce them. The students' voice recordings are received by the system immediately and corrected toward the proper form. The students can also increase their English vocabulary. This app can successfully make students engage directly. Lastly, concerning the use of technology in language teaching, some consideration must be given to the students' readiness and the app used. The choice of an app appropriate to the language skill should be considered before using it in teaching and learning in order to make learning effective and efficient. Technology is only supplementary in language teaching. English teachers cannot neglect their own English competence. The key to making language teaching a success also lies in teachers' pedagogy, that is, how teachers manage and handle classroom activities. Moreover, teachers should know the students' needs and wants. Finally, English teachers should regard the ELSA Speak app as a proper technology for teaching pronunciation skills.
Kholis, A. | Elsa Speak App: Automatic Speech Recognition (ASR) for…, 01-14


