
VIETNAM NATIONAL UNIVERSITY – HO CHI MINH CITY

INTERNATIONAL UNIVERSITY
SCHOOL OF LANGUAGES

Perception and Production of English Initial Aspirated Plosives /p-t-k/ by International University Students
A thesis submitted to
The School of Languages, International University,
in partial fulfillment of the requirements for the Degree of
Bachelor of Arts in English Linguistics and Literature

Student’s name: Trần Ngọc Hồng Phúc – ENENIU18129


Supervisor: Võ Thanh Nga, M.A.

February/2023

DECLARATION OF UNDERGRADUATE THESIS PROJECT PAPER AND COPYRIGHT

Author’s full name : Trần Ngọc Hồng Phúc


Date of birth : 25/5/2000
Title : Perception and production of English initial aspirated plosives /p-t-k/ by International University students
Academic Session : 2022 – 2023

I acknowledge that International University – VNU reserves the following rights:


1. The thesis is the property of International University – VNU.
2. The Library of International University – VNU has the right to make copies
for the purpose of research only.
3. The Library has the right to make copies of the thesis for academic exchange.

Student: “I hereby certify that the attached material is my original work. No other person’s work or ideas have been used without acknowledgement. This work has not been submitted, either wholly or substantially, for a degree at this university or elsewhere.”

______________________________
Signature of student

______________________________
Student’s ID No.

Date: 02/2023

Supervisor: “I hereby declare that I have read this thesis project paper and in my
opinion, this paper is sufficient in terms of scope and quality for the award of the
Degree of Bachelor of Arts in English Linguistics and Literature.”

______________________________
Signature of Supervisor

______________________________
Name of Supervisor

Date: 02/2023

THESIS APPROVAL

PERCEPTION AND PRODUCTION OF ENGLISH INITIAL ASPIRATED PLOSIVES /p-t-k/ BY INTERNATIONAL UNIVERSITY STUDENTS

by
Tran Ngoc Hong Phuc

APPROVED:

__________________________________________ _______________
Thesis Supervisor Approval Date
Vo Thanh Nga

__________________________________________ ____________
Thesis Reviewer Approval Date
Vu Hoa Ngan

__________________________________________ _______________
Dean Approval Date
Nguyen Huy Cuong
Acknowledgement

My academic journey at International University – VNU-HCMC is coming to an end with this thesis. Needless to say, I could not have finished this paper on my own; many people supported me throughout this semester so that I could eventually complete my research.

Firstly, I would like to express my deepest gratitude to my beloved supervisor, Ms. Vo Thanh Nga, M.A., for her patience and dedication in supervising my paper. It was my greatest honor to work with her, as I learnt a great deal from her in many respects. Her guidance enabled me to structure my ideas, shape the content down to the smallest details, and finally complete this paper on time. She showed great professionalism throughout the time we worked together on this research, which I deeply respect.

Secondly, I would like to express my gratitude to Ms. Vu Hoa Ngan for her insightful comments on my thesis, and to all of the professors in the IU English Department for their lectures and helpful guidance during my coursework and data collection.

Thirdly, I want to express my appreciation and heartfelt gratitude to Ms. Dang Hoai Phuong for providing all the resources needed for my thesis and for her remote support.

Moreover, I would like to thank my friend Long N., who gave me technical support in collecting and decoding data. Another big thanks goes to the 30 participants, all of whom were complete strangers to me at first, for their willing participation in the tests for my data collection.

Finally, I want to send my love to my family for their indirect support and the encouragement that motivated me to finish my very last work at university.

Perception and Production of English Initial Aspirated Plosives /p-t-k/ by International University Students


Tran Ngoc Hong Phuc

School of Languages, International University – VNU HCM

EL046IU: Thesis

Vo Thanh Nga (M.A.)

Author Note

I have no known conflict of interest to disclose.

Correspondence regarding this study should be addressed to Tran Ngoc Hong Phuc, School of Languages, International University – VNU HCM, Quarter 6, Linh Trung Ward, Thu Duc City, HCMC, Vietnam. Email: eneniu18115@student.hcmiu.edu.vn

Table of contents

THESIS APPROVAL
APPROVED:
Abstract
Introduction
Literature Review
  English plosive consonants
    Types of English plosives
    Articulatory properties of aspirated plosives
  Voice Onset Time (VOT)
    Reason to use VOT
    VOT of targeted sounds
  Previous research
  Research gap
Methodology
  Study design
  Participants
  Materials
  Recording method
  Research instrument
    Perception test
    Production test
  Data collection procedures
  Data analysis
    Perception test
    Production test
Findings
  Perception test
  Production test
Discussion
Implications and Limitations
Conclusion
References
Appendices
  Appendix A. Consent form
  Appendix B. Perception test
  Appendix C. Production test

List of Tables

Table 1. Place of articulation and voicing of the target plosives
Table 2. Mean VOT of plosive in isolated words
Table 3. Mean VOT of plosive in utterances

List of Figures

Figure 1. Types of VOT
Figure 2. Individual Perceptual Errors of Plosives
Figure 3. Mean VOT of aspirated plosives in isolated words and in utterances
Figure 4. The word ‘pack’ as pronounced by the speaker BABAIU22470
Figure 5. The word ‘talk’ as pronounced by the speaker BABAIU22483

Abstract
In order to communicate successfully in a language, an individual must be able to both perceive and produce it. This study investigated how Vietnamese students at International University perceive and produce the English initial aspirated plosives /p-t-k/, using both quantitative and qualitative approaches. Thirty IU students at elementary level were invited to participate. Perception and production tests served as the research instruments, and the software Praat was used to analyze the samples recorded in the production tests. The perception test consisted of an identification task and a discrimination task, which examined the participants' ability to perceive the target sounds. The outcomes of both tests revealed that the bilabial /p/ was the most problematic sound, whereas the alveolar /t/ posed little problem for Vietnamese students. Furthermore, in isolated words the greatest mean VOT was found for /t/ (85ms), followed by /k/ (83ms) and /p/ (65ms); in utterances, however, the velar stop /k/ had the greatest mean VOT (91ms), followed by /t/ (77ms) and /p/ (60ms). The findings are expected to contribute to the literature on the perception and production of these English initial consonants in the Vietnamese context and to offer pedagogical implications for pronunciation teaching.

Keywords: aspirated plosives, initial sounds, production, perception, Praat, VOT


Introduction
Pronunciation makes a significant contribution to a language learner's success. However, it is also a major challenge that non-native speakers confront when studying English. Improper pronunciation can lead to negative impressions, misunderstanding and ineffective communication. To communicate successfully, interlocutors need to understand what other speakers say as well as present their own speech in a way that others can comprehend (Gilakjani & Sabouri, 2016). In other words, both perception and production contribute to speech intelligibility. Yet both remain challenges for non-native speakers, especially those who are not English majors, in their target language (Ben, 2005). Moreover, ‘pronunciation emerged as by far the greatest factor in unintelligibility, and the difficulty tended to increase with the gap between interlocutors’ first languages’ (Jenkins, 2000, p. 258). In addition, the contrastive analysis hypothesis suggests that differences between two languages can create many difficulties for learners in the process of second language acquisition (Dost & Bohloulzadeh, 2017). In the case of English and Vietnamese, some significant differences in pronunciation can cause Vietnamese learners to struggle. Certain suffixes and consonants that do not exist in Vietnamese, such as /θ/ or /ð/, are difficult for many Vietnamese speakers to master and therefore require considerable practice.

Vietnamese speakers mispronounce not only sounds that do not exist in their native language, such as /θ/ or /ð/, but also sounds that are close to those in their first language. Vietnamese and English share some consonants in the initial position (Tang, 2007). As exemplified in Truong's (2015) study, Vietnamese speakers pronounce the English /ʧ/ and the Vietnamese /C/ (or /Ch/) quite similarly, even though they are not shared sounds. In addition, Hoang (1970) noted that the replacement of /ʧ/ by /C/ was the second most frequent substitution made by the informants, owing to their confusion between the two. Thus, not only non-shared sounds but also sounds that "appear" to be shared can disrupt learners' perception and pronunciation.

Although fricatives and affricates have been studied in detail, insufficient attention has been paid to plosives. With regard to the aspirated plosives /p, t, k/, there is a considerable difference between English and Vietnamese. In English, the initial plosives /p, t, k/ are aspirated depending on context. By contrast, according to Dinh & Nguyen (1998), /p/ and /k/ are unaspirated or implosive stops in Vietnamese. These differences may therefore cause many difficulties in pronouncing aspirated plosives, since students' English accent is strongly influenced by Vietnamese. Because of these differences between /p, t, k/ in the two languages, this research investigates the English aspirated plosives in the Vietnamese context.

Literature Review
English plosive consonants

Types of English plosives

Generally, there are two types of plosives: voiced and voiceless. Each type has its own distinguishing characteristics.

In theory, according to Lisker & Abramson (1964), the presence of glottal buzz during the articulatory closure indicates a voiced stop, while the absence of buzz throughout this interval indicates a voiceless stop. Acoustically, the two types can be distinguished by their spectrographic patterns. For voiced stops, a limited number of low-frequency harmonic components traverse the formantless segment corresponding to the closure duration, whereas the closure interval of voiceless stops is virtually blank. For English, however, these physical criteria separate the two categories only in part, because in initial position both sets are commonly produced with silent closure intervals and would therefore be classed as voiceless according to the definitions cited. As a result, aspiration is highlighted as another phonetic feature that separates /p t k/ from /b d g/ (a feature that applies in initial position and medially before a stressed syllable).

At the beginning of a syllable, these consonants are released with a small explosion. During the post-release period, air escapes through the still-open vocal cords, producing a sound similar to /h/; this is referred to as aspiration. The vocal cords then come together and voicing for the vowel begins. Phonological studies have shown that listeners perceive an initial plosive as voiceless when there is a delay between the plosion and the beginning of the vowel (Awoonor-Aziaku, 2021). In short, aspiration is a brief frication noise that occurs before the vowel formants and lasts around 30ms.

The context in which a stop appears determines whether it is produced as an aspirated or an unaspirated voiceless stop. This research focuses on stops at the beginning of a word, where voiceless stop consonants are frequently articulated with this extra puff of air. Aspiration is indicated by a raised h after the phonetic symbol, as in [pʰ].

Articulatory properties of aspirated plosives

Roach (2009, pp. 26-27) lists the place of articulation and voicing of the target plosives as follows:

Table 1. Place of articulation and voicing of the target plosives.

                Place of articulation
Voicing         Bilabial    Alveolar    Velar
Voiceless       p           t           k

(Roach, 2009, pp. 26-27)

Stages in the production of English aspirated plosive consonants:

The closure phase: the articulators move towards each other, make firm contact, and close the air passage completely.

The hold phase: the air stream is temporarily stopped at the place of articulation (the lips for /p/, the alveolar ridge for /t/, the soft palate for /k/), so air pressure builds up behind the closure.

The release phase: the speech organs separate abruptly and release the closure, allowing the compressed air to escape quickly with a slight plosion.

The post-release phase: aspiration occurs, completing the production of the voiceless plosive.

(Roach, 2009)

This study focuses only on voiceless aspirated plosives in initial position that are not part of a consonant cluster.

English versus Vietnamese plosives

Concerning manner of articulation and voicing, /p/, /t/ and /k/ in both English and Vietnamese are considered voiceless stops. It should be noted, however, that /p, k/ in Vietnamese are unaspirated. Moreover, slight differences exist in the place of articulation of these sounds: English /t/ is an alveolar sound, whereas Vietnamese /t/ is a dental (tongue against teeth) sound. Meanwhile, /p/ is bilabial and /k/ is velar in both English and Vietnamese (Thuật, 2000; Roach, 2009).

English belongs to the group of languages that have two categories of stops at each place of articulation: /b/-/p/ for labial sounds, /d/-/t/ for alveolar sounds, and /g/-/k/ for velar sounds. Because the examined plosives /p, t, k/ occur in initial position, they should be pronounced with aspiration.

In Vietnamese, on the other hand, besides the aspirated /tʰ/ (as in ‘thôi’), which is roughly equivalent to the aspirated English /t/, there is also a voiceless unaspirated phoneme /t/, as in ‘tôi’ (‘me’). Additionally, two Vietnamese phonemes are easily mistaken for the English /k/: the voiceless unaspirated velar stop /k/, as in ‘kiến’ (‘ant’), and the voiceless fricative /x/, as in ‘không’ (‘no’). A related voiced sound, the velar fricative /ɣ/ as in ‘ghế’ (‘chair’), is not taken into account because it is not a stop. Therefore, the voiceless aspirated plosive /kʰ/ does not exist in the Vietnamese sound system and is consequently expected to be challenging for the participants. Moreover, /p/ occurs syllable-initially only in loanwords. In such words, some Vietnamese speakers pronounce it as a voiceless unaspirated bilabial /p/, as in ‘sâm panh’ (from French ‘champagne’) or ‘pin’ (from French ‘pile’). In other cases, it is replaced by the voiced bilabial implosive /ɓ/, as in ‘bạn’ (‘friend’), or by the voiceless fricative /f/, as in ‘pháo’ (‘firework’).

The table below summarizes the Vietnamese sounds related to the English plosives:

Labial      ɓ (implosive)    p (unaspirated)    f (fricative)
Alveolar    ɗ (implosive)    t (unaspirated)    tʰ / t’ (aspirated)
Velar       ɣ (fricative)    k (unaspirated)    x (fricative)

(IPA/Vietnamese, 2022)

Based on the contrastive analysis hypothesis, it can be predicted that the participants will struggle with voiceless aspirated /k/ and /p/, since Vietnamese has no sound with the same manner and place of articulation. Meanwhile, Vietnamese learners are expected to find the English /tʰ/ easier to master.

Voice Onset Time (VOT)

Reason to use VOT

VOT is a characteristic of stop consonant production, defined as the time interval (in milliseconds) between the release of a stop consonant and the onset of voicing. VOT has been regarded as a highly effective means of differentiating phonemic categories (Lisker & Abramson, 1964).
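As a minimal illustration of this definition, VOT can be computed from two time points read off a recording: the stop release (burst) and the onset of voicing. The function name and the time values below are hypothetical, chosen only to show the arithmetic and the sign convention, not taken from this study.

```python
def voice_onset_time_ms(release_s: float, voicing_onset_s: float) -> float:
    """Return VOT in milliseconds: voicing onset minus stop release.

    Positive result: voicing begins after the release (voicing lag).
    Negative result: voicing begins before the release (voicing lead).
    """
    return (voicing_onset_s - release_s) * 1000.0


# Hypothetical time points in seconds (not measurements from this study).
print(round(voice_onset_time_ms(0.512, 0.587), 1))  # 75.0  (voicing lag)
print(round(voice_onset_time_ms(0.300, 0.210), 1))  # -90.0 (voicing lead)
```

Subtracting the release time from the voicing onset keeps the sign meaningful: a voiced stop with prevoicing comes out negative, an aspirated stop comes out clearly positive.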


Figure 1. Types of VOT

A VOT is considered “negative” (type 3) when voicing starts before the release of the stop. This is typical of voiced stops, whose voice onset times are below zero (Kaur, 2015). In contrast, a “positive” VOT, which indicates a voiceless stop, occurs when voicing begins after the stop has been released (that is, after the burst), resulting in a “voice lag”. The length of this voice lag depends on whether the voiceless stop is produced with aspiration (type 1) or without aspiration (type 2). A short voice lag in which voicing begins simultaneously with or just after the burst is sometimes referred to as “zero VOT” and typically represents unaspirated voiceless stops.

In general, voiced sounds are characterized by a negative VOT, voiceless unaspirated sounds by a VOT of around zero, and aspirated sounds by a positive VOT (Styler, 2012). VOT is therefore a key feature in classifying plosives as voiced or voiceless.

VOT of targeted sounds

This study differentiates American English stops based on criteria adapted from Lisker and Abramson's (1964) cross-linguistic study of the Voice Onset Time (VOT) values of stop categories.

First, aspirated stops such as /p/, /t/ and /k/ have a mean VOT in the range of 60-100ms, which indicates a long voice lag. By contrast, the mean VOT of /b/, /d/ and /ɡ/ varies. In most cases, they have the features of an unaspirated voiceless stop, with a very short voice lag and VOT values ranging from 0 to 25ms (Auzou et al., 2000). Only in some cases are they fully voiced stops with negative VOT values between -125ms and -75ms. This is supported by Roach (2009), who asserts that the key factor distinguishing /p/, /t/, /k/ from /b/, /d/, /g/ in English is not voicing but aspiration, especially in initial position. According to Roach, it is unnatural to produce initial /b/, /d/, /g/ fully voiced. Accordingly, this study separates the plosives into two categories: unaspirated plosives, with VOT values from below 0 up to 25ms, and aspirated plosives, with VOT values above 26ms.

Second, in terms of place of articulation, English velar stops have the longest VOT, whereas English bilabial stops have the shortest. Auzou et al. (2000) suggest that the VOT must exceed 15ms for /t/ and 30ms for /k/ for the stop to be classified as aspirated, while the corresponding figure for /p/ is lower.

Finally, regarding the position of the stops (initial in isolated words, initial in phrases, and medial in sentences), stops in sentences have shorter VOTs than those in isolated words, demonstrating "temporal compression in rapid speech" (Lisker & Abramson, 1964, p. 414).

Previous research

Several studies have investigated plosive sounds. Muis (2008) focused solely on the voiced plosives /b, d, g/. His study was restricted to the ability to produce voiced plosives, as he tested his subjects only on words containing them.

Loitsch (2016) studied English voicing by asking 10 adult Austrian learners to distinguish sounds in 63 minimal pairs, including two pairs of plosives (/p/ vs /b/; /t/ vs /d/) in initial position. In the perception test, the subjects listened and chose the words they heard; in the production test, they repeated after the recording. The results of both tests indicated that voiceless sounds were produced more accurately than their voiced counterparts, which was attributed to differences between the sound systems of the L1 (Austrian German) and the L2 (English). The study also showed that highly accurate perception does not entail highly accurate production of the same sounds.

In the Vietnamese context, Tam (2005) investigated common pronunciation problems among 51 students of the English Department at VNU Hanoi – University of Languages and International Studies who had completed four years of English and taken the final exam. The oral examination served as the production test. Sound confusion was also detected in this study: 25.5% of participants confused /t/ with /ʧ/, and 17.6% confused /p/ with /b/. The researcher also identified the similarities and differences between L1 and L2 as one of the factors causing difficulty in pronouncing voiceless plosives.

T. T. Hoang's (2014) investigation of first-year students' pronunciation problems at Thai Nguyen University of Industrial Technology showed that Vietnamese affects students' pronunciation of English sounds. A variety of data-gathering methods were used, including recording students' pronunciation, evaluating their performance in class, and taking notes. Specifically, 26% of participants replaced /p/ with /b/ or /f/. The researcher argued that the Vietnamese phoneme system is responsible for this mispronunciation of /p/ as /f/ or /b/, because /p/ does not occur in initial position in Vietnamese.

Research gap

The studies above show that, despite the variety of pronunciation topics covered, research in Vietnam on both the perception and the production of English sounds is still limited, as most studies examine only one of the two. Furthermore, studies incorporating acoustic analysis, which can provide a more objective view of how these voiceless aspirated plosives are produced, are also scarce. Therefore, to help fill the gap left by previous research, this study analyzes both perception and production. It seeks to answer the following questions:

1. How do students at IU-VNU perceive initial plosives /p t k/?

2. How do students at IU-VNU produce initial plosives /p t k/?

Methodology

Study design

In this study, participants completed two perception tests and two production tests. The research takes a mixed approach combining quantitative and qualitative methods, with the descriptive qualitative approach taking precedence. First, it includes a qualitative analysis of non-numerical data, undertaken by observing the acoustic representations and auditory perceptions of the target productions. Second, it follows a quantitative methodology because it measures the scores of both the perception and production tests.

Participants

A total of thirty students from all majors enrolled in the 2022-2023 academic year at International University were randomly chosen for this study. All participants were at the same level of English proficiency, as they were all taking the Intensive English 2 (IE2) course (equivalent to pre-intermediate level, or IELTS band 5.5) in the first semester, based on the results of the International University Placement Test. These criteria were set to minimize differences between participants as much as possible.

Materials

In terms of resources, Ship or Sheep (Baker, 2006) and Pronunciation in Use – Elementary (Marks, 2007) were the key sources for the assessment tasks. Audio files were taken from Pronunciation in Use and the Cambridge online dictionary (https://dictionary.cambridge.org/). The audio was primarily in an American accent.

Recording method

All pronunciations were recorded with a microphone connected to a smartphone (iPhone only) and then analyzed in Praat. The recordings were saved as 16-bit .wav files with a sampling rate of 22,050 Hz.
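The sampling rate also determines how finely time points such as the burst and the voicing onset can be located. The sketch below uses hypothetical helper names to convert between sample counts and milliseconds at 22,050 Hz; it shows that one sample lasts about 0.045 ms, well below the millisecond scale of the VOT values reported in this study.

```python
SAMPLE_RATE_HZ = 22_050  # sampling rate stated for the recordings


def samples_to_ms(n_samples: int, rate_hz: int = SAMPLE_RATE_HZ) -> float:
    """Convert a number of samples to a duration in milliseconds."""
    return n_samples * 1000.0 / rate_hz


def ms_to_samples(duration_ms: float, rate_hz: int = SAMPLE_RATE_HZ) -> int:
    """Convert a duration in milliseconds to the nearest whole number of samples."""
    return round(duration_ms * rate_hz / 1000.0)


print(round(samples_to_ms(1), 4))  # 0.0454 ms per sample
print(ms_to_samples(25))           # 551 samples span 25 ms
```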

Research instrument

The research instruments of this study are production and perception tests.

Perception test

In this study, the perception test is a listening exam whose purpose is to assess students' ability to distinguish between English initial consonant sounds. The exam is divided into two parts (see Appendix B):

Part 1: Listen and circle the words

Part 2: Listen and circle the word the same as the last word.

The two perceptual tests (Appendix B) were an identification test (Part 1) and a discrimination test (Part 2) following the ABX format (Liberman, Harris, Hoffmann, & Griffith, 1957). In Part 1, the Perceptual Identification Test (PIT), participants listened to a target word and answered a multiple-choice question about the consonant the word began with; six minimal pairs were included. Part 2, the Categorical Discrimination Test (CDT), was adapted from the ABX format (Liberman et al., 1957) and consisted of six questions, each presenting a sequence of three words, henceforth called a triad. In each triad, two tokens are the same word and the third is a different lexical item (e.g. bay-pay-pay). Participants were asked to differentiate the initial consonants in the triads: they had to decide whether the initial consonant of the third word was (a) the same as that of the first word or (b) the same as that of the second word.

In short, in identification tasks listeners directly assign a sound to one of two categories, whereas in discrimination tasks (also called ABX tasks) they hear three words and must decide whether the first or the second word matches the third.
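The scoring rule for a single ABX triad can be sketched as follows. This is an illustrative function, not the scoring procedure used in the study; it assumes the response is recorded as 'a' or 'b' and that A and B are different words, as in bay-pay-pay.

```python
def score_abx(a: str, b: str, x: str, response: str) -> bool:
    """Return True if the listener correctly matched X to A or B.

    a, b     -- the first and second words of the triad (must differ)
    x        -- the third word, identical to either a or b
    response -- 'a' or 'b', the word the listener judged to match x
    """
    if a == b or x not in (a, b):
        raise ValueError("a and b must differ, and x must equal one of them")
    if response not in ("a", "b"):
        raise ValueError("response must be 'a' or 'b'")
    chosen = a if response == "a" else b
    return chosen == x


# The triad bay-pay-pay: the third word matches the second.
print(score_abx("bay", "pay", "pay", "b"))  # True
print(score_abx("bay", "pay", "pay", "a"))  # False
```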

Production test

The production test consists of two read-aloud tasks (Appendix C): participants were asked to pronounce each isolated word three times and each word in utterances twice, slowly and clearly.

Data collection procedures

The procedures were as follows:

1. The researcher asked teachers for permission to collect data in their classrooms; permission was given to collect data from 5th to 12th December 2022 at International University during break time.

2. The consent form (Appendix A) was distributed to every participant.

3. The perception tests were handed out to the participants in the classroom.

4. All participants completed the perception tests, which contained a total of 12 items in two parts and took around two minutes.

5. After finishing the perception tests, the participants were orally instructed to do the production test. Their reading of the word list was recorded on their own phones (iPhones only). The test contained a total of nine target sounds (in isolated words and in utterances) and took around two minutes.

Data analysis

Perception test

The two perception tests contain 12 questions in total. An answer is marked “correct” if it matches the answer key and “incorrect” otherwise. The percentage of incorrect answers for each target sound is then calculated.
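As a minimal sketch of the scoring rule just described, the Python snippet below marks each answer against a key and reports the percentage of incorrect answers per target plosive. This is an illustration, not the procedure the study ran: the key covers Section 2 only and is derived from Appendix B (the correct option is the one whose word matches word C), and the function name `score` and the two answer sheets are hypothetical.

```python
from collections import Counter

# Section 2 answer key derived from Appendix B: the correct option is the one
# whose word matches word C. question number -> (target plosive, correct option)
SECTION2_KEY = {
    1: ("p", "B"),  # bay / pay / pay
    2: ("k", "A"),  # came / game / came
    3: ("p", "B"),  # bast / past / past
    4: ("k", "B"),  # gold / cold / cold
    5: ("t", "A"),  # tie / die / tie
    6: ("t", "A"),  # town / down / town
}


def score(answer_sheets, key):
    """Mark each answer correct or incorrect; return % incorrect per plosive.

    Any answer other than the keyed option (including ticking both) is incorrect.
    """
    total = Counter()
    wrong = Counter()
    for sheet in answer_sheets:
        for question, answer in sheet.items():
            target, correct = key[question]
            total[target] += 1
            if answer != correct:
                wrong[target] += 1
    return {t: round(100 * wrong[t] / total[t], 1) for t in sorted(total)}


# Hypothetical answer sheets from two participants
sheets = [
    {1: "A", 2: "A", 3: "B", 4: "B", 5: "A", 6: "A"},
    {1: "B", 2: "A", 3: "A", 4: "A", 5: "A", 6: "A"},
]
print(score(sheets, SECTION2_KEY))  # {'k': 25.0, 'p': 50.0, 't': 0.0}
```

Grouping by target plosive rather than by question is what allows the error counts per sound to be compared, as in the findings below.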

Production test

After data collection, the participants’ recordings were analyzed in Praat, focusing on voice onset time (VOT). The beginning and end of each investigated consonant were determined from the waveform and spectrogram.

To obtain the VOT value of each target stop, two time points were located on its waveform. The burst was identified as the first spike, which signals the sudden noise caused by the stop release, and the voicing onset as the beginning of the periodic wave. VOT is the interval from the burst to the voicing onset; it is negative when voicing begins before the release. Each token was then classified as follows:

1. “aspirated stops”: VOT of 26ms or more.
2. “unaspirated stops”: VOT of 25ms or less, including negative values.

Adapted from Lisker & Abramson (1964)
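The measurement and classification steps above can be sketched in Python as follows. This is an illustration under stated assumptions, not the procedure actually run: the two time points are assumed to have been read off manually in Praat (in seconds), the helper names `vot_ms` and `classify` are hypothetical, and the category boundary is assumed to fall between 25ms and 26ms, so any value above 25ms counts as aspirated.

```python
def vot_ms(burst_s: float, voicing_onset_s: float) -> float:
    """VOT in milliseconds: voicing onset minus burst release.

    A negative value means voicing started before the release (prevoicing).
    """
    return round((voicing_onset_s - burst_s) * 1000, 1)


def classify(vot: float) -> str:
    """Categories adapted from Lisker & Abramson (1964), as listed above.

    Assumption: the aspirated/unaspirated boundary lies between 25 and 26 ms.
    Negative VOTs are grouped with the unaspirated tokens, as in Tables 2 and 3,
    but flagged because they indicate a voiced (/b/-like) realization.
    """
    if vot > 25:
        return "aspirated"
    if vot >= 0:
        return "unaspirated"
    return "unaspirated (negative VOT, voiced)"


# Hypothetical (burst, voicing onset) time points in seconds for three /p/ tokens
tokens = [(0.512, 0.583), (1.204, 1.222), (0.497, 0.400)]
for burst, onset in tokens:
    v = vot_ms(burst, onset)
    print(v, classify(v))
```

The sketch only covers the arithmetic and thresholds; locating the burst and the voicing onset on the waveform remains a manual judgement in Praat.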

Findings

Perception test

The results of the two perception tests are shown in Figure 2, in which the horizontal axis shows the plosives tested in the two tests and the vertical axis shows the number of participants who answered incorrectly.

Figure 2. Individual Perceptual Errors of Plosives.

Overall, /p/ is by far the most confusing of the three target plosives in the perception test. The numbers of errors made on /p/ in Test 1 and Test 2 were 8 and 20, respectively. By contrast, only one error was detected for /t/ in Test 1, and none in Test 2. Thus, bilabial /p/, the most anterior place of articulation, shows the highest error rate; velar /k/, the most posterior, shows a lower error rate; and alveolar /t/ shows the lowest.

The nature of the perceptual errors also reveals a pattern: the errors are concentrated on the most anterior place of articulation, /p/. As can be seen in Figure 2, a voiceless plosive is misheard as the voiced plosive with the same place of articulation: /p/ is systematically misheard as /b/, and /k/ as /g/. However, there is hardly a single case of alveolar /t/ being misheard as /d/.

The perception test also reveals individual differences in perceptual ability. For instance, some students made only one mistake in the entire test, whereas others made as many as five. Such individual differences may be conditioned by various factors, such as natural phonetic talent and unconscious or deliberate ear training through foreign language acquisition.

Production test

Table 2. Mean VOT of plosives in isolated words (Av and R in ms).

      aspirated /k/  unaspirated /k/  aspirated /t/  unaspirated /t/     aspirated /p/  unaspirated /p/
Av    83             23               85             -117; 18            65             -23; 18
R     29 : 156       22 : 23          48 : 136       -126 : -108; ///    27 : 118       -97 : -5; 9 : 23
N     120            2                122            3                   74             52

The findings for the initial stops of isolated words are presented in Table 2. The results for each of /p/, /t/ and /k/ are divided into two groups: aspirated and unaspirated. The first row (Av) gives the mean VOT of each category, the second (R) the range of values observed, and the third (N) the number of tokens of each stop phoneme.
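To make the three rows concrete, the sketch below derives Av, R and N from a list of VOT values, summarizing negative and non-negative values separately as the unaspirated columns appear to do. The input is a hypothetical reconstruction: it assumes, as the Av and R cells for unaspirated /t/ suggest, that the three tokens were -126, -108 and 18ms. Halves are rounded away from zero because Table 2 reports 23ms for the two unaspirated /k/ tokens in the 22-23ms range, whereas Python's built-in `round()` would give 22. Function names are illustrative.

```python
from decimal import Decimal, ROUND_HALF_UP
from statistics import mean


def round_half_up(x: float) -> int:
    """Round to the nearest integer, with ties going away from zero."""
    return int(Decimal(str(x)).quantize(Decimal("1"), rounding=ROUND_HALF_UP))


def summarize(vots_ms):
    """Return (Av, R, N) for a list of VOT values in ms, or None if empty."""
    if not vots_ms:
        return None
    return round_half_up(mean(vots_ms)), (min(vots_ms), max(vots_ms)), len(vots_ms)


def summarize_split(vots_ms):
    """Summarize negative and non-negative VOT values separately."""
    negative = [v for v in vots_ms if v < 0]
    non_negative = [v for v in vots_ms if v >= 0]
    return summarize(negative), summarize(non_negative)


# Hypothetical reconstruction of the unaspirated /t/ tokens in isolated words
print(summarize_split([-126, -108, 18]))  # ((-117, (-126, -108), 2), (18, (18, 18), 1))
# Two tokens at 22 and 23 ms give a mean of 22.5, reported as 23
print(summarize([22, 23]))  # (23, (22, 23), 2)
```

Splitting by sign keeps prevoiced tokens (heard as voiced) from being averaged together with short-lag tokens, which would otherwise produce a misleading mean near zero.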

As seen in Table 2, the mean VOTs of aspirated /k/ and /t/ are 83ms and 85ms respectively, while that of /p/ is 65ms. Among the investigated voiceless aspirated plosives, the alveolar stop /t/ has the highest VOT and the bilabial stop /p/ the lowest. In terms of range, aspirated /k/ exhibits the widest, from 29ms to 156ms, compared with 48-136ms for aspirated /t/ and 27-118ms for aspirated /p/. Moreover, the token counts show that 52 tokens of the target aspirated /p/ were mispronounced as unaspirated, accounting for 41.3% of the total. The numbers of mispronounced tokens for /t/ and /k/ are considerably lower, at 3 (2.4%) and 2 (1.6%), respectively.

Table 3. Mean VOT of plosives in utterances (Av and R in ms).

      aspirated /k/  unaspirated /k/  aspirated /t/  unaspirated /t/  aspirated /p/  unaspirated /p/
Av    91             -94              77             -35; 28          60             -26
R     34 : 151       ///              31 : 123       ///; ///         27 : 141       -107 : -4; 8 : 24
N     43             1                42             2                32             12

Table 3 presents the findings for the initial stops of words in utterances. The values (Av, R and N) closely resemble those in Table 2. In particular, the mean VOTs of aspirated /k/, /t/ and /p/ are 91ms, 77ms and 60ms, respectively. The highest VOT is found in the velar stop /k/, whereas the lowest belongs to the bilabial stop /p/. The VOT range of aspirated /k/ (34-151ms) is wider than those of aspirated /t/ (31-123ms) and /p/ (27-141ms).

The token counts in Table 3 show that 12 of the 44 tokens of target aspirated /p/ were pronounced as unaspirated, which constitutes 27.3% of the total. Meanwhile, 2 of the 44 tokens of /t/ were mispronounced (4.5%), and 1 of the 44 tokens of /k/ (2.3%).
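The error percentages reported for both tables follow directly from their N rows: for each sound, the number of tokens produced as unaspirated is divided by all tokens of that sound. A short Python check, with the counts copied from Tables 2 and 3 (the dictionary and function names are only illustrative):

```python
# Token counts from the N rows of Tables 2 and 3:
# sound -> (tokens produced aspirated, tokens produced unaspirated)
ISOLATED = {"k": (120, 2), "t": (122, 3), "p": (74, 52)}
UTTERANCES = {"k": (43, 1), "t": (42, 2), "p": (32, 12)}


def error_rates(counts):
    """Percentage of tokens produced unaspirated, rounded to one decimal place."""
    return {
        sound: round(100 * unaspirated / (aspirated + unaspirated), 1)
        for sound, (aspirated, unaspirated) in counts.items()
    }


print(error_rates(ISOLATED))    # {'k': 1.6, 't': 2.4, 'p': 41.3}
print(error_rates(UTTERANCES))  # {'k': 2.3, 't': 4.5, 'p': 27.3}
```

For isolated words, /p/ gives 52 of 126 tokens, and the /k/ figure corresponds to the two unaspirated /k/ tokens listed in Table 2.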

Using the data from Tables 2 and 3, Figure 3 compares the mean VOT of aspirated plosives in isolated words and in utterances.

Figure 3. Mean VOT of aspirated plosives in isolated words and in utterances.

These data indicate that the mean VOT of each aspirated plosive is greater in isolated words than in utterances, with the exception of /k/: the mean VOT of aspirated /k/ is 83ms in isolated words but 91ms in utterances.

Overall, the results suggest that /p/ is the most problematic sound for participants, with approximately half of them pronouncing it inaccurately. Furthermore, in utterances the highest VOT is recorded for velar /k/ (followed by alveolar /t/) and the lowest for bilabial /p/, whereas in isolated words alveolar /t/ has the highest mean and bilabial /p/ again has the lowest.

Additionally, several types of error were recorded for individual sounds during the data analysis.

Aspirated voiceless plosives are mispronounced as voiced (recorded in /t/, /p/)

With regard to bilabial /p/, one typical error arose when students confused the phonemes of their mother tongue with English voiceless /p/. Specifically, tokens with a negative VOT value show /p/ being mispronounced as voiced /b/.

Figure 4. The word ‘pack’ as pronounced by the speaker BABAIU22470.

In the spectrogram in Figure 4, the consonant /p/ shows a voicing bar below 1000Hz, indicating that the sound released in this case was voiced. Moreover, the voicing onset, identified where the periodic waves begin, precedes the burst. The resulting negative VOT value indicates that the sound produced was voiced /b/ rather than /p/.

Aspirated plosives are pronounced without aspiration (recorded in /t/, /p/)

In several cases, the aspirated plosives /p/ and /t/ were produced without aspiration, as shown by VOT values between 0 and 25ms, which constitutes a second type of error.

Figure 5. The word 'talk' as pronounced by the speaker BABAIU22483



As seen in Figure 5, there is no voicing bar below 1000Hz, so the sound released was voiceless. However, no short frication noise appears before the onset of the vowel formants. Since aspiration shows up on a spectrogram as exactly this kind of frication noise, its absence indicates that the /t/ in Figure 5 was unaspirated.

Strong aspiration in velar aspirated plosive /k/

The sound /k/, the most posterior of the three, was expected to be the most difficult plosive for Vietnamese learners to produce. Contrary to this prediction, however, /k/ caused the fewest problems. In 54 cases, /k/ was pronounced with longer aspiration, producing a clearer /h/-like sound and VOT values that exceed the 60-100ms range.

Discussion

The results of the production test are not entirely in line with previous

research.

On the one hand, the mean VOT values obtained from utterances follow the same order by place of articulation as the findings of Lisker & Abramson (1964): velar stops have the highest mean VOT and bilabial stops the lowest. However, the mean VOT values obtained from isolated words are inconsistent with Lisker & Abramson’s (1964) findings, in that the alveolar stop has the highest VOT, although the bilabial stop’s VOT is still the lowest.

On the other hand, with regard to the context in which the investigated sounds appear, the production test did not yield results entirely parallel to those of Lisker & Abramson (1964). While they report that stops in isolated words have longer VOTs than stops in sentences, the present results contradict this for velar stops, whose VOT is longer in sentences.
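The two comparisons above (the order of mean VOT by place of articulation, and the effect of context) can be reproduced with a short Python sketch. The means are copied from the Av rows of Tables 2 and 3 for the aspirated stops; the dictionary and function names are illustrative only.

```python
# Mean VOT (ms) of aspirated stops, from the Av rows of Tables 2 and 3
ISOLATED = {"velar /k/": 83, "alveolar /t/": 85, "bilabial /p/": 65}
UTTERANCES = {"velar /k/": 91, "alveolar /t/": 77, "bilabial /p/": 60}

# Order by place of articulation reported by Lisker & Abramson (1964)
EXPECTED = ["velar /k/", "alveolar /t/", "bilabial /p/"]


def order_by_vot(means):
    """Places of articulation sorted from longest to shortest mean VOT."""
    return sorted(means, key=means.get, reverse=True)


def context_effect(isolated, utterances):
    """Isolated-word mean minus utterance mean (positive = longer in isolation)."""
    return {place: isolated[place] - utterances[place] for place in isolated}


print(order_by_vot(UTTERANCES) == EXPECTED)  # True
print(order_by_vot(ISOLATED))  # ['alveolar /t/', 'velar /k/', 'bilabial /p/']
print(context_effect(ISOLATED, UTTERANCES))
# {'velar /k/': -8, 'alveolar /t/': 8, 'bilabial /p/': 5}
```

Only /k/ has a negative difference, i.e. a longer VOT in utterances than in isolated words, which is the departure from Lisker & Abramson (1964) noted above.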

Nevertheless, the perception and production tests both found that /p/ is the most challenging sound for participants, which suggests a relation between the perception and production of aspirated plosives.

Several explanations can be proposed for the differences between the data collected and the findings discussed in the literature review.

Unaspirated /p/

As pointed out in the Literature Review, English plosives are classified into two categories that are not equivalent to those in Vietnamese (Thuật, 2000). In addition, the voiced counterparts of the plosives discussed in this paper do exist in Vietnamese, without aspiration. When Vietnamese learners encounter English consonants, they tend to substitute Vietnamese consonants for the English ones (Truong, 2015). This may have led participants to confuse voiceless /p/ with voiced /b/ and to replace the former with the latter. Such a mistake might be caused by L1 interference, which requires further investigation, or it might be purely accidental.

A similar case is that of Vietnamese words such as ‘sâm panh’ (‘champagne’) or ‘pin’ (‘battery’), in which the initial /p/ is voiceless yet unaspirated, with a short VOT value. For this reason, some Vietnamese learners may find it difficult to pronounce English /p/ accurately, as they tend to omit its aspiration under the influence of the unaspirated Vietnamese /p/.

Unaspirated /t/

Unaspirated /t/ is a potential mistake that was rarely made in this production test. This can be partly explained by the fact that Vietnamese has an aspirated voiceless stop, /tʰ/, which is produced in the same manner as English /t/, albeit with a gentler explosion in initial position. This similarity may allow most students to pronounce /t/ with the required aspiration.

To sum up, because of certain differences between the phonetic systems of English and Vietnamese, most of the inaccuracy was associated with the plosive /p/, as anticipated. By contrast, the accuracy rate for /k/ was higher than initially expected, and no significant proportion of errors was observed in the pronunciation of /t/. It can be concluded that, despite the many mistakes identified in the analysis, the undergraduate participants pronounced the plosives with a relatively high rate of accuracy.

Implications and Limitations

The research has, to some extent, enhanced our understanding of how students perceive and produce word-initial voiceless aspirated English plosives. It is hoped that these findings will be of value to phonetics and phonology research in the Vietnamese context. It is also believed that VOT is an important criterion for identifying voiceless stops in word-initial position.

Moreover, the following suggestions may be drawn from the present perception and production tests for International University (IU) students. The results indicate that IU students had difficulties in identifying and producing syllable-initial plosives, with error rates following the order /p/, /k/, /t/. Based on these findings, it is recommended that teachers act as good models of English pronunciation, pay closer attention to how their students pronounce words, and provide more instruction on English phonemes. In particular, aspirated /p/, the sound students are most prone to mispronounce, should be emphasized over the other two, for example by using minimal pairs or comparing the phonetic systems of the L1 and L2. Students, for their part, should practice and improve their pronunciation of English sounds, particularly the voiceless plosives /p/, /t/ and /k/.

However, despite these outcomes, this study is limited by its small scale. Firstly, the research focuses only on a limited range of consonants in specific positions; it does not cover word-medial or word-final plosives, or plosives in clusters. Secondly, a sample of 30 participants is not sufficient to represent the entire population of English language learners at International University accurately, although a sample of this size is the minimum required for the study to be valid. Moreover, some external factors may have influenced the students’ recorded pronunciation. As specified in the Methodology, the recordings were made on iPhones, since Praat could not be used directly for recording. Technical problems with these mobile devices may therefore have affected or even distorted the recording quality. In addition, since testing took place in an active classroom with other undergraduates present, background noise was difficult to avoid.

Conclusion

This research has made some contribution to the exploration of the perception and production of initial plosives and has, to an extent, addressed the existing research gap. The findings of this study are as follows:

1. VOT varies depending on the place of articulation; however, there is an

inconsistency between the outcomes of the production test and those reported in the

literature.

2. The sound /p/ is the most challenging as anticipated based on the differences

between English and Vietnamese, while students face almost no difficulty recognizing

and producing the /k/ sound.

The outcomes of this investigation can be used in the future to improve the teaching of listening and speaking by providing additional instruction on English phonemes. Furthermore, EFL learners are advised to understand how consonants such as initial plosives are produced in different contexts, in isolated words and in utterances. They should also be mindful of their perception to ensure effective communication.

In addition, the current study focuses only on aspirated plosives; future studies should therefore examine unaspirated sounds at the same places of articulation more deeply to gain a comprehensive understanding of why Vietnamese EFL learners mispronounce /p/ as /b/. Further research is also required to fully understand the perceptual tendencies of IU students, and the relationship between speech perception and production needs more in-depth investigation.



References

Awoonor-Aziaku, L. (2021). Realisation of voice onset time (VOT) and its

implication on voicing of English (RP) stops in Ghanaian English (GhE). Open

Journal of Modern Linguistics, 11(03), 448–460.

https://doi.org/10.4236/ojml.2021.113034

Baker, A. (2006). Ship or Sheep?: An intermediate pronunciation course. Cambridge

University Press.

Ben, T. (2005). Perception and Production of Non-Native Prosodic Categories.

Brown, G. (2006). Second language listening. Encyclopedia of Language &

Linguistics, 81–88. https://doi.org/10.1016/b0-08-044854-2/00629-5

Dinh, T. L. & Nguyen, H. V. (1998). Cơ cấu ngữ âm tiếng Việt [Structure of

Vietnamese phonetics]. Ho Chi Minh City, Vietnam: Nhà Xuất Bản Giáo Dục.

Dost, I. N., & Bohloulzadeh, G. (2017). A review of Contrastive Analysis Hypothesis

with a phonological and syntactical view: A cross-linguistic study. The

Buckingham Journal of Language and Linguistics, 10, 32–41.

https://doi.org/10.5750/bjll.v10i0.1482

Gilakjani, A. P. & Sabouri, N. B. (2016). Why Is English Pronunciation Ignored by

EFL Teachers in Their Classes? International Journal of English Linguistics,

6(6), 195. https://doi.org/10.5539/ijel.v6n6p195



Gilakjani, A. P. (2012). A Study of Factors Affecting EFL Learners' English

Pronunciation Learning and the Strategies for Instruction. International

Journal of Humanities and Social Science, 2(3), 119-128.

Help:IPA/Vietnamese. Wikipedia. (2022, May 31). Retrieved February 9, 2023, from

https://en.wikipedia.org/w/index.php?title=Help%3AIPA%2FVietnamese&oldid=1090870294#cite_note-p-2

Hoang, T. T. (2014). The interference of the mother tongue in the first year students’

English pronunciation at Thai Nguyen University of Technology.

Hualde, J., Simonet, M. & Nadeu, M. (2011). Consonant lenition and phonological

recategorization. Laboratory Phonology, 2(2), 301-329.

https://doi.org/10.1515/labphon.2011.011

Hwa-Froelich, D., Hodson, B. W., & Edwards, H. T. (2002). Characteristics of

Vietnamese phonology. American Journal of Speech-Language Pathology,

11(3), 264–273. https://doi.org/10.1044/1058-0360(2002/031)

Jenkins, J. (2000). The phonology of English as an international language. Oxford

University Press.

Kaur, J. (2015). Factors Influencing Voice Onset Time (VOT): Voice Recognition.

International Journal for Research in Applied Science & Engineering

Technology.

Lisker, L., & Abramson, A. S. (1964). A Cross-Language Study of Voicing in Initial
Stops: Acoustical Measurements. Word, 20(3), 384–422.
https://doi.org/10.1080/00437956.1964.11659830

Liberman, A. M., Harris, K. S., Hoffman, H. S., & Griffith, B. C. (1957). The

discrimination of speech sounds within and across phoneme

boundaries. Journal of Experimental Psychology, 54(5), 358–368.

https://doi.org/10.1037/h0044417

Loitsch, K. (2016). Perception and Production of English Voicing.

Ly Bui, T. T., Mai, T. H., & Diep, H. N. (2021). Common errors in pronouncing final

consonants of English-majored sophomores at Tay Do University, Vietnam.

European Journal of English Language Teaching, 6(3).

https://doi.org/10.46827/ejel.v6i3.3640

Muis, A. (2008). Students’ Errors in Pronouncing English Voiced Stops in Words

Final Position: The Case of the Tenth Grade Students of MA Al Asror
Patemon Gunungpati Semarang in the Academic Year of 2007/2008.
Semarang: Unpublished Final Project of FBS Semarang State University.

Roach, P. (2009). Chapter 4. English Plosives. In English phonetics and phonology: A

practical course (pp. 26–27). Cambridge University Press.

Styler, W. (2012). Using Praat for linguistic research.

Tam, H. (2005). Common Pronunciation Problems of Vietnamese Learners of

English. VNU Journal Of Foreign Studies, 21(1).

Tang, G. (2007). Cross-linguistic analysis of Vietnamese and English with

implications for Vietnamese language acquisition and maintenance in the

United States. Journal of Southeast Asian American Education and

Advancement, 2(1). https://doi.org/10.7771/2153-8999.1085



Thuật Đoàn Thiện. (2000). Ngữ âm tiếng Việt [Vietnamese phonetics]. Nhà xuất bản Đại học Quốc gia Hà Nội.

Truong, L. (2015). Vietnamese and English phonological analysis. Academia.edu.



Appendices

Appendix A. Consent form

INFORMED CONSENT

Informed Consent to Participate in a Research Study

International University – School of Languages

Quarter 6, Linh Trung Ward, Thu Duc District, HCMC

Title of Research Project: Perception and production of English initial aspirated

plosives /p-t-k/ by International University students

Name of Principal Investigator: Trần Ngọc Hồng Phúc

Email of Principal Investigator: ENENIU18129@student.hcmiu.edu.vn

Phone number of Principal Investigator: (+84)0773772468

A. PURPOSE AND BACKGROUND

Ms. Tran Ngoc Hong Phuc and the School of Languages – IU-VNU are conducting research on how IU students recognize and pronounce consonants. The purpose of your participation in this research is to help the researcher obtain authentic data. You were selected as a possible participant in this study because your educational background (e.g., English level and major) matches the criteria of the study.

B. PROCEDURES

If you agree to participate in this research study, the following will occur:

1. First, you will sign this consent form to confirm your participation in the study.

2. Second, you will take a perception test on paper (5 minutes) in which you listen to the recording twice and choose the words you hear.
3. Third, you will take a production test (3-5 minutes) in which you pronounce separate words and read sample sentences aloud. The whole session will be recorded. Your work will be used for further analysis, and your results (scores) will be kept confidential.

C. RISKS

By participating in this research, you can comfortably demonstrate your ability to perceive and produce words and sentences. We guarantee that the tests are ONLY for the research, so there is no pressure on you when taking them.

D. CONFIDENTIALITY

The records from this study will be kept as confidential as possible. No individual

identities will be used in any reports or publications resulting from the study. All

questionnaires, results, personal information will be given codes and stored

separately from any names or other direct identification of participants. Research

information will always be kept in locked files. Only research personnel will have

access to the files and only those with an essential need to see names or other

identifying information will have access to that particular file. After the study is

completed, the records will still be stored for evidence.

E. BENEFITS OF PARTICIPATION

There will be no direct benefit to you from participating in this research study. The

anticipated benefit of your participation in this study is to provide empirical data for

analyzing the perception-production process of EFL learners in Vietnam.

F. VOLUNTARY PARTICIPATION

Your decision whether or not to participate in this study is voluntary and will not

affect your relationship with the department of English. If you choose to participate

in this study, you can withdraw your consent and discontinue participation at any

time without prejudice.

G. QUESTIONS

If you have any questions about the study, please contact Ms. Tran Ngoc Hong Phuc by calling (+84)0773772468. You may also contact her with any questions about the rights of research participants or other research-related concerns.

CONSENT

YOU ARE MAKING A DECISION WHETHER OR NOT TO PARTICIPATE IN

A RESEARCH STUDY. YOUR SIGNATURE BELOW INDICATES THAT

YOU HAVE DECIDED TO PARTICIPATE IN THE STUDY AFTER READING

ALL OF THE INFORMATION ABOVE AND YOU UNDERSTAND THE

INFORMATION IN THIS FORM, HAVE HAD ANY QUESTIONS

ANSWERED, AND HAVE RECEIVED A COPY OF THIS FORM FOR YOU

TO KEEP.

Signature __________________________________ Date

Research Participant

Signature __________________________________Date

Interviewer

Appendix B. Perception test

Perception test

Topic: Perception and production of English initial aspirated plosives

/p-t-k/ by International University students

My name is Hong Phúc, a K18 student majoring in English Linguistics. I am currently writing my thesis this semester, and this short test will enable me to collect data for further analysis.

We have 2 sections in this test.

Section 1: Listen to each question twice and choose the word you hear. There is no scoring or feedback on your performance, so feel no pressure.

Section 2: For each question, you will listen to 3 words, namely A, B and C. You may listen only once. Tick A, B, or both, whichever is the same as the last word (word C).

Section 1: Listen to the recording and choose the word you hear

1. bear pear

2. past bast

3. tie die

4. cold gold

5. game came

6. down town

Section 2: Listen to the recording and choose the word that is the same as C (you can choose both A and B)

1. A (bay) B (pay) C (pay)

2. A (came) B (game) C (came)

3. A (bast) B (past) C (past)

4. A (gold) B (cold) C (cold)



5. A (tie) B (die) C (tie)

6. A (town) B (down) C (town)

Appendix C. Production test

Production test

Topic: Perception and production of English initial aspirated plosives

/p-t-k/ by International University students

There are 2 tasks in this test.

Task 1: Pronounce the given words THREE times.

Task 2: Read the given sentences TWICE.

Note: The whole session will be recorded for scoring. Your score will not be

officially announced; instead, it will only be kept as evidence for this research.

Please read the following words THREE times.

1. cold

2. coffee

3. to

4. talk

5. pack

6. passport

Please read the following sentences TWICE.

1. Pack your bags and bring your passport!

2. Can I talk to you?

3. You gave me cold coffee again.
