


B.A., Jawaharlal Nehru University, 1995
M.A., Jawaharlal Nehru University, 1997
M.A., University of Illinois at Urbana-Champaign, 2001

Submitted in partial fulfillment of the requirements
for the degree of Doctor of Philosophy in Linguistics
in the Graduate College of the
University of Illinois at Urbana-Champaign, 2007

Urbana, Illinois

Doctoral Committee:
Professor Hans Henrich Hock, Chair
Associate Professor Jennifer S. Cole
Professor Jose Ignacio Hualde
Associate Professor Chilin Shih

Reproduced with permission of the copyright owner. Further reproduction prohibited without permission.

UMI Number: 3301265


The quality of this reproduction is dependent upon the quality of the copy
submitted. Broken or indistinct print, colored or poor quality illustrations and
photographs, print bleed-through, substandard margins, and improper
alignment can adversely affect reproduction.
In the unlikely event that the author did not send a complete manuscript
and there are missing pages, these will be noted. Also, if unauthorized
copyright material had to be removed, a note will indicate the deletion.

UMI Microform 3301265
Copyright 2008 by ProQuest LLC.
All rights reserved. This microform edition is protected against
unauthorized copying under Title 17, United States Code.

ProQuest LLC
789 E. Eisenhower Parkway
PO Box 1346
Ann Arbor, MI 48106-1346


© 2007 Indranil Dutta


Abstract

In this dissertation, results from an acoustic phonetic study of the four stop types
in Hindi: voiced stops (VS), voiced aspirated stops (VAS), voiceless stops (VLS)
and voiceless aspirated stops (VLAS) are reported. The Standard View on the
distinction between VS and VAS proposes that the voiced aspirated stops are VS
with a breathy murmured release and this release feature is sufficient to make the
contrast between the VS and VAS. Evidence from studies on the duration of voicing
and effect of manner of articulation on the fundamental frequency (f0) of the
following vowel in Hindi questions the characterization proposed by the standard
view. This study, through an examination of durational properties of stop closure,
voicing during closure and aspiration following these stops, provides evidence against
the standard view.

Both VAS and VS have been shown to lower f0 of the following vowel. It has
also been shown that the VAS lower f0 even further. This evidence suggests that f0
perturbations can be reliable acoustic cues for stop identification. The goal of this
dissertation is to understand not only the magnitude of the f0 perturbations but
also the extent of this effect in the following vowel.

Spectral intensity analyses of contrasting breathy and modal vowels in Gujarati,
!Xóõ and other languages which make use of the breathy and modal phonation type as
contrastive features provide a background against which spectral analysis of the
breathy/murmured release following VAS can be conducted to test the assumptions
of the standard view. Spectral analyses based on four measures of spectral intensity
of the vowel following the stops indicate that the breathiness following the VAS
permeates into a sizeable portion of the vowel. Comparisons between durations of
breathiness spread and voiceless aspiration also show that voiceless aspiration is
shorter in duration than the duration of breathiness characterized by the difference
in spectral intensity between the VAS and the unaspirated stops (VS, VLS).

Based on these analyses, I argue that the stop distinctions in Hindi are best
understood as a cumulative effect of several acoustic cues, in contradistinction to
previous accounts, including the standard view.



To Tulsi, my love and life.



Acknowledgments

First and foremost, my thanks and gratitude are due to Professor Hans Henrich
Hock, my advisor, mentor, Doktorvater and above all an exceptional human being.
This project and all of my research are due to him and his constant encouragement,
advice, help and support. Hans' wisdom and intellect are visible in whatever is
exceptional in this thesis. Through the years, I learned to appreciate and attempted
to emulate Hans' keen eye for detail and transparency in making arguments. His
exceptional intellect, knowledge and enthusiasm led me through the hardest parts
of coming up with the research questions, the hypotheses and finally the analyses.
Words are not enough to express the gratitude I feel for Hans' constant support
for my dissertation project and my general intellectual growth. Apart from being
an exceptional advisor, mentor and source of strength, Hans exhibits a passion for
linguistics that is rare and infectious, as much as his sense of humour. I am indebted
to him for pointing out so many directions and yet leaving me to choose the way I
wanted to go.
Words also cannot entirely express my gratitude for my dissertation committee.
Professors Jennifer Cole, Jose Ignacio Hualde, and Chilin Shih are an extremely
supportive group of scholars whose constant encouragement and insight shaped the
dissertation. Professor Cole's keen interest in my dissertation project, her support
and advice led me through all of my graduate career. Her valuable insights on
formulating relevant research questions and passion for experimentally addressing such
questions helped me understand the need for empiricism in linguistics. Right from
the very beginning of my graduate career she has been a guide and the rigour she
embodies in scientific research has been nothing but a major source of inspiration.
Professor Hualde has been a major source of encouragement and knowledge,
especially when it came to understanding the crucial relationship between diachronic
and synchronic processes in languages. His unparalleled scholarship, good cheer and
sense of humour are worthy of emulation. Professor Shih's support for my project
came by way of her outstanding scholarship and insight on experimental research.
Her constant and continuous encouragement, and her enthusiasm for finding a lot
more than meets the eye from the data helped shape the observations and results
in this work. I am truly indebted to my committee of scholars for their support,
encouragement and advice and couldn't have embarked on this project without their
crucial input.
Thanks are due also to my numerous teachers at the Department of Linguistics.
Abbas Benmamoun, Eyamba Bokamba, Chin-Chuan Cheng, Georgia Green,
Molly Mack, Jerry Morgan, Chin-Woo Kim, Daniel Silverman and James Yoon are
responsible for the well-rounded education in linguistics that I received here at
Illinois. Thanks are also due to the staff of FLB, especially Mary Ellen Fryer and Pat
Gallagher, for their passionate commitment that makes life for graduate students
easy and stress free.
I am extremely thankful to Parvinder Kaur, Principal and Team Leader of Katha
Khazana School, for being so helpful during my trip to the field. Her contribution
to this thesis is immense; without her help and support, I wouldn't have been able
to collect my data. I am also indebted to the staff of Katha Khazana for making
my days at the school comfortable and also to the students for providing general
amusement. I am extremely thankful to my subject pool for volunteering to take
part in my project without any need for encouragement.
Over the years, the encouragement and good times shared with my friends in
Urbana-Champaign have had nothing but a salubrious effect on my general
well-being. Countless conversations later, I am glad to say that the following friends
have had an extremely positive role to play towards my social and professional life.
So, thank you Emin Adas, Somnath Baidya Roy, Himika Bhattacharya, Shefali
Chandra, Guha Dharmarajan, Serife Genis, Michael Koliska, Sarasij Majumder,
Kelly Maynard, Deepti Misri, Aniruddha Mitra, Mark Nye, Amit Prasad, Srirupa
Prasad, Ra Ravishankar, Celiany Rivera-Velazquez, Shilpi Sarkar, Anna Schultz,
Debarati Sen, Sunita Singh, Saadia Toor and Shivali Tukdeo.
For wonderful times shared with my colleagues at Nuance Communications in
Belgium, who quickly became the best friends one can hope for, Duygu Can,
Benjamin Freitag, Benedicte Frostad, Elisa Paloni, Elena Sacau, Renaat Seys, Wang-Ju
Tsai and Chingwen Tseng, I thank you from the bottom of my heart. Thanks are
also due to the folks behind the bar at Manteca, the best watering hole on the other
side of the big river.
I am very grateful to my parents, Ratna and Pradip Dutta, for instilling in
me a belief in my abilities and for providing the necessary foundation for pursuing
life with honesty, courage and strength. I am also very thankful to Geeta and K.
Dharmarajan, my parents-in-law, for their support and encouragement. I also want
to acknowledge the love and affection of Amamma, Dadi, CIT Thatha and Delhi
Thatha, my grand-parents-in-law. From them, I learned the valuable lesson of living
with extreme compassion.
My affection and thanks are also reserved for Tashi and Zuri, my adorable
puppies, for sharing with me their undying passion for food and a good time. Their
love, translated into numerous licks, made it easy to go through the toughest patches
of the dissertation process.
And finally, my thanks, love and gratitude for Tulsi. Her joie de vivre, infectious
laugh, lust for life, selfless and caring nature have filled me with a sense of well-being
that is unprecedented. Tulsi's passionate existence is worthy of an entire volume
by itself. She has supported me through thick and thin, and there isn't a word in
my vocabulary that can encapsulate my love and affection for her. She is the sole
reason that I have been able to complete this project and also the sole reason that I
can continue to work towards goals that far exceed my abilities. To her, I dedicate
this work with warmth and love.
Quite obviously there are several individuals, entities and institutions who are
also responsible for all that is good in this dissertation. As always, the sole
responsibility for regrettable omissions and errors rests with me alone.



Table of Contents

List of Tables
List of Figures
List of Abbreviations

Chapter 1 Introduction
  1.1 Aims of this dissertation
  1.2 Sanskrit grammarians on voicing and aspiration
  1.3 Feature based accounts of the phonology of stop types in Hindi
  1.4 The standard view on stop types in Hindi
  1.5 Problems with the standard view
  1.6 Studies on the effect of stop type on VLT and f0
  1.7 Research questions and outline
Chapter 2 Experimental methodology
  2.1 Introduction
  2.2 Experimental design
    2.2.1 Material
    2.2.2 Subjects
    2.2.3 Recording
  2.3 Acoustic analysis
    2.3.1 Closure duration and VLT
    2.3.2 Aspiration and vowel duration
    2.3.3 f0 analysis
    2.3.4 Spectral measurements
  2.4 Statistical analyses
Chapter 3 Closure and voicing in Hindi
  3.1 Outline
  3.2 Durational properties: Closure duration (CD)
  3.3 Closure duration with pause
  3.4 Durational properties: Voicing lead time (VLT) durations
  3.5 Closure and VLT correlation
  3.6 Summary of results
Chapter 4 Effects of stop type on fundamental frequency
  4.1 Universal tendency for f0 lowering following voiced stops
  4.2 f0 as a function of stop type
    4.2.1 Subject GA
    4.2.2 Subject PB
    4.2.3 Subject RM
    4.2.4 Subject SD
    4.2.5 Subject SV
  4.3 Discussion
  4.4 Effect of context on f0
  4.5 Summary and implications
Chapter 5 Aspiration and vowel duration
  5.1 Outline
  5.2 Durational properties: Aspiration duration
    5.2.1 Subject GA and PB
    5.2.2 Subject RM, SD and SV
  5.3 Durational properties: Vowel durations
  5.4 Summary of results
Chapter 6 Spectral properties of Hindi stops
  6.1 Breathiness due to incomplete glottal closure
  6.2 The relevance of measures of spectral intensity
  6.3 Spectral intensity measures
    6.3.1 H1-H2
    6.3.2 H1-A1
    6.3.3 H1-A2
    6.3.4 H1-A3
  6.4 Contributions of the individual spectral intensity measures
  6.5 Summary, discussion and conclusion
Chapter 7 Conclusions, implications and further research
  7.1 Overview
  7.2 Implications and further research
Appendix A: Frame Sentences
Appendix B: Word List
Appendix C: Language Background Questionnaire
References
Author's Biography


List of Tables

1.1 Sanskrit phonetic and phonological characterization of voice and breath
1.2 Hindi stops: Manner and place of articulation
1.3 Average -VOT values
3.1 Effect of context on VLT: Post-hoc comparison based on SNK and Tukey HSD
3.2 Effect of place of articulation on VLT: Post-hoc comparison based on SNK and Tukey HSD
4.1 Ordered relations between the stops from 10 to 50 percent of the vowel based on Tukey HSD comparisons
6.1 Maximally distinct distributions and contributing factors towards a distinction in vowel quality. Means of values at 10 and 30 percent of the vowel
6.2 Maximally distinct distributions and contributing factors towards a distinction in vowel quality. Means of values at 10, 30 and 50 percent of the vowel


List of Figures

1.1 Axis of contrast: Voicing and aspiration
1.2 Laryngeal feature schematic
1.3 Lower VLT and f0 values for voiced aspirated/breathy voiced stops (Speaker HG)
2.1 Closure Duration (CD) measurements: Panel A, auditory evaluation confirms long pause between [la:l] and [k^ad]. Panel B, CD was annotated between [lad] and [Cada]
2.2 VLT measure: From cessation of final [l] in [lad] to appearance of burst in [ga:d]
2.3 Aspiration duration measure: From burst till appearance of sinusoidal waveform, [thad]. Points at which f0 measures were taken for VLAS are also shown in Tier 2.
2.4 Percentage duration of vowel for spectral measurements
2.5 Spectral measures: Spectrum, LTAS and LPC analysis
3.1 Effect of place of articulation (POA) on Closure Duration (ms)
3.2 Effect of stop type on Closure Duration (ms)
3.3 Effect of Context on Closure Duration
3.4 Effect of Stop type on Closure Duration: Excluding velar stops
3.5 Effect of Stop type on Closure Duration (with pause): Utterance medial [s] condition; dental POA
3.6 Speaker GA: ANOVA results for dependent variable VLT and independent variables, stop type (Panel A), place of articulation (Panel B), context (Panel C), Horizontal lines in boxes (Panel D) represent median values and Summary of effects and interactions (Panel E)
3.7 Speaker PB: ANOVA results for dependent variable VLT and independent variables, stop type (Panel A), place of articulation (Panel B), context (Panel C), Horizontal lines in boxes (Panel D) represent median values and Summary of effects and interactions (Panel E)
3.8 Speaker RM: ANOVA results for dependent variable VLT and independent variables, stop type (Panel A), place of articulation (Panel B), context (Panel C). Horizontal lines in boxes (Panel D) represent median values and Summary of effects and interactions (Panel E)
3.9 Speaker SD: ANOVA results for dependent variable VLT and independent variables, stop type (Panel A), place of articulation (Panel B), context (Panel C), Horizontal lines in boxes (Panel D) represent median values and Summary of effects and interactions (Panel E)
3.10 Speaker SV: ANOVA results for dependent variable VLT and independent variables, stop type (Panel A), place of articulation (Panel B), context (Panel C), Horizontal lines in boxes (Panel D) represent median values and Summary of effects and interactions (Panel E)
3.11 Effect of place of articulation on VLT
4.1 Effect of stop type on f0 (Subject GA): Mean f0 plots
4.2 Effect of stop type on f0 (Subject GA): Variation in f0 values. Horizontal lines within the boxes represent the median f0 values
4.3 Effect of stop type on f0 (Subject PB): Mean f0 plots
4.4 Effect of stop type on f0 (Subject PB): Variation in f0 values. Horizontal lines within the boxes represent the median f0 values
4.5 Effect of stop type on f0 (Subject RM): Mean f0 plots
4.6 Effect of stop type on f0 (Subject RM): Variation in f0 values. Horizontal lines within the boxes represent the median f0 values
4.7 Effect of stop type on f0 (Subject SD): Mean f0 plots
4.8 Effect of stop type on f0 (Subject SD): Variation in f0 values. Horizontal lines within the boxes represent the median f0 values.
4.9 Effect of stop type on f0 (Subject SV): Mean f0 plots
4.10 Effect of stop type on f0 (Subject SV): Variation in f0 values. Horizontal lines within the boxes represent the median f0 values.
5.1 Effect of place of articulation (POA) on Aspiration Duration (ms)
5.2 Effect of Context on Aspiration Duration (ms)
5.3 Vowel duration as a function of stop manner
5.4 Box plot: Percentage of aspiration duration relative to Vowel Duration
6.1 Effect of Stop Type on H1-H2
6.2 Box plots show variation in H1-H2 values and overlapping
6.3 Effect of Stop Type on H1-A1
6.4 Box plots show variation in H1-A1 values
6.5 Effect of Stop Type on H1-A2
6.6 Box plots show variation in H1-A2 values
6.7 Effect of Stop Type on H1-A3
6.8 Box plots show variation in H1-A3 values
6.9 Box plot: Percentage of aspiration duration relative to Vowel Duration
6.10 Box plot: Effect of place of articulation on H1-A1
6.11 Mean spectral intensity for values at 10 and 30 percent of the vowel
6.12 Mean spectral intensity for values at 10, 30 and 50 percent of the vowel



List of Abbreviations

CD      Closure Duration
f0      Fundamental Frequency
H1-A1   Difference between the amplitude of the First Harmonic and the peak amplitude of the first formant
H1-A2   Difference between the amplitude of the First Harmonic and the peak amplitude of the second formant
H1-A3   Difference between the amplitude of the First Harmonic and the peak amplitude of the third formant
H1-H2   Difference between the amplitude of the First Harmonic and Second Harmonic
VAS     Voiced Aspirated Stops
VLAS    Voiceless Aspirated Stops
VLS     Voiceless Stops
VS      Voiced Stops
VLT     Voice Lead Time
VOT     Voice Onset Time



Chapter 1

svaso ghosanam
itaresam tu nadah
sosmosmanam ghosinam svasanadau
Breath is emitted for the voiceless sounds but for the others voice,
for the voiced fricative (h) and the voiced aspirates, both breath
and voice.
Rk-Pratisakhya xiii.4-6


1.1 Aims of this dissertation

This dissertation is an acoustic study of the four stop types in Hindi: voiced stops
(VS), voiced aspirated stops (VAS), voiceless stops (VLS) and voiceless aspirated
stops (VLAS). The "Standard View" on the distinction between VS and VAS
proposes that the voiced aspirated stops are VS with a breathy murmured release and
this release feature is sufficient to make the contrast between the VS and VAS
(Ladefoged and Maddieson 1996, Dixit 1987a). Evidence from studies on the
duration of voicing and effect of manner of articulation on the fundamental frequency
(f0) of the following vowel in Hindi questions the characterization proposed by the
standard view. This study, through an examination of durational properties of stop
closure, voicing during closure and aspiration following these stops, provides evidence
against the standard view.


Both VAS and VS have been shown to lower f0 of the following vowel. It has
also been shown that the VAS lower f0 even further. This evidence suggests that f0
perturbations can be reliable acoustic cues for stop identification. The goal of this
dissertation is to understand not only the magnitude of the f0 perturbations but
also the extent of this effect in the following vowel.
Spectral intensity analyses of contrasting breathy and modal vowels in Gujarati,
!Xóõ and other languages which make use of the breathy and modal phonation type
as contrastive features provide a background against which spectral analysis of the
breathy/murmured release following VAS can be conducted to test the assumptions
of the standard view. Spectral analyses based on four measures of spectral intensity
of the vowel following the stops indicate that the breathiness following the VAS
permeates into a sizeable portion of the vowel. Comparisons between durations of
breathiness spread and voiceless aspiration also show that voiceless aspiration is
shorter in duration than the duration of breathiness characterized by the difference
in spectral intensity between the VAS and the unaspirated stops (VS, VLS).
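
As a rough illustration of the four spectral intensity measures (H1-H2, H1-A1, H1-A2 and H1-A3, each defined in the List of Abbreviations as an amplitude difference), the following minimal Python sketch computes them from amplitudes that are assumed to have been read off a spectrum already, in dB. The class name, field names and the example values are hypothetical and serve only to show the arithmetic; they are not measurements or procedures from this study.

```python
from dataclasses import dataclass


@dataclass
class SpectralSlice:
    """Amplitudes in dB at one point in the vowel (hypothetical container)."""
    h1: float  # amplitude of the first harmonic
    h2: float  # amplitude of the second harmonic
    a1: float  # peak amplitude of the first formant
    a2: float  # peak amplitude of the second formant
    a3: float  # peak amplitude of the third formant


def spectral_intensity_measures(s: SpectralSlice) -> dict:
    """Return the four amplitude differences, each relative to H1."""
    return {
        "H1-H2": s.h1 - s.h2,
        "H1-A1": s.h1 - s.a1,
        "H1-A2": s.h1 - s.a2,
        "H1-A3": s.h1 - s.a3,
    }


# Made-up amplitudes, for illustration only.
example = SpectralSlice(h1=60.0, h2=52.0, a1=55.0, a2=45.0, a3=35.0)
measures = spectral_intensity_measures(example)
for name, value in measures.items():
    print(f"{name}: {value:.1f} dB")
```

In the phonetic literature, larger positive values of such differences are generally associated with breathier phonation, which is why comparing them across stop types at successive points in the vowel can indicate how far breathiness extends.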
Several studies have shown that prosodic context has an effect on durational
properties (Cho and Jun 2000, Cho and McQueen 2005/4, Cole, Kim, Choi and
Hasegawa-Johnson 2007). An additional goal of this dissertation is to examine the
effect of prosodic context on the durational properties of the four stops in Hindi.
Based on these analyses, I will argue that the stop distinctions in Hindi are best
understood as a cumulative effect of several acoustic cues, in contradistinction to
previous accounts, including the standard view.¹ Before discussing the motivations
for conducting this study, I will provide a brief historical account of the views of the
Sanskrit grammarians on the issue of stop contrasts in Sanskrit. This introduction
¹ Keyser and Stevens (2006) discuss the relevance of acoustic cues that are involved in enhancing
contrasts. In chapter 6, I discuss the relevance of a theory of enhancement as proposed in Keyser
and Stevens (2006) for a comprehensive understanding of cue interaction in Hindi.


will help situate the arguments that have been made so far, both in the phonetic
and phonological literature.


1.2 Sanskrit grammarians on voicing and aspiration

Predating the advent of modern phonetic sciences, Sanskrit phoneticians and
grammarians understood the complex articulatory dynamics behind the production of
voiced aspiration.² In fact, one of the most important contributions of the Sanskrit
grammarians was identification of the distinction between voiced and voiceless
sounds. The Pratisakhya clearly states the distinction between sounds that are
produced with a closed throat, ghosa (voice), and those that are produced with an open
throat as in a simple breath (voiceless), svasa (Allen 1953). These lead to further
characterization of madhye hakarah, an intermediate glottal state which has both
voice and voicelessness (see quote above) (ibid., p. 9). This state, where the glottis
is understood to produce both breath and voice, was essential in understanding
not only the voiced aspirates but also voiced [h]. Most Sanskrit grammarians took
the distinction between voice and breath to be a distinction between closed and
open throats, respectively. William D. Whitney, however, expressed concerns with
regard to the characterization of voiced aspirates and voiced [h] by the Sanskrit
grammarians in the Pratisakhya in the following way:
² In this study, I will reserve the term 'voiced aspiration' to refer to the phonological category
and 'breathy/murmur' as labels for the phonetic manifestation of the category.


The Rk-Pratisakhya (xiii.2, r. ) declares both breath and sound to
be present in the sonant aspirates and in h, which could not possibly
be true of the latter, unless it were composed, like the former, of two
separate parts, a sonant and a surd: and this is impossible. Whitney
(1860-1863): 348
Aspiration produced with voice is inconceivable according to Whitney. Despite
Whitney's view, it is clear that the Sanskrit grammarians recognized a clear
distinction, first between voiced and voiceless sounds and second between the
phonological status of aspiration and the phonetic implementation of both voice and
breath, relevant for the voiced aspirates and the voiced h. Thus, early on,
Sanskrit phoneticians were able to make a distinction between phonological
categories and phonetic phenomena. In Table 1.1, I provide a summary of this
distinction.
Table 1.1: Sanskrit phonetic and phonological characterization of voice and breath

Phonological terms: ghosin (= voiced), sosman (= aspirate), aghosa (= voiceless)
Phonetic terms: nada (+), svasa (-)


While ghosin (voiced), aghosa (voiceless) and sosman (aspirate) are used as
purely phonological terms, nada and svasa are terms reserved for expressing the
phonetic properties of sounds. A four-way stop contrast in Sanskrit, as in Hindi,
is asymmetrically addressed in the phonetics, but in the phonology, a completely
symmetrical account of the contrast is found. Thus, the contribution of the
Sanskrit grammarians lies, first, in the recognition that voicing and voicelessness
are phonetically distinct and, second, in identifying that the phonetic
manifestation of voiced aspiration could be both voiced and voiceless in order to
explain the


phonological and categorical distinction between the Sanskrit voiceless aspirates
and voiced aspirates.
Phonological treatments of aspiration as a feature in this respect and
breathiness/murmur as phonetic manifestation of this aspiration contrast find
currency even in contemporary analyses of laryngeal phonology and Hindi phonology in
particular. In section 1.3, I will provide a brief summary of the nature of the
laryngeal stop contrasts in Hindi followed by a discussion of the various
feature-based accounts that attempt to express these contrasts through various
phonological features. Following this, I will outline the arguments and
implications of the standard view on the phonetics of voiced aspiration. In section
1.5, the problems associated with the standard view are discussed. Section 1.6
provides a brief introduction to the findings from previous research that address
some of these problems. In this section, I will also outline the questions that
remain unanswered. Section 1.7 outlines the research questions that will be
addressed through this experimental study and provides a brief organization of the
following chapters.


1.3 Feature based accounts of the phonology of stop types in Hindi

A large number of modern Indo-Aryan languages exhibit a four-way laryngeal
contrast in their stops.³ While Thai exhibits a three-way contrast, namely, voiceless
(VLS), voiceless aspirated (VLAS) and voiced (VS), Hindi possesses a fourth stop
type: voiced aspirated stop (VAS). The Hindi stops occur in four places of
articulation, and thus can be represented in a 4x4 array (Table 1.2).⁴
³ The UPSID survey of languages shows that there are ten languages from six different language
families that exhibit a four-way laryngeal contrast.
⁴ In this array the palatals have not been presented, since they are affricated.


Table 1.2: Hindi stops: Manner and place of articulation

            VLS   VLAS   VS   VAS
Labial      p     ph     b    bh
Dental      t     th     d    dh
Retroflex   ʈ     ʈh     ɖ    ɖh
Velar       k     kh     g    gh

The four-way stop contrast in Hindi is conventionally understood to be a
laryngeal contrast expressed in two dimensions - voicing and aspiration (Fig. 1.1).
This leads to four distinct categories of stops. While voicing requires the vocal
folds to be approximated and vibrating, aspiration requires the vocal folds to be
abducted. Under this view, a two-dimensional representation establishes
unambiguously three of the four stop categories, namely, the voiceless, voiced and
voiceless aspirated stops, while failing to address the articulatory mismatch
between voicing and aspiration in VAS. VAS, then, pose a difficulty for a binary
featural analysis of this contrast, in that both voicing and aspiration need to be
simultaneously implemented at the release of the stop closure.

Figure 1.1: Axis of contrast: Voicing and aspiration (quadrants: voiceless
aspirated, voiced aspirated, voiceless stops, voiced stops)

The focus of research on laryngeal phonology in general and voiced aspiration
in particular has not deviated much from the twin poles of simultaneous aspiration
and voicing, when characterizing voiced aspirates. Feature based theories of
laryngeal phonology have attempted to identify the segmental features that make
up the voiced aspirates. These features, be they binary, underspecified or privative,
have attempted to express the category of voiced aspirate through an interaction of
voicing and aspiration (Avery and Idsardi 2001).
One of the earliest treatments is found in Halle and Stevens (1971), who
propose four laryngeal features, [spread glottis] (sg), [constricted glottis] (cg),
[stiff vocal cords] and [slack vocal cords]. In this model, voiced aspirates are
characterized as [+sg] and [+slack]. For a recent and complete overview of the
feature based accounts see Avery and Idsardi (2001) and Fig. 1.2 below.
Figure 1.2: Laryngeal feature schematic (Avery and Idsardi 2001), with the
dimensions Glottal Width, Glottal Tension and Larynx Height

As correctly argued by Keating (1988) and Lombardi (1994), these features pose
two major problems. First, the representation of voicing in this model is not
articulatorily accurate, in that not all voiced sounds are produced with slack vocal
cords and neither are all voiceless sounds made with stiffening of the vocal cords.
Second, the model lacks a single feature that groups together all voiced sounds. The
second problem is crucial in the case of Hindi.
VOT theory (Lisker and Abramson 1964, Abramson and Lisker 1967, Abramson


1977, Poon and Mateer 1985) attempts to remedy these problems by expressing the
contrasts in terms of differences in Voice Onset Time (VOT). A major contribution
of VOT theory was in identifying the phonetic correlates of voicing. Under this
theory, VOT is a phonetic manifestation of underlying differences in laryngeal
timing in relation to the release of the oral occlusion. Lombardi (1994) is correct
in stating that VOT theory is insufficient in accounting for the voiced aspirates.
In fact, Lisker and Abramson (1964) recognize the difficulty in establishing a
distinction between voiced stops and voiced aspirated stops simply on the basis of
VOT. The difficulty is brought about by the fact that both voiced aspirated stops
and voiced stops are produced with lead voicing or -VOT. These facts make VOT
inadequate for making a distinction between these two stop types.
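The limitation described in this paragraph can be illustrated with a minimal sketch. The 30 ms short-lag boundary, the function name and the token values below are hypothetical, chosen only for illustration; they are not taken from Lisker and Abramson (1964) or any other study. Classifying on VOT alone places VS and VAS in the same lead-voicing category:

```python
def vot_category(vot_ms, short_lag_max_ms=30.0):
    """Classify a stop by Voice Onset Time alone (milliseconds).

    Negative VOT means voicing starts before the release (lead voicing).
    The short-lag boundary is an assumed, illustrative value.
    """
    if vot_ms < 0:
        return "voicing lead"
    if vot_ms <= short_lag_max_ms:
        return "short lag"
    return "long lag"


# Invented example values, not measurements from any study.
tokens = {"VLS": 15.0, "VLAS": 70.0, "VS": -85.0, "VAS": -75.0}
categories = {stop: vot_category(vot) for stop, vot in tokens.items()}

for stop, category in categories.items():
    print(stop, category)
```

With these invented values the two voiceless types separate, while VS and VAS receive the same label, which is why a further cue is needed for the voiced pair.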
The inadequacy of VOT theory in characterizing the voiced aspirated stops was
already pointed out by Schiefer (1986). Lombardi (1994) argues that, based on
phonetic and phonological data, the voiced aspirates in Hindi and other languages
are both voiced and aspirated. The bulk of the phonetic evidence that forms the
basis for the argument of Lombardi (1994) comes from Dixit (1989), Yadav (1984)
and Ingemann and Yadav (1978). Ladefoged (1971) presents the view that the voiced
aspirated stops cannot be considered aspirated because the 'breathy', murmured
release of these stops is distinct from voiceless aspiration.
Thus, previous research on the position of voiced aspirates essentially deals with
two hypotheses:
Hypothesis 1 => Voiced aspiration results from overlap of two independent
gestures, voicing and aspiration
Hypothesis 2 => Voiced aspiration represents an independent mode of phonation
Research on acoustic phonetics of Hindi voiced aspiration has addressed both


of these hypotheses, presenting data that provide evidence in favour of hypothesis
1 (Schiefer 1992) and hypothesis 2 (Ladefoged 1975). Schiefer (1992) finds that
listeners' judgements of voiced stops improve for those stimuli that have relatively
longer durations of Voicing Lead Time (VLT).⁵ Based on the findings from Schiefer
(1986), Schiefer (1992) proposes a feature Lead onset time with values 1 and 2 for
voiced aspirated stops and voiced stops respectively, to account for the difference in
preference for VLT. As rightly pointed out by Selkirk (1992), Schiefer's motivation
to posit the feature Lead onset time was in effect a way of shifting away from the
position that murmured stops represent a third possible state of the vocal cords
(Ladefoged 1971). Additionally, width of glottal opening for voiced aspirates is
also an important gesture (Hirose 1977, Dixit and MacNeilage 1980, Benguerel and
Bhatia 1980). Benguerel and Bhatia (1980) show very different laryngeal timing
and opening properties for the voiced aspirated stops as compared to the other
stops. Thus, not only do both voiced aspirated stops and voiced stops show closure
voicing, the width of glottal opening during the closure for voiced aspirated stops is
significantly different from that of the voiced stops.
Thus, in summary, the categorical nature of the four-way contrast expressed
along the dimensions of voicing and aspiration has been the primary focus of
feature-based accounts of this contrast. While phonological accounts have attempted
to characterize the asymmetrical nature of voicing and aspiration in voiced aspirated
stops, phonetic accounts have attempted to provide evidence in favour of either
hypothesis 1 or hypothesis 2. Schiefer (1989) argues in favour of Hypothesis 1 and
provides evidence to support a gesture-based account, while Ladefoged (1971)
argues in favour of Hypothesis 2 by providing evidence from languages that make
contrasts between breathy / murmured and modal vowels. As far as the phonetics
of voiced aspirates is concerned, I will consider arguments in favour of Hypothesis
⁵ VLT refers to the period of voicing during closure.


2 to represent the Standard View on voiced aspiration. In the following, I will
present in greater detail the arguments that form the standard view.


1.4 The standard view on stop types in Hindi

The standard view on the phonetics of Hindi VAS has focused primarily on the
post-release aspiration phase (Ladefoged and Maddieson 1996, Dixit 1987a). Ladefoged
and Maddieson (1996) state that:
breathy voiced stops in Hindi and many other Indic languages are
acoustically distinguished from the plain voiced stops by what happens
after the release rather than audible differences during the closure.
Dixit (1987a) and Dixit (1987b) show that voicing during the closure in VAS is
comparable to modal voicing and that only at the release of these stops is there a
turbulent, high rate of air flow through the glottis. VOT based theories also assume
that the low amplitude buzz associated with the release of VAS serves to distinguish
the VAS from the VS (Abramson and Lisker 1967, Lisker and Abramson 1964, Abramson
1977). This forms the bulk of the standard view on distinctions between VS and
VAS. The standard view also assumes that VAS are VS with breathy / murmured
release.
According to Ladefoged (1971) and Ladefoged and Maddieson (1996), the breathy
/ murmured release following the VAS represents a third state of laryngeal setting
to be contrasted with voiceless and modally voiced stops. Arguments have also been
made on the basis of articulatory studies which suggest that only at the release of
VAS is there a significant difference in glottal states between VS and VAS: while the
VS continue to be voiced (modally), the VAS are an admixture of intermittent modal
voicing and aspiration at release (Dixit 1982, Dixit 1987a, Dixit 1987b). In addition,
VOT has also been used as a measure to distinguish between the four stops. VOT
refers to the timing of the onset of vocal fold vibration with reference to the release


of a consonant. Abramson and Lisker (1967), Abramson (1977), and Lisker and
Abramson (1964) show that in a great number of languages, including Hindi, stops
could be discriminated on the basis of VOT. A VOT based theory of stop
discrimination, however, is only able to distinguish between voiced stops (VS, VAS),
VLS and VLAS based on differences in the timing of the onset of vocal fold vibration.
Voiced stops (VS, VAS) are said to have negative VOT values due to the presence
of voicing during closure. VLS are considered to be short-lag stops and VLAS stops
are considered long-lag stops. The inadequacy of a VOT based distinction between
VAS and VS was recognized by the authors of VOT theory. Consider the following
quote from Lisker and Abramson (1964):
The two four-category languages, Hindi and Marathi, present us with
our only clear cut cases in which the measure VOT is insufficient for
distinguishing among all stop categories of a language. To be sure, the
voiced unaspirated and voiced aspirated stops show differences in
average values that are almost systematic; nevertheless they occupy ranges
that are nearly coextensive. It seems very likely that voiced aspirates
are distinguished from the other voiced category by the presence of low
amplitude buzz mixed with noise in the interval following the release
of the stop. Lisker and Abramson (1964): 403
VOT based analysis of Nepali, which also exhibits a four-way laryngeal contrast
like Hindi, by Poon and Mateer (1985) shows results similar to those of Lisker and
Abramson (1964), Abramson and Lisker (1967), and Abramson (1977). Following
the standard view, Davis (1994) shows that Noise Offset Time (NOT) is sufficient
to make a distinction between VS and VAS, with breathy release serving to enhance
the contrast. Noise Offset Time (NOT) as measured by Davis (1994) is an amended
measurement of Lag Time or +VOT. She measures NOT as the time between stop
release and appearance of the F2 on the vowel, which signals the cessation of
aspiration noise.
The crucial difference between these arguments arises because of the
assumptions that (a) breathy / murmured release is a categorically distinct glottal
state (Ladefoged 1971, Ladefoged and Maddieson 1996); (b) the VS and VAS are both
voiced in the closure and thus there is no need to propose a third glottal state (Dixit
1982, Dixit 1987a, Dixit 1987b); or (c) VOT differences between VS and VAS are not
significant (Lisker and Abramson 1964, Abramson and Lisker 1967, Abramson 1977).
A major assumption of the standard view therefore is:
The VAS can be characterized as VS with a breathy / murmured release.
This assumption that VAS is VS with a breathy / murmured release has two
major implications in relation to the phonetic characterization of the stops in Hindi:
1. VAS and VS are both voiced in the closure and the difference between them
arises in the release: this release is characterized by simultaneous voicing and
aspiration, which results in a breathy / murmured release.⁶
2. Voicing during closure (VLT) and/or duration of closure (CD) represent
non-distinctive differences.
In summary, the major assumption of the standard view implies: (1) the duration
of the occlusion or Closure Duration (CD) is not a significant cue for contrasting
between the four stops in Hindi; (2) the voicing during the stop closure (VLT) is not
relevant in maintaining contrast between the voiced stops; (3) the VAS and VS are
distinguished only by the breathy / murmured release portion which is associated
with the VAS.
⁶ The terms 'breathy' and 'murmur' appear interchangeably in the literature on the phonetic
manifestation of voicing and aspiration contrast. Research on contrastive non-modal phonation
differences in vowels, however, exclusively maintains the term 'breathy' to contrast these vowels
from modal vowels.




1.5 Problems with the standard view

Implications from the assumptions of the standard view notwithstanding, Schiefer
(1986) shows that listeners prefer longer VLT durations for unambiguous perception
of VS. Similar results for VLT durations (see Table 1.3 below) are also found in the
raw data of Lisker and Abramson (1964) and Davis (1994). However, the relevance
of VLT towards making a contrast between the VAS and VS does not form the
primary focus of these studies.

Table 1.3: Average -VOT values (sources: Lisker and Abramson (1964); Davis (1994))

In an acoustic phonetic study, Dutta (2007) argues that there is a significant
difference in VLT between VAS and VS in Hindi. VS are produced with
significantly longer VLT durations than VAS. In this study of initial voiced stops of
Hindi, however, the target words were embedded in an utterance medial position such
that the preceding environment was voiced, making the closure and voicing durations
coterminous. Therefore, from Dutta (2007) the relevance of CD could not be
ascertained. These two studies, however, do provide evidence against the assumption
that VAS are VS with a breathy / murmured release and that the differences in VLT
are not significant towards maintaining a contrast between these two stops. The
relevance of CD, however, remains to be studied.
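Why a voiced preceding environment makes the two durations coterminous can be seen in a minimal sketch. The landmark times and the function name are hypothetical, and the sketch assumes that voicing, once begun, continues up to the release; it is not the measurement procedure of any of the studies cited.

```python
def closure_measures(closure_onset_s, voicing_onset_s, release_s):
    """Return (CD, VLT) in seconds for one stop token.

    CD is the duration of the occlusion; VLT is the portion of the
    closure that is voiced. Assumes voicing continues until release.
    """
    if not closure_onset_s < release_s:
        raise ValueError("closure onset must precede release")
    cd = release_s - closure_onset_s
    # Only voicing that falls inside the closure counts towards VLT.
    voiced_from = max(closure_onset_s, voicing_onset_s)
    vlt = max(0.0, release_s - voiced_from)
    return cd, vlt


# Invented landmark times (seconds), for illustration only.
# Medial token: voicing carries over from the preceding voiced segment.
medial_cd, medial_vlt = closure_measures(0.100, 0.000, 0.220)
# Token in which voicing begins partway through the closure.
partial_cd, partial_vlt = closure_measures(0.100, 0.150, 0.220)

print(medial_cd, medial_vlt)
print(partial_cd, partial_vlt)
```

In the first token CD and VLT are identical, so neither can be assessed independently of the other; only in the second do the two measures come apart.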
Further, voicing is universally expected to lower f0 of the following vowel (House
and Fairbanks 1953, Hombert, Ohala and Ewan 1979). Studies have also shown that
f0 of the following vowel is reduced in both VAS and VS (Ohala 1979, Schiefer 1986).
These two studies also show that f0 following the VAS is lower than the f0 following
the VS. The VLT durational differences between VAS and VS suggest that VAS


are phonetically less voiced than the VS. The f0 patterns from the VAS and VS,
however, do not reflect this fact. Contrary to the universal tendency, the less voiced
VAS reduces f0 more than the VS.
Purcell, Villegas and Young (1978), in a study of Panjabi tones, suggest that the
allotonic low tonal characteristic of initial Hindi VAS correlates with the Panjabi
phonemic low-rising tone. Aside from the correlation of f0, this study shows that
Hindi VAS lower f0 in the following vowel. However, a limitation of the study by
Purcell et al. (1978) was that it did not take into consideration durational aspects
of voicing in Hindi, i.e. VLT measurements. They also did not compare the effect of
all stop types on f0. Schiefer's study also shows that VAS in Hindi show low tonal
characteristics.
Both these studies allow us to make the observation that Hindi VAS lower f0
even further than the VS. These observations raise important questions about the
nature of voicing in VAS and its relation to f0 lowering.⁷
The standard assumption that VAS and VS are contrasted by the post-release
breathy / murmured portion following VAS and modal phonation following VS has
not yet been shown to be an accurate assumption based on acoustic properties (but
see Bali (1999) for a study of spectral tilt in Delhi Hindi medial stops). Furthermore,
it is unclear from previous studies if indeed voiced and voiceless aspiration can
be differentiated not only through spectral differences but also differences in the
duration of the aspiration phase for VLAS and the duration of breathy / murmured
release following VAS. In the following section, I will outline research findings that
address the issue of VLT, f0 lowering, and breathy / murmured release following
VAS.
⁷ Hock (1986/1991) provides an explanation for the greater tonal distinctiveness in Panjabi by
suggesting that a process of polarization leads to an enhanced tonal distinction following the loss
of aspiration in Panjabi.




1.6 Studies on the effect of stop type on VLT and f0

In Dutta (2007), I have addressed the question of the distinction in voicing durations
during closure and the lowering of f0 following VS and its further lowering following
VAS. However, in this study, only VLT was studied for VS and VAS. Thus a complete
cross comparison between the durations of the occlusions for all the four stop types
could not be established. The results from the study show that VAS are produced
with shorter durations of VLT than VS (Fig. 1.3 below). Results from the analysis of
f0 in Dutta (2007) also show lower f0 values for the VAS. However, this study looked
only into the acoustic properties of the voiced stops (VS, VAS) and not the voiceless
stops (VLS, VLAS), hence it was not possible to ascertain the relevance of CD and
VLT independently. Additionally, the effect of the stops on the initial f0 of the
vowel was measured as the average f0 of the first six pitch periods. Methodologically,
measuring the f0 over six pitch periods was a replication of the method for measuring
initial f0 perturbations in Schiefer (1986). Thus, it was unclear from the results in
Schiefer (1986) and Dutta (2007) what the full extent of the f0 perturbation is in
the larger domain of the following vowel. While measuring the average f0 in the first
six pitch periods provides an account of the absolute differences in f0 between the
VAS and VS, it fails to establish in relative terms the extent of the effect of the stop
type on the f0.⁸ Details of absolute and relative f0 measurements that were used in
this study are discussed in section 2.3.3.
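The six-period measure discussed in this paragraph can be sketched as follows. This is only an illustration under stated assumptions: the function name is hypothetical, the pitch period durations are taken as already extracted, and the mean is computed over per-period f0 values (an alternative reading, six divided by the summed duration, would give slightly different numbers when periods vary).

```python
def mean_initial_f0(period_durations_s, n_periods=6):
    """Mean f0 (Hz) over the first n_periods pitch periods of a vowel.

    period_durations_s: durations of consecutive pitch periods in seconds,
    starting at vowel onset. Averaging per-period f0 is an assumption.
    """
    if len(period_durations_s) < n_periods:
        raise ValueError("not enough pitch periods in the vowel")
    f0_values = [1.0 / d for d in period_durations_s[:n_periods]]
    return sum(f0_values) / n_periods


# Invented period durations: 5 ms periods correspond to 200 Hz.
example_periods = [0.005] * 8
example_mean = mean_initial_f0(example_periods)
print(example_mean)
```

Because the window is fixed at six periods, the measure covers a different proportion of the vowel depending on vowel duration and pitch, which is the limitation the paragraph points to.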
⁸ Purcell et al. (1978) measured f0 in the following vowel at 10 consecutive points in the vowel.
See section 2.3.3 for further details.

Another observation that was made in Dutta (2007) was that the breathy
portion following the VAS tended to permeate into a sizeable portion of the vowel.
This observation was, however, not instrumentally verified. A mean six-cycle f0
measurement did not adequately correlate in duration with the actual spread of the
breathiness. Further, in terms of the phonetic implementation of the aspiration
contrast, it would also be interesting to compare and correlate the aspiration
durations in VLAS and the breathy portion in VAS. Spectral intensity analysis by Bali
(1999) shows that in Delhi Hindi, intervocalic VAS may be produced without
aspiration and that voiced aspirated tokens are produced with a larger open quotient,
wider first formant bandwidth and a steeper spectral tilt, while the voiced plosives
show the reverse glottal configurations. This goes to show that for discrimination
between intervocalic voiced aspirated and voiced stops, spectral intensity measures
may be especially relevant, as they are for distinguishing breathy vowels from modal
ones in Gujarati, following Fischer-Jorgensen (1967). However, at the moment there
are no studies on the glottal characteristics of word initial voiced aspirated stops in
Hindi. Further, the nature and extent of the breathy release following the VAS is
not clearly understood.




Figure 1.3: Lower VLT and f0 values for voiced aspirated/breathy voiced stops
(Speaker HG). Two panels, each with the axis labels Consonant Type and Place of
Articulation.

Further, Ohala (1979), Schiefer (1986), and Dutta (2007) measured f0 in the
initial portion of the following vowel and not in the entire vowel (see Purcell et al.
(1978) for measurement of f0 contours in the entire vowel); in the case of Schiefer


(1986) and Dutta (2007) only the mean f0 of the first six pitch periods was measured.
While this particular method provides a fairly accurate account of the effect of stop
type on the initial f0 perturbation, it somewhat limits a complete understanding of
the nature of the effect of the stops on the f0 of the following vowel. As has been
observed in Dutta (2007), the breathy release following the VAS tends to permeate
deeper into the following vowel. In order to arrive at a comparison between the
spread of breathiness and lowering of f0, a new method of f0 measurement is
proposed in this study (details in section 2.3.3). f0 is measured at 10 equidistant
points following the release of the stop. Two major goals are achieved by following
this method. First, this method allows us to normalize the f0 contour over variable
vowel durations by taking proportional measurements of f0 values and second, the ten
f0 values represent the f0 contour in a way that proportional comparisons could be
made between stops. In addition, this method also allows us to compare the effects
of f0 lowering with the spread of breathiness in the vowel following the VAS. In this
respect, this method, while facilitating a study of the effect on the initial portion
of the vowel, also makes it possible to compare normalized f0 contours.
Methodologically, this measurement technique is somewhat similar to the one followed
by Purcell et al. (1978).
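The proportional sampling described above can be sketched in a few lines. This is a hedged illustration rather than the procedure of section 2.3.3: the function name is hypothetical, the f0 track is assumed to be a list of time and f0 samples with strictly increasing times, values between samples are obtained by linear interpolation, and the ten points are assumed to include the first and last sample.

```python
def sample_f0_contour(times_s, f0_hz, n_points=10):
    """Sample an f0 track at n_points equally spaced times.

    times_s must be strictly increasing; values between track samples
    are linearly interpolated. Endpoints are included (an assumption).
    """
    if len(times_s) != len(f0_hz) or len(times_s) < 2:
        raise ValueError("need two equal-length lists with at least 2 samples")
    start, end = times_s[0], times_s[-1]
    step = (end - start) / (n_points - 1)
    samples = []
    j = 0  # index of the track segment that contains the current time
    for i in range(n_points):
        t = min(start + i * step, end)
        while j < len(times_s) - 2 and times_s[j + 1] < t:
            j += 1
        t0, t1 = times_s[j], times_s[j + 1]
        weight = (t - t0) / (t1 - t0)
        samples.append(f0_hz[j] + weight * (f0_hz[j + 1] - f0_hz[j]))
    return samples


# Invented tracks: the same rising contour over a short and a long vowel.
short_vowel = sample_f0_contour([0.0, 0.1], [100.0, 190.0])
long_vowel = sample_f0_contour([0.0, 0.2], [100.0, 190.0])
print(short_vowel)
print(long_vowel)
```

The two invented vowels differ in duration by a factor of two, yet yield matching ten-point contours, which is the sense in which proportional sampling normalizes over vowel duration.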
Based on this discussion of previous research, in the following section, I will
outline three primary acoustic issues that are key to understanding the nature of
the four-way laryngeal contrast in Hindi.


1.7 Research questions and outline

The discussion in the previous sections, from the phonetic observations of the
Sanskrit grammarians to the body of research on the four-way stop contrast in Hindi,
allows us to formulate several research questions. As mentioned in section 1.6 above,



three outstanding acoustic issues need to be addressed in order to arrive at a
comprehensive understanding of the nature of the stop contrasts in Hindi.
First is the issue of VLT and CD. Studies on VLT (Schiefer 1992, Dutta 2007)
have shown that VAS are produced with lower VLT than the VS. Previous research
on the perception of voicing had also shown that listeners require longer VLT
durations for unambiguous perception of VS (Schiefer 1992, Schiefer 1986). One of the
primary research questions for this study was to investigate whether a parallel could
be found in the duration of closure (CD) in VLS and VLAS; secondly, it would
be necessary to examine the primacy of CD or VLT as features that help in cueing
the contrast between unaspirated and aspirated stops. In other words, it needs to be
seen whether voicing during closure for voiced stops is an epiphenomenon, with the
primary contrastive feature being the duration of closure rather than the attendant
voicing. In Chapter 3 of this dissertation, I will show that indeed such is the case.
VLS stops are produced with longer CD as compared to VLAS. Comparison of the
complementary distribution of CD and VLT also leads to the expectation that CD
is a relevant feature in cueing the contrast between the aspirated and unaspirated
stops. In section 3.5, I address the nature of the complementary distribution of
CD and VLT and argue that CD is a primary feature in cueing a contrast between
the aspirated and unaspirated stops, with voicing serving to enhance the contrast
between the aspirated and unaspirated stops. Here, I also discuss the relevance of
a theory of enhancement in possibly explaining the secondary relevance of voicing
during closure for cueing the contrast between voiceless and voiced stops.
Second is the issue of f0 lowering following voiced stops. Studies on the f0 in
the vowel following the stops have shown the VAS to lower f0 even further than
the VS (Ohala 1979, Schiefer 1986, Dutta 2007, Purcell et al. 1978). However, an
investigation of the extent of this f0 lowering in the following vowel needs to be
conducted to fully understand the effects of all the four stop types on the f0 and


not just the voiced stops. In chapter 4 of this dissertation, I address this particular
question and show that f0 in VAS remains lower through the first 20-30 percent of
the following vowel.
Third is the issue of breathy / murmured release following the VAS. The standard
view on the VAS is that the VAS are VS that are produced with a breathy /
murmured release. In this respect, one of the goals of this dissertation is to examine
the nature and extent of this breathy release. Chapter 6 of this dissertation is a
study of four spectral intensity measures that address the question of the nature and
extent of the breathy release following VAS. The results from the spectral measures
indicate that the reliability of specific measures is speaker dependent. However,
the indirect measure of Open Quotient (OQ) is the most reliable measure that helps
distinguish between the VAS and the unaspirated stops.
In addition to these three key problems, in this dissertation, I will also look
into the duration of aspiration in VLAS, in an attempt to answer the question:
What are the differences in duration of aspiration, if any, between VLAS and VAS?
In chapter 5, this question is addressed through a discussion of the duration of
voiceless aspiration, while chapter 6 discusses the spread of breathiness following
the VAS based on the spectral properties of the following vowel.
Much of the previous research on VLT/CD is based on data elicited through
carrier sentences. Recent studies have shown that in spontaneous and controlled
speech data a consistent effect of prosodic/positional context is found on
segmental/durational features (Cho and Jun 2000, Cho and McQueen 2005/4, Cole et al.
2007). Considering the complex articulatory configurations involved in producing
the four-way system of stop contrasts in Hindi, it would be of immense interest
to compare the prosodic effects of cue strengthening in Hindi with those reported
in other languages. Prosodic position is a constant source of variation in not only
f0 features, but also durational and spectral features. In this dissertation, I will
attempt to look at the scope of this variation and hope to relate the findings with


those of Cho and Jun (2000), Cole et al. (2007) amongst others. Barring Shih,
Mobius and Narasimhan (1999), who show that preceding context has a stronger
effect on segmental features than does the following context, previous studies have
overlooked the impact of contextual position. In chapter 3 of this dissertation, I
examine the effect of varying prosodic context on VLT and CD. The general
consensus on the effect of prosodic context on segmental features is that prosodically
stronger positions tend to strengthen segmental features. In the current study the
manifestation of prosodically conditioned strengthening is found in the lengthening
of VLT/CD durations in the absolute utterance initial position. All of the above
mentioned questions have been addressed through an acoustic phonetic study.
The methodological details of this study follow in chapter 2. In chapter 3, results
from the study of the durational features, VLT and CD, are presented. Following
that, in chapter 4, results from the study of the effect of stop type on f0 perturbations
are presented. Chapter 5 presents the results from aspiration and vowel duration.
In chapter 6, results from the effects of stop type on the spectral features of the
following vowel are presented. Chapter 7 is a detailed discussion of the conclusions
from the studies and the implication of the results from these studies towards a
unified account of the contrast between the four stops in Hindi.



Chapter 2
Experimental methodology


2.1 Introduction

In this chapter, details of the experimental methodology used to address the issues
raised in chapter 1 are presented. In section 2.2, details of the experimental design
are presented, followed by a discussion of the methods of acoustic analysis in section
2.3. In section 2.4, various statistical tests and procedures that were employed to
test the hypotheses are discussed.


2.2 Experimental design

2.2.1 Material

Recorded material consists of Hindi word-initial voiced aspirated (VAS), voiced
(VS), voiceless (VLS) and voiceless aspirated (VLAS) stops in frame sentences (see
Appendix 1 for the frame sentences). Frame sentences were of three types,
representing three prosodic contexts/positions for the target word:
Utterance initial [U]
Phrase initial [I]
Phrase medial [M]



The phrase medial [M] context consisted of two sub-types, one in which the word
preceding the target word ended in [l] and another in which the word preceding the
target word ended in [s]. These contexts allow us not only to study the effect of
the phrase medial context on the segmental/durational features but also to examine
the effect of a preceding voiced and voiceless environment on the voicing of the
target stops. Additionally, the context with a preceding [s] also allows us to measure
closure duration for the stops. All the target words were of the type CVC, where
the initial C was the target stop.1 The vowel following the target stop was an /a:/.
The consonants following this vowel were [k, g, t, d, f, p, b, r, l, dʒ] (see details in
Appendix 2). The material was presented to the subjects on index cards and they
were asked to read the sentences at a natural speaking pace. Each subject was given
a short 5-minute pause between the 3 required recording periods.

2.2.2 Subjects

Five native Hindi speakers were recorded in a quiet environment in New Delhi,
India. Fifteen potential subjects were asked to fill out a Language Background
Questionnaire (LBQ; Appendix 3). The LBQ was designed to ascertain the linguistic
background, exposure, level of contact, and proficiency in other languages of the
subjects. Five speakers were then selected from these fifteen potential subjects
based on their level of exposure to other languages. Potential subjects who reported
in the LBQ that they had substantial knowledge of or contact with Panjabi were not
selected for the study. Although a substantial number of Hindi speakers in the
vicinity of Delhi also speak Panjabi, they tend to consider themselves to be native
Hindi speakers. Since Panjabi does not exhibit the same four-way stop contrast
as does Hindi (initial VAS in Hindi correspond to VLS with a low rising tone in
Panjabi), only subjects who reported Hindi to be their native language and also that
of their parents were selected.

1 One word for the retroflex voiceless aspirated stop was of the type CVCV since a suitable
lexical item with CVC structure is absent from the lexicon.

2.2.3 Recording

All of the recordings were made using a head-mounted AKG C 420 III pp microphone
onto a TASCAM DA-P1 DAT recorder.
The recordings took place in the music room of the Katha Khazana school located
at Govindpuri in New Delhi, India.2 These recordings were then digitized at the
Department of Linguistics phonetics laboratory using Computerized Speech Lab
(CSL) at 22050 Hz.


2.3 Acoustic analysis

Acoustic analysis of the data was done using Praat (version 4.4).3 The acoustic
analysis of the digitized data involved segmentation, annotation and measurement
of the relevant acoustic cues. These acoustic cues included:
1. Durational/temporal cues such as Closure Duration (CD), Voicing Lead Time
(VLT), aspiration duration and vowel duration (see sections 2.3.1 and 2.3.2 for
details)
2. Temporally normalized f0 contours (see section 2.3.3 for details)
3. Four measures of spectral tilt (see section 2.3.4 for details)

2 I am deeply indebted to the Principal of Katha Khazana school for allowing the participants
in this study to take time off from their busy teaching schedules and I am grateful for the help
and cooperation that was extended to me by all of the participants.
3 1992-2005 by Paul Boersma and David Weenink

The segmentation procedures involved identifying the beginning and end of
segmental features such as oral closure, closure voicing, duration of aspiration for VLAS
and vowel duration. Following the segmentation of the data, segmented portions
of the acoustic signal were annotated to identify the segments. These
annotations included identifying the place of articulation, stop type and context of the
target word. Measurement of the segmented portions was conducted by employing
automated Praat scripts.
Inter/intra-speaker and token variation in f0 contours was normalized by taking
10 measures of f0 starting at 10 percent of the vowel and ending at 100 percent of the
vowel in 10 percent vowel duration increments.4 In Dutta (2007), f0 was measured
as an average of the f0 in the first six pitch periods, following Schiefer (1986). The
method used in Dutta (2007) and Schiefer (1986) provides a good estimation of
the absolute f0 perturbation in the initial six cycles, without providing a complete
understanding of the nature of f0 perturbation throughout the entire contour in the
vowel. Dutta (2007) and Schiefer (1986) provide methods with which segmental
effects on initial f0 can be measured; in the current study, however, f0 measurement
at 10 consecutive points provides for a normalization of the f0 contour over the entire
duration of the vowel. The normalization procedure is necessitated by the fact that
in this study generalizations about the f0 contour and shape are being made over
several different prosodic, segmental and vocalic conditions. A similar approach was
followed by Purcell et al. (1978), where they measured the f0 in the following vowel
at 10 consecutive points following the burst so as to provide a relative measure of f0
change over the entire stretch of the vowel, starting at 0 percent of the vowel. Due to
the unstable nature of f0 in the periods following the burst, in this study f0 perturbations
were studied from 10 percent of the vowel onwards till 100 percent.

4 Variation in the production could be attributed to several causes. Inter/intra-speaker
variation could be a result of speech rate and token frequency of the target words. In order to be
able to make generalizations across speakers and also to minimize the effects of variation in vowel
length, normalization of f0 was conducted.

The vowel duration over which the spectral measures were taken was also
normalized by taking the measurements for the four spectral measures at five points in
the vowel, starting at 10 percent and ending at 90 percent of the vowel in 20 percent
vowel duration increments. Detailed procedures are presented in section 2.3.4.
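The two sampling schemes described above (ten points for f0, five points for the spectral measures) can be sketched as follows. This is only an illustrative sketch and not the Praat script used in the study; the function name `sample_times` and the example vowel boundaries are hypothetical.

```python
def sample_times(onset, offset, start_pct, stop_pct, step_pct):
    """Return the times lying at start_pct..stop_pct (inclusive) percent
    of the vowel, in steps of step_pct percent of the vowel duration."""
    duration = offset - onset
    return [onset + duration * pct / 100.0
            for pct in range(start_pct, stop_pct + 1, step_pct)]


# Hypothetical vowel running from 0.100 s to 0.350 s (250 ms long).
f0_times = sample_times(0.100, 0.350, 10, 100, 10)       # 10 points for f0
spectral_times = sample_times(0.100, 0.350, 10, 90, 20)  # 5 points for spectral tilt
```

Because the points are defined as percentages of each vowel, contours from vowels of different lengths are aligned point by point.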


2.3.1 Closure duration and VLT

Closure Duration (CD) was measured for the phrase medial [M] and phrase initial
[I] conditions for the voiceless stops (VLS and VLAS), and VLT was measured for
the voiced stops (VS and VAS). The beginning of the occlusion in the Utterance
Initial [U] condition cannot be segmented in the acoustics; therefore CD was
measured for the voiceless stops only in the phrase medial [M] and phrase initial [I]
conditions, where the segmental context preceding the target words provides clues
towards segmenting the beginning of the occlusion in the target words. In the phrase
medial condition with preceding [s], the cessation of frication was considered to be
the onset of the CD in cases where no audible pause could be detected. However, a
sizeable number of utterances produced by the speakers exhibited pauses between
the preceding context and the target word. The pauses could have been caused by
two factors. First, a number of nonce words were used in this study to complete a
full set of CVC words. Speakers, being unfamiliar with these nonce words, tended
to pause between the context preceding the target word and the target word itself.
Second, speakers in some cases paused between the preceding word and the target
word due to hesitation, in effect imposing an intonation pattern reserved for use
in uttering words in isolation. Auditory judgement on the appropriateness of these
utterances was made for these cases. Similar auditory judgements were used for the
phrase medial [M] condition with preceding [l] and the phrase initial [I] condition with
preceding [e]. Utterances where auditory evaluation led us to confirm that a pausal
break existed between the target words and their preceding environments were not
annotated with a CD.
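The decision of which durational measure a token receives can be summarized in a small sketch. The names `duration_measure` and `has_pause` are hypothetical. The text does not say how pauses were handled for voiced tokens, so the sketch assumes that VLT is taken for every voiced token.

```python
VOICELESS = {"VLS", "VLAS"}
VOICED = {"VS", "VAS"}


def duration_measure(stop_type, context, has_pause=False):
    """Return "CD", "VLT" or None for one token.

    context is "U", "I" or "M"; has_pause records the auditory judgement
    that a pausal break precedes the target word.
    """
    if stop_type in VOICED:
        # Assumption (not stated in the text): VLT is taken for every voiced token.
        return "VLT"
    if stop_type in VOICELESS:
        # Closure onset cannot be located utterance-initially or after a pause.
        if context == "U" or has_pause:
            return None
        return "CD"
    raise ValueError(f"unknown stop type: {stop_type!r}")
```

For example, `duration_measure("VLAS", "M", has_pause=True)` returns `None`, mirroring the tokens that were left without a CD annotation.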
In order to confirm that the auditory judgements were robust in identifying
naturally intonated sentences without a pausal break, a subset of the data that was
not annotated with CD was also examined. This subset included dental stops that
occurred in the phrase medial [M] condition with a preceding [s]. The cessation
of the frication in the context preceding the target word provided an adequate
environment where the noise in the frication of [s] contrasted with the absence of
signal in the closure and the pausal break. The CD was then annotated
from the end of the frication in [s] till the visible burst of the target word. This
relatively long duration of CD could include the duration of closure and pausal
break. CD annotations are exemplified in Fig. 2.1, panels A and B. As can be seen in
Panel A
In the figure below, V represents the duration of the vowel and the domain
over which vowel duration and f0 measurements were taken, and H represents the
annotated duration over which spectral measures were taken. Tiers 3 and 4 were
used to annotate the lexical item and the prosodic context in which the lexical item
occurred.
Voicing Lead Time (VLT) measurements involved measuring closure voicing from
the initiation to cessation before burst, as shown in Fig. 2.2.

Figure 2.1: Closure Duration (CD) measurements: Panel A, auditory evaluation
confirms long pause between [la:l] and [khad]. Panel B, CD was annotated between
[lad] and [t^ada]

Figure 2.2: VLT measure: From cessation of final [l] in [la:l] to appearance of burst
in [ga:d]

2.3.2 Aspiration and vowel duration

The duration of aspiration was measured for voiceless aspirated stops (VLAS) from
the offset of the burst till the onset of voicing, as seen in Fig. 2.3. In most cases
this coincided with the appearance of the second formant. However, the cessation
of voiceless aspiration was seen in most cases also to coincide with the appearance
of sinusoidal waveforms. Therefore, the boundaries of the annotated duration of
aspiration were marked by careful observation of both the waveform and the
spectrogram. The offset of the burst and the onset of voicing signaled the onset of the
vowel for VS, VAS, and VLS, while the cessation of aspiration and onset of voicing
signaled the onset of the vowel for VLAS. For VLS, VS, and VAS, vowel duration
was segmented from the offset of the burst to the offset of the vowel. For VLAS,
the annotation procedure involved segmenting the aspiration portion following the
burst and annotating the beginning of sinusoidal waveforms following the aspiration
at the beginning of the vowel. The total vowel duration for the VLAS, therefore,
included the aspiration duration as well. This procedure was followed so as to be
able to examine the relative duration of aspiration as compared to the total vowel
duration.
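As a minimal sketch of the relative measure just described, the proportion of aspiration within the total vowel duration of a VLAS token could be computed from three annotated boundaries. The function name and the boundary values below are hypothetical and are not taken from the data.

```python
def aspiration_ratio(burst_offset, voicing_onset, vowel_offset):
    """Share of the total vowel duration (aspiration included) that is
    taken up by voiceless aspiration, for one VLAS token."""
    aspiration = voicing_onset - burst_offset
    total = vowel_offset - burst_offset  # total duration includes the aspiration
    if total <= 0 or aspiration < 0:
        raise ValueError("boundaries must satisfy burst <= voicing < vowel offset")
    return aspiration / total


# Hypothetical token: 60 ms of aspiration in a 300 ms stretch.
ratio = aspiration_ratio(0.0, 0.060, 0.300)
```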


Figure 2.3: Aspiration duration measure: From burst till appearance of sinusoidal
waveform, [tha:l]. Points at which f0 measures were taken for VLAS (10 to 100 percent
of the vowel, in 10 percent steps) are also shown in Tier 2.

2.3.3 f0 analysis

As discussed in section 1.6 above, the effect of stops on the f0 of the following vowel has
previously been studied on the initial portion of the vowel following the stop burst
(Schiefer 1986, Dutta 2007). Methodologically, both of these studies were similar in measuring
the average f0 in the first six pitch periods. This method, while being relatively
accurate, is not able to demonstrate the full extent of the f0 perturbation, which
is affected by the stop manner. Further, as stated above, one of the goals of this
study was to examine the effect of the stop manner on the f0 curve itself. Therefore,
for the purposes of this study the f0 analysis method involved taking f0 measures
following the target stops at ten equidistant points after the release of the stop in
the following vowel. These ten points correspond to 10, 20, 30, 40, 50, 60, 70, 80,
90 and 100 percent duration of the vowel. This method allows for the possibility of
representing the f0 contour as a function of 10 consecutive f0 values and also provides
a way of normalizing the f0 contour over varying vowel durations.5 Additionally, as
mentioned in section 1.6, this method allows an indirect comparison of the extent
of f0 lowering with the extent of the breathiness spread following the VAS.
While the beginning of the measurements for VS, VAS, and VLS coincided with
the cessation of the burst, in the case of VLAS f0 was measured from the offset
of the aspiration and onset of periodic vibrations in the waveform. The rationale
behind following this method is the absence of voicing during the aspiration duration
in VLAS, which makes it impossible to measure f0 in the aspiration portion.
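The contrast between the two f0 methods can be illustrated with a short sketch. It assumes that "the average f0 in the first six pitch periods" means the mean of the per-period f0 values, which is only one possible reading. The pulse times and the f0 track are invented for illustration, and all function names are hypothetical. Times are in ms.

```python
def mean_f0_first_periods(pulse_times_ms, n_periods=6):
    """Mean of per-period f0 (Hz) over the first n_periods glottal periods."""
    periods = [b - a for a, b in zip(pulse_times_ms, pulse_times_ms[1:])]
    periods = periods[:n_periods]
    if len(periods) < n_periods:
        raise ValueError("not enough glottal pulses")
    return sum(1000.0 / p for p in periods) / n_periods


def interpolate(track, t):
    """Linear interpolation in a time-sorted list of (time_ms, f0_hz) pairs."""
    for (t0, f0), (t1, f1) in zip(track, track[1:]):
        if t0 <= t <= t1:
            return f0 + (f1 - f0) * (t - t0) / (t1 - t0)
    raise ValueError("time lies outside the f0 track")


def ten_point_contour(track, onset_ms, offset_ms):
    """f0 at 10, 20, ..., 100 percent of the vowel."""
    span = offset_ms - onset_ms
    return [interpolate(track, onset_ms + span * pct / 100.0)
            for pct in range(10, 101, 10)]


# Invented example: f0 rises from 180 Hz to 200 Hz over the first 100 ms.
track = [(0, 180.0), (100, 200.0), (200, 200.0)]
contour = ten_point_contour(track, 0, 200)
initial = mean_f0_first_periods([0, 5, 10, 15, 20, 25, 30])
```

The single averaged value summarizes only the first few cycles, whereas the ten-point contour retains the shape of the perturbation and its recovery across the vowel.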

5 In this respect, the method applied here is similar to the method used by Purcell et al. (1978).

2.3.4 Spectral measurements

Ideally, glottal airflow measures are taken using direct measures of airflow. However,
since articulatory experiments on the glottal airflow could not be conducted, acoustic
measures have been used in this study to arrive at indirect measures of estimation of
the glottal characteristics of the stops in Hindi. Spectral intensity measures based
on inverse filtering of the acoustic signal provide an indirect method of studying the
effect of the glottal source on the acoustic signal (Bickley 1982, Fischer-Jorgensen
1967, Ladefoged and Antonanzas-Barroso 1985). Differences between the amplitudes
of H1, H2, A1, A2, and A3 have been shown to correlate with differences in phonation
type within vowels (Hanson 1995, Hanson, Stevens, Kuo, Chen and Slifka 2001,
Wayland 1998). In this study four measures of spectral tilt were taken that included
the differences between the amplitudes H1-H2, H1-A1, H1-A2 and H1-A3. These
spectral intensity measures together provide a measure of spectral tilt.
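Given the amplitudes of the first two harmonics and the first three formant peaks (in dB) at one measurement point, the four tilt measures are simple differences. The sketch below uses a hypothetical function name and invented amplitude values.

```python
def spectral_tilt_measures(h1, h2, a1, a2, a3):
    """Four spectral tilt measures (dB) from harmonic and formant-peak amplitudes."""
    return {
        "H1-H2": h1 - h2,
        "H1-A1": h1 - a1,
        "H1-A2": h1 - a2,
        "H1-A3": h1 - a3,
    }


# Invented amplitudes in dB, for illustration only.
tilt = spectral_tilt_measures(h1=60.0, h2=52.0, a1=55.0, a2=45.0, a3=35.0)
```

Larger positive values indicate a steeper fall-off of energy above the fundamental.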
Due to the high energy in the harmonic component for VAS, it is expected that at
least till 20-30 percent of the vowel the difference H1-H2 will be sufficiently
high when compared to the VS and the VLS. This measure of spectral tilt is also
indirectly a measure of Open Quotient (OQ). OQ refers to the ratio between the
opening and closing gestures of the vocal folds. In the case of VAS, at least in the
initial portion after the stop release, the vocal folds are open for a longer duration
than they are closed, leading to a higher OQ.
H1-A1 refers to an estimation of the first formant bandwidth. In the case of
VAS, it is expected that the first formant bandwidth measures based on the spectral
intensity measure H1-A1 will be higher than for the other stops (Bali 1999).
H1-A2 provides an estimation of the skewness of the glottal source. Typically the
opening phase of the vocal folds tends to be longer than the closing phase (Ni Chasaide
and Gobl 1997), and the acoustic consequence of this is seen in a boosting effect
of the lower harmonics if the shape of the glottal pulse is more symmetrical.
H1-A3 is an acoustic measure of overall spectral tilt. Due to the higher energy
in the lower harmonics and dropping off of energy in the higher formants for the
VAS, it is expected that in the initial portion of the vowel following the VAS, the
overall spectral tilt represented by the difference between H1 and A3 should be fairly
large in comparison with the other stops. VLAS stops are aspirated between 10
and 30 percent of the vowel. This implies that there is no fundamental/harmonic
component present during this portion of aspiration. For this reason, comparison
between the stops on the basis of the spectral tilt measures was restricted to the VLS,
VS, and VAS. All of the spectral measures used in this study are direct measurements
of spectral intensity differences between the fundamental, the second harmonic,
and the amplitudes of the first, second and third formant peaks. Therefore, only
comparisons between the VAS and the unaspirated stops are made for all the four
spectral measures.
The measurement points for these acoustic measures coincided with 10, 30,
50, 70, and 90 percent duration of the vowel, as seen in Fig. 2.4 below. The
spectrum was calculated over a 30 ms period. Since the average duration of vowels
varied between 200 and 300 ms, spectral measures were taken over 5 equidistant points
rather than 10 points as in the f0 measurements. Spectral measurements taken at
10 equidistant points with a spectral window of 30 ms would provide values from
overlapping consecutive portions of the vowel. In order to get measures that were
from non-overlapping portions of the vowel, spectral measurements were taken from
5 equidistant points in the vowel.
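The arithmetic behind this choice can be checked with a short sketch. It assumes only that every 30 ms window is placed in the same way relative to its measurement point, so that two consecutive windows overlap exactly when the spacing between points is shorter than the window. The function name is hypothetical.

```python
def windows_overlap(vowel_ms, step_pct, window_ms=30.0):
    """True if analysis windows at consecutive measurement points share signal."""
    spacing_ms = vowel_ms * step_pct / 100.0
    return spacing_ms < window_ms


# 200 ms vowel: 10 percent steps are 20 ms apart, 20 percent steps are 40 ms apart.
ten_points_short = windows_overlap(200.0, 10)   # True: 20 ms < 30 ms
five_points_short = windows_overlap(200.0, 20)  # False: 40 ms >= 30 ms
```

For a 300 ms vowel, points spaced at 10 percent steps lie exactly 30 ms apart, so the windows would abut rather than overlap; the overlap arises for the shorter vowels in the 200-300 ms range.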
The measurements were taken with a window length of 0.005 s for the
spectrogram. The F1, F2, and F3 reference points were based on the formant values for
the vowel [a], and these formant values were adjusted for male and female speakers. A
spectral window of 30 ms was used for each point where measurements were taken
from the spectrum. Spectral peaks were measured by comparing the spectrum, Long
Term Average Spectrum (LTAS) and the LPC spectra as shown in Fig. 2.5 below.6
The bottom panel of Fig. 2.5 also shows the peaks at which the amplitude
measures were taken. The data was collected by deploying a Praat script, originally
developed by Bert Remijsen for capturing spectral moments of steady state vowels.
The script was modified to take measures at the five equidistant points. The five
values for each of the acoustically determined spectral measures provide an estimation
of the glottal characteristics and are perhaps not as accurate as actual
articulatory airflow methods; nonetheless, these acoustic measures have been fairly well
correlated with articulatory causes (Hanson et al. 2001, Stevens and Hanson 1994).

6 LTAS is a representation of the spectral density as a function of frequency, expressed in dB/Hz.

Figure 2.4: Percentage duration of vowel for spectral measurements


2.4 Statistical analyses

One-way ANOVAs were done to test the significance of independent factors such as
Stop Type, Context, Place of Articulation, and Subject. Post-hoc Tukey tests were
conducted to help categorize the data into subsets. Multivariate tests provided the
results for significant interactions between the various independent variables and
also the effect of each of the independent variables on the dependent variables.
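For readers unfamiliar with the F statistics reported in the following chapters, a one-way ANOVA can be sketched in a few lines. The closure durations below are invented for illustration and are not data from this study; the function name is hypothetical. The statistic is the ratio of between-group to within-group mean squares, reported as F(df_between, df_within).

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of groups of observations."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between, df_within = k - 1, n - k
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within


# Invented closure durations (ms) for two stop types.
vls = [100.0, 110.0, 120.0]
vlas = [80.0, 90.0, 100.0]
f_value, df1, df2 = one_way_anova_f([vls, vlas])
```

In this invented example the result would be reported as F(1,4)=6.0; the associated p-value requires the F distribution, which is not part of this sketch.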

Figure 2.5: Spectral measures: Spectrum, LTAS and LPC analysis (upper panel: time
axis in s; bottom panel: Spectrum [30 ms], Ltas(1-to-1) [30 ms] and LPC(autocorrelation),
all three overlaid, frequency axis in Hz)

Post-hoc tests were used to confirm the effect of the independent variables on the
outcomes of the dependent variables and also to categorize the patterns into distinct
subsets.

Chapter 3
Closure and voicing in Hindi


3.1 Outline

The goal of this study is to examine the temporal/durational aspects of the stops
in Hindi. Studies on VLT (Schiefer 1992, Dutta 2007) have shown that VAS are
produced with lower VLT than the VS. One of the primary research questions for this
study was to investigate whether a parallel could be found in the duration of closure
(CD) in VLS and VLAS. If indeed it is the case that CD and VLT durations show a
complementary distribution, then a claim could be made towards the primacy of CD
as a defining and contrastive feature. Evidence from this study suggests that VLS
are produced with longer CD compared to VLAS. As has been mentioned in section
1.7 of chapter 1, the relevance of VLT and CD needs to be looked at comparatively.
In this chapter, I will show that VLT and CD in Hindi stops, in their durations,
are correlated in such a way that VLT for VS and CD for VLS stops tend to be
longer than the VLT in VAS and the CD in VLAS. In the following sections I will
discuss the results from this study that show a systematic relationship between CD
and VLT.
Closure durations are affected by both the stop type and place of articulation.
In the case of VLT, it has been shown that stop type has an effect on VLT
durations, with VS showing longer VLT durations than VAS. Further, CD measures
for VLS and VLAS stops show that VLS stops have longer CD than VLAS. These
observations suggest that in Hindi voiced stops (VS, VAS) and voiceless stops (VLS,
VLAS) the duration of voicing during closure and the duration of closure,
respectively, show similar patterns. Unaspirated stops tend to have longer CD and, in
comparison, aspirated stops tend to have shorter CD. These findings suggest that
CD shows a parallel or complementary distribution with respect to the stops being
aspirated or unaspirated.
These results from CD/VLT provide evidence against the standard view on
contrasts in Hindi, as discussed in section 1.4 of chapter 1. The results from the VLT
durations in conjunction with the results from CD show that aspirated stops have
shorter closures than the unaspirated stops. This
observation also provides evidence against the standard assumption that the voicing during
closure and closure in and of itself are non-distinctive. Based on the evidence from
VLT and CD and Schiefer's observation that listeners prefer longer VLT durations
for the unambiguous perception of VS, it can be concluded that closure durations
could play a contrastive role in making the four-way distinctions in Hindi.
This chapter is organized as follows: In section 3.2, I discuss the results from
closure durations in VLS and VLAS. Following this, in section 3.3, I present results
from the investigation of those tokens that were produced with a pause between the
target word and the preceding context. Details of the method employed to measure
the CD with pauses are discussed in section 2.3.1. In section 3.4, I present results
from the VLT durations for VS and VAS. In section 3.5, I correlate the findings
from the CD and VLT durations and propose that duration of closure serves as a
contrastive feature. Section 3.6 is a summary of the relevant results and conclusions
that can be drawn based on the results.




3.2 Durational properties: Closure duration

As described in Chapter 2, data for this experimental study was collected from
three prosodic conditions, namely, utterance initial [U], phrase initial [I]
and phrase medial [M] position. Additionally, in the phrase medial condition the
target words were modified by adjectives, [la:l] and [kha:s]. This environment
allows us to contrast a voiceless preceding environment with a voiced environment.
Closure duration (CD) data was collected for voiceless and voiceless aspirated stops.1
Since closure in voiceless stops cannot be measured at utterance initial position
from the speech waveform, it was collected from two prosodic contexts, phrase initial
[I] and phrase medial [M].
Subject GA was unable to produce a labial voiceless aspirated stop. Synchronically,
Hindi is undergoing a change where the canonical labial VLAS is merging with
[f], the voiceless labio-dental fricative. This subject reflected the change, and only
results from dental, retroflex and velar voiceless stops are reported here.
Additionally, this subject produced all of the VLAS stops in the phrase medial [s][M] context
with considerable pause between the modifier [kha:s] and the target words. Due to
this, CD measurements in this context could not be annotated and measured.
A univariate ANOVA with CD as the dependent variable and place of articulation
(POA), prosodic context and stop type (VLS, VLAS) as factors shows no effect of
POA on the CD, F(2,15)=2.373, p=0.188. POA was
subsequently removed as a factor from the model. Prosodic context did not have
a significant effect on CD; pairwise comparisons, however, did reveal a significant
difference between the Phrase Medial ([s][M] and [l][M]) and Phrase Initial [I] contexts
(p=0.029 and p=0.046, respectively). Noting that only CD from VLS stops could
be measured for the Phrase Medial [s][M] context, CD for VLS stops was significantly longer
in this context than in the other two contexts. There was no significant difference
in CD between the Phrase Medial [l][M] and Phrase Initial [I] contexts. Significant
differences in CD obtained between the stop types (VLS and VLAS): VLS had
longer closures than VLAS, F(1,15)=34.720, p=0.002. While POA had no effect on
the CD, a marginal effect of prosodic context
was found in addition to the longer duration of closure for VLS compared to the
VLAS.2

1 Durational measurements are expressed throughout this study in ms.

Subject PB showed a significant effect of POA on CD, F(3,145)=26.923, p<0.005.
This effect could be attributed to the significant difference in marginal means that
obtained between the velar and the three other places of articulation (labial, dental
and retroflex, p<0.005), with the velar stops being produced with significantly shorter CD.
There was no significant difference in CD between the labial, dental and retroflex
stops (Fig. 3.1 below). A significant effect of context was seen on CD. The Phrase
Initial [I] context resulted in longer CD than the other two contexts (Phrase Medial
[s][M] and Phrase Medial [l][M]), F(2,145)=45.76, p<0.005 (see Fig. 3.3). A significant
effect of stop type was found on CD, with VLS having significantly longer closures
than VLAS, F(1,145)=79.36, p<0.005 (see Fig. 3.2). Post-hoc Tukey's HSD tests
could only be conducted for POA and prosodic context, since there were fewer than
three comparable stop types. The post-hoc tests confirm that these results can
be used to categorize the POA differences in mean into two homogeneous
subsets. Subset 1 consists of velar and Subset 2 consists of labial, dental and
retroflex stops. Similarly, for context, subset 1 consists of Phrase Initial [I] and
subset 2 of the Phrase Medial [s][M] and Phrase Medial [l][M] contexts. In summary, for
subject PB, velar stops are produced with significantly shorter CD and the Phrase
Initial [I] context has an effect on CD, such that CD in this context is significantly
longer than in the Phrase Medial contexts.

2 Only 10 tokens of VLS and 5 of VLAS were measured for this subject, owing to long pausal
breaks between the preceding word and the target word. These results, therefore, reflect the low
number of tokens measured.

Figure 3.1: Effect of place of articulation (POA) on Closure Duration (ms)

Subject RM showed a significant effect of POA on CD, F(3, 156)=3.397, p<0.05.
Significant difference in marginal means were obtained between the velar and labial
stops at p<0.005 and dental and labial stops at p<0.05 (see Fig. 3.1).


cant difference at the p<0.05 level were obtained between the Phrase Medial [s][M]
and Phrase Medial [1] [M] contexts, in th a t CD in the Phrase Medial [s][M] context
were significantly longer than in the Phrase Medial [1][M] context (F(2,156)=4.924).
M arginally significant differences were also seen between the Phrase Medial [1] [M]
and Phrase Initial [I] context at p<0.05 level (see 3.3). CD for Phrase Medial [1] [M]
context were significantly shorter th an CD in Phrase Initial [I] context. There were

Reproduced with permission of the copyright owner. Further reproduction prohibited without permission.

no significant differences between the Phrase Medial [s] and Phrase Initial [I] contexts.
Stop type had a significant effect on CD, in that VLS had significantly
longer CD than VLAS (F(1,156)=6.366). This difference was significant at
the p<0.05 level (see Fig. 3.2). Post-hoc Tukey's HSD tests showed that velar stops
were significantly shorter in CD than labial stops, and that mean CD differed
significantly between the Phrase Medial [s] [M] and Phrase Medial [1] [M] contexts,
with mean CD in the Phrase Medial [s] [M] context being longer than in the
Phrase Medial [1] [M] context. Overall, labial stops had longer CD than velars, and
CD tended to be longer in the Phrase Medial [s] [M] context than in the Phrase
Medial [1] [M] context.
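
The F ratios reported above come from a multi-factor univariate model. As a simplified illustration of how such a ratio is formed, the following sketch computes a one-way ANOVA F statistic for CD grouped by stop type. It is only a sketch under stated assumptions: the function name one_way_anova_f and the duration values are hypothetical and are not taken from the study, and the other factors in the reported model (POA and context) are ignored.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA.

    `groups` maps a label (for example a stop type) to a list of
    closure durations in ms. Hypothetical helper, not the author's model.
    """
    all_values = [x for values in groups.values() for x in values]
    grand_mean = mean(all_values)

    # Between-group sum of squares: group size times the squared distance
    # of each group mean from the grand mean.
    ss_between = sum(
        len(values) * (mean(values) - grand_mean) ** 2
        for values in groups.values()
    )
    # Within-group sum of squares: squared distance of each token
    # from the mean of its own group.
    ss_within = sum(
        (x - mean(values)) ** 2
        for values in groups.values()
        for x in values
    )

    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_value = (ss_between / df_between) / (ss_within / df_within)
    return f_value, df_between, df_within


# Invented closure durations (ms), for illustration only.
cd_by_stop_type = {
    "VLS": [100.0, 110.0, 120.0],
    "VLAS": [80.0, 90.0, 100.0],
}
f_value, df_between, df_within = one_way_anova_f(cd_by_stop_type)
print(f"F({df_between},{df_within}) = {f_value:.3f}")
```

The degrees of freedom returned by the sketch correspond to the two numbers in the F(a,b) notation used in the text: the number of groups minus one, and the number of tokens minus the number of groups.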

Figure 3.2: Effect of stop type on Closure Duration (ms)



Subject SD showed a significant difference between velar and dental stops in
CD (overall F(3,154)=2.701). Velar stops showed shorter CD than the dental stops,
and this difference was significant at the p<0.005 level (see Fig. 3.1). The effect of
prosodic context was such that the mean difference between the Phrase Medial
[s] [M] and Phrase Medial [1] [M] contexts was significant at the p<0.005 level, with CD
in the Phrase Medial [s] [M] context being longer than in the Phrase Medial [1] [M] context
(F(2,154)=5.512), as can be seen in Fig. 3.3. In addition, a significant difference in
means was obtained between the Phrase Medial [s] [M] and Phrase Initial [I] contexts, again
with CD in the Phrase Medial [s] [M] context being longer. A significant difference in CD
was also seen between the stop types at the p<0.05 level (F(1,154)=4.438, p=0.037),
with VLS showing longer CD than VLAS (see Fig. 3.2). A post-hoc Tukey's HSD
comparison showed that only the CD difference between the velar and dental stops
was significant, while the Phrase Medial [s] [M] context differed in mean CD from the other
two contexts.

Place of articulation had a significant effect on the CD for subject SV, F(3,143) = 10.
p<0.005 (see Fig. 3.1). The velar stops were significantly shorter than the labial
and retroflex stops. Marginal means of CD differed significantly across all
contexts, with the Phrase Initial [I] context showing longer CD than the Phrase
Medial [s] [M] context, and the latter showing longer CD than the Phrase Medial [1] [M]
context (F(2,143)=24.629, p<0.005), as shown in Fig. 3.3. Post-hoc tests revealed
that the differences between these three contexts were categorical. Stop type, as in
the case of the other subjects, had a significant effect on CD: VLS showed
significantly longer CD than VLAS (F(1,143)=15.436, p<0.005).
In summary, we observe that CD for VLS is significantly longer than for VLAS
(Fig. 3.2). Over all subjects, the velar stops show the




Figure 3.3: Effect of Context on Closure Duration

shortest CD compared to stops from other places of articulation (Fig. 3.1).
The effect of context on CD is less clear, despite the fact that over all subjects the
Phrase Initial [I] context tends to show longer CD. Differences between subjects do
exist, and hence a generalization is difficult to make on the basis of these data.
Velar stops show significantly short CD. In order to test whether these short
durations were indeed responsible for the overall difference in CD between
VLAS and VLS, velar stops were excluded from the model. With VLS and VLAS
from just three places of articulation, a significant effect of stop type on CD was
obtained (F(1,447)=24.38). This pattern can be seen in Fig. 3.4. Excluding subject
GA, three speakers show a significant difference between the CD for VLS and VLAS.






Figure 3.4: Effect of Stop type on Closure Duration: Excluding velar stops


Closure duration with pause

As mentioned in section 2.3.1, a sizeable number of utterances produced by the
speakers exhibited pauses between the preceding context and the target word. Two
probable causes could be responsible for these pauses. One, a number of nonce words
were used in this study to complete a full set of CVC words. Unfamiliarity with
these words may have caused the speakers to pause between the context preceding
the target word and the target word itself. Two, pauses could have been introduced
due to speakers exhibiting hesitation in completing the experimental task. This
hesitation could have led the speakers to produce an intonation boundary between
the preceding context and the target word, which in turn would have resulted in
an intonation contour reserved for words in isolation. Auditory judgement on the



appropriateness of these utterances was made for these cases. Similar auditory
judgements were used for the phrase medial [M] condition with preceding [1] and
the Phrase Initial [I] condition with preceding [e]. Utterances where auditory evaluation
led us to confirm that a pausal break existed between the target words and their
preceding environments were not annotated with a CD.
In order to confirm that the auditory judgements were robust in identifying
naturally intonated sentences without a pausal break, a subset of the data that was
not annotated with CD was also examined. This subset included dental stops that
occurred in the phrase medial [M] condition with a preceding [s]. The cessation
of frication in the context preceding the target word provided an adequate
environment, in which the noise in the frication of [s] contrasted with the absence of
signal in the closure and the pausal break. CD was then annotated
from the end of the frication in [s] to the visible burst of the target word. This
relatively long duration of CD could include the duration of closure and pausal break.
As mentioned in section 2.3.1, auditory judgement was used to identify tokens that
didn't exhibit a pause between the preceding [s] condition and the target word.
These measurements will confirm that the auditory judgement used to identify the
tokens which didn't exhibit a pause between the preceding context and target word
was accurate.
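
The annotation procedure described in this paragraph amounts to subtracting two time stamps. The following is a minimal sketch of that arithmetic, assuming each token carries the time of the end of the [s] frication and the time of the visible burst, both in seconds. The class name, field names and time values are hypothetical and are not taken from the annotation files of the study.

```python
from dataclasses import dataclass


@dataclass
class AnnotatedToken:
    """Hypothetical annotation record; times are in seconds."""
    frication_end: float  # end of frication in the preceding [s]
    burst: float          # visible release burst of the target stop


def closure_with_pause_ms(token):
    """Return CD (with pause) in ms: the interval from the end of the
    [s] frication to the burst. Any pausal break falls inside this
    interval, so it is included in the measure."""
    if token.burst <= token.frication_end:
        raise ValueError("burst must follow the end of frication")
    return (token.burst - token.frication_end) * 1000.0


# Invented time stamps, for illustration only.
example = AnnotatedToken(frication_end=0.500, burst=0.625)
cd_ms = closure_with_pause_ms(example)
print(f"CD (with pause): {cd_ms:.1f} ms")
```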

The general pattern of the distribution of CD (with pause) for VLAS and VLS
for three of the five subjects (PB, SD, RM) suggests that CD (with pause) for VLS
is longer than CD (with pause) for VLAS. As can be seen in Fig. 3.5, the median CD
(with pause) values are consistently longer for VLS compared to VLAS in the phrase
medial [s] condition for dental stops. Thus the patterns of CD for target
words preceded by pauses and target words that are naturally uttered are similar
for at least three subjects. This finding confirms that the auditory judgements used
to select tokens for which CD measures would be taken are accurate in identifying


tokens without pauses.
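
The comparison of medians described above can be sketched as follows. This is an illustration only: the speaker labels S1 and S2, the function names and the duration values are hypothetical, and the study itself reports the medians graphically rather than through code.

```python
from statistics import median


def median_cd_by_speaker_and_type(tokens):
    """Group (speaker, stop_type, cd_ms) records and return the median
    CD (with pause) for every (speaker, stop_type) pair."""
    grouped = {}
    for speaker, stop_type, cd_ms in tokens:
        grouped.setdefault((speaker, stop_type), []).append(cd_ms)
    return {key: median(values) for key, values in grouped.items()}


def speakers_with_longer_vls(medians):
    """Return the speakers whose median CD is longer for VLS than for VLAS."""
    speakers = sorted({speaker for speaker, _ in medians})
    return [
        s for s in speakers
        if (s, "VLS") in medians
        and (s, "VLAS") in medians
        and medians[(s, "VLS")] > medians[(s, "VLAS")]
    ]


# Invented records (speaker, stop type, CD with pause in ms).
tokens = [
    ("S1", "VLS", 150.0), ("S1", "VLS", 160.0), ("S1", "VLS", 170.0),
    ("S1", "VLAS", 120.0), ("S1", "VLAS", 130.0), ("S1", "VLAS", 140.0),
    ("S2", "VLS", 130.0), ("S2", "VLS", 140.0),
    ("S2", "VLAS", 135.0), ("S2", "VLAS", 150.0),
]
medians = median_cd_by_speaker_and_type(tokens)
longer_vls = speakers_with_longer_vls(medians)
print(longer_vls)
```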


Figure 3.5: Effect of Stop type on Closure Duration (with pause): Utterance medial
[s] condition; dental POA




Durational properties: Voicing lead time (VLT) durations

All subjects show a significant effect of stop type on VLT durations. VS show
significantly longer VLT than VAS (Figures 3.6-3.10, Panel A) for all speakers.
This pattern can be seen in the box plots (Panel D), where the VLT values for VS
tend to be higher than the VLT values for VAS. The box plots also
show that context has a significant effect on VLT. The Utterance Initial [U] context
tends to have higher VLT values than the Phrase Medial contexts unambiguously
for GA, PB and SV, while RM and SD do not show this pattern consistently.
GA and PB show higher VLT for the Phrase Initial [I] context than for the Phrase
Medial contexts. Thus initial contexts, like the [U] and [I] conditions, tend to have
longer VLT than the medial contexts.
Pairwise comparison of means shows that for speaker GA, the mean difference
in VLT between the Utterance Initial [U] position and the Phrase Medial positions
is significant at p<0.05. The Phrase Initial [I] context differs significantly only from
the Phrase Medial [1] context.
Speaker PB shows a significant difference between the Utterance Initial [U] and
the Phrase Medial positions; VLT in the Utterance Initial [U] context is higher than
in the Phrase Medial contexts. A similar pattern emerges for the Phrase Initial [I]
position: in this position as well, the VLT values are higher than in the Phrase
Medial contexts. The mean differences in VLT between the Utterance Initial [U]
and Phrase Initial [I] contexts are not significant. The differences within the Phrase Medial
position between [1] and [s] are significant, with the [1] position showing shorter VLT
than the [s] position.



Figure 3.6: Speaker GA: ANOVA results for the dependent variable VLT and the independent variables stop type (Panel A), place
of articulation (Panel B) and context (Panel C); box plots of VLT by stop type and context, in which horizontal lines in boxes
represent median values (Panel D); and a summary of main effects and interactions (Panel E)


Figure 3.7: Speaker PB: ANOVA results for the dependent variable VLT and the independent variables stop type (Panel A), place
of articulation (Panel B) and context (Panel C); box plots of VLT by stop type and context, in which horizontal lines in boxes
represent median values (Panel D); and a summary of main effects and interactions (Panel E)


Figure 3.8: Speaker RM: ANOVA results for the dependent variable VLT and the independent variables stop type (Panel A), place
of articulation (Panel B) and context (Panel C); box plots of VLT by stop type and context, in which horizontal lines in boxes
represent median values (Panel D); and a summary of main effects and interactions (Panel E)


Figure 3.9: Speaker SD: ANOVA results for the dependent variable VLT and the independent variables stop type (Panel A), place
of articulation (Panel B) and context (Panel C); box plots of VLT by stop type and context, in which horizontal lines in boxes
represent median values (Panel D); and a summary of main effects and interactions (Panel E)


Figure 3.10: Speaker SV: ANOVA results for the dependent variable VLT and the independent variables stop type (Panel A), place
of articulation (Panel B) and context (Panel C); box plots of VLT by stop type and context, in which horizontal lines in boxes
represent median values (Panel D); and a summary of main effects and interactions (Panel E)

Speaker SV shows a significant difference between the Utterance Initial [U] and
Phrase Medial contexts, like GA and PB; however, in the Phrase Initial [I] position
for this speaker, the VLT is shorter than in the Phrase Medial [s] condition.
Speaker RM shows longer VLT in the Utterance Initial [U] position than in the
Phrase Medial positions; however, only the mean difference between the Utterance
Initial [U] and Phrase Medial [1] contexts is significant at the p<0.05 level. The
difference between the Phrase Initial [I] and Phrase Medial [1] contexts is also
significant for this speaker, with VLT being longer in the Phrase Initial [I] context. The
difference between the [s] and [1] contexts within the Phrase Medial condition is not
significant for this speaker.
Speaker SD shows significant differences between the initial [U, I] and the Phrase
Medial [1] contexts, and the difference between the Phrase Medial [s] and Phrase
Medial [1] contexts is also significant for this speaker.
Post-hoc comparisons, however, show that not all the significant results are
adequate to categorize the VLT patterns based on the effect of the context. Table 3.1
details the results of the post-hoc comparisons for all the subjects.

Post-hoc comparisons
Phrase Medial [1] < Phrase Medial [s], Phrase Initial [I] and Utterance Initial [U]
Phrase Medial [1] < Phrase Medial [s] < Phrase Initial [I] and Utterance Initial [U]
Phrase Medial [1] < Phrase Initial [I]
Phrase Medial [1] < Phrase Medial [s], Phrase Initial [I] and Utterance Initial [U]
Phrase Medial [1] < Phrase Initial [I], Phrase Medial [s] < Utterance Initial [U]

Table 3.1: Effect of context on VLT: Post-hoc comparison based on SNK and Tukey HSD

The post-hoc comparisons show that in the Phrase Medial [1] context, where the
target word is preceded by the modifier [lal] 'red', a voiced context, the VLT
durations are significantly shorter than in the other contexts for all speakers (see
footnote 3). Based on these results, it can be observed that a preceding voiced
context tends to decrease the VLT duration when compared to a voiceless context
(Phrase Medial [s]). Only for two speakers (PB and SV) does the initial context
(U and I) have an effect on the VLT, such that the VLT is longer than in the phrase
medial context. These results, therefore, indicate that the effect of prosodic context
on VLT is marginal. There is a tendency towards longer VLT durations in the initial
contexts when compared to the medial contexts; however, a more controlled
experimental environment may be able to shed light on this observation.
Except for speaker GA, all of the other four speakers show a significant effect of
place of articulation on VLT (Panel B, Figures 3.6-3.10). Within-speaker pairwise
comparisons show that for speaker PB, significant differences in mean VLT appear
between labial and dental, retroflex, and velar places of articulation. The differences
in decreasing order of VLT duration are: Labial and Dental > Retroflex
> Velar. These differences are significant at p<0.05. In the case of speaker RM,
the order of the differences is as follows: Labial > Dental > Velar. No significant
differences obtain between the labial and retroflex places of articulation or between
the retroflex and dental; however, the retroflex stops are significantly longer in VLT
than the velar stops. For speaker SD, the difference in VLT due to place
of articulation is only significant between the velar and the other three places of
articulation, with the velar stops showing significantly shorter VLT; among the labial,
dental and retroflex stops, the differences are not significant. The order of differences
between places of articulation for speaker SD is as follows: Labial, Dental, Retroflex
> Velar. For speaker SV, only the labial place of articulation has significantly
longer VLT than the other three places of articulation, the order of the differences
being: Labial > Dental, Retroflex and Velar. These patterns can be seen in Fig. 3.11.

Footnote 3: For speaker RM, VLT for the Phrase Medial [s] and the Utterance Initial [U] contexts does not
significantly differ from either the Phrase Medial [1] or the Phrase Initial [I] context.






Figure 3.11: Effect of place of articulation on VLT (x-axis: Stop Type; legend, Place of Articulation: Labial, Dental, Retroflex, Velar)

Post-hoc tests based on multiple comparisons, however, show that only the
differences between the velar and the other three places of articulation are
significant (Table 3.2).
Except for speaker RM, all of the other speakers show no interaction
between the factors stop type, place of articulation (POA) and context as far as
VLT durations are concerned. Speaker RM shows a marginal interaction
between the three factors (p=0.05).




Post-hoc comparisons
Velar < Dental, Retroflex and Labial
Velar < Dental, Retroflex and Labial
Velar < Dental, Retroflex and Labial
Velar, Dental, Retroflex < Labial

Table 3.2: Effect of place of articulation on VLT: Post-hoc comparison based on SNK and Tukey HSD


Closure and VLT correlation

One of the primary goals of this study was to examine the hypothesized parallel
/ complementary distribution between CD and VLT. Based on the results from
sections 3.2, 3.3 and 3.4, it can be concluded that CD for voiceless stops and VLT
for voiced stops show patterns that allow us to draw a parallel between these two
proposed categories. CD for VLS and VLT for VS are comparatively longer in
duration than CD for VLAS and VLT for VAS, respectively. This generalization,
when combined with the fact that closures for voiced stops are produced
with voicing, suggests that closure in four-way contrasts can be important in cueing
the aspiration contrast. Under this view, the voicing duration during the closure
for voiced stops can be understood to be an epiphenomenon. This view is also in
keeping with the observations and findings of Schiefer (1986) that listeners perceive
tokens with shorter VLT durations to be VAS, all else being equal. These findings
suggest that in Hindi production and perception of stops, the duration of the closure
is primary and relevant for the stop to be categorized as an unaspirated or aspirated
stop. This observation, however, is not a claim towards the redundancy of voicing
during closure; on the contrary, voicing during closure, the duration being irrelevant,
is an important cue towards the distinction between voiceless and voiced stops. Thus,
the epiphenomenon of voicing during closure serves to enhance the contrast between
voiceless and voiced stops, while the duration of the closure serves to distinguish



between the aspirated and unaspirated stops.

Keyser and Stevens (2006) propose a two-step process for their model of speech
production. The first process involves replacement of universal distinctive features
with an appropriate set of motoric instructions. The second process involves a
language-specific process that is sensitive to those features that are in danger of
losing their perceptual saliency as a consequence of the environment in which they
appear. This process is referred to as enhancement, which adds additional motoric
instructions to enhance the saliency of the jeopardized features. Closure duration in
Hindi can be understood in this context to be subsumed under a language-specific
process. While the obliteration of the difference in closure duration between aspirated and
unaspirated stops puts in jeopardy the contrast between aspirated and unaspirated
stops, motoric instructions to maintain a distinction in duration serve to salvage the
contrast. Similarly, the voicing during closure for VAS and VS serves to enhance
the contrast between the voiced and voiceless stops.
There are two fundamental assumptions behind the model proposed by Keyser
and Stevens (2006). The first assumption is that defining features are universal, and
the second assumption is that defining features need enhancement depending on
the contrasts in a language. In this respect, in Hindi, while the aspiration contrast
is a defining feature and is no different from a distinctive feature that describes
aspiration in any other language, the duration of closure is a feature that serves to
enhance the aspiration feature.
However, it is necessary to relate the theory of enhancement to the data.
Presumably the motoric instructions that are governed by the language-specific
features of enhancement need to be directed towards either lengthening the
closure for unaspirated stops or shortening the closure for aspirated stops in
comparable terms. While adequate and independent articulatory-aerodynamic grounds
exist for shorter closures for aspirated stops, sufficient grounds are unavailable for


lengthening of closures for unaspirated stops. A manifestation of the theory of
enhancement of the aspiration contrast in Hindi, therefore, is a set of motoric
instructions that lengthen the duration of closure for unaspirated stops. Relating this
to Schiefer's findings from perception studies, it becomes clear that enhancement in
Hindi results in lengthening of closures for unaspirated stops and not shortening of
closures for aspirated stops. In chapter 6, I discuss in detail the reasons behind the absence
of aerodynamic factors in the spread of breathiness following VAS.


Summary of results

Closure durations are affected by both the stop type and place of articulation. In
the case of VLT, it has been shown that stop type has an effect on VLT durations.
Marginally, it has also been shown that prosodic positions such as the Utterance
Initial (U) position tend to lengthen durational features such as closure duration
and VLT duration, while the phrase medial contexts lead to a reduction in the duration of
segmental features. A complete cross-category comparison of closure duration and
VLT in the utterance initial position could not be accomplished because it is not
possible acoustically to segment the beginning of closure in this position.
Further, the effect of segmental contexts on closure and VLT
durations shows that, despite the voiceless preceding [s], VLT durations are longer
in the phrase medial [s] context than in the phrase medial [1] context.
Place of articulation has expected effects on both closure and VLT durations, in
that labials tend to be longer than the velar stops. The articulatory mechanisms
that increase the oral tract volume due to the place of the constriction, it seems, not
only affect the durations of voicing but also the closure durations. This is unusual
in the sense that labials, for instance, are not only voiced for longer durations, but



they tend to have longer closures as well.

Overall, the results from VLT and CD provide evidence against the standard
view on contrasts in Hindi, as discussed in section 1.4 of chapter 1. The results
from the VLT durations in conjunction with the results from CD show that aspirated
stops have shorter closures than the unaspirated stops. This
observation also provides evidence against the standard assumption that the voicing
during closure and closure in and of itself are non-distinctive. Based on the
evidence from VLT and CD and Schiefer's observation that listeners prefer longer VLT
durations for the unambiguous perception of VS, I conclude that closure durations
could play a contrastive role in making the four-way distinctions in Hindi.



Chapter 4
Effects of stop type on fundamental frequency

Universal tendency for f0 lowering following voiced stops

Voiced consonants are known to lower the fundamental frequency in the following
vowel (House and Fairbanks 1953, Lehiste and Peterson 1961, Lofqvist 1975, Umeda
1981, Ohde 1984). f0 tends to be higher following voiceless consonants than voiced
consonants, an effect that can be found as far as 100 ms from voicing onset (House
and Fairbanks 1953, Umeda 1981). This is a well-known tendency amongst a large
number of languages, irrespective of the nature of the laryngeal contrasts that are
employed in these languages. That is to say, languages that contrastively use breathy,
creaky and modal phonation, tonal contrasts and laryngeal contrasts such as
aspiration all exhibit lowering of f0 following voiced segments (Ohala 1973, Hombert 1978).
Dutta (2007) shows that, compared to VS, VAS lower average f0 values in the
first six pitch periods. However, this study looked only into the acoustic properties
of the voiced stops (VS, VAS) and not the voiceless stops (VLS, VLAS); hence it
was not possible to study the effect of stops on the f0 of the following vowel for
all four stop types. Additionally, the effect of the stops on the initial f0 of the
vowel was measured as the average f0 of the first six pitch periods. Since the first
six pitch periods do not necessarily coincide with 100 ms following the release of the
stop, it would be interesting to find out how far the effect of the stop type on the


Reproduced with permission of the copyright owner. Further reproduction prohibited without permission.

fo persists. W hile measuring the average f0 in the first six pitch periods provides
an account of the absolute differences in fo between the VAS and VS, it fails to
establish in relative terms the extent of the effect of the stop type on the fo. One
of the research goals of this study was to investigate the extent of f0 lowering in the
following vowel in order to fully understand the effects of all the four stop types on
the fQ and not just the voiced stops. As has been shown in chapter 3, VLT for VAS
is significantly lower than VS. Under the assum ption th a t voicing is responsible for
lowering f0 universally, a comparatively less voiced VAS tends to lower f0 further
than the VS. Therefore it is also im portant to relate the extent of f0 lowering in
VAS w ith the extent of the spread of breathiness in order to establish whether the
fo lowering and breathiness spread could correlated. The results from the study of
f0 lowering show th a t f0 following VAS is lower compared to the VS, VLAS and VLS
till 30 percent of the following vowel. Results from the spectral intensity measures
(C hapter 6) suggest th a t the breathiness spreads till 30-50 percent of the following
vowel in VAS. A comparison of the the fo lowering and breathiness spread durations
following VAS suggests a correlation between the nature of phonation following the
release of the VAS and the f0 in the following vowel.
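
The mismatch between a fixed count of pitch periods and a fixed time window follows
from the relation between f0 and period length. The following is a minimal sketch of
that arithmetic only. It assumes a constant f0 across the interval, and the function
names, f0 values and vowel duration are hypothetical illustrations, not measurements
from this study.

```python
def span_of_pitch_periods_ms(f0_hz, n_periods=6):
    """Time in ms covered by n_periods glottal cycles, assuming constant f0."""
    return n_periods * 1000.0 / f0_hz


def periods_in_vowel_fraction(f0_hz, vowel_ms, fraction):
    """Number of pitch periods that fit into a fraction of the vowel,
    again assuming constant f0 (a simplification)."""
    return fraction * vowel_ms * f0_hz / 1000.0


# Hypothetical values, chosen only for illustration.
six_periods_at_200_hz = span_of_pitch_periods_ms(200.0)   # well short of 100 ms
six_periods_at_100_hz = span_of_pitch_periods_ms(100.0)
periods_in_30_percent = periods_in_vowel_fraction(200.0, 150.0, 0.3)
```

Under these assumed values, six pitch periods cover considerably less than 100 ms,
which is why a proportional measure over the vowel is used instead of a fixed count.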
This chapter is organized as follows: In the following section, 4.2, I will discuss
the effect of the stops on the f0 of the following vowel in Hindi and also present the
results from the individual speakers. Section 4.3 is a brief discussion of the results
from section 4.2. Section 4.3 is a brief discussion on the effects of prosodic context
on the f0 and finally, in section 4.4, I outline the major conclusions from this portion
of the study.


f0 as a function of stop type

Stemming from the interest in investigating the consonantal origins of tonal systems,
Purcell et al. (1978) is a phonetic study on the correlation of Hindi breathy voiced
stops and Panjabi tones. In fact, Ohala (1974) had modeled the Panjabi/Hindi
data to account for tonal development in general. A number of studies (House and
Fairbanks 1953, Hombert 1978, to cite a few) started to establish the correlation
between voicing and f0. Purcell et al. (1978) in their experimental study established
two correspondences between Hindi and Panjabi:
1. Panjabi tone originated due to the loss of aspiration
2. Panjabi tonal contours and f0 perturbations in Hindi are comparable
They were able to establish that there is a direct correlation between the
effects of Hindi VAS on f0 of adjacent vowels and the realizations of Panjabi high
and low tones on cognate words. Thus the correlation, as far as the low-rising tone in
Panjabi following voiceless stops, as in [kar] 'house', and Hindi [ghar] 'house' is
concerned, is established without specifically looking into the phonetic manifestation
of voicing in Hindi, rather just the phonological contrast. The correlation between the
consonantal origins of tones in Panjabi and the synchronic behaviour of f0 in Hindi stops
notwithstanding, it is not clear from the study by Purcell et al. (1978) whether the
f0 lowering following Hindi VAS could be correlated with the mode of phonation or
whether it could be because of the universal tendency for voiced stops to lower f0.

In this study, the effect of the Hindi stops on the f0 of the following vowel
is examined so as to provide an explanation for the above-mentioned questions
regarding the nature of f0 lowering following VAS. f0 analysis, as mentioned in
section 2.3.3 of chapter 2, was conducted on a time-normalized contour such that ten
f0 values for each token were recorded at 10 equidistant points in the vowel, starting
at 10 percent of the vowel and ending at 100 percent of the vowel. This way the
ten f0 values that are recorded for each token represent the overall f0 contour for
each token. A one-way ANOVA was conducted with f0 at the 10 percentage points as
the dependent variable and Stop Type (4 levels) as a factor. A multiple pairwise
comparison was done to examine differences in f0 as a function of stop type. Tukey
post-hoc tests were used to categorize the levels of the stop type according to the effect
a particular stop type had on the corresponding f0 values at the different percentage
points of the vowel. In what follows, results from each subject are discussed first,
followed by a discussion of the most conservative findings that are obtained for all
subjects.
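
The time normalization described above can be sketched as follows. This is only an
illustration under stated assumptions: the function name, the use of linear
interpolation between neighbouring f0 estimates, and the sample values are
hypothetical and are not taken from the procedure in section 2.3.3.

```python
from bisect import bisect_left


def sample_f0_contour(times, f0_values, n_points=10):
    """Sample an f0 track at n_points equidistant points of the vowel.

    times     -- strictly increasing time stamps (s); first = vowel onset,
                 last = vowel offset
    f0_values -- f0 estimates (Hz) at those time stamps
    Returns f0 at 10%, 20%, ..., 100% of the vowel when n_points is 10.
    Linear interpolation is an assumption made for this sketch.
    """
    onset, offset = times[0], times[-1]
    contour = []
    for k in range(1, n_points + 1):
        t = onset + (offset - onset) * k / n_points
        i = bisect_left(times, t)
        if i == 0:
            contour.append(float(f0_values[0]))
        elif i >= len(times):
            contour.append(float(f0_values[-1]))
        else:
            t0, t1 = times[i - 1], times[i]
            w = (t - t0) / (t1 - t0)
            contour.append(f0_values[i - 1] + w * (f0_values[i] - f0_values[i - 1]))
    return contour


# Hypothetical token: a 100 ms vowel with a rising f0.
contour = sample_f0_contour([0.0, 0.05, 0.10], [100.0, 110.0, 120.0])
```

Each token then contributes one value per percentage point, so tokens of different
durations can be compared point by point.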

4.2.1 Subject GA

The statistical tests indicate that mean f0 differences between the four stops are
significant at p<0.05 till 40 percent of the vowel. After 50 percent of the vowel
the mean f0 differences due to stop type begin to dissipate, and after 80 percent
till 100 percent of the vowel the differences in f0 between the four stops are no
longer statistically significant.1 The mean differences between f0 at 10, 20, 30 and 40
percent allow us to posit a three-category distinction of stops, namely VS, VAS,
and voiceless stops. Voiceless stops here subsumes both VLS and VLAS. The mean
difference in f0 between VLAS and VLS is statistically insignificant from 10 percent
till 20 percent of the vowel. Significant differences between mean f0 values start at 30
percent and persist till 40 percent of the vowel between all four stops. Subsequently,
from 50 percent till 60 percent of the vowel, the difference in mean f0 between VAS
and VS becomes insignificant, while the distinction between the voiced stops (VAS
and VS) and the voiceless stops (VLS and VLAS) and within the voiceless stops
(VLS and VLAS) is significant. From 70 till 100 percent of the vowel there is no
significant difference in mean f0 between the four stops.

1 Here and elsewhere, all reported p values are at the level of p < 0.05.
A Tukey post-hoc test was used to confirm these results and also to help
categorize the data. The test shows that from 10 till 20 percent of the vowel mean
f0 values help categorize the stops into three subsets: VS, VAS, and the voiceless
stops. The Tukey test also shows that only the difference in mean f0 between the
stops at 30 percent of the vowel can help in categorizing the stops into four subsets,
the difference at 40 percent being sufficient only to categorize the stops as voiced,
VLS and VLAS. This pattern persists till 80 percent of the vowel, after which the
difference in f0 between the stops as a factor of stop type remains insignificant till
the end of the vowel. These patterns can be seen in the f0 means plots below in
Fig. 4.1. The mean f0 values for VAS are significantly lower than those of the VS
till 30 percent of the vowel, while initially, at 10 percent, there is no significant
difference between the VLS and VLAS stops. The boxplots in Fig. 4.2 show the
variation in f0 values. The overall range of f0 for VAS between 10 and 40 percent
of the vowel tends to be high; however, the median values do show a pattern where
they are consistently lower than the medians for VS. At 20 and 30 percent of the
vowel, however, the VAS show a great degree of variation.

4.2.2 Subject PB

Multiple comparisons for speaker PB show that the mean f0 differences
are significant between VAS and the other three stops at the p<0.005 level till 30
percent of the vowel. The mean f0 differences are significant between VS and the
other three stops only at 20 percent of the vowel. Significant differences between
the VLAS and VLS obtain only at 40 percent of the vowel. The mean difference
between VAS and the voiceless stops is significant at the p<0.05 level till 40 percent of
the vowel, with the mean difference between VAS and VLAS being significant till 80
percent of the vowel. These patterns can be seen in Fig. 4.3. The mean f0 values
for VAS are significantly lower than the mean f0 values for VS till 30 percent of the
vowel. In this respect the differences in f0 values for speaker PB are similar to those
of speaker GA.

Figure 4.1: Effect of stop type on f0 (Subject GA): Mean f0 plots

Figure 4.2: Effect of stop type on f0 (Subject GA): Variation in f0 values. Horizontal
lines within the boxes represent the median f0 values.

A post-hoc Tukey test shows that till 40 percent of the vowel the mean f0
values alone can help categorize the stops into three different subsets: at 10 percent
the mean f0 values for VAS<VS=VLAS<VLS, at 20 percent the mean f0 values
for VAS<VS<VLS=VLAS, at 30 percent VAS<VS=VLS<VLS=VLAS and at 40
percent, the mean f0 for VAS<VS=VLS<VLAS. Overall the patterns suggest that
mean f0 for VAS is lower than the other stops till 40 percent of the vowel. VS also
tend to lower f0; however, in the case of this subject, only at 20 percent is the mean f0
value significantly lower than that for VLS and VLAS. The boxplots in Fig. 4.4
show variation in f0 values, quite like speaker GA. f0 values for VAS that lie between
10 and 40 percent of the vowel tend to show a great amount of variation. The
median values, however, do show a pattern of f0 fall after 10 percent till 30 percent of
the vowel.

Figure 4.3: Effect of stop type on f0 (Subject PB): Mean f0 plots

Figure 4.4: Effect of stop type on f0 (Subject PB): Variation in f0 values. Horizontal
lines within the boxes represent the median f0 values.

4.2.3 Subject RM

One-way ANOVAs show that speaker RM exhibits similar patterns for f0 values for
VAS when compared to speakers GA and PB. The crucial difference, however, is
that the difference in f0 between VAS and VS is significant only till 20 percent of
the vowel, with VAS showing lower mean f0 values than VS (p=0.007 and 0.005 at
10 and 20 percent, respectively). The mean f0 differences between the voiced stops
and voiceless stops are significant till 60 percent of the vowel.
A post-hoc Tukey test shows that from 10 till 20 percent of the vowel the pattern
of f0 values can help categorize the stops into three categories, such that f0 for
VAS<VS<VLS=VLAS. These patterns can be seen in Fig. 4.5 below. In Fig. 4.6
we see that for this speaker the variation in the f0 values is reduced, compared to
speakers GA and PB. Further, the median values show that the VAS have lower f0
values than the voiceless stops, as do the VS, at least till the initial 10-20 percent of
the vowel.

4.2.4 Subject SD

Multiple comparisons between mean f0 values for this speaker show patterns similar
to those of the other speakers, especially in the initial 30-40 percent of the vowel. For
this speaker the mean difference in f0 between VAS and VS is statistically significant
till 30 percent of the vowel, with the VAS showing lower mean f0 than the VS.
The difference in mean f0 between VLS and VLAS stops is significant between
20-50 percent of the vowel, with the VLS showing lower f0 means. The VS show
significantly lower mean f0 than the voiceless stops from 20-70 percent of the vowel.
In all of these cases p<0.005 for the significant differences in f0 means. These
patterns can be seen in Fig. 4.7 below.

A post-hoc Tukey test shows that for this subject at 10 percent of the vowel the
f0 means for VAS<VS=VLS=VLAS. However, between 20-30 percent of the vowel
the f0 means for the VAS<VS<VLS<VLAS. Post-hoc tests also confirm that from
80-100 percent of the vowel the stop type has no significant effect on the f0 of the
vowel. The boxplots in Fig. 4.8 show that for this subject also f0 values for VAS
vary when compared to the other stops in the beginning 10-30 percent of the vowel.
The median values, however, conform to the pattern seen in the mean f0 differences.

Figure 4.5: Effect of stop type on f0 (Subject RM): Mean f0 plots



Figure 4.6: Effect of stop type on f0 (Subject RM): Variation in f0 values. Horizontal
lines within the boxes represent the median f0 values.


4.2.5 Subject SV

Multiple comparisons for this subject show that f0 means for VAS are lower than the
other three stops from 10-80 percent of the vowel (p<0.05). However, the VS for this
speaker show f0 values significantly higher than those for the VLAS at 20 percent
of the vowel (p=0.019). This pattern for the VS is unusual and exceptional when
compared to the initial f0 patterns of the other four speakers. While the f0 pattern
for the VAS for this speaker, like the other speakers, shows a low onset and a steep
rise, the VS do not reflect the low onset. These patterns can be seen in Fig. 4.9.
The boxplots in Fig. 4.10 show that the variation in f0 values for this speaker is
consistent for all the stops. The median values for VAS also show that VAS are
produced with lower f0 compared to the other stops.

Figure 4.7: Effect of stop type on f0 (Subject SD): Mean f0 plots

Figure 4.8: Effect of stop type on f0 (Subject SD): Variation in f0 values. Horizontal
lines within the boxes represent the median f0 values.

Figure 4.9: Effect of stop type on f0 (Subject SV): Mean f0 plots


Discussion

Overall the f0 patterns suggest that the effect of the stop type is reflected in the
initial 20-30 percent of the vowel at least for four subjects (GA, PB, RM and SD).
The general pattern for f0 values is such that initially a three-way distinction can
be seen, dependent on the stop type. At least for four speakers (GA, PB, RM and
SD), VAS tend to have lower mean f0 values than the VS, which in turn have lower
f0 than the voiceless stops (both VLS and VLAS). The initial mean f0 difference
between VLS and VLAS is less certain. The relations between the stops based on
statistically significant mean f0 differences can be summarized as in Table 4.1 below.

Figure 4.10: Effect of stop type on f0 (Subject SV): Variation in f0 values. Horizontal
lines within the boxes represent the median f0 values.


VAS < VS = VLAS < VLS
VAS = VS < VLS < VLAS
VAS < VS = VLS < VLAS
VAS = VS < VLS < VLAS

Table 4.1: Ordered relations between the stops from 10 to 50 percent of the vowel
based on Tukey HSD comparisons

Three observations can be made based on the results of the f0 analysis as well
as the summarized table above:
1. In their initial portion, vowels following voiced stops (VS, VAS) show lower
mean f0 than the vowels following voiceless stops (VLS, VLAS).
2. VAS<VS
3. The extent of the f0 lowering for vowels following voiced stops is such that this
lowering effect can be seen till 30 percent of the vowel (for four speakers).
4. Mean f0 values do not provide a clear understanding of the relation between
the VLS and VLAS.

Observation 1 provides evidence for a universal tendency for f0 lowering following
voiced stops; however, it needs to be noted, based on the results from Chapter 3
on the duration of VLT, that VAS show shorter VLT than the VS. Thus, relatively
less voiced VAS tend to lower f0 more than the VS. Observation 2 lends
credence to the claim that f0 lowering for voiced stops can be seen in the first six pitch
periods (Schiefer 1986, Dutta 2007) and also shows that the extent of the f0 lowering
can extend to about 30 percent of the vowel. Observation 3 claims that based on f0
patterns it is not possible to categorize the voiceless stops unambiguously.


Effect of context on f0

There was no significant interaction between Stop Type and Context on the mean
f0 values. However, in all contexts, a low-rising contour was found. This low-rising
contour could be an effect of speakers identifying the target words as new or novel
instances, which leads the speakers to implement perhaps a 'focus' accent in Hindi.
It has also been shown by various studies on the manifestation of stress in Hindi that
a L*H contour is a close estimation of word stress in Hindi (n.d.). It is likely that
in this study the low-rising f0 contour could be a manifestation of lexical stress. In
that respect, an experiment that is designed specifically to test the effect of prosodic
context on the f0 of vowels following the stops in Hindi is needed. Unfortunately,
the near-laboratory setting under which the recordings were made made it difficult
to obtain natural or near-spontaneous recordings.


Summary and implications

The results from the effect of the stop types on f0 show that VAS and VS tend to
lower the f0 of the following vowel till about 30 percent. While previous studies had
established that f0 following the voiced stops, and especially the VAS, is lower, in
this study the extent of this lowering effect has been shown. There are two major
implications of these results. First, based on these results it can be said that the f0
lowering following voiced stops, and especially VAS, is not restricted to the first six
pitch periods. The lowering effect can be seen till about 30 percent of the vowel,
which in most cases will be more than just six pitch periods in the following vowel.
Secondly, it can also be said that the VAS for all subjects tend to have lower mean
f0 in the initial 10-30 percent of the vowel than the VS. In addition to these findings,
it can also be said that the universal tendency associated with voiced stops for f0
lowering is somewhat supported by these results; however, comparison between the
proportional duration of f0 lowering and breathiness spread (Chapter 6) suggests that
breathiness could also be considered a likely reason behind the extent of f0 lowering
following VAS. Languages that contrast between breathy and modal vowels tend to
show that breathy vowels are accompanied by low f0 (Esposito 2003, Andruski and
Ratliff 2000, Thurgood 2004). Voiced stops tend to have lower f0 initially than
the voiceless stops (at least for four speakers). However, the VAS, which are "less
voiced" (see VLT duration results in Chapter 3), tend to have a further lowering
compared to VS. In chapter 6, the nature and extent of breathiness following the
VAS is examined based on four spectral intensity measures.

Chapter 5
Aspiration and vowel duration


Outline

In Dutta (2007) it was observed that the breathy portion following the VAS tended
to permeate into a sizeable portion of the vowel. One of the research goals for this
study was to be able to compare and correlate the durations of aspiration following
VLAS and VAS. This correlation will provide a crucial understanding of the nature
and phonetic implementation of the aspiration contrast. In this study, we find that
the duration of aspiration following VLAS is comparatively shorter than the duration
of breathiness following VAS. Further, we also find that the duration of aspiration
is well correlated with the place of articulation of the stop in VLAS. Thus, a point
of occlusion further back in the oral tract leads to a longer duration of aspiration. The
same pattern, however, is not found in breathiness / aspiration following the VAS.
This goes to show that aerodynamic factors that are responsible for the varying
duration of aspiration following VLAS are not at play in the results of the spread
of the breathy portion in VAS. In this respect, supraglottal configurations have no
effect on the duration of the breathy portion following the VAS. Breathy releases
following the VAS are due to particular laryngeal configurations.
This chapter is organized as follows: In section 5.2, I discuss the results from
aspiration durations in VLAS. Following this, in section 5.3, I present the results
from the vowel durations. In section 5.4, I provide a summary and discussion of
the relevant results and the conclusions that can be drawn from them.

Durational properties: Aspiration duration

Aspiration duration was measured for the VLAS for all places of articulation and all
contexts.1 In this section, results from the effect of place of articulation and context
will be presented for all of the subjects.


5.2.1 Subject GA and PB

Place of articulation had a significant effect on aspiration duration. For subject
GA, the absence of labial VLAS led to only three-way comparisons between the places
of articulation. Results indicate a significant effect, F(2,74)=21.046 with p<0.001
(see Fig. 5.1). Velar VLAS are longer in aspiration duration than both dental and
retroflex VLAS for GA. Post-hoc comparisons confirm that the difference in mean
between dental and retroflex aspiration duration is not significant. Similar results
obtain for subject PB, with F(3,114)=26.192, which is significant at the p<0.001
level. For PB as well, the post-hoc comparisons reveal that the velar VLAS stops
have longer durations of aspiration compared to the labial, dental and retroflex
VLAS (see Fig. 5.1). Subjects GA and PB did not exhibit any significant effect
of prosodic context in the duration of aspiration for the VLAS. Hence, prosodic
context was not included as a factor in the analysis of aspiration duration.
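
For readers less familiar with the notation, the F ratios reported above can be
related to group means as in the sketch below. It assumes a standard one-way
design, in which the first degree of freedom is the number of groups minus one and
the second is the number of tokens minus the number of groups. The function name
and the duration values are hypothetical and do not reproduce the data of this study.

```python
def one_way_anova(groups):
    """Return (F, df_between, df_within) for a list of samples, one per group."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Variation of the tokens around their own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between, df_within = k - 1, n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical aspiration durations (ms) for three places of articulation.
dental = [60.0, 65.0, 70.0]
retroflex = [62.0, 66.0, 70.0]
velar = [85.0, 90.0, 95.0]
f_stat, df_between, df_within = one_way_anova([dental, retroflex, velar])
```

Under that standard design, degrees of freedom of (2, 74) would correspond to three
places of articulation and 77 tokens, and (3, 114) to four places and 118 tokens.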

5.2.2 Subject RM, SD and SV

Results from the effect of place of articulation indicate that subject RM shows
longer aspiration duration for velar VLAS and patterns together with subjects GA
and PB. SD and SV, however, pattern together, with significant differences between
the velar+labial VLAS and the dental+retroflex stops. A consistent finding on the
duration of aspiration is the longer duration of aspiration for all velar VLAS (see
Fig. 5.1). Subjects RM, SD and SV did show a marginal effect of prosodic context on
the aspiration duration; however, post-hoc tests reveal that the differences in mean
are not significant for RM and SD. For SV, prosodic context does have an effect
on the aspiration duration, in that the Phrase Initial [I] context (Subset 1) is maximally
distinct from the Utterance Initial [U] (Subset 2) and the Phrase Medial [M]
context (Subset 3), as seen in Fig. 5.2.

1 As mentioned above, subject GA could not produce the labial VLAS due to an ongoing
merger of this stop category with the voiceless labio-dental fricative.

Figure 5.1: Effect of place of articulation (POA) on Aspiration Duration (ms)


Figure 5.2: Effect of Context on Aspiration Duration (ms)


Durational properties: Vowel durations

Vowel duration results show that there is a main effect of stop manner on the vowel
duration. Vowels following aspirated stops (VAS and VLAS) are significantly longer
than vowels following unaspirated stops (VS and VLS). As can be seen in Fig. 5.3, a
difference in vowel duration appears between the unaspirated and aspirated stops.
The primary purpose of these measurements was to compare relative
durations of aspiration following VLAS with the total vowel durations for these
stops. This would help establish the extent of aspiration following the VLAS relative
to the total vowel duration.
$\text{Vowel Duration}_{total} = \text{Aspiration duration} + v$
Here, v refers to the duration of the vowel following the offset of aspiration.
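
The relative measure that this definition makes possible can be sketched as below.
This is a minimal illustration only: the function name and the millisecond values
are hypothetical and are not measurements from this study.

```python
def aspiration_percentage(aspiration_ms, v_ms):
    """Aspiration duration as a percentage of total vowel duration,
    where total = aspiration duration + v (the vowel after aspiration offset)."""
    total_ms = aspiration_ms + v_ms
    return 100.0 * aspiration_ms / total_ms


# Hypothetical token: 50 ms of aspiration followed by 150 ms of vowel.
share = aspiration_percentage(50.0, 150.0)
```

Expressing aspiration as a share of the total, and not as a raw duration, allows
tokens and speakers with different overall vowel lengths to be compared directly.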


Figure 5.3: Vowel duration as a function of stop manner

Comparisons between the relative duration of aspiration and vowel duration show
that aspiration duration for VLAS for all speakers persists till about 25 percent of
the total vowel duration. This can be seen in the bar chart in Fig. 5.4. This
goes to show that, irrespective of the effects of place of articulation on aspiration
duration, the maximal durational effect of aspiration following VLAS is less than the
spread of breathiness following VAS (between 30-50 percent). Results from spectral
intensity measures were used to arrive at this comparison. Details of this analysis
appear in Chapter 6. Suffice it to say that the effect of aspiration following VLAS
is temporally shorter than the voice quality difference between the VAS and the
unaspirated stops.


Figure 5.4: Box plot: Percentage of aspiration duration relative to Vowel
Duration_total

Summary of results

The results from the duration of aspiration and the vowel durations suggest that the
duration of aspiration following VLAS is comparatively shorter than the duration of
breathiness following VAS (see Fig. 5.4). Further, place of articulation of the
stop in VLAS has an effect on the duration of aspiration. Thus, velar stops tend
to have longer durations of aspiration compared to the stops from other places of
articulation. This goes to show that aerodynamic factors are responsible for the
varying duration of aspiration following VLAS. Thus, supraglottal configurations
have an effect on the duration of the aspiration portion following the VLAS. By
contrast, as shown in Chapter , VAS exhibit no durational differences in
aspiration dependent on place of articulation. VAS also behave differently in terms of
the duration of aspiration and thus, there is a fundamental difference between the
voiceless aspiration of VLAS and the breathy aspiration of VAS. Further, it is
safe to assume that glottal mechanisms responsible for the breathy release following
VAS are not at play either in the manner or spread of aspiration following VLAS.
A comparison of these results from the duration of aspiration following VLAS with
the findings from below, suggests that aspiration as produced following VLAS is
acoustically and aerodynamically different from the breathy release following VAS.
The duration of aspiration, apart from being shorter than the breathy release, is also
governed by the place of articulation of the stop. Therefore, the manner and
duration of aspiration in VLAS are partly due to aerodynamic factors, which, for the shorter
durations of closures for velar stops, produce longer durations of aspiration. These
findings can be directly correlated to supraglottal configurations rather than glottal
configurations, as is the case with the breathy portion following VAS.


Chapter 6
Spectral properties of Hindi stops


Breathiness due to incomplete glottal closure

Two studies by Dixit and MacNeilage (1980) and Benguerel and Bhatia (1980) have
reported on the articulatory attributes of breathy voiced stops in Hindi.1 Both these
studies present data from post-release measurements of glottal width associated with
stops in Hindi. These studies show that the width of the glottal opening in VAS lies
in between that of VS and VLAS, suggesting insufficient glottal closure. The
acoustic consequence of an articulatory configuration such as insufficient glottal closure
is simultaneous periodicity and low frequency noise associated with the release
portion of the VAS. A glottal configuration such as this also leads to increased air flow.
This has been given impressionistic labels such as 'breathy' and 'murmured' voicing,
both of these labels being used interchangeably (Ladefoged 1975). Glottal source
signals obtained through inverse filtering typically show more symmetrical
opening and closing phases with little or no complete closed phase for Gujarati breathy
vowels (Bickley 1982, Fischer-Jorgensen 1967). Fischer-Jorgensen (1967) observes
that the high intensity of the first harmonic, H1, is the most salient spectral feature
of Gujarati breathy vowels. Recent spectral analysis by Bali (1999) shows that in
Delhi Hindi, intervocalic VAS may be produced without aspiration. However, the
H1-H2 measures show that VAS tokens are produced with a larger open quotient
and a steeper spectral tilt, while the voiced plosives show the reverse glottal
configurations. This goes to say that for discrimination between intervocalic VAS and VS,
H1-H2 measures may be especially relevant, as also for distinguishing breathy vowels
from modal ones as in Gujarati following Fischer-Jorgensen (1967). However, at the
moment there are no studies on the glottal characteristics of word-initial VAS in Hindi.

1 See also Kagaya and Hirose (1975).


The relevance of measures of spectral intensity

Spectral intensity measures provide an indirect method of studying the effect of
the glottal source on the acoustic signal (Bickley 1982, Fischer-Jorgensen 1967,
Ladefoged and Antonanzas-Barroso 1985). Differences between the amplitudes of
H1, H2, A1, A2, and A3 have been shown to correlate with differences in phonation
type within vowels (Hanson 1995, Hanson et al. 2001, Wayland 1998). In this
study four measures of spectral tilt were taken: the differences between the
amplitudes H1-H2, H1-A1, H1-A2 and H1-A3. These spectral intensity measures
together provide a measure of spectral tilt.2
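
As an illustration of how the four differences are formed, the following is a minimal sketch and not the measurement procedure used in this study. It assumes that H1 and H2 are read off at f0 and 2*f0, and that A1, A2 and A3 are taken as the amplitude of the harmonic nearest to each formant frequency. The function names (amplitude_db, nearest_harmonic, spectral_measures) and the synthetic window are hypothetical.

```python
import math

def amplitude_db(signal, sample_rate, freq):
    """Amplitude in dB of the sinusoidal component at `freq` (single-bin DFT)."""
    n = len(signal)
    step = 2.0 * math.pi * freq / sample_rate
    re = sum(x * math.cos(step * i) for i, x in enumerate(signal))
    im = sum(x * math.sin(step * i) for i, x in enumerate(signal))
    magnitude = 2.0 * math.hypot(re, im) / n
    return 20.0 * math.log10(max(magnitude, 1e-12))  # floor avoids log10(0)

def nearest_harmonic(f0, formant):
    """Frequency of the harmonic of f0 that lies closest to a formant frequency."""
    return f0 * max(1, round(formant / f0))

def spectral_measures(signal, sample_rate, f0, formants):
    """Return H1-H2, H1-A1, H1-A2 and H1-A3 in dB for one analysis window."""
    h1 = amplitude_db(signal, sample_rate, f0)
    h2 = amplitude_db(signal, sample_rate, 2 * f0)
    a1, a2, a3 = (amplitude_db(signal, sample_rate, nearest_harmonic(f0, f))
                  for f in formants)
    return {"H1-H2": h1 - h2, "H1-A1": h1 - a1, "H1-A2": h1 - a2, "H1-A3": h1 - a3}

# Synthetic window (assumed values, not data from this study): f0 = 100 Hz,
# with single harmonics standing in for formant peaks at 700, 1200 and 2500 Hz.
SAMPLE_RATE, F0 = 8000, 100
HARMONIC_AMPLITUDES = {1: 1.0, 2: 0.5, 7: 0.25, 12: 0.1, 25: 0.05}
signal = [sum(a * math.sin(2.0 * math.pi * k * F0 * i / SAMPLE_RATE)
              for k, a in HARMONIC_AMPLITUDES.items())
          for i in range(800)]
measures = spectral_measures(signal, SAMPLE_RATE, F0, (700, 1200, 2500))
for name, value in measures.items():
    print(f"{name}: {value:.1f} dB")
```

A larger positive value on any of the four measures corresponds to a first harmonic that is strong relative to the comparison component, which is the direction associated with breathy phonation in the studies cited above.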
One of the observations that had been made in Dutta (2007) was that the breathy
release following the VAS tends to permeate deep into the vowel. This observation,
however, had not been experimentally validated. As has been shown in Chapter 3
(section 3.3), aspiration in VLAS varies temporally according to the place of
articulation. In this context it would also be necessary to examine the effect of
place of articulation on the four spectral measures. In this chapter I will show
that the VAS can be differentiated from the VLS and VS due to differences in mean
H1-H2, H1-A1 and H1-A2 measures. Overall spectral tilt does not have a significant
role in distinguishing between the VAS and the unaspirated stops. Further, I will
show that the aspiration duration following the VLAS is shorter than the breathy
portion of the vowel following the VAS by about 10 percent. Comparing the effect of
place of articulation on the duration of aspiration following VLAS with its effect
on the breathy portion following VAS shows that place of articulation does not have
any effect on the spectral measures. This chapter is organized as follows. In
section 6.3, I discuss the effect of VAS, VLS and VS on the four spectral intensity
measures at five points in the vowel. Section 6.4 is a discussion of the
contributions of each of the spectral measures in differentiating between the VAS
and the unaspirated stops (VLS, VS). Section 6.5 is a summary of the major
conclusions.

2 In a recent publication Mikuteit and Reetz (2007) suggest a measure of both
voiceless and voiced aspiration called Superimposed Aspiration which is based on
visual inspection of the waveform and spectrogram.


Spectral intensity measures

In this section, results from the H1-H2, H1-A1, H1-A2 and H1-A3 measures will
be discussed. VLAS stops are aspirated between 10 and 30 percent of the vowel.
This implies that there is no fundamental component present during this portion of
aspiration. All of the spectral measures used in this study are direct measurements
of spectral intensity differences between the fundamental, the second harmonic and
the amplitudes of the first, second and third formant peaks. Therefore in this part
of the study only comparisons between the VAS and the unaspirated stops will be
made for all the four spectral measures.


H1-H2

A one-way ANOVA was conducted for all speakers with H1-H2 values at 10, 30,
50, 70 and 90 percent of the vowel as dependent variable and stop type as an
independent factor. For subject GA, a significant difference between VAS and the
unaspirated stops obtained till 30 percent of the vowel at p<0.001. At 10 and
30 percent of the vowel the difference between VLS and VS was found to be
significant. At 50 percent of the vowel the difference between the three stops (VLS,
VS and VAS) became insignificant. However, at 70 and 90 percent of the vowel the
H1-H2 means for VAS were significantly lower than those for the VLS and VS put
together. Tukey post-hoc comparisons confirm these results. The overall pattern for
H1-H2 means at the 5 points in the vowel is:

1. 10 percent:
2. 30 percent:
3. 50 percent:
4. 70 percent: VS = VLS > VAS
5. 90 percent: VS = VLS > VAS
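
The one-way ANOVA applied at each point in the vowel can be sketched as follows. This is only an illustration of the F statistic with stop type as the single factor. The values are made up and are not data from this study, the function name one_way_anova_f is hypothetical, and the p-value and the Tukey post-hoc comparisons, which require the F and studentized range distributions, are left to a statistics package.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA over lists of values."""
    all_values = [v for g in groups for v in g]
    grand_mean = sum(all_values) / len(all_values)
    group_means = [sum(g) / len(g) for g in groups]
    # Variation of the group means around the grand mean, weighted by group size.
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, group_means))
    # Variation of the individual values around their own group mean.
    ss_within = sum((v - m) ** 2
                    for g, m in zip(groups, group_means) for v in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

# Made-up H1-H2 values (dB) at one point in the vowel, for illustration only.
h1_h2_by_stop = {
    "VAS": [10.0, 12.0, 14.0],
    "VS": [4.0, 5.0, 6.0],
    "VLS": [3.0, 5.0, 4.0],
}
f_stat, df_between, df_within = one_way_anova_f(list(h1_h2_by_stop.values()))
print(f"F({df_between}, {df_within}) = {f_stat:.2f}")
```

In this design the same computation is repeated for each speaker, each spectral measure and each of the five points in the vowel.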
Subject PB shows a significant effect of VAS on the H1-H2 means. ANOVA results
show that VAS have significantly higher H1-H2 till 70 percent of the vowel at p<0.001
than the VLS and VS put together. The difference between the VLS and VS is not
significant till 50 percent of the vowel. At 70 percent of the vowel, the difference
between VLS and VS is marginally significant (p=0.028) with VS > VLS. Tukey
post-hoc comparisons confirm these results. The overall pattern for H1-H2 means
at the 5 points in the vowel is:

1. 10 percent:
2. 30 percent:
3. 50 percent:
4. 70 percent:
5. 90 percent:

Subject RM also shows a significant effect of VAS on H1-H2. ANOVA results
show that VAS have significantly higher H1-H2 till 70 percent of the vowel at p<0.001
than the VLS and VS put together. The difference between the VLS and VS is not
significant till 50 percent of the vowel. At 70 percent of the vowel, the difference
between VLS and VS is marginally significant (p=0.025) with VLS > VS. At 90
percent the difference between VAS and the VLS and VS combined is significant
at p<0.001 with VAS > VLS = VS. Tukey post-hoc comparisons confirm these
results. The overall pattern for H1-H2 means at the 5 points in the vowel is:

1. 10 percent: VAS > VS = VLS
2. 30 percent: VAS > VS = VLS
3. 50 percent: VAS > VS = VLS
4. 70 percent: VAS > VLS > VS
5. 90 percent: VAS > VLS = VS
Subject SD shows a significant effect of VAS on H1-H2. ANOVA results show
that VAS have significantly higher H1-H2 till 70 percent of the vowel at p<0.001
than the VLS and VS put together. The difference between the VLS and VS is not
significant throughout the vowel. Tukey post-hoc comparisons confirm these results.
The overall pattern for H1-H2 means at the 5 points in the vowel is:

1. 10 percent: VAS > VS = VLS
2. 30 percent: VAS > VS = VLS
3. 50 percent: VAS > VS = VLS
4. 70 percent: VAS > VS = VLS
5. 90 percent: VAS > VS = VLS

Subject SV also shows a significant effect of VAS on H1-H2. ANOVA results
show that VAS have significantly higher H1-H2 till 50 percent of the vowel at p<0.001
than the VLS and VS. Between the VLS and VS, the mean difference is significant
from 10 till 30 percent at p<0.05, with VS > VLS. At 70 and 90 percent of
the vowel the differences between the VAS, VS and VLS are not significant
for this speaker. Tukey post-hoc comparisons confirm these results. The overall
pattern for H1-H2 means at the 5 points in the vowel is:

1. 10 percent: VAS > VS > VLS
2. 30 percent: VAS > VS > VLS
3. 50 percent: VAS > VS = VLS
4. 70 percent: VAS = VS = VLS
5. 90 percent: VAS = VS = VLS

Several observations can be made about the H1-H2 differences between the three
stops. Except for speaker GA, all other speakers show that the H1-H2 means for
VAS are significantly higher than those for the unaspirated stops till 50 percent of
the vowel. We can also see that for three of the five speakers (PB, RM and SD) the
difference between VLS and VS is not significant till 50 percent of the vowel. Based
on these results it can be concluded (for three speakers) that the open quotient for
VAS stays significantly high till 50 percent of the vowel. The patterns of the mean
H1-H2 differences are shown in Fig. 6.1.
The mean patterns and the conclusions that have been drawn on the basis of
the same, however, do not come without qualifications. As can be seen in Fig. 6.2,
the data from H1-H2 show that there is variation between and within speakers. One
pattern that can be adduced from Fig. 6.2, however, is the steady nature of the
H1-H2 values for the unaspirated (VLS, VS) stops. This pattern can be seen in
speakers PB, SV and RM and to an extent in speaker SD as well till 70 percent of
the vowel. The medians and upper and lower quartiles suggest that the VLS and VS
pattern together and help establish a baseline for what could be considered H1-H2
values for modal vowels. For speakers PB and SD it is also clear that the H1-H2
values for VAS do not overlap with the values for the unaspirated stops.
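
The box plot quantities referred to here (median, upper and lower quartiles, and whether two sets of values overlap) can be sketched as follows. This is a minimal illustration with made-up values, not the data behind Fig. 6.2. It assumes that overlap is judged on the full minimum-to-maximum range, and the function names box_summary and ranges_overlap are hypothetical.

```python
import statistics

def box_summary(values):
    """Return (lower quartile, median, upper quartile) of a sample."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return q1, median, q3

def ranges_overlap(a, b):
    """True if the min-max ranges of two samples share at least one value."""
    return min(a) <= max(b) and min(b) <= max(a)

# Made-up H1-H2 values (dB) for one speaker at one point in the vowel.
vas = [8.0, 9.0, 10.0, 11.0, 12.0]
unaspirated = [1.0, 2.0, 3.0, 4.0, 5.0]

print("VAS quartiles:", box_summary(vas))
print("Unaspirated quartiles:", box_summary(unaspirated))
print("Ranges overlap:", ranges_overlap(vas, unaspirated))
```

A stricter or looser criterion, such as overlap of the interquartile boxes only, would change which speakers count as showing non-overlapping values.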


Figure 6.1: Effect of Stop Type on H1-H2 (dB).

Figure 6.2: Box plots show variation in H1-H2 values and overlapping (by Stop type and Percent Vowel).


H1-A1

A one-way ANOVA was conducted for all speakers with H1-A1 values at 10, 30, 50, 70
and 90 percent of the vowel as dependent variable and stop type as an independent
factor. For subject GA, a significant difference between VAS and the unaspirated
stops obtained till 90 percent of the vowel at p<0.001. At 10 percent of the vowel
the difference between VLS and VS was found to be significant with VS > VLS.
Following this the difference between VS and VLS was not significant. Tukey post-hoc
comparisons confirm these results. The overall pattern for H1-A1 means at the 5
points in the vowel is:

1. 10 percent: VAS > VS > VLS
2. 30 percent: VAS > VS = VLS
3. 50 percent: VAS > VS = VLS
4. 70 percent: VAS > VS = VLS
5. 90 percent: VAS > VS = VLS
Subject PB shows a significant effect of VAS on the H1-A1 means. ANOVA results
show that VAS have significantly higher H1-A1 till 70 percent of the vowel at p<0.001
than the VLS and VS. The difference between the VS and VLS is not significant
till 30 percent of the vowel such that VS > VLS. At 90 percent of the vowel all the
stops behave similarly. Tukey post-hoc comparisons confirm these results. The
overall pattern for H1-A1 means at the 5 points in the vowel is:

1. 10 percent: VAS > VS > VLS
2. 30 percent: VAS > VS > VLS
3. 50 percent: VAS > VS = VLS
4. 70 percent: VAS > VS > VLS
5. 90 percent: VAS = VS = VLS
Subject RM also shows a significant effect of VAS on H1-A1. ANOVA results
show that VAS have significantly higher H1-A1 till 90 percent of the vowel at p<0.001
than the VLS and VS put together. The difference between the VLS and VS is not
significant till 90 percent of the vowel. Tukey post-hoc comparisons confirm these
results. The overall pattern for H1-A1 means at the 5 points in the vowel is:

1. 10 percent: VAS > VS = VLS
2. 30 percent: VAS > VS = VLS
3. 50 percent: VAS > VS = VLS
4. 70 percent: VAS > VS = VLS
5. 90 percent: VAS > VLS = VS
Subject SD shows a significant effect of VAS on H1-A1. ANOVA results show
that VAS have significantly higher H1-A1 till 70 percent of the vowel at p<0.001
than the VLS and VS put together. The difference between the VLS and VS is
significant only at 10 percent of the vowel with the VS < VLS. Tukey post-hoc
comparisons confirm these results. The overall pattern for H1-A1 means at the 5
points in the vowel is:

1. 10 percent: VAS > VS > VLS
2. 30 percent: VAS > VS = VLS
3. 50 percent: VAS > VS = VLS
4. 70 percent: VAS > VS = VLS
5. 90 percent: VAS = VS = VLS
Subject SV also shows a significant effect of VAS on H1-A1. ANOVA results
show that VAS have significantly higher H1-A1 till 50 percent of the vowel at p<0.001
than the VLS and VS. Between the VLS and VS, the mean difference is significant
from 10 till 30 percent at p<0.001, with VS > VLS. At 90 percent of the vowel
the H1-A1 differences between the VAS and the unaspirated stops (VS and VLS) become
significant for this speaker. Tukey post-hoc comparisons confirm these results. The
overall pattern for H1-A1 means at the 5 points in the vowel is:

1. 10 percent: VAS > VS > VLS
2. 30 percent: VAS > VS > VLS
3. 50 percent: VAS > VS = VLS
4. 70 percent: VAS = VS = VLS
5. 90 percent: VAS > VS = VLS
Two broad observations can be made on the basis of the results from the H1-A1
means. First, the H1-A1 values tend to be significantly higher for the VAS till
about 70 percent of the vowel, which is further into the vowel than for the H1-H2
means. However, consonant voicing in VAS and VS tends to have an effect on the peak
amplitude of the first formant. The patterns suggest that at least in the initial
10-30 percent of the vowel the VS have significantly higher means than VLS. These
patterns can be seen in Fig. 6.3. In Fig. 6.4 the overlapping patterns of the values
can be seen. The variation in the H1-A1 patterns between the stops shows considerable
overlap despite the significant differences in mean values. It is likely that the
values at around 50 percent of the vowel reflect a voice quality that is transitional
between breathy and modal phonation. Nonetheless, these patterns show that initially
(Fig. 6.3) till about 30 percent of the vowel, based on the H1-A1 values the vowels
can be differentiated


Figure 6.3: Effect of Stop Type on H1-A1.

Figure 6.4: Box plots show variation in H1-A1 values.


H1-A2

A one-way ANOVA was conducted for all speakers with H1-A2 values at 10, 30, 50, 70
and 90 percent of the vowel as dependent variable and stop type as an independent
factor. For subject GA, a significant difference between VAS and the unaspirated
stops obtained till 50 percent of the vowel at p<0.001 and at p=0.002 at 70 percent
of the vowel. Only at 10 percent of the vowel was the difference between VLS and VS
found to be significant, with VS > VLS. Following this the difference between VS and
VLS was not significant. Tukey post-hoc comparisons confirm these results. The
overall pattern for H1-A2 means at the 5 points in the vowel is:

1. 10 percent: VAS > VS > VLS
2. 30 percent: VAS > VS = VLS
3. 50 percent: VAS > VS = VLS
4. 70 percent: VAS > VS = VLS
5. 90 percent: VAS = VS = VLS
Subject PB shows a significant effect of VAS on the H1-A2 means. ANOVA results
show that VAS have significantly higher H1-A2 till 70 percent of the vowel at p<0.001
than the VLS and VS. The difference between the VS and VLS is significant till
50 percent of the vowel such that VS > VLS. At 90 percent of the vowel the VAS
are significantly higher than the VLS but not the VS. Tukey post-hoc comparisons
confirm these results. The overall pattern for H1-A2 means at the 5 points in the
vowel is:

1. 10 percent: VAS > VS > VLS
2. 30 percent: VAS > VS > VLS
3. 50 percent: VAS > VS > VLS
4. 70 percent: VAS > VS = VLS
5. 90 percent: VAS = VS, VAS > VLS
Subject RM also shows a significant effect of VAS on H1-A2. ANOVA results
show that VAS have significantly higher H1-A2 till 70 percent of the vowel at p<0.001
than the VLS and VS put together. At 90 percent of the vowel the VAS and
VLS difference is significant but not the VAS and VS difference. The difference
between the VS and VLS is significant at 10 percent of the vowel. Tukey post-hoc
comparisons confirm these results. The overall pattern for H1-A2 means at the 5
points in the vowel is:

1. 10 percent: VAS > VS > VLS
2. 30 percent: VAS > VS = VLS
3. 50 percent: VAS > VS = VLS
4. 70 percent: VAS > VS = VLS
5. 90 percent: VAS = VS, VAS > VLS
Subject SD shows a significant effect of VAS on H1-A2. ANOVA results show
that VAS have significantly higher H1-A2 till 90 percent of the vowel at p<0.001 than
the VLS and VS put together. The difference between the VLS and VS is significant
till 50 percent of the vowel with the VS > VLS. Tukey post-hoc comparisons confirm
these results. The overall pattern for H1-A2 means at the 5 points in the vowel is:

1. 10 percent: VAS > VS > VLS
2. 30 percent: VAS > VS > VLS
3. 50 percent: VAS > VS > VLS
4. 70 percent: VAS > VS = VLS
5. 90 percent: VAS = VS = VLS

Subject SV also shows a significant effect of VAS on H1-A2. ANOVA results
show that VAS have significantly higher H1-A2 till 50 percent of the vowel at p<0.001
than the VLS and VS. Between the VLS and VS, the mean difference is significant
from 10 till 70 percent at p<0.001, with VS > VLS. At 90 percent of the vowel the
H1-A2 differences between the VAS and VS are not significant; however, both these
stops have significantly higher means than the VLS. Tukey post-hoc comparisons
confirm these results. The overall pattern for H1-A2 means at the 5 points in the
vowel is:

1. 10 percent: VAS > VS > VLS
2. 30 percent: VAS > VS > VLS
3. 50 percent: VAS > VS > VLS
4. 70 percent: VAS = VS > VLS
5. 90 percent: VAS = VS > VLS


Figure 6.5: Effect of Stop Type on H1-A2.

Figure 6.6: Box plots show variation in H1-A2 values.

A conservative generalization that can be made on the basis of these results is that
the VAS tend to show higher mean H1-A2 values at least till 50 percent of the
vowel for all speakers. Except for subject SV, all other speakers show a significant
difference in mean H1-A2 values between VS and VLS at 10 percent of the vowel.
These patterns can be seen in Fig. 6.5. Except for subject SD at 30 percent, all
other subjects, however, do show considerable overlap in the range and values of
H1-A2 between the three stops despite significant mean differences and high median
values for the VAS till about 50 percent of the vowel (see Fig. 6.6). At 90 percent
of the vowel the values tend to overlap and become coextensive.



H1-A3

A one-way ANOVA was conducted for all speakers with H1-A3 values at 10, 30, 50, 70
and 90 percent of the vowel as dependent variable and stop type as an independent
factor. For subject GA, a significant difference between VAS and the unaspirated
stops obtained till 30 percent of the vowel at p<0.001 and at p=0.05 at 70 percent
of the vowel. Only at 10 percent of the vowel was the difference between VLS and VS
found to be significant, with VS > VLS. Following this the difference between VS
and VLS was not significant. Tukey post-hoc comparisons confirm these results.
The overall pattern for H1-A3 means at the 5 points in the vowel is:

1. 10 percent: VAS > VS > VLS
2. 30 percent: VAS > VS = VLS
3. 50 percent: VAS = VS, VAS > VLS
4. 70 percent: VAS > VS = VLS
5. 90 percent: VAS = VS = VLS

Subject PB shows a significant effect of VAS on the H1-A3 means. ANOVA
results show that VAS have significantly higher mean H1-A3 till 70 percent of the
vowel at p<0.001 than the VLS and VS. The difference between the VS and VLS
is significant till 30 percent of the vowel such that VS > VLS. At 90 percent of the
vowel the differences in mean H1-A3 are not significant between the stops. Tukey
post-hoc comparisons confirm these results. The overall pattern for H1-A3 means
at the 5 points in the vowel is:

1. 10 percent: VAS > VS > VLS
2. 30 percent: VAS > VS > VLS
3. 50 percent: VAS > VS = VLS
4. 70 percent: VAS > VS = VLS
5. 90 percent: VAS = VS = VLS

Subject RM also shows a significant effect of VAS on mean H1-A3. ANOVA
results show that VAS have significantly higher H1-A3 till 50 percent of the vowel
at p<0.001 than the VLS and VS put together. At 70 percent of the vowel the VAS
and VS difference is significant but not the VAS and VLS difference. At 90 percent
none of the differences in mean are significant. The difference between the VS and
VLS is significant only at 10 percent of the vowel. Tukey post-hoc comparisons
confirm these results. The overall pattern for H1-A3 means at the 5 points in the
vowel is:

1. 10 percent: VAS > VS > VLS
2. 30 percent: VAS > VS = VLS
3. 50 percent: VAS > VS = VLS
4. 70 percent: VAS = VLS, VAS > VS
5. 90 percent: VAS = VS = VLS

Subject SD shows a significant effect of VAS on H1-A3. ANOVA results show
that VAS have significantly higher H1-A3 till 70 percent of the vowel at p<0.001
than the VLS and VS put together. The difference between the VLS and VS is
significant till 50 percent of the vowel with the VS > VLS. At 90 percent of the
vowel only the difference in mean for VAS and VS is significant. Tukey post-hoc
comparisons confirm these results. The overall pattern for H1-A3 means at the 5
points in the vowel is:

1. 10 percent: VAS > VS > VLS
2. 30 percent: VAS > VS > VLS
3. 50 percent: VAS > VS > VLS
4. 70 percent: VAS > VS = VLS
5. 90 percent: VS > VLS, VAS = VS = VLS
Subject SV also shows a significant effect of VAS on H1-A3. ANOVA results
show that VAS have significantly higher mean H1-A3 till 50 percent of the vowel
at p<0.001 than the VLS and VS. Between the VLS and VS, the mean difference
is significant from 10 till 30 percent at p<0.001, with VS > VLS. From 70 till
90 percent of the vowel the H1-A3 differences between the stops are not significant.
Tukey post-hoc comparisons confirm these results. The overall pattern for H1-A3
means at the 5 points in the vowel is:

1. 10 percent: VAS > VS > VLS
2. 30 percent: VAS > VS > VLS
3. 50 percent: VAS > VS = VLS
4. 70 percent: VAS = VS = VLS
5. 90 percent: VAS = VS = VLS


[Figure 6.7: H1-A3 (dB)]

Figure 6.8: Box plots show variation in H1-A3 values.
The results from the H1-A3 means show that at least for four subjects the
differences in mean between VAS and the unaspirated stops are significant till 50
percent of the vowel. The difference between the means for VLS and VS can be seen at
least till 10 percent of the vowel with the VS showing higher means than the VLS.
The patterns for the mean of H1-A3 can be seen for the three stops in Fig. 6.7. The
boxplots in Fig. 6.8, however, show considerable overlap in the H1-A3 values between
the stops, despite the higher medians for the VAS till 50 percent for four out of the
five speakers. At 90 percent of the vowel we also see that the values become
coextensive for all the stops for all subjects.
The results from the spectral intensity measures show that significant differences
do obtain between the means for the VAS and the unaspirated stops till about 50
percent of the vowel. Based on these four measures of spectral intensity, and on the
correlation between these measures and voice and vowel quality distinctions, it
is possible to conclude that the vowel following the VAS is distinct from the vowel
following the unaspirated stops (VLS, VS). The vowel following the VAS tends to
maintain a 'breathy' quality till about 50 percent of the vowel. In comparison
to the durations of voiceless aspiration stated in section 3.3 of Chapter 3 we can
also conclude that the breathy portion following the VAS tends to be longer than
the aspiration portion following the VLAS. This trend confirms to an extent the
observation made in Dutta (2007) that the breathy release following VAS tends
to permeate deep into the vowel compared to the duration of aspiration following
the VLAS.

Figure 6.9: Box plot: Percentage of aspiration duration relative to Vowel (axes: Place of Articulation, Duration).


Contributions of the individual spectral intensity measures

As can be seen in Fig. 6.9 the median percentage of aspiration following VLAS is
between 20 and 40 percent depending on the subject and also the place of
articulation. Compared to the percentage of aspiration in VLAS, for most of the
speakers breathy release tends to extend till about 30-50 percent of the vowel. In
addition, unlike the effect of place of articulation on aspiration duration following
VLAS, no effect of place of articulation was found for any of the speakers on any of
the four spectral intensity measures (see Fig. 6.10). Fig. 6.10 shows the results for
the effect of place of articulation on H1-A1. The effect of place of articulation on
the other three spectral measures was comparable to the effect on H1-A1. This
observation allows us to conclude that the volume of the oral cavity has a
significant effect on the aspiration duration for VLAS but has no effect on the
duration of the breathy release following the VAS.

Figure 6.10: Box plot: Effect of place of articulation on H1-A1 (by Percent Vowel).
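
The percentage of aspiration relative to the vowel plotted in Fig. 6.9 can be sketched as follows. This is an illustration only: the durations are made up, the denominator is assumed to be the vowel duration, and the function name median_aspiration_percent is hypothetical.

```python
import statistics
from collections import defaultdict

def median_aspiration_percent(tokens):
    """Median aspiration duration as a percentage of vowel duration, per place."""
    by_place = defaultdict(list)
    for place, aspiration_ms, vowel_ms in tokens:
        by_place[place].append(100.0 * aspiration_ms / vowel_ms)
    return {place: statistics.median(values) for place, values in by_place.items()}

# Made-up (place, aspiration duration in ms, vowel duration in ms) tokens.
tokens = [
    ("velar", 60.0, 200.0),
    ("velar", 80.0, 200.0),
    ("labial", 40.0, 200.0),
    ("labial", 50.0, 250.0),
]
medians = median_aspiration_percent(tokens)
for place, percent in medians.items():
    print(f"{place}: median aspiration = {percent:.0f} percent of the vowel")
```

Expressing aspiration as a proportion of the vowel puts it on the same scale as the 10 to 90 percent measurement points used for the spectral measures, which is what makes the comparison with the breathy portion possible.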
Although the mean values of the four spectral measures are significantly
higher for the VAS than for the unaspirated stops, there is considerable overlap
between the values for the three stops. Therefore, it became necessary to ascertain
the individual contribution of each spectral intensity measure in comparable terms.
In order to accomplish this comparison, first a comparison between the distribution
of means for the values at 10 and 30 percent was conducted. Second, a comparison
between the means for values at 10, 30, and 50 percent was conducted. The
distribution of the mean values for these comparisons can be seen in Fig. 6.11 and
Fig. 6.12,


Spectral Measures


Hr A2,
H i-A l 5
Hr H2,

H rA i, Hi-Aa
Hi-Ha, Hj-Aa
Hi-Aa, H r A3

Table 6.1: Maximally distinct distributions and contributing factors towards a
distinction in vowel quality. Means of values at 10 and 30 percent of the vowel.

respectively. The maximally distinct or separated distributions for each spectral
measure were considered to contribute maximally as well towards a breathy/modal
vowel quality distinction following the stops.
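
The comparison in this study rests on visual inspection of the boxplots. As a purely illustrative alternative, and not the procedure used here, the separation between the VAS and unaspirated distributions could be scored numerically, for example as the absolute difference in means divided by the pooled standard deviation. The values below are made up and the function names separation and rank_measures are hypothetical.

```python
import statistics

def separation(vas, unaspirated):
    """Absolute difference in means divided by the pooled standard deviation."""
    n1, n2 = len(vas), len(unaspirated)
    pooled_variance = ((n1 - 1) * statistics.variance(vas)
                       + (n2 - 1) * statistics.variance(unaspirated)) / (n1 + n2 - 2)
    return abs(statistics.mean(vas) - statistics.mean(unaspirated)) / pooled_variance ** 0.5

def rank_measures(data):
    """Sort measure names from the most to the least separated distributions."""
    return sorted(data, key=lambda name: separation(*data[name]), reverse=True)

# Made-up mean values (dB) for one speaker: (VAS tokens, unaspirated tokens).
speaker_data = {
    "H1-A3": ([10.0, 12.0, 14.0], [9.0, 11.0, 13.0]),
    "H1-H2": ([10.0, 12.0, 14.0], [2.0, 4.0, 6.0]),
    "H1-A1": ([10.0, 12.0, 14.0], [6.0, 8.0, 10.0]),
}
ranking = rank_measures(speaker_data)
print("Most to least distinct:", ", ".join(ranking))
```

Under this assumed scoring, the first name in the ranking plays the role of the maximally distinct measure for that speaker.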
These contributing spectral measures and their relation to other measures are
summarized in Table 6.1 and Table 6.2. The measures that appear to the left of
the column are observed to be better in distinguishing the stops based on
comparisons of the boxplots in Fig. 6.11 and Fig. 6.12. Tables 6.1 and 6.2 show that
H1-H2, as a measure of Open Quotient, is maximally distinct between VAS and the
unaspirated stops for two speakers (PB, SD). H1-A1, a measure of first formant
bandwidth, tends to maximally distinguish between the VAS and the unaspirated stops
for two speakers (RM, SV). H1-A2, a measure of skewness of the glottal pulse,
maximally distinguishes between the VAS and the unaspirated stops only for one
speaker (GA). These observations allow us to conclude that for distinctions based on
the quality of the vowel following the VAS and unaspirated stops, the differences
between the amplitudes of the first and second harmonics, and between the first
harmonic and the peak amplitudes of the first and second formants, are relevant
acoustic features.


Spectral Measures


H i -A2,

Hr-Ax, Hr A 2
Hx-Ax, Hx-A2

Table 6.2: Maximally distinct distributions and contributing factors towards a
distinction in vowel quality. Means of values at 10, 30 and 50 percent of the vowel.

Figure 6.11: Mean spectral intensity for values at 10 and 30 percent of the vowel (by Stop Type and Spectral Intensity Measures).

Figure 6.12: Mean spectral intensity for values at 10, 30 and 50 percent of the vowel (by Spectral Intensity Measures).


Summary, discussion and conclusion

The results from the study of spectral intensity measures suggest that VAS tend to
be produced with a greater difference in the amplitudes of the first and second
harmonics than the unaspirated stops. VAS are also produced with greater differences
between the amplitude of the first harmonic and the peak amplitude of the first
formant. In addition, based on the mean differences between the four measures, it
can be concluded that the breathy portion following the VAS tends to persist through
about 30-50 percent of the following vowel. The breathy release portion is longer
than the aspiration following VLAS. Overall spectral tilt, measured as the
difference between the amplitude of the first harmonic and the peak amplitude of
the third formant, does not distinguish the VAS from the unaspirated stops as well
as the other three measures do. Further, we have shown that the aspiration duration
following the VLAS tends to vary as a function of the place of articulation of the
stops. This is due to variable oral tract length, and consequently volume, during
the closure portion of the VLAS. Place of articulation does not, however, have any
effect on the breathy portion of the VAS. This shows that the aerodynamic factors
responsible for the varying duration of aspiration following VLAS are not at play
in the spread of the breathy portion in VAS. In this respect, supraglottal
configurations have no effect on the duration of the breathy portion following the
VAS. Breathy release following the VAS is due to particular laryngeal
configurations. The articulatory correlates of the spectral measures found to be
most responsible for distinctions between VAS and the unaspirated stops suggest
that Hindi VAS are produced with a larger Open Quotient. This implies that during
the initial 30-50 percent of the vowel the vocal folds are open for longer
durations than they are following unaspirated stops. The results from the H1-A2
measures also suggest that, articulatorily, Hindi VAS are produced with a
comparatively more abrupt closing of the vocal folds than the unaspirated stops.
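The four relative-amplitude measures summarized above can be illustrated with a minimal Python sketch. It assumes that a spectrum is already available as (frequency in Hz, level in dB) pairs and that f0 and the formant frequencies are known; the function names, the 10 percent search band and the toy values are hypothetical and do not reproduce the measurement procedure or the data of this study.

```python
def peak_db(spectrum, lo_hz, hi_hz):
    """Highest level (dB) among spectrum points with lo_hz <= freq <= hi_hz."""
    levels = [db for freq, db in spectrum if lo_hz <= freq <= hi_hz]
    if not levels:
        raise ValueError("no spectrum points in the requested band")
    return max(levels)


def spectral_measures(spectrum, f0, formants, tol=0.1):
    """Return H1-H2, H1-A1, H1-A2 and H1-A3 (dB) for one analysis frame.

    H1 and H2 are the peak levels near f0 and 2*f0; A1 to A3 are the peak
    levels near the first three formant frequencies. The search band is
    +/- tol (as a proportion) around each target frequency (an assumption).
    """
    def near(target_hz):
        return peak_db(spectrum, target_hz * (1 - tol), target_hz * (1 + tol))

    h1, h2 = near(f0), near(2 * f0)
    a1, a2, a3 = (near(f) for f in formants)
    return {"H1-H2": h1 - h2, "H1-A1": h1 - a1, "H1-A2": h1 - a2, "H1-A3": h1 - a3}


# Toy harmonic spectrum (hypothetical values): harmonics of a 200 Hz voice.
toy_spectrum = [
    (200, 60.0), (400, 52.0), (600, 50.0), (800, 55.0), (1000, 48.0),
    (1200, 45.0), (1400, 40.0), (2400, 30.0), (2600, 34.0), (2800, 29.0),
]
measures = spectral_measures(toy_spectrum, f0=200, formants=(800, 1200, 2600))
print(measures)
```

In such a sketch, a larger H1-H2 value at a given point in the vowel would be read as indirect evidence for a larger Open Quotient, in line with the interpretation given above.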



Chapter 7
Conclusions, implications and
further research

Overview

In this study we have been able to show that several acoustic distinctions between
the four stops in Hindi can be seen at the durational, f0, and spectral levels.
Durationally, VLT differences between the voiced stops (VAS and VS) have been shown
to be statistically significant. While Closure Duration data could not be collected
in the utterance initial position [U], it has been shown that VLAS are produced
with shorter closures than VLS. Studies on VLT (Schiefer 1992, Dutta 2007) have
shown that VAS are produced with shorter VLT than the VS. One of the primary
research questions for this study was whether a parallel could be found in the
duration of closure (CD) in VLS and VLAS. Temporally, voiceless and voiced stops in
Hindi show stop manner dependent duration patterns of closure and closure voicing
(VLT). Based on the manner of articulation, the duration of the CD becomes
predictable irrespective of whether these stops are voiced or voiceless. The
duration patterns for CD suggest that there is a parallel distribution of CD
dependent on the manner of articulation: aspirated stops have shorter closure
durations than unaspirated stops.
The aspiration portion following VLAS occupies nearly 20-30 percent of the vowel,
and its duration is dependent on the place of articulation of the stop. By
comparison, the breathy/murmured portion following the VAS occupies nearly 30-50
percent of the vowel and is not dependent on the place of articulation of the
preceding stop. These results confirm the initial observations from Dutta (2007)
that the breathy/murmured portion following the VAS persists for a longer duration
than the voiceless aspiration following the VLAS. Marginal effects of the context
on the durations have also been shown.
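The proportions discussed above (release portion as a share of the vowel) can be made concrete with a small Python sketch. The token format, the function names and the durations below are hypothetical illustrations chosen to fall within the reported ranges; they are not data from this study.

```python
def release_proportion(release_ms, vowel_ms):
    """Duration of the aspirated or breathy portion as a fraction of the vowel."""
    if vowel_ms <= 0:
        raise ValueError("vowel duration must be positive")
    return release_ms / vowel_ms


def mean_proportion_by_type(tokens):
    """Average the release proportion per stop type.

    tokens: iterable of (stop_type, release_ms, vowel_ms) tuples.
    """
    sums = {}
    for stop_type, release_ms, vowel_ms in tokens:
        total, count = sums.get(stop_type, (0.0, 0))
        sums[stop_type] = (total + release_proportion(release_ms, vowel_ms), count + 1)
    return {stop_type: total / count for stop_type, (total, count) in sums.items()}


# Hypothetical tokens: (stop type, release duration in ms, vowel duration in ms).
tokens = [
    ("VLAS", 50.0, 200.0),
    ("VLAS", 60.0, 200.0),
    ("VAS", 80.0, 200.0),
    ("VAS", 90.0, 200.0),
]
means = mean_proportion_by_type(tokens)
print(means)
```

Expressing the release as a proportion of the vowel, rather than as a raw duration, is what allows the 20-30 percent and 30-50 percent figures to be compared across tokens with different vowel lengths.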
The f0 contours of the vowels following the four stops in Hindi show that, for a
majority of the speakers in this study, VAS have lower mean f0 values till about
20-30 percent of the vowel. VS tend to have comparably higher f0 values than the
VAS at the beginning of the vowels. VLAS and VLS tend to have higher mean f0
values initially when compared to the voiced stops. In this study, the effects of
these stop type dependent f0 perturbation patterns have been shown to persist till
about 30 percent of the vowel, at least for VAS. For four of the five speakers it
has been shown that a three-way categorical distinction can be made on the basis
of the f0 patterns in the initial 20-30 percent of the vowel.
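Comparisons of f0 at fixed proportions of the vowel, as described above, can be sketched in Python as follows. The sketch assumes an f0 track is available as parallel lists of time stamps and f0 values and uses linear interpolation between samples; the function name and the toy contours are hypothetical and are not the extraction method or the data of this study.

```python
def f0_at_fraction(times, f0_values, fraction):
    """Linearly interpolate f0 at a given fraction (0 to 1) of the vowel.

    times: increasing time stamps spanning the vowel; f0_values: f0 in Hz
    at those time stamps.
    """
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must lie between 0 and 1")
    if len(times) != len(f0_values) or len(times) < 2:
        raise ValueError("need at least two matching time and f0 samples")
    target = times[0] + fraction * (times[-1] - times[0])
    for i in range(len(times) - 1):
        t0, t1 = times[i], times[i + 1]
        if t0 <= target <= t1:
            v0, v1 = f0_values[i], f0_values[i + 1]
            if t1 == t0:
                return v0
            return v0 + (v1 - v0) * (target - t0) / (t1 - t0)
    raise ValueError("target time falls outside the f0 track")


# Hypothetical contours sampled every 50 ms over a 200 ms vowel.
times_ms = [0, 50, 100, 150, 200]
f0_after_vas = [180.0, 190.0, 200.0, 205.0, 205.0]
f0_after_vs = [200.0, 202.0, 204.0, 205.0, 205.0]

for fraction in (0.1, 0.2, 0.3):
    vas = f0_at_fraction(times_ms, f0_after_vas, fraction)
    vs = f0_at_fraction(times_ms, f0_after_vs, fraction)
    print(fraction, round(vas, 1), round(vs, 1))
```

Sampling at proportions of the vowel rather than at absolute times keeps tokens with different vowel durations comparable.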
Spectral measures based on the relative amplitudes of H1, H2, A1, A2 and A3 show
that stop type has a significant effect on measures of spectral tilt. Amongst the
four measures that were used in this study, H1-H2, an indirect measure of Open
Quotient, H1-A1, a measure of the first formant bandwidth, and H1-A2, an indirect
measure of the abruptness of the vocal fold closure or skewness, are the most
reliable measures for distinguishing the VAS from the unaspirated stops (VS, VLS).
H1-A3, a measure of the overall spectral tilt, is not a reliable cue for
discrimination in this data. The spectral measures also indicate that till 30-50
percent of the vowel the VAS can be categorized as having breathy phonation.
Comparing the duration of aspiration in VLAS and the breathy portion following the
VAS, we can conclude that the breathy phonation is comparably longer. In addition,
we have shown that the aspiration portion following the VLAS is dependent on the
place of articulation of the stop, with velar VLAS tending to have longer
durations of aspiration compared to other places of articulation. Place of
articulation does not have an effect on the breathy portion following the VAS.
This suggests that place of articulation dependent supraglottal configurations do
not have an effect on the breathy portion following the VAS. These findings
suggest that if VAS are categorized as aspirated stops in the phonology, the
particular feature that describes this distinctive release needs to be
phonetically realized quite differently from the aspiration following the VLAS.
In order to capture the spread of the breathy/murmured release, the aspiration
following the VAS would have to be laryngeally specified either as delayed onset
of modal voicing or as a distinctive release following the stop. The durational
differences in the phonetic outcome of this phonologically distinctive feature can
be better explained if all of the cues responsible for making the four-way stop
contrasts are understood in part to function as enhancing cues (Keyser and Stevens
2006). In the following section, I discuss the relevance of the results from this
study for a complete understanding of the four-way stop contrasts in Hindi. I also
discuss the implications of these findings for a phonological account of these
stop contrasts that takes the phonetic facts into close consideration. Further, I
develop the idea that complex cue interaction and correlation lead to cues acting
in favor of each other in order to preserve the existing contrasts.


Implications and further research

Based on the results from the durational features such as VLT and CD, it can be
said that the standard view of stop production in Hindi needs to be amended in
order to better explain the durational patterns for VLT and CD. The CD and VLT
durations vary according to stop manner in Hindi, with aspirated stops tending to
have shorter closures compared to unaspirated stops. The standard view suggests
that the distinction between VAS and VS is primarily a distinction in the breathy
release following the VAS. The durational features that have been examined in
this study support the view that differences in the duration of closure between
the unaspirated and aspirated stops could indeed be relevant for making a contrast
between the stop types. Results on the effect of stop type on f0 also show that
f0 following VAS tends to be lower compared to the voiceless stops till about 30
percent of the vowel. These results confirm findings from previous research on the
effect of stops on f0 (Schiefer 1986, Ohala 1979). VS also tend to lower f0 as
compared to voiceless stops. These findings suggest that f0 can also act as a
valuable cue for stop identification. Based on the differences in VLT between VAS
and VS, it can also be said that the correlation between voicing and f0 lowering
is not entirely accurate. VLT for VAS is consistently shorter than VLT for VS. In
that respect, a comparatively less voiced VAS tends to lower f0 more than the VS.
Hence it can be argued that f0 lowering following VAS is not entirely due to the
universal tendency for f0 lowering following voiced stops. Rather, the
breathy/murmured portion following the VAS is also correlated with the duration of
the f0 lowering following these stops. It can, hence, be argued that the
breathy/murmured portion is the primary reason behind the f0 lowering patterns,
rather than the universal tendency for voiced stops to lower f0. An alternative
analysis that can be formulated based on these results depends on the severity of
the f0 lowering following the VAS. Noting that f0 is significantly lower following
VAS than VS, it is probable that both the universal tendency and the
breathy/murmured mode of phonation are partly responsible for the further lowering
of f0 following the VAS. The spectral intensity measures show that the VAS are
distinct in that they are produced with a larger Open Quotient, first formant
bandwidth and skewness of the glottal pulse. It has also been shown that the
breathy portion following the VAS tends to be longer than the aspiration portion
following the VLAS. In terms of aspiration as a contrastive feature, this result
confirms Ladefoged's (1971) earlier view that the Hindi VAS are produced with a
breathy release that is distinct from voiceless


These results suggest that the stop distinctions in Hindi are a result of several
acoustic features, including durational as well as spectral features. In addition,
f0 also plays a crucial role in the acoustic distinction between the voiced stops
and the voiceless stops. The results from this study also provide a basis for
understanding the complexity of cue interaction in phonological systems with more
than three-way contrasts. In light of the results from this study, we can see that
a four-way phonological contrast can lead to a complex interaction between the
acoustic cues that could eventually be responsible for either defining a contrast
or maintaining a contrast. Given the level of interaction that can be seen in
these results, it is well motivated to suggest that the latter is the case. In
section 3.5, I have discussed the relevance of the theory of enhancement for a
better understanding of acoustic cue interaction in Hindi. I have argued that stop
distinctions in Hindi could be a cumulative result of the several cues functioning
together. However, in order to provide evidence in favour of a theory of
enhancement, it is essential first to identify those features that threaten to
obliterate a contrast. As can be seen from the CD results, there is a parallel
distribution of closure durations dependent on the manner of articulation;
aspirated stops have shorter closures than unaspirated stops. The voicing during
closure for VS and VAS is, in this respect, a defining feature in terms of
maintaining a contrast between voiced and voiceless stops. Voicing, as is
universally attested, can be affected by contextual and coarticulatory effects.
These effects can in turn result in the obliteration of contrasts between voiced
and voiceless stops. In the context of Hindi, then, as specified by Keyser and
Stevens (2006), certain language specific features can be argued to be in place to
enhance the voicing contrast, for instance. Thus, compensatory mechanisms could be
responsible for the acoustic outcomes of the phonological contrasts.
Further research in this direction will have to involve conducting perception
experiments that would provide us with an insight into the contribution of the
various acoustic features in making the stop distinctions possible in Hindi. This
being an acoustic study, CD measurements could not be made for Utterance Initial
[U] stops. Further research that examines the relevance of CD in this context will
provide important insights into the role of CD in stop distinctions in Hindi. The
laboratory environment and the speech material that was recorded did not lend
themselves to an understanding of the effect of prosodic context on the durational
and f0 features. Future research in this direction will help us understand in
greater detail the effect of varying prosodic contexts.
The results from the spectral intensity study also suggest that disparate
articulatory causes produce the singular perceptual effect of breathiness in the
speech stream. This can form the basis for a study that examines the effect of
linguistic background on the categorical nature of perceptual voice quality
judgements. If speakers' linguistic background indeed has an effect on their
perceptual judgements of voice quality, we should expect American English
listeners to behave differently from Hindi and Gujarati listeners. Further, since
Gujarati also has a contrast between modal and breathy vowels, in addition to the
breathy and modal stops, we can expect Gujarati listeners to behave differently
from Hindi as well as American English listeners. The results from such a study
will have several implications, both for a better understanding of the use of
spectral cues for making contrasts and for an accurate estimation of voice quality
judgements for speakers who do not employ breathiness contrastively. First, the
study will show whether speakers whose languages employ breathiness contrastively
differ in their breathiness judgements from American English speakers. Secondly,
it will provide us with insights into the variety of perceptual ratings for
breathiness that speakers from all three languages might exhibit. This in turn
will prove to be crucial from a psychoacoustic and speech therapy perspective,
since critical distinctions between normal and disordered speech could be
understood on the basis of a set of parameters that govern speaker judgements of
voice quality. Acoustic techniques have benefited our understanding of disordered
breathiness, as in Hillenbrand, Cleveland and Erickson (1994); the same techniques
have also proven to be relevant for studying breathy voicing in vowels, stops and
affricates, giving us an insight into the production of contrastive breathiness
(Bali 1999, Bickley 1982, Blankenship 1998). Results from such a study will
contribute further towards the utilization of acoustic phonetic techniques to
study both normal and disordered speech.



Appendix 1: Frame Sentences

Utterance initial [U]

(1) Target Word ka: matlab vo sam d^a:
Target Word GEN meaning he understand NEG
He didn't understand the meaning of (the) Target Word

Phrase initial [I]

(2) Target Word ka: matlab bataiije?
Excuse me, Target Word GEN meaning tell
Excuse me, could (you) tell (me) the meaning of the Target Word

Phrase medial [M]

(3) msme mohan ko k ^ais Target Word bolte hue suna:
Mohan ACC special Target Word tell heard say
I heard Mohan say special Target Word

(4) me:ne mohan ko la:l Target Word bolte hue suna:
Mohan ACC red Target Word tell heard say
I heard Mohan say red Target Word



Appendix 2: Word List





A spirated


A spirated

ka:l, ka:k. ka:g


gad, ga:d, gait





ta:p, tail, ta:<%

R a:p,


V ail

tail, ta:p, ta:b

t ha:t-











P/4 a:t



* ad, (\h a:g,

c} /1 a:k

p ftad.







Target Word

Meaning

NM. Time; period; age; era

NM. Crow

NM. Cork

k ha:l

NF. Skin; hide

k fta:d

NF. Manure; fertilizer

k ha :t

NF. Cot; Bedstead

NM. A cheek

NF. Sediment; dregs

NM. Body; person

NM; Adj. Cunning, shrewd (person)

gh a :t

NM. River bank

g fta:l

NM. 1. A lake. 2. rhythm

NM. Heat; temperature

NM. Crown

NF. Pat; tap or palm stroke over a percussion instrument

t fca:b

t ha:l

NM. Metallic plate

NF. Pulse; Lentils

NM. A speck; stain

d h a:r

NF. An edge; sharp edge

NF. Commanding/overwhelming influence; sway

d 'l a:g


NF. 1. A stock 2. Prevarication

NF. Hoof

t a;b

t ha:t

NM. Pomp, splendor

t Aa:la

Adj. Idle NM. Scarcity

t^ a ip

NF. Branch (of a tree)

4a: k

NF. Mail; post

4 A a:l

NF. 1. Shield 2. Slope

ih a:g

4* a:k

NM. 1. A particular tree 2. Sticking to an unwelcome convention/custom

NM. 1. The other coast/bank 2. Adv. Across; on the other side

NM. 1. Anything boiled in sugar syrup 2. NF. A long winding turban

Adj. Holy, sacred; pure

NM. A typical song sung during Holi festival

p ha:l

NM. A blade; ploughshare

P ^ ait

NM. A division of land

NM. Hair

NM. A garden; park

ba :t

NF. Talk

NM. Load; weight; burden

b '1a:g

NM. Portion; part

bJ' a:p

NF. Steam; vapour



Appendix 3: Language
Background Questionnaire
Note: The original language background questionnaire was typed in Devanagari script
and potential subjects were asked to fill in the details. The following is a translation
from the original Hindi language background questionnaire.

1. Name ----------

2. Age ----------

3. Gender (Please circle one) Female/Male

4. Native language ----------

5. Ability in native language (Please say 'yes' or 'no') speak - read - write

6. How many years have you learned your mother tongue in school? -

7. Please write the names of other languages that you can speak, read and write
and also indicate your ability in these languages.

Write

8. Mother's native language ----------



9. Your mother's ability in her native language (Please say 'yes' or 'no') speak
- read - write

10. Please write the names of other languages that your mother can speak, read
and write and also indicate her ability in these languages.

Write

11. Father's native language

12. Your father's ability in his native language (Please say 'yes' or 'no') speak
- read - write

13. Please write the names of other languages that your father can speak, read
and write and also indicate his ability in these languages.

Write



References

: n.d., Master's thesis.

Abramson, A.: 1977, Laryngeal timing in consonant distinctions, Phonetica 34,
295-303.
Abramson, A. and Lisker, L.: 1967, Discriminability along the voicing continuum,
In Proc. 6th Int. Congr. Phon. Sci., pp. 569-573.
Allen, W. S.: 1953, Phonetics in Ancient India, Oxford University Press.
Andruski, J. E. and Ratliffe, M.: 2000, Phonation types in the production of
phonological tone: The case of Green Mong, Journal of the International Phonetic
Association 30, 2604-2637.
Avery, P. and Idsardi, W. J.: 2001, Laryngeal dimension, completion and
enhancement, in T. A. Hall (ed.), Distinctive feature theory, Mouton de Gruyter,
pp. 41-70.
Bali, K.: 1999, A spectral analysis of breathiness of intervocalic voiced aspirated
plosives in Delhi Hindi, In proceedings of ICPhS99 San Francisco p. 2036.
Benguerel, A.-P. and Bhatia, T. K.: 1980, Hindi stop consonants: An acoustic and
fiberscopic study, Phonetica 37, 134-48.
Bickley, C.: 1982, Acoustic analysis and perception of breathy vowels, Speech
Communication Group Working Papers I, Research Lab. of MIT pp. 71-82.
Blankenship, B.: 1998, The Time Course of Breathiness and Laryngealization in
Vowels, PhD thesis, UCLA.
Cho, T. and Jun, S.-A.: 2000, Domain-initial strengthening as featural
enhancement: Aerodynamic evidence from Korean, In Chicago Linguistic Society
36, 31-44.
Cho, T. and McQueen, J. M.: 2005/4, Prosodic influences on consonant production
in Dutch: Effects of prosodic boundaries, phrasal accent and lexical stress,
Journal of Phonetics 33(2), 121-157.



Cole, J., Kim, H., Choi, H. and Hasegawa-Johnson, M.: 2007, Prosodic effects on
acoustic cues to stop voicing and place of articulation: Evidence from radio
news speech, Journal of Phonetics 35(2), 180-209.
Davis, K.: 1994, Stop voicing in Hindi, Journal of Phonetics 22, 177-193.
Dixit, R. P.: 1982, On defining aspiration, In proceedings of the thirteenth
International Conference of Linguists pp. 606-10.
Dixit, R. P.: 1987a, In defense of the phonetic adequacy of the traditional term
voiced aspirated, Proc. XIth ICPhS, Tallinn 1, 145-148.
Dixit, R. P.: 1987b, Mechanisms for voicing and aspiration: Hindi and other
languages compared, UCLA Working Papers in Phonetics 67, 49-102.
Dixit, R. P.: 1989, Glottal gestures in Hindi plosives, Journal of Phonetics
17(3), 213-237.
Dixit, R. P. and MacNeilage, P. F.: 1980, Cricothyroid activity and control of
voicing in Hindi stops and affricates, Phonetica 37, 397-406.
Dutta, I.: 2007, Correlation between voicing lead time (VLT) and f0, Proceedings of
the 39th annual meeting of the Chicago Linguistics Society, pp. 405-422.
Esposito, C. M.: 2003, Santa Ana del Valle Zapotec phonation, Master's thesis,
University of California, Los Angeles.
Fischer-Jorgensen, E.: 1967, Phonetic analysis of breathy (murmured) vowels in
Gujarati, Indian Linguistics: Journal of the Linguistic Society of India 28,
71-139.
Halle, M. and Stevens, K. N.: 1971, A note on laryngeal features, Quarterly Progress
Report, Research Laboratory of Electronics, MIT 101, 198-212.
Hanson, H. M.: 1995, Glottal characteristics of female speakers, PhD thesis, Harvard
University, MA.
Hanson, H. M., Stevens, K. N., Kuo, H.-K. J., Chen, M. Y. and Slifka, J.: 2001,
Towards models of phonation, Journal of Phonetics 29, 451-480.
Hillenbrand, J., Cleveland, R. A. and Erickson, R. L.: 1994, Acoustic correlates of
breathy vocal quality, Journal of Speech and Hearing Research 37(4), 769-78.
Hirose, H.: 1977, Laryngeal adjustment in consonant production, Phonetica 34,
289-294.
Hombert, J.-M.: 1978, Consonant types, vowel quality, and tone, in V. Fromkin
(ed.), Tone: A linguistic survey, Academic Press, pp. 77-111.



Hombert, J.-M., Ohala, J. and Ewan, W.: 1979, Phonetic explanations for the
development of tones, Language 55, 37-58.
House, A. and Fairbanks, G.: 1953, The influence of consonant environment upon
the secondary acoustical characteristics of vowels, JASA 25, 105-113.
Ingemann, F. and Yadav, R.: 1978, Voiced aspirated consonants, Papers from the
1977 Mid-America Linguistics Conference, Columbia: University of Missouri,
pp. 337-344.
Kagaya, R. and Hirose, H.: 1975, Fiberoptic electromyographic and acoustic
analyses of Hindi stop consonants, Annual Bulletin of the Research Institute of
Logopedics and Phoniatrics 9, 27-46.
Keating, P.: 1988, A survey of phonological features, Reproduced by the Indiana
University Linguistics Club.
Keyser, S. J. and Stevens, K. N.: 2006, Enhancement and overlap in the speech
chain, Language 82(1), 33-63.
Ladefoged, P.: 1971, Preliminaries to linguistic phonetics, University of Chicago
Press, Chicago.
Ladefoged, P.: 1975, A course in Phonetics, Harcourt Brace Jovanovich, New York.
Ladefoged, P. and Antonanzas-Barroso, N.: 1985, Computer measures of breathy
voice quality, UCLA Working Papers in Phonetics 61, 79-86.
Ladefoged, P. and Maddieson, I.: 1996, The Sounds of the World's Languages,
Oxford: Blackwell.
Lehiste, I. and Peterson, G. E.: 1961, Some basic considerations in the analysis of
intonation, The Journal of the Acoustical Society of America 33(4), 419-425.
Lisker, L. and Abramson, A. S.: 1964, A cross-language study of voicing in initial
stops: acoustical measurements, Word 20, 384-422.
Lofqvist, A.: 1975, Intrinsic and extrinsic f0 variations in Swedish tonal accents,
Phonetica 31, 228-247.
Lombardi, L.: 1994, Laryngeal features and laryngeal neutralization, Outstanding
dissertations in Linguistics, Garland Publishing.
Mikuteit, S. and Reetz, H.: 2007, Caught in the act: The timing of aspiration and
voicing in East Bengali, Language and Speech 50.
Ni Chasaide, A. and Gobl, C.: 1997, Voice source variation, in W. J. H. Laver and
John (eds), The Handbook of Phonetic Sciences, Blackwell.



Ohala, J.: 1973, The physiology of tone, in L. Hyman (ed.), Consonant Types and
Tones, Southern California Occasional Papers in Linguistics, p. 539.
Ohala, J.: 1974, Experimental historical phonology, In proceedings of the 1st
International Conference on Historical Linguistics, Vol. II, Edinburgh, pp. 353-387.
Ohala, M.: 1979, Phonological features of Hindi stops, In proceedings of the South
Asian Language Analysis Roundtable 1, 79-88.
Ohde, R. N.: 1984, Fundamental frequency as an acoustic correlate of stop consonant
voicing, The Journal of the Acoustical Society of America 75(1), 224-230.
Poon, P. G. and Mateer, C. A.: 1985, A study of VOT in Nepali stop consonants,
Phonetica 42, 39-47.
Purcell, E. T., Villegas, G. and Young, S. P.: 1978, A before and after for
tonogenesis, Phonetica 35, 284-93.
Schiefer, L.: 1986, F0 in the production and perception of breathy stops: Evidence
from Hindi, Phonetica 43, 43-69.
Schiefer, L.: 1989, Voiced aspirated or breathy voiced and the case for
articulatory phonology, Forschungsberichte des Instituts für Phonetik und
Sprachliche Kommunikation der Universität München pp. 257-278.
Schiefer, L.: 1992, Trading relations in the perception of stops and their implications
for a phonological theory, In Papers in Laboratory Phonology, Vol. II,
pp. 296-313.
Selkirk, E.: 1992, Comments on "trading relations in the perception of stops and
their implications for a phonological theory", In Papers in Laboratory
Phonology, Vol. II, pp. 313-318.
Shih, C., Mobius, B. and Narasimhan, B.: 1999, Contextual effects on consonant
voicing profiles: a cross-linguistic study, In proceedings of ICPhS99 San
Francisco.
Stevens, K. and Hanson, H.: 1994, Classification of glottal vibration from acoustic
measurements, Paper presented at the 8th Vocal Fold Physiology conference in
Kurume, Japan, April 7-9.
Thurgood, E.: 2004, Phonation types in Javanese, Oceanic Linguistics 43(2), 277
Umeda, N.: 1981, Influence of segmental factors on fundamental frequency in fluent
speech, The Journal of the Acoustical Society of America 70(2), 350-355.



Wayland, R. P.: 1998, Acoustic and Perceptual Investigation of Breathy and Clear
Phonation in Chanthaburi Khmer: Implications for the History of Khmer
Phonology, PhD thesis, Cornell University.
Whitney, W. D.: 1860-1863, The Atharva-Veda Pratigakhya, or Qaunakiya
Caturadhyayika, Journal of the American Oriental Society 7, 333-615.
Yadav, R.: 1984, Voicing and aspiration in Maithili: a fiberoptic and acoustic study,
Indian Linguistics 45, 1-25.



Author's Biography

Indranil Dutta was born in Maligaon, Assam, in India on April 13, 1973. He
graduated from the Jawaharlal Nehru University in 1997 with a B.A. degree in German
and an M.A. in Linguistics. Following that he came to the University of Illinois
and received a second M.A. degree and a Ph.D. in Linguistics in 2007. Between
October 2006 and February 2007, he worked on Text-to-Speech research and
development for Nuance Communications in Belgium. His research interests lie within
the broad fields of acoustic and articulatory phonetics, speech perception,
phonological theory, and speech synthesis and technologies. Aside from these
interests, he is also inclined towards computational and statistical speech and
articulatory modeling, computational modeling of pronunciation, historical
linguistics, language contact and general linguistic theory.
Following the completion of his Ph.D. he will be teaching at Rice University in
Houston, Texas.

