The Structure of Spoken Language - Intonation in Romance
Philippe Martin
University Printing House, Cambridge CB2 8BS, United Kingdom
www.cambridge.org
Information on this title: www.cambridge.org/9781107036185
© Philippe Martin 2015
This publication is in copyright. Subject to statutory exception
and to the provisions of relevant collective licensing agreements,
no reproduction of any part may take place without the written
permission of Cambridge University Press.
First published 2015
Printed in the United Kingdom by TJ International Ltd. Padstow Cornwall
A catalogue record for this publication is available from the British Library
Library of Congress Cataloguing in Publication data
Martin, Philippe, 1944– author.
The structure of spoken language : intonation in Romance / Philippe Martin.
pages cm
ISBN 978-1-107-03618-5 (hardback)
1. Romance languages – Phonetics – Intonation. 2. Romance languages –
Phonology. 3. Romance languages – Phonology, Historical. 4. Romance
languages – Spoken Romance languages. 5. Intonation (Phonetics)
6. Biolinguistics. I. Title.
PC81.5.M27 2015
440′.0415–dc23
2015012063
ISBN 978-1-107-03618-5 Hardback
Cambridge University Press has no responsibility for the persistence or accuracy of
URLs for external or third-party internet websites referred to in this publication,
and does not guarantee that any content on such websites is, or will remain,
accurate or appropriate.
Contents
1 Introduction
The respiratory cycle
The source-filter model of phonation
Emotions
Voiced and unvoiced speech sounds
Laryngeal frequency
Fundamental frequency and melodic curve
Intensity
Spectrographic analysis
Syllabic duration
Syntax and prosody
The prosodic structure: the structure of spoken language
Stressed syllables
Intonation and syntax
Brain waves and prosody
A Copernican change
From laboratory to spontaneous speech
Reading and listening
Romance languages
3 Transcription systems
Acoustic and perceived data
Inventory
Processing prosodic information
Prosodic structures in Romance languages
Identification of prosodic contours
Complex contour
Experimental data
Sequences of two prosodic words
Sequences of three prosodic words
Sequences of four prosodic words and more
Coordination, enumeration, parenthesis
Coordination
Enumeration
Parenthesis
An example of AM prosodic analysis in French
An example of ISC prosodic processing in French
Conclusion
8 Macrosyntax
A first approach
Three current models for macrosyntax
The theory of la lingua in atto
Text macrosyntax and prosodic macrosyntax
Merging text and intonation
Dysfluencies
Ponctuants
The prosodic eraser
Use of dysfluencies
Deletions
Additions
Text and prosodic macrosegments
Examples of macrosyntactic analysis
French
Italian
Portuguese
Conclusion
9 Applications
Teaching French prosodic structure
Silent reading
Eye movement
Subvocalization
Delta wave synchronization
10 Conclusion
Quotes from Frédéric Dard (San Antonio)
11 WinPitch
Sound recording made clear
Sound and video
Transcription and alignment on the fly
References
Analyzed corpora
Author index
Subject index
List of figures and maps
Figures
1.1 Respiration cycle, without phonation (top) and with
phonation (bottom)
1.2 An example of an out-of-breath speaker (NS), interrupting
the phonation process by pauses longer than usual
1.3 Source-filter model of phonation
1.4 Interactions in the source-filter model between phonation and
emotions
1.5 Extreme cases of the emotion–phonology relationship:
emotion dominates phonology (extreme stress or anger), and
phonology dominates emotions
1.6 An example of melodic curve, interrupted at segments without
voicing (including pauses and silence), with the fundamental
frequency (top), intensity (middle) and wave (bottom) curves
1.7 Narrowband spectrogram to visualize harmonics
corresponding to the fundamental frequency curve
1.8 Staircase duration curves showing the evolution of syllabic
duration
1.9 Bézier duration curves showing the evolution of syllabic
duration
1.10 Map: Romance dialects
2.1 Rousselot kymograph
2.2 Measure of the laryngeal period, directly (top), or indirectly
from the duration of a packet of ten periods
2.3 Spectrogram printed on thermo-sensitive paper
2.4 The ten basic intonation patterns for French by Delattre (1966)
2.5 An example of analysis with the software Waves™ of the
sentence Jim builds a big daisy-chain (from ToBI, 1999)
2.6 An example of a fundamental frequency curve with a wide
band spectrogram displayed underneath (from Delais-Roussarie
et al., 2015)
Preface
Indeed, I was then able to process a very large amount of data, essentially in
French, sometimes playing with the intonation of my own voice and getting the
resulting melodic curves immediately, and was curious to see if some pattern
would emerge from all these trials. In the last months of 1973, I finally had an
idea about the frequently observed regularity of F0 patterns, an idea that
I formalized under the French term contraste de pente (contrast of melodic
slope) and published in the journal Linguistics in 1975 (Martin, 1973, 1975).
At the same time, I coined the terms structure prosodique and hiérarchie
prosodique (prosodic structure and prosodic hierarchy) to name a hierarchical
organization of minimal prosodic units, or prosodic words, each containing one
and only one prosodic event indicating this hierarchy. Although referring to the
same kind of experimental data in French, papers published at that time (e.g.
Vaissière, 1975; Émerard, 1977) were phonetically rather than phonologically
oriented but brought comparable data.
To my regret, my Linguistics paper had almost no impact on prosody
research, except in France. Only five years later did papers using the term
prosodic structure appear, unfortunately without ever mentioning my earlier
work. Later, speech analysis software became popular (Signalyze, WinPitch,
Praat, and so on), and phonologists (essentially based in US universities)
were happy to discover a new playground. To differentiate their activity
from that of the phoneticians, who were not taken seriously by linguists at
the time, and to avoid being confused with them, they called it “laboratory
phonology,” which corresponds to what most phoneticians have actually been
doing for a century or more.
The purpose of this monograph is to present an alternative theoretical
approach that attempts to describe and understand the prosodic organization
of sentences. In this endeavor, I briefly present a critical exposition of the main
aspects of the dominant Autosegmental-Metrical model (henceforth AM),
succinctly describing existing research using this approach for Romance
languages such as European French, Italian, Spanish, Catalan, European
Portuguese, and Romanian (Post, 2000; D’Imperio, 2002; D’Imperio et al.,
2005; Michelas & D’Imperio, 2010; Sosa, 1999; Hualde, 2003; Prieto, 2014;
Frota, 2009). Then I introduce an alternative model, called Incremental
Storage-Concatenation (henceforth ISC), derived from the Storage-Concatenation
model I proposed in 2009 (Martin, 2009). In this model, I highlight some
characteristics, apparently never mentioned in AM descriptions, formalized
as a set of constraints limiting the number of prosodic structures that could be
associated with a given text.
This leads to a concept of intonation that from the start completely
dissociates the sentence text from its hierarchical organization by syntax. This
concept departs dramatically from earlier concepts of prosodic structure
conceived under the AM approach, where only one such structure can be
associated with a given syntactic structure, even if its restructuration appears
possible in order to obtain better eurhythmicity (Post, 1999).
The set of constraints, originally part of the Storage-Concatenation
framework, i.e. planarity, the seven syllables rule, eurhythmicity, stress clash,
and syntactic clash, made me look for an underlying explanation that gives a
proper account of the observed constraints. A key aspect is their time
dimension and especially the dynamic process performed by listeners to
recover the prosodic structure intended by the speaker or the writer.
Examining the consequences of the time domain aspect of the process is the
key to a better understanding of the observed data, an aspect that is often
neglected or totally ignored in the current literature. Indeed, the usual
reasoning on the two-dimensional plane of a sheet of paper considerably limits
our understanding of the mechanisms necessarily used by the listener in the
perception of the prosodic structure.
Pushing this exploration further, I related this model to results obtained
recently in the neurolinguistic domain, and particularly those concerning
evoked potentials linked to prosodic stimuli. These results led me to propose
a new and coherent model based not only on the time dynamics of the prosodic
structure but also, and perhaps even more interestingly, on specific cognitive
mechanisms, in particular those involving short-term memory (Gilbert, 2012).
This approach suggests a convincing set of explanations pertaining not only to
the set of constraints relative to the prosodic structure but also to some phonetic
data, such as the duration of minimal units of prosody (defined below as
prosodic words), the minimal and maximal time interval between
consecutive stressed syllables, and even the speed limits of silent reading.
The second part of this book is devoted to applications of the model
presented in the first part to the analysis of data in some Romance languages,
starting with French, often considered as the ugly duckling among other
languages of the same family as it is deprived of lexical stress. This second
part itself is divided according to the type of data analyzed: read/laboratory
speech and spontaneous/non-prepared speech. In this latter set of chapters, I use
a modified macrosyntax approach derived from the GARS (Groupe Aixois de
Recherche en Syntaxe) work (Blanche-Benveniste, 1990, 2000) for both the
text and the prosodic aspects of speaker productions.
I sincerely hope that this book will help both new and experienced
researchers in the field of prosody to restore sentence intonation to its
deserved place in linguistic studies. I will try to show that, far from being
the cherry on the phonological cake that some take it to be, intonation is the
essential linguistic base for both speech production and speech perception.
Acknowledgments
I have many people to thank, and in the first place, Pierre Léon (1926–2013)
who, like Obelix, a cartoon character in the adventures of the French comic
book Astérix, “plunged me in a barrel of prosody when I was little.” From
Pierre Léon I learned a lot of facts about intonation in linguistics, stylistics,
phonetics, etc., and about how to survive in the academic world.
In addition, I had the privilege to meet and work with the outstanding linguist
Claire Blanche-Benveniste (1935–2010). She had a tremendous influence on
my research, always encouraging me to improve in our countless fruitful and
pleasant discussions.
Many other people helped me in various ways. In particular, I would like to
thank (in alphabetical order):
Mathieu Avanzi (Université de Neuchâtel) for his numerous useful (and
exacting) comments;
Helen Barton (Cambridge University Press) for her constant support
and encouragement in this project;
Gabriela Bilbiie (Université Paris Diderot) for her help in elaborating
the Romanian corpus;
Victor Boucher (Université de Montréal) for his original and fruitful
views on speech perception and his constant support for this project;
Georges Boulakia (Université Paris Diderot) for his constant friendship
and understanding;
Marie Claude Capt-Artaud (Université de Genève), formerly skeptical
but now convinced;
Emanuela Cresti (Università degli Studi di Firenze) for our discussions
and her constant friendship;
Jeanne-Marie Debaisieux (Université Paris 3) for the trust she placed in
my research;
Élisabeth Delais-Roussarie (Université Paris Diderot) for many
interesting discussions;
Didier Demolin (Université de Grenoble) for his indefectible friendship;
José Henri Deulofeu (Université Aix-Marseille) for teaching me what
macrosyntax is and staying my friend;
Key concepts
To help the reader to quickly evaluate the distance from known (and
dominant?) concepts in the field of intonation studies, the following list
contains the essential nonstandard theoretical points developed in this book.
1. This book is about the structure of spoken language.
2. Spoken language is made of time sequences of syllables organized into
stress groups (basic units of speech are syllables, not phonemes).
3. Stress groups are not necessarily aligned with words or syntactic groups;
however, they are aligned on complete words (i.e. their beginnings and
ends are aligned on beginnings and ends of lexical units – words).
4. Stress groups are also called rhythmic groups, accent groups, prosodic
words, and Accent Phrases in the literature.
5. Prosodic words are segments of prosody associated with and aligned on
stress groups.
6. Prosodic words are organized hierarchically by a prosodic structure.
7. Specific prosodic markers indicate prosodic structures; they allow the
listener to reconstitute the speaker’s intended prosodic structure
dynamically and incrementally in time.
8. Prosodic markers are instantiated by prosodic events located on the
stressed and final vowels of stress groups.
9. Prosodic events are instantiated by prosodic contours, described primarily
in acoustic terms of duration, melodic contour, and intensity.
10. (Silent) reading and speaking are described as an Incremental Storage-
Concatenation (ISC) process.
11. Recovering the prosodic structure in (silent) reading mode is a specific
process distinct from listening to speech.
12. Generation of spontaneous speech involves chunks of prosody hosting
syntactic constructions, which in turn host morphological units.
13. There is therefore a precedence of prosody over syntax, and of syntax over
morphology.
14. It follows that the same prosodic structure can host various syntactic and
morphological constructions, i.e. different texts.
15. Conversely, more than one prosodic structure can be associated with a
given text (for example when reading).
16. Prosodic boundaries (between APs, ips, or IPs) do not necessarily
correspond to syntactic or macrosyntactic boundaries. Likewise,
(macro)syntactic boundaries do not necessarily correspond to prosodic
boundaries.
17. Stress shift in stress clash conditions entails a reallocation of stress groups
organized hierarchically in the prosodic structure.
18. Generation of a prosodic structure when reading involves the precedence
of syntax (analyzed by the reader from the written text).
19. Prosodic structures are not necessarily congruent with the sentence
syntactic structure. They do not result from restructuration of the prosodic
structure either. Actually they do not coexist with syntax; they precede
syntax.
20. Prosodic markers are subject to neutralization of some of their acoustic
features when partially or totally redundant in a given prosodic structure
configuration.
21. Prosodic markers must be acoustically similar in their respective domain.
22. Acoustic features describing prosodic events ensure a necessary and
sufficient differentiation between prosodic markers (melodic contours) in
the prosodic structure.
23. Prosodic structures are constrained by a set of rules: planarity / seven
syllables / stress clash / syntactic clash / eurhythmy (the latter for read
speech).
24. Neurocognitive properties and processes may explain these constraints.
25. The properties of prosodic structures and prosodic markers extend to
macrosyntax.
26. Broad and narrow focus are subcases of macrosyntax configurations
(Prenucleus, Nucleus, Postnucleus).
27. There is a macrosyntax analysis of sentence intonation (no Prefix, only
prosodic Nucleus, prosodic Parenthesis, and prosodic Postfix).
In order to be compatible with the many other studies on intonation that are
probably familiar to most readers, I use the terms R (prosodic structure root), IP
(intonation phrase), ip (intermediate intonation phrase), and AP (Accent
Phrase) throughout this book whenever possible, despite potential general
conceptual differences.
1 Introduction
Figure 1.1 Respiration cycle, without phonation (top) and with phonation
(bottom). Panel labels: Silence, Phonation; horizontal axis: Time.
the same number of syllables that a speaker would in a more neutral emotional
state. On the other hand, some types of anger and fear consume a lot of
physiological energy. This state leads to shorter phonation sequences, which
may not even reach the duration of a single sentence, or of a complete syntagm,
and which may end with an unexpected (for the listener) respiratory pause of
considerable duration.
An example of an out-of-breath speaker phonation cycle, when the speaker
needs inspiratory pauses that are longer and more frequent than usual, is given
in Figure 1.2.
It is clear, then, that in order to master her/his speaking activity, the speaker
must constantly control her/his physiological state, which is itself conditioned
by an emotional state, in order to control the volume of air inhaled into the
lungs and to maintain a sufficient subglottal pressure during expiration. The
larger the pulmonic airflow, the shorter the phonation time, as, for example,
during phonation with high acoustic intensity, a higher than usual laryngeal
frequency, or a large amount of melodic variation. Conversely, a low pulmonic
airflow will result in low-intensity speech and a lower laryngeal frequency
with reduced melodic variation.
Pulmonic air expiration requires control of the tension of the vocal folds (the
word tension stands here for the complex muscular mechanisms controlling the
positioning and the elongation of the vocal folds) in order to compensate for
the decrease in lung air volume during expiration. As this air volume
diminishes, the subglottal pressure mechanically diminishes as well, since an
Figure 1.3 Source-filter model of phonation.
Figure 1.4 Interactions in the source-filter model between phonation and
emotions. Diagram labels: vocal folds, source, speech, emotional state, effect
on laryngeal frequency, source-filter interaction, effect on articulation and
vowel quality.
harmonics of voiced sounds to be shaped, on the one hand, and of noise regions
of the fricatives (both for voiced and unvoiced) on the other hand. Stop
consonants such as [p], [t], [k] are simply not taken into account in this
model, although voiced stops [b], [d], [g] are partially represented by their
voiced character. However, this is partially justified, as the perception of stop
consonants is based essentially on spectral transitions on the vowel (if any) that
follows (spectral loci).
If the speaker’s emotional state has to be taken into account in a model, it is
essential to consider the interactions necessarily existing between the source
and the filter (Fig. 1.4). Indeed, the emotional state has an effect on the
physiological mechanism of phonation, as it affects the respiration cycle, the
volume of air inhaled (which conditions the speech rate), the subglottal
pressure, and the tension of the vocal folds, which, in turn, determines the
laryngeal frequency and the voice pitch. The position of the articulators is also
modified, conditioning vowel quality. This emotional state affects the muscular
tension responsible for the positioning of the articulators, which are modeled
by the filter. It also produces secondary effects on the source characteristics
(for example, on the control of the laryngeal frequency and the position of the
glottis in the vocal tract).
Emotions
One can easily say that there are as many categories of emotions as there are
authors dealing with the subject. For example, in what may appear as a
continuum, Ekman (1999) distinguishes the following basic categories: Joy,
Sadness, Disgust, Fear, Anger, and Surprise, with secondary emotions resulting
from a mixture of these basic emotions. Shame, for example, can be considered
as a mixed emotion, combining fear and anger directed at oneself. Ekman’s
categories of emotion, like many other systems, obviously pertain to the
terminology of emotions, often influenced or even determined by categories
existing in the language of the researchers (cf. color terminology or snow quality
in Inuktitut, etc.).
Physiological constraints linked to various emotions have often been studied
(e.g. Sauleau, 2010). Factors prone to influence phonation are salivation,
muscular tension, perspiration, or, more globally, blood pressure and heart rate.
The physiological parameters controlling phonation affected by emotions
are essentially as follows:
a. Energy, which acts on voicing and vowel quality;
b. Tension of the vocal folds, which determines the melodic height as well as
vowel quality;
c. Articulation, another factor affecting vowel quality;
d. Speech rate, responsible for the tense or lax mode of articulation;
e. The degree of voicing, characterizing the noise/source ratio (breathy voice);
f. Breath insertion, as an index of irritation, pleasure, fear (iconic value);
g. Uncontrolled muscular movements (shivering) acting on the laryngeal
frequency as well as on vowel quality;
h. Regulation of the respiration cycle, which determines the position and the
length of pauses.
Dominance of an emotional state occurs when linguistic rules and
constraints are not fulfilled in the realization of vowels, consonants, and the
prosodic structure.
However, emotion affects the whole phonation process (laryngeal source and
vocal tract), as well as all of the syllables, whereas the dialectal or idiosyncratic
variations pertain essentially to stressed (prominent) syllables. An extreme case of
this process is shown in Figure 1.5. The borderline cases correspond to “hot”
anger and extreme stress, for which emotion disturbs all or some aspects of the
phonological realizations of prosodic markers, and at the other end of the scale,
Laryngeal frequency
The successive cycles of slow opening and rapid closing of the vocal folds
produce harmonics whose frequencies are integer multiples of the frequency of
vibration of the vocal folds, called laryngeal frequency. Strictly speaking, a
frequency can thus be associated with any segment of a voiced speech sound,
assuming that the frequency value remains constant, which is, of course, never
the case. In fact, the term frequency is somewhat misleading here: strictly,
it corresponds to the inverse of the duration of one cycle of vibration of the
vocal folds, itself often called the laryngeal period (laryngeal vibration being
a quasi-periodic rather than strictly periodic phenomenon).
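As a small illustration of the relation just described, the following sketch computes a laryngeal frequency as the inverse of a laryngeal period and lists the first harmonics as integer multiples of it. The function names and the 5 ms example period are assumptions chosen for illustration; they do not come from the text.

```python
def laryngeal_frequency(period_s):
    """Frequency in Hz, taken as the inverse of one laryngeal period (seconds)."""
    if period_s <= 0:
        raise ValueError("the period must be positive")
    return 1.0 / period_s


def harmonic_frequencies(f0_hz, count):
    """First `count` harmonics, i.e. integer multiples of f0_hz."""
    return [k * f0_hz for k in range(1, count + 1)]


# Hypothetical example: a 5 ms laryngeal period.
f0 = laryngeal_frequency(0.005)
print(f0, harmonic_frequencies(f0, 4))
```

Since laryngeal vibration is only quasi-periodic, a value obtained this way holds for one cycle; successive cycles will generally give slightly different values.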
longer the time window, the finer the frequency analysis resolution, at the
expense of a lower time resolution due to the use of longer windows. A 1
second window would give an excellent frequency resolution of 1 Hz, but
would be unsuitable for speech as many events may occur in a single second of
speech. The commonly retained value of 30 ms results in a sufficient frequency
resolution to “capture” the fundamental frequency of voiced speech sounds by
interpolation. This value compares with the number of frames per second
commonly used in movies to capture body movements (typically 24, 25, or
30 frames per second).
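The trade-off between time and frequency resolution can be made concrete with a rough rule of thumb: the frequency resolution is about the inverse of the window duration. The sketch below applies that rule to the two window lengths mentioned above. The function name is an assumption, and the exact resolution also depends on the shape of the analysis window, which is not specified here.

```python
def frequency_resolution_hz(window_s):
    """Approximate frequency resolution (Hz) of a Fourier analysis window.

    Uses the rule of thumb resolution = 1 / window duration; the exact
    figure depends on the window shape (an aspect left aside here).
    """
    if window_s <= 0:
        raise ValueError("the window duration must be positive")
    return 1.0 / window_s


for window_s in (1.0, 0.030):
    print(f"{window_s * 1000:.0f} ms window: "
          f"about {frequency_resolution_hz(window_s):.1f} Hz")
```

With a 30 ms window the raw resolution is therefore on the order of 33 Hz, which is why interpolation is needed to locate the fundamental frequency more precisely.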
The speech fundamental frequency, F0 (denoted F “zero”), not to be
confused with the Fourier fundamental frequency, corresponds to the first
harmonic component found in the Fourier analysis of the signal, but also, by
definition, to the frequency difference between two consecutive
speech harmonics. As this analysis needs a rather long time window to be
effective, the actual value of the laryngeal period may fluctuate during the time
window needed for the analysis. By moving the analysis time window along the
time axis, values for each successive position of the time window are obtained.
These plotted values, whose ordinate (vertical axis) is the fundamental
frequency and whose abscissa (horizontal axis) is time, form a
melodic curve (Fig. 1.6).
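The sliding-window procedure can be sketched as follows. Note that this sketch swaps in a simple time-domain autocorrelation estimator instead of the Fourier-based harmonic analysis described above, because it fits in a few lines of standard Python. All names, the 60 to 400 Hz search range, the 10 ms step, and the voicing threshold are assumptions made for illustration.

```python
import math


def estimate_f0(frame, sample_rate, f0_min=60.0, f0_max=400.0,
                voicing_threshold=0.3):
    """Return an F0 estimate in Hz for one frame, or None if judged unvoiced."""
    n = len(frame)
    mean = sum(frame) / n
    x = [s - mean for s in frame]
    energy = sum(s * s for s in x)
    if energy == 0.0:
        return None  # silence: no fundamental frequency value
    lag_min = max(1, int(sample_rate / f0_max))
    lag_max = min(int(sample_rate / f0_min), n - 1)
    best_lag, best_r = None, 0.0
    for lag in range(lag_min, lag_max + 1):
        # Autocorrelation at this lag, normalized by the frame energy.
        r = sum(x[i] * x[i + lag] for i in range(n - lag)) / energy
        if r > best_r:
            best_lag, best_r = lag, r
    if best_lag is None or best_r < voicing_threshold:
        return None  # weak periodicity: treated as unvoiced
    return sample_rate / best_lag


def melodic_curve(signal, sample_rate, window_s=0.030, hop_s=0.010):
    """Slide the analysis window along the signal; return (time, F0) pairs."""
    window = int(round(window_s * sample_rate))
    hop = int(round(hop_s * sample_rate))
    curve = []
    for start in range(0, len(signal) - window + 1, hop):
        frame = signal[start:start + window]
        centre_time = (start + window / 2) / sample_rate
        curve.append((centre_time, estimate_f0(frame, sample_rate)))
    return curve


# Synthetic test signal: 0.1 s of a 200 Hz tone followed by 0.1 s of silence.
rate = 8000
tone = [math.sin(2 * math.pi * 200 * i / rate) for i in range(800)]
for time_s, f0 in melodic_curve(tone + [0.0] * 800, rate):
    print(f"{time_s:.3f} s", "unvoiced" if f0 is None else f"{f0:.1f} Hz")
```

Frames for which no value is returned correspond to the interruptions of the melodic curve at unvoiced segments and silences. A real pitch tracker would add refinements such as interpolation between lags and protection against octave errors.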
The melodic curve has a very irregular shape, with numerous
ups and downs in frequency, and is also interrupted in some places. These
interruptions correspond to the absence of a fundamental frequency value, due in
turn to the absence of voicing (unvoiced speech sounds or silence), at least if
the measure is reliable, which is not always the case in adverse recording
conditions (e.g. low signal-to-noise ratio). We observe, for example, a rather
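Figure 1.6 An example of melodic curve, interrupted at segments without
voicing (including pauses and silence), with the fundamental frequency (top),
intensity (middle) and wave (bottom) curves. Frequency axis 0 to 300, time
axis 0 to 3; utterances: [1] agneau ou veau [2] faut que le beau rôti soit chaud.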
Intensity
Besides the melodic curve resulting from the successive values of F0 plotted
along the time axis, it is also customary to display an intensity curve by
measuring the intensity or the amplitude of each speech segment inside a
time window. The unit of measurement is usually the decibel (dB), a
logarithmic value relative to some arbitrary reference defined in the
instruments or within the software used. The most commonly used value for the
measurement is relative to the global intensity detected within the time window
used in Fourier analysis.
A remarkable value corresponds to the increase in decibels resulting from
doubling: doubling the intensity (power) of a pure tone gives
10 log (2) = 3 dB (more precisely 3.0103 dB), whereas doubling its amplitude
gives 20 log (2) = 6 dB. Likewise, halving the intensity causes a drop of 3 dB,
and halving the amplitude a drop of 6 dB. The multiplication of the amplitude
by a factor of 10 corresponds to an increase of 20 log (10) = 20 dB, by a factor
of 100 to an increase of 40 dB, etc.
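These values can be checked numerically. The sketch below follows the standard convention of 20 log10 for amplitude ratios and 10 log10 for power (intensity) ratios; the function names are assumptions introduced for illustration.

```python
import math


def level_change_from_amplitude_ratio(ratio):
    """Change in dB for a given amplitude ratio (20 log10 convention)."""
    return 20.0 * math.log10(ratio)


def level_change_from_power_ratio(ratio):
    """Change in dB for a given power (intensity) ratio (10 log10 convention)."""
    return 10.0 * math.log10(ratio)


for ratio in (2, 0.5, 10, 100):
    print(f"ratio {ratio}: "
          f"{level_change_from_amplitude_ratio(ratio):+.2f} dB (amplitude), "
          f"{level_change_from_power_ratio(ratio):+.2f} dB (power)")
```

The factor of two between the two conventions reflects the fact that power is proportional to the square of the amplitude.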
The dB unit is always a relative value. When the threshold of hearing is used
as the reference, the values are called absolute decibels (dB SPL in English
notation, where SPL stands for Sound Pressure Level, the threshold itself being
0 dB SPL), and relative decibels when the reference is different from this
threshold. Absolute dB are thus dB relative to the threshold of audibility at
1000 Hz.
Since it is sufficient to increase or decrease the amplitude level of sound
reproduction equipment in order to shift the whole intensity curve up or down,
only relative intensity measures make sense, for example by comparing the
values in dB of two consecutive vowels. Also, it is not legitimate to average
intensity values in dB directly, since this unit is logarithmic. Averages should
be computed from the amplitude values. The value I in dB of a sound of
amplitude a relative to a reference amplitude Aref is I = 20 log (a / Aref), so
that the amplitude is recovered from a dB value as a = Aref × 10^(I / 20).
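The following sketch illustrates why dB values should not be averaged directly. It converts each value back to an amplitude, averages the amplitudes, and converts the result to dB again. The function names and the reference amplitude of 1.0 are assumptions made for illustration.

```python
import math


def amplitude_to_db(amplitude, a_ref=1.0):
    """Level in dB of an amplitude relative to the reference a_ref."""
    return 20.0 * math.log10(amplitude / a_ref)


def db_to_amplitude(level_db, a_ref=1.0):
    """Amplitude corresponding to a level in dB relative to a_ref."""
    return a_ref * 10.0 ** (level_db / 20.0)


def mean_level_db(levels_db, a_ref=1.0):
    """Average several dB values through their amplitudes."""
    amplitudes = [db_to_amplitude(level, a_ref) for level in levels_db]
    return amplitude_to_db(sum(amplitudes) / len(amplitudes), a_ref)


levels = [0.0, 20.0]  # hypothetical levels of two consecutive vowels
print("arithmetic mean of the dB values:", sum(levels) / len(levels))
print("mean computed from the amplitudes:", round(mean_level_db(levels), 2))
```

The two results differ by several dB, which is the point of the caution above. The difference between two values in dB remains meaningful whatever reference amplitude is chosen.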
Spectrographic analysis
The spectrogram is a three-dimensional graphical representation (time on the
abscissa, frequency on the ordinate, and amplitude coded by colors or levels of
gray) of the Fourier analysis of successive windows of the speech signal
previously recorded. Depending on the length of the time window used, it
can display harmonics (setting called narrowband) or high concentrations of
Figure 1.7 Narrowband spectrogram to visualize harmonics corresponding to
the fundamental frequency curve. Frequency axis 0 to 4500; syllable labels:
a gneau ou veau, faut que le beau rô ti soit chaud.
Syllabic duration
Syllable duration is also a prosodic parameter. The unit of measurement
is the millisecond (ms). Instrumental measurement may seem trivial, but in
practice it is complex to carry out, whether manually or automatically. Indeed,
determining syllabic segment boundaries, even for an expert versed in the
joys of visual inspection of spectrograms, is far from simple and cannot be
automated easily. The main reason is that the problem is ill posed, since
consonants and vowels result from continuous changes of the speaker’s
articulatory configuration, as is the case when we walk, where the moments
at which gestures begin and end are not precisely defined. Likewise, the
starting and ending instants of a speech event should be evaluated from
the time they are perceived and the time they cease to be perceived, and
these moments are not necessarily identical for a listener and for an
acoustic speech analyzer.
Figure 1.8 Staircase duration curves showing the evolution of syllabic
duration. Vertical axis 0 to 300, time axis 0 to 3; utterances: [1] agneau ou
veau [2] faut que le beau rôti soit chaud.
Figure 1.9 Bézier duration curves showing the evolution of syllabic
duration. Same axes and utterances as Figure 1.8.
linguistics at the time (Di Cristo & Rossi, 1977; Selkirk, 1978, 1986; Di Cristo,
1998; Rossi, 1999; Fox, 2000; Mertens, 1993, 2008; Bocci, 2013; Di Cristo,
2013, and so on). Then AM phonology, aimed first at describing the
tonal systems of African languages (Goldsmith, 1976), came on the scene,
together with the ToBI (Tones and Break Indices) notation system for tonal
targets (Beckman & Ayers Elam, 1997). This, along with the concept that
sentence intonation could be described as strings of well-formed tonal targets,
considerably obscured research in the field, at least in my view. Data, however
scarce and resulting from sometimes unreliable pitch analysis of constructed
laboratory sentences, had to fit into the theoretical grid imposed by the
community.
Perhaps one of the most disappointing aspects of the AM approach pertains
to the idea, inspired by syntactic theory, that strings of prosodic events could
not have variations, i.e. that one set of tonal targets should be deemed correct
for a given sentence without considering other possibilities. The common use
of very short sentences was equally misleading for the interpretation and
comprehension of sentence intonation. Furthermore, what was lacking was an
explanatory principle specific to sentence prosody, as the one usually proposed
was strongly linked to syntax. Sentence intonation, with its prosodic structure,
was frequently viewed as a crutch helping a locally deficient syntax (in
“ambiguous” sentences, Lehiste, 1979), or, at most, as a cherry on the syntactic
cake. The independence of the prosodic structure from syntax, proclaimed
around 2005 (though already discussed in detail in Rossi et al., 1981), was
difficult for some researchers to accept, and even more so was the more recent
idea that this prosodic structuration would operate before the syntactic
structure. Indeed, this latter view would imply that syntax depends, at least
partially, on intonation, rather than intonation depending on syntax.
I strongly felt, and still feel, that the advent and dominance of the AM
model did not give a proper account of the function of the prosodic structure.
A further problem is linked to the use of the
ToBI notation, which appeared more and more as a convenient ad hoc system to
force data to fit into the model. Indeed, whereas simple use of properly aligned
high H* and low L* tonal targets may be satisfactory, transcriptions frequently
appear deliberately without any convincing link with the data, especially if
these data are illustrated by fundamental frequency curves without any detailed
frequency scale as was often the case at that time.
Stressed syllables
Many observers and analysts of the voice, not to mention professional linguists,
noticed a long time ago that in the flow of syllables some were stronger than
others. Some fine ears, i.e. amateurs or professional musicians, even noticed
that these “strong” syllables were not necessarily all stronger in the same way
and were differentiated by musical features such as duration and, interestingly,
melody (linked to the variation of laryngeal frequency). Indeed, modern acous-
tical analysis revealed that stressed syllables (i.e. strong syllables) do bear some
melodic changes, but this is also the case for the other syllables of the
sentences, which are perceived as stronger or not. From these common obser-
vations emerged many descriptions of the phenomena, either purely descriptive
framework had ignored them, as the then “available theory could not accom-
modate them,” as is the case with the contrast of melodic slope (Jun, 2012,
personal communication).
These constraints were discovered and evaluated empirically from acoustic
analysis of numerous recordings, but an explanation based on the interpretation
of electroencephalographic (EEG) data was recently found (see Chapter 7).
This discovery led to a very new concept of prosodic structure, one that
abandons the usual representation on a piece of paper, which ignores the
dynamic aspect of the time axis (in most papers on linguistic events, the time
parameter is translated into “to the left” or “to the right” instead of “before”
or “after” to characterize their position in time). Our perspective changes
radically if we reintroduce
the time parameter when considering the linguistic generation and perception
of linguistic objects, and particularly prosodic events and their structuration.
A Copernican change
Instead of starting from well-described syntactic facts and properties, I decided
to go the other way around. Martin (1987) and Avanzi (2012), for example,
showed that (1) there is in general more than one single prosodic structure that
can be attached to a given syntactic structure, and (2) it is not reasonable to
assume that a unique prosodic structure can be predicted from textual facts (see
also Bolinger’s famous (1972) paper, “Accent is Predictable (if you’re a Mind-
Reader)”). It seems therefore quite vain to continue piling up rules that would
give better predictions than the ones already available. The key point about this
change of view, a kind of Copernican revolution, is not to consider that the
prosodic structure results from the syntactic structure and some semantic
structure or properties, but instead to view the prosodic structure as the first
hierarchical organization produced by the speaker, in which text and syntax
will be accommodated more or less felicitously in a next or a concurrent step in
the speech production process. This view also involves the a priori indepen-
dence of the prosodic structure from syntax.
Many arguments can validate this view, including the following:
1. In general, more than one prosodic structure can be associated with a given
syntactic structure.
2. When reading, speakers try to recover the prosodic structure implicitly or
explicitly designed by the writer (who has a limited number of orthographic
symbols – the punctuation marks – to give indications to the reader).
3. Far from being an accessory (as considered in functional and generative
syntax as belonging to phonetics, and relevant only to performance rather
than competence), the prosodic structure is indispensable to the listener and
the reader alike to process the linguistic information. Even in silent reading,
the reader has to regenerate some prosodic structure from the read text, the
enfants sont partis] (“Without waiting, the children left”), [Sans attendre les
enfants] [ils sont partis] (“Without waiting for the children, they left”), not to
mention the phrasing [Sans attendre] [les enfants] [ils sont partis], which is
also well-formed.
The listener, on the other hand, has limited possibilities to anticipate proso-
dic events to be realized in the near future by the speaker, although, depending
on the prosodic grammar used in the language, some indications may exist. In
French, for example, the mechanism of melodic slope contrast (see Chapter 7)
allows the listener to anticipate the occurrence of the final conclusive contour
as boundary tones of the final prosodic group may present a reversed rising
slope.
This dissimilarity in processing the linguistic information plays an important
role in explaining the differences of realization in pitch, rhythm, and intensity
observed in laboratory (read) and spontaneous speech analysis. However, many
papers are based on reading and written texts, with the deeply held conviction
that the prosodic structure derives from syntactic/semantic/pragmatic condi-
tions observable from a written text as in recent contributions (e.g. Gachet &
Avanzi, 2008, or Delais-Roussarie, 2009 and Delais-Roussarie et al., 2015) on
parenthesis and coordination in read and spontaneous sentences.
Romance languages
No fewer than thirty-two distinct Romance dialects derived from Latin are
currently identified (in alphabetical order): Aragonais, Aroumain, Asturien,
Bergamasque
(Lombard de l’Est), Bourbonnais, Bourguignon-Morvandiau, Catalan, Corse,
Espagnol, Estrémègne, Français, Franc-Comtois, Francoprovençal valaisan,
Frioulan, Gallo, Galicien, Italien, Léonais, Milanais (Lombard de l’Ouest),
Mirandais, Napolitain, Normand, Occitan, Piémontais, Portugais, Romanche,
Romanesco, Roumain, Sarde, Sicilien, Vénitien, Wallon (Fig. 1.10).
Among those, a dialect becomes a language when it is used by a strong
political power to become the official language (Posner, 1996). Then if we
retain state languages, only French, Italian, Spanish, Portuguese, and
Romanian remain. But then we miss Quebec French spoken in Quebec, a part
of Canada, and Catalan spoken in Catalonia, a part of Spain. Another criterion
could be the production of literary work. In this case, most dialects should be
added to the list.
The most widely spoken Romance languages are Spanish, French,
Portuguese, Italian, and Romanian (97% of speakers). The Romance language
with the largest number of speakers is Spanish (spoken in about thirty-six
countries by about 329 million speakers). French is next, spoken by 250 million
speakers in twenty-five countries, followed closely by Portuguese (240 million
speakers in eight countries), Italian (62 million in ten countries), Romanian
Figure 1.10 Map: Romance dialects (www.romaniaminor.net/mapes/romania.swf)
In the last six decades, the advent of new and ever more sophisticated speech
analysis tools has given researchers the opportunity to test existing theoretical
phonological models, especially those devoted to sentence intonation.
Complex models elaborated from linguists’ intuitions could be tested against
actual speech data, first in well-defined production conditions (laboratory
speech) and later in various real-life conditions (spontaneous speech).
Technological advances which allowed for acoustic analysis to be quickly
and reliably performed were of paramount importance for the analysis of data
recorded in various discourse production environments. For prosodic research,
the quest for a correct and reliable measure of fundamental frequency has been
pivotal.
Parallel to these advances, the emergence of large spontaneous speech
corpora, gathering speakers’ performances in all kinds of conditions (mono-
logues, dialogs, public and family contexts, etc., for example, C-ORAL-ROM,
http://lablita.dit.unifi.it/coralrom/) led to a reconsideration of what have some-
times been intangible descriptive results, elaborated in isolation from the
intuition of researchers. Early work on prosody, for instance, was largely
based on intuitive concepts proposed by theorists like Liberman and Prince
(1977), without using much experimental data (as they were felt technically
difficult to obtain by phonologists not especially trained in experimental
phonetics).
The kymograph
In 1860, Édouard-Léon Scott de Martinville made the oldest known recording
of the human voice with the phonautographe, seventeen years before
Edison’s phonograph. With this device, a stylus engraves the sound vibration
on a sheet of paper coated with carbon black wound around a rotary cylinder.
Although this system could not reproduce the recorded sound (this was done
much later with a laser tracking device, Rosen, 2008), it allowed a first
acoustical analysis of a human voice and announced a specific development
for acoustic speech analysis from the kymograph invented by Carl Ludwig in
1847. Later, Etienne-Jules Marey developed a series of instruments dedicated
essentially to the dynamic aspects of speech, including vibration of the vocal
folds (Teston, 2004).
Indeed, phoneticians were already using laboratory speech in the early
twentieth century. Rousselot (1901, 1908), for instance, used a modified kymo-
graph (Fig. 2.1) to obtain rudimentary speech waveforms from which it was
possible to derive values of laryngeal frequency as a function of time. This was
done by visual identification of the period or group of periods on the waveform
(Fig. 2.2). The duration of analyzed speech was of course quite limited and
speakers had to be physically present to produce recordings. This approach
became much improved later with high-speed speech wave recording on
photosensitive paper, allowing a reasonable precision in the resulting melodic
curve realized by expert phoneticians.
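The measurement illustrated in Figure 2.2 amounts to a simple computation: the
laryngeal frequency is the inverse of the period, and timing a packet of several
periods spreads the reading error over the whole packet. The following Python
sketch illustrates this under stated assumptions; the function names and the
durations used are hypothetical and are not taken from Rousselot's data.

```python
def f0_from_period(period_s: float) -> float:
    """Laryngeal frequency in Hz from the duration of one period (seconds)."""
    if period_s <= 0:
        raise ValueError("period must be positive")
    return 1.0 / period_s


def f0_from_packet(packet_duration_s: float, n_periods: int = 10) -> float:
    """Mean laryngeal frequency in Hz over a packet of n_periods periods."""
    if packet_duration_s <= 0 or n_periods <= 0:
        raise ValueError("duration and number of periods must be positive")
    return n_periods / packet_duration_s


# Hypothetical readings: one period of 8 ms, or ten periods lasting 80 ms.
direct_hz = f0_from_period(0.008)
indirect_hz = f0_from_packet(0.080, n_periods=10)
print(round(direct_hz, 1), round(indirect_hz, 1))
```

Both readings describe the same voice; the indirect one is simply less sensitive
to an error made when locating a single period boundary on the waveform.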
The spectrograph
Later, in the 1950s, thanks to the development of electronic amplifiers, the
sound spectrograph appeared and it became possible to perform an acoustical
analysis of speech segments of 2.4 s from speech recordings made elsewhere
(Fig. 2.3). The information provided by this tool became quickly central in
Figure 2.2 Measure of the laryngeal period, directly (top), or indirectly from
the duration of a packet of ten periods.
Figure 2.3 (labels recovered from the figure: 0 – 4.000 Hz, scale magnifier,
45 Hz, 2.4 seconds).
First results
Among the first changes of point of view pertaining to phonology, the use of a
kymograph by Rousselot (1901, 1908) and then by Grammont (1933) led to a
better understanding of the nature of stressed vowels and the importance of their
duration. Later, the advent of the spectrograph made possible one of the first
phonetic if not phonological descriptions of basic intonation patterns in French
based on acoustical analysis (Fig. 2.4) by Delattre (1966). These results are still
used today as a basis for the phonological description of French speech prosody.
The description of pitch contours relied on the analysis of recordings stored
on vinyl that were actually inserted in an issue of The French Review where the
paper was published. Everybody (having access to a spectrograph) could
therefore verify the experimental results and review the hypotheses existing
at the time on sentence intonation.
Later, thanks to the realization of a real-time pitch analyzer working reliably
in a large laryngeal frequency range (70 Hz to 500 Hz; Léon & Martin, 1969),
extensive analysis of French prosodic data led to a first model of sentence
intonation using the concept of prosodic structure (Martin, 1975). In this
model, stress groups (sequences of words with only one stressed syllable) are
assembled hierarchically into a prosodic structure according to specific melo-
dic contours located on stressed syllables.
The emergence of relatively inexpensive computers in the 1980s
brought the development of easier-to-use software pitch analyzers. These
software programs gradually led researchers to apply the phonological AM
model (Goldsmith, 1976) to sentence intonation data, so that the AM model
quickly became dominant in intonation phonology. In this model, the prosodic
structure organizes prosodic events hierarchically in non-recursive levels: a
first level assembles syllables σ into content words Wc (verbs, nouns,
adjectives, and adverbs) and function words Wf (conjunctions, pronouns,
prepositions, articles, verb auxiliaries); a second level groups these words
into accentual phrases (APs); a third level groups APs into intonation phrases
(IPs); and finally a phonological utterance (PU) groups sequences of
intonation phrases (see Chapter 4).
Figure 2.4 The ten basic intonation patterns for French by Delattre (1966).
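The non-recursive layering of the AM prosodic structure can be pictured as a
nested data structure in which each level contains only units of the level
immediately below. The Python sketch below is one possible representation; the
class names, the informal orthographic syllable split, and the grouping of the
example sentence are assumptions made for illustration, not an analysis taken
from the AM literature.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Word:
    syllables: List[str]  # syllables, in temporal order
    is_content: bool      # True for a content word Wc, False for a function word Wf


@dataclass
class AccentualPhrase:    # AP: groups words
    words: List[Word]


@dataclass
class IntonationPhrase:   # IP: groups APs
    phrases: List[AccentualPhrase]


@dataclass
class PhonologicalUtterance:  # PU: groups IPs
    phrases: List[IntonationPhrase]


def syllable_count(utterance: PhonologicalUtterance) -> int:
    """Count syllables by walking down the levels PU > IP > AP > word."""
    return sum(
        len(word.syllables)
        for ip in utterance.phrases
        for ap in ip.phrases
        for word in ap.words
    )


# Illustrative grouping only: "les enfants sont partis" as two APs in one IP.
utterance = PhonologicalUtterance([
    IntonationPhrase([
        AccentualPhrase([Word(["les"], False), Word(["en", "fants"], True)]),
        AccentualPhrase([Word(["sont"], False), Word(["par", "tis"], True)]),
    ])
])
print(syllable_count(utterance))
```

Because an accentual phrase can hold only words and an intonation phrase only
accentual phrases, the types themselves rule out recursion between levels.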
The prosodic events are aligned on AP stressed syllables and are described as
sequences of tones transcribed with the ToBI notational system. This system
uses High (H) and Low (L) symbols to represent tone targets as perceived or
observed on fundamental frequency curves obtained from the speech signal
acoustic analysis (an example is given in Fig. 2.5).
The prosodic structure in the AM model was applied in the 1990s to
data obtained from the then recently available speech analysis software
Waves™. This opened a new playground for phonologists and syntacticians
interested in sentence intonation, as it offered much simpler access to acoustic
data than the spectrographs available at that time. Unfortunately, fundamental
frequency curves displayed by Waves™ were frequently found unreliable
despite the use of high-quality speech recorded in soundproof rooms.
Some of these errors, such as frequency doubling and halving, were so
frequent that their manual correction became part of the ToBI description
manuals (Beckman & Ayers Elam, 1997).
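Frequency doubling and halving leave a recognizable trace in a pitch track: an
isolated value close to twice or half the values around it. The Python sketch
below flags such frames by comparing each voiced value with the median of its
voiced neighbours. It is a minimal illustration, not the correction procedure of
the ToBI manuals or of any pitch tracker; the function name, the window size,
the tolerance, and the sample track are assumptions.

```python
from statistics import median
from typing import List


def flag_octave_errors(f0_hz: List[float], window: int = 2,
                       tolerance: float = 0.15) -> List[int]:
    """Return indices of voiced frames that look like doubling or halving errors.

    f0_hz holds one value per frame, with 0 for unvoiced frames.
    """
    flagged = []
    for i, value in enumerate(f0_hz):
        if not value:
            continue  # unvoiced frame: nothing to check
        first = max(0, i - window)
        neighbours = [v for j, v in enumerate(f0_hz[first:i + window + 1], start=first)
                      if j != i and v]
        if len(neighbours) < 2:
            continue  # not enough voiced context to decide
        ratio = value / median(neighbours)
        # A ratio near 2 suggests doubling, a ratio near 0.5 suggests halving.
        if abs(ratio - 2.0) <= 2.0 * tolerance or abs(ratio - 0.5) <= 0.5 * tolerance:
            flagged.append(i)
    return flagged


# Hypothetical track in Hz: frame 3 is doubled, frame 7 is halved.
track = [120, 122, 121, 242, 123, 0, 118, 60, 119, 120]
print(flag_octave_errors(track))
```

A check of this kind only points to suspect frames; deciding whether a jump is
an error or a genuine change of register remains a task for the operator.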
Figure 2.5 An example of analysis with the software Waves™ of the sentence
Jim builds a big daisy-chain (from ToBI 1999). The erroneous segments (on
big and daisy) are circled.
The generalized use of Praat from 2000 onwards delivered more reliable
data, but users frequently choose a display combining a wideband spectrogram
with the fundamental frequency curve (Fig. 2.6).
The fact that this representation became a kind of standard in the field is
rather unfortunate as even in speech recorded in laboratory conditions errors
occasionally do occur. A simultaneous display of a narrowband spectrogram
would be more advisable, as it would allow even moderately knowledgeable
operators to locate immediately potential errors in the pitch curve from the
observation of voice harmonics displayed simultaneously (see Chapter 1).
More recently, the elaboration of rather large spontaneous speech corpora
has led to the development of more complex software programs such as
WinPitch (from 1996 onwards), embedding various sophisticated tools to
transcribe, annotate, and align recorded data (Fig. 2.7).
A somewhat detailed description of WinPitch is included at the end of this
book. One of the important features of the program is the ability to apply
multiple fundamental frequency tracking algorithms on various segments of
recorded speech. It is then possible to obtain the best fundamental frequency
curves in adverse recording conditions, by selecting appropriate algorithms.
be done in an extra step. Some dedicated software does exist (e.g. EasyAlign,
Goldman, 2011; Penn Phonetics Lab Forced Aligner, Yuan & Liberman, 2008),
but requires reasonably good quality speech recordings, without too much
echo, speech overlapping, frequency distortion (e.g. from high compression
mp3 coding), and so on to operate properly.
WinPitch is an example of another approach to this problem. Instead of
relying on automatic processing using tools available from speech recognition
techniques, its aligner uses the capacity of human operators to handle speech
quality problems, assuming that operators have better overall speech percep-
tion faculties than machines.
The WinPitch alignment engine is based on the following approach: psy-
choacoustics experiments have shown that subjects are capable of correlating
moving objects with speech if the speech rate is reduced by at least 30 percent.
Using a PSOLA (Pitch Synchronous Overlap and Add), Autocorrelation or
Phase vocoder type re-synthesis for natural speech, the aligner can play back
speech with an adjustable reduced speech rate (down to seven times slower),
allowing the user to click on written words (or any other unit) of the text corpus
corresponding to the running slower speech. The speech rate is adjustable in
real time with a mouse wheel, so that the operator can continuously adjust the
output rate to suit the alignment task.
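When speech is played back more slowly, a click made by the operator is timed
on the slowed-down playback and has to be converted back to a position in the
original recording. The Python sketch below shows one way to do this when the
slow-down factor changes during playback. It is an assumption about how such a
conversion could work, not a description of WinPitch internals; the function
name and the data layout are hypothetical.

```python
from typing import List, Tuple


def playback_to_original(click_time_s: float,
                         rate_changes: List[Tuple[float, float]]) -> float:
    """Convert a click time on slowed playback to a time in the recording.

    rate_changes lists (playback_time_s, slow_down_factor) pairs sorted by
    time, the first one at 0.0. A factor of 2.0 means two times slower.
    """
    if not rate_changes or rate_changes[0][0] != 0.0:
        raise ValueError("rate_changes must start at playback time 0.0")
    original_s = 0.0
    for index, (start_s, factor) in enumerate(rate_changes):
        if click_time_s <= start_s:
            break  # the click happened before this rate setting began
        if factor < 1.0:
            raise ValueError("slow-down factor must be at least 1.0")
        if index + 1 < len(rate_changes):
            end_s = rate_changes[index + 1][0]
        else:
            end_s = float("inf")
        played_s = min(click_time_s, end_s) - start_s
        original_s += played_s / factor  # slowed playback covers less speech
    return original_s


# Hypothetical session: two times slower for 4 s, then four times slower.
settings = [(0.0, 2.0), (4.0, 4.0)]
print(playback_to_original(6.0, settings))
```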
This aligner presents the following important features:
1. It will work with degraded quality recordings, which are common in
spontaneous speech corpora (recordings made on the street, with machine
background, multiple speakers, echo, etc.), which is not presently the case
for existing automatic aligners.
2. The error rate will depend on the operator, and is expected to be very low
(5%) compared to automatic methods (reporting 25–30% error rate) thanks
to the adjustable speech rate. Graphic tools are available to make an adjust-
ment if needed.
3. It will be insensitive to speakers’ dialectal variations, whereas automatic
segmentation and alignment based on Hidden Markov Model speech
recognition technology requires training for each speaker’s voice to be
effective, and is therefore very sensitive to those variations.
4. It integrates user-friendly graphic commands with the mouse to correct
possible misalignment errors.
3 Transcription systems
Figure 3.1 Pitch curve using a linear scale in Hz (French sentence agneau ou
veau faut que le beau rôti soit chaud “Lamb or veal, the beautiful roast must
be hot”). The waveform, representing directly the speech sound vibrations, is
displayed on the bottom of the graph.
to a logarithmic frequency scale will also change the graphic aspects of the
melodic curves and possibly influence their interpretation by researchers.
Furthermore, the creaky mode of phonation, due to an irregular or alternate
(i.e. consecutive long and short periods) mode of vibration of the vocal
folds, will give other problems of interpretation of data, as in this case the
fundamental frequency is generally not correctly evaluated and is therefore
displayed with erroneous values. An operator will need to
learn how to identify certain remarkable passages, so as to segment the speech
into relevant sections for phonological analysis.
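The effect of choosing a logarithmic rather than a linear frequency scale can be
made concrete with the usual conversion of a frequency ratio into semitones,
12 × log2(f2 / f1). The Python sketch below is only an illustration; the function
name and the frequency values are assumptions.

```python
import math


def interval_in_semitones(f_start_hz: float, f_end_hz: float) -> float:
    """Size of a pitch movement in semitones (positive for a rise)."""
    if f_start_hz <= 0 or f_end_hz <= 0:
        raise ValueError("frequencies must be positive")
    return 12.0 * math.log2(f_end_hz / f_start_hz)


# Two hypothetical rises: 100 to 200 Hz and 200 to 400 Hz.
low_rise = (100.0, 200.0)
high_rise = (200.0, 400.0)

for start, end in (low_rise, high_rise):
    print(end - start, "Hz,", round(interval_in_semitones(start, end), 2), "semitones")
```

On a linear scale in Hz the second rise looks twice as large as the first; on a
logarithmic scale both span one octave and are drawn with the same height.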
Moreover, the acoustical measurement of the fundamental frequency is
occasionally erroneous. This characteristic, frequently due to poor recording
conditions, unsettles more than one beginner in this field of research. After
all, measuring instruments, acoustic or otherwise, are a priori designed to
function correctly (within the limits specified in their respective user manuals,
which are not necessarily read by everybody). The same applies to pitch
analyzers. Many more or less reliable software programs allow for pitch curve
visualization. While Praat is one of the most popular today, a program like
WinPitch, as its name indicates, is more specifically adapted to prosodic
analysis: it offers multiple fundamental frequency tracking algorithms, which
reduce the risk of errors, and an assisted alignment process, which is useful
for transcribing large corpora.
The purpose of transcription, as for any other physical phenomenon, is to
filter and select information in order to make data interpretable. Most of the
existing transcription systems import some principle that is implicitly or
explicitly part of the system. For instance, ToBI, a transcription system based
on the perception of high and low pitch targets, inherits from Pike’s (1945)
notation for English as well as from perception experiments carried out at IPO
(Instituut voor Perceptie-Onderzoek) in Holland in the 1970s, using a vocoder
to manipulate speech intonation (’t Hart et al., 1990). Another system, the
prosogram (Mertens, 2004), is based on the assumed validity of the glissando
threshold for voiced speech sounds (despite the fact that this threshold has been
established for the synthetic isolated vowel [a]). Other systems such as Analor
(Avanzi et al., 2008) or IntSint (Hirst & Espesser, 1993) imply the validity of
perception tests carried out in specific conditions: Analor relies on a set of
defined acoustic parameters, IntSint on the equivalence of the overall perceived
sentence intonation with an assumed equivalent quadratic fundamental frequency
function. The AMPER project (Contini et al., 2002) eliminates the rhythmic
parameter, apparently judged less important, by aligning pitch values of dif-
ferent realizations of read sentences syllable by syllable.
In most cases, syllabic duration and intensity are absent from these
transcriptions, and only pitch movements are retained as pertinent informa-
tion. This is especially true for ToBI, which is the de facto standard for
sentence intonation transcription. However other systems (including the
one used in this book) use a more detailed description of melodic contours
including their duration.
Selecting data
Historical background
Prosodic transcription systems found in the twentieth-century linguistic litera-
ture are mostly of Anglo-Saxon origin and were elaborated by phoneticians as
well as phonologists. Their transcriptions are based on auditory perception,
without the assistance of acoustic analysis devices, which appeared only later.
Some of these systems reveal theoretical options concerning the status of
prominent syllables in the sentence, options which one finds almost every-
where today. The instruments available at the time (the kymograph since 1847,
the spectrograph after 1950) were complex and tedious to handle and did not
allow in practice the analysis of long stretches of speech. The use of acoustic
analysis spread gradually only after 1970, together with the availability of
personal computers.
Some early prosodic transcriptions were inspired by musical notation
(Figs. 3.2 and 3.3). Through the duration of the musical notes and their
grouping in rhythmic units, this kind of musical notation in some ways takes
into account the sentence rhythm, an attribute which is seldom found in
contemporary transcriptions.
Figure 3.3 Musical transcription used by Fónagy and Magdics (1963) for
English.
During the same period, other systems appeared, using the verticality of
graphic space without referring precisely to a musical or frequency scale. These
iconic notations represent non-prominent syllables by points, prominent sylla-
bles as static tones by horizontal strokes, and pitch variations on stressed
syllables by tilted or curved strokes (Figs. 3.4 and 3.5).
Later, the theoretical importance of the stressed syllables began to
appear. As a precursor of the ToBI notation, Pike (1945) transcribed
Figure 3.4 Unstressed syllables, static tones, and contours for English
(Armstrong & Ward, 1931).
Figure 3.5 Unstressed syllables, static tones, and contours for German (von
Essen, 1956).
Figure 3.6 Stressed and final syllables pitch transcribed as static tones in
English (Pike, 1945).
Figure 3.7 Melody contours of groups for English oral (Palmer & Blandford,
1924).
Figure 3.9 Simplified musical range has four levels for French: 1 Low,
2 Average, 3 High, 4 Acute (Léon & Martin, 1969).
syllable to indicate a contour. Thus in Figure 3.8, the syllable dark presents a
high and flat tone, whereas the syllable blue presents a melody contour slightly
rising, and black a downward contour.
The French tradition uses a simplified musical notation with four levels
(Delattre, 1966). These levels, usually numbered from 1 to 4, correspond to
perceived or measured low, average, high, and acute pitch. This notation has
been used for a long time in many phonetic research projects as well as in the
teaching of the intonation of French as a foreign language (M. Léon, 1964)
(Fig. 3.9).
Many other systems were also proposed (Léon & Martin, 1969); however, a
tendency gradually appeared over time to sort the prosodic events related to
syllables in three classes: non-prominent syllables (not accentuated), promi-
nent syllables accentuated thanks to their (perceived) pitch static level, and
prominent syllables accentuated thanks to their pitch variation. One finds this
classification in modern semi-automatic systems of transcription (e.g. the
Prosogram [Mertens, 2004]).
The prosogram
Mertens (2004) implemented in a Praat script an automatic prosodic
transcription operating on pitch curves. This transcription is based on the
assumption that a change in time of the fundamental frequency F0 is perceived
either as a static tone or as a pitch movement according to the speed of
variation of F0, and therefore of the frequency change in the syllable per unit
of time. A glissando threshold value determines whether the perceived pitch is
a static tone or a pitch variation.
Figure 3.11 Prosogram for the automatic determination of pitch starting from
the syllabic segmentation and from the glissando threshold (here equal to
0.32/sec²) (Mertens, 2004).
This threshold was established for pure sounds by Seargent and Harris (1962),
for synthetic vowels initially by Rossi (1971, 1978), and then by ’t Hart (1976)
who used a semitones scale. If the variation – assumed to be linear – is lower than
the threshold, the pitch movement will correspond to a static tone at a level
equivalent to 2/3 of the final frequency of the variation (rising or falling). If it is
higher, it will be perceived as a pitch variation rather than as a static tone. Segmenting
the speech into syllable or phoneme-like units allows for the automatic repla-
cement of the actual pitch curve by either straight or rising or falling variations,
the latter corresponding to a perceived glissando and consequently to a segment
perceived as stressed (Fig. 3.11).
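The decision described above can be sketched in a few lines of Python. The
sketch takes the coefficient 0.32 given in the caption of Figure 3.11 and
assumes that the threshold is expressed in semitones per second and divided by
the square of the segment duration in seconds; it also assumes a linear
movement between the first and last F0 values. The function names are
hypothetical, and this is not Mertens's Praat script.

```python
import math


def semitone_interval(f_start_hz: float, f_end_hz: float) -> float:
    """Pitch movement in semitones between two frequencies."""
    return 12.0 * math.log2(f_end_hz / f_start_hz)


def glissando_threshold(duration_s: float, coefficient: float = 0.32) -> float:
    """Assumed threshold in semitones per second: coefficient / duration squared."""
    return coefficient / duration_s ** 2


def classify_movement(f_start_hz: float, f_end_hz: float, duration_s: float,
                      coefficient: float = 0.32) -> str:
    """Return 'glissando' if the rate of change exceeds the threshold, else 'static'."""
    rate = abs(semitone_interval(f_start_hz, f_end_hz)) / duration_s
    if rate > glissando_threshold(duration_s, coefficient):
        return "glissando"
    return "static"


# Hypothetical segments: a clear rise over 200 ms, a tiny drift over 100 ms.
print(classify_movement(150.0, 200.0, 0.200))
print(classify_movement(150.0, 152.0, 0.100))
```

Since the duration enters the threshold squared, a small error in the
segmentation shifts the threshold considerably, so the reliability of the result
depends heavily on the reliability of the segment boundaries.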
The problems of interpretation and reliability of the representation by a
prosogram are multiple. It is clear that the results depend directly on the
reliability of the speech segmentation, as segment durations enter directly
into the glissando value for this segment. In addition, the perception of the pitch
of harmonic sounds depends not only on the value of the fundamental fre-
quency but also on the frequency difference existing between two consecutive
harmonics as well as on the distribution of their respective intensities.
The occurrence of nonlinear pitch variation is also a problem. The nonlinear
variations of contours (convex, concave, or bell curved) are common, leading to
errors in the evaluation of a proper glissando threshold, whether the fundamental
frequency scale is linear or logarithmic. Moreover, the variation of intensity
inside the vowel is also a perturbing factor: Rossi (1978) showed that the
threshold of glissando decreases when the intensity increases inside the syllable,
whereas the threshold increases in the event of reduction in the intensity.
In fact this process detects only the differences possibly perceived between
pitch variation and the absence of variation, according to the value adopted for
the parameter of glissando. The interpretation which is often made from the
Prosogram (Simon et al., 2008) tends then to conclude that the only prominent
syllables are those realized with a glissando, although some syllables can be
prominent only because of longer duration, whether pitch variation is perceived
or not. Likewise the bell-curved pitch movements, frequently met in regional,
or idiosyncratic realizations (e.g. in political discourse) could be wrongfully
interpreted as below the glissando threshold as their fundamental frequency
values at the beginning and end of syllables could be close in value (Martin,
2013a).
ToBI
ToBI is one of the most popular systems used almost everywhere today
for many languages. ToBI (acronym of Tone and Break Indices) is inspired
by earlier systems, notably first by Trager and Smith (1951), who use a
system with four tones, then by the description of Swedish accent by Bruce
(1977), the autosegmental theory elaborated in the wake of J. Goldsmith’s
(1976) work on Igbo (a language spoken in Nigeria), and the work of
Liberman and Prince (1977). This system is local and transcribes only
prosodic events assumed to be pertinent in some theoretical approach, such
as the AM model.
A ToBI transcription of a sentence (Beckman et al., 2005) comprises four
tiers:
1. An orthographical tier;
2. A tier of break indices, noted from 1 to 4 on a perceptual scale related to
the degree of cohesion perceived between units;
3. A tonal tier where the pitch events are recorded; these include the boundary
tones and the pitch accents;
4. A comment tier.
In an attempt to use universal features, the pitch events are noted by High (H)
and Low (L) tones, which are interpreted phonetically as pitch target points toward
which the considered pitch movements tend, or, according to some practitioners,
as points of inflection of the pitch curve. The attribution of H and L tones rests on
phonological definitions, together with a few phonetic variations.
An underlying phonological system is associated with every sequence with
the following basic elements elaborated for English:
Specific pitch movements on stressed syllables
H* High pitch target (peak accent);
L* Low pitch target (low accent);
L*+H Falling rising pitch movement (scooped accent);
L+H* Rising pitch movement (rising peak accent);
H+!H* High pitch target preceded by a high tone and followed by a fall
in terrace (downstep).
Sentence stress
L- Low, on a boundary of intermediate component
H- High on a boundary of intermediate component
!H High and going down in terrace (downstepping)
Boundary tones
L% Low and sentence final
%L Low and sentence initial
H% High and sentence final
%H High and sentence initial
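The inventory listed above can be treated as a small closed vocabulary, which
makes it easy to check a tonal tier mechanically. The Python sketch below sorts
each label into one of the three groups given in the list. The function names,
the space-separated tier format, and the sample tier are assumptions made for
illustration; only the labels themselves come from the list.

```python
from typing import List, Tuple

# Labels copied from the three groups of the inventory above.
STRESSED_SYLLABLE = {"H*", "L*", "L*+H", "L+H*", "H+!H*"}
SENTENCE_STRESS = {"L-", "H-", "!H"}
BOUNDARY_TONE = {"L%", "%L", "H%", "%H"}


def classify_label(label: str) -> str:
    """Return the group a label belongs to, or raise ValueError if unknown."""
    if label in STRESSED_SYLLABLE:
        return "stressed syllable"
    if label in SENTENCE_STRESS:
        return "sentence stress"
    if label in BOUNDARY_TONE:
        return "boundary tone"
    raise ValueError(f"label not in the inventory: {label!r}")


def classify_tier(tonal_tier: str) -> List[Tuple[str, str]]:
    """Classify every space-separated label of a tonal tier."""
    return [(label, classify_label(label)) for label in tonal_tier.split()]


# Hypothetical tonal tier, not taken from a real transcription.
for label, group in classify_tier("%H L+H* L- H* L%"):
    print(label, "->", group)
```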
The problem with this system of transcription stems from the fact that the
transcriber has to carry out simultaneously a perception and a phonological task
since some symbols require the identification of intermediate components in
the prosodic structure.
Moreover, slow or fast pitch rises (or falls), small or large pitch excursions,
and concave or convex variations may all be transcribed by the same sequences
of symbols (Fig. 3.12). The duration of prosodic events is not transcribed in
ToBI, as each of the F0 movements in Figure 3.12 will be represented a priori
by the same sequence L H*: (a), which is shorter than (b), will be noted L H*,
as will (c), which has a smaller excursion than (a). By the same token, (a),
the concave (d), and the convex (e) will all three be transcribed L H*, with a
possible alternative L+H* for (d).
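The loss of information discussed here can be shown with a deliberately crude
Python sketch: a labelling that keeps only the fact that a contour starts low
and ends high gives the same output for rises that differ in duration, in
excursion, and in shape. This toy function is not a ToBI labelling procedure;
its name, its threshold, and the five contours are hypothetical and only loosely
modelled on cases (a) to (e).

```python
from typing import Dict, List, Tuple


def two_target_label(f0_hz: List[float], min_rise_hz: float = 10.0) -> str:
    """Keep only the low start and high end of a contour, nothing else."""
    if f0_hz[-1] - f0_hz[0] >= min_rise_hz:
        return "L H*"
    return "other"


# Each contour is (duration in seconds, F0 samples in Hz).
contours: Dict[str, Tuple[float, List[float]]] = {
    "a": (0.2, [100, 125, 150, 175, 200]),  # linear rise
    "b": (0.4, [100, 125, 150, 175, 200]),  # same rise, twice as long
    "c": (0.2, [100, 110, 120, 130, 140]),  # smaller excursion
    "d": (0.2, [100, 105, 115, 140, 200]),  # curved rise, late acceleration
    "e": (0.2, [100, 160, 185, 195, 200]),  # curved rise, early acceleration
}

labels = {name: two_target_label(samples) for name, (_, samples) in contours.items()}
print(labels)
```

The duration stored with each contour is never consulted by the labelling, which
is exactly the point: whatever distinguishes the five rises is gone once only two
tonal targets are kept.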
This is especially a problem for French, as some melodic contours do
contrast by the amplitude of frequency change. Their transcription may result
in the same LH* sequence. However, recent adaptation suggests using some
kind of LHH* notation, which in practice means the abandonment of a bitonal
system (Jun, 2012).
Another possibly surprising characteristic of the ToBI system consists in
aligning the high tone H* with the end or the beginning of the stressed syllable,
in order to give a similar account of downward and rising contours, although
Figure 3.12 Variations of rising melody contours transcribed with the ToBI
notation.
Figure 3.13 In the ToBI transcription, a high tone can correspond to a rising
(on the left) or a falling contour (on the right), according to the alignment of
the high tone H* associated with the end or to the beginning of the syllable.
Analor
Analor (Avanzi et al., 2008) is a tool conceived to model and process
prosodic constructions semi-automatically at the different levels of
grammatical analysis. It is actually implemented as a Praat script, with a
specific display layout. With a set of assumed appropriate parameters (now
user selectable), it automatically detects the boundaries of what the authors
call intonative periods, characterized originally by a melodic change of at
least four semitones, followed by a pause exceeding 300 ms and a difference
between the first melodic value after the pause and the last before the pause
of at least three semitones (Fig. 3.14). More generally, it can also detect
prominent syllables, possible candidates to be stressed, or accented syllables.
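The three original criteria for an intonative period boundary can be sketched
in a few lines of Python. This is only an illustration of the thresholds quoted
above, not the Analor script itself: the function names, the argument layout,
and the reading of "melodic change" as the interval between the start and the
end of the last contour before the pause are assumptions made for the example.

```python
import math


def semitones(f_from_hz: float, f_to_hz: float) -> float:
    """Signed interval in semitones between two frequencies given in Hz."""
    return 12.0 * math.log2(f_to_hz / f_from_hz)


def is_period_boundary(contour_start_hz: float, contour_end_hz: float,
                       pause_s: float, next_onset_hz: float,
                       min_move_st: float = 4.0,
                       min_pause_s: float = 0.3,
                       min_reset_st: float = 3.0) -> bool:
    """Apply the three thresholds quoted in the text (hypothetical helper).

    contour_start_hz, contour_end_hz: F0 at the start and end of the last
        melodic movement before the pause (assumed interpretation).
    pause_s: duration of the following pause, in seconds.
    next_onset_hz: first F0 value after the pause.
    """
    move = abs(semitones(contour_start_hz, contour_end_hz))
    reset = abs(semitones(contour_end_hz, next_onset_hz))
    return move >= min_move_st and pause_s > min_pause_s and reset >= min_reset_st


# A fall from 200 Hz to 150 Hz (about 5 semitones), a 450 ms pause,
# and a restart at 190 Hz (about 4 semitones above the last value).
print(is_period_boundary(200.0, 150.0, 0.45, 190.0))
```

Keeping the thresholds as keyword arguments mirrors the fact that the
parameters are described as user selectable.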
Improvements in the system are obtained by an automatic adjustment of the
Analor parameters from global properties of the speech sections transcribed
(Avanzi et al., 2011). Of course, results are directly dependent on segmentation
criteria, which may limit their phonological validity. In particular, many
occurrences of continuation majeure (IP boundary tones in AM terminology)
and even of conclusive melodic contours are not followed by a pause, which
restricts the usefulness of this approach.
[Figure: Analor display of a melodic curve (frequency axis 110 to 220, time
axis 8 to 12 s) with the segmented transcription "kilomètres) (tout autour de
cet endroit)] [(je suis arrivée (euh au Kenya)]".]
Transcription as theory
Transcribing prosodic events obviously reduces the apparent complexity of
acoustic data and makes their interpretation apparently easier, but also
necessarily biased. A particular problem lies in the choice and definition of
phonetic and phonological transcriptions. A phonetic transcription of prosody
should include all the details that are a priori important for further
phonological description, but this implies elaborating a satisfactory list of
features without knowing in advance their phonological pertinence. It may then
be simpler to use the acoustic data themselves rather than a transcription of
any flavor. A phonological transcription of prosody, on the other hand, should
represent only data pertinent for the theoretical approach adopted, i.e. only
the features that would give a proper account of the role and function
attributed to prosodic events, for instance the indication of the sentence
prosodic structure.
A system like ToBI is positioned somewhere between the phonetic and the
phonological domain. Indeed, as reflected by many reports of annotator con-
sistency (Wightman, 2002), transcriptions are made from a lexicon of prosodic
events (Frota, 2009), and are directly inspired by the shape of melodic curves in
the language in question. In this sense, ToBI is close to the principles governing
International Phonetic Alphabet (IPA) elaboration.
Badiou (1969) and Ochs (1979) have demonstrated that the choice of a
transcription system determines the theory that uses this system, whereas, in
linguistics, and in phonology in particular, it should be the reverse: the models
derived from a theory should determine the transcription system used to
analyze the experimental data. Although this observation seems rather obvious,
this opinion is not shared by everybody, as new researchers introduced to the
field accept without question the dominant notation system. Their choice may
of course be pivotal in the interpretation of data later.
The transcription system I adopted reflects directly and explicitly the theo-
retical assumptions underlying the analysis, and is therefore clearly phonolo-
gical. Indeed, the set of descriptive features chosen reflect directly the relations
of dependency of prosodic events in their prosodic context, relations directly
indicating the sentence prosodic structure. Another possibility, of course,
would be to ignore this phenomenon at least in a first step, for example by
installing prosodic events in some deep phonological structure, even if it means
proposing adjustment rules that would give a proper account of the observed
data.
I do not believe this is a suitable solution in phonology, as it allows any
appropriate mechanism to be built that would explain prosodic (and other)
data, whatever they are. It is always possible to find a set of rules that
would generate the proper forms of ANY linguistic events from ANY deep
structure form (cf. the Ugly Duckling theorem in logic, Watanabe, 1969).
While applying Occam’s razor should give linguists some warning signal, it
seems that working in a deep structure environment allows too many
possibilities in phonological descriptions, at the expense, often postponed
to better times, of proposing appropriate rules generating surface structure
forms that would correspond to the data in a more satisfactory manner.
(2006) observed only 19% to 49% agreement in the judgment made on syllabic
prominence on the same text by expert phoneticians.
A question then arises: how can experts, familiar with the techniques of
transcription and specialists in prosody, who are free to listen at will to
the speech segments in which they are to detect syllabic prominences, put
forth such divergent judgments? Is syllabic prominence such an elusive
concept?
We know from Saussure that a linguistic object does not exist if it is deprived
of signification. The segmentation and the categorization of prosodic events
rest on knowledge of which the speaker is, a priori, unconscious. Thus, the
user of the “syllabic prominence” object operates through one process among
other unconscious processes while decoding a sentence. The average listener is
not consciously aware of a possible function of perceived prominences. The
semiotician and linguist Luis Prieto (1975) would say that the listener uses a
system of primary categorization elaborated during the acquisition and the
practice of the language.
The situation is quite different when the average listener becomes conscious
of the operation, for example if asked for a judgment on syllabic prominence,
with instructions such as “do you perceive this syllable as prominent,
accentuated, or insistent?” This kind of instruction will bring into play
another type of knowledge, implying secondary systems of categorization
operated by the listener. It is clearly the strategy adopted by Simon et al.
(2008) to obtain 93 percent agreement in the transcription of prominences in a
standard text. It seems illusory to think that the automation of perception
will protect us from an erroneous linguistic interpretation. Such a process
will only give us results based on a set of assumptions, embedded in the
algorithms of the perception software, which may be linked only remotely to
the perception of intonation by listeners.
A brief description
Given that the Autosegmental-Metrical (AM) model has been and is still
dominant in the field of intonation phonology, I will briefly recall some of its
essential characteristics and its associated prosodic transcription system, ToBI,
relying on a recent book presenting a relatively up-to-date version of the AM
model (Feldhausen, 2010). The main goal of this model is to “explain the
complexity and the diversity of fundamental frequency (F0) contours” where
pitch is the perceived F0 and intonation the variation of the fundamental
frequency while speaking. Intonation corresponds to “the overall melody of
an utterance, as reflected by its tonal or F0 contour” (Hualde, 2003). In the AM
model, however, only particular points of the utterance are specified for tone.
These points are either prominent syllables or phrasal boundaries at the pho-
nological level. The rest of the contour is filled in by phonetic interpolation
between tonally specified points, assumed to correspond to actual fundamental
frequency values.
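The idea of a contour specified only at a few tonal targets and filled in by
interpolation can be illustrated with a short Python sketch. Linear
interpolation, the function name, and the target values below are assumptions
chosen for the example; the AM literature does not commit to this particular
interpolation function.

```python
from bisect import bisect_right


def interpolate_f0(targets, t):
    """Return an F0 value (Hz) at time t (s) from sparse tonal targets.

    targets: list of (time_s, f0_hz) pairs with strictly increasing times,
        standing for the tonally specified points (hypothetical values).
    Outside the first and last targets the nearest target value is held.
    """
    times = [time for time, _ in targets]
    if t <= times[0]:
        return targets[0][1]
    if t >= times[-1]:
        return targets[-1][1]
    i = bisect_right(times, t)          # index of the first target after t
    (t0, f0), (t1, f1) = targets[i - 1], targets[i]
    return f0 + (f1 - f0) * (t - t0) / (t1 - t0)


# Three illustrative targets: a low tone, a high pitch accent, a low boundary.
targets = [(0.0, 120.0), (0.5, 180.0), (1.0, 100.0)]
print(interpolate_f0(targets, 0.25))   # halfway up the rise: 150.0
print(interpolate_f0(targets, 0.75))   # halfway down the fall: 140.0
```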
There are two types of tonal units (also called prosodic events): pitch
accents and boundary (edge) tones. Pitch accents are associated with
metrically strong syllables of a word or a sentence. They are strictly
locally determined, do not interact with each other, and are categorically
distinct from the other prosodic events. Only two tones, H(igh) and L(ow),
suffice to describe tonal units, which can be monotonal, bitonal, or tritonal.
Bitonal pitch accents combine two basic tones (e.g. L+H*, H*+L, etc.), the *
symbol indicating the association of the tone with a pitch accent and thus
with a metrically strong syllable.
Boundary tones mark the edge of prosodic constituents. Pierrehumbert
(1980) distinguished two kinds of edge tones: final boundary tones (noted
L% or H%, but %L, %H initial boundary tones may also exist), and phrase
accents (annotated L- or H-). Boundary tones mark the start or the end of an
intonation phrase (IP). They are also independent from pitch accents. Their
function is to mark limits (the “left” or “right” side) of IPs, the higher level of
Accent Phrases grouping in the prosodic structure.
Phrase accents are freestanding unstarred tones, not aligned a priori on any
syllable, but their existence was questioned by Beckman and Pierrehumbert
(1986), who replaced them by intermediate (intonative) phrases (ip’s) as an
additional component of the prosodic hierarchy. Phrase accents were thus
replaced by ip boundary tones. Today, the actual existence of the ip is
debated: according to Selkirk (2005), ip’s are not marked by specific boundary
tones, whereas for Sun-Ah Jun (2005) some markers do exist in Korean or in
French (Michelas & D’Imperio, 2010), for example.
All these concepts operate in the framework of autosegmental-metrical pho-
nology, metrical following Liberman and Prince (1977) and autosegmental
according to the work of Leben (1971) and Goldsmith (1976). Essentially,
autosegmental means that sequences of tones (as annotated with ToBI combina-
tions of H and L symbols) behave independently from the syllabic segments to
which they are associated. Tones are thus autosegments, each tone being asso-
ciated with one or more elements in the tonal structure called the Tone Bearing
Unit (TBU). On the other hand, they are metrical because relative prominence is
assigned through metrical grids to elements within phrases composing the
utterance. The claimed advantage of a metrical grid over a simple assignment
of stressed syllables is to allow degrees of accent determined by rules such as
S(trong) → w(eak) w(eak) . . . s(trong), i.e. a node generates two or more
syllables with the last one being the strongest. In other words, the rightmost
syllable in node branches is supposed to have maximal prominence. From a
hierarchy of syllables up to the root, it is then possible to calculate the degree of
stress of each syllable, as shown in Figure 4.1. By representing these degrees of
stress on a grid, it is also possible to predict events caused by stress clash (i.e. the
presence of two consecutive stressed syllables; cf. Dell, 1984).
The degrees of prominence, represented by the number of * symbols placed
on the grid on top of syllables, are obtained by applying the s → w w . . . s
rule on nodes of a metrical tree congruent to the syntactic structure of the
sentence considered. Counting the number of strong (s) nodes obtained by
percolating to the root of this tree leads to the value of the degree of stress
(1 is the lowest, and 5 the highest stress) (see Fig. 4.2).
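[Figure 4.2: binary metrical tree with weak (W) and strong (S) branches at each
level; the degrees of stress shown under the syllables are 1 2 2 3.]
Figure 4.2 Degrees of stress obtained by counting the number of stress nodes.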
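The counting procedure can be sketched as a small recursive function in Python.
The tree encoding (nested lists, with the last branch of every node taken as
strong, following the s → w w . . . s rule) and the convention that the degree
equals one plus the number of strong branches between the root and the syllable
are assumptions made for this illustration; they reproduce the values 1 2 2 3
printed under Figure 4.2 for a two-level binary tree.

```python
def stress_degrees(node, strong_count=0):
    """Return (syllable, degree) pairs for a metrical tree.

    node: either a syllable (str) or a list of child nodes. In each list the
        last child is treated as strong and the others as weak.
    strong_count: number of strong branches crossed so far from the root.
    """
    if isinstance(node, str):
        # Degree 1 is the lowest: a syllable reached through weak branches only.
        return [(node, 1 + strong_count)]
    result = []
    last = len(node) - 1
    for index, child in enumerate(node):
        is_strong = index == last
        result.extend(stress_degrees(child, strong_count + (1 if is_strong else 0)))
    return result


# Hypothetical four-syllable tree: two binary nodes under a binary root.
tree = [["s1", "s2"], ["s3", "s4"]]
print(stress_degrees(tree))
# [('s1', 1), ('s2', 2), ('s3', 2), ('s4', 3)]
```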
Metrical trees are built congruent to the sentence syntactic tree, implicitly
admitting that syntax indirectly determines the degrees of stress. The hypothesis
of congruence has since been abandoned, but the idea that a given prosodic event,
for example a pitch accent, has a dependency relation with another prosodic
event located on its right (i.e. in the future of the current prosodic event) is
one of the founding principles prevailing for the indication of the prosodic
structure in many non-AM approaches (see Martin, 1975, 1987; Mertens, 1987).
Properties
In the AM model, the overall resulting structure is as follows: syllables group
segments, feet group syllables, Accent Phrases (APs) group feet, intermediate
(intonative) phrases (ip’s) group APs, Intonational Phrases (IPs) group
intermediate phrases (ip’s), and utterances group Intonational Phrases.
Furthermore, an AP is assumed to contain either a verb, an adverb, an
adjective, or a noun, and optionally a grammatical word, such as a conjunction,
a determiner, etc. The metrically strongest syllable usually belongs to one of
the content (i.e. non-grammatical) elements. A specific constraint called the
Strict Layer Hypothesis (SLH) applies to this structure (Nespor & Vogel, 1986;
Selkirk, 2005). The SLH specifies that:
1. A given nonterminal unit of the prosodic hierarchy is composed of one or
more units of the immediate lower category.
2. A unit of a given level of the hierarchy is exhaustively contained in the
superordinate unit of which it is a part.
3. The hierarchical structures of prosodic phonology are n-ary branching.
4. The relative prominence relation defined for sister nodes is such that one
node is assigned the value strong (s) and all other nodes are assigned the
value weak (w).
Stated more simply, the prosodic hierarchy is non-recursive (objects of one
layer group objects of another nature; condition 1), it is a structure
IP groups one or more ip’s, and that every ip groups one or more accent
phrases (APs).
In summary, the prosodic structure presents three levels, grouped under the
utterance:
1. AP, accent phrases, with one stressed syllable on a content word (verb,
adverb, adjective, or noun);
2. ip, intermediate (intonation) phrase, containing one or more APs;
3. IP, Intonation Phrase, containing one or more ip’s;
4. U, the Utterance, containing one or more IPs.
At this point, there are no assumptions whatsoever about any correspon-
dence with any unit belonging to the syntactic or semantic domains, except that
APs must contain one content (i.e. non-grammatical) word.
The prosodic structure is also planar, which means that the graph that
represents the hierarchical grouping of units, be it AP, ip, or IP, is planar (i.e.
tree branches representing the grouping of APs into ip, of ip’s into IP, or of IPs
into U do not cross when drawn in a two-dimensional space like a piece of
paper).
The prosodic structure is also connected, which means that no unit, whether
AP, ip, or IP, is floating, i.e. fails to belong to a unit of higher rank (an
AP to an ip, an ip to an IP, an IP to a U). In reality, especially in
spontaneous speech, IPs can float and not maintain any dependency relation
with another IP when prosodic parentheses are embedded in the sentence (see
Chapter 8 on macrosyntax).
These properties mean, among other things, that a prosodic structure may
contain only one AP, which would be at the same time the unique ip and unique
IP of the prosodic structure.
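The layering constraint can be made concrete with a small Python check. This is
a minimal sketch, assuming the four levels AP, ip, IP, and U listed above and a
tree encoded with a hypothetical Unit class; it tests only the first SLH
condition (each non-terminal unit is composed of one or more units of the
immediately lower level) and treats APs as terminal for simplicity.

```python
from dataclasses import dataclass, field

# Levels from lowest to highest, as listed in the summary above.
LEVELS = ["AP", "ip", "IP", "U"]


@dataclass
class Unit:
    """A node of a prosodic tree (hypothetical encoding for this sketch)."""
    level: str
    children: list = field(default_factory=list)


def is_strictly_layered(unit: Unit) -> bool:
    """Check that every unit groups only units of the next lower level."""
    rank = LEVELS.index(unit.level)
    if rank == 0:
        return not unit.children        # APs are terminal in this sketch
    if not unit.children:
        return False                    # a non-terminal unit needs content
    return all(
        LEVELS.index(child.level) == rank - 1 and is_strictly_layered(child)
        for child in unit.children
    )


# The minimal structure mentioned in the text: a single AP that is also
# the unique ip and the unique IP of the utterance.
minimal = Unit("U", [Unit("IP", [Unit("ip", [Unit("AP")])])])
print(is_strictly_layered(minimal))     # True

# An IP that directly contains an AP skips the ip level.
skipping = Unit("U", [Unit("IP", [Unit("AP")])])
print(is_strictly_layered(skipping))    # False
```

Because the tree is encoded as nested lists of children, every unit other than
the root has exactly one parent, so the sketch takes connectedness and
planarity for granted rather than testing them.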
In summary, the dominant concept of prosodic structure derived from AM
theory organizes Accent Phrases (APs) hierarchically in one or more levels.
These APs are supposed to contain one content word bearing a pitch accent
(verb, adverb, noun, or adjective) around which may revolve one or more
grammatical words (conjunction, pronoun, preposition, etc.). The APs bear a
melodic accent (pitch accent) placed on the AP content word’s metrically
strong syllable.
In a complex prosodic structure, a first assembly of APs forms an inter-
mediate (intonation) phrase (ip), bearing some prosodic marker distinct from a
boundary tone. The grouping of these ip’s constitutes an Intonation Phrase (IP)
ended with a boundary tone. Finally, the sequence of IPs forms the prosodic
structure, terminated by another type of boundary tone, conclusive and ending
the sentence.
A given prosodic structure does not necessarily present all these levels of
hierarchy. A prosodic structure can be “flat,” as an enumeration of a sequence
of APs assembled in one single level. A two-level prosodic structure groups the
APs into IPs, and the IPs into the sentence prosodic structure. A three-level
prosodic structure groups the APs into ip’s, the ip’s into IPs, and the IPs
into the final prosodic structure. In this formalism, the most complex prosodic
structure presents three levels, and in its original form (Selkirk, 1978) with
only two levels is not recursive (Fig. 4.3), as each level groups units of
different types (AP and IP).
[Figure: prosodic hierarchy tree with the levels, from top to bottom,
Utterance, Intonational Phrase, intermediate phrase (ip), Prosodic Word (ω),
Foot (F), and Syllable (σ).]
In addition, the definition of the AM prosodic structure in the literature has
more than one version, depending on whether the IP is considered to be the
largest intonative unit in the sentence or not, and on whether the existence of
ip’s is considered or not.
This book is about the structure of spoken language. It is based on the assumed
existence of a prosodic structure, organizing hierarchically minimal units of
prosody, the prosodic words.
This assumption implies:
a. that prosodic words do exist, and
b. that some prosodic markers do indicate the hierarchical organization of
prosodic words in a structure.
This chapter presents the main concepts of the Incremental Prosodic
Structure, a prosodic structure built dynamically along the time axis while
speaking. These concepts are applied to the analysis of Romance language
sentence intonation in Chapter 7.
Melodic curves
Whether we read a text orally or silently, we produce a “music of the sentence”
created by specific melodic height, duration, and intensity accompanying each
syllable emitted orally with vibration of the vocal folds. Some of these syllables
are perceptually “stronger” than others. These “strong” syllables are not neces-
sarily all stronger in the same way; they are differentiated from other strong
syllables by acoustic features such as melody (their perceived height), duration,
and intensity. Modern acoustical analysis shows clearly that stressed syllables
do bear some melodic changes, but this is also the case for the other syllables of
the sentence, whether they are perceived as stronger or not.
Many descriptions of the melodic changes have emerged from these acoustic
analyses, some purely descriptive and phonetic, others phonological in their
attempt to highlight regularities governing their realizations by speakers of a
given language.
Melodic curves, whose acoustic analysis can be obtained with dedicated
speech analysis packages such as Praat, WinPitch, WaveSurfer, and so on,
are usually displayed with various degrees of reliability and detail on a
frequency/time graph. These curves are quite complex and can be described
and studied in many ways. Therefore, their description presupposes some
[Figure: melodic curve (frequency axis 0 to 250, time axis 39 to 41.5 s) of the
utterance "son las principales actividades denunciadas por Trafic."]
appear as excellent choices. In the first days of the AM approach, for instance,
prosodic phrase boundaries were simply aligned on syntactic boundaries.
However, researchers confronted with more and more data gradually gave
prosodic boundaries their independence from syntax, at least in principle.
president” where the tonic pronoun moi will be stressed in the phrasing
[moi] [mon papa] [il est président]. Likewise, a stress group may contain a
single syllable (e.g. je suis dé-bor-dé “I am overwhelmed,” c’est absolument
fan-tas-tique “it’s absolutely fantastic,” with each syllable of débordé or
fantastique pronounced detached and stressed), or more than one content
word (la fin du film “the end of the movie” with one stressed syllable on film,
or la ville de Paris, “the city of Paris”).
Syllabic chunking
Recent work in neurocognition (Gilbert & Boucher, 2009; Gilbert, 2012)
shows that, in order to be perceived and memorized, strings of syllables must
be organized in chunks of three to five syllables if these chunks do not
correspond to lexical entries. If presented with larger strings of syllables such
as bisraktoubzachdujpermasrik, the listener will spontaneously segment these
sequences into subgroups of three to five syllables (for example, a sequence of
eight syllables will be segmented into two groups of 3 and 5 syllables,
bisraktoub zachdujpermasrik, 4 and 4, bisraktoubzach dujpermasrik, or 5 and 3,
bisraktoubzachduj permasrik).
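The possible segmentations of such a string can be enumerated mechanically.
The following Python sketch lists every way of cutting a sequence of syllables
into consecutive chunks of three to five syllables; the function name and the
syllabification of the nonsense string are assumptions made for the example,
and the code says nothing about which segmentation a listener actually chooses.

```python
def chunkings(syllables, min_size=3, max_size=5):
    """List all splits of `syllables` into consecutive chunks whose sizes
    lie between min_size and max_size (inclusive)."""
    if not syllables:
        return [[]]                      # one way to split nothing: no chunk
    results = []
    for size in range(min_size, max_size + 1):
        if size <= len(syllables):
            head = "".join(syllables[:size])
            for rest in chunkings(syllables[size:], min_size, max_size):
                results.append([head] + rest)
    return results


# Assumed syllabification of the eight-syllable nonsense string.
syllables = ["bis", "rak", "toub", "zach", "duj", "per", "mas", "rik"]
for split in chunkings(syllables):
    print(" ".join(split))
# bisraktoub zachdujpermasrik
# bisraktoubzach dujpermasrik
# bisraktoubzachduj permasrik
```

For eight syllables only the 3 + 5, 4 + 4, and 5 + 3 splits satisfy the size
constraint, which matches the three cases given in the text.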
This kind of segmentation can also be observed when reading (even silently)
numbers with many digits written without a dividing space, such as
13878376396 (a telephone number in China): the reader has to determine a
segmentation, whereas spacing between digits imposes one, for example 138
7837 6396.
The same principle is applied to sixteen-digit credit card numbers, usually
formatted in four groups of 4 digits: 1234 5678 9012 3456 instead of
1234567890123456, and to car license plates, often alternating digits and
alphabetic letters: 845 KWC 87 and KZ-801-RJ (French license plates).
Similar examples were given by Miller (1956) with, for instance, the
sequence IBMCBSCIAIRS. Readers familiar with US culture have no trouble
in dividing this string into the well-known abbreviations IBM, CBS, CIA, and
IRS (International Business Machines, Columbia Broadcasting System,
Central Intelligence Agency, and Internal Revenue Service). Other readers
either would not be capable of remembering the sequence as presented, or
would segment it into smaller parts, such as IBMCBS and CIAIRS, or IBMC,
BSCI and AIRS.
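Segmentation guided by entries already stored in memory can be mimicked with a
simple lexicon lookup. The sketch below, in Python, returns the first way of
cutting a string into known entries; the function name and the small lexicon
are hypothetical, and the search strategy (shortest matching prefix first, with
backtracking) is chosen for brevity, not as a model of the reader.

```python
def segment(text, lexicon):
    """Return the first segmentation of `text` into lexicon entries,
    or None when no complete segmentation exists."""
    if not text:
        return []
    for end in range(1, len(text) + 1):
        prefix = text[:end]
        if prefix in lexicon:
            rest = segment(text[end:], lexicon)
            if rest is not None:
                return [prefix] + rest
    return None


# A reader who knows the four abbreviations finds them immediately.
known = {"IBM", "CBS", "CIA", "IRS"}
print(segment("IBMCBSCIAIRS", known))    # ['IBM', 'CBS', 'CIA', 'IRS']

# A reader with no matching entries has nothing to anchor the segmentation.
print(segment("IBMCBSCIAIRS", set()))    # None
```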
The prominence of one syllable inside a large string, realized, for example,
by vowel lengthening, will also determine the segmentation chosen, for
example bisraktoub and zachdujpermasrik if the third syllable is made more
prominent (stressed), or bisraktoubzach and dujpermasrik if the fourth syllable
is more prominent. However, when syllable groups do correspond to strings
already stored in the listener’s long-term memory, their segmentation can
result in a larger number of syllables, the last one not necessarily being the
strongest. This suggests that some triggering mechanism must exist to allow
the listener to determine and realize quickly the segmentation of nonsense
sequences of syllables into what are called temporal groups by Gilbert and
Boucher (2007).
As stress is rarely located on the last syllable in lexically stressed languages
(e.g. in Romance languages other than French), another mechanism of seg-
mentation must be considered, possibly using both the presence of stress and
the direct identification of a lexical entry in the listener’s long-term memory.
strings. A third kind of prosodic event presents neither of the first two
characteristics and will be identified with an iconic value, putting emphasis
on the current string of syllables. Usually, the first type of event is viewed
classically as a property of lexical entries (words), the second as a boundary
tone, and the third type as an emphatic or secondary accent (accent
d’insistance in French).
In a dynamic process where syllables occur one after the other in a time
sequence, there is a cognitive limit to the number of syllables the listener can
retain in the short-term memory buffer where syllables are stored, waiting for
further processing. Indeed, sentence processing does not simply operate on
syllables, but groups them in stress groups. Interestingly, this limit does
not affect the actual number of syllables stored, but their cumulative duration.
As mentioned above, experiments suggest that the perception of nonsense
sequences of syllables is limited to four or five if there is no marking by
some prosodic event (Gilbert, 2012), whereas this limit extends to some
seven or eight if such a prosodic mark occurs. As to maximal duration,
measurements made on spontaneous speech data in French suggest a value
of about 1,250 ms (Martin, 2014b).
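The duration constraint can be illustrated by accumulating syllables into
stress groups and comparing each group's total duration with the limit. The
Python sketch below assumes that a group is closed by its stressed syllable;
the function name, the tuple layout, and the syllable durations are invented
for the example, and only the 1,250 ms figure comes from the text.

```python
MAX_GROUP_MS = 1250  # approximate limit reported for French in the text


def stress_groups(syllables, max_ms=MAX_GROUP_MS):
    """Group syllables incrementally into stress groups.

    syllables: list of (text, duration_ms, stressed) tuples in time order.
    Returns a list of (syllable_texts, total_ms, within_limit) tuples; a
    group is closed as soon as a stressed syllable has been added to it.
    """
    groups, current, total = [], [], 0
    for text, duration_ms, stressed in syllables:
        current.append(text)
        total += duration_ms
        if stressed:
            groups.append((current, total, total <= max_ms))
            current, total = [], 0
    if current:                          # trailing unstressed syllables
        groups.append((current, total, total <= max_ms))
    return groups


# "la ville de Paris" with made-up durations and one final stress.
example = [("la", 150, False), ("ville", 200, False), ("de", 120, False),
           ("Pa", 160, False), ("ris", 260, True)]
print(stress_groups(example))
# [(['la', 'ville', 'de', 'Pa', 'ris'], 890, True)]
```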
ones used in writing and inherited from Latin, whereas linguists will give a list
of fifteen or sixteen.
Speakers cannot spontaneously spell isolated phonemes; they have to rely
on the writing system at hand. You cannot spell a word in Mandarin, unless
you refer to its alphabetic Pinyin transcription. On the other hand, if asked, all
naïve native speakers of a language can segment a word into syllables
intuitively, whereas only subjects trained in linguistics can segment it into
phonemes.
One possible reason may originate from the knowledge of single syllable
words, so that multisyllabic words may appear as concatenated monosyllables.
In récréation “recreation” in French, it is easy to find by segmentation (not
based on orthography) the words ré, crée, à, and Sion (“a musical note, created,
to, a city name in Switzerland”) leading to the segmentation ré.cré.a.tion.
Adopting the vowel as the anchor of prosodic events in the syllable has
long been a debatable issue, as the presence of final voiced sonorants such as
[l], [m], or [n] generally prolongs the melodic movement located on the
preceding vowel (Chen, 1970; Raphael et al., 1975). Among the arguments in
favor, the variability of the syllabic structure, and in particular of its
duration in the case of such final consonants, makes it difficult to consider
the overall syllabic duration as a descriptive phonological feature. A simple
example demonstrates this point (Fig. 5.2).
Comparing the examples c’est ma maman “this is my mother” and c’est mon
papa “this is my father” shows the advantage of adopting the prosodic events
occurring on the vowel only instead of the whole syllable. Whatever the initial
or final consonant, voiced, unvoiced, or absent, the prosodic part of the vowel
remains (relatively) unchanged. Still, voiced sonorants (such as [l], [m], and
[n] in French) may extend the vowel melodic movement and participate in its
perception by the listener. As a result, to avoid taking into account the
syllabic structure, only the segments of prosodic events occurring on stressed
and final vowels will be described in the following chapters.
[Figure 5.2: melodic curves (frequency axis 0 to 200, time axis 0 to 2.5 s) of
c'est ma maman and c'est mon papa.]
Basic modalities
A sentence generated by a speaker necessarily involves a modality. It has long
been accepted that sentence modality is directly linked to sentence intonation,
and particularly to its last, conclusive melodic contour located on the last
stressed syllable. Basically, sentence modality defines the relation between a
speaker and a listener. In the simplest classification, modality can be declarative
or interrogative. The prosodic structure, assumed in this book to be independent
of the sentence text modality (i.e. the one possibly indicated in the text
itself), is correlated with a modality that has no direct relation to the other
modality markers (syntactic, morphological) possibly present in the sentence.
Modality variants
Depending on the perspective chosen, many variants of the basic modality of
declarative and interrogative contours can be considered (Cresti et al., 2002). I
will select only three variants for each of the basic declarative and interrogative
modalities, retaining as supplementary features the emphasis the speaker puts
(1) on the overall information conveyed in the sentence itself and (2) on the
context and situation of the speech act in which the sentence is pronounced
(Table 5.1).
The emphasis on the declarative corresponds fairly well to the order or
command, usually considered as a basic modality (probably by analogy with
the corresponding verbal mode, which, as already mentioned, incidentally
borrows all its forms from other verbal modes such as indicative and subjunc-
tive). Emphasis on the context (containing the information built by previous
statements or present in the situation – in short all the information already
Emphasis      None    On the sentence    On the context and/or situation

Table 5.2 Phonological description of modality variants using the features +/−
Rising, +/− Ample, and +/− Bell shaped

Rising        −    −    −      +    +    +
Ample         −    +    +/−    −    +    +/−
Bell shaped   −    −    +      −    −    +
Figure 5.3 Variants of modality melodic contours located on the last stressed
syllable (declarative case) or the last syllable, stressed or not (interrogative
case).
Alternative questions
The concept of independence given a priori to the sentence text and its prosodic
structure allows us to better handle cases long regarded as difficult to analyze
by prosodists and semanticists alike. In a French sentence such as Voulez-vous
du thé du café ou du chocolat? “do you want tea, coffee or chocolate?”
(Fig. 5.4), the text contains an interrogative modality, but the stressed
syllables on thé and café are necessarily associated with two similar rising
(or neutralized flat) melodic contours, while the last syllable of the
sentence, on chocolat, carries a conclusive declarative falling contour.
Without the coordinating conjunction ou “or” (“do you want tea, coffee,
chocolate?”), the example becomes as shown in Figure 5.5, with all three
melodic contours on the stressed syllables rising.
The proposed explanation is simple: in the first case, the text carries an
interrogative modality (due to the inversion of verb and subject in voulez-vous),
but the prosodic structure is declarative; in the second case, the text is
associated with three independent interrogative prosodic structures. The first
prosodic structure contains three prosodic words, the first two carrying a
rising contour contrasting with the final falling contour, while in the second
case, the text is segmented into three successive prosodic groups, and thus
three independent sentences with an interrogative modality.
hacer pschitt, etc., which, when pronounced, may sound somewhat similar to
the sound of a liquid poured from a bottle, and to the noise made by carbon
dioxide escaping from an opened can of beer.
The classical view on the correlation between sentence modality and
a prosodic event distinguishes between declarative and interrogative
categories, as well as their imperative, implicative, surprise, and doubt
variants. Using a sketchy phonological description, these contours are
respectively:
Basic declarative: low-range falling contour
Imperative: high-range falling contour
Implicative: moderately rising followed by a falling contour
Basic interrogative: low-range rising contour
Surprise: high-range rising contour
Doubt: rising contour followed by a moderately falling contour
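The six contours listed above can be encoded with the three binary features of
Table 5.2. The Python sketch below is a hedged reading of that table: pairing
each column of feature values with a contour name, in the order declarative,
imperative, implicative, interrogative, surprise, doubt, is an assumption based
on the descriptions in the list, and the +/− value of Ample is represented by
None (unspecified).

```python
# Feature bundles as (rising, ample, bell_shaped); None stands for "+/-".
# The column-to-name pairing is assumed, not stated in the table itself.
CONTOURS = {
    "declarative":   (False, False, False),
    "imperative":    (False, True,  False),
    "implicative":   (False, None,  True),
    "interrogative": (True,  False, False),
    "surprise":      (True,  True,  False),
    "doubt":         (True,  None,  True),
}


def classify(rising: bool, ample: bool, bell_shaped: bool):
    """Return the contour name matching the given feature values, or None."""
    for name, (f_rising, f_ample, f_bell) in CONTOURS.items():
        ample_matches = f_ample is None or f_ample == ample
        if f_rising == rising and f_bell == bell_shaped and ample_matches:
            return name
    return None


print(classify(rising=False, ample=True, bell_shaped=False))   # imperative
print(classify(rising=True, ample=False, bell_shaped=True))    # doubt
```

Under this encoding each variant differs from its basic modality by a single
feature, Ample for emphasis on the sentence and Bell shaped for emphasis on the
context.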
A given modality melodic contour is in a relation of opposition with the other
modality contours, whereas, as will be shown later, non-final melodic contours
are in a relation of contrast with other non-final contours present in the
sentence.
It has long been assumed that the declarative contour, normally in sentence-
final position, has an iconic value, as its fundamental frequency usually reaches
the lowest value of the whole sentence (Martinet, 1960). This melodic movement
is due to a drop in subglottal pressure, which, in the absence of a
counteracting muscular action on vocal fold tension, is generally accompanied
by a downward movement of the speaker’s head (Fónagy, 1983). This head
movement appears as a sign of submission toward the listener and is used in
various cultures such as Greek, where a downward movement and slight rotation
of the speaker’s head signifies a submissive agreement (typically associated
with the Greek word μάλιστα, malista, “indeed”). The falling melodic contour
signals the end of the sentence and the possible relinquishment of a speech
turn by the speaker in control.
On the contrary, a rising contour found in the middle of (rather long)
sentences (the continuation majeure in French) is supposed to be linked to
a rising movement of the speaker’s head, correlated with a gesture indicating
the conservation of power toward the listener, i.e. the conservation of
control over the speech turn. This melodic movement can be followed by a short
or long pause, which may be used by the speaker for inhalation, permitting
the lungs to be filled for a new phonation sequence. Famous political
leaders sure of their power may perhaps abuse silent pauses in their
speeches, as nobody in their audience would dare to interrupt them.
Doing so also gives their audience the chance to applaud. Indeed, an
upward rotation of the head is a sign of dominance over the audience
(Duez, 1997). Again, Greek culture uses this body gesture to signal strong
denial or refusal, and such a head movement is typically accompanied by
an alveolar click.
The interrogative rising contour in French leads to another interpretation: it
is realized at the end of the sentence like the declarative falling contour, at a
point of low subglottal pressure, as corroborated by a drop in intensity of 6 dB
or so (Martin, 2009). The melodic rise is then obtained by activating the
phonation muscles which control vocal fold tension. The speaker’s head
remains basically in its normal straight position, with only a slight upward
rotation. Indeed, there is no direct submission involved, as the speaker
deliberately relinquishes control of the speech turn to request an answer from
the listener.
Imperative contour
As shown by phonetic studies such as in Léon (1993), the imperative contour
appears as an emphatic variant of the declarative contour (Fig. 5.6).
On the iconic level, the imperative melodic contour is an assertion that
admits (supposedly) no comment or reply, i.e. an emphasized assertion.
Furthermore, its phonetic realization requires considerably more articulatory
effort than the simple declaration, involving a preliminary laryngeal frequency
rise to achieve a large fall afterwards controlled essentially by vocal fold
tension, the simple drop in subglottal pressure being insufficient to achieve a
large frequency swing. This suggests that the imperative contour is linked to
some degree of muscular effort from the speaker, an effort which can be
symbolically linked to moderate to strong violence (Fig. 5.7).
Implicative contour
The implicative melodic contour has frequently been called contour d’évidence
in French work on intonation (Léon, 1993). It can be interpreted on the iconic
level as the speaker asking a rhetorical question, indicated by the moderate rise
of the fundamental frequency, followed by a large fall, correlative of an
assertion (Martin, 2009). In other words, the speaker suggests by this melodic
movement that any question on the content of the assertion should be aborted as
a clear certainty follows immediately, as suggested by the falling part of the
contour (Figs. 5.8 and 5.9).
Contour of surprise
The contour of surprise can be viewed as an emphatic question, instantiated by
an exaggeration of the melodic rise associated with the interrogative modality.
On the iconic level, surprise can be considered as an interrogation almost
deprived of any control, a question which imperiously requires an answer,
which may not necessarily come from the listener (Fig. 5.10).
Contour of doubt
The contour of doubt starts as an interrogative contour but ends as a moderately
falling declarative contour. As an emphasis bearing on the context rather than
on the sentence itself, it raises a question at the beginning but ends with a
moderately marked assertion. This combines two contradictory indications, a
strong demand for information and a moderate denial of any answer that can be
the outcome of the demand (Fig. 5.11).
syllables were assumed to limit the size of the minimal units of signification,
whereas other studies (including one published in the sixteenth century by
Meigret, 1550) showed that this limit may be around the number seven (+/– two
syllables), or be related to short-term memory limitations (Miller, 1956). I will show
below that the limit is actually a temporal one, the number of syllables being a
consequence and not the origin of the phenomenon.
The next step pertaining to the concept of prosodic structure is clearly part
of a top-down approach, as it poses the existence of a hierarchy organizing
the minimal units of signification. This hierarchy results from the speaker’s
usage and is not necessarily determined by syntax. It is then called prosodic
hierarchy, and when associated with dependency relations between units, it
becomes a prosodic structure (again a structure is a hierarchy with either
node labels or dependency relations). From a Saussurean point of view,
the signifier would be the sequence of prosodic markers indicating the
prosodic structure and the signified the hierarchical classification of stress
groups.
In the Martin (1975) paper, the prosodic markers are instantiated by melodic
contours, whose acoustic description involves an absolute value of frequency
height, frequency span, and duration. These contours operate by contrast
with the prosodic marker ending (in French) the unit of immediate higher
rank. This final position in sequences of prosodic words in French brings a
confusion from an AM point of view since both pitch accents and boundary
tones are manifested by prosodic events occupying the same position on the
final syllables.
However, there is more. Indeed, if the only differentiation between pro-
sodic events (excluding emphatic stress) exists between terminal boundary
tones and the other stressed syllables, there is still an explanation to provide
as to how the hierarchical arrangements realized between syllabic chunks
actually occur. Indeed, the concatenation of these chunks is not realized
most of the time as a single enumeration, but as a hierarchy with more than
one level corresponding to a prosodic structure. I did notice at that time the
tendency for pitch movements on stressed syllables to follow a principle of
contrast of melodic slope, i.e. to be falling if they depend on a rising contour
“at their right,” and to be rising if that contour is falling (Martin, 1975).
More precisely, a contour would fall if it belongs to a larger group ended
with a rising contour, and conversely it would rise if the ending contour is
falling.
The model considers the various prosodic events (not emphatic accents) as
markers determining dynamically the assembly of successive prosodic
words. This process implies that prosodic markers on lexical stressed syllables
and on accent phrase boundaries, possibly combined on one single
syllable, are sufficiently differentiated to ensure an efficient indication to the
Independence
The predominance of syntax over the other organizations of linguistic objects
(phonological, morphological, informational), following the Chomskian
model(s) giving a predominant role to the (deep) syntactic structure in the
sentence, affected research on sentence intonation almost from the beginning
of linguistic interest in prosody. Former studies influenced by structuralism
also presented this tendency. Many papers on sentence prosodic structure
(including Martin, 1975) had trouble admitting that the prosodic and syntac-
tic structures were not (always) congruent (cf. Rossi et al., 1981). A lot of
effort has then been devoted to the elaboration of sets of rules to prove that the
prosodic structure is actually derived from syntax, although this may not
always appear obvious from the data (e.g. Longchamp, 1998; Rossi, 1999).
Rather than renounce the primacy of syntax, researchers were keen to propose
elaborated sets of alignment, interpolation, and other fancy phonological
devices that would hopefully result in convincing arguments to derive
prosody from syntax.
Even in classical modality analysis, morphology and syntax are given
priority over prosodic events, as encoded by sentence-final melodic contours.
Many difficult problems such as the ones brought by alternative questions
evoked above (vous voulez du thé ou du café? “do you want tea or coffee?”
with a falling modality contour indicating a statement) were handled, again
giving precedence to facts visible in written transcription, i.e. morphology and
syntax (Beyssade et al., 2007).
Prosodic events
The whole storage-concatenation principle relies on the ability of the listener
to classify adequately prosodic events located on stressed syllables. This
implies, among other things, that pertinent stressed syllables can be distin-
guished from other prosodic events, for example that emphatic stress can be
differentiated from prosodic contours leading to the storage of the current
sequence of syllables (since the last decoded prosodic events) in the proper
buffer. It also implies that categories of distinct prosodic events are instan-
tiated (realized) with sufficient acoustic differences to be classified correctly
by the listener.
The generation of a prosodic structure is a dynamic process, whose span
does not exceed a certain limit set by the capacity of the speaker’s and
listener’s short-term memory (probably limited to a few stress groups).
In this process, the speaker has a limited choice of fragments to encode
successive chunks of the prosodic structure (corresponding to ip – intermediate
[intonation] phrase – and IP – Intonational Phrase – as defined in the AM
prosodic structure).
In the course of sentence production, the speaker must necessarily plan a
sequence of stress groups which ends with the modality contour C0
(not considering abandonment or the possibility of adding a deferred
complement after the prosodically defined sentence end; see Chapter 8). In
fact, at each step, the choice is between coding a relationship of parataxis
(juxtaposition of units), of rection (dependency of units), or no relation at all
with a following melodic contour that is to appear later. The encoding of the chosen
relation always involves a melodic contour “on the right,” i.e. occurring later in
the sentence.
There is therefore necessarily planning, at least locally, until the appearance
of the next contour in a sequence, since by definition the contour classes define
relationships of dependency toward a contour appearing later, immediately or
not, in the sequence.
Many suitable acoustic differences come to mind to differentiate the
realizations of stressed syllables, but in Romance languages, the most
important contrast is accomplished by the so-called melodic slope contrast.
In essence, if a prosodic event C1 is realized normally with a rising melodic
contour, the most efficient acoustic characteristic for another prosodic event
C2 to contrast with C1 is to have a melodic variation of opposite melodic
slope, i.e. falling if C1 is rising, and rising if C1 is falling. However, other
contrasts are possible, essentially in some idiosyncratic cases.
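The principle of melodic slope contrast can be illustrated with a minimal Python sketch. The function names are hypothetical, and reducing a contour to a bare "rising" or "falling" label is an assumption made for illustration only: amplitude, duration, and the glissando threshold are ignored.

```python
# Illustrative sketch only: a contour is reduced to its slope label.
OPPOSITE = {"rising": "falling", "falling": "rising"}


def contrasting_slope(governing_slope):
    """Slope of a contour that depends on a contour with the given slope."""
    return OPPOSITE[governing_slope]


def slopes_down_the_chain(final_slope, levels):
    """Slopes from the governing contour down a chain of dependent contours.

    Index 0 is the contour on which all the others ultimately depend; each
    further entry contrasts in slope with the entry before it.
    """
    slopes = [final_slope]
    for _ in range(levels - 1):
        slopes.append(contrasting_slope(slopes[-1]))
    return slopes


print(slopes_down_the_chain("falling", 3))
```

With a falling final contour and three levels, the sketch alternates falling, rising, falling, which is the pattern the slope contrast would predict for a chain of contours each depending on the next one “at its right.”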
Although this is true for French, other Romance languages with a lexical
stress system have a supplementary possibility by using both stressed and
final syllabic locations, as seen above. This is what essentially differentiates
French from the other Romance languages, as French does not have lexical
stress, only a final group stress. As French leaves the speaker to choose the
phrasing of a given sentence, only constrained by the duration of the resulting
stress groups (and the alignment with lexical word-final syllables), the
storage-concatenation process will be influenced by the duration and the
hierarchy of these stress groups. In the sentence le frère de Max a mangé
les tartines “Max’s brother ate the sandwiches,” for example, the possible
phrasings are as follows:
[le frère de Max a mangé les tartines] – one prosodic word of ten syllables, difficult
to pronounce with only one final stressed syllable
[le frère de Max] [a mangé les tartines] – two prosodic words of 4 and 6 syllables,
probably pronounced with an accelerated speech rate for the second prosodic word
[le frère] [de Max] [a mangé les tartines] – three prosodic words of 2, 2, and 6 syllables, a
quite unbalanced phrasing, not very probable
[le frère de Max] [a mangé] [les tartines] – three prosodic words of 4, 3, and 3 syllables, a
somewhat balanced phrasing, a probable realization (in reading mode)
[le frère] [de Max] [a mangé] [les tartines] – four prosodic words of 2, 2, 3, and 3 syllables,
a phrasing adapted to a slow speech rate.
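The space of possible phrasings listed above can be enumerated mechanically. The following Python sketch assumes that the sentence is made of four minimal stressable units with the syllable counts given in the text (2, 2, 3, and 3); the names `UNITS` and `phrasings` are hypothetical. It generates every way of grouping adjacent units, without judging how probable each phrasing is.

```python
from itertools import combinations

# Minimal stressable units and their syllable counts (taken from the text).
UNITS = [("le frère", 2), ("de Max", 2), ("a mangé", 3), ("les tartines", 3)]


def phrasings(units):
    """Return every grouping of adjacent units as (text, syllable counts)."""
    n = len(units)
    results = []
    for k in range(n):  # k = number of internal boundaries
        for cuts in combinations(range(1, n), k):
            bounds = [0, *cuts, n]
            groups = [units[a:b] for a, b in zip(bounds, bounds[1:])]
            text = " ".join(
                "[" + " ".join(word for word, _ in g) + "]" for g in groups
            )
            counts = [sum(syll for _, syll in g) for g in groups]
            results.append((text, counts))
    return results


for text, counts in phrasings(UNITS):
    print(text, counts)
```

Four units give eight candidate groupings; the five phrasings discussed in the text are among them, the remaining three being further unbalanced options.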
This flexibility is not fully available in the other Romance languages. For
instance in Italian, il fratello di Max ha mangiato i panini requires the four
lexically stressed syllables to be realized as stressed. The maximal duration of
prosodic words is reached less frequently than in French, only in lexical words
with more than seven syllables or so before the stressed syllable (e.g. precipitevolissimo
“very hasty” does not qualify, but precipitevolissimevolmente “in a way like
someone/something that acts very hastily” does, and leads to the indicated
stress pattern). Most of these cases are found in the specialized chemical or
pharmaceutical domain.
Properties
Although the prosodic structure is usually presented as independent of the flow
of time in linguistic studies (i.e. as if it were completely known at once
from start to end), it should always be remembered that, from the point of view
of the listener, prosodic events are perceived one after the other as a function of
time. Therefore, their categorization, i.e. the identification of the class they belong
to, instead of being absolute on isolated events, is relative and depends only on
past and expected future events and not on all future events, normally unknown
to the listener. Besides, the process is similar for syllables, which are not
perceived in isolation but in sequence.
This view is central in the ISC model, as every prosodic event, instantiated
by a prosodic contour, determines a relation of dependency “to the right,” i.e.
toward another prosodic event occurring later on the time axis, until the final
conclusive contour occurs. (The denomination “to the right” refers of course to
the western way of writing; in the writing of other cultures, such as Arabic or
Hebrew, it should be referred to as “to the left” – or even “to the bottom” for old
Chinese writing. This may be a good reason to abandon it.)
This means that a prosodic event, say C1, for example (denoting what is
called in the literature a continuation majeure), is correlative to a dependency
relation toward a future event, be it an event of the same class C1 to be part of
an enumeration, or an event C0 of a higher rank belonging to a class other than
Prosodic phrasing
Prosodic phrasing refers to the segmentation of the syllabic flow into stress
groups corresponding to groups of syllables with only one stressed syllable.
Prosodic hierarchy pertains to the hierarchical assembly of the prosodic
words corresponding to stress groups into larger groups (prosodic phrases)
until the whole intonation line of the sentence is obtained (usually by the advent
of a final conclusive contour).
In this book, there is absolutely nothing that links this hierarchy, which
becomes a structure once dependency relations between groups are defined,
to any syntactic or morphological object, with one exception. Even if the last
(or only) lexical word integrated into a stress group normally has its final
syllables aligned on the end of the stress group (i.e. no stress group would
stop in the middle of a lexical word), other possibilities do exist, for example by
realizing every syllable as stressed, or by putting an extra stress inside long
words (see below). Therefore, the prosodic and syntactic structures (and other
structures existing in the sentence for that matter) are a priori totally
independent of each other.
Planarity
The planarity constraint (Martin, 1987) forbids prosodic groupings of
stress groups such as [A[B]C], in which prosodic words (and stress groups)
A and C would form a larger prosodic group before integration of B in a
sequence A B C to form the prosodic structure. A prosodic group forms a
larger unit with the group placed “at its right,” i.e. later on the timescale,
marked by a prosodic marker (i.e. a melodic contour) of higher rank. It is
therefore not possible to realize a non-planar prosodic structure, due to the
absence of prosodic markers allowing the indication of a dependency relation
going “over” another group (as morphological markers of gender and number
can, for instance) (see Fig. 5.12).
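One way to make the planarity constraint concrete is to test a proposed grouping for contiguity. The Python sketch below is an illustration under stated assumptions, not the formal definition of Martin (1987): prosodic words are identified by their position in the sequence, a group is a set of positions, and the hypothetical function `is_planar` accepts a grouping only if every group is an unbroken run and any two groups are either nested or disjoint.

```python
def is_planar(groups):
    """Check a proposed set of prosodic groups over positions 0..n-1.

    Each group is a set of positions of prosodic words. The grouping is
    accepted only if every group is a contiguous run of positions and any
    two groups are either nested or disjoint.
    """
    groups = [set(g) for g in groups]
    for g in groups:
        # A group that skips a position (or is empty) is rejected.
        if not g or g != set(range(min(g), max(g) + 1)):
            return False
    for i, a in enumerate(groups):
        for b in groups[i + 1:]:
            # Overlapping groups must be nested, never crossing.
            if a & b and not (a <= b or b <= a):
                return False
    return True


# Positions: A = 0, B = 1, C = 2
print(is_planar([{0, 1}, {0, 1, 2}]))  # [[A B] C]
print(is_planar([{0, 2}, {0, 1, 2}]))  # [A [B] C]: A and C grouped over B
```

The first grouping is accepted and the second rejected, since grouping A with C would require a dependency passing “over” B.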
Connexity
The dependency “to the right” assumed for every prosodic event in a
prosodic structure implies connexity between the hierarchized prosodic
words and phrases, i.e. that every prosodic word or phrase maintains a
relation of dependency toward another prosodic word or stress group
located “at its right,” i.e. next to it on the time axis (Fig. 5.13).
However, this property ceases to be valid when a prosodic parenthesis,
i.e. a completely independent prosodic structure ended with its own
conclusive contour, is embedded in the main sentence prosodic structure.
Examples in spontaneous speech are given in Chapter 8 on macrosyntax
(Debaisieux & Martin, 2010).
Figure 5.12 A non-planar partial structure [A [B] C], not well-formed for a
prosodic structure.
Domain
In the ISC model, a prosodic domain is defined between two consecutive
prosodic markers belonging to the same class (for instance, two successive
conclusive contours C0, or two successive first-level C1 contours – called
continuation majeure in the French tradition or IP boundary tones in the AM
model). Phonetic realizations of contours of the same class must present
enough similar characteristics that the listener can recognize and identify the
contours as belonging to that class correctly. Outside a domain, for example in
two consecutive domains, the realizations of prosodic markers of the same
class (i.e. phonologically identical) can vary, as long as they are sufficiently
similar to be correctly classified by the listener, and as long as their differences
are such that they are not confused with markers of another class inside the
domain. This rule originates from the mechanism of prosodic marker identification,
which operates dynamically along the time axis. In this process, the
listener has to compare a limited number of successive realizations of markers
in order to proceed to the storage and concatenation of the sequences of
prosodic words. Another consequence of this rule is that final conclusive
contours must be realized the same way in a given speaking community,
although lower-level contours inside the prosodic structure may be different
from speaker to speaker, or from domain to domain for the same speaker
(Fig. 5.14).
The contours of the same class in the same domain must use the same
subsets of melodic features to contrast (or not) with other contours. In
another domain, contrasts may use different features to differentiate distinct
contours.
Neutralization
The actual realization of an abstract (phonological) prosodic marker is done in
such a way that it is differentiated (acoustically and/or perceptually) from all the
realizations of other prosodic markers that could occur in its place (i.e.
at the same position and in the same context). This is a basic rule in functional
phonology. It simply stipulates that the actual realization of a specific prosodic
contour by the speaker must possess such acoustic characteristics that it may
not be confused by the listener with another contour that could be selected
instead by the speaker (in the same context). As the following discussion will
show, the consequences of this rule are extremely important in prosodic
phonology, whereas they are totally absent from AM descriptions, as pitch
accents are not supposed to interact.
The differentiation rule (syntagmatic axis) reads as follows: the manifestation
(realization) of a prosodic marker by acoustic parameters (again
F0, intensity and duration variations of either stressed or prosodic-phrase-final
vowels) must contrast only locally in time. This means that only a
limited number of prosodic markers in a time sequence must contrast (or
not) so that their prosodic marker classes are correctly identified by the
listener.
Preplanning
If for the listener the dynamic restitution of the prosodic structure intended by
the speaker involves only comparing locally consecutive melodic contours, it is
not necessarily the case for the speaker, who has to plan ahead the sequence of
more than two contours if the prosodic structure is relatively complex, i.e. if the
structure has more than one level.
Indeed, if starting with a neutralized contour Cn, the speaker has the
choice between three options (to take the example of French): the simplest,
with one level, Cn C0 (again C0 is the terminal conclusive contour), a two-level
hierarchy encoded by Cn C1 C0, and a three-level arrangement Cn C2
C1 C0. The realization of a contour C1 after Cn is the most frequent,
whereas Cn C2 C1 C0 requires preplanning of at least four stress groups,
consecutive or not. Although this kind of sequence may be observed in read
speech, speakers in spontaneous speech often prefer sequences such as Cn Cn
Cn . . . C2 C1, with a melodic slope contrast affecting only the last two
contours of the sequence, therefore not requiring preplanning over a long
sequence of stress groups.
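As a rough illustration of the three options, the number of levels can be read off a contour sequence by counting the distinct non-neutralized contour classes it contains. This counting rule and the function name `hierarchy_levels` are simplifying assumptions made for this Python sketch; they are not a definition taken from the model.

```python
def hierarchy_levels(sequence):
    """Count the distinct non-neutralized contour classes in a sequence.

    Assumption for illustration: the neutralized contour "Cn" adds no
    level, and each other class (C0, C1, C2, ...) adds exactly one.
    """
    return len({contour for contour in sequence if contour != "Cn"})


for seq in (["Cn", "C0"], ["Cn", "C1", "C0"], ["Cn", "C2", "C1", "C0"]):
    print(" ".join(seq), "->", hierarchy_levels(seq))
```

Under this reading, a sequence such as Cn Cn Cn C2 C1 involves only two contrasting classes before the final contour, however many neutralized contours precede them.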
Rising − − − + + +
Ample − + +/− − + +/−
Bell shaped − − + − − +
Cx C0
−Low +Low
+/−Complex −Complex
+/−Rising −Rising
+/−Glissando +Glissando
+/−Long +Long
direction of the dependency relations between contours, see Fig. 5.15). For
the sake of simplicity, I will consider only French cases, excluding the use of
the +/−Complex feature.
Cx C1 C0
C1 Cx C0
Cx Cx C1
Cy Cx C1
Cx Cy C1
prosodic phrase) bears a final rising contour, contrasting with a slightly rising
or falling melodic movement on its stressed syllable usually below the glis-
sando threshold. This configuration corresponds to a boundary tone in AM
terminology, where two distinct prosodic events occur in the same prosodic
word. Where the lexical stress in the prosodic word is final, the two melodic
events merge and a falling-rising contour takes place. The rising part of Cc is
normally above the glissando threshold, and Cc is phonologically described
as −Low, +Complex, +Rising, +Glissando, and +Long. Furthermore, like C1,
Cc may be followed by a pause, adding an extra phonetic feature.
With the principle of contrast of melodic slope, by combining the different
configurations of three successive contours ending with a complex contour Cc,
the possible sequences are: C2 C2 Cc for the first configuration (I), Cn C2 Cc
for (II), and C2 Cn Cc or C1 Cn Cc for (III). More configurations and examples
are given in Chapter 7.
black shoe,” un regard triste “a sad look,” or il arrive tard “he arrives late.” The
regular final stress on the first content word gets shifted to the first syllable.
The arc accentuel forms a larger stress group (AP, prosodic word). The
first stress shifted to the first lexical word syllable becomes a secondary
(emphatic) accent and therefore does not function as a prosodic marker
anymore. However, this configuration is possible only if the newly formed
prosodic word complies with the syntactic clash condition (see below), i.e.
if the two content words involved are dominated by the same node in the
syntactic structure, or put more simply, if the resulting stress group can be
recognized as belonging to the listener’s lexicon. In French examples such as
je bois mon café tôt “I drink my coffee early” or le travail de nuit nuit
“night work harms,” there is no possibility to shift the first stress or delete
it. The stress group *café tôt does not belong to the lexicon, and in the
second example, there is no room to shift the stress, and the stress group
obtained by deletion of the first stress *nuit nuit does not belong to the
lexicon of French either. The only possible realization is then to stress both
words nuit (the first being a noun, the second a verb), leaving a time gap
between both words.
Stress clash
The so-called stress clash has been observed in many languages, and in
particular in Romance languages: in Italian onestà sarde, a metà prezzo
(Nespor & Vogel, 1986; Profili & Martin, 1987), in Portuguese café quente, in
French café froid, etc. The phenomenon is also described as a rhythmic rule
promoting a principle of binary alternation (Liberman & Prince, 1977). A stress
clash refers to a sequence of adjacent stressed syllables and is assumed to be
avoided in most cases. Stress clash rules for shifting or deleting the first stress
involved in the clash were formulated a long time ago (Meigret, 1550; Prince, 1983)
and their consequences discussed for French (Martin, 2009), Italian (Profili & Martin,
1987), Spanish (Hualde, 2010), and so on.
In French, at the question Comment Max aime-t-il son café? (“How does
Max like his coffee?”), the (possible) answer Max aime le café # froid shows
two consecutive stressed syllables, whereas to the question Qu’est-ce que Max
aime boire le matin? (“What does Max like to drink in the morning?”), the
answer would be Max aime le café froid, with a shift of the first stress to
the preceding syllable to avoid a stress clash. However, the realization of two
successive stressed syllables in the first case implies the insertion of a short
pause (possibly realized with a glottal stop) whose origin will be explained
later.
As seen above, this latter possibility in French leads to a rephrasing of the
sentence and the merging of the original prosodic words café and froid into one,
café froid, with an emphatic accent on the first syllable. It is easy to establish
the quality of the apparently “shifted” stress by observing the lack of melodic
change whether the final stress bears a rising or a falling contour (resulting
from a contrast in melodic slope) as in Juliette préfère le café russe, mais Max
préfère le café suisse. The same mechanism applies in Italian (ex.: metà
prezzo → metà prezzo “half price”) or Spanish (sofá cama → sofá cama
“sofa bed”), where the last lexical stress remains in place and the first
becomes an emphatic (or secondary) accent in the newly formed prosodic
word and stress group.
Stress clash requires obvious phonological conditions: in French the first unit
is normally stressed on its last syllable, so stress clash requires it to be followed
by a one-syllable (stressed) word. In the other Romance languages, the first
word must be stressed on its last syllable, and the following on its first syllable,
whatever the number of syllables.
But these conditions are not sufficient to induce a stress shift: the shift
can occur only if the group formed by the two consecutive words
is dominated directly by the same node in the syntactic structure (the
syntactic clash condition). This simply means that the group formed by the two
words involved in the stress clash has to form a single unit that can be
transferred to concatenation memory, and be later identified in the listener’s
lexicon.
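The two conditions can be combined in a small Python sketch. The `Word` class, the function names, and the `same_syntactic_node` flag are hypothetical: the flag stands in for a syntactic analysis that the sketch does not perform, and the stressed syllable of each word is simply given as a 1-based position.

```python
from dataclasses import dataclass


@dataclass
class Word:
    text: str
    syllables: int  # number of syllables in the word
    stressed: int   # 1-based position of the stressed syllable


def stresses_adjacent(first, second):
    """Phonological condition: final stress followed by initial stress."""
    return first.stressed == first.syllables and second.stressed == 1


def can_shift_stress(first, second, same_syntactic_node):
    """Both conditions: adjacent stresses and direct syntactic dominance.

    same_syntactic_node is supplied by the caller (an assumption of this
    sketch); it says whether one node directly dominates both words.
    """
    return stresses_adjacent(first, second) and same_syntactic_node


cafe = Word("café", 2, 2)
froid = Word("froid", 1, 1)
tot = Word("tôt", 1, 1)

print(can_shift_stress(cafe, froid, same_syntactic_node=True))  # café froid
print(can_shift_stress(cafe, tot, same_syntactic_node=False))   # café tôt
```

The first call returns True and the second False: both pairs satisfy the phonological condition, but only café froid is treated here as dominated by a single node, in line with the examples discussed in the text.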
Whether the first stress involved in a clash is shifted to the first syllable of
the word or elsewhere inside the word, the result is the formation of another
larger stress group, with a larger number of syllables. The stress clash
constrains the formation of a larger prosodic word when the first stress
implied in a stress clash shifts to the left. As the newly formed prosodic
word cannot violate the syntactic clash constraint (see below), this simply
means that the conversion of the syllabic memory would be unsuccessful in
this case.
Figure 5.17 Prosodic word shortest duration in ms for various speech styles
(Martin, 2014b): political discourse (Pol), narrative (Nar), conference (Cnf),
radio news (Jpa), university lecture (Lec).
access (138), and then two groups of four digits for the number itself (7837
6396). Again, readers not familiar with the phone number format have to divide
the whole sequence into subgroups, such as 138783 and 76396 for example, in
order to be able to interpret (and write down) the information.
Aside from numbers on a car license plate or phone numbers, the same process
should be applicable to strings of syllables. This would mean that sequences of
more than some four or five syllables cannot be decoded, i.e. handled for further
linguistic processing, without being split into shorter sequences.
However, when syllables do correspond to strings already stored in the
listener’s long-term memory, their segmentation can result in a larger number
of syllables being identified. This suggests that some triggering mechanism
must exist to allow the listener to determine and realize quickly the segmentation
into stress groups, which Gilbert and Boucher (2007) call temporal
groups, i.e. chunks of syllables ended with a stress.
Although words containing more than four syllables are rare in most if not all
languages, French taken as an example offers some cases:
Anticonstitutionnellement (“unconstitutionally”), 8 or 9 pronounced syllables
(depending on the realization of a mute e after the 7th syllable)
Paraskevidekatriaphobie (“Paraskevidékatriaphobia, fear of Friday 13”), 10 syllables
Παρασκευή /pa.ɾa.skɛ.ˈvi/ “Friday”
δεκατρείς /ðɛ.ka.ˈtɾis/ “thirteen”
from δεκα /ðɛ.ka./ “ten” and τρείς /ˈtɾis/ “three”
φόβος /ˈfɔ.vɔs/ “fear”
The normal stress pattern of these examples in French assumes a stressed last
syllable: Anticonstitutionnellement, Paraskevidekatriaphobie. However, at least
another stressed syllable must be realized in order to be pronounced and, as I
will discuss later, perceived. We can then have Anticonstitutionnellement
and Paraskevidekatriaphobie or Paraskevidekatriaphobie or even
Paraskevidekatriaphobie for example, depending on the knowledge of the speaker
about the internal morphology of these rare words (the latter realization by
speakers knowledgeable of Modern Greek). A similar effect occurs in other
Romance languages. In Italian, one of the longest words (outside chemical and
pharmaceutical entries, which could as well be written with spaces or hyphens) is
precipitevolissimevolmente, “in a way like someone/something that acts very
hastily.”
Another possible realization would, of course, consist of separate syllables,
so that each syllable would be stressed, as in an.ti.con.sti.tu.tio.nnel.le.ment
and pa.ra.ske.vi.de.ka.tri.a.pho.bie.
This apparent obligation to stress at least one syllable in a sequence of seven
was already noticed in the sixteenth century! Indeed, in his Le tretté de la
grammère françoise, Louis Meigret (1550) coined some unattested words
Eurhythmy
Eurhythmy refers to the tendency for speakers to either (1) realize consecutive
temporal groups with comparable numbers of syllables or (2) accelerate the
speech rate when temporal groups contain a larger
number of syllables and slow down when they have few syllables. Both strategies
achieve the same goal: to balance the duration of enunciation of successive
temporal groups. In Max adore les chocolats (“Max loves chocolate”) for
instance, a eurhythmic realization would be [Max adore] [les chocolats] to
balance the number of syllables of both groups, whereas a realization congruent
with syntax would be [Max] # [adore les chocolats], whose non-eurhythmicity
could be compensated for by insertion of a pause after Max (since in this case the
speech rate is difficult to modify on a single syllable). These variations are
possible due to the lack of lexical stress in French.
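The first strategy, balancing the number of syllables, can be sketched in Python as a search for the boundary that minimizes the difference between the two groups. The function name `most_balanced_split` is hypothetical, and the syllable counts are assumptions: Max counted as one syllable, adore as two (final mute e not pronounced), and les chocolats as four.

```python
def most_balanced_split(syllables):
    """Find the single boundary giving the two most balanced groups.

    syllables: syllable counts of the successive minimal units.
    Returns (boundary index, (syllables before, syllables after)).
    """
    def imbalance(i):
        return abs(sum(syllables[:i]) - sum(syllables[i:]))

    best = min(range(1, len(syllables)), key=imbalance)
    return best, (sum(syllables[:best]), sum(syllables[best:]))


# Assumed counts: Max = 1, adore = 2, les chocolats = 4
units = ["Max", "adore", "les chocolats"]
boundary, sizes = most_balanced_split([1, 2, 4])
print(units[:boundary], units[boundary:], sizes)
```

With these counts the boundary falls after adore, giving groups of 3 and 4 syllables, whereas the syntactically congruent split after Max would give 1 and 6.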
Figure 5.20 Prosodic word longest duration in ms for various speech styles
(Pol, Nar, Cnf, Jpa, Lec) (Martin, 2014b).
But there is another aspect of the reading process. In fact, for sentences with
any sizable degree of syntactic complexity, the reader must be a good syntax
expert, helped only by punctuation signs. Because the prosodic structure is
limited to two or three levels, an adaptation must frequently be made in the
event of a more complex syntactic structure with more than two or three levels.
It is then no wonder that only right boundaries of syntactic and prosodic phrases
are effectively realized by the reader, so that a minimal recovery of the
prosodic structure intended by the writer is established for the eventual listeners,
including the speaker. In any case, this implies that more than one prosodic
structure can be associated with a given read text, except perhaps for sentences
with only one or two prosodic words. (Actually, even two prosodic words can be
assembled differently: as a simple Nucleus grouping two prosodic words, as a
Nucleus followed by a prosodic Postfix, or even as two consecutive prosodic
Nuclei, a deferred complement, as described in Chapter 8 on macrosyntax.)
In this regard, the prediction of a prosodic structure in French is more
problematic. Lacking lexical stress, speakers of French, even in reading mode,
may or may not effectively stress stressable syllables, i.e. the syllables
ending predicted stress groups. The only limit to the predicted variation is the
maximal stress group duration. This duration can be translated into the number
of syllables a projected stress group can contain. If this number is too small,
the speaker (and the silent reader alike) may skip one or more stressable
syllables to form a larger stress group, especially when selecting a faster
speech rate, since the stress group constraint pertains to the total duration
and not to the number of syllables.
[Figure residue: legend labels eurythmic (4 + 4), syntactic (2 + 6), eurythmic
(5 + 6), syntactic (3 + 8), Average, Slow, Fast; axes Duration [ms] (0 to 500)
and Nb Syllables (0 to 9).]
number of syllables. Figure 5.22 shows, for example, the evolution of the
average syllabic duration varying from about 100 ms to 250 ms in stress groups
of one to eight syllables.
Examples of stress groups containing up to twelve syllables are found, for
example, in Lehka and Le Gac (2004), but even in this fast speech rate case
their duration is below the limit of about 1,250 ms.
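Since the constraint bears on total duration, the syllable capacity of a stress
group follows from a simple division. The sketch below assumes a constant
average syllable duration, which is a simplification; the function name
`max_syllables` and the three sample durations are illustrative choices, while
the 1,250 ms ceiling is the approximate limit quoted above.

```python
MAX_GROUP_MS = 1250  # approximate maximal stress group duration quoted in the text


def max_syllables(avg_syllable_ms, max_group_ms=MAX_GROUP_MS):
    """Largest whole number of syllables that fits within one stress group,
    assuming every syllable lasts avg_syllable_ms (a simplification)."""
    return int(max_group_ms // avg_syllable_ms)


for avg_ms in (100, 175, 250):  # hypothetical fast, intermediate, slow rates
    print(avg_ms, "ms per syllable ->", max_syllables(avg_ms), "syllables")
```

Under these assumptions, a fast rate of about 100 ms per syllable accommodates
twelve syllables, whereas a slow rate of 250 ms leaves room for only five.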
Word alignment
The only constraint linking stress groups to morphology or syntax pertains to
the alignment of words’ last syllables with the end of prosodic words.
Ambiguous application of this principle occurs in puns, whereas violation
may happen in other cases mentioned earlier. Below are some examples in
French:
Ce sont des Mongols fiers de leur passé “These are Mongols proud of their past”
Ce sont des montgolfières de leur passé “These are balloons from their past”
J’ai vu l’eau tarie dans la fontaine “I saw the water dried in the fountain”
J’ai vu l’otarie dans la fontaine “I saw the sea lion in the fountain”
Depending on the stress pattern, listeners will perceive, thanks to the asso-
ciated prosodic words, Mongols fiers or montgolfières in the first example, and
l’eau tarie or l’otarie in the second (examples taken from Rossi, 1983). But
again, this constraint is violated for long stress groups, as shown above.
Intonation may resolve ambiguity only if (1) the context and situation do not
provide any information likely to remove the ambiguity, and (2) the
incremental process implemented by storage-concatenation ensures a
non-ambiguous hierarchical grouping of stress groups.
In a book devoted to linguistic ambiguities in French, C. Fuchs (1996)
gives some examples of syntactic ambiguities (which would cease to be
ambiguous if pronounced). Although in practice these cases seldom occur
orally in real life, the following examples may illustrate how the ISC process
operates:
[Nadine] [couvre] [la corbeille de fleurs] vs. [Nadine] [couvre la corbeille] [de
fleurs] “Nadine covers the flower basket” vs. “Nadine covers the basket with
flowers”
[Il a parlé fort] [spécialement] vs. [Il a parlé] [fort spécialement] “He spoke loudly
specially” vs. “He spoke very specifically”
[Moules marinières] [et frites à volonté] vs. [Moules marinières et frites] [à volonté]
or most probably [Moules marinières] [et frites] [à volonté] (to avoid a stress
group with six syllables) “Mussels, and fries at will” vs. “Mussels and fries, at
will” and “Mussels, and fries, at will”
For all these examples, the ambiguity is resolved by the difference in phrasing
(cf. Boulakia, 1985).
Syntactic clash
An apparent constraint seems to exist between the syntactic and prosodic struc-
tures, governed by the syntactic clash rule (Martin, 1987). This rule defines
properties of sentence phrasing, i.e. the way sequences of syllables are grouped
together to form stress groups, or seen from a top-down point of view, how
sentences are divided into stress groups. The original definition of the syntactic
clash forbids grouping into a single prosodic word two syntactic units that are
dominated immediately (i.e. at the first level) by distinct nodes in the
syntactic structure. For example, in Mary is eating this excellent chocolate, the
following phrasing would not be well formed: [Mary is] [eating this] [excellent
chocolate], since in [Mary is], the auxiliary is is dominated in the syntactic
structure by a node which is also dominating eating in the next prosodic word
syntagm. Likewise in [eating this], this is dominated immediately by a node
which also dominates chocolate in the next chunk (Fig. 5.23).
An explanation for this rule is rather easy to find: although Mary is may be
part of the standard lexicon stored in the listener’s memory, as may eating this, a
“correct” chunking allowing the listener to retrieve a lexical entry (which is not
Experimental data
In a detailed analysis of various styles of spontaneous speech in French
(Martin, 2014b), I made the following observations confirming the points
mentioned above:
a. Successive stressed syllables are found (“stress clash,” corresponding in
French to one single-syllable stress group preceded by any other stress
group), but there is a minimal amount of time between two consecutive
stressed syllables (actually between two consecutive stressed vowels). This
observation confirms the hypothesis about Delta brain waves synchronizing
the perception of AP, for a maximum frequency, i.e. a minimal period of
about 250 ms (see below).
b. Cases where eurhythmy is obtained at the expense of congruence of the
prosodic structure with syntax are rare, so the eurhythmic compensation is
done by compressing the syllabic duration in stress groups with many
syllables. This was already observed empirically by Fónagy and Magdics
(1960), Lehka and Le Gac (2004), Pasdeloup (2004), and more recently Avanzi
et al. (2013). One of the reasons why balancing of the number of syllables is
not frequent in spontaneous data may be that such balancing
requires preplanning, which is more likely in read speech (cf. the read
phrasing [Marie adore] [les chocolats] vs. the spontaneous [Marie]
Brain waves and prosodic structure 107
today is that the perception of syllables along the time axis is synchronized by
Theta waves (Henry & Obleser, 2012; Ghitza et al., 2013), or conversely, that
Theta waves are entrained by syllabic acoustic landmarks (Doelling et al.,
2014). Indeed, the intelligibility of syllabic sequences is improved when the
Theta bursts are in phase with the sequence of syllables (Ghitza, 2012; Henry &
Obleser, 2012; Ghitza et al., 2013). In a way, this process can be compared to
mechanisms often used in computer circuitry, where a master clock synchronizes
the transfer of information from a set of circuit outputs. These outputs are
allowed to vary in response time, because the resulting state is retained at
the same instant for all electrical outputs, whatever the individual delay
values (glitch) for each of them.
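The clocking analogy can be made concrete with a toy sketch. Everything in it is
an illustrative assumption rather than a model taken from the cited studies: the
function `read_at_tick`, the delay values, and the clock periods are
hypothetical. Each output line switches to its new value only after its own
delay, and all lines are read together at the clock tick.

```python
def read_at_tick(old_values, new_values, delays_ms, clock_period_ms):
    """Read all output lines at the clock tick.

    A line shows its new value only if its delay has elapsed by the tick;
    otherwise it still shows its old value.
    """
    return [
        new if delay <= clock_period_ms else old
        for old, new, delay in zip(old_values, new_values, delays_ms)
    ]


old = [0, 0, 0]
new = [1, 1, 1]
delays = [3, 7, 9]  # hypothetical settling delays in ms

# Clock period longer than every delay: all lines are read in their new state.
print(read_at_tick(old, new, delays, clock_period_ms=10))  # [1, 1, 1]
# Clock period too short: the slower lines are read before they have settled.
print(read_at_tick(old, new, delays, clock_period_ms=5))   # [1, 0, 0]
```

As long as the clock period exceeds the slowest delay, the individual delays
have no effect on what is read, which is the property the comparison relies on.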
This interpretation (Henry & Obleser, 2012) excludes the role sometimes
attributed to syllables, that of synchronizing Theta waves themselves by means
of a phase-locked loop (PLL), as suggested by Ghitza (2013). On the contrary, Henry and
Obleser (2012) demonstrated the importance of phase realignment in
response to frequency-modulated auditory stimuli, where this realignment
depends on the instantaneous phase of delta oscillations, which are them-
selves entrained by an auditory spectrally modulated stimulus (for pure tones
actually). To be efficient, the maximal Theta phase shift should not exceed
about 50 ms, as the range of Theta varies by a factor of about 2 only, from
100 ms to 250 ms.
extends from 250 ms to 1,250 ms, values very similar to the range of period
values for Delta waves (Martin, 2014b).
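The period ranges quoted for the two wave types can be restated as frequencies
with f = 1/T. The short sketch below only performs this conversion; the function
name `period_ms_to_hz` is an illustrative choice, and the period bounds are the
approximate values given in the text.

```python
def period_ms_to_hz(period_ms):
    """Convert a period in milliseconds to a frequency in hertz."""
    return 1000.0 / period_ms


# Approximate period ranges quoted in the text (shortest, longest), in ms.
PERIOD_RANGES_MS = {"Theta": (100, 250), "Delta": (250, 1250)}

for name, (shortest, longest) in PERIOD_RANGES_MS.items():
    low_hz = period_ms_to_hz(longest)    # longest period = lowest frequency
    high_hz = period_ms_to_hz(shortest)  # shortest period = highest frequency
    print(f"{name}: {low_hz} Hz to {high_hz} Hz")
```

The two ranges meet at 250 ms (4 Hz), the value that the text associates with
the minimal interval between consecutive stressed syllables.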
This hypothesis may raise some difficulties of interpretation when applied
to tone languages lacking stressed syllables, leading one to consider that the
melodic changes linked to tone realizations (high flat, rising, falling-rising, and
falling in Mandarin) are responsible for the synchronizing of Delta waves,
[Figure labels: Syllable; Theta wave; Temporal group.]
coffee hot,” vs. Max ama o seu café quente, “Max likes his hot coffee”), similar
to the same example in French discussed earlier).
This observation suggests that an imperative reason does exist to squeeze
syllables in a limited time window, a reason linked to the frequency range of
Delta waves, essential to synchronize the perception of syllables by Theta
waves, and constitutes another argument in favor of considering Delta waves
synchronized by stressed syllables (but not by accented syllables). If stressed
syllables did not play this role, there would be no need to keep two consecutive
stressed syllables at least 250 ms apart (Fig. 5.26).
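The two timing limits discussed in this chapter can be turned into a simple
check on measured data. The sketch below is only an illustration: the function
`check_stress_intervals`, the sample onset times, and the decision to treat the
interval between consecutive stressed vowels as a stand-in for stress group
duration (ignoring pauses) are assumptions, not part of the analysis reported
here.

```python
MIN_GAP_MS = 250   # approximate minimal interval between stressed vowels
MAX_GAP_MS = 1250  # approximate maximal stress group duration


def check_stress_intervals(stress_times_ms):
    """Return (index, gap, problem) for every interval outside the limits.

    stress_times_ms lists the onset times of successive stressed vowels.
    """
    problems = []
    for i in range(1, len(stress_times_ms)):
        gap = stress_times_ms[i] - stress_times_ms[i - 1]
        if gap < MIN_GAP_MS:
            problems.append((i, gap, "too close"))
        elif gap > MAX_GAP_MS:
            problems.append((i, gap, "too far apart"))
    return problems


# Hypothetical onset times in ms: gaps of 400, 200, and 1,500 ms.
print(check_stress_intervals([0, 400, 600, 2100]))
# [(2, 200, 'too close'), (3, 1500, 'too far apart')]
```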
Constraints revisited
The constraints governing stress groups observed on prosodic structures of
both read and spontaneous speech can find a justification – and an explanation –
in recent neurophysiological research work on speech (Steinhauer et al., 1999;
Friederici, 2002; Makuuchi et al., 2009; Giraud & Poeppel, 2012). These
studies, based essentially on EEG, investigate the possible correlations that
may exist between brain activity and the perception and linguistic treatment of
the information by listeners. They also use magnetic resonance imaging in
specific experiments.
Steinhauer et al. (1999), for instance, demonstrated with this technique of
investigation the precedence of prosodic over syntactic processing. Obrig et al.
(2010) as well as Gilbert and Boucher (2007) and Gilbert (2012) showed that
the speech flow was segmented thanks to prosodic tags and with direct identi-
fication of already memorized units.
Figures 5.27 to 5.30 give explicit explanations linking each of the prosodic
structure constraints to a specific property of Delta waves.
[Diagram labels: Long-term memory; Identification of temporal groups; Temporal
groups; Maximum duration; Delta; CPS 250/1250 ms; Conversion; Syllables
100/250 ms; Short-term memory; Theta 100/250 ms; Time.]
[Diagram labels: Long-term memory; Identification of temporal groups; Temporal
groups; Minimum gap; Delta; CPS 250/1250 ms; Conversion; Syllables 100/250 ms;
Short-term memory; Theta 100/250 ms; Time.]
Figure 5.28 Delta waves synchronize the transfer of chunks of syllables from
short-term memory. The minimum gap between consecutive stressed
syllables is therefore limited by the minimum period value of the Delta waves
(about 250 ms). A lower value would leave Theta bursts desynchronized and
lower the efficiency of syllable perception.
[Diagram labels: Long-term memory; Identification of temporal groups; Temporal
groups; Balance of chunks duration; Delta; CPS 250/1250 ms; Conversion;
Syllables 100/250 ms; Short-term memory; Theta 100/250 ms; Time.]
[Diagram labels: Long-term memory; Identification of temporal groups; Temporal
groups; Identification; Delta; CPS 250/1250 ms; Conversion; Syllables
100/250 ms; Short-term memory; Theta 100/250 ms; Time.]
operates before syntax (which in passing explains why the prosodic structure is
not necessarily congruent to syntax). It is therefore not surprising that con-
gruence appears more frequently in laboratory read speech, as in this case the
written text syntax is obviously present before sentence intonation.
As mentioned before, whether in read speech, silent reading, or sponta-
neous speech, the prosodic structure is always present as an obligatory
linguistic object in order to allow the listener to process the information
brought by the flow of syllables and access the syntactic information con-
tained in the sentence. The goal is to demonstrate that the elaboration of the
prosodic structuration necessarily present in the sentence actually precedes
the elaboration of the other structures and particularly of the syntactic struc-
ture, whether in the generation process by the speaker or the perception
process by the listener.
Arguments favoring this conclusion are of various kinds and are based on the
following facts:
– The prosodic structure can exist without any words or any syntax whereas
the opposite is not true. Syntax depends on the presence of prosody, but
prosody does not depend on the presence of syntax.
– The flow of syllables must be segmented in chunks in order to be processed
by Delta brain waves. Delta waves synchronize the transfer of sequences of
syllables (the stress group) stored in short-term memory.
– In spontaneous speech, reformulations are (almost) always realized by
retaking a complete stress group and not a selected word (except sometimes
in stylistic applications, which may not be a real reformulation).
– The dynamic process of the prosodic structure generation shows that the
speaker has to choose between a relation of dependence (rection), independence
(parataxis), or the absence of relation for the current prosodic group
(ip or IP in AM terminology). This is done by specifying prosodic contours
indicating a dependency relation toward another contour to occur in the
immediate future (i.e. to “the right”).
All these observations lead to the conclusion that the prosodic structure
operates before the syntactic and the other structures of the sentence. The
usual graphic representation and analysis of the prosodic structure
considerably obscure these aspects, leading to the belief that intonation acts
as a supplement to syntax, to be processed by the listener (in reality only the
reader) as another set of syntactic features.
syllables grouped in four stress groups, as the usage is to spell out numbers
below 100. The three stress groups will be simply enumerated and form a two-
level prosodic structure:
[[cinq un quatre] [cinq deux deux] [quatre quatre trois six]]
“[[five one four] [five two two] [four four three six]]”
or
[[cinq cent quatorze] [cinq cent vingt deux] [quarante quatre trente six]].
“[[five hundred fourteen] [five hundred twenty-two] [forty-four thirty-six]]”
A realization with each number stressed is also possible, but much slower in
order to maintain the minimal duration between consecutive stressed syllables:
[[cinq # un # quatre] # [cinq # deux # deux] # [quatre # quatre # trois # six]]
The maximal duration of a stress group constrains the enunciations as well,
with about 1,250 ms as a maximum duration. This limits the possibility of
enunciating every digit separately without putting stress on every syllable, as
*[cinq un quatre cinq deux deux quatre quatre trois six]
since this sequence is too long to be pronounced in less than 1,250 ms, the
maximum duration of a stress group. Therefore, a possible realization with a
flat structure (an enumeration), obtained by simply enumerating the successive
digits, requires a pause # between each syllable, as
[cinq # un # quatre # cinq # deux # deux # quatre # quatre # trois # six].
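The two timing limits can be applied to this ten-digit example in a rough
sketch. The function names, the count of one syllable per digit name (quatre
included), and the 175 ms average syllable duration are assumptions made for
illustration; only the 250 ms and 1,250 ms values come from the text, and the
outcome for the single-group case depends on the assumed speech rate.

```python
MIN_GAP_MS = 250     # approximate minimal interval between stressed syllables
MAX_GROUP_MS = 1250  # approximate maximal stress group duration


def min_span_all_stressed(n_digits, min_gap_ms=MIN_GAP_MS):
    """Shortest time from the first to the last stressed vowel
    when every digit is stressed."""
    return (n_digits - 1) * min_gap_ms


def fits_single_group(n_syllables, syllable_ms, max_group_ms=MAX_GROUP_MS):
    """True if the syllables fit in one stress group at the assumed rate."""
    return n_syllables * syllable_ms <= max_group_ms


digits = "cinq un quatre cinq deux deux quatre quatre trois six".split()

print(min_span_all_stressed(len(digits)))   # 2250
print(fits_single_group(len(digits), 175))  # False (1,750 ms)
print(fits_single_group(3, 175))            # True (525 ms)
```

With every digit stressed, the sequence cannot take less than 2,250 ms between
its first and last stressed vowels, which is consistent with the much slower
realization described above.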
The prosodically defined hierarchy of stress groups must be congruent with
the “syntax” defined graphically. The interpretation of the sequence of numbers
by storage-concatenation requires the segmentation in 3 and 7 syllables or 3, 3
and 4 syllables: the example [cinq cent vingt deux] [[quarante quatre] [trente
six]] “five hundred twenty two forty four thirty six” for the telephone number
5224436 can be segmented in three different ways (Figs. 5.31, 5.32, 5.33):
Figure 5.31 A hierarchy which contradicts the graphic structure, since the
most important frontier of the prosodic grouping corresponds to an absence of
a graphic space, and vice versa: 522 44 36.
A simple example: telephone numbers 119
In the case of the configuration congruent with the graphic layout [522] [[44]
[36]] pronounced cinq cent vingt deux quarante quatre trente six, with nine
syllables, two syllables can be stressed: the final on six, and the other on deux
ending cinq cent vingt deux. One feature is enough to maintain the contrast, the
choice being left to the speaker:
1. A pause after deux, resulting in a eurhythmic prosodic structure vingt deux #
quarante quatre trente six;
2. A contrast of melodic height, with a higher pitch on deux and a lower pitch
on six;
3. A contrast of melodic slope, deux rising vs. six falling;
4. A contrast of duration, with the syllable deux shorter than six.
This last choice may be more difficult to realize, given the intrinsic duration
of single stress groups.
These considerations may throw a new light on memory recall experiments
for strings of digits, where speakers of tone languages perform better than
speakers of non-tonal languages, recalling eight to ten digits for Mandarin
and Cantonese speakers, compared to four digits for English speakers (Chen et
al., 2009). Although the digits are monosyllabic in both types of languages
(with the exception of zero and seven in English), sequences of syllables
bearing a tone are better remembered and processed. In view of the ISC model,
this could be explained by the fact that in tone languages, each monosyllabic
digit is processed as a stress group, whereas for English or French, digits
are grouped in chunks (i.e. stress groups) of four to five syllables.
6 Lexical stress in Romance languages
Stress in various languages 121
Monosyllabic
más (adverb of quantity): Quiero más comida “I want more food.”
mas (conjunction): Le pagan, mas no es “You get paid but not more.”
él (personal pronoun): ¿Estuviste con él? “Were you with him?”
el (article): El vino está bueno “The wine is good.”
Plurisyllabic
célebre “famous,” celebre (3rd person present subjunctive of celebrar,
“to celebrate”), celebré “I celebrated.”
Written with a hyphen, both components keep their acute stress (if any);
without a hyphen both words keep their original graphic stress:
Cuentakilómetros “odometer,” lógico-matemático “logical-
mathematical”
Spanish grammars, which rely on written forms of isolated words, tell us
that a word pronounced in isolation always carries a stress, but some items lose
their stress when used in connected speech. Nouns, adjectives, verbs, adverbs,
disjunctive pronouns, numerals, and interrogative wh-words (i.e. words
like qué, quién, cuándo, cuál, etc.) are always stressed, whether uttered in isolation
or in connected speech. Words that are never stressed include the following:
1. The definite article, e.g. [la ˈtʃika] la chica “the girl.”
2. Clitic pronouns, e.g. [te lo emˈbje] te lo envié “I sent it to you.”
3. Monosyllabic possessive determiners, e.g. [mi ˈkasa] mi casa “my house.”
A statistical approach
Statistics performed on some 8,000 frequent words in Italian show that 78% of
them are stressed on the penultimate syllable (Sandri & Vivalda, 1981). With
a single rule assigning stress to the penultimate syllable, the error rate is
therefore already down to 22% (Delmonte, 1981).
Some researchers felt that a statistical approach may be the method of choice
for all cases. This method was used by the CSELT (Centro Studi e Laboratori
Telecomunicazioni, now partially Telecom Italia Lab) in 1981.
The approach taken by the CSELT is based on the correlation observed
between orthographic trigrams (sequences of three graphemes) and the location
of the stressed syllable in the word. However, the implementation of these
observations requires no fewer than 250 rules implemented in an augmented
transition network automaton operating from the end of the word. For example,
sequences ending with -isia, -isie, -osia, -psia, -psie, etc. will indicate a stress
on the penultimate syllable of the word.
The appropriate ranking of these rules and an extensive list of exceptions
allowed the system to reach a correct stress positioning of about 97% in
standardized tests. This method was also used for Romanian, claiming 99%
success (Ungurean et al., 2009; see also Chitoran, 2002, and Chitoran et al.,
2014).
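The general shape of such a system (exception list first, then ending rules read
from the end of the word, then a default) can be sketched as follows. This is a
toy illustration and not the CSELT rule set: the function `predict_stress` is
hypothetical, the ending table contains only the five endings named above, and
the two proparoxytone words timido and sensibile stand in for an exception list
whose actual contents are not given here.

```python
# Sample exceptions (assumed for illustration; the real list is not given).
EXCEPTIONS = {"timido": "antepenultimate", "sensibile": "antepenultimate"}

# Only the endings named in the text, all pointing to the penultimate syllable.
ENDING_RULES = {
    ending: "penultimate"
    for ending in ("isia", "isie", "osia", "psia", "psie")
}

DEFAULT = "penultimate"  # the single most frequent pattern


def predict_stress(word):
    """Return (stress position, source of the decision) for a written word."""
    word = word.lower()
    if word in EXCEPTIONS:
        return EXCEPTIONS[word], "exception"
    # Scan endings from the end of the word, longest first.
    for length in range(min(len(word), 4), 0, -1):
        ending = word[-length:]
        if ending in ENDING_RULES:
            return ENDING_RULES[ending], "rule -" + ending
    return DEFAULT, "default"


print(predict_stress("ipocrisia"))  # ('penultimate', 'rule -isia')
print(predict_stress("timido"))     # ('antepenultimate', 'exception')
print(predict_stress("casa"))       # ('penultimate', 'default')
```

The lookup order matters: exceptions must override ending rules, and longer
endings must be tried before shorter ones, which is one reason the ranking of
rules is described as critical.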
A phonological-phonetic approach
From a linguistic point of view, it may seem reasonable to think that stress is
linked to the phonological structure of the syllable. It would then be possible to
establish contextual rules to determine the stressed character of a syllable from
its phonetic and phonological structure.
This approach was adopted and implemented by Delmonte (1981) in a
text-to-speech synthesis system for Italian. The problem with this method is
that it requires a large number of rules and exceptions to the rules in order to
obtain satisfactory performance. After phonetization (orthographic-phonetic
conversion), the rules analyze contexts of three elements around a given vowel
in each syllable (starting from the end).
For example, if the vowel [i] belongs to the penultimate syllable, the word is
proparoxyton if [i] is followed by [t, d, l, m, k, t] and if the word does not belong
to a list of exceptions. The word is also proparoxyton if [i] is followed by [l, m,
n, y, t] and the word is a verb followed by a clitic. The word is again
proparoxyton if [i] is followed by [g, r, n, t] and the word is a verb of the first
group in -anare, -agire, -atare, or a noun or an adjective belonging to an
exception list. The word is paroxyton in all other cases.
Other rules apply to the left context of the vowel, leading to a very complex
system that does not really capture the stress assignment mechanism (if one exists).
A phonological approach
Sticking to their phonological guns, some phonologists still attempt to find rules
to predict word stress in Italian based on syllabic properties. This is part of the
tradition aiming to find universal stress rules for a majority of languages (cf. Halle
& Vergnaud, 1987). Cei and Hayes (2012), for example, direct a large project to
find out “How predictable is Italian word stress?” Using a very large corpus and
sophisticated mathematical tools, an optimality treatment of data, and filtering of
borrowed words, they still have not achieved their goals to date. Tackling timidly
some properties of suffixes, they reject this solution because the suffixes represent
lexical properties of morphemes! To gather even more data, they use an “Amazon
Mechanical Turk” platform to assemble a very large set of occurrences in order to
consider possible variations in various regions of Italy. The data are then modeled
with complex statistical tools, such as Bayesian models.
A morphophonetic approach
O. Profili (1987) proposed phonetic rules for noun suffixes which required a
morphological analysis of this category of words. For instance, the suffixes -illo,
-esse are always stressed on their penultimate syllable: distillo “distillate,”
profetessa “prophetess.” Other suffixes are never stressed: -ido, -bile, as shown by
timido “timid,” sensibile “sensitive.” A large number of cases can be correctly
analyzed with this approach, but a problem remains with homographic suffixes,
such as -ino, which may or may not be stressed. In piccolino “small,” the suffix
+ flection -ino (diminutive masculine singular) is stressed, whereas in amino
(3rd person plural subjunctive of amare “to love”) -ino is unstressed. This
approach led to the solution detailed below.
A morphological approach
The morphological approach, originally suggested by Paul Garde (1968, 2013),
and implemented by Martin (1989) in a software program for the
automatic placement of stressed syllables in Italian, is based on (1) the stress
rules in Latin and (2) an analysis of nouns, adjectives, and verbs
into their morphological structure:
(prefix) + stem + (suffixes) + (flections)
According to a proposal by Paul Garde (1968), suffixes and flections can be
classified as stressable or unstressable, i.e. capable or incapable of being
stressed. As most lexical entries in Italian are derived from
Latin (thus excluding borrowed words), the stem follows the Latin stress rule as
given above in this chapter.
Rules for word stress placement 129
The stress rule is then very simple: the last stressable morphological element
(stem, suffix, flection) of the word determines the position of the stressed
syllable. Given the relatively large number of homographic suffixes and
flections, the key to success of this method lies in matching corresponding
morphological categories (i.e. suffixes and flections for verbs, nouns, and
adjectives) and a correct morphological analysis.
Things may appear more complicated with homographs belonging either to
distinct grammatical categories or, worse (for a computer program), to the same
category. An often-quoted example is sono cose che capitano capitano “these
are things that happen, captain,” where the first capitano is a verb (3rd person
plural of the verb capitare) and is stressed on the fourth syllable from the end,
whereas the second capitano is a noun (here in its singular masculine form) and
is stressed on its penultimate syllable.
Likewise, two stress patterns pertain to ancora: àncora “anchor,” ancóra
“still,” and àncora “he moors.” Àncora comes from Latin ancora (second
syllable with the light vowel |o|), borrowed from Ancient Greek ἄγκυρα
(stressed on the first syllable), whereas ancóra derives from Latin ad hanc
hōram “at this hour.”
Homographs can belong to the same grammatical category. For example,
prìncipi “princes” and princìpi “principles,” or tùrbine “whirlwind” (singular,
il turbine) and turbìne “turbines” (plural of la turbina). Furthermore, the
position of the stress may vary with the dialect considered, and sometimes
with the level of language: tènebra “darkness” vs. tenèbra as poetic form.
Depending on their nature, i.e. the syntactic category of stems they deter-
mine, homographic suffixes and flections can be stressable or unstressable. For
example, as seen above, the diminutive suffix -in is stressable and by syllabic
segmentation affects the syllable li (piccolo → piccolino, with the morphemes
piccol, in and o), -in being a noun suffix.
However, the same -in is also a marker of the subjunctive and is unstressable
as a verb suffix. In amino, analyzed as am, in, and o, subjunctive present 3rd
person plural of amare “to love,” neither the suffix nor the verb flection is
stressable, resulting in the stress located on the first and only stem syllable.
When derived from Latin, stems are always stressable according to the rules
seen above on their penultimate or prepenultimate syllables. According to
Antonetti and Rossi (1970), 82% of stems (not of words) are stressable on
their last syllable, and only 18% on their penultimate for their root.
The derivations of oper, stressable on its first syllable and derived from Latin
ops “means, resources, power” and opus “business, work,” are:
Opera → oper + -a, unstressable flection -a, resulting in opera “opera.”
Operoso → oper + -os stressable nominal suffix + -o unstressable adjectival
masculine singular flection, resulting in operoso “hard worker.”
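The rule that the last stressable morphological element carries the stress can
be sketched in a few lines. The representation below is an assumption made for
illustration and is not the 1989 implementation: each word is given as a
hand-analyzed list of (form, stressable) pairs, and the hypothetical function
`stressed_morpheme` returns the morpheme that receives the stress. Locating the
stressed syllable inside a stem (the Latin rule) is not modeled.

```python
def stressed_morpheme(morphemes):
    """Return the last stressable morpheme of a word.

    morphemes is an ordered list of (form, stressable) pairs;
    None is returned if no element is stressable.
    """
    for form, stressable in reversed(morphemes):
        if stressable:
            return form
    return None


# Hand-made analyses following the examples in the text.
piccolino = [("piccol", True), ("in", True), ("o", False)]  # -in as noun suffix
amino = [("am", True), ("in", False), ("o", False)]         # -in as verb suffix
opera = [("oper", True), ("a", False)]
operoso = [("oper", True), ("os", True), ("o", False)]

for name, analysis in [("piccolino", piccolino), ("amino", amino),
                       ("opera", opera), ("operoso", operoso)]:
    print(name, "->", stressed_morpheme(analysis))
```

The same written sequence -in receives a different flag depending on the
category of the word, which is why the quality of the morphological analysis
decides the outcome.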
French
French has no lexical stress, only a final group stress. Progressively from Old
French, all segmental units following the accented syllables were dropped, with
the exception of a single mute [ə] in certain cases. Some traces of penultimate
accent can be found in words ending with the suffix -ation, such as la nation,
l’exagération, etc., stressed on their penultimate syllable until the middle of
the twentieth century (Martin, 2009). By this process, the position of stress lost
its function of marking morphological boundaries as in the other Romance
languages. From lexical, stress became demarcative in French, indicating
boundaries not of words but of groups of words, content and grammatical
words, or even single syllables.
Therefore, every stressed syllable instantiates in the AM sense a boundary
tone in French, with the exception of emphatic accent. Thus, stress in stress
groups can be located on grammatical words and not only on lexical words, as
in interrogative or imperative forms such as le lui donneras tu? “will you give it
to him?” or donne le lui “give it to him” where the pronouns are stressed simply
as final syllable of a stress group.
Emphatic accent, which, if melodic, is most of the time realized with a
rising melodic contour (but counterexamples do exist; Martin, 2012a), is
located on the first syllable of the first (or only) lexical word of the stress
group. It follows that a problem in analysis may occur when the stress group
contains a lexical word of only one syllable, as in oui c’est exact “yes it’s
correct.” Should it be considered as an emphatic accent, therefore presenting a
melodic rise, or as a stress group boundary tone? An expected melodic rise due
to the principle of melodic slope inversion would leave the ambiguity in the
analysis, whereas a falling melodic contour would classify the prosodic event
as a stress group boundary.
(especially for learners of French), it is easy to show that this definition does not
hold for long sense groups. Compare the following examples:
L’armoire “the wardrobe”
La petite armoire “the small wardrobe”
La petite armoire verte “the small green wardrobe”
La petite armoire vert-bouteille “the small bottle-green wardrobe”
However, if the number of syllables of the sense group exceeds a given number
of syllables (on the order of seven, depending on the speech rate, as the limit
depends on the duration of enunciation and not the number of syllables), such
as in La jolie petite armoire vert-bouteille “the pretty small bottle-green
wardrobe,” it is easy to notice that an extra stress must be applied on some
syllable, for example on jolie, or on armoire, thus keeping the sequence of
syllables below the maximal number of syllables in a single stress group.
134 The Incremental Prosodic Structure in Romance languages
and very efficient alignment of the available text transcriptions with corre-
sponding speech sound segments (see Chapter 11 for details). The alignment
allowed for a fast retrieval of prosodic information from any text segment
selected through simultaneous acoustic analysis.
The ISC model assumes a specific function for stress group stressed and final
syllables in the indication of the prosodic structure, challenging the idea that
pitch accent characteristics belong to the phonetic domain, and also extending
to melodic movements the concept of a stress degree hierarchy inside stress
groups. Indeed, melodic rises and falls are not randomly distributed. Possibly
combined with final pitch movements, they indicate to the listener how to
dynamically assemble the sequences of prosodic words along the time axis in
order to reconstitute the sentence prosodic structure (see Chapter 5).
To avoid possible confusion in terminology with pitch accent used in the AM
model, the pitch movements located on the stressed vowels of stress groups as
well as those on their final syllables are called melodic contours (see Chapter 5).
In Romance languages, a given stress group necessarily has a melodic contour on its stressed syllable (vowel), optionally accompanied by a melodic contour on its final syllable (vowel). In the latter case both melodic movements are considered as one single prosodic event. In French, which has no lexical stress, stressed and final syllables occupy the same position. Exceptions pertain only to cases where the last syllable contains a mute [ə].
Furthermore, only the melodic change inside the stressed syllable vowel will be retained to describe melodic contours. A possible extension of the pitch movement onto a voiced consonant after a stressed vowel or a pause may be part of the pitch movement perceived by the listener, but will not be considered in the phonological description of the melodic contour.
of stress groups and prosodic words. Later, I added Romanian and Catalan to
the already analyzed set (Martin, 2002, 2004).
The first corpus sentences were designed so as to observe the realizations of pitch movements in increasingly complex syntactic structures. Typical examples in Italian illustrating this increasing complexity range from L’idea era semplice “The idea was simple” to L’animale sotto l’influenza del dolore reagisce attivando il sistema nervoso autonomo con il risultato, ad esempio, di un incremento della temperatura corporea e della frequenza respiratoria e cardiaca “The animal under the influence of pain reacts by activating the autonomic nervous system with the result, for example, of an increase in body temperature and in respiratory and heart rate.” The results obtained from the analysis of the EuRom4 examples were tested against the full texts of the EuRom4 “lessons,” read by at least four speakers in each language.
The EuRom5 recordings for their part contain numerous read examples of enumeration, parentheses, and long sentences. Although the texts were read by professional speakers, the observed phrasing appeared to be guided essentially by punctuation marks (comma, colon, semicolon, parentheses, etc.), by verbal group boundaries, and also by the occurrence of coordinating conjunctions. Relative pronouns also serve as phrasing boundaries, frequently taking over from punctuation.
[Figure: diagram relating Text, Prosodic Structure, Syntactic Structure, and Temporal Constraints]
Note on figures
All the figures below (and most in this book) were obtained with the analysis
software WinPitch. In these representations, plain bold curve segments indicate
the F0 sections corresponding to stressed vowels, and bold dotted curve seg-
ments a final boundary tone in a complex Cc contour. Melodic contours below
the glissando threshold are displayed with a lighter bold segment (see Fig. 7.2).
Figure 7.2 Marking of stressed melodic contours and boundary tones. Plain bold curve segments indicate the F0 sections corresponding to stressed vowels, and bold dotted curve segments a final boundary tone in a complex Cc contour. In this example, B antiparasitari, “B” (pronounced [bi]) and [a] in the syllable ta are stressed vowels, whereas [i] in ri is the final vowel carrying the rising part of a complex contour.
Inventory
In any case, for all read sentences the last stressed vowel indicates the last stress
group of the utterance with an easily recognizable melodic contour, either
perceptually or visually on a melodic curve, falling and/or reaching the lowest
fundamental frequency of the sentence. As it indicates the end of the sentence,
this contour is often called the conclusive terminal contour (examples of
sentences with more than one conclusive contour [complément différé or
epexegesis] are given in Chapter 8 on macrosyntax).
Again, by definition, melodic contours refer to the fundamental frequency (F0) variation (including its duration) on the stressed syllable vowels (see Chapter 5). Their definitions belong to the phonological domain, as their actual phonetic realizations depend on the minimal and sufficient contrast being maintained in order to allow the listener to group (or not) successive prosodic words. For example, in a simple prosodic structure with two prosodic words, the last one carrying a conclusive declarative contour, the first melodic contour needs to be differentiated from the last by only one acoustic (and perceived) feature, for example vowel duration or a non-falling pitch movement (more than one feature may also be used). In other words, some features characteristic of the contour may be neutralized, depending on the structure configuration.
When structures are more complex, the network of contrasts that needs to be maintained also becomes more complex, and the realizations of melodic contours become more contrasted, drawing on more types of melodic features. The melodic features retained for the phonological description of the prosodic contours are: +/− Low, +/− Rising, +/− Complex, +/− Glissando, +/− Long (see definitions in Chapter 5). The inventory of contours is, for French, Cn, C2, C1, C0, and for the other Romance languages, Cn, C1, C2, Cc, C0. Their phonological descriptions are as follows:
C0 [+Low, −Complex, −Rising, +Glissando, +Long]
Cc [−Low, +Complex, +Rising, +Glissando, +Long]
C1 [−Low, −Complex, +Rising, +Glissando, +/−Long]
C2 [−Low, −Complex, −Rising, +Glissando, +/−Long]
Cn [−Low, −Complex, +/−Rising, −Glissando, −Long]
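The feature bundles above can be encoded as a small lookup table, which makes it easy to see which features actually separate two contours. The following is only a minimal sketch: the names FEATURES, CONTOURS, and contrasting_features are hypothetical, and treating a "+/−" value as unspecified (so that it never counts as a contrast) is an assumption made for illustration, not part of the phonological description itself.

```python
# Feature values: True = "+", False = "−", None = "+/−" (treated here as unspecified).
FEATURES = ("Low", "Complex", "Rising", "Glissando", "Long")

# Transcription of the five feature bundles listed above.
CONTOURS = {
    "C0": dict(Low=True,  Complex=False, Rising=False, Glissando=True,  Long=True),
    "Cc": dict(Low=False, Complex=True,  Rising=True,  Glissando=True,  Long=True),
    "C1": dict(Low=False, Complex=False, Rising=True,  Glissando=True,  Long=None),
    "C2": dict(Low=False, Complex=False, Rising=False, Glissando=True,  Long=None),
    "Cn": dict(Low=False, Complex=False, Rising=None,  Glissando=False, Long=False),
}


def contrasting_features(a, b):
    """Return the features on which contours a and b are both specified and differ."""
    fa, fb = CONTOURS[a], CONTOURS[b]
    return [f for f in FEATURES
            if fa[f] is not None and fb[f] is not None and fa[f] != fb[f]]


print(contrasting_features("C1", "C2"))  # ['Rising']
print(contrasting_features("C2", "C0"))  # ['Low']
print(contrasting_features("Cc", "C0"))  # ['Low', 'Complex', 'Rising']
```

Under this encoding, C2 and C0 differ only by the feature Low, while C1 and C2 differ only by Rising.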
The actual + or − values of these features depend on the contrast to be maintained (or not) between contours in a domain. A rising contour C1 with a span of 50 Hz, for example, should not be assessed by the absolute value of its F0 change, but only in relation to its neighbor. However, inside the same domain, defined by
what comes before the contour from either the beginning of the sentence or the
occurrence of a same-class contour, all contours of immediately inferior rank
must have similar acoustic realizations. As an example, all C2 occurrences
inside a domain defined between two contours C1 must share similar features in
order to be perceived as belonging to the same class by the listener.
A contour C1 located at the beginning of a sentence could thus show a larger change in F0 than another C1 located near the end of the sentence, realizing a downstep. The downstep effect has long been observed and described, and is largely due to the decrease in lung volume and subglottal pressure during sentence production (see Chapter 5). It will be considered as phonetic and not phonological. The advantage of a localized approach focused on contrasts between contours is that it takes into account (essentially by eliminating them) possible changes in the speaker’s state of mind, emotions, etc., which influence the realization of prosodic events, and in particular the melodic span of contours, inside the same speech turn or even inside the same sentence.
In French there is no complex contour. The ranking between prosodic events is as follows, C0 being the final conclusive contour: Cn < C2 < C1 < C0. Contours ranked in this order use an increasing number of salient phonetic characteristics, such as glissando value, rise or fall, and duration. The precedence of C1 over C2 originates from the principle of contrast of melodic slope: C1 is rising because it depends on a falling C0, and C2 is falling because it depends on a rising C1.
If a complex contour is present (for all Romance languages excluding
French), the ranking is: Cn < C1 < C2 < Cc < C0.
The complex contour Cc is spread over two syllables (or its characteristics
are merged on a final stressed syllable) and is inserted between C2 and C0. Here
the ranking order of C1 and C2 is inverted, and both contours can select Cc “on their right,” as Cc is considered ambivalent with respect to the principle of melodic slope contrast, i.e. both a rising and a falling contour can contrast with it.
[Figure: F0 curve of B antiparasitari, C anticoncezionali]
movement resulting in the complex contour Cc. This possibility does not exist
in French, which uses therefore a slightly different prosodic marker system and
a distinct contour ranking.
to another part of memory, and the listener stores a new string of syllables
waiting for a new prosodic event.
When this new melodic contour is perceived, the listener transfers the new
sequence of syllables according to one of the following three actions:
a. If the new melodic contour belongs to a class ranked below the last contour,
the syllabic sequence is stored in another part of memory, waiting for further
processing.
b. If the new melodic contour belongs to the same class as the last contour,
the syllabic sequence is stored in the same part of memory as the first
sequence.
c. If the new melodic contour belongs to a class ranked higher than the last
contour, the current sequence is concatenated with the already stored
sequence of syllables and the newly formed string of syllables is stored in
the same part of memory as under (b).
This incremental process goes on until a final conclusive contour occurs,
leading to the processing of the complete chain of syllables as a sentence.
Figure 7.5 illustrates a spontaneous speech example in French.
In this example (corpus C-ORAL-ROM French), the sequence of prosodic
events on stressed syllables is C2, C1, Cn, C2, C1, C0, as revealed by the
fundamental frequency curve of Figure 7.5. The ISC process implies that a
certain number of memory buffers must be used to store the intermediate results
of partial concatenation before obtaining the final prosodic structure. The
number of buffers equals the depth of the prosodic structure (four levels,
including the root, in this example). The process then requires buffers M3,
M2, M1, and M0.
[Figure 7.5: F0 curve of the French spontaneous speech example]
Table 7.1 Processing the prosodic events Cn, C2, C1, and C0 in the example of Figure 7.5
Prosodic events:  Cn   C2   C1   C0
Buffers:          M3   M2   M1   M0
C2   et pour répondre | 0 | 0
C1   et pour répondre à ta question | 0
Cn   la différence | 0
C2   la différence entre un poney | 0 | 0
C1   et pour répondre à ta question la différence entre un poney et un cheval | 0
C0   et pour répondre à ta question la différence entre un poney et un cheval c’est sa hauteur
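The storage and concatenation steps traced in Table 7.1 can be sketched in a few lines of code. This is one possible reading of the process, not a definitive implementation: the function name isc_trace, the one-buffer-per-class layout, and the rule that a contour absorbs the content of every lower-ranked buffer are assumptions made for illustration, and the segmentation of the utterance into stress groups is read off the successive rows of the table.

```python
# French ranking Cn < C2 < C1 < C0; one buffer per contour class
# (assumed to correspond to M3, M2, M1, M0 in Table 7.1).
RANK = {"Cn": 0, "C2": 1, "C1": 2, "C0": 3}


def isc_trace(events, rank=RANK):
    """Process (syllable string, contour) pairs; return the buffer state after each event."""
    buffers = {c: "" for c in rank}
    trace = []
    for text, contour in events:
        # Lower-ranked buffers, highest rank first: under this reading a
        # lower-ranked string is always more recent than a higher-ranked one,
        # so this order preserves the order of speech.
        lower = sorted((c for c in rank if rank[c] < rank[contour]),
                       key=rank.get, reverse=True)
        parts = [buffers[contour]] + [buffers[c] for c in lower] + [text]
        for c in lower:
            buffers[c] = ""  # absorbed material leaves its buffer
        buffers[contour] = " ".join(p for p in parts if p)
        trace.append(dict(buffers))
    return trace


# Stress groups and contours of the example (sequence C2, C1, Cn, C2, C1, C0).
EVENTS = [
    ("et pour répondre", "C2"),
    ("à ta question", "C1"),
    ("la différence", "Cn"),
    ("entre un poney", "C2"),
    ("et un cheval", "C1"),
    ("c'est sa hauteur", "C0"),
]

for (text, contour), state in zip(EVENTS, isc_trace(EVENTS)):
    print(contour, {k: v for k, v in state.items() if v})
```

With these assumptions, the string held in the buffer of the incoming contour after each event matches the corresponding row of Table 7.1, and only the C0 buffer is non-empty at the end.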
variants declarative Cd and interrogative Ci). The four other classes of contours
are Cc (for Romance languages other than French), C1, C2, and Cn. These
classes may be instantiated differently by melodic features as long as the
necessary and sufficient contrasts between contours are preserved.
In the simplest configuration, the sentence contains one single prosodic word
ended with a declarative contour C0 located on the last stressed syllable.
The next configuration presents two prosodic words, with the first stressed
syllable bearing a prosodic event Cx to be identified. When the first prosodic
event occurs, the only differentiation to be made by the listener pertains to its
belonging to the C0 (Cd declarative or Ci interrogative) class or not. Indeed, if
the prosodic marker belongs to the C0 class and if the next prosodic marker is
also a C0, the two consecutive prosodic words form two independent prosodic
structures, normally attached to two distinct sentences (they could also be
attached to a single syntactic structure, and thus a single sentence, organized in
two sections with a deferred complement; see Chapter 8 on macrosyntax).
If the contour is not a C0 and the next contour is a C0, then whatever its realization (Cc complex, C1 rising, C2 falling, or Cn neutralized), the indicated prosodic structure will be the same, [Cx C0]. In other words, Cx is neutralized in this configuration and must only be differentiated from the other contour that can occur in its place, i.e. C0 (with its Cd and Ci variants). This fact is frequently observed in French read and spontaneous data, where there is no lexical stress. However, due to the principle of contrast of melodic slope, the sequence C2 C0 (almost) never occurs, as C2 (+Glissando, −Rising) has a falling melodic slope similar to that of C0. The C2 and C0 contours may then appear perceptually too close (although informal perception tests show that listeners can differentiate the two contours taken in isolation).
In read mode, one cannot always expect the prosodic structure to be congruent with the syntactic structure. On the contrary, the reading process allows the speaker to gather information ahead of what is being said through appropriate eye movements, often resulting in a eurhythmy not normally observed in spontaneous speech. Still, some hypotheses can be made as to the probable configuration of the sentence prosodic structure, especially when proceeding from simple to more complex sentences.
To ensure a consistent phonological notation, non-final rising and falling melodic contours are transcribed phonologically as C1 (rising) and C2 (falling) if their glissando values (in semitones per second) exceed a glissando threshold (Rossi, 1978; Mertens, 2004). This glissando threshold, expressed in semitones/sec2, is assumed to correspond to the threshold of perception of a linearly varying change in pitch. Actually, this value is parametric and should be adjusted, as the threshold values were obtained from perception tests on a synthetic vowel [a] (Rossi, 1971). The glissando gives only an indication of the
Complex contour
The Complex contour Cc is characterized by the realization of a flat (or slightly rising or falling) melodic contour on the stressed vowel of the stress group, and a sharp rise on the final vowel, above the glissando threshold. If the stressed vowel is in final position in the stress group, both the flat and the rising melodic shapes coexist on the same vocalic segment, possibly extended to an adjacent voiced consonant. This Cc contour presents various realizations apparently linked to phonetic differences between Romance languages, as illustrated in the following figures (Fig. 7.6 to Fig. 7.10).
[Figures 7.6 to 7.10: F0 curves of complex contours Cc in Italian (Svevo), anticoncezionali; Spanish, que hay una provocación a la discriminación; Catalan (CAT-ANA56), que componen Malàsia; Portuguese, nos estados unidos; and Romanian, prepozitionala]
            C0    Cc    C1    C2    Cn
Low         +     −     −     −     −
Complex     −     +     −     −     −
Rising      −     +     +     −     +/−
Glissando   +/−   +     +     +     −
Long        +     +     −     −     −
In the example of Figure 7.7, the last stressed vowel has a falling melodic
contour, whereas the complex rising part is realized by the voiced [n] ending
the stressed syllable.
The examples of a complex contour in five Romance languages (Figs. 7.6 to 7.10) show a slight fall on the stressed vowel and a rise on the final vowel (with a possible extension of the melodic rise onto the following consonant if it is voiced).
In the read corpora, all retained realizations were declarative, with a “broad” focus, i.e. without implicative or imperative variations. Postfixes and Suffixes were also avoided (see Chapter 8 on macrosyntax for definitions of these macrosegments).
Table 7.2 gives a static phonological description of the melodic contours. This description uses the binary features +/− Low, +/− Complex, +/− Rising, +/− Glissando, +/− Long. The feature Low pertains to the value of the fundamental frequency reached at the end of the melodic contour. The feature Complex refers to the realization of two distinct contours on the stressed and final vowels (possibly merged).
In the rest of this chapter, these definitions are applied dynamically in sequences of three consecutive melodic contours, which have to be differentiated by necessary and sufficient contrasts from all other contours that could occur in their place. These contrasts may involve only the minimally necessary features, and not all the features of Table 7.2.
Experimental data
Table 7.3 gives all possible hierarchical configurations of three prosodic words (configurations I, II, and III), excluding cases of saturation resulting in sequences of neutralized contours, and sequences where a falling contour would depend on another falling contour (C0 for example). Sequences involving two prosodic words can be extracted from this table by retaining two contours instead of three, and more complex realizations by expanding these configurations. These combinations bring together the various possible conditions of operation of the ISC process.
Table 7.3
                Final   Configuration I   Configuration II   Configuration III
French (10)     C0      C1 C1 C0          C2 C1 C0           C1 Cn C0
                        Cn Cn C0          Cn C1 C0
                C1      C2 C2 C1          Cn C2 C1           C2 Cn C1
                        Cn Cn C1
                C2      Cn Cn C2          –                  –
Romance (22)    C0      Cc Cc C0          C2 Cc C0           Cc C1 C0
                        C1 C1 C0          C1 Cc C0           C1 Cn C0
                        Cn Cn C0          Cn Cc C0
                                          Cn C1 C0
                Cc      C2 C2 Cc          C1 C2 Cc           C2 C1 Cc
                        C1 C1 Cc          Cn C2 Cc           C2 Cn Cc
                        Cn Cn Cc          Cn C1 Cc           C1 Cn Cc
                C2      Cn Cn C2          Cn C1 C2           C1 Cn C2
                C1      Cn Cn C1          –                  –
The configurations are directly derived from the principles of (1) depen-
dency “to the right,” (2) inversion of melodic slope, and (3) ranking of the
melodic contours (Cn < C2 < C1 < C0 for French, Cn < C1 < C2 < Cc < C0 for
the other Romance languages).
A possible mini grammar producing the sequences of contours includes the
following rewriting rules (limited to two daughters per expanded node):
For French:
C0 → {C1 C0 | Cn C0}
C1 → {C1 C1 | C2 C1 | Cn C1}
C2 → {C2 C2 | Cn C2}
Cn → {Cn Cn}
For the other Romance languages:
C0 → {Cc C0 | C1 C0 | Cn C0}
Cc → {Cc Cc | C2 Cc | C1 Cc | Cn Cc}
C2 → {C2 C2 | C1 C2 | Cn C2}
C1 → {C1 C1 | Cn C1}
Cn → {Cn Cn}
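To see what such rewriting rules generate, they can be transcribed as a dictionary and expanded mechanically. The sketch below does this for the French rules only; the names FRENCH and sequences are hypothetical, and the enumeration strategy (rewriting one contour at a time until the desired length is reached) is simply one convenient way to explore the grammar.

```python
# French rewriting rules, transcribed from the list above.
FRENCH = {
    "C0": [("C1", "C0"), ("Cn", "C0")],
    "C1": [("C1", "C1"), ("C2", "C1"), ("Cn", "C1")],
    "C2": [("C2", "C2"), ("Cn", "C2")],
    "Cn": [("Cn", "Cn")],
}


def sequences(root, length, grammar):
    """Return all contour sequences of the given length derivable from root."""
    current = {(root,)}
    for _ in range(length - 1):  # each binary rule adds exactly one contour
        expanded = set()
        for seq in current:
            for i, symbol in enumerate(seq):
                for pair in grammar.get(symbol, []):
                    expanded.add(seq[:i] + pair + seq[i + 1:])
        current = expanded
    return current


for seq in sorted(sequences("C0", 3, FRENCH)):
    print(" ".join(seq))
```

Starting from C0 and stopping at three contours, the French rules yield five sequences: C1 C1 C0, C1 Cn C0, C2 C1 C0, Cn C1 C0, and Cn Cn C0, which appear to be the French sequences ended with C0 in Table 7.3. The rules for the other Romance languages can be explored the same way by supplying a second dictionary.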
Although definitions of contours are local, it is easy to see that the same
contour, say Cn, can occupy distinct levels in the prosodic hierarchy. For
example, in the context C1 Cn C0, Cn is located one level below C1, which
is itself one level below C0. In another context, for example C2 Cn C1, Cn has
to be on a structure level below both C2 and C1, C2 being below C1. Merging
the two sequences as C2 Cn C1 Cn C0, the corresponding structure is shown in
Figure 7.11.
French C1 C0
(1) [L’idée C1 était simple C0] “The idea was simple” (EuRom4 frfn6F)
[Figure: F0 curve of L’idée était simple]
Italian Cn C0
(2) [L’idea Cn era semplice C0] “The idea was simple” (EuRom4 itfn6I)
[Figure: F0 curve of L’idea era semplice]
Spanish Cn C0
(3) [La idea Cn era simple C0] “The idea was simple” (EuRom4 esfn6E)
[Figure: F0 curve of La idea era simple (Spanish)]
Catalan Cn C0
(4) [La idea Cn era simple C0] “The idea was simple” (EuRom4 cafn6C)
[Figure: F0 curve of La idea era simple (Catalan)]
Portuguese Cn C0
(5) [A ideia Cn era simples C0] “The idea was simple” (EuRom4 ptfn6P)
[Figure: F0 curve of A ideia era simples]
Romanian Cc C0
(6) [Ideea Cc era simplă C0] “The idea was simple” (EuRom4 rofn6R)
[Figure: F0 curve of Ideea era simplă]
Italian C2 Cc
(7) [In pericolo C2 poi Cc] “In danger then . . .” (EuRom5 I07-2)
Figure 7.18 In pericolo poi in In pericolo, poi, ci sono alcune città africane
come Timbuctù . . .
Spanish C2 Cc
(8) [cuando C2 se constate Cc] “when it was noticed . . .” (EuRom5 E12-1)
Figure 7.19 cuando se constate in que pide a sus 47 Estados miembros que
establezcan algún tipo de sanción cuando se constate que hay una
“provocación a la discriminación” en los mensajes publicitarios.
Portuguese C2 Cc
(9) [apelidado C2 de Óscar Cc] “called Oscar . . .” (EuRom5 P04-1)
Figure 7.20 apelidado de Óscar in Um cão de raça terra nova apelidado de
Óscar, cujo dono é um socialite, vai ser submetido a um lifting aos olhos.
Catalan C2 Cc
(10) [Com C2 les formigues Cc] “Like ants . . .” (EuRom5 C19-1)
Figure 7.21 Com les formigues in Com les formigues, els cucs o els
escarabats, les meduses són en algunes cultures un element bàsic de
l’alimentació.
Italian C1 Cc
(11) [[La più grande C1 delle piroghe Cc] [misura Cn cinque Cn metri C0]]
“The largest of the canoes measures five meters” (EuRom4 itfn42I)
[Figure: F0 curve of La più grande delle piroghe misura cinque metri]
Spanish C2 Cc
(12) [Siguiendo Cn este modelo Cc] “Following this model . . .” (EuRom5
E11-1)
[Figure: F0 curve of Siguiendo este modelo]
Portuguese C2 Cc
(13) [Segundo C2 a especialista Cc] “According to the expert . . .” (EuRom5
E11-1)
[Figure: F0 curve of Segundo a especialista]
Portuguese C2 Cc
(14) [na cidade C2 de York Cc] “in the city of York . . .” (EuRom5 P05-1)
[Figure: F0 curve of e na cidade de York]
Portuguese C2 Cc
(15) [Nascido C2 no Japão Cc] “Born in Japan . . .” (EuRom5 P06-1)
[Figure: F0 curve of Nascido no Japão]
French C2 C1
(16) [les garçons C2 de piste C1] “the track boys . . .” (EuRom5 F03-1)
[Figure: F0 curve of les garçons de piste]
French C2 C1
(17) [[La plus grande C2 des pirogues C1] [mesure Cn cinq mètres C0]]
“The largest of the canoes measures five meters.” (EuRom4 frfn42F)
[Figure: F0 curve of La plus grande des pirogues mesure cinq mètres]
In these two examples, the contrast of melodic slope is clearly at work, reveal-
ing one important characteristic of French sentence intonation (Martin, 1975).
French C2 C1
(18) [mais les scientifiques C2 japonais C1] “but Japanese scientists . . .”
(EuRom5 F05-1)
Figure 7.29 mais les scientifiques japonais in Mais les scientifiques japonais
ont montré que cet agent était contrôlé par une enzyme.
speaker to select various melodic features as long as the contrast between the
desired successive contours is realized to ensure a proper perception of the
prosodic structure.
Catalan Cn C1
(19) [La cuinera de Sant Pol de Mar] “The chef of San Pol de Mar. . .”
(EuRom5 C19-1)
Figure 7.30 La cuinera de Sant Pol de Mar in La cuinera de Sant Pol de Mar,
Carme Ruscalleda . . .
or [Cn Cn C0], the first realization being phonetically more marked than
the second, as in Figure 7.35. If ended with C1, the possible prosodic groups are
[C2 C2 C1] and [Cn Cn C1].
Ended with C0
The first two contours Cx both contrast with C0 and must belong to the same class. They can be instantiated by C1 or Cn.
The following example is an enumeration, realized as such in all the other Romance languages with similar text (except in Romanian).
French C1 C1 C0
(20) [Les romans C1 ont un début C1 et une fin C0]
“Novels have a beginning and an end” (EuRom4 frfn9)
[Figure: F0 curve of Les romans ont un début et une fin]
French Cn Cn C0
(21) [[on rend C1] [cette interdiction Cn strictement Cn inefficace C0]]
“we make the ban strictly ineffective” (EuRom4 F_23_16)
[Figure: F0 curve of on rend cette interdiction strictement inefficace]
French C2 C2 C1
(22) [ainsi C1] [[sa nouvelle gamme C2 de combinés C2 ] [présentés Cn
lundi C1]]
“so its new range of handsets presented Monday . . .” (EuRom5 F04-1)
with the sequence of contours C1 C2 C2 C1
[Figure: F0 curve of ainsi sa nouvelle gamme de combinés présentés lundi]
French Cn Cn C1
(23) [[Le trente Cn novembre C2], [anniversaire Cn de la mort Cn au combat
Cn en dix-sept cent dix-huit Cn du roi Cn Charles XII C1]]
“On 30 November, the anniversary of the death in battle in 1718, King
Charles XII . . .” (F_01_22 EuRom4)
Figure 7.36 This example shows a saturation of melodic contrasts in the long
syntagm anniversaire de la mort au combat en dix-sept cent dix-huit du roi
Charles Douze resulting in Cn neutralized contours on all stress groups’ final
(and stressed) syllables, except the last, carrying C1. The contrast of melodic
slope is realized with the contrast C2 ending Le trente Cn novembre with C1.
Configuration II
Configuration II groups the first two prosodic words, which are then
grouped with the third prosodic word. The possible sequences are [[C2 C1]
C0] and [[Cn C1] C0] if terminated by C0 (Fig. 7.38), and [[Cn C2] C1] (Fig. 7.39)
if terminated by C1.
Ended with C0
French C2 C1 C0
(24) [[de se livrer C2 à des affrontements C1] en règle C0]
“to engage in good standing clashes” (EuRom4 F_01_22)
[Figure: F0 curve of de se livrer à des affrontements en règle]
French Cn C1 C0
(25) [[cette maladie C1] [[est devenue C2] [une pathologie Cn changeante
C1]] et multiforme C0]
“this disease has become a changing and multifaceted pathology”
(EuRom4 F_08_04)
[Figure: F0 curve of cette maladie est devenue une pathologie changeante et multiforme]
French Cn C2 C1
(26) [[C’est au travers Cn de cette relation Cn qu’il instaurera C2] à ces deux
personnes C1]
“It is through this relationship that he will build with these two people”
(EuRom4 F_21_16)
Figure 7.41 C’est au travers de cette relation qu’il instaurera à ces deux
personnes.
Configuration III
Ended with C0
French C1 Cn C0
(27) [[Certains C1] [de ces bâtiments Cn préfabriqués C1] [se sont révélés Cn
dangereux C0]]
“Some of these prefabricated buildings have proved dangerous” (EuRom4
frfn39F)
[Figure: F0 curve of Certains de ces bâtiments préfabriqués se sont révélés dangereux]
French C1 Cn C0
(28) [[cependant C1][empêcher C2 les bagarres C1] [recherchées Cn de part
et d’autre C0]]
“however, to prevent fights sought by both sides” (EuRom4 F_01_22)
Figure 7.44 Neuf cents policiers n’ont pu, cependant, empêcher les bagarres
recherchées de part et d’autre.
French C2 Cn C1
(29) [[Le 30 Cn novembre C2], [anniversaire Cn de la mort au combat Cn en
1718 Cn du roi Charles XII C1]]
“On 30 November, the anniversary of the death in battle in 1718, King
Charles XII . . .” (EuRom4 F_01_22)
Figure 7.46 This example shows a saturation of melodic contrasts in the long
syntagm anniversaire de la mort au combat en dix-sept cent dix-huit du roi
Charles Douze resulting in Cn neutralized contours on all stressed groups’
final (and stressed) syllables, except the last, carrying C1. The contrast of
melodic slope is realized with the contrast C2 ending Le trente Cn novembre
with C1.
Configuration I Enumerations
Ended with C0
The possible instantiations of Cx are Cc, C1, and Cn.
(Svevo)
(30) [B C2 antiparasitari Cc] [C C2 anticoncezionali Cc]
“antiparasites contraceptives . . .”
[Figure: F0 curve of B antiparasitari C anticoncezionali]
These cases normally require the existence of a lower level in the prosodic
structure between the consecutive complex contours Cc (as in Fig. 7.48).
Outside emphatic style, consecutive contours ending single prosodic words
do not need high-level contrasts using Cc and instead use Cn or C1, as in
the following examples.
Italian C1 C1 C0
(31) [I romanzi C1 hanno un inizio C1 e una fine C0]
“The novels have a beginning and an end” (EuRom4 itfn9I)
[Figure: F0 curve of I romanzi hanno un inizio e una fine]
Spanish Cn Cn C0
(32) [Los romances Cn tienen un inicio Cn y un fin C0]
“The novels have a beginning and an end” (EuRom4 esfn9E)
[Figure: F0 curve of Los romances tienen un inicio y un fin]
Catalan Cn Cn C0
(33) [Els romanços Cn tenen un inici Cn i un final C0]
“The novels have a beginning and an end” (EuRom4 cafn9C)
[Figure: F0 curve of Els romanços tenen un inici i un final]
Portuguese Cn Cn C0
(34) [Os romances Cn têm um início Cn e um fim C0]
“The novels have a beginning and an end” (EuRom4 ptfn9P)
[Figure: F0 curve of Os romances têm um início e um fim]
Portuguese Cn Cn C0
(35) [Avião Cn de papel Cn no Espaço C0]
“Paper Airplane in Space” (EuRom5 P02-1)
[Figure: F0 curve of Avião de papel no Espaço]
Portuguese C2 C2 Cc
(36) [Um cão C2 de raça C2 terra nova Cc]
“A dog of Newfoundland breed” (EuRom5 P04-1)
[Figure: F0 curve of Um cão de raça terra nova]
In this sequence, the second contour C2 has a glissando value very close to the
threshold and could be transcribed as Cn instead of C2, corresponding to the
structure [[C1 [Cn Cc]] congruent with the syntax of Um cão de raça terra nova.
This is an example of a complex contour Cc occurring on the last and
stressed syllable of the prosodic group (um escocês).
Portuguese C2 C2 Cc
(37) [é permitido C2 matar C2 um escocês Cc]
“it is allowed to kill a Scotsman . . .” (EuRom5 P05-1)
[Figure: F0 curve of é permitido matar um escocês]
Portuguese C2 C2 Cc
(38) [[Os vídeos C2] [sobre Cn actividades C2] paranormais Cc]
“The videos about paranormal activities” (EuRom5 P16-1)
[Figure: F0 curve of Os vídeos sobre actividades paranormais]
Spanish C1 C1 Cc
(39) [La recomendación C1 plantea C1 a los Estados Cc]
“The recommendation poses to the States . . .” (EuRom5 E12-1)
[Figure: F0 curve of La recomendación plantea a los Estados]
Romanian Cn Cn Cc
(40) [[Mișcarea Cn separatistă Cn bască Cc] [a comis C1 noi atentate C0]]
“The Basque separatist movement committed new bombings” (EuRom4
rofn42)
[Figure: melodic curve of Mișcarea separatistă bască a comis noi atentate]
Ended with C2
Catalan Cn Cn C2
(41) [[[i després Cn que el vedell Cn] ataqués C2] un dels homes Cc] [que el
volia C1 lligar Cc]
“and, after the calf attacked one of the men who wanted to tie him”
(EuRom5 C18-1)
Figure 7.60 i després que el vedell ataqués un dels homes que el volia lligar.
176 The Incremental Prosodic Structure in Romance languages
Ended with C1
Portuguese Cn Cn C1
(42) [A escolha Cn da carreira Cn profissional C1]
“The choice of a professional career . . .” (EuRom5 P09-1)
[Figure: melodic curve of A escolha da carreira profissional]
Configuration II
Ended with C0
Italian C2 Cc C0
(43) . . . [[in coppie C2 nelle quali il padre Cc] è sieropositivo C0]
“in couples in which the father is HIV positive” (EuRom4 I_22_17)
[Figure: melodic curve of in coppie nelle quali il padre è sieropositivo]
Italian C2 Cc C0
(44) [[. . . sarà arruolato C2 dai carabinieri Cc] e addestrato C0]
“. . . will be recruited and trained by the police” (EuRom5 I03-1)
[Figure: melodic curve of sarà arruolato dai carabinieri e addestrato]
Italian C1 Cc C0
(45) [[[probabilmente C1] [sfuggito Cn al controllo Cc]] del padrone C0]
“probably escaped the control of the master” (EuRom5 I03-1)
[Figure: melodic curve of probabilmente sfuggito al controllo del padrone]
Romanian Cn C1 C0
(46) [[Aceasta Cn este o dilemă C1] insolubilă C0]
“This was an insoluble dilemma” (EuRom4 rofn10)
[Figure: melodic curve of Aceasta este o dilemă insolubilă]
Ended with Cc
Romanian C1 C2 Cc
(47) [[Situația C1 periferică C2] a Portugaliei Cc]
“The peripheral situation of Portugal . . .” (EuRom4 rofn40)
[Figure: melodic curve of Situația periferică a Portugaliei o menține într-o poziție marginală în raport cu fluxurile din Est]
Catalan Cn C2 Cc
(48) [i després Cn que el vedell Cn ataqués C2] un dels homes Cc] que el volia
C2 lligar Cc]
“and, after the calf attacked one of the men who wanted to tie him . . .”
(EuRom5 C18-1)
Figure 7.70 i després que el vedell ataqués un dels homes que el volia lligar.
These examples show that, unlike French, the other Romance languages
use C2 and not C1 as the marker of the prosodic dependency relation with the
higher-rank melodic contour Cc.
Experimental data 181
Romanian Cn C1 Cc
(49) [[[Unele C1] [[dintre aceste Cn clădiri C1] prefabricate Cc]] s-au dovedit
Cn periculoase C0]
“Some of these prefabricated buildings have been found to be dangerous”
(EuRom4 rofn46)
[Figure: melodic curve of Unele dintre aceste clădiri prefabricate s-au dovedit periculoase]
Romanian Cc C1 C0
(50) [[Romanele Cc] [[au un început C1] [și un sfârșit C0]]]
“The novels have a beginning and an end” (EuRom4 rofn9R)
[Figure: melodic curve of Romanele au un început și un sfârșit]
This sentence was read with a different prosodic structure from the other
Romance languages, congruent with syntax.
Romanian C1 Cn C0
(51) [Alarmă C1] [la școala Cn britanică C0]
“Alarm in the British school” (EuRom4 rofn9R)
[Figure: melodic curve of Alarmă la școala britanică]
Ended with Cc
Italian C2 C1 Cc
(52) [[che C2] [trasferirsi C1 in USA Cc]]
“which, moved in USA . . .” (EuRom4 I_23_06)
[Figure: melodic curve of che trasferirsi in USA]
Catalan C2 Cn Cc
(53) [[Pocs Cn minuts C2] [després Cn de les set Cn de la tarda Cc]] el vedell
“A few minutes, after seven in the evening, the calf . . .” (EuRom5 C18-1)
[Figure: melodic curve of Pocs minuts després de les set de la tarda el vedell]
The prosodic structure segments the text into Pocs minuts and després de les
set de la tarda.
Italian C1 Cn Cc
(54) [[[probabilmente C1] [sfuggito Cn al controllo Cc]] del padrone C0]
“probably escaped the control of the master” (EuRom5 I03-1)
[Figure: melodic curve of probabilmente sfuggito al controllo del padrone]
Romanian C1 Cc C0
(55) [[Unele clădiri C1 s-au dovedit Cc] a fi periculoase C0]
“Some of these buildings have proved dangerous” (EuRom4 rofn8R)
Romanian C1 Cc C0
(56) [[[Un grup C1] [de cercetători Cn germani Cc]] a rezolvat enigma C0]
“A group of German researchers has solved the enigma” (EuRom4
rofn3R)
[Figure: melodic curve of Un grup de cercetători germani a rezolvat enigma]
French
(57) [[les médecins C2] [[de l’Académie Cn des sciences C2] [médicales C1]]]
“the doctors of the Academy of Medical Sciences” (EuRom5 F01-1)
[Figure: melodic curve of les médecins de l’Académie des sciences médicales]
The glissando values of all four contours are above the threshold, and a
supplementary contrast between the C1 and C2 contours is ensured by a
difference in the duration of the vowels involved: about 100 ms for C2, 160 ms for C1.
b. C1 C1 Cn Cn Cn C0
Spanish
(58) [[El catalán C1] [es C1] [[la ochenta Cn y ocho Cn lengua Cn] del
mundo C0]]
“Catalan is the 88th language of the World” (EuRom5 E16-1)
[Figure: melodic curve of El catalán es la 88a lengua del mundo]
c. Cn Cn Cc Cn Cn Cc
Catalan
(59) [L’acadèmia Cn de la llengua Cn catalana Cc] [l’Institut Cn d’Estudis Cn
Catalans Cc] [IEC Cc]
“The Catalan language academy, the Institute of Catalan Studies (IEC) . . .”
(EuRom5 C06-1)
[Figure: melodic curve of L’acadèmia de la llengua catalana, l’Institut d’Estudis Catalans (IEC)]
The speaker of this example realizes a particular complex contour, with a
rising melodic movement on both the stressed and the final vowels.
d. Cn Cn Cn C0
Another example using the same strategy, with a succession of falling
contours Cn (−Glissando) contrasting with the final contour by the +/−Low
and +/−Glissando features (C0 being +Low and +Glissando), as well as by
vowel duration (about 90 ms for the first four stressed vowels, and 150 ms
for the final vowel of the conclusive contour).
French
(60) [Le programme Cn de recherche Cn a débuté Cn en deux mille deux C0]
“The Research Program began in two thousand and two” (EuRom5 F05-1)
[Figure: melodic curve of Le programme de recherche a débuté en deux mille deux]
The example Poche zampate per attirare l’attenzione del piantone del
comando provinciale dei carabinieri (EuRom5 I03-1) illustrates a typical
case of melodic feature saturation, with a succession of neutralized contours
Cn ended with a complex contour Cc [[per attirare Cn l’attenzione Cn del
piantone Cn del comando Cn provinciale C2] dei carabinieri Cc] after a first
group [Poche Cn zampate C2]. In this example, the speaker cannot realize a
prosodic structure congruent with the relatively complex syntax (typical of
written texts and seldom, if ever, heard in unprepared spontaneous
production).
Italian
(61) [[Poche Cn zampate C2] [per attirare Cn l’attenzione Cn del piantone Cn
del comando Cn provinciale Cn dei carabinieri Cc]]
“A few paw strokes to attract the attention of the sentry of the
provincial command of the police . . .” (EuRom5 I03-1)
Figure 7.84 Poche zampate per attirare l’attenzione del piantone del
comando provinciale dei carabinieri . . .
Italian
(62) [[Una nuova C1] [e divertente Cn ginnastica Cn con la palla Cc]]
“A new and fun gymnastic exercise with a ball . . .” (EuRom5 I05-1)
[Figure: melodic curve of Una nuova e divertente ginnastica con la palla]
Example (63) below shows a saturation chosen by the speaker when reading
the complex syntagm: es necesario alfabetizar a cuatro millones de personas
cada año (the two successive vowels [a] in cada año are pronounced as one
unique [a]), with a sequence of six neutralized contours Cn followed by the
final group rising contour C1.
Spanish
(63) [es Cn necesario Cn alfabetizar Cn a cuatro Cn millones Cn de personas
Cn cada año C1]
“it is necessary to teach four million people to read and write each year . . .” (EuRom5
E02-1)
[Figure: melodic curve of es necesario alfabetizar a cuatro millones de personas cada año]
Spanish
(64) [[La resolución C1] [propone Cn que se pongan Cn en marcha Cn
dispositivos Cn nacionales Cn de autocontrol Cc]]
“The resolution proposes to implement national self-monitoring devices
. . .” (EuRom5 E12-1)
[Figure: melodic curve of La resolución propone que se pongan en marcha dispositivos nacionales de autocontrol]
Catalan
(65) [[Així C2] [abans Cn que acabi el Cn dos mil vuit Cc]]
“Thus before 2008 . . .” (EuRom5 C20-1)
[Figure: melodic curve of Així abans que acabi el dos mil vuit]
Portuguese
(66) [Com cerca Cn de sete Cn centímetros Cn de comprimento Cc]
“About seven centimeters long . . .” (EuRom5 P02-1)
[Figure: melodic curve of Com cerca de sete centímetros de comprimento]
Romanian
(67) [[[In Germania C1] [violența Cn rasistă Cc]] [a depășit Cn limita C0]]
“Racist violence in Germany has exceeded the limit” (EuRom4 rofn22R)
[Figure: melodic curve of In Germania violența rasistă a depășit limita]
Coordination
In a recent paper devoted to the prosodic aspects of coordinated constructions
in French (Mouret et al., 2008), the authors examine three syntactic
configurations involving coordination:
1. Postverbal position
a. simple (e.g. il faut décorer Anne-Marie, Jean-Philippe et Ségolène “you
must decorate Anne-Marie, Jean-Philippe and Ségolène”)
b. duplicated (e.g. il faut décorer et Anne-Marie et Jean-Philippe et
Ségolène “you must decorate and Anne-Marie and Jean-Philippe and
Ségolène”)
c. juxtaposition (e.g. il faut décorer Anne-Marie, Jean-Philippe, Ségolène
“you must decorate Anne-Marie, Jean-Philippe, Ségolène”).
2. Preverbal position
a. simple (e.g. Anne-Marie, Jean-Philippe et Ségolène vont être décorés
“Anne-Marie, Jean-Philippe and Ségolène will be decorated”)
b. duplicated (e.g. et Anne-Marie et Jean-Philippe et Ségolène vont être
décorés “And Anne-Marie and Jean-Philippe and Ségolène will be decorated”)
c. juxtaposition (e.g. Anne-Marie, Jean-Philippe, Ségolène vont être décorés
“Anne-Marie, Jean-Philippe, Ségolène will be decorated”).
Seven speakers read 126 sentences of these different types. The acoustic
analysis of the recordings was displayed in order to highlight the melodic
movements occurring on stressed vowels, at the boundaries of the coordinated
Coordination, enumeration, parenthesis 193
Figure 7.91 je vous suggère d’installer des volets des rideaux et des
voilages “I suggest you install shutters, curtains, sheers.”
[Figure: melodic curve of Je vous conseille d’étudier le néerlandais le danois et le norvégien]
[Figure: melodic curve of Jamais Barnabé Jean-David ni Mamadou ne seraient prêts à venir travailler le samedi]
Figure 7.94 Le muret le donjon et l’église sont de style roman “The wall the
dungeon and the church are Romanesque.”
(70) [[Jamais C2 Barnabé C1] [Jean-David C1] [ni Mamadou C1] [ne seraient prêts
Cn à venir travailler Cn le samedi C0]]
(71) [[Le muret C2 le donjon C2 et l’église C1] sont de style roman C0]
The sequences of melodic contours are distributed according to Table 7.4.
Figure 7.95 Two possible hierarchical configurations [ABC] and [[A] [B]
[C]] for postverbal accentual units (panel labels: A B et C; et A et B et C;
A B C), resulting in similar sequences of contours: rising, rising, falling.
[Figure: melodic curve of le vélo le roller ou l’aviron comptent parmi les activités populaires sur le campus]
Figure 7.98 le muret le donjon et l’église sont de style roman in which the
coordinate units are the subject of the Verb Phrase sont de style roman.
(72) [[le vélo C1] [le roller C1] [ou l’aviron C1] [comptent parmi les activités Cn
populaires Cn sur le campus C0]]
(73) [[le muret C2 le donjon C2 et l’église C1] sont de style roman C0]
Figure 7.99 A parallel realization where the first stress groups are
coordinated with the conjunction ni, associated in each case with an
emphatic accent. En aucun cas ni les volets ni les rideaux ni les voilages ne
permettent de régler ce problème.
[Figure: melodic curve of On ne peut plus livrer la baignoire le lavabo ni l’évier sans acompte de votre part]
Enumeration
Enumeration implies the use of prosodic contours of the same class, usually
C1 or Cn in French and Cc or C1 in the other Romance languages, on every
item except the last. Some examples are given in Figures 7.101 to 7.103.
(74) 1.46, 1.47, 1.44, 2.78, 2.41
(75) [. . . [B C2 antiparasiti Cc] [C C2 [anticoncezionali Cc] . . . [M Cc
antilopi C0]]
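The regularity described for enumeration can be checked mechanically on a sequence of contour labels. The short Python sketch below is only an illustration: the function name is hypothetical, and it assumes that each enumerated item has already been labeled with the contour class ending it.

```python
def follows_enumeration_pattern(contours):
    """Return True if every item except the last ends with the same contour class.

    contours is a list of labels such as "C1", "Cn", "Cc", or "C0",
    one per enumerated item, in order of occurrence.
    """
    if len(contours) < 2:
        return False  # a single item is not an enumeration
    *items, _last = contours
    # The last item is left unconstrained; all preceding items must share a class.
    return len(set(items)) == 1
```

For instance, a sequence of Cc contours closed by a conclusive C0, as in the Italian enumeration of numbers, satisfies the check, whereas a sequence alternating C1 and C2 before the final item does not.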
Figure 7.101 Enumeration in Italian of numbers (1.46, 1.47, 1.44, 2.78, 2.41),
sequence of Cc contours ending each prosodic group, terminated by C0
conclusive on the last item giorni (Italian EuRom4 I_09_04).
[Figure: melodic curve of the enumeration in example (75)]
Portuguese
(76) [[Uma equipa C2] [de cientistas C2] [do instituto Cn de reabilitação C2]
de Chicago]
“A team of scientists at the Rehabilitation Institute of Chicago” (EuRom5
P14-1)
[Figure: melodic curve of Uma equipa de cientistas do Instituto de Reabilitação de Chicago]
Parenthesis
The stress group corresponding to a parenthesis in the text can either be
integrated prosodically and end with a C1 or Cc contour, or appear isolated
and end with a conclusive terminal C0 contour (cf. Gachet & Avanzi, 2008;
Debaiseux & Martin, 2010).
A first example (Fig. 7.104) shows the prosodic integration of the relative
pronoun che in the parenthesis in quel mondo, using a melodic contour C2
contrasting with Cc ending the parenthesis:
Italian
(77) [Il fatto che C2, in quel mondo Cc], [gli uomini Cc]
“The fact that, in that world, men . . .” (EuRom4 I16-21)
[Figure: melodic curve of Il fatto che, in quel mondo, gli uomini]
Italian
(78) [[che C2] [trasferirsi C1 in USA Cc]]
“which, moved in USA . . .” (EuRom4 I_23_06)
[Figure: melodic curve of che trasferirsi in USA]
The same configuration can be found for the other Romance languages, for
instance in Spanish (Fig. 7.106).
Spanish
(79) [. . .[permitan C1 una utilización C2] [más segura Cn de algo Cn que C2]
evidentemente Cc]
“allow a safer use of something that obviously . . .” (EuRom4 E19-17)
[Figure: melodic curve of permitan una utilización más segura de algo que evidentemente]
Italian
(80) [Ancora C1 i giapponesi, Cc] [per contro Cc]
“Yet the Japanese, by contrast . . .” (EuRom4 I_23_06)
Figure 7.107 Ancora i giapponesi, per contro hanno la più alta incidenza
mondiale.
An example of AM prosodic analysis in French 203
Italian
(81) [[Le donne C1 giapponesi C2] per esempio Cc] [[hanno Cn un’incidenza
C2] [di tumori Cn alla mammella Cc . . . ]]
“Japanese women, for example, have an incidence of breast cancer . . .”
(EuRom4 I_23_06)
[Figure: melodic curve of Le donne giapponesi, per esempio, hanno un’incidenza di tumori alla mammella]
Figure 7.109 Le coléreux garçon ment à sa mère (from Jun & Fougeron,
2002).
Figure 7.110 le coléreux et mauvais garçon ment à sa mère (from Jun &
Fougeron, 2002).
configuration in two stress groups involved into a single stress group [Le
coléreux garçon] LHiLH*. The falling melodic contour on the final sylla-
ble of coléreux is explained by the interpolation which must exist between
the initial rise Hi and the initial L in the final prosodic word LH*.
However, the authors’ interpretation associating a Hi with the single-syllable
word ment is quite questionable, as this content word has only one syllable, and
if stressed, this single syllable can be a final AP pitch accent (in the
interpretation of the authors).
The next example analyzed by the authors is le coléreux et mauvais garçon
ment à sa mère “The choleric and bad boy lies to his mother.”
Driven by the concept of a pitch accent realized with an H* tonal target, the
sentence is analyzed into three stress groups: [Le coléreux] [et mauvais
garçon] [ment à sa mère] so that each AP receives the sequence Hi H*.
Since it seems rather strange to associate an emphasis Hi on the second
syllable of mauvais, a better interpretation is also possible, with the following
segmentation:
(83) [Le coléreux] [et mauvais] [garçon] [ment à sa mère].
Actually, whether they are considered stressed or not, the final syllables
of le coléreux and et mauvais have a falling melodic contour, as predicted
by the ISC model, in a melodic contour sequence C2 C2 C1 C0 or Cn Cn
C1 C0.
Interestingly, the authors are puzzled by what they call “exceptional
cases” and discuss such an example, where an AP ends with a falling pitch
accent.
[Figure: F0 curve (Hz) with the tonal annotation Hip LHi L* L- Hip Hi L* L- Hip LH*H% LH* LHi L*L% and a phonetic transcription tier]
Furthermore, the L symbol seems to be there because of the preceding dip in the
fundamental frequency curve, but this dip is phonetic due to the syllable-initial
voiced stop [d] (see Martin, 2008). Likewise, the second Hi aligned on the first
syllable of minaret is actually lower than the F0 value on the preceding ou. An
actual emphatic stress would have provoked a clear bump in the melodic curve.
Another remark pertains to the initial stress on the three conjunctions ou. No
phonological explanation for this fact is given here, although studies on the
intonation of coordination in French provide simple explanations (all
conjunctions must be equally stressed, or equally unstressed, in this kind of
example; Martin, 2009).
The observed prominences must be of another nature than pitch accents,
namely boundary tones. The problem then pertains to the Single Layer
Hypothesis (SLH), as even in simple sentences with only ip’s, an IP-specific
boundary tone will occur. This has been somewhat swept under the carpet by
Jun and Fougeron (2002), who managed to analyze very short sentences with
only IPs (see above). In a more general case, however, as already revealed in
Delais et al. (2014), in the sentence Ou le donjon ou le minaret ou les murailles
doivent être restaurés, there are two boundary tones ending the stress groups ou
le donjon and ou le minaret, with a falling melodic curve. These tones are of
another nature than the rising boundary tone that ends the first IP of the
sentence, Ou le donjon ou le minaret ou les murailles.
If the hypothesis of having two different boundary tones is maintained, the
SLH is no longer valid, since it contradicts the recursivity observed in this
example (ip groups prosodic words, and IP groups ip’s, even when ip and IP
are each reduced to a single component, a stress group and an ip respectively).
If the SLH must be kept at all costs, then the prosodic events on the first prosodic
words must be considered as pitch accents . . . French would then have prosodic
events which are sometimes pitch accents and sometimes boundary tones, the first
being realized with a falling pitch, the second with a rising pitch. Numerous
examples detailed in the next chapter show that this interpretation does not hold
when the prosodic structure becomes more complex.
[Figures: schematic melodic contours, with C1, C2, Cn, C0, and emphasis labels, for Surtout ce qui m’a fait quand même infiniment plaisir c’est de voir tous les parents d’élèves du privé comme du public tous les élèves approuver cette charte]
approuver Cn (“approve”)
cette charte C0 (“this chart”)
This particular example in French is canonical, in the sense that prosodic
events are instantiated by standard melodic contours conforming to commonly
found phonetic descriptions. Only the final C0 conclusive declarative contour
seems awkward on the melodic curve (see Fig. 7.115) due to overlapping of
another speaker pronouncing the word alors before the speaker (CA) had
finished (careful examination of harmonics displayed on a narrowband spectro-
gram shows the falling fundamental frequency of the contour realized as
expected).
The storage-concatenation process implies the (hypothetical) existence of
“buffers” of short-term memory keeping temporary information. Let’s call
them C0, C1, C2, and Cn. These buffers keep temporary strings of syllables
terminated by specific prosodic events C0, C1, C2, and Cn. As with the strings
of syllables, there is a maximum number of items (words) that can be stored
(thus remembered by the listener), in the order of seven.
First, at the beginning of the storage-concatenation process, all memory
buffers are cleared. Then along the time axis appear successively strings of
syllables organized (phrased) into stress groups, successively ended with
prosodic events C1, C2, C1, C2, C1, C2, C1, C1, Cn, and C0. The listener
must identify each of these prosodic events as belonging to a specific class of
prosodic events. Only the final conclusive (and in this case declarative) contour has
to be realized according to a specific pattern in the language in question in order
to be always identifiable by the listener. The other prosodic events may vary in
their realizations, according to sociolinguistic parameters, geographic origins,
etc., but will be properly identified by the listeners after a short period of
adaptation if necessary.
An example of ISC prosodic processing in French 211
Then, as the prosodic events are identified, each sequence of syllables ending
with one of these prosodic events is stored in its appropriate buffer, as shown in
Table 7.5:
Sequence of events (see Table 7.5):
Surtout C1| “Especially” ended with prosodic event C1, stored in
buffer C1;
ce qui m’a fait Cn quand même C2| “what made me” ended with C2,
stored in buffer C2;
infiniment Cn plaisir C1| “extremely happy” ended with C1, conca-
tenated with what was stored in C2 (ce qui m’a fait quand même)
and stored in buffer C1, which now contains ce qui m’a fait quand
même infiniment plaisir. C2 buffer is cleared;
c’est de voir C2| “it is to see” ended with C2, stored in buffer C2;
Syllabic sequence | Event | Buffer C2 | Buffer C1 | Buffer C0
Surtout | C1 | | Surtout |
ce qui m’a fait quand même | C2 | ce qui m’a fait quand même | |
infiniment plaisir | C1 | | Surtout + ce qui m’a fait quand même infiniment plaisir |
c’est de voir | C2 | c’est de voir | |
tous les parents d’élèves | C1 | | c’est de voir + tous les parents d’élèves |
du privé | C2 | du privé | |
comme du public | C1 | | du privé + comme du public |
tous les élèves | C1 | | tous les élèves |
approuver | Cn | | |
cette charte | C0 | | | Surtout ce qui m’a fait quand même infiniment plaisir c’est de voir tous les parents d’élèves du privé comme du public tous les élèves approuver cette charte
tous les parents d’élèves C1| “all the parents” ended with C1, con-
catenated with what was stored in C2 (c’est de voir) and stored in
buffer C1, which now contains c’est de voir tous les parents
d’élèves. C2 buffer is cleared;
du privé C2| “of the private” ended with C2, stored in buffer C2;
comme du public C1| “like the public school system” ended with C1,
concatenated with what was stored in C2 (du privé) and stored in
buffer C1, which now contains du privé comme du public. C2 buffer is
cleared;
tous les élèves C1| “all students” ended with prosodic event C1, stored
in buffer C1;
approuver Cn| “approve” ended with Cn, stored in buffer Cn;
cette charte C0| “this chart” ended with the terminal conclusive
prosodic event C0, concatenated with all remaining syllabic
sequences stored in buffer C1 to form the whole sentence. All
buffers are then cleared to process the next sentence.
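The sequence of events above can be sketched in code. The following Python is only an illustrative sketch under stated assumptions, not an implementation given in the text: the names Buffers, process, and EXAMPLE are hypothetical, a group ended with Cn is assumed to wait in its own buffer until the next C2, C1, or C0 event absorbs it, and the limit of about seven stored items is not modeled.

```python
from dataclasses import dataclass, field


@dataclass
class Buffers:
    """Hypothetical short-term buffers, one per non-terminal contour class."""
    cn: list = field(default_factory=list)
    c2: list = field(default_factory=list)
    c1: list = field(default_factory=list)

    def clear(self):
        self.cn.clear()
        self.c2.clear()
        self.c1.clear()


def process(groups):
    """Concatenate stress groups into sentences.

    groups is a list of (text, event) pairs, where event is one of
    "Cn", "C2", "C1", or "C0". Returns the list of completed sentences.
    """
    buf = Buffers()
    sentences = []
    for text, event in groups:
        if event == "Cn":
            # Assumption: a neutralized group waits for the next stronger event.
            buf.cn.append(text)
        elif event == "C2":
            buf.c2.extend(buf.cn + [text])
            buf.cn.clear()
        elif event == "C1":
            # C1 absorbs what the C2 (and Cn) buffers hold, then clears them.
            buf.c1.append(" ".join(buf.c2 + buf.cn + [text]))
            buf.c2.clear()
            buf.cn.clear()
        elif event == "C0":
            # The conclusive contour closes the sentence and resets all buffers.
            sentences.append(" ".join(buf.c1 + buf.c2 + buf.cn + [text]))
            buf.clear()
        else:
            raise ValueError(f"unknown prosodic event: {event}")
    return sentences


# The French example, segmented as in the sequence of events above.
EXAMPLE = [
    ("Surtout", "C1"),
    ("ce qui m'a fait", "Cn"),
    ("quand même", "C2"),
    ("infiniment", "Cn"),
    ("plaisir", "C1"),
    ("c'est de voir", "C2"),
    ("tous les parents d'élèves", "C1"),
    ("du privé", "C2"),
    ("comme du public", "C1"),
    ("tous les élèves", "C1"),
    ("approuver", "Cn"),
    ("cette charte", "C0"),
]
```

Calling process(EXAMPLE) returns a single sentence. Keeping one list per contour class mirrors the buffers of Table 7.5, so the intermediate state can be inspected after each event.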
In all concatenation stages, the resulting buffer cannot contain more than a
limited number of stress groups. In the final memorization process, one may
imagine that the speaker will remember only the key words. In the above
examples, surtout, plaisir, voir, élèves, privé, public, charte, i.e. a limited
number of words (only seven here) end stress groups with their final syllable
stressed.
Conclusion
By highlighting the dynamic time aspect of the prosodic structure and of the
encoding and decoding process performed by the speaker and the listener, an
unexpected coherence in the mechanism using a limited number of melodic
contours emerged for all the Romance languages considered in this chapter.
The comparison between prosodic realizations of similar sentences reveals the
similarities in the processing of the prosodic structure by both the speaker and
the listener.
In summary, the phonological contours on stressed vowels are:
C0 Terminal (declarative or interrogative);
Cc Complex contour, slightly falling on the stressed syllable and
rising on the final syllable (stressed or not), absent in French;
C1 Rising (above the glissando threshold);
C2 Falling (above the glissando threshold);
Cn Neutralized (below the glissando threshold).
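The distinction between C1, C2, and Cn in this inventory can be illustrated with a small classifier. The Python sketch below rests on assumptions that are not stated in this passage: it uses the commonly cited psychoacoustic glissando threshold of 0.16/T² semitones per second (T being the duration of the movement in seconds), exposed as the adjustable parameter coeff, and it takes only the start and end frequencies of the stressed vowel. The function names are hypothetical, and C0 and Cc are not handled.

```python
import math


def semitones(f_start_hz, f_end_hz):
    """Size of a pitch movement in semitones (positive when rising)."""
    return 12.0 * math.log2(f_end_hz / f_start_hz)


def classify_contour(f_start_hz, f_end_hz, duration_s, coeff=0.16):
    """Label a melodic movement as "C1", "C2", or "Cn".

    Assumption: the glissando threshold is coeff / duration_s**2,
    expressed in semitones per second.
    """
    rate = semitones(f_start_hz, f_end_hz) / duration_s
    threshold = coeff / duration_s ** 2
    if abs(rate) < threshold:
        return "Cn"  # neutralized: change too slow to be perceived as a glide
    return "C1" if rate > 0 else "C2"  # rising vs. falling above threshold
```

With these assumptions, a rise from 150 Hz to 220 Hz over 150 ms is labeled C1, the mirror-image fall is labeled C2, and a change from 150 Hz to 152 Hz over the same duration is labeled Cn.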
The essential differences between French and the other Romance languages
pertain to:
For a long time, the mere study of spontaneous speech was not considered
worthy of scientific investigation, as this kind of speech style was regarded as
being full of mistakes and could reflect the bad usage of “uneducated people.”
Although this view is becoming exceptional, many linguists still insist on
considering written text, which is by definition created by “educated” people,
as the sole legitimate production of language for scientific description. Indeed,
syntactic theories often use linguists’ intuition to validate analyzed examples,
and linguists’ intuition naturally follows the rules of written language.
Pioneering work dates back as far as the sixteenth century
(Meigret, 1550), and linguists like Ferdinand Brunot took an interest in regional
popular productions from the 1910s onward. Research on spoken French
developed in earnest after the 1968 student uprising in France, with the work of
GARS (Groupe Aixois de Recherche en Syntaxe), initiated by Claire Blanche-
Benveniste (among others). At this time one of the reference titles on the
linguistic analysis (essentially syntactic) of spoken French appeared, the
periodical edited by GARS appropriately entitled Recherche sur le français parlé
(1977–2004).
Interestingly, one of the key factors in encouraging research on sponta-
neous speech came from developers of computer applications in speech
recognition. Indeed, if some current efficient techniques (for example Siri
for Apple, or Dictanote for Google) rely on the speaker context and situation
to proceed to the recognition of single words or short sentences, taking
advantage of the extra-linguistic information they may currently have on
individual speakers, the automatic recognition of natural, non-prepared
speech is hampered in existing systems by the use of a grammar based on
written text properties. These systems would give reasonably good results
when reading a text (without the “mistakes” found in spontaneous speech).
The use of a grammar in the software boosts the recognition rate from about
70 to 75% when no grammar is used to about 90 to 95% with an embedded
syntactic description of the language in question, allowing locally missing
phonetic information to be supplemented by the redundancy present in
written texts.
A first approach
One of the key elements in macrosyntactic analysis pertains to the definition of
the sentence itself. With the Written Language Bias at work (Linell, 2005), the
sentence is simply defined by an uppercase character at its left boundary, and a
dot (or a question mark) at its end. Of course, this definition cannot totally apply
to oral production, unless we prove that the conclusive declarative and inter-
rogative contours transcribed with an orthodox dot act as reliable boundaries by
their relatively stable features. Likewise, commas in the text represent prosodic
breaks, and no further investigation on this question was felt necessary in most
grammatical studies.
One early approach to macrosyntax is linked to the so-called “left disloca-
tion” and “right dislocation” in sentence configuration. Dislocation in a sen-
tence occurs, in a traditional syntactic description, when a constituent is placed
outside clause boundaries, whereas it could otherwise (in its basic grammatical
form) be an argument or an adjunct placed inside the clause. In English,
examples could be “Romeo and Juliet, they met on the balcony” for a left
dislocation, and “They met on the balcony, Romeo and Juliet” for a right
dislocation. Both of these written examples include a comma, supposedly
transcribing some expected prosodic event, possibly including a pause.
The answer took the form of manually selected multi-analysis methods, allow-
ing the recovery of a reliable melodic curve in adverse recording conditions.
The importance of this kind of spontaneous speech recording is enormous, as
we should not underestimate the space of prosodic possibilities in spontaneous
speech. Whereas most researchers in the domain work under laboratory pho-
nology conditions, analyzing very short coined sentences, a large number of
unexpected cases produced by speakers in real life made necessary some
radical changes in the theoretical vision, leading among other things to the
consideration of many variants in prosodic realization not expected earlier.
At least three definitions pertain to the term macrosyntax (cf. Avanzi, 2012):
1. C. Blanche-Benveniste (1980): With her associates in GARS in Aix-en-
Provence, Claire Blanche-Benveniste considered that two different types of
syntactic dependencies must be considered to describe properties of sponta-
neous speech: (a) the morphosyntactic combinations belonging to the verb and
its dependents domain (grammar of categories) and (b) the macrosyntactic
dependencies accounting for the oral and written (long) productions.
2. A. Berrendonner (1990, 2003): For Alain Berrendonner and Marie-José
Béguelin (Groupe de Fribourg), macrosyntax pertains to the succession of
communicative acts, i.e. to the sequences of sentences with their contextual
and praxeologic aspects. This kind of macrosyntax concerns units larger than
sentences analyzed by GARS, and describes the relations (syntactic, semantic,
pragmatic) between sentences in the discourse.
3. E. Cresti – M. Moneglia (Cresti, 2000): For Emanuela Cresti and
Massimo Moneglia, head of the research laboratory LABLITA in Florence,
macrosyntax refers to the pragmatic aspects of sequences of speech acts (cf.
Austin, 1962). The sentence is viewed as an informational and pragmatic unit
of speech analyzed into connected segments of pragmatic and information
value. Their analysis of the sentence uses all linguistic objects encoding the
informational structure, whether syntactic, semantic, prosodic, or pragmatic.
It is possible to find some bridges between these three approaches, all
essentially devoted to the linguistic analysis of spontaneous speech. For
instance, GARS analyzes the sentence into macrosegments: an optional
Prenucleus, a Nucleus, and an optional Postnucleus (originally called
Prefix, Nucleus, and Postfix). To give a quick definition, the Nucleus is a
segment that can be extracted from the sentence and constitutes a well-formed
autonomous sentence by itself. For Cresti–Moneglia, the sentence is analyzed
into an optional Topic, a Comment, and an optional Appendix. However, the
criteria of analysis are quite different, GARS using relations of (or rather
the lack of) syntactic dependency to segment the sentence into Prenucleus,
Nucleus, and Postnucleus, whereas the communicative approach of Cresti–
Moneglia utilizes semantic, syntactic, and prosodic criteria to identify Topic,
Comment, and Appendix.
All these approaches aim to describe what they observed in spontaneous
speech, a domain outside the rection (relations of dependency) between gram-
matical categories (GARS), where specific non-dependency relations, apposi-
tions, detachment cannot be explained with grammar (Fribourg), or where
some configurations are impossible to explain with a simple constituent gram-
mar (Lablita).
Interestingly, at first, only the Lablita group was interested in the intonation
aspects of the analysis, whereas it has been marginal for the GARS team. The
then called theme–rheme division of the sentence, where the theme pertains to
information belonging to the context or the situation of the speech act, had the
intonative aspects already described by Bally (1944) among others and largely
documented in numerous papers by phoneticians (even in Martin, 1975). In
these analyses, the theme segment carries a somewhat flat melodic curve,
whereas the rheme ends with a conclusive contour (in declarative mode),
corresponding to Appendix and Comment for Lablita.
Likewise, Topic in the Lablita approach does resemble Prenucleus in the
GARS sense and may sometimes be equivalent under specific conditions.
The difference relates to the way these units are defined, i.e. on
semantic, prosodic, or syntactic criteria. In both cases, a sentence can
integrate more than one Topic or Prenucleus, but Topics typically end with
a specific list melodic contour (actually a C1 or Cc for Romance languages
other than French, as defined in Chapter 7). On the other hand, Prenuclei are
strictly defined by the lack of dependency relation toward macrosegments
that precede or follow (and are therefore not necessarily ended with a specific
melodic contour, although the presence of these prosodic markers is
frequent).
In the GARS macroanalysis, more than one Nucleus can be present in the
text side of a sentence, but there is only one that ends with a conclusive
prosodic marker. For Lablita, there is only one Comment and possibly more
than one Topic and more than one Appendix. However, a confusion may occur
when the text and the sentence intonation are considered simultaneously to
define a unique Nucleus (see below).
There is also a difference in the treatment of Parentheses. For Lablita, an
Appendix can occur between a Topic and a Comment, and corresponds to a text
parenthesis integrated in the overall sentence prosodic structure for GARS. A
GARS text parenthesis aligned with a prosodic parenthesis is equivalent to a
Lablita definition of the parenthesis.
In Lablita the intonation allows one to determine the sentence segments in
discourse, as every native speaker is assumed to be capable of identifying
important prosodic breaks, as well as the types of speech acts (type of illocu-
tion, declaration, interrogation, etc., with many possible variants – cf. Cresti et
al., 2002) as they are correlated with specific intonative contours (actually
terminal conclusive contours essentially).
Taking into account the fact that the study of the prosodic aspects started to
develop only recently (Debaisieux et al., 2008), even today most syntacticians
and macrosyntacticians alike tend to consider sentence prosodic events as an
accessory to (macro)syntax, possibly looking at intonation “breaks” only if
the syntax “fails,” i.e. when no syntactic marker indicates a macrosegment
boundary, for example (cf. Lacheret, 2003). I will show below that sentence
intonation deserves a macroanalysis per se, in order to better understand the
apparent coexistence of two independent domains.
Text (Prenucleus – Nucleus – Postnucleus): le métro c’est sous terre le métro
Intonation (Prosodic Nucleus – Postfix – Suffix): le métro ( ) c’est sous terre le métro
customers I ate at noon I was with customers I slept in the evening I was with
customers” (corpus Olive), three identical text Nuclei are present in the same
sentence, as revealed by its grid representation:
je me levais le matin j’étais avec des clients.
je mangeais à midi j’étais avec des clients.
je dormais le soir j’étais avec des clients.
Only one of these nuclei has its final clients aligned with the final prosodic word
containing the conclusive contour. From the point of view of the GARS
tradition, all three occurrences are Nuclei.
Should the macrosyntactic model then consider only the last occurrence of
j’étais avec des clients as the sentence Nucleus?
The same problem occurs with parentheses. A text Nucleus can incorporate
one or more parentheses, which are aligned either with prosodic groups or with
an independent Prosodic Nucleus, and Infix (see discussion below). Likewise,
independent Prosodic Nuclei can be associated with text macrosegments which
do not present syntactic characteristics of parentheses.
Should the model differentiate between these three categories of parenth-
eses? It seems reasonable to give more semantic weight to macrosegments of
text and intonation that match, i.e. which are aligned on each other. For
example, in respect of a text parenthesis aligned on a Prosodic Nucleus it
appears at least intuitively that the parenthetical effect is stronger than if the
text parenthesis is integrated in the sentence Nucleus, i.e. in the sentence
prosodic structure.
An example of parenthesis (underlined) prosodically integrated and ending
with a C1 rising contour:
[notre métier c’est pour ça que il y a plus de jeune(s) qui veut venir sur notre métier
C1] [c’est trop dur C0][crfp]
“Our job that’s why there aren’t more young(s) who want(s) to come on our
business is too hard.”
Dysfluencies
Written transcription of spontaneous (i.e. non-prepared) speech almost
always reveals the presence of dysfluencies, elements that seem unnecessary
and that are systematically removed in “correct” transcriptions, as they are,
for example, in politicians’ statements quoted in newspapers. These dysfluencies
appear in various forms: hesitations (instantiated by euh in French, Italian um,
Spanish um, Catalan um, Portuguese hum, Romanian um, for example, or by a
lengthening of the last vowel preceding the hesitation), primers of stress groups
followed by reformulations, repetitions, aborts of the current stress group before
its completion, etc. Although they may be removed to obtain “correct” written
text, these dysfluencies belong to the macrosegments.
Some examples are extracted from the French spontaneous text analyzed
further below.
Hesitations (filler)
bon je reviens sur cette euh ce problème
“well I’ll be back on this uh this problem”
bon ben je la prends et euh et voilà quoi
“well I take it and uh well”
Repetitions
quand même y a des des sous sur le compte
“even when there there is money on the account”
c’est pas loin euh tu tu j’y vais à pied
“it’s not far you you go by foot”
Reprises / Reformulations
bon je reviens sur cette euh ce problème
“well I come back on uh this this problem”
c’est pas loin euh tu tu j’y vais à pied
“it’s not far you you go by foot”
Aborts
elles marchent elles non non elles ont tendance à non non elles adorent marcher
“they walk they tend to no no no no they love to walk”
(From Anita Musso, CFPP 2000)
Ponctuants
Another typical characteristic of spontaneous speech is the presence of
ponctuants, which, unlike dysfluencies, may be left in a “cleaned”
written transcription. A typical (non-exhaustive) list in French is given below
(Morel & Danon-Boileau, 1998):
Tu vois, Hein, Quoi, Enfin, Pour, Alors, Ecoutez, Non mais allo quoi, Attends, Allo, En
tout cas, Ah la la, Ah, Non, Oui, Ouais, Disons, Je veux dire, Bon alors, Et puis,
Ecoutez . . .
“You see, Huh, What, Finally, To, So, Listen, But not allo what, Wait, Hello, In any
case, Oh la la, Oh, No, Yes, Yeah, Say, I mean, Okay so, And then, Listen . . . ”
As their name suggests, the ponctuants are small expressions usually placed
at the beginning or the end of macrosegments (and sometimes inside macro-
segments) to replace specific prosodic boundaries that would be transcribed by
a punctuation mark in a written transcription.
To be ponctuants, some of these verbal groups, interjections, etc. need to be
associated with a specific intonation. This is the case for tu vois ended with a
rising melodic contour.
A famous example from a young French TV reality show personality
(Nabilla Benattia): non mais allo quoi “no but hello what,” a sentence
containing only ponctuants, pronounced as three stress groups [non mais
Cn] [allo C0] [quoi C0n]. In this example, each ponctuant suggests to the
listener an elliptic content not formulated but easily recoverable: non mais
introduces a contradictory remark pertaining to the context and/or the
situation; allo indicates the necessity to pay attention to what is going to follow,
as at the beginning of a telephone conversation; and quoi is frequently used as a
final concluding ponctuant. In the original example, allo is associated with a final
conclusive contour, and quoi with the flat terminal contour of a Postfix. In other examples the
ponctuant quoi may also carry a terminal conclusive contour and therefore
end a Prosodic Nucleus.
Use of dysfluencies
Given that dysfluencies leave traces of mechanisms of discourse production,
it becomes possible to explain or at least propose explanations pertaining to
the formation of syntagms in speech (Blanche-Benveniste, 2003; Martin,
2012b). In written production, the hesitations, reformulations, and additions
characteristic of oral production can be crossed out or integrated in the
text (cf. famous heavily crossed-out manuscripts of nineteenth-century
authors). Today, all these corrections may be totally removed by the use of
text editors, often automatically.
It would seem at first that these dysfluencies would hamper a proper and
easy comprehension of the speaker discourse, but this is far from being the
case. The Incremental Storage-Concatenation model gives explanations as to
how the speaker can use tools that would prevent the listener’s memorization
of stress groups by abort or reformulation, or at a higher level (in the prosodic
structure) to insert an unplanned addition that would be concatenated in
the general prosodic structure, even when the utterance has been signaled
finished with a conclusive contour (case of the deferred complement –
epexegesis).
The prosodic eraser
Deletions
The conversion of a sequence of syllables into a stress group is synchronized by
a lexical stress (in Romance languages), a group stress (French), or a combina-
tion of lexical and group stress. As mentioned earlier, the text aligned with the
prosodic word, i.e. the stress group, may contain not only a content word (verb,
noun, adjective, or adverb) but also grammatical words and even only gram-
matical words. Furthermore, in French, stress groups can contain more than one
content word. There are also cases of stress groups containing one single
syllable part of a multisyllabic word (in “em-pha-sis” mode).
This conversion of a stress group into a higher linguistic unit cannot take place if
the stressed syllable is not effectively realized in the sequence. In French, this
means that incomplete stress groups will not be processed and kept in the listener
memory for further treatment, and in particular for the concatenation to form the
complete prosodic structure. For the other Romance languages, conversion trig-
gering will occur, but the sequence will (normally) not be recognized as part of the
stress group known by the listener (although missing syllables after the stressed
one may possibly be completed, there won’t be enough time for this operation if
a reformulation occurs immediately). In both cases, the incomplete stress group is
not memorized, which is equivalent in the final result to an erasure from the
listener’s memory. This is thus also equivalent to crossing out a written text segment.
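The erasure mechanism just described can be illustrated with a minimal Python sketch. It is a toy model under stated assumptions, not an implementation of the author's model: the class name ListenerBuffer, its methods, and the convention of flagging by hand which word carries the stress are all hypothetical. Words are buffered until a stressed one completes the stress group; a reformulation before that point empties the buffer, so the aborted material is never stored.

```python
from dataclasses import dataclass, field


@dataclass
class ListenerBuffer:
    """Toy model of the listener's memory (all names are hypothetical)."""
    pending: list = field(default_factory=list)  # words of the stress group in progress
    stored: list = field(default_factory=list)   # completed stress groups

    def hear(self, word, stressed=False):
        """Buffer a word; a stressed word completes the current stress group."""
        self.pending.append(word)
        if stressed:
            self.stored.append(" ".join(self.pending))
            self.pending = []

    def reformulate(self):
        """The speaker restarts: the incomplete stress group is erased."""
        self.pending = []


listener = ListenerBuffer()
listener.hear("la")                           # primer, abandoned before any stress
listener.reformulate()                        # the speaker restarts the group
listener.hear("l'infirmière", stressed=True)  # stress placement assumed for illustration
print(listener.stored)
```

Only the reformulated group reaches the stored list; the abandoned primer leaves no trace, which is the sense in which the incomplete group is "crossed out."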
The reformulation of a segment may be accompanied with a morphological
adjustment as in the French example:
Alors la l’infirmière de temps en temps me m’humectait euh les lèvres (Vallier)
“then the the nurse from time to time would moisten euh my lips.”
Alors la → l’infirmière Morphological adjustment by elision of la before a vowel l’.
Je je j’ai eu les jambes qui ont tremblé (Selin)
“I I had legs that trembled.”
Je je → j’ai eu les jambes Morphological adjustment by elision of je before a
vowel j’.
In Blanche-Benveniste’s interpretation, these hesitations-reformulations
would indicate a separation in the lexical and syntactic processes: the syntactic
frame is set, planned, but the choice of the appropriate lexical unit is not
finalized. More importantly, even if there is no phonetic adjustment
by elision in the reformulation, it is the entire stress group which is pronounced,
and not just the missing element finally found by the speaker as, for example, in
Je je je vais le faire bientôt (CL 96) “I I I am going to do it soon.”
Et on les on les cultive comme ça (Choix) “and we we cultivate them like that.”
known by the listener. If not, decoding is always feasible, but at the expense
of a longer cognitive treatment (N400 and P600, see Chapter 5), revealed
in particular by the presence of time-expensive mismatch negative or
positive brain wave oscillations. The N400 is a negative deflection
that appears in EEG recordings when a semantic error
exists in processed speech (Steinhauer et al., 1999), whereas the positive
P600 is linked to any extra syntactic processing by the listener (Wang
et al., 2012).
According to the class of prosodic events identified by the listener as coded
by the speaker, stress groups already stored in listener memory are concate-
nated at their appropriate level, i.e. with the last prosodic group bearing the
same class of identified prosodic event. This is the basis of the ISC process. If
the prosodic word is incomplete (in French, when the last syllable is not yet
pronounced; in the other Romance languages, when the well-formed prosodic
word is not yet completed), the storage-concatenation process is suspended,
waiting for a new, possibly reformulated, well-formed sequence of syllables
forming a stress group.
This mechanism is distinct from the insertion of a hesitation marker (like uh
in English, euh in French, etc.) or a lengthening of the last pronounced vowel
(or syllable) not finishing the planned stress group. In these cases, the stress
group production is simply interrupted temporarily, but resumes as a well-
formed stress group without erasing all stored syllables that were already
pronounced.
Additions
In both cases, once the prosodic contour characterizing the position of
the prosodic word in the sentence prosodic structure is pronounced by
the speaker, it enters the storage-concatenation process and cannot be
removed from the listener memory. However, a non-planned stress group
or even a large non-planned stress phrase can be inserted immediately
after by the speaker, provided it ends with the same class prosodic
contour. Therefore, this insertion can take place at any level in the
prosodic structure, and not only after the conclusive terminal contour.
These cases correspond to the epexegesis (deferred complement) and the
Suffix (in GARS macrosyntax) but also to additions made inside the
Prosodic Nucleus.
Epexegesis, from classical Greek epexēgēsis, is defined as the “addition
of words to clarify meaning.” In terms of macrosyntax, it is a process
allowing the speaker to add a sequence of words contained in one or more
stress groups by simply terminating the supplementary stress group by a
melodic contour belonging to the same class as the final contour where
these sequences should have been inserted (Godement & Martin, 2010).
This process allows the speaker to make a correction on the syntactic
structure planned during the utterance process, as in the example of
Figure 8.2.
[Figures: two melodic curves. Labels on the first plot: qu’on devait prendre, en Angleterre, dans le, pour rejoindre euh, euh euh notre ville d’accueil, métro. Labels on the second plot: mes parents m’emmenaient à l’école.]
This same mechanism of correction is possible before the Nucleus to add
a Prenucleus that should have been included in the preceding Prenucleus
(Fig. 8.3, CFPP, 2000).
Another example is given in Figure 8.4.
Figure 8.4 [Je confirme que le premier ministre Elio Di Rupo m’a parlé de
cette situation et euh de l’inquiétude des brasseurs C1] [belges C1] [par
rapport à ce qu’était la consommation française . . .] [FH]. By ending the
Prenucleus [belges] with the same contour that ends the preceding Prenucleus,
the speaker makes a correction to what may have been planned:
[. . . l’inquiétude des brasseurs belges] “I confirm that the Prime Minister Elio
Di Rupo told me about this situation and uh of the anxiety of Belgian
brewers.”
Prenucleus + Nucleus
[Le lendemain C1] [grande surprise C0] “the next day big surprise”
The Prenucleus is included in the next statement by the prosodic structure,
and carries a final melodic contour whose slope is opposite to that of the terminal
descending contour of the utterance.
Nucleus + Postnucleus
[à la caisse C0] [ils se pèsent C0n] “at the till they are weighed”
The Nucleus ends with a sharply descending melodic contour, while
the Postnucleus carries a falling contour with a smaller melodic variation (symbol C0n).
The two contours are necessary to signal this structure, which
contrasts with a Nucleus + Suffix configuration, as shown in the following example.
Nucleus + Suffix
[j’achète beaucoup de médicaments C0] [qui ne sont pas remboursés C0]
“I buy a lot of drugs which are not reimbursed.”
The sentence presents two independent prosodic structures; the cohesion of
the two to form the whole sentence is provided by a syntactic relationship
implemented by the relative pronoun qui.
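The three two-group configurations illustrated above differ, on the prosodic side, only in the pair of final contours they carry. As a minimal sketch in Python (the dictionary and function names are hypothetical, and the table covers only the three examples given here, ignoring the text-side criteria), the correspondence can be written as a simple lookup:

```python
# Pairs of final contours and the configuration each pair signals in the
# three examples above. This table is illustrative and deliberately partial.
CONFIGURATIONS = {
    ("C1", "C0"): "Prenucleus + Nucleus",
    ("C0", "C0n"): "Nucleus + Postnucleus",
    ("C0", "C0"): "Nucleus + Suffix",
}


def configuration(first_contour, second_contour):
    """Return the configuration for two successive final contours, or None."""
    return CONFIGURATIONS.get((first_contour, second_contour))


print(configuration("C1", "C0"))   # [Le lendemain C1] [grande surprise C0]
print(configuration("C0", "C0n"))  # [à la caisse C0] [ils se pèsent C0n]
print(configuration("C0", "C0"))   # [j'achète ... C0] [qui ne sont pas remboursés C0]
```

The lookup makes explicit why both contours are needed: the first contour alone (C0) does not distinguish a Postnucleus from a Suffix.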
Parenthesis
[tout le monde faisait C1] [j’en ai fait moi-même C0] [de l’aviron C0]
“everyone did rowing I did some myself.”
The two parts of the Nucleus, tout le monde faisait and de l’aviron, are
separated by the parenthesis j’en ai fait moi-même, which is aligned on an
independent prosodic structure, ended with a terminal conclusive contour C0.
Autonomous sentences
Sentences can be autonomous if (a) they carry a terminal conclusive contour
and (b) they refer to some information contained either in the sentence or in
the context and/or the situation of the speech act (Martin, 2015).
There are statements where the text does not seem to form a communicatively
independent Nucleus. The statement parce que nous le valons bien “because we
deserve it” (L’Oreal) is introduced by a subordinating conjunction and thus
appears as a Prenucleus (or possibly Postnucleus), but the sentence ends with a
conclusive melodic contour. As the text refers to a cultural context (“we deserve
to buy expensive cosmetics”) the sentence is therefore autonomous.
Statements that end not with a conclusive contour but with an implicative contour,
generally rising and carrying a rising-falling melodic movement, also exist. This
contour directs the interpretation toward the context and/or situation of the speech act.
It is also possible to find statements ending with a continuation majeure contour, as in si tu
crois que ça m’ennuie “if you think it bothers me,” which makes the statement
possibly autonomous. Other similar examples call for a follow-up by other
participants in the speech act, as in et donc . . . “and then,” inviting the speaker
to go on.
Finally, the Nucleus can end with a ponctuant carrying a conclusive
melodic contour. In French, the example non mais allo quoi “no but hello
what” mentioned earlier that ends with a conclusive contour is therefore
autonomous.
French
Les vieux graphistes
The first example is from a GARS corpus (Les vieux graphistes).
Les vieux graphistes ou les anciens je devrais dire graphistes pas les vieux
quelquefois lorsqu’ils voient les mises en page de certaines revues ou de
certains journaux ils se mettent les mains sur la tête [AIX-R00PRO001].
“Aged graphic designers or should I say ancient not aged graphic designers some-
times when they see the layouts of some magazines or some newspapers they put
their hands on the head.”
The melodic curve of this spontaneous speech example is given in Figure 8.5.
As discussed earlier, a spontaneous speech utterance consists of one or more
macrosegments. The first step in a macrosyntactic analysis involves the
identification of the Nucleus. By definition, the Nucleus can appear by itself without
any other macrosegment and be well formed both syntactically and
prosodically. This property can easily be tested by extracting the macrosegment
considered as Nucleus from the speech recording with a sound editor. When
heard, the extracted macrosegment should appear complete (i.e. ending with a
conclusive terminal contour) and be syntactically well-formed. In the example
above, the segment ils se mettent les mains sur la tête fulfills these two conditions
and is accepted as the Nucleus of the utterance.
Figure 8.5 Melodic curve with the pitch movements on stressed syllables
circled. (a) Les vieux graphistes ou les anciens je devrais dire graphistes pas les vieux.
(b) The remainder of the utterance, from quelquefois to sur la tête.
To identify the other macrosegment boundaries, it is imperative to
separate the identification of the text boundaries from the prosodic
boundaries, as they are not guaranteed to match. For text, one of the easiest
ways consists in detecting the break in the dependency relations binding
syntactic units inside each macrosegment. The absence of these relations
generally indicates a macrosegment boundary. When applied to the
example, five macrosegments (including the already identified Nucleus)
emerge:
(Les vieux graphistes ou les anciens) (je devrais dire graphistes) (pas les vieux)
(quelquefois lorsqu’ils voient les mises en page de certaines revues ou de certains
journaux)
(ils se mettent les mains sur la tête)
The same segmentation has to be done for the prosodic side of the utterance,
in order to obtain the sequence of prosodic words together with their melodic
contours:
[[Les vieux Cn graphistes C2] [ou les anciens Cph [je devrais dire C0] graphistes C1]]
[pas les vieux C0]
[quelquefois C1]
[[lorsqu’ils voient Cn les mises en page C2] [de certaines revues Cn ou de certains
journaux C1]] [ils se mettent Cn les mains Cn sur la tête C0]
Several segmentations seem possible (e.g. [pas les vieux quelquefois] instead
of [pas les vieux] [quelquefois . . .]), but this first segmentation pertains only to
the text side of the utterance. The prosodic macrosegment analysis may not
reveal boundaries corresponding to the boundaries of the text macrosegment.
Conversely, the prosodic segmentation leads us to identify je devrais dire and
pas les vieux, ending with conclusive contours C0, as parentheses, whereas the
analysis on the text side would give the syntagm je devrais dire graphiste.
The principles of the storage-concatenation process of prosodic words
are then applied, by indicating the type of pitch movement observed. It is
this process which is accomplished first by the listener, revealing the
importance of the prosodic structure as the first organization of utterances
in speech, the (macro)syntactic analysis taking place in a second step
(see Table 8.2).
Remarks:
1. The first stress group Les vieux has a neutralized contour to ensure a
differentiation with the falling contour ending Les vieux graphistes.
2. The group ou les anciens has an emphatic accent on the first syllable of
anciens, which prevents the realization of a stressed syllable group on the
last syllable of the word.
3. The segments je devrais dire and pas les vieux are marked by a falling
melodic variation and an intensity drop of about −6 dB on the last syllable.
Since they are well-formed macrosegments on both syntactic and prosodic
levels, they can be isolated and form complete sentences by themselves.
They can also be removed from the whole sentence without modifying the
overall prosodic structure.
This segmentation of the utterance in stress groups, as indicated by
melodic contours on their last stressed syllables, leads to the ISC schema
of Figure 8.6.
[ISC schema. Column headings: Cn, C2, C1, C0. Stress groups, from top to bottom: Les vieux; graphistes; ou les anciens; je devrais dire; graphistes; quelquefois; lorsqu’ils voient; de certaines revues; ou de certains journaux; ils se mettent; les mains; sur la tête.]
Figure 8.6 The ISC schema of the example Les vieux graphistes ou les
anciens je devrais dire graphistes pas les vieux quelquefois lorsqu’ils voient
les mises en page de certaines revues ou de certains journaux ils se mettent les
mains sur la tête. On the graph, time flows from top to bottom, and the
horizontal axis represents the different assembly levels corresponding to
prosodic falling, flat (neutralized) and rising movements.
a. Dysfluencies
Hesitations: euh eight occurrences;
Repetitions: tu tu, followed by one abort;
Reprises and reformulations: cette → ce problème, de → d’être chez
moi, il y a le → il y a un côté;
Aborts: tu tu, et d’aller à, dedans et d’aller euh à, faudrait que je.
b. Ponctuants
Macrosegment initial: bon, voilà, bon ben là;
Macrosegment final: euh voilà, voilà.
c. Text macrosegments
(bon je reviens sur cette euh ce problème qui est un problème (euh voilà) de d’être chez
moi) (combien de fois ça m’est arrivé) (bon ben là tu vas boulevard Voltaire) (c’est
pas loin) (euh tu tu j’y vais à pied) (je suis chez moi) (je me conditionne dans mon
appartement en me disant) (j’y vais à pied) (moi) (ma voiture) (elle est garée dans la
rue) (j’ai un stationnement résident) (je passe devant) (je ne peux pas m’empêcher
d’ouvrir) (euh la porte) (de monter dedans et d’aller euh à euh voilà) (cinq minutes en
voiture) (ce qui me mettrait peut-être euh un petit quart d’heure à pied) (donc au
dernier moment je prends ma voiture) (sur le coup) (je me dis) (je vais mettre cinq
minutes) (mais le temps de me garer) (de tourner) (de faire des ronds pour pas mal me
garer et tout) (je sais que je suis perdante) (je le sais que je suis perdante)
(il y a le) (oui) (il y a un côté de facilité de passer devant sa voiture et de se dire) (bon
ben je la prends et euh et voilà quoi) (au point où on en est) (c’est pas très malin)
(faudrait que je) (c’est une habitude en tous cas que j’aimerais changer)
d. Prosodic groups
The prosodic groups are determined by the ranking of contours in French: Cn <
C2 < C1 < C0. From the labeling of prosodic events located on stressed
syllables, it is then possible to identify the Nuclei, whose left boundary is a
text macrosegment boundary without syntactic relation to the left (i.e. toward
what precedes), and the right boundary is aligned with a conclusive declarative
contour C0. The prosodic structure reorganizes the text macrosegments as
follows:
[bon je reviens Cn sur cette euh ce problème Cn qui est un problème Cn euh voilà C1]
[de d’être chez moi C1]
[combien de fois Cn ça m’est arrivé C1]
[bon ben là tu vas boulevard Voltaire Cn c’est pas loin Cn euh tu tu j’y vais à pied C1]
[je suis chez moi Cn je me conditionne Cn dans mon appartement Cn en me disant Cn
j’y vais à pied C1]
[moi Cn ma voiture Cn elle est garée dans la rue C1]
[j’ai un stationnement Cn résident C1]
[je passe devant C1]
[je ne peux pas m’empêcher d’ouvrir C1]
[euh la porte Cn de monter dedans Cn et d’aller C1 euh à euh voilà C1] [cinq minutes
en voiture C1]
[ce qui me mettrait peut-être Cn euh un petit quart d’heure Cn à pied C0]
The first sentence has 11 prosodic groups ending with C1 (continuation majeure), and
the last group is terminated by a conclusive contour C0. This segmentation defines
a Prosodic Nucleus [ce qui me mettrait peut-être Cn euh un petit quart d’heure Cn
à pied C0].
[donc au dernier moment C1 je prends ma voiture C0]
A simple prosodic structure with two stress groups.
[sur le coup C1]
[je me dis C1]
[je vais mettre Cn cinq minutes C0]
The text and Prosodic Nuclei [je vais mettre Cn cinq minutes C0] are aligned in
this case.
[mais C1]
[le temps Cn de me garer C1]
[de tourner C1]
[de faire des ronds Cn pour pas mal me garer et tout C1]
[je sais Cemph que je suis perdante C0]
The rising melodic contour on je sais is an emphasis marker (accent d’insistance)
and not a continuation majeure contour.
[je le sais C0 que je suis perdante Con]
The prosodic construction in this sentence is Prosodic Nucleus + Suffix, as
indicated by a flat melodic contour on perdante, aligned on the text configuration
(je le sais) (que je suis perdante).
[il y a le oui C1]
[il y a un côté de facilité C1]
[de passer devant sa voiture C1]
[et de se dire C1]
[bon ben je la prends C1]
[et euh et voilà C0] [quoi Con]
Following five text Prenuclei ending with a C1 contour, the Nucleus [et euh
et voilà C0] is followed by a Postfix aligned with the ponctuant quoi.
[au point où on en est C1]
[c’est pas très malin C1]
[faudrait que je C1]
[c’est une habitude Cn en tous cas Cn que j’aimerais changer C0]
The last sentence has three Prefixes followed by the Prosodic Nucleus [c’est
une habitude Cn en tous cas Cn que j’aimerais changer C0].
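The bracketed notation used in the analysis above can be read mechanically. The following is a minimal sketch, not part of the original analysis: the helper names `stress_groups` and `ends_with_conclusive_contour` are hypothetical, and the label set is assumed to be limited to the contour labels that appear in these examples.

```python
# Contour labels as they appear in the labeled examples (assumed label set).
CONTOURS = {"C0", "C1", "C2", "Cc", "Cn", "Con", "Cemph"}

def stress_groups(labeled):
    """Split a labeled line into (words, contour) pairs.

    Each contour label closes the stress group formed by the words before it.
    Brackets are ignored here, so the hierarchy is not reconstructed.
    """
    groups, words = [], []
    for token in labeled.replace("[", " ").replace("]", " ").split():
        if token in CONTOURS:
            groups.append((" ".join(words), token))
            words = []
        else:
            words.append(token)
    return groups

def ends_with_conclusive_contour(labeled):
    """True when the last stress group of the line carries C0."""
    groups = stress_groups(labeled)
    return bool(groups) and groups[-1][1] == "C0"

two_groups = stress_groups("[donc au dernier moment C1 je prends ma voiture C0]")
with_postfix = stress_groups("[et euh et voilà C0] [quoi Con]")
print(two_groups)
print(with_postfix)
```

The first call returns two stress groups closed by C1 and C0; the second returns the Nucleus closed by C0 followed by the Postfix quoi carrying Con.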
The example also has one text Parenthesis:
[euh voilà C1] integrated in the prosodic structure and embedded in the Prefix [bon je
reviens Cn sur cette euh ce problème Cn qui est un problème Cn [euh voilà C1] de
d’être chez moi C1]
[bon ben là tu vas boulevard Voltaire Cn c’est pas loin Cn euh tu tu j’y vais à pied C1]
(je suis chez moi) (je me conditionne dans mon appartement en me disant)
[je suis chez moi Cn je me conditionne Cn dans mon appartement Cn en me disant Cn]
(j’y vais à pied) (moi) (ma voiture) (elle est garée dans la rue)
[j’y vais à pied C1] [moi Cn ma voiture Cn elle est garée dans la rue C1]
(j’ai un stationnement résident) (je passe devant)
[j’ai un stationnement Cn résident C1] [je passe devant C1]
(je ne peux pas m’empêcher d’ouvrir) (euh la porte)
[je ne peux pas m’empêcher d’ouvrir C1] [euh la porte Cn]
(de monter dedans et d’aller euh à euh voilà) (cinq minutes en voiture)
[de monter dedans Cn et d’aller C1 euh à euh voilà C1] [cinq minutes en voiture C1]
(ce qui me mettrait peut-être euh un petit quart d’heure à pied)
[ce qui me mettrait peut-être Cn euh un petit quart d’heure Cn à pied C0]
(donc au dernier moment je prends ma voiture)
[donc au dernier moment C1 je prends ma voiture C0]
(sur le coup) (je me dis) (je vais mettre cinq minutes)
[sur le coup C1] [je me dis C1] [je vais mettre Cn cinq minutes C0]
(mais le temps de me garer) (de tourner)
[mais C1] [le temps Cn de me garer C1] [de tourner C1]
(de faire des ronds pour pas mal me garer et tout)
[de faire des ronds Cn pour pas mal me garer et tout C1]
(je sais que je suis perdante) (je le sais que je suis perdante)
[je sais Cemph que je suis perdante C0] [je le sais C0 que je suis perdante Con]
(il y a le) (oui)
[il y a le oui C1]
(il y a un côté de facilité de passer devant sa voiture et de se dire)
[il y a un côté de facilité C1] [de passer devant sa voiture C1] [et de se dire C1]
(bon ben je la prends et euh et voilà quoi)
[bon ben je la prends C1] [et euh et voilà C0] [quoi Con]
(au point où on en est) (c’est pas très malin) (faudrait que je)
[au point où on en est C1]
[c’est pas très malin C1] [faudrait que je C1] (c’est une habitude en tous cas que
j’aimerais changer)
[c’est une habitude Cn en tous cas Cn que j’aimerais changer C0]
Italian
The example analyzed below was taken from the C-ORAL-ROM corpus, file
IFAMDL09, and consists of one female Tuscan speaker SAB talking to a friend
about her (painful) attendance at a concert the night before. Transcribed without
any punctuation, the text appears as follows:
ma io penso non so quanta capienza c’ha comunque io penso dodicimila persone s’era
tutte mh guarda non cascava uno spillo sinceramente e poi tra l’altro c’è stata la menata
che noi ci s’aveva il biglietto per il parterre praticamente quando siamo entrati hanno
aperto i cancelli già in ritardo alle sei e mezzo sicché noi siamo stati due ore lì a aspettare
in fila così quando siamo entrati c’hanno detto che praticamente noi non si poteva andare
sulle gradinate a sedere ma soltanto in mezzo si poteva stare sicché a me mi girava un
po’ le palle perché insomma stare ancora a aspettare fino alle nove e poi tutto il concerto
in piedi insomma era stressante la cosa e poi in piedi hai visto anche se il palco è un po’
rialzato però se ti viene uno davanti alto non vedi nulla specialmente io che non sono ba
insomma che son bassa vero sicché nulla io e quest’altra ragazza che era in macchina
con me s’è detto sai sicché proviamo a andare nelle gradinate e siamo riuscite a sgamare
sicché siamo siamo andate su e nulla ci siamo messe a sedere però logicamente tutti i
posti erano prenotati
“but I think I do not know how much capacity it has in any case I think twelve thousand
people that’s all mh look sincerely a needle could not fall down and then among other
things there was the nuisance that we had tickets for the parterre practically when we
entered they opened the gates already late at six-thirty so we were there two hours to
wait in line like that so when we entered they said that practically we could not go in the
stands but only in the middle to sit so I was feeling a little upset because really to stand
and wait still until nine and then standing throughout the whole concert, it was stressful
and then on tiptoe I saw if the stage was a bit raised if you come in front you don’t see
anything especially that I am I am short actually so nothing me and this other girl who
was in the car with me we thought let’s go to the stands and we were able to sneak in so
we we went on and nothing we started to sit but logically all the seats were booked”
a. Dysfluencies
Repetitions: siamo siamo
Reprises and reformulations: none
Aborts: io che non sono ba → che son bassa
Ponctuants: vero, sai, insomma, sicché
b. Text macrosegments
(ma io penso) (non so quanta capienza c’ha) (comunque io penso) (dodicimila persone
s’era tutte) (mh guarda) (non cascava uno spillo sinceramente) (e poi) (tra l’altro)
(c’è stata la menata che noi ci s’aveva il biglietto per il parterre) (praticamente
quando siamo entrati hanno aperto i cancelli già in ritardo alle sei e mezzo sicché noi
siamo stati due ore lì a aspettare in fila così) (quando siamo entrati) (c’hanno detto
che praticamente noi non si poteva andare sulle gradinate a sedere ma soltanto in
mezzo si poteva stare) (sicché a me mi girava un po’ le palle perché) (insomma stare
ancora a aspettare fino alle nove e poi tutto il concerto in piedi) (insomma era
stressante la cosa) (e poi in piedi hai visto anche se il palco è un po’ rialzato)
(però se ti viene uno davanti alto) (non vedi nulla specialmente io che non sono ba)
(insomma che son bassa vero) (sicché nulla) (io e quest’altra ragazza che era in
macchina con me s’è detto sai) (sicché proviamo a andare nelle gradinate e siamo
riuscite a sgamare) (sicché siamo) (siamo andate su) (e nulla) (ci siamo messe a
sedere però logicamente tutti i posti erano prenotati)
c. Prosodic groups
The prosodic groups are determined by the ranking of contours in Romance
languages: Cn < C1 < C2 < Cc < C0, different from the French one. From the
labeling of prosodic events located on stressed syllables, it is then possible to
identify the Nuclei, whose left boundary is a macrosegment boundary without
syntactic relation to the left (i.e. toward what precedes), and the right boundary
is aligned with a conclusive declarative contour C0. The prosodic structure
reorganizes the text macrosegments as follows:
[ma io C1 penso C2]
[non so C1 [quanta capienza Cn c’ha C2]]
[[comunque Cn io penso Cn dodicimila persone C1] s’era tutte C0]
The text Nucleus (s’era tutte C0) is aligned on a single stress group ending
with C0, whereas the prosodic structure [C1 C2] [C1 [Cn C2]] [[Cn Cn C1] C0]
is congruent with the macrosyntactic organization (ma io penso) (non so quanta
capienza c’ha) (comunque io penso) (dodicimila persone s’era tutte).
[mh guarda Cn non cascava Cn uno spillo C0] [sinceramente Con]
This sentence has a Prosodic Nucleus followed by a Postfix, ended with a
melodic flat contour.
[e poi C1]
[tra l’altro C1]
[c’è stata la menata Cn che noi ci s’aveva Cn il biglietto Cn per il parterre Cc]
[praticamente Cn quando siamo entrati C1]
[hanno aperto Cn i cancelli C1]
[già in ritardo C0]
[alle sei e mezzo Cn sicché noi siamo stati due ore Cn lì a aspettare Con in fila
così Con]
The next sentence has the same prosodic organization into Prosodic Nucleus +
Postfix, where all non-final contours are neutralized.
[[quando siamo entrati C1]
[c’hanno detto Cn che praticamente C1]
[noi non si poteva andare C1]] [sulle gradinate Cn a sedere C0]
[[ma soltanto C2 in mezzo C1] [si poteva stare C0]
[e poi C1]
[in piedi Cn hai visto anche Cn se il palco Cn è un po’ rialzato C1]
[però C1]
[se ti viene Cn uno davanti alto C1]
[non vedi nulla C0]
[[specialmente io che non sono ba Cn insomma Cn che son bassa C1]]
[vero C0]
This sentence has two Prosodic Nuclei; the text macrosegment corresponding
to the second prosodic structure belongs to the preceding text segment, as
indicated by the adverb specialmente.
[sicché C1 nulla C0]
The conclusive contour is realized with a rise on the stressed syllable of nulla
and a falling contour on its last syllable.
[io e quest’altra Cn ragazza C1]
[che era in macchina Cn con me C1]
[s’è detto sai Cn sicché proviamo C1 a andare C1 nelle gradinate C1]
[e siamo riuscite Cn a sgamare C1] [sicché siamo Cn siamo Cn andate su C1]
[e nulla C1]
[ci siamo messe a sedere C1]
[però Cn logicamente Cn tutti i posti Cn erano prenotati C0]
The last sentence uses only prosodic phrases ended with the contour C1 and not
C2, as all contrasts inside prosodic groups are marked by the neutralized
contour Cn.
As explained in Chapter 7, constructions ending with C2 must contrast with
C1 in more complex structures such as [C1 [Cn C2]], as in [non so C1 [quanta
capienza Cn c’ha C2]] above. To realize a stress phrase at the same level C2,
the speaker has chosen the sequence [C1 C2] in the preceding [ma io C1 penso
C2] instead of [Cn C1], as she did later in most prosodic groups.
[e poi C1] [in piedi Cn hai visto anche Cn se il palco Cn è un po’ rialzato C1]
(però se ti viene uno davanti alto)
[però C1] [se ti viene Cn uno davanti alto C1]
(non vedi nulla specialmente io che non sono ba) (insomma che son bassa vero)
[non vedi nulla C0] [[specialmente Cn io che non sono ba Cn insomma Cn che son
bassa C1]] [vero C0]
(sicché nulla)
[sicché C1 nulla C0]
(io e quest’altra ragazza che era in macchina con me)
[io e quest’altra Cn ragazza C1] [che era in macchina Cn con me C1]
(s’è detto sai) (sicché proviamo a andare nelle gradinate)
[s’è detto Cn sai Cn sicché proviamo Cn a andare Cn nelle gradinate C1]
(e siamo riuscite a sgamare) (sicché siamo) (siamo andate su)
[e siamo riuscite Cn a sgamare C1] [sicché siamo Cn siamo Cn andate su C1]
(e nulla)
[e nulla C1]
(ci siamo messe a sedere però logicamente tutti i posti erano prenotati)
[ci siamo messe a sedere C1] [però Cn logicamente Cn tutti i posti Cn erano
prenotati C0]
Portuguese
António Costa Quinta
The short example analyzed below is
extracted from the C-ORAL-ROM corpus (2005, ed. Cresti & Moneglia), file
PFAMCV03, and consists of one female European Portuguese speaker GRA.
GRA is a psychologist, recorded in her home in Lisbon. She is talking to two
researchers about ways of addressing people and her flying experiences. The
recording belongs to a collection of spontaneous conversations recorded in
family environments.
The main characteristics of non-prepared speech pertain to the macrosyntactic
organization of the text in Prenuclei, Nucleus, Parenthesis, and Postnuclei. The
text is thus segmented in macrosegments by identifying lack of dependency
relations between syntagms. Then, from the syntactic properties of analyzed
macrosegments, we can extract the potential nuclei and test their characteristics
(illocutionary property, change in modality, etc.). Extracting its macrosyntactic
text Nuclei, the text (transcribed without any punctuation) appears as follows:
terrível não é eu aliás conheço um médico que é o Costa Quinta o António Costa Quinta
conhecido pelo Tó o Tó Costa Quinta que é a mesma coisa que bebe como uma esponja é
dos tais que não não se altera porque é realmente bem educado mas que chega a qualquer
sítio e ao fim de cinco minutos está a falar sobre guerra a guerra de África e até acabar até
se ir embora fala sobre a guerra de África eu acho eu só tenho um termo em francês para
definir um tipo destes é um emmerdeur.
“Terrible is it not. By the way I know a doctor, Costa Quinta, António Costa Quinta, known
as Tó or Tó Costa Quinta, which is the same thing, who drinks like a sponge and is such that
he does not get excited because he is really well-behaved, but when he arrives anywhere,
after five minutes, he begins to speak about the war, the war in Africa and until he
finishes, until he leaves, he speaks about the war in Africa. I only think I have a
term in French to define a type of this kind: it is an emmerdeur.”
a. Dysfluencies
Hesitations: none;
Repetitions: não não;
Reprises and reformulations: guerra a guerra de África, eu acho →
eu só tenho;
Aborts: none.
b. Ponctuants
Macrosegment initial: none
Macrosegment final: não é.
c. Text macrosegments
(terrível não é) (eu aliás conheço um médico que é o Costa Quinta o António Costa
Quinta) (conhecido pelo Tó) (o Tó Costa Quinta que é a mesma coisa) (que bebe
como uma esponja) (é dos tais que não não se altera porque é realmente bem
educado) (mas que chega a qualquer sítio e ao fim de cinco minutos está a falar
sobre guerra a guerra de África) (e até acabar até se ir embora fala sobre a guerra
de África) (eu acho eu só tenho um termo em francês para definir um tipo destes)
(é um emmerdeur)
d. Prosodic groups
[terrível C0] [não é C0n]
[[eu aliás C1 conheço C2] [um médico Cc]]
[[que é C2] [o Costa Cn Quinta Cn o António Cn Costa Quinta Cn conhecido pelo
Tó Cc]]
[o Tó C2 Costa Quinta Cc]
[[que é a mesma coisa C2] [que bebe C1 como uma esponja Cc]]
[[é dos tais C2] [que não Cn não se altera C2] [porque é realmente bem educado Cc]]
[mas que chega C1 a qualquer sítio Cc]
[e ao fim C1 de cinco minutos Cc]
[[está a falar C2] [sobre guerra Cn a guerra Cn de África Cc]]
[[e até Cn acabar C2] [até se ir embora Cc]]
[[fala C1] [sobre a guerra Cn de África C0]]
Conclusion
Separation of sentence prosody and sentence text is essential in macrosyntax,
as well as in the classical analysis of read sentences. The asymmetry between the two
separately conducted analyses may reveal interesting and surprising speaker
strategies: grouping more than one text macrosegment into the same prosodic group
and, conversely, spreading a single text macrosegment over a number
of prosodic groups. The evaluation of these asymmetries may lead to a better
classification and a better understanding of various styles of spontaneous
speech. Perfect congruence is probably the key to good comprehension
by the audience, given the resulting reduction of cognitive
load, since all occurrences of N400 (semantic discrepancies) and P600 (syntactic
discrepancies) consume listeners’ energy.
9 Applications
Each of these seven stress groups can be pronounced with a falling or a rising
contour, correlated with a declarative or interrogative modality of the sentence
formed with this single stress group, for example declarative lalalala\\, and
interrogative lalalala//. “\\” and “//” symbolize respectively a falling and low
conclusive declarative contour, and a rising conclusive interrogative contour.
The phonological variants of these two basic conclusive contours can also be
considered (Fig. 9.1): a sharp (i.e. with large melodic change) falling contour
for an imperative modality, a bell-curved falling contour for implicative modality
(i.e. an “evidence” contour), a sharp (i.e. with large melodic change) rising
contour for surprise, and a rising contour ending in a bell shape for a doubt
modality variant (see Chapter 5).
These basic sequences can then be made more complex to form prosodic
structures with two, three, . . ., n stress groups. For sentences with
declarative melodic modality (thus ending with \\ whatever the syntactic or
morphologic marks present in the text), two stress groups can be organized
in only one way by the prosodic structure, three stress groups can be structured
prosodically in 3 ways, four stress groups in 11 ways, five stress groups in 45
ways, etc. (Martin, 1987). For example, a sentence composed of three stress
groups can be hierarchically structured as shown in Figure 9.2.
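The counts 1, 3, 11, 45 can be reproduced with a short sketch, under one assumption that is not stated in the text: a prosodic structure is taken to be an ordered tree over the stress groups in which every grouping contains at least two parts (stress groups or smaller groupings). The function names `sequences` and `structures` are illustrative only.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def sequences(n):
    """Ways to cut n stress groups into an ordered run of one or more structures."""
    if n == 0:
        return 1  # the empty remainder closes a run
    return sum(structures(first) * sequences(n - first)
               for first in range(1, n + 1))

@lru_cache(maxsize=None)
def structures(n):
    """Hierarchical groupings of n stress groups, each grouping having >= 2 parts."""
    if n == 1:
        return 1  # a single stress group
    # The first part covers 1..n-1 stress groups, so at least one more part follows.
    return sum(structures(first) * sequences(n - first)
               for first in range(1, n))

counts = [structures(n) for n in range(2, 6)]
print(counts)  # [1, 3, 11, 45]
```

Under this assumption the three structures for three stress groups A, B, C are [A B C], [[A B] C], and [A [B C]], and the same recurrence gives the values quoted for four and five stress groups.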
Silent reading
When we read, either silently or aloud, we generate speech sounds according to
the available information given in the written text. In this process, we also
necessarily generate a prosodic structure, which hierarchically organizes stress
groups (minimal syllabic chunks containing a single lexical or group stress)
into stress phrases, called in the Autosegmental-Metrical (AM) model ip
(intermediate phrases) and, at a higher level, IP (Intonational Phrases),
whose sequences constitute the whole utterance intonation.
It is noticeable that this prosodic structure (re)generation is essential to help
the reader to comprehend the text, and there is apparently no way to avoid it.
Therefore, the complete reading process may be constrained by the rules governing
the elaboration of the sentence prosodic structure when speaking, and in
particular the minimal and maximal duration of stress groups (Martin, 2013b).
Indeed, in silent reading it appears impossible to proceed without subvocalization,
i.e. without hearing a voice in one’s head that corresponds to a voice
reading the text aloud, including the realization of stressed syllables. For this
reason, silent reading may be subject to the same prosodic constraints as
reading aloud. These constraints may interact with or even supersede constraints
established for eye movement while reading. In particular, they may eventually
lead to a new explanation of the maximum number of words that can
be processed in fast reading.
Eye movement
When reading, the eye proceeds in saccades (short rapid movements) to scan the
text, jumping in steps varying from 1 to 20 characters with an average of 7 to 9
characters (forward and backward). In the process, the most frequent fixations
fall on verbal forms and punctuation marks (dots, commas, semicolons, question
marks, etc.). The eye then jumps constantly to spot these markers, which will
constitute the possible anchors for the prosodic structure to build (Martin, 2011).
Most of the laboratory speech research on sentence intonation actually
investigates this process thoroughly on read speech, before considering spontaneous,
non-prepared speech prosodic features. For example, if a dot normally
ending written text sentences is associated with a falling conclusive prosodic
contour, the correspondence of the other punctuation marks and the verbal
forms must be dynamically associated with a proper prosodic contour such as a
continuation majeure C1, i.e. a boundary tone in the AM model.
The saccades allow the eye to focus on fixation points in 20 ms to 40 ms,
whereas the fixation point lasts between 100 ms and 500 ms (Sereno & Rayner,
2003). The fixation state of the eye allows the fovea, the central part of the
retina, to scan the selected written information with high resolution, whereas
peripheral information is viewed in less detail.
Owing to the complex muscular mechanisms for speech production, oral
(i.e. aloud) reading is slower than silent reading. However, the puzzling
aspect of silent reading lies in its limitations. Despite a number of questionable
commercial claims stating that fast readers could read up to 3,000
words per minute (about 50 words/sec, but by skimming on content words
only), the fast reading process is limited by subvocalization, the effect of
hearing a voice in one’s head while reading silently (which some authors
curiously attribute to the way we learn to read at school; Nowak, 2012).
By scanning only assumed key words, access to the text’s meaning will
derive from only a long list of rapidly selected words, with no hierarchical
organization and therefore no syntactic structure.
Subvocalization
Subvocalization does not pertain to the mechanical control of the articulatory
muscles, but to the perception of the speech signal, which is recovered
by reading. The invention of writing has precisely this function, allowing not
only reading aloud but also reading silently, i.e. “to talk to oneself in silence.”
Indeed, writing is a shorthand notation system of speech sound and not of
articulatory movements, contrary to what supporters of the motor theory of
speech perception claim (Liberman & Mattingly, 1985).
Other systems such as pictograms bypass the generation of speech sounds by
directly associating signifier and signified, giving access to the meaning without
going through language units (syllables, words, prosodic words, syntagms, etc.).
A road STOP sign may indeed be read aloud or silently, but is more frequently
directly associated with its meaning, i.e. to stop and give way on the road.
Likewise, well-known dates written with numbers, e.g. 1789, may be read as
“seventeen hundred eighty nine,” but the constant use of symbols not corresponding
directly to syllables and words leads more frequently to direct
access to their signified (the French Bastille day). The passage to the status of
pictogram depends of course on the familiarity of the reader with the object and
its frequency of occurrence. Fast reading by scanning key words relies on this
process.
Writing systems using ideograms, for example Mandarin, also involve subvocalization
in silent reading. Learning Mandarin without being concerned with
ideogram pronunciation may be possible but somewhat difficult, as many
words are plurisyllabic, which means the reader must deal with a combination
of ideograms (Marshall Unger, 2003). One could associate other
sounds with ideograms, such as English words, for example, but the mediation
of speech sounds and therefore of a prosodic structure cannot be avoided.
Commercial US-based fast reading “schools” claim that they can remove
subvocalization, or at least minimize it. The underlying idea is to transform
every word into a pictogram, so when read it will not be pronounced silently.
Other techniques recommend using a pencil to determine eye fixation targets
and accelerating the number of saccades. One application even proposes displaying
only lexical words sequentially on a computer screen with a user-adjustable
speed (this approach incidentally corresponds to the definition of
Accent Phrases in the AM model, i.e. one content word for each Accent
Phrase). Comprehension should then be achieved without any prosodic structure
and no syntactic structure linking the read words together. This is equivalent
to reading a list of items.
Faster readers claim speeds from 400 to 800 wpm (words per minute). With
an average number of about three (written) words per stress group, 800 wpm
converts into about 266 stress groups per minute, or 266/60 = 4.4 stress groups
per second. So the minimal average duration of silently read stress
groups would be about 225 ms at 800 wpm, the best observed performance
for speed readers (Dunning, 2010), again limited by the unavoidable reconstitution
of stress groups and their associated prosodic structure.
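The conversion from reading speed to stress group duration can be checked with a few lines. This is a minimal sketch: the function name `stress_group_timing` is illustrative, and the figure of three written words per stress group is the average assumed in the text.

```python
def stress_group_timing(words_per_minute, words_per_stress_group=3.0):
    """Convert a reading speed into stress group rate and average duration."""
    groups_per_minute = words_per_minute / words_per_stress_group
    groups_per_second = groups_per_minute / 60.0
    ms_per_group = 1000.0 / groups_per_second
    return groups_per_minute, groups_per_second, ms_per_group

per_minute, per_second, ms_per_group = stress_group_timing(800)
print(f"{per_minute:.1f} stress groups/min, {per_second:.2f}/s, {ms_per_group:.0f} ms each")
# 266.7 stress groups/min, 4.44/s, 225 ms each
```

With the same assumption, the lower claimed speed of 400 wpm corresponds to an average of 450 ms per stress group.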
Of all the key concepts used in this book to analyze prosodic data, probably the
most important is the separation between sentence text and sentence intonation.
Indeed, the idea of separating intonation from text in the phonological analysis
may at first seem difficult to conceive, as these two linguistic objects are tightly
linked together. However, this simple change of point of view from the traditional
way allowed me to investigate the properties of both speech sound productions
separately, even if they are obviously pronounced at the same time by the speaker.
By proceeding this way, I could establish not only that structures organizing
text and intonation units use different types of markers, but also that these
markers operate independently and do not necessarily appear at the same
time during the generation of the sentence. This arrangement may be part of the
explanation for the extraordinary resistance to noise of human language.
Indeed, a localized production or perception error may affect only one “side”
of the sentence, the text morphological or syntactic markers or the intonation
contours, without necessarily affecting the other “side.” Furthermore, adopting
an approach that separates text and intonation brings a much easier way of
analyzing difficult examples observed in spontaneous speech, thereby giving
intonation its deserved place in the phonological world.
As a sort of conclusion, I would like to quote the author Frédéric Dard, now
deceased, who, while writing detective stories, demonstrated his deep understanding
of many linguistic mechanisms in phonology, syntax, semantics, and
pragmatics simply in order to obtain a comic effect. Here are some examples in
the domain of sentence intonation in French, related to topics discussed in this
book, on word alignment of content words, on Postfixes in macrosyntax, on
minimal duration of stress groups, and on syntactic clash. There is even a
prosodic structure that is apparently impossible to pronounce.
Quotes from Frédéric Dard
WinPitch
if desired (Fig. 11.2). When played back, all the WinPitch functions operate
on the speech signal, displaying the synchronized video part at the same time.
Furthermore, dedicated converters handle mp3 and CD sound files directly.
Selecting a slower playback sound speed will always result in a synchronized
corresponding video display.
Figure 11.5 Fine tuning of speech segment limits with the help of a
simultaneously displayed spectrogram (which allows precise segmentation
in case of overlapping speakers).
Native output formats are XML (for alignment files) and a proprietary WP2
format (which includes all the annotations, text, highlighting, F0 tracking
parameters, etc. as defined by the user).
WinPitch also includes an automatic prominence analyzer operating
from built-in automatic syllabic detection or from automatic syllabic or
phone segmentation. An automated consulting tool is integrated in the
program for automatic syntactic and morphological labeling as well as
an IPA transcription from data extracted from a large lexicon in Excel®
format.
Any speech segment can be labeled and highlighted in a user-defined
color to be exported to Excel in a single mouse click. Sophisticated data
analysis can then be executed later using Excel predefined or user-defined
scripts.
A batch mode allows the automatic playback (and acoustical analysis) of
speech segments as defined in a concordance program, giving the search text
together with its left and right contexts (Fig. 11.7). This mode presently
operates from an Excel file, loads the corresponding sound file, and retrieves
the context-defined text automatically. An interesting application of this batch
mode is described below.
WinPitch functions include an integrated concordancer. Figures 11.7 to
11.10 illustrate the details of the operations involved. In Figure 11.8 the user
enters the key word parce que taken as an example, selects an appropriate
alignment source format (Transcriber *.trs in this example), and clicks on
any of the file names stored in the same directory. This directory should
contain all the alignment files of interest in the same format, together with
their corresponding sound files (six formats are available: Transcriber, Praat,
CRF, Necte, WinPitch, XML). In the case of Praat TextGrid files, the corresponding
sound files must have the same name as their TextGrid counterpart,
as Praat TextGrid files do not contain any reference to their corresponding
speech file.
An Excel table listing all found occurrences of the key word is immediately
generated (Fig. 11.9). This operation is very fast: in the example of parce que,
it takes less than one second to scan 104 files, yielding 1194
occurrences.
When the user clicks on any line of the Excel table, a specific occurrence of
the keyword is selected together with its left and right contexts. The corresponding
text and speech segments are automatically displayed, as shown in
Figures 11.10 and 11.11.
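The principle of a concordance table, a key word listed with its left and right contexts, can be sketched as follows. This is only an illustration and not WinPitch's implementation: the function `concordance` is hypothetical, it works on plain transcript strings rather than on Transcriber or Praat alignment files, and the two toy transcripts are made-up strings reusing words from the French example of the previous chapter.

```python
def concordance(transcripts, keyword, context_words=3):
    """List every occurrence of a (possibly multiword) key word with its contexts.

    transcripts maps a file label to its transcribed text.
    Returns rows of (file label, left context, key word, right context).
    """
    key = keyword.lower().split()
    rows = []
    for label, text in transcripts.items():
        words = text.split()
        lowered = [w.lower() for w in words]
        for i in range(len(words) - len(key) + 1):
            if lowered[i:i + len(key)] == key:
                end = i + len(key)
                left = " ".join(words[max(0, i - context_words):i])
                right = " ".join(words[end:end + context_words])
                rows.append((label, left, " ".join(words[i:end]), right))
    return rows

# Toy transcripts (hypothetical, for illustration only).
toy_transcripts = {
    "file_a": "je prends ma voiture parce que c'est pas loin",
    "file_b": "parce que je suis perdante",
}
rows = concordance(toy_transcripts, "parce que")
for row in rows:
    print(row)
```

Each row plays the role of one line of the table: selecting it would give the position of the occurrence in the file, from which the corresponding speech segment can be retrieved when time alignment is available.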
Integrating this function in one single software package makes possible
specific research topics on prosody that would have been seen as too
time-consuming previously.
Figure 11.8 Entering the key word “parce que” and selecting a Transcriber file.
Acoustic analysis
Since pitch-tracking algorithms are so far prone to errors in adverse recording
conditions, and given that for a particular speech segment some algorithms
are less prone to errors than others are, WinPitch includes six different
pitch-tracking routines to evaluate the fundamental frequency (spectral comb,
spectral brush, autocorrelation, AMDF (Average Magnitude Difference Function),
spectral fit, harmonic selection).
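As an illustration of one of the listed methods, the following sketch estimates the fundamental frequency of a single frame with a textbook AMDF: the average absolute difference between the signal and a delayed copy of itself is smallest when the delay equals the pitch period. It is not WinPitch's routine; the function name `amdf_f0`, the search range of 120 to 400 Hz, and the synthetic test frame are all assumptions made for the example.

```python
import math

def amdf_f0(samples, sample_rate, f_min=120.0, f_max=400.0):
    """Estimate F0 (Hz) of one voiced frame with a basic AMDF."""
    min_lag = int(sample_rate / f_max)  # shortest candidate period in samples
    max_lag = int(sample_rate / f_min)  # longest candidate period in samples
    best_lag, best_value = None, float("inf")
    for lag in range(min_lag, max_lag + 1):
        n = len(samples) - lag
        # Mean absolute difference between the frame and its delayed copy.
        value = sum(abs(samples[i] - samples[i + lag]) for i in range(n)) / n
        if value < best_value:
            best_lag, best_value = lag, value
    return sample_rate / best_lag

# Synthetic frame: 200 Hz fundamental plus its second harmonic, 50 ms at 8 kHz.
sample_rate = 8000
frame = [math.sin(2 * math.pi * 200 * t / sample_rate)
         + 0.5 * math.sin(2 * math.pi * 400 * t / sample_rate)
         for t in range(400)]
estimated_f0 = amdf_f0(frame, sample_rate)
print(round(estimated_f0, 1))  # 200.0
```

On clean synthetic input the minimum falls exactly on the 40-sample period. On real recordings a practical tracker needs a voicing decision and protection against octave errors, which is one reason why several complementary algorithms are useful.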
These algorithms and their related parameters can be independently
applied on user-defined segments of the speech wave, in order to use
the most appropriate scheme in a given speech section of the recording.
The spectral comb and spectral brush are especially resistant to noise and
absence of some harmonics in the spectrum (Fig. 11.12). WinPitch
also includes a scanning feature allowing a quality analysis of the recording
in terms of fundamental frequency coherence, transition, and presence
of creak so that the user can easily retrieve speech segments with F0
tracking problems.
The measurement of fundamental frequency is particularly sensitive
to recorded speech signal distortions due to (1) poor signal-to-noise
ratio, (2) filtering of low frequencies, eliminating low harmonics for
male voices, (3) various spurious components due to room echo in the
recording places, (4) encoding in formats such as mp3 or wma with
excessive compression levels, (5) external sound sources (car engine,
overlapping speech segments, etc.), and (6) presence of creaky segments
where the fundamental frequency is not really defined.
A file containing all the information about corrections made can be saved in
text format, as well as a .pitch file describing the corrected pitch curve to be
exported to Praat.
Prosodic morphing
Another feature of WinPitch, aimed more specifically at prosodic
research, is the prosodic morphing tool, with which fundamental
frequency, intensity, and syllabic duration can be modified using simple
and intuitive graphic commands. Syllable (or phone) durations, for
example, can be altered with a single mouse move once syllable or phone
boundaries have been defined automatically or manually (imported, for
example, from a Praat TextGrid file).
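The arithmetic behind such modifications is simple once the boundaries are known. The sketch below computes only the targets of a morphing operation: the new boundary times after each syllable is stretched or compressed, and an F0 value shifted by a number of semitones. It assumes hypothetical function names and says nothing about how WinPitch resynthesizes the signal, a step that is omitted here.

```python
def stretch_boundaries(boundaries, factors):
    """Map n+1 syllable boundary times (s) to new times, given n duration factors."""
    if len(boundaries) != len(factors) + 1:
        raise ValueError("need exactly one factor per syllable")
    new_times = [boundaries[0]]
    for start, end, factor in zip(boundaries, boundaries[1:], factors):
        # Each syllable keeps its order; only its duration is rescaled.
        new_times.append(new_times[-1] + (end - start) * factor)
    return new_times

def shift_f0(f0_hz, semitones):
    """Transpose an F0 value by a number of semitones (12 semitones = one octave)."""
    return f0_hz * 2.0 ** (semitones / 12.0)

# Double the duration of the second of three syllables, leave the others unchanged.
print(stretch_boundaries([0.0, 0.2, 0.5, 0.6], [1.0, 2.0, 1.0]))
# Raise a 100 Hz target by one octave.
print(shift_f0(100.0, 12))
```

Working in semitones rather than hertz keeps a given shift perceptually comparable across low and high voices.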
Automatic segmentation
WinPitch has dedicated functions for the automatic segmentation of speech
signals at various levels: speech turn, breath group, pause-delimited
group, syllable, and phone. The last of these capabilities is based on an
innovative approach that mimics the manual segmentation performed on
spectrograms by trained phoneticians. Unlike most other systems, it requires
no statistical or neural-network training and is therefore independent of the
language analyzed, with few or no parameters to adjust. Ergonomic,
easy-to-use manual correction commands are also available for this
segmentation function.
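The coarsest of these levels, the pause-delimited group, can be approximated without any training by thresholding short-time energy. The sketch below is a generic energy-based illustration, not the spectrogram-based method described above and not WinPitch's code; the function name, the 10 ms frame, the RMS threshold of 0.02, and the 200 ms minimum pause are assumed values that would need tuning on real recordings.

```python
import math

def pause_delimited_groups(samples, sample_rate, frame_ms=10.0,
                           threshold=0.02, min_pause_ms=200.0):
    """Return (start, end) times in seconds of stretches separated by long pauses."""
    frame_len = max(1, int(sample_rate * frame_ms / 1000.0))
    min_pause_frames = max(1, int(round(min_pause_ms / frame_ms)))
    # Mark each frame as loud or silent from its RMS amplitude.
    loud = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        rms = math.sqrt(sum(x * x for x in frame) / frame_len)
        loud.append(rms >= threshold)
    groups, group_start, silent_run = [], None, 0
    for i, is_loud in enumerate(loud):
        if is_loud:
            if group_start is None:
                group_start = i
            silent_run = 0  # short silences inside a group are absorbed
        elif group_start is not None:
            silent_run += 1
            if silent_run >= min_pause_frames:
                # The group ends at the first frame of the pause.
                groups.append((group_start, i - silent_run + 1))
                group_start, silent_run = None, 0
    if group_start is not None:
        groups.append((group_start, len(loud) - silent_run))
    frame_s = frame_len / sample_rate
    return [(a * frame_s, b * frame_s) for a, b in groups]

# 0.3 s of tone, a 0.3 s pause, then two tone bursts split by a 50 ms gap.
sr = 1000
tone = lambda n: [0.5 * math.sin(2 * math.pi * 100 * t / sr) for t in range(n)]
signal = tone(300) + [0.0] * 300 + tone(200) + [0.0] * 50 + tone(200)
print(pause_delimited_groups(signal, sr))
```

The 50 ms gap is shorter than the minimum pause, so the last two bursts are reported as a single group, which mirrors the way brief stop closures should not split a breath group.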
References
Aguilera, Marion, Radouane El Yagoubi, Robert Espesser & Corine Astésano (2014)
Event-Related Potential investigation of Initial Accent processing in French, in
Nick Campbell, Dafydd Gibbon & Daniel Hirst (eds.), Social and Linguistic
Prosody: Proceedings of the Seventh International Conference on Speech
Prosody, Dublin: Science Foundation Ireland (SFI), 383–387.
Alkire, Ti & Carol Rosen (2010) Romance Languages: A Historical Introduction,
Cambridge University Press.
Antonetti, Pierre & Mario Rossi (1970) Précis de phonétique italienne: synchronie et
diachronie, Aix-en-Provence: La Pensée Universitaire.
Armstrong, Lilias E. & Ida C. Ward (1931) Handbook of English Intonation (2nd edn.),
Cambridge: W. Heffer.
Astésano, Corine, Mireille Besson & Kai Alter (2004) Brain potentials during semantic
and prosodic processing in French, Cognitive Brain Research 18, 172–184.
Austin, John Langshaw (1962) How to Do Things with Words, Oxford University Press.
Avanzi, Mathieu (2012) L’interface prosodie/syntaxe en français, Brussels: Peter Lang.
Avanzi, Mathieu & Philippe Martin (2007) L’intonème conclusif: une fin de phrase en
soi? Cahiers de linguistique française 28, 247–258.
Avanzi, Mathieu, Anne Lacheret-Dujour & Bernard Victorri (2008) ANALOR: a tool
for semi-automatic annotation of French prosodic structure, in Proceedings of
Speech Prosody 2008: Fourth International Conference on Speech Prosody,
Campinas, Brazil, May 6–9, 119–122.
Avanzi, Mathieu, Nicolas Obin, Anne Lacheret & Bernard Victorri (2011). Toward a
continuous modeling of French prosodic structure: using acoustic features to
predict prominence location and prominence degree, in Proceedings of
Interspeech, Florence, August, 2033–2036.
Avanzi, Mathieu, Lucie Rousier-Vercruyssen, Sandra Schwab, Sylvia Gonzalez & Marian
Fossard, et al. (2013) C-PROM-Task: a new annotated dataset for the study of French
speech prosody, in Proceedings TRASP 2013: Tools and Resources for the Analysis of
Speech Prosody, Aix-en-Provence, August 30, 27–30.
Avesani, Cinzia (1995) ToBIt: un sistema di trascrizione per l’intonazione italiana, in
Atti delle 5e Giornate di Studio del Gruppo di Fonetica Sperimentale (A.I.A.),
Povo (TN), Italy, 85–98.
Badiou, Alain (1969) Le concept de modèle: introduction à une épistémologie
matérialiste des mathématiques, Paris: Maspéro.
Bally, Charles (1944) Linguistique générale et linguistique française, Berne:
Francke.
Baumann, Stefan, Martine Grice & Ralf Benzmüller (2001) GToBI: a phonological
system for the transcription of German intonation, in S. Puppel & G. Demenko
(eds.), Prosody 2000: Speech Recognition and Synthesis, Poznan: Adam
Mickiewicz University, 21–28.
Beckman, Mary E. & Gayle Ayers Elam (1997) Guidelines for ToBI Labelling (Version
3, March 1997), The Ohio State University Research Foundation, www.ling.ohio-
state.edu/research/phonetics/E_ToBI/.
Beckman, Mary & Sun-Ah Jun (1996) K-ToBI (Korean ToBI) labelling convention,
(Version 2), Ms., Ohio State University and UCLA, www.linguistics.ucla.edu/
people/jun/sunah.htm.
Beckman, Mary E. & Janet B. Pierrehumbert (1986) Intonational structure in Japanese
and English, Phonology Yearbook 3, 255–309.
Beckman, Mary E., Manuel Díaz-Campos, Julia Tevis McGory & Terrell A. Morgan
(2002) Intonation across Spanish, in the Tones and Break Indices framework,
Probus 14, 9–36.
Beckman, Mary E., Julia Hirschberg & Stefanie Shattuck-Hufnagel (2005) The original
ToBI system and the evolution of the ToBI framework. In Sun-Ah Jun (ed.),
Prosodic Typology: The Phonology of Intonation and Phrasing, Oxford
University Press, 9–54.
Berrendonner, Alain (1990) Pour une macro-syntaxe, Travaux de linguistique 21, 25–36.
(2003) Grammaire de l’écrit vs. grammaire de l’oral: le jeu des composantes micro et
macro-syntaxiques, in A. Rabatel (ed.), Interactions orales en contexte didactique:
mieux (se) comprendre pour mieux (se) parler et pour mieux (s’) apprendre, Lyon:
PUL, 249–264.
Beyssade, Claire, Élisabeth Delais-Roussarie & Jean-Marie Marandin (2007) The
prosody of interrogatives in French, Cahiers de linguistique française 28,
163–175.
Blanche-Benveniste, Claire (1990) Le français parlé: études grammaticales, Paris:
Éditions du CNRS.
(2000) Approches de la langue parlée en français, Paris: Ophrys.
Blanche-Benveniste, Claire (2003) La naissance des syntagmes dans les hésitations et
répétitions du parler, in J. L. Araoui (ed.), Le sens et la mesure: hommages à Benoît
de Cornulier, Paris: Honoré Champion, 40–55.
(2007) Corpus de langue parlée et description grammaticale de la langue, Langage et
société 121–122(3), 129–141.
Blanche-Benveniste, Claire & Philippe Martin (2011) Structuration prosodique,
dernière réorganisation avant énonciation, Langue française 170, 127–142.
Blanche-Benveniste, Claire, André Valli, Maria Antonia Mota, Raffaele Simone,
Elisabetta Bonvino & Isabel Uzcanga de Vivar (1997) EuRom4: metodo di inseg-
namento simultaneo delle lingue romanze, Florence: La Nuova Italia.
Bocci, Giuliano (2013) The Syntax-Prosody Interface, Amsterdam: Benjamins.
Bolinger, Dwight L. (1961) Contrastive accent and contrastive stress, Language 37, 83–96.
(1972) Accent is predictable (if you’re a mind-reader), Language 48(3), 633–644.
Bonami, Olivier & Elisabeth Delais-Roussarie (2006) Metrical phonology in HPSG, in
Stefan Müller (ed.), Proceedings of the Thirteenth International Conference on
Head-Driven Phrase Structure Grammar, Varna, Bulgaria, July 2006, Stanford:
CSLI Online Publications, 39–59.
Bonvino, Elisabetta, Sandrine Caddéo, Eulalia Vilaginés Serra & Salvador Pippa (2011)
EuRom5, Milan: Ulrico Hoepli.
Boucher, Victor, Annie Gilbert & Philippe Martin (forthcoming) Prosodic words and
brain waves.
Boulakia, Georges (1985) Ambigüité et intonation, in C. Fuchs (ed.), Aspects de
l’ambigüité et de la paraphrase dans les langues naturelles, Berne: Peter Lang.
Bruce, Gösta (1977) Swedish word accents in sentence perspective, Travaux de
l’Institut de Linguistique de Lund, 12. Gleerup: Lund University Press.
Brunot, Ferdinand (1911–1914) Archives de la parole, Gallica, http://gallica.bnf.fr/html/
enregistrements-sonores/archives-de-la-parole-ferdinand-brunot-1911–1914.
Cei, Erica & Bruce Hayes (2013) Italian Stress Study,
http://italianstressstudy.blogspot.fr/.
Chafe, Wallace (1976) Givenness, contrastiveness, definiteness, subjects, topics, and
point of view, Linguistic Inquiry, 25–55.
Chen, Matthew (1970) Vowel length variation as a function of the voicing of the
consonant environment, Phonetica 22(3), 129–159.
Chen, Zen-Yong, Patricia E. Cowell, Rosemary Varley & Yi-Ching Wang (2009) A
cross-language study of verbal and visuospatial working memory span, Journal of
Clinical and Experimental Neuropsychology 31(4), 385–391, DOI: 10.1080/
13803390802195195.
Chitoran, Ioana (2002) The Phonology of Romanian: A Constraint-Based Approach,
New York: Mouton de Gruyter.
Chitoran, Ioana, Alina Maria Ciobanu, Liviu P. Dinu & Vlad Niculae (2014) Using a
Machine Learning Model to assess the complexity of stress systems, in Nick
Campbell, Dafydd Gibbon and Daniel Hirst (eds.), Proceedings of the Sixth
Conference on Speech Prosody, Tongji University Press, Shanghai, 331–336.
Contini, Michel, Jean-Pierre Lai, Antonio Romano, Stefania Roullet, Lurdes de Castro
Moutinho, et al. (2002) Un projet d’atlas multimédia prosodique de l’espace
roman, in Speech Prosody 2002: Proceedings of the First International
Conference of Speech Prosody (Aix-en-Provence, April, 11–13, 2002), Aubenas
d’Ardèche: Lienhart, 227–231.
Cooper, William & John Sorensen (1981) Fundamental Frequency in Sentence
Production, New York: Springer.
C-ORAL-BRASIL (2012) Reference Corpus for Informal Spoken Brazilian
Portuguese, in Helena Caseli, Aline Villavicencio, António Teixeira & Fernando
Perdigão (eds.), Computational Processing of the Portuguese Language:
Proceedings of the Tenth International Conference, PROPOR 2012, Coimbra,
Portugal, April 17-20, Lecture Notes on Artificial Intelligence, Vol. 7243,
Springer, 362–368.
C-ORAL-ROM (2001) Corpus de référence pour les langues romanes orale, www.elda.
org/en/proj/coral/fr/coralrom.html.
C-ORAL-ROM (2005) Integrated Reference Corpora for Spoken Romance Languages,
Studies in Corpus Linguistics, 15, ed. Emanuela Cresti & Massimo Moneglia,
Amsterdam: Benjamins.
CFPP (2000) Corpus du Français Parlé Parisien http://ed268.univ-paris3.fr/syled/res
sources/Corpus-Parole-Paris-PIII/.
Cresti, Emanuela (2000) Corpus di italiano parlato, Florence: Accademia della Crusca.
Cresti, Emanuela, Massimo Moneglia & Philippe Martin (2002) L’intonation des
illocutions naturelles représentatives: analyse et validation perceptive, in Macro-
syntaxe et pragmatique: l’analyse linguistique de l’oral, Lablita: Università di
Firenze, 173–192.
Debaisieux, Jeanne-Marie & Philippe Martin (2010) Les parenthèses: étude macrosyn-
taxique et prosodique sur corpus, in Marie-José Béguelin, Mathieu Avanzi & Gilles
Corminboeuf (eds.), La parataxe, Vol. II: Structures, marquages et exploitations
discursives, Berne: Peter Lang, 307–339.
Debaisieux, Jeanne-Marie, Henri-José Deulofeu & Philippe Martin (2008) Pour une
syntaxe sans ellipse, in Jean-Christophe Pitavy & Michèle Bigot (eds.), Ellipse et
effacement: du schème de phrase aux règles discursives, Publications de
l’Université de Saint-Etienne, 227–235.
Delais-Roussarie, Elisabeth (2000) Vers une nouvelle approche de la structure
prosodique, Langue Française 126(May), 92–112.
(2009) La prosodie des incidentes en français, Cahiers de Grammaire 30, 129–138.
Delais-Roussarie, Elisabeth, Brechtje Post, Mathieu Avanzi, Carolin Buthke, Albert Di
Cristo, et al. (2015) Intonational phonology of French: developing a ToBI system
for French, in Sónia Frota & Pilar Prieto (eds.), Intonational Variation in Romance,
Oxford University Press, 63–100.
Delattre, Pierre (1966) Les dix intonations de base du français, French Review 40, 1–14.
Dell, François (1984) L’accentuation dans les phrases en français, in F. Dell, D. Hirst &
J. R. Vergnaud (eds.), Formes sonores du langage, Paris: Hermann, 65–122.
(2004) On recent claims about stress and tone in Mandarin, Cahiers de Linguistique
Asie Orientale 33(1), 33–63.
Delmonte, Rodolfo (1981) L’accento di parola nella prosodia dell’enunciato dell’ita-
liano standard, Studi di grammatica Italiana 10, 351–394.
Deulofeu, Henri-José (2003) L’approche macrosyntaxique en syntaxe: un nouveau
modèle de rasoir d’Occam contre les notions inutiles, Scolia 16, 77–95.
(2006) Pour une linguistique du rattachement, in Denis Apothéloz, Bernard
Combettes & Franck Neveu (eds.), Actes du colloque international de Nancy
(7–9 juin 2006), Bern/Berlin: Peter Lang, 229–250.
Di Cristo, Albert (1998) Intonation in French, in D. J. Hirst & A. Di Cristo (eds.),
Intonation Systems: A Survey of Twenty Languages, Cambridge University Press,
195–218.
(2013) La prosodie de la parole, Brussels: De Boeck – Solal.
Di Cristo, Albert & Mario Rossi (1977) Propositions pour un modèle d’analyse de
l’intonation, Actes des 8èmes Journées d’Étude sur la Parole (Aix-en-Provence),
1, 323–329.
D’Imperio, Mariapaola (2002) Italian intonation: an overview and some questions,
Probus 14, 37–69.
D’Imperio, Mariapaola, Gorka Elordieta, Sónia Frota, Pilar Prieto & Marina Vigário
(2005) Intonational phrasing in Romance: the role of syntactic and prosodic
structure, in S. Frota, M. Vigário & M. J. Freitas (eds.), Prosodies, Berlin/New
York: Mouton de Gruyter, 59–97.
Doelling, Keith B., Luc H. Arnal, Oded Ghitza & David Poeppel (2014) Acoustic land-
marks drive delta-theta oscillations to enable speech comprehension by facilitating
perceptual parsing, NeuroImage 85(761). DOI: 10.1016/j.neuroimage.2013.06.035.
(2010) Secondary stress and stress clash in Spanish, in Marta Ortega-Llebaria (ed.),
Selected Proceedings of the Fourth Conference on Laboratory Approaches to Spanish
Phonology, Somerville, MA: Cascadilla Proceedings Project, 11–19, www.lingref.
com/cpp/lasp/4/index.html.
Jones, Daniel (1909) Intonation Curves, Leipzig/Berlin: Teubner.
Jun, Sun-Ah (ed.) (2005) Prosodic Typology: The Phonology of Intonation and
Phrasing, Oxford University Press.
(2012) Prosodic typology revisited: adding macro-rhythm, in Qiuwu Ma,
Hongwei Ding & Daniel Hirst (eds.), Proceedings of Speech Prosody 2012:
Sixth International Conference on Speech Prosody, Tongji University Press,
535–538.
Jun, Sun-Ah & Cécile Fougeron (2002) The realizations of the Accentual Phrase in
French intonation, Probus 14, 147–172.
Karcevski, Serge (2000) Inédits et introuvables, Textes rassemblés et établis par Irina et
Gilles Fougeron, Leuven: Peeters.
Lacheret-Dujour, Anne (2003) La prosodie des circonstants, Leuven: Peeters.
Lacheret-Dujour, Anne & Frédéric Beaugendre (1999) La prosodie du français, Paris:
CNRS Éditions.
Lacheret-Dujour, Anne & Bernard Victorri (2002) La période intonative comme
unité d’analyse pour l’étude du français parlé: modélisation prosodique et enjeux
linguistiques, in M. Charolles, P. Le Goffic & M. A. Morel (eds.), Verbum n°1–2 :
Y-a-t-il une syntaxe au-delà de la phrase? Presses universitaires de Nancy,
55–72.
Ladd, Robert D. (1996) Intonational Phonology, Cambridge Studies in Linguistics,
Cambridge University Press.
(2008) Intonational Phonology (2nd edn.), Cambridge Studies in Linguistics,
Cambridge University Press.
Lanchantin, Pierre, Andrew C. Morris, Xavier Rodet & Christian Veaux (2008)
Automatic phoneme segmentation with relaxed textual constraints, in Proceedings
of the Sixth International Conference on Language Resources and Evaluation
(LREC 08), Marrakech: European Language Resources Association, www.lrec-
conf.org/proceedings/lrec2008/.
Leben, William (1971) The morphophonemics of tone in Hausa, in C.-W. Kim and H.
Stahlke (eds.), Papers in African Linguistics, Edmonton, Alberta: Linguistic
Research, Inc., 201–218.
Lehiste, Ilse (1979) Suprasegmentals, Cambridge, MA: MIT Press.
Lehka, Irina & David Le Gac (2004) Étude d’un marqueur prosodique de l’accent de
banlieue, Actes des XXIIIème Journées d’Etudes sur la Parole, April 2004, Fès,
Morocco, www.afcp-parole.org/doc/Archives_JEP/2004_XXVe_JEP_Fes/actes/
jep2004/Lehka-LeGac.pdf.
Léon, Monique (1964) Exercices systématiques de prononciation française, fascicule 2,
Rythme et intonation, Paris: Hachette et Larousse.
Léon, Pierre (1993) Précis de phonostylistique: parole et expressivité, coll. “Fac
Linguistique,” Paris: Nathan.
(2005) Phonétisme et prononciations du français (5th edn.), Paris: Armand-Colin.
Léon, Pierre & Philippe Martin (1969) Prolégomènes à l’étude des structures intona-
tives, Montréal: Didier.
Li, X., Peter Hagoort & Yufang Yang (2008). Event-related potential evidence on the
influence of accentuation in spoken discourse comprehension, Chinese Journal of
Cognitive Neuroscience 20(5), 906–915.
Liberman, Alvin M. & Ignatius G. Mattingly (1985) The motor theory of speech
perception revised, Cognition 21(1), 1–36.
Liberman, Mark & Alan Prince (1977) On stress and linguistic rhythm, Linguistic
Inquiry 8, 249–336.
Linell, Per (2005) The Written Language Bias in Linguistics, New York: Routledge.
Lonchamp, François (1998) Prédire l’intonation des phrases affirmatives: facteurs
rythmiques et syntaxiques, Verbum 17(1), 37–45.
Makuuchi, Michirou, Jörg Bahlmann, Alfred Anwander & Angela D. Friederici (2009)
Segregating the core computational faculty of human language from working
memory, Proceedings of the National Academy of Sciences of the United States
of America, 106(20), 8362–8367.
Marshall Unger, James (2003) Ideogram: Chinese Characters and the Myth of
Disembodied Meaning, University of Hawai’i Press.
Martin, Philippe (1973) Les problèmes de l’intonation: recherches et méthodes, Langue
française 19 (Sept. 1973), 4–42.
(1975) Analyse phonologique de la phrase française, Linguistics, 146 (Feb.), 35–68.
(1987) Prosodic and rhythmic structures in French, Linguistics, 25(5), 925–949.
(1989) Automatic assignment of lexical stress in Italian, Proc. Eurospeech 89, Paris,
Sept. 27–29, 1989, 222–225.
(2002) Intonation et syntaxe dans les langues romanes, in Macro-syntaxe et pragma-
tique: l’analyse linguistique de l’oral, Lablita, Università di Firenze, 193–220.
(2006) Modelli di analisi e sistemi di etichettatura prosodica, AISV 2005, Analisi
Prosodica, teorie, modelli e sistemi di annotazione, ed. Claudia Crocco, B. Gili
Fivela & R. Savy, Padova: EDK editore, 43–56.
(2008) Phonétique acoustique: introduction à l’analyse acoustique de la parole.
Paris: Armand Colin.
(2009) Intonation du français, Paris: Armand Colin.
(2011) Ponctuation et structure prosodique, Langue Française, 172, 99–114.
(2012a) The Autosegmental-Metrical Prosodic Structure: not fit for French?, in
Qiuwu Ma, Hongwei Ding & Daniel Hirst (eds.), Proceedings of Speech Prosody
2012: Sixth International Conference on Speech Prosody, Tongji University Press,
131–134.
(2012b) La structure prosodique dynamique: rature et insertion de texte dans l’oral
spontané, in S. Caddéo, M.-N. Roubaud, M. Rouquier & F. Sabio (eds.), Penser les
langues avec Claire Blanche-Benveniste, Presses universitaires de Provence, coll.
Langues et langages, 117–125.
(2012c) Les contours de continuation majeure dans l’océan Indien, in A.-C. Simon
(ed.), La variation prosodique régionale en français, Brussels: De Boeck-Duculot,
199–211.
(2013a) Iconicity of melodic contours in French, in S. Hancil and D. Hirst (eds.),
Prosody and Iconicity, Amsterdam: Benjamins, 179–190.
(2013b) Contraintes phonologiques de l’intonation de la phrase réinterprétées à
la lumière des recherches récentes en neurophysiologie, La Linguistique 1,
97–113.
Obrig, Hellmuth, Simone Rossi, Silke Telkemeyer & Isabell Wartenburger (2010) From
acoustic segmentation to language processing: evidence from optical imaging,
Frontiers in Neuroenergetics 2(13), 1–12.
Ochs, Elinor (1979) Transcription as theory, in E. Ochs & B. Schieffelin (eds.),
Developmental Pragmatics, New York: Academic Press, 43–71.
Palmer, Caroline & Susan Holleran (1994) Harmonic, melodic, and frequency height
influences in the perception of multivoiced music, Perception & Psychophysics
56(3), 301–312.
Palmer, Harold E. & F. G. Blandford (1924) A Grammar of Spoken English on a Strictly
Phonetic Basis, Cambridge: Heffer & Sons.
Pasdeloup, Valérie (2004) Le rythme n’est pas élastique: étude préliminaire de l’in-
fluence du débit de parole sur la structuration temporelle, Actes des XXIIIème
Journées d’Etudes sur la Parole, April 2004, Fès, Morocco, www.afcp-parole.org/
doc/Archives_JEP/2004_XXVe_JEP_Fes/actes/jep2004/Pasdeloup.pdf.
Pierrehumbert, Janet B. (1980) The Phonology and Phonetics of English Intonation,
Ph.D. thesis, MIT, http://dspace.mit.edu/handle/1721.1/16065.
Pierrehumbert, Janet B. & Mary E. Beckman (1988) Japanese Tone Structure,
Cambridge, MA: MIT Press.
Pike, Kenneth L. (1945) The Intonation of American English, Ann Arbor: University of
Michigan Publications, Linguistics.
Poiré, François (2006) La perception des proéminences et le codage prosodique, in
Bulletin No. 6, Prosodie du français contemporain: l’autre versant de PFC, ed.
Anne Catherine Simon, Geneviève Caelen-Haumont & Claudine Pagliano, CNRS
& Université de Toulouse-Le Mirail, 69–80.
Posner, Rebecca (1996) The Romance Languages, Cambridge Language Survey,
Cambridge University Press.
Post, Brechtje (1999) Restructured phonological phrases in French: evidence from clash
resolution, Linguistics 37(1), 41–63.
(2000) Tonal and Phrasal Structures in French Intonation, The Hague: Holland
Academic Graphics.
Praat (2013) www.praat.org.
Prieto, Luis J. (1975) Pertinence et pratique: essai de sémiologie, Paris: Éditions de Minuit.
Prieto, Pilar (2014) The intonational phonology of Catalan, in Sun-Ah Jun (ed.),
Prosodic Typology 2, Oxford University Press, 43–80.
Prieto, Pilar, Joan Borràs-Comes, Verònica Crespo-Sendra, Paolo Roseano, Rafèu
Sichel-Bazin & Maria del Mar Vanrell (2007) The phonetics and phonology of
intonational phrasing in Romance, in Pilar Prieto, Joan Mascaró & Maria-Josep
Solé (eds.), Segmental and Prosodic Issues in Romance Phonology, ICREA &
Universitat Autònoma de Barcelona, 131–154.
Prieto, Pilar, Lourdes Aguilar, Ignasi Mascaró, Francesc Torres-Tamarit & Maria del
Mar Vanrell (2009) L’etiquetatge prosòdic Cat_ToBI, Estudios de Fonética
Experimental 18, 287–309.
Prince, Alan (1983) Relating to the grid, Linguistic Inquiry 14, 19–100.
Profili, Olga (1987) L’accent et sa prévisibilité, Rapport Syntalit/Italien, CENT Lannion.
Profili, Olga & Philippe Martin (1987) Antonio mangia la zuppa inglese, in Proceedings
XIth ICPhS: The Eleventh International Congress of Phonetic Sciences, Tallinn:
Academy of Sciences of the Estonian SSR.
Selkirk, Elisabeth O. (1978) On prosodic structure and its relation to syntactic structure,
in T. Fretheim (ed.), Nordic Prosody, Vol. II, Trondheim: TAPIR, 111–140.
(1986) Derived domains in sentence phonology, Phonology Yearbook 3, 371–405.
(2005) Comments on intonational phrasing in English, in S. Frota, M. Vigário &
J. Freitas (eds.), Prosodies: Selected Papers from the Phonetics and Phonology in
Iberia Conference, 2003, Phonetics and Phonology Series, Berlin: Mouton de
Gruyter, 11–58.
Sereno, Sara & Keith Rayner (2003) Measuring word recognition in reading: eye move-
ments and event-related potentials, Trends in Cognitive Science 7(11), 489–493.
Simon, Anne-Catherine, Mathieu Avanzi, Jean-Philippe Goldman (2008) La détection
des proéminences syllabiques: un aller-retour entre l’annotation manuelle et le
traitement automatique, in J. Durand, B. Habert & B. Laks (eds.), Actes du CMLF
2008: 1er Congrès Mondial de Linguistique Française, 2008, Paris, 1685–1698.
DOI: 10.1051/cmlf08256.
Solé Sabater, Maria-Josep (1991) Stress and Rhythm in English, Revista Alicantina de
Estudios Ingleses 4, 145–162.
Sosa, Juan-Manuel (1999) La entonación del español, Madrid: Cátedra.
Steinhauer, Karsten, Kai Alter & Angela D. Friederici (1999) Brain potentials indicate
immediate use of prosodic cues in natural speech processing, Nature Neuroscience
2(2), 191–196.
Teston, Bernard (2004) L’œuvre d’Étienne-Jules Marey et sa contribution à l’émergence
de la phonétique dans les sciences du langage, Travaux Interdisciplinaires du
Laboratoire Parole et Langage 23, 237–266.
ToBI (1999) www.ling.ohio-state.edu/~tobi/.
Trager, George & Henry Smith (1951) An Outline of English Structure, Studies in
Linguistics: Occasional Papers 3, Norman, OK: Battenburg Press.
Transcriber (2014) http://trans.sourceforge.net.
Ungurean, Catalin, Dragos Burileanu & Aurelian Dervis (2009) A statistical approach
to lexical stress assignment for TTS synthesis, International Journal of Speech
Technology 12(2/3), 63–73.
Vaissière, Jacqueline (1975) Caractérisation des variations de la fréquence du fonda-
mental dans les phrases du français, in VIèmes Journées d’Etude sur la Parole,
Toulouse, 39–50.
Vercherand, Géraldine, In-Young Kim & Hi-Yon Yoo (2011) Whispering French and
Korean: a comparative study, Linguistic Research 23(2), 81–95.
Viana, Céu & Sónia Frota et al. (2007) Towards a P_ToBI, http://labfon.letras.ulisboa.
pt/SonseMelodias/P-ToBI/P-ToBI.htm.
Von Essen, Otto (1956) Grundzüge der hochdeutschen Satzintonation, Ratingen-
Düsseldorf: A. Henn Verlag.
Wang, Suiping, Deyuan Mo, Ming Xiang, Ruiping Xu & Hsuan-Chih Chen (2012) The
time course of semantic and syntactic processing in reading Chinese: evidence
from ERP’s, Language and Cognitive Processes, iFirst, 1–20.
Watanabe, Satosi (1969) Knowing and Guessing: A Quantitative Study of Inference and
Information, New York: Wiley.
Wightman, Colin W. (2002) ToBI or not ToBI? in B. Bel & I. Marlien (eds.),
Proceedings of the First International Conference on Speech Prosody, Aix-en-
Provence: SProSig, 25–29.