
LANGUAGE AND SPEECH, 2007, 50 (2), 177 – 225

The Relationship between Musical Skills, Music Training, and Intonation Analysis Skills

Jana Dankovičová, Jill House, Anna Crooks, Katie Jones
University College London, U.K.

Key words: intonation, intonation analysis, music, musical skills, musical training

Abstract

Few attempts have been made to look systematically at the relationship between musical and intonation analysis skills, a relationship that has to date been suggested only by informal observations. Following Mackenzie Beck (2003), who showed that musical ability was a useful predictor of general phonetic skills, we report on two studies investigating the relationship between musical skills, musical training, and intonation analysis skills in English. The specially designed music tasks targeted pitch direction judgments and tonal memory. The intonation tasks involved locating the nucleus, identifying the nuclear tone in stimuli of different length and complexity, and same / different contour judgments. The subjects were university students with basic training in intonation analysis. Both studies revealed an overall significant relationship between musical training and intonation task scores, and between the music test scores and intonation test scores. A more detailed analysis, focusing on the relationship between the individual music and intonation tests, yielded a more complicated picture. The results are discussed with respect to differences and similarities between music and intonation, and with respect to form and function of intonation. Implications of musical training for the development of intonation analysis skills are considered. We argue that it would be beneficial to investigate the differences between musically trained and untrained subjects in their analysis of both musical stimuli and intonational form from a cognitive point of view.

Acknowledgments: We are very grateful to Andrew Faulkner for advice on statistics and creation of the musical stimuli. We would also like to thank him and Yi Xu for their comments on an earlier draft. Finally, we very much appreciated comments and advice from K. Overy, who reviewed the paper, and from an anonymous reviewer.

Address for correspondence: Jana Dankovičová, Department of Human Communication Science, University College London, Chandler House, 2 Wakefield Street, London WC1N 1PF, United Kingdom; phone: +44 20 7679 4251; e-mail: <j.dankovicova@ucl.ac.uk>.

1 Introduction

A course in practical phonetics will typically incorporate training in the perception, representation, and performance of a wide range of individual speech sounds, together with the analysis of prosodic features of speech such as stress, rhythm, and intonation. While most learners can become proficient in all these phonetic skills given time and practice, some inevitably show a greater aptitude than others. When it comes to the
auditory analysis of intonation, the range of ability may be particularly wide: some
will have no difficulty in hearing pitch movements, in classifying them as a particular
tone type, and mapping the pitch patterns correctly to the spoken text; others may
find these activities particularly difficult. An aptitude for segmental phonetics does
not necessarily imply competence in intonation analysis, suggesting that the two skills
may be tapping into different resources.
In their capacity as either teachers or learners of phonetics, like many others
before them, the authors informally observed that those students who pick up intona-
tion analysis skills quickly are often musicians. This observation may be unsurprising,
since similarities between speech and music are self-evident: both involve phrasing,
metrical structure and pitch patterns. Musical notation, or an interlinear notation
approximating to a musical stave (pioneered by Klinghardt, 1920), is frequently used
to represent intonation (e.g., Cruttenden, 1997; Delattre, 1966; Jones, 1960; Kingdon,
1958; O’Connor & Arnold, 1961/1973). Bolinger recognizes that “since intonation
is synonymous with speech melody, and melody is a term borrowed from music, it
is natural to wonder what connection there may be between music and intonation”
(1985, p. 28) but notes that there are also important differences: For example, music
usually involves a succession of discrete notes on which a pitch is sustained, whereas
in speech melodies a true pitch sustention is the exception rather than the rule. A
further account of important similarities and differences between speech / language
and music may be found in, for example, Besson and Schön (2001), Patel, Gibson,
Ratner, Besson and Holcomb (1998), Patel and Peretz (1997).
Neuroscientific literature provides a mixed picture with regard to links between
music and language processing. On the one hand, a number of neuropsychological
studies (e.g., Marin, 1982; Sergent, 1993), and more recently cases of congenital amusia
(tone deafness) in healthy subjects (Ayotte, Peretz, & Hyde, 2002; Patel, Foxton, &
Griffiths, 2005), showed that disorders of music and prosodic processing could be
dissociated. On the other hand, the clinical association of prosodic and musical
disorders is also often noted (e.g., Nicholson, Baum, Kilgour, Koh, Munhall, &
Cuddy, 2003; Patel, Peretz, Tramo, & Labrecque, 1998), and, as Platel (2002) points
out, cases of “pure” amusia are extremely rare. In terms of functional lateralization,
the traditional generalization has been that spoken language processing is usually
localized in the left hemisphere, and music processing in the right hemisphere (e.g.,
Altenmüller, 2001; Platel, 2002). However, this turns out to be simplistic. Evidence
accumulated from a range of techniques, including neuropsychological investigations
of the processing deficits of brain-damaged subjects (Emmorey, 1987; Liégeois-Chauvel,
Peretz, Babai, Laguitton, & Chauvel, 1998; Patel et al., 1998; Peretz, 1990; 2001;
Perkins, Baran, & Gandour, 1996), dichotic listening experiments (Bever & Chiarello,
1974; Blumstein & Cooper, 1974) and brain imaging (fMRI, PET) (Zatorre, Evans, &
Meyer, 1994; Peretz, Kolinsky, Tramo, Labrecque, Hublet, Demeurisse, & Belleville,
1994) has demonstrated that both prosodic and music processing are complex, and
often involve co-operation between the hemispheres. We shall return to this point in
more detail below.
One aspect of this complexity is the claim that both music processing and speech
and language processing are modular in nature. With regard to music processing,
there has been some evidence that the left hemisphere may be important in processing
temporal aspects of music (Peretz, 1990; Samson, Ehrlé, & Baulac, 2001), while the
right hemisphere is involved in pitch processing (e.g., Liégeois-Chauvel, Giraud, Badier,
Marquis, & Chauvel, 2001; Zatorre, 2001; Zatorre et al., 1994). However, a number
of recent studies have pointed out that music perception involves a more complicated
system of subcomponents / modules, such as tonal encoding, interval analysis, melodic
contour analysis, harmony, rhythm, tempo, and meter, and have investigated their
corresponding neural subsystems (e.g., Parsons, 2001; Peretz & Coltheart, 2003; Platel,
Price, Baron, Wise, Lambert, Frackowiak, Lechevalier, & Eustache, 1997). These studies
suggest that the neural systems underlying music perception are distributed throughout
the left and right hemispheres, with different aspects of music being processed by
distinct neural circuits.
Within the domain of speech prosody, it has been claimed that pitch-related aspects
of speech are predominantly processed in the right hemisphere (e.g., Zatorre, Evans,
Meyer, & Gjedde, 1992), but that the left hemisphere is crucial when linguistic meaning
of intonation is involved (e.g., Blumstein & Cooper, 1974; Emmorey, 1987; Perkins et
al., 1996). A number of neuropsychological studies have also suggested a dichotomy
between affective and linguistic prosody in terms of lateralization, and suggested that
linguistic dysprosody appears to be associated with left-hemisphere damage and affec-
tive dysprosody with right-hemisphere damage (e.g., Emmorey, 1987; Geigenberger &
Ziegler, 2001; Karow, Marquardt, & Marshall, 2001). However, contradictory evidence
has also been reported (e.g., Weintraub, Mesulam, & Kramer, 1981).
We have mentioned above that conceptually music and speech have a number of
similarities and differences. In recent years a number of neuroscientific studies have focused
on identifying cognitive and neural resources that might be shared between speech
and music processing. In their survey of neuropsychological evidence, Patel and Peretz
(1997) suggest that perception of pitch contour and rhythmical grouping / phrasing
(using phrase-final lengthening as a cue) employ some of the same neural resources
in music and language, while “tonality” (structural relationship between tones and
chords) appears to draw on resources used uniquely by music. Moreover, it has been
suggested that there is also an overlap with syntactic and semantic processing. It
has been shown that musical information can have a systematic influence on the
semantic processing of words (Koelsch, 2005). In terms of syntax, both language and
music are rule-based systems, with syntax governing sentence structure and harmony
favoring the expectancy of specific sequences of notes and chords within a musical
piece (Schön, Magne, & Besson, 2004). Some neuroimaging studies revealed a similar
pattern of brain activity for syntactic processing in music and language (e.g., Koelsch,
Gunter, Cramon, Zysset, Lohmann, & Friederici, 2002; Maess, Koelsch, Gunter, &
Friederici, 2001; Patel et al., 1998). However, a number of neuropsychological studies
provide contradictory evidence in terms of dissociations (e.g., Peretz, Ayotte, Zatorre,
Mehler, Ahad, Penhune, & Jutras, 2002). Patel (2003) offers, as a possible “explanation” for these discrepancies, the hypothesis that language and music operate on different structural representations (localized in different areas of the brain), but share a common
set of processes (displaying similar patterns of brain activation).
Finally, apart from research demonstrating an overlap between musical and
speech / language processing, there have also been a number of correlational and
quasi-experimental studies that suggest a link between musical training and perfor-
mance in other cognitive tasks, such as, for example, verbal memory (Ho, Cheung, &
Chan, 2003), spatial ability (Hetland, 2000), reading ability (Hurwitz, Wolff, Bortnick,
& Kokas, 1975), phonological and spelling skills (Overy, 2000), and mathematics
achievement (Cheek & Smith, 1999). However, Schellenberg (2004) claims that these
associations may stem from a common source, namely, general intelligence, and
demonstrates that music lessons can enhance IQ.
Studies investigating the effect of musical training on prosody are hard to find. We
are aware of only two studies in this area. A study by Schön et al. (2004) revealed that
musicians (defined here as having at least 15 years of training) were more accurate
in detecting pitch violations in music and also language — intonation in declarative
sentences. Thompson, Schellenberg, and Husain (2004) showed that music lessons
enhance listeners’ ability to decode emotions conveyed by speech prosody both in
adults and children, although they point out that a similar effect was achieved also
by drama training.
If speech and music are in many respects similar, then most of us are at some
level competent musicians. Whether or not we have studied music, we produce and
interpret the intonation patterns of our native language appropriately, showing that
our auditory systems and higher levels of processing are well tuned to our linguistic
needs. Those studies reporting on intonation processing are doing so mostly in the
context of linguistic interpretation; they are not investigating what happens in the
brain when subjects undertake an auditory-based formal analysis of an intonation
contour, as required of students undergoing phonetic training. The issue in practical
phonetics training is not whether learners can “do” intonation, but whether they can
acquire competence in the auditory analysis of speech melodies: whether they can
classify them according to some abstract representation of pitch patterns, segment
them into component parts, recognize the important similarities and differences
among patterns produced by different speakers, and learn to perform an intonation
pattern based on an abstract representation. These tasks arguably resemble skills
learned when studying music.
Few attempts have been made to look systematically at the relationship between
musical and intonation analysis skills. A connection between musical ability and
overall phonetic ability was reported by Mackenzie Beck (2003), who administered a
battery of tests to a group of students before they embarked on any phonetic training.
One set of tests was based on aural tests used for training young musicians, whereas
the other tests were speech-based and designed to tap more explicitly into phonetic
ability. She found that success in the musical tests was at least as good a predictor of
later success in practical phonetics examinations as the speech-based tests. There was
thus some correlation between musical and phonetic ability, but it was by no means
a necessary one. In Beck’s study, intonation analysis was not considered separately
from other phonetic skills, so it is not possible to judge whether performance in this
area was particularly well correlated with performance in the music tests.
The work reported here compares the performance of phonetically-trained
subjects in two sets of tasks: one set involves musical tests of pitch perception and
tonal memory, whereas the other set involves skills of auditory intonation analysis.
All subjects were students who had completed their formal training in practical
phonetics, including an introduction to a British-school (nuclear tone-based) model
of intonation analysis. The emphasis in both sets of tasks was on aspects of pitch
perception and analysis, rather than on rhythmic or metrical analysis. No production
task was involved.
The presentation of this work is unusual, in that it brings together the findings
of two experimental studies which were conceived independently but conducted
partly in parallel. The aim of both studies was essentially the same: to investigate the
relationship between musical ability and intonation analysis skills. Different criteria
may be used to assess musical ability: for example, the performance level achieved
after a period of training; or musical aptitude in terms of the scores achieved in certain
types of aural test. We took both types of measure into account when analyzing our
results. Primary objectives in both studies may be summarized by the following
research questions:
• Is there a relationship between musical training and performance in music tasks?

• Is there a relationship between musical training and performance in intonation analysis tasks?

• Does musical “aptitude” (as measured by performance in the music tasks) correlate positively with performance in the intonation tasks?
In all the above, our prediction was that musical training and / or aptitude would
be a significant predictor of proficiency in intonation analysis.

A secondary objective was to ask:

• Are some intonation forms harder to identify than others (and are these the same regardless of musical ability)?
Both studies consisted of three parts: (i) a questionnaire eliciting details of
subjects’ musical training and level of achievement; (ii) a set of musical aural tests; (iii)
a range of intonation analysis tasks. There were differences in detail between the two
studies, notably in the specific intonation tasks included. Study 1 required subjects to
make same / different contour judgments, whereas Study 2 included nucleus location
tasks. Both studies tested the ability to identify nuclear tones, but differed in the degree
of detail required, and controlled different aspects of the overall contour in designing
the stimuli. Both studies focused only on the tasks that the students had been trained
in. Appendix 1 summarizes the tasks and stimuli used in the two studies.
The intonation tasks were designed to be consistent with subjects’ training in
the “British school” (e.g., O’Connor & Arnold, 1961/1973). In this framework, each
intonational phrase is divided into up to four constituents: ‘prehead’ (any unaccented
syllables before the first accent in the intonation phrase), ‘head’ (a stretch starting
with the first accent in the intonation phrase up to the nucleus), ‘nucleus’ (the last
accented syllable in the intonation phrase), and ‘tail’ (any remaining, unaccented,
syllables after the nucleus till the end of the intonation phrase). For an example,
see Figure 1.
Figure 1
Interlinear representation and analysis of a sentence with a Fall-Rise nuclear tone

[he]       [got a dis]   [tinc]     [tion]
prehead    head          nucleus    tail

The nucleus is the only obligatory component within the intonation phrase, and
is the syllable on which one of a set of pitch patterns known as nuclear tones begins.
Tones may be simple (Fall, Rise), or complex (Fall-Rise, Rise-Fall, Rise-Fall-Rise).
Simple tones may be further subdivided according to pitch range (High Fall, Low
Fall, High Rise, Low Rise). Figure 2 presents schematic representations of the basic
tone types, as realized on monosyllables, and Figures 3 and 4 present examples of F0
tracks, aligned with a waveform, illustrating two of the nuclear tones.

Figure 2
Schematic representations of the basic tone types, as realized on
monosyllables

Fall Rise Fall-Rise Rise-Fall Rise-Fall-Rise
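
To make this constituent structure concrete, the analysis in Figure 1 can be sketched as a simple data structure. The following Python fragment is our own illustration, not part of the original study; all class and field names are invented for exposition.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List

class NuclearTone(Enum):
    # Simple tones (subdivided by pitch range) and complex tones
    HIGH_FALL = "High Fall"
    LOW_FALL = "Low Fall"
    HIGH_RISE = "High Rise"
    LOW_RISE = "Low Rise"
    FALL_RISE = "Fall-Rise"
    RISE_FALL = "Rise-Fall"
    RISE_FALL_RISE = "Rise-Fall-Rise"

@dataclass
class IntonationPhrase:
    """British-school analysis: only the nucleus (and its tone) is obligatory."""
    nucleus: str                                       # last accented syllable
    tone: NuclearTone                                  # nuclear tone beginning on the nucleus
    prehead: List[str] = field(default_factory=list)   # unaccented syllables before the first accent
    head: List[str] = field(default_factory=list)      # from the first accent up to the nucleus
    tail: List[str] = field(default_factory=list)      # unaccented syllables after the nucleus

# The example in Figure 1: "he got a distinction" with a Fall-Rise nuclear tone
example = IntonationPhrase(nucleus="tinc", tone=NuclearTone.FALL_RISE,
                           prehead=["he"], head=["got", "a", "dis"], tail=["tion"])
```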

2 Study 1
2.1 Method

2.1.1 Material
(1) Questionnaire
The first task for all subjects was to complete a questionnaire detailing their musical
background and training (number of years of musical training, starting age, grade
achieved, instrument(s) played, listening preferences).
(2) Musical aural tests
The musical aural tests were inspired by the Seashore Measures of Musical Talent
(Seashore, Lewis, & Saetveit, 1960), and assessed subjects’ pitch direction judgments
and tonal memory. The skills involved in these tasks have been investigated, for
example, in research on pitch perception deficits (e.g., Foxton, Dean, Gee, Peretz,
& Griffiths, 2004).
Figure 3
F0 track of an utterance I am very tired with High Fall nuclear tone (underlined)

I am very tired. (High Fall)

Figure 4
F0 track of an utterance I wonder if we could borrow the money? with Low Rise
nuclear tone (underlined)

I wonder if we could borrow the money? (Low Rise)

A CD copy of the 33 1/3 rpm Long Playing recording of the original Seashore
test was not considered of good enough quality as some of the tones had been
distorted due to warping of the original gramophone recording. Therefore new
stimuli were created, using computer technology.
Pitch direction judgments


The stimuli for pitch direction judgments consisted of pairs of tones, created in
Matlab. Each tone had a saw-tooth waveform and consisted of a series of harmonics
(1 – 20), with the amplitude of each harmonic being the inverse of the harmonic
number. In this way the spectral envelope of the tones was similar to that of voiced
speech. The fundamental frequency of the tones varied around 200 Hz, to match the
average fundamental frequency of the female speaker who recorded the stimuli for
the intonation analysis tasks.
The difference between the tones within a pair was measured in semitones,
ranging between 1/2 semitone and 1/64 semitone, corresponding to 5.8 Hz and
0.2 Hz respectively. Just noticeable differences for complex tones within the range
125 – 500 Hz have been reported to be 0.3 – 0.4% (Stock & Rosen, 1986). For a funda-
mental frequency of 200 Hz this corresponds to 0.6 – 0.8 Hz. Faulkner (1985) reports
thresholds of 1.2– 0.2 Hz for 200 Hz complex tones, depending on the task. Six sets
of stimuli, each consisting of 16 pairs of tones with a particular fraction of semitone
difference, were assembled (thus the total number of pairs was 96). The specification
for each set is shown in Table 1.

Table 1
Pitch direction judgments stimuli

                  Frequency range [Hz]   Difference in frequency [Hz]   Difference in pitch [semitones]
Practice trial    194.3 – 205.9          11.6                           1
Set A             197.1 – 202.9          5.8                            1/2
Set B             198.6 – 201.5          2.9                            1/4
Set C             199.3 – 200.7          1.4                            1/8
Set D             199.6 – 200.4          0.7                            1/16
Set E             199.8 – 200.2          0.4                            1/32
Set F             199.9 – 200.1          0.2                            1/64

Within each set, in eight pairs the second tone was higher than the first and in
the other eight pairs it was lower (the pairs were randomized within each set). The
tones were 200 ms long, the gaps between the tones within the pair were also 200 ms,
and the pairs were separated by 3 s silence to allow subjects to respond. After each
group of eight pairs, a silence of 8 s was inserted, to enable subjects to monitor their
place in the task.
The task involved marking on a response sheet whether the second member of
the pair was higher or lower in pitch than the first one.
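
The tone-pair stimuli can be reconstructed from the description above. The sketch below is our own approximation, not the authors' Matlab code; the sample rate and amplitude normalization are assumptions.

```python
import numpy as np

FS = 44100  # sample rate in Hz (assumed; not stated in the paper)

def complex_tone(f0, dur=0.2, n_harmonics=20, fs=FS):
    """Saw-tooth-like tone: harmonics 1-20 with amplitude 1/n, as described above."""
    t = np.arange(int(dur * fs)) / fs
    tone = sum(np.sin(2 * np.pi * n * f0 * t) / n for n in range(1, n_harmonics + 1))
    return tone / np.max(np.abs(tone))

def tone_pair(semitone_fraction, centre=200.0, second_higher=True):
    """Two 200 ms tones around 200 Hz, separated by 200 ms of silence,
    differing by the given fraction of a semitone."""
    ratio = 2 ** (semitone_fraction / 12.0)
    f_low, f_high = centre / np.sqrt(ratio), centre * np.sqrt(ratio)
    first, second = (f_low, f_high) if second_higher else (f_high, f_low)
    gap = np.zeros(int(0.2 * FS))
    return np.concatenate([complex_tone(first), gap, complex_tone(second)])

# Set A (1/2 semitone): frequencies ~197.1 and ~202.9 Hz, a ~5.8 Hz difference, as in Table 1
r = 2 ** (0.5 / 12.0)
print(round(200 * np.sqrt(r) - 200 / np.sqrt(r), 1))  # 5.8
stimulus = tone_pair(0.5, second_higher=True)
```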

Tonal memory
The material for the tonal memory tasks consisted of pairs of melodies in which the
second melody was the same as the first one apart from one tone that was a whole tone
higher or lower than its counterpart in the first melody. In all cases the overall contour
was not violated, in that the pitch direction between adjacent tones remained the same
in the changed melody, that is, it stayed basically rising or falling. (Contour violation
has been shown to have an important effect on same / different contour judgments, see
Foxton et al., 2004). The material was again based on the CD copy of the Seashore
test, in which the melodies were played on a piano. Audio editing software, Cool Edit 96, was used to match the tones’ frequencies to notes on the musical scale. The
edited tones were then made audible using a music-writing program, WinJammer,
as played on a synthesized piano. The pitch range used for all melodies was within a
female vocal (singing) range (A4 – E5; corresponding to a range of 440 Hz – 659 Hz).
The melodic sequences were not in any particular key and the pitch interval between
the tones within each melody was never larger than an octave. Finally, a computer
program, Cakewalk Professional, was used to ensure that all the tones were the same
length (600 ms) and that the gap between the melodies within the pair and between
the pairs of melodies was always the same (400 ms and 6 s, respectively). The speed
of presentation was 100 crotchet beats / min, which corresponded to the original
Seashore recording.
In this study, 3-tone, 4-tone, and 5-tone melodies were used, with 10 pairs in each group. The sets were separated by a 10 s pause. Subjects had to mark on a response sheet (i) which tone in the second melody had changed, and (ii) whether the changed tone was higher or lower than the corresponding one in the first melody.
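
A rough sketch of how such melody pairs can be constructed and checked, using MIDI note numbers, follows; this is our own illustration, and the example melody and retry strategy are not taken from the paper.

```python
import random

def contour(melody):
    """Sign of the movement between adjacent tones: +1 rise, -1 fall, 0 level."""
    return [(b > a) - (b < a) for a, b in zip(melody, melody[1:])]

def change_one_tone(melody, step=2):
    """Copy the melody with one tone moved a whole tone (2 semitones) up or down,
    retrying until the overall contour is preserved, as required for these stimuli."""
    while True:
        i = random.randrange(len(melody))
        changed = list(melody)
        changed[i] += random.choice([-step, step])
        # stay within the female singing range used here: A4-E5 = MIDI 69-76
        if 69 <= changed[i] <= 76 and contour(changed) == contour(melody):
            return changed, i

original = [69, 72, 71, 74]            # a hypothetical 4-tone melody (A4, C5, B4, D5)
changed, position = change_one_tone(original)
print(original, changed, "changed tone:", position + 1)
```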

(3) Intonation analysis tasks


Two types of intonation analysis tasks were employed in Study 1— auditory discrimina-
tion (same / different judgments of intonational contour) and nuclear tone identification.
All the material consisted of natural speech, read by a professional female phonetician
from an intonationally marked script and recorded in an anechoic chamber, using
a B&K 4165 microphone and a DAT recorder. The sentences in both tasks were between 4 and 7 syllables long, and produced as if with broad focus. In 50% of the sentences the nucleus fell on the final syllable, so that there was no tail component, while in the other 50%
it fell on the penultimate syllable, giving a monosyllabic tail. This design allowed
comparison of success rate in the nuclear tone judgment task between sentences in
which nuclear tone was mapped on to a single syllable, as opposed to two syllables.
Auditory discrimination
The material for the auditory discrimination task consisted of 30 pairs of lexically
identical sentences, played in random order, 15 with an identical intonation contour,
15 with entirely different intonation contours, that is, different head and different
nuclear tone, though the nucleus itself was always in sentence-final position. See
Figure 5 for an example. There was a gap of 7 s between each sentence pair. Subjects
had to mark on a response sheet whether the contours in a pair of sentences were
the same or different.
Figure 5
An example of a pair of (different) F0 contours in the sentence The flowers are
from Adam for the auditory discrimination task

Nuclear tone identification


The material for nuclear tone identification consisted of three sets of stimuli. Each
set consisted of 15 pairs of lexically identical sentences whose intonation contours
differed systematically. In Set 1 the paired sentences had the same head pattern but
different nuclear tones (hence SHDN). In Set 2 they had the same nuclear tone but a
different head pattern (DHSN) and in Set 3 the sentences differed in both head and
nuclear tone (DHDN). Each sentence in a pair had been recorded with two different
head + nucleus contours, and the stimuli in these sets were created by cutting and
splicing the recordings (see Fig. 6 for an example). The sentences were not played in
pairs, unlike in the auditory discrimination task, but were randomized within their
respective sets. Seven different nuclear tones were represented more or less equally
across the sentences: four simple tones (High Rise, Low Rise, High Fall, Low Fall),
and three complex (Fall-Rise, Rise-Fall, Rise-Fall-Rise). Subjects were asked to
identify the nuclear tone in each sentence, and used a response sheet on which they
selected from one of eight possibilities: the seven nuclear tones mentioned above,
plus a “phantom” tone — Fall-Rise-Fall — introduced to balance the set of tones
in terms of complexity and also to serve as a distractor. In order to focus subjects’
attention on the type of nuclear tone, rather than on its placement, the nucleus itself
was underlined in each sentence printed on the response sheet. Seven seconds silence
was inserted between the individual stimuli to enable subjects to respond.

Figure 6
F0 track of an utterance John’s umbrella is at Karen’s. Two different nuclear tones (Low Rise—top figure; High Fall—bottom figure) have been cross-spliced onto the same prenuclear contour (SHDN)

2.1.2 Subjects
Twenty-four female subjects were recruited from the final year postgraduate Speech
and Language Science course at University College London. They all had completed
and passed the phonetics module, which covered, apart from other phonetic topics,
training in intonation analysis, specifically judging placement of the nucleus and
identifying nuclear tone. All subjects were native English speakers, with no known
hearing disorders, aged between 23 and 45 years.

2.1.3 Experimental procedure


After completing the questionnaire, subjects were tested on the musical aural
tasks, starting with pitch direction judgments and followed by the tonal memory tasks.
Subsequently they performed the intonation analysis tasks, starting with auditory
discrimination and followed by the nuclear tone identification task (the order of
presentation of the three sets was SHDN, DHSN, DHDN, see above). The order of
presentation was the same for all subjects. Both music and speech stimuli were played
via headphones and subjects were asked to mark their choices on a response sheet.
Tasks were explained verbally, with examples, and several practice items were given.
Prior to the intonation analysis tasks, the experimenter briefly revised the nuclear
tones with the subjects. The whole procedure took about 45 mins to complete.

2.2 Results
In this section we shall present the results for each of the areas of investigation outlined
in the Introduction. In presenting some of these results we have made a preliminary
classification of subjects into ‘musicians’ and ‘nonmusicians’, based on their musical
experience as recorded in the completed questionnaires. The cut-off point for being
classed as ‘musician’ was achieving at least Grade 3 in an Associated Board of the
Royal Schools of Music (ABRSM) examination. This subdivision, though crude, was
not arbitrary and may be justified as follows.
We hypothesized that having done some musical aural training would lead to
better scores both in musical and intonation tasks that we designed. At the same
time we recognize that active musical participation over some period of time may
improve one’s sensitivity to pitch and musical memory even without explicit aural
training, and that people without any musical training or much exposure to music
might be naturally very gifted musically. However, considering musical aural training
that is part of music exams ensures better comparability across subjects, as they are
exposed to the same requirements for passing the exams. It was not straightforward
to establish a cut-off point for the classification of musicians versus nonmusicians,
as our stimuli had important differences from those used in the ABRSM exams. In
Grade 2 the aural component involves recognizing a melodic change to a 2-bar phrase
in a major key, and in Grade 3 to a 4-bar phrase in a major or minor key, with the
change being made in the second playing in both cases. The tonal sequences used in
our tonal memory tasks, though short, had no key structure to aid the memory. We
chose Grade 3 as a cut-off point, since we considered the longer sequences used at
Grade 3 to be counterbalanced in terms of task difficulty by the lack of key structure
in our stimuli. Given our criterion, 14 subjects in Study 1 were classified as musicians
and 10 subjects as nonmusicians.
Other data from the questionnaires, such as number of years of training, were
considered as criteria for classifying subjects into musicians and nonmusicians, but
rejected. Starting age, intensity of training and the number of instruments played
varied widely across subjects with the same length of musical training. To take these
factors into account would have led to very small subgroups of subjects, making the
statistical analysis impossible.
Is there a relationship between musical training and performance in music tasks?
As a starting point we performed a correlation analysis and found a highly significant
relationship between the grade achieved and the overall music scores (r = .777, p < .001).
Similarly, a highly significant correlation was found between grade and pitch direction
judgments (r = .776, p < .001), and between grade and judgments of pitch direction
change in the tonal memory task (r = .562, p = .004). The correlation between grade
and judgments of place of change in the tonal memory task was not significant.
For further analysis we divided subjects into two groups — ‘musicians’ and
‘nonmusicians’ — as described above. In the first part of the analysis we examined
the extent to which musical training is reflected in the subjects’ performance in our
music tasks (musical aptitude).
A one-way ANOVA was performed both on the pitch direction judgments and
the tonal memory task, summarized in Figure 7. In the tonal memory task subjects
made two judgments: which tone in the 2nd melody changed in comparison with the
1st melody (‘place’), and in which direction, higher or lower (‘pitch’). In the statistical
analysis and in Figure 7 the two are represented separately. In both cases, the depen-
dent variable was the percentage score and the independent variable was musical
training (musician vs. nonmusician). In pitch direction judgments we averaged the
scores across the sets for each subject.
For pitch direction judgments there was a highly significant effect of musical
training, F(1, 22) = 21.0, p < .001, with 49% of variance accounted for and little overlap
in distribution of scores between the two groups of subjects (mean scores for musicians
and nonmusicians were 74% and 59% respectively). A closer examination of scores in
the individual test sets showed that some subjects were performing well above chance
even for the smallest intervals. Regarding the tonal memory task, the two subject
groups were not significantly different from each other with respect to either place or
pitch judgments. Figure 7 shows that judging which tone had changed was too easy a
task for both groups of subjects and led to a ceiling effect (the solid bar represents the
median, the box represents the interquartile range which contains 50% of values, and
the whiskers extend from the box to the highest and lowest values, excluding outliers,
which are cases with values between 1.5 and 3 box lengths from the upper or lower
edge of the box). Pitch scores discriminated between musicians and nonmusicians
slightly better, with musicians reaching on average a higher score, but the difference
just missed significance (p = .08).

Figure 7
Relationship between musical training and performance in music tasks (score [%], musicians vs. nonmusicians, for three tasks: pitch direction judgment, tonal memory: place, tonal memory: pitch)

For the tonal memory task we also examined the effect of melody length in both
groups of subjects. A two-way ANOVA accounted for 31% of variance in the data,
showing a significant effect of musical training, F ( 1, 68) = 6.2, p = .015, and melody
length, F ( 2, 68) = 14.1, p < .001, while the interaction was not significant. Despite the
fact that statistically both musicians and nonmusicians behaved similarly overall,
Figure 8 reveals some interesting differential trends.

Figure 8
Effects of musical training and melody length on performance in tonal memory task (pitch) (score [%], musicians vs. nonmusicians, for 3-note, 4-note, and 5-note melodies)

While musicians still performed relatively well on the 4-note melodies (group mean 84%), a level comparable to that for the easier 3-note melodies (89%), nonmusicians showed a clear drop in performance already for the 4-note melodies (69%, as opposed to 87% for the 3-note melodies).
Is there a relationship between musical training and performance in intonation
analysis tasks?
An analysis similar to that for the musical tasks above was conducted for the intonation
analysis tasks. A one-way ANOVA on same / different contour judgment scores showed
no significant effect of musical training. Both groups were at the ceiling level, suggesting
that this task was too easy (musicians: mean score 93%, nonmusicians: 95%).
By contrast, musicians outperformed nonmusicians when it came to nuclear
tone identification. Figure 9 illustrates the distribution of correct scores for both
subject groups. The effect was highly significant in all three groups of stimuli; SHDN
F ( 1, 22) = 19.0, p < .001; R 2 = .44; DHSN F ( 1, 22) = 9.1, p = .006; R 2 = .26; DHDN
F ( 1, 22) = 10.1, p = .004; R 2 = .28. It is worth noting that among the musicians the
distribution is markedly wider than among nonmusicians, and that even with their
lower scores, nonmusicians were still well above the chance level (12.5%, as there
were 8 choices). Further analysis showed that within each group of subjects the three
sets of stimuli were not statistically different from each other. For illustration, see
Figure 9.

Figure 9
Effects of musical training and stimulus type on performance in nuclear tone judgment tasks (score [%], musicians vs. nonmusicians, for SHDN, DHSN, and DHDN stimuli)

Relationship between musical training, musical aptitude and intonation analysis skills
The results reported in the previous section suggest that when the classification of
subjects into musicians and nonmusicians is based on them reaching at least Grade 3 in
music exams, musicians perform significantly better both in music tasks and intonation
tasks. We were further interested to find out whether musical aptitude, as measured
by the musical aural tests, would be a significant predictor of performance in the
intonation tasks, and to discover which of the measures of musical competence, if
any, is the best predictor of intonation analysis skills, in particular of the nuclear
tone judgments (we did not examine same / different contour judgments further, as
the majority of subjects reached ceiling in this task). However, first we performed a
simple correlation between the overall music scores and the overall intonation scores
to obtain an overall picture of the relationship. The results showed a highly significant
positive correlation (r = .799, p < .001; R 2 = .638). For the graph, see Figure 10.

Figure 10
Correlation between overall music scores and overall intonation scores: Scatter plot with a line of best fit (x-axis: overall music scores; y-axis: overall intonation scores)

We also considered the possibility that grades higher than Grade 3 might also be crucial contributors to skills in intonation analysis. In this way we can treat musical
experience independently from musical aptitude as measured in our aural tests, and
look for correlations between the highest grade achieved by each subject and skills
of intonation analysis (two subjects had a degree in music; for the purposes of the
analysis we coded this level of training as Grade 9). The results were again highly
significant, showing that the higher the grade the higher the overall intonation score
(r = .671, p < .001; R 2 = .450); see Appendix 2.
A step-wise multiple regression was used for the detailed analysis, with independent
factors being: grade; pitch direction judgment scores; tonal memory: place scores and
tonal memory: pitch scores. As the dependent variable we entered for each subject the
mean score across all three sets of stimuli (SHDN, DHSN, DHDN), since our previous
analysis showed that the scores in the three sets were not statistically different.
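
Before the results, a sketch of the kind of regression described here, run on made-up scores (statsmodels ordinary least squares; the step-wise selection itself, i.e., adding or dropping predictors according to their significance, is not shown). The data and variable names below are hypothetical.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

# Hypothetical data for 24 subjects (the real scores are not reproduced here)
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "grade": rng.integers(0, 9, 24),              # ABRSM grade 0-8 (hypothetical coding)
    "pitch_direction": rng.uniform(40, 100, 24),  # pitch direction judgment score [%]
    "tm_place": rng.uniform(60, 100, 24),         # tonal memory: place score [%]
    "tm_pitch": rng.uniform(40, 100, 24),         # tonal memory: pitch score [%]
})
# Dependent variable: mean nuclear tone identification score over SHDN, DHSN, DHDN
df["intonation"] = 0.6 * df["pitch_direction"] + rng.normal(0, 10, 24)

X = sm.add_constant(df[["grade", "pitch_direction", "tm_place", "tm_pitch"]])
model = sm.OLS(df["intonation"], X).fit()
print(model.summary())  # per-predictor t-statistics and p-values, overall R-squared
```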
The regression model as a whole was highly significant, F ( 1, 22) = 38.1, p < .001,
and accounted for 63.4% of variance in the data, but only one predictor was respon-
sible for this — pitch direction judgment scores (t = 6.2; p <.001), that is, an aspect of
musical aptitude that involved judging whether the 2nd tone in a pair was higher or
lower in pitch than the 1st one, when the pitch interval was a fraction of a semitone.
The other two musical skills (judging place and pitch of a changed tone in one melody
in comparison with another) were not significant. Similarly musical training (grade)
also did not show any significant linear relationship with intonation scores.
Are some intonational forms harder to identify than others?


Finally we were interested in finding out whether particular intonational variants made
the analysis task more difficult, and whether there were differences between musicians
and nonmusicians in this respect. We focused on two aspects: (i) the presence / absence
of a tail constituent (an “allotonic” variant), and (ii) the type of nuclear tone. In the
former case, we hypothesized that when the nuclear tone is spread over an extra
syllable (the tail), it would be easier for the listener to decide on the direction of pitch
movement and thus the type of nuclear tone. In the case of nuclear tone types, our
teaching experience led us to predict that simple tones, where pitch moves in one
direction only, would be judged correctly more often than complex tones, and that
the complex tones would tend to be confused with each other.
(1) Effect of presence of tail on judging nuclear tone types
A three-way ANOVA, with musical training, presence / absence of tail and task being
the three independent variables, showed a highly significant effect only for musical
training, F ( 1, 141) = 52.6, p < .001. Presence / absence of tail just missed significance
(p =.088).
(2) Effect of type of nuclear tone
In the analysis of perceptual judgments of nuclear tone types we focused on three
main questions: (i) which nuclear tones were generally easiest / most difficult to judge;
(ii) which tones the individual target nuclear tones were typically confused with; and
(iii) whether there were any marked differences between musicians and nonmusicians
with respect to these two questions. We examined the results in terms of confusion
matrices. The bubble graphs in Figure 11 are graphical representations of confusion
matrices for musicians and nonmusicians (the actual confusion matrices are presented
in Appendix 3). The x-axis displays target nuclear tones and the y-axis the perceived
tones. The size of the bubbles reflects the proportion of perceptual judgments. The
“hits,” that is, correct perceptual judgments, are represented in Figure 11 by the
circles in bold and in tables in Appendix 3 by shaded areas.
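
A confusion matrix of this kind is simply a count over (target, perceived) response pairs; the following minimal sketch uses invented responses (the real data are in Appendix 3).

```python
from collections import Counter

TONES = ["HF", "LF", "HR", "LR", "RF", "FR", "RFR", "FRF", "NR"]

# Hypothetical (target, perceived) responses pooled over subjects
responses = [("HF", "HF"), ("HF", "LF"), ("LR", "FR"), ("RF", "FRF"), ("LF", "LF")]

counts = Counter(responses)
# Columns = the seven target tones, rows = perceived tones (including FRF and NR), as in Figure 11
matrix = [[counts[(target, perceived)] for target in TONES[:7]] for perceived in TONES]
for perceived, row in zip(TONES, matrix):
    print(f"{perceived:>3}", row)
```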
All the individual tones were variants of two basic categories: Falls and Rises (as
defined by their final pitch direction). Our predictions were that simple tones would
be easier to perceive than complex tones, and that individual tones would be most
often confused with other tones drawn from the same basic categories. These predic-
tions were largely upheld. As can be seen in Figure 11 and the confusion matrices in
Appendix 3, Rise-Fall and Fall-Rise attracted lowest perceptual scores among both
musicians and nonmusicians. The most successfully judged nuclear tone was Low
Fall, followed by High Rise. This was again true for both groups of subjects. An
exception to our prediction about simple tones was High Fall, which was markedly
more difficult than the other simple tones for both musicians and nonmusicians, and
was even surpassed in number of hits by Rise-Fall-Rise. Low Rise was the only tone
in which musicians and nonmusicians differed very noticeably. While it seemed to
be relatively easy for musicians to perceive, it was much less so for nonmusicians.
Overall, musicians were clearly better at judging all types of nuclear tones than
nonmusicians, as illustrated in tables in Appendix 3 and graphically by the size of
the corresponding bubbles in Figure 11.
Figure 11
Perceptual judgments of target nuclear tones by musicians and nonmusicians (bold bubbles represent “hits”; HF = High Fall, LF = Low Fall, HR = High Rise, LR = Low Rise, RF = Rise-Fall, FR = Fall-Rise, RFR = Rise-Fall-Rise, FRF = Fall-Rise-Fall, NR = no response). Two panels, Musicians and Non-musicians; target nuclear tones (HF, LF, HR, LR, RF, FR, RFR) on the x-axis, perceived tones on the y-axis.

Further analysis showed that the tones with which target tones were most
frequently confused by subjects were largely predictable. This was the case particu-
larly for musicians. As Figure 11 and Appendix 3 demonstrate, musicians tended to
misperceive High Fall as Low Fall. In fact, High Fall was misperceived as Low Fall
nearly as often as it was correctly perceived as High Fall. The next most likely tone
for High Fall to be confused with was Rise-Fall. Low Fall was confused with Rise-
Fall, which was in turn most often confused with the “phantom” Fall-Rise-Fall, the
tone that was introduced in the response sheet as a distractor and to balance the set
of tones in terms of complexity. The next most likely wrong choice for Rise-Fall was
Rise-Fall-Rise. Among rising tones a similar pattern occurs: High Rise was confused
with Low Rise, Low Rise with Fall-Rise, Fall-Rise with Rise-Fall-Rise, and then with
Rise-Fall and Fall-Rise-Fall. Finally, Rise-Fall-Rise was misjudged most commonly
as Fall-Rise-Fall.
The pattern was generally similar for nonmusicians, with two exceptions (see
Appendix 3). Low Fall was confused most often with High Fall, not Rise-Fall as for
musicians, although Rise-Fall was the next most common mistake. Low Rise was
most often misperceived as Low Fall, not Fall-Rise, which was only the next in the
ranking order of misperceptions.

3 Study 2
3.1 Method
Like Study 1, Study 2 consisted of three parts: (i) a questionnaire; (ii) a set of musical
aural tests; and (iii) a set of intonation analysis tasks. However, unlike in Study 1, a
small pilot study was carried out before the investigation proper, in order to evaluate
the difficulty of the musical tasks, and to identify and eliminate tasks which were either
too hard for the subjects, or too easy (and therefore would lead to ceiling effects). There
were 10 subjects in the pilot study, whose profile of musical experience matched that
of the subjects in the main study. None of them were students at University College
London, and none had phonetic training.
The material for the intonation analysis tasks in Study 2 was different from that
in Study 1; we targeted similar skills, but introduced more variables.

3.1.1 Material
(1) Questionnaire
Subjects were asked to fill in a questionnaire, in which they indicated a number of
facts about their musical background (number of years of musical training, grade
attained, instruments played, frequency of music lessons, listening preferences) and
whether they were right / left handed.
(2) Musical aural tests
As in Study 1, the musical aural tests took the Seashore material as a starting point,
and included a pitch direction judgment task and tonal memory tests. However, the
results of the pilot study suggested that some tasks could be eliminated or modified
because they were too easy or too difficult. The material was therefore modified, as
described below.
Pitch direction judgments
In the pitch direction judgments a binomial test on the pilot results showed that most
subjects scored at ceiling for the easiest set (Set A; 1/2 semitone difference between
tones in pairs; see Table 1) while the majority failed in the hardest set (Set F; 1/64
semitone difference). Therefore only a subset of the original material was used: Sets
B, C, D, and E, that is, 64 pairs of tones in total (see Table 1). This means that the
maximum difference between the paired tones was 1/4 semitone (2.9 Hz) and the
minimum difference was 1/32 semitone (0.4 Hz).
Tonal memory
The tonal memory tasks were partially redesigned to provide four separate tasks of
increasing difficulty. They tested essentially the same skills as in Study 1, though
with greater emphasis on “place” of change rather than direction of pitch movement.
However, in tonal memory tasks 1 – 3 described below, 3-tone melodies were not used,
since the pilot study had suggested they were too easy, and the interval between the
notes which changed in each stimulus pair was reduced from a whole tone to a semitone
in each case. Moreover, to avoid the same melodies being reused for different tasks,
the order of notes within a melody was reversed, as was the order of presentation of
the pairs, in order to avoid possible practice effects. The description of each tonal
memory task is as follows.

TM1: Tonal memory 1 was based on the same pairs of 4-tone and 5-tone melodies as
Study 1. It was a “place” task only: subjects were asked to indicate which tone
in the 2nd melody had changed.
TM2: Tonal memory 2 involved judging simultaneously on 5-tone melody pairs (i)
place: which tone in the second melody had changed, and (ii) pitch: whether
the changed tone was higher or lower than the corresponding one in the first
melody.
TM3: Tonal memory 3 again used a set of 5-tone melodies, but increased the level
of task difficulty in that two, not just one, out of five tones in the 2nd melody
changed. Subjects had to identify the changed tones (place only).
TM4: Tonal memory 4 was based on pairs of 3-tone and 4-tone melodies, where the
changed note in the second melody differed by a whole tone from the corre-
sponding note in the first. Then the 2nd sequence was transposed up a 4th. For
example, in the pairs D5 G#4 E4 and G5 C#5 B4 it is the third tone that is
changed. This transposition was explained to subjects, who were then asked
to indicate which tone in the 2nd sequence had changed and thus altered the
melody (place only). The transpositions meant that the overall pitch range used
in the stimuli was expanded upwards to G#5 — still within a female singing
range, but outside a normal speaking range.
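
The TM4 example can be checked with MIDI note numbers: the changed tone moves by a whole tone (2 semitones) and a perfect 4th is 5 semitones. A small worked check (our own illustration):

```python
# MIDI note numbers: E4=64, F#4=66, G#4=68, B4=71, C#5=73, D5=74, G5=79
first = [74, 68, 64]                 # D5  G#4  E4
changed = list(first)
changed[2] += 2                      # third tone raised by a whole tone: E4 -> F#4
second = [n + 5 for n in changed]    # transpose the whole sequence up a perfect 4th
print(second)                        # [79, 73, 71] = G5  C#5  B4, as in the text
assert second == [79, 73, 71]
```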

(3) Intonation analysis tasks


Three intonation analysis tasks focused on two skills: (i) identification of nuclear
tone type and (ii) identification of the nucleus position. All stimuli were recordings
of natural speech read from an intonationally marked script by the same speaker as
in Study 1, and under the same conditions.
Nuclear tones
In the first task subjects had to identify nuclear tones in two groups of stimuli — mono-
syllabic words and short phrases. Three monosyllabic words were chosen (yes, no, why),
and each was produced with high and low realizations of four nuclear tones: Fall, Rise,
Fall-Rise, and Rise-Fall (8 variants in all). Thus subjects listened to 24 monosyllables
in total. A similar design was followed for short phrases. Again there were three
different phrases (one of them is, can it be true, I might think so), each produced eight
times, each time with a different nuclear tone realization. The nucleus was always
placed on the 1st syllable, so that the nuclear tone itself was partly carried by three
tail syllables. Gaps of 3.75 s were inserted between the stimuli, and 6 s long gaps after
each group of six stimuli. Subjects marked their choice of nuclear tone on a response
sheet, choosing from the four basic categories only; unlike in Study 1, they were not
required to distinguish between low and high versions of the nuclear tones.
Nucleus position
In the second task subjects had to judge the position of the nucleus in two sentences,
9 and 11 syllables long. A combination of three different nuclear positions in each
sentence (early: nucleus on 1st and only accent in the intonation phrase; mid: 1 head
accent before nucleus; and late: 2 head accents before nucleus) and four different
nuclear tones yielded 12 different renditions for each sentence, that is, 24 stimuli in
total. Gaps of 3.75 s were again inserted between the stimuli, and 6 s long gaps after
each group of six stimuli. Figure 12 shows the same utterance being produced with
the same nuclear tone (Rise) but in a different position (late vs. early).
Nuclear tone + position
In the last task subjects had to judge both nuclear position and nuclear tone type
simultaneously. As in the nuclear position task, the nucleus was either early, mid or
late in the sentence and all combinations of the positions and nuclear tone types were
used. As there were six different sentences, between 9 and 11 syllables long, the total
number of stimuli was 72. The gap between the stimuli was longer in this task since
the task was more demanding (6.25 s, with 7.3 s gaps after groups of 6 sentences).
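
The stimulus counts in these two tasks follow from crossing the design factors; a quick enumeration (sentence identities are placeholders):

```python
from itertools import product

positions = ["early", "mid", "late"]
tones = ["Fall", "Rise", "Fall-Rise", "Rise-Fall"]

# Nucleus position task: 2 sentences x 3 positions x 4 tones = 24 stimuli
print(len(list(product(range(2), positions, tones))))   # 24

# Nuclear tone + position task: 6 sentences x 3 positions x 4 tones = 72 stimuli
print(len(list(product(range(6), positions, tones))))   # 72
```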

3.1.2 Subjects
A total of 30 subjects (1 male and 29 female) volunteered to participate in this study.
They were all final year undergraduates in Speech Sciences at University College
London. They received the same amount of intonation training as part of their 1st
year phonetics course. Subjects were all native English speakers with normal hearing,
aged between 21 and 40 years (mean age 22 years).

3.1.3 Experimental procedure


The 30 subjects completed the questionnaire two months before testing. The small pilot
study described above was conducted before the main tests, but subject selection for
the pilot was informed by the musical experience data collected on the questionnaire.
For the main tests, in order to control for possible confounding factors, such as
practice effects or fatigue, the order of presentation was counterbalanced between
the music tasks and intonation tasks. Half of the subjects began with the music tasks
and half began with the intonation tasks. The order of tasks within music section
and intonation section was always as described in Method, to reflect their increasing
difficulty. Stimuli were randomized within their respective groups for each task, and
order of presentation of intonation stimuli reversed for half the subjects.
Figure 12
F0 tracks of an utterance Why don’t you do something about it then. Top F0 track
shows late nucleus (Why don’t you do something about it then); bottom figure shows
early nucleus (Why don’t you do something about it then). The nuclear tone (Rise)
is the same in each case

Prior to the experiment, the instructor gave out an information sheet reminding
subjects of key concepts in the intonation analysis, with examples demonstrated live.
Each task was preceded by spoken and written instructions and subjects were also
given several practice items. The stimuli were played from a CD player via speakers.
Subjects marked their judgments on response sheets in a multiple-choice layout. The
whole procedure took about 40 – 45 mins to complete.
3.2 Results
The results of Study 2 will be presented under the same research question headings as
in Study 1. Using data supplied on the questionnaire, we followed the procedure used
in Study 1 to divide subjects into ‘musicians’ and ‘nonmusicians’, that is, to be classified
as a musician in our study, a subject had to achieve at least Grade 3 in an ABRSM
examination. On this criterion, there were 15 musicians and 15 nonmusicians.
Is there a relationship between musical training and performance in music tasks?
As in Study 1, we first examined the relationship between subjects’ musical training and
their musical aptitude, as tested by our pitch direction judgments and tonal memory
tasks, using correlation analysis.
We found a highly significant correlation between the grade achieved and the
overall musical scores (r = .573, p < .001), grade and judgments of place of change in
TM2 (r = .599, p < .001), and grade and TM3 (r = .627, p < .001). There was a weaker
correlation between grade and pitch direction judgments in pairs of tones (r = .418,
p = .022) and no significant correlation between grade and either TM4 or judgments of pitch direction change in TM2.
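The correlation analysis itself is standard Pearson correlation; the sketch below shows the form of the computation on fabricated data (the variable names and values are hypothetical, not the study's).

```python
# Illustrative Pearson correlation between ABRSM grade and a music-task
# score, on fabricated data; only the form of the analysis is shown.
import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(0)
n = 30                                              # 30 subjects, as in Study 2
grade = rng.integers(0, 9, size=n)                  # highest grade achieved (0 = none)
tm3_score = 50 + 5 * grade + rng.normal(0, 10, n)   # hypothetical TM3 scores (%)

r, p = pearsonr(grade, tm3_score)
print(f"r = {r:.3f}, p = {p:.4f}")
```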
Further analysis focused, as in Study 1, on comparison between musicians and
nonmusicians. Figure 13 summarizes the results for pitch direction judgments and
for those tonal memory tasks focusing on place only, that is, TM1, 3 and 4. Figure 14
presents results for the two aspects of TM2: simultaneous judgments of place of pitch
change and direction of pitch change. These two aspects were treated separately in
the statistical analysis.

Figure 13
Relationship between musical training and performance in music tasks (excluding TM2)

One-way ANOVA showed that, apart from TM4 and judgments of pitch direction in TM2, musical training was a highly significant factor in all musical tests, with musicians outperforming nonmusicians: pitch direction judgments, F(1, 28) = 8.4, p = .007, 20% of variance accounted for; TM1, F(1, 28) = 15.0, p = .001, 33% of variance accounted for; judgments of place of pitch change in TM2, F(1, 28) = 18.9, p < .001, 38% of variance accounted for; TM3, F(1, 28) = 20.5, p < .001, 40% of variance accounted for. Musicians and nonmusicians were most clearly distinguished in the pitch direction judgments (mean scores 78% and 65%, respectively), judgments of place of pitch change in TM2 (89% vs. 60%) and in TM3 (89% vs. 64%), with little or no overlap in the score distribution for the two groups. Differences between the two groups in perceiving the direction of pitch change did not reach significance. As in Study 1, in the pitch direction judgments several subjects performed well above chance level even for the smallest interval. Surprisingly, the highest achieving subject was a nonmusician, as indicated by the most extreme datapoint in Figure 13.

TM1 was intended to be the easiest of the tonal memory tasks, and the highest scores among the tasks for both groups confirmed this. TM4 was designed to be the hardest, and again, the results confirmed this for both musicians and nonmusicians. As Figure 13 shows, this task in fact proved to be similarly difficult for the two groups. Figure 14 shows that musicians found judgments of place of pitch change significantly easier than judgments of direction of pitch change.

Figure 14
Relationship between musical training and components of TM2 (score [%] by musical training; tasks: pitch direction judgments and place judgments)
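The form of these group comparisons is a one-way ANOVA with "variance accounted for" reported alongside. The following sketch, on fabricated scores, shows one way to obtain both; computing the proportion of variance as eta-squared is an assumption on our part about the statistic reported.

```python
# Sketch of a one-way ANOVA (musicians vs. nonmusicians) plus eta-squared,
# using fabricated scores; not the study's data or analysis code.
import numpy as np
from scipy.stats import f_oneway

rng = np.random.default_rng(1)
musicians = rng.normal(89, 8, 15)        # hypothetical TM3-like scores (%)
nonmusicians = rng.normal(64, 12, 15)

F, p = f_oneway(musicians, nonmusicians)

scores = np.concatenate([musicians, nonmusicians])
grand_mean = scores.mean()
ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in (musicians, nonmusicians))
eta_sq = ss_between / ((scores - grand_mean) ** 2).sum()
print(f"F(1, 28) = {F:.1f}, p = {p:.3f}, eta^2 = {eta_sq:.2f}")
```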
As in Study 1, we also examined differences between musicians and nonmusicians
with respect to the effect of melody length on the judgments of place of pitch change.
In TM1 subjects had to judge which tone changed in 4-tone and 5-tone melodies, and
in TM4, the melodies were 3-tone and 4-tone long, but judgments had to be made on
a transposed melody.
A two-way ANOVA on TM1 yielded a highly significant effect for both musical
training, F(1, 57) = 22.2, p < .001, and melody length, F(1, 57) = 15.1, p < .001, accounting
for 37% of variance in the data. The interaction between the two was not significant.
Unsurprisingly 4-tone melodies were easier for both groups of subjects, with musicians
performing at ceiling (98%), while nonmusicians showed quite a wide distribution
(mean score 81%), see Figure 15. For 5-tone melodies, the distribution was wide for
both groups, but musicians were more successful (84% vs. 67%).
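A sketch of a two-factor analysis of this kind (musical training × melody length) is given below, again on fabricated data in long format; it illustrates the form of the test, not the authors' actual code or data.

```python
# Two-way ANOVA sketch (training x melody length) on fabricated TM1-like
# scores; illustrative only.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.stats.anova import anova_lm

rng = np.random.default_rng(2)
rows = []
for training in ("musician", "nonmusician"):
    for length in ("4-tone", "5-tone"):
        base = (90 if training == "musician" else 72) - (8 if length == "5-tone" else 0)
        for _ in range(15):
            rows.append({"training": training, "length": length,
                         "score": base + rng.normal(0, 8)})
df = pd.DataFrame(rows)

model = smf.ols("score ~ C(training) * C(length)", data=df).fit()
print(anova_lm(model, typ=2))
```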


Figure 15
Relationship between musical training and melody length in TM1 (score [%] by musical training; melody length: 4-tone vs. 5-tone)

In TM4, there was a significant effect of melody length, F ( 1, 58) = 24.3, p < .001,
with 4-tone melodies being more difficult than 3-tone melodies, as one would expect,
but musical training was not significant and neither was the interaction between the
two factors. Results are presented in Figure 16. Despite the fact that this task was
most difficult for both musicians and nonmusicians, as shown above, the majority
of the subjects performed above the chance level (33% for 3-tone melodies, and 25%
for 4-tone melodies), with several performing substantially better than others in both
groups, as shown by the outliers.

Figure 16
Relationship between musical training and melody length in TM4 (score [%] by musical training; melody length: 3-tone vs. 4-tone)

Is there a relationship between musical training and performance in intonation analysis tasks?
The subdivision of subjects into ‘musicians’ and ‘nonmusicians’ was again used in
the analysis of performance in the intonation tasks. Nuclear tone judgments were
performed on monosyllabic utterances and short phrases. In both sets of stimuli,
the ANOVA revealed significantly better scores for musicians than nonmusicians,
though all subjects performed above the chance level of 25%. In monosyllabic utter-
ances, F ( 1, 28) = 12.3, p = .002, 31% of the variance was accounted for by musical
training, and in short phrases, F ( 1, 28) = 8.9, p = .006, 24%. As Figure 17 illustrates,
the distribution of scores was largely overlapping between monosyllabic utterances
(no tail component) and short phrases (nucleus + tail), both within musicians and
within nonmusicians, indicating that the length of utterances over which the nuclear
tone stretched did not make much difference to the performance.

Figure 17
Relationship between musical training and nuclear tone judgments in monosyllabic utterances and short phrases (score [%] by musical training; stimuli: monosyllables vs. short phrases)

The effect of musical training on judgments of nuclear position in sentences just missed significance, F(1, 28) = 3.9, p = .057, with a mere 12% of variance accounted
for. The mean score for musicians was 84% and for nonmusicians 76%, in both cases
well above the chance level (33%, as there were 3 nuclear positions to choose from:
early, middle, and late).
Finally, we compared the performance of musicians and nonmusicians in the
task in which subjects had both to identify nuclear tone type and to locate the posi-
tion of the nucleus in sentences. In both nuclear tone judgments and nuclear position
judgments musical training was significant: tone judgments, F ( 1, 28) = 8.0, p = .008,
with 22% of variance accounted for; position judgments, F ( 1, 28) = 5.1, p = .031, with
16% of variance accounted for. Figure 18 illustrates the results. Musicians clearly
performed better than nonmusicians in both aspects of the task. Both groups found
nuclear position judgments easier than nuclear tone judgments, but the difference
was more marked for nonmusicians.

Figure 18
Relationship between musical training and simultaneous judgments of nuclear tones and nuclear position in sentences (score [%] by musical training; tasks: nuclear place vs. nuclear tone)

Relationship between musical training, musical aptitude and intonation analysis skills
As in Study 1 we first looked at an overall picture. We again found a highly significant
positive correlation between the overall music scores and the overall intonation scores,
see Figure 19 (r = .632, p < .001; R² = .400), and likewise between the grade and the overall intonation scores, see Appendix 4 (r = .515, p = .004; R² = .265).

Figure 19
Correlation between overall music scores and overall intonation scores: Scatter plot with a line of best fit



To see whether the highest musical grade achieved or any specific music task can
be shown to be a significant predictor of performance in intonation tasks, we again
used a stepwise multiple regression analysis. A separate analysis was conducted for
each intonation task, with the dependent variable being scores for the task in ques-
tion. The independent factors were always musical experience (grade), pitch direction
judgment scores and the scores from each tonal memory task.
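The stepwise procedure can be sketched as a simple forward selection over these predictors, as below; the entry criterion (p < .05) and the fabricated data are our assumptions, since the exact stepwise settings are not reproduced here.

```python
# Hedged sketch of forward stepwise regression over the predictors named
# above (grade, pitch direction, tonal memory scores); data are fabricated.
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(3)
n = 30
df = pd.DataFrame({
    "grade": rng.integers(0, 9, n).astype(float),
    "pitch_dir": rng.normal(70, 10, n),
    "tm2_place": rng.normal(75, 12, n),
    "tm4": rng.normal(45, 10, n),
})
df["tone_score"] = 20 + 0.6 * df["tm2_place"] + 0.3 * df["tm4"] + rng.normal(0, 8, n)

def forward_stepwise(data, dv, candidates, alpha=0.05):
    """Add the best remaining predictor while its p-value stays below alpha."""
    selected = []
    while True:
        pvals = {}
        for c in set(candidates) - set(selected):
            X = sm.add_constant(data[selected + [c]])
            pvals[c] = sm.OLS(data[dv], X).fit().pvalues[c]
        if not pvals:
            break
        best = min(pvals, key=pvals.get)
        if pvals[best] >= alpha:
            break
        selected.append(best)
    return selected

print(forward_stepwise(df, "tone_score", ["grade", "pitch_dir", "tm2_place", "tm4"]))
```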
The results for nuclear tone judgments were as follows (see Table 2). For all
types of stimuli, that is, monosyllabic utterances, short phrases and the sentences in
which subjects had to judge simultaneously nuclear position and tone, the strongest
predictor was judgment of place of pitch change in TM2. For short phrases it was in
fact the only significant predictor. In the case of monosyllabic utterances and tone
judgment scores in sentences, scores from TM4 (judgments of place of pitch change)
were also a significant predictor.

Table 2
Results of multiple regression on scores from nuclear tone judgment tasks

Stimulus | Significant predictor(s) | Amount of variance accounted for
Monosyllables | TM2: place (t = 5.9; p < .001); TM4 (t = 2.4; p = .025) | 53% (TM2), 61% (TM2 + TM4)
Short phrases | TM2: place (t = 4.9; p < .001) | 47%
Sentences (tone judgments) | TM2: place (t = 4.6; p < .001); TM4 (t = 2.5; p = .017) | 40% (TM2), 52% (TM2 + TM4)

For nuclear position judgments the results differed for the sentences in which
subjects were asked only to mark position of the nucleus and for the sentences in
which they had to mark both nuclear position and tone. In the former case, none
of the independent variables was a significant predictor. In the latter case, the best
predictor was again TM2 — judgments of place of pitch change only (t = 3.8; p = .001; 34% of variance accounted for).
Musical experience and pitch direction judgment scores showed no significant
relationship with any of the individual intonation task scores.
Are some intonational forms harder to identify than others?
Above we demonstrated that musicians were more successful than nonmusicians across a
range of intonation tasks. In this section we shall examine all these tasks in more detail.

(1) Nuclear tone judgments


As in Study 1, the analysis of perceptual judgments of nuclear tone types focused on
three main questions: (i) which nuclear tones were generally easiest / most difficult
to judge, (ii) which tones the individual target nuclear tones were typically confused
with, and (iii) whether there were any noticeable differences between musicians and
nonmusicians with respect to these two questions. For both monosyllabic utterances
and short phrases we again used confusion matrices and their graphical representa-
tion in terms of “bubble graphs” (see Figs. 20 and 21 and Appendices 5 and 6). In
Study 2, however, subjects used a broader classification of tones (4 basic categories
only, with no differentiation between High and Low variants), so the task was not
fully comparable with Study 1.
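The confusion matrices themselves are straightforward cross-tabulations of target against perceived tone, normalized within each target category; the sketch below shows the computation on a few invented responses.

```python
# Sketch of the cross-tabulation behind Figures 20-21 and Appendices 5-6;
# the response lists are invented for illustration.
import pandas as pd

target =    ["rise", "rise", "fall", "fall", "fall-rise", "rise-fall", "rise-fall"]
perceived = ["rise", "fall", "fall", "fall", "rise-fall", "fall",      "rise-fall"]

cm = pd.crosstab(pd.Series(perceived, name="perceived"),
                 pd.Series(target, name="target"),
                 normalize="columns") * 100    # percentage of responses per target tone
print(cm.round(0))
```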
In both monosyllabic utterances and short phrases, musicians had higher scores
than nonmusicians across all nuclear tones that were targeted. However, the confusion
matrices in tables in Appendices 5 and 6 demonstrate that the overall pattern for the
two groups was similar. Simple tones were easier to perceive and there was generally
little to choose between the hit rates for Rise and Fall. Rise-Fall was the hardest tone
to perceive for both groups of subjects and both groups of stimuli.
In terms of confusions in monosyllabic utterances (Fig. 20 and Appendix 5),
simple tones tended to be confused with each other (Rise for Fall and vice versa)
in both musicians and nonmusicians. For complex tones the picture is somewhat
less clear. Only one observation stands out: nonmusicians confused Fall-Rise with
Rise-Fall.

Figure 20
Monosyllabic utterances: Perceptual judgments of target nuclear tones by musicians and nonmusicians (bold bubbles represent “hits”). Separate panels for musicians and non-musicians plot perceived tones against target nuclear tones (Rise, Fall, Fall-Rise, Rise-Fall)


In short phrases (Fig. 21, Appendix 6) there were no apparently prevalent confu-
sions of any type for the simple tones. Among the complex tones, there was a tendency
to confuse Rise-Fall with Fall by both musicians and nonmusicians. However, in the
case of Fall-Rise the misperception was likely to be towards Rise-Fall, not Rise.

Figure 21
Short phrases: Perceptual judgments of target nuclear tones by musicians and nonmusicians (bold bubbles represent “hits”). Separate panels for musicians and non-musicians plot perceived tones against target nuclear tones (Rise, Fall, Fall-Rise, Rise-Fall)

(2) Nuclear position judgments


In the task in which subjects had to judge nuclear position in sentences we were
interested to find out how the success rate of individual nuclear positions related
to types of nuclear tone, and whether musicians and nonmusicians were noticeably
different in this respect. The analysis was based on general observations, rather than
on statistical results, as the data set was too small for a meaningful statistical analysis
(there were only 2 tokens of each nuclear tone type per nuclear position). The results
are presented in Figure 22.
For both musicians and nonmusicians there was a tendency for the nucleus
placement to be best identified in the midposition (when preceded by a single head
accent). The analysis also indicated that in general identifying an early nuclear posi-
tion was most difficult, particularly if the nuclear tone was a Rise.

Figure 22
Perceptual judgments of nuclear position across different nuclear tone types

(Separate panels for musicians and non-musicians; score [%] for each nuclear tone (Rise, Fall, Fall-Rise, Rise-Fall) across early, mid, and late nuclear positions)

(3) Simultaneous judgments of nuclear position and nuclear tone type


Finally, we investigated more closely the task in which subjects had to judge simul-
taneously the nuclear position and nuclear tone type in sentences. The focus was on
the following questions: (i) is it easier to determine the place of the nucleus when it
comes earlier rather than later in a sentence?; (ii) are some nuclear tones easier to
classify than others?; (iii) is there any interaction between the nuclear position and
type of nuclear tone, that is, are certain nuclear tones easier to perceive in certain
positions in the sentence?; and (iv) do musicians and nonmusicians behave similarly
with respect to these issues?
In the first part of the analysis we compared musicians and nonmusicians with
respect to the judgments that were correct both in terms of nuclear position and type
of nuclear tone. The results are presented in Figure 23. It was apparent that whereas
musicians outperformed nonmusicians with respect to all nuclear tone types and
positions, the two groups were remarkably similar with regard to the effect of both
nuclear position and nuclear tone type on the success rate. Fall-Rise was the most
difficult tone for all subjects to classify correctly, irrespective of the position of the
nucleus. The simple tones, Rise, and Fall, were most successfully identified when
the nucleus was placed early in the sentence; the success rate in medial position was
less good and late position the worst. For the simple tones a longer tail appears to
have been an advantage. Rise-Fall manifests the opposite tendency — worst success
rate for early nuclear position and better in middle and late positions. In one respect
nonmusicians differed from musicians: in middle and late nuclear positions nonmu-
sicians were most successful in identifying falling tones (Fall and Rise-Fall), while
musicians showed comparable success in identifying the Rise.

Figure 23
Percentage of correct judgments (both position of nucleus and nuclear tone) by musicians
and nonmusicians: Breakdown according to position and tone

(Panels for early, mid, and late nucleus; correct judgments [%] for each nuclear tone (Rise, Fall, Fall-Rise, Rise-Fall), plotted separately for musicians and non-musicians)

Repeated measures ANOVA was used to test the effect of nuclear position,
type of nuclear tone, and the effect of musical training on percentage scores from
perceptual judgments. The first two factors were within-subject factors, while musical
training was a between-subject factor. The statistical analysis confirmed the general
observations described above. All main factors were significant: nuclear position,
F(1.6, 46) = 8.2, p = .002; nuclear tone, F(2.8, 78) = 13.9, p < .001; musical training, F(1, 28) = 9.2, p = .005. The interactions position × musical training and tone × musical training were nonsignificant, confirming that the effects of nuclear position and tone on success rate follow the same pattern in musicians and nonmusicians. The interaction position × tone was, however, significant, F(5.7, 160) = 4.6, p < .001, indicating that
different nuclear tones attracted different scores in different nuclear positions. A post
hoc Bonferroni test revealed that this overall significant effect for the interaction was
due to the following pairwise significant effects: both simple tones were significantly
different from the complex tones in early nuclear position; the complex tones were
significantly different from each other in the late nuclear position; in the case of Fall
the early position was significantly different both from medial and late positions; in
the case of Rise, only early and late positions were significantly different from each
other. As the interaction position × tone × musical training was not significant, these
observations apply to both musicians and nonmusicians.
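For the within-subject part of this design, the analysis can be sketched with a standard repeated measures ANOVA as below; note that the between-subject factor (musical training) in the paper's mixed design is omitted here because the tool used in this sketch handles within-subject factors only, and the data are fabricated.

```python
# Within-subjects sketch only: repeated measures ANOVA over nuclear
# position and nuclear tone on fabricated scores; the between-subject
# factor (musical training) is deliberately left out of this example.
import numpy as np
import pandas as pd
from statsmodels.stats.anova import AnovaRM

rng = np.random.default_rng(4)
rows = [{"subject": s, "position": pos, "tone": tone, "score": rng.normal(60, 15)}
        for s in range(30)
        for pos in ("early", "mid", "late")
        for tone in ("rise", "fall", "fall-rise", "rise-fall")]
df = pd.DataFrame(rows)

res = AnovaRM(df, depvar="score", subject="subject",
              within=["position", "tone"]).fit()
print(res)
```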
In the second part of the analysis we focused on the perceptual judgments that
were only partially correct, that is, subjects marked correctly nuclear position but
not the type of nuclear tone or vice versa. Figure 24 presents the breakdown of all
perceptual judgments, separately for musicians and nonmusicians. For each nuclear
position and tone type it gives the proportions of judgments that were correct both for
position and tone (so this graph incorporates the results presented in Fig. 23 above),
proportions of judgments in which only position was judged correctly, judgments
with only tone correct, and finally the proportion of judgments in which neither position nor type of nuclear tone was correct.
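The four-way breakdown just described amounts to classifying each response by whether position and/or tone matched the target; a minimal sketch (with invented example responses) is shown below.

```python
# Sketch of the classification behind Figure 24; example responses are invented.
from collections import Counter

responses = [
    # (target_position, target_tone, judged_position, judged_tone)
    ("early", "rise", "early", "rise"),
    ("mid", "fall", "mid", "rise-fall"),
    ("late", "fall-rise", "mid", "fall-rise"),
    ("late", "rise-fall", "early", "fall"),
]

def classify(target_pos, target_tone, judged_pos, judged_tone):
    if target_pos == judged_pos and target_tone == judged_tone:
        return "both correct"
    if target_pos == judged_pos:
        return "position correct"
    if target_tone == judged_tone:
        return "tone correct"
    return "neither correct"

print(Counter(classify(*r) for r in responses))
```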
While for musicians the proportion of judgments in which both nuclear position
and tone were correct was clearly predominant for all tone types and positions, for
nonmusicians this was the case only for simple tones in the early nuclear position, and,
less markedly, for falling tones in the late nuclear position. In the majority of cases,
nonmusicians were able to judge only nuclear position correctly. Perhaps unsurpris-
ingly, it was least likely for both musicians and nonmusicians to judge correctly the
nuclear tone alone; in other words, if the subjects perceived nuclear tone correctly
they were likely to have judged its position correctly too.
In the first part of the analysis we discussed how the correct judgments of both
nuclear position and tone are distributed with respect to the target tones and posi-
tions. A similar analysis of the judgments in which only nuclear position was judged
correctly yields a mirror image to this. This is because the two outcomes complement
each other.


Figure 24
Breakdown of perceptual judgments according to nuclear placement and type of nuclear
tone for musicians and nonmusicians

(Panels for early, middle, and late nuclear position, separately for musicians and non-musicians; for each target tone (Rise, Fall, Fall-Rise, Rise-Fall), stacked percentages of judgments with both position and tone correct, position only correct, tone only correct, and neither correct)


4 Discussion
One of our main aims was to investigate the relationship between musical training and
intonation analysis skills. Our preliminary binary division of subjects into ‘musicians’
and ‘nonmusicians’, on the basis of passing ABRSM Grade 3, the level where aural
tests include a tonal memory task, showed that musicians outperformed nonmusicians
across nearly all tasks, both musical and intonational, in both studies. This supported
our hypothesis that those with musical training would perform better in both sets
of tasks.

4.1 Musical training and musical aptitude


We designed a number of musical tasks to ascertain the actual musical skills of all
our subjects and then used the results to determine the relationship between musical
training and actual musical skills. The musical tasks involved pitch direction judg-
ments and tonal memory tasks, two aspects of musical skills that we hypothesized
might be useful in analyzing intonation.
In both studies musicians performed significantly better on pitch direction judgments, with very little overlap in scores between the two groups. Several subjects, mainly musicians, showed remarkable sensitivity in this task, discriminating intervals even smaller than the reported JND for complex tones (Faulkner, 1985; Stock &
Rosen, 1986). In the tonal memory tasks, the discrepancy between musicians and
nonmusicians became more apparent as the level of difficulty increased. For the
simplest tasks, musicians and nonmusicians performed equally well. When the pitch
difference between the two tones that changed in two paired melodies was decreased
from a tone to a semitone, the musicians performed clearly better than nonmusicians,
as they did when the sequence involved a longer sequence of tones, suggesting that
their short-term memory was more efficient.
However, the tonal memory task in which the second melody was transposed
(Study 2, TM4) proved equally difficult for both musicians and nonmusicians. We can
speculate that the listening strategy involved in this task was different in kind from
the equivalent tasks without transposition. In the transposed cases listeners have to
have some awareness of the overall shape of the tonal sequence and the relationship
between the different notes within the melody. In the other tasks the concentration
could be solely on memorizing individual pitches. This seems to be a skill that was
not enhanced by musical training at the threshold level we used.
We cannot conclude from our study that musical training is solely responsible
for generally better performance by musicians in the musical aural tests. We have to
allow for natural musical ability and / or musical experience, not supported by musical
training. Indeed one subject (in Study 2) reported considerable musical experience
without formal training (and was therefore classed as a nonmusician in our study),
and achieved very high scores. Conversely, a few subjects with musical training did
not perform as we would expect on musical tests. Furthermore, it is likely that it is
natural musical aptitude that leads people to take up musical training in the first
place.

4.2 Musical training and intonation analysis skills


Musicians and nonmusicians were equally competent in their ability to judge whether
two lexically identical sentences had the same or different intonation contours. This
was a useful benchmark to establish. It must be remembered that the stimulus pairs
used were designed to be completely different in their intonation (different head,
different nucleus); we were not dealing with more closely controlled pairs with some
very small difference.
For the tasks that involved only judging nuclear tone type for a given nuclear
position, a significantly better performance by musicians was confirmed in both
studies. This was the case for all types of stimuli that we used, that is, monosyllabic
utterances, short phrases and longer sentences. In Study 1 we had predicted that by
manipulating pairs of sentences in such a way that the nuclear tone remained the
same in both versions while the prenuclear contour changed, subjects might find the
nuclear tone identification easier. However, this did not turn out to be the case. In
fact, the group of stimuli designed in this way did not show any significant differences
from the other two groups of stimuli. In Study 2 monosyllabic utterances and short
phrases presented equal difficulties for both groups of subjects.
In the task in which nuclear tones varied but the subjects had to judge only
nuclear position (Study 2), musical training showed only a weak, marginally significant effect. It
is possible that a stronger result would be achieved with a larger dataset.
Finally, when subjects had to judge simultaneously both nuclear tone type and
position musicians once again outperformed nonmusicians in both aspects of the
task. The success rate for place judgments was higher in both groups of subjects
than for tone identification (but more obvious in nonmusicians). This is not entirely
surprising, as correct identification of the tone would seem to depend on knowing
where the relevant part of the contour is located.

4.3 Musical aptitude and intonation analysis skills


In both studies, one ranking of the musical ability of our subjects could be made
according to their overall results on the musical aural tests, whereas a different
ranking (reflecting experience and degree of training) could be made according to
the highest ABRSM grade achieved. In both studies we found that relating either
of these measures to overall scores for intonation analysis yielded a strong positive
correlation. As this analysis cannot shed light on which aspects of musicianship were
most strongly associated with which intonation tasks, we conducted a more detailed
analysis.
The results of this analysis were, however, inconclusive. While in Study 1 only
the ability to judge pitch direction in pairs of tones was a significant predictor for the
judgments of nuclear tone type, in Study 2 pitch direction judgment scores turned
out to have no significant relationship to any of the intonation tasks. This may be
partly due to the fact that in Study 2 subjects were tested only on a subset of stimuli
(those where the interval between the 2 stimuli in a pair was less than 1/4 tone and
more than 1/128 tone). Musical grade was not significant in either study.
In Study 2 there were two types of intonation tasks — tone judgments and nuclear
position judgments. In all three sets of stimuli the best predictor of tone judgment
scores was scores in tonal memory 2 relating to the place of tonal change. In the case
of monosyllabic utterances tonal memory 4 (judgments of place of tonal change in a
transposed melody) was also a significant predictor. An important difference between
the sentence stimuli in each study was that the nucleus was in a fixed position in
Study 1 (always final), whereas in Study 2 it varied between three different locations.
Moreover, there was also a difference, partly connected to the above, in tail length.
While in Study 1 the maximum tail length was one syllable, in Study 2 it was at least
three syllables. This may also be partly responsible for the different results of the two
studies. For nuclear position judgments the same tonal memory tasks (TM2 place
and TM4) were the best predictors.
We had predicted that there might be some link between pitch judgments in
music tasks and nuclear tone identification on the one hand, and between judgments
of place of tonal change and nuclear position judgments on the other. However,
this has not been confirmed. A more important factor might be complexity
of task. The two tonal memory tasks that were the best indicators of intonation
analysis skills were both the tasks in which subjects either had to perform two tasks
simultaneously, or had to deal with transposition of the melody. In the former
case, locating the place of the nucleus, or the place of change in the tonal memory tasks, is a crucial prerequisite to identifying the nuclear tone type or the direction of pitch change.

4.4 Relative difficulty of intonation analysis skills


The intonation tasks we set the subjects were varied. They included simple same / dif-
ferent contour judgments, identification of specific nuclear tones and their position in
the intonation phrase. Our prediction was that the same / different contour judgments
would be the simplest task of all, and this prediction was confirmed. We predicted
that nuclear tone identification would depend to some extent on correctly locating
the nucleus. This was confirmed by the results of simultaneous judgments of nuclear
position and tone in Study 2.
In the tonal identification there were the following variables: presence / absence
of a tail, nuclear tone type on its own, and nuclear tone type in interaction with
nuclear position. In Study 1 we controlled for the presence / absence of a one-syllable
tail. There is an apparent but nonsignificant tendency for the tonal identification to be
easier when there is another syllable to carry the tonal contour. This trend, certainly
unsurprising, might be stronger in a larger dataset. Study 2 explored the different
position of the nucleus in the intonation phrase in terms of early, mid, and late. With only one exception, the nucleus in all positions was followed by a tail, and by definition, the earlier the nucleus occurred, the longer the tail. When subjects were required to
locate the nucleus but not to identify the tone, early position within the intonation
phrase appeared to be a disadvantage for both musicians and nonmusicians, while
midposition seemed to have some advantage over both early and late positions.
When subjects had to identify both nuclear position and nuclear tone in the same
task, the picture was slightly different. Simple falls and rises were more successfully
identified in earlier positions by both musicians and nonmusicians, while complex
tones did not demonstrate any consistent behavior. The difference in the results of
these two tasks might reflect the fact that subjects had to focus explicitly on pitch
direction in the second task. An earlier nucleus with simple fall or rise has a simple
predictable shape over the tail syllables, which may reinforce the identification of
the tone. Conversely, when the nucleus occurred later in the sentence, identification
of the nuclear tone may have been made more difficult by the pitch pattern over the
preceding syllables.
Analysis of the relative difficulty of identifying individual nuclear tone types
yielded the following observations. In Study 1 the classification of tones was fine-
grained, while in Study 2 it was simplified in that subjects were not required to
differentiate between high and low versions of the simple tones and complex tones
comprised Rise-Fall and Fall-Rise only.
We predicted that simple tones would be easier to perceive than complex tones
and that falling tones would be confused with other falling tones and rising tones with other rising tones. This prediction was confirmed.
In Study 1, the most successfully judged nuclear tone by both musicians and
nonmusicians was Low Fall, followed by High Rise. This may seem counterintuitive
since they involve relatively narrow pitch movement, compared with High Fall and
Low Rise. However, the low starting point of the Low Fall and correspondingly high
starting point of the High Rise may have reinforced subject’s perceptions of falling
and rising tones. Our teaching experience suggests that there is a common tendency
to equate ‘high’ with ‘rise’ and ‘low’ with ‘fall,’ which may explain this result.
The distinction between Low Fall and High Fall depends on the listeners’ judg-
ment about the relative height of the starting point of the tone, whereas the distinction
between simple Fall and Rise-Fall depends on the location of the pitch peak. High
Fall was most regularly confounded with other varieties of fall, by both musicians
and nonmusicians, though the pattern among nonmusicians is more random. The
difficulty subjects experienced with identifying Fall-Rise may be explained by the
presence of the option of the Rise-Fall-Rise, with which this tone was most frequently
confused. Even simple Fall-Rises in context are often preceded by a syllable on a
lower pitch, which may have influenced this confusion.
Study 2 was similar to Study 1 in that simple tones attracted higher scores than
complex tones. In the broad classification of tones used in this study simple rises and
falls were equally well identified. The Rise-Fall was identified marginally less well than
the Fall-Rise and tended to be confused with simple Fall. However, Fall-Rise tended
to be confused with Rise-Fall, particularly among nonmusicians. This fits in with our
classroom experience, which suggests that complex tones are relatively well identified
as such, but the details of the internal structure of the tone are less reliably perceived.
In this area musicians again performed better than nonmusicians.

4.5 Implications and future directions


Any analysis of music will involve study of its melodic and rhythmic form; these formal
characteristics are essential to its very nature. Whether or not we set out to acquire
skills of musical analysis is very much a matter of choice, and it would appear that
people vary widely in their musical analytic ability (Ayotte et al., 2002). However,
Peretz and Hyde (2003) claim that, with prolonged exposure, the ordinary listener
becomes a kind of musical expert, without realizing it, and Tillmann, Bharucha,
and Bigand (2000) suggest that musical training or explicit learning of music theory
is unnecessary to acquire sophisticated knowledge of the syntax-like relationships
among tones, chords, and keys.
In processing speech, on the other hand, our most essential task as listeners is
to recover what we believe to be the communicative meanings associated with the
utterance. Conscious attention to the particular melodic and rhythmic structures with
which the message is delivered is a secondary consideration, which may again be seen
as a matter of choice. People again seem to vary widely in their ability to isolate the
forms of intonation — in other words, to abstract away from its functional aspect. On
the other hand, the ability to produce and interpret intonation functionally seems
to be an unproblematic part of one’s linguistic ability. In normal circumstances we
have no need of phonetic training in order to exploit intonation’s communicative
potential.
It is the linguistic function of intonation, its contribution to linguistic and
paralinguistic meaning, that guides our interest in the formal analysis of intonation,
just as we are interested in the phonetics and phonology of segments and syllables
because they are the building blocks of words and utterances conveying propositional
meaning. In the same way, study of and training in formal analysis of intonation is a
proper part of linguistic endeavor. But training in the formal analysis of intonation is
also important from the practical point of view, for example in clinical applications, as
it provides a tool for describing atypical production and / or perception of intonation.
In a context where phonetic competence is important (e.g., in speech and language
therapy), skills of intonation analysis must form a part of that competence.
The present study confirms a strong relationship between skills of musical and
of intonation analysis. Previous evidence (Schön et al., 2004) showed that musical
training / aptitude increases subjects’ ability to detect incongruities in intonation. One
possible interpretation of our findings is that musical training may enhance explicit
analytical abilities in intonation. The wide range of ability found in the capacity to
analyze the forms of intonation reflects the wide range of musical analytic abilities
in the population. This would seem to have pedagogical implications. Our study
may imply that incorporating musical training into phonetic training could enhance
students’ ability to deal with intonation analysis. However, it is conceivable that our
subjects classed as ‘musicians’ outperformed ‘nonmusicians’ simply due to having
a natural aptitude for music that led them to musical training in the first place.
Furthermore, being exposed to musical training incorporates training in using one’s
ears analytically. The contribution of musical aptitude to intonation analysis may
thus be incidental, rather than causal. Pedagogical research is needed to establish
whether improved performance by nonmusicians in intonation analysis could be
more successfully achieved by either additional aural training in intonation analysis
itself, or by musical training.
Although we showed a clear relationship between musical training (and musical
aptitude) and skills in the analysis of intonation, our research has highlighted the diffi-
culty of finding particular musical tasks which tap into the skills specific to analyzing
intonation. Both skills involve the perception of melody, but musical and speech
stimuli have important differences. As we have already observed, musical melodies
in our tasks are composed of sequences of discrete notes, whereas spoken intonation
has continuously changing pitch, and, moreover, is associated with communicative
meaning. This must affect the way we listen to the two kinds of signal. Our prediction
that there would be an apparent correlation between judgments of position of pitch
change in tonal memory and judgments of nuclear position and likewise between
judgments of pitch direction in changed tone and judgments of nuclear tone type
was too simplistic. Whether this is due to the task design or to the differences in the
kind of stimuli or the type of listening is an open question.
In future research on this topic it would be worthwhile to design music and into-
nation stimuli and tasks which would be even more closely comparable. For example,
music tasks could include same / different melody judgments, as both this task and
same / different intonation contour judgment are likely to involve a similar type of
listening, namely holistic rather than analytical. Furthermore, same / different judg-
ments on pairs of melodies which are identical apart from being in a different key could
be compared with same / different judgment of intonation contours that are produced
by speakers with different pitch ranges. This would tap into the normalization skills
in perception of pitch in speech. We need to investigate further the type of listening
which is a precursor to the analysis of the nuclear positions and tones. The formal
intonation system that we teach assumes the ability to discriminate pitch in consecutive
syllables, something that we have not directly investigated in our study. The musical
ability of subjects, as reflected in training and / or musical aptitude tests, could further
be correlated with their ability to analyze, say, low-pass filtered speech, from which
segmental information, and thus meaningful linguistic content, has been removed,
forcing subjects to focus on the formal patterns themselves. Alternatively, nonsense
syllables could similarly be used to eliminate linguistic meaning. Some attempts to
make stimuli in a discrimination task more comparable between music and speech
have been made by, for example, Patel et al. (2005). In their study of discrimination
of intonation contours in tone-deaf individuals they included speech, discrete pitch
analogs and gliding-pitch analogs of the original intonation contours.
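As an indication of how the low-pass filtering suggested above might be realized, the sketch below applies a zero-phase Butterworth low-pass filter to a signal; the cutoff frequency, filter order, and the synthetic signal are arbitrary illustrative choices, not parameters taken from the paper.

```python
# Illustrative low-pass filtering of a speech-like signal to remove
# segmental detail while keeping low-frequency (intonation-related)
# structure; all parameter values here are arbitrary assumptions.
import numpy as np
from scipy.signal import butter, filtfilt

def lowpass(signal, fs, cutoff_hz=400, order=4):
    """Zero-phase Butterworth low-pass filter of a 1-D signal."""
    b, a = butter(order, cutoff_hz, btype="low", fs=fs)
    return filtfilt(b, a, signal)

fs = 16000                                      # hypothetical sampling rate (Hz)
t = np.arange(0, 1.0, 1 / fs)
speech_like = np.sin(2 * np.pi * 150 * t) + 0.3 * np.random.randn(t.size)
filtered = lowpass(speech_like, fs)
```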
The intonation tasks in our studies are based on a “British school” analysis using
contour tones. Other systems of analysis are of course widely used. We would suggest
that similar skills would be needed to perform an auditory analysis, whatever the
model; but it may be invalid to assume that all models pose equal difficulties. Whether,
for example, the auditory identification of H* (high) and L* (low) pitch accents,
together with high and low boundary tones, following a ToBI model (Silverman,
Beckman, Pitrelli, Ostendorf, Wightman, Price, Pierrehumbert, & Hirschberg, 1992)
is any easier is a matter for empirical investigation.
There are many further questions that need to be asked. For example, is there
any connection between rhythmic skills in musical aural tasks and the perception
of prominence in speech? (Some recent research suggests that such a connection
exists, e.g., Overy, Nicolson, Fawcett, & Clarke, 2003.) Do musical production skills
affect performance in intonation production tasks? Are musicians affected by task
complexity differently from nonmusicians? Do musicians and nonmusicians process
music and intonation analysis tasks differently?

In addressing many of these questions, and the last question above in particular,
it would be valuable to link perceptual investigations to studies in the area of brain
research, specifically using fMRI or PET scans. Some recent research has started
examining the issue of neuroanatomical and neurofunctional differences between
musicians and nonmusicians. Early evidence suggests that certain differences do
exist in both respects (e.g., Schön et al., 2004), and may also be affected by the age
at which musical training had started (Schellenberg, 2001). The present study has
required subjects to use their powers of analysis of both musical and spoken sequences.
Neuroscientific research has not so far, as far as we know, investigated which areas
in the brain are involved in formal auditory analysis of speech intonation contours.
The question arises whether the neural resources employed for this type of analysis
are shared with analytical tasks involving musical stimuli. In the light of the results
reported in this paper, it would be interesting to investigate whether people with and
without musical training / experience show any differences in the neural resources
used in these tasks.

manuscript received: 05. 23. 2005


manuscript accepted: 06. 09. 2006

References
ALTENMÜLLER, E. O. (2001). How many music centers are in the brain? In R. J. Zatorre & I.
Peretz (Eds.), The biological foundations of music (Annals of New York Academy of Sciences,
vol. 930) (pp. 273 – 280). New York: New York Academy of Sciences.
AYOTTE, J., PERETZ, I., & HYDE, K. (2002). Congenital amusia: A group study of adults
afflicted with a music-specific disorder. Brain, 125(2), 238 – 251.
BESSON, M., & SCHÖN, D. (2001). Comparison between language and music. In R. J. Zatorre
& I. Peretz (Eds.), The biological foundations of music (Annals of New York Academy of
Sciences, vol. 930) (pp. 232 – 258). New York: New York Academy of Sciences.
BEVER, T. G., & CHIARELLO, R. J. (1974). Cerebral dominance in musicians and nonmusicians.
Science, New Series, 185(4150), 537 – 539.
BLUMSTEIN, S., & COOPER, W. E. (1974). Hemispheric processing of intonation contours.
Cortex, 10, 146 – 158.
BOLINGER, D. (1985). Intonation and its parts: Melody in spoken English. London: Edward
Arnold.
CHEEK, J. M., & SMITH, L. R. (1999). Music training and mathematics achievement. Adolescence,
34, 759 – 761.
CRUTTENDEN, A. (1986 / 1997). Intonation. Cambridge: Cambridge University Press.
DELATTRE, P. (1966). Les dix intonations de base du français. French Review, 40, 1 – 14.
EMMOREY, K. D. (1987). The neurological substrates for prosodic aspects of speech. Brain
and Language, 30, 305 – 320.
FAULKNER, A. (1985). Pitch discrimination of harmonic complex signals: Residue pitch or
multiple component discriminations? Journal of the Acoustical Society of America, 78(6),
1993 – 2004.
FOXTON, J. M., DEAN, J. L., GEE, R., PERETZ, I., & GRIFFITHS, T. D. (2004). Characterization
of deficits in pitch perception underlying “tone deafness”. Brain, 127(4), 801 – 810.
GEIGENBERGER, A., & ZIEGLER, W. (2001). Receptive prosodic processing in aphasia.
Aphasiology, 15(12), 1169 – 1188.

HETLAND, L. (2000). Learning to make music enhances spatial reasoning. Journal of Aesthetic
Education, 34(3 – 4), 179 – 238.
HO, Y.-C., CHEUNG, M.-C., & CHAN, A. S. (2003). Music training improves verbal but not
visual memory: Cross-sectional and longitudinal explorations in children. Neuropsychology,
17, 439 – 450.
HURWITZ, I., WOLFF, P. H., BORTNICK, B. D., & KOKAS, K. (1975). Nonmusical effects
of the Kodály music curriculum in primary grade children. Journal of Learning Disabilities,
8, 167 – 174.
JONES, D. (1960). An outline of English phonetics. Cambridge: Heffer.
KAROW, C. M., MARQUARDT, T. P., & MARSHALL, R. C. (2001). Affective processing in
left and right hemisphere brain-damaged subjects with and without subcortical involve-
ment. Aphasiology, 15(8), 715 – 729.
KINGDON, R. (1958). English intonation practice. London: Longmans.
KLINGHARDT, H. (1920). Übungen im Englischen Tonfall [Exercises in English intonation].
Cöthen: Otto Schulze.
KOELSCH, S. (2005). Neural substrates of processing syntax and semantics in music. Current
Opinion in Neurobiology, 15, 1− 6.
KOELSCH, S., GUNTER, T. C., CRAMON, D. Y., ZYSSET, S., LOHMANN, G., &
FRIEDERICI, A. D. (2002). Bach speaks: A cortical “language-network” serves the
processing of music. Neuroimage, 17, 956 – 966.
LIÉGEOIS-CHAUVEL, C., GIRAUD, K., BADIER, J., MARQUIS, P., & CHAUVEL, P.
(2001). Intracerebral evoked potentials in pitch perception reveal a functional asymmetry
of the human auditory cortex. In R. J. Zatorre & I. Peretz (Eds.), The biological foundations
of music (Annals of New York Academy of Sciences, vol. 930) (pp.117 – 132). New York:
New York Academy of Sciences.
LIEGEOIS-CHAUVEL, C., PERETZ, I., BABAI, M., LAGUITTON, V., & CHAUVEL, P.
(1998). Contribution of different cortical areas in the temporal lobes to music processing.
Brain, 121, 1853 – 1867.
MACKENZIE BECK, J. (2003). Is it possible to predict students’ ability to develop skills in
practical phonetics? In Proceedings of the 15th International Congress of Phonetic Sciences,
Barcelona, 2833 – 2836.
MAESS, B., KOELSCH, S., GUNTER, T. C., & FRIEDERICI, A. D. (2001). Musical syntax is
processed in Broca’s area: A MEG study. Nature Neuroscience, 4, 540 – 545.
MARIN, O. (1982). Neurological aspects of music perception and performance. In D. Deutsch
(Ed.), The Psychology of Music. Orlando: Academic Press.
NICHOLSON, K. G., BAUM, S., KILGOUR, A., KOH, C. K., MUNHALL, K. G., & CUDDY,
L. L. (2003). Impaired processing of prosodic and musical patterns after right hemisphere
damage. Brain and Cognition, 52, 382 – 389.
O’CONNOR, J. D., & ARNOLD, G. (1961 / 1973). Intonation of colloquial English. London:
Longman.
OVERY, K. (2000). Dyslexia, temporal processing and music: The potential of music as an early
learning aid for dyslexic children. Psychology of Music, 28(2), 218 – 229.
OVERY, K., NICOLSON, R. I., FAWCETT, A. J., & CLARKE, E. F. (2003). Dyslexia and
Music: Measuring musical timing skills. Dyslexia, 9, 18 – 36.
PARSONS, L. M. (2001). Exploring the functional neuroanatomy of music performance, percep-
tion, and comprehension. In R. J. Zatorre & I. Peretz (Eds.), The biological foundations of
music (Annals of New York Academy of Sciences, vol. 930) (pp.211 – 231). New York: New
York Academy of Sciences.
PATEL, A. D. (2003). Language, music, syntax and the brain. Nature Neuroscience, 6(7), 674 – 681.
PATEL, A. D., FOXTON, J. M., & GRIFFITHS, T. D. (2005). Musically tone-deaf individ-
uals have difficulty discriminating intonation contours extracted from speech. Brain and
Cognition, 59, 310 – 313.

PATEL, A. D., GIBSON, E., RATNER, J., BESSON, M., & HOLCOMB, P. J. (1998). Processing
syntactic relations in language and music: An event-related potential study. Journal of
Cognitive Neuroscience, 10(6), 717 – 733.
PATEL, A. D., & PERETZ, I. (1997). Is music autonomous from language? A neuropsycho-
logical appraisal. In I. Deliège & J. A. Sloboda (Eds.), Perception and cognition of music
(pp. 191 – 215). London: Taylor and Francis Ltd., Psychology Press.
PATEL, A. D., PERETZ, I., TRAMO, M., & LABRECQUE, R. (1998). Processing prosodic and
musical patterns: A neuropsychological investigation. Brain and Language, 61, 123 – 144.
PERETZ, I. (1990). Processing of local and global musical information by unilateral brain-
damaged patients. Brain, 113, 1185 – 1205.
PERETZ, I. (2001). Brain specialisation for music. In R. J. Zatorre & I. Peretz (Eds.), The biological
foundations of music (Annals of New York Academy of Sciences, vol. 930) (pp.153 – 165).
New York: New York Academy of Sciences.
PERETZ, I., AYOTTE, J., ZATORRE, R., MEHLER, J., AHAD, P., PENHUNE, V., &
JUTRAS, B. (2002). Congenital amusia: A disorder of fine-grained pitch discrimination.
Neuron, 33, 185–191.
PERETZ, I., & COLTHEART, M. (2003). Modularity of music processing. Nature Neuroscience,
6(7), 688 – 691.
PERETZ, I., & HYDE, K. L. (2003). What is specific to music processing? Insights from congenital
amusia. Trends in Cognitive Sciences, 7(8), 362 – 367.
PERETZ, I., KOLINSKY, R., TRAMO, M., LABRECQUE, R., HUBLET, C., DEMEURISSE,
G., & BELLEVILLE, S. (1994). Functional dissociations following bilateral lesions of
auditory cortex. Brain, 117, 1283 – 1302.
PERKINS, J. M., BARAN, J. A., & GANDOUR, J. (1996). Hemispheric specialization in
processing intonation contours. Aphasiology, 10(4), 343 – 362.
PLATEL, H. (2002). Neuropsychology of musical perception: New perspectives. Brain, 125,
223 – 224.
PLATEL, H., PRICE, C., BARON, J. C., WISE, R., LAMBERT, J., FRACKOWIAK, R. S.,
LECHEVALIER, B., & EUSTACHE, F. (1997). The structural components of music
perception: A functional anatomical study. Brain, 120, 229 – 243.
SAMSON, S., EHRLÉ, N., & BAULAC, M. (2001). Cerebral substrates for musical temporal
processes. In R. J. Zatorre & I. Peretz (Eds.), The biological foundations of music (Annals of
New York Academy of Sciences, vol. 930) (pp.166 – 178). New York: New York Academy
of Sciences.
SCHELLENBERG, E. G. (2001). Music and non-musical abilities. In R. J. Zatorre & I. Peretz
(Eds.), The biological foundations of music (Annals of New York Academy of Sciences, vol.
930) (pp. 355 – 371). New York: New York Academy of Sciences.
SCHELLENBERG, E. G. (2004). Music lessons enhance IQ. Psychological Science, 15(8),
511 – 514.
SCHÖN, D., MAGNE, C., & BESSON, M. (2004). The music of speech: Music training facilitates
pitch processing in both music and language. Psychophysiology, 41, 341 – 349.
SEASHORE, C. E., LEWIS, D., & SAETVEIT, J. G. (1960). Seashore measures of musical
talents — manual. New York: The Psychological Corporation.
SERGENT, J. (1993). Mapping the musician brain. Human Brain Mapping, 1, 20 – 38.
SILVERMAN, K., BECKMAN, M. E., PITRELLI, J., OSTENDORF, M., WIGHTMAN, C.,
PRICE, P., PIERREHUMBERT, J., & HIRSCHBERG, J. (1992). ToBI: A standard for
labeling English prosody. In Proceedings of the Second International Conference on Spoken
Language Processing, Banff, Canada, 2, 867 – 870.
STOCK, D., & ROSEN, S. (1986). Frequency discrimination and resolution at low frequencies in
normal and hearing-impaired listeners. Speech, Hearing and Language: Work in Progress,
Phonetics, and Linguistics, University College London, 2, 193 – 222.

THOMPSON, W. F., SCHELLENBERG, E. G., & HUSAIN, G. (2004). Decoding speech prosody:
Do music lessons help? Emotion, 4(1), 46 – 64.
TILLMANN, B., BHARUCHA, J. J., & BIGAND, E. (2000). Implicit learning of tonality: A
self-organizing approach. Psychological Review, 107, 885 – 913.
WEINTRAUB, S., MESULAM, M.-M., & KRAMER, L. (1981). Disturbances in prosody: A
right-hemisphere contribution. Archives of Neurology, 38, 742 – 744.
ZATORRE, R. J. (2001). Neural specializations for tonal processing. In R. J. Zatorre & I. Peretz
(Eds.), The biological foundations of music (Annals of New York Academy of Sciences, vol.
930) (pp. 193 – 210). New York: New York Academy of Sciences.
ZATORRE, R. J., EVANS, A. C., & MEYER, E. (1994). Neural mechanisms underlying melodic
perception and memory for pitch. The Journal of Neuroscience, 14(4), 1908 – 1919.
ZATORRE, R. J., EVANS, A. C., MEYER, E., & GJEDDE, A. (1992). Lateralisation of
phonetic and pitch discrimination in speech processing. Science, 256, 846 – 849.


Appendix 1
Summary of tasks and stimuli in Study 1 and Study 2

                                              STUDY 1                     STUDY 2
                                              14 musicians and            15 musicians and
                                              10 non-musicians            15 non-musicians

MUSICAL TASKS

Pitch direction judgments
on pairs of stimuli                           ✓                           ✓

Tonal memory
• Which tone has changed?                     -                           ✓
                                                                          (4 & 5-tone melodies,
                                                                          3 & 4-tone melodies,
                                                                          with transposition)

• Which two tones have changed?               -                           ✓
                                                                          (5-tone melodies)

• Which tone has changed and                  ✓                           ✓
  in what direction?                          (3, 4 & 5-tone melodies)    (5-tone melodies)

INTONATION ANALYSIS TASKS

Auditory discrimination (sentences)           ✓                           ✓

Nuclear position judgments (sentences)        -                           ✓

Nuclear tone identification
• monosyllables                               -                           ✓
• short phrases                               -                           ✓
• sentences                                   ✓                           -

Simultaneous nuclear position
judgments and tone identification
(sentences)                                   -                           ✓

Appendix 2
Correlation between highest grade achieved and overall intonation scores: Scatter
plot with a line of best fit (Study 1)

[Scatter plot: overall intonation scores (y-axis, 30 – 100) plotted against highest grade achieved (x-axis, 0 – 10), with a line of best fit through the data points.]
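
The relationship summarized in this plot (and in the corresponding Study 2 plot in Appendix 4) is a simple bivariate one: highest music grade achieved against overall intonation score. As an illustration only, the short sketch below computes a Pearson correlation coefficient and a least-squares line of best fit; the (grade, score) pairs are hypothetical and this is not the authors' analysis script.

from math import sqrt

# Hypothetical (highest grade, overall intonation score) pairs -- not the study's data.
pairs = [(0, 45), (2, 55), (3, 60), (5, 72), (6, 70), (8, 88), (9, 90)]

xs = [g for g, _ in pairs]
ys = [s for _, s in pairs]
n = len(pairs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n

sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)  # sum of cross-deviations
sxx = sum((x - mean_x) ** 2 for x in xs)                  # sum of squared x-deviations
syy = sum((y - mean_y) ** 2 for y in ys)                  # sum of squared y-deviations

r = sxy / sqrt(sxx * syy)          # Pearson correlation coefficient
slope = sxy / sxx                  # least-squares slope of the line of best fit
intercept = mean_y - slope * mean_x

print(f"r = {r:.2f}")
print(f"best-fit line: score = {slope:.2f} * grade + {intercept:.2f}")

Run on real data, the slope and intercept define the line drawn through the scatter, and r quantifies how tightly the points cluster around that line.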

Appendix 3
Confusion matrices for perceptual judgments of nuclear tone type in Study 1.
(The “hits,” i.e., correct perceptual judgments, are the cells in which the perceived tone matches the target tone)

Target tones [%]

Perceived tones    HF    LF    HR    LR    RF    FR    RFR
HF                 36     1     3     2    11     6      5
LF                 31    86     1     7     8     4      5
HR                  2     0    72     2     3     1      7
LR                  7     5    15    59     8     6      2
RF                 19     8     3     3    21    16      8
FR                  1     0     3    21     7    18      7
RFR                 1     1     1     3    18    30     43
FRF                36     1     3     2    11     6      5

Musicians
Target tones [%]

Perceived tones    HF    LF    HR    LR    RF    FR    RFR
HF                 19    15     8     8     8     7      3
LF                 25    54     0    26     8     6      4
HR                  8     5    45     3    10     4      9
LR                 13     8    24    22    12    10      8
RF                 18    11     2     8    15    16     11
FR                  6     1    13    16     8     9      8
RFR                 0     0     3     7    14    24     22
FRF                 4     0     0     6    20    18     29

Non-musicians

Appendix 4
Correlation between highest grade achieved and overall intonation scores:
Scatter plot with a line of best fit (Study 2)

[Scatter plot: overall intonation scores (y-axis, 60 – 240) plotted against highest grade achieved (x-axis, 0 – 9), with a line of best fit through the data points.]

Appendix 5
Confusion matrices for perceptual judgments of nuclear tone type in monosyllabic
utterances in Study 2. (The “hits,” i.e., correct perceptual judgments, are the cells
in which the perceived tone matches the target tone)

Target tones [%]

Perceived tones    Rise    Fall    Fall-rise    Rise-fall
Rise                 89       8            6            8
Fall                  4      84           12           12
Fall-rise             0       0           67           10
Rise-fall             0       1            9           63

Musicians

Target tones [%]

Perceived tones    Rise    Fall    Fall-rise    Rise-fall
Rise                 70      19            8           17
Fall                 21      80            9           21
Fall-rise            11       3           57           23
Rise-fall             4       4           33           46

Non-musicians
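
To make explicit how the “hits” in these matrices are read, the minimal sketch below (illustrative code, not part of the original study) extracts the diagonal cells of the musicians' monosyllable matrix above, i.e., the cells in which the perceived tone matches the target tone.

# Musicians' monosyllable confusion matrix from Appendix 5:
# rows = perceived tone, columns = target tone, values in %.
tones = ["Rise", "Fall", "Fall-rise", "Rise-fall"]
confusions = [
    [89,  8,  6,  8],
    [ 4, 84, 12, 12],
    [ 0,  0, 67, 10],
    [ 0,  1,  9, 63],
]

for i, tone in enumerate(tones):
    hit = confusions[i][i]  # diagonal cell: perceived tone == target tone
    print(f"{tone}: {hit}% of targets correctly identified")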

Appendix 6
Confusion matrices for perceptual judgments of nuclear tone type in short utterances
in Study 2. (The “hits,” i.e., correct perceptual judgments, are the cells in which the
perceived tone matches the target tone)

Target tones [%]

Perceived tones    Rise    Fall    Fall-rise    Rise-fall
Rise                 95       0            4            1
Fall                  2      98            6           40
Fall-rise             1       2           83            1
Rise-fall             1       0            7           57

Musicians

Target tones [%]

Perceived tones    Rise    Fall    Fall-rise    Rise-fall
Rise                 79       7           10           11
Fall                  6      79           12           35
Fall-rise             8       6           56           15
Rise-fall             6       7           22           39

Non-musicians
