Neural Correlates of Timbre Change in Harmonic Sounds
doi:10.1006/nimg.2002.1295
This study is of particular interest because psychophysical and cognitive studies of timbre have shown transients to be an important component in timbral identification and recognition (Berger, 1964). Participants in this study discriminated pure tone stimuli incorporating frequency glides of either long (100 ms) or short (30 ms) duration. Comparison of these two conditions revealed significant activation foci in the left orbitofrontal cortex, left fusiform gyrus, and right cerebellum. The left hemisphere activations are consistent with previously cited work implicating left hemisphere structures for temporal processing (Efron, 1963; Robin et al., 1990).

Based therefore on psychophysical models of timbre perception, which point to an important role for both spectral and temporal parameters of sound, we hypothesized a critical role for the left and right temporal lobes in tasks involving timbral processing, as well as those left hemisphere regions previously implicated by Johnsrude et al.'s (1997) study of transient perception. In order to maximize the difference along acoustic dimensions known to affect timbre perception, we selected two types of sounds that differed maximally in terms of attack time, spectral centroid, and spectral flux, as found by McAdams et al. (1995). The two distinct timbre stimuli (A and B) created for the present experiment differed significantly along these three dimensions. Their approximate positions with respect to the original space are shown in Fig. 1. Our experimental design was as follows. We compared brain activation in response to melodies composed of tones with a fast attack, low spectral centroid, and no spectral flux (Timbre A) to that resulting from melodies formed of tones with a slow attack, higher spectral centroid, and greater spectral flux (Timbre B). The theoretical interest in these sounds is that they include spectral, temporal, and spectrotemporal parameters that have been shown to be correlated with perceptual dimensions of timbre. A case in which timbres changed across the entire perceptual space needed to be tested first in order to demonstrate that perceptually salient stimuli reveal differences in activation as measured with fMRI. The critical question is thus: what are the characteristics of brain responses associated with differences in melodic stimuli created with precisely controlled synthetic sounds having different timbres?

MATERIALS AND METHODS

Subjects

Ten healthy adult subjects (4 males and 6 females; ages 21–45, mean 33) participated in the study after giving written informed consent, and all protocols were approved by the human subjects committee at Stanford University School of Medicine. All participants were volunteers and were treated in accordance with the APA "Ethical Principles of Psychologists and Code of Conduct."

Stimuli

All sounds were created digitally using a 22,050-Hz sampling rate and 16-bit resolution, although stimulus presentation in the MRI scanner used only the 8 most significant bits. Each sound had an amplitude envelope defined in terms of linear attack time (AT), sustain time (ST), and exponential decay time (DT). These three values were, respectively, 15, 60, and 198 ms for Timbre A and 80, 0, and 117 ms for Timbre B (Fig. 2). The main interest was in attack time, and the other two values were adjusted to achieve similar perceived durations across the two timbres. The sounds were composed of harmonic spectra. The levels of the harmonics decreased as a power function of harmonic rank (A_n = n^(−p)). The exponent p of this function was used to establish the base spectral centroid (SCbase), which is defined as the amplitude-weighted mean harmonic rank. For a pure tone SCbase = 1, whereas for a tone composed of the first three harmonics with amplitudes equal to the inverse of the harmonic rank, SCbase = (1·1 + 2·1/2 + 3·1/3)/(1 + 1/2 + 1/3) = 1.63. Expressed in this way, the relative amplitudes among harmonics remain constant with variations in fundamental frequency. Spectral flux (ΔSC, expressed as a change in amplitude-weighted mean harmonic rank) was defined as a variation over time of SC with a rise time SCRT and a fall time SCFT. Between time 0 and SCRT, SC rose from SCbase to SCbase + ΔSC, following a raised cosine trajectory. From time SCRT to SCRT + SCFT, SC decreased back to SCbase, following a slower raised cosine trajectory. After time SCRT + SCFT, SC remained constant at SCbase. Timbre A was characterized by a fast attack (AT = 15 ms), a low base spectral centroid (SCbase = 1.05), and no spectral flux (ΔSC = 0). Timbre B was characterized by a slower attack (AT = 80 ms), a higher base spectral centroid (SCbase = 1.59), and a higher degree of spectral flux (ΔSC = 4.32, SCRT = 49 ms, SCFT = 102 ms). For Timbre B, the instantaneous SC therefore varied from 1.59 to 5.91 between 0 and 49 ms and from 5.91 back to 1.59 between 49 and 151 ms, remaining at 1.59 thereafter (Fig. 2). Prior to the experimental session, the two timbre stimuli were matched psychophysically for loudness by four raters, and this necessitated attenuating the sounds with Timbre B by 8.5 dB compared to those with Timbre A.

Stimuli consisted of 60 six-note melodies played with these timbres. The range of fundamental frequencies was 288.2 Hz (C♯4) to 493.9 Hz (B4). One of these melodies, realized with each of the timbres, is shown in a spectrographic representation in Fig. 2. Ten melodies with a given timbre constituted a block. All melodies were tonal and ended on the tonic note. The pitch range was confined to an octave but varied from five to nine semitones across melodies: 21 melodies had a range of five semitones (ST), 21 with seven ST, and 18 with nine ST. Each block had 3 or 4 five-ST melodies, 3 or 4 seven-ST melodies, and 3 nine-ST melodies. Thirty melodies had one contour change, and 30 had two changes (five of each per block). There were 36 major-mode and 24 minor-mode melodies (6 major/4 minor per group). Key changed from melody to melody in order to keep the mean pitch constant across melodies. An attempt was also made to keep the pitch distribution relatively constant across blocks. The time interval between onsets of successive tones within a melody was 300 ms, making the total duration of each melody approximately 1.8 s for Timbre A melodies and 1.7 s for Timbre B melodies. A new melody occurred every 2.4 s and a block of 10 melodies thus lasted 24 s.
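The centroid definition and flux trajectory above can be made concrete with a short numerical sketch. This is an illustrative reimplementation, not the authors' synthesis code; the exact raised-cosine phase convention is an assumption chosen to match the stated endpoints (SC rises from SCbase to SCbase + ΔSC over SCRT, then falls back over SCFT):

```python
import math

def spectral_centroid(amps):
    """Amplitude-weighted mean harmonic rank: SC = sum(n * A_n) / sum(A_n)."""
    return sum(n * a for n, a in enumerate(amps, start=1)) / sum(amps)

def sc_trajectory(t_ms, sc_base, delta_sc, sc_rt, sc_ft):
    """Instantaneous spectral centroid under the raised-cosine flux profile."""
    if t_ms <= 0 or t_ms >= sc_rt + sc_ft:
        return sc_base
    if t_ms < sc_rt:  # rise: SCbase -> SCbase + dSC over SCRT
        return sc_base + delta_sc * 0.5 * (1 - math.cos(math.pi * t_ms / sc_rt))
    # fall: SCbase + dSC -> SCbase over the slower SCFT
    return sc_base + delta_sc * 0.5 * (1 + math.cos(math.pi * (t_ms - sc_rt) / sc_ft))

# The paper's worked example: first three harmonics with amplitudes 1/n
print(spectral_centroid([1, 1/2, 1/3]))  # 18/11 = 1.636..., truncated to 1.63 in the text

# Timbre B: SC should peak at SCbase + dSC = 1.59 + 4.32 = 5.91 at t = SCRT = 49 ms
print(round(sc_trajectory(49, 1.59, 4.32, 49, 102), 2))
```

With Timbre B's parameters, `sc_trajectory` returns to SCbase = 1.59 at 49 + 102 = 151 ms, consistent with the values given in the text.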
FIG. 2. Acoustic features of Timbre A (top) and Timbre B (bottom). (Left panel) Spectrographic representation of one of the stimulus
melodies. The level is indicated by the darkness of the gray scale. The range represented by the gray scale is approximately 90 dB. (Top right
panel) The amplitude envelope of a single tone, representing the rms amplitude as a function of time. (Bottom right panel) The spectral flux
function for a single tone, representing the variation of the spectral centroid (SC) over time.
fMRI Tasks

Each subject performed two runs of the same experiment in the scanner. Each experiment consisted of 12 alternating 24-s epochs (epochs of 10 melodies with a given timbre), for a total of 120 melodies per scan. There were 6 Timbre A epochs interleaved with 6 Timbre B epochs. Each of the six blocks of ten melodies was presented for each timbre, but the order of blocks and the order of melodies within blocks were randomized for each subject. Subjects were asked to engage their attention on the melodies. For each six-note melody they were asked to press a response key as soon as the melody was over. In one run, the first epoch was played with Timbre A, and in the second run, the first epoch was played with Timbre B. […] the scan and practiced the experiment prior to entering the scanner.

Behavioral Data Analysis

The aim of the behavioral data collection was to verify that listeners were paying attention to the stimuli to the same extent for A and B epochs. Reaction time (RT) and number of correct and incorrect responses for A and B epochs were collapsed across the two runs. A response was considered correct if the subject pressed the appropriate response key after the end of the musical segment, but before the onset of the next stimulus. The percentages of correct responses and RTs to correct trials were compared using two-tailed paired t tests. The significance threshold was set at 0.05.
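The paired comparison described above can be written in a few lines. This is a generic two-tailed paired t statistic (df = n − 1), shown only to make the analysis concrete; it is not the study's actual analysis code, and the sample values are hypothetical:

```python
import math

def paired_t(x, y):
    """Paired t statistic: mean of the differences over its standard error (df = n - 1)."""
    d = [a - b for a, b in zip(x, y)]
    n = len(d)
    mean_d = sum(d) / n
    var_d = sum((v - mean_d) ** 2 for v in d) / (n - 1)  # unbiased sample variance
    return mean_d / math.sqrt(var_d / n)

# Hypothetical per-subject mean RTs (ms) for A and B epochs, for illustration only
rt_a = [1850, 1900, 1925, 1880, 1960]
rt_b = [1870, 1915, 1930, 1905, 1975]
print(round(paired_t(rt_a, rt_b), 2))
```

The resulting t value would be compared against a two-tailed critical value at P = 0.05 with n − 1 degrees of freedom.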
[…]built head holder was used to prevent head movement; 25 axial slices (5 mm thick, 0 mm skip) parallel to the AC/PC line and covering the whole brain were imaged with a temporal resolution of 1.6 s using a T2*-weighted gradient echo spiral pulse sequence (TR, 1600 ms; TE, 30 ms; flip angle, 70°; 180 time frames; and one interleave) (Glover and Lai, 1998). The field of view was 220 × 220 mm², and the matrix size was 64 × 64, providing an in-plane spatial resolution of 3.44 mm. To reduce blurring and signal loss arising from field inhomogeneities, an automated high-order shimming method based on spiral acquisitions was used before acquiring functional MRI scans (Kim et al., 2000). Images were reconstructed, by gridding interpolation and inverse Fourier transform, for each of the 180 time points into 64 × 64 × 25 image matrices (voxel size, 3.44 × 3.44 × 5 mm). A linear shim correction was applied separately for each slice during reconstruction using a magnetic field map acquired automatically by the pulse sequence at the beginning of the scan (Glover and Lai, 1998).

To aid in localization of functional data, a high-resolution T1-weighted spoiled grass gradient recalled (SPGR) inversion recovery 3D MRI sequence was used with the following parameters: TI, 300 ms; TR, 8 ms; TE, 3.6 ms; flip angle, 15°; 22-cm field of view; 124 slices in the sagittal plane; 256 × 192 matrix; two averages; acquired resolution, 1.5 × 0.9 × 1.1 mm. The images were reconstructed as a 124 × 256 × 256 matrix with a 1.5 × 0.9 × 0.9 mm spatial resolution. Structural and functional images were acquired in the same scan session.

Stimulus Presentation

The task was programmed using PsyScope (Cohen et al., 1993) on a Macintosh (Cupertino, CA) computer. Initiation of scan and task was synchronized using a TTL pulse delivered to the scanner timing microprocessor board from a "CMU Button Box" microprocessor (http://poppy.psy.cmu.edu/psyscope) connected to the Macintosh. Auditory stimuli were presented binaurally using a custom-built magnet-compatible system. This pneumatic delivery system was constructed using a piezoelectric loudspeaker attached to a cone-shaped funnel, which in turn was connected to flexible plastic tubing leading to the participant's ears. This delivery system filters the sound with an overall low-pass characteristic and a spectral dip in the region of 1.5 kHz (Fig. 3). The tubing passed through foam earplug inserts that attenuated external sound by approximately 28 dB. The sound pressure level at the head of the participant due to the fMRI equipment during scanning was approximately 98 dB (A), and so after the attenuation provided by the ear inserts, background noise was approximately 70 dB (A) at the ears of the listener. The experimenters set the stimuli at a comfortable listening level determined individually by each participant during a test scan.

Image Preprocessing

fMRI data were preprocessed using SPM99 (http://www.fil.ion.ucl.ac.uk/spm). Images were corrected for movement using least squares minimization without higher order corrections for spin history (Friston et al., 1996) and were normalized to stereotaxic Talairach coordinates using nonlinear transformations (Ashburner and Friston, 1999; Friston et al., 1996). Images were then resampled every 2 mm using sinc interpolation and smoothed with a 4-mm Gaussian kernel to reduce spatial noise.

Statistical Analysis

Statistical analysis was performed on individual and group data using the general linear model and the theory of Gaussian random fields as implemented in SPM99. This method takes advantage of multivariate regression analysis and corrects for temporal and spatial autocorrelations in the fMRI data (Friston et al., 1995). Activation foci were superimposed on high-resolution T1-weighted images and their locations were interpreted using known neuroanatomical landmarks (Duvernoy et al., 1999). MNI coordinates were transformed to Talairach coordinates using a nonlinear transformation (Brett, 2000).

A within-subjects procedure was used to model all the effects of interest for each subject. Individual subject models were identical across subjects (i.e., a balanced design was used). Confounding effects of fluctuations in the global mean were removed by proportional scaling where, for each time point, each voxel was scaled by the global mean at that time point. Low-frequency noise was removed with a high-pass filter (0.5 cycles/min) applied to the fMRI time series at each voxel. A temporal smoothing function (Gaussian kernel corresponding to a half-width of 4 s) was applied to the fMRI time series to enhance the temporal signal-to-noise ratio. Regressors were modeled with a boxcar function corresponding to the epochs during which each timbre was presented and convolved with a hemodynamic response function. We then defined the effects of interest for each subject with the relevant contrasts of the parameter estimates. Group analysis was performed using a random-effects model that incorporated a two-stage hierarchical procedure. This model estimates the error variance for each condition of interest across subjects, rather than across scans, and therefore provides a stronger generalization to the population from which data are acquired (Holmes and Friston, 1998). In the first stage, contrast images for each subject and each effect of interest were generated as described above. In the second stage, these contrast images were analyzed using a general linear model to determine voxelwise t statistics. One contrast image was generated per subject, for each effect of interest. A one-way, two-tailed t test was then used to determine group activation for each effect. Finally, the t statistics were normalized to Z scores, and significant clusters of activation were determined using the joint expected probability distribution of height and extent of Z scores (Poline et al., 1997), with height (Z > 1.67; P < 0.05) and extent (P < 0.05) thresholds.

Contrast images were calculated using a within-subjects design for the following comparisons: (1) Timbre B–Timbre A; (2) Timbre A–Timbre B.
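The design described above (12 alternating 24-s epochs sampled at TR = 1.6 s, i.e., 180 frames per run) implies a boxcar regressor with 15 frames per epoch. A minimal sketch of that regressor before HRF convolution follows; the A-first epoch ordering and the absence of an initial rest period are assumptions made for illustration:

```python
TR = 1.6            # s, from the acquisition parameters
EPOCH_S = 24.0      # s per timbre epoch
N_FRAMES = 180      # time frames per run (180 * 1.6 s = 288 s = 12 * 24 s)

frames_per_epoch = round(EPOCH_S / TR)  # 24 / 1.6 = 15 frames

# 0 during Timbre A epochs, 1 during Timbre B epochs (A-first run assumed)
boxcar = [(t // frames_per_epoch) % 2 for t in range(N_FRAMES)]

print(frames_per_epoch, len(boxcar), sum(boxcar))  # 15 frames/epoch, 180 frames, 90 "B" frames
```

Convolving this vector with a hemodynamic response function and entering it into the general linear model yields the timbre regressor whose parameter estimates are contrasted.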
FIG. 3. Fourier spectra (red) for Timbres A and B (left and right panels, respectively). The spectra were computed on 2048-sample (46
ms) segments with a Hanning window in the middle of the second tone from melody 1 (Fig. 2). The top panels show the spectra at the output
of the computer’s converters. The bottom panels show the sound after filtering by the pneumatic delivery system. Note an overall low-pass
characteristic plus a dip in the spectrum in the region of 1500 Hz. The filtering affects Timbre B sounds more than Timbre A sounds, but the
two are still quite distinct spectrally. Also superimposed on the bottom panels is the estimated spectrum of the scanner noise (black). This
noise could not be recorded under the same conditions because of the scanner's magnetic field; its relative level is roughly estimated to
correspond to that present in the headphone system. It does not take into account filtering by the headphones, which would be quite variable
in the low-frequency region depending on the fit of the headphones to the head of each listener, nor the fact that the stimulus level was
adjusted individually for each listener.
FIG. 4. Surface rendering of brain activation in response to Timbre B stimuli compared to Timbre A stimuli. Significant clusters of
activation were observed in the central sections of the STG and STS. Note that left hemisphere activations are posterior to right hemisphere
activations. No brain regions showed greater activation in response to Timbre A stimuli, compared to those with Timbre B.
RESULTS

Behavioral

End-of-melody detections did not differ in accuracy between A and B blocks (t(8) = −0.900; P = 0.40), nor did RT (t(8) = −1.876; P = 0.10). Accuracy rates were 99.5 ± 2.8% for Timbre A and 99.8 ± 1.0% for Timbre B. RTs were 1914 ± 59.5 and 1931 ± 76.0 ms for Timbres A and B, respectively. We conclude that there were no behavioral differences (and thus no confounding attentional or arousal differences) between the two sets of stimulus blocks containing the two timbres.
1748 MENON ET AL.
TABLE 1
Brain Regions That Showed Significant Activation Differences between the Two Timbre Stimuli

Contrast             Region                         P        Voxels   Max Z   Peak (x, y, z)
Timbre B–Timbre A    Right STG/STS (BA 41/22/21)    <0.001   297      3.31    50, −8, −4
                     Left STG/STS (BA 41/22/21)     <0.002   285      2.55    −36, −24, 8
Timbre A–Timbre B    No significant clusters        —        —        —       —

Note. For each significant cluster, the region of activation, significance level, number of activated voxels, maximum Z score, and location of
the peak in Talairach coordinates are shown. Each cluster was significant after height and extent correction (P < 0.05). STG, superior temporal
gyrus; STS, superior temporal sulcus; BA, Brodmann area.
[…] agreement with those of lesion studies, which have suggested involvement of the temporal lobe in timbre processing (Milner, 1962; Samson and Zatorre, 1994). Lesion studies from the Montreal Neurological Institute have reported that right temporal lobe lesions result in greater impairment in timbre discrimination than left temporal lobe lesions (Milner, 1962; Samson and Zatorre, 1994). Nevertheless, a closer look at the published data also suggests that left temporal lesions do impair timbre discrimination. These studies have not, however, provided any details of the precise temporal lobe regions involved in timbre perception and have instead focused on examining the relative contributions of the entire left and right temporal lobes to timbre processing. The present study clearly indicates that focal regions of the left and right temporal lobes are involved in timbre processing.

Equal levels of activation were detected in the left and right temporal lobes. To our knowledge, lateralization of timbre-related activation has never been directly tested before. Within the temporal lobes, activation was localized to the primary auditory cortex (PAC) as well as the auditory association areas in the belt areas of the STG and in the STS. Anteriorly, the activation foci did not extend beyond Heschl's gyrus (HG) (Howard et al., 2000; Kaas and Hackett, 2000; Tian et al., 2001). The most anterior activation point was at y = −4 mm (Talairach coordinates) in the right temporal lobe. The bilateral activation observed in the present study is in agreement with human electrophysiological studies that have found bilateral N100 sources during timbre-specific processing (Crummer et al., 1994; Jones et al., 1998; Pantev et al., 2001).

Interestingly, activation for Timbre B was greater than that for Timbre A, despite the matching of the two stimuli for loudness. One plausible explanation is that Timbre B has a greater spectral spread (or perceivable spectral bandwidth) than Timbre A and would thus have a more extensive tonotopic representation, at least in the primary and secondary auditory cortices (Howard et al., 2000; Wessinger et al., 2001). While the two timbres have the same number of harmonics and thus ideally would have the same bandwidths, it is likely that the higher harmonics of Timbre A are below the auditory threshold and/or masked by the scanner noise, thus creating a perceptible bandwidth difference. Bandwidth differences were also present in the Wessinger et al. (2000) and Hall et al. (2001) studies. However, in all three cases, other factors were also varied (the relative periodicity of the signals in Wessinger et al., the pitch in Hall et al., and attack time and spectral flux in the present study), so the true contribution of bandwidth per se cannot be unambiguously determined. Another possibility is that the greater spectral flux in Timbre B may result in greater cortical activation, particularly in the STS, as this region appears to be sensitive to temporal changes in spectral distribution. Finally, the relative salience of the stimuli may also be important since Timbre B is clearly more salient perceptually than A, but it is not easy to measure "salience" per se. At any rate, it may not be possible to independently control salience and timbre, since some timbral attributes, such as brightness and sharp attack, generally seem to make sounds "stand out."

Our results also bear upon the cortical representation of loudness and its examination using fMRI, since the fMRI activation patterns of two loudness-matched stimuli are so unbalanced. Studies of the effect of sound level on the extent and level of temporal lobe activation have shown complex relationships based on the nature of the pure/harmonic-complex tones used (Hall et al., 2001) as well as the precise subregion and laterality of the auditory cortical activation examined (Brechmann et al., 2002). For example, sound level had a small, but significant, effect on the extent of auditory cortical activation for the pure tone, but not for the harmonic-complex tone, while it had a significant effect on the response magnitude for both types of stimuli (Hall et al., 2001). Furthermore, the relationships among sound level, loudness perception, and intensity coding in the auditory system are complex (Moore, 1995). In any case, our results discount simple models for loudness that are based, implicitly or explicitly, on the degree of activation of neural populations.

Although there were no overall hemispheric differences in extent of activation, our results clearly point to differences in the specific temporal lobe regions activated in the left and right hemispheres. ANOVA revealed that distinct subregions of the STG/STS were activated in each hemisphere: activation in the left hemisphere was significantly posterior to right hemisphere activation, with very few voxels showing activation in the anterior regions (y > −22) in the left hemisphere. Conversely, very few voxels showed activation in the posterior regions (y < −22) in the right hemisphere. These results point to a hemispheric asymmetry in neural response to timbre; indeed, the location of peak activation in left auditory cortex was 16 mm posterior to that of peak activation in right auditory cortex. This shift is 8–11 mm more posterior than can be accounted for by morphological differences in the locations of the left and right auditory areas (Rademacher et al., 2001). Based on a detailed cytoarchitectonic study of postmortem brains, Rademacher et al. have reported a statistically significant asymmetry in the location of the PAC in the left and right hemispheres. The right PAC is shifted anteriorly by 7 or 8 mm and laterally by 4 or 5 mm, with respect to the left PAC. An almost identical pattern of functional asymmetry was observed in the present study: the right hemisphere activation peak was shifted more anteriorly and laterally than the activation peak in the left hemisphere. While the centroid of the right PAC is roughly shifted by ΔX = 4, ΔY = 8, and ΔZ = 1 mm, the peak fMRI activations in the present study are shifted by ΔX = 14, ΔY = 16, and ΔZ = −12 mm in the X, Y, and Z directions, respectively. Thus, even after accounting for the anatomical asymmetry in location of the PAC, left hemisphere activation is still considerably posterior to right hemisphere activation, thus suggesting a functional asymmetry in the left and right temporal lobe contributions to timbre processing. Furthermore, these hemispheric differences do not appear to arise from volumetric differences, since careful measurements of the highly convoluted auditory cortex surface have revealed no size difference between the left and right PAC or HG (Rademacher et al., 2001). Moreover, the functional asymmetry observed during timbre processing extends beyond the dorsal STG to the STS and the adjoining circular insular sulcus as well.
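The shifts quoted above can be reproduced directly from the Table 1 peaks. This check treats the lateral (X) shift as a comparison of absolute distances from the midline, which is an assumption about the convention used; the anterior-posterior and inferior-superior shifts are plain coordinate differences:

```python
right_peak = (50, -8, -4)    # Talairach peak, right STG/STS (Table 1)
left_peak = (-36, -24, 8)    # Talairach peak, left STG/STS (Table 1)

dx = abs(right_peak[0]) - abs(left_peak[0])  # lateral shift from the midline
dy = right_peak[1] - left_peak[1]            # positive: right peak more anterior
dz = right_peak[2] - left_peak[2]            # negative: right peak more inferior

print(dx, dy, dz)  # 14 16 -12, matching the reported dX, dY, dZ
```

The 16-mm value of `dy` is the interhemispheric anterior-posterior peak separation discussed in the text.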
FIG. 5. (a) Coronal and (b) axial sections showing brain activation in response to Timbre B stimuli, compared to those with Timbre A.
Activation is shown superimposed on group-averaged, spatially normalized, T1-weighted images. In each hemisphere, two distinct subclus-
ters of activation could be clearly identified, one in the dorsomedial STG and the other localized to the STS. The activations also extend into
the circular insular sulcus and insula. See text for details.
In each hemisphere, two distinct subclusters of activation could be clearly identified, one in the STG and the other localized to the STS. The anteriormost aspects of the dorsomedial STG activations in both hemispheres were localized to a stripe-like pattern of activation in area T1b, which follows the rostromedial slope of HG (Brechmann et al., 2002). In the left hemisphere, the activation extends posteriorly to area T2, which is centered on Heschl's sulcus. In both hemispheres, the activations appear to be anterior to area T3, which principally overlaps the main body of the planum temporale. As defined by Brechmann et al. (2002), area T1b borders the circular sulcus medially, the first transverse sulcus anteriorly, and the roof of HG caudolaterally. The caudally adjacent T2 was centered on Heschl's sulcus, thus including the posterior rim of HG and a similar area behind the sulcus on the anterior planum temporale/second Heschl's gyrus. There is very little activation in area T1a, the division of the HG rostral to the first transverse sulcus. Areas T1a and T1b overlap to a great extent the PAC probability maps and Talairach coordinate ranges reported by Rademacher et al. (2001). Based on the landmarks identified by Rademacher et al. and Brechmann et al., it therefore appears that in the
left hemisphere, the dorsal STG activation extends posteriorly beyond the PAC to the posterior HG/anterior tip of the planum temporale (BA 42/22) in the belt areas of the auditory association cortex. In the right STG, activation was primarily observed in anterior and posterior HG, including the PAC (BA 41). No activation was observed in the auditory association cortex A2 covering the planum polare (BA 52) anterior to the primary auditory cortex in either the left or right STG.

Although processing of nonlinguistic, nonvocal auditory stimuli is generally thought to take place in the STG, we also found significant activation in the STS and adjoining circular insular sulcus and insula in both hemispheres. The STS regions activated in the present study have direct projections to the STG (Pandya, 1995) and are considered part of the hierarchical organization of the auditory association cortex (Kaas and Hackett, 2000). These STS zones receive input primarily from primary and secondary auditory cortices in the STG, and the upper bank receives input exclusively from the STG (Seltzer and Pandya, 1978). Note that these STS regions are anterior to, and functionally distinct from, the visual association areas of the STS that are involved in visual motion and gaze perception. This central STS region is not generally known to be activated by pure tones or even by frequency modulation (FM) of tones, in contrast to the T1b and T2 regions of the auditory cortex (Brechmann et al., 2002). Relatively little is known about the auditory functions of the STS, but Romanski et al. (1999) have proposed the existence of a ventral temporal lobe stream that plays a role in identifying coherent auditory objects, analogous to the ventral visual pathway involved in object recognition. Interestingly, Belin et al. (2000) have shown that similar central STS auditory areas are activated when subjects listen passively to vocal sounds, whether speech or nonspeech. The present study suggests that these STS regions can be activated even when stimuli do not have any vocal characteristics. A common feature of the vocal sounds used in the Belin et al. study and the Timbre B stimuli used in the present study is that both involve rapid temporal changes in spectral energy distribution (spectral flux). We suggest that timbre, which plays an important role in both music and language, may invoke common neural processing mechanisms in the STS. Unlike the STG, the precise functional divisions of the left and right STS are poorly understood. One clue that may, in the future, aid in better understanding the functional architecture of the STS is that we observed a functional hemispheric asymmetry in the STS similar to that in the STG, in that left STS activations were also posterior to those observed in the right STS. The posterior insular regions activated in this study are known to be directly connected to the STG (Galaburda and Pandya, 1983), suggesting a role in auditory function, but the role of this region in auditory processing is less well understood than that of the STS.

FIG. 6. To characterize the observed hemispheric asymmetry in temporal lobe activation, we used analysis of variance (ANOVA) on the dependent variable "percentage of voxels activated," with factors Hemisphere (Left/Right) and Anterior/Posterior divisions. ANOVA revealed a significant Hemisphere × Anterior/Posterior interaction (F(1,8) = 20.11; P < 0.002), with the left hemisphere activations significantly posterior to the right hemisphere activations. The coronal slice at y = −22 mm (Talairach coordinates), midway between the anterior- and posteriormost activation in the temporal lobes, was used to demarcate the anterior and posterior subdivisions. Note that this analysis was based on activation maps obtained in individual subjects.

The differential contributions of the left and right STG/STS to timbre processing are not clear. Lesion studies have provided very little information on this issue, partly because of the large extent of the excisions (Liegeois-Chauvel et al., 1998). It is possible that the right hemisphere bias for timbre processing observed in lesion studies may arise partly because more posterior left STG regions that contribute to timbre processing are generally spared in temporal lobectomies. One important reason that the contribution of the left temporal lobe to timbre processing is of considerable interest is the fact that the same acoustical cues involved in the perception of musical timbre may also be involved in processing linguistic cues. For example, spectral shape features may provide a more complete set of acoustic correlates for vowel identity than do formants (Zahorian and Jagharghi, 1993), analogous to the manner in which timbre is most closely associated with the distribution of acoustic energy across frequency (i.e., the shape of the power spectrum). Evidence for this comes from Quesnel's study, in which subjects were easily taught to identify subtle changes in musical timbre based on similarities between these timbres and certain English vowels (Quesnel, 2001). Although it has been commonly assumed that the right and left hemispheres play dominant roles in musical and language processing, respectively (Kolb and Whishaw, 2000), our findings suggest that music and language processing may in fact share common neural substrates. In this regard, our data are consistent with human electrophysiological recordings which have also found equivalent bilateral STG activation during music and language processing (Creutzfeldt and Ojemann, 1989). It has also been suggested that the left STG may be critically involved in making distinctions between voiced and unvoiced consonants (Liegeois-Chauvel et al., 1999). Conversely, the right STG is involved in processing vocal sounds, whether speech or nonspeech, compared to nonvocal environmental sounds (Belin et al., 2000).

The present study marks an important step in charting out the neural bases of timbre perception, a critical structuring force in music (McAdams, 1999), as well as in the identification of sound sources in general (McAdams, 1993). This study motivates thinking about acoustic stimuli in terms of their timbral properties. This, in turn, will help to establish a more ecological framework for interpreting brain activations arising from acoustic stimuli with different timbral properties. Second, it provides a review of neuropsychological investigations of timbre perception that is integrated with a detailed description of the hemispheric asymmetries in the differential response to sounds that fall within different regions of timbral space.

Future studies will need to examine the relative contribution of the different components of timbre with independent, parametric control. Further electrophysiological studies are also needed that examine how the spectral profile shape and […]

REFERENCES

Duvernoy, H. M., Bourgouin, P., Cabanis, E. A., and Cattin, F. 1999. The Human Brain: Functional Anatomy, Vascularization and Serial Sections with MRI. Springer-Verlag, New York.

Efron, R. 1963. Temporal perception, aphasia and déjà vu. Brain 86: 403–424.

Friston, K. J., Holmes, A. P., Worsley, K. J., Poline, J. P., Frith, C. D., and Frackowiak, R. S. J. 1995. Statistical parametric maps in functional imaging: A general linear approach. Hum. Brain Mapp. 2: 189–210.

Friston, K. J., Williams, S., Howard, R., Frackowiak, R. S., and Turner, R. 1996. Movement-related effects in fMRI time-series. Magn. Reson. Med. 35(3): 346–355.

Galaburda, A. M., and Pandya, D. N. 1983. The intrinsic architectonic and connectional organization of the superior temporal region of the rhesus monkey. J. Comp. Neurol. 221(2): 169–184.
its dynamics are encoded by neurons in the auditory cortex Glover, G. H., and Lai, S. 1998. Self-navigated spiral fMRI: Inter-
and how the tonotopic and temporal patterns of neuronal leaved versus single-shot. Magn. Reson. Med. 39(3): 361–368.
activity relate to timbre perception (Shamma, 1999). We Grey, J. M. 1977. Multidimensional perceptual scaling of musical
expect that such future work will allow us to examine the timbres. J. Acoust. Soc. Am. 61: 1270 –1277.
neural basis of some of the known perceptual structure of Griffiths, R., Johnsrude, I., Dean, J., and Green, G. 1999. A common
timbre, and we expect to find further evidence of bilateral neural substrate for the analysis of pitch and duration pattern in
hemispheric contributions to this important parameter of segmented sound? NeuroReport 10(18): 3825–3830.
auditory perception, discrimination, and identification. Hajda, J. M., Kendall, R. A., Carterette, E. C., & Harshberger, M. L.
(Eds.) 1997. Methodological Issues in Timbre Research. Psychology
Press, London.
ACKNOWLEDGMENTS Hall, D. A., Haggard, M. P., Summerfield, A. Q., Akeroyd, M. A.,
Palmer, A. R., and Bowtell, R. W. 2001. Functional magnetic
This research was supported by NIH Grant HD 40761 and a grant resonance imaging measurements of sound-level encoding in the
from the Norris Foundation to V.M., by NSERC Grant 228175-00, absence of background scanner noise. J. Acoust. Soc. Am. 109(4):
FCAR Grant 2001-SC-70936, and CFI New Opportunities Grant 1559 –1570.
2719 to D.J.L., and NIH Grant RR 09784 to G.H.G. We are grateful Helmholtz, H. L. F. 1863/1954. On the Sensations of Tone (A. J. Ellis,
to Nicole Dolensky for her assistance with the behavioral data anal- Trans.). Dover, New York.
ysis. Holmes, A. P., and Friston, K. J. 1998. Generalisability, random
effects and population inference. NeuroImage 7(4): S754.
REFERENCES Howard, M. A., Volkov, I. O., Mirsky, R., Garell, P. C., Noh, M. D.,
Granner, M., Damasio, H., Steinschneider, M., Reale, R. A., Hind,
J. E., and Brugge, J. F. 2000. Auditory cortex on the human
Ashburner, J., and Friston, K. J. 1999. Nonlinear spatial normaliza- posterior superior temporal gyrus. J. Comp. Neurol. 416(1): 79 –92.
tion using basis functions. Hum. Brain Mapp. 7(4): 254 –266.
Johnsrude, I. S., Zatorre, R. J., Milner, B. A., and Evans, A. C. 1997.
Belin, P., Zatorre, R. J., Lafaille, P., Ahad, P., and Pike, B. 2000. Left-hemisphere specialization for the processing of acoustic tran-
Voice-selective areas in human auditory cortex. Nature 403(6767): sients. NeuroReport 8(7): 1761–1765.
309 –312.
Jones, S. J., Longe, O., and Vaz Pato, M. 1998. Auditory evoked
Berger, K. 1964. Some factors in the recognition of timbre. J. Acoust.
potentials to abrupt pitch and timbre change of complex tones:
Soc. Am. 36: 1888 –1891.
Electrophysiological evidence of ‘streaming’? Electroencephalogr.
Blood, A. J., Zatorre, R. J., Bermudez, P., and Evans, A. C. 1999. Clin. Neurophysiol. 108(2): 131–142.
Emotional responses to pleasant and unpleasant music correlate
Kaas, J. H., and Hackett, T. A. 2000. Subdivisions of auditory cortex
with activity in paralimbic brain regions. Nature Neurosci. 2(4).
and processing streams in primates. Proc. Natl. Acad. Sci. USA
Brechmann, A., Baumgart, F., and Scheich, H. 2002. Sound-level- 97(22): 11793–11799.
dependent representation of frequency modulations in human au-
ditory cortex: A low-noise fMRI study. J. Neurophysiol. 87(1): Kim, D. H., Adalsteinsson, E., Glover, G. H., and Spielman, S. 2000.
423– 433. SVD Regularization Algorithm for Improved High-Order Shim-
ming. Paper presented at the Proceedings of the 8th Annual Meet-
Brett, M. 2000. The MNI brain and the Talairach atlas. http://
ing of ISMRM, Denver.
www.mrc-cbu.cam.ac.uk/Imaging/mnispace.html.
Kolb, B., and Willshaw, I. Q. 2000. Brain and Behavior. Worth, New
Cohen, J. D., MacWhinney, B., Flatt, M., and Provost, J. 1993.
York.
PsyScope: A new graphic interactive environment for designing
psychology experiments. Behav. Res. Methods Instr. Comp. 25(2): Levitin, D. J. 1999. Memory for musical attributes. In Music, Cog-
257–271. nition and Computerized Sound: An Introduction to Psychoacous-
Creutzfeldt, O., and Ojemann, G. 1989. Neuronal activity in the tics (P. R. Cook, Ed.), (pp. 209 –227). MIT Press, Cambridge, MA.
human lateral temporal lobe. III. Activity changes during music. Liegeois-Chauvel, C., de Graaf, J. B., Laguitton, V., and Chauvel, P.
Exp. Brain Res. 77(3): 490 – 498. 1999. Specialization of left auditory cortex for speech perception in
Crummer, G. C., Walton, J. P., Wayman, J. W., Hantz, E. C., and man depends on temporal coding. Cereb. Cortex 9(5): 484 – 496.
Frisina, R. D. 1994. Neural processing of musical timbre by musi- Liegeois-Chauvel, C., Peretz, I., Babai, M., Laguitton, V., and Chau-
cians, nonmusicians, and musicians possessing absolute pitch. J. vel, P. 1998. Contribution of different cortical areas in the tempo-
Acoust. Soc. Am. 95(5, Pt. 1): 2720 –2727. ral lobes to music processing. Brain 121(Pt. 10): 1853–1867.
1754 MENON ET AL.
McAdams, S. 1993. Recognition of sound sources and events. In Thinking in Sound: The Cognitive Psychology of Human Audition (S. McAdams and E. Bigand, Eds.), pp. 146–198. Oxford Univ. Press, Oxford.

McAdams, S. 1999. Perspectives on the contribution of timbre to musical structure. Comput. Music J. 23(2): 96–113.

McAdams, S., and Cunibile, J. C. 1992. Perception of timbral analogies. Phil. Trans. R. Soc. Lond. B 336: 383–389.

McAdams, S., and Winsberg, S. 2000. Psychophysical quantification of individual differences in timbre perception. In Contributions to Psychological Acoustics: Results of the 8th Oldenburg Symposium on Psychological Acoustics (A. Schick, M. Meis, and C. Reckhardt, Eds.), pp. 165–182. Bis, Oldenburg.

McAdams, S., Winsberg, S., Donnadieu, S., De Soete, G., and Krimphoff, J. 1995. Perceptual scaling of synthesized musical timbres: Common dimensions, specificities, and latent subject classes. Psychol. Res. 58: 177–192.

Milner, B. (Ed.). 1962. Laterality Effects in Audition. Johns Hopkins Press, Baltimore.

Moore, B. C. J. (Ed.). 1995. Hearing. Academic Press, San Diego.

Pandya, D. N. 1995. Anatomy of the auditory cortex. Rev. Neurol. 151(8,9): 486–494.

Pantev, C., Roberts, L. E., Schulz, M., Engelien, A., and Ross, B. 2001. Timbre-specific enhancement of auditory cortical representations in musicians. NeuroReport 12(1): 169–174.

Pitt, M. A. 1995. Evidence for a central representation of instrument timbre. Percept. Psychophys. 57(1): 43–55.

Platel, H., Price, C., Baron, J., Wise, R., Lambert, J., Frackowiak, R., Lechevalier, B., and Eustache, F. 1997. The structural components of music perception. A functional anatomical study. Brain 120(Pt. 2): 229–243.

Plomp, R. 1970. Timbre as a multidimensional attribute of complex tones. In Frequency Analysis and Periodicity Detection in Hearing (R. Plomp and G. F. Smoorenburg, Eds.), pp. 397–414. Sijthoff, Leiden.

Poline, J. B., Worsley, K. J., Evans, A. C., and Friston, K. J. 1997. Combining spatial extent and peak intensity to test for activations in functional imaging. NeuroImage 5(2): 83–96.

Quesnel, R. 2001. A Computer-Assisted Method for Training and Researching Timbre Memory and Evaluation Skills. Unpublished doctoral dissertation, McGill University, Montreal.

Rademacher, J., Morosan, P., Schormann, T., Schleicher, A., Werner, C., Freund, H. J., and Zilles, K. 2001. Probabilistic mapping and volume measurement of human primary auditory cortex. NeuroImage 13(4): 669–683.

Risset, J.-C., and Wessel, D. L. 1999. Exploration of timbre by analysis and synthesis. In The Psychology of Music (D. Deutsch, Ed.), 2nd ed., pp. 113–168. Academic Press, San Diego.

Robin, D., Tranel, D., and Damasio, H. 1990. Auditory perception of temporal and spectral events in patients with focal left and right cerebral lesions. Brain Lang. 39: 539–555.

Romanski, L. M., Tian, B., Fritz, J., Mishkin, M., Goldman-Rakic, P. S., and Rauschecker, J. P. 1999. Dual streams of auditory afferents target multiple domains in the primate prefrontal cortex. Nat. Neurosci. 2(12): 1131–1136.

Samson, S., and Zatorre, R. J. 1994. Contribution of the right temporal lobe to musical timbre discrimination. Neuropsychologia 32(2): 231–240.

Schellenberg, E. G., Iverson, P., and McKinnon, M. C. 1999. Name that tune: Identifying popular recordings from brief excerpts. Psychon. Bull. Rev. 6(4): 641–646.

Seltzer, B., and Pandya, D. N. 1978. Afferent cortical connections and architectonics of the superior temporal sulcus and surrounding cortex in the rhesus monkey. Brain Res. 149(1): 1–24.

Shamma, S. A. 1999. Physiological basis of timbre perception. In Cognitive Neuroscience (M. S. Gazzaniga, Ed.), pp. 411–423. MIT Press, Cambridge.

Sidtis, J., and Volpe, B. 1988. Selective loss of complex-pitch or speech discrimination after unilateral lesions. Brain Lang. 34: 235–245.

Takeuchi, A. H., and Hulse, S. H. 1993. Absolute pitch. Psychol. Bull. 113(2): 345–361.

Tian, B., Reser, D., Durham, A., Kustov, A., and Rauschecker, J. P. 2001. Functional specialization in rhesus monkey auditory cortex. Science 292(5515): 290–293.

Wessinger, C. M., VanMeter, J., Tian, B., Van Lare, J., Pekar, J., and Rauschecker, J. P. 2001. Hierarchical organization of the human auditory cortex revealed by functional magnetic resonance imaging. J. Cogn. Neurosci. 13(1): 1–7.

Zahorian, S. A., and Jagharghi, A. J. 1993. Spectral-shape features versus formants as acoustic correlates for vowels. J. Acoust. Soc. Am. 94(4): 1966–1982.

Zatorre, R. 1988. Pitch perception of complex tones and human temporal-lobe function. J. Acoust. Soc. Am. 84: 566–572.

Zatorre, R., Evans, A., and Meyer, E. 1993. Functional activation of right temporal and occipital cortex in processing tonal melodies. J. Acoust. Soc. Am. 93: 2364.