
www.elsevier.com/locate/ynimg

NeuroImage 23 (2004) 344–357

Hemispheric roles in the perception of speech prosody

Jackson Gandour,a,* Yunxia Tong,a Donald Wong,b Thomas Talavage,c Mario Dzemidzic,d Yisheng Xu,a Xiaojian Li,e and Mark Lowef

a Department of Audiology and Speech Sciences, Purdue University, West Lafayette, IN 47907-2038, USA
b Department of Anatomy and Cell Biology, Indiana University School of Medicine, IN 46202-5120, USA
c School of Electrical and Computer Engineering, Purdue University, IN 47907-2035, USA
d MDZ Consulting Inc., Greenwood, IN 46143, USA
e South China Normal University, Guangzhou, PR China
f Cleveland Clinic Foundation, Cleveland, OH 44195, USA

Received 11 March 2004; revised 2 June 2004; accepted 2 June 2004

Speech prosody is processed in neither a single region nor a specific hemisphere, but engages multiple areas comprising a large-scale spatially distributed network in both hemispheres. It remains to be elucidated whether hemispheric lateralization is based on higher-level prosodic representations or lower-level encoding of acoustic cues, or both. A cross-language (Chinese; English) fMRI study was conducted to examine brain activity elicited by selective attention to Chinese intonation (I) and tone (T) presented in three-syllable (I3, T3) and one-syllable (I1, T1) utterance pairs in a speeded-response discrimination paradigm. The Chinese group exhibited greater activity than the English group in a left inferior parietal region across tasks (I1, I3, T1, T3). Only the Chinese group exhibited a leftward asymmetry in inferior parietal and posterior superior temporal (I1, I3, T1, T3), anterior temporal (I1, I3, T1, T3), and frontopolar (I1, I3) regions. Both language groups shared a rightward asymmetry in the mid portions of the superior temporal sulcus and middle frontal gyrus irrespective of prosodic unit or temporal interval. Hemispheric laterality effects enable us to distinguish brain activity associated with higher-order prosodic representations in the Chinese group from that associated with lower-level acoustic/auditory processes that are shared among listeners regardless of language experience. Lateralization is influenced by language experience, which shapes the internal prosodic representation of an external auditory signal. We propose that speech prosody perception is mediated primarily by the RH, but is left-lateralized to task-dependent regions when language processing is required beyond the auditory analysis of the complex sound.
© 2004 Elsevier Inc. All rights reserved.

Keywords: fMRI; Human auditory processing; Speech perception; Selective attention; Laterality; Language; Prosody; Intonation; Tone; Chinese

Supplementary data associated with this article can be found, in the online version, at doi:10.1016/j.neuroimage.2004.06.004.
* Corresponding author. Department of Audiology and Speech Sciences, Purdue University, 1353 Heavilon Hall, 500 Oval Drive, West Lafayette, IN 47907-2038. Fax: +1-765-494-0771.
E-mail address: gandour@purdue.edu (J. Gandour).
Available online on ScienceDirect (www.sciencedirect.com).
1053-8119/$ - see front matter © 2004 Elsevier Inc. All rights reserved.

Introduction

The differential roles of the left (LH) and right (RH) cerebral hemispheres in the processing of prosodic information have received considerable attention over the last several decades. Evidence supporting an RH role in the perception of prosodic units at phrase- and sentence-level structures has been wide-ranging, including dichotic listening (Blumstein and Cooper, 1974; Shipley-Brown et al., 1988), lesion deficit (Baum and Pell, 1999; Brådvik et al., 1991; Pell, 1998; Pell and Baum, 1997; Weintraub et al., 1981), and functional neuroimaging (Gandour et al., 2003; George et al., 1996; Meyer et al., 2003; Plante et al., 2002; Wildgruber et al., 2002). Involvement of the LH in the perception of prosodic units at syllable- or word-level structures has also been compelling, with converging evidence from dichotic listening (Moen, 1993; Van Lancker and Fromkin, 1973; Wang et al., 2001), lesion deficit (Eng et al., 1996; Gandour and Dardarananda, 1983; Hughes et al., 1983; Yiu and Fok, 1995), and neuroimaging (Gandour et al., 2000, 2003; Hsieh et al., 2001; Klein et al., 2001).

The precise mechanisms underlying functional asymmetry for speech prosody remain a matter of debate. Task-dependent hypotheses focus on functional properties (e.g., tone vs. intonation) of the speech stimuli (Van Lancker, 1980), whereas cue-dependent hypotheses are directed to particular physical properties (e.g., temporal vs. spectral) of the acoustic signal (Ivry and Robertson, 1998; Poeppel, 2003; Schwartz and Tallal, 1980; Zatorre and Belin, 2001). Speech prosody is predicted to be right-lateralized by cue-dependent hypotheses. Hemispheric specialization, however, appears to be sensitive to language-specific factors irrespective of neural mechanisms underlying lower-level auditory processing (Gandour et al., 2002).

The Chinese (Mandarin) language can be exploited to address questions of functional asymmetry underlying prosodic processing that involve primarily variations in pitch. Chinese has four lexical tones (e.g., ma [tone 1] "mother", ma [tone 2] "hemp", ma [tone 3] "horse", ma [tone 4] "scold"). Tones 1–4 can be described phonetically as high level, high rising, falling rising, and high falling, respectively (Howie, 1976). They are manifested at the level of the syllable, the smallest structural unit for carrying prosodic

features, on a time scale of 200–350 ms. Intonation, on the other hand, is manifested at the phrase or sentence level, typically on a time scale of seconds. In Chinese, interrogative intonation exhibits a higher pitch contour than that of its declarative counterpart (Shen, 1990) as well as a wider pitch range for sentence-final tones (Yuan et al., 2002). In English, interrogative sentences do not have overall higher pitch contours than declarative sentences, nor do they show any effects of tone and intonation interaction in sentence-final position. Chinese interrogative intonation with a final rising tone has a rising end, which is similar to English, whereas that with a final falling tone often has a falling end (Yuan et al., 2002).

In a previous fMRI study of Chinese tone and intonation (Gandour et al., 2003), both tone and intonation were judged in sentences presented at a fixed length (three words), and we observed left-lateralized lexical tone perception in comparison to intonation. However, the prosodic unit listeners selectively attended to and the temporal interval of attentional focus were coterminous. In judgments of lexical tone, the focus of attention was on the final word only, whereas judgments of intonation required that the focus be directed to the entire sentence. Whether the principal driving force in hemispheric lateralization of speech prosody is the temporal interval of attentional focus rather than the hierarchical level of linguistic units is not yet well established. The aim of the present study is to determine whether the temporal interval in which prosodic units are presented influences the neural substrates used in prosodic processing. As such, participants are asked to make perceptual judgments of tone and intonation in one-syllable and three-syllable Chinese utterances. By comparing activation in homologous regions of both hemispheres, we can assess the extent to which hemispheric laterality for speech prosody is driven by the temporal interval, prosodic unit, or both. Only native Chinese speakers possess implicit knowledge that relates external auditory cues to internal representations of tone and intonation. By employing two language groups, one consisting of Chinese speakers, the other of English speakers, we are able to determine whether activation of particular brain areas is sensitive to language experience.

Materials and methods

Subjects

Ten native speakers of Mandarin (five male; five female) and ten native speakers of American English (five male; five female) were closely matched in age/years of education (Chinese: M = 29/19; English: M = 27/19). All subjects were strongly right-handed (Oldfield, 1971) and exhibited normal hearing sensitivity. All subjects gave informed consent in compliance with a protocol approved by the Institutional Review Board of Indiana University Purdue University Indianapolis and Clarian Health.

Stimuli

Stimuli consisted of 36 pairs of three-syllable Chinese utterances and 44 pairs of one-syllable Chinese utterances. Utterances were designed with two intonation patterns (declarative, interrogative) in combination with the four Chinese tones on the utterance-final syllable (Fig. 1). Focus was held constant on the utterance-final syllable. No adjacent syllables in the three-syllable utterances formed bisyllabic words, to minimize lexical-semantic processing. Tone or intonation each differed in 36% of the pairs for the one-syllable utterances and 39% of the pairs for the three-syllable utterances. Stimuli that were identical in both tone and intonation comprised 28% and 22% of the pairs in one-syllable and three-syllable utterances, respectively.

Recording procedure

A 52-year-old male native speaker of Mandarin was instructed to read one- and three-syllable utterances at a conversational speaking rate in a declarative and interrogative sentence mood. A reading task was chosen to maximize the likelihood of simulating normal speaking conditions while at the same time controlling the syntactic, prosodic, and segmental characteristics of the spoken sentences. To enhance the naturalness of producing the three-syllable utterances, he was told to treat them as SVO (subject-verb-object) sentences with non-emphatic stress placed on the final syllable. All items in the list were typed in Chinese characters. A sufficient pause was provided between items to ensure that the speaker maintained a uniform speaking rate. By controlling the pace of presentation, we maximized the likelihood of obtaining consistent, natural-sounding productions. To avoid list-reading effects, extra items were placed at the top and bottom of the list. Recordings were made in a double-walled soundproof booth using an AKG C410 headset microphone and a Sony TCD-D8 digital audio tape recorder. The subject was seated and wore a custom-made headband that maintained the microphone at a distance of 12 cm from the lips.

Prescreening identification procedure

All one- and three-syllable utterances were presented individually in random order for identification by five native speakers of Chinese who were naive to the purposes of the experiment. They were asked to respond whether they heard a declarative or interrogative intonation and to indicate the tone occurring on the final syllable. Only those stimuli that achieved a perfect (100%) recognition score for both intonation and tone were retained for possible use as stimuli in our training and experimental sessions.

Task procedure

The experimental paradigm consisted of four active tasks (Table 1) and a passive listening task. The active tasks required discrimination judgments of intonation (I) and tone (T) in paired three-syllable (I3, T3) and one-syllable (I1, T1) utterances. Subjects were instructed to focus their attention on either the utterance-level intonation or the lexical tone of the final syllable, make discrimination judgments, and respond by pressing a mouse button (left = same; right = different). The control task involved passive listening to the same utterances, either one-syllable utterances (L1) or three-syllable utterances (L3). Subjects responded by alternately pressing the left and right mouse buttons after each trial.

A scanning sequence consisted of two tasks presented in a blocked format alternating with rest periods (Fig. 2). The one-syllable and three-syllable utterance blocks contained 11 and 9 trials, respectively. The order of scanning runs and of trials within blocks was randomized for each subject. Instructions were delivered to subjects in their native language via headphones during rest periods immediately preceding each task: "listen" for passive listening to speech stimuli, "intonation" for same-different judgments on Chinese intonation, and "tone" for same-different

Fig. 1. Acoustic features of sample Chinese speech stimuli. Broad-band spectrograms (SPG: 0–8 kHz) and voice fundamental frequency contours (F0: 0–400 Hz) are displayed for utterance pairs consisting of same tone/different intonation in three-syllable utterances (top left), same tone/different intonation in one-syllable utterances (top right), different tone/same intonation in three-syllable utterances (bottom left), and different tone/same intonation in one-syllable utterances (bottom right).

judgments on Chinese tone. Average trial duration was about 2.9 and 3.5 s, respectively, for the one-syllable and three-syllable utterance blocks, including a response interval of 2 s.

All speech stimuli were digitally edited to have equal maximum energy level in dB SPL. Auditory stimuli were presented binaurally using a computer playback system (E-Prime) and a pneumatic-

Table 1
Samples of Chinese tone and intonation stimuli for tasks involving one-syllable and three-syllable utterances

Note. I1 (T1) and I3 (T3) represent intonation (tone) tasks in one-syllable and three-syllable utterances, respectively.

Fig. 2. Sequence and timing of conditions in each of the four functional imaging runs. I3 and I1 stand for intonation in three-syllable and one-syllable Chinese
utterances, respectively; T3 and T1 stand for tone in three-syllable and one-syllable Chinese utterances, respectively; R = rest interval; L3 and L1 stand for
passive listening to three-syllable and one-syllable Chinese utterances, respectively.
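The run structure sketched in Fig. 2 and detailed under Imaging protocol (a 200-volume EPI series: 8 baseline volumes, 184 volumes of 32-s task blocks alternating with 16-s rest intervals, and a closing 8-volume rest) implies boxcar reference waveforms of the kind used in the correlation analysis. A minimal sketch follows; the decomposition into eight 16-volume task blocks separated by 8-volume rests is arithmetically consistent with the reported totals (8 + 8×16 + 7×8 + 8 = 200), but the A/B ordering of the two comparison tasks is an illustrative assumption:

```python
import numpy as np

TR = 2.0                       # s, repetition time (Imaging protocol)
REST, TASK = 8, 16             # volumes: 16-s rest and 32-s task blocks

# One 200-volume run: leading rest, eight task blocks separated by rest
# intervals, trailing rest. The alternation of the two comparison tasks
# (A, B) across blocks is an illustrative guess, not taken from the paper.
condition = [0] * REST         # 0 = rest, 1 = task A, 2 = task B
for block in range(8):
    condition += [1 if block % 2 == 0 else 2] * TASK
    if block < 7:
        condition += [0] * REST
condition += [0] * REST
condition = np.array(condition)

# Boxcar reference waveform for task A: 1 during A blocks, 0 elsewhere.
ref_A = (condition == 1).astype(float)
```

Cross-correlating `ref_A` with each voxel's time series then yields the per-condition correlation coefficients described under Imaging analysis.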

based audio system (Avotec). The plastic sound conduction tubes were threaded through tightly occlusive foam eartips inside earmuffs that attenuated the average sound pressure level of the continuous scanner noise by ~30 dB. Average intensity of all experimental stimuli was 92 dB SPL, as compared to 80 dB SPL scanner noise.

Accuracy, reaction time, and subjective ratings of task difficulty were used to measure task performance. Each task was self-rated by listeners on a 1- to 5-point graded scale of difficulty (1 = easy, 3 = medium, 5 = hard) at the end of the scanning session. Before scanning, subjects were trained to a high level of accuracy using stimuli different from those presented during the scanning runs: I3 (Chinese, 93% correct; English, 88%); I1 (Chinese, 92%; English, 77%); T3 (Chinese, 99%; English, 82%); T1 (Chinese, 99%; English, 85%).

Imaging protocol

Scanning was performed on a 1.5T Signa GE LX Horizon scanner (Waukesha, WI) equipped with birdcage transmit-receive radiofrequency head coils. Each of four 200-volume echo-planar imaging (EPI) series began with a rest interval consisting of 8 baseline volumes (16 s), followed by 184 volumes during which the two comparison tasks (32 s) alternated with intervening 16-s rest intervals, and ended with a rest interval of 8 baseline volumes (16 s) (Fig. 2). Functional data were acquired using a gradient-echo EPI pulse sequence with the following parameters: repetition time (TR) 2 s; echo time (TE) 50 ms; matrix 64 × 64; flip angle (FA) 90°; field of view (FOV) 24 × 24 cm. Fifteen 7.5-mm-thick, contiguous axial slices were used to image the entire cerebrum. Before the functional imaging runs, high-resolution anatomic images were acquired in 124 contiguous axial slices using a 3D Spoiled-Grass (3D SPGR) sequence (slice thickness 1.2–1.3 mm; TR 35 ms; TE 8 ms; 1 excitation; FA 30°; matrix 256 × 128; FOV 24 × 24 cm) for purposes of anatomic localization and coregistration to a standard stereotactic system (Talairach and Tournoux, 1988). Subjects were scanned with eyes closed and room lights dimmed. The effects of head motion were minimized by using a head-neck pad and dental bite bar.

Imaging analysis

Image analysis was conducted using the AFNI software package (Cox, 1996). All data for a given subject were motion-corrected to the fourth acquired volume of the first functional imaging run. To remove differences in global intensity between runs, the signal in each voxel was detrended across each functional scan to remove scanner signal drift, and then normalized to its mean intensity. Each of the four functional runs was analyzed to obtain the cross-correlation of each of three reference waveforms with the measured fMRI time series for each voxel. The first reference waveform corresponded to one of the four active conditions (I1, I3, T1, T3) presented in a single run (Fig. 2). The second and third reference waveforms corresponded to the two control conditions, L1 and L3, respectively, presented during the two runs with the same temporal interval as the intonation and tone conditions (L1 for I1 and T1; L3 for I3 and T3). After the resulting EPI volumes were transformed to 1-mm isotropic voxels in Talairach coordinate space (Talairach and Tournoux, 1988), the correlation coefficients were converted to z scores for purposes of analyzing multisubject fMRI data (Bosch, 2000), and spatially smoothed by a 5.2-mm FWHM Gaussian filter to account for intersubject variation in brain anatomy and to enhance the signal-to-noise ratio.

Direct comparison of active conditions (I1, I3, T1, T3) across runs was accomplished by computing the average z score for each of the four active conditions relative to its corresponding control condition. Averaged z scores for the control conditions were subtracted from those obtained for their corresponding intonation or tone conditions (e.g., ΔzI1 = zI1 − zL1; ΔzI3 = zI3 − zL3). Evaluating each active condition against a control of the same temporal interval also makes it possible to compare active conditions across temporal intervals (e.g., ΔzI1 vs. ΔzI3).

Within- and between-group random effects maps (I1 vs. L1, T1 vs. L1, I3 vs. L3, T3 vs. L3) were also generated for display purposes by applying voxel-wise ANOVAs on the z values (e.g., Chinese zI1 vs. Chinese zL1) and Δz values (e.g., Chinese ΔzI1 vs. English ΔzI1), respectively. The individual voxel threshold for between-group maps was set at P = 0.01. For within-group maps, significantly activated voxels (P < 0.001) located within a radius of 7.6 mm were grouped into clusters, with a minimum cluster size threshold

corresponding to four original-resolution voxels. According to a Monte Carlo simulation (AlphaSim), this clustering procedure yielded a false-positive alpha level of 0.04.

ROI analysis

Nine anatomically constrained 5-mm-radius spherical regions of interest (ROIs) were examined along with other regions. We chose ROIs that have been implicated in previous studies of phonological processing (Burton, 2001; Hickok and Poeppel, 2000; Hickok et al., 2003), speech perception (Binder et al., 2000; Davis and Johnsrude, 2003; Giraud and Price, 2001; Scott, 2003; Scott and Johnsrude, 2003; Scott et al., 2000; Zatorre et al., 2002), attention (Corbetta, 1998; Corbetta and Shulman, 2002; Corbetta et al., 2000; Shaywitz et al., 2001; Shulman et al., 2002), and working memory (Braver and Bongiolatti, 2002; Chein et al., 2003; D'Esposito et al., 2000; Jonides et al., 1998; Newman et al., 2002; Paulesu et al., 1993; Smith and Jonides, 1999). ROIs were symmetric in nonoverlapping frontal, temporal, and parietal regions of both hemispheres (see Table 2 and Fig. 3). All center coordinates were derived by averaging over peak location coordinates reported in previous studies. They were then slightly adjusted to avoid overlap of ROIs and crossing of major anatomical boundaries. Of these coordinates, 26 out of 27 (9 ROIs × 3 coordinates) fell within 1 SD, and 1 (x, mSTS) within 2 SD, of the mean published values. Similar results were obtained with 7-mm-radius ROIs, but we chose to present only 5-mm-radius results because larger ROIs would have to be shifted to avoid crossing of anatomical boundaries.

Table 2
Center coordinates and extents of 5-mm spherical ROIs

Region  BA     x    y    z    Description
Frontal
aMFG    10    ±32  +50   +4
mMFG    46/9  ±45  +32  +22
pMFG    9/6   ±44  +10  +33
FO      45/13 ±37  +25  +14  centered deep within the frontal operculum of the inferior frontal gyrus, extending dorsally to the lower bank of the inferior frontal sulcus, ventrally to the bordering edge of the anterior insula
Parietal
IPS     40/7  ±32  -48  +43  centered in and confined to the intraparietal sulcus
IPL     40    ±50  -31  +28  centered in anteroventral aspects of the supramarginal gyrus, extending ventrally into the bordering edge of the Sylvian fissure
Temporal
aSTG    38    ±55   +9   -8  centered in the temporal pole and wholly confined to the STG; posterior border (y = +5) was about 20 mm anterior to the medial end of the first transverse temporal sulcus (TTS)
mSTS    22    ±49  -20   -3  centered in the STS, encompassing both the upper and lower banks of the STS; anterior border (y = -16) was contiguous with the medial border of TTS
pSTG    22    ±56  -38  +12  centered in the STG, extending ventrally into the STS; anterior border (y = -35) was about 20 mm posterior to the medial border of TTS

Notes. Stereotaxic coordinates (mm) are derived from the human brain atlas of Talairach and Tournoux (1988). a, anterior; m, middle; p, posterior; FO, frontal operculum; MFG, middle frontal gyrus; IPS, intraparietal sulcus; IPL, inferior parietal lobule; STG, superior temporal gyrus; STS, superior temporal sulcus. Right hemisphere ROIs were generated by reflecting the left hemisphere location across the midline.

The mean Δz (I1, I3, T1, T3) was calculated for each ROI and every subject. These mean Δz values within each ROI were analyzed using repeated measures mixed-model ANOVAs (SAS) to compare activation between tasks (I1, T1, I3, T3), hemispheres (LH, RH), and groups (Chinese, English). Tasks and hemispheres were treated as fixed, within-subjects effects; groups as a fixed, between-subjects effect. Subjects were nested within groups as a random effect. It may seem reasonable to use stimulus length as a separate factor in the ANOVA, treating one-syllable and three-syllable as two levels of this factor. However, as pointed out in the Introduction, although each stimulus contained three syllables in both the I3 and T3 tasks, T3 differed from I3 with respect to attentional demands. In T3, participants had to pay attention to the last syllable only, whereas in I3, they had to focus their attention on all three syllables. Treating stimulus length as a separate factor would thus have confounded length (1, 3) and prosodic unit (I, T).

Results

Behavioral performance

Behavioral measures of task performance by the Chinese and English groups are given in Table 3. A repeated measures ANOVA was conducted with Group as between-subjects factor (Chinese, English) and Task as within-subjects factor (I1, I3, T1, T3). Results revealed significant task × group interactions on self-ratings of task difficulty [F(1,18) = 3.14, P = 0.0325], accuracy [F(3,54) = 18.33, P < 0.0001], and reaction time (RT) [F(3,54) = 8.68, P < 0.0001]. Tests of simple main effects indicated that, for between-group comparisons, the tone task was judged to be easier by Chinese than by English listeners (T1, P < 0.0001; T3, P = 0.0004); Chinese listeners performed all tasks at a higher level of accuracy than English listeners (P < 0.01); and RTs were longer for English than for Chinese listeners when making tonal judgments (T1, P = 0.0281; T3, P = 0.007). Regardless of language background, intonation judgments took longer in the one-syllable (I1) than in the three-syllable (I3) utterances (Chinese, P = 0.0003; English, P = 0.0421). In the Chinese group, I1 was judged to be more difficult than T1 (P = 0.0003); more errors were made in I1 than T1 (P = 0.0001); and RTs were longer in I1 compared to T1 (P < 0.0001) and in I3 compared to T3 (P = 0.0339). In contrast, the English group achieved a higher level of accuracy in I3 than in any of the other three tasks (P < 0.01).

Between group comparisons

ROI-based ANOVAs revealed that the Chinese group exhibited significantly (P < 0.001) greater activity, as measured by Δz, in the left IPL relative to the English group regardless of task (I1, I3, T1, T3) (Figs. 4f and 5; Table 4). No other ROIs in either the LH or RH elicited significantly more activity in the Chinese group as compared to the English group.

In contrast, the English group showed significantly greater bilateral or right-sided activity in frontal, parietal, and temporal

Fig. 3. Location of fixed spherical ROIs in frontal (open circle), parietal (checkered circle), and temporal (barred circle) regions displayed in left sagittal
sections (top and middle panels), and on the lateral surface of both hemispheres (bottom panels). LH = left hemisphere; RH = right hemisphere. Stereotactic x
coordinates that appear in the top and middle panels are derived from the human brain atlas of Talairach and Tournoux (1988). See also Table 2.
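The fixed spherical ROIs shown in Fig. 3 are 5-mm-radius spheres whose right-hemisphere centers mirror the left-hemisphere centers across the midline (Table 2). A minimal sketch of constructing such masks on a 1-mm isotropic grid; the grid shape and origin here are illustrative assumptions, not the study's actual image dimensions:

```python
import numpy as np

def sphere_mask(shape, origin, center, radius=5.0):
    """Boolean mask of voxels within `radius` mm of `center`; axes are (x, y, z)."""
    grid = np.stack(np.indices(shape), axis=-1) + np.asarray(origin)
    return np.linalg.norm(grid - np.asarray(center), axis=-1) <= radius

# Illustrative 1-mm grid in Talairach space, symmetric about the midline.
shape, origin = (121, 81, 61), (-60, -60, -10)

# IPL ROI from Table 2: center (±50, -31, +28); the RH ROI mirrors the LH
# center across the midline (x -> -x).
lh_ipl = sphere_mask(shape, origin, (-50, -31, 28))
rh_ipl = sphere_mask(shape, origin, (50, -31, 28))
```

Mirrored spheres lying wholly inside the grid contain identical numbers of voxels, roughly (4/3)·π·r³ ≈ 524 at 1-mm resolution, so mean Δz within homologous LH and RH ROIs is computed over equally sized samples.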

ROIs relative to the Chinese group (Fig. 6; Table 4). In the frontal lobe, all four ROIs (Figs. 4a–d) were more active bilaterally for the tone tasks (T1, T3). In the parietal lobe, IPS (Fig. 4e) activity was greater in both the LH and RH for T1. In the temporal lobe, the pSTG (Fig. 4i) was more active bilaterally across tasks (I1, I3, T1, T3), whereas greater activity in the aSTG (Fig. 4g) was observed across tasks in the RH only.

Table 3
Behavioral performance and self-ratings of task difficulty

Language group  Task  Accuracy (%)  Reaction time (ms)  Difficulty^a
Chinese         I1    91.2 (1.5)    682 (48)            3.3 (0.4)
                T1    97.3 (0.9)    504 (41)            1.4 (0.16)
                I3    93.9 (1.3)    559 (36)            2.7 (0.3)
                T3    96.9 (1.2)    485 (26)            1.8 (0.25)
English         I1    76.8 (2.5)    668 (49)            4.0 (0.30)
                T1    70.9 (2.9)    642 (53)            3.3 (0.37)
                I3    85.3 (2.4)    565 (40)            3.1 (0.28)
                T3    72.2 (3.0)    656 (47)            3.4 (0.27)

Note. Values are expressed as mean and standard error (in parentheses). See also note in Table 1.
^a Scalar units are from 1 to 5 (1 = easy; 3 = medium; 5 = hard) for self-ratings of task difficulty.

Within group comparisons

Hemisphere effects for the Chinese group revealed complementary leftward and rightward asymmetries, as measured by Δz, depending on ROI and task (Table 5). Laterality differences favored the LH in the frontal aMFG (Figs. 4a and 7, upper panel) for intonation tasks only, irrespective of temporal interval (I1, I3). In the parietal lobe, significantly more activity was observed in the left IPL (Figs. 4f and 5) across tasks, and in the left IPS (Figs. 4e and 7, lower panel) for T3 (cf. Gandour et al., 2003). In the temporal lobe, activity was greater in the left pSTG (Figs. 4i and 8) and aSTG (Fig. 4g) across tasks regardless of temporal interval. In contrast, laterality differences favored the RH in the frontal mMFG (Fig. 4b) and temporal mSTS (Figs. 4h and 8) across tasks.

Hemisphere effects for the English group were restricted to frontal and temporal ROIs in the RH (Table 5). Rightward asymme-

Fig. 4. Comparison of mean Δz scores between language groups (Chinese, English) per task (I1, T1, I3, T3) and hemisphere (LH, RH) within each ROI. Frontal lobe, a–d; parietal, e–f; temporal, g–i. I1 is measured by ΔzI1; T1 by ΔzT1; I3 by ΔzI3; T3 by ΔzT3. Error bars represent ±1 SE.
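The Δz scores plotted in Fig. 4 follow the Imaging analysis procedure: per-voxel correlation coefficients are converted to z scores, and each active condition's z score has the matched control subtracted (e.g., ΔzI1 = zI1 − zL1). A minimal per-voxel sketch; the Fisher r-to-z transform is assumed here as the conversion (the paper cites Bosch, 2000, for its specific multisubject z-score method), and the correlation values are illustrative:

```python
import numpy as np

def fisher_z(r):
    """Fisher r-to-z transform (arctanh) of correlation coefficients."""
    return np.arctanh(r)

# Illustrative per-voxel correlations with the I1 and L1 reference waveforms.
r_I1 = np.array([0.45, 0.30, 0.10])
r_L1 = np.array([0.20, 0.25, 0.15])

dz_I1 = fisher_z(r_I1) - fisher_z(r_L1)         # ΔzI1 = zI1 - zL1

# The 5.2-mm FWHM Gaussian smoothing kernel corresponds to a standard
# deviation of FWHM / (2 * sqrt(2 * ln 2)):
sigma_mm = 5.2 / (2 * np.sqrt(2 * np.log(2)))   # about 2.21 mm
```

Positive Δz indicates stronger correlation with the active-task waveform than with the matched passive-listening control; negative Δz indicates the reverse.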

tries were observed in the frontal mMFG (Fig. 4b) and temporal mSTS (Fig. 4h) across tasks. These functional asymmetries favoring the RH were identical to those for the Chinese group. No significant leftward asymmetries were observed for any task across ROIs.

Task effects for the Chinese group revealed laterality differences, as measured by Δz, related to the prosodic unit. Intonation (I1, I3), when compared to tone (T1, T3), favored the LH in the aMFG (Figs. 4a and 7). In the pMFG (Fig. 4c), I3 was greater than T3 in the RH; I1 was greater than T1 in both hemispheres.

For both groups, a cluster analysis revealed significant (P < 0.001) activation in the supplementary motor area across tasks. The Chinese group showed predominantly right-sided activation in the lateral cerebellum across tasks. In the caudate and thalamus, increased activation was observed in the Chinese group for the intonation tasks only (I1, I3), but across tasks in the English group.

Fig. 5. A random effects fMRI activation map obtained from comparison of discrimination judgments of intonation in one-syllable utterances (I1) relative to passive listening to the same stimuli (L1) between the two language groups (ΔzI1 Chinese vs. ΔzI1 English). Left/right sagittal sections through stereotaxic space are superimposed onto a representative brain anatomy. The Chinese group shows increased activation in the left IPL, as compared to the English group, centered in ventral aspects of the supramarginal gyrus and extending into the bordering edge of the Sylvian fissure. Similar activation foci in the IPL are also observed in the I3 vs. L3, T1 vs. L1, and T3 vs. L3 comparisons. See also Fig. 4.

Discussion

Hemispheric roles in speech prosody

The major findings of this study demonstrate that Chinese tone and intonation are best thought of as a mosaic of multiple local asymmetries that allows for the possibility that different regions

Table 4
Group effects per task and hemisphere from statistical analyses on mean Δz within each spherical ROI

Group  Hemi  Task  Frontal                  Parietal     Temporal
                   aMFG  mMFG  pMFG  FO     IPS   IPL    aSTG  mSTS  pSTG
C > E  LH    I1    -     -     -     -      -     ***    -     -     -
             T1    -     -     -     -      -     ***    -     -     -
             I3    -     -     -     -      -     ***    -     -     -
             T3    -     -     -     -      -     ***    -     -     -
       RH    I1    -     -     -     -      -     -      -     -     -
             T1    -     -     -     -      -     -      -     -     -
             I3    -     -     -     -      -     -      -     -     -
             T3    -     -     -     -      -     -      -     -     -
E > C  LH    I1    -     -     -     -      -     -      -     -     **
             T1    ***   *     ***   **     ***   -      -     -     **
             I3    -     -     -     -      -     -      -     -     **
             T3    **    *     **    *      -     -      -     -     **
       RH    I1    -     -     -     -      -     -      *     -     **
             T1    ***   *     ***   **     ***   -      *     -     **
             I3    -     -     -     -      -     -      *     -     **
             T3    **    *     **    *      -     -      *     -     **

Note. C = Chinese group; E = English group; Hemi = hemisphere; LH = left hemisphere; RH = right hemisphere. *F(1, 18), P < 0.05; **F(1, 18), P < 0.01; ***F(1, 18), P < 0.001. See also notes to Tables 1 and 2.
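The hemisphere effects reported under Within group comparisons contrast mean Δz between homologous LH and RH ROIs within subjects. A paired comparison of this kind can be sketched as follows; the per-subject values are illustrative, and the study's actual inference used mixed-model ANOVAs in SAS rather than this simplified paired t test:

```python
import numpy as np

# Illustrative mean Δz per subject (n = 10) for one ROI in each hemisphere.
lh = np.array([0.62, 0.48, 0.55, 0.41, 0.59, 0.50, 0.46, 0.53, 0.58, 0.44])
rh = np.array([0.31, 0.22, 0.40, 0.28, 0.35, 0.27, 0.30, 0.25, 0.38, 0.29])

d = lh - rh                                        # within-subject LH - RH
t = d.mean() / (d.std(ddof=1) / np.sqrt(d.size))   # paired t statistic
# t > 0 indicates a leftward asymmetry for this ROI; t < 0, rightward.
```

Comparing homologous ROIs within subjects removes between-subject differences in overall signal amplitude, which is why laterality is assessed as a within-subjects (hemisphere) effect in the ANOVA.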

may be differentially weighted in laterality depending on language-, that are lateralized to the LH in response to all tasks or subsets of
modality-, and task-related features (Ide et al., 1999). Earlier tasks are found in the Chinese group only (Fig. 9). Conversely, the
hypotheses that focus on hemispheric function capture only part two regions in the temporal and frontal lobes that are lateralized to
of, but not the whole, phenomenon. Not all aspects of speech the RH are found in both language groups. We infer that LH
prosody are lateralized to the RH. Cross-language differences in laterality reflects higher-order processing of internal representations
laterality of particular brain regions depend on a listener’s implicit of Chinese tone and intonation, whereas RH laterality reflects
knowledge of the relation between external stimulus features lower-order processing of complex auditory stimuli.
(acoustic/auditory) and internal conceptual representations (linguis- Previous models of speech prosody processing in the brain have
tic/prosodic). All regions in the frontal, temporal, and parietal lobes either focused on linguistics or acoustics as the driving force
underlying hemispheric lateralization. In this study, tone and
intonation are lateralized to the LH for the Chinese group. Despite
their functional differences from a linguistic perspective, they both
recruit shared neural mechanisms in frontal, temporal, and parietal
regions of the LH. The finding that intonation is lateralized to the
LH cannot be accounted for by a model that claims that ‘‘supra-
segmental sentence level information of speech comprehension is
subserved by the RH’’ (Friederici and Alter, 2004, p. 268). Neither
can this finding be explained by a hypothesis based on the size of
the temporal integration window (short → LH; long → RH)
(Poeppel, 2003). In spite of the fact that both intonation and tone
meet his criteria for a long temporal integration window, they are
lateralized to the LH instead of the RH.
Instead of viewing hemispheric roles as being derived from
either acoustics or linguistics independently, we propose that both
linguistics and acoustics, in addition to task demands (Plante et
al., 2002), are all necessary ingredients for developing a neuro-
biological model of speech prosody. This model relies on dynamic
interactions between the two hemispheres. Whereas the RH is
engaged in pitch processing of complex auditory signals, includ-
ing speech, we speculate that the LH is recruited to process
categorical information to support phonological processing, or
even syntactic and semantic processing (cf. Friederici and Alter, 2004). With respect to task demands, I1 elicits greater activation than T1 in the left aMFG and bilaterally in the pMFG. These differences cannot be explained by "prosodic frame length" (Dogil et al., 2002) since both tone and intonation are presented in an identical temporal context (one syllable). These findings cannot be explained by a model that claims that segmental, lexical

Fig. 6. Random effects fMRI activation map obtained from comparison of discrimination judgments of tone in one-syllable utterances (T1) relative to passive listening to the same stimuli (L1) between the two language groups (ΔzT1 English vs. ΔzT1 Chinese). An axial section reveals increased activation bilaterally in both frontal and parietal regions, as well as in the supplementary motor area, for the English group relative to the Chinese group. Similar activation foci are also observed in the T3 vs. L3 comparison. See also Fig. 4.
352 J. Gandour et al. / NeuroImage 23 (2004) 344–357

Table 5
Within-group hemisphere effects per task from statistical analyses on mean Δz within each spherical ROI
                       Frontal                  Parietal   Temporal
Group  Hemi     Task   aMFG mMFG pMFG FO        IPS IPL    aSTG mSTS pSTG
C      LH > RH  I1     + ** *
                T1     ** *
                I3     ++ ** *
                T3     ** *
       RH > LH  I1     * *
                T1     * *
                I3     * *
                T3     * *
E      LH > RH  I1
                T1
                I3
                T3
       RH > LH  I1     * *
                T1     * *
                I3     * *
                T3     * *
Note. *F(1, 9), P < 0.05; **F(1, 9), P < 0.01; + tTukey-adjusted(9), P < 0.05; ++ tTukey-adjusted(9), P < 0.01. See also notes to Tables 2 and 4.
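Each cell in Table 5 is a within-group comparison of LH versus RH mean Δz across the 10 subjects of one group, so the reported F(1, 9) is simply the square of a paired t(9) on the hemispheric difference scores. A sketch with invented numbers (not the study's data), simulating an LH > RH effect:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical per-subject mean delta-z for one ROI (e.g., IPL) in the
# left and right hemispheres of the 10 listeners of one group.
dz_lh = rng.normal(loc=0.40, scale=0.12, size=10)
dz_rh = dz_lh - rng.normal(loc=0.15, scale=0.08, size=10)  # simulate LH > RH

def paired_F(lh, rh):
    """Hemisphere effect as a paired comparison: F(1, n - 1) = t_paired(n - 1)**2."""
    d = lh - rh
    t = d.mean() / (d.std(ddof=1) / np.sqrt(len(d)))
    return t ** 2

F = paired_F(dz_lh, dz_rh)
print(f"F(1, 9) = {F:.2f}")
```

Pairing the two hemispheres within subject removes between-subject variability in overall BOLD response, which is why the laterality tests use F(1, 9) rather than a between-sample comparison.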

(i.e., tone), and syntactic information is processed in the LH, suprasegmental sentence level information (i.e., intonation) in the RH (Friederici and Alter, 2004). Rather, they most likely reflect task demands related to retrieval of internal representations associated with tone and intonation.

Functional heterogeneity within a spatially distributed network

Frontal lobe
Activation in the frontopolar cortex (BA 10) was bilateral across all tasks for English listeners, but predominantly left-sided in the intonation tasks (I1, I3) for Chinese listeners (Table 5). The frontopolar region has extensive interconnections with auditory regions of the superior temporal gyrus (Petrides and Pandya, 1984). Thus, when presented with a competing articulatory suppression task, bilateral activation of frontopolar cortex has been reported in a verbal working memory paradigm (Gruber, 2001). Its functional role is inferred to be that of integrating working memory with the allocation of attentional resources (Koechlin et al., 1999), or applying greater effort in memory retrieval (Buckner et al., 1996; Schacter et al., 1996).

These cross-language differences in frontopolar activation are likely to result from the linguistic function of suprasegmental information in Chinese and English. As measured by RT and accuracy, Chinese listeners take longer and are less proficient in judging intonation than tone. The relatively greater difficulty in intonation judgments presumably reflects the fact that in Chinese, all syllables carry tonal contours obligatorily. Tones are likely to be processed first, as compared to intonation, due to this syllable-by-syllable processing. By comparison, intonation contours play a comparatively minor role in signaling differences in sentence mood. In this study, the unmarked (i.e., minus a sentence-final particle) yes-no interrogatives are known to carry a light functional load (Shen, 1990).

In the present study, subjects were required to keep tone or intonation information of the first stimulus in a pair in their working memory while concurrently accessing tone or intonation identification of the second stimulus. Due to the functional difference between tone and intonation for Chinese listeners, intonation judgment of the second stimulus competes for more attentional resources and leads to greater effort in memory retrieval of intonation from the first stimulus. This process presumably elicits greater activity in the left frontopolar region for intonation tasks in Chinese listeners. English listeners, on the other hand, employ a different processing strategy regardless of linguistic function. Without prior knowledge of the Chinese language, retrieving auditory information from working memory and making discrimination judgments is presumed to be equally difficult between tone and intonation, resulting in bilateral activation of frontopolar cortex for all tasks.

Dorsolateral prefrontal cortex, including BA 46 and BA 9, is involved in controlling attentional demands of tasks and maintaining information in working memory (Corbetta and Shulman, 2002; Knight et al., 1999; MacDonald et al., 2000; Mesulam, 1981). The rightward asymmetry in the mMFG (BA 46) that is observed in all tasks (I1, I3, T1, T3) in both language groups (Table 5) points to a stage of processing that involves auditory attention and working memory. Functional neuroimaging data reveal that auditory selective attention tasks elicit increased activity in right dorsolateral prefrontal cortex (Zatorre et al., 1999). In the music domain, perceptual analysis and short-term maintenance of pitch information underlying melodies recruits neural systems within the right prefrontal and temporal cortex (Zatorre et al., 1994). In this study, activation of the prefrontal mMFG and temporal mSTS is similarly lateralized to the RH across tasks in both language groups. These data are consistent with the idea that the right dorsolateral prefrontal area (BA 46/9) plays a role in auditory attention that modulates pitch perception in sensory representations beyond the lateral belt of the auditory cortex, and actively retains pitch information in auditory working memory (cf. Plante et al., 2002). Albeit in the speech domain, this frontotemporal network in the RH serves to maintain pitch information regardless of its linguistic relevance. A frontotemporal network for auditory short-term memory is further supported by epileptic patients who show significant deficits in retention of tonal information after unilateral excisions of the right frontal or temporal regions (Zatorre and Samson, 1991). In nonhuman primates, a processing stream for sound-object identification has been proposed that projects anteriorly

Fig. 7. Random effects fMRI activation maps obtained from comparison of discrimination judgments of intonation (I3; upper panel) and tone (T3; bottom panel) in three-syllable utterances relative to passive listening to the same stimuli (L3) for the Chinese group (zI3 vs. zL3; zT3 vs. zL3). In I3 vs. L3 and I1 vs. L1 (not shown), increased activity in frontopolar cortex (aMFG) shows a leftward asymmetry (upper panel; x = −35), whereas activation of the middle (mMFG) region of dorsolateral prefrontal cortex shows the opposite laterality effect (upper panel; x = +35, +40, +45). In T3 vs. L3, IPS activity is predominant in the LH (bottom panel; x = −35, −40, −45). In I3 (upper panel; x = +35, +40, +45) vs. T3 (lower panel; x = +35, +40, +45), activation of the right pMFG is greater in the I3 than the T3 task. See also Fig. 4.

along the lateral temporal cortex (Rauschecker and Tian, 2000), leading to the lateral prefrontal cortex (Hackett et al., 1999; Romanski et al., 1999a,b). A similar anterior processing stream destined for the lateral prefrontal cortex in humans presumably underlies a frontotemporal network, at least in the RH, for low-level auditory processing of complex pitch information.

Intonation elicited greater activity relative to tone in the pMFG (BA 9), bilaterally in the one-syllable condition, right sided only in the three-syllable condition (Fig. 4c). The fact that I3 elicited greater activity than T3 in the posterior MFG of the RH replicates Gandour et al. (2003). One possible explanation focuses on the prosodic units themselves. Tones are processed in the LH, intonation predominantly in the RH. However, this account is untenable because I1 elicits greater activation bilaterally as compared to T1. Moreover, intonation (I1, I3) and tone (T1, T3) tasks separately elicit no hemispheric laterality effects in the pMFG. Another possible explanation has to do with the temporal interval. One might argue that the difference between I3 and T3 is due to the time interval of focused attention for the prosodic unit: I3 = three syllables; T3 = last syllable only. On this view, shorter prosodic frames are processed in the LH, longer frames in the RH. This alternative account of pMFG activity is also ruled out because I1 elicits similar hemispheric laterality effects as I3. Instead, differential pMFG activity related to direct comparisons between intonation and tone is most likely related to task demands (cf. Plante et al., 2002). As measured by RT and self-ratings of task difficulty, intonation tasks are more difficult than tone for Chinese listeners (Table 3). Equally significant is the fact that the English group shows greater activation for tonal processing (T1, T3) than the Chinese group in the pMFG bilaterally (Table 4). These findings together are consistent with the idea that the pMFG coordinates attentional resources required by the task.

position in a sequence of syllables, which causes repeated shifts in attention from one item to another. These laterality differences
between T3 and T1 indicate that selective attention to discrete
linguistic constructs is a gradient neurophysiological phenomenon
in the context of task-specific demands.
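A common way to express such graded asymmetries on a continuous scale (not the ANOVA approach used in this study, but a standard complement in the laterality literature) is a laterality index, LI = (L − R)/(L + R), which ranges from +1 (fully left-lateralized) to −1 (fully right-lateralized). A small illustrative sketch; all numbers are hypothetical:

```python
import numpy as np

def laterality_index(lh, rh):
    """LI = (L - R) / (L + R): +1 is fully left-lateralized, -1 fully right,
    values near 0 indicate bilateral activity. Inputs are nonnegative
    activation summaries (e.g., mean delta-z or suprathreshold voxel counts)."""
    lh = np.asarray(lh, dtype=float)
    rh = np.asarray(rh, dtype=float)
    return (lh - rh) / (lh + rh)

# Hypothetical ROI activation summaries for two tasks (illustrative only):
print(laterality_index([0.42, 0.30], [0.18, 0.27]))
```

On such a scale, a strongly left-lateralized task yields an LI well above zero while a task engaging both hemispheres yields an LI near zero, capturing the gradient character of the laterality differences described above.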
The Chinese group, as compared to English, shows greater
activation across tasks (I1, I3, T1, T3) in the left ventral aspects of
the IPL (BA 40) near the parietotemporal boundary (Table 4).
Within the Chinese group, a relatively greater IPL activation on the
left is observed across tasks and without regard to the prosodic unit
(I, T) or temporal interval (1, 3). Perhaps it is the ‘‘categoricalness’’
or phonological significance of the auditory stimuli that triggers
activation in this area (Jacquemot et al., 2003). This language-
specific effect can be understood from the conceptualization of the
IPL as part of an auditory-motor integration circuit in speech
perception (Hickok and Poeppel, 2000; Wise et al., 2001). Chinese listeners possess articulatory-based representations of Chinese tones and intonation. English listeners do not. Consequently, no

Fig. 8. A random effects fMRI activation map obtained from comparison of discrimination judgments of intonation in one-syllable utterances (I1) relative to passive listening to the same stimuli (L1) for the Chinese group (zI1 vs. zL1). Left/right sagittal sections reveal increased mSTS activity in the RH, projecting both ventrally and dorsally into the MTG and STG, respectively. pSTG activity shows the opposite hemispheric effect, part of a continuous swath of activation extending caudally from middle regions of the STG/STS. Similar activation foci are also observed in T1 vs. L1, I3 vs. L3, and T3 vs. L3. See also Fig. 4.

The fronto-opercular region (FO, BA 45/13) is activated bilaterally in both language groups (Table 5). Activation levels
are similar across tasks (I1, I3, T1, T3). Recent neuroimaging
studies (Meyer et al., 2002, 2003) also show bilateral FO activation
in a prosodic speech condition in which a speech utterance is
reduced to speech melody by removal of all lexical and syntactic
information. Increased FO activity is presumed to reflect increased
effort in extracting syntactic, lexical-semantic, or slow pitch
information from degraded speech signals (Meyer et al., 2002,
2003), or in discriminating sequences of melodic pitch patterns
(Zatorre et al., 1994). Similarly, our tasks require increased
cognitive effort to extract tone and intonation from the auditory
stream to maintain this information in working memory.

Parietal lobe
There appear to be at least two distinct regions of activation in
the parietal cortex, one located more superiorly (IPS) in the
intraparietal sulcus and adjacent aspects of the superior parietal
lobule, another more inferiorly (IPL) in the anterior supramarginal
gyrus (SMG) near the parietotemporal boundary (cf. Becker et al.,
1999). Our findings show greater activation in the IPS bilaterally in
T1 for the English group compared to Chinese (Table 4). It has
been proposed that this area supports voluntary focusing and
shifting of attentional scanning across activated memory represen-
tations (Chein et al., 2003; Corbetta and Shulman, 2002; Corbetta
et al., 2000; Cowan, 1995; Mazoyer et al., 2002). The efficacy of
selective attention depends on how external stimuli are encoded
into internal phonological representations. English listeners expe-
rienced more difficulty in focusing and shifting of attention in T1
because lexically relevant pitch variations do not occur in English monosyllables.

In contrast, the Chinese group shows left-sided activity in the IPS for T3 (Table 5). This finding replicates our previous study of Chinese tone and intonation (Gandour et al., 2003), reinforcing the view that a left frontoparietal network is recruited for the processing of lexical tones (Li et al., 2003). In T1, listeners extract tone from isolated monosyllables. In T3, they extract tone from a fixed

Fig. 9. Laterality effects for ROIs in the Chinese group only, and in both Chinese and English groups, rendered on a three-dimensional LH template for common reference. In the Chinese group (top panel), IPL, aSTG, and pSTG are left-lateralized (LH > RH) across tasks; aMFG (I1, I3) and IPS (T3) are left-lateralized for specific tasks. In both language groups (bottom panel), mMFG and mSTS are right-lateralized (RH > LH) across tasks (bottom right panel). Other ROIs do not show laterality effects. No ROI elicited either a rightward asymmetry for the Chinese group only, or a leftward asymmetry for both Chinese and English groups. See also Table 5.

activation of this area is observed in the English group. Its LH activity co-occurs with a leftward asymmetry in the pSTG across tasks. Co-activation of the IPL reinforces the view that it is part of an auditory-articulatory processing stream that connects posterior temporal and inferior prefrontal regions. An alternative conceptualization is that the phonological storage component of verbal working memory resides in the IPL (Awh et al., 1996; Paulesu et al., 1993). This notion predicts that both passive listening and verbal working memory tasks should elicit activation in this region, since auditory verbal information has obligatory access to the store (Chein et al., 2003). I1, I3, T1, and T3 were all derived by subtracting their corresponding passive listening control condition. Contrary to fact, this notion would wrongly predict no increased activation in the IPL.

Temporal lobe
The anterior superior temporal gyrus (aSTG) displays an LH advantage in the Chinese group across tasks (Table 5). A reduced RH, rather than increased LH aSTG activation, appears to underlie this hemispheric asymmetry across all tasks. Since intelligible speech is used in all tasks, phonological input alone may be sufficient to explain the leftward asymmetry in the Chinese group (Scott and Johnsrude, 2003; Scott et al., 2000). It is also consistent with the notion that this region maps acoustic-phonetic cues onto linguistic representations as part of a larger auditory-semantic integration circuit in speech perception (Giraud and Price, 2001; Scott and Johnsrude, 2003; Scott et al., 2003). In contrast, English listeners do not have knowledge of these prosodic representations. Consequently, they employ a nonlinguistic pitch processing strategy across tasks and fail to show any hemispheric asymmetry.

A language group effect is not found in hemispheric laterality of the mSTS (BA 22/21). Both groups show greater RH activity in the mSTS across tasks (Table 5). This suggests that this area is sensitive to different acoustic features of the speech signal irrespective of language experience. The rightward asymmetry may reflect shared mechanisms underlying early attentional modulation in processing of complex pitch patterns. In this study, subjects were required to direct their attention to slow modulation of pitch patterns (i.e., ~300–1000 ms) underlying either Chinese tone or intonation. This interpretation is consistent with hemispheric roles hypothesized for auditory processing of complex sounds in the temporal lobe: RH for spectral processing, LH for temporal processing (Poeppel, 2003; Zatorre and Belin, 2001; Zatorre et al., 2002). Moreover, it is consistent with the view that right auditory cortex is most important in the processing of dynamic pitch variation (Johnsrude et al., 2000). Both groups show greater activation in the right mSTS. We therefore infer that this activity reflects a complex aspect of pitch processing that is independent of language experience.

A left asymmetric activation of the posterior part of the superior temporal gyrus (pSTG; BA 22) across tasks is observed in the Chinese group only (Table 5). It has been suggested that the left pSTG, as part of a posterior processing stream, is involved in prelexical processing of phonetic cues and features (Scott, 2003; Scott and Johnsrude, 2003; Scott and Wise, 2003). English listeners, however, show no leftward asymmetry in the pSTG (Table 5). Moreover, they show greater activation bilaterally relative to the Chinese group (Table 4). Therefore, auditory phonetic cues that are of phonological significance in one's native language may be primarily responsible for this leftward asymmetry.

These findings collectively support functional segregation of temporal lobe regions, and their functional integration as part of a temporofrontal network (Davis and Johnsrude, 2003; Scott, 2003; Scott and Johnsrude, 2003; Specht and Reul, 2003). LH networks in the temporal lobe that are sensitive to phonologically relevant parameters from the auditory signal are in anterior and posterior, as opposed to central, regions of the STG/STS (Giraud and Price, 2001). The anterior region appears to be part of an auditory-semantic processing stream, the posterior region part of an auditory-motor processing stream. Both processing streams, in turn, project to convergence areas in the frontal lobe.

Effects of task performance on hemispheric asymmetry

In this study, the BOLD signal magnitude depends on the participant's proficiency in a particular phonological task (Chee et al., 2001). The two groups differ maximally in relative language proficiency: Chinese group, 100%; English group, 0%. As reflected in behavioral measures of task performance (Table 3), perceptual judgments of Chinese tones require more cognitive effort by English monolinguals due to their unfamiliarity with lexical tones. Their unfamiliarity with the Chinese language results in greater BOLD activation for T1 and T3, either bilateral or RH only (cf. Chee et al., 2001). The effect of minimal language proficiency applies only to lexical tone. Intonation, on the other hand, elicits bilateral activation for both groups in the posterior MFG, frontal operculum, and intraparietal sulcus (Table 4; Fig. 4). This common frontoparietal activity implies that processing of intonation requires similar cognitive effort for Chinese and English participants.

Conclusions

Cross-language comparisons provide unique insights into the functional roles of different areas of this cortical network that are recruited for processing different aspects of speech prosody (e.g., auditory, phonological). By using tone and intonation tasks, we are able to distinguish hemispheric roles of areas sensitive to linguistic levels of processing (LH) from those sensitive to lower-level acoustical processing (RH). Rather than attribute processing of speech prosody to RH mechanisms exclusively, our findings suggest that lateralization is influenced by language experience that shapes the internal prosodic representation of an external auditory signal. This emerging model assumes a close interaction between the two hemispheres via the corpus callosum. In sum, we propose a more comprehensive model of speech prosody perception that is mediated primarily by RH regions for complex-sound analysis, but is lateralized to task-dependent regions in the LH when language processing is required.

Acknowledgments

Funding was provided by a research grant from the National Institutes of Health R01 DC04584-04 (JG) and an NIH postdoctoral traineeship (XL). We are grateful to J. Lowe, T. Osborn, and J. Zimmerman for their technical assistance in the MRI laboratory. Portions of this research were presented at the 11th annual meeting of the Cognitive Neuroscience Society, San Francisco, April 2004. Correspondence should be addressed to Jack Gandour, Department of Audiology and Speech Sciences,

Purdue University, West Lafayette, IN 47907-2038, or via email: gandour@purdue.edu.

References

Awh, E., Jonides, J., Smith, E.E., Schumacher, E.H., Koeppe, R.A., Katz, S., 1996. Dissociation of storage and rehearsal in verbal working memory. Psychol. Sci. 7 (1), 25–31.
Baum, S., Pell, M., 1999. The neural bases of prosody: insights from lesion studies and neuroimaging. Aphasiology 13, 581–608.
Becker, J., MacAndrew, D., Fiez, J., 1999. A comment on the functional localization of the phonological storage subsystem of working memory. Brain Cogn. 41, 27–38.
Binder, J., Frost, J., Hammeke, T., Bellgowan, P., Springer, J., Kaufman, J., Possing, E., 2000. Human temporal lobe activation by speech and nonspeech sounds. Cereb. Cortex 10 (5), 512–528.
Blumstein, S., Cooper, W.E., 1974. Hemispheric processing of intonation contours. Cortex 10, 146–158.
Bosch, V., 2000. Statistical analysis of multi-subject fMRI data: assessment of focal activations. J. Magn. Reson. Imaging 11 (1), 61–64.
Brådvik, B., Dravins, C., Holtås, S., Rosen, I., Ryding, E., Ingvar, D., 1991. Disturbances of speech prosody following right hemisphere infarcts. Acta Neurol. Scand. 84 (2), 114–126.
Braver, T.S., Bongiolatti, S.R., 2002. The role of frontopolar cortex in subgoal processing during working memory. NeuroImage 15 (3), 523–536.
Buckner, R.L., Raichle, M.E., Miezin, F.M., Petersen, S.E., 1996. Functional anatomic studies of memory retrieval for auditory words and visual pictures. J. Neurosci. 16 (19), 6219–6235.
Burton, M., 2001. The role of the inferior frontal cortex in phonological processing. Cogn. Sci. 25 (5), 695–709.
Chee, M.W., Hon, N., Lee, H.L., Soon, C.S., 2001. Relative language proficiency modulates BOLD signal change when bilinguals perform semantic judgments. NeuroImage 13 (6 Pt 1), 1155–1163.
Chein, J.M., Ravizza, S.M., Fiez, J.A., 2003. Using neuroimaging to evaluate models of working memory and their implications for language processing. J. Neurolinguist. 16, 315–339.
Corbetta, M., 1998. Frontoparietal cortical networks for directing attention and the eye to visual locations: identical, independent, or overlapping neural systems? Proc. Natl. Acad. Sci. U. S. A. 95 (3), 831–838.
Corbetta, M., Shulman, G.L., 2002. Control of goal-directed and stimulus-driven attention in the brain. Nat. Rev., Neurosci. 3 (3), 201–215.
Corbetta, M., Kincade, J.M., Ollinger, J.M., McAvoy, M.P., Shulman, G.L., 2000. Voluntary orienting is dissociated from target detection in human posterior parietal cortex. Nat. Neurosci. 3 (3), 292–297.
Cowan, N., 1995. Sensory memory and its role in information processing. Electroencephalogr. Clin. Neurophysiol., Suppl. 44, 21–31.
Cox, R.W., 1996. AFNI: software for analysis and visualization of functional magnetic resonance neuroimages. Comput. Biomed. Res. 29 (3), 162–173.
Davis, M.H., Johnsrude, I.S., 2003. Hierarchical processing in spoken language comprehension. J. Neurosci. 23 (8), 3423–3431.
D'Esposito, M., Postle, B.R., Rypma, B., 2000. Prefrontal cortical contributions to working memory: evidence from event-related fMRI studies. Exp. Brain Res. 133 (1), 3–11.
Dogil, G., Ackermann, H., Grodd, W., Haider, H., Kamp, H., Mayer, J., Riecker, A., Wildgruber, D., 2002. The speaking brain: a tutorial introduction to fMRI experiments in the production of speech, prosody and syntax. J. Neurolinguist. 15, 59–90.
Eng, N., Obler, L., Harris, K., Abramson, A., 1996. Tone perception deficits in Chinese-speaking Broca's aphasics. Aphasiology 10, 649–656.
Friederici, A.D., Alter, K., 2004. Lateralization of auditory language functions: a dynamic dual pathway model. Brain Lang. 89 (2), 267–276.
Gandour, J., Dardarananda, R., 1983. Identification of tonal contrasts in Thai aphasic patients. Brain Lang. 18 (1), 98–114.
Gandour, J., Wong, D., Hsieh, L., Weinzapfel, B., Van Lancker, D., Hutchins, G.D., 2000. A crosslinguistic PET study of tone perception. J. Cogn. Neurosci. 12 (1), 207–222.
Gandour, J., Wong, D., Lowe, M., Dzemidzic, M., Satthamnuwong, N., Tong, Y., Li, X., 2002. A cross-linguistic FMRI study of spectral and temporal cues underlying phonological processing. J. Cogn. Neurosci. 14 (7), 1076–1087.
Gandour, J., Dzemidzic, M., Wong, D., Lowe, M., Tong, Y., Hsieh, L., Satthamnuwong, N., Lurito, J., 2003. Temporal integration of speech prosody is shaped by language experience: an fMRI study. Brain Lang. 84 (3), 318–336.
George, M.S., Parekh, P.I., Rosinsky, N., Ketter, T.A., Kimbrell, T.A., Heilman, K.M., Herscovitch, P., Post, R.M., 1996. Understanding emotional prosody activates right hemisphere regions. Arch. Neurol. 53 (7), 665–670.
Giraud, A.L., Price, C.J., 2001. The constraints functional neuroimaging places on classical models of auditory word processing. J. Cogn. Neurosci. 13 (6), 754–765.
Gruber, O., 2001. Effects of domain-specific interference on brain activation associated with verbal working memory task performance. Cereb. Cortex 11 (11), 1047–1055.
Hackett, T.A., Stepniewska, I., Kaas, J.H., 1999. Prefrontal connections of the parabelt auditory cortex in macaque monkeys. Brain Res. 817 (1–2), 45–58.
Hickok, G., Poeppel, D., 2000. Towards a functional neuroanatomy of speech perception. Trends Cogn. Sci. 4 (4), 131–138.
Hickok, G., Buchsbaum, B., Humphries, C., Muftuler, T., 2003. Auditory-motor interaction revealed by fMRI: speech, music, and working memory in area Spt. J. Cogn. Neurosci. 15 (5), 673–682.
Howie, J.M., 1976. Acoustical Studies of Mandarin Vowels and Tones. Cambridge University Press, New York.
Hsieh, L., Gandour, J., Wong, D., Hutchins, G.D., 2001. Functional heterogeneity of inferior frontal gyrus is shaped by linguistic experience. Brain Lang. 76 (3), 227–252.
Hughes, C.P., Chan, J.L., Su, M.S., 1983. Aprosodia in Chinese patients with right cerebral hemisphere lesions. Arch. Neurol. 40 (12), 732–736.
Ide, A., Dolezal, C., Fernandez, M., Labbe, E., Mandujano, R., Montes, S., Segura, P., Verschae, G., Yarmuch, P., Aboitiz, F., 1999. Hemispheric differences in variability of fissural patterns in parasylvian and cingulate regions of human brains. J. Comp. Neurol. 410 (2), 235–242.
Ivry, R., Robertson, L., 1998. The Two Sides of Perception. MIT Press, Cambridge, MA.
Jacquemot, C., Pallier, C., LeBihan, D., Dehaene, S., Dupoux, E., 2003. Phonological grammar shapes the auditory cortex: a functional magnetic resonance imaging study. J. Neurosci. 23 (29), 9541–9546.
Johnsrude, I.S., Penhune, V.B., Zatorre, R.J., 2000. Functional specificity in the right human auditory cortex for perceiving pitch direction. Brain 123 (Pt 1), 155–163.
Jonides, J., Schumacher, E.H., Smith, E.E., Koeppe, R.A., Awh, E., Reuter-Lorenz, P.A., Marshuetz, C., Willis, C.R., 1998. The role of parietal cortex in verbal working memory. J. Neurosci. 18 (13), 5026–5034.
Klein, D., Zatorre, R., Milner, B., Zhao, V., 2001. A cross-linguistic PET study of tone perception in Mandarin Chinese and English speakers. NeuroImage 13 (4), 646–653.
Knight, R.T., Staines, W.R., Swick, D., Chao, L.L., 1999. Prefrontal cortex regulates inhibition and excitation in distributed neural networks. Acta Psychol. (Amst.) 101 (2–3), 159–178.
Koechlin, E., Basso, G., Pietrini, P., Panzer, S., Grafman, J., 1999. The role of the anterior prefrontal cortex in human cognition. Nature 399 (6732), 148–151.
Li, X., Gandour, J., Talavage, T., Wong, D., Dzemidzic, M., Lowe, M., Tong, Y., 2003. Selective attention to lexical tones recruits left dorsal frontoparietal network. NeuroReport 14 (17), 2263–2266.
MacDonald III, A.W., Cohen, J.D., Stenger, V.A., Carter, C.S., 2000. Dissociating the role of the dorsolateral prefrontal and anterior cingulate cortex in cognitive control. Science 288 (5472), 1835–1838.
Mazoyer, P., Wicker, B., Fonlupt, P., 2002. A neural network elicited by parametric manipulation of the attention load. NeuroReport 13 (17), 2331–2334.
Mesulam, M.M., 1981. A cortical network for directed attention and unilateral neglect. Ann. Neurol. 10 (4), 309–325.
Meyer, M., Alter, K., Friederici, A.D., Lohmann, G., von Cramon, D.Y., 2002. fMRI reveals brain regions mediating slow prosodic modulations in spoken sentences. Hum. Brain Mapp. 17 (2), 73–88.
Meyer, M., Alter, K., Friederici, A.D., 2003. Functional MR imaging exposes differential brain responses to syntax and prosody during auditory sentence comprehension. J. Neurolinguist. 16, 277–300.
Moen, I., 1993. Functional lateralization of the perception of Norwegian word tones—evidence from a dichotic listening experiment. Brain Lang. 44 (4), 400–413.
Newman, S.D., Just, M.A., Carpenter, P.A., 2002. The synchronization of the human cortical working memory network. NeuroImage 15 (4), 810–822.
Oldfield, R.C., 1971. The assessment and analysis of handedness: the Edinburgh inventory. Neuropsychologia 9 (1), 97–113.
Paulesu, E., Frith, C.D., Frackowiak, R.S., 1993. The neural correlates of the verbal component of working memory. Nature 362 (6418), 342–345.
Pell, M.D., 1998. Recognition of prosody following unilateral brain lesion: influence of functional and structural attributes of prosodic contours. Neuropsychologia 36 (8), 701–715.
Pell, M.D., Baum, S.R., 1997. The ability to perceive and comprehend intonation in linguistic and affective contexts by brain-damaged adults. Brain Lang. 57 (1), 80–99.
Petrides, M., Pandya, D.N., 1984. Association fiber pathways to the frontal cortex from the superior temporal region in the rhesus monkey. J. Comp. Neurol. 273, 52–66.
Plante, E., Creusere, M., Sabin, C., 2002. Dissociating sentential prosody from sentence processing: activation interacts with task demands. NeuroImage 17 (1), 401–410.
Poeppel, D., 2003. The analysis of speech in different temporal integration windows: cerebral lateralization as "asymmetric sampling in time". Speech Commun. 41, 245–255.
Scott, S.K., Blank, C.C., Rosen, S., Wise, R.J., 2000. Identification of a pathway for intelligible speech in the left temporal lobe. Brain 123 (Pt 12), 2400–2406.
Scott, S.K., Leff, A.P., Wise, R.J., 2003. Going beyond the information given: a neural system supporting semantic interpretation. NeuroImage 19 (3), 870–876.
Shaywitz, B.A., Shaywitz, S.E., Pugh, K.R., Fulbright, R.K., Skudlarski, P., Mencl, W.E., Constable, R.T., Marchione, K.E., Fletcher, J.M., Klorman, R., et al., 2001. The functional neural architecture of components of attention in language-processing tasks. NeuroImage 13 (4), 601–612.
Shen, X.-N., 1990. The Prosody of Mandarin Chinese. University of California Press, Berkeley, CA.
Shipley-Brown, F., Dingwall, W.O., Berlin, C.I., Yeni-Komshian, G., Gordon-Salant, S., 1988. Hemispheric processing of affective and linguistic intonation contours in normal subjects. Brain Lang. 33 (1), 16–26.
Shulman, G.L., d'Avossa, G., Tansy, A.P., Corbetta, M., 2002. Two attentional processes in the parietal lobe. Cereb. Cortex 12 (11), 1124–1131.
Smith, E.E., Jonides, J., 1999. Storage and executive processes in the frontal lobes. Science 283, 1657–1661.
Specht, K., Reul, J., 2003. Functional segregation of the temporal lobes into highly differentiated subsystems for auditory perception: an auditory rapid event-related fMRI-task. NeuroImage 20 (4), 1944–1954.
Talairach, J., Tournoux, P., 1988. Co-planar Stereotaxic Atlas of the Human Brain: 3-Dimensional Proportional System: An Approach to Cerebral Imaging. Thieme Medical Publishers, New York.
Van Lancker, D., 1980. Cerebral lateralization of pitch cues in the linguistic signal. Pap. Linguist. 13 (2), 201–277.
Van Lancker, D., Fromkin, V., 1973. Hemispheric specialization for pitch and tone: evidence from Thai. J. Phon. 1, 101–109.
Wang, Y., Jongman, A., Sereno, J., 2001. Dichotic perception of Mandarin tones by Chinese and American listeners. Brain Lang. 78, 332–348.
Weintraub, S., Mesulam, M.M., Kramer, L., 1981. Disturbances in prosody. A right-hemisphere contribution to language. Arch. Neurol. 38 (12), 742–744.
Wildgruber, D., Pihan, H., Ackermann, H., Erb, M., Grodd, W., 2002.
windows: cerebral lateralization as ‘asymmetric sampling in time’. Dynamic brain activation during processing of emotional intonation:
Speech Commun. 41 (1), 245 – 255. influence of acoustic parameters, emotional valence, and sex. Neuro-
Rauschecker, J.P., Tian, B., 2000. Mechanisms and streams for processing Image 15 (4), 856 – 869.
of ‘‘what’’ and ‘‘where’’ in auditory cortex. Proc. Natl. Acad. Sci. U. S. Wise, R.J., Scott, S.K., Blank, S.C., Mummery, C.J., Murphy, K., Warbur-
A. 97 (22), 11800 – 11806. ton, E.A., 2001. Separate neural subsystems within ‘Wernicke’s area’.
Romanski, L.M., Bates, J.F., Goldman-Rakic, P.S., 1999a. Auditory belt Brain 124 (Pt 1), 83 – 95.
and parabelt projections to the prefrontal cortex in the rhesus monkey. Yiu, E., Fok, A., 1995. Lexical tone disruption in Cantonese aphasic
J. Comp. Neurol. 403 (2), 141 – 157. speakers. Clin. Linguist. Phon. 9, 79 – 92.
Romanski, L.M., Tian, B., Fritz, J., Mishkin, M., Goldman-Rakic, P.S., Yuan, J., Shih, C., Kochanski, G., 2002. Comparison of declarative and
Rauschecker, J.P., 1999b. Dual streams of auditory afferents target mul- interrogative intonation in Chinese. In: Bel, B., Marlien, I. (Eds.), Pro-
tiple domains in the primate prefrontal cortex. Nat. Neurosci. 2 (12), ceedings of the First International Conference on Speech Prosody. Aix-
1131 – 1136. en-Provence, France, pp. 711 – 714 (April).
Schacter, D.L., Alpert, N.M., Savage, C.R., Rauch, S.L., Albert, M.S., Zatorre, R.J., Belin, P., 2001. Spectral and temporal processing in human
1996. Conscious recollection and the human hippocampal formation: auditory cortex. Cereb. Cortex 11 (10), 946 – 953.
evidence from positron emission tomography. Proc. Natl. Acad. Sci. Zatorre, R., Samson, S., 1991. Role of the right temporal neocortex in
U. S. A. 93 (1), 321 – 325. retention of pitch in auditory short-term memory. Brain 114 (Pt 6),
Schwartz, J., Tallal, P., 1980. Rate of acoustic change may underlie hemi- 2403 – 2417.
spheric specialization for speech perception. Science 207, 1380 – 1381. Zatorre, R.J., Evans, A.C., Meyer, E., 1994. Neural mechanisms under-
Scott, S., 2003. How might we conceptualize speech perception? The view lying melodic perception and memory for pitch. J. Neurosci. 14 (4),
from neurobiology. J. Phon. 31, 417 – 422. 1908 – 1919.
Scott, S.K., Johnsrude, I.S., 2003. The neuroanatomical and functional Zatorre, R.J., Mondor, T.A., Evans, A.C., 1999. Auditory attention to space
organization of speech perception. Trends Neurosci. 26 (2), 100 – 107. and frequency activates similar cerebral systems. NeuroImage 10 (5),
Scott, S.K., Wise, R., 2003. PET and fMRI studies of the neural basis of 544 – 554.
speech perception. Speech Commun. 41, 23 – 34. Zatorre, R.J., Belin, P., Penhune, V.B., 2002. Structure and function of
Scott, S.K., Blank, C.C., Rosen, S., Wise, R.J., 2000. Identification of a auditory cortex: music and speech. Trends Cogn. Sci. 6 (1), 37 – 46.