Psychophysiology, 42 (2005), 180–190. Blackwell Publishing Inc. Printed in the USA.

Copyright © 2005 Society for Psychophysiological Research


DOI: 10.1111/j.1469-8986.2005.00272.x

Fine-tuning of auditory cortex during speech production

THEDA H. HEINKS-MALDONADO,a,b,c,d DANIEL H. MATHALON,e,f MAX GRAY,a,b and JUDITH M. FORDa,b,e,f

a Department of Psychiatry & Behavioral Sciences, Stanford University School of Medicine, Stanford, California, USA
b Psychiatry Service, Veterans Affairs Palo Alto Health Care System, Palo Alto, California, USA
c Department of Neuropsychology, Albert-Ludwigs-Universität, Freiburg, Germany
d Department of Otolaryngology, University of California, San Francisco, San Francisco, California, USA
e Department of Psychiatry, Yale University School of Medicine, New Haven, Connecticut, USA
f Psychiatry Service, Veterans Affairs West Haven Health Care System, West Haven, Connecticut, USA

Abstract
The cortex suppresses sensory information when it is the result of a self-produced motor act, including the motor act of
speaking. The specificity of the auditory cortical suppression to self-produced speech, a prediction derived from the
posited operation of a precise forward model system, has not been established. We examined the auditory N100
component of the event-related brain potential elicited during speech production. While subjects uttered a vowel, they
heard real-time feedback of their unaltered voice, their pitch-shifted voice, or an alien voice substituted for their own.
The subjects’ own unaltered voice feedback elicited a dampened auditory N100 response relative to the N100 elicited by
altered or alien auditory feedback. This is consistent with the operation of a precise forward model modulating the
auditory cortical response to self-generated speech and allowing immediate distinction of self and externally generated
auditory stimuli.
Descriptors: Efference copy, Event-related potential (ERP), N100, Auditory feedback, Speech production

This research was supported by NIH grants MH 58262 and MH067967, NARSAD, the Department of Veterans Affairs, and the German National Merit Foundation. We thank J. Houde, S. Nagarajan, W. Roth, A. Maldonado, and U. Halsband for their advice and assistance.

Address reprint requests to: Judith M. Ford, Ph.D., Psychiatry Service 116A, VA Healthcare System, 950 Campbell Avenue, West Haven, CT 06516, USA. E-mail: judith.ford@yale.edu.

Sensory stimulation resulting from self-initiated actions is experienced differently than stimulation produced by an external source. When we move our eyes, we do not perceive a moving room, and even ticklish people cannot seem to tickle themselves (Blakemore, Wolpert, & Frith, 1998; Weiskrantz, Elliot, & Darlington, 1971), perhaps because the brain processes the sensory consequences of self-initiated actions differently from externally generated sensory input. It is as if the brain expects the sensory consequence of self-initiated action, enabling us to distinguish potentially important external events from stimulation that results from our own motor acts. It has been proposed that information about motor commands is transmitted in a forward system to make this distinction (Jeannerod, 1988; Wolpert, 1997; Wolpert, Ghahramani, & Jordan, 1995). These forward or "re-afference hypothesis" models propose that there is an efference copy of the motor commands used to predict the sensory consequences (corollary discharge) of the action (Hein & Held, 1962; Sperry, 1950; von Holst & Mittelstädt, 1950). A subtractive comparison of this corollary discharge with the actual sensory feedback associated with the action ("re-afference") provides a mechanism for filtering sensory information. When there is a match between the predicted and actual sensory feedback, a net cancellation of sensory input results, leading to a dampened sensory experience. When these signals do not match, or when there is no corollary discharge to cancel the sensory feedback (as occurs when sensory stimulation results from external events), sensory experience is intensified, alerting us to potentially important environmental events.

Although this forward model has been applied most often in the visual or tactile modality, it can also be applied to other sensory responses that are affected by self-generated actions, such as the auditory response to self-produced speech. Early evidence for the effect of vocal production on the auditory system came from animal studies in bats, birds, and monkeys. For example, in bats a 15-dB attenuation of the responses in the lateral lemniscus of the midbrain has been found during vocalization (Suga & Schlegel, 1972; Suga & Shimozawa, 1974). In monkeys, activity in the auditory cortex is inhibited during vocalizations (Eliades & Wang, 2003; Müller-Preuss & Ploog, 1981).

In humans, the results are less consistent; however, there have been reports of dampened temporal lobe responsiveness during speech production (Creutzfeldt, Ojemann, & Lettich, 1989a, 1989b). Magnetoencephalography (MEG) recordings have shown that auditory cortical responses to self-produced speech are attenuated when compared to responses to tape-recorded speech (Curio, Neuloh, Numminen, Jousmäki, & Hari, 2000; Houde, Nagarajan, Sekihara, & Merzenich, 2002; Numminen & Curio, 1999; Numminen, Salmelin, & Hari, 1999; Pantev, Eulitz,

Elbert, & Hoke, 1994). For example, studies by Curio et al. and Houde et al. showed a reduction of the auditory M100 component to speech sounds when subjects spoke them relative to when they heard them played back. Using electroencephalography (EEG) we showed a similar effect on the auditory N100 (N1) component of the event-related brain potential (ERP; Ford, Mathalon, Heinks, Kalba, & Roth, 2001), with the N100 response being smaller for speech during its production than during its playback; both the magnetic M100 and the EEG N100 have a dominant source in auditory cortex and its immediate environs (Hari et al., 1987; Krumbholz, Patterson, Seither-Preisler, Lammertmann, & Lütkenhöner, 2003; Ozaki et al., 2003; Pantev, Eulitz, Hampson, Ross, & Roberts, 1996; Reite et al., 1994; Sams et al., 1985).

These results could be accounted for by the operation of a forward model in which an efference copy of the speech commands and a corollary discharge representing their predicted auditory consequences modulate the responsiveness of the auditory cortex. However, these results could also be explained by a general gating or dampening of all incoming auditory stimulation during self-generated speech. Support for this general dampening hypothesis comes from ERP (Ford, Mathalon, Kalba, et al., 2001) and MEG (Houde et al., 2002) studies showing the auditory cortical response to sound probes (e.g., phonemes, white noise bursts, tone pips) is attenuated when probes are presented while subjects produce speech relative to when subjects passively listen to the probes. However, these studies also showed that the cortical response to sound probes was similarly attenuated when these probes were presented during the tape-recorded playback of the speech produced by subjects. Whereas Ford, Mathalon, Kalba, et al. showed the response to sound probes to be equally attenuated by speaking and listening to playback of recorded speech, Houde et al. showed a very small decrement in the M100 to sound probes during speech production relative to speech playback. Indeed, the fact that this decrement was quite small in comparison with the large suppression of the M100 to speech itself during its production relative to its playback led Houde et al. to conclude that general dampening of the auditory cortex during speech production was, at best, a negligible effect. Instead, Houde et al. argued that their data provided strong support for the forward model hypothesis in which sensory stimulation (in this case, the auditory re-afference) is specifically suppressed to the extent that it matches the predicted sensory consequences (i.e., expected sound) associated with the efference copy of the motor act (i.e., speech commands).

Sensorimotor studies in the somatosensory system (Blakemore et al., 1998; Blakemore, Wolpert, & Frith, 2000; Weiskrantz et al., 1971) provide evidence for a precise forward model in which sensory stimulation has to correspond accurately to the movement producing it in order to attenuate its perception, with the amount of perceptual attenuation being proportional to the accuracy of the sensory prediction. For example, in a study reported by Blakemore, Frith, and Wolpert (1999) subjects were asked to rate the sensation of self-produced tactile stimulation. When varied degrees of delay or trajectory rotation between the subject's movement and the resultant tactile stimulation were introduced, the tactile sensation was rated as more intense than when there was no externally manipulated alteration of the movement. Furthermore, the subjects reported incremental increases in perceived intensity as the delay or the trajectory rotation was parametrically increased.

Support for a similarly precise forward model effect in the auditory modality during speech production comes from several MEG (Curio et al., 2000; Houde et al., 2002) and ERP (Ford, Mathalon, Heinks, et al., 2001; Ford, Mathalon, Kalba, et al., 2001) studies, as mentioned above. However, conclusions from these studies are limited by their reliance on an experimental approach involving the comparison of auditory responses to speech during its production relative to its playback. This approach is associated with a potential confound: Although the loudness of the played back speech was matched to the loudness of the speech as it was being spoken, the speech sounds may have differed in quality due to properties of bone conduction, middle ear muscle contraction, and the response characteristics of the ear. Thus, dampened cortical responsiveness during speaking relative to playback could, in part, be due to the different physical qualities of the sounds.

Another approach to testing the hypothesis that a precise forward model operates in the auditory system is to manipulate the re-afferent auditory feedback that subjects hear as they produce speech. Alteration of the auditory feedback experienced during speech allows for a direct test of the prediction, derived from the precise forward model hypothesis, that auditory cortical dampening during speech is greater to the exact speech sound and less to sounds that do not match it. There is evidence from PET studies (Hirano et al., 1997) that during talking unaltered and altered auditory feedback (either by distortion or time-delay) activate different brain regions. However, PET studies are not able to reveal the temporal dynamics of activity on a millisecond scale like EEG or MEG. Houde et al. (2002) used MEG to compare the M100 to speech production versus speech playback in two different experiments, one involving accurate acoustic delivery of the speech sounds and the other involving the addition of white noise that coincided with and effectively masked speech sounds during their production and playback. The authors found that the M100 suppression observed during speech production relative to playback was abolished when subjects heard white noise instead of the expected voice feedback. Although these results may show some specificity of the cortical responsiveness, white noise is far different from speech and produces widespread activation of the auditory cortex. Thus, the results do not necessarily show the precision of the sensory attenuation during speaking. Moreover, the Houde et al. results were potentially confounded by their reliance on direct comparisons of spoken versus played-back speech, as discussed above.

Accordingly, our goal was to design an experiment to investigate the precision of the forward model hypothesis by assessing modulations of cortical responsiveness to speech sounds during their production without having to compare them to their playback. To this end, we altered the re-afferent auditory feedback associated with self-produced speech, allowing us to examine the degree to which suppression of the auditory cortical response depends on the closeness of the match between the auditory feedback and the predicted feedback (Figure 1). EEG was recorded while the subjects produced speech sounds and heard real-time feedback of either their unaltered speech, their pitch-shifted speech, or the voice of another person. The N100 component of the auditory ERP was compared for these different speech-feedback conditions. Two tasks were conducted. First, we tested the specificity of signal attenuation during speech production by presenting the subjects with the different feedback conditions. Second, we tested whether there was also a difference in cortical activity to the different speech conditions when the

[Figure 1 appears here: a block diagram in which the sensorimotor command drives the vocal apparatus while an efference copy feeds a predictor (an internal model of the vocal apparatus and of environmental influences); the predicted sensory feedback (corollary discharge) is compared with the actual sensory feedback, which may arrive unaltered, pitch-shifted, or replaced by the alien voice, yielding either no discrepancy or a discrepancy at the comparator.]

Figure 1. A model for determining the auditory consequences of speaking. An internal forward model makes predictions of the auditory feedback (corollary discharge) based on a copy of the motor command (efference copy). These predictions are then compared with the actual auditory feedback (re-afference). Self-produced speech sounds can be correctly predicted on the basis of the efference copy and are associated with little or no sensory discrepancy resulting from the comparison between predicted and actual feedback. This results in suppression of auditory cortex to the self-produced sound, as can be seen by a reduced N100 amplitude. When the actual feedback does not match the predicted feedback (by altering the feedback), the discrepancy increases and so does the likelihood that the sound is externally produced. As a result the cortical suppression decreases and the N100 amplitude increases. Such a system would allow canceling out the effects of self-produced speech and thereby distinguishing sounds due to self-produced speech from auditory feedback caused by the environment.
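The comparator at the heart of this model can be illustrated with a small sketch in plain Python. The "voice feature" vectors, the distance measure, and the linear suppression rule are illustrative assumptions for exposition only, not the authors' implementation:

```python
import math

def n100_response(predicted, actual, scale=1.0):
    """Toy forward-model comparator: the smaller the discrepancy between
    the corollary-discharge prediction and the actual auditory feedback,
    the more the simulated N100 response is suppressed. `scale` is an
    arbitrary unit controlling how quickly suppression falls off."""
    discrepancy = math.dist(predicted, actual)
    suppression = max(0.0, 1.0 - discrepancy / scale)  # 1.0 = exact match
    return 1.0 - suppression  # relative N100 amplitude (0 = fully suppressed)

# Hypothetical two-dimensional voice feature vectors (invented values).
self_voice    = (1.0, 0.5)   # own unaltered voice = what the model predicts
pitch_shifted = (1.0, 0.8)   # own voice, pitch-shifted
alien_voice   = (0.2, 1.5)   # another speaker

print(n100_response(self_voice, self_voice))     # 0.0 (maximal suppression)
print(n100_response(self_voice, pitch_shifted))  # ≈ 0.3 (partial suppression)
print(n100_response(self_voice, alien_voice))    # 1.0 (no suppression)
```

The ordering of the three outputs, not their particular values, is what the model predicts: the closer the feedback is to the prediction, the smaller the response.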

subject was passively listening to playback of the recorded speech. Data from these tasks were also compared to each other.

Methods

Participants

We recorded ERPs from 17 men (ages 21–48) who were fluent English speakers and naïve to the purpose of this study. After giving informed consent and passing a hearing test, each participant took part in the acclimation phase followed by two runs of the speaking task and two runs of the listening task.

Tasks

The experiment started with an acclimation phase in which participants produced the vowel [a:] while being made aware of the various feedback conditions. In the speaking task, participants were told to utter a short [a:] about every 5 s. The feedback voice participants heard over headphones was varied randomly between their own unaltered voice (self-unaltered), their own voice pitch-shifted downward by two semitones (self-pitch shifted), the alien unaltered voice (alien-unaltered), or the alien voice pitch-shifted downward by two semitones (alien-pitch shifted). As suggested by Shuster and Durrant (2002), the self-unaltered voice needed to be pitch-shifted down 0.3 semitones to best match the subjective experience of self-generated speech. After each trial, participants were prompted to indicate via button press whether the feedback heard was their own voice, the alien voice, or whether they were unsure. Participants were required to respond within 1.5 s after the prompt. Responses falling outside that window were considered misses. Participants were told in the instructions that their own or the alien voice would sometimes be pitch-shifted, but that they were still required to decide whether its source was their own or the alien voice. These behavioral responses were collected to assess whether participants were actually able to distinguish between the sources of the different types of feedback.

Visual stimuli presented on a computer monitor were used to prompt the participant to speak or respond. To avoid an overlap of visual and auditory responses, the participants were instructed to speak after the disappearance of the visual cue on the screen. Considering an average vocal response time of about 200–500 ms, the auditory N100 should be relatively free of the influence of visual ERP components. To further avoid an overlap of the N100

with a motor potential associated with the subsequent button press, a stimulus was displayed on the screen 1.5 s after the onset of each speech sound to prompt the participants to indicate their response with a button press. All trials containing early responses or early speech sounds were excluded from analysis.

In the listening task, the recorded feedback sounds from the speaking task were played back, and participants were instructed to merely listen and then decide about the source of the voice heard. All other features remained the same as in the speaking task, including the same visual cues and volume. The listening task was carried out to replicate the approach of other studies comparing cortical responses during speaking and listening, as well as to determine whether there were differential effects of the feedback conditions when merely listening. Each task consisted of 240 trials with 60 trials per condition.

Instrumentation

To create the different feedback conditions, we used an audio presentation system (Figure 2) that allowed us to detect the participant's vocalization and, in real time, modulate the participant's voice or substitute it with a prerecorded speech sample of a male voice ("alien"). When the participant vocalized, the speech signal was picked up by a microphone and sent through a preamplifier to a personal computer equipped with sound processing software and hardware. The incoming audio signal was used to generate a trigger pulse that initiated either a pitch shift or the alien voice sample (as shown in Figure 2) that was amplified and played to the participant via headphones.

The analog audio system consisted of an Audix OM2 microphone, Nady MM4 mini-mixer, RCA SA155 stereo amplifier, and audio-technica ATH-M40fs studiophones. Digital processing was accomplished with the Reaktor software program (Native Instruments) running on a Gateway PC (MS Windows 2000, 800 MHz) with an M-Audio Audiophile 2496 sound card. The digital sampling rate of the soundcard was 44,100 Hz, and the ASIO drivers delivered 88 samples/processing bin. This, combined with a 1.25-ms software control rate, allowed us to detect and modulate in real time the participant's vocalizations through the digital processing stream with only 6 ms of delay as measured with a Tektronix oscilloscope. A delay this small is not perceptible (Lee, 1950; Stone & Moore, 1999) and it is unlikely to influence the participant's performance or the ERP amplitudes or latencies.

A trigger pulse, signaling onset of vocalization, was generated within the software program on the rising edge of the rectified and low-pass filtered channel of the split incoming audio signal. This internal trigger pulse initiated all other software processing, including modification of the original incoming audio channel, resetting of the trigger production module, and insertion of a trigger code in the EEG data collection system. The rectified and filtered signal was also used internally to drive an envelope follower that modulated the processed output signal to match the incoming audio signal in amplitude and duration. The average duration of the participants' vocalizations was approximately 350 ms.

The mean SPL of the participants' utterances was 76 dB measured at a 5 cm distance from the participants' mouths. During both the speaking and the listening tasks, the mean sound pressure level (SPL) of the speech sounds played back over the headphones was increased 15 dB over the average SPL of the participant's speech. This was necessary to mask the effect of bone conduction during vocalization. The SPL measurements were made directly at the headphone using a special coupler to connect the SPL meter and the headphones.

[Figure 2 appears here: a diagram of the audio system in which the spoken "ah" is preamplified and routed to the headphone amplifier either unaltered, pitch-shifted, or replaced by the alien voice (unaltered or pitch-shifted).]

Figure 2. The audio system used in the experiments. Description: see text.

Data Acquisition and Processing

We acquired EEG data continuously from 27 sites (F7, F3, Fz, F4, F8, FT7, FC3, FCz, FC4, FT8, T7, C3, Cz, C4, T8, TP7, CP3, CPz, CP4, TP8, P7, P3, Pz, P4, P8, Tp9, Tp10) referenced to the nose. Additional electrodes were placed on the outer canthi of the eyes to measure horizontal eye movements, and above and below the right eye to monitor blinks and vertical eye movements. Epochs were synchronized to vocalization onset and corrected for eye movements and blinks (Gratton, Coles, & Donchin, 1983), and then re-referenced relative to the mastoid electrodes to minimize artifacts associated with talking as well as to be consistent with the reference sites used in our prior studies (Ford, Mathalon, Heinks, et al., 2001; Ford, Mathalon, Kalba, et al., 2001). After rejecting trials containing artifacts (voltages exceeding ±50 µV), averages using only correctly identified trials were created and then band-pass filtered 0.5–12 Hz. Averages containing fewer than 15 trials were not included in the statistical analyses.

N100 was defined as the most negative peak between 80 and 120 ms following the onset of the speech sound and was measured relative to a baseline of 150 ms prior to stimulus onset.

Statistics

Repeated-measures analyses of variance (ANOVA) were conducted to examine effects of Task (speaking, listening), Source (self, alien), and Pitch (unaltered, pitch-shifted) on the accuracy of participants' judgments regarding the source of the speech sounds they heard. Analysis of the ERP N100 data was guided by the forward model theory, which posits N100 attenuation to the auditory re-afference when it matches the expected sound associated with the produced speech (i.e., self-unaltered feedback), and by MEG studies reporting hemispheric lateralization of the suppression effect. Thus, N100 amplitudes were analyzed in a four-way ANOVA including factors of Task (speaking, listening), Condition (self unaltered, self pitch-shifted, alien unaltered, alien pitch-shifted), Laterality (left, right), and Electrode Site. We included 20 electrode sites in the analysis, 10 for each hemisphere.

Results

Behavioral Data

Three-way (Task × Source × Pitch) ANOVAs were used to assess the accuracy of subjects' responses (percent correct) and the role of errors (percent incorrect), uncertainty (percent

unsure), and misses (percent failures to respond in the 1.5-s response window) in reducing response accuracy (see Figure 3). One subject was excluded from the analysis of the behavioral and the ERP data, since his percentage of missed responses was extraordinarily high (45% vs. 6% in the remaining subjects), which was probably due to a technical failure.

For the percent correct responses, there were significant effects of Source, F(1,15) = 8.07, p = .012, and Pitch, F(1,15) = 19.6, p = .000. The Source effect indicated that subjects were relatively more accurate in identifying their own voice as their own than in identifying the alien voice as alien. The Pitch effect indicated that subjects were more accurate in identifying both their own voice and the alien voice when the auditory feedback was unaltered relative to when it was pitch-shifted. A similar pattern can be observed in terms of misidentification errors (percent incorrect): The ANOVA showed significant main effects for Source, F(1,15) = 5.29, p = .036, and Pitch, F(1,15) = 7.97, p = .013. Further, a trend was observed for Task, F(1,15) = 4.03, p = .063, indicating a tendency to respond more often incorrectly during the speaking task. The response uncertainty (percent unsure) ANOVA also showed significant effects for Source, F(1,15) = 5.7, p = .031, and Pitch, F(1,15) = 8.24, p = .012, indicating that subjects were unsure more often when the voice source was alien than when it was self and that response uncertainty increased when a pitch-shift occurred. Finally, the number of misses was not affected by any of the factors.

Figure 3. Means of percent correct, unsure, incorrect, and missed responses to the four conditions during speaking and listening.

ERP Data

N100 amplitudes at 20 sites were analyzed in a four-way ANOVA including factors of Task (speaking, listening), Condition (self unaltered, self pitch-shifted, alien unaltered, alien pitch-shifted), Laterality (left, right), and Electrode Site. The ERPs are plotted in Figures 4 and 5. This ANOVA revealed a significant main effect of Task, F(1,15) = 20.85, p = .000, and a Task × Condition × Laterality interaction, F(3,45) = 3.32, p = .041.

To further investigate this interaction, the data were then examined separately for each task and hemisphere. The separate ANOVA for speaking (with the factors Condition, Laterality, and Electrode Site) showed a significant main effect of Condition, F(3,45) = 3.22, p = .037, indicating the smallest N100 amplitudes and biggest suppression during the self unaltered condition, but no significant difference between the left and the right sites. The ANOVA for listening did not show any significant main or interaction effects.¹

We further performed two separate ANOVAs for the left sites and the right sites with the factors Task, Condition, and Electrode Site. For the left sites we found a significant main effect of Task, F(1,15) = 17.61, p = .001, and a significant Task × Condition interaction, F(3,45) = 2.86, p = .047, indicating that the difference between speaking and listening (or rather the suppression during speech production as compared to passive listening) on the left is biggest during the self unaltered condition. This is not the case for the right sites: The ANOVA for the right sites resulted in a significant main effect for Task only, F(1,15) = 22.47, p = .000.

An ANOVA of N100 latencies revealed no significant effects.

Earlier, we found that N100 to a noise probe or to a syllable spoken by another person was smaller during speaking than during listening, even though the eliciting stimulus did not match what was being said by the subject (Ford, Mathalon, Kalba, et al., 2001). To address whether the speaking–listening difference was greatest when the eliciting sound precisely matched the produced sound (i.e., in the self-unaltered condition), a two-way (Condition [self-unaltered, self-pitch shifted, alien-unaltered, alien-pitch shifted] × Electrode Site) repeated-measures ANOVA was performed on the N100 speaking–listening difference scores from the left electrode sites (F7, F3, FT7, FC3, T7, C3, TP7, CP3, P7, P3), which revealed a significant Condition effect, F(3,45) = 2.86, p = .047 (Figure 6). Because we intended to investigate our hypothesis that the first condition (self-unaltered) differed significantly from all the other conditions, that is, that the N100 during unaltered feedback of one's own voice was most suppressed as compared to the other feedback types, the degrees of freedom for the condition effect were parsed into three single-degree-of-freedom Helmert contrasts. Helmert contrasts compare the mean of each level of the factor Condition (except the last) to the mean of the subsequent levels, with the first contrast being of primary interest: self unaltered vs. mean (self pitch-shifted, alien unaltered, alien pitch-shifted). This contrast was significant, F(1,15) = 6.10, p = .026, and did not interact with site. The remaining Helmert contrasts

¹ To compare the data of this study with results from previous studies in our laboratory, we also performed the separate ANOVAs for speaking and listening on the midline sites and found similar results: During speaking the N100 in the self-unaltered condition was significantly more suppressed than the N100 in the three remaining conditions (Condition: F(3,45) = 2.86, p = .047; Helmert contrast of the SELF-UNALTERED condition vs. the average of the subsequent levels, F(1,15) = 4.7, p = .046). No differences between conditions were observed for the LISTENING task. We further found a significantly larger N100 amplitude to the self-unaltered voice condition in the listening task compared to the speaking task, F(1,15) = 36.2, p = .000.

A Speaking Self - left sites Speaking Self - right sites

4.5 4.5 4.5 4.5


F7 F3 F4
F8

µV

µV
µV

µV
0 0 0 0

−3.5 −3.5 −3.5 −3.5


0 100 300 0 100 300 0 100 300 0 100 300
ms ms ms ms

4.5 4.5 4.5 4.5


FT7 µV
FC3 FC4 FT8

µV
µV

µV
0 0 0 0

−3.5 −3.5 −3.5 −3.5


0 100 300 0 100 300 0 100 300 0 100 300
ms ms ms ms

4.5 4.5 4.5 4.5


T7 C3 C4 T8
µV

0 µV

µV
µV

0 0 0

−3.5 −3.5 −3.5 −3.5


0 100 300 0 100 300 0 100 300 0 100 300
ms ms ms ms

4.5 4.5 4.5 4.5


TP7 CP3 CP4 TP8
µV

µV
µV

µV

0 0 0 0

−3.5 −3.5 −3.5 −3.5


0 100 300 0 100 300 0 100 300 0 100 300
ms ms ms ms

4.5 4.5 4.5 4.5


P7 P3 P4 P8
µV
µV
µV

0
µV

0 0 0

−3.5 −3.5 −3.5 −3.5


0 100 300 0 100 300 0 100 300 0 100 300
ms ms ms ms
Self unaltered
Self pitch-shifted

Figure 4. Event-related potential (ERP) waveforms averaged over all subjects at all sites for the four conditions during speaking. A: Subjects heard their own voice either unaltered or pitch-shifted. B: Subjects heard the alien voice either unaltered or pitch-shifted. In both plots 0 indicates the onset of the speech sound.
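The waveforms summarized in this caption come from standard epoch averaging around each speech-sound onset. As a generic sketch of that procedure (the sampling rate, epoch window, and baseline interval below are illustrative choices, not the study's exact recording parameters):

```python
import numpy as np

def average_erp(eeg, onsets, sfreq=250, tmin=-0.1, tmax=0.3):
    """Cut epochs around each stimulus onset, baseline-correct each
    epoch on its pre-stimulus interval, and average them. All
    parameters here are illustrative assumptions."""
    pre, post = int(-tmin * sfreq), int(tmax * sfreq)
    epochs = []
    for t0 in onsets:
        seg = eeg[t0 - pre: t0 + post].astype(float)
        epochs.append(seg - seg[:pre].mean())  # baseline correction
    return np.mean(epochs, axis=0)

# Toy single-channel recording with a negative "N100-like" dip
# shortly after each of four onsets.
eeg = np.zeros(5000)
onsets = [500, 1500, 2500, 3500]
for t0 in onsets:
    eeg[t0 + 20: t0 + 30] -= 2.0
erp = average_erp(eeg, onsets)
```

Averaging identical dips leaves the deflection intact while (in real data) uncorrelated background activity averages toward zero.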

successively compared the difference scores for self pitch-shifted versus mean (alien unaltered, alien pitch-shifted) and alien unaltered versus alien pitch-shifted. Neither of these contrasts was significant, nor did they significantly interact with electrode site. Thus, this analysis confirmed our hypothesis that the reduction in N100 during speaking relative to listening was most pronounced when the subject heard his own undistorted voice, that is, when there was an exact match between what the subject said and what the subject heard.

Discussion

We tested the hypothesis that a precise forward model operates in the auditory system during speech production, causing maximal dampening of the auditory cortical response to the incoming sounds that most closely match the speech sounds predicted by the model (i.e., corollary discharge). To test the hypothesized selectivity of this feedforward auditory cortical dampening mechanism, we manipulated the re-afferent auditory feedback
186 T.H. Heinks-Maldonado et al.

[Figure 4B here: the same grids of ERP waveforms (0–300 ms; −3.5 to 4.5 µV) at the 20 sites, comparing alien unaltered and alien pitch-shifted feedback during speaking.]
Figure 4. (Continued)

that subjects heard as they produced speech, and assessed both their perception of this feedback and its evoked auditory cortical N100 response. The performance data showed that subjects had 90% accuracy in identifying their own unaltered voice as their own. In the other three conditions incorrect and uncertainty responses increased. This can be explained by the nature of the behavioral task: subjects were instructed to decide whether they believed the source of the auditory feedback was their own voice or someone else's, even if the feedback was altered in pitch. We assume that it was similarly difficult to decide whether subjects heard their own voice pitch-shifted or someone else's voice either unaltered or pitch-shifted. If the instructions had been to decide whether the sound had been altered or not, the results would perhaps have been different.

The number of uncertainty responses was similar for speaking and listening. There was a tendency, however, for subjects to respond incorrectly more often during speaking than during listening. Even though the ANOVAs did not reveal a significant interaction of Task × Source × Pitch, Figure 3 shows that during the speaking task subjects had a bias toward identifying the inputs as their own. This can be explained by the fact that the subjects were actually involved in the motor act of speaking as compared to passively listening to the sounds, which may have increased the tendency to assume their own voices to be the source of the auditory feedback.

During speech production the N100 amplitude was maximally reduced to the subject's own unaltered voice feedback relative to the pitch-shifted and alien speech feedback (Figures 4, 5).
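The graded suppression pattern just described, maximal for an exact match and weaker as feedback departs from the prediction, can be sketched as a simple gain function. Everything below (the gain formula, the sharpness parameter, the toy spectra) is an illustrative assumption, not a model fitted in the study.

```python
import numpy as np

def response_gain(predicted, actual, sharpness=5.0):
    """Toy forward-model gate: auditory response gain approaches 0 when
    feedback matches the corollary-discharge prediction and grows toward
    1 as the mismatch increases. 'sharpness' sets how precisely tuned
    the suppression is (an illustrative assumption)."""
    mismatch = np.linalg.norm(np.asarray(actual) - np.asarray(predicted))
    scale = np.linalg.norm(predicted) + 1e-12
    return 1.0 - np.exp(-sharpness * mismatch / scale)

predicted = np.array([1.0, 0.5, 0.25])  # toy spectrum of the planned vowel
print(response_gain(predicted, predicted))        # 0.0 -> maximal suppression
print(response_gain(predicted, predicted * 1.3))  # larger gain for altered feedback
```

Under this sketch, unaltered self-voice feedback yields the smallest response gain, mirroring the maximally reduced N100, while pitch-shifted or alien feedback yields progressively less suppression.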

[Figure 5A here: grids of ERP waveforms (0–300 ms; −8 to 7 µV) at the 20 sites, comparing self unaltered and self pitch-shifted feedback during listening.]
Figure 5. Event-related potential (ERP) waveforms averaged over all subjects at all sites for the four conditions during listening. A: Subjects heard their own voice either unaltered or pitch-shifted. B: Subjects heard the alien voice either unaltered or pitch-shifted. In both plots 0 indicates the onset of the speech sound.

The different feedback types (self unaltered, self pitch-shifted, alien unaltered, alien pitch-shifted) did not lead to differences in N100 amplitude during the listening task, even though subjects correctly identified the source.

Thus, auditory response attenuation during speech production is greatest when the re-afferent auditory feedback exactly matches the predicted auditory consequences of speech (i.e., corollary discharge).

Our results are consistent with those reported by Houde et al. (2002), who found that the M100 suppression during speech production relative to playback was reduced when subjects heard white noise instead of the expected voice feedback. However, our approach extends the findings of Houde et al. in two ways. First, Houde et al.'s use of broadband white noise to mask and replace the speech sounds during their production was problematic. Because of the substantial acoustic difference between white noise and speech sounds, as well as the fact that white noise broadly activates the auditory cortex, Houde et al.'s results could have been due to the activating effects of white noise rather than to the selective attenuation of the auditory response to the

[Figure 5B here: grids of ERP waveforms (0–300 ms; −8 to 8 µV) at the 20 sites, comparing alien unaltered and alien pitch-shifted feedback during listening.]
Figure 5. (Continued)

specific speech produced. By using voice feedback that differed from the subject's own speech output only in pitch and/or source (self/alien), masking sounds that were much more similar to the subjects' own speech than the white noise mask used by Houde et al., we demonstrated unambiguously that the attenuation of the auditory sensory response during speech production is greatest to the subject's own speech. Second, Houde et al. relied on direct comparisons between spoken and played-back speech, which potentially differed in sound quality as discussed in the introduction. In contrast, by changing the re-afferent auditory feedback during speech production and keeping all other parameters of the forward model constant, we were able to show selective effects of a precise forward model within the speaking task itself.

It is difficult to link the results of the performance data and the ERP data. During speaking it was easiest for subjects to identify their own unaltered voice as their own, and in this case we found a suppressed N100 amplitude; that is, a suppressed N100 precedes the correct behavioral response in the self unaltered condition. Determining whether the suppression of N100 amplitude correlates with the correctness of the response would require analysis of the N100 amplitudes during the erroneous and uncertain responses and comparison with the N100 amplitudes during correct performance. This is not feasible in our study

Figure 6. Speaking–listening difference of N100 amplitudes in the four conditions.

because the number of trials for incorrect responses is too small to create averages with a minimum of 15 trials.

To be consistent with other studies that compared N100 (or M100) during speaking and listening, we also had subjects perform the listening task, in which they passively listened to the speech sounds recorded during the speaking task. Comparing the listening to the speaking data when the subject's voice was not altered or substituted by the alien voice, we replicated earlier studies showing a significantly smaller N100 amplitude during speaking than during listening (Curio et al., 2000; Ford, Mathalon, Heinks, et al., 2001; Houde et al., 2002).

The results of the MEG studies by Houde et al. (2002) and Curio et al. (2000) are consistent regarding the hemispheric lateralization of the suppression effect during speaking compared to listening: both studies found the difference between speaking and listening to be most significant on the left. Our data support these findings: when the auditory feedback is the subject's unaltered voice (which is comparable to the studies by Curio et al. and Houde et al.), the difference between speaking and listening is most pronounced on the left. However, during speaking we do not see a left–right difference between the self unaltered condition and the other three feedback types.

It could be argued that the effects of the different feedback conditions on N100 during the speaking task may have been due to "prior probabilities" of the feedback stimuli. Altered or alien feedback is improbable by its nature, and improbable or unexpected stimuli have been associated with larger N100s (Roth, Ford, Lewis, & Kopell, 1976). However, our study design mitigated this prior probability confound because subjects heard altered or alien feedback on 75% of the trials.

Another potential explanation of the reduced N100 amplitude can also be considered. While a subject speaks and hears his own unaltered voice, the same spectrum is presented via bone conduction and via the feedback line. In contrast, when the subject speaks and simultaneously hears either his pitch-shifted voice or somebody else's voice, the combined spectrum becomes richer, and presumably more frequency-tuned neurons in tonotopic cortex are excited. Moreover, in the speaking self unaltered condition, the subject's bone-conducted voice could presumably be masked better by the feedback (because of frequency overlap) than in the other cases, where frequencies do not overlap. However, we think the effect of spectral mismatch would be small for the following reasons. The spectrum of the bone-conducted feedback is not the same as that of the "side tone" (i.e., air-conducted from the mouth around the sides of the head to the ears). The bones and soft tissues of the head act as a significant low-pass filter, and they also have their own resonances that alter the spectrum of speech as it travels from the vocal tract through the head to the cochlea (Tonndorf, 1972). Thus, even the side tone feedback of the unaltered speech will substantially mismatch the bone conduction spectrum, almost as much as the pitch-shifted and alien voice spectra do.

In accordance with findings previously reported in the literature (Curio et al., 2000; Ford, Mathalon, Heinks, et al., 2001; Houde et al., 2002), our results again show a statistically significant difference in N100 amplitude between the speaking task and the listening task. This finding could be interpreted as support for the hypothesis of a general suppression effect during speaking; however, as discussed above, the results of the comparison between speaking and listening may be affected by differences in sound quality.

In addition, our findings show evidence for selective suppression of auditory cortex to unaltered self-produced speech, supporting a precise forward model mechanism. We assume that this precise forward model mechanism, which modulates auditory cortical responsiveness during speech, is similar to the forward model systems shown to modulate somatosensory responses to self-generated motor acts (Jeannerod, 1988, 2003; Wolpert & Flanagan, 2001). These sensorimotor forward models are postulated to provide the mechanism for distinguishing self-generated from externally generated somatosensory stimulation, which needs to be processed differently (Jeannerod, 2003; Wolpert & Flanagan, 2001). Indeed, distinguishing the sources of somatosensory input is fundamental for recognizing our own actions. Extrapolating to the auditory system, precise auditory suppression during speech allows the auditory system to distinguish between internal and external sources of auditory information. Auditory information from self-produced speech may be used for feedback control of ongoing speech (e.g., loudness or prosody of speech), whereas externally generated auditory information may be used primarily to recognize environmental events. When sensory consequences of self-produced speech are filtered from other incoming auditory stimuli, any unpredicted sounds are immediately recognized as externally generated and therefore require additional evaluation in the context of the current environment to identify the source of the sounds and to determine whether a response is necessary.
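The filtering account in the last paragraph can be made concrete with a toy sketch: the forward model supplies a predicted spectrum for the upcoming self-produced sound, and anything that mismatches it beyond some tolerance is flagged as external. The function, the spectra, and the mismatch threshold below are illustrative assumptions, not quantities from the study.

```python
import numpy as np

def classify_source(predicted_spec, observed_spec, tolerance=0.1):
    """Toy corollary-discharge filter: normalized spectral mismatch below
    'tolerance' -> treat the sound as self-produced (and suppress it);
    otherwise flag it as externally generated for further evaluation.
    The threshold is an illustrative assumption."""
    mismatch = (np.linalg.norm(observed_spec - predicted_spec)
                / np.linalg.norm(predicted_spec))
    return "self" if mismatch < tolerance else "external"

# Toy harmonic spectrum of the planned vowel vs. a ~3-semitone shift.
freqs = np.arange(1, 6)
own = 1.0 / freqs
factor = 2 ** (3 / 12)
shifted = np.interp(freqs, freqs * factor, own)

print(classify_source(own, own))      # "self"
print(classify_source(own, shifted))  # "external"
```

An exact match passes as self-produced, while the pitch-shifted spectrum exceeds the tolerance and is routed for the kind of additional evaluation the text describes.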

REFERENCES

Blakemore, S., Frith, C., & Wolpert, D. (1999). Spatio-temporal prediction modulates the perception of self-produced stimuli. Journal of Cognitive Neuroscience, 11, 551–559.
Blakemore, S., Wolpert, D., & Frith, C. (1998). Central cancellation of self-produced tickle sensation. Nature Neuroscience, 1, 635–640.
Blakemore, S., Wolpert, D., & Frith, C. (2000). Why can't you tickle yourself? NeuroReport, 11, 11–16.
Creutzfeldt, O., Ojemann, G., & Lettich, E. (1989a). Neuronal activity in the human lateral temporal lobe: I. Responses to speech. Experimental Brain Research, 77, 451–475.
Creutzfeldt, O., Ojemann, G., & Lettich, E. (1989b). Neuronal activity in the human lateral temporal lobe: II. Responses to the subject's own voice. Experimental Brain Research, 77, 476–489.
Curio, G., Neuloh, G., Numminen, J., Jousmaki, V., & Hari, R. (2000). Speaking modifies voice-evoked activity in the human auditory cortex. Human Brain Mapping, 9, 183–191.
Eliades, S., & Wang, X. (2003). Sensory-motor interaction in the primate auditory cortex during self-initiated vocalizations. Journal of Neurophysiology, 89, 2194–2207.
Ford, J., Mathalon, D., Heinks, T., Kalba, S., & Roth, W. (2001). Neurophysiological evidence of corollary discharge dysfunctions in schizophrenia. American Journal of Psychiatry, 158, 2069–2071.
Ford, J. M., Mathalon, D. H., Kalba, S., Whitfield, S., Faustman, W. O., & Roth, W. T. (2001). Cortical responsiveness during talking and listening in schizophrenia: An event-related brain potential study. Biological Psychiatry, 50, 540–549.
Gratton, G., Coles, M., & Donchin, E. (1983). A new method for off-line removal of ocular artifact. Electroencephalography and Clinical Neurophysiology, 55, 468–484.
Hari, R., Pelizzone, M., Makela, J., Hallstrom, J., Leinonen, L., & Lounasmaa, O. V. (1987). Neuromagnetic responses of the human auditory cortex to on- and offsets of noise bursts. Audiology, 26, 31–43.
Hein, A., & Held, R. (1962). A neural model for labile sensorimotor coordination. In E. Bernard & M. Hare (Eds.), Biological prototypes and synthetic systems (pp. 71–74). New York: Plenum Press.
Hirano, S., Kojima, H., Naito, Y., Honjo, I., Kamoto, Y., & Okazawa, H., et al. (1997). Cortical processing mechanism for vocalization with auditory verbal feedback. NeuroReport, 8, 2379–2382.
Houde, J., Nagarajan, S., Sekihara, K., & Merzenich, M. (2002). Modulation of the auditory cortex during speech: An MEG study. Journal of Cognitive Neuroscience, 14, 1125–1138.
Jeannerod, M. (1988). The neural and behavioral organization of goal-directed movement. Oxford, UK: Oxford University Press.
Jeannerod, M. (2003). The mechanism of self-recognition in humans. Behavioral Brain Research, 142, 1–15.
Krumbholz, K., Patterson, R., Seither-Preisler, A., Lammertmann, C., & Lutkenhoner, B. (2003). Neuromagnetic evidence for a pitch processing center in Heschl's gyrus. Cerebral Cortex, 13, 765–772.
Lee, B. (1950). Some effects of side-tone delay. Journal of the Acoustical Society of America, 22, 639–640.
Müller-Preuss, P., & Ploog, D. (1981). Inhibition of auditory cortical neurons during phonation. Brain Research, 215, 61–76.
Numminen, J., & Curio, G. (1999). Differential effects of overt, covert and replayed speech on vowel-evoked responses of the human auditory cortex. Neuroscience Letters, 272, 29–32.
Numminen, J., Salmelin, R., & Hari, R. (1999). Subject's own speech reduces reactivity of the human auditory cortex. Neuroscience Letters, 265, 119–122.
Ozaki, I., Suzuki, Y., Jin, C., Baba, M., Matsunaga, M., & Hashimoto, I. (2003). Dynamic movement of N100m dipoles in evoked magnetic field reflects sequential activation of isofrequency bands in human auditory cortex. Clinical Neurophysiology, 114, 1681–1688.
Pantev, C., Eulitz, C., Elbert, T., & Hoke, M. (1994). The auditory evoked sustained field: Origin and frequency dependence. Electroencephalography and Clinical Neurophysiology, 90, 82–90.
Pantev, C., Eulitz, C., Hampson, S., Ross, B., & Roberts, L. (1996). The auditory evoked "off" response: Sources and comparison with the "on" and the "sustained" responses. Ear & Hearing, 17, 255–265.
Reite, M., Adams, M., Simon, J., Teale, P., Sheeder, J., & Richardson, D., et al. (1994). Auditory M100 component 1: Relationship to Heschl's gyri. Brain Research: Cognitive Brain Research, 2, 13–20.
Roth, W., Ford, J., Lewis, S., & Kopell, B. (1976). Effects of stimulus probability and task-relevance on event-related potentials. Psychophysiology, 13, 311–317.
Sams, M., Hamalainen, M., Antervo, A., Kaukoranta, E., Reinikainen, K., & Hari, R. (1985). Cerebral neuromagnetic responses evoked by short auditory stimuli. Electroencephalography and Clinical Neurophysiology, 61, 254–266.
Shuster, L., & Durrant, J. (2003). Toward a better understanding of self-produced speech. Journal of Communication Disorders, 36, 1–11.
Sperry, R. (1950). Neural basis of the spontaneous optokinetic response produced by visual inversion. Journal of Comparative and Physiological Psychology, 43, 482–489.
Stone, M., & Moore, B. (1999). Tolerable hearing aid delays. I. Estimation of limits imposed by the auditory path alone using simulated hearing losses. Ear & Hearing, 20, 182–192.
Suga, N., & Schlegel, P. (1972). Neural attenuation of responses to emitted sounds in echolocating bats. Science, 177, 82–84.
Suga, N., & Shimozawa, T. (1974). Site of neural attenuation of responses to self-vocalized sounds in echolocating bats. Science, 183, 1211–1213.
Tonndorf, J. (1972). Bone conduction. In J. Tobias (Ed.), Foundations of modern auditory theory. New York: Academic Press.
von Holst, E., & Mittelstädt, H. (1950). Das Reafferenzprinzip. Naturwissenschaften, 37, 464–476.
Weiskrantz, L., Elliot, J., & Darlington, C. (1971). Preliminary observations on tickling oneself. Nature, 230, 598–599.
Wolpert, D. (1997). Computational approaches to motor control. Trends in Cognitive Sciences, 1, 209–216.
Wolpert, D., & Flanagan, J. (2001). Motor prediction. Current Biology, 11, 729–732.
Wolpert, D., Ghahramani, Z., & Jordan, M. (1995). An internal model for sensorimotor integration. Science, 269, 1880–1882.

(Received August 18, 2004; Accepted November 29, 2004)
