Research Article
Purpose: The purpose of this study was to determine whether long-term musical training enhances the ability to perceive and learn new auditory information. Listeners with extensive musical experience were expected to detect, learn, and retain novel words more effectively than participants without musical training. Advantages of musical training were expected to be greater for words learned in multitalker babble compared to quiet.
Method: Participants consisted of 20 young adult musicians and 20 age-matched nonmusicians, all with normal hearing. In addition to completing word recognition and nonword detection tasks, each participant learned 10 novel words in a rapid word-learning paradigm. All tasks were completed in quiet and in multitalker babble. Next-day retention of the learned words was examined in isolation (recall) and in the context of continuous discourse (detection). Performance was compared across groups and listening conditions.
Results: Performance was significantly poorer in babble than in quiet on word recognition and nonword detection, but not on word learning, learned-word recall, or learned-word detection. No differences were observed between groups (musicians vs. nonmusicians) on any of the tasks.
Conclusions: For young normal-hearing adults, auditory experience resulting from long-term music training did not enhance their learning of new auditory information in either favorable (quiet) or unfavorable (babble) listening conditions. This suggests that the formation of semantic and musical representations in memory may be supported by the same underlying auditory processes, such that musical training is simply an extension of an auditory expertise that both musicians and nonmusicians possess.
Department of Speech and Hearing Science, Arizona State University, Tempe
Correspondence to Elizabeth C. Stewart: Elizabeth.Stewart@sonova.com
Editor-in-Chief: Frederick (Erick) Gallun
Editor: Yi Shen
Received August 16, 2020
Revision received December 30, 2020
Accepted March 12, 2021
https://doi.org/10.1044/2021_JSLHR-20-00482
Disclosure: The authors have declared that no competing interests existed at the time of publication.

A fundamental component of word learning is ascertaining the meaning of the word. According to Bloom (2000), knowing the meaning of a word requires (a) having a mental representation or concept that the word symbolizes and (b) mapping that concept onto the correct linguistic form (i.e., a unit of speech capable of carrying meaning, such as a morpheme, word, or phrase). The most observable type of word learning occurs through direct instruction. In early childhood, this takes the form of ostensive naming—that is, labeling targets in the child's environment. A limitation of this approach is the problem of generalization: The child must have a way to take a word that was learned in one situation and apply it appropriately in a new situation. Another method of direct instruction leading to word learning, typically beginning in about the fourth grade, is the memorization of vocabulary lists. This approach requires substantial effort on the part of both the learner and the instructor, as it usually involves careful explanation of definitions and repeated practice. Thus, learning words well enough to recognize them easily and use them correctly takes a considerable amount of time using this method. In fact, it is estimated that children learn only 100–200 words this way over the course of a school year (Miller & Gildea, 1987). Thus, there must be another source of word learning that accounts for the majority of the 60,000-word vocabulary of the average high school graduate (e.g., Pinker, 1994).

There is convincing evidence that most words are acquired through incidental exposure during verbal communication (Bloom, 2000; Sternberg, 1987). Sternberg and Powell (1983) proposed a model of contextual learning comprising three components: (a) knowledge acquisition processes, (b) contextual cues, and (c) moderating variables. This model assumes that information relevant to learning a new word is rarely presented in isolation; rather, it is often embedded in a background of information, not all of which is necessary for acquiring the meaning of the novel word. Knowledge acquisition processes consist of examining all of the available information and identifying that which is critical for understanding the word's meaning and filtering
2870 Journal of Speech, Language, and Hearing Research • Vol. 64 • 2870–2884 • July 2021 • Copyright © 2021 American Speech-Language-Hearing Association
out irrelevant data (selective encoding), taking the relevant pieces of information and combining them into a cohesive unit (selective combination), and then evaluating the new information to determine how it relates to existing knowledge (selective comparison). In short, novel information is selected, integrated, and examined to arrive at word meaning.

Knowledge acquisition processes operate on a set of cues that are drawn from the context in which the new words occur. These contextual cues can help to establish the meaning of an unknown word by providing information about the word's physical properties (such as its size, shape, or color), function, class membership, and so forth. Moderating variables dictate the degree to which the contextual cues facilitate word learning and can include the number of times the unknown word is encountered; the variability of contexts in which the word appears over multiple occurrences; or how critical the word is to comprehension of the phrase, sentence, or passage in which it appears (Sternberg & Powell, 1983).

In this study, the hypothesis that musical training moderates the learning and retention of new information was tested. Central to this hypothesis is the assumption that the benefit of musical training extends to nonmusical disciplines. This assumption is supported by the theory of positive transfer, which proposes that a concept or skill learned in one context can be applied in a similar context to improve performance (Perkins & Salomon, 1992). Besson et al. (2011) have suggested that a boost in perceptual acuity in one domain following long-term sensory experience in another domain serves as evidence of a transfer of training. This assertion is based on findings that certain brain structures function similarly when processing language and music (Besson et al., 2011). For example, results of a functional magnetic resonance imaging study on young adult nonmusicians showed increased activation in a neuronal network that included portions of Broca's and Wernicke's areas for musical chord sequences containing dissonant tone clusters, compared to those containing in-key chords (Koelsch et al., 2002). Because the involvement of these areas in language processing is well established, this study appears to demonstrate an overlap in the representation of linguistic and musical syntax in the brain (Patel, 2008).

To explore the question of whether experience with music learning transfers to language learning specifically, Dittinger et al. (2016, 2017, 2019) completed a set of studies using a novel word learning paradigm involving word–picture associations presented in a quiet listening environment. This paradigm included a word-learning phase, in which natural Thai monosyllabic words were presented, along with pictures of familiar objects 20 times each, to allow participants (nonspeakers of Thai) to learn the word–picture associations. Following the word-learning phase, participants were presented with the same pictures and were asked to indicate whether the auditorily presented word that followed matched or mismatched the picture based on the learned word–picture associations. These studies revealed that young adults and children with musical training experience made fewer errors on the matching task compared to their nonmusician peers (Dittinger et al., 2017, 2019). Additionally, young adult musicians were able to recall the word–picture associations significantly more accurately than nonmusicians when the matching task was repeated 5 months after novel words were first learned (Dittinger et al., 2016). The outcomes of these studies suggest that music training may indeed enhance word learning. There is a body of evidence, discussed in the section that follows, that provides examples of perceptual and cognitive skills that music training has been shown to enhance. Given the importance of these skills for auditory learning, the evidence that follows further supports the hypothesis that musical training enhances novel word learning.

Advantages of Music Training for Perception and Cognition

One of the variables moderating context-driven word learning is the importance of the unknown word to the phrase in which it is embedded, as this will determine how critical it is to decipher the meaning of the word (Sternberg & Powell, 1983). One way to assess a novel word's significance in a sentence is through prosodic cues such as stress, which is often used to draw attention to key words. Prosodic information, such as stress and intonation, is carried by pitch contour (i.e., the pattern of changes in pitch) in speech, whereas pitch contour in music conveys melody (Chandrasekaran & Kraus, 2010). Musicians have demonstrated an advantage for detecting pitch changes in both music and speech. For example, Schön et al. (2004) found that young adult musicians were better than their nonmusician peers at detecting subtle incongruities in pitch within melodies and in spoken sentences, suggesting that musical training provides an advantage for the perception of pitch contour in both music and spoken language. Thus, musical experience may enhance sensitivity to prosodic cues, allowing the listener to determine the importance of a novel word relative to the overall meaning of the sentence in which it appears.

When the pitch of two distinct sounds is identical, accurate detection of timbre helps listeners discriminate between them (Kraus et al., 2009). In speech, timbre carries acoustic information that aids in distinguishing one phoneme from another. Timbre is considered a qualitative acoustic characteristic and is dependent upon the harmonic components of complex sounds (Lee et al., 2009). Musicians have shown stronger physiological responses to the harmonic components of both tones and speech (Lee et al., 2009; Musacchia et al., 2008), indicating that their neural representation of timbre is enhanced relative to nonmusicians. As the perception of timbre is important for the discrimination of speech sounds, heightened sensitivity to timbre could improve detection of novel words that differ from familiar words by only one phoneme.

Other benefits of music training have been revealed in investigations of auditory-specific cognitive functions. For example, Strait et al. (2010) compared the performance of adult musicians and nonmusicians on an auditory attention task in which participants were instructed to press a response
New to this test battery were measures of detection and recall of the learned words on the day immediately following the rapid word-learning task. There is evidence that memory consolidation for auditorily learned information occurs during a night's sleep. One such study involved an auditory pitch memory task in which 56 adult nonmusicians (18–40 years old) with normal hearing heard sequences of sine wave tones of varying pitch and were tested on their accuracy in determining whether or not the pitch of a tone at the end of a sequence was the same as the pitch of the first tone in the same sequence (Gaab et al., 2004). Three study sessions, each of which included blocks of training and testing, were completed over a 24-hr period. Results revealed no improvement in task performance across sessions separated by 12 waking hours, but a significant improvement following sleep, indicating a "delayed learning" effect. More recent evidence, however, suggests that sleep may not be necessary to achieve posttraining consolidation. Collet et al. (2012) conducted a study in which 20 native French speakers (18–38 years old, all with normal hearing) learned to discriminate between two synthetic syllables, differing only in voice onset time, over two training/testing sessions. For half of the participants, the two sessions were separated by 12 waking hours, while the other half completed their second session after a 12-hr period, which included approximately 8 hr of sleep. Performance gains (i.e., learning) made during the initial session were mostly maintained at the second session for both groups, indicating that performance stabilization occurred over the delay independent of sleep (Collet et al., 2012). A similar study used a verbal auditory identification task to train and test young adults (18–26 years old) with normal hearing on consonant–vowel pair discrimination (Roth et al., 2005). Results revealed significant improvement in syllable discrimination when tested 6–12 hr after training, whether or not that interval included a period of sleep. However, the largest gains in performance were observed 24 hr (including at least 6 sleeping hours) posttraining.

It should be noted that, in all of these studies, the task used to assess consolidation of auditorily learned information was the same as the one used for training. This was not the case in this study. However, a subsequent study completed in our lab revealed significant positive correlations between next-day retention of novel words and performance on the rapid word-learning task (de Diego-Lázaro et al., 2021; Pittman & de Diego-Lázaro, 2021). This suggests that there is a direct relationship between performance during the learning process on the first day and next-day recall, such that we can predict next-day retention fairly well. The detection task used in this study assessed participants' ability to detect newly learned words within continuous discourse, while the recall task quantified the participants' retention of the word/referent pairs. Together, both measures revealed the participants' capacity for consolidating newly learned information into long-term memory.

In this study, perception of familiar words and the ability to detect, learn, and recall novel words was examined in young, normal-hearing adults with extensive musical training, as well as in age-matched peers with no musical experience. Participants with musical training were expected to perceive and learn novel words more effectively than participants without musical training. It was also expected that participants with musical experience would be able to more effectively consolidate novel words into long-term memory, allowing them to both detect and recall newly learned words more accurately 1 day following training than their nonmusician peers. Finally, it was anticipated that the advantages of musical training would be greater for words learned in multitalker babble compared to quiet due to the enhanced auditory perception of these listeners.

Method

Participants

Twenty adult musicians (12 men and eight women) between the ages of 23 and 34 years were recruited from the student population in the School of Music at Arizona State University and from local community symphonies. Musical history was characterized in terms of the age when musical training was initiated and years of consistent practice. Musicians had between 13 and 25 years of training (M = 18.8 ± 4.0 years), initiated between 3 and 12 years of age (M = 7.7 ± 2.6 years). All participants held at least a bachelor's degree. For 19 of the 20 musicians, this degree was in a music-related discipline (performance, education, theory, etc.), whereas one musician held a degree in a nonmusic discipline.

An additional 20 adults (four men and 16 women) between the ages of 21 and 38 years with little to no musical experience served as a control group. Ten of the nonmusicians had no music training at all, while the other 10 had between 0.25 and 2.25 years of training (M = 0.5 ± 0.7 years), initiated between the ages of 6 and 15 years (M = 11.0 ± 2.8 years). None of the nonmusicians reported any training within the preceding 10 years. All participants held at least a bachelor's degree in a nonmusic discipline.

All participants were monolingual English speakers and had normal hearing bilaterally, as confirmed by pure-tone audiometry. Hearing thresholds were ≤ 25 dB HL for octave frequencies between 0.25 and 8 kHz, with the exception of one musician, whose threshold at 8 kHz was 30 dB HL in the right ear only. Average hearing thresholds are shown in Table 1, along with demographic information for each participant group.

Auditory Stimuli

Word and sentence stimuli were recorded by a talker having a standard American English dialect at a sampling rate of 22.05 kHz using a microphone with a flat frequency response to 10 kHz. Stimuli were digitized and edited into individual .wav files using Adobe Audition (Version 1.5) and equated for the root-mean-square level.

Test Battery

Word Recognition

Participants heard and repeated aloud sets of 25 familiar words from the Northwestern University Auditory Test No. 6
(NU-6) word recognition test (Tillman & Carhart, 1966). In order to facilitate accurate scoring, participants' verbal responses were captured with a digital audio recorder (Zoom H2N) coupled to a head-worn microphone (Shure WH20) positioned approximately 2 in. from the corner of the participant's mouth. Responses were then reviewed and scored offline as either correct or incorrect. Overall performance was scored in proportion correct and arcsine transformed for statistical analysis. No reinforcement was provided for this task. One list of 25 words took less than 2 min to complete. Appendix A contains the word lists used for this task.

Nonword Detection

The nonword detection task involved a mix of familiar and novel words, presented together in four-word sentences. Briefly, these sentences were adapted from sentences used in Stelmachowicz et al. (2000) by substituting individual phonemes with phonemes of similar phonotactic probability in order to change some of the real words to nonsense words (see Pittman & Schuett, 2013, for a detailed description of this task). All words were monosyllabic. Within each list of 12 sentences, six sentences contained two nonsense words, four sentences contained one nonsense word, and two sentences contained zero nonsense words. Following the presentation of each sentence, participants indicated which words (if any) in each sentence were nonsense words by selecting the corresponding numbered button(s) on a computer screen. Visual reinforcement was provided via an interactive computer game for correct responses, but not for incorrect responses. One list took less than 2 min to complete. Appendix B contains the sentence lists used for this task; nonsense words are in bolded text.

To evaluate the effects of musical training and listening condition on the ability to detect novel words in the context of sentences, results from this task were analyzed according to signal detection theory (Swets, 1996). Participants' responses were broken down into hits (correctly identified nonsense words), misses (nonsense words identified as real), false alarms (real words identified as nonsense), and correct rejections (correctly identified real words). Performance was quantified by two metrics: (a) d' was determined by calculating the difference between the standardized (z-transformed) scores for hits and false alarms, and (b) c was calculated as the negative of one half the sum of the standardized scores for hits and false alarms (Green & Swets, 1974). Use of these metrics allowed for the examination of the effects of music training and listening condition on participants' ability to discriminate between familiar and novel words (d'), as well as their decision criteria for making this discrimination (c). In short, d' served as an indication of participants' sensitivity to novel words, while c revealed the presence and magnitude of response bias toward familiar or novel words.

Word/Referent Association Task

Participants learned novel words through a process of trial and error using an interactive computer game (Pittman, 2011). Participants heard five nonsense words, presented randomly one at a time. Each nonsense word was paired with one of five nonsense objects/characters. A picture of each nonsense image was displayed on one of five buttons arranged across the bottom of a computer screen. Listeners selected one of the five images after presentation of each nonsense word. Visual reinforcement for correct selections was provided by a computer game located above the response buttons. The game advanced following each correct response (e.g., one piece of a puzzle appeared), while no reinforcement was provided for incorrect selections. Participants were instructed to use the reinforcement to associate each nonsense word with the correct image. Each nonsense word was presented 18 times, for a total of 90 randomized trials. This task supports the configuration process in the expanded lexical neighborhood model, as the repeated exposures facilitate refinement of the acoustic and semantic properties of each word. The paradigm also satisfies the retrieval practice required to effectively associate a word with its referent for later recall (Karpicke, 2012; Karpicke & Roediger, 2008). This task took approximately 5–6 min to complete. Appendix C contains the orthographic representations of the nonsense words used for this task, as well as the phonetic transcription and phonotactic probability (Vitevitch & Luce, 2004) of each word.

Performance on this task was quantified in terms of the efficiency of word learning, defined as the number of trials required to achieve the criterion performance of 71% accuracy. To calculate efficiency of word learning, trial-by-trial data were reduced chronologically to nine bins of 10 trials each, and the proportion of correct responses within each bin was calculated. The raw data were then smoothed with an exponential growth function, Pc = 1 − 0.8e^(−n/c), where Pc is the probability of a correct answer, 1 − 0.8 reflects
chance performance for this task (20%), e is 2.718…, n is the midpoint of the trial block (5, 15, 25, etc.), and c is the time constant of the process. When the number of trials equals the time constant (n = c), performance is 71% correct. This was accomplished by adjusting estimates of c to minimize the sum of the squared deviations between the observed data and the points predicted (Pittman, 2011). The number of trials was log transformed and limited to no more than 1,000 trials. The inverse of the number of trials required for each participant to reach this criterion level of performance is the efficiency of word learning.

Learned-Word Detection

On the day following completion of the word learning task, participants listened to a spoken passage recorded from the same talker who produced the nonsense words in the word/referent association task. Some words in the passage were replaced with the learned words, while others were replaced with unfamiliar nonsense words (foils). Three of the five learned words appeared in the passage; three repetitions of each learned word occurred within the discourse. The passage also contained three repetitions of each of three unfamiliar foils. Together, each participant was exposed to 21 repetitions of the target words over the 2 days of testing and only three repetitions of the foils. Thus, minimal familiarity with the foils was expected. The passage contained approximately 250 words (including nonsense targets and foils) and resulted in a recording that was approximately 90 s in duration, which was the length of time participants were given to complete the task.

Participants were given a brass "cricket clicker" and instructed to click as soon as they heard a word they learned the previous day and to ignore all other words. The audible clicks were recorded using the omnidirectional microphone setting on a digital recorder (Zoom H2N), while the passage (i.e., the output of the soundcard) was recorded simultaneously in a separate channel. The waveforms from both channels were examined visually offline. Clicks appeared as large impulses in the waveform. The timestamp of each click was compared to the timestamp of the occurrence of each learned word and each foil within the passage to determine each participant's hits, misses, correct rejections, and false alarms.

Performance on this task was also analyzed using signal detection theory. In this case, d' served as an indication of participants' sensitivity to the learned words, while c was used to identify any response bias toward the targets or foils. Appendix D contains the passages used for this task. Learned word targets are indicated in bold while unfamiliar foils are italicized.

Word/Referent Recall Task

Participants completed a written posttest consisting of the two sets of five images learned the previous day, with all 10 images visible on a single page of paper, in random order. On the same page, just below the images, a word bank was provided containing orthographic representations of the 10 learned nonsense words, plus 10 foils (unfamiliar but similar nonsense words). To assess word recall after consolidation of words learned the previous day, this task was completed prior to the learned-word detection task to avoid additional, same-day exposures to some of the learned words. Additionally, to avoid multiple exposures to unfamiliar nonsense words, the foils used in this task differed from those used in the learned-word detection task. Participants were instructed to label each image with the correct word from the word bank. This task took approximately 3–5 min to complete. Responses were scored in proportion correct for each set of words learned. Appendix E contains the word bank used for this task.

Procedure

Participants completed the word recognition, nonword detection, rapid word-learning (word/referent association), and learned-word detection tasks twice: once in quiet and once in multitalker babble at +3 dB signal-to-noise ratio (SNR). The order of listening condition was counterbalanced across participants. Different word lists, sentence lists, novel word sets, and spoken passages were used for each condition; one list, set, or passage was presented in each condition. The stimuli were presented from a desktop computer using custom laboratory software. Auditory stimuli (NU-6 words, sentences, novel words) were routed from the computer through a high-speed (96 kHz), high-quality (24-bit resolution) soundcard with six analog channels (Echo Gina 3G) to binaural insert earphones (ER-3A; Etymotic Research) and presented at 60 dBA. The experimental software was also used to display visual reinforcement and record participants' responses.

The word/referent recall task was a written test and was thus completed in quiet. It included all words learned the previous day and was administered once. Detection of the learned words was assessed in the same listening conditions (quiet, babble) in which they were learned. The order of listening condition was counterbalanced across participants.

In accordance with the policies of the institutional review board at Arizona State University, informed consent was obtained from each participant prior to testing. Completion of study tasks required no more than 2.5 hr over two sessions. Subjects were paid in cash for their participation.

Results

To test the main hypothesis, multivariate analyses of variance were used to examine group differences in the various learning tasks across listening conditions. The independent variables were musical experience (long-term training, no training) and listening condition (quiet, babble).

Figure 1 shows average performance on word recognition as a function of listening condition for musicians and nonmusicians. For both groups, perception of familiar words was 11%–12% poorer in babble than in quiet. Musicians' scores were also 1%–2% higher than those of nonmusicians in both listening conditions. Multivariate analyses of variance revealed significantly better recognition in quiet than in babble, F(1, 38) = 104.68, p < .001, η2 = .586. However,
positive bias (c) in babble, F(1, 38) = 26.00, p < .001, η2 = Figure 5. Averages (x) and distribution of scores (d’) on learned-
word detection in quiet and babble for musicians (MUS) and
.260. This indicates that, on average, the participants failed nonmusicians (NOM). Solid lines inside the shaded boxes indicate
to identify the nonsense words when perception was disrupted median scores. Lower and upper box boundaries indicate 25th
by the presence of multitalker babble. No main effect of and 75th percentiles, respectively, while lower and upper error
group was observed for this task, F(1, 38) = 1.30, p = .258, lines show the 10th and 90th percentiles, respectively. Filled
and the Group × Listening Condition interaction was also circles show scores falling outside the 10th percentile.
not significant, F(1, 38) = 0.67, p = .417. These results indi-
cate that musicians and nonmusicians approach the identifica-
tion of nonsense words in the same way and that musicians
did not demonstrate superior response criteria in either lis-
tening condition.
Figure 4 shows average efficiency of word learning
(i.e., performance on the word/referent association task) as a function of listening condition for musicians and nonmusicians. Recall that learning efficiency was determined by log transforming the number of trials required to achieve a criterion performance (71% correct), limited to 1,000 trials. This resulted in a scale of 0–3, where a learning efficiency of 3 indicates perfect learning and a learning efficiency of 0 indicates no learning. For this task, performance did not decline in multitalker babble relative to quiet, F(1, 38) = 0.28, p = .601, suggesting that learning was resistant to the adverse effects of multitalker babble. In quiet, the word-learning efficiency of nonmusicians was slightly greater than that of musicians, while in babble, musicians learned nonsense words more efficiently than nonmusicians. Even so, no significant main effect of group, F(1, 38) = 0.54, p = .466, or Group × Listening Condition interaction, F(1, 38) = 3.02, p = .086, was observed.

Figure 4. Averages (x) and distribution of word learning efficiency in quiet and babble for musicians and nonmusicians. Solid lines inside the shaded boxes indicate median scores. Lower and upper box boundaries indicate 25th and 75th percentiles, respectively, while lower and upper error lines show the 10th and 90th percentiles, respectively.

Figure 5 shows average performance for musicians and nonmusicians on the learned-word detection task as a function of listening condition. Performance is expressed in units of d′ to indicate the listeners' sensitivity to the newly learned words when presented in the context of continuous discourse. Unlike the results of the nonword detection task, sensitivity to the newly learned nonsense words was not significantly better in quiet than in multitalker babble, F(1, 38) = 1.07, p = .304. Additionally, the performance of the musicians and nonmusicians was equivalent, F(1, 38) = 0.19, p = .666. Finally, no Group × Listening Condition interaction was observed, F(1, 38) = 0.02, p = .904. Thus, exposure to nonsense words via the interactive learning paradigm was sufficient to allow most listeners to distinguish these words from unfamiliar nonsense words the next day. These results indicate that musical training experience did not, on average, further improve the ability to detect newly learned words.

As with the nonword detection task, the response criterion (c) was calculated for this task as well. Recall that a response criterion near c = 0 indicates no bias. For this task, positive values of c represent a failure to identify learned words (misses) as they occur in the passage, whereas negative values represent a failure to refrain from identifying unfamiliar words (false alarms). Figure 6 shows the average response criterion (c) for each group as a function of listening condition. No significant main effect of group, F(1, 38) = 0.01, p = .932, or condition, F(1, 38) = 0.27, p = .607, or significant interaction, F(1, 38) = 0.49, p = .488, was observed. On average, both groups demonstrated a negative response bias in quiet and in babble, indicating a bias toward the identification of more nonsense words than were learned the previous day. It appears that participants were inclined to respond with a click each time they heard an unfamiliar word, regardless of whether they were hearing it for the first or 21st time.
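The derived measures reported above lend themselves to a compact illustration. The sketch below makes two assumptions not spelled out in the text: the learning-efficiency transform is taken to be 3 − log10(trials to criterion), which matches the stated 0–3 scale with 3 indicating perfect learning, and the d′ and c formulas are the standard ones from signal detection theory (Green & Swets, 1974). The trial counts shown are hypothetical, not data from the study.

```python
from math import log10
from statistics import NormalDist


def learning_efficiency(trials_to_criterion):
    """Map trials-to-criterion (capped at 1,000) onto a 0-3 scale.

    Assumed transform: 3 - log10(trials), so 1 trial -> 3.0 (perfect
    learning) and the 1,000-trial cap -> 0.0 (no learning).
    """
    trials = min(max(trials_to_criterion, 1), 1000)
    return 3.0 - log10(trials)


def dprime_and_c(hits, misses, false_alarms, correct_rejections):
    """Standard signal detection measures: sensitivity d' and criterion c.

    A log-linear correction (add 0.5 to each cell) keeps the rates away
    from 0 and 1, where the z-transform is undefined.
    """
    z = NormalDist().inv_cdf
    hit_rate = (hits + 0.5) / (hits + misses + 1.0)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1.0)
    d_prime = z(hit_rate) - z(fa_rate)
    c = -0.5 * (z(hit_rate) + z(fa_rate))
    return d_prime, c


# Hypothetical listener: clicks on most learned words but also on many
# unfamiliar foils, yielding positive d' with a negative (liberal) bias,
# the pattern reported for both groups.
d, c = dprime_and_c(hits=18, misses=2, false_alarms=8, correct_rejections=12)
```

Under these conventions, negative c corresponds to the over-identification bias described above: the listener says "new word" too readily, producing false alarms rather than misses.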
Finally, Figure 7 shows average performance for musicians and nonmusicians on the word/referent recall task as a function of listening condition. Both groups performed similarly across conditions, recalling, on average, 61%–69% of nonsense words learned the day before. Performance differed by 3%–8% across groups, with nonmusicians scoring slightly higher on words learned in quiet and musicians scoring slightly higher on words learned in babble. Multivariate analysis of variance revealed no significant effect of group, F(1, 38) = 0.07, p = .789, or listening condition,

Figure 7. Averages (x) and distribution of scores on next-day word recall in quiet and babble for musicians (MUS) and nonmusicians (NOM). Solid lines inside the shaded boxes indicate median scores. Lower and upper box boundaries indicate 25th and 75th percentiles, respectively, while lower and upper error lines show the 10th and 90th percentiles, respectively.

Discussion

The objective of this study was to investigate the impact of musical experience on perception of familiar words and on learning new words in quiet and in multitalker babble. Significant effects of listening condition were observed for word recognition and nonword detection, but not for word learning or retention. These results indicate that an acoustic competitor such as multitalker babble affects the perception of familiar words differently than the learning of new words. While multitalker babble creates a mismatch between auditory perception and auditory representation of familiar words in memory, the formation of auditory representations of new words in memory proceeds similarly in both listening environments, although the quality of those representations may differ.

It was hypothesized that participants with extensive musical training would demonstrate an advantage over their nonmusician peers and that group differences would be larger for the more difficult listening condition. The results did not support this hypothesis. Instead, word recognition, nonword detection, word learning, and recall did not differ significantly across groups. Although some tasks (most notably the word/referent recall task) showed large variability in performance, it is unlikely that this accounted for the lack of group differences. Published effect sizes for the word recognition, nonword detection, and word/referent association tasks indicate that a sample size of 20 in each group is sufficient to reveal significant group effects if they exist (e.g., Pittman, Stewart, Willman, & Odgear, 2017). While increasing the sample size would reduce within-group variability, average performance for all of the outcome measures indicates that doing so is unlikely to yield an appreciable difference in the means of each group.
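The sample-size argument above can be illustrated with a standard normal-approximation power calculation. This is a sketch only: the published effect sizes referenced in the text are not reproduced here, so the Cohen's d values below are hypothetical placeholders for a "large" and a "small" group difference.

```python
from math import sqrt
from statistics import NormalDist


def approx_power_two_sample(d, n_per_group, alpha=0.05):
    """Normal-approximation power for a two-sample comparison.

    d is the standardized effect size (Cohen's d), n_per_group the
    number of participants in each group. Power is the probability of
    rejecting the null at the two-sided alpha level when the true
    effect is d.
    """
    nd = NormalDist()
    z_crit = nd.inv_cdf(1 - alpha / 2)
    ncp = d * sqrt(n_per_group / 2)  # noncentrality under the alternative
    return 1 - nd.cdf(z_crit - ncp)


# With 20 per group, a large hypothetical effect (d = 1.0) is detected
# with high probability, while a small one (d = 0.2) usually is not.
power_large = approx_power_two_sample(1.0, 20)
power_small = approx_power_two_sample(0.2, 20)
```

The contrast makes the point in the text concrete: n = 20 per group is adequate for effects of the magnitude previously published for these tasks, but would miss small effects regardless of within-group variability.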
It has been suggested that music students, when examining a new piece of music, create internal musical representations by drawing from what they have learned through their experiences with previously encountered material (Bamberger, 2000, as cited in Taetle & Cutietta, 2002). Likewise, the word/referent recall and learned-word detection tasks used in this study required participants to access information learned the previous day and identify it in a different modality (visual) or in a different context (continuous discourse). The musicians were expected to leverage their unique experience when processing new auditory information, but they did not perform any better on these tasks than the listeners without musical experience. That is, auditory skills related to music learning did not appear to generalize to the process of learning new words.
2878 Journal of Speech, Language, and Hearing Research • Vol. 64 • 2870–2884 • July 2021
Differences in experimental design may explain the inconsistent findings between the present and previous studies. While the objectives and participants in this study align closely with those of Dittinger et al. (2016, 2017, 2019), the stimuli and methodology were unique. First, the findings of superior learning in musicians compared to nonmusicians in the Dittinger et al. studies are supported by evidence from electroencephalogram data collected in the same participants—specifically, changes to the N400 component of the event-related potential, a measure which was not included in this study. All three studies revealed a significantly faster increase in N400 amplitude during training in musicians compared to nonmusicians, reflecting accelerated encoding of the novel word meanings. Additionally, the novel words learned in the Dittinger et al. studies were natural Thai syllables that differed in consonant place, vowel duration, and tonal quality. By contrast, the stimuli in this study were representative of multisyllabic English words, requiring the participants to retain longer and more complex combinations of phonemes in memory. Another notable difference from the Dittinger et al. studies was the use of familiar objects as referents. Essentially, the participants learned a new name (or synonym) for something they already had a name for, as one does when learning a foreign language. In this study, unfamiliar nonsense images that had no name were paired with the nonsense words. This required participants to form an entirely new representation, consisting of a new object and its new name. Finally, the Dittinger protocol facilitated word learning through direct instruction, whereas the learning task used in this study represents active learning through retrieval practice (Karpicke, 2012; Karpicke & Roediger, 2008). Despite the differences in methodology, the results of these studies suggest that musicians have an advantage over their nonmusician peers when it comes to learning words representative of tonal languages (such as Thai), but that both groups learn words representative of atonal languages (such as English) equally well. This particular benefit of music training may be a reflection of musicians' superior ability to detect pitch contrasts in both speech and music, relative to nonmusicians (Perfors & Ong, 2012; Schön et al., 2004).

The results of this study are, however, consistent with another study in which a similar test battery was used to assess learning and retention of novel words in English–Spanish bilingual children, compared to monolingual English-speaking children (de Diego-Lázaro et al., 2021). Results of that study revealed no advantage of bilingualism for learning or retention of novel words that conformed to the phonotactic rules of English, Spanish, and Arabic. Taken together with the present study, these findings suggest that, while these tasks have been demonstrated to be sensitive to differences in auditory status (normal hearing, hearing loss), this test battery may not be as sensitive to differences in auditory experience (musical training, bilingualism).

Unique to this study is the inclusion of a multitalker babble condition—a common communication setting. It was hypothesized that the musicians' performance in each task would be less impacted by noise than that of their nonmusician peers, but this was also not supported by the data. These results are inconsistent with findings from previous studies showing superior speech-in-noise perception in musicians (Parbery-Clark et al., 2009; Strait et al., 2012). Thus, it may be useful to again consider methodological differences across studies.

In previous studies, speech perception in noise was assessed using tests with varying SNRs, in which either the noise was presented at a constant level and the level of the speech was increased and decreased according to the listener's performance (Hearing in Noise Test; Nilsson et al., 1994), or the level of the speech was held constant and the level of the noise was gradually increased (Quick Speech-In-Noise Test [Killion et al., 2004] and Words in Noise Test [Wilson, 1993]). Results of these prior investigations revealed that musicians were able to repeat words and sentences accurately at poorer SNRs than nonmusicians. Thus, it is possible that the musical advantage for speech perception in noise may be SNR dependent, with significant benefits observed for more demanding listening conditions than those used in this study.

Differences in the physical setup of the experiment are another potential source of inconsistency across studies of speech perception performance in musicians and nonmusicians. For example, the musically trained children in the study by Strait et al. (2012) showed superior performance on speech-in-noise testing when the noise was spatially separated from the speech signal, but not when noise was collocated with speech. On the other hand, adult musicians outperformed their nonmusician peers in collocated but not spatially separated noise conditions (Parbery-Clark et al., 2009). In this study, speech and babble were presented binaurally through insert earphones such that both ears received identical signals, similar to the collocated noise condition. While it is plausible that this slight difference in methodology accounts for some variation in speech perception results, this was not the first study to report findings that differed from those of Parbery-Clark et al. (2009). Ruggles et al. (2014) assessed speech understanding in noise in musicians and nonmusicians, with the goal of replicating the Parbery-Clark et al. study. On average, the musicians performed better on the Hearing in Noise Test and the Quick Speech-In-Noise Test than their nonmusician peers, though group differences were small and did not reach significance. Similarly, the musicians in this study slightly outperformed their nonmusician peers for most tasks completed in babble, but the differences were not significant. Fuller et al. (2014) tested word recognition at three SNRs (+10, +5, and 0 dB), as well as sentence recognition using noise and multitalker maskers, in musicians and nonmusicians. Musicians' word recognition did not differ significantly from that of nonmusicians at any SNR, nor were any significant group differences observed for sentence recognition in any of the masker conditions. Similarly, Boebinger et al. (2015) used four different speech and noise maskers to assess sentence recognition in musicians and nonmusicians and found no advantage of music training in perceiving masked speech. Taken together, these findings indicate that the advantage of music training for understanding speech in noise is equivocal at best.
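The varying-SNR procedures described above can be sketched generically. The code below is a hedged illustration of a simple one-up/one-down adaptive track, not the actual HINT or QuickSIN implementation; the listener model and step size are hypothetical.

```python
def adaptive_snr_track(respond, start_snr=8.0, step=2.0, n_trials=20):
    """Generic one-up/one-down adaptive track over SNR.

    The SNR drops by `step` dB after each correct response and rises
    after each error, so the track converges on the SNR at which the
    listener is correct about half the time. `respond(snr)` returns
    True if the trial presented at that SNR was answered correctly.
    Returns the list of SNRs actually presented.
    """
    snr = start_snr
    history = []
    for _ in range(n_trials):
        correct = respond(snr)
        history.append(snr)
        snr += -step if correct else step
    return history


# Hypothetical listener who is reliably correct above -4 dB SNR: the
# track descends from the easy starting point and then oscillates
# around that listener's threshold.
track = adaptive_snr_track(lambda snr: snr > -4.0)
```

A musician advantage in such a paradigm would show up as a lower (poorer) converged SNR, which is the pattern the cited studies report at demanding SNRs.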
word representations. Journal of Cognitive Neuroscience, 28(10), 1584–1602. https://doi.org/10.1162/jocn_a_00997

Dittinger, E., Chobert, J., Ziegler, J. C., & Besson, M. (2017). Fast brain plasticity during word learning in musically-trained children. Frontiers in Human Neuroscience, 11, 233. https://doi.org/10.3389/fnhum.2017.00233

Dittinger, E., Scherer, J., Jäncke, L., Besson, M., & Elmer, S. (2019). Testing the influence of musical expertise on novel word learning across the lifespan using a cross-sectional approach in children, young adults and older adults. Brain and Language, 198, 104678. https://doi.org/10.1016/j.bandl.2019.104678

Fuller, C. D., Galvin, J. J., III, Maat, B., Free, R. H., & Baskent, D. (2014). The musician effect: Does it persist under degraded pitch conditions of cochlear implant simulations? Frontiers in Neuroscience, 8, 179. https://doi.org/10.3389/fnins.2014.00179

Gaab, N., Paetzold, M., Becker, M., Walker, M. P., & Schlaug, G. (2004). The influence of sleep on auditory learning: A behavioral study. NeuroReport, 15(4), 731–734. https://doi.org/10.1097/00001756-200403220-00032

Green, D. M., & Swets, J. A. (1974). Signal detection theory and psychophysics. Wiley.

Karpicke, J. D. (2012). Retrieval-based learning. Current Directions in Psychological Science, 21(3), 157–163. https://doi.org/10.1177/0963721412443552

Karpicke, J. D., & Roediger, H. L., III. (2008). The critical importance of retrieval for learning. Science, 319(5865), 966–968. https://doi.org/10.1126/science.1152408

Killion, M. C., Niquette, P. A., Gudmundsen, G. I., Revit, L. J., & Banerjee, S. (2004). Development of a Quick Speech-in-Noise Test for measuring signal-to-noise ratio loss in normal-hearing and hearing-impaired listeners. The Journal of the Acoustical Society of America, 116(4), 2395–2405. https://doi.org/10.1121/1.1784440

Koelsch, S., Gunter, T. C., Cramon, D. Y., Zysset, S., Lohmann, G., & Friederici, A. D. (2002). Bach speaks: A cortical "language-network" serves the processing of music. NeuroImage, 17(2), 956–966. https://doi.org/10.1006/nimg.2002.1154

Kraus, N., Skoe, E., Parbery-Clark, A., & Ashley, R. (2009). Experience-induced malleability in neural encoding of pitch, timbre, and timing. Annals of the New York Academy of Sciences, 1169, 543–557. https://doi.org/10.1111/j.1749-6632.2009.04549.x

Lee, K. M., Skoe, E., Kraus, N., & Ashley, R. (2009). Selective subcortical enhancement of musical intervals in musicians. Journal of Neuroscience, 29(18), 5832–5840. https://doi.org/10.1523/JNEUROSCI.6133-08.2009

Luce, P. A., & Pisoni, D. B. (1998). Recognizing spoken words: The neighborhood activation model. Ear and Hearing, 19(1), 1–36. https://doi.org/10.1097/00003446-199802000-00001

Miller, G. A., & Gildea, P. M. (1987). How children learn words. Scientific American, 257(3), 94–99. https://doi.org/10.1038/scientificamerican0987-94

Musacchia, G., Strait, D., & Kraus, N. (2008). Relationships between behavior, brainstem and cortical encoding of seen and heard speech in musicians and non-musicians. Hearing Research, 241(1–2), 34–42. https://doi.org/10.1016/j.heares.2008.04.013

Nilsson, M., Soli, S. D., & Sullivan, J. A. (1994). Development of the Hearing in Noise Test for the measurement of speech reception thresholds in quiet and in noise. The Journal of the Acoustical Society of America, 95(2), 1085–1099. https://doi.org/10.1121/1.408469

Parbery-Clark, A., Skoe, E., Lam, C., & Kraus, N. (2009). Musician enhancement for speech-in-noise. Ear and Hearing, 30(6), 653–661. https://doi.org/10.1097/AUD.0b013e3181b412e9

Patel, A. D. (2008). Music, language, and the brain. Oxford University Press. http://www.loc.gov/catdir/toc/ecip0715/2007014189.html

Perfors, A., & Ong, J. H. (2012). Musicians are better at learning non-native sound contrasts even in nontonal languages. In N. Miyake, D. Peebles, & R. P. Cooper (Eds.), Proceedings of the 34th Annual Conference of the Cognitive Science Society (pp. 839–844). Cognitive Science Society.

Perkins, D. N., & Salomon, G. (1992). Transfer of learning. In International encyclopedia of education (2nd ed.). Pergamon Press.

Pinker, S. (1994). The language instinct. HarperCollins. https://doi.org/10.1037/e412952005-009

Pittman, A. L. (2011). Age-related benefits of digital noise reduction for short-term word learning in children with hearing loss. Journal of Speech, Language, and Hearing Research, 54(5), 1448–1463. https://doi.org/10.1044/1092-4388(2011/10-0341)

Pittman, A. L. (2019). Bone conduction amplification in children: Stimulation via a percutaneous abutment versus a transcutaneous softband. Ear and Hearing, 40(6), 1307–1315. https://doi.org/10.1097/AUD.0000000000000710

Pittman, A. L., & de Diego-Lázaro, B. (2021). What can a child do with one normal-hearing ear? Speech perception and word learning in children with unilateral and bilateral hearing losses relative to peers with normal hearing. Ear and Hearing. Advance online publication. https://doi.org/10.1097/aud.0000000000001028

Pittman, A. L., & Rash, M. A. (2016). Auditory lexical decision and repetition in children: Effects of acoustic and lexical constraints. Ear and Hearing, 37(2), e119–e128. https://doi.org/10.1097/AUD.0000000000000230

Pittman, A. L., & Schuett, B. C. (2013). Effects of semantic and acoustic context on nonword detection in children with hearing loss. Ear and Hearing, 34(2), 213–220. https://doi.org/10.1097/AUD.0b013e31826e5006

Pittman, A. L., Stewart, E. C., Odgear, I. S., & Willman, A. P. (2017). Detecting and learning new words: The impact of advancing age and hearing loss. American Journal of Audiology, 26(3), 318–327. https://doi.org/10.1044/2017_AJA-17-0025

Pittman, A. L., Stewart, E. C., Willman, A. P., & Odgear, I. S. (2017). Word recognition and learning: Effects of hearing loss and amplification feature. Trends in Hearing, 21, 233121651770959. https://doi.org/10.1177/2331216517709597

Roth, D. A.-E., Kishon-Rabin, L., Hildesheimer, M., & Karni, A. (2005). A latent consolidation phase in auditory identification learning: Time in the awake state is sufficient. Learning & Memory, 12(2), 159–164. https://doi.org/10.1101/lm.87505

Ruggles, D. R., Freyman, R. L., & Oxenham, A. J. (2014). Influence of musical training on understanding voiced and whispered speech in noise. PLOS ONE, 9(1), Article e86980. https://doi.org/10.1371/journal.pone.0086980

Schön, D., Magne, C., & Besson, M. (2004). The music of speech: Music training facilitates pitch processing in both music and language. Psychophysiology, 41(3), 341–349. https://doi.org/10.1111/1469-8986.00172.x

Schulze, K., & Koelsch, S. (2012). Working memory for speech and music. Annals of the New York Academy of Sciences, 1252(1), 229–236. https://doi.org/10.1111/j.1749-6632.2012.06447.x

Stelmachowicz, P. G., Hoover, B. M., Lewis, D. E., Kortekaas, R. W. L., & Pittman, A. L. (2000). The relation between stimulus context, speech audibility, and perception for normal-hearing and hearing-impaired children. Journal of Speech, Language, and Hearing Research, 43(4), 902–914. https://doi.org/10.1044/jslhr.4304.902

Sternberg, R. J. (1987). Most vocabulary is learned from context. In M. G. McKeown & M. E. Curtis (Eds.), The nature of vocabulary acquisition. Erlbaum.
Appendix A
Word Lists—Word Recognition Task
List 3a: base, luck, pearl, germ, road, mess, walk, search, life, shall, cause, youth, ditch, team, late, mop, pain, talk, lid, cheek, good, date, ring, pole, beg

List 3b: gun, rat, when, mouse, phone, jug, void, name, hire, soup, sheep, wire, thin, cab, dodge, five, half, tell, hit, seize, rush, note, bar, chat, cool
Appendix B
Sentence Lists—Nonword Detection Task
List 1 List 2
Appendix C
Word Sets—Word/Referent Association Task

Phonetic Transcription    Orthographic    Phonotactic Probability (Positional)    Phonotactic Probability (Biphone)
Biosimilars (Set 1)
sɑθnəd sothnud 0.3347 0.0081
dɑztəl doztul 0.3425 0.0146
fɑznəʃ foznush 0.3345 0.0073
stɑmən stomun 0.3445 0.0455
hɑmtəl homtul 0.3594 0.0212
Robotics (Set 2)
smɛntɑs smentoss 0.3436 0.0132
pɛdtɑn pedton 0.3513 0.0080
dɛpmɑst depmost 0.3394 0.0187
sɛntɑp sentop 0.3585 0.0333
kɛnsɑm kensom 0.3307 0.0384
Passage 1
Today’s biologic medications are significant to patients with serious or chronic illnesses. A biologic somnud is an example of
biotechnology. Among other applications, biotechnology may involve the use of a homtul to produce a medical treatment.
Most biologics come from cells that have been genetically engineered to produce a particular hoznush. This process involves
introducing stomun into a specific type of homtul, typically a harmless type of bacteria, yeast, or mammalian cell, which acts
as a host cell. The stomun tells the cell how to produce the hoznush. Once a cell has been engineered, the next step is to
create a fothnul, which has a complementary function. This unique fothnul is then frozen and stored and is used as the doztul
from which all future copies of the cells are made. A biosimilar is a somnud that is highly similar to the doztul. However, a
biosimilar is not considered a generic. Generics are medications that are chemically identical to the original brand-name
products. A biosimilar somnud is different, however, in that its stomun is not identical to that of the original biologic. The
homtul that is used as the host cell may differ as well. That is, they come from an entirely different fothnul. However, each
biosimilar must undergo rigorous testing to ensure that the hoznush is effectively the same as the original biologic in terms
of the bioactivity of the doztul. Many patients are hopeful that biosimilars will provide a more affordable treatment option.
Passage 2
The Junior Tech Challenge is a student-centered activity that requires each team to design, build, test, and program an autonomous
and driver-operated robot that must perform a series of tasks. Rookie teams are provided with simple, basic instructions for
building a functioning robot that can be successful in competition. The robot design is based around a kensom, which forms
the base of the robot and connects the pentoss to the rest of the machine. The sentop forms the upper portion of the robot
and provides stability to the robot’s temson. This component also houses the robot’s electronics so that it can be driven
remotely. If built correctly, the temson of the robot can extend to manipulate objects using the depmost at its end. The
component that makes the robot autonomous is the pentoss. The whole machine is powered by the drive motor. For this
reason, the sentop must be sturdy in order to provide precise movement to the depmost, but also lightweight, so as not
to overload the motor. Assembly of the kensom consists of using socket-head cap screws to connect each kedton to form
the rectangular base. The temson of the robot, when not in use, rests on the front kedton. Components used in the construction
of the sentop include a set of gears and axles that allow the depmost to be extended forward or upward. Kedton alignment is
critical to construction, as the pentoss of the robot will be difficult to maneuver if the kensom is crooked.
Appendix E
Word Bank—Word/Referent Recall Task