
JSLHR

Research Article

Learning and Retention of Novel Words in Musicians and Nonmusicians

Elizabeth C. Stewart and Andrea L. Pittman
Department of Speech and Hearing Science, Arizona State University, Tempe

Correspondence to Elizabeth C. Stewart: Elizabeth.Stewart@sonova.com
Editor-in-Chief: Frederick (Erick) Gallun
Editor: Yi Shen
Received August 16, 2020; revision received December 30, 2020; accepted March 12, 2021
https://doi.org/10.1044/2021_JSLHR-20-00482
Disclosure: The authors have declared that no competing interests existed at the time of publication.

Purpose: The purpose of this study was to determine whether long-term musical training enhances the ability to perceive and learn new auditory information. Listeners with extensive musical experience were expected to detect, learn, and retain novel words more effectively than participants without musical training. Advantages of musical training were expected to be greater for words learned in multitalker babble compared to quiet.

Method: Participants consisted of 20 young adult musicians and 20 age-matched nonmusicians, all with normal hearing. In addition to completing word recognition and nonword detection tasks, each participant learned 10 novel words in a rapid word-learning paradigm. All tasks were completed in quiet and in multitalker babble. Next-day retention of the learned words was examined in isolation (recall) and in the context of continuous discourse (detection). Performance was compared across groups and listening conditions.

Results: Performance was significantly poorer in babble than in quiet on word recognition and nonword detection, but not on word learning, learned-word recall, or learned-word detection. No differences were observed between groups (musicians vs. nonmusicians) on any of the tasks.

Conclusions: For young normal-hearing adults, auditory experience resulting from long-term music training did not enhance their learning of new auditory information in either favorable (quiet) or unfavorable (babble) listening conditions. This suggests that the formation of semantic and musical representations in memory may be supported by the same underlying auditory processes, such that musical training is simply an extension of an auditory expertise that both musicians and nonmusicians possess.

A fundamental component of word learning is ascertaining the meaning of the word. According to Bloom (2000), knowing the meaning of a word requires (a) having a mental representation or concept that the word symbolizes and (b) mapping that concept onto the correct linguistic form (i.e., a unit of speech capable of carrying meaning, such as a morpheme, word, or phrase). The most observable type of word learning occurs through direct instruction. In early childhood, this takes the form of ostensive naming—that is, labeling targets in the child's environment. A limitation of this approach is the problem of generalization: The child must have a way to take a word that was learned in one situation and apply it appropriately in a new situation. Another method of direct instruction leading to word learning, typically beginning in about the fourth grade, is the memorization of vocabulary lists. This approach requires substantial effort on the part of both the learner and the instructor, as it usually involves careful explanation of definitions and repeated practice. Thus, learning words well enough to recognize them easily and use them correctly takes a considerable amount of time using this method. In fact, it is estimated that children learn only 100–200 words this way over the course of a school year (Miller & Gildea, 1987). Thus, there must be another source of word learning that accounts for the majority of the 60,000-word vocabulary of the average high school graduate (e.g., Pinker, 1994).

There is convincing evidence that most words are acquired through incidental exposure during verbal communication (Bloom, 2000; Sternberg, 1987). Sternberg and Powell (1983) proposed a model of contextual learning comprising three components: (a) knowledge acquisition processes, (b) contextual cues, and (c) moderating variables. This model assumes that information relevant to learning a new word is rarely presented in isolation; rather, it is often embedded in a background of information, not all of which is necessary for acquiring the meaning of the novel word.
Knowledge acquisition processes consist of examining all of the available information and identifying that which is critical for understanding the word's meaning and filtering out irrelevant data (selective encoding), taking the relevant pieces of information and combining them into a cohesive unit (selective combination), and then evaluating the new information to determine how it relates to existing knowledge (selective comparison). In short, novel information is selected, integrated, and examined to arrive at word meaning.

Knowledge acquisition processes operate on a set of cues that are drawn from the context in which the new words occur. These contextual cues can help to establish the meaning of an unknown word by providing information about the word's physical properties (such as its size, shape, or color), function, class membership, and so forth. Moderating variables dictate the degree to which the contextual cues facilitate word learning and can include the number of times the unknown word is encountered; the variability of contexts in which the word appears over multiple occurrences; or how critical the word is to comprehension of the phrase, sentence, or passage in which it appears (Sternberg & Powell, 1983).

In this study, the hypothesis that musical training moderates the learning and retention of new information was tested. Central to this hypothesis is the assumption that the benefit of musical training extends to nonmusical disciplines. This assumption is supported by the theory of positive transfer, which proposes that a concept or skill learned in one context can be applied in a similar context to improve performance (Perkins & Salomon, 1992). Besson et al. (2011) have suggested that a boost in perceptual acuity in one domain following long-term sensory experience in another domain serves as evidence of a transfer of training. This assertion is based on findings that certain brain structures function similarly when processing language and music (Besson et al., 2011). For example, results of a functional magnetic resonance imaging study on young adult nonmusicians showed increased activation in a neuronal network that included portions of Broca's and Wernicke's areas for musical chord sequences containing dissonant tone clusters, compared to those containing in-key chords (Koelsch et al., 2002). Because the involvement of these areas in language processing is well established, this study appears to demonstrate an overlap in the representation of linguistic and musical syntax in the brain (Patel, 2008).

To explore the question of whether experience with music learning transfers to language learning specifically, Dittinger et al. (2016, 2017, 2019) completed a set of studies using a novel word learning paradigm involving word–picture associations presented in a quiet listening environment. This paradigm included a word-learning phase, in which natural Thai monosyllabic words were presented, along with pictures of familiar objects 20 times each, to allow participants (nonspeakers of Thai) to learn the word–picture associations. Following the word-learning phase, participants were presented with the same pictures and were asked to indicate whether the auditorily presented word that followed matched or mismatched the picture based on the learned word–picture associations. These studies revealed that young adults and children with musical training experience made fewer errors on the matching task compared to their nonmusician peers (Dittinger et al., 2017, 2019). Additionally, young adult musicians were able to recall the word–picture associations significantly more accurately than nonmusicians when the matching task was repeated 5 months after novel words were first learned (Dittinger et al., 2016). The outcomes of these studies suggest that music training may indeed enhance word learning. There is a body of evidence, discussed in the section that follows, that provides examples of perceptual and cognitive skills that music training has been shown to enhance. Given the importance of these skills for auditory learning, the evidence that follows further supports the hypothesis that musical training enhances novel word learning.

Advantages of Music Training for Perception and Cognition

One of the variables moderating context-driven word learning is the importance of the unknown word to the phrase in which it is embedded, as this will determine how critical it is to decipher the meaning of the word (Sternberg & Powell, 1983). One way to assess a novel word's significance in a sentence is through prosodic cues such as stress, which is often used to draw attention to key words. Prosodic information, such as stress and intonation, is carried by pitch contour (i.e., the pattern of changes in pitch) in speech, whereas pitch contour in music conveys melody (Chandrasekaran & Kraus, 2010). Musicians have demonstrated an advantage for detecting pitch changes in both music and speech. For example, Schön et al. (2004) found that young adult musicians were better than their nonmusician peers at detecting subtle incongruities in pitch within melodies and in spoken sentences, suggesting that musical training provides an advantage for the perception of pitch contour in both music and spoken language. Thus, musical experience may enhance sensitivity to prosodic cues, allowing the listener to determine the importance of a novel word relative to the overall meaning of the sentence in which it appears.

When the pitch of two distinct sounds is identical, accurate detection of timbre helps listeners discriminate between them (Kraus et al., 2009). In speech, timbre carries acoustic information that aids in distinguishing one phoneme from another. Timbre is considered a qualitative acoustic characteristic and is dependent upon the harmonic components of complex sounds (Lee et al., 2009). Musicians have shown stronger physiological responses to the harmonic components of both tones and speech (Lee et al., 2009; Musacchia et al., 2008), indicating that their neural representation of timbre is enhanced relative to nonmusicians. As the perception of timbre is important for the discrimination of speech sounds, heightened sensitivity to timbre could improve detection of novel words that differ from familiar words by only one phoneme.

Other benefits of music training have been revealed in investigations of auditory-specific cognitive functions.
For example, Strait et al. (2010) compared the performance of adult musicians and nonmusicians on an auditory attention task in which participants were instructed to press a response button as quickly as possible each time they heard the target stimulus (beep) and to refrain from responding when they heard the foil (siren). Not only did the musicians demonstrate significantly faster reaction times to targets than nonmusicians, but performance on this task was significantly correlated with the number of years of music practice, leading the authors to conclude that long-term musical experience strengthens auditory-specific cognitive abilities (Strait et al., 2010).

Another cognitive function that plays a key role in high-level tasks, including speech processing, is working memory (Baddeley, 1992; Schulze & Koelsch, 2012). Investigations of a musician advantage for working memory have yielded somewhat mixed results. In addition to the auditory attention task, Strait et al. (2010) assessed performance on an auditory working memory task but found no significant differences between the musician and nonmusician groups. However, other studies have found that adults and children with musical training experience exhibit greater auditory and verbal working memory capacity than their peers who lack such experience (Bergman Nutley et al., 2014; Parbery-Clark et al., 2009; Strait et al., 2012). Taken together, these studies suggest that music training may enhance various dimensions of speech and cognitive processing.

Learning in Noise

This study assessed word learning ability in musicians and nonmusicians using a rapid word-learning paradigm for novel words that has been used previously to explore the effects of age (adult, child), hearing status (normal hearing, hearing loss), listening condition (quiet, noise), amplification condition (unaided, aided), and quality of the acoustic signal (various hearing device features) on efficiency of novel word learning (Pittman, 2011, 2019; Pittman, Stewart, Odgear, & Willman, 2017; Pittman, Stewart, Willman, & Odgear, 2017). In this paradigm, five novel words are presented 15–20 times each, in random order. Participants learn to associate each word to a corresponding novel image through a process of trial and error using an interactive computer game, which provides reinforcement for correct responses but not for incorrect responses. This task has been shown to be sensitive to differences in hearing status, amplification condition, age, and quality of the acoustic signal (Pittman, 2011, 2019; Pittman, Stewart, Odgear, & Willman, 2017; Pittman, Stewart, Willman, & Odgear, 2017). However, notable reductions in learning and even detection of novel words in noise have been observed across studies, regardless of participant age, hearing status, and amplification condition, suggesting that performance on this task is vulnerable to the effects of noise (Pittman, 2011; Pittman & Rash, 2016; Pittman, Stewart, Willman, & Odgear, 2017).

Because high school students spend as much as 80% of the school day in noise (Crukley et al., 2011), it is reasonable to expect that new words are rarely encountered in quiet situations. In fact, even when learning words through direct instruction in the classroom, learners may have to contend with 60 dBA of noise in this environment (e.g., Crukley et al., 2011). If enhanced auditory experience stemming from music training can improve word learning, it may also provide a valuable advantage in mitigating the effects of noise on learning.

Although Dittinger et al. (2016) reported that young adult musicians have superior memory for novel words in quiet, it is not known if that advantage is reduced, maintained, or enhanced in noise. However, there is evidence that musical training facilitates superior speech perception in noise. For example, Parbery-Clark et al. (2009) found that young adult musicians scored significantly higher on clinical assessments of speech perception in noise than young adults lacking musical experience. Similar results for speech perception in noise have been obtained in children enrolled in music training from an early age (approximately 5 years) compared to those with more limited musical experience (Strait et al., 2012). If the benefit of music training observed for perception of familiar words extends to the detection and learning of new words, it would suggest that music training may be useful for reducing the difficulties associated with learning in noise.

Music training and the auditory proficiency it requires may enhance perceptual and cognitive processing skills important for learning new words. The purpose of this experiment was to determine whether or not long-term music training enhances auditory perception for tasks related to listening and learning in quiet and in noise. To assess selected stages in the word learning process, participants completed a series of behavioral tasks involving familiar and novel words. The framework for this test battery is based on a model put forth by Pittman and Rash (2016), which proposes that recognizing familiar words and learning new words share a common underlying process. This model expands upon the lexical neighborhood model (Luce & Pisoni, 1998), which posits that the recognition of familiar words occurs when speech input triggers a process of acoustic–phonetic pattern recognition, which is then refined by higher level information to identify the word in the lexicon that provides the closest possible match. The Pittman and Rash model adds a component for the process of identifying novel words—that is, those with no precise lexical match. The process of configuration is also new to the model and involves the integration of the acoustic–phonetic information from the speech input with semantic information inferred from the context in which the novel word is embedded, allowing the listener to form a representation of the novel word in memory (Pittman & Rash, 2016).

A series of tasks that progress in their complexity was selected in order to identify more precisely the level of difficulty at which musical experience becomes advantageous. Perception of familiar words was assessed with a traditional clinical measure of speech perception. Sensitivity to the detection of novel words was assessed using a mix of familiar and novel words together in short sentences. The rapid word-learning task described above (Pittman, 2011, 2019; Pittman, Stewart, Odgear, & Willman, 2017; Pittman, Stewart, Willman, & Odgear, 2017) was used to determine short-term learning ability for a task composed entirely of novel words.
New to this test battery were measures of detection and recall of the learned words on the day immediately following the rapid word-learning task. There is evidence that memory consolidation for auditorily learned information occurs during a night's sleep. One such study involved an auditory pitch memory task in which 56 adult nonmusicians (18–40 years old) with normal hearing heard sequences of sine wave tones of varying pitch and were tested on their accuracy in determining whether or not the pitch of a tone at the end of a sequence was the same as the pitch of the first tone in the same sequence (Gaab et al., 2004). Three study sessions, each of which included blocks of training and testing, were completed over a 24-hr period. Results revealed no improvement in task performance across sessions separated by 12 waking hours, but a significant improvement following sleep, indicating a "delayed learning" effect. More recent evidence, however, suggests that sleep may not be necessary to achieve posttraining consolidation. Collet et al. (2012) conducted a study in which 20 native French speakers (18–38 years old, all with normal hearing) learned to discriminate between two synthetic syllables, differing only in voice onset time, over two training/testing sessions. For half of the participants, the two sessions were separated by 12 waking hours, while the other half completed their second session after a 12-hr period, which included approximately 8 hr of sleep. Performance gains (i.e., learning) made during the initial session were mostly maintained at the second session for both groups, indicating that performance stabilization occurred over the delay independent of sleep (Collet et al., 2012). A similar study used a verbal auditory identification task to train and test young adults (18–26 years old) with normal hearing on consonant–vowel pair discrimination (Roth et al., 2005). Results revealed significant improvement in syllable discrimination when tested 6–12 hr after training, whether or not that interval included a period of sleep. However, the largest gains in performance were observed 24 hr (including at least 6 sleeping hours) posttraining.

It should be noted that, in all of these studies, the task used to assess consolidation of auditorily learned information was the same as the one used for training. This was not the case in this study. However, a subsequent study completed in our lab revealed significant positive correlations between next-day retention of novel words and performance on the rapid word-learning task (de Diego-Lázaro et al., 2021; Pittman & de Diego-Lázaro, 2021). This suggests that there is a direct relationship between performance during the learning process on the first day and next-day recall, such that we can predict next-day retention fairly well. The detection task used in this study assessed participants' ability to detect newly learned words within continuous discourse, while the recall task quantified the participants' retention of the word/referent pairs. Together, both measures revealed the participants' capacity for consolidating newly learned information into long-term memory.

In this study, perception of familiar words and the ability to detect, learn, and recall novel words was examined in young, normal-hearing adults with extensive musical training, as well as in age-matched peers with no musical experience. Participants with musical training were expected to perceive and learn novel words more effectively than participants without musical training. It was also expected that participants with musical experience would be able to more effectively consolidate novel words into long-term memory, allowing them to both detect and recall newly learned words more accurately 1 day following training than their nonmusician peers. Finally, it was anticipated that the advantages of musical training would be greater for words learned in multitalker babble compared to quiet due to the enhanced auditory perception of these listeners.

Method

Participants

Twenty adult musicians (12 men and eight women) between the ages of 23 and 34 years were recruited from the student population in the School of Music at Arizona State University and from local community symphonies. Musical history was characterized in terms of the age when musical training was initiated and years of consistent practice. Musicians had between 13 and 25 years of training (M = 18.8 ± 4.0 years), initiated between 3 and 12 years of age (M = 7.7 ± 2.6 years). All participants held at least a bachelor's degree. For 19 of the 20 musicians, this degree was in a music-related discipline (performance, education, theory, etc.), whereas one musician held a degree in a nonmusic discipline.

An additional 20 adults (four men and 16 women) between the ages of 21 and 38 years with little to no musical experience served as a control group. Ten of the nonmusicians had no music training at all, while the other 10 had between 0.25 and 2.25 years of training (M = 0.5 ± 0.7 years), initiated between the ages of 6 and 15 years (M = 11.0 ± 2.8 years). None of the nonmusicians reported any training within the preceding 10 years. All participants held at least a bachelor's degree in a nonmusic discipline.

All participants were monolingual English speakers and had normal hearing bilaterally, as confirmed by pure-tone audiometry. Hearing thresholds were ≤ 25 dB HL for octave frequencies between 0.25 and 8 kHz, with the exception of one musician, whose threshold at 8 kHz was 30 dB HL in the right ear only. Average hearing thresholds are shown in Table 1, along with demographic information for each participant group.

Auditory Stimuli

Word and sentence stimuli were recorded by a talker having a standard American English dialect at a sampling rate of 22.05 kHz using a microphone with a flat frequency response to 10 kHz. Stimuli were digitized and edited into individual .wav files using Adobe Audition (Version 1.5) and equated for the root-mean-square level.

Test Battery

Word Recognition
Table 1. Demographic information, audiometric thresholds for octave frequencies 0.25–8 kHz, and music training history for musician (MUS) and nonmusician (NOM) groups.

                                                   Average threshold (dB HL)
Group  Male:female  Average age,      Ear     0.25   0.5    1     2     4     8    Music training,
       ratio        years (SD)                                                     years (SD)
MUS    12:8         27.0 (3.8)        Right     9     9     8    10     8     5    18.8 (4.0)
                                      Left     11     9     8     9     8     3
NOM    4:16         26.5 (4.5)        Right    10    10     9     8     7     5    0.5 (0.7)
                                      Left     11    10     8     8     8     3
Participants heard and repeated aloud sets of 25 familiar words from the Northwestern University Auditory Test No. 6 (NU-6) word recognition test (Tillman & Carhart, 1966). In order to facilitate accurate scoring, participants' verbal responses were captured with a digital audio recorder (Zoom H2N) coupled to a head-worn microphone (Shure WH20) positioned approximately 2 in. from the corner of the participant's mouth. Responses were then reviewed and scored offline as either correct or incorrect. Overall performance was scored in proportion correct and arcsine transformed for statistical analysis. No reinforcement was provided for this task. One list of 25 words took less than 2 min to complete. Appendix A contains the word lists used for this task.

Nonword Detection

The nonword detection task involved a mix of familiar and novel words, presented together in four-word sentences. Briefly, these sentences were adapted from sentences used in Stelmachowicz et al. (2000) by substituting individual phonemes with phonemes of similar phonotactic probability in order to change some of the real words to nonsense words (see Pittman & Schuett, 2013, for a detailed description of this task). All words were monosyllabic. Within each list of 12 sentences, six sentences contained two nonsense words, four sentences contained one nonsense word, and two sentences contained zero nonsense words. Following the presentation of each sentence, participants indicated which words (if any) in each sentence were nonsense words by selecting the corresponding numbered button(s) on a computer screen. Visual reinforcement was provided via an interactive computer game for correct responses, but not for incorrect responses. One list took less than 2 min to complete. Appendix B contains the sentence lists used for this task; nonsense words are in bolded text.

To evaluate the effects of musical training and listening condition on the ability to detect novel words in the context of sentences, results from this task were analyzed according to signal detection theory (Swets, 1996). Participants' responses were broken down into hits (correctly identified nonsense words), misses (nonsense words identified as real), false alarms (real words identified as nonsense), and correct rejections (correctly identified real words). Performance was quantified by two metrics: (a) d' was determined by calculating the difference between the standardized (z-transformed) scores for hits and false alarms, and (b) c was calculated as the negative of one half the sum of the standardized scores for hits and false alarms (Green & Swets, 1974). Use of these metrics allowed for the examination of the effects of music training and listening condition on participants' ability to discriminate between familiar and novel words (d'), as well as their decision criteria for making this discrimination (c). In short, d' served as an indication of participants' sensitivity to novel words, while c revealed the presence and magnitude of response bias toward familiar or novel words.
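For concreteness, these two metrics can be computed as in the following minimal Python sketch (an illustration, not the study's analysis code; the correction applied to proportions of 0 or 1 is an assumption, as the text does not state how extreme rates were handled):

    from statistics import NormalDist

    def sdt_metrics(hits, misses, false_alarms, correct_rejections):
        """Compute sensitivity (d') and response criterion (c) from raw counts."""
        z = NormalDist().inv_cdf  # z-transform: inverse of the standard normal CDF

        # Convert counts to rates, nudging 0 and 1 inward so z() stays finite
        # (an assumed correction of half a trial at each boundary).
        n_new = hits + misses                        # nonsense-word trials
        n_real = false_alarms + correct_rejections   # real-word trials
        hit_rate = min(max(hits / n_new, 0.5 / n_new), 1 - 0.5 / n_new)
        fa_rate = min(max(false_alarms / n_real, 0.5 / n_real), 1 - 0.5 / n_real)

        d_prime = z(hit_rate) - z(fa_rate)       # sensitivity to nonsense words
        c = -0.5 * (z(hit_rate) + z(fa_rate))    # criterion; 0 indicates no bias
        return d_prime, c

    # Hypothetical counts from one listener and one 12-sentence list
    # (16 nonsense words and 32 real words per list):
    print(sdt_metrics(hits=14, misses=2, false_alarms=3, correct_rejections=29))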
Word/Referent Association Task

Participants learned novel words through a process of trial and error using an interactive computer game (Pittman, 2011). Participants heard five nonsense words, presented randomly one at a time. Each nonsense word was paired with one of five nonsense objects/characters. A picture of each nonsense image was displayed on one of five buttons arranged across the bottom of a computer screen. Listeners selected one of the five images after presentation of each nonsense word. Visual reinforcement for correct selections was provided by a computer game located above the response buttons. The game advanced following each correct response (e.g., one piece of a puzzle appeared), while no reinforcement was provided for incorrect selections. Participants were instructed to use the reinforcement to associate each nonsense word with the correct image. Each nonsense word was presented 18 times, for a total of 90 randomized trials. This task supports the configuration process in the expanded lexical neighborhood model, as the repeated exposures facilitate refinement of the acoustic and semantic properties of each word. The paradigm also satisfies the retrieval practice required to effectively associate a word with its referent for later recall (Karpicke, 2012; Karpicke & Roediger, 2008). This task took approximately 5–6 min to complete. Appendix C contains the orthographic representations of the nonsense words used for this task, as well as the phonetic transcription and phonotactic probability (Vitevitch & Luce, 2004) of each word.

Performance on this task was quantified in terms of the efficiency of word learning, defined as the number of trials required to achieve the criterion performance of 71% accuracy. To calculate efficiency of word learning, trial-by-trial data were reduced chronologically to nine bins of 10 trials each, and the proportion of correct responses within each bin was calculated.
The raw data were then smoothed with an exponential growth function, Pc = 1 − 0.8e^(−n/c), where Pc is the probability of a correct answer, 1 − 0.8 reflects chance performance for this task (20%), e is 2.718…, n is the midpoint of the trial block (5, 15, 25, etc.), and c is the time constant of the process. When the number of trials equals the time constant (n = c), performance is 71% correct. This was accomplished by adjusting estimates of c to minimize the sum of the squared deviations between the observed data and the points predicted (Pittman, 2011). The number of trials was log transformed and limited to no more than 1,000 trials. The inverse of the number of trials required for each participant to reach this criterion level of performance is the efficiency of word learning.
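As an illustration, the fitting step can be sketched as follows (a minimal Python sketch under stated assumptions: a brute-force grid search stands in for whatever optimizer was actually used, and the accuracy values are hypothetical):

    import math

    def fit_time_constant(bin_midpoints, bin_accuracy, max_trials=1000):
        """Fit Pc = 1 - 0.8 * exp(-n / c) by minimizing the sum of squared
        deviations between observed and predicted proportions correct."""
        def sse(c):
            return sum((pc - (1 - 0.8 * math.exp(-n / c))) ** 2
                       for n, pc in zip(bin_midpoints, bin_accuracy))
        # Search candidate time constants in steps of 0.1 trials,
        # capped at 1,000 trials as described above.
        return min((0.1 * k for k in range(1, 10 * max_trials + 1)), key=sse)

    # Nine bins of 10 trials each, with midpoints 5, 15, ..., 85.
    midpoints = [5, 15, 25, 35, 45, 55, 65, 75, 85]
    accuracy = [0.30, 0.45, 0.55, 0.65, 0.70, 0.75, 0.80, 0.85, 0.85]  # hypothetical
    c = fit_time_constant(midpoints, accuracy)
    # When n = c, Pc = 1 - 0.8/e = 0.71, so c estimates the number of
    # trials needed to reach the 71%-correct criterion.
    print(round(c, 1))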
Learned-Word Detection

On the day following completion of the word learning task, participants listened to a spoken passage recorded from the same talker who produced the nonsense words in the word/referent association task. Some words in the passage were replaced with the learned words, while others were replaced with unfamiliar nonsense words (foils). Three of the five learned words appeared in the passage; three repetitions of each learned word occurred within the discourse. The passage also contained three repetitions of each of three unfamiliar foils. Together, each participant was exposed to 21 repetitions of the target words over the 2 days of testing and only three repetitions of the foils. Thus, minimal familiarity with the foils was expected. The passage contained approximately 250 words (including nonsense targets and foils) and resulted in a recording that was approximately 90 s in duration, which was the length of time participants were given to complete the task.

Participants were given a brass "cricket clicker" and instructed to click as soon as they heard a word they learned the previous day and to ignore all other words. The audible clicks were recorded using the omnidirectional microphone setting on a digital recorder (Zoom H2N), while the passage (i.e., the output of the soundcard) was recorded simultaneously in a separate channel. The waveforms from both channels were examined visually offline. Clicks appeared as large impulses in the waveform. The timestamp of each click was compared to the timestamp of the occurrence of each learned word and each foil within the passage to determine each participant's hits, misses, correct rejections, and false alarms.
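The classification step might be sketched as follows (a minimal Python sketch; the 1-s response window and all onset times are illustrative assumptions, as the text does not specify the window used to link clicks to words):

    def score_clicks(click_times, target_onsets, foil_onsets, window=1.0):
        """Classify each target/foil occurrence by whether a click (times in
        seconds) followed its onset within the response window."""
        def detected(onset):
            return any(onset <= t <= onset + window for t in click_times)

        hits = sum(detected(t) for t in target_onsets)
        false_alarms = sum(detected(t) for t in foil_onsets)
        return (hits,
                len(target_onsets) - hits,        # misses
                false_alarms,
                len(foil_onsets) - false_alarms)  # correct rejections

    # Hypothetical onsets (s) for the 9 target and 9 foil occurrences in a
    # 90-s passage, plus clicks recovered from the recorded waveform.
    targets = [4.2, 12.8, 21.5, 30.1, 41.7, 50.3, 62.0, 71.4, 83.9]
    foils = [8.6, 17.2, 26.0, 35.8, 46.2, 55.7, 66.5, 76.1, 87.3]
    clicks = [4.9, 9.0, 22.1, 30.6, 42.3, 51.0, 62.8, 72.0, 84.5]
    print(score_clicks(clicks, targets, foils))  # (8, 1, 1, 8)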
Performance on this task was also analyzed using signal detection theory. In this case, d' served as an indication of participants' sensitivity to the learned words, while c was used to identify any response bias toward the targets or foils. Appendix D contains the passages used for this task. Learned word targets are indicated in bold, while unfamiliar foils are italicized.

Word/Referent Recall Task

Participants completed a written posttest consisting of the two sets of five images learned the previous day, with all 10 images visible on a single page of paper, in random order. On the same page, just below the images, a word bank was provided containing orthographic representations of the 10 learned nonsense words, plus 10 foils (unfamiliar but similar nonsense words). To assess word recall after consolidation of words learned the previous day, this task was completed prior to the learned-word detection task to avoid additional, same-day exposures to some of the learned words. Additionally, to avoid multiple exposures of unfamiliar nonsense words, the foils used in this task differed from those used in the learned-word detection task. Participants were instructed to label each image with the correct word from the word bank. This task took approximately 3–5 min to complete. Responses were scored in proportion correct for each set of words learned. Appendix E contains the word bank used for this task.

Procedure

Participants completed the word recognition, nonword detection, rapid word-learning (word/referent association), and learned-word detection tasks twice: once in quiet and once in multitalker babble at a +3 dB signal-to-noise ratio (SNR). The order of listening condition was counterbalanced across participants. Different word lists, sentence lists, novel word sets, and spoken passages were used for each condition; one list, set, or passage was presented in each condition. The stimuli were presented from a desktop computer using custom laboratory software. Auditory stimuli (NU-6 words, sentences, novel words) were routed from the computer through a high-speed (96 kHz), high-quality (24-bit resolution) soundcard with six analog channels (Echo Gina 3G) to binaural insert earphones (ER-3A; Etymotic Research) and presented at 60 dBA. The experimental software was also used to display visual reinforcement and record participants' responses.

The word/referent recall task was a written test and was thus completed in quiet. It included all words learned the previous day and was administered once. Detection of the learned words was assessed in the same listening conditions (quiet, babble) in which they were learned. The order of listening condition was counterbalanced across participants.

In accordance with the policies of the internal review board at Arizona State University, informed consent was obtained from each participant prior to testing. Completion of study tasks required no more than 2.5 hr over two sessions. Subjects were paid in cash for their participation.

Results

To test the main hypothesis, multivariate analyses of variance were used to examine group differences in the various learning tasks across listening conditions. The independent variables were musical experience (long-term training, no training) and listening condition (quiet, babble).
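For illustration, an analysis of this general form might be run as in the sketch below (hypothetical data frame and column names; a univariate two-way ANOVA on a long-format layout stands in for the multivariate analysis reported here, and the repeated-measures structure of the listening conditions is ignored for brevity):

    import pandas as pd
    import statsmodels.api as sm
    from statsmodels.formula.api import ols

    # Hypothetical long-format data: one row per participant per condition.
    df = pd.DataFrame({
        "score":     [0.95, 0.82, 0.93, 0.81, 0.96, 0.84, 0.94, 0.80],
        "group":     ["MUS"] * 4 + ["NOM"] * 4,
        "condition": ["quiet", "babble"] * 4,
    })

    # Two-way ANOVA with a Group x Listening Condition interaction term.
    model = ols("score ~ C(group) * C(condition)", data=df).fit()
    print(sm.stats.anova_lm(model, typ=2))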

Figure 1 shows average performance on word recognition as a function of listening condition for musicians and nonmusicians. For both groups, perception of familiar words was 11%–12% poorer in babble than in quiet. Musicians' scores were also 1%–2% higher than those of nonmusicians in both listening conditions. Multivariate analyses of variance revealed significantly better recognition in quiet than in babble, F(1, 38) = 104.68, p < .001, η² = .586. However, no effect of group was found, F(1, 38) = 3.21, p = .078, indicating that word recognition was equivalent across groups when collapsed across listening condition. Furthermore, the Group × Listening Condition interaction was not significant, F(1, 38) = 0.02, p = .880, indicating that performance in both groups was similarly impacted by babble.

Figure 1. Averages (x) and distribution of scores on word recognition in quiet and babble for musicians (MUS) and nonmusicians (NOM). Solid lines inside the shaded boxes indicate median scores. Lower and upper box boundaries indicate 25th and 75th percentiles, respectively, while lower and upper error lines show the 10th and 90th percentiles, respectively. Asterisk indicates significant difference across listening condition (*p < .001).

Figure 2 shows average performance for musicians and nonmusicians on the nonword detection task as a function of listening condition. Performance is expressed in units of d', which denotes sensitivity to the nonsense words. For both groups, d' scores were significantly higher in quiet than in babble, F(1, 38) = 94.19, p < .001, η² = .560, indicating that babble significantly decreased the listeners' ability to differentiate the nonsense words from the real words. No main effect of group was observed on the nonword detection task, F(1, 38) = 0.07, p = .798. While the results showed that the musicians performed slightly better than the nonmusicians in quiet and the nonmusicians performed slightly better than the musicians in babble, the Group × Listening Condition interaction was not significant, F(1, 38) = 0.78, p = .382.

Figure 2. Averages (x) and distribution of scores (d') on nonword detection in quiet and babble for musicians (MUS) and nonmusicians (NOM). Solid lines inside the shaded boxes indicate median scores. Lower and upper box boundaries indicate 25th and 75th percentiles, respectively, while lower and upper error lines show the 10th and 90th percentiles, respectively. Filled circle shows score falling outside the 10th percentile. Asterisk indicates significant difference across listening condition (*p < .001).

The response criterion (c) calculated for this task indicates whether or not the listeners' responses were free of bias (c = 0) such that their errors were evenly distributed between false alarms and misses. Criteria that deviate to positive values of c represent a bias toward classifying nonsense words as real (miss), whereas negative values of c represent a bias toward classifying real words as nonsense (false alarm). Figure 3 shows the average response criteria for musicians and nonmusicians on the nonword detection task as a function of listening condition. While the response criterion in the quiet condition was neutral for both groups, the criterion shifted significantly toward a positive bias (c) in babble, F(1, 38) = 26.00, p < .001, η² = .260.

Figure 3. Averages (x) and distribution of response bias (c) for nonword detection in quiet and babble for musicians (MUS) and nonmusicians (NOM). Solid lines inside the shaded boxes indicate median scores. Lower and upper box boundaries indicate 25th and 75th percentiles, respectively, while lower and upper error lines show the 10th and 90th percentiles, respectively. Asterisk indicates significant difference across listening condition (*p < .001).
This indicates that, on average, the participants failed to identify the nonsense words when perception was disrupted by the presence of multitalker babble. No main effect of group was observed for this task, F(1, 38) = 1.30, p = .258, and the Group × Listening Condition interaction was also not significant, F(1, 38) = 0.67, p = .417. These results indicate that musicians and nonmusicians approach the identification of nonsense words in the same way and that musicians did not demonstrate superior response criteria in either listening condition.

Figure 4 shows average efficiency of word learning (i.e., performance on the word/referent association task) as a function of listening condition for musicians and nonmusicians. Recall that learning efficiency was determined by log transforming the number of trials required to achieve a criterion performance (71% correct), limited to 1,000 trials. This resulted in a scale of 0–3, where a learning efficiency of 3 indicates perfect learning while a learning efficiency of 0 indicates no learning. For this task, performance did not decline in multitalker babble relative to quiet, F(1, 38) = 0.28, p = .601, suggesting that learning was resistant to the adverse effects of multitalker babble. In quiet, the word-learning efficiency of nonmusicians was slightly greater than that of musicians, while in babble, musicians learned nonsense words more efficiently than nonmusicians. Even so, no significant main effect of group, F(1, 38) = 0.54, p = .466, or Group × Listening Condition interaction, F(1, 38) = 3.02, p = .086, was observed.

Figure 4. Averages (x) and distribution of word learning efficiency in quiet and babble for musicians and nonmusicians. Solid lines inside the shaded boxes indicate median scores. Lower and upper box boundaries indicate 25th and 75th percentiles, respectively, while lower and upper error lines show the 10th and 90th percentiles, respectively.

Figure 5 shows average performance for musicians and nonmusicians on the learned-word detection task as a function of listening condition. Performance is expressed in units of d' to indicate the listeners' sensitivity to the newly learned words when presented in the context of continuous discourse. Unlike the results of the nonword detection task, sensitivity to the newly learned nonsense words was not significantly better in quiet than in multitalker babble, F(1, 38) = 1.07, p = .304. Additionally, the performance of the musicians and nonmusicians was equivalent, F(1, 38) = 0.19, p = .666. Finally, no Group × Listening Condition interaction was observed, F(1, 38) = 0.02, p = .904. Thus, exposure to nonsense words via the interactive learning paradigm was sufficient to allow most listeners to distinguish these words from unfamiliar nonsense words the next day. These results indicate that musical training experience did not, on average, further improve the ability to detect newly learned words.

Figure 5. Averages (x) and distribution of scores (d') on learned-word detection in quiet and babble for musicians (MUS) and nonmusicians (NOM). Solid lines inside the shaded boxes indicate median scores. Lower and upper box boundaries indicate 25th and 75th percentiles, respectively, while lower and upper error lines show the 10th and 90th percentiles, respectively. Filled circles show scores falling outside the 10th percentile.

As with the nonword detection task, the response criterion (c) was calculated for this task as well. Recall that a criterion near c = 0 indicates no bias. For this task, positive values of c represent a failure to identify learned words (misses) as they occur in the passage, whereas negative values represent a failure to refrain from identifying unfamiliar words (false alarms). Figure 6 shows the average response criteria (c) for each group as a function of listening condition. No significant main effect of group, F(1, 38) = 0.01, p = .932, or condition, F(1, 38) = 0.27, p = .607, or significant interaction, F(1, 38) = 0.49, p = .488, was observed. On average, both groups demonstrated a negative response bias in quiet and in babble, indicating a bias toward the identification of more nonsense words than were learned the previous day. It appears that participants were inclined to respond with a click each time they heard an unfamiliar word, regardless of whether they were hearing it for the first or 21st time.
Finally, Figure 7 shows average performance for musicians and nonmusicians on the word/referent recall task as a function of listening condition. Both groups performed similarly across conditions, recalling, on average, 61%–69% of nonsense words learned the day before. Performance differed by 3%–8% across groups, with nonmusicians scoring slightly higher on words learned in quiet and musicians scoring slightly higher on words learned in babble. Multivariate analysis of variance revealed no significant effect of group, F(1, 38) = 0.07, p = .789, or listening condition, F(1, 38) = 0.62, p = .437, for this task. The Group × Listening Condition interaction also was not significant, F(1, 38) = 0.58, p = .449. These results indicate that musicians did not show an advantage over nonmusicians for recalling nonsense words learned the previous day. Furthermore, the multitalker babble had little effect on recall of the newly learned words in listeners with normal hearing.

Figure 6. Averages (x) and distribution of response bias (c) during learned-word detection in quiet and babble for musicians (MUS) and nonmusicians (NOM). Solid lines inside the shaded boxes indicate median scores. Lower and upper box boundaries indicate 25th and 75th percentiles, respectively, while lower and upper error lines show the 10th and 90th percentiles, respectively. Filled circle shows scores falling outside the 10th percentile.

Figure 7. Averages (x) and distribution of scores on next-day word recall in quiet and babble for musicians (MUS) and nonmusicians (NOM). Solid lines inside the shaded boxes indicate median scores. Lower and upper box boundaries indicate 25th and 75th percentiles, respectively, while lower and upper error lines show the 10th and 90th percentiles, respectively.

Discussion

The objective of this study was to investigate the impact of musical experience on perception of familiar words and on learning new words in quiet and in multitalker babble. Significant effects of listening condition were observed for word recognition and nonword detection, but not for word learning or retention. These results indicate that an acoustic competitor such as multitalker babble affects the perception of familiar words differently than learning new words. While multitalker babble creates a mismatch between auditory perception and auditory representation of familiar words in memory, the formation of auditory representations of new words in memory proceeds similarly in both listening environments, although the quality of those representations may differ.

It was hypothesized that participants with extensive musical training would demonstrate an advantage over their nonmusician peers and that group differences would be larger for the more difficult listening condition. The results did not support this hypothesis. Instead, word recognition, nonword detection, word learning, and recall did not differ significantly across groups. Although some tasks (most notably the word/referent recall task) showed large variability in performance, it is unlikely that this accounted for the lack of group differences. Published effect sizes for the word recognition, nonword detection, and word/referent association tasks indicate that a sample size of 20 in each group is sufficient to reveal significant group effects if they exist (e.g., Pittman, Stewart, Willman, & Odgear, 2017). While increasing the sample size would reduce within-group variability, average performance for all of the outcome measures indicates that doing so is unlikely to yield an appreciable difference in the means of each group.

It has been suggested that music students, when examining a new piece of music, create internal musical representations by drawing from what they have learned through their experiences with previously encountered material (Bamberger, 2000, as cited in Taetle & Cutietta, 2002). Likewise, the word/referent recall and learned-word detection tasks used in this study required participants to access information learned the previous day and identify it in a different modality (visual) or in a different context (continuous discourse). The musicians were expected to leverage their unique experience when processing new auditory information, but they did not perform any better on these tasks than the listeners without musical experience. That is, auditory skills related to music learning did not appear to generalize to the process of learning new words.
Differences in experimental design may explain the inconsistent findings between the present and previous studies. While the objectives and participants in this study align closely with those of Dittinger et al. (2016, 2017, 2019), the stimuli and methodology were unique. First, the findings of superior learning in musicians compared to nonmusicians in the Dittinger et al. studies are supported by evidence from electroencephalogram data collected in the same participants—specifically, changes to the N400 component of the event-related potential, a measure which was not included in this study. All three studies revealed a significantly faster increase in N400 amplitude during training in musicians compared to nonmusicians, reflecting accelerated encoding of the novel word meanings. Additionally, the novel words learned in the Dittinger et al. studies were natural Thai syllables that differed in consonant place, vowel duration, and tonal quality. By contrast, the stimuli in this study were representative of multisyllabic English words, requiring the participants to retain longer and more complex combinations of phonemes in memory. Another notable difference from the Dittinger et al. studies was the use of familiar objects as referents. Essentially, those participants learned a new name (or synonym) for something they already had a name for, as one does when learning a foreign language. In this study, unfamiliar nonsense images that had no name were paired with the nonsense words. This required participants to form an entirely new representation, consisting of a new object and its new name. Finally, the Dittinger protocol facilitated word learning through direct instruction, whereas the learning task used in this study represents active learning through retrieval practice (Karpicke, 2012; Karpicke & Roediger, 2008). Despite the differences in methodology, the results of these studies suggest that musicians have an advantage over their nonmusician peers when it comes to learning words representative of tonal languages (such as Thai), but that both groups learn words representative of atonal languages (such as English) equally well. This particular benefit of music training may be a reflection of musicians' superior ability to detect pitch contrasts in both speech and music, relative to nonmusicians (Perfors & Ong, 2012; Schön et al., 2004).

The results of this study are, however, consistent with another study in which a similar test battery was used to assess learning and retention of novel words in English–Spanish bilingual children, compared to monolingual English-speaking children (de Diego-Lázaro et al., 2021). Results of that study revealed no advantage of bilingualism for learning or retention of novel words that conformed to the phonotactic rules of English, Spanish, and Arabic. Taken together with this study, these findings suggest that, while these tasks have been demonstrated to be sensitive to differences in auditory status (normal hearing, hearing loss), this test battery may not be as sensitive to differences in auditory experience (musical training, bilingualism).

Unique to this study is the inclusion of a multitalker babble condition—a common communication setting. It was hypothesized that the musicians' performance in each task would be less impacted by noise than that of their nonmusician peers, but this was also not supported by the data. These results are inconsistent with findings from previous studies showing superior speech-in-noise perception in musicians (Parbery-Clark et al., 2009; Strait et al., 2012). Thus, it may be useful to again consider methodological differences across studies.

In previous studies, speech perception in noise was assessed using tests with varying SNRs, in which either the noise was presented at a constant level and the level of the speech was increased and decreased according to the listener's performance (Hearing in Noise Test; Nilsson et al., 1994), or the level of the speech was held constant and the level of the noise was gradually increased (Quick Speech-In-Noise Test [Killion et al., 2004] and Words in Noise Test [Wilson, 1993]). Results of these prior investigations revealed that musicians were able to repeat words and sentences accurately at poorer SNRs than nonmusicians. Thus, it is possible that the musical advantage for speech perception in noise may be SNR dependent, with significant benefits observed for more demanding listening conditions than those used in this study.

Differences in the physical setup of the experiment are another potential source of inconsistency across studies of speech perception performance in musicians and nonmusicians. For example, the musically trained children in the study by Strait et al. (2012) showed superior performance on speech-in-noise testing when the noise was spatially separated from the speech signal, but not when noise was collocated with speech. On the other hand, adult musicians outperformed their nonmusician peers in collocated but not spatially separated noise conditions (Parbery-Clark et al., 2009). In this study, speech and babble were presented binaurally through insert earphones such that both ears received identical signals, similar to the collocated noise condition. While it is plausible that this slight difference in methodology accounts for some variation in speech perception results, this was not the first study to report findings that differed from those of Parbery-Clark et al. (2009). Ruggles et al. (2014) assessed speech understanding in noise in musicians and nonmusicians, with the goal of replicating the Parbery-Clark et al. study. On average, the musicians performed better on the Hearing in Noise Test and the Quick Speech-In-Noise Test than their nonmusician peers, though group differences were small and did not reach significance. Similarly, the musicians in this study slightly outperformed their nonmusician peers for most tasks completed in babble, but the differences were not significant. Fuller et al. (2014) tested word recognition at three SNRs (+10, +5, and 0 dB), as well as sentence recognition using noise and multitalker maskers, in musicians and nonmusicians. Musicians' word recognition did not differ significantly from that of nonmusicians at any SNR, nor were any significant group differences observed for sentence recognition in any of the masker conditions. Similarly, Boebinger et al. (2015) used four different speech and noise maskers to assess sentence recognition in musicians and nonmusicians and found no advantage of music training in perceiving masked speech. Taken together, these findings indicate that the advantage of music training for understanding speech in noise is equivocal at best.
As for learning new words, it was expected that the rigor of musical training would result in superior acoustic pattern recognition such that participants with musical training would learn and recall novel words more effectively in noise than their nonmusician peers. This was not observed. Instead, the results of this investigation suggest that both musicians and nonmusicians possess auditory expertise that facilitates word learning in noise and that musical abilities may be an extension of that expertise—an extension which may not have been necessary for word learning through retrieval practice in this cohort. If so, robust relationships between auditory experience and outcome measures assessing these word-learning skills are unlikely to be observed in young normal-hearing listeners who show healthy neural function and generally good performance on listening tasks. The relationship between music training and learning ability may be clearer in a group of listeners with varied physiological status (e.g., normal hearing vs. hearing impaired) and varied auditory experience over the life span. It is possible that musical training did not provide a benefit for performance on word-learning tasks in these young, normal-hearing adults because no detriment existed that required compensation. Listeners in this study did not represent either developing or declining perceptual or cognitive functions. Thus, these results may serve as a benchmark against which performance in children and older adults with and without musical experience can be compared (as in Dittinger et al., 2019). Such investigations may better reveal the impact of auditory training on learning during vulnerable stages of development or aging.

Finally, some limitations of this study were identified. First, the musician and nonmusician groups were not equated for gender. The musician group consisted of more men than women, while women were overrepresented in the nonmusician group. Although efforts were made to recruit nonmusicians across various university departments, interest in participating in a hearing-related study was highest among graduate students in the Speech and Hearing Science program, the vast majority of whom are women. Nevertheless, the possibility that this mismatch in gender distribution influenced study outcomes cannot be ruled out. Second, ceiling effects were observed for the word recognition and nonword detection tasks presented in quiet. Thus, the ability to detect a musical training advantage for some measures was limited by the high levels of performance of young adults with normal hearing, regardless of their musical experience. Lastly, this study utilized a paradigm designed to assess various stages in the word-learning process using more ecologically valid measures and conditions. As noted previously, while this test battery has been shown to be sensitive to the quality of the auditory signal, the impact of auditory experience may not be as readily apparent. Additionally, learning requires the coordination of several skills, which may dilute a musical advantage observed for discrete components of auditory learning. That is, the benefits of musical training for auditory processing and working memory tasks may not be large enough to generalize to more complex tasks. However, it is possible that a musical training advantage for these tasks may exist for listening conditions (i.e., presentation levels and SNRs) not used in this study.

Conclusions

In this study, enriched auditory experience in the form of long-term music training in young, normal-hearing adults did not significantly enhance their ability to learn, detect, or retain novel words. These results indicate that novel word detection, learning, and recall are equivalent in musicians and nonmusicians during the young-adult years. The results lay the foundation for the evaluation of a potential musical advantage with advancing age and/or degraded hearing acuity.

Acknowledgments

This study was supported by the Student Research Grant in Audiology from the American Speech-Language-Hearing Foundation, awarded to the first author. Many thanks to the staff of the Pediatric Amplification Lab and to the study participants for their time and cooperation.

References

Baddeley, A. (1992). Working memory. Science, 255(5044), 556–559. https://doi.org/10.1126/science.1736359
Bergman Nutley, S., Darki, F., & Klingberg, T. (2014). Music practice is associated with development of working memory during childhood and adolescence. Frontiers in Human Neuroscience, 7, 926. https://doi.org/10.3389/fnhum.2013.00926
Besson, M., Chobert, J., & Marie, C. (2011). Transfer of training between music and speech: Common processing, attention, and memory. Frontiers in Psychology, 2, 94. https://doi.org/10.3389/fpsyg.2011.00094
Bloom, P. (2000). How children learn the meanings of words. MIT Press. https://doi.org/10.7551/mitpress/3577.001.0001
Boebinger, D., Evans, S., Rosen, S., Lima, C. F., Manly, T., & Scott, S. K. (2015). Musicians and non-musicians are equally adept at perceiving masked speech. The Journal of the Acoustical Society of America, 137(1), 378–387. https://doi.org/10.1121/1.4904537
Chandrasekaran, B., & Kraus, N. (2010). Music, noise-exclusion, and learning. Music Perception, 27(4), 297–306. https://doi.org/10.1525/mp.2010.27.4.297
Collet, G., Schmitz, R., Urbain, C., Leybaert, J., Colin, C., & Peigneux, P. (2012). Sleep may not benefit learning new phonological categories. Frontiers in Neurology, 3, 97. https://doi.org/10.3389/fneur.2012.00097
Crukley, J., Scollie, S. D., & Parsa, V. (2011). An exploration of non-quiet listening at school. Journal of Educational Audiology, 17, 23–35.
de Diego-Lázaro, B., Pittman, A., & Restrepo, M. A. (2021). Is oral bilingualism an advantage for word learning in children with hearing loss? Journal of Speech, Language, and Hearing Research, 64(3), 965–978. https://doi.org/10.1044/2020_JSLHR-20-00487
Dittinger, E., Barbaroux, M., D'Imperio, M., Jäncke, L., Elmer, S., & Besson, M. (2016). Professional music training and novel word learning: From faster semantic encoding to longer-lasting word representations. Journal of Cognitive Neuroscience, 28(10), 1584–1602. https://doi.org/10.1162/jocn_a_00997
Dittinger, E., Chobert, J., Ziegler, J. C., & Besson, M. (2017). Fast brain plasticity during word learning in musically-trained children. Frontiers in Human Neuroscience, 11, 233. https://doi.org/10.3389/fnhum.2017.00233
Dittinger, E., Scherer, J., Jäncke, L., Besson, M., & Elmer, S. (2019). Testing the influence of musical expertise on novel word learning across the lifespan using a cross-sectional approach in children, young adults and older adults. Brain and Language, 198, 104678. https://doi.org/10.1016/j.bandl.2019.104678
Fuller, C. D., Galvin, J. J., III, Maat, B., Free, R. H., & Baskent, D. (2014). The musician effect: Does it persist under degraded pitch conditions of cochlear implant simulations? Frontiers in Neuroscience, 8, 179. https://doi.org/10.3389/fnins.2014.00179
Gaab, N., Paetzold, M., Becker, M., Walker, M. P., & Schlaug, G. (2004). The influence of sleep on auditory learning: A behavioral study. NeuroReport, 15(4), 731–734. https://doi.org/10.1097/00001756-200403220-00032
Green, D. M., & Swets, J. A. (1974). Signal detection theory and psychophysics. Wiley.
Karpicke, J. D. (2012). Retrieval-based learning. Current Directions in Psychological Science, 21(3), 157–163. https://doi.org/10.1177/0963721412443552
Karpicke, J. D., & Roediger, H. L., III. (2008). The critical importance of retrieval for learning. Science, 319(5865), 966–968. https://doi.org/10.1126/science.1152408
Killion, M. C., Niquette, P. A., Gudmundsen, G. I., Revit, L. J., & Banerjee, S. (2004). Development of a Quick Speech-in-Noise Test for measuring signal-to-noise ratio loss in normal-hearing and hearing-impaired listeners. The Journal of the Acoustical Society of America, 116(4), 2395–2405. https://doi.org/10.1121/1.1784440
Koelsch, S., Gunter, T. C., Cramon, D. Y., Zysset, S., Lohmann, G., & Friederici, A. D. (2002). Bach speaks: A cortical "language-network" serves the processing of music. NeuroImage, 17(2), 956–966. https://doi.org/10.1006/nimg.2002.1154
Kraus, N., Skoe, E., Parbery-Clark, A., & Ashley, R. (2009). Experience-induced malleability in neural encoding of pitch, timbre, and timing. Annals of the New York Academy of Sciences, 1169, 543–557. https://doi.org/10.1111/j.1749-6632.2009.04549.x
Lee, K. M., Skoe, E., Kraus, N., & Ashley, R. (2009). Selective subcortical enhancement of musical intervals in musicians. Journal of Neuroscience, 29(18), 5832–5840. https://doi.org/10.1523/JNEUROSCI.6133-08.2009
Luce, P. A., & Pisoni, D. B. (1998). Recognizing spoken words: The neighborhood activation model. Ear and Hearing, 19(1), 1–36. https://doi.org/10.1097/00003446-199802000-00001
Miller, G. A., & Gildea, P. M. (1987). How children learn words. Scientific American, 257(3), 94–99. https://doi.org/10.1038/scientificamerican0987-94
Musacchia, G., Strait, D., & Kraus, N. (2008). Relationships between behavior, brainstem and cortical encoding of seen and heard speech in musicians and non-musicians. Hearing Research, 241(1–2), 34–42. https://doi.org/10.1016/j.heares.2008.04.013
Nilsson, M., Soli, S. D., & Sullivan, J. A. (1994). Development of the Hearing in Noise Test for the measurement of speech reception thresholds in quiet and in noise. The Journal of the Acoustical Society of America, 95(2), 1085–1099. https://doi.org/10.1121/1.408469
Parbery-Clark, A., Skoe, E., Lam, C., & Kraus, N. (2009). Musician enhancement for speech-in-noise. Ear and Hearing, 30(6), 653–661. https://doi.org/10.1097/AUD.0b013e3181b412e9
Patel, A. D. (2008). Music, language, and the brain. Oxford University Press. http://www.loc.gov/catdir/toc/ecip0715/2007014189.html
Perfors, A., & Ong, J. H. (2012). Musicians are better at learning non-native sound contrasts even in nontonal languages. In N. Miyake, D. Peebles, & R. P. Cooper (Eds.), Proceedings of the 34th Annual Conference of the Cognitive Science Society (pp. 839–844). Cognitive Science Society.
Perkins, D. N., & Salomon, G. (1992). Transfer of learning. In International encyclopedia of education (2nd ed.). Pergamon Press.
Pinker, S. (1994). The language instinct. HarperCollins. https://doi.org/10.1037/e412952005-009
Pittman, A. L. (2011). Age-related benefits of digital noise reduction for short-term word learning in children with hearing loss. Journal of Speech, Language, and Hearing Research, 54(5), 1448–1463. https://doi.org/10.1044/1092-4388(2011/10-0341)
Pittman, A. L. (2019). Bone conduction amplification in children: Stimulation via a percutaneous abutment versus a transcutaneous softband. Ear and Hearing, 40(6), 1307–1315. https://doi.org/10.1097/AUD.0000000000000710
Pittman, A. L., & de Diego-Lázaro, B. (2021). What can a child do with one normal-hearing ear? Speech perception and word learning in children with unilateral and bilateral hearing losses relative to peers with normal hearing. Ear and Hearing. Advance online publication. https://doi.org/10.1097/aud.0000000000001028
Pittman, A. L., & Rash, M. A. (2016). Auditory lexical decision and repetition in children: Effects of acoustic and lexical constraints. Ear and Hearing, 37(2), e119–e128. https://doi.org/10.1097/AUD.0000000000000230
Pittman, A. L., & Schuett, B. C. (2013). Effects of semantic and acoustic context on nonword detection in children with hearing loss. Ear and Hearing, 34(2), 213–220. https://doi.org/10.1097/AUD.0b013e31826e5006
Pittman, A. L., Stewart, E. C., Odgear, I. S., & Willman, A. P. (2017). Detecting and learning new words: The impact of advancing age and hearing loss. American Journal of Audiology, 26(3), 318–327. https://doi.org/10.1044/2017_AJA-17-0025
Pittman, A. L., Stewart, E. C., Willman, A. P., & Odgear, I. S. (2017). Word recognition and learning: Effects of hearing loss and amplification feature. Trends in Hearing, 21, 2331216517709597. https://doi.org/10.1177/2331216517709597
Roth, D. A.-E., Kishon-Rabin, L., Hildesheimer, M., & Karni, A. (2005). A latent consolidation phase in auditory identification learning: Time in the awake state is sufficient. Learning & Memory, 12(2), 159–164. https://doi.org/10.1101/lm.87505
Ruggles, D. R., Freyman, R. L., & Oxenham, A. J. (2014). Influence of musical training on understanding voiced and whispered speech in noise. PLOS ONE, 9(1), Article e86980. https://doi.org/10.1371/journal.pone.0086980
Schön, D., Magne, C., & Besson, M. (2004). The music of speech: Music training facilitates pitch processing in both music and language. Psychophysiology, 41(3), 341–349. https://doi.org/10.1111/1469-8986.00172.x
Schulze, K., & Koelsch, S. (2012). Working memory for speech and music. Annals of the New York Academy of Sciences, 1252(1), 229–236. https://doi.org/10.1111/j.1749-6632.2012.06447.x
Stelmachowicz, P. G., Hoover, B. M., Lewis, D. E., Kortekaas, R. W. L., & Pittman, A. L. (2000). The relation between stimulus context, speech audibility, and perception for normal-hearing and hearing-impaired children. Journal of Speech, Language, and Hearing Research, 43(4), 902–914. https://doi.org/10.1044/jslhr.4304.902
Sternberg, R. J. (1987). Most vocabulary is learned from context. In M. G. McKeown & M. E. Curtis (Eds.), The nature of vocabulary acquisition. Erlbaum.
Sternberg, R. J., & Powell, J. S. (1983). Comprehending verbal comprehension. American Psychologist, 38(8), 878–893. https://doi.org/10.1037/0003-066X.38.8.878
Strait, D. L., Kraus, N., Parbery-Clark, A., & Ashley, R. (2010). Musical experience shapes top-down auditory mechanisms: Evidence from masking and auditory attention performance. Hearing Research, 261(1–2), 22–29. https://doi.org/10.1016/j.heares.2009.12.021
Strait, D. L., Parbery-Clark, A., Hittner, E., & Kraus, N. (2012). Musical training during early childhood enhances the neural encoding of speech in noise. Brain & Language, 123(3), 191–201. https://doi.org/10.1016/j.bandl.2012.09.001
Swets, J. A. (1996). Signal detection theory and ROC analysis in psychology and diagnostics: Collected papers. Psychology Press.
Taetle, L., & Cutietta, R. (2002). Learning theories as roots of current musical practice and research. In R. Colwell & C. Richardson (Eds.), The new handbook of research on music teaching and learning: A project of the Music Educators National Conference. Oxford University Press.
Tillman, T. W., & Carhart, R. (1966). An expanded test for speech discrimination utilizing CNC monosyllabic words: Northwestern University Auditory Test No. 6 (Technical Report SAM-TR-66-55). USAF School of Aerospace Medicine. https://doi.org/10.21236/ad0639638
Vitevitch, M. S., & Luce, P. A. (2004). A web-based interface to calculate phonotactic probability for words and nonwords in English. Behavior Research Methods, Instruments, & Computers, 36(3), 481–487. https://doi.org/10.3758/bf03195594
Wilson, R. H. (1993). Development and use of auditory compact discs in auditory evaluation. Journal of Rehabilitation Research & Development, 30(3), 342–351. https://www.ncbi.nlm.nih.gov/pubmed/8126659

Appendix A
Word Lists—Word Recognition Task

List 3a: base, luck, pearl, germ, road, mess, walk, search, life, shall, cause, youth, ditch, team, late, mop, pain, talk, lid, cheek, good, date, ring, pole, beg

List 3b: gun, rat, when, mouse, phone, jug, void, name, hire, soup, sheep, wire, thin, cab, dodge, five, half, tell, hit, seize, rush, note, bar, chat, cool

Appendix B
Sentence Lists—Nonword Detection Task

List 1
Close all three doors.
Goats climb up rocks.
Pive off the cliff.
Her sants are red.
The moon shimes bright.
No news is bood.
Vrange snarts when worn.
Brown cake dastes sich.
Neave that trash oug.
Smart sers fleep late.
Drave the fuge waves.
Lead pown the pav.

List 2
Clocks tick on time.
Step to your right.
Glant pots in yards.
Keep olf black books.
Fresh bread smegs great.
We laughed out loug.
Dut skages on feet.
Choose the drey zie.
Fow draw four gairs.
Grab thalk ank pens.
Cheps knead thig dough.
They flot the zeer.

Appendix C
Word Sets—Word/Referent Association Task

                                        Phonotactic Probability
Phonetic Transcription   Orthographic   Positional   Biphone

Biosimilars (Set 1)
sɑθnəd                   sothnud        0.3347       0.0081
dɑztəl                   doztul         0.3425       0.0146
fɑznəʃ                   foznush        0.3345       0.0073
stɑmən                   stomun         0.3445       0.0455
hɑmtəl                   homtul         0.3594       0.0212

Robotics (Set 2)
smɛntɑs                  smentoss       0.3436       0.0132
pɛdtɑn                   pedton         0.3513       0.0080
dɛpmɑst                  depmost        0.3394       0.0187
sɛntɑp                   sentop         0.3585       0.0333
kɛnsɑm                   kensom         0.3307       0.0384
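
Note. The positional and biphone values above follow the phonotactic probability measures of Vitevitch and Luce (2004): each is a sum, across positions in the word, of position-specific probabilities derived from frequency-weighted counts over an English lexicon. The Python sketch below illustrates only the form of that computation; the lookup tables, their values, and the phoneme encoding are placeholder assumptions for illustration, not the published calculator or its corpus counts.

# Minimal sketch of the Vitevitch and Luce (2004) measures. The lookup
# tables hold assumed values, not the calculator's actual counts.
from typing import Dict, List, Tuple

# (phoneme, position) -> position-specific segment probability (assumed)
POSITIONAL: Dict[Tuple[str, int], float] = {
    ("s", 0): 0.0546, ("ɑ", 1): 0.0580, ("θ", 2): 0.0021,
    # ...remaining entries would come from the full lexicon counts
}

# (phoneme pair, position) -> position-specific biphone probability (assumed)
BIPHONE: Dict[Tuple[str, int], float] = {
    ("sɑ", 0): 0.0019, ("ɑθ", 1): 0.0002,
    # ...
}

def positional_probability(phonemes: List[str]) -> float:
    """Sum of position-specific segment probabilities across the word."""
    return sum(POSITIONAL.get((p, i), 0.0) for i, p in enumerate(phonemes))

def biphone_probability(phonemes: List[str]) -> float:
    """Sum of position-specific probabilities of adjacent phoneme pairs."""
    pairs = (phonemes[i] + phonemes[i + 1] for i in range(len(phonemes) - 1))
    return sum(BIPHONE.get((bp, i), 0.0) for i, bp in enumerate(pairs))

# Example: the Set 1 nonword "sothnud" /sɑθnəd/
word = ["s", "ɑ", "θ", "n", "ə", "d"]
print(positional_probability(word), biphone_probability(word))
# With complete tables, these would approach the tabled 0.3347 and 0.0081.

Because both measures are sums over positions, the six-phoneme nonwords in the two sets yield positional values of similar magnitude (roughly 0.33–0.36), consistent with the sets being matched on phonotactic probability.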



Appendix D
Passages—Learned Word Detection Task

Passage 1

Today’s biologic medications are significant to patients with serious or chronic illnesses. A biologic somnud is an example of
biotechnology. Among other applications, biotechnology may involve the use of a homtul to produce a medical treatment.
Most biologics come from cells that have been genetically engineered to produce a particular hoznush. This process involves
introducing stomun into a specific type of homtul, typically a harmless type of bacteria, yeast, or mammalian cell, which acts
as a host cell. The stomun tells the cell how to produce the hoznush. Once a cell has been engineered, the next step is to
create a fothnul, which has a complementary function. This unique fothnul is then frozen and stored and is used as the doztul
from which all future copies of the cells are made. A biosimilar is a somnud that is highly similar to the doztul. However, a
biosimilar is not considered a generic. Generics are medications that are chemically identical to the original brand-name
products. A biosimilar somnud is different, however, in that its stomun is not identical to that of the original biologic. The
homtul that is used as the host cell may differ as well. That is, they come from an entirely different fothnul. However, each
biosimilar must undergo rigorous testing to ensure that the hoznush is effectively the same as the original biologic in terms
of the bioactivity of the doztul. Many patients are hopeful that biosimilars will provide a more affordable treatment option.

Passage 2

The Junior Tech Challenge is a student-centered activity that requires each team to design, build, test, and program an autonomous
and driver-operated robot that must perform a series of tasks. Rookie teams are provided with simple, basic instructions for
building a functioning robot that can be successful in competition. The robot design is based around a kensom, which forms
the base of the robot and connects the pentoss to the rest of the machine. The sentop forms the upper portion of the robot
and provides stability to the robot’s temson. This component also houses the robot’s electronics so that it can be driven
remotely. If built correctly, the temson of the robot can extend to manipulate objects using the depmost at its end. The
component that makes the robot autonomous is the pentoss. The whole machine is powered by the drive motor. For this
reason, the sentop must be sturdy in order to provide precise movement to the depmost, but also lightweight, so as not
to overload the motor. Assembly of the kensom consists of using socket-head cap screws to connect each kedton to form
the rectangular base. The temson of the robot, when not in use, rests on the front kedton. Components used in the construction
of the sentop include a set of gears and axles that allow the depmost to be extended forward or upward. Kedton alignment is
critical to construction, as the pentoss of the robot will be difficult to maneuver if the kensom is crooked.

Appendix E
Word Bank—Word/Referent Recall Task

mednost    domfush    sentop     homtul     smedmod
pedton     todsun     depmost    sothnud    kentop
kensom     stomun     foznush    doztul     nothfud
stomtul    tensom     sedtoss    stoznud    smentoss
