Professional Documents
Culture Documents
Bregman Et Al 2016 Songbirds Use Spectral Shape Not Pitch For Sound Pattern Recognition
Bregman Et Al 2016 Songbirds Use Spectral Shape Not Pitch For Sound Pattern Recognition
Bregman Et Al 2016 Songbirds Use Spectral Shape Not Pitch For Sound Pattern Recognition
pattern recognition
Micah R. Bregmana, Aniruddh D. Patelb, and Timothy Q. Gentnerc,d,e,f,1
a
Department of Cognitive Science, University of California, San Diego, La Jolla, CA 92093; bDepartment of Psychology, Tufts University, Medford, MA 02155;
c
Department of Psychology, University of California, San Diego, La Jolla, CA 92093; dSection of Neurobiology, University of California, San Diego, La Jolla,
CA 92093; eNeurosciences Graduate Program, University of California, San Diego, La Jolla, CA 92093; and fKavli Institute for Brain and Mind, La Jolla, CA 92093
Edited by Dale Purves, Duke University, Durham, NC, and approved December 17, 2015 (received for review August 3, 2015)
Humans easily recognize “transposed” musical melodies shifted up relational pitch processing for tone sequences (22, 29, 30; but
or down in log frequency. Surprisingly, songbirds seem to lack this see refs. 31, 32). Although songbirds can easily learn to dis-
capacity, although they can learn to recognize human melodies criminate between sequences of several tones (say, ascending vs.
and use complex acoustic sequences for communication. Decades descending sequences of four pitches or between the opening
of research have led to the widespread belief that songbirds, un- phrases of two different human melodies), even modest gener-
like humans, are strongly biased to use absolute pitch (AP) in mel- alization to frequency-shifted versions of the same relative pitch
ody recognition. This work relies almost exclusively on acoustically patterns requires extensive training (33), and this generalization
simple stimuli that may belie sensitivities to more complex spectral is restricted to narrow frequency ranges near the training tones
features. Here, we investigate melody recognition in a species of (34). However, songbirds can do relational processing for certain
songbird, the European Starling (Sturnus vulgaris), using tone se- aspects of tone sequence structure. Starlings, for example, can
quences that vary in both pitch and timbre. We find that small learn to discriminate between tone sequences of different tempi
manipulations altering either pitch or timbre independently can and can generalize this discrimination to novel sequences at
drive melody recognition to chance, suggesting that both percepts double the training tempo (35). They can also learn to discrim-
are poor descriptors of the perceptual cues used by birds for this inate between tone sequences that increase versus decrease in
task. Instead we show that melody recognition can generalize
loudness and generalize this discrimination to different loudness
even in the absence of pitch, as long as the spectral shapes of
ranges (36).
the constituent tones are preserved. These results challenge con-
Past work has characterized the difference in how humans and
ventional views regarding the use of pitch cues in nonhuman au-
songbirds recognize transposed pitch sequences in terms of a
ditory sequence recognition.
reliance on relative versus absolute pitch (AP) in tone sequence
perception. Although most humans rely primarily on relative
| | comparative cognition | pitch processing |
Downloaded from https://www.pnas.org by 189.203.88.166 on March 18, 2024 from IP address 189.203.88.166.
Results
mance on the training sequences and not statistically different
To investigate perception of tone sequences that systematically from chance based on binomial tests. Even very small pitch
vary in both pitch and timbre, we trained starlings to recognize changes that placed the test stimuli entirely within the range of
short four-tone sequences that either ascended or descended in the training stimuli rendered the sequence unrecognizable.
Specifically, two interleaved stimuli (starting at B4 and C#5,
PSYCHOLOGICAL AND
COGNITIVE SCIENCES
corresponding to shift amounts of one and three semitones; Fig.
A response speaker C 2A) were one semitone below and one semitone above pairs of
ports 1 training stimuli, respectively, yet recognition of these stimuli was
at chance (mean of interleaved stimuli = 49.2% correct) and was
significantly below recognition performance on the training se-
Proportion Correct
0.8
food
port quences interleaved during test sessions [Wilcoxon rank-sum Z
0.6
(25) = –4.13, P < 0.0001; Materials and Methods].
0.4
These initial results indicate that starlings do not recognize
frequency-shifted versions of spectrotemporally complex tone
0.2 sequences. That is, although they readily learned to discriminate
ascending from descending pitch sequences that were also dis-
0
0 50 100 150
tinguished by their pattern of spectral shape variation over time,
an upward or downward frequency shift of even a single semi-
B 80 80 Training block (200 trials/block)
tone severely disrupted discrimination. Thus, the birds do not
78 78 78 78 appear to take advantage of the redundant pattern of spectral
76 76 76 76 76 76
information in the tone sequences to facilitate recognition of
transposed sequences. At first glance, these results appear to
74 74 74 74 74 74 provide strong support for the prevailing view that songbirds are
72 72 72 72
biased toward using AP cues in tone sequence recognition. In-
deed, the results suggest that the bias is particularly strong when
70 70 tone sequences have spectral shape variation over time (22),
even when that spectral shape variation alone could be used to
Fig. 1. (A) Schematic of the operant panel used for behavioral testing. discriminate the sequences.
Three response ports, the food port, and playback speaker are labeled.
(B) Schematic of the six training stimuli used in experiment 1. Numbers in
Sequence Recognition with Novel Timbre. Importantly, confirma-
each box refer to the MIDI note number (e.g., 70, Bb4, 466.16 Hz; 72, C5, 523.25 Hz,
etc.; see Materials and Methods), and color indicates the instrument timbre used
tion that AP really is the primary cue for tone sequence recog-
(blue, oboe; red, choir “aah”; green, muted trumpet; purple, synthesizer). Each of nition requires evidence of positive generalization. That is, if our
the three ascending and three descending tone sequences are connected with birds are using AP, then they should readily recognize ascending
black lines. (C) Mean proportion of correct responses (±SE) for each of the five and descending tone sequences that match the pitch of the
subjects (one color per subject) over the course of training. training stimuli but not their spectral shape. To test this idea, we
0.8 0.8
Proportion Correct
Proportion correct
0.7 0.7
0.6 0.6
0.5 0.5
0.4 0.4
0.3 0.3
0.2 0.2
0.1 0.1
0 0
−12 −8 −6 −3 −1 0 1 2 3 4 6 8 12 -15 -10 -5 0 5 10 15
Fig. 2. Pitch and timbre results. (A) Mean proportion of correct responses to tone sequences with novel pitches but familiar timbre. The x axis shows pitch
shift (in semitones) relative to the lowest frequency training stimulus. Gray circles show recognition of the original training sequences during test sessions.
Each point shows the data for one subject. (B) Performance as in A but for sequences composed of sounds with a novel (piano) timbre at the training pitches
(gray) or shifted in pitch (black). Dashed lines indicate Bonferroni-corrected 95% CI around chance (50%).
also presented (during the same test sessions already described) the dominant cue that starlings use to recognize tone sequences,
sequences composed of tones with a novel piano timbre (Fig. S1 then subjects should more easily recognize noise-vocoded versions
and Audio Files S6–S8) at the same pitch as two of the training of the training stimuli than piano-tone versions. If AP is the
sequences (the lowest pitch ascending–descending pair in Fig. 1B). dominant cue, then this pattern of results should be reversed.
If subjects use AP, recognition should generalize to these test We first tested for transfer from the original three ascending
sequences. This was not the case. As with alterations in pitch, versus descending training sequences to noise-vocoded versions
altering the timbre of the training sequences reduced recognition of these stimuli and then to piano-tone versions of the same
performance to chance (mean = 53.5% correct; range, 47.7% stimuli (Materials and Methods). Generalization from the original
correct to 57.4% correct; Fig. 2B, gray circles). For all subjects, training is revealed in both the strength of the initial transfer and
performance was well below recognition of the training sequences in the subsequent acquisition rate. Surprisingly, even in the first
(mean = 93.7% correct; binomial test adjusted for multiple com- 100 trials after transfer, four of five subjects performed better
parisons) and within the 95% confidence interval (CI) around than expected by chance on the noise-vocoded sequences
chance (∼41–59% correct, based on mean of 170.3 trials, cor- (mean = 70.0% correct; range, 55–86%; upper bound of 99% CI,
Downloaded from https://www.pnas.org by 189.203.88.166 on March 18, 2024 from IP address 189.203.88.166.
rected for multiple comparisons). Thus, changing the timbre of 63.1%). Over the first five 100-trial blocks with the noise-vocoded
the tone sequences drove recognition to chance, even when the stimuli, performance was similarly strong (mean = 78.2% correct;
AP of the training sounds was preserved. Given these initial range, 67–90.6%) but significantly below the performance just be-
results, it is not surprising that performance was at chance for fore transfer [paired t(4) = 4.05, P = 0.015; Fig. 3]. Response ac-
tone sequences with novel (piano) timbre at novel pitch levels curacy continued to improve with additional training, reaching a
shifted by ±1, ±3, ±6, ±8, and ±12 semitones relative to the training mean of 86.5% correct in blocks 6–10 (Fig. 3) and continuing to
stimuli (Fig. 2B, black circles). Our observation that altering timbre improve thereafter until it was statistically indistinguishable from
while holding AP constant drives recognition to chance (Fig. 2B) pretransfer levels in blocks 11–14 (mean = 87.2% correct). In
challenges the broadly held view that starlings use AP to recognize contrast, transfer from the original training sequences to the piano-
tone sequences. tone stimuli was much poorer. For all subjects, performance in the
first 100-trial block after transfer to piano-tone sequences did
Recognition of Noise-Vocoded Sequences. The preceding results not differ significantly from chance (mean = 55.4% correct;
demonstrate that independent manipulations of either pitch range, 49–61%; upper bound of 99% CI, 63.1%). Moreover,
height or timbre severely disrupt tone sequence recognition (Fig. although performance did improve with further training, it did so
2 A and B). One interpretation of these results is that the per- only slowly with the mean performance for one of five and four of
cepts of both pitch and timbre provide relatively poor descrip- five subjects exceeding the upper 99% CI around chance in blocks
tions of the perceptual cues available to starlings. We reasoned 1–5 and 6–10, respectively (Fig. 3).
instead that starlings may rely on a perception of each tone based Direct comparisons support the strong differences in responding
on its absolute spectral envelope (i.e., the overall pattern of to the noise-vocoded and piano-tone sequences (Fig. 3). Response
spectral amplitudes across particular frequency bands) rather accuracy varied significantly as a function of test condition
than abstracted features derived from the fundamental fre- [pretransfer, posttransfer blocks 1–5, posttransfer blocks 6–10;
quency (AP) or on the relative power in the harmonics (timbre). By F(2, 8) = 97.08, P < 0.0001, linear mixed effects model (LMM)].
this view, disruption to the absolute spectral envelope (and thus More importantly, the mean accuracy of response to the noise-
recognition) could occur by changing the relative power in the vocoded stimuli was significantly higher than that for the piano
harmonics (which plays a role in human timbre perception) or by tones, F(1, 4) = 29.4, P = 0.0056, LMM, and the acquisition rate
changing the absolute frequencies where power is found (which was significantly faster, F(2, 8) = 26.8, P = 0.0003, LMM
plays a role in human harmonic pitch perception). To test this idea, Condition × Stimulus Interaction. Post hoc tests (Tukey’s honestly
we conducted a third experiment that directly pitted spectral shape significant difference) show no difference between performance on
information against AP information, using noise-vocoded ver- the two stimulus sets pretransfer (P = 0.99) but statistically signifi-
sions of the training stimuli (which preserved the absolute cant differences in mean performance in blocks 1–5 posttransfer
spectral envelope of each tone but removed periodic pitch cues; (P = 0.0014) and blocks 6–10 posttransfer (P = 0.0008). This
Fig. S2 and Audio Files S9–S11) and piano-tone versions of the provides a strong demonstration that starlings can generalize
training sequences (which preserve the pitch but change the recognition of ascending versus descending tone sequences but
absolute spectral envelope of each tone; Audio Files S6–S8, cf., do so using the absolute spectral envelope rather than periodic
Materials and Methods). If absolute spectral envelope variation is pitch cues.
PSYCHOLOGICAL AND
COGNITIVE SCIENCES
components of each tone (and across the tone sequence), it none-
sequences. By using sound sequences that simultaneously vary in theless alters the absolute frequencies (Figs. S3–S5). It thus seems
pitch and timbre (as many natural sound sequences do) and by that the absolute spectral envelope governs avian tone sequence
using an acoustic resynthesis technique from speech science that recognition. For pure tones, the spectral band envelope corresponds
removes pitch cues (noise vocoding) but preserves overall spec- directly to pitch; for complex tones, the spectral band envelope
tral shape, we find that the absolute spectral envelope of each contributes to both pitch and timbre percepts.
sound (i.e., overall spectral shape across particular frequencies) Importantly, absolute spectral envelope is not likely to be the
drives the recognition of tone patterns rather than AP. only perceptual feature that songbirds can use for auditory rec-
These results are surprising given the similarities between ognition. Previously, we showed that starlings can maintain the
starlings and humans in basic psychoacoustic abilities (16) and learned recognition of conspecific songs even when those songs
given that these birds do perceive the pitch of tones with complex are shifted in frequency by large amounts (22). Although the
harmonic structure, including the pitch of the missing funda- precise cues that starlings use to recognize frequency-shifted
mental (18). Thus, although the pitch of a sound may be salient conspecific songs require future study, such manipulations alter
to songbirds, unlike humans they do not seem to use pitch to the absolute spectral envelope of the signals. However, the
generalize across sound patterns. This was particularly notable in spectrotemporal complexity of songs and other natural stimuli
experiment 3, where songbirds transferred much more readily to provides additional perceptual cues (e.g., rhythm and amplitude
noise-vocoded versions of the ascending versus descending training envelope) that are invariant across frequency shifts. Moreover,
sequences than to piano-tone versions that preserved the pattern of the importance of any given cue can vary depending on listening
task. Sensitivity to these features in specific tasks may help to
AP of the training stimuli. To human ears, the piano-tone versions
explain prior evidence suggesting specialized song-processing
sound quite similar to the training sequences because of their
mechanisms in birds (15).
identical pitch patterns, whereas the noise-vocoded versions sound Our results indicate that behavioral effects tied classically to
strikingly different from the training stimuli (compare Audio Files changes in the frequencies of pure tones (29, 30, 37–39) should not
S1–S3 to Audio Files S6–S11). However, for the birds, this pattern be strictly interpreted as changes to the percept of pitch. Instead, we
of perceptual similarity seems to be reversed: The noise-vocoded suggest a revised perspective on melody recognition by songbirds.
stimuli are treated as more similar to the training stimuli than are We propose that unlike humans, for whom pitch plays a dominant
the piano-tone versions that preserve AP. In humans, speech rec- role in the perception of melodic sequences, songbirds rely on a
ognition is famously robust to the pitch-degrading manipulations perceptual representation that appears more closely tied to absolute
introduced by noise vocoders (43), whereas similar manipulations spectral envelope. This surprising difference has implications both
have severe impacts on music perception (44). Our observation that for research in the cognitive psychology of auditory perception and
1B). The lowest ascending training stimulus started on Bb4 (466.16 Hz) and number of consecutive trials, otherwise receiving only the secondary visual
continued to C5 (523.25 Hz), D5 (587.33 Hz), and E5 (659.26 Hz) (corre- reinforcer for correct responses. Incorrect responses or nonresponses reset
sponding to MIDI notes 70, 72, 74, and 76, respectively). The corresponding the count of correct trials. Eventually the fixed ratio was set at six trials for
descending stimulus used the same pitches in reverse order, starting at E5 each subject.
and ending at Bb4. The other two ascending stimuli started on C5 and D5, Recognition testing. To test generalization of the pitch-shifted and novel
ending on F#5 and G#5, respectively, whereas descending stimuli used the timbre tone sequences, we used a probe procedure. On 66% of trials, we
same pitches in reverse order. Thus, the two other ascending and descending presented the training stimuli (randomly selected as in the initial training),
stimuli represented upward shifts of two and four semitones relative to the and on the remaining 33% of trials, we presented one of the pitch-shifted or
original Bb4–C5–D5–E5 or E5–D5–C5–Bb4 sequence. All stimuli were nor- novel timbre stimuli. To keep response rates high, we required subjects to
malized to a mean power of 65 dB. respond correctly to six consecutive training stimulus trials. Responses to the
Pitch-shifted stimuli. We synthesized test stimuli with the same interval spacing test stimuli were never immediately followed by reinforcement and did not
and timbre sequences as the training stimuli but starting at pitches not heard affect the consecutive correct response counter. Failure to respond to any
during training. The novel ascending stimuli started at Bb3, D3, F#3, A4, B4,
stimulus reset the response counter so that six correct responses to training
C#5, F5, G5, and Bb5. Relative to the lowest ascending training sequence
stimuli were again required to receive a food reward.
starting on Bb4, these sequence represent shifts of –12, –8, –6, –3, –1, 1, 3, 6, 8,
Transfer procedure. We a used a transfer procedure rather than a probe-rec-
and 12 semitones, respectively. Novel sequences starting at B4 and C#5 lie be-
ognition procedure in experiment 3. In the transfer procedure, subjects were
tween two training stimuli but were never heard during training, whereas the
switched immediately from sessions in which the training stimuli are pre-
other test stimuli lie partly or entirely outside of the training frequency range.
sented on all trials to sessions in which the test stimuli were presented on all
Piano-tone stimuli. We also constructed novel timbre versions of the training
trials. During these test sessions, subjects were reinforced (or punished) for
stimuli—that is, three ascending and three descending sequences—matched
in AP and duration pattern to the sequences in Fig. 1 but using only piano correct (or incorrect) responses to test stimuli, as during the training sessions.
tones. In addition, we synthesized versions of these novel timbre sequences Above chance performance during initial transfer trials and/or rapid acqui-
that were shifted by ±1, ±3, ±6, ±8, and ±12 semitones relative to lowest sition indicates the generalization, or transfer, of learning from the training
frequency ascending/descending pair of training stimuli. to the test stimuli. Initial transfer performance that falls to chance and takes
Noise-vocoded stimuli. To disrupt pitch cues while retaining the frequency- longer to recover indicates weaker generalization and the need to relearn the
specific spectral shape of each tone, we created noise-vocoded versions of the recognition task anew. The primary difference compared with the probe
training stimuli. Noise vocoding is accomplished by dividing an acoustic signal procedure is that the transfer procedure differentially reinforces responses
into a fixed number of frequency bands, extracting the amplitude envelope and thus affords subjects an opportunity to learn stimulus response associ-
within each band, and then using this envelope to modulate band-pass filtered ations, providing access to a more subtle behavioral measure in acquisition
white noise. These amplitude-modulated noise bands are then recombined to rate. We exposed subjects to a mean of 2,636 trials with the noise-vocoded
create the noise-vocoded signal (see ref. 56). Noise vocoding has been used for stimuli (range, 1,389–3,308) before returning them to the original training
many years in speech research to investigate the role of detailed spectral struc- stimuli. After ensuring stable, accurate recognition (mean = 94.4% correct;
ture (independent of pitch) in speech perception (43). Noise-vocoded speech range, 87.6–97.0% correct), we then transferred them to piano-tone ver-
sounds somewhat like whispered speech and is highly intelligible to humans if sions of the training stimuli.
the number of frequency bands is sufficiently large (e.g., 15 bands between 50
and 8,000 Hz, as in ref. 57), because this preserves the overall time-varying shape ACKNOWLEDGMENTS. The research was partially funded by National Institutes
of the speech spectrum. of Health Grant R01DC008358. A.D.P. was supported by Neurosciences Research
We constructed noise-vocoded versions of our training stimuli by dividing Foundation as part of its program on music and the brain at The Neurosciences
each original training stimulus into 16 logarithmically spaced frequency Institute.
19. Itatani N, Klump GM (2014) Neural correlates of auditory streaming in an objective 48. Pitt MA, Crowder RG (1992) The role of spectral and dynamic cues in imagery for
behavioral task. Proc Natl Acad Sci USA 111(29):10738–10743. musical timbre. J Exp Psychol Hum Percept Perform 18(3):728–738.
20. MacDougall-Shackleton SA, Hulse SH, Gentner TQ, White W (1998) Auditory scene 49. Caruso VC, Balaban E (2014) Pitch and timbre interfere when both are parametrically
analysis by European starlings (Sturnus vulgaris): Perceptual segregation of tone se- varied. PLoS One 9(1):e87065.
quences. J Acoust Soc Am 103(6):3581–3587. 50. Pitt MA (1994) Perception of pitch and timbre by musically trained and untrained
21. Baptista LF, Keister RA (2005) Why birdsong is sometimes like music. Perspect Biol listeners. J Exp Psychol Hum Percept Perform 20(5):976–986.
Med 48(3):426–443. 51. Bahr N, Christensen CA, Bahr M (2005) Diversity of accuracy profiles for absolute pitch
22. Bregman MR, Patel AD, Gentner TQ (2012) Stimulus-dependent flexibility in non-
PSYCHOLOGICAL AND
COGNITIVE SCIENCES
recognition. Psychol Music 33(1):58–93.
human auditory pitch processing. Cognition 122(1):51–60. 52. Hedger SC, Heald SL, Nusbaum HC (2013) Absolute pitch may not be so absolute.
23. Nicolai J, Gundacker C, Teeselink K, Güttinger HR (2014) Human melody singing by
Psychol Sci 24(8):1496–1502.
bullfinches (Pyrrhula pyrrula) gives hints about a cognitive note sequence processing.
53. Hulse SH, Bernard DJ, Braaten RF (1995) Auditory discrimination of chord-based
Anim Cogn 17(1):143–155.
spectral structures by European starlings (Sturnus vulgaris). J Exp Psychol Gen 124(4):
24. West MJ, King AP (1990) Mozart’s starling. Am Sci 78(2):106–114.
409–423.
25. Brown S, Jordania J (2013) Universals in the world’s musics. Psychol Music 41(2):
54. Hervais-Adelman AG, Davis MH, Johnsrude IS, Taylor KJ, Carlyon RP (2011) General-
229–248.
ization of perceptual learning of vocoded speech. J Exp Psychol Hum Percept Perform
26. Patel AD, Demorest SM (2013) Comparative music cognition: Cross-species and cross-
cultural studies. The Psychology of Music, ed Deutsch D (Elsevier Academic Press, San 37(1):283–295.
Diego), 3rd Ed, pp 647–681. 55. Shannon RV, Fu QJ, Galvin J, 3rd (2004) The number of spectral channels required for
27. Plantinga J, Trainor LJ (2005) Memory for melody: Infants use a relative pitch code. speech recognition depends on the difficulty of the listening situation. Acta Otolaryngol
Cognition 98(1):1–11. Suppl (552):50–54.
28. McDermott JH, Lehr AJ, Oxenham AJ (2008) Is relative pitch specific to pitch? Psychol 56. Davis MH, Johnsrude IS, Hervais-Adelman A, Taylor K, McGettigan C (2005) Lexical
Sci 19(12):1263–1271. information drives perceptual learning of distorted speech: Evidence from the com-
29. Hulse SH, Cynx J, Humpal J (1984) Absolute and relative pitch discrimination in serial prehension of noise-vocoded sentences. J Exp Psychol Gen 134(2):222–241.
pitch perception by birds. J Exp Psychol Gen 113(1):38–54. 57. Davis MH, Johnsrude IS (2003) Hierarchical processing in spoken language compre-
30. Page SC, Hulse SH, Cynx J (1989) Relative pitch perception in the European starling hension. J Neurosci 23(8):3423–3431.
(Sturnus vulgaris): Further evidence for an elusive phenomenon. J Exp Psychol Anim 58. Gentner TQ (2008) Temporal scales of auditory objects underlying birdsong vocal
Behav Process 15(2):137–146. recognition. J Acoust Soc Am 124(2):1350–1359.