Event-Related Brain Responses While Listening to Entire Pieces of Music

Neuroscience 312 (2016) 58–73

H. POIKONEN,a,* V. ALLURI,b E. BRATTICO,a,c O. LARTILLOT,d M. TERVANIEMI,a,e AND M. HUOTILAINEN a,e,f

a Cognitive Brain Research Unit, Cognitive Science, Institute of Behavioural Sciences, University of Helsinki, P.O. Box 9 (Siltavuorenpenger 1 B), FI-00014 University of Helsinki, Finland
b Department of Music, University of Jyväskylä, P.O. Box 35, 40014 University of Jyväskylä, Finland
c Center for Music in the Brain (MIB), Department of Clinical Medicine, Aarhus University, Nørrebrogade 44, DK-8000 Aarhus C, Denmark
d Department of Architecture, Design and Media Technology, University of Aalborg, Rendsburggade 14, DK-9000 Aalborg, Denmark
e Cicero Learning, P.O. Box 9 (Siltavuorenpenger 5 A), FI-00014 University of Helsinki, Finland
f Finnish Institute of Occupational Health, Haartmaninkatu 1 A, 00250 Helsinki, Finland

Abstract—Brain responses to discrete short sounds have been studied intensively using the event-related potential (ERP) method, in which the electroencephalogram (EEG) signal is divided into epochs time-locked to stimuli of interest. Here we introduce and apply a novel technique which enables one to isolate ERPs in humans elicited by continuous music. The ERPs were recorded during listening to a Tango Nuevo piece, a deep techno track and an acoustic lullaby. Acoustic features related to the timbre, harmony, and dynamics of the audio signal were computationally extracted from the musical pieces. Negative deflections occurring around 100 ms after stimulus onset (N100) and positive deflections occurring around 200 ms after stimulus onset (P200) in response to peak changes in the acoustic features were distinguishable and were often largest for the Tango Nuevo piece. In addition to large changes in these musical features, long phases of low feature values preceding a rapid increase – which we call Preceding Low-Feature Phases – enhanced the amplitudes of the N100 and P200 responses. These ERP responses resembled those to simpler sounds, making it possible to utilize the tradition of ERP research with naturalistic paradigms. Crown Copyright © 2015 Published by Elsevier Ltd. on behalf of IBRO. All rights reserved.

Key words: event-related potentials, music, electroencephalography, musical features, N100, P200.

*Corresponding author. Address: Cognitive Brain Research Unit, Institute of Behavioural Sciences, University of Helsinki, P.O. Box 9 (Siltavuorenpenger 1 B), FI-00014 University of Helsinki, Finland. Mobile: +358-407348028. E-mail addresses: hanna.poikonen@helsinki.fi (H. Poikonen), vinoo.alluri@campus.jyu.fi (V. Alluri), elvira.brattico@helsinki.fi (E. Brattico), olartillot@gmail.com (O. Lartillot), mari.tervaniemi@helsinki.fi (M. Tervaniemi), minna.huotilainen@helsinki.fi (M. Huotilainen).

Abbreviations: EEG, electroencephalography; ERP, event-related potential; FIR, finite impulse response; HG, Heschl's gyrus; ICA, independent component analysis; ISI, inter-stimulus interval; MoRI, magnitude of the rapid increase; N100, negative deflection occurring around 100 ms after stimulus onset; P200, positive deflection occurring around 200 ms after stimulus onset; PLFP, Preceding Low-Feature Phase; RMS, root mean square; STG, superior temporal gyrus.

http://dx.doi.org/10.1016/j.neuroscience.2015.10.061
0306-4522/Crown Copyright © 2015 Published by Elsevier Ltd. on behalf of IBRO. All rights reserved.

INTRODUCTION

For centuries, music has been an important part of various cultures, from tribal drumming rites and performances of a symphony orchestra to the urban underground electronic music scene. Making music together, as well as listening to the music of a given culture, assists in forming a sense of community. At an individual level, music has versatile effects, e.g. the regulation of mood and emotions (see Panksepp and Bernatzky, 2002 for a review). Along with the technical development of brain-imaging methods, the neural dynamics underlying music perception, cognition, and emotions started to fascinate researchers (Peretz and Zatorre, 2003; Koelsch, 2014). The field of neurosciences and music could offer explanations concerning the importance of music for humans as well as answer questions such as: Why is music perceived differently than other auditory stimuli like speech and environmental sounds? How are the musical characteristics related to harmony, dynamics and rhythm processed in the brain?

Traditionally, brain research of music with electromagnetic methods such as electroencephalography (EEG) and magnetoencephalography (MEG) has focused on understanding the neural processing of separate artificial sounds designed to suit the specifications of each particular experiment. This broad line of music-related research includes different sequential sounds used as stimuli – pure vs. complex tones (e.g., Pantev et al., 1995; Tervaniemi et al., 2000), consonant vs. dissonant chords (e.g. Brattico et al., 2010; Virtala et al., 2014), simple monophonic melodies with and without harmony (Fujioka et al., 2005; Brattico et al., 2006) and chordal cadences (Koelsch and Jentschke, 2008). In addition to oddball paradigms (Näätänen et al., 1978), multifeature paradigms have been established both in adults (Marie et al., 2012; Kühnis et al., 2013; Tervaniemi et al., 2014) and in children (Chobert et al., 2011, 2014; Putkinen et al., 2014). These studies have offered valuable information about the

processing of individual elements of music and paved the way toward the research of natural listening, in which the unique characters of music are perceived: spontaneity, impurity, interaction and a continuous flow of overlapping notes.

Several research groups have taken the next step and studied the brain processes evoked by long musical excerpts with different EEG analysis approaches. For example, Bhattacharya et al. (2001) showed that gamma-band synchrony increases over distributed cortical areas with musical practice. This increase, found in professional musicians in comparison with laymen, points to a more advanced musical memory that dynamically binds together several features of the intrinsic complexity of music. In addition, professional training in music refines emotional arousal, which was studied with a whole musical piece by Mikutta et al. (2014). During high arousal, professional musicians exhibited an increase of posterior alpha, central delta, and beta rhythms. Also among laymen, music is shown to be a powerful stimulus modulating emotional arousal (Mikutta et al., 2012). To create these modulations in emotional states and to transmit esthetic experiences, confirmation and violation of expectations are crucial in music perception. Pearce et al. (2010) showed that low-probability notes, when compared to high-probability notes, elicited a larger late negative event-related potential (ERP) component (at a time period of 400–450 ms), increased beta-band oscillation over the parietal lobe and stronger long-range synchronization between multiple brain regions. Meyer et al. (2006a,b) investigated the perception of musical timbre by choosing instrument sounds as stimuli and comparing them to sine wave sounds. In addition to the enhanced N1/P2 responses, they revealed how instruments with varying timbre also activated brain regions associated with emotional and auditory imagery functions. Grewe et al. (2005) studied the strong emotional experience of chills evoked by music, noting that the peak emotion of chills is a result of attentive, experienced and conscious musical enjoyment. Furthermore, results by Schaefer et al. (2011) suggest that recollecting an event with emotional content involves multiple neural retrieval subprocesses. These studies indicate that the immersive sound space created by music, and its creation of strong subjective experiences with vivid memories, emotions and imagination, can indeed be investigated with multifaceted EEG analyses.

Several functional MRI (fMRI) studies have focused on using natural continuous music. Typically, excerpts from real musical pieces are used as stimuli (Morrison et al., 2003; Koelsch et al., 2006; Pereira et al., 2011; Brattico et al., 2011) and more recently even full musical pieces (Alluri et al., 2012, 2013; Abrams et al., 2013; Toiviainen et al., 2014). Due to the slower temporal dynamics of the hemodynamic reactions recorded with fMRI compared to the electromagnetic brain research methods, in these fMRI studies brain activity is averaged across several seconds, thus collapsing the fast dynamic feature changes that occur at key moments in the musical pieces. In addition, the fMRI device produces loud background noise which interacts with the auditory stimuli and may distort the results (see Novitski et al., 2001, 2006). Alluri et al. (2012) took the research of natural music further and studied the neural processing of individual musical features with fMRI during listening to a recording of real orchestral music. In their novel approach, the fMRI data were correlated with computationally extracted musical features to study the brain activation relevant to each particular musical feature. However, the sampling rate of the data was 2 s, producing overlap of the fast cranial processing within each sample. Similarly, Alluri et al. (2013) used two medleys as stimuli, one comprising full songs by the Beatles and the other comprising instrumental pieces belonging to the classical, jazz or pop/rock genres. In both studies by Alluri and colleagues, the musical features were chosen so that they depict the musically and acoustically most relevant events and characteristics of the musical pieces at two analysis window durations. Both short-term features characterizing timbral properties and long-term features related to context-dependent aspects of music were shown to correlate with activation in various brain regions, with the largest consistency among features and musical genres for an anterior area of the superior temporal gyrus (Alluri et al., 2013). When investigating the neural activity for the low- or high-level acoustic features, it was found that the timbral features activated mainly the auditory cortex and the somatomotor regions of the cerebral cortex, as well as the cerebellum, whereas the tonal and rhythmic features activated limbic and motor regions of the brain (Alluri et al., 2012).

To combine the development toward real musical stimuli in EEG with the fMRI studies of individual musical features, we created a novel experimental paradigm to reveal music-induced brain responses by extracting several relevant individual musical features from continuous musical pieces and studying the electrophysiological brain responses evoked by changes in these features. We decided to investigate the electric brain activity elicited by the same acoustic features extracted from musical pieces belonging to three very different musical genres: a Tango Nuevo piece, a deep techno track and an acoustic lullaby. The Tango Nuevo piece, Adios Noniño by Astor Piazzolla, was the same piece used by Alluri et al. (2012, 2013) in their fMRI studies. In these studies, they observed that low-level musical features, as used in our study, are mainly processed in the auditory brain regions located in the temporal cortices.

Previous knowledge from the processing of artificial sounds was utilized in our study of continuous music, in which the sounds are connected to each other in an overlapping and dynamic manner. The single-trial ERP method was considered the best option for studying the immediate neural responses of auditory areas in the temporal cortices corresponding to rapid changes in low-level musical features. We hypothesized that the rapid changes in the musical features of real music would elicit similar sensory components as revealed in the conventional ERP studies using tone stimuli, and that the amplitudes of the ERP components would depend on the magnitude of the rapid increase in the individual feature value (Picton et al., 1977; Polich et al., 1996) as well as the duration of the preceding time period

with low-feature values (Polich et al., 1987) in a similar 20 to 46 years (27.1 on average). No participants
way as the amplitudes of traditional ERP components reported hearing loss or history of neurological illnesses.
depend of stimulus presentation characteristics. All participants were musical laymen with no professional
Our main interests were the sensory negative musical education. However, many participants reported
deflation occurring around 100 milliseconds after the a background in different music-related interests such as
stimulus onset (N100) and positive deflation occurring learning to play an instrument, producing music with a
around 200 milliseconds after the stimulus onset (P200) computer, dancing or singing. Age and the non-
components. The N100 component reflects the acoustic professional musical background of each participant are
energy on stimulus onset and is largest in the fronto- reported in the Table 1. The experimental protocol was
central region (Woods, 1995). Generally, the N100 is conducted in accordance with the Declaration of Helsinki
thought to represent the initial extraction of the informa- and approved by the ethics committee of the Faculty of
tion from sensory analysis of the stimulus (Näätänen the Behavioural Sciences at the University of Helsinki.
and Picton, 1987), or the excitation relating to the
allocation of a channel for information processing out of
Stimuli
the auditory cortex (Hansen and Hillyard, 1980). The
N100-P200 complex is referred to as the vertex potential Three pieces of music from different genres were used as
because of its largest amplitude on the upper surface of stimuli: a modern tango (Adios Noniño by Astor
the brain (Hillyard and Kutas, 1983). Alluri et al. (2012) Piazzolla), an acoustic lullaby (Bless by Kira Kira) and a
suggested that timbre-related acoustic components of deep techno track (My Black Sheep by Len Faki and
continuous music correlate positively with activations in remixed by Radio Slave). The spectrograms of each
large areas of the temporal lobe. Since we wanted to song are shown in Fig. 1. The 8.5-min tango of Astor
study fast neural responses on auditory areas in the tem- Piazzolla was recorded in a concert in Lausanne,
poral cortices, we chose to focus on the timbral features Switzerland. This piece was chosen due to its large
of brightness, spectral flux, and zerocrossing rate. In addi- variation among several musical features related to
tion, the feature root mean square (RMS) related to loud- loudness, timbre, tonality and rhythm and to allow
ness was studied. The ERP method is shown to be comparison to the work of Alluri et al. (2012, 2013). The
adequate in the studies of musical timbre (Pantev et al., original duration of the acoustic lullaby Bless was 3 min
2001; Caclin et al., 2008). Also, Meyer et al. (2006a,b) and 17 s and it was prolonged with Audacity version
proposed that the N100 and P200 responses are 1.2.6 so that at the point of 2 min and 15 s the song
enhanced to instrumental tones when compared to sine was repeated again from the time point of 20 s until the
wave tones. In our study, the time period with low- end of the song, giving a total stimulus length of 5 min
feature values preceding the rapid increase in the value and 43 s. The acoustic lullaby had English lyrics pro-
of the same musical feature corresponds to the inter- nounced in an unclear way which made the singing sound
stimulus interval (ISI) of the previous literature. Naturally, more like humming. An excerpt was selected from the
with real music, regularly reoccurring silent intervals do
not exist. Thus, in this paper, the ISI-type of period is
Table 1. Age and musical background of each participant
called Preceding Low-Feature Phase (PLFP). In the same
way that prolonged ISI strengthens the early sensory ERP Code of Age Years of Instrument Years of Type
responses (Polich et al., 1987), we hypothesized that the participant musical activity
prolonged PLFP would increase the ERP amplitudes dur- activity in dance
ing listening to continuous music. Also, we expected the kh2 20 15 Piano/ None
magnitude of the rapid increase in each musical feature singing
correlate positively with the magnitude of the ERP ampli- kh3 23 13 Piano/flute None
tudes. The possibility to determine the neural correlates of kh4 23 16 Cello None
acoustic contrast and change in continuous music kh5 23 None 6 Ballet
enables one to study the brain with the sound material kh6 24 None None
we are exposed to in the everyday basis. Music is known kh7 20 2 Piano None
to have a strong emotional influence in the brain (Koelsch, kh8 42 15 Alto None
saxophone
2014; Sachs et al., 2015; Salimpoor et al., 2015). Thus, in
kh9 46 None None
addition to research, this method has a great potential for
kh11 22 7 Piano None
applications in therapy and rehabilitation. The efficacy of kh12 21 None None
different interventions, for example among patients suffer- kh13 34 6 Piano/ None
ing from stroke, dementia, disorders of consciousness or keyboards
mood disorders, could be estimated with applications kh14 31 5 Piano None
based on the novel method presented in this paper. kh20 25 7 Piano/ 7 Folk
violin dance/
street
EXPERIMENTAL PROCEDURES dance
kh23 25 None None
Participants kh25 24 3 Piano None
Sixteen right-handed native Finnish speakers took part in kh26 31 5 Computer None
music
the experiment; 10 female and 6 male, age ranged from
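As a quick sanity check on the sample description, the reported mean age can be recomputed from the ages listed in Table 1 (a small Python sketch; the study itself used MATLAB):

```python
# Ages of the 16 participants, read off Table 1 (kh2-kh26).
ages = [20, 23, 23, 23, 24, 20, 42, 46, 22, 21, 34, 31, 25, 25, 24, 31]

mean_age = sum(ages) / len(ages)   # 27.125, i.e. the reported 27.1
assert min(ages) == 20 and max(ages) == 46   # matches the stated range
```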

Fig. 1. Spectrograms of the musical pieces Adios Noniño by Astor Piazzolla (above), Bless by Kira Kira (middle) and My Black Sheep Radio Slave
by Len Faki (below).
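Spectrograms like those in Fig. 1 can be produced with a short-time Fourier transform. The sketch below is an illustrative NumPy implementation, not the authors' tool; the sampling rate and test tone are hypothetical, while the 25-ms window with 50% overlap mirrors the analysis windows used for feature extraction later in the paper:

```python
import numpy as np

def spectrogram(x, sr, win_len=0.025, hop=0.0125):
    """Magnitude spectrogram: rows are frequency bins, columns are frames."""
    n = int(win_len * sr)           # 25-ms analysis window
    h = int(hop * sr)               # 50% overlap between windows
    w = np.hanning(n)
    frames = [x[i:i + n] * w for i in range(0, len(x) - n + 1, h)]
    return np.abs(np.fft.rfft(np.array(frames), axis=1)).T

# Example: 1 s of a 440-Hz tone at an 8-kHz sampling rate.
sr = 8000
t = np.arange(sr) / sr
spec = spectrogram(np.sin(2 * np.pi * 440 * t), sr)
print(spec.shape)  # (101, 79): 101 frequency bins, 79 frames
```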

whole piece of the electronic My Black Sheep Radio Slave with Audacity. The excerpt started from the beginning of the electronic piece and was faded out after 5 min and 20 s. The techno piece consisted of rhythmical sound spread over a wide frequency spectrum with a predominant regular beat but without a melody. It was chosen due to its strong, even rhythmical structure and lack of harmony, in contrast to the acoustic lullaby with its lingering melody. The musical structure of Adios Noniño is versatile, whereas Bless and My Black Sheep Radio Slave had a more constant structure with repetitive musical patterns. The last 25 s of Adios Noniño consist of audience applause because of the live recording. Therefore, we excluded the last 30 s of Adios Noniño from our data analysis.

Equipment and procedure

The stimuli were presented to the participants with the Presentation 14.0 program in a random order via Sony MDR-7506 headphones at an intensity of 50 decibels above the individually determined hearing threshold. This threshold was determined for each participant by playing a recording of a Finnish children's poem, which was irrelevant to our study paradigm, starting clearly above the hearing threshold, manually attenuating the sounds, and asking the participant to report when he/she did not hear the sound anymore. Then, the test sounds were played below the hearing threshold, manually augmenting the volume until the participant reported hearing the sounds. The average of these two reported volumes was considered the hearing threshold of the particular participant. The amplifier was set to augment the stimuli of our study 50 decibels above this threshold, which was the standard augmentation level of the amplifier used in our experiment. The participants were advised to listen to the music while sitting as still as possible with eyes open. The playback of each piece of music was launched by the researcher after a short conversation with the participants via microphone.

The EEG data were recorded with the 10–20 system (Jasper, 1958) using BioSemi active electrode caps with 64 EEG channels and 5 external electrodes placed at the tip of the nose, the left and right mastoids, and around the right eye both horizontally and vertically. The offsets of the active electrodes were kept below 25 mV in the beginning of the measurement and the data were collected with a sampling rate of 2048 Hz. The beginning and the end of each musical piece were marked with a trigger in the EEG data.

Data processing and analysis

Feature extraction with MIRtoolbox. We used MIRtoolbox (version 1.3.1) to computationally extract the musical features. MIRtoolbox is a set of MATLAB functions designed for the processing of audio files (Lartillot and Toiviainen, 2007) and is used for the extraction of different musical features related to various musical dimensions identified in psychoacoustics and sound engineering as well as traditionally defined in music theory. In addition to the dimensions of dynamics, loudness, rhythm, timbre and pitch, MIRtoolbox can process high-level features related to meter and tonality, among others.

The short-term features selected in this study, encapsulating loudness-related and timbral properties of the stimulus, were RMS, brightness, spectral flux and zero-crossing rate. They were obtained by employing short-time analysis using a 25-ms window with a 50% overlap, which is on the order of the commonly used standard window length in the field of Music Information Retrieval (MIR) (Tzanetakis and Cook, 2002). Overlapping of windows is recommended in the analysis of musical features to detect fast changes in the features and their possible inactive periods with a precise time resolution. Excluding RMS, these musical features are the same as those found to activate the auditory areas in the temporal cortices in the study by Alluri et al. (2012).

Brightness is computed as the amount of spectral energy above a threshold value, fixed by default in MIRtoolbox at 1500 Hz (Lartillot and Toiviainen, 2007), for each analysis window. Therefore, high values of brightness mean that a high percentage of the spectral energy is concentrated in the higher end of the frequency spectrum. The zero-crossing rate, known to be an indicator of noisiness, is estimated by counting the number of times the audio waveform crosses the temporal axis (Lartillot and Toiviainen, 2007). This means that a higher zero-crossing rate indicates that there is more noise in the audio frame under consideration. The noise measured by the zero-crossing rate refers to noise as opposed to harmonic sounds, rather than to noise as distortion of a clean signal. RMS is related to the dynamics of the song and is defined as the root of the average of the squared amplitude (Lartillot and Toiviainen, 2007). Louder sounds have high RMS values whereas quieter ones have low RMS values. Spectral flux represents the Euclidean distance between the spectral distributions of successive frames (Lartillot and Toiviainen, 2007). If there is a lot of variation in spectral distribution between two successive frames, the flux has high values. Spectral flux curves exhibit peaks at transitions between successive notes or chords.

Preprocessing. The EEG data of all the participants were first preprocessed with EEGLAB (version 9.0.2.2b; Delorme and Makeig, 2004). The external electrode at the nose was set as the reference. The data were downsampled to 256 Hz, high-pass filtered at 1 Hz and low-pass filtered at 30 Hz with finite impulse response (FIR) filtering based on the firls (least-squares fitting of FIR coefficients) MATLAB function. Visually detected EEG channels with a noisy signal were removed from the analysis.

Setting the triggers. The triggers related to the musical features extracted with MIRtoolbox were added to the preprocessed EEG data. In continuous speech, the best ERP-related results are gained when the triggers are set at the beginning of the word (Teder et al., 1993; Sambeth et al., 2008). Long ISIs are shown to increase the amplitude of the N100 component (Polich et al., 1987). Additionally, strong stimulus intensity has been shown to enhance ERP components (Picton et al., 1977; Polich et al., 1996). This previous knowledge from individual sound processing was utilized in our study of continuous music, in which the individual sounds are connected to each other in an overlapping and dynamic manner.
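To make the pipeline concrete, here is a sketch in Python/NumPy of the four short-term features defined above and of the trigger rule parameterized in Table 2 (a Preceding Low-Feature Phase below a lower threshold, followed by a rise above an upper threshold within 75 ms). The actual study used MIRtoolbox in MATLAB; the function names and the synthetic example are illustrative only:

```python
import numpy as np

def frame_signal(x, n, hop):
    """Cut the signal into analysis windows (e.g. 25 ms with 50% overlap)."""
    return np.array([x[i:i + n] for i in range(0, len(x) - n + 1, hop)])

def rms(fr):
    """Loudness-related feature: root of the mean squared amplitude."""
    return np.sqrt(np.mean(fr ** 2, axis=1))

def zero_crossing_rate(fr):
    """Noisiness indicator: sign changes of the waveform per frame."""
    s = np.signbit(fr).astype(np.int8)
    return np.sum(np.abs(np.diff(s, axis=1)), axis=1)

def brightness(fr, sr, cutoff_hz=1500.0):
    """Proportion of spectral energy above the 1500-Hz default cutoff."""
    spec = np.abs(np.fft.rfft(fr, axis=1)) ** 2
    freqs = np.fft.rfftfreq(fr.shape[1], 1.0 / sr)
    return spec[:, freqs > cutoff_hz].sum(axis=1) / np.maximum(
        spec.sum(axis=1), 1e-12)

def spectral_flux(fr):
    """Euclidean distance between magnitude spectra of successive frames."""
    spec = np.abs(np.fft.rfft(fr, axis=1))
    return np.r_[0.0, np.sqrt(np.sum(np.diff(spec, axis=0) ** 2, axis=1))]

def find_triggers(feat, frame_rate, plfp_s, mori, rise_s=0.075):
    """Trigger times: feature stays below (1 - mori) * mean for plfp_s
    seconds (the PLFP), then exceeds (1 + mori) * mean within rise_s."""
    m = feat.mean()
    lo, hi = (1 - mori) * m, (1 + mori) * m
    n_low = int(plfp_s * frame_rate)
    n_rise = max(1, int(rise_s * frame_rate))
    return [i / frame_rate
            for i in range(n_low, len(feat) - n_rise)
            if np.all(feat[i - n_low:i] < lo)
            and np.any(feat[i:i + n_rise] > hi)]
```

For example, a feature curve that sits at 0.5 for one second (at 80 frames/s) and then jumps to 2.0 yields a single trigger at t = 1.0 s with a 1-s PLFP and a ±20% MoRI, matching the kind of event listed for brightness in Adios Noniño in Table 2.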

We designed an algorithm, implemented in MATLAB, to search for time points with a rapid increase in a musical feature. The algorithm was tuned using specific parameter values adapted to each song, as shown in Table 2. The length of the PLFP was modified, and the rapid increase was required to exceed a value called the magnitude of the rapid increase (MoRI), which was tuned individually for each song. The mean values of all the segments of each song and each feature were calculated, and the magnitude of the change was defined based on the mean value. The largest change was from −20% of the mean value to +20% of the mean value. Valid triggers were preceded by a PLFP whose magnitude did not exceed the lower threshold calculated from the mean value. The length of the PLFP was at most 1 s. In all cases, valid triggers had an increase phase that lasted less than 75 ms. Eight triggers per feature per song were set in 10 of the 12 combinations of song and musical feature. However, in two cases (for RMS of Bless and for brightness of My Black Sheep Radio Slave), only 7 triggers were set, because in these cases only seven time points during the song matched the computationally set limits of the particular musical feature. For each musical feature, the triggers were set in the manner described in detail in Table 2.

Procedure of ERP analyses. After adding the triggers to the preprocessed data, the data were treated with Independent Component Analysis (ICA) decomposition with the runica algorithm of EEGLAB (Delorme and Makeig, 2004) trained with default settings (decomposition of the input data using the logistic infomax ICA algorithm of Bell and Sejnowski (1995) with the natural gradient feature) to detect and remove artifacts related to eye movements and blinks as well as the heartbeat. ICA decomposition gives as many spatial signal source components as there are channels in the EEG data. Thus, there were 68 components, except for the data of the four participants for whom one noisy channel each had been removed in preprocessing; for them, 67 ICA components were decomposed. Typically 2 to 5 ICA components related to the eye and heartbeat artifacts were removed. The noisy EEG data channels of the abovementioned four participants were interpolated. The continuous EEG data were separated into epochs according to the triggers. The epochs started 3000 ms before the trigger and ended 2000 ms after the trigger. The baseline was defined according to the 500-ms time period before the trigger. To double-check the removal of the eye artifacts, epochs with amplitudes above ±100 µV were rejected. In addition, all the epochs were visually inspected and no artifacts were detected in the data.

Statistical analysis

The statistical analyses of the ERP data were conducted with MATLAB version R2013a utilizing the Statistics Toolbox. The Shapiro–Wilk test showed that the data for both the N100 and P200 components were normally distributed. Also, the data of each individual ERP component were normally distributed, except the N100 component of brightness in Bless by Kira Kira. T-tests were calculated over the Cz electrode for each musical feature of each musical piece with the MATLAB command ttest(A), in which A refers to a 16 × 1 array including the peak value of each participant, defined in the following manner: For the N100 component, a minimum value within a window from 80 ms to 150 ms was searched for each participant and the mean value over ±20 ms from the negative peak was calculated. Similarly, for the P200 component, a maximum value
Table 2. Characteristics of the triggers for the features brightness, RMS, zero-crossing rate and spectral flux of the songs Astor Piazzolla: Adios Noniño, Kira Kira: Bless and Len Faki: My Black Sheep Radio Slave

Song and feature | Number of triggers | Preceding Low-Feature Phase (PLFP) duration | Mean value of the feature across the whole song | Magnitude of the rapid increase (MoRI) of the feature value, from −X% to +X% of the mean value

Astor Piazzolla: Adios Noniño
Brightness         | 8 | 750 ms   | 0.3272  | −20% to >+20%
RMS                | 8 | 500 ms   | 0.04827 | −10% to >+10%
Zero-crossing rate | 8 | 500 ms   | 1089    | −15% to >+15%
Spectral flux      | 8 | 1000 ms  | 13.30   | −15% to >+15%

Kira Kira: Bless
Brightness         | 8 | 500 ms   | 0.3300  | −15% to >+15%
RMS                | 7 | 875 ms   | 0.09500 | −15% to >+15%
Zero-crossing rate | 8 | 500 ms   | 987.3   | −10% to >+10%
Spectral flux      | 8 | 1000 ms  | 24.25   | −20% to >+20%

Len Faki: My Black Sheep Radio Slave
Brightness         | 7 | 500 ms   | 0.2046  | −10% to >+10%
RMS                | 8 | 500 ms   | 0.2846  | −15% to >+15%
Zero-crossing rate | 8 | 500 ms   | 471.1   | −15% to >+15%
Spectral flux      | 8 | 312.5 ms | 85.93   | −10% to >+10%

within a time window from 150 ms to 350 ms was searched for each participant and the mean value over ±20 ms from the positive peak was calculated. The command ttest performs a t-test of the null hypothesis that the data in the vector A are a random sample from a normal distribution with mean 0 and unknown variance, against the alternative that the mean is not 0. A test result of 1 indicates a rejection of the null hypothesis at the 5% significance level, whereas a test result of 0 indicates a failure to reject the null hypothesis at the 5% significance level.

The two-way ANOVA for the factors Piece (Adios Noniño by Astor Piazzolla, Bless by Kira Kira and My Black Sheep Radio Slave by Len Faki) and Musical feature (brightness, zero-crossing rate, spectral flux and RMS) was calculated for the same peaks over the electrode Cz as used in the t-tests, with the MATLAB command anova2(M, number_of_participants), in which M refers to the 64 × 3 matrix with the peak values of the participants for the four musical features in rows (4 × 16 = 64) and the three musical pieces in columns.

For Figs. 2–6, the electrodes were pooled to reduce noise in the plotted curves. For Figs. 2–5, three frontal electrodes (F1, Fz, F2) were pooled, as well as central (C1, Cz, C2), parietal (P1, Pz, P2) and occipital (O1, Oz, O2) electrodes. Fig. 6 shows the brain responses over the Cz electrode, which is the electrode used in all the statistical analyses.

RESULTS

The temporal evolution of brightness in Astor Piazzolla's piece Adios Noniño, averaged across the triggers, and the corresponding brain responses over the same time window are shown in Fig. 2. For Figs. 2–5, three frontal electrodes (F1, Fz, F2), central electrodes (C1, Cz, C2), parietal electrodes (P1, Pz, P2) and occipital electrodes (O1, Oz, O2) were pooled together. The evolution of the musical features and their brain responses can be seen in Figs. 3–5 in the following order: evolution of the musical feature spectral flux for Kira Kira's piece Bless in Fig. 3, zero-crossing rate for Len

Fig. 2. Brain response to the triggers related to increase of brightness in Adios Noniño by Astor Piazzolla. The absolute values of the amplitudes of the EEG epochs are presented in the graph above over the frontal (F1, Fz and F2, which are averaged into one signal), central (C1, Cz and C2, which are averaged into one signal), parietal (P1, Pz and P2, which are averaged into one signal) and occipital (O1, Oz and O2, which are averaged into one signal) areas with the EEG epochs from −3 s to +2 s from the stimulus onset, and the temporal evolution of the musical feature brightness for the same 5-s time window. The stimulus onset is defined by the end of the Preceding Low-Feature Phase (PLFP) period. The brain responses of the EEG epochs are presented in the graph below with the same pooling of electrodes as in the graph above. The feature values present the values of the feature brightness from −3 s to +2 s from the end of the PLFP period. The brightness curves for the different triggers are averaged into one single curve over the time period from −3 to +2 s of the time points of the triggers.
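The epoching and electrode pooling described in the caption (epochs from −3 s to +2 s around each trigger; F1/Fz/F2, C1/Cz/C2, P1/Pz/P2 and O1/Oz/O2 averaged into single frontal, central, parietal and occipital signals) can be sketched in NumPy as follows. The function and variable names are illustrative, not the authors' code.

```python
import numpy as np

def extract_epochs(eeg, trigger_samples, fs, t_pre=3.0, t_post=2.0):
    """Cut epochs from -t_pre to +t_post seconds around each trigger.
    `eeg` has shape (n_channels, n_samples); triggers too close to the
    edges of the recording are skipped."""
    pre, post = int(t_pre * fs), int(t_post * fs)
    epochs = [eeg[:, t - pre:t + post]
              for t in trigger_samples
              if t - pre >= 0 and t + post <= eeg.shape[1]]
    return np.stack(epochs)  # (n_epochs, n_channels, n_epoch_samples)

def pool_channels(epochs, groups):
    """Average groups of channel indices (e.g. F1/Fz/F2) into one
    signal per scalp region."""
    return {name: epochs[:, idx, :].mean(axis=1)
            for name, idx in groups.items()}
```

Averaging neighboring electrodes into one regional signal, as done for Figs. 2–5, improves the signal-to-noise ratio without changing the time course of the response.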
H. Poikonen et al. / Neuroscience 312 (2016) 58–73 65

Fig. 3. Brain response to the triggers related to increase of spectral flux in Bless by Kira Kira. The absolute values of the amplitudes of the EEG epochs are presented in the graph above over the frontal (F1, Fz and F2, which are averaged into one signal), central (C1, Cz and C2, which are averaged into one signal), parietal (P1, Pz and P2, which are averaged into one signal) and occipital (O1, Oz and O2, which are averaged into one signal) areas with the EEG epochs from −3 s to +2 s from the stimulus onset, and the temporal evolution of the musical feature spectral flux for the same 5-s time window. The stimulus onset is defined by the end of the Preceding Low-Feature Phase (PLFP) period. The brain responses of the EEG epochs are presented in the graph below with the same pooling of electrodes as in the graph above. The feature values present the values of the feature spectral flux from −3 s to +2 s from the end of the PLFP period. The curves of spectral flux for the different triggers are averaged into one single curve over the time period from −3 to +2 s of the time points of the triggers.

Faki's piece My Black Sheep Radio Slave in Fig. 4 and RMS for Adios Noniño in Fig. 5.
The graphs reveal the dependence of the change in the voltage measured with the ERP signal on the change in the musical feature value. A high value in the musical feature is not alone sufficient to elicit an ERP component; the high feature value needs to be preceded by a relatively long time period with low feature values. The distribution of the increased electrical activity over the cortex can be observed in the graphs of the frontal, central, parietal and occipital brain regions. Alluri et al. (2012) revealed with fMRI that low-level musical features, such as the ones used in our study, increase activation mainly in the auditory regions located on the temporal cortices. In our study, the ERP components indeed are the largest over the central region, which is associated with increased activity originating from the core auditory areas of the temporal cortices. Thus, our results suggest that rapid changes in the low-level musical features evoke neural responses in the temporal auditory areas.
Fig. 6 reveals in detail the brain responses over the central area for all musical features of brightness, zero-crossing rate, spectral flux and RMS of all musical pieces of Astor Piazzolla: Adios Noniño, Kira Kira: Bless and Len Faki: My Black Sheep Radio Slave over the electrode Cz. The conventional ERP responses are shown for several features of several musical pieces. The clearest sensory N100 and P200 components were elicited for the brightness feature of Astor Piazzolla: Adios Noniño, which is illustrated over a longer time window in Fig. 2. The results of the t-tests comparing the N100 amplitudes at the Cz electrode in response to each feature and each musical piece against the zero baseline are listed in Table 3.1, and those for the P200 amplitudes in Table 3.2.
At the N100, the following sound features elicited responses which significantly differed from the zero baseline: spectral flux, t(15) = −6.40, p < .0001; RMS, t(15) = −2.97, p = .0095; brightness, t(15) = −6.40, p < .0001; and zero-crossing rate, t(15) = −7.35, p < .0001, of Adios Noniño; zero-crossing rate, t(15) = −2.76, p = .015, of

Fig. 4. Brain response to the triggers related to increase of zero-crossing rate in My Black Sheep Radio Slave by Len Faki. The absolute values of the amplitudes of the EEG epochs are presented in the graph above over the frontal (F1, Fz and F2, which are averaged into one signal), central (C1, Cz and C2, which are averaged into one signal), parietal (P1, Pz and P2, which are averaged into one signal) and occipital (O1, Oz and O2, which are averaged into one signal) areas with the EEG epochs from −3 s to +2 s from the stimulus onset, and the temporal evolution of the musical feature zero-crossing rate for the same 5-s time window. The stimulus onset is defined by the end of the Preceding Low-Feature Phase (PLFP) period. The brain responses of the EEG epochs are presented in the graph below with the same pooling of electrodes as in the graph above. The feature values present the values of the feature zero-crossing rate from −3 s to +2 s from the end of the PLFP period. The curves of zero-crossing rate for the different triggers are averaged into one single curve over the time period from −3 to +2 s of the time points of the triggers.
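The four low-level features were extracted with computational audio analysis (the study used the MIRtoolbox). As a rough, simplified re-implementation — not the toolbox's exact definitions, and the brightness cutoff frequency here is an assumption — the per-frame quantities could be computed like this:

```python
import numpy as np

def frame_features(frame, prev_frame, fs, cutoff=1500.0, eps=1e-12):
    """Simplified per-frame versions of RMS, zero-crossing rate,
    brightness and spectral flux. `cutoff` (Hz) is an illustrative
    brightness threshold, not the value used in the study."""
    rms = np.sqrt(np.mean(frame ** 2))                    # frame energy
    zcr = np.sum(frame[:-1] * frame[1:] < 0) * fs / len(frame)  # crossings/s
    spec = np.abs(np.fft.rfft(frame))
    prev_spec = np.abs(np.fft.rfft(prev_frame))
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / fs)
    # brightness: share of spectral magnitude above the cutoff frequency
    brightness = spec[freqs >= cutoff].sum() / (spec.sum() + eps)
    # spectral flux: distance between consecutive magnitude spectra
    flux = np.sqrt(np.sum((spec - prev_spec) ** 2))
    return {"rms": rms, "zcr": zcr, "brightness": brightness, "flux": flux}
```

Brightness and zero-crossing rate both grow with pitch height, which is consistent with the observation in the Discussion that the two features pick out similar trigger points.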

Bless; and brightness, t(15) = −3.06, p = .0079, and zero-crossing rate, t(15) = −2.53, p = .023, of My Black Sheep Radio Slave. For all pieces and all musical features, the P200 responses differed significantly from the zero baseline, as can be seen in detail in Table 3.2. In Figs. 7–9, scalp maps for selected statistically significant ERP components are presented. In these figures the ERP components are the strongest in the central region, which suggests the summation of signals from the left and right temporal cortices originating from the auditory areas.
The electrode location factors (Laterality and Anterior–posterior distribution) did not interact with any of the factors of interest (Piece and Musical feature). Therefore, we report here only the results from the two-way ANOVA for the factors Piece (Adios Noniño by Astor Piazzolla, Bless by Kira Kira and My Black Sheep Radio Slave by Len Faki) and Musical feature (brightness, zero-crossing rate, spectral flux and RMS) over the electrode Cz.
For the N100 component, the main effect of Piece was significant, F(2,180) = 22.04, p < .0001, resulting from the larger N100 amplitudes for Adios Noniño by Astor Piazzolla (−4.17 μV) compared with My Black Sheep Radio Slave by Len Faki (−1.09 μV) and Bless by Kira Kira (−0.78 μV). We also found a significant main effect of Musical feature, F(3,180) = 6.17, p = .0005, deriving from larger N100 responses to the musical features of zero-crossing rate (−3.19 μV) and brightness (−2.81 μV) compared to spectral flux (−1.11 μV) and RMS (−0.96 μV). For brightness and RMS the largest N100 was elicited in Adios Noniño and the smallest in Bless. For zero-crossing rate and spectral flux the largest N100 was elicited in Adios Noniño and the smallest in My Black Sheep Radio Slave.
For the amplitudes of the P200 component, the two-way ANOVA revealed a significant main effect of Piece, F(2,180) = 11.51, p < .0001, deriving from larger P200 amplitudes for Adios Noniño (5.27 μV) compared with Bless (4.19 μV) and My Black Sheep Radio Slave

Fig. 5. Brain response to the triggers related to increase of RMS in Adios Noniño by Astor Piazzolla. The absolute values of the amplitudes of the EEG epochs are presented in the graph above over the frontal (F1, Fz and F2, which are averaged into one signal), central (C1, Cz and C2, which are averaged into one signal), parietal (P1, Pz and P2, which are averaged into one signal) and occipital (O1, Oz and O2, which are averaged into one signal) areas with the EEG epochs from −3 s to +2 s from the stimulus onset, and the temporal evolution of the musical feature RMS for the same 5-s time window. The stimulus onset is defined by the end of the Preceding Low-Feature Phase (PLFP) period. The brain responses of the EEG epochs are presented in the graph below with the same pooling of electrodes as in the graph above. The feature values present the values of the feature RMS from −3 s to +2 s from the end of the PLFP period. The RMS curves for the different triggers are averaged into one single curve over the time period from −3 to +2 s of the time points of the triggers.
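The trigger-selection rule used for the epochs above — a Preceding Low-Feature Phase (PLFP) of at least a given duration, followed by a rapid rise above a threshold defined as a percentage of the piece's mean feature value (the MoRI) — can be sketched as follows. This is a plausible reading of the rule described in the text, not the authors' implementation.

```python
import numpy as np

def find_triggers(feature, frame_rate, plfp_s, mori_frac):
    """Return frame indices where `feature` jumps above
    (1 + mori_frac) * mean after staying below (1 - mori_frac) * mean
    for at least `plfp_s` seconds (the PLFP)."""
    mean = feature.mean()
    low, high = (1 - mori_frac) * mean, (1 + mori_frac) * mean
    min_low = int(plfp_s * frame_rate)   # minimum PLFP length in frames
    triggers, run = [], 0
    for i, v in enumerate(feature):
        if v < low:
            run += 1                      # still inside a low-feature phase
        else:
            if v > high and run >= min_low:
                triggers.append(i)        # rapid increase after a long PLFP
            run = 0                       # any non-low value ends the PLFP
    return triggers
```

With, for example, plfp_s=0.75 and mori_frac=0.20, this mimics the thresholds reported for brightness in Adios Noniño; since the reported values are minimum thresholds, the actual PLFPs and rises at the triggered time points can be longer and larger.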

(3.03 μV). For Musical feature, the main effect was not significant, F(3,180) = 1.24, p = 0.30. For RMS and spectral flux the P200 was largest in Bless and smallest in My Black Sheep Radio Slave, and for brightness and zero-crossing rate the P200 was largest in Adios Noniño and smallest in My Black Sheep Radio Slave.

The relevance of magnitude of the rapid increase and Preceding Low-Feature Phase

The MoRI and the PLFP of the musical features of the three musical pieces and the magnitude of the evoked ERP components seem to be correlated. However, the following speculation was not examined statistically. The clearest ERP components were observed for brightness in Astor Piazzolla's piece Adios Noniño (Figs. 2 and 6). Table 2 lists the PLFP preceding the trigger for all three musical pieces. It is important to note that the values of the PLFPs are the minimum PLFP and not the exact PLFP. Brightness of Adios Noniño had the longest PLFP with 750 ms, in contrast to brightness of Kira Kira: Bless and Len Faki: My Black Sheep Radio Slave, which both have a PLFP of 500 ms. Also, the MoRI of the feature value is larger for Adios Noniño (from −20 per cent of the mean value to +20 per cent of the mean value) compared to Bless and My Black Sheep Radio Slave (from −15 per cent to +15 per cent and from −10 per cent to +10 per cent, respectively). Again, these magnitudes of rapid changes are the threshold magnitudes, not the exact magnitudes.
In addition, clear ERP components were also revealed for zero-crossing rate of Adios Noniño but not for Bless or My Black Sheep Radio Slave (Fig. 6), even though all the songs have a PLFP of 500 ms. Small P200 components can be seen for My Black Sheep Radio Slave but not for Bless. These differences could be explained by the different MoRI in the feature value (from −15 per cent to +15 per cent for Adios Noniño and My Black Sheep Radio Slave and from −10 per cent to +10 per cent for Bless). Even though Adios Noniño and My Black Sheep Radio Slave both have the same PLFP and magnitude of increase in the feature value, only Adios Noniño had

Fig. 6. Recapitulation of ERP responses for features spectral flux, RMS, brightness and zero-crossing rate of pieces Astor Piazzolla: Adios Noniño,
Kira Kira: Bless and Len Faki: My Black Sheep Radio Slave. The ERP responses are the values measured over the electrode Cz.

Table 3.1. T-tests of the N100 component (time window from 80 ms to 150 ms from the stimulus onset) over the Cz electrode for the features brightness, RMS, zero-crossing rate and spectral flux of the songs Astor Piazzolla: Adios Noniño, Kira Kira: Bless and Len Faki: My Black Sheep Radio Slave

t-test                                   t(15)     p
Astor Piazzolla: Adios Noniño
  Brightness                             −6.40     0.000012
  RMS                                    −2.97     0.0095
  Zero-crossing rate                     −7.35     0.00000239
  Spectral flux                          −4.68     0.000297
Kira Kira: Bless
  Brightness                             −1.56     0.14
  RMS                                    −0.56     0.58
  Zero-crossing rate                     −2.76     0.015
  Spectral flux                          −0.39     0.70
Len Faki: My Black Sheep Radio Slave
  Brightness                             −3.06     0.0079
  RMS                                    −1.24     0.24
  Zero-crossing rate                     −2.53     0.023
  Spectral flux                          −0.56     0.59

Table 3.2. T-tests of the P200 component (time window from 150 ms to 350 ms from the stimulus onset) over the Cz electrode for the features brightness, RMS, zero-crossing rate and spectral flux of the songs Astor Piazzolla: Adios Noniño, Kira Kira: Bless and Len Faki: My Black Sheep Radio Slave

t-test                                   t(15)     p
Astor Piazzolla: Adios Noniño
  Brightness                             7.41      0.00000218
  RMS                                    8.16      0.000000675
  Zero-crossing rate                     6.92      0.00000491
  Spectral flux                          7.52      0.00000184
Kira Kira: Bless
  Brightness                             9.29      0.000000131
  RMS                                    6.33      0.0000134
  Zero-crossing rate                     4.43      0.000485
  Spectral flux                          8.58      0.000000359
Len Faki: My Black Sheep Radio Slave
  Brightness                             3.73      0.0020
  RMS                                    5.55      0.0000560
  Zero-crossing rate                     3.83      0.0016
  Spectral flux                          6.02      0.0000234
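The one-sample tests behind Tables 3.1 and 3.2 (MATLAB's ttest against a zero mean) reduce to a simple statistic. A dependency-free NumPy sketch, with illustrative data, is:

```python
import numpy as np

def one_sample_t(values):
    """One-sample t statistic against mean 0, t = mean / (sd / sqrt(n)),
    with df = n - 1. The p-value lookup is omitted here to stay
    dependency-free; this mirrors the MATLAB ttest used in the study."""
    x = np.asarray(values, dtype=float)
    n = x.size
    t = x.mean() / (x.std(ddof=1) / np.sqrt(n))
    return t, n - 1
```

For each (piece, feature) pair, the inputs would be the 16 participants' peak amplitudes in the N100 or P200 window over Cz; the negative t-values for the N100 simply reflect its negative polarity.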

a clear ERP response. The triggering is based on the mean value of each musical piece, and these two pieces have very different mean values for zero-crossing rate (1089 for Adios Noniño and 471.1 for My Black Sheep Radio Slave; Table 2).
For the responses related to the values of spectral flux and RMS (Fig. 6), the ERP components are evoked for Adios Noniño and Bless but not for My Black Sheep Radio Slave. The latter has low values of the PLFP and of the magnitude of change of the feature value for the triggered time points related to spectral flux (PLFP 312.5 ms and MoRI of the feature value from −10 per cent of the mean value to +10 per cent of the mean value, compared to Adios Noniño and Bless, whose changes in magnitude of the feature value are from −15 per cent to +15 per cent and from −20 per cent to +20 per cent, respectively, and which both have a PLFP of 1000 ms). In contrast, for RMS, My Black Sheep Radio Slave has the same PLFP (500 ms) and change in magnitude (from −10 per cent to +10 per cent) as Adios Noniño.

Fig. 7. Scalp maps for selected ERP components of the pieces and the musical features, for which the ERP components differ significantly from the
zero baseline. The latencies of N100 component for Adios Noniño are: Brightness 131 ms and zero-crossing rate 122 ms.

Fig. 8. Scalp maps for selected ERP components of the pieces and the musical features, for which the ERP components differ significantly from the
zero baseline. The latencies of P200 component for Adios Noniño are: spectral flux 221 ms, RMS 210 ms, brightness 234 ms and zero-crossing rate
232 ms.

Fig. 9. Scalp maps for selected ERP components of the pieces and the musical features, for which the ERP components differ significantly from the
zero baseline. The latencies of P200 component for Bless are: spectral flux 203 ms, RMS 215 ms and brightness 237 ms.

However, the mean value of RMS varies a lot across the songs (0.04827 for Adios Noniño, 0.09500 for Bless and as high as 0.2846 for My Black Sheep Radio Slave).

DISCUSSION

In the present study, we investigated the ERP responses elicited during rapid changes in low-level musical features of three pieces from different music genres. We wanted to study brain responses to music in general and keep the category of music wide by using acoustically, digitally and vocally produced material. A novel technique based on automatic digital recognition of given sound features enabled an interesting approach to study different musical pieces by extracting the same musical features from each musical piece. For this study we focused on a few short-term musical features which were related to the timbral and dynamic characters of the musical pieces. To take full advantage of the precise temporal resolution of EEG and to be able to match the rapid changes in music with the rapid changes in the EEG data on a scale of milliseconds, we excluded time–frequency analyses, such as wavelet and sliding-window approaches, as possible methodological approaches. EEG microstate analysis was excluded since, instead of global cognitive processes, we wanted to focus on the sound processing in the auditory regions of the temporal cortices, the importance of which was highlighted by Alluri et al. (2012).
The results show a relationship between the magnitude of the ERP components and the magnitude of the rapid change in the feature value as well as the length of the preceding time with low feature values. Therefore, the earlier results which coupled the intensity of the sound and the length of the ISI with the

magnitude of the ERP components seem to be valid also in the context of the dynamics of the musical features. In other words, the ERP components are elicited not only by simple sound streams with precisely defined silent ISIs but also by dynamic continuous natural stimuli such as a musical piece. However, the musical stimulus needs to include strong acoustical contrasts in the musical features to be able to elicit conventional ERP components. Carterette and Kendall (1999) describe the perceptual process as operating on the principle of contrast or change, during which the sensory mechanisms look for changes in order to make sense of the information in music. In addition, Kluender et al. (2003) suggest that the perceptual systems of all sensory modalities respond mainly to changes. Spectrotemporal modulations, represented by spectral flux in general, and specifically its sub-bands (Alluri and Toiviainen, 2012), have been shown to capture this aspect of perceivable change in polyphonic timbre. Alluri et al. (2012) showed that the brain regions activated corresponding to the musical features encapsulating brightness, RMS, spectral flux and zero-crossing rate of the same musical piece as the one used in our study (Adios Noniño) are located in the auditory regions of the right and left temporal cortex, namely the superior temporal gyrus (STG) and middle temporal gyrus (MTG) in the left hemisphere and the STG and Heschl's gyrus (HG) in the right hemisphere. Due to summation of the electric dipoles in the brain tissue, the EEG signal originating from these regions is the strongest on the frontal and central midline. Therefore, it is very likely that the ERP components measured in our study are generated in the same brain regions as revealed in the study of Alluri et al. (2012) and that these auditory-related regions are sensitive in detecting rapid changes in these musical features in particular.
Teder et al. (1993) highlighted the relevance of the length of the ISI in the elicitation and magnitude of the N100 component for continuous natural speech. Based on our results, the length of the PLFP preceding the time point of interest plays a role in the elicitation of the N100 also in continuous music. The prolonged PLFP seems to increase the amplitude of the N100. In addition, stimulus intensity has been shown to influence the N100 amplitude such that a stimulus with stronger intensity elicits a larger N100 (Picton et al., 1977). Similarly, in our study the N100 and P200 amplitudes seem to correlate positively with the magnitude of the rapid change in the musical feature of interest. It is important to note that in our study, at the time points of rapid change, both the intensity and the feature value changed drastically. Further studies need to be performed to extract the subcomponents related to the intensity and the feature value from the ERP components.
The clearest N100 and P200 responses were observed for brightness in Astor Piazzolla's piece Adios Noniño. These findings reflect what was observed in the fMRI cross-validation study using the same musical piece (Alluri et al., 2013): Adios Noniño activated the brain more robustly than the other pieces. In addition, clear N100 and P200 components were also revealed for zero-crossing rate of Adios Noniño but not for Bless or My Black Sheep Radio Slave. These differences could be explained by the different magnitudes of the increase in the feature value. Adios Noniño and My Black Sheep Radio Slave have very different mean values for zero-crossing rate, and since the triggering is based on the mean value of each musical piece, the stimulus intensity at the zero-crossing-related time points may still be stronger in Adios Noniño. Also, the musical features of brightness and zero-crossing rate are strongly correlated in Adios Noniño. Subjective listening to the triggered time points related to brightness and zero-crossing rate reveals similar elements of the sound across these time points: a strong clear sound, either a single tone or a chord, is played after a longer fading sound. This character of the trigger-related time points matches well with the mathematical definitions of brightness and zero-crossing rate; both are large for high pitches and small for low pitches. This kind of unexpected sound, elicited while all the preceding sound is fading, can be reasoned to form sensory N100 and P200 components.
For the responses related to the values of spectral flux and RMS, the N100 and P200 components are evoked for Adios Noniño and Bless but not for My Black Sheep Radio Slave. The mean value of RMS calculated for each musical piece varies a lot across the three different pieces used and reflects the principle of the lower the mean value, the stronger the N100 component. Based on the mathematical definition of RMS, the PLFP with low RMS values before the trigger-related time point involves sounds with very low decibel levels. Novitski et al. (2003) studied the effect of fMRI noise on ERP responses and noticed that the N100 diminished in the presence of noise. The existence of sound during the PLFP can be considered as noise from the point of view of traditional ERP research. Thus, higher RMS values during the PLFP may lead to an attenuated N100 response evoked by the rapid increase of the RMS value.
In Bless, the N100 responses for spectral flux and RMS are smaller compared to Adios Noniño despite similar PLFP and MoRI values. However, spectral flux and RMS of Bless have higher values during the PLFP compared to Adios Noniño, which might attenuate the N100 responses. Since spectral flux indicates the spectral difference of two consecutive segments, a higher value indicates more varying spectral content between segments. Again, the mean value of spectral flux in Adios Noniño is significantly smaller than in Bless. Thus, the difference in flux and RMS mean values between Bless and Adios Noniño could explain the smaller N100 component of Bless. However, the sound in the background during the PLFP does not affect the P200 component. According to Novitski et al. (2003), the P3 component and its latency increase due to noise, which illustrates how differently each ERP component reacts to noise. In any case, both the latency and the magnitude of the P200 amplitude remained unchanged for Bless and Adios Noniño, which could indicate that the N100 component is more sensitive to sound in the background than the later ERP components.
The current results are encouraging for further research with this new realistic music paradigm for the ERP method.

Limitations

Due to the pioneering and exploratory character of the current contribution, there are several issues which need to be taken into account and, on some occasions, also improved when using the analysis method in the future. These issues, further specified below, deal with the stimulation, the ERP analysis, and the participants of the study.
First, regarding the stimulation, the musical pieces used in our study differed acoustically from each other: human voice, instrumental sounds and digital sounds are all known to evoke different brain processes (Belin et al., 2000; Meyer et al., 2006a,b). However, according to Levy et al. (2001), the difference between human vocal and instrumental sounds affects only the later ERP components starting from 260 ms from the stimulus onset, but not the early ones (the N100 and P200 components), which were the focus of our study.
Second, regarding the analyses, since this study is based on real music, the computationally defined changes in the values of the musical features are based on mathematically set boundaries. In the case of such a specific threshold value, anything larger/smaller than the threshold value is treated equally. Therefore, the set of equally treated short musical excerpts may include acoustically very different excerpts. These excerpts are categorized into the same group based on the values gained for a particular musical feature, meaning that all the other factors defining that specific musical excerpt are ignored. The inter-stimulus variation is an acknowledged deficit of the single-trial ERP method, and with natural, non-repetitive stimuli this variation becomes even larger. Thus, both mathematically and philosophically, this straightforward categorizing evokes uncertainty. However, when studying human perception, emotion and cognition, especially with natural stimuli such as music, we need to keep in mind the creative and non-linear nature of the human brain in the perception of the external world, which has already been revealed in such phenomena as the McGurk effect (McGurk and MacDonald, 1976) and binaural beats (Dove, 1839; Lane et al., 1998).
In addition, in the group comparisons of Musical feature and Piece the results were not corrected for multiple comparisons. These corrections would most likely have reduced the number of ERP components which differed statistically significantly from each other. Moreover, one of the challenges with the ERP method consists in the unfavorable signal-to-noise ratio due to artifacts and neural background activity. Typically 25–100 trials are averaged to form a clear ERP response (Luck, 2004). Here we had 8 trials per sound feature for each participant. Also, the musical piece Adios Noniño was remarkably longer than Bless and My Black Sheep Radio Slave, which could evoke differences in the habituation processes in the brain (Seppänen et al., 2013). However, Seppänen et al. (2013) discovered rapid perceptual learning of musical sounds only for professional musicians but not for musical laymen such as those in our study. Moreover, the number of repetitions in their study was considerably higher than in ours. Since we had 16 participants and since the signal was very consistent across the participants, it reached significance and can be considered reliable. Yet, this analysis does not allow us to study the individual responses of each participant due to the small number of averages. Another issue with ERP analysis is high trial-to-trial variability, especially when using a natural, non-repetitive stimulus such as music. However, this is an issue to consider in all time-locked analyses of EEG. In contrast, the long tradition and thus the extensive literature available for the ERP method is one of the remarkable advantages of the method chosen. In addition, ERP analysis is computationally light and the technology is fully mobile, which enables applications in clinical use with lower technical demands in places such as hospitals, asylums, therapy centers and refugee camps, for example when estimating the depth of coma (Fischer et al., 2008) or the prognosis of the vegetative state (O'Kelly et al., 2013).
Third, regarding the participants, the imbalance of the genders (10 females, 6 males) and the relatively large age range among the participants (19–46 years) need also to be acknowledged. Age-related declines in auditory temporal processing start to emerge in middle-aged adults and are suggested to occur due to changes in auditory processing in the central auditory system (Bertoli et al., 2002; Alain et al., 2004). Thus, the large age range of the participants may affect the ERP responses of some participants in our study, especially for the feature of brightness, which captures the rapid change from low frequencies to high.
Moreover, regarding the gender of the participants, males are shown to have a more pronounced right-hemispheric predominance in processing musical syntax compared to females (Koelsch et al., 2003). Wager et al. (2003) suggested that emotional activity is more lateralized among males, whereas females showed more brainstem activity in affective paradigms. Moreover, females tend to be more sensitive to emotional stimuli compared to males, not due to increased emotional reactivity among females, but due to more efficient emotional regulation among males (McRae et al., 2008). However, to our knowledge, there is no evidence of gender effects on cortical auditory processes as measured in our study.
All the participants were native Finnish speakers, but their background in other languages was not documented in this study. Yet, it should be noted that it is mandatory for all Finnish pupils to learn two foreign languages as a part of their school curriculum after the age of 7 years (in most cases English and Swedish, in this order). This issue is of importance since profound knowledge of several languages is known to shape the auditory areas. Ressel et al. (2012) showed that participants who were bilingual since childhood had a larger HG bilaterally compared to monolingual participants. An enlarged HG is shown to improve the perception of sounds of an unfamiliar language (Golestani et al., 2007). Therefore, differences in language background might also influence the perception of music and should be carefully screened in future studies. More importantly, the versatile non-professional musical background of the participants may induce individual differences in the perception of the musical features. The reported music-related interests varied from

folk dance to singing and computer music. Many partici- predicting lateralized brain responses to music. NeuroImage 83
pants also had an experience of playing an instrument (12):627–636.
Belin P, Zatorre RJ, Lafaille P, Ahad P, Pike B (2000) Voice-selective
over several years whereas some participants did not
areas in human auditory cortex. Nature 403:309–312.
report any music-related background. This heterogeneity Bell AJ, Sejnowski TJ (1995) An information maximisation approach
in the study sample may increase variance the ERP to blind separation and blind deconvolution. Neural Comput 7
components evoked in our study (Putkinen et al., 2014). (6):1129–1159.
Further research is needed to specify the influence of Bertoli S, Smurzynski J, Probst R (2002) Temporal resolution in young
music and music-related background, such as dance, to and elderly subjects as measured by mismatch negativity and
the ERP components evoked by a musical piece. psychoacoustic gap detection task. Clin Neurophysiol 113:396–406.
CONCLUSION

The aim of this study was to develop a novel ERP-based paradigm for investigating the neural correlates of continuous natural stimuli. Therefore, continuous music was used as the stimulus, and changes in its individual musical features were observed. The use of continuous music as a stimulus creates a novel and largely uncharted test setting for the ERP method. Studies investigating the dynamics of musical feature processing during continuous listening to entire pieces of music have previously been conducted with fMRI (Alluri et al., 2012, 2013; Toiviainen et al., 2014; Cong et al., 2014), providing insight into both cortical and subcortical processing of several high- and low-level musical features. Since the temporal resolution of EEG is far more precise than that of fMRI, and since the two methods measure different kinds of brain activity, extending this line of research to EEG is important. With the ERP method, the temporal processing of natural stimuli in the cortex can be investigated in detail. The N100 and P200 components elicited by rapid changes in spectral flux, RMS, brightness and zero-crossing rate indicate the suitability of realistic, continuous stimuli for ERP research. Thus, the elicitation of ERP components is not restricted to simplified, well-controlled sequences of separate sounds presented with silent inter-stimulus intervals (ISIs), nor to relatively repetitive musical passages.

Acknowledgments—We would like to thank the Kone Foundation, the Finnish Cultural Foundation and the Academy of Finland for financial support, Miika Leminen, Tommi Makkonen, Valtteri Wikström and Tanja Linnavalli for their assistance during the EEG recordings and data processing, and Caitlin Dawson for proofreading.

REFERENCES

Abrams DA, Ryali S, Chen T, Chordia P, Khouzam A, Levitin DJ, Menon V (2013) Inter-subject synchronization of brain responses during natural music listening. Eur J Neurosci 37(9):1458–1469.
Alain C, McDonald KL, Ostroff JM, Schneider B (2004) Aging: a switch from automatic to controlled processing of sounds? Psychol Aging 19:125–133.
Alluri V, Toiviainen P (2012) Effect of enculturation on the semantic and acoustic correlates of polyphonic timbre. Music Percept 29(3):297–310.
Alluri V, Toiviainen P, Jääskeläinen IP, Glerean E, Sams M, Brattico E (2012) Large-scale brain networks emerge from dynamic processing of musical timbre, key and rhythm. NeuroImage 59(4):3677–3689.
Alluri V, Toiviainen P, Lund TE, Wallentin M, Vuust P, Nandi AK, Ristaniemi T, Brattico E (2013) From Vivaldi to Beatles and back: predicting lateralized brain responses to music. NeuroImage 83:627–636.
Bhattacharya J, Petsche H, Pereda E (2001) Long-range synchrony in the gamma band: role in music perception. J Neurosci 21(16):6329–6337.
Brattico E, Tervaniemi M, Näätänen R, Peretz I (2006) Musical scale properties are automatically processed in the human auditory cortex. Brain Res 1117(1):162–174.
Brattico E, Jacobsen T, De Baene W, Glerean E, Tervaniemi M (2010) Cognitive vs. affective listening modes and judgements of music – an ERP study. Biol Psychol 85(3):393–409.
Brattico E, Alluri V, Bogert B, Jacobsen T, Vartiainen N, Nieminen S, Tervaniemi M (2011) A functional MRI study of happy and sad emotions in music with and without lyrics. Front Psychol 2:308.
Caclin A, McAdams S, Smith BK, Giard MH (2008) Interactive processing of timbre dimensions: an exploration with event-related potentials. J Cogn Neurosci 20(1):49–64.
Carterette EC, Kendall RA (1999) Comparative music perception and cognition. In: Deutsch D, editor. The Psychology of Music. San Diego: Academic. p. 725–791.
Chobert J, Marie C, François C, Schön D, Besson M (2011) Enhanced passive and active processing of syllables in musician children. J Cogn Neurosci 23(12):3874–3887.
Chobert J, François C, Velay J-L, Besson M (2014) Twelve months of active musical training in 8- to 10-year-old children enhances the preattentive processing of syllabic duration and voice onset time. Cereb Cortex 24(4):956–967.
Cong F, Puoliväli T, Alluri V, Sipola T, Burunat I, Toiviainen P, Nandi AK, Brattico E, Ristaniemi T (2014) Key issues in decomposing fMRI during naturalistic and continuous music experience with independent component analysis. J Neurosci Meth 223(2):74–84.
Delorme A, Makeig S (2004) EEGLAB: an open source toolbox for analysis of single-trial EEG dynamics. J Neurosci Meth 134(1):9–21.
Dove HW (1839) Repertorium der Physik, Vol. III, p 494.
Fischer C, Dailler F, Morlet D (2008) Novelty P3 elicited by the subject's own name in comatose patients. Clin Neurophysiol 119:2224–2230.
Fujioka T, Trainor LJ, Ross B, Kakigi R, Pantev C (2005) Automatic encoding of polyphonic melodies in musicians and nonmusicians. J Cogn Neurosci 17(10):1578–1592.
Golestani N, Molko N, Dehaene S, LeBihan D, Pallier C (2007) Brain structure predicts the learning of foreign speech sounds. Cereb Cortex 17:575–582.
Grewe O, Nagel F, Kopiez R, Altenmüller E (2005) How does music arouse "chills"? Ann NY Acad Sci 1060:446–449.
Hansen JC, Hillyard SA (1980) Endogenous brain potentials associated with selective auditory attention. Electroencephalogr Clin Neurophysiol 49:277–290.
Hillyard S, Kutas M (1983) Electrophysiology of cognitive processing. Ann Rev Psychol 34:33–61.
Jasper HH (1958) The ten-twenty electrode system of the international federation. Electroencephalogr Clin Neurophysiol 10:371–375.
Kluender KR, Coady JA, Kiefte M (2003) Sensitivity to change in perception of speech. Speech Commun 41:59–69.
Koelsch S (2014) Brain correlates of music-evoked emotions. Nat Rev Neurosci 15:170–180.
Koelsch S, Jentschke S (2008) Short-term effects of processing musical syntax: an ERP study. Brain Res 1212:55–62.
Koelsch S, Maess B, Grossmann T, Friederici AD (2003) Electric brain responses reveal gender differences in music processing. NeuroReport 14:709–713.
Koelsch S, Fritz T, v Cramon DY, Müller K, Friederici AD (2006) Investigating emotion with music: an fMRI study. Hum Brain Mapp 27:239–250.
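For intuition, the kind of frame-wise feature extraction and rapid-change detection described here can be sketched in a few lines. The study itself used the Matlab MIRtoolbox (Lartillot and Toiviainen, 2007); the NumPy sketch below is only an illustrative stand-in, and the frame length, brightness cutoff and threshold values are arbitrary choices, not those of the study.

```python
import numpy as np

def frame_features(x, sr, frame=1024, hop=512, cutoff_hz=1500.0):
    """Frame-wise low-level features of a mono signal x: RMS, zero-crossing
    rate, brightness (share of spectral energy above cutoff_hz) and spectral
    flux (Euclidean distance between successive magnitude spectra)."""
    n = 1 + (len(x) - frame) // hop
    frames = np.stack([x[i * hop:i * hop + frame] for i in range(n)])
    rms = np.sqrt(np.mean(frames ** 2, axis=1))
    zcr = np.mean(np.abs(np.diff(np.sign(frames), axis=1)) > 0, axis=1)
    spec = np.abs(np.fft.rfft(frames * np.hanning(frame), axis=1))
    freqs = np.fft.rfftfreq(frame, 1.0 / sr)
    brightness = spec[:, freqs >= cutoff_hz].sum(axis=1) / (spec.sum(axis=1) + 1e-12)
    flux = np.r_[0.0, np.sqrt((np.diff(spec, axis=0) ** 2).sum(axis=1))]
    return rms, zcr, brightness, flux

def rapid_increases(feature, quiet=4, jump=1.5):
    """Frame indices where a phase of little change is followed by a rapid
    rise: the preceding `quiet` frame-to-frame differences stay small, then
    the next difference exceeds `jump` standard deviations of all differences."""
    d = np.diff(feature)
    sd = d.std() + 1e-12
    hits = []
    for i in range(quiet, len(d)):
        if np.all(np.abs(d[i - quiet:i]) < 0.5 * jump * sd) and d[i] > jump * sd:
            hits.append(i + 1)  # index of the frame that jumped
    return hits
```

Frames flagged by `rapid_increases` would correspond to candidate trigger points around which EEG epochs could be time-locked and averaged, analogous to the feature-change triggers used in this paradigm.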
Kühnis J, Elmer S, Meyer M, Jäncke L (2013) The encoding of vowels and temporal speech cues in the auditory cortex of professional musicians: an EEG study. Neuropsychologia 51:1608–1618.
Lane JD, Kasian SJ, Owens JE, Marsh GR (1998) Binaural auditory beats affect vigilance performance and mood. Physiol Behav 63(2):249–252.
Lartillot O, Toiviainen P (2007) A Matlab toolbox for musical feature extraction from audio. International Conference on Digital Audio Effects, Bordeaux.
Levy DA, Granot R, Bentin S (2001) Processing specificity for human voice stimuli: electrophysiological evidence. NeuroReport 12:2653–2657.
Luck SJ (2004) An introduction to the event-related potential technique, second edition. The MIT Press.
Marie C, Kujala T, Besson M (2012) Musical and linguistic expertise influence pre-attentive and attentive processing of non-speech sounds. Cortex 48:447–457.
McGurk H, MacDonald J (1976) Hearing lips and seeing voices. Nature 264:746–748.
McRae K, Ochsner KN, Mauss IB, Gabrieli JJ, Gross JJ (2008) Gender differences in emotion regulation: an fMRI study of cognitive reappraisal. Group Process Intergroup Relat 11(2):143–162.
Meyer M, Baumann S, Jäncke L (2006a) Electrical brain imaging reveals spatio-temporal dynamics of timbre perception in humans. NeuroImage 32:1510–1523.
Meyer M, Baumann S, Jäncke L (2006b) Electrical brain imaging reveals spatio-temporal dynamics of timbre perception in humans. NeuroImage 32:1510–1523.
Mikutta C, Altorfer A, Strik W, Koenig T (2012) Emotions, arousal, and frontal alpha rhythm asymmetry during Beethoven's 5th symphony. Brain Topogr 25:423–430.
Mikutta CA, Maissen G, Altorfer A, Strik W, Koenig T (2014) Professional musicians listen differently to music. Neuroscience 268:102–111.
Morrison SJ, Demorest SM, Aylward EH, Cramer SC, Maravilla KR (2003) FMRI investigation of cross-cultural music comprehension. NeuroImage 20(1):378–384.
Näätänen R, Picton T (1987) The N1 wave of the human electric and magnetic response to sound: a review and an analysis of the component structure. Psychophysiology 24:375–425.
Näätänen R, Gaillard AWK, Mäntysalo S (1978) Early selective-attention effect on evoked potential reinterpreted. Acta Psychol 42:313–329.
Novitski N, Alho K, Korzyukov O, Carlson S, Martinkauppi S, Escera C, Rinne T, Aronen HJ, Näätänen R (2001) Effects of acoustic gradient noise from functional magnetic resonance imaging on auditory processing as reflected by event-related brain potentials. NeuroImage 14:244–251.
Novitski N, Anourova I, Martinkauppi S, Aronen HJ, Näätänen R, Carlson S (2003) Effects of noise from functional magnetic resonance imaging on auditory event-related potentials in working memory task. NeuroImage 20(2):1320–1328.
Novitski N, Maess B, Tervaniemi M (2006) Frequency specific impairment of automatic pitch change detection by fMRI acoustic noise: an MEG study. J Neurosci Meth 155:149–159.
O'Kelly J, James L, Palaniappan R, Taborin J, Fachner J, Magee WL (2013) Neurophysiological and behavioral responses to music therapy in vegetative and minimally conscious states. Front Hum Neurosci 7:884.
Panksepp J, Bernatzky G (2002) Emotional sounds and the brain: the neuro-affective foundations of musical appreciation. Behav Process 60:133–155.
Pantev C, Bertrand O, Eulitz C, Verkindt C, Hampson S, Schuierer G, Elbert T (1995) Specific tonotopic organizations of different areas of the human auditory cortex revealed by simultaneous magnetic and electric recordings. Electroencephalogr Clin Neurophysiol 94(1):26–40.
Pantev C, Roberts LE, Schultz M, Engelien A, Ross B (2001) Timbre-specific enhancement of auditory cortical representations in musicians. NeuroReport 12:1–6.
Pearce MT, Herrojo Ruiz M, Kapasi S, Wiggins GA, Bhattacharya J (2010) Unsupervised statistical learning underpins computational, behavioural, and neural manifestations of musical expectation. NeuroImage 50:302–313.
Pereira CS, Teixeira J, Figueiredo P, Xavier J, Castro SL, Brattico E (2011) Music and emotions in the brain: familiarity matters. PLoS One 6(11):e27241.
Peretz I, Zatorre RJ, editors (2003) The cognitive neuroscience of music. Oxford University Press.
Picton TW, Woods DL, Baribeau-Braun J, Healey TMG (1977) Evoked potential audiometry. J Otolaryngol 6(2):90–116.
Polich J, Aung M, Dalessio DJ (1987) Long-latency auditory evoked potentials: intensity, inter-stimulus interval and habituation. Pavlovian J Biol Sci 23(1):35–40.
Polich J, Ellerson PC, Cohen J (1996) P300, stimulus intensity, modality and probability. Int J Psychophysiol 23(1):55–62.
Putkinen V, Tervaniemi M, Saarikivi K, de Vent N, Huotilainen M (2014) Investigating the effects of musical training on functional brain development with a novel melodic MMN paradigm. Neurobiol Learn Mem 110:8–15.
Ressel V, Pallier C, Ventura-Campos N, Diaz B, Roessler A, Ávila C, Sebastián-Gallés N (2012) An effect of bilingualism on the auditory cortex. J Neurosci 32(47):16597–16601.
Sachs ME, Damasio A, Habibi A (2015) The pleasure of sad music: a systematic review. Front Hum Neurosci 9:404.
Salimpoor VN, Zald DH, Zatorre RJ, Dagher A, McIntosh AR (2015) Predictions and the brain: how musical sounds become rewarding. Trends Cogn Sci 19(2):86–91.
Sambeth A, Ruohio K, Alku P, Fellman V, Huotilainen M (2008) Sleeping newborns extract prosody from continuous speech. Clin Neurophysiol 119(2):332–341.
Schaefer A, Pottage CL, Rickart AJ (2011) Electrophysiological correlates of remembering emotional pictures. NeuroImage 54:714–724.
Seppänen M, Hämäläinen J, Pesonen A-K, Tervaniemi M (2013) Passive sound exposure induces rapid perceptual learning in musicians: event-related potential evidence. Biol Psychol 94:341–353.
Teder W, Alho K, Reinikainen K, Näätänen R (1993) Interstimulus interval and the selective-attention effect on auditory ERPs: "N100 enhancement" versus processing negativity. Psychophysiology 30(1):71–81.
Tervaniemi M, Schröger E, Saher M, Näätänen R (2000) Effects of spectral complexity and sound duration on automatic complex-sound pitch processing in humans – a mismatch negativity study. Neurosci Lett 290:66–70.
Tervaniemi M, Huotilainen M, Brattico E (2014) Melodic multi-feature paradigm reveals auditory profiles in music-sound encoding. Front Hum Neurosci 8:496.
Toiviainen P, Alluri V, Brattico E, Wallentin M, Vuust P (2014) Capturing the musical brain with Lasso: dynamic decoding of musical features from fMRI data. NeuroImage 88(3):170–180.
Tzanetakis G, Cook P (2002) Music genre classification of audio signals. IEEE Trans Speech Audio Process 10:293–302.
Virtala P, Huotilainen M, Partanen E, Tervaniemi M (2014) Musicianship facilitates the processing of Western music chords – an ERP and behavioral study. Neuropsychologia 61:247–258.
Wager TD, Phan KL, Liberzon I, Taylor SF (2003) Valence, gender and lateralization of functional brain anatomy in emotion: a meta-analysis of findings from neuroimaging. NeuroImage 19:513–531.
Woods DL (1995) The component structure of the N1 wave of the human auditory evoked potential. Electroencephalogr Clin Neurophysiol Suppl 44:102–109.
(Accepted 30 October 2015)


(Available online 7 November 2015)
