Cerebral Cortex, May 2018;28: 1771–1782

doi: 10.1093/cercor/bhx087
Advance Access Publication Date: 10 April 2017
Original Article

Downloaded from https://academic.oup.com/cercor/article-abstract/28/5/1771/3571164 by Biology Library user on 03 March 2020


Bayesian Mapping Reveals That Attention Boosts Neural
Responses to Predicted and Unpredicted Stimuli
Marta I. Garrido (1,2,3,4), Elise G. Rowe (1,2), Veronika Halász (1,2,3) and Jason B. Mattingley (1,3,5)
(1) Queensland Brain Institute, The University of Queensland, 4072 Brisbane, Australia
(2) Centre for Advanced Imaging, The University of Queensland, 4072 Brisbane, Australia
(3) ARC Centre of Excellence for Integrative Brain Function, The University of Queensland, 4072 Brisbane, Australia
(4) School of Mathematics and Physics, The University of Queensland, 4072 Brisbane, Australia
(5) School of Psychology, The University of Queensland, 4072 Brisbane, Australia
Address correspondence to Marta I. Garrido, Queensland Brain Institute, The University of Queensland, Building 79, St Lucia 4072, Brisbane, Australia.
Email: m.garrido@uq.edu.au

Abstract
Predictive coding posits that the human brain continually monitors the environment for regularities and detects
inconsistencies. It is unclear, however, what effect attention has on expectation processes, as there have been relatively few
studies and the results of these have yielded contradictory findings. Here, we employed Bayesian model comparison to
adjudicate between 2 alternative computational models. The “Opposition” model states that attention boosts neural
responses equally to predicted and unpredicted stimuli, whereas the “Interaction” model assumes that attentional boosting
of neural signals depends on the level of predictability. We designed a novel, audiospatial attention task that orthogonally
manipulated attention and prediction by playing oddball sequences in either the attended or unattended ear. We observed
sensory prediction error responses, with electroencephalography, across all attentional manipulations. Crucially, posterior
probability maps revealed that, overall, the Opposition model better explained scalp and source data, suggesting that
attention boosts responses to predicted and unpredicted stimuli equally. Furthermore, Dynamic Causal Modeling showed
that these Opposition effects were expressed in plastic changes within the mismatch negativity network. Our findings
provide empirical evidence for a computational model of the opposing interplay of attention and expectation in the brain.

Key words: EEG, MMN, modeling, novelty, prediction

Introduction

The way in which we perceive the world around us is thought to be an active inferential process. Rather than
passively registering information that arrives at our senses, the brain builds predictive models of what it
might encounter next. These theoretical conjectures have been formalized in terms of predictive coding (Rao
and Ballard 1999; Friston 2005) and are useful in explaining the ubiquitous phenomenon of larger brain
responses to surprising than predictable events (Montague 1999; Garrido et al. 2013; Opitz et al. 1999;
Summerfield and Koechlin 2008). Selective attention is the process of prioritizing information by allocating
more cognitive resources to the object of focus, while suppressing information that is irrelevant. Recent
extensions of predictive coding have framed attention as the process of enhancing the reliability of
prediction errors (Feldman and Friston 2010). This idea has been empirically demonstrated by larger prediction
errors for attended than unattended visual objects (Jiang et al. 2013) and sounds (Auksztulewicz and Friston
2015), with the latter going against the longstanding notion of mismatch negativity (MMN) as a preattentive
process (Naatanen et al. 2001).
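
As a purely illustrative aside, the idea that attention enhances the reliability of prediction errors can be
sketched as a gain applied to the difference between what was observed and what was predicted. The function
name and the gain values below are hypothetical and are not taken from the study; only the 500 and 550 Hz tone
frequencies come from the task described later in the paper.

```python
def weighted_prediction_error(observed, predicted, precision):
    """Scale the raw prediction error (observed - predicted) by a precision gain."""
    return precision * (observed - predicted)


# Hypothetical values for illustration only.
predicted_hz = 500.0    # expected tone frequency (a standard)
observed_hz = 550.0     # tone actually heard (a deviant)
attended_gain = 2.0     # assumed precision when the stream is attended
unattended_gain = 1.0   # assumed precision when the stream is ignored

attended_pe = weighted_prediction_error(observed_hz, predicted_hz, attended_gain)
unattended_pe = weighted_prediction_error(observed_hz, predicted_hz, unattended_gain)

print(attended_pe, unattended_pe)
```

Under these assumed gains, the same 50 Hz mismatch yields a larger weighted error when attended than when
unattended, and a fully predicted tone yields zero error at any gain.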

© The Author 2017. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com
There is a general consensus that expectation dampens neuronal activity and that attention boosts neuronal
activity (Summerfield and Koechlin 2008). Thus, superficially at least, attention and prediction appear to
have opposing effects. However, the way in which attention interacts with expectation is unclear for 2
reasons. First, there have been very few attempts to manipulate attention and prediction independently, but
many instances in which the 2 have been entwined or confounded (Summerfield and Egner 2009), as attention is
often manipulated in a probabilistic manner rather than through stimulus filtering or prioritization. Second,
the few studies on prediction and attention have yielded a puzzling depiction of what might be happening in
the brain. Kok et al. (2012) provided fMRI evidence that attention and prediction have an interactive or
synergetic effect by showing greater brain activity in the visual cortex for predicted (than unpredicted)
visual stimuli, a finding which was conceptually replicated using electroencephalography (EEG) for auditory
stimuli, and expressed in the N1 evoked potential (Hsu et al. 2014). By contrast, Auksztulewicz and Friston
(2015) found that attention increased the typically observed difference between evoked responses to
unpredicted versus predicted stimuli, as reflected in an enhanced MMN, and Bekinschtein et al. (2009) found
that violation of global rules led to late evoked responses only when participants were aware of such
violations.

In this paper, we first formalize 2 theoretical models that have been put forward to explain the interplay
between attention and prediction in the brain: the Opposition model and the Interaction model, introduced in
Kok et al. (2012). The Interaction model postulates that attention and prediction interact such that neuronal
activity is greatest for attended and predicted events. This model is inspired by the idea that attention
increases the precision of predictions by weighting prediction errors (Feldman and Friston 2010), and assumes
4 levels of precision, or attention, that depend on the level of prediction. By contrast, the Opposition model
posits that attention and prediction have opposing effects on neural activity, such that prediction mitigates
and attention boosts neural activity. The predictions of this model are that the neuronal responses will be
greatest for attended unpredicted stimuli, and smallest for unattended predicted stimuli. Computationally,
this model assumes that neuronal activity is weighted by 2 (instead of 4) levels of attention (attended and
unattended). This model is agnostic about the relationship between responses to attended predicted and
unattended unpredicted events. Both the Interaction and the Opposition models assume that prediction has 2
levels, such that unpredicted stimuli evoke a larger neuronal response than predicted stimuli. They differ,
however, in their treatment of the attention component. Specifically, the Opposition model offers a more
parsimonious expression of the effects of attention on neuronal responses (Fig. 1).

Here we tested these models empirically using Bayesian model comparison for scalp and source EEG data, as well
as dynamic causal modeling (DCM). We developed a novel auditory task in which participants were presented with
independent streams of white noise concurrently in each of the 2 ears, and were instructed to attend to the
left channel, the right channel or both channels in separate blocks to detect brief gaps in the noise streams.
At the same time, an irrelevant stream of standard and deviant tones was presented in either ear (attended or
ignored), providing an orthogonal stimulus set from which to extract neural responses to predicted and
unpredicted auditory events.

Figure 1. Two competing models for the relationship between attention and prediction. In the Opposition model,
predicted (green) and unpredicted (red) neural signals are multiplied by 2 levels of Attention, with attended
stimuli (solid lines) receiving a greater boost than unattended stimuli (dashed lines). In the Interaction
model (proposed by Kok et al. 2012), predicted and unpredicted signals are multiplied by 4 (instead of 2)
levels of attention that depend on the level of the prediction.

Methods

Participants
A total of 21 healthy adults were recruited for the experiment. Data from 2 participants were excluded from
further analysis due to poor performance on the behavioral task (accuracy < 50%). The reported analysis was
thus performed on data from 19 participants (10 females, aged 19–43, M = 24.21, standard deviation = 6.11)
with no reported history of neurological or psychiatric disorder and no previous head trauma resulting in
unconsciousness. All participants gave written informed consent in accordance with the guidelines of the
University of Queensland’s ethical committee, and were monetarily compensated for their time.

Auditory Stimuli
The auditory task developed for the study is depicted in Figure 2. An auditory frequency oddball sequence was
played to one ear at 60 dB and overlaid with Gaussian white noise at 40 dB. White noise only was played to the
other ear at 40 dB. Two pure tones, standards (P = 0.85) and deviants (P = 0.15), (f = 500 or 550 Hz;
counter-balanced between blocks) of 50 ms in duration were played with an inter-stimulus interval of 450 ms.
Embedded in the white noise of either ear were 2 types of targets: a total of 30 nonoverlapping randomized
periods of no sound (gaps), which could be singular (90 ms gaps only, 15 per block) or doubled (two 90 ms
breaks separated by a 30 ms white noise return, 15 per block). The gaps in the white noise of either ear were
never within 2.5 s of each other and never occurred at the same time as a tone. Importantly, the presentation
timings for the noise gaps and the tones were uncorrelated in order to avoid any systematic effects of
bottom-up attention that could have otherwise confounded the ERPs to the tones. All auditory stimuli were
created using in-house Matlab scripts, recorded using Audacity Sound Mixer prior to the experiment, and
delivered with inner-ear buds (Etymotic, ER3).

Experimental Design
The twelve experimental trial blocks (T = 3:32 min each) were comprised of a total of 380 tones (with deviants
always falling within 4–10 standard tones). Participants were instructed to
listen for and report target gaps within the white noise stream in either the left channel only, the right
channel only, or in either channel (divided attention), and to ignore the tones. Each attention condition was
repeated 4 times and the order of the blocks was pseudo-randomized such that no 2 participants received the
same order. When a target was identified in the attended ear(s), participants responded with a “1” keypress
if the gap was singular and a “2” keypress if the gap was doubled. In one-third of the blocks oddball tones
were played in the attended ear, in another third the tones were played in the ignored ear, and in the
remaining third, in which participants divided their attention between ears, the tones were presented to
either side, counter-balanced between the left and right across separate blocks. Participants performed all
blocks in one testing session of 60 min (42:24 min total task duration plus breaks) with an additional 30 min
EEG setup period.

Figure 2. Experimental paradigm. Gaussian white noise embedded with single (90 ms) or double (210 ms) noise
gaps (periods of silence, and the targets of this experiment) was played to both ears (different target
sequence in each ear). One ear received the oddball sequence of pure tones (50 ms) at either 500 or 550 Hz
(counter-balanced between blocks; ISI = 450 ms; standards, P = 0.85, black rectangles; deviants, P = 0.15,
hollow rectangles). Participants were instructed to pay attention to the targets embedded in the white noise
in the left, right, or both ears and to ignore the tones. ISI = inter-stimulus interval; L = left ear;
R = right ear; Std = standard; Dev = deviant.

Task
Participants were seated in front of a computer screen and wore inner-ear buds for the duration of the
experiment. Prior to recordings, participants listened to an example auditory stream of 1-min duration, which
demonstrated the single and double gaps in the white noise. Each participant then underwent a brief practice
session with auditory stimuli consisting of 9 single and 9 double gaps, and a total of 110 tones. Participants
were given feedback about their accuracy in this practice block but not in the experimental blocks. At the
beginning of each experimental block, the focus of attention was specified verbally and an arrow (left, right,
or both directions) remained on the screen for the duration of the block as a reminder. Participants were
asked to make their keypresses in response to target gaps as quickly and as accurately as possible, and to
ignore any gaps in the uncued ear (in the focused attention condition). Task performance was assessed based on
the percentage of correctly detected target gaps and reaction times. Participants with <50% overall accuracy
(proportion correct) were excluded from further analysis.

EEG Data Acquisition and Preprocessing
Continuous EEG data were recorded with a Biosemi Active Two system with 64 Ag/AgCl scalp electrodes arranged
according to the international 10-10 system for electrode placement using a nylon head cap. Data were recorded
at a sampling rate of 1024 Hz. Preprocessing and data analysis were performed with SPM12
(http://www.fil.ion.ucl.ac.uk/spm/). Data were rereferenced to a common reference, down-sampled to 200 Hz and
high-pass filtered above 0.5 Hz. Eye blinks were detected and marked using the VEOG channel before the data
were epoched offline with a peristimulus window of −100 to 400 ms. Artefact removal was performed by removing
trials marked with an eyeblink and by thresholding all channels at 100 µV. Trial data were robustly averaged
before being low-pass filtered below 40 Hz and baseline corrected between −100 and 0 ms. We analysed
event-related potentials with respect to the onsets of standard and oddball tones, separately for conditions
in which the tones were presented in the attended ear, the unattended ear, or in either ear in the divided
attention condition.

Spatiotemporal Image Conversion
Event-related potentials were converted into 3D spatiotemporal volumes per condition and participant. This was
achieved by interpolating and dividing the scalp data per time point into a 2D 32 × 32 matrix. We obtained one
2D image for every time bin (from 0 to 400 ms in steps of 5 ms). These images were then stacked according to
their peristimulus temporal order, resulting in a 3D spatiotemporal image volume with dimensions of
32 × 32 × 81 per participant. Data were then smoothed at FWHM 12 × 12 × 20 mm³.

Spatiotemporal Statistical Maps
For each participant, the 3D spatiotemporal image volumes were modeled with a mass univariate general linear
model (GLM) as implemented in SPM12. We performed between-subject F-contrasts for (1) the main effect of
attention, (2) the main effect of prediction, and (3) the interaction between attention and prediction. Simple
effects were estimated using between-subject t-statistic contrasts. The same statistical analyses were
performed on the 3D spatial image volume obtained
after source localization (see below). All sensor effects are reported at a threshold of P < 0.05 with
family-wise error (FWE) correction for multiple comparisons over the whole spatiotemporal volume. For closer
inspection of the main effects and interactions obtained at channel Fz (at which predictability effects are
typically strongest; Naatanen and Alho 1997), we implemented a 1D GLM approach using SPM12. We restricted our
time window from 0 to 400 ms after stimulus onset and, in a separate analysis, to the typical MMN time window
of 100–250 ms (FWE corrected over the time bins considered).

Source Reconstruction
We obtained source estimates on the cortical mesh by reconstructing scalp activity with a Boundary Element
Method (BEM) and a standard MNI template for the cortical mesh, in the absence of individual MRIs. This forward
model was then inverted with multiple sparse priors (MSP) assumptions for the variance components under group
constraints. This allowed for inferences on the most likely cortical regions that generated the sensor-level
data. We obtained images from these reconstructions for each of the 6 conditions in every participant. These
images were smoothed at FWHM 12 × 12 × 12 mm³. We then computed the main effects of attention and prediction,
and the interaction (attention × prediction) using conventional SPM analysis. The effect of prediction
(t-statistic) is displayed at an uncorrected threshold of P < 0.001. These weaker significance criteria were
used for post-hoc visualization, once the effects had been established under robust criteria at the scalp
level, and we only report regions significant at P < 0.05 FWE corrected at the cluster level.

Statistics
Sensor space significance maps for prediction effects are displayed at P < 0.05 corrected for multiple
comparisons using family-wise error rate. The interaction map is displayed at P < 0.01 uncorrected for the
purpose of defining a region of interest for follow up Bayesian Model Selection (BMS). Source maps are
displayed at P < 0.001 uncorrected, but only clusters significant at cluster-level PFWE < 0.05 are reported.
BMS was employed to make inferences on both scalp and source maps, as well as on DCMs. Note that this framework
uses model evidence as a relative (probabilistic) measure for how well one model explains the data relative to
another, considered in the model space. Importantly, model evidence seeks the optimal balance between accuracy
and model complexity, by favouring the former and penalizing the latter.

Bayesian Model Selection
To compare the 2 models (Opposition and Interaction; see Introduction) of the effects of attention on
prediction (standard and deviant tones) we used the BMS methodology described in Rosa et al. (2010), and
adapted here for EEG. For this analysis we discarded trials from the divided attention condition and used only
the attended and unattended trials from the focused attention conditions (attend left ear only, attend right
ear only) for both standard and deviant tones. We created posterior probability maps (PPMs) from individual
participant log-model evidences using a random-effects approach (RFX). Here, the winning model was the one with
the highest log-evidence (assuming uniform priors over the models) across participants. We performed this
analysis at the sensor and source levels by modeling the data with regressors describing the hypothesized
relationships amongst the 4 different conditions.
Briefly, covariate regressor weights were applied to every participant and trial under the Opposition model,
which predicts reductions in ERP amplitudes across conditions in the following order: (1) attended unpredicted,
(2) unattended unpredicted/attended predicted, and (3) unattended predicted. Next, we specified a second model
derived from Kok et al. (2012), the Interaction model, which predicts reductions in ERP amplitudes across
conditions in the following order: (1) attended predicted, (2) attended unpredicted, (3) unattended
unpredicted, and (4) unattended predicted. Voxel-wise whole-brain log-model evidence maps were then created for
every participant and model, estimated using the Variational Bayes first-Level Model Specification methodology
described in Penny et al. (2005). Source level maps were further smoothed with a 1 mm half width Gaussian
kernel. We used the RFX approach to produce PPMs for both models at the group level. These maps (displayed at a
threshold of probability larger than 75% and 50% for scalp and source, respectively) allowed us to compare
which model had the higher probability at each voxel in the brain (and at each time point in the scalp level
analysis). Further model comparisons for specific regions at the sensor level were undertaken using brain
regions selected a priori from the attention by prediction interaction contrast. At the source, these
comparisons were made at the peak coordinates of clusters for each model that exceeded 51%.

Dynamic Causal Modeling
Source locations were identified based on multiple sparse priors source reconstruction of the overall mismatch
(P < 0.05 uncorrected threshold). These regions were: bilateral primary auditory cortices (A1; MNI coordinates:
left [−42, −24, 34] and right [44, −22, 38]), bilateral inferior temporal gyri (ITG; MNI coordinates: left
[−42, −10, −38] and right [44, 0, −42]) and left inferior frontal gyrus (LIFG; MNI coordinates: [−50, 32, 0]).
The choice for the DCM source nodes was undertaken in a data-driven fashion, rather than being based on a
priori models drawn from the literature. Previous papers have consistently tested models with A1, STG, and IFG,
nodes that we first proposed in 2007 based on a number of fMRI and MEG studies (Garrido et al. 2007). The
similarity of the model space proposed here and in previous papers is evident, except for the replacement of
STG with ITG. This was motivated by the strong evidence for ITG both in the standard SPM analysis at the source
level (P < 0.05, FWE-cluster corrected), and the posterior probability maps (probability >80%). It is important
to note, however, that while some papers have tested models pertaining to the presence or absence of nodes such
as STG and IFG, all assumed that these nodes were correct, given previous literature, rather than refining the
model progressively through exploring other candidate nodes (e.g., ITG instead of STG) through DCM optimization
or other forms of source reconstruction. Given the strong evidence from our source reconstructed data and the
posterior probability maps, as well as the novelty of the paradigm, here we took a data-driven approach rather
than adhering to an assumed model specification. Nevertheless, we ran a validation check to rule out the
possibility that our source reconstructed nodes were less reliable than the nodes taken a priori from the
literature. BMS revealed that the models with the source reconstructed nodes outperformed the models with a
priori nodes by 60% probability.
Note that in the absence of individual anatomical landmarks, we used a standard MNI template for the cortical
mesh in our source reconstruction, which was then used to identify
candidate nodes for the DCMs. Whether including anatomical information would improve the source reconstruction
results at the group level is unclear. This raises an interesting model comparison related to that addressed in
Mattout et al. (2007) and Henson et al. (2009), who showed that individual MRI does not add to the precision of
source estimates compared with an individual deformed template. This was done for MEG data, however, and it is
unclear what the impact on EEG might be when using an MNI template without individual deformations. Given that
MEG has higher spatial resolution and is more sensitive to approximations in source models, however, it is
likely that any potential benefit afforded by individual MRIs would be smaller (not larger) for EEG than for
MEG (shown to be negligible). Furthermore, the sensitivity of our group level inference yielded a
reconstruction of the expected brain regions underlying the MMN (within the temporal and inferior frontal
cortex), even in the absence of a highly realistic head model. Importantly, the locations of the source
reconstruction were only used as soft priors in the subsequent DCM analysis, so that source locations could be
adjusted individually during the connectivity estimation procedure.
We first optimized the basic connectivity architecture using responses to attended and unattended standards and
deviants with no between-trial effects present. This first step considered 2 competing model structures that
included bilateral A1 and ITG, but differed in the presence or absence of LIFG. Next, the pattern of changes in
extrinsic connectivity was optimized under the fully connected architecture (the winning model) using responses
in all 4 conditions for the Opposition and Interaction models. The family of Opposition models used a
between-trial effect of [1, 2, 2, 3] for the unattended predicted, attended predicted, unattended unpredicted,
and attended unpredicted conditions, respectively. The family of Interaction models, on the other hand, used
[1, 2, 3, 4] for the unattended predicted, unattended unpredicted, attended unpredicted, and attended predicted
conditions, respectively. The choice of the weights for the between-trial effects was motivated by the
theoretical relationship proposed in Figure 1. It is important to note that there is an infinite number of
possible combinations of weights that could satisfy the general ordinal relationship between the 4 conditions
specified in Figure 1. Here, we assumed a linear relationship, in the absence of theoretical or empirical
evidence to assume an otherwise more complex relationship. Specifically, for the Opposition Model, we compared
models using [1 2 2 3] or [1 3 3 4], as we had no reason to believe that the attended predicted and the
unattended unpredicted conditions would be closer to the unattended predicted condition than to the attended
unpredicted condition. Bayesian model comparison revealed that the former outperformed the latter.
Fifteen competing models were tested, each with a different subset of connections (forward [F], backward [B],
and recurrent [R]), which also included (subscript i) or excluded intrinsic modulations of A1, and a single
null model. Finally, the Opposition and Interaction model-dependent changes in intrinsic connectivity were
grouped by families, under the optimized connectivity architecture. In both DCM estimation steps, models were
inverted using a 0–400 ms peristimulus time window.

Results

Behavioral Findings on Attentional Manipulation
Behavioral results for the target detection task (discriminating single and double gaps in concurrent white
noise streams in each ear) were grouped into unilateral (focused) or bilateral (divided) attention conditions
(30 targets over 8 blocks and 60 targets over 4 blocks, respectively). We excluded any participants who did not
achieve mean response accuracy >50%. There was no significant difference in response accuracy (P = 0.14)
between the unilateral (M = 71.80%, standard error of mean [SEM] = 5.19%) and the bilateral (M = 68.33%,
SEM = 5.13%) conditions. Participants were significantly faster (P = 0.03) to respond in the bilateral
(M = 748.16 ms, SEM = 27.67 ms) than the unilateral conditions (M = 779.79 ms, SEM = 34.13 ms), likely
reflecting a strategy of responding immediately to any target gap when monitoring both ears under divided
attention, as opposed to having to select only relevant gaps in the focused attention conditions (filtering out
gaps in the ignored ear).

Attention Amplifies Prediction Errors: Single Channel Analysis
ERPs corresponding to each of the experimental conditions (as well as the MMNs derived from subtracting the
standards from the deviants within a condition) were extracted from electrode Fz and compared over time
(Fig. 3). The N1 and P2 components were examined as an average across participants and conditions. For this,
the lowest point between 50 and 150 ms and the highest point between 150 and 250 ms were determined from the
omnibus ERP plot (i.e., the mean ERP across all participants and conditions over time). These time indexes
±25 ms were then used to find the average ERP per condition. Statistical tests of the N1 components found only
a main effect of surprise (F[1,72] = 4.9583, P = 0.0291). Similarly, at P2, there was a main effect of surprise
(F[1,72] = 17.5898, P = 7.7001e-05), but no further significant main effects or interactions. In addition,
results at Fz from 0 to 400 ms using the 1D GLM approach revealed a significant main effect of Attention
between 290 and 340 ms (PFWE_cluster = 0.006), and a significantly larger prediction error for attended
relative to unattended conditions at 115–120 ms (PFWE_cluster = 0.020). We then restricted our analysis to the
MMN time window (100–250 ms) and again found a significant main effect of Attention but at an earlier period
between 200 and 230 ms (PFWE_cluster = 0.028). Moreover, there was a significantly larger prediction error for
attended versus unattended conditions between 100 and 130 ms (PFWE_cluster = 0.046). These findings demonstrate
that attention amplifies prediction errors.

Larger Responses to Unpredicted Than Predicted Events Regardless of Attention Level: Sensor and Source Space
As shown in Figure 4A, the main effect of Prediction, or surprise (standards vs. deviants), revealed
significant components comprising 2 late effects. The first late effect was detected from 200 to 220 ms
(peak-level Tmax = 8.30, cluster-level PFWE < 0.001; at frontocentral channels). The second late component was
observed from 290 to 295 ms (peak-level Tmax = 5.02, cluster-level PFWE = 0.004; at right parieto-occipital
channels). We also found simple Prediction effects in all of the attention manipulations (Fig. 4B), that is,
attended (peaking at 185 ms), unattended (peaking at 210 ms), and divided (peaking at 195 ms). While there
appeared to be qualitative differences in the strength and extent of the prediction effects across Attention
conditions, the interaction between Attention and Prediction did not survive correction for multiple
comparisons.
We then used a multiple sparse priors source reconstruction method to investigate the cortical regions that
generated the effects at the scalp level. Statistical parametric maps for
source-reconstructed images revealed 2 significant clusters for the main effect of Prediction in the left
([−42 −10 −38], peak-level Tmax = 4.14, cluster-level PFWE = 0.019) and right inferior temporal gyri
([44 0 −42], peak-level Tmax = 3.77, cluster-level PFWE = 0.023) (Fig. 4C).

Figure 3. Event-related potentials extracted from electrode Fz for each condition (mean/SEM). (A) The ERPs for
each of the experimental conditions were extracted from electrode Fz and compared over time. The grey shadings
indicate the temporal windows during which a significant main effect of attention was found (**corrected for
the whole epoch, *corrected within the a priori MMN time window). (B) ERPs for attended and unattended
prediction errors (the MMNs; i.e., the difference between unpredicted and predicted) are plotted at electrode
Fz. Grey shading indicates the temporal window during which a significant Attention by Prediction interaction
was found (*corrected within the a priori MMN time window).

Opposition Wins Over Interaction: Evidence From Posterior Probability Maps

Scalp level
BMS was used to compare the 2 competing models of the relationship between Attention and Prediction (the
Opposition or Interaction models; see Fig. 1). Specifically, we were interested in comparing the strength of
neural activation under the different manipulations of attention and prediction. We used random effects BMS to
create group-level PPMs for each model, derived from the log-model evidence of each participant, that is, the
evidence that a given model (Opposition or Interaction) generated the data.
As shown in Figure 5, BMS revealed that the Opposition model (“Attention and Prediction oppose”) was the more
likely (>75% model probability) explanation for the data across most frontocentral channel locations at the
majority of time points (70–210 and 290–375 ms). However, the Interaction model (“Attention and Prediction
interact”) had a higher probability (>75%) of explaining the data between 170 and 230 ms (i.e., within the MMN
time window) at central and lateral parietal channel locations. Thus, the relationship between Attention and
Prediction differed depending on both the time point and scalp location, although more often than not,
Attention and Prediction had opposing effects.
The fact that the Interaction model won within the MMN window and yet we did not find a significant interaction
in the classic GLM analysis could perhaps be explained by a Prediction by Attention interaction effect that did
not survive

Figure 4. Main and simple effects of prediction at the scalp and source levels. (A) Spatiotemporal statistical analysis revealed significant effects of prediction (predicted vs. unpredicted) over frontocentral areas around 220 ms and over posterior parietal areas at 295 ms (displayed at P < 0.05, FWE whole-volume corrected). (B) The effects of prediction across the 3 attentional manipulations revealed a prediction effect in the attended condition at 185 ms, the divided attention condition at 195 ms, and in the unattended condition at 210 ms, all located frontocentrally (displayed at P < 0.05, FWE whole-volume corrected). There was no significant interaction (difference in the MMN between the attention conditions). (C) Source reconstruction analysis revealed a main effect of prediction within the left and right inferior temporal gyri. (Displayed at P < 0.001 uncorrected and FWE corrected at the cluster-level.)

correction for multiple comparisons. We further examined a potential interaction effect, hindered perhaps by a rather conservative multiple comparison correction procedure. Firstly, we used more lenient, uncorrected peak-level statistics to select 2 small interaction clusters at 175 ms (peak-level Fmax = 5.79, peak-level Puncorr = 0.004; at central channels) and 360 ms (peak-level Fmax = 5.45, peak-level Puncorr = 0.006; at right parietal channels; see Fig. 6). We then took the spatiotemporal coordinates of these clusters and extracted the posterior probability of each model at that particular location. We constructed a 10³ cube around these coordinates and took the average posterior probability of each model over that volume. Our reasoning was that if an interaction between Attention and Prediction were present in the data, then the Interaction model would have a higher posterior probability compared with the Opposition model at these coordinates. We found that at 175 ms over frontocentral channels there was a negligible difference between the Opposition and Interaction models, with 48% and 52%, respectively (Fig. 6). However, at 360 ms over the right lateral parietal area, the Opposition model probability far exceeded that of the Interaction model, with a value of 80%. Thus, Attention and Prediction appear to have opposing effects later in time.

Source Level

Finally, we applied the same BMS technique employed at the sensor level to our source reconstructed results. BMS revealed that the Opposition model had the higher model probability and larger clusters at the source (Fig. 7). The Opposition model achieved >50% model probability in the left middle temporal gyrus (cluster size; KE = 82) and right inferior temporal gyrus (cluster size; KE = 288). Conversely, the Interaction model achieved >50% model probability in a smaller cluster in the left middle temporal gyrus (cluster size; KE = 32). We then compared the model probabilities at the center of these clusters and showed that the Opposition model was more probable than the Interaction model in the left middle temporal and right inferior temporal gyri (winning with 82% and 78% probability, respectively). Furthermore, model probabilities extracted

Figure 5. Scalp Posterior Probability Maps of the Opposition and Interaction models over space and time. Maps display the posterior probability for both models, thresholded
at probability >75% over space and time. Scalp maps show the 4 time points with the largest significant clusters. The Opposition model wins (Attention and
Prediction oppose) across most frontocentral channels at the majority of time points (70–210 and 290–375 ms). The Interaction model wins (Attention and Prediction
interact) at the frontocentral and lateral parietal regions of the scalp (channel locations) between 170 and 230 ms.
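
As a rough illustration of the >75% display threshold used for these maps, the following Python sketch labels each sample of a posterior probability map with the winning model. It is not the authors' pipeline: the function names, the example values, and the assumption that the two model probabilities sum to 1 at each sample are all introduced here for illustration.

```python
from typing import List

THRESHOLD = 0.75  # display threshold quoted in the Figure 5 caption


def label_winner(p_opposition: float, threshold: float = THRESHOLD) -> str:
    """Label one scalp/time sample given the Opposition posterior probability.

    Assumption of this sketch: only two models are compared, so the
    Interaction probability is taken as 1 - p_opposition.
    """
    if not 0.0 <= p_opposition <= 1.0:
        raise ValueError("probability must lie in [0, 1]")
    p_interaction = 1.0 - p_opposition
    if p_opposition > threshold:
        return "Opposition"
    if p_interaction > threshold:
        return "Interaction"
    return "undecided"  # neither model exceeds the threshold


def label_map(p_map: List[float]) -> List[str]:
    """Apply the threshold to every sample of a flattened probability map."""
    return [label_winner(p) for p in p_map]


# Hypothetical probabilities, not values from the study
example = [0.90, 0.52, 0.20, 0.75]
labels = label_map(example)
```

Samples where neither model exceeds the threshold are left unlabeled, which mirrors how a thresholded map shows nothing at those locations.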

Figure 6. Bayesian Model Comparison within the spatiotemporal clusters extracted from the Prediction by Attention interaction. We extracted model probabilities using the coordinates (scalp location and time points) of 2 clusters from the Interaction results (based on the liberal threshold of P < 0.001 uncorrected). If interaction effects were present, the Interaction model would be more likely to win over the Opposition model at these coordinates. At 175 ms (within the MMN time window) and over central electrodes, there was a very slight advantage for the Interaction over the Opposition model. However, at 360 ms over right lateral parietal channels, the Opposition model probability far exceeded that of the Interaction model, with a probability of 80%.

from the peak of the Interaction model cluster showed only a slight advantage for the Interaction over the Opposition model (with 57% probability for the Interaction model) in the left middle temporal gyrus. Such a small difference between the probabilities of the Interaction and Opposition models at this cluster suggests we should be cautious in drawing any strong conclusions about its functional anatomy.

Dynamic Causal Modeling

The prior location of the cortical sources included in our DCMs was based on MSP source reconstruction of ERPs corresponding to the 4 conditions (attended standards, attended deviants, unattended standards, and unattended deviants) of the Overall Mismatch. Statistical parametric maps were inspected at a more liberal threshold of P < 0.05 (uncorrected) to identify candidate neural sources of the effects observed on ERP amplitude for the DCM analysis (Auksztulewicz and Friston 2015). Following the selection of candidate sources, model structure was optimized by comparing 2 alternative connectivity models using data from each of the experimental conditions, with or without bilateral connections between the left inferior temporal and inferior frontal gyri, with no between-trial effects present. Results indicated that the best model included recurrent connections among all regions, that is, inputs to LA1 and RA1, with LA1 connected to LITG, and LITG connected to LIFG, as well as connections linking RA1 and RITG, and lateral connections between LITG and RITG. The selected model was then used to further optimize condition-specific changes in the extrinsic connectivity by comparing the types of extrinsic connections present. Next, 15 competing models were tested (Fig. 8A, B), each with a different subset of condition-specific modulations of connections, according to the Opposition and Interaction models on forward (F), backward (B), and recurrent (R) connections (with, denoted by subscript i, and without intrinsic modulations of A1), as well as a null model precluding any modulations (N). These models were fitted to each participant’s data to explain observed differences in ERP amplitude. Random-effects BMS revealed that the Opposition model with modulation of forward connections outperformed all other models (Fig. 8C).

Discussion

In this study, we adjudicated between 2 alternative computational models of the effect that spatial attention has on expectations. Using Bayesian model comparison of scalp PPMs we found that, except for an early time window (within the typical MMN), the Opposition model won over the Interaction model. This suggests that, for the most part, attention provides an equivalent boost to neuronal responses to predicted and

Figure 7. Source posterior probability maps of the Opposition and Interaction models (top) and model probabilities for the 3 major clusters of the 2 models (bottom).
BMS was used for model inference at the group-level for the source reconstructed images. Here, the Opposition model achieved >50% model probability in the left
middle temporal (cluster size; KE = 82) and right inferior temporal (cluster size; KE = 288) gyri. The Interaction model achieved >50% in a small cluster in the left middle
temporal gyrus (cluster size; KE = 32). Overall, the Opposition model achieved higher probability over a larger number of voxels. Extraction of model probabilities
from the peak of the Opposition clusters showed that this model won with 82% probability in the left middle temporal gyrus and with 78% probability in the right
inferior temporal gyrus. Model probabilities extracted from the peak of the Interaction cluster showed minimal differences between either model at this location,
with 57% posterior probability for the Interaction model. Note the differences in the color map scales between the Opposition and Interaction models.
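
Besides reading model probabilities off a cluster peak, the scalp-level analysis in the text averaged the posterior probability of each model over a small cube around the cluster coordinates. A minimal Python sketch of that averaging step follows. The function name, the symmetric odd-sided cube, the clipping at the volume edges, and the toy volume are assumptions made here; the study's actual implementation and cube units are not specified in this section.

```python
from typing import List, Tuple

Volume = List[List[List[float]]]


def cube_mean(volume: Volume, center: Tuple[int, int, int], half_width: int) -> float:
    """Mean of `volume` inside a cube centred on `center`.

    The cube spans center +/- half_width along each axis and is clipped
    to the bounds of the volume (a choice made for this sketch).
    """
    nx, ny, nz = len(volume), len(volume[0]), len(volume[0][0])
    cx, cy, cz = center
    total, count = 0.0, 0
    for x in range(max(0, cx - half_width), min(nx, cx + half_width + 1)):
        for y in range(max(0, cy - half_width), min(ny, cy + half_width + 1)):
            for z in range(max(0, cz - half_width), min(nz, cz + half_width + 1)):
                total += volume[x][y][z]
                count += 1
    if count == 0:
        raise ValueError("cube does not overlap the volume")
    return total / count


# Hypothetical 4 x 4 x 4 map of posterior probabilities for one model
ppm = [[[0.5 for _ in range(4)] for _ in range(4)] for _ in range(4)]
ppm[1][1][1] = 1.0  # a single high-probability sample

mean_center = cube_mean(ppm, (1, 1, 1), 1)  # full 3 x 3 x 3 cube
mean_corner = cube_mean(ppm, (0, 0, 0), 1)  # clipped to 2 x 2 x 2
```

Running the same function on the map of each model at the same coordinates gives the pair of averaged probabilities that can then be compared.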

unpredicted stimuli. Similarly, at the source level we found stronger evidence for the Opposition model underlying a frontotemporal network. We investigated this further with DCMs that employed trial-dependent plastic changes according to either the Opposition or the Interaction model. In agreement with the model-based scalp and source analysis, we found that the family of Opposition models better explained the data. Classic SPM analysis of spatiotemporal maps revealed an effect of prediction across and within all attentional manipulations, which peaked within the typical MMN time window and at frontocentral channels. This effect was statistically greater in the attended compared with the unattended conditions at the single channel level, where MMN is typically seen, suggesting that attention amplifies prediction errors. At the whole spatiotemporal map level, however, this interaction effect did not survive correction for multiple comparisons over the whole space-time, despite the appearance of somewhat larger clusters for the attended than the unattended condition.
Our finding of a prediction error effect in all attention conditions (attended, unattended, and divided) is in agreement with a vast body of work suggesting that the MMN is elicited regardless of attention, and hence is “pre-attentive” in nature (Naatanen et al. 2001). This is in contradistinction to Auksztulewicz and Friston (2015), who did not find an effect of prediction in the absence of attention (although this might have been due to a lack of power, as very few trials were included). Again, our finding of a prediction error effect regardless of attention is opposite to that of Todorovic et al. (2015), who found that while beta synchrony decreased with expectation in the unattended condition, no difference was found in the attended condition. The latter is seemingly at odds with the idea that attention amplifies prediction errors as previously shown (Jiang et al. 2013; Auksztulewicz and Friston 2015), and as revealed in the current study. A number of factors could explain such conflicting results. Perhaps most importantly, very different paradigms and measures were employed across the relevant experiments. Both our study and that of Auksztulewicz and Friston (2015) investigated the effects of attention and prediction on evoked responses in an oddball paradigm, whereas Todorovic et al. (2015) focused on endogenous oscillatory activity. Moreover, both Auksztulewicz and Friston (2015) and Todorovic et al. (2015) manipulated temporal attention, whereas here we manipulated spatial attention. Finally, in our experiment attention and prediction were manipulated within the same spatial location (left or right ears), but were drawn toward independent auditory “objects” (noise for the attention task, and tones for the concurrent oddball stream). By contrast, the aforementioned studies (and that of Kok et al. (2012)) manipulated attention and prediction within the same (visual or auditory) object. It is possible that our attention manipulation, based on spatial selectivity, had a small effect on the tones (in the attended condition), given that these were task-irrelevant and that they

Figure 8. Dynamic Causal Modeling hypothesis testing for plastic changes according to the Opposition and Interaction families of models. (A) Eight model architectures were considered, which tested for trial-specific modulation in forward (F), backward (B), and recurrent (R, i.e., both forward and backward) connections, as well as a null model precluding any modulations (N) in extrinsic connections. These models were considered with (bottom row) and without (top row) intrinsic modulations of A1 (subscript i). The nodes in the model included bilateral primary auditory cortices (LA1 and RA1), bilateral inferior temporal gyri (LITG and RITG) and left inferior frontal gyrus (LIFG). (B) Extrinsic connectivity was optimized using responses in all 4 conditions under the Opposition or Interaction model. A total of 15 competing models were tested, each with a different subset of condition-specific modulation of connections, according to the Opposition (blue) and Interaction (green) models on F, Fi, B, Bi, R, Ri, and Ni. N did not include any modulations (orange). Summed model exceedance probabilities across each family show the winning family as the Opposition Family (left; blue). (C) The winning model architecture had recurrent connections between all regions, intrinsic modulation of A1, lateral connections between bilateral ITG, and included condition-dependent modulations according to the Opposition model in the forward connections (blue lines).
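
The family-level comparison in panel (B) sums model probabilities within each family. The Python sketch below shows only that bookkeeping step for a 15-model space laid out as in the caption (7 Opposition, 7 Interaction, 1 null). The per-model probabilities are placeholder values invented for illustration, not the study's results, and the function name and family labels are likewise assumptions.

```python
from collections import defaultdict
from typing import Dict, Tuple

# Architectures listed in the Figure 8 caption for each family
ARCHITECTURES = ["F", "Fi", "B", "Bi", "R", "Ri", "Ni"]


def family_probabilities(model_probs: Dict[Tuple[str, str], float]) -> Dict[str, float]:
    """Sum per-model probabilities within each family."""
    totals: Dict[str, float] = defaultdict(float)
    for (family, _architecture), prob in model_probs.items():
        totals[family] += prob
    return dict(totals)


# Placeholder probabilities (hypothetical, chosen only so that they sum to 1)
model_probs: Dict[Tuple[str, str], float] = {
    ("Opposition", a): 0.08 for a in ARCHITECTURES
}
model_probs.update({("Interaction", a): 0.05 for a in ARCHITECTURES})
model_probs[("Null", "N")] = 0.09  # null model without any modulation

family_probs = family_probabilities(model_probs)
winning_family = max(family_probs, key=family_probs.get)
```

Pooling over families in this way asks which hypothesis is better supported overall, even when no single architecture within the family stands out clearly.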

never occurred at the same time as the task-relevant noise gaps. However, we believe that this is improbable for 2 reasons. First, the onset of the noise gaps was unpredictable and hence participants had to constantly monitor the stream of sounds on the task-relevant side of space. Second, it is unlikely that participants learned that the noise gaps never coincided with the tones, and could therefore momentarily disengage attention from the noise task. Having said that, the possibility remains that by having the participants focus on the noise streams instead of the tones, our attention manipulation might not have influenced the neural representations of the tones as much as it would have, had we asked the participants to focus on the tones. Future work should test whether manipulating attention and prediction for common versus independent stimuli alters the extent to which they interact.
In this work we directly compared 2 competing models of the effects of attention on expectations (the Interaction and Opposition models) put forward in Kok et al. (2012). The data in that study were consistent with the Interaction model when considering regions of the visual cortex (V1, V2, and V3). Here, however, we took a different approach by implementing the models computationally and directly testing them against our data. By using Bayesian model comparison of statistical maps of EEG activity, and DCMs for ERPs, we were able to quantify how likely each of these 2 models was at every point of space and time at the scalp level, at each voxel in source space, and in the trial-dependent plastic changes within a cortical network. The Opposition model was unambiguously favored in our data at every level, that is, scalp, source, and network. At the network level we found that the plastic changes according to the Opposition model were more pronounced in forward connections. This is consistent with the idea that attention boosts, or heavily weights, prediction errors, which are then conveyed upward in the cortical hierarchy. Such prediction errors signal the need to update an internal perceptual model of the world, in turn prompting learning. At first glance it may appear that boosting of prediction errors is more consistent with the Interaction model. It is important to note, however,

that the corollary of the Interaction model is that attention reverses prediction, such that larger responses will be observed for predicted compared with unpredicted stimuli. In this sense, attention changes the sign of the prediction error instead of boosting it. On the contrary, boosting of prediction errors could in principle be accommodated by the Opposition model as it predicts a larger difference between unpredicted and predicted responses in the attended versus unattended condition. Having said this, our instantiation of the Opposition model is agnostic to such a relationship and was not modeled explicitly here. The Opposition model simply assumes that unpredicted responses will always be larger than predicted responses, regardless of attention, and that attention will boost these responses. It may also appear surprising that our attention manipulation did not modulate backward connections, which are thought to convey updated predictions (Friston 2005). In our paradigm, however, the predictions did not require constant updating, unlike in some other paradigms in which the rule constantly changes, such as in roving MMN (Garrido et al. 2008) or in reversal learning (Ghahremani et al. 2010). In such scenarios, it is possible that attention would modulate prediction updating via feedback processing (Desimone and Duncan 1995; Spratling 2008).
We should, however, be cautious when interpreting the findings from our best individual model. While we have good evidence for an advantage of the Opposition family (77%) over the Interaction family (17%), and thus can assert that attention and prediction have opposing effects on plastic changes, we are less confident about where exactly in the network these effects might be expressed, given the relatively small advantage for the forward model over the remaining models tested.
While the better performance of the Opposition over the Interaction model is generally at odds with the findings by Hsu et al. (2014) and Kok et al. (2012), there was a narrow window of agreement in which the Interaction model was better at explaining the data at the scalp level, perhaps tellingly within the MMN time frame. This is an interesting finding, as it seems to point to a tonic Opposition effect between Attention and Prediction, and a phasic Interaction effect. Again, there are differences in both the type of paradigm and the neuronal measures between our study, which used EEG, and the experiment of Kok et al. (2012), which used fMRI. Although Attention was manipulated spatially in both studies, in our study it was directed towards a different (instead of the same) object. Moreover, our Prediction manipulation was learnt from the sequence of stimuli, rather than instructed (as in Kok et al. (2012)).
In conclusion, our findings provide empirical evidence for a computational model of the opposing interplay of attention and expectations in the brain. These opposing effects are manifested in neuronal activity and in plastic changes within a frontotemporal network engaged in sensory prediction errors. We demonstrate that attention boosts neuronal responses to predicted and unpredicted stimuli, and replicate the finding that attention boosts prediction errors, in keeping with the predictive coding framework (Rao and Ballard 1999; Friston 2005). Finally, we demonstrate that prediction errors are elicited regardless of one’s state of attention, providing further support to the idea of a preattentive nature of change detection systems in the brain (Naatanen et al. 2001).

Funding

Australian Research Council (ARC) Discovery Early Career Researcher Award (DE130101393) and a University of Queensland Fellowship (2016000071) to M.I.G., an ARC Australian Laureate Fellowship (FL110100103) to J.B.M., the ARC Centre of Excellence for Integrative Brain Function (ARC Centre Grant CE140100007) to M.I.G. and J.B.M., and an ARC Special Research Initiative: Science of Learning Research Centre (SR120300015) to J.B.M.

Notes

We thank the volunteers for participating in this study and Maria Joao Rosa for discussions. Conflict of Interest: The authors declare no competing financial interests.

References

Auksztulewicz R, Friston K. 2015. Attentional enhancement of auditory mismatch responses: a DCM/MEG study. Cereb Cortex. 25:4273–4283.
Bekinschtein TA, Dehaene S, Rohaut B, Tadel F, Cohen L, Naccache L. 2009. Neural signature of the conscious processing of auditory regularities. Proc Natl Acad Sci USA. 106:1672–1677.
Desimone R, Duncan J. 1995. Neural mechanisms of selective visual attention. Annu Rev Neurosci. 18:193–222.
Feldman H, Friston KJ. 2010. Attention, uncertainty, and free-energy. Front Hum Neurosci. 4:215.
Friston K. 2005. A theory of cortical responses. Philos Trans R Soc Lond B Biol Sci. 360:815–836.
Garrido MI, Friston KJ, Kiebel SJ, Stephan KE, Baldeweg T, Kilner JM. 2008. The functional anatomy of the MMN: a DCM study of the roving paradigm. Neuroimage. 42:936–944.
Garrido MI, Kilner JM, Kiebel SJ, Stephan KE, Friston KJ. 2007. Dynamic causal modelling of evoked potentials: a reproducibility study. Neuroimage. 36:571–580.
Garrido MI, Sahani M, Dolan RJ. 2013. Outlier responses reflect sensitivity to statistical structure in the human brain. PLoS Comput Biol. 9:e1002999.
Ghahremani DG, Monterosso J, Jentsch JD, Bilder RM, Poldrack RA. 2010. Neural components underlying behavioral flexibility in human reversal learning. Cereb Cortex. 20:1843–1852.
Henson RN, Mattout J, Phillips C, Friston KJ. 2009. Selecting forward models for MEG source-reconstruction using model-evidence. Neuroimage. 46:168–176.
Hsu YF, Hamalainen JA, Waszak F. 2014. Both attention and prediction are necessary for adaptive neuronal tuning in sensory processing. Front Hum Neurosci. 8:152.
Jiang J, Summerfield C, Egner T. 2013. Attention sharpens the distinction between expected and unexpected percepts in the visual brain. J Neurosci. 33:18438–18447.
Kok P, Rahnev D, Jehee JF, Lau HC, de Lange FP. 2012. Attention reverses the effect of prediction in silencing sensory signals. Cereb Cortex. 22:2197–2206.
Mattout J, Henson RN, Friston KJ. 2007. Canonical source reconstruction for MEG. Comput Intell Neurosci. 2007: Article ID 67613.
Montague PR. 1999. Reinforcement learning: an introduction. Trends Cogn Sci. 3:360–360.
Naatanen R, Alho K. 1997. Higher-order processes in auditory-change detection. Trends Cogn Sci. 1:44–45.
Naatanen R, Tervaniemi M, Sussman E, Paavilainen P, Winkler I. 2001. “Primitive intelligence” in the auditory cortex. Trends Neurosci. 24:283–288.
Opitz B, Mecklinger A, Friederici AD, von Cramon DY. 1999. The functional neuroanatomy of novelty processing: integrating ERP and fMRI results. Cereb Cortex. 9:379–391.
Penny WD, Trujillo-Barreto NJ, Friston KJ. 2005. Bayesian fMRI time series analysis with spatial priors. Neuroimage. 24:350–362.

Rao RP, Ballard DH. 1999. Predictive coding in the visual cortex: a functional interpretation of some extra-classical receptive-field effects. Nat Neurosci. 2:79–87.
Rosa MJ, Bestmann S, Harrison L, Penny W. 2010. Bayesian model selection maps for group studies. Neuroimage. 49:217–224.
Spratling MW. 2008. Reconciling predictive coding and biased competition models of cortical function. Front Comput Neurosci. 2:4.
Summerfield C, Egner T. 2009. Expectation (and attention) in visual cognition. Trends Cogn Sci. 13:403–409.
Summerfield C, Koechlin E. 2008. A neural representation of prior information during perceptual inference. Neuron. 59:336–347.
Todorovic A, Schoffelen JM, van Ede F, Maris E, de Lange FP. 2015. Temporal expectation and attention jointly modulate auditory oscillatory activity in the beta band. PLoS One. 10:e0120288.
