This module, we're going to talk about the basics of what we can and can't say with fMRI data. And we're going to do this through the lens of crises in psychology and neuroscience. There are many surprising claims that have been made in the literature, claims like this one, from the New York Times. In this study, which is written up as a New York Times op-ed piece, the authors claim to have scanned a number of brains of Democrats and Republicans, and they claim to be able to make inferences about what somebody's attitudes are, depending on what their brain scans show. 

So for example, they write that two areas in the brain associated with anxiety and disgust, the amygdala and the insula, were especially active when men viewed "Republican," and based on that, they infer that the voters sense peril and threat, and feel disgust. And also emotions about Hillary Clinton are mixed, based on activation in the amygdala and the nucleus accumbens, which is supposed to be related to reward, and so on. 

Here's another surprising claim. In this article, Martin Lindstrom scanned people viewing their iPhones. And what he writes is that most striking of all was the flurry of activation in the insular cortex of the brain, which is associated with feelings of love and compassion, the same insula discussed in the previous article. Subjects' brains responded to the sound of their phones as they would respond to the presence or proximity of a girlfriend, boyfriend, or family member. The subjects loved their iPhones. 

Many of these claims are demonstrably false, others are unlikely to be true, and this isn't limited to brain imaging. This is an endemic problem in psychology, psychiatry, neuroscience, genetics, biology, and other fields. 

And these are illustrations of some of the more extreme examples of what I would call the replicability
crisis. This is one of three crises in neuroscience that we're going to talk about today, with a view towards
how to avoid them, and how to educate ourselves to make better and smarter inferences about brain
images. The replicability crisis is fueled by findings that are not true and effects that might be true, but are not
meaningfully large or not reproducible from laboratory to laboratory. The interpretability crisis concerns
findings that might be true, but they're not meaningful in terms of underlying neuroscience and what we know
about brains. And finally, the translation crisis refers to the idea that we've done many, many studies, but it's very difficult to take brain imaging research, like many kinds of research, genetics research and other research, and bring it from science into the practical domain, where we actually have commercial and clinical applications. What we'll see next is a funny example of one of the studies that has recently really fueled this debate. It's a study of extrasensory perception published in one of the most prestigious journals in psychology, the Journal of Personality and Social Psychology. >> According to a study by Cornell psychology professor Daryl Bem, soon to be published in the Journal of Personality and Social Psychology, there is, quote, strong evidence for extrasensory perception, the ability to sense future events. I know you're thinking, Stephen, that's bull---. >> But on the other hand, I know you're thinking, Stephen, that's bull---. >> So many of these surprising claims are not true. And how do we know that they're not all
true? Any one of them could actually be a surprising but true finding. And one way that we can know is to
look at studies across different fields, across the sciences. So here you see a plot of studies from the space sciences and geosciences through to the social sciences, immunology, climatology, psychiatry, and psychology down at the bottom. And what we're looking at is the proportion of papers that claim to support the hypothesis they tested. Well, what's the base rate of having a hypothesis that you end up supporting? It's probably low, but here most studies, and in fact in psychology and psychiatry nearly all studies, end up supporting the claims that they tested. And one explanation is that they're choosing the claims to support to fit the data. 

So this is related to what's called publication bias, or the File Drawer Problem, which is a problem throughout the sciences. And the idea is that studies with negative results go in the file drawer, you never see them, but studies that happen to find significant results, even if they're not true or not replicable, end up getting published. And because of this, false findings are like rumors: they're easy to spread and hard to dispel. And one positive finding can cause a ripple that ends up influencing the field for years to come. And if those findings are false, this is obviously a dangerous practice. One of the most famous examples of this is the recent controversy over whether autism is caused by vaccines. There was one publication that claimed to have found this. It was picked up by the media and by several celebrities and spread very widely. And in spite of many, many large-scale studies failing to find such associations, it continues to be an idea that's prevalent among certain sectors of the community, and there's a lot of harm that's caused by that. The backlash against
these replicability issues has been quite strong and it's not always very specific. So in this article, Ingfei Chen
is saying that the vast majority of brain research is drowning in uncertainty. Not just a few isolated studies,
but many, many areas of neuroscience, and while there's some truth to this, there's some danger in this also. 

Here are some of the important papers that look at sample size issues, that look at issues of finding interactions and complex effects where it might be difficult to demonstrate such effects, and a really broad critique of all kinds of research by John Ioannidis, which you see here: Why Most Published Research Findings Are False. So these claims about replicability issues have been applied really broadly across fields. 

And this is one of the papers that's been influential in the brain imaging field. It was originally called Voodoo Correlations in Social Neuroscience, and it was really a targeted attack on replicability in social neuroscience studies that look at correlations between brain activity and measures of personality, or attitudes, or behavior. The authors subsequently changed the title to Puzzlingly High Correlations in Social Neuroscience, but it really caused a big stir. And Sharon Begley, one of the early advocates of brain imaging research in the news, who covered a lot of these stories, writes here that the neuroscience blogosphere is crackling with, so far, glee over this paper. And it really made a big stir. 

One question is whether these problems are caused by a few bad apples. One recent example is Diederik Stapel, who had a series of very high-profile findings in Science and other journals, and who was subsequently found to have committed fraud and admitted to fabricating data in a number of papers. This is a terrible thing, and it's unfortunate that it happens once in a while, but we can ask: are all these problems really just caused by a few scientists who are overtly unethical? I think this is not the case; this isn't just caused by isolated breaches of scientific integrity. It's caused by a really widespread bias, even among well-meaning scientists. So we have to educate ourselves on what the sources of bias are, and then how to work so that we minimize those sources of bias and can overcome them. 

There are many reasons for this, but let's look at what Ioannidis says about why most published research findings are false, and what factors are involved. So here are some risk factors for false positives. One is small studies: neuroimaging data are very expensive to collect and analyze, and so neuroimaging studies have been quite small by comparison to studies in other fields. Two is a low prior probability of the effect being true, or at least very large; the ESP example that we looked at from Bem's research is one such example. Some people believe in ESP, other people don't, but there's no known mechanism, and even Bem says he doesn't know how it works. And if there's no known physical mechanism, then the phenomenon is less likely to be true a priori, and that makes surprising findings of this kind, findings with no mechanism, less likely to be true. So we need more evidence to demonstrate that they are true, if indeed they are. 

Three is the number of tests conducted. We'll talk more about this in this lecture and the next lecture, but in neuroimaging we have many voxels that we test, sometimes hundreds of thousands of voxels. We can test multiple contrasts and multiple maps, and so the risk of false positives is very great. And in addition, when we pick out the winners, the effect sizes of those voxels look much larger than they actually are. The next factor is flexibility in design, analysis choices, and outcomes. In neuroimaging there are many analysis pathways and options, and we'll look at some of the impact of that on false positive findings as well. 
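To make the multiple-testing point concrete, here is a minimal, hypothetical Python sketch (mine, not part of the lecture materials; the voxel and subject counts are made up but typical). Even with no true effect at any voxel, an uncorrected threshold passes a predictable number of voxels by chance alone:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

n_voxels = 100_000    # hypothetical whole-brain analysis
n_subjects = 20       # a typical small fMRI group study
alpha = 0.001         # uncorrected voxel-wise threshold

# Null data: no true effect at any voxel, for any subject.
data = rng.standard_normal((n_subjects, n_voxels))

# One-sample t-test at every voxel (a simple group analysis).
t_vals, p_vals = stats.ttest_1samp(data, popmean=0.0, axis=0)

n_significant = int(np.sum(p_vals < alpha))
print(f"Voxels expected to pass by chance: ~{alpha * n_voxels:.0f}")
print(f"Voxels that actually passed:        {n_significant}")
```

With 100,000 voxels and an uncorrected p < .001, roughly 100 voxels pass purely by chance, which is why the corrections discussed in this lecture and the next are essential.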

Next is financial interest, and also prestige interest. We all have a stake in supporting the things that we want to be true. For many of us, our status, and maybe our self-worth, is on the line when we test our scientific findings, and we want our favorite effect to pan out. So we really have to guard against this. 

It's costly to be wrong in many cases, even if we don't think we're biased and we have the best intentions, and we really have to keep that in mind. One interesting finding related to this is a published meta-analysis showing that a significant predictor of how large a drug effect is in a clinical trial is whether a drug company funded that study. 

And finally, the competition to publish. In hot fields, there tend to be more false positives, and possibly in more prestigious journals as well. Don't we all want to publish in Nature and Science? I don't know how to publish in Nature and Science, but if I did, I might say that you have to have a very surprising finding. Let's look now at selection bias in more detail. This is going to be at the core of a number of problems that we have to deal with in neuroimaging, and they're resolvable. So first of all, studies and tests with positive findings are reported and those without positive findings are ignored. This is called the file drawer problem. But there are
multiple types of selection biases, and they include the publication bias that I talked about a minute ago. There are also biases in experiments: flexibility in which subjects to include, how many subjects to run, and when you stop running subjects. All those sources of flexibility, if you're choosing them after you observe the data, are sources of false positives and inflated effect sizes. There's flexibility in the model: which outcome do you choose, which covariates do you include, and which modeling procedures do you employ? And finally, there's the problem of voxel selection bias: which voxels do I choose, out of the many voxels that I tested, to actually look at and interpret? 
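As a small illustration of that last bias, here is a hypothetical simulation (again my own sketch, not from the lecture): every voxel carries the same modest true effect, but if we report effect sizes only in the voxels that survive a threshold chosen after seeing the data, the estimates are strongly inflated; this is the "picking out the winners" problem mentioned above.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

n_voxels, n_subjects = 50_000, 20
true_effect = 0.2   # the same modest true effect (in SD units) at every voxel

# Simulated subject-level data: true effect plus noise at each voxel.
data = true_effect + rng.standard_normal((n_subjects, n_voxels))

# Group analysis: one-sample t-test at every voxel.
t_vals, _ = stats.ttest_1samp(data, popmean=0.0, axis=0)
voxel_means = data.mean(axis=0)

# "Winners": voxels passing a one-sided p < .001 threshold,
# chosen after seeing the data (the selection step described above).
t_thresh = stats.t.ppf(1 - 0.001, n_subjects - 1)
winners = t_vals > t_thresh

print(f"True effect size:                {true_effect:.2f}")
print(f"Mean estimate over all voxels:   {voxel_means.mean():.2f}")
print(f"Mean estimate in winners only:   {voxel_means[winners].mean():.2f}")
```

Running this, the average estimate in the "winning" voxels comes out several times larger than the true effect, even though every voxel has exactly the same underlying signal.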

So this is the interpretability crisis now; we'll talk a little bit about this. This relates to findings that might be true but are not meaningful. Here are a couple of the classic papers that deal with inferences in cognitive neuroscience, and what we can infer about function from looking at brain activity. The first one is a really old paper now, by Martin Sarter, Berntson, and Cacioppo, called Toward Strong Inference in Attributing Function to Structure, and they discuss some of the difficulties in this inferential process, foreshadowing a lot of these later debates. And secondly, this is an article by Max Coltheart, which spawned a series of responses, in which he asked: perhaps functional neuroimaging hasn't told us anything about the mind so far? He challenged the field to come up with an example where brain imaging supported a cognitive theory, or was critical evidence in deciding whether a cognitive theory is true or false, or which theory is true. 

There were a number of responses, and there was a redux of this later by Mara Mather and colleagues. I invite you to go and look up some of those papers for many interesting answers, and we'll talk about a couple of examples later in the course. 

So the problem of interpretability relates to the problem of specificity, and I'll illustrate this with this map. So
if you look at this map here, it's a typical brain imaging map, it's a group analysis from about 30 subjects. And
the question is, what can you tell about that map, what does it mean in terms of psychology or function? 

In principle, if we're experts in the brain, we should be able to look at such maps and say something about what task is being done, at least. 

So let's look at the map. Well, I see the anterior cingulate there. 

I see the insula, mid and anterior insula. So, I'm a pain researcher, in part, and so I can say, well, maybe this is pain, because pain activates the anterior cingulate and the insula. There's the thalamus, which is also activated by pain. It's looking pretty good. And this is the secondary somatosensory cortex, which is another area that turns out to be pretty specifically activated by pain. So I'm doing a pretty good job of brain reading, right? There's some primary somatosensory cortex. And now let's look at a database of studies. Now this is a database that was built by Tal Yarkoni when he was in my lab a couple of years ago, and it's a database, now, of nearly 10,000 studies. Each of those blue dots is a reported coordinate from one published study. 

So what we can do with this is look at all of the different studies and ask, what are the top hits? If we feed in this brain map, what are the topics that are associated with brain maps that look like this one? And what we get out of this, the top hits, are noxious, heat, somatosensory, painful, sensation, stimulation, muscle, temperature. So this increases my confidence that this is really a pain map, right? Everybody on board? Great. The problem is, this is not a pain map. This map came from looking at the faces of people who rejected you. So this is a map related to romantic rejection. 
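As an aside, for readers curious how this kind of database "decoding" can be computed, here is a minimal, hypothetical sketch. It is my own illustration of the general idea (ranking topics by spatial similarity to the query map), not the actual implementation of the database described above, and the arrays are random placeholders standing in for real maps.

```python
import numpy as np

rng = np.random.default_rng(0)
n_voxels = 20_000

# Hypothetical stand-ins: the map we want to "decode", and meta-analytic
# topic maps that would normally be built from thousands of published studies.
query_map = rng.standard_normal(n_voxels)
topic_maps = {
    "pain": rng.standard_normal(n_voxels),
    "working memory": rng.standard_normal(n_voxels),
    "reward": rng.standard_normal(n_voxels),
}

# Rank topics by spatial correlation with the query map: the "top hits".
hits = {
    topic: float(np.corrcoef(query_map, topic_map)[0, 1])
    for topic, topic_map in topic_maps.items()
}
for topic, r in sorted(hits.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{topic:>15s}  r = {r:+.3f}")
```

In practice the topic maps come from meta-analysis across the published studies in the database, and the top hits are simply the topics whose maps are most similar to the map you feed in.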

So if you looked at all this evidence, and you believed my brain reading, and you were fooled, then you're like many of us. The point is, it's very difficult to actually infer what somebody is doing or experiencing based on their brain map. And there are many cases where we can be confused. 

So this is the problem of specificity in a nutshell. I started with the anterior cingulate and anterior insula, and if we look at those two areas, they're often used to infer the presence of pain or other emotions, depending on the purposes of the interpreter. But here's the base rate of activation across about 3,500 studies. The higher the value, the more likely it is that the area is activated across many tasks. And what you can see here is that, no matter what kind of task people are doing across these thousands of studies, the anterior cingulate and the anterior insula are the most frequently activated areas in the brain. So just getting activation somewhere in those areas carries, arguably, the least amount of information about what somebody is actually experiencing. 

So these areas are not really specific for pain, or any other type of affect, at this level of analysis. What's happening here is that when we interpret a brain image and we say, the insula is active, so that must be disgust, or that must be love, or something else, we're implicitly treating those brain images as a marker, as a biomarker. A biomarker is an objectively measured process that serves as a measure of some other mental experience or process. So if a brain finding, like cingulate activity, is used as a biomarker, then activation of that pattern is assumed to imply the presence of that state, like pain, or decision conflict, or anything else. 

So if we look at the literature, what's happening is that people are using fMRI activity as a marker for many different processes. This happens in the popular press, as I showed you, and it also happens a lot in the scientific literature. So we think that we have markers for reward, that's the nucleus accumbens; or value, with the medial prefrontal cortex; memory; pain; and so on. Here are a couple of common ones. Amygdala activity is often taken as an indicator of negative emotion. 

And pain-processing activity in the areas that you see here is often taken as an indicator for pain. So, if a drug treatment or a psychological treatment influences those markers, I might infer that pain or emotion is influenced. 

But, as we just saw, this is not a valid inference. 

So, there are not yet biomarkers for any of these processes. 

So let's look a little more systematically at why not. Why are brain maps not biomarkers? We have a couple of problems. We have a problem of definition and replication, and what this means is that every time you get an amygdala response, the activity can be a little bit different. The voxels could be different, and the relative levels of activity in the voxels could be different. The amygdala is a very small structure, but it actually contains hundreds and hundreds of standard-sized voxels. So there's tremendous flexibility from study to study in picking out results that seem to support the hypothesis. 

What we need is exact replications, not only of which voxels are active but of the relative activity and magnitudes of activity in the different voxels. We'll deal with that later when we talk about machine learning and multivariate pattern analysis. But for now, the current state of affairs reflects a lack of exact replication at the spatial pattern level, at the voxel level. 

That causes a lot of problems. Number two is, most of the results that we get from studies are group
maps. We don't apply them to individual cases, individual people, and they're not validated for
application. And before we really start to use imaging to say something about a person, what a person's
thinking, or feeling, or experiencing, we need to validate them at this level of the individual person. We need
to apply them to individuals. 
Three is a problem of diagnostic value. 

So this relates to the specificity problem I talked about a moment ago, but you can think of diagnostic value, if you go back to the earlier lecture, in terms of Bayes' Rule. What we'd like to know is, what's the probability of experiencing a psychological event, like pain, given the presence of a particular brain marker, like anterior cingulate activity? So that's the probability of the psychological state given the brain response, and that's called the positive predictive value of the brain image as a test. This breaks down into two related problems. One is a problem of sensitivity. We need to know how big the effects of the manipulations are and whether they are really reliable. This relates to the probability of observing that brain marker, anterior cingulate activity, given that I'm in pain. If that effect is really strong, I have high sensitivity. But we don't typically quantify how strong those effects actually are. 

And secondly, there's the problem of specificity that I mentioned a moment ago. This relates to whether these observed patterns are actually specific enough to be used as biomarkers, and what they're specific for: what class of events? In Bayesian terms, this relates to the probability of observing that brain marker in the absence of the psychological event, like in the absence of pain. And as we saw from the earlier plot with the anterior cingulate and anterior insula, that specificity is extremely low, because the probability of activation in the absence of pain is just about as great as the probability of activation in the presence of pain. 
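To put that Bayesian framing into a worked form, here is a minimal sketch; the sensitivity, false positive rate, and base rate numbers are invented for illustration, not taken from the studies discussed in this module.

```python
def positive_predictive_value(sensitivity, false_positive_rate, base_rate):
    """P(state | marker) via Bayes' Rule.

    sensitivity         = P(marker | state),    e.g., P(ACC active | pain)
    false_positive_rate = P(marker | no state), e.g., P(ACC active | no pain)
    base_rate           = P(state), the prior probability of the state
    """
    p_marker = sensitivity * base_rate + false_positive_rate * (1 - base_rate)
    return sensitivity * base_rate / p_marker

# Hypothetical numbers: the ACC responds in 80% of pain studies, but also in
# 70% of non-pain studies, and only 10% of tasks in the database involve pain.
print(positive_predictive_value(0.80, 0.70, 0.10))  # ~0.11: weak evidence for pain
# A more specific marker with the same sensitivity, but rare outside of pain:
print(positive_predictive_value(0.80, 0.05, 0.10))  # ~0.64: far more diagnostic
```

With a marker that activates almost as often without pain as with it, the posterior probability of pain barely rises above the base rate, which is exactly the specificity problem just described.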

So finally, we'll talk about the translation crisis. And all of these previous problems feed into the problem of
translation. Because to have practical applications, we need things that are replicable, things that work, things
that can be applied to individual cases and say something meaningful about the psychological status or
clinical status of a person. 

And so that's why this is coming last because we have to solve many of these other problems before we can
really address the translation crisis. And this is an article that influenced me quite a bit, by Kapur, Phillips, and Insel. The title is, Why has it taken so long for biological psychiatry to develop clinical tests and what do we do about it? And there's this feeling that in science there's lots of research; in fact, here is a plot of emotion studies that we gathered, 163 studies of emotion. Yellow is perceived emotion, red is experienced emotion. So if you want to know where the emotional brain is, there it is. It's everywhere in the
brain. Of course, these are actually not random activations, but it's very difficult to sort through exactly which
patterns of activation are related to which outcomes. So we have an accumulation of findings, but we don't
have markers that we can apply to a person and say, this is how sad a person is, or whether a person is angry or not, or whether a person is in pain or not, and so forth. So some of the causes of our translation crisis are: the lack of brain patterns sensitive and specific to clinical features or particular clinical outcomes; the lack of application at the individual-person level; and finally, when we develop these brain maps, we don't usually share them across sites and test how they work in new studies, new samples, and new populations. That's another critical piece. Instead of treating publication as the end goal, we have to take these brain maps and treat publication as the starting point for developing, testing, and refining these measures to increase their clinical and translational utility, if they work. 

So there are many positive responses, and I think that this series of debates has raised awareness about the issues and has led to a lot of positive responses by the community. I'll just highlight a few of them here at the end of this module. First of all, the broad criticism has some good elements to it, some positive features: it improves awareness, and it drives changes in practice that are being more widely implemented and talked about. But there are also some negative implications. There's a widespread criticism of science that really goes beyond the real problems and problematic studies, and it gets applied very broadly. And this is a
very dangerous thing because the truth is that it's very difficult, even in the best case, to get it right. And it
takes a lot of resources, and thinking, and expertise. And we have to build on the successful cases and test
them and refine them. And there are always going to be cases where things seem very promising and they
don't pan out. But this doesn't mean that the scientific process is not working. We just need to make it work as
well as we can. 
Let's look, again, at some of the issues of selection bias and some of the resolutions, some of the solutions to
this. So let's look at the file drawer problem. One solution that's been adopted is national registries or pre-
registered trials, as well as outlets for publication of null findings. 

Flexibility in the use and publication of experiments has also been addressed, in part, by study pre-registration and, in some cases, by data embargoes, which are standard in clinical trials: you can't look at the data until after you've collected a certain number of subjects, and you have to stop when you say you're going to stop collecting. Also, prospective data sharing is becoming increasingly popular in neuroimaging. That allows us to share data as it comes in and evaluate whether a particular experiment is panning out or not. 

Flexibility at the model level is handled, in part, by standardized pipelines that are applied consistently across
studies. That doesn't mean it should be one size fits all. But having a standard pipeline is really helpful to
avoid flexibility. 

And making principled choices ahead of time, and using those choices instead of looking backwards based on your results and trying to change your pipeline to get better results. Study pre-registration and blinding of experimenters to the research hypothesis are also helpful in that way. 

And finally, the voxel selection bias problem can be ameliorated by true a priori hypotheses, which can be derived from meta-analyses of the literature, for example, and by attempts at exact replication and the sharing of maps and findings in electronic form, so they can be tested precisely across laboratories. There are also a number of positive responses by the community more broadly. There are homes and funding for replications and null results, like the PsychFileDrawer project, the Center for Open Science, and the Reproducibility Project. There's a new focus in many journals on replicability and Registered Reports: some of them are publishing null findings, in some cases more often, and some of them are accepting Registered Reports. And finally, there are open, online platforms for conducting replications and research, like the psiTurk platform from Todd Gureckis and colleagues. In the neuroimaging community, there are a number of positive responses that include collaborative efforts and consortium sharing efforts, data sharing efforts like the OpenfMRI Project, ADNI for Alzheimer's, the 1000 Functional Connectomes project, the ABIDE project for autism, ADHD-200, the Human Connectome Project, and others that are coming down the pipeline. 

And there's also a broad criticism of heuristic reverse inference, that is, making these claims based on simply reading a map with your eyes, without formal assessment of positive predictive value. There have been increased efforts at making reverse inference more formal and quantitative, and these efforts include the brain decoding or machine learning literature that we'll talk about later in the course, and other approaches as well. And finally, an important piece is the effort to develop and share biomarkers and patterns of activity across laboratories, which is critical for evaluating how they hold up in different populations. 

So that's the end of this module, thank you. 
