

The sensitivity and specificity are lying to you


by Justin Morgenstern | Published February 8, 2021 - Updated February 3, 2021

When we talk about diagnostic tests, we are obsessed with sensitivities and
specificities. In many papers, they are the only numbers reported. When we
discuss diagnostic tests at conferences, sensitivity and specificity are
frequently the only numbers mentioned. Even on First10EM, I have frequently
given sensitivity and specificity the leading role when discussing diagnostic
tests. Based on the “spIN” and “snOUT” mnemonics, sensitivity and specificity
seem straightforward. We have been taught that sensitivity will help us rule
disease out and specificity will help us rule disease in. It turns out, that is a
complete lie. Most of us don’t really understand what sensitivity and
specificity mean, and it has been hurting our patients. 

Imagine a patient with a possible subarachnoid hemorrhage (SAH). Based on
their presentation, you figure they have about a 10% chance of ultimately
being diagnosed with SAH. Imagine a new decision rule that has a 90%
sensitivity for subarachnoid hemorrhage. Given that we want to rule this
disease out, that sounds promising. The rule only has a 10% specificity, but
you figure you can work with a few false positives if the 90% sensitivity helps
you rule out SAH. So, how much does the 90% sensitivity decrease your
patient’s chance of having a subarachnoid hemorrhage if they pass the rule?


Let’s do the math, but to keep it simple, I will perform the calculations using
pictures. Imagine 100 patients coming to the emergency department with
headaches. Based on your assessment, you think 10 (or 10%) of these patients
will rule in for subarachnoid hemorrhage. 

Your decision rule is 90% sensitive, so it will identify 9 out of the 10 patients
with disease. That sounds promising. It sounds like this rule could be helpful.
If we focus on just the sensitivity, it looks like we will only miss 1 patient in
100, which might be good enough.

This is usually where we stop in medicine. When attempting to rule a disease
out, we only look at the sensitivity. That is certainly how I was taught.
However, let’s consider the impact of the 10% specificity. There are 90 healthy
patients in this cohort, and the 10% specificity means that 81 of them will fail
the decision rule, or be false positives. Now, we can start to see the problem.


There are 90 patients with ‘positive’ tests and 9 of them have a subarachnoid
hemorrhage. In other words, if you fail the decision rule, you have a 10%
chance of having a subarachnoid hemorrhage. There are 10 patients with
‘negative’ tests, and 1 has a subarachnoid hemorrhage. In other words, if you
pass the decision rule, you have a 10% chance of having subarachnoid
hemorrhage.

This was a test with a 90% sensitivity. It was supposed to help us rule out
disease. Instead, we have the exact same chance of disease before and after
the test, no matter what the result!
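For anyone who wants to check the arithmetic, here is a minimal Python sketch of the same 2x2 calculation (the helper function is just an illustration; the numbers are the ones from the example above):

```python
# Post-test probabilities from sensitivity, specificity, and prevalence,
# using the worked example above: 100 patients, 10% prevalence,
# 90% sensitivity, 10% specificity.

def post_test_probabilities(prevalence, sensitivity, specificity, n=100):
    diseased = n * prevalence                 # 10 patients with SAH
    healthy = n - diseased                    # 90 patients without SAH

    true_pos = diseased * sensitivity         # 9 correctly flagged
    false_neg = diseased - true_pos           # 1 missed
    true_neg = healthy * specificity          # 9 correctly cleared
    false_pos = healthy - true_neg            # 81 false positives

    p_disease_if_positive = true_pos / (true_pos + false_pos)   # 9 / 90
    p_disease_if_negative = false_neg / (false_neg + true_neg)  # 1 / 10
    return p_disease_if_positive, p_disease_if_negative

pos, neg = post_test_probabilities(prevalence=0.10, sensitivity=0.90, specificity=0.10)
print(f"Chance of SAH after a positive test: {pos:.0%}")  # 10%
print(f"Chance of SAH after a negative test: {neg:.0%}")  # 10%
```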

When this was first explained to me, my mind was absolutely blown.
Everything I had been taught about sensitivity and specificity was a lie.
Sensitivity is supposed to help rule disease out (snOUT). How is it possible
that a test with a 90% sensitivity (significantly better than many of the tests
we use every day in emergency medicine) didn’t change the patient’s chance
of disease at all?!


It turns out, you can’t consider just the sensitivity or just the specificity in
isolation. Although that is exactly how we talk about these measures, they are
absolutely useless on their own. In order to figure out whether a test is
helpful, you have to consider both sensitivity and specificity together, or – as I
will suggest – use more useful numbers, like likelihood ratios, and just stop
talking about sensitivity and specificity altogether.

We make this mistake all the time in medicine. We adopt tests based on just
the sensitivity or just the specificity. We use these tests, but clearly we don’t
understand how they really work. Consider the Ottawa subarachnoid
hemorrhage rule. Based on an excellent sensitivity, there are many who are
widely pushing its use. However, the actual numbers for the Ottawa
subarachnoid hemorrhage rule are a sensitivity of approximately 100% (with
95% confidence intervals down to 95-97%) and a specificity between 7.5 and
15%. (Perry 2017; Bellolio 2015; Chu 2018; Perry 2020) I just demonstrated
that a test with a 90% sensitivity and 10% specificity is completely useless: it
does not change a patient’s chance of subarachnoid hemorrhage at all. Does
this rule sound much better?

This shouldn’t have come as a surprise. By definition, sensitivity and
specificity are clinically useless. Sensitivity is defined as the percentage of
patients with a disease who are accurately identified by a positive test. It’s a
measure of the accuracy of a test in a group of patients known to have the
disease. Clinically, we don’t know if a patient has a disease. That is exactly
why we are ordering a test. So the very definition of sensitivity tells us that it
is not a measure we should be applying in a clinical setting. 

Predictive values can also be misleading


When we order tests, what we really want to know is, if the test is positive,
what are the chances that this patient actually has the disease? Or,
conversely, if the test is negative, what are the chances that the patient doesn’t
have the disease? The positive and negative predictive values, respectively,
tell us exactly that. If the positive predictive value is 95%, and the patient
tests positive, there is a 95% chance that the patient has the disease.

This sounds like the perfect measure. It appears to tell us exactly what we
need to know as clinicians. Unfortunately, the predictive values have a fatal
flaw: they are inherently tied to the prevalence of the disease in the patients
you are testing. You can’t generalize the number from one group to another.
Just because a study states that a test has a negative predictive value of 99%
doesn’t mean that it will be 99% for your patient, and that is obviously a
problem. 
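You can actually see this flaw directly in the math. By Bayes’ theorem, the positive predictive value is (sensitivity × prevalence) divided by (sensitivity × prevalence + (1 − specificity) × (1 − prevalence)). Prevalence is baked right into the formula, so the exact same test produces different predictive values in different populations.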

This is best understood with a simple example. Imagine using a coin flip to
decide whether a patient has a pulmonary embolism (PE). In an emergency
department setting, where 10% of patients being tested have a PE, when the
coin comes up heads or “positive”, 10% of patients will have a PE, so the

positive predictive value of my coin flip is 10%. When the coin comes up tails
or “negative”, 10% of patients have PE, and so the negative predictive value of
the coin flip is 90%. In this setting, it is pretty obvious that the coin flip is not
very good at diagnosing PE.

However, imagine that I decide to test the exact same coin flip in a PE follow-up
clinic, where 100% of patients are known to have PE. Now, when my coin
flip comes up heads, 100% of patients have PE. My coin flip has a 100%
positive predictive value for pulmonary embolism! I could probably get this
published in a major medical journal (if the test was more expensive and
someone was going to profit).

Conversely, if I decide to test my coin in asymptomatic individuals visiting
their doctor for a yearly physical, I can generate the opposite results. Now,
when the coin comes up tails, 0% of patients have PE, so my coin flip has a
negative predictive value of 100%! It’s a perfect test – except obviously it’s
not.
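To make the prevalence dependence concrete, here is a small Python sketch of the coin flip example (my illustration: a fair coin behaves like a test with 50% sensitivity and 50% specificity, and the three prevalences match the scenarios above):

```python
# Predictive values of a coin flip "test" (50% sensitivity, 50% specificity)
# at different prevalences, matching the three scenarios in the text.

def predictive_values(prevalence, sensitivity=0.5, specificity=0.5):
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    true_neg = specificity * (1 - prevalence)
    false_neg = (1 - sensitivity) * prevalence
    ppv = true_pos / (true_pos + false_pos)   # P(disease | positive test)
    npv = true_neg / (true_neg + false_neg)   # P(no disease | negative test)
    return ppv, npv

for setting, prev in [("emergency department", 0.10),
                      ("PE follow-up clinic", 1.00),
                      ("asymptomatic screening", 0.00)]:
    ppv, npv = predictive_values(prev)
    print(f"{setting}: PPV = {ppv:.0%}, NPV = {npv:.0%}")
# emergency department: PPV = 10%, NPV = 90%
# PE follow-up clinic: PPV = 100%, NPV = 0%
# asymptomatic screening: PPV = 0%, NPV = 100%
```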

So predictive values can also be very misleading. These examples sound
extreme, but they are well represented in the medical literature. We have tested
coronary CT angiograms in populations where 0% of patients have bad
outcomes, and then gleefully proclaimed that CCTA has an amazing negative
predictive value. Hopefully it is now obvious why such statements are
ridiculous.

Although predictive values are closer to what we want to know when working
clinically, they can clearly still be very misleading. As with sensitivity and
specificity, I think we would be better off if we just stopped talking about
these numbers.

Likelihood ratios: the diagnostic number that really matters
We need a measure that incorporates the risk of the patient in front of us and
tells us how much that risk changes when the test is positive or negative. The
solution is likelihood ratios. 

A likelihood ratio (as is implied by the name) is a ratio of two different
probabilities: the probability of a patient with a condition having a given test
result divided by the probability of a patient without a condition having the
given test result. (The only difference between the positive and negative
likelihood ratio in this formula is whether you are talking about the test result
being positive or negative.)
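For reference, these ratios can be written directly in terms of the numbers we started with: LR+ = sensitivity / (1 − specificity), and LR− = (1 − sensitivity) / specificity.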


At face value, this sounds a little complicated, but the result is exactly what
we need clinically. When working clinically, we want to know what a test
result means for the specific patient in front of us. The likelihood ratio will
give you a number that adjusts your pre-test probability into exactly what
you want: the chance that this specific patient has the disease given the test
result you just got back.

The overall concept is very easy. You take your pretest probability and
multiply it by the likelihood ratio and you get the posttest probability.
Unfortunately, the math gets a little complex, because it uses odds, but the
basic concept is simple. If you multiply by 1, your odds don’t change at all, so
a test with a likelihood ratio of 1 is completely useless. If you multiply by a big
number (say bigger than 10) then your chances of disease go up by a lot. If
you multiply by a small number (say smaller than 0.1) then your chances of
disease go down by a lot.
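Here is a minimal Python sketch of that conversion (my illustration; the function name is made up). Notice what happens with the SAH rule from the beginning of this post: with 90% sensitivity and 10% specificity, both likelihood ratios work out to exactly 1, which is precisely why the rule changed nothing.

```python
# Pretest probability -> pretest odds -> multiply by LR -> posttest probability,
# illustrated with the SAH decision rule from the example above
# (90% sensitivity, 10% specificity, 10% pretest probability).

def post_test_probability(pretest_prob, likelihood_ratio):
    pretest_odds = pretest_prob / (1 - pretest_prob)
    posttest_odds = pretest_odds * likelihood_ratio
    return posttest_odds / (1 + posttest_odds)

sensitivity, specificity = 0.90, 0.10
lr_positive = sensitivity / (1 - specificity)   # 0.9 / 0.9 = 1.0
lr_negative = (1 - sensitivity) / specificity   # 0.1 / 0.1 = 1.0

print(f"After failing the rule: {post_test_probability(0.10, lr_positive):.0%}")  # 10%
print(f"After passing the rule: {post_test_probability(0.10, lr_negative):.0%}")  # 10%
```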

If you want to get more specific than that, you can use the Fagan nomogram. It
is incredibly easy. You just start with your pretest probability on the left,
draw a line through your likelihood ratio, and it tells you your posttest
probability on the right. Better yet, these days you can just use one of the
many online calculators.

Bottom line
Sensitivity and specificity have been lying to us. The spIN / snOUT mnemonic
that we all learned is incorrect. Sensitivity cannot be considered without
specificity, and specificity cannot be considered without sensitivity. These
numbers are counterintuitive and don’t provide us with the information that
we need clinically. We should stop using them.

When using a diagnostic test, you must first know your patient’s pretest
probability. Once you know the pretest probability, it is the likelihood ratio
that will give you the information you need.

References
Bellolio MF, Hess EP, Gilani WI, VanDyck TJ, Ostby SA, Schwarz JA, Lohse CM,
Rabinstein AA. External validation of the Ottawa subarachnoid hemorrhage
clinical decision rule in patients with acute headache. Am J Emerg Med.
2015;33(2):244-249. doi:10.1016/j.ajem.2014.11.049 PMID: 25511365

Chu KH, Keijzers G, Furyk JS, et al. Applying the Ottawa subarachnoid
haemorrhage rule on a cohort of emergency department patients with
headache. Eur J Emerg Med. 2018;25(6):e29-e32.
doi:10.1097/MEJ.0000000000000523 PMID: 29215380

Perry JJ, Sivilotti MLA, Sutherland J, et al. Validation of the Ottawa
Subarachnoid Hemorrhage Rule in patients with acute headache [published
correction appears in CMAJ. 2018 Feb 12;190(6):E173]. CMAJ.
2017;189(45):E1379-E1385. doi:10.1503/cmaj.170072 PMID: 29133539


Perry JJ, Sivilotti MLA, Émond M, et al. Prospective Implementation of the
Ottawa Subarachnoid Hemorrhage Rule and 6-Hour Computed Tomography
Rule. Stroke. 2020;51(2):424-430. doi:10.1161/STROKEAHA.119.026969 PMID:
31805846

Worster A, Carpenter C. A brief note about likelihood ratios. CJEM.
2008;10(5):441-442. PMID: 18826732

Cite this article as: Justin Morgenstern, "The sensitivity and specificity are
lying to you", First10EM blog, February 8, 2021. Available at:
https://first10em.com/the-sensitivity-and-specificity-are-lying-to-you/.


