by Justin Morgenstern | Published February 8, 2021 - Updated February 3, 2021

When we talk about diagnostic tests, we are obsessed with sensitivities and
specificities. In many papers, they are the only numbers reported. When we
discuss diagnostic tests at conferences, sensitivity and specificity are
frequently the only numbers mentioned. Even on First10EM, I have frequently
given sensitivity and specificity the leading role when discussing diagnostic
tests. Based on the “spIN” and “snOUT” mnemonics, sensitivity and specificity
seem straightforward. We have been taught that sensitivity will help us rule
disease out and specificity will help us rule disease in. It turns out, that is a
complete lie. Most of us don’t really understand what sensitivity and
specificity mean, and it has been hurting our patients. 

Imagine a patient with a possible subarachnoid hemorrhage (SAH). Based on
their presentation, you figure they have about a 10% chance of ultimately
being diagnosed with SAH. Imagine a new decision rule that has a 90%
sensitivity for subarachnoid hemorrhage. Given that we want to rule this
disease out, that sounds promising. The rule only has a 10% specificity, but
you figure you can work with a few false positives if the 90% sensitivity helps
you rule out SAH. So, how much does the 90% sensitivity decrease your
patient’s chance of SAH?

Let’s do the math, but to keep it simple, I will perform the calculations using
pictures. Imagine 100 patients coming to the emergency department with
headaches. Based on your assessment, you think 10 (or 10%) of these patients
will rule in for subarachnoid hemorrhage. 

Your decision rule is 90% sensitive, so it will identify 9 out of the 10 patients
with disease. That sounds promising. It sounds like this rule could be helpful.
If we focus on just the sensitivity, it looks like we will only miss 1 patient in
100, which might be good enough.

This is usually where we stop in medicine. When attempting to rule a disease
out, we only look at the sensitivity. That is certainly how I was taught.
However, let’s consider the impact of the 10% specificity. There are 90 healthy
patients in this cohort, and the 10% specificity means that 81 of them will fail
the decision rule.

There are 90 patients with ‘positive’ tests and 9 of them have a subarachnoid
hemorrhage. In other words, if you fail the decision rule, you have a 10%
chance of having a subarachnoid hemorrhage. There are 10 patients with
‘negative’ tests, and 1 has a subarachnoid hemorrhage. In other words, if you
pass the decision rule, you have a 10% chance of having subarachnoid
hemorrhage.

This was a test with a 90% sensitivity. It was supposed to help us rule out
disease. Instead, we have the exact same chance of disease before and after
the test, no matter what the result!
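
The arithmetic above can be checked with a short script. This is only a sketch of the 2x2 table from the example (100 patients, 10% pretest chance, 90% sensitivity, 10% specificity); the function names `two_by_two` and `post_test_probabilities` are made up for illustration and do not come from any library.

```python
def two_by_two(n, prevalence, sensitivity, specificity):
    """Expected counts in a 2x2 table for n tested patients."""
    diseased = n * prevalence
    healthy = n - diseased
    true_pos = diseased * sensitivity   # diseased patients who fail the rule
    false_neg = diseased - true_pos     # diseased patients who pass the rule
    true_neg = healthy * specificity    # healthy patients who pass the rule
    false_pos = healthy - true_neg      # healthy patients who fail the rule
    return true_pos, false_pos, false_neg, true_neg


def post_test_probabilities(n, prevalence, sensitivity, specificity):
    """Chance of disease after a positive and after a negative result."""
    tp, fp, fn, tn = two_by_two(n, prevalence, sensitivity, specificity)
    after_positive = tp / (tp + fp)
    after_negative = fn / (fn + tn)
    return after_positive, after_negative


# Numbers taken from the SAH example in the text.
after_pos, after_neg = post_test_probabilities(100, 0.10, 0.90, 0.10)
print(f"Chance of SAH after failing the rule: {after_pos:.0%}")
print(f"Chance of SAH after passing the rule: {after_neg:.0%}")
```

Both lines should report the same 10% the patient started with, which is the point of the example.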

When this was first explained to me, my mind was absolutely blown.
Everything I had been taught about sensitivity and specificity was a lie.
Sensitivity is supposed to help rule disease out (snOUT). How is it possible
that a test with a 90% sensitivity (significantly better than many of the tests
we use every day in emergency medicine) didn’t change the patient’s chance
of disease at all?

It turns out, you can’t consider just the sensitivity or just the specificity in
isolation. Although that is exactly how we talk about these measures, they are
absolutely useless on their own. In order to figure out whether a test is
helpful, you have to consider both sensitivity and specificity together, or – as I
will suggest – use more useful numbers like likelihood ratios, and just stop
talking about sensitivity and specificity altogether.

We make this mistake all the time in medicine. We adopt tests based on just
the sensitivity or just the specificity. We use these tests, but clearly we don’t
understand how they really work. Consider the Ottawa subarachnoid
hemorrhage rule. Based on an excellent sensitivity, there are many who are
widely pushing its use. However, the actual numbers for the Ottawa
subarachnoid hemorrhage rule are a sensitivity of approximately 100% (with
95% confidence intervals down to 95-97%) and a specificity between 7.5 and
15%. (Perry 2017; Bellolio 2014; Chu 2018; Perry 2020) I just demonstrated
that a test with a 90% sensitivity and 10% specificity is completely useless: it
does not change a patient’s chance of subarachnoid hemorrhage at all. Does
this rule sound much better?

This shouldn’t have come as a surprise. By definition, sensitivity and
specificity are clinically useless. Sensitivity is defined as the percentage of
patients with a disease who are accurately identified by a positive test. It’s a
measure of the accuracy of a test in a group of patients known to have the
disease. Clinically, we don’t know if a patient has a disease. That is exactly
why we are ordering a test. So the very definition of sensitivity tells us that it
is not a measure we should be applying in a clinical setting. 

Predictive values can also be misleading

When we order tests, what we really want to know is, if the test is positive,
what are the chances that this patient actually has the disease? Or,
conversely, if the test is negative, what are the chances that the patient doesn’t
have the disease? The positive and negative predictive values, respectively,
tell us exactly that. If the positive predictive value is 95%, and the patient
tests positive, there is a 95% chance that the patient has the disease.

This sounds like the perfect measure. It appears to tell us exactly what we
need to know as clinicians. Unfortunately, the predictive values have a fatal
flaw: they are inherently tied to the prevalence of the disease in the patients
you are testing. You can’t generalize the number from one group to another.
Just because a study states that a test has a negative predictive value of 99%
doesn’t mean that it will be 99% for your patient, and that is obviously a
problem.

This is best understood with a simple example. Imagine using a coin flip to
decide whether a patient has a pulmonary embolism (PE). In an emergency
department setting, where 10% of patients being tested have a PE, when the
coin comes up heads or “positive”, 10% of patients will have a PE, so the
positive predictive value of the coin flip is 10%. When the coin comes up tails
or “negative”, 10% of patients have PE, and so the negative predictive value of
the coin flip is 90%. In this setting, it is pretty obvious that the coin flip is not
very good at diagnosing PE.

However, imagine that I decide to test the exact same coin flip in a PE follow
up clinic, where 100% of patients are known to have PE. Now, when my coin
flip comes up heads, 100% of patients have PE. My coin flip has a 100%
positive predictive value for pulmonary embolism! I could probably get this
published in a major medical journal (if the test was more expensive and
someone was going to profit).

Conversely, if I decide to test my coin in asymptomatic individuals visiting
their doctor for a yearly physical, I can generate the opposite results. Now,
when the coin comes up tails, 0% of patients have PE, so my coin flip has a
negative predictive value of 100%! It’s a perfect test – except obviously it’s
not.

So predictive values can also be very misleading. These examples sound
extreme, but are well represented in the medical literature. We have tested
coronary CT angiograms in populations where 0% of patients have bad
outcomes, and then gleefully proclaimed that CCTA has an amazing negative
predictive value. Hopefully it is now obvious why such statements are
misleading.

Although predictive values are closer to what we want to know when working
clinically, they can clearly still be very misleading. Like the sensitivity and
specificity, I think we would be better off if we just stopped talking about
these numbers. 
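
To make the coin flip examples concrete, here is a minimal sketch that recomputes the predictive values in each of the three settings. It assumes a coin flip behaves like a test with 50% sensitivity and 50% specificity, and it takes the prevalences (10%, 100%, 0%) from the text; the helper name `predictive_values` is hypothetical.

```python
def predictive_values(prevalence, sensitivity, specificity):
    """Positive and negative predictive value at a given prevalence."""
    true_pos = prevalence * sensitivity
    false_neg = prevalence * (1 - sensitivity)
    true_neg = (1 - prevalence) * specificity
    false_pos = (1 - prevalence) * (1 - specificity)
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Assumption: a coin flip is "positive" half the time regardless of disease,
# so it is modelled as 50% sensitive and 50% specific.
settings = {
    "emergency department": 0.10,
    "PE follow-up clinic": 1.00,
    "yearly physical": 0.00,
}

for name, prevalence in settings.items():
    ppv, npv = predictive_values(prevalence, 0.5, 0.5)
    print(f"{name}: PPV {ppv:.0%}, NPV {npv:.0%}")
```

The same useless test gets a different predictive value in every population, because the predictive values here simply mirror the prevalence.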

Likelihood ratios: the diagnostic number that really matters

We need a measure that incorporates the risk of the patient in front of us and
tells us how much that risk changes when the test is positive or negative. The
solution is likelihood ratios. 

A likelihood ratio (as is implied by the name) is a ratio of two different
probabilities: the probability of a patient with a condition having a given test
result divided by the probability of a patient without a condition having the
given test result. (The only difference between the positive and negative
likelihood ratio in this formula is whether you are talking about the test result
being positive or negative.)

At face value, this sounds a little complicated, but the result is exactly what
we need clinically. When working clinically, we want to know what a test
result means for the specific patient in front of us. The likelihood ratio will
give you a number that adjusts your pre-test probability into exactly what
you want: the chance that this specific patient has the disease given the test
result you just got back.
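
Written out, the definition above gives the standard formulas for the two likelihood ratios. The notation is my own shorthand rather than anything from the original post: T+ and T- are a positive and negative test, D+ and D- are patients with and without the disease, and `\text` assumes the amsmath package. The second line plugs in the 90% sensitivity and 10% specificity from the SAH example.

```latex
\[
LR^{+} = \frac{P(T^{+} \mid D^{+})}{P(T^{+} \mid D^{-})}
       = \frac{\text{sensitivity}}{1 - \text{specificity}},
\qquad
LR^{-} = \frac{P(T^{-} \mid D^{+})}{P(T^{-} \mid D^{-})}
       = \frac{1 - \text{sensitivity}}{\text{specificity}}
\]
\[
LR^{+} = \frac{0.90}{1 - 0.10} = 1,
\qquad
LR^{-} = \frac{1 - 0.90}{0.10} = 1
\]
```

Both ratios come out to exactly 1 for that decision rule, which matches the earlier finding that neither result changed the patient's chance of disease.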

The overall concept is very easy. You take your pretest probability and
multiply it by the likelihood ratio and you get the posttest probability.
Unfortunately, the math gets a little complex, because it uses odds, but the
basic concept is simple. If you multiply by 1, your odds don’t change at all, so
a test with a likelihood ratio of 1 is completely useless. If you multiply by a big
number (say bigger than 10) then your chances of disease go up by a lot. If
you multiply by a small number (say smaller than 0.1) then your chances of
disease go down by a lot. 
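
Because the multiplication happens on odds rather than probabilities, the steps are: convert the pretest probability to odds, multiply by the likelihood ratio, then convert back. The sketch below shows that conversion for a 10% pretest probability; the function name `post_test_probability` is hypothetical, and it assumes the pretest probability is strictly between 0 and 1.

```python
def post_test_probability(pretest_probability, likelihood_ratio):
    """Apply a likelihood ratio to a pretest probability via odds.

    Assumes 0 < pretest_probability < 1 (a probability of exactly 1
    would mean dividing by zero when converting to odds).
    """
    pretest_odds = pretest_probability / (1 - pretest_probability)
    posttest_odds = pretest_odds * likelihood_ratio
    return posttest_odds / (1 + posttest_odds)


# A 10% pretest probability with a small, neutral, and large likelihood ratio.
for lr in (0.1, 1, 10):
    posttest = post_test_probability(0.10, lr)
    print(f"LR {lr}: posttest probability {posttest:.1%}")
```

With a likelihood ratio of 1 the probability stays at 10%; with 10 it rises to roughly 53%, and with 0.1 it falls to roughly 1%.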

If you want to get more specific than that, you can use the Fagan nomogram. It
is incredibly easy. You just start with your pretest probability on the left,
draw a straight line through the likelihood ratio in the middle, and read your
posttest probability on the right. Better yet, these days you can just use one of the
many online calculators.

Bottom line
Sensitivity and specificity have been lying to us. The spIN / snOUT mnemonic
that we all learned is incorrect. Sensitivity cannot be considered without
specificity, and specificity cannot be considered without sensitivity. These
numbers are counter-intuitive and don’t provide us with the information that
we need clinically. We should stop using them.

When using a diagnostic test, you must first know your patient’s pretest
probability. Once you know the pretest probability, it is the likelihood ratio
that will give you the information you need.

Bellolio MF, Hess EP, Gilani WI, VanDyck TJ, Ostby SA, Schwarz JA, Lohse CM,
Rabinstein AA. External validation of the Ottawa subarachnoid hemorrhage
clinical decision rule in patients with acute headache. Am J Emerg Med. 2015
Feb;33(2):244-9. doi: 10.1016/j.ajem.2014.11.049. Epub 2014 Dec 3. PMID:

Chu KH, Keijzers G, Furyk JS, et al. Applying the Ottawa subarachnoid
haemorrhage rule on a cohort of emergency department patients with
headache. Eur J Emerg Med. 2018;25(6):e29-e32.
doi:10.1097/MEJ.0000000000000523 PMID: 29215380

Perry JJ, Sivilotti MLA, Sutherland J, et al. Validation of the Ottawa
Subarachnoid Hemorrhage Rule in patients with acute headache [published
correction appears in CMAJ. 2018 Feb 12;190(6):E173]. CMAJ.

Perry JJ, Sivilotti MLA, Émond M, et al. Prospective Implementation of the
Ottawa Subarachnoid Hemorrhage Rule and 6-Hour Computed Tomography
Rule. Stroke. 2020;51(2):424-430. doi:10.1161/STROKEAHA.119.026969 PMID:

Worster A, Carpenter C. A brief note about likelihood ratios. CJEM.
10(5):441-2. 2008. PMID: 18826732

Cite this article as: Justin Morgenstern, "The sensitivity and specificity are
lying to you", First10EM blog, February 8, 2021. Available at:



