E Explained Intuitively

Better Explained Books And Video Courses
Concrete math lessons without the jargon.

Math, Better Explained
Calculus, Better Explained
An Intuitive (and Short) Explanation of Bayes’ Theorem

Bayes’ theorem was the subject of a detailed article. The essay is good, but over 15,000 words long — here’s the condensed
version for Bayesian newcomers like myself:
Tests are not the event. We have a cancer test, separate from the event of actually having cancer. We have a test for spam,
separate from the event of actually having a spam message.
Tests are flawed. Tests detect things that don’t exist (false positive), and miss things that do exist (false negative). People
often use test results without adjusting for test errors.
False positives skew results. Suppose you are searching for something really rare (1 in a million). Even with a good test,
it’s likely that a positive result is really a false positive on somebody in the 999,999.
People prefer natural numbers. Saying “100 in 10,000″ rather than “1%” helps people work through the numbers with
fewer errors, especially with multiple percentages (“Of those 100, 80 will test positive” rather than “80% of the 1% will test
positive”).
Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
Even science is a test. At a philosophical level, scientific experiments are “potentially flawed tests” and need to be treated
accordingly. There is a test for a chemical, or a phenomenon, and there is the event of the phenomenon itself. Our tests and
measuring equipment have a rate of error to be accounted for.
Bayes’ theorem converts the results from your test into the real probability of the event. For example, you can:
Correct for measurement errors. If you know the real probabilities and the chance of a false positive and false negative,
you can correct for measurement errors.
Relate the actual probability to the measured test probability. Given mammogram test results and known error rates,
you can predict the actual chance of having cancer given a positive test. In technical terms, you can find Pr(H|E), the
chance that a hypothesis H is true given evidence E, starting from Pr(E|H), the chance that evidence appears when the
hypothesis is true.
Bayes' Theorem Intuition

Watch later Share
Anatomy Of A Test
The article describes a cancer testing scenario:
1% of women have breast cancer (and therefore 99% do not).

80% of mammograms detect breast cancer when it is there (and therefore 20% miss it).
9.6% of mammograms detect breast cancer when it’s not there (and therefore 90.4% correctly return a negative result).
Put in a table, the probabilities look like this:
How do we read it?
1% of people have cancer
If you already have cancer, you are in the first column. There’s an 80% chance you will test positive. There’s a 20% chance
you will test negative.
If you don’t have cancer, you are in the second column. There’s a 9.6% chance you will test positive, and a 90.4% chance
you will test negative.
How Accurate Is The Test?

Now suppose you get a positive test result. What are the chances you have cancer? 80%? 99%? 1%?
Here’s how I think about it:
Ok, we got a positive result. It means we’re somewhere in the top row of our table. Let’s not assume anything — it could be
a true positive or a false positive.
The chances of a true positive = chance you have cancer * chance test caught it = 1% * 80% = .008
The chances of a false positive = chance you don’t have cancer * chance test caught it anyway = 99% * 9.6% = 0.09504
The table looks like this:
And what was the question again? Oh yes: what’s the chance we really have cancer if we get a positive result. The chance of an
event is the number of ways it could happen given all possible outcomes:
The chance of getting a real, positive result is .008. The chance of getting any type of positive result is the chance of a true
positive plus the chance of a false positive (.008 + 0.09504 = .10304).
So, our chance of cancer is .008/.10304 = 0.0776, or about 7.8%.
Interesting — a positive mammogram only means you have a 7.8% chance of cancer, rather than 80% (the supposed accuracy of
the test). It might seem strange at first but it makes sense: the test gives a false positive 9.6% of the time (quite high), so there
will be many false positives in a given population. For a rare disease, most of the positive test results will be wrong.
Let’s test our intuition by drawing a conclusion from simply eyeballing the table. If you take 100 people, only 1 person will have
cancer (1%), and they’re most likely going to test positive (80% chance). Of the 99 remaining people, about 10% will test
positive, so we’ll get roughly 10 false positives. Considering all the positive tests, just 1 in 11 is correct, so there’s a 1/11 chance
of having cancer given a positive test. The real number is 7.8% (closer to 1/13, computed above), but we found a reasonable
estimate without a calculator.
Bayes’ Theorem
We can turn the process above into an equation, which is Bayes’ Theorem. It lets you take the test results and correct for the
“skew” introduced by false positives. You get the real chance of having the event. Here’s the equation:
And here’s the decoder key to read it:
Pr(H|E) = Chance of having cancer (H) given a positive test (E). This is what we want to know: How likely is it to have
cancer with a positive result? In our case it was 7.8%.
Pr(E|H) = Chance of a positive test (E) given that you had cancer (H). This is the chance of a true positive, 80% in our
case.
Pr(H) = Chance of having cancer (1%).
Pr(not H) = Chance of not having cancer (99%).
Pr(E|not H) = Chance of a positive test (E) given that you didn’t have cancer (not H). This is a false positive, 9.6% in our
case.
Try it with any number:
Actual probability 1%
Prob true positive 80%
Prob false positive 9.6%
Chance positive test means
positive result
7.76 %
POWERED BY
It all comes down to the chance of a true positive divided by the chance of any positive. We can simplify the equation to:
Pr(E) tells us the chance of getting any positive result, whether a true positive in the cancer population (1%) or a false positive
in the non-cancer population (99%). In acts like a weighting factor, adjusting the odds towards the more likely outcome.
Forgetting to account for false positives is what makes the low 7.8% chance of cancer (given a positive test) seem counter-
intuitive. Thank you, normalizing constant, for setting us straight!
Intuitive Understanding: Shine The Light
The article mentions an intuitive understanding about shining a light through your real population and getting a test population.
The analogy makes sense, but it takes a few thousand words to get there :).
Consider a real population. You do some tests which “shines light” through that real population and creates some test results. If
the light is completely accurate, the test probabilities and real probabilities match up. Everyone who tests positive is actually
“positive”. Everyone who tests negative is actually “negative”.
But this is the real world. Tests go wrong. Sometimes the people who have cancer don’t show up in the tests, and the other way
around.
Bayes’ Theorem lets us look at the skewed test results and correct for errors, recreating the original population and finding the
real chance of a true positive result.
Bayesian Spam Filtering

One clever application of Bayes’ Theorem is in spam filtering. We have
Event A: The message is spam.

Test X: The message contains certain words (X)
Plugged into a more readable formula (from Wikipedia):
Bayesian filtering allows us to predict the chance a message is really spam given the “test results” (the presence of certain
words). Clearly, words like “viagra” have a higher chance of appearing in spam messages than in normal ones.
Spam filtering based on a blacklist is flawed — it’s too restrictive and false positives are too great. But Bayesian filtering gives us
a middle ground — we use probabilities. As we analyze the words in a message, we can compute the chance it is spam (rather
than making a yes/no decision). If a message has a 99.9% chance of being spam, it probably is. As the filter gets trained with
more and more messages, it updates the probabilities that certain words lead to spam messages. Advanced Bayesian filters can
examine multiple words in a row, as another data point.
Further Reading
There’s a lot being said about Bayes:
Bayes’ Theorem on Wikipedia

Discussion on coding horror
The big essay on Bayes’ Theorem
Have fun!
Join Over 450k Monthly Readers
Enjoy the article? There's plenty more to help you build

a lasting, intuitive understanding of math. Join the
newsletter for bonus content and the latest updates.
Enter your email address
Join for Free Lessons
Facebook Twitter LinkedIn Email Print
Other Posts In This Series

1. A Brief Introduction to Probability & Statistics
2. An Intuitive (and Short) Explanation of Bayes' Theorem
3. Understanding Bayes Theorem With Ratios
4. Understanding the Monty Hall Problem
5. How To Analyze Data Using the Average
6. Understanding the Birthday Paradox
169 Comments BetterExplained Discussions 🔒 Disqus' Privacy Policy 
1 Login
 Recommend 6 t Tweet f Share Sort by Best
Join the discussion…
LOG IN WITH
OR SIGN UP WITH DISQUS ?
Name
koni • 3 years ago

You are 45 years old and now have R100000 to invest. If you put it in a bank, it will earn an interest of R8000. If you invest it in a chemical
plant, you can earn R15000 if the price of the chemical increases. The chance of this happening is 71%. However, if the price drops or stays the
same, you stand to loose R1000. You can also get a recommendation from an expert market analyst to invest (call this advice X1) or not to invest
(advice X2). Available information indicates that in 1000 cases where the price of a chemical went up, the analyst's advice was to invest in 890 of
the cases. In 1000 cases where the price of a chemical dropped or stayed the same, the analyst's advice was not to invest in 688 of the cases.
Using Bayes' rule, calculate the probability that the market will decrease given that the analyst tells you not to invest. Calculate the probability as
a fraction to three decimal places.
2△ ▽ • Reply • Share ›
Rho > koni • 3 years ago • edited

is it 0,719? I had some error in calculation xD but fixed it
△ ▽ • Reply • Share ›
Alokit Jindal • 3 years ago

This is the best description on Bayes Theorem I have read so far. Thank You
kalid Mod > Alokit Jindal • 3 years ago

Thanks, glad it helped.
a s, g ad t e ped.
Adam • 3 years ago

Subject matter is too grim to make enjoyable reading. Would prefer to read about diagnosis of addiction to lollipops or some such. We are here
for the maths after all, not to get depressed.
1△ ▽ 2 • Reply • Share ›
CTR$hill > Adam • 10 months ago

I respectfully disagree, the actual stakes of each of the four scenarios helps me sort them in my mind better than four random/whimsical
things.
Rho > Adam • 3 years ago

lol, well I think its pretty reasonable since "rare disease" (cancer is not that rare though, but it gives a grim feeling like you said) does
have high probability of false positive/false negative, so its pretty convenient to take it as an example, but addiction to lollipops is ok too,
since its pretty reasonable if one who take the observation/survey is not professional and makes mistakes or false assumptions therefore
increasing the likelihood false positive/negative xD
Ben Yi • 6 months ago

There is a fundamental flaw with the breast cancer example. Bayes formula assumes events are INDEPENDENT.
Charlie • 8 months ago

I *vehemently* disagree that this explanation is better than the original article. The original article urged readers to attempt to perform the
calculation themselves, no explanation given, using intuition. And then completely blows intuition out of the water. (And reassuringly pats the
reader on the head and assures them that most doctors at the time got it wrong too.)
I learned Bayesian probability from this in 9th grade (early 2012) in a way that I remembered for my college quantitative methods course four
years later. Because it was absolutely a complete shock to me that my intuition could be so completely and terribly wrong. It is now 2020 and I
*still* insist on asking the original problem to my more logically oriented friends and then explaining the solution with great enthusiasm if they
get it wrong
get it wrong.
The original article is a huge struggle to read, don’t get me wrong. But the way it urges you to keep looking at the question and give it your best
attempts (which, if you are like I was, you still epically fail) and then tries to guide you to understanding feels like learning from a perhaps too
intelligent but very invested mentor. It leaves you really wanting more and to know what it is you did wrong. The way it explains the importance
of prior probability and how universally applicable that is in all of life, how universally applicable Bayesian probability is in all of life, was a truly
profound moment for me and one that I’m unlikely to ever forget.
It legitimately concerns me that in a Google search for the original article, this comes up first. There was so much value in urging people to use
the best of their mental resources first to try a problem, and getting rid of that and that display of how counter-intuitive the mathematical truth
can be robs complete newcomers (like myself in 2012) of having a real reason to want to understand what Bayesian probability is.
see more
Joe • 2 years ago

Great explanation.
I thought how could such a simple formula be so difficult to grasp, but there are actually quite a few ideas to tie together. I particularly like the
mental trick of using whole numbers
Guy Manova • 2 years ago

sweet explanation :)
interested_observer • 2 years ago

REF: “there is the event of the phenomenon itself. Our tests and measuring equipment have some inherent rate of error.
REF: “Bayes’ theorem converts the results from your test into the real probability of the event.”
Bayesian statistics provide the Likelihood (a conditional probability) that the data (e.g. results of medical test or measurement) would have
occurred (been observed) HAD (GIVEN) the null hypothesis (e.g., have or does not have the disease) is true.
Bayesian statistics requires a subjective (however clever or arbitrary or deliberately ‘Noninformative” or “vague”) belief which (although it
updates as data, or additional data, is observed) biases its final result (for better or worse).
An alternative to Bayesian statistics is Frequentist statistics. The following lecture is available on the internet. However my posterori judgement
about the quality of the lecture may have been influenced by my a priori belief in the quality of that educational institution.
“Comparison of frequentist and Bayesian inference.” Class 20, 18.05 Jeremy Orloff and Jonathan Bloom (MIT intro course to Statistics &
Probability)
Aryan Deora • 2 years ago

I love you so much <3 You really made this theorem so easy to understand.
Rho • 3 years ago

Nice!
will • 3 years ago

figure it out
Brandon Robshaw • 3 years ago

This is a brilliantly clear explanation of something that has always puzzled me, and now puzzles me no longer. Thank you!
ken • 3 years ago

very helpful. thank you.
Robert Mogire • 3 years ago

Q2. Given that the patient does not have appendicitis , what is the probability that he exhibits RoVsing's disease ?
Robert Mogire • 3 years ago

Assume RoVsing's sign is a perfect indicator of appendicitis. The probability of Patient getting appendicitis .16. The probability of RoVsing's sign
in appendicitis is 0.0001.Given that the patient has appendicitis , what is the probability that he exhibits RoVsing's disease?
Chris Worthington • 3 years ago

Great, great video. I've seen this formula written down so many times and now I understand it.
Laura Houghton • 3 years ago

80 percent of something relatively minuscule is less than 20 percent of something relatively massive.
80 percent of a mouse is less than 20 percent of an elephant.

jay • 3 years ago

The website and the content matched!
it is betterexplained.
thanks :D
Jimmy • 3 years ago

I have a problem. If bayes theorem is simplified to Pr(Can|pos)=Pr(pos|can)Pr(can) / Pr(pos), how can your numerator only be Pr(pos|can), the
probability of a true positive, which is already calculated by multiplying the chance of getting a positive test and the chance of having cancer, and
not multiply it again by the probability of having cancer as it shows in the equation Pr(can).
I know this is hard to comprehend, so to summarise:

You say that the chance of having cancer given a positive test is
True positive
-------------
Positive
but bayes equation suggests that the chance of a true positive is then multiplied by the chance of cancer in general
True positive x chance of cancer

--------------------------------
positive
Reply asap, i need this for an investigation. Thank you.

van • 3 years ago

This breast cancer example is somewhat perjured and misleading. Let me explain why.
A rational person might assume that the statistics you provide are correct and minimize their response to a positive mammogram result, or
generalize your example as presumptive health care advice. Here is a statistic that would correct your example, possibly making useful:
1 in 8 women will have breast cancer in their lifetime, so it is not a rare disease like the 1 in 100 you characterized it as.
Ascertainment error. When we look for disease we are more likely to find it than when we don't look for it, which can skew the results with
respect to the affected population.
There are two kinds of cancer tests, screening tests and diagnostic tests and they are quite different in their intentions and purpose.
For example in a diagnostic mammogram, the individual may already have other heuristic evidence that is bringing them to the clinic for the
test. Thus the population of patients receiving diagnostic mammograms may show a higher true incidence of breast cancer than the general
population. For that population to use your example could actually be fatal, and here's why.
There are four treatments for breast cancer and patients can choose how many courses of each treatment they want to go through. Since all the
treatments are unpleasant (slash, burn, poison, maintain)=(surgery, radiation, chemo, adjuvant) a patient using an incorrect statistical model of
their personal risk might make a decision that would allow the cancer to remain and eventually kill them. This is a very frequent occurrence in
the case of those who have surgery for breast cancer, but decide to avoid chemo because they don't want their hair to fall out. These people often
die. I'm not trying to shock you or anything but this does happen. Also mammograms come in two flavors, ultrasonic and X-ray imaged. So it is
useful to specify which is applicable. If we take the Cartesian product of [screening, diagnostic] with [ultrasonic, X-ray] there are actually four
relevant but quite different cases that should be examined and lumping them together could actually be less informative than not knowing,
which is paradoxical. All the best, - Van Warren
see more
Rho > van • 3 years ago

Yes, well it's just an example, I think we can just pretend that we live in old age (I don't know exactly when xD, but when cancer was just
"discovered"), that the calculation's likelihood of false positive/negative is still pretty high.
Vasif Vora • 4 years ago

really well done. Thanks! I noticed that formula for P(A/X) as you described at least conceptually matches that for PPV (Positive predictive
values).
Saurabh • 4 years ago

Language in the articles is highly incomprehensible. Write them in simple English.
Pete > Saurabh • 4 years ago

Kalid thanks: this article was pitched just right, and explained a highly misunderstood topic really well.
@Saurabh: The article is written in highly comprehensible English, and lots of non-English speakers seem to be able to understand it just
fine. I don't think it's possible to 'dumb it down' further for people who's English skills are less than 'basic'. It's an article about a
conceptually moderately difficult statistics concept, not a children's story, and it probably couldn't be written more simply. The solution
is to improve your English, not try to force the article into easier words.
Rachana • 4 years ago

Hi Kalid!
I stumbled across your blog quite by chance. And what a revelation it was! Thanks you for demystifying math so beautifully!
I wanted to make a request. Could you please explain explain spread (variability) in data distributions without using formulas?
It would be great if you could throw some light on how spread enhances the meaning of central tendency. And how it can be used to predict the
next value.
Thanks
Faisal Tariq • 4 years ago

What is the answer to this problem? I am sitting at a table where there are two plates filled with colored beads. Plate #1 has 200 green beads and
10 red beads , Plate #2 has 2 green beads and 6 red beads. I am blindfolded and the plates are placed in front of me at random and taken away at
random periodically. I am asked to pick a bead. I end up picking a red bead. Question 1: What is the probability that I have picked the red bead
from Plate #1. Question 2: What is the probability that I have picked the red bead from Plate #2?
Peter • 4 years ago

Thanks, I just needed a quick intuition and that "spam" analogy made it click right away.
Oguz Tanrisever • 4 years ago

Very clear explanation. Thanks for your effort
M.K. Safi • 4 years ago

So in real life, what do doctors do when confronted with a situation like this? What other tools or methods do they use to increase or decrease
the confidence on whether the patient actually has cancer or not?
Bengin Cetindere > M.K. Safi • 2 years ago

what you tipically do is do a second test. If you calculate this through you will see that the 7% will jump up to a much higher value. Or
even a third time :)
RJnaylor • 4 years ago

There is a serious mistake here You should not have said "Ok we got a positive result " Once you do that you know your chances -- it is 80%
There is a serious mistake here ... You should not have said Ok, we got a positive result. Once you do that you know your chances it is 80%
by your own definition. Bayes theorem uses past likelihood to adjust for a current likelihood but without knowing the outcome of the test.
Nick • 4 years ago

Hi Khalid
Agree with most other people. Super article.

One quick question.
How is P(X/A) * P(A) + P(X/Not A) * P (Not A) = P(X) ?
I'm not sure how you got there algebraically?
Thanks again
Nick
Rho > Nick • 3 years ago

yeah they (that P(X|A) stuffs) are both summed up to be the probability of X, but part of the result X is "caused" by A and some other
part is "caused" by A'. Together they sum the probability of X.
Steven Chen • 4 years ago

Great article, thanks a lot!
Chanchana • 4 years ago

Can you relate this to precision and recall? Is finding the chance that a person has cancer given a positive test similar to precision estimation?
Carlos Alas • 4 years ago

Hi Kalid,
Do you have an article on how to think in higher dimensions intuitively?
As always your work is AWESOME! :)

kalid > Carlos Alas • 4 years ago

Thanks Carlos! For higher dimensions, think of them as "properties". A human being has a hair color, eye color, height, weight, age,
name, place of birth, etc. (simultaneously). A dimension is basically how many unique properties an item can posses.
If we insist on only using spatial positions to represent dimensions, then we run out at 3. But if we let ourselves use color, shape, texture,
etc. we can visualize more dimensions.
Carlos Alas > kalid • 4 years ago

Thank you for always being kind in replying.
Great way of seeing it, thank you!

Tyler • 5 years ago

Great quick summary
Asaf • 5 years ago

Wonderful Explanation, very clear and intuitive. Thanks !!!
Matt • 5 years ago

Very helpful and clear!
Gordon • 5 years ago

Gordon • 5 years ago
very clear explanation. thank you.
another useful example is the home pregnancy test.

a positive result has a probability of being correct.
a negative result has a different probability of being correct.
Bharath • 5 years ago

How to calculate the probability of an incorrect prediction.
Amy • 5 years ago

And this one:
Suppose that 2% of the students in a school have head lice and the test for head lice is accurate 85% of the time. What is the probability that a
student in the school has head lice, given that the test came back positive? Round your answer to the nearest tenth of a percent.
Amy • 5 years ago

Can you walk me through the table method for this scenario?
Suppose that 15% of the general population has a disease and the test for the disease is accurate 80% of the time. What is the probability of
testing positive for the disease?
Vivs • 5 years ago

Nicely written in simple words...Sir, Could you please give a venn diagram for this cancer example ?
Load more comments
✉ Subscribe d Add Disqus to your site ⚠ Do Not Sell My Data

IN THIS SERIES
A Brief Introduction to Probability & Statistics

An Intuitive (and Short) Explanation of Bayes' Theorem
Understanding Bayes Theorem With Ratios
Understanding the Monty Hall Problem
How To Analyze Data Using the Average
Understanding the Birthday Paradox
ABOUT THE SITE
BetterExplained helps 450k monthly readers with friendly, intuitive math lessons (more).
“If you can't explain it simply, you don't understand it well enough.” —Einstein (more) | Privacy | CC-BY-NC-SA
   

E Explained Intuitively

Uploaded by

Copyright:

Available Formats

You might also like

E Explained Intuitively

Uploaded by

Document Information

Original Description:

Original Title

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

E Explained Intuitively

Uploaded by

Copyright:

Available Formats

Better Explained Books And Video Courses

Concrete math lessons without the jargon.

An Intuitive (and Short) Explanation of Bayes’ Theorem

Bayes' Theorem Intuition

1% of women have breast cancer (and therefore 99% do not).

Put in a table, the probabilities look like this:

How do we read it?

How Accurate Is The Test?

Here’s how I think about it:

The table looks like this:

So, our chance of cancer is .008/.10304 = 0.0776, or about 7.8%.

Intuitive Understanding: Shine The Light

Bayesian Spam Filtering

Event A: The message is spam.

Plugged into a more readable formula (from Wikipedia):

Bayes’ Theorem on Wikipedia

Join Over 450k Monthly Readers

Enjoy the article? There's plenty more to help you build

Enter your email address

Facebook Twitter LinkedIn Email Print

Other Posts In This Series

 Recommend 6 t Tweet f Share Sort by Best

Join the discussion…

koni • 3 years ago

Rho > koni • 3 years ago • edited

Alokit Jindal • 3 years ago

kalid Mod > Alokit Jindal • 3 years ago

Adam • 3 years ago

CTR$hill > Adam • 10 months ago

Rho > Adam • 3 years ago

Ben Yi • 6 months ago

Charlie • 8 months ago

Joe • 2 years ago

Guy Manova • 2 years ago

interested_observer • 2 years ago

Aryan Deora • 2 years ago

Rho • 3 years ago

will • 3 years ago

Brandon Robshaw • 3 years ago

ken • 3 years ago

Robert Mogire • 3 years ago

Robert Mogire • 3 years ago

Chris Worthington • 3 years ago

Laura Houghton • 3 years ago

80 percent of a mouse is less than 20 percent of an elephant.

jay • 3 years ago

Jimmy • 3 years ago

I know this is hard to comprehend, so to summarise:

True positive x chance of cancer

Reply asap, i need this for an investigation. Thank you.

van • 3 years ago

Rho > van • 3 years ago

Vasif Vora • 4 years ago

Saurabh • 4 years ago

Pete > Saurabh • 4 years ago

Rachana • 4 years ago

Faisal Tariq • 4 years ago