Chapter 1: Bayesian Basics
Conchi Ausín and Mike Wiper, Department of Statistics, Universidad Carlos III de Madrid
Conchi Ausín and Mike Wiper, Bayesian Methods for Big Data, Masters in Big Data, 1 / 39
Objective
Review the basic probability rules and apply the Bayes theorem to build a naive
classifier for spam filtering in a big data setting.
Probability Rules I: Partitions
Events B1, ..., Bk form a partition if Bi ∩ Bj = ∅ for all i ≠ j and B1 ∪ · · · ∪ Bk = Ω. Then, for any event A,
P(A) = P(A ∩ B1) + · · · + P(A ∩ Bk).
Probability Rules II: Conditional Probability
P(A|B) = P(A ∩ B) / P(B).
Probability Rules III: Total Probability
For a partition B1, ..., Bk and any event A,
P(A) = P(A|B1)P(B1) + · · · + P(A|Bk)P(Bk).
Probability Rules IV: Bayes Theorem
P(B|A) = P(A|B)P(B) / P(A).
For a partition B1, ..., Bk, this becomes:
P(Bi|A) = P(A|Bi)P(Bi) / [P(A|B1)P(B1) + · · · + P(A|Bk)P(Bk)].
Example: The Monty Hall problem
The host, who knows where the car is hidden, always opens a door different from the one chosen by the player, and this action always reveals a goat.
Simulation of the Monty Hall problem
Let us simulate the Monty Hall problem with d ≥ 3 doors. It is assumed that if a
change is made, one of the unopened doors is selected at random. The code is
adapted from the wiki.
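The simulation code itself is not reproduced on the slide. A minimal base-R sketch (the function name `monty_hall` and its defaults are ours, not from the original) is:

```r
# Monty Hall simulation with d >= 3 doors. If the player switches, one of
# the remaining unopened doors is selected at random, as assumed above.
monty_hall <- function(n_sims = 10000, d = 3, switch = TRUE) {
  wins <- 0
  for (i in 1:n_sims) {
    car    <- sample(1:d, 1)   # door hiding the car
    choice <- sample(1:d, 1)   # player's initial choice
    # The host opens one door that is neither the player's choice nor the car
    openable <- setdiff(1:d, c(choice, car))
    opened   <- if (length(openable) == 1) openable else sample(openable, 1)
    if (switch) {
      # Switch to one of the remaining unopened doors, chosen at random
      remaining <- setdiff(1:d, c(choice, opened))
      choice    <- if (length(remaining) == 1) remaining else sample(remaining, 1)
    }
    if (choice == car) wins <- wins + 1
  }
  wins / n_sims
}
set.seed(1)
monty_hall(switch = TRUE)   # close to 2/3 when d = 3
monty_hall(switch = FALSE)  # close to 1/3 when d = 3
```

Note the guards around sample(): in R, sample(x, 1) with a scalar x draws from 1:x, so the single-door cases must be handled separately.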
Solution using Bayes Theorem
Suppose without loss of generality that the player chooses door 1 and that the
host opens door 2 to reveal a goat.
Let A (B, C) be the event that the prize is behind door 1 (2, 3).
P(A) = P(B) = P(C) = 1/3.
P(opens 2|A) = 1/2, P(opens 2|B) = 0, P(opens 2|C) = 1.
P(A|opens 2) = P(opens 2|A)P(A) / P(opens 2) = (1/2 × 1/3) / (1/2) = 1/3,
where P(opens 2) = P(opens 2|A)P(A) + P(opens 2|B)P(B) + P(opens 2|C)P(C) = 1/2,
so P(C|opens 2) = 2/3 and it is better to switch.
Example: Spam filtering
In this example, we will use the Bayes theorem to build a spam filter based on the
words in the message.
Example: Spam filtering
Assume that we have the following set of emails classified as spam or ham.
Using Bayes theorem
spam: “send us your password”
ham: “send us your review”
ham: “password review”
spam: “review us ”
spam: “send your password”
spam: “send us your account”
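With these six training emails (four spam, two ham), the Bayes theorem gives, for example, the probability that an email containing the word “review” is spam. A base-R check of this arithmetic (variable names are ours):

```r
# Classical (empirical) estimates from the 6 training emails: 4 spam, 2 ham.
p_spam <- 4/6
p_ham  <- 2/6
p_review_spam <- 1/4   # "review" appears in 1 of the 4 spam emails
p_review_ham  <- 2/2   # "review" appears in 2 of the 2 ham emails
# Bayes theorem:
p_spam_review <- p_review_spam * p_spam /
  (p_review_spam * p_spam + p_review_ham * p_ham)
p_spam_review  # = 1/3
```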
For several words...
          Pr(· | spam)   Pr(· | ham)
review        1/4            2/2
send          3/4            1/2
us            3/4            1/2
your          3/4            1/2
password      2/4            1/2
account       1/4            0/2
Using the Bayes theorem, the probability that the new email “review us now” is spam is:
Pr(spam | review us now) = (0.0044 × 4/6) / (0.0044 × 4/6 + 0.0625 × 2/6) = 0.123
where 0.0044 = Pr(words | spam) and 0.0625 = Pr(words | ham) are the products of the probabilities of the observed word presences and absences (“now” is not in the vocabulary and is ignored).
Note that the independence assumption between words will not in general be
satisfied, but it can be useful as a naive approach.
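The arithmetic above can be checked in base R (vector names are ours; the word probabilities come from the table):

```r
# Naive Bayes for "review us now": present words = review, us; all other
# vocabulary words absent. "now" is not in the vocabulary and is ignored.
p_w_spam <- c(review=1/4, send=3/4, us=3/4, your=3/4, password=2/4, account=1/4)
p_w_ham  <- c(review=2/2, send=1/2, us=1/2, your=1/2, password=1/2, account=0/2)
present  <- c(TRUE, FALSE, TRUE, FALSE, FALSE, FALSE)
lik_spam <- prod(ifelse(present, p_w_spam, 1 - p_w_spam))  # = 0.0044
lik_ham  <- prod(ifelse(present, p_w_ham,  1 - p_w_ham))   # = 0.0625
post_spam <- lik_spam * (4/6) / (lik_spam * 4/6 + lik_ham * 2/6)
post_spam  # = 0.123, as on the slide
```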
Example: SMS spam data
Consider the file “sms.csv”, which contains a collection of SMS messages classified as
“spam” or “ham”.
rm(list=ls())                        # clear the workspace
sms <- read.csv("sms.csv", sep=",")  # read the data
names(sms)                           # variable names
head(sms)                            # first few records
We want to use a naive Bayes classifier to build a spam filter based on the words
in the message.
Prepare the Corpus
library(tm)
Here, VectorSource tells the Corpus function that each document is an entry in
the vector.
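The corpus-creation line itself is not shown on the slide. Assuming the message text is stored in a column called `text` (an assumption; the actual column name should be checked with `names(sms)`), it would look like:

```r
library(tm)
# Build a corpus: each SMS message becomes one document.
# NOTE: the column name `text` is an assumption; check names(sms).
corpus <- Corpus(VectorSource(sms$text))
```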
Clean the Corpus
Different texts may contain “Hello!”, “Hello,”, “hello”, etc. We would like to
treat all of these as the same word. We clean up the corpus with the tm_map function,
converting to lower case and removing numbers and punctuation.
inspect(clean_corpus[1:3])
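The cleaning calls are not shown on the slide; a typical tm pipeline (a sketch, assuming the `corpus` object from the previous slide) is:

```r
library(tm)
# Standard cleaning steps with tm_map (a sketch; the slide's exact calls
# are not shown). Assumes `corpus` was created from the SMS messages.
clean_corpus <- tm_map(corpus, content_transformer(tolower))  # lower-case
clean_corpus <- tm_map(clean_corpus, removeNumbers)           # remove numbers
clean_corpus <- tm_map(clean_corpus, removePunctuation)       # remove punctuation
clean_corpus <- tm_map(clean_corpus, stripWhitespace)         # collapse extra spaces
```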
Clean the Corpus
Remove common non-content words, like to, and, the, etc. These are called stop
words. The function stopwords returns a list of about 175 such words.
stopwords("en")[1:10]
clean_corpus <- tm_map(clean_corpus, removeWords,
stopwords("en"))
Word clouds
We create word clouds to visualize the differences between the two message types,
ham or spam.
First, obtain the indices of spam and ham messages:
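The slide's code is not shown; a sketch, using the label column `type` (the same column referenced later with `sms_test$type`):

```r
# Indices of spam and ham messages, for plotting separate word clouds.
spam_indices <- which(sms$type == "spam")
ham_indices  <- which(sms$type == "ham")
```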
Word clouds
library(wordcloud)
Word clouds
wordcloud(clean_corpus[spam_indices], min.freq=40)
Building a spam filter
Divide the corpus into training and test data, using 75% for training and 25% for testing:
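The split itself is not shown on the slide. A sketch that takes the first 75% of messages (the slide's exact split may differ; object names are ours):

```r
# 75% / 25% train-test split, keeping messages, labels and corpus aligned.
n <- nrow(sms)
n_train <- round(0.75 * n)
sms_train <- sms[1:n_train, ]
sms_test  <- sms[(n_train + 1):n, ]
corpus_train <- clean_corpus[1:n_train]
corpus_test  <- clean_corpus[(n_train + 1):n]
```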
Compute the frequency of terms
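The code for this step is not shown. A sketch using tm's DocumentTermMatrix (the object names sms_dtm_train and sms_dtm_test are ours, assuming the corpus split from the previous slide):

```r
library(tm)
# Document-term matrices: rows are messages, columns are words,
# entries are word counts per message.
sms_dtm_train <- DocumentTermMatrix(corpus_train)
sms_dtm_test  <- DocumentTermMatrix(corpus_test)
```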
Identify frequently used words
Don’t muddy the classifier with words that may only occur a few times.
To identify words appearing at least 5 times:
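A sketch of this step with tm's findFreqTerms, restricting both matrices to the frequent words via the dictionary control option (object names are ours):

```r
library(tm)
# Words appearing at least 5 times in the training data:
freq_words <- findFreqTerms(sms_dtm_train, 5)
# Rebuild both document-term matrices restricted to those words:
sms_dtm_train <- DocumentTermMatrix(corpus_train,
                                    control = list(dictionary = freq_words))
sms_dtm_test  <- DocumentTermMatrix(corpus_test,
                                    control = list(dictionary = freq_words))
```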
Convert count information to “Yes”, “No”
Naive Bayes classification needs present or absent info on each word in a message.
We have counts of occurrences. To convert the document-term matrices:
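The conversion code is not shown; a common sketch (the helper name convert_count is ours) recodes each count as "Yes"/"No" and applies it column by column:

```r
# naiveBayes expects categorical features, so recode counts as presence/absence.
convert_count <- function(x) ifelse(x > 0, "Yes", "No")
# Apply over the columns (words) of each document-term matrix:
sms_train_m <- apply(sms_dtm_train, MARGIN = 2, convert_count)
sms_test_m  <- apply(sms_dtm_test,  MARGIN = 2, convert_count)
```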
Convert document-term matrices
Create a Naive Bayes classifier
library(e1071)
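The training call itself is not shown. A sketch assuming the "Yes"/"No" matrices and labels from the previous slides (the object name classifier is ours):

```r
library(e1071)
# Train a naive Bayes classifier on word presence/absence,
# with the message labels (spam/ham) as the class.
classifier <- naiveBayes(sms_train_m, factor(sms_train$type))
```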
Evaluate the performance on the test data
Given the classifier object, we can use the predict function to test the model on
new data.
Classifications of messages in the test set are based on the probabilities generated
with the training set.
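A sketch of the prediction step, assuming the classifier and test matrix from the previous slides (the object name predictions matches the table call that follows):

```r
# Classify the test messages using the trained model.
predictions <- predict(classifier, newdata = sms_test_m)
```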
Check the predictions against reality
table(predictions, sms_test$type)
Problems with the Naive Bayes classifier
Suppose now that we are interested in classifying a new email containing the words “review your account”:
Problems with the Naive Bayes classifier
          Pr(· | spam)   Pr(· | ham)
review        1/4            2/2
send          3/4            1/2
us            3/4            1/2
your          3/4            1/2
password      2/4            1/2
account       1/4            0/2
Thus, writing the email as the presence/absence vector {1, 0, 0, 1, 0, 1} over the vocabulary (review, send, us, your, password, account):
Pr(ham | review your account)
= Pr({1,0,0,1,0,1} | ham) Pr(ham) / [Pr({1,0,0,1,0,1} | spam) Pr(spam) + Pr({1,0,0,1,0,1} | ham) Pr(ham)]
= (0 × 2/6) / (0.0014 × 4/6 + 0 × 2/6) = 0
since Pr(account | ham) = 0/2.
Therefore, any new email with the word “account” will be classified as spam.
Bayesian Naive Bayes classifier
Let θs = Pr(spam), so that 1 − θs = Pr(ham), and assume a prior distribution f(θs).
Given the observed data, we update to the posterior distribution using the
Bayes theorem:
f(θs | data) = f(data | θs) f(θs) / f(data) ∝ f(data | θs) f(θs)
that is,
posterior = likelihood × prior / evidence ∝ likelihood × prior
Bayesian Naive Bayes classifier
In the example, we had 4 spam emails and 2 ham emails. Then, the
likelihood is:
f(data | θs) ∝ θs^4 (1 − θs)^2
With a uniform prior, θs ∼ Beta(1, 1), the posterior distribution is:
θs | data ∼ Beta(1 + 4, 1 + 2)
A Bayesian point estimate can be obtained using the mean of the posterior
distribution:
Pr(spam) = (1 + 4)/(2 + 6) = 5/8,   Pr(ham) = (1 + 2)/(2 + 6) = 3/8
Bayesian Naive Bayes classifier
spam: “send us your password”
ham: “send us your review”
ham: “password review”
spam: “review us ”
spam: “send your password”
spam: “send us your account”
Consequently, the Bayesian probability that an email with the word “review” is
spam is:
Pr(spam | review) = Pr(review | spam) Pr(spam) / [Pr(review | spam) Pr(spam) + Pr(review | ham) Pr(ham)]
= [(1+1)/(2+4) × (1+4)/(2+6)] / [(1+1)/(2+4) × (1+4)/(2+6) + (1+2)/(2+2) × (1+2)/(2+6)] = 0.425
which is somewhat larger than the probability estimated with the classical
approach (1/3).
Bayesian Naive Bayes classifier
          Pr(· | spam)   Pr(· | ham)
review        2/6            3/4
send          4/6            2/4
us            4/6            2/4
your          4/6            2/4
password      3/6            2/4
account       2/6            1/4
Then,
Pr(spam | review your account)
= [(2/6)(2/6)(2/6)(4/6)(3/6)(2/6) × 5/8] / [(2/6)(2/6)(2/6)(4/6)(3/6)(2/6) × 5/8 + (3/4)(2/4)(2/4)(2/4)(2/4)(1/4) × 3/8]
= 0.369
Therefore, the new email will be classified as ham, unlike with the classical approach.
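The smoothed arithmetic can be verified in base R (vector names are ours; the probabilities come from the Laplace-smoothed table):

```r
# Laplace-smoothed (Bayesian) estimates, "review your account":
# present words = review, your, account; absent = send, us, password.
p_w_spam <- c(review=2/6, send=4/6, us=4/6, your=4/6, password=3/6, account=2/6)
p_w_ham  <- c(review=3/4, send=2/4, us=2/4, your=2/4, password=2/4, account=1/4)
present  <- c(TRUE, FALSE, FALSE, TRUE, FALSE, TRUE)
num_spam <- prod(ifelse(present, p_w_spam, 1 - p_w_spam)) * 5/8
num_ham  <- prod(ifelse(present, p_w_ham,  1 - p_w_ham)) * 3/8
post_spam <- num_spam / (num_spam + num_ham)
post_spam  # = 0.369, so the email is classified as ham
```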
Bayesian Naive Bayes classifier
The Bayesian naive Bayes classifier with uniform priors is equivalent to what is
commonly called “Laplacian smoothing”.
In the SMS example, it can be incorporated using:
The Bayesian spam filter's performance is slightly better than that of the classical
approach.
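In e1071, Laplacian smoothing is switched on with the laplace argument of naiveBayes; a sketch assuming the training objects from the earlier slides:

```r
library(e1071)
# laplace = 1 adds one pseudo-count per word and class, avoiding the
# zero-probability problem seen with the word "account".
classifier <- naiveBayes(sms_train_m, factor(sms_train$type), laplace = 1)
```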
Summary
We have reviewed the basic probability rules with particular emphasis on the
Bayes theorem.
We have also used the Bayes theorem to build a spam filter to classify
messages as spam or ham based on the observed words. It has been applied
to a real database of SMS messages.
- First, we estimated probabilities using empirical frequencies based on a
training data set.
- Then, we considered a Bayesian approach where prior beliefs are updated
using the observed data, leading to posterior estimates.
The naive Bayes classifier can be viewed as one of the simplest Bayesian
networks. These will be studied in the next class.