A Study of Supervised Spam Detection Using Artificial Intelligence
Presented by
Mohit Magare
Class: BE-B-10
PRN No: 71921639H
What is Spam?
• Typical legal definition
– Unsolicited commercial email from a sender with whom the recipient has no pre-existing business relationship
Spam Detection
Ham
Spam
Weather Report Guy
• Content in Image
Secret Decoder Ring Dude
• Character Encoding
• HTML word breaking
Pharmacy
Produc<!LZJ>t<!LG>s
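One possible countermeasure to HTML word breaking is to strip the bogus tags before the text reaches the filter. The sketch below is an illustration only, not a method taken from the slides: the name `strip_bogus_tags` and the regular expression are assumptions, and the pattern is deliberately simple (it would cut a real comment short if the comment itself contained a `>`).

```python
import re

# Matches "<!" followed by anything up to the next ">". This covers bogus
# declarations such as <!LZJ> as well as ordinary comments <!-- ... -->.
BOGUS_TAG = re.compile(r"<![^>]*>")


def strip_bogus_tags(text: str) -> str:
    """Remove <!...> fragments so that words split by them are rejoined."""
    return BOGUS_TAG.sub("", text)


print(strip_bogus_tags("Produc<!LZJ>t<!LG>s"))  # prints "Products"
```

After this step the word is visible to a word-based filter again.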
Diploma Guy
• Word Obscuring
Dlpmoia Pragorm
Caerte a mroe prosoeprus
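The obscured words above keep their first and last letters and shuffle only the middle. A filter could exploit that by comparing a "signature" of each token (first letter, sorted middle letters, last letter) against a list of known words. This is a minimal sketch under that assumption: `signature`, `unscramble`, and the `WATCHLIST` contents are hypothetical names and data chosen for illustration.

```python
def signature(word: str) -> str:
    """First letter + sorted middle letters + last letter, lowercased."""
    w = word.lower()
    if len(w) <= 3:
        return w  # too short to scramble
    return w[0] + "".join(sorted(w[1:-1])) + w[-1]


# Hypothetical watch list; a real filter would use a full dictionary.
WATCHLIST = ["diploma", "program", "create", "more", "prosperous"]
SIGNATURES = {signature(w): w for w in WATCHLIST}


def unscramble(text: str) -> str:
    """Replace each token whose signature matches a known word."""
    return " ".join(SIGNATURES.get(signature(tok), tok) for tok in text.split())


print(unscramble("Dlpmoia Pragorm"))           # prints "diploma program"
print(unscramble("Caerte a mroe prosoeprus"))  # prints "create a more prosperous"
```

Two different words can share a signature, so the mapping is a heuristic rather than an exact inverse.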
One Solution to Spam Detection
• Machine Learning
– Learn to tell spam from good mail (ham) using labeled examples
Naïve Bayes
• Want P( spam | words )
• Use Bayes' rule: P(spam | words) = P(words | spam) P(spam) / P(words)
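To make the rule concrete, here is a small sketch of a Naïve Bayes classifier in plain Python. It is not the implementation from the slides or from any of the filters named later. It assumes that words are independent given the class (the "naïve" assumption), uses add-one smoothing, and is trained on a made-up `TOY` data set. The names `train` and `p_spam` are illustrative.

```python
import math
from collections import Counter

# Made-up training data, for illustration only.
TOY = [
    ("spam", "cheap pharmacy products"),
    ("spam", "online diploma program"),
    ("ham", "weather report for monday"),
    ("ham", "meeting notes for monday"),
]


def train(messages):
    """Count words per class. Assumes both classes appear in the data."""
    word_counts = {"spam": Counter(), "ham": Counter()}
    class_counts = Counter()
    for label, text in messages:
        class_counts[label] += 1
        word_counts[label].update(text.lower().split())
    vocab = set(word_counts["spam"]) | set(word_counts["ham"])
    return word_counts, class_counts, vocab


def p_spam(text, model):
    """Return P(spam | words) under the naive independence assumption."""
    word_counts, class_counts, vocab = model
    total = sum(class_counts.values())
    log_score = {}
    for label in ("spam", "ham"):
        score = math.log(class_counts[label] / total)  # log P(class)
        n = sum(word_counts[label].values())
        for w in text.lower().split():
            if w not in vocab:
                continue  # ignore words never seen in training
            # log P(word | class) with add-one smoothing
            score += math.log((word_counts[label][w] + 1) / (n + len(vocab)))
        log_score[label] = score
    # P(words) is the sum of the numerators over both classes.
    top = max(log_score.values())
    num = {k: math.exp(v - top) for k, v in log_score.items()}
    return num["spam"] / (num["spam"] + num["ham"])


model = train(TOY)
print(p_spam("cheap diploma", model) > 0.5)   # prints True
print(p_spam("monday meeting", model) > 0.5)  # prints False
```

Working in log space avoids underflow when a message contains many words.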
A Bayesian Approach to Filtering Junk E-Mail
1998 - Sahami, Dumais, Heckerman, Horvitz
Filters
• Some available open-source spam filters
– SpamAssassin
– Bogofilter
– CRM114
– DSPAM
– SpamBayes
– SpamProbe
Evaluation Measures (1)
                Judgement
                Ham    Spam
Result  Ham     a      b
        Spam    c      d
a: ham (correctly classified) [true negative]
b: spam misclassification [false negative]
c: ham misclassification [false positive]
d: spam (correctly classified) [true positive]
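From the four cell counts a, b, c, d, the usual summary rates follow directly. The sketch below uses standard definitions, which may differ from the measures on the slides that followed this one. The function name `evaluate` and the example counts are made up for illustration.

```python
def evaluate(a: int, b: int, c: int, d: int) -> dict:
    """Summary rates from the table cells.

    a: ham classified as ham      b: spam classified as ham
    c: ham classified as spam     d: spam classified as spam
    """
    return {
        # Share of all ham that was wrongly flagged (false positive rate).
        "ham_misclassification": c / (a + c),
        # Share of all spam that got through (false negative rate).
        "spam_misclassification": b / (b + d),
        # Share of all messages classified correctly.
        "accuracy": (a + d) / (a + b + c + d),
    }


# Made-up counts: 100 ham and 100 spam messages.
rates = evaluate(a=90, b=5, c=10, d=95)
for name, value in rates.items():
    print(f"{name}: {value:.3f}")
```

The two misclassification rates are reported separately because losing a legitimate message usually costs more than letting one spam message through.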
Thank you!