8 - Classification: Naive Bayes
Classification
By Naveen Aggarwal
Statistical Learning
Data – instantiations of some or all of the random variables
describing the domain; they are evidence
Hypotheses – probabilistic theories of how the domain works
The surprise candy example: two flavors (cherry and lime) in very large bags
of five kinds, indistinguishable from the outside
h1: 100% cherry – P(c|h1) = 1, P(l|h1) = 0
h2: 75% cherry + 25% lime
h3: 50% cherry + 50% lime
h4: 25% cherry + 75% lime
h5: 100% lime
Problem formulation
Given a new bag, random variable H denotes the bag
type (h1 – h5); Di is a random variable (cherry or lime);
after seeing D1, D2, …, DN, predict the flavor (value) of DN+1.
Example:
Hypothesis prior over h1, …, h5 is {0.1, 0.2, 0.4, 0.2, 0.1}
Data:
Q1: After seeing d1, what is P(hi|d1)?
Q2: After seeing d1, what is P(d2 = lime | d1)?
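
A minimal Python sketch (not from the slides) of Q1 and Q2, assuming the observed candy d1 is lime; the priors and the per-hypothesis lime probabilities are the ones listed above.

# P(lime | hi) for h1..h5 and the hypothesis prior from the slide
p_lime_given_h = [0.0, 0.25, 0.5, 0.75, 1.0]
prior = [0.1, 0.2, 0.4, 0.2, 0.1]

def posterior(observations, prior, p_lime_given_h):
    # P(hi | d1..dN) by Bayes' rule; observations are "lime" or "cherry"
    post = list(prior)
    for d in observations:
        for i, p_lime in enumerate(p_lime_given_h):
            post[i] *= p_lime if d == "lime" else (1.0 - p_lime)
        z = sum(post)                      # the evidence P(d1..dN)
        post = [p / z for p in post]
    return post

def predict_lime(post, p_lime_given_h):
    # P(d_{N+1} = lime | d1..dN) = sum_i P(lime | hi) * P(hi | d1..dN)
    return sum(p * q for p, q in zip(p_lime_given_h, post))

post = posterior(["lime"], prior, p_lime_given_h)
print(post)                                # Q1: [0.0, 0.1, 0.4, 0.3, 0.2]
print(predict_lime(post, p_lime_given_h))  # Q2: 0.65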
Bayesian Theorem
Given training data X, the posterior probability of a hypothesis H, P(H|X), follows Bayes'
theorem:
P(H|X) = P(X|H) · P(H) / P(X)
Informally, this can be written as
posterior = likelihood × prior / evidence
Predicts that X belongs to Ci iff the probability P(Ci|X) is the highest among all the P(Ck|X) for
all k classes
Practical difficulty: requires initial knowledge of many probabilities and has significant
computational cost
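
A hedged sketch of this decision rule: pick the class Ci that maximizes P(X|Ci) · P(Ci); the evidence P(X) is the same for every class and can be dropped. The classes, prior, and likelihood below are illustrative placeholders, not values from the slides.

def map_classify(x, classes, prior, likelihood):
    # Maximum a posteriori decision: argmax over Ci of P(x | Ci) * P(Ci)
    # prior is a dict {class: P(Ci)}; likelihood(x, c) returns P(x | c)
    return max(classes, key=lambda c: likelihood(x, c) * prior[c])

# Toy usage with made-up numbers:
prior = {"buys": 0.6, "does_not_buy": 0.4}
likelihood = lambda x, c: 0.8 if (c == "buys") == (x["income"] == "high") else 0.2
print(map_classify(list(prior), prior, prior, likelihood) if False else
      map_classify({"income": "high"}, list(prior), prior, likelihood))   # -> "buys"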
For a continuous-valued attribute, the class-conditional probability can be estimated with a Gaussian density:
g(x, μ, σ) = (1 / (√(2π) σ)) · e^(−(x − μ)² / (2σ²))
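
A small Python sketch of this density; the example statistics (mean 38 and standard deviation 12 for an age attribute within a class) are illustrative numbers, not values from the slides.

import math

def gaussian(x, mu, sigma):
    # g(x, mu, sigma) = 1 / (sqrt(2*pi) * sigma) * exp(-(x - mu)^2 / (2 * sigma^2))
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (math.sqrt(2 * math.pi) * sigma)

# e.g. an estimate of P(age = 35 | class) when the class has mean age 38, std dev 12
print(gaussian(35, mu=38, sigma=12))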
• Ex. Suppose a dataset with 1000 tuples: income = low (0 tuples), income = medium (990), and income = high (10)
• Use Laplacian correction (or Laplacian estimator)
– Adding 1 to each case
Prob(income = low) = (0 + 1)/(1000 + 3) = 1/1003
Prob(income = medium) = (990 + 1)/(1000 + 3) = 991/1003
Prob(income = high) = (10 + 1)/(1000 + 3) = 11/1003
– The “corrected” probability estimates are close to their “uncorrected” counterparts
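
A short sketch of the correction above, using the income counts from the slide:

counts = {"low": 0, "medium": 990, "high": 10}    # counts in the 1000 tuples
n, k = sum(counts.values()), len(counts)          # n = 1000 tuples, k = 3 values

# Laplacian correction: add 1 to each count and k to the denominator
corrected = {value: (c + 1) / (n + k) for value, c in counts.items()}
print(corrected)   # low: 1/1003, medium: 991/1003, high: 11/1003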
ML – maximum likelihood
assumes a uniform prior over H
appropriate when (1) there is no preferable hypothesis a priori, and (2) the data set is large
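
A minimal sketch contrasting MAP and ML hypothesis selection: under a uniform prior the prior factor is constant, so the two rules pick the same hypothesis (the names hypotheses, prior, and likelihood are illustrative, not from the slides).

def map_hypothesis(data, hypotheses, prior, likelihood):
    # MAP: argmax over h of P(data | h) * P(h)
    return max(hypotheses, key=lambda h: likelihood(data, h) * prior[h])

def ml_hypothesis(data, hypotheses, likelihood):
    # ML: argmax over h of P(data | h) -- the MAP rule with the prior dropped
    return max(hypotheses, key=lambda h: likelihood(data, h))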
EM Algorithm steps
E-step computes the expected value pij of the hidden
indicator variables Zij, where Zij is 1 if xj was generated by
the i-th component and 0 otherwise
M-step finds the new values of the parameters that
maximize the log likelihood of the data, given the
expected values of Zij
• EM increases the log likelihood of the data at each iteration.
• EM can reach a local maximum in likelihood.
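
A minimal EM sketch for a two-component one-dimensional Gaussian mixture, following the E-step / M-step description above; the initialisation and the variable names (w, mu, sigma, p) are assumptions for illustration.

import math, random

def gauss(x, mu, sigma):
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (math.sqrt(2 * math.pi) * sigma)

def em(xs, iters=50):
    mu = [min(xs), max(xs)]              # crude initialisation
    sigma = [1.0, 1.0]
    w = [0.5, 0.5]                       # mixing weights
    for _ in range(iters):
        # E-step: p[j][i] = expected value of the indicator Z_ij,
        # i.e. the probability that component i generated x_j
        p = []
        for x in xs:
            num = [w[i] * gauss(x, mu[i], sigma[i]) for i in range(2)]
            z = sum(num)
            p.append([v / z for v in num])
        # M-step: re-estimate the parameters that maximize the expected
        # log likelihood given the p_ij
        for i in range(2):
            ni = sum(pj[i] for pj in p)
            w[i] = ni / len(xs)
            mu[i] = sum(pj[i] * x for pj, x in zip(p, xs)) / ni
            sigma[i] = math.sqrt(sum(pj[i] * (x - mu[i]) ** 2 for pj, x in zip(p, xs)) / ni) or 1e-6
    return w, mu, sigma

xs = [random.gauss(0, 1) for _ in range(200)] + [random.gauss(5, 1) for _ in range(200)]
print(em(xs))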
Nearest-neighbor models
The key idea: Neighbors are similar
Density estimation example: estimate x’s probability density by the density of its
neighbors
Connections with table lookup, naive Bayes classification (NBC), decision trees, …
How to define the neighborhood N
If too small, it contains no data points
If too big, the density estimate is the same everywhere
A solution is to define N to contain k points, where k is large enough to ensure a
meaningful estimate
• For a fixed k, the size of N varies
• The effect of the size of k
• For most low-dimensional data, k is usually between 5 and 10
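
A short k-nearest-neighbour classification sketch in the same spirit: classify a query point xq by a majority vote among its k nearest training points (Euclidean distance; the points, labels, and k below are illustrative).

import math
from collections import Counter

def knn_classify(xq, data, k=5):
    # data is a list of (point, label) pairs; points are tuples of floats
    neighbours = sorted(data, key=lambda pl: math.dist(xq, pl[0]))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]

# Positive (+) and negative (-) points around a query point, as in the figure below
data = [((0, 0), "-"), ((1, 0), "+"), ((0, 1), "+"), ((2, 2), "-"), ((1, 1), "+")]
print(knn_classify((0.9, 0.9), data, k=3))    # -> "+"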
[Figure: k-nearest-neighbor example – a query point xq surrounded by positive (+) and negative (−) training examples]
Thanks
Naveen Aggarwal