Probability and Information

A brief review

7/03 H Liu (ASU) & G Dong (WSU)


Probability
 Probability provides a way of summarizing the uncertainty that comes from our laziness and ignorance - how wonderful it is!
 Probability expresses a degree of belief in the truth of a sentence
 1 - the sentence is true; 0 - the sentence is false
 0 < P < 1 - intermediate degrees of belief in the truth of the sentence
 Degree of truth (fuzzy logic) vs. degree of belief (probability)
 All probability statements must indicate the evidence with respect to which the probability is being assessed
 Prior or unconditional probability - assessed before any evidence is obtained
 Posterior or conditional probability - assessed after evidence is obtained

Basic probability notation
 Prior probability
 Proposition: P(Sunny)
 Random variable: P(Weather = Sunny)
 Each random variable has a domain
 e.g., Weather has domain <Sunny, Cloudy, Rain, Snow>
 Probability distribution: P(Weather) = <0.7, 0.2, 0.08, 0.02>
 A random variable is not a number; a number may be obtained by observing a RV
 A random variable can be continuous or discrete
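A minimal sketch (added here, not from the original slides) of how the discrete distribution P(Weather) above could be represented in Python; the variable name is hypothetical.

    # Assumed example: the discrete distribution P(Weather) from the slide,
    # stored as a mapping from domain values to probabilities.
    weather_dist = {"Sunny": 0.7, "Cloudy": 0.2, "Rain": 0.08, "Snow": 0.02}

    # The probabilities over the whole domain must sum to 1.
    assert abs(sum(weather_dist.values()) - 1.0) < 1e-9

    print(weather_dist["Sunny"])  # P(Weather = Sunny) = 0.7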

Conditional Probability
 Definition
 P(A|B) = P(A^B)/P(B)
 Product rule
 P(A^B) = P(A|B)P(B)
 Probabilistic inference does not work like
logical inference.
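A small sketch (added, with made-up numbers) of the definition and the product rule:

    # Assumed numbers: any P(B) > 0 works.
    p_a_and_b = 0.04  # P(A ^ B)
    p_b = 0.05        # P(B)

    # Definition: P(A|B) = P(A ^ B) / P(B), defined only when P(B) > 0.
    p_a_given_b = p_a_and_b / p_b  # 0.8

    # Product rule: P(A ^ B) = P(A|B) P(B) recovers the joint probability.
    assert abs(p_a_given_b * p_b - p_a_and_b) < 1e-12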

The axioms of probability
 All probabilities are between 0 and 1: 0 <= P(A) <= 1
 Necessarily true (valid) propositions have probability 1; necessarily false (unsatisfiable) propositions have probability 0
 The probability of a disjunction: P(A v B) = P(A) + P(B) - P(A ^ B)
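A brief sketch (added, with assumed numbers) checking the disjunction axiom numerically:

    # Assumed numbers for illustration.
    p_a, p_b, p_a_and_b = 0.10, 0.05, 0.04

    # P(A v B) = P(A) + P(B) - P(A ^ B)
    p_a_or_b = p_a + p_b - p_a_and_b  # 0.11

    # The result must itself satisfy the first axiom.
    assert 0.0 <= p_a_or_b <= 1.0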

The joint probability distribution
 The joint distribution completely specifies the probability assignments for all propositions in the domain
 A probabilistic model consists of a set of random variables (X1, …, Xn)
 An atomic event is an assignment of particular values to all the variables
 Marginalization rule for RVs Y and Z: P(Y) = Σz P(Y, z), summing over all values z of Z
 Let’s see an example next
Joint Probability
 An example with two Boolean variables

              Toothache    !Toothache
 Cavity          0.04         0.06
 !Cavity         0.01         0.89

 Observation: the four atomic events are mutually exclusive and collectively exhaustive
 What are (see the sketch below)
 P(Cavity) =
 P(Cavity v Toothache) =
 P(Cavity ^ Toothache) =
 P(Cavity | Toothache) =
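A sketch (added) that answers the four questions by summing entries of the joint table above; the dictionary layout is just one possible representation.

    # joint[(cavity, toothache)] = P(Cavity = cavity ^ Toothache = toothache)
    joint = {
        (True,  True):  0.04,
        (True,  False): 0.06,
        (False, True):  0.01,
        (False, False): 0.89,
    }

    # Marginalization: P(Cavity) = sum over Toothache of the joint entries.
    p_cavity = joint[(True, True)] + joint[(True, False)]     # 0.10
    p_toothache = joint[(True, True)] + joint[(False, True)]  # 0.05

    # P(Cavity v Toothache) by inclusion-exclusion.
    p_cavity_or_toothache = p_cavity + p_toothache - joint[(True, True)]  # 0.11

    # P(Cavity ^ Toothache) is read directly off the table.
    p_cavity_and_toothache = joint[(True, True)]  # 0.04

    # P(Cavity | Toothache) = P(Cavity ^ Toothache) / P(Toothache).
    p_cavity_given_toothache = p_cavity_and_toothache / p_toothache  # 0.8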
Bayes’ rule
 Deriving the rule via the product rule:
 P(B|A) = P(A|B) P(B) / P(A)
 P(A) can be viewed as a normalization factor that makes P(B|A) + P(!B|A) = 1
 P(A) = P(A|B) P(B) + P(A|!B) P(!B)
 A more general case is P(X|Y) = P(Y|X) P(X) / P(Y)
 Bayes’ rule conditionalized on evidence E: P(X|Y,E) = P(Y|X,E) P(X|E) / P(Y|E)
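A short sketch (added) of Bayes' rule with the normalization step, reusing the cavity/toothache numbers from the joint table above:

    # Evidence A = Toothache, hypothesis B = Cavity (numbers from the joint table).
    p_b = 0.10                     # P(Cavity)
    p_a_given_b = 0.04 / 0.10      # P(Toothache | Cavity)  = 0.4
    p_a_given_not_b = 0.01 / 0.90  # P(Toothache | !Cavity) ≈ 0.011

    # Normalization: P(A) = P(A|B) P(B) + P(A|!B) P(!B).
    p_a = p_a_given_b * p_b + p_a_given_not_b * (1 - p_b)  # P(Toothache) = 0.05

    # Bayes' rule: P(B|A) = P(A|B) P(B) / P(A).
    p_b_given_a = p_a_given_b * p_b / p_a
    print(round(p_b_given_a, 3))  # 0.8, matching P(Cavity | Toothache) above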
Independence
 Independent events A, B
 P(B|A)=P(B),
 P(A|B)=P(A),
 P(A,B)=P(A)P(B)
 Conditional independence
 P(X|Y,Z)=P(X|Z) – given Z, X and Y are independent
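A tiny sketch (added) of the numerical test P(A ^ B) = P(A) P(B), using the cavity/toothache numbers above:

    # From the joint table: P(Toothache) = 0.05, P(Cavity) = 0.10, P(both) = 0.04.
    p_a, p_b, p_a_and_b = 0.05, 0.10, 0.04
    independent = abs(p_a_and_b - p_a * p_b) < 1e-12
    print(independent)  # False: Toothache and Cavity are not independent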

Entropy
 Entropy measures the homogeneity/purity of a set of examples
 Equivalently, it measures information content: the less you need to know (to determine the class of a new case), the more information you already have
 With two classes (P, N) in S, with p and n instances respectively, let t = p + n; view [p, n] as the class distribution of S
 Entropy(S) = - (p/t) log2 (p/t) - (n/t) log2 (n/t)
 E.g., p = 9, n = 5: Entropy(S) = Entropy([9,5]) = - (9/14) log2 (9/14) - (5/14) log2 (5/14) = 0.940
 E.g., Entropy([14,0]) = 0; Entropy([7,7]) = 1
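A sketch (added) of the two-class entropy formula, reproducing the values on this slide:

    from math import log2

    def entropy(p, n):
        """Two-class entropy of a set with p positive and n negative examples."""
        t = p + n
        result = 0.0
        for count in (p, n):
            if count > 0:  # 0 log 0 is taken to be 0
                result -= (count / t) * log2(count / t)
        return result

    print(round(entropy(9, 5), 3))  # 0.94
    print(entropy(14, 0))           # 0.0
    print(entropy(7, 7))            # 1.0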
Entropy curve
 [Figure: the 2-class entropy plotted against p/(p+n), rising from 0 to a maximum of 1 at 0.5 and falling back to 0]
 For p/(p+n) between 0 and 1, the 2-class entropy is
 0 when p/(p+n) is 0
 1 when p/(p+n) is 0.5
 0 when p/(p+n) is 1
 monotonically increasing between 0 and 0.5
 monotonically decreasing between 0.5 and 1
 When the data is pure, no bits are needed to convey the class; when the two classes are evenly split, 1 bit per example is needed
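A sketch (added) that sweeps the positive-class fraction q = p/(p+n) and evaluates the two-class entropy, confirming the shape described above:

    from math import log2

    def h(q):
        # Endpoints: a pure set (q = 0 or q = 1) has zero entropy.
        if q in (0.0, 1.0):
            return 0.0
        return -q * log2(q) - (1 - q) * log2(1 - q)

    for q in (0.0, 0.1, 0.25, 0.5, 0.75, 0.9, 1.0):
        print(q, round(h(q), 3))  # rises from 0 to 1 at q = 0.5, then falls back to 0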
