Probability Review
Thursday Sep 13
Probability Review
• Events and Event spaces
• Random variables
• Joint probability distributions
• Marginalization, conditioning, chain rule,
Bayes Rule, law of total probability, etc.
• Structural properties
• Independence, conditional independence
• Mean and Variance
• The big picture
• Examples
Sample space and Events
• Sample Space Ω: the set of possible results of an experiment
• If you toss a coin twice, Ω = {HH, HT, TH, TT}
• Event: a subset of Ω
• First toss is head = {HH, HT}
• S: event space, a set of events
• Closed under finite union and complements
• Entails other binary operations: union, difference, etc.
• Contains the empty event and Ω
Probability Measure
• Defined over (Ω, S) s.t.
• P(α) ≥ 0 for all α in S
• P(Ω) = 1
• If α, β are disjoint, then P(α ∪ β) = P(α) + P(β)
• We can deduce other axioms from the above ones
• Ex: P(α ∪ β) for non-disjoint events:
  P(α ∪ β) = P(α) + P(β) − P(α ∩ β)
Visualization
[Venn diagram relating p(F ∩ H), p(F | H), and p(H)]
Rule of total probability
[Figure: events B1, …, B7 partition the sample space; A overlaps several of them]
P(A) = Σ_i P(B_i) P(A | B_i)
From Events to Random Variable
• Almost all the semester we will be dealing with RVs
• Concise way of specifying attributes of outcomes
• Modeling students (Grade and Intelligence):
  • Ω = all possible students
  • What are events?
    • Grade_A = all students with grade A
    • Grade_B = all students with grade B
    • Intelligence_High = … with high intelligence
  • Very cumbersome
• We need "functions" that map from Ω to an attribute space.
  • P(G = A) = P({student ϵ Ω : G(student) = A})
Random Variables
[Figure: students in Ω mapped by I:Intelligence to {High, low} and by G:Grade to {B, A+}]
Continuous Random Variables
• Probability density function (pdf) instead of
probability mass function (pmf)
• A pdf is any function f(x) that describes the
probability density in terms of the input
variable x.
Probability of Continuous RV
• Properties of pdf
  f(x) ≥ 0, ∀x
  ∫ f(x) dx = 1
• Cumulative distribution function (cdf):
  F_X(v) = ∫_{−∞}^{v} f(x) dx
  (d/dx) F_X(x) = f(x)
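As a quick numeric sketch of these properties, the code below integrates the exponential density f(x) = e^(−x) on [0, ∞) with a simple trapezoidal rule (the density and the integration helper are chosen here just for illustration):

```python
import math

# Exponential pdf f(x) = exp(-x) on [0, inf).
def f(x):
    return math.exp(-x)

# Simple trapezoidal integration of g on [a, b].
def integrate(g, a, b, n=100_000):
    h = (b - a) / n
    s = 0.5 * (g(a) + g(b)) + sum(g(a + i * h) for i in range(1, n))
    return s * h

# The pdf integrates to 1 (the tail beyond x = 30 is negligible),
# and the cdf F(v) is the integral of f up to v.
total = integrate(f, 0.0, 30.0)
F_2 = integrate(f, 0.0, 2.0)
print(round(total, 4), round(F_2, 4))  # ≈ 1.0 and ≈ 0.8647 (= 1 - e^-2)
```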
Common Distributions
• Normal: X ∼ N(μ, σ²)
  f(x) = (1 / (√(2π) σ)) exp(−(x − μ)² / (2σ²))
• Multivariate normal: X ∼ N(μ, Σ)
  f(x) = (1 / ((2π)^{d/2} |Σ|^{1/2})) exp(−½ (x − μ)ᵀ Σ⁻¹ (x − μ))
[Figure: bell curve centered at the mean]
Probability Review
• Events and Event spaces
• Random variables
• Joint probability distributions
• Marginalization, conditioning, chain rule,
Bayes Rule, law of total probability, etc.
• Structural properties
• Independence, conditional independence
• Mean and Variance
• The big picture
• Examples
Joint Probability Distribution
• Random variables encode attributes
• Not all possible combinations of attributes are equally likely
• Joint probability distributions quantify this
  • P(X = x, Y = y) = P(x, y)
  • Generalizes to N RVs
  • Discrete: Σ_x Σ_y P(X = x, Y = y) = 1
  • Continuous: ∫_x ∫_y f_{X,Y}(x, y) dx dy = 1
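A discrete joint distribution is just a table whose entries sum to 1; a minimal sketch (the table below is made up for illustration):

```python
# Hypothetical joint distribution P(X, Y) over X in {0, 1} and
# Y in {0, 1, 2}; entries are non-negative and must sum to 1.
joint = [
    [0.10, 0.20, 0.10],  # P(X=0, Y=y) for y = 0, 1, 2
    [0.30, 0.20, 0.10],  # P(X=1, Y=y) for y = 0, 1, 2
]

total = sum(p for row in joint for p in row)
print(total)  # 1, up to float rounding
```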
Chain Rule
• Always true
• P(x, y, z) = p(x) p(y|x) p(z|x, y)
= p(z) p(y|z) p(x|y, z)
=…
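The chain rule is an identity, so any joint can be rebuilt exactly from its factors. A sketch on a made-up three-variable table: compute p(x) and p(x, y) by marginalizing, then check that p(x) p(y|x) p(z|x, y) reproduces every entry.

```python
import itertools

# Made-up joint p(x, y, z) over binary X, Y, Z (sums to 1).
vals = list(itertools.product((0, 1), repeat=3))
probs = [0.10, 0.05, 0.15, 0.10, 0.20, 0.05, 0.25, 0.10]
joint = dict(zip(vals, probs))

def marg(keep):
    """Marginal over the variables at the index positions in `keep`."""
    out = {}
    for v, p in joint.items():
        key = tuple(v[i] for i in keep)
        out[key] = out.get(key, 0.0) + p
    return out

p_x = marg((0,))
p_xy = marg((0, 1))

# p(x, y, z) = p(x) * p(y | x) * p(z | x, y) for every cell
ok = all(
    abs(joint[(x, y, z)]
        - p_x[(x,)]
        * (p_xy[(x, y)] / p_x[(x,)])          # p(y | x)
        * (joint[(x, y, z)] / p_xy[(x, y)]))  # p(z | x, y)
    < 1e-12
    for (x, y, z) in joint
)
print(ok)  # True
```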
Conditional Probability
• For events:
  P(X = x | Y = y) = P(X = x, Y = y) / P(Y = y)
• In shorthand:
  P(x | y) = p(x, y) / p(y)
Marginalization
• We know p(X, Y); what is P(X = x)?
• We can use the law of total probability. Why?
  p(x) = Σ_y P(x, y)
       = Σ_y P(y) P(x | y)
[Figure: partition B1, …, B7 with event A, as in the rule of total probability]
Marginalization Cont.
• Another example
  p(x) = Σ_{y,z} P(x, y, z)
       = Σ_{y,z} P(y, z) P(x | y, z)
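Marginalizing a stored joint table is a one-line sum; a minimal sketch with made-up numbers:

```python
# Hypothetical joint table P(X, Y); marginalize out Y to get P(X).
joint = {
    (0, 'sun'): 0.3, (0, 'rain'): 0.1,
    (1, 'sun'): 0.4, (1, 'rain'): 0.2,
}

# p(x) = sum over y of P(x, y)
p_x = {}
for (x, y), p in joint.items():
    p_x[x] = p_x.get(x, 0.0) + p
print(p_x)  # P(X=0) = 0.4, P(X=1) = 0.6
```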
Bayes Rule
• We know that P(rain) = 0.5
• If we also know that the grass is wet, how does this affect our belief about whether it rained or not?
  P(rain | wet) = P(rain) P(wet | rain) / P(wet)
• In general:
  P(x | y) = P(x) P(y | x) / P(y)
Bayes Rule cont.
• You can condition on more variables
  P(x | y, z) = P(x | z) P(y | x, z) / P(y | z)
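A numeric sketch of the rain/wet-grass update: P(rain) = 0.5 is from the slide, but the two likelihoods below are assumed values chosen only for illustration.

```python
p_rain = 0.5                # from the slide
p_wet_given_rain = 0.9      # assumed, for illustration
p_wet_given_no_rain = 0.2   # assumed, for illustration

# Law of total probability for the evidence P(wet)
p_wet = p_wet_given_rain * p_rain + p_wet_given_no_rain * (1 - p_rain)

# Bayes rule: P(rain | wet) = P(rain) P(wet | rain) / P(wet)
p_rain_given_wet = p_rain * p_wet_given_rain / p_wet
print(round(p_rain_given_wet, 3))  # 0.818
```

Observing wet grass raises the belief in rain from 0.5 to about 0.82 under these assumed likelihoods.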
Probability Review
• Events and Event spaces
• Random variables
• Joint probability distributions
• Marginalization, conditioning, chain rule,
Bayes Rule, law of total probability, etc.
• Structural properties
• Independence, conditional independence
• Mean and Variance
• The big picture
• Examples
Independence
• X is independent of Y means that knowing Y
does not change our belief about X.
• P(X|Y=y) = P(X)
• P(X=x, Y=y) = P(X=x) P(Y=y)
• The above should hold for all x, y
• It is symmetric and written as X ⊥ Y
Independence
• X1, …, Xn are independent if and only if
  P(X1 ∈ A1, …, Xn ∈ An) = Π_{i=1}^{n} P(Xi ∈ Ai)
• i.i.d.: X1, …, Xn independent, each with the same distribution P, written X1, …, Xn ∼ P
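Checking the product condition on a stored joint table is straightforward; a minimal sketch (toy numbers, and `is_independent` is a hypothetical helper, not a library function):

```python
def is_independent(joint, xs, ys, tol=1e-9):
    """Check P(x, y) = P(x) P(y) for every cell of a joint table."""
    px = {x: sum(joint[(x, y)] for y in ys) for x in xs}
    py = {y: sum(joint[(x, y)] for x in xs) for y in ys}
    return all(abs(joint[(x, y)] - px[x] * py[y]) <= tol
               for x in xs for y in ys)

# Independent: the joint is the product of its marginals.
indep = {(0, 0): 0.18, (0, 1): 0.12, (1, 0): 0.42, (1, 1): 0.28}
# Dependent: all mass on the diagonal.
dep = {(0, 0): 0.5, (0, 1): 0.0, (1, 0): 0.0, (1, 1): 0.5}

print(is_independent(indep, [0, 1], [0, 1]))  # True
print(is_independent(dep, [0, 1], [0, 1]))    # False
```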
CI: Conditional Independence
• RVs are rarely independent, but we can still leverage local structural properties like conditional independence.
• X ⊥ Y | Z if once Z is observed, knowing the value of Y does not change our belief about X
• Ex: rain ⊥ sprinkler's on | cloudy holds, but rain ⊥ sprinkler's on | wet grass does not (observing wet grass couples them)
Conditional Independence
• P(X=x | Z=z, Y=y) = P(X=x | Z=z)
• P(Y=y | Z=z, X=x) = P(Y=y | Z=z)
• P(X=x, Y=y | Z=z) = P(X=x| Z=z) P(Y=y| Z=z)
We call these terms factors: a very useful concept!
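A sketch of verifying the factorization P(x, y | z) = P(x | z) P(y | z): the joint below is made up, constructed from factors precisely so that X ⊥ Y | Z holds, and the loop re-checks it cell by cell.

```python
# Build a joint P(X, Y, Z) from factors p(z) p(x|z) p(y|z),
# which makes X and Y conditionally independent given Z by design.
p_z = {0: 0.4, 1: 0.6}
p_x_given_z = {0: {0: 0.2, 1: 0.8}, 1: {0: 0.7, 1: 0.3}}
p_y_given_z = {0: {0: 0.5, 1: 0.5}, 1: {0: 0.1, 1: 0.9}}

joint = {(x, y, z): p_z[z] * p_x_given_z[z][x] * p_y_given_z[z][y]
         for z in p_z for x in (0, 1) for y in (0, 1)}

# Check P(x, y | z) = P(x | z) P(y | z) for every (x, y, z).
ci_holds = True
for z in p_z:
    pz = sum(joint[(x, y, z)] for x in (0, 1) for y in (0, 1))
    for x in (0, 1):
        for y in (0, 1):
            pxy_z = joint[(x, y, z)] / pz
            px_z = sum(joint[(x, yy, z)] for yy in (0, 1)) / pz
            py_z = sum(joint[(xx, y, z)] for xx in (0, 1)) / pz
            if abs(pxy_z - px_z * py_z) > 1e-9:
                ci_holds = False
print(ci_holds)  # True
```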
Probability Review
• Events and Event spaces
• Random variables
• Joint probability distributions
• Marginalization, conditioning, chain rule,
Bayes Rule, law of total probability, etc.
• Structural properties
• Independence, conditional independence
• Mean and Variance
• The big picture
• Examples
Mean and Variance
• Mean (Expectation): μ = E[X]
  – Discrete RVs: E[X] = Σ_i v_i P(X = v_i)
  – Continuous RVs: E[X] = ∫ x f(x) dx
• For a function g: E[g(X)] = ∫ g(x) f(x) dx
Mean and Variance
• Variance: σ² = Var(X) = E[(X − μ)²]
  Var(X) = E[X²] − μ²
  – Discrete RVs: V[X] = Σ_i (v_i − μ)² P(X = v_i)
  – Continuous RVs: V[X] = ∫ (x − μ)² f(x) dx
• Covariance: Cov(X, Y) = E[(X − μ_x)(Y − μ_y)] = E[XY] − μ_x μ_y
Mean and Variance
• Correlation: ρ(X, Y) = Cov(X, Y) / (σ_x σ_y)
  −1 ≤ ρ(X, Y) ≤ 1
Properties
• Mean
  – E[X + Y] = E[X] + E[Y]
  – E[aX] = a E[X]
  – If X and Y are independent, E[XY] = E[X] E[Y]
• Variance
  – V[aX + b] = a² V[X]
  – If X and Y are independent, V[X + Y] = V[X] + V[Y]
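These properties can be verified exactly on small pmfs; a sketch with two made-up independent discrete RVs:

```python
# Toy pmfs (value -> probability) for two independent RVs.
X = {1: 0.5, 2: 0.5}
Y = {0: 0.25, 4: 0.75}

def E(pmf):
    """Expectation of a discrete pmf."""
    return sum(v * p for v, p in pmf.items())

def V(pmf):
    """Variance of a discrete pmf."""
    mu = E(pmf)
    return sum((v - mu) ** 2 * p for v, p in pmf.items())

# Joint pmf of (X, Y) under independence: P(x, y) = P(x) P(y)
joint = {(x, y): px * py for x, px in X.items() for y, py in Y.items()}

E_sum = sum((x + y) * p for (x, y), p in joint.items())
E_prod = sum(x * y * p for (x, y), p in joint.items())
V_sum = sum((x + y) ** 2 * p for (x, y), p in joint.items()) - E_sum ** 2

print(abs(E_sum - (E(X) + E(Y))) < 1e-9)   # E[X+Y] = E[X] + E[Y]
print(abs(E_prod - E(X) * E(Y)) < 1e-9)    # E[XY] = E[X]E[Y] (indep.)
print(abs(V_sum - (V(X) + V(Y))) < 1e-9)   # V[X+Y] = V[X] + V[Y] (indep.)
```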
Some more properties
• The conditional expectation of Y given X when the value of X = x is:
  E[Y | X = x] = ∫ y · p(y | x) dy
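In the discrete case the integral becomes a sum over p(y | x); a small sketch on a made-up joint table:

```python
# Made-up joint table P(X, Y) with X in {0, 1} and Y in {10, 20}.
joint = {(0, 10): 0.2, (0, 20): 0.2, (1, 10): 0.1, (1, 20): 0.5}

def cond_exp_y(x):
    """E[Y | X = x] = sum over y of y * p(y | x)."""
    px = sum(p for (xx, y), p in joint.items() if xx == x)
    return sum(y * p / px for (xx, y), p in joint.items() if xx == x)

print(cond_exp_y(0), cond_exp_y(1))  # 15.0 and ≈ 18.33
```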
Probability Review
• Events and Event spaces
• Random variables
• Joint probability distributions
• Marginalization, conditioning, chain rule,
Bayes Rule, law of total probability, etc.
• Structural properties
• Independence, conditional independence
• Mean and Variance
• The big picture
• Examples
The Big Picture
[Diagram: Model → Data via Probability; Data → Model via Estimation/learning]
Statistical Inference
• Given observations from a model
  – What (conditional) independence assumptions hold?
    • Structure learning
  – If you know the family of the model (e.g., multinomial), what are the values of the parameters? MLE, Bayesian estimation.
    • Parameter learning
Probability Review
• Events and Event spaces
• Random variables
• Joint probability distributions
• Marginalization, conditioning, chain rule,
Bayes Rule, law of total probability, etc.
• Structural properties
• Independence, conditional independence
• Mean and Variance
• The big picture
• Examples
Monty Hall Problem
• You're given the choice of three doors: Behind one
door is a car; behind the others, goats.
• You pick a door, say No. 1
• The host, who knows what's behind the doors, opens
another door, say No. 3, which has a goat.
• Do you want to pick door No. 2 instead?
[Figure: if your door hides the car, the host reveals Goat A or Goat B; if your door hides Goat A, the host must reveal Goat B; if it hides Goat B, the host must reveal Goat A]
Monty Hall Problem: Bayes Rule
• Ci: the car is behind door i, i = 1, 2, 3
• P(Ci) = 1/3
• Hij: the host opens door j after you pick door i
• P(Hij | Ck) =
    0    if i = j
    0    if j = k
    1/2  if i = k
    1    if i ≠ k, j ≠ k
Monty Hall Problem: Bayes Rule cont.
• WLOG, i = 1, j = 3
• P(C1 | H13) = P(H13 | C1) P(C1) / P(H13)
• P(H13 | C1) P(C1) = (1/2)(1/3) = 1/6
Monty Hall Problem: Bayes Rule cont.
• P(H13) = P(H13, C1) + P(H13, C2) + P(H13, C3)
         = P(H13 | C1) P(C1) + P(H13 | C2) P(C2) + P(H13 | C3) P(C3)
         = 1/6 + 1·(1/3) + 0 = 1/2
• P(C1 | H13) = (1/6) / (1/2) = 1/3
Monty Hall Problem: Bayes Rule cont.
P(C1 | H13) = (1/6) / (1/2) = 1/3
P(C2 | H13) = 1 − P(C1 | H13) − P(C3 | H13) = 1 − 1/3 − 0 = 2/3
You should switch!
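The Bayes-rule answer can be checked by Monte Carlo simulation; a minimal sketch:

```python
import random

def play(switch, rng):
    """Play one round of Monty Hall; return True if the final pick wins."""
    doors = [0, 1, 2]
    car = rng.choice(doors)
    pick = rng.choice(doors)
    # Host opens a door that is neither the pick nor the car.
    opened = rng.choice([d for d in doors if d != pick and d != car])
    if switch:
        pick = next(d for d in doors if d != pick and d != opened)
    return pick == car

rng = random.Random(0)
n = 100_000
wins_switch = sum(play(True, rng) for _ in range(n)) / n
wins_stay = sum(play(False, rng) for _ in range(n)) / n
print(round(wins_switch, 2), round(wins_stay, 2))  # ≈ 0.67 and ≈ 0.33
```

Switching wins about 2/3 of the time, matching P(C2 | H13) = 2/3 above.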
Information Theory
• P(X) encodes our uncertainty about X
• Some variables are more uncertain than others
[Figure: a peaked distribution P(X) vs. a flat distribution P(Y)]
• Entropy:
  H(P(X)) = E[log(1/P(x))] = Σ_x P(x) log(1/P(x)) = −Σ_x P(x) log P(x)
Information Theory cont.
• Entropy: average number of bits required to encode X
  H(P(X)) = E[log(1/P(x))] = −Σ_x P(x) log P(x)
• Mutual information:
  I(X; Y) = Σ_y Σ_x p(x, y) log [ p(x, y) / (p(x) p(y)) ]
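Both quantities are direct sums over a table; a sketch in bits (base-2 logs) on a made-up joint distribution:

```python
import math

# Made-up joint table over binary X and Y.
joint = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}

# Marginals p(x) and p(y) from the joint.
px, py = {}, {}
for (x, y), p in joint.items():
    px[x] = px.get(x, 0.0) + p
    py[y] = py.get(y, 0.0) + p

# Entropy H(X) = -sum p(x) log2 p(x)
H_x = -sum(p * math.log2(p) for p in px.values() if p > 0)
# Mutual information I(X; Y) = sum p(x,y) log2 [p(x,y) / (p(x)p(y))]
I_xy = sum(p * math.log2(p / (px[x] * py[y]))
           for (x, y), p in joint.items() if p > 0)
print(round(H_x, 3), round(I_xy, 3))  # 1.0 0.278
```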
Chi Square Test for Independence (Example)

          Republican  Democrat  Independent  Total
  Male       200        150          50        400
  Female     250        300          50        600
  Total      450        450         100       1000
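The test statistic for this table can be computed by hand: each expected count is (row total × column total) / N, and the statistic sums (observed − expected)² / expected over all cells. A sketch:

```python
# Observed counts from the table above (rows: Male, Female;
# columns: Republican, Democrat, Independent).
observed = [
    [200, 150, 50],
    [250, 300, 50],
]
row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Pearson chi-square: sum of (O - E)^2 / E with E = row * col / N.
chi2 = sum(
    (observed[i][j] - row_totals[i] * col_totals[j] / n) ** 2
    / (row_totals[i] * col_totals[j] / n)
    for i in range(2) for j in range(3)
)
df = (2 - 1) * (3 - 1)  # (rows - 1) * (columns - 1)
print(round(chi2, 2), df)  # 16.2 2
```

The statistic 16.2 exceeds 5.99, the 5% critical value of the chi-square distribution with 2 degrees of freedom, so gender and party affiliation are not independent in this sample.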