L12 Bayesian Network
Unit 12:
Bayesian Network
Outline
• Independence
• Conditional Independence
• Bayesian Network
• Inference in Bayesian Network
References:
• Chapter 14 in Russell & Norvig
• CS188 Lecture Note: Bayes Nets
Independence
Independence
Product rule: P(x, y) = P(x | y) P(y)
▪ If two variables x and y are independent, then P(x | y) = P(x), so P(x, y) = P(x) P(y)
is given by:
Answer:
First, create the marginal distributions:
W      P(W)
sun    0.5
rain   0.5

T      P(T)
hot    0.6
cold   0.4

Next, use the independence assumption to compute the joint
distribution P2(T, W):
T      W      P2(T, W)
hot    sun    0.3
hot    rain   0.3
cold   sun    0.2
cold   rain   0.2
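The construction above can be sketched in a few lines of Python. This is a minimal illustration using the marginals from the example; the dictionary names `p_t`, `p_w`, and `joint` are hypothetical.

```python
from itertools import product

# Marginal distributions from the example (names are illustrative).
p_t = {"hot": 0.6, "cold": 0.4}
p_w = {"sun": 0.5, "rain": 0.5}

# Under independence, P2(T, W) = P(T) * P(W) for every pair of values.
joint = {(t, w): p_t[t] * p_w[w] for t, w in product(p_t, p_w)}

for (t, w), p in joint.items():
    print(f"{t:5s} {w:5s} {p:.2f}")
```

Because each entry is a product of two marginal entries, the joint table is fully determined by the two small tables.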
Simplifying Joint Distributions
8 entries
4 entries
32 entries
P(X|Y, Z) = P(X|Z)
P(Y|X, Z) = P(Y|Z)
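Either statement leads to a factorized form of the conditional joint. A short derivation from the product rule, assuming the conditioning events have non-zero probability:

```latex
P(X, Y \mid Z) = P(X \mid Y, Z)\, P(Y \mid Z) = P(X \mid Z)\, P(Y \mid Z)
```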
Example
T⊥U|R
A⊥F|S
Example 1
In a region where it rains on average 120 out of 365 days, 40% of
employees are expected to be late for work when it rains. Rain also
causes a traffic jam 80% of the time. Assume that an employee coming
late and a traffic jam are conditionally independent given rain. What is
the probability that an employee arrives on time, there is a traffic jam,
and it is a rainy day?
Information given:
• P(rain) = 120/365 = 0.329
• P(late | rain) = 0.40
• P(jam | rain) = 0.80
• Late ⊥ Jam | Rain
Query:
• P(¬late, jam, rain)
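One way to work the query out is to use the conditional independence to factor the joint into the three given quantities. The sketch below is illustrative; the variable names are hypothetical.

```python
# Quantities given in the example.
p_rain = 120 / 365
p_late_given_rain = 0.40
p_jam_given_rain = 0.80

# Arriving on time is the complement of being late.
p_on_time_given_rain = 1 - p_late_given_rain

# Late ⊥ Jam | Rain lets the joint factor as
# P(on time, jam, rain) = P(on time | rain) * P(jam | rain) * P(rain).
p_query = p_on_time_given_rain * p_jam_given_rain * p_rain
print(round(p_query, 4))  # 0.1578
```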
Simplifying Joint Distribution with Conditional Independence
▪ Let's consider P(A, B, C, D)
Apply the chain rule:
P(A, B, C, D) = P(D|A, B, C) P(A, B, C)
= P(D|A, B, C) P(C|A, B) P(A, B)
= P(D|A, B, C) P(C|A, B) P(B|A) P(A)
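The chain rule holds for any joint distribution, with no independence assumptions. As a quick numerical check, the sketch below builds a random joint over four binary variables (illustrative values, not taken from the slides) and confirms that the product of the four factors reproduces every entry. The helper names `marginal` and `chain_rule` are hypothetical.

```python
import itertools
import random

random.seed(0)

# A random joint distribution over four binary variables A, B, C, D
# (illustrative values only; not taken from the slides).
outcomes = list(itertools.product([0, 1], repeat=4))
weights = [random.random() for _ in outcomes]
total = sum(weights)
joint = {o: w / total for o, w in zip(outcomes, weights)}

def marginal(prefix):
    """P of the first len(prefix) variables taking the values in prefix."""
    return sum(p for o, p in joint.items() if o[:len(prefix)] == prefix)

def chain_rule(a, b, c, d):
    """P(D|A,B,C) P(C|A,B) P(B|A) P(A), each conditional built as a ratio."""
    p_a = marginal((a,))
    p_b_given_a = marginal((a, b)) / p_a
    p_c_given_ab = marginal((a, b, c)) / marginal((a, b))
    p_d_given_abc = marginal((a, b, c, d)) / marginal((a, b, c))
    return p_d_given_abc * p_c_given_ab * p_b_given_a * p_a

max_error = max(abs(chain_rule(*o) - joint[o]) for o in outcomes)
print(max_error < 1e-12)
```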
Why Bayesian Network
▪ Joint distributions
• Represented using one single big table
• Typically huge: the table grows exponentially with the
number of variables
• Hard to learn or estimate anything empirically
Auto Insurance Probabilistic Model
Total 27 variables. If all variables are binary, a full joint
distribution needs to store 2^27 (~134 million) entries.
Observation
A variable is locally related to a
few other variables.
Bayesian Network
▪ Bayesian Network
• Make assumptions on the conditional independence of
certain variables
• Represented using multiple local conditional probability
tables (CPTs)
• Models how variables interact locally
• Local interactions chain together to give global, indirect
interactions
A Bayesian network is a probabilistic graphical model
that represents a set of variables and their conditional
dependencies via a directed acyclic graph.
[Figure: directed acyclic graph over the nodes A, B, C, D]
The Bayesian Network
▪ Given
P(A, B, C, D) = P(D|B, C) P(C|A) P(B|A) P(A)
P(B|A)
      b      ¬b
a     0.67   0.33
¬a    0.3    0.7

P(C|A)
      c      ¬c
a     0.7    0.3
¬a    0.52   0.48

P(D|B, C)
          d      ¬d
b, c      0.32   0.68
b, ¬c     0.28   0.72
¬b, c     0.63   0.37
¬b, ¬c    0.45   0.55

P(¬d|b, c) = 1 – P(d|b, c)
Network Size
▪ Bayesian Network gives a huge space savings compared to the full joint
distribution P(X1, X2, … , Xn).
▪ Considering only non-root variables in the BN, assume n binary
variables:
• Size of the full joint distribution:
2^n ‒ 1 ≈ 2^n
• Using the Bayesian Net, if each node has up to k parents, the n
nodes together have size at most:
n · 2^k
For example:
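As a rough illustration of the gap between the two sizes, the sketch below plugs in n = 27 (the variable count quoted for the auto insurance model) and an assumed bound of k = 3 parents per node. The value of k and the function names are assumptions made only for this illustration.

```python
def full_joint_size(n: int) -> int:
    """Free parameters in a full joint table over n binary variables."""
    return 2 ** n - 1

def bayes_net_size(n: int, k: int) -> int:
    """Upper bound from the slide: n nodes, each with at most k parents."""
    return n * 2 ** k

n, k = 27, 3  # k = 3 is an assumed bound, chosen only for illustration
print(full_joint_size(n))    # 134217727
print(bayes_net_size(n, k))  # 216
```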
Example: Traffic
Given the following Bayesian Network, get the full joint distribution
P(R, T)
[Figure: network R → T]

R     P(R)
r     1/4
¬r    3/4

R     T     P(T|R)
r     t     3/4
r     ¬t    1/4
¬r    t     1/2
¬r    ¬t    1/2

Answer:
P(r, t) = P(r)P(t|r) = (1/4)(3/4) = 3/16
P(r, ¬t) = P(r)P(¬t|r) = (1/4)(1/4) = 1/16
P(¬r, t) = P(¬r)P(t|¬r) = (3/4)(1/2) = 3/8
P(¬r, ¬t) = P(¬r)P(¬t|¬r) = (3/4)(1/2) = 3/8

P(T, R)
      t      ¬t
r     3/16   1/16
¬r    3/8    3/8
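The same table can be produced mechanically by multiplying the local entries. The following is a minimal sketch using exact fractions; the string labels `"r"`, `"-r"`, `"t"`, `"-t"` and the dictionary names are illustrative choices.

```python
from fractions import Fraction as F

# CPTs from the traffic example; "r"/"-r" mean rain / no rain,
# "t"/"-t" mean traffic / no traffic (string labels are illustrative).
p_r = {"r": F(1, 4), "-r": F(3, 4)}
p_t_given_r = {
    ("r", "t"): F(3, 4), ("r", "-t"): F(1, 4),
    ("-r", "t"): F(1, 2), ("-r", "-t"): F(1, 2),
}

# Each joint entry is the product of the local entries: P(R, T) = P(R) P(T | R).
joint = {(r, t): p_r[r] * p for (r, t), p in p_t_given_r.items()}

for (r, t), p in joint.items():
    print(r, t, p)
```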
Example: Alarm Network
Variables
▪ B: Burglary
▪ E: Earthquake
▪ A: Alarm goes off
▪ M: Mary calls
▪ J: John calls
[Figure: network with edges B → A, E → A, A → J, A → M]
Example: Alarm Network (cont.)
Answer:
P(j, m, a, ¬b, ¬e)
= P(¬e) P(¬b) P(a|¬b, ¬e) P(j|a) P(m|a)
= 0.998 × 0.999 × 0.001 × 0.9 × 0.7
≈ 0.00063
Inference in
Bayesian Network
Bayes’ Net Inference by Enumeration
Example: Alarm Network
What is the probability of burglary if
both John and Mary call?
P(b | j, m) = P(b, j, m) / P(j, m)
Example: Alarm Network (cont.)
P(j, m) = P(b, j, m) + P(¬b, j, m)
= 0.00059224259 + 0.001491857649
= 0.002084100239
P(b | j, m) = P(b, j, m) / P(j, m) = 0.00059224259 / 0.002084100239 ≈ 0.284
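The two terms in this sum come from inference by enumeration: for each value of B, sum the full joint over the hidden variables E and A. The sketch below is one way to reproduce the numbers. The CPT values are an assumption: they are the standard textbook numbers for this network (Russell & Norvig, Chapter 14), which are consistent with the figures quoted here. The function names are hypothetical.

```python
from itertools import product

# CPT values below are the standard textbook numbers for this network
# (Russell & Norvig, Chapter 14). They are an assumption here, since the
# tables themselves are not reproduced in these notes.
P_B = 0.001
P_E = 0.002
P_A = {(True, True): 0.95, (True, False): 0.94,
       (False, True): 0.29, (False, False): 0.001}  # P(a | B, E)
P_J = {True: 0.90, False: 0.05}  # P(j | A)
P_M = {True: 0.70, False: 0.01}  # P(m | A)

def prob(p_true, value):
    """Return P(X = value) given P(X = True)."""
    return p_true if value else 1.0 - p_true

def joint(b, e, a, j, m):
    """P(b, e, a, j, m) as the product of the five local CPT entries."""
    return (prob(P_B, b) * prob(P_E, e) * prob(P_A[(b, e)], a)
            * prob(P_J[a], j) * prob(P_M[a], m))

def p_b_j_m(b):
    """P(B = b, j, m): sum the joint over the hidden variables E and A."""
    return sum(joint(b, e, a, True, True)
               for e, a in product([True, False], repeat=2))

p_jm = p_b_j_m(True) + p_b_j_m(False)
posterior = p_b_j_m(True) / p_jm
print(round(p_b_j_m(True), 8), round(p_jm, 8), round(posterior, 3))
```

Enumeration touches every assignment of the hidden variables, so its cost grows exponentially with their number; for a five-node network that is only four terms per query value.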
Bayes’ Net Summary
▪ Conditional Independences
▪ Bayes’ Net Representation
• A directed, acyclic graph, one node per random variable
• A conditional probability table (CPT) for each node
• Implicitly encodes the joint distribution:
P(x1, x2, … , xn) = ∏ P(xi | parents(Xi)), with the product taken over i = 1, … , n
AI Algorithms
[Figure: map of AI topics, including Search Problem, Constraint Satisfaction Problem, Adversarial Game, Logic, Markov Decision Process, Probabilistic Inference, Bayesian Networks, Machine Learning, Deep Learning, Data, and Model]
The End