
UCCD2063

Artificial Intelligence Techniques

Unit 12:
Bayesian Network
Outline
• Independence
• Conditional Independence
• Bayesian Network
• Inference in Bayesian Network

References:
• Chapter 14 in Russell & Norvig
• CS188 Lecture Note: Bayes Nets [link]
Independence

▪ An event A is independent of event B if A is not affected by B.


▪ For example
• When you toss a coin, the probability of getting head at
any time is not affected by previous tosses.
• The probability that today is A’s birthday is independent
of the date of B’s birthday.

▪ To denote the independence of two variables A and B, we use the
following notation:

A ⊥ B
Independence
Product rule: P(x, y) = P(x | y) P(y)

▪ If two variables x and y are independent: P(x, y) = P(x) P(y)

▪ For example, if passing an exam and the weather are independent:

  E      P(E)           W      P(W)
  pass   0.85           hazy   0.6
  ¬pass  0.15           sunny  0.1
                        rainy  0.3

  (Passing an exam)     (Weather)

Then, the probability of passing the exam and it being a sunny day is:

  P(pass, sunny) = P(pass) P(sunny)
                 = 0.85 × 0.1 = 0.085

Note: this is only correct if the weather and passing the exam are independent.
Independence

▪ For two independent variables, the joint distribution can be derived
directly from their marginal distributions.
▪ For example, the joint distribution of the two independent variables
Passing an exam and Weather is given by:

         hazy              sunny              rainy
pass     0.85*0.6 = 0.51   0.85*0.1 = 0.085   0.85*0.3 = 0.255
¬pass    0.15*0.6 = 0.09   0.15*0.1 = 0.015   0.15*0.3 = 0.045
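The same construction is easy to do programmatically. Below is a minimal Python sketch (not part of the original slides) that builds the joint table from the two marginals under the independence assumption; the variable names are illustrative only.

```python
# A minimal sketch: building the joint distribution of two independent
# variables from their marginals, using the tables above.
from itertools import product

p_exam = {"pass": 0.85, "not pass": 0.15}               # P(E)
p_weather = {"hazy": 0.6, "sunny": 0.1, "rainy": 0.3}   # P(W)

# Independence assumption: P(e, w) = P(e) * P(w)
joint = {(e, w): p_exam[e] * p_weather[w]
         for e, w in product(p_exam, p_weather)}

print(joint[("pass", "sunny")])   # ≈ 0.085, matching the slide
print(sum(joint.values()))        # 1.0: a joint distribution must sum to 1
```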
Independence

▪ Independent variables also have the following properties:

  P(X | Y) = P(X)        P(Y | X) = P(Y)

▪ Consider the previous example:


W and E are independent:
P(pass | hazy) = P(pass, hazy)/P(hazy)
= 0.51 / 0.6
= 0.85
= P(pass)

P(hazy| pass) = P(pass, hazy)/P(pass)


= 0.51 / 0.85
= 0.6
= P(hazy)
Example

Given the following joint distribution P1(T, W), determine whether T and W
are independent.

  P1(T, W):
  T     W     P
  hot   sun   0.4
  hot   rain  0.2
  cold  sun   0.1
  cold  rain  0.3

Answer:

First, create the marginal distributions:

  W     P          T     P
  sun   0.5        hot   0.6
  rain  0.5        cold  0.4

Next, use the independence assumption to compute the joint distribution
P2(T, W):

  T     W     P
  hot   sun   0.3
  hot   rain  0.3
  cold  sun   0.2
  cold  rain  0.2

T and W are not independent because the original joint distribution
P1(T, W) is very different from the joint distribution P2(T, W) generated
under the independence assumption.
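A small Python sketch (not from the slides) of the same check: marginalize P1, rebuild the joint under the independence assumption, and compare. The dictionary-based representation is just one convenient choice.

```python
# Check independence by comparing the original joint P1 with the product
# of its marginals P2.
from collections import defaultdict

p1 = {("hot", "sun"): 0.4, ("hot", "rain"): 0.2,
      ("cold", "sun"): 0.1, ("cold", "rain"): 0.3}

# Marginalize: P(T=t) = sum over w of P1(t, w), and similarly for W.
p_t, p_w = defaultdict(float), defaultdict(float)
for (t, w), p in p1.items():
    p_t[t] += p
    p_w[w] += p

# Joint under the independence assumption.
p2 = {(t, w): p_t[t] * p_w[w] for (t, w) in p1}

independent = all(abs(p1[k] - p2[k]) < 1e-9 for k in p1)
print(p2)           # ≈ {(hot,sun): 0.3, (hot,rain): 0.3, (cold,sun): 0.2, (cold,rain): 0.2}
print(independent)  # False: T and W are not independent
```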
Simplifying Joint Distributions

▪ The independence property can simplify the representation of a joint
probability distribution.
▪ For example, if Weather is independent of Cavity, Toothache and
Catch, we may break the original table into two tables:

P(Toothache, Catch, Cavity, Weather)


= P(Toothache, Catch, Cavity) P(Weather)

Simplifying Joint Distributions

▪ This results in fewer entries:

P(Toothache, Catch, Cavity, Weather) = P(Toothache, Catch, Cavity) P(Weather)
        32 entries                            8 entries             4 entries

Assume Weather: {rainy, sunny, foggy, snowy},
Cavity: {cavity, ¬cavity},
Toothache: {toothache, ¬toothache},
Catch: {catch, ¬catch}.

The number of entries reduces from 32 (i.e., 4 × 2 × 2 × 2) to 12
(i.e., 4 + 2 × 2 × 2).
Example

• For n independent unbiased coins:

  The full joint distribution P(Coin1, …, Coinn) has 2^n entries.

  With the independence assumption, we only need to keep n tables,
  each of size 2. Total entries = 2n.

  Since the probabilities of all tosses are the same (i.e., all n tables
  are identical), we need to store only 1 table. Total entries = 2.
  This is a huge saving in storage.

  Coin   P(Coin)
  head   0.5
  tail   0.5
Conditional Independence

▪ Unconditional independence is rare, and independent variables cannot be
  used to infer one another.
▪ Conditional independence:
  • Two dependent variables may become independent given some other
    variable → needs at least 3 variables to work.
  • Example:
    • Runny Nose is dependent on Fever: once we know someone has a
      fever, it becomes more likely that they also have a runny nose.
      (Fever — Runny Nose)
    • Runny Nose is independent of Fever given the disease (Flu): once
      we know that a person has the flu, knowing one symptom no longer
      affects the likelihood of the other symptom.
      (Flu → Fever, Flu → Runny Nose)
Conditional Independence

▪ If X is conditionally independent of Y given Z, written

  X ⊥ Y | Z

  the following rules apply:

  P(X, Y | Z) = P(X | Z) P(Y | Z)

  P(X | Y, Z) = P(X | Z)

  P(Y | X, Z) = P(Y | Z)
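To make the first rule concrete, here is a hedged Python sketch (not from the slides) that builds a toy joint distribution in which X ⊥ Y | Z holds by construction and verifies P(X, Y | Z) = P(X | Z) P(Y | Z) numerically; all probability values below are made up for illustration.

```python
# Numerically checking P(X, Y | Z) = P(X | Z) P(Y | Z) on a toy distribution
# where X ⊥ Y | Z holds by construction.
from itertools import product

p_z = {"flu": 0.1, "no_flu": 0.9}              # P(Z)
p_x_given_z = {"flu": 0.8, "no_flu": 0.2}      # P(X = fever | Z)
p_y_given_z = {"flu": 0.7, "no_flu": 0.1}      # P(Y = runny_nose | Z)

def joint(z, x, y):
    """P(z, x, y) built assuming X and Y are independent given Z."""
    px = p_x_given_z[z] if x else 1 - p_x_given_z[z]
    py = p_y_given_z[z] if y else 1 - p_y_given_z[z]
    return p_z[z] * px * py

# Check P(X, Y | Z = flu) == P(X | Z = flu) * P(Y | Z = flu) for every (x, y).
z = "flu"
p_z_total = sum(joint(z, x, y) for x, y in product([True, False], repeat=2))
for x, y in product([True, False], repeat=2):
    lhs = joint(z, x, y) / p_z_total
    px = p_x_given_z[z] if x else 1 - p_x_given_z[z]
    py = p_y_given_z[z] if y else 1 - p_y_given_z[z]
    assert abs(lhs - px * py) < 1e-12
print("P(X, Y | Z) = P(X | Z) P(Y | Z) holds for Z = flu")
```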
Example

What are the conditional independence relationships for


the following domains:

▪ Traffic jam (T)


▪ Somebody carries an umbrella (U)
▪ It is raining (R)

T ⊥ U | R

▪ There is fire (F)


▪ Smoke is present (S)
▪ Alarm triggers (based on smoke only) (A)

A ⊥ F | S
Example 1
In a region, it rains on average 120 out of 365 days. When it rains, 40% of
employees are expected to be late for work, and rain causes a traffic jam
80% of the time.

Assume that an employee coming late and a traffic jam are conditionally
independent given the rain. What is the probability that the employee
arrives on time, there is a traffic jam, and it is a rainy day?

Information given:
• P(rain) = 120/365 = 0.329
• P(late | rain) = 0.40
• P(jam | rain) = 0.80
• Late ⊥ Jam | Rain

Query:
• P(¬late, jam, rain)

P(¬late, jam, rain)
= P(¬late, jam | rain) P(rain)              (product rule)
= P(¬late | rain) P(jam | rain) P(rain)     (conditional independence assumption)
= (1 − P(late | rain)) P(jam | rain) P(rain)
= 0.6 × 0.8 × 0.329
= 0.15792
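A minimal Python sketch (not from the slides) of the same calculation; the variable names are illustrative.

```python
# Example 1: P(¬late, jam, rain) = P(¬late | rain) P(jam | rain) P(rain),
# using the conditional independence Late ⊥ Jam | Rain.
p_rain = 120 / 365            # ≈ 0.329
p_late_given_rain = 0.40
p_jam_given_rain = 0.80

p_on_time_given_rain = 1 - p_late_given_rain          # P(¬late | rain)
p_query = p_on_time_given_rain * p_jam_given_rain * p_rain
print(round(p_query, 5))   # ≈ 0.15781 (the slide rounds P(rain) to 0.329, giving 0.15792)
```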
Example 2

In a region, it rains on average 120 out of 365 days, and the probability
of it raining and an employee being late is 0.30.

Assume that an employee coming late and a traffic jam are conditionally
independent given the rain. What is the probability of an employee coming
late given that it is raining and there is no traffic jam?

Information given:
• P(rain) = 120/365 = 0.329
• P(late, rain) = 0.3
• Late ⊥ Jam | Rain

Query:
• P(late | ¬jam, rain)

P(late | ¬jam, rain)
= P(late | rain)              (conditional independence assumption)
= P(late, rain) / P(rain)     (product rule)
= 0.3 / 0.329
= 0.912
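And the corresponding sketch (again not from the slides) for Example 2.

```python
# Example 2: because Late ⊥ Jam | Rain, conditioning on ¬jam changes nothing:
# P(late | ¬jam, rain) = P(late | rain) = P(late, rain) / P(rain).
p_rain = 120 / 365
p_late_and_rain = 0.30

p_late_given_rain = p_late_and_rain / p_rain
print(p_late_given_rain)   # ≈ 0.9125; the slide, rounding P(rain) to 0.329, reports 0.912
```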
Chain Rule

▪ A joint probability distribution can be expressed using conditional
distributions by applying the product rule repeatedly (a chain of product
rules). For example:

  Product rule: P(x, y) = P(x | y) P(y)

  P(Umbrella, Traffic, Rain)
  = P(Umbrella | Traffic, Rain) P(Traffic, Rain)
  = P(Umbrella | Traffic, Rain) P(Traffic | Rain) P(Rain)

  or, expanding in a different order,
  = P(Rain | Traffic, Umbrella) P(Traffic | Umbrella) P(Umbrella)

▪ The chain rule:

  P(X1, X2, …, Xn) = P(Xn | X1, …, Xn−1) P(Xn−1 | X1, …, Xn−2) … P(X2 | X1) P(X1)
Simplifying Joint Distribution with Conditional Independence
▪ Let's consider P(A, B, C, D)
Apply the chain rule:
P(A, B, C, D) = P(D|A, B, C) P(A, B, C)
= P(D|A, B, C) P(C|A, B) P(A, B)
= P(D|A, B, C) P(C|A, B) P(B|A) P(A)

▪ If we make conditional independence assumption, we get:

Assume D is independent of A given B and C: P(D|A,B,C) = P(D|B,C)

= P(D|B, C) P(C|A, B) P(B|A) P(A)


Assume C is independent of B given A: P(C|A,B) = P(C|A)
= P(D|B, C) P(C|A) P(B|A) P(A)
Bayesian Network
(Bayes’ Net)

Why Bayesian Network
▪ Joint distributions
  • Represented using one single big table
  • Typically huge: grows exponentially with the number of variables
  • Hard to learn or estimate anything empirically

[Figure: an auto insurance probabilistic model with 27 variables in total.
If all variables are binary, the full joint distribution needs to store
2^27 (~134 million) entries.]

Observation
A variable is locally related to only a few other variables.
Bayesian Network
▪ Bayesian Network
• Make assumptions on the conditional independence of
certain variables
• Represented using multiple local conditional probabilities
tables (CPTs)
• Models how variables interact locally
• Local interactions chain together to give global, indirect
interactions

A Bayesian network is a probabilistic graphical model that represents a
set of variables and their conditional dependencies via a directed
acyclic graph (DAG).

[Example DAG: A → B, A → C, B → D, C → D]
The Bayesian Network

▪ Given

  P(A, B, C, D) = P(D|B, C) P(C|A) P(B|A) P(A)

▪ P(D|B, C) P(C|A) P(B|A) P(A) is essentially a Bayesian Network, which
can be visualized as the DAG: A → B, A → C, B → D, C → D.

The topology of the network:
▪ Each node represents a variable
▪ The edge from A to B represents that B is conditioned on A: P(B|A)
▪ A variable can be conditioned on more than one variable. For example,
  D is conditioned on B and C: P(D|B, C)
The Bayesian Network
▪ Original: use one big table (i.e., P(A, B, C, D)) to model the joint distribution.
▪ Bayes' Net: when we make certain conditional independence assumptions, we
  can represent P(A, B, C, D) using a Bayesian Network (BN).
▪ The BN comprises multiple tables:
  • Prior distribution for root nodes, e.g., P(A)
  • Conditional Probability Table (CPT) for non-root nodes, e.g., P(B|A), P(C|A)
    and P(D|B, C)

  P(A):
          a      ¬a
          0.4    0.6

  P(B|A):                       P(C|A):
          b      ¬b                     c      ¬c
   a      0.67   0.33            a      0.7    0.3
   ¬a     0.3    0.7             ¬a     0.52   0.48

  P(D|B, C):
              d      ¬d
   b,  c     0.32   0.68
   b,  ¬c    0.28   0.72
   ¬b, c     0.63   0.37
   ¬b, ¬c    0.45   0.55

  Note: P(¬d | b, c) = 1 − P(d | b, c)
Network Size

▪ A Bayesian Network gives huge space savings compared to the full joint
  distribution P(X1, X2, …, Xn).
▪ Considering only non-root variables in the BN, and assuming n binary
  variables:
  • Size of the full joint distribution:
      2^n − 1 ≈ 2^n
  • Using the Bayes' Net, if each node has up to k parents, the n nodes
    will have total size:
      n·2^k

  Note: for each node (i.e., each CPT), there are 2^k parent configurations.

▪ If n >> k, then the number of entries in the BN (n·2^k) will be much
  smaller than in the full joint distribution (2^n).
▪ It is easier to construct local CPTs P(Xi | Parents(Xi)) than the full
  joint distribution P(X1, X2, …, Xn).
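A quick back-of-the-envelope check of these sizes for the earlier 27-variable auto-insurance example, assuming (hypothetically) at most k = 3 parents per node:

```python
# Comparing table sizes for n = 27 binary variables with an assumed k = 3
# parents per node (k is a hypothetical value for illustration).
n, k = 27, 3
full_joint_entries = 2 ** n          # 134,217,728
bayes_net_entries = n * 2 ** k       # 216
print(full_joint_entries, bayes_net_entries)
```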
Bayesian Network and Joint Distributions

▪ A Bayesian Network implicitly encodes the joint distribution.
▪ We can retrieve the full joint probability from a Bayes' Net by
multiplying the relevant conditionals (from the CPTs) together (chain rule):

  P(x1, …, xn) = ∏i P(xi | parents(Xi))

For example, with the network Cavity → Toothache, Cavity → Catch:

  P(cavity, catch, toothache)
  = P(cavity) P(toothache | cavity) P(catch | cavity)

  P(¬cavity, catch, toothache)
  = P(¬cavity) P(toothache | ¬cavity) P(catch | ¬cavity)
Example: Traffic
Given the following Bayesian Network, compute the full joint distribution
P(R, T).

  R → T

  R     P(R)          R     T     P(T|R)
  r     1/4           r     t     3/4
  ¬r    3/4           r     ¬t    1/4
                      ¬r    t     1/2
                      ¬r    ¬t    1/2

Answer:

  P(r, t)   = P(r) P(t|r)     = (1/4)(3/4) = 3/16
  P(r, ¬t)  = P(r) P(¬t|r)    = (1/4)(1/4) = 1/16
  P(¬r, t)  = P(¬r) P(t|¬r)   = (3/4)(1/2) = 3/8
  P(¬r, ¬t) = P(¬r) P(¬t|¬r)  = (3/4)(1/2) = 3/8

  P(T, R):
          t       ¬t
  r       3/16    1/16
  ¬r      3/8     3/8
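A small sketch (not from the slides) that derives the same joint table from the two CPTs; Fraction is used only so the output matches the slide's exact fractions.

```python
# Deriving the joint table P(R, T) for the two-node network R → T from its CPTs.
from fractions import Fraction as F

p_r = {"r": F(1, 4), "not r": F(3, 4)}                                 # P(R)
p_t_given_r = {("r", "t"): F(3, 4), ("r", "not t"): F(1, 4),
               ("not r", "t"): F(1, 2), ("not r", "not t"): F(1, 2)}   # P(T | R)

joint = {(r, t): p_r[r] * p for (r, t), p in p_t_given_r.items()}
for (r, t), p in joint.items():
    print(f"P({r}, {t}) = {p}")
# P(r, t) = 3/16, P(r, not t) = 1/16, P(not r, t) = 3/8, P(not r, not t) = 3/8
```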
Example: Alarm Network

Build a Bayesian Network for the following scenario:

You went for a holiday and asked your two neighbors, John and Mary, to
call if they heard the alarm ringing. The alarm can be triggered by minor
earthquakes or a burglar.

Variables
▪ B: Burglary
▪ E: Earthquake
▪ A: Alarm goes off
▪ M: Mary calls
▪ J: John calls

Network: B → A, E → A, A → J, A → M
Example: Alarm Network (cont.)

Given the following Bayesian Network, what is the probability of the
event (j, m, a, ¬b, ¬e) happening?

[Alarm network: Burglary → Alarm ← Earthquake; Alarm → JohnCalls,
Alarm → MaryCalls]

Answer:

P(j, m, a, ¬b, ¬e)
= P(¬e) P(¬b) P(a | ¬b, ¬e) P(j | a) P(m | a)
= 0.998 × 0.999 × 0.001 × 0.9 × 0.7
= 0.00063
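The same calculation as a tiny Python sketch (not from the slides); the five numbers are the CPT entries quoted in the answer above.

```python
# P(j, m, a, ¬b, ¬e) for the alarm network: multiply one CPT entry per node.
p_not_e = 0.998                  # P(¬e)
p_not_b = 0.999                  # P(¬b)
p_a_given_not_b_not_e = 0.001    # P(a | ¬b, ¬e)
p_j_given_a = 0.9                # P(j | a)
p_m_given_a = 0.7                # P(m | a)

p = p_not_e * p_not_b * p_a_given_not_b_not_e * p_j_given_a * p_m_given_a
print(round(p, 5))   # ≈ 0.00063
```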
Inference in
Bayesian Network

Bayes’ Net Inference by Enumeration

▪ We want to find out the probability of a certain event happening.
▪ For example: what is the probability of burglary if both John and
  Mary call?

  P(b | j, m) = ?

  • Evidence variables: JohnCalls, MaryCalls
  • Query variable: Burglary
  • Hidden variables: Alarm, Earthquake

[Alarm network: Burglary → Alarm ← Earthquake; Alarm → JohnCalls,
Alarm → MaryCalls]
Example: Alarm Network
What is the probability of burglary if both John and Mary call?

  P(b | j, m) = P(b, j, m) / P(j, m)
Example: Alarm Network (cont.)
What is the probability of burglary if both John and Mary call?

Enumerate over the hidden variables (Earthquake and Alarm):

  P(b, j, m)  = Σe Σa P(b) P(e) P(a | b, e) P(j | a) P(m | a)
              = 0.00059224259
  P(¬b, j, m) = Σe Σa P(¬b) P(e) P(a | ¬b, e) P(j | a) P(m | a)
              = 0.001491857649
Example: Alarm Network (cont.)
What is the probability of burglary if both John and Mary call?

  P(j, m) = P(b, j, m) + P(¬b, j, m)
          = 0.00059224259 + 0.001491857649
          = 0.002084100239

  P(b | j, m) = P(b, j, m) / P(j, m)
              = 0.00059224259 / 0.002084100239
              = 0.2842
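For completeness, here is a hedged inference-by-enumeration sketch (not from the slides). The CPT values are the standard ones for this alarm network from Russell & Norvig (Chapter 14); they reproduce the totals computed above, but treat them as assumed inputs rather than values stated in this deck.

```python
# Inference by enumeration on the alarm network, assuming the standard
# Russell & Norvig CPT values. Variables are booleans; True = event occurs.
from itertools import product

P_B = 0.001                                    # P(b)
P_E = 0.002                                    # P(e)
P_A = {(True, True): 0.95, (True, False): 0.94,
       (False, True): 0.29, (False, False): 0.001}   # P(a | B, E)
P_J = {True: 0.90, False: 0.05}                # P(j | A)
P_M = {True: 0.70, False: 0.01}                # P(m | A)

def bern(p, value):
    return p if value else 1 - p

def joint(b, e, a, j, m):
    """Full joint via the chain rule over the network."""
    return (bern(P_B, b) * bern(P_E, e) * bern(P_A[(b, e)], a)
            * bern(P_J[a], j) * bern(P_M[a], m))

# P(b, j, m) and P(¬b, j, m): sum out the hidden variables E and A.
num = {}
for b in (True, False):
    num[b] = sum(joint(b, e, a, True, True)
                 for e, a in product((True, False), repeat=2))

p_b_given_jm = num[True] / (num[True] + num[False])
print(num[True], num[False])   # ≈ 0.00059224259 and 0.001491857649
print(round(p_b_given_jm, 4))  # ≈ 0.2842
```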
Bayes’ Net Summary
▪ Conditional Independences
▪ Bayes’ Net Representation
• A directed, acyclic graph, one node per random variable
• A conditional probability table (CPT) for each node
• Implicitly encode joint distributions:
  P(X1, …, Xn) = ∏i P(Xi | Parents(Xi))

▪ Probabilistic Inference in Bayes’ Net


• Enumeration (exact, exponential complexity, very slow)
• Variable elimination (exact, often better)
• Sampling (approximate, much faster)

▪ Learning Bayes’ Nets from Data

AI Algorithms

[Figure: AI algorithms overview — Search Problem, Machine Learning,
Deep Learning, Probabilistic Inference, Bayesian Networks, Markov Decision
Process, Constraint Satisfaction Problem, Adversarial Game, Logic, Data, Model]
The End
