
UCCD2063

Artificial Intelligence Techniques

Unit 12:
Bayesian Network
Outline
• Independence
• Conditional Independence
• Bayesian Network
• Inference in Bayesian Network

References:
• Chapter 14 in Russell & Norvig
• CS188 Lecture Note: Bayes Nets [link]
Independence

▪ An event A is independent of event B if A is not affected by B.


▪ For example
• When you toss a coin, the probability of getting head at
any time is not affected by previous tosses.
• The probability that today is A’s birthday is independent
of the date of B’s birthday.

▪ To denote the independence of two variables A and B, we use the
following notation:

A ⊥ B
Independence
Product rule: P(x, y) = P(x | y) P(y)

▪ If two variables x and y are independent: P(x, y) = P(x) P(y)

▪ For example, if passing an exam and the weather are independent:

  E      P(E)           W      P(W)
  pass   0.85           hazy   0.6
  ¬pass  0.15           sunny  0.1
                        rainy  0.3

  (Passing an exam)     (Weather)

Then, the probability of passing the exam and it being a sunny day is:

  P(pass, sunny) = P(pass) P(sunny)
                 = 0.85 × 0.1 = 0.085

Note: this is only correct if the weather and passing the exam are independent.
Independence

▪ For two independent variables, the joint distribution can be derived
directly from their marginal distributions.
▪ For example, the joint distribution of the two independent variables
Passing an exam and Weather is given by:

         hazy              sunny              rainy
pass     0.85*0.6 = 0.51   0.85*0.1 = 0.085   0.85*0.3 = 0.255
¬pass    0.15*0.6 = 0.09   0.15*0.1 = 0.015   0.15*0.3 = 0.045
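The same construction is easy to do programmatically. Below is a minimal Python sketch (not part of the original slides) that builds the joint table from the two marginals under the independence assumption; the variable names are illustrative only.

```python
# A minimal sketch: building the joint distribution of two independent
# variables from their marginals, using the tables above.
from itertools import product

p_exam = {"pass": 0.85, "not pass": 0.15}               # P(E)
p_weather = {"hazy": 0.6, "sunny": 0.1, "rainy": 0.3}   # P(W)

# Independence assumption: P(e, w) = P(e) * P(w)
joint = {(e, w): p_exam[e] * p_weather[w]
         for e, w in product(p_exam, p_weather)}

print(joint[("pass", "sunny")])   # ≈ 0.085, matching the slide
print(sum(joint.values()))        # 1.0: a joint distribution must sum to 1
```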
Independence

▪ Independent variables also have the following properties:

  P(X | Y) = P(X)        P(Y | X) = P(Y)

▪ Consider the previous example:


W and E are independent:
P(pass | hazy) = P(pass, hazy)/P(hazy)
= 0.51 / 0.6
= 0.85
= P(pass)

P(hazy| pass) = P(pass, hazy)/P(pass)


= 0.51 / 0.85
= 0.6
= P(hazy)
Example

Given the following joint distribution P1(T, W), determine whether T and W
are independent.

  P1(T, W):
  T     W     P
  hot   sun   0.4
  hot   rain  0.2
  cold  sun   0.1
  cold  rain  0.3

Answer:

First, create the marginal distributions:

  W     P          T     P
  sun   0.5        hot   0.6
  rain  0.5        cold  0.4

Next, use the independence assumption to compute the joint distribution
P2(T, W):

  T     W     P
  hot   sun   0.3
  hot   rain  0.3
  cold  sun   0.2
  cold  rain  0.2

T and W are not independent because the original joint distribution
P1(T, W) is very different from the joint distribution P2(T, W) generated
under the independence assumption.
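A small Python sketch (not from the slides) of the same check: marginalize P1, rebuild the joint under the independence assumption, and compare. The dictionary-based representation is just one convenient choice.

```python
# Check independence by comparing the original joint P1 with the product
# of its marginals P2.
from collections import defaultdict

p1 = {("hot", "sun"): 0.4, ("hot", "rain"): 0.2,
      ("cold", "sun"): 0.1, ("cold", "rain"): 0.3}

# Marginalize: P(T=t) = sum over w of P1(t, w), and similarly for W.
p_t, p_w = defaultdict(float), defaultdict(float)
for (t, w), p in p1.items():
    p_t[t] += p
    p_w[w] += p

# Joint under the independence assumption.
p2 = {(t, w): p_t[t] * p_w[w] for (t, w) in p1}

independent = all(abs(p1[k] - p2[k]) < 1e-9 for k in p1)
print(p2)           # ≈ {(hot,sun): 0.3, (hot,rain): 0.3, (cold,sun): 0.2, (cold,rain): 0.2}
print(independent)  # False: T and W are not independent
```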
Simplifying Joint Distributions

▪ The independence property can simplify the representation of a joint
probability distribution.
▪ For example, if Weather is independent of Cavity, Toothache and
Catch, we may break the original table into two tables:

P(Toothache, Catch, Cavity, Weather)


= P(Toothache, Catch, Cavity) P(Weather)

Simplifying Joint Distributions

▪ This results in fewer entries:

P(Toothache, Catch, Cavity, Weather) = P(Toothache, Catch, Cavity) P(Weather)
        32 entries                            8 entries             4 entries

Assume Weather: {rainy, sunny, foggy, snowy},
Cavity: {cavity, ¬cavity},
Toothache: {toothache, ¬toothache},
Catch: {catch, ¬catch}.

The number of entries reduces from 32 (i.e., 4 × 2 × 2 × 2) to 12
(i.e., 4 + 2 × 2 × 2).
Example

• For n independent unbiased coins:

  The full joint distribution P(Coin1, …, Coinn) has 2^n entries.

  With the independence assumption, we only need to keep n tables,
  each of size 2. Total entries = 2n.

  Since the probabilities of all tosses are the same (i.e., all n tables
  are identical), we need to store only 1 table. Total entries = 2.
  This is a huge saving in storage.

  Coin   P(Coin)
  head   0.5
  tail   0.5
Conditional Independence

▪ Unconditional independence is rare, and independent variables cannot be
  used to infer one another.
▪ Conditional independence:
  • Two dependent variables may become independent given some other
    variable → needs at least 3 variables to work.
  • Example:
    • Runny Nose is dependent on Fever: once we know someone has a
      fever, it becomes more likely that they also have a runny nose.
      (Fever — Runny Nose)
    • Runny Nose is independent of Fever given the disease (Flu): once
      we know that a person has the flu, knowing one symptom no longer
      affects the likelihood of the other symptom.
      (Flu → Fever, Flu → Runny Nose)
Conditional Independence

▪ If X is conditionally independent of Y given Z, written

  X ⊥ Y | Z

  the following rules apply:

  P(X, Y | Z) = P(X | Z) P(Y | Z)

  P(X | Y, Z) = P(X | Z)

  P(Y | X, Z) = P(Y | Z)
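To make the first rule concrete, here is a hedged Python sketch (not from the slides) that builds a toy joint distribution in which X ⊥ Y | Z holds by construction and verifies P(X, Y | Z) = P(X | Z) P(Y | Z) numerically; all probability values below are made up for illustration.

```python
# Numerically checking P(X, Y | Z) = P(X | Z) P(Y | Z) on a toy distribution
# where X ⊥ Y | Z holds by construction.
from itertools import product

p_z = {"flu": 0.1, "no_flu": 0.9}              # P(Z)
p_x_given_z = {"flu": 0.8, "no_flu": 0.2}      # P(X = fever | Z)
p_y_given_z = {"flu": 0.7, "no_flu": 0.1}      # P(Y = runny_nose | Z)

def joint(z, x, y):
    """P(z, x, y) built assuming X and Y are independent given Z."""
    px = p_x_given_z[z] if x else 1 - p_x_given_z[z]
    py = p_y_given_z[z] if y else 1 - p_y_given_z[z]
    return p_z[z] * px * py

# Check P(X, Y | Z = flu) == P(X | Z = flu) * P(Y | Z = flu) for every (x, y).
z = "flu"
p_z_total = sum(joint(z, x, y) for x, y in product([True, False], repeat=2))
for x, y in product([True, False], repeat=2):
    lhs = joint(z, x, y) / p_z_total
    px = p_x_given_z[z] if x else 1 - p_x_given_z[z]
    py = p_y_given_z[z] if y else 1 - p_y_given_z[z]
    assert abs(lhs - px * py) < 1e-12
print("P(X, Y | Z) = P(X | Z) P(Y | Z) holds for Z = flu")
```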
Example

What are the conditional independence relationships for


the following domains:

▪ Traffic jam (T)


▪ Somebody carries an umbrella (U)
▪ It is raining (R)

T ⊥ U | R

▪ There is fire (F)


▪ Smoke is present (S)
▪ Alarm triggers (based on smoke only) (A)

A ⊥ F | S
Example 1
In a region, it rains on average 120 out of 365 days. When it rains, 40% of
employees are expected to be late for work, and rain causes a traffic jam
80% of the time.

Assume that an employee coming late and a traffic jam are conditionally
independent given the rain. What is the probability that the employee
arrives on time, there is a traffic jam, and it is a rainy day?

Information given:
• P(rain) = 120/365 = 0.329
• P(late | rain) = 0.40
• P(jam | rain) = 0.80
• Late ⊥ Jam | Rain

Query:
• P(¬late, jam, rain)

P(¬late, jam, rain)
= P(¬late, jam | rain) P(rain)              (product rule)
= P(¬late | rain) P(jam | rain) P(rain)     (conditional independence assumption)
= (1 − P(late | rain)) P(jam | rain) P(rain)
= 0.6 × 0.8 × 0.329
= 0.15792
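A minimal Python sketch (not from the slides) of the same calculation; the variable names are illustrative.

```python
# Example 1: P(¬late, jam, rain) = P(¬late | rain) P(jam | rain) P(rain),
# using the conditional independence Late ⊥ Jam | Rain.
p_rain = 120 / 365            # ≈ 0.329
p_late_given_rain = 0.40
p_jam_given_rain = 0.80

p_on_time_given_rain = 1 - p_late_given_rain          # P(¬late | rain)
p_query = p_on_time_given_rain * p_jam_given_rain * p_rain
print(round(p_query, 5))   # ≈ 0.15781 (the slide rounds P(rain) to 0.329, giving 0.15792)
```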
Example 2

In a region, it rains on average 120 out of 365 days, and the probability
of it raining and an employee being late is 0.30.

Assume that an employee coming late and a traffic jam are conditionally
independent given the rain. What is the probability of an employee coming
late given that it is raining and there is no traffic jam?

Information given:
• P(rain) = 120/365 = 0.329
• P(late, rain) = 0.3
• Late ⊥ Jam | Rain

Query:
• P(late | ¬jam, rain)

P(late | ¬jam, rain)
= P(late | rain)              (conditional independence assumption)
= P(late, rain) / P(rain)     (product rule)
= 0.3 / 0.329
= 0.912
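And the corresponding sketch (again not from the slides) for Example 2.

```python
# Example 2: because Late ⊥ Jam | Rain, conditioning on ¬jam changes nothing:
# P(late | ¬jam, rain) = P(late | rain) = P(late, rain) / P(rain).
p_rain = 120 / 365
p_late_and_rain = 0.30

p_late_given_rain = p_late_and_rain / p_rain
print(p_late_given_rain)   # ≈ 0.9125; the slide, rounding P(rain) to 0.329, reports 0.912
```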
Chain Rule

▪ A joint probability distribution can be expressed using conditional
distributions by applying the product rule repeatedly (a chain of product
rules). For example:

  Product rule: P(x, y) = P(x | y) P(y)

  P(Umbrella, Traffic, Rain)
  = P(Umbrella | Traffic, Rain) P(Traffic, Rain)
  = P(Umbrella | Traffic, Rain) P(Traffic | Rain) P(Rain)

  or, expanding in a different order,
  = P(Rain | Traffic, Umbrella) P(Traffic | Umbrella) P(Umbrella)

▪ The chain rule:

  P(X1, X2, …, Xn) = P(Xn | X1, …, Xn−1) P(Xn−1 | X1, …, Xn−2) … P(X2 | X1) P(X1)
Simplifying Joint Distribution with Conditional Independence
▪ Let's consider P(A, B, C, D)
Apply the chain rule:
P(A, B, C, D) = P(D|A, B, C) P(A, B, C)
= P(D|A, B, C) P(C|A, B) P(A, B)
= P(D|A, B, C) P(C|A, B) P(B|A) P(A)

▪ If we make conditional independence assumption, we get:

Assume D is independent of A given B and C: P(D|A,B,C) = P(D|B,C)

= P(D|B, C) P(C|A, B) P(B|A) P(A)


Assume C is independent of B given A: P(C|A,B) = P(C|A)
= P(D|B, C) P(C|A) P(B|A) P(A)
Bayesian Network
(Bayes’ Net)

Why Bayesian Network
▪ Joint distributions
  • Represented using one single big table
  • Typically huge: grows exponentially with the number of variables
  • Hard to learn or estimate anything empirically

[Figure: an auto insurance probabilistic model with 27 variables in total.
If all variables are binary, the full joint distribution needs to store
2^27 (~134 million) entries.]

Observation
A variable is locally related to only a few other variables.
Bayesian Network
▪ Bayesian Network
• Make assumptions on the conditional independence of
certain variables
• Represented using multiple local conditional probabilities
tables (CPTs)
• Models how variables interact locally
• Local interactions chain together to give global, indirect
interactions

A Bayesian network is a probabilistic graphical model that represents a
set of variables and their conditional dependencies via a directed
acyclic graph (DAG).

[Example DAG: A → B, A → C, B → D, C → D]
The Bayesian Network

▪ Given

  P(A, B, C, D) = P(D|B, C) P(C|A) P(B|A) P(A)

▪ P(D|B, C) P(C|A) P(B|A) P(A) is essentially a Bayesian Network, which
can be visualized as the DAG: A → B, A → C, B → D, C → D.

The topology of the network:
▪ Each node represents a variable
▪ The edge from A to B represents that B is conditioned on A: P(B|A)
▪ A variable can be conditioned on more than one variable. For example,
  D is conditioned on B and C: P(D|B, C)
The Bayesian Network
▪ Original: use one big table (i.e., P(A, B, C, D)) to model the joint distribution.
▪ Bayes' Net: when we make certain conditional independence assumptions, we
  can represent P(A, B, C, D) using a Bayesian Network (BN).
▪ The BN comprises multiple tables:
  • Prior distribution for root nodes, e.g., P(A)
  • Conditional Probability Table (CPT) for non-root nodes, e.g., P(B|A), P(C|A)
    and P(D|B, C)

  P(A):
          a      ¬a
          0.4    0.6

  P(B|A):                       P(C|A):
          b      ¬b                     c      ¬c
   a      0.67   0.33            a      0.7    0.3
   ¬a     0.3    0.7             ¬a     0.52   0.48

  P(D|B, C):
              d      ¬d
   b,  c     0.32   0.68
   b,  ¬c    0.28   0.72
   ¬b, c     0.63   0.37
   ¬b, ¬c    0.45   0.55

  Note: P(¬d | b, c) = 1 − P(d | b, c)
Network Size

▪ A Bayesian Network gives huge space savings compared to the full joint
  distribution P(X1, X2, …, Xn).
▪ Considering only non-root variables in the BN, and assuming n binary
  variables:
  • Size of the full joint distribution:
      2^n − 1 ≈ 2^n
  • Using the Bayes' Net, if each node has up to k parents, the n nodes
    will have total size:
      n·2^k

  Note: for each node (i.e., each CPT), there are 2^k parent configurations.

▪ If n >> k, then the number of entries in the BN (n·2^k) will be much
  smaller than in the full joint distribution (2^n).
▪ It is easier to construct local CPTs P(Xi | Parents(Xi)) than the full
  joint distribution P(X1, X2, …, Xn).
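A quick back-of-the-envelope check of these sizes for the earlier 27-variable auto-insurance example, assuming (hypothetically) at most k = 3 parents per node:

```python
# Comparing table sizes for n = 27 binary variables with an assumed k = 3
# parents per node (k is a hypothetical value for illustration).
n, k = 27, 3
full_joint_entries = 2 ** n          # 134,217,728
bayes_net_entries = n * 2 ** k       # 216
print(full_joint_entries, bayes_net_entries)
```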
Bayesian Network and Joint Distributions

▪ A Bayesian Network implicitly encodes the joint distribution.
▪ We can retrieve the full joint probability from a Bayes' Net by
multiplying the relevant conditionals (from the CPTs) together (chain rule):

  P(x1, …, xn) = ∏i P(xi | parents(Xi))

For example, with the network Cavity → Toothache, Cavity → Catch:

  P(cavity, catch, toothache)
  = P(cavity) P(toothache | cavity) P(catch | cavity)

  P(¬cavity, catch, toothache)
  = P(¬cavity) P(toothache | ¬cavity) P(catch | ¬cavity)
Example: Traffic
Given the following Bayesian Network, compute the full joint distribution
P(R, T).

  R → T

  R     P(R)          R     T     P(T|R)
  r     1/4           r     t     3/4
  ¬r    3/4           r     ¬t    1/4
                      ¬r    t     1/2
                      ¬r    ¬t    1/2

Answer:

  P(r, t)   = P(r) P(t|r)     = (1/4)(3/4) = 3/16
  P(r, ¬t)  = P(r) P(¬t|r)    = (1/4)(1/4) = 1/16
  P(¬r, t)  = P(¬r) P(t|¬r)   = (3/4)(1/2) = 3/8
  P(¬r, ¬t) = P(¬r) P(¬t|¬r)  = (3/4)(1/2) = 3/8

  P(T, R):
          t       ¬t
  r       3/16    1/16
  ¬r      3/8     3/8
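A small sketch (not from the slides) that derives the same joint table from the two CPTs; Fraction is used only so the output matches the slide's exact fractions.

```python
# Deriving the joint table P(R, T) for the two-node network R → T from its CPTs.
from fractions import Fraction as F

p_r = {"r": F(1, 4), "not r": F(3, 4)}                                 # P(R)
p_t_given_r = {("r", "t"): F(3, 4), ("r", "not t"): F(1, 4),
               ("not r", "t"): F(1, 2), ("not r", "not t"): F(1, 2)}   # P(T | R)

joint = {(r, t): p_r[r] * p for (r, t), p in p_t_given_r.items()}
for (r, t), p in joint.items():
    print(f"P({r}, {t}) = {p}")
# P(r, t) = 3/16, P(r, not t) = 1/16, P(not r, t) = 3/8, P(not r, not t) = 3/8
```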
Example: Alarm Network

Build a Bayesian Network for the following scenario:

You went for a holiday and asked your two neighbors, John and Mary, to
call if they heard the alarm ringing. The alarm can be triggered by minor
earthquakes or a burglar.

Variables
▪ B: Burglary
▪ E: Earthquake
▪ A: Alarm goes off
▪ M: Mary calls
▪ J: John calls

Network: B → A, E → A, A → J, A → M
Example: Alarm Network (cont.)

Given the following Bayesian Network, what is the probability of the
event (j, m, a, ¬b, ¬e) happening?

[Alarm network: Burglary → Alarm ← Earthquake; Alarm → JohnCalls,
Alarm → MaryCalls]

Answer:

P(j, m, a, ¬b, ¬e)
= P(¬e) P(¬b) P(a | ¬b, ¬e) P(j | a) P(m | a)
= 0.998 × 0.999 × 0.001 × 0.9 × 0.7
= 0.00063
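The same calculation as a tiny Python sketch (not from the slides); the five numbers are the CPT entries quoted in the answer above.

```python
# P(j, m, a, ¬b, ¬e) for the alarm network: multiply one CPT entry per node.
p_not_e = 0.998                  # P(¬e)
p_not_b = 0.999                  # P(¬b)
p_a_given_not_b_not_e = 0.001    # P(a | ¬b, ¬e)
p_j_given_a = 0.9                # P(j | a)
p_m_given_a = 0.7                # P(m | a)

p = p_not_e * p_not_b * p_a_given_not_b_not_e * p_j_given_a * p_m_given_a
print(round(p, 5))   # ≈ 0.00063
```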
Inference in
Bayesian Network

Bayes’ Net Inference by Enumeration

▪ We want to find out the probability of a certain event happening.
▪ For example: what is the probability of burglary if both John and
  Mary call?

  P(b | j, m) = ?

  • Evidence variables: JohnCalls, MaryCalls
  • Query variable: Burglary
  • Hidden variables: Alarm, Earthquake

[Alarm network: Burglary → Alarm ← Earthquake; Alarm → JohnCalls,
Alarm → MaryCalls]
Example: Alarm Network
What is the probability of burglary if both John and Mary call?

  P(b | j, m) = P(b, j, m) / P(j, m)
Example: Alarm Network (cont.)
What is the probability of burglary if both John and Mary call?

Enumerate over the hidden variables (Earthquake and Alarm):

  P(b, j, m)  = Σe Σa P(b) P(e) P(a | b, e) P(j | a) P(m | a)
              = 0.00059224259
  P(¬b, j, m) = Σe Σa P(¬b) P(e) P(a | ¬b, e) P(j | a) P(m | a)
              = 0.001491857649
Example: Alarm Network (cont.)
What is the probability of burglary if both John and Mary call?

  P(j, m) = P(b, j, m) + P(¬b, j, m)
          = 0.00059224259 + 0.001491857649
          = 0.002084100239

  P(b | j, m) = P(b, j, m) / P(j, m)
              = 0.00059224259 / 0.002084100239
              = 0.2842
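For completeness, here is a hedged inference-by-enumeration sketch (not from the slides). The CPT values are the standard ones for this alarm network from Russell & Norvig (Chapter 14); they reproduce the totals computed above, but treat them as assumed inputs rather than values stated in this deck.

```python
# Inference by enumeration on the alarm network, assuming the standard
# Russell & Norvig CPT values. Variables are booleans; True = event occurs.
from itertools import product

P_B = 0.001                                    # P(b)
P_E = 0.002                                    # P(e)
P_A = {(True, True): 0.95, (True, False): 0.94,
       (False, True): 0.29, (False, False): 0.001}   # P(a | B, E)
P_J = {True: 0.90, False: 0.05}                # P(j | A)
P_M = {True: 0.70, False: 0.01}                # P(m | A)

def bern(p, value):
    return p if value else 1 - p

def joint(b, e, a, j, m):
    """Full joint via the chain rule over the network."""
    return (bern(P_B, b) * bern(P_E, e) * bern(P_A[(b, e)], a)
            * bern(P_J[a], j) * bern(P_M[a], m))

# P(b, j, m) and P(¬b, j, m): sum out the hidden variables E and A.
num = {}
for b in (True, False):
    num[b] = sum(joint(b, e, a, True, True)
                 for e, a in product((True, False), repeat=2))

p_b_given_jm = num[True] / (num[True] + num[False])
print(num[True], num[False])   # ≈ 0.00059224259 and 0.001491857649
print(round(p_b_given_jm, 4))  # ≈ 0.2842
```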
Bayes’ Net Summary
▪ Conditional Independences
▪ Bayes’ Net Representation
• A directed, acyclic graph, one node per random variable
• A conditional probability table (CPT) for each node
• Implicitly encode joint distributions:
  P(X1, …, Xn) = ∏i P(Xi | Parents(Xi))

▪ Probabilistic Inference in Bayes’ Net


• Enumeration (exact, exponential complexity, very slow)
• Variable elimination (exact, often better)
• Sampling (approximate, much faster)

▪ Learning Bayes’ Nets from Data

AI Algorithms

[Figure: AI algorithms overview — Search Problem, Machine Learning,
Deep Learning, Probabilistic Inference, Bayesian Networks, Markov Decision
Process, Constraint Satisfaction Problem, Adversarial Game, Logic, Data, Model]
The End
