
Bayesian Belief Networks

1
Introduction
Suppose you are trying to determine
if a patient has inhalational
anthrax. You observe the
following symptoms:
• The patient has a cough
• The patient has a fever
• The patient has difficulty
breathing

2
Introduction
You would like to determine how
likely the patient is infected with
inhalational anthrax given that the
patient has a cough, a fever, and
difficulty breathing

We are not 100% certain that the
patient has anthrax because of these
symptoms. We are dealing with
uncertainty!

3
Introduction
Now suppose you order an x-ray
and observe that the patient has a
wide mediastinum.
Your belief that the patient is
infected with inhalational anthrax is
now much higher.

4
Introduction
• In the previous slides, what you observed
affected your belief that the patient is
infected with anthrax
• This is called reasoning with uncertainty
• Wouldn’t it be nice if we had some
methodology for reasoning with
uncertainty? Why, in fact, we do…

5
Bayesian Networks
[Graph: HasAnthrax is the parent of HasCough, HasFever,
HasDifficultyBreathing, and HasWideMediastinum]

• In the opinion of many AI researchers, Bayesian
networks are the most significant contribution
in AI in the last 10 years
• They are used in many applications, e.g. spam
filtering, speech recognition, robotics, diagnostic
systems, and even syndromic surveillance
6
Outline
1. Introduction
2. Probability Primer
3. Bayesian networks

7
Probability Primer: Random Variables
• A random variable is the basic element of
probability
• Refers to an event with some degree of
uncertainty as to the outcome of the
event
• For example, the random variable A could
be the event of getting a heads on a coin
flip

8
Boolean Random Variables
• We will start with the simplest type of random
variables – Boolean ones
• Take the values true or false
• Think of the event as occurring or not
occurring
• Examples (Let A be a Boolean random
variable):
A = Getting heads on a coin flip
A = It will rain today
A = The Cubs win the World Series in 2007
9
Probabilities
We will write P(A = true) to mean the probability that A = true.
What is probability? It is the relative frequency with which an
outcome would be obtained if the process were repeated a large
number of times under similar conditions.

[Figure: the red area is P(A = true), the blue area is P(A = false);
the red and blue areas sum to 1]

10
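This frequency view is easy to demonstrate by simulation. Below is a minimal Python sketch (the 0.5 coin bias is an illustrative assumption) that estimates P(A = true) as a long-run relative frequency:

    import random

    def estimate_probability(trials=100_000, bias=0.5):
        """Estimate P(A = true) by relative frequency over repeated coin flips."""
        # Count how often the event A ("heads") occurs across repeated trials.
        successes = sum(1 for _ in range(trials) if random.random() < bias)
        return successes / trials

    print(estimate_probability())  # converges toward 0.5 as trials grows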
Conditional Probability
• P(A = true | B = true) = Out of all the outcomes in which B
is true, how many also have A equal to true
• Read this as: “Probability of A conditioned on B” or
“Probability of A given B”

H = “Have a headache”
F = “Coming down with Flu”

P(H = true) = 1/10
P(F = true) = 1/40
P(H = true | F = true) = 1/2

“Headaches are rare and flu is rarer, but if
you’re coming down with flu there’s a
50-50 chance you’ll have a headache.”

11
The Joint Probability Distribution
• We will write P(A = true, B = true) to mean
“the probability of A = true and B = true”
• Notice that:

P(H = true | F = true)
  = (Area of "H and F" region) / (Area of "F" region)
  = P(H = true, F = true) / P(F = true)

12
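As a quick numeric check in Python, using the headache/flu numbers from the previous slide (the joint value 1/80 follows from P(H|F) * P(F) = (1/2) * (1/40)):

    p_f = 1 / 40          # P(F = true)
    p_h_and_f = 1 / 80    # P(H = true, F = true) = P(H|F) * P(F)

    # Conditional probability as a ratio of joint to marginal.
    p_h_given_f = p_h_and_f / p_f
    print(p_h_given_f)    # 0.5, matching the "50-50 chance" on the slide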
The Joint Probability Distribution
• Joint probabilities can be between any number of
variables, e.g. P(A = true, B = true, C = true)
• For each combination of variables, we need to
say how probable that combination is
• The probabilities of these combinations need to
sum to 1

A     B     C     P(A,B,C)
false false false 0.1
false false true  0.2
false true  false 0.05
false true  true  0.05
true  false false 0.3
true  false true  0.1
true  true  false 0.05
true  true  true  0.15
                  (sums to 1)
13
The Joint Probability Distribution
• Once you have the joint probability distribution,
you can calculate any probability involving
A, B, and C
• Note: you may need to use marginalization and
Bayes' rule (neither of which is discussed in
these slides)

A     B     C     P(A,B,C)
false false false 0.1
false false true  0.2
false true  false 0.05
false true  true  0.05
true  false false 0.3
true  false true  0.1
true  true  false 0.05
true  true  true  0.15

Examples of things you can compute:
• P(A=true) = sum of P(A,B,C) in rows with A=true
• P(A=true, B=true | C=true) =
  P(A=true, B=true, C=true) / P(C=true)
14
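Both of these computations are mechanical once the joint table is stored. Here is a small Python sketch over the table above (the dictionary layout is my own):

    # Joint distribution P(A, B, C) from the table, keyed by (A, B, C).
    joint = {
        (False, False, False): 0.1,  (False, False, True): 0.2,
        (False, True,  False): 0.05, (False, True,  True): 0.05,
        (True,  False, False): 0.3,  (True,  False, True): 0.1,
        (True,  True,  False): 0.05, (True,  True,  True): 0.15,
    }

    # P(A = true): sum the rows where A is true (marginalization).
    p_a = sum(p for (a, b, c), p in joint.items() if a)

    # P(A = true, B = true | C = true) = P(A, B, C all true) / P(C = true)
    p_c = sum(p for (a, b, c), p in joint.items() if c)
    p_ab_given_c = joint[(True, True, True)] / p_c

    print(p_a)           # 0.6
    print(p_ab_given_c)  # 0.15 / 0.5 = 0.3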
The Problem with the Joint Distribution
• Lots of entries in the table to fill up!
• For k Boolean random variables, you need a
table of size 2^k
• How do we use fewer numbers? We need the
concept of independence

A     B     C     P(A,B,C)
false false false 0.1
false false true  0.2
false true  false 0.05
false true  true  0.05
true  false false 0.3
true  false true  0.1
true  true  false 0.05
true  true  true  0.15
15
Independence
Variables A and B are independent if any of
the following hold:
• P(A,B) = P(A) P(B)
• P(A | B) = P(A)
• P(B | A) = P(B)
This says that knowing the outcome of
A does not tell me anything new
about the outcome of B.

16
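A small numeric illustration of these equivalent conditions, using a made-up joint distribution constructed so that independence holds exactly:

    # An illustrative joint over two independent events A and B
    # (numbers are invented so that independence holds exactly).
    p = {(True, True): 0.12, (True, False): 0.18,
         (False, True): 0.28, (False, False): 0.42}

    p_a = p[(True, True)] + p[(True, False)]          # P(A=true) = 0.3
    p_b = p[(True, True)] + p[(False, True)]          # P(B=true) = 0.4
    print(abs(p[(True, True)] - p_a * p_b) < 1e-12)   # P(A,B) == P(A)P(B) -> True

    p_a_given_b = p[(True, True)] / p_b               # P(A|B) = 0.12 / 0.4 = 0.3
    print(abs(p_a_given_b - p_a) < 1e-12)             # P(A|B) == P(A) -> True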
Independence
How is independence useful?
• Suppose you have n coin flips and you want to
calculate the joint distribution P(C1, …, Cn)
• If the coin flips are not independent, you need 2^n
values in the table
• If the coin flips are independent, then

P(C1, ..., Cn) = ∏_{i=1}^{n} P(Ci)

Each P(Ci) table has 2 entries, and there are n of
them, for a total of 2n values
17
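A minimal sketch of this saving in Python (the coin biases are invented for illustration):

    # Per-coin probabilities: with independence, each coin needs only P(Ci = true).
    p_true = [0.5, 0.6, 0.7]  # n = 3 coins -> 2n = 6 stored values (true and false)

    def joint_prob(outcomes):
        """P(C1 = o1, ..., Cn = on) as a product of per-coin probabilities."""
        prob = 1.0
        for p, outcome in zip(p_true, outcomes):
            prob *= p if outcome else (1 - p)
        return prob

    print(joint_prob([True, False, True]))  # 0.5 * 0.4 * 0.7 = 0.14
    # Without independence we would need 2**3 = 8 joint-table entries instead.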
Conditional Independence
Variables A and B are conditionally
independent given C if any of the following
hold:
• P(A, B | C) = P(A | C) P(B | C)
• P(A | B, C) = P(A | C)
• P(B | A, C) = P(B | C)
Knowing C tells me everything about B. I don’t gain
anything by knowing A (either because A doesn’t
influence B or because knowing C provides all the
information knowing A would give).

18
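Here is a quick numerical check of the definition in Python, using a joint distribution deliberately constructed so that A and B are conditionally independent given C (all probabilities are invented for illustration):

    from itertools import product

    p_c = {True: 0.5, False: 0.5}
    p_a_given_c = {True: 0.8, False: 0.2}
    p_b_given_c = {True: 0.3, False: 0.6}

    # Build the joint so that A and B are independent given C.
    joint = {}
    for a, b, c in product([True, False], repeat=3):
        pa = p_a_given_c[c] if a else 1 - p_a_given_c[c]
        pb = p_b_given_c[c] if b else 1 - p_b_given_c[c]
        joint[(a, b, c)] = pa * pb * p_c[c]

    # Check the definition: P(A=true | B=true, C=true) == P(A=true | C=true)
    p_bc = sum(p for (a, b, c), p in joint.items() if b and c)
    print(joint[(True, True, True)] / p_bc)  # 0.8
    print(p_a_given_c[True])                 # 0.8 -- equal, as required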
Outline
1. Introduction
2. Probability Primer
3. Bayesian networks

19
Bayesian Networks
1. Encode the conditional independence
relationships between the variables
(attributes) in the graph structure
2. Are a compact representation of the joint
probability distribution over the variables
3. Allow the representation of dependencies
among subsets of variables
4. Provide a graphical model of causal
relationships between attributes
20
A Bayesian Network
A Bayesian network is made up of:
1. A Directed Acyclic Graph

[Graph: A → B, B → C, B → D]

2. A set of conditional probability tables, one for each node in the graph:

A     P(A)
false 0.6
true  0.4

A     B     P(B|A)
false false 0.01
false true  0.99
true  false 0.7
true  true  0.3

B     D     P(D|B)
false false 0.02
false true  0.98
true  false 0.05
true  true  0.95

B     C     P(C|B)
false false 0.4
false true  0.6
true  false 0.9
true  true  0.1
A Directed Acyclic Graph

[Graph: A → B, B → C, B → D]

Each node in the graph is a random variable.

A node X is a parent of another node Y if there is an
arrow from node X to node Y, e.g. A is a parent of B.

Informally, an arrow from node X to node Y means
X has a direct influence on Y.

21
A Set of Tables for Each Node

Each node Xi has a conditional probability
distribution P(Xi | Parents(Xi)) that quantifies
the effect of the parents on the node.

The parameters are the probabilities in these
conditional probability tables (CPTs).

[Graph: A → B, B → C, B → D]

A     P(A)
false 0.6
true  0.4

A     B     P(B|A)
false false 0.01
false true  0.99
true  false 0.7
true  true  0.3

B     C     P(C|B)
false false 0.4
false true  0.6
true  false 0.9
true  true  0.1

B     D     P(D|B)
false false 0.02
false true  0.98
true  false 0.05
true  true  0.95
A Set of Tables for Each Node

Conditional Probability Distribution for C given B:

B     C     P(C|B)
false false 0.4
false true  0.6
true  false 0.9
true  true  0.1

For a given combination of values of the parents (B
in this example), the entries for P(C=true | B) and
P(C=false | B) must add up to 1,
e.g. P(C=true | B=false) + P(C=false | B=false) = 1

If you have a Boolean variable with k Boolean parents, this table
has 2^(k+1) probabilities (but only 2^k need to be stored)
24
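This constraint is easy to validate mechanically. A small sketch using the P(C|B) table above (the helper function is my own):

    # CPT for P(C | B) from the slide, keyed by (parent value B, child value C).
    cpt_c = {
        (False, False): 0.4, (False, True): 0.6,
        (True,  False): 0.9, (True,  True): 0.1,
    }

    def cpt_rows_normalized(cpt, parent_values=(False, True)):
        """Check P(C=true | B=b) + P(C=false | B=b) == 1 for each parent value b."""
        return all(abs(cpt[(b, True)] + cpt[(b, False)] - 1.0) < 1e-9
                   for b in parent_values)

    print(cpt_rows_normalized(cpt_c))  # True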
Conditional Independence
Given its parents (P1, P2), a node (X) is
conditionally independent of its non-descendants
(ND1, ND2).

[Graph: parents P1 and P2 point to X; ND1 and ND2
are non-descendants; C1 and C2 are children of X]
25
The Joint Probability Distribution
We can compute the joint probability
distribution over all the variables X1, …, Xn in
the Bayesian net using the formula:

P(X1 = x1, ..., Xn = xn) = ∏_{i=1}^{n} P(Xi = xi | Parents(Xi))

where Parents(Xi) means the values of the parents of the node Xi
with respect to the graph.

26
Using a Bayesian Network Example
Using the network in the example, suppose you want to
calculate:

P(A = true, B = true, C = true, D = true)
  = P(A = true) * P(B = true | A = true) *
    P(C = true | B = true) * P(D = true | B = true)
  = (0.4) * (0.3) * (0.1) * (0.95)
  = 0.0114

The factorization comes from the graph structure; the
numbers come from the conditional probability tables.

[Graph: A → B, B → C, B → D]

27
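The same calculation is straightforward in code. Below is a sketch using the CPTs given earlier (the dictionary encoding is my own); the joint factors exactly as the formula on slide 26 prescribes:

    # CPTs from the example network (A -> B, B -> C, B -> D).
    p_a = {True: 0.4, False: 0.6}
    p_b_given_a = {(True, True): 0.3,  (True, False): 0.7,
                   (False, True): 0.99, (False, False): 0.01}
    p_c_given_b = {(True, True): 0.1,  (True, False): 0.9,
                   (False, True): 0.6,  (False, False): 0.4}
    p_d_given_b = {(True, True): 0.95, (True, False): 0.05,
                   (False, True): 0.98, (False, False): 0.02}

    def joint(a, b, c, d):
        """P(A=a, B=b, C=c, D=d) via the chain-rule factorization of the net."""
        return (p_a[a] * p_b_given_a[(a, b)] *
                p_c_given_b[(b, c)] * p_d_given_b[(b, d)])

    print(joint(True, True, True, True))  # 0.4 * 0.3 * 0.1 * 0.95 = 0.0114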
Example
[Figure-only slide]
Inference
• Using a Bayesian network to compute
probabilities is called inference
• In general, inference involves queries of the form:
P(X | E)
E = The evidence variable(s)
X = The query variable(s)

30
Inference
[Graph: HasAnthrax is the parent of HasCough, HasFever,
HasDifficultyBreathing, and HasWideMediastinum]

• An example of a query would be:


P( HasAnthrax = true | HasFever = true, HasCough = true)
• Note: Even though HasDifficultyBreathing and
HasWideMediastinum are in the Bayesian network, they are
not given values in the query (i.e. they do not appear either as
query variables or evidence variables)
• They are treated as unobserved variables
31
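The CPTs for the anthrax network are not given in these slides, so as an illustration here is inference by enumeration on the small A, B, C, D example network instead; it reuses the joint() function from the sketch above, and the unobserved variables are simply summed out:

    from itertools import product

    # Reuses the CPT dictionaries and joint() from the previous sketch.
    def prob_query(d_val, a_val):
        """P(D = d_val | A = a_val), enumerating the unobserved B and C."""
        numer = sum(joint(a_val, b, c, d_val)
                    for b, c in product([True, False], repeat=2))
        denom = sum(joint(a_val, b, c, d)
                    for b, c, d in product([True, False], repeat=3))
        return numer / denom

    print(prob_query(True, True))  # 0.3*0.95 + 0.7*0.98 = 0.971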
The Bad News
• Exact inference is feasible in small to
medium-sized networks
• Exact inference in large networks takes a
very long time
• We resort to approximate inference
techniques, which are much faster and give
pretty good results

32
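One of the simplest approximate techniques is rejection sampling: draw samples from the network in topological order and keep only those consistent with the evidence. A minimal sketch for the example network (reusing the CPT dictionaries from the earlier sketch; illustrative only, not how production systems do it):

    import random

    def sample_network():
        """Draw one sample (a, b, c, d), sampling each node given its parents."""
        a = random.random() < p_a[True]
        b = random.random() < p_b_given_a[(a, True)]
        c = random.random() < p_c_given_b[(b, True)]
        d = random.random() < p_d_given_b[(b, True)]
        return a, b, c, d

    def rejection_sample(n=100_000):
        """Approximate P(D = true | A = true) by discarding samples with A = false."""
        kept = [d for a, _, _, d in (sample_network() for _ in range(n)) if a]
        return sum(kept) / len(kept)

    print(rejection_sample())  # approaches 0.971 as n grows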
