Bayesian Belief Networks
Introduction
Suppose you are trying to determine
if a patient has inhalational
anthrax. You observe the
following symptoms:
• The patient has a cough
• The patient has a fever
• The patient has difficulty
breathing
Introduction
You would like to determine how
likely the patient is infected with
inhalational anthrax given that the
patient has a cough, a fever, and
difficulty breathing
Introduction
Now suppose you order an x-ray
and observe that the patient has a
wide mediastinum.
Your belief that the patient is
infected with inhalational anthrax is
now much higher.
Introduction
• In the previous slides, what you observed
affected your belief that the patient is
infected with anthrax
• This is called reasoning with uncertainty
• Wouldn’t it be nice if we had some methodology for reasoning with uncertainty? In fact, we do…
Bayesian Networks

[Figure: an example Bayesian network containing a HasAnthrax node]
Probability Primer: Random Variables
• A random variable is the basic element of
probability
• It refers to an event, and there is some degree of uncertainty as to the outcome of the event
• For example, the random variable A could
be the event of getting a heads on a coin
flip
Boolean Random Variables
• We will start with the simplest type of random
variables – Boolean ones
• Take the values true or false
• Think of the event as occurring or not
occurring
• Examples (Let A be a Boolean random
variable):
A = Getting heads on a coin flip
A = It will rain today
A = The Cubs win the World Series in 2007
Probabilities
We will write P(A = true) to mean the probability that A = true.
What is probability? It is the relative frequency with which an
outcome would be obtained if the process were repeated a large
number of times under similar conditions.
Similarly, P(A = false) is the probability that A = false; P(A = true) and P(A = false) sum to 1.
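As a quick illustration of the relative-frequency view, here is a minimal Python sketch that estimates P(A = true) for a fair coin flip by simulation. The trial count and seed are arbitrary choices of ours, not from the slides:

```python
import random

random.seed(0)  # fixed seed so the run is repeatable (arbitrary choice)

# Relative-frequency estimate: repeat the coin flip many times under
# similar conditions and count how often "heads" is obtained.
n_trials = 100_000
heads = sum(random.random() < 0.5 for _ in range(n_trials))
p_heads = heads / n_trials
p_tails = 1 - p_heads  # P(A = true) and P(A = false) sum to 1

print(abs(p_heads - 0.5) < 0.01)  # the estimate is close to 0.5
```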
Conditional Probability
• P(A = true | B = true) = Out of all the outcomes in which B
is true, how many also have A equal to true
• Read this as: “Probability of A conditioned on B” or
“Probability of A given B”
H = “Have a headache”
F = “Coming down with flu”

P(H = true) = 1/10
P(F = true) = 1/40
P(H = true | F = true) = 1/2

“Headaches are rare and flu is rarer, but if you’re coming down with flu there’s a 50-50 chance you’ll have a headache.”
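The headache/flu numbers can be checked with a few lines of arithmetic: the product rule gives the joint probability, and dividing by P(H) reverses the conditioning (Bayes’ rule, which later slides mention but do not cover). A Python sketch using the slide’s numbers:

```python
# Numbers from the example above.
p_h = 1 / 10          # P(H = true): headaches are rare
p_f = 1 / 40          # P(F = true): flu is rarer
p_h_given_f = 1 / 2   # P(H = true | F = true): 50-50 given flu

# Product rule: P(H, F) = P(H | F) * P(F)
p_h_and_f = p_h_given_f * p_f    # = 1/80 = 0.0125

# Reversing the conditioning (Bayes' rule): P(F | H) = P(H, F) / P(H)
p_f_given_h = p_h_and_f / p_h    # = 1/8 = 0.125

print(p_h_and_f)  # 0.0125
```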
The Joint Probability Distribution
• We will write P(A = true, B = true) to mean
“the probability of A = true and B = true”
• Notice that, by the product rule,
P(H = true, F = true) = P(H = true | F = true) P(F = true),
and that the entries of a joint distribution sum to 1
The Joint Probability Distribution

• Once you have the joint probability distribution, you can calculate any probability involving A, B, and C
• Note: You may need to use marginalization and Bayes’ rule (neither of which is discussed in these slides)

A     B     C     P(A,B,C)
false false false 0.1
false false true  0.2
false true  false 0.05
false true  true  0.05
true  false false 0.3
true  false true  0.1
true  true  false 0.05
true  true  true  0.15

Examples of things you can compute:
• P(A = true) = sum of P(A,B,C) over the rows with A = true
• P(A = true, B = true | C = true) =
P(A = true, B = true, C = true) / P(C = true)
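Both example computations can be carried out mechanically once the joint table is stored in a dictionary. A minimal Python sketch (the variable names are ours):

```python
# The joint distribution from the table, keyed by (A, B, C).
joint = {
    (False, False, False): 0.10,
    (False, False, True):  0.20,
    (False, True,  False): 0.05,
    (False, True,  True):  0.05,
    (True,  False, False): 0.30,
    (True,  False, True):  0.10,
    (True,  True,  False): 0.05,
    (True,  True,  True):  0.15,
}

# P(A = true): marginalize by summing the rows with A = true.
p_a = sum(p for (a, b, c), p in joint.items() if a)

# P(A = true, B = true | C = true) = P(A, B, C all true) / P(C = true)
p_c = sum(p for (a, b, c), p in joint.items() if c)
p_ab_given_c = joint[(True, True, True)] / p_c

print(round(p_a, 2), round(p_ab_given_c, 2))  # 0.6 0.3
```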
The Problem with the Joint
Distribution
• Lots of entries in the table to fill up! For k Boolean random variables, the full joint table has 2^k entries.
Independence
Variables A and B are independent if any of
the following hold:
• P(A,B) = P(A) P(B)
• P(A | B) = P(A)
• P(B | A) = P(B)
This says that knowing the outcome of
A does not tell me anything new
about the outcome of B.
Independence
How is independence useful?
• Suppose you have n coin flips and you want to
calculate the joint distribution P(C1, …, Cn)
• If the coin flips are not independent, you need 2^n
values in the table
• If the coin flips are independent, then

P(C1, …, Cn) = ∏_{i=1}^{n} P(Ci)

Each P(Ci) table has 2 entries, and there are n of them, for a total of 2n values
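The savings can be seen directly: under independence, the probability of any particular sequence of outcomes is just a product of per-flip probabilities, so only the n small tables need to be stored. A Python sketch (n and the bias are arbitrary choices of ours):

```python
import math

n = 10          # number of independent coin flips (arbitrary)
p_heads = 0.5   # P(Ci = heads) for each flip (arbitrary fair coin)

# P(C1 = heads, ..., Cn = heads) = P(C1) * ... * P(Cn)
p_all_heads = math.prod(p_heads for _ in range(n))
print(p_all_heads)  # 0.0009765625, i.e. 0.5 ** 10
```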
Conditional Independence
Variables A and B are conditionally
independent given C if any of the following
hold:
• P(A, B | C) = P(A | C) P(B | C)
• P(A | B, C) = P(A | C)
• P(B | A, C) = P(B | C)
Knowing C tells me everything about B. I don’t gain
anything by knowing A (either because A doesn’t
influence B or because knowing C provides all the
information knowing A would give)
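One way to see the definition in action is to build a joint distribution that factors as P(A, B, C) = P(C) P(A | C) P(B | C), so that A and B are conditionally independent given C by construction, and then verify the first condition numerically. A Python sketch with made-up numbers:

```python
import math

# Made-up CPTs; the joint is defined so A and B are conditionally
# independent given C: P(A,B,C) = P(C) * P(A|C) * P(B|C).
p_c = {True: 0.5, False: 0.5}           # P(C = c)
p_a_given_c = {True: 0.8, False: 0.3}   # P(A = true | C = c)
p_b_given_c = {True: 0.7, False: 0.4}   # P(B = true | C = c)

def bern(p_true, x):
    """P(X = x) for a Boolean X with P(X = true) = p_true."""
    return p_true if x else 1 - p_true

joint = {(a, b, c): p_c[c] * bern(p_a_given_c[c], a) * bern(p_b_given_c[c], b)
         for a in (True, False) for b in (True, False) for c in (True, False)}

# Verify P(A, B | C) = P(A | C) P(B | C) at A = B = C = true:
pc_true = sum(p for (a, b, c), p in joint.items() if c)
p_ab_given_c = joint[(True, True, True)] / pc_true
print(math.isclose(p_ab_given_c, 0.8 * 0.7))  # True
```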
Outline
1. Introduction
2. Probability Primer
3. Bayesian networks
Bayesian Networks
A Bayesian network:
1. Encodes the conditional independence
relationships between the variables
(attributes) in the graph structure
2. Is a compact representation of the joint
probability distribution over the variables
3. Allows the representation of dependencies
among subsets of variables
4. Provides a graphical model of causal
relationships between attributes
A Bayesian Network
A Bayesian network is made up of:

1. A directed acyclic graph. In the running example, the nodes are A, B, C, and D, with edges A → B, B → C, and B → D.

2. A set of conditional probability tables, one for each node in the graph:

A     P(A)
false 0.6
true  0.4

A     B     P(B|A)
false false 0.01
false true  0.99
true  false 0.7
true  true  0.3

B     C     P(C|B)
false false 0.4
false true  0.6
true  false 0.9
true  true  0.1

B     D     P(D|B)
false false 0.02
false true  0.98
true  false 0.05
true  true  0.95
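The network and its tables map naturally onto a few dictionaries. A minimal Python sketch (the variable names are ours; the numbers are from the tables above):

```python
# The example network A -> B, B -> C, B -> D, with each CPT stored as a
# dictionary keyed by the parent's value.
p_a = {True: 0.4, False: 0.6}            # P(A = a)
p_b_given_a = {True: 0.3, False: 0.99}   # P(B = true | A = a)
p_c_given_b = {True: 0.1, False: 0.6}    # P(C = true | B = b)
p_d_given_b = {True: 0.95, False: 0.98}  # P(D = true | B = b)

def bern(p_true, x):
    """P(X = x) for a Boolean X with P(X = true) = p_true."""
    return p_true if x else 1 - p_true

# Example use: the prior P(B = true), obtained by summing out the parent A.
p_b = sum(p_a[a] * p_b_given_a[a] for a in (True, False))
print(round(p_b, 3))  # 0.714
```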
A Directed Acyclic Graph
Each node in the graph is a random variable.
A node X is a parent of another node Y if there is an
arrow from node X to node Y, e.g. A is a parent of B.

[Figure: the example DAG, with an edge from A to B and edges from B to C and D]
The Joint Probability Distribution
We can compute the joint probability
distribution over all the variables X1, …, Xn in
the Bayesian net using the formula:

P(X1 = x1, …, Xn = xn) = ∏_{i=1}^{n} P(Xi = xi | Parents(Xi))
Using a Bayesian Network Example
Using the network in the example, suppose you want to
calculate:

P(A = true, B = true, C = true, D = true)
  = P(A = true) * P(B = true | A = true) *
    P(C = true | B = true) * P(D = true | B = true)
  = (0.4)(0.3)(0.1)(0.95)
  = 0.0114
Using a Bayesian Network Example
Using the network in the example, suppose you want to
calculate:

P(A = true, B = true, C = true, D = true)
  = P(A = true) * P(B = true | A = true) *
    P(C = true | B = true) * P(D = true | B = true)
  = (0.4)(0.3)(0.1)(0.95)

The factorization comes from the graph structure; the
numbers come from the conditional probability tables.
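The same calculation in Python, with the CPT entries pulled out as named constants (the names are ours):

```python
# CPT entries used by the factorization (values from the tables above).
p_a_true = 0.4            # P(A = true)
p_b_given_a_true = 0.3    # P(B = true | A = true)
p_c_given_b_true = 0.1    # P(C = true | B = true)
p_d_given_b_true = 0.95   # P(D = true | B = true)

# P(A, B, C, D all true) = P(A) P(B|A) P(C|B) P(D|B)
joint = p_a_true * p_b_given_a_true * p_c_given_b_true * p_d_given_b_true
print(round(joint, 4))  # 0.0114
```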
Inference
• Using a Bayesian network to compute
probabilities is called inference
• In general, inference involves queries of the form:
P( X | E )
E = The evidence variable(s)
X = The query variable(s)
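For a network this small, a query P(X | E) can be answered by brute-force enumeration: sum the joint distribution over the hidden variables for both numerator and denominator. A sketch computing P(B = true | C = true) on the example network (the choice of query is ours):

```python
from itertools import product

# CPTs of the example network A -> B, B -> C, B -> D.
p_a = {True: 0.4, False: 0.6}        # P(A = a)
p_b_a = {True: 0.3, False: 0.99}     # P(B = true | A = a)
p_c_b = {True: 0.1, False: 0.6}      # P(C = true | B = b)
p_d_b = {True: 0.95, False: 0.98}    # P(D = true | B = b)

def bern(p_true, x):
    return p_true if x else 1 - p_true

def joint(a, b, c, d):
    # P(A,B,C,D) = P(A) P(B|A) P(C|B) P(D|B)
    return (p_a[a] * bern(p_b_a[a], b)
            * bern(p_c_b[b], c) * bern(p_d_b[b], d))

# P(B = true | C = true) = P(B = true, C = true) / P(C = true),
# summing out the hidden variables A and D.
num = sum(joint(a, True, True, d) for a, d in product((True, False), repeat=2))
den = sum(joint(a, b, True, d) for a, b, d in product((True, False), repeat=3))
print(round(num / den, 3))  # 0.294
```

Enumeration is exponential in the number of hidden variables; it works here only because the network has four nodes.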
Inference

[Figure: the anthrax Bayesian network again, containing the HasAnthrax node]