
Artificial Intelligence – Bayes Network

This post will be the first in a series on Artificial Intelligence (AI), where we will investigate the theory
behind AI and incorporate some practical examples. The first, and perhaps most important, section of this
series will be on probability, which underlies the fundamentals of any AI. One of the most important
aspects of probability for AI is structuring probabilities using something called Bayes Networks. We'll
investigate this through a medical example.

Case: Liver Disorder

The example below is taken from the publication "A Bayesian Network Model for Diagnosis of Liver Disorders"
by Agnieszka Onisko, M.S., Marek J. Druzdzel, Ph.D., and Hanna Wasyluk, M.D., Ph.D., September 1999.


Imagine that you're a researcher investigating whether a patient has a liver disorder. Now, what could be the
cause of this liver disorder? Well, if you open a medical journal you will see that gallstones could be one
cause, a history of hepatitis another, alcoholism a third, and many others. These causes may initially be
unobservable: you can't check every single patient who comes into the hospital for gallstones, hepatitis,
or a habit of enjoying a few too many at home, just on the off chance that they may have a liver disorder.

Well, the medical journal also says that a symptom of gallstones is upper abdominal pain, that hepatitis can be
seen in a blood test, and that alcoholism can show up as iron deficiency. Finally, some observable symptoms.
So if a patient presents the symptoms mentioned above, we can say with a higher probability whether there are
problems with gallstones, hepatitis or alcoholism. In turn, a higher probability of gallstones, hepatitis or
alcoholism helps explain the probability of having a liver disorder.

And what does a liver disorder cause? It can cause fatigue, body hair loss, an enlarged spleen, etc. What we end
up with is a network – a Bayes Network – of cause and effect, based on probability, that explains a specific
case given a set of known probabilities. In other words, a Bayesian Network is a network that can capture
quite complicated structures, like the causes of a liver disorder in our example.

Theory
A Bayesian Network is composed of nodes, where the nodes correspond to events that you might or might
not know. They're typically called random variables, and they may be discrete or continuous. These nodes are
connected by arrows, and if there is an arrow from X to Y, X is said to be a parent of Y. Each node Xi has a
conditional probability distribution P(Xi|Parents(Xi)). Together, the graph and these distributions define a
probability distribution over the random variables.
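
To make this concrete, here is a minimal sketch in Python of how a single node and its conditional probability distribution might be represented. The node names echo the liver example, and all the probability values are invented purely for illustration:

```python
# One node of a Bayesian Network: "LiverDisorder", with parents
# "Gallstones" and "Hepatitis". The conditional probability table (CPT)
# maps each combination of parent values to P(LiverDisorder=True | parents).
# All numbers below are invented for illustration.

parents = ("Gallstones", "Hepatitis")
cpt = {
    (True, True): 0.80,    # P(LD | gallstones, hepatitis)
    (True, False): 0.40,   # P(LD | gallstones, no hepatitis)
    (False, True): 0.35,   # P(LD | no gallstones, hepatitis)
    (False, False): 0.02,  # P(LD | neither)
}

def p_liver_disorder(gallstones: bool, hepatitis: bool) -> float:
    """Look up P(LiverDisorder = True | Parents)."""
    return cpt[(gallstones, hepatitis)]
```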

In the Bayesian Network above, we have a total of 94 variables! What the graph structure and the
associated probabilities specify is a huge probability distribution over the space of those 94 variables. If the
variables are binary, which we will assume throughout this example, specifying that distribution without a
structured graph like the one above would require 2^94 − 1 probability values – that's a lot! This is where the
Bayesian Network is key. A Bayesian Network's advantage is how compactly it represents a probability
distribution, such as this very large Joint Probability Distribution (JPD), compared to unstructured
(non-graph) representations. Just to clarify, the JPD assigns a probability to every possible event, as defined
by a combination of the values of all the variables. It's getting a bit more complicated now, so let's freshen
up some probability theory from high school/college.

Theory: Quick intro to probability


We assume the reader has a decent background in probability and statistics, but let's repeat
some of the notation anyway:

P(A) – probability of event A

P(A’) = 1 – P(A) – Complementary probability of P(A)

P(A ∩ B) – Probability of events A and B

P(A ∪ B) – Probability of events A or B

P(A|B) – Probability of event A given event B occurred.

A⟂B – A and B are independent of each other. If A⟂B, then we can write P(A,B) = P(A)*P(B), since
they are independent.
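
As a quick numeric sanity check of the notation above, here is a short Python sketch using two independent events – say, two coin flips. The fair-coin probabilities are a toy assumption, not from the post:

```python
# A = "first coin lands heads", B = "second coin lands heads".
p_a, p_b = 0.5, 0.5

p_not_a = 1 - p_a                   # P(A') = 1 - P(A)
p_a_and_b = p_a * p_b               # P(A ∩ B) = P(A)P(B), since A ⟂ B
p_a_or_b = p_a + p_b - p_a_and_b    # P(A ∪ B) = P(A) + P(B) - P(A ∩ B)
p_a_given_b = p_a_and_b / p_b       # P(A|B); equals P(A) when A ⟂ B

print(p_not_a, p_a_and_b, p_a_or_b, p_a_given_b)  # 0.5 0.25 0.75 0.5
```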

Perhaps the most important rule in AI is Bayes Rule, named after Thomas Bayes, an English
statistician. Bayes Rule is stated as follows:

P(A|B) = P(B|A) * P(A) / P(B)

Until now we have had a pretty good way of calculating the probability of B, given that we have A, but
not the probability of A, given we have B. Now it becomes apparent that we can use Bayes Rule to calculate
"backwards", so to speak, as in our example of the liver disorder.

Example

Let's try out a Bayes Network with another example:

Say that you want to find out if you are allergic to gluten, but you can't observe it directly
(it's unobservable), so you have to perform a test.

The prior probability of being allergic to gluten (event G) is only 2%. If you are allergic, the probability
that the test detects it (comes back positive, denoted '+') is 0.8. If you are not allergic, the probability
that the test still comes back positive is 0.1. A negative result is denoted '-'.

– To the reader: these are typical values that can be determined from previous test data.


So we have the values on the left, each of which gives us the complementary value on the right:

P(G) = 0.02 -> P(G’) = 0.98

P(+|G) = 0.8 -> P(-|G) = 0.2

P(+|G’) = 0.1 -> P(-|G’) = 0.9

Now, what is the probability of being allergic to gluten, given that the test comes back positive?

P(G|+) = P(+|G)*P(G)/P(+) = 0.8*0.02/P(+).

Now, P(+) is the total probability of a positive result: the case where you are allergic and test positive,
plus the case where you are not allergic but still test positive:

P(+) = P(+,G) + P(+,G') = P(+|G)*P(G) + P(+|G')*P(G') = 0.8*0.02 + 0.1*0.98 = 0.114

This gives us the result P(G|+) = 0.016/0.114 ≈ 0.14.
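
The same calculation in code – a minimal Python sketch of the gluten example, using only the three numbers defined above:

```python
p_g = 0.02               # prior: P(G), probability of being allergic
p_pos_given_g = 0.8      # P(+|G): the test detects a real allergy
p_pos_given_not_g = 0.1  # P(+|G'): false positive rate

# Total probability of a positive test:
# P(+) = P(+|G)P(G) + P(+|G')P(G')
p_pos = p_pos_given_g * p_g + p_pos_given_not_g * (1 - p_g)

# Bayes Rule: P(G|+) = P(+|G)P(G) / P(+)
p_g_given_pos = p_pos_given_g * p_g / p_pos

print(round(p_pos, 3), round(p_g_given_pos, 3))  # 0.114 0.14
```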

Types of Bayesian networks


There are many different types of Bayes networks, distinguished by how their nodes are connected.


We'll take a closer look at one of them: a network of five variables where arrows run from A and B into C, and from C into D and E.

In this example we have five random variables, and the Bayes Network defines the distribution over
those five random variables – P(A,B,C,D,E). So instead of specifying all possible combinations of those
five random variables directly, the Bayes Network is defined by the probability distributions that belong to
each individual node.

A and B depend on nothing else, so their distributions are simply P(A) and P(B), since there are no
arrows (connections) coming into them. C, on the other hand, is conditioned on A and B, so we have
P(C|A,B). D and E are conditioned on C, so we have P(D|C) and P(E|C).

This gives us the joint probability represented by the Bayes Network: the product of the probability
distributions defined over the individual nodes, where each node's probability is conditioned only on its
incoming arrows.

P(A,B,C,D,E) = P(A)*P(B)*P(C|A,B)*P(D|C)*P(E|C)

A and B have no incoming arrows, so they have the probability distributions P(A) and P(B).


C has two incoming arrows, so its probability is conditioned on A and B, giving us P(C|A,B).

D and E are both conditioned on C, giving us P(D|C) and P(E|C).

So, defining the joint distribution P(A,B,C,D,E) through the factors above gives us one really BIG
advantage. The full joint distribution over five binary random variables requires 2^5 − 1 = 31 probability
values, while our Bayes Network only requires 10.

P(A) is one value, since we can derive P(A') from it (and the same is true for B). P(C|A,B) is a
distribution over C conditioned on every combination of A and B, which gives four values. P(D|C) and
P(E|C) are each conditioned on C and C', which gives two values apiece. If we add these up we get:

P(A)*P(B)*P(C|A,B)*P(D|C)*P(E|C)

1+1+4+2+2 = 10 parameters in total

1: P(A) (P(A') = 1 − P(A))

1: P(B) (P(B') = 1 − P(B))

4: P(C|A,B), P(C|A,B'), P(C|A',B), P(C|A',B')

2: P(D|C), P(D|C')

2: P(E|C), P(E|C')
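
To make the factorization concrete, here is a minimal Python sketch of the five-node network. The structure (which node conditions on which) comes from the example above; the CPT numbers are invented for illustration. Note that only 10 numbers appear in the tables, exactly the count we just derived:

```python
from itertools import product

# CPTs for the five-node network; all values invented for illustration.
p_a = 0.3                                        # P(A)
p_b = 0.6                                        # P(B)
p_c = {(True, True): 0.9, (True, False): 0.5,
       (False, True): 0.4, (False, False): 0.1}  # P(C|A,B)
p_d = {True: 0.7, False: 0.2}                    # P(D|C)
p_e = {True: 0.8, False: 0.3}                    # P(E|C)

def pr(p_true, value):
    """P(X = value), given P(X = True)."""
    return p_true if value else 1.0 - p_true

def joint(a, b, c, d, e):
    """P(A,B,C,D,E) = P(A) P(B) P(C|A,B) P(D|C) P(E|C)."""
    return (pr(p_a, a) * pr(p_b, b) * pr(p_c[(a, b)], c)
            * pr(p_d[c], d) * pr(p_e[c], e))

# Sanity check: the 32 joint probabilities must sum to 1.
print(sum(joint(*v) for v in product([True, False], repeat=5)))  # 1.0
```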

Example

If the calculations of the 10 parameters above seemed a bit abstract, take a look at the example below,
taken from UT Austin – CS 343:


P(Burglary)*P(Earthquake)*P(Alarm|Burglary, Earthquake)*P(JohnCalls|Alarm)*P(MaryCalls|Alarm)

Here we are looking at the JPD of an alarm going off, where the causes may be a burglary and/or an
earthquake, together with the probability that John or Mary calls to check in.
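
As a hedged sketch, here is the alarm network in Python, using the conditional probability values usually quoted with this textbook example (treat them as illustrative, since the original figure with the tables is not reproduced here):

```python
p_burglary, p_earthquake = 0.001, 0.002
p_alarm = {(True, True): 0.95, (True, False): 0.94,
           (False, True): 0.29, (False, False): 0.001}  # P(Alarm|B,E)
p_john = {True: 0.90, False: 0.05}                      # P(JohnCalls|Alarm)
p_mary = {True: 0.70, False: 0.01}                      # P(MaryCalls|Alarm)

def pr(p_true, value):
    return p_true if value else 1.0 - p_true

def joint(b, e, a, j, m):
    """P(B,E,A,J,M) as the product of the five factors above."""
    return (pr(p_burglary, b) * pr(p_earthquake, e)
            * pr(p_alarm[(b, e)], a) * pr(p_john[a], j) * pr(p_mary[a], m))

# Both John and Mary call, the alarm went off, but there was
# neither a burglary nor an earthquake:
print(joint(False, False, True, True, True))  # ~0.000628
```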

Advantage of Bayesian networks


In the previous section, we saw that we only need 10 probability values, compared to 31 for an
unstructured, non-graph method. That might not seem like much of a difference, but when scaling to larger and
more complex problems, the compactness of the network leads to a representation that scales significantly
better. This is a key reason why Bayes Networks are used so extensively.
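
The scaling argument can be written down in a few lines. For binary variables, a Bayes Network needs 2^(number of parents) values per node, versus 2^n − 1 for the unstructured joint; the graph below is the five-node example from before:

```python
# Parents of each node in the five-node example network.
graph = {"A": [], "B": [], "C": ["A", "B"], "D": ["C"], "E": ["C"]}

# Each binary node needs one probability per combination of parent values.
bayes_net_params = sum(2 ** len(parents) for parents in graph.values())

# The unstructured joint over n binary variables needs 2^n - 1 values.
full_joint_params = 2 ** len(graph) - 1

print(bayes_net_params, full_joint_params)  # 10 31
```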

So, what does this look like when it gets really complicated? Let's look back at the first example, on the liver
disorder.

There is no way we can specify a joint probability with 2^94 − 1 probability values, but we now have the
tools to build the joint probability through a Bayes Network instead. Counting through the
different nodes, we find that we need only 231 probability values to specify the joint
probability of the liver disorder network. At the risk of repeating ourselves, this is one of the main advantages of
using a compact Bayes Network representation instead of an unstructured joint representation.


Conclusion
This was a very short and simple introduction to Bayes Networks. Part of the material comes from
Artificial Intelligence – A Modern Approach (Russell, Norvig), http://www.ee.columbia.edu/~vittorio/lecture12.pdf,
https://www.youtube.com/channel/UCshmLD2MsyqAKBx8ctivb5Q/feed and
https://classroom.udacity.com/courses/cs271, all of which we strongly recommend to everyone interested in
pursuing this subject.

Before signing off, we want to introduce one of the many practical applications of Bayes Networks in AI,
namely the Bayesian Neural Network (BNN). While a Bayesian Network has a fixed probability value
for each event, from which one can derive a value at the desired end state, a BNN learns the probability
distributions themselves, optimizing them from the input data to the output (or action). This is a bit
tricky, so let's scale it back a bit. We often have to make decisions based on our best guess, which may rest
on imperfect and/or incomplete information. So it's not unreasonable to think that the best
way to make decisions, based on such uncertain data, is to keep careful track of the uncertainty – i.e. the
probability distribution. What this means is that we incorporate our prior information into the model.
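
As a toy sketch of that idea (entirely our own illustration, not an implementation from any of the sources above): instead of a single fixed weight, a BNN keeps a distribution over each weight, and predictions are averaged over samples from that distribution, so the output carries its own uncertainty:

```python
import numpy as np

rng = np.random.default_rng(0)

# Belief about a single "network" weight: a Gaussian, not a point value.
weight_mean, weight_std = 2.0, 0.5

def predict(x, n_samples=1000):
    """Average a one-weight linear model's output over sampled weights."""
    weights = rng.normal(weight_mean, weight_std, n_samples)
    outputs = weights * x
    return outputs.mean(), outputs.std()  # prediction and its uncertainty

mean, std = predict(3.0)
print(f"prediction: {mean:.2f} +/- {std:.2f}")  # roughly 6.00 +/- 1.50
```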

Example of a Neural Network. Taken from: https://qph.fs.quoracdn.net/main-qimg-6b355ef13c0f68b715399c0617ca6c72

Our data is used as input to the neural network, which contains layers of nodes set up to produce an optimal output.
