Module 5

Uncertainty Handling

• This is a traditional AI topic,
– prior to covering machine learning approaches
• There are many different approaches to handling
uncertainty
– Formal approaches based on mathematics (probabilities)
– Formal approaches based on logic
– Informal approaches
• Many questions arise
– How do we combine uncertainty values?
– How do we obtain uncertainty values?
– How do we interpret uncertainty values?
– How do we add uncertainty values to our knowledge and
inference mechanisms?
Why Is Uncertainty Needed?
• We will find none of the approaches to be entirely adequate, so the
natural question is: why even bother?
– Input data may be questionable
• to what extent is a patient demonstrating some symptom?
• do we rely on their word?
– Knowledge may be questionable
• is this really a fact?
– Knowledge may not be truth-preserving
• if I apply this piece of knowledge, does the conclusion necessarily hold true?
• associational knowledge, for instance, is not truth-preserving, but it is used all the
time in diagnosis
– Input may be ambiguous or unclear
• this is especially true if we are dealing with real-world inputs from sensors, or
dealing with situations where ambiguity readily exists (natural languages for
instance)
– Output may be expected in terms of a plausibility/probability such as “what
is the likelihood that it will rain today?”
• The world is not just T/F, so our reasoners should be able to
model this and reason over the shades of grey we find in the world
Methods to Handle Uncertainty
• Fuzzy Logic
– Logic that extends traditional 2-valued logic to be a continuous logic
(values from 0 to 1)
• while this was developed early on to handle natural language ambiguities such
as “you are very tall”, it has instead been more successfully applied to device
controllers
• Probabilistic Reasoning
– Using probabilities as part of the data and using Bayes theorem or variants
to reason over what is most likely
• Hidden Markov Models
– A variant of probabilistic reasoning where internal states are not observable
(so they are called hidden)
• Certainty Factors and Qualitative Fuzzy Logics
– More ad hoc approaches (non-formal) that might be more flexible, or at least
more human-like
• Neural Networks
– We will skip these in this lecture as we want to talk about NNs more with
respect to learning
Bayesian Probabilities
• Bayes Theorem is given below:
– P(H0 | E) = P(E | H0) * P(H0) / P(E)
– P(H0 | E) = probability that H0 is true given evidence E (the
conditional probability)
– P(E | H0) = probability that E will arise given that H0 has
occurred (the evidential probability)
– P(H0) = probability that H0 will arise (the prior probability)
– P(E) = probability that evidence E will arise
– Usually we normalize our probabilities so that P(E) = 1
• The idea is that you are given some evidence E = {e1, e2,
…, en} and you have a collection of hypotheses H1, H2,
…, Hm
– Using a collection of evidential and prior probabilities,
compute the most likely hypothesis
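This selection step can be sketched as follows. The hypotheses and probability values here are made up for illustration; since P(E) is common to every hypothesis, it drops out of the comparison and we simply normalize over the hypotheses:

```python
# Hypothetical sketch: choose the most likely hypothesis from prior and
# evidential probabilities. All numbers below are illustrative only.

priors = {"flu": 0.10, "cold": 0.30, "allergy": 0.60}       # P(H)
evidential = {"flu": 0.90, "cold": 0.70, "allergy": 0.05}   # P(E | H)

# P(H | E) is proportional to P(E | H) * P(H); normalize over all hypotheses
unnormalized = {h: evidential[h] * priors[h] for h in priors}
total = sum(unnormalized.values())
posteriors = {h: p / total for h, p in unnormalized.items()}

best = max(posteriors, key=posteriors.get)
print(best, round(posteriors[best], 3))
```

With these numbers "cold" wins: its moderate prior combined with a high evidential probability beats both the high-prior/low-evidence "allergy" and the low-prior/high-evidence "flu".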
Independence of Evidence
• Note that since E is a collection of some evidence, but
not all possible evidence, you will need a whole lot of
probabilities
– P(E1 & E2 | H0), P(E1 & E3 | H0), P(E1 & E2 & E3 | H0), …
– If you have n items that could be evidence, you will need 2^n
different evidential probabilities for every hypothesis
• In order to get around the problem of needing an
exponential number of probabilities, one might make the
assumption that pieces of evidence are independent
– Under such an assumption
• P(E1 & E2 | H) = P(E1 | H) * P(E2 | H)
• P(E1 & E2) = P(E1) * P(E2)
– Is this a reasonable assumption?
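The savings from the independence assumption can be sketched as follows; the per-evidence probabilities are hypothetical:

```python
# With n pieces of possible evidence, the number of evidence combinations
# needing a probability is 2**n per hypothesis; under the independence
# assumption only n per-evidence probabilities are needed.

n = 10
combinations_needed = 2 ** n      # without independence
per_evidence_needed = n           # with independence

# Under independence, a joint evidential probability is just a product:
p_e_given_h = [0.8, 0.4, 0.9]     # hypothetical P(e1|H), P(e2|H), P(e3|H)
p_joint = 1.0
for p in p_e_given_h:
    p_joint *= p                  # P(e1 & e2 & e3 | H)
print(combinations_needed, per_evidence_needed, round(p_joint, 3))
```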
Continued
• Example: a patient is suffering from a fever and nausea
– Can we treat these two symptoms as independent?
• one might be causally linked to the other
• the two combined may help identify a cause (disease) that the
symptoms separately might not
• A weaker form of independence is conditional
independence
– If hypothesis H is known to be true, then whether E1 is true
should not impact P(E2 | H) or P(H | E2)
– Again, is this a reasonable assumption?
• Consider as an example:
– You want to run the sprinkler system if it is not going to rain
and you base your decision on whether it will rain or not on
whether it is cloudy
• the grass is wet, we want to know the probability that you ran the
sprinkler versus if it rained
• evidential probabilities P(sprinkler | wet) and P(rain | wet) are not
independent of whether it was cloudy or not
• Marginal Probability: The probability of an
event irrespective of the outcomes of other
random variables, e.g. P(A).
• Joint Probability: Probability of two (or
more) simultaneous events, e.g. P(A and B) or
P(A, B).
• Conditional Probability: Probability of one
(or more) event given the occurrence of
another event, e.g. P(A given B) or P(A | B).
• The joint probability can be calculated using
the conditional probability; for example:
• P(A, B) = P(A | B) * P(B)
• The conditional probability can be calculated
using the joint probability; for example:
• P(A | B) = P(A, B) / P(B)
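The two identities above can be checked with a tiny joint distribution over A and B; the numbers here are made up and simply sum to 1:

```python
# Hypothetical joint distribution over (A, B); entries sum to 1.
p = {(True, True): 0.12, (True, False): 0.18,
     (False, True): 0.28, (False, False): 0.42}

p_b = p[(True, True)] + p[(False, True)]    # marginal P(B), summing out A
p_a_given_b = p[(True, True)] / p_b         # P(A | B) = P(A, B) / P(B)
p_joint = p_a_given_b * p_b                 # P(A, B) = P(A | B) * P(B)

print(round(p_b, 3), round(p_a_given_b, 3), round(p_joint, 3))
```

Going around the loop recovers the joint entry we started from, which is exactly what the two formulas claim.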
• Bayes Theorem: Principled way of
calculating a conditional probability without
the joint probability.
• It is often the case that we do not have access
to the denominator directly, e.g. P(B).
• P(B) = P(B|A) * P(A) + P(B|not A) * P(not A)
Therefore, P(A|B) = P(B|A) * P(A) / (P(B|A) *
P(A) + P(B|not A) * P(not A))
• P(A|B): Posterior probability.
• P(A): Prior probability.
• P(B|A): Likelihood.
• P(B): Evidence
This allows Bayes Theorem to be restated as:
• Posterior = Likelihood * Prior / Evidence
Probability Basics
Bayes' Theorem
• Bayes’ Theorem is a way of finding
a probability when we know certain other
probabilities.
• P(A|B) = P(A) * P(B|A) / P(B)
Which tells us:
how often A happens given that B happens,
written P(A|B)
• When we know:
how often B happens given that A happens, written P(B|A)
and how likely A is on its own, written P(A)
and how likely B is on its own, written P(B)
Bayes' Theorem
Three different machines are used to produce a particular manufactured item. The
three machines, A, B and C, produce 20%, 30% and 50% of the items, respectively.
Now, machines A, B and C produce defective items at a rate of 1%, 2% and 3%,
respectively. Suppose that we pick an item from the final batch at random. The item
is found to be defective. What is the probability that the item was produced by
machine B?
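The machine example works out directly with the restated theorem, using the total-probability formula for the evidence term P(defective):

```python
# Worked solution: P(B | defective) = P(defective | B) * P(B) / P(defective)
p_machine = {"A": 0.20, "B": 0.30, "C": 0.50}   # prior P(machine)
p_def = {"A": 0.01, "B": 0.02, "C": 0.03}       # P(defective | machine)

# Evidence term: total probability that a random item is defective
p_defective = sum(p_def[m] * p_machine[m] for m in p_machine)

p_b_given_def = p_def["B"] * p_machine["B"] / p_defective
print(round(p_defective, 3), round(p_b_given_def, 4))
```

P(defective) = 0.002 + 0.006 + 0.015 = 0.023, so P(B | defective) = 0.006 / 0.023 ≈ 0.2609.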
Bayesian Networks
• We can avoid the assumption of independence by including
causality in our knowledge
– For this, we enhance our previous approach by using a network where
directed edges denote some form of dependence or causality
• An example of a causal network
is shown to the right along with
the probabilities (evidential and
prior)
– we cannot use Bayes theorem
directly because the evidential
probabilities are based on the prior
probability of cloudy
• However, a propagation
algorithm can be applied where
the prior probability for
cloudiness will impact the
evidential probabilities of
sprinkler and rain
– from there, we can finally compute
the likelihood of rain versus
sprinkler
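A minimal sketch of this propagation step, using made-up conditional probabilities for the cloudy/sprinkler/rain network (the figure's actual numbers are not reproduced here):

```python
# Propagate the prior for Cloudy into the probabilities of its children.
# All probabilities below are hypothetical illustrations.

p_cloudy = 0.5
p_sprinkler = {True: 0.10, False: 0.50}   # P(sprinkler on | cloudy?)
p_rain = {True: 0.80, False: 0.20}        # P(rain | cloudy?)

# Marginalize out Cloudy to get prior-adjusted probabilities
p_s = p_sprinkler[True] * p_cloudy + p_sprinkler[False] * (1 - p_cloudy)
p_r = p_rain[True] * p_cloudy + p_rain[False] * (1 - p_cloudy)
print(round(p_s, 3), round(p_r, 3))
```

These marginals then feed into computing the likelihood of rain versus sprinkler given that the grass is wet.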
Real World Example
• Here is a Bayesian net for classification of intruders
in an operating system
• Notice that it contains cycles
• The probabilities for the edges are learned by sorting
through log data
HMMs
• A Markov model is a state transition
diagram with probabilities on the edges
– We use a Markov model to compute the
probability of a certain sequence of states
• see the figure to the right
• In many problems, we have observations
to tell us what states have been reached,
but observations may not show us all of the states
– Intermediate states (those that are not
identifiable from observations) are hidden
– In the figure on the right, the observations
are Y1, Y2, Y3, Y4 and the hidden states
are Q1, Q2, Q3, Q4
• The HMM allows us to compute the
most probable path that led to a
particular observable state
– This allows us to find which hidden states
were most likely to have occurred
• This is extremely useful for recognition
problems where we know the end result
but not how the end result was produced
– we know the patient’s symptoms but not the
disease that caused the symptoms to appear
– we know the speech signal that the speaker
uttered, but not the phonemes that made
up the speech signal
