PPT05-Quantifying Uncertainty


Artificial Intelligence

Week 5
Quantifying Uncertainty
LEARNING OUTCOMES

At the end of this session, students will be able to:

LO2: Explain how to use knowledge representation for reasoning purposes
LEARNING OBJECTIVE
1. Acting Under Uncertainty
2. Basic Probability Notation
3. Inference Using Full Joint Distributions
4. Independence
5. Bayes’ Rule and Its Use
6. Summary
ACTING UNDER UNCERTAINTY
o An agent may need to handle uncertainty, whether due to partial
observability, nondeterminism, or a combination of the two. An
agent may never know for certain what state it is in or where it
will end up after a sequence of actions.
o The agent's knowledge can at best provide only a degree of
belief in the relevant sentences.
o The main tool for dealing with degrees of belief is probability theory.
o Probability provides a way of summarizing the uncertainty that
comes from laziness and ignorance, thereby solving the
qualification problem.
ACTING UNDER UNCERTAINTY
Uncertainty and rational decisions
o To make such choices, an agent must first have preferences between the
different possible outcomes of the various plans.
o Preferences, as expressed by utilities, are combined with probabilities in the
general theory of rational decisions called decision theory.
Decision Theory = probability theory + utility theory
o The fundamental idea of decision theory is that an agent is rational if and only
if it chooses the action that yields the highest expected utility, averaged over
all the possible outcomes of the action. This is called the principle of
maximum expected utility (MEU).
o The primary difference is that the decision-theoretic agent’s belief state
represents not just the possibilities for world states but also their
probabilities. Given the belief state, the agent can make probabilistic
predictions of action outcomes and hence select the action with highest
expected utility.
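A minimal sketch of the MEU rule in Python. The action names (borrowed from the A25 route example later in this deck, plus a hypothetical A120), outcome probabilities, and utilities are illustrative assumptions, not values from the slides.

```python
# Minimal sketch of the maximum expected utility (MEU) rule.
# The actions, outcome probabilities, and utilities below are made-up
# placeholders, not values taken from the slides.
actions = {
    # action name: list of (probability, utility) pairs over its outcomes
    "leave_at_A25":  [(0.04, 100), (0.96, -10)],   # arrive on time vs. arrive late
    "leave_at_A120": [(0.95, 70),  (0.05, -10)],
}

def expected_utility(outcomes):
    """Probability-weighted average of the outcome utilities."""
    return sum(p * u for p, u in outcomes)

# A rational agent picks the action with the highest expected utility.
best = max(actions, key=lambda a: expected_utility(actions[a]))
print(best, expected_utility(actions[best]))   # leave_at_A120 66.0
```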
ACTING UNDER UNCERTAINTY
Methods for Handling Uncertainty
o Default or nonmonotonic logic:
o Assume my car does not have a flat tire
• Assume A25 works unless contradicted by evidence
o Issues: What assumptions are reasonable? How to handle contradiction?
o Rules with fudge factors:
• A25 |→0.3 get there on time
• Sprinkler |→ 0.99 WetGrass
• WetGrass |→ 0.7 Rain
o Issues: Problems with combination, e.g., Sprinkler causes Rain??
o Probability
• Model agent's degree of belief
• Given the available evidence,
• A25 will get me there on time with probability 0.04
SUMMARIZING UNCERTAINTY
o Example of uncertain reasoning: diagnosing a dental patient's toothache.
Diagnosis, whether for medicine, automobile repair, or anything else, almost
always involves uncertainty.
o Let's try to write rules for dental diagnosis using propositional logic. Consider
the following simple rule:
Toothache ⇒ Cavity
o This rule is wrong:
Toothache ⇒ Cavity (x)
o Not all patients with toothaches have cavities; some of them have gum
disease, an abscess, or one of several other problems:
Toothache ⇒ Cavity ∨ GumProblem ∨ Abscess ...
Unfortunately, in order to make the rule true, we have to add an almost
unlimited list of possible problems.
SUMMARIZING UNCERTAINTY
o Let’s try this rule
Cavity ⇒ Toothache (x)
But this rule is not right either; not all cavities cause pain
o The only way to fix the rule is to make it logically exhaustive: to
augment the left-hand side with all the qualifications required for a
cavity to cause a toothache
o The agent’s knowledge can at best provide only a degree of belief in
the relevant sentences. The main tool for dealing with degrees of
belief is probability theory.
SUMMARIZING UNCERTAINTY
Uncertainty in the world model
o True uncertainty: rules are probabilistic in nature
e.g., rolling dice
o Laziness: too hard to determine exceptionless rules
it takes too much work to determine all the relevant factors
it is too hard to use the enormous rules that result
o Theoretical ignorance: we don't know all the rules
the problem domain has no complete theory
o Practical ignorance: we do know all the rules BUT
we haven't collected all the relevant information for a particular case
BASIC PROBABILITY NOTATION
o Probability model: 0 ≤ P(ω) ≤ 1 for every possible world ω, and Σω P(ω) = 1
o For any proposition φ, P(φ) = Σω:ω╞φ P(ω)
o Conditional probability for propositions a and b:
P(a | b) = P(a ∧ b) / P(b), whenever P(b) > 0
o For example (the "|" is pronounced "given"):
P(cavity | toothache) is the probability of a cavity given that a toothache is all we know
o In a different form, called the product rule:
P(a ∧ b) = P(a | b) P(b)

BASIC PROBABILITY NOTATION
Example:
o 100 children were asked, "What is the meaning of the traffic light?", and the
answers were:
75 children know the meaning of the red lamp
35 children know the meaning of the yellow lamp
50 children know the meaning of both
o Therefore:
P(red ∨ yellow) = P(red) + P(yellow) – P(red ∧ yellow)
P(red ∨ yellow) = 0.75 + 0.35 – 0.5 = 0.6
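A quick check of this inclusion-exclusion calculation:

```python
# Inclusion-exclusion for the traffic-light example:
# P(red ∨ yellow) = P(red) + P(yellow) - P(red ∧ yellow)
p_red, p_yellow, p_both = 75 / 100, 35 / 100, 50 / 100
print(round(p_red + p_yellow - p_both, 2))   # 0.6
```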
INFERENCE USING FULL JOINT DISTRIBUTIONS
o A complete specification of the state of the world about which
the agent is uncertain.
E.g., if the world consists of only two Boolean variables
Cavity and Toothache, then there are 4 distinct atomic
events:
Cavity = false ∧ Toothache = false
Cavity = false ∧ Toothache = true
Cavity = true ∧ Toothache = false
Cavity = true ∧ Toothache = true
o Atomic events are mutually exclusive and exhaustive


INFERENCE USING FULL JOINT DISTRIBUTIONS
Full Joint Distribution
o Start with the joint probability distribution for Toothache, Catch, and Cavity
o For any proposition φ, sum the atomic events where it is true:
P(φ) = Σω:ω╞φ P(ω)
INFERENCE USING FULL JOINT DISTRIBUTIONS
Inference by Enumeration
o Start with the joint probability distribution for Toothache, Catch, and Cavity
o There are six possible worlds in which cavity ∨ toothache holds:
P(cavity ∨ toothache)
= 0.108 + 0.012 + 0.072 + 0.008 + 0.016 + 0.064 = 0.28
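The joint distribution table itself did not survive the slide export. The sketch below encodes the standard dentist table from Russell & Norvig, Chapter 13 (cited in the references); the six entries quoted on this slide appear in it, and the two remaining entries (0.144 and 0.576) are taken from that source rather than from the slides. It then reproduces the enumeration above.

```python
# Full joint distribution over (Toothache, Catch, Cavity) for the dentist example.
# Values follow the table in Russell & Norvig, Ch. 13; they are consistent with
# every sum quoted on these slides.
joint = {
    # (toothache, catch, cavity): probability
    (True,  True,  True): 0.108,  (True,  True,  False): 0.016,
    (True,  False, True): 0.012,  (True,  False, False): 0.064,
    (False, True,  True): 0.072,  (False, True,  False): 0.144,
    (False, False, True): 0.008,  (False, False, False): 0.576,
}

def prob(holds):
    """P(φ): sum the entries of the worlds in which the proposition φ holds."""
    return sum(p for world, p in joint.items() if holds(*world))

print(round(prob(lambda toothache, catch, cavity: cavity or toothache), 3))  # 0.28
print(round(prob(lambda toothache, catch, cavity: toothache), 3))            # 0.2
```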
INFERENCE USING FULL JOINT DISTRIBUTIONS
Inference by Enumeration
o Start with the joint probability distribution for Toothache, Catch, and Cavity
o P(toothache) = 0.108 + 0.012 + 0.016 + 0.064 = 0.2


INFERENCE USING FULL JOINT DISTRIBUTIONS
Inference by Enumeration
o Start with the joint probability distribution for Toothache, Catch, and Cavity
o Can also compute conditional probabilities:
P(¬cavity | toothache) = P(¬cavity ∧ toothache) / P(toothache)
= (0.016 + 0.064) / (0.108 + 0.012 + 0.016 + 0.064) = 0.4
INFERENCE USING FULL JOINT DISTRIBUTIONS
Normalization
o Denominator can be viewed as a normalization constant α
P(Cavity | toothache) = α P(Cavity, toothache)
= α [P(Cavity, toothache, catch) + P(Cavity, toothache, ¬catch)]
= α [⟨0.108, 0.016⟩ + ⟨0.012, 0.064⟩]
= α ⟨0.12, 0.08⟩ = ⟨0.6, 0.4⟩
o General idea: compute the distribution on the query variable by fixing the
evidence variables and summing over the hidden variables
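Continuing with the same hedged `joint` table from the sketch above, normalization amounts to summing out the hidden variable Catch and rescaling:

```python
# Normalization: P(Cavity | toothache) = α P(Cavity, toothache)
#               = α Σ_catch P(Cavity, toothache, catch)
# Reuses the `joint` dictionary defined in the previous sketch.
def cavity_given_toothache(toothache=True):
    unnormalized = {
        cavity: sum(joint[(toothache, catch, cavity)] for catch in (True, False))
        for cavity in (True, False)
    }
    alpha = 1.0 / sum(unnormalized.values())       # normalization constant α
    return {cavity: alpha * p for cavity, p in unnormalized.items()}

print({c: round(p, 2) for c, p in cavity_given_toothache().items()})
# {True: 0.6, False: 0.4}
```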
INFERENCE USING FULL JOINT DISTRIBUTIONS
Inference by Enumeration
o Typically, we are interested in the posterior joint distribution of the
query variables Y given specific values e for the evidence variables E
o Let the hidden variables be H = X - Y - E
o Then the required summation of joint entries is done by summing
out the hidden variables:
P(Y | E = e) = α P(Y, E = e) = α Σh P(Y, E = e, H = h)
o The terms in the summation are joint entries because Y, E and H
together exhaust the set of random variables
o Obvious problems:
1. Worst-case time complexity O(dⁿ), where d is the largest arity
2. Space complexity O(dⁿ) to store the joint distribution
3. How to find the numbers for O(dⁿ) entries?
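A generic version of this enumeration procedure, again reusing the hedged `joint` table from the earlier sketch (query Cavity, evidence Toothache = true, hidden variable Catch):

```python
# Inference by enumeration: P(Y | E = e) = α Σ_h P(Y, E = e, H = h),
# over the `joint` dictionary from the earlier sketch.
VARS = ("Toothache", "Catch", "Cavity")   # order of the keys in `joint`

def enumerate_ask(query_var, evidence):
    """Posterior distribution over query_var given an evidence dict."""
    dist = {True: 0.0, False: 0.0}
    for world, p in joint.items():
        assignment = dict(zip(VARS, world))
        if all(assignment[v] == val for v, val in evidence.items()):
            dist[assignment[query_var]] += p   # hidden variables are summed out
    alpha = 1.0 / sum(dist.values())
    return {val: alpha * p for val, p in dist.items()}

print(enumerate_ask("Cavity", {"Toothache": True}))   # ≈ {True: 0.6, False: 0.4}
```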
INDEPENDENCE

o Consider the full joint distribution P(Toothache, Catch, Cavity, Weather),
which has 2 x 2 x 2 x 4 = 32 entries
o For example, how are
P(toothache, catch, cavity, cloudy) and P(toothache, catch, cavity) related?
o Use the product rule:
P(toothache, catch, cavity, cloudy)
= P(cloudy | toothache, catch, cavity) P(toothache, catch, cavity)
o The weather does not influence the dental variables, therefore:
P(cloudy | toothache, catch, cavity) = P(cloudy)
INDEPENDENCE

o Independence between propositions a and b can be written as
P(a | b) = P(a) or P(b | a) = P(b) or P(a ∧ b) = P(a) P(b)
o Independence assertions are usually based on knowledge of the
domain
o Independence can reduce the amount of information necessary to
specify the full joint distribution.
o If the complete set of variables can be divided into independent
subsets, then the full joint can be factored into separate joint
distributions on those subsets
INDEPENDENCE
o A and B are independent iff
P(A | B) = P(A) or P(B | A) = P(B) or P(A, B) = P(A) P(B)
o P(Toothache, Catch, Cavity, Weather)
= P(Toothache, Catch, Cavity) P(Weather)
INDEPENDENCE

o 32 entries reduced to 12; for n independent biased coins, O(2ⁿ) → O(n)
o Absolute independence is powerful but rare
o Dentistry is a large field with hundreds of variables, none of which
are independent. What to do?
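A small illustration of this reduction: under the independence assumption above, 8 + 4 = 12 numbers determine all 32 entries. The weather distribution here is a made-up assumption; the dental `joint` is the hedged table from the earlier sketch.

```python
# Factoring by independence:
# P(Toothache, Catch, Cavity, Weather) = P(Toothache, Catch, Cavity) · P(Weather)
weather = {"sunny": 0.6, "rain": 0.1, "cloudy": 0.29, "snow": 0.01}  # made-up values

combined = {
    world + (w,): p_dental * p_w            # 8 * 4 = 32 entries from 8 + 4 numbers
    for world, p_dental in joint.items()
    for w, p_w in weather.items()
}
print(len(combined), round(sum(combined.values()), 6))   # 32 1.0
```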
INDEPENDENCE
Conditional Independence
o P(Toothache, Cavity, Catch) has 2³ – 1 = 7 independent entries
o If I have a cavity, the probability that the probe catches in it doesn't
depend on whether I have a toothache:
(1) P(catch | toothache, cavity) = P(catch | cavity)
o The same independence holds if I haven't got a cavity:
(2) P(catch | toothache, ¬cavity) = P(catch | ¬cavity)
o Catch is conditionally independent of Toothache given Cavity:
P(Catch | Toothache, Cavity) = P(Catch | Cavity)
o Equivalent statements:
P(Toothache | Catch, Cavity) = P(Toothache | Cavity)
P(Toothache, Catch | Cavity) = P(Toothache | Cavity) P(Catch | Cavity)
INDEPENDENCE
Conditional Independence
o Write out the full joint distribution using the chain rule:
P(Toothache, Catch, Cavity)
= P(Toothache | Catch, Cavity) P(Catch, Cavity)
= P(Toothache | Catch, Cavity) P(Catch | Cavity) P(Cavity)
= P(Toothache | Cavity) P(Catch | Cavity) P(Cavity)
i.e., 2 + 2 + 1 = 5 independent numbers (see the sketch after this slide)
o In most cases, the use of conditional independence reduces the
size of the representation of the joint distribution from exponential
in n to linear in n.
o Conditional independence is our most basic and robust form of
knowledge about uncertain environments.
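As promised above, a sketch that rebuilds every entry of the hedged `joint` table from the factored form P(Toothache | Cavity) P(Catch | Cavity) P(Cavity); in that particular table the factorization happens to hold exactly.

```python
# Conditional independence: P(T, C, Cav) = P(T | Cav) · P(C | Cav) · P(Cav),
# checked against the `joint` table from the earlier sketch.
def marginal(toothache=None, catch=None, cavity=None):
    """Sum the joint entries that match the given partial assignment."""
    return sum(
        p for (t, ca, cv), p in joint.items()
        if (toothache is None or t == toothache)
        and (catch is None or ca == catch)
        and (cavity is None or cv == cavity)
    )

for (t, ca, cv), p in joint.items():
    p_cv = marginal(cavity=cv)
    factored = (marginal(toothache=t, cavity=cv) / p_cv) * \
               (marginal(catch=ca, cavity=cv) / p_cv) * p_cv
    assert abs(factored - p) < 1e-9

print("2 + 2 + 1 = 5 numbers reproduce all 8 joint entries")
```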
PROBABILITY AND BAYES’ THEOREM
Definition:
P(Hi | E) = probability that the hypothesis Hi is true given the
evidence E
P(E | Hi) = probability that the evidence E is observed given that the
hypothesis Hi is true
P(Hi) = a priori probability: the probability that the hypothesis Hi
holds before looking at any evidence
k = number of possible hypotheses

P(Hi | E) = P(E | Hi) * P(Hi) / Σn=1..k P(E | Hn) * P(Hn)
PROBABILITY AND BAYES’ THEOREM
Example 1:
Vany has developed symptoms, namely spots on her face. The doctor diagnoses
whether Vany has chicken pox, given the following probabilities:
• Probability of spots on the face if Vany has chicken pox, p(spots | chicken pox) = 0.8
• Probability that Vany has chicken pox, before observing any symptoms, p(chicken pox) = 0.4
• Probability of spots on the face if Vany has an allergy, p(spots | allergy) = 0.3
• Probability that Vany has an allergy, before observing any symptoms, p(allergy) = 0.7
• Probability of spots on the face if Vany has pimples, p(spots | pimples) = 0.9
• Probability that Vany has pimples, before observing any symptoms, p(pimples) = 0.5
Calculate the probability of each hypothesis given the evidence above!
PROBABILITY AND BAYES’ THEOREM
Solution:
p(chickenpox | spots)
= p(spots | chickenpox) * p(chickenpox)
  / [p(spots | chickenpox) * p(chickenpox) + p(spots | allergy) * p(allergy) + p(spots | pimples) * p(pimples)]
= (0.8 * 0.4) / [(0.8 * 0.4) + (0.3 * 0.7) + (0.9 * 0.5)]
= 0.32 / 0.98 = 0.327

In the same way, we can obtain:
p(allergy | spots) = (0.3 * 0.7) / 0.98 = 0.214
p(pimples | spots) = (0.9 * 0.5) / 0.98 = 0.459
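A quick numerical check of Example 1, following the slide's arithmetic directly:

```python
# Posterior of each hypothesis given the "spots" evidence (Example 1).
likelihood = {"chickenpox": 0.8, "allergy": 0.3, "pimples": 0.9}   # p(spots | H)
prior      = {"chickenpox": 0.4, "allergy": 0.7, "pimples": 0.5}   # p(H)

evidence = sum(likelihood[h] * prior[h] for h in prior)            # 0.98
posterior = {h: likelihood[h] * prior[h] / evidence for h in prior}
print({h: round(p, 3) for h, p in posterior.items()})
# {'chickenpox': 0.327, 'allergy': 0.214, 'pimples': 0.459}
```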
PROBABILITY AND BAYES’ THEOREM
Example 2:
o Problem : Marie is getting married tomorrow, at an outdoor ceremony
in the desert. In recent years, it has rained only 5 days each year.
Unfortunately, the weatherman has predicted rain for tomorrow. When
it actually rains, the weatherman correctly forecasts rain 90% of the
time. When it doesn't rain, he incorrectly forecasts rain 10% of the time.
What is the probability that it will rain on the day of Marie's wedding?
o Solution: The sample space is defined by two mutually-exclusive events
- it rains or it does not rain. Additionally, a third event occurs when the
weatherman predicts rain. Notation for these events appears below.
 Event A1. It rains on Marie's wedding.
 Event A2. It does not rain on Marie's wedding.
 Event B. The weatherman predicts rain.
PROBABILITY AND BAYES’ THEOREM
In terms of probabilities, we know the following:
o P( A1 ) = 5/365 = 0.0136985 [It rains 5 days out of the year.]
o P( A2 ) = 360/365 = 0.9863014 [It does not rain 360 days out of the
year.]
o P( B | A1 ) = 0.9 [When it rains, the weatherman predicts rain 90% of
the time.]
o P( B | A2 ) = 0.1 [When it does not rain, the weatherman predicts
rain 10% of the time.]
o We want to know P( A1 | B ), the probability it will rain on the day of
Marie's wedding, given a forecast for rain by the weatherman. The
answer can be determined from Bayes' theorem, as shown below.
PROBABILITY AND BAYES’ THEOREM
The Answer
P( A1 | B ) = P( A1 ) P( B | A1 ) / [ P( A1 ) P( B | A1 ) + P( A2 ) P( B | A2 ) ]
            = (0.0136985)(0.9) / [ (0.0136985)(0.9) + (0.9863014)(0.1) ]
            ≈ 0.111
• Note the somewhat unintuitive result. Even when the
weatherman predicts rain, it rains only about 11%
of the time. Despite the weatherman's gloomy
prediction, there is a good chance that Marie will not
get rained on at her wedding.
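The same calculation in Python:

```python
# Bayes' rule for the wedding-forecast example (Example 2).
p_rain, p_dry = 5 / 365, 360 / 365                 # P(A1), P(A2)
p_forecast_rain, p_forecast_dry = 0.9, 0.1         # P(B | A1), P(B | A2)

p_rain_given_forecast = (p_forecast_rain * p_rain) / (
    p_forecast_rain * p_rain + p_forecast_dry * p_dry
)
print(round(p_rain_given_forecast, 3))             # 0.111
```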
PROBABILITY AND BAYES’ THEOREM
Applying Bayes' Rule: the simple case
o Product rule:
P(a ∧ b) = P(a | b) P(b) = P(b | a) P(a)
o Equating the two right-hand sides and dividing by P(a) gives Bayes' rule:
P(b | a) = P(a | b) P(b) / P(a)
o or in distribution form:
P(Y | X) = P(X | Y) P(Y) / P(X)
PROBABILITY AND BAYES’ THEOREM
Applying Bayes' Rule: the simple case
o Often we perceive as evidence the effect of some unknown cause and want to
determine that cause. Bayes' rule then becomes:
P(cause | effect) = P(effect | cause) P(cause) / P(effect)
PROBABILITY AND BAYES’ THEOREM
Example:
o The doctor also knows some unconditional facts: the prior probability
that a patient has meningitis is 1/50000, and the prior probability that any
patient has a stiff neck is 1%.
o Letting:
s = the proposition that the patient has a stiff neck
m = the proposition that the patient has meningitis
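The slide does not state the conditional probability P(s | m); the sketch below assumes the value 0.7 used in the corresponding textbook example (Russell & Norvig, Ch. 13) purely to illustrate the calculation:

```python
# Bayes' rule in the causal direction: P(m | s) = P(s | m) P(m) / P(s).
p_s_given_m = 0.7      # ASSUMPTION: textbook value, not given on the slide
p_m = 1 / 50000        # prior probability of meningitis
p_s = 0.01             # prior probability of a stiff neck

print(round(p_s_given_m * p_m / p_s, 4))   # 0.0014
```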
PROBABILITY AND BAYES’ THEOREM
Bayes' Rule: Combining Evidence
P(Cavity | toothache ∧ catch)
= α P(toothache ∧ catch | Cavity) P(Cavity)
= α P(toothache | Cavity) P(catch | Cavity) P(Cavity)
o This is an example of a naive Bayes model:
P(Cause, Effect1, ..., Effectn) = P(Cause) Πi P(Effecti | Cause)
o The total number of parameters is linear in n
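A sketch of this naive Bayes combination. The conditional probabilities below were derived from the hedged `joint` table used in the earlier sketches (in that table the naive independence assumption happens to be exact):

```python
# Naive Bayes: P(Cavity | toothache ∧ catch)
#            = α P(toothache | Cavity) P(catch | Cavity) P(Cavity)
p_cavity = 0.2                                  # P(cavity)
p_toothache_given = {True: 0.6, False: 0.1}     # P(toothache | Cavity = c)
p_catch_given     = {True: 0.9, False: 0.2}     # P(catch | Cavity = c)

unnormalized = {
    c: p_toothache_given[c] * p_catch_given[c] * (p_cavity if c else 1 - p_cavity)
    for c in (True, False)
}
alpha = 1.0 / sum(unnormalized.values())
print({c: round(alpha * p, 3) for c, p in unnormalized.items()})
# {True: 0.871, False: 0.129}
```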
SUMMARY
o Uncertainty arises because of both laziness and ignorance. It is
inescapable in complex, nondeterministic, or partially observable
environments.
o Decision theory combines the agent's beliefs and desires, defining
the best action as the one that maximizes expected utility.
o Basic probability statements include prior probabilities and
conditional probabilities over simple and complex propositions.
o The axioms of probability constrain the possible assignments of
probabilities to propositions. An agent that violates the axioms
must behave irrationally in some cases.

SUMMARY
o The full joint probability distribution specifies the probability of
each complete assignment of values to random variables. It is
usually too large to create or use in its explicit form, but when it is
available it can be used to answer queries simply by adding up
entries for the possible worlds corresponding to the query
propositions.
o Bayes’ rule allows unknown probabilities to be computed from
known conditional probabilities, usually in the causal direction.
Applying Bayes’ rule with many pieces of evidence runs into the
same scaling problems as does the full joint distribution.

REFERENCES
o Stuart Russell, Peter Norvig. 2010. Artificial Intelligence: A Modern
Approach. PE. New Jersey. ISBN: 9780132071482, Chapter 13.
o Elaine Rich, Kevin Knight, Shivashankar B. Nair. 2010.
Artificial Intelligence. MHE. New York. Chapter 7.
o Reasoning under Uncertainty:
http://aitopics.net/Uncertainty
o Handling Uncertainty in Artificial Intelligence, and the
Bayesian Controversy:
http://eprints.ucl.ac.uk/16378/1/16378.pdf
Thank You...
