
Defining Uncertainty:

• Information can be incomplete, inconsistent, uncertain,
or all three. In other words, information is often
unsuitable for solving a problem.
• Uncertainty is defined as the lack of the exact
knowledge that would enable us to reach a perfectly
reliable conclusion. Classical logic permits only exact
reasoning. It assumes that perfect knowledge always
exists and that the law of the excluded middle can always be
applied.
Reasoning in Uncertain Situations:
• Traditional inference procedures followed the model of
reasoning used in the predicate calculus
o That is, from correct premises, sound inference
rules produce new, guaranteed correct conclusions.
• In practice, we must often draw useful conclusions from poorly formed
and uncertain evidence using unsound inference rules.
• We do this very successfully in almost every aspect of our
daily life:
o we make correct medical diagnoses from ambiguous symptoms
o we comprehend language statements that are often
ambiguous or incomplete
o we recognize friends from their voices or their gestures
• Example: to demonstrate the problem of reasoning in
ambiguous situations, consider Rule 2:

Dr. A. H. Shanthakumara 1 Dept. of CSE

if
the engine does not turn over, and
the lights do not come on
then
the problem is battery or cables
• Failure of the engine to turn over and of the lights to come
on does not necessarily imply that the battery and cables
are bad. Interestingly, the converse of the rule is true:
if
the problem is battery or cables
then
the engine does not turn over, and
the lights do not come on.
• With a dead battery, neither the lights nor the starter
will work.
• This rule offers an example of abductive reasoning in an
expert system
• Abduction means systematic guessing: "infer" an
assumption from a conclusion
• Abduction states that from P→ Q and Q it is possible to
infer P
• Abduction is an unsound rule of inference, meaning that
the conclusion is not necessarily true for every
interpretation in which the premises are true
• Although abduction is unsound, it is often essential to
solving problems.
o Faults or diseases cause (imply) symptoms, not the
other way around; but diagnosis must work from
symptoms back to their causes.
• We often attach a certainty factor to the rule to measure
confidence in its conclusion.


o The rule P → Q (0.9) expresses the belief “If you
believe P to be true, then you believe Q will happen
90% of the time”
• Another issue for expert system reasoning is how to
draw useful results from data with missing, incomplete,
or incorrect information
o We may use certainty measures to reflect our belief
in the quality of data
o Beliefs and imperfect data can be propagated
through rules to constrain conclusions.
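A small Python sketch can illustrate the abductive step. It assumes a hypothetical rule base of (cause, effect, certainty factor) triples. The rule names and the 0.9/0.6 values are illustrative and are not taken from the notes. Given an observed effect Q, the sketch returns every cause P from a rule P → Q and ranks the causes by the rule's certainty factor. This reflects the fact that abduction proposes explanations rather than proving them.

```python
# Hypothetical rule base: (cause P, effect Q, certainty factor of P -> Q).
RULES = [
    ("battery_or_cables_bad", "engine_wont_turn_over", 0.9),
    ("starter_motor_bad", "engine_wont_turn_over", 0.6),
    ("battery_or_cables_bad", "lights_dont_come_on", 0.9),
]


def abduce(observation, rules):
    """Use rules P -> Q backwards: from an observed Q, propose each P.

    Abduction is unsound, so the result is a ranked list of candidate
    explanations, not proven conclusions.
    """
    candidates = [(cause, cf) for cause, effect, cf in rules if effect == observation]
    return sorted(candidates, key=lambda pair: pair[1], reverse=True)


print(abduce("engine_wont_turn_over", RULES))
# [('battery_or_cables_bad', 0.9), ('starter_motor_bad', 0.6)]
```

Both candidates remain hypotheses. Further evidence, such as whether the lights work, is needed to choose between them.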

Logic-Based Abductive Inference:

• Based on predicate logic
• Three important assumptions:
o Predicate descriptions are sufficient with respect to the application domain
o Information is consistent
o Knowledge base grows monotonically
Non-monotonic Logic:
• Relaxes the three assumptions of traditional logic:
o Knowledge is incomplete
▪ No knowledge about p: true or false?
▪ Prolog – closed world assumption
• the only true facts are those that are
explicitly listed as true in KB
o Knowledge is inconsistent
▪ Based on how the world usually works
▪ Most birds fly, but the ostrich does not
o Knowledge base grows non-monotonically


▪ A new observation may contradict the existing
knowledge, so the existing knowledge may
need to be removed.
▪ Inference is based on assumptions: what happens if
the assumptions are later shown to be incorrect?
• Three modal operators are introduced
1. Unless Operator:
• New information may invalidate previous results
• Implemented in Truth Maintenance Systems (TMS), which keep
track of the reasoning steps and preserve the logical integrity of the KB
• Introduce the unless operator
o Supports inferences based on the belief that its
argument is not true
o Consider
▪ p(X) unless q(X) → r(X)
If p(X) is true and we do not believe q(X) to be true, then infer r(X)
▪ p(Z)
▪ r(W) → s(W)
o From the above, we conclude s(X).
o What if we later change our belief, or find q(X) to be true?
▪ We must retract r(X) and s(X)
o Unless deals with belief, not truth
▪ r(X) follows if q(X) is either unknown or believed false
▪ r(X) is blocked if q(X) is believed or known true
o Monotonicity
▪ A monotonic system cannot represent such a change of mind
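Here is a minimal Python sketch of the unless example. It assumes that ground facts are (predicate, argument) tuples and that the beliefs about q are kept in a plain set. The sketch does not model a TMS. It simply recomputes the conclusions from scratch, which is enough to show the nonmonotonic effect: adding the belief q(z) removes r(z) and s(z).

```python
def derive(p_facts, q_beliefs):
    """Forward-chain the example rules for every individual x in p_facts.

    Rule 1: p(x) unless q(x) -> r(x)   fires only if q(x) is not believed
    Rule 2: r(x) -> s(x)
    """
    conclusions = {("r", x) for x in p_facts if x not in q_beliefs}
    conclusions |= {("s", x) for (pred, x) in conclusions if pred == "r"}
    return conclusions


before = derive({"z"}, q_beliefs=set())    # q(z) unknown
after = derive({"z"}, q_beliefs={"z"})     # we now believe q(z)
print(sorted(before))          # [('r', 'z'), ('s', 'z')]
print(sorted(before - after))  # conclusions to retract: [('r', 'z'), ('s', 'z')]
```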


2. Is-consistent-with Operator M:
• When reasoning, make sure the premises are consistent
with the KB
• Format: M p – p is consistent with the KB
• Consider
o ∀X good_student(X) ∧ M study_hard(X) → graduates(X)
o For all X who is a good student, if the fact that X
studies hard is consistent with the KB, then X will graduate
o It is not necessary to prove that X studies hard.
• How to decide whether p is consistent with the KB:
o Negation as failure
o Heuristic-based and limited search
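The M operator can be sketched with negation as failure, the first of the two methods listed above. This Python encoding is hypothetical on two points: the KB is a flat set of ground facts, and a negated fact is stored under a not_ prefix. Under that encoding, M study_hard(X) holds whenever not_study_hard(X) cannot be found.

```python
# Hypothetical KB: a flat set of ground facts; negation stored as "not_<pred>".
KB = {("good_student", "david")}


def provable(fact, kb):
    return fact in kb


def consistent(fact, kb):
    """M fact: holds if the negation of fact cannot be proved (negation as failure)."""
    pred, arg = fact
    return not provable(("not_" + pred, arg), kb)


def graduates(x, kb):
    # forall X: good_student(X) and M study_hard(X) -> graduates(X)
    return provable(("good_student", x), kb) and consistent(("study_hard", x), kb)


print(graduates("david", KB))                                 # True
print(graduates("david", KB | {("not_study_hard", "david")}))  # False
```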
3. Default Logic:
• Introduces a new format of inference rules:
o A(Z) ∧ :B(Z) → C(Z)
o If A(Z) is provable, and it is consistent with what we
know to assume B(Z), then conclude C(Z)
• Comparison with the is-consistent-with operator
o Similar
o The difference is the reasoning method
▪ In default logic, the new rules are used to infer sets
of plausible extensions
o Example:
∀X good_student(X) ∧ :study_hard(X) → graduates(X)
∀Y party(Y) ∧ :not(study_hard(Y)) → not(graduates(Y))

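The two example defaults conflict for someone who is both a good student and a party person. The Python sketch below uses a simplified, order-based procedure, which is an assumption on my part and not Reiter's full fixed-point definition of an extension. It applies the defaults in every possible order, skips any default whose justification or conclusion is contradicted by what is already believed, and collects the resulting belief sets.

```python
from itertools import permutations


def neg(lit):
    """Negate a literal written as 'p' or 'not(p)'."""
    return lit[4:-1] if lit.startswith("not(") else f"not({lit})"


# Defaults for david: (prerequisite A, justification B, conclusion C), i.e. A ∧ :B → C.
DEFAULTS = [
    ("good_student(david)", "study_hard(david)", "graduates(david)"),
    ("party(david)", "not(study_hard(david))", "not(graduates(david))"),
]
FACTS = {"good_student(david)", "party(david)"}


def extensions(facts, defaults):
    """Simplified: apply defaults in every order, skipping any default whose
    justification or conclusion is contradicted by current beliefs."""
    results = set()
    for order in permutations(defaults):
        beliefs = set(facts)
        for pre, just, concl in order:
            if pre in beliefs and neg(just) not in beliefs and neg(concl) not in beliefs:
                beliefs.add(concl)
        results.add(frozenset(beliefs - facts))
    return results


for ext in extensions(FACTS, DEFAULTS):
    print(sorted(ext))
```

For david, the two orders yield two plausible extensions: in one he graduates, and in the other he does not.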
Next Class: Truth Maintenance Systems


Truth Maintenance Systems:
• Employed to protect the logical integrity of the
conclusions of an inferencing system.
• Whenever the belief base is revised, it is necessary to
recompute support for the items in the knowledge base.
• Truth maintenance systems address this issue by
storing justifications for each inference and then
reconsidering support for conclusions in the light of new beliefs.
• Backtracking is a systematic method for exploring all the
alternatives for decision points in search-based problem
solving
o it systematically checks all alternatives in the space
o it is time-consuming and inefficient, and in a very large
space, useless
• What we really want is to backtrack directly to the point in
the space where the problem occurs, and to make
adjustments to the solution at that state (dependency-directed
backtracking)
• In order to use dependency-directed backtracking in a
reasoning system, we must:
1. Associate with the production of each conclusion its
justification
• The justification must contain all the facts, rules, and
assumptions used to produce the conclusion.
2. Provide a mechanism that, when given a contradiction
along with its justification, finds the set of false
assumptions within that justification that led to the
contradiction.
3. Retract the false assumption(s).


4. Create a mechanism that follows up the retracted
assumption(s) and retracts any conclusion whose
justification uses a retracted false assumption.
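The four steps can be sketched in Python over a hypothetical dependency record. In that record, each conclusion maps to the facts, rules, and assumptions it was derived from (step 1). The node names c1, c2, a1, a2, fact1 and fact2 are illustrative. `culprits` walks the contradiction's justification back to its assumptions (step 2). `retract` returns every conclusion that depends on a retracted assumption (steps 3 and 4).

```python
# Hypothetical dependency record (step 1): conclusion -> nodes it was derived from.
JUSTIFICATIONS = {
    "c1": {"a1", "fact1"},
    "c2": {"c1", "a2"},
    "contradiction": {"c2", "fact2"},
}
ASSUMPTIONS = {"a1", "a2"}


def support(node, just):
    """All nodes that node depends on, following justifications transitively."""
    seen, stack = set(), [node]
    while stack:
        for parent in just.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def culprits(contradiction, just, assumptions):
    """Step 2: the assumptions within the contradiction's justification."""
    return support(contradiction, just) & assumptions


def retract(assumption, just):
    """Steps 3-4: conclusions that must go when the assumption is retracted."""
    return {c for c in just if assumption in support(c, just)}


print(sorted(culprits("contradiction", JUSTIFICATIONS, ASSUMPTIONS)))  # ['a1', 'a2']
print(sorted(retract("a2", JUSTIFICATIONS)))  # ['c2', 'contradiction']
```

The sketch leaves the choice of which culprit to retract to the problem solver. It only reports what each choice would invalidate.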

Methods for implementing dependency-directed backtracking:

1. Justification-based truth maintenance system (JTMS)
• Three main operations are performed by the JTMS:
a. inspect the network of justifications
o This inspection can be triggered by queries from the
problem solver
b. modify the dependency network, where modifications
are driven by information supplied by the problem solver
o adding new propositions, adding or removing
premises, adding contradictions, and justifying the
belief in a proposition
c. update the network
o executed whenever a change is made in the
dependency network.
o update operation recomputes the labels of all
propositions in a manner that is consistent with
existing justifications.
Demonstration of JTMS:
• JTMS works with sets of nodes and justifications. Nodes
stand for beliefs, and justifications support belief in nodes.
• Associated with nodes are the labels IN and OUT, which
indicate the belief status of the associated node.


• We can reason about the support for any node by
relating it to the INs and OUTs of the other nodes that
make up its justification(s)
• We construct a simple dependency network
• Consider the modal operator M, which, placed before a
predicate, is read as “is consistent with”.
• For example:
∀X good_student(X) ∧ M study_hard(X) → graduates(X)

∀Y party_person(Y) → not(study_hard(Y))

• We now make this set of propositions into a
justification network.
o Each predicate representing a belief is associated with two other sets of beliefs.
• The first, labeled IN, are propositions that should
be believed for the proposition to hold
• The second, labeled OUT, are propositions that should
not be believed for the proposition to hold
• The premises of justifications are labeled, as are the
combinations of propositions that support a conclusion


• With the information in the network, the problem solver
can reason that study_hard(david) is supported, because
the premise good_student(david) is considered true and
it is consistent with the fact that good students study
hard.
• Suppose we add the premise party_person(david).
o This enables the derivation of not(study_hard(david)), and
the belief study_hard(david) is no longer supported
o In the revised justification network, note the relabeling of
IN and OUT: not(study_hard(david)) becomes IN and study_hard(david) becomes OUT.
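The labelling rule used above can be sketched in Python. A node is IN if it is a premise, or if some justification has all of its IN-list nodes IN and all of its OUT-list nodes OUT. The encoding is an assumption. The sketch also assumes an acyclic network and recomputes every label, whereas a real JTMS updates labels incrementally.

```python
# Hypothetical encoding of the david network: node -> list of (IN-list, OUT-list).
JUSTIFICATIONS = {
    "study_hard(david)": [(["good_student(david)"], ["not(study_hard(david))"])],
    "not(study_hard(david))": [(["party_person(david)"], [])],
}
NODES = ["good_student(david)", "party_person(david)",
         "study_hard(david)", "not(study_hard(david))"]


def is_in(node, premises):
    """IN if a premise, or if some justification has its IN-list all IN and
    its OUT-list all OUT. Assumes no cycles, so plain recursion terminates."""
    if node in premises:
        return True
    for in_list, out_list in JUSTIFICATIONS.get(node, []):
        if all(is_in(n, premises) for n in in_list) and \
                not any(is_in(n, premises) for n in out_list):
            return True
    return False


def labels(premises):
    return {node: "IN" if is_in(node, premises) else "OUT" for node in NODES}


before = labels({"good_student(david)"})
after = labels({"good_student(david)", "party_person(david)"})
print(before["study_hard(david)"], after["study_hard(david)"])  # IN OUT
```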


2. Assumption-based truth maintenance system (ATMS)
• The labels for nodes in the network are no longer IN and
OUT but rather the sets of premises (assumptions)
underlying their derivation.
• An advantage of ATMS over JTMS:
o additional flexibility in dealing with multiple possible states of
belief: there is no longer a single state of belief but
rather subsets of potential supporting premises.
o The creation of different belief sets, or possible
worlds, allows a comparison of results from
different choices of premises
• Disadvantages of ATMS compared with JTMS:
o Inability to represent premise sets that are
themselves nonmonotonic, and the control over the
problem solver
• Similarity to JTMS:
o The communication between the ATMS/JTMS and
the problem solver for inspection, modification, and
updating


• Suppose we have the ATMS network of Figure 9.4.

• n1, n2, n4, and n5 are premises and are assumed true.

• The dependency network also reflects the relations that
from premises n1 and n2 we support n3, with n3 we
support n7, with n4 we support n7, with n4 and n5 we
support n6, and finally, with n6 we support n7.
• The subset/superset lattice for the premise
dependencies is shown in Figure 9.5.


• This lattice of subsets of premises offers a useful way to
visualize the space of combinations of premises.
• If some premise is found to be suspect, the ATMS will be
able to determine how that premise relates to the other
premise support subsets.
• For example, node n3 in Figure 9.4 will be supported by
all sets of premises that are above {n1,n2} in the lattice
of Figure 9.5.
• The ATMS reasoner removes contradictions by removing
from the nodes those sets of premises that are
discovered to be inconsistent.
• For example, suppose we revise the support for the reasoning
reflected by Figure 9.4 to make n3 a contradiction node.
• Since the label for n3 is {n1, n2}, this set of premises is
determined to be inconsistent.
• In this situation, the labelling {n1, n2} supporting
n7 will also have to be removed.
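The ATMS labels for the network described above can be computed with the Python sketch below. It rests on three assumptions. Labels are kept as sets of minimal environments (premise sets). Each node's label is built by combining one environment from every antecedent of each justification. Any environment that contains a known inconsistent (nogood) set is dropped.

```python
from itertools import product

# Network described in the text: node -> list of justifications (antecedent lists).
JUSTIFICATIONS = {
    "n3": [["n1", "n2"]],
    "n6": [["n4", "n5"]],
    "n7": [["n3"], ["n4"], ["n6"]],
}
PREMISES = {"n1", "n2", "n4", "n5"}


def minimal(envs):
    """Keep only environments that have no proper subset also in envs."""
    return {e for e in envs if not any(other < e for other in envs)}


def label(node, nogoods=()):
    """ATMS label: minimal premise sets supporting node, excluding nogoods."""
    if node in PREMISES:
        envs = {frozenset([node])}
    else:
        envs = set()
        for antecedents in JUSTIFICATIONS.get(node, []):
            # pick one supporting environment per antecedent and union them
            for combo in product(*(label(a, nogoods) for a in antecedents)):
                envs.add(frozenset().union(*combo))
    envs = {e for e in envs if not any(bad <= e for bad in nogoods)}
    return minimal(envs)


print(sorted(sorted(e) for e in label("n7")))  # [['n1', 'n2'], ['n4']]
print(sorted(sorted(e) for e in label("n7", nogoods=[frozenset({"n1", "n2"})])))  # [['n4']]
```

Without nogoods, n7 is labelled {{n1, n2}, {n4}}. The set {n4, n5} is dropped because {n4} alone already supports n7. Declaring {n1, n2} nogood leaves n7 with only {n4}.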


The Stochastic Approach to Uncertainty:
• Using probability theory, we can describe how
combinations of events are able to influence each
other
• If we know the frequency with which events
have occurred in the past, we can use this information
(as an inductive bias) to interpret and reason about
present data
• The primary inference mechanism in stochastic
domains is some form of Bayes’ rule
• The full use of Bayesian inference in complex domains
quickly becomes intractable. Probabilistic graphical
models are specifically designed to address this
complexity
• Probabilistic graphical models are a combination of
probability theory and graph theory
• Probabilistic graphical models can address the
problems of uncertainty and complexity
Some Basics:
• The concept of probability has a long history that goes
back thousands of years when words like “probably”,
“likely”, “maybe”, “perhaps” and “possibly” were
introduced into spoken languages. However, the
mathematical theory of probability was formulated only
in the 17th century.
• The probability of an event is the proportion of cases in
which the event occurs. Probability can also be defined
as a scientific measure of chance.


• Probability can be expressed mathematically as a
numerical index ranging from zero (an
absolute impossibility) to unity (an absolute certainty).
• Most events have a probability index strictly between 0
and 1, which means that each event has at least two
possible outcomes: favourable outcome or success, and
unfavourable outcome or failure.

$$P(\text{success}) = \frac{\text{the number of successes}}{\text{the number of possible outcomes}}$$

$$P(\text{failure}) = \frac{\text{the number of failures}}{\text{the number of possible outcomes}}$$

• If s is the number of times success can occur, and f is
the number of times failure can occur, then

$$P(\text{success}) = p = \frac{s}{s+f}, \qquad P(\text{failure}) = q = \frac{f}{s+f}, \qquad p + q = 1$$
• If we throw a coin, the probability of getting a head will
be equal to the probability of getting a tail. In a single
throw, s = f = 1, and therefore the probability of getting
a head (or a tail) is 0.5.


Conditional Probability:
• Let A be an event in the world and B be another event.
Suppose that events A and B are not mutually exclusive,
but occur conditionally on the occurrence of the other.
The probability that event A will occur if event B occurs
is called the conditional probability. Conditional
probability is denoted mathematically as p(A|B), in
which the vertical bar represents "given", and the
complete probability expression is interpreted as
“conditional probability of event A occurring given that
event B has occurred”.

$$p(A \mid B) = \frac{\text{the number of times A and B can occur}}{\text{the number of times B can occur}}$$

• The number of times A and B can occur, or the
probability that both A and B will occur, is called the
joint probability of A and B. It is represented
mathematically as p(A∩B). The number of ways B can
occur is the probability of B, p(B), and thus

$$p(A \mid B) = \frac{p(A \cap B)}{p(B)}$$

• Similarly, the conditional probability of event B
occurring given that event A has occurred equals

$$p(B \mid A) = \frac{p(B \cap A)}{p(A)}$$
• Hence

$$p(B \cap A) = p(B \mid A)\, p(A)$$

• Joint probability is commutative, p(A∩B) = p(B∩A), so

$$p(A \cap B) = p(B \mid A)\, p(A)$$

• Substituting the last equation into the equation

$$p(A \mid B) = \frac{p(A \cap B)}{p(B)}$$

• yields the Bayesian rule:

Bayesian Rule:

$$p(A \mid B) = \frac{p(B \mid A)\, p(A)}{p(B)}$$
• where:
o p(A|B) is the conditional probability that event A
occurs given that event B has occurred;
o p(B|A) is the conditional probability of event B
occurring given that event A has occurred;
o p(A) is the probability of event A occurring;
o p(B) is the probability of event B occurring
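The definitions can be checked numerically. This Python sketch uses hypothetical counts from 100 trials, which are not from the notes, together with exact fractions. It computes p(A|B) directly from the joint count and confirms that the Bayesian rule gives the same value from p(B|A), p(A) and p(B).

```python
from fractions import Fraction

# Hypothetical counts from N observed trials.
N = 100
count_A, count_B, count_A_and_B = 25, 40, 10

p_A = Fraction(count_A, N)
p_B = Fraction(count_B, N)
p_A_and_B = Fraction(count_A_and_B, N)   # joint probability p(A ∩ B)

p_A_given_B = p_A_and_B / p_B            # definition of conditional probability
p_B_given_A = p_A_and_B / p_A

# The Bayesian rule recovers p(A|B) from p(B|A), p(A) and p(B).
bayes = p_B_given_A * p_A / p_B
print(p_A_given_B, bayes == p_A_given_B)  # 1/4 True
```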


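The Joint Probability:

• If event A is conditional on n mutually exclusive events B1, B2, ..., Bn, then

$$\sum_{i=1}^{n} p(A \cap B_i) = \sum_{i=1}^{n} p(A \mid B_i)\, p(B_i)$$

If B1, ..., Bn are also exhaustive, this sum equals p(A).

(Figure: event A overlapping the mutually exclusive events B1, B2 and B3)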

• If the occurrence of event A depends on only two
mutually exclusive events, B and NOT B, we obtain:

$$p(A) = p(A \mid B)\, p(B) + p(A \mid \neg B)\, p(\neg B)$$

where ¬ is the logical function NOT.
• Similarly,

$$p(B) = p(B \mid A)\, p(A) + p(B \mid \neg A)\, p(\neg A)$$

• Substituting this equation into the Bayesian rule yields:

$$p(A \mid B) = \frac{p(B \mid A)\, p(A)}{p(B \mid A)\, p(A) + p(B \mid \neg A)\, p(\neg A)}$$


Bayesian Reasoning:
• Suppose all rules in the knowledge base are
represented in the following form:
IF E is true
THEN H is true {with probability p}
• This rule implies that if event E occurs, then the
probability that event H will occur is p.
• In expert systems, H usually represents a hypothesis
and E denotes evidence to support this hypothesis.
• The Bayesian rule expressed in terms of hypotheses
and evidence looks like this:

$$p(H \mid E) = \frac{p(E \mid H)\, p(H)}{p(E \mid H)\, p(H) + p(E \mid \neg H)\, p(\neg H)}$$

• where:
o p(H) is the prior probability of hypothesis H being
true;
o p(E|H) is the probability that hypothesis H being
true will result in evidence E;
o p(¬H) is the prior probability of hypothesis H being
false;
o p(E|¬H) is the probability of finding evidence E
even when hypothesis H is false.
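The expert-system form of the rule is easy to compute. In this Python sketch, the prior p(H) = 0.01 and the likelihoods p(E|H) = 0.9 and p(E|¬H) = 0.05 are hypothetical values. The sketch obtains p(E) in the denominator from the two mutually exclusive cases H and ¬H.

```python
def posterior(p_h, p_e_given_h, p_e_given_not_h):
    """p(H|E) via the Bayesian rule, expanding p(E) over H and not H."""
    p_not_h = 1 - p_h
    p_e = p_e_given_h * p_h + p_e_given_not_h * p_not_h
    return p_e_given_h * p_h / p_e


# Hypothetical values: rare hypothesis, fairly reliable evidence.
print(round(posterior(0.01, 0.9, 0.05), 4))  # 0.1538
```

Even with fairly strong evidence, the low prior keeps the posterior near 0.15. This is why the p(E|¬H) term cannot be ignored.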


• We can take into account both multiple hypotheses H1,
H2,..., Hm and multiple evidences E1, E2,..., En. The
hypotheses as well as the evidences must be mutually
exclusive and exhaustive.
• Single evidence E and multiple hypotheses:

$$p(H_i \mid E) = \frac{p(E \mid H_i)\, p(H_i)}{\sum_{k=1}^{m} p(E \mid H_k)\, p(H_k)}$$

• Multiple evidences and multiple hypotheses:

$$p(H_i \mid E_1 E_2 \ldots E_n) = \frac{p(E_1 E_2 \ldots E_n \mid H_i)\, p(H_i)}{\sum_{k=1}^{m} p(E_1 E_2 \ldots E_n \mid H_k)\, p(H_k)}$$
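For m hypotheses, each posterior is its numerator divided by the sum of all the numerators. The Python sketch below takes one likelihood per hypothesis, which may be p(E|Hi) or the joint p(E1 E2 ... En|Hi). The priors and likelihoods in the usage lines are hypothetical. The `joint_likelihood` helper multiplies per-evidence likelihoods. That step is valid only under an extra assumption that the notes do not make: that the evidences are conditionally independent given each hypothesis.

```python
import math


def posteriors(priors, likelihoods):
    """p(Hi|E) for mutually exclusive, exhaustive H1..Hm.

    likelihoods[i] is p(E|Hi), or the joint p(E1 E2 ... En|Hi).
    """
    numerators = [lik * prior for lik, prior in zip(likelihoods, priors)]
    total = sum(numerators)  # denominator: sum over k = 1..m
    return [num / total for num in numerators]


def joint_likelihood(per_evidence):
    """Extra assumption: E1..En are conditionally independent given Hi."""
    return math.prod(per_evidence)


priors = [0.40, 0.35, 0.25]   # hypothetical p(H1), p(H2), p(H3)
p_e = [0.3, 0.8, 0.5]         # hypothetical p(E|H1), p(E|H2), p(E|H3)
print([round(p, 4) for p in posteriors(priors, p_e)])  # [0.2286, 0.5333, 0.2381]

# Two pieces of evidence, combined under the independence assumption above.
two = [joint_likelihood(pair) for pair in zip(p_e, [0.9, 0.1, 0.5])]
print(posteriors(priors, two))
```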

A Directed Graphical Model: The Bayesian Belief Network:

• Bayesian probability theory offers a mathematical
foundation for reasoning under uncertain conditions
• But the complexity encountered in applying it to
realistic problem domains can be prohibitive.
• Bayesian belief networks, or BBNs (Pearl 1988), offer a
computational model for reasoning to the best
explanation of a set of data in the context of the
expected causal relationships of a problem domain.
• Motivation
o Reduce the number of parameters of the full
Bayesian model


o Show how the data can partition and focus reasoning
o Avoid use of a large joint probability table to
compute probabilities for all possible events
• Assumption
o Events are either conditionally independent or
their correlations are so small that they can be
ignored
Directed Graphical Model:
• The events and (cause-effect) relationships form a
directed graph, where events are vertices and
relationships are links
• The Bayesian representation of the traffic problem with
potential explanations

• The joint probability distribution for the traffic and
construction variables


• Given bad traffic, what is the probability of road
construction?

$$p(C \mid T) = \frac{p(C=t, T=t)}{p(C=t, T=t) + p(C=f, T=t)}$$
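The notes' table of p(C, T) is not reproduced here, so this Python sketch uses a hypothetical joint distribution over two Boolean variables. It shows how the formula sums C out of the joint to obtain p(T = t) in the denominator.

```python
# Hypothetical joint distribution p(C, T); keys are (C, T) truth values.
JOINT = {
    (True, True): 0.375,
    (True, False): 0.125,
    (False, True): 0.125,
    (False, False): 0.375,
}


def p_c_given_t(joint):
    """p(C=t | T=t): the denominator is p(T=t), summed over both values of C."""
    numerator = joint[(True, True)]
    return numerator / (numerator + joint[(False, True)])


print(p_c_given_t(JOINT))  # 0.75
```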
An Example:
• Traffic problem
o Events:
▪ Road construction C
▪ Accident A
▪ Orange barrels B
▪ Bad traffic T
▪ Flashing lights L
o Joint probability
▪ P(C,A,B,T,L)=p(C)*p(A|C)*p(B|C,A)*p(T|C,A,B)*p(L|C,A,B,T)
▪ Number of parameters: 2^5=32
o Reduction
▪ Assumption: each variable depends only on its
parents
▪ Calculation of joint probability
• P(C,A,B,T,L)=p(C)*p(A)*p(B|C)*p(T|C,A)*p(L|A)
• Number of parameters: 2+2+4+8+4=20
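The parameter counts follow from the network structure. For binary variables, a node with k parents needs 2 · 2^k table entries. This Python sketch assumes the parent sets that the 2+2+4+8+4 count implies: B depends on C, T on C and A, and L on A.

```python
# Parent sets assumed from the reduced factorization (all variables binary).
PARENTS = {"C": [], "A": [], "B": ["C"], "T": ["C", "A"], "L": ["A"]}


def table_sizes(parents):
    """Entries per node: 2 values for each of the 2**k parent combinations."""
    return {var: 2 * 2 ** len(ps) for var, ps in parents.items()}


sizes = table_sizes(PARENTS)
print(sizes, sum(sizes.values()))  # {'C': 2, 'A': 2, 'B': 4, 'T': 8, 'L': 4} 20
print(2 ** len(PARENTS))           # 32 entries in the full joint table
```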


Discrete Markov Process:
• Finite state machine
o A graphical representation
o State transition depends on input stream
o States and transitions reflect properties of a formal language
• Probabilistic finite state machine
o A finite state machine
o Transition function represented by a probability
distribution on the current state
• Discrete Markov process (chain, machine)
o A specialization of probabilistic finite state machine
o Ignores its input values
A Markov state machine or Markov chain with four states,
s1, ..., s4

• At any time the system is in one of N distinct states
• The system either undergoes a state change or remains in
the same state
• Divide time into discrete intervals: t1, t2, …, tn
• Change state according to the probability distribution of
each state
• S(t) – the actual state at time t


$$p(S(t)) = p(S(t) \mid S(t-1), S(t-2), S(t-3), \ldots)$$
• First-order Markov chain
o Only depends on the direct predecessor state
o $p(S(t)) = p(S(t) \mid S(t-1))$
Observable Markov Model:

• Assume p(S(t)|S(t-1)) is time invariant, that is, the transition
between specific states retains the same probabilistic
relationship over time
• State transition probability aij between si and sj:
o $a_{ij} = p(S(t) = s_i \mid S(t-1) = s_j), \quad 1 \le i, j \le N$
o If i = j, no transition occurs (the system remains in the same state)
o Properties: $a_{ij} \ge 0$, $\sum_i a_{ij} = 1$
• Question: suppose that today is sunny, what is the
probability of the next five days being sunny, sunny,
cloudy, cloudy, precipitation?
• We assume this location has four different discrete
states for the variable weather:
o S1 – sun, S2 – cloudy, S3 – fog and S4 – precipitation


o Time intervals: noon to noon

• the matrix of state transitions aij:

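• We determine the probability of the observation sequence O,
where the first day, s1, is today’s observed sunshine:
O = s1, s1, s1, s2, s2, s4
• The probability of these observed states, given the first-order
Markov model M, is:

$$p(O \mid M) = p(s_1)\, a_{11}\, a_{11}\, a_{21}\, a_{22}\, a_{42}$$

where p(s1) = 1 because today’s sunshine is observed.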
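The sequence probability is a product of one-step transition probabilities. The transition matrix from the notes is not reproduced here, so this Python sketch uses hypothetical values. They follow the document's convention aij = p(S(t) = si | S(t-1) = sj), so each column sums to 1. States are indexed from 0 (sun) to 3 (precipitation).

```python
# Hypothetical transition matrix: A[i][j] = p(S(t) = s_i | S(t-1) = s_j).
# States: 0 = sun, 1 = cloudy, 2 = fog, 3 = precipitation; each column sums to 1.
A = [
    [0.4, 0.2, 0.1, 0.2],
    [0.3, 0.3, 0.3, 0.3],
    [0.2, 0.2, 0.3, 0.3],
    [0.1, 0.3, 0.3, 0.2],
]


def sequence_probability(states, a, p_first=1.0):
    """p(O|M) for a first-order Markov chain.

    p_first = 1.0 because the first state (today's weather) is observed.
    """
    prob = p_first
    for prev, cur in zip(states, states[1:]):
        prob *= a[cur][prev]  # a_{cur,prev} = p(S(t)=cur | S(t-1)=prev)
    return prob


O = [0, 0, 0, 1, 1, 3]  # s1, s1, s1, s2, s2, s4
print(round(sequence_probability(O, A), 5))  # 0.00432
```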
