
Statistical reasoning

Dr. Anjali Diwan


Abduction
► Abduction is a reasoning process that tries to form plausible
explanations for abnormal observations
► Abduction is distinctly different from deduction and
induction
► Abduction is inherently uncertain
► Uncertainty is an important issue in abductive reasoning
► Some major formalisms for representing and reasoning about
uncertainty
► Mycin’s certainty factors (an early representative)
► Probability theory (esp. Bayesian belief networks)
► Dempster-Shafer theory
► Fuzzy logic
► Truth maintenance systems
► Nonmonotonic reasoning
Abduction
► Definition (Encyclopedia Britannica): reasoning that derives an
explanatory hypothesis from a given set of facts
► The inference result is a hypothesis that, if true,
could explain the occurrence of the given facts
► Examples
► Dendral, an expert system that constructs the 3D
structure of chemical compounds
► Fact: mass spectrometer data of the compound and
its chemical formula
► KB: chemistry, esp. the strengths of different types of
bonds
► Reasoning: form a hypothetical 3D structure that
satisfies the chemical formula, and that would most
likely produce the given mass spectrum
Abduction examples (cont.)

► Medical diagnosis
► Facts: symptoms, lab test results, and other observed
findings (called manifestations)
► KB: causal associations between diseases and
manifestations
► Reasoning: one or more diseases whose presence would
causally explain the occurrence of the given
manifestations
► Many other reasoning processes (e.g., word sense
disambiguation in natural language processing, image
understanding, criminal investigation) can also be
seen as abductive reasoning
Comparing abduction, deduction, and induction

Deduction:
  major premise:  A => B  (All balls in the box are black)
  minor premise:  A       (These balls are from the box)
  conclusion:     B       (These balls are black)

Abduction:
  rule:           A => B  (All balls in the box are black)
  observation:    B       (These balls are black)
  explanation:    possibly A (These balls are from the box)

Induction:
  case:           A       (These balls are from the box)
  observation:    B       (These balls are black)
  hypothesized rule: possibly whenever A then B (All balls in the box are black)

Deduction reasons from causes to effects.
Abduction reasons from effects to causes.
Induction reasons from specific cases to general rules.
Characteristics of abductive reasoning

► “Conclusions” are hypotheses, not theorems (may


be false even if rules and facts are true)
► E.g., misdiagnosis in medicine

► There may be multiple plausible hypotheses


► Given rules A => B and C => B, and fact B, both A and
C are plausible hypotheses
► Abduction is inherently uncertain
► Hypotheses can be ranked by their plausibility (if it
can be determined)

Sources of uncertainty

► Uncertain inputs
► Missing data
► Noisy data
► Uncertain knowledge
► Multiple causes lead to multiple effects
► Incomplete enumeration of conditions or effects
► Incomplete knowledge of causality in the domain
► Probabilistic/stochastic effects
► Uncertain outputs
► Abduction and induction are inherently uncertain
► Default reasoning, even in deductive fashion, is
uncertain
► Incomplete deductive inference may be uncertain
► Probabilistic reasoning only gives probabilistic results
(it summarizes uncertainty from various sources)
Causes of uncertainty

Some leading causes of uncertainty in the real world:

► Information from unreliable sources
► Experimental errors
► Equipment faults
► Temperature variation
► Climate change
Decision making with uncertainty

► Rational behavior:
► For each possible action, identify the possible
outcomes
► Compute the probability of each outcome
► Compute the utility of each outcome
► Compute the probability-weighted (expected) utility
over possible outcomes for each action
► Select the action with the highest expected utility
(principle of Maximum Expected Utility)
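As a minimal sketch of this Maximum Expected Utility computation, the Python snippet below enumerates hypothetical actions with assumed outcome probabilities and utilities (the umbrella scenario and all numbers are illustrative, not from the slides) and selects the action with the highest expected utility.

```python
# A minimal sketch of the Maximum Expected Utility principle.
# The actions, outcome probabilities, and utilities are hypothetical
# numbers chosen only for illustration.

actions = {
    "take_umbrella":  [(0.3, 70), (0.7, 80)],   # (P(outcome), utility)
    "leave_umbrella": [(0.3, 0),  (0.7, 100)],  # rain vs. no rain
}

def expected_utility(outcomes):
    """Probability-weighted utility over an action's possible outcomes."""
    return sum(p * u for p, u in outcomes)

# Select the action with the highest expected utility (MEU).
best = max(actions, key=lambda a: expected_utility(actions[a]))
for a, outs in actions.items():
    print(a, expected_utility(outs))
print("MEU action:", best)   # take_umbrella: 77.0 vs. leave_umbrella: 70.0
```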
Bayesian reasoning

► Probability theory
► Bayesian inference
► Use probability theory and information about independence
► Reason diagnostically (from evidence (effects) to conclusions (causes)) or causally
(from causes to effects)
► Bayesian networks
► Compact representation of probability distribution over a set of propositional
random variables
► Take advantage of independence relationships

Other uncertainty representations
► Default reasoning
► Nonmonotonic logic: Allow the retraction of default beliefs if they prove
to be false
► Rule-based methods
► Certainty factors (Mycin): propagate simple models of belief through
causal or diagnostic rules
► Evidential reasoning
► Dempster-Shafer theory: Bel(P) is a measure of the evidence for P; Bel(¬P)
is a measure of the evidence against P; together they define a belief
interval (lower and upper bounds on confidence)
► Fuzzy reasoning
► Fuzzy sets: How well does an object satisfy a vague property?
► Fuzzy logic: “How true” is a logical statement?

Probabilistic reasoning

► We have learned knowledge representation using first-order logic
and propositional logic with certainty, i.e., we were sure about the
predicates.
► In this type of knowledge representation we might write A→B,
which means if A is true then B is true.
► Now consider a situation where we are not sure whether A is true
or not. Then we cannot express this statement; this situation is
called uncertainty.

► So to represent uncertain knowledge, where we are not sure
about the predicates, we need uncertain reasoning or
probabilistic reasoning.
Why probabilities anyway?
► Kolmogorov showed that three simple axioms lead to the rules of
probability theory
► De Finetti, Cox, and Carnap have also provided compelling arguments for
these axioms
1. All probabilities are between 0 and 1:
• 0 ≤ P(a) ≤ 1
2. Valid propositions (tautologies) have probability 1, and unsatisfiable
propositions have probability 0:
• P(true) = 1 ; P(false) = 0
3. The probability of a disjunction is given by:
• P(a ∨ b) = P(a) + P(b) – P(a ∧ b)

(Venn diagram: events a and b, with their overlap a ∧ b.)
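A small Python sketch can check these three axioms on a toy distribution; the four equally likely possible worlds and the events a, b are assumptions made up purely for illustration.

```python
# A small sketch checking Kolmogorov's axioms on a toy distribution
# over four equally likely worlds (an assumed example).

worlds = [
    {"a": True,  "b": True},
    {"a": True,  "b": False},
    {"a": False, "b": True},
    {"a": False, "b": False},
]
P_world = 0.25  # uniform distribution over the four worlds

def P(pred):
    """Probability of the event defined by predicate pred."""
    return sum(P_world for w in worlds if pred(w))

Pa, Pb = P(lambda w: w["a"]), P(lambda w: w["b"])
P_or  = P(lambda w: w["a"] or w["b"])
P_and = P(lambda w: w["a"] and w["b"])

assert 0.0 <= Pa <= 1.0                                         # axiom 1
assert P(lambda w: True) == 1.0 and P(lambda w: False) == 0.0   # axiom 2
assert abs(P_or - (Pa + Pb - P_and)) < 1e-12                    # axiom 3
```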
Probabilistic reasoning:

► Probabilistic reasoning is a way of knowledge representation


where we apply the concept of probability to indicate the
uncertainty in knowledge.
► In probabilistic reasoning, we combine probability theory with
logic to handle the uncertainty.
► We use probability in probabilistic reasoning because it provides
a way to handle the uncertainty that is the result of someone's
laziness and ignorance.
Need of Probabilistic Reasoning in AI
• When outcomes are unpredictable
• When there are too many predicates to handle
• When unknown errors occur

► In probabilistic reasoning, there are two ways to solve problems with
uncertain knowledge:
• Bayes' rule

• Bayesian Statistics

• Probability: Probability can be defined as the chance that an uncertain
event will occur. It is the numerical measure of the likelihood that an event
will occur. The value of a probability always lies between 0 and 1.

0 ≤ P(X) ≤ 1, where P(X) is the probability of an event X.


• P(X) = 0 indicates an impossible event X.
• P(X) = 1 indicates that event X is certain to occur.
We can find the probability of an uncertain event by using the formula:

P(A) = (number of desired outcomes) / (total number of outcomes)

► P(¬A) = probability of event A not happening.
► P(¬A) + P(A) = 1.
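The following short Python sketch illustrates these identities; the die-rolling event and its counts are assumed for illustration only.

```python
# A minimal illustration of P(A) and its complement P(¬A);
# the die-rolling event is an assumed example, not from the slides.

desired_outcomes = 2      # e.g., rolling a 5 or a 6 on a fair die
total_outcomes = 6

P_A = desired_outcomes / total_outcomes   # 0 <= P(A) <= 1
P_not_A = 1 - P_A                         # P(¬A) + P(A) = 1

print(P_A, P_not_A)   # 0.333... and 0.666...
```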

Event: Each possible outcome of a variable is called an event.


Sample space: The collection of all possible events is called sample space.
Random variables: Random variables are used to represent the events and objects in the real world.
Prior probability: The prior probability of an event is the probability computed before observing new
information.
Posterior probability: The probability calculated after all evidence or information has been taken into
account; it combines the prior probability with the new information.
Bayes' Theorem in Artificial Intelligence
• Question: From a standard deck of playing cards, a single card is drawn.
The probability that the card is a king is 4/52. Calculate the posterior
probability P(King | Face), i.e., the probability that the drawn card is a
king given that it is a face card.
• Solution: By Bayes' theorem, P(King | Face) = P(Face | King) P(King) / P(Face).

• P(King): probability that the card is a king = 4/52 = 1/13
• P(Face): probability that the card is a face card = 12/52 = 3/13
• P(Face | King): probability that the card is a face card given that it is a
king = 1
• Putting these values into the formula: P(King | Face) = 1 × (1/13) / (3/13) = 1/3
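A few lines of Python verify the same calculation with Bayes' rule; the only inputs are the values listed above.

```python
# Verifying the card example with Bayes' rule:
# P(King | Face) = P(Face | King) * P(King) / P(Face)

P_king = 4 / 52            # 1/13
P_face = 12 / 52           # 3/13 (J, Q, K in four suits)
P_face_given_king = 1.0    # every king is a face card

P_king_given_face = P_face_given_king * P_king / P_face
print(P_king_given_face)   # 0.333... = 1/3
```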
Bayesian Belief Network

►A Bayesian network can be used for building models from data and expert
opinions, and it consists of two parts:
• A directed acyclic graph
• Tables of conditional probabilities
►The generalized form of a Bayesian network that represents and solves decision
problems under uncertain knowledge is known as an influence diagram.
►Note: It is used to represent conditional dependencies.
►A Bayesian network graph is made up of nodes and arcs (directed links),
where:
• Each node corresponds to a random variable, and a variable can be
continuous or discrete.
• Arcs (directed arrows) represent causal relationships or conditional
probabilities between random variables. These directed links connect pairs of
nodes in the graph; a link means that one node directly influences the other,
and if there is no directed link the nodes are independent of each other.
• Note: The Bayesian network graph does not contain any cycles.
Hence it is known as a directed acyclic graph, or DAG.
• The Bayesian network has two main components:
1. The causal component
2. The actual numbers
• Each node in the Bayesian network has a conditional probability distribution
P(Xi | Parents(Xi)), which determines the effect of the parents on that node.
• A Bayesian network is based on the joint probability distribution and conditional
probability.
• Example: Harry installed a new burglar alarm at his home to detect burglary. The
alarm responds reliably to burglaries, but it also responds to minor earthquakes.
Harry has two neighbours, David and Sophia, who have taken responsibility for
informing Harry at work when they hear the alarm. David always calls Harry when
he hears the alarm, but sometimes he gets confused with the phone ringing and
calls then too. On the other hand, Sophia likes to listen to loud music, so
sometimes she misses the alarm. Here we would like to compute probabilities
in this burglary-alarm network.

• Problem: Calculate the probability that the alarm has sounded, but neither a
burglary nor an earthquake has occurred, and both David and Sophia have
called Harry.
Note: The events occurring in this network are:
Burglary (B)
Earthquake (E)
Alarm (A)
David calls (D)
Sophia calls (S)
• From the formula for the joint distribution, we can write the problem statement as:
P(S, D, A, ¬B, ¬E) = P(S|A) · P(D|A) · P(A|¬B ∧ ¬E) · P(¬B) · P(¬E)
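The slide gives the factorization but no numbers, so the sketch below plugs in assumed conditional-probability-table values (chosen only for illustration, not taken from the slides) to show how the joint probability would be computed.

```python
# A sketch of the joint-probability computation for the alarm example.
# All CPT values below are assumed for illustration.

P_B, P_E = 0.001, 0.002                 # P(Burglary), P(Earthquake)
P_A_given = {                           # P(Alarm | Burglary, Earthquake)
    (True, True): 0.94, (True, False): 0.95,
    (False, True): 0.29, (False, False): 0.001,
}
P_D_given_A = {True: 0.91, False: 0.05}   # P(David calls | Alarm)
P_S_given_A = {True: 0.75, False: 0.02}   # P(Sophia calls | Alarm)

# P(S, D, A, ¬B, ¬E) = P(S|A) P(D|A) P(A|¬B,¬E) P(¬B) P(¬E)
p = (P_S_given_A[True] * P_D_given_A[True] *
     P_A_given[(False, False)] * (1 - P_B) * (1 - P_E))
print(p)   # ≈ 0.00068 with these assumed numbers
```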
Bayes’s rule

► Bayes’s rule is derived from the product rule:


► P(Y | X) = P(X | Y) P(Y) / P(X)
► Often useful for diagnosis:
► If X are (observed) effects and Y are (hidden) causes,
► We may have a model for how causes lead to effects (P(X | Y))
► We may also have prior beliefs (based on experience) about the frequency
of occurrence of effects (P(Y))
► Which allows us to reason abductively from effects to causes (P(Y | X)).

Bayesian inference
► In the setting of diagnostic/evidential reasoning, we know the prior
probability of each hypothesis, P(Hi), and the conditional probability of
each item of evidence given the hypothesis, P(Ej | Hi).
► We want to compute the posterior probability P(Hi | Ej).
► Bayes' theorem (formula 1): P(Hi | Ej) = P(Ej | Hi) P(Hi) / P(Ej)
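As a concrete (and purely illustrative) instance of this diagnostic use of Bayes' theorem, the sketch below computes a posterior for a hypothetical disease/test pair; the prior, sensitivity, and false-positive rate are assumed numbers.

```python
# A minimal sketch of diagnostic Bayesian inference: from an observed
# effect back to a hidden cause. The disease/test numbers are assumptions.

P_H = 0.01                  # prior probability of the hypothesis (disease)
P_E_given_H = 0.95          # P(evidence | hypothesis): test sensitivity
P_E_given_not_H = 0.05      # false-positive rate

# Total probability of the evidence
P_E = P_E_given_H * P_H + P_E_given_not_H * (1 - P_H)

# Posterior by Bayes' theorem: P(H | E) = P(E | H) P(H) / P(E)
P_H_given_E = P_E_given_H * P_H / P_E
print(P_H_given_E)   # ≈ 0.161: the evidence raises belief from 1% to ~16%
```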
Simple Bayesian diagnostic reasoning

► Knowledge base:
► Evidence / manifestations: E1, …, Em
► Hypotheses / disorders: H1, …, Hn
► Ej and Hi are binary; hypotheses are mutually exclusive (non-overlapping) and exhaustive
(cover all possible cases)

► Conditional probabilities: P(Ej | Hi), i = 1, …, n; j = 1, …, m


► Cases (evidence for a particular instance): E1, …, Em
► Goal: Find the hypothesis Hi with the highest posterior probability
► max_i P(Hi | E1, …, Em)

Bayesian diagnostic reasoning II

► Bayes’ rule says that


► P(Hi | E1, …, Em) = P(E1, …, Em | Hi) P(Hi) / P(E1, …, Em)
► Assume each piece of evidence Ej is conditionally independent of the others,
given a hypothesis Hi; then:
► P(E1, …, Em | Hi) = ∏_{j=1..m} P(Ej | Hi)
► If we only care about relative probabilities for the Hi, then we have:
► P(Hi | E1, …, Em) = α P(Hi) ∏_{j=1..m} P(Ej | Hi)

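The ranking formula above can be turned into a few lines of Python; the two hypotheses, their priors, and the likelihood values below are made-up illustration data, not from the slides.

```python
# A sketch of the naive Bayes ranking from the slide:
# P(Hi | E1..Em) = α P(Hi) ∏_j P(Ej | Hi).
# The hypotheses, priors, and likelihoods are made-up illustration values.

priors = {"flu": 0.10, "cold": 0.90}
likelihoods = {                      # P(Ej | Hi) for the observed evidence
    "flu":  {"fever": 0.80, "cough": 0.70},
    "cold": {"fever": 0.20, "cough": 0.60},
}
evidence = ["fever", "cough"]

unnormalized = {}
for h, prior in priors.items():
    p = prior
    for e in evidence:               # conditional independence assumption
        p *= likelihoods[h][e]
    unnormalized[h] = p

alpha = 1.0 / sum(unnormalized.values())      # normalization constant
posterior = {h: alpha * p for h, p in unnormalized.items()}
print(posterior)                     # flu ≈ 0.34, cold ≈ 0.66
```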
Limitations of simple
Bayesian inference
► Cannot easily handle multi-fault situation, nor cases where intermediate
(hidden) causes exist:
► Disease D causes syndrome S, which causes correlated manifestations M1 and M2
► Consider a composite hypothesis H1 ∧ H2, where H1 and H2 are independent.
What is the relative posterior?
► P(H1 ∧ H2 | E1, …, Em) = α P(E1, …, Em | H1 ∧ H2) P(H1 ∧ H2)
= α P(E1, …, Em | H1 ∧ H2) P(H1) P(H2)
= α ∏_{j=1..m} P(Ej | H1 ∧ H2) P(H1) P(H2)
► How do we compute P(Ej | H1 ∧ H2) ??

Limitations of simple Bayesian inference II
► Assume H1 and H2 are independent, given E1, …, Em?
► P(H1 ∧ H2 | E1, …, Em) = P(H1 | E1, …, Em) P(H2 | E1, …, Em)
► This is a very unreasonable assumption
► Earthquake and Burglar are independent, but not given Alarm:
► P(burglar | alarm, earthquake) << P(burglar | alarm)
► Another limitation is that simple application of Bayes’s rule doesn’t
allow us to handle causal chaining:
► A: this year’s weather; B: cotton production; C: next year’s cotton
price
► A influences C indirectly: A→ B → C
► P(C | B, A) = P(C | B)
► Need a richer representation to model interacting hypotheses,
conditional independence, and causal chaining
► Next time: conditional independence and Bayesian networks!
What is a rule-based system in AI?
► A rule-based system is a system that applies human-made rules to store, sort
and manipulate data. In doing so, it mimics human intelligence.
► Rule-based systems require a set of facts or source of data, and a set of rules
for manipulating that data. These rules are sometimes referred to as ‘If
statements’ as they tend to follow the line of ‘IF X happens THEN do Y’.

The steps can be simplified to:


► First comes the data or new business event
► Then comes the analysis: the part where the system conditionally processes the
data against its rules
► Then comes any subsequent automated follow-up actions
RULE-BASED SYSTEM EXAMPLE

► A domain-specific expert system that uses rules
to make deductions or narrow down choices is
the most popular and classic example of a
rule-based system. Furthermore, recent advances
in technology have given way to the development
of modern machines and systems such as:
► Virtual assistants
► Diagnostics Oriented Rockwell Intelligence
System (DORIS)
► Machine for Intelligent Diagnosis (MIND)
Every rule-based system contains four basic
components (described below).
FEATURES OF RULE-BASED SYSTEMS:

► Widely used in artificial intelligence, a rule-based expert system is not only
responsible for modeling intelligent behaviour in machines and building expert
systems that outperform human experts, it also:

► Combines the knowledge of human experts in the problem domain.
► Represents knowledge in a highly declarative way.
► Enables the use of several different knowledge representation paradigms.
► Supports the implementation of non-deterministic search and control strategies.
► Helps describe fragmentary, ill-structured, heuristic, judgemental knowledge.
► Is robust and can operate with uncertain or incomplete knowledge.
► Helps with rule-based decision making examples: monitoring, control, diagnostics,
service, etc.
COMPONENTS OF RULE-BASED SYSTEMS
► The rule-based expert system architecture is an amalgamation of four important components,
each focused on a different aspect of the problem at hand.
► From assessing the information to helping the machine reach the goal state, these components are
integral to the smooth functioning of rule-based systems.
► Rule Base: A list of rules specific to the type of knowledge base (rule-based vs.
model-based, etc.).
► Semantic Reasoner: Also known as the inference engine, it infers information or takes
necessary actions based on the input and the rule base in the knowledge base. The semantic
reasoner runs a match-resolve-act cycle, wherein:
► Match: The production rules are matched against the contents of the working memory to
obtain the conflict set, which consists of the instances of the satisfied productions.
► Conflict resolution: One of the production instances in the conflict set is chosen for
execution, which determines the progress of the process.
► Act: Finally, the production instance chosen in the previous phase is executed, which
changes the contents of the working memory.
► Working Memory: Stores temporary information or data.
► User Interface: The connection to the outside world, through which input and output
signals are sent and received.
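To make the match-resolve-act cycle concrete, here is a toy forward-chaining engine in Python; the rules, facts, and the "pick the first rule" conflict-resolution strategy are simplifying assumptions, not how a real production system such as CLIPS resolves conflicts.

```python
# A toy forward-chaining engine illustrating the match-resolve-act cycle.
# The rules and facts are hypothetical examples.

rules = [
    {"if": {"temperature_high", "pressure_high"}, "then": "open_valve"},
    {"if": {"open_valve"},                        "then": "alarm_off"},
]
working_memory = {"temperature_high", "pressure_high"}   # the facts

changed = True
while changed:                      # repeat the match-resolve-act cycle
    changed = False
    # Match: find rules whose conditions hold (the conflict set)
    conflict_set = [r for r in rules
                    if r["if"] <= working_memory
                    and r["then"] not in working_memory]
    if conflict_set:
        # Resolve: here we simply pick the first applicable rule
        rule = conflict_set[0]
        # Act: fire the rule, updating working memory
        working_memory.add(rule["then"])
        changed = True

print(working_memory)   # now includes open_valve and alarm_off
```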
CONSTRUCTION OF RULE-BASED SYSTEMS

► The construction of rule-based systems is based on a specific type of logic, such

as Boolean logic, fuzzy logic, or probabilistic logic, and is categorized into:
► Knowledge-Based Approach: It is a knowledge-based construction that follows a
traditional engineering approach, which is domain-independent. Here, it is
important to acquire requirements as well as necessary knowledge before
identifying the relationships between attributes.
► Data-Based Approach: This data-based construction follows a machine learning
approach, which, like the earlier approach, is domain-independent. This
rule-based approach is subdivided into:
► Supervised Learning.
► Unsupervised Learning.
TYPES OF RULE-BASED SYSTEMS
► Like expert systems, rule-based systems can also be categorized into:
► Forward Chaining: Also known as data-driven reasoning, forward
chaining is a technique that follows a deductive approach
to reach a conclusion.
► Backward Chaining: Often used in formulating plans, backward
chaining is an alternative to forward chaining. It is a goal-driven
technique that follows an inductive approach or associative reasoning.
ADVANTAGES OF RULE-BASED SYSTEMS

► Rule-based programming is easy to understand.


► It can be built to represent expert judgment in simple or complicated
subjects.
► The cause-and-effect in Rule-Based Systems is transparent.
► It offers flexibility and an adequate mechanism to model several basic
mental processes into machines.
► Mechanizes the reasoning process.
DISADVANTAGES OF RULE-BASED SYSTEMS

► They require deep domain knowledge and manual work.


► Generating rules for a complex system is quite challenging and
time-consuming.
► It has less learning capacity, as it generates results based on the rules.
DIFFERENCE BETWEEN RULE-BASED SYSTEMS AND MACHINE LEARNING

RULE-BASED SYSTEMS
► A simplified form of Artificial Intelligence.
► Based on facts and rules.
► Labour intensive and hard to implement.
► Delivers excellent performance within a narrow domain.
► Limited to human-coded information.

MACHINE LEARNING
► An application of Artificial Intelligence.
► Based on models.
► Easy to implement and use.
► Helps deliver more accurate results.
► Effortlessly deals with complex and excessive data.
What Is Fuzzy Logic?

► Fuzzy logic is an approach to variable processing that allows for multiple


possible truth values to be processed through the same variable.
► Fuzzy logic attempts to solve problems with an open, imprecise spectrum of
data and heuristics, which makes it possible to obtain an array of accurate
conclusions.
► Fuzzy logic is designed to solve problems by considering all available
information and making the best possible decision given the input.
► Fuzzy logic is a heuristic approach that allows for more advanced
decision-tree processing and better integration with rules-based
programming.
► Fuzzy logic is a generalization from standard logic, in which all statements
have a truth value of one or zero. In fuzzy logic, statements can have a value
of partial truth, such as 0.9 or 0.5.
► Theoretically, this gives the approach more opportunity to mimic real-life
circumstances, where statements of absolute truth or falsehood are rare.
► Fuzzy logic may be used by quantitative analysts to improve the execution of
their algorithms.
► Because of the similarities with ordinary language, fuzzy algorithms are
comparatively simple to code, but they may require thorough verification and
testing.
Fuzzy Logic tries to capture the human
ability of reasoning with imprecise
information
► Models Human Reasoning
► Works with imprecise statements such as:
In a process control situation, “If the
temperature is moderate and the pressure is
high, then turn the knob slightly right”
► The rules have “Linguistic Variables”,
typically adjectives qualified by adverbs
(adverbs are hedges).
Underlying Theory: Theory of Fuzzy
Sets
► Intimate connection between logic and set theory.
► Given any set S and an element e, there is a very natural
predicate μs(e), called the belongingness (membership) predicate.
► The predicate is such that
μs(e) = 1 if e ∈ S, and 0 otherwise.
► For example, if S = {1, 2, 3, 4}, then μs(1) = 1 and μs(5) = 0.
► A predicate P(x) also defines a set naturally.
S = {x | P(x) is true}
For example, even(x) defines S = {x | x is even}
Fuzzy Set Theory (contd.)
► Fuzzy set theory starts by questioning the fundamental
assumptions of set theory viz., the belongingness predicate,
μ, value is 0 or 1.
► Instead, in fuzzy set theory it is assumed that
μs(e) ∈ [0, 1]
► Fuzzy set theory is a generalization of classical set theory also
called Crisp Set Theory.
► In real life belongingness is a fuzzy concept.
Example: Let, T = set of “tall” people
μT (Ram) = 1.0
μT (Shyam) = 0.2
Shyam belongs to T with degree 0.2.
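A short Python sketch of such a membership function is shown below; the 150–190 cm ramp and the two heights are assumptions chosen so that Ram gets degree 1.0 and Shyam gets degree 0.2, matching the example.

```python
# A small sketch of a fuzzy membership function for the set T of "tall"
# people, plus the usual min/max fuzzy connectives. The heights and the
# 150-190 cm ramp are assumptions for illustration.

def mu_tall(height_cm):
    """Degree of membership in 'tall': 0 below 150 cm, 1 above 190 cm."""
    if height_cm <= 150:
        return 0.0
    if height_cm >= 190:
        return 1.0
    return (height_cm - 150) / 40.0      # linear ramp in between

ram, shyam = 195, 158
print(mu_tall(ram))     # 1.0 -> Ram belongs to T with degree 1.0
print(mu_tall(shyam))   # 0.2 -> Shyam belongs to T with degree 0.2

# Fuzzy AND/OR are commonly taken as min/max of the membership degrees
mu_and = min(mu_tall(ram), mu_tall(shyam))   # 0.2
mu_or  = max(mu_tall(ram), mu_tall(shyam))   # 1.0
```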
Slot and filler structure
types
Weak slot and filler structure:
• The knowledge in slot and filler systems consists of
structures represented as a set of entities and their attributes.
• This structure is called a weak slot and filler
structure.
Strong slot and filler structure:
• Later, strong slot and filler structures were used. They
represent links between objects according to more rigid
rules.

Semantic network

• Semantic networks became popular in artificial
intelligence and natural language processing
because they represent knowledge and support
reasoning.
• They act as an alternative to predicate logic
as a form of knowledge representation. Semantic
nets consist of nodes, links, and link labels.

Frames

• A frame is an artificial intelligence data structure used to divide knowledge
into substructures by representing "stereotyped situations".
• Frames are the primary data structure used in artificial intelligence frame
languages.

Conceptual dependency

• Conceptual Dependency (CD) is a representation
used in natural language processing to represent
the meaning of sentences in such a way that
inferences can be made from them.
• As an example, consider the event represented
by the sentence.
Scripts

• A script is a structured representation


describing a stereotyped sequence of events in
a particular context.
• Scripts are used in natural language
understanding systems to organize a
knowledge base in terms of the situations that
the system should understand.

WHAT IS SEMANTIC NET?

► A semantic net (or semantic network) is a knowledge representation


technique used for propositional information.
► So it is also called a propositional net. Semantic nets convey meaning. They
are two dimensional representations of knowledge.
► Mathematically a semantic net can be defined as a labeled directed graph.
A brief look at semantic
networks
► A semantic network is an
irregular graph that has
concepts in vertices and
relations on arcs.
► Relations can be ad-hoc, but
they can also be quite
general, for example, “is a”
(ISA), “a kind of” (AKO), “an
instance of”, “part of”.
► Relations often express
physical properties of
objects (colour, length, and
lots of others).
► Most often, relations link
two concepts.
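As a rough sketch, a semantic network can be stored as a labelled directed graph and queried by walking "is a" / "a kind of" links; the concepts and relations below are made-up examples.

```python
# A sketch of a semantic network as a labelled directed graph, with a
# walk up ISA/AKO links to inherit a property. The concepts and
# relations are made-up examples.

edges = [                          # (from, relation, to)
    ("Tweety", "is a",      "Canary"),
    ("Canary", "a kind of", "Bird"),
    ("Bird",   "can",       "fly"),
    ("Canary", "colour",    "yellow"),
]

def related(node, relation):
    return [t for f, r, t in edges if f == node and r == relation]

def lookup(node, relation):
    """Find a property, following 'is a' / 'a kind of' links upward."""
    values = related(node, relation)
    if values:
        return values[0]
    for parent in related(node, "is a") + related(node, "a kind of"):
        found = lookup(parent, relation)
        if found:
            return found
    return None

print(lookup("Tweety", "colour"))   # yellow (inherited from Canary)
print(lookup("Tweety", "can"))      # fly (inherited via Canary -> Bird)
```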
... semantic networks (2)

► General semantic relations help represent the meaning of


simple sentences in a systematic way.
► A sentence is centred on a verb that expects certain
arguments.
► For example, verbs usually denote actions (with agents)
or states (with passive experiencers, for example, "he
dreams" or "he is sick").
Frames and frame systems

► A frame represents a concept;


► a frame system represents an
organization of knowledge about
a set of related concepts.
► A frame has slots that denote
properties of objects. Some slots
have default fillers, some are
empty (may be filled when more
becomes known about an
object).
► Frames are linked by relations of
specialization/generalization and
by many ad-hoc relations.
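A minimal sketch of a frame system with default fillers and inheritance along a specialization link might look like this in Python; the frame names, slots, and values are assumptions for illustration.

```python
# A minimal frame-system sketch: frames with slots, default fillers,
# and inheritance along a specialization ("ako") link. Names and slot
# values are assumed examples.

frames = {
    "Bird":   {"ako": None,     "slots": {"locomotion": "flies",
                                          "covering": "feathers"}},
    "Canary": {"ako": "Bird",   "slots": {"colour": "yellow"}},
    "Tweety": {"ako": "Canary", "slots": {"owner": "Granny"}},
}

def get_slot(frame_name, slot):
    """Return a slot filler, falling back to more general frames."""
    frame = frames.get(frame_name)
    while frame is not None:
        if slot in frame["slots"]:
            return frame["slots"][slot]
        frame = frames.get(frame["ako"])
    return None                      # slot stays empty until filled

print(get_slot("Tweety", "colour"))      # yellow (default from Canary)
print(get_slot("Tweety", "locomotion"))  # flies  (default from Bird)
```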
Conceptual graphs

► John Sowa created the


conceptual graph
notation in 1984. It has
substantial philosophical
and psychological
motivation.
► It is still quite a popular
knowledge representation
formalism, especially in
semantic processing of
language, and a topic of
interesting research.
► Conceptual graphs can be
expressed in first-order
logic, but due to their
graphical form they may be
easier to understand than
logic.
(Figure: an example conceptual graph in which Parents is a 3-ary relation.)
Conceptual graphs (2)
Conceptual graphs (3)

Her name was Magill, and she called herself Lil,
but everyone knew her as Nancy.
Conceptual graphs (4)

Variables allow us to express


the identity of an individual.
