Artificial Intelligence
What is AI?
2
Acting humanly: The Turing Test approach
The Turing Test (Alan Turing, 1950) provides a satisfactory operational
definition of intelligence.
A computer passes the test if a human interrogator, after posing some written
questions, cannot tell whether the written responses come from a person or from
a computer
The computer would need to possess the following capabilities:
natural language processing to enable it to communicate successfully in English;
knowledge representation to store what it knows or hears;
automated reasoning to use the stored information to answer questions and to draw new
conclusions;
machine learning to adapt to new circumstances and to detect and extrapolate patterns.
The test deliberately avoids direct physical interaction between the interrogator
and the computer; the total Turing Test, however, includes a video signal
To pass the total Turing Test, the computer will need
computer vision to perceive objects, and
robotics to manipulate objects and move about.
3
Thinking humanly: The cognitive modeling
approach
introspection—trying to catch our own thoughts as they go by;
psychological experiments—observing a person in action; and
brain imaging—observing the brain in action
cognitive science brings together computer models from AI and experimental
techniques from psychology to construct precise and testable theories of the
human mind.
4
Thinking rationally: The “laws of thought”
approach
Aristotle was one of the first to attempt to codify “right thinking,” that is,
irrefutable reasoning processes.
His syllogisms provided patterns for argument structures that always yielded
correct conclusions when given correct premises—for example,
“Socrates is a man; all men are mortal; therefore, Socrates is mortal.”
These laws of thought were supposed to govern the operation of the mind; their
study initiated the field called logic.
There are two main obstacles to this approach.
First, it is not easy to take informal knowledge and state it in the formal terms
required by logical notation, particularly when the knowledge is less than 100%
certain.
Second, there is a big difference between solving a problem “in principle” and
solving it in practice.
5
Acting rationally: The rational agent
approach
An agent is just something that acts
all computer programs do something, but computer agents are expected to do
more: operate autonomously, perceive their environment, persist over a
prolonged time period, adapt to change, and create and pursue goals.
A rational agent is one that acts so as to achieve the best outcome or, when there
is uncertainty, the best expected outcome
Making correct inferences is sometimes part of being a rational agent
There are two advantages over the other approaches.
First, it is more general than the “laws of thought” approach, because correct
inference is just one of several possible mechanisms for achieving rationality.
Second, it is more amenable to scientific development than approaches based
on human behavior or human thought.
6
FOUNDATIONS OF AI
Philosophy (rules of reasoning)
Can formal rules be used to draw valid conclusions?
How does the mind arise from a physical brain?
Where does knowledge come from?
How does knowledge lead to action?
Aristotle (384–322 B.C.) - first to formulate a precise set of laws governing
the rational part of the mind.
He developed an informal system of syllogisms for proper reasoning, to generate
conclusions mechanically, given initial premises.
Ramon Lull (d. 1315) - suggested that useful reasoning could actually be carried
out by a mechanical artifact.
Thomas Hobbes (1588–1679) - reasoning was like numerical computation, that
“we add and subtract in our silent thoughts.”
Around 1500, Leonardo da Vinci (1452–1519) designed but did not build a
mechanical calculator;
Wilhelm Leibniz (1646–1716) built a mechanical device intended to carry out
operations on concepts rather than numbers.
7
FOUNDATIONS OF AI
Mathematics (logic, algorithms, optimization)
What are the formal rules to draw valid conclusions?
What can be computed?
How do we reason with uncertain information?
Mathematics formalizes the three main areas of AI: computation, logic, and
probability.
Computation leads to analysis of the problems that can be computed -
complexity theory
Probability contributes the “degree of belief” to handle uncertainty in AI
Decision theory combines probability theory and utility theory
(“preferred outcomes” / bias)
8
FOUNDATIONS OF AI
Economics
How should we make decisions so as to maximize payoff?
How should we do this when others may not go along?
How should we do this when the payoff may be far in the future?
It is tempting to think of economics as being about money, but economists will
say that they are really studying how people make choices that lead to
preferred outcomes (utility).
Decision theory, which combines utility theory and probability theory, provides
a formal and complete framework for decisions (economic or otherwise) made
under uncertainty
Control theory and cybernetics
How can artifacts operate under their own control?
the science of communication and automatic control systems
The artifacts adjust their actions
to perform better in the environment over time
Based on an objective function and feedback from the environment
9
FOUNDATIONS OF AI
Neuroscience (model low level human/animal brain activity)
How do brains process information?
Study of the nervous system, esp. brain
A collection of simple cells can lead to thought and action
Cognitive Science and Psychology (modeling high level human/animal thinking)
How do humans and animals think and act?
The study of human reasoning and acting
Provides reasoning models for AI
Strengthen the ideas
humans and other animals can be considered as information processing machines
Despite advances, we are still a long way from understanding how cognitive
processes actually work.
Linguistics
How does language relate to thought?
computational linguistics or natural language processing
10
FOUNDATIONS OF AI
Computer Engineering
How can we build an efficient computer?
For artificial intelligence to succeed, we need two things: intelligence and an
artifact.
The computer has been the artifact of choice.
The first operational computer was the electromechanical Heath Robinson built in
1940 by Alan Turing’s team for a single purpose: deciphering German messages.
In 1943, the same group developed the Colossus, a powerful general-purpose
machine based on vacuum tubes.
The first operational programmable computer was the Z-3, invented by
Konrad Zuse in Germany in 1941. (Zuse also invented floating-point numbers and
the first high-level programming language.)
The first electronic computer, the ABC, was assembled by John Atanasoff and
his student Clifford Berry between 1940 and 1942 at Iowa State University
Then came the ENIAC, developed as part of a secret military project at the
University of Pennsylvania by a team including John Mauchly and John Eckert,
which proved to be the most influential forerunner of modern computers
11
History of AI
The gestation (1943-1955):
♦ Main actors for the next 20 years were from MIT, CMU, Stanford and IBM
12
History of AI
Newell and Simon’s early success was followed up with the General
Problem Solver, or GPS.
Unlike Logic Theorist, this program was designed from the start to
imitate human problem-solving protocols.
”If the number of customers Tom gets is twice the square of 20 percent of
the number of advertisements he runs, and the number of advertisements he
runs is 45, what is the number of customers Tom gets?”
13
History of AI
14
History of AI
15
History of AI
Knowledge-based systems: The key to power? (1969-1979)
16
History of AI
AI adopts the scientific method (1987-present)
♦ build on existing theories rather than propose brand-new ones
♦ to base claims on rigorous theorems or hard experimental
evidence rather than on intuition
♦ and to show relevance to real-world applications rather than
toy examples
♦ speech recognition (HMMs), data mining, Bayesian networks
17
Potted history of AI
19
Artificial Intelligence
Intelligent systems
Intelligent systems
An intelligent system incorporates intelligence into applications
handled by machines.
It performs search and optimization along with learning capabilities.
Different types of machine learning, such as supervised,
unsupervised and reinforcement learning, can be used in
designing intelligent systems.
Expert systems, intelligent agents and knowledge-based systems
are examples of intelligent systems,
ranging from automated vacuums such as the Roomba to facial
recognition programs to Amazon's personalized shopping
suggestions
Characteristics
Self-explaining
the system can explain how it came to a certain decision
Robust
the system behaves well and adequately not only under ordinary
conditions, but also under unusual conditions
fault tolerant
continues to perform adequately even if one or more of its internal
system components fail or break.
Adaptive
react to changes, in particular to changes in the environment or the
context of the system
self-optimizing
organize their internal components and capabilities in new
structures without a central or an external authority in place
Characteristics
Deductive
based on a set of axioms and rules, they can deduce new insights by
applying the rules to the axioms as well as to the resulting new
facts.
using an underlying inference engine, deductive systems can
discover new facts that they can use for their decision process
Learning
observe the achieved results and compare them with the desired
outcome.
Cooperative
expose social capabilities; interact with other systems – and
potentially humans as well
Autonomous
Characteristics
Autonomous
performs the desired tasks and behaves well and adequately
even in unstructured environments, without continuous human
guidance.
Agile
able to manage and apply knowledge effectively so that they
behave well and adequately in continuously changing and
unpredictable environments.
Steps to build AI systems
Identify the problem
What are you trying to solve?
Which result is desired?
Preparation of the data (preprocessing)
structured and unstructured data
roughly 80% of the effort goes into cleaning, moving, reviewing, and organizing the data
Choice of algorithms (model)
Supervised learning
Unsupervised learning & reinforcing learning
Training the algorithms
Choosing the most suitable programming language
Platform selection
Test the Model
Deployment
The End…
7
Artificial Intelligence
Intelligent agents
contents
Agents And Environments
Agent Function and Program
Rational agent
Agent Program and types
State Representation
Course – Introduction (recall)
The main unifying theme is the idea of an intelligent agent.
define AI as the study of agents that receive percepts from the
environment and perform actions.
[Diagram: the agent receives percepts from the environment through sensors and
acts on it through actuators; the "?" box marks the agent program/model to be designed.]
3
Agents And Environments
An agent is anything that can be viewed as perceiving its
environment through sensors
acting upon that environment through actuators.
A robotic agent might have cameras and infrared range finders
for sensors and various motors for actuators.
A software agent receives keystrokes, file contents, and network
packets as sensory inputs and acts on the environment by
displaying on the screen, writing files, and sending network
packets.
19
8-puzzle
States: A state description specifies the location of each of the eight tiles
and the blank in one of the nine squares.
Initial state: Any state can be designated as the initial state. Note that any
given goal can be reached from exactly half of the possible initial states.
Actions: The simplest formulation defines the actions as movements of the
blank space Left, Right, Up, or Down. Different subsets of these are
possible depending on where the blank is.
Transition model: Given a state and action, this returns the resulting state;
for example, if we apply Left to the start state, the resulting state has the 5
and the blank switched.
Goal test: This checks whether the state matches the goal configuration
Path cost: Each step costs 1, so the path cost is the number of steps in the
path
Task Environment
Observable or partially observable?
Discrete or Continuous?
Deterministic or Stochastic?
Static or Dynamic?
Episodic or Sequential?
Multiple or Single Agent?
The End…
22
Artificial Intelligence
Actions
A description of the possible actions available to the agent
Given a particular state s, ACTIONS(s) returns the set of actions that
can be executed in s.
Transition model
A description of what each action does - the transition model
Together, the initial state, actions, and transition model implicitly
define the state space of the problem
state space forms a directed network or graph
Goal test
determines whether a given state is a goal state.
Path cost
function that assigns a numeric cost to each path.
Problem Definition
A solution to a problem is an action sequence that leads from
the initial state to a goal state.
Solution quality is measured by the path cost function, and
an optimal solution has the lowest path cost among all
solutions.
Example Problems - 8-puzzle
9
8-puzzle
States: the location of each of the eight tiles and the blank in one of the
nine squares
Initial state: Any state can be designated as the initial state.
Actions: The simplest formulation defines the actions as movements of the
blank space Left, Right, Up, or Down.
Transition model: Given a state and action, this returns the resulting state;
for example, if we apply Left to the start state, the resulting state has the 5 and the blank
switched.
Goal test: checks whether the state matches the goal configuration
Path cost: Each step costs 1, so the path cost is the number of steps in the
path
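The formal components above can be sketched in Python; the state encoding (a 9-tuple in row-major order with 0 for the blank) and the helper names are illustrative choices, not from the slides:

```python
# State: tuple of 9 entries in row-major order; 0 marks the blank.
GOAL = (0, 1, 2, 3, 4, 5, 6, 7, 8)

# Each action moves the blank by an index offset in the flat tuple.
MOVES = {'Left': -1, 'Right': +1, 'Up': -3, 'Down': +3}

def actions(state):
    """ACTIONS(s): the subset of blank moves allowed by the blank's position."""
    i = state.index(0)
    acts = []
    if i % 3 > 0: acts.append('Left')
    if i % 3 < 2: acts.append('Right')
    if i >= 3: acts.append('Up')
    if i <= 5: acts.append('Down')
    return acts

def result(state, action):
    """Transition model: swap the blank with the neighbouring tile."""
    i = state.index(0)
    j = i + MOVES[action]
    s = list(state)
    s[i], s[j] = s[j], s[i]
    return tuple(s)

def goal_test(state):
    """Goal test: does the state match the goal configuration?"""
    return state == GOAL

def path_cost(path):
    """Each step costs 1, so the path cost is the number of steps."""
    return len(path)
```

With the blank in the top-middle square, for example, only Left, Right, and Down are applicable.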
Example Problems - vacuum world
11
vacuum world
States: The state is determined by both the agent location and the dirt
locations.
Initial state: Any state can be designated as the initial state.
Actions: each state has just three actions: Left, Right, and Suck.
Transition model: The actions have their expected effects, except that
moving Left in the leftmost square, moving Right in the rightmost
square, and Sucking in a clean square have no effect.
Goal test: This checks whether all the squares are clean.
Path cost: Each step costs 1, so the path cost is the number of steps in
the path.
• 8-queens problem
• route-finding problem
Searching for Solutions
A solution is an action sequence, so search algorithms work
by considering various possible action sequences.
The possible action sequences starting at the initial state
form a search tree with the initial state at the root;
the branches are actions and
the nodes correspond to states in the state space of the problem.
Steps in growing Search tree
the root node of the tree corresponds to the initial state
test whether this is a goal state
expanding the current state - generating a new set of states
The set of all leaf nodes available for expansion at any given point is called the frontier
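The tree-growing steps above can be sketched as a minimal search loop; `successors` is a hypothetical helper returning the states reachable in one action, and the FIFO frontier here happens to give breadth-first order:

```python
from collections import deque

def tree_search(initial_state, goal_test, successors):
    """Generic tree search: keep a frontier of unexpanded nodes and
    expand them one at a time until a goal state is selected."""
    frontier = deque([(initial_state, [])])  # (state, path of states so far)
    while frontier:
        state, path = frontier.popleft()     # FIFO frontier -> breadth-first order
        if goal_test(state):
            return path + [state]
        for nxt in successors(state):        # expand: generate successor states
            frontier.append((nxt, path + [state]))
    return None  # frontier exhausted without reaching a goal
```

For instance, on the infinite binary tree where node n has children 2n and 2n+1 (capped for the demo), searching for 5 from 1 returns the path 1, 2, 5.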
17
Artificial Intelligence
Level 0
Level 1
Level 2
Level 3
5
Breadth-first search - Analysis
Complete - if the shallowest goal node is at some finite depth
d, breadth-first search will eventually find it
optimal - the shallowest goal node is not necessarily the optimal
one; BFS is optimal if all step costs are the same
Time Complexity - O(b^d)
every state has b successors
Space Complexity – O(b^d)
O(b^(d−1)) nodes in the explored set and O(b^d) nodes in the
frontier
the memory requirements are a bigger problem for breadth-
first search than is the execution time
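A graph-search version of breadth-first search can be sketched as follows, assuming `successors(s)` returns the neighbouring states; the early goal test on generated nodes and the `parents` map (which doubles as the explored set) match the analysis above:

```python
from collections import deque

def breadth_first_search(start, goal_test, successors):
    """Graph-search BFS: FIFO frontier plus an explored set, so each
    state is expanded at most once; returns the shallowest path to a goal."""
    if goal_test(start):
        return [start]
    frontier = deque([start])
    parents = {start: None}              # doubles as the explored/reached set
    while frontier:
        state = frontier.popleft()
        for nxt in successors(state):
            if nxt not in parents:
                parents[nxt] = state
                if goal_test(nxt):       # goal test on generation, not expansion
                    path = [nxt]
                    while parents[path[-1]] is not None:
                        path.append(parents[path[-1]])
                    return path[::-1]
                frontier.append(nxt)
    return None
```

On the graph where n has successors n+1 and 2n, the shallowest path from 1 to 10 is 1, 2, 4, 5, 10 (four steps).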
Depth-first search
expands the deepest node in the current frontier of the search
tree
uses a LIFO queue
most recently generated node is chosen for expansion
Depth-first search - Analysis
Complete - not complete
may loop forever, or fail, if an infinite non-goal path is encountered
optimal - not optimal
Time Complexity - O(b^m)
m itself can be much larger than d (the depth of the shallowest solution),
and is infinite if the tree is unbounded
Space Complexity – O(bm)
to store only a single path from the root to a leaf node
remaining unexpanded sibling nodes for each node on the path
Depth-first search
Uniform-cost search
When all step costs are equal, breadth-first search is optimal
algorithm that is optimal with any step-cost function
Instead of expanding the shallowest node, expands the node n
with the lowest path cost.
uses a Priority queue
does not care about the number of steps a path has, but only
about their total cost
the goal test is applied to a node when it is selected for expansion
Uniform-cost search - Analysis
Complete – yes, if every step cost is positive (non-zero)
can run forever if there is a path with an infinite sequence of zero-cost actions
optimal - Yes, expands nodes in order of their optimal path cost
Time Complexity – O(b^(1+⌊C*/ε⌋))
Space Complexity – O(b^(1+⌊C*/ε⌋))
Let C* be the cost of the optimal solution, and assume that every
action costs at least ε.
When all step costs are equal, b^(1+⌊C*/ε⌋) is just b^(d+1)
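A uniform-cost search sketch using a priority queue keyed on the path cost g(n); `successors(s)` yielding `(state, step_cost)` pairs is an assumed interface:

```python
import heapq

def uniform_cost_search(start, goal_test, successors):
    """UCS: always expand the frontier node with the lowest path cost g.
    The goal test is applied when a node is selected for expansion,
    so the first goal popped has optimal cost."""
    frontier = [(0, start, [start])]          # priority queue keyed on g
    best_g = {start: 0}
    while frontier:
        g, state, path = heapq.heappop(frontier)
        if goal_test(state):
            return g, path
        if g > best_g.get(state, float('inf')):
            continue                          # stale queue entry, skip
        for nxt, cost in successors(state):
            new_g = g + cost
            if new_g < best_g.get(nxt, float('inf')):
                best_g[nxt] = new_g
                heapq.heappush(frontier, (new_g, nxt, path + [nxt]))
    return None
```

On a graph where A→C costs 5 directly but A→B→C costs 1+1, UCS correctly returns the cheaper two-step path rather than the one-step path.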
Depth-limited search
The failure of depth-first search in infinite state spaces can be
alleviated by a predetermined depth limit l.
nodes at depth l are treated as if they have no successors.
The depth limit solves the infinite-path problem.
Incomplete if we choose l < d: the shallowest goal is
beyond the depth limit.
Non-optimal if we choose l > d.
Its time complexity is O(b^l) and space complexity is O(bl).
Depth-first search can be viewed as a special case of depth-
limited search with l=∞.
depth limits can be based on knowledge of the problem
Iterative deepening DFS
depth-first tree search that finds the best depth limit.
gradually increase the limit — first 0, then 1, then 2, and so
on—until a goal is found.
This will occur when the depth limit reaches d, the depth of the
shallowest goal node
IDS combines the benefits of depth-first and breadth-first
search
Like DFS, its memory requirements are modest: O(bd)
Like BFS,
complete when the branching factor is finite and
optimal when the path cost is a non-decreasing function of the depth
of the node.
number of nodes generated in the worst case:
N(IDS) = (d)b + (d−1)b^2 + ⋯ + (1)b^d, which is O(b^d)
Iterative deepening DFS
complete if the branching factor is finite and every step cost ≥ ε for some positive ε
optimal if step costs are all identical
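Depth-limited search and the iterative-deepening loop above can be sketched as follows (the recursive formulation and the `max_depth` safety cap are illustrative choices):

```python
def depth_limited(state, goal_test, successors, limit):
    """Recursive DLS: nodes at depth == limit are treated as having
    no successors, which cuts off infinite paths."""
    if goal_test(state):
        return [state]
    if limit == 0:
        return None
    for nxt in successors(state):
        sub = depth_limited(nxt, goal_test, successors, limit - 1)
        if sub is not None:
            return [state] + sub
    return None

def iterative_deepening(start, goal_test, successors, max_depth=50):
    """IDS: run depth-limited search with limit 0, 1, 2, ... until a goal
    is found; memory stays O(bd) like DFS, completeness like BFS."""
    for limit in range(max_depth + 1):
        result = depth_limited(start, goal_test, successors, limit)
        if result is not None:
            return result
    return None
```

Note that IDS happily handles the unbounded binary tree (node n has children 2n and 2n+1), where plain DFS would dive forever down the leftmost branch.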
The End…
16
Artificial Intelligence
5
GBFS
6
Greedy Best-First- Analysis
Complete – No
From Iasi to Fagaras
No – can get stuck in loops, e.g., Iasi -> Neamt -> Iasi ->Neamt …
Optimal - No
Time Complexity - O(b^m)
Space Complexity – O(b^m); keeps all nodes in memory
A∗ search
most widely known form of best-first search
evaluates nodes by combining g(n), the cost to reach the
node, and h(n), the cost to get from the node to the goal
9
A* search - Demo
10
Conditions for optimality
Admissibility
an admissible heuristic never overestimates the cost to reach the
goal, i.e., it is optimistic.
A heuristic h(n) is admissible if for every node n, h(n) ≤ h*(n), where
h*(n) is the true cost to reach the goal state from n.
Example: hSLD(n), the straight-line distance, can never be an overestimate.
Consistency
A heuristic h(n) is consistent if, for every node n and every successor
n' of n generated by any action a, h(n) ≤ c(n, a, n') + h(n').
This is a form of the triangle inequality:
the triangle is formed by n, n', and
the goal Gn closest to n
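An A* sketch combining g(n) and h(n) as described above; the `successors` and `h` interfaces are assumptions for illustration, and with an admissible, consistent h the first goal selected for expansion is optimal:

```python
import heapq

def a_star(start, goal_test, successors, h):
    """A*: order the frontier by f(n) = g(n) + h(n), where g is the cost
    to reach n and h estimates the cost from n to the goal."""
    frontier = [(h(start), 0, start, [start])]   # (f, g, state, path)
    best_g = {start: 0}
    while frontier:
        f, g, state, path = heapq.heappop(frontier)
        if goal_test(state):                     # goal test on expansion
            return g, path
        for nxt, cost in successors(state):
            new_g = g + cost
            if new_g < best_g.get(nxt, float('inf')):
                best_g[nxt] = new_g
                heapq.heappush(frontier,
                               (new_g + h(nxt), new_g, nxt, path + [nxt]))
    return None
```

On a simple chain 0 → 1 → 2 → 3 → 4 with unit step costs, the heuristic h(s) = 4 − s is admissible and consistent, and A* returns the optimal cost-4 path.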
Workout
Optimality of A*
15
Artificial Intelligence
Knowledge Representation
Contents
Logical Agents
Knowledge-based Agents
Data Information Knowledge
Knowledge Representation
KR Characteristics
Logical Agents
Humans know things and reason about them; both abilities are important
for artificial agents
intelligence of humans is achieved by processes of reasoning
that operate on internal representations of knowledge
In AI, this approach to intelligence is embodied in
knowledge-based agents
Knowledge-based Agents
The central component is the knowledge base, or KB
A knowledge base is a set of sentences
A sentence,
expressed in a language called a knowledge representation language
logic languages
represents some assertion about the world
Operations,
TELL - add new sentences to the KB
ASK - a way to query what is known
Both operations may involve inference - deriving new
sentences from old
Knowledge-based agent
KB initially contain some background knowledge
Each time the agent program is called, it does three things.
First, it TELLs the knowledge base what it perceives.
Second, it ASKs the knowledge base what action it should perform;
extensive reasoning may be done about the current state of the world.
Third, the agent TELLs the knowledge base which action was chosen, and executes it.
Knowledge
The teacher can see that the student's results are showing a downward trend.
The teacher applies the rule of 40% for a pass to gain the knowledge that the
student has passed Tests 1 and 2 and failed Test 3.
Data Information Knowledge
[Chart: ice cream shop profit (in lakhs) by month for 2017, 2018 and 2019.]
The Wumpus World
Example percept sequences:
[None, None, None, None, None] [None, Breeze, None, None, None]
[Stench, None, None, None, None] [Stench, Breeze, Glitter, None, None]
The End…
18
Artificial Intelligence
∃ - There Exists:
∃x P(x) is read “there exists an x such that P(x) is true”.
E.g., there exists an engineering student who is not smart.
mixtures
∀x ∃y Loves(x, y) - Everybody loves somebody
Aditya dislikes
Aditya likes rain and snow
Modus Ponens
Criminal(Joe)
16
Prolog
17
Example 1
Example 2
The End…
20
Artificial Intelligence
Parenthesis > Negation > Conjunction(AND) > Disjunction(OR) > Implication > Biconditional
Examples of PL sentences
Example
It is Raining and it is Thursday:
R ∧ T, where
R represents “It is Raining”, T represents “it is Thursday”.
Example
It is not hot but it is sunny. It is neither hot nor sunny.
It is not hot, and it is sunny. It is not hot, and it is not sunny.
Let h = “it is hot” and s = “it is sunny.”
~h ∧ s; ~h ∧ ~s
Example
Suppose x is a particular real number. Let p, q, and r symbolize
“0 < x,” “x < 3,” and “x = 3.” respectively.
Then the following inequalities
x ≤ 3, 0 < x < 3, 0 < x ≤ 3
can be translated as
q ∨ r, p ∧ q, p ∧ (q ∨ r)
Tautology and Contradictory
A tautology is true under any interpretation.
The expression A ˅ ¬A is a tautology.
This means it is always true, regardless of the value of A.
An expression which is false under any interpretation is
contradictory (or unsatisfiable).
A ∧ ¬A
Some expressions are satisfiable, but not valid. This means
that they are true under some interpretation, but not under
all interpretations.
A ∧ B
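These three categories (tautology, contradiction, satisfiable but not valid) can be checked by brute-force enumeration of all interpretations; representing a formula as a Python boolean function here is a stand-in for a real parser:

```python
from itertools import product

def truth_table_status(expr, variables):
    """Classify a propositional formula by evaluating it under every
    assignment of True/False to its variables."""
    values = [expr(*assignment)
              for assignment in product([False, True], repeat=len(variables))]
    if all(values):
        return 'tautology'          # true under every interpretation
    if not any(values):
        return 'contradiction'      # false under every interpretation
    return 'satisfiable (contingent)'
```

For example, A ∨ ¬A comes out a tautology, A ∧ ¬A a contradiction, and A ∧ B merely satisfiable.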
A simple knowledge base
R1 : ¬P1,1
R2 : B1,1 ⇔ (P1,2 ∨ P2,1)
R3 : B2,1 ⇔ (P1,1 ∨ P2,2 ∨ P3,1)
R4 : ¬B1,1
R5 : B2,1
Entailment and derivation
Entailment: KB |= Q
Q is entailed by KB (a set of premises or assumptions) if and
only if there is no logically possible world in which Q is false
while all the premises in KB are true.
Or, stated positively, Q is entailed by KB if and only if the
conclusion is true in every logically possible world in which
all the premises in KB are true.
Derivation: KB |- Q
We can derive Q from KB if there is a proof consisting of a
sequence of valid inference steps starting from the premises in
KB and resulting in Q
10
Inference
Deduction: the process of deriving a conclusion from a set of
assumptions
Applying a sequence of rules is called a proof
Equivalent to searching for a solution
If we deduce a conclusion C from a set of assumptions, we
write:
{A1, A2, …, An} ├ C
If C can be concluded without any assumption
├C
The inference rule A ├ B is expressed as
A
B Given A, B is deduced (or concluded).
It is like if A is true, then B is true.
Types of Inference rules
Soundness of Rules
P     Q     P → Q
True  True  True
True  False False
False True  True
False False True
Proving things
A proof is a sequence of sentences, where each sentence is either a
premise or a sentence derived from earlier sentences in the proof by
one of the rules of inference.
The last sentence is the theorem (also called goal or query) that we
want to prove.
Example
15
Fuzzy Logic
Introduction
Fuzzy thinking, why fuzzy, logic
Fuzzy sets
Representation
Linguistic variable and hedges
Operations on Fuzzy sets
Complement
Containment
Intersection, etc.
Fuzzy rules
1
Introduction
Experts rely on common sense when they solve
problems.
How can we represent expert knowledge that uses
vague and ambiguous terms in a computer?
Fuzzy logic is not logic that is fuzzy, but logic that is
used to describe fuzziness.
Fuzzy logic is the theory of fuzzy sets, sets that
calibrate vagueness.
Fuzzy logic is based on the idea that all things admit
of degrees.
2
Fuzzy Logic
Boolean logic uses sharp distinctions.
Fuzzy logic reflects how people think. It
attempts to model our sense of words,
our decision making and our common
sense. As a result, it is leading to new,
more human, intelligent systems.
3
Fuzzy Logic History
Fuzzy, or multi-valued logic was introduced in the
1930s by Jan Lukasiewicz, a Polish philosopher. This
work led to an inexact reasoning technique often
called possibility theory.
Later, in 1937, Max Black published a paper called
“Vagueness: an exercise in logical analysis”. In this
paper, he argued that a continuum implies degrees.
In 1965 Lotfi Zadeh, published his famous paper
“Fuzzy sets”.
Zadeh extended possibility theory into a formal
system of mathematical logic.
4
Why fuzzy?
As Zadeh said, the term is concrete,
immediate and descriptive.
Why logic?
Fuzziness rests on fuzzy set theory, and
fuzzy logic is just a small part of that
theory.
5
Definition
Fuzzy logic is a set of mathematical principles for
knowledge representation based on degrees of
membership.
Unlike two-valued Boolean logic, fuzzy logic is
multi-valued.
It deals with degrees of membership and degrees of
truth.
Fuzzy logic uses the continuum of logical values
between 0 (completely false) and 1 (completely
true).
6
Range of logical values in
Boolean and fuzzy logic
7
Fuzzy sets
The concept of a set is fundamental to
mathematics.
However, our own language is also the
supreme expression of sets. For example, car
indicates the set of cars. When we say a car,
we mean one out of the set of cars.
8
Tall men example
9
Fuzzy Logic
Introduction
Fuzzy thinking, why fuzzy, logic
Fuzzy sets
Representation
Linguistic variable and hedges
Operations on Fuzzy sets
Complement
Containment
Intersection, etc.
Fuzzy rules
1
Tall men example
2
Crisp and fuzzy sets of “tall men”
3
A fuzzy set is a set with fuzzy boundaries
The x-axis represents the universe of discourse
The y-axis represents the membership value of the
fuzzy set.
In classical set theory, a crisp set A of X is defined by the characteristic function
fA(x): X → {0, 1}, where fA(x) = 1 if x ∈ A and fA(x) = 0 if x ∉ A.
In fuzzy theory, a fuzzy set A of universe X is defined by the membership function
μA(x): X → [0, 1], where μA(x) = 1 if x is totally in A;
μA(x) = 0 if x is not in A;
0 < μA(x) < 1 if x is partly in A.
4
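A sketch of crisp vs. fuzzy membership for "tall men"; the height breakpoints below are invented for illustration and are not taken from the slides:

```python
def crisp_tall(height_cm, cutoff=180):
    """Crisp set: characteristic function into {0, 1}.
    The 180 cm cutoff is an illustrative assumption."""
    return 1 if height_cm >= cutoff else 0

def fuzzy_tall(height_cm, low=160, high=190):
    """Fuzzy set: membership ramps linearly from 0 at `low` to 1 at `high`.
    The breakpoints are illustrative, not from the slides."""
    if height_cm <= low:
        return 0.0
    if height_cm >= high:
        return 1.0
    return (height_cm - low) / (high - low)
```

The crisp function jumps abruptly from 0 to 1 at the cutoff, while the fuzzy one assigns a 175 cm man a partial membership of 0.5, which is the whole point of the fuzzy boundary.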
fuzzy set representation
First, we determine the membership
functions. In our “tall men” example, we
can obtain fuzzy sets of tall, short and
average men.
The universe of discourse − the men’s
heights − consists of three sets: short,
average and tall men.
5
Crisp and fuzzy sets
6
Representation of crisp and
fuzzy subsets
7
Fuzzy Logic
➢ Introduction
➢ Fuzzy thinking, why fuzzy, logic
➢ Fuzzy sets
➢ Representation
➢ Linguistic variable and hedges
➢ Operations on Fuzzy sets
➢ Complement
➢ Containment
➢ Intersection, etc.
➢ Fuzzy rules
Linguistic variables and hedges
▪ At the root of fuzzy set theory lies the
idea of linguistic variables.
▪ A linguistic variable is a fuzzy variable.
For example, the statement “John is tall”
implies that the linguistic variable John
takes the linguistic value tall.
2
Example
▪ In fuzzy expert systems, linguistic variables are used
in fuzzy rules. For example:
IF wind is strong
THEN sailing is good
IF project_duration is long
THEN completion_risk is high
IF speed is slow
THEN stopping_distance is short
3
Hedge
▪ A linguistic variable carries with it the
concept of fuzzy set qualifiers, called
hedges.
▪ Hedges are terms that modify the shape
of fuzzy sets. They include adverbs such
as very, somewhat, quite, more or less
and slightly.
4
Crisp and fuzzy sets
5
Fuzzy sets with the hedge very
6
Representation of hedges
7
Representation of hedges (cont.)
8
Operations on Classical Sets
Union:
A ∪ B = {x | x ∈ A or x ∈ B}
Intersection:
A ∩ B = {x | x ∈ A and x ∈ B}
Complement:
A’ = {x | x ∉ A, x ∈ X}
X – Universal Set
Set Difference:
A | B = {x | x ∈ A and x ∉ B}
Set difference is also denoted by A - B
9
Operations on Classical Sets
10
Operations on Classical Sets
Complement of set A.
11
Properties of Classical Sets
A ∪ B = B ∪ A
A ∩ B = B ∩ A
A ∪ (B ∪ C) = (A ∪ B) ∪ C
A ∩ (B ∩ C) = (A ∩ B) ∩ C
A ∪ (B ∩ C) = (A ∪ B) ∩ (A ∪ C)
A ∩ (B ∪ C) = (A ∩ B) ∪ (A ∩ C)
A ∪ A = A
A ∩ A = A
A ∪ X = X
A ∩ X = A
A ∪ Ø = A
A ∩ Ø = Ø
12
Operations of fuzzy sets
▪ The classical set theory developed in
the late 19th century by Georg Cantor
describes how crisp sets can interact.
These interactions are called
operations.
13
Cantor’s sets
14
Complement
Crisp Sets: Who does not belong to the set?
Fuzzy Sets: How much do elements not belong
to the set?
▪ The complement of a set is an opposite of this
set.
μ¬A(x) = 1 − μA(x)
15
Fuzzy Set Operations
16
Containment
Crisp Sets: Which sets belong to which other sets?
Fuzzy Sets: How much do sets belong to other sets?
▪ A set can contain other sets. The smaller set is called
subset.
▪ In crisp sets, all elements of a subset entirely belong
to a larger set.
▪ In fuzzy sets, each element can belong less to the
subset than to the larger set. Elements of the fuzzy
subset have smaller memberships in it than in the
larger set.
17
Intersection
Crisp Sets: Which element belongs to both sets?
Fuzzy Sets: How much of the element is in both sets?
▪ In classical set theory, an intersection between two
sets contains the elements shared by these sets
▪ In fuzzy sets, an element may partly belong to both
sets with different memberships. A fuzzy intersection
is the lower membership in both sets of each
element.
μA∩B(x) = min [μA(x), μB(x)] = μA(x) ∩ μB(x),
where x ∈ X
18
Fuzzy Sets
A ∩ B → χA∩B(x)
= χA(x) ∧ χB(x)
= min(χA(x), χB(x))
A’ → χA’(x)
= 1 – χA(x)
A’’ = A
19
Union
Crisp Sets: Which element belongs to either set?
Fuzzy Sets: How much of the element is in either set?
▪ The union of two crisp sets consists of every element
that falls into either set.
▪ In fuzzy sets, the union is the reverse of the
intersection. That is, the union is the largest
membership value of the element in either set.
μA∪B(x) = max [μA(x), μB(x)] = μA(x) ∪ μB(x),
where x ∈ X
20
Fuzzy Sets
21
Fuzzy Set Operations
22
Operations of fuzzy sets
23
Fuzzy Set Operations
25
A A’ = X A A’ = Ø
Excluded middle axioms for crisp sets. (a) Crisp set A and its
complement; (b) crisp A ∪ A = X (axiom of excluded
middle); and (c) crisp A ∩ A = Ø (axiom of contradiction).
26
A A’ A A’
Excluded middle axioms for fuzzy sets are not valid. (a) Fuzzy set
A and its complement;
∼ (b) fuzzy A ∪ A = X (axiom of
∼
A
A B
A B
A B
28
Examples of Fuzzy Set Operations
▪ Fuzzy union (∪): the union of two fuzzy sets is the
maximum (MAX) of each element from two sets.
▪ E.g.
▪ A = {1.0, 0.20, 0.75}
▪ B = {0.2, 0.45, 0.50}
▪ A ∪ B = {MAX(1.0, 0.2), MAX(0.20, 0.45), MAX(0.75,
0.50)} = {1.0, 0.45, 0.75}
29
Examples of Fuzzy Set Operations
▪ Fuzzy intersection (∩): the intersection of two fuzzy
sets is just the MIN of each element from the two
sets.
▪ E.g.
▪ A ∩ B = {MIN(1.0, 0.2), MIN(0.20, 0.45), MIN(0.75,
0.50)} = {0.2, 0.20, 0.50}
30
Examples of Fuzzy Set Operations
▪ The complement of a fuzzy variable with DOM x is
(1-x).
▪ Example.
▪ Ac = {1 – 1.0, 1 – 0.2, 1 – 0.75} = {0.0, 0.8, 0.25}
31
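The MAX/MIN/complement operations above can be applied element-wise to membership vectors; a minimal sketch that reproduces the slide's numbers:

```python
def fuzzy_union(a, b):
    """Fuzzy union: element-wise MAX of two membership vectors."""
    return [max(x, y) for x, y in zip(a, b)]

def fuzzy_intersection(a, b):
    """Fuzzy intersection: element-wise MIN of two membership vectors."""
    return [min(x, y) for x, y in zip(a, b)]

def fuzzy_complement(a):
    """Fuzzy complement: 1 minus the membership of each element."""
    return [1 - x for x in a]

# The slide's example sets:
A = [1.0, 0.20, 0.75]
B = [0.2, 0.45, 0.50]
```

Running these on A and B gives the union {1.0, 0.45, 0.75}, the intersection {0.2, 0.20, 0.50}, and the complement of A as {0.0, 0.8, 0.25}, matching the worked examples above.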
Properties of Fuzzy Sets
A ∪ B = B ∪ A
A ∩ B = B ∩ A
A ∪ (B ∪ C) = (A ∪ B) ∪ C
A ∩ (B ∩ C) = (A ∩ B) ∩ C
A ∪ (B ∩ C) = (A ∪ B) ∩ (A ∪ C)
A ∩ (B ∪ C) = (A ∩ B) ∪ (A ∩ C)
A ∪ A = A;  A ∩ A = A
A ∪ X = X;  A ∩ X = A
A ∪ Ø = A;  A ∩ Ø = Ø
If A ⊆ B and B ⊆ C, then A ⊆ C
A’’ = A
32
Fuzzy Sets
33
Example (Discrete Universe)
[Plot: a discrete membership function μA(x) over x = number of courses taken
(2, 4, 6, 8), y-axis from 0 to 1 with 0.5 marked.]
34
Example (Discrete Universe)
35
Example (Continuous Universe)
B = {(x, μB(x)) | x ∈ U},  “about 50 years old”
μB(x) = 1 / (1 + ((x − 50)/5)^4)
Alternative representation:
B = ∫ (over x ∈ ℝ⁺) [1 / (1 + ((x − 50)/5)^4)] / x
[Plot: μB(x) against x = age from 0 to 100, equal to 1 near x = 50 and
falling off on both sides.]
36
Alternative Notation
A = ( x, A ( x)) x U
U : discrete universe A=
xi U
A ( xi ) / xi
U : continuous universe A = A ( x) / x
U
Fuzzy Disjunction
▪ A ∨ B ≡ max(A, B)
▪ A ∨ B = C: "Quality C is the disjunction
of Quality A and B"
[Bar charts: μA = 0.75, μB = 0.375.]
• (A ∨ B = C) ⇒ (C = 0.75)
38
Fuzzy Conjunction
▪ A ∧ B = min(A, B)
▪ A ∧ B = C: "Quality C is the conjunction
of Quality A and B"
(Figure: A has membership 0.75 and B has membership 0.375.)
• (A ∧ B = C) → (C = 0.375)
39
39
Example: Fuzzy Conjunction
Calculate A ∧ B given that A is .4 and B is 20
(Figure: reading the two membership curves gives
μA(.4) = 0.7 and μB(20) = 0.9.)
A ∧ B = min(0.7, 0.9) = 0.7
40
Fuzzy Logic
Introduction
Fuzzy thinking, why fuzzy, logic
Fuzzy sets
Representation
Linguistic variable and hedges
Operations on Fuzzy sets
Complement
Containment
Intersection, etc.
Fuzzy rules
1
Fuzzy rules
In 1973, Lotfi Zadeh published his
second most influential paper. This paper
outlined a new approach to analysis of
complex systems, in which Zadeh
suggested capturing human knowledge in
fuzzy rules.
2
What is a fuzzy rule?
A fuzzy rule can be defined as a conditional
statement in the form:
IF x is A
THEN y is B
where x and y are linguistic variables; and A
and B are linguistic values determined by
fuzzy sets on the universe of discourses X and
Y, respectively.
3
Classical vs. fuzzy rules
A classical IF-THEN rule uses binary logic, for example:
Rule 1:
IF speed is > 100
THEN stopping_distance is long
Rule 2:
IF speed is < 40
THEN stopping_distance is short
4
Fuzzy Rules
Fuzzy rules relate fuzzy sets.
In a fuzzy system, all rules fire to some
extent, or in other words they fire
partially.
If the antecedent is true to some degree
of membership, then the consequent is
also true to that same degree
5
Fuzzy sets of tall and heavy men
7
Fuzzy Rule
A fuzzy rule can have multiple antecedents, for
example:
IF project_duration is long
AND project_staffing is large
AND project_funding is inadequate
THEN risk is high
IF service is excellent
OR food is delicious
THEN tip is generous
8
Fuzzy Rule
The consequent of a fuzzy rule can also include
multiple parts, for instance:
IF temperature is hot
THEN hot_water is reduced;
cold_water is increased
9
Fuzzy inference
The most commonly used fuzzy inference
technique is the so-called Mamdani method.
In 1975, Professor Ebrahim Mamdani of
London University built one of the first fuzzy
systems to control a steam engine and boiler
combination.
He applied a set of fuzzy rules supplied by
experienced human operators.
1
Mamdani fuzzy inference
2
Example
We examine a simple two-input one-output problem
that includes three rules:
Rule 1:
IF x is A3 (project_funding is adequate)
OR y is B1 (project_staffing is small)
THEN z is C1 (risk is low)
Rule 2:
IF x is A2 (project_funding is marginal)
AND y is B2 (project_staffing is large)
THEN z is C2 (risk is normal)
Rule 3:
IF x is A1 (project_funding is inadequate)
THEN z is C3 (risk is high)
3
Identify Linguistic Variables/Values
and Inputs/Output
Rule 1:
IF x is A3 (project_funding is adequate)
OR y is B1 (project_staffing is small)
THEN z is C1 (risk is low)
Rule 2:
IF x is A2 (project_funding is marginal)
AND y is B2 (project_staffing is large)
THEN z is C2 (risk is normal)
Rule 3:
IF x is A1 (project_funding is inadequate)
THEN z is C3 (risk is high)
4
Mamdani-style fuzzy inference
5
Step 1: Fuzzification
The process of transforming crisp quantities into
fuzzy sets/linguistic variables.
Use hedges to generate new fuzzy sets if required.
Project funding (A)
={A1,A2,A3}
= {inadequate, marginal, adequate}
Project staffing (B)
={B1, B2}
={small, large}
Risk (C) = { C1, C2, C3}
={low, normal, high} 6
Step 1: Fuzzification
given the crisp inputs, x1 and y1 (project funding
=35% and project staffing = 60%)
Determine the degree to which these inputs belong to
each of the appropriate fuzzy sets.
(Figure: crisp inputs x1 and y1 projected onto the membership
functions of A1, A2, A3 over X and of B1, B2 over Y.)
μ(x=A1) = 0.5   μ(y=B1) = 0.1
μ(x=A2) = 0.2   μ(y=B2) = 0.7
7
Fuzzy Membership Function
Membership function (MF) - A function that specifies the degree to which a given input
belongs to a set.
Degree of membership- The output of a membership function, this value is always limited to
between 0 and 1. Also known as a membership value or membership grade.
Membership functions are used in the fuzzification and defuzzification steps of a FLS (fuzzy
logic system), to map the non-fuzzy input values to fuzzy linguistic terms and vice versa
Support: elements having non-zero degree of membership.
Core: set with elements having degree of 1.
α-Cut: set of elements with degree >= α.
Height: maximum degree of membership.
The Fuzzy Logic Toolbox includes 9 built-in membership function types.
These 9 functions are, in turn, built from several basic functions:
Piecewise linear functions.
Gaussian distribution function.
Sigmoid curve.
Quadratic polynomial curves.
Cubic polynomial curves.
8
Membership Function
There are several ways to assign values to fuzzy variables: Intuition,
Inference, Rank ordering, Angular fuzzy sets, Neural networks, Genetic
algorithm, etc.
Inference method performs deductive reasoning and uses knowledge of
geometrical shapes for defining membership values.
membership functions may be defined by various shapes: Triangular,
Trapezoidal, Piecewise linear, Gaussian, Singleton.
9
Membership Functions in the Fuzzy
Logic Toolbox
The simplest membership functions are formed using straight lines.
These straight line membership functions have the advantage of simplicity.
Triangular membership function: trimf.
Trapezoidal membership function: trapmf.
Two membership functions are built on the Gaussian distribution curve: a simple
Gaussian curve and a two-sided composite of two different Gaussian curves.
The two functions are gaussmf and gauss2mf.
The generalized bell membership function is specified by three parameters and
has the function name gbellmf.
Sigmoidal membership function: sigmf.
Polynomial based curves: Three related membership functions are the Z, S, and Pi
curves, all named because of their shape ( The functions zmf, smf and pimf).
Fuzzy Logic Toolbox also allows you to create your own membership functions.
x = (0:0.1:10)';
y1 = trapmf(x, [2 3 7 9]);
y2 = trapmf(x, [3 4 6 8]);
y3 = trapmf(x, [4 5 5 7]);
y4 = trapmf(x, [5 6 4 6]);
plot(x, [y1 y2 y3 y4]);
10
Triangular membership function
most widely accepted and used membership function (MF).
The triangle which fuzzifies the input can be defined by three
parameters a, b and c, where a and c define the base and b
defines the peak of the triangle.
Trivial case:
If input x = b,
then it has full membership
in the given set.
So, μ(x) = 1, if x = b
If the input is less than a or greater than c, then it does not belong
to the fuzzy set at all, and its membership value will be 0:
μ(x) = 0, x < a or x > c
11
Triangular membership function
x is between a and b:
If x is between a and b, its membership value rises linearly
from 0 to 1: μ(x) = (x − a) / (b − a), a ≤ x ≤ b
12
Triangular membership function
x is between b and c:
If x is between b and c, its membership
value falls from 1 to 0.
If it is near b, its membership value
is close to 1, and if x is near to c,
its membership value gets close to 0.
We can compute the fuzzy value of x using similar triangle
rule, μ(x) = (c – x) / (c – b), b≤x≤c
Combine all together:
μ(x) = 0 if x ≤ a or x ≥ c; (x − a)/(b − a) if a ≤ x ≤ b;
(c − x)/(c − b) if b ≤ x ≤ c
13
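The piecewise definition above translates directly into code. A minimal sketch (the function name trimf is chosen to mirror the MATLAB toolbox naming):

```python
def trimf(x, a, b, c):
    """Triangular membership: base [a, c], peak at b."""
    if x <= a or x >= c:
        return 0.0                  # outside the triangle
    if x <= b:
        return (x - a) / (b - a)    # rising edge, a <= x <= b
    return (c - x) / (c - b)        # falling edge, b <= x <= c

print(trimf(4, 2, 4, 6))  # 1.0 (full membership at the peak)
print(trimf(3, 2, 4, 6))  # 0.5 (halfway up the rising edge)
```

The degenerate cases a = b or b = c (shoulder shapes) would need extra guards; they are omitted here for clarity.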
Trapezoidal membership function:
14
Step 1: Fuzzification
given the crisp inputs, x1 and y1 (project funding =35% and project staffing = 60%)
A1 = {0,0,20,50}
A2 = {30, 50, 75}
B1 = {0,0,15,65}
B2 = {35,70,100,100}
15
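A sketch of this fuzzification step in Python, reading the four-element sets as trapezoid corner points and the three-element A2 as a triangle (a degenerate trapezoid). The slide's 0.5 and 0.1 come out exactly; the exact values for A2 and B2 are 0.25 and about 0.71, which the slide rounds to 0.2 and 0.7 when reading them off the figure:

```python
def trapmf(x, a, b, c, d):
    """Trapezoidal membership with corner points a <= b <= c <= d."""
    if b <= x <= c:
        return 1.0                  # flat top
    if a < x < b:
        return (x - a) / (b - a)    # rising edge
    if c < x < d:
        return (d - x) / (d - c)    # falling edge
    return 0.0

x1, y1 = 35, 60                         # project funding 35%, staffing 60%
mu_A1 = trapmf(x1, 0, 0, 20, 50)        # 0.5
mu_A2 = trapmf(x1, 30, 50, 50, 75)      # 0.25 (slide reads ~0.2 off the figure)
mu_B1 = trapmf(y1, 0, 0, 15, 65)        # 0.1
mu_B2 = trapmf(y1, 35, 70, 100, 100)    # ~0.71 (slide reads ~0.7)
```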
Step 2: Rule Evaluation
Take the fuzzified inputs
μ(x=A1) = 0.5,  μ(x=A2) = 0.2,
μ(y=B1) = 0.1,  μ(y=B2) = 0.7
Apply them to the antecedents of the fuzzy
rules. If a rule has multiple antecedents, the fuzzy
operator (AND or OR) is used to obtain a single number.
This number (the truth value) is then applied
to the consequent membership function.
16
Rule Evaluation (cont.)
To evaluate the disjunction of the rule antecedents,
we use the OR fuzzy operation. Typically, fuzzy
expert systems make use of the classical fuzzy
operation union:
μA∪B(x) = max [μA(x), μB(x)]
Similarly, in order to evaluate the conjunction of the
rule antecedents, we apply the AND fuzzy operation
intersection:
μA∩B(x) = min [μA(x), μB(x)]
17
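Applied to the fuzzified inputs of this example, rule evaluation is just max and min. A small sketch (μ(x=A3) = 0.0 is read off the figure, since x1 lies outside A3):

```python
mu_x = {"A1": 0.5, "A2": 0.2, "A3": 0.0}   # fuzzified project funding
mu_y = {"B1": 0.1, "B2": 0.7}              # fuzzified project staffing

rule1 = max(mu_x["A3"], mu_y["B1"])  # OR antecedent  -> union (max)
rule2 = min(mu_x["A2"], mu_y["B2"])  # AND antecedent -> intersection (min)
rule3 = mu_x["A1"]                   # single antecedent

print(rule1, rule2, rule3)  # 0.1 0.2 0.5
```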
Mamdani-style rule evaluation
(Figure: Mamdani-style evaluation of Rules 1 and 2.)
Rule 1: IF x is A3 (0.0) OR y is B1 (0.1) THEN z is C1 (0.1)
Rule 2: IF x is A2 (0.2) AND y is B2 (0.7) THEN z is C2 (0.2)
Rule 3: IF x is A1 (0.5) THEN z is C3 (0.5)
18
Rule Evaluation (cont.)
Now the result of the antecedent evaluation can be
applied to the membership function of the consequent.
clipping: cut the consequent membership function at the
level of the antecedent truth.
Since the top of the membership function is sliced, the
clipped fuzzy set loses some information.
scaling : The original membership function of the rule
consequent is adjusted by multiplying all its membership
degrees by the truth value of the rule antecedent.
offers a better approach for preserving the original shape
of the fuzzy set.
Generally loses less information
19
Clipped and scaled membership
functions
(Figure: consequent set C2 clipped at 0.2 (left) and scaled by
0.2 (right); the vertical axis is degree of membership, the
horizontal axis is Z.)
20
Step 3: Aggregation of the rule
outputs
Process of unification of the outputs of all rules.
take the membership functions of all rule
consequents previously clipped or scaled and
combine them into a single fuzzy set.
Input: the list of clipped or scaled consequent
membership functions
Output: one fuzzy set for each output variable.
(Figure: the clipped consequent sets are combined into a single
aggregate fuzzy set over Z.)
z is C1 (0.1),  z is C2 (0.2),  z is C3 (0.5)
21
Step 4: Defuzzification
Fuzziness helps us to evaluate the rules, but the final
output of a fuzzy system has to be a crisp number.
The input for the defuzzification process is the
aggregate output fuzzy set and the output is a single
number.
There are several defuzzification methods
The most popular one is the centroid technique. It
finds the point where a vertical line would slice the
aggregate set into two equal masses.
22
Defuzzification (cont.)
Mathematically this centre of gravity (COG) can be
expressed as:
COG = ∫ab μA(x) x dx / ∫ab μA(x) dx
23
Centre of gravity (COG)
COG = [(0 + 10 + 20) × 0.1 + (30 + 40 + 50 + 60) × 0.2
+ (70 + 80 + 90 + 100) × 0.5] /
(0.1 + 0.1 + 0.1 + 0.2 + 0.2 + 0.2 + 0.2 + 0.5 + 0.5 + 0.5 + 0.5)
= 67.4
(Figure: the aggregate fuzzy set sampled at z = 0, 10, …, 100;
the centroid lies at z = 67.4.)
24
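With the aggregate set sampled at z = 0, 10, …, 100 as on the slide, the discrete centroid is a one-liner. A minimal sketch reproducing the 67.4 result:

```python
z  = list(range(0, 101, 10))      # sample points over the universe Z
mu = [0.1, 0.1, 0.1,              # region covered by clipped C1
      0.2, 0.2, 0.2, 0.2,         # region covered by clipped C2
      0.5, 0.5, 0.5, 0.5]         # region covered by clipped C3

# Discrete centre of gravity: weighted mean of the sample points
cog = sum(zi * mi for zi, mi in zip(z, mu)) / sum(mu)
print(round(cog, 1))  # 67.4
```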
Sugeno fuzzy inference
Mamdani-style inference: find the centroid of a two-
dimensional shape by integrating across a
continuously varying function.
Not computationally efficient.
Michio Sugeno suggested to use a single spike, a
singleton, as the membership function of the rule
consequent.
A fuzzy singleton is a fuzzy set with a membership
function that is unity at a single particular point on
the universe of discourse and zero everywhere else.
1
Sugeno fuzzy inference (cont.)
Sugeno-style fuzzy inference is very similar to the
Mamdani method.
Sugeno changed only a rule consequent.
Instead of a fuzzy set, he used a mathematical
function of the input variable.
IF x is A
AND y is B
THEN z is f (x, y)
where x, y and z are linguistic variables; A and B are
fuzzy sets on universe of discourses X and Y,
respectively; and f (x, y) is a mathematical function.
2
Sugeno fuzzy inference (cont.)
The most commonly used zero-order Sugeno fuzzy
model applies fuzzy rules in the following form:
IF x is A
AND y is B
THEN z is k
where k is a constant.
In this case, the output of each fuzzy rule is constant.
All consequent membership functions are
represented by singleton spikes.
3
Sugeno-style rule evaluation
4
Sugeno-style aggregation of the rule
outputs
5
Weighted average (WA):
WA = Σ μ(ki) × ki / Σ μ(ki)
Sugeno-style defuzzification
6
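For a zero-order Sugeno model, defuzzification reduces to a weighted average of the singleton positions, weighted by the rule firing strengths. A sketch with hypothetical firing strengths and singleton positions k1, k2, k3 (not taken from the slides):

```python
mu = [0.1, 0.2, 0.5]   # hypothetical rule firing strengths
k  = [20, 50, 80]      # hypothetical singleton positions k1, k2, k3

# Weighted average: sum(mu_i * k_i) / sum(mu_i)
wa = sum(m * ki for m, ki in zip(mu, k)) / sum(mu)
print(wa)  # ~65.0
```

No integration over Z is needed, which is why the Sugeno method is so much cheaper than the Mamdani centroid.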
Mamdani or Sugeno?
Mamdani method
Widely accepted for capturing expert knowledge.
Allows description of the expertise in more intuitive,
more human-like manner.
Entails a substantial computational burden.
Sugeno method
Computationally effective
Works well with optimisation and adaptive techniques
Makes it very attractive in control problems,
particularly for dynamic nonlinear systems.
7
Building a fuzzy expert system: case
study
A service centre keeps spare parts and repairs failed
ones.
A customer brings a failed item and receives a spare
of the same type.
Failed parts are repaired, placed on the shelf, and
thus become spares.
The objective here is to advise a manager of the
service centre on certain decision policies to keep the
customers satisfied.
1
Process of developing a fuzzy expert
system
1. Specify the problem and define linguistic
variables.
2. Determine fuzzy sets.
3. Elicit and construct fuzzy rules.
4. Encode the fuzzy sets, fuzzy rules and
procedures to perform fuzzy inference into the
expert system.
5. Evaluate and tune the system.
2
Step 1: Specify the problem and define
linguistic variables
There are four main linguistic variables:
average waiting time (mean delay) m,
repair utilisation factor of the service
centre ρ,
number of servers s,
initial number of spare parts n.
3
Linguistic variables and their ranges
Linguistic Variable: Mean Delay, m
Linguistic Value Notation Numerical Range (normalised)
Very Short VS [0, 0.3]
Short S [0.1, 0.5]
Medium M [0.4, 0.7]
Linguistic Variable: Number of Servers, s
Linguistic Value Notation Numerical Range (normalised)
Small S [0, 0.35]
Medium M [0.30, 0.70]
Large L [0.60, 1]
Linguistic Variable: Repair Utilisation Factor, ρ
Linguistic Value Notation Numerical Range
Low L [0, 0.6]
Medium M [0.4, 0.8]
High H [0.6, 1]
Linguistic Variable: Number of Spares, n
Linguistic Value Notation Numerical Range (normalised)
Very Small VS [0, 0.30]
Small S [0, 0.40]
Rather Small RS [0.25, 0.45]
Medium M [0.30, 0.70]
Rather Large RL [0.55, 0.75]
Large L [0.60, 1]
Very Large VL [0.70, 1]
4
Step 2: Determine fuzzy sets
Fuzzy sets can have a variety of shapes.
A triangle or a trapezoid can often provide an
adequate representation of the expert knowledge and, at
the same time, significantly simplify the process
of computation.
5
Fuzzy sets of Mean Delay m
(Figure: triangular membership functions VS, S and M over the
normalised Mean Delay axis, 0 to 1.)
6
Fuzzy sets of Number of Servers s
(Figure: membership functions S, M and L over the normalised
Number of Servers axis, 0 to 1.)
7
Fuzzy sets of Repair Utilisation Factor
(Figure: membership functions L, M and H over the Repair
Utilisation Factor axis, 0 to 1.)
8
Fuzzy sets of Number of Spares n
(Figure: membership functions VS, S, RS, M, RL, L and VL over
the normalised Number of Spares axis, 0 to 1.)
9
Step 3: Elicit and construct fuzzy
rules
To accomplish this task, we might ask the expert to
describe how the problem can be solved using the
fuzzy linguistic variables defined previously.
The required knowledge can also be collected from other
sources such as books, computer databases, flow
diagrams and observed human behaviour.
10
Fuzzy Associative Memory (FAM)
(Figure: FAM square relating Mean Delay and Number of
Servers to Number of Spares.)
11
Rule Base 1
12
The rule table
13
Cube FAM of Rule Base 2
14
Step 4
Encode the fuzzy sets, fuzzy rules and procedures to
perform fuzzy inference into the expert system
two options:
build our system using a programming language such
as C/C++ or Pascal,
apply a fuzzy logic development tool such as
MATLAB Fuzzy Logic Toolbox or Fuzzy Knowledge
Builder.
15
Step 5: Evaluate and tune the system
We want to see whether our fuzzy system meets the
requirements specified at the beginning.
Several test situations depend on the mean delay,
number of servers and repair utilisation factor.
The Fuzzy Logic Toolbox can generate surface plots to
help us analyse the system’s performance.
16
Three-dimensional plots for Rule Base
1
17
Three-dimensional plots for Rule Base
1
18
Three-dimensional plots for Rule Base
2
19
Three-dimensional plots for Rule Base
2
20
Modified fuzzy sets of Number of
Servers s
21
Cube FAM of Rule Base 3
22
Three-dimensional plots for Rule Base
3
23
Three-dimensional plots for Rule Base
3
24
Tuning fuzzy systems
1. Review model input and output variables, and if
required redefine their ranges.
2. Review the fuzzy sets, and if required define
additional sets on the universe of discourse. The use
of wide fuzzy sets may cause the fuzzy system to
perform roughly.
3. Provide sufficient overlap between neighbouring
sets. It is suggested that triangle-to-triangle and
trapezoid-to-triangle fuzzy sets should overlap
between 25% to 50% of their bases.
25
Tuning fuzzy systems
4. Review the existing rules, and if required add new
rules to the rule base.
5. Examine the rule base for opportunities to write
hedge rules to capture the pathological behaviour of
the system.
6. Adjust the rule execution weights. Most fuzzy
logic tools allow control of the importance of rules
by changing a weight multiplier.
7. Revise shapes of the fuzzy sets. In most cases,
fuzzy systems are highly tolerant of a shape
approximation.
26
Introduction to
Artificial Neural Networks
Background
- Neural Networks can be :
- Biological models
- Artificial models
2
Biological analogy and some main ideas
3
How Does the Brain Work ? (1)
NEURON
- The cell that performs information processing in the
brain.
4
How Does the Brain Work ? (2)
Each neuron consists of:
a SOMA, DENDRITES, an AXON, and SYNAPSES.
5
Brain vs. Digital Computers (1)
6
Brain vs. Digital Computers (2)
Human Computer
7
Brain vs. Digital Computers (3)
8
History
1943: McCulloch & Pitts show that neurons can be
combined to construct a Turing machine (using ANDs,
ORs, & NOTs)
9
Definition of Neural Network
A neural network is a massively parallel distributed processor that
has a natural propensity for storing experiential knowledge and
making it available for use.
10
Neurons vs. Units (1)
- Each element of a NN is a node called a unit.
- Units are connected by links.
- Each link has a numeric weight.
Notation
11
Notation (cont.)
12
Computing Elements
A typical unit:
13
A Computing Unit.
14
Calculations
Input function:
Activation function g:
15
Simple Computations in this network
16
Activation Functions
1) Step function
2) Sign function
3) Sigmoid function
17
Activation Functions
18
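The three activation functions can be written down directly. A small Python sketch (the threshold is taken as 0, an assumption):

```python
import math

def step(x):     # hard limiter: 1 once the input reaches the threshold, else 0
    return 1 if x >= 0 else 0

def sign(x):     # sign function: +1 or -1
    return 1 if x >= 0 else -1

def sigmoid(x):  # smooth squashing function, output in (0, 1)
    return 1.0 / (1.0 + math.exp(-x))

print(step(0.3), sign(-0.5), sigmoid(0))  # 1 -1 0.5
```

The sigmoid is the usual choice for multilayer networks because it is differentiable, which gradient-based learning requires.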
Standard structure of an artificial neural network
Input units
represent the input as a fixed-length vector of numbers (user
defined)
Hidden units
calculate thresholded weighted sums of the inputs
represent intermediate calculations that the network learns
Output units
represent the output as a fixed-length vector of numbers
19
Neural Network Example
20
Units in Action
21
Network Structures
The main distinction is between feed-forward and recurrent networks.
In a feed-forward network, links are unidirectional, and there are no cycles.
In a recurrent network, the links can form arbitrary topologies.
Technically speaking, a feed-forward network is a directed acyclic graph (DAG).
We deal with networks that are arranged in layers.
In a layered feed-forward network, each unit is linked only to units in the next
layer;
there are no links between units in the same layer, no links backward to a
previous layer, and no links that skip a layer.
22
Network Structures
lack of cycles - computation can proceed uniformly from input units to output
units.
The activation from the previous time step plays no part in the computation,
because it is not fed back to an earlier unit.
Hence, a feed-forward network simply computes a function of the input values
that depends on the weight settings—it has no internal state other than the
weights themselves.
Such networks can implement adaptive versions of simple reflex agents or they
can function as components of more complex agents.
we will focus on feed-forward networks because they are relatively well-
understood.
Obviously, the brain cannot be a feed-forward network, else we would have no
short-term memory.
Some regions of the brain are largely feed-forward and somewhat layered, but
there are rampant back-connections.
In our terminology, the brain is a recurrent network.
23
Network Structures
Recurrent networks can become unstable, or oscillate, or exhibit chaotic
behavior.
Given some input values, it can take a long time to compute a stable output, and
learning is made more difficult.
On the other hand, recurrent networks can implement more complex agent
designs and can model systems with state.
24
Hopfield networks
Hopfield networks are probably the best-understood class of recurrent networks.
They use bidirectional connections with symmetric weights
all of the units are both input and output units;
the activation function g is the sign function; and the activation levels can only
be ± 1.
A Hopfield network functions as an associative memory: after training on a set
of examples, a new stimulus will cause the network to settle into an activation
pattern corresponding to the training example that most closely resembles the
new stimulus.
One of the most interesting theoretical results is that Hopfield networks can
reliably store up to 0.138N training examples, where N is the number of units in
the network.
25
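The associative-recall behaviour is easy to demonstrate for a single stored pattern. A minimal sketch with Hebbian weights, sign activation and synchronous updates (the update schedule and one-pattern setup are illustrative assumptions):

```python
p = [1, -1, 1, -1]                        # stored pattern; activations are +/-1
n = len(p)

# Hebbian weight matrix: symmetric, zero diagonal
W = [[0 if i == j else p[i] * p[j] for j in range(n)] for i in range(n)]

state = [1, -1, 1, 1]                     # stored pattern with the last bit flipped
for _ in range(5):                        # iterate sign updates until stable
    state = [1 if sum(W[i][j] * state[j] for j in range(n)) >= 0 else -1
             for i in range(n)]

print(state == p)  # True: the network settles on the stored pattern
```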
Boltzmann machines
Boltzmann machines also use symmetric weights, but include units that are
neither input nor output units.
They also use a stochastic activation function, such that the probability of the
output being 1 is some function of the total weighted input.
Boltzmann machines therefore undergo state transitions that resemble a
simulated annealing search for the configuration that best approximates the
training set.
It turns out that Boltzmann machines are formally identical to a special case of
belief networks evaluated with a stochastic simulation algorithm.
Some networks, called perceptrons, have no hidden units.
This makes the learning problem much simpler, but it means that perceptrons are
very limited in what they can represent.
Networks with one or more layers of hidden units are called multilayer networks.
With one (sufficiently large) layer of hidden units, it is possible to represent any
continuous function of the inputs; with two layers, even discontinuous functions
can be represented.
26
Example
With a fixed structure and fixed
activation functions g, the functions
representable by a feed-forward network
are restricted to have a specific
parameterized structure.
The weights chosen for the network
determine which of these functions is
actually represented.
27
Optimal Network Structure
So far we have considered networks with a fixed structure, determined by some
outside authority.
This is a potential weak point, because the wrong choice of network structure
can lead to poor performance.
If we choose a network that is too small, then the model will be incapable of
representing the desired function.
If we choose a network that is too big, it will be able to memorize all the
examples by forming a large lookup table, but will not generalize well to inputs
that have not been seen before.
In other words, like all statistical models, neural networks are subject to
overfitting when there are too many parameters (i.e., weights) in the model.
28
Optimal Network Structure
It is known that a feed-forward network with one hidden layer can approximate
any continuous function of the inputs, and a network with two hidden layers can
approximate any function at all.
However, the number of units in each layer may grow exponentially with the
number of inputs.
As yet, we have no good theory to characterize NERFs, or Network Efficiently
Representable Functions—functions that can be approximated with a small
number of units.
We can think of the problem of finding a good network structure as a search
problem.
One approach that has been used is to use a genetic algorithm to search the space
of network structures.
However, this is a very large space, and evaluating a state in the space means
running the whole neural network training protocol, so this approach is very
CPU-intensive.
Therefore, it is more common to see hill-climbing searches that selectively
modify an existing network structure.
29
Optimal Network Structure
There are two ways to do this: start with a big network and make it smaller, or
start with a small one and make it bigger.
A mechanism called optimal brain damage can be used to remove weights from
the initial fully-connected model.
After the network is initially trained, an information theoretic approach identifies
an optimal selection of connections that can be dropped (i.e., the weights are set
to zero).
The network is then retrained, and if it is performing as well or better, the
process is repeated.
This process was able to eliminate 3/4 of the weights, and improve overall
performance on test data.
In addition to removing connections, it is also possible to remove units that are
not contributing much to the result.
30
Optimal Network Structure
Several algorithms have been proposed for growing a larger network from a
smaller one.
The tiling algorithm (Mezard and Nadal, 1989) is interesting because it is similar
to the decision tree learning algorithm.
The idea is to start with a single unit that does its best to produce the correct
output on as many of the training examples as possible.
Subsequent units are added to take care of the examples that the first unit got
wrong.
The algorithm adds only as many units as are needed to cover all the examples.
Cross-validation techniques are useful for deciding when we have found a
network of the right size.
31
Perceptrons
Layered feed-forward networks were first studied in the late 1950s under the
name perceptrons.
Although networks of all sizes and topologies were considered, the only
effective learning element at the time was for single-layered networks, so that is
where most of the effort was spent.
Today, the name perceptron is used as a synonym for a single-layer, feed-
forward network.
The left-hand side of Figure shows such a
perceptron network.
Notice that each output unit is independent of the
others — each weight only affects one of the
outputs.
That means that we can limit our study to
perceptrons with a single output unit, as in the
right-hand side of Figure, and use several of them
to build up a multi-output perceptron.
32
What can Perceptrons Represent ?
We saw that units can represent the simple Boolean functions AND, OR, and
NOT, and that therefore a feed-forward network of units can represent any
Boolean function, if we allow for enough layers and units.
But what Boolean functions can be represented with a single-layer perceptron?
Some complex Boolean functions can be represented.
For example, the majority function, which outputs a 1 only if more than half of
its n inputs are 1, can be represented by a perceptron with each Wj, = 1 and
threshold t= n/2.
Representing the same function with a decision tree, by contrast, would require
O(2^n) nodes.
33
What can Perceptrons Represent ?
Figure shows three different Boolean functions of two inputs, the AND, OR, and
XOR functions.
Each function is represented as a two-dimensional plot, based on the values of
the two inputs.
Black dots indicate a point in the input space where the value of the function is
1, and white dots indicate a point where the value is 0.
It turns out that a perceptron can represent a function only if there is some line
that separates all the white dots from the black dots.
Such functions are called linearly separable.
Thus, a perceptron can represent AND and OR, but not XOR.
34
What can Perceptrons Represent ?
The fact that a perceptron can only represent linearly separable functions follows
directly from Equation
which defines the function computed by a perceptron.
A perceptron outputs a 1 only if W • I > 0.
This means that the entire input space is divided in two along a boundary defined
by W • I = 0, that is, a plane in the input space with coefficients given by the
weights.
With n inputs, the input space is n-dimensional, and linear separability can be
rather hard to visualize if n is too large.
It is easiest to understand for the case where n = 2.
In Figure (a), one possible separating "plane" is the dotted line defined by the
equation
35
What can Perceptrons Represent ?
With three inputs, the separating plane can still be visualized. Figure shows an
example in three dimensions.
The function we are trying to represent is true if and only if a minority of its
three inputs are true.
The shaded separating plane is defined by the equation I1 +I2 + I3 = 1.5
This time the positive outputs lie below the plane, in the region
I1 + I2 + I3 < 1.5.
Figure (b) shows a unit to implement the function
36
Learning linearly separable functions
As with any performance element, the question of what perceptrons can
represent is prior to the question of what they can learn.
We have just seen that a function can be represented by a perceptron if and only
if it is linearly separable.
That is relatively bad news, because there are not many linearly separable
functions.
The (relatively) good news is that there is a perceptron algorithm that will learn
any linearly separable function, given enough training examples.
Most neural network learning algorithms, including the perceptron learning
method, follow the current-best-hypothesis (CBH) scheme; in this case, the
hypothesis is a network, defined by the current values of the weights.
The initial network has randomly assigned weights, usually from the range
[−0.5, 0.5].
The network is then updated to try to make it consistent with the examples. This
is done by making small adjustments in the weights to reduce the difference
between the observed and predicted values.
37
Learning linearly separable functions
The main difference from the logical algorithms is the need to repeat the update
phase several times for each example in order to achieve convergence.
Typically, the updating process is divided into epochs.
Each epoch involves updating all the weights for all the examples.
The general scheme is shown as NEURAL-NETWORK-LEARNING
For perceptrons, the weight update rule is particularly simple.
If the predicted output for the single output unit is O, and the correct output
should be T, then the error is given by Err = T-O
If the error is positive, then we need to increase O; if it is negative, we need to
decrease O.
Now each input unit contributes Wj Ij to the total input, so if Ij is positive, an
increase in Wj will tend to increase O, and if Ij is negative, an increase in Wj will
tend to decrease O.
Thus, we can achieve the effect we want with the following rule:
Wj ← Wj + α × Ij × Err
where α is the learning rate.
39
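A minimal perceptron-learning sketch in Python, trained on the (linearly separable) AND function; the learning rate, zero initial weights and threshold convention are illustrative assumptions:

```python
# Training data for AND: each input vector carries a constant bias input of 1
examples = [((1, 0, 0), 0), ((1, 0, 1), 0), ((1, 1, 0), 0), ((1, 1, 1), 1)]
w = [0.0, 0.0, 0.0]   # weights, including the bias weight
alpha = 0.1           # learning rate

def predict(x):
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) > 0 else 0

for epoch in range(20):                    # repeat updates over several epochs
    for x, target in examples:
        err = target - predict(x)          # Err = T - O
        w = [wi + alpha * xi * err for wi, xi in zip(w, x)]

print([predict(x) for x, _ in examples])  # [0, 0, 0, 1]
```

Because AND is linearly separable, the convergence theorem guarantees the loop settles on a correct weight vector.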
Learning linearly separable functions
This rule is a slight variant of the perceptron learning rule proposed by Frank
Rosenblatt in 1960.
Rosenblatt proved that a learning system using the perceptron learning rule will
converge to a set of weights that correctly represents the examples, as long as the
examples represent a linearly separable function.
The perceptron convergence theorem created a good deal of excitement when it
was announced.
People were amazed that such a simple procedure could correctly learn any
representable function, and there were great hopes that intelligent machines
could be built from perceptrons.
It was not until 1969 that Minsky and Papert undertook what should have been
the first step: analyzing the class of representable functions.
Their book Perceptrons (Minsky and Papert, 1969) clearly demonstrated the
limits of linearly separable functions.
40
Multi-Layer Feed Forward
Networks
Multi-Layer Feed Forward Networks
Rosenblatt and others described multilayer feed-forward networks in the late
1950s, but concentrated their research on single-layer perceptrons.
This was mainly because of the difficulty of finding a sensible way to update the
weights between the inputs and the hidden units; whereas an error signal can be
calculated for the output units, it is harder to see what the error signal should be
for the hidden units.
When the book Perceptrons was published, Minsky and Papert (1969) stated that
it was an "important research problem" to investigate multilayer networks more
thoroughly.
In a sense, they were right. Learning algorithms for multilayer networks are
neither efficient nor guaranteed to converge to a global optimum.
On the other hand, the results of computational learning theory tell us that
learning general functions from examples is an intractable problem in the worst
case, regardless of the method, so we should not be too dismayed.
The most popular method for learning in multilayer networks is called back-
propagation.
2
Multi-Layer Feed Forward Networks
It was first invented in 1969 by Bryson and Ho, but was more or less ignored
until the mid-1980s.
The reasons for this may be sociological, but may also have to do with the
computational requirements of the algorithm on nontrivial problems.
3
N-layer FeedForward Network
4
Back-Propagation Learning
Learning in a network proceeds the same way as for perceptrons: example inputs
are presented to the network, and if the network computes an output vector that
matches the target, nothing is done.
If there is an error (a difference between the output and target), then the weights
are adjusted to reduce this error.
The trick is to assess the blame for an error and divide it among the contributing
weights.
In perceptrons, this is easy, because there is only one weight between each input
and the output.
But in multilayer networks, there are many weights connecting each input to an
output, and each of these weights contributes to more than one output.
The back-propagation algorithm is a sensible approach to dividing the
contribution of each weight.
As in the perceptron learning algorithm, we try to minimize the error between
each target output and the output actually computed by the network.
At the output layer, the weight update rule is very similar to the rule for the
perceptron.
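The blame-division scheme described above can be sketched for one hidden layer of sigmoid units minimizing squared error. The 2-2-1 network shape, the initial weights, and the learning rate are illustrative assumptions, not values from the text.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def forward(w_hidden, w_out, inputs):
    hidden = [sigmoid(sum(w * i for w, i in zip(ws, inputs))) for ws in w_hidden]
    out = sigmoid(sum(w * h for w, h in zip(w_out, hidden)))
    return hidden, out

def backprop_step(w_hidden, w_out, inputs, target, alpha=0.5):
    """One weight update on one example; returns the pre-update error."""
    hidden, out = forward(w_hidden, w_out, inputs)
    # Output layer: same form as the perceptron rule, scaled by the
    # sigmoid's derivative g'(in) = out * (1 - out).
    delta_out = (target - out) * out * (1 - out)
    for j, h in enumerate(hidden):
        # Hidden layer: each hidden unit receives a share of the blame
        # proportional to the weight connecting it to the output unit.
        delta_h = h * (1 - h) * w_out[j] * delta_out
        for k in range(len(inputs)):
            w_hidden[j][k] += alpha * inputs[k] * delta_h
        w_out[j] += alpha * h * delta_out
    return (target - out) ** 2

w_hidden = [[0.1, -0.2], [0.3, 0.4]]
w_out = [0.2, -0.1]
inputs, target = [1.0, 0.5], 1.0

err_before = backprop_step(w_hidden, w_out, inputs, target)
for _ in range(20):
    err_after = backprop_step(w_hidden, w_out, inputs, target)
print(err_before > err_after)  # repeated updates reduce the error
```

Note how the hidden-unit delta reuses the output weight w_out[j]: that weight is exactly the measure of how much blame for the output error belongs to hidden unit j.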
5
Back-Propagation Learning
In the figure, we show two curves.
The first is a training curve, which shows the
mean squared error on a given training set of 100
examples during the weight-updating process.
This demonstrates that the network does indeed
converge to a perfect fit to the training data.
The second curve is the standard learning curve
for the restaurant data, with one minor exception:
the y-axis is no longer the proportion of correct
answers on the test set, because sigmoid units do
not give 0/1 outputs.
Instead, we use the mean squared error on the test
set, which happens to coincide with the proportion
of correct answers in the 0/1 case.
The curve clearly shows that the network is
capable of learning in the restaurant domain;
indeed, the curve is very similar to that for
decision-tree learning, albeit somewhat shallower.
10
Back-propagation as gradient descent search
The gradient is taken on the error surface: the surface that
describes the error on each example as a function of
all the weights in the network.
An example error surface is shown in the figure. The
current set of weights defines a point on this
surface.
At that point, we look at the slope of the surface
along the axis formed by each weight.
This is known as the partial derivative of the surface
with respect to each weight: how much the error
would change if we made a small change in that weight.
We then alter the weights by an amount proportional
to the slope in each direction.
This moves the network as a whole in the direction
of steepest descent on the error surface.
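The slope-following procedure can be illustrated on a toy one-weight error surface. The surface E(w) = (w - 2)^2 and the step size are invented for illustration; its minimum sits at w = 2.

```python
# Gradient descent on a one-weight "error surface".

def error(w):
    return (w - 2.0) ** 2

def numeric_slope(f, w, eps=1e-6):
    # The partial derivative estimated by a small change in the weight,
    # exactly as described above: how much the error would change if we
    # made a small change in the weight.
    return (f(w + eps) - f(w - eps)) / (2 * eps)

w = 5.0            # starting point on the surface
alpha = 0.1        # each step is proportional to the slope
for _ in range(100):
    w -= alpha * numeric_slope(error, w)

print(round(w, 3))  # close to the minimum at 2.0
```

With many weights, the same update is applied along every weight's axis at once, which is what moves the network in the direction of steepest descent.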
11
Discussion
Computational efficiency: Computational efficiency depends on the amount of
computation time required to train the network to fit a given set of examples.
If there are m examples, and |W| weights, each epoch takes O(m|W|) time.
However, work in computational learning theory has shown that the worst-case
number of epochs can be exponential in n, the number of inputs.
In practice, time to convergence is highly variable, and a vast array of techniques
have been developed to try to speed up the process using an assortment of
tunable parameters.
Local minima in the error surface are also a problem.
Networks quite often converge to give a constant "yes" or "no" output,
whichever is most common in the training set.
At the cost of some additional computation, the simulated annealing method can
be used to assure convergence to a global optimum.
16
Discussion
Generalization: neural networks can do a good job of generalization.
One can say, somewhat circularly, that they will generalize well on functions for
which they are well-suited.
These seem to be functions in which the interactions between inputs are not too
intricate, and for which the output varies smoothly with the input.
There is no theorem to be proved here, but it does seem that neural networks
have had reasonable success in a number of real-world problems.
17
Discussion
Sensitivity to noise: Because neural networks are essentially doing nonlinear
regression, they are very tolerant of noise in the input data.
They simply find the best fit given the constraints of the network topology.
On the other hand, it is often useful to have some idea of the degree of certainty
of the output values.
Neural networks do not provide probability distributions on the output values.
For this purpose, belief networks seem more appropriate.
Transparency: Neural networks are essentially black boxes.
Even if the network does a good job of predicting new cases, many users will
still be dissatisfied because they will have no idea why a given output value is
reasonable.
If the output value represents, for example, a decision to perform open heart
surgery, then an explanation is clearly in order.
With decision trees and other logical representations, the output can be explained
as a logical derivation and by appeal to a specific set of cases that supports the
decision.
This is not currently possible with neural networks.
18
Discussion
Prior knowledge: learning systems can often benefit from prior knowledge that is
available to the user or expert.
Prior knowledge can mean the difference between learning from a few well-
chosen examples and failing to learn anything at all.
Unfortunately, because of the lack of transparency, it is quite hard to use one's
knowledge to "prime" a network to learn better.
Some tailoring of the network topology can be done—for example, when
training on visual images it is common to connect only small sets of nearby
pixels to any given unit in the first hidden layer.
On the other hand, such "rules of thumb" do not constitute a mechanism by
which previously accumulated knowledge can be used to learn from subsequent
experience.
It is possible that learning methods for belief networks can overcome this
problem
19
Discussion
All these considerations suggest that simple feed-forward networks, although
very promising as construction tools for learning complex input/output
mappings, do not fulfil our needs for a comprehensive theory of learning in their
present form.
Researchers in AI, psychology, theoretical computer science, statistics, physics,
and biology are working hard to overcome the difficulties.
20
Applications of Neural Networks
Here are a few examples of the many significant applications of neural networks.
In each case, the network design was the result of several months of
trial-and-error experimentation by researchers.
From these examples, it can be seen that neural networks have wide
applicability, but that they cannot magically solve problems without any thought
on the part of the network designer.
John Denker's remark that "neural networks are the second best way of doing
just about anything" may be an exaggeration, but it is true that neural networks
provide passable performance on many tasks that would be difficult to solve
explicitly with other programming techniques.
We encourage the reader to experiment with neural network algorithms to get a
feel for what happens when data arrive at an unprepared network.
21
Applications of Neural Networks
Pronunciation
Pronunciation of written English text by a computer is a
fascinating problem in linguistics, as well as a task with high commercial payoff.
It is typically carried out by first mapping the text stream to phonemes—basic sound
elements—and then passing the phonemes to an electronic speech generator.
The problem we are concerned with here is learning the mapping from text to
phonemes.
This is a good task for neural networks because most of the "rules" are only
approximately correct.
For example, although the letter "k" usually corresponds to the sound [k], the
letter "c" is pronounced [k] in cat and [s] in cent.
The NETtalk program (Sejnowski and Rosenberg, 1987) is a neural network that
learns to pronounce written text.
The input is a sequence of characters presented in a window that slides through
the text.
At any time, the input includes the character to be pronounced along with the
preceding and following three characters.
Each character is actually 29 input units—one for each of the 26 letters, and one
each for blanks, periods, and other punctuation.
22
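The input encoding just described can be sketched as follows. The exact 29-symbol set and the blank padding at text boundaries are assumptions on our part; the window shape (the target character plus three on each side, 29 units per character) is as described.

```python
import string

# 26 letters plus three punctuation symbols, for 29 units per character.
# The particular punctuation chosen here is an assumption.
ALPHABET = list(string.ascii_lowercase) + [" ", ".", ","]
assert len(ALPHABET) == 29

def encode_window(text, center):
    """Return the 7 * 29 = 203 input activations for position `center`."""
    units = []
    for pos in range(center - 3, center + 4):
        ch = text[pos] if 0 <= pos < len(text) else " "  # pad with blanks
        units.extend(1.0 if ch == sym else 0.0 for sym in ALPHABET)
    return units

x = encode_window("the cat", center=4)  # the character 'c'
print(len(x), sum(x))  # 203 inputs, exactly 7 of them active
```

One-hot coding like this is why the window has 203 inputs in total even though it covers only seven characters.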
Applications of Neural Networks
There were 80 hidden units in the version for which results are reported.
The output layer consists of features of the sound to be produced: whether it is
high or low, voiced or unvoiced, and so on.
Sometimes, it takes two or more letters to produce a single sound; in this case,
the correct output for the second letter is nothing.
Training consisted of a 1024-word text that had been hand-transcribed into the
proper phonemic features.
NETtalk learns to perform at 95% accuracy on the training set after 50 passes
through the training data.
One might think that NETtalk should perform at 100% on the text it has trained
on.
But any program that learns individual words rather than the entire text as a
whole will inevitably score less than 100%.
The difficulty arises with words like lead, which in some cases should be
pronounced to rhyme with bead and sometimes like bed.
A program that looks at only a limited window will occasionally get such words
wrong.
23
Applications of Neural Networks
So much for the ability of the network to reproduce the training data.
What about the generalization performance? This is somewhat disappointing.
On the test data, NETtalk's accuracy goes down to 78%, a level that is
intelligible, but much worse than commercially available programs.
Of course, the commercial systems required years of development, whereas
NETtalk only required a few dozen hours of training time plus a few months of
experimentation with various network designs.
However, there are other techniques that require even less development and
perform just as well.
For example, if we use the input to determine the probability of producing a
particular phoneme given the current and previous character and then use a
Markov model to find the sequence of phonemes with maximal probability, we
do just as well as NETtalk.
NETtalk was perhaps the "flagship" demonstration that converted many
scientists, particularly in cognitive psychology, to the cause of neural network
research.
24
Applications of Neural Networks
A post hoc analysis suggests that this was not because it was a particularly
successful program, but rather because it provided a good showpiece for the
philosophy of neural networks.
Its authors also had a flair for the dramatic: they recorded a tape of NETtalk
starting out with poor, babbling speech, and then gradually improving to the
point where the output is understandable.
Unlike conventional speech generators, which use a midrange tenor voice to
generate the phonemes, they used a high-pitched generator.
The tape gives the unmistakable impression of a child learning to speak.
25
Applications of Neural Networks
Handwritten character recognition
In one of the largest applications of neural
networks to date, Le Cun et al. (1989) have implemented a network designed to
read zip codes on hand-addressed envelopes.
The system uses a preprocessor that locates and segments the individual digits in
the zipcode; the network has to identify the digits themselves.
It uses a 16 x 16 array of pixels as input, three hidden layers, and a distributed
output encoding with 10 output units for digits 0-9.
The hidden layers contained 768, 192, and 30 units, respectively.
A fully connected network of this size would contain 200,000 weights, and
would be impossible to train.
Instead, the network was designed with connections intended to
act as feature detectors.
For example, each unit in the first hidden layer was connected by 25 links to a
5 x 5 region in the input.
Furthermore, the hidden layer was divided into 12 groups of 64 units; within
each group of 64 units, each unit used the same set of 25 weights.
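The weight-sharing scheme can be sketched as a single 5 x 5 feature detector slid over a 16 x 16 input. The sizes follow the text; the stride of 1 and the random values are illustrative assumptions.

```python
import random

random.seed(0)
image = [[random.random() for _ in range(16)] for _ in range(16)]
kernel = [[random.random() for _ in range(5)] for _ in range(5)]  # 25 shared weights

def detect(image, kernel):
    """Apply one shared 5 x 5 weight patch at every position of the input."""
    k = len(kernel)
    n = len(image) - k + 1        # 16 - 5 + 1 = 12 positions per axis
    out = [[0.0] * n for _ in range(n)]
    for r in range(n):
        for c in range(n):
            # The SAME 25 weights are reused at every position, so this
            # one group responds to its feature wherever it occurs.
            out[r][c] = sum(kernel[i][j] * image[r + i][c + j]
                            for i in range(k) for j in range(k))
    return out

feature_map = detect(image, kernel)
print(len(feature_map), len(feature_map[0]))  # a 12 x 12 map of responses
```

Sharing one set of 25 weights across a whole group is what keeps the total weight count near 9760 rather than the 200,000 of a fully connected network.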
26
Applications of Neural Networks
Hence the hidden layer can detect up to 12 distinct features, each of which can
occur anywhere in the input image.
Overall, the complete network used only 9760 weights.
The network was trained on 7300 examples, and tested on 2000.
One interesting property of a network with distributed output encoding is that it
can display confusion over the correct answer by setting two or more output
units to a high value.
After rejecting about 12% of the test set as marginal, using a confusion
threshold, the performance on the remaining cases reached 99%, which was
deemed adequate for an automated mail-sorting system.
The final network has been implemented in custom VLSI, enabling letters to be
sorted at high speed.
27
Applications of Neural Networks
Driving
ALVINN (Autonomous Land Vehicle In a Neural Network)
(Pomerleau, 1993) is a neural network that has performed quite well in a domain
where some other approaches have failed.
It learns to steer a vehicle along a single lane on a highway by observing the
performance of a human driver.
We described the system briefly on page 26, but here we take a look under the
hood.
ALVINN is used to control the NavLab vehicles at Carnegie Mellon University.
NavLab 1 is a Chevy van, and NavLab 2 is a U.S. Army HMMWV personnel
carrier.
Both vehicles are specially outfitted with computer-controlled steering,
acceleration, and braking.
Sensors include color stereo video, scanning laser range finders, radar, and
inertial navigation.
28
ALVINN
ALVINN - Autonomous Land Vehicle In a Neural Network
29
Applications of Neural Networks
Researchers ride along in the vehicle and monitor the progress of the computer
and the vehicle itself.
(Being inside the vehicle is a big incentive to make sure the program does not
"crash.")
The signal from the vehicle's video camera is preprocessed to yield an array of
pixel values that are connected to a 30 x 32 grid of input units in a neural
network.
The output is a layer of 30 units, each corresponding to a steering direction.
The output unit with the highest activation is the direction that the vehicle will
steer.
The network also has a layer of five hidden units that are fully connected to the
input and output layers.
ALVINN's job is to compute a function that maps from a single video image of
the road in front of it to a steering direction.
To learn this function, we need some training data—some image/direction pairs
with the correct direction.
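The network shape just described can be sketched as a forward pass. Random weights stand in for the trained ones here, so the chosen direction is meaningless, but the mapping from image to steering index is the one described: 30 x 32 inputs, five fully connected hidden units, 30 output units, and a winner-take-all choice.

```python
import math
import random

random.seed(1)
N_IN, N_HID, N_OUT = 30 * 32, 5, 30   # image pixels, hidden units, directions

w_hid = [[random.uniform(-0.1, 0.1) for _ in range(N_IN)] for _ in range(N_HID)]
w_out = [[random.uniform(-0.1, 0.1) for _ in range(N_HID)] for _ in range(N_OUT)]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def steer(image):
    """Map one flattened video image to a steering-direction index."""
    hidden = [sigmoid(sum(w * p for w, p in zip(ws, image))) for ws in w_hid]
    outputs = [sigmoid(sum(w * h for w, h in zip(ws, hidden))) for ws in w_out]
    # The output unit with the highest activation is the direction chosen.
    return max(range(N_OUT), key=lambda i: outputs[i])

image = [random.random() for _ in range(N_IN)]
direction = steer(image)
print(0 <= direction < N_OUT)  # a valid index among the 30 directions
```

A forward pass this small is cheap, which is why the trained network can produce a fresh steering decision ten times a second.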
30
Applications of Neural Networks
Fortunately, it is easy to collect this data just by having a human drive the
vehicle and recording the image/direction pairs.
After collecting about five minutes of training data (and applying the back-
propagation algorithm for about ten minutes), ALVINN is ready to drive on its
own.
One fine point is worth mentioning. There is a potential problem with the
methodology of training based on a human driver: the human is too good.
If the human never strays from the proper course, then there will be no training
examples that show how to recover when the vehicle is off course.
ALVINN corrects this problem by rotating each video image to create additional
views of what the road would look like from a position a little to the right or left.
The results of the training are impressive.
ALVINN has driven at speeds up to 70 mph for distances up to 90 miles on
public highways near Pittsburgh.
It has also driven at normal speeds on single-lane dirt roads, paved bike paths,
and two-lane suburban streets.
31
Applications of Neural Networks
ALVINN is unable to drive on a road type for which it has not been trained, and
is also not very robust with respect to changes in lighting conditions or the
presence of other vehicles.
A more general capability is exhibited by the MANIAC system (Jochem et al.,
1993).
MANIAC is a neural network that has as subnets two or more ALVINN models
that have each been trained for a particular type of road.
MANIAC takes the output from each subnet and combines them in a second
hidden layer.
With suitable training, MANIAC can perform well on any of the road types for
which the component subnets have been trained.
Some previous autonomous vehicles employed traditional vision algorithms that
used various image-processing techniques on the entire scene in order to find the
road and then follow it.
Such systems achieved top speeds of 3 or 4 mph.
32
Applications of Neural Networks
Why has ALVINN proven to be successful? There are two reasons.
First and foremost, a neural network of this size makes an efficient performance
element.
Once it has been trained, ALVINN is able to compute a new steering direction
from a video image 10 times a second.
This is important because it allows for some slack in the system.
Individual steering directions can be off by 10% from the ideal as long as the
system is able to make a correction in a few tenths of a second.
Second, the use of a learning algorithm is more appropriate for this domain than
knowledge engineering or straight programming.
There is no good existing theory of driving, but it is easy to collect sample
input/output pairs of the desired functional mapping.
This argues for a learning algorithm, but not necessarily for neural nets.
But driving is a continuous, noisy domain in which almost all of the input
features contribute some useful information; this means that neural nets are a
better choice than, say, decision trees.
33
Applications of Neural Networks
Of course, ALVINN and MANIAC are pure reflex agents, and cannot execute
maneuvers that are much more complex than lane-following, especially in the
presence of other traffic.
Current research by Pomerleau and other members of the group is aimed at
combining ALVINN's low-level expertise with higher-level symbolic
knowledge.
Hybrid systems of this kind are becoming more common as AI moves into the
real (physical) world.
34
Artificial Intelligence
Expert Systems
Contents
What is an Expert System?
Why should we use Expert Systems?
Early ES Systems
General Structure of an ES
Components of an Expert System.
Conventional System vs. Expert System
Human Expert vs. Expert System
Limitations of Expert Systems
What is an Expert System?
Human experts have
a considerable knowledge about their areas of expertise
Can learn from their experience
Can do reasoning
Can explain the solution
Can restructure knowledge
Can determine relevance
What is an Expert System?
An Expert System (ES) is software that attempts to
reproduce the performance of one or more human experts,
typically in a specific problem domain
a kind of software that simulates the problem-solving
behavior of a human expert in a given domain.
ES employs human knowledge represented in a computer
to solve problems that ordinarily require human expertise.
An ES imitates the expert's reasoning processes to solve
specific problems.
An expert system compared with a traditional computer:
Inference engine + Knowledge = Expert system
(Algorithm + Data structures = Program in a traditional computer)
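The "inference engine + knowledge" equation can be illustrated with a minimal forward-chaining engine. The rules and facts below are invented toy examples (loosely MYCIN-flavored), not from any real system; the engine is generic while all domain knowledge lives in the rule list.

```python
# Knowledge: if-then rules as (set of conditions, conclusion) pairs.
rules = [
    ({"fever", "stiff_neck"}, "suspect_meningitis"),
    ({"suspect_meningitis"}, "recommend_antibiotics"),
]

def infer(facts, rules):
    """Inference engine: repeatedly fire any rule whose conditions
    are all known facts, until no new conclusions appear."""
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        for conditions, conclusion in rules:
            if conditions <= facts and conclusion not in facts:
                facts.add(conclusion)
                changed = True
    return facts

result = infer({"fever", "stiff_neck"}, rules)
print("recommend_antibiotics" in result)  # the engine chains both rules
```

Swapping in a different rule list yields a different expert system with no change to the engine, which is exactly the separation the equation expresses.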
Why should we use Expert Systems?
Expert Systems:
Capture and preserve irreplaceable human expertise
Provide expertise needed at a number of locations at the
same time or in a hostile environment that is dangerous
to human health
Provide unemotional objective solutions faster than
human experts
Provide expertise that is expensive or rare
Share human expertise with a large number of people
Early ES Systems
The first expert system, called DENDRAL, was developed in the
late 1960s at Stanford University.
DENDRAL
the first knowledge intensive system
Expert system used for chemical analysis to predict molecular
structure
determining 3D structures of complex chemical compounds
MYCIN
Diagnose infectious diseases such as bacteremia and meningitis.
Recommend antibiotics.
Dosage adjusted for patient’s body weight.
Name derived from antibiotics (suffix – “mycin”).
ES Systems
SHINE: designed by NASA for monitoring, analyzing, and
diagnosing real-time and non-real-time systems
Stock Market Prediction
Apple's SIRI, a dialog system
insurance company Blue Cross's automated insurance claim
processing system
Some basic variants of expert systems
a rule-based expert system
fuzzy expert system
neural expert system
neuro-fuzzy expert system
General Structure of an ES
17