Artificial Intelligence
What is AI?
2
Acting humanly: The Turing Test approach
The Turing Test (Alan Turing, 1950) provides a satisfactory operational
definition of intelligence.
A computer passes the test if a human interrogator, after posing some written
questions, cannot tell whether the written responses come from a person or from
a computer
The computer would need to possess the following capabilities:
natural language processing to enable it to communicate successfully in English;
knowledge representation to store what it knows or hears;
automated reasoning to use the stored information to answer questions and to draw new
conclusions;
machine learning to adapt to new circumstances and to detect and extrapolate patterns.
The test deliberately avoids direct physical interaction between the interrogator
and the computer; the total Turing Test, however, includes a video signal
To pass the total Turing Test, the computer will need
computer vision to perceive objects, and
robotics to manipulate objects and move about.
3
Thinking humanly: The cognitive modeling
approach
introspection—trying to catch our own thoughts as they go by;
psychological experiments—observing a person in action; and
brain imaging—observing the brain in action
cognitive science brings together computer models from AI and experimental
techniques from psychology to construct precise and testable theories of the
human mind.
4
Thinking rationally: The “laws of thought”
approach
Aristotle was one of the first to attempt to codify “right thinking,” that is,
irrefutable reasoning processes.
His syllogisms provided patterns for argument structures that always yielded
correct conclusions when given correct premises—for example,
“Socrates is a man; all men are mortal; therefore, Socrates is mortal.”
These laws of thought were supposed to govern the operation of the mind; their
study initiated the field called logic.
There are two main obstacles to this approach.
First, it is not easy to take informal knowledge and state it in the formal terms
required by logical notation, particularly when the knowledge is less than 100%
certain.
Second, there is a big difference between solving a problem “in principle” and
solving it in practice.
5
Acting rationally: The rational agent
approach
An agent is just something that acts
all computer programs do something, but computer agents are expected to do
more: operate autonomously, perceive their environment, persist over a
prolonged time period, adapt to change, and create and pursue goals.
A rational agent is one that acts so as to achieve the best outcome or, when there
is uncertainty, the best expected outcome
Making correct inferences is sometimes part of being a rational agent
There are two advantages over the other approaches.
First, it is more general than the “laws of thought” approach, because correct
inference is just one of several possible mechanisms for achieving rationality.
Second, it is more amenable to scientific development than approaches based
on human behavior or human thought.
6
FOUNDATIONS OF AI
Philosophy (rules of reasoning)
Can formal rules be used to draw valid conclusions?
How does the mind arise from a physical brain?
Where does knowledge come from?
How does knowledge lead to action?
Aristotle (384–322 B.C.) - first to formulate a precise set of laws governing
the rational part of the mind.
He developed an informal system of syllogisms for proper reasoning, to generate
conclusions mechanically, given initial premises.
Ramon Lull (d. 1315) - suggested that useful reasoning could actually be carried
out by a mechanical artifact.
Thomas Hobbes (1588–1679) - reasoning was like numerical computation, that
“we add and subtract in our silent thoughts.”
Around 1500, Leonardo da Vinci (1452–1519) designed but did not build a
mechanical calculator;
Wilhelm Leibniz (1646–1716) built a mechanical device intended to carry out
operations on concepts rather than numbers.
7
FOUNDATIONS OF AI
Mathematics (logic, algorithms, optimization)
What are the formal rules to draw valid conclusions?
What can be computed?
How do we reason with uncertain information?
Mathematics formalizes the three main areas of AI: computation, logic, and
probability.
Computation leads to analysis of the problems that can be computed -
complexity theory
Probability contributes the “degree of belief” to handle uncertainty in AI
Decision theory combines probability theory and utility theory
(“preferred outcomes” / bias)
8
FOUNDATIONS OF AI
Economics
How should we make decisions so as to maximize payoff?
How should we do this when others may not go along?
How should we do this when the payoff may be far in the future?
It is tempting to think of economics as being about money, but economists will
say that they are really studying how people make choices that lead to
preferred outcomes (utility).
Decision theory, which combines utility theory and probability theory, provides
a formal and complete framework for decisions (economic or otherwise) made
under uncertainty
Control theory and cybernetics
How can artifacts operate under their own control?
the science of communication and automatic control systems
The artifacts adjust their actions
to perform better in the environment over time
Based on an objective function and feedback from the environment
9
FOUNDATIONS OF AI
Neuroscience (model low level human/animal brain activity)
How do brains process information?
Study of the nervous system, esp. brain
A collection of simple cells can lead to thought and action
Cognitive Science and Psychology (modeling high level human/animal thinking)
How do humans and animals think and act?
The study of human reasoning and acting
Provides reasoning models for AI
Strengthen the ideas
humans and other animals can be considered as information processing machines
Despite advances, we are still a long way from understanding how cognitive
processes actually work.
Linguistics
How does language relate to thought?
computational linguistics or natural language processing
10
FOUNDATIONS OF AI
Computer Engineering
How can we build an efficient computer?
For artificial intelligence to succeed, we need two things: intelligence and an
artifact.
The computer has been the artifact of choice.
The first operational computer was the electromechanical Heath Robinson built in
1940 by Alan Turing’s team for a single purpose: deciphering German messages.
In 1943, the same group developed the Colossus, a powerful general-purpose
machine based on vacuum tubes.
The first operational programmable computer was the Z-3, invented by
Konrad Zuse in Germany in 1941. (Zuse also invented floating-point numbers and
the first high-level programming language.)
The first electronic computer, the ABC, was assembled by John Atanasoff and
his student Clifford Berry between 1940 and 1942 at Iowa State University
Then came the ENIAC, developed as part of a secret military project at the
University of Pennsylvania by a team including John Mauchly and John Eckert,
which proved to be the most influential forerunner of modern computers
11
History of AI
The gestation (1943-1955):
♦ Main actors for the next 20 years were from MIT, CMU, Stanford and IBM
12
History of AI
Newell and Simon’s early success was followed up with the General
Problem Solver, or GPS.
Unlike Logic Theorist, this program was designed from the start to
imitate human problem-solving protocols.
”If the number of customers Tom gets is twice the square of 20 percent of
the number of advertisements he runs, and the number of advertisements he
runs is 45, what is the number of customers Tom gets?”
13
History of AI
14
History of AI
15
History of AI
Knowledge-based systems: The key to power? (1969-1979)
16
History of AI
AI adopts the scientific method (1987-present)
♦ build on existing theories rather than propose brand-new ones
♦ to base claims on rigorous theorems or hard experimental
evidence rather than on intuition
♦ and to show relevance to real-world applications rather than
toy examples
♦ speech recognition (HMMs), data mining, Bayesian networks
17
Potted history of AI
19
Artificial Intelligence
Intelligent systems
Intelligent systems
An intelligent system incorporates intelligence into applications
handled by machines.
It performs search and optimization along with learning capabilities.
Different types of machine learning, such as supervised,
unsupervised and reinforcement learning, can be used in
designing intelligent systems.
Expert systems, intelligent agents and knowledge-based systems
are examples of intelligent systems,
ranging from automated vacuums such as the Roomba to facial
recognition programs to Amazon's personalized shopping
suggestions
Characteristics
Self-explaining
the system can explain how it came to a certain decision
Robust
the system behaves well and adequately not only under ordinary
conditions, but also under unusual conditions
fault tolerant
continues to perform adequately even if one or more of its internal
system components fail or break.
Adaptive
react to changes, in particular to changes in the environment or the
context of the system
self-optimizing
organize their internal components and capabilities in new
structures without a central or an external authority in place
Characteristics
Deductive
based on a set of axioms and rules, they can deduce new insights by
applying the rules to the axioms as well as to the resulting new
facts.
using an underlying inference engine, deductive systems can
discover new facts that they can use for their decision process
Learning
observe the achieved results and compare them with the desired
outcome.
Cooperative
expose social capabilities; interact with other systems – and
potentially humans as well
Autonomous
Characteristics
Autonomous
performs the desired tasks and behaves well and adequately
even in unstructured environments, without continuous human
guidance.
Agile
able to manage and apply knowledge effectively so that they
behave well and adequately in continuously changing and
unpredictable environments.
Steps to build AI systems
Identify the problem
What are you trying to solve?
Which result is desired?
Preparation of the data (preprocessing)
structured and unstructured data
roughly 80% of the effort goes into cleaning, moving, reviewing, and organizing the data
Choice of algorithms (model)
Supervised learning
Unsupervised learning & reinforcing learning
Training the algorithms
Choosing the most suitable programming language
Platform selection
Test the Model
Deployment
The End…
7
Artificial Intelligence
Intelligent agents
contents
Agents And Environments
Agent Function and Program
Rational agent
Agent Program and types
State Representation
Course – Introduction (recall)
The main unifying theme is the idea of an intelligent agent.
define AI as the study of agents that receive percepts from the
environment and perform actions.
[Diagram: the agent receives percepts from the environment through sensors and
acts on it through actuators; the "?" box marks the agent program/model to be designed.]
3
Agents And Environments
An agent is anything that can be viewed as perceiving its
environment through sensors
acting upon that environment through actuators.
A robotic agent might have cameras and infrared range finders
for sensors and various motors for actuators.
A software agent receives keystrokes, file contents, and network
packets as sensory inputs and acts on the environment by
displaying on the screen, writing files, and sending network
packets.
19
8-puzzle
States: A state description specifies the location of each of the eight tiles
and the blank in one of the nine squares.
Initial state: Any state can be designated as the initial state. Note that any
given goal can be reached from exactly half of the possible initial states.
Actions: The simplest formulation defines the actions as movements of the
blank space Left, Right, Up, or Down. Different subsets of these are
possible depending on where the blank is.
Transition model: Given a state and action, this returns the resulting state;
for example, if we apply Left to the start state, the resulting state has the 5
and the blank switched.
Goal test: This checks whether the state matches the goal configuration
Path cost: Each step costs 1, so the path cost is the number of steps in the
path
Task Environment
Observable or partially observable?
Discrete or Continuous?
Deterministic or Stochastic?
Static or Dynamic?
Episodic or Sequential?
Multiple or Single Agent?
The End…
22
Artificial Intelligence
Actions
A description of the possible actions available to the agent
Given a particular state s, ACTIONS(s) returns the set of actions that
can be executed in s.
Transition model
A description of what each action does - the transition model
Together, the initial state, actions, and transition model implicitly
define the state space of the problem
state space forms a directed network or graph
Goal test
determines whether a given state is a goal state.
Path cost
function that assigns a numeric cost to each path.
Problem Definition
A solution to a problem is an action sequence that leads from
the initial state to a goal state.
Solution quality is measured by the path cost function, and
an optimal solution has the lowest path cost among all
solutions.
Example Problems - 8-puzzle
9
8-puzzle
States: the location of each of the eight tiles and the blank in one of the
nine squares
Initial state: Any state can be designated as the initial state.
Actions: The simplest formulation defines the actions as movements of the
blank space Left, Right, Up, or Down.
Transition model: Given a state and action, this returns the resulting state;
for example, if we apply Left to the start state, the resulting state has the 5 and the blank
switched.
Goal test: checks whether the state matches the goal configuration
Path cost: Each step costs 1, so the path cost is the number of steps in the
path
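The formal components above can be sketched in Python; the state encoding (a 9-tuple in row-major order with 0 for the blank) and the helper names are illustrative choices, not from the slides:

```python
# State: tuple of 9 entries in row-major order; 0 marks the blank.
GOAL = (0, 1, 2, 3, 4, 5, 6, 7, 8)

# Each action moves the blank by an index offset in the flat tuple.
MOVES = {'Left': -1, 'Right': +1, 'Up': -3, 'Down': +3}

def actions(state):
    """ACTIONS(s): the subset of blank moves allowed by the blank's position."""
    i = state.index(0)
    acts = []
    if i % 3 > 0: acts.append('Left')
    if i % 3 < 2: acts.append('Right')
    if i >= 3: acts.append('Up')
    if i <= 5: acts.append('Down')
    return acts

def result(state, action):
    """Transition model: swap the blank with the neighbouring tile."""
    i = state.index(0)
    j = i + MOVES[action]
    s = list(state)
    s[i], s[j] = s[j], s[i]
    return tuple(s)

def goal_test(state):
    """Goal test: does the state match the goal configuration?"""
    return state == GOAL

def path_cost(path):
    """Each step costs 1, so the path cost is the number of steps."""
    return len(path)
```

With the blank in the top-middle square, for example, only Left, Right, and Down are applicable.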
Example Problems - vacuum world
11
vacuum world
States: The state is determined by both the agent location and the dirt
locations.
Initial state: Any state can be designated as the initial state.
Actions: each state has just three actions: Left, Right, and Suck.
Transition model: The actions have their expected effects, except that
moving Left in the leftmost square, moving Right in the rightmost
square, and Sucking in a clean square have no effect.
Goal test: This checks whether all the squares are clean.
Path cost: Each step costs 1, so the path cost is the number of steps in
the path.
• 8-queens problem
• route-finding problem
Searching for Solutions
A solution is an action sequence, so search algorithms work
by considering various possible action sequences.
The possible action sequences starting at the initial state
form a search tree with the initial state at the root;
the branches are actions and
the nodes correspond to states in the state space of the problem.
Steps in growing Search tree
the root node of the tree corresponds to the initial state
test whether this is a goal state
expanding the current state - generating a new set of states
The set of all leaf nodes available for expansion at any given point is called the frontier
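The tree-growing steps above can be sketched as a minimal search loop; `successors` is a hypothetical helper returning the states reachable in one action, and the FIFO frontier here happens to give breadth-first order:

```python
from collections import deque

def tree_search(initial_state, goal_test, successors):
    """Generic tree search: keep a frontier of unexpanded nodes and
    expand them one at a time until a goal state is selected."""
    frontier = deque([(initial_state, [])])  # (state, path of states so far)
    while frontier:
        state, path = frontier.popleft()     # FIFO frontier -> breadth-first order
        if goal_test(state):
            return path + [state]
        for nxt in successors(state):        # expand: generate successor states
            frontier.append((nxt, path + [state]))
    return None  # frontier exhausted without reaching a goal
```

For instance, on the infinite binary tree where node n has children 2n and 2n+1 (capped for the demo), searching for 5 from 1 returns the path 1, 2, 5.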
17
Artificial Intelligence
Level 0
Level 1
Level 2
Level 3
5
Breadth-first search - Analysis
Complete - if the shallowest goal node is at some finite depth
d, breadth-first search will eventually find it
optimal - the shallowest goal node is not necessarily the optimal
one; BFS is optimal if all step costs are the same
Time Complexity - O(b^d)
every state has b successors
Space Complexity – O(b^d)
O(b^(d−1)) nodes in the explored set and O(b^d) nodes in the
frontier
the memory requirements are a bigger problem for breadth-
first search than is the execution time
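A graph-search version of breadth-first search can be sketched as follows, assuming `successors(s)` returns the neighbouring states; the early goal test on generated nodes and the `parents` map (which doubles as the explored set) match the analysis above:

```python
from collections import deque

def breadth_first_search(start, goal_test, successors):
    """Graph-search BFS: FIFO frontier plus an explored set, so each
    state is expanded at most once; returns the shallowest path to a goal."""
    if goal_test(start):
        return [start]
    frontier = deque([start])
    parents = {start: None}              # doubles as the explored/reached set
    while frontier:
        state = frontier.popleft()
        for nxt in successors(state):
            if nxt not in parents:
                parents[nxt] = state
                if goal_test(nxt):       # goal test on generation, not expansion
                    path = [nxt]
                    while parents[path[-1]] is not None:
                        path.append(parents[path[-1]])
                    return path[::-1]
                frontier.append(nxt)
    return None
```

On the graph where n has successors n+1 and 2n, the shallowest path from 1 to 10 is 1, 2, 4, 5, 10 (four steps).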
Depth-first search
expands the deepest node in the current frontier of the search
tree
uses a LIFO queue
most recently generated node is chosen for expansion
Depth-first search - Analysis
Complete - not complete
may loop forever, or fail, if an infinite non-goal path is encountered
optimal - not optimal
Time Complexity - O(b^m)
m itself can be much larger than d (the depth of the shallowest solution),
and is infinite if the tree is unbounded
Space Complexity – O(bm)
to store only a single path from the root to a leaf node
remaining unexpanded sibling nodes for each node on the path
Depth-first search
Uniform-cost search
When all step costs are equal, breadth-first search is optimal
algorithm that is optimal with any step-cost function
Instead of expanding the shallowest node, expands the node n
with the lowest path cost.
uses a Priority queue
does not care about the number of steps a path has, but only
about their total cost
the goal test is applied to a node when it is selected for expansion
Uniform-cost search - Analysis
Complete – yes, if every step cost is positive (non-zero)
can run forever if there is a path with an infinite sequence of zero-cost actions
optimal - Yes, expands nodes in order of their optimal path cost
Time Complexity – O(b^(1+⌊C*/ε⌋))
Space Complexity – O(b^(1+⌊C*/ε⌋))
Let C* be the cost of the optimal solution, and assume that every
action costs at least ε.
When all step costs are equal, b^(1+⌊C*/ε⌋) is just b^(d+1)
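A uniform-cost search sketch using a priority queue keyed on the path cost g(n); `successors(s)` yielding `(state, step_cost)` pairs is an assumed interface:

```python
import heapq

def uniform_cost_search(start, goal_test, successors):
    """UCS: always expand the frontier node with the lowest path cost g.
    The goal test is applied when a node is selected for expansion,
    so the first goal popped has optimal cost."""
    frontier = [(0, start, [start])]          # priority queue keyed on g
    best_g = {start: 0}
    while frontier:
        g, state, path = heapq.heappop(frontier)
        if goal_test(state):
            return g, path
        if g > best_g.get(state, float('inf')):
            continue                          # stale queue entry, skip
        for nxt, cost in successors(state):
            new_g = g + cost
            if new_g < best_g.get(nxt, float('inf')):
                best_g[nxt] = new_g
                heapq.heappush(frontier, (new_g, nxt, path + [nxt]))
    return None
```

On a graph where A→C costs 5 directly but A→B→C costs 1+1, UCS correctly returns the cheaper two-step path rather than the one-step path.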
Depth-limited search
The failure of depth-first search in infinite state spaces can be
alleviated by a predetermined depth limit l.
nodes at depth l are treated as if they have no successors.
The depth limit solves the infinite-path problem.
Incomplete if we choose l < d: the shallowest goal is
beyond the depth limit.
Non-optimal if we choose l > d.
Its time complexity is O(b^l) and space complexity is O(bl).
Depth-first search can be viewed as a special case of depth-
limited search with l=∞.
depth limits can be based on knowledge of the problem
Iterative deepening DFS
depth-first tree search that finds the best depth limit.
gradually increase the limit — first 0, then 1, then 2, and so
on—until a goal is found.
This will occur when the depth limit reaches d, the depth of the
shallowest goal node
IDS combines the benefits of depth-first and breadth-first
search
Like DFS, its memory requirements are modest: O(bd)
Like BFS,
complete when the branching factor is finite and
optimal when the path cost is a non-decreasing function of the depth
of the node.
number of nodes generated in the worst case:
N(IDS) = (d)b + (d−1)b^2 + ⋯ + (1)b^d, which is O(b^d)
Iterative deepening DFS
complete if the branching factor is finite and every step cost ≥ ε for some positive ε
optimal if step costs are all identical
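Depth-limited search and the iterative-deepening loop above can be sketched as follows (the recursive formulation and the `max_depth` safety cap are illustrative choices):

```python
def depth_limited(state, goal_test, successors, limit):
    """Recursive DLS: nodes at depth == limit are treated as having
    no successors, which cuts off infinite paths."""
    if goal_test(state):
        return [state]
    if limit == 0:
        return None
    for nxt in successors(state):
        sub = depth_limited(nxt, goal_test, successors, limit - 1)
        if sub is not None:
            return [state] + sub
    return None

def iterative_deepening(start, goal_test, successors, max_depth=50):
    """IDS: run depth-limited search with limit 0, 1, 2, ... until a goal
    is found; memory stays O(bd) like DFS, completeness like BFS."""
    for limit in range(max_depth + 1):
        result = depth_limited(start, goal_test, successors, limit)
        if result is not None:
            return result
    return None
```

Note that IDS happily handles the unbounded binary tree (node n has children 2n and 2n+1), where plain DFS would dive forever down the leftmost branch.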
The End…
16
Artificial Intelligence
5
GBFS
6
Greedy Best-First- Analysis
Complete – No
From Iasi to Fagaras
No – can get stuck in loops, e.g., Iasi -> Neamt -> Iasi ->Neamt …
Optimal - No
Time Complexity - O(b^m)
Space Complexity – O(b^m); keeps all nodes in memory
A∗ search
most widely known form of best-first search
evaluates nodes by combining g(n), the cost to reach the
node, and h(n), the cost to get from the node to the goal
9
A* search - Demo
10
Conditions for optimality
Admissibility
an admissible heuristic never overestimates the cost to reach the
goal, i.e., it is optimistic.
A heuristic h(n) is admissible if for every node n, h(n) ≤ h*(n), where
h*(n) is the true cost to reach the goal state from n.
Example: hSLD(n), the straight-line distance, can never be an overestimate.
Consistency
A heuristic h(n) is consistent if, for every node n and every successor
n' of n generated by any action a, h(n) ≤ c(n, a, n') + h(n').
This is a form of the triangle inequality:
the triangle is formed by n, n', and
the goal Gn closest to n
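An A* sketch combining g(n) and h(n) as described above; the `successors` and `h` interfaces are assumptions for illustration, and with an admissible, consistent h the first goal selected for expansion is optimal:

```python
import heapq

def a_star(start, goal_test, successors, h):
    """A*: order the frontier by f(n) = g(n) + h(n), where g is the cost
    to reach n and h estimates the cost from n to the goal."""
    frontier = [(h(start), 0, start, [start])]   # (f, g, state, path)
    best_g = {start: 0}
    while frontier:
        f, g, state, path = heapq.heappop(frontier)
        if goal_test(state):                     # goal test on expansion
            return g, path
        for nxt, cost in successors(state):
            new_g = g + cost
            if new_g < best_g.get(nxt, float('inf')):
                best_g[nxt] = new_g
                heapq.heappush(frontier,
                               (new_g + h(nxt), new_g, nxt, path + [nxt]))
    return None
```

On a simple chain 0 → 1 → 2 → 3 → 4 with unit step costs, the heuristic h(s) = 4 − s is admissible and consistent, and A* returns the optimal cost-4 path.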
Workout
Optimality of A*
15
Artificial Intelligence
Knowledge Representation
Contents
Logical Agents
Knowledge-based Agents
Data Information Knowledge
Knowledge Representation
KR Characteristics
Logical Agents
Humans know things and reason about them; both abilities are important
for artificial agents
intelligence of humans is achieved by processes of reasoning
that operate on internal representations of knowledge
In AI, this approach to intelligence is embodied in
knowledge-based agents
Knowledge-based Agents
The central component is the knowledge base, or KB
A knowledge base is a set of sentences
A sentence,
expressed in a language called a knowledge representation language
logic languages
represents some assertion about the world
Operations,
TELL - add new sentences to the KB
ASK - a way to query what is known
Both operations may involve inference - deriving new
sentences from old
Knowledge-based agent
KB initially contain some background knowledge
Each time the agent program is called, it does three things.
First, it TELLs the knowledge base what it perceives.
Second, it ASKs the knowledge base what action it should perform;
extensive reasoning may be done about the current state of the world.
Third, the agent TELLs the knowledge base which action was chosen, and executes it.
Knowledge
The teacher can see that the student's results are showing a downward trend.
The teacher applies the rule of 40% for a pass to gain the knowledge that the
student has passed Tests 1 and 2 and failed Test 3.
Data Information Knowledge
[Chart: ice cream shop profit (in lakhs) by month for 2017, 2018 and 2019.]
The Wumpus World
Example percept sequences:
[None, None, None, None, None] [None, Breeze, None, None, None]
[Stench, None, None, None, None] [Stench, Breeze, Glitter, None, None]
The End…
18
Artificial Intelligence
∃ - There Exists:
∃x P(x) is read “there exists an x such that P(x) is true”.
E.g., there exists an engineering student who is not smart.
mixtures
∀x ∃y Loves(x, y) - Everybody loves somebody
Aditya dislikes
Aditya likes rain and snow
Modus Ponens
Criminal(Joe)
16
Prolog
17
Example 1
Example 2
The End…
20
Artificial Intelligence
Parenthesis > Negation > Conjunction(AND) > Disjunction(OR) > Implication > Biconditional
Examples of PL sentences
Example
It is Raining and it is Thursday:
R ∧ T, where
R represents “It is Raining”, T represents “it is Thursday”.
Example
It is not hot but it is sunny. It is neither hot nor sunny.
It is not hot, and it is sunny. It is not hot, and it is not sunny.
Let h = “it is hot” and s = “it is sunny.”
~h ∧ s; ~h ∧ ~s
Example
Suppose x is a particular real number. Let p, q, and r symbolize
“0 < x,” “x < 3,” and “x = 3.” respectively.
Then the following inequalities
x ≤ 3, 0 < x < 3, 0 < x ≤ 3
can be translated as
q ∨ r, p ∧ q, p ∧ (q ∨ r)
Tautology and Contradictory
A tautology is true under any interpretation.
The expression A ˅ ¬A is a tautology.
This means it is always true, regardless of the value of A.
An expression which is false under any interpretation is
contradictory (or unsatisfiable).
A ∧ ¬A
Some expressions are satisfiable, but not valid. This means
that they are true under some interpretation, but not under
all interpretations.
A ∧ B
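These three categories (tautology, contradiction, satisfiable but not valid) can be checked by brute-force enumeration of all interpretations; representing a formula as a Python boolean function here is a stand-in for a real parser:

```python
from itertools import product

def truth_table_status(expr, variables):
    """Classify a propositional formula by evaluating it under every
    assignment of True/False to its variables."""
    values = [expr(*assignment)
              for assignment in product([False, True], repeat=len(variables))]
    if all(values):
        return 'tautology'          # true under every interpretation
    if not any(values):
        return 'contradiction'      # false under every interpretation
    return 'satisfiable (contingent)'
```

For example, A ∨ ¬A comes out a tautology, A ∧ ¬A a contradiction, and A ∧ B merely satisfiable.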
A simple knowledge base
R1 : ¬P1,1
R2 : B1,1 ⇔ (P1,2 ∨ P2,1)
R3 : B2,1 ⇔ (P1,1 ∨ P2,2 ∨ P3,1)
R4 : ¬B1,1
R5 : B2,1
Entailment and derivation
Entailment: KB |= Q
Q is entailed by KB (a set of premises or assumptions) if and
only if there is no logically possible world in which Q is false
while all the premises in KB are true.
Or, stated positively, Q is entailed by KB if and only if the
conclusion is true in every logically possible world in which
all the premises in KB are true.
Derivation: KB |- Q
We can derive Q from KB if there is a proof consisting of a
sequence of valid inference steps starting from the premises in
KB and resulting in Q
10
Inference
Deduction: the process of deriving a conclusion from a set of
assumptions
Applying a sequence of rules is called a proof
Equivalent to searching for a solution
If we deduce a conclusion C from a set of assumptions, we
write:
{A1, A2, …, An} ├ C
If C can be concluded without any assumption
├C
The inference rule A ├ B is expressed as
A
B Given A, B is deduced (or concluded).
It is like if A is true, then B is true.
Types of Inference rules
Soundness of Rules
P     Q     P → Q
True  True  True
True  False False
False True  True
False False True
Proving things
A proof is a sequence of sentences, where each sentence is either a
premise or a sentence derived from earlier sentences in the proof by
one of the rules of inference.
The last sentence is the theorem (also called goal or query) that we
want to prove.
Example
15
Fuzzy Logic
Introduction
Fuzzy thinking, why fuzzy, logic
Fuzzy sets
Representation
Linguistic variable and hedges
Operations on Fuzzy sets
Complement
Containment
Intersection, etc.
Fuzzy rules
1
Introduction
Experts rely on common sense when they solve
problems.
How can we represent expert knowledge that uses
vague and ambiguous terms in a computer?
Fuzzy logic is not logic that is fuzzy, but logic that is
used to describe fuzziness.
Fuzzy logic is the theory of fuzzy sets, sets that
calibrate vagueness.
Fuzzy logic is based on the idea that all things admit
of degrees.
2
Fuzzy Logic
Boolean logic uses sharp distinctions.
Fuzzy logic reflects how people think. It
attempts to model our sense of words,
our decision making and our common
sense. As a result, it is leading to new,
more human, intelligent systems.
3
Fuzzy Logic History
Fuzzy, or multi-valued logic was introduced in the
1930s by Jan Lukasiewicz, a Polish philosopher. This
work led to an inexact reasoning technique often
called possibility theory.
Later, in 1937, Max Black published a paper called
“Vagueness: an exercise in logical analysis”. In this
paper, he argued that a continuum implies degrees.
In 1965 Lotfi Zadeh, published his famous paper
“Fuzzy sets”.
Zadeh extended possibility theory into a formal
system of mathematical logic.
4
Why fuzzy?
As Zadeh said, the term is concrete,
immediate and descriptive.
Why logic?
Fuzziness rests on fuzzy set theory, and
fuzzy logic is just a small part of that
theory.
5
Definition
Fuzzy logic is a set of mathematical principles for
knowledge representation based on degrees of
membership.
Unlike two-valued Boolean logic, fuzzy logic is
multi-valued.
It deals with degrees of membership and degrees of
truth.
Fuzzy logic uses the continuum of logical values
between 0 (completely false) and 1 (completely
true).
6
Range of logical values in
Boolean and fuzzy logic
7
Fuzzy sets
The concept of a set is fundamental to
mathematics.
However, our own language is also the
supreme expression of sets. For example, car
indicates the set of cars. When we say a car,
we mean one out of the set of cars.
8
Tall men example
9
Fuzzy Logic
Introduction
Fuzzy thinking, why fuzzy, logic
Fuzzy sets
Representation
Linguistic variable and hedges
Operations on Fuzzy sets
Complement
Containment
Intersection, etc.
Fuzzy rules
1
Tall men example
2
Crisp and fuzzy sets of “tall men”
3
A fuzzy set is a set with fuzzy boundaries
The x-axis represents the universe of discourse
The y-axis represents the membership value of the
fuzzy set.
In classical set theory, a crisp set A of X is defined by the characteristic function
fA(x): X → {0, 1}, where fA(x) = 1 if x ∈ A and fA(x) = 0 if x ∉ A.
In fuzzy theory, a fuzzy set A of universe X is defined by the membership function
μA(x): X → [0, 1], where μA(x) = 1 if x is totally in A;
μA(x) = 0 if x is not in A;
0 < μA(x) < 1 if x is partly in A.
4
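A sketch of crisp vs. fuzzy membership for "tall men"; the height breakpoints below are invented for illustration and are not taken from the slides:

```python
def crisp_tall(height_cm, cutoff=180):
    """Crisp set: characteristic function into {0, 1}.
    The 180 cm cutoff is an illustrative assumption."""
    return 1 if height_cm >= cutoff else 0

def fuzzy_tall(height_cm, low=160, high=190):
    """Fuzzy set: membership ramps linearly from 0 at `low` to 1 at `high`.
    The breakpoints are illustrative, not from the slides."""
    if height_cm <= low:
        return 0.0
    if height_cm >= high:
        return 1.0
    return (height_cm - low) / (high - low)
```

The crisp function jumps abruptly from 0 to 1 at the cutoff, while the fuzzy one assigns a 175 cm man a partial membership of 0.5, which is the whole point of the fuzzy boundary.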
fuzzy set representation
First, we determine the membership
functions. In our “tall men” example, we
can obtain fuzzy sets of tall, short and
average men.
The universe of discourse − the men’s
heights − consists of three sets: short,
average and tall men.
5
Crisp and fuzzy sets
6
Representation of crisp and
fuzzy subsets
7
Fuzzy Logic
➢ Introduction
➢ Fuzzy thinking, why fuzzy, logic
➢ Fuzzy sets
➢ Representation
➢ Linguistic variable and hedges
➢ Operations on Fuzzy sets
➢ Complement
➢ Containment
➢ Intersection, etc.
➢ Fuzzy rules
Linguistic variables and hedges
▪ At the root of fuzzy set theory lies the
idea of linguistic variables.
▪ A linguistic variable is a fuzzy variable.
For example, the statement “John is tall”
implies that the linguistic variable John
takes the linguistic value tall.
2
Example
▪ In fuzzy expert systems, linguistic variables are used
in fuzzy rules. For example:
IF wind is strong
THEN sailing is good
IF project_duration is long
THEN completion_risk is high
IF speed is slow
THEN stopping_distance is short
3
Hedge
▪ A linguistic variable carries with it the
concept of fuzzy set qualifiers, called
hedges.
▪ Hedges are terms that modify the shape
of fuzzy sets. They include adverbs such
as very, somewhat, quite, more or less
and slightly.
4
Crisp and fuzzy sets
5
Fuzzy sets with the hedge very
6
Representation of hedges
7
Representation of hedges (cont.)
8
Operations on Classical Sets
Union:
A ∪ B = {x | x ∈ A or x ∈ B}
Intersection:
A ∩ B = {x | x ∈ A and x ∈ B}
Complement:
A’ = {x | x ∉ A, x ∈ X}
X – Universal Set
Set Difference:
A | B = {x | x ∈ A and x ∉ B}
Set difference is also denoted by A - B
9
Operations on Classical Sets
10
Operations on Classical Sets
Complement of set A.
11
Properties of Classical Sets
A ∪ B = B ∪ A
A ∩ B = B ∩ A
A ∪ (B ∪ C) = (A ∪ B) ∪ C
A ∩ (B ∩ C) = (A ∩ B) ∩ C
A ∪ (B ∩ C) = (A ∪ B) ∩ (A ∪ C)
A ∩ (B ∪ C) = (A ∩ B) ∪ (A ∩ C)
A ∪ A = A
A ∩ A = A
A ∪ X = X
A ∩ X = A
A ∪ Ø = A
A ∩ Ø = Ø
12
Operations of fuzzy sets
▪ The classical set theory developed in
the late 19th century by Georg Cantor
describes how crisp sets can interact.
These interactions are called
operations.
13
Cantor’s sets
14
Complement
Crisp Sets: Who does not belong to the set?
Fuzzy Sets: How much do elements not belong
to the set?
▪ The complement of a set is an opposite of this
set.
μ¬A(x) = 1 − μA(x)
15
Fuzzy Set Operations
16
Containment
Crisp Sets: Which sets belong to which other sets?
Fuzzy Sets: How much do sets belong to other sets?
▪ A set can contain other sets. The smaller set is called
subset.
▪ In crisp sets, all elements of a subset entirely belong
to a larger set.
▪ In fuzzy sets, each element can belong less to the
subset than to the larger set. Elements of the fuzzy
subset have smaller memberships in it than in the
larger set.
17
Intersection
Crisp Sets: Which element belongs to both sets?
Fuzzy Sets: How much of the element is in both sets?
▪ In classical set theory, an intersection between two
sets contains the elements shared by these sets
▪ In fuzzy sets, an element may partly belong to both
sets with different memberships. A fuzzy intersection
is the lower membership in both sets of each
element.
μA∩B(x) = min [μA(x), μB(x)] = μA(x) ∩ μB(x),
where x ∈ X
18
Fuzzy Sets
A ∩ B → χA∩B(x)
= χA(x) ∧ χB(x)
= min(χA(x), χB(x))
A’ → χA’(x)
= 1 – χA(x)
A’’ = A
19
Union
Crisp Sets: Which element belongs to either set?
Fuzzy Sets: How much of the element is in either set?
▪ The union of two crisp sets consists of every element
that falls into either set.
▪ In fuzzy sets, the union is the reverse of the
intersection. That is, the union is the largest
membership value of the element in either set.
μA∪B(x) = max [μA(x), μB(x)] = μA(x) ∪ μB(x),
where x ∈ X
20
Fuzzy Sets
21
Fuzzy Set Operations
22
Operations of fuzzy sets
23
Fuzzy Set Operations
25
A A’ = X A A’ = Ø
Excluded middle axioms for crisp sets. (a) Crisp set A and its
complement; (b) crisp A ∪ A = X (axiom of excluded
middle); and (c) crisp A ∩ A = Ø (axiom of contradiction).
26
A A’ A A’
Excluded middle axioms for fuzzy sets are not valid. (a) Fuzzy set
A and its complement;
∼ (b) fuzzy A ∪ A = X (axiom of
∼
A
A B
A B
A B
28
Examples of Fuzzy Set Operations
▪ Fuzzy union (∪): the union of two fuzzy sets is the
maximum (MAX) of each element from two sets.
▪ E.g.
▪ A = {1.0, 0.20, 0.75}
▪ B = {0.2, 0.45, 0.50}
▪ A ∪ B = {MAX(1.0, 0.2), MAX(0.20, 0.45), MAX(0.75,
0.50)} = {1.0, 0.45, 0.75}
29
Examples of Fuzzy Set Operations
▪ Fuzzy intersection (∩): the intersection of two fuzzy
sets is just the MIN of each element from the two
sets.
▪ E.g.
▪ A ∩ B = {MIN(1.0, 0.2), MIN(0.20, 0.45), MIN(0.75,
0.50)} = {0.2, 0.20, 0.50}
30
Examples of Fuzzy Set Operations
▪ The complement of a fuzzy variable with DOM x is
(1-x).
▪ Example.
▪ Ac = {1 – 1.0, 1 – 0.2, 1 – 0.75} = {0.0, 0.8, 0.25}
31
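The MAX/MIN/complement operations above can be applied element-wise to membership vectors; a minimal sketch that reproduces the slide's numbers:

```python
def fuzzy_union(a, b):
    """Fuzzy union: element-wise MAX of two membership vectors."""
    return [max(x, y) for x, y in zip(a, b)]

def fuzzy_intersection(a, b):
    """Fuzzy intersection: element-wise MIN of two membership vectors."""
    return [min(x, y) for x, y in zip(a, b)]

def fuzzy_complement(a):
    """Fuzzy complement: 1 minus the membership of each element."""
    return [1 - x for x in a]

# The slide's example sets:
A = [1.0, 0.20, 0.75]
B = [0.2, 0.45, 0.50]
```

Running these on A and B gives the union {1.0, 0.45, 0.75}, the intersection {0.2, 0.20, 0.50}, and the complement of A as {0.0, 0.8, 0.25}, matching the worked examples above.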
Properties of Fuzzy Sets
A ∪ B = B ∪ A
A ∩ B = B ∩ A
A ∪ (B ∪ C) = (A ∪ B) ∪ C
A ∩ (B ∩ C) = (A ∩ B) ∩ C
A ∪ (B ∩ C) = (A ∪ B) ∩ (A ∪ C)
A ∩ (B ∪ C) = (A ∩ B) ∪ (A ∩ C)
A ∪ A = A;  A ∩ A = A
A ∪ X = X;  A ∩ X = A
A ∪ Ø = A;  A ∩ Ø = Ø
If A ⊆ B and B ⊆ C, then A ⊆ C
A’’ = A
32
Fuzzy Sets
33
Example (Discrete Universe)
[Plot: a discrete membership function μA(x) over x = number of courses taken
(2, 4, 6, 8), y-axis from 0 to 1 with 0.5 marked.]
34
Example (Discrete Universe)
35
Example (Continuous Universe)
B = {(x, μB(x)) | x ∈ U},  “about 50 years old”
μB(x) = 1 / (1 + ((x − 50)/5)^4)
Alternative representation:
B = ∫ (over x ∈ ℝ⁺) [1 / (1 + ((x − 50)/5)^4)] / x
[Plot: μB(x) against x = age from 0 to 100, equal to 1 near x = 50 and
falling off on both sides.]
36
Alternative Notation
A = ( x, A ( x)) x U
U : discrete universe A=
xi U
A ( xi ) / xi
U : continuous universe A = A ( x) / x
U
Fuzzy Disjunction
▪ A ∨ B ≡ max(A, B)
▪ A ∨ B = C: "Quality C is the disjunction
of Quality A and B"
[Bar charts: μA = 0.75, μB = 0.375.]
• (A ∨ B = C) ⇒ (C = 0.75)
38
Fuzzy Conjunction
▪ A ∧ B = min(A, B)
▪ A ∧ B = C: "Quality C is the conjunction
of Quality A and B"
(Figure: A has membership 0.75 and B has membership 0.375.)
• (A ∧ B = C) → (C = 0.375)
39
39
Example: Fuzzy Conjunction
Calculate A ∧ B given that A is .4 and B is 20
(Figure: reading the two membership curves gives
μA(.4) = 0.7 and μB(20) = 0.9.)
A ∧ B = min(0.7, 0.9) = 0.7
40
Fuzzy Logic
Introduction
Fuzzy thinking, why fuzzy, logic
Fuzzy sets
Representation
Linguistic variable and hedges
Operations on Fuzzy sets
Complement
Containment
Intersection, etc.
Fuzzy rules
1
Fuzzy rules
In 1973, Lotfi Zadeh published his
second most influential paper. This paper
outlined a new approach to analysis of
complex systems, in which Zadeh
suggested capturing human knowledge in
fuzzy rules.
2
What is a fuzzy rule?
A fuzzy rule can be defined as a conditional
statement in the form:
IF x is A
THEN y is B
where x and y are linguistic variables; and A
and B are linguistic values determined by
fuzzy sets on the universe of discourses X and
Y, respectively.
3
Classical vs. fuzzy rules
A classical IF-THEN rule uses binary logic, for example:
Rule 1:
IF speed is > 100
THEN stopping_distance is long
Rule 2:
IF speed is < 40
THEN stopping_distance is short
4
Fuzzy Rules
Fuzzy rules relate fuzzy sets.
In a fuzzy system, all rules fire to some
extent, or in other words they fire
partially.
If the antecedent is true to some degree
of membership, then the consequent is
also true to that same degree
5
Fuzzy sets of tall and heavy men
7
Fuzzy Rule
A fuzzy rule can have multiple antecedents, for
example:
IF project_duration is long
AND project_staffing is large
AND project_funding is inadequate
THEN risk is high
IF service is excellent
OR food is delicious
THEN tip is generous
8
Fuzzy Rule
The consequent of a fuzzy rule can also include
multiple parts, for instance:
IF temperature is hot
THEN hot_water is reduced;
cold_water is increased
9
Fuzzy inference
The most commonly used fuzzy inference
technique is the so-called Mamdani method.
In 1975, Professor Ebrahim Mamdani of
London University built one of the first fuzzy
systems to control a steam engine and boiler
combination.
He applied a set of fuzzy rules supplied by
experienced human operators.
1
Mamdani fuzzy inference
2
Example
We examine a simple two-input one-output problem
that includes three rules:
Rule 1:
IF x is A3 (project_funding is adequate)
OR y is B1 (project_staffing is small)
THEN z is C1 (risk is low)
Rule 2:
IF x is A2 (project_funding is marginal)
AND y is B2 (project_staffing is large)
THEN z is C2 (risk is normal)
Rule 3:
IF x is A1 (project_funding is inadequate)
THEN z is C3 (risk is high)
3
Identify Linguistic Variables/Values
and Inputs/Output
Rule 1:
IF x is A3 (project_funding is adequate)
OR y is B1 (project_staffing is small)
THEN z is C1 (risk is low)
Rule 2:
IF x is A2 (project_funding is marginal)
AND y is B2 (project_staffing is large)
THEN z is C2 (risk is normal)
Rule 3:
IF x is A1 (project_funding is inadequate)
THEN z is C3 (risk is high)
4
Mamdani-style fuzzy inference
5
Step 1: Fuzzification
The process of transforming crisp quantities into
fuzzy sets/linguistic variables.
Use hedges to generate new fuzzy sets if required.
Project funding (A)
={A1,A2,A3}
= {inadequate, marginal, adequate}
Project staffing (B)
={B1, B2}
={small, large}
Risk (C) = { C1, C2, C3}
={low, normal, high} 6
Step 1: Fuzzification
given the crisp inputs, x1 and y1 (project funding
=35% and project staffing = 60%)
Determine the degree to which these inputs belong to
each of the appropriate fuzzy sets.
(Figure: crisp inputs x1 and y1 projected onto the membership
functions of A1, A2, A3 over X and of B1, B2 over Y.)
μ(x=A1) = 0.5   μ(y=B1) = 0.1
μ(x=A2) = 0.2   μ(y=B2) = 0.7
7
Fuzzy Membership Function
Membership function (MF) - A function that specifies the degree to which a given input
belongs to a set.
Degree of membership- The output of a membership function, this value is always limited to
between 0 and 1. Also known as a membership value or membership grade.
Membership functions are used in the fuzzification and defuzzification steps of a FLS (fuzzy
logic system), to map the non-fuzzy input values to fuzzy linguistic terms and vice versa
Support: elements having non-zero degree of membership.
Core: set with elements having degree of 1.
α-Cut: set of elements with degree >= α.
Height: maximum degree of membership.
The Fuzzy Logic Toolbox includes 9 built-in membership function types.
These 9 functions are, in turn, built from several basic functions:
Piecewise linear functions.
Gaussian distribution function.
Sigmoid curve.
Quadratic polynomial curves.
Cubic polynomial curves.
8
Membership Function
There are several ways to assign values to fuzzy variables: Intuition,
Inference, Rank ordering, Angular fuzzy sets, Neural networks, Genetic
algorithm, etc.
Inference method performs deductive reasoning and uses knowledge of
geometrical shapes for defining membership values.
membership functions may be defined by various shapes: Triangular,
Trapezoidal, Piecewise linear, Gaussian, Singleton.
9
Membership Functions in the Fuzzy
Logic Toolbox
The simplest membership functions are formed using straight lines.
These straight line membership functions have the advantage of simplicity.
Triangular membership function: trimf.
Trapezoidal membership function: trapmf.
Two membership functions are built on the Gaussian distribution curve: a simple
Gaussian curve and a two-sided composite of two different Gaussian curves.
The two functions are gaussmf and gauss2mf.
The generalized bell membership function is specified by three parameters and
has the function name gbellmf.
Sigmoidal membership function: sigmf.
Polynomial based curves: Three related membership functions are the Z, S, and Pi
curves, all named because of their shape ( The functions zmf, smf and pimf).
Fuzzy Logic Toolbox also allows you to create your own membership functions.
x = (0:0.1:10)';
y1 = trapmf(x, [2 3 7 9]);
y2 = trapmf(x, [3 4 6 8]);
y3 = trapmf(x, [4 5 5 7]);
y4 = trapmf(x, [5 6 4 6]);
plot(x, [y1 y2 y3 y4]);
10
Triangular membership function
most widely accepted and used membership function (MF).
The triangle which fuzzifies the input can be defined by three
parameters a, b and c, where a and c define the base and b
defines the peak of the triangle.
Trivial case:
If input x = b,
then it has full membership
in the given set.
So, μ(x) = 1, if x = b
If the input is less than a or greater than c, then it does not belong
to the fuzzy set at all, and its membership value will be 0:
μ(x) = 0, x < a or x > c
11
Triangular membership function
x is between a and b:
If x is between a and b, its membership value rises linearly
from 0 to 1: μ(x) = (x − a) / (b − a), a ≤ x ≤ b
12
Triangular membership function
x is between b and c:
If x is between b and c, its membership
value falls from 1 to 0.
If it is near b, its membership value
is close to 1, and if x is near to c,
its membership value gets close to 0.
We can compute the fuzzy value of x using similar triangle
rule, μ(x) = (c – x) / (c – b), b≤x≤c
Combine all together:
μ(x) = 0 if x ≤ a or x ≥ c; (x − a)/(b − a) if a ≤ x ≤ b;
(c − x)/(c − b) if b ≤ x ≤ c
13
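The piecewise definition above translates directly into code. A minimal sketch (the function name trimf is chosen to mirror the MATLAB toolbox naming):

```python
def trimf(x, a, b, c):
    """Triangular membership: base [a, c], peak at b."""
    if x <= a or x >= c:
        return 0.0                  # outside the triangle
    if x <= b:
        return (x - a) / (b - a)    # rising edge, a <= x <= b
    return (c - x) / (c - b)        # falling edge, b <= x <= c

print(trimf(4, 2, 4, 6))  # 1.0 (full membership at the peak)
print(trimf(3, 2, 4, 6))  # 0.5 (halfway up the rising edge)
```

The degenerate cases a = b or b = c (shoulder shapes) would need extra guards; they are omitted here for clarity.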
Trapezoidal membership function:
14
Step 1: Fuzzification
given the crisp inputs, x1 and y1 (project funding =35% and project staffing = 60%)
A1 = {0,0,20,50}
A2 = {30, 50, 75}
B1 = {0,0,15,65}
B2 = {35,70,100,100}
15
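A sketch of this fuzzification step in Python, reading the four-element sets as trapezoid corner points and the three-element A2 as a triangle (a degenerate trapezoid). The slide's 0.5 and 0.1 come out exactly; the exact values for A2 and B2 are 0.25 and about 0.71, which the slide rounds to 0.2 and 0.7 when reading them off the figure:

```python
def trapmf(x, a, b, c, d):
    """Trapezoidal membership with corner points a <= b <= c <= d."""
    if b <= x <= c:
        return 1.0                  # flat top
    if a < x < b:
        return (x - a) / (b - a)    # rising edge
    if c < x < d:
        return (d - x) / (d - c)    # falling edge
    return 0.0

x1, y1 = 35, 60                         # project funding 35%, staffing 60%
mu_A1 = trapmf(x1, 0, 0, 20, 50)        # 0.5
mu_A2 = trapmf(x1, 30, 50, 50, 75)      # 0.25 (slide reads ~0.2 off the figure)
mu_B1 = trapmf(y1, 0, 0, 15, 65)        # 0.1
mu_B2 = trapmf(y1, 35, 70, 100, 100)    # ~0.71 (slide reads ~0.7)
```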
Step 2: Rule Evaluation
Take the fuzzified inputs
μ(x=A1) = 0.5,  μ(x=A2) = 0.2,
μ(y=B1) = 0.1,  μ(y=B2) = 0.7
Apply them to the antecedents of the fuzzy
rules. If a rule has multiple antecedents, the fuzzy
operator (AND or OR) is used to obtain a single number.
This number (the truth value) is then applied
to the consequent membership function.
16
Rule Evaluation (cont.)
To evaluate the disjunction of the rule antecedents,
we use the OR fuzzy operation. Typically, fuzzy
expert systems make use of the classical fuzzy
operation union:
μA∪B(x) = max [μA(x), μB(x)]
Similarly, in order to evaluate the conjunction of the
rule antecedents, we apply the AND fuzzy operation
intersection:
μA∩B(x) = min [μA(x), μB(x)]
17
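Applied to the fuzzified inputs of this example, rule evaluation is just max and min. A small sketch (μ(x=A3) = 0.0 is read off the figure, since x1 lies outside A3):

```python
mu_x = {"A1": 0.5, "A2": 0.2, "A3": 0.0}   # fuzzified project funding
mu_y = {"B1": 0.1, "B2": 0.7}              # fuzzified project staffing

rule1 = max(mu_x["A3"], mu_y["B1"])  # OR antecedent  -> union (max)
rule2 = min(mu_x["A2"], mu_y["B2"])  # AND antecedent -> intersection (min)
rule3 = mu_x["A1"]                   # single antecedent

print(rule1, rule2, rule3)  # 0.1 0.2 0.5
```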
Mamdani-style rule evaluation
(Figure: Mamdani-style evaluation of Rules 1 and 2.)
Rule 1: IF x is A3 (0.0) OR y is B1 (0.1) THEN z is C1 (0.1)
Rule 2: IF x is A2 (0.2) AND y is B2 (0.7) THEN z is C2 (0.2)
Rule 3: IF x is A1 (0.5) THEN z is C3 (0.5)
18
Rule Evaluation (cont.)
Now the result of the antecedent evaluation can be
applied to the membership function of the consequent.
clipping: cut the consequent membership function at the
level of the antecedent truth.
Since the top of the membership function is sliced, the
clipped fuzzy set loses some information.
scaling : The original membership function of the rule
consequent is adjusted by multiplying all its membership
degrees by the truth value of the rule antecedent.
offers a better approach for preserving the original shape
of the fuzzy set.
Generally loses less information
19
Clipped and scaled membership
functions
(Figure: consequent set C2 clipped at 0.2 (left) and scaled by
0.2 (right); the vertical axis is degree of membership, the
horizontal axis is Z.)
20
Step 3: Aggregation of the rule
outputs
Process of unification of the outputs of all rules.
take the membership functions of all rule
consequents previously clipped or scaled and
combine them into a single fuzzy set.
Input: the list of clipped or scaled consequent
membership functions
Output: one fuzzy set for each output variable.
(Figure: the clipped consequent sets are combined into a single
aggregate fuzzy set over Z.)
z is C1 (0.1),  z is C2 (0.2),  z is C3 (0.5)
21
Step 4: Defuzzification
Fuzziness helps us to evaluate the rules, but the final
output of a fuzzy system has to be a crisp number.
The input for the defuzzification process is the
aggregate output fuzzy set and the output is a single
number.
There are several defuzzification methods
The most popular one is the centroid technique. It
finds the point where a vertical line would slice the
aggregate set into two equal masses.
22
Defuzzification (cont.)
Mathematically this centre of gravity (COG) can be
expressed as:
COG = ∫ab μA(x) x dx / ∫ab μA(x) dx
23
Centre of gravity (COG)
COG = [(0 + 10 + 20) × 0.1 + (30 + 40 + 50 + 60) × 0.2
+ (70 + 80 + 90 + 100) × 0.5] /
(0.1 + 0.1 + 0.1 + 0.2 + 0.2 + 0.2 + 0.2 + 0.5 + 0.5 + 0.5 + 0.5)
= 67.4
(Figure: the aggregate fuzzy set sampled at z = 0, 10, …, 100;
the centroid lies at z = 67.4.)
24
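With the aggregate set sampled at z = 0, 10, …, 100 as on the slide, the discrete centroid is a one-liner. A minimal sketch reproducing the 67.4 result:

```python
z  = list(range(0, 101, 10))      # sample points over the universe Z
mu = [0.1, 0.1, 0.1,              # region covered by clipped C1
      0.2, 0.2, 0.2, 0.2,         # region covered by clipped C2
      0.5, 0.5, 0.5, 0.5]         # region covered by clipped C3

# Discrete centre of gravity: weighted mean of the sample points
cog = sum(zi * mi for zi, mi in zip(z, mu)) / sum(mu)
print(round(cog, 1))  # 67.4
```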
Sugeno fuzzy inference
Mamdani-style inference: find the centroid of a two-
dimensional shape by integrating across a
continuously varying function.
Not computationally efficient.
Michio Sugeno suggested to use a single spike, a
singleton, as the membership function of the rule
consequent.
A fuzzy singleton is a fuzzy set with a membership
function that is unity at a single particular point on
the universe of discourse and zero everywhere else.
1
Sugeno fuzzy inference (cont.)
Sugeno-style fuzzy inference is very similar to the
Mamdani method.
Sugeno changed only a rule consequent.
Instead of a fuzzy set, he used a mathematical
function of the input variable.
IF x is A
AND y is B
THEN z is f (x, y)
where x, y and z are linguistic variables; A and B are
fuzzy sets on universe of discourses X and Y,
respectively; and f (x, y) is a mathematical function.
2
Sugeno fuzzy inference (cont.)
The most commonly used zero-order Sugeno fuzzy
model applies fuzzy rules in the following form:
IF x is A
AND y is B
THEN z is k
where k is a constant.
In this case, the output of each fuzzy rule is constant.
All consequent membership functions are
represented by singleton spikes.
3
Sugeno-style rule evaluation
4
Sugeno-style aggregation of the rule
outputs
5
Weighted average (WA):
WA = Σ μ(ki) × ki / Σ μ(ki)
Sugeno-style defuzzification
6
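For a zero-order Sugeno model, defuzzification reduces to a weighted average of the singleton positions, weighted by the rule firing strengths. A sketch with hypothetical firing strengths and singleton positions k1, k2, k3 (not taken from the slides):

```python
mu = [0.1, 0.2, 0.5]   # hypothetical rule firing strengths
k  = [20, 50, 80]      # hypothetical singleton positions k1, k2, k3

# Weighted average: sum(mu_i * k_i) / sum(mu_i)
wa = sum(m * ki for m, ki in zip(mu, k)) / sum(mu)
print(wa)  # ~65.0
```

No integration over Z is needed, which is why the Sugeno method is so much cheaper than the Mamdani centroid.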
Mamdani or Sugeno?
Mamdani method
Widely accepted for capturing expert knowledge.
Allows description of the expertise in more intuitive,
more human-like manner.
Entails a substantial computational burden.
Sugeno method
Computationally effective
Works well with optimisation and adaptive techniques
Makes it very attractive in control problems,
particularly for dynamic nonlinear systems.
7
Building a fuzzy expert system: case
study
A service centre keeps spare parts and repairs failed
ones.
A customer brings a failed item and receives a spare
of the same type.
Failed parts are repaired, placed on the shelf, and
thus become spares.
The objective here is to advise a manager of the
service centre on certain decision policies to keep the
customers satisfied.
1
Process of developing a fuzzy expert
system
1. Specify the problem and define linguistic
variables.
2. Determine fuzzy sets.
3. Elicit and construct fuzzy rules.
4. Encode the fuzzy sets, fuzzy rules and
procedures to perform fuzzy inference into the
expert system.
5. Evaluate and tune the system.
2
Step 1: Specify the problem and define
linguistic variables
There are four main linguistic variables:
average waiting time (mean delay) m,
repair utilisation factor of the service
centre ρ,
number of servers s,
initial number of spare parts n.
3
Linguistic variables and their ranges
Linguistic Variable: Mean Delay, m
Linguistic Value Notation Numerical Range (normalised)
Very Short VS [0, 0.3]
Short S [0.1, 0.5]
Medium M [0.4, 0.7]
Linguistic Variable: Number of Servers, s
Linguistic Value Notation Numerical Range (normalised)
Small S [0, 0.35]
Medium M [0.30, 0.70]
Large L [0.60, 1]
Linguistic Variable: Repair Utilisation Factor, ρ
Linguistic Value Notation Numerical Range
Low L [0, 0.6]
Medium M [0.4, 0.8]
High H [0.6, 1]
Linguistic Variable: Number of Spares, n
Linguistic Value Notation Numerical Range (normalised)
Very Small VS [0, 0.30]
Small S [0, 0.40]
Rather Small RS [0.25, 0.45]
Medium M [0.30, 0.70]
Rather Large RL [0.55, 0.75]
Large L [0.60, 1]
Very Large VL [0.70, 1]
4
Step 2: Determine fuzzy sets
Fuzzy sets can have a variety of shapes.
A triangle or a trapezoid can often provide an
adequate representation of the expert knowledge and, at
the same time, significantly simplify the process
of computation.
5
Fuzzy sets of Mean Delay m
(Figure: triangular membership functions VS, S and M over the
normalised Mean Delay axis, 0 to 1.)
6
Fuzzy sets of Number of Servers s
(Figure: membership functions S, M and L over the normalised
Number of Servers axis, 0 to 1.)
7
Fuzzy sets of Repair Utilisation Factor
(Figure: membership functions L, M and H over the Repair
Utilisation Factor axis, 0 to 1.)
8
Fuzzy sets of Number of Spares n
(Figure: membership functions VS, S, RS, M, RL, L and VL over
the normalised Number of Spares axis, 0 to 1.)
9
Step 3: Elicit and construct fuzzy
rules
To accomplish this task, we might ask the expert to
describe how the problem can be solved using the
fuzzy linguistic variables defined previously.
The required knowledge can also be collected from other
sources such as books, computer databases, flow
diagrams and observed human behaviour.
10
Fuzzy Associative Memory (FAM)
(Figure: FAM square relating Mean Delay and Number of
Servers to Number of Spares.)
11
Rule Base 1
12
The rule table
13
Cube FAM of Rule Base 2
14
Step 4
Encode the fuzzy sets, fuzzy rules and procedures to
perform fuzzy inference into the expert system
two options:
build our system using a programming language such
as C/C++ or Pascal,
apply a fuzzy logic development tool such as
MATLAB Fuzzy Logic Toolbox or Fuzzy Knowledge
Builder.
15
Step 5: Evaluate and tune the system
We want to see whether our fuzzy system meets the
requirements specified at the beginning.
Several test situations depend on the mean delay,
number of servers and repair utilisation factor.
The Fuzzy Logic Toolbox can generate surface plots to
help us analyse the system’s performance.
16
Three-dimensional plots for Rule Base
1
17
Three-dimensional plots for Rule Base
1
18
Three-dimensional plots for Rule Base
2
19
Three-dimensional plots for Rule Base
2
20
Modified fuzzy sets of Number of
Servers s
21
Cube FAM of Rule Base 3
22
Three-dimensional plots for Rule Base
3
23
Three-dimensional plots for Rule Base
3
24
Tuning fuzzy systems
1. Review model input and output variables, and if
required redefine their ranges.
2. Review the fuzzy sets, and if required define
additional sets on the universe of discourse. The use
of wide fuzzy sets may cause the fuzzy system to
perform roughly.
3. Provide sufficient overlap between neighbouring
sets. It is suggested that triangle-to-triangle and
trapezoid-to-triangle fuzzy sets should overlap
between 25% to 50% of their bases.
25
Tuning fuzzy systems
4. Review the existing rules, and if required add new
rules to the rule base.
5. Examine the rule base for opportunities to write
hedge rules to capture the pathological behaviour of
the system.
6. Adjust the rule execution weights. Most fuzzy
logic tools allow control of the importance of rules
by changing a weight multiplier.
7. Revise shapes of the fuzzy sets. In most cases,
fuzzy systems are highly tolerant of a shape
approximation.
26
Introduction to
Artificial Neural Networks
Background
- Neural Networks can be :
- Biological models
- Artificial models
2
Biological analogy and some main ideas
3
How Does the Brain Work ? (1)
NEURON
- The cell that performs information processing in the
brain.
4
How Does the Brain Work ? (2)
Each neuron consists of:
a SOMA, DENDRITES, an AXON, and SYNAPSES.
5
Brain vs. Digital Computers (1)
6
Brain vs. Digital Computers (2)
Human Computer
7
Brain vs. Digital Computers (3)
8
History
1943: McCulloch & Pitts show that neurons can be
combined to construct a Turing machine (using ANDs,
ORs, & NOTs)
9
Definition of Neural Network
A neural network is a massively parallel distributed processor that
has a natural propensity for storing experiential knowledge and
making it available for use.
10
Neurons vs. Units (1)
- Each element of a NN is a node called a unit.
- Units are connected by links.
- Each link has a numeric weight.
Notation
11
Notation (cont.)
12
Computing Elements
A typical unit:
13
A Computing Unit.
14
Calculations
Input function:
Activation function g:
15
Simple Computations in this network
16
Activation Functions
1) Step function
2) Sign function
3) Sigmoid function
17
Activation Functions
18
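The three activation functions can be written down directly. A small Python sketch (the threshold is taken as 0, an assumption):

```python
import math

def step(x):     # hard limiter: 1 once the input reaches the threshold, else 0
    return 1 if x >= 0 else 0

def sign(x):     # sign function: +1 or -1
    return 1 if x >= 0 else -1

def sigmoid(x):  # smooth squashing function, output in (0, 1)
    return 1.0 / (1.0 + math.exp(-x))

print(step(0.3), sign(-0.5), sigmoid(0))  # 1 -1 0.5
```

The sigmoid is the usual choice for multilayer networks because it is differentiable, which gradient-based learning requires.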
Standard structure of an artificial neural network
Input units
represent the input as a fixed-length vector of numbers (user
defined)
Hidden units
calculate thresholded weighted sums of the inputs
represent intermediate calculations that the network learns
Output units
represent the output as a fixed-length vector of numbers
19
Neural Network Example
20
Units in Action
21
Network Structures
The main distinction is between feed-forward and recurrent networks.
In a feed-forward network, links are unidirectional, and there are no cycles.
In a recurrent network, the links can form arbitrary topologies.
Technically speaking, a feed-forward network is a directed acyclic graph (DAG).
We deal with networks that are arranged in layers.
In a layered feed-forward network, each unit is linked only to units in the next
layer;
there are no links between units in the same layer, no links backward to a
previous layer, and no links that skip a layer.
22
Network Structures
lack of cycles - computation can proceed uniformly from input units to output
units.
The activation from the previous time step plays no part in the computation,
because it is not fed back to an earlier unit.
Hence, a feed-forward network simply computes a function of the input values
that depends on the weight settings—it has no internal state other than the
weights themselves.
Such networks can implement adaptive versions of simple reflex agents or they
can function as components of more complex agents.
we will focus on feed-forward networks because they are relatively well-
understood.
Obviously, the brain cannot be a feed-forward network, else we would have no
short-term memory.
Some regions of the brain are largely feed-forward and somewhat layered, but
there are rampant back-connections.
In our terminology, the brain is a recurrent network.
23
Network Structures
Recurrent networks can become unstable, or oscillate, or exhibit chaotic
behavior.
Given some input values, it can take a long time to compute a stable output, and
learning is made more difficult.
On the other hand, recurrent networks can implement more complex agent
designs and can model systems with state.
24
Hopfield networks
Hopfield networks are probably the best-understood class of recurrent networks.
They use bidirectional connections with symmetric weights
all of the units are both input and output units;
the activation function g is the sign function; and the activation levels can only
be ± 1.
A Hopfield network functions as an associative memory: after training on a set
of examples, a new stimulus will cause the network to settle into an activation
pattern corresponding to the training example that most closely resembles the
new stimulus.
One of the most interesting theoretical results is that Hopfield networks can
reliably store up to 0.138N training examples, where N is the number of units in
the network.
25
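The associative-recall behaviour is easy to demonstrate for a single stored pattern. A minimal sketch with Hebbian weights, sign activation and synchronous updates (the update schedule and one-pattern setup are illustrative assumptions):

```python
p = [1, -1, 1, -1]                        # stored pattern; activations are +/-1
n = len(p)

# Hebbian weight matrix: symmetric, zero diagonal
W = [[0 if i == j else p[i] * p[j] for j in range(n)] for i in range(n)]

state = [1, -1, 1, 1]                     # stored pattern with the last bit flipped
for _ in range(5):                        # iterate sign updates until stable
    state = [1 if sum(W[i][j] * state[j] for j in range(n)) >= 0 else -1
             for i in range(n)]

print(state == p)  # True: the network settles on the stored pattern
```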
Boltzmann machines
Boltzmann machines also use symmetric weights, but include units that are
neither input nor output units.
They also use a stochastic activation function, such that the probability of the
output being 1 is some function of the total weighted input.
Boltzmann machines therefore undergo state transitions that resemble a
simulated annealing search for the configuration that best approximates the
training set.
It turns out that Boltzmann machines are formally identical to a special case of
belief networks evaluated with a stochastic simulation algorithm.
Some networks, called perceptrons, have no hidden units.
This makes the learning problem much simpler, but it means that perceptrons are
very limited in what they can represent.
Networks with one or more layers of hidden units are called multilayer networks.
With one (sufficiently large) layer of hidden units, it is possible to represent any
continuous function of the inputs; with two layers, even discontinuous functions
can be represented.
26
Example
With a fixed structure and fixed
activation functions g, the functions
representable by a feed-forward network
are restricted to have a specific
parameterized structure.
The weights chosen for the network
determine which of these functions is
actually represented.
27
Optimal Network Structure
So far we have considered networks with a fixed structure, determined by some
outside authority.
This is a potential weak point, because the wrong choice of network structure
can lead to poor performance.
If we choose a network that is too small, then the model will be incapable of
representing the desired function.
If we choose a network that is too big, it will be able to memorize all the
examples by forming a large lookup table, but will not generalize well to inputs
that have not been seen before.
In other words, like all statistical models, neural networks are subject to
overfitting when there are too many parameters (i.e., weights) in the model.
28
Optimal Network Structure
It is known that a feed-forward network with one hidden layer can approximate
any continuous function of the inputs, and a network with two hidden layers can
approximate any function at all.
However, the number of units in each layer may grow exponentially with the
number of inputs.
As yet, we have no good theory to characterize NERFs, or Network Efficiently
Representable Functions—functions that can be approximated with a small
number of units.
We can think of the problem of finding a good network structure as a search
problem.
One approach that has been used is to use a genetic algorithm to search the space
of network structures.
However, this is a very large space, and evaluating a state in the space means
running the whole neural network training protocol, so this approach is very
CPU-intensive.
Therefore, it is more common to see hill-climbing searches that selectively
modify an existing network structure.
29
Optimal Network Structure
There are two ways to do this: start with a big network and make it smaller, or
start with a small one and make it bigger.
A mechanism called optimal brain damage can be used to remove weights from
the initial fully-connected model.
After the network is initially trained, an information theoretic approach identifies
an optimal selection of connections that can be dropped (i.e., the weights are set
to zero).
The network is then retrained, and if it is performing as well or better, the
process is repeated.
This process was able to eliminate 3/4 of the weights, and improve overall
performance on test data.
In addition to removing connections, it is also possible to remove units that are
not contributing much to the result.
30
Optimal Network Structure
Several algorithms have been proposed for growing a larger network from a
smaller one.
The tiling algorithm (Mezard and Nadal, 1989) is interesting because it is similar
to the decision tree learning algorithm.
The idea is to start with a single unit that does its best to produce the correct
output on as many of the training examples as possible.
Subsequent units are added to take care of the examples that the first unit got
wrong.
The algorithm adds only as many units as are needed to cover all the examples.
Cross-validation techniques are useful for deciding when we have found a
network of the right size.
31
Perceptrons
Layered feed-forward networks were first studied in the late 1950s under the
name perceptrons.
Although networks of all sizes and topologies were considered, the only
effective learning element at the time was for single-layered networks, so that is
where most of the effort was spent.
Today, the name perceptron is used as a synonym for a single-layer, feed-
forward network.
The left-hand side of Figure shows such a
perceptron network.
Notice that each output unit is independent of the
others — each weight only affects one of the
outputs.
That means that we can limit our study to
perceptrons with a single output unit, as in the
right-hand side of Figure, and use several of them
to build up a multi-output perceptron.
32
What can Perceptrons Represent ?
We saw that units can represent the simple Boolean functions AND, OR, and
NOT, and that therefore a feed-forward network of units can represent any
Boolean function, if we allow for enough layers and units.
But what Boolean functions can be represented with a single-layer perceptron?
Some complex Boolean functions can be represented.
For example, the majority function, which outputs a 1 only if more than half of
its n inputs are 1, can be represented by a perceptron with each Wj, = 1 and
threshold t= n/2.
Representing the same function with a decision tree, by contrast, would require
O(2^n) nodes.
33
What can Perceptrons Represent ?
Figure shows three different Boolean functions of two inputs, the AND, OR, and
XOR functions.
Each function is represented as a two-dimensional plot, based on the values of
the two inputs.
Black dots indicate a point in the input space where the value of the function is
1, and white dots indicate a point where the value is 0.
It turns out that a perceptron can represent a function only if there is some line
that separates all the white dots from the black dots.
Such functions are called linearly separable.
Thus, a perceptron can represent AND and OR, but not XOR.
34
What can Perceptrons Represent ?
The fact that a perceptron can only represent linearly separable functions follows
directly from Equation
which defines the function computed by a perceptron.
A perceptron outputs a 1 only if W • I > 0.
This means that the entire input space is divided in two along a boundary defined
by W • I = 0, that is, a plane in the input space with coefficients given by the
weights.
With n inputs, the input space is n-dimensional, and linear separability can be
rather hard to visualize if n is too large.
It is easiest to understand for the case where n = 2.
In Figure (a), one possible separating "plane" is the dotted line defined by the
equation
35
What can Perceptrons Represent ?
With three inputs, the separating plane can still be visualized. Figure shows an
example in three dimensions.
The function we are trying to represent is true if and only if a minority of its
three inputs are true.
The shaded separating plane is defined by the equation I1 +I2 + I3 = 1.5
This time the positive outputs lie below the plane, in the region
I1 + I2 + I3 < 1.5.
Figure (b) shows a unit to implement the function
36
Learning linearly separable functions
As with any performance element, the question of what perceptrons can
represent is prior to the question of what they can learn.
We have just seen that a function can be represented by a perceptron if and only
if it is linearly separable.
That is relatively bad news, because there are not many linearly separable
functions.
The (relatively) good news is that there is a perceptron algorithm that will learn
any linearly separable function, given enough training examples.
Most neural network learning algorithms, including the perceptron learning
method, follow the current-best-hypothesis (CBH) scheme; in this case, the
hypothesis is a network, defined by the current values of the weights.
The initial network has randomly assigned weights, usually from the range
[−0.5, 0.5].
The network is then updated to try to make it consistent with the examples. This
is done by making small adjustments in the weights to reduce the difference
between the observed and predicted values.
37
Learning linearly separable functions
The main difference from the logical algorithms is the need to repeat the update
phase several times for each example in order to achieve convergence.
Typically, the updating process is divided into epochs.
Each epoch involves updating all the weights for all the examples.
The general scheme is shown as NEURAL-NETWORK-LEARNING
For perceptrons, the weight update rule is particularly simple.
If the predicted output for the single output unit is O, and the correct output
should be T, then the error is given by Err = T-O
If the error is positive, then we need to increase O; if it is negative, we need to
decrease O.
Now each input unit contributes Wj Ij to the total input, so if Ij is positive, an
increase in Wj will tend to increase O, and if Ij is negative, an increase in Wj will
tend to decrease O.
Thus, we can achieve the effect we want with the following rule:
Wj ← Wj + α × Ij × Err
where α is the learning rate.
39
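A minimal perceptron-learning sketch in Python, trained on the (linearly separable) AND function; the learning rate, zero initial weights and threshold convention are illustrative assumptions:

```python
# Training data for AND: each input vector carries a constant bias input of 1
examples = [((1, 0, 0), 0), ((1, 0, 1), 0), ((1, 1, 0), 0), ((1, 1, 1), 1)]
w = [0.0, 0.0, 0.0]   # weights, including the bias weight
alpha = 0.1           # learning rate

def predict(x):
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) > 0 else 0

for epoch in range(20):                    # repeat updates over several epochs
    for x, target in examples:
        err = target - predict(x)          # Err = T - O
        w = [wi + alpha * xi * err for wi, xi in zip(w, x)]

print([predict(x) for x, _ in examples])  # [0, 0, 0, 1]
```

Because AND is linearly separable, the convergence theorem guarantees the loop settles on a correct weight vector.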
Learning linearly separable functions
This rule is a slight variant of the perceptron learning rule proposed by Frank
Rosenblatt in 1960.
Rosenblatt proved that a learning system using the perceptron learning rule will
converge to a set of weights that correctly represents the examples, as long as the
examples represent a linearly separable function.
The perceptron convergence theorem created a good deal of excitement when it
was announced.
People were amazed that such a simple procedure could correctly learn any
representable function, and there were great hopes that intelligent machines
could be built from perceptrons.
It was not until 1969 that Minsky and Papert undertook what should have been
the first step: analyzing the class of representable functions.
Their book Perceptrons (Minsky and Papert, 1969) clearly demonstrated the
limits of linearly separable functions.
40
Multi-Layer Feed Forward
Networks
Multi-Layer Feed Forward Networks
Rosenblatt and others described multilayer feed-forward networks in the late
1950s, but concentrated their research on single-layer perceptrons.
This was mainly because of the difficulty of finding a sensible way to update the
weights between the inputs and the hidden units; whereas an error signal can be
calculated for the output units, it is harder to see what the error signal should be
for the hidden units.
When the book Perceptrons was published, Minsky and Papert (1969) stated that
it was an "important research problem" to investigate multilayer networks more
thoroughly.
In a sense, they were right. Learning algorithms for multilayer networks are
neither efficient nor guaranteed to converge to a global optimum.
On the other hand, the results of computational learning theory tell us that
learning general functions from examples is an intractable problem in the worst
case, regardless of the method, so we should not be too dismayed.
The most popular method for learning in multilayer networks is called back-
propagation.
2
Multi-Layer Feed Forward Networks
It was first invented in 1969 by Bryson and Ho, but was more or less ignored
until the mid-1980s.
The reasons for this may be sociological, but may also have to do with the
computational requirements of the algorithm on nontrivial problems.
3
N-layer FeedForward Network
4
Back-Propagation Learning
Learning in a network proceeds the same way as for perceptrons: example inputs
are presented to the network, and if the network computes an output vector that
matches the target, nothing is done.
If there is an error (a difference between the output and target), then the weights
are adjusted to reduce this error.
The trick is to assess the blame for an error and divide it among the contributing
weights.
In perceptrons, this is easy, because there is only one weight between each input
and the output.
But in multilayer networks, there are many weights connecting each input to an
output, and each of these weights contributes to more than one output.
The back-propagation algorithm is a sensible approach to dividing the
contribution of each weight.
As in the perceptron learning algorithm, we try to minimize the error between
each target output and the output actually computed by the network.
At the output layer, the weight update rule is very similar to the rule for the
perceptron.
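The blame-division scheme described above can be sketched for one hidden layer of sigmoid units minimizing squared error. The 2-2-1 network shape, the initial weights, and the learning rate are illustrative assumptions, not values from the text.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def forward(w_hidden, w_out, inputs):
    hidden = [sigmoid(sum(w * i for w, i in zip(ws, inputs))) for ws in w_hidden]
    out = sigmoid(sum(w * h for w, h in zip(w_out, hidden)))
    return hidden, out

def backprop_step(w_hidden, w_out, inputs, target, alpha=0.5):
    """One weight update on one example; returns the pre-update error."""
    hidden, out = forward(w_hidden, w_out, inputs)
    # Output layer: same form as the perceptron rule, scaled by the
    # sigmoid's derivative g'(in) = out * (1 - out).
    delta_out = (target - out) * out * (1 - out)
    for j, h in enumerate(hidden):
        # Hidden layer: each hidden unit receives a share of the blame
        # proportional to the weight connecting it to the output unit.
        delta_h = h * (1 - h) * w_out[j] * delta_out
        for k in range(len(inputs)):
            w_hidden[j][k] += alpha * inputs[k] * delta_h
        w_out[j] += alpha * h * delta_out
    return (target - out) ** 2

w_hidden = [[0.1, -0.2], [0.3, 0.4]]
w_out = [0.2, -0.1]
inputs, target = [1.0, 0.5], 1.0

err_before = backprop_step(w_hidden, w_out, inputs, target)
for _ in range(20):
    err_after = backprop_step(w_hidden, w_out, inputs, target)
print(err_before > err_after)  # repeated updates reduce the error
```

Note how the hidden-unit delta reuses the output weight w_out[j]: that weight is exactly the measure of how much blame for the output error belongs to hidden unit j.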
5
Back-Propagation Learning
In the figure, we show two curves.
The first is a training curve, which shows the
mean squared error on a given training set of 100
examples during the weight-updating process.
This demonstrates that the network does indeed
converge to a perfect fit to the training data.
The second curve is the standard learning curve
for the restaurant data, with one minor exception:
the y-axis is no longer the proportion of correct
answers on the test set, because sigmoid units do
not give 0/1 outputs.
Instead, we use the mean squared error on the test
set, which happens to coincide with the proportion
of correct answers in the 0/1 case.
The curve clearly shows that the network is
capable of learning in the restaurant domain;
indeed, the curve is very similar to that for
decision-tree learning, albeit somewhat shallower.
10
Back-propagation as gradient descent search
The gradient is taken on the error surface: the surface that
describes the error on each example as a function of
all the weights in the network.
An example error surface is shown in the figure. The
current set of weights defines a point on this
surface.
At that point, we look at the slope of the surface
along the axis formed by each weight.
This is known as the partial derivative of the surface
with respect to each weight: how much the error
would change if we made a small change in that weight.
We then alter the weights by an amount proportional
to the slope in each direction.
This moves the network as a whole in the direction
of steepest descent on the error surface.
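The slope-following procedure can be illustrated on a toy one-weight error surface. The surface E(w) = (w - 2)^2 and the step size are invented for illustration; its minimum sits at w = 2.

```python
# Gradient descent on a one-weight "error surface".

def error(w):
    return (w - 2.0) ** 2

def numeric_slope(f, w, eps=1e-6):
    # The partial derivative estimated by a small change in the weight,
    # exactly as described above: how much the error would change if we
    # made a small change in the weight.
    return (f(w + eps) - f(w - eps)) / (2 * eps)

w = 5.0            # starting point on the surface
alpha = 0.1        # each step is proportional to the slope
for _ in range(100):
    w -= alpha * numeric_slope(error, w)

print(round(w, 3))  # close to the minimum at 2.0
```

With many weights, the same update is applied along every weight's axis at once, which is what moves the network in the direction of steepest descent.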
11
Discussion
Computational efficiency: Computational efficiency depends on the amount of
computation time required to train the network to fit a given set of examples.
If there are m examples, and |W| weights, each epoch takes O(m|W|) time.
However, work in computational learning theory has shown that the worst-case
number of epochs can be exponential in n, the number of inputs.
In practice, time to convergence is highly variable, and a vast array of techniques
have been developed to try to speed up the process using an assortment of
tunable parameters.
Local minima in the error surface are also a problem.
Networks quite often converge to give a constant "yes" or "no" output,
whichever is most common in the training set.
At the cost of some additional computation, the simulated annealing method can
be used to assure convergence to a global optimum.
16
Discussion
Generalization: neural networks can do a good job of generalization.
One can say, somewhat circularly, that they will generalize well on functions for
which they are well-suited.
These seem to be functions in which the interactions between inputs are not too
intricate, and for which the output varies smoothly with the input.
There is no theorem to be proved here, but it does seem that neural networks
have had reasonable success in a number of real-world problems.
17
Discussion
Sensitivity to noise: Because neural networks are essentially doing nonlinear
regression, they are very tolerant of noise in the input data.
They simply find the best fit given the constraints of the network topology.
On the other hand, it is often useful to have some idea of the degree of certainty
of the output values.
Neural networks do not provide probability distributions on the output values.
For this purpose, belief networks seem more appropriate.
Transparency: Neural networks are essentially black boxes.
Even if the network does a good job of predicting new cases, many users will
still be dissatisfied because they will have no idea why a given output value is
reasonable.
If the output value represents, for example, a decision to perform open heart
surgery, then an explanation is clearly in order.
With decision trees and other logical representations, the output can be explained
as a logical derivation and by appeal to a specific set of cases that supports the
decision.
This is not currently possible with neural networks.
18
Discussion
Prior knowledge: learning systems can often benefit from prior knowledge that is
available to the user or expert.
Prior knowledge can mean the difference between learning from a few well-
chosen examples and failing to learn anything at all.
Unfortunately, because of the lack of transparency, it is quite hard to use one's
knowledge to "prime" a network to learn better.
Some tailoring of the network topology can be done—for example, when
training on visual images it is common to connect only small sets of nearby
pixels to any given unit in the first hidden layer.
On the other hand, such "rules of thumb" do not constitute a mechanism by
which previously accumulated knowledge can be used to learn from subsequent
experience.
It is possible that learning methods for belief networks can overcome this
problem
19
Discussion
All these considerations suggest that simple feed-forward networks, although
very promising as construction tools for learning complex input/output
mappings, do not fulfil our needs for a comprehensive theory of learning in their
present form.
Researchers in AI, psychology, theoretical computer science, statistics, physics,
and biology are working hard to overcome the difficulties.
20
Applications of Neural Networks
Here are a few examples of the many significant applications of neural networks.
In each case, the network design was the result of several months of
trial-and-error experimentation by researchers.
From these examples, it can be seen that neural networks have wide
applicability, but that they cannot magically solve problems without any thought
on the part of the network designer.
John Denker's remark that "neural networks are the second best way of doing
just about anything" may be an exaggeration, but it is true that neural networks
provide passable performance on many tasks that would be difficult to solve
explicitly with other programming techniques.
We encourage the reader to experiment with neural network algorithms to get a
feel for what happens when data arrive at an unprepared network.
21
Applications of Neural Networks
Pronunciation
Pronunciation of written English text by a computer is a
fascinating problem in linguistics, as well as a task with high commercial payoff.
It is typically carried out by first mapping the text stream to phonemes—basic sound
elements—and then passing the phonemes to an electronic speech generator.
The problem we are concerned with here is learning the mapping from text to
phonemes.
This is a good task for neural networks because most of the "rules" are only
approximately correct.
For example, although the letter "k" usually corresponds to the sound [k], the
letter "c" is pronounced [k] in cat and [s] in cent.
The NETtalk program (Sejnowski and Rosenberg, 1987) is a neural network that
learns to pronounce written text.
The input is a sequence of characters presented in a window that slides through
the text.
At any time, the input includes the character to be pronounced along with the
preceding and following three characters.
Each character is actually 29 input units—one for each of the 26 letters, and one
each for blanks, periods, and other punctuation.
22
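The input encoding just described can be sketched as follows. The exact 29-symbol set and the blank padding at text boundaries are assumptions on our part; the window shape (the target character plus three on each side, 29 units per character) is as described.

```python
import string

# 26 letters plus three punctuation symbols, for 29 units per character.
# The particular punctuation chosen here is an assumption.
ALPHABET = list(string.ascii_lowercase) + [" ", ".", ","]
assert len(ALPHABET) == 29

def encode_window(text, center):
    """Return the 7 * 29 = 203 input activations for position `center`."""
    units = []
    for pos in range(center - 3, center + 4):
        ch = text[pos] if 0 <= pos < len(text) else " "  # pad with blanks
        units.extend(1.0 if ch == sym else 0.0 for sym in ALPHABET)
    return units

x = encode_window("the cat", center=4)  # the character 'c'
print(len(x), sum(x))  # 203 inputs, exactly 7 of them active
```

One-hot coding like this is why the window has 203 inputs in total even though it covers only seven characters.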
Applications of Neural Networks
There were 80 hidden units in the version for which results are reported.
The output layer consists of features of the sound to be produced: whether it is
high or low, voiced or unvoiced, and so on.
Sometimes, it takes two or more letters to produce a single sound; in this case,
the correct output for the second letter is nothing.
Training consisted of a 1024-word text that had been hand-transcribed into the
proper phonemic features.
NETtalk learns to perform at 95% accuracy on the training set after 50 passes
through the training data.
One might think that NETtalk should perform at 100% on the text it has trained
on.
But any program that learns individual words rather than the entire text as a
whole will inevitably score less than 100%.
The difficulty arises with words like lead, which in some cases should be
pronounced to rhyme with bead and sometimes like bed.
A program that looks at only a limited window will occasionally get such words
wrong.
23
Applications of Neural Networks
So much for the ability of the network to reproduce the training data.
What about the generalization performance? This is somewhat disappointing.
On the test data, NETtalk's accuracy goes down to 78%, a level that is
intelligible, but much worse than commercially available programs.
Of course, the commercial systems required years of development, whereas
NETtalk only required a few dozen hours of training time plus a few months of
experimentation with various network designs.
However, there are other techniques that require even less development and
perform just as well.
For example, if we use the input to determine the probability of producing a
particular phoneme given the current and previous character and then use a
Markov model to find the sequence of phonemes with maximal probability, we
do just as well as NETtalk.
NETtalk was perhaps the "flagship" demonstration that converted many
scientists, particularly in cognitive psychology, to the cause of neural network
research.
24
Applications of Neural Networks
A post hoc analysis suggests that this was not because it was a particularly
successful program, but rather because it provided a good showpiece for the
philosophy of neural networks.
Its authors also had a flair for the dramatic: they recorded a tape of NETtalk
starting out with poor, babbling speech, and then gradually improving to the
point where the output is understandable.
Unlike conventional speech generators, which use a midrange tenor voice to
generate the phonemes, they used a high-pitched generator.
The tape gives the unmistakable impression of a child learning to speak.
25
Applications of Neural Networks
Handwritten character recognition
In one of the largest applications of neural
networks to date, Le Cun et al. (1989) have implemented a network designed to
read zip codes on hand-addressed envelopes.
The system uses a preprocessor that locates and segments the individual digits in
the zipcode; the network has to identify the digits themselves.
It uses a 16 x 16 array of pixels as input, three hidden layers, and a distributed
output encoding with 10 output units for digits 0-9.
The hidden layers contained 768, 192, and 30 units, respectively.
A fully connected network of this size would contain 200,000 weights, and
would be impossible to train.
Instead, the network was designed with connections intended to
act as feature detectors.
For example, each unit in the first hidden layer was connected by 25 links to a
5 x 5 region in the input.
Furthermore, the hidden layer was divided into 12 groups of 64 units; within
each group of 64 units, each unit used the same set of 25 weights.
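The weight-sharing scheme can be sketched as a single 5 x 5 feature detector slid over a 16 x 16 input. The sizes follow the text; the stride of 1 and the random values are illustrative assumptions.

```python
import random

random.seed(0)
image = [[random.random() for _ in range(16)] for _ in range(16)]
kernel = [[random.random() for _ in range(5)] for _ in range(5)]  # 25 shared weights

def detect(image, kernel):
    """Apply one shared 5 x 5 weight patch at every position of the input."""
    k = len(kernel)
    n = len(image) - k + 1        # 16 - 5 + 1 = 12 positions per axis
    out = [[0.0] * n for _ in range(n)]
    for r in range(n):
        for c in range(n):
            # The SAME 25 weights are reused at every position, so this
            # one group responds to its feature wherever it occurs.
            out[r][c] = sum(kernel[i][j] * image[r + i][c + j]
                            for i in range(k) for j in range(k))
    return out

feature_map = detect(image, kernel)
print(len(feature_map), len(feature_map[0]))  # a 12 x 12 map of responses
```

Sharing one set of 25 weights across a whole group is what keeps the total weight count near 9760 rather than the 200,000 of a fully connected network.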
26
Applications of Neural Networks
Hence the hidden layer can detect up to 12 distinct features, each of which can
occur anywhere in the input image.
Overall, the complete network used only 9760 weights.
The network was trained on 7300 examples, and tested on 2000.
One interesting property of a network with distributed output encoding is that it
can display confusion over the correct answer by setting two or more output
units to a high value.
After rejecting about 12% of the test set as marginal, using a confusion
threshold, the performance on the remaining cases reached 99%, which was
deemed adequate for an automated mail-sorting system.
The final network has been implemented in custom VLSI, enabling letters to be
sorted at high speed.
27
Applications of Neural Networks
Driving
ALVINN (Autonomous Land Vehicle In a Neural Network)
(Pomerleau, 1993) is a neural network that has performed quite well in a domain
where some other approaches have failed.
It learns to steer a vehicle along a single lane on a highway by observing the
performance of a human driver.
We described the system briefly on page 26, but here we take a look under the
hood.
ALVINN is used to control the NavLab vehicles at Carnegie Mellon University.
NavLab 1 is a Chevy van, and NavLab 2 is a U.S. Army HMMWV personnel
carrier.
Both vehicles are specially outfitted with computer-controlled steering,
acceleration, and braking.
Sensors include color stereo video, scanning laser range finders, radar, and
inertial navigation.
28
ALVINN
ALVINN - Autonomous Land Vehicle In a Neural Network
29
Applications of Neural Networks
Researchers ride along in the vehicle and monitor the progress of the computer
and the vehicle itself.
(Being inside the vehicle is a big incentive to make sure the program does not
"crash.")
The signal from the vehicle's video camera is preprocessed to yield an array of
pixel values that are connected to a 30 x 32 grid of input units in a neural
network.
The output is a layer of 30 units, each corresponding to a steering direction.
The output unit with the highest activation is the direction that the vehicle will
steer.
The network also has a layer of five hidden units that are fully connected to the
input and output layers.
ALVINN's job is to compute a function that maps from a single video image of
the road in front of it to a steering direction.
To learn this function, we need some training data—some image/direction pairs
with the correct direction.
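The network shape just described can be sketched as a forward pass. Random weights stand in for the trained ones here, so the chosen direction is meaningless, but the mapping from image to steering index is the one described: 30 x 32 inputs, five fully connected hidden units, 30 output units, and a winner-take-all choice.

```python
import math
import random

random.seed(1)
N_IN, N_HID, N_OUT = 30 * 32, 5, 30   # image pixels, hidden units, directions

w_hid = [[random.uniform(-0.1, 0.1) for _ in range(N_IN)] for _ in range(N_HID)]
w_out = [[random.uniform(-0.1, 0.1) for _ in range(N_HID)] for _ in range(N_OUT)]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def steer(image):
    """Map one flattened video image to a steering-direction index."""
    hidden = [sigmoid(sum(w * p for w, p in zip(ws, image))) for ws in w_hid]
    outputs = [sigmoid(sum(w * h for w, h in zip(ws, hidden))) for ws in w_out]
    # The output unit with the highest activation is the direction chosen.
    return max(range(N_OUT), key=lambda i: outputs[i])

image = [random.random() for _ in range(N_IN)]
direction = steer(image)
print(0 <= direction < N_OUT)  # a valid index among the 30 directions
```

A forward pass this small is cheap, which is why the trained network can produce a fresh steering decision ten times a second.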
30
Applications of Neural Networks
Fortunately, it is easy to collect this data just by having a human drive the
vehicle and recording the image/direction pairs.
After collecting about five minutes of training data (and applying the back-
propagation algorithm for about ten minutes), ALVINN is ready to drive on its
own.
One fine point is worth mentioning. There is a potential problem with the
methodology of training based on a human driver: the human is too good.
If the human never strays from the proper course, then there will be no training
examples that show how to recover when the vehicle is off course.
ALVINN corrects this problem by rotating each video image to create additional
views of what the road would look like from a position a little to the right or left.
The results of the training are impressive.
ALVINN has driven at speeds up to 70 mph for distances up to 90 miles on
public highways near Pittsburgh.
It has also driven at normal speeds on single-lane dirt roads, paved bike paths,
and two-lane suburban streets.
31
Applications of Neural Networks
ALVINN is unable to drive on a road type for which it has not been trained, and
is also not very robust with respect to changes in lighting conditions or the
presence of other vehicles.
A more general capability is exhibited by the MANIAC system (Jochem et al.,
1993).
MANIAC is a neural network that has as subnets two or more ALVINN models
that have each been trained for a particular type of road.
MANIAC takes the output from each subnet and combines them in a second
hidden layer.
With suitable training, MANIAC can perform well on any of the road types for
which the component subnets have been trained.
Some previous autonomous vehicles employed traditional vision algorithms that
used various image-processing techniques on the entire scene in order to find the
road and then follow it.
Such systems achieved top speeds of 3 or 4 mph.
32
Applications of Neural Networks
Why has ALVINN proven to be successful? There are two reasons.
First and foremost, a neural network of this size makes an efficient performance
element.
Once it has been trained, ALVINN is able to compute a new steering direction
from a video image 10 times a second.
This is important because it allows for some slack in the system.
Individual steering directions can be off by 10% from the ideal as long as the
system is able to make a correction in a few tenths of a second.
Second, the use of a learning algorithm is more appropriate for this domain than
knowledge engineering or straight programming.
There is no good existing theory of driving, but it is easy to collect sample
input/output pairs of the desired functional mapping.
This argues for a learning algorithm, but not necessarily for neural nets.
But driving is a continuous, noisy domain in which almost all of the input
features contribute some useful information; this means that neural nets are a
better choice than, say, decision trees.
33
Applications of Neural Networks
Of course, ALVINN and MANIAC are pure reflex agents, and cannot execute
maneuvers that are much more complex than lane-following, especially in the
presence of other traffic.
Current research by Pomerleau and other members of the group is aimed at
combining ALVINN's low-level expertise with higher-level symbolic
knowledge.
Hybrid systems of this kind are becoming more common as AI moves into the
real (physical) world.
34
Artificial Intelligence
Expert Systems
Contents
What is an Expert System?
Why should we use Expert Systems?
Early ES Systems
General Structure of an ES
Components of an Expert System.
Conventional System vs. Expert System
Human Expert vs. Expert System
Limitations of Expert Systems
What is an Expert System?
Human experts have
a considerable knowledge about their areas of expertise
Can learn from their experience
Can do reasoning
Can explain the solution
Can restructure knowledge
Can determine relevance
What is an Expert System?
An Expert System (ES) is software that attempts to
reproduce the performance of one or more human experts,
typically in a specific problem domain
a kind of software that simulates the problem-solving
behavior of a human expert in a given domain.
ES employs human knowledge represented in a computer
to solve problems that ordinarily require human expertise.
An ES imitates the expert's reasoning processes to solve
specific problems.
An expert system compared with a traditional computer:
Inference engine + Knowledge = Expert system
(Algorithm + Data structures = Program in a traditional computer)
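The "inference engine + knowledge" equation can be illustrated with a minimal forward-chaining engine. The rules and facts below are invented toy examples (loosely MYCIN-flavored), not from any real system; the engine is generic while all domain knowledge lives in the rule list.

```python
# Knowledge: if-then rules as (set of conditions, conclusion) pairs.
rules = [
    ({"fever", "stiff_neck"}, "suspect_meningitis"),
    ({"suspect_meningitis"}, "recommend_antibiotics"),
]

def infer(facts, rules):
    """Inference engine: repeatedly fire any rule whose conditions
    are all known facts, until no new conclusions appear."""
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        for conditions, conclusion in rules:
            if conditions <= facts and conclusion not in facts:
                facts.add(conclusion)
                changed = True
    return facts

result = infer({"fever", "stiff_neck"}, rules)
print("recommend_antibiotics" in result)  # the engine chains both rules
```

Swapping in a different rule list yields a different expert system with no change to the engine, which is exactly the separation the equation expresses.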
Why should we use Expert Systems?
Expert Systems:
Capture and preserve irreplaceable human expertise
Provide expertise needed at a number of locations at the
same time or in a hostile environment that is dangerous
to human health
Provide unemotional objective solutions faster than
human experts
Provide expertise that is expensive or rare
Share human expertise with a large number of people
Early ES Systems
The first expert system, called DENDRAL, was developed in the
late 1960s at Stanford University.
DENDRAL
the first knowledge intensive system
Expert system used for chemical analysis to predict molecular
structure
determining 3D structures of complex chemical compounds
MYCIN
Diagnose infectious diseases such as bacteremia and meningitis.
Recommend antibiotics.
Dosage adjusted for patient’s body weight.
Name derived from antibiotics (suffix – “mycin”).
ES Systems
SHINE: designed by NASA for monitoring, analyzing, and
diagnosing real-time and non-real-time systems
Stock Market Prediction
Apple's SIRI, a dialog system
insurance company Blue Cross's automated insurance claim
processing system
Some basic variants of expert systems
a rule-based expert system
fuzzy expert system
neural expert system
neuro-fuzzy expert system
General Structure of an ES
17