
CSC422:

INTRODUCTION TO
ARTIFICIAL INTELLIGENCE

CSC422: Introduction to Artificial Intelligence Lecture Notes Page 1


CSC422 INTRODUCTION TO ARTIFICIAL INTELLIGENCE

CONCEPT OF ARTIFICIAL INTELLIGENCE (AI)


Since the invention of computers, their capability to perform various tasks has grown
exponentially. Humans have developed the power of computer systems in terms of their
diverse working domains, their increasing speed, and their reducing size over time.
A branch of Computer Science named Artificial Intelligence pursues making computers or
machines as intelligent as human beings.

What is Artificial Intelligence?


Artificial Intelligence is a way of making a computer, a computer-controlled robot, or a
software think intelligently, in a manner similar to how intelligent humans think.
AI is accomplished by studying how the human brain thinks and how humans learn, decide, and
work while trying to solve a problem, and then using the outcomes of this study as a basis for
developing intelligent software and systems.

Goals of AI
 To Create Expert Systems − Systems which exhibit intelligent behavior, learn,
demonstrate, explain, and advise their users.
 To Implement Human Intelligence in Machines − Creating systems that understand,
think, learn, and behave like humans.
What Contributes to AI?
Artificial intelligence is a science and technology based on disciplines such as Computer
Science, Biology, Psychology, Linguistics, Mathematics, and Engineering. A major thrust of AI
is in the development of computer functions associated with human intelligence, such as
reasoning, learning, and problem solving.
One or more of the following areas can contribute to building an intelligent system.


Programming Without and With AI
Programming with and without AI differs in the following ways:

Programming Without AI:
 A computer program without AI can answer only the specific questions it is meant to solve.
 Modification in the program leads to a change in its structure.
 Modification is not quick and easy; it may affect the program adversely.

Programming With AI:
 A computer program with AI can answer the generic questions it is meant to solve.
 AI programs can absorb new modifications by putting highly independent pieces of
information together. Hence you can modify even a minute piece of information in the
program without affecting its structure.
 Modification is quick and easy.

Applications of AI
AI has been dominant in various fields such as −
 Gaming − AI plays a crucial role in strategic games such as chess, poker, tic-tac-toe, etc.,
where a machine can think of a large number of possible positions based on heuristic
knowledge.
 Natural Language Processing − It is possible to interact with a computer that
understands natural language spoken by humans.
 Expert Systems − There are some applications which integrate machine, software, and
special information to impart reasoning and advising. They provide explanation and
advice to the users.
 Vision Systems − These systems understand, interpret, and comprehend visual input on
the computer. For example,
o A spy airplane takes photographs, which are used to figure out spatial
information or a map of the area.
o Doctors use clinical expert systems to diagnose patients.
o Police use computer software that can recognize the face of a criminal using the
stored portrait made by a forensic artist.
 Speech Recognition − Some intelligent systems are capable of hearing and
comprehending language in terms of sentences and their meanings while a human
talks to them. They can handle different accents, slang words, background noise,
changes in a human's voice due to a cold, etc.
 Handwriting Recognition − The handwriting recognition software reads the text written
on paper by a pen or on screen by a stylus. It can recognize the shapes of the letters and
convert it into editable text.
 Intelligent Robots − Robots are able to perform the tasks given by a human. They have
sensors to detect physical data from the real world such as light, heat, temperature,
movement, sound, bump, and pressure. They have efficient processors, multiple sensors
and huge memory, to exhibit intelligence. In addition, they are capable of learning from
their mistakes and they can adapt to the new environment.

Intelligent System
An intelligent system is one with the ability to calculate, reason, perceive relationships and
analogies, learn from experience, store and retrieve information from memory, solve problems,
comprehend complex ideas, use natural language fluently, classify, generalize, and adapt to new
situations.
Types of Intelligence
As described by Howard Gardner, an American developmental psychologist, intelligence
comes in multiple forms:

 Linguistic intelligence − The ability to speak, recognize, and use mechanisms of
phonology (speech sounds), syntax (grammar), and semantics (meaning).
Examples: narrators, orators.
 Musical intelligence − The ability to create, communicate with, and understand
meanings made of sound, with an understanding of pitch and rhythm.
Examples: musicians, singers, composers.
 Logical-mathematical intelligence − The ability to use and understand relationships
in the absence of actions or objects, and to understand complex and abstract ideas.
Examples: mathematicians, scientists.
 Spatial intelligence − The ability to perceive visual or spatial information, change it,
and re-create visual images without reference to the objects, construct 3D images,
and move and rotate them. Examples: map readers, astronauts, physicists.
 Bodily-kinesthetic intelligence − The ability to use the complete body or parts of it to
solve problems or fashion products, with control over fine and coarse motor skills
and manipulation of objects. Examples: players, dancers.
 Intra-personal intelligence − The ability to distinguish among one's own feelings,
intentions, and motivations. Example: Gautama Buddha.
 Interpersonal intelligence − The ability to recognize and make distinctions among
other people's feelings, beliefs, and intentions. Examples: mass communicators,
interviewers.
You can say a machine or a system is artificially intelligent when it is equipped with at least
one, and possibly all, of these intelligences.

What is Intelligence composed of?


Intelligence is intangible. It is composed of −
 Reasoning
 Learning
 Problem Solving
 Perception
 Linguistic Intelligence

 Reasoning − It is the set of processes that enables us to provide a basis for judgment,
decision making, and prediction. There are broadly two types −

Inductive Reasoning:
o It conducts specific observations to make broad general statements.
o Even if all of the premises in a statement are true, inductive reasoning allows the
conclusion to be false.
o Example − "Nita is a teacher. Nita is studious. Therefore, all teachers are studious."

Deductive Reasoning:
o It starts with a general statement and examines the possibilities to reach a specific,
logical conclusion.
o If something is true of a class of things in general, it is also true for all members of
that class.
o Example − "All women above 60 years of age are grandmothers. Shalini is 65 years.
Therefore, Shalini is a grandmother."

 Learning − It is the activity of gaining knowledge or skill by studying, practicing, being


taught, or experiencing something. Learning enhances the awareness of the subjects of
the study.
The ability of learning is possessed by humans, some animals, and AI-enabled systems.
Learning is categorized as −
o Auditory Learning − It is learning by listening and hearing. For example, students
listening to recorded audio lectures.
o Episodic Learning − To learn by remembering sequences of events that one has
witnessed or experienced. This is linear and orderly.
o Motor Learning − It is learning by precise movement of muscles. For example,
picking objects, writing, etc.
o Observational Learning − To learn by watching and imitating others. For
example, child tries to learn by mimicking her parent.
o Perceptual Learning − It is learning to recognize stimuli that one has seen before.
For example, identifying and classifying objects and situations.
o Relational Learning − It involves learning to differentiate among various stimuli
on the basis of relational properties rather than absolute properties. For
example, adding a little less salt when cooking potatoes that came out salty
last time, when cooked with, say, a tablespoon of salt.
o Spatial Learning − It is learning through visual stimuli such as images, colors,
maps, etc. For example, a person can create a roadmap in mind before actually
following the road.
o Stimulus-Response Learning − It is learning to perform a particular behavior
when a certain stimulus is present. For example, a dog raises its ears on hearing
the doorbell.
 Problem Solving − It is the process in which one perceives and tries to arrive at a desired
solution from a present situation by taking some path, which is blocked by known or
unknown hurdles.
Problem solving also includes decision making, which is the process of selecting the
best suitable alternative out of the multiple available alternatives to reach the desired
goal.
 Perception − It is the process of acquiring, interpreting, selecting, and organizing sensory
information.


Perception presumes sensing. In humans, perception is aided by sensory organs. In the
domain of AI, perception mechanism puts the data acquired by the sensors together in a
meaningful manner.
 Linguistic Intelligence − It is one’s ability to use, comprehend, speak, and write the
verbal and written language. It is important in interpersonal communication.
Difference between Human and Machine Intelligence
 Humans perceive by patterns whereas machines perceive by a set of rules and data.
 Humans store and recall information by patterns; machines do it by searching algorithms.
For example, the number 40404040 is easy to remember, store, and recall because its
pattern is simple.
 Humans can figure out the complete object even if some part of it is missing or distorted,
whereas machines cannot do this correctly.
Artificial Intelligence - Research Areas
The domain of artificial intelligence is vast in breadth and depth. While proceeding, we consider
the broadly common and prospering research areas in the domain of AI.

Speech and Voice Recognition

Both these terms are common in robotics, expert systems, and natural language processing.
Though they are often used interchangeably, their objectives are different.

Speech Recognition:
 It aims at understanding and comprehending WHAT was spoken.
 It is used in hands-free computing, map, or menu navigation.
 The machine does not need training, as it is not speaker dependent.
 Speaker-independent speech recognition systems are difficult to develop.

Voice Recognition:
 It aims to recognize WHO is speaking.
 It is used to identify a person by analyzing tone, voice pitch, accent, etc.
 The recognition system needs training, as it is person oriented.
 Speaker-dependent speech recognition systems are comparatively easy to develop.

Real Life Applications of Research Areas


There is a large array of applications where AI is serving common people in their day-to-day
lives:

1. Expert Systems − Examples: flight-tracking systems, clinical systems.
2. Natural Language Processing − Examples: Google Now feature, speech recognition,
automatic voice output.
3. Neural Networks − Examples: pattern recognition systems such as face recognition,
character recognition, handwriting recognition.
4. Robotics − Examples: industrial robots for moving, spraying, painting, precision
checking, drilling, cleaning, coating, carving, etc.
5. Fuzzy Logic Systems − Examples: consumer electronics, automobiles, etc.

PROBLEM SOLVING IN ARTIFICIAL INTELLIGENCE (AI)


Problem-solving in AI includes various techniques such as efficient algorithms, heuristics, and
root cause analysis to obtain desirable solutions. A given problem may have various solutions
reached by different heuristics, while other problems have unique solutions. It all depends on
the nature of the problem to be solved.

Characteristics of problem in AI
Each problem given to an AI system has different aspects of representation and explanation. The
given problem has to be analyzed along several dimensions to choose the most suitable solution
method. Some of the key features of a problem are listed below.

 Can the problem be decomposed into sub-problems?
 Can any of the solution steps be ignored or undone?
 Is the problem's universe predictable?
 Can we select a good solution for the given problem without comparing it to all the
possible solutions?
 Is the desired output a state of the world or a path to a state?
 Does the problem require a large amount of knowledge to solve?
 Does it require interaction between the computer and a human?


The above-listed characteristics of a problem are called the 7 problem characteristics. The
solution for the given problem must take place under these characteristics. Problems in AI can
basically be divided into two types: toy problems and real-world problems.

Toy Problem:
It can also be called a puzzle-like problem, used as a way to explain a more general
problem-solving technique. Toy problems can be used for testing and demonstrating
methodologies, or for comparing the performance of different algorithms. They are often useful
in providing insight into specific phenomena in complicated problems, because large complicated
problems can be divided into many smaller toy problems that are easily understood in detail.
Sliding-block puzzles, the N-Queens problem, and the Tower of Hanoi are some examples.

Real-World Problem:
As the name suggests, this is a problem based on the real world that requires a solution. Such
problems do not depend on artificial descriptions, but they do admit a general formulation.
Online shopping, fraud detection, and medical diagnosis are some examples of real-world
problems in AI.

Steps performed in problem-solving

 Goal formulation: The first step in problem-solving is to identify the problem. It involves
formulating the right goal out of multiple goals and selecting actions to achieve it.
 Problem formulation: The most important step in problem-solving is choosing the
actions to be taken to achieve the formulated goal.
 Initial state: The starting state of the agent towards the goal.
 Actions: The list of the possible actions available to the agent.
 Transition model: Describes what each action does.
 Goal test: Tests whether a given state is a goal state.


 Path cost: A numeric cost assigned to each path towards the goal, reflecting its
performance. The solution with the lowest path cost is the optimal solution.

Example for the problem-solving procedure


Eight queens puzzle

The problem here is to place eight chess queens on an 8*8 chessboard in such a way that no
queen threatens another. Two queens in the same row, column, or diagonal attack each other.

Consider the incremental formulation of this problem: it starts from an empty state, and the
operator adds a queen at each step.

Following are the steps involved in this formulation:


 States: Any arrangement of 0 to 8 queens on the chessboard.
 Initial State: An empty chessboard
 Actions: Adding a queen to any of the empty boxes on the chessboard.
 Transition model: Returns the new state of the chessboard with the queen added in a
box.
 Goal test: Checks whether 8 queens are placed on the chessboard without any possibility
of attack.
 Path cost: Path costs are neglected because only final states are counted.
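The incremental formulation above can be sketched in code. This is a minimal illustration (the helper names are my own, not from the notes): the board is a tuple of column indices, one per placed row; each action adds a queen to the next row; and the goal test checks for eight non-attacking queens.

```python
def conflicts(board, col):
    """Transition check: would a queen placed in the next row at `col`
    attack any queen already on the board?"""
    row = len(board)
    for r, c in enumerate(board):
        # same column, or same diagonal
        if c == col or abs(c - col) == abs(r - row):
            return True
    return False

def place_queens(board=(), n=8):
    """Expand the state one queen at a time (the incremental
    formulation), backtracking when a placement leads to a dead end."""
    if len(board) == n:              # goal test: n non-attacking queens
        return board
    for col in range(n):             # actions: try every column in the next row
        if not conflicts(board, col):
            result = place_queens(board + (col,), n)
            if result is not None:
                return result
    return None                      # no safe placement on this branch

solution = place_queens()
```

Running `place_queens()` returns one valid arrangement as a tuple of eight column indices, one per row.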

Problem-solving methods in Artificial Intelligence


Let us discuss the techniques like Heuristics, Algorithms, Root cause analysis used by AI as
problem-solving methods to find a desirable solution for the given problem.


1. Algorithms
A problem-solving algorithm is a procedure that is guaranteed to find a solution if its steps are
strictly followed.

Let's have a simple example to get to know what it means:

A person wants to find a book among the vast collections of a library without knowing where
the book is kept. By sequentially examining every book in every rack of the library, the person
will eventually find the book, but the approach will consume a considerable amount of time.
Thus an algorithmic approach is guaranteed to succeed but is often slow.

Types of AI Algorithms
 Regression Algorithms.
 Instance-Based Algorithms.
 Decision Tree Algorithms.
 Clustering Algorithms.
 Association Rule Learning Algorithms.
 Artificial Neural Network Algorithms.
 Deep Learning Algorithms.
 Search Algorithms

2. Heuristics
A problem-solving heuristic is an informal, intuitive procedure that leads to the desired solution
in some cases only; the outcome of a heuristic operation is unpredictable. Using a heuristic
approach may be more or less effective than using an algorithm.

Consider the same example discussed above. If the person had an idea of where to look for the
book, a great deal of time could be saved; this is searching heuristically. But if the first guess
happens to be wrong, another heuristic has to be tried. The frequently used problem-solving
heuristics are:

Working forward
It is a forward-approach solution: the problem is solved from the beginning and worked
through to the end.

Working backward
It is a backward-approach solution: the problem is solved from the endpoint or goal, working
back through the steps that led to the goal.

Means-ends analysis
A problem-solving technique that mixes the above two directions. This method is appropriate
for solving complex and large problems. MEA is a strategy to control the search procedure in
problem-solving. It is centered on evaluating the difference between the current state and the
goal state. First, the difference between the initial state and the final state is calculated. Then
operators are selected for each difference. The application of these operators reduces the
difference between the current state and the goal state.
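As a toy illustration of means-ends analysis (the numeric state and the operator set below are assumed for illustration, not from the notes), the difference between the current state and the goal state is evaluated at each step, and the operator that best reduces that difference is applied:

```python
# Hypothetical operators: each adds a fixed amount to a numeric state.
OPERATORS = {"+10": 10, "+5": 5, "+1": 1}

def means_ends(current, goal):
    """Repeatedly measure the difference to the goal and apply the
    operator that reduces it the most, recording the plan."""
    plan = []
    while current != goal:
        diff = goal - current
        # select the largest operator that does not overshoot the goal
        name, step = max(
            ((n, s) for n, s in OPERATORS.items() if s <= diff),
            key=lambda item: item[1],
        )
        current += step
        plan.append(name)
    return plan

plan = means_ends(0, 27)   # difference 27, reduced step by step
```

For a goal of 27 starting from 0, the resulting plan applies +10 twice, +5 once, and +1 twice.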

Generate-and-test
The generate-and-test method is a problem-solving heuristic that generates alternative candidate
solutions, often in a random order, and tests each one to check whether it solves the problem. It
ensures that the best solution selected has been checked against the generated possible solutions.
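A minimal sketch of generate-and-test (the toy target below is assumed for illustration): candidates are generated at random, and each one is passed through the goal test until one succeeds.

```python
import random

def generate_and_test(goal_test, generate, max_tries=10000):
    """Generate candidates and test each one; stop at the first
    candidate that passes the goal test."""
    for _ in range(max_tries):
        candidate = generate()       # generate step
        if goal_test(candidate):     # test step
            return candidate
    return None                      # give up after max_tries

random.seed(0)
# Toy problem: find the ordering of [0, 1, 2, 3] that is sorted.
found = generate_and_test(
    goal_test=lambda p: p == sorted(p),
    generate=lambda: random.sample(range(4), 4),
)
```

Only one of the 24 permutations passes the test, so the search returns `[0, 1, 2, 3]`, illustrating that the method works but may waste many generations before succeeding.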

Difference between algorithm and heuristics


Algorithm:
- An automated solution to a problem.
- Deterministic and proven to give an optimal result.
- Contains a finite set of instructions to solve a problem.

Heuristics:
- Arbitrary choices or educated guesses.
- No proof of correctness; may not give optimal results.
- Applied to improve the running time of algorithms.

The terms heuristics and algorithms overlap somewhat in AI. A heuristic may be a subroutine
that determines where an algorithm should look first. Heuristic algorithms can be listed under
the categories of algorithms: in the sense that heuristics are algorithms, a heuristic makes a
guessing approach to solving the problem, granting a good-enough answer rather than the best
possible outcome. The level of indirection is the main difference between the two.

3. Root Cause Analysis


As the name suggests, this is the process of identifying the root cause of a problem. The root
cause is analyzed to identify the appropriate solutions, using a collection of principles,
techniques, and methodologies. RCA can also identify what caused an issue in the first place.

The first goal of RCA is to identify the root cause of the problem. The second goal is to
understand how to fix the underlying issues within the root cause. The third goal is to prevent
future issues or to repeat successes.

The core principles of RCA are:


 Focus on correcting the root causes rather than just the symptoms.
 Treat symptoms only for short-term relief.
 Recognize that there can be multiple root causes.
 Focus on HOW and WHY something happened.
 Back up root cause claims with concrete cause-and-effect evidence.
 Provide enough information to support a corrective course of action.
 Analyze how a root cause can be avoided in the future.

PROBLEM SOLVING BY SEARCH STRATEGY


In AI, the universal technique for problem-solving is searching. For a given problem, one needs
to consider all possible ways to reach the goal state from the initial state. Searching in AI can be
defined as the process of finding the solution for a given problem, and a search strategy
specifies which path is selected to reach the solution.

Terminologies in Search Algorithms


 Search: A step-by-step procedure to solve a given search problem. Three main factors of
a search problem:
o Search Space: The set of possible solutions a system may have.
o Start state: The state where the agent starts the search.
o Goal test: A function that monitors the current state and returns whether the goal
is achieved or not.
 Search Tree: The representation of the search problem in the form of a tree. The root of
the tree is the initial state of the problem.
 Actions: The description of all actions to the agent.
 Transition model: Used to represent the description of the action.
 Solution: Sequence of actions that leads from the initial node to the final node.
 Path Cost: A function that assigns a numeric cost to each path.
 Optimal Solution: The solution with the lowest cost.
 Problem Space: The environment in which the search takes place.
 Depth of a problem: Length of the shortest path from the initial state to the final state.

Properties of Search Algorithms


There are four essential properties of search algorithms used to compare their efficiency.
1. Completeness: An algorithm is complete if it guarantees to return a solution
whenever at least one exists for any input.
2. Optimality: If the algorithm finds the best solution (lowest path cost) among all
solutions, that solution is the optimal solution. The ability to find the optimal
solution is called optimality.
3. Time complexity: The time taken by the algorithm to complete the task.
4. Space complexity: The maximum storage space required at any point during the search.
Search algorithms can be classified into two main types based on the search problems:


Uninformed search algorithms
An uninformed search algorithm does not have any domain knowledge, such as the closeness
or location of the goal state; it behaves in a brute-force way. It only knows how to traverse the
given tree and how to find the goal state. This approach is also known as blind search or
brute-force search. The uninformed search strategies are of six types. They are:

 Breadth-first search
 Depth-first search
 Depth-limited search
 Iterative deepening depth-first search
 Bidirectional search
 Uniform cost search
1. Breadth-first search
It generally starts from the root node, examines the neighbouring nodes, and then moves to the
next level. It uses a First-In First-Out (FIFO) strategy, and it gives the shortest path (in number
of edges) to the solution. BFS is used where the given problem is fairly small and space
complexity is not a concern.
Now, consider the following tree: root A with children B and C; B's children are D and E, and
C's children are F and G.

Let’s take node A as the start state and node F as the goal state.
The BFS algorithm starts with the start state and then visits the nodes level by level until it
reaches the goal state.


In this example, it starts from A, travels to the next level and visits B and C, and then travels to
the next level and visits D, E, F, and G. Here the goal state is F, so the traversal stops at F.

The path of traversal is:


A —-> B —-> C —-> D —-> E —-> F
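The traversal above can be sketched with a FIFO queue. This is a minimal illustration (the adjacency dictionary encodes the example tree; the function name is my own):

```python
from collections import deque

# The example tree: A's children are B and C; B's are D and E; C's are F and G.
TREE = {"A": ["B", "C"], "B": ["D", "E"], "C": ["F", "G"],
        "D": [], "E": [], "F": [], "G": []}

def bfs(start, goal):
    """Visit nodes level by level using a FIFO queue, returning the
    order of traversal up to and including the goal."""
    frontier = deque([start])        # FIFO: pop from the left, append to the right
    visited = []
    while frontier:
        node = frontier.popleft()
        visited.append(node)
        if node == goal:
            return visited
        frontier.extend(TREE[node])  # enqueue the next level's nodes
    return None

order = bfs("A", "F")
```

With A as the start and F as the goal, the visiting order is A, B, C, D, E, F, matching the path of traversal above.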

Advantages of BFS
 BFS will never be trapped in unwanted nodes.
 If the graph has more than one solution, BFS will return the optimal solution with the
shortest path.

Disadvantages of BFS
 BFS stores all the nodes of the current level before going to the next level, so it requires a
lot of memory to store the nodes.
 BFS takes more time to reach a goal state that is far away.

2. Depth-first search
The depth-first search uses Last-in, First-out (LIFO) strategy and hence it can be implemented by
using stack. DFS uses backtracking. That is, it starts from the initial state and explores each path
to its greatest depth before it moves to the next path.

DFS will follow


Root node —-> Left node —-> Right node

Now, consider the same example tree mentioned above.


Here, it starts from the start state A, travels to B, and then goes to D. After reaching D, it
backtracks to B. Since B's first child has been explored, it goes to the next child E and then
backtracks to B again. With B's subtree exhausted, it goes back to A, then on to C and then to F.
F is the goal state, so it stops there.


The path of traversal is:
A —-> B —-> D —-> E —-> C —-> F
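The same traversal can be sketched with a LIFO stack. This is a minimal illustration on the same example tree (adjacency dictionary and function name are my own); children are pushed in reverse so the left child is explored first:

```python
# The example tree: A's children are B and C; B's are D and E; C's are F and G.
TREE = {"A": ["B", "C"], "B": ["D", "E"], "C": ["F", "G"],
        "D": [], "E": [], "F": [], "G": []}

def dfs(start, goal):
    """Explore each path to its greatest depth using a LIFO stack,
    returning the order of traversal up to and including the goal."""
    stack = [start]
    visited = []
    while stack:
        node = stack.pop()
        visited.append(node)
        if node == goal:
            return visited
        # push children in reverse so the leftmost child is popped first
        stack.extend(reversed(TREE[node]))
    return None

order = dfs("A", "F")
```

With A as the start and F as the goal, the visiting order is A, B, D, E, C, F, matching the path of traversal above.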

Advantages of DFS
 It takes less memory compared to BFS.
 Its time complexity is lower compared to BFS.
 DFS may reach a solution without examining much of the search space.

Disadvantages of DFS
 DFS does not always guarantee to give a solution.
 As DFS goes deep down, it may get trapped in an infinite loop.

3. Depth-limited search
Depth-limited search works similarly to depth-first search. The difference is that depth-limited
search has a pre-defined limit up to which it can traverse the nodes. This solves one of the
drawbacks of DFS, as it does not go down an infinite path.
DLS ends its traversal if either of the following conditions exists.

Standard Failure
It denotes that the given problem does not have any solution.

Cut-off Failure Value

It indicates that there is no solution for the problem within the given limit.
Now, consider the same example. Let's take A as the start node and C as the goal state, with
the limit as 1.
The traversal starts with node A, goes to level 1, and finds the goal state C there, so it stops
the traversal.


The path of traversal is:
A —-> C

If we give C as the goal node and the limit as 0, the algorithm will not return any path as the goal
node is not available within the given limit.
If we give the goal node as F and limit as 2, the path will be A, C, F.
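A recursive sketch of depth-limited search on the same example tree (the adjacency dictionary and function name are my own). Returning `None` covers both failure cases: the goal genuinely absent, or the goal beyond the given limit:

```python
# The example tree: A's children are B and C; B's are D and E; C's are F and G.
TREE = {"A": ["B", "C"], "B": ["D", "E"], "C": ["F", "G"],
        "D": [], "E": [], "F": [], "G": []}

def dls(node, goal, limit):
    """Return the path to `goal` reachable within depth `limit`,
    or None (cut-off / standard failure)."""
    if node == goal:
        return [node]
    if limit == 0:
        return None                      # limit reached: cut off this branch
    for child in TREE[node]:
        path = dls(child, goal, limit - 1)
        if path is not None:
            return [node] + path
    return None

path = dls("A", "C", 1)                  # goal C within limit 1
```

This reproduces the cases above: goal C with limit 1 gives A, C; goal F with limit 2 gives A, C, F; goal F with limit 1 returns no path.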

Advantages of DLS
 It takes lesser memory when compared to other search techniques.

Disadvantages of DLS
 DLS may not offer an optimal solution if the problem has more than one solution.
 DLS also encounters incompleteness.

4. Iterative deepening depth-first search


Iterative deepening depth-first search is a combination of depth-first search and breadth-first
search. IDDFS finds the best depth limit by gradually increasing the limit until the defined goal
state is reached.

With the same example tree above, consider A as the start node and E as the goal node, and let
the maximum depth be 2.
The algorithm starts with A, goes one level deep, and searches for E. Since E is not found
there, it goes one level deeper and finds E.


The path of traversal is
A —-> B —-> E
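IDDFS can be sketched as repeated depth-limited searches with a growing limit (a minimal illustration on the same example tree; helper names are my own):

```python
# The example tree: A's children are B and C; B's are D and E; C's are F and G.
TREE = {"A": ["B", "C"], "B": ["D", "E"], "C": ["F", "G"],
        "D": [], "E": [], "F": [], "G": []}

def dls(node, goal, limit):
    """Depth-limited search helper: path to goal within `limit`, or None."""
    if node == goal:
        return [node]
    if limit == 0:
        return None
    for child in TREE[node]:
        path = dls(child, goal, limit - 1)
        if path is not None:
            return [node] + path
    return None

def iddfs(start, goal, max_depth):
    """Gradually increase the depth limit, rerunning DLS each time."""
    for limit in range(max_depth + 1):
        path = dls(start, goal, limit)
        if path is not None:
            return path
    return None

path = iddfs("A", "E", 2)
```

With A as the start and E as the goal, the limit grows 0, 1, 2, and the search succeeds at limit 2 with the path A, B, E, matching the traversal above. The repeated DLS runs at every limit are exactly the repeated work listed as a disadvantage below.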

Advantages of IDDFS
 IDDFS has the advantages of both BFS and DFS.
 It offers fast search and uses memory efficiently.

Disadvantages of IDDFS
 It does all the works
of the previous stage again and again.

5. Bidirectional search
The bidirectional search algorithm is quite different from the other search strategies. It executes
two simultaneous searches, a forward search and a backward search, to reach the goal state.
The graph is divided into two smaller sub-graphs: in one, the search starts from the initial state,
and in the other, it starts from the goal state. When the two searches intersect at a common node,
the search terminates.
Bidirectional search requires both the start and goal states to be well defined, and the branching
factor to be the same in the two directions.
Consider the below graph.

Here, the start state is E and the goal state is G. In one sub-graph the search starts from E, and in
the other it starts from G. From E the search goes to B and then A; from G it goes to C and then
A. The two traversals meet at A, and hence the traversal ends.


The path of traversal is
E —-> B —-> A —-> C —-> G
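The two alternating frontiers can be sketched as follows. The graph below is my reconstruction from the traversal just described (an E-B-A-C-G chain); the function name is my own:

```python
# Undirected example graph reconstructed from the traversal: E-B-A-C-G.
GRAPH = {"E": ["B"], "B": ["E", "A"], "A": ["B", "C"],
         "C": ["A", "G"], "G": ["C"]}

def bidirectional(start, goal):
    """Expand a forward frontier from `start` and a backward frontier
    from `goal` one level at a time; stop at the first common node."""
    if start == goal:
        return start
    front, back = {start}, {goal}
    while front and back:
        # expand the forward frontier by one level
        front = {n for node in front for n in GRAPH[node]} | front
        meet = front & back
        if meet:
            return sorted(meet)[0]
        # expand the backward frontier by one level
        back = {n for node in back for n in GRAPH[node]} | back
        meet = front & back
        if meet:
            return sorted(meet)[0]
    return None

meeting = bidirectional("E", "G")
```

Starting from E and G, the frontiers grow to {E, B} and {G, C}, then to {E, B, A} and {G, C, A}, meeting at A as in the traversal above.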

Uniform cost search


Uniform cost search is considered the best search algorithm for a weighted graph (a graph with
edge costs). It searches the graph by giving maximum priority to the lowest cumulative cost,
and it can be implemented using a priority queue. Consider the below graph, where each edge
has a pre-defined cost.

Here, S is the start node and G is the goal node. From S, G can be reached in the following ways.
S, A, E, F, G -> 19
S, B, E, F, G -> 18
S, B, D, F, G -> 19
S, C, D, F, G -> 23
Here, the path with the least cost is S, B, E, F, G.
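A priority-queue sketch of uniform cost search follows. The notes give only the path totals, not per-edge costs, so the edge costs below are one assumed assignment consistent with those totals (S-A-E-F-G = 19, S-B-E-F-G = 18, S-B-D-F-G = 19, S-C-D-F-G = 23):

```python
import heapq

# Assumed edge costs, consistent with the four path totals above.
GRAPH = {
    "S": [("A", 5), ("B", 4), ("C", 6)],
    "A": [("E", 7)],
    "B": [("E", 7), ("D", 7)],
    "C": [("D", 9)],
    "D": [("F", 5)],
    "E": [("F", 4)],
    "F": [("G", 3)],
    "G": [],
}

def ucs(start, goal):
    """Always expand the frontier entry with the lowest cumulative cost."""
    frontier = [(0, start, [start])]     # priority queue keyed by path cost
    explored = set()
    while frontier:
        cost, node, path = heapq.heappop(frontier)
        if node == goal:
            return cost, path
        if node in explored:
            continue
        explored.add(node)
        for child, step in GRAPH[node]:
            heapq.heappush(frontier, (cost + step, child, path + [child]))
    return None

cost, path = ucs("S", "G")
```

With these costs, the search returns the cheapest path S, B, E, F, G with total cost 18, matching the least-cost path above.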

Advantages of UCS
 This algorithm is optimal as the selection of paths is based on the lowest cost.

Disadvantages of UCS
The algorithm does not consider how many steps it takes to reach the lowest-cost path, which
may result in an infinite loop.


Informed search algorithms
The informed search algorithm is also called heuristic search or directed search. In contrast to
uninformed search algorithms, informed search algorithms use details such as the distance to
the goal, the steps to reach it, and the cost of the paths, which makes them more efficient.

Here, the goal state can be achieved by using the heuristic function.
The heuristic function is used to achieve the goal state with the lowest cost possible. This
function estimates how close a state is to the goal.
Let’s discuss some of the informed search strategies.

1. Greedy best-first search algorithm


Greedy best-first search uses the properties of both depth-first search and breadth-first search.
Greedy best-first search traverses the node by selecting the path which appears best at the
moment. The closest path is selected by using the heuristic function.
Consider the below graph with the heuristic values.

Here, A is the start node and H is the goal node.


Greedy best-first search starts with A and then examines the neighbours B and C. Here, the
heuristic of B is 12 and that of C is 4. The best path at the moment is C, so it goes to C. From C,
it explores the neighbours F and G; the heuristic of F is 8 and that of G is 2, so it goes to G.
From G, it goes to H, whose heuristic is 0 and which is our goal state.



The path of traversal is
A —-> C —-> G —-> H
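The traversal above can be sketched as follows. The heuristic values are taken from the worked example, while the edge lists are assumptions based on the order in which the text explores the nodes; like the worked example, this simple sketch always moves to the best-looking neighbour and never backtracks:

```python
# Heuristic values from the example; edges assumed from the traversal order.
h = {'B': 12, 'C': 4, 'F': 8, 'G': 2, 'H': 0}
edges = {'A': ['B', 'C'], 'C': ['F', 'G'], 'G': ['H']}

def greedy_best_first(edges, h, start, goal):
    path = [start]
    node = start
    while node != goal:
        neighbours = edges.get(node, [])
        if not neighbours:
            return None  # dead end: this sketch does not backtrack
        node = min(neighbours, key=lambda n: h[n])  # best-looking neighbour
        path.append(node)
    return path

print(greedy_best_first(edges, h, 'A', 'H'))  # ['A', 'C', 'G', 'H']
```

Note that only the heuristic value of each neighbour is consulted; the actual edge costs play no role, which is exactly why the algorithm is not optimal.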

Advantages of Greedy best-first search


 Greedy best-first search is more efficient compared with breadth-first search and depth-first
search.

Disadvantages of Greedy best-first search


 In the worst-case scenario, the greedy best-first search algorithm may behave like an unguided
DFS.
 There are some possibilities for greedy best-first search to get trapped in an infinite loop.
 The algorithm is not optimal.
Next, let’s discuss the other informed search algorithm called the A* search algorithm.

2. A* search algorithm
A* search algorithm is a combination of both uniform cost search and greedy best-first search
algorithms. It uses the advantages of both with better memory usage. It uses a heuristic function
to find the shortest path. A* search algorithm uses the sum of both the cost and heuristic of the
node to find the best path. Consider the following graph with the heuristics values as follows.

Let A be the start node and H be the goal node.


First, the algorithm will start with A. From A, it can go to B, C, H.

Note the point that A* search uses the sum of path cost and heuristics value to determine the
path.

Here, from A to B, the sum of cost and heuristics is 1 + 3 = 4.


From A to C, it is 2 + 4 = 6.
From A to H, it is 7 + 0 = 7.
Here, the lowest cost is 4 and the path A to B is chosen. The other paths will be on hold.
Now, from B, it can go to D or E.



From A to B to D, the cost is 1 + 4 + 2 = 7.
From A to B to E, it is 1 + 6 + 6 = 13.
The lowest of these is 7, but when compared with the paths on hold, path A to C has a lower
cost of 6.
Hence, A to C is chosen and the other paths are kept on hold.
From C, it can now go to F or G.
From A to C to F, the cost is 2 + 3 + 3 = 8.
From A to C to G, the cost is 2 + 2 + 1 = 5.
The lowest cost is 5, which is also less than the costs of the other paths on hold. Hence, path A
to G is chosen.
From G, it can go to H, whose cost is 2 + 2 + 2 + 0 = 6.
Here, 6 is less than the costs of the other paths on hold.
Also, H is our goal state. The algorithm will terminate here.

The path of traversal is


A —-> C —-> G —-> H
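The walkthrough above can be sketched as follows, with the edge costs and heuristic values transcribed from the example. The heuristic of A is not given in the example, so 0 is assumed here; the start node's heuristic never affects the result:

```python
import heapq

# Edge costs and heuristic values transcribed from the worked example.
# h['A'] is not given in the example; 0 is assumed (it never changes the result).
graph = {
    'A': [('B', 1), ('C', 2), ('H', 7)],
    'B': [('D', 4), ('E', 6)],
    'C': [('F', 3), ('G', 2)],
    'G': [('H', 2)],
}
h = {'A': 0, 'B': 3, 'C': 4, 'D': 2, 'E': 6, 'F': 3, 'G': 1, 'H': 0}

def a_star(graph, h, start, goal):
    # Frontier ordered by f = g (cost so far) + h (heuristic estimate).
    frontier = [(h[start], 0, start, [start])]
    expanded = {}
    while frontier:
        f, g, node, path = heapq.heappop(frontier)
        if node == goal:
            return g, path
        if node in expanded and expanded[node] <= g:
            continue
        expanded[node] = g
        for neighbour, cost in graph.get(node, []):
            g2 = g + cost
            heapq.heappush(frontier,
                           (g2 + h[neighbour], g2, neighbour, path + [neighbour]))
    return None

print(a_star(graph, h, 'A', 'H'))  # (6, ['A', 'C', 'G', 'H'])
```

Keeping the other paths "on hold" corresponds to leaving them in the priority queue; the node with the lowest f = g + h is always expanded next.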

Search algorithms are used in games, databases, virtual search spaces, quantum computing, and
so on. We have discussed some of the important search strategies and how to use them to solve
problems in AI, but this is not the end: there are several algorithms for solving any given
problem. Nowadays, AI is growing rapidly and is applied to many real-life problems. Keep
learning! Keep practicing!

FORMS OF LEARNING IN ARTIFICIAL INTELLIGENCE

Machine Learning
At its core, machine learning is simply a way of achieving AI. Machine learning is an application
of artificial intelligence (AI) that enables systems to learn and improve from experience without
being explicitly programmed. Machine learning focuses on the development of computer
programs that can access data and use it for their own learning. There are four types of machine
learning.



1. Supervised learning
2. Unsupervised learning
3. Semi-supervised learning
4. Reinforcement learning

1. Supervised learning

Supervised machine learning can take what it has learned in the past and apply that to new data
using labelled examples to predict future patterns and events. It learns by explicit example.
Supervised learning requires that the algorithm’s possible outputs are already known and that the
data used to train the algorithm is already labelled with correct answers. It’s like teaching a child
that 2+2=4 or showing an image of a dog and teaching the child that it is called a dog. The
approach to supervised machine learning is essentially the same – it is presented with all the
information it needs to reach pre-determined conclusions. It learns how to reach the conclusion,
just like a child would learn how to reach the total of ‘5’ and the few, pre-determined ways to get
there, for example, 2+3 and 1+4. If you were to present 6+3 as a way to get to 5, that would be
determined as incorrect. Errors would be found and adjusted.

The algorithm learns by comparing its actual output with the correct outputs to find errors. It
then modifies the model accordingly. Supervised learning is commonly used in applications
where historical data predicts likely future events. Using the previous example, if 6+3 is the most
common erroneous way to get to 5, the machine can predict that when someone inputs 6+3, after
the correct answer of 9, 5 would be the second most commonly expected result. We can also
consider an everyday example - it can foresee when credit card transactions are likely to be
fraudulent or which insurance customer is more likely to put forward a claim. Supervised
learning is further divided into:
1. Classification
2. Regression
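As a minimal illustration of classification, the sketch below uses a 1-nearest-neighbour classifier on a handful of labelled points; the data and labels are invented for illustration and are not from the notes:

```python
# Made-up labelled 2-D points: the "right answers" the algorithm learns from.
training_data = [
    ((1.0, 1.0), 'cat'), ((1.2, 0.8), 'cat'),
    ((4.0, 4.2), 'dog'), ((3.8, 4.5), 'dog'),
]

def predict(point):
    def sq_dist(p, q):
        return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2
    # The label of the closest labelled example becomes the prediction.
    nearest = min(training_data, key=lambda example: sq_dist(example[0], point))
    return nearest[1]

print(predict((1.1, 0.9)))  # cat
print(predict((4.1, 4.0)))  # dog
```

The defining feature of supervised learning is visible here: every training example carries a correct answer, and new inputs are judged against those labelled examples.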

2. Unsupervised learning
Supervised learning tasks find patterns where we have a dataset of “right answers” to learn
from. Unsupervised learning tasks find patterns where we don’t. This may be because the “right
answers” are unobservable, or infeasible to obtain, or maybe for a given problem, there isn’t
even a “right answer” per se. Unsupervised learning is used against data without any historical
labels. The system is not given a pre-determined set of outputs or correlations between inputs
and outputs or a "correct answer." The algorithm must figure out what it is seeing by itself; it has
no store of reference points. The goal is to explore the data and find some sort of pattern or
structure. Unsupervised learning works well when the data is transactional. For example,
identifying pockets of customers with similar characteristics who can then be targeted in
marketing campaigns. Unsupervised machine learning is a more complex process and has been
used far fewer times than supervised machine learning. But it’s exactly for this reason that there
is so much buzz around the future of AI. Advances in unsupervised ML are seen as the future of
AI because it moves away from narrow AI and closer to AGI (‘artificial general intelligence’ that
we discussed a few paragraphs earlier). If you’ve ever heard someone talking about computers
teaching themselves, this is essentially what they are referring to. In unsupervised learning,
neither a training data set nor a list of outcomes is provided. The AI enters the problem blind –
with only its faultless logical operations to guide it. Imagine yourself as a person that has never
heard of or seen any sport being played. You get taken to a football game and left to figure out
what it is that you are observing. You can’t refer to your knowledge of other sports and try to
draw up similarities and differences that will eventually boil down to an understanding of
football. You have nothing but your cognitive ability. Unsupervised learning places the AI in an
equivalent of this situation and leaves it to learn using only its on/off logic mechanisms that are
used in all computer systems.
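As a minimal sketch of unsupervised learning, the code below clusters unlabelled 1-D points into two groups with k-means; no "correct answers" are supplied, and the data and parameters are invented for illustration:

```python
# Cluster unlabelled 1-D points into k groups; no labels are supplied.
def k_means(points, k=2, iterations=10):
    centroids = points[:k]  # naive initialisation, fine for this sketch
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Move each centroid to the mean of the points assigned to it.
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids, clusters

points = [1.0, 1.2, 0.9, 8.0, 8.3, 7.9]
centroids, clusters = k_means(points)
print(sorted(round(c, 2) for c in centroids))  # [1.03, 8.07]
```

The algorithm discovers the two groups on its own, which is the essence of the "pockets of customers with similar characteristics" example above.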



3. Semi-supervised learning (SSL)
Semi-supervised learning falls somewhere in the middle of supervised and unsupervised
learning. It is used because many of the problems AI is used to solve require a balance of both
approaches. In many cases the reference data needed for solving the problem is available, but it
is either incomplete or somehow inaccurate. This is when semi-supervised learning is summoned
for help since it is able to access the available reference data and then use unsupervised learning
techniques to do its best to fill the gaps. Unlike supervised learning which uses labelled data and
unsupervised which is given no labelled data at all, SSL uses both. More often than not the scales
tip in favour of unlabelled data since it is cheaper and easier to acquire, leaving the volume of
available labelled data in the minority. The AI learns from the labelled data to then make a
judgment on the unlabelled data and find patterns, relationships and structures. SSL is also useful
in reducing human bias in the process. A fully labelled, supervised learning AI has been labelled
by a human and thus poses the risk of results potentially being skewed due to improper labelling.
With SSL, including a lot of unlabelled data in the training process often improves the precision
of the end result while time and cost are reduced. It enables data scientists to access and use lots
of unlabelled data without having to face the insurmountable task of assigning information and
labels to each one.

4. Reinforcement learning
Reinforcement learning is a type of dynamic programming that trains algorithms using a system
of reward and punishment. A reinforcement learning algorithm, or agent, learns by interacting
with its environment. It receives rewards for performing correctly and penalties for performing
incorrectly. Therefore, it learns without having to be directly taught by a human – it learns by
seeking the greatest reward and minimizing penalty. This learning is tied to a context because
what may lead to maximum reward in one situation may be directly associated with a penalty in
another. This type of learning consists of three components: the agent (the AI learner/decision
maker), the environment (everything the agent has interaction with) and actions (what the agent
can do). The agent will reach the goal much faster by finding the best way to do it – and that is
the goal - maximizing the reward, minimizing the penalty and figuring out the best way to do so.



Machines and software agents learn to determine the ideal behavior within a specific context, to
maximize their performance and reward. Learning occurs via reward feedback, which is known
as the reinforcement signal. An everyday example of training pets to relieve themselves outside
is a simple way to illustrate this. The goal is getting the pet into the habit of going outside rather
than in the house. The training then involves rewards and punishments intended for the pet’s
learning. It gets a treat for going outside or has its nose rubbed in its mess if it fails to do so.
Reinforcement learning tends to be used for gaming, robotics and navigation. The algorithm
discovers which steps lead to the maximum rewards through a process of trial and error. When
this is repeated, the problem is modelled as a Markov Decision Process. Facebook's News Feed
is an example most of us will be able to understand. Facebook uses machine learning to
personalize people’s feeds. If you frequently read or "like" a particular friend's activity, the News
Feed will begin to bring up more of that friend's activity more often and nearer to the top. Should
you stop interacting with this friend’s activity in the same way, the data set will be updated and
the News Feed will consequently adjust.
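The agent-environment-action loop described above can be sketched with tabular Q-learning on a toy corridor environment; the states, rewards, and parameters here are invented for illustration:

```python
import random

# Tabular Q-learning on a toy 4-cell corridor: reaching the rightmost
# cell yields a reward of +1, every other move yields 0.
random.seed(0)
n_states, actions = 4, [-1, +1]          # move left / move right
Q = {(s, a): 0.0 for s in range(n_states) for a in actions}
alpha, gamma, epsilon = 0.5, 0.9, 0.3    # learning rate, discount, exploration

for episode in range(200):
    s = 0
    while s != n_states - 1:             # an episode ends at the goal cell
        if random.random() < epsilon:    # explore
            a = random.choice(actions)
        else:                            # exploit the current estimates
            a = max(actions, key=lambda act: Q[(s, act)])
        s2 = min(max(s + a, 0), n_states - 1)
        reward = 1.0 if s2 == n_states - 1 else 0.0
        # The reinforcement signal updates the action-value estimate.
        Q[(s, a)] += alpha * (reward + gamma * max(Q[(s2, b)] for b in actions)
                              - Q[(s, a)])
        s = s2

# The learned policy moves right in every non-goal state.
policy = [max(actions, key=lambda act: Q[(s, act)]) for s in range(n_states - 1)]
print(policy)  # [1, 1, 1]
```

The three components from the text are all visible: the agent (the Q-table and action choice), the environment (the corridor and its reward), and the actions (left or right). The agent is never told the answer; reward feedback alone shapes its behaviour.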

INTELLIGENT AGENT/AGENTS IN ARTIFICIAL INTELLIGENCE

What is an Agent?
An agent can be viewed as anything that perceives its environment through sensors and acts
upon that environment through actuators.
For example, human beings perceive their surroundings through their sensory organs, known as
sensors, and take actions using their hands, legs, etc., known as actuators.

Diagrammatic Representation of an Agent



Intelligent Agent
An intelligent agent is a goal-directed agent. It perceives its environment through its sensors,
uses the observations and built-in knowledge, and acts upon the environment through its
actuators.

Rational Agent
A rational agent is an agent which takes the right action for every perception. By doing so, it
maximizes the performance measure, which makes an agent be the most successful.
Note: There is a slight difference between a rational agent and an intelligent agent.

Omniscient Agent
An omniscient agent is an agent which knows the actual outcome of its action in advance.
However, such agents are impossible in the real world.

Note: Rational agents are different from Omniscient agents because a rational agent tries to get
the best possible outcome with the current perception, which leads to imperfection. A chess
AI can be a good example of a rational agent because, with the current action, it is not possible to
foresee every possible outcome whereas a tic-tac-toe AI is omniscient as it always knows the
outcome in advance.

Software Agents
It is a software program which works in a dynamic environment. These agents are also known
as Softbots because all the body parts of software agents are software only. For example, video
games, flight simulators, etc.

Behavior of an Agent
Mathematically, an agent behavior can be described by an:

 Agent Function: It is the mapping of a given percept sequence to an action. It is an


abstract mathematical explanation.
 Agent Program: It is the practical and physical implementation of the agent function.

For example, an automatic hand-dryer detects signals (hands) through its sensors. When we
bring hands nearby the dryer, it turns on the heating circuit and blows air. When the signal
detection disappears, it breaks the heating circuit and stops blowing air.
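The hand-dryer example can be sketched as code: the agent function is the abstract mapping from percept to action, and the function below is one concrete, hypothetical agent program that implements it:

```python
# The agent function maps the current percept to an action; this program
# is one concrete implementation of that mapping (names are illustrative).
def hand_dryer_program(percept):
    if percept == 'hands detected':
        return 'heat on, blow air'
    return 'heat off, stop air'

print(hand_dryer_program('hands detected'))  # heat on, blow air
print(hand_dryer_program('no hands'))        # heat off, stop air
```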

Rationality of an agent
An intelligent agent is expected to act in a way that maximizes its performance measure.
Therefore, the rationality of an agent depends on four things:
 The performance measure which defines the criterion of success.
 The agent's built-in knowledge about the environment.
 The actions that the agent can perform.
 The agent’s percept sequence until now.

For example: score in exams depends on the question paper as well as our knowledge.



Note: Rationality maximizes the expected performance, while perfection maximizes the actual
performance which leads to omniscience.

Task Environment
A task environment is a problem to which a rational agent is designed as a solution. In 2003,
Russell and Norvig introduced several ways to classify task environments. However, before
classifying the environments, we should be aware of the following terms:

 Performance Measure: It specifies the criteria for measuring how successfully the agent
achieves its target.
 Environment: It specifies the interaction of the agent with different types of
environment.
 Actuators: It specifies the way the agent affects the environment by taking expected
actions.
 Sensors: It specifies the way the agent gets information from its environment.

These terms are known by the acronym PEAS (Performance measure, Environment, Actuators,
Sensors). To understand the PEAS terminology in more detail, let’s discuss each element in the
following example:

Agent Type | Performance Measure | Environment | Actuators | Sensors
Taxi Driver | Safe, fast, correct destination | Roads, traffic | Steering, horn, brakes | Cameras, GPS, speedometer

PEAS summary for an automated taxi driver

Properties/ Classification of Task Environment

Fully Observable vs. Partially Observable:


When an agent’s sensors allow access to the complete state of the environment at each point in
time, the task environment is fully observable, whereas if the agent does not have complete and
relevant information about the environment, the task environment is partially observable.
Example: In the Checker Game, the agent observes the environment completely while in Poker
Game, the agent partially observes the environment because it cannot see the cards of the other
agent.
Note: Fully Observable task environments are convenient as there is no need to maintain the
internal state to keep track of the world.

Single-agent vs. Multiagents


When a single agent works to achieve a goal, it is known as Single-agent, whereas when two or
more agents work together to achieve a goal, they are known as Multiagents.



Example: Playing a crossword puzzle – single agent
Playing chess – multiagent (requires two agents)

Deterministic vs. Stochastic


If the agent's current state and action completely determine the next state of the environment,
then the environment is deterministic whereas if the next state cannot be determined from the
current state and action, then the environment is Stochastic.

Example: Image analysis – Deterministic


Taxi driving – Stochastic (cannot determine the traffic behavior)
Note: If the environment is partially observable, it may appear to be stochastic.

Episodic vs. Sequential


If the agent’s experience is divided into atomic episodes and the next episode does not depend
on the actions taken in previous episodes, then the environment is episodic, whereas if current
actions may affect future decisions, the environment is sequential.

Example: Part-picking robot – Episodic


Chess playing – Sequential

Static vs. Dynamic


If the environment changes with time, such an environment is dynamic; otherwise, the
environment is static.
Example: Crosswords Puzzles have a static environment while the Physical world has a dynamic
environment.

Discrete vs. Continuous


If the environment has a finite number of actions and states, then it is discrete; otherwise, it is
continuous.
Example: In Checkers game, there is a finite number of moves – Discrete
A truck can have infinite moves while reaching its destination – Continuous.

Known vs. Unknown


In a known environment, the agent knows the outcomes of its actions, but in an unknown
environment, the agent needs to learn from the environment in order to make good decisions.
Example: A tennis player knows the rules and outcomes of its actions while a player needs to
learn the rules of a new video game.

Note: A known environment can be partially observable, and, conversely, an unknown
environment can be fully observable.

Structure of agents
The goal of artificial intelligence is to design an agent program which implements an agent
function, i.e., the mapping from percepts to actions. The program requires a computing device
with physical sensors and actuators for execution, which is known as the architecture.



Therefore, an agent is the combination of the architecture and the program i.e.
agent = architecture + program

Note: The difference between the agent program and agent function is that an agent program
takes the current percept as input, whereas an agent function takes the entire percept history.

Types of Agent Programs


Varying in the level of intelligence and complexity of the task, the following four types of agents
are there:
 Simple reflex agents: The simplest agent, which acts according to the current percept
only and pays no attention to the rest of the percept history. The agent function of this
type relies on the condition-action rule – "If condition, then action." It makes correct
decisions only if the environment is fully observable. These agents can fall into infinite
loops when the environment is partially observable, but may escape them if they
randomize their actions.

Example: iDraw, a drawing robot which converts the typed characters into writing without
storing the past data.
Note: Simple reflex agents do not maintain the internal state and do not depend on the percept
history.
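A simple reflex agent can be sketched with condition-action rules, as in the classic two-cell vacuum-cleaner world; this particular example is assumed here for illustration, not taken from the notes:

```python
# Condition-action rules for a two-cell vacuum world: suck if the current
# cell is dirty, otherwise move to the other cell. Only the current percept
# is used; no percept history is kept.
def reflex_vacuum_agent(percept):
    location, status = percept           # e.g. ('A', 'Dirty')
    if status == 'Dirty':                # "If condition, then action"
        return 'Suck'
    if location == 'A':
        return 'Right'
    return 'Left'

print(reflex_vacuum_agent(('A', 'Dirty')))  # Suck
print(reflex_vacuum_agent(('A', 'Clean')))  # Right
```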

 Model-based agents: These types of agents can handle partially observable environments
by maintaining some internal states. The internal state depends on the percept history,
which reflects at least some of the unobserved aspects of the current state. Therefore, as
time passes, the internal state needs to be updated, which requires two types of knowledge
or information to be encoded in an agent program, i.e., the evolution of the world on its
own and the effects of the agent's actions.

Example: When a person walks in a lane, he maps the pathway in his mind.
 Goal-based agents: It is not sufficient to have the current state information unless the
goal is decided. Therefore, a goal-based agent selects a way among multiple
possibilities that helps it reach its goal.

Note: With the help of searching and planning (subfields of AI), it becomes easy for the Goal-
based agent to reach its destination.



 Utility-based agents: These types of agents are concerned with the performance
measure. The agent selects those actions which maximize the performance measure and
move it towards the goal.

Example: The main goal of chess playing is to 'check-and-mate' the king, but the player
completes several small goals previously.

Note: Utility-based agents keep track of their environment, and before reaching the main goal,
they complete several smaller goals that may come along the path.

 Learning agents: These agents learn to operate in an unknown environment and gain as
much knowledge as they can. A learning agent is divided into four conceptual
components:

o Learning element: This element is responsible for making improvements.


o Performance element: It is responsible for selecting external actions according to
the percepts it takes.
o Critic: It provides feedback to the learning agent about how well the agent is
doing, which could maximize the performance measure in the future.
o Problem Generator: It suggests actions which could lead to new and informative
experiences.
Example: Humans learn to speak only after birth.
Note: The objective of a Learning agent is to improve the overall performance of the agent.



Working of an agent program’s components
The function of agent components is to answer some basic questions like “What is the world like
now?”, "what do my actions do?” etc.
We can represent the environment the agent inhabits in various ways, distinguished along an
axis of increasing expressive power and complexity, as discussed below:

 Atomic Representation: Here, we cannot divide each state of the world, so it does not
have any internal structure. Search and game-playing algorithms, hidden Markov models,
and Markov decision processes all work with the atomic representation.
 Factored Representation: Here, each state is split into a fixed set of attributes or
variables having a value. It allows us to represent uncertainty. Constraint satisfaction,
propositional logic, Bayesian networks, and machine learning algorithms work with the
Factored representation.

Note: Two different factored states can share some variables like current GPS location,
but two different atomic states cannot do so.

 Structured Representation: Here, we can explicitly describe various and varying
relationships between different objects which exist in the world. Relational databases,
first-order logic, first-order probability models, and natural language understanding
underlie the structured representation.



CONCEPT OF DEEP LEARNING

Deep Learning is computer software that mimics the network of neurons in a brain. It is a subset
of machine learning based on artificial neural networks with representation learning. It is called
deep learning because it makes use of deep neural networks. This learning can be supervised,
semi-supervised or unsupervised.

In deep learning, models use different layers to learn and discover insights from the data. Some
popular applications of deep learning are self-driving cars, language translation, natural language
processing, etc. Some popular deep learning models are:

o Convolutional Neural Network


o Recurrent Neural Network
o Autoencoders
o Classic Neural Networks, etc.

Deep learning drives many artificial intelligence (AI) applications and services that improve
automation, performing analytical and physical tasks without human intervention. Deep learning
technology lies behind everyday products and services (such as digital assistants, voice-enabled
TV remotes, and credit card fraud detection) as well as emerging technologies (such as self-
driving cars).

Deep learning vs. machine learning


If deep learning is a subset of machine learning, how do they differ? Deep learning distinguishes
itself from classical machine learning by the type of data that it works with and the methods in
which it learns.

Machine learning algorithms leverage structured, labeled data to make predictions—meaning


that specific features are defined from the input data for the model and organized into tables.
This doesn’t necessarily mean that it doesn’t use unstructured data; it just means that if it does, it
generally goes through some pre-processing to organize it into a structured format.

Deep learning eliminates some of the data pre-processing that is typically involved with machine
learning. These algorithms can ingest and process unstructured data, like text and images, and
automate feature extraction, removing some of the dependency on human experts. For example,
let’s say that we had a set of photos of different pets, and we wanted to categorize by “cat”,
“dog”, “hamster”, et cetera. Deep learning algorithms can determine which features (e.g. ears)
are most important to distinguish each animal from another. In machine learning, this hierarchy
of features is established manually by a human expert.

Then, through the processes of gradient descent and backpropagation, the deep learning
algorithm adjusts and fits itself for accuracy, allowing it to make predictions about a new photo
of an animal with increased precision.



Machine learning and deep learning models are capable of different types of learning as well,
which are usually categorized as supervised learning, unsupervised learning, and reinforcement
learning. Supervised learning utilizes labeled datasets to categorize or make predictions; this
requires some kind of human intervention to label input data correctly. In contrast, unsupervised
learning doesn’t require labeled datasets, and instead, it detects patterns in the data, clustering
them by any distinguishing characteristics. Reinforcement learning is a process in which a model
learns to become more accurate for performing an action in an environment based on feedback in
order to maximize the reward.

Parameter: Data Dependency
Machine Learning: Although machine learning depends on large amounts of data, it can work
with a smaller amount of data.
Deep Learning: Deep learning algorithms depend heavily on a large amount of data, so a large
amount of data must be fed for good performance.

Parameter: Execution Time
Machine Learning: A machine learning algorithm takes less time to train the model than deep
learning, but a long time to test the model.
Deep Learning: Deep learning takes a long time to train the model, but less time to test it.

Parameter: Hardware Dependencies
Machine Learning: Since machine learning models do not need much data, they can work on
low-end machines.
Deep Learning: Deep learning models need a huge amount of data to work efficiently, so they
need GPUs and hence high-end machines.

Parameter: Feature Engineering
Machine Learning: Machine learning models need a feature-extraction step performed by an
expert before training proceeds.
Deep Learning: Deep learning does not need a hand-crafted feature extractor for each problem;
it tries to learn high-level features from the data on its own.

Parameter: Problem-solving Approach
Machine Learning: To solve a given problem, the traditional ML model breaks the problem
into sub-parts, solves each part, and then produces the final result.
Deep Learning: The deep learning model takes the input for a given problem and produces the
end result; hence, it follows an end-to-end approach.

Parameter: Interpretation of Result
Machine Learning: The interpretation of the result for a given problem is easy; we can see why
a result occurred and what the process was.
Deep Learning: The interpretation of the result is very difficult; the model may give a better
result for a given problem than a machine learning model, but we cannot find why this
particular outcome occurred.

Parameter: Type of Data
Machine Learning: Machine learning models mostly require data in a structured form.
Deep Learning: Deep learning models can work with both structured and unstructured data, as
they rely on the layers of an artificial neural network.

Parameter: Suitable For
Machine Learning: Machine learning models are suitable for solving simple or moderately
complex problems.
Deep Learning: Deep learning models are suitable for solving complex problems.

Having seen a brief introduction to ML and DL with some comparisons, which one should be
chosen to solve a particular problem? This can be understood from the following flowchart:

Hence, if you have lots of data and high hardware capabilities, go with deep learning. But if you
don't have any of them, choose the ML model to solve your problem.



In conclusion, we can say that deep learning is machine learning with more capabilities and a
different working approach. Selecting either of them to solve a particular problem depends on
the amount of data and the complexity of the problem.

How Deep Learning Works


We can understand the working of deep learning with the same example of identifying cats vs.
dogs. The deep learning model takes the images as input and feeds them directly to the
algorithm without requiring any manual feature-extraction step. The images pass through the
different layers of the artificial neural network, which predicts the final output.

Consider the below image:

Deep learning algorithms are constructed with connected layers.


 The first layer is called the Input Layer
 The last layer is called the Output Layer
 All layers in between are called Hidden Layers. The word deep means the network joins
neurons in more than two layers.



Each hidden layer is composed of neurons, and the neurons are connected to each other. A
neuron processes the input signal it receives and then propagates it to the layer above it. The
strength of the signal passed to the neuron in the next layer depends on the weight, bias and
activation function.
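The signal strength described above can be sketched for a single neuron; the inputs, weights, and bias below are arbitrary illustrative numbers:

```python
import math

def sigmoid(x):
    # A common activation function, squashing any input into (0, 1).
    return 1.0 / (1.0 + math.exp(-x))

inputs  = [0.5, -1.0, 2.0]   # signals arriving from the previous layer
weights = [0.4, 0.6, -0.2]   # one weight per incoming connection
bias    = 0.1

z = sum(w * x for w, x in zip(weights, inputs)) + bias  # weighted sum + bias
activation = sigmoid(z)       # strength of the signal passed on
print(round(activation, 4))   # 0.3318
```

Each neuron in a hidden layer performs exactly this computation on the outputs of the layer before it.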

The network consumes large amounts of input data and processes them through multiple layers;
it can learn increasingly complex features of the data at each layer.

Deep learning Process


A deep neural network provides state-of-the-art accuracy in many tasks, from object detection to
speech recognition. They can learn automatically, without predefined knowledge explicitly
coded by the programmers.

To grasp the idea of deep learning, imagine a family with an infant and parents. The toddler
points at objects with his little finger and always says the word ‘cat.’ As his parents are
concerned about his education, they keep telling him ‘Yes, that is a cat’ or ‘No, that is not a cat.’
The infant keeps pointing at objects but becomes more accurate about ‘cats.’ The little kid, deep
down, does not know why he can say whether something is a cat or not. He has simply learned
to build a hierarchy of the complex features that make up a cat, looking at the pet as a whole and
then focusing on details such as the tail or the nose before making up his mind.

A neural network works in much the same way. Each layer represents a deeper level of knowledge,
i.e., the hierarchy of knowledge. A neural network with four layers will learn more complex
features than one with two layers.

The learning occurs in two phases:

First Phase: The first phase consists of applying a nonlinear transformation to the input and
creating a statistical model as output.

Second Phase: The second phase aims at improving the model with a mathematical method based
on derivatives (gradient descent).

The neural network repeats these two phases hundreds to thousands of times until it has reached
a tolerable level of accuracy. Each repetition of the two phases is called an iteration.
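The two phases can be sketched on a hypothetical one-neuron model, y_hat = sigmoid(w * x + b). The dataset, learning rate, and iteration count below are made up purely for illustration:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Tiny illustrative dataset: label is 1 when x is positive.
data = [(-2.0, 0), (-1.0, 0), (1.0, 1), (2.0, 1)]

w, b = 0.0, 0.0
lr = 0.5                       # learning rate
for iteration in range(1000):  # each pass over the two phases is one iteration
    for x, y in data:
        y_hat = sigmoid(w * x + b)  # first phase: nonlinear transformation of the input
        grad = y_hat - y            # second phase: derivative of the loss
        w -= lr * grad * x          # derivative-based (gradient descent) update
        b -= lr * grad

print(round(sigmoid(w * 2.0 + b)))   # close to 1
print(round(sigmoid(w * -2.0 + b)))  # close to 0
```

After enough iterations, the predictions settle near the true labels, which is exactly the "tolerable level of accuracy" the text describes.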



To give a deep learning example, consider a model trying to learn how to dance from motion data.
After only 10 minutes of training, the model does not know how to dance, and its movements look
like a scribble; with more iterations, they gradually improve.

The above describes the simplest type of deep neural network in the simplest terms. However,
deep learning algorithms are incredibly complex, and there are different types of neural networks
to address specific problems or datasets. For example,

 Convolutional neural networks (CNNs), used primarily in computer vision and image
classification applications, can detect features and patterns within an image, enabling
tasks like object detection or recognition.
 Recurrent neural networks (RNNs) are typically used in natural language and speech
recognition applications, as they leverage sequential or time-series data.

Classification of Neural Networks

Shallow neural network: The Shallow neural network has only one hidden layer between the
input and output.

Deep neural network: Deep neural networks have more than one hidden layer. For instance, the
GoogLeNet model for image recognition counts 22 layers.

Nowadays, deep learning is used in many applications, such as driverless cars, mobile phones, the
Google search engine, fraud detection, TVs, and so on.

Types of Deep Learning Networks


We will now look at the main types of deep learning networks:



Feed-forward Neural Networks
The feed-forward network is the simplest type of artificial neural network. With this architecture,
information flows in only one direction: forward. The flow of information starts at the input layer,
goes through the “hidden” layers, and ends at the output layer. The network does not contain a
loop; information stops at the output layer.
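A minimal sketch of this one-way flow, assuming a hypothetical 2-3-1 network with arbitrary illustrative weights (not learned from data):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(x, w_hidden, b_hidden, w_out, b_out):
    # Hidden layer: weighted sum of the inputs, then an activation function.
    hidden = [sigmoid(sum(w * xi for w, xi in zip(ws, x)) + b)
              for ws, b in zip(w_hidden, b_hidden)]
    # Output layer: weighted sum of hidden activations, then activation.
    # Information moves input -> hidden -> output, with no loops.
    return sigmoid(sum(w * h for w, h in zip(w_out, hidden)) + b_out)

# Arbitrary illustrative weights and biases.
w_hidden = [[0.5, -0.2], [0.3, 0.8], [-0.6, 0.1]]
b_hidden = [0.0, 0.1, -0.1]
w_out, b_out = [0.7, -0.4, 0.2], 0.05

y = forward([1.0, 2.0], w_hidden, b_hidden, w_out, b_out)
print(0.0 < y < 1.0)  # a sigmoid output always lies between 0 and 1
```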

Recurrent Neural Networks (RNNs)


An RNN is a multi-layered neural network that can store information in context nodes, allowing it
to learn data sequences and output a number or another sequence. In simple words, it is an artificial
neural network whose connections between neurons include loops. RNNs are well suited for
processing sequences of inputs.

Recurrent Neural Networks

For example, suppose the task is to predict the next word in the sentence “Do you want a …?”

 The RNN neurons will receive a signal that points to the start of the sentence.
 The network receives the word “Do” as an input and produces a vector of numbers.
This vector is fed back to the neuron to provide a memory to the network. This stage
helps the network remember that it received “Do,” and that it received it in the first position.
 The network proceeds similarly to the next words. It takes the words “you” and
“want,” and the state of the neurons is updated upon receiving each word.
 The final stage occurs after receiving the word “a.” The neural network will provide a
probability for each English word that could complete the sentence. A well-
trained RNN probably assigns a high probability to “café,” “drink,” “burger,” etc.
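The memory mechanism above can be sketched with a hypothetical single-unit RNN cell, h_t = tanh(w_x * x_t + w_h * h_{t-1} + b). The weights and the numeric encoding of each word below are made up for illustration:

```python
import math

def rnn_step(x, h_prev, w_x=0.5, w_h=0.9, b=0.1):
    # The new hidden state mixes the current input with the previous state,
    # which is how the network "remembers" earlier words.
    return math.tanh(w_x * x + w_h * h_prev + b)

# Illustrative encoding: each word is mapped to an arbitrary number.
sentence = {"Do": 0.1, "you": 0.4, "want": 0.7, "a": 0.2}

h = 0.0  # initial state: the signal pointing to the start of the sentence
for word, x in sentence.items():
    h = rnn_step(x, h)  # the state is updated upon receiving each word

print(-1.0 < h < 1.0)  # tanh keeps the hidden state bounded
```

In a real RNN, `h` would be a vector fed into a softmax layer that assigns a probability to each candidate next word.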

Common uses of RNN

 Help securities traders to generate analytic reports



 Detect abnormalities in contracts or financial statements
 Detect fraudulent credit-card transactions
 Provide captions for images
 Power chatbots
 More generally, RNNs are the standard choice when practitioners work with time-series
data or sequences (e.g., audio recordings or text).

Convolutional Neural Networks (CNN)


A CNN is a multi-layered neural network with a unique architecture designed to extract
increasingly complex features of the data at each layer in order to determine the output. CNNs are
well suited for perceptual tasks.

Convolutional Neural Network

A CNN is mostly used when there is an unstructured dataset (e.g., images) from which the
practitioner needs to extract information.

For instance, if the task is to predict an image caption:


 The CNN receives an image of, let’s say, a cat. In computer terms, this image is a
collection of pixels: generally, one channel for a grayscale picture and three channels for
a color picture.
 During feature learning (i.e., in the hidden layers), the network identifies unique features,
for instance, the tail of the cat, the ears, etc.
 Once the network has thoroughly learned how to recognize a picture, it can provide a
probability for each class it knows. The label with the highest probability becomes
the prediction of the network.
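The feature-extraction step can be sketched with a single convolution: a tiny hypothetical grayscale image (one channel) slid under a 3×3 vertical-edge kernel to produce a feature map. Image and kernel values are made up:

```python
# A 4x6 grayscale "image": a dark region on the left, a bright region on the right.
image = [
    [0, 0, 0, 9, 9, 9],
    [0, 0, 0, 9, 9, 9],
    [0, 0, 0, 9, 9, 9],
    [0, 0, 0, 9, 9, 9],
]

kernel = [  # responds strongly to vertical edges
    [-1, 0, 1],
    [-1, 0, 1],
    [-1, 0, 1],
]

def convolve(img, ker):
    kh, kw = len(ker), len(ker[0])
    out = []
    for i in range(len(img) - kh + 1):
        row = []
        for j in range(len(img[0]) - kw + 1):
            # Element-wise product of the kernel and the image patch, then sum.
            row.append(sum(ker[a][b] * img[i + a][j + b]
                           for a in range(kh) for b in range(kw)))
        out.append(row)
    return out

feature_map = convolve(image, kernel)
# The strongest responses sit on the boundary between dark and bright regions,
# i.e. exactly the "feature" this kernel detects.
print(feature_map[0])  # -> [0, 27, 27, 0]
```

A real CNN learns many such kernels from data; later layers combine their feature maps into higher-level features like "tail" or "ear".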

Reinforcement Learning
Reinforcement learning is a subfield of machine learning in which systems are trained by
receiving virtual “rewards” or “punishments,” essentially learning by trial and error. Google’s
DeepMind has used reinforcement learning to beat a human champion at the game of Go.



Reinforcement learning is also used in video games to improve the gaming experience by
providing smarter bots.

The most famous algorithms are:


 Q-learning
 Deep Q network
 State-Action-Reward-State-Action (SARSA)
 Deep Deterministic Policy Gradient (DDPG)
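The trial-and-error idea behind Q-learning can be sketched on a made-up, trivial environment: from a single state, the action "right" earns a reward of 1 and "left" earns 0. All parameter values are illustrative:

```python
import random

random.seed(0)
alpha, epsilon = 0.5, 0.2            # learning rate and exploration rate
Q = {0: {"left": 0.0, "right": 0.0}}  # Q-values for the single decision state

for episode in range(200):
    # Epsilon-greedy: mostly exploit the best-known action, sometimes explore.
    if random.random() < epsilon:
        action = random.choice(["left", "right"])
    else:
        action = max(Q[0], key=Q[0].get)
    reward = 1.0 if action == "right" else 0.0
    # Q-learning update; each episode ends after one step, so the discounted
    # future-value term (gamma * max over next-state Q-values) is 0 here.
    Q[0][action] += alpha * (reward - Q[0][action])

print(max(Q[0], key=Q[0].get))  # -> right
```

After enough episodes of trial and error, the virtual reward steers the agent toward the "right" action.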

Examples of deep learning applications

AI in Finance:
The financial technology sector has already started using AI to save time, reduce costs, and add
value. Deep learning is changing the lending industry by using more robust credit scoring. Credit
decision-makers can use AI for robust credit lending applications to achieve faster, more
accurate risk assessment, using machine intelligence to factor in the character and capacity of
applicants.

AI in Marketing:
AI is a valuable tool for customer service management and personalization challenges. Improved
speech recognition in call-center management and call routing as a result of the application of AI
techniques allows a more seamless experience for customers.

For example, deep-learning analysis of audio allows systems to assess a customer’s emotional
tone. If the customer is responding poorly to the AI chatbot, the system can reroute the
conversation to real, human operators who take over the issue.

Apart from the deep learning examples above, AI is widely used in other sectors/industries.

Why is Deep Learning Important?


Deep learning is a powerful tool for turning predictions into actionable results. Deep learning
excels in pattern discovery (unsupervised learning) and knowledge-based prediction. Big data is
the fuel for deep learning; when the two are combined, an organization can reap unprecedented
results in terms of productivity, sales, management, and innovation.

Deep learning can outperform traditional methods. For instance, deep learning algorithms have
been reported to be 41% more accurate than machine learning algorithms in image classification,
27% more accurate in facial recognition, and 25% more accurate in voice recognition.



Limitations of deep learning

Data labeling
Most current AI models are trained through “supervised learning.” It means that humans must
label and categorize the underlying data, which can be a sizable and error-prone chore. For
example, companies developing self-driving-car technologies are hiring hundreds of people to
manually annotate hours of video feeds from prototype vehicles to help train these systems.
Obtaining huge training datasets
It has been shown that simple deep learning techniques like CNN can, in some cases, imitate the
knowledge of experts in medicine and other fields. The current wave of machine learning,
however, requires training data sets that are not only labeled but also sufficiently broad and
universal.

Deep-learning methods require thousands of observations for models to become relatively good
at classification tasks and, in some cases, millions for them to perform at the level of humans.
Unsurprisingly, deep learning is popular in giant tech companies: they use big data to
accumulate petabytes of data, which allows them to create impressive and highly accurate deep
learning models.
Explaining a model's decisions
Large and complex models can be hard to explain in human terms, for instance, why a
particular decision was reached. This is one reason why the acceptance of some AI tools is slow in
application areas where interpretability is useful or indeed required.

Furthermore, as the application of AI expands, regulatory requirements could also drive the need
for more explainable AI models.

Summary
Deep Learning Overview: Deep learning is the new state-of-the-art for artificial intelligence.
Deep learning architecture is composed of an input layer, hidden layers, and an output layer. The
word deep means there are more than two fully connected layers.

There is a vast number of neural network architectures, each designed to perform a given task.
For instance, CNNs work very well with pictures, while RNNs provide impressive results with
time series and text analysis.

Deep learning is now active in different fields, from finance to marketing and supply chain
management. Big firms were the first to adopt deep learning because they already have a large
pool of data, and deep learning requires an extensive training dataset.



MODEL PERFORMANCE MEASURE

Most model-performance measures are based on comparing the model's predictions with the
(known) values of the dependent variable in a dataset. For an ideal model, the predictions and
the dependent-variable values would be equal. In practice, this is never the case, and we want to
quantify the disagreement.

The measures may be applied for several purposes, including:

 Model evaluation: we may want to know how good the model is, i.e., how reliable its
predictions are (how frequent and how large may the errors be);
 Model comparison: we may want to compare two or more models in order to choose between
them;
 Out-of-sample and out-of-time comparisons: we may want to check a model’s performance
on new data to verify that performance has not worsened.

Depending on the nature of the dependent variable (continuous, binary, categorical, count,
etc.), different model-performance measures may be used. Moreover, the list of useful
measures is growing as new applications emerge. Here, we will discuss only a selected set of
measures, some of which are used in dataset-level exploration techniques.

Confusion Matrix in Machine Learning


The confusion matrix is a matrix used to determine the performance of a classification model
for a given set of test data. It can only be determined if the true values for the test data are known.
The matrix itself is easy to understand, but the related terminology may be confusing. Since
it shows the errors in the model's performance in the form of a matrix, it is also known as
an error matrix. Some features of the confusion matrix are given below:

o For a classifier with 2 prediction classes, the matrix is a 2×2 table; for 3 classes, a
3×3 table, and so on.
o The matrix is divided into two dimensions that are predicted values and actual
values along with the total number of predictions.
o Predicted values are those values, which are predicted by the model, and actual values are
the true values for the given observations.
o It looks like the table below:

                     Actual: Yes            Actual: No
  Predicted: Yes     True Positive (TP)     False Positive (FP)
  Predicted: No      False Negative (FN)    True Negative (TN)



The above table has the following cases:

o True Negative (TN): The model predicted No, and the real or actual value was also No.
o True Positive (TP): The model predicted Yes, and the actual value was also Yes.
o False Negative (FN): The model predicted No, but the actual value was Yes; this is also
called a Type-II error.
o False Positive (FP): The model predicted Yes, but the actual value was No; this is also
called a Type-I error.
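The four cells can be sketched by comparing predicted values against actual values; the two lists below are made-up illustrative data:

```python
actual    = ["Yes", "Yes", "No", "No", "Yes", "No", "No", "Yes"]
predicted = ["Yes", "No",  "No", "Yes", "Yes", "No", "No", "Yes"]

# Count each confusion-matrix cell by comparing prediction to truth.
tp = sum(1 for a, p in zip(actual, predicted) if a == "Yes" and p == "Yes")
tn = sum(1 for a, p in zip(actual, predicted) if a == "No" and p == "No")
fp = sum(1 for a, p in zip(actual, predicted) if a == "No" and p == "Yes")   # Type-I error
fn = sum(1 for a, p in zip(actual, predicted) if a == "Yes" and p == "No")   # Type-II error

print(tp, fp, fn, tn)  # -> 3 1 1 3
```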

Need for Confusion Matrix in Machine learning


o It evaluates the performance of classification models when they make predictions on
test data, and tells how good our classification model is.
o It tells not only the errors made by the classifier but also the type of error, i.e., whether
it is a type-I or type-II error.
o With the help of the confusion matrix, we can calculate different parameters for the
model, such as accuracy, precision, etc.

Example: We can understand the confusion matrix using an example.

Suppose we are trying to create a model that can predict whether or not a person has a certain
disease. The confusion matrix for this (with cell values consistent with the totals below) is:

                     Actual: Yes    Actual: No
  Predicted: Yes     TP = 24        FP = 8
  Predicted: No      FN = 3         TN = 65

From the above example, we can conclude that:


o The table is given for a two-class classifier, which has two predictions, "Yes" and "No."
Here, Yes means the patient has the disease, and No means the patient does not have
the disease.
o The classifier has made a total of 100 predictions. Out of 100 predictions, 89 are correct
predictions, and 11 are incorrect predictions.
o The model predicted "Yes" 32 times and "No" 68 times, whereas the
actual "Yes" occurred 27 times and the actual "No" 73 times.

Calculations using Confusion Matrix:


We can perform various calculations for the model, such as the model's accuracy, using this
matrix. These calculations are given below:



o Classification Accuracy: It is one of the most important parameters for classification
problems. It defines how often the model predicts the correct output, and it is calculated
as the ratio of the number of correct predictions made by the classifier to the total
number of predictions:
Accuracy = (TP + TN) / (TP + TN + FP + FN)

o Misclassification rate: Also termed the error rate, it defines how often the model
gives wrong predictions. It is calculated as the ratio of the number of incorrect
predictions to the total number of predictions:
Error rate = (FP + FN) / (TP + TN + FP + FN) = 1 − Accuracy

o Precision: Out of all instances the model predicted as positive, how many were actually
positive:
Precision = TP / (TP + FP)

o Recall: Out of all actual positive instances, how many the model predicted correctly.
The recall should be as high as possible:
Recall = TP / (TP + FN)

o F-measure: If one model has low precision and high recall or vice versa, it is difficult
to compare models on either metric alone. For this purpose, we use the F-score, which
evaluates recall and precision at the same time. The F-score is maximal when recall
equals precision:
F-measure = 2 × (Precision × Recall) / (Precision + Recall)
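As a sketch, the metrics can be computed from cell counts consistent with the disease example above (100 predictions, 32 predicted Yes, 27 actual Yes, 89 correct), which work out to TP=24, FP=8, FN=3, TN=65:

```python
tp, fp, fn, tn = 24, 8, 3, 65
total = tp + fp + fn + tn

accuracy  = (tp + tn) / total      # 89 of 100 predictions correct
error     = (fp + fn) / total      # misclassification rate = 1 - accuracy
precision = tp / (tp + fp)         # predicted Yes that were truly Yes
recall    = tp / (tp + fn)         # actual Yes that the model caught
f_measure = 2 * precision * recall / (precision + recall)

print(accuracy)              # -> 0.89
print(round(precision, 2))   # -> 0.75
print(round(recall, 2))      # -> 0.89
print(round(f_measure, 2))   # -> 0.81
```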

From the confusion matrix, sensitivity and specificity are also evaluated, using the following
equations:

 Sensitivity (true positive rate) refers to the probability of a positive test, conditioned on truly
being positive: Sensitivity = TP / (TP + FN)
 Specificity (true negative rate) refers to the probability of a negative test, conditioned on
truly being negative: Specificity = TN / (TN + FP)



Cross-Validation in Machine Learning

Cross-validation is a technique for validating model efficiency by training the model on a subset of
the input data and testing it on a previously unseen subset of the input data. We can also say that
it is a technique to check how a statistical model generalizes to an independent dataset.

In machine learning, there is always a need to test the stability of a model; we cannot judge a
model based only on how it fits the training dataset. For this purpose, we reserve a particular
sample of the dataset which is not part of the training dataset. After training, we test our model
on that sample before deployment, and this complete process comes under cross-validation.
This is different from the general train-test split.
Hence the basic steps of cross-validation are:
o Reserve a subset of the dataset as a validation set.
o Train the model using the training dataset.
o Evaluate model performance using the validation set. If the model performs well
on the validation set, proceed to the next step; otherwise, check for issues.
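The basic steps can be sketched as follows; the "model" here is a hypothetical majority-class classifier on made-up data, used only to keep the sketch runnable:

```python
import random

random.seed(42)
data = [(x, int(x > 50)) for x in range(100)]   # made-up (feature, label) pairs
random.shuffle(data)

validation = data[:20]   # step 1: reserve a subset as the validation set
training = data[20:]     # step 2: train on the remaining data

# "Training": learn the majority label in the training set.
majority = round(sum(label for _, label in training) / len(training))

# Step 3: evaluate performance on the reserved validation set.
accuracy = sum(1 for _, label in validation if label == majority) / len(validation)
print(0.0 <= accuracy <= 1.0)
```

If the validation accuracy were poor, this is the point where we would go back and check the model or the data before deployment.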

Methods used for Cross-Validation


There are some common methods that are used for cross-validation. These methods are given
below:
1. Validation Set Approach
2. Leave-P-out cross-validation
3. Leave one out cross-validation
4. K-fold cross-validation
5. Stratified k-fold cross-validation
6. Holdout Method

Validation Set Approach


In the validation set approach, we divide our input dataset into a training set and a test (validation)
set, each receiving 50% of the dataset. A big disadvantage is that we use only 50% of the data
to train the model, so the model may miss important information in the dataset. It also tends to
produce an underfitted model.

Leave-P-out cross-validation
In this approach, p data points are left out of the training data. If there are n data points in the
original input dataset, then n − p data points are used as the training set and the p data points as
the validation set. This complete process is repeated for all possible combinations, and the average
error is calculated to determine the effectiveness of the model.



A disadvantage of this technique is that it can be computationally expensive for large p.
Leave-one-out cross-validation
This method is similar to leave-p-out cross-validation, but instead of p points, we take only 1
data point out of training. In this approach, for each learning set, only one data point is
reserved, and the remaining dataset is used to train the model. The process repeats for each
data point; hence for n samples, we get n different training sets and n test sets. It has the following
features:
o The bias is minimal, as all the data points are used.
o The process is executed n times; hence the execution time is high.
o This approach leads to high variation in testing the effectiveness of the model, as we
iteratively check against a single data point.

K-Fold Cross-Validation
The k-fold cross-validation approach divides the input dataset into k groups of samples of equal
size, called folds. For each learning set, k − 1 folds are used for training and the remaining fold
is used as the test set. This is a very popular CV approach because it is easy to understand and
the output is less biased than with other methods.
The steps for k-fold cross-validation are:
o Split the input dataset into K groups
o For each group:
o Take one group as the reserve or test data set.
o Use remaining groups as the training dataset
o Fit the model on the training set and evaluate the performance of the model using
the test set.

Let's take an example of 5-fold cross-validation: the dataset is grouped into 5 folds. On the
1st iteration, the first fold is reserved for testing the model, and the rest are used for training. On
the 2nd iteration, the second fold is used to test the model, and the rest are used for training. This
process continues until each fold has been used as the test fold.
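The 5-fold scheme can be sketched with index-based splits; the dataset size (25 points) is arbitrary:

```python
def k_fold_splits(n, k):
    # Split indices 0..n-1 into k equal folds; each fold serves as the test
    # set exactly once while the remaining folds form the training set.
    indices = list(range(n))
    fold_size = n // k
    splits = []
    for i in range(k):
        test = indices[i * fold_size:(i + 1) * fold_size]
        train = indices[:i * fold_size] + indices[(i + 1) * fold_size:]
        splits.append((train, test))
    return splits

splits = k_fold_splits(n=25, k=5)
for train, test in splits:
    print(len(train), len(test))   # -> 20 5, on every iteration

# Across the 5 folds, every observation is used for testing exactly once.
all_test = sorted(i for _, test in splits for i in test)
print(all_test == list(range(25)))  # -> True
```

In practice the data would be shuffled before splitting; the model is fit on each training set and scored on the matching test fold, and the k scores are averaged.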



Stratified k-fold cross-validation
This technique is similar to k-fold cross-validation, with some small changes. It relies on the
concept of stratification: the data are rearranged so that each fold or group is a good
representative of the complete dataset. It is one of the best approaches for dealing with bias
and variance.

It can be understood with the example of housing prices: the price of some houses can be much
higher than that of others. To handle such situations, a stratified k-fold cross-validation
technique is useful.

Holdout Method
This is the simplest cross-validation technique of all. In this method, we hold out a subset of the
dataset and train the model on the remaining part; the held-out subset is then used to obtain
prediction results.
The error that occurs in this process tells how well our model will perform with unknown
data. Although this approach is simple to perform, it suffers from high variance, and it
sometimes produces misleading results.

Comparison of Cross-validation to train/test split in Machine Learning


o Train/test split: The input data is divided into two parts, a training set and a test set, in
a ratio such as 70:30 or 80:20. Its biggest disadvantage is that it provides a high-variance
estimate of performance.
o Training Data: The training data is used to train the model, and the dependent
variable is known.
o Test Data: The test data is used to make predictions from the model that has
already been trained on the training data. It has the same features as the training
data but is not part of it.
o Cross-Validation dataset: It is used to overcome the disadvantage of train/test split by
splitting the dataset into groups of train/test splits and averaging the result. It can be used
if we want to optimize a model, trained on the training dataset, for the best performance.
It is more efficient than a single train/test split, as every observation is used for both
training and testing.

Limitations of Cross-Validation
There are some limitations of the cross-validation technique, which are given below:
o Under ideal conditions, it provides optimal output, but for inconsistent data it may
produce drastically poor results. This is one of the big disadvantages of cross-validation,
as there is no certainty about the type of data encountered in machine learning.
o In predictive modeling, the data evolve over time, which may cause differences between
the training and validation sets. For example, if we create a model to predict stock
market values and train it on the previous 5 years of stock values, the realistic future
values for the next 5 years may be drastically different, so it is difficult to expect correct
output in such situations.

Applications of Cross-Validation
o This technique can be used to compare the performance of different predictive modeling
methods.
o It has great scope in the medical research field.
o It can also be used for meta-analysis, and it is already being used by data scientists
in the field of medical statistics.

Area Under Curve - Receiver Operating Characteristic (AUC-ROC) Curve in Machine Learning

In machine learning, developing a model is not sufficient on its own; we also need to see
whether it performs well. This means that after building an ML model, we need to
evaluate and validate how good (or bad) it is, and for such cases we use different evaluation
metrics. The AUC-ROC curve is one such evaluation metric, used to visualize the performance
of a classification model. It is one of the most popular and important metrics for evaluating the
performance of a classification model. In this topic, we discuss the AUC-ROC curve in
more detail.

What is AUC-ROC Curve?


AUC-ROC curve is a performance measurement metric of a classification model at different
threshold values. Firstly, let's understand ROC (Receiver Operating Characteristic curve) curve.

ROC Curve
ROC or Receiver Operating Characteristic curve represents a probability graph to show the
performance of a classification model at different threshold levels. The curve is plotted between
two parameters, which are:
o True Positive Rate or TPR
o False Positive Rate or FPR



In the curve, TPR is plotted on Y-axis, whereas FPR is on the X-axis.

TPR:
TPR or True Positive Rate is a synonym for recall, and can be calculated as:
TPR = TP / (TP + FN)

FPR:
FPR or False Positive Rate can be calculated as:
FPR = FP / (FP + TN)

Here, TP: True Positive


FP: False Positive
TN: True Negative
FN: False Negative

Now, to summarize performance efficiently across all threshold levels, we need a single measure,
which is the AUC.

AUC: Area Under the ROC curve


AUC stands for Area Under the ROC Curve. As its name suggests, AUC measures the two-
dimensional area under the entire ROC curve, ranging from (0,0) to (1,1).

From the ROC curve, AUC summarizes the performance of the binary classifier across all
thresholds into one aggregate measure. The value of AUC ranges from 0 to 1; an excellent
model will have an AUC near 1, indicating a good measure of separability.
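The construction can be sketched directly: sweep the classification threshold over the predicted scores, record (FPR, TPR) at each threshold, then integrate the curve with the trapezoidal rule. The scores and labels below are made up:

```python
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.5, 0.4, 0.3, 0.2, 0.1]
labels = [1,   1,   0,   1,   1,    0,   0,   1,   0,   0]   # 5 positives, 5 negatives

pos = sum(labels)
neg = len(labels) - pos

# One ROC point per threshold: predict positive whenever score >= threshold.
points = [(0.0, 0.0)]
for t in sorted(set(scores), reverse=True):
    tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
    fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
    points.append((fp / neg, tp / pos))   # (FPR on x-axis, TPR on y-axis)

# Trapezoidal rule over the ROC points, from (0,0) to (1,1).
auc = sum((x2 - x1) * (y1 + y2) / 2
          for (x1, y1), (x2, y2) in zip(points, points[1:]))
print(round(auc, 2))  # -> 0.8
```

Note that the result depends only on how the scores rank the positives above the negatives, not on their absolute values, which is the scale-invariance property discussed below.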



When to Use AUC-ROC
AUC is preferred in the following cases:
o AUC is used to measure how well the predictions are ranked instead of giving their
absolute values. Hence, we can say AUC is Scale-Invariant.
o It measures the quality of predictions of the model without considering the selected
classification threshold. It means AUC is classification-threshold-invariant.
When not to use AUC-ROC
o AUC is not preferable when we need to calibrate probability output.
o Further, AUC is not a useful metric when there are wide disparities in the cost of false
negatives vs false positives, and it is difficult to minimize one type of classification error.

How AUC-ROC curve can be used for the Multi-class Model?


Although the AUC-ROC curve is most commonly used for binary classification problems, we can
also use it for multi-class classification problems. For a multi-class problem, we can plot N
AUC-ROC curves for the N classes using the One-vs-All method.

For example, if we have three classes, X, Y, and Z, we plot one curve for X
against Y and Z, a second for Y against X and Z, and a third for Z against X and Y.

Applications of AUC-ROC Curve


Besides its general role in evaluating classification models, the AUC-ROC curve is widely used
in various applications. Some of the important applications of AUC-ROC are given below:

1. Classification of 3D models
The curve can be used to classify 3D models and separate them from normal models. With
a specified threshold level, the curve separates 3D models from non-3D models.
2. Healthcare
The curve has various applications in the healthcare sector. It can be used to detect cancer
disease in patients. It does this by using false positive and false negative rates, and
accuracy depends on the threshold value used for the curve.
3. Binary Classification
AUC-ROC curve is mainly used for binary classification problems to evaluate their
performance.

