CSC422: Introduction To Artificial Intelligence Lecture Notes Page 1
INTRODUCTION TO
ARTIFICIAL INTELLIGENCE
Goals of AI
To Create Expert Systems − Systems that exhibit intelligent behavior and that learn, demonstrate, explain, and advise their users.
To Implement Human Intelligence in Machines − Creating systems that understand,
think, learn, and behave like humans.
What Contributes to AI?
Artificial intelligence is a science and technology based on disciplines such as Computer
Science, Biology, Psychology, Linguistics, Mathematics, and Engineering. A major thrust of AI
is in the development of computer functions associated with human intelligence, such as
reasoning, learning, and problem solving.
One or more of the following areas can contribute to building an intelligent system.
Programming without AI: Modification is not quick and easy; it may affect the program adversely.
Programming with AI: Quick and easy program modification.
Applications of AI
AI has been dominant in various fields such as −
Gaming − AI plays a crucial role in strategic games such as chess, poker, and tic-tac-toe, where a machine can think through a large number of possible positions based on heuristic knowledge.
Natural Language Processing − It is possible to interact with a computer that understands the natural language spoken by humans.
Expert Systems − Some applications integrate machines, software, and special information to provide reasoning and advice. They give explanations and advice to their users.
Vision Systems − These systems understand, interpret, and comprehend visual input on
the computer. For example,
o A spying airplane takes photographs, which are used to figure out spatial information or a map of the area.
o Doctors use a clinical expert system to diagnose patients.
o Police use computer software that can recognize the face of a criminal using a stored portrait made by a forensic artist.
Speech Recognition − Some intelligent systems are capable of hearing and comprehending language in terms of sentences and their meanings while a human talks to them.
Intelligent System
An intelligent system is one that can calculate, reason, perceive relationships and analogies, learn from experience, store and retrieve information from memory, solve problems, comprehend complex ideas, use natural language fluently, classify, generalize, and adapt to new situations.
Types of Intelligence
As described by Howard Gardner, an American developmental psychologist, intelligence comes in multiple forms.
Reasoning − It is the set of processes that enables us to provide a basis for judgment, decision-making, and prediction. There are broadly two types −
Inductive reasoning: even if all of the premises in a statement are true, the conclusion may still be false.
Deductive reasoning: if something is true of a class of things in general, it is also true for all members of that class.
1. Expert Systems − Examples: flight-tracking systems, clinical systems.
2. Natural Language Processing − Examples: the Google Now feature, speech recognition, automatic voice output.
3. Neural Networks − Examples: pattern recognition systems such as face recognition, character recognition, and handwriting recognition.
5. Fuzzy Logic Systems − Examples: consumer electronics, automobiles, etc.
Characteristics of Problems in AI
Each problem given to an AI has different aspects of representation and explanation. A given problem has to be analyzed along several dimensions to choose the most suitable solution method. Some key features of a problem are listed below.
Toy Problem:
A toy problem is a puzzle-like problem used to explain a more general problem-solving technique. Toy problems can be used for testing and demonstrating methodologies, or to compare the performance of different algorithms. They are often useful in providing intuition about specific phenomena in complicated problems, and large complicated problems can be divided into many smaller toy problems that are easy to understand in detail. Sliding-block puzzles, the N-Queens problem, and the Tower of Hanoi are some examples.
Real-World Problem:
As the name suggests, these are problems based on the real world. Real-world problems require a solution; they do not depend on artificial descriptions but need a general formulation. Online shopping, fraud detection, and medical diagnosis are some examples of real-world problems in AI.
Goal formulation: The first step in problem-solving is to identify the problem. It involves formulating a goal, selecting it from among multiple possible goals, and choosing actions to achieve it.
Problem formulation: The most important step in problem-solving: deciding what actions should be taken to achieve the formulated goal.
Initial State: The starting state of the agent towards the goal.
Actions: The list of the possible actions available to the agent.
Transition Model: A description of what each action does.
Goal Test: A test of whether a given state is a goal state.
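These components can be captured in a small Python class. The "number line" problem below is a hypothetical toy example (start at 0 and reach the value 5 by stepping +1 or -1), used only to illustrate the structure of a problem formulation:

```python
# A minimal sketch of the problem components described above:
# initial state, actions, transition model, and goal test.
# The "number line" problem is a hypothetical toy example.

class NumberLineProblem:
    def __init__(self, initial=0, goal=5):
        self.initial = initial            # Initial State
        self.goal = goal

    def actions(self, state):             # Actions available in a state
        return ["+1", "-1"]

    def result(self, state, action):      # Transition Model
        return state + 1 if action == "+1" else state - 1

    def goal_test(self, state):           # Goal Test
        return state == self.goal

problem = NumberLineProblem()
state = problem.initial
while not problem.goal_test(state):
    state = problem.result(state, "+1")   # always step towards the goal
print(state)  # reaches the goal state 5
```

Any search algorithm discussed later only needs these four pieces to explore the state space.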
The problem here is to place eight chess queens on an 8×8 chessboard so that no queen threatens another. If two queens are in the same row, column, or diagonal, one attacks the other.
Consider the incremental formulation of this problem: it starts from an empty board, and the operator adds a queen at each step.
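The incremental formulation can be sketched in Python: one queen is added per column, in a row that attacks no queen already placed, backtracking on dead ends. The helper names below are my own:

```python
# Incremental formulation of 8-queens: start from an empty board and,
# at each step, the operator adds one queen to the next column in a
# row that attacks no previously placed queen.

def conflicts(placed, row):
    """True if a queen in `row` of the next column attacks a placed queen."""
    col = len(placed)
    return any(r == row or abs(r - row) == abs(c - col)
               for c, r in enumerate(placed))

def place_queens(n=8, placed=()):
    if len(placed) == n:
        return list(placed)
    for row in range(n):
        if not conflicts(placed, row):
            solution = place_queens(n, placed + (row,))
            if solution:
                return solution
    return None  # dead end: backtrack

solution = place_queens(8)
print(solution)  # one row index per column, no two queens attacking
```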
A person wants to find a book on display among the vast collections of the library. He does not
know where the book is kept. By tracing a sequential examination of every book displayed in
every rack of the library, the person will eventually find the book. But the approach will
consume a considerable amount of time. Thus an algorithmic approach will succeed, but it is often slow.
Types of AI Algorithms
Regression Algorithms.
Instance-Based Algorithms.
Decision Tree Algorithms.
Clustering Algorithms.
Association Rule Learning Algorithms.
Artificial Neural Network Algorithms.
Deep Learning Algorithms.
1. Search Algorithms
2. Heuristics
A problem-solving heuristic is an informal, intuitive procedure that leads to the desired solution only in some cases. The outcome of a heuristic operation is unpredictable, and using a heuristic approach may be more or less effective than using an algorithm.
Consider the same library example discussed above. If the person had an idea of where to look for the book, a great deal of time could be saved; this is searching heuristically. But if the first guess happens to be wrong, another heuristic has to be tried. The frequently used problem-solving heuristics are:
Working forward
It is a forward-directed approach: the problem is solved by starting from the beginning and working towards the end.
Working backward
It is a backward-directed approach: the problem is solved by starting from the endpoint or goal and working back through the steps that lead to it.
Means-ends analysis
A problem-solving technique that mixes the two directions above. This method is appropriate for solving complex and large problems. MEA is a strategy to control the search procedure in problem-solving: it is centered on evaluating the difference between the current state and the goal state.
Generate-and-test
The generate-and-test method is a problem-solving heuristic that generates alternative candidate actions, often at random, and checks each one to see whether it solves the problem. It ensures that the selected solution has been checked against the generated candidates.
The terms heuristic and algorithm overlap somewhat in AI. A heuristic may be a subroutine used to determine where an algorithm should look first, and heuristic algorithms can be listed among the categories of algorithms. In the sense that heuristics are algorithms, a heuristic takes a guessing approach to solving a problem, granting a good-enough answer rather than the best possible outcome; the lack of such a guarantee is the main difference between the two.
The first goal of RCA (root cause analysis) is to identify the root cause of the problem. The second goal is to understand how to fix the underlying issues within the root cause. The third goal is to apply what was learned to prevent future issues or to repeat successes.
Breadth-first search
Depth-first search
Depth-limited search
Iterative deepening depth-first search
Bidirectional search
Uniform cost search
1. Breadth-first search
It starts from the root node, examines the neighbouring nodes, and then moves on to the next level. It uses a First-In First-Out (FIFO) queue, and it returns the path with the fewest steps to the solution. BFS is used when the given problem is small enough that space complexity is not a concern.
Now, consider the following tree.
Let’s take node A as the start state and node F as the goal state.
The BFS algorithm starts with the start state and visits the nodes level by level until it reaches the goal state.
Advantages of BFS
BFS will never get trapped exploring a blind alley.
If the graph has more than one solution, BFS will return the optimal solution with the shortest path.
Disadvantages of BFS
BFS stores all the nodes of the current level before going to the next level, so it requires a lot of memory to store the nodes.
BFS takes more time to reach a goal state that is far away.
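BFS can be sketched with a FIFO queue. The tree below is a hypothetical stand-in for the figure in the notes, with A as the start state and F as the goal state:

```python
from collections import deque

# BFS sketch over an explicit graph, using a FIFO queue of paths.
# The tree is a hypothetical stand-in for the missing figure:
# A is the start state, F is the goal state.

graph = {"A": ["B", "C"], "B": ["D", "E"], "C": ["F"],
         "D": [], "E": [], "F": []}

def bfs(start, goal):
    frontier = deque([[start]])          # FIFO queue of paths
    visited = {start}
    while frontier:
        path = frontier.popleft()
        node = path[-1]
        if node == goal:
            return path
        for neighbour in graph[node]:
            if neighbour not in visited:
                visited.add(neighbour)
                frontier.append(path + [neighbour])
    return None

print(bfs("A", "F"))  # ['A', 'C', 'F']
```

Because whole levels are kept in the queue, the memory cost described above is visible directly in the size of `frontier`.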
2. Depth-first search
Depth-first search uses a Last-In First-Out (LIFO) strategy and hence can be implemented using a stack. DFS uses backtracking: it starts from the initial state and explores each path to its greatest depth before it moves on to the next path.
Advantages of DFS
It takes less memory compared to BFS.
Its time complexity is lower than that of BFS in favourable cases.
DFS can find a solution without examining much of the search space.
Disadvantages of DFS
DFS does not always guarantee that a solution will be found.
As DFS goes deeper, it may get trapped in an infinite loop.
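DFS can be sketched with an explicit stack. The graph below is the same kind of hypothetical example tree used for BFS (A start, F goal):

```python
# DFS sketch using an explicit stack (LIFO): each path is explored to
# its greatest depth before backtracking. The tree is a hypothetical
# example with start state A and goal state F.

graph = {"A": ["B", "C"], "B": ["D", "E"], "C": ["F"],
         "D": [], "E": [], "F": []}

def dfs(start, goal):
    stack = [[start]]                            # LIFO stack of paths
    while stack:
        path = stack.pop()
        node = path[-1]
        if node == goal:
            return path
        for neighbour in reversed(graph[node]):  # keep left-to-right order
            if neighbour not in path:            # avoid cycles on this path
                stack.append(path + [neighbour])
    return None

print(dfs("A", "F"))  # ['A', 'C', 'F']
```

The `not in path` check guards against revisiting a node on the current path; on an infinite graph DFS could still descend forever, which is the drawback noted above.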
3. Depth-limited search
Depth-limited search works similarly to depth-first search. The difference is that depth-limited search has a pre-defined limit up to which it can traverse the nodes. Depth-limited search solves one of the drawbacks of DFS because it does not follow an infinite path.
DLS ends its traversal if either of the following conditions occurs.
Standard Failure
It denotes that the given problem does not have any solutions.
If we give C as the goal node and the limit as 0, the algorithm will not return any path as the goal
node is not available within the given limit.
If we give the goal node as F and limit as 2, the path will be A, C, F.
Advantages of DLS
It takes lesser memory when compared to other search techniques.
Disadvantages of DLS
DLS may not offer an optimal solution if the problem has more than one solution.
DLS also encounters incompleteness.
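DLS is DFS plus a depth cutoff. The sketch below uses a hypothetical graph chosen to reproduce the two limit examples given above (limit 0 returns no path; goal F with limit 2 yields A, C, F):

```python
# Depth-limited search sketch: DFS with a pre-defined depth limit.
# The graph is hypothetical, chosen to match the examples above.

graph = {"A": ["B", "C"], "B": ["D", "E"], "C": ["F"],
         "D": [], "E": [], "F": []}

def dls(node, goal, limit, path=None):
    path = (path or []) + [node]
    if node == goal:
        return path
    if limit == 0:
        return None                      # cutoff: limit reached
    for neighbour in graph[node]:
        result = dls(neighbour, goal, limit - 1, path)
        if result:
            return result
    return None

print(dls("A", "C", 0))   # None: goal not within limit 0
print(dls("A", "F", 2))   # ['A', 'C', 'F']
```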
4. Iterative deepening depth-first search
Consider A as the start node and E as the goal node, and let the maximum depth be 2.
The algorithm starts with A and goes to the next level and searches for E. If not found, it goes to
the next level and finds E.
Advantages of IDDFS
IDDFS has the advantages of both BFS and DFS.
It offers fast search and uses memory efficiently.
Disadvantages of IDDFS
It repeats all the work of the previous stages again and again.
5. Bidirectional search
The bidirectional search algorithm differs from the other search strategies: it executes two simultaneous searches, a forward search and a backward search, until they reach the goal state. The graph is divided into two smaller sub-graphs; in one, the search starts from the initial state, and in the other, the search starts from the goal state. When the two searches meet at a common node, the search terminates.
Bidirectional search requires both the start and goal states to be well defined, and the branching factor to be the same in both directions.
Consider the below graph.
Here, the start state is E and the goal state is G. In one sub-graph, the search starts from E and in
the other, the search starts from G. E will go to B and then A. G will go to C and then A. Here,
both the traversal meets at A and hence the traversal ends.
6. Uniform cost search
Uniform cost search expands the node with the lowest path cost first. Here, S is the start node and G is the goal node. From S, G can be reached in the following ways.
S, A, E, F, G -> 19
S, B, E, F, G -> 18
S, B, D, F, G -> 19
S, C, D, F, G -> 23
Here, the path with the least cost is S, B, E, F, G.
Advantages of UCS
This algorithm is optimal as the selection of paths is based on the lowest cost.
Disadvantages of UCS
The algorithm does not consider the number of steps taken to reach the lowest-cost path, which may also result in an infinite loop.
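UCS is usually implemented with a priority queue keyed on the cost so far. The edge costs below are hypothetical, chosen only so that the four path totals match the list above (the original figure is not reproduced here):

```python
import heapq

# Uniform-cost search sketch using heapq as the priority queue.
# Edge costs are hypothetical, chosen to reproduce the path totals
# listed in the notes (S,B,E,F,G -> 18 is the cheapest).

graph = {"S": {"A": 6, "B": 4, "C": 10},
         "A": {"E": 6}, "B": {"E": 7, "D": 7}, "C": {"D": 5},
         "D": {"F": 5}, "E": {"F": 4}, "F": {"G": 3}, "G": {}}

def ucs(start, goal):
    frontier = [(0, start, [start])]     # (cost so far, node, path)
    explored = set()
    while frontier:
        cost, node, path = heapq.heappop(frontier)
        if node == goal:
            return cost, path
        if node in explored:
            continue
        explored.add(node)
        for neighbour, step in graph[node].items():
            heapq.heappush(frontier,
                           (cost + step, neighbour, path + [neighbour]))
    return None

print(ucs("S", "G"))  # (18, ['S', 'B', 'E', 'F', 'G'])
```

Because the cheapest frontier entry is always expanded first, the first time the goal is popped its path is optimal.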
Informed Search Strategies
Here, the goal state can be achieved using a heuristic function. The heuristic function is used to achieve the goal state at the lowest possible cost; it estimates how close a state is to the goal.
Let’s discuss some of the informed search strategies.
2. A* search algorithm
The A* search algorithm is a combination of the uniform cost search and greedy best-first search algorithms. It combines the advantages of both with better memory usage. It uses a heuristic function to find the shortest path: A* uses the sum of the path cost and the heuristic value of a node to find the best path. Consider the following graph with heuristic values as follows.
Note the point that A* search uses the sum of path cost and heuristics value to determine the
path.
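A* orders the frontier by f(n) = g(n) + h(n), the sum of the path cost so far and the heuristic estimate. Since the original figure is not reproduced here, the graph and heuristic values below are hypothetical (the heuristics are chosen to be admissible, i.e., they never overestimate):

```python
import heapq

# A* sketch: nodes are ordered by f(n) = g(n) + h(n).
# Graph and heuristic values are hypothetical stand-ins for the
# missing figure; h never overestimates the true remaining cost.

graph = {"S": {"A": 1, "B": 4}, "A": {"C": 2}, "B": {"C": 1},
         "C": {"G": 3}, "G": {}}
h = {"S": 5, "A": 4, "B": 3, "C": 2, "G": 0}

def a_star(start, goal):
    frontier = [(h[start], 0, start, [start])]   # (f, g, node, path)
    explored = set()
    while frontier:
        f, g, node, path = heapq.heappop(frontier)
        if node == goal:
            return g, path
        if node in explored:
            continue
        explored.add(node)
        for neighbour, step in graph[node].items():
            g2 = g + step
            heapq.heappush(frontier,
                           (g2 + h[neighbour], g2, neighbour,
                            path + [neighbour]))
    return None

print(a_star("S", "G"))  # (6, ['S', 'A', 'C', 'G'])
```

Compared to the uniform-cost sketch, only the priority changes: from g(n) alone to g(n) + h(n).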
Search algorithms are used in games, stored databases, virtual search spaces, quantum
computers, and so on. In this article, we have discussed some of the important search strategies
and how to use them to solve the problems in AI and this is not the end. There are several
algorithms to solve any problem. Nowadays, AI is growing rapidly and applies to many real-life
problems. Keep learning! Keep practicing!
Machine Learning
At its core, machine learning is simply a way of achieving AI. Machine learning is an application of artificial intelligence (AI) that enables systems to learn and improve from experience without being explicitly programmed. Machine learning focuses on the development of computer programs that can access data and use it to learn for themselves. There are 4 types of machine learning.
1. Supervised learning
Supervised machine learning can take what it has learned in the past and apply that to new data
using labelled examples to predict future patterns and events. It learns by explicit example.
Supervised learning requires that the algorithm’s possible outputs are already known and that the
data used to train the algorithm is already labelled with correct answers. It’s like teaching a child
that 2+2=4 or showing an image of a dog and teaching the child that it is called a dog. The
approach to supervised machine learning is essentially the same – it is presented with all the
information it needs to reach pre-determined conclusions. It learns how to reach the conclusion,
just like a child would learn how to reach the total of ‘5’ and the few, pre-determined ways to get
there, for example, 2+3 and 1+4. If you were to present 6+3 as a way to get to 5, that would be
determined as incorrect. Errors would be found and adjusted.
The algorithm learns by comparing its actual output with correct outputs to find errors. It then modifies the model accordingly. Supervised learning is commonly used in applications
where historical data predicts likely future events. Using the previous example, if 6+3 is the most
common erroneous way to get to 5, the machine can predict that when someone inputs 6+3, after
the correct answer of 9, 5 would be the second most commonly expected result. We can also consider an everyday example - it can foresee when credit card transactions are likely to be fraudulent.
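A minimal sketch of supervised learning is a 1-nearest-neighbour classifier: it is trained on labelled examples and labels new data by the closest known example. The tiny dataset below is hypothetical, standing in for "images labelled cat or dog":

```python
# Supervised learning sketch: a 1-nearest-neighbour classifier.
# The labelled points are hypothetical stand-ins for real features.

training_data = [
    ((1.0, 1.0), "cat"), ((1.2, 0.8), "cat"),
    ((4.0, 4.2), "dog"), ((4.5, 3.9), "dog"),
]

def predict(point):
    """Label a new point with the label of its closest training example."""
    def distance(example):
        (x, y), _ = example
        return (x - point[0]) ** 2 + (y - point[1]) ** 2
    _, label = min(training_data, key=distance)
    return label

print(predict((1.1, 0.9)))  # 'cat'
print(predict((4.2, 4.0)))  # 'dog'
```

The "correct answers" live entirely in the labelled training set, which is exactly what distinguishes this from the unsupervised setting discussed next.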
2. Unsupervised learning
Supervised learning tasks find patterns where we have a dataset of “right answers” to learn
from. Unsupervised learning tasks find patterns where we don’t. This may be because the “right
answers” are unobservable, or infeasible to obtain, or maybe for a given problem, there isn’t
even a “right answer” per se. Unsupervised learning is used against data without any historical
labels. The system is not given a pre-determined set of outputs or correlations between inputs
and outputs or a "correct answer." The algorithm must figure out what it is seeing by itself, it has
no storage of reference points. The goal is to explore the data and find some sort of patterns of
structure. Unsupervised learning works well when the data is transactional. For example,
identifying pockets of customers with similar characteristics who can then be targeted in
marketing campaigns. Unsupervised machine learning is a more complex process and has been
used far fewer times than supervised machine learning. But it’s exactly for this reason that there
is so much buzz around the future of AI. Advances in unsupervised ML are seen as the future of
AI because it moves away from narrow AI and closer to AGI (‘artificial general intelligence’ that
we discussed a few paragraphs earlier). If you’ve ever heard someone talking about computers
teaching themselves, this is essentially what they are referring to. In unsupervised learning,
neither a training data set nor a list of outcomes is provided. The AI enters the problem blind –
with only its faultless logical operations to guide it. Imagine yourself as a person that has never
heard of or seen any sport being played. You get taken to a football game and left to figure out
what it is that you are observing. You can’t refer to your knowledge of other sports and try to
draw up similarities and differences that will eventually boil down to an understanding of
football. You have nothing but your cognitive ability. Unsupervised learning places the AI in an equivalent situation and leaves it to learn using only its on/off logic mechanisms, the same mechanisms that underlie all computer systems.
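A classic unsupervised technique is clustering. The sketch below is a tiny k-means loop that groups unlabelled points into k clusters with no "right answers" provided; the data and choice of k are hypothetical:

```python
# Unsupervised learning sketch: k-means clustering on 1-D data.
# No labels are given; the algorithm discovers the two groups itself.

def kmeans(points, k, iterations=10):
    centroids = points[:k]                       # naive initialisation
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: (p - centroids[i]) ** 2)
            clusters[nearest].append(p)
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return sorted(centroids)

data = [1.0, 1.1, 0.9, 8.0, 8.2, 7.8]            # two obvious groups
print(kmeans(data, k=2))  # centroids near 1.0 and 8.0
```

This mirrors the marketing example above: customers with similar characteristics end up in the same cluster without anyone labelling them first.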
4. Reinforcement learning
Reinforcement learning is a type of dynamic programming that trains algorithms using a system
of reward and punishment. A reinforcement learning algorithm, or agent, learns by interacting
with its environment. It receives rewards for performing correctly and penalties for performing incorrectly. Therefore, it learns without having to be directly taught by a human − it learns by
seeking the greatest reward and minimizing penalty. This learning is tied to a context because
what may lead to maximum reward in one situation may be directly associated with a penalty in
another. This type of learning consists of three components: the agent (the AI learner/decision
maker), the environment (everything the agent has interaction with) and actions (what the agent
can do). The agent will reach the goal much faster by finding the best way to do it – and that is
the goal - maximizing the reward, minimizing the penalty and figuring out the best way to do so.
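The three components above (agent, environment, actions) can be sketched with tabular Q-learning. The environment here is a hypothetical 1-D corridor of states 0..4 with a reward of +1 only at state 4:

```python
import random

# Reinforcement-learning sketch: tabular Q-learning on a hypothetical
# corridor. The agent learns by trial and error, with a reward at the
# rightmost state and no reward elsewhere.

random.seed(0)
n_states, actions = 5, (-1, +1)                  # move left or right
Q = {(s, a): 0.0 for s in range(n_states) for a in actions}
alpha, gamma = 0.5, 0.9                          # learning rate, discount

for episode in range(500):
    state = 0
    while state != n_states - 1:
        action = random.choice(actions)          # explore randomly
        nxt = max(0, min(n_states - 1, state + action))
        reward = 1.0 if nxt == n_states - 1 else 0.0
        best_next = max(Q[(nxt, b)] for b in actions)
        Q[(state, action)] += alpha * (reward + gamma * best_next
                                       - Q[(state, action)])
        state = nxt

# After training, moving right (+1) has a higher learned value than
# moving left (-1) in every non-terminal state.
print(all(Q[(s, +1)] > Q[(s, -1)] for s in range(n_states - 1)))  # True
```

The reward signal, not a human teacher, shapes the Q-values, which is the trial-and-error learning described above.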
What is an Agent?
An agent can be viewed as anything that perceives its environment through sensors and acts
upon that environment through actuators.
For example, human beings perceive their surroundings through their sensory organs, known as sensors, and take actions using their hands, legs, etc., known as actuators.
Rational Agent
A rational agent is an agent that takes the right action for every perception. By doing so, it maximizes the performance measure, which makes the agent as successful as possible.
Note: There is a slight difference between a rational agent and an intelligent agent.
Omniscient Agent
An omniscient agent is an agent which knows the actual outcome of its action in advance.
However, such agents are impossible in the real world.
Note: Rational agents are different from Omniscient agents because a rational agent tries to get
the best possible outcome with the current perception, which leads to imperfection. A chess
AI can be a good example of a rational agent because, with the current action, it is not possible to
foresee every possible outcome whereas a tic-tac-toe AI is omniscient as it always knows the
outcome in advance.
Software Agents
It is a software program which works in a dynamic environment. These agents are also known
as Softbots because all body parts of software agents are software only. For example, video
games, flight simulator, etc.
Behavior of an Agent
Mathematically, an agent's behavior can be described by an agent function: a mapping from a percept sequence to an action.
For example, an automatic hand-dryer detects signals (hands) through its sensors. When we
bring hands nearby the dryer, it turns on the heating circuit and blows air. When the signal
detection disappears, it breaks the heating circuit and stops blowing air.
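The hand-dryer example can be written as a simple agent function, a mapping from the current percept to an action (the percept and action strings below are illustrative):

```python
# The hand-dryer agent as an agent function: percept in, action out.

def hand_dryer_agent(percept):
    """Percept from the sensor -> action on the actuators."""
    if percept == "hands detected":
        return "turn on heater and blow air"
    return "turn off heater and stop blowing air"

print(hand_dryer_agent("hands detected"))
print(hand_dryer_agent("no hands"))
```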
Rationality of an agent
It is expected from an intelligent agent to act in a way that maximizes its performance measure.
Therefore, the rationality of an agent depends on four things:
The performance measure which defines the criterion of success.
The agent's built-in knowledge about the environment.
The actions that the agent can perform.
The agent’s percept sequence until now.
For example: score in exams depends on the question paper as well as our knowledge.
Task Environment
A task environment is a problem to which a rational agent is designed as a solution.
Consequently, in 2003, Russell and Norvig introduced several ways to classify task
environments. However, before classifying the environments, we should be aware of the
following terms:
Performance Measure: It specifies the criterion of success for the agent's sequence of steps taken to achieve its target, measuring different factors.
Environment: It specifies the interaction of the agent with different types of
environment.
Actuators: It specifies the way the agent affects the environment by taking expected
actions.
Sensors: It specifies the way the agent gets information from its environment.
Structure of agents
The goal of artificial intelligence is to design an agent program which implements an agent
function i.e., mapping from percepts into actions. A program requires some computer devices
with physical sensors and actuators for execution, which is known as architecture.
Note: The difference between the agent program and agent function is that an agent program
takes the current percept as input, whereas an agent function takes the entire percept history.
Example: iDraw, a drawing robot which converts the typed characters into writing without
storing the past data.
Note: Simple reflex agents do not maintain an internal state and do not depend on the percept history.
Model-based agents: These types of agents can handle partially observable environments by maintaining an internal state. The internal state depends on the percept history, which reflects at least some of the unobserved aspects of the current state. Therefore, as time passes, the internal state needs to be updated, which requires two types of knowledge.
Example: When a person walks in a lane, he maps the pathway in his mind.
Goal-based agents: Current-state information is not sufficient unless a goal has been decided. Therefore, a goal-based agent selects, among multiple possibilities, the way that helps it reach its goal.
Note: With the help of searching and planning (subfields of AI), it becomes easy for the Goal-
based agent to reach its destination.
Example: The main goal of chess playing is to 'check-and-mate' the king, but the player
completes several small goals previously.
Note: Utility-based agents keep track of their environment, and before reaching the main goal, they complete several small goals that may come along the path.
Learning agents: The main task of these agents is to learn to operate in an unknown environment and gain as much knowledge as they can. A learning agent is divided into four conceptual components:
Atomic Representation: Here, a state of the world cannot be divided further, so it has no internal structure. Search, game-playing, hidden Markov models, and Markov decision processes all work with the atomic representation.
Factored Representation: Here, each state is split into a fixed set of attributes or
variables having a value. It allows us to represent uncertainty. Constraint satisfaction,
propositional logic, Bayesian networks, and machine learning algorithms work with the
Factored representation.
Note: Two different factored states can share some variables like current GPS location,
but two different atomic states cannot do so.
Deep Learning is computer software that mimics the network of neurons in a brain. It is a subset
of machine learning based on artificial neural networks with representation learning. It is called
deep learning because it makes use of deep neural networks. This learning can be supervised,
semi-supervised or unsupervised.
In deep learning, models use different layers to learn and discover insights from the data. Some
popular applications of deep learning are self-driving cars, language translation, natural language
processing, etc. Some popular deep learning models are:
Deep learning drives many artificial intelligence (AI) applications and services that improve
automation, performing analytical and physical tasks without human intervention. Deep learning
technology lies behind everyday products and services (such as digital assistants, voice-enabled
TV remotes, and credit card fraud detection) as well as emerging technologies (such as self-
driving cars).
Deep learning eliminates some of the data pre-processing that is typically involved in machine learning. These algorithms can ingest and process unstructured data, like text and images, and they automate feature extraction, removing some of the dependency on human experts. For example,
let’s say that we had a set of photos of different pets, and we wanted to categorize by “cat”,
“dog”, “hamster”, et cetera. Deep learning algorithms can determine which features (e.g. ears)
are most important to distinguish each animal from another. In machine learning, this hierarchy
of features is established manually by a human expert.
Then, through the processes of gradient descent and back propagation, the deep learning
algorithm adjusts and fits itself for accuracy, allowing it to make predictions about a new photo
of an animal with increased precision.
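Gradient descent, one of the two processes named above, can be sketched on a one-parameter model: repeatedly nudge the parameter against the gradient of the loss. Fitting y = w·x to the hypothetical points below illustrates the idea (backpropagation applies the same rule layer by layer):

```python
# Gradient descent sketch: fit y = w * x to toy data by repeatedly
# stepping the parameter w against the gradient of the squared error.

data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]     # y = 2x exactly

w = 0.0                                          # initial guess
lr = 0.05                                        # learning rate
for step in range(200):
    # gradient of the mean squared error with respect to w
    grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
    w -= lr * grad                               # descend the gradient

print(round(w, 3))  # converges to 2.0
```

Each iteration here corresponds to the two-phase loop described earlier for neural networks: compute the model's output, then adjust the parameters using derivatives.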
Data Dependency: Machine learning can work with a smaller amount of data, although it also benefits from huge amounts of data. Deep learning algorithms depend heavily on a large amount of data, so a large amount of data must be fed in for good performance.
Execution time: A machine learning algorithm takes less time than deep learning to train the model, but a long time to test it. Deep learning takes a long time to train the model, but less time to test it.
Hardware Dependencies: Since machine learning models do not need huge amounts of data, they can work on low-end machines. Deep learning models need a huge amount of data to work efficiently, so they need GPUs and hence high-end machines.
Feature Engineering: Machine learning models need a feature-extraction step performed by an expert before proceeding further. Deep learning, as an enhanced version of machine learning, does not need a feature extractor developed for each problem; instead, it tries to learn high-level features from the data on its own.
Type of data: Machine learning models mostly require data in a structured form. Deep learning models can work with both structured and unstructured data, as they rely on the layers of an artificial neural network.
Suitable for: Machine learning models are suitable for solving simple or moderately complex problems. Deep learning models are suitable for solving complex problems.
Having seen this brief introduction to ML and DL with some comparisons, the question is which one should be chosen to solve a particular problem. The choice can be summarized as follows:
Hence, if you have lots of data and high hardware capabilities, go with deep learning. But if you
don't have any of them, choose the ML model to solve your problem.
The network consumes large amounts of input data and processes them through multiple layers; the network can learn increasingly complex features of the data at each layer.
To grasp the idea of deep learning, imagine a family with an infant and parents. The toddler points at objects with his little finger and always says the word ‘cat.’ As his parents are concerned about his education, they keep telling him ‘Yes, that is a cat’ or ‘No, that is not a cat.’ The infant persists in pointing at objects but becomes more accurate with ‘cats.’ The little kid, deep down, does not know why he can say whether it is a cat or not. He has just learned how to build a hierarchy of the complex features that make up a cat, looking at the pet overall and then focusing on details such as the tail or the nose before making up his mind.
A neural network works in much the same way. Each layer represents a deeper level of knowledge, i.e., the hierarchy of knowledge. A neural network with four layers will learn more complex features than one with two layers.
First Phase: The first phase consists of applying a nonlinear transformation of the input and
creates a statistical model as output.
Second Phase: The second phase aims at improving the model with a mathematical method based on derivatives.
The neural network repeats these two phases hundreds to thousands of times until it has reached a tolerable level of accuracy. One repetition of this two-phase process is called an iteration.
The above describes the simplest type of deep neural network in the simplest terms. However,
deep learning algorithms are incredibly complex, and there are different types of neural networks
to address specific problems or datasets. For example,
Convolutional neural networks (CNNs), used primarily in computer vision and image
classification applications, can detect features and patterns within an image, enabling
tasks, like object detection or recognition.
Recurrent neural networks (RNNs) are typically used in natural language and speech recognition applications, as they leverage sequential or time-series data.
Shallow neural network: The Shallow neural network has only one hidden layer between the
input and output.
Deep neural network: Deep neural networks have more than one hidden layer. For instance, the GoogLeNet model for image recognition counts 22 layers.
Nowadays, deep learning is used in many ways like a driverless car, mobile phone, Google
Search Engine, Fraud detection, TV, and so on.
For example, suppose the task is to predict the next word in the sentence “Do you want a …?”
The RNN neurons will receive a signal that points to the start of the sentence.
The network receives the word “Do” as an input and produces a vector of numbers.
This vector is fed back to the neuron to provide a memory to the network. This stage
helps the network to remember it received “Do” and it received it in the first position.
The network will similarly proceed to the next words. It takes the word “you” and
“want.” The state of the neurons is updated upon receiving each word.
The final stage occurs after receiving the word “a.” The neural network will provide a
probability for each English word that can be used to complete the sentence. A well-
trained RNN probably assigns a high probability to “café,” “drink,” “burger,” etc.
CNN is mostly used when there is an unstructured data set (e.g., images) and the practitioners
need to extract information from it.
Reinforcement Learning
Reinforcement learning is a subfield of machine learning in which systems are trained by
receiving virtual “rewards” or “punishments,” essentially learning by trial and error. Google’s DeepMind has used reinforcement learning to beat a human champion at the game of Go.
AI in Finance:
The financial technology sector has already started using AI to save time, reduce costs, and add
value. Deep learning is changing the lending industry by using more robust credit scoring. Credit
decision-makers can use AI for robust credit lending applications to achieve faster, more
accurate risk assessment, using machine intelligence to factor in the character and capacity of
applicants.
AI in Marketing:
AI is a valuable tool for customer service management and personalization challenges. Improved
speech recognition in call-center management and call routing as a result of the application of AI
techniques allows a more seamless experience for customers.
For example, deep-learning analysis of audio allows systems to assess a customer’s emotional
tone. If the customer is responding poorly to the AI chatbot, the system can reroute the
conversation to a real, human operator who takes over the issue.
Apart from the three Deep learning examples above, AI is widely used in other
sectors/industries.
Deep learning can outperform traditional methods. For instance, deep learning algorithms are
41% more accurate than classical machine learning algorithms in image classification, 27% more
accurate in facial recognition, and 25% more accurate in voice recognition.
Data labeling
Most current AI models are trained through “supervised learning.” It means that humans must
label and categorize the underlying data, which can be a sizable and error-prone chore. For
example, companies developing self-driving-car technologies are hiring hundreds of people to
manually annotate hours of video feeds from prototype vehicles to help train these systems.
Obtain huge training datasets
It has been shown that simple deep learning techniques like CNN can, in some cases, imitate the
knowledge of experts in medicine and other fields. The current wave of machine learning,
however, requires training data sets that are not only labeled but also sufficiently broad and
universal.
Deep-learning methods require thousands of observations for models to become relatively good
at classification tasks and, in some cases, millions for them to perform at the level of humans.
Unsurprisingly, deep learning is popular in giant tech companies, which use their big data to
accumulate petabytes of training data. This allows them to create impressive and highly accurate
deep learning models.
Explainability
Large and complex models can be hard to explain in human terms; for instance, why a
particular decision was reached. This is one reason why the adoption of some AI tools is slow in
application areas where interpretability is useful or indeed required.
Furthermore, as the application of AI expands, regulatory requirements could also drive the need
for more explainable AI models.
Summary
Deep Learning Overview: Deep learning is the new state of the art for artificial intelligence.
A deep learning architecture is composed of an input layer, hidden layers, and an output layer.
The word "deep" means the network has more than one hidden layer.
There are many neural network architectures, each designed to perform a given task. For
instance, CNNs work very well with pictures, while RNNs provide impressive results with time
series and text analysis.
Deep learning is now active in different fields, from finance and marketing to supply chain
management. Big firms were the first to use deep learning because they already have a large pool
of data. Deep learning requires an extensive training dataset.
Most model-performance measures are based on the comparison of the model's predictions with
the (known) values of the dependent variable in a dataset. For an ideal model, the predictions and
the dependent-variable values should be equal. In practice, it is never the case, and we want to
quantify the disagreement.
Model evaluation: we may want to know how good the model is, i.e., how reliable the model’s
predictions are (how frequent and how large errors may we expect);
Model comparison: we may want to compare two or more models in order to choose between
them;
Out-of-sample and out-of-time comparisons: we may want to check a model’s performance
when applied to new data, to evaluate whether its performance has worsened.
Depending on the nature of the dependent variable (continuous, binary, categorical, count,
etc.), different model-performance measures may be used. Moreover, the list of useful
measures is growing as new applications emerge. Here, we will discuss only a selected set of
measures, some of which are used in dataset-level exploration techniques.
Confusion Matrix
o For a classifier with 2 prediction classes, the confusion matrix is a 2×2 table; for 3
classes, it is a 3×3 table, and so on.
o The matrix is divided into two dimensions that are predicted values and actual
values along with the total number of predictions.
o Predicted values are those values, which are predicted by the model, and actual values are
the true values for the given observations.
o It looks like the table below:

                       Actual: Yes         Actual: No
    Predicted: Yes     True Positive       False Positive
    Predicted: No      False Negative      True Negative

o True Negative: The model predicted No, and the real or actual value was also No.
o True Positive: The model predicted Yes, and the actual value was also Yes.
o False Negative: The model predicted No, but the actual value was Yes. It is also
called a Type-II error.
o False Positive: The model predicted Yes, but the actual value was No. It is also
called a Type-I error.
Suppose we are trying to create a model that predicts whether or not a person has a particular
disease. From the confusion matrix of such a model, several metrics can be computed.
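The confusion-matrix counts for such a disease-prediction model can be tallied by hand. The sketch below uses hypothetical labels (1 = has the disease, 0 = healthy); it simply counts how each true/predicted pair falls into the four cells.

```python
# Minimal sketch: tally confusion-matrix counts for binary predictions.
def confusion_counts(y_true, y_pred):
    """Return (TP, FP, FN, TN) for binary labels, where 1 = has disease."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn

y_true = [1, 0, 1, 1, 0, 0, 1, 0, 0, 1]  # actual diagnosis (hypothetical)
y_pred = [1, 0, 0, 1, 0, 1, 1, 0, 0, 1]  # model output (hypothetical)
print(confusion_counts(y_true, y_pred))  # -> (4, 1, 1, 4)
```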
o Misclassification rate: Also termed the error rate, it defines how often the model
gives wrong predictions. It is calculated as the number of incorrect predictions
divided by the total number of predictions made by the classifier:
Error rate = (FP + FN) / (TP + TN + FP + FN)
o Precision: Out of all the instances the model predicted as positive, how many were
actually positive. It can be calculated using the formula:
Precision = TP / (TP + FP)
o Recall: Out of all the actual positive instances, how many the model predicted
correctly. Recall should be as high as possible:
Recall = TP / (TP + FN)
o F-measure: If one model has low precision and high recall, or vice versa, it is difficult
to compare models on those two numbers alone. The F-score lets us evaluate recall and
precision at the same time; it is maximal when recall equals precision. It can be
calculated using the formula:
F-measure = 2 × (Precision × Recall) / (Precision + Recall)
From the confusion matrix, sensitivity and specificity are also evaluated, using the following
equations.
Sensitivity (true positive rate) refers to the probability of a positive test, conditioned on truly
being positive: Sensitivity = TP / (TP + FN)
Specificity (true negative rate) refers to the probability of a negative test, conditioned on
truly being negative: Specificity = TN / (TN + FP)
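All of the metrics above can be computed directly from the four confusion-matrix counts. The sketch below uses hypothetical counts (tp=40, fp=10, fn=5, tn=45) purely for illustration.

```python
# Sketch: derive the standard metrics from raw confusion-matrix counts.
def classification_metrics(tp, fp, fn, tn):
    total = tp + fp + fn + tn
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)                      # recall == sensitivity == TPR
    return {
        "accuracy": (tp + tn) / total,
        "error_rate": (fp + fn) / total,         # misclassification rate
        "precision": precision,
        "recall": recall,
        "specificity": tn / (tn + fp),           # true negative rate
        "f_measure": 2 * precision * recall / (precision + recall),
    }

m = classification_metrics(tp=40, fp=10, fn=5, tn=45)  # hypothetical counts
print(round(m["precision"], 3), round(m["recall"], 3))  # -> 0.8 0.889
```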
Cross-validation is a technique for validating model performance by training the model on a
subset of the input data and testing it on a previously unseen subset. In other words, it is a
technique to check how well a statistical model generalizes to an independent dataset.
In machine learning, there is always a need to test the stability of the model: we cannot judge
a model using only the dataset it was trained on. For this purpose, we reserve a particular
sample of the dataset that was not part of the training dataset, and we test our model on that
sample before deployment. This complete process comes under cross-validation, and it goes
beyond a single, general train-test split.
Hence the basic steps of cross-validations are:
o Reserve a subset of the dataset as a validation set.
o Provide the training to the model using the training dataset.
o Now, evaluate model performance using the validation set. If the model performs well
on the validation set, proceed to the next step; otherwise, check for issues.
Leave-P-out cross-validation
In this approach, p data points are left out of the training data. If there are n data points in
the original input dataset, then n − p data points are used as the training set and the p data
points as the validation set. This process is repeated for all possible held-out samples, and the
average error is calculated to assess the effectiveness of the model.
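The split-generation step of leave-p-out can be sketched with the standard library: every combination of p indices becomes one validation set, and the remaining n − p indices form the corresponding training set. The sizes below (n = 5, p = 2) are illustrative.

```python
from itertools import combinations

# Sketch of leave-p-out split generation (not a full training loop).
def leave_p_out(n_samples, p):
    """Yield (train_indices, validation_indices) for every way of holding out p points."""
    indices = set(range(n_samples))
    for held_out in combinations(range(n_samples), p):
        train = sorted(indices - set(held_out))
        yield train, list(held_out)

splits = list(leave_p_out(5, 2))
print(len(splits))  # -> 10, i.e. C(5, 2) train/validation splits
```

Note the combinatorial cost: C(n, p) splits makes this approach expensive for large n, which is one reason k-fold cross-validation is more common in practice.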
K-Fold Cross-Validation
K-fold cross-validation approach divides the input dataset into K groups of samples of equal
sizes. These samples are called folds. For each learning set, the prediction function uses k-1
folds, and the rest of the folds are used for the test set. This approach is a very popular CV
approach because it is easy to understand, and the output is less biased than other methods.
The steps for k-fold cross-validation are:
o Split the input dataset into K groups
o For each group:
o Take one group as the reserve or test data set.
o Use remaining groups as the training dataset
o Fit the model on the training set and evaluate the performance of the model using
the test set.
Let's take an example of 5-fold cross-validation: the dataset is divided into 5 folds. On the
1st iteration, the first fold is reserved for testing the model and the rest are used for training.
On the 2nd iteration, the second fold is used to test the model, and the rest are used for
training. This process continues until each fold has been used as the test fold.
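The fold rotation just described can be sketched with numpy. The sizes (10 samples, 5 folds) are illustrative; each iteration yields 8 training indices and 2 test indices, and every sample appears in the test set exactly once.

```python
import numpy as np

# Sketch of the k-fold procedure: split indices into k folds, then rotate
# which fold serves as the test set.
def k_fold_splits(n_samples, k):
    folds = np.array_split(np.arange(n_samples), k)  # k near-equal folds
    for i in range(k):
        test_idx = folds[i]
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train_idx, test_idx

for train_idx, test_idx in k_fold_splits(10, 5):
    print(len(train_idx), len(test_idx))  # -> 8 2 on every iteration
```

In practice one would fit the model on `train_idx` and score it on `test_idx` inside the loop, then average the k scores.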
Stratified K-Fold Cross-Validation
This is a variation of k-fold in which each fold is made to preserve the overall distribution of
the target variable. It can be understood with an example of housing prices: the price of some
houses can be much higher than that of others. To tackle such situations, the stratified k-fold
cross-validation technique is useful.
Holdout Method
This method is the simplest cross-validation technique of all. In this method, we remove a
subset of the data, train the model on the remaining data, and then use the held-out subset to
obtain prediction results.
The error that occurs in this process indicates how well our model will perform on unknown
data. Although this approach is simple to perform, it suffers from high variance, and it
sometimes produces misleading results.
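The holdout split itself can be sketched with the standard library: shuffle once, reserve a fixed fraction as the held-out test set, and train on the remainder. The fraction and seed below are illustrative choices.

```python
import random

# Sketch of the holdout method: one shuffle, one train/test split.
def holdout_split(data, test_fraction=0.2, seed=0):
    rng = random.Random(seed)            # fixed seed for reproducibility
    shuffled = list(data)
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]  # (train, test)

train, test = holdout_split(range(100))
print(len(train), len(test))  # -> 80 20
```

The high variance mentioned above comes from the single split: a different seed gives a different test set, and the measured error can change noticeably; k-fold averages over several splits precisely to reduce this.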
Limitations of Cross-Validation
There are some limitations of the cross-validation technique, which are given below:
o Under ideal conditions, it provides optimal output. But with inconsistent data, it
may produce drastically different results. This is one of the big disadvantages of
cross-validation, as there is no certainty about the kind of data encountered in
machine learning.
o In predictive modeling, the data evolves over time, which may introduce differences
between the training and validation sets. For example, if we create a model to predict
stock market values and train it on the previous 5 years of stock prices, the actual
values over the next 5 years may be drastically different, so it is difficult to expect
correct output in such situations.
Applications of Cross-Validation
o This technique can be used to compare the performance of different predictive modeling
methods.
o It has great scope in the medical research field.
o It can also be used for the meta-analysis, as it is already being used by the data scientists
in the field of medical statistics.
In Machine Learning, only developing an ML model is not sufficient as we also need to see
whether it is performing well or not. It means that after building an ML model, we need to
evaluate and validate how good or bad it is, and for such cases, we use different Evaluation
Metrics. The AUC-ROC curve is one such evaluation metric, used to visualize the performance
of a classification model; it is one of the most popular and important metrics for evaluating
classification models. In this topic, we discuss the AUC-ROC curve in more detail.
ROC Curve
ROC or Receiver Operating Characteristic curve represents a probability graph to show the
performance of a classification model at different threshold levels. The curve is plotted between
two parameters, which are:
o True Positive Rate or TPR
o False Positive Rate or FPR
TPR:
TPR or True Positive Rate is a synonym for Recall, and is calculated as:
TPR = TP / (TP + FN)
FPR:
FPR or False Positive Rate is the proportion of actual negatives incorrectly predicted as
positive:
FPR = FP / (FP + TN)
Now, to summarize performance across all threshold levels efficiently, we need a single
measure, which is the AUC (Area Under the Curve). The AUC computes the performance of the
binary classifier across the different thresholds of the ROC curve and provides an aggregate
measure. The value of AUC ranges from 0 to 1: an excellent model has an AUC near 1,
indicating a good measure of separability.
For example, if we have three different classes, X, Y, and Z, then we can plot a curve for X
against Y & Z, a second plot for Y against X & Z, and the third plot for Z against Y and X.
1. Classification of 3D models
The curve can be used to classify 3D models and separate them from normal models.
With a specified threshold level, the curve distinguishes non-3D models and separates
out the 3D models.
2. Healthcare
The curve has various applications in the healthcare sector. For example, it can be used
to detect diseases such as cancer in patients by trading off false positive and false
negative rates; the accuracy depends on the threshold value chosen for the curve.
3. Binary Classification
AUC-ROC curve is mainly used for binary classification problems to evaluate their
performance.