Aiml QB
PART A
● Goal-based agents
● Utility-based agent
● Learning agent
AI has numerous applications across various industries. For example:
● Finance: AI is used for fraud detection, algorithmic trading, credit scoring, and
customer-service chatbots, aiding better decision-making and operational
efficiency.
Search algorithms are typically chosen based on their efficiency, suitability for the
problem at hand, and the resources available. Common criteria for choosing a search
algorithm include completeness, optimality, time complexity, space complexity, heuristics,
scalability, and implementation complexity.
● BFS can be used to find the shortest path between two nodes in an unweighted
graph.
● BFS is often used to solve maze problems, where each cell in the maze is considered
a node and the algorithm finds the shortest path from the start to the exit.
● BFS is used in robotics for path planning, especially in environments with obstacles.
By treating the environment as a graph, BFS can find a collision-free path for a robot
to navigate from its current position to a target position.
8.Define Depth Limited Search?
Depth-Limited Search (DLS) is a variant of Depth-First Search (DFS) that sets a maximum
depth to limit how far the search can go. It's particularly useful for search algorithms dealing
with large or infinite-depth search spaces, preventing the search from going too deep and
consuming excessive resources or time.
9.Evaluate the problem solving agents using Depth first search algorithm?
● Completeness: DFS is not complete for problems with infinite depths or cycles in the
search space.
● Optimality: It does not guarantee finding the optimal solution; it may find a solution
quickly but not necessarily the best one.
● Breadth-first Search
● Depth-first Search
● Depth-limited Search
● Bidirectional Search
11.How will you specify the task environment for an agent taxi?
12.Compare the time complexity and space complexity of an uninformed search algorithm?
● BFS: time complexity O(b^d), space complexity O(b^d)
● DFS: time complexity O(b^m), space complexity O(bm)
(where b is the branching factor, d is the depth of the shallowest goal, and m is the
maximum depth of the search tree)
PART-B
Artificial Intelligence has become a buzzword in the technological world, with the potential
to transform the way we live and work. The foundation of AI lies in its basic building
blocks, the fundamental components that make up the technology. These building blocks
include machine learning, natural language processing, computer vision, and robotics, among
others. Together, these components form the backbone of AI, allowing machines to learn,
reason, and act:
● Machine Learning (ML): ML enables systems to learn patterns from data and improve
with experience, without being explicitly programmed for each task.
● Natural Language Processing (NLP): NLP deals with the interaction between
computers and humans using natural language. It involves tasks like speech
recognition, language translation, sentiment analysis, and text generation.
● Computer Vision: This field involves enabling computers to interpret and understand
the visual world. It includes tasks such as image recognition, object detection, image
segmentation, and scene understanding.
● Expert Systems: These are AI systems that mimic the decision-making abilities of a
human expert in a specific domain. They use rules and knowledge bases to provide
solutions to complex problems.
● Search and Optimization: AI algorithms often involve searching through large spaces
of possibilities to find optimal solutions. Techniques like depth-first search, breadth-
first search, genetic algorithms, and simulated annealing are used for optimization.
● Neural Networks: Inspired by the structure of the human brain, neural networks are a
fundamental component of modern AI, especially in deep learning. They consist of
interconnected nodes (neurons) organized in layers, and they excel at tasks like
pattern recognition and feature extraction.
● Ethics and Bias: As AI becomes more pervasive, ethical considerations regarding
fairness, transparency, privacy, and accountability are crucial. Addressing biases in
data and algorithms is also a significant part of AI ethics.
An environment in artificial intelligence is the surrounding of the agent. The agent takes
input from the environment through sensors and delivers the output to the environment
through actuators. There are several types of environments:
1. Fully Observable vs Partially Observable
● When an agent's sensors are capable of sensing or accessing the complete state of
the environment at each point in time, it is said to be a fully observable
environment. A task environment is fully observable if the sensors detect all
aspects that are relevant to the choice of action; relevance, in turn, depends on the
performance measure. Sometimes an environment might be only partially observable
due to noisy and inaccurate sensors.
● Maintaining a fully observable environment is easy as there is no need to keep
track of the history of the surrounding.
● An environment is called unobservable when the agent has no sensors at all.
● Examples:
● Chess – the board is fully observable, and so are the opponent’s
moves.
● Driving – the environment is partially observable because what’s
around the corner is not known.
2. Deterministic vs Stochastic
● When the agent's current state and chosen action completely determine the next
state of the environment, the environment is said to be deterministic.
● A stochastic environment is random in nature; the next state is not unique and
cannot be completely determined by the agent.
● A non-deterministic environment is one in which actions are characterized by a set
of possible outcomes, with no probabilities attached to them.
● Examples:
● Chess – there are only a limited number of possible moves for a piece in the
current state, and these moves can be determined.
● Self-Driving Cars – the actions of a self-driving car are not unique; they vary
from time to time.
3. Competitive vs Collaborative
● The game of chess is competitive as the agents compete with each other to win the
game which is the output.
● An agent is said to be in a collaborative environment when multiple agents
cooperate to produce the desired output.
● When multiple self-driving cars are found on the roads, they cooperate with each
other to avoid collisions and reach their destination which is the output desired.
4. Single-agent vs Multi-agent
● An environment involving only one agent operating by itself is a single-agent
environment, whereas an environment involving more than one agent is a
multi-agent environment.
● Example: Solving a crossword puzzle is a single-agent environment, while playing
chess is a multi-agent environment.
5. Dynamic vs Static
● An environment that keeps constantly changing while the agent is performing an
action is said to be dynamic.
● A roller coaster ride is dynamic as it is set in motion and the environment keeps
changing every instant.
● An idle environment with no change in its state is called a static environment.
● An empty house is static as there’s no change in the surroundings when an agent
enters.
6. Episodic vs Sequential
● In an episodic environment, the agent's experience is divided into atomic episodes,
and the choice of action in each episode does not depend on previous episodes.
● In a sequential environment, the current decision can affect all future decisions.
● Example: Checkers, where the previous move can affect all the following moves.
7. Known vs Unknown
In a known environment, the outcomes for all probable actions are given. Obviously, in
the case of an unknown environment, for an agent to make a decision, it has to gain
knowledge about how the environment works.
Uninformed search, also known as blind search, is a search algorithm that explores a problem
space without any specific knowledge or information about the problem other than the initial
state and the possible actions to take. It lacks domain-specific heuristics or prior knowledge
about the problem. Uninformed search algorithms, such as breadth-first search and depth-first
search, systematically explore the search space by applying predefined rules to generate
successor states until a goal state is found or the search is exhausted. These algorithms are
typically less efficient than informed search algorithms but can be useful in certain scenarios
or as a basis for more advanced search techniques.
It is a search algorithm where the search tree is traversed from the root node. The search
proceeds down one branch, looking for the key at the leaves. If the key is not found, the
searcher retraces its steps (backtracking) to the point from which the other branch was
left unexplored, and the same procedure is repeated for that branch.
The above image illustrates the DFS Algorithm. First, the search starts from the root node A
and goes to the branch where node B is present (lexicographical order). It then goes to node
D, and from D there is only one node to traverse, i.e., node H. Since node H has no child
nodes, we retrace the path we traversed earlier and again reach node B, but this time we
traverse the untraced path through node E. There are two branches at node E; we traverse
node I first (lexicographical order) and then backtrack, as there are no further nodes after I.
We then traverse node J, the other untraced branch, find we are again at a leaf, and retrace
the path back to the root A. Finally, we traverse the remaining untraced branch through node
C and repeat the same process. This is the DFS Algorithm.
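The traversal above can be sketched in a few lines of Python. The tree is reconstructed from the walkthrough; the children of node C (F and G below) are hypothetical, since the text does not list them:

```python
# Recursive depth-first (preorder) traversal of the example tree.
# C's children F and G are a hypothetical completion of the figure.
tree = {
    "A": ["B", "C"],
    "B": ["D", "E"],
    "C": ["F", "G"],   # hypothetical branch
    "D": ["H"],
    "E": ["I", "J"],
    "F": [], "G": [], "H": [], "I": [], "J": [],
}

def dfs(node, order=None):
    """Visit the node, then each child in lexicographical order."""
    if order is None:
        order = []
    order.append(node)
    for child in sorted(tree[node]):
        dfs(child, order)
    return order

print(dfs("A"))  # ['A', 'B', 'D', 'H', 'E', 'I', 'J', 'C', 'F', 'G']
```

The printed order matches the walkthrough: A, B, D, H, then backtrack to B for E, I, J, and finally the C branch.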
This is another graph search algorithm in AI that traverses breadthwise to search for the goal
in a tree. It begins searching from the root node and expands the successor node before
expanding further along breadthwise and traversing those nodes rather than searching depth-
wise.
The above figure is an example of the BFS Algorithm. It starts from the root node A and then
traverses node B. Up to this step, it is the same as DFS. But here, instead of expanding the
children of B as in DFS, we expand the other child of A, i.e., node C, because of BFS, and
then move to the next level, traversing D to G, and then H to K in this typical example.
Throughout, nodes at the same level are visited in lexicographical order.
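The level-by-level order can be sketched with a FIFO queue. The child lists are an assumption reconstructed from the walkthrough (two children under A and C at level 1, one child under each level-2 node):

```python
from collections import deque

# Breadth-first traversal of the assumed example tree.
tree = {
    "A": ["B", "C"],
    "B": ["D", "E"], "C": ["F", "G"],
    "D": ["H"], "E": ["I"], "F": ["J"], "G": ["K"],
    "H": [], "I": [], "J": [], "K": [],
}

def bfs(root):
    """Visit nodes level by level using a FIFO queue."""
    order, queue = [], deque([root])
    while queue:
        node = queue.popleft()
        order.append(node)
        queue.extend(sorted(tree[node]))  # lexicographical tie-break
    return order

print(bfs("A"))  # ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K']
```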
Uniform Cost Search (UCS) is a graph traversal and search algorithm used in the field of
artificial intelligence and computer science. UCS is an uninformed search algorithm that
explores a graph by gradually expanding nodes, starting from the initial node and moving
towards the goal node while considering the cost associated with each edge or step.
This algorithm is mainly used when the step costs are not the same, but we need the optimal
solution to the goal state. In such cases, we use Uniform Cost Search to find the goal and the
path, including the cumulative cost to expand each node from the root node to the goal node.
It expands neither depth-wise nor breadth-wise. It always searches for the next node with the
lowest cumulative path cost, and in the case of equal path costs, let's consider
lexicographical order in our case.
In the above figure, consider S to be the start node and G to be the goal state. From node S
we look for a node to expand, and we have nodes A and G, but since it’s a uniform cost
search, it’s expanding the node with the lowest step cost, so node A becomes the successor
rather than our required goal node G. From A we look at its children nodes B and C. Since C
has the lowest step cost, it traverses through node C. Then we look at the successors of C, i.e.,
D and G. Since the cost to D is lower, we expand along node D. Since D has only one
child, G, which is our required goal state, we finally reach the goal state G by applying the
UCS Algorithm. If we traverse this way, our total path cost from S to G is
just 6, even after traversing through many nodes, rather than going to G directly, where the
cost is 12 and 6 << 12 (in terms of step cost). But this may not work in all cases.
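The walkthrough can be sketched with a priority queue. The edge costs below are assumptions chosen to be consistent with the description (optimal cost 6 along S-A-C-D-G, versus 12 for the direct edge S-G):

```python
import heapq

# Uniform Cost Search over the example graph; weights for A->B, B->D and
# C->G are hypothetical, chosen only to match the walkthrough's outcome.
graph = {
    "S": {"A": 1, "G": 12},
    "A": {"B": 3, "C": 2},
    "B": {"D": 4},
    "C": {"D": 1, "G": 5},
    "D": {"G": 2},
    "G": {},
}

def ucs(start, goal):
    """Always expand the frontier node with the lowest cumulative path cost."""
    frontier = [(0, start, [start])]
    explored = set()
    while frontier:
        cost, node, path = heapq.heappop(frontier)
        if node == goal:
            return cost, path
        if node in explored:
            continue
        explored.add(node)
        for succ, step in graph[node].items():
            heapq.heappush(frontier, (cost + step, succ, path + [succ]))
    return None

print(ucs("S", "G"))  # (6, ['S', 'A', 'C', 'D', 'G'])
```

The heap pops ties in lexicographical order of the node name, matching the tie-breaking rule stated above.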
DLS is an uninformed search algorithm. It is similar to DFS but differs in a few ways. The
failure of DFS in infinite state spaces is alleviated by supplying the depth-first search with a
predetermined depth limit. That is, nodes at the depth limit are treated as if they have no
successors. This approach is called depth-limited search. The depth limit solves the
infinite-path problem. Depth-limited search can be halted in two cases:
1. Standard Failure Value (SFV): the SFV indicates that the problem has no solution at all.
2. Cutoff Failure Value (CFV): the CFV indicates that there is no solution to the problem
within the given depth limit.
The above figure illustrates the implementation of the DLS algorithm. Node A is at limit 0.
The start state is considered to be node A, and our goal state is node H. To reach node H, we
apply DLS. So in the first case, let's set our limit to 0 and search for the goal.
Since the limit is 0, the algorithm assumes that there are no children beyond limit 0, even if
nodes exist further. If we implement it, we will traverse only node A, as it is the only node
within limit 0, and it is not our goal state. The SFV would say there is no solution to the
problem at all, whereas the CFV says there is no solution to the problem up to the set depth
limit 0. Since we could not find the goal, let's increase our limit to 1 and apply DFS up to
limit 1, even though there are further nodes beyond limit 1. Those nodes are not expanded, as
we have limited the search depth. Hence nodes A, followed by B, C, D, and E, are expanded
in that order. As in the first case, the SFV says there is no solution to the problem at all,
whereas the CFV says there is no solution up to the set depth limit 1. Hence we again
increase our limit, this time to 2.
Up to limit 2, DFS is implemented from our start node A and its children B, C, D, and E.
Then from E, it moves to F, backtracks, and explores the unexplored branch where node G
is present. It then retraces the path, explores the child of C, i.e., node H, and we finally
reach our goal by applying the DLS Algorithm. Even if node F had further successors, only
the nodes up to limit 2 would be explored, as we have limited the depth. This image explains
the DLS implementation and can be referred to for better understanding.
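The walkthrough can be sketched recursively. The string "cutoff" plays the role of the Cutoff Failure Value and None plays the Standard Failure Value; the tree is reconstructed from the figure as described:

```python
# Depth-limited search on the tree from the walkthrough:
# A's children are B, C, D, E; H sits under C; F and G sit under E.
tree = {
    "A": ["B", "C", "D", "E"],
    "B": [], "C": ["H"], "D": [],
    "E": ["F", "G"],
    "F": [], "G": [], "H": [],
}

def dls(node, goal, limit):
    """DFS that treats nodes at the depth limit as if they have no successors."""
    if node == goal:
        return [node]
    if limit == 0:
        return "cutoff"                  # Cutoff Failure Value
    cutoff_occurred = False
    for child in tree[node]:
        result = dls(child, goal, limit - 1)
        if result == "cutoff":
            cutoff_occurred = True
        elif result is not None:
            return [node] + result       # path to the goal
    return "cutoff" if cutoff_occurred else None  # None = Standard Failure

print(dls("A", "H", 0))  # 'cutoff'  (no solution within limit 0)
print(dls("A", "H", 2))  # ['A', 'C', 'H']
```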
1. Standard Failure Value: it indicates that the problem does not have any solution.
2. Cutoff Failure Value: it indicates that there is no solution to the problem within the
given depth limit.
It is a search algorithm that combines the strengths of the BFS and DFS algorithms. It is
iterative in nature. In each iteration it performs a depth-limited search up to a certain depth,
and the depth limit keeps increasing with every iteration until the goal state is reached.
In the above figure, let's consider the goal node to be G and the start state to be A. We
perform IDDFS from node A. In the first iteration, it traverses only node A at level 0.
Since the goal is not reached, we increase the depth limit to 1 and move to the next iteration,
in which we traverse nodes A, B, and C. Even in this iteration our goal state is not reached,
so we increase the limit to 2; the nodes are traversed again from the start node, expanding
A, B, C and then D, E, F, G. The goal node G is traversed in this iteration; the figure also
shows the deeper iteration, in which the nodes A, B, D, H, I, E, C, F, K, and G are explored
in depth-first order. This is the implementation of the IDDFS Algorithm.
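The iterations above can be sketched by repeating a depth-limited search with growing limits. The tree is an assumption reconstructed from the node order in the walkthrough:

```python
# Iterative deepening DFS: run DLS with limits 0, 1, 2, ... until the goal
# is found. The tree below is a reconstruction of the figure.
tree = {
    "A": ["B", "C"],
    "B": ["D", "E"], "C": ["F", "G"],
    "D": ["H", "I"], "E": [], "F": ["K"], "G": [],
    "H": [], "I": [], "K": [],
}

def dls(node, goal, limit, path):
    """Depth-limited DFS that records every node visited in `path`."""
    path.append(node)
    if node == goal:
        return True
    if limit == 0:
        return False
    return any(dls(child, goal, limit - 1, path) for child in tree[node])

def iddfs(root, goal, max_depth=10):
    """Increase the depth limit each iteration until the goal is reached."""
    for limit in range(max_depth + 1):
        path = []
        if dls(root, goal, limit, path):
            return limit, path
    return None

print(iddfs("A", "G"))  # (2, ['A', 'B', 'D', 'E', 'C', 'F', 'G'])
```

The goal G is found at depth limit 2; the earlier iterations at limits 0 and 1 visit only A and then A, B, C, exactly as in the walkthrough.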
Before moving into bidirectional search, let's first understand two terms: forward search,
which proceeds from the start node towards the goal, and backward search, which proceeds
from the goal node towards the start. Basically, if the average branching factor going out of
a node (fan-out) is less, prefer forward search; else, if the average branching factor going
into a node (fan-in) is less (i.e., fan-out is more), prefer backward search. We must traverse
the tree from both the start node and the goal node, and wherever they meet, the path from
the start node to the goal through the intersection is the optimal solution. The BS Algorithm
is applicable when generating predecessors is easy in both forward and backward directions,
and there exists only one start state and one goal state.
This figure provides a clear-cut idea of how BS is executed. We have node 1 as the start/root
node and node 16 as the goal node. The algorithm divides the search tree into two sub-trees.
So from the start of node 1, we do a forward search, and at the same time, we do a backward
search from goal node 16. The forward search traverses nodes 1, 4, 8, and 9, whereas the
backward search traverses through nodes 16, 12, 10, and 9. We see that both the forward and
backward searches meet at node 9, called the intersection node. The path traced by the
forward search joined with the path traced by the backward search is the optimal solution.
This is how the BS Algorithm is executed.
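The two-frontier idea can be sketched as a bidirectional BFS. The edge list below is an assumption sketching the figure (a chain 1-4-8-9-10-12-16 with a few side branches, so both searches meet at node 9):

```python
from collections import deque

# Bidirectional BFS: expand alternately from the start and the goal,
# stopping at the first node reached from both sides. Edges are assumed.
edges = [(1, 2), (1, 3), (1, 4), (4, 8), (8, 9), (9, 10),
         (10, 12), (12, 16), (16, 15), (12, 11)]
graph = {}
for u, v in edges:
    graph.setdefault(u, set()).add(v)
    graph.setdefault(v, set()).add(u)

def bidirectional_search(start, goal):
    """Return the intersection node where the two BFS frontiers meet."""
    parents_f, parents_b = {start: None}, {goal: None}
    q_f, q_b = deque([start]), deque([goal])
    while q_f and q_b:
        # One expansion step forward, then one backward.
        for q, parents, other in ((q_f, parents_f, parents_b),
                                  (q_b, parents_b, parents_f)):
            node = q.popleft()
            for nbr in graph[node]:
                if nbr not in parents:
                    parents[nbr] = node
                    q.append(nbr)
                if nbr in other:      # frontiers intersect here
                    return nbr
    return None

print(bidirectional_search(1, 16))  # 9
```

Following the parent pointers from node 9 back to both ends would yield the full optimal path 1-4-8-9-10-12-16.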
4. Give an example of a problem for which breadth first search would work better than
depth first search?
Let's consider a scenario where Breadth-First Search (BFS) would be more suitable than
Depth-First Search (DFS): finding a route in a network of cities, where every edge between
cities has the same weight (indicating equal distance or travel time). You want to reach a
destination city from a starting city in the fewest steps.
Graph Structure: The graph represents a network of cities connected by roads, with equal
weights on all edges. It does not have a significant depth or branching factor.
Shortest Path Requirement: You specifically need to find the shortest path from the
starting city to the destination city, without considering other paths or exploring
deeper branches.
Memory Efficiency: BFS explores nodes level by level, ensuring that it always finds the
shortest path first. It may use more memory than DFS due to the need to store all
nodes at the current level, but in this scenario, the memory usage is not a critical
concern.
Shortest Path Guarantee: BFS guarantees that the first time the destination city is
reached, the path found is the shortest one.
Here's a simplified example to illustrate BFS finding the shortest path in a graph:
● Graph:
A -- B -- C
| | |
D -- E -- F
| | |
G -- H -- I
● Starting Point: A
● Destination: F
Applying BFS, the algorithm would explore the graph level by level: start at A; then visit its
neighbours B and D; then C, E, and G; and at the next level reach F, the destination, along a
shortest path such as A -> B -> C -> F (3 edges).
In this scenario, BFS ensures that the shortest path is found by systematically exploring nodes
in order of their distance from the starting point. This approach guarantees optimality in
finding the shortest path, making BFS better suited than DFS when the goal is to find the
shortest path.
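The grid above can be encoded directly, and BFS returns the first (and therefore shortest) path it finds from A to F:

```python
from collections import deque

# The 3x3 grid graph from the example: A-B-C / D-E-F / G-H-I,
# with horizontal and vertical edges only (unweighted).
graph = {
    "A": ["B", "D"], "B": ["A", "C", "E"], "C": ["B", "F"],
    "D": ["A", "E", "G"], "E": ["B", "D", "F", "H"], "F": ["C", "E", "I"],
    "G": ["D", "H"], "H": ["E", "G", "I"], "I": ["F", "H"],
}

def shortest_path(start, goal):
    """BFS over paths; the first path reaching the goal is a shortest one."""
    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nbr in graph[path[-1]]:
            if nbr not in visited:
                visited.add(nbr)
                queue.append(path + [nbr])
    return None

print(shortest_path("A", "F"))  # ['A', 'B', 'C', 'F'] (3 edges)
```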
Let's apply Depth-First Search (DFS) to the same graph used in the previous example and
demonstrate its behavior. We'll start at node A and aim to reach node F.
Start at A: Mark A as visited and explore one of its neighbours (e.g., B).
Explore B: Mark B as visited and explore one of its neighbours (e.g., C).
Explore C: Mark C as visited and explore its unvisited neighbour F.
Explore F: Mark F as visited; a full traversal would continue through E, D, G, H, and I,
backtracking to the previous node whenever no unvisited neighbours remain.
With this neighbour ordering, the sequence of node visits using DFS would be:
A -> B -> C -> F -> E -> D -> G -> H -> I. Note that DFS reaches F here only because of
this particular ordering; with a different ordering (e.g., exploring E before C from B), it
could wander through most of the graph before reaching F.
Now, let's analyze why DFS may not be suitable for finding the shortest path in an
unweighted graph:
Depth-First Nature: DFS prioritizes going as deep as possible along a branch before
backtracking, so it can reach the destination by a long, roundabout route.
No Shortest Path Guarantee: DFS does not guarantee finding the shortest path. Depending
on the order of exploration, DFS may find longer paths before finding the shortest
one.
In contrast, Breadth-First Search (BFS) guarantees finding the shortest path in an unweighted
graph because it explores nodes level by level, ensuring that shorter paths are discovered
first.
For example –
Consider an agent program, Indian Traveller, developed for travelling from Pune to Chennai
through different states. The initial state for this agent can be described as In(Pune).
2) A description of the possible actions available to the agent
The most common formulation uses a successor function. Given a particular state x,
SUCCESSOR-FN(x) returns a set of <action, successor> ordered pairs, where each
action is one of the legal actions in state x and each successor is a state that can be reached
from x by applying the action.
For example:
From the state In(Pune), the successor function for the Indian Traveller problem would
return the set of <action, successor> pairs for the cities directly reachable from Pune.
Together, the initial state and successor function implicitly define the state space of the
problem - which is the set of all states reachable from the initial state.
The state space forms a graph in which the nodes are states and the arcs between nodes are
actions.
3) The goal test, which determines whether a given state is a goal (final) state. In some
problems we can explicitly specify a set of goals; if a particular state is reached, we can
check it against the set of goals, and if a match is found, success can be announced.
For example:
In Indian Traveller problem the goal is to reach chennai i.e. it is a singleton set {In
(Chennai)}.
In certain types of problems we cannot specify goals explicitly. Instead, the goal is specified
by an abstract property rather than an explicitly enumerated set of states.
For example:
In chess, the goal is to reach a state called "checkmate", where the opponent's king is under
attack and cannot escape. This "checkmate" situation can arise in a vast number of different
board states.
4) A path cost function that assigns a numeric cost (value) to each path. The problem-solving
agent is expected to choose a cost-function that reflects its own performance measure.
For the Indian Traveller agent, we can use the time required as the path-cost function. It
should consider the length of each road being travelled.
In general, the step cost of taking action a to go from state x to state y is denoted by
c(x, a, y).
The above four elements define a problem and can be put together in single data structure
which can be given as input to a problem-solving algorithm.
A solution to the problem is a path from the initial state to a goal state.
We can measure quality of solution by the path cost function. We can have multiple solutions
to the problem. The optimal solution will be the one with lowest path cost among all the
solutions.
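The four elements above can be put together in one data structure. The following is a minimal sketch using the Indian Traveller example; the city names and road costs here are hypothetical illustrations, not taken from the source:

```python
# A minimal problem formulation: initial state, successor function,
# goal test, and step-cost function c(x, a, y).
class Problem:
    def __init__(self, initial, roads, goal):
        self.initial = initial        # 1) initial state, e.g. "Pune"
        self.roads = roads            # road map used by the successor function
        self.goal = goal

    def successors(self, state):
        """2) Return <action, successor> pairs for legal moves from state."""
        return [(f"Go({city})", city) for city in self.roads.get(state, {})]

    def goal_test(self, state):
        """3) True when the given state is the goal state."""
        return state == self.goal

    def step_cost(self, state, action, result):
        """4) c(x, a, y): cost of reaching `result` from `state`."""
        return self.roads[state][result]

# Hypothetical road data (distances in km are illustrative only).
roads = {"Pune": {"Hyderabad": 560}, "Hyderabad": {"Chennai": 630}}
p = Problem("Pune", roads, "Chennai")
print(p.successors("Pune"))   # [('Go(Hyderabad)', 'Hyderabad')]
print(p.goal_test("Chennai")) # True
```

Any of the search algorithms above can take such an object as input and return a path from the initial state to a goal state.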
Depth-First Search (DFS) traverses the search tree from the root node, going as deep as
possible along each branch and backtracking whenever a branch is exhausted, as explained
above.
Advantage:
● DFS requires very little memory, as it only needs to store a stack of the nodes on the
path from the root node to the current node.
● It takes less time to reach the goal node than the BFS algorithm (if it traverses the
right path).
Disadvantage:
● There is a possibility that many states keep re-occurring, and there is no guarantee
of finding the solution.
● The DFS algorithm goes deep down in its search and may sometimes enter an infinite
loop.
Verdict
It can take a lot of time to execute when the solution is at the bottom or end of the tree, and
it is implemented using the LIFO stack data structure [DS].
● Complete: No
● Optimal: No
DLS is an uninformed search algorithm, similar to DFS but with a predetermined depth
limit: nodes at the depth limit are treated as if they have no successors, which solves the
infinite-path problem of DFS. As illustrated above for start state A and goal state H, the
search is run with limits 0, 1, and 2 in turn; at limits 0 and 1 the cutoff failure value
reports no solution within the set depth limit, and at limit 2 the goal H is reached.
Depth-limited search can be halted in two cases:
1. Standard Failure: it indicates that the problem does not have any solution.
2. Cutoff Failure Value: it indicates that there is no solution for the problem within the
given depth limit.
Advantages
● Depth-limited search is memory efficient, as it stores only the nodes on the current
path up to the depth limit.
Disadvantages
● The DLS is not complete (the goal may lie beyond the depth limit) and is not optimal
if the problem has more than one goal state.
Verdict
Artificial Intelligence is not a new word and not a new technology for researchers. This
technology is much older than you would imagine; there are even myths of mechanical men
in ancient Greek and Egyptian mythology. The following are some milestones in the history
of AI, which trace the journey from the beginnings of AI to its development today.
● Year 1943: The first work which is now recognized as AI was done by Warren
McCulloch and Walter Pitts in 1943. They proposed a model of artificial neurons.
● Year 1949: Donald Hebb demonstrated an updating rule for modifying the
connection strength between neurons. His rule is now called Hebbian learning.
● Year 1950: Alan Turing, an English mathematician, pioneered machine learning in
1950. Turing published "Computing Machinery and Intelligence", in which he
proposed a test that can check a machine's ability to exhibit intelligent behaviour
equivalent to human intelligence, called the Turing test.
● Year 1952: A computer scientist named Arthur Samuel developed a program to play
checkers, which was the first program ever to learn the game independently.
● Year 1955: Allen Newell and Herbert A. Simon created the "first artificial
intelligence program", which was named the "Logic Theorist". This program proved
38 of 52 mathematics theorems and found new and more elegant proofs for some
of them.
● Year 1956: The term "Artificial Intelligence" was first adopted by the American
computer scientist John McCarthy at the Dartmouth Conference. For the first time,
AI was coined as an academic field. Around that time, high-level computer languages
such as FORTRAN, LISP, and COBOL were invented, and enthusiasm for AI was
very high.
● Year 1958: John McCarthy created LISP (acronym for List Processing), the first
programming language for AI research, which is still in popular use to this day.
● Year 1959: Arthur Samuel coined the term "machine learning" when speaking about
teaching machines to play checkers better than the humans who programmed them.
● Year 1961: The first industrial robot, Unimate, started working on an assembly line at
General Motors in New Jersey, tasked with transporting die castings and welding
parts on cars (work deemed too dangerous for humans).
● Year 1965: Edward Feigenbaum and Joshua Lederberg created the first expert system
which was a form of AI programmed to replicate the thinking and decision-making
abilities of human experts.
● Year 1966: Researchers emphasized developing algorithms that can solve
mathematical problems. Joseph Weizenbaum created the first chatbot in 1966, which
was named ELIZA.
● Year 1972: The first intelligent humanoid robot, named WABOT-1, was built in
Japan.
● Year 1973: An applied mathematician named James Lighthill gave a report to the
British Science Research Council, underlining that strides were not as impressive as
those promised by scientists, which led to much-reduced support and funding for
AI research from the British government.
● Year 1979: James L. Adams created the Stanford Cart in 1961, which became one of
the first examples of an autonomous vehicle. In 1979, it successfully navigated a
room full of chairs without human interference.
● Year 1979: The American Association of Artificial Intelligence which is now known
as the Association for the Advancement of Artificial Intelligence (AAAI) was
founded.
● The duration between 1974 and 1980 was the first AI winter. An AI winter refers
to a period in which computer scientists dealt with a severe shortage of government
funding for AI research.
● During AI winters, public interest in artificial intelligence decreased.
● The duration between the years 1987 to 1993 was the second AI Winter duration.
● Investors and governments again stopped funding AI research due to high costs and
inefficient results. Expert systems such as XCON, although initially very
cost-effective, had become too expensive to maintain.
● Year 1997: In 1997, IBM's Deep Blue beat the world chess champion, Garry
Kasparov, and became the first computer to beat a reigning world chess champion.
● Year 2000: Professor Cynthia Breazeal developed the first robot that could simulate
human emotions with its face, which included eyes, eyebrows, ears, and a mouth. It
was called Kismet.
● Year 2002: For the first time, AI entered the home in the form of Roomba, a vacuum
cleaner.
● Year 2003: NASA landed two rovers on Mars (Spirit and Opportunity), and they
navigated the surface of the planet without human intervention.
● Year 2006: AI had entered the business world by 2006. Companies like
Facebook, Twitter, and Netflix started using AI.
● Year 2010: Microsoft launched the Xbox 360 Kinect, the first gaming hardware
designed to track body movement and translate it into gaming directions.
● Year 2011: In 2011, IBM's Watson won Jeopardy!, a quiz show in which it had
to solve complex questions as well as riddles. Watson proved that it could
understand natural language and solve tricky questions quickly.
● Year 2012: Google launched an Android app feature, "Google Now", which was
able to provide information to the user as a prediction.
● Year 2014: In 2014, the chatbot "Eugene Goostman" won a competition in the
famous "Turing test".
● Year 2018: The "Project Debater" from IBM debated on complex topics with two
master debaters and also performed extremely well.
● Google demonstrated an AI program, "Duplex", a virtual assistant that booked a
hairdresser appointment over the phone; the lady on the other side did not notice
that she was talking to a machine.
● 2020: OpenAI started beta testing GPT-3, a model that uses Deep Learning to create
code, poetry, and other such language and writing tasks. While not the first of its kind,
it is the first that creates content almost indistinguishable from those created by
humans.
● 2021: OpenAI developed DALL-E, which can process and understand images enough
to produce accurate captions, moving AI one step closer to understanding the visual
world.
2. With relevant examples, discuss the agent types and their PEAS descriptions
according to their uses
Simple Reflex Agents
Simple Reflex Agents
1. This is a simple type of agent that works on the basis of the current percept only,
not on the rest of the percept history.
2. The agent function, in this case, is based on condition-action rule where the condition
or the state is mapped to the action such that action is taken only when condition is
true or else it is not.
3. The agent function is successful only if the environment associated with this agent is
fully observable; if it is partially observable, the agent function may enter infinite
loops that can be escaped only by randomizing its actions.
4. The problems associated with this type include very limited intelligence, no
knowledge of non-perceptual parts of the state, huge size for generation and storage,
and inability to adapt to changes in the environment.
Model-Based Agents
1. A model-based agent utilizes the condition-action rule, where it works by finding a
rule whose condition matches the current situation.
2. It consists of two important factors: the Model and the Internal State.
3. The internal state uses the perceptual history to represent the current percept. The
agent keeps track of this internal state, which is adjusted by each percept. The current
internal state is stored by the agent to maintain a kind of structure that can describe
the unseen world.
4. The state of the agent can be updated by gaining information about how the world
evolves and how the agent's actions affect the world.
5. Example: A vacuum cleaner that uses sensors to detect dirt and obstacles and moves
and cleans based on a model.
Goal-Based Agents
1. This type takes decisions on the basis of its goal or desirable situations so that it can
choose such an action that can achieve the goal required.
2. It is an improvement over the model-based agent, where information about the goal is
also included. This is because it is not always sufficient to know just the current
state; knowledge of the goal is a more beneficial approach.
3. The aim is to reduce the distance between action and the goal so that the best possible
way can be chosen from multiple possibilities. Once the best way is found, the
decision is represented explicitly which makes the agent more flexible.
Utility-Based Agents
1. Utility agents have their end uses as their building blocks and are used when the best
action and decision must be chosen from multiple alternatives.
2. It is an improvement over the goal-based agent, as it considers not only the goal but
also the way the goal can be achieved, so that the goal can be reached in a quicker,
safer, or cheaper way.
3. The extra component of utility or method to achieve a goal provides a measure of
success at a particular state that makes the utility agent different.
4. It takes the agent's happiness into account: the utility gives an idea of how happy the
agent is, and the action with maximum utility is chosen. This associated degree of
happiness can be calculated by mapping a state onto a real number.
5. Mapping of a state onto a real number with the help of utility function gives the
efficiency of an action to achieve the goal.
6. Example: A delivery drone that delivers packages to customers efficiently while
optimizing factors like delivery time, energy consumption, and customer satisfaction.
Learning Agents
1. Learning agent, as the name suggests, has the capability to learn from past
experiences and takes actions or decisions based on learning capabilities. Example: A
spam filter that learns from user feedback.
2. It gains basic knowledge from past and uses that learning to act and adapt
automatically.
3. It comprises four conceptual components, which are given as follows:
● Learning element: It makes improvements by learning from the environment.
● Critic: Critic provides feedback to the learning agent giving the performance measure
of the agent with respect to the fixed performance standard.
● Performance element: It selects the external action.
● Problem generator: This suggests actions that lead to new and informative
experiences.
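As an illustration of the simplest of these agent types, a condition-action-rule vacuum agent can be sketched as follows (a minimal sketch; the two-square world and the action names are assumptions for illustration, not from the source):

```python
# A simple reflex vacuum-cleaner agent: it acts on the current percept
# only, using condition-action rules, with no percept history.
def reflex_vacuum_agent(percept):
    location, status = percept          # e.g. ("A", "Dirty")
    if status == "Dirty":               # rule: dirty square -> suck
        return "Suck"
    if location == "A":                 # rule: clean at A -> move right
        return "Right"
    return "Left"                       # rule: clean at B -> move left

print(reflex_vacuum_agent(("A", "Dirty")))   # Suck
```

Because the agent consults only the current percept, it cannot tell whether square B is already clean, which illustrates the "no knowledge of non-perceptual parts of the state" limitation noted above.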
PEAS stands for performance measure, environment, actuators, and sensors. PEAS defines
AI models and helps determine the task environment for an intelligent agent.
Performance measure: It defines the success of an agent. It evaluates the criteria that
determine whether the system performs well.
Environment: The surroundings in which the agent operates, including everything the agent
perceives and acts upon.
Actuators: They are responsible for executing actions based on the decisions made. They
interact with the environment to bring about desired changes.
Sensors: An agent observes and perceives its environment through sensors. Sensors provide
input data to the system, enabling it to make informed decisions.
Examples
Dijkstra’s algorithm is used to find the shortest path between two given vertices of a
graph, applying the greedy approach as its basic principle. For example, it is used to find
the shortest route from your current location to a destination on Google Maps. Now
let’s look at the working principle of Dijkstra’s algorithm.
To find the shortest path between two given vertices of a graph, we will follow the following
mentioned steps of the algorithm/approach, which are:
Note: By default, the source node's immediate and non-immediate distance to the other nodes
in the graph is “∞ (Infinite).”
For Example: Find the shortest path for the given graph.
1. We will find the shortest path from node A to the other nodes in the graph, assuming that
node A is the source.
if 0 + 20 < ∞
>[TRUE]
Node A to Node B = 20
if 0 + 50 < ∞
>[TRUE]
Node A to Node C = 50
if 20 + 10 < 50
>[TRUE]
Node A to Node C (via B) = 30
5. By the value obtained from step 3, we change the shortest distance from node A to
node C to 30 from the previous distance of 50.
EXAMPLE:
Let’s apply Dijkstra’s Algorithm for the graph given below, and find the shortest path from
node A to node C:
Solution:
2. Calculating the distance between node A and the immediate nodes (node B & node D):
For node B,
Node A to Node B = 3
For node D,
Node A to Node D = 8
3. Choose the node with the shortest distance to be the current node from unvisited nodes,
i.e., node B. Calculating the distance between node B and the immediate nodes:
For node E,
4. Choose the node with the shortest distance to be the current node from unvisited nodes,
i.e., node D. Calculating the distance between node D and the immediate nodes:
For node E,
For node F,
5. Choose the node with the shortest distance to be the current node from unvisited nodes,
i.e., node E. Calculating the distance between node E and the immediate nodes:
For node C,
For node F,
For node C,
Node A to Node C (via F) = 10 + 3 = 13 (13 < 18 TRUE: so, change the previous value of 18 to 13)
So, after performing all the steps, we have the shortest path from node A to node C, i.e., a
value of 13 units.
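The relaxation steps above can be sketched with a priority-queue implementation; the graph below encodes the first worked example (A-B = 20, A-C = 50, B-C = 10):

```python
import heapq

def dijkstra(graph, source):
    # Distance to every node starts at infinity, except the source.
    dist = {node: float("inf") for node in graph}
    dist[source] = 0
    pq = [(0, source)]                      # min-heap of (distance, node)
    while pq:
        d, u = heapq.heappop(pq)
        if d > dist[u]:                     # stale queue entry, skip
            continue
        for v, w in graph[u].items():
            if d + w < dist[v]:             # relaxation step
                dist[v] = d + w
                heapq.heappush(pq, (dist[v], v))
    return dist

# Graph from the first worked example: A-B = 20, A-C = 50, B-C = 10
graph = {"A": {"B": 20, "C": 50}, "B": {"A": 20, "C": 10}, "C": {"A": 50, "B": 10}}
print(dijkstra(graph, "A"))   # shortest A -> C is 30, via B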
4. Define the following problems. What type of control strategy is used in the following
problems?
1. Tower of Hanoi
2. Crypto-arithmetic
● The Tower of Hanoi problem is most commonly solved using a recursive algorithm
based on the divide-and-conquer strategy.
If there is only one disk to move, move it directly to the target rod.
If there is more than one disk, recursively move the top n−1 disks from the
source rod to an auxiliary rod, using the target rod as a temporary holding area.
Move the largest disk from the source rod to the target rod.
Finally, recursively move the n−1 disks from the auxiliary rod to the target rod.
● This recursive approach efficiently solves the Tower of Hanoi problem for any
number of disks in 2^n − 1 moves, where n is the number of disks.
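The divide-and-conquer recursion can be written directly (a minimal sketch; the rod names are illustrative):

```python
def hanoi(n, source, target, auxiliary, moves):
    # Divide and conquer: move n-1 disks aside, move the largest disk,
    # then move the n-1 disks back on top of it.
    if n == 1:
        moves.append((source, target))
        return
    hanoi(n - 1, source, auxiliary, target, moves)
    moves.append((source, target))            # largest disk moves once
    hanoi(n - 1, auxiliary, target, source, moves)

moves = []
hanoi(3, "A", "C", "B", moves)
print(len(moves))   # 2**3 - 1 = 7 moves
```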
Cryptarithmetic Problem:
Definition: A cryptarithmetic problem is a puzzle in which the digits of an arithmetic
expression are replaced by letters; the goal is to assign a distinct digit to each letter so
that the arithmetic holds. It is typically solved with a constraint-satisfaction control
strategy such as backtracking.
These detailed explanations should give you a comprehensive understanding of the control
strategies used for solving the Tower of Hanoi problem and cryptarithmetic puzzles.
UNIT II
PROBLEM SOLVING WITH SEARCH TECHNIQUES
Q.No PART – A
1. What is informed search? CO2 K1
An "informed search" refers to a category of search algorithms
that use specific knowledge about the problem domain to find
solutions more efficiently. This is contrasted with "uninformed
search" algorithms, which do not have any additional information
about the state space or the goal beyond the problem definition.
Informed search algorithms leverage heuristics to guide the search
towards goal states, potentially reducing the number of states they
have to explore compared to an uninformed search.
2. Why does one go for heuristic search? CO2 K2
Efficiency
Scalability
Practicality
Solvability
Flexibility
Trade-off between optimality and computation time
3. Differentiate Blind Search and Heuristic Search? CO2 K2
Blind search, also known as uninformed search, does not use any
additional information about the problem other than the problem's
structure itself. It systematically explores the search space without
any guidance on which direction might lead to a solution.
Heuristic search, or informed search, uses additional information
about the problem (usually in the form of a heuristic function) to
make estimates about the benefit of following each path. It uses
this information to guide the search more intelligently towards the
goal.
4. What is CSP? CO2 K1
In the context of machine learning and artificial intelligence, CSP
stands for Constraint Satisfaction Problem. A Constraint
Satisfaction Problem is a mathematical question defined by a set
of objects whose state must satisfy a number of constraints or
limitations. CSPs are a type of problem frequently encountered in
fields such as AI, computer science, and operations research,
where the goal is to find a configuration of variables that meets all
the given constraints.
5. State Game theory. CO2 K1
Game theory is a mathematical framework designed for analyzing
situations in which players make decisions that are
interdependent. This interdependence causes each participant to
consider the other participants' decisions or strategies when
formulating their own strategy. Originally developed as a tool for
understanding economic behavior, game theory is now used in
various fields, including psychology, biology, politics, and
computer science, to study competitive situations where the
outcome for each participant depends on the actions of others.
6. What is perfect information and imperfect information? CO2 K1
A game of perfect information is one in which all players have
complete knowledge about the game's state and the history of play
at all times. This means that every decision in the game is made
with full knowledge of all the events that have previously
occurred. There are no hidden cards, secret moves, or private
information. Each player, when making a decision, knows the full
history of actions that have led to that point.
A game of imperfect information is one in which some information
is hidden from at least one player, such as concealed cards or
simultaneous secret moves, so decisions must be made without
complete knowledge of the game state.
Players
There are two players in Tic-Tac-Toe:
1. Player X
2. Player O
1. Alpha (α): The best value that the maximizer currently can
guarantee at that level or above.
2. Beta (β): The best value that the minimizer currently can
guarantee at that level or above.
2. Choose a Value
Once a variable is selected, choose a value for that variable from
its domain. The order of value selection can also be optimized
through heuristics such as Least Constraining Value, which
prefers the value that rules out the fewest choices for the
neighboring variables in the constraint graph.
3. Apply Constraints
Apply constraints to the current assignment to filter the domains
of the remaining variables. This could involve:
5. Recursive Call
If the current assignment does not lead to a contradiction:
6. Solution Check
If all variables are successfully assigned without contradictions:
7. Exit Condition
The process terminates when:
● A solution is found, or
● All possibilities are exhausted, indicating that no solution
exists.
Q.No PART-B
1. Explain A* Search with an example.(13) CO2 K2
A* Search is a popular and powerful pathfinding and graph
traversal algorithm that efficiently finds the shortest path from a
start node to a target node while trying to minimize the total cost
(distance, time, etc.). It combines features of Dijkstra’s Algorithm
(which finds the shortest path) and Greedy Best-First-Search
(which is faster but less accurate) by using heuristics to estimate
the cost of the cheapest path from each node to the destination.
f(n)=g(n)+h(n)
1. Initialize: Start with only the initial node. This node's g(n)
is zero because it's the starting point. h(n) is calculated
using the heuristic.
2. Open Set: This is a priority queue that stores all the nodes
to be explored. Nodes are sorted by their f(n) values.
3. Closed Set: A set of nodes already explored.
4. Loop:
● Choose the node with the lowest f(n) value from
the open set. This is the most promising next step.
● If this node is the target node, reconstruct the path
from start to finish.
● For each neighbor of this node:
● If the neighbor is in the closed set, skip it.
● Calculate g(n) for the neighbor. If it's not
already in the open set, or if the new g(n)
is lower than previously recorded, update
the neighbor’s g(n).
● Update the neighbor's f(n) and add it to
the open set.
● Move the current node to the closed set and repeat.
5. Completion: The loop continues until the open set is
empty (meaning no path was found) or the target node is
dequeued from the open set (path found).
Starting from (0,0), the algorithm explores paths using the sum of
the actual distance traveled from the start and the Manhattan
distance to the goal. It will efficiently navigate around blocks by
dynamically updating paths based on the lowest f(n) values until
it reaches (4,4).
Example
Consider a simplified map where cities are nodes and roads are
edges connecting these nodes. The goal is to find a route from city
A to city D. The heuristic used is the straight-line distance from
each city to city D (the target).
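The loop described above can be sketched with Python's heapq as the open set; the road map and straight-line-distance heuristic values below are illustrative, not taken from the source:

```python
import heapq

def a_star(graph, h, start, goal):
    # graph: node -> {neighbor: edge_cost}; h: heuristic estimate to goal.
    open_set = [(h[start], 0, start, [start])]   # (f, g, node, path)
    closed = set()
    while open_set:
        f, g, node, path = heapq.heappop(open_set)
        if node == goal:
            return path, g                       # reconstructed path found
        if node in closed:
            continue
        closed.add(node)
        for nbr, cost in graph[node].items():
            if nbr not in closed:
                # f(n) = g(n) + h(n) orders the open set
                heapq.heappush(open_set,
                               (g + cost + h[nbr], g + cost, nbr, path + [nbr]))
    return None, float("inf")

# Hypothetical road map and straight-line-distance heuristic to city D.
graph = {"A": {"B": 1, "C": 4}, "B": {"C": 2, "D": 6}, "C": {"D": 3}, "D": {}}
h = {"A": 4, "B": 3, "C": 2, "D": 0}
print(a_star(graph, h, "A", "D"))   # (['A', 'B', 'C', 'D'], 6)
```

The direct route A-B-D (cost 7) is passed over because f(C) = 3 + 2 = 5 is smaller, so A* expands C first and finds the cheaper path through it.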
1. Define Variables
Each letter (C, R, O, S, A, D, N, G, E) represents a distinct digit
(0-9).
2. Define Domain
Each variable (letter) can take a value between 0 and 9. However,
the leading digits (C, R, D) cannot be zero because they are at the
start of a number representation.
3. Define Constraints
● Each letter must represent a different digit.
● The sum "CROSS" + "ROADS" must equal "DANGER"
when the letters are replaced with their corresponding digit
values.
4. Set up the problem:
To solve the problem, we can express it as:
● C, R, O, S, A, D, N, G, E ∈ {0, 1, 2, 3, 4, 5, 6, 7, 8, 9}
● C ≠ 0, R ≠ 0, D ≠ 0 (since they are the most significant digits in
their respective numbers)
● All digits are unique.
5. Mathematical Representation:
Let's set up the equation for the puzzle:
(10000C + 1000R + 100O + 10S + S) + (10000R + 1000O + 100A + 10D + S)
= 100000D + 10000A + 1000N + 100G + 10E + R
6. Constraint Satisfaction:
The solution requires finding values for C, R, O, S, A, D, N, G, E
that satisfy all equations and constraints. This involves checking
combinations and ensuring all constraints are met.
7. Solve:
This problem can be solved via a constraint satisfaction algorithm,
such as backtracking with forward checking. Each step involves:
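As a sketch of such a search (brute force over digit permutations rather than backtracking with forward checking, for brevity), the CROSS + ROADS = DANGER puzzle can be solved directly; the D = 1 pruning follows because the sum of two 5-digit numbers is below 200000:

```python
from itertools import permutations

def solve_cross_roads_danger():
    # Try every assignment of distinct digits to the nine letters.
    for C, R, O, S, A, D, N, G, E in permutations(range(10), 9):
        if C == 0 or R == 0 or D == 0:   # leading digits cannot be zero
            continue
        if D != 1:                       # 6-digit sum of two 5-digit numbers
            continue                     # forces its leading digit to be 1
        cross  = C*10000 + R*1000 + O*100 + S*11
        roads  = R*10000 + O*1000 + A*100 + D*10 + S
        danger = D*100000 + A*10000 + N*1000 + G*100 + E*10 + R
        if cross + roads == danger:
            return cross, roads, danger
    return None

print(solve_cross_roads_danger())
```

A real CSP solver would interleave assignment with constraint propagation to prune most of these permutations; the brute-force version is just the shortest correct illustration.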
Minimax Algorithm
The Minimax algorithm aims to minimize the possible loss for a
worst-case scenario. When applied to games like chess, tic-tac-
toe, etc., the algorithm considers all possible moves, simulates
them on the game board, and returns the best move the player can
make. Here's how it works:
Example: Tic-Tac-Toe
● Imagine a simple scenario in tic-tac-toe where you (X)
have two possible moves. One move leads to an immediate
win, and the other move continues the game without a
clear path to victory.
● The Minimax algorithm evaluates the immediate win as a
higher value (+1) and the other path as less favorable (0 or
negative, depending on the likelihood of eventually
losing). Thus, it chooses the immediate win.
Alpha-Beta Pruning
Alpha-Beta Pruning is an optimization technique for the Minimax
algorithm. It reduces the number of nodes evaluated in the search
tree by pruning branches that cannot possibly influence the final
decision. It uses two values, alpha and beta:
CSPs Aim
The goal of a CSP is to find a value for each variable from its
domain such that all constraints are satisfied. If no assignment
satisfies all constraints, the problem is deemed unsolvable.
1. Players
The game involves a set of players, typically two in
adversarial games, each with their own objectives.
2. Initial State
The game begins from an initial state. This is the setup of
the game at the start, such as the board and piece
positions.
3. States
The state space of the game includes all possible
configurations reachable from the initial state.
4. Actions
For each state, there are actions available to the players.
5. Transition Model
This defines what the next state of the game will be, given
a current state and an action by a player. In deterministic
games, the transition model is a function RESULT(s, a)
that returns the state reached when action a is taken in state s.
6. Terminal States
These are the states where the game ends. In many
games, these are win, loss, or draw positions.
7. Utility Function
Also known as the payoff or objective function, it assigns
a numeric value to each terminal state for each player. In zero-
sum games, the utility values for the players are exact
opposites.
8. Player Function
Player(s) specifies which player has the move in state s.
Example: Chess
● Players: Two players (White and Black).
● Initial State: Standard board setup with pieces in
initial positions.
● States: Any valid arrangement of pieces on the
board.
● Actions: Any legal move according to the rules of
chess.
● Transition Model: Given a board state and a
player's move, it defines the new board
configuration.
● Terminal States: States where the game is in
checkmate, stalemate, or agreed draw.
● Utility Function: +1 for a win, 0 for a draw, -1 for a
loss (from the perspective of one player).
● Player Function: Alternates between White and
Black depending on the turn.
Together, these elements give the game a complete formal structure.
Part - C
Q.No Questions CO’s Bloom’s Level
Algorithm Description
The steepest hill climbing algorithm iteratively evaluates all
neighbors of the current state and moves to the neighbor that most
improves the objective, for example an objective such as f(x) = −x².
Considerations
● Local vs. Global Maximum: Steepest Hill Climbing
can get stuck at local maxima or plateaus and may not
find the global maximum.
● Step Size and Neighbors: The definition of neighbors
(step size, direction) significantly impacts the
efficiency and outcome of the algorithm.
● Restart Strategies: To overcome local maxima, a
common strategy is to restart the algorithm from
different initial states (random-restart hill climbing),
which is especially useful on rugged landscapes.
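A minimal one-dimensional sketch of steepest-ascent hill climbing; the objective f(x) = −(x − 2)² and the step size are illustrative choices, not from the source:

```python
def hill_climb(f, x, step=0.1, max_iters=1000):
    # At each step, evaluate both neighbors and move to the better one,
    # stopping when neither neighbor improves (a local maximum).
    for _ in range(max_iters):
        neighbors = [x - step, x + step]
        best = max(neighbors, key=f)
        if f(best) <= f(x):
            break
        x = best
    return x

# Maximize f(x) = -(x - 2)^2, whose global maximum is at x = 2.
peak = hill_climb(lambda x: -(x - 2) ** 2, x=0.0)
print(round(peak, 1))    # 2.0
```

On this smooth, single-peaked objective the climb always succeeds; on a function with several peaks the same code would stop at whichever local maximum the starting point leads to, which is exactly what random restarts address.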
4. Selection: From the open list, select the node with the lowest f(n).
5. Repetition: Move the chosen node to the closed list and
repeat the process from step 2.
6. Completion: This continues until the goal is found or the
open list is empty (no path).
1. Start at S with f(S) = g(S) + h(S) = 0 + 3 = 3 (Manhattan
distance from S to G).
2. Explore neighbors (down and right from S). Calculate
f for each:
● Down to [1]: g = 1, h = 3, so f = 4.
● Right to [1]: g = 1, h = 2, so f = 3.
3. Select the cell with the lowest f (right from S), and
continue.
4. Keep expanding until reaching G, always selecting
the node with the lowest f value from the open list.
Path Found: S -> Right [1] -> Right [1] -> Down to G.
Limitations of A* Algorithm
● Heuristic Dependent: The performance and efficiency
of A* depend significantly on the heuristic used. A
poorly chosen heuristic may lead to inefficient
pathfinding and longer processing times.
● Memory Consumption: A* keeps all generated nodes
in memory (in the open or closed list), which can
become a problem in large graphs or complex
environments, potentially leading to high memory
use.
● Admissibility and Consistency: The heuristic needs to
be admissible (it never overestimates the true cost) and
consistent (for every node n and successor n′,
h(n) ≤ cost(n, n′) + h(n′)) for A* to guarantee the
shortest path. Crafting such heuristics can be non-
trivial.
● Optimality vs. Efficiency: While A* is optimal when
the heuristic is admissible, its runtime can still be
slow for very large graphs because it potentially
needs to explore many nodes.
Q.No PART – A
1. What is Machine learning? CO3 K1
Machine Learning is a branch of artificial intelligence that
develops algorithms by learning the hidden patterns of datasets
and uses them to make predictions on new, similar data,
without being explicitly programmed for each task.
3.
Dividing Data: Splitting the dataset into multiple folds (e.g., 5
or 10).
Rotating Roles: Training the model on a subset of the folds and
using the remaining fold for testing. This process is repeated
until each fold has served as a testing set.
4. What is ‘Training set’ and ‘Test set’? CO3 K1
Training Set: The data used to train a machine learning model,
allowing it to learn patterns.
Test Set: The held-out data used to evaluate how well the trained
model generalizes to unseen examples.
Given:
P(Positive | Disease)= The probability of testing positive given the
person has the disease (95%).
P(B|A)=0.95
P(Disease)= The probability a random person has the disease (1%)
P(A)=0.01
To Find: P(Disease | Positive)= P(A|B)= The probability of having the
disease given a positive test.
Given:
Overall probability of rain: 30% of the time, it rains in your city.
Probability of rain given cloudy skies: 50%
Probability of rain given clear skies: 10%
Forecast for tomorrow: Cloudy
To Find: P(Rain | Cloudy) = Probability of rain given it's cloudy.
Calculation:
Probability of cloudy weather: Let's assume the probability of
cloudy weather is 70%.
Probability of NOT cloudy weather: This would be 100% - 70% =
30%.
a) Supervised Learning
b) Unsupervised Learning
c) Semi-Supervised Learning
Linear Regression vs
Logistic Regression
Linear Regression and Logistic Regression are two well-known
machine learning algorithms that come under the supervised
learning technique. Since both algorithms are supervised in
nature, they use labeled datasets to make predictions. The main
difference between them is how they are used: Linear Regression
is used for solving regression problems, whereas Logistic
Regression is used for solving classification problems. The
description of both algorithms is given below, along with a
difference table.
Linear Regression:
y= a0+a1x+ε
Logistic Regression:
o The equation for logistic regression is:
y(x) = e^(a0 + a1x1 + a2x2 + … + aixi) / (1 + e^(a0 + a1x1 + a2x2 + … + aixi))
Equation of linear regression:
y = a0 + a1x1 + a2x2 + … + aixi
Here,
y = response variable
xi = ith predictor variable
ai = average effect on y as xi increases by 1
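The relationship between the two equations can be shown numerically: both models start from the same linear combination, but logistic regression squashes it through the sigmoid to obtain a probability (the coefficients a0, a1 and the input x below are illustrative):

```python
import math

a0, a1 = -4.0, 2.0                     # illustrative coefficients
x = 3.0

linear_pred = a0 + a1 * x              # linear regression: any real value
logistic_pred = 1 / (1 + math.exp(-(a0 + a1 * x)))   # logistic: in (0, 1)
print(linear_pred, round(logistic_pred, 3))   # 2.0 0.881
```

The linear output 2.0 could be any real number (a regression target), while the logistic output 0.881 is interpretable as a class probability, matching the regression-vs-classification distinction above.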
Given:
Disease Prevalence: Only 1 out of 1,000,000 people have the
disease.
Test Accuracy (true positive): If someone has the disease,
there's a 99% chance the test will be positive.
Test Accuracy (false positive): If someone is healthy, there's
a 1 in 1,000 (0.1%) chance the test will be positive.
To Find: The probability of having the disease given a
positive test .
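Plugging these numbers into Bayes' theorem shows how strongly the low prevalence dominates the result:

```python
# Bayes' theorem on the numbers given above: prevalence 1 in 1,000,000,
# sensitivity 99%, false-positive rate 0.1%.
p_disease = 1 / 1_000_000
p_pos_given_disease = 0.99
p_pos_given_healthy = 0.001

# Total probability of a positive test (law of total probability).
p_pos = (p_pos_given_disease * p_disease
         + p_pos_given_healthy * (1 - p_disease))
# P(Disease | Positive) = P(Positive | Disease) * P(Disease) / P(Positive)
p_disease_given_pos = p_pos_given_disease * p_disease / p_pos
print(f"{p_disease_given_pos:.6f}")   # about 0.000989, i.e. roughly 0.1%
```

Despite the 99% test accuracy, a positive result still means only about a 1-in-1000 chance of actually having the disease, because false positives from the vast healthy population outnumber true positives.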
Cross-Validation
Purpose: Cross-validation is a technique used
to evaluate how well a machine learning model
generalizes to unseen data. It's essential for
reliable model selection and tuning
hyperparameters.
How it Works:
1. Data Splitting: The dataset is divided into
multiple subsets called "folds" (e.g., 5
folds, 10 folds).
2. Iterative Training and Testing: For each
fold:
The fold is held out as a test set.
The model is trained on the
remaining folds.
The model's performance is
evaluated on the held-out test fold.
3. Averaging Results: The performance
scores from each iteration are averaged
to get an overall estimate of the model's
generalization ability.
Types: Common types of cross-validation
include:
o k-fold Cross-Validation: Data is split into
'k' folds.
o Stratified k-fold Cross-Validation:
Ensures each fold has a similar
distribution of classes as the whole
dataset (helpful for imbalanced data).
o Leave-One-Out Cross-Validation: A
form of k-fold where each fold contains a
single data point.
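The splitting step can be sketched without any libraries (a minimal index-splitting helper written for illustration; real pipelines typically use a ready-made splitter such as scikit-learn's KFold):

```python
def k_fold_indices(n_samples, k):
    # Split sample indices into k roughly equal folds; each fold serves
    # once as the test set while the remaining folds form the training set.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0)
                  for i in range(k)]
    splits, start = [], 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = [i for i in range(n_samples) if i < start or i >= start + size]
        splits.append((train, test))
        start += size
    return splits

for train, test in k_fold_indices(10, 5):
    print(test)    # each of the 10 indices lands in exactly one test fold
```

Training and scoring the model once per split, then averaging the k scores, gives the generalization estimate described above.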
Overfitting
What it is: Overfitting occurs when a machine
learning model becomes too complex and
"memorizes" the training data along with its
noise and peculiarities. This leads to excellent
performance on the training set but poor
performance on new, unseen data.
Signs of Overfitting:
o Large gap between training and
validation/test accuracy.
o The model performs exceptionally well on
training data but struggles with new
examples.
How to Prevent Overfitting:
o Regularization: Techniques like L1
(Lasso) or L2 (Ridge) regularization add a
penalty term to the model's cost function
to discourage overly complex models.
o Early Stopping: Training is stopped
when performance on a validation set
starts to degrade.
o Simpler Models: Start with less complex
models and gradually increase complexity
if needed.
o More Data: If possible, collect more
training data.
o Cross-Validation: To reliably assess
model performance and choose
hyperparameters.
Underfitting
What it is: Underfitting happens when a model
is too simple to capture the underlying patterns
in the data. This results in poor performance on
both the training set and unseen data.
Signs of Underfitting:
o Poor accuracy on both training and
validation/test sets.
o The model fails to learn the complexity of
the data.
How to Prevent Underfitting:
o Increase Model Complexity: Try models
with more features or layers.
o Train for Longer: Allow more training
time for the model to learn.
o Feature Engineering: Create more
informative features.
UNIT 4
PART – A
1) Define Neural Network in the context of artificial intelligence.
Ans: A neural network is a machine learning program that emulates the human
brain’s decision-making process. It consists of layers of artificial neurons,
including an input layer, one or more hidden layers, and an output layer. These
interconnected nodes use weights and thresholds to process data, allowing
neural networks to classify and cluster information efficiently.
2) What is the purpose of an activation function in a neural network?
Ans: The activation function decides whether a neuron should be activated or
not by calculating the weighted sum and further adding bias to it. Its purpose is
to introduce non-linearity into the output of a neuron, allowing neural networks
to learn and perform more complex tasks.
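The weighted sum followed by two common activation functions can be sketched as follows (the weights, inputs, and bias values are illustrative):

```python
import math

def sigmoid(z):
    # Squashes the weighted sum into (0, 1); its non-linearity is what
    # lets stacked layers represent non-linear functions.
    return 1 / (1 + math.exp(-z))

def relu(z):
    # Passes positive values through unchanged, zeroes out negatives.
    return max(0.0, z)

weighted_sum = 0.4 * 1.0 + 0.6 * 0.5 + 0.1    # weights . inputs + bias
print(round(sigmoid(weighted_sum), 2), relu(-2.0))
```

Without such a non-linear activation, any stack of layers collapses into a single linear transformation, which is why the activation function is essential.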
3) Differentiate between a perceptron and a multi-layer perceptron (MLP).
Ans:
A perceptron is a single-layer neural network used for binary classification,
capable of learning linearly separable patterns. It consists of an input layer
connected directly to an output layer.
In contrast, a multi-layer perceptron (MLP) has multiple hidden layers
between the input and output layers. These additional layers allow it to learn
complex, non-linear relationships within the data.
4) State backpropagation in neural networks.
Ans: Backpropagation, also known as backward propagation of errors, is a
widely used method for calculating derivatives within deep feedforward neural
networks. It plays a crucial role in training these networks, such as stochastic
gradient descent, by determining how much each weight contributes to the
overall error or loss of the network’s predictions.
5) What is Decision tree?
Ans: A decision tree is a versatile supervised machine-learning algorithm used
for both classification and regression tasks. It constructs a flowchart-like tree
structure where each internal node represents a test on an attribute, branches
denote the outcomes of the test, and leaf nodes hold class labels.
6) What is pruning in decision tree and how is it done?
Ans: Decision tree pruning is a crucial technique in machine learning that
optimizes decision tree models by reducing overfitting and improving
generalization. It involves pre-pruning (early stopping) and post-pruning
(reducing nodes) to simplify the tree and enhance its ability to generalize to new
data.
7) Describe the structure of a decision tree and its components.
Ans: A decision tree is a hierarchical model that represents decisions and their
possible outcomes based on certain conditions. It consists of three main components: root
node, internal nodes, and leaf nodes. The root node represents the initial
decision, internal nodes correspond to intermediate decisions based on features,
and leaf nodes indicate the final class labels or regression values.
8) What is the difference between Gini impurity and entropy as splitting
criteria in decision trees?
Ans:
Gini Impurity: It measures how heterogeneous or mixed a set is. The Gini
index ranges from 0 (maximum purity) to 0.5 (maximum impurity). A lower
Gini impurity indicates better separation of classes.
Entropy: Derived from information theory, entropy quantifies the disorder
or uncertainty in a set. It ranges from 0 (maximum purity) to 1 (maximum
impurity). Lower entropy signifies better class separation.
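Both criteria can be computed directly from the class proportions at a node (a minimal sketch):

```python
import math

def gini(probs):
    # Gini impurity: 1 - sum(p^2); 0 for a pure node, 0.5 for a
    # 50/50 binary split.
    return 1 - sum(p * p for p in probs)

def entropy(probs):
    # Shannon entropy in bits: 0 for a pure node, 1 for a 50/50
    # binary split.
    return -sum(p * math.log2(p) for p in probs if p > 0)

print(gini([0.5, 0.5]), entropy([0.5, 0.5]))   # 0.5 1.0
```

A decision tree split is chosen to reduce the weighted impurity of the child nodes; either measure can be used, with entropy penalizing mixed nodes slightly more strongly.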
9) Mention the classification algorithms.
Ans:
Logistic Regression: A binary classifier that predicts categorical outcomes
(e.g., Yes/No, Spam/Not Spam) based on input features.
Support Vector Machine (SVM): Learns decision boundaries to separate
classes in both linear and non-linear scenarios.
Decision Tree: Constructs a tree-like structure to make decisions based on
features.
Artificial Neural Network (ANN): A powerful model inspired by the
human brain, capable of handling complex relationships.
10) State the role of support vectors in SVM classification?
Ans: Support vectors play a crucial role in SVM classification. These data
points are closest to the hyperplane and significantly influence its position and
orientation. By maximizing the margin of the classifier using these support
vectors, we achieve better separation between classes. Removing support
vectors would alter the hyperplane’s position, emphasizing their importance in
building an effective SVM model.
11) What is Random Forest?
Ans: Random Forest is a powerful and versatile supervised machine learning
algorithm that grows and combines multiple decision trees to create a “forest.”
It can be used for both classification and regression problems. During training,
it constructs several decision trees, each using a random subset of the dataset
and features.
12) Why is random forest better than SVM?
Ans: Random Forest is advantageous when dealing with complex and high-
dimensional datasets, as it can handle feature interactions effectively.
Additionally, it provides robustness against overfitting, which can be a
challenge for SVMs, especially when the dataset is noisy or contains outliers.
13) State Rule based classification with an example.
Ans: Rule-based classifiers use a set of if-then rules to assign instances to
predefined classes. These rules are interpretable and often used for creating
descriptive models.
Suppose we’re classifying emails as either “Spam” or “Not Spam.” Our rules
could be:
Rule 1 (Keyword “Free”):
If an email contains the word “Free” then classify it as “Spam.”
Rule 2 (Exclamation Marks):
If an email has three or more exclamation marks, then classify it as “Spam.”
14) Define Naive Bayes classification in machine learning.
Ans: Naive Bayes Classifier is a probabilistic machine learning model based
on Bayes’ theorem. It assumes independence between features and calculates
the probability of a given input belonging to a particular class. Naive Bayes is
widely used in text classification, spam filtering, and recommendation systems.
15) Specify the limitations of Naive Bayes classifiers.
Ans:
Naive Bayes assumes that all features are independent of each other, which
is often unrealistic in practical applications.
When encountering unseen features (i.e., features not present in the training
data) for a particular class, Naive Bayes may produce zero class
probabilities.
PART – B
1) Explain the steps in back propagation algorithm. What is the
importance of it in designing the neural network?
The backpropagation algorithm proceeds through the following stages of its learning trajectory:
Forward Pass: During the initial forward pass, input data traverses through the
network, layer by layer, generating activations. Each neuron computes a
weighted sum of its inputs, applies an activation function, and forwards the
result to subsequent layers.
Input Data: Start with input data (features).
Weighted Sum and Activation: Compute the weighted sum of inputs for each
neuron, apply an activation function (e.g., sigmoid, ReLU), and propagate
the output forward through the layers.
Output Prediction: Obtain the final prediction (output) of the network.
Error Calculation: After the forward pass, the network's output is compared
against the actual target values to compute the error. This quantification typically
employs a loss function, such as mean squared error or cross-entropy loss,
providing a measure of the discrepancy between predicted and true values.
Compare the predicted output with the actual target (ground truth) using a
loss function (e.g., mean squared error, cross-entropy).
The goal is to minimize this loss.
Backward Pass (Backpropagation): The crux of the backpropagation
algorithm unfolds in the backward pass, where the error is retroactively
propagated through the network to compute weight gradients. Commencing
from the output layer, gradients of the loss function with respect to neuron
weights are computed, employing the chain rule of calculus.
Gradient Descent: Calculate the gradient of the loss with respect to each
weight and bias.
Chain Rule: Use the chain rule from calculus to compute gradients layer by
layer.
Update Weights and Biases: Adjust weights and biases in the opposite
direction of the gradient to minimize the loss.
Weight Update: Post-gradients computation, weights are adjusted in the
opposite direction of the gradient to minimize error. This crucial adjustment
transpires through optimization algorithms like gradient descent, orchestrating
weight updates proportional to the negative gradient, hence refining network
parameters. It is expressed mathematically as,
∂Loss/∂Weight = (∂Loss/∂Output) ⋅ (∂Output/∂Weight)
Iterative Training: The backpropagation cycle iterates through multiple
epochs, facilitating gradual refinement of network weights. Each iteration
encompasses a forward pass, error calculation, backward pass, and weight
update, iteratively honing the network's predictive prowess until convergence.
Iterate through the entire dataset multiple times (epochs).
In each epoch, update weights based on the average gradient across all
samples.
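The cycle above can be sketched for a single sigmoid neuron trained on one sample; the data, learning rate, and epoch count are assumptions for illustration:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# One-neuron network; each loop iteration mirrors the steps above.
w, b, lr = 0.5, 0.0, 1.0
x, target = 1.5, 1.0

for epoch in range(200):
    # Forward pass: weighted sum, then activation.
    y = sigmoid(w * x + b)
    # Error calculation: squared error for this sample.
    loss = (y - target) ** 2
    # Backward pass: chain rule  dLoss/dw = dLoss/dy · dy/dz · dz/dw.
    dy = 2 * (y - target)   # dLoss/dy
    dz = y * (1 - y)        # dy/dz (sigmoid derivative)
    # Weight update: move against the gradient.
    w -= lr * dy * dz * x
    b -= lr * dy * dz

print(sigmoid(w * x + b))  # the prediction approaches the target of 1.0
```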
Importance in neural network design:
The pivotal role of the backpropagation algorithm in neural network design
emanates from its multifaceted contributions:
Learning Proficiency: Backpropagation empowers neural networks to discern
intricate patterns within data, iteratively adjusting weights to minimize error.
This dynamic learning capability enables networks to unravel complex
relationships and optimize predictive performance.
Architectural Flexibility: The algorithm's adaptability facilitates the training of
deep neural networks with multiple layers, fostering the extraction of
hierarchical features from raw input data. This architectural flexibility enables
neural networks to model intricate data relationships across diverse domains.
Generalization Efficacy: By mitigating overfitting, backpropagation promotes
model generalization, ensuring robust performance on unseen data.
Incorporation of regularization techniques within the backpropagation process
further enhances generalization capabilities, safeguarding against data
memorization.
Computational Efficiency: Backpropagation's computational efficiency and
scalability render it conducive to training large-scale neural networks on
extensive datasets. Modern deep learning frameworks provide optimized
implementations of backpropagation algorithms, facilitating efficient training
and model deployment.
[Decision tree diagram: a Start node branching to Humidity and Wind nodes.]
5) Apply the Decision tree algorithm with Gini Index as the splitting
criterion to classify the dataset given below.
Ans: The dataset is given below.
First, calculate the Gini impurity for the entire dataset, which represents the
overall uncertainty about the "Decision" class (Yes/No for playing golf).
Today’s weather is
given as,
PART – C
1) Draw the architecture of a single layer perceptron (SLP) and explain its
operation. Mention its advantages and disadvantages.
Ans:
Architecture: A single layer perceptron (SLP) is the most basic unit of artificial
neural networks. Here's a breakdown of its architecture with an accompanying
image:
Imagine a drawing with circles arranged in a line. On the left are a few blue
circles, each representing a single piece of information you feed the SLP. Maybe
you're trying to predict if an email is spam. The circles on the left might hold
information like the sender's address, keywords in the email, etc.
An arrow goes from each circle on the left to a
green circle in the middle. This green circle is
the single neuron in the SLP, like a tiny
processor. Each arrow has a different thickness,
showing how important that piece of
information is to the neuron's decision.
The neuron adds up all the information it
receives, considering how important each piece
is (based on the arrow thickness). Then, it
applies a special function to this sum, kind of
like a filter. Finally, it outputs a decision (like
spam or not spam) based on the filtered sum to the orange circles.
Operation:
1. Input Layer: This layer consists of multiple circles, each representing a
single data point. The number of circles corresponds to the number of
features in your input data. For example, if you're predicting house prices,
your input features might be square footage, number of bedrooms, and
location.
2. Weights: Each connection between an input node and the single neuron in
the middle layer has a weight associated with it. These weights are
visualized as arrows of varying thickness. A thicker arrow signifies a
stronger influence of that input on the neuron's output.
3. Neuron: The single neuron in the SLP acts as the processing unit. It receives
the data points from the input layer, multiplies them by their respective
weights, and sums them up.
4. Bias: A bias term is added to the weighted sum from the previous step. This
bias allows the neuron to shift the activation function and learn functions that
don't necessarily go through the origin (0,0) in the input space. It's depicted
as a small circle with a value beside it.
5. Activation Function: The combined sum from the weighted inputs and the
bias is then passed through an activation function. This function introduces
non-linearity into the model, allowing it to learn more complex patterns than
a simple linear model. Common activation functions include sigmoid, ReLU
(Rectified Linear Unit), and tanh (hyperbolic tangent).
6. Output Layer: The final output layer consists of a single circle representing
the final classification or prediction made by the SLP.
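The operation above can be sketched as a perceptron learning the linearly separable AND function; the learning rate, initial weights, and epoch count are illustrative assumptions:

```python
# Training data for the AND gate: ((inputs), target).
data = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
w = [0.0, 0.0]
b = 0.0
lr = 0.1

def predict(x):
    # Weighted sum plus bias, then a step activation function.
    s = w[0] * x[0] + w[1] * x[1] + b
    return 1 if s > 0 else 0

for epoch in range(20):
    for x, target in data:
        error = target - predict(x)
        # Perceptron learning rule: nudge weights toward the target.
        w[0] += lr * error * x[0]
        w[1] += lr * error * x[1]
        b += lr * error

print([predict(x) for x, _ in data])  # learned AND: [0, 0, 0, 1]
```

The same loop would never converge on XOR, which is the linearly inseparable case discussed under the disadvantages below.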
Advantages:
Simple and Easy to Understand: Due to its single-layer structure, the SLP is easy to implement, and it makes the relationship between the input features and the output transparent. By analysing the weights, we can understand which features have a stronger influence on the model's predictions.
Disadvantages:
Limited Learning Capability: One of the biggest limitations of SLPs is their inability to learn from linearly inseparable data; they can only solve linearly separable problems, which is rarely the case in real-world scenarios with complex relationships between data points.
Not Suitable for Complex Problems: Due to the limitation above, they
might not be ideal for tasks like image recognition or natural language
processing.
Limited Applications: Because of their limitations, SLPs have been largely
replaced by more powerful neural network architectures like multi-layer
perceptrons (MLPs) which can learn more complex patterns by stacking
multiple layers of neurons.
Information gain for an attribute A on a dataset S is computed as
Gain(S, A) = H(S) − Σ_{v ∈ V} (|S_v| / |S|) ⋅ H(S_v)
where H(S) denotes the entropy of the original dataset, V represents the distinct values of attribute A, S_v represents the subset of S corresponding to value v of attribute A, and |S| denotes the total number of instances in dataset S.
Information gain evaluates the reduction in entropy achieved by partitioning the
dataset based on attribute A, with higher values indicating more informative
attribute splits.
In decision tree algorithms, information gain serves as a guiding metric for
attribute selection and node splitting. Decision trees aim to maximize
information gain at each node by selecting the attribute that minimizes entropy
and maximizes homogeneity within resulting subsets. Attributes with higher
information gain are preferred for splitting, as they contribute more significantly
to reducing uncertainty and improving classification accuracy.
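These formulas can be sketched directly in Python; the attribute values and labels below are assumed toy data:

```python
import math
from collections import Counter

def entropy(labels):
    """H(S) = -sum(p_i * log2(p_i)) over class proportions."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(labels).values())

def information_gain(values, labels):
    """Gain(S, A) = H(S) - sum(|S_v|/|S| * H(S_v)) over values v of A."""
    total = len(labels)
    subsets = {}
    for v, y in zip(values, labels):
        subsets.setdefault(v, []).append(y)
    remainder = sum(len(s) / total * entropy(s) for s in subsets.values())
    return entropy(labels) - remainder

labels = ["Yes", "Yes", "No", "No"]
print(information_gain(["a", "a", "b", "b"], labels))  # → 1.0 (perfect split)
print(information_gain(["a", "b", "a", "b"], labels))  # → 0.0 (uninformative)
```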
Gini Impurity: Gini impurity is a measure of node impurity commonly used in
decision tree algorithms, particularly in the CART (Classification and
Regression Trees) algorithm. It quantifies the probability of misclassifying an
instance randomly chosen from a dataset based on the distribution of class
labels. The Gini impurity for a dataset S is calculated using the formula:
G(S) = 1 − Σ_{i=1}^{n} p_i²
where p_i represents the proportion of instances belonging to class i and n signifies the number of distinct classes. Gini impurity evaluates the
impurity of a dataset by summing the squared probabilities of each class and
subtracting the result from 1. Higher Gini impurity values indicate greater
impurity and uncertainty, while lower values signify more homogeneous and
pure datasets.
In decision tree algorithms, Gini impurity serves as a criterion for node splitting
and attribute selection. Decision trees aim to minimize Gini impurity at each
node by selecting the attribute that maximally reduces impurity and improves
classification accuracy. Attributes with lower Gini impurity after splitting are
preferred, as they lead to more homogeneous subsets and better separation of
classes.
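The Gini formula can be sketched as a short function; the label lists are assumed toy examples:

```python
def gini(labels):
    """G(S) = 1 - sum(p_i^2) over class proportions."""
    total = len(labels)
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    return 1.0 - sum((c / total) ** 2 for c in counts.values())

# A pure node has impurity 0; a 50/50 binary node has the maximum of 0.5.
print(gini(["Yes", "Yes", "Yes"]))       # → 0.0
print(gini(["Yes", "Yes", "No", "No"]))  # → 0.5
```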
K-Means partitions the data so that points within a cluster are more similar to each other than to points in other clusters. The algorithm proceeds as follows:
1. Initialization: choose k initial centroids.
2. Assignment Step: assign each data point to its nearest centroid.
3. Update Step: recompute each centroid as the mean of the points assigned to it.
4. Iteration: repeat the assignment and update steps until the centroids stabilize.
K-Means assumes roughly spherical cluster shapes; fuzzy variants such as fuzzy c-means instead assign each point degrees of membership in multiple clusters.
Applications of K-Means
K-Means is used across many fields for various applications.
Principal Component Analysis (PCA) is a dimensionality-reduction technique that transforms a large set of variables into a smaller one that still contains most of the information in the original set. It is commonly used for visualization and preprocessing purposes.
● Feature 1: [1, 2, 3]
● Feature 2: [5, 6, 7]
● Feature 3: [9, 10, 11]
Steps: standardize the data, compute the covariance matrix, extract its eigenvectors and eigenvalues, and project the data onto the top principal components, retaining most of the variance with minimal information loss.
Limitations
● PCA assumes that the directions with the maximum
variance are the most important, which might not
always be the case.
● Linear transformations: PCA is not effective for
nonlinear relationships among data.
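Assuming the three features above are the columns of a 3-sample dataset, a pure-Python power-iteration sketch shows that a single principal component captures all the variance (the variable names and iteration count are illustrative):

```python
# Rows are samples; columns are Feature 1..3 from the example above.
data = [[1, 5, 9], [2, 6, 10], [3, 7, 11]]
n, m = len(data), len(data[0])

# Center each feature, then form the (population) covariance matrix.
means = [sum(row[j] for row in data) / n for j in range(m)]
centered = [[row[j] - means[j] for j in range(m)] for row in data]
cov = [[sum(centered[s][i] * centered[s][j] for s in range(n)) / n
        for j in range(m)] for i in range(m)]

# Power iteration: repeatedly apply cov to find its top eigenvector.
v = [1.0, 0.0, 0.0]
for _ in range(50):
    w = [sum(cov[i][j] * v[j] for j in range(m)) for i in range(m)]
    norm = sum(x * x for x in w) ** 0.5
    v = [x / norm for x in w]

# Explained-variance ratio of the first component: eigenvalue / trace.
eigval = sum(v[i] * sum(cov[i][j] * v[j] for j in range(m)) for i in range(m))
trace = sum(cov[i][i] for i in range(m))
print(round(eigval / trace, 3))  # → 1.0: one component captures all variance
```

Because the three features are perfectly correlated, PCA compresses them to one component with no information loss.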
Architecture:
Training Process:
Features:
Advantages:
Disadvantages:
Architecture:
Features:
Advantages:
Disadvantages:
Architecture:
Advantages:
Disadvantages:
Architecture:
Features:
Advantages:
Disadvantages:
Step 1: Initialization
Step 2: Assignment
Step 3: Update
● Calculate the mean of the data points within each cluster. This
mean becomes the new centroid for that cluster.
● Repeat steps 2 and 3 until convergence, i.e., until the centroids
no longer change significantly or until a maximum number of
iterations is reached.
Data points:
(1, 1), (1, 2), (2, 1), (2, 3), (3, 2), (8, 8), (9, 8), (8, 9), (9, 9)
Step 1: Initialization
Step 2: Assignment
Cluster 1 (centered at (1, 1)): (1, 1), (1, 2), (2, 1), (2, 3), (3, 2)
Cluster 2 (centered at (9, 9)): (8, 8), (9, 8), (8, 9), (9, 9)
Step 3: Update
After convergence, the algorithm outputs the final clusters and their
centroids.
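The worked example above can be reproduced with a plain Python sketch of the algorithm (the function structure is an illustrative assumption):

```python
def kmeans(points, centroids, max_iter=100):
    """Plain K-Means on 2-D points, following the steps above."""
    clusters = []
    for _ in range(max_iter):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            d = [(p[0] - c[0]) ** 2 + (p[1] - c[1]) ** 2 for c in centroids]
            clusters[d.index(min(d))].append(p)
        # Update step: move each centroid to the mean of its cluster.
        new = [(sum(p[0] for p in c) / len(c), sum(p[1] for p in c) / len(c))
               for c in clusters]
        if new == centroids:  # convergence: centroids no longer change
            break
        centroids = new
    return centroids, clusters

pts = [(1, 1), (1, 2), (2, 1), (2, 3), (3, 2), (8, 8), (9, 8), (8, 9), (9, 9)]
cents, _ = kmeans(pts, [(1, 1), (9, 9)])
print(cents)  # → [(1.8, 1.8), (8.5, 8.5)]
```

With the initial centroids (1, 1) and (9, 9), the assignment matches Step 2 above, and the updated centroids (1.8, 1.8) and (8.5, 8.5) are the cluster means at convergence.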
PART – C
1. List the applications of clustering and identify the advantages and disadvantages of clustering algorithms. (14) (CO6)
Clustering is a fundamental technique in unsupervised learning used to group similar data points together. It finds applications across many domains; the main ones are listed below, along with the advantages and disadvantages of each:
Applications of Clustering:
1. Customer Segmentation:
● Description: Grouping customers based on
similarities in purchasing behavior, demographics,
or preferences.
● Advantages: Helps businesses tailor marketing
strategies, personalized recommendations, and
improve customer engagement.
● Disadvantages: May result in oversimplified
customer segments, difficulty in interpreting
complex clusters, and challenges in integrating
segmented strategies across different departments.
2. Image Segmentation:
● Description: Partitioning images into meaningful
regions based on visual similarities such as color,
texture, or intensity.
● Advantages: Facilitates object detection, image
retrieval, and medical image analysis.
● Disadvantages: Sensitivity to noise and lighting
variations, challenges in accurately delineating
boundaries, and difficulty in handling large-scale
image datasets.
3. Anomaly Detection:
● Description: Identifying unusual or abnormal
patterns in data that deviate from expected
behavior.
● Advantages: Helps detect fraud, network intrusions,
equipment failures, and other anomalies in various
domains.
● Disadvantages: Imbalanced datasets may lead to
biased models, difficulty in defining what
constitutes an anomaly, and challenges in
distinguishing anomalies from noise.
4. Document Clustering:
● Description: Organizing documents into clusters
based on similarities in content, topic, or sentiment.
● Advantages: Facilitates document categorization,
information retrieval, and content recommendation.
● Disadvantages: Challenges in handling large and
high-dimensional text data, difficulty in capturing
semantic meaning, and sensitivity to preprocessing
techniques.
5. Genomic Clustering:
● Description: Grouping genes or DNA sequences
based on similarities in expression patterns,
sequence homology, or functional annotations.
● Advantages: Aids in gene function prediction,
comparative genomics, and understanding
biological pathways.
● Disadvantages: Complexity in analyzing high-
throughput genomic data, challenges in integrating
clustering results with other omics data, and
difficulty in interpreting biological significance.
6. Market Basket Analysis:
● Description: Identifying associations and patterns in
transactional data to understand co-occurring
purchases and customer preferences.
● Advantages: Supports product recommendation,
inventory management, and pricing optimization.
● Disadvantages: Scalability issues with large
transactional datasets, challenges in handling
sparse and high-dimensional data, and potential
privacy concerns.
Advantages of Clustering:
applications.
Determination: