
Artificial Intelligence (AI)

Artificial Intelligence is a method of making a computer, a computer-controlled robot, or a piece of software think intelligently, the way the human mind does. AI is achieved by studying the patterns of the human brain and analyzing the cognitive process; the outcome of these studies is used to develop intelligent software and systems.

Artificial Intelligence (AI) is a machine's ability to perform cognitive functions as humans do, such as perceiving, learning, reasoning, and solving problems.

Machine Learning
"Machine Learning is concerned with computer programs that automatically improve their performance through experience."
— Herbert Alexander Simon

Machine learning is a branch of artificial intelligence (AI) and computer science that focuses on the use of data and algorithms to imitate the way that humans learn, gradually improving accuracy over time.

Application of Machine Learning

Source : https://www.javatpoint.com/applications-of-machine-learning
Types of Machine Learning

• Supervised Learning
• Unsupervised Learning
• Reinforcement Learning

Types of Machine Learning (Contd.)
Supervised learning
Supervised learning is the method in which we teach the machine using labelled data. In supervised learning, the aim is to learn a mapping from the input to an output whose correct values are provided by a supervisor.

Input (X): This represents the data or features that are provided to the machine learning model as input.
Output (Y): This is the target or the label that the model is trying to predict.
Mapping: The mapping is a mathematical function that maps the input (X) to the output (Y).
Supervisor: The supervisor's role is to guide the model's learning process by providing the ground truth.
Model: In supervised learning, the model is designed to capture the relationship between the input and output. Examples of models include linear regression, decision trees, neural networks, and many others. The parameters of the model are adjusted during the training process to minimize the difference between the predicted outputs and the actual outputs provided by the supervisor. The goal is to make the model's predictions as close as possible to the correct values provided by the supervisor.
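As an illustration (not part of the original slides), here is a minimal supervised-learning sketch in Python with scikit-learn: the labels y play the role of the supervisor's ground truth, and the model's parameters are fitted to map X to Y. The toy data is made up.

```python
# Minimal supervised-learning sketch: the "supervisor" provides the correct
# output y for every input X, and the model learns the mapping.
from sklearn.linear_model import LinearRegression

X = [[1], [2], [3], [4], [5]]   # input features
y = [2, 4, 6, 8, 10]            # labels provided by the supervisor (here y = 2x)

model = LinearRegression()
model.fit(X, y)                 # adjust parameters to minimize prediction error
print(model.predict([[6]]))     # prediction for an unseen input, approx. [12.]
```
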
Types of Machine Learning (Contd.)
Unsupervised learning
In unsupervised learning, there is no supervisor and we only have input data. The machine is trained on unlabelled data without any guidance; the aim is to find regularities in the input.

Input (X): In unsupervised learning, we have input data (X), which consists of features or observations. However, unlike supervised learning, we don't have corresponding output labels (Y) provided by a supervisor.
Discovering Patterns: The primary goal of unsupervised learning is to discover patterns, structures, or relationships within the input data (X) without explicit guidance. Instead of predicting specific outputs, the aim is to find inherent structures or groupings in the data.
Model: Models or algorithms are required to identify patterns within the data. Unsupervised learning is classified into two categories of algorithms: clustering and association.
Learning: In unsupervised learning, the learning process is about the model autonomously identifying patterns or structures within the input data. This is typically done by optimizing an objective function that quantifies how well the model captures the underlying patterns.
No Ground Truth: Since there are no ground-truth labels, evaluation in unsupervised learning can be more challenging. You might assess the quality of clustering, or the ability of dimensionality reduction to capture meaningful features, through various metrics; however, there is no direct comparison to "correct answers" as in supervised learning.

Types of Machine Learning (Contd.)
Reinforcement learning
In reinforcement learning, an agent interacts with its environment by producing actions and discovering errors and rewards.

To understand reinforcement learning better, consider a dog that we have to house train. Here, the dog is the agent and the house is the environment.

We can get the dog to perform various actions by offering incentives such as dog biscuits as a reward.

The dog will follow a policy to maximize its reward and hence will follow every command and might even learn a new action, like begging, all by itself.

Types of Machine Learning (Contd.)
Reinforcement learning

The dog will also want to run around, play, and explore its environment. This quality of a model is called exploration. The tendency of the dog to maximize rewards is called exploitation.

Important Terms in Reinforcement Learning (illustrated in the sketch after this list)

Agent: The model that is being trained via reinforcement learning.
Environment: The training situation that the model must optimize to is called its environment.
Action: All possible steps that can be taken by the model.
State: The current position/condition returned by the environment.
Reward: To help the model move in the right direction, it is rewarded (given points) to appraise some action.
Policy: The policy determines how an agent will behave at any time; it acts as a mapping between the present state and an action.
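To make these terms concrete, below is a small, hypothetical Q-learning sketch (not from the slides): a one-dimensional "corridor" environment, an epsilon-greedy policy, and a reward for reaching the final state. All names and numbers are illustrative assumptions.

```python
# Tiny reinforcement-learning sketch: agent, environment, state, action, reward, policy.
import random

n_states = 5                      # states 0..4; reaching state 4 gives a reward
actions = [-1, +1]                # move left or right
Q = {(s, a): 0.0 for s in range(n_states) for a in actions}
alpha, gamma, epsilon = 0.5, 0.9, 0.2

for episode in range(200):
    state = 0
    while state != n_states - 1:
        # epsilon-greedy policy: mostly exploit, sometimes explore
        if random.random() < epsilon:
            action = random.choice(actions)
        else:
            action = max(actions, key=lambda a: Q[(state, a)])
        next_state = min(max(state + action, 0), n_states - 1)
        reward = 1.0 if next_state == n_states - 1 else 0.0
        # Q-learning update: move the estimate toward reward + discounted future value
        best_next = max(Q[(next_state, a)] for a in actions)
        Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])
        state = next_state

# learned policy: the best action in each non-terminal state (should be +1)
print({s: max(actions, key=lambda a: Q[(s, a)]) for s in range(n_states - 1)})
```
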
Types of Machine Learning (Contd.)
The table below shows the differences between the three main sub-branches of machine learning.

Aspect | Supervised Learning | Unsupervised Learning | Reinforcement Learning
Data | Labeled data, with output values specified | Unlabeled data; outputs are not specified and the machine makes its own predictions | The machine learns from its environment using rewards and errors
Problems solved | Regression and classification problems | Association and clustering problems | Reward-based problems
Data used | Labeled data | Unlabeled data | No predefined data
Supervision | External supervision | No supervision | No supervision
Approach | Solves problems by mapping labeled input to known output | Solves problems by understanding patterns and discovering output | Follows a trial-and-error problem-solving approach

Least Square Regression Method
Least squares regression is a technique commonly used in regression analysis. It is a mathematical method used to find the best-fit line that represents the relationship between an independent and a dependent variable in such a way that the error is minimized.

Least Square Regression Method
What is the line of best fit?
The line of best fit is drawn across a scatter plot of data points in order to represent the relationship between those data points.

Least Square Regression Method
Steps to compute the Line of Best Fit

Step 1: Calculate the slope 'm' of the line: m = Σ((x − x̄)(y − ȳ)) / Σ((x − x̄)²), where x̄ and ȳ are the means of the x and y values.

Least Square Regression Method
Step 2: Compute the Y intercept: c = ȳ − m·x̄.
The Y intercept of a line is the value of Y at the point where the line crosses the Y-axis.

Step 3: Substitute the values of m and c into the final equation y = mx + c.
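The three steps can be sketched in Python as follows; the data points are made up, and the formulas used are the standard least-squares expressions for the slope and intercept.

```python
# Least squares regression by hand on a tiny, made-up dataset.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n

# Step 1: slope m = sum((x - mean_x)(y - mean_y)) / sum((x - mean_x)^2)
m = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / \
    sum((x - mean_x) ** 2 for x in xs)

# Step 2: intercept c = mean_y - m * mean_x
c = mean_y - m * mean_x

# Step 3: substitute into y = m*x + c to predict y for a new x
print(m, c, m * 6 + c)
```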

Polynomial Regression
Polynomial Regression is a regression algorithm that models the relationship between a dependent variable (y) and an independent variable (x) as an nth-degree polynomial. The Polynomial Regression equation is given below:

y = b0 + b1*x1 + b2*x1^2 + b3*x1^3 + ... + bn*x1^n

Polynomial Regression
Need for Polynomial Regression:

• If we apply a linear model to a linear dataset, it gives a good result, as we have seen with Simple Linear Regression. But if we apply the same model, without any modification, to a non-linear dataset, the output will be poor: the loss function will increase, the error rate will be high, and accuracy will decrease.

• So for such cases, where data points are arranged in a non-linear fashion, we need the Polynomial Regression model (see the sketch below). We can understand this better by comparing a linear dataset with a non-linear dataset.
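A hedged sketch of polynomial regression using scikit-learn's PolynomialFeatures together with LinearRegression; the toy dataset (roughly y = x²) is an assumption for illustration.

```python
# Fit a 2nd-degree polynomial to non-linear data.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

x = np.array([1, 2, 3, 4, 5]).reshape(-1, 1)
y = np.array([1, 4, 9, 16, 25])            # roughly y = x^2, clearly non-linear

poly = PolynomialFeatures(degree=2)        # builds columns [1, x, x^2]
x_poly = poly.fit_transform(x)

model = LinearRegression().fit(x_poly, y)
print(model.predict(poly.transform([[6]])))   # approx. [36.]
```
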

Unsupervised Learning Python Example
Common scenarios for using unsupervised learning algorithms include:
- Data Exploration
- Outlier Detection
- Pattern Recognition
An exhaustive list of clustering algorithms is available in Python's Scikit-Learn library.

K-Means clustering
The most common and simplest clustering algorithm is K-Means clustering. It involves telling the algorithm how many clusters (or K) there are in the dataset. The algorithm then iteratively moves the K centers and assigns to each center the datapoints that are closest to that centroid.
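A minimal K-Means sketch with scikit-learn on made-up 2-D points; we tell the algorithm K = 2 and it assigns each point to its nearest centroid.

```python
# K-Means clustering on a tiny, made-up 2-D dataset.
import numpy as np
from sklearn.cluster import KMeans

X = np.array([[1, 1], [1.5, 2], [1, 0.5],      # one group of points
              [8, 8], [8.5, 9], [9, 8]])       # a second, well-separated group

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(kmeans.labels_)            # cluster index assigned to each point
print(kmeans.cluster_centers_)   # the two centroids found
```
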
Unsupervised Learning Python Example (Contd.)

Methodology for picking the K value


One obvious question that may come to mind is the
methodology for picking the K value. This is done using an
elbow curve, where the x-axis is the K-value and the y axis is
some objective function. A common objective function is the
average distance between the datapoints and the nearest
centroid.
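A sketch of this methodology: fit K-Means for a range of K values and record the inertia (scikit-learn's sum of squared distances to the nearest centroid, a close relative of the average-distance objective mentioned above), then look for the "elbow" where the curve flattens. The random data is a placeholder.

```python
# Elbow-curve data: objective value as a function of K.
import numpy as np
from sklearn.cluster import KMeans

X = np.random.rand(200, 2)       # placeholder data; substitute your own dataset

inertias = []
for k in range(1, 9):
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    inertias.append(km.inertia_)

# print K vs. objective; plotting these points gives the elbow curve
for k, val in zip(range(1, 9), inertias):
    print(k, round(val, 2))
```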

Principal Component Analysis (PCA)
Principal Component Analysis (PCA) is a dimensionality reduction technique used to transform
high-dimensional data into a lower-dimensional space while preserving the most important
information.
• Objective: PCA aims to find a new set of orthogonal variables, known as principal
components, that capture the maximum variance in the data.
• Steps of PCA:
• Standardize the data to have zero mean and unit variance.
• Compute the covariance matrix or correlation matrix of the standardized data.
• Perform eigenvalue decomposition on the covariance matrix to obtain the eigenvectors and eigenvalues.
• Select the top k eigenvectors corresponding to the largest eigenvalues as the principal components.
• Transform the original data onto the new lower-dimensional space using the selected principal
components.
Equation: Principal Component: PC = XW
PC: Principal component vector
X: Original data matrix
W: Weight matrix of eigenvectors
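For illustration, a short scikit-learn sketch of the steps above (standardize, fit PCA, project onto the top components, i.e. PC = XW done internally); the small data matrix is made up.

```python
# PCA: standardize the data, then project onto the top principal component.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

X = np.array([[2.5, 2.4], [0.5, 0.7], [2.2, 2.9],
              [1.9, 2.2], [3.1, 3.0], [2.3, 2.7]])

X_std = StandardScaler().fit_transform(X)   # zero mean, unit variance
pca = PCA(n_components=1)                   # keep the top principal component
X_pc = pca.fit_transform(X_std)             # data in the lower-dimensional space

print(pca.explained_variance_ratio_)        # share of variance captured
print(X_pc)
```
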
Mathematical Preliminaries
Application of Principal Component Analysis
in Machine Learning
• Dimensionality Reduction:
• PCA is widely used for reducing the dimensionality of high-dimensional datasets while retaining
important information.
• By selecting a subset of principal components, we can represent the data in a lower-dimensional
space without significant loss of information.
• This leads to more efficient computation and visualization of the data.
Example: Facial Recognition
• PCA has been applied in facial recognition tasks to reduce the dimensionality of face
images.
• In this application, a dataset of face images is transformed using PCA, and the
resulting principal components represent the most discriminative features.
• The reduced-dimensional representation facilitates efficient face recognition and can
be used for tasks such as facial identification and verification.
Mathematical Preliminaries
Singular Value Decomposition (SVD)
Singular Value Decomposition (SVD) is a matrix factorization technique that
decomposes a matrix into three separate matrices.
• Objective: SVD allows us to represent a matrix as a product of three components, capturing the
underlying structure and patterns in the data.
• Components of SVD:
• U matrix: Left singular vectors, representing the orthogonal basis for the row space of the
original matrix.
• Σ matrix: Diagonal matrix with singular values, representing the strengths of each basis
vector.
• V^T matrix: Right singular vectors, representing the orthogonal basis for the column space of
the original matrix.
• Equation: Original Matrix, A = UΣV^T
A: Original matrix
U: U matrix of left singular vectors
Σ: Σ matrix of singular values
V^T: Transpose of V matrix of right singular vectors
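A quick NumPy sketch (illustrative only, on a small made-up matrix) that computes the decomposition and verifies that UΣV^T reconstructs the original matrix.

```python
# SVD: decompose A into U, S (singular values) and V^T, then reconstruct.
import numpy as np

A = np.array([[3.0, 1.0],
              [1.0, 3.0],
              [0.0, 2.0]])

U, S, Vt = np.linalg.svd(A, full_matrices=False)
A_rebuilt = U @ np.diag(S) @ Vt

print(S)                           # singular values (strengths of each basis vector)
print(np.allclose(A, A_rebuilt))   # True: the decomposition is exact
```
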
Mathematical Preliminaries
Singular Value Decomposition (SVD)
Application of Singular Value Decomposition in Machine Learning.
• Dimensionality Reduction:
• SVD is extensively used for reducing the dimensionality of high-dimensional data,
similar to PCA.
• By retaining only the top-k singular values and corresponding singular vectors, we can
represent the data in a lower-dimensional space.
Example: Image Compression
• SVD is commonly employed in image compression techniques, such as JPEG.
• In this application, an image is represented as a matrix, and SVD is applied to
decompose the matrix.
• By truncating the singular values and singular vectors, the image can be
reconstructed with a lower number of components, resulting in reduced
storage requirements without significant loss of visual quality.
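A hedged sketch of the truncation idea on a placeholder "image" matrix: keep only the top-k singular values and vectors and reconstruct an approximation. Real image data would compress far better than random values.

```python
# SVD-based compression: rank-k reconstruction of a matrix.
import numpy as np

image = np.random.rand(64, 64)             # placeholder for a grayscale image

U, S, Vt = np.linalg.svd(image, full_matrices=False)

k = 10                                     # number of components kept
compressed = U[:, :k] @ np.diag(S[:k]) @ Vt[:k, :]

# relative reconstruction error of the rank-k approximation
error = np.linalg.norm(image - compressed) / np.linalg.norm(image)
print(compressed.shape, round(error, 3))
```
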
Mathematical Preliminaries
Types of probability
Probability of Multiple Random Variables
In machine learning, we are likely to work with many random variables.
For example, given a table of data, such as in Excel, each row represents a separate observation or event, and each column represents a separate random variable.
• This is complicated as there are many ways that random variables can interact, which, in turn, impacts their probabilities.
• This can be simplified by reducing the discussion to just two random variables (X, Y), although the principles generalize to multiple
variables.
• And further, to discuss the probability of just two events, one for each variable (X=A, Y=B), although we could just as easily be discussing
groups of events for each variable.
• Therefore, we will introduce the probability of multiple random variables as the probability of event A and event B, which in shorthand
is X=A and Y=B.

As such, there are three main types of probability we might want to consider; they are:
• Joint Probability: Probability of events A and B occurring together.
• Marginal Probability: Probability of event X=A regardless of the outcome of variable Y.
• Conditional Probability: Probability of event A given that event B has occurred.

Mathematical Preliminaries
Types of probability
Joint Probability
The term joint probability refers to a statistical measure that calculates the likelihood of two events occurring together at the same point in time.
The product rule P(A and B) = P(A) × P(B) holds when the two events are independent of one another, which means they aren't conditional and don't rely on each other.
In that case, the probability of A times the probability of B equals the joint probability of A and B happening at the same time:
P(A and B) = P(A) × P(B).

So the joint probability of picking a card that is both red and a 6 from a deck is P(6 ∩ red) = 2/52 = 1/26, since a deck of cards has two red sixes: the six of hearts and the six of diamonds. Because the events "red" and "6" are independent,

P(6 ∩ red) = P(6) × P(red) = 4/52 × 26/52 = 1/26
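As a quick sanity check (illustrative only, assuming a standard 52-card deck), the deck can be enumerated in Python and the direct count compared with the product rule.

```python
# Enumerate a standard deck and verify P(6 and red) = P(6) x P(red).
suits = ["hearts", "diamonds", "clubs", "spades"]    # hearts and diamonds are red
ranks = list(range(2, 11)) + ["J", "Q", "K", "A"]
deck = [(rank, suit) for rank in ranks for suit in suits]

red_six = [c for c in deck if c[0] == 6 and c[1] in ("hearts", "diamonds")]
print(len(red_six) / len(deck))            # 2/52 = 0.0385...
print((4 / 52) * (26 / 52))                # P(6) x P(red) = 1/26, the same value
```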

Assignment Question
1. Find the probability that the number "4" will occur twice when two dice are rolled simultaneously.
Solution.
Number of possible outcomes when a single die is rolled = 6
Let X be the event of the number 4 occurring on the first die, and Y the event of the number 4 occurring on the second die.
P(X) = 1/6
P(Y) = 1/6
P(X, Y) = 1/6 × 1/6 = 1/36

2. What is the joint probability of drawing a card that is both red and a 9?
Solution.
Event 'C' - Probability of drawing a 9: P(C) = 4/52 ≈ 0.0769
Event 'D' - Probability of drawing a red card: P(D) = 26/52 = 0.5
P(C, D) = 0.0769 × 0.5 ≈ 0.0385, i.e. about 3.85% (= 2/52)

Marginal Probability
Marginal probability refers to the probability of an event occurring without
considering the outcome of other related events. Mathematically, for two random
variables X and Y, the marginal probability of X (P(X)) is obtained by summing or
integrating the joint probabilities of X and Y over all possible values of Y:

P(X) = Σ_y P(X, y)   (for discrete variables)

P(X) = ∫ P(X, y) dy   (for continuous variables)

Here, P(X, y) represents the joint probability of events X and Y occurring simultaneously. The
marginal probability distribution provides insight into the individual probabilities of each
variable, disregarding their interdependence.
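For instance, given a (hypothetical) joint probability table for two discrete variables, the marginals are obtained by summing over the other variable, as sketched below.

```python
# Marginalize a joint probability table P(X, Y) by summing over the other variable.
import numpy as np

# rows = values of X, columns = values of Y; entries sum to 1
joint = np.array([[0.10, 0.20],
                  [0.30, 0.40]])

p_x = joint.sum(axis=1)    # marginal of X: [0.3, 0.7]
p_y = joint.sum(axis=0)    # marginal of Y: [0.4, 0.6]
print(p_x, p_y)
```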

Mathematical Preliminaries
Assignment Question
Q1: A coin is tossed three times. What is the marginal probability of getting heads on the
first toss?
Solution: There are 8 possible outcomes of tossing a coin three times.

To calculate the marginal probability, we count the number of outcomes that have heads on the first toss and
divide that number by the total number of outcomes.
There are 4 outcomes that have heads on the first toss (HHH, HHT, HTH, HTT), so the marginal probability is 4/8 = 1/2.

Conditional Probability
Conditional probability is the possibility of an event or outcome happening given that a previous event or outcome has occurred. It is calculated by dividing the probability of both events occurring together by the probability of the preceding (conditioning) event.

It is denoted by P(A|B), which reads as "the probability of event A given event B." Mathematically, it is defined as:

P(A|B) = P(A∩B) / P(B)
Where:
P(A|B) is the conditional probability of event A given event B.
P(A∩B) is the probability of both events A and B occurring.
P(B) is the probability of event B occurring.

Mathematical Preliminaries
Assignment Question
Example 1: Two dice are thrown simultaneously, and the sum of the numbers obtained is found to be 7. What is
the probability that the number 3 has appeared at least once?

Solution:
The sample space S consists of all outcomes possible from the combination of the two dice. Therefore S consists of
6 × 6, i.e. 36 events.
Event A indicates the combinations in which 3 has appeared at least once.
Event B indicates the combinations of numbers which sum up to 7.
A = {(3, 1), (3, 2), (3, 3), (3, 4), (3, 5), (3, 6), (1, 3), (2, 3), (4, 3), (5, 3), (6, 3)}
B = {(1, 6), (2, 5), (3, 4), (4, 3), (5, 2), (6, 1)}
P(A) = 11/36
P(B) = 6/36
A ∩ B = {(3, 4), (4, 3)}, which contains 2 outcomes.
P(A ∩ B) = 2/36
Applying the conditional probability formula we get,
P(A|B) = P(A∩B)/P(B) = (2/36)/(6/36) = 1/3
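The result can be verified by brute-force enumeration of all 36 equally likely outcomes (an illustrative check, not part of the original solution).

```python
# Brute-force check of Example 1: P(A|B) = P(A ∩ B) / P(B).
from itertools import product

outcomes = list(product(range(1, 7), repeat=2))   # 36 equally likely dice pairs
B = [o for o in outcomes if sum(o) == 7]          # sum is 7
A_and_B = [o for o in B if 3 in o]                # and a 3 appears

print(len(A_and_B) / len(B))                      # 2/6 = 0.333...
```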

Assignment Question
Example 2: In a group of 100 computer buyers, 40 bought a CPU, 30 purchased a
monitor, and 20 purchased both a CPU and a monitor. If a computer buyer is chosen at
random and bought a CPU, what is the probability they also bought a monitor?
Solution:
As per the first event, 40 out of 100 bought a CPU,
So, P(A) = 40% or 0.4
Now, according to the question, 20 buyers purchased both a CPU and a monitor. So, this
is the intersection of the two events. Hence,
P(A∩B) = 20% or 0.2
By the formula of conditional probability we know:
P(B|A) = P(A∩B)/P(A)
P(B|A) = 0.2/0.4 = 2/4 = 1/2 = 0.5
The probability that a buyer bought a monitor, given that they purchased a CPU, is
50%.