
Data Preprocessing in Machine Learning

Data preprocessing is the process of preparing raw data and making it suitable for a machine learning model. It is the first and most crucial step when creating a machine learning model.
When working on a machine learning project, we do not always come across clean, well-formatted data. Before performing any operation on the data, it is necessary to clean it and put it into a usable format. This is what the data preprocessing step is for.

Why do we need Data Preprocessing?


Real-world data generally contains noise and missing values, and may be in an unusable format that cannot be fed directly to a machine learning model. Data preprocessing covers the tasks required to clean the data and make it suitable for a machine learning model, which also increases the accuracy and efficiency of the model.
It involves the following steps:
o Getting the dataset
o Importing libraries
o Importing datasets
o Finding Missing Data
o Encoding Categorical Data
o Splitting dataset into training and test set
o Feature scaling
1) Get the Dataset
To create a machine learning model, the first thing we need is a dataset, since a machine learning model works entirely on data. The data collected for a particular problem and stored in a proper format is known as the dataset.
Datasets come in different formats for different purposes: for example, the dataset needed to build a model for a business purpose will differ from the dataset required for a liver-patient problem. So each dataset is different from every other dataset.

2) Importing Libraries
In order to perform data preprocessing using Python, we need to import some predefined Python libraries. These libraries are used to perform specific jobs. There are three particular libraries that we will use for data preprocessing, as sketched below.
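A minimal sketch of these imports, assuming the three libraries meant here are NumPy, Matplotlib, and Pandas (the usual choices for numerical work, plotting, and dataset handling):

import numpy as np                # numerical arrays and mathematical operations
import matplotlib.pyplot as plt   # charts and visualisation of the data
import pandas as pd               # importing and managing tabular datasets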

3) Importing the Datasets


Now we need to import the dataset that we have collected for our machine learning project. But before importing a dataset, we need to set the current directory as the working directory.
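A hedged sketch of this step with Pandas; the directory path and the file name "Data.csv" are hypothetical placeholders for the actual dataset:

import os
import pandas as pd

os.chdir("/path/to/project")         # set the current directory as the working directory
dataset = pd.read_csv("Data.csv")    # load the collected dataset into a DataFrame

# Separate the matrix of features (all columns but the last)
# from the dependent variable vector (the last column).
X = dataset.iloc[:, :-1].values
y = dataset.iloc[:, -1].values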

4) Handling Missing data:


The next step of data preprocessing is to handle missing data in
the datasets. If our dataset contains some missing data, then it
may create a huge problem for our machine learning model.
Hence it is necessary to handle missing values present in the
dataset.
Ways to handle missing data:
There are mainly two ways to handle missing data:
By deleting the particular row: The first way is commonly used to deal with null values: we simply delete the specific row or column that contains null values. However, this approach is not very efficient, and removing data may lead to a loss of information that hurts the accuracy of the output.
By calculating the mean (or any other suitable statistic): In this approach, we calculate the mean of the column or row that contains missing values and put it in place of each missing value. This strategy is useful for features with numeric data, such as age, salary, or year. Here, we will use this approach, as sketched below.
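A small sketch of the mean strategy using scikit-learn's SimpleImputer; the column indices 1:3 are illustrative and assume the numeric columns (for example, age and salary) sit in those positions:

import numpy as np
from sklearn.impute import SimpleImputer

imputer = SimpleImputer(missing_values=np.nan, strategy="mean")  # replace NaN with the column mean
X[:, 1:3] = imputer.fit_transform(X[:, 1:3])                     # apply only to the numeric columns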

5) Encoding Categorical data:


Categorical data is data that consists of categories; in our dataset, for example, there are two categorical variables, Country and Purchased.
Since a machine learning model works entirely with mathematics and numbers, a categorical variable in the dataset can create trouble while building the model. It is therefore necessary to encode these categorical variables into numbers.
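A hedged scikit-learn sketch, assuming Country is the first column of X and Purchased is the dependent variable y: the country names are one-hot encoded so no artificial ordering is introduced, and the Yes/No labels are converted to 1/0:

import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, LabelEncoder

# One-hot encode the categorical "Country" column (assumed to be column 0).
ct = ColumnTransformer(transformers=[("encoder", OneHotEncoder(), [0])],
                       remainder="passthrough")
X = np.array(ct.fit_transform(X))

# Label-encode the "Purchased" column (Yes/No -> 1/0).
le = LabelEncoder()
y = le.fit_transform(y)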
6) Splitting the Dataset into the Training set and Test set
In machine learning data preprocessing, we divide our dataset into a training set and a test set. This is one of the crucial steps of data preprocessing, because it lets us judge how well our machine learning model generalizes.
Suppose we train our machine learning model on one dataset and then test it on a completely different dataset: the model will struggle to capture the correlations between the features and the target.
If we train our model very well and its training accuracy is very high, but its performance drops when we give it a new dataset, the model has not generalized. So we always try to build a machine learning model that performs well on the training set and also on the test dataset. Here, we can define these datasets as:

Training set: A subset of the dataset used to train the machine learning model; for this subset, we already know the outputs.
Test set: A subset of the dataset used to test the machine learning model; the model predicts the outputs for the test set.
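A minimal sketch of the split with scikit-learn; the 80/20 ratio and the random_state value are illustrative choices, not fixed requirements:

from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)   # 80% for training, 20% for testing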
7) Feature Scaling
Feature scaling is the final step of data preprocessing in machine learning. It is a technique to bring the independent variables of the dataset into a specific range. In feature scaling, we put our variables on the same scale so that no single variable dominates the others.
There are two ways to perform feature scaling in machine learning (a short sketch of both follows):
Standardization
Normalization
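A short sketch of both options with scikit-learn; which one to use depends on the data and the model, and the variable names continue the hypothetical example above:

from sklearn.preprocessing import StandardScaler, MinMaxScaler

# Standardization: rescale each feature to zero mean and unit variance.
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)       # reuse the training-set statistics on the test set

# Normalization (alternative): rescale each feature into the [0, 1] range.
# mm = MinMaxScaler()
# X_train = mm.fit_transform(X_train)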

Lab 2

There is an important difference between classification and regression


problems.
Fundamentally, classification is about predicting a label and regression is
about predicting a quantity.

Regression Predictive Modeling


Regression predictive modeling is the task of approximating a mapping
function (f) from input variables (X) to a continuous output variable (y).
A continuous output variable is a real-value, such as an integer or floating-
point value. These are often quantities, such as amounts and sizes.
For example, a house may be predicted to sell for a specific dollar value,
perhaps in the range of 100,000 to 200,000.
• A regression problem requires the prediction of a quantity.
• A regression can have real valued or discrete input variables.
• A problem with multiple input variables is often called a multiple
regression problem.
• A regression problem where input variables are ordered by time is
called a time series forecasting problem.

Because a regression predictive model predicts a quantity, the skill of the


model must be reported as an error in those predictions.
There are many ways to estimate the skill of a regression predictive model,
but perhaps the most common is to calculate the root mean squared error,
abbreviated by the acronym RMSE.

For example, if a regression predictive model made 2 predictions, one of


1.5 where the expected value is 1.0 and another of 3.3 and the expected
value is 3.0, then the RMSE would be:

RMSE = sqrt(average(error^2))
RMSE = sqrt(((1.0 - 1.5)^2 + (3.0 - 3.3)^2) / 2)
RMSE = sqrt((0.25 + 0.09) / 2)
RMSE = sqrt(0.17)
RMSE = 0.412

An algorithm that is capable of learning a regression predictive model is


called a regression algorithm.

Linear Regression
Linear regression analysis is used to predict the value of a variable based on
the value of another variable. The variable you want to predict is called the
dependent variable. The variable you are using to predict the other
variable's value is called the independent variable.
This form of analysis estimates the coefficients of the linear equation,
involving one or more independent variables that best predict the value of
the dependent variable. Linear regression fits a straight line or surface that
minimizes the discrepancies between predicted and actual output values.
Simple linear regression is one that has one dependent variable
and only one independent variable. If “income” is explained by the
“education” of an individual then the regression is expressed in terms of
simple linear regression as follows:
Income = b0 + b1*Education + e

Multiple Regression
Multiple regression is like linear regression, but with more than one
independent value, meaning that we try to predict a value based on two or
more variables.
Multiple regression goes one step further, and instead of one, there will be
two or more independent variables. If we have an additional variable (let’s
say “experience”) in the equation above then it becomes a multiple
regression:
Income = b0 + b1*Education + b2*Experience + e
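As a hedged illustration, the sketch below fits this multiple-regression equation with scikit-learn; the education/experience/income numbers are made up purely for demonstration:

import numpy as np
from sklearn.linear_model import LinearRegression

X = np.array([[10, 1], [12, 3], [16, 2], [16, 6], [18, 4]])  # [education years, experience years]
y = np.array([25000, 32000, 40000, 52000, 60000])            # income

model = LinearRegression()
model.fit(X, y)

print(model.intercept_)            # estimate of b0
print(model.coef_)                 # estimates of [b1, b2]
print(model.predict([[14, 5]]))    # predicted income for a new individual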

Test:
* Dataset with linear regression
* Dataset with multiple regression

Lab 3
Logistic Regression in Machine Learning
Logistic regression is one of the most popular Machine Learning
algorithms, which comes under the Supervised Learning
technique. It is used for predicting the categorical dependent
variable using a given set of independent variables.
Logistic regression predicts the output of a categorical dependent variable. Therefore, the outcome must be a categorical or discrete value: Yes or No, 0 or 1, True or False, and so on. However, instead of giving the exact values 0 and 1, it gives probabilistic values which lie between 0 and 1.
Logistic Regression is much like Linear Regression except in how they are used: Linear Regression is used for solving regression problems, whereas Logistic Regression is used for solving classification problems.
Logistic regression is the go-to method for binary classification problems (problems with two class values).
The curve from the logistic function indicates the likelihood of
something such as whether the cells are cancerous or not, etc.
Logistic Regression is a significant machine learning algorithm
because it has the ability to provide probabilities and classify new
data using continuous and discrete datasets.

Logistic Function
Logistic regression is named for the function used at the core of
the method, the logistic function.
The logistic function, also called the sigmoid function, is an S-shaped curve that can take any real-valued number and map it into a value between 0 and 1, but never exactly at those limits.
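The sigmoid itself is simple to write down and compute; a tiny sketch:

import numpy as np

def sigmoid(x):
    # Logistic (sigmoid) function: maps any real number into (0, 1).
    return 1.0 / (1.0 + np.exp(-x))

print(sigmoid(-5), sigmoid(0), sigmoid(5))   # ~0.0067, 0.5, ~0.9933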

Logistic Regression Equation:


The Logistic regression equation can be obtained from the Linear
Regression equation. The mathematical steps to get Logistic
Regression equations are given below:

o We know the equation of a straight line can be written as:

y = b0 + b1*x1 + b2*x2 + ... + bn*xn

o In Logistic Regression, y can only be between 0 and 1, so we first form the ratio of y to (1 - y):

y / (1 - y)   (which is 0 when y = 0 and infinity when y = 1)

o But we need a range from -[infinity] to +[infinity], so taking the logarithm of this ratio gives:

log[ y / (1 - y) ] = b0 + b1*x1 + b2*x2 + ... + bn*xn

The above equation is the final equation for Logistic Regression.



Lecture 4 (Performance Metrics)

Performance Metrics in Machine Learning


Performance metrics are a part of every machine learning problem. They
tell you if you’re making progress, and put a number on it.
Every machine learning task can be broken down to either Regression
or Classification, just like the performance metrics. There are dozens of
metrics for both problems, but we’re going to discuss popular ones along
with what information they provide about model performance.

Classification metrics
Classification Metrics evaluate a model’s performance and tell you how
good or bad the classification is, but each of them evaluates it in a different
way, so in order to evaluate Classification models, we’ll discuss these
metrics in detail:

• Confusion Matrix (not a metric but fundamental to others)


• Accuracy
• Precision and Recall
• F1-score
• AUROC (Area Under the ROC Curve)

Confusion Matrix
Confusion Matrix is a tabular visualization of the ground-truth labels versus
model predictions. Each row of the confusion matrix represents the
instances in a predicted class and each column represents the instances in
an actual class. Confusion Matrix is not exactly a performance metric but
sort of a basis on which other metrics evaluate the results.

Each cell in the confusion matrix represents an evaluation factor. Let’s


understand these factors one by one:
• True Positive (TP): the number of positive-class samples your model predicted correctly.
• True Negative (TN): the number of negative-class samples your model predicted correctly.
• False Positive (FP): negative-class samples that the model incorrectly predicted as positive. This is also known as a Type I error.
• False Negative (FN): positive-class samples that the model incorrectly predicted as negative. This is also known as a Type II error.

Accuracy
Classification accuracy is perhaps the simplest metric to use and implement. It is defined as the number of correct predictions divided by the total number of predictions (often multiplied by 100 to express it as a percentage):

Accuracy = (TP + TN) / (TP + FP + FN + TN)

Precision
Precision is the ratio of true positives to the total number of predicted positives:

Precision = TP / (TP + FP)

Recall/Sensitivity/Hit-Rate
Recall is the percentage of actual positives that were correctly identified:

Recall = TP / (TP + FN)

Having high precision but low recall means that although the model is good
at predicting the positive class, it only detects a small proportion of the
total number of positive outcomes. In our example, this would mean that
patients who were predicted to have cancer most likely have cancer but out
of all the patients who have cancer, the model only predicted a small
proportion of them to have cancer. Therefore, the model is under-
predicting.
Having low precision but high recall means that although the model correctly predicted most of the positive cases, it also predicted a lot of negatives to be positive. In our example, it would mean that of all the patients who actually have cancer, most of them were correctly predicted to have cancer, but there were also a lot of patients who didn't have cancer that the model predicted to have cancer. Therefore, the model is over-predicting.

F1-score
The F1-score metric uses a combination of precision and recall. In fact, the F1 score is the harmonic mean of the two. The formula is essentially:

F1 = 2 * (precision * recall) / (precision + recall)

A high F1 score indicates both high precision and high recall.
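As a hedged illustration, the sketch below computes these classification metrics with scikit-learn on a small made-up set of labels and predictions (1 = positive class):

from sklearn.metrics import (confusion_matrix, accuracy_score,
                             precision_score, recall_score, f1_score)

y_true = [1, 0, 1, 1, 0, 1, 0, 0, 1, 0]   # hypothetical ground-truth labels
y_pred = [1, 0, 1, 0, 0, 1, 1, 0, 1, 0]   # hypothetical model predictions

print(confusion_matrix(y_true, y_pred))   # scikit-learn lays this out as [[TN, FP], [FN, TP]]
print(accuracy_score(y_true, y_pred))     # (TP + TN) / total = 0.8
print(precision_score(y_true, y_pred))    # TP / (TP + FP) = 0.8
print(recall_score(y_true, y_pred))       # TP / (TP + FN) = 0.8
print(f1_score(y_true, y_pred))           # harmonic mean of precision and recall = 0.8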

AUROC (Area under Receiver operating characteristics curve)


Better known as AUC-ROC score/curves. It makes use of true positive rates
(TPR) and false positive rates (FPR).

• Intuitively TPR/recall corresponds to the proportion of positive data


points that are correctly considered as positive, with respect to all
positive data points. In other words, the higher the TPR, the fewer
positive data points we will miss.
• Intuitively FPR/fallout corresponds to the proportion of negative
data points that are mistakenly considered as positive, with respect
to all negative data points. In other words, the higher the FPR, the
more negative data points we will misclassify.
The resulting curve is called the ROC curve, and the metric we consider is
the area under this curve, which we call AUROC.

Regression metrics
Regression models have continuous output. So, we need a metric based on
calculating some sort of distance between predicted and ground truth.
In order to evaluate Regression models, we’ll discuss these metrics in
detail:
• Mean Squared Error (MSE)
• Root Mean Squared Error (RMSE)
• Mean Absolute Error (MAE)

Mean Squared Error (MSE)


Mean Squared Error, or MSE for short, is a popular error metric for
regression problems.
The MSE is calculated as the mean or average of the squared differences
between predicted and expected target values in a dataset.
• MSE = 1 / N * sum for i to N (y_i – ypred_i)^2
Where y_i is the i’th expected value in the dataset and ypred_i is the i’th
predicted value. The difference between these two values is squared, which
has the effect of removing the sign, resulting in a positive error value.

Root Mean Squared Error


The Root Mean Squared Error, or RMSE, is an extension of the mean
squared error.
Importantly, the square root of the error is calculated, which means that
the units of the RMSE are the same as the original units of the target value
that is being predicted.
We can restate the RMSE in terms of the MSE as:
RMSE = sqrt(MSE)

Mean Absolute Error


Mean Absolute Error, or MAE, is a popular metric because, like RMSE, the
units of the error score match the units of the target value that is being
predicted.
The MAE can be calculated as follows:
• MAE = 1 / N * sum for i to N abs(y_i – ypred_i)
Where y_i is the i'th expected value in the dataset, ypred_i is the i'th predicted value, and abs() is the absolute value function.
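A short sketch computing all three regression metrics with scikit-learn, reusing the two-prediction example from the RMSE discussion earlier in this document:

from math import sqrt
from sklearn.metrics import mean_squared_error, mean_absolute_error

y_true = [1.0, 3.0]   # expected values
y_pred = [1.5, 3.3]   # predicted values

mse = mean_squared_error(y_true, y_pred)     # (0.25 + 0.09) / 2 = 0.17
rmse = sqrt(mse)                             # sqrt(0.17) ≈ 0.412
mae = mean_absolute_error(y_true, y_pred)    # (0.5 + 0.3) / 2 = 0.4
print(mse, rmse, mae)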
Lab 5

Support Vector Machine Algorithm


Support Vector Machine or SVM is one of the most popular Supervised
Learning algorithms, which is used for Classification as well as Regression
problems. However, primarily, it is used for Classification problems in
Machine Learning.
The goal of the SVM algorithm is to create the best line or decision
boundary that can segregate n-dimensional space into classes so that we
can easily put the new data point in the correct category in the future. This
best decision boundary is called a hyperplane.
SVM chooses the extreme points/vectors that help in creating the hyperplane. These extreme cases are called support vectors, and hence the algorithm is termed the Support Vector Machine. Consider the below diagram, in which two different categories are classified using a decision boundary or hyperplane:

Types of SVM
SVM can be of two types (a short scikit-learn sketch of both follows this list):
o Linear SVM: Linear SVM is used for linearly separable data. If a dataset can be classified into two classes using a single straight line, such data is termed linearly separable data, and the classifier used is called a Linear SVM classifier.
o Non-linear SVM: Non-linear SVM is used for non-linearly separable data. If a dataset cannot be classified using a straight line, such data is termed non-linear data, and the classifier used is called a Non-linear SVM classifier.
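A minimal scikit-learn sketch of the two types on a tiny hypothetical dataset; the points and labels are made up, and the RBF kernel is just one common choice for the non-linear case:

from sklearn.svm import SVC

X = [[1, 2], [2, 3], [3, 3], [6, 5], [7, 8], [8, 8]]   # hypothetical (x1, x2) points
y = [0, 0, 0, 1, 1, 1]                                 # two classes

linear_clf = SVC(kernel="linear").fit(X, y)      # Linear SVM: straight-line boundary
nonlinear_clf = SVC(kernel="rbf").fit(X, y)      # Non-linear SVM: the kernel implicitly adds extra dimensions

print(linear_clf.predict([[2, 2], [7, 7]]))      # expected: [0, 1]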
Hyperplane and Support Vectors in the SVM algorithm:
Hyperplane: There can be multiple lines/decision boundaries to segregate
the classes in n-dimensional space, but we need to find out the best
decision boundary that helps to classify the data points. This best boundary
is known as the hyperplane of SVM.
The dimensions of the hyperplane depend on the number of features present in the dataset: if there are 2 features (as shown in the image), the hyperplane will be a straight line, and if there are 3 features, the hyperplane will be a 2-dimensional plane.
We always create the hyperplane that has the maximum margin, which means the maximum distance between the hyperplane and the nearest data points of either class.
Support Vectors:
The data points or vectors that are closest to the hyperplane and which affect the position of the hyperplane are termed support vectors. Since these vectors support the hyperplane, they are called support vectors.

How does SVM works?


Linear SVM:
The working of the SVM algorithm can be understood by using an example.
Suppose we have a dataset that has two tags (green and blue), and the
dataset has two features x1 and x2. We want a classifier that can classify
the pair(x1, x2) of coordinates in either green or blue. Consider the below
image:

Since it is a 2-d space, we can easily separate these two classes just by using a straight line. But there can be multiple lines that can separate these classes. Consider the below image:

Hence, the SVM algorithm helps to find the best line or decision boundary; this best boundary or region is called a hyperplane. The SVM algorithm finds the closest points of the two classes to the boundary. These points are called support vectors. The distance between the vectors and the hyperplane is called the margin, and the goal of SVM is to maximize this margin.
The hyperplane with maximum margin is called the optimal hyperplane.

Non-Linear SVM:
If data is linearly arranged, then we can separate it by using a straight line,
but for non-linear data, we cannot draw a single straight line. Consider the
below image:

So to separate these data points, we need to add one more dimension. For
linear data, we have used two dimensions x and y, so for non-linear data,
we will add a third dimension z. It can be calculated as:
z = x^2 + y^2
By adding the third dimension, the sample space will become as below
image:

So now, SVM will divide the datasets into classes in the following way.
Consider the below image:

Since we are in 3-d space, the separating boundary looks like a plane parallel to the x-axis. If we convert it back to 2-d space with z = 1, it becomes a circle of radius 1:

Lab 6

K-Nearest Neighbor (KNN) Algorithm for Machine Learning


o K-Nearest Neighbor is one of the simplest Machine Learning
algorithms based on Supervised Learning technique.
o The K-NN algorithm assumes similarity between the new case/data and the available cases and puts the new case into the category that is most similar to the available categories.
o The K-NN algorithm stores all the available data and classifies a new data point based on similarity. This means that when new data appears, it can easily be classified into a well-suited category using the K-NN algorithm.
o The K-NN algorithm can be used for regression as well as for classification, but mostly it is used for classification problems.
o It is also called a lazy learner algorithm because it does not learn from the training set immediately; instead it stores the dataset, and at the time of classification it performs an action on the dataset.
o The KNN algorithm, at the training phase, just stores the dataset, and when it gets new data, it classifies that data into the category most similar to the new data.

Suppose there are two categories, Category A and Category B, and we have a new data point x1: in which of these categories will this data point lie? To solve this type of problem, we need the K-NN algorithm. With the help of K-NN, we can easily identify the category or class of a particular data point. Consider the below diagram:

The K-NN working can be explained on the basis of the below algorithm:

o Step-1: Select the number K of the neighbors


o Step-2: Calculate the Euclidean distance of K number of neighbors
o Step-3: Take the K nearest neighbors as per the calculated Euclidean
distance.
o Step-4: Among these k neighbors, count the number of the data points in
each category.
o Step-5: Assign the new data point to the category for which the number of neighbors is maximum.
o Step-6: Our model is ready.

Suppose we have a new data point and we need to put it in the required
category. Consider the below image:

o Firstly, we will choose the number of neighbors, so we will choose


the k=5.
o Next, we will calculate the Euclidean distance between the data points. The Euclidean distance is the distance between two points, which we have already studied in geometry. Between two points (x1, y1) and (x2, y2) it can be calculated as:

d = sqrt((x2 - x1)^2 + (y2 - y1)^2)

o By calculating the Euclidean distance we got the nearest neighbors,


as three nearest neighbors in category A and two nearest neighbors
in category B. Consider the below image:

o As we can see the 3 nearest neighbors are from category A, hence


this new data point must belong to category A.

How to select the value of K in the K-NN Algorithm?


Below are some points to remember while selecting the value of K in the K-
NN algorithm:
o There is no particular way to determine the best value for "K", so we
need to try some values to find the best out of them. The most
preferred value for K is 5.
o A very low value for K, such as K=1 or K=2, can be noisy and expose the model to the effects of outliers.
o Large values for K are generally good, but the model may run into difficulties, such as including too many points from other categories.

The distance metric most commonly used is the Minkowski distance, whose equation is given below; with p = 2 it reduces to the Euclidean distance, and with p = 1 to the Manhattan distance:

D(x, y) = ( sum for i = 1 to n of |x_i - y_i|^p )^(1/p)
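A hedged scikit-learn sketch of K-NN with k = 5 and the Minkowski metric (p = 2, i.e. Euclidean distance); the data points and labels are hypothetical:

from sklearn.neighbors import KNeighborsClassifier

X = [[1, 2], [2, 3], [3, 1], [6, 5], [7, 7], [8, 6]]   # hypothetical two-feature points
y = [0, 0, 0, 1, 1, 1]                                 # category A = 0, category B = 1

knn = KNeighborsClassifier(n_neighbors=5, metric="minkowski", p=2)
knn.fit(X, y)

print(knn.predict([[4, 4]]))   # category decided by the majority of the 5 nearest neighbours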

Lab 7
Decision Tree Algorithm
o Decision Tree is a Supervised learning technique that can be
used for both classification and Regression problems, but
mostly it is preferred for solving Classification problems. It is
a tree-structured classifier, where internal nodes represent
the features of a dataset, branches represent the decision
rules and each leaf node represents the outcome.
o In a decision tree, there are two types of nodes: the Decision Node and the Leaf Node. Decision nodes are used to make a decision and have multiple branches, whereas leaf nodes are the outputs of those decisions and do not contain any further branches.

o It is a graphical representation for getting all the possible


solutions to a problem/decision based on given conditions.

o It is called a decision tree because, similar to a tree, it starts


with the root node, which expands on further branches and
constructs a tree-like structure.

o A decision tree simply asks a question and, based on the answer (Yes/No), further splits the tree into subtrees.

o Below diagram explains the general structure of a decision


tree:

Decision Tree Terminologies


Root Node: Root node is from where the decision tree starts. It
represents the entire dataset, which further gets divided into two
or more homogeneous sets.
Leaf Node: Leaf nodes are the final output node, and the tree
cannot be segregated further after getting a leaf node.
Splitting: Splitting is the process of dividing the decision
node/root node into sub-nodes according to the given conditions.
Branch/Sub Tree: A tree formed by splitting the tree.
Pruning: Pruning is the process of removing the unwanted
branches from the tree.

Parent/Child node: The root node of the tree is called the parent
node, and other nodes are called the child nodes.

How does the Decision Tree algorithm Work?


In a decision tree, to predict the class of a given record, the algorithm starts from the root node of the tree. It compares the value of the root attribute with the corresponding attribute of the record (from the real dataset) and, based on the comparison, follows a branch and jumps to the next node.
At the next node, the algorithm again compares the attribute value with those of the sub-nodes and moves further down. It continues this process until it reaches a leaf node of the tree.

o Step-1: Begin the tree with the root node, say S, which contains the complete dataset.
o Step-2: Find the best attribute in the dataset using an Attribute Selection Measure (ASM).
o Step-3: Divide S into subsets containing the possible values of the best attribute.
o Step-4: Generate the decision tree node that contains the best attribute.
o Step-5: Recursively make new decision trees using the subsets of the dataset created in Step-3. Continue this process until a stage is reached where you cannot classify the nodes further; such final nodes are called leaf nodes.

Example: Suppose there is a candidate who has a job offer and


wants to decide whether he should accept the offer or not. So, to
solve this problem, the decision tree starts with the root node
(Salary attribute by ASM). The root node splits further into the
next decision node (distance from the office) and one leaf node
based on the corresponding labels. The next decision node further
gets split into one decision node (Cab facility) and one leaf node.
Finally, the decision node splits into two leaf nodes (Accepted
offers and Declined offer). Consider the below diagram:

Attribute Selection Measures


While implementing a decision tree, the main issue that arises is how to select the best attribute for the root node and for the sub-nodes. To solve such problems there is a technique called the Attribute Selection Measure, or ASM. Using this measure, we can easily select the best attribute for the nodes of the tree. There are two popular ASM techniques, which are:
o Information Gain
o Gini Index

Information Gain:
o Information gain is the measurement of changes in entropy
after the segmentation of a dataset based on an attribute.
o It calculates how much information a feature provides us
about a class.
o According to the value of information gain, we split the node
and build the decision tree.
o A decision tree algorithm always tries to maximize the value
of information gain, and a node/attribute having the highest
information gain is split first. It can be calculated using the
below formula:

Information Gain = Entropy(S) - [(Weighted Avg) * Entropy(each feature)]

Entropy: Entropy is a metric to measure the impurity in a given


attribute. It specifies randomness in data. Entropy can be
calculated as:
Entropy(S) = -P(yes)*log2(P(yes)) - P(no)*log2(P(no))
Where,
o S= Total number of samples
o P(yes)= probability of yes
o P(no)= probability of no
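As a small worked check of the formula above, assume a hypothetical node with 9 "yes" and 5 "no" samples (14 in total); its entropy works out to roughly 0.940, i.e. a fairly impure node:

from math import log2

p_yes, p_no = 9 / 14, 5 / 14
entropy = -p_yes * log2(p_yes) - p_no * log2(p_no)
print(round(entropy, 3))   # ≈ 0.940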

Gini Index:
o Gini index is a measure of impurity or purity used while
creating a decision tree in the CART(Classification and
Regression Tree) algorithm.
o An attribute with the low Gini index should be preferred as
compared to the high Gini index.
o It only creates binary splits, and the CART algorithm uses the
Gini index to create binary splits.
o Gini index can be calculated using the below formula:

Gini Index = 1 - Σj (Pj)^2
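As a hedged illustration, scikit-learn's decision tree lets you choose between these two measures through the criterion parameter; the toy data below is hypothetical:

from sklearn.tree import DecisionTreeClassifier

X = [[0, 0], [1, 0], [0, 1], [1, 1]]   # hypothetical feature rows
y = [0, 0, 1, 1]                       # class depends only on the second feature

tree = DecisionTreeClassifier(criterion="entropy", random_state=0)   # "gini" selects the Gini index instead
tree.fit(X, y)
print(tree.predict([[0, 1]]))   # expected: [1]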



Random Forest Algorithm


Random Forest is a popular machine learning algorithm that
belongs to the supervised learning technique. It can be used for
both Classification and Regression problems in ML. It is based on
the concept of ensemble learning, which is a process
of combining multiple classifiers to solve a complex problem and
to improve the performance of the model.
As the name suggests, "Random Forest is a classifier that
contains a number of decision trees on various subsets of the
given dataset and takes the average to improve the predictive
accuracy of that dataset." Instead of relying on one decision tree, the random forest takes the prediction from each tree and, based on the majority vote of those predictions, predicts the final output.
A greater number of trees in the forest generally leads to higher accuracy and helps reduce the problem of overfitting.

How does Random Forest algorithm work?


Random Forest works in two phases: the first is to create the random forest by combining N decision trees, and the second is to make a prediction with each tree created in the first phase.
The working process can be explained in the steps and diagram below, followed by a short code sketch:
Step-1: Select random K data points from the training set.
Step-2: Build the decision trees associated with the selected data
points (Subsets).
Step-3: Choose the number N for decision trees that you want to
build.
Step-4: Repeat Step 1 & 2.
Step-5: For new data points, find the predictions of each decision
tree, and assign the new data points to the category that wins the
majority votes.
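A hedged scikit-learn sketch of these two phases; n_estimators plays the role of N, and the data points are hypothetical:

from sklearn.ensemble import RandomForestClassifier

X = [[0, 0], [1, 0], [0, 1], [1, 1], [1, 2], [2, 2]]   # hypothetical training points
y = [0, 0, 1, 1, 1, 1]

forest = RandomForestClassifier(n_estimators=10, random_state=0)   # build N = 10 decision trees
forest.fit(X, y)

print(forest.predict([[2, 1]]))   # each tree votes; the majority vote is the final prediction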

Example: Suppose there is a dataset that contains multiple fruit


images. So, this dataset is given to the Random forest classifier.
The dataset is divided into subsets and given to each decision
tree. During the training phase, each decision tree produces a
prediction result, and when a new data point occurs, then based
on the majority of results, the Random Forest classifier predicts
the final decision. Consider the below image:
