
Supervised Learning

Introduction to Machine Learning

Lecture 2
Agenda
Introduction
Linear model
 Regression: Linear regression
 Case study in regression
 Classification:
 Operations of classifiers: Decision trees, KNN, Naïve Bayes, Logistic regression, Random Forest

Supervised Learning
 Supervised learning is when we teach/train the machine using data that is well labelled.
 On the basis of that data, the machine predicts the output.
 Labelled data means the input data is already tagged with the correct output.
 The training data provided to the machine works as a supervisor that teaches the machine to predict the output correctly.
Supervised Learning…
 It is the process of providing input data as well as the correct output data to the machine learning model.
 The aim is to find a mapping function that maps the input variable (x) to the output variable (y).
 Application areas:
 Image classification
 Fraud detection - are the transactions made by the user authentic or not?
 Visual recognition
 Spam filtering, etc.
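The idea of learning from labelled examples can be sketched in a few lines. This is an illustrative toy (not from the slides): a fraud detector that learns a decision threshold from labelled transaction amounts; the function names and the midpoint rule are assumptions chosen for simplicity.

```python
def fit_threshold(amounts, labels):
    """Learn a threshold from labelled transactions: the midpoint between
    the mean 'authentic' amount and the mean 'fraud' amount."""
    auth = [a for a, l in zip(amounts, labels) if l == "authentic"]
    fraud = [a for a, l in zip(amounts, labels) if l == "fraud"]
    return (sum(auth) / len(auth) + sum(fraud) / len(fraud)) / 2

def predict_label(amount, threshold):
    """Predict the label of a new, unseen transaction."""
    return "fraud" if amount > threshold else "authentic"
```

Here the labelled data plays the role of the supervisor: the correct outputs ("authentic"/"fraud") guide where the threshold is placed.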
Supervised Learning: How it works
 Suppose we have a dataset of different types of shapes, which includes square, rectangle, triangle, and polygon.
 The first step is to train the model on each shape:
 If the given shape has four sides, and all the sides are equal, it will be labelled as a square.
 If the given shape has three sides, it will be labelled as a triangle.
 If the given shape has six equal sides, it will be labelled as a hexagon.
 Test the model using the test set; the task of the model is to identify the shape.
 The machine is already trained on all types of shapes, and when it finds a new shape, it classifies the shape on the basis of the number of sides and predicts the output.
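The labelling rules above can be sketched directly as code (a minimal illustration; the function name and the `sides_equal` flag are assumptions, not from the slides):

```python
def classify_shape(num_sides, sides_equal=True):
    """Label a shape by its number of sides, mirroring the slide's rules."""
    if num_sides == 3:
        return "triangle"
    if num_sides == 4:
        return "square" if sides_equal else "rectangle"
    if num_sides == 6 and sides_equal:
        return "hexagon"
    return "unknown"
```

A real model would learn these rules from labelled examples rather than having them hand-coded; the sketch only shows what the learned mapping looks like.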
Supervised Learning …
https://www.javatpoint.com/supervised-machine-learning
Supervised Learning …
 Steps Involved:
I. Determine the type of training dataset.
II. Collect/gather the labelled training data.
III. Split the dataset (training, test, and validation).
IV. Determine the input features of the training dataset.
V. Determine the suitable algorithm for the model.
VI. Execute the algorithm on the training dataset.
VII. Evaluate the accuracy of the model.
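Step III above, splitting the labelled data, can be sketched with the standard library (an illustrative sketch; the function name and the 70/15/15 ratios are assumptions - libraries such as scikit-learn provide this ready-made):

```python
import random

def split_dataset(data, train=0.7, val=0.15, seed=0):
    """Shuffle labelled examples and split them into train/validation/test."""
    rng = random.Random(seed)          # fixed seed -> reproducible split
    data = data[:]                     # copy so the caller's list is untouched
    rng.shuffle(data)
    n = len(data)
    n_train = int(n * train)
    n_val = int(n * val)
    return (data[:n_train],
            data[n_train:n_train + n_val],
            data[n_train + n_val:])
```

The model is fitted on the training split, tuned on the validation split, and its accuracy (Step VII) is reported on the held-out test split.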
Supervised Learning: Types
 Supervised learning can be further divided into two types of problems: regression and classification.
Supervised Learning: Regression
 Regression is used when the output variable is a real or continuous value.
 Regression is used when there is a relationship between the input variable and the output variable, i.e., a change in one variable is associated with a change in the other.
 E.g., salary based on work experience.
 A regression problem is when the output variable is a real value, such as "Birrs/Money" or "weight".
 It is used for the prediction of continuous variables, such as weather forecasting and market trends.
 Algorithms:
 Linear Regression
 Polynomial Regression
Supervised Learning: Regression
 Linear Regression:
 It assumes that there is a linear relationship between the two variables, the input/independent (X) and the output/dependent (Y), of the data it has learnt from.
 It makes predictions for continuous/real or numeric variables such as sales, salary, age, product price, etc.
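Fitting a line y = wx + b to data has a simple closed form, shown below as a minimal sketch (stdlib only; the function name is an assumption):

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope*x + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    # slope = covariance(x, y) / variance(x)
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    intercept = my - slope * mx
    return slope, intercept
```

For example, fitting salary against years of experience would return the expected salary increase per year (slope) and the starting salary (intercept).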
Supervised Learning: Regression…
 Here are some "real world" examples:
 Impact of GPA on college admissions
 Impact of product price on number of sales
 Impact of rainfall amount on number of fruits yielded
 Impact of blood alcohol content on coordination
Supervised Learning: Classification
 Classification is used when the output variable is categorical, i.e., has two or more classes.
 E.g., yes or no, male or female, true or false, etc.
 The output will be one of the classes, not a number as it was in regression.
 There are two types of classification: binary and multi-class classification.
 Algorithms:
 Decision tree
 Naïve Bayes
 Support vector machines (SVM)
 K-Nearest Neighbors
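K-Nearest Neighbors is simple enough to sketch in full: classify a new point by a majority vote of the k closest labelled points. This is an illustrative 1-D version (the function name and data layout are assumptions):

```python
from collections import Counter

def knn_predict(train, query, k=3):
    """train: list of (value, label) pairs.
    Classify query by majority vote among the k nearest training points."""
    nearest = sorted(train, key=lambda p: abs(p[0] - query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]
```

Note the output is a class label, not a number - the defining difference from regression.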
Decision Tree
 A decision tree is a tree-structured classifier, where:
 Internal nodes represent the features of a dataset,
 Branches represent the decision rules, and
 Each leaf node represents the outcome.
 In a decision tree, there are two kinds of nodes: decision nodes and leaf nodes.
 Decision nodes are used to make a decision and have multiple branches.
 Leaf nodes are the outputs of those decisions and do not contain any further branches.
 The decisions or tests are performed based on the features of the given dataset.
 It is a graphical representation for getting all the possible solutions to a problem/decision based on given conditions.
Decision Tree…
 It is called a decision tree because, like a tree, it starts with the root node, which expands into further branches and constructs a tree-like structure.
 A decision tree simply asks a question and, based on the answer (Yes/No), further splits the tree into subtrees.
 Why decision trees?
 They mimic human thinking while making a decision, so they are easy to understand.
 The logic behind the decision tree can be easily understood because it shows a tree-like structure.
Decision Tree…
 Terminologies:
 Root Node: represents the entire dataset, which further gets divided into two or more homogeneous sets.
 Leaf Node: leaf nodes are the final output nodes.
 Splitting: dividing the decision node/root node into sub-nodes according to the given conditions.
 Branch/Sub-Tree: a tree formed by splitting the tree.
 Pruning: the process of removing unwanted branches from the tree.
 Parent/Child node: the root node of the tree is called the parent node, and the other nodes are called the child nodes.
Decision Tree…
 How it works:
 Step 1: Begin the tree with the root node, say S, which contains the complete dataset.
 Step 2: Find the best attribute in the dataset.
 Step 3: Divide S into subsets that contain the possible values of the best attribute.
 Step 4: Generate the decision tree node, which contains the best attribute.
 Step 5: Recursively make new decision trees using the subsets of the dataset created in Step 3.
 Continue this process until a stage is reached where you cannot classify the nodes further; the final nodes are called leaf nodes.
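Steps 1-5 can be sketched as a tiny recursive tree builder for categorical features. This is an illustrative toy (the misclassification-count impurity is an assumption; real implementations typically use entropy or Gini):

```python
from collections import Counter

def build_tree(rows, features):
    """rows: list of (feature_dict, label) pairs. Follows Steps 1-5."""
    labels = [lab for _, lab in rows]
    # Stop when the node is pure or no attributes remain -> leaf (majority label)
    if len(set(labels)) == 1 or not features:
        return Counter(labels).most_common(1)[0][0]
    # Step 2: pick the attribute whose split misclassifies the fewest rows
    def impurity(feat):
        total = 0
        for value in {r[feat] for r, _ in rows}:
            subset = [lab for r, lab in rows if r[feat] == value]
            total += len(subset) - Counter(subset).most_common(1)[0][1]
        return total
    best = min(features, key=impurity)
    # Steps 3-5: split on the best attribute and recurse on each subset
    branches = {}
    for value in {r[best] for r, _ in rows}:
        subset = [(r, lab) for r, lab in rows if r[best] == value]
        branches[value] = build_tree(subset, [f for f in features if f != best])
    return (best, branches)

def predict(tree, row):
    """Walk from the root, following the branch matching each feature value."""
    while isinstance(tree, tuple):
        feat, branches = tree
        tree = branches[row[feat]]
    return tree
```

On a toy job-offer dataset this reproduces the slide's behaviour: the tree asks about the most informative attribute first, then recurses until every leaf carries a single decision.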
Decision Tree…
 Case study:
 Suppose there is a candidate who has a job offer and wants to decide whether he should accept the offer or not.
Regularization techniques
 A machine learning model may perform well on the training data but perform poorly on the unseen or test data.
 When the model learns noise from the training data, it fails to predict well on unseen data; this is called overfitting.
 Regularization is a technique to prevent the model from overfitting by adding extra information (a penalty term) to it.
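The penalty idea can be shown in one line with ridge (L2) regularization for a 1-D model y = wx without an intercept, where the penalized solution has a closed form. An illustrative sketch (the function name and the no-intercept simplification are assumptions):

```python
def ridge_fit(xs, ys, alpha=0.0):
    """1-D ridge regression without intercept: w = sum(x*y) / (sum(x^2) + alpha).
    alpha=0 recovers ordinary least squares; a larger alpha shrinks w toward 0."""
    return (sum(x * y for x, y in zip(xs, ys))
            / (sum(x * x for x in xs) + alpha))
```

The extra `alpha` term in the denominator is the "extra information": it pulls the weight toward zero, trading a little training accuracy for better behaviour on unseen data.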
Quiz
1. What is the difference between regression and classification algorithms?
2. Give real-world examples that can be solved by regression and by classification.
Thank You!