
Supervised Learning

Supervised learning is the machine learning task of learning a function that maps an input to an output based on example input-output pairs.

Machine Learning
Machine learning is the scientific study of algorithms and statistical models that computer systems use to perform a specific task without using explicit instructions, relying on patterns and inference instead.

It has two types:


• Supervised Learning
• Unsupervised Learning
Supervised learning is of two types:

Classification
• Classification is a supervised learning approach in which the computer program learns from the input data given to it and then uses this learning to classify new observations.
Regression
• Regression models are used to predict a continuous value. They predict the outcome of an event based on the relationship between variables obtained from the dataset.
We can solve a supervised learning problem in the following way (a code sketch follows the list):
• Determine the type of training examples.
• Gather a training set.
• Determine the structure of the learned function and
corresponding learning algorithm.
• Complete the design. Run the learning algorithm on the
gathered training set.
• Evaluate the accuracy of the learned function.
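
As a minimal sketch of these steps, assuming scikit-learn and its bundled iris dataset (the slides name no particular library or data):

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

# Gather a training set (here: the bundled iris data).
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Choose a learned-function structure and run the learning algorithm.
model = DecisionTreeClassifier().fit(X_train, y_train)

# Evaluate the accuracy of the learned function on held-out data.
print(accuracy_score(y_test, model.predict(X_test)))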
Applications of supervised learning are:
• BioInformatics
• Handwriting Recognition
• Spam Detection
• Optical Character Recognition (OCR)
• Speech Recognition
What are Decision Trees?
• A decision tree is a type of supervised learning algorithm in which each branch node represents a choice between a number of alternatives and each leaf node represents a decision.
• It is one of the most widely used and practical methods for inductive inference.
How can we understand a decision tree?

• 1) Root node
• 2) Decision node
• 3) Leaf node
Root node
• The root node is the node which contains the entire population, i.e. the full sample of the given data.
• It can further be divided into two or more homogeneous (similar) sets depending on the desired output and the sampled data.
Parent node/Child node
• A node which is divided into sub-nodes is called the parent node of those sub-nodes.
• Whereas the sub-nodes are the children of the parent node.


Splitting
• Splitting is the process of dividing a node into two or more sub-nodes.
• A node is divided on the basis of given condition(s).
Branching
• A branch, also known as a sub-tree, is formed when a tree/node is split.
Pruning
• Pruning is the method of reducing the size of a decision tree by removing (unwanted) nodes.
• It is the opposite of splitting because it narrows down the decision tree instead of expanding it.
Terminal node/Leaf node
• The end node is called the terminal (leaf) node.
• A terminal node has no further sub-nodes/child nodes.
Terminology used in Decision Trees:
a. Root node.
b. Child node.
c. Splitting.
d. Branch/Sub Tree.
e. Pruning.
f. Terminal node/Leaf node.
[Slide figure: a humorous example decision tree about marriage. The root node asks "SINGLE?"; internal decision nodes ask "GOOD LOOKS?", "RICH?", and "BETRAYED?", and each leaf node gives a final decision.]
What is Random Forest?
• It is a collection of several decision trees combined to create a more sophisticated classifier.
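
As a minimal sketch, assuming scikit-learn and a synthetic dataset (both are illustrative choices, not named in the slides):

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic binary-classification data (hypothetical, for illustration).
X, y = make_classification(n_samples=200, n_features=4, random_state=0)

# 100 decision trees, each trained on a bootstrap sample, vote on the class.
forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
print(forest.predict(X[:3]))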

What kind of data can be modelled using a decision tree?
• Data that fits a classification model (discrete classes)
• Data that fits a regression model (continuous values)
Advantages of Decision Trees are:
• Simple to understand and interpret.
• Able to handle both numerical and categorical data.
• Performs well with large datasets. 
• Requires little data preparation.
• Possible to validate a model using statistical tests.
Applications of Decision Trees
• Business Management
• Customer Relationship Management
• Fraudulent Statement Detection
• Healthcare Management
• Energy Consumption
CART ALGORITHM
• CART stands for Classification And Regression Tree.
• It can handle both classification and regression tasks.
• It builds a binary decision tree by repeatedly splitting a parent node into two child nodes.
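
As a minimal sketch, scikit-learn's DecisionTreeClassifier grows a CART-style binary tree; the toy data here is hypothetical:

from sklearn.tree import DecisionTreeClassifier

# Toy data (hypothetical): [age, smoker] features with a binary label.
X = [[25, 0], [40, 1], [35, 1], [22, 0], [50, 1], [30, 0]]
y = [0, 1, 1, 0, 1, 0]

# criterion="gini" selects splits that minimise Gini impurity,
# producing a binary tree of repeated parent-to-child splits.
tree = DecisionTreeClassifier(criterion="gini").fit(X, y)
print(tree.predict([[33, 1]]))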
ENTROPY
• Entropy is defined as the degree of randomness of labelled data.
• It is a metric that measures the disorder in the training data.
• Computing it is the first step in solving a decision tree.
IMPURITY:
• Impurity measures how mixed the labels of the data in a node are; a perfectly homogeneous node has impurity 0.
[Slide figure: a node containing only "basketball" labels has IMPURITY = 0; a node containing "basketball", "football", "cricket ball", and "baseball" labels has IMPURITY ≠ 0.]
Entropy Formula:

• Entropy(S) = -P(yes) log2 P(yes) - P(no) log2 P(no)
• P(yes) is the probability of "yes" and P(no) is the probability of "no" in the sample S.

[Slide figure: entropy graph, plotting Entropy(S) against the probability of "yes", peaking at 1 when the probability is 0.5.]

If the number of yes equals the number of no, the probability is P = 0.5:
∴ Entropy(S) = 1
If the sample contains all yes or all no, i.e. P = 1 or P = 0:
∴ Entropy(S) = 0
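
A minimal sketch of this formula in Python (the function name is my own):

import math

def entropy(p_yes, p_no):
    # Entropy = -P(yes) log2 P(yes) - P(no) log2 P(no);
    # a zero-probability term contributes 0 by convention.
    return -sum(p * math.log2(p) for p in (p_yes, p_no) if p > 0)

print(entropy(0.5, 0.5))  # 1.0 -> equal yes/no, maximum disorder
print(entropy(1.0, 0.0))  # 0.0 -> all one class, pure sample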
INFORMATION GAIN
• It is defined as the difference between the parent node's impurity and the weighted sum of the child nodes' impurities.
• It measures the reduction in entropy after a split.
• It decides which attribute should be selected as a decision node.

• FORMULA OF INFORMATION GAIN:

Information Gain = Entropy(parent) - [weighted average × entropy of each child]
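
A minimal sketch of this formula from label counts (helper names are my own):

import math

def entropy_counts(yes, no):
    # Entropy of a node given its yes/no label counts.
    total = yes + no
    return -sum(c / total * math.log2(c / total) for c in (yes, no) if c > 0)

def information_gain(parent, children):
    # parent and each child are (yes, no) count pairs.
    # Gain = parent entropy - weighted average of child entropies.
    n = sum(parent)
    weighted = sum((y + no) / n * entropy_counts(y, no) for y, no in children)
    return entropy_counts(*parent) - weighted

# Splitting a (6 yes, 6 no) parent into one mostly-yes and one all-no child:
print(information_gain((6, 6), [(6, 2), (0, 4)]))  # ~0.46 bits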
GINI IMPURITY:
• It is named after the Italian statistician Corrado Gini.
• It is defined as the probability of incorrectly classifying a randomly chosen element from the data if it were labelled randomly according to the class distribution.
• It is calculated once for each side of a split, i.e. for the left part and the right part.

Gini impurity (GI) = 1 - (Probability of yes)² - (Probability of no)²
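
A minimal sketch of the formula (the function name is my own):

def gini_impurity(p_yes, p_no):
    # GI = 1 - P(yes)^2 - P(no)^2
    return 1 - p_yes ** 2 - p_no ** 2

print(gini_impurity(0.5, 0.5))  # 0.5 -> maximally mixed binary node
print(gini_impurity(1.0, 0.0))  # 0.0 -> pure node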
EXAMPLE FOR GINI IMPURITY:

Chest Pain | Good Blood Circulation | Blocked Arteries | Heart Disease
-----------|------------------------|------------------|--------------
NO         | NO                     | NO               | NO
YES        | YES                    | YES              | YES
NO         | NO                     | YES              | YES
YES        | YES                    | YES              | YES
NO         | NO                     | YES              | YES
YES        | NO                     | YES              | NO
YES        | NO                     | NO               | YES
YES        | NO                     | YES              | NO
NO         | NO                     | NO               | YES
• Gini impurity of Chest Pain:

Gini impurity of the left side (chest pain = yes) leaf node:
1 - (3/5)² - (2/5)² = 0.48

Gini impurity of the right side (chest pain = no) leaf node:
1 - (3/4)² - (1/4)² = 0.375

Gini impurity of Chest Pain = weighted average of the Gini impurities of its leaf nodes
= (5/9)×0.48 + (4/9)×0.375
≈ 0.43
• Gini impurity of Good Blood Circulation:

Gini impurity of the left side (circulation = yes) leaf node:
1 - (2/2)² - (0/2)² = 0

Gini impurity of the right side (circulation = no) leaf node:
1 - (4/7)² - (3/7)² ≈ 0.49

Gini impurity of Good Blood Circulation = weighted average of the Gini impurities of its leaf nodes
= (2/9)×0 + (7/9)×0.49 ≈ 0.38
• Gini impurity of Blocked Arteries:

Gini impurity of the left side (arteries = yes) leaf node:
1 - (4/6)² - (2/6)² ≈ 0.44

Gini impurity of the right side (arteries = no) leaf node:
1 - (2/3)² - (1/3)² ≈ 0.44

Gini impurity of Blocked Arteries = weighted average of the Gini impurities of its leaf nodes
= (6/9)×0.44 + (3/9)×0.44
≈ 0.44
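
A minimal sketch reproducing these weighted averages from the yes/no counts in the table (helper names are my own):

def gini(yes, no):
    # Gini impurity of one leaf from its yes/no label counts.
    total = yes + no
    return 1 - (yes / total) ** 2 - (no / total) ** 2

def weighted_gini(left, right):
    # Weighted average of the two leaf impurities.
    n = sum(left) + sum(right)
    return sum(left) / n * gini(*left) + sum(right) / n * gini(*right)

print(weighted_gini((3, 2), (3, 1)))  # Chest Pain             ~0.43
print(weighted_gini((2, 0), (4, 3)))  # Good Blood Circulation ~0.38
print(weighted_gini((4, 2), (2, 1)))  # Blocked Arteries       ~0.44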
Solving

[Slide figures: each candidate feature (Chest Pain, Good Blood Circulation, Blocked Arteries) is tried as the root split, and the heart-disease yes/no counts in each resulting leaf are tallied. Good Blood Circulation yields the lowest weighted Gini impurity (≈0.38, versus ≈0.43 for Chest Pain and ≈0.44 for Blocked Arteries), so it is selected as the root node.]
Decision Tree Sample

[Slide figure: the resulting decision tree, with Good Blood Circulation at the root and Chest Pain and Blocked Arteries as the next-level decision nodes, each leaf showing its heart-disease yes/no counts.]