Supervised Learning
Classification
• Classification is a supervised learning approach in which
the program learns from labelled input data and then uses
what it has learned to classify new observations.
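As a minimal sketch of "learn from labelled data, then classify a new observation", the example below uses a 1-nearest-neighbour rule. The choice of algorithm, the function name `nearest_neighbor_classify`, and the height/weight data are all assumptions made for illustration; they do not come from the slides.

```python
import math

def nearest_neighbor_classify(train, new_point):
    """Return the label of the training example closest to new_point.

    train is a list of (features, label) pairs; features are tuples of numbers.
    """
    _, closest_label = min(
        train, key=lambda pair: math.dist(pair[0], new_point)
    )
    return closest_label

# Hypothetical labelled observations: (height_cm, weight_kg) -> label
training_data = [
    ((150.0, 50.0), "small"),
    ((155.0, 55.0), "small"),
    ((180.0, 85.0), "large"),
    ((185.0, 90.0), "large"),
]

# Classify a new, unlabelled observation
prediction = nearest_neighbor_classify(training_data, (178.0, 80.0))
print(prediction)
```

The new observation is given the label of its closest labelled neighbour, which is the simplest form of learning from examples.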
Regression
• Regression models are used to predict a continuous value.
They predict the outcome of an event from the
relationships between variables found in the dataset.
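To illustrate predicting a continuous value, here is a small sketch of ordinary least-squares fitting of a straight line. The function `fit_line` and the hours/scores data are hypothetical; the slides do not name a specific regression method.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance of x and y, and variance of x (both unnormalised)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical data: exam score as a function of hours studied
hours = [1.0, 2.0, 3.0, 4.0]
scores = [52.0, 54.0, 56.0, 58.0]

slope, intercept = fit_line(hours, scores)
predicted = slope * 5.0 + intercept  # predicted score for 5 hours of study
print(slope, intercept, predicted)
```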
A supervised learning problem can be solved with the following steps:
• Determine the type of training examples.
• Gather a training set.
• Determine the structure of the learned function and
corresponding learning algorithm.
• Complete the design. Run the learning algorithm on the
gathered training set.
• Evaluate the accuracy of the learned function.
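The steps above can be sketched end to end on a toy problem. Everything here is an assumption made for illustration: the single numeric feature, the threshold "learning algorithm", and the names `train_threshold` and `accuracy`.

```python
def train_threshold(examples):
    """Learn a threshold halfway between the means of the two classes."""
    pos = [x for x, y in examples if y == 1]
    neg = [x for x, y in examples if y == 0]
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2

def accuracy(threshold, examples):
    """Fraction of examples where 'x > threshold' agrees with the label."""
    correct = sum(1 for x, y in examples if (x > threshold) == (y == 1))
    return correct / len(examples)

# Steps 1-2: decide on (feature, label) examples and gather a data set
data = [(float(i), 0) for i in range(0, 10)] + [(float(i), 1) for i in range(10, 20)]

# Hold out every fourth example so accuracy is measured on unseen data
train_set = [d for i, d in enumerate(data) if i % 4 != 0]
test_set = [d for i, d in enumerate(data) if i % 4 == 0]

# Steps 3-4: choose the model form (a threshold) and run the learner
threshold = train_threshold(train_set)

# Step 5: evaluate the learned function
test_accuracy = accuracy(threshold, test_set)
print(threshold, test_accuracy)
```

Evaluating on held-out examples rather than on the training set gives a fairer estimate of how the learned function behaves on new observations.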
Applications of supervised learning include:
• BioInformatics
• Handwriting Recognition
• Spam Detection
• Optical Character Recognition (OCR)
• Speech Recognition
What are decision trees?
• A decision tree is a type of supervised learning algorithm
in which each branch node represents a choice between a
number of alternatives and each leaf node represents a
decision.
• It is one of the most widely used and practical methods
for inductive inference.
How can we understand a decision tree?
• 1) Root node
• 2) Decision node
• 3) Leaf node
Root node
• The root node represents the entire population,
i.e. the whole sample of the given data.
• It can be further divided into two or more
homogeneous (similar) sets, depending on the
desired output and the sampled data.
Parent node/Child node
• A node that is divided into sub-nodes is called the parent
node of those sub-nodes; the sub-nodes are its child nodes.
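The node types described above can be sketched as a small data structure. The `Node` class, the `predict` helper, and the weather questions are hypothetical and only meant to show how root, decision, and leaf nodes relate.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Node:
    question: Optional[str] = None   # feature tested at a root or decision node
    yes: Optional["Node"] = None     # child followed when the answer is True
    no: Optional["Node"] = None      # child followed when the answer is False
    decision: Optional[str] = None   # set only on leaf nodes

    def is_leaf(self) -> bool:
        return self.decision is not None

def predict(node, sample):
    """Walk from the root to a leaf, following the sample's answers."""
    while not node.is_leaf():
        node = node.yes if sample[node.question] else node.no
    return node.decision

# Root node -> one decision node -> three leaf nodes
tree = Node(
    question="raining",
    yes=Node(decision="stay in"),
    no=Node(
        question="windy",
        yes=Node(decision="fly a kite"),
        no=Node(decision="go for a walk"),
    ),
)

result = predict(tree, {"raining": False, "windy": True})
print(result)
```

Here the "raining" node is the root and the parent of the "windy" decision node, which in turn is the parent of two leaf nodes.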
[Figure: example decision tree whose decision nodes ask yes/no questions such as "Rich?" and "Betrayed?" and whose leaf nodes give the resulting outcomes.]
What is a random forest?
• It is a collection of several decision trees whose outputs are
combined to produce a more sophisticated classification.
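One common way to combine the trees is a majority vote, sketched below. The three "trees" are stand-in functions with made-up rules, and `forest_predict` is a hypothetical name; a real random forest would also train each tree on a different random sample of the data, which this sketch leaves out.

```python
from collections import Counter

def forest_predict(trees, sample):
    """Return the label predicted by the majority of the trees."""
    votes = Counter(tree(sample) for tree in trees)
    return votes.most_common(1)[0][0]

# Three hypothetical stand-in "trees": each maps a sample to a label
trees = [
    lambda s: "spam" if s["links"] > 3 else "ham",
    lambda s: "spam" if s["caps_ratio"] > 0.5 else "ham",
    lambda s: "spam" if s["length"] < 20 else "ham",
]

sample = {"links": 5, "caps_ratio": 0.7, "length": 120}
label = forest_predict(trees, sample)  # two of the three trees vote "spam"
print(label)
```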
WHAT KIND OF DATA CAN BE MODELED USING A DECISION TREE?
• Data models:
classification model
regression model
Advantages of decision trees:
• Simple to understand and interpret.
• Able to handle both numerical and categorical data.
• Performs well with large datasets.
• Requires little data preparation.
• Possible to validate a model using statistical tests.
Applications of Decision Trees
• Business Management
• Customer Relationship Management
• Fraudulent Statement Detection
• Healthcare Management
• Energy Consumption
CART ALGORITHM
• CART stands for Classification and Regression Trees.
It can handle both classification and regression.
It builds a binary decision tree by repeatedly splitting
a parent node into two child nodes.
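A single CART-style binary split can be sketched as a search over thresholds on one numeric feature, scored here with Gini impurity. The helper names `gini` and `best_split` and the age/purchase data are assumptions for illustration; a full CART implementation would apply this search recursively to each child node and over every feature.

```python
def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def best_split(values, labels):
    """Return (threshold, weighted_gini) for the best split 'value <= threshold'."""
    best = (None, float("inf"))
    distinct = sorted(set(values))
    # Candidate thresholds lie midway between consecutive distinct values
    for lo, hi in zip(distinct, distinct[1:]):
        threshold = (lo + hi) / 2
        left = [y for x, y in zip(values, labels) if x <= threshold]
        right = [y for x, y in zip(values, labels) if x > threshold]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if score < best[1]:
            best = (threshold, score)
    return best

# Hypothetical data: customer age and whether they bought a product
ages = [22, 25, 30, 35, 40, 45]
bought = ["no", "no", "no", "yes", "yes", "yes"]

threshold, score = best_split(ages, bought)
print(threshold, score)
```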
ENTROPY
[Figure: a set containing only basketballs has impurity = 0; a set mixing a basketball, a football, a cricket ball and a baseball has impurity ≠ 0.]
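Entropy Formula:
Entropy(S) = -P(yes) log2 P(yes) - P(no) log2 P(no)
ENTROPY GRAPH
• P(yes) is the probability of yes and P(no) is the
probability of no.
• S is the total sample space.
• If the number of yes equals the number of no, then P(yes) = P(no) = 0.5 and Entropy(S) = 1.
• If the sample contains only yes or only no, then Entropy(S) = 0.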
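The two boundary cases can be checked numerically. This sketch assumes the standard binary entropy, -p log2(p) - (1 - p) log2(1 - p) with p = P(yes); the function name `entropy` is chosen for illustration.

```python
import math

def entropy(p_yes):
    """Binary entropy in bits for a sample with P(yes) = p_yes."""
    p_no = 1.0 - p_yes
    total = 0.0
    for p in (p_yes, p_no):
        if p > 0:  # the term 0 * log2(0) is treated as 0
            total -= p * math.log2(p)
    return total

balanced = entropy(0.5)   # equal numbers of yes and no
pure = entropy(1.0)       # only yes in the sample
print(balanced, pure)
```

Entropy is highest when the two classes are equally likely and falls to zero as the sample becomes pure.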
INFORMATION GAIN
• It is defined as the difference between the parent node impurity
and the weighted sum of the child node impurities.
• It measures the reduction in entropy.
• It decides which attribute should be selected as the decision node.
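As a sketch of this definition, information gain can be computed as the parent's entropy minus the size-weighted entropy of the child nodes. The function names and the yes/no label lists below are hypothetical.

```python
import math
from collections import Counter

def entropy_of(labels):
    """Entropy in bits of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(parent, children):
    """Parent entropy minus the size-weighted entropy of the child nodes."""
    n = len(parent)
    weighted = sum(len(child) / n * entropy_of(child) for child in children)
    return entropy_of(parent) - weighted

parent = ["yes"] * 5 + ["no"] * 5

# A split that separates the classes completely
perfect_split = [["yes"] * 5, ["no"] * 5]
# A split that leaves both children as mixed as the parent
useless_split = [["yes", "yes", "no", "no"], ["yes"] * 3 + ["no"] * 3]

gain_perfect = information_gain(parent, perfect_split)
gain_useless = information_gain(parent, useless_split)
print(gain_perfect, gain_useless)
```

The attribute whose split gives the largest gain is the one chosen for the decision node.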
[Figure: three candidate splits for predicting heart disease (Chest Pain, Good Blood Circulation, Blocked Arteries), each with TRUE/FALSE branches and yes/no heart disease counts at the leaves.]
• Gini impurity of good blood circulation = 0.37
• Gini impurity of chest pain = 0.46
• Gini impurity of blocked arteries = 0.37
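The Gini impurity of a candidate split is the size-weighted average of the impurities of its TRUE and FALSE leaves. The per-leaf counts from the slides are not recoverable, so the sketch below uses hypothetical (yes, no) counts and does not reproduce the values quoted above; the function names are also assumptions.

```python
def gini_from_counts(yes, no):
    """Gini impurity of one leaf given its yes/no counts."""
    total = yes + no
    if total == 0:
        return 0.0
    p_yes = yes / total
    p_no = no / total
    return 1.0 - p_yes ** 2 - p_no ** 2

def split_gini(true_counts, false_counts):
    """Size-weighted Gini impurity of a TRUE/FALSE split."""
    n_true = sum(true_counts)
    n_false = sum(false_counts)
    n = n_true + n_false
    return (n_true / n) * gini_from_counts(*true_counts) + (
        n_false / n
    ) * gini_from_counts(*false_counts)

# Hypothetical (yes, no) heart disease counts for the TRUE and FALSE branches
impurity = split_gini((3, 1), (1, 3))
print(impurity)
```

The attribute with the lowest weighted impurity is the preferred choice for the split.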
Decision Tree Sample
[Figure: sample decision tree built from the Good Blood Circulation, Blocked Arteries and Chest Pain attributes, with leaf counts such as 2/1 and 3/2.]