
A REPORT ON

Machine Learning by Building Decision Trees
Anshul Sharma
18IT06
TABLE OF CONTENTS

Abstract
Introduction
Decision Tree
Random Forest
Deep Forest
Results and Discussion
Conclusion
References and Bibliography
TABLE OF CONTENTS (this part)
Introduction
Decision Tree
The Problem
Why use Decision Trees?
How does the Decision Tree algorithm Work?
Attribute Selection Measures
K-folds cross-validation
Conclusion

Introduction
Data mining, or knowledge discovery, is a computer-assisted process for
analysing data from different perspectives. Data mining tools predict trends
and behaviour in the data and help make knowledge-driven decisions. In
supervised learning, the classes are predetermined and form a finite set, and
a certain segment of the data is labelled with these classes. The task is to
search for patterns and construct mathematical models. The training set
consists of labelled data, and the task of classification, one of the
supervised data mining techniques, is to accurately predict the class to
which new data samples belong. For example, consider weather data samples:
based on a training dataset, the goal is to predict whether or not it is
going to rain the next day and classify each sample into the labels "yes"
and "no". Random forests are an ensemble learning method used for
classification. The methodology constructs many decision trees from the
given training data and combines their predictions on the test data. Random
forests can also be used to rank the importance of variables in a
classification problem.
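As a minimal sketch of this rain/no-rain classification task (the toy data, feature names and scikit-learn usage below are illustrative assumptions, not taken from this report), a random forest can be trained on labelled weather samples, used to predict the "yes"/"no" label for a new day, and queried for variable importances:

# Illustrative sketch only: toy weather samples and scikit-learn's random forest.
from sklearn.ensemble import RandomForestClassifier

# Each sample: [humidity (%), pressure (hPa), temperature (deg C)]
X_train = [
    [85, 1005, 22],
    [90, 1002, 20],
    [40, 1020, 28],
    [35, 1018, 30],
    [80, 1008, 21],
    [45, 1022, 27],
]
y_train = ["yes", "yes", "no", "no", "yes", "no"]  # "will it rain tomorrow?"

# A random forest is an ensemble of decision trees built on the training data.
clf = RandomForestClassifier(n_estimators=50, random_state=0)
clf.fit(X_train, y_train)

print(clf.predict([[88, 1004, 21]]))   # predicted label for a new, unseen day
print(clf.feature_importances_)        # ranks the importance of each variable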

DECISION TREES
Decision trees are powerful and popular tools for classification and
prediction. Decision trees represent rules that can be understood by humans
and used in knowledge systems such as databases. A decision tree is a
hierarchical model for supervised learning whereby a local region is
identified in a sequence of recursive splits in a small number of steps. A
decision tree is composed of internal decision nodes and terminal leaves.
Each decision node m implements a test function fm(x) with discrete outcomes
labelling the branches. Given an input, at each node a test is applied and
one of the branches is taken depending on the outcome. This process starts at
the root and is repeated recursively until a leaf node is hit, at which point
the value written in the leaf constitutes the output.
A decision tree is also a nonparametric model in the sense that we do not
assume any parametric form for the class densities, and the tree structure is
not fixed a priori; rather, the tree grows, and branches and leaves are
added, during learning depending on the complexity of the problem inherent in
the data. A decision tree is a classifier in the form of a tree structure
which consists of:
• Decision node: specifies a test on a single attribute.
• Leaf node: indicates the value of the target attribute.
• Edge: the split on one attribute.
• Path: a sequence of tests that leads to the final decision.
Machine learning is an application of artificial intelligence that uses
algorithms and data to analyse and make decisions automatically, without
human intervention. It describes how computers perform tasks on their own
based on previous experience. Therefore, we can say that in machine learning
the intelligence is generated on the basis of experience.
Decision Tree is a supervised learning technique that can be used for both
classification and regression problems, but it is mostly preferred for
solving classification problems. It is a tree-structured classifier, where
internal nodes represent the features of a dataset, branches represent the
decision rules and each leaf node represents the outcome.
In a decision tree there are two kinds of nodes: the decision node and the
leaf node. Decision nodes are used to make a decision and have multiple
branches, whereas leaf nodes are the outputs of those decisions and do not
contain any further branches.
The decisions or tests are performed on the basis of the features of the
given dataset. A decision tree is a graphical representation for getting all
the possible solutions to a problem or decision based on given conditions. It
is called a decision tree because, similar to a tree, it starts with the root
node, which expands into further branches and constructs a tree-like
structure.
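To make the node and branch terminology concrete, the following minimal sketch (the class, attribute names and split values are illustrative assumptions, not part of this report) classifies an input by walking from the root decision node along branches until a leaf node is reached:

# Illustrative sketch: a hand-built tree with decision nodes, branches and leaves.
from dataclasses import dataclass
from typing import Optional

@dataclass
class Node:
    feature: Optional[str] = None      # decision node: the attribute tested here
    threshold: Optional[float] = None  # split value for the test f_m(x)
    left: Optional["Node"] = None      # branch taken when the test is true
    right: Optional["Node"] = None     # branch taken when the test is false
    label: Optional[str] = None        # leaf node: the output value

def predict(node: Node, sample: dict) -> str:
    """Start at the root and follow branches until a leaf node is hit."""
    while node.label is None:                       # still at a decision node
        if sample[node.feature] <= node.threshold:  # apply the node's test
            node = node.left
        else:
            node = node.right
    return node.label                               # value written in the leaf

# Root tests humidity; one branch is a leaf, the other tests pressure.
tree = Node(feature="humidity", threshold=70,
            left=Node(label="no rain"),
            right=Node(feature="pressure", threshold=1010,
                       left=Node(label="rain"),
                       right=Node(label="no rain")))

print(predict(tree, {"humidity": 85, "pressure": 1005}))  # -> "rain"

The while loop plays the role of the recursive test procedure described above: each iteration applies one test function and follows the corresponding branch.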
The Problem
Many factors influence the result of the decision tree
induction process. Data collection is a tricky pitfall. The data might become very noisy due to
some subjective or system-dependent problems during
the data collection process. Newcomers in data mining
go into data mining step by step. First, they acquire a small database that
allows them to test what can be achieved by data mining methods. Then, they
enlarge the database, hoping that a larger data set will
result in better data mining results. But often this is not
the case. Others may have big data collections that
have been collected in their daily practice such as in
marketing and finance. At a certain point, they want to
analyse these data with data mining methods. If they
do this based on all data they might be faced with a lot
of noise in the data since customer behaviour might
have changed over time due to some external factors
such as economic factors, climate condition changes in
a certain area and so on. Web data can change
severely over time. People from different geographic
areas and different nations can access a website and
leave a distinct track dependent on the geographic
area they are from and the nation they belong to. If the
user has to label the data, then it might be apparent
that the subjective decision about the class the data
set belongs to might result in some noise. Depending on how the expert feels
on a given day and on his level of experience, he may or may not label the
data as well as he should. Oracle-based classification
methods [12] [13] or similarity-based methods [14]
[15] might help the user to overcome such subjective
factors. If the data have been collected over an
extended period of time, there might be some data
drift. In case of a web-based shop the customers
frequenting the shop might have changed because the
products now attract other groups of people. In a
medical application the data might change because the
medical treatment protocol has been changed. This has
to be taken into consideration when using the data. It
is also possible that the data are collected in time
intervals. The data in time period 1 might have different characteristics
than the data collected in time period 2. In agriculture this might be true
because the
weather conditions have changed. If this is the case,
the data cannot make up a single data set. The data
must be kept separate with a tag indicating that they
were collected under different weather conditions. In
this paper we describe the behaviour of decision tree
induction under changing conditions (see Figure 1) in
order to give the user a methodology for using decision
tree induction methods. The user should be able to
detect such influences based on the results of the
decision tree induction process.
Why use Decision Trees?
Decision trees usually mimic human thinking while making a decision, so they
are easy to understand. The logic behind a decision tree can be easily
understood because it shows a tree-like structure.
How does the Decision Tree algorithm Work?
Step-1: Begin the tree with the root node, say S, which contains the complete dataset.
Step-2: Find the best attribute in the dataset using an Attribute Selection Measure (ASM).
Step-3: Divide S into subsets that contain the possible values of the best attribute.
Step-4: Generate the decision tree node, which contains the best attribute.
Step-5: Recursively make new decision trees using the subsets of the dataset created in Step-3.
Continue this process until a stage is reached where the nodes cannot be
classified further; such a final node is called a leaf node.
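A compressed, illustrative sketch of these five steps is given below; best_attribute is a hypothetical placeholder for whichever attribute selection measure is used (see the next section), and the row format is assumed for demonstration:

# Illustrative sketch of Steps 1-5; names and row format are assumed.
from collections import Counter

def best_attribute(rows, attributes):
    # Placeholder ASM: in practice use information gain or the Gini index,
    # both of which are sketched further below.
    return attributes[0]

def build_tree(rows, attributes):
    labels = [row["label"] for row in rows]
    # Stop when the node cannot be classified further: it becomes a leaf node.
    if len(set(labels)) == 1 or not attributes:
        return {"leaf": Counter(labels).most_common(1)[0][0]}
    best = best_attribute(rows, attributes)                   # Step 2: apply an ASM
    node = {"attribute": best, "children": {}}                # Step 4: decision node
    for value in {row[best] for row in rows}:                 # Step 3: split into subsets
        subset = [row for row in rows if row[best] == value]
        remaining = [a for a in attributes if a != best]
        node["children"][value] = build_tree(subset, remaining)  # Step 5: recurse
    return node

rows = [{"outlook": "sunny", "windy": "no", "label": "yes"},
        {"outlook": "rainy", "windy": "yes", "label": "no"}]
print(build_tree(rows, ["outlook", "windy"]))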
Attribute Selection Measures
While implementing a decision tree, the main issue is how to select the best
attribute for the root node and for the sub-nodes. To solve this problem
there is a technique called the Attribute Selection Measure (ASM). Using this
measure, we can easily select the best attribute for the nodes of the tree.
There are two popular ASM techniques:
• Information Gain
• Gini Index
1. Information Gain:
• Information gain is the measurement of changes in
entropy after the segmentation of a dataset based
on an attribute.
• It calculates how much information a feature
provides us about a class.
• According to the value of information gain, we
split the node and build the decision tree.
• A decision tree algorithm always tries to maximize the value of
information gain; the node/attribute with the highest information gain is
split first. Information gain can be calculated using the formula below:
Information Gain = Entropy(S) − [(Weighted Avg) × Entropy(each feature)]
Entropy: Entropy is a metric to measure the impurity in
a given attribute.
It specifies randomness in data. Entropy can be
calculated as:
Entropy(S) = −P(yes) log2 P(yes) − P(no) log2 P(no)
Where:
• S= Total number of samples
• P(yes)= probability of yes
• P(no)= probability of no
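A small, illustrative sketch of these two formulas (the rows and the "outlook" attribute are made-up examples, not data from this report):

# Illustrative sketch: entropy and information gain for a "yes"/"no" target.
import math
from collections import Counter

def entropy(labels):
    """Entropy(S) = -P(yes) log2 P(yes) - P(no) log2 P(no)."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())

def information_gain(rows, attribute, target="label"):
    """Entropy(S) minus the weighted average entropy of the subsets after the split."""
    before = entropy([row[target] for row in rows])
    weighted = 0.0
    for value in {row[attribute] for row in rows}:
        subset = [row[target] for row in rows if row[attribute] == value]
        weighted += (len(subset) / len(rows)) * entropy(subset)
    return before - weighted

rows = [{"outlook": "sunny", "label": "no"},
        {"outlook": "sunny", "label": "no"},
        {"outlook": "rainy", "label": "yes"},
        {"outlook": "overcast", "label": "yes"}]
print(entropy([r["label"] for r in rows]))   # 1.0: the set is perfectly mixed
print(information_gain(rows, "outlook"))     # 1.0: "outlook" separates the classes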
2. Gini Index:
• Gini index is a measure of impurity or purity used
while creating a decision tree
in the CART(Classification and Regression Tree)
algorithm.
• An attribute with a low Gini index should be preferred over one with a
high Gini index.
• It only creates binary splits, and the CART
algorithm uses the Gini index to create binary
splits.
• Gini index can be calculated using the below
formula:
Gini Index = 1 − Σj (Pj)²
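A corresponding illustrative sketch for the Gini index, using made-up class labels:

# Illustrative sketch: Gini index = 1 - sum of squared class probabilities.
from collections import Counter

def gini_index(labels):
    total = len(labels)
    return 1.0 - sum((count / total) ** 2 for count in Counter(labels).values())

print(gini_index(["yes", "yes", "no", "no"]))  # 0.5: maximally impure
print(gini_index(["yes", "yes", "yes"]))       # 0.0: a pure node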
Pruning: Getting an Optimal Decision tree
Pruning is a process of deleting the unnecessary nodes
from a tree in order to get the optimal decision tree.
A too-large tree increases the risk of overfitting, and a
small tree may not capture all the important features
of the dataset. Therefore, a technique that decreases
the size of the learning tree without reducing accuracy
is known as pruning. There are two main types of pruning techniques used:
• Cost Complexity Pruning
• Reduced Error Pruning.
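In scikit-learn's CART implementation, cost complexity pruning is exposed through the ccp_alpha parameter; the sketch below (using the standard iris dataset purely for illustration) shows the usual pattern of computing candidate alphas and fitting a pruned tree:

# Illustrative sketch: cost complexity pruning with scikit-learn's CART trees.
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Compute the sequence of effective alphas produced by cost complexity pruning.
path = DecisionTreeClassifier(random_state=0).cost_complexity_pruning_path(X, y)

# A larger ccp_alpha prunes more aggressively, giving a smaller tree.
pruned = DecisionTreeClassifier(random_state=0, ccp_alpha=path.ccp_alphas[-2])
pruned.fit(X, y)
print(pruned.get_n_leaves())  # far fewer leaves than a fully grown tree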
K-folds cross-validation
When MicroStrategy trains a decision tree model,
the decision tree algorithm splits the training data
into two sets; one set is used to develop the tree
and the other set is used to validate it. Prior to
MicroStrategy 9.0, one fifth of the training data
was always reserved for validating the model built
on the remaining four fifths of the data. The quality
of this model (referred to as the holdout method)
can vary depending on how the data is split,
especially if there is an unintended bias between
the training set and the validation set.
K-folds cross-validation is an improvement over
the holdout method. The training data is divided
into k subsets, and the holdout method is repeated
k times. Each time, one of the k subsets is used as
the test set and the other k-1 subsets are used to
build a model. Then the result across all k trials is
computed, typically resulting in a better model.
Since every data point is in the validation set only
once, and in the training dataset k-1 times, the
model is much less sensitive to how the partition is
made.
A downside is that training time tends to increase
proportionally with k, so MicroStrategy allows the
user to control the k parameter, limiting it to a
maximum value of 10. The K-fold setting is
specified on the Select Type of Analysis dialog in
the Training Metric Wizard.
If the user sets k=1, the basic hold-out method is
used, with one fifth of the training data withheld for
validation of a model built on the remaining four
fifths of the data.
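The k-fold procedure itself is tool-independent; as a generic illustration (not specific to MicroStrategy), scikit-learn's cross_val_score repeats the hold-out split k times on a decision tree and averages the results:

# Illustrative sketch: generic k-fold cross-validation of a decision tree.
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Each of the k = 5 folds is used once as the validation set,
# while the remaining k - 1 folds are used to build the model.
scores = cross_val_score(DecisionTreeClassifier(random_state=0), X, y, cv=5)
print(scores)         # one accuracy value per fold
print(scores.mean())  # result averaged across all k trials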

Conclusion
Since the invention of computers, a lot of effort has been made to make them
learn. If we can program computers to learn, that is, to improve their
performance through experience, the consequences will be far-reaching. For
example, it would be possible to make a computer determine an optimal
treatment for a new disease using its past experience from the treatment of a
series of related diseases. Providing learning
capabilities to computers will lead to many new
applications of computers. Furthermore, knowledge
about the automation of learning processes may give insight into humans'
ability to learn.
Unfortunately, we do not yet know how to make computers learn as well as
humans. However, in recent years a series of algorithms has appeared that
makes it possible to successfully automate learning in some application
areas. For example, some of the most efficient algorithms for speech
recognition are based on machine learning.
Today the interest in machine learning is so great that
it is the most active research area in artificial
intelligence.
The area may be divided into sub areas, symbolic and
non-symbolic machine learning. In symbolic learning
the result of the learning process is represented as
symbols, either in form of logical statements or as
graph structures. In non-symbolic learning the result is
represented as quantities, for example as weights in a
neural network (a model of the human brain).
In recent years the research in neural networks has
been very intensive and remarkably good results have
been achieved. This is especially true in connection
with speech and image recognition.
But research in symbolic machine learning has also
been intensive. An important reason for this is that
humans can understand the result of a learning
process (in contrast to neural networks), which is
significant if one should trust the result.
The work described here deals with symbolic machine learning and uses
so-called induction. Induction is the process of learning from examples. The
goal is to reach a general principle or conclusion
from some given examples. Here is a simple example of
induction. Suppose you see a set of letterboxes that
are all red. By induction you may conclude that all
letterboxes in the world are red (including letterboxes
that you haven't seen). Decision trees are one of the most widely applied
methods for learning by induction. The general principle of decision trees is
best illustrated through an example. Suppose that you don't know
what causes people to be sunburnt. You observe a
series of persons and register some of their features,
among these whether or not they are sunburnt. The
observations are given in the table below. This decision tree may be used to
classify a given person, i.e., to predict whether or not he will get
sunburnt. Start at
the top of the tree and answer the question one after
another, until you reach a leaf. The leaf stores the
classification (Sunburnt or None).
In the present case the decision tree agrees with our
intuition about factors that are decisive for getting
sunburnt. For example, neither a person’s weight nor
height plays a role. It is often possible to construct
more than one decision tree that agrees with the
observed data. However, not all of them may be
equally well suited for making generalizations, i.e., to
classify examples outside the set of examples used to
build the tree. How do we build the best tree? This problem may be solved
using the so-called ID3 algorithm, one of the most effective algorithms for
induction. The algorithm builds a tree while striving for as simple a tree as
possible. The assumption is that a simple tree
performs better than a complex tree when unknown
data are to be classified. The simplification algorithm is
based on information theory.
