Decision Tree and Random Forest
[Institutional Affiliation(s)]
Author Note
Abstract
In this research, we first give a brief background and introduction to the Decision Tree and the Random Forest, and then discuss their advantages, disadvantages, and the differences between the two. We evaluate the classification outcomes of the Decision Tree and the Random Forest on twenty diverse datasets. We used 20 datasets, with 148 to 20,000 instances each, available in the UCI repository [1], and contrasted the classification results produced by the Random Forest and the J48 Decision Tree approaches. We also discuss the advantages and disadvantages of applying these models to large and small datasets. The classification results demonstrate that the Decision Tree performs well on small datasets, whereas the Random Forest performs better for the same number of attributes on large datasets, i.e., those with more instances. The results also show that as the number of instances increased, the percentage of correctly classified instances increased for the Random Forest.
Background
Decision trees date back to the early days of written records. This history illustrates one of the main advantages of trees: highly interpretable outcomes presented in a simple, tree-like form, which improves both comprehension and the communication of results. Decision trees, also known as classification trees or regression trees, have their computational roots in models of biological and mental processes. The complementary growth of statistical decision trees and machine learning trees is driven by this common heritage.
Ho (1995) proposed a technique to overcome the problems and challenges posed by the complexity of decision tree classifiers created using traditional means. Such classifiers are limited in their ability to generalize to new input. The suggested approach makes use of oblique decision trees, which are useful for enhancing training set accuracy. The method's main step is to construct numerous trees in randomly chosen subspaces of the feature space. The combined classification of these trees can be monotonically improved, since the trees generalize their classification in complementary ways.
In 1997, Amit and Geman introduced a method for shape recognition based on the joint induction of shape features and tree classifiers. They concluded that no classifier based on the complete feature set could be evaluated, since it was impossible to know beforehand which characteristics were relevant given the almost unlimited number of features. Standard decision tree construction based on a fixed-length feature vector was not possible due to the quantity and kind of features. An alternative strategy is to create numerous trees while considering only a tiny random sample of attributes at each node, constraining their complexity to grow
with tree depth. Terminal nodes have estimates of the associated posterior distribution across
shape classes. The image may be classified by distributing it downward and aggregating the
output.
In another study, Ho (1998) [2] offered a solution to the conflict between overfitting and achieving maximum accuracy: a classifier that maintained the highest accuracy on training data while increasing generalization accuracy as the classifier's complexity grew. The classifier consists of trees built in randomly selected subspaces, that is, on subsets of the feature vector's components. When evaluated empirically on publicly accessible data sets, the subspace approach demonstrated its superiority over single-tree classifiers and other forest construction techniques. These ideas fed into the Random Forest ensemble method, which integrates already-existing approaches to create a set of decision trees with carefully controlled variation.
Random Forest is an ensemble learning technique for classification and regression. In order to
create a collection of decision trees with controlled variation, Breiman (2001) [3] developed a
technique that combines his bagging sampling methodology (Breiman, 1996a) with the randomly chosen
characteristics provided independently by Ho (1995); Ho (1998); and Amit and Geman (1997).
Each decision tree in the ensemble is created via bagging, using a sample with replacement taken
from the training dataset. According to statistics, the sample is expected to include roughly 64%
of instances at least once. The other cases (about 36%) are referred to as out-of-bag instances,
whereas the examples in the sample are known as in-bag instances. To identify the class label of
an unlabeled instance, each tree in the ensemble serves as a base classifier. The instance is classified by majority voting, which assigns one vote to each classifier's predicted class label; the class label that receives the most votes is assigned to the instance [17]. This is discussed further in the Introduction.
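As a quick illustration of the bagging and voting scheme just described, the following Python sketch (not part of the original study; the class labels and sample size are made up) simulates drawing a bootstrap sample, which typically covers roughly 63-64% of the instances, and taking a majority vote over base classifiers.

import random
from collections import Counter

def bootstrap_sample(n_instances):
    """Draw n_instances indices with replacement (one bagged training set)."""
    return [random.randrange(n_instances) for _ in range(n_instances)]

n = 1000
sample = bootstrap_sample(n)
in_bag = set(sample)
print(f"in-bag fraction:     {len(in_bag) / n:.2f}")   # roughly 0.63-0.64
print(f"out-of-bag fraction: {1 - len(in_bag) / n:.2f}")   # roughly 0.36-0.37

# Majority voting over the predicted class labels of an ensemble:
tree_votes = ["malignant", "benign", "malignant", "malignant", "benign"]
predicted_class, _ = Counter(tree_votes).most_common(1)[0]
print("ensemble prediction:", predicted_class)   # "malignant"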
Introduction
There are several domains where the Decision Tree method [4] is used. It is employed in a variety of applications, including statistical data comparison, text classification, and text extraction. In addition, libraries can use the Decision Tree method to classify books into several
groups according to their type. It may be used in hospitals to diagnose disorders including
tumors, cancer, heart issues, hepatitis, etc. It is used by businesses, hospitals, schools, colleges,
and universities to keep track of their records. It can also be used for statistics in the stock
market.
Decision Tree algorithms are efficient [5] because they offer classification rules that are understandable to humans. They also have certain flaws, one of which is the need to sort all numerical attributes when the tree decides to split a node. Such sorting becomes expensive in terms of running time and memory space, especially when Decision Trees are built on data that is large in size, i.e., data with many instances. Breiman [3] introduced the concept of random forests in 2001. Random forests outperform existing classifiers such as support vector machines and neural networks.
Methods that employ an ensemble of different classifiers and use randomization to provide variety, such as bagging or random subspaces [6, 7], have proven particularly effective. They employ randomization throughout the induction phase to provide diversity and create classifiers that differ from one another. Because of their effectiveness in discriminative classification, Lepetit et al. [9, 10] introduced Random Forests to the computer vision community. Their work in this area served as the basis for studies using Random Forests in areas including class recognition [11, 12], bi-layer video segmentation [13], image classification [14], and person identification [15]. The Random Forest also naturally supports a wide range of visual cues, such as color, shape, texture, and depth, and is therefore regarded as an efficient and versatile classifier.
According to the definition given in [3], a Random Forest is a general concept of classifier combination that makes use of L tree-structured base classifiers {h(X, Θ_n), n = 1, 2, ..., L}, where X stands for the input data and {Θ_n} is a set of independent and identically distributed random vectors. Data are randomly chosen from the available data for each Decision Tree. For instance, a Random Forest may be created by randomly picking a feature subset for each Decision Tree (as in Random Subspaces) or a subset of the training data for each Decision Tree (the concept of Bagging). The features considered at each decision split in a Random Forest are chosen at random. By picking features at random, the correlation across trees is decreased, which increases the overall accuracy of the ensemble.
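To illustrate the effect of randomly chosen feature subsets, here is a hedged scikit-learn sketch (an assumed stand-in, not the WEKA setup used later in this paper; the dataset and parameter values are illustrative only). It compares an out-of-bag accuracy estimate when each split considers all features (bagging only) with one where each split considers a random subset of features.

from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier

X, y = load_breast_cancer(return_X_y=True)

for max_features in (None, "sqrt"):   # None = all features (bagging only)
    forest = RandomForestClassifier(
        n_estimators=200, max_features=max_features,
        oob_score=True, random_state=0)
    forest.fit(X, y)
    print(max_features, "out-of-bag accuracy:", round(forest.oob_score_, 3))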
In addition to maintaining the benefits of Decision Trees, the Random Forest frequently outperforms Decision Trees thanks to its use of random subsets of variables, bagging on samples as previously mentioned, its voting system [17], and its decision-making process. The Random Forest can accommodate missing values and can handle continuous, categorical, and binary data, making it suitable for high-dimensional data modelling. There is no need to prune the trees, because the bootstrapping and ensemble scheme make the Random Forest robust to overfitting across many dataset types [18], in addition to giving it excellent prediction accuracy. Compared to other prominent machine learning techniques, the Random Forest offers a special combination of model interpretability and prediction accuracy. Because ensemble techniques and random sampling are used, accurate predictions and superior generalization are achieved.
The generalization of the bagging method improves because it lowers variance and thus the overall generalization error, whereas the boosting strategy accomplishes a reduction in bias [19].
• A trained model can calculate the relative importance of each characteristic and determine which features contribute most to its predictions, as sketched in the example below.
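As a rough illustration of this point, the following sketch (assuming scikit-learn rather than the paper's WEKA tooling, with an illustrative dataset) reads the relative feature importances from a trained Random Forest.

from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier

data = load_breast_cancer()
forest = RandomForestClassifier(n_estimators=100, random_state=0)
forest.fit(data.data, data.target)

# Rank features by their estimated contribution to the model's splits.
ranked = sorted(zip(data.feature_names, forest.feature_importances_),
                key=lambda pair: pair[1], reverse=True)
for name, importance in ranked[:5]:
    print(f"{name}: {importance:.3f}")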
The classification performance of the Decision Tree (J48) and the Random Forest for big and small datasets is discussed later in the paper. The goal of this comparison is to provide a baseline that will be helpful in classification scenarios. Additionally, it will aid in choosing the right model.
ADVANTAGES OF THE DECISION TREE:
1. Preparing the data for decision trees during pre-processing is easier than it is for other algorithms.
4. Additionally, the process of creating a decision tree is not materially impacted by missing values in the data.
5. Technical teams and stakeholders can easily understand a decision tree model since it is so straightforward.
DISADVANTAGES OF THE DECISION TREE:
1. An unstable decision tree might result from a small change in the data that has a large impact on the tree's structure.
2. Compared to other algorithms, a decision tree's calculations may become far more complicated.
4. Because of the intricacy and length of time required, decision tree training is relatively costly.
5. Regression and the prediction of continuous values cannot be done well with a decision tree.
ADVANTAGES OF THE RANDOM FOREST:
1. It provides variable importance, which assists in identifying the variables that have a favorable influence.
2. Overfitting is a common problem with machine learning models; the random forest reduces this risk by averaging the predictions of many trees.
5. When a class in the data is less frequent than the other classes, it can automatically help balance the resulting error, which is useful for imbalanced datasets.
6. The approach is appropriate for challenging jobs since it handles many variables quickly.
DISADVANTAGES OF THE RANDOM FOREST:
1. The biggest drawback of the random forest is that it might become too sluggish for real-time prediction when a large number of trees is used.
Decision trees are graphs that show all potential outcomes of a decision using a branching technique, which is a key distinction between them and the random forest algorithm. The random forest method, in contrast, produces a series of decision trees whose outputs are combined.
Because the random forest approach is so accurate, and because modern computers and systems can often handle big, previously unmanageable datasets, machine learning engineers and data scientists make wide use of it. A drawback of the random forest algorithm is that the final model is hard to inspect, and if your computer's processing capacity is insufficient or your dataset is excessively large, the models can take a long time to create.
A basic decision tree has the advantage of being simple to understand. Because we know which variable, and which value of that variable, is used to split the data while the decision tree is being built, we can immediately see how a prediction is reached. The models created by the random forest method, on the other hand, are more complex, since they combine many decision trees. When creating a random forest model, we must decide how many trees to generate and how many variables to consider at each node. In general, adding more trees will increase performance and predictability while decreasing calculation speed. For regression problems, the final prediction is the average of the predictions of all the trees: each tree predicts the mean of the training samples in its target leaf, and the forest then averages over all trees. In contrast to linear regression, it relies on previously observed values, so its estimates are bounded by the range seen in the training data.
More trees are needed for more precise predictions, which slows the model down. If there were a technique to build several trees and average their responses, you would most likely obtain an answer very near the correct one. In this section, we examined the distinctions between the decision tree and the random forest algorithms. A decision tree is a branching structure that displays information about all conceivable outcomes. The random forest method, in contrast, combines the choices of many decision trees based on their outputs. A decision tree's key benefit is that it can swiftly adapt to the dataset and that the final model can be inspected and interpreted easily.
Decision Tree
Decision trees use mathematics during the learning process. To begin, we need to identify a tree structure and decision rules for each node using a dataset D = {X, y}. Each node divides the dataset into two or more disjoint subsets, each denoted D^(l,i), where l stands for the layer number and i for the subset number. If every label in a subset belongs to the same class, the subset is said to be PURE, the node is labeled as a leaf node, and this branch of the tree terminates. If not, the separation criterion is applied once again.
In some more complicated datasets, reaching a stage in which every leaf node is pure may require extremely deep decision trees, which leads to overfitting of the dataset. Because of this, we often stop before we arrive at pure nodes and instead develop more sophisticated termination strategies, in which the nodes may be impure but the error is accounted for and measured. A tree is called MONOTHETIC if only one feature is considered at each node and POLYTHETIC if more than one is taken into consideration. Simpler trees are typically preferred since they are easier to read and use. Most programming languages allow you to constrain a decision tree to produce either monothetic or polythetic trees, although it is nearly always better to start with the simpler option and only expand in complexity if absolutely necessary.
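A minimal sketch of such early stopping, assuming scikit-learn and an illustrative dataset (the paper itself uses WEKA's J48): limiting depth and leaf size accepts impure leaves in exchange for less overfitting.

from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Stop splitting early instead of growing until every leaf is pure.
tree = DecisionTreeClassifier(max_depth=4, min_samples_leaf=10, random_state=0)
tree.fit(X_train, y_train)
print("train accuracy:", round(tree.score(X_train, y_train), 3))
print("test accuracy: ", round(tree.score(X_test, y_test), 3))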
Impurity Measures
Entropy impurity, or information impurity, is calculated using the formula below:
Equation 1
Entropy(D) = -\sum_{i=1}^{c} p_i \log_2 p_i
where p_i is the fraction of instances at node D that belong to class i and c is the number of classes.
This equation basically tells us how predictable each node in our tree is. In the end, we want our nodes to be predictable, and we achieve this by making sure a node contains a sizable proportion of a single class.
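For concreteness, a small helper matching Equation 1 might look as follows (illustrative Python, not from the paper):

import math
from collections import Counter

def entropy(labels):
    """Entropy impurity: -sum_i p_i * log2(p_i) over the classes at a node."""
    counts = Counter(labels)
    total = len(labels)
    return sum(-(c / total) * math.log2(c / total) for c in counts.values())

print(entropy(["a", "a", "a", "a"]))   # 0.0 -> pure node
print(entropy(["a", "a", "b", "b"]))   # 1.0 -> maximally impure for two classes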
Random Forest
Gini Index
The Gini index is a method for splitting data; it evaluates the impurity or purity of the data and is used in CART (Classification and Regression Tree) algorithms such as the Decision Tree. It is calculated as:
Equation 3
Gini(D) = 1 - \sum_{i=1}^{c} p_i^2
where p_i is again the fraction of instances at node D belonging to class i.
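An analogous helper for Equation 3 (again an illustrative sketch, not from the paper):

from collections import Counter

def gini(labels):
    """Gini index: 1 - sum_i p_i^2 over the classes at a node."""
    counts = Counter(labels)
    total = len(labels)
    return 1.0 - sum((c / total) ** 2 for c in counts.values())

print(gini(["a", "a", "a", "a"]))   # 0.0 -> pure node
print(gini(["a", "a", "b", "b"]))   # 0.5 -> maximally impure for two classes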
Information Gain
Information gain is calculated from the entropy of the data set and the entropy of each attribute, and it tells us how much information an attribute provides about the class:
Equation 4
Gain(D, A) = Entropy(D) - \sum_{v \in Values(A)} (|D_v| / |D|) Entropy(D_v)
where D_v is the subset of D for which attribute A takes the value v.
Entropy measures how much unpredictability or impurity there is in the provided data. Information gain is used to select the splitting attribute at each node, starting from the root of the decision tree.
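A sketch of Equation 4, computing the gain of a candidate split from the labels it produces (illustrative Python; the split shown is made up):

import math
from collections import Counter

def entropy(labels):
    counts, total = Counter(labels), len(labels)
    return sum(-(c / total) * math.log2(c / total) for c in counts.values())

def information_gain(parent, children):
    """parent: list of labels; children: list of label lists after the split."""
    n = len(parent)
    weighted_child_entropy = sum(len(c) / n * entropy(c) for c in children)
    return entropy(parent) - weighted_child_entropy

parent = ["yes"] * 5 + ["no"] * 5
split = [["yes"] * 4 + ["no"], ["yes"] + ["no"] * 4]   # a candidate split
print(round(information_gain(parent, split), 3))        # 0.278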
Regression Problems
The Random Forest algorithm is also used to solve regression problems, where the mean squared error (MSE) is used to evaluate how the data branches from each node:
Equation 5
MSE = (1/N) \sum_{i=1}^{N} (y_i - \hat{y}_i)^2
where y_i is the observed value, \hat{y}_i is the value predicted by the model, and N is the number of data points.
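As an illustration of the regression case, the following sketch (assuming scikit-learn and an illustrative dataset, not the paper's setup) averages the trees' predictions and reports the mean squared error of Equation 5.

from sklearn.datasets import load_diabetes
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

forest = RandomForestRegressor(n_estimators=200, random_state=0).fit(X_train, y_train)
predictions = forest.predict(X_test)   # average over all trees in the forest
print("test MSE:", round(mean_squared_error(y_test, predictions), 1))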
A Decision Tree represents a supervised classification strategy [20]. The concept was inspired by the typical tree structure, which consists of a root, nodes (locations where branches divide), branches, and leaves. Similarly, a Decision Tree is built from nodes, represented by circles, and from the segments that link the nodes, which represent the branches. A decision tree is typically drawn from left to right or from the root downward. The node from which the tree begins is the root node; the node at which a chain comes to an end is a "leaf" node. Each internal node, i.e., a node that is not a leaf node, can extend two or more branches. A node represents a specific attribute, while the branches indicate ranges of values. These value ranges serve as dividing lines for the set of values of the specified attribute. The tree structure is shown in Figure 1.
The values of the attributes of the provided data are used to group the data in the Decision Tree. A Decision Tree is created from pre-classified data. The attributes that split the data into the most appropriate classes are chosen for classification, and the data items are divided according to the values of these attributes. This technique is applied recursively to every divided subset of the data items. As soon as every data item in the current subset belongs to the same class, the node becomes a leaf and the recursion stops for that branch.
We employ WEKA's J48 implementation of Decision Trees (open-source software). We can examine data in WEKA, which also implements techniques for regression, data pre-processing, clustering, classification, and visualization. WEKA provides more than sixty algorithms for these tasks.
REPTree
REPTree builds a tree using information gain, and its pruning method is reduced-error pruning. For numeric attributes, it only sorts values once. Like C4.5, it uses the approach of fractional instances to manage missing values.
Random Tree
Trees built by considering a set of randomly chosen attributes at each node are combined to create a random tree. In this context, the term "at random" refers to a group of trees in which each tree has an equal probability of being sampled; in other words, the distribution of trees is uniform. Random trees can be produced efficiently, and combining several such random trees typically results in accurate models. There has been substantial study of random trees in the field of machine learning.
J48
The C4.5 algorithm, created by Ross Quinlan [21], is used to produce Decision Trees. In the WEKA data mining tool, decision trees are generated using J48, an open-source Java implementation of the C4.5 release [22]. This is a typical Decision Tree algorithm. Decision Tree induction is one of the classification techniques used in data mining. From the pre-classified data set, a model is inductively trained using the classification algorithm; the values of the attributes or features characterize each data item.
Random Forests
Random Forest, created by Leo Breiman [3], is a collection of unpruned classification or regression trees built from random samples of the training data. The features considered during the induction procedure are chosen at random. The predictions of the ensemble are combined by majority vote for classification and by averaging for regression. Each tree is grown as follows (see the sketch after the list below):
• If the number of cases in the training set is N, sample N cases at random, but with replacement. This sample will be the training set for growing the tree.
• A number m << M is specified such that, for M input variables, m variables are selected at random out of the M at each node, and the best split on these m is used to divide the node. The value of m is held constant while the forest is grown.
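The growing procedure above can be sketched directly. The following Python code (an illustrative approximation, not the paper's WEKA implementation) draws a bootstrap sample for each tree, lets each tree consider a random subset of features at every split, and combines the trees by majority vote; the dataset and tree count are assumptions.

import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

trees = []
for _ in range(50):
    # Sample N cases with replacement: the in-bag training set for this tree.
    idx = rng.integers(0, len(X_train), size=len(X_train))
    # max_features="sqrt": a fresh random subset of m features at every split.
    tree = DecisionTreeClassifier(max_features="sqrt",
                                  random_state=int(rng.integers(1_000_000)))
    trees.append(tree.fit(X_train[idx], y_train[idx]))

# Majority vote across the ensemble.
votes = np.stack([t.predict(X_test) for t in trees])   # shape (n_trees, n_test)
majority = np.apply_along_axis(lambda col: np.bincount(col).argmax(), 0, votes)
print("ensemble accuracy:", round((majority == y_test).mean(), 3))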
Generally speaking, Random Forest performs significantly better than single-tree classifiers such as C4.5. Its generalization error rate compares favorably to AdaBoost's, and it is more robust to noise.
The classification performance of the Decision Tree (J48) and the Random Forest on big and small datasets is the main focus of this section. The goal of this comparison is to provide a baseline that will be helpful in classification scenarios. Additionally, it will aid in choosing the right model.
Data Sets
We used datasets from the UCI Machine Learning repository for the classification experiments [1]. Some features in the breast cancer data are linear, whereas a few are nominal. Each dataset's full description, properties, and source can be found in the UCI repository. The twenty datasets we utilized for our research and comparison are listed in Table 1, along with their names, numbers of instances, and numbers of attributes. The distribution of data variables in the sampled data sets is shown in Figures 2 and 3. Figure 2 displays the Lymphography dataset, which has 148 instances, 19 attributes, and four classes. Figure 3 depicts the Sonar dataset, which has 208 instances.
Table 1: Datasets
The J48 and the Random Forest employ distinct parameter settings and variables. Binary splits indicates whether binary splits are used when constructing the tree. The confidence factor controls tree pruning; lower values mean heavier pruning. If the debug option is set to true, additional information is printed to the console. When reduced-error pruning is employed, a seed is used to randomize the data. unpruned shows whether or not pruning is applied. minNumObj is the minimum number of instances per leaf. Another option controls whether to preserve the instance data for visualization. numFolds determines how much data is used for pruning. Reduced-error pruning may or may not be used in place of C4.5 pruning. When pruning is done, sub-tree raising is employed. useLaplace determines whether counts at leaves are smoothed based on Laplace. maxDepth gives the trees' maximum depth, where zero represents unlimited depth. numFeatures is the number of features used during random selection, and numTrees is the number of trees to be created. Finally, a random number is used as the seed value.
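For readers more familiar with scikit-learn than WEKA, the following sketch maps some of the parameters above onto rough scikit-learn equivalents; the mapping and the chosen values are assumptions, not the settings used in this study.

from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier

# Rough J48-style settings (assumed analogues; values are illustrative).
tree = DecisionTreeClassifier(
    min_samples_leaf=2,    # roughly minNumObj: minimum instances per leaf
    ccp_alpha=0.01,        # pruning strength (note: here larger = more pruning)
    random_state=1,        # roughly the seed
)

# Rough Random Forest settings (assumed analogues).
forest = RandomForestClassifier(
    n_estimators=100,      # roughly numTrees: number of trees to grow
    max_features="sqrt",   # roughly numFeatures: features tried at each split
    max_depth=None,        # roughly maxDepth: None means unlimited depth
    random_state=1,        # roughly the seed
)
print(tree, forest, sep="\n")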
We contrasted the Decision Tree and Random Forest classification outcomes. By employing 10-fold cross-validation, which repeats the procedure ten times using 9/10 of the data for training the algorithm and the remaining data for testing, we were able to reduce the over-fitting issue. Examples that were correctly and incorrectly classified by the Random Forest and the Decision Tree (J48) are presented in Table 2, alongside the name of the related dataset and its numbers of instances and attributes. The classification results demonstrate that the Decision Tree performs well on small datasets, i.e., those with fewer examples, whereas the Random Forest performs better for the same number of attributes on large datasets, i.e., those with more instances. According to the findings from the breast cancer data sets, as the number of instances rose from 286 to 699, the percentage of instances correctly classified by the Random Forest increased.
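The evaluation protocol can be sketched as follows (assuming scikit-learn stand-ins for WEKA's J48 and Random Forest and an illustrative UCI-style dataset; this is not a reproduction of the paper's experiments).

from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

models = {
    "decision tree": DecisionTreeClassifier(random_state=1),
    "random forest": RandomForestClassifier(n_estimators=100, random_state=1),
}
for name, model in models.items():
    # 10-fold cross-validation: train on 9/10 of the data, test on the rest.
    scores = cross_val_score(model, X, y, cv=10)
    print(f"{name}: mean accuracy = {scores.mean():.3f}")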
References
[1] Asuncion, A., & Newman, D. (2007). UCI Machine Learning Repository. Retrieved July 26, 2022, from https://archive.ics.uci.edu/ml/index.php
[2] Ho, T. K. (1998). The random subspace method for constructing decision forests. IEEE Transactions on Pattern Analysis and Machine Intelligence, 20(8), 832-844. doi:10.1109/34.709601
[3] Breiman, L. (2001). Random forests. Machine Learning, 45(1), 5-32.
[5] Ben-Haim, Y., & Tom-Tov, E. (2010). A streaming parallel decision tree algorithm. Journal of Machine Learning Research, 11.
[7] Ho, T. K. (1998). The random subspace method for constructing decision forests. IEEE Transactions on Pattern Analysis and Machine Intelligence, 20(8), 832-844. doi:10.1109/34.709601
[8] Amit, Y., & Geman, D. (1997). Shape quantization and recognition with randomized trees. Neural Computation, 9(7), 1545-1588.
[9] Lepetit, V., & Fua, P. (2006). Keypoint recognition using randomized trees. IEEE Transactions on Pattern Analysis and Machine Intelligence. doi:10.1109/tpami.2006.188
[10] Ozuysal, M., Fua, P., & Lepetit, V. (2007). Fast keypoint recognition in ten lines of code. 2007 IEEE Conference on Computer Vision and Pattern Recognition. doi:10.1109/cvpr.2007.383123
[11] Winn, J., & Criminisi, A. (2006). Object class recognition at a glance. 2006 IEEE Conference on Computer Vision and Pattern Recognition (video track).
[12] Shotton, J., Johnson, M., & Cipolla, R. (2008). Semantic Texton forests for image
categorization and segmentation. 2008 IEEE Conference on Computer Vision and Pattern
Recognition. doi:10.1109/cvpr.2008.4587503
[13] Yin, P., Criminisi, A., Winn, J., & Essa, I. (2007). Tree-based classifiers for bilayer video segmentation. 2007 IEEE Conference on Computer Vision and Pattern Recognition. doi:10.1109/cvpr.2007.383008
[14] Bosch, A., Zisserman, A., & Munoz, X. (2007). Image classification using random forests and ferns. 2007 IEEE 11th International Conference on Computer Vision. doi:10.1109/iccv.2007.4409066
[15] Apostoloff, N., & Zisserman, A. (2007). Who are you? - real-time person identification.
[16] Horning, N. (n.d.). Introduction to decision trees and random forests. Retrieved from https://fdocuments.net/document/introduction-to-decision-trees-and-random-introduction-to-decision-trees-and-random.html?page=1
[18] Random Forest for bioinformatics. (n.d.). Carnegie Mellon University. Retrieved July 28, 2022.
[19] Yang, P., Yang, Y. H., Zhou, B. B., & Zomaya, A. Y. (2010). A review of ensemble methods in bioinformatics. Current Bioinformatics, 5(4). doi:10.2174/157489310794072508
[20] Zhao, Y., & Zhang, Y. (2008). Comparison of decision tree methods for finding active objects. Advances in Space Research, 41(12).
[21] Quinlan, J. R. (1993). C4.5: Programs for Machine Learning. San Mateo, CA: Morgan Kaufmann.
[22] C4.5 algorithm. (2022, February 10). Retrieved July 28, 2022, from
http://en.wikipedia.org/wiki/C4.5_algorithm