Decision Tree and Random Forest
[Institutional Affiliation(s)]
Author Note
Abstract
In this research, we first give a brief background and introduction to the Decision Tree and the Random Forest, and then discuss their advantages, disadvantages, and the differences between the two. We evaluate the classification outcomes of the Decision Tree and the Random Forest on twenty diverse datasets. We used 20 datasets, with 148 to 20,000 instances each, available in the UCI repository [1], and contrasted the classification results produced by the Random Forest and the J48 Decision Tree approaches. We also discuss the advantages and disadvantages of applying these models to large and small datasets. The classification results demonstrate that the Decision Tree performs well on small datasets, whereas the Random Forest performs better for the same number of attributes on large datasets, i.e., those with more instances. The results also show that as the number of instances increased, the percentage of correctly classified instances increased for the Random Forest.
Background
Decision trees date back to the early days of written records. This history illustrates one of the main advantages of trees: highly interpretable outcomes presented in a simple, tree-like form, which improves both comprehension and the communication of results. Decision trees, also known as classification trees or regression trees, have their computational roots in models of biological and mental processes. The complementary growth of statistical decision trees and machine learning trees is driven by this common heritage.
Ho (1995) proposed a technique to overcome the problems and challenges posed by the complexity of decision tree classifiers created using traditional means. Such classifiers are limited in their ability to generalize to new input. The suggested approach makes use of oblique decision trees, which are useful for enhancing training set accuracy. The method's main step is to construct numerous trees in randomly chosen subspaces of the feature space. The combined classification of these trees can be monotonically improved, since the trees generalize their classification in complementary ways.
In 1997, Amit and Geman introduced a method for shape recognition based on the joint induction of shape features and tree classifiers. They concluded that no classifier based on the complete feature set could be evaluated, since it was impossible to know beforehand which characteristics were relevant given the almost unlimited number of features. Standard decision tree construction based on a fixed-length feature vector was not possible due to the quantity and kind of features. An alternative strategy is to create numerous trees while considering only a tiny random sample of attributes at each node, constraining their complexity to grow
with tree depth. Terminal nodes have estimates of the associated posterior distribution across
shape classes. The image may be classified by distributing it downward and aggregating the
output.
In another study, Ho (1998) [2] offered a solution to the conflict between overfitting and achieving maximum accuracy: a classifier that maintained the highest accuracy on training data while increasing generalization accuracy as the classifier's complexity grew. The classifier consists of trees built in randomly selected subspaces, that is, on subsets of the feature vector's components. When evaluated empirically on publicly accessible data sets, the subspace approach demonstrated its superiority over single-tree classifiers and other forest construction techniques. These ideas fed into the Random Forest ensemble method, which integrates already-existing approaches to create a set of decision trees with carefully controlled variation.
Random Forest is an ensemble learning technique for classification and regression. In order to
create a collection of decision trees with controlled variation, Breiman (2001) [3] developed a
technique that combines his bagging sampling methodology (Breiman, 1996a) with the randomly chosen
characteristics provided independently by Ho (1995); Ho (1998); and Amit and Geman (1997).
Each decision tree in the ensemble is created via bagging, using a sample with replacement taken
from the training dataset. According to statistics, the sample is expected to include roughly 64%
of instances at least once. The other cases (about 36%) are referred to as out-of-bag instances,
whereas the examples in the sample are known as in-bag instances. To identify the class label of
an unlabeled instance, each tree in the ensemble serves as a base classifier. The instance is classified by majority voting, which assigns one vote to each classifier's predicted class label; the class label that receives the most votes is assigned to the instance [17]. This is discussed further in the Introduction.
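As a quick illustration of the bagging and voting scheme just described, the following Python sketch (not part of the original study; the class labels and sample size are made up) simulates drawing a bootstrap sample, which typically covers roughly 63-64% of the instances, and taking a majority vote over base classifiers.

import random
from collections import Counter

def bootstrap_sample(n_instances):
    """Draw n_instances indices with replacement (one bagged training set)."""
    return [random.randrange(n_instances) for _ in range(n_instances)]

n = 1000
sample = bootstrap_sample(n)
in_bag = set(sample)
print(f"in-bag fraction:     {len(in_bag) / n:.2f}")   # roughly 0.63-0.64
print(f"out-of-bag fraction: {1 - len(in_bag) / n:.2f}")   # roughly 0.36-0.37

# Majority voting over the predicted class labels of an ensemble:
tree_votes = ["malignant", "benign", "malignant", "malignant", "benign"]
predicted_class, _ = Counter(tree_votes).most_common(1)[0]
print("ensemble prediction:", predicted_class)   # "malignant"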
Introduction
There are several domains where the Decision Tree method [4] is used. It is employed in a variety of applications, including statistical data comparison, text classification, and text extraction. In addition, libraries can use the Decision Tree method to classify books into several
groups according to their type. It may be used in hospitals to diagnose disorders including
tumors, cancer, heart issues, hepatitis, etc. It is used by businesses, hospitals, schools, colleges,
and universities to keep track of their records. It can also be used for statistics in the stock
market.
Decision Tree algorithms are efficient [5] because they offer classification rules that are understandable to humans. They also have certain flaws, one of which is the need to sort all numerical attributes when the tree decides to split a node. Such sorting becomes expensive in terms of running time and memory space, especially when Decision Trees are built on data that is large in size, i.e., data with many instances. Breiman [3] introduced the concept of random forests in 2001. Random forests outperform existing classifiers such as support vector machines and neural networks.
Methods that employ an ensemble of different classifiers and use randomization to provide variety, such as bagging or random subspaces [6, 7], have proven particularly effective. They employ randomization throughout the induction phase to provide diversity and create classifiers that differ from one another. Because of their effectiveness in discriminative classification, Lepetit et al. [9, 10] introduced Random Forests to the computer vision community. Their work in this area served as the basis for studies using Random Forests in areas including class recognition [11, 12], bi-layer video segmentation [13], image classification [14], and person identification [15]. The Random Forest also naturally supports a wide range of visual cues, such as color, shape, texture, and depth, and is therefore regarded as an efficient and versatile classifier.
According to the definition given in [3], a Random Forest is a general concept of classifier combination that makes use of L tree-structured base classifiers {h(X, Θ_n), n = 1, 2, ..., L}, where X stands for the input data and {Θ_n} is a set of independent and identically distributed random vectors. Data are randomly chosen from the available data for each Decision Tree. For instance, a Random Forest may be created by randomly picking a feature subset for each Decision Tree (as in Random Subspaces) or a subset of the training data for each Decision Tree (the concept of Bagging). The features considered at each decision split in a Random Forest are chosen at random. By picking features at random, the correlation across trees is decreased, which increases the overall accuracy of the ensemble.
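To illustrate the effect of randomly chosen feature subsets, here is a hedged scikit-learn sketch (an assumed stand-in, not the WEKA setup used later in this paper; the dataset and parameter values are illustrative only). It compares an out-of-bag accuracy estimate when each split considers all features (bagging only) with one where each split considers a random subset of features.

from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier

X, y = load_breast_cancer(return_X_y=True)

for max_features in (None, "sqrt"):   # None = all features (bagging only)
    forest = RandomForestClassifier(
        n_estimators=200, max_features=max_features,
        oob_score=True, random_state=0)
    forest.fit(X, y)
    print(max_features, "out-of-bag accuracy:", round(forest.oob_score_, 3))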
In addition to maintaining the benefits of Decision Trees, the Random Forest frequently outperforms Decision Trees thanks to its use of random subsets of variables, bagging on samples as previously mentioned, its voting system [17], and its decision-making process. The Random Forest can accommodate missing values and can handle continuous, categorical, and binary data, making it suitable for high-dimensional data modelling. There is no need to prune the trees, because the bootstrapping and ensemble scheme make the Random Forest robust to overfitting across many dataset types [18], in addition to giving it excellent prediction accuracy. Compared to other prominent machine learning techniques, the Random Forest offers a special combination of model interpretability and prediction accuracy. Because ensemble techniques and random sampling are used, accurate predictions and superior generalization are achieved.
The generalization of the bagging method improves because it lowers variance and thus the overall generalization error, whereas the boosting strategy accomplishes a reduction in bias [19].
• A trained model can calculate the relative importance of each characteristic and determine which features contribute most to its predictions, as sketched in the example below.
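As a rough illustration of this point, the following sketch (assuming scikit-learn rather than the paper's WEKA tooling, with an illustrative dataset) reads the relative feature importances from a trained Random Forest.

from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier

data = load_breast_cancer()
forest = RandomForestClassifier(n_estimators=100, random_state=0)
forest.fit(data.data, data.target)

# Rank features by their estimated contribution to the model's splits.
ranked = sorted(zip(data.feature_names, forest.feature_importances_),
                key=lambda pair: pair[1], reverse=True)
for name, importance in ranked[:5]:
    print(f"{name}: {importance:.3f}")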
The classification performance of the Decision Tree (J48) and the Random Forest for big and small datasets is discussed later in the paper. The goal of this comparison is to provide a baseline that will be helpful in classification scenarios. Additionally, it will aid in choosing the right model.
ADVANTAGES OF THE DECISION TREE:
1. Preparing the data for decision trees during pre-processing is easier than it is for other algorithms.
4. Additionally, the process of creating a decision tree is not materially impacted by missing values in the data.
5. Technical teams and stakeholders can easily understand a decision tree model since it is so straightforward.
DISADVANTAGES OF THE DECISION TREE:
1. An unstable decision tree might result from a small change in the data that has a large impact on the tree's structure.
2. Compared to other algorithms, a decision tree's calculations may become far more complicated.
4. Because of the intricacy and length of time required, decision tree training is relatively costly.
5. Regression and the prediction of continuous values cannot be done well with a decision tree.
ADVANTAGES OF THE RANDOM FOREST:
1. It provides variable importance, which assists in identifying the variables that have a favorable influence.
2. Overfitting is a common problem with machine learning models; the random forest reduces this risk by averaging the predictions of many trees.
5. When a class in the data is less frequent than the other classes, it can automatically help balance the resulting error, which is useful for imbalanced datasets.
6. The approach is appropriate for challenging jobs since it handles many variables quickly.
DISADVANTAGES OF THE RANDOM FOREST:
1. The biggest drawback of the random forest is that it might become too sluggish for real-time prediction when a large number of trees is used.
Decision trees are graphs that show all potential outcomes of a decision using a branching technique, which is a key distinction between them and the random forest algorithm. The random forest method, in contrast, produces a series of decision trees whose outputs are combined.
Because the random forest approach is so accurate, and because modern computers and systems can often handle big, previously unmanageable datasets, machine learning engineers and data scientists make wide use of it. A drawback of the random forest algorithm is that the final model is hard to inspect, and if your computer's processing capacity is insufficient or your dataset is excessively large, the models can take a long time to create.
A basic decision tree has the advantage of being simple to understand. Because we know which variable, and which value of that variable, is used to split the data while the decision tree is being built, we can immediately see how a prediction is reached. The models created by the random forest method, on the other hand, are more complex, since they combine many decision trees. When creating a random forest model, we must decide how many trees to generate and how many variables to consider at each node. In general, adding more trees will increase performance and predictability while decreasing calculation speed. For regression problems, the final prediction is the average of the predictions of all the trees: each tree predicts the mean of the training samples in its target leaf, and the forest then averages over all trees. In contrast to linear regression, it relies on previously observed values, so its estimates are bounded by the range seen in the training data.
More trees are needed for more precise predictions, which slows the model down. If there were a technique to build several trees and average their responses, you would most likely obtain an answer very near the correct one. In this section, we examined the distinctions between the decision tree and the random forest algorithms. A decision tree is a branching structure that displays information about all conceivable outcomes. The random forest method, in contrast, combines the choices of many decision trees based on their outputs. A decision tree's key benefit is that it can swiftly adapt to the dataset and that the final model can be inspected and interpreted easily.
Decision Tree
Decision trees use mathematics during the learning process. To begin, we need to identify a tree structure and decision rules for each node using a dataset D = {X, y}. Each node divides the dataset into two or more disjoint subsets, each denoted D^(l,i), where l stands for the layer number and i for the subset number. If every label in a subset belongs to the same class, the subset is said to be PURE, the node is labeled as a leaf node, and this branch of the tree terminates. If not, the separation criterion is applied once again.
In some more complicated datasets, reaching a stage in which every leaf node is pure may require extremely deep decision trees, which leads to overfitting of the dataset. Because of this, we often stop before we arrive at pure nodes and instead develop more sophisticated termination strategies, in which the nodes may be impure but the error is accounted for and measured. A tree is called MONOTHETIC if only one feature is considered at each node and POLYTHETIC if more than one is taken into consideration. Simpler trees are typically preferred since they are easier to read and use. Most programming languages allow you to constrain a decision tree to produce either monothetic or polythetic trees, although it is nearly always better to start with the simpler option and only expand in complexity if absolutely necessary.
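A minimal sketch of such early stopping, assuming scikit-learn and an illustrative dataset (the paper itself uses WEKA's J48): limiting depth and leaf size accepts impure leaves in exchange for less overfitting.

from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Stop splitting early instead of growing until every leaf is pure.
tree = DecisionTreeClassifier(max_depth=4, min_samples_leaf=10, random_state=0)
tree.fit(X_train, y_train)
print("train accuracy:", round(tree.score(X_train, y_train), 3))
print("test accuracy: ", round(tree.score(X_test, y_test), 3))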
Impurity Measures
Entropy impurity, or information impurity, is calculated using the formula below:
Equation 1
Entropy(D) = -\sum_{i=1}^{c} p_i \log_2 p_i
where p_i is the fraction of instances at node D that belong to class i and c is the number of classes.
This equation basically tells us how predictable each node in our tree is. In the end, we want our nodes to be predictable, and we achieve this by making sure a node contains a sizable proportion of a single class.
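For concreteness, a small helper matching Equation 1 might look as follows (illustrative Python, not from the paper):

import math
from collections import Counter

def entropy(labels):
    """Entropy impurity: -sum_i p_i * log2(p_i) over the classes at a node."""
    counts = Counter(labels)
    total = len(labels)
    return sum(-(c / total) * math.log2(c / total) for c in counts.values())

print(entropy(["a", "a", "a", "a"]))   # 0.0 -> pure node
print(entropy(["a", "a", "b", "b"]))   # 1.0 -> maximally impure for two classes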
Random Forest
Gini Index
The Gini index is a method for splitting data; it evaluates the impurity or purity of the data and is used in CART (Classification and Regression Tree) algorithms such as the Decision Tree. It is calculated as:
Equation 3
Gini(D) = 1 - \sum_{i=1}^{c} p_i^2
where p_i is again the fraction of instances at node D belonging to class i.
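An analogous helper for Equation 3 (again an illustrative sketch, not from the paper):

from collections import Counter

def gini(labels):
    """Gini index: 1 - sum_i p_i^2 over the classes at a node."""
    counts = Counter(labels)
    total = len(labels)
    return 1.0 - sum((c / total) ** 2 for c in counts.values())

print(gini(["a", "a", "a", "a"]))   # 0.0 -> pure node
print(gini(["a", "a", "b", "b"]))   # 0.5 -> maximally impure for two classes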
Information Gain
Information gain is calculated from the entropy of the data set and the entropy of each attribute, and it tells us how much information an attribute provides about the class:
Equation 4
Gain(D, A) = Entropy(D) - \sum_{v \in Values(A)} (|D_v| / |D|) Entropy(D_v)
where D_v is the subset of D for which attribute A takes the value v.
Entropy measures how much unpredictability or impurity there is in the provided data. Information gain is used to select the splitting attribute at each node, starting from the root of the decision tree.
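A sketch of Equation 4, computing the gain of a candidate split from the labels it produces (illustrative Python; the split shown is made up):

import math
from collections import Counter

def entropy(labels):
    counts, total = Counter(labels), len(labels)
    return sum(-(c / total) * math.log2(c / total) for c in counts.values())

def information_gain(parent, children):
    """parent: list of labels; children: list of label lists after the split."""
    n = len(parent)
    weighted_child_entropy = sum(len(c) / n * entropy(c) for c in children)
    return entropy(parent) - weighted_child_entropy

parent = ["yes"] * 5 + ["no"] * 5
split = [["yes"] * 4 + ["no"], ["yes"] + ["no"] * 4]   # a candidate split
print(round(information_gain(parent, split), 3))        # 0.278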
Regression Problems
The Random Forest algorithm is also used to solve regression problems, where the mean squared error (MSE) is used to evaluate how the data branches from each node:
Equation 5
MSE = (1/N) \sum_{i=1}^{N} (y_i - \hat{y}_i)^2
where y_i is the observed value, \hat{y}_i is the value predicted by the model, and N is the number of data points.
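As an illustration of the regression case, the following sketch (assuming scikit-learn and an illustrative dataset, not the paper's setup) averages the trees' predictions and reports the mean squared error of Equation 5.

from sklearn.datasets import load_diabetes
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

forest = RandomForestRegressor(n_estimators=200, random_state=0).fit(X_train, y_train)
predictions = forest.predict(X_test)   # average over all trees in the forest
print("test MSE:", round(mean_squared_error(y_test, predictions), 1))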
A Decision Tree represents a supervised classification strategy [20]. The concept was inspired by the typical tree structure, which consists of a root, nodes (locations where branches divide), branches, and leaves. Similarly, a Decision Tree is built from nodes, represented by circles, and from the segments that link the nodes, which represent the branches. A decision tree is typically drawn from left to right or from the root downward. The node from which the tree begins is the root node; the node at which a chain comes to an end is a "leaf" node. Each internal node, i.e., a node that is not a leaf node, can extend two or more branches. A node represents a specific attribute, while the branches indicate ranges of values. These value ranges serve as dividing lines for the set of values of the specified attribute. The tree structure is shown in Figure 1.
The values of the attributes of the provided data are used to group the data in the Decision Tree. A Decision Tree is created from pre-classified data. The attributes that split the data into the most appropriate classes are chosen for classification, and the data items are divided according to the values of these attributes. This technique is applied recursively to every divided subset of the data items. As soon as every data item in the current subset belongs to the same class, the node becomes a leaf and the recursion stops for that branch.
We employ WEKA's J48 implementation of Decision Trees (open-source software). We can examine data in WEKA, which also implements techniques for regression, data pre-processing, clustering, classification, and visualization. WEKA provides more than sixty algorithms for these tasks.
REPTree
REPTree builds a tree using information gain, and its pruning method is reduced-error pruning. For numeric attributes, it only sorts values once. Like C4.5, it uses the approach of fractional instances to manage missing values.
Random Tree
Trees built by considering a set of randomly chosen attributes at each node are combined to create a random tree. In this context, the term "at random" refers to a group of trees in which each tree has an equal probability of being sampled; in other words, the distribution of trees is uniform. Random trees can be produced efficiently, and combining several such random trees typically results in accurate models. There has been substantial study of random trees in the field of machine learning.
J48
The C4.5 algorithm, created by Ross Quinlan [21], is used to produce Decision Trees. In the WEKA data mining tool, decision trees are generated using J48, an open-source Java implementation of the C4.5 release [22]. This is a typical Decision Tree algorithm. Decision Tree induction is one of the classification techniques used in data mining. From the pre-classified data set, a model is inductively trained using the classification algorithm; the values of the attributes or features characterize each data item.
Random Forests
Random Forest, created by Leo Breiman [3], is a collection of unpruned classification or regression trees built from random samples of the training data. The features considered during the induction procedure are chosen at random. The predictions of the ensemble are combined by majority vote for classification and by averaging for regression. Each tree is grown as follows (see the sketch after the list below):
• If the number of cases in the training set is N, sample N cases at random, but with replacement. This sample will be the training set for growing the tree.
• A number m << M is specified such that, for M input variables, m variables are selected at random out of the M at each node, and the best split on these m is used to divide the node. The value of m is held constant while the forest is grown.
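The growing procedure above can be sketched directly. The following Python code (an illustrative approximation, not the paper's WEKA implementation) draws a bootstrap sample for each tree, lets each tree consider a random subset of features at every split, and combines the trees by majority vote; the dataset and tree count are assumptions.

import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

trees = []
for _ in range(50):
    # Sample N cases with replacement: the in-bag training set for this tree.
    idx = rng.integers(0, len(X_train), size=len(X_train))
    # max_features="sqrt": a fresh random subset of m features at every split.
    tree = DecisionTreeClassifier(max_features="sqrt",
                                  random_state=int(rng.integers(1_000_000)))
    trees.append(tree.fit(X_train[idx], y_train[idx]))

# Majority vote across the ensemble.
votes = np.stack([t.predict(X_test) for t in trees])   # shape (n_trees, n_test)
majority = np.apply_along_axis(lambda col: np.bincount(col).argmax(), 0, votes)
print("ensemble accuracy:", round((majority == y_test).mean(), 3))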
Generally speaking, Random Forest performs significantly better than single-tree classifiers such as C4.5. Its generalization error rate compares favorably to AdaBoost's, and it is more robust to noise.
The classification performance of the Decision Tree (J48) and the Random Forest on big and small datasets is the main focus of this section. The goal of this comparison is to provide a baseline that will be helpful in classification scenarios. Additionally, it will aid in choosing the right model.
Data Sets
We used datasets from the UCI Machine Learning repository for the classification experiments [1]. Some features in the breast cancer data are linear, whereas a few are nominal. Each dataset's full description, properties, and source can be found in the UCI repository. The twenty datasets we utilized for our research and comparison are listed in Table 1, along with their names, numbers of instances, and numbers of attributes. The distribution of data variables in the sampled data sets is shown in Figures 2 and 3. Figure 2 displays the Lymphography dataset, which has 148 instances, 19 attributes, and four classes. Figure 3 depicts the Sonar dataset, which has 208 instances.
Table 1: Datasets
The J48 and the Random Forest employ distinct parameter settings and variables. Binary splits indicates whether binary splits are used when constructing the tree. The confidence factor controls tree pruning; lower values mean heavier pruning. If the debug option is set to true, additional information is printed to the console. When reduced-error pruning is employed, a seed is used to randomize the data. unpruned shows whether or not pruning is applied. minNumObj is the minimum number of instances per leaf. Another option controls whether to preserve the instance data for visualization. numFolds determines how much data is used for pruning. Reduced-error pruning may or may not be used in place of C4.5 pruning. When pruning is done, sub-tree raising is employed. useLaplace determines whether counts at leaves are smoothed based on Laplace. maxDepth gives the trees' maximum depth, where zero represents unlimited depth. numFeatures is the number of features used during random selection, and numTrees is the number of trees to be created. Finally, a random number is used as the seed value.
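For readers more familiar with scikit-learn than WEKA, the following sketch maps some of the parameters above onto rough scikit-learn equivalents; the mapping and the chosen values are assumptions, not the settings used in this study.

from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier

# Rough J48-style settings (assumed analogues; values are illustrative).
tree = DecisionTreeClassifier(
    min_samples_leaf=2,    # roughly minNumObj: minimum instances per leaf
    ccp_alpha=0.01,        # pruning strength (note: here larger = more pruning)
    random_state=1,        # roughly the seed
)

# Rough Random Forest settings (assumed analogues).
forest = RandomForestClassifier(
    n_estimators=100,      # roughly numTrees: number of trees to grow
    max_features="sqrt",   # roughly numFeatures: features tried at each split
    max_depth=None,        # roughly maxDepth: None means unlimited depth
    random_state=1,        # roughly the seed
)
print(tree, forest, sep="\n")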
We contrasted the Decision Tree and Random Forest classification outcomes. By employing 10-fold cross-validation, which repeats the procedure ten times using 9/10 of the data for training the algorithm and the remaining data for testing, we were able to reduce the over-fitting issue. Examples that were correctly and incorrectly classified by the Random Forest and the Decision Tree (J48) are presented in Table 2, alongside the name of the related dataset and its numbers of instances and attributes. The classification results demonstrate that the Decision Tree performs well on small datasets, i.e., those with fewer examples, whereas the Random Forest performs better for the same number of attributes on large datasets, i.e., those with more instances. According to the findings from the breast cancer data sets, as the number of instances rose from 286 to 699, the percentage of instances correctly classified by the Random Forest increased.
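The evaluation protocol can be sketched as follows (assuming scikit-learn stand-ins for WEKA's J48 and Random Forest and an illustrative UCI-style dataset; this is not a reproduction of the paper's experiments).

from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

models = {
    "decision tree": DecisionTreeClassifier(random_state=1),
    "random forest": RandomForestClassifier(n_estimators=100, random_state=1),
}
for name, model in models.items():
    # 10-fold cross-validation: train on 9/10 of the data, test on the rest.
    scores = cross_val_score(model, X, y, cv=10)
    print(f"{name}: mean accuracy = {scores.mean():.3f}")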
References
[1] Asuncion, A., & Newman, D. (2007). UCI Machine Learning Repository. Retrieved July 26, 2022, from https://archive.ics.uci.edu/ml/index.php
[2] Ho, T. K. (1998). The random subspace method for constructing decision forests. IEEE Transactions on Pattern Analysis and Machine Intelligence, 20(8), 832-844. doi:10.1109/34.709601
[3] Breiman, L. (2001). Random forests. Machine Learning, 45(1), 5-32.
[5] Ben-Haim, Y., & Tom-Tov, E. (2010). A streaming parallel decision tree algorithm. Journal of Machine Learning Research, 11.
[7] Ho, T. K. (1998). The random subspace method for constructing decision forests. IEEE Transactions on Pattern Analysis and Machine Intelligence, 20(8), 832-844. doi:10.1109/34.709601
[8] Amit, Y., & Geman, D. (1997). Shape quantization and recognition with randomized trees. Neural Computation, 9(7), 1545-1588.
[9] Lepetit, V., & Fua, P. (2006). Keypoint recognition using randomized trees. IEEE Transactions on Pattern Analysis and Machine Intelligence. doi:10.1109/tpami.2006.188
[10] Ozuysal, M., Fua, P., & Lepetit, V. (2007). Fast keypoint recognition in ten lines of code. 2007 IEEE Conference on Computer Vision and Pattern Recognition. doi:10.1109/cvpr.2007.383123
[11] Winn, J., & Criminisi, A. (2006). Object class recognition at a glance. 2006 IEEE Conference on Computer Vision and Pattern Recognition (video track).
[12] Shotton, J., Johnson, M., & Cipolla, R. (2008). Semantic Texton forests for image
categorization and segmentation. 2008 IEEE Conference on Computer Vision and Pattern
Recognition. doi:10.1109/cvpr.2008.4587503
[13] Yin, P., Criminisi, A., Winn, J., & Essa, I. (2007). Tree-based classifiers for bilayer video segmentation. 2007 IEEE Conference on Computer Vision and Pattern Recognition. doi:10.1109/cvpr.2007.383008
[14] Bosch, A., Zisserman, A., & Munoz, X. (2007). Image classification using random forests and ferns. 2007 IEEE 11th International Conference on Computer Vision. doi:10.1109/iccv.2007.4409066
[15] Apostoloff, N., & Zisserman, A. (2007). Who are you? - real-time person identification.
[16] Horning, N. (n.d.). Introduction to decision trees and random forests. Retrieved from https://fdocuments.net/document/introduction-to-decision-trees-and-random-introduction-to-decision-trees-and-random.html?page=1
[18] Random Forest for bioinformatics. (n.d.). Carnegie Mellon University. Retrieved July 28, 2022.
[19] Yang, P., Yang, Y. H., Zhou, B. B., & Zomaya, A. Y. (2010). A review of ensemble methods in bioinformatics. Current Bioinformatics, 5(4). doi:10.2174/157489310794072508
[20] Zhao, Y., & Zhang, Y. (2008). Comparison of decision tree methods for finding active objects. Advances in Space Research, 41(12).
[21] Quinlan, J. R. (1993). C4.5: Programs for Machine Learning. San Mateo, CA: Morgan Kaufmann.
[22] C4.5 algorithm. (2022, February 10). Retrieved July 28, 2022, from
http://en.wikipedia.org/wiki/C4.5_algorithm