
MANAGEMENT INFORMATION SYSTEMS

DATA MINING AND BUSINESS INTELLIGENCE

Week 6 – Random Forest


PROF. DR. GÖKHAN SILAHTAROĞLU
Lecturer: NADA MISK
ENSEMBLE LEARNING &
RANDOM FOREST

 Ensemble learning combines a series of learned models with the aim of creating an improved composite classification model (a minimal voting sketch follows below).

 Random forests are one such ensemble learning method.

 One of the drawbacks of learning with a single tree is the overfitting (memorization) problem.
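As an illustration of combining several learned models, here is a minimal sketch, assuming scikit-learn is available; the dataset and the three base models are our choices for illustration, not part of the slides.

# Combine three different learned models by majority ("hard") voting.
from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=10, random_state=0)

ensemble = VotingClassifier(
    estimators=[
        ("tree", DecisionTreeClassifier(random_state=0)),
        ("logreg", LogisticRegression(max_iter=1000)),
        ("knn", KNeighborsClassifier()),
    ],
    voting="hard",  # each model casts one vote; the majority class wins
)
ensemble.fit(X, y)
print(ensemble.predict(X[:5]))  # composite prediction from three models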

RANDOM FOREST

 It runs several decision trees at the same time and averages (or votes over) their outputs, as sketched below.

 This significantly improves prediction quality compared to a single decision tree.

 It reduces sensitivity to noise.
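A small sketch of this aggregation idea, assuming NumPy and scikit-learn; the bootstrap loop below is our illustration of the averaging principle, not code from the slides.

# Train several trees on bootstrap samples and average their predictions
# (a regression target, so the forest output is the mean of the trees).
import numpy as np
from sklearn.datasets import make_regression
from sklearn.tree import DecisionTreeRegressor

X, y = make_regression(n_samples=300, n_features=8, noise=10.0, random_state=0)
rng = np.random.default_rng(0)

trees = []
for _ in range(25):                        # 25 trees in the "forest"
    idx = rng.integers(0, len(X), len(X))  # bootstrap sample (with replacement)
    trees.append(DecisionTreeRegressor().fit(X[idx], y[idx]))

# The forest's prediction is the mean of the individual tree predictions.
forest_pred = np.mean([t.predict(X[:5]) for t in trees], axis=0)
print(forest_pred)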


RANDOM FOREST - EXAMPLE

• You received a job offer from two different companies at the same time and……
WHY RANDOM FOREST ALGORITHM?

• The algorithm can be used for both classification and regression tasks (a short sketch of both follows this list).

• Overfitting is a critical issue that adversely affects results, but for the Random Forest algorithm the probability of an overfitting problem is reduced if there are enough trees in the forest.

• The Random Forest classifier can handle missing values, and it can also model categorical values.
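A short sketch of both task types, assuming scikit-learn; the synthetic datasets are for illustration only. The n_estimators parameter sets the number of trees in the forest.

from sklearn.datasets import make_classification, make_regression
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor

# Classification: the forest votes on a class label.
Xc, yc = make_classification(n_samples=400, random_state=0)
clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(Xc, yc)
print("class prediction:", clf.predict(Xc[:3]))

# Regression: the forest averages the trees' numeric predictions.
Xr, yr = make_regression(n_samples=400, noise=5.0, random_state=0)
reg = RandomForestRegressor(n_estimators=200, random_state=0).fit(Xr, yr)
print("regression prediction:", reg.predict(Xr[:3]))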


RANDOM FOREST ALGORITHM

▪ First, the Random Forest algorithm is a supervised learning algorithm.

▪ The algorithm builds a forest of randomly constructed trees.

▪ There is a direct relationship between the number of trees in the algorithm and the quality of the result: as the number of trees increases, the predictions become more stable and accurate.

▪ The difference between the Random Forest algorithm and the Decision Tree algorithm is that in a Random Forest the root node and the node splits are chosen from randomly selected subsets of features (see the sketch after this list).
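In scikit-learn (assumed here), this random feature selection is exposed through the max_features parameter; the sketch below shows that different trees in the same forest can end up with different root split variables.

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=400, n_features=16, random_state=0)

# "sqrt" means each split examines sqrt(16) = 4 randomly chosen features,
# so different trees may pick different root nodes and split variables.
forest = RandomForestClassifier(n_estimators=100, max_features="sqrt",
                                random_state=0).fit(X, y)
print(forest.estimators_[0].tree_.feature[0],   # root feature of tree 0
      forest.estimators_[1].tree_.feature[0])   # root feature of tree 1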
RANDOM FOREST ALGORITHM
CALCULATING THE SIGNIFICANCE OF VARIABLES

• After training a random forest, it is natural to ask which variables have the most predictive power. Variables with high importance are drivers of the outcome, and their values have a significant impact on the outcome values.

• By contrast, variables with low importance might be omitted from a model, making it simpler and faster to fit and predict (a short sketch of reading importances follows).
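A short sketch of reading variable importance after training, assuming scikit-learn; the Iris dataset is used only for illustration. The feature_importances_ scores sum to 1 across all features.

from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

data = load_iris()
forest = RandomForestClassifier(n_estimators=200, random_state=0)
forest.fit(data.data, data.target)

# List the variables from most to least important.
for name, score in sorted(zip(data.feature_names, forest.feature_importances_),
                          key=lambda p: p[1], reverse=True):
    print(f"{name}: {score:.3f}")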

RANDOM FOREST ALGORITHM
RANDOM FOREST PRINCIPLE

There are two stages in the Random Forest algorithm: the first is to generate the random forest, and the second is to make predictions with the Random Forest classifier created in the first stage.

Generating Random Forest pseudo code (a Python sketch follows the steps):

1- Select “K” features randomly from the total “M” features (where K < M).

2- Calculate node “d” using the best split point among the “K” features.

3- Split the node into child nodes using the best split.

4- Repeat steps 1 to 3 until “L” nodes have been created.

5- Repeat steps 1 to 4 “n” times to create a forest of “n” trees.
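A minimal from-scratch sketch of the generation stage, assuming NumPy and scikit-learn's DecisionTreeClassifier as the base learner; the names grow_forest, n_trees and k_features are ours, and we also add bootstrap sampling, which standard random forests use even though the steps above do not spell it out.

import numpy as np
from sklearn.tree import DecisionTreeClassifier

def grow_forest(X, y, n_trees=10, k_features=None, max_leaf_nodes=None, seed=0):
    rng = np.random.default_rng(seed)
    m = X.shape[1]
    k = k_features or max(1, int(np.sqrt(m)))   # K < M features per split
    forest = []
    for _ in range(n_trees):                    # step 5: grow n trees
        idx = rng.integers(0, len(X), len(X))   # bootstrap sample
        tree = DecisionTreeClassifier(
            max_features=k,                     # step 1: random K features
            max_leaf_nodes=max_leaf_nodes,      # step 4: stop at L nodes
            random_state=rng.integers(1 << 30),
        ).fit(X[idx], y[idx])                   # steps 2-3: best splits
        forest.append(tree)
    return forest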


RANDOM FOREST PRINCIPLE

In the next stage, once the Random Forest classifier has been created, we make predictions (a voting sketch follows the steps):

1- Take the test features and use the rules of each randomly generated decision tree to predict a result, storing each predicted result (target).

2- Calculate the votes for each predicted target.

3- Select the target with the highest number of votes as the final prediction of the Random Forest algorithm.
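A sketch of the voting stage, pairing with the grow_forest sketch above; forest_predict is our illustrative name, and class labels are assumed to be non-negative integers.

import numpy as np

def forest_predict(forest, X_test):
    # Step 1: every tree predicts a target for each test sample.
    votes = np.array([tree.predict(X_test) for tree in forest])
    # Steps 2-3: count the votes per sample and keep the majority class.
    return np.array([np.bincount(col.astype(int)).argmax() for col in votes.T])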
RANDOM FOREST ALGORITHM ADVANTAGES

• Little hyperparameter tuning or feature selection is required.

• The possibility of overfitting is lower than with a single tree.

• They are less sensitive to noise.

• They give better results on numerical and categorical variables than single trees.

RANDOM FOREST ALGORITHM DISADVANTAGES

• Although random forests are superior to single decision trees, their prediction accuracy on complex problems is generally lower than that of gradient-boosted trees.

• The resulting model of a random forest is more difficult to interpret than a single decision tree.

• Random forests can require significant memory for storage, as dozens of tree models need to be maintained individually.

