Week 6 - Random Forest
RANDOM FOREST
Random Forest is an ensemble method that grows many decision trees in parallel and combines their outputs: the majority vote for classification, or the mean of the individual tree predictions for regression.
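As a minimal sketch of this idea with scikit-learn (the synthetic dataset and hyperparameters below are illustrative, not from the lecture):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic classification data, just for demonstration.
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# n_estimators controls how many decision trees are grown;
# their individual predictions are combined by majority vote.
forest = RandomForestClassifier(n_estimators=100, random_state=0)
forest.fit(X_train, y_train)
print("test accuracy:", forest.score(X_test, y_test))
```

For regression tasks, `RandomForestRegressor` works the same way but averages the tree outputs instead of voting.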
WHY RANDOM FOREST ALGORITHM?
• Overfitting is a critical issue that adversely affects results, but with the Random Forest algorithm the probability of overfitting decreases as long as there are enough trees in the forest.
• The Random Forest classifier can handle missing values.
• There is a direct relationship between the number of trees in the algorithm and the result it can achieve: as the number of trees increases, the results generally become more accurate, at a higher computational cost.
• The difference between the Random Forest algorithm and the Decision Tree algorithm is that the processes of finding the root node and splitting the nodes run randomly in the Random Forest.
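The reduced-overfitting claim above can be illustrated by comparing a single unpruned tree with a forest on the same data (a hedged sketch; the dataset and seeds are invented for the demo):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=600, n_features=20, n_informative=5,
                           random_state=1)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=1)

# A lone unpruned tree memorizes the training set (train accuracy 1.0),
# while the averaged forest tends to generalize better to the test set.
tree = DecisionTreeClassifier(random_state=1).fit(X_tr, y_tr)
forest = RandomForestClassifier(n_estimators=200, random_state=1).fit(X_tr, y_tr)

print("tree   train/test:", tree.score(X_tr, y_tr), tree.score(X_te, y_te))
print("forest train/test:", forest.score(X_tr, y_tr), forest.score(X_te, y_te))
```

The gap between train and test accuracy of the single tree is the overfitting that averaging many randomized trees smooths out.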
RANDOM FOREST ALGORITHM
CALCULATING THE SIGNIFICANCE OF VARIABLES
• After training a random forest, it is natural to ask which variables have the most
predictive power. Variables with high importance are drivers of the outcome and
their values have a significant impact on the outcome values.
• By contrast, variables with low importance can be omitted from the model, making it simpler and faster to fit and predict.
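In scikit-learn, impurity-based importances are exposed after fitting via the `feature_importances_` attribute (the data below is synthetic, purely for illustration):

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

# Five features, only two of which actually drive the target.
X, y = make_regression(n_samples=300, n_features=5, n_informative=2,
                       random_state=0)
forest = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)

# Importances sum to 1; larger values mark stronger drivers of the outcome.
for i, imp in enumerate(forest.feature_importances_):
    print(f"feature {i}: {imp:.3f}")
```

Low-importance features identified this way are candidates for removal before refitting a simpler model.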
RANDOM FOREST PRINCIPLE
There are two stages in the Random Forest algorithm: one is to generate the Random Forest, and the other is to make predictions with it.
1- Select K features randomly from the total M features (where K < M).
2- Among the K features, calculate the node d using the best split point.
3- Split the node into child nodes using the best split.
Once the Random Forest classifier has been created, we make predictions:
1- Take the test features and use the rules of each randomly generated decision tree to predict an outcome, then store the predicted outcomes (votes).
2- Count the votes for each predicted target.
3- The prediction with the highest number of votes is selected as the final prediction of the Random Forest algorithm.
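The two stages above can be sketched from scratch: each tree is grown on a bootstrap sample with `max_features="sqrt"` restricting every split to a random subset of K < M features, and test predictions are combined by majority vote (all names and numbers here are illustrative):

```python
import numpy as np
from collections import Counter
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=400, n_features=12, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Stage 1: grow each tree on a bootstrap sample; max_features="sqrt"
# picks a random subset of K < M candidate features at every split.
trees = []
for _ in range(25):
    idx = rng.integers(0, len(X_tr), size=len(X_tr))
    t = DecisionTreeClassifier(max_features="sqrt", random_state=0)
    trees.append(t.fit(X_tr[idx], y_tr[idx]))

# Stage 2: every tree votes on each test sample; the majority wins.
votes = np.array([t.predict(X_te) for t in trees])
y_pred = np.array([Counter(col).most_common(1)[0][0] for col in votes.T])
print("accuracy:", (y_pred == y_te).mean())
```

`RandomForestClassifier` performs exactly this bootstrap-plus-voting procedure internally, with additional optimizations.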
RANDOM FOREST ALGORITHM ADVANTAGES
• They handle both numerical and categorical variables, and generally give better results than single trees.
RANDOM FOREST ALGORITHM DISADVANTAGES
• Although random forests are superior to single decision trees, their prediction accuracy on complex problems is generally lower than that of gradient-boosted trees.
• The resulting model is more difficult to interpret than a single decision tree.
• Random forest can require significant memory for storage, as dozens of tree models
need to be maintained individually.