Download as pptx, pdf, or txt
Download as pptx, pdf, or txt
You are on page 1of 9

Random Forest

BOUALLEG MOHAMED LAMINE

master 2 SI
Random Forest
• Random forest is identified as a collection of
decision trees. Each tree estimates a
classification, and this is called a “vote”. Ideally,
we consider each vote from every tree and
chose the most voted classification (Majority-
Voting).
• Random Forest follow the same bagging
process as the decision trees but each time a
split is to be performed, the search for the split
variable is limited to a random subset of m of
the p attributes (variables or features) aka Split-
Attribute Randomization :
• classification trees: m = √p
• regression trees: m = p/3
• m is commonly referred to as mtry
• Random Forests produce many unique trees.
Bagging vs Random Forest

• Bagging introduces randomness into


the rows of the data.

• Random forest
introduces randomness into the
rows and columns of the data

• Combined, this provides a more


diverse set of trees that almost
always lowers our prediction error.

Split-Attribute Randomization : Prediction Error


Random Forest : Out-of-Bag (OOB) Observations
Random Forest : Tuning
Random Forest - Strengths & Weaknesses
implemantation
 import matplotlib.pyplot as plt
 from sklearn import datasets
 from sklearn.model_selection import train_test_split
 from mlxtend.plotting import plot_decision_regions
 from sklearn.metrics import accuracy_score
 from sklearn.ensemble import RandomForestClassifier
 iris = datasets.load_iris()
 X = iris.data[:, 2:]
 y = iris.target
 X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=1, stratify=y)
 forest = RandomForestClassifier(criterion='gini',
 n_estimators=5,
 random_state=1,
 n_jobs=2)
 forest.fit(X_train, y_train
 y_pred = forest.predict(X_test)
 print('Accuracy: %.3f' % accuracy_score(y_test, y_pred))
 X_combined = np.vstack((X_train, X_test))
 y_combined = np.hstack((y_train, y_test))
 fig, ax = plt.subplots(figsize=(7, 7))
 plot_decision_regions(X_combined, y_combined, clf=forest)
 plt.xlabel('petal length [cm]')
 plt.ylabel('petal width [cm]')
 plt.legend(loc='upper left')

You might also like