Bagged Decision Tree
Bagging ensemble methods work well with algorithms that have high variance, and
the decision tree is a prime example of such an algorithm. In the following Python
recipe, we build a bagged decision tree ensemble model by using the
BaggingClassifier class of sklearn with DecisionTreeClassifier (a classification
and regression trees, or CART, algorithm) on the Pima Indians diabetes dataset.
First, import the required packages as follows −
from pandas import read_csv
from sklearn.model_selection import KFold
from sklearn.model_selection import cross_val_score
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier
Now, we need to load the Pima diabetes dataset as we did in the previous
examples −
path = "pima-indians-diabetes.csv"
headernames = ['preg', 'plas', 'pres', 'skin', 'test', 'mass',
'pedi', 'age', 'class']
data = read_csv(path, names=headernames)
array = data.values
X = array[:,0:8]
Y = array[:,8]
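To see what the two slices above select, here is a minimal sketch that runs without the CSV file. The two rows are made-up values in the same nine-column layout, not records from the dataset, and the in-memory StringIO source is only a stand-in for the file path.

```python
from io import StringIO

from pandas import read_csv

# Two made-up rows (illustrative values only, not taken from the dataset).
csv_text = "2,120,70,30,0,30.5,0.5,40,1\n1,90,60,25,0,25.0,0.3,30,0\n"
headernames = ['preg', 'plas', 'pres', 'skin', 'test', 'mass',
               'pedi', 'age', 'class']

data = read_csv(StringIO(csv_text), names=headernames)
array = data.values

X = array[:, 0:8]   # columns 0 to 7: the eight input features
Y = array[:, 8]     # column 8: the class label

print(X.shape)  # (2, 8)
print(Y)        # [1. 0.]
```

Because the columns mix integers and floats, data.values returns a single float array, which is why the labels print as 1. and 0.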
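We need to provide the number of trees we are going to build; here we are building
150 trees. The later steps also need a random seed, a k-fold cross-validation
splitter and the base decision tree; the seed value and the 10 folds below are
assumed values −
seed = 7  # assumed value
kfold = KFold(n_splits=10, shuffle=True, random_state=seed)  # 10 folds assumed
cart = DecisionTreeClassifier()
num_trees = 150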
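Next, build the model with the help of the following script −
# The base estimator is passed positionally; its keyword is `estimator` in
# scikit-learn 1.2 and later (formerly `base_estimator`).
model = BaggingClassifier(cart, n_estimators=num_trees, random_state=seed)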
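To make the idea behind the model concrete, the following is a rough hand-rolled sketch of bagging, not scikit-learn's actual implementation. It assumes synthetic stand-in data from make_classification, 25 trees, and plain majority voting; BaggingClassifier itself averages predicted probabilities when the base estimator provides them. The names X_demo, y_demo, trees and bagged_pred are hypothetical.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (an assumption; this is not the Pima dataset).
X_demo, y_demo = make_classification(n_samples=300, n_features=8, random_state=0)

rng = np.random.default_rng(0)
n_samples = X_demo.shape[0]
trees = []
for _ in range(25):
    # Bootstrap sample: draw n_samples row indices with replacement.
    idx = rng.integers(0, n_samples, size=n_samples)
    tree = DecisionTreeClassifier(random_state=0)
    trees.append(tree.fit(X_demo[idx], y_demo[idx]))

# Each tree votes and the majority class wins (the labels here are 0 and 1).
votes = np.stack([t.predict(X_demo) for t in trees])   # shape (25, 300)
bagged_pred = (votes.mean(axis=0) >= 0.5).astype(int)

# Accuracy of the hand-rolled ensemble on its own training data.
print((bagged_pred == y_demo).mean())
```

Each tree sees a slightly different resampled dataset, so the individual trees disagree on some points; combining their votes is what reduces the variance of the final prediction.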
Calculate and print the result as follows −
results = cross_val_score(model, X, Y, cv=kfold)
print(results.mean())
Output
0.7733766233766234
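The output above shows that our bagged decision tree classifier model reached
around 77% mean accuracy across the cross-validation folds. The exact value can
vary with the random seed and the fold settings.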
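As a complement to the recipe, the sketch below compares a single decision tree with a bagged ensemble under the same cross-validation. It is an illustration under stated assumptions: the data is synthetic (make_classification), and the seed of 7, the 10 shuffled folds, and the names X_syn, y_syn and mean_scores are hypothetical choices, so the printed scores are not comparable with the Pima result above.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import KFold, cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (an assumption), so the sketch runs without the CSV file.
X_syn, y_syn = make_classification(n_samples=500, n_features=8,
                                   n_informative=4, random_state=7)

kfold = KFold(n_splits=10, shuffle=True, random_state=7)
single_tree = DecisionTreeClassifier(random_state=7)
# The base estimator is passed positionally, which avoids the keyword rename
# between scikit-learn versions.
bagged = BaggingClassifier(DecisionTreeClassifier(), n_estimators=150,
                           random_state=7)

mean_scores = {}
for name, clf in [("single tree", single_tree), ("bagged trees", bagged)]:
    scores = cross_val_score(clf, X_syn, y_syn, cv=kfold)
    mean_scores[name] = scores.mean()
    # Mean accuracy and its spread across the 10 folds.
    print(name, round(scores.mean(), 3), round(scores.std(), 3))
```

Comparing the mean and the standard deviation of the fold scores gives a quick sense of whether bagging helps on a given dataset.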