LAB4
REPORT LABWORK 4
I. Decision Tree (DT)
Iris Dataset
First five rows:

   sepal length (cm)  sepal width (cm)  petal length (cm)  petal width (cm)
0                5.1               3.5                1.4               0.2
1                4.9               3.0                1.4               0.2
2                4.7               3.2                1.3               0.2
3                4.6               3.1                1.5               0.2
4                5.0               3.6                1.4               0.2
Divide the original dataset into two subsets: one for training (80%) and one for testing (20%).
Build a DT for the training subset and test the built model for data from the testing subset.
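The split-and-fit step described above can be sketched as follows. This is a minimal illustration using scikit-learn's built-in Iris loader; the variable names (X_train, dt, etc.) are illustrative, and random_state is fixed only for reproducibility.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

iris = load_iris()

# 80% training / 20% testing split, as specified in the task
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=42)

# Build the DT on the training subset
dt = DecisionTreeClassifier(random_state=42)
dt.fit(X_train, y_train)

# Test the built model on the testing subset
acc = accuracy_score(y_test, dt.predict(X_test))
```

With 150 Iris samples, the split yields 120 training and 30 testing rows.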
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score

cross_scores = []
knn = KNeighborsClassifier(n_neighbors=100)
scores = cross_val_score(knn, X_train_seed, y_train_seed, cv=10,
                         scoring='accuracy')
cross_scores.append(scores.mean())

cross_scores: [0.8801470588235294]
from sklearn.tree import DecisionTreeClassifier

seed_dt = DecisionTreeClassifier()
seed_dt.fit(X_train_seed, y_train_seed)
y_predict_seed = seed_dt.predict(X_test_seed)
y_predict_seed_train = seed_dt.predict(X_train_seed)
Accuracy score on train data: 0.9404761904761905
Accuracy score on test data: 0.8571428571428571
Confusion Matrix - Train:
[[58 0 1]
[ 2 53 0]
[ 7 0 47]]
Confusion Matrix - Test:
[[ 9 1 1]
[ 0 15 0]
[ 4 0 12]]
MSE - train dataset: 0.20238095238095238
MSE - test dataset: 0.5
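The metrics reported above (accuracy, confusion matrix, MSE on the integer class labels) can be computed with scikit-learn as sketched below. The y_true/y_pred arrays here are small placeholders, not the report's data.

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix, mean_squared_error

# Placeholder labels for illustration only
y_true = np.array([0, 0, 1, 1, 2, 2])
y_pred = np.array([0, 0, 1, 2, 2, 2])

acc = accuracy_score(y_true, y_pred)      # fraction of correct predictions
cm = confusion_matrix(y_true, y_pred)     # rows: true class, columns: predicted class
mse = mean_squared_error(y_true, y_pred)  # treats class labels as numbers, as in the report
```

Note that MSE on class labels penalizes a 0-vs-2 confusion more heavily than a 0-vs-1 confusion, which is why it is only a rough companion to the confusion matrix for classification.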
Index:     0 1 2 3 4 5 6 7 8 9 ... 32 33 34 35 36 37 38 39 40 41
Target:    0 0 2 2 1 1 2 1 2 2 ...  1  1  2  2  2  1  2  1  1  1
Predicted: 0 0 2 0 1 1 2 1 2 0 ...  1  1  2  0  2  1  2  1  1  1
Accuracy: 0.8571428571428571
Index:     0 1 2 3 4 5 6 7 8 9 ... 32 33 34 35 36 37 38 39 40 41
Target:    2 1 2 1 1 1 2 2 0 0 ...  0  1  0  2  1  1  1  1  0  1
Predicted: 2 1 2 1 1 1 2 2 0 0 ...  0  1  0  2  1  0  1  1  0  1
Accuracy: 0.9285714285714286
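The conclusion below refers to a Random Forest on the Seeds dataset, but that code is not shown above. As a minimal stand-in, the following sketch fits scikit-learn's RandomForestClassifier (which aggregates predictions from many Decision Trees) on the built-in Iris data; the Seeds dataset would be loaded and split the same way.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=42)

# n_estimators sets how many Decision Trees are aggregated in the ensemble
rf = RandomForestClassifier(n_estimators=100, random_state=42)
rf.fit(X_tr, y_tr)
rf_acc = accuracy_score(y_te, rf.predict(X_te))
```

Averaging over many trees reduces the variance of a single Decision Tree, which is the effect the conclusion attributes to the ensemble.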
Conclusion:
The Decision Tree model on the Iris dataset achieved high accuracy on both the training and testing sets, with few classification errors.
The Random Forest analysis on the Seeds dataset showed improved accuracy by aggregating predictions from multiple Decision Trees, demonstrating the ensemble's strength in enhancing predictive performance.
Overall, this analysis illustrates the effectiveness of Decision Trees and Random Forests for classification tasks and their potential in different scenarios.