Data Mining Journal 5 Kashan
Karachi Campus
LIST OF TASKS
Task 1: Using Python, implement Naïve Bayes with two different splitting ratios on the Heart Attack Analysis & Prediction dataset to predict the chances of heart failure in a person, performing the following steps:
▪ Data pre-processing
▪ Fitting Naive Bayes to the training set
▪ Predicting the test results
▪ Testing the accuracy of the results (creating a confusion matrix)
▪ Visualizing the test set results
▪ Comparing the accuracies
Task 2: Design a workflow in KNIME to predict whether a user buys a product by clicking the ad on the site, based on the user's salary, age, and gender (using the Social network ad dataset provided in the lab).
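KNIME workflows are built from nodes rather than code. A typical chain for Task 2 might be CSV Reader, Partitioning, a learner, a predictor, and Scorer. The sketch below is only a rough Python analogue of that chain. It makes several assumptions. It uses a Naive Bayes classifier, as in Task 1. It assumes the dataset columns are named `Gender`, `Age`, `EstimatedSalary` and `Purchased`. The function names `build_ad_model` and `run_workflow` are hypothetical.

```python
from sklearn.compose import ColumnTransformer
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler
import pandas as pd

# Assumed column names of the Social network ad dataset
FEATURES = ["Gender", "Age", "EstimatedSalary"]
TARGET = "Purchased"


def build_ad_model():
    """One-hot encode Gender, scale the numeric columns, then apply Gaussian Naive Bayes."""
    preprocess = ColumnTransformer(
        [
            ("gender", OneHotEncoder(handle_unknown="ignore"), ["Gender"]),
            # Scaling keeps GaussianNB's var_smoothing floor from being dominated by salary
            ("numeric", StandardScaler(), ["Age", "EstimatedSalary"]),
        ],
        sparse_threshold=0,  # GaussianNB needs a dense matrix
    )
    return Pipeline([("preprocess", preprocess), ("nb", GaussianNB())])


def run_workflow(df: pd.DataFrame, test_size=0.25, random_state=0):
    """Partition -> learn -> predict -> score, loosely mirroring a KNIME node chain."""
    X_train, X_test, y_train, y_test = train_test_split(
        df[FEATURES], df[TARGET], test_size=test_size,
        random_state=random_state, stratify=df[TARGET])
    model = build_ad_model().fit(X_train, y_train)
    y_pred = model.predict(X_test)
    return model, confusion_matrix(y_test, y_pred), accuracy_score(y_test, y_pred)
```

The call would be `model, cm, acc = run_workflow(pd.read_csv(...))`. The file path is left open here. Stratified partitioning is a design choice in this sketch. It keeps the buy/no-buy ratio similar in both parts.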
Date: ___________
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import confusion_matrix, accuracy_score
import matplotlib.pyplot as plt
import seaborn as sns
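# Data pre-processing: load the dataset and separate the features from the target
# (the file name 'heart.csv' and the target column 'output' are assumptions; adjust to your copy)
df = pd.read_csv('heart.csv')
X = df.drop(columns=['output'])
y = df['output']

# Splitting the dataset into training and testing sets with two different ratios (80-20 and 70-30)
X_train1, X_test1, y_train1, y_test1 = train_test_split(X, y, test_size=0.2, random_state=42)
X_train2, X_test2, y_train2, y_test2 = train_test_split(X, y, test_size=0.3, random_state=42)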
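The following sketch shows concretely what the two ratios produce. It splits a toy array of 20 samples using the same `test_size` values. The toy data are illustrative only. The `stratify` argument is an optional addition that is not in the code above. It keeps the class balance equal in the training and test parts. scikit-learn rounds the test share up, so 0.2 and 0.3 give 4 and 6 test samples here.

```python
import numpy as np
from sklearn.model_selection import train_test_split

X_toy = np.arange(40).reshape(20, 2)   # 20 toy samples, 2 features
y_toy = np.array([0] * 10 + [1] * 10)  # balanced toy labels

split_sizes = {}
for test_size in (0.2, 0.3):
    X_tr, X_te, y_tr, y_te = train_test_split(
        X_toy, y_toy, test_size=test_size, random_state=42, stratify=y_toy)
    # (train size, test size, number of class-1 samples in the test part)
    split_sizes[test_size] = (len(X_tr), len(X_te), int(y_te.sum()))
    print(f"test_size={test_size}: train={len(X_tr)}, test={len(X_te)}, "
          f"class-1 in test={int(y_te.sum())}")
```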
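# Fitting Naive Bayes to each training set
nb_classifier1 = GaussianNB()
nb_classifier2 = GaussianNB()
nb_classifier1.fit(X_train1, y_train1)
nb_classifier2.fit(X_train2, y_train2)

# Predicting the test results
y_pred1 = nb_classifier1.predict(X_test1)
y_pred2 = nb_classifier2.predict(X_test2)

# Testing accuracy: confusion matrices and accuracy scores
confusion_matrix1 = confusion_matrix(y_test1, y_pred1)
confusion_matrix2 = confusion_matrix(y_test2, y_pred2)
accuracy1 = accuracy_score(y_test1, y_pred1)
accuracy2 = accuracy_score(y_test2, y_pred2)

# Comparing the accuracies of the two splits
print(f'Accuracy (80-20 split): {accuracy1:.4f}')
print(f'Accuracy (70-30 split): {accuracy2:.4f}')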
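# Visualizing the test set results as side-by-side confusion-matrix heatmaps
plt.figure(figsize=(10, 4))
plt.subplot(1, 2, 1)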
sns.heatmap(confusion_matrix1, annot=True, fmt='g', cmap='Blues', cbar=False)
plt.title('Confusion Matrix - 80-20 Split')
plt.xlabel('Predicted labels')
plt.ylabel('True labels')
plt.subplot(1, 2, 2)
sns.heatmap(confusion_matrix2, annot=True, fmt='g', cmap='Blues', cbar=False)
plt.title('Confusion Matrix - 70-30 Split')
plt.xlabel('Predicted labels')
plt.ylabel('True labels')
plt.tight_layout()
plt.show()
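scikit-learn's `confusion_matrix` puts true labels on the rows and predicted labels on the columns. That matches the axis labels on the heatmaps above. Accuracy is the diagonal (correct predictions) divided by the total. The sketch below shows that relationship on a hypothetical set of six toy labels. It also shows recall for the positive class. In a medical setting, recall may matter as much as accuracy.

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix


def accuracy_from_confusion(cm):
    """Accuracy = correctly classified (diagonal) / all test samples."""
    cm = np.asarray(cm)
    return np.trace(cm) / cm.sum()


# Hypothetical toy labels, not results from the heart dataset
y_true = [0, 1, 1, 0, 1, 1]
y_pred = [0, 1, 0, 0, 1, 1]

cm = confusion_matrix(y_true, y_pred)  # rows: true labels, columns: predicted labels
tn, fp, fn, tp = cm.ravel()            # unpacking order for a binary problem
acc = accuracy_from_confusion(cm)
recall = tp / (tp + fn)                # share of actual positives that were caught
print(cm)
print(acc, accuracy_score(y_true, y_pred), recall)
```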
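# Predicting for a new person: replace each name below with that person's actual value
new_data_point = pd.DataFrame([[age, sex, cp, trtbps, chol, fbs, restecg, thalachh, exng, oldpeak,
                                slp, caa, thall]], columns=X.columns)
print('Prediction (80-20 model):', nb_classifier1.predict(new_data_point)[0])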
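The task asks for the chances of heart failure, so `predict_proba` may be more informative than `predict`. It returns a probability for each class.

The sketch below assumes that label 1 marks the higher-risk class. This should be checked against the dataset's description. The helper name `heart_failure_chance` is hypothetical. The model is fitted on random synthetic numbers purely so that the code runs. Its output says nothing about real patients.

```python
import numpy as np
import pandas as pd
from sklearn.naive_bayes import GaussianNB

# The 13 feature names used in new_data_point
FEATURES = ["age", "sex", "cp", "trtbps", "chol", "fbs", "restecg",
            "thalachh", "exng", "oldpeak", "slp", "caa", "thall"]


def heart_failure_chance(model, values, positive_label=1):
    """Return the model's estimated probability of `positive_label` for one person."""
    row = pd.DataFrame([values], columns=FEATURES)
    proba = model.predict_proba(row)[0]
    # predict_proba columns follow model.classes_, so look the label up explicitly
    return float(proba[list(model.classes_).index(positive_label)])


# Synthetic stand-in data (random numbers, NOT the real dataset), only to make the sketch run
rng = np.random.default_rng(0)
X_demo = pd.DataFrame(rng.normal(size=(50, len(FEATURES))), columns=FEATURES)
y_demo = rng.integers(0, 2, size=50)
demo_model = GaussianNB().fit(X_demo, y_demo)

chance = heart_failure_chance(demo_model, list(X_demo.iloc[0]))
print(f"Estimated chance (synthetic model): {chance:.2%}")
```

With the lab's own model, the call would be `heart_failure_chance(nb_classifier1, [...])`, with the 13 values passed in the order listed above.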