Data Mining Journal 5 Kashan

Bahria University,
Karachi Campus
LAB EXPERIMENT NO. 5
LIST OF TASKS
TASK NO.  OBJECTIVE
1  Using Python, implement Naïve Bayes with two different splitting ratios on the Heart Attack Analysis & Prediction dataset to predict the chances of heart failure in a person, and perform the following steps:
   ▪ Data pre-processing
   ▪ Fitting Naïve Bayes to the training set
   ▪ Predicting the test set results
   ▪ Testing the accuracy of the results (creating a confusion matrix)
   ▪ Visualizing the test set results
   ▪ Comparing the accuracies
2  Design a workflow in KNIME to predict whether a user buys a product by clicking an ad on the site, based on their salary, age, and gender, using the dataset provided in the lab (i.e., the Social Network Ads dataset).
Date: ___________
Kashan Riaz 02-131212-075 Data mining Journal

Task No. 1: Naïve Bayes in Python
Solution:
Using heart.csv (Heart Attack Analysis & Prediction dataset):
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import confusion_matrix, accuracy_score
import matplotlib.pyplot as plt
import seaborn as sns
# Load the dataset

heart_data = pd.read_csv("heart.csv")
# Data Pre-processing step

X = heart_data.drop(columns=['output']) # Features
y = heart_data['output'] # Target variable
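The pre-processing step above only separates the features from the `output` column. A minimal sketch of a few extra checks worth running before fitting, assuming the target column is named `output` as in the code above. The helper name `preprocessing_report` is hypothetical, and the tiny DataFrame in the usage lines is made-up data, not heart.csv.

```python
import pandas as pd


def preprocessing_report(df: pd.DataFrame, target: str = "output") -> dict:
    """Summarise missing values, duplicate rows and class balance (hypothetical helper)."""
    return {
        # Total number of missing cells across all columns
        "missing_values": int(df.isna().sum().sum()),
        # Rows that exactly repeat an earlier row
        "duplicate_rows": int(df.duplicated().sum()),
        # Fraction of rows in each target class
        "class_balance": df[target].value_counts(normalize=True).sort_index().to_dict(),
    }


# Usage on a tiny made-up frame (not the real dataset)
toy = pd.DataFrame({"age": [63, 37, 41, 41], "chol": [233, 250, 204, 204], "output": [1, 1, 0, 0]})
report = preprocessing_report(toy)
print(report)

# Drop exact duplicates before splitting so the same row cannot land in both train and test
toy_clean = toy.drop_duplicates()
print(len(toy_clean))
```

If the real data shows duplicates, calling `heart_data.drop_duplicates()` before building `X` and `y` is one option; whether to do so is a judgment call for the lab.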
# Splitting the dataset into training and testing sets with two different ratios
X_train1, X_test1, y_train1, y_test1 = train_test_split(
    X, y, test_size=0.2, random_state=42)
X_train2, X_test2, y_train2, y_test2 = train_test_split(
    X, y, test_size=0.3, random_state=42)
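Both splits above are purely random, so the proportion of the two `output` classes can differ between the training and test sets, which in turn can shift the accuracies being compared. A possible variant, not part of the original solution, passes `stratify=y` to `train_test_split` so each part keeps the overall class ratio. The synthetic features and labels below are made up for illustration.

```python
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
# Synthetic stand-in for the heart data: 200 rows, 3 features, roughly 55% positive labels
X_demo = rng.normal(size=(200, 3))
y_demo = (rng.random(200) < 0.55).astype(int)

for test_size in (0.2, 0.3):
    X_tr, X_te, y_tr, y_te = train_test_split(
        X_demo, y_demo, test_size=test_size, random_state=42, stratify=y_demo
    )
    # With stratify, the positive rate in each part stays close to the overall rate
    print(f"test_size={test_size}: overall={y_demo.mean():.3f}, "
          f"train={y_tr.mean():.3f}, test={y_te.mean():.3f}")
```

In the lab script this would mean adding `stratify=y` to both `train_test_split` calls.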
# Fitting Naive Bayes to the Training set

nb_classifier1 = GaussianNB()
nb_classifier2 = GaussianNB()
nb_classifier1.fit(X_train1, y_train1)
nb_classifier2.fit(X_train2, y_train2)
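`GaussianNB` assumes each feature is normally distributed within each class and independent of the other features given the class, so "fitting" amounts to estimating a prior plus a mean and a variance per class and feature. The check below, on made-up data, illustrates that scikit-learn's fitted `theta_` and `class_prior_` attributes are simply the per-class feature means and the observed class frequencies.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

rng = np.random.default_rng(1)
# Made-up data: class 1 is shifted by +2 on every feature
y_demo = rng.integers(0, 2, size=300)
X_demo = rng.normal(size=(300, 4)) + 2 * y_demo[:, None]

clf = GaussianNB().fit(X_demo, y_demo)

for k in clf.classes_:
    # theta_[k] holds the per-feature mean of the training rows in class k
    assert np.allclose(clf.theta_[k], X_demo[y_demo == k].mean(axis=0))

# With no priors given, class_prior_ is the observed class frequency
print(clf.class_prior_, np.bincount(y_demo) / len(y_demo))
```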
# Predicting the test result

y_pred1 = nb_classifier1.predict(X_test1)
y_pred2 = nb_classifier2.predict(X_test2)
# Test accuracy of the result and Creation of Confusion matrix

accuracy1 = accuracy_score(y_test1, y_pred1)
accuracy2 = accuracy_score(y_test2, y_pred2)
confusion_matrix1 = confusion_matrix(y_test1, y_pred1)

confusion_matrix2 = confusion_matrix(y_test2, y_pred2)
print("Accuracy for test set with 80-20 split:", accuracy1)

print("Confusion Matrix for test set with 80-20 split:")
print(confusion_matrix1)
print("\nAccuracy for test set with 70-30 split:", accuracy2)

print("Confusion Matrix for test set with 70-30 split:")
print(confusion_matrix2)
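The accuracy printed above can be read directly off the confusion matrix: for scikit-learn's layout (rows are true labels, columns are predicted labels, classes in sorted order) the diagonal holds the correct predictions. A small sketch deriving accuracy, precision and recall from a 2x2 matrix built from hand-made labels; `metrics_from_confusion` is a hypothetical helper and assumes both classes are predicted at least once, so no division by zero occurs.

```python
from sklearn.metrics import accuracy_score, confusion_matrix, precision_score, recall_score


def metrics_from_confusion(cm):
    """Derive accuracy, precision and recall from a 2x2 scikit-learn confusion matrix."""
    # Layout for labels [0, 1]: [[TN, FP], [FN, TP]]
    tn, fp, fn, tp = cm.ravel()
    return {
        "accuracy": (tp + tn) / cm.sum(),
        "precision": tp / (tp + fp),  # of the predicted positives, how many were right
        "recall": tp / (tp + fn),     # of the actual positives, how many were found
    }


# Hand-made labels giving TN=1, FP=1, FN=1, TP=2
y_true = [0, 0, 1, 1, 1]
y_pred = [0, 1, 1, 1, 0]
cm = confusion_matrix(y_true, y_pred)
m = metrics_from_confusion(cm)
print(cm)
print(m["accuracy"], accuracy_score(y_true, y_pred))
print(m["precision"], precision_score(y_true, y_pred))
print(m["recall"], recall_score(y_true, y_pred))
```

Applied to the lab, `metrics_from_confusion(confusion_matrix1)` should reproduce `accuracy1`.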
# Visualizing the test set result

plt.figure(figsize=(12, 6))
plt.subplot(1, 2, 1)
sns.heatmap(confusion_matrix1, annot=True, fmt='g', cmap='Blues', cbar=False)
plt.title('Confusion Matrix - 80-20 Split')
plt.xlabel('Predicted labels')
plt.ylabel('True labels')
plt.subplot(1, 2, 2)
sns.heatmap(confusion_matrix2, annot=True, fmt='g', cmap='Blues', cbar=False)
plt.title('Confusion Matrix - 70-30 Split')
plt.xlabel('Predicted labels')
plt.ylabel('True labels')
plt.tight_layout()
plt.show()
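For the final step, comparing the accuracies, note that the 80-20 and 70-30 results above each come from a single shuffle (`random_state=42`), so the difference may reflect that particular shuffle as much as the ratio itself. One hedged way to make the comparison steadier, not taken from the original solution, is to repeat each split over several seeds and compare the mean and spread. `load_breast_cancer` from scikit-learn is used only as a stand-in dataset so the snippet runs without heart.csv; `split_accuracies` is a hypothetical helper.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# Stand-in dataset bundled with scikit-learn; in the lab, use X and y from heart.csv instead
X_demo, y_demo = load_breast_cancer(return_X_y=True)


def split_accuracies(X, y, test_size, seeds=range(20)):
    """Test accuracy of GaussianNB for one split ratio, repeated over several random seeds."""
    scores = []
    for seed in seeds:
        X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=test_size, random_state=seed)
        model = GaussianNB().fit(X_tr, y_tr)
        scores.append(accuracy_score(y_te, model.predict(X_te)))
    return np.array(scores)


results = {ratio: split_accuracies(X_demo, y_demo, test_size)
           for ratio, test_size in (("80-20", 0.2), ("70-30", 0.3))}
for ratio, scores in results.items():
    print(f"{ratio}: mean={scores.mean():.3f}, std={scores.std():.3f}")
```

If the two means differ by less than their standard deviations, the split ratio probably matters less than the particular shuffle.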
# Accept user input for features

age = float(input("Enter age: "))
sex = float(input("Enter sex (0 for female, 1 for male): "))
cp = float(input("Enter cp (chest pain type): "))
trtbps = float(input("Enter trtbps (resting blood pressure): "))
chol = float(input("Enter chol (serum cholesterol in mg/dl): "))
fbs = float(input("Enter fbs (fasting blood sugar > 120 mg/dl, 1 for true, 0 for false): "))
restecg = float(input("Enter restecg (resting electrocardiographic results): "))
thalachh = float(input("Enter thalachh (maximum heart rate achieved): "))
exng = float(input("Enter exng (exercise induced angina, 1 for yes, 0 for no): "))
oldpeak = float(input("Enter oldpeak (ST depression induced by exercise relative to rest): "))
slp = float(input("Enter slp (slope of the peak exercise ST segment): "))
caa = float(input("Enter caa (number of major vessels (0-3) colored by fluoroscopy): "))
thall = float(input("Enter thall (thallium stress result): "))
# Wrap the answers in a DataFrame with the training column names so that
# scikit-learn does not warn about missing feature names
new_data_point = pd.DataFrame(
    [[age, sex, cp, trtbps, chol, fbs, restecg, thalachh, exng, oldpeak, slp, caa, thall]],
    columns=X.columns,
)
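Because every answer is read with `float(input(...))`, a typo such as `2` for `sex` is passed straight to the classifier. A small validation sketch, based only on the ranges stated in the prompts above (binary `sex`, `fbs` and `exng`; `caa` from 0 to 3). The helper name `validate_inputs` is hypothetical, and it deliberately checks nothing beyond those stated ranges.

```python
def validate_inputs(values: dict) -> list:
    """Return a list of problems found in the user's answers (an empty list means OK)."""
    problems = []
    # Fields the prompts describe as 0/1 flags
    for name in ("sex", "fbs", "exng"):
        if values.get(name) not in (0.0, 1.0):
            problems.append(f"{name} must be 0 or 1, got {values.get(name)}")
    # The prompt describes caa as the number of major vessels, 0-3
    caa = values.get("caa")
    if caa is None or not 0 <= caa <= 3:
        problems.append(f"caa must be between 0 and 3, got {caa}")
    return problems


# Usage: answers collected as floats, as in the script
print(validate_inputs({"sex": 1.0, "fbs": 0.0, "exng": 1.0, "caa": 2.0}))  # []
print(validate_inputs({"sex": 2.0, "fbs": 0.0, "exng": 1.0, "caa": 5.0}))
```

In the script it could be called as `validate_inputs({"sex": sex, "fbs": fbs, "exng": exng, "caa": caa})` before predicting.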
# Predict using the classifier trained with 80-20 split

prediction_80_20 = nb_classifier1.predict(new_data_point)
print("Prediction using 80-20 split:", prediction_80_20)
# Predict using the classifier trained with 70-30 split

prediction_70_30 = nb_classifier2.predict(new_data_point)
print("Prediction using 70-30 split:", prediction_70_30)
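The two `print` calls above show only the raw class label. A sketch of a friendlier report that also shows the probability from `predict_proba`, assuming the usual coding of the dataset's `output` column (0 = less chance of a heart attack, 1 = more chance), which the lab text itself does not state. `describe_prediction` and `LABELS` are hypothetical, and the model below is trained on made-up data using two of the heart.csv feature names.

```python
import numpy as np
import pandas as pd
from sklearn.naive_bayes import GaussianNB

# Assumed meaning of the output labels; check against the dataset's documentation
LABELS = {0: "less chance of heart attack", 1: "more chance of heart attack"}


def describe_prediction(model, row: pd.DataFrame) -> str:
    """Turn a single-row prediction into text with the model's class probability."""
    proba = model.predict_proba(row)[0]
    best = int(np.argmax(proba))
    label = int(model.classes_[best])
    return f"{LABELS[label]} (probability {proba[best]:.2f})"


# Made-up training data with two of the heart.csv feature names
rng = np.random.default_rng(3)
y_demo = rng.integers(0, 2, size=100)
X_demo = pd.DataFrame({"age": rng.normal(55, 8, 100) - 5 * y_demo,
                       "thalachh": rng.normal(140, 15, 100) + 20 * y_demo})
model = GaussianNB().fit(X_demo, y_demo)
text = describe_prediction(model, pd.DataFrame({"age": [45.0], "thalachh": [170.0]}))
print(text)
```

With the lab's models, `describe_prediction(nb_classifier1, new_data_point)` would be the analogous call.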

Task No. 2: KNIME Workflow
Solution:
Using the Social Network Ads dataset:
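The KNIME workflow itself is not reproduced in this text. For readers without KNIME, a rough Python analogue of one plausible node chain (CSV Reader, Partitioning, Naive Bayes Learner, Naive Bayes Predictor, Scorer) is sketched below, assuming the workflow uses Naive Bayes as in Task 1. The column names `Gender`, `Age`, `EstimatedSalary` and `Purchased` are assumed from the commonly distributed version of this dataset, `social_ads_pipeline` is a hypothetical helper, and the rows in the usage lines are synthetic. Unlike KNIME's learner, which can handle a nominal column directly, this sketch maps `Gender` to 0/1 and uses `GaussianNB` for everything, which is a simplification.

```python
import numpy as np
import pandas as pd
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB


def social_ads_pipeline(df: pd.DataFrame, test_size: float = 0.3, seed: int = 42):
    """Rough Python analogue of a KNIME Naive Bayes workflow on Social Network Ads data."""
    # Column names are assumed; Gender is mapped to 0/1 so GaussianNB can use it
    X = df[["Gender", "Age", "EstimatedSalary"]].copy()
    X["Gender"] = (X["Gender"] == "Male").astype(int)
    y = df["Purchased"]
    # "Partitioning" node: hold out part of the rows, stratified on the target
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=test_size, random_state=seed, stratify=y
    )
    # "Naive Bayes Learner" and "Naive Bayes Predictor" nodes
    model = GaussianNB().fit(X_tr, y_tr)
    y_pred = model.predict(X_te)
    # "Scorer" node: confusion matrix and accuracy
    return confusion_matrix(y_te, y_pred), accuracy_score(y_te, y_pred)


# Synthetic rows shaped like the dataset (older or higher-salary users buy more often)
rng = np.random.default_rng(4)
n = 400
age = rng.integers(18, 60, n)
salary = rng.integers(15_000, 150_000, n)
purchased = ((age > 42) | (salary > 110_000)).astype(int)
demo = pd.DataFrame({"Gender": rng.choice(["Male", "Female"], n), "Age": age,
                     "EstimatedSalary": salary, "Purchased": purchased})
cm, acc = social_ads_pipeline(demo)
print(cm)
print(f"accuracy = {acc:.3f}")
```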

