
Bahria University,

Karachi Campus

LAB EXPERIMENT NO. 5

LIST OF TASKS
TASK NO    OBJECTIVE

1    Using Python, implement Naïve Bayes with two different splitting ratios on the Heart Attack Analysis
     & Prediction dataset to predict the chances of heart failure in a person, performing the
     following steps:
     ▪ Data pre-processing
     ▪ Fitting Naive Bayes to the training set
     ▪ Predicting the test result
     ▪ Testing the accuracy of the result (creation of a confusion matrix)
     ▪ Visualizing the test set result
     ▪ Comparing the accuracies
2    Design a workflow in KNIME to predict whether a user buys a product by clicking the ad on
     the site, based on their salary, age, and gender, using the dataset provided in the lab
     (i.e. the social network ad dataset).

Date: ___________

Kashan Riaz 02-131212-075 Data mining Journal


Task No. 1: Naïve Bayes in Python
Solution:
Using heart.csv (heart attack analysis database)

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import confusion_matrix, accuracy_score
import matplotlib.pyplot as plt
import seaborn as sns

# Load the dataset
heart_data = pd.read_csv("heart.csv")

# Data pre-processing step: separate the features from the target column
X = heart_data.drop(columns=['output'])  # Features
y = heart_data['output']  # Target variable

# Splitting the dataset into training and testing sets with two different ratios
# (80-20 and 70-30); the fixed random_state keeps the splits reproducible
X_train1, X_test1, y_train1, y_test1 = train_test_split(
    X, y, test_size=0.2, random_state=42)
X_train2, X_test2, y_train2, y_test2 = train_test_split(
    X, y, test_size=0.3, random_state=42)

# Fitting Naive Bayes to the training set (one classifier per split)
nb_classifier1 = GaussianNB()
nb_classifier2 = GaussianNB()

nb_classifier1.fit(X_train1, y_train1)
nb_classifier2.fit(X_train2, y_train2)

# Predicting the test result
y_pred1 = nb_classifier1.predict(X_test1)
y_pred2 = nb_classifier2.predict(X_test2)

# Test accuracy of the result and creation of the confusion matrix
accuracy1 = accuracy_score(y_test1, y_pred1)
accuracy2 = accuracy_score(y_test2, y_pred2)

confusion_matrix1 = confusion_matrix(y_test1, y_pred1)
confusion_matrix2 = confusion_matrix(y_test2, y_pred2)

print("Accuracy for test set with 80-20 split:", accuracy1)
print("Confusion Matrix for test set with 80-20 split:")
print(confusion_matrix1)

print("\nAccuracy for test set with 70-30 split:", accuracy2)
print("Confusion Matrix for test set with 70-30 split:")
print(confusion_matrix2)

# Visualizing the test set result (one confusion-matrix heatmap per split)
plt.figure(figsize=(12, 6))

plt.subplot(1, 2, 1)
sns.heatmap(confusion_matrix1, annot=True, fmt='g', cmap='Blues', cbar=False)
plt.title('Confusion Matrix - 80-20 Split')
plt.xlabel('Predicted labels')
plt.ylabel('True labels')

plt.subplot(1, 2, 2)
sns.heatmap(confusion_matrix2, annot=True, fmt='g', cmap='Blues', cbar=False)
plt.title('Confusion Matrix - 70-30 Split')
plt.xlabel('Predicted labels')
plt.ylabel('True labels')

plt.tight_layout()
plt.show()

# Accept user input for features
age = float(input("Enter age: "))
sex = float(input("Enter sex (0 for female, 1 for male): "))
cp = float(input("Enter cp (chest pain type): "))
trtbps = float(input("Enter trtbps (resting blood pressure): "))
chol = float(input("Enter chol (serum cholesterol in mg/dl): "))
fbs = float(input(
    "Enter fbs (fasting blood sugar > 120 mg/dl, 1 for true, 0 for false): "))
restecg = float(input("Enter restecg (resting electrocardiographic results): "))
thalachh = float(input("Enter thalachh (maximum heart rate achieved): "))
exng = float(input("Enter exng (exercise induced angina, 1 for yes, 0 for no): "))
oldpeak = float(input(
    "Enter oldpeak (ST depression induced by exercise relative to rest): "))
slp = float(input("Enter slp (slope of the peak exercise ST segment): "))
caa = float(input(
    "Enter caa (number of major vessels (0-3) colored by fluoroscopy): "))
thall = float(input("Enter thall (thallium stress result): "))

# Build a one-row DataFrame with the training column names so that predict()
# receives the features under the same names and in the same order as X
new_data_point = pd.DataFrame(
    [[age, sex, cp, trtbps, chol, fbs, restecg, thalachh, exng, oldpeak,
      slp, caa, thall]],
    columns=X.columns)

# Predict using the classifier trained with 80-20 split
prediction_80_20 = nb_classifier1.predict(new_data_point)
print("Prediction using 80-20 split:", prediction_80_20)

# Predict using the classifier trained with 70-30 split
prediction_70_30 = nb_classifier2.predict(new_data_point)
print("Prediction using 70-30 split:", prediction_70_30)
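
The task list also asks to compare the accuracies of the two splits. The script above prints both values but does not compare them directly, so the following is a minimal sketch of one way to do it. The helper names accuracy_from_cm and compare_splits are hypothetical (they are not part of scikit-learn), and the matrices in the demo are made-up placeholder numbers, not results from heart.csv. The sketch assumes the layout that confusion_matrix returns: rows are true labels, columns are predicted labels, so correct predictions sit on the diagonal.

```python
def accuracy_from_cm(cm):
    """Accuracy = correct predictions (the diagonal) / all predictions."""
    total = sum(sum(row) for row in cm)
    if total == 0:
        raise ValueError("confusion matrix is empty")
    correct = sum(cm[i][i] for i in range(len(cm)))
    return correct / total


def compare_splits(cm_by_split):
    """Return (name of best split, {name: accuracy}) for a dict of matrices."""
    accuracies = {name: accuracy_from_cm(cm) for name, cm in cm_by_split.items()}
    best = max(accuracies, key=accuracies.get)
    return best, accuracies


if __name__ == "__main__":
    # Placeholder matrices for illustration only (NOT results from heart.csv).
    example = {
        "80-20": [[20, 5], [3, 22]],
        "70-30": [[30, 10], [6, 29]],
    }
    best, accs = compare_splits(example)
    for name, acc in accs.items():
        print(f"{name} split: accuracy = {acc:.3f}")
    print("Split with the higher accuracy:", best)
```

In the script above, the same helper could be called as compare_splits({"80-20": confusion_matrix1, "70-30": confusion_matrix2}). The values it reports should agree with accuracy1 and accuracy2, since accuracy_score counts the same correct predictions that lie on the diagonal of the confusion matrix.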



Task No. 2: KNIME workflow
Solution:
Using the social network ad dataset
