
DWDM Midterm Assignment

Course: Data Warehousing & Data Mining

Submitted by:
ID:
Section:
To
Course Teacher:

Date of Submission:

American International University-Bangladesh (AIUB)


I selected Naive Bayes classification for the following reasons:
1. The algorithm is very fast and can easily predict the class of instances in a
test dataset.
2. It is well suited to multi-class prediction problems.
3. When the assumption that features are independent of each other (given the
class) holds, a Naive Bayes classifier can outperform other models while needing
less training data.
4. It performs especially well with categorical input variables compared with
numerical ones, because numerical variables require assuming a distribution
(usually a normal distribution) for each class.
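To make reasons 1 and 4 concrete, here is a minimal sketch of a categorical Naive Bayes classifier written from scratch. Training is a single counting pass over the data, and prediction picks the class with the highest log P(class) + sum of log P(value | class). The function names, the add-one (Laplace) smoothing, and the toy rows are all assumptions made for illustration; they are not taken from the stroke dataset, and WEKA's own NaiveBayes differs in details (for example, in how it handles numeric attributes).

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(rows, labels):
    """Hypothetical helper: count class frequencies and per-class value frequencies."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)  # (feature index, class) -> Counter of values
    feature_values = defaultdict(set)    # feature index -> set of values seen
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            value_counts[(i, label)][value] += 1
            feature_values[i].add(value)
    return class_counts, value_counts, feature_values


def predict(model, row):
    """Return the class with the highest (log) posterior score for one row."""
    class_counts, value_counts, feature_values = model
    total = sum(class_counts.values())
    best_class, best_score = None, -math.inf
    for label, count in class_counts.items():
        score = math.log(count / total)  # log prior P(class)
        for i, value in enumerate(row):
            # add-one smoothing so an unseen value does not zero out the product
            numerator = value_counts[(i, label)][value] + 1
            denominator = count + len(feature_values[i])
            score += math.log(numerator / denominator)
        if score > best_score:
            best_class, best_score = label, score
    return best_class


# Invented toy data with two categorical features, for illustration only.
rows = [("a", "p"), ("a", "q"), ("b", "q"), ("b", "q")]
labels = ["yes", "yes", "no", "no"]
model = train_naive_bayes(rows, labels)
print(predict(model, ("a", "p")))  # yes
print(predict(model, ("b", "q")))  # no
```

Working in log space avoids numeric underflow when many small probabilities are multiplied together.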

I selected this dataset from the Kaggle website. The dataset is used to
predict whether a patient is likely to have a stroke based on input
parameters such as gender, age, and smoking status. Each row in the data
provides relevant information about one patient.
There are 9 attributes:
• ID
• Gender
• Age
• Ever Married
• Work Type
• Residence
• Avg Glucose Level
• Smoking Status
• Stroke
Stroke is the class attribute.

Total instances: 5110


Processes in WEKA:
Original Dataset:
Training Dataset:

Here is the data selected using WEKA:


60% of the instances were used for training in this step.
Data of the training set:

Using Naïve Bayes, the accuracy on this training set is 90.7697%.
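The 60/40 percentage split can be sketched as follows: shuffle the instance indices, then take the first 60% for training and the rest for testing. This is an illustrative assumption, not WEKA's code. Python's random generator and the hypothetical `seed` parameter will not select the same instances WEKA did, but the split sizes for 5110 instances follow from rounding 60% of the total.

```python
import random


def percentage_split(n_instances, train_percent, seed=1):
    """Hypothetical helper: shuffle indices and split them into train/test lists."""
    indices = list(range(n_instances))
    random.Random(seed).shuffle(indices)  # seeded so the split is reproducible
    train_size = round(n_instances * train_percent / 100)
    return indices[:train_size], indices[train_size:]


train_idx, test_idx = percentage_split(5110, 60)
print(len(train_idx), len(test_idx))  # 3066 2044
```

Because the two lists are slices of one shuffled list, no instance can appear in both the training and the test set.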


Test Dataset:
Here is the remaining 40% of the data, which was not used in the training
set.
Using Naïve Bayes, the accuracy on this test set is 92.4658%.
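Accuracy is the number of correctly classified instances divided by the total number of instances, times 100. Assuming the 60/40 split of 5110 instances gives 3066 training and 2044 test instances, the reported percentages can be turned back into counts. The helper name below is hypothetical.

```python
def implied_correct(accuracy_percent, n_instances):
    """Number of correctly classified instances implied by an accuracy percentage."""
    return round(accuracy_percent / 100 * n_instances)


print(implied_correct(90.7697, 3066))  # 2783 correct on the training set
print(implied_correct(92.4658, 2044))  # 1890 correct on the test set
```

Going the other way, 2783 / 3066 × 100 ≈ 90.7697% and 1890 / 2044 × 100 ≈ 92.4658%, which is consistent with those assumed set sizes.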
Cross Validation:
Using Naïve Bayes, the accuracy with cross-validation is 91.7808%.
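k-fold cross-validation splits the data into k folds. Each fold serves as the test set exactly once while the other k − 1 folds are used for training, and the accuracy is pooled over all instances. The report does not state k, so the usage below assumes 10 folds (WEKA's default). The sketch is illustrative: it skips the shuffling and class stratification that WEKA typically applies, and `count_correct` is a hypothetical callback that stands in for training and testing the classifier.

```python
def k_fold_indices(n_instances, k):
    """Split indices 0..n-1 into k contiguous folds of nearly equal size."""
    folds, start = [], 0
    for fold in range(k):
        size = n_instances // k + (1 if fold < n_instances % k else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def cross_validation_accuracy(n_instances, k, count_correct):
    """count_correct(train_idx, test_idx) trains on train_idx and returns how many
    test_idx instances were classified correctly (hypothetical callback)."""
    folds = k_fold_indices(n_instances, k)
    total_correct = 0
    for i, test_idx in enumerate(folds):
        train_idx = [j for f_i, fold in enumerate(folds) if f_i != i for j in fold]
        total_correct += count_correct(train_idx, test_idx)
    # every instance is tested exactly once, so divide by the full dataset size
    return 100 * total_correct / n_instances


print([len(f) for f in k_fold_indices(5110, 10)][:3])  # [511, 511, 511]
print(round(91.7808 / 100 * 5110))  # 4690 instances implied correct
```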
Predicted Data:
Here are the predictions for our data:
