
Assignment #4 Possible Points: 100

CS-770 ML Due date: 31st October 2023

Name:
_______________________________________________________________________________

Email id:
_____________________________________________________________________________

Assignment: Heart Disease Classification using Machine Learning (Decision tree and Ensemble
methods).

Objective:

Your task is to implement, evaluate, and compare various machine learning classifiers for predicting heart
disease. Employ advanced techniques for a thorough analysis of the data and classifiers’ performance.

Dataset Description: The dataset consists of 11 features and a binary target variable. Of the features, 6 are nominal and 5 are numeric. The target variable to predict is 1 when the patient is at risk of heart disease and 0 when the patient is normal.

Tasks:

1. Exploratory Data Analysis (EDA)

Load and inspect the dataset’s structure, summary statistics, and data types.

Visualize distributions of numerical and categorical features.

Identify and handle missing values appropriately.

Analyze correlations between features and the target variable.
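The EDA steps above can be sketched as follows. The column names below are illustrative assumptions standing in for the actual dataset schema (in practice you would load the provided file with pd.read_csv):

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the heart-disease dataset; column names are
# assumptions, not the real schema.
rng = np.random.default_rng(42)
n = 200
df = pd.DataFrame({
    "Age": rng.integers(29, 78, n),                        # numeric
    "Cholesterol": rng.normal(240, 50, n).round(1),        # numeric
    "ChestPainType": rng.choice(["ATA", "NAP", "ASY"], n), # nominal
    "HeartDisease": rng.integers(0, 2, n),                 # target (0/1)
})

df.info()                          # structure and data types
print(df.describe(include="all")) # summary statistics
print(df.isna().sum())            # missing values per column

# Correlation of numeric features with the target
print(df.corr(numeric_only=True)["HeartDisease"].sort_values(ascending=False))
```

Distributions can then be visualized with, e.g., `df.hist()` for numeric features and `value_counts().plot(kind="bar")` for nominal ones.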

2. Data Pre-processing & Splitting

Feature Selection: Decide which features are relevant for the classification task.

Data Splitting: Partition the dataset into training and testing sets (80-20 split).
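A minimal sketch of the 80-20 split, using a placeholder feature matrix in place of the selected features; stratifying on the target keeps the class balance consistent across the two sets:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder arrays standing in for the selected features and target.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 11))
y = rng.integers(0, 2, 100)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)
print(X_train.shape, X_test.shape)  # (80, 11) (20, 11)
```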

3. Model Development, Training, and Evaluation

Define parameters for SVM, Logistic Regression, Decision Tree, and Random Forest classifiers.

Implement GridSearchCV for hyperparameter tuning and model selection.

For each model:

Initialize and train using a pipeline comprising StandardScaler and the model.

Compute and report accuracy, classification report, and confusion matrix on the testing set.
Visualize and interpret the confusion matrix.

4. Ensemble Learning

Construct a Voting Classifier using the classifiers trained above. Experiment with both ‘hard’ and ‘soft’
voting strategies.

Evaluate and visualize its performance, drawing comparisons with the individual models.
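A sketch of the Voting Classifier comparison, again on synthetic stand-in data. One gotcha worth noting: soft voting averages predicted probabilities, so the SVC must be constructed with probability=True; hard voting takes a majority vote over predicted class labels instead:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the heart-disease data.
X, y = make_classification(n_samples=300, n_features=11, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Scale-sensitive models get a StandardScaler; tree models do not need one.
estimators = [
    ("lr", make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))),
    ("svm", make_pipeline(StandardScaler(), SVC(probability=True))),
    ("dt", DecisionTreeClassifier(random_state=0)),
    ("rf", RandomForestClassifier(random_state=0)),
]

for voting in ("hard", "soft"):
    clf = VotingClassifier(estimators=estimators, voting=voting)
    clf.fit(X_train, y_train)
    print(voting, "voting accuracy:", clf.score(X_test, y_test))
```

Comparing these accuracies against each individual model's test accuracy shows whether the ensemble actually improves on its components.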

5. Conclusion & Recommendations

Summarize the key findings regarding model performances.

Offer insights into which model(s) performed best and hypothesize why.

Suggest improvements or alternative approaches for future experimentation.

Deliverables:

Code Notebook: Well-commented Jupyter Notebook with sections corresponding to the tasks outlined.
Ensure your code is clean, readable, and well-documented.

Report: Concise report presenting your approach, findings, visualizations, and recommendations. The
report should be structured, coherent, and professionally formatted.

Evaluation Criteria:

Code Quality: Readability, structure, and documentation.

Analysis Depth: Extent of EDA, feature selection rationale, and hyperparameter tuning.

Model Evaluation: Appropriateness of metrics used, depth of evaluation, and clarity in visualizations.

Report Quality: Clarity, structure, depth of insight, and quality of writing in the report.

Deadline:

Submission is due by 10/31/2023.
