Assignment 2

7030ICT

Introduction to Big Data Analytics

Practice-based Assignment

Description
The dataset in this assignment records a series of patient parameters, such as Sex and Drug Type. You
will develop a classification-tree-based AI technique to predict the drug type that should be
given to a particular patient based on their characteristics.

Tasks
1. Read the dataset into a data frame called “drugdata”. [Hint: Make sure you set
stringsAsFactors=TRUE while reading the data into the data frame.]
2. Create two sets: a training set (80% of the observations in the drug dataset) and a test set
(the remaining 20% of the observations).
3. Create a classification tree using the training set and calculate the classification accuracy of the
tree using the test set.
4. Use cross-validation to prune the tree (tree from Step 3) optimally. You can use the
misclassification error as the basis for pruning. Calculate the classification accuracy of the
pruned tree using the test set.
5. Write each path (root to leaf node) as a classification rule in human-interpretable form
for your stakeholders.
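
Tasks 2 to 4 can be sketched in R as shown here. This is a minimal sketch under stated assumptions, not the expected solution: the assignment does not name a tree package, so rpart is used for illustration; the helper names (split_data, fit_tree, prune_tree, accuracy), the file name drug.csv, and the target column name Drug are all hypothetical. In rpart, the xerror column of the complexity table holds the cross-validated error, which for a classification tree is based on misclassification.

```r
library(rpart)  # assumption: the assignment does not specify a package

# Task 2: random split of the rows into training and test sets (80/20 by default).
split_data <- function(df, train_frac = 0.8, seed = 1) {
  set.seed(seed)  # fixed seed so the split is reproducible
  train_idx <- sample(nrow(df), size = floor(train_frac * nrow(df)))
  list(train = df[train_idx, , drop = FALSE],
       test  = df[-train_idx, , drop = FALSE])
}

# Task 3: grow a classification tree that predicts `target` from all other columns.
fit_tree <- function(train, target) {
  rpart(reformulate(".", response = target), data = train, method = "class")
}

# Task 4: prune at the complexity value with the lowest cross-validated error.
prune_tree <- function(fit) {
  cp_table <- fit$cptable
  best_cp <- cp_table[which.min(cp_table[, "xerror"]), "CP"]
  prune(fit, cp = best_cp)
}

# Classification accuracy: share of test rows whose predicted class is correct.
accuracy <- function(fit, test, target) {
  pred <- predict(fit, newdata = test, type = "class")
  mean(pred == test[[target]])
}

# Hypothetical usage (file name and column name are assumptions):
# drugdata <- read.csv("drug.csv", stringsAsFactors = TRUE)
# parts    <- split_data(drugdata)
# full     <- fit_tree(parts$train, "Drug")
# pruned   <- prune_tree(full)
# accuracy(full, parts$test, "Drug")
# accuracy(pruned, parts$test, "Drug")
```

Printing the pruned tree with print(pruned) lists every node with its split condition and marks terminal nodes with an asterisk, which is one possible starting point for writing the rules in Task 5.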

Marking Criteria
Section                                  Criterion                                    Marks
Data Import (Task 1), 1 Mark             Import Data                                  1.5
Data Split (Task 2), 1 Mark              Dataset split into training and test sets    1.5
Classification Tree (Task 3), 8 Marks    Tree Construction                            6
                                         Tree Plot                                    3
                                         Accuracy Calculation                         3
Pruned Tree (Task 4), 8 Marks            Tree Construction                            6
                                         Tree Plot                                    3
                                         Accuracy Calculation                         3
Rules (Task 5), 2 Marks                  Human-interpretable classification rules     3
Total                                                                                 30

Good luck :)
