
Assignment 3

Due Saturday 11:59 pm (Week 7)


We have introduced a few models in class, but there are many more to explore.

There are two types of predictive models: classification models and regression models.

Regression models: These predict a number. For example, a regression model can estimate
how many months remain before a piece of equipment on a machine starts to fail, so that
you can replace it in time.

Classification models: These predict membership in a class. For example, you may
work on a project to figure out whether a passenger on the Titanic is likely to survive, or
whether a person is going to buy an iPhone based on their gender, salary, and other features.
We will stick with binary outcomes for this kind of model, which means each result must be
either 0 or 1.

In this assignment, you are required to research the Decision Tree Classifier, but you
don’t need to find its mathematical formulas. You only need to know what it is and
how and when to use it. We will then compare the Decision Tree Classifier and the Naïve
Bayes algorithm on the iPhone purchases dataset.

1. Research the Decision Tree Classifier. The Decision Tree Classifier is used for
classification. The following W3Schools page gives an easy example that shows
the general idea of a decision tree. Please read it carefully; you will
need to learn how to interpret its results.
https://www.w3schools.com/python/python_ml_decision_tree.asp

Answer the following questions to help you understand how the Decision Tree
Classifier works. Write your answer to each question in your write-up.
• What is Entropy?
• Which one is better for predictions, low entropy or high entropy?
• What is information gain?
• What is the Gini index?
• According to the Sklearn reference page below, what is the default criterion
for this model?
https://scikit-learn.org/stable/modules/generated/sklearn.tree.DecisionTreeClassifier.html
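As a starting point for the questions above, the quantities involved can be computed by hand on a toy example. The following is only a minimal sketch using the standard textbook definitions; the function names and the toy labels are illustrative assumptions, not part of the assignment or of Sklearn.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def information_gain(parent, left, right, impurity=entropy):
    """Impurity of the parent minus the size-weighted impurity of the two children."""
    n = len(parent)
    weighted = (len(left) / n) * impurity(left) + (len(right) / n) * impurity(right)
    return impurity(parent) - weighted


# Toy labels (hypothetical): 1 = purchased, 0 = did not purchase.
parent = [1, 1, 1, 1, 0, 0, 0, 0]
left, right = [1, 1, 1, 0], [1, 0, 0, 0]

print("entropy(parent):", entropy(parent))
print("gini(parent):", gini(parent))
print("information gain of the split:", information_gain(parent, left, right))
```

Trying different splits of the same parent labels shows how the gain changes as the child groups become more or less mixed.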
2. Use the Decision Tree Classifier and the Naïve Bayes algorithm on the following data.

We have a dataset that shows which users have purchased an iPhone. Our goal in this
project is to predict if the customer will purchase an iPhone or not given their gender,
age, and salary.
You can find the dataset, iphone_purchase_records.csv, in Canvas.
The requirements are similar to those of Assignment 2. You are required to hand in two
documents: a write-up in PDF and a .py file of your code.

1) Use exploratory data analysis (what you have learned from Week 2 to Week 5).
It will help you spot any null values, get to know your data better, and decide
how to organize and use the data for this assignment.
2) Build three models to predict whether a person will purchase an iPhone:
one using the Naïve Bayes algorithm and two using the Decision Tree Classifier
(one with entropy as the criterion and the other with gini). Your models should
meet the following requirements.
• Prepare the datasets by reserving 20% of the data for testing.
• Model building – apply the Decision Tree Classifier and the Naïve Bayes
algorithm to the training dataset separately.
For the decision tree, use entropy and gini as the criterion in the
DecisionTreeClassifier of Sklearn.
• Evaluate performance – use the predict method to predict whether each customer
will purchase an iPhone. Also report the accuracy score and the R2-score.
• Tree visualizations for the two Decision Tree Classifier models.
• Interpret the results.
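The requirements above can be sketched end to end as follows. This is only an illustration under stated assumptions: the data is synthetic (the real file would be loaded with pd.read_csv), the column names Gender, Age, Salary, and Purchase are guesses at the dataset's layout, GaussianNB is one of several Naïve Bayes variants in Sklearn and is chosen here as an assumption, and export_text prints a text view of the tree (sklearn.tree.plot_tree is the graphical alternative).

```python
import numpy as np
import pandas as pd
from sklearn.metrics import accuracy_score, r2_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier, export_text

# Synthetic stand-in data. Column names and the labeling rule are hypothetical.
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "Gender": rng.integers(0, 2, n),            # assumed to be encoded as 0/1 already
    "Age": rng.integers(18, 60, n),
    "Salary": rng.integers(20_000, 150_000, n),
})
df["Purchase"] = ((df["Age"] > 40) | (df["Salary"] > 100_000)).astype(int)

# Quick exploratory check for null values in each column.
print(df.isnull().sum())

X = df[["Gender", "Age", "Salary"]]
y = df["Purchase"]

# Reserve 20% of the data for testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# One Naive Bayes model and two decision trees (entropy and gini).
models = {
    "naive_bayes": GaussianNB(),
    "tree_entropy": DecisionTreeClassifier(criterion="entropy", random_state=0),
    "tree_gini": DecisionTreeClassifier(criterion="gini", random_state=0),
}

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    results[name] = {
        "accuracy": accuracy_score(y_test, y_pred),
        "r2": r2_score(y_test, y_pred),
    }
    print(name, results[name])

# Text rendering of each fitted tree.
for name in ("tree_entropy", "tree_gini"):
    print(name)
    print(export_text(models[name], feature_names=list(X.columns)))
```

Fixing random_state makes the split and the trees reproducible, which helps when the screenshots in the write-up need to match the submitted code.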

Hint:
• For the Naïve Bayes algorithm, you may want to use LabelEncoder to convert Gender
to numbers so that the Naïve Bayes classifier can use it.
For more information about LabelEncoder, see the following link:
https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.LabelEncoder.html
• For the Decision Tree Classifier, you will need to convert everything into numerical
values.
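A minimal sketch of the encoding step in the hint, assuming the Gender column holds the strings "Male" and "Female" (the actual values in the dataset may differ):

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical sample; the real values come from iphone_purchase_records.csv.
df = pd.DataFrame({"Gender": ["Male", "Female", "Female", "Male"]})

encoder = LabelEncoder()
# fit_transform learns the distinct labels and maps each one to an integer.
df["Gender"] = encoder.fit_transform(df["Gender"])

print(list(encoder.classes_))   # labels in sorted order; index = assigned code
print(df["Gender"].tolist())
```

LabelEncoder assigns codes in sorted label order, so encoder.classes_ shows which number stands for which original value.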

General Requirements for all your assignments.


You will need to write up your findings, interpretations, and results for this
assignment. Use the Machine Learning Workflow of Week 6 as a guideline
for your assignment. It is a good idea to include screenshots of your code, results,
and graphs so that you can explain your findings alongside them. (It is also
easier for me to follow you when I read your paper.) A PDF file is required.
There is no page limit, but try to be straightforward with your answers.

Submit the .py file that you used to finish your assignment. (It may duplicate,
fully or partly, the screenshots that you have inserted in your paper, but that
is okay. I would like to look over your code.)
