Decision Tree

If the number of contacts performed during this campaign for a client is less than or equal to 1, the
balance is greater than 194, and the client is younger than 28, the client has a 68% chance of saying yes
to a deposit in the bank.
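The rule above is one path from the root of a tree to a leaf, and it can be written as a single logical test. Here is a minimal sketch in R, assuming the UCI Bank Marketing column names campaign, balance and age. The three clients are invented values, and the helper name matches_rule is hypothetical.

```r
# Hypothetical clients; the values are invented for illustration only.
clients <- data.frame(
  campaign = c(1, 3, 1),      # contacts performed during this campaign
  balance  = c(250, 900, 120),
  age      = c(25, 40, 22)
)

# TRUE when a client falls in the leaf described by the rule:
# contacts <= 1, balance > 194, age < 28.
matches_rule <- function(df) {
  df$campaign <= 1 & df$balance > 194 & df$age < 28
}

clients$in_leaf <- matches_rule(clients)
print(clients)
```

Only the first client satisfies all three conditions. The second fails on contacts and age, and the third fails on balance.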

https://www.youtube.com/watch?v=M8ueMDaDzco

https://archive.ics.uci.edu/ml/datasets/Bank+Marketing

https://www.kaggle.com/datasets/rodsaldanha/arketing-
campaign?fbclid=IwAR3c3bouzsdcJiJCqvqLwZJmgpP4aWCnHMBuX1U6MKSaZv3RvVUOkGb6U38

https://www.datatechnotes.com/2017/07/decision-tree.html

https://www.youtube.com/watch?v=tU3Adlru1Ng&t=202s

https://www.guru99.com/r-decision-trees.html?fbclid=IwAR3qGQKRid5aOKnJhB0zFoKYFi3yG0YysfONR-
cfrbn7DA9FdFk1svp0tpk

Step 3) Create train/test set

Before you train your model, you need to create two data sets: a train set and a test set. You train the
model on the train set and test its predictions on the test set (i.e., unseen data).

The common practice is to split the data 80/20: 80 percent of the data serves to train the model, and 20
percent to evaluate its predictions. Here, you need to create two separate data frames. You don’t want to
touch the test set until you finish building your model. Before partitioning the data into Training and
Testing/Validation datasets, we will run the command set.seed(1234). The set.seed function in R fixes the
seed of the random number generator (and, with the kind argument, the generator itself), so that the same
random partition is reproduced every time the code is run.

So, I am going to use a random seed 123, and I’ll name the result PD, for partitioned data. We call
sample with 2 as the first argument and nrow of our data file as the size, so every row is assigned either
a 1 or a 2. Then we say replace equals TRUE and specify the probabilities: we want the training data to
be 80%, so we put 0.8, and the validation data gets the remaining 0.2, or 20%.

Once we run this line, the data is partitioned into two sets. Let’s call the training data “TRAIN”. Inside
the square brackets of our data file we write PD (partitioned data), a double equals sign and 1, then a
comma. Leaving the position after the comma empty means all columns. When we run that, you can see
that the training data set has 1,798 observations of 31 variables.

Next, we name the validation data validate. Here PD equals 2, then a comma and again nothing after it,
which means all columns. You can see that the validate data set has about 442 observations of 31 variables.

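The partitioning described above can be sketched as follows. This is a minimal illustration, not the exact script from the presentation: the data frame named data is a simulated stand-in for the marketing data, its two columns are placeholders, and the seed value is arbitrary.

```r
# Simulated stand-in for the real data file (assumption, for illustration only).
set.seed(1234)                       # fix the seed so the split is reproducible
data <- data.frame(
  Income   = round(runif(1000, 10000, 100000)),
  Teenhome = sample(0:1, 1000, replace = TRUE)
)

# Draw a 1 or a 2 for every row: 1 with probability 0.8, 2 with probability 0.2.
pd <- sample(2, nrow(data), replace = TRUE, prob = c(0.8, 0.2))

train    <- data[pd == 1, ]   # rows labelled 1, all columns
validate <- data[pd == 2, ]   # rows labelled 2, all columns

dim(train)      # number of rows and columns in the training set
dim(validate)   # number of rows and columns in the validation set
```

Because each row is assigned independently, the split is only approximately 80/20. This is why the counts reported above (1,798 and 442) are close to, but not exactly, 80% and 20% of the rows.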
Step 4) Build the model

Now that we have created the train and test/validation datasets, we are ready to build the Decision
Tree model with the party package. The syntax for the party decision tree function is shown in this slide:

If you have not yet downloaded the party package, the first step is to install it in R.

After installing it, you simply need to load the package using the library function. It will then be
available for making decision trees.

For this dataset, let me label my decision tree as “tree”. I am going to use the function called ctree,
which in the party package fits a conditional inference tree; because our target is categorical, the
result is a classification tree. Our target variable is named Response. Remember, we created this last
variable as a categorical variable. We will use Education, Income, number of kids at home and number of
teens at home as our independent variables. The data we are using is the train data, so I would say data
equals train. If we want to make the tree smaller, we can do so by adding some controls to this line.
Setting mincriterion to 0.95 sets the confidence level: a split is made only if it is significant at the
95% level (since we deal with market research, we will use a 95% confidence level). Setting the minimum
split (minsplit) to 200 means a branch will split into two only when its sample size is at least 200.
This restricts the growth of the tree. Whether you set the minimum split to a small or a large number is
up to you and depends on the size of the dataset you’re using.

Next, if we want to visualize the decision tree, we use the command plot(tree).
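The model-building step described above might look like the sketch below. It assumes the party package is installed. The train data frame is simulated so the example runs on its own, and the column names Kidhome and Teenhome are assumed names for the kids-at-home and teens-at-home variables.

```r
# install.packages("party")   # run once if the package is not installed yet
library(party)

# Simulated stand-in for the training data (assumption, not the real data).
set.seed(1234)
n <- 1800
train <- data.frame(
  Education = sample(1:5, n, replace = TRUE),
  Income    = round(runif(n, 10000, 100000)),
  Kidhome   = sample(0:2, n, replace = TRUE),
  Teenhome  = sample(0:1, n, replace = TRUE)
)
# Make the response depend on Teenhome so the tree has a pattern to find.
p_yes <- ifelse(train$Teenhome == 0, 0.6, 0.1)
train$Response <- factor(rbinom(n, 1, p_yes),
                         levels = c(0, 1), labels = c("no", "yes"))

# Conditional inference tree with the two controls discussed above.
tree <- ctree(
  Response ~ Education + Income + Kidhome + Teenhome,
  data = train,
  controls = ctree_control(mincriterion = 0.95, minsplit = 200)
)

print(tree)   # text summary of the splits
plot(tree)    # draws the tree, root at the top and leaves at the bottom
```

Lowering minsplit or mincriterion generally allows more splits and a larger tree, while raising them keeps the tree small.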

A decision tree is drawn upside down: the root is at the top and the leaves are at the bottom.

The most important variable for the prediction model is at the top. Out of the four variables we have
used, the teen home variable, that is, whether the customer has teenage children at home, is the most
important variable in helping to predict who will respond to an offer for a product or service such as
depositing in this bank.

EXPLANATION:
Let me help you understand and interpret the decision tree shown on the screen. At the top you can see
the variable teenhome. If the client has no teenager at home, go to the left side of the tree. If the
client has a teenager at home, go to the right side of the tree.

Recall that we turned the education variable into a discrete variable, with the categories set as follows:
1 = Bachelor, 2 = 2n Cycle (2-year course), 3 = Graduation, 4 = Masters, 5 = PhD.
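The recoding can be sketched with match in base R. This is only an illustration: the raw label spellings are assumed from the list above, and the five sample values are invented.

```r
# Hypothetical raw labels; the spellings are assumed from the category list.
education_raw <- c("Graduation", "PhD", "Bachelor", "Masters", "2n Cycle")

# Categories in the order of their numeric codes 1 to 5.
levels_in_order <- c("Bachelor", "2n Cycle", "Graduation", "Masters", "PhD")

# match() returns the position of each label in levels_in_order,
# which is exactly the numeric code for that category.
education_code <- match(education_raw, levels_in_order)
print(education_code)
```

Any label that is not in the list comes back as NA, which makes misspelled categories easy to spot.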

So, in the decision tree shown here, if the client has a teenager at home and has an educational
attainment of Bachelor, 2n Cycle, Graduation or Masters, the client has a low chance of depositing in
the bank.

In the case where the client has no teenager at home and has an income of more than 81,574, the
probabilities that the customer will or will not deposit in the bank are both almost 50%.

Using this decision tree, the company can easily classify which clients are likely to accept the
campaign for its product or service.
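To classify clients with a fitted tree and check the result on held-out data, something like the following could be used. It is a sketch under assumptions: the party package is installed, the data are simulated rather than the real campaign data, and the column names Kidhome and Teenhome are assumed.

```r
library(party)

# Simulated stand-in for the full data set (assumption, for illustration only).
set.seed(1234)
n <- 2240
data <- data.frame(
  Education = sample(1:5, n, replace = TRUE),
  Income    = round(runif(n, 10000, 100000)),
  Kidhome   = sample(0:2, n, replace = TRUE),
  Teenhome  = sample(0:1, n, replace = TRUE)
)
p_yes <- ifelse(data$Teenhome == 0, 0.6, 0.1)
data$Response <- factor(rbinom(n, 1, p_yes),
                        levels = c(0, 1), labels = c("no", "yes"))

# 80/20 partition, then fit the tree on the training rows only.
pd <- sample(2, nrow(data), replace = TRUE, prob = c(0.8, 0.2))
train    <- data[pd == 1, ]
validate <- data[pd == 2, ]

tree <- ctree(Response ~ Education + Income + Kidhome + Teenhome,
              data = train,
              controls = ctree_control(mincriterion = 0.95, minsplit = 200))

# Predicted class for each held-out client.
predicted <- predict(tree, newdata = validate)

# Confusion matrix: rows are predictions, columns are actual responses.
confusion <- table(Predicted = predicted, Actual = validate$Response)
print(confusion)

# Share of held-out clients that were classified correctly.
accuracy <- sum(diag(confusion)) / sum(confusion)
print(accuracy)
```

Evaluating on validate rather than train gives an estimate of how the tree behaves on clients it has not seen.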
