
First, we load the data and split it into two datasets, a training set (60% of the records) and a validation set (the remaining 40%). In addition, we convert some variables to factors.

bank.df <- read.csv("UniversalBank.csv")


#change numerical variables to categorical first

bank.df$Personal.Loan = as.factor(bank.df$Personal.Loan)
bank.df$Online = as.factor(bank.df$Online)
bank.df$CreditCard = as.factor(bank.df$CreditCard)

set.seed(12345)

train.index <- sample(row.names(bank.df), 0.6*dim(bank.df)[1])


valid.index <- setdiff(row.names(bank.df), train.index)
train.df <- bank.df[train.index, ]
valid.df <- bank.df[valid.index, ]
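
As a quick check on the partition, the sketch below repeats the same row-name sampling on a small made-up data frame and confirms that the two sets are disjoint and together cover every row. The data frame `toy.df` and all `toy.*` names are hypothetical and stand in for the bank data.

```r
# Hypothetical stand-in for bank.df (50 made-up rows)
set.seed(12345)
toy.df <- data.frame(id = 1:50, x = rnorm(50))

# Same scheme as above: sample 60% of the row names, keep the rest for validation
n.train <- round(0.6 * nrow(toy.df))
toy.train.index <- sample(row.names(toy.df), n.train)
toy.valid.index <- setdiff(row.names(toy.df), toy.train.index)

toy.train <- toy.df[toy.train.index, ]
toy.valid <- toy.df[toy.valid.index, ]

# The two sets should not overlap and should account for all rows
no.overlap <- length(intersect(toy.train.index, toy.valid.index)) == 0
covers.all <- nrow(toy.train) + nrow(toy.valid) == nrow(toy.df)
c(train = nrow(toy.train), valid = nrow(toy.valid))
```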

Question 1
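library(reshape2)  # provides melt() and dcast()

# Melt to long format, keeping CreditCard and Personal.Loan as id variables.
# variable.name only labels the column that holds the melted variable names.
pv.bank <- melt(train.df, id.vars = c("CreditCard", "Personal.Loan"), variable.name = "Online")

# Cast back to wide format; with no aggregation function, dcast counts records
recast.bank <- dcast(pv.bank, CreditCard + Personal.Loan ~ Online)
recast.bank[, c(1:2, 14)]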

We use the functions melt and dcast to create our pivot table. The pivot table shows the number of
records for each combination of CreditCard and Personal.Loan. The results:

Question 2
Consider the task of classifying a customer who owns a bank credit card and is actively
using online banking services. Looking at the pivot table that you created, what is the
probability that this customer will accept the loan offer?

recast.bank[4,3]/length(train.df$Personal.Loan)
#or
85/(1913+194+808+85)

The estimated probability that a customer who has a credit card and uses online banking services will
accept the loan offer is 0.028, or 2.8%.
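
The ratio above divides the number of credit card holders who accepted the loan by all training records. To condition on both CC = 1 and Online = 1, the counts would have to be restricted to that group. A minimal sketch with base R's `table` and `ftable` follows. The data frame `toy.df` and its values are made up for illustration; with the real data, `train.df` would take its place.

```r
# Hypothetical toy data with the same three column names as train.df
toy.df <- data.frame(
  CreditCard    = factor(c(1, 1, 1, 1, 0, 0, 1, 0, 1, 0)),
  Online        = factor(c(1, 1, 1, 0, 1, 0, 1, 1, 0, 0)),
  Personal.Loan = factor(c(1, 0, 0, 0, 0, 1, 0, 0, 1, 0))
)

# Three-way counts, indexed as [CreditCard, Personal.Loan, Online]
counts <- table(toy.df$CreditCard, toy.df$Personal.Loan, toy.df$Online,
                dnn = c("CreditCard", "Personal.Loan", "Online"))

# Pivot layout: CreditCard and Personal.Loan as rows, Online as columns
ftable(counts, row.vars = c("CreditCard", "Personal.Loan"), col.vars = "Online")

# P(Loan = 1 | CC = 1, Online = 1): acceptors in the group / size of the group
p.exact <- counts["1", "1", "1"] / sum(counts["1", , "1"])
p.exact
```

In the toy data, four customers have both a credit card and online banking, and one of them accepted the loan, so the estimate is 1/4.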
Question 3

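# Melt with Personal.Loan as the only id variable
# (variable.name only labels the column that holds the melted variable names)
pv2.bank <- melt(train.df, id.vars = "Personal.Loan", variable.name = "Online")
pv3.bank <- melt(train.df, id.vars = "Personal.Loan", variable.name = "CreditCard")

# Cast to wide format; dcast counts records for each Personal.Loan level
recast2.bank <- dcast(pv2.bank, Personal.Loan ~ Online)
recast3.bank <- dcast(pv3.bank, Personal.Loan ~ CreditCard)
LoanOnline <- recast2.bank[, c(1, 13)]  # Personal.Loan and the Online column
LoanCC <- recast3.bank[, c(1, 14)]      # Personal.Loan and the CreditCard column
table(train.df[, 10])                   # counts of Personal.Loan (column 10)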

In this task we created two pivot tables, with Personal.Loan as the row variable in both cases. They
differ in the column variable: Online in one and CreditCard in the other. The results of the pivot tables are:

1. P(CC = 1|Loan = 1) = the proportion of credit card holders among the loan
acceptors
2. P(Online = 1|Loan = 1)
3. P(Loan = 1) = the proportion of loan acceptors
4. P(CC = 1|Loan = 0)
5. P(Online = 1|Loan = 0)
6. P(Loan = 0)

We use three tables to calculate these probabilities: the Personal.Loan counts from above and the two cross-tabulations below.

LoanCC2= table(train.df[,c(14,10)])
LoanOnline2=table(train.df[,c(13,10)])

#1 P(CC = 1 | Loan = 1) (the proportion of credit card holders among the loan acceptors)
p1= 85/(85+194)
p1
#2 P(Online=1|Loan=1)
p2=169/(169+110)
p2
#3 P (Loan = 1) (the proportion of loan acceptors)
p3=279/(279+2721)
p3
#4 P(CC=1|Loan=0)
p4=808/(1913+808)
p4
#5 P(Online=1|Loan=0)
p5=1637/(1637+1084)
p5
#6 P(Loan=0)
p6=2721/(2721+279)
p6
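
The same six quantities can also be read off column-wise proportions instead of being typed by hand. The sketch below rebuilds the two cross-tabulations from the counts used above and applies `prop.table`. The names `cc.by.loan`, `online.by.loan`, `cc.cond`, `online.cond`, and `loan.prior` are illustrative; with the real data, `LoanCC2` and `LoanOnline2` could be passed to `prop.table` directly.

```r
# Counts taken from the calculations above
# rows = predictor level, columns = Personal.Loan level (matrix fills column by column)
cc.by.loan <- matrix(c(1913, 808, 194, 85), nrow = 2,
                     dimnames = list(CreditCard = c("0", "1"),
                                     Personal.Loan = c("0", "1")))
online.by.loan <- matrix(c(1084, 1637, 110, 169), nrow = 2,
                         dimnames = list(Online = c("0", "1"),
                                         Personal.Loan = c("0", "1")))

# Column proportions give P(predictor | Loan)
cc.cond <- prop.table(cc.by.loan, margin = 2)
online.cond <- prop.table(online.by.loan, margin = 2)

# Class proportions give P(Loan)
loan.prior <- colSums(cc.by.loan) / sum(cc.by.loan)

p1 <- cc.cond["1", "1"]      # P(CC = 1 | Loan = 1)
p2 <- online.cond["1", "1"]  # P(Online = 1 | Loan = 1)
p3 <- loan.prior[["1"]]      # P(Loan = 1)
p4 <- cc.cond["1", "0"]      # P(CC = 1 | Loan = 0)
p5 <- online.cond["1", "0"]  # P(Online = 1 | Loan = 0)
p6 <- loan.prior[["0"]]      # P(Loan = 0)
round(c(p1, p2, p3, p4, p5, p6), 4)
```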

Question 4 (1 point)
Compute the naive Bayes probability P(Loan = 1|CC = 1, Online = 1).
Note: Use the quantities that you computed in the previous question.

#Question4
naivebayes=p1*p2*p3/((p1*p2*p3)+(p4*p5*p6))
naivebayes

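We use the quantities computed in Question 3. For naive Bayes, we divide the joint probability of the event
of interest (Loan = 1 together with CC = 1 and Online = 1) by the sum of the corresponding joint
probabilities over both classes (Loan = 1 and Loan = 0).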
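
Written out, the calculation is Bayes' rule combined with the naive Bayes assumption that CC and Online are independent given the loan outcome. In terms of the quantities p1 to p6 from Question 3:

```latex
\[
P(\text{Loan}=1 \mid \text{CC}=1, \text{Online}=1)
\approx
\frac{P(\text{CC}=1 \mid \text{Loan}=1)\,P(\text{Online}=1 \mid \text{Loan}=1)\,P(\text{Loan}=1)}
     {\sum_{k \in \{0,1\}} P(\text{CC}=1 \mid \text{Loan}=k)\,P(\text{Online}=1 \mid \text{Loan}=k)\,P(\text{Loan}=k)}
= \frac{p_1\,p_2\,p_3}{p_1\,p_2\,p_3 + p_4\,p_5\,p_6}
\]
```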

Question 5 (1 point)
Of the two values that you computed earlier (computed in Q2 and Q4), which is a more
accurate estimate of P(Loan=1|CC=1, Online=1)?
In this case, the value from Question 4 is the more accurate estimate of P(Loan=1|CC=1,
Online=1). 9.5% is very close to the exact value of 9.6%. The difference between
Question 2 and Question 4 is that the latter takes all of the probabilities into account.
Question 2, on the other hand, considers just one scenario.

Question 6 (3 points)
In R, run naive Bayes on the training data and examine the output and find entries that
are needed for computing P(Loan = 1|CC = 1, Online = 1). Compute this probability, and
also the predicted probability for P(Loan=1 | Online = 1, CC = 1)

library(e1071)
# run naive bayes
nb.bank <- naiveBayes(Personal.Loan ~ ., data = train.df)
nb.bank

## predict probabilities
pred.prob <- predict(nb.bank, newdata = valid.df, type = "raw")
## predict class membership
pred.class <- predict(nb.bank, newdata = valid.df)
df <- data.frame(actual = valid.df$Personal.Loan, predicted = pred.class, pred.prob)
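
To see which entries of the output feed into P(Loan = 1 | CC = 1, Online = 1), the sketch below fits a model on the two predictors only and combines the class counts in `apriori` with the conditional probability tables in `tables`. It uses a stand-in data frame, `nb.df`, rebuilt from the class-wise counts in Question 3. The pairing of CreditCard and Online values within each class is arbitrary (an assumption), which does not affect naive Bayes because it only uses the per-class marginals. The names `nb.df`, `nb.two`, and `new.customer` are illustrative; with the real data, `train.df` would be used instead.

```r
library(e1071)

# Stand-in data (assumption): per-class counts from Question 3,
# with an arbitrary pairing of CreditCard and Online within each class
nb.df <- data.frame(
  Personal.Loan = factor(rep(c("0", "1"), times = c(2721, 279))),
  CreditCard    = factor(c(rep(c("1", "0"), times = c(808, 1913)),
                           rep(c("1", "0"), times = c(85, 194)))),
  Online        = factor(c(rep(c("1", "0"), times = c(1637, 1084)),
                           rep(c("1", "0"), times = c(169, 110))))
)

nb.two <- naiveBayes(Personal.Loan ~ CreditCard + Online, data = nb.df)

# Entries needed: class priors and P(predictor = 1 | class)
prior   <- as.vector(nb.two$apriori) / sum(nb.two$apriori)
cc1     <- nb.two$tables$CreditCard[, "1"]
online1 <- nb.two$tables$Online[, "1"]

# Naive Bayes posterior for Loan = 1 given CC = 1 and Online = 1
joint  <- prior * cc1 * online1   # names "0" and "1" come from cc1
p.hand <- joint[["1"]] / sum(joint)

# The same probability from predict(); factor levels must match the training data
new.customer <- data.frame(CreditCard = factor("1", levels = c("0", "1")),
                           Online     = factor("1", levels = c("0", "1")))
p.pred <- predict(nb.two, newdata = new.customer, type = "raw")[1, "1"]
c(by.hand = p.hand, predicted = unname(p.pred))
```

Fitting on `CreditCard + Online` rather than on all predictors keeps the predicted probability comparable to the hand calculation in Question 4.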
