HW#5

First, we load the data and split it into two datasets: training (60%) and validation (40%).
In addition, we convert some variables to factors.
bank.df <- read.csv("UniversalBank.csv")

#change numerical variables to categorical first
bank.df$Personal.Loan = as.factor(bank.df$Personal.Loan)
bank.df$Online = as.factor(bank.df$Online)
bank.df$CreditCard = as.factor(bank.df$CreditCard)
set.seed(12345)
train.index <- sample(row.names(bank.df), 0.6*dim(bank.df)[1])

valid.index <- setdiff(row.names(bank.df), train.index)
train.df <- bank.df[train.index, ]
valid.df <- bank.df[valid.index, ]
Question 1
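library(reshape2)  # provides melt() and dcast(); the older reshape package is not needed
# melt to long format; the column holding the measured-variable names is named "Online"
pv.bank = melt(train.df, id = c("CreditCard", "Personal.Loan"), variable.name = "Online")
# dcast counts the rows (default aggregation: length) per CreditCard x Personal.Loan combination
recast.bank = dcast(pv.bank, CreditCard + Personal.Loan ~ Online)
recast.bank[, c(1:2, 14)]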
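We use the functions melt and dcast to create our pivot table. The pivot table shows the number of
records for each combination of CreditCard and Personal.Loan. The resulting counts, which are reused
in the calculations below, are 1913 (CC = 0, Loan = 0), 194 (CC = 0, Loan = 1), 808 (CC = 1, Loan = 0)
and 85 (CC = 1, Loan = 1).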
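The pivot table in this answer counts records by CreditCard and Personal.Loan only. As a rough sketch of how the counts could also be broken down by Online, base R's xtabs and ftable can build a three-way table. The data frame toy.df below is hypothetical and only stands in for train.df; its values are made up for illustration.

```r
# Hypothetical toy data standing in for train.df (not the UniversalBank records).
toy.df <- data.frame(
  CreditCard    = factor(c(0, 0, 1, 1, 1, 0, 1, 0)),
  Online        = factor(c(0, 1, 1, 1, 0, 1, 1, 0)),
  Personal.Loan = factor(c(0, 0, 0, 1, 0, 1, 1, 0))
)

# Three-way count table: rows are CreditCard x Personal.Loan, columns are Online.
# ftable() puts the last variable of the xtabs formula in the columns by default.
pivot <- ftable(xtabs(~ CreditCard + Personal.Loan + Online, data = toy.df))
pivot
```

With the real data, replacing toy.df by train.df should give the same layout, assuming the three columns are factors as above.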
Question 2
Consider the task of classifying a customer who owns a bank credit card and is actively
using online banking services. Looking at the pivot table that you created, what is the
probability that this customer will accept the loan offer?
recast.bank[4,3]/length(train.df$Personal.Loan)
#or
85/(1913+194+808+85)
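The value computed here is 85/3000 = 0.028, or 2.8%. Strictly, this is the share of all 3,000
training customers who hold a credit card and accepted the loan, because the pivot table does not
break the counts down by Online. The conditional probability P(Loan = 1|CC = 1, Online = 1) would
instead divide the number of loan acceptors with CC = 1 and Online = 1 by the number of all
customers with CC = 1 and Online = 1.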
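As a minimal sketch of the conditional calculation, the snippet below restricts a data frame to customers with CC = 1 and Online = 1 and takes the share of loan acceptors among them. The data frame toy.df and its values are hypothetical, so the result illustrates the method only and says nothing about the UniversalBank data.

```r
# Hypothetical toy data standing in for train.df (values are made up).
toy.df <- data.frame(
  CreditCard    = factor(c(0, 0, 1, 1, 1, 0, 1, 0)),
  Online        = factor(c(0, 1, 1, 1, 0, 1, 1, 0)),
  Personal.Loan = factor(c(0, 0, 0, 1, 0, 1, 1, 0))
)

# Keep only customers who hold a credit card AND use online banking.
cc.online <- toy.df[toy.df$CreditCard == "1" & toy.df$Online == "1", ]

# Exact (non-naive) estimate of P(Loan = 1 | CC = 1, Online = 1):
# loan acceptors in the subset divided by the size of the subset.
p.exact <- sum(cc.online$Personal.Loan == "1") / nrow(cc.online)
p.exact
```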
Question 3
pv2.bank = melt(train.df,id=c("Personal.Loan"),variable = "Online")

pv3.bank = melt(train.df,id=c("Personal.Loan"),variable = "CreditCard")
recast2.bank=dcast(pv2.bank,Personal.Loan~Online)
recast3.bank=dcast(pv3.bank,Personal.Loan~CreditCard)
LoanOnline=recast2.bank[,c(1,13)]
LoanCC = recast3.bank[,c(1,14)]
table(train.df[,c(10)])
In this task we create two pivot tables, both with Personal.Loan as the row variable. They differ
in the column variable: Online in one and CreditCard in the other. The quantities to compute from
these tables are:
1. P(CC = 1|Loan = 1) = the proportion of credit card holders among the loan
acceptors
2. P(Online = 1|Loan = 1)
3. P(Loan = 1) = the proportion of loan acceptors
4. P(CC = 1|Loan = 0)
5. P(Online = 1|Loan = 0)
6. P(Loan = 0)
We create three tables in order to calculate the probability.
LoanCC2= table(train.df[,c(14,10)])
LoanOnline2=table(train.df[,c(13,10)])
#1 P(CC = 1 | Loan = 1) (the proportion of credit card holders among the loan acceptors)
p1= 85/(85+194)
p1
#2 P(Online=1|Loan=1)
p2=169/(169+110)
p2
#3 P (Loan = 1) (the proportion of loan acceptors)
p3=279/(279+2721)
p3
#4 P(CC=1|Loan=0)
p4=808/(1913+808)
p4
#5 P(Online=1|Loan=0)
p5=1637/(1637+1084)
p5
#6 P(Loan=0)
p6=2721/(2721+279)
p6
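The six quantities can also be obtained without typing each fraction by hand. The sketch below rebuilds the two count tables from the numbers used in the calculations above and applies prop.table column-wise. The object names (cc.by.loan, online.by.loan, and so on) are illustrative and do not appear in the original code.

```r
# Counts taken from the calculations above (training set, 3000 records).
# matrix() fills column by column: first Loan = 0, then Loan = 1.
cc.by.loan <- matrix(c(1913, 808, 194, 85), nrow = 2,
                     dimnames = list(CC = c("0", "1"), Loan = c("0", "1")))
online.by.loan <- matrix(c(1084, 1637, 110, 169), nrow = 2,
                         dimnames = list(Online = c("0", "1"), Loan = c("0", "1")))

# margin = 2 normalises each column, giving P(CC | Loan) and P(Online | Loan).
p.cc     <- prop.table(cc.by.loan, margin = 2)
p.online <- prop.table(online.by.loan, margin = 2)

# Class priors P(Loan = 0) and P(Loan = 1).
p.loan <- colSums(cc.by.loan) / sum(cc.by.loan)

p.cc["1", "1"]      # P(CC = 1 | Loan = 1)
p.online["1", "1"]  # P(Online = 1 | Loan = 1)
p.loan["1"]         # P(Loan = 1)
p.cc["1", "0"]      # P(CC = 1 | Loan = 0)
p.online["1", "0"]  # P(Online = 1 | Loan = 0)
p.loan["0"]         # P(Loan = 0)
```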
Question 4 (1 point)
Compute the naive Bayes probability P(Loan = 1|CC = 1, Online = 1).
Note: Use the quantities that you computed in the previous question.
#Question4
naivebayes=p1*p2*p3/((p1*p2*p3)+(p4*p5*p6))
naivebayes
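We reuse the quantities from Question 3. The naive Bayes estimate divides the product for the class
of interest, P(CC = 1|Loan = 1) * P(Online = 1|Loan = 1) * P(Loan = 1), by the sum of the
corresponding products over both classes (Loan = 1 and Loan = 0).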
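The same calculation can be wrapped in a small helper, which makes the structure of the formula explicit. This is only a sketch for the two-class case: the function name naive.bayes.posterior and its argument names are hypothetical, and the inputs are the fractions from Question 3.

```r
# Two-class naive Bayes posterior (hypothetical helper, not from e1071).
# lik1 / lik0: vectors of P(predictor value | class) for class 1 / class 0.
# prior1 / prior0: class priors.
naive.bayes.posterior <- function(lik1, prior1, lik0, prior0) {
  num <- prod(lik1) * prior1           # product for the class of interest
  num / (num + prod(lik0) * prior0)    # normalise over both classes
}

# P(Loan = 1 | CC = 1, Online = 1) from the Question 3 quantities.
p.nb <- naive.bayes.posterior(
  lik1 = c(85 / 279, 169 / 279),    prior1 = 279 / 3000,
  lik0 = c(808 / 2721, 1637 / 2721), prior0 = 2721 / 3000
)
p.nb
```

Passing the likelihoods as vectors means the helper would extend to more than two predictors without changes, under the same conditional independence assumption.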
Question 5 (1 point)
Of the two values that you computed earlier (computed in Q2 and Q4), which is a more
accurate estimate of P(Loan=1|CC=1, Online=1)?
In this case, the value from Question 4 is the more accurate estimate of P(Loan = 1|CC = 1,
Online = 1): 9.5% is very close to the exact value of 9.6%. The difference between
Question 2 and Question 4 is that the latter takes all of the conditional and prior
probabilities into account, whereas Question 2 considers only one scenario.
Question 6 (3 points)
In R, run naive Bayes on the training data and examine the output and find entries that
are needed for computing P(Loan = 1|CC = 1, Online = 1). Compute this probability, and
also the predicted probability for P(Loan=1 | Online = 1, CC = 1)
library(e1071)
# run naive bayes
nb.bank <- naiveBayes(Personal.Loan ~ CreditCard + Online, data = train.df)  # only the two predictors the question conditions on
nb.bank
## predict probabilities
pred.prob <- predict(nb.bank, newdata = valid.df, type = "raw")
## predict class membership
pred.class <- predict(nb.bank, newdata = valid.df)
df <- data.frame(actual = valid.df$Personal.Loan, predicted = pred.class, pred.prob)
