Assign2 GLM

This document describes using a logistic regression model to predict customer purchases of caravan insurance from a dataset containing demographic information on 5,822 individuals. The dataset was split randomly into training and test sets. A preliminary logistic regression model using all 85 predictors had an AIC of 1949.1 and was then refined to use fewer variables to reduce complexity. When predicting purchases on the test and training sets, the model's predictions correlated more strongly with actual purchases on the training set. The confusion matrix showed that the model predicted non-purchases accurately but missed most purchases, with a low sensitivity of 0.020.

Uploaded by

Chelsi Gondalia

Predicting Customer Interest

The Caravan dataset contains 85 predictors that measure demographic characteristics for 5,822
individuals, plus “Purchase,” which indicates whether or not a given individual purchases a caravan
insurance policy. We begin by converting the Purchase column from character to integer: all ‘Yes’
responses are changed to 1 and all ‘No’ responses are changed to 0. A preview of the dataset and the
distribution of purchases (1) and non-purchases (0) in the Purchase column can be seen in Figures 1
and 2, respectively. Note that non-purchases are much more frequent than purchases.
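The recoding step can be sketched in a few lines of Python. This is only an illustration: the report does not show its own code, the helper name `encode_purchase` is hypothetical, and the eight sample labels are made up rather than taken from the Caravan data.

```python
from collections import Counter


def encode_purchase(labels):
    """Map 'Yes' to 1 and 'No' to 0; any other label raises KeyError."""
    mapping = {"Yes": 1, "No": 0}
    return [mapping[label] for label in labels]


# Hypothetical sample labels (the real column has 5,822 entries).
sample = ["No", "No", "Yes", "No", "No", "No", "Yes", "No"]
encoded = encode_purchase(sample)
counts = Counter(encoded)

print(encoded)               # [0, 0, 1, 0, 0, 0, 1, 0]
print(counts[0], counts[1])  # 6 2
```

Counting the encoded values is the same tabulation that a frequency chart such as Figure 2 displays.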

Figure 1. Preview of the Caravan dataset: 5822 records and 85 predictors + 1 Purchase variable.

Figure 2. Frequency of purchases (1) and non-purchase (0).

The data is then randomly split into test and train sets, with 1,000 samples held out for the test
set. A logistic regression model is fitted to the train set. The preliminary model included all 85
variables and gave an AIC of 1949.1. After reviewing the preliminary model, the function “corr_var”
was used to identify the variables most strongly correlated with Purchase. Once these variables were
identified from the graph in Figure 4, the logistic regression model was refined to contain fewer
variables, reducing both the complexity and the AIC value. Results of this refined model can be
viewed in Figure 5.

Figure 4. Bar chart of Purchase and the top 10 variables with the highest correlation.
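The split and the correlation screening described above can be sketched as follows. This is a minimal Python illustration under stated assumptions: it does not reproduce the behavior of “corr_var” but swaps in a plain Pearson correlation ranking, the helper names and the seed are hypothetical, and the three toy columns `a`, `b`, `c` are invented.

```python
import math
import random


def train_test_split(rows, test_size, seed=0):
    """Shuffle row indices and hold out `test_size` rows for testing."""
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)
    test = [rows[i] for i in indices[:test_size]]
    train = [rows[i] for i in indices[test_size:]]
    return train, test


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either input is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y) if sd_x and sd_y else 0.0


def rank_by_correlation(columns, target, top=10):
    """Rank predictor columns by absolute correlation with the target."""
    scores = {name: pearson(values, target) for name, values in columns.items()}
    return sorted(scores.items(), key=lambda kv: abs(kv[1]), reverse=True)[:top]


# 5,822 row ids with 1,000 held out, matching the sizes in the text.
train, test = train_test_split(list(range(5822)), test_size=1000)
print(len(train), len(test))  # 4822 1000

# Invented toy columns: 'a' tracks the target, 'b' is constant.
target = [0, 0, 1, 0, 1, 1]
columns = {"a": [1, 2, 5, 1, 6, 5], "b": [3, 3, 3, 3, 3, 3], "c": [5, 1, 2, 4, 3, 2]}
ranked = rank_by_correlation(columns, target)
print([name for name, _ in ranked])  # ['a', 'c', 'b']
```

Ranking by absolute value keeps strongly negative predictors in view as well as strongly positive ones.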

Figure 5. Model summary for the refined logistic regression model with formula: Purchase ~
PPERSAUT+APLEZIER+PWAPART+MKOOPKLA+PBRAND+MOPLLAAG+MINKGEM
(APERSAUT and AWAPART were removed from the model to reduce the AIC value further).
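For reference, the AIC used to compare the models is defined as 2k − 2 ln L, where k is the number of estimated parameters and L is the maximized likelihood. The sketch below evaluates it for a binary outcome from fitted probabilities. The function names are hypothetical and the four probabilities are invented; it is not the computation behind the report's value of 1949.1.

```python
import math


def bernoulli_log_likelihood(probabilities, actual):
    """Sum of log-likelihood terms for 0/1 outcomes given fitted probabilities."""
    return sum(
        math.log(p) if y == 1 else math.log(1.0 - p)
        for p, y in zip(probabilities, actual)
    )


def aic(log_likelihood, n_parameters):
    """Akaike information criterion: 2k - 2 ln L."""
    return 2 * n_parameters - 2 * log_likelihood


# Invented fitted probabilities for four observations.
probs = [0.9, 0.2, 0.7, 0.1]
actual = [1, 0, 1, 0]

log_lik = bernoulli_log_likelihood(probs, actual)
# k = 2 stands for an intercept plus one slope in this toy case.
value = aic(log_lik, n_parameters=2)
print(round(value, 3))
```

Because each extra parameter adds 2 to the score, dropping a predictor lowers the AIC whenever the log-likelihood falls by less than 1.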
We then use the refined logistic regression model to predict the probabilities of purchase in both the
test and train sets. These predictions are compared to the actual purchase values in Figure 6. We can
see that the model predicts slightly more purchases on the train set than on the test set. The
correlation between the predictions and the actual values was calculated to complement Figure 6. The
correlation between train predictions and actual purchases was 0.307, and the correlation between test
predictions and actual purchases was a little lower at 0.245, which may be considered acceptable. This
suggests that purchases are an infrequent event, as expected from the bar graph in Figure 2.

Figure 6. Comparison of predictions to actual purchases.

A threshold of 0.5 was set, i.e., when the predicted probability is above 0.5, the event is classified
as a purchase (1). The confusion matrix was then calculated: true negatives (5466), true positives (7),
false negatives (341), and false positives (8), as shown in Figure 7.

Figure 7. Confusion matrix containing both test and train sets.
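The thresholding rule and the four confusion-matrix cells can be sketched as below. This is an illustrative Python version, not the report's code: `confusion_counts` is a hypothetical helper, and the eight probabilities and outcomes are invented so the counts can be checked by hand.

```python
def confusion_counts(probabilities, actual, threshold=0.5):
    """Classify as 1 when probability > threshold, then tally the four cells."""
    counts = {"TN": 0, "FP": 0, "FN": 0, "TP": 0}
    for p, y in zip(probabilities, actual):
        predicted = 1 if p > threshold else 0
        if predicted == 1 and y == 1:
            counts["TP"] += 1
        elif predicted == 1 and y == 0:
            counts["FP"] += 1
        elif predicted == 0 and y == 1:
            counts["FN"] += 1
        else:
            counts["TN"] += 1
    return counts


# Invented predicted probabilities and true outcomes.
probs = [0.05, 0.10, 0.62, 0.30, 0.55, 0.20, 0.08, 0.45]
actual = [0, 0, 1, 1, 0, 1, 0, 1]

at_half = confusion_counts(probs, actual, threshold=0.5)
at_quarter = confusion_counts(probs, actual, threshold=0.25)
print(at_half)     # {'TN': 3, 'FP': 1, 'FN': 3, 'TP': 1}
print(at_quarter)  # {'TN': 3, 'FP': 1, 'FN': 1, 'TP': 3}
```

In this toy case, moving the threshold from 0.5 to 0.25 turns two false negatives into true positives without adding false positives; on real data a lower threshold usually trades some false positives for fewer false negatives.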

From the confusion matrix, it is evident that the logistic regression model produces a large number of
false negatives (341), while it is quite accurate at predicting true negatives (non-purchases).
Correspondingly, the sensitivity and specificity were found to be 0.020 and 0.998, respectively. The
high specificity indicates that the model is accurate in predicting non-purchases. However, because of
its low sensitivity, its predictions of purchases are not reliable: the model underestimates the
occurrence of purchases. The number of false negatives may be reduced by lowering the threshold below
0.5.
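The reported sensitivity and specificity can be checked directly from the confusion-matrix counts given above, using sensitivity = TP / (TP + FN) and specificity = TN / (TN + FP). The short Python check below uses hypothetical function names; only the four counts come from the report.

```python
def sensitivity(tp, fn):
    """Share of actual purchases that the model identifies."""
    return tp / (tp + fn)


def specificity(tn, fp):
    """Share of actual non-purchases that the model identifies."""
    return tn / (tn + fp)


# Counts reported with Figure 7.
tn, tp, fn, fp = 5466, 7, 341, 8

sens = sensitivity(tp, fn)
spec = specificity(tn, fp)
print(f"{sens:.4f}")  # 0.0201
print(f"{spec:.4f}")  # 0.9985
```

These values agree with the reported 0.020 and 0.998 up to rounding in the third decimal place, and the four counts sum to the 5,822 records in the dataset.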
