2.2 Nguyễn Ngọc Quỳnh Anh

10 points
Code Test - Exam No. 2

Nguyễn Ngọc Quỳnh Anh
2023-10-06
CALL PACKAGES
library(foreign)
library(car)
## Loading required package: carData
library(lmtest)
## Loading required package: zoo
##
## Attaching package: 'zoo'
## The following objects are masked from 'package:base':

##
## as.Date, as.Date.numeric
library(AER)
## Loading required package: sandwich
## Loading required package: survival
library(caret)
## Loading required package: ggplot2
## Loading required package: lattice
##
## Attaching package: 'caret'
## The following object is masked from 'package:survival':

##
## cluster
library(caTools)
INPUT DATA
setwd("D:/Kinh tế lượng/Kiểm tra Code")
leha=read.csv("leha.csv", header=TRUE)
1. Read the data set and split it into train.set and test.set in a 75:25 ratio.
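# Note: sample.split() expects a vector of labels (one element per row). Passing the
# whole data frame returns one flag per column (25), which subset() then recycles
# across the rows, so the split below is only approximately 75:25. Passing the
# outcome, e.g. sample.split(leha$default_prin, SplitRatio=0.75), splits row by row.
split=sample.split(leha,SplitRatio=0.75)
train.set=subset(leha, split==TRUE)
test.set=subset(leha,split==FALSE)
# Dimensions (number of rows and columns) of each data set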

dim(train.set)
## [1] 2927 25
dim(test.set)
## [1] 1139 25
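A row-level 75:25 split can also be drawn in base R. The following is a minimal sketch, not the code used for the results in this report: the data frame `toy` is a hypothetical stand-in for `leha`, and `set.seed()` is added so that the random split is reproducible.

```r
# Hypothetical stand-in for leha: 1000 rows with a binary outcome.
set.seed(123)                      # fix the random draw so the split is reproducible
toy <- data.frame(
  default_prin = rbinom(1000, 1, 0.2),
  age          = round(runif(1000, 20, 60))
)

# Draw 75% of the row indices for training; the remaining rows form the test set.
n         <- nrow(toy)
train_idx <- sample(seq_len(n), size = floor(0.75 * n))
train_toy <- toy[train_idx, ]
test_toy  <- toy[-train_idx, ]

dim(train_toy)   # 750 rows, 2 columns
dim(test_toy)    # 250 rows, 2 columns
```

Because every row is assigned exactly once, the two subsets are disjoint and their sizes add up to the full sample.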
2. Estimate a Logit model on train.set, with default_int as the dependent variable and
the remaining variables as independent variables.
model_logit=glm(default_prin~income2+age+newcustomer+notran+notran3+wpcompany
+wpmanager,data=train.set,family=binomial(link="logit"))
summary(model_logit)
##
## Call:
## glm(formula = default_prin ~ income2 + age + newcustomer + notran +
## notran3 + wpcompany + wpmanager, family = binomial(link = "logit"),
## data = train.set)
##
## Coefficients:
##              Estimate Std. Error z value Pr(>|z|)
## (Intercept) -0.207956   0.302096  -0.688 0.491215
## income2     -0.039067   0.012224  -3.196 0.001394 **
## age         -0.026203   0.007273  -3.603 0.000315 ***
## newcustomer  0.554572   0.123262   4.499 6.82e-06 ***
## notran       2.055847   0.684118   3.005 0.002655 **
## notran3     -0.038691   1.109877  -0.035 0.972191
## wpcompany   -0.760828   0.119056  -6.390 1.65e-10 ***
## wpmanager   -0.490889   0.119656  -4.102 4.09e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 2605.3 on 2925 degrees of freedom
## Residual deviance: 2441.6 on 2918 degrees of freedom
## (1 observation deleted due to missingness)
## AIC: 2457.6
##
## Number of Fisher Scoring iterations: 5
3. Adjust the model until it contains only statistically significant variables. The
final model is used to answer the remaining questions.
Running summary(model_logit) gives the estimates shown in Question 2.
The independent variable notran3 has a p-value of 0.972191 > 0.05, so notran3 is not
statistically significant at the 5% level. We therefore drop notran3 and re-estimate
the model as model_logit1, without notran3, as follows:
model_logit1=glm(default_prin~income2+age+newcustomer+notran+wpcompany+wpmanager,
data=train.set,family=binomial(link="logit"))
summary(model_logit1)
##
## Call:
## glm(formula = default_prin ~ income2 + age + newcustomer + notran +
## wpcompany + wpmanager, family = binomial(link = "logit"),
## data = train.set)
##
## Coefficients:
##              Estimate Std. Error z value Pr(>|z|)
## (Intercept) -0.208493   0.301711  -0.691 0.489543
## income2     -0.039057   0.012221  -3.196 0.001394 **
## age         -0.026205   0.007272  -3.603 0.000314 ***
## newcustomer  0.554959   0.122766   4.520 6.17e-06 ***
## notran       2.056096   0.684069   3.006 0.002650 **
## wpcompany   -0.760611   0.118893  -6.397 1.58e-10 ***
## wpmanager   -0.490690   0.119522  -4.105 4.04e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 2605.3 on 2925 degrees of freedom
## Residual deviance: 2441.6 on 2919 degrees of freedom
## (1 observation deleted due to missingness)
## AIC: 2455.6
##
## Number of Fisher Scoring iterations: 5
The results above show that every independent variable has a p-value < 0.05, so all
independent variables are statistically significant at the 5% level. The model has
therefore been adjusted appropriately.
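The decision to drop notran3 can be cross-checked with a likelihood-ratio test between the two nested models. In the output above both residual deviances print as 2441.6, so the test statistic is close to 0 on 1 degree of freedom. The following is a minimal sketch of that test on simulated data; the data frame `sim` and the variables `x1`, `x2` and `noise` are hypothetical and unrelated to leha.

```r
set.seed(1)
n   <- 500
sim <- data.frame(x1 = rnorm(n), x2 = rnorm(n), noise = rnorm(n))
# The outcome depends on x1 and x2 only; `noise` is irrelevant by construction.
sim$y <- rbinom(n, 1, plogis(-0.5 + 0.8 * sim$x1 - 0.6 * sim$x2))

full    <- glm(y ~ x1 + x2 + noise, data = sim, family = binomial(link = "logit"))
reduced <- glm(y ~ x1 + x2,         data = sim, family = binomial(link = "logit"))

# Likelihood-ratio statistic: drop in residual deviance when `noise` is added.
lr_stat <- deviance(reduced) - deviance(full)
lr_df   <- df.residual(reduced) - df.residual(full)
p_value <- pchisq(lr_stat, df = lr_df, lower.tail = FALSE)

# anova() reports the same comparison in a single call.
lr_table <- anova(reduced, full, test = "Chisq")
lr_table
```

A large p-value means the extra variable does not improve the fit noticeably, which supports the reduced model.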
4. Are the estimation results economically plausible?
Reading from the results above:
$\hat{\beta}_{\text{income2}} = -0.039057 < 0$ => When the customer's monthly income rises by 1 unit, the odds of not
repaying principal on time fall by $100 \cdot [1 - \exp(-0.039057)] \approx 3.830369\%$
⇨ This is economically plausible
100*(1-exp(model_logit1$coef[2]))
## income2
## 3.830369
## age
## 2.586504
⇨ When the customer's age rises by 1 unit, the odds of not repaying principal
on time fall by approximately 2.59% => Economically plausible
⇨ The other independent variables are likewise economically plausible. The values
below are computed the same way, so a negative number means the odds rise.
## newcustomer
## -74.18704
## notran
## -681.5399
## wpcompany
## 53.26194
## wpmanager
## 38.77963
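Since the logit model is linear in the log-odds, exp(β) is the odds ratio for a one-unit increase in a variable and 100·(exp(β) − 1) is the signed percentage change in the odds. The sketch below recomputes the figures above from the reported model_logit1 coefficients. The vector `b` is typed in by hand for illustration and is not the fitted object, and the sign convention is flipped relative to the output above, so a positive value here means the odds rise.

```r
# Coefficients copied from the summary(model_logit1) output.
b <- c(income2 = -0.039057, age = -0.026205, newcustomer = 0.554959,
       notran = 2.056096, wpcompany = -0.760611, wpmanager = -0.490690)

odds_ratio <- exp(b)                  # multiplicative change in the odds
pct_change <- 100 * (odds_ratio - 1)  # signed percentage change in the odds

round(cbind(odds_ratio, pct_change), 3)
```

Reading the table row by row avoids the sign confusion: an odds ratio below 1 lowers the odds of default, and one above 1 raises them.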
5. Compute the correct prediction rate on this set.
prob_L.train = predict(model_logit1, train.set,type="response")
prob_L.train.classes = ifelse(prob_L.train > 0.5, 1, 0)
TLDBD_train=mean(prob_L.train.classes == train.set$default_prin, na.rm=TRUE)
print("The correct prediction rate on train.set is")
## [1] "The correct prediction rate on train.set is"
TLDBD_train
## [1] 0.836637
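Accuracy at a 0.5 cutoff is easier to judge next to a confusion matrix and the share of the majority class, because with an imbalanced outcome a model can score well while predicting very few defaults. The following is a minimal sketch on simulated data; the data frame `sim` and the variable `x` are hypothetical and are not the leha data.

```r
set.seed(42)
n   <- 1000
sim <- data.frame(x = rnorm(n))
sim$y <- rbinom(n, 1, plogis(-1.5 + 1.0 * sim$x))   # imbalanced binary outcome

fit  <- glm(y ~ x, data = sim, family = binomial(link = "logit"))
prob <- predict(fit, newdata = sim, type = "response")
pred <- ifelse(prob > 0.5, 1, 0)

# Confusion matrix: rows are actual classes, columns are predicted classes.
conf_mat <- table(actual    = factor(sim$y, levels = c(0, 1)),
                  predicted = factor(pred,  levels = c(0, 1)))
conf_mat

accuracy <- sum(diag(conf_mat)) / sum(conf_mat)
baseline <- max(prop.table(table(sim$y)))   # accuracy of always predicting the majority class
c(accuracy = accuracy, baseline = baseline)
```

If the accuracy is barely above the baseline, the classifier adds little beyond always predicting the majority class.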
6. Compute the probability of failing to pay interest at the mean of income2 and the
mean of age, for a new customer who has transactions within 3 months and within
4 months. The variables are computed on test.set.
mean_income2=mean(test.set$income2)
mean_age=mean(test.set$age)
prob_L.train1=predict(model_logit1,
                      data.frame(income2 = mean_income2,
                                 age = mean_age,
                                 notran = 0,
                                 notran3 = 0,
                                 newcustomer = 1,
                                 wpcompany = 0,
                                 wpmanager = 0),
                      type = "response")
prob_L.train1
## 1
## 0.2679228
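The fitted probability is the logistic function of the linear predictor, p = 1 / (1 + exp(−x'β)). The sketch below applies this formula to the reported model_logit1 coefficients. The test-set means are not printed in this report, so `mean_income2_demo` and `mean_age_demo` are purely hypothetical placeholder values and the result is not expected to reproduce 0.2679228.

```r
# Coefficients copied from the summary(model_logit1) output.
b <- c("(Intercept)" = -0.208493, income2 = -0.039057, age = -0.026205,
       newcustomer = 0.554959, notran = 2.056096,
       wpcompany = -0.760611, wpmanager = -0.490690)

# Hypothetical placeholder means (NOT taken from test.set).
mean_income2_demo <- 10
mean_age_demo     <- 35

# Customer profile in the same order as b; the leading 1 multiplies the intercept.
x <- c(1, mean_income2_demo, mean_age_demo,
       1,   # newcustomer = 1
       0,   # notran = 0
       0,   # wpcompany = 0
       0)   # wpmanager = 0

eta <- sum(b * x)   # linear predictor (log-odds)
p   <- plogis(eta)  # same as 1 / (1 + exp(-eta))
p
```

With the fitted object available, `predict(..., type = "link")` returns the linear predictor and `type = "response")` returns this probability directly.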
