Computational Exercise - Linear Regression
Universidade Federal de Minas Gerais
April 19, 2017
Question 9
a)
> data(Auto)
> plot(Auto)
[Figure: scatterplot matrix produced by plot(Auto), with panels for mpg, cylinders, displacement, horsepower, weight, acceleration, year, origin, and name.]
b)
> auto_quant <- Auto[,1:8]
> corr_auto <- cor(auto_quant)
> corr_auto
             acceleration      year    origin
acceleration    1.0000000 0.2903161 0.2127458
year            0.2903161 1.0000000 0.1815277
origin          0.2127458 0.1815277 1.0000000
c)
> lmauto <- lm(formula = mpg~cylinders+displacement+horsepower+weight+acceleration+year+origin, data = auto_quant)
> summary(lmauto)
Call:
lm(formula = mpg ~ cylinders + displacement + horsepower + weight +
acceleration + year + origin, data = auto_quant)
Residuals:
Min 1Q Median 3Q Max
-9.5903 -2.1565 -0.1169 1.8690 13.0604
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -17.218435 4.644294 -3.707 0.00024 ***
cylinders -0.493376 0.323282 -1.526 0.12780
displacement 0.019896 0.007515 2.647 0.00844 **
horsepower -0.016951 0.013787 -1.230 0.21963
weight -0.006474 0.000652 -9.929 < 2e-16 ***
acceleration 0.080576 0.098845 0.815 0.41548
year 0.750773 0.050973 14.729 < 2e-16 ***
origin 1.426141 0.278136 5.127 4.67e-07 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
ii. The significance of each predictor for the response can be assessed by its
p-value. The variables "weight", "year", "origin", and "displacement" were
highly significant for the output, all with p-values below 1%. The variables
"cylinders", "horsepower", and "acceleration" were less significant, with
p-values above 10%, and therefore contribute less to the linear model.
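As a rough check of how these p-values arise, each one can be recomputed from the reported estimate and standard error. The sketch below does this for "displacement". The residual degrees of freedom are an assumption (392 complete observations in Auto minus 8 estimated coefficients), since the summary output above does not show them. The results should land close to the reported t value of 2.647 and p-value of 0.00844.

```r
# Recompute the t statistic and two-sided p-value for "displacement"
# from the values printed in the summary above.
estimate  <- 0.019896
std_error <- 0.007515
df_resid  <- 392 - 8   # assumption: 392 rows, 7 predictors + intercept

t_value <- estimate / std_error
p_value <- 2 * pt(-abs(t_value), df = df_resid)

print(round(t_value, 3))
print(signif(p_value, 3))
```

The same two lines applied to "cylinders", "horsepower", or "acceleration" give the larger p-values seen in the table, because their estimates are small relative to their standard errors.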
... an increase of about 0.75 miles per gallon in the fuel economy of the
vehicles produced.
d)
[Figure: diagnostic plots for the fitted model: residuals versus fitted values, standardized residuals versus theoretical quantiles, and standardized residuals versus leverage with Cook's distance contours. Labeled observations include 323, 326, 327, 394, and 14.]
e)
We examined how significant the interactions are among the variables that
showed low significance in the linear model without interactions: cylinders,
horsepower, and acceleration.
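To make the structure of this model explicit, the sketch below expands the interaction formula into its individual terms without fitting anything. The zero-row data frame `empty_auto` is a hypothetical stand-in that only carries the column names of Auto[, 1:8]; it is not part of the original analysis.

```r
# A zero-row data frame with the column names of Auto[, 1:8]; only the
# names matter here, because terms() just expands the formula.
auto_cols <- c("mpg", "cylinders", "displacement", "horsepower",
               "weight", "acceleration", "year", "origin")
empty_auto <- as.data.frame(setNames(rep(list(numeric(0)), 8), auto_cols))

interaction_formula <- mpg ~ . + acceleration * horsepower +
  horsepower * cylinders + cylinders * acceleration

# "a * b" expands to "a + b + a:b"; main effects already added by "."
# are not duplicated.
term_labels <- attr(terms(interaction_formula, data = empty_auto),
                    "term.labels")
print(term_labels)
```

The expansion yields the seven main effects plus three pairwise interaction terms, which matches the ten non-intercept rows of the coefficient table.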
Call:
lm(formula = mpg ~ . + acceleration * horsepower + horsepower *
cylinders + cylinders * acceleration, data = Auto[, 1:8])
Residuals:
Min 1Q Median 3Q Max
-9.6133 -1.5421 -0.0494 1.3463 12.0351
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 10.9574157 7.3479673 1.491 0.136732
cylinders -5.7441776 1.1502991 -4.994 9.04e-07 ***
displacement -0.0093760 0.0076153 -1.231 0.219006
horsepower -0.1999433 0.0546991 -3.655 0.000293 ***
weight -0.0033383 0.0006671 -5.004 8.60e-07 ***
acceleration -0.2275034 0.2544556 -0.894 0.371844
year 0.7344560 0.0447333 16.419 < 2e-16 ***
origin 0.8029378 0.2506548 3.203 0.001473 **
horsepower:acceleration -0.0070588 0.0025705 -2.746 0.006318 **
cylinders:horsepower 0.0375804 0.0045891 8.189 4.02e-15 ***
cylinders:acceleration 0.1261022 0.0619227 2.036 0.042397 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
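[Figure: diagnostic plots for the model with interactions: residuals versus fitted values, standardized residuals versus theoretical quantiles, and standardized residuals versus leverage with Cook's distance contours. Labeled observations include 112, 323, 327, 387, and 394.]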
f)
We evaluated several transformations of the two variables with the highest
p-values in the regression, "acceleration" and "horsepower".
[Figure: scatterplots of Auto$mpg against log(Auto$acceleration), sqrt(Auto$acceleration), (Auto$acceleration)^2, and Auto$acceleration, followed by the same four panels for Auto$horsepower.]
Call:
lm(formula = mpg ~ . + log(horsepower), data = Auto[, 1:8])
Residuals:
Min 1Q Median 3Q Max
-8.5777 -1.6623 -0.1213 1.4913 12.0230
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 8.674e+01 1.106e+01 7.839 4.54e-14 ***
cylinders -5.530e-02 2.907e-01 -0.190 0.849230
displacement -4.607e-03 7.108e-03 -0.648 0.517291
horsepower 1.764e-01 2.269e-02 7.775 7.05e-14 ***
weight -3.366e-03 6.561e-04 -5.130 4.62e-07 ***
acceleration -3.277e-01 9.670e-02 -3.388 0.000776 ***
year 7.421e-01 4.534e-02 16.368 < 2e-16 ***
origin 8.976e-01 2.528e-01 3.551 0.000432 ***
log(horsepower) -2.685e+01 2.652e+00 -10.127 < 2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
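Because this model contains both horsepower and log(horsepower), neither coefficient can be read on its own. As a hedged reading aid (not part of the original solution), the combined effect of horsepower on the fitted mpg, holding the other predictors fixed and using the rounded estimates from the table above, can be written as:

```latex
\frac{\partial\,\widehat{\mathrm{mpg}}}{\partial\,\mathrm{horsepower}}
  = 0.1764 - \frac{26.85}{\mathrm{horsepower}}
\qquad\Longrightarrow\qquad
\left.\frac{\partial\,\widehat{\mathrm{mpg}}}{\partial\,\mathrm{horsepower}}\right|_{\mathrm{horsepower}=100}
  = 0.1764 - 0.2685 = -0.0921
```

Under this reading the fitted effect is negative for horsepower below roughly 26.85 / 0.1764 ≈ 152 and flattens out as horsepower grows, which is the curvature the log term is meant to capture.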
[Figure: diagnostic plots for the model with log(horsepower): residuals versus fitted values, standardized residuals versus theoretical quantiles, and standardized residuals versus leverage with Cook's distance contours. Labeled observations include 14, 103, 310, 323, and 387.]
Question 10
a)
Call:
lm(formula = Sales ~ Price + Urban + US, data = Carseats)
Residuals:
Min 1Q Median 3Q Max
-6.9206 -1.6220 -0.0564 1.5786 7.0581
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 13.043469 0.651012 20.036 < 2e-16 ***
Price -0.054459 0.005242 -10.389 < 2e-16 ***
UrbanYes -0.021916 0.271650 -0.081 0.936
USYes 1.200573 0.259042 4.635 4.86e-06 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
b)
Price: According to the model, holding the other variables constant, a
one-unit increase in price lowers sales by about 0.054 thousand units
(roughly 54 units).
UrbanYes: According to the model, holding the other variables constant,
stores in urban locations sell about 0.022 thousand units (roughly 22 units)
fewer, although this coefficient is not statistically significant (p = 0.936).
c)
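$$\text{Sales} = -0.054\,\text{Price} - 0.022\,\text{Urban} + 1.20\,\text{US} + 13.04$$
where Urban and US are indicator variables equal to 1 for "Yes" and 0 for "No".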
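The fitted equation can be evaluated directly. The sketch below uses the unrounded coefficients from the summary in a); the helper `predict_sales` and the example store (urban, in the US, price of 100) are hypothetical and only illustrate the arithmetic.

```r
# Coefficients copied from the summary of Sales ~ Price + Urban + US.
coefs <- c(intercept = 13.043469, price = -0.054459,
           urban_yes = -0.021916, us_yes = 1.200573)

# Hypothetical helper: predicted Sales, in thousands of units.
# urban and us are 1 for "Yes" and 0 for "No".
predict_sales <- function(price, urban, us) {
  unname(coefs["intercept"] + coefs["price"] * price +
           coefs["urban_yes"] * urban + coefs["us_yes"] * us)
}

# Example: an urban US store charging a price of 100.
example_sales <- predict_sales(price = 100, urban = 1, us = 1)
print(example_sales)

# Change in unit sales for a one-unit price increase (Sales is in thousands).
units_per_price_unit <- unname(coefs["price"]) * 1000
print(units_per_price_unit)
```

For a fitted lm object the same prediction would normally come from predict() with a newdata data frame; the manual version is shown only to mirror the equation term by term.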
d)
The null hypothesis can be rejected for the variables "Price" and "USYes".
e)
A model can be built that uses only the variables "Price" and "US".
Call:
lm(formula = Sales ~ Price + US, data = Carseats)
Residuals:
Min 1Q Median 3Q Max
-6.9269 -1.6286 -0.0574 1.5766 7.0515
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 13.03079 0.63098 20.652 < 2e-16 ***
Price -0.05448 0.00523 -10.416 < 2e-16 ***
USYes 1.19964 0.25846 4.641 4.71e-06 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
f)
The smaller model obtained in e) fits the data slightly better than the model
obtained in a). This can be inferred from the residual standard error (RSE):
2.469 for e) and 2.472 for a).
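For reference, the RSE being compared is defined as below, where RSS is the residual sum of squares, n the number of observations, and p the number of predictors. The value n = 400 is an assumption about the Carseats data, since the summaries above do not show the degrees of freedom.

```latex
\mathrm{RSE} = \sqrt{\frac{\mathrm{RSS}}{n - p - 1}},
\qquad
n - p - 1 =
\begin{cases}
400 - 3 - 1 = 396 & \text{model in a) (Price, Urban, US)} \\
400 - 2 - 1 = 397 & \text{model in e) (Price, US)}
\end{cases}
```

Dropping Urban cannot lower the RSS of a nested least squares fit, so the slightly smaller RSE of the reduced model comes from the extra residual degree of freedom outweighing a negligible increase in RSS.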
g)
2.5 % 97.5 %
(Intercept) 11.79032020 14.27126531
Price -0.06475984 -0.04419543
USYes 0.69151957 1.70776632
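These intervals follow from the estimates and standard errors in e). As a minimal sketch, the interval for the intercept can be recomputed as estimate ± t quantile × standard error. The 397 residual degrees of freedom are an assumption (400 observations minus 3 coefficients), because the summary above does not show them.

```r
# Estimate and standard error of the intercept, copied from the
# summary of Sales ~ Price + US.
estimate  <- 13.03079
std_error <- 0.63098
df_resid  <- 400 - 3   # assumption: 400 rows, 2 predictors + intercept

# 95% interval: estimate +/- t quantile * standard error.
t_crit <- qt(0.975, df = df_resid)
ci <- estimate + c(-1, 1) * t_crit * std_error
print(round(ci, 4))
```

Up to rounding of the inputs, the result should agree with the (Intercept) row of the table above.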
h)
[Figure: diagnostic plots for the model Sales ~ Price + US: residuals versus fitted values, standardized residuals versus theoretical quantiles, and standardized residuals versus leverage with Cook's distance contours. Labeled observations include 26, 50, 51, 69, 368, and 377.]
Question 13
a)
> set.seed(1)
> X <- rnorm(100)
b)
> eps <- rnorm(100, sd = sqrt(0.25))
c)
> Y <- -1 + 0.5 * X + eps
> length(Y)
[1] 100
d)
[Figure: scatterplot of Y against X.]
e)
Call:
lm(formula = Y ~ X)
Residuals:
Min 1Q Median 3Q Max
-0.93842 -0.30688 -0.06975 0.26970 1.17309
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -1.01885 0.04849 -21.010 < 2e-16 ***
X 0.49947 0.05386 9.273 4.58e-15 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
f)
[Figure: scatterplot of Y against X with the least squares line and the population regression line (legend: Least Squares, Population).]
g)
Call:
lm(formula = Y ~ X + I(X^2))
Residuals:
Min 1Q Median 3Q Max
-0.98252 -0.31270 -0.06441 0.29014 1.13500
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.97164 0.05883 -16.517 < 2e-16 ***
X 0.50858 0.05399 9.420 2.4e-15 ***
I(X^2) -0.05946 0.04238 -1.403 0.164
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
h)
A new data set was generated with a noise standard deviation of 0.1, and
steps a-f were repeated.
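The steps for this lower-noise run can be sketched as follows. The seed is an assumption, since the source does not show which one was used here, so the estimates need not match the output below digit for digit; the plot colors and legend position are likewise illustrative choices.

```r
set.seed(1)                        # assumption: seed not shown in the source
X   <- rnorm(100)
eps <- rnorm(100, sd = 0.1)        # noise standard deviation reduced to 0.1
Y   <- -1 + 0.5 * X + eps

fit_low_noise <- lm(Y ~ X)
print(summary(fit_low_noise))

# Scatterplot with the least squares line and the population line.
plot(X, Y)
abline(fit_low_noise, col = "red")
abline(a = -1, b = 0.5, col = "blue")
legend("topleft", legend = c("Least Squares", "Population"),
       col = c("red", "blue"), lty = 1)
```

Changing `sd = 0.1` to `sd = 1` gives the corresponding higher-noise run.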
Call:
lm(formula = Y ~ X)
Residuals:
Min 1Q Median 3Q Max
-0.232416 -0.060361 0.000536 0.058305 0.229316
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.989115 0.009035 -109.48 <2e-16 ***
X 0.499907 0.009472 52.78 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
[Figure: scatterplot of Y against X for the lower-noise data, with the least squares line and the population regression line (legend: Least Squares, Population).]
i)
A new data set was generated with a noise standard deviation of 1, and
steps a-f were repeated.
Call:
lm(formula = Y ~ X)
Residuals:
Min 1Q Median 3Q Max
-2.32416 -0.60361 0.00536 0.58305 2.29316
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.89115 0.09035 -9.864 2.39e-16 ***
X 0.49907 0.09472 5.269 8.16e-07 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
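[Figure: scatterplot of Y against X for the higher-noise data, with the least squares line and the population regression line (legend: Least Squares, Population).]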
j)
Below are the confidence intervals for, respectively, the original model, the
model with less noise, and the model with more noise.
2.5 % 97.5 %
(Intercept) -1.1150804 -0.9226122
X 0.3925794 0.6063602
2.5 % 97.5 %
(Intercept) -1.0070441 -0.9711855
X 0.4811096 0.5187039
2.5 % 97.5 %
(Intercept) -1.0704405 -0.7118552
X 0.3110958 0.6870395
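To compare the three cases at a glance, the short sketch below takes the slope intervals printed above and computes their widths; the row labels are illustrative names, not identifiers from the original code.

```r
# Interval bounds for the slope X, copied from the three tables above.
slope_ci <- rbind(
  original   = c(lower = 0.3925794, upper = 0.6063602),
  less_noise = c(lower = 0.4811096, upper = 0.5187039),
  more_noise = c(lower = 0.3110958, upper = 0.6870395)
)

# Width of each interval.
widths <- slope_ci[, "upper"] - slope_ci[, "lower"]
print(widths)

# Check whether each interval contains the true slope of 0.5.
contains_true <- slope_ci[, "lower"] < 0.5 & slope_ci[, "upper"] > 0.5
print(contains_true)
```

The interval is narrowest for the low-noise data and widest for the high-noise data, and in all three cases it covers the population slope of 0.5.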