
2 Seatbelt saves lives - Time series

We plot the time series for the variable drivers to explore the pattern.

We first use a very simple linear model to check the data.


##     Jan  Feb  Mar  Apr  May  Jun  Jul  Aug  Sep  Oct  Nov  Dec
## 69 1687 1508 1507 1385 1632 1511 1559 1630 1579 1653 2152 2148
## 70 1752 1765 1717 1558 1575 1520 1805 1800 1719 2008 2242 2478
## 71 2030 1655 1693 1623 1805 1746 1795 1926 1619 1992 2233 2192
## 72 2080 1768 1835 1569 1976 1853 1965 1689 1778 1976 2397 2654
## 73 2097 1963 1677 1941 2003 1813 2012 1912 2084 2080 2118 2150
## 74 1608 1503 1548 1382 1731 1798 1779 1887 2004 2077 2092 2051
## 75 1577 1356 1652 1382 1519 1421 1442 1543 1656 1561 1905 2199
## 76 1473 1655 1407 1395 1530 1309 1526 1327 1627 1748 1958 2274
## 77 1648 1401 1411 1403 1394 1520 1528 1643 1515 1685 2000 2215
## 78 1956 1462 1563 1459 1446 1622 1657 1638 1643 1683 2050 2262
## 79 1813 1445 1762 1461 1556 1431 1427 1554 1645 1653 2016 2207
## 80 1665 1361 1506 1360 1453 1522 1460 1552 1548 1827 1737 1941
## 81 1474 1458 1542 1404 1522 1385 1641 1510 1681 1938 1868 1726
## 82 1456 1445 1456 1365 1487 1558 1488 1684 1594 1850 1998 2079
## 83 1494 1057 1218 1168 1236 1076 1174 1139 1427 1487 1483 1513
## 84 1357 1165 1282 1110 1297 1185 1222 1284 1444 1575 1737 1763
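
The monthly series printed above can be rebuilt and plotted along these lines. This is a minimal sketch: it assumes the data correspond to the drivers column of R's built-in Seatbelts dataset and that the series starts in January 1969. The name driversT is borrowed from the later model output.

```r
# Assumption: the report's data match R's built-in Seatbelts dataset.
driversT <- ts(as.numeric(Seatbelts[, "drivers"]),
               start = c(1969, 1), frequency = 12)

print(driversT)  # year-by-month table of the series

# Line plot of the monthly series to inspect level and seasonality
plot(driversT, ylab = "drivers", main = "Monthly drivers series")
```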

##
## Call:
## lm(formula = drivers ~ kms + petrol + law, data = seatbelt)
##
## Residuals:
## Min 1Q Median 3Q Max
## -436.76 -175.66 -52.41 164.94 781.80
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2.727e+03 1.699e+02 16.055 < 2e-16 ***
## kms -2.231e-02 6.956e-03 -3.207 0.00158 **
## petrol -6.743e+03 1.589e+03 -4.243 3.45e-05 ***
## law -1.988e+02 6.297e+01 -3.157 0.00186 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 239 on 188 degrees of freedom
## Multiple R-squared: 0.3299, Adjusted R-squared: 0.3192
## F-statistic: 30.85 on 3 and 188 DF, p-value: 2.901e-16

We notice that all three predictors are statistically significant, but the R-squared is low (0.33), so this model explains only about a third of the variance in drivers.
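
The linear fit above could be reproduced roughly as follows. The original code is not shown, so this is a sketch under assumptions: the data frame seatbelt is rebuilt here from the built-in Seatbelts dataset, and the petrol column is assumed to correspond to PetrolPrice.

```r
# Assumption: `seatbelt` mirrors the built-in Seatbelts series (192 monthly rows).
sb <- as.data.frame(Seatbelts)
seatbelt <- data.frame(
  drivers = sb$drivers,
  kms     = sb$kms,
  petrol  = sb$PetrolPrice,  # assumed mapping for the `petrol` column
  law     = sb$law
)

# Ordinary least squares with the three predictors used in the report
fit_lm <- lm(drivers ~ kms + petrol + law, data = seatbelt)

print(summary(fit_lm))
r2 <- summary(fit_lm)$r.squared  # share of variance explained
```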

We try to find a better model to explain the data, this time using dynlm.


##
## Time series regression with "numeric" data:
## Start = 1, End = 192
##
## Call:
## dynlm(formula = drivers ~ kms + petrol + law, data = seatbelt)
##
## Residuals:
## Min 1Q Median 3Q Max
## -436.76 -175.66 -52.41 164.94 781.80
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2.727e+03 1.699e+02 16.055 < 2e-16 ***
## kms -2.231e-02 6.956e-03 -3.207 0.00158 **
## petrol -6.743e+03 1.589e+03 -4.243 3.45e-05 ***
## law -1.988e+02 6.297e+01 -3.157 0.00186 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 239 on 188 degrees of freedom
## Multiple R-squared: 0.3299, Adjusted R-squared: 0.3192
## F-statistic: 30.85 on 3 and 188 DF, p-value: 2.901e-16

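The R-squared is still the same (0.3299): with an identical formula and no lag, trend, or seasonal terms, dynlm fits the same least-squares model as lm.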

We try to find an even better model using the glm command.


##
## Call:
## glm(formula = drivers ~ kms + petrol + law, data = seatbelt)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -436.76 -175.66 -52.41 164.94 781.80
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2.727e+03 1.699e+02 16.055 < 2e-16 ***
## kms -2.231e-02 6.956e-03 -3.207 0.00158 **
## petrol -6.743e+03 1.589e+03 -4.243 3.45e-05 ***
## law -1.988e+02 6.297e+01 -3.157 0.00186 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for gaussian family taken to be 57102.38)
##
## Null deviance: 16020031 on 191 degrees of freedom
## Residual deviance: 10735247 on 188 degrees of freedom
## AIC: 2653.7
##
## Number of Fisher Scoring iterations: 2

## Wald test:
## ----------
##
## Chi-squared test:
## X2 = 9473.3, df = 4, P(> X2) = 0.0

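The p-value of the Wald test indicates that the coefficients are jointly statistically significant. The large chi-squared statistic is evidence against the null hypothesis that all tested coefficients are zero; it does not measure goodness of fit. Since the coefficients are identical to those from lm, this model is no improvement on the first one.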
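
The glm fit and the Wald statistic can be checked by hand. The sketch below assumes the built-in Seatbelts data with petrol mapped to PetrolPrice (an assumption), and computes the Wald statistic directly as b' V^-1 b over all four coefficients; this may or may not be how the value reported above was obtained.

```r
# Assumption: `seatbelt` mirrors the built-in Seatbelts dataset.
sb <- as.data.frame(Seatbelts)
seatbelt <- data.frame(drivers = sb$drivers, kms = sb$kms,
                       petrol = sb$PetrolPrice, law = sb$law)

fit_lm  <- lm(drivers ~ kms + petrol + law, data = seatbelt)
fit_glm <- glm(drivers ~ kms + petrol + law, data = seatbelt)  # default family: gaussian

# With a Gaussian family and identity link, glm gives the same estimates as lm
same_fit <- isTRUE(all.equal(coef(fit_lm), coef(fit_glm), tolerance = 1e-6))

# Wald test of H0: all coefficients (including the intercept) are zero
b <- coef(fit_glm)
V <- vcov(fit_glm)
wald_stat <- drop(crossprod(b, solve(V, b)))
wald_p    <- pchisq(wald_stat, df = length(b), lower.tail = FALSE)
```

A small p-value here only says the coefficients are not all zero; comparing models is better done with the R-squared or AIC values reported in the summaries.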
We now plot the time series data for variables law and kms to explore the
pattern in the data.
We also plot the time series to explore the pattern for the variable petrol.

We try to understand the cross-correlation between drivers and law using the ccf function.

There seems to be a negative correlation, which suggests that implementing the seat belt law had a positive impact on driver safety.
We try to understand the cross-correlation between drivers and kms using the ccf function.

We see that more kms driven is associated with more accidents and deaths, which matches intuition.
We try to understand the cross-correlation between drivers and petrol using the ccf function.

With an increase in petrol consumption, the number of accidents increases, which also confirms a straightforward intuition from the data.
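
The three cross-correlation checks above can be sketched as follows. The series names follow the later model output; mapping them to columns of the built-in Seatbelts dataset (with petrolsT taken from PetrolPrice) is an assumption. The helper lag0 is hypothetical and only extracts the lag-0 value.

```r
# Assumption: series taken from R's built-in Seatbelts dataset.
driversT <- Seatbelts[, "drivers"]
lawsT    <- Seatbelts[, "law"]
kmssT    <- Seatbelts[, "kms"]
petrolsT <- Seatbelts[, "PetrolPrice"]  # assumed mapping

# ccf() estimates the correlation between x[t + k] and y[t] for each lag k
cc_law    <- ccf(lawsT,    driversT, plot = FALSE)
cc_kms    <- ccf(kmssT,    driversT, plot = FALSE)
cc_petrol <- ccf(petrolsT, driversT, plot = FALSE)

# Hypothetical helper: contemporaneous (lag 0) cross-correlation
lag0 <- function(cc) cc$acf[cc$lag == 0]

lag0_values <- c(law = lag0(cc_law), kms = lag0(cc_kms), petrol = lag0(cc_petrol))
print(lag0_values)
```

Calling plot(cc_law) draws the usual correlogram. Note that a trend shared by two series can dominate these correlations, so the sign at lag 0 is best read alongside the regression results.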

We now use the arima function to find better models. First we use only the
drivers and law time series variables.
## Series: driversT
## Regression with ARIMA(1,0,1)(0,1,1)[12] errors
##
## Coefficients:
## ar1 ma1 sma1 xreg
## 0.9338 -0.6020 -0.8596 -317.7305
## s.e. 0.0467 0.1151 0.0777 86.6232
##
## sigma^2 = 17346: log likelihood = -1139.85
## AIC=2289.7 AICc=2290.04 BIC=2305.66

The xreg coefficient suggests that around 318 fewer drivers are killed or seriously injured per month under the seatbelt safety law.

We compute a p-value from the coefficient and its standard error to check the statistical significance.


## [1] 0.0001591071

We notice that lawsT is statistically significant.
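
The fit and the significance check above could look roughly like this. The original code is not shown, so the sketch relies on assumptions: base R's arima with the model order reported in the output, ML estimation, data from the built-in Seatbelts dataset, and a two-sided z-test on the law coefficient. The numbers need not match the output above exactly.

```r
# Assumption: series taken from R's built-in Seatbelts dataset.
driversT <- Seatbelts[, "drivers"]
xreg_law <- cbind(law = as.numeric(Seatbelts[, "law"]))  # named regressor matrix

# Regression on `law` with ARIMA(1,0,1)(0,1,1)[12] errors
fit <- arima(driversT,
             order    = c(1, 0, 1),
             seasonal = list(order = c(0, 1, 1), period = 12),
             xreg     = xreg_law,
             method   = "ML")

# Two-sided z-test: estimate divided by its standard error
est <- coef(fit)["law"]
se  <- sqrt(diag(vcov(fit)))["law"]
z   <- est / se
p_two_sided <- 2 * pnorm(-abs(z))
```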


We now use more time series variables to prepare a better model and check
the statistical significance.
Here our xreg1, xreg2, xreg3 are variables lawsT, kmssT and petrolsT
respectively.
## Series: driversT
## Regression with ARIMA(1,0,3)(0,1,1)[12] errors
##
## Coefficients:
##          ar1      ma1     ma2      ma3     sma1    drift      xreg1   xreg2
##       0.9707  -0.6711  0.0074  -0.1753  -0.8873  -1.6752  -279.8448  0.0286
## s.e.  0.0321   0.0788  0.0977   0.0842   0.0896   1.2931    74.2275  0.0175
##           xreg3
##       -4856.641
## s.e.   1603.739
##
## sigma^2 = 16155: log likelihood = -1131.64
## AIC=2283.28 AICc=2284.58 BIC=2315.21

We calculate the statistical significance of each coefficient in the model.


##          ar1          ma1          ma2          ma3         sma1        drift
## 1.000000e+00 2.604823e-15 5.301317e-01 1.936732e-02 3.614849e-19 9.835698e-02
##        xreg1        xreg2        xreg3
## 1.091991e-04 9.483723e-01 1.402317e-03

We see that the lawsT and petrolsT regressors are statistically significant, while kmssT is not.
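
As a cross-check, the p-values can be recomputed from the estimates and standard errors reported in the coefficient table. Several of the printed values (for example 1.0 for ar1 and 0.53 for ma2) are close to what a lower-tail probability of estimate divided by standard error would give; that reading is an inference from the numbers, not something the source states. The sketch below shows both the lower-tail and the more conventional two-sided normal p-values.

```r
# Estimates and standard errors copied from the ARIMA(1,0,3)(0,1,1)[12] output
est <- c(ar1 = 0.9707, ma1 = -0.6711, ma2 = 0.0074, ma3 = -0.1753,
         sma1 = -0.8873, drift = -1.6752,
         xreg1 = -279.8448, xreg2 = 0.0286, xreg3 = -4856.641)
se  <- c(0.0321, 0.0788, 0.0977, 0.0842, 0.0896, 1.2931,
         74.2275, 0.0175, 1603.739)

z <- est / se

p_lower     <- pnorm(z)             # one-sided, lower tail
p_two_sided <- 2 * pnorm(-abs(z))   # two-sided

print(signif(cbind(z, p_lower, p_two_sided), 3))
```

Under the two-sided test, xreg1 (lawsT) and xreg3 (petrolsT) remain significant at the 5% level and xreg2 (kmssT) does not, so the conclusion about the regressors is unchanged; the main difference is that ar1 shows up as strongly significant rather than having a p-value of 1.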

We then fit a model with the tslm function and check statistical
significance.
##
## Call:
## tslm(formula = driversT ~ kmssT + petrolsT + lawsT + trend +
## season)
##
## Residuals:
## Min 1Q Median 3Q Max
## -322.51 -78.45 -16.15 75.26 311.01
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1676.7643 187.3162 8.952 4.86e-16 ***
## kmssT 0.0742 0.0162 4.580 8.77e-06 ***
## petrolsT -4792.2055 951.9724 -5.034 1.18e-06 ***
## lawsT -239.6581 37.3698 -6.413 1.27e-09 ***
## trend -4.2257 0.6937 -6.091 6.87e-09 ***
## season2 -172.2945 46.9082 -3.673 0.000318 ***
## season3 -267.6655 55.5676 -4.817 3.13e-06 ***
## season4 -412.4100 60.1768 -6.853 1.17e-10 ***
## season5 -357.2234 72.2698 -4.943 1.78e-06 ***
## season6 -418.4934 74.4215 -5.623 7.24e-08 ***
## season7 -430.7031 90.9932 -4.733 4.52e-06 ***
## season8 -452.9997 97.6762 -4.638 6.85e-06 ***
## season9 -271.2925 73.8564 -3.673 0.000318 ***
## season10 -80.5607 64.8660 -1.242 0.215905
## season11 233.0279 51.2927 4.543 1.03e-05 ***
## season12 399.9386 48.0259 8.328 2.24e-14 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 132.3 on 176 degrees of freedom
## Multiple R-squared: 0.8077, Adjusted R-squared: 0.7913
## F-statistic: 49.27 on 15 and 176 DF, p-value: < 2.2e-16

We see that the R-squared is much higher (0.81 versus 0.33), so this model explains the data best. The law coefficient is still statistically significant and negative, which points to a reduction in the number of drivers killed or seriously injured. In conclusion, I agree that implementing the seatbelt safety law has reduced the number of serious driver casualties.
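
A regression of this shape can also be sketched in base R without tslm, by adding the trend and seasonal terms explicitly. This is a swapped-in equivalent rather than the original code: trend is taken as a simple time index and season as a 12-level month factor, and the data are assumed to come from the built-in Seatbelts dataset with petrol mapped to PetrolPrice.

```r
# Assumption: series taken from R's built-in Seatbelts dataset.
driversT <- Seatbelts[, "drivers"]

d <- data.frame(
  drivers = as.numeric(driversT),
  kms     = as.numeric(Seatbelts[, "kms"]),
  petrol  = as.numeric(Seatbelts[, "PetrolPrice"]),  # assumed mapping
  law     = as.numeric(Seatbelts[, "law"]),
  trend   = seq_along(driversT),                     # linear time index 1..192
  season  = factor(as.integer(cycle(driversT)))      # month of year, 1..12
)

# Linear model with regressors, linear trend and monthly dummies
fit_ts <- lm(drivers ~ kms + petrol + law + trend + season, data = d)

print(summary(fit_ts))
r2_ts <- summary(fit_ts)$r.squared
```

The monthly dummies absorb the strong end-of-year peak visible in the raw series, which is why a model of this form explains far more variance than the three-predictor fit.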
