U.S. Department of Transportation
GROUP 4
A quick scatter plot reveals that there is indeed a relationship between the two variables.
Neither variable follows a normal distribution, as the histograms below show.
[Figure: histograms of the two variables; y-axis: Frequency]
We fitted a linear regression model to the data; the output is below.
The R Square value shows that the linear regression model, with the percentage of licensed drivers under the age of 21
as the predictor, explains 70% of the variation in fatal accidents.
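To make the R Square figure concrete, here is a minimal sketch of how a simple least-squares line and its R Square are computed, assuming one predictor and ordinary least squares (the same model as the regression output above). The data values and the names `fit_line`, `r_squared`, `pct_under_21` and `fatal_rate` are hypothetical and for illustration only; they are not the report's data set.

```python
# Simple least-squares fit and R^2, written out by hand.
# NOTE: the data below are made-up illustrative values, NOT the report's data.

def fit_line(x, y):
    """Return (intercept, slope) of the least-squares line y = b0 + b1*x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1

def r_squared(x, y, b0, b1):
    """Share of the variation in y explained by the line: 1 - SSE / SST."""
    y_bar = sum(y) / len(y)
    sst = sum((yi - y_bar) ** 2 for yi in y)
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    return 1 - sse / sst

# Hypothetical sample: percent of licensed drivers under 21 (x)
# and a fatal-accident measure (y).
pct_under_21 = [8, 10, 12, 14, 16]
fatal_rate = [1.0, 1.5, 1.7, 2.6, 3.0]

b0, b1 = fit_line(pct_under_21, fatal_rate)
r2 = r_squared(pct_under_21, fatal_rate, b0, b1)
print(f"intercept={b0:.3f} slope={b1:.3f} R^2={r2:.3f}")
```

An R Square of 0.70 in the report means the same ratio, 1 - SSE/SST, came out at about 0.70 for the real data.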
Below is the ANOVA table for the model:
df SS MS F Significance F
The F-test is significant (its p-value, Significance F, is small), and the p-values of the individual parameters are also significant.
[Figure: plot of Licenses against Percent Under 21]
[Figure: residual plot against Percent Under 21]
The line fit plot shows that the predicted values are close to the actual values. It also shows that the number of fatal
accidents increases as the percentage of licensed drivers under the age of 21 increases.
The residual plot shows no clear pattern, so we conclude that a linear regression model is appropriate for this problem.
We therefore reject the null hypothesis that the two variables are not linearly related and conclude that there is a
relationship between the number of fatal accidents and the percentage of licensed drivers under the age of 21.
3. What conclusion and recommendations can you derive from your analysis?
The regression analysis of the given data shows that the number of fatal accidents increases as the percentage of
licensed drivers under the age of 21 increases. We therefore recommend raising the minimum age for applying for a
driver’s license to improve transportation safety.