
GMDH SHELL

Summary of Regression Model Statistics

Adjusted R2 is computed using the formula:

Adjusted R2 = 1 − (1 − R2) × (n − 1) / (n − k − 1)

where n is the number of observations and k is the number of predictors. Given:

R2=0.781591
k=11
n=1731

Let's calculate the adjusted R2:

Adjusted R2 = 1 − (1 − 0.781591) × (1731 − 1) / (1731 − 11 − 1)

Adjusted R2 = 1 − (0.218409 × 1730) / 1719

Adjusted R2 = 1 − 377.848 / 1719

Adjusted R2 = 1 − 0.219807
Adjusted R2 ≈ 0.780193
The Adjusted R-Square indicates that the model explains approximately 78% of the variability in the
dependent variable, after adjusting for the number of predictors.
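The arithmetic above can be checked with a few lines of Python. This is only a minimal sketch; the function name adjusted_r2 is chosen here for illustration and is not part of GMDH Shell.

```python
def adjusted_r2(r2: float, n: int, k: int) -> float:
    """Adjusted R2 for n observations and k predictors."""
    if n - k - 1 <= 0:
        raise ValueError("need more observations than k + 1")
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)


# Values reported in this section: R2 = 0.781591, k = 11, n = 1731
print(round(adjusted_r2(0.781591, n=1731, k=11), 6))
```

With n this large relative to k, the adjustment is small, so the adjusted value stays close to the unadjusted R2.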

Goodness of Fit: Check how well the model's fitted line aligns with the actual data points.
A good fit means the model's predicted values are close to the observed values.

Here, most of the predicted values fall within the confidence bands, but none of
the actual data points lie within these bands, which suggests a potential discrepancy
between the model's predictions and the observed reality.

1. Model Accuracy vs. Variability: The model might be accurately predicting the
central tendency or overall trend of the data, as indicated by the majority of predicted
values falling within the confidence bands. However, the actual observed data points
deviate significantly from the predicted values, suggesting higher variability or noise
in the real-world data that the model isn't capturing.

Residuals Analysis Overview


Residuals – difference between observed and predicted values
- Examine the distribution and patterns of residuals to assess model performance
- Residuals largely fall along the mean line and within the std dev band: the model is doing a good job of
capturing variance in the data and makes consistent predictions across the dataset
- A few points outside the std dev bands might be outliers
Histogram – ideally should be a bell-shaped curve centered around zero
- The majority of residuals are distributed symmetrically around zero, resembling a normal
distribution
- The small cluster on the right side of the histogram might indicate slight skewness or the presence of
outliers or influential points
Autocorrelation – values close to 1 at the start of the graph indicate strong spatial autocorrelation
at shorter distances or smaller lags
- Most values below 0.05 suggest weaker spatial autocorrelation at larger distances or higher lags
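The three checks above (residuals, points outside the standard deviation band, and autocorrelation by lag) can be sketched in plain Python. The data below are made up for illustration, the function names are hypothetical, and the autocorrelation here treats the residuals as an ordered sequence, which is an assumption about how the GMDH Shell plot is built.

```python
from statistics import mean, pstdev


def residuals(observed, predicted):
    """Residual = observed value minus predicted value."""
    return [o - p for o, p in zip(observed, predicted)]


def outside_band(res, width=1.0):
    """Indices of residuals farther than `width` std devs from the mean."""
    m, s = mean(res), pstdev(res)
    return [i for i, r in enumerate(res) if abs(r - m) > width * s]


def autocorrelation(res, lag):
    """Sample autocorrelation of the residual sequence at a given lag."""
    m = mean(res)
    denom = sum((r - m) ** 2 for r in res)
    num = sum((res[i] - m) * (res[i + lag] - m) for i in range(len(res) - lag))
    return num / denom


# Made-up numbers, not taken from the model in this report
observed = [12, 15, 9, 20, 14, 11, 30, 13]
predicted = [11, 14, 10, 18, 15, 12, 19, 13]

res = residuals(observed, predicted)
print(res)
print(outside_band(res, width=2.0))
print([round(autocorrelation(res, lag), 3) for lag in range(4)])
```

Note that the autocorrelation at lag 0 is always 1 by construction, so a value near 1 at the very start of such a plot is expected; the informative part is how quickly the values fall off at higher lags.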

SPSS REGRESSION
Model results:

Log of Model

MODEL INFORMATION

Overview of the model settings and specifications such as the method used (Poisson regression),
model building procedure, and other basic information
CASE PROCESSING SUMMARY

Provides information about missing data, number of cases used in the analysis, and any
excluded cases due to missing values or other criteria.

This shows the proportion of included versus excluded cases.

All cases are included and there are no missing values.

Goodness of fit for Poisson Regression in SPSS


Assessments such as the deviance and Pearson chi-square statistics evaluate how well the model fits the data.
These help determine whether the model adequately explains the variability in the dependent variable.

Analysis:
The high values of deviance, Pearson chi-square, log-likelihood, AIC, AICC, BIC, and CAIC
in the Poisson regression model from SPSS point to concerns about both model fit and
complexity.
The deviance, Pearson chi-square, and log-likelihood values are crucial for assessing how well the model
fits the data.
Higher deviance and chi-square values indicate that the model fits the data less well than a perfect
(saturated) model would.
AIC, among other criteria, evaluates the trade-off between model fit and complexity. In our analysis,
higher AIC values suggest that the model, while attempting to explain the data, might be relatively more
complex. This complexity might not significantly improve the model's predictive ability compared to
simpler models.
Similar to AIC, BIC also addresses model fit and complexity. However, BIC tends to penalize complex
models more severely than AIC. The relatively higher BIC values observed in our Poisson regression
model signal that the model might lean towards being overly complex, possibly containing redundant or
less impactful variables.
These higher AIC and BIC values indicate that the current Poisson regression model, despite attempting
to explain the data, might struggle due to its complexity. Because both criteria are relative measures, this
reading is firmest when the values are compared with those of alternative models fitted to the same data.
It's essential to carefully evaluate the variables included in the model and consider simplifying it to
enhance interpretability and predictive accuracy.
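The trade-off between fit and complexity described above follows directly from the standard definitions AIC = 2k − 2 ln L and BIC = k ln(n) − 2 ln L, where k is the number of estimated parameters and ln L the log-likelihood. The sketch below uses hypothetical log-likelihoods for two candidate models; none of the numbers come from the SPSS output.

```python
import math


def aic(log_likelihood, k):
    """Akaike Information Criterion for k estimated parameters."""
    return 2 * k - 2 * log_likelihood


def bic(log_likelihood, k, n):
    """Bayesian Information Criterion for k parameters and n cases."""
    return k * math.log(n) - 2 * log_likelihood


# Hypothetical candidates: a larger model and a simpler one (n = 1731 cases)
n = 1731
candidates = {"full (k=12)": (-1000.0, 12), "reduced (k=6)": (-1003.0, 6)}

for name, (ll, k) in candidates.items():
    print(name, aic(ll, k), round(bic(ll, k, n), 2))
```

In this made-up case the reduced model has the lower AIC and BIC even though its log-likelihood is slightly worse, and the gap is wider for BIC because its per-parameter penalty, ln(n), exceeds 2 once n is larger than about 7.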

TEST OF MODEL EFFECTS

Statistical tests (like Wald tests) for each predictor variable to determine their significance in
predicting the dependent variable. This table helps assess the importance of each independent
variable.

Interpretation:
Intercept Significance: The intercept's significant Wald chi-square value indicates that the intercept differs
from zero, that is, the baseline level of the dependent variable with all predictors at zero is itself reliably estimated.
Independent Variables' Significance:
Variables with lower p-values (highly significant) show stronger statistical evidence of contributing to the
explanation of variability in the dependent variable.
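For a single coefficient, the Wald chi-square is the squared ratio of the estimate to its standard error, referred to a chi-square distribution with 1 degree of freedom. The sketch below shows that calculation with a hypothetical coefficient and standard error; it is not a reproduction of the SPSS table.

```python
import math


def wald_chi_square(coef, std_err):
    """Wald statistic for one coefficient: (B / SE)^2."""
    return (coef / std_err) ** 2


def p_value_1df(chi_sq):
    """Upper-tail p-value of a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi_sq / 2.0))


# Hypothetical estimate and standard error, for illustration only
w = wald_chi_square(coef=0.5, std_err=0.25)
print(w, round(p_value_1df(w), 4))
```

The p-value uses the identity that a chi-square variable with 1 degree of freedom is a squared standard normal, which is why math.erfc is sufficient here.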

Parameters of Estimate in Poisson Regression Model (SPSS)


Details about the estimated coefficients, standard errors, Wald statistics, and significance levels
for each predictor in the model. These coefficients indicate the strength and direction of the
relationship between predictors and the dependent variable.
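Assuming the model uses the usual log link for Poisson regression, a coefficient B is often easier to read after exponentiation: exp(B) is the multiplicative change in the expected count for a one-unit increase in the predictor. The sketch below computes that ratio and an approximate 95% Wald interval; the inputs are hypothetical and the function name is illustrative.

```python
import math


def rate_ratio(coef, std_err, z=1.96):
    """exp(B) with an approximate 95% Wald interval, assuming a log link."""
    return (
        math.exp(coef),
        math.exp(coef - z * std_err),
        math.exp(coef + z * std_err),
    )


# Hypothetical coefficient and standard error
ratio, lower, upper = rate_ratio(coef=0.10, std_err=0.03)
print(round(ratio, 3), round(lower, 3), round(upper, 3))
```

A ratio above 1 corresponds to a positive coefficient (the expected count increases), a ratio below 1 to a negative one, and an interval that excludes 1 matches a significant Wald test at roughly the 5% level.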

R REGRESSION
Model fit

The values—null deviance, residual deviance, and AIC—are used to assess the goodness
of fit in Poisson regression models. Here's what these values indicate:

1. Null Deviance (124436 on 1730 df): This represents the deviance for a model with
only the intercept (no predictors). It measures how well this intercept-only model fits
the observed data and serves as the baseline against which the model with predictors
is compared. A high null deviance means that the intercept-only model fits the
observed data poorly.
2. Residual Deviance (123832 on 1722 df): This is the deviance after fitting the model
with predictors. It measures how well the model with predictors fits the data. A lower
residual deviance compared to the null deviance indicates that the added predictors
contribute to explaining some of the variability in the dependent variable. Here the
reduction is 604 (124436 − 123832) on 8 degrees of freedom (1730 − 1722), which is
small relative to the deviance that remains.
3. AIC (135489): The Akaike Information Criterion balances model fit with complexity.
Lower AIC values indicate better-performing models relative to others. However, the
absolute value of AIC itself doesn’t provide much insight without comparison to
alternative models.
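The deviance figures quoted above can be combined into two quick checks: the drop in deviance achieved by the predictors, and the ratio of residual deviance to its degrees of freedom, which by a common rule of thumb is close to 1 for a well-fitting Poisson model. The sketch below is a rough illustration; the helper chi2_sf_even_df is written for this example and only handles even degrees of freedom.

```python
import math


def chi2_sf_even_df(x, df):
    """Upper-tail chi-square probability; closed form valid for even df only."""
    if df <= 0 or df % 2:
        raise ValueError("df must be a positive even integer")
    half = x / 2.0
    term, total = 1.0, 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


# Values reported in the R output above
null_dev, null_df = 124436, 1730
resid_dev, resid_df = 123832, 1722

drop = null_dev - resid_dev          # reduction in deviance from the predictors
df_used = null_df - resid_df         # degrees of freedom spent on predictors
dispersion = resid_dev / resid_df    # rule of thumb: near 1 for a good Poisson fit
p_value = chi2_sf_even_df(drop, df_used)

print(drop, df_used, round(dispersion, 1), p_value)
```

A ratio far above 1 is usually read as a sign of overdispersion, in which case the chi-square p-value for the deviance drop should be treated with caution, since it relies on the Poisson variance assumption.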

ANALYSIS:

Higher deviance values (both null and residual) might suggest that the model struggles
to explain the variability in the dependent variable with the given set of predictors. It
could indicate that the model, as constructed, might not capture all the important
factors influencing the occurrence of road accidents.

Higher AIC values suggest that the model might be relatively more complex or less
effective in explaining the data compared to alternative models.
The Normal Q-Q plot, when examining residuals in regression analysis, helps assess whether the
residuals (the differences between observed and predicted values) follow a normal distribution.

- The majority of residuals conform closely to the diagonal line, indicating that the residuals are
approximately normally distributed

- Deviations from the line indicate potential outliers, which affect the accuracy of the regression
model's predictions

"The Q-Q plot reveals that the majority of residuals conform closely to the diagonal line,
suggesting compliance with the normality assumption for most data points. However, deviations
beyond 17 standard deviations indicate potential outliers or heavy-tailed behavior, impacting
the model's reliability in predicting extreme values. Further analysis or sensitivity checks are
warranted to address this skewness and ensure robustness in predicting extreme outcomes."
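The points of a normal Q-Q plot can be computed by pairing the sorted, standardized residuals with the matching quantiles of the standard normal distribution. The following is a minimal sketch on made-up residuals; the plotting positions (i + 0.5) / n are one common convention and may differ from what R uses.

```python
from statistics import NormalDist, mean, pstdev


def qq_points(res):
    """Pairs of (theoretical normal quantile, standardized residual), sorted."""
    n = len(res)
    m, s = mean(res), pstdev(res)
    ordered = sorted((r - m) / s for r in res)
    std_normal = NormalDist()
    theoretical = [std_normal.inv_cdf((i + 0.5) / n) for i in range(n)]
    return list(zip(theoretical, ordered))


# Made-up residuals with one large positive value
sample = [-1.2, -0.4, 0.1, 0.3, 0.5, 0.9, 6.0]
for theo, obs in qq_points(sample):
    print(f"{theo:6.2f} {obs:6.2f}")
```

Points where the second column is far from the first are the ones that leave the diagonal line; in this toy sample that is the largest residual, which sits well above its theoretical quantile.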
- The scale-location plot helps evaluate whether the residuals' spread remains consistent
across different levels of the predicted values.
- Expected Behavior: In an ideal scenario, the points on the plot would scatter randomly
around a horizontal line (the red line) with no discernible pattern, indicating consistent
variance in residuals across predicted values (homoscedasticity).

Observations:
- Scattered Points: The fact that the points are widely scattered and don't conform to the
red line suggests that the variance of the residuals is not consistent across the predicted
values.
- Potential Issues: The lack of a consistent spread of residuals around the red line might
indicate heteroscedasticity, where the variability of residuals changes across the range of
predicted values.
Heteroscedasticity could affect the reliability of statistical tests, confidence intervals, and
predictions made by the regression model, particularly for extreme predicted values.
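A scale-location plot shows the square root of the absolute standardized residuals against the fitted values. One crude numeric counterpart, sketched below on made-up data, is to compare the average of that quantity between the lower and upper halves of the fitted values; the function names and the half-split are illustrative choices, not a formal test.

```python
from math import sqrt
from statistics import mean


def scale_location(fitted, std_residuals):
    """Points of the scale-location plot: (fitted, sqrt(|std residual|))."""
    return [(f, sqrt(abs(r))) for f, r in zip(fitted, std_residuals)]


def spread_by_half(fitted, std_residuals):
    """Mean sqrt(|residual|) for the lower and upper half of fitted values."""
    pts = sorted(scale_location(fitted, std_residuals))
    mid = len(pts) // 2
    low = mean([y for _, y in pts[:mid]])
    high = mean([y for _, y in pts[mid:]])
    return low, high


# Made-up fitted values and standardized residuals
fitted = [1, 2, 3, 4, 5, 6]
std_res = [0.1, -0.2, 0.1, 1.0, -2.0, 3.0]

low, high = spread_by_half(fitted, std_res)
print(round(low, 2), round(high, 2))
```

Roughly equal values in the two halves would be consistent with constant variance, while a clear gap, as in this toy data, points toward heteroscedasticity.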

The residuals vs. fitted values plot in Poisson regression helps assess the relationship
between the Pearson residuals (standardized residuals) and the predicted (fitted) values
from the model.

- Nonlinear Pattern: The lines on the plot serve as references for assessing the fit of the
residuals. The dashed line represents the expected pattern, and the red line shows the
trend actually observed in the residuals. The downward curve of the red line and its
divergence from the dashed line indicate a deviation from the expected relationship
between residuals and predicted values.
- Heteroscedasticity Concerns:
  - The scatter of points between residuals of -15 and 15 suggests a wide spread of residuals
across different levels of predicted values. This wide dispersion could indicate
heteroscedasticity or varying levels of variability in the residuals across the range of
predicted values.
  - The scattered points across a wide range of residuals may signal unequal variance or
heteroscedasticity, which can affect the model's reliability, particularly for extreme
predicted values.
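For a Poisson model the Pearson residual of an observation is (y − μ) / sqrt(μ), where y is the observed count and μ the fitted mean, because the Poisson variance equals the mean. The sketch below computes these residuals and their sum of squares (the Pearson chi-square) for made-up counts.

```python
from math import sqrt


def pearson_residuals(observed, fitted):
    """Pearson residuals for a Poisson model: (y - mu) / sqrt(mu)."""
    return [(y - mu) / sqrt(mu) for y, mu in zip(observed, fitted)]


def pearson_chi_square(observed, fitted):
    """Sum of squared Pearson residuals."""
    return sum(r * r for r in pearson_residuals(observed, fitted))


# Made-up counts and fitted means
observed = [9, 4, 1, 16]
fitted = [4.0, 4.0, 4.0, 9.0]

print([round(r, 2) for r in pearson_residuals(observed, fitted)])
print(round(pearson_chi_square(observed, fitted), 2))
```

If the Poisson assumption held, most Pearson residuals would be expected to lie within a few units of zero, so a spread as wide as -15 to 15 is itself a warning sign.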
The Cook's distance graph assesses influential observations in regression analysis. The specific observations
and their corresponding Cook's distances provide insight into the potential impact of individual data
points on the regression model.

- Observation 1071 (Cook's Distance: 6): This observation stands out with the highest
Cook's distance of 6, signifying it as a highly influential data point. Excluding it might
significantly change the estimated regression coefficients or have a substantial effect
on the model's fit.
- Observations 417 and 299 (Cook's Distances: 0.9 and 0.8): These observations also
exhibit relatively high Cook's distances, indicating moderate influence on the
regression model compared to others, but not as much as Observation 1071.
- Observation Influence: Higher Cook's distances for specific observations suggest that
these data points might have a considerable effect on the regression model's coefficients
or predictions.
- Model Sensitivity: A few observations having disproportionately high Cook's distances
could potentially affect the overall model's performance if these observations carry
substantial weight in influencing the regression results.
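One simple way to organize the observations named above is to compare their Cook's distances against two widely used rule-of-thumb cut-offs, 4/n and 1. The sketch below does this; the cut-offs are conventions rather than formal tests, the function name is illustrative, and n = 1731 is inferred from the 1730 null degrees of freedom reported earlier.

```python
def flag_influential(cooks, n, strict_cutoff=1.0):
    """Split observations by two rule-of-thumb cut-offs (4/n and 1)."""
    loose_cutoff = 4.0 / n
    high = {i: d for i, d in cooks.items() if d > strict_cutoff}
    moderate = {
        i: d for i, d in cooks.items() if loose_cutoff < d <= strict_cutoff
    }
    return high, moderate


# Cook's distances as read from the plot described above
cooks = {1071: 6.0, 417: 0.9, 299: 0.8}

high, moderate = flag_influential(cooks, n=1731)
print("above 1:", high)
print("between 4/n and 1:", moderate)
```

A common follow-up is to refit the model without the flagged observations and check how much the coefficients move.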
