Heteroscedasticity
Heteroscedasticity means that the variance of the residuals is not constant across observations. It is
a problem because ordinary least squares (OLS) regression assumes that all residuals are drawn from a
population that has a constant variance (homoscedasticity). To satisfy the regression assumptions and
be able to trust the results, the residuals should have a constant variance.
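Stated in symbols, the assumption looks like the following. The notation is an assumption made for
illustration and does not appear in the original notes: \varepsilon_i is the error of observation i,
x_i its explanatory variables, and \sigma^2 a variance.

```latex
\begin{align*}
\mathrm{Var}(\varepsilon_i \mid x_i) &= \sigma^2 \quad \text{for all } i
  && \text{(homoscedasticity)} \\
\mathrm{Var}(\varepsilon_i \mid x_i) &= \sigma_i^2, \quad \sigma_i^2 \text{ not all equal}
  && \text{(heteroscedasticity)}
\end{align*}
```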
Heteroscedasticity, also spelled heteroskedasticity, occurs more often in datasets that have a large
range between the largest and smallest observed values. While there are numerous reasons why
heteroscedasticity can exist, a common explanation is that the error variance changes proportionally
with a factor. This factor might be a variable in the model.
In some cases, the size of the error increases proportionally with this factor in absolute terms but
remains constant as a percentage. For instance, a 10% change in a number such as 100 is much smaller in
absolute terms than a 10% change in a large number such as 100,000. In this scenario, you expect to see
larger residuals associated with higher values. That is why you need to be careful when working with
wide ranges of values!
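The 10% example can be checked with a small simulation. This is only a sketch under stated
assumptions: the function name `simulate_residual_spread`, the normally distributed errors, the
sample size, and the seed are all illustrative choices, not part of the original notes.

```python
import random


def simulate_residual_spread(levels, pct_error=0.10, n=2000, seed=0):
    """Return the sample standard deviation of simulated errors at each level.

    Assumption: the error at a given level is level * pct_error * z,
    where z is a standard normal draw (a constant-percentage error).
    """
    rng = random.Random(seed)
    spread = {}
    for level in levels:
        errors = [level * pct_error * rng.gauss(0.0, 1.0) for _ in range(n)]
        mean = sum(errors) / n
        variance = sum((e - mean) ** 2 for e in errors) / (n - 1)
        spread[level] = variance ** 0.5
    return spread


spread = simulate_residual_spread([100, 100_000])
for level, sd in spread.items():
    # The absolute spread grows with the level; the percentage spread does not.
    print(f"level={level:>7}  error SD={sd:10.2f}  SD as % of level={100 * sd / level:5.2f}%")
```

The absolute spread at 100,000 comes out roughly a thousand times larger than at 100, while the spread
as a percentage of the level stays near 10% in both cases.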
Because large ranges are associated with this problem, some types of models are more prone to
heteroscedasticity.
Cross-sectional studies often have very small and very large values and, thus, are more likely to have
heteroscedasticity: the larger disparity between the largest and smallest values raises the risk of
residuals with non-constant variance.
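A residual plot can suggest (but not prove) heteroscedasticity. A residual plot is created by plotting
the residuals against the fitted values or an explanatory variable; a fan or cone shape suggests
non-constant variance. Formal tests for heteroscedasticity include:
1. Park Test.
2. White Test.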
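As a rough illustration of the first test in the list, the Park test regresses the log of the squared
OLS residuals on the log of an explanatory variable; a slope that is clearly different from zero
points to heteroscedasticity. The sketch below assumes a single, strictly positive explanatory
variable. The helper names `ols_line` and `park_test` and the simulated data are hypothetical.

```python
import math
import random


def ols_line(x, y):
    """Fit y = a + b*x by OLS; return (a, b, residuals, standard error of b)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = y_bar - b * x_bar
    resid = [yi - (a + b * xi) for xi, yi in zip(x, y)]
    s2 = sum(e ** 2 for e in resid) / (n - 2)  # residual variance
    return a, b, resid, math.sqrt(s2 / sxx)


def park_test(x, y):
    """Regress ln(e^2) on ln(x); return the slope and its t statistic."""
    _, _, resid, _ = ols_line(x, y)
    log_x = [math.log(xi) for xi in x]          # requires x > 0
    log_e2 = [math.log(e ** 2) for e in resid]  # requires nonzero residuals
    _, slope, _, se_slope = ols_line(log_x, log_e2)
    return slope, slope / se_slope


# Hypothetical data: one series whose error SD grows with x, one with constant SD.
rng = random.Random(42)
x = [rng.uniform(1.0, 100.0) for _ in range(500)]
y_hetero = [2.0 + 3.0 * xi + rng.gauss(0.0, 0.5 * xi) for xi in x]
y_homo = [2.0 + 3.0 * xi + rng.gauss(0.0, 5.0) for xi in x]

slope_h, t_h = park_test(x, y_hetero)
slope_0, t_0 = park_test(x, y_homo)
print(f"error SD grows with x: slope={slope_h:.2f}, t={t_h:.2f}")
print(f"constant error SD:     slope={slope_0:.2f}, t={t_0:.2f}")
```

A large t statistic for the slope in the first series, compared with a small one in the second, is the
pattern the test looks for. In practice the t statistic would be compared with a critical value, which
this sketch leaves out.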
Consequences of Heteroscedasticity
OLS will not give you the estimator with the smallest variance (i.e. the estimators are no longer
efficient, although the coefficient estimates themselves remain unbiased).
Significance tests will report significance that is either too high or too low.
Standard errors will be biased, along with their corresponding test statistics and confidence
intervals.
If your data is heteroscedastic, it would be inadvisable to run regression on the data as is. There are a
couple of things you can try if you need to run regression:
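One commonly used option is weighted least squares, which gives observations with a large error
variance less weight. The sketch below is a minimal illustration, not a definitive method: it assumes
the error standard deviation is proportional to x, so the weights 1/x^2 are an assumption that only
suits that case. The function name `wls_line` and the simulated data are hypothetical.

```python
import random


def wls_line(x, y, w):
    """Fit y = a + b*x by weighted least squares with weights w; return (a, b)."""
    sw = sum(w)
    x_bar = sum(wi * xi for wi, xi in zip(w, x)) / sw  # weighted mean of x
    y_bar = sum(wi * yi for wi, yi in zip(w, y)) / sw  # weighted mean of y
    sxx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - x_bar) * (yi - y_bar) for wi, xi, yi in zip(w, x, y))
    b = sxy / sxx
    a = y_bar - b * x_bar
    return a, b


# Hypothetical data: true line y = 2 + 3x, error SD proportional to x.
rng = random.Random(7)
x = [rng.uniform(1.0, 100.0) for _ in range(500)]
y = [2.0 + 3.0 * xi + rng.gauss(0.0, 0.5 * xi) for xi in x]

# Equal weights reproduce ordinary least squares.
ols_a, ols_b = wls_line(x, y, [1.0] * len(x))
# Weights of 1/x^2 down-weight the noisy observations at large x.
wls_a, wls_b = wls_line(x, y, [1.0 / xi ** 2 for xi in x])

print(f"OLS: intercept={ols_a:.3f}, slope={ols_b:.3f}")
print(f"WLS: intercept={wls_a:.3f}, slope={wls_b:.3f}")
```

Passing equal weights recovers the ordinary least squares fit, so the same function can be used to
compare the two estimates side by side.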
Sources:
https://statisticsbyjim.com/regression/heteroscedasticity-regression
https://www.statisticshowto.datasciencecentral.com/heteroscedasticity-simple-definition-examples