
Heteroscedasticity:

#Heteroscedasticity means unequal scatter. In #regression analysis, we talk about heteroscedasticity in the context of the #residuals or #error term. Specifically, heteroscedasticity is a systematic change in the spread of the residuals over the range of measured values.

Heteroscedasticity is a problem because ordinary least squares (OLS) regression assumes that all
residuals are drawn from a population that has a constant variance (homoscedasticity).

#To_trust_the_results_of_regression_analysis, #there_should_not_be_heteroscedasticity.

To satisfy the regression assumptions and be able to trust the results, the residuals should have a
constant variance.

What Causes Heteroscedasticity?

Heteroscedasticity, also spelled heteroskedasticity, occurs more often in datasets that have a large
range between the largest and smallest observed values. While there are numerous reasons why
heteroscedasticity can exist, a common explanation is that the #error_variance_changes_proportionally
with a factor. This factor might be a variable in the model.

In some cases, the error remains constant as a percentage of the underlying value, so its absolute size (and hence the variance) increases proportionally with this factor. For instance, a 10% change in a small number such as 100 is much smaller in absolute terms than a 10% change in a large number such as 100,000. In this scenario, you expect to see larger residuals associated with higher values. That’s why you need to be careful when working with wide ranges of values!
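As a toy illustration (the numbers here are made up), an error that is fixed as a percentage grows dramatically in absolute terms across a wide range of values:

```python
import numpy as np

# Hypothetical measured values spanning a wide range
values = np.array([100.0, 1_000.0, 100_000.0])

# A 10% error is constant as a percentage of each value...
pct_error = 0.10

# ...but its absolute size grows with the value, which is
# exactly the fan-out pattern heteroscedastic residuals show.
abs_errors = pct_error * values
```

At the top of the range, the absolute error is 1,000 times larger than at the bottom, even though the percentage error never changed.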

Because large ranges are associated with this problem, some types of models are more prone to
heteroscedasticity.

#Cross_sectional_studies often contain both very small and very large values and, thus, are more #likely_to_have_heteroscedasticity.

#Cross-sectional studies have a larger risk of residuals with #non-constant variance because of the larger
disparity between the largest and smallest values.

How to Detect Heteroscedasticity?

A residual plot can suggest (but not prove) heteroscedasticity. Residual plots are created by:

1. Calculating the squared residuals.
2. Plotting the squared residuals against an explanatory variable (one that you think is related to the errors).
3. Making a separate plot for each explanatory variable you think is contributing to the errors.

You don’t have to do this manually; most statistical software (e.g. SPSS, Maple) has commands to create residual plots.

Several tests can also be run:

1. Park Test.
2. White Test.

Consequences of Heteroscedasticity

Severely heteroscedastic data can give you a variety of problems:

• OLS will not give you the estimator with the smallest variance (i.e. your estimates will be less precise than they could be).
• Significance tests will run either too high or too low.
• Standard errors will be biased, along with their corresponding test statistics and confidence intervals.

How to Deal with Heteroscedastic Data?

If your data is heteroscedastic, it would be inadvisable to run a regression on the data as is. There are a couple of things you can try if you need to run a regression:

1. Give less weight to data that produces a large scatter (weighted regression).
2. Transform the Y variable to achieve homoscedasticity. For example, use the #Box-Cox normality plot to choose a transformation of the data.

Sources:

https://statisticsbyjim.com/regression/heteroscedasticity-regression

https://www.statisticshowto.datasciencecentral.com/heteroscedasticity-simple-definition-examples
