W6 - L5 - Assumptions of Regression
Regression models rely on four key assumptions, and understanding these is crucial for valid
interpretation and reliable results.
Normality:
Linear regression assumes that the errors, that is, the differences between the actual and
predicted values, follow a normal distribution. Given a predicted value (Ŷ), the errors in
estimating the outcome variable (Y) should be distributed in a symmetrical, bell-shaped
pattern centred on zero.
Linearity:
The assumption of linearity states that the relationship between the dependent variable (Y)
and the independent variable(s) (X) is linear, or at least approximately so. In practical terms,
a one-unit change in a predictor is associated with the same change in the outcome variable
wherever it occurs in the predictor's range, so the relationship can be adequately represented
by a straight line or a close approximation.
Homoscedasticity:
The term homoscedasticity originates from the Greek words 'homos,' signifying 'same,'
and 'skedastikos,' meaning 'scattering' or 'dispersion.' In simpler terms, it means
'having the same scatter or variance.' In the context of linear regression, a crucial
assumption is that the error terms exhibit consistent variance across all observations:
given a predicted value (Ŷ), the errors in estimation should display equal variability, so the
spread of residuals derived from the linear regression model should be uniform.
In practical terms, homoscedasticity ensures that the variability of the errors remains constant
across the range of predicted values. This assumption is vital for the reliability of the regression
model. When it is violated (heteroscedasticity), the coefficient estimates remain unbiased but
become inefficient, and the usual standard errors, and therefore the confidence intervals and
significance tests built on them, become unreliable. Assessing homoscedasticity is thus a
critical step in validating the robustness of linear regression analyses.
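One rough way to put a number on "equal spread" is to compare the residual variance for low predicted values with the residual variance for high predicted values. The sketch below is only an illustration of that idea, not a formal test and not a method taken from these notes; the function name `spread_ratio` and the sample data are hypothetical.

```python
import statistics


def spread_ratio(fitted, residuals):
    """Variance of residuals in the upper half of fitted values
    divided by the variance in the lower half (hypothetical helper)."""
    if len(fitted) != len(residuals):
        raise ValueError("fitted and residuals must have the same length")
    if len(fitted) < 4:
        raise ValueError("need at least 4 observations")
    # Order the residuals by their predicted value.
    ordered = sorted(zip(fitted, residuals), key=lambda pair: pair[0])
    half = len(ordered) // 2
    low = [r for _, r in ordered[:half]]
    high = [r for _, r in ordered[-half:]]
    # Raises ZeroDivisionError if the lower half has no spread at all.
    return statistics.variance(high) / statistics.variance(low)


# Made-up example: residuals fan out as the predicted value grows.
fitted = [1, 2, 3, 4, 5, 6, 7, 8]
residuals = [0.1, -0.1, 0.1, -0.1, 1.0, -1.0, 1.0, -1.0]
print(spread_ratio(fitted, residuals))
```

A ratio near 1 is consistent with homoscedasticity, while a ratio far above or below 1 suggests that the spread changes with the predicted value. How far is "too far" depends on the sample size, so the number is best read alongside a plot of residuals against predicted values.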
Independence of variance:
Independence of variance means that the error associated with one data point should not
depend on the errors of other data points; in other words, the errors should be uncorrelated
with one another. This assumption is most at risk when observations are collected over time or
in groups, where neighbouring errors tend to resemble each other. Violations can compromise
the statistical validity of the model, leading to inaccurate inferences and conclusions, so
confirming independence is a critical step in the evaluation of regression results.
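When the observations have a natural order (for example, they were recorded over time), one common way to check whether neighbouring errors are related is the Durbin-Watson statistic. This statistic is not mentioned in these notes; it is offered here only as a minimal sketch, assuming the residuals are listed in their original order. The function name and sample data are hypothetical.

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic: sum of squared successive differences
    of the residuals divided by the sum of squared residuals."""
    if len(residuals) < 2:
        raise ValueError("need at least 2 residuals")
    numerator = sum(
        (residuals[i] - residuals[i - 1]) ** 2
        for i in range(1, len(residuals))
    )
    denominator = sum(e ** 2 for e in residuals)
    return numerator / denominator


# Made-up residuals, in the order the observations were collected.
residuals = [0.5, -0.3, 0.2, -0.4, 0.1, 0.3, -0.2, -0.1]
print(durbin_watson(residuals))
```

The statistic ranges from 0 to 4. Values near 2 are consistent with uncorrelated neighbouring errors, values well below 2 point to positive correlation, and values well above 2 point to negative correlation.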
These assumptions provide the foundation for the accuracy and reliability of regression models.
Violations of these assumptions can lead to biased results and inaccurate conclusions.
Therefore, it is essential to assess and validate these assumptions to ensure the robustness of
your regression analysis.
Assumptions in Excel
1. Normality – Histograms
• Select Data: Highlight the column of data for which you want to create a histogram. For
the normality assumption, this is the column of errors (actual minus predicted values).
• Insert Histogram: Go to the "Insert" tab in the Excel ribbon.
• Chart Options: In the Charts group, select the "Insert Statistic Chart" option. Depending
on your Excel version, this might be labeled as "Statistical Chart" or "Histogram."
• Choose Histogram: From the options presented, choose the histogram chart type.
• Visual Inspection: Visually inspect the histogram. A roughly bell-shaped curve
indicates a normal distribution.
• Check Skewness and Kurtosis: Additionally, you can calculate skewness and kurtosis
using Excel functions (SKEW and KURT). For a normal distribution, both values should be
close to 0, because KURT reports excess kurtosis (the normal distribution's kurtosis of 3
is already subtracted).
• Assess Normality: Based on the visual inspection and calculated values, you can make
an initial assessment of normality.
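The SKEW and KURT calculation can also be reproduced outside Excel. The following is a minimal Python sketch, assuming the errors are available as a plain list; the function names `excel_skew` and `excel_kurt` and the sample data are hypothetical, and the formulas are the small-sample adjusted versions documented for Excel's SKEW and KURT.

```python
import statistics


def excel_skew(values):
    """Sample skewness with the same adjustment as Excel's SKEW."""
    n = len(values)
    if n < 3:
        raise ValueError("need at least 3 values")
    mean = statistics.fmean(values)
    s = statistics.stdev(values)  # sample standard deviation (n - 1)
    total = sum(((x - mean) / s) ** 3 for x in values)
    return n / ((n - 1) * (n - 2)) * total


def excel_kurt(values):
    """Sample excess kurtosis with the same adjustment as Excel's KURT."""
    n = len(values)
    if n < 4:
        raise ValueError("need at least 4 values")
    mean = statistics.fmean(values)
    s = statistics.stdev(values)
    total = sum(((x - mean) / s) ** 4 for x in values)
    scale = n * (n + 1) / ((n - 1) * (n - 2) * (n - 3))
    correction = 3 * (n - 1) ** 2 / ((n - 2) * (n - 3))
    return scale * total - correction


# Made-up errors for illustration only.
errors = [-2.0, -1.0, 0.0, 1.0, 2.0]
print(excel_skew(errors), excel_kurt(errors))
```

Both functions return values near 0 for data that look roughly normal. With small samples the values can wander noticeably even when the underlying distribution is normal, so they are best treated as a supplement to the histogram rather than a verdict.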
2. Linearity – Scatter Plot
• Select Data: Highlight the columns containing your X and Y data.
• Insert Scatter Plot: Go to the "Insert" tab in the Excel ribbon.
• Choose Scatter Plot: In the Charts group, select the "Scatter" or "Scatter Plot"
option. Choose the type of scatter plot that best fits your needs. For linearity
assessment, a simple scatter plot will suffice.
• Format Scatter Plot: Excel will generate a default scatter plot. You may need to
format it for better clarity.
• Add Trendline: To assess linearity, you can add a trendline to the scatter plot.
Right-click on a data point, choose "Add Trendline," and select the type of trendline
(linear, exponential, etc.). Display the equation on the chart if you want to see the
formula of the trendline.
• Visual Inspection: Visually inspect the scatter plot and trendline. Data points that fall
evenly on both sides of a straight trendline, with no visible curve, suggest a linear
relationship.
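Excel's linear trendline is a least-squares line, so the equation it displays can be cross-checked by hand. The sketch below shows one way to do that in Python for a single predictor; the function names `fit_line` and `r_squared` and the sample data are hypothetical and not part of the Excel workflow.

```python
import statistics


def fit_line(x, y):
    """Least-squares slope and intercept for one predictor."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = statistics.fmean(x)
    mean_y = statistics.fmean(y)
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def r_squared(x, y, slope, intercept):
    """Share of the variation in y explained by the fitted line."""
    mean_y = statistics.fmean(y)
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


# Made-up data for illustration only.
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
slope, intercept = fit_line(x, y)
print(slope, intercept, r_squared(x, y, slope, intercept))
```

A high R-squared alone does not prove the relationship is linear, since a gently curved pattern can still score well. The differences between each actual y and the line's prediction are more telling: if they switch from mostly positive to mostly negative (or the reverse) along the x range, a straight line is probably not adequate.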