
Assumptions of Linear Regression

Regression models rely on four key assumptions, and understanding these is crucial for valid
interpretation and reliable results.

Normality:
One assumption of a linear regression model is that the errors, that is, the differences
between the actual and predicted values, follow a normal distribution. This implies that, for
any given predicted outcome (Ŷ), the errors in estimating the outcome variable (Y) should be
distributed in a bell-shaped, symmetrical pattern centered on zero. Note that the assumption
concerns the errors, not the raw values of Y or of the predictors.

Linearity:
The assumption of linearity states that the relationship between the dependent variable
(Y) and the independent variable(s) (X) is linear, or at least approximately so. In practical
terms, a one-unit change in a predictor is associated with the same change in the expected
outcome wherever in the predictor's range it occurs, so the relationship can be adequately
represented by a straight line or a close approximation.
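
As a minimal sketch of what a straight-line relationship implies, the Python snippet below fits a
least-squares line by hand and compares the predicted change for a one-unit step at the low and
high ends of X. The data (hours studied and exam scores) and the names fit_line and predict are
hypothetical, introduced only for illustration.

```python
from statistics import mean

def fit_line(x, y):
    """Return (intercept, slope) of the least-squares line y = intercept + slope * x."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope

# Hypothetical data: hours studied (x) and exam score (y).
hours = [1, 2, 3, 4, 5, 6]
score = [52, 55, 61, 64, 70, 73]

intercept, slope = fit_line(hours, score)

def predict(x):
    """Predicted score for x hours under the fitted line."""
    return intercept + slope * x

# Under a linear model, a one-unit step in X changes the prediction
# by the slope, no matter where along the X range the step is taken.
step_low = predict(2) - predict(1)
step_high = predict(6) - predict(5)
print(f"slope={slope:.3f}, step at low X={step_low:.3f}, step at high X={step_high:.3f}")
```

Both steps equal the slope, which is exactly what the linearity assumption asserts about the
expected outcome.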

Homoscedasticity:
The term homoscedasticity (also spelled homoskedasticity) originates from the Greek words
'homos,' signifying 'same,' and 'skedastikos,' meaning 'scattering' or 'dispersion'; it means
'having the same scatter or variance.' In the context of linear regression, the assumption is
that the error terms have the same variance across all observations: given any predicted value
(Ŷ), the errors in estimation should display equal variability. In other words, the spread of
the residuals from the linear regression model should be uniform.

In practical terms, homoscedasticity means that the variability of the errors remains constant
across the range of predicted values. This assumption matters for the reliability of the
regression model: when the error variances are unequal (heteroscedasticity), the coefficient
estimates remain unbiased but become inefficient, and the usual standard errors are biased, so
confidence intervals and significance tests can mislead. Therefore, assessing homoscedasticity
is a critical step in validating a linear regression analysis.
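
One informal way to put a number on "equal spread" is to compare the residual variance in the
lower half of the fitted values with that in the upper half. The Python sketch below does this;
the function name spread_ratio and the residual values are hypothetical, and the ratio is a
rough screening aid under those assumptions, not a formal statistical test.

```python
from statistics import variance

def spread_ratio(fitted, residuals):
    """Residual variance in the upper half of fitted values divided by the lower half."""
    pairs = sorted(zip(fitted, residuals))      # order observations by fitted value
    half = len(pairs) // 2
    low = [r for _, r in pairs[:half]]          # residuals at small fitted values
    high = [r for _, r in pairs[-half:]]        # residuals at large fitted values
    return variance(high) / variance(low)

# Hypothetical fitted values with two alternative sets of residuals.
fitted = [10.0, 20.0, 30.0, 40.0, 50.0, 60.0, 70.0, 80.0]
even_resid = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0, 1.0, -1.0]   # same scatter throughout
cone_resid = [1.0, -1.0, 2.0, -2.0, 4.0, -4.0, 8.0, -8.0]   # scatter grows with fitted value

ratio_even = spread_ratio(fitted, even_resid)
ratio_cone = spread_ratio(fitted, cone_resid)
print(f"even spread ratio: {ratio_even:.2f}")
print(f"cone spread ratio: {ratio_cone:.2f}")
```

A ratio near 1 is consistent with homoscedasticity, while a ratio far from 1 (in either
direction) suggests the spread changes with the predicted value.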

Independence of variance
The assumption of independence of variance, more commonly called independence of errors,
states that the error associated with one data point should not depend on the errors from other
data points. It is most often violated when observations have a natural order or grouping, such
as measurements taken over time, where neighboring errors tend to be similar. Violations can
compromise the statistical validity of the model, often by making standard errors too small,
and so lead to inaccurate inferences. Therefore, confirming independence is a critical step in
the evaluation of regression results.
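
When the observations have a natural order (for example, time), one common numeric check is
the Durbin-Watson statistic, which compares each residual with the one before it. The Python
sketch below computes it by hand; the residual series are hypothetical and chosen to show the
two extremes.

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic: sum of squared successive differences over sum of squares."""
    num = sum((residuals[i] - residuals[i - 1]) ** 2 for i in range(1, len(residuals)))
    den = sum(e ** 2 for e in residuals)
    return num / den

# Hypothetical residuals, listed in the order the observations were collected.
trending = [-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0]     # neighbors are similar
alternating = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0]       # neighbors flip sign

dw_trend = durbin_watson(trending)
dw_alt = durbin_watson(alternating)
print(f"trending residuals:    DW = {dw_trend:.3f}")
print(f"alternating residuals: DW = {dw_alt:.3f}")
```

The statistic ranges from 0 to 4. Values near 2 are consistent with independent errors, values
well below 2 suggest that neighboring errors move together, and values well above 2 suggest
that they alternate.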

These assumptions provide the foundation for the accuracy and reliability of regression models.
Violations of these assumptions can lead to biased results and inaccurate conclusions.
Therefore, it is essential to assess and validate these assumptions to ensure the robustness of
your regression analysis.

Assumptions in Excel
1. Normality – Histograms
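• Select Data: Highlight the column of data for which you want to create a histogram. To check
the regression assumption, this should be the column of residuals (see section 3 for how to
produce them).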
• Insert Histogram: Go to the "Insert" tab in the Excel ribbon.
• Chart Options: In the Charts group, select the "Insert Statistic Chart" option. Depending
on your Excel version, this might be labeled as "Statistical Chart" or "Histogram."
• Choose Histogram: From the options presented, choose the histogram chart type. In
some versions, you might find it under "Histogram."
• Visual Inspection: Visually inspect the histogram. A roughly bell-shaped curve
indicates a normal distribution.
• Check Skewness and Kurtosis: Additionally, you can calculate skewness and kurtosis
using Excel functions (SKEW and KURT). For a normal distribution, both values should be
close to 0. Note that KURT reports excess kurtosis, so it is 0, not 3, for a normal distribution.
• Assess Normality: Based on the visual inspection and calculated values, you can make
an initial assessment of normality.
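
For readers who want to see what the SKEW and KURT functions calculate, the Python sketch below
applies the bias-adjusted sample formulas documented for those two functions. The function names
and the residual values are hypothetical, chosen only to illustrate a symmetric and a
right-skewed case.

```python
from statistics import mean, stdev

def sample_skewness(values):
    """Bias-adjusted sample skewness (the formula documented for Excel's SKEW)."""
    n = len(values)
    m, s = mean(values), stdev(values)
    return n / ((n - 1) * (n - 2)) * sum(((v - m) / s) ** 3 for v in values)

def sample_excess_kurtosis(values):
    """Bias-adjusted sample excess kurtosis (the formula documented for Excel's KURT)."""
    n = len(values)
    m, s = mean(values), stdev(values)
    fourth = sum(((v - m) / s) ** 4 for v in values)
    return (n * (n + 1) / ((n - 1) * (n - 2) * (n - 3)) * fourth
            - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3)))

# Hypothetical residuals.
symmetric = [-2, -1, -1, 0, 0, 0, 1, 1, 2]     # mirror image around zero
right_skewed = [-1, -1, -1, -1, 0, 0, 1, 3]    # one large positive residual

skew_sym = sample_skewness(symmetric)
kurt_sym = sample_excess_kurtosis(symmetric)
skew_right = sample_skewness(right_skewed)
print(f"symmetric:    skewness={skew_sym:.3f}, excess kurtosis={kurt_sym:.3f}")
print(f"right-skewed: skewness={skew_right:.3f}")
```

With samples this small the values are only a rough guide, so they are best read alongside the
histogram rather than in place of it.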
2. Linearity – Scatter Plot
• Select Data: Highlight the columns containing your X and Y data.
• Insert Scatter Plot: Go to the "Insert" tab in the Excel ribbon.
• Choose Scatter Plot: In the Charts group, select the "Scatter" or "Scatter Plot"
option. Choose the type of scatter plot that best fits your needs. For linearity
assessment, a simple scatter plot will suffice.
• Format Scatter Plot: Excel will generate a default scatter plot. You may need to
format it for better clarity.
• Add Trendline: To assess linearity, you can add a trendline to the scatter plot.
Right-click on a data point, choose "Add Trendline," and select the type of trendline
(linear, exponential, etc.); for a linearity check, choose "Linear." Display the equation
on the chart if you want to see the formula of the trendline.
• Visual Inspection: Visually inspect the scatter plot and trendline. A clear, linear
pattern in the data points suggests a linear relationship.
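
The trendline check above can be sketched numerically in Python: fit the linear trendline,
compute R-squared, and look at the residuals. The function name linear_trendline and both data
sets are hypothetical. The second data set is deliberately curved to show that a high R-squared
alone does not establish linearity.

```python
from statistics import mean

def linear_trendline(x, y):
    """Return (slope, intercept, r_squared, residuals) for a least-squares line."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    ss_res = sum(r ** 2 for r in residuals)           # unexplained variation
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)       # total variation
    return slope, intercept, 1 - ss_res / ss_tot, residuals

x = [1, 2, 3, 4, 5]
straight_y = [3, 5, 7, 9, 11]    # hypothetical: exactly y = 1 + 2x
curved_y = [1, 4, 9, 16, 25]     # hypothetical: y = x squared

_, _, r2_straight, res_straight = linear_trendline(x, straight_y)
slope_c, intercept_c, r2_curved, res_curved = linear_trendline(x, curved_y)

print(f"straight data: R^2 = {r2_straight:.3f}")
print(f"curved data:   R^2 = {r2_curved:.3f}, residuals = {res_curved}")
```

For the curved data the residuals are positive at both ends and negative in the middle. That
U-shaped pattern signals curvature even though R-squared is high, which is why the visual
inspection matters.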

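3. Independence of Variance and Homoscedasticity
• Go to the "Data" tab, click on "Data Analysis," and choose "Regression" from the list.
(The "Data Analysis" button appears only when the Analysis ToolPak add-in is enabled.)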
• In the Regression dialog box, enter the input range for the Y variable
(dependent) and X variable (independent).
• Check the boxes for "Residuals" and "Residual Plots."
• Specify where you want the output to be displayed. Choose a cell for the output
to start, or create a new worksheet.
• Click "OK" to run the regression analysis with residuals.
• The output will include a column of residuals, which are the differences
between observed and predicted values.
• Visually inspect the residual plot for patterns.
• Ideally, residuals should be randomly scattered around zero, which is consistent with
independence of variance.
• Look for any discernible patterns, such as a trend, a wave, or long runs of residuals with
the same sign, which may indicate a violation of the independence of variance assumption.
A cone shape points instead to unequal variance (heteroscedasticity).
• Create a scatter plot of residuals against the predicted values (fitted values).
• Look for a consistent spread of residuals across all levels of the predicted values.
• A relatively constant spread suggests that the assumption of homoscedasticity is met. If
the spread of residuals widens or narrows systematically across predicted values, it may
indicate heteroscedasticity, a violation of the assumption.
