
BUSINESS ANALYTICS MODULE 5

Multiple Regression

→ We use single variable linear regression to investigate the relationship between a dependent variable and one
independent variable.
• A coefficient in a single variable linear regression characterizes the gross relationship between the
independent variable and the dependent variable.

→ We use multiple regression to investigate the relationship between a dependent variable and multiple
independent variables.

→ The structure of the multiple regression equation is ŷ = a + b₁x₁ + b₂x₂ + … + bₖxₖ.


• The true relationship between multiple variables is described by y = α + β₁x₁ + β₂x₂ + … + βₖxₖ + ε, where ε is the error
term. The idealized equation that describes the true regression model is y = α + β₁x₁ + β₂x₂ + … + βₖxₖ.
• Coefficients in multiple regression characterize relationships that are net with respect to the independent
variables included in the model but gross with respect to all omitted independent variables.

→ Forecasting with a multiple regression equation is similar to forecasting with a single variable linear model.
However, instead of entering only one value for a single independent variable, we input a value for each of the
independent variables.
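
The sketch below is a minimal Python illustration of fitting a multiple regression and forecasting one new observation; it is not part of the course's Excel workflow, and the data, the variable names "sales", "price", and "ad_spend", and the use of the statsmodels library are all assumptions made for the example. Later sketches in this summary reuse the df and model objects defined here.

    # Minimal sketch: fit a multiple regression and forecast one new observation.
    # The DataFrame and variable names below are hypothetical.
    import pandas as pd
    import statsmodels.api as sm

    df = pd.DataFrame({
        "sales":    [120, 135, 150, 160, 172, 181, 195, 210],
        "price":    [10.0, 9.8, 9.5, 9.4, 9.1, 9.0, 8.8, 8.5],
        "ad_spend": [2.0, 2.4, 2.1, 2.8, 2.6, 3.2, 3.0, 3.6],
    })

    X = sm.add_constant(df[["price", "ad_spend"]])   # adds the intercept term a
    y = df["sales"]
    model = sm.OLS(y, X).fit()                       # coefficients estimated by least squares

    # Forecasting: supply one value for EACH independent variable.
    new_point = pd.DataFrame({"const": [1.0], "price": [8.7], "ad_spend": [3.8]})
    print(model.predict(new_point))
    print(model.summary())   # coefficients, p-values, R-squared, Adjusted R-squared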

→ As with single variable linear regression, it is important to evaluate several metrics to determine whether a multiple
variable linear regression model is a good fit for our data.
• For multiple regression we rely less on scatter plots and more on numerical values and residual plots because
visualizing three or more variables can be difficult.

→ Because R² never decreases when independent variables are added to a regression, it is important to multiply it by
an adjustment factor when assessing and comparing the fit of a multiple regression model. This adjustment factor
compensates for the increase in R² that results solely from increasing the number of independent variables.
• Adjusted R² is provided in the regression output.
• It is particularly important to look at Adjusted R², rather than R², when comparing regression models with
different numbers of independent variables.
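
The exact adjustment is not spelled out above; as a hedged reference, the formula most statistical packages report for Adjusted R² is sketched below in Python, with n standing for the number of observations and k for the number of independent variables.

    # Standard Adjusted R-squared formula: 1 - (1 - R^2)(n - 1)/(n - k - 1),
    # where n is the number of observations and k the number of independent variables.
    def adjusted_r_squared(r_squared: float, n: int, k: int) -> float:
        return 1 - (1 - r_squared) * (n - 1) / (n - k - 1)

    # Example: R^2 = 0.90 with 30 observations and 4 independent variables.
    print(adjusted_r_squared(0.90, n=30, k=4))   # about 0.884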

→ In addition to analyzing Adjusted R², we must test whether the relationship between the independent and
dependent variables is linear and significant. We do this by analyzing the regression’s residual plots and the p-
values associated with each independent variable’s coefficient.

→ For multiple regression models, because it is difficult to view the data in a simple scatter plot, residual plots are an
indispensable tool for detecting whether the linear model is a good fit.
• There is a residual plot for each independent variable included in the regression model.
• We can graph a residual plot for each independent variable to help detect patterns such as
heteroskedasticity and nonlinearity.


• As with single variable regression models, if the underlying relationship is linear, the residuals follow a normal
distribution with a mean of zero and fixed variance.
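
Continuing the earlier Python sketch (the df and model objects are assumed from that hypothetical example), the loop below draws one residual plot per independent variable with matplotlib.

    # One residual plot per independent variable; look for fanning
    # (heteroskedasticity) or curvature (nonlinearity) around the zero line.
    import matplotlib.pyplot as plt

    residuals = model.resid
    for column in ["price", "ad_spend"]:       # hypothetical variable names
        plt.figure()
        plt.scatter(df[column], residuals)
        plt.axhline(0, linestyle="--")
        plt.xlabel(column)
        plt.ylabel("Residual")
        plt.title("Residuals vs. " + column)
    plt.show()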

→ We should also analyze the p-values of the independent variables to determine whether there is a significant
relationship between the variables in the model. If an independent variable’s p-value is less than 0.05, we can be
95% confident that there is a significant linear relationship between that independent variable and the dependent
variable.
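
Again reusing the earlier hypothetical model, the coefficient p-values can be pulled straight from the fitted statsmodels results rather than read off the summary table.

    # p-value for the intercept and for each independent variable's coefficient.
    print(model.pvalues)
    significant = model.pvalues[model.pvalues < 0.05]
    print(significant.index.tolist())   # variables significant at the 95% confidence level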

→ Multiple regression requires us to be aware of the possibility of multicollinearity among the independent variables.
• Multicollinearity occurs when there is a strong linear relationship among two or more of the independent
variables.
• Indications of multicollinearity include seeing an independent variable’s p-value increase when one or more
other independent variables are added to a regression model.
• We may be able to reduce multicollinearity by either increasing the sample size or removing one (or more) of
the collinear variables.
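
One hedged way to spot multicollinearity in the earlier hypothetical example is to inspect the pairwise correlations and variance inflation factors (VIFs) of the independent variables; VIFs are not covered in the summary above, so treat this purely as an optional diagnostic sketch.

    # High pairwise correlations or large VIFs suggest collinear independent variables.
    from statsmodels.stats.outliers_influence import variance_inflation_factor

    predictors = df[["price", "ad_spend"]]
    print(predictors.corr())

    X_vif = sm.add_constant(predictors)
    for i, name in enumerate(X_vif.columns):
        if name != "const":
            print(name, variance_inflation_factor(X_vif.values, i))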

→ Dummy variables and lagged variables can be useful in regression models; a short construction sketch follows this list.


• Multiple regression models allow us to include multiple dummy variables for categorical data—day of week,
for example.
→ A dummy variable is equal to 1 when the variable of interest fits a certain criterion. For example, a dummy
variable for “Saturday” would equal 1 for observations related to Saturdays and 0 for observations related
to all other days.
→ The number of dummy variables we include must always be one fewer than the number of options in a
category.
• We can also include lagged variables in multiple regression models. Lagged values are used to capture the
ongoing effects of a given variable.
→ The lag period is based on managerial insight and data availability.
→ Including lagged variables has some drawbacks:
• Each lagged variable decreases our sample size (one observation is lost for each period lagged).
• If the lagged variable does not increase the model’s explanatory power, the addition of the variable
decreases Adjusted R².
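
The sketch below shows one way to build dummy and lagged variables with pandas before running a regression; the daily sales data and column names are hypothetical, and the course itself performs these steps in Excel.

    # Hypothetical daily data used to illustrate dummy and lagged variables.
    import pandas as pd

    daily = pd.DataFrame({
        "date": pd.date_range("2024-01-01", periods=10, freq="D"),
        "sales": [100, 98, 120, 130, 125, 140, 160, 155, 150, 170],
        "ad_spend": [1.0, 1.2, 1.1, 1.5, 1.4, 1.6, 1.8, 1.7, 1.9, 2.0],
    })

    # Day-of-week dummies; drop_first=True keeps one fewer dummy than categories.
    daily["day"] = daily["date"].dt.day_name()
    dummies = pd.get_dummies(daily["day"], drop_first=True)

    # One-period lag of ad spend; the first row has no lagged value, so the
    # usable sample shrinks by one observation per period lagged.
    daily["ad_spend_lag1"] = daily["ad_spend"].shift(1)

    model_data = pd.concat([daily, dummies], axis=1).dropna()
    print(model_data.head())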


EXCEL SUMMARY

Recall the Excel functions and analyses covered in this course and make sure to familiarize yourself with all of the
necessary steps, syntax, and arguments. We have provided some additional information for the more complex
functions listed below. As usual, the arguments shown in square brackets are optional.

→ Forecasting with regression models in Excel

→ Creating a regression output table using the Data Analysis tool

→ Creating regression models using dummy variables


• =IF(logical_test,[value_if_true],[value_if_false])
→ Returns value_if_true if the specified condition is met, and returns value_if_false if the condition is
not met.

→ Creating regression models using lagged variables
