
Topic

Regression Analysis: Linear Regression and Multiple Regression

Discussion and Example:


Regression analysis is a powerful statistical method that allows you to examine the
relationship between two or more variables of interest. While there are many types of
regression analysis, at their core they all examine the influence of one or more
independent variables on a dependent variable. The insight it provides can be applied
to improve products and services. What is regression analysis, and what does it mean to
perform a regression? Regression analysis is a reliable method of identifying which
variables have an impact on a topic of interest. Performing a regression helps you
determine which factors matter most, which factors can be ignored, and how strongly
each factor is associated with the outcome.
To understand regression analysis fully, it’s essential to understand the
following terms:

Dependent Variable: This is the main factor that you’re trying to understand or predict.

Independent Variables: These are the factors that you hypothesize have an impact on
your dependent variable.

Figure: A simple linear regression plot for amount of rainfall.

Regression analysis is used in statistics to find trends in data. For example, you might
guess that there’s a connection between how much you eat and how much you weigh;
regression analysis can help you quantify that. Regression analysis gives you an
equation for a line through your data so that you can make predictions. For example,
if you’ve been putting on weight over the last few years, it can estimate how much you’ll
weigh in ten years’ time if you continue to put on weight at the same rate. It will also give
you a range of statistics (including a p-value and a correlation coefficient) that tell you how
well your model fits the data and whether the relationship is statistically significant.
Most elementary statistics courses cover very basic techniques, like making scatter plots
and performing linear regression. However, you may come across more advanced
techniques like multiple regression.
In statistics, it’s hard to stare at a set of numbers in a table and make any sense
of them. For example, suppose global warming may be reducing average snowfall in your
town and you are asked to predict how much snow will fall this year. Looking
at the following table, you might guess somewhere around 10-20 inches. That’s a good
guess, but you could make a better one by using regression.

Essentially, regression is the “best guess” at using a set of data to make some kind of
prediction. It fits a line (or curve) to a set of points on a graph. Many tools can run
regression for you, including Excel, which I used here to help make sense of that
snowfall data:

Just by looking at the regression line running down through the data, you can fine-tune
your best guess a bit. You can see that the original guess (20 inches or so) was way off.
For 2015, it looks like the line will be somewhere between 5 and 10 inches! That might
be “good enough”, but regression also gives you a useful equation, which for this chart
is:
y = -2.2923x + 4624.4.
What that means is you can plug in an x value (the year) and get the model’s estimate
of snowfall for that year. For example, for 2005:
y = -2.2923(2005) + 4624.4 = 28.3385 inches, which is pretty close to the actual figure
of 30 inches for that year.
Best of all, you can use the equation to make predictions, assuming the trend continues.
For example, how much snow will fall in 2017?
y = -2.2923(2017) + 4624.4 ≈ 0.8 inches
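The arithmetic above is easy to get wrong by hand (note the minus sign on the slope), so here is a minimal Python sketch that evaluates the trend line for a few years. The names SLOPE, INTERCEPT and predict_snowfall are illustrative only; the coefficients are the ones quoted for this chart.

```python
# Coefficients of the snowfall trend line quoted in the text: y = -2.2923x + 4624.4
SLOPE = -2.2923       # change in snowfall (inches) per year
INTERCEPT = 4624.4    # value of the line at year 0 (not meaningful on its own)


def predict_snowfall(year: float) -> float:
    """Return the trend line's snowfall estimate, in inches, for the given year."""
    return SLOPE * year + INTERCEPT


for year in (2005, 2015, 2017):
    print(year, round(predict_snowfall(year), 2))
```

For 2005 this gives 28.3385 inches, for 2015 about 5.4 inches (inside the 5 to 10 inch range read off the chart) and for 2017 about 0.83 inches. Years outside the range of the original data are extrapolations, so they rely on the trend continuing unchanged.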
Regression also gives you an R-squared value, which for this graph is 0.702. This
number tells you what proportion of the variation in the data the line explains, i.e. how
well your model fits. The values range from 0 to 1, with 0 being a terrible model and 1
being a perfect model. As you can probably see, 0.7 is a fairly decent model, so you can
be fairly confident in your weather prediction!
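Tools like Excel compute the line and R-squared with the ordinary least-squares formulas: slope = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)², intercept = ȳ − slope·x̄, and R² = 1 − SS_res / SS_tot. The sketch below implements those formulas in plain Python. The snowfall table itself is not reproduced here, so the data points are hypothetical toy values, and fit_line and r_squared are illustrative names.

```python
def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept for y = intercept + slope * x."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # Slope = sum of cross-deviations / sum of squared x-deviations.
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return slope, intercept


def r_squared(xs, ys, slope, intercept):
    """R^2 = 1 - SS_res / SS_tot: share of the variation in y explained by the line."""
    y_mean = sum(ys) / len(ys)
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - y_mean) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


# Hypothetical toy data (not the snowfall table).
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
slope, intercept = fit_line(xs, ys)
print(slope, intercept, r_squared(xs, ys, slope, intercept))
```

For these toy points the fitted line is y = 0.05 + 1.99x with R² just under 1. The 0-to-1 range of R² holds for a least-squares line that includes an intercept, which is the case here.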

Multiple regression analysis is used to see whether there is a statistically significant
relationship between a dependent variable and a set of independent variables. It’s used
to find trends in those sets of data.
Multiple regression analysis is almost the same as simple linear regression. The only
difference between simple linear regression and multiple regression is in the number
of predictors (“x” variables) used in the regression.
• Simple regression analysis uses a single x variable for each dependent “y”
variable. Each observation is a pair, for example: (x1, Y1).
• Multiple regression uses multiple “x” variables for each dependent variable. Each
observation looks like: (x1₁, x2₁, x3₁, Y1).
In one-variable linear regression, you would input one dependent variable (e.g. “profit”)
against an independent variable (e.g. “sales”). But you might be interested in
how different types of sales affect the regression. You could set your X1 as one type of
sales, your X2 as another type of sales, and so on.

Regression analysis is usually performed in software, like Excel or SPSS. The output
differs according to how many variables you have, but it’s essentially the same type of
output you would find in a simple linear regression. There’s just more of it:

• Simple regression: Y = b0 + b1x.
• Multiple regression: Y = b0 + b1x1 + b2x2 + … + bnxn.
The output would include a summary, similar to a summary for simple linear regression,
that includes:

• R (the multiple correlation coefficient),
• R-squared (the coefficient of determination),
• adjusted R-squared,
• the standard error of the estimate.
These statistics help you figure out how well a regression model fits the data.
The ANOVA table in the output would give you the p-value and the F-statistic.
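Adjusted R-squared and the overall F-statistic in the ANOVA table can both be derived from R-squared, the number of observations n and the number of predictors k (not counting the intercept) using standard formulas. The Python sketch below shows those formulas; the function names are illustrative and not part of any statistics package.

```python
def adjusted_r_squared(r2: float, n: int, k: int) -> float:
    """Adjusted R^2: penalizes R^2 for the number of predictors k, given n observations."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


def overall_f_statistic(r2: float, n: int, k: int) -> float:
    """ANOVA F-statistic for H0 'all k slopes are zero', with (k, n - k - 1) degrees of freedom."""
    return (r2 / k) / ((1 - r2) / (n - k - 1))


print(adjusted_r_squared(0.5, n=23, k=2), overall_f_statistic(0.5, n=23, k=2))
```

For example, R² = 0.5 with n = 23 and k = 2 gives an adjusted R² of 0.45 and F = 10. The p-value in the ANOVA table is the upper-tail probability of that F value under an F distribution with (k, n - k - 1) degrees of freedom; if SciPy is available, scipy.stats.f.sf(F, k, n - k - 1) computes it.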

Regression Analysis – Linear model assumptions

Linear regression analysis is based on six fundamental assumptions:

1. The dependent and independent variables show a linear relationship (the model is
linear in the slope and the intercept).
2. The independent variable is not random.
3. The expected value of the residual (error) is zero.
4. The variance of the residual (error) is constant across all observations.
5. The residuals (errors) are not correlated across observations.
6. The residual (error) values follow the normal distribution.
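Some of these assumptions can be checked numerically once a line has been fitted. As one example, assumption 5 is often screened with the Durbin-Watson statistic of the residuals (values near 2 suggest little correlation between successive residuals). The sketch below uses NumPy with hypothetical toy data; fit_residuals and durbin_watson are illustrative names. Assumptions 4 and 6 are more commonly checked visually, with a plot of residuals against fitted values and a normal Q-Q plot.

```python
import numpy as np


def fit_residuals(x, y):
    """Fit y = a + b*x by least squares and return the residuals y - (a + b*x)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    b, a = np.polyfit(x, y, 1)  # np.polyfit lists coefficients highest power first
    return y - (a + b * x)


def durbin_watson(residuals):
    """Durbin-Watson statistic, between 0 and 4; values near 2 suggest little
    first-order correlation between successive residuals (assumption 5)."""
    e = np.asarray(residuals, dtype=float)
    return np.sum(np.diff(e) ** 2) / np.sum(e ** 2)


# Hypothetical toy data, used only to show the calls.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2.0, 4.1, 5.9, 8.2, 9.8, 12.1, 14.0, 15.9]
e = fit_residuals(x, y)
print("Durbin-Watson:", durbin_watson(e))
```

Note that with an intercept in the model, least squares forces the in-sample residuals to average exactly zero, so assumption 3 cannot be tested simply by averaging the residuals.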

Regression Analysis – Simple linear regression

Simple linear regression is a model that assesses the relationship between a dependent
variable and an independent variable. The simple linear model is expressed using the
following equation:

Y = a + bX + ϵ

Where:

• Y – Dependent variable
• X – Independent (explanatory) variable
• a – Intercept
• b – Slope
• ϵ – Residual (error)
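One way to see what each symbol in Y = a + bX + ϵ does is to simulate data from the model with known a and b and check that least squares recovers them. The sketch below uses NumPy; the "true" values a = 3 and b = 0.5, the noise level and the sample size are arbitrary assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# Hypothetical "true" parameters, chosen only for illustration.
a_true, b_true = 3.0, 0.5
X = rng.uniform(0, 10, size=500)        # independent (explanatory) variable
eps = rng.normal(0, 1.0, size=500)      # residual (error) term
Y = a_true + b_true * X + eps           # Y = a + bX + ϵ

b_hat, a_hat = np.polyfit(X, Y, 1)      # least-squares slope and intercept
print(f"a ≈ {a_hat:.2f}, b ≈ {b_hat:.2f}")
```

The estimates land close to, but not exactly on, the true values; the gap comes from the random error term ϵ and shrinks as the sample grows.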

Regression Analysis – Multiple linear regression

Multiple linear regression analysis is essentially similar to the simple linear model, with
the exception that multiple independent variables are used in the model. The
mathematical representation of multiple linear regression is:

Y = a + bX1 + cX2 + dX3 + ϵ

Where:

• Y – Dependent variable
• X1, X2, X3 – Independent (explanatory) variables
• a – Intercept
• b, c, d – Slopes
• ϵ – Residual (error)
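With several predictors the least-squares estimates are usually found by solving a linear least-squares problem on a design matrix whose first column is all ones (for the intercept a). The sketch below does this with NumPy's numpy.linalg.lstsq on simulated data; the true values a = 1, b = 2, c = -1, d = 0.5 and the name fit_multiple are illustrative assumptions.

```python
import numpy as np


def fit_multiple(X, y):
    """Least-squares estimates [a, b, c, ...] for y = a + b*X1 + c*X2 + ... ."""
    X = np.asarray(X, dtype=float)
    design = np.column_stack([np.ones(len(X)), X])  # column of ones estimates the intercept a
    coef, *_ = np.linalg.lstsq(design, np.asarray(y, dtype=float), rcond=None)
    return coef


# Hypothetical data generated from known values a=1, b=2, c=-1, d=0.5 plus noise.
rng = np.random.default_rng(seed=1)
X = rng.normal(size=(200, 3))  # columns play the roles of X1, X2, X3
eps = rng.normal(0.0, 0.5, size=200)
y = 1.0 + 2.0 * X[:, 0] - 1.0 * X[:, 1] + 0.5 * X[:, 2] + eps
a, b, c, d = fit_multiple(X, y)
print(f"a={a:.2f}, b={b:.2f}, c={c:.2f}, d={d:.2f}")
```

Each slope is estimated while holding the other predictors fixed, which is what distinguishes it from running three separate simple regressions.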

Multiple linear regression follows the same conditions as the simple linear model.
However, since there are several independent variables in multiple linear analysis, there
is an additional condition for the model:

• Non-collinearity: Independent variables should show minimal correlation with each
other. If the independent variables are highly correlated with each other, it will be
difficult to assess the true relationships between the dependent and independent
variables.
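A common way to quantify collinearity is the variance inflation factor (VIF): for each predictor j, regress it on all the other predictors and compute VIF_j = 1 / (1 - R_j²). A VIF of 1 means the predictor is uncorrelated with the others; a common rule of thumb treats values above about 5 or 10 as a warning sign. The NumPy sketch below is illustrative (vif is a hypothetical helper name, and the example data are made up).

```python
import numpy as np


def vif(X):
    """Variance inflation factor for each column of X: 1 / (1 - R_j^2), where R_j^2
    comes from regressing column j on all the other columns plus an intercept."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    out = []
    for j in range(p):
        target = X[:, j]
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(others, target, rcond=None)
        resid = target - others @ coef
        r2 = 1 - resid @ resid / np.sum((target - target.mean()) ** 2)
        out.append(1 / (1 - r2))
    return np.array(out)


# Hypothetical example: x2 is almost a copy of x1, so both get a large VIF.
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
x2 = x1 + np.array([0.01, -0.01, 0.02, -0.02, 0.0])
print(vif(np.column_stack([x1, x2])))
```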

Practice/Drills
Answer the following question:

Suppose we want to predict the job performance of Chevy mechanics based on their
mechanical aptitude test scores and their scores on a personality test that measures
conscientiousness.

Job Perf (Y)   Mech Apt (X1)   Consc (X2)
1              40              25
2              45              20
1              38              30
3              50              30
2              48              28
3              55              30
3              53              34
4              55              36
4              58              32
3              40              34
5              55              38
3              48              28
3              45              30
2              55              36
4              60              34
5              60              38
5              60              42
5              65              38
4              50              34
3              58              38
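One way to work this drill is to fit Y = b0 + b1X1 + b2X2 by least squares and read off the coefficients and R-squared. The sketch below does this with NumPy (an assumption; the text itself suggests Excel or SPSS, whose output should agree). The data are transcribed from the table above; it deliberately does not state the results, so you can compare its printout with your software's summary.

```python
import numpy as np

# Practice data transcribed from the table: job performance (Y),
# mechanical aptitude (X1) and conscientiousness (X2).
Y = np.array([1, 2, 1, 3, 2, 3, 3, 4, 4, 3, 5, 3, 3, 2, 4, 5, 5, 5, 4, 3], dtype=float)
X1 = np.array([40, 45, 38, 50, 48, 55, 53, 55, 58, 40,
               55, 48, 45, 55, 60, 60, 60, 65, 50, 58], dtype=float)
X2 = np.array([25, 20, 30, 30, 28, 30, 34, 36, 32, 34,
               38, 28, 30, 36, 34, 38, 42, 38, 34, 38], dtype=float)

design = np.column_stack([np.ones_like(X1), X1, X2])  # columns: intercept, X1, X2
coef, *_ = np.linalg.lstsq(design, Y, rcond=None)
b0, b1, b2 = coef

residuals = Y - design @ coef
r2 = 1 - np.sum(residuals ** 2) / np.sum((Y - Y.mean()) ** 2)
n, k = len(Y), 2
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k - 1)

print(f"Job Perf ≈ {b0:.2f} + {b1:.3f}*MechApt + {b2:.3f}*Consc")
print(f"R^2 = {r2:.3f}, adjusted R^2 = {adj_r2:.3f}")
```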
