
EDU6950 – ADVANCED STATISTICS IN EDUCATION

“MULTIPLE REGRESSION ANALYSIS”

PREPARED FOR :

PROF. DR. MOHD MAJID BIN KONTING

PREPARED BY :

NOR SA’ADAH BINTI JAMALUDDIN

GS48233

MULTIPLE REGRESSION

1) What is Multiple Regression?

 Multiple regression is a very advanced statistical tool, and it is extremely powerful
when you are trying to develop a "model" for predicting a wide variety of outcomes.
 Multiple regression is a statistical tool that allows you to examine how multiple
independent variables are related to a dependent variable. Once you have identified
how these multiple variables relate to your dependent variable, you can take
information about all of the independent variables and use it to make much more
powerful and accurate predictions about why things are the way they are.
 Multiple regression allows us to include additional variables in the model and
estimate their effects on the dependent variable as well. In multiple regression we still
estimate a linear equation which can be used for prediction. For a case with seven
independent variables we estimate:

Yi = β0 + β1Xi1 + β2Xi2 + β3Xi3 + … + β7Xi7

 Once we add additional variables into the regression model, several things change,
while much remains the same. First let’s look at what remains the same. The output of
regression will look relatively the same. We will generate an R Square for our model
that will be interpreted as the proportion of the variability in Y accounted for by all
the independent variables in the model. The model will include a single measure of
the standard error which reflects an assumption of constant variance of Y for all levels
of the independent variables in the model.
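For readers who want to reproduce this outside SPSS, the sketch below fits the same kind of seven-predictor model with Python's statsmodels. It is illustrative only: the DataFrame df and its column names are assumptions that mirror the HATCO variables introduced in the next section.

import pandas as pd
import statsmodels.api as sm

# Assumed: df is a pandas DataFrame holding the 100 HATCO observations.
predictors = ["Delivery Speed", "Price Level", "Price Flexibility",
              "Manufacturer Image", "Service", "Salesforce Image",
              "Product Quality"]
X = sm.add_constant(df[predictors])          # adds the intercept term b0
model = sm.OLS(df["Usage Level"], X).fit()

print(model.rsquared)          # R Square: variability in Y accounted for
print(model.params)            # estimated b0, b1, ..., b7
print(model.mse_resid ** 0.5)  # the single standard error of the estimate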

2) What are the objectives of the Multiple Regression in HATCO data?

 HATCO management has long been interested in more accurately predicting the level
of business obtained from its customers, in an attempt to provide a better basis for
production planning and marketing efforts.
 In doing multiple regression, the researchers focused on the following objectives:
o To this end, researchers at HATCO proposed that a multiple regression analysis
should be attempted to predict the product usage levels of the customers based
on their perceptions of HATCO's performance.

o In addition to finding a way to accurately predict usage levels, the researchers were
also interested in identifying the factors that led to increased product usage, for
application in differentiated marketing campaigns.
 To apply the regression procedure, the researchers selected the variables below:
o Product Usage Level (X9) as the dependent variable
o X1 – X7 as the independent variables:
 X1 = Delivery Speed
 X2 = Price Level
 X3 = Price Flexibility
 X4 = Manufacturer Image
 X5 = Service (overall service level)
 X6 = Salesforce Image
 X7 = Product Quality
 The relationship among the seven independent variables and product usage was
assumed to be statistical, not functional, because it involved perceptions of
performance and may have had levels of measurement error.

3) What is the Research Design of a Multiple Regression Analysis?

 The HATCO survey obtained 100 respondents from their customer base. All 100
respondents provided complete responses, resulting in 100 observations available for
analysis. The first question to be answered concerning sample size is the level of
relationship (R square) that can be detected reliably with the proposed regression
analysis. The significance level for the analysis is set at .05.

4) What are the Assumptions in Multiple Regression Analysis?

 When you choose to analyse your data using multiple regression, part of the process
involves checking to make sure that the data you want to analyse can actually be
analysed using multiple regression. You need to do this because it is only appropriate
to use multiple regression if your data "passes" eight assumptions that are required for
multiple regression to give you a valid result. In practice, checking for these eight
assumptions just adds a little bit more time to your analysis, requiring you to click a
few more buttons in SPSS Statistics when performing your analysis, as well as think a
little bit more about your data, but it is not a difficult task.

 Assumption #1:
o Your dependent variable should be measured on a continuous scale (i.e., it is
either an interval or ratio variable). If your dependent variable was measured
on an ordinal scale, you will need to carry out ordinal regression rather than
multiple regression.
 Assumption #2:
o You have two or more independent variables, which can be
either continuous (i.e., an interval or ratio variable) or categorical (i.e.,
an ordinal or nominal variable). If one of your independent variables is
dichotomous and considered a moderating variable, you might need to run
a Dichotomous moderator analysis.
 Assumption #3:
o You should have independence of observations (i.e., independence of
residuals), which you can easily check using the Durbin-Watson statistic,
which is a simple test to run using SPSS Statistics. We explain how to
interpret the result of the Durbin-Watson statistic, as well as showing you the
SPSS Statistics procedure required, in our enhanced multiple regression guide.
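For reference, the same statistic can be computed outside SPSS. A minimal sketch, assuming model is the fitted OLS result from the sketch in section 1:

from statsmodels.stats.stattools import durbin_watson

# Values near 2 suggest independent residuals; values toward 0 or 4
# suggest positive or negative serial correlation respectively.
dw = durbin_watson(model.resid)
print(f"Durbin-Watson = {dw:.3f}")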
 Assumption #4:
o There needs to be a linear relationship between (a) the dependent variable
and each of your independent variables, and (b) the dependent variable and
the independent variables collectively. Whilst there are a number of ways to
check for these linear relationships, we suggest
creating scatterplots and partial regression plots using SPSS Statistics, and
then visually inspecting these scatterplots and partial regression plots to check
for linearity.
o If the relationships displayed in your scatterplots and partial regression plots
are not linear, you will have to either run a non-linear regression analysis or
"transform" your data, which you can do using SPSS Statistics. In our
enhanced multiple regression guide, we show you how to: (a) create
scatterplots and partial regression plots to check for linearity when carrying
out multiple regression using SPSS Statistics; (b) interpret different scatterplot
and partial regression plot results; and (c) transform your data using SPSS
Statistics if you do not have linear relationships between your variables.
 Assumption #5:
o Your data needs to show homoscedasticity, which is where the variances
along the line of best fit remain similar as you move along the line.

 Assumption #6:
o Your data must not show multicollinearity, which occurs when you have two
or more independent variables that are highly correlated with each other. This
leads to problems with understanding which independent variable contributes
to the variance explained in the dependent variable, as well as technical issues
in calculating a multiple regression model.
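A minimal sketch of the corresponding numeric check outside SPSS, assuming the design matrix X (with its added constant) from the sketch in section 1:

from statsmodels.stats.outliers_influence import variance_inflation_factor

# VIF above roughly 10 (equivalently, tolerance below .10) is a common
# warning sign of multicollinearity among the predictors.
vifs = {name: variance_inflation_factor(X.values, i)
        for i, name in enumerate(X.columns) if name != "const"}
print(vifs)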
 Assumption #7:
o There should be no significant outliers, high leverage points or highly
influential points. Outliers, leverage and influential points are different terms
used to represent observations in your data set that are in some way unusual
when you wish to perform a multiple regression analysis. These different
classifications of unusual points reflect the different impact they have on the
regression line. An observation can be classified as more than one type of
unusual point.
o However, all these points can have a very negative effect on the regression
equation that is used to predict the value of the dependent variable based on
the independent variables. This can change the output that SPSS Statistics
produces and reduce the predictive accuracy of your results as well as the
statistical significance.
o Fortunately, when using SPSS Statistics to run multiple regression on your
data, you can detect possible outliers, high leverage points and highly
influential points. In our enhanced multiple regression guide, we: (a) show you
how to detect outliers using "casewise diagnostics" and "studentized deleted
residuals", which you can do using SPSS Statistics, and discuss some of the
options you have in order to deal with outliers; (b) check for leverage points
using SPSS Statistics and discuss what you should do if you have any; and (c)
check for influential points in SPSS Statistics using a measure of influence
known as Cook's Distance, before presenting some practical approaches in
SPSS Statistics to deal with any influential points you might have.
 Assumption #8:
o Finally, you need to check that the residuals (errors) are approximately
normally distributed (we explain these terms in our enhanced multiple
regression guide). Two common methods to check this assumption include
using: (a) a histogram (with a superimposed normal curve) and a Normal P-P
Plot; or (b) a Normal Q-Q Plot of the studentized residuals.
o Again, in our enhanced multiple regression guide, we: (a) show you how to
check this assumption using SPSS Statistics, whether you use a histogram
(with superimposed normal curve) and Normal P-P Plot, or Normal Q-Q Plot;
(b) explain how to interpret these diagrams; and (c) provide a possible solution
if your data fails to meet this assumption.
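For reference, a minimal sketch of the Q-Q plot check outside SPSS, assuming the fitted model from the sketch in section 1:

import matplotlib.pyplot as plt
import statsmodels.api as sm

# Points lying close to the 45-degree line suggest the residuals are
# approximately normally distributed.
sm.qqplot(model.resid, line="45", fit=True)
plt.show()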

___________________________________________________________________________
5) How to run the Standard Multiple Regression?

a) Click Analyze > Regression > Linear... on the main menu, as shown below:

b) You will be presented with the Linear Regression dialogue box below:

c) Transfer the dependent variable, X9 (Usage Level), into the Dependent: box and the
independent variables, X1 – X7, into the Independent(s): box, using the arrow buttons,
as shown below (all other boxes can be ignored):

Note: For a standard multiple regression you should ignore the Previous and Next
buttons, as they are for sequential (hierarchical) multiple regression. The Method:
option needs to be kept at the default value, which is Enter. If, for whatever reason,
Enter is not selected, you need to change Method: back to Enter. The Enter method is
the name given by SPSS Statistics to standard regression analysis.

d) Click the Statistics button. You will be presented with the Linear Regression:
Statistics dialogue box, as shown below:

e) In addition to the options that are selected by default, select Confidence intervals in
the –Regression Coefficients– area leaving the Level(%): option at "95". You will end
up with the following screen:

f) Click the Continue button. You will be returned to the Linear Regression dialogue
box.

g) Click the OK button. This will generate the output.

5.1 Interpreting and Reporting the Output of Multiple Regression Analysis

 SPSS Statistics will generate quite a few tables of output for a multiple regression
analysis. In this section, we show you only the three main tables required to
understand your results from the multiple regression procedure, assuming that no
assumptions have been violated.
 A complete explanation of the output you have to interpret when checking your data
for the eight assumptions required to carry out multiple regression is provided in our
enhanced guide. This includes relevant scatterplots and partial regression plots,
histogram (with superimposed normal curve), Normal P-P Plot and Normal Q-Q Plot,
correlation coefficients and Tolerance/VIF values, casewise diagnostics and
studentized deleted residuals.

(a) Determining how well the model fits

The first table of interest is the Model Summary table. This table provides the R, R²,
adjusted R², and the standard error of the estimate, which can be used to determine how well
a regression model fits the data:

Model Summary

Model   R       R Square   Adjusted R Square   Std. Error of the Estimate
1       .880a   .775       .758                4.4237

a. Predictors: (Constant), Product Quality, Service, Salesforce Image, Price
Flexibility, Price Level, Manufacturer Image, Delivery Speed

 The "R" column represents the value of R, the multiple correlation coefficient. R can be
considered to be one measure of the quality of the prediction of the dependent variable;
in this case, X9 (Usage Level) . A value of 0.775, in this example, indicates a good level

of prediction. The "R Square" column represents the R2 value (also called the coefficient
of determination), which is the proportion of variance in the dependent variable that can
be explained by the independent variables (technically, it is the proportion of variation
accounted for by the regression model above and beyond the mean model). You can see
from our value of 0.758 that our independent variables explain 75.8% of the variability
of our dependent variable, X9 (Usage Level) . However, you also need to be able to

interpret "Adjusted R Square" (adj. R2) to accurately report your data.

(b) Statistical Significance

 The F-ratio in the ANOVA table (see below) tests whether the overall regression model is
a good fit for the data. The table shows that the independent variables statistically
significantly predict the dependent variable, F(7, 92) = 45.252, p < .0005 (i.e., the
regression model is a good fit of the data).

ANOVAa

Model          Sum of Squares   df   Mean Square   F        Sig.
1  Regression  6198.677          7   885.525       45.252   .000b
   Residual    1800.323         92    19.569
   Total       7999.000         99

a. Dependent Variable: Usage Level
b. Predictors: (Constant), Product Quality, Service, Salesforce Image, Price Flexibility, Price
Level, Manufacturer Image, Delivery Speed

(c) Estimated Model Coefficients

 The general form of the equation to predict Usage Level (X9) from delivery speed (X1),
price level (X2), price flexibility (X3), manufacturer image (X4), service (X5),
salesforce image (X6) and product quality (X7) is:

predicted Usage Level = -10.187 - (0.058 x Delivery Speed) - (0.697 x Price Level)
+ (3.368 x Price Flexibility) - (0.042 x Manufacturer Image) + (8.369 x Service) +
(1.281 x Salesforce Image) + (0.567 x Product Quality)

This is obtained from the Coefficients table, as shown below:

Coefficientsa

                        Unstandardized Coefficients   Standardized Coefficients                    95.0% Confidence Interval for B
Model                   B         Std. Error          Beta                        t        Sig.    Lower Bound   Upper Bound
1  (Constant)           -10.187   4.977                                           -2.047   .044    -20.071       -.303
   Delivery Speed       -.058     2.013               -.008                       -.029    .977    -4.055        3.940
   Price Level          -.697     2.090               -.093                       -.333    .740    -4.848        3.454
   Price Flexibility    3.368     .411                .520                        8.191    .000    2.551         4.185
   Manufacturer Image   -.042     .667                -.005                       -.063    .950    -1.367        1.282
   Service              8.369     3.918               .699                        2.136    .035    .587          16.151
   Salesforce Image     1.281     .947                .110                        1.352    .180    -.600         3.162
   Product Quality      .567      .355                .100                        1.595    .114    -.139         1.273

a. Dependent Variable: Usage Level

 Unstandardized coefficients indicate how much the dependent variable varies with an
independent variable when all other independent variables are held constant. Consider
the effect of Delivery Speed in this example. The unstandardized coefficient, B1,
for Delivery Speed is equal to -0.058 (see Coefficients table). This means that, holding
the other variables constant, a one-unit increase in Delivery Speed is associated with a
0.058 decrease in Usage Level.
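To make the equation concrete, here is a small illustrative helper (the function name and argument names are ours, not SPSS output) that applies the estimated coefficients to one customer's perception scores:

def predict_usage_level(delivery_speed, price_level, price_flexibility,
                        manufacturer_image, service, salesforce_image,
                        product_quality):
    # Coefficients taken from the Coefficients table above.
    return (-10.187
            - 0.058 * delivery_speed
            - 0.697 * price_level
            + 3.368 * price_flexibility
            - 0.042 * manufacturer_image
            + 8.369 * service
            + 1.281 * salesforce_image
            + 0.567 * product_quality)

# Example: a customer rating every attribute at 5 on HATCO's scales.
print(predict_usage_level(5, 5, 5, 5, 5, 5, 5))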

(d) Statistical significance of the independent variables

 You can test for the statistical significance of each of the independent variables. This
tests whether the unstandardized (or standardized) coefficients are equal to 0 (zero) in the
population. If p < .05, you can conclude that the coefficients are statistically significantly
different to 0 (zero). The t-value and corresponding p-value are located in the "t" and
"Sig." columns, respectively, as highlighted below:
Coefficientsa

                        Unstandardized Coefficients   Standardized Coefficients                    95.0% Confidence Interval for B
Model                   B         Std. Error          Beta                        t        Sig.    Lower Bound   Upper Bound
1  (Constant)           -10.187   4.977                                           -2.047   .044    -20.071       -.303
   Delivery Speed       -.058     2.013               -.008                       -.029    .977    -4.055        3.940
   Price Level          -.697     2.090               -.093                       -.333    .740    -4.848        3.454
   Price Flexibility    3.368     .411                .520                        8.191    .000    2.551         4.185
   Manufacturer Image   -.042     .667                -.005                       -.063    .950    -1.367        1.282
   Service              8.369     3.918               .699                        2.136    .035    .587          16.151
   Salesforce Image     1.281     .947                .110                        1.352    .180    -.600         3.162
   Product Quality      .567      .355                .100                        1.595    .114    -.139         1.273

a. Dependent Variable: Usage Level

 We can see from the "Sig." column that only the Price Flexibility (p = .000) and Service
(p = .035) coefficients are statistically significantly different from 0 (zero); the other five
coefficients are not significant at the .05 level. Although the intercept, B0, is tested for
statistical significance, this is rarely an important or interesting finding.
 A multiple regression was run to predict Usage Level (X9) from delivery speed (X1),
price level (X2), price flexibility (X3), manufacturer image (X4), service (X5),
salesforce image (X6) and product quality (X7). These variables statistically
significantly predicted Usage Level, F(7, 92) = 45.252, p < .0005, R² = .775. Of the
seven variables, only price flexibility and service added statistically significantly to the
prediction, p < .05.

6) How to run the “Stepwise” Multiple Regression?

a) Click Analyze > Regression > Linear... on the main menu, as shown below:

b) You will be presented with the Linear Regression dialogue box below:

c) Transfer the dependent variable, X9 (Usage Level), into the Dependent: box and the
independent variables, X1 – X7, into the Independent(s): box, using the arrow buttons,
as shown below (all other boxes can be ignored). This time, set the Method: option to
Stepwise rather than the default Enter:

d) Click the Statistics button. You will be presented with the Linear Regression:
Statistics dialogue box, as shown below:

e) Click the Continue button. You will be returned to the Linear Regression dialogue
box.

f) Click the OK button. This will generate the output.

The output from the stepwise procedure is shown below:

Variables Entered/Removeda

Model   Variables Entered   Variables Removed   Method
1       Service             .                   Stepwise (Criteria: Probability-of-F-to-enter <= .050, Probability-of-F-to-remove >= .100).
2       Price Flexibility   .                   Stepwise (Criteria: Probability-of-F-to-enter <= .050, Probability-of-F-to-remove >= .100).
3       Salesforce Image    .                   Stepwise (Criteria: Probability-of-F-to-enter <= .050, Probability-of-F-to-remove >= .100).

a. Dependent Variable: Usage Level

 The table above tells you which variables were included in the model at each step:
"Service" was the single best predictor (step 1); "Price Flexibility" was the next best
predictor, adding the most once "Service" was already in the model (step 2); and
"Salesforce Image" was the third and final predictor entered (step 3). A sketch of this
entry/removal logic follows below.
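The sketch below mirrors this entry/removal logic outside SPSS. It is an assumption-laden illustration rather than the SPSS implementation: df is a pandas DataFrame with the HATCO columns, and the thresholds follow the criteria printed in the table above.

import pandas as pd
import statsmodels.api as sm

def stepwise_select(df, dv, candidates, p_enter=0.05, p_remove=0.10):
    selected = []
    while True:
        changed = False
        # Entry step: add the candidate with the smallest p-value <= p_enter.
        remaining = [c for c in candidates if c not in selected]
        entry_p = {}
        for c in remaining:
            fit = sm.OLS(df[dv], sm.add_constant(df[selected + [c]])).fit()
            entry_p[c] = fit.pvalues[c]
        if entry_p:
            best = min(entry_p, key=entry_p.get)
            if entry_p[best] <= p_enter:
                selected.append(best)
                changed = True
        # Removal step: drop the selected variable with the largest
        # p-value if it has risen to p_remove or above.
        if selected:
            fit = sm.OLS(df[dv], sm.add_constant(df[selected])).fit()
            pvals = fit.pvalues.drop("const")
            worst = pvals.idxmax()
            if pvals[worst] >= p_remove:
                selected.remove(worst)
                changed = True
        if not changed:
            return selected

# Hypothetical usage with the HATCO data:
# stepwise_select(df, "Usage Level", ["Delivery Speed", "Price Level",
#     "Price Flexibility", "Manufacturer Image", "Service",
#     "Salesforce Image", "Product Quality"])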

Model Summary

Model   R       R Square   Adjusted R Square   Std. Error of the Estimate
1       .701a   .491       .486                6.4458
2       .869b   .755       .750                4.4980
3       .877c   .768       .761                4.3938

a. Predictors: (Constant), Service
b. Predictors: (Constant), Service, Price Flexibility
c. Predictors: (Constant), Service, Price Flexibility, Salesforce Image

 Again, here are the R-squares. With “Service” alone (step 1), 49.1% of the variance
was accounted for. With both “Service” and “Price Flexibility” (step 2), 75.5% of the
variance was accounted for. With all variables “Service”, “Price Flexibility” and
“Salesforce Image” (step 3), 76.8% of the variance was accounted for.

ANOVAa

Model          Sum of Squares   df   Mean Square   F         Sig.
1  Regression  3927.309          1   3927.309       94.525   .000b
   Residual    4071.691         98     41.548
   Total       7999.000         99
2  Regression  6036.513          2   3018.256      149.184   .000c
   Residual    1962.487         97     20.232
   Total       7999.000         99
3  Regression  6145.700          3   2048.567      106.115   .000d
   Residual    1853.300         96     19.305
   Total       7999.000         99

a. Dependent Variable: Usage Level
b. Predictors: (Constant), Service
c. Predictors: (Constant), Service, Price Flexibility
d. Predictors: (Constant), Service, Price Flexibility, Salesforce Image

 This ANOVA table above gives three F-tests, one for each step of the procedure. All
steps had overall significant results (p = .000 for "Service" alone; p = .000 for Service
and Price Flexibility; and p = .000 for Service, Price Flexibility and Salesforce
Image).

Coefficientsa

                       Unstandardized Coefficients   Standardized Coefficients                    95.0% Confidence Interval for B
Model                  B        Std. Error           Beta                        t         Sig.   Lower Bound   Upper Bound
1  (Constant)          21.653   2.596                                             8.341    .000   16.502        26.804
   Service             8.384    .862                 .701                         9.722    .000   6.673         10.095
2  (Constant)          -3.489   3.057                                            -1.141    .257   -9.556        2.578
   Service             7.974    .603                 .666                        13.221    .000   6.777         9.171
   Price Flexibility   3.336    .327                 .515                        10.210    .000   2.688         3.985
3  (Constant)          -6.520   3.247                                            -2.008    .047   -12.965       -.075
   Service             7.621    .607                 .637                        12.547    .000   6.416         8.827
   Price Flexibility   3.376    .320                 .521                        10.562    .000   2.742         4.010
   Salesforce Image    1.406    .591                 .121                         2.378    .019   .232          2.579

a. Dependent Variable: Usage Level

 Again, this table gives the regression coefficients so that you can construct the regression
equation; the unstandardized B values are the ones that go into the equation.
 Notice that the coefficients change, depending on which predictors are included in the model.
 These are the weights that you want for an equation that includes Service, Price
Flexibility and Salesforce Image (the three best predictors). The equation would be:

 Predicted Usage Level = -6.520 + 7.621 (Service) + 3.376 (Price Flexibility) + 1.406
(Salesforce Image)

The last table, "Excluded Variables", simply lists the variables that were not included in the
model at each step.

Excluded Variablesa

                                                                            Collinearity Statistics
Model                   Beta In   t        Sig.   Partial Correlation      Tolerance
1  Delivery Speed       .396b     4.812    .000   .439                     .626
   Price Level          -.377b    -5.007   .000   -.453                    .737
   Price Flexibility    .515b     10.210   .000   .720                     .996
   Manufacturer Image   .016b     .216     .830   .022                     .911
   Salesforce Image     .093b     1.252    .214   .126                     .942
   Product Quality      -.154b    -2.178   .032   -.216                    .997
2  Delivery Speed       .016c     .205     .838   .021                     .405
   Price Level          -.020c    -.267    .790   -.027                    .464
   Manufacturer Image   .095c     1.808    .074   .181                     .892
   Salesforce Image     .121c     2.378    .019   .236                     .939
   Product Quality      .094c     1.683    .096   .169                     .799
3  Delivery Speed       .030d     .389     .698   .040                     .403
   Price Level          -.029d    -.405    .687   -.041                    .462
   Manufacturer Image   -.002d    -.021    .983   -.002                    .357
   Product Quality      .071d     1.273    .206   .130                     .768

a. Dependent Variable: Usage Level
b. Predictors in the Model: (Constant), Service
c. Predictors in the Model: (Constant), Service, Price Flexibility
d. Predictors in the Model: (Constant), Service, Price Flexibility, Salesforce Image

7) How to run the “Backward” Multiple Regression?

a) Click Analyze > Regression > Linear... on the main menu, as shown below:

b) You will be presented with the Linear Regression dialogue box below:

c) Transfer the dependent variable, X9 (Usage Level), into the Dependent: box and the
independent variables, X1 – X7, into the Independent(s): box, using the arrow buttons,
as shown below (all other boxes can be ignored). This time, set the Method: option to
Backward rather than the default Enter:

d) Click the Statistics button. You will be presented with the Linear Regression:
Statistics dialogue box, as shown below:

e) Click the Continue button. You will be returned to the Linear Regression dialogue
box.

f) Click the OK button. This will generate the output.

The output from the backward procedure is shown below:

Variables Entered/Removeda

Model   Variables Entered                                 Variables Removed    Method
1       Product Quality, Service, Salesforce Image,       .                    Enter
        Price Flexibility, Price Level, Manufacturer
        Image, Delivery Speedb
2       .                                                 Delivery Speed       Backward (criterion: Probability of F-to-remove >= .100).
3       .                                                 Manufacturer Image   Backward (criterion: Probability of F-to-remove >= .100).
4       .                                                 Price Level          Backward (criterion: Probability of F-to-remove >= .100).
5       .                                                 Product Quality      Backward (criterion: Probability of F-to-remove >= .100).

a. Dependent Variable: Usage Level
b. All requested variables entered.

Model Summary

Model   R       R Square   Adjusted R Square   Std. Error of the Estimate
1       .880a   .775       .758                4.4237
2       .880b   .775       .760                4.3998
3       .880c   .775       .763                4.3764
4       .879d   .772       .763                4.3796
5       .877e   .768       .761                4.3938

a. Predictors: (Constant), Product Quality, Service, Salesforce Image,
Price Flexibility, Price Level, Manufacturer Image, Delivery Speed
b. Predictors: (Constant), Product Quality, Service, Salesforce Image,
Price Flexibility, Price Level, Manufacturer Image
c. Predictors: (Constant), Product Quality, Service, Salesforce Image,
Price Flexibility, Price Level
d. Predictors: (Constant), Product Quality, Service, Salesforce Image,
Price Flexibility
e. Predictors: (Constant), Service, Salesforce Image, Price Flexibility

 From this Model Summary we can see that the R Square value stays at .775 from model 1
to model 3, and drops only slightly at models 4 (.772) and 5 (.768). This is why the
procedure discarded Delivery Speed, Manufacturer Image, Price Level and Product
Quality as not being particularly useful; a sketch of this elimination logic follows below.
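The sketch below illustrates the removal rule outside SPSS (Probability of F-to-remove >= .100). It is an illustration under the same assumption as the earlier sketches, namely a pandas DataFrame df with the HATCO columns.

import pandas as pd
import statsmodels.api as sm

def backward_eliminate(df, dv, predictors, p_remove=0.10):
    kept = list(predictors)
    while kept:
        fit = sm.OLS(df[dv], sm.add_constant(df[kept])).fit()
        pvals = fit.pvalues.drop("const")
        worst = pvals.idxmax()           # weakest remaining predictor
        if pvals[worst] < p_remove:      # everything left earns its place
            break
        kept.remove(worst)               # drop it and refit
    return kept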

ANOVAa

Model          Sum of Squares   df   Mean Square   F         Sig.
1  Regression  6198.677          7    885.525       45.252   .000b
   Residual    1800.323         92     19.569
   Total       7999.000         99
2  Regression  6198.661          6   1033.110       53.367   .000c
   Residual    1800.339         93     19.358
   Total       7999.000         99
3  Regression  6198.591          5   1239.718       64.726   .000d
   Residual    1800.409         94     19.153
   Total       7999.000         99
4  Regression  6176.787          4   1544.197       80.506   .000e
   Residual    1822.213         95     19.181
   Total       7999.000         99
5  Regression  6145.700          3   2048.567      106.115   .000f
   Residual    1853.300         96     19.305
   Total       7999.000         99

a. Dependent Variable: Usage Level
b. Predictors: (Constant), Product Quality, Service, Salesforce Image, Price Flexibility, Price Level,
Manufacturer Image, Delivery Speed
c. Predictors: (Constant), Product Quality, Service, Salesforce Image, Price Flexibility, Price Level,
Manufacturer Image
d. Predictors: (Constant), Product Quality, Service, Salesforce Image, Price Flexibility, Price Level
e. Predictors: (Constant), Product Quality, Service, Salesforce Image, Price Flexibility
f. Predictors: (Constant), Service, Salesforce Image, Price Flexibility

 This ANOVA table above gives five F-tests, one for each model in the procedure. All
models had overall significant results (p = .000 at every step, from the full seven-predictor
model down to the final model containing only Service, Price Flexibility and Salesforce
Image).

Coefficientsa

                        Unstandardized Coefficients   Standardized Coefficients                    95.0% Confidence Interval for B
Model                   B         Std. Error          Beta                        t         Sig.   Lower Bound   Upper Bound
1  (Constant)           -10.187   4.977                                           -2.047    .044   -20.071       -.303
   Delivery Speed       -.058     2.013               -.008                       -.029     .977   -4.055        3.940
   Price Level          -.697     2.090               -.093                       -.333     .740   -4.848        3.454
   Price Flexibility    3.368     .411                .520                        8.191     .000   2.551         4.185
   Manufacturer Image   -.042     .667                -.005                       -.063     .950   -1.367        1.282
   Service              8.369     3.918               .699                        2.136     .035   .587          16.151
   Salesforce Image     1.281     .947                .110                        1.352     .180   -.600         3.162
   Product Quality      .567      .355                .100                        1.595     .114   -.139         1.273
2  (Constant)           -10.216   4.845                                           -2.109    .038   -19.836       -.596
   Price Level          -.640     .604                -.085                       -1.059    .292   -1.839        .560
   Price Flexibility    3.368     .409                .520                        8.237     .000   2.556         4.181
   Manufacturer Image   -.039     .656                -.005                       -.060     .952   -1.342        1.263
   Service              8.260     .822                .690                        10.051    .000   6.628         9.891
   Salesforce Image     1.279     .940                .110                        1.360     .177   -.588         3.146
   Product Quality      .567      .353                .100                        1.603     .112   -.135         1.269
3  (Constant)           -10.298   4.621                                           -2.228    .028   -19.474       -1.122
   Price Level          -.641     .601                -.085                       -1.067    .289   -1.833        .552
   Price Flexibility    3.371     .405                .520                        8.323     .000   2.567         4.175
   Service              8.253     .810                .690                        10.186    .000   6.644         9.862
   Salesforce Image     1.236     .600                .106                        2.058     .042   .044          2.428
   Product Quality      .566      .351                .100                        1.611     .111   -.132         1.264
4  (Constant)           -10.699   4.610                                           -2.321    .022   -19.850       -1.548
   Price Flexibility    3.578     .356                .552                        10.055    .000   2.871         4.284
   Service              7.680     .607                .642                        12.648    .000   6.475         8.886
   Salesforce Image     1.257     .601                .108                        2.093     .039   .065          2.449
   Product Quality      .403      .317                .071                        1.273     .206   -.226         1.032
5  (Constant)           -6.520    3.247                                           -2.008    .047   -12.965       -.075
   Price Flexibility    3.376     .320                .521                        10.562    .000   2.742         4.010
   Service              7.621     .607                .637                        12.547    .000   6.416         8.827
   Salesforce Image     1.406     .591                .121                        2.378     .019   .232          2.579

a. Dependent Variable: Usage Level

_______________________________________________________________________________________________

Correlation

The “descriptives” command also gives you a correlation matrix, showing you the Pearson rs
between the variables (in the top part of this table).

What is Pearson Correlation?

Correlation between sets of data is a measure of how well they are related. The most common
measure of correlation in statistics is the Pearson correlation. The full name is the Pearson
Product Moment Correlation, or PPMC.

Why Pearson Correlation?

Pearson’s correlation coefficient is a test statistic that measures the statistical
relationship, or association, between two continuous variables. It is a widely used
method of measuring the association between variables of interest because it is based on the
method of covariance. It gives information about the magnitude of the association, or
correlation, as well as the direction of the relationship.

The Pearson’s r for the correlation between the Service (IV) and Usage Level (DV) in our
example is 0.701.
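For reference, a minimal sketch of the same computation outside SPSS, assuming the DataFrame df from the earlier sketches:

from scipy.stats import pearsonr

# scipy returns the coefficient together with its two-sided p-value.
r, p = pearsonr(df["Service"], df["Usage Level"])
print(f"r = {r:.3f}, p = {p:.4f}")   # r should come out near .701 here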

When Pearson’s r is close to 1…

This means that there is a strong relationship between your two variables: changes in one
variable are strongly correlated with changes in the second variable. In our example,
Pearson’s r is 0.701, which is reasonably close to 1. For this reason, we can conclude that
there is a strong relationship between Usage Level and Service: the perception of HATCO’s
service affects the usage level. However, we cannot make any other conclusions about this
relationship based on this number.

When Pearson’s r is close to 0…

This means that there is a weak relationship between your two variables: changes in one
variable are not correlated with changes in the second variable. If our Pearson’s r were 0.01,
we could conclude that our variables were not strongly correlated. In our data, Price Level
(r = .082) is an example of a variable that is essentially uncorrelated with Usage Level.

When Pearson’s r is positive (+)…

This means that as one variable increases in value, the second variable also increases in value.
Similarly, as one variable decreases in value, the second variable also decreases in value.
This is called a positive correlation. In our example, our Pearson’s r value of 0.701 was
positive. We know this value is positive because SPSS did not put a negative sign in front of
it; positive is the default. Since our example Pearson’s r is positive, we can conclude that
when Service (our first variable) increases, Usage Level (our second variable) also
increases.

When Pearson’s r is negative (-)…

This means that as one variable increases in value, the second variable decreases in value.
This is called a negative correlation. In our example, the Pearson’s r value for Product
Quality and Usage Level is -0.192. So we could conclude that as Product Quality (our first
variable) increases, Usage Level (our second variable) tends to decrease.

 Multicollinearity. The correlations between the variables in your model are provided
in the table labelled Correlations. Check that your independent variables show at
least some relationship with your dependent variable (above .3 preferably). In this
case, 3 of the 7 scales (Delivery Speed, Price Flexibility and Service) correlate
substantially with Usage Level (.676, .559 and .701, respectively). Meanwhile, the
other 4 scales (Price Level, Manufacturer Image, Salesforce Image and Product
Quality) show correlations with the dependent variable below .3 (.082, .224,
.256 and -.192, respectively).

 Also check that the correlation between each of your independent variables is not too
high. Tabachnick and Fidell (2001, p. 84) suggest that you ‘think carefully before
including two variables with a bivariate correlation of, say, .7 or more in the same
analysis’. If you find yourself in this situation, you may need to consider omitting
one of the variables or forming a composite variable from the scores of the two
highly correlated variables. In the example presented here, the correlation between
the Salesforce Image variable and the Manufacturer Image variable is .788, which is
more than .7; you should therefore think carefully about retaining both, and consider
omitting one of them or forming a composite variable.

Interpretation of Output from Standard Multiple Regression

As with the output from most of the SPSS procedures, there are lots of rather confusing
numbers generated as output from regression.

Step 1: Checking the assumptions

Coefficientsa

                                                                                95.0% Confidence
                        Unstd. Coefficients   Std. Coeff.                       Interval for B         Correlations                   Collinearity Statistics
Model                   B       Std. Error    Beta          t        Sig.      Lower     Upper        Zero-order  Partial   Part     Tolerance   VIF
1  (Constant)           -.567   .445                        -1.274   .206      -1.451    .317
   Delivery Speed       .240    .180          .370          1.333    .186      -.118     .598         .651        .138      .062     .028        35.747
   Price Level          .176    .187          .246          .942     .349      -.195     .547         .028        .098      .044     .032        31.597
   Price Flexibility    .290    .037          .470          7.882    .000      .217      .363         .525        .635      .366     .608        1.645
   Manufacturer Image   .429    .060          .567          7.183    .000      .310      .547         .476        .599      .334     .347        2.879
   Service              .132    .351          .116          .376     .708      -.565     .828         .631        .039      .017     .023        43.834
   Salesforce Image     -.196   .085          -.177         -2.315   .023      -.364     -.028        .341        -.235     -.108    .371        2.697
   Product Quality      -.046   .032          -.085         -1.446   .152      -.109     .017         -.283       -.149     -.067    .623        1.606

a. Dependent Variable: Satisfaction Level

 SPSS also performs ‘collinearity diagnostics’ on your variables as part of the
multiple regression procedure. This can pick up on problems with multi-collinearity
that may not be evident in the correlation matrix. The results are presented in the
table labelled Coefficients. Two values are given: Tolerance and VIF.

 Tolerance is an indicator of how much of the variability of the specified independent
variable is not explained by the other independent variables in the model, and is
calculated using the formula 1 – R² for each variable. If this value is very small (less
than .10), it indicates that the multiple correlation with other variables is high,
suggesting the possibility of multi-collinearity. The other value given is the VIF
(Variance Inflation Factor), which is just the inverse of the Tolerance value (1 divided
by Tolerance). VIF values above 10 would be a concern here, indicating
multi-collinearity.
 I have quoted commonly used cut-off points for determining the presence of multi-
collinearity (tolerance value of less than .10, or a VIF value of above 10). These
values, however, still allow for quite high correlations between independent variables
(above .9), so you should take them only as a warning sign, and check the correlation
matrix.
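To make the 1 – R² formula concrete, here is a minimal sketch that computes tolerance from first principles, assuming the DataFrame df from the earlier sketches:

import statsmodels.api as sm

def tolerance(df, predictor, others):
    # Regress one predictor on all the others; tolerance is 1 - R^2 of
    # that auxiliary regression, and VIF is its reciprocal.
    aux = sm.OLS(df[predictor], sm.add_constant(df[others])).fit()
    return 1.0 - aux.rsquared

tol = tolerance(df, "Delivery Speed",
                ["Price Level", "Price Flexibility", "Manufacturer Image",
                 "Service", "Salesforce Image", "Product Quality"])
print(tol, 1.0 / tol)   # expected near .028 and 35.7 in this example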

 In this example, the tolerance values for the Delivery Speed (.028), Price Level
(.032) and Service (.023) variables are less than .10; therefore, we have violated
the multi-collinearity assumption for these variables. Meanwhile, the tolerance
values for the Price Flexibility (.608), Manufacturer Image (.347), Salesforce
Image (.371) and Product Quality (.623) variables are not less than .10; these
variables do not violate the multi-collinearity assumption.

 This is also supported by the VIF values: the Delivery Speed (35.747), Price
Level (31.597) and Service (43.834) variables show results above 10. Meanwhile,
the Price Flexibility (1.645), Manufacturer Image (2.879), Salesforce Image (2.697)
and Product Quality (1.606) variables show good results, below the cut-off
of 10. If you exceed these values in your own results, you should seriously
consider removing one of the highly inter-correlated independent variables from
the model.

 In this case, you should therefore consider removing the Delivery Speed, Price Level
and Service variables from your model.

Outliers, Normality, Linearity, Homoscedasticity, Independence of Residuals.

One of the ways that these assumptions can be checked is by inspecting the residuals
scatterplot and the Normal Probability Plot of the regression standardised residuals
that were requested as part of the analysis. These are presented at the end of the output. In
the Normal Probability Plot you are hoping that your points will lie in a reasonably straight
diagonal line from bottom left to top right. This would suggest no major deviations from
normality. In the Scatterplot of the standardised residuals (the second plot displayed) you are
hoping that the residuals will be roughly rectangularly distributed, with most of the scores
concentrated in the centre (along the 0 point). What you don’t want to see is a clear or
systematic pattern to your residuals (e.g. curvilinear, or higher on one side than the other).

The presence of outliers can also be detected from the Scatterplot. Tabachnick and Fidell
(2001) define outliers as cases that have a standardised residual (as displayed in the
scatterplot) of more than 3.3 or less than –3.3. With large samples, it is not uncommon to find
a number of outlying residuals. If you find only a few, it may not be necessary to take any
action. The resulting scatterplot is shown below:

 The other information in the output concerning unusual cases is in the Table titled
Casewise Diagnostics. This presents information about cases that have standardised
residual values above 3.0 or below –3.0. In a normally distributed sample we would
expect only 1 per cent of cases to fall outside this range.

 To check whether this strange case is having any undue influence on the results for
our model as a whole, we can check the value for Cook’s Distance given towards the
bottom of the Residuals Statistics table. According to Tabachnick and Fidell (2001,
p. 69), cases with values larger than 1 are a potential problem. In our example the
maximum value for Cook’s Distance is .100, suggesting no major problems. In your
own data, if you obtain a maximum value above 1 you will need to go back to your
data file, sort cases by the new variable that SPSS created at the end of your file.
Check each of the cases with values above 1—you may need to consider removing the
offending case/cases.
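For readers replicating these diagnostics outside SPSS, here is a minimal sketch using a fitted OLS result model from the earlier sketches:

# Case-level influence diagnostics for a fitted statsmodels OLS result.
influence = model.get_influence()
stud_del_resid = influence.resid_studentized_external  # studentized deleted residuals
cooks_d = influence.cooks_distance[0]                  # first element holds the distances

print("cases with |studentized deleted residual| > 3:",
      (abs(stud_del_resid) > 3).sum())
print("max Cook's Distance:", cooks_d.max())           # values above 1 are a concern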

Residuals Statisticsa
Minimum Maximum Mean Std. Deviation N
Predicted Value 3.129 6.495 4.771 .7658 100
Std. Predicted Value -2.145 2.252 .000 1.000 100
Standard Error of Predicted
.070 .240 .108 .028 100
Value
Adjusted Predicted Value 3.097 6.446 4.771 .7668 100
Residual -.9393 .7193 .0000 .3815 100
Std. Residual -2.374 1.818 .000 .964 100
Stud. Residual -2.519 1.884 .000 .999 100
Deleted Residual -1.0577 .7726 .0003 .4103 100
Stud. Deleted Residual -2.596 1.911 -.003 1.011 100
Mahal. Distance 2.109 35.390 6.930 5.043 100
Cook's Distance .000 .100 .009 .015 100
Centered Leverage Value .021 .357 .070 .051 100
a. Dependent Variable: Satisfaction Level

Step 2: Evaluating the Model

 Look in the Model Summary box and check the value given under the heading R
Square. This tells you how much of the variance in the dependent variable
(Satisfaction Level) is explained by the model (which includes the Perception of
HATCO variables: Delivery Speed, Price Level, Price Flexibility, Manufacturer
Image, Service, Salesforce Image and Product Quality).
Model Summaryb

Model   R       R Square   Adjusted R Square   Std. Error of the Estimate
1       .895a   .801       .786                .3957

a. Predictors: (Constant), Product Quality, Service, Salesforce Image,
Price Flexibility, Price Level, Manufacturer Image, Delivery Speed
b. Dependent Variable: Satisfaction Level

 In this case the value is .801. Expressed as a percentage (multiply by 100, by shifting
the decimal point two places to the right), this means that our model (which includes
Perception of HATCO) explains 80.1 per cent of the variance in satisfaction level.
This is quite a respectable result (particularly when you compare it to some of the
results that are reported in the journals!). You will notice that SPSS also provides an
Adjusted R Square value in the output. When a small sample is involved, the R
square value in the sample tends to be a rather optimistic overestimation of the true
value in the population (see Tabachnick & Fidell, 2001, p. 147).

 The Adjusted R square statistic ‘corrects’ this value to provide a better estimate of the
true population value. If you have a small sample you may wish to consider reporting
this value, rather than the normal R Square value. To assess the statistical significance
of the result it is necessary to look in the table labelled ANOVA. This tests the null
hypothesis that multiple R in the population equals 0. The model in this example
reaches statistical significance (Sig = .000, this really means p<.0005).

ANOVAa

Model          Sum of Squares   df   Mean Square   F        Sig.
1  Regression  58.058            7   8.294         52.962   .000b
   Residual    14.408           92   .157
   Total       72.466           99

a. Dependent Variable: Satisfaction Level
b. Predictors: (Constant), Product Quality, Service, Salesforce Image, Price Flexibility, Price Level,
Manufacturer Image, Delivery Speed

Step 3: Evaluating Each of the Independent Variables

The next thing we want to know is which of the variables included in the model contributed
to the prediction of the dependent variable. We find this information in the output box
labelled Coefficients. Look in the column labelled Beta under Standardised Coefficients.
To compare the different variables it is important that you look at the standardised
coefficients, not the unstandardised ones. ‘Standardised’ means that these values for each of
the different variables have been converted to the same scale so that you can compare them.
If you were interested in constructing a regression equation, you would use the
unstandardized coefficient values listed as B.
Coefficientsa

                                                                                95.0% Confidence
                        Unstd. Coefficients   Std. Coeff.                       Interval for B         Correlations                   Collinearity Statistics
Model                   B       Std. Error    Beta          t        Sig.      Lower     Upper        Zero-order  Partial   Part     Tolerance   VIF
1  (Constant)           -.567   .445                        -1.274   .206      -1.451    .317
   Delivery Speed       .240    .180          .370          1.333    .186      -.118     .598         .651        .138      .062     .028        35.747
   Price Level          .176    .187          .246          .942     .349      -.195     .547         .028        .098      .044     .032        31.597
   Price Flexibility    .290    .037          .470          7.882    .000      .217      .363         .525        .635      .366     .608        1.645
   Manufacturer Image   .429    .060          .567          7.183    .000      .310      .547         .476        .599      .334     .347        2.879
   Service              .132    .351          .116          .376     .708      -.565     .828         .631        .039      .017     .023        43.834
   Salesforce Image     -.196   .085          -.177         -2.315   .023      -.364     -.028        .341        -.235     -.108    .371        2.697
   Product Quality      -.046   .032          -.085         -1.446   .152      -.109     .017         -.283       -.149     -.067    .623        1.606

a. Dependent Variable: Satisfaction Level

 In this case we are interested in comparing the contribution of each independent
variable; therefore we will use the beta values. Look down the Beta column and
find which beta value is the largest (ignoring any negative signs out the front). In
this case the largest beta coefficient is .567, which is for the Manufacturer Image
variable. This means that this variable makes the strongest unique contribution to
explaining the dependent variable, when the variance explained by all other variables
in the model is controlled for.

 The Beta value for the Product Quality variable was much lower (–.085), indicating
that it made less of a contribution. For each of these variables, check the value in the
column marked Sig. This tells you whether this variable is making a statistically
significant unique contribution to the equation. This is very dependent on which
variables are included in the equation, and how much overlap there is among the
independent variables. If the Sig. value is less than .05 (.01, .0001, etc.), then the
variable is making a significant unique contribution to the prediction of the dependent
variable. If greater than .05, then you can conclude that the variable is not making a
significant unique contribution to the prediction of your dependent variable. This may
be due to overlap with other independent variables in the model.

 In this case, the Price Flexibility (.000), Manufacturer Image (.000) and Salesforce
Image (.023) variables made a unique, and statistically significant, contribution
to the prediction of Satisfaction Level scores. Meanwhile, Delivery Speed (.186),
Price Level (.349), Service (.708) and Product Quality (.152) did not make a
significant unique contribution to the prediction of the dependent variable. This
may be due to overlap with other independent variables in the model.

 The other potentially useful piece of information in the coefficients table is the Part
correlation coefficients. Just to confuse matters, you will also see these coefficients
referred to as semi-partial correlation coefficients (see Tabachnick and Fidell, 2001, p.
140). If you square this value (whatever it is called) you get an indication of the
contribution of that variable to the total R squared. In other words, it tells you how
much of the total variance in the dependent variable is uniquely explained by that
variable and how much R squared would drop if it wasn’t included in your model. In
this example the Product Quality scale has a part correlation coefficient of –.067.
If we square this (multiply it by itself) we get .004, indicating that Product Quality
uniquely explains less than half of one per cent of the variance in Satisfaction Level
scores.
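This relationship can be checked directly, since the squared part correlation equals the drop in R² when the variable is removed from the full model. A minimal sketch, assuming the DataFrame df from the earlier sketches:

import statsmodels.api as sm

predictors = ["Delivery Speed", "Price Level", "Price Flexibility",
              "Manufacturer Image", "Service", "Salesforce Image",
              "Product Quality"]
full = sm.OLS(df["Satisfaction Level"], sm.add_constant(df[predictors])).fit()
reduced = sm.OLS(df["Satisfaction Level"],
                 sm.add_constant(df[[p for p in predictors
                                     if p != "Product Quality"]])).fit()

# The unique contribution of Product Quality to R squared:
print(full.rsquared - reduced.rsquared)   # about (-.067)**2 = .004 here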

8) Hierarchical Multiple Regression

In hierarchical regression (also called sequential) the independent variables are entered into
the equation in the order specified by the researcher based on theoretical grounds. Variables
or sets of variables are entered in steps (or blocks), with each independent variable being
assessed in terms of what it adds to the prediction of the dependent variable, after the
previous variables have been controlled for. For example, if you wanted to know how well
optimism predicts life satisfaction, after the effect of age is controlled for, you would enter
age in Block 1 and then Optimism in Block 2. Once all sets of variables are entered, the
overall model is assessed in terms of its ability to predict the dependent measure. The relative
contribution of each block of variables is also assessed.
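Outside SPSS, this blockwise logic amounts to comparing two nested models on their change in R²; the sketch below is an illustration assuming a pandas DataFrame df with the HATCO column names.

import statsmodels.api as sm

block1 = ["Usage Level"]
block2 = block1 + ["Delivery Speed", "Price Level", "Price Flexibility",
                   "Manufacturer Image", "Service", "Salesforce Image",
                   "Product Quality"]

m1 = sm.OLS(df["Satisfaction Level"], sm.add_constant(df[block1])).fit()
m2 = sm.OLS(df["Satisfaction Level"], sm.add_constant(df[block2])).fit()

r2_change = m2.rsquared - m1.rsquared                 # SPSS "R Square Change"
f_change, p_change, df_diff = m2.compare_f_test(m1)   # SPSS "Sig. F Change"
print(r2_change, f_change, p_change)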

a) Procedure for Hierarchical Multiple Regression

1) From the menu at the top of the screen click on: Analyze, and then click on
Regression, then on Linear.
2) Choose your continuous dependent variable (e.g. Satisfaction Level) and move it into
the Dependent box.
3) Move the variables you wish to control for into the Independent box (e.g. Usage
Level). This will be the first block of variables to be entered in the analysis (Block 1
of 1).
4) Click on the button marked Next. This will give you a second independent variables
box to enter your second block of variables into (you should see Block 2 of 2).
5) Choose your next block of independent variables (e.g. Perception of HATCO).
6) In the Method box make sure that this is set to the default (Enter). This will give you
standard multiple regressions for each block of variables entered.
7) Click on the Statistics button. Tick the boxes marked Estimates, Model fit, R squared
change, Descriptive, Part and partial correlations and Collinearity diagnostics. Click
on Continue.
8) Click on the Options button. In the Missing Values section click on Exclude cases
pairwise.
9) Click on the Save button. Click on Mahalanobis and Cook’s. Click on Continue and
then OK.

Some of the output generated from this procedure is shown below.

Model Summaryc

                                                                  Change Statistics
Model   R       R Square   Adjusted   Std. Error of   R Square   F         df1   df2   Sig. F
                           R Square   the Estimate    Change     Change                Change
1       .711a   .505       .500       .6049           .505       100.016   1     98    .000
2       .895b   .801       .784       .3979           .296       19.363    7     91    .000

a. Predictors: (Constant), Usage Level
b. Predictors: (Constant), Usage Level, Price Level, Salesforce Image, Product Quality, Price Flexibility,
Manufacturer Image, Delivery Speed, Service
c. Dependent Variable: Satisfaction Level

b) Interpretation of Output from Hierarchical Multiple Regression

The output generated from this analysis is similar to the previous output, but with some extra
pieces of information. In the Model Summary box there are two models listed. Model 1
refers to the first block of variables that were entered (Usage Level), while Model 2 includes
all the variables that were entered in both blocks (Usage Level plus the Perception of
HATCO variables, X1 – X7).

Step 1: Evaluating the model

Check the R Square values in the first Model Summary box. After the variables in Block 1
(Usage Level) have been entered, the overall model explains 50.5 per cent of the variance
(.505 × 100). After the Block 2 variables (Perception of HATCO) have also been included,
the model as a whole explains 80.1 per cent (.801 × 100). It is important to note that this
second R square value includes all the variables from both blocks, not just those included in
the second step. To find out how much of this overall variance is explained by our variables
of interest (Perception of HATCO) after the effects of Usage Level are removed, you need to
look in the column labelled R Square change.

In the output presented above you will see, on the line marked Model 2, that the R square
change value is .296. This means that the X1 – X7 variables explain an additional 29.6 per
cent (.296 × 100) of the variance in Satisfaction Level, even when the effects of Usage
Level are statistically controlled for. This is a statistically significant contribution, as
indicated by the Sig. F change value for this line (.000). The ANOVA table indicates that the
model as a whole (which includes both blocks of variables) is significant [F(8, 91) = 45.84,
p < .0005].

ANOVAa

Model          Sum of Squares   df   Mean Square   F         Sig.
1  Regression  36.602            1   36.602        100.016   .000b
   Residual    35.864           98   .366
   Total       72.466           99
2  Regression  58.060            8   7.257         45.843    .000c
   Residual    14.406           91   .158
   Total       72.466           99

a. Dependent Variable: Satisfaction Level
b. Predictors: (Constant), Usage Level
c. Predictors: (Constant), Usage Level, Price Level, Salesforce Image, Product Quality, Price
Flexibility, Manufacturer Image, Delivery Speed, Service
