
SPSS ANALYSIS - CORRELATION

Figure 1

PEARSON CORRELATION = r = COEFFICIENT OF CORRELATION
SIG. (2-TAILED) = p = SIGNIFICANCE VALUE
N = NO. OF RESPONDENTS

Correlation is a statistical technique that shows how strongly two variables are
related to each other, that is, the degree of association between the two. For example,
given height and weight data for a group of people, the correlation between the two
variables tells us how they are related; typically we would find that weight is
positively related to height. Correlation is measured by the correlation coefficient,
which always lies in the range of -1 to 1. We only consider the correlations that lie
above the diagonal, because they are the same as the figures that lie below the
diagonal for the same pairs of variables.
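The same coefficient can be computed outside SPSS. Below is a minimal sketch in Python, assuming the responses sit in a pandas DataFrame; the column names follow the V1-V6 labels used in this document, but the example values are hypothetical.

```python
# A sketch of computing Pearson's r and its two-tailed p-value, mirroring the
# "Pearson Correlation" and "Sig. (2-tailed)" cells of the SPSS table.
# The ratings below are hypothetical, not the actual survey data.
import pandas as pd
from scipy.stats import pearsonr

df = pd.DataFrame({
    "V1": [5, 3, 4, 2, 5, 1, 4, 3],   # outdoor lifestyle (hypothetical)
    "V3": [5, 2, 4, 2, 4, 1, 4, 2],   # relating to weather (hypothetical)
})

r, p = pearsonr(df["V1"], df["V3"])
print(f"r = {r:.3f}, p = {p:.3f}")

# A full correlation matrix mirrors the SPSS output: the values above the
# diagonal duplicate those below it, and the diagonal itself is always 1.
print(df.corr())
```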

ANALYSIS OF THE CORRELATION TABLE


Positive and negative correlation: In the given table we come across both
positive and negative correlations. For example, the correlation between V1 and
V3 is 0.787, a positive correlation, because the two variables V1 and
V3 move in the same direction. By contrast, the correlation between V1 and V4 is
-0.124, a negative correlation, because the two variables V1 and V4 move in
opposite directions.

DEGREE OF CORRELATION
1. Perfect correlation - The magnitude of a correlation lies on a scale of 0 to 1,
where 0 means no relationship between the two variables and 1 means a perfect
correlation. A perfect correlation always occurs when a variable is correlated
with itself. For example, when V1 is correlated with V1 we always get a perfect
correlation of 1. All the diagonal values in the correlation table are therefore
perfect correlations.

Figure 2

2. High degree of correlation - When the correlation coefficient is above 0.75,
it is called a high degree of correlation. The values highlighted in Figure 3
exhibit a high degree of correlation between two variables. In this table there is
a high, statistically significant correlation between V1 (outdoor lifestyle) and
V3 (relating to weather), r = 0.787, with a corresponding p-value of 0.000, which
is below the 1% significance level; this means the correlation is significant.

Figure 3

3. Moderate degree of correlation - When the correlation coefficient is between
0.50 and 0.75, it is called a moderate degree of correlation. The values
highlighted in Figure 4 exhibit a moderate degree of correlation between two
variables. In this table there is a moderate, statistically significant correlation
in two cases:
• Between V1 (outdoor lifestyle) and V5 (exercising regularly), r = 0.647, with a
corresponding p-value of 0.000, which is below the 1% significance level; this
indicates a moderate degree of correlation.
• Between V2 (enjoying nature) and V4 (living in harmony with the environment),
r = 0.501, with a corresponding p-value of 0.005, which is below the 1%
significance level; this indicates a moderate degree of correlation.

Figure 4

4. Low degree of correlation - When the correlation coefficient is between 0.25
and 0.50, it is called a low degree of correlation. The values highlighted in
Figure 5 exhibit a low degree of correlation between two variables. In this table
there is a low, statistically significant correlation in the following cases:
• Between V1 (outdoor lifestyle) and V6 (meeting other people), r = 0.416, with a
corresponding p-value of 0.022, which is below the 5% significance level; this
indicates a low degree of correlation.
• Between V2 (enjoying nature) and V6 (meeting other people), r = 0.448, with a
corresponding p-value of 0.013, which is below the 5% significance level; this
indicates a low degree of correlation.
• Between V3 (relating to weather) and V5 (exercising regularly), r = 0.395, with
a corresponding p-value of 0.031, which is below the 5% significance level; this
indicates a low degree of correlation.

• Between V3 (relating to weather) and V6 (meeting other people), r = 0.398, with
a corresponding p-value of 0.030, which is below the 5% significance level; this
indicates a low degree of correlation.

Figure 5
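Taken together, the bands above (perfect, high, moderate, low) can be expressed as a small helper function. This is only a sketch of the cut-offs used in this document; the thresholds follow this document's convention, not a universal standard.

```python
# Label the degree of correlation using the bands described above.
def degree_of_correlation(r: float) -> str:
    magnitude = abs(r)              # the sign only gives the direction
    if magnitude == 1.0:
        return "perfect"
    if magnitude > 0.75:
        return "high"
    if magnitude >= 0.50:
        return "moderate"
    if magnitude >= 0.25:
        return "low"
    return "below 0.25 (not classified above)"

# Values taken from the correlation table discussed above
for r in (1.0, 0.787, 0.647, 0.416, -0.124):
    print(f"r = {r:+.3f} -> {degree_of_correlation(r)}")
```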

LINEAR REGRESSION
Linear regression is the next step up after correlation. It is used when we want to predict the
value of a variable based on the value of another variable. The variable we want to predict is
called the dependent variable (or sometimes, the outcome variable). The variable we are
using to predict the other variable's value is called the independent variable (or sometimes,
the predictor variable). For example, you could use linear regression to understand whether
exam performance can be predicted based on revision time; whether cigarette consumption
can be predicted based on smoking duration; and so forth.
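As an illustration, a simple linear regression can also be fitted outside SPSS with a few lines of Python. This is only a sketch: the two arrays stand in for V6 (meeting other people) and V1 (outdoor lifestyle), and their values are hypothetical.

```python
# Fit y = intercept + slope * x and report the fit statistics that SPSS
# also produces (R via rvalue, significance via pvalue).
from scipy.stats import linregress

v6 = [4, 2, 5, 3, 4, 1, 5, 2]   # independent (predictor) variable
v1 = [5, 3, 5, 3, 4, 2, 4, 2]   # dependent (outcome) variable

fit = linregress(v6, v1)
print(f"intercept = {fit.intercept:.3f}, slope = {fit.slope:.3f}")
print(f"R = {fit.rvalue:.3f}, p = {fit.pvalue:.3f}")
```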

ANALYSIS
1. Variables entered and removed table: The Variables Entered/Removed table shows
the two variables used in the analysis: V6 (meeting other people), which is our
independent variable, and V1 (outdoor lifestyle), which is our dependent
variable.

Particulars of the variables entered and removed are explained below:


• Model – SPSS allows you to specify multiple models in a single regression command.
This tells you the number of the model being reported.
• Variables Entered – SPSS allows you to enter variables into a regression in blocks, and
it allows stepwise regression. Hence, you need to know which variables were entered
into the current regression. If you did not block your independent variables or use
stepwise regression, this column should list all of the independent variables that you
specified.
• Variables Removed – This column lists the variables that were removed from the
current regression. Usually, this column will be empty unless you did a stepwise
regression.
• Method – This column tells you the method that SPSS used to run the regression.
“Enter” means that each independent variable was entered in the usual fashion. If you
did a stepwise regression, the entry in this column would tell you that.
2. Model summary table: The first table of interest is the Model Summary table. This
table provides the R, R², adjusted R², and the standard error of the estimate, which can be
used to determine how well a regression model fits the data:

The R value represents the simple correlation and is 0.416 (the "R" column), which indicates
a low degree of correlation. The R² value (the "R Square" column) indicates how much of
the total variation in the dependent variable, V1 (outdoor lifestyle), can be explained by the
independent variable, V6 (meeting other people). In this case, 17.3% can be explained, which
is not very large (a short worked sketch follows the list of particulars below).
Particulars of the model summary table are explained below:
• Model – SPSS allows you to specify multiple models in a single regression command.
This tells you the number of the model being reported.
• R – R is the square root of R-Square and is the correlation between the observed and
predicted values of the dependent variable.
• R-Square – R-Square is the proportion of variance in the dependent variable (outdoor
lifestyle) that can be predicted from the independent variable (meeting other people).
This value indicates that 17.3% of the variance in V1 (outdoor lifestyle) can be predicted
from V6 (meeting other people). Note that this is an overall measure of the strength of
association and does not reflect the extent to which any particular independent variable is
associated with the dependent variable. R-Square is also called the coefficient of
determination.
• Adjusted R-Square – As predictors are added to the model, each predictor will explain
some of the variance in the dependent variable simply due to chance. One could continue
to add predictors to the model, which would continue to improve the ability of the
predictors to explain the dependent variable, although some of this increase in R-Square
would be simply due to chance variation in that particular sample. The adjusted R-Square
attempts to yield a more honest value to estimate the R-Square for the population.
• Std. Error of the Estimate – The standard error of the estimate, also called the root
mean square error, is the standard deviation of the error term, and is the square root of the
Mean Square Residual (or Error).
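The quantities in the Model Summary table are tied together by simple formulas. The sketch below reproduces them from the values reported in this document (R = 0.416, N = 30, one predictor, Mean Square Residual = 3.278 from the ANOVA table).

```python
# Relationships between the Model Summary columns.
n, k = 30, 1                       # respondents and number of predictors
r = 0.416                          # "R" column
r_square = r ** 2                  # "R Square": 0.416**2 = 0.173, i.e. 17.3%

# Adjusted R-Square penalises R-Square for the number of predictors.
adj_r_square = 1 - (1 - r_square) * (n - 1) / (n - k - 1)

# Std. Error of the Estimate = square root of the Mean Square Residual.
see = 3.278 ** 0.5

print(f"R2 = {r_square:.3f}, adjusted R2 = {adj_r_square:.3f}, SEE = {see:.3f}")
```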

3. ANOVA table: The F-ratio in the ANOVA table (see below) tests whether the overall
regression model is a good fit for the data. The table shows that the independent
variable statistically significantly predicts the dependent variable, F(1, 28) = 5.857,
p < 0.05 (i.e., the regression model is a good fit for the data). The p-value of 0.022
is less than 0.05, which means the regression is significant.

Particulars of the ANOVA table are explained below:

• Model – SPSS allows you to specify multiple models in a single regression command.
This tells you the number of the model being reported.
• Sum of Squares – These are the Sums of Squares associated with the three sources of
variance: Total, Model (Regression) and Residual. These can be computed in many ways.
• Df – These are the degrees of freedom associated with the sources of variance. The total
variance has N-1 degrees of freedom. In this case, there were N = 30 respondents, so the
df for total is 29. The model degrees of freedom correspond to the number of predictors
minus 1 (K-1). You may think this would be 0 (since there was only 1 independent
variable in the model, i.e. meeting other people), but the intercept is automatically
included in the model (unless you explicitly omit it). Including the intercept,
there are 2 predictors, so the model has 2-1 = 1 degree of freedom. The residual degrees
of freedom is the df total minus the df model: 29 - 1 = 28.
• Mean Square – These are the Mean Squares: the Sums of Squares divided by their
respective df.
For the Regression, 19.196 / 1 = 19.196.
For the Residual, 91.771 / 28 = 3.278.
These are computed so you can form the F-ratio, dividing the Mean Square
Regression by the Mean Square Residual to test the significance of the predictors in the
model.
• F and Sig. – The F-value is the Mean Square Regression (19.196) divided by the Mean
Square Residual (3.278), yielding F = 5.857 (a short numeric check of this arithmetic
follows this list).
The p-value is compared to your alpha level (typically 0.05) and, if smaller, you can
conclude “Yes, the independent variable reliably predicts the dependent variable”. Here
you could say that V6 (meeting other people) reliably predicts V1 (outdoor lifestyle),
the dependent variable. If the p-value were greater than 0.05, you would say that the
group of independent variables does not show a statistically significant relationship
with the dependent variable. Note that this is an overall significance test assessing
whether the group of independent variables, when used together, reliably predicts the
dependent variable; it does not address the ability of any particular independent
variable to predict the dependent variable.
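The arithmetic described in the Df, Mean Square, and F bullets above can be checked directly, as in this short sketch using the figures reported in this document.

```python
# Verify the ANOVA decomposition for the simple regression.
ss_regression = 19.196
ss_residual = 91.771
n = 30                                 # respondents
df_total = n - 1                       # 29
df_model = 1                           # (2 predictors incl. intercept) - 1
df_residual = df_total - df_model      # 28

ms_regression = ss_regression / df_model   # 19.196
ms_residual = ss_residual / df_residual    # ~ 3.278
f_ratio = ms_regression / ms_residual      # ~ 5.857

print(f"MS regression = {ms_regression:.3f}")
print(f"MS residual   = {ms_residual:.3f}")
print(f"F({df_model}, {df_residual}) = {f_ratio:.3f}")
```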

4. Model Coefficient table: This table gives the coefficients so that you can construct the
regression equation. Notice that the coefficients change depending on which predictors
are included in the model. The general form of the equation to predict V1 (outdoor
lifestyle) from V6 (meeting other people) is:
Predicted V1 = 2.352 + (0.435 × V6)
This is obtained from the Coefficients table, as shown below:

Unstandardized coefficients indicate how much the dependent variable varies with an
independent variable when all other independent variables are held constant. Consider the
effect of V6 (meeting other people) in this example. The unstandardized coefficient, B1, for
V6 (meeting other people) is equal to 0.435 (see the Coefficients table). This means that for
each one-unit increase in V6 (meeting other people), there is an increase in V1 (outdoor
lifestyle) of 0.435.
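As a sketch, the fitted equation can be applied directly; the intercept and slope come from this document's Coefficients table, while the input values are arbitrary.

```python
# Predict V1 (outdoor lifestyle) from V6 (meeting other people)
# using the equation Predicted V1 = 2.352 + 0.435 * V6.
def predict_v1(v6: float) -> float:
    return 2.352 + 0.435 * v6

print(predict_v1(3.0))   # 3.657
print(predict_v1(4.0))   # 4.092 -- one unit more of V6 adds 0.435
```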
Particulars of the model coefficient table are explained below:
• Model – SPSS allows you to specify multiple models in a single regression command.
This tells you the number of the model being reported. This column shows the predictor
variables (constant and meeting other people).
• B – These are the values for the regression equation for predicting the dependent variable
from the independent variable. These are called unstandardized coefficients because they
are measured in their natural units. They give the regression equation, 2.352 +
(0.435 × V6).
These estimates tell you about the relationship between the independent variable and the
dependent variable: the amount of increase in outdoor lifestyle that would be predicted
by a one-unit increase in the predictor, i.e. meeting other people.
• Std. Error – These are the standard errors associated with the coefficients. The standard
error is used for testing whether the parameter is significantly different from 0, by dividing
the parameter estimate by the standard error to obtain a t-value (see the columns with t-
values and p-values).
• Beta – These are the standardized coefficients: the coefficients that you would
obtain if you standardized all of the variables in the regression, including the dependent
and all of the independent variables, and then ran the regression.
• t and Sig. – These columns provide the t-value and two-tailed p-value used in testing the
null hypothesis that the coefficient/parameter is 0 (a short sketch of this test follows the
list). If you use a two-tailed test, you would compare each p-value to your preselected
value of alpha. Coefficients with p-values less than alpha are statistically significant.
The coefficient for meeting other people (0.435) is statistically significant because its
p-value of 0.022 is less than 0.05.
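The t-test in the last bullet is simply the coefficient divided by its standard error, with the p-value read from a t distribution on the residual degrees of freedom. The sketch below illustrates this; the standard error is hypothetical, since the document reports only B = 0.435 and p = 0.022.

```python
# t = B / SE, two-tailed p from the t distribution with df = 28.
from scipy.stats import t as t_dist

b = 0.435            # coefficient from the table above
se = 0.180           # hypothetical standard error, for illustration only
df_residual = 28     # from the ANOVA table

t_value = b / se
p_value = 2 * t_dist.sf(abs(t_value), df_residual)
print(f"t = {t_value:.3f}, two-tailed p = {p_value:.3f}")
```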

MULTIPLE REGRESSION
Multiple regression is an extension of simple linear regression. It is used when we want to
predict the value of a variable based on the value of two or more other variables. The variable
we want to predict is called the dependent variable (or sometimes, the outcome, target or
criterion variable). The variables we are using to predict the value of the dependent variable
are called the independent variables (or sometimes, the predictor, explanatory or regressor
variables).
For example, you could use multiple regression to understand whether exam performance can
be predicted based on revision time, test anxiety, lecture attendance and gender.
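A multiple regression of this kind can be sketched outside SPSS with statsmodels. Only the variable roles (V1 as outcome, V2-V6 as predictors) come from this document; the DataFrame values are hypothetical.

```python
# Ordinary least squares with several predictors, mirroring the SPSS setup.
import pandas as pd
import statsmodels.api as sm

df = pd.DataFrame({
    "V1": [5, 3, 4, 2, 5, 1, 4, 3],   # outdoor lifestyle (outcome)
    "V2": [4, 3, 4, 2, 5, 2, 3, 3],   # enjoying nature
    "V3": [5, 2, 4, 2, 4, 1, 4, 2],   # relating to the weather
    "V4": [2, 3, 2, 4, 1, 4, 2, 3],   # living in harmony with environment
    "V5": [4, 3, 5, 1, 5, 2, 4, 2],   # exercising regularly
    "V6": [4, 2, 5, 3, 4, 1, 5, 2],   # meeting other people
})

X = sm.add_constant(df[["V2", "V3", "V4", "V5", "V6"]])  # add the intercept
model = sm.OLS(df["V1"], X).fit()
print(model.summary())   # R-Square, ANOVA F-test, coefficient t-tests
```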

ANALYSIS
1. Variables entered and removed table:

2. Model Summary Table
The first table of interest is the Model Summary table. This table provides the R, R²,
adjusted R², and the standard error of the estimate, which can be used to determine how well
a regression model fits the data:

The "R" column represents the value of R, the multiple correlation coefficient. R can be
considered to be one measure of the quality of the prediction of the dependent variable; in
this case, a value of 0.909 indicates a good level of prediction. The "R Square" column
represents the R² value (also called the coefficient of determination), which is the
proportion of variance in the dependent variable that can be explained by the
independent variables (technically, it is the proportion of variation accounted for by the
regression model above and beyond the mean model). We can see from the value of 0.826
that the independent variables explain 82.6% of the variability of the dependent variable.
Adjusted R² corrects this figure for the number of predictors in the model, giving a more
accurate estimate. The footnote on this table tells us which variables are included in the
equation.

3. ANOVA Table (test using α = 0.05)

The F-ratio in the ANOVA table (see below) tests whether the overall regression model is a
good fit for the data. The table shows that the independent variables statistically significantly
predict the dependent variable, F(5, 24) = 22.776, p < 0.05 (i.e., the regression model is a
good fit for the data). The p-value of 0.000 is less than 0.05, which means the regression is
significant.

4. Model Coefficient table:
This table gives the coefficients so that you can construct the regression equation. Notice
that the coefficients change depending on which predictors are included in the model.
The general form of the equation to predict V1 (outdoor lifestyle) from V6 (meeting other
people), V2 (enjoying nature), V3 (relating to the weather), V4 (living in harmony with the
environment) and V5 (exercising regularly) is:
Predicted V1 = 0.563 + (0.191 × V6) - (0.031 × V2) + (0.566 × V3) - (0.288 × V4) + (0.594 × V5)

This is obtained from the Coefficients table, as shown below:

Unstandardized coefficients indicate how much the dependent variable varies with an
independent variable when all other independent variables are held constant. Consider the
effect of V6 (meeting other people) in this example. The unstandardized coefficient, B1, for
V6 (meeting other people) is equal to 0.191 (see the Coefficients table). This means that for
each one-unit increase in V6 (meeting other people), there is an increase in V1 (outdoor
lifestyle) of 0.191.
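As with the simple regression, the equation can be applied directly as a sketch; the coefficients are those reported in this document's Coefficients table, and the input ratings are arbitrary.

```python
# Predict V1 (outdoor lifestyle) from the five predictors using the
# fitted multiple regression equation above.
def predict_v1(v2, v3, v4, v5, v6):
    return (0.563 + 0.191 * v6 - 0.031 * v2
            + 0.566 * v3 - 0.288 * v4 + 0.594 * v5)

# Holding the other predictors fixed, one extra unit of V6 adds 0.191.
print(predict_v1(v2=3, v3=4, v4=2, v5=4, v6=3))   # 5.107
print(predict_v1(v2=3, v3=4, v4=2, v5=4, v6=4))   # 5.298
```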

Statistical significance of the independent variables
We can test for the statistical significance of each of the independent variables. This tests
whether the unstandardized (or standardized) coefficients are equal to 0 (zero) in the
population. If p < .05, you can conclude that the coefficients are statistically significantly
different to 0 (zero). The t-value and corresponding p-value are located in the "t" and "Sig."
columns, respectively, as highlighted below:

Here, the coefficients of the independent variables V3 (relating to weather), V4 (living in
harmony with the environment) and V5 (exercising regularly) are statistically significantly
different from 0 (zero). Although the intercept, B0, is tested for statistical significance, this
is rarely an important or interesting finding.

RESULT
A multiple regression was run to predict V1 (outdoor lifestyle) from V6 (meeting other
people), V2 (enjoying nature), V3 (relating to the weather), V4 (living in harmony with the
environment) and V5 (exercising regularly). These variables statistically significantly
predicted V1 (outdoor lifestyle), F(5, 24) = 22.776, p < 0.05, R² = 0.826. Of the individual
predictors, V3 (relating to the weather), V4 (living in harmony with the environment) and
V5 (exercising regularly) added statistically significantly to the prediction, p < .05.
