SPSS Basic Guidance 3
Crosstab & chi-square analysis AND Bivariate
data analysis: Correlation and regression
Content:
1. Crosstab & chi-square analysis
2. Bivariate data analysis: Correlation and regression
3. Extra
Extra 1: Chi-square value
Extra 2: Expected count
Disclaimer: This is a student’s work. If there is any mistake, please contact the author.
1. Crosstab and chi square analysis
Uses: To determine whether there is any association between 2 qualitative variables
Ho = there is no association between the health outcome and vaccination status
H1 = there is an association between the health outcome and vaccination status
Before we go deep into the example, we must know that there are 6 assumptions that must be met when doing regression analysis (a.k.a. the 6-item checklist)
A bank loan officer wanted to know the association between income and value of car
purchased by his customer. 20 customers were randomly picked, their income and
value of car bought were tabulated as below. Income and price are in $’000.
Objectives: To test the association and relationship between income and price
1.
Here we must know how to decide which variable goes on the y-axis and which goes on the x-axis
- X-axis = independent variable / Predictor variable
- Y-axis = dependent variable / Outcome variable
From this example we know that the income actually will affect the price of car
that they bought
So independent = income
Dependent = price
Note: in some situations there is no independent or dependent variable.
Assumption 2: Linear relationship
Assumption 3: No significant outliers
Analyze > Regression > Linear…
At ‘Statistics’
At ‘Plots’
Assumption 5: Homoscedasticity
ZRESID:
- Standardized residuals
- Y-axis
ZPRED:
- Standardized predicted values
- X-axis
Assumption 1: Quantitative data
First, we can see that income and price are both specific numbers, which means they are quantitative: they take actual numeric values (Note: not the same as ordinal data)
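Note:
The r-value ranges from -1 to 1. A positive r means a positive association and a negative r means a negative association. The strength is judged from the size of r, ignoring the sign:
0 – 0.3 = weak association
0.4 – 0.7 = moderate association
0.8 – 1 = strong association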
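To make the r-value less of a black box, here is a minimal sketch of how Pearson's r can be computed by hand. The function name `pearson_r` and the five data points are made up for illustration; they are not the 20 customers from the example, and SPSS does this calculation for you.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient for two equal-length numeric lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products and sums of squared deviations from the means
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical values in $'000, for illustration only
income = [50, 60, 70, 80, 90]
price = [120, 150, 185, 200, 240]

r = pearson_r(income, price)
print(round(r, 3))
```

A value of r close to 1 here would be read as a strong positive association under the rule of thumb above.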
Up to here, everything above in part 2 explains association.
From here onward, everything below in part 2 explains relationship.
The R² value shown is 0.868, which means 86.8% of the variation in Price is explained by Income.
Note:
R square = coefficient of determination
The p-value is less than 0.001 which is also less than 0.05. Thus, reject null
hypothesis, accept alternative hypothesis.
There is an association between the income and the price of car.
Note:
Normally when the p-value shown is 0.000 we will not report it like this, but we will report it
as ‘less than 0.001’.
The equation:
Y = mX + C, where Y = price, X = income, m = slope and C = constant (intercept)
For every 1-unit increase in income ($1,000), the predicted price increases by $2.688K.
In the residual plot, all the points fall within ±3 and they are scattered randomly.
Thus, the data are assumed to be homoscedastic.
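The ±3 check can also be sketched in code. This is only an illustration under two assumptions: that a standardized residual is the raw residual divided by the standard error of the estimate, and that the data are the made-up values below (not the example's 20 customers). The helper name `standardized_residuals` is hypothetical.

```python
import math

def standardized_residuals(x, y):
    """Fit y = intercept + slope * x by least squares and return the
    residuals divided by the standard error of the estimate."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    slope = (sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
             / sum((a - mean_x) ** 2 for a in x))
    intercept = mean_y - slope * mean_x
    residuals = [b - (intercept + slope * a) for a, b in zip(x, y)]
    # Standard error of the estimate: sqrt(SSE / (n - 2)) with one predictor
    s = math.sqrt(sum(e ** 2 for e in residuals) / (n - 2))
    return [e / s for e in residuals]

# Hypothetical values in $'000, for illustration only
income = [50, 60, 70, 80, 90]
price = [120, 150, 185, 200, 240]

z = standardized_residuals(income, price)
within_3 = all(abs(v) < 3 for v in z)  # True when no point lies outside ±3
print(within_3)
```

Note that this only checks the ±3 limits. Whether the points are scattered randomly still has to be judged by eye from the plot.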
Make prediction:
The equation Price = -8.733 + 2.688(Income), can be used to
predict the Price given Income
In the data
- Min income = 51
- Max income = 86
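As a small sketch, the prediction step can be written as a function. The coefficients and the income range come from the text above. The function name `predict_price` and the decision to refuse incomes outside 51 to 86 are assumptions, based on the usual advice not to predict beyond the range of the observed data.

```python
INTERCEPT = -8.733               # constant C from the fitted equation
SLOPE = 2.688                    # slope m from the fitted equation
MIN_INCOME, MAX_INCOME = 51, 86  # observed income range in the data ($'000)

def predict_price(income):
    """Predicted car price in $'000 for a given income in $'000."""
    if not MIN_INCOME <= income <= MAX_INCOME:
        # Assumption: predictions outside the observed range are not trusted
        raise ValueError("income is outside the observed range 51 to 86")
    return INTERCEPT + SLOPE * income

print(round(predict_price(70), 3))
```

For an income of 70, this gives -8.733 + 2.688 × 70 = 179.427, i.e. a predicted price of about $179.4K.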
Note:
The larger the chi-square value, the more the data deviate from independence (meaning a greater association).
To be noted: every expected count must be at least 5 for the result to be valid.
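To show where the expected counts and the chi-square value come from, here is a minimal sketch. Each expected count is (row total × column total) / grand total, and the chi-square value is the sum of (observed - expected)² / expected over all cells. The 2×2 table is made up for illustration and the function names are hypothetical. SPSS reports these numbers in its crosstab output.

```python
def expected_counts(observed):
    """Expected count per cell: row total * column total / grand total."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]

def chi_square(observed):
    """Pearson chi-square statistic: sum of (O - E)^2 / E over all cells."""
    expected = expected_counts(observed)
    return sum((o - e) ** 2 / e
               for obs_row, exp_row in zip(observed, expected)
               for o, e in zip(obs_row, exp_row))

# Hypothetical counts: rows = vaccinated / not vaccinated,
# columns = good outcome / poor outcome
observed = [[30, 10],
            [20, 40]]

expected = expected_counts(observed)
chi2 = chi_square(observed)
# Number of cells that break the expected-count rule
small_cells = sum(1 for row in expected for e in row if e < 5)
print(expected, round(chi2, 3), small_cells)
```

The further the observed counts are from the expected counts, the larger the chi-square value becomes, which matches the note above.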
A researcher wanted to test the association between fitness rating and trouble falling asleep within a community. A cross-sectional study was carried out. The recruited participants were asked to rate their fitness on a scale from 1 to 10 (ordinal) and to answer a ‘yes’ or ‘no’ question about trouble falling asleep (nominal)
E.g.:
1. Overall, how would you rate your physical fitness? (ordinal)
1 2 3 4 5 6 7 8 9 10
2. Do
After we do the analysis, we find that 7 cells have an expected count of less than 5.
The result is not valid!
However, it is possible to ‘pool’ the data together! (ordinal data)
Transform > Recode into Different Variables
- Important to use ‘Different Variables’, so that it gives you a new column of recoded values (you will see it later)
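The idea of recoding into a different variable can be sketched as follows. The cut-offs (1 to 3, 4 to 7, 8 to 10) and the name `recode_fitness` are hypothetical, since the grouping actually used is not shown here. The point is only that the pooled values go into a new list and the original ratings stay untouched.

```python
def recode_fitness(score):
    """Pool a 1-10 fitness rating into 3 ordered groups.
    The cut-offs are hypothetical, chosen for illustration only."""
    if not 1 <= score <= 10:
        raise ValueError("rating must be between 1 and 10")
    if score <= 3:
        return 1  # low
    if score <= 7:
        return 2  # medium
    return 3      # high

ratings = [2, 5, 9, 7, 1, 10]                   # original column, left as it is
recoded = [recode_fitness(s) for s in ratings]  # new column of recoded values
print(recoded)
```

With fewer categories, each cell of the crosstab collects more respondents, so the expected counts go up.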
Ok, after we have done recoding, now let’s move on to the main business. The crosstabs!
Yes! There is no expected count that is less than 5
The result will be valid!
We can use this for the rest of the test
Example 2: What if it is impossible to ‘pool’ the data?
Then the only way to solve this is by INCREASING the sample size. Recruit more people!