Correlation Matrix
Understanding correlation
To conduct correlation analysis you need two continuous (also known as ratio or scale) variables. You can also correlate one dichotomous (binary) variable and one continuous (scale) variable. For example, sex (male/female) could be correlated with height. Correlation can also be used for ordinal (rank) data, using Spearman's rank correlation or Kendall's tau instead of the Pearson correlation. If you want to treat Likert-type scale data in a conservative manner you could also use Spearman's rank correlation instead of the Pearson correlation to test associations between scale items.
Correlation explores the relationship between one variable and another, but it does not explore or assume a causal relationship. The question correlation answers is: does variable A relate to or associate with variable B? An important caution regarding its use, then, is that there is no assumed cause and effect in correlation. A correlation does not show that variable A causes the change in variable B or that variable B causes a change in variable A. Other tests are needed to answer that question.
A positive relationship: as A increases B increases, or as B increases A increases.
A negative relationship: as A increases B decreases, or as B increases A decreases.
The results can range from:
positive one (+1), which is a complete match between the variables: as one variable goes up, the other goes up at a constant rate
negative one (-1), which is a complete inverse match between the variables: as one variable goes up, the other goes down at a constant rate
zero (0), which indicates no relationship at all between the variables: as one variable goes up or down the other goes up or down randomly.
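The range of results described above can be illustrated with a minimal sketch of the Pearson calculation. The function name `pearson_r` and the small data lists are made up for illustration; this is not SPSS's own routine, only the standard textbook formula (the sum of cross-products of deviations divided by the square root of the product of the sums of squared deviations).

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient for two equal-length numeric lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of cross-products of deviations from each mean
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # Sums of squared deviations for each variable
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("a variable with no variation cannot be correlated")
    return sxy / math.sqrt(sxx * syy)

# Made-up illustrative data (not from the survey)
a = [1, 2, 3, 4, 5]
same_direction = [2, 4, 6, 8, 10]      # rises as a rises
opposite_direction = [10, 8, 6, 4, 2]  # falls as a rises

print(round(pearson_r(a, same_direction), 3))      # 1.0
print(round(pearson_r(a, opposite_direction), 3))  # -1.0
```

Real survey data will almost never produce exactly +1 or -1; values between these extremes indicate weaker relationships.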
The direction of the slope of the line through the data shows the direction of the relationship between the variables. The test assumes the data has a linear relationship and a normal distribution and does not have problems with skewness, kurtosis or outliers. Outliers are scores or responses on items that fall at the outer edges of a distribution and can create problems when a mean is used, as they can change the mean value drastically.
Skewness is seen when there is a tail in the distribution. A value of zero indicates a symmetrical distribution, such as the normal distribution; a positive result indicates a distribution with its tail to the right (scores clustered at the low end) and a negative result indicates a tail to the left (scores clustered at the high end) (see Pallant 2007; Tabachnick & Fidell 2007).
[Figure: distributions showing negative skew, zero skew and positive skew]
Kurtosis is the peakedness of the distribution. Zero shows a normal distribution; a positive result shows the distribution as clustered in the centre (a high peak); and a negative result shows a relatively flat distribution, with many cases out towards the extremes. A scatterplot can also be used to look at the joint distribution of two variables, and ideally the plot has a roughly cigar shape.
[Figure: distributions showing positive kurtosis, zero kurtosis and negative kurtosis]
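To make the sign conventions for skewness and kurtosis concrete, here is a minimal sketch using the simple moment-based formulas. It is an illustration under stated assumptions: the function name and data are made up, and SPSS reports sample-adjusted versions of these statistics, so its values will differ slightly from the ones this sketch produces.

```python
def skewness_and_kurtosis(values):
    """Moment-based skewness and excess kurtosis (zero for a normal distribution)."""
    n = len(values)
    mean = sum(values) / n
    # Second, third and fourth central moments
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    skew = m3 / m2 ** 1.5
    excess_kurtosis = m4 / m2 ** 2 - 3.0  # subtract 3 so a normal curve scores 0
    return skew, excess_kurtosis

# Made-up illustrative scores
tail_to_right = [1, 1, 2, 2, 3, 10]   # most scores low, one high score
tail_to_left = [1, 8, 9, 9, 10, 10]   # most scores high, one low score
symmetric = [1, 2, 3, 4, 5]

print(skewness_and_kurtosis(tail_to_right)[0] > 0)  # True: positive skew
print(skewness_and_kurtosis(tail_to_left)[0] < 0)   # True: negative skew
print(skewness_and_kurtosis(symmetric)[0] == 0)     # True: no skew
```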
Other ways to check for normality in a distribution are to look at scatterplots, histograms and boxplots (see Pallant 2007; Tabachnick & Fidell 2007). Boxplots are useful when outliers appear in the data, as each outlier is identified with a case number and decisions about those cases are then easier to make. Outliers can distort any of the parametric tests, producing either higher or lower results depending on which end of the distribution the outliers lie. Most parametric tests are reasonably robust when sample sizes are large, so skewness and kurtosis are less of a problem when the number of respondents is large. Tabachnick and Fidell (2007:80) suggest that at a sample size of more than 200, the underestimation of variance associated with kurtosis disappears.
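The way a boxplot flags outliers by case number can be sketched as follows. This assumes the common rule that a score more than 1.5 box-lengths (interquartile ranges) beyond the edge of the box is an outlier; the function name and scores are made up, and the quartile method in Python's `statistics.quantiles` may not match the one SPSS uses, so borderline cases could be flagged differently.

```python
import statistics

def boxplot_outliers(scores):
    """Return (case number, score) pairs lying more than 1.5 box-lengths outside the box."""
    q1, _median, q3 = statistics.quantiles(scores, n=4)
    box_length = q3 - q1
    lower_fence = q1 - 1.5 * box_length
    upper_fence = q3 + 1.5 * box_length
    # Case numbers start at 1, as they do in a data file
    return [(case, s) for case, s in enumerate(scores, start=1)
            if s < lower_fence or s > upper_fence]

# Made-up illustrative scores; case 10 is far from the rest
scores = [12, 14, 15, 15, 16, 17, 18, 19, 20, 45]
flagged = boxplot_outliers(scores)
print(flagged)  # [(10, 45)]

# The single outlier pulls the mean upwards
kept = [s for case, s in enumerate(scores, start=1) if (case, s) not in flagged]
print(statistics.mean(scores), round(statistics.mean(kept), 2))
```

Comparing the two means shows why outliers matter for any test built on a mean.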
Conducting a correlation
Conducting a Pearson r correlation in SPSS is a straightforward process. The complexity is in ensuring that all the assumptions of Pearson r are met before you proceed, or at least
Phillip Patman
before you attempt to interpret your output. SPSS will attempt to undertake any task you ask it to; just because output is generated does not mean that the statistics are valid.
The first step in conducting a Pearson r correlation between two variables is to select the Correlate option from the Analyze menu. From this list select Bivariate; in this example, we are correlating two variables at a time. This screen is shown in figure 24.4. As shown in figure 24.5, missing values are excluded from the analysis pairwise, which means a respondent with a missing response about, for example, government spending on health will not be included in any correlation involving this variable, but will still be included in correlations between the other variables they answered.
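Pairwise exclusion of missing values can be sketched outside SPSS as follows. The variable names, the responses and the use of `None` for a missing answer are all made up for illustration; the point is only that each pair of variables uses every case that answered both questions, so N can differ from one correlation to the next.

```python
import math
from itertools import combinations

def pearson_r_pairwise(x, y):
    """Pearson r using only cases with a response on both variables; returns (r, N)."""
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    n = len(pairs)
    mean_x = sum(a for a, _ in pairs) / n
    mean_y = sum(b for _, b in pairs) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in pairs)
    sxx = sum((a - mean_x) ** 2 for a, _ in pairs)
    syy = sum((b - mean_y) ** 2 for _, b in pairs)
    return sxy / math.sqrt(sxx * syy), n

# Hypothetical responses from six people; None marks a missing answer
responses = {
    "health":    [1, 2, None, 4, 5, 3],
    "education": [2, 2, 3, None, 5, 3],
    "police":    [1, 3, 2, 4, 4, 3],
}

for name_a, name_b in combinations(responses, 2):
    r, n = pearson_r_pairwise(responses[name_a], responses[name_b])
    print(f"{name_a} x {name_b}: r = {r:.3f}, N = {n}")
```

Here the health and education pair uses four cases while the two pairs involving police use five.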
On this screen, we then add the variables we want to correlate and select Pearson from the correlation coefficient options. In figure 24.5, four items relating to government spending are cross-analysed against each other, two at a time, and the results are laid out in the correlations output table. Use the arrow to transfer two or more variables into the Variables window and select Options if needed.
Pearson: used for scale (interval or ratio) data. If the data is not normally distributed use Spearman or Kendall's tau-b.
Kendall's tau-b: used for ordinal data.
Spearman: used for ordinal data and more often reported than Kendall's tau-b.
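One way to see why Spearman suits ordinal data is that it is the Pearson calculation applied to ranks rather than to raw scores. The sketch below assumes tied scores share the average of their ranks; the function names and data are made up for illustration, and Kendall's tau-b is not shown.

```python
import math

def average_ranks(values):
    """Rank values from 1 upwards, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover every value tied with the one at position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks

def pearson_r(x, y):
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def spearman_rho(x, y):
    """Spearman's rank correlation: Pearson r computed on the ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))

# Made-up data: y always rises with x, but not in a straight line
x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 100]
print(round(spearman_rho(x, y), 3))  # 1.0, because the rank order matches exactly
print(pearson_r(x, y) < 1.0)         # True, because the relationship is not linear
```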
The default settings are shown in the Options window and other analyses can be selected if required, including means and standard deviations, which can be useful but can also be generated elsewhere within SPSS.
Now examine the correlation table generated from this analysis. The correlation output has each variable tested against all other selected variables, as shown in figure 24.6. The values above the diagonal mirror those below it. In the figure the diagonal cells are shaded grey (they are not normally grey) and lines indicate this mirroring in some of the cells. The diagonal results are the questions correlated with themselves and therefore must be a perfect match, or 1.0.
Correlations

Columns: b. Health; c. The police and law enforcement; d. Education; e. The military and defence

                                                                          b.        c.        d.        e.
Govt spending b. Health                           Pearson Correlation    1.000     .373**    .420**    .126**
                                                  Sig. (2-tailed)                  .000      .000      .000
                                                  N                      2701      2656      2677      2646
Govt spending c. The police and law enforcement   Pearson Correlation    .373**    1.000     .250**    .351**
                                                  Sig. (2-tailed)        .000                .000      .000
                                                  N                      2656      2669      2652      2626
Govt spending d. Education                        Pearson Correlation    .420**    .250**    1.000     .027
                                                  Sig. (2-tailed)        .000      .000                .164
                                                  N                      2677      2652      2687      2646
Govt spending e. The military and defence         Pearson Correlation    .126**    .351**    .027      1.000
                                                  Sig. (2-tailed)        .000      .000      .164
                                                  N                      2646      2646      2646      2655
Each cell in the table has a similar result where the first number is the correlation coefficient, the second the significance result and the last is the number of people. The number of people can vary from variable to variable depending on how many people have responded to the questions in the analysis.
Govt spending c. The police and law enforcement
.373**   Correlation result
.000     Significance result
2656     Number in the sample
There are many different interpretations but Pallant (2007:132) suggests that correlation results can be interpreted as follows:
r = 0.10 to 0.29: small correlation
r = 0.30 to 0.49: medium correlation
r = 0.50 to 1.00: high correlation.
Notation at the bottom of the output can be interpreted as follows:
no star: result is not significant
one star (*): result is significant at the 0.05 level
two stars (**): result is significant at the 0.01 level
three stars (***): result is significant at the 0.001 level.
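The two interpretation guides above can be written as small lookup functions. This is only a sketch: the function names are made up, the bands are applied to the size of r regardless of its sign (an assumption), values below 0.10 are labelled separately because the guide does not name them, and the treatment of a p value exactly on a boundary is also an assumption.

```python
def strength(r):
    """Label the size of a correlation using the bands quoted from Pallant (2007:132)."""
    size = abs(r)  # assumption: the sign gives direction only, not strength
    if size >= 0.50:
        return "high"
    if size >= 0.30:
        return "medium"
    if size >= 0.10:
        return "small"
    return "below the smallest band"

def stars(p):
    """Return the star notation for a significance (p) value."""
    if p < 0.001:
        return "***"
    if p < 0.01:
        return "**"
    if p < 0.05:
        return "*"
    return ""  # no star: not significant

# The first pair is the education by military cell from the table;
# the other pairs are hypothetical values chosen for illustration.
for r, p in [(0.027, 0.164), (0.35, 0.004), (-0.62, 0.0004)]:
    print(f"r = {r}: {strength(r)} correlation, stars = '{stars(p)}'")
```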
Correlation of health with:       r        Multiplied by itself   Multiplied by 100 (percentage of variance)
The police and law enforcement    0.370    0.137                  13.7
Education                         0.420    0.176                  17.6
The military and defence          0.126    0.016                  1.6
Looking at the other correlations in the table, we can see that there is a medium positive correlation between spending on health and spending on education (r = 0.420, p < 0.001) and a low positive correlation between spending on health and spending on the military and defence (r = 0.126, p < 0.001). All of these results are significant at the 0.001 level. From these results, a calculation of the amount of variance shared by, or overlap between, the variables can be performed by multiplying each correlation by itself and then by 100. We can see that health has more overlap with police and law enforcement and with education than with the military and defence. But how can this be interpreted? One explanation that makes intuitive sense would be that those who want more spending on health also want increased government spending in other areas, but are more likely to want more spending on education than on funding for the police and military.
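The variance calculation can be reproduced in a few lines. The three r values are the ones listed in the worked calculation; pairing the first value (0.370) with police and law enforcement is an assumption based on the correlation table, and the function name is made up for illustration.

```python
def shared_variance_percent(r):
    """Percentage of variance shared by two variables: r multiplied by itself, then by 100."""
    return r * r * 100

# r values for health as listed in the worked calculation;
# the label on the first value is an assumption
correlations_with_health = {
    "police and law enforcement": 0.370,
    "education": 0.420,
    "military and defence": 0.126,
}

for name, r in correlations_with_health.items():
    print(f"health and {name}: {shared_variance_percent(r):.1f}% shared variance")
# health and police and law enforcement: 13.7% shared variance
# health and education: 17.6% shared variance
# health and military and defence: 1.6% shared variance
```

Even the strongest of these correlations leaves more than 80 per cent of the variance unshared, which is worth keeping in mind when interpreting a medium correlation.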
M3: R: How many years of education have you completed?
H1e: Level of agreement with the statement: Aboriginal people who no longer follow traditional lifestyles are not really Aboriginal.
Question
Is there a statistically significant correlation between the number of years of education a respondent has and their attitudes towards Aboriginal identity and Aboriginal traditional lifestyles? How strong is this relationship and in what direction is the correlation? (Remember, to interpret this correlation matrix, you need to check on the variable screen which direction the level of agreement runs on the Likert-type scale.)
How might you explain or theorise the relationship between these two variables?
Run four other correlations you think might be interesting and report and interpret the results.
Now have a play with the data and see what you can find. Remember, this is nationally representative data so any results can be generalised to the Australian population overall.
References
Pallant, J. (2007). SPSS Survival Manual. Crows Nest: Allen & Unwin.
Tabachnick, B. G. and Fidell, L. S. (2007). Using Multivariate Statistics. Boston: Pearson.