
What is correlation?

Correlation is a statistical measure that expresses the extent to which two variables are linearly related (meaning they change together at a constant rate). It’s a common tool for describing simple relationships without making a statement about cause and effect.

How is correlation measured?

The sample correlation coefficient, r, quantifies the strength of the relationship. Correlations are also tested for statistical significance.

What are some limitations of correlation analysis?

Correlation cannot account for the presence or effect of other variables outside of the two being explored. Importantly, correlation doesn’t tell us about cause and effect. Correlation also cannot accurately describe curvilinear relationships.

Correlations describe data moving together

Correlations are useful for describing simple relationships among data. For example, imagine that you are looking at a dataset of campsites in a mountain park. You want to know whether there is a relationship between the elevation of the campsite (how high up the mountain it is) and the average high temperature in the summer. For each individual campsite, you have two measures: elevation and temperature. When you compare these two variables across your sample with a correlation, you can find a linear relationship: as elevation increases, the temperature drops. They are negatively correlated.

What do correlation numbers mean?

We describe correlations with a unit-free measure called the correlation coefficient, which ranges from -1 to +1 and is denoted by r. Statistical significance is indicated with a p-value. Therefore, correlations are typically written with two key numbers: r = and p = . The closer r is to zero, the weaker the linear relationship.
Positive r values indicate a positive correlation, where the values of both variables tend to increase together. Negative r values indicate a negative correlation, where the values of one variable tend to increase when the values of the other variable decrease. A small p-value gives us evidence that the population correlation coefficient is likely different from zero, based on what we observe from the sample.

The output of the analysis is shown below. The Model Summary table reports the correlation coefficient as R (for a bivariate correlation it should be a lowercase r, but the output uses uppercase). The R Square statistic is in the second column and is also known as “proportionate reduction in error” or “variance accounted for.” The second table is the ANOVA summary table, which tests the null hypothesis. In the case of correlation, the null hypothesis is that the correlation is zero. Here we reject the null hypothesis because the p value, .003, is less than .05.

The final table presents the regression coefficients. The Unstandardized Coefficients column lists the B weights. The B weight in the Constant row (613.611) is referred to as the intercept. The B weight in the predictor row for Birth Order (-39.861) is referred to as the slope. Note that the correlation will be negative when the slope has a negative value. These coefficients are used to form the following linear
regression equation:

y = 613.61 - 39.86x, where x is Birth Order
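
As a rough illustration of how r, R Square, and the B weights relate, the following Python sketch computes them from scratch. The campsite numbers are made up for illustration and are not from the analysis described here, and the function names (pearson_r, fit_line, predict) are hypothetical. Only the last lines use figures from the text: the intercept 613.611 and the slope -39.861 reported for Birth Order. The p-value is left out because it needs a t-distribution, which the Python standard library does not provide.

```python
import math


def pearson_r(xs, ys):
    """Sample Pearson correlation coefficient r for paired data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of cross-products and squared deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def fit_line(xs, ys):
    """Least-squares intercept and slope for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def predict(intercept, slope, x):
    """Predicted y from a linear regression equation."""
    return intercept + slope * x


# Made-up illustrative data (NOT from the source): campsite elevation in
# meters and average summer high temperature in degrees Celsius.
elevation = [500, 800, 1100, 1400, 1700, 2000]
temperature = [27.0, 25.5, 23.0, 21.5, 19.0, 18.0]

r = pearson_r(elevation, temperature)
r_squared = r ** 2  # "variance accounted for"
intercept, slope = fit_line(elevation, temperature)

print(f"r = {r:.3f}, R Square = {r_squared:.3f}")
print(f"intercept = {intercept:.3f}, slope = {slope:.5f}")

# Coefficients reported in the text for the Birth Order example.
SOURCE_INTERCEPT = 613.611
SOURCE_SLOPE = -39.861
for birth_order in (1, 2, 3):
    y_hat = predict(SOURCE_INTERCEPT, SOURCE_SLOPE, birth_order)
    print(f"Birth Order {birth_order}: predicted y = {y_hat:.3f}")
```

With the made-up campsite data, r and the slope come out negative together, which matches the note that a negative slope goes with a negative correlation.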
