Correlation 1
Definition:
1. Finding the relationship between two quantitative variables without being able to infer
causal relationships.
2. Correlation is a statistical technique used to determine the degree to which two
variables are related.
Explanation:
A correlation is expressed by a value, called a correlation coefficient, which lies
between -1 and +1. The further away the correlation coefficient is from 0, the
stronger the relationship between the variables. If the correlation coefficient is 0, it
means there is no linear relationship between the variables being measured.
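As a rough illustration of how such a coefficient can be computed, the sketch below implements the Pearson correlation coefficient in plain Python. The notes do not say which coefficient they have in mind, so Pearson's r is an assumption here, and the function name pearson_r and all data values are made up for the example.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length, at least 2 values each")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations from the means (the co-variation)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations for each variable
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return sxy / math.sqrt(sxx * syy)

# Made-up illustrative data, not taken from the notes
x = [1, 2, 3, 4, 5]
r_pos = pearson_r(x, [2, 4, 6, 8, 10])   # y rises in step with x
r_neg = pearson_r(x, [10, 8, 6, 4, 2])   # y falls in step with x
r_zero = pearson_r(x, [2, 1, 4, 1, 2])   # no linear trend in y

print(r_pos, r_neg, r_zero)
```

The three data sets are chosen to land on the two extremes and the midpoint of the scale: a perfect positive relationship, a perfect negative one, and no linear relationship.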
An example of two variables that could produce a 0 correlation coefficient is
illustrated below. In reality the correlation coefficient of these data points will be
very close to zero as it is very rare that there is absolutely no relationship
between variables.
A perfect positive correlation has a coefficient of +1. A positive correlation means that
the high values of one data set are matched with the high values of the other data set,
or as the values for one variable increase so do the values for the other variable.
It would seem from the chart that the last x-value is suspect: either very
atypical or even a mistake, i.e. it is an outlier.
Regression:
Calculates the best-fit line for a certain set of data.
The regression line makes the sum of the squares of the residuals smaller than
for any other line.
Regression Analyses
Regression: technique concerned with predicting some variables by
knowing others
The process of predicting variable Y using variable X
The regression equation describes the regression line mathematically, using two parameters:
Intercept
Slope
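For a single explanatory variable, the regression equation and its two parameters can be written as follows. The symbols a (intercept), b (slope), and the hat on y (predicted value) are notation assumed here for illustration; the notes themselves do not name them.

```latex
\hat{y} = a + b\,x,
\qquad
b = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2},
\qquad
a = \bar{y} - b\,\bar{x}
```

Here \bar{x} and \bar{y} are the sample means of X and Y. The slope b gives the change in the predicted Y for a one-unit increase in X, and the intercept a is the predicted Y when X is 0.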
Predicting from linear relationships
The simplest type of statistical prediction uses linear correlation between a variable of
interest and another variable that either directly affects it, or is at least correlated with it
(the explanatory variable).
Consider a scatter plot that shows the size of ten towns along the bottom (x-axis)
and the number of dentists in each of those towns up the side (y-axis).
So in the first town there is a population of 50,000 people and a total of 6 dentists. If
these two variables were perfectly correlated it would be very easy to predict how many
dentists would be in any size town. It would simply be a matter of tracing a vertical line
from the town size on the x-axis up to the line passing through all the points and
then taking a horizontal line from there over to the y-axis.
Unfortunately our variables are not perfectly correlated. So instead of using a line which
joins all the points we use what is called a least squares regression line. This line does
not have to pass through any of the data points; it is the line for which the sum of the
squared vertical distances between the points and the line is as small as possible. This
line is also called the line of best fit
and is illustrated using our original dentist/population chart below.
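The prediction step described above can be sketched in code. Only the first town (50,000 people, 6 dentists) comes from the text; the other nine towns are hypothetical values invented for this example, as are the function names fit_line and predict. Populations are given in thousands.

```python
def fit_line(xs, ys):
    """Least squares slope and intercept for one explanatory variable."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def predict(x, slope, intercept):
    """Read the predicted y-value off the regression line."""
    return intercept + slope * x

# Town size in thousands of people and number of dentists.
# Only the first pair (50, 6) is from the text; the rest are hypothetical.
population = [50, 80, 120, 150, 200, 250, 300, 350, 400, 500]
dentists = [6, 9, 11, 16, 19, 27, 28, 36, 41, 48]

slope, intercept = fit_line(population, dentists)

# Predicted number of dentists for a hypothetical town of 275,000 people
estimate = predict(275, slope, intercept)
print(f"slope={slope:.4f}, intercept={intercept:.4f}, estimate={estimate:.1f}")
```

This is the numerical equivalent of tracing a vertical line up from the town size to the line of best fit and then across to the y-axis. The estimate is only as trustworthy as the linear relationship itself, and it is safest for town sizes within the range of the observed data.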