
STAT 1430 Recitation 4A Scatterplots and Correlation

Basics of correlation:

1. Write down a data set that contains five (x, y) points whose correlation is EXACTLY
NEGATIVE ONE (-1). Hint: A scatterplot might help.
Data Point    X value    Y value
1             ______     ______
2             ______     ______
3             ______     ______
4             ______     ______
5             ______     ______

2. Write down a data set that contains five (x, y) points whose correlation is EXACTLY
POSITIVE ONE (+1)
Data Point    X value    Y value
1             ______     ______
2             ______     ______
3             ______     ______
4             ______     ______
5             ______     ______

Properties of correlation. See your lecture notes. Looking at the formula on the formula
sheet for r may help, but also consider the statistical reasons for your answers, not solely
math.
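
The formula sheet is not reproduced in this handout. Assuming it uses the standard definition of the sample correlation, r for n data points can be written as below, where \bar{x} and \bar{y} are the sample means and s_x and s_y are the sample standard deviations (this notation is an assumption, not taken from the course materials):

```latex
r = \frac{1}{n-1} \sum_{i=1}^{n}
    \left( \frac{x_i - \bar{x}}{s_x} \right)
    \left( \frac{y_i - \bar{y}}{s_y} \right)
```

Each term in the sum is a product of two standardized values, which is worth keeping in mind when thinking about the properties asked about below.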

3. Correlation has ______.
a) The same units as the data
b) No units

4. When finding the correlation between two quantitative variables, you will get the same
answer if you switch X and Y. Explain briefly. (See lecture notes)

5. Scatterplots examine relationships between what type(s) of variables?


a) Categorical b) Quantitative c) Both categorical and quantitative variables

6. Correlation is affected by outliers. Explain why, briefly.

7. Correlation is a measure of the strength and direction of what type of relationship between
two quantitative variables? ______________________________

8. Correlation of a sample (r) will always be a number between ______ and ______.


A professor examines the relationship between minutes spent studying and exam score (out of 200
points) for students in her course, using data from 320 randomly selected students from her
large class. Use the results below to answer the following questions:

9. Which of the following correlations is most plausible for this data set, given the scatterplot
above?
a. 0.061
b. 0.61
c. -0.61
d. 6.1

10. Interpret this correlation (pattern, strength, direction).

Bob wants to use house size to predict house price in Columbus. He starts out by making a
scatterplot of data from 100 homes randomly selected from the current Columbus market. In
this case:

11. Discuss how Bob could take a random sample of homes from the current Columbus market
(this means all the homes that are currently for sale in Columbus.)

12. What are the names of the two quantitative variables in this data set?

13. On Bob’s scatterplot, which variable makes the most sense to appear on the X axis?

-----------------------------------------------------------------------------------------
14. Which of the following correlations is the strongest?
a. 0.75
b. 0.45
c. -0.83


15. Each of the following statements about correlation has a statistical problem, in terms of the
properties of correlation (see lecture notes). In each case identify the problem.

a. The correlation between sales and customer returns is 0.35. This means the correlation between
customer returns and sales is -0.35.

b. The correlation between the shelf number at the store (1 = lowest) and the cost of the
product (per ounce) is 1.2.

c. There is a strong correlation between region of the country and sales.

d. The correlation between number of rushing yards and number of passing yards for Ohio
State is 0.6. Converting yards to feet, the correlation becomes 0.6 x 3.

e. The correlation between weight and height is 0.4 inches per pound.

The following scatter plot shows the relationship between vocabulary size and age. The
variables vocabulary size and age have a correlation of 0.96. Answer the following questions
based on the known facts.

16. True/false: A correlation of positive 0.96 means that there is a 96% chance that a
subject at age 4 will have a vocabulary size of 1500 words.
a) True b) False

17. True/false: A correlation of positive 0.96 means that 96% of the subjects observed had a
vocabulary size that was greater than their age.
a) True b) False


NOTE: Correlation does NOT ALWAYS imply causation. A correlation between two variables does
not always mean that a change in one variable causes a change in the other (there could be
other reasons the data show a strong correlation). A causal relationship can be established
only if the other variables are controlled for. Otherwise, it cannot.

18. Please interpret the correlation for this problem. (Remember the three things, and use the
context of the problem)

19. By looking at this scatter plot we can see that higher ages correspond with a higher
vocabulary size. But just because two variables are correlated does not mean one CAUSES
the other. Explain why this case is a good example of that.

Explaining Ice Cream Sales (based on source: Mathisfun.com)


A local ice cream shop keeps track of how much ice cream they sell in a day (in $) versus the
temperature on that day, in degrees Celsius. Here are their figures for 12 days during the summer:

Ice Cream Sales vs Temperature


Temperature (°C)    Ice Cream Sales ($)
14.2                215
16.4                325
11.9                185
15.2                332
18.5                406
22.1                522
19.4                412
25.1                614
23.4                544
18.1                421
22.6                445
17.2                408
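
As a check on the figures above, here is a minimal Python sketch that computes the sample correlation directly from the 12 data points. The helper name pearson_r and the variable names are illustrative assumptions, not part of the course materials. The result should agree, after rounding, with the value of 0.9575 quoted later in this handout.

```python
import math

# Data copied from the table above (12 summer days).
temps_c = [14.2, 16.4, 11.9, 15.2, 18.5, 22.1, 19.4, 25.1, 23.4, 18.1, 22.6, 17.2]
sales = [215, 325, 185, 332, 406, 522, 412, 614, 544, 421, 445, 408]


def pearson_r(xs, ys):
    """Sample correlation r of two equal-length lists (hypothetical helper)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of cross-products and squared deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


r = pearson_r(temps_c, sales)
print(round(r, 4))
```

The function only uses deviations from the means, so it can be reused on any other pair of lists to explore the properties of correlation discussed earlier.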

Here is the data as a scatterplot:


20. The correlation is 0.9575. Interpret this value in the context of the problem. (Use the 3 items
described in your lecture notes.)

21. If we changed the temperature units from Celsius to Fahrenheit what would happen to the
correlation, if anything?

Here is a graph of the ice cream sales during the end of summer and the beginning of fall, when
temperature changes much more than it does during the summer alone, using WEEK OF THE YEAR as
the X variable:

[Scatterplot: Ice Cream Sales vs. Week of the Year]

22. Interpret this 2nd scatterplot in the context of the problem. What do you think might explain
the difference in pattern from the 1st scatterplot?

23. The calculated value of correlation for the 2nd scatterplot is 0. Why is this? There appears to
be a definite relationship between Week and Sales.

24. Interpret this correlation in the context of the problem. Include any additional information
that will help the ice cream manager know what’s going on.


The ice cream shop finds out how many pairs of sunglasses a nearby Walmart sold on each of the 12
days for which ice cream sales were recorded, and compares sunglasses sales to its ice cream sales:

25. Interpret the scatterplot in the context of the problem.

26. The correlation between Sunglasses and Ice Cream sales is 0.94. Interpret this correlation.

27. Explain what may be going on here to create this correlation that seems strange on its own. Hint:
There is another variable out there that is related to BOTH of these variables. What is it?
