
Assignment V

Circle the letter corresponding to the best answer.


1. We want to calculate the correlation r between two variables X and Y for a sample of individuals.
Which of the following conditions are necessary for r to be a meaningful measure of association?

I. X and Y are both quantitative variables.


II. The relationship between X and Y is linear.
III. X is an explanatory variable and Y is a response variable.

(A) I only (B) II only (C) I and II only (D) I and III only (E) I, II and III

2. We measure the distance (in km) of a sample of commercial airline flights, as well as the price
(in Canadian $) for a ticket on each of the flights. The correlation between the two variables is
calculated to be r = 0.53. What would be the value of the correlation if we had instead measured
distance in miles (1 mile = 1.61 km) and ticket price in U.S. dollars ($1 U.S. = $1.31 Canadian)?

(A) 0.53 (B) 0.84 (C) 0.62 (D) 0.32 (E) 0.43

3. We have gathered data from a sample of individuals for some explanatory variable X and some
response variable Y. We plot the data on a scatterplot and we see that a linear relationship is a
reasonable assumption. We fit the least squares regression line to the data. This is the line that:
(A) minimizes the sum of the residuals.
(B) maximizes the value of the correlation.
(C) minimizes the sum of the deviations from the points to the line in the horizontal direction.
(D) minimizes the sum of the squared residuals.
(E) minimizes the sum of the squared deviations from the points to the line in the horizontal
direction.

4. The next three questions (5 to 7) refer to the following:


A sample of U of M students is selected. The distance X (in km) between a student’s place of
residence and the university and the time Y (in minutes) it takes them to get to the university are
recorded. The least squares regression line is calculated to be 𝑦̂ = 2.3 + 1.7x. It is reported that
75% of the variation in time can be accounted for by its regression on distance.

5. What is the value of the correlation between distance and time?


(A) 0.250 (B) 0.866 (C) 0.625 (D) 0.750 (E) 0.563

6. One student lives 5 kilometres from the university and takes 8 minutes to get there. What is the
value of the residual for this student?
(A) −2.8 (B) 2.8 (C) −10.9 (D) 10.9 (E) 10.8

7. If we had instead measured distance in miles (1 mile = 1.61 km), which of the following values
would change?
(I) slope
(II) intercept
(III) correlation

(A) I only (B) II only (C) I and II only (D) I and III only (E) II and III only

8. To study the relation between the output, y (in volts) of a windmill and the wind velocity, x (in
km per hour), a researcher collects 20 pairs of observations. Her scatterplot suggests a linear
association between x and y. She calculates
𝑥̅ = 6.8, 𝑦̅ = 1.8, sx = 2.3, sy = 0.04, and r = 0.96
The most appropriate statement describing this data set is:
A) there is strong negative linear association between x and y.
B) there is weak negative linear association between x and y.
C) there is strong positive linear association between x and y.
D) there is weak positive linear association between x and y.
E) none of the above.

9. In a game of chance, your chance of winning a game is 0.2. If you play the game five times and
outcomes are independent, then the probability that you win at most once is (show your work):
A) 0.3277 B) 0.2 C) 0.4096 D) 0.7373 E) 0.5904
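
For readers checking their work on question 9: "at most once" means zero or one win in five independent plays, which is a binomial calculation. The sketch below is one possible way to set it up; the function names `binomial_pmf` and `prob_at_most` are illustrative and not part of the assignment.

```python
from math import comb


def binomial_pmf(k: int, n: int, p: float) -> float:
    """P(exactly k successes in n independent trials with success chance p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def prob_at_most(k_max: int, n: int, p: float) -> float:
    """P(at most k_max successes), summing the pmf from 0 through k_max."""
    return sum(binomial_pmf(k, n, p) for k in range(k_max + 1))


# Values from question 9: five plays, win chance 0.2, at most one win.
p_at_most_once = prob_at_most(1, 5, 0.2)
print(f"P(at most one win) = {p_at_most_once:.4f}")
```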

10. In a study of 82 young drivers (under the age of 32), 39 were men who were ticketed, 11 were
men who were not ticketed, 8 were women who were ticketed, and 24 were women who were not
ticketed. If one of these subjects is randomly selected, use the general addition rule to find the
probability of getting a man, or someone who was ticketed (show your work).
A) 50% B) 27% C) 16% D) 100% E) 71%
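
The general addition rule in question 10 is P(A or B) = P(A) + P(B) − P(A and B). The following sketch applies it to the four counts given in the question, using exact fractions so no rounding happens until the final display. The dictionary layout and variable names are assumptions made for illustration.

```python
from fractions import Fraction

# Counts from question 10, keyed by (sex, ticket status).
counts = {
    ("man", "ticketed"): 39,
    ("man", "not ticketed"): 11,
    ("woman", "ticketed"): 8,
    ("woman", "not ticketed"): 24,
}
total = sum(counts.values())

# Marginal and joint probabilities read off the counts.
p_man = Fraction(sum(n for (sex, _), n in counts.items() if sex == "man"), total)
p_ticketed = Fraction(
    sum(n for (_, status), n in counts.items() if status == "ticketed"), total
)
p_both = Fraction(counts[("man", "ticketed")], total)

# General addition rule: subtract the overlap so it is not counted twice.
p_man_or_ticketed = p_man + p_ticketed - p_both
print(f"P(man or ticketed) = {p_man_or_ticketed} = {float(p_man_or_ticketed):.1%}")
```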

11. Which of the following pairs of variables would be the most likely to have a correlation close
to r = 0.5?
(A) Select a sample of commercial airline flights leaving from the airport one day: X = flight
distance in kilometres; Y = flight distance in miles
(B) Select a sample of grocery stores: X = price of orange juice; Y = amount of orange juice sold
(C) Select a sample of STAT 1000 students: X = number of incorrect answers on the midterm test;
Y = score on the test
(D) Select a sample of adults in Winnipeg: X = IQ; Y = weight
(E) Select a sample of male students at the University of Manitoba: X = height; Y = shoe size

12. A small graduate class of three students writes a math test. The student who finished writing
the fastest got the highest score in the class. The student who finished second got the second highest
score, and the student who took the longest to write the test got the lowest score. If X is the time
it takes for a student to write the test and Y is the student’s test score, then what can be said about
the correlation r between X and Y for this class?
(A) There is a perfect negative linear relationship between X and Y , and so r = −1.
(B) The correlation between X and Y is negative, but not necessarily equal to −1.
(C) There is no linear relationship between X and Y , and so r = 0.
(D) There is a perfect positive linear relationship between X and Y , and so r = 1.
(E) The correlation between X and Y is positive, but not necessarily equal to 1.

13. Two quantitative variables X and Y are measured on a sample of five individuals. Consider
the following (incomplete) table of values for this data set.

xi    yi    xi − 𝑥̅    yi − 𝑦̅    (xi − 𝑥̅)(yi − 𝑦̅)
__    __    __        −5        25
3     3     −4        −3        12
6     7     −1        1         −1
10    8     3         2         6
__    11    __        5         __

The means and standard deviations are calculated to be 𝑥̅ = 7, 𝑦̅ = 6, sx = 5, sy = 4. What is the
value of the correlation between X and Y for this data set?
(A) 0.9650 (B) 0.9775 (C) 0.9625 (D) 0.9850 (E) 0.9575

14. Determine whether the correlation for each of the following pairs of variables is most likely
positive or negative:
(I) X = Speed of wind in a snowstorm Y = Visibility
(II) X = Global supply of oil Y = Price of gasoline
(III) X = Number of people in line at a bank when you arrive
Y = Time until you are served by a teller

(A) (I) negative, (II) positive, (III) positive


(B) (I) positive, (II) positive, (III) positive
(C) (I) negative, (II) negative, (III) positive
(D) (I) positive, (II) negative, (III) negative
(E) (I) negative, (II) negative, (III) negative

15. We record the heights X (in cm) and weights Y (in kg) of a sample of individuals. We calculate
the correlation between X and Y to be r = 0.56. Now suppose that we reversed the roles of X and
Y, i.e., define weight as X and height as Y. The correlation between X and Y would now be:
(A) 0.56 (B) 0.44 (C) 0.65 (D) −0.44 (E) −0.56

16. A national consumer magazine obtained data for several variables measured on a random
sample of cars. The magazine reported the following correlations:
• The correlation between car weight and car reliability is −0.30.
• The correlation between car weight and annual maintenance cost is 0.20.
Which of the following statements is/are true?
(I) Lighter weight cars tend to be more reliable.
(II) Heavier cars tend to cost more to maintain.
(III) Car weight is related more strongly to maintenance cost than to reliability.

(A) I only
(B) II only
(C) I and II only
(D) II and III only
(E) I, II and III

17. Can the number of calories in breakfast cereal be predicted by the sugar content? Researchers
gathered data for 10 breakfast cereals, including sugar content (in grams) and calories per serving.
The data are as follows:

Cereal 1 2 3 4 5 6 7 8 9 10
Sugar 4.3 7.1 3.8 5.7 8.5 4.2 9.7 3.5 4.9 6.3
Calories 99 109 97 106 107 104 112 102 103 102

The correlation between sugar content and calories for this sample is calculated to be 0.84. The
equation of the least squares regression line is:
(A) 𝑦̂= 93.54 + 1.82x
(B) 𝑦̂= 101.84 + 0.39x
(C) 𝑦̂= 95.23 + 1.53x
(D) 𝑦̂= 114.29 + 1.82x
(E) 𝑦̂ =104.10 + 0.39x
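
As a way to check a hand calculation for question 17, the least squares slope is the sum of cross-deviations divided by the sum of squared x-deviations, and the intercept follows from the fact that the line passes through the point of means. The sketch below applies this to the cereal data; `least_squares_line` is a hypothetical helper name, not something defined in the assignment.

```python
def least_squares_line(xs, ys):
    """Return (intercept, slope) of the least squares line of y on x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sum of cross-deviations and sum of squared x-deviations.
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    slope = s_xy / s_xx
    # The fitted line always passes through (x_bar, y_bar).
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Data from the table in question 17.
sugar = [4.3, 7.1, 3.8, 5.7, 8.5, 4.2, 9.7, 3.5, 4.9, 6.3]
calories = [99, 109, 97, 106, 107, 104, 112, 102, 103, 102]

intercept, slope = least_squares_line(sugar, calories)
print(f"y-hat = {intercept:.2f} + {slope:.2f}x")
```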

18. The next two questions (19 and 20) refer to the following: We would like to determine whether
a man’s shoe size can be used to predict his height. The shoe sizes and heights (in inches) of a
random sample of eight men are shown below:
Shoe Size 11 10 9.5 12 11 11.5 10.5 10
Height (inches) 69 70 67 74 72 70 71 68
The correlation between shoe size and height is calculated to be r = 0.78, and the equation of the
least squares regression line is calculated to be 𝑦̂ = 50 + 2x.

19. What is the correct interpretation of the slope of the least squares regression line?
(A) When a man’s shoe size increases by one, his height increases by two inches.
(B) When a man’s height increases by two inches, we predict his shoe size to increase by one.
(C) When a man’s shoe size increases by two, we predict his height to increase by one inch.
(D) When a man’s height increases by one inch, we predict his shoe size to increase by two.
(E) When a man’s shoe size increases by one, we predict his height to increase by two inches.

20. Which of the following statements is false?


(A) The predicted height of a man with a size 11 shoe is 72 inches.
(B) It would not be appropriate to use this regression line to predict the height of a man with a size
8 shoe.
(C) About 78% of the variation in height is accounted for by its regression on shoe size.
(D) The high correlation between shoe size and height does not indicate a causal relationship.
(E) It would not be appropriate to use this regression line to predict the height of a woman from
her shoe size.

21. An economist would like to determine whether the amount of a country’s exports (in billions
of dollars) can be predicted by the country’s population (in millions). He collects data for a random
sample of 86 countries, and the least squares regression line is calculated to be 𝑦̂ = 2 + 1.5x. One
country in the sample has 24 billion dollars of exports and a population of 26 million. What is the
value of the residual for this country?
(A) −17 (B) 17 (C) −12 (D) 12 (E) 15
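
A residual is the observed response minus the response predicted by the line, so its sign shows whether the point lies above or below the line. The sketch below sets this up for the country in question 21; the helper names `predict` and `residual` are assumptions for illustration.

```python
def predict(x: float, intercept: float, slope: float) -> float:
    """Predicted response y-hat for a given x on the line intercept + slope * x."""
    return intercept + slope * x


def residual(x: float, y_observed: float, intercept: float, slope: float) -> float:
    """Observed y minus predicted y-hat (positive when the point is above the line)."""
    return y_observed - predict(x, intercept, slope)


# Question 21: line y-hat = 2 + 1.5x, population 26 million, exports 24 billion.
exports_residual = residual(x=26, y_observed=24, intercept=2, slope=1.5)
print(f"residual = {exports_residual}")
```

The same helper applies to question 6 by passing that question's line and the student's distance and time.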

22. A class of fourth year statistics students is studying for their final exam, which will be marked
out of 50. Their midterm results (also out of 50) have already been posted. Consider predicting
their final exam scores (y) based on their midterm (x). The data from last year's class are:
Midterm (x) 16 23 27 29 34 35 37 41 43 48
Final Exam (y) 11 25 28 31 25 30 34 40 37 42

From these, it can be shown that 𝑥̅ = 33.3, 𝑦̅ = 30.3, sx = 9.7188, sy = 8.9697 and r = 0.9191. The
least squares regression line is:
A) 𝑦̂ = 2.0516 + 0.8483x
B) 𝑦̂= –5.7806 + 1.0835x
C) 𝑦̂ = –4.0756 + 1.0323x
D) 𝑦̂= 7.9565 + 0.8483x
E) 𝑦̂ = 0.4700 + 1.0835x
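
The summary statistics quoted in question 22 can be reproduced from the raw scores, and the line then follows from slope = r · sy / sx and intercept = 𝑦̅ − slope · 𝑥̅. The sketch below is one way to do that with Python's standard `statistics` module; the variable names are illustrative. Small differences in the last decimal place relative to a hand calculation can come from rounding r, sx and sy before combining them.

```python
from statistics import mean, stdev

# Data from question 22.
midterm = [16, 23, 27, 29, 34, 35, 37, 41, 43, 48]
final = [11, 25, 28, 31, 25, 30, 34, 40, 37, 42]

n = len(midterm)
x_bar, y_bar = mean(midterm), mean(final)
s_x, s_y = stdev(midterm), stdev(final)  # sample standard deviations (n - 1)

# Correlation: sum of cross-deviations divided by (n - 1) * s_x * s_y.
r = sum((x - x_bar) * (y - y_bar) for x, y in zip(midterm, final)) / (
    (n - 1) * s_x * s_y
)

slope = r * s_y / s_x
intercept = y_bar - slope * x_bar
print(f"x-bar = {x_bar:.1f}, y-bar = {y_bar:.1f}, sx = {s_x:.4f}, sy = {s_y:.4f}")
print(f"r = {r:.4f}, y-hat = {intercept:.4f} + {slope:.4f}x")
```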

23. When we use the least-squares regression criterion to fit a straight line to a set of data, we are
choosing the line that minimizes:
A) the sum of the squares of the horizontal distances between the points and the line.
B) the sum of the squares of the perpendicular distances between the points and the line.
C) the sum of the perpendicular distances between the points and the line.
D) the sum of the horizontal distances between the points and the line.
E) the sum of the squares of the vertical distances between the points and the line.

24. Which of the following is true of the least-squares regression line?


a) The slope is the change in the response variable that would be predicted by an increase of 1 unit
in the explanatory variable.
b) It always passes through the point (𝑥̅, 𝑦̅), the means of the explanatory and response variables,
respectively.
c) It will only pass through all the data points if r = 1 or r = – 1.

d) a), b) and c) are all true.
e) None of a), b) and c) are true.

25. A researcher wishes to study how the height of children during early adolescence (ages 12-14)
is affected by milk consumption. She plots the height of children (in inches) versus their milk
consumption (in cups/day), and decides to fit a least squares regression line to the data with x as
the explanatory variable and y as the response variable. She computes the following quantities:
• Correlation between height and milk consumption is 0.9
• Mean milk consumption is 6.5 cups/day
• Mean height is 60 inches
• Standard deviation of milk consumption is 3.6 cups/day
• Standard deviation of height is 1.2 inches
The equation of the least-squares regression line is:

a) 𝑦̂= 0.3 + 58.05x


b) 𝑦̂ = 0.3 – 58.05x
c) 𝑦̂= 58.05 + 0.3x
d) 𝑦̂= 58.05 – 0.3x
e) 𝑦̂ = – 155.5 + 2.7x
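
When only summary statistics are available, as in question 25, the line can be built without the raw data: slope = r · sy / sx, and the intercept is chosen so the line passes through (𝑥̅, 𝑦̅). The sketch below encodes that; `line_from_summary` is a hypothetical helper name.

```python
def line_from_summary(x_bar, y_bar, s_x, s_y, r):
    """Return (intercept, slope) of the least squares line from summary statistics."""
    slope = r * s_y / s_x
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Question 25: x is milk consumption (cups/day), y is height (inches).
intercept, slope = line_from_summary(x_bar=6.5, y_bar=60, s_x=3.6, s_y=1.2, r=0.9)
print(f"y-hat = {intercept:.2f} + {slope:.2f}x")
```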

26. The correlation, r, provides:


a) a measure of the extent to which changes in one variable cause changes in another variable.
b) a measure of the strength and direction of the linear association between two categorical
variables.
c) a measure of the strength and direction of the association (not necessarily linear) between two
categorical variables.
d) a measure of the strength and direction of the linear association between two quantitative
variables.
e) a measure of the strength and direction of the linear association between a quantitative variable
and a categorical variable.

27. The equation of the least-squares regression line of stopping distance (y, in feet) on speed
(x, in km/hr) is:
𝑦̂= – 36.22+ 0.94 x

The slope of the line tells us that:


a) 94% of cars stop at a speed of 36 km/hr.
b) stopping distance increases an average of 0.94 feet with an increase in speed of 1 km/hr.
c) the relationship between speed and stopping distance is linear.
d) there is a high positive correlation between speed and stopping distance.
e) stopping distance increases an average of 1 foot with an increase in speed of 0.94 km/hr.

28. In a simple linear regression of y on x, the residual at xi is:


(A) the observed yi
(B) the predicted 𝑦̂𝑖
(C) the difference between yi and 𝑦̅
(D) the observed yi minus the predicted 𝑦̂𝑖
(E) none of the above

Problems

29. A financial analyst provides you with the following ex-ante data regarding returns of BCE and
the Market index in the following year:
State        Probability    S&P 500 (X)    BCE (Y)
Boom         0.4            14%            30%
Normal       0.5            8%             18%
Recession    0.1            4%             10%

In the space provided, please compute the least squares regression line for the data above.

a) The expected return on BCE stock is 22% with a standard deviation of returns of 6.93%, and the
expected return on the S&P 500 is 10% with a standard deviation of 3.46%. Calculate the correlation
coefficient of BCE and S&P 500 returns.
b) Compute the least squares regression line of BCE returns vs. S&P 500 returns.
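
Because problem 29 gives states with probabilities rather than equally weighted observations, the means, variances and covariance are probability-weighted sums. The sketch below shows one way to organise that calculation, working in percentage points; the helper `expectation` and the variable names are assumptions for illustration.

```python
from math import sqrt

# Ex-ante data from problem 29 (returns in percent).
probs = [0.4, 0.5, 0.1]       # Boom, Normal, Recession
sp500 = [14.0, 8.0, 4.0]      # X
bce = [30.0, 18.0, 10.0]      # Y


def expectation(values, weights):
    """Probability-weighted average of values."""
    return sum(w * v for v, w in zip(values, weights))


mean_x = expectation(sp500, probs)
mean_y = expectation(bce, probs)
var_x = expectation([(x - mean_x) ** 2 for x in sp500], probs)
var_y = expectation([(y - mean_y) ** 2 for y in bce], probs)
cov_xy = expectation(
    [(x - mean_x) * (y - mean_y) for x, y in zip(sp500, bce)], probs
)

corr = cov_xy / sqrt(var_x * var_y)
slope = cov_xy / var_x               # regression of BCE on the S&P 500
intercept = mean_y - slope * mean_x
print(f"E[X] = {mean_x:.2f}, E[Y] = {mean_y:.2f}")
print(f"sd(X) = {sqrt(var_x):.2f}, sd(Y) = {sqrt(var_y):.2f}, r = {corr:.4f}")
print(f"y-hat = {intercept:.2f} + {slope:.2f}x")
```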

30. The Leaning Tower of Pisa is leaning more over time. Eventually, it will fall. The following is
selected data on the tower’s lean over a thirteen-year period. Lean, measured in tenths of a
millimeter, is the distance between where a point at the top of the tower was when the observation
was taken and where it would have been if the tower were straight:
Year (X) 1 3 5 7 9 11 13
Lean (Y) 642 656 673 696 713 725 757

a) What is the principle behind the “least-squares regression line”?


b) Find the least-squares linear regression line of “Lean vs. Year”
c) Given the above information, interpret the slope of the regression line.
d) Would you use the above regression line to predict what the Lean would be in Year equal to
50? Briefly explain your answer
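
To make the least squares principle in part a) concrete, the sketch below fits the line to the tower data and then compares its sum of squared residuals with that of a few slightly shifted lines. Under the least squares criterion, every shifted line should give a larger sum. The helper names and the sizes of the shifts are arbitrary choices for illustration.

```python
years = [1, 3, 5, 7, 9, 11, 13]
lean = [642, 656, 673, 696, 713, 725, 757]


def fit_line(xs, ys):
    """Return (intercept, slope) minimising the sum of squared vertical residuals."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum(
        (x - x_bar) ** 2 for x in xs
    )
    return y_bar - slope * x_bar, slope


def sum_squared_residuals(xs, ys, intercept, slope):
    """Sum over all points of (observed y - predicted y) squared."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))


intercept, slope = fit_line(years, lean)
best = sum_squared_residuals(years, lean, intercept, slope)
print(f"fitted line: y-hat = {intercept:.3f} + {slope:.3f}x, SSE = {best:.2f}")

# Nudge the intercept or slope and confirm the criterion gets worse.
shifts = [(1.0, 0.0), (-1.0, 0.0), (0.0, 0.1), (0.0, -0.1)]
shifted_sse = [
    sum_squared_residuals(years, lean, intercept + da, slope + db)
    for da, db in shifts
]
for (da, db), sse in zip(shifts, shifted_sse):
    print(f"intercept {da:+.1f}, slope {db:+.1f}: SSE = {sse:.2f}")
```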

Deadline: Monday 31/01/2022

Prof: CACEUS Taylor
