
Minitest 1: Data Handling & Regression

1. Classify the data as nominal, ordinal, discrete or continuous: (4 marks)

a) Ages of students in a classroom C


b) Ratings of movies O
c) Colours of hats in a store N
d) Temperature in London C
e) Types of Religion N
f) Shoe size D

2. Popular destinations for the newlyweds of today are Mykonos and Santorini. According to a
recent Greek Wedding Study, a honeymoon, on average, lasts 9.4 days and costs €5111. A
sample of 12 newlyweds reported the following lengths of stay of their honeymoons.

5 14 7 10 6 8 12 9 20 9 7 11

For questions (a) and (b) show your work.

a) For the given data, compute the mean, standard deviation, median and the upper and lower
quartiles. (5 marks)

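Ordered data: 5 6 7 7 8 9 9 10 11 12 14 20

$\bar{x} = \frac{\sum x}{n} = \frac{118}{12} = 9.8333\ldots$

$s = \sqrt{\frac{1}{n-1}\left(\sum x^2 - \frac{(\sum x)^2}{n}\right)} = \sqrt{\frac{1}{11}\left(1346 - \frac{118^2}{12}\right)} \approx 4.10838$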

$\text{pos}(Q_1) = \frac{n+1}{4} = 3.25 \Rightarrow Q_1 = 7$

$\text{pos}(Q_2) = \frac{n+1}{2} = 6.5 \Rightarrow Q_2 = 9$

$\text{pos}(Q_3) = \frac{3(n+1)}{4} = 9.75 \Rightarrow Q_3 = 11.75$ (also accept 11.5)

A popular alternative for finding the lower and upper quartiles when n is even is to divide the ordered
data exactly in half, then take the median of the lower half as $Q_1$ and the median of the upper half
as $Q_3$. In this case you would get $Q_1 = 7$ and $Q_3 = 11.5$. If n is odd, exclude the median before
splitting the data.
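
The position-based calculation above can be checked with a short script. This is a minimal sketch, not part of the original mark scheme: the helper names `quartile_by_position` and `describe` are hypothetical, and the quartile rule assumed is the one used in the working, namely the value at position p(n + 1) in the ordered data with linear interpolation between neighbours.

```python
import math

# Lengths of stay from the question, in days.
stays = [5, 14, 7, 10, 6, 8, 12, 9, 20, 9, 7, 11]


def quartile_by_position(ordered, p):
    """Value at 1-based position p * (n + 1), interpolating between neighbours."""
    n = len(ordered)
    pos = p * (n + 1)
    if pos <= 1:
        return ordered[0]
    if pos >= n:
        return ordered[-1]
    k = math.floor(pos)   # 1-based position of the lower neighbour
    frac = pos - k        # fraction of the way towards the upper neighbour
    return ordered[k - 1] + frac * (ordered[k] - ordered[k - 1])


def describe(data):
    """Mean, sample standard deviation and quartiles of a list of numbers."""
    xs = sorted(data)
    n = len(xs)
    mean = sum(xs) / n
    # Sample standard deviation: sqrt((sum x^2 - (sum x)^2 / n) / (n - 1)).
    s = math.sqrt((sum(x * x for x in xs) - sum(xs) ** 2 / n) / (n - 1))
    return {
        "mean": mean,
        "s": s,
        "q1": quartile_by_position(xs, 0.25),
        "median": quartile_by_position(xs, 0.50),
        "q3": quartile_by_position(xs, 0.75),
    }


summary = describe(stays)
print(summary)
```

Library routines may use a different quartile convention by default, so their results can differ slightly from the hand calculation.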

b) Any newlyweds whose sample value is an outlier are considered very rich and will be charged
extra. By checking for outliers, can you find any couple whose sample value indicates an extra
charge? (3 marks)

| Kaplan International Pathways | 1 | kaplanpathways.com


Lower Limit $= Q_1 - 1.5(Q_3 - Q_1) = -0.125$ (if $Q_3 = 11.75$)

Upper Limit $= Q_3 + 1.5(Q_3 - Q_1) = 18.875$ (if $Q_3 = 11.75$)

There is only one outlier, 20 (for both cases of $Q_3$).
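
The outlier check can be sketched in code as follows. The function names `iqr_fences` and `find_outliers` are hypothetical, and the quartile values are passed in by hand so that both accepted values of the upper quartile can be tried.

```python
# Lengths of stay from the question, in days.
stays = [5, 14, 7, 10, 6, 8, 12, 9, 20, 9, 7, 11]


def iqr_fences(q1, q3, k=1.5):
    """Lower and upper limits: q1 - k * IQR and q3 + k * IQR."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def find_outliers(data, q1, q3, k=1.5):
    """Values lying strictly outside the two limits."""
    low, high = iqr_fences(q1, q3, k)
    return [x for x in data if x < low or x > high]


# Try both accepted values of the upper quartile.
for q3 in (11.75, 11.5):
    print(q3, iqr_fences(7, q3), find_outliers(stays, 7, q3))
```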

c) Construct a modified boxplot visualising the given data distribution. Create an appropriate
title and label all data entries and axes appropriately. (4 marks)

[Figure: Box Plot (modified)]
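
The values needed to draw the modified boxplot can be collected with a small helper. This is a sketch under one common convention, assumed here rather than stated in the test: whiskers end at the most extreme data values inside the 1.5 × IQR limits, and outliers are plotted as individual points. The function name `modified_boxplot_summary` is hypothetical.

```python
# Lengths of stay from the question, in days.
stays = [5, 14, 7, 10, 6, 8, 12, 9, 20, 9, 7, 11]


def modified_boxplot_summary(data, q1, median, q3, k=1.5):
    """Box edges, whisker ends and outlier points for a modified boxplot."""
    low = q1 - k * (q3 - q1)
    high = q3 + k * (q3 - q1)
    inside = [x for x in data if low <= x <= high]
    return {
        "whisker_low": min(inside),    # smallest value that is not an outlier
        "q1": q1,
        "median": median,
        "q3": q3,
        "whisker_high": max(inside),   # largest value that is not an outlier
        "outliers": sorted(x for x in data if x < low or x > high),
    }


box = modified_boxplot_summary(stays, q1=7, median=9, q3=11.75)
print(box)
```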

3. Consider the following data:

𝑥 2 4 5 6 8 11
𝑦 18 12 10 8 7 5

You may use: $\sum x = 36$, $\sum y = 60$, $\sum x^2 = 266$, $\sum y^2 = 706$, $\sum xy = 293$

For questions (b) and (c) show your work.


a) Plot a scatter diagram of the data. (2 marks)

[Figure: scatter diagram of y (0 to 20) against x (0 to 12), with the fitted line y = -1.34x + 18.04]



b) Calculate the correlation coefficient 𝑟 giving your answer correct to 3 decimal places and
describe the relationship between the variables. (3 marks)

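$S_{xx} = \sum x^2 - \frac{(\sum x)^2}{n} = 266 - \frac{36^2}{6} = 50$

$S_{yy} = \sum y^2 - \frac{(\sum y)^2}{n} = 706 - \frac{60^2}{6} = 106$

$S_{xy} = \sum xy - \frac{\sum x \sum y}{n} = 293 - \frac{36 \times 60}{6} = -67$

$r = \frac{S_{xy}}{\sqrt{S_{xx}}\sqrt{S_{yy}}} = \frac{-67}{\sqrt{50}\sqrt{106}} = -0.9203 \approx -0.920$ (3 d.p.)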

The correlation coefficient r shows a strong negative linear relationship between the variables x
and y: as x increases, y tends to decrease.
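
The calculation of r from the given summary totals can be sketched as below. The function names `corrected_sums` and `correlation` are hypothetical; the inputs are the totals supplied in the question.

```python
import math


def corrected_sums(n, sum_x, sum_y, sum_x2, sum_y2, sum_xy):
    """S_xx, S_yy and S_xy computed from the raw totals."""
    s_xx = sum_x2 - sum_x ** 2 / n
    s_yy = sum_y2 - sum_y ** 2 / n
    s_xy = sum_xy - sum_x * sum_y / n
    return s_xx, s_yy, s_xy


def correlation(n, sum_x, sum_y, sum_x2, sum_y2, sum_xy):
    """Product-moment correlation coefficient r = S_xy / sqrt(S_xx * S_yy)."""
    s_xx, s_yy, s_xy = corrected_sums(n, sum_x, sum_y, sum_x2, sum_y2, sum_xy)
    return s_xy / math.sqrt(s_xx * s_yy)


r = correlation(6, 36, 60, 266, 706, 293)
print(f"{r:.3f}")  # prints -0.920
```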

c) Calculate the equation of the regression line 𝑦 = 𝑎 + 𝑏𝑥. (3 marks)

$b = \frac{S_{xy}}{S_{xx}} = \frac{-67}{50} = -1.34$

$a = \bar{y} - b\bar{x} = \frac{\sum y}{n} - b\frac{\sum x}{n} = 10 - (-1.34) \times 6 = 18.04$

$y = 18.04 - 1.34x$
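
As a cross-check, the same line can be fitted directly from the raw data. This is a minimal sketch; the function name `least_squares_line` is hypothetical, and it uses the deviation form of $S_{xx}$ and $S_{xy}$, which is algebraically equivalent to the totals form used in the working.

```python
# Data from the table in question 3.
xs = [2, 4, 5, 6, 8, 11]
ys = [18, 12, 10, 8, 7, 5]


def least_squares_line(xs, ys):
    """Intercept a and slope b of the regression line y = a + b * x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = s_xy / s_xx
    a = y_bar - b * x_bar
    return a, b


a, b = least_squares_line(xs, ys)
print(f"y = {a:.2f} + ({b:.2f})x")
```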

d) Sketch the regression line on your scatter diagram and give an interpretation of the
coefficients 𝑎 and 𝑏 in part (c). (2 marks)

$a = 18.04$ is the $y$-intercept: the value of $y$ predicted by the line when $x = 0$. $b = -1.34$ is
the slope, defined as the ratio of the change in $y$ to the change in $x$: for each increase of 1 in
$x$, the predicted value of $y$ decreases by 1.34.
