
STAT 1012: Statistics for Life Sciences

Solutions to Practice Problems (Chapter 1)

Problem 1: (a) Quantitative (Discrete) Variable, (b) Quantitative (Continuous) Variable, (c) Categorical Variables, (d) Categorical Variables
Problem 2: Variables 1, 3 and 4 are continuous, as they can take any value within an interval (that is, they can be expressed as infinite decimals).
Problem 3: (a) Median for Men = Median for Women = 0
Sample Mean for Men = (1541+2×43)/8658 = 0.1879
Sample Mean for Women = (2773+2×105)/8739 = 0.3413
(b) In this example, median = 0 only tells us that more than half of the observations are zero.
The mean value, however, is able to tell us the number of times married on average.
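The Problem 3 figures can be checked with a short Python sketch. It assumes, as the mean formulas above imply, that nobody in either sample married more than twice, so the number of zeros is the sample size minus the other two counts. The function names are illustrative only.

```python
def kth_smallest(counts, k):
    """Return the k-th smallest observation (1-indexed) from a value -> frequency map."""
    running = 0
    for value in sorted(counts):
        running += counts[value]
        if running >= k:
            return value
    raise ValueError("k exceeds the number of observations")


def mean_and_median(counts):
    """Sample mean and median computed directly from a frequency table."""
    n = sum(counts.values())
    mean = sum(value * freq for value, freq in counts.items()) / n
    if n % 2 == 1:
        median = kth_smallest(counts, (n + 1) // 2)
    else:
        median = (kth_smallest(counts, n // 2) + kth_smallest(counts, n // 2 + 1)) / 2
    return mean, median


# Assumption: zero counts are inferred as total - (married once) - (married twice).
men = {0: 8658 - 1541 - 43, 1: 1541, 2: 43}
women = {0: 8739 - 2773 - 105, 1: 2773, 2: 105}

for label, counts in (("Men", men), ("Women", women)):
    mean, median = mean_and_median(counts)
    print(f"{label}: mean = {mean:.4f}, median = {median}")
```

Working from the frequency table avoids expanding thousands of observations, and it makes visible why the median is 0: well over half of each sample falls in the zero category.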
Problem 4: (a) $\bar{x} = (-1 \times 2 + 1 \times 2 + 2 \times 2 + 4)/10 = 0.8$, $\bar{y} = -2\bar{x} + 1 = -0.6$
(b) Note that $y = 2x^3$ is a strictly increasing function of $x$, and therefore the mode of $y_1, y_2, \ldots, y_{10}$ is the transformed mode of $x_1, x_2, \ldots, x_{10}$. That is, $\text{Mode}_y = 2(\text{Mode}_x)^3 = 2(0)^3 = 0$.
(c) Note that $y = 3x^2$ is not a strictly increasing function of $x$, and therefore there is no simple relationship between $\text{Mode}_y$ and $\text{Mode}_x$. Now $y_1, y_2, \ldots, y_{10}$ are given by 3, 3, 0, 0, 0, 3, 3, 12, 12, 48. Therefore $\text{Mode}_y = 3$.
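A quick Python check of Problem 4, offered as a sketch. The data list is an assumption: it is reconstructed from the sum in part (a), with the three zeros inferred from n = 10 rather than quoted from the problem sheet.

```python
from statistics import mean, mode

# Assumption: data reconstructed from the sum in part (a); the three zeros
# are inferred from n = 10 and are not listed explicitly in the solution.
x = [-1, -1, 0, 0, 0, 1, 1, 2, 2, 4]

y_linear = [-2 * xi + 1 for xi in x]   # part (a): linear transformation
y_cubic = [2 * xi ** 3 for xi in x]    # part (b): one-to-one transformation
y_square = [3 * xi ** 2 for xi in x]   # part (c): -1 and 1 collapse to the same y

print("means:", mean(x), mean(y_linear))
print("modes of x and 2x^3:", mode(x), mode(y_cubic))
print("mode of 3x^2:", mode(y_square))
```

In part (c) the values -1 and 1 both map to 3, so their frequencies add up and overtake the frequency of 0. That is why the mode of y cannot be read off the mode of x.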
Problem 5: Data Set A: The sample mean is not a good measure in the presence of the outlier (70), and the mode (=23) is not good because it is also the minimum value. The median (=29.5) should therefore be the best measure of location.
Data Set B: For a distribution which is unimodal and roughly symmetric, the sample mean, median and mode should all be good measures of location.
Data Set C: The median and the mode (=1) show that more students are taking STAT1012 than not. However, the sample mean (=0.8) gives the proportion of students taking STAT1012, which is more informative.
Problem 6:
(a) A histogram should be used, as the number of children is a quantitative variable. In a bar graph, the data values do not have to be ordered, so the shape of the distribution cannot be read from it.
(c) Right-skewed
(d) 2 modes: 1 and 2
Problem 7: (a) $\bar{x} = \left(\sum_{i=1}^{n} x_i\right)/n = (1 \times 5 + 2 \times 6 + 3 \times 4 + 4 \times 3 + 5 \times 2)/20 = 2.55$, Mode = 2
Median = average of the 10th and 11th smallest observations = (2+2)/2 = 2


(b) The distribution is right-skewed because the right-hand tail is longer.
(c) $\sum_{i=1}^{n} x_i^2 = (1^2 \times 5 + 2^2 \times 6 + 3^2 \times 4 + 4^2 \times 3 + 5^2 \times 2) = 163$, $s = \sqrt{\left(\sum_{i=1}^{n} x_i^2 - n\bar{x}^2\right)/(n-1)} = \sqrt{[163 - 20(2.55)^2]/19} = 1.317$
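The Problem 7 numbers can be reproduced in Python by expanding the frequency table into raw observations. This is only a sketch: the variable names are illustrative, and the standard library's `statistics.stdev` is used as an independent check on the shortcut formula.

```python
from math import sqrt
from statistics import mean, median, mode, stdev

# Frequency table read from the solution: value -> number of observations.
freq = {1: 5, 2: 6, 3: 4, 4: 3, 5: 2}
data = [value for value, count in freq.items() for _ in range(count)]

n = len(data)
x_bar = mean(data)
sum_sq = sum(v * v for v in data)

# Shortcut formula from part (c): sqrt((sum of squares - n * mean^2) / (n - 1)).
s_shortcut = sqrt((sum_sq - n * x_bar ** 2) / (n - 1))

print("n, mean, median, mode:", n, x_bar, median(data), mode(data))
print("sum of squares:", sum_sq)
print("s (shortcut), s (statistics.stdev):", round(s_shortcut, 3), round(stdev(data), 3))
```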

Problem 8: (d) Both Median and IQR are insensitive to outliers. However, both the mean
and the standard deviation are very sensitive to the outlying values, because their values
will be distorted greatly when averaging.

Problem 9: (a) is true because s is a square root value and so cannot be negative. (b) is false: s can actually be zero, when all the data points have the same value. (c) is false: in the presence of outliers, s is not a good measure of spread. The interquartile range (IQR) is a better measure of spread in that case, as it depends only on the middle half of the data points.

Problem 10: (a) $\bar{x} = \left(\sum_{i=1}^{10} x_i\right)/10 = 100/10 = 10$, $s = \sqrt{\left(\sum_{i=1}^{10} x_i^2 - 10\bar{x}^2\right)/(10-1)} = \sqrt{[1294 - 10(10)^2]/9} = 5.715$

(b) $\bar{x} = \left(\sum_{i=1}^{11} x_i\right)/11 = (100 + 10)/11 = 10$, $s = \sqrt{\left(\sum_{i=1}^{11} x_i^2 - 11\bar{x}^2\right)/(11-1)} = \sqrt{[(1294 + 100) - 11(10)^2]/10} = 5.422$
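As a rough check of Problem 10, the standard deviation can be computed from the summary sums alone. The helper name `sd_from_sums` is made up for this sketch; the inputs are the sums quoted in the solution.

```python
from math import sqrt


def sd_from_sums(n, sum_x, sum_x2):
    """Sample standard deviation from n, the sum of x and the sum of x squared."""
    x_bar = sum_x / n
    return sqrt((sum_x2 - n * x_bar ** 2) / (n - 1))


s_before = sd_from_sums(10, 100, 1294)
# Appending an observation equal to the mean (10) adds 10 to the sum
# and 10**2 = 100 to the sum of squares.
s_after = sd_from_sums(11, 100 + 10, 1294 + 100)

print(round(s_before, 3), round(s_after, 3))
```

The sum of squared deviations is 294 in both cases, because the new point sits exactly at the mean. Only the divisor changes, from 9 to 10, which is why s drops.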

Problem 11: (a) Range = 6 - 0 = 6


(b) $\bar{x} = (4 + 6)/8 = 1.25$, $\sum_{i=1}^{n} x_i^2 = 4^2 + 6^2 = 52$, $s = \sqrt{\left(\sum_{i=1}^{n} x_i^2 - n\bar{x}^2\right)/(n-1)} = \sqrt{(52 - 8(1.25)^2)/7} = 2.375$


(c) Range = 60, $\bar{x} = (4 + 60)/8 = 8$, $\sum_{i=1}^{n} x_i^2 = 4^2 + 60^2 = 3616$, $s = \sqrt{\left(\sum_{i=1}^{n} x_i^2 - n\bar{x}^2\right)/(n-1)} = \sqrt{(3616 - 8(8)^2)/7} = 21.058$

Both the range and standard deviation are sensitive to outliers, with the presence of the
data point 60 increasing the values for both substantially.
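The comparison in Problem 11 can be sketched in Python. The raw data are an assumption: six zeros plus 4 and 6 reproduce n = 8, the minimum of 0, the sum 4 + 6 and the sum of squares 4² + 6² used above, but the data set itself is not listed in these solutions.

```python
from statistics import stdev


def spread(data):
    """Return (range, sample standard deviation) of a data set."""
    return max(data) - min(data), stdev(data)


# Assumption: data inferred from the sums in parts (b) and (c).
original = [0, 0, 0, 0, 0, 0, 4, 6]
with_outlier = [0, 0, 0, 0, 0, 0, 4, 60]

for label, data in (("original", original), ("with outlier", with_outlier)):
    data_range, s = spread(data)
    print(f"{label}: range = {data_range}, s = {s:.3f}")
```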

Problem 12: Note that the mean and mode are affected by both translation and rescaling, while the standard deviation and range are affected by rescaling only. Hence,
$\bar{y} = 10\bar{x} + 2 = 10(10) + 2 = 102$, $\text{mode}_y = 10\,\text{mode}_x + 2 = 10(12) + 2 = 122$
$s_y = |10|\,s_x = 10(4) = 40$, $\text{range}_y = |10|\,\text{range}_x = 10(13) = 130$
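The transformation rules for y = ax + b can be written as a small Python helper. This is a minimal sketch with an illustrative function name, applied here to the Problem 12 values (a = 10, b = 2).

```python
def transform_summary(a, b, mean, mode, sd, value_range):
    """Summary statistics of y = a*x + b, given those of x."""
    return {
        "mean": a * mean + b,            # location measures shift and scale
        "mode": a * mode + b,
        "sd": abs(a) * sd,               # spread measures ignore the shift b
        "range": abs(a) * value_range,
    }


result = transform_summary(a=10, b=2, mean=10, mode=12, sd=4, value_range=13)
print(result)
```

The absolute value matters when a is negative: the spread measures stay non-negative even though the ordering of the data is reversed.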

Problem 13: Sample mean = 10/10 = 1. In this case, the sample variance can be computed through the alternative formula, given by $s^2 = \left(\sum_{i=1}^{n} x_i^2 - n\bar{x}^2\right)/(n-1) = (3300 - 10[1]^2)/9 = 365.56$.

Problem 14: $s_y^2 = (-2)^2 s_x^2 = 4(21) = 84$, $\text{Range}_y = |-2|\,\text{Range}_x = 2(10) = 20$.

Problem 15: (a) np/100 = 16(10)/100 = 1.6, which is not an integer, so we round up to k = 2. Hence the 10th percentile of body fat percentage is the 2nd smallest observation of body fat percentage = 25.9%.
(b) np/100 = 16(81.25)/100 = 13, which is an integer. Hence the 81.25th percentile of age is the average of the 13th and 14th smallest observations of age = (57+58)/2 = 57.5 years.
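The rank rule used in Problem 15 can be sketched as a Python function that returns which order statistics define a percentile. The function name is illustrative, and exact fractions are used so that a value such as 16 × 81.25 / 100 is recognised as an integer without floating-point round-off. The sketch does not handle the edge cases p = 0 or p = 100.

```python
from fractions import Fraction
from math import ceil


def percentile_ranks(n, p):
    """1-indexed ranks of the order statistics that define the p-th percentile.

    If n*p/100 is an integer k, the percentile is the average of the k-th and
    (k+1)-th smallest observations; otherwise it is the single observation
    whose rank is n*p/100 rounded up.
    """
    position = Fraction(n) * Fraction(str(p)) / 100  # exact arithmetic
    if position.denominator == 1:
        k = int(position)
        return (k, k + 1)
    return (ceil(position),)


print(percentile_ranks(16, 10))      # part (a)
print(percentile_ranks(16, 81.25))   # part (b)
```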

Problem 16: (a) IQR = Q3 - Q1 = 245 - 145 = 100, hence the thresholds to define the outliers
are Q3 + 1.5IQR = 245 + 150 = 395 and Q1 - 1.5IQR = 145 - 150 = -5
Since all the data are within the two thresholds, there is no outlier in the data.
Problem 17: (b) From the boxplots, the IQR is roughly equal to 80-60=20. Hence the vertical bars cannot be longer than 1.5×IQR=30, and (ii) violates this criterion. (i) is a valid boxplot, with the absence of a vertical line suggesting that the largest 25% of the data points all have the same value (=80). (iii) is a valid boxplot with two outliers in the data.

Problem 18:
Q1: np/100 = (11)(25)/100 = 2.75 ≤ 3 = k => Q1 = 3rd smallest observation = 11.
Q3: np/100 = (11)(75)/100 = 8.25 ≤ 9 = k => Q3 = 9th smallest observation = 15.
Median = ((11+1)/2)th smallest observation = 6th smallest observation = 13.
IQR = 15-11 = 4. The thresholds for the outliers are therefore Q3 + 1.5IQR = 15+1.5(4) = 21 and Q1 - 1.5IQR = 11-1.5(4) = 5.
Based on the thresholds, we can see that the data values 2 and 25 are outliers, with the smallest and largest non-outlying values given by 9 and 18.

Problem 19: (a) Mean = 4.6375, Median = (1.2+1.8)/2=1.5


(b) Q1 = average of 2nd and 3rd smallest values = (0.7+1.1)/2=0.9
Q3 = average of the 6th and 7th smallest values = (2.3+9.8)/2 = 6.05
IQR = Q3 - Q1 = 5.15 => Lower Threshold = Q1 - 1.5(IQR)=0.9-7.725=-6.825
Upper Threshold = Q3 + 1.5(IQR)=6.05+7.725=13.775
Since 20 is the only data point outside the two thresholds, there is one outlier (20) in the
data set.
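The Problem 19 steps (quartiles, IQR, thresholds, outliers) can be chained together in Python. This is a sketch under one assumption: the smallest value, 0.2, is not quoted in the solution and is inferred from the stated mean (8 × 4.6375 = 37.1); the other seven values appear above. The `quartile` helper is illustrative and only meant for whole-number percentiles such as 25 and 75.

```python
def quartile(sorted_data, p):
    """p-th percentile (p a whole number) using the rank rule from these solutions."""
    n = len(sorted_data)
    k, remainder = divmod(n * p, 100)
    if remainder == 0:
        # n*p/100 is an integer k: average the k-th and (k+1)-th smallest values.
        return (sorted_data[k - 1] + sorted_data[k]) / 2
    # Otherwise round up: rank k+1, which is index k in a 0-indexed list.
    return sorted_data[k]


# Assumption: 0.2 is inferred from the stated mean; it is not listed above.
data = sorted([0.2, 0.7, 1.1, 1.2, 1.8, 2.3, 9.8, 20])

q1, q3 = quartile(data, 25), quartile(data, 75)
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = [x for x in data if x < lower or x > upper]

print("Q1, Q3, IQR:", round(q1, 3), round(q3, 3), round(iqr, 3))
print("thresholds:", round(lower, 3), round(upper, 3))
print("outliers:", outliers)
```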

Problem 20: (a) Stem-and-leaf plot (not reproduced here).
(b) Since np/100 = 20(25)/100 = 5 is an integer, Q1 = average of the 5th and 6th smallest observations = 2800+(38+41)/2 = 2839.5 grams.
Since np/100 = 20(75)/100 = 15 is an integer, Q3 = average of the 15th and 16th smallest observations = (3323+3484)/2 = 3403.5 grams.
