Homework M2 Solution
Homework M2
1. People with diabetes must monitor and control their blood glucose level. The goal is to maintain
“fasting plasma glucose” between 90 and 130 milligrams per deciliter (mg/dl). Here are the
fasting plasma glucose levels for 18 diabetics enrolled in a diabetes control class, 5 months after
the class has ended.
a) Calculate the median, mean, geometric mean, range, variance, standard deviation, and
interquartile range (IQR) of fasting blood glucose levels.
Note: Sometimes when you calculate a percentile location you may get something like 5.5,
which means the median lies halfway between the 5th and 6th values in the ordered
dataset, so we can just average the 5th and 6th values to get the median. But if for the 1st or 3rd
quartile you get a location like 5.75, you have to find the number that is ¾ of the way between
the 5th and 6th observations. To do this, find the difference between the 5th and 6th observations
in the ordered dataset, multiply the difference by 0.75, and then add this amount to the 5th
observation in the ordered dataset.
We did not see an example of this in the lecture, but you will in this problem.
As an example, assume we have the data: 34 78 79 81
The 3rd quartile location (75th percentile) is 0.75(4+1) = 3.75
So the 3rd and 4th observations are 79 and 81
Calculate: (81-79)*0.75 = 1.5
So the 3rd quartile is Q3 = 79+1.5 = 80.5
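The interpolation rule in the note can be sketched in a few lines of Python. This is a minimal illustration, not part of the original solution: the function name `percentile_value` is hypothetical, and it assumes the location rule p(n+1) described above, with locations outside the data clamped to the smallest or largest value.

```python
import math

def percentile_value(data, p):
    """Value at percentile p (0 < p < 1) using the location rule p * (n + 1)."""
    ordered = sorted(data)
    n = len(ordered)
    location = p * (n + 1)         # 1-based position in the ordered data
    lower = math.floor(location)   # observation just below the location
    fraction = location - lower    # how far to move toward the next observation
    # Assumption: clamp locations that fall outside the data range.
    if lower < 1:
        return ordered[0]
    if lower >= n:
        return ordered[-1]
    below, above = ordered[lower - 1], ordered[lower]
    return below + fraction * (above - below)

example = [34, 78, 79, 81]
q3 = percentile_value(example, 0.75)
print(q3)  # 80.5, matching the worked example
```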
Solution
78,95,96,103,112,134,141,145,147,148,153,158,172,172,200,255,271,359
Mean = ∑x / n = 2939 / 18 = 163.278
Range=maximum−minimum
Range=359−78=281
x (x-mean) (x-mean)^2
141 -22.278 496.3093
158 -5.278 27.85728
112 -51.278 2629.433
153 -10.278 105.6373
134 -29.278 857.2013
95 -68.278 4661.885
96 -67.278 4526.329
78 -85.278 7272.337
148 -15.278 233.4173
172 8.722 76.07328
200 36.722 1348.505
271 107.722 11604.03
103 -60.278 3633.437
359 195.722 38307.1
145 -18.278 334.0853
147 -16.278 264.9733
172 8.722 76.07328
255 91.722 8412.925
Total 84867.61
variance = ∑(x − x̄)² / (n − 1) = 84867.61 / 17 = 4992.212
IQR=Q3-Q1
Q3=172+0.25*(200-172)=172+7=179
Q1=103+0.75*(112-103)=103+6.75=109.75
IQR=179-109.75=69.25
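As a cross-check on the hand calculations, all of the requested summary statistics can be computed with Python's standard `statistics` module. This is only a sketch: the variable names are illustrative, and it relies on `method="exclusive"` in `statistics.quantiles`, which places quartile k at position k(n+1)/4, the same location rule used in the note.

```python
import statistics

glucose = [78, 95, 96, 103, 112, 134, 141, 145, 147, 148,
           153, 158, 172, 172, 200, 255, 271, 359]

median = statistics.median(glucose)
mean = statistics.mean(glucose)
geometric_mean = statistics.geometric_mean(glucose)
data_range = max(glucose) - min(glucose)
variance = statistics.variance(glucose)   # sample variance, divides by n - 1
std_dev = statistics.stdev(glucose)       # square root of the sample variance
# method="exclusive" interpolates at positions k * (n + 1) / 4
q1, _, q3 = statistics.quantiles(glucose, n=4, method="exclusive")
iqr = q3 - q1

summary = {
    "median": median, "mean": mean, "geometric mean": geometric_mean,
    "range": data_range, "variance": variance, "standard deviation": std_dev,
    "Q1": q1, "Q3": q3, "IQR": iqr,
}
for name, value in summary.items():
    print(f"{name}: {value:.3f}")
```

The standard deviation is simply the square root of the variance, so it needs no separate table of deviations.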
b) Which measure of central tendency would you select to report for these data and why?
Solution
Median would be the best measure of central tendency because the data contain an extreme
value (359), and the median is not affected by extreme values.
c) Which measure of dispersion would you select to report for these data and why?
Solution
The IQR would be the best measure of dispersion here because, like the median, it is not
affected by the extreme value 359. The standard deviation is based on deviations from the
mean, so a single extreme value inflates it.
d) What is your conclusion on the diabetic control class’s success based on these data?
Solution
There is no statistically significant evidence that the diabetes control class produced a
change in blood glucose levels.
e) Sketch the boxplot of these data
Solution
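The numbers a boxplot is drawn from can be computed without any plotting library. The sketch below is an illustration under two assumptions that the original solution does not state: quartiles use the (n+1)p location rule from problem 1a, and outliers are flagged with the common 1.5 × IQR fence convention.

```python
import statistics

glucose = [78, 95, 96, 103, 112, 134, 141, 145, 147, 148,
           153, 158, 172, 172, 200, 255, 271, 359]

# Quartiles with the (n + 1)p location rule
q1, median, q3 = statistics.quantiles(glucose, n=4, method="exclusive")
iqr = q3 - q1

# Assumption: points beyond 1.5 * IQR from the box are drawn as outliers
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [x for x in glucose if x < lower_fence or x > upper_fence]
inside = [x for x in glucose if lower_fence <= x <= upper_fence]

# Whiskers extend to the most extreme observations inside the fences
whisker_low, whisker_high = min(inside), max(inside)

print("box:", q1, median, q3)
print("whiskers:", whisker_low, whisker_high)
print("outliers:", outliers)
```

The box spans Q1 to Q3 with a line at the median, the whiskers run to the two whisker values, and any outliers are plotted as individual points.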
2. Suppose that you and your friends emptied the coins in your pockets, wallets, etc., and recorded the
year marked on each coin. What do you think the shape of this distribution would look like:
skewed left, skewed right, or symmetric? Explain.
Solution
The distribution of the years marked on the coins will be left skewed because there are
more newer coins than older coins in circulation.
a) Calculate the geometric mean by multiplying the 7 numbers together and taking the
7th root.
Solution
Geometric mean = (1 × 10 × 100 × 1000 × 10000 × 100000 × 1000000)^(1/7)
Geometric mean = (10^21)^(1/7)
Geometric mean = 10^3 = 1000
b) Use log base 10 and convert all these numbers to the log scale. List the converted
log base 10 values.
Solution
0, 1,2,3,4,5,6
c) Using the results from part b, find the geometric mean by averaging all the
converted log base 10 numbers and taking the antilog base 10. The answer should
be the same as part a.
Solution
Geometric mean = b^x̄ = 10^((0+1+2+3+4+5+6)/7) = 10^(21/7)
Geometric mean=10^3=1000
d) Instead of using log base 10, use the natural log to compute the geometric mean.
Show all steps.
This answer should be EXACTLY the same as part a. If it is not, it should be very very
close, and it is due to the rounding that it did not turn out to be exactly the same.
Solution
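The three routes to the geometric mean in parts a) to d) can be checked numerically. The sketch below assumes the seven numbers are 1, 10, 100, ..., 1000000, as used in part a; the variable names are illustrative. Because of floating-point rounding, the root and natural-log results may differ from 1000 in the last decimal places, which is the rounding effect the question mentions.

```python
import math

values = [10 ** k for k in range(7)]   # 1, 10, 100, ..., 1000000
n = len(values)

# Part a: n-th root of the product
gm_root = math.prod(values) ** (1 / n)

# Parts b and c: average the base-10 logs, then take the base-10 antilog
log10_values = [math.log10(v) for v in values]
gm_log10 = 10 ** (sum(log10_values) / n)

# Part d: average the natural logs, then take the natural antilog (exp)
ln_values = [math.log(v) for v in values]
gm_ln = math.exp(sum(ln_values) / n)

print("log10 values:", log10_values)
print("root of product:", gm_root)
print("via log base 10:", gm_log10)
print("via natural log:", gm_ln)
```

The base of the logarithm does not matter as long as the antilog uses the same base.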
a) Which country has more dispersion in household size? What might this tell you about each
country?
Solution
South Africa has more dispersion in household size than the United Kingdom. This
indicates that household sizes in the UK are usually smaller than in South Africa,
and that household size varies widely in South Africa.
b) While you do not have the data, if you were to calculate the mean household size for each
country, which country do you expect has a larger mean?
Solution
I expect South Africa to have a larger mean since it has more values on the higher
side of the x-axis.
c) For South Africa, which measure of central tendency do you expect to be the better statistic to
summarize the typical value – MEAN or MEDIAN?
Solution
Median is the best measure of central tendency because the South Africa data are
widely spread and contain extreme values. The median is not affected by extreme
values.
5. Use the following results from the DIETFITS randomized control trial to answer the questions below.
a) Which group had the highest median weight change over 12-months?
Solution
b) Which group has the smallest interquartile range of weight change over 12-months?
Solution
c) Which group has the person who had the most weight loss over 12-months?
Solution
d) Does this graph (and data) provide any compelling evidence that the type of diet matters or
whether a person’s genotype matters for weight loss over 12 months? Explain.
Solution
The data cannot give any compelling evidence that the diet type matters or whether a person’s
genotype matters for weight loss over a period of 12 months.
6. In the following dataset of 10 people, I assign a 1 if you have hepatitis A and 0 if you do not have
hepatitis A.
1 0 0 0 1 0 1 0 0 0
a) What proportion of people have hepatitis A? What proportion do NOT have hepatitis A?
Solution
Proportion of people who have hepatitis A = 3/10 = 0.3
Proportion of people who do not have hepatitis A = 7/10 = 0.7
b) Use the sample mean formula and calculate the mean of these data. Notice that the sample
mean formula of 0-1 data gives you the proportion.
Solution
mean = ∑x / n = (1+0+0+0+1+0+1+0+0+0) / 10 = 3/10 = 0.3
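The point that the mean of 0-1 data equals the proportion of 1s can be confirmed with a short Python check. This is only an illustrative sketch using the ten values listed in the question; the variable names are not from the original.

```python
hepatitis_a = [1, 0, 0, 0, 1, 0, 1, 0, 0, 0]   # 1 = has hepatitis A, 0 = does not
n = len(hepatitis_a)

proportion_with = hepatitis_a.count(1) / n      # count the 1s directly
proportion_without = hepatitis_a.count(0) / n   # count the 0s directly
sample_mean = sum(hepatitis_a) / n              # ordinary sample mean formula

print(proportion_with, proportion_without, sample_mean)
```

Summing 0-1 data counts the 1s, so dividing by n gives the same number as the proportion.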
7. The Framingham Heart Study is a long-term prospective study of the etiology of cardiovascular
disease among a population of subjects in the community of Framingham, Massachusetts. The
Framingham Heart Study was a landmark study in epidemiology in that it was the first prospective
study of cardiovascular disease and identified the concept of risk factors and their joint effects. The
study began in 1948. 5,209 subjects were initially enrolled in the study. Participants have been
examined biennially since the inception of the study and all subjects are continuously followed
through regular surveillance for cardiovascular outcomes.
The sample mean is 54.79 years and the sample standard deviation is 9.56
Mean − x × SD = 43.52
54.79 − 9.56x = 43.52
9.56x = 54.79 − 43.52
9.56x = 11.27
x = 11.27 / 9.56 = 1.179 standard deviations
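The same calculation can be written as a standardized distance. The sketch below assumes the question asks how many standard deviations the value 43.52 lies below the sample mean, which is what the algebra above solves for; the variable names are illustrative.

```python
sample_mean = 54.79   # sample mean given in the solution
sample_sd = 9.56      # sample standard deviation given in the solution
value = 43.52         # value being compared to the mean

# Signed distance from the mean in standard-deviation units
z = (value - sample_mean) / sample_sd
sds_below_mean = -z   # positive number of SDs below the mean

print(round(z, 3), round(sds_below_mean, 3))
```

A negative z means the value is below the mean, so the sign is flipped to report it as a number of standard deviations below.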