Homework M2 Solution

Student's Name:

Homework M2
1. People with diabetes must monitor and control their blood glucose level. The goal is to maintain
“fasting plasma glucose” between 90 and 130 milligrams per deciliter (mg/dl). Here are the
fasting plasma glucose levels for 18 diabetics enrolled in a diabetes control class, 5 months after
the class has ended.

141 158 112 153 134 95
96 78 148 172 200 271
103 359 145 147 172 255

a) Calculate the median, mean, geometric mean, range, variance, standard deviation, and
interquartile range (IQR) of fasting blood glucose levels.

Note: Sometimes when you calculate a percentile location you may get something like 5.5,
which means the median lies halfway between the 5th and 6th values in the ordered
dataset, so we can just average the 5th and 6th values to get the median. But if for the 1st or 3rd
quartile you get a location like 5.75, you have to find the number that is ¾ of the way between
the 5th and 6th observations. To do this, find the difference between the 5th and 6th observations
in the ordered dataset, multiply the difference by 0.75, and then add this amount to the 5th
observation in the ordered dataset.

We did not see an example of this in the lecture, but you will in this problem.
As an example, assume we have the data: 34 78 79 81
The 3rd quartile location (75th percentile) is 0.75(4+1) = 3.75
So Q3 lies between the 3rd and 4th observations, 79 and 81
Calculate: (81-79)*0.75 = 1.5
So the 3rd quartile is Q3 = 79+1.5 = 80.5
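
The location rule described in the note can be sketched in Python as follows. This is a minimal illustration, not part of the assignment; the function name percentile_n_plus_1 is hypothetical, and the handling of locations that fall outside the data (clamping to the first or last value) is an assumption.

```python
def percentile_n_plus_1(values, p):
    """Percentile using the p*(n+1) location rule with linear interpolation."""
    data = sorted(values)
    n = len(data)
    loc = p * (n + 1)      # 1-based location in the ordered data, e.g. 3.75
    lower = int(loc)       # whole part: position of the lower observation
    frac = loc - lower     # fractional part: how far toward the next observation
    if lower < 1:          # assumption: clamp locations before the first value
        return data[0]
    if lower >= n:         # assumption: clamp locations at or past the last value
        return data[-1]
    return data[lower - 1] + frac * (data[lower] - data[lower - 1])


# The small example from the note: location 3.75 lies between 79 and 81
print(percentile_n_plus_1([34, 78, 79, 81], 0.75))  # 80.5
```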

Solution

Median location = 0.5*(n+1)th value = 0.5*19 = 9.5th value

Ordered data:
78, 95, 96, 103, 112, 134, 141, 145, 147, 148, 153, 158, 172, 172, 200, 255, 271, 359

The 9th and 10th values are 147 and 148, so Median = (147+148)/2 = 147.5

Mean = ∑x / n

Mean = (141+96+103+158+78+359+112+148+145+153+172+147+134+200+172+95+271+255) / 18

Mean = 2939 / 18 = 163.278

Range=maximum−minimum

Range=359−78=281

x (x-mean) (x-mean)^2
141 -22.278 496.3093
158 -5.278 27.85728
112 -51.278 2629.433
153 -10.278 105.6373
134 -29.278 857.2013
95 -68.278 4661.885
96 -67.278 4526.329
78 -85.278 7272.337
148 -15.278 233.4173
172 8.722 76.07328
200 36.722 1348.505
271 107.722 11604.03
103 -60.278 3633.437
359 195.722 38307.1
145 -18.278 334.0853
147 -16.278 264.9733
172 8.722 76.07328
255 91.722 8412.925
Total 84867.61

Variance = ∑(x − mean)^2 / (n − 1) = 84867.61 / 17 = 4992.212

Standard deviation = √variance = √4992.212 = 70.656

IQR=Q3-Q1

Q3 location = 0.75*(n+1)th value = 0.75*19 = 14.25th value

Q3 = 172+0.25*(200-172) = 172+7 = 179

Q1 location = 0.25*(n+1)th value = 0.25*19 = 4.75th value

Q1 = 103+0.75*(112-103) = 103+6.75 = 109.75

IQR = 179-109.75 = 69.25
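
The hand calculations in part a can be cross-checked with Python's standard statistics module. This is a sketch rather than the method used in the course; it assumes Python 3.8 or later, and it relies on method="exclusive" in statistics.quantiles, which places the quartiles at the p*(n+1) locations used above. The geometric mean is printed as well, since part a asks for it.

```python
import statistics

# Fasting plasma glucose (mg/dl) for the 18 class participants
glucose = [141, 158, 112, 153, 134, 95,
           96, 78, 148, 172, 200, 271,
           103, 359, 145, 147, 172, 255]

mean = statistics.mean(glucose)
median = statistics.median(glucose)
geo_mean = statistics.geometric_mean(glucose)
data_range = max(glucose) - min(glucose)
variance = statistics.variance(glucose)   # sample variance: divides by n - 1
std_dev = statistics.stdev(glucose)       # square root of the sample variance

# method="exclusive" interpolates at the p*(n+1) locations
q1, _, q3 = statistics.quantiles(glucose, n=4, method="exclusive")
iqr = q3 - q1

summary = [("median", median), ("mean", mean), ("geometric mean", geo_mean),
           ("range", data_range), ("variance", variance),
           ("standard deviation", std_dev), ("Q1", q1), ("Q3", q3), ("IQR", iqr)]
for name, value in summary:
    print(f"{name}: {value:.3f}")
```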

b) Which measure of central tendency would you select to report for these data and why?
Solution

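The median would be the best measure of central tendency because the data contain extreme high
values (e.g., 359) that pull the mean (163.278) well above the median (147.5); the median is not
affected by extreme values.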

c) Which measure of dispersion would you select to report for these data and why?

Solution

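The interquartile range (IQR) would be the best measure of dispersion here. The standard deviation
measures the typical deviation from the mean, so, like the mean, it is inflated by extreme values
such as 359. The IQR describes the spread of the middle 50% of the data and pairs naturally with
the median.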
d) What is your conclusion on the diabetic control class’s success based on these data?

Solution

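Based on these data, the class does not appear to have been successful. Only 4 of the 18
participants (95, 96, 103, 112) are within the 90 to 130 mg/dl target; 13 are above 130 and 1 is
below 90, and both the median (147.5) and the mean (163.278) are above the upper limit. Because
there are no pre-class measurements, these data cannot show whether glucose levels changed.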
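
One way to relate the readings to the 90 to 130 mg/dl goal is to count how many fall below, within, and above the target. The sketch below assumes the target range includes both endpoints, which the problem statement does not specify.

```python
# Fasting plasma glucose (mg/dl) for the 18 class participants
glucose = [141, 158, 112, 153, 134, 95,
           96, 78, 148, 172, 200, 271,
           103, 359, 145, 147, 172, 255]

LOW, HIGH = 90, 130  # target range from the problem; inclusive bounds are an assumption

below = [x for x in glucose if x < LOW]
within = [x for x in glucose if LOW <= x <= HIGH]
above = [x for x in glucose if x > HIGH]

total = len(glucose)
print(f"below target:  {len(below)} of {total}")
print(f"within target: {len(within)} of {total}")
print(f"above target:  {len(above)} of {total}")
```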
e) Sketch the boxplot of these data

Solution
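
The values needed to sketch the boxplot by hand can be computed as below. This is only a sketch: the quartiles use the p*(n+1) location rule from part a (method="exclusive" in Python's statistics module, Python 3.8 or later), and the 1.5 × IQR fence convention for whiskers and outliers is an assumption, since the assignment does not state which convention to use.

```python
import statistics

# Fasting plasma glucose (mg/dl) for the 18 class participants
glucose = [141, 158, 112, 153, 134, 95,
           96, 78, 148, 172, 200, 271,
           103, 359, 145, 147, 172, 255]
data = sorted(glucose)

# Box edges and median line, using the p*(n+1) location rule
q1, median, q3 = statistics.quantiles(data, n=4, method="exclusive")
iqr = q3 - q1

# Assumed convention: points beyond 1.5 * IQR from the box are drawn as outliers
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
inside = [x for x in data if lower_fence <= x <= upper_fence]
outliers = [x for x in data if x < lower_fence or x > upper_fence]

print("box from", q1, "to", q3, "with median line at", median)
print("whiskers from", inside[0], "to", inside[-1])
print("outliers:", outliers)
```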
2. Suppose that you and your friends emptied the coins in your pockets, wallets, etc., and recorded the
year marked on each coin. What do you think the shape of this distribution would look like:
skewed left, skewed right, or symmetric? Explain.
Solution
The distribution of the years marked on the coins will be skewed left: most coins in circulation
are recent, so the years cluster near the present, while the fewer older coins form a long tail
toward earlier years.

3. Use the following 7 numbers:

1 10 100 1000 10000 100000 1000000

a) Calculate the geometric mean by multiplying the 7 numbers together and taking the
7th root.
Solution
Geometric mean = (1∗10∗100∗1000∗10000∗100000∗1000000)^(1/7)
Geometric mean = (10^21)^(1/7)
Geometric mean = 10^3 = 1000
b) Use log base 10 and convert all these numbers to the log scale. List the converted
log base 10 values.
Solution
0, 1,2,3,4,5,6
c) Using the results from part b, find the geometric mean by averaging all the
converted log base 10 numbers and taking the antilog base 10. The answer should
be the same as part a.
Solution
Geometric mean = 10^((0+1+2+3+4+5+6)/7) = 10^(21/7)
Geometric mean = 10^3 = 1000
d) Instead of using log base 10, use the natural log to compute the geometric mean.
Show all steps.
This answer should be EXACTLY the same as part a. If it is not, it should be very very
close, and it is due to the rounding that it did not turn out to be exactly the same.

Solution

Geometric mean = e^((ln 1 + ln 10 + ln 100 + ln 1000 + ln 10000 + ln 100000 + ln 1000000)/7)

Geometric mean = e^((0+2.303+4.605+6.908+9.210+11.513+13.816)/7) = e^(48.355/7)

Geometric mean = e^6.908 = 1000.24 ≈ 1000
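
The three routes to the geometric mean in parts a, c, and d can be compared numerically. The sketch below uses only Python's standard math module (math.prod needs Python 3.8 or later). All three results should agree with 1000 up to floating-point rounding, which suggests that the small discrepancy in the hand calculation comes from rounding the logarithms to three decimals.

```python
import math

numbers = [1, 10, 100, 1000, 10000, 100000, 1000000]
n = len(numbers)

# Part a: multiply the numbers together and take the 7th root
gm_root = math.prod(numbers) ** (1 / n)

# Part c: average the base-10 logs, then take the base-10 antilog
gm_log10 = 10 ** (sum(math.log10(x) for x in numbers) / n)

# Part d: average the natural logs, then exponentiate
gm_ln = math.exp(sum(math.log(x) for x in numbers) / n)

print(gm_root, gm_log10, gm_ln)  # each approximately 1000
```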


4. Compare the distributions of household size for South Africa and the United Kingdom (UK)
based on a sample of survey respondents.

a) Which country has more dispersion in household size? What might this tell you about each
country?
Solution
South Africa has more dispersion in household size than the United Kingdom. This
indicates that household sizes in the UK are usually smaller than in South
Africa, and that household sizes in South Africa vary widely.
b) While you do not have the data, if you were to calculate the mean household size for each
country, which country do you expect has a larger mean?
Solution
I expect South Africa to have the larger mean, since it has more values on the higher
side of the x-axis.
c) For South Africa, which measure of central tendency do you expect to be the better statistic to
summarize the typical value: MEAN or MEDIAN?
Solution
The median is the better measure of central tendency because the South Africa data are
widely spread and contain extreme values. The median is not affected by extreme
values.
5. Use the following results from the DIETFITS randomized control trial to answer the questions below.

a) Which group had the highest median weight change over 12-months?

Solution

The low-carbohydrate genotype group that followed the healthy low-fat diet

b) Which group has the smallest interquartile range of weight change over 12-months?

Solution

The low-fat genotype group that followed the healthy low-fat diet

c) Which group has the person who had the most weight loss over 12-months?

Solution

The low-carbohydrate genotype group that followed the healthy low-fat diet

d) Does this graph (and data) provide any compelling evidence that the type of diet matters or
whether a person’s genotype matters for weight loss over 12 months? Explain.

Solution

The data do not give any compelling evidence that the type of diet or a person’s
genotype matters for weight loss over 12 months.

6. In the following dataset of 10 people, I assign a 1 if you have hepatitis A and 0 if you do not have
hepatitis A.
1 0 0 0 1 0 1 0 0 0

a) What proportion of people have hepatitis A? What proportion do NOT have hepatitis A?
Solution
Proportion of people who have hepatitis A = 3/10 = 0.3
Proportion of people who do not have hepatitis A = 7/10 = 0.7

b) Use the sample mean formula and calculate the mean of these data. Notice that the sample
mean formula of 0-1 data gives you the proportion.
Solution

Mean = ∑x / n = (1+0+0+0+1+0+1+0+0+0) / 10 = 3/10 = 0.3
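
The point that the mean of 0-1 data equals the proportion of 1s can be checked directly. This is a small sketch; the variable names are illustrative only.

```python
# 1 = has hepatitis A, 0 = does not
status = [1, 0, 0, 0, 1, 0, 1, 0, 0, 0]
n = len(status)

prop_with = status.count(1) / n      # proportion coded 1
prop_without = status.count(0) / n   # proportion coded 0
mean = sum(status) / n               # sample mean of the 0-1 data

print(prop_with, prop_without, mean)  # 0.3 0.7 0.3
```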
7. The Framingham Heart Study is a long term prospective study of the etiology of cardiovascular
disease among a population of subjects in the community of Framingham, Massachusetts. The
Framingham Heart Study was a landmark study in epidemiology in that it was the first prospective
study of cardiovascular disease and identified the concept of risk factors and their joint effects. The
study began in 1948. 5,209 subjects were initially enrolled in the study. Participants have been
examined biennially since the inception of the study and all subjects are continuously followed
through regular surveillance for cardiovascular outcomes.

The following histogram shows the distribution of age at examination.

The sample mean is 54.79 years and the sample standard deviation is 9.56 years.

a) What range of ages contains approximately 95% of the subjects?


Solution
Approximately 95% of the subjects lie within 2 standard deviations of the mean (empirical rule for a roughly bell-shaped distribution).
Mean ± 2 Std = 54.79 ± 2∗9.56 = 54.79 ± 19.12 = [35.67, 73.91]

b) What age is 2.8 standard deviations below the mean?


Solution
Mean − 2.8 Std = 54.79 − 2.8∗9.56 = 28.022

c) How many standard deviations above the mean is 81.56 years?


Solution
Mean + x∗Std = 81.56
54.79 + 9.56∗x = 81.56
9.56∗x = 81.56 − 54.79
9.56∗x = 26.77
x = 26.77 / 9.56 = 2.8 standard deviations
d) What age is 1 standard deviation above the mean?
Solution
Mean + 1 Std = 54.79 + 1∗9.56 = 64.35

e) How many standard deviations below the mean is 43.52 years?


Solution

Mean − x∗Std = 43.52
54.79 − 9.56∗x = 43.52
9.56∗x = 54.79 − 43.52
9.56∗x = 11.27
x = 11.27 / 9.56 = 1.179 standard deviations
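
Parts a through e all use the same two conversions: from a number of standard deviations to an age, and from an age back to a number of standard deviations. A minimal sketch of both follows; the helper names age_at and z_score are hypothetical, chosen for illustration.

```python
MEAN_AGE = 54.79  # sample mean age in years
SD_AGE = 9.56     # sample standard deviation in years


def age_at(k):
    """Age lying k standard deviations from the mean (negative k means below)."""
    return MEAN_AGE + k * SD_AGE


def z_score(age):
    """Number of standard deviations an age lies above (+) or below (-) the mean."""
    return (age - MEAN_AGE) / SD_AGE


print(f"a) {age_at(-2):.2f} to {age_at(2):.2f} years")
print(f"b) {age_at(-2.8):.3f} years")
print(f"c) {z_score(81.56):.3f} standard deviations above")
print(f"d) {age_at(1):.2f} years")
print(f"e) {abs(z_score(43.52)):.3f} standard deviations below")
```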
