Estimation Exercises 2324


Estimation and hypothesis testing exercises

Biostatistics Medicine - UdL - Academic year 2023-24

Exercise 1. A continuous random variable is observed in 1471 individuals. Its average X̄ is
195.7 and its standard deviation is 78.3.

(a) Estimate a 95% confidence interval of the population mean µ.
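
A minimal sketch of part (a), assuming the usual large-sample normal approximation (mean ± z·s/√n). The helper name `mean_ci` is illustrative and does not come from the course material.

```python
from math import sqrt
from statistics import NormalDist

def mean_ci(mean, sd, n, conf=0.95):
    """Normal-approximation CI for a population mean (hypothetical helper)."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)  # about 1.96 for 95%
    margin = z * sd / sqrt(n)                     # z times the standard error
    return mean - margin, mean + margin

# Summary statistics from Exercise 1
lower, upper = mean_ci(mean=195.7, sd=78.3, n=1471)
print(f"95% CI for the mean: ({lower:.2f}, {upper:.2f})")
```

With n this large, replacing z by the Student t quantile would change the limits only marginally.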

Exercise 2. A sample of 1355 individuals older than 70 years representative of the Catalan
population was studied. It was observed that the risk of suffering from Alzheimer’s disease
in the next 5 years was 0.068.

(a) Estimate a 95% confidence interval of the risk (π) of suffering Alzheimer’s disease in
the next 5 years, in this age group of the Catalan population.
(b) If we need to estimate the probability π with a precision equal to 0.009 and a 95%
confidence, what sample size do we need? (note: precision refers to the distance from
the center of the confidence interval to its limits).
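
Both parts can be sketched as follows, assuming the Wald (normal-approximation) interval for a proportion and using the observed risk 0.068 as the planning value for part (b). The variable names are illustrative.

```python
from math import ceil, sqrt
from statistics import NormalDist

z = NormalDist().inv_cdf(0.975)  # about 1.96 for 95% confidence
p_hat, n = 0.068, 1355           # observed risk and sample size (Exercise 2)

# (a) Wald interval: p ± z * sqrt(p(1-p)/n)
margin = z * sqrt(p_hat * (1 - p_hat) / n)
ci = (p_hat - margin, p_hat + margin)

# (b) Smallest n whose margin of error does not exceed the requested precision
precision = 0.009
n_required = ceil(z**2 * p_hat * (1 - p_hat) / precision**2)

print(f"95% CI: ({ci[0]:.4f}, {ci[1]:.4f})")
print(f"Required sample size: {n_required}")
```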

Exercise 3. A research team studied 995 women who exercise regularly and 2323 women
who do not exercise. After 10 years of follow-up, 70 women in the exercise group and 279
women in the no exercise group developed osteoporosis.

(a) The design of this study is:
a) None of the other answers is correct.
b) Observational cohort study.
c) Experimental with control group.
d) Observational case-control study.
(b) Estimate a 95% confidence interval of the difference of proportions of osteoporosis in
women who exercise compared to those that do not exercise.
(c) Based on the estimated confidence interval, it can be concluded:
a) Regular exercise does not seem to have an effect on osteoporosis in women.
b) Regular exercise increases the risk of osteoporosis in women.
c) The size of the groups is too small to be able to draw conclusions.
d) Regular exercise reduces the risk of osteoporosis in women.
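
A sketch of the interval in part (b), assuming the unpooled normal-approximation formula for a difference of two independent proportions. `diff_prop_ci` is a hypothetical helper name.

```python
from math import sqrt
from statistics import NormalDist

def diff_prop_ci(x1, n1, x2, n2, conf=0.95):
    """CI for p1 - p2 with the unpooled standard error (hypothetical helper)."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    p1, p2 = x1 / n1, x2 / n2
    se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    diff = p1 - p2
    return diff, diff - z * se, diff + z * se

# Exercise group: 70 of 995; no-exercise group: 279 of 2323 (Exercise 3)
diff, lower, upper = diff_prop_ci(70, 995, 279, 2323)
print(f"Difference: {diff:.4f}, 95% CI: ({lower:.4f}, {upper:.4f})")
```

The sign convention here is exercise minus no exercise, so whether the interval lies entirely on one side of 0 is what part (c) asks about.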

Exercise 4. A study on tobacco use and bladder cancer was designed as a case-control study
and included 256 cases and 495 controls. The study showed that 80.08% of the patients with
bladder cancer were smokers and that 44.04% of the individuals without bladder cancer were
smokers.

            Smokers   Non-smokers
Cases
Controls

Cases = individuals with bladder cancer.

(a) In order to measure the association between smoking and bladder cancer, estimate the
odds ratio of smoking and its 95% confidence interval.
(b) If instead of having obtained a confidence interval, the authors had performed hypothesis
testing, the estimated p-value (significance level) would have been:
a) p = 1; or
b) p > 0.05; or
c) p = 0.05; or
d) p < 0.05 ?
(c) How do you interpret the obtained confidence interval (or p value)?
a) There should be more cases to draw a reliable conclusion.
b) Tobacco seems to be associated with bladder cancer.
c) The confidence interval does not contain the value 0, therefore the results are
statistically significant.
d) Tobacco does not seem to be associated with bladder cancer.
(d) How many cases should be included in the study if the investigator had wanted to
estimate the proportion of exposure to smoking, among patients with bladder cancer,
with precision equal to 0.02?
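
The 2×2 table for this exercise can be filled in from the percentages in the statement. The sketch below assumes the quoted percentages are rounded values of whole-number counts, so each product is rounded to the nearest integer.

```python
n_cases, n_controls = 256, 495

# Assumption: 80.08% and 44.04% are rounded from integer counts of smokers.
smoker_cases = round(0.8008 * n_cases)
smoker_controls = round(0.4404 * n_controls)

table = {
    "Cases": (smoker_cases, n_cases - smoker_cases),
    "Controls": (smoker_controls, n_controls - smoker_controls),
}

print(f"{'':<9}{'Smokers':>8}{'Non-smokers':>13}")
for row, (smokers, non_smokers) in table.items():
    print(f"{row:<9}{smokers:>8}{non_smokers:>13}")
```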

Exercise 5. A 95% confidence interval of the population mean µ is:

(a) A range of values in which one expects to find 95% of the sample means.
(b) A range of values that includes 95% of the values of the variable in the population.
(c) A range of values in which one expects to find the sample mean, X̄, with confidence
95%.
(d) A range of values in which one expects to find the population mean µ with confidence
95%.

Exercise 6. A 95% reference or normality interval is:

(a) A range of values in which one expects to find 95% of the sample means.
(b) A range of values in which one expects to find the sample mean, X̄, with confidence
95%.
(c) A range of values that includes 95% of the values of the variable in the population.
(d) A range of values in which one expects to find the population mean µ with confidence
95%.
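
To make the contrast between the two kinds of interval in Exercises 5 and 6 concrete, the sketch below computes both from the summary statistics of Exercise 1. It assumes the variable is approximately normally distributed, which the exercise does not state.

```python
from math import sqrt
from statistics import NormalDist

z = NormalDist().inv_cdf(0.975)  # about 1.96
mean, sd, n = 195.7, 78.3, 1471  # summary statistics quoted in Exercise 1

# Reference interval: describes the spread of individual values (uses sd).
reference_interval = (mean - z * sd, mean + z * sd)

# Confidence interval: describes uncertainty about the mean (uses sd / sqrt(n)).
se = sd / sqrt(n)
confidence_interval = (mean - z * se, mean + z * se)

print("Reference interval: ", tuple(round(v, 1) for v in reference_interval))
print("Confidence interval:", tuple(round(v, 1) for v in confidence_interval))
```

The reference interval is wider by a factor of √n, and unlike the confidence interval it does not shrink as the sample grows.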

Exercise 7. In 1993, the mothers of all babies born in hospitals in the city of Pelotas
(Brazil), were invited to participate in a study of factors associated with physical activity in
adolescence. 5249 children were included in the study and several variables were collected
from them and their families. After twelve years, 4450 teenagers among the participants
were interviewed to determine the weekly time dedicated to physical activity. The following
tables were published with some of their results:

Gender   Participants   % sedentary life
Men      2167           49
Women    2283           67

Mother's years of education   Sedentary   Active   Total
0-8                           1916        1446     3362
≥9                            684         402      1086

(a) The design of this study is:
a) Retrospective cohort study.
b) Retrospective case-control study.
c) Experimental.
d) Prospective cohort study.
(b) Estimate a 95% confidence interval of the difference between the proportion of sedentary
boys and the proportion of sedentary girls.
(c) Based on the estimated confidence interval, it can be concluded:
a) There is no relationship between gender and sedentarism.
b) Girls are more sedentary than boys and the differences are statistically significant.
c) It seems that gender and sedentarism are related, but without statistical significance.
d) Girls are less sedentary than boys and the differences are statistically significant.
(d) Estimate a 95% confidence interval for the relative risk of sedentarism in children from
mothers with 0-8 years of education with respect to children from mothers with ≥ 9
years of education.
(e) The results of the previous section show that, with a 95% confidence:
a) There is no relationship between level of education of the mother and physical activity
of children.
b) Children of mothers with low educational level are less sedentary than children of
mothers with high educational level and the differences are statistically significant.
c) None of the other answers is correct.
d) Children of mothers with low educational level are more sedentary than children of
mothers with high educational level and the differences are statistically significant.

Exercise 8. A random sample of 965 people has a total cholesterol mean X̄ = 215.8 mg/dL.

(a) Assuming that σ = 42.6 mg/dL, compute the 95% confidence interval (CI) of the
population mean µ.
(b) Estimate the sample size needed to reduce the width of the confidence interval to half
of the previous interval.
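
A sketch of both parts, assuming a known σ so that the interval is mean ± z·σ/√n. Because the margin is proportional to 1/√n, halving the width requires four times the sample size, and the code checks this numerically.

```python
from math import sqrt
from statistics import NormalDist

z = NormalDist().inv_cdf(0.975)      # about 1.96
mean, sigma, n = 215.8, 42.6, 965    # values from Exercise 8 (mg/dL)

def margin(sample_size):
    """Half-width of the 95% CI for a given sample size (known sigma)."""
    return z * sigma / sqrt(sample_size)

# (a) Interval with the original sample
ci = (mean - margin(n), mean + margin(n))

# (b) margin is proportional to 1/sqrt(n), so half the width needs 4n individuals
n_half_width = 4 * n

print(f"95% CI: ({ci[0]:.2f}, {ci[1]:.2f})")
print(f"n for half the width: {n_half_width}")
print(f"margin ratio: {margin(n_half_width) / margin(n):.2f}")
```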

Exercise 9. An epidemiological study of cases (Ca) and controls (Co) had the following
results:

              Cases   Controls   Total
Exposed       61      67         128
Non exposed   50      266        316
Total         111     333        444

(a) Estimate the proportion of exposed cases and the proportion of exposed controls with
95% confidence intervals.
(b) Estimate the odds ratio and its 95% confidence interval.
(c) Based on the OR and its confidence interval, is there any association between exposure
and case-control status?
a) Yes
b) No
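
Part (b) can be sketched with the log-based (Woolf) interval for the odds ratio, which is one common choice; the course may use a different method. `odds_ratio_ci` is a hypothetical helper name.

```python
from math import exp, log, sqrt
from statistics import NormalDist

def odds_ratio_ci(a, b, c, d, conf=0.95):
    """Odds ratio with a Woolf (log-scale) CI.

    a, b: exposed cases, exposed controls
    c, d: non-exposed cases, non-exposed controls
    """
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    or_hat = (a * d) / (b * c)
    se_log = sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # standard error of ln(OR)
    lower = exp(log(or_hat) - z * se_log)
    upper = exp(log(or_hat) + z * se_log)
    return or_hat, lower, upper

# Counts from the table in Exercise 9
or_hat, lower, upper = odds_ratio_ci(61, 67, 50, 266)
print(f"OR = {or_hat:.2f}, 95% CI: ({lower:.2f}, {upper:.2f})")
```

For an odds ratio the reference value for "no association" is 1, not 0.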

Exercise 10. An observational cohort study gave these results:

              Disease   No disease   Total
Exposed       573       184          757
Non exposed   593       1206         1799
Total         1166      1390         2556

(a) Estimate the proportion of individuals with the disease in the exposed group and its
95% confidence interval.
(b) Estimate the proportion of individuals with the disease in the non exposed group and
its 95% confidence interval.
(c) Estimate a 95% confidence interval of the difference in the risk of having the disease in
exposed versus non exposed.
(d) Do you think there is an association between exposure and disease?

Exercise 11. To study the effect of vitamin D3 on mortality, morbidity, and growth in
infants born with low birth weight (between 1.8 and 2.5 kg), 2053 babies were included and
followed-up during a period of 6 months. The selected newborns were randomly allocated
to two groups: one group received weekly doses of vitamin D3 and the other group received
placebo.

(a) According to the information given, the design of the study is:
a) Experimental.
b) Retrospective, case-control.
c) Descriptive, health survey.
d) Prospective, longitudinal cohort study.
(b) Regarding mortality during the six months of follow-up, the following table shows the
total number of deaths in both groups.

        Vitamin D3   Placebo   Total
Dead    49           46        95
Alive   977          981       1958
Total   1026         1027      2053

Estimate the relative risk (RR) of death during the first six months of life of infants
who take a vitamin D3 supplement with respect to placebo.
(c) Estimate a 95% confidence interval (CI) of the relative risk obtained in the previous
question.
(d) According to the results obtained in the previous question, which of the following
sentences is true?
a) Vitamin D3 prevents death in low birth weight babies because the 95% confidence
interval of the relative risk contains 1.
b) Vitamin D3 prevents death in low birth weight babies because the 95% confidence
interval of the relative risk does not contain 1.
c) Vitamin D3 does not prevent death in low birth weight babies because the 95%
confidence interval of the relative risk does not contain 1.
d) Vitamin D3 does not prevent death in low birth weight babies because the 95%
confidence interval of the relative risk contains 1.
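
Parts (b) and (c) can be sketched as below, assuming the standard log-scale interval for a relative risk. `relative_risk_ci` is a hypothetical helper name.

```python
from math import exp, log, sqrt
from statistics import NormalDist

def relative_risk_ci(events_1, n_1, events_2, n_2, conf=0.95):
    """Relative risk of group 1 versus group 2 with a log-scale CI."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    rr = (events_1 / n_1) / (events_2 / n_2)
    # Standard error of ln(RR)
    se_log = sqrt(1 / events_1 - 1 / n_1 + 1 / events_2 - 1 / n_2)
    lower = exp(log(rr) - z * se_log)
    upper = exp(log(rr) + z * se_log)
    return rr, lower, upper

# Deaths: 49 of 1026 with vitamin D3, 46 of 1027 with placebo (Exercise 11)
rr, lower, upper = relative_risk_ci(49, 1026, 46, 1027)
print(f"RR = {rr:.3f}, 95% CI: ({lower:.2f}, {upper:.2f})")
```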

Exercise 12. In order to estimate the birth weight of the children of smoking mothers a
gynecologist examines the weight of a sample including 82 children of these mothers. The
average weight is 2.749 kg and standard deviation 0.37 kg.

(a) Estimate, with a confidence of 95%, the average weight µ of the children of smoking
mothers in the town where the sample was taken.
(b) With a 95% confidence, what sample size is needed to ensure that the accuracy, or
margin of error, of the estimated mean is 0.05 kg?
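
For part (b), a sketch of the sample-size formula for a mean, n = (z·s/d)², assuming the sample standard deviation 0.37 kg is used as the planning value and the normal quantile is used in place of Student's t.

```python
from math import ceil
from statistics import NormalDist

z = NormalDist().inv_cdf(0.975)  # about 1.96 for 95% confidence
sd = 0.37                        # sample standard deviation in kg (Exercise 12)
target_margin = 0.05             # desired margin of error, assumed to be in kg

# Solve z * sd / sqrt(n) <= target_margin for n and round up
n_required = ceil((z * sd / target_margin) ** 2)
print(f"Required sample size: {n_required}")
```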

Exercise 13. To estimate the percentage of people with more than two cardiovascular risk
factors in a given population, a random sample of 527 individuals was chosen. Of these
13.9% had more than two risk factors.

(a) Estimate a 95% confidence interval of the percentage of people with more than two risk
factors in this community.
(b) What sample size is needed to estimate the percentage of people with more than two
risk factors with an accuracy of 2% and 95% confidence? (Note: accuracy refers to the
distance from the center to the limits of the interval).

Exercise 14. A metabolic variable has been observed in two groups of individuals. In group
1, with a sample size equal to 150 the mean was X̄1 = 72.98 and the standard deviation
s1 = 6.86. In group 2, with a sample size equal to 179 the mean was X̄2 = 89.6 and the
standard deviation s2 = 7.09.

(a) Estimate a 95% confidence interval of the difference of means, µ1 − µ2, in the population.
(b) How do you interpret the confidence interval?
a) There are statistically significant differences between the group means, because the
confidence interval includes the value 0.
b) There are statistically significant differences between the group means, because the
confidence interval does not include the value 0.
c) A larger sample size would be necessary before drawing conclusions.
d) The difference between the group means is not statistically significant, because the
confidence interval includes the value 0.
(c) If you want to estimate the mean of group 2 with a precision of ± 1.1 units and a
confidence of 95%, what should the sample size be?
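
Part (a) can be sketched as follows, assuming two independent samples, the unpooled standard error, and the normal quantile (reasonable here given the sample sizes). A pooled-variance version would give a very similar interval. `diff_means_ci` is a hypothetical helper name.

```python
from math import sqrt
from statistics import NormalDist

def diff_means_ci(m1, s1, n1, m2, s2, n2, conf=0.95):
    """CI for mu1 - mu2 using the unpooled standard error."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    se = sqrt(s1**2 / n1 + s2**2 / n2)
    diff = m1 - m2
    return diff, diff - z * se, diff + z * se

# Group 1: n=150, mean 72.98, sd 6.86; group 2: n=179, mean 89.6, sd 7.09
diff, lower, upper = diff_means_ci(72.98, 6.86, 150, 89.6, 7.09, 179)
print(f"Difference: {diff:.2f}, 95% CI: ({lower:.2f}, {upper:.2f})")
```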

Exercise 15. A randomized controlled clinical study evaluated the effect of cognitive
behavioural therapy (CBT) on chronic fatigue syndrome. To do this, 457 patients were included
in the CBT group and 454 in the control group. One year later, the following scores were
obtained on a scale of assessment of fatigue (a higher score indicates more fatigue):
mean = 57.73, sd = 28.46 in the treated group and mean = 55.41, sd = 32.76 in the control group.

(a) Estimate a 95% confidence interval of the difference of means, µ1 − µ2, in the population.

(b) The interval above shows that, with confidence 95%:
a) The study design does not make it possible to assess whether cognitive behavioural
therapy reduces chronic fatigue.
b) Cognitive behavioural therapy has a statistically significant effect in relieving chronic
fatigue.
c) Cognitive behavioural therapy does not seem to relieve chronic fatigue.
d) Cognitive behavioural therapy treatment is detrimental for chronic fatigue.

Exercise 16. The hippocampus is a part of the brain that decreases with aging. This
reduction involves risk of memory loss and dementia. A study on the effect of aerobic
exercise in the prevention of memory loss randomly allocated 168 older individuals to two
groups. Group 1 (n = 84) was assigned moderate or intense aerobic exercise three or more
times a week (aerobics). Group 2 (n = 84), which acted as the control group, was assigned
stretching and toning exercises (stretching). According to the authors the two groups were
similar at the time of randomization. Table 1 shows the characteristics of the participants
when the study began:

Table 1: Participants’ characteristics at study entry

                                          Aerobic       Control       Difference (95% CI)
n                                         84            84
Age, mean (sd)                            68.5 (4.82)   67.5 (5.46)
Gender (% women)                          65            69
Improvement in physical performance (%)   9             1             0.08 (0.015, 0.145)

Figure 1: Evolution of the hippocampus volume

Black lines indicate group 1 (aerobic) and gray lines indicate group 2 (stretching). Solid lines indicate the right side of
the hippocampus and the dotted lines the left side.

(a) Estimate a 95% confidence interval of the mean difference of age between the aerobic
and the control groups.
(b) According to the confidence interval obtained in the previous question, the correct
answer is:
a) There are no statistically significant differences between the groups' ages.
b) Both groups are comparable with respect to age because it is a randomized experimental
study.
c) There are statistically significant differences between the groups' ages, and this
indicates that the random allocation has been manipulated.
d) There are statistically significant differences between the groups' ages.

Figure 2: Scatter plot of the specified variables between the start and the end of the study

(c) Table 1 shows the percentage of women in each group. Estimate a 95% confidence
interval of the difference of proportions of women in group 1 (aerobic) with respect to
group 2 (stretching).
(d) According to the previous confidence interval:
a) Since the confidence interval contains the value 1, there are no statistically significant
differences between the proportions of women in the studied groups.
b) Since the confidence interval does not contain the value 1, there are statistically
significant differences between the proportions of women in the studied groups.
c) Since the confidence interval contains the value 0, there are no statistically significant
differences between the proportions of women in the studied groups.
d) Since the confidence interval does not contain the value 0, there are statistically
significant differences between the proportions of women in the studied groups.
(e) The last row of Table 1 shows the percentage of participants that improved their physical
performance in each group. Is the difference between these percentages statistically
significant? (Answer 1 if yes or 0 if no)
(f) When volume changes in the anterior hippocampus between both study groups were
compared, p-values were < 0.001 for both the right and left sides. In contrast, when
volume changes in the posterior hippocampus were compared p-values were >0.1 (also
for both sides). Figure 1 shows the evolution of the hippocampus volume.
Which of the following answers do you think is the most appropriate?
a) Statistically significant differences were observed in the volume evolution of the
anterior hippocampus, but not in the posterior one.
b) Statistically significant differences were observed in the volume evolution of both the
anterior and the posterior hippocampus between the two study groups.
c) There is not enough information to decide whether the differences were statistically
significant.
d) Statistically significant differences were observed in the volume evolution of the
posterior hippocampus, but not in the anterior one.
(g) The scatter plot (Figure 2) shows the change in the hippocampus volume (X axis) and
the change in the maximum volume of oxygen consumed after practising intense exercise
(Y axis), between the start and the end of the study. This plot includes only individuals
in group 1. Which of the following answers do you think is the most appropriate?
a) The correlation coefficient, r, and its p-value p < 0.001 indicate that there is a direct
linear association between changes in the left hippocampus volume and changes in the
maximum aspirated volume.
b) The correlation coefficient, r, and its p-value p < 0.001 indicate that there is no
association between changes in the left hippocampus volume and changes in the maximum
aspirated volume.
c) It is not possible to conclude that there is linear association between the two variables
if we do not perform a regression analysis.
d) The correlation coefficient, r, and its p-value p < 0.001 indicate that there is an
inverse linear association between changes in the left hippocampus volume and changes
in the maximum aspirated volume.
