
Name:_________________________________ Date:_________________ Score:_____________

Biostatistics Final Examination


Part 1

Set A. Read the questions carefully and choose the best answer. Write the letter only.
1. The p-value of a statistical test, denoted by p, is interpreted as follows:
A. the null hypothesis H0 is accepted if p <0.05
B. the null hypothesis H0 is rejected if p <0.05
C. the null hypothesis H0 is rejected if p > 0.05
D. the alternative hypothesis H1 is rejected if p > 0.05
2. A malignant cancer is recorded in stages using the symbols 0, I, II, III, IV. We say that the scale
used is:
A. Alphanumeric B. Numerical
C. Ordinal D. Nominal
3. In a contingency table that shows data from a clinical trial, it is good to have high values for:
A. sick subjects, diagnosed as negative
B. sick subjects, diagnosed as positive
C. healthy subjects, not yet diagnosed
D. healthy subjects, diagnosed as positive
4. Pearson correlation coefficient, denoted by r, measures:
A. The scattering strength of data for a statistical series, and its potential for predictive testing
B. The strength of the correlation between the mean and median
C. The relationship between two parameters, either categorical or numerical
D. The tendency of simultaneous increase or decrease, or inverse evolution, for two numerical
parameters
5. The following statement is true for a Histogram chart:
A. Each bar is of the same width
B. The height of the bars is approximately the number of individuals in the class
C. The width of the bars is obtained by dividing the difference between the maximum
and the minimum values in the series by the desired number of classes
D. Used for comparison of discrete variables and in presenting categorical data
6. In defining the Standard deviation, it would be incorrect to say that it:
A. is the square root of variance
B. is measured using the unit of the variable
C. is measured using the squared unit of the variable
D. has values generally comparable with the average value
7. A Frequency Polygon is defined as the following, except:
A. A statistical indicator that shows the distribution of a series of values
B. A graph that represents, by a broken line, the absolute frequencies of the classes of a data series
C. A graph that contains exactly the same information as the corresponding histogram
D. A statistical indicator presenting the occurrence
8. Each of the following describes a regression line, except one. A regression line is a straight line which:
A. is located as close as possible to all the points of a scatter chart
B. is defined by an equation having 2 parameters: the slope and the intercept
C. provides an approximate relationship between the values of two parameters
D. is parallel to one of the coordinate axes
9. In a hypothesis test with hypotheses H0: μ ≥ 37 and H1: μ < 37, a random sample of 67 elements
selected from the population produced a mean of 35.3. Assuming that σ = 8.9, what can be
assumed?
A. one-tailed test to the left B. a p-value can be computed
C. the null is incorrect D. none of these
10. After performing a t - test for comparison of means, we obtain p = 0.0256, then:
A. We reject H0 and accept H1 B. We accept H0
C. We reject H0 D. We cannot decide
11. Chi square is zero when:
A. observed frequency is positive and expected frequency is negative
B. expected frequency is less than the observed frequency
C. observed frequency is double a negative expected frequency
D. expected frequency is equal to the observed frequency
12. In describing a Relative risk, one of the following is correct:
A. Shows the relationship between a factor assumed to influence the disease, and the disease
B. It is the ratio of morbidity for those exposed and those in the prime group of interest
C. Cannot be greater than 1
D. is expressed as a percentage in relation to 10^n
13. A Gaussian curve of a normal distribution, has the following features, except:
A. in the interval [μ – 1σ; μ + 1σ] about 2/3 (~ 68%) of the series’ values are located
B. in the interval [μ – 2σ; μ + 2σ] about 95% of the series’ values are located
C. in the interval [μ – 3σ; μ + 3σ] about 99% of the series’ values are located
D. in the interval [μ – 1σ; μ + 1σ] about 50% of the series’ values are located
14. The statistical test that can be used to validate the statement "people with high cholesterol
suffer more from hypertension" is:
A. ANOVA B. Pearson r
C. Linear regression D. T-test
15. In a series of values, the first quartile is:
A. The value in the ordered series located at the first half of the number of values in the series
B. Any quarter of the values in the ordered series
C. The numeric value for which a quarter of the series’ values are lower
D. The numeric value for which a quarter of the series’ values are higher
16. For a clinical trial, the Sensitivity is 0.662 and Specificity is 0.993. This means that:
A. The test is a valuable test because both indicators are more than 50%
B. The test is a worthless test, since it gives errors when detecting both sick and healthy subjects
C. The test is not reliable, because the sensitivity is too low
D. A passable test because of the high specificity
17. If, in a group of 600 patients, we calculated an Odds Ratio of 17.0 for a risk factor, the
possibility of developing the disease being investigated is:
A. very high when exposed to the factor
B. very small when exposed to the factor
C. the same in the case of exposure as in the case of non-exposure
D. lower in the exposed than in the unexposed, OR being less than 100
18. When comparing two means, the null hypothesis shall be interpreted as follows, except:
A. Data do not support the Hypothesis that the populations' means are different
B. The compared values are manifesting variation
C. The two sampling averages do not differ significantly
D. The two populations, from which the compared values were sampled, do not differ
19. In a given data set, the average of a series of values is 30 and their variance is 9. The
coefficient of variation (= the ratio standard deviation / average) is:
A. 30% B. 20%
C. 90% D. 10%
20. Comparing two sets of data values, we may observe a distribution or scatter as follows, except:
A. For equal average values, the one with a higher SD is more scattered
B. For approximately equal SD values, the one with a higher average is more scattered
C. For equal SD values, the one with a lower average is more scattered
D. If both the averages and SD differ much between the series, we can compare scattering using
the coefficient of variation
21. The correlation coefficient computed for two parameters is r = 0.835. This means that:
A. The parameters are directly correlated, and the link is weak - r is positive and close to 1
B. The parameters are inversely correlated, and the link is strong - r is negative and close to 1
C. The parameters are directly correlated, and the link is strong - r is positive and close to 1
D. The coefficient’s value is equivocal, and the number of values should be considered
22. In a series of numerical values, the median is:
A. A value for which half of the values are higher and half of the values are lower
B. The value located exactly midway between the minimum and maximum of the series
C. The most commonly encountered values among the series
D. A measure of the eccentricity of the series
23. Considering the Sensitivity of a clinical trial:
A. It is the ratio of sick patients, diagnosed as positive, and the total number of sick patients.
B. It is the ratio of healthy subjects diagnosed as negative, and the number of healthy subjects
C. It is the ratio of sick patients, diagnosed as negative, and the total number of patients
D. It is the ratio of sick patients, diagnosed as negative, and the total number of healthy persons
24. If, in a group of 580 patients, we calculated a Relative Risk of 15.0, the possibility of
developing the disease being investigated is:
A. very high when exposed to the factor
B. very small when exposed to the factor
C. the same in the case of exposure as in the case of non-exposure
D. lower in the exposed than in the unexposed, RR being less than 100
25. The following describe the average of a series of numerical values, except:
A. The sum of the values divided by their number
B. Lower than the minimum value in the series
C. Lower than the maximum value in the series
D. An indicator of central tendency for the values of the series
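
Worked check for item 9 (not part of the original paper). The numbers in item 9 are enough to compute a test statistic. The sketch below assumes a one-sample z-test is intended, since σ is given as known; the variable names are illustrative only.

```python
import math
from statistics import NormalDist

# Values taken from Set A, item 9.
mu0 = 37.0      # boundary value of H0: mu >= 37
x_bar = 35.3    # sample mean
sigma = 8.9     # population standard deviation (stated as known)
n = 67          # sample size

# Standard error of the mean and the z statistic.
standard_error = sigma / math.sqrt(n)
z = (x_bar - mu0) / standard_error

# H1: mu < 37 places the rejection region in the left tail,
# so the p-value is the area under the standard normal curve to the left of z.
p_value = NormalDist().cdf(z)

print(f"z = {z:.3f}, left-tailed p-value = {p_value:.4f}")
```

Whether H0 is rejected still depends on the significance level, which item 9 does not state.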

Set B. Do not explain. Answer directly what is being asked in each scenario. State the hypotheses in
words or in symbols, as directed. A single sentence will suffice for the conclusion / decision.
1. A pharmaceutical company is advancing a medical trial to test whether or not a new
medicine (cholestinor hydrolipase) reduces cholesterol by 40%. There are already 17
respondents who signed the consent and gave their approval.
A. How should a researcher make a claim in a statement?
B. Construct the hypotheses appropriate for this study.
C. What statistical formula can be used to compute the hypothesis test?
2. Jonathan is a fisherman who is interested in whether the distribution of fish caught in
Laguna Lake is the same as the distribution of fish caught in Taal Lake. Of the 191 randomly
selected fish caught in Laguna Lake, 105 were rainbow trout, 27 were milkfish, 35 were bass,
and 24 were catfish. Of the 293 randomly selected fish caught in Taal Lake, 115 were
rainbow trout, 58 were milkfish, 67 were bass, and 53 were catfish.
A. If the distributions of fish in these two lakes are compared, what are the hypotheses?
B. What statistical tool will you use to compute this?
C. If the computed p-value = 0.0083 at an alpha of 1%, what is your conclusion?
3. A Fertility Laboratory claims that their procedures improve the chances of a boy being born.
The results for a test of a single population proportion are as follows:
H0: p = 0.50; Ha: p > 0.50; α = 0.01; p-value = 0.0025
A. State the null and alternative hypotheses in words.
B. What study should be conducted to prove this?
C. Interpret the results and state a conclusion.
4. Of some randomly selected community college statistics students, 43 used a computer, 102 used a
calculator with built-in statistics functions, and 65 used a table from the textbook. Of some
randomly selected university statistics students, 28 used a computer, 33 used a calculator with
built-in statistics functions, and 40 used a table from the textbook. Is there a difference between
the distribution of community college statistics students and the distribution of university
statistics students in what technology they use on their homework? Conduct an appropriate
hypothesis test using a 0.05 level of significance.
A. State the hypotheses in words and symbols.
B. What statistical tool should be used to test the mean?
C. Tabulate the given.
D. If the p-value = 0.0294, what is the conclusion?
5. It is believed that the mean height of high school students who play basketball on the school
team is 80 inches with a standard deviation of 1.8 inches. A random sample of 40 players is
chosen. The sample mean was 69 inches, and the sample standard deviation was 1.5 inches.
A. State the null and alternative hypotheses in words and symbols.
B. What is the most appropriate statistical tool to be used in hypothesis testing?
C. If the p value is almost zero, make a conclusion.
6. The National Institute of Mental Health published an article stating that in any one-year
period, approximately 12% of Filipino adults suffer from depression or a depressive illness.
Suppose that in a survey of 85 people in a certain city, 8 of them suffered from depression
or a depressive illness. Conduct a hypothesis test comparing the percentage of people in
that city suffering from a depressive illness with the percentage in the general adult Filipino
population.
A. Is this a test of mean or proportion?
B. State the null and alternative hypotheses.
C. Is this a right-tailed, left-tailed, or two-tailed test?
D. What is the prevalence of persons with depressive illness in that city per 100 people?
7. Joselito is an eight-year-old boy who established a mean time of 15.2 seconds for swimming
the 25-yard freestyle, with a standard deviation of 0.8 seconds. His dad, Frank, thought that
Joselito could swim the 25-yard freestyle faster using goggles. Frank bought Joselito a new
pair of expensive goggles and timed Joselito for 15 swims in the 25-yard freestyle. For the 15
swims, Joselito’s mean time was 14.1 seconds. Frank thought that the goggles helped
Joselito swim faster. Conduct a hypothesis test using a preset α = 0.05.
A. Set up the Hypothesis statements.
B. What are the dependent and independent variables?
C. What parameter or tool will you use for hypothesis testing?
D. If the p-value = 0.0187, what is your decision / conclusion?
8. A teacher wants to test if it takes less than 40 minutes to teach a lesson plan in Biological
sciences. He taught Botany, Physics, and Chemistry in 4 classes per subject and timed each lesson.
A. What simple and direct parameter may be used to prove the assumption?
B. What are the dependent and independent variables?
C. If hypothesis testing is to be performed, how should you state the hypotheses
comparing the variables?
D. Based on these hypotheses, what statistical tool is most appropriate?
9. The mean speed of your cable Internet connection is more than 5 megabits per second (Mbps) on
the 4th floor of the condominium and only 2 Mbps on the 8th floor. The variation is 0.5 Mbps.
A. You want to know if the internet speed is affected by the floor from which you connect in
that building. What statistical tool is appropriate to test this?
B. What variables are involved? Identify what kind of variables these are.
C. The 15th floor or roof deck is off limits. What formula will you use to determine the
internet speed on that floor?
D. How do the results for the variables affect the assumed internet speed on the 15th floor?
10. A study was conducted to investigate the effectiveness of a tranquilizer in reducing pain. A
lower score indicates less pain. The “before” value is matched to an “after” value and the
differences are calculated. The scoring decreases as the pain felt decreases.
Subject   A      B      C      D      E      F      G      H
Before    6.6    6.5    9.0    10.3   11.3   8.1    6.3    11.6
After     6.8    2.4    7.4    8.5    8.1    6.1    3.4    2.0
A. What statistical tool is appropriate if hypothesis testing is performed?
B. What kind of research design is this?
C. What kind of variable is presented in the data?
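
Worked check for item 10 (not part of the original paper). Item 10 says each "before" value is matched to an "after" value and the differences are calculated. The sketch below follows that description and computes a paired t statistic on the differences. Treating this as a paired (dependent-samples) t-test is an assumption, and the variable names are illustrative.

```python
import math
from statistics import mean, stdev

# Pain scores for subjects A to H, copied from Set B, item 10.
before = [6.6, 6.5, 9.0, 10.3, 11.3, 8.1, 6.3, 11.6]
after = [6.8, 2.4, 7.4, 8.5, 8.1, 6.1, 3.4, 2.0]

# Matched differences (after minus before); negative means less pain afterwards.
differences = [a - b for a, b in zip(after, before)]

n = len(differences)
mean_diff = mean(differences)
sd_diff = stdev(differences)          # sample standard deviation (n - 1 denominator)

# Paired t statistic: mean difference divided by its standard error.
t_stat = mean_diff / (sd_diff / math.sqrt(n))
df = n - 1

print(f"mean difference = {mean_diff:.3f}, sd = {sd_diff:.3f}")
print(f"t = {t_stat:.3f} with {df} degrees of freedom")
```

The standard library has no t-distribution, so the p-value would come from a t table or a statistics package.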

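Worked check for item 2 (not part of the original paper). The fish counts in item 2 can be arranged as a 2 x 4 table and compared with the p-value quoted in part C. The sketch below assumes a chi-square test of homogeneity is intended. The p-value line uses a closed form that holds only for 3 degrees of freedom, which is the case for this table.

```python
import math

# Observed counts from Set B, item 2.
# Column order: rainbow trout, milkfish, bass, catfish.
observed = {
    "Laguna Lake": [105, 27, 35, 24],
    "Taal Lake": [115, 58, 67, 53],
}

rows = list(observed.values())
row_totals = [sum(row) for row in rows]
col_totals = [sum(col) for col in zip(*rows)]
grand_total = sum(row_totals)

# Sum (observed - expected)^2 / expected over every cell, where
# expected = row total * column total / grand total.
chi_square = 0.0
for row, row_total in zip(rows, row_totals):
    for obs, col_total in zip(row, col_totals):
        expected = row_total * col_total / grand_total
        chi_square += (obs - expected) ** 2 / expected

df = (len(rows) - 1) * (len(col_totals) - 1)

# Upper-tail probability of the chi-square distribution.
# This closed form is valid for df = 3 only.
p_value = (math.erfc(math.sqrt(chi_square / 2))
           + math.sqrt(2 * chi_square / math.pi) * math.exp(-chi_square / 2))

print(f"chi-square = {chi_square:.2f}, df = {df}, p-value = {p_value:.4f}")
```
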
Set C. Write the appropriate study design for each of the objectives. There may be more than 1 answer
per number.

1. Disease prevalence
2. Causation / Etiology / Harm
3. Comparing interventions
4. Disease incidence
5. Operating characteristics of a diagnostic test
6. Disease description or spectrum
7. Prognosis
8. Summarizing literature
9. Drug efficacy

Prepared by:

Prof. Sherwin B. Toriano
