Summary 2022 2023
and Health Sciences
10216235
First Semester 2022-2023
Elementary Statistics: A Step by Step Approach, Bluman, 7th Edition 2022-2023
BIOSTATISTICS
Statistics is used in almost all fields of human endeavor. In sports, for example, a statistician may keep
records of the number of yards a running back gains during a football game, or the number of hits a baseball
player gets in a season. In other areas, such as public health, an administrator might be concerned with the
number of residents who contract a new strain of flu virus during a certain year. In education, a researcher
might want to know if new methods of teaching are better than old ones. These are only a few
examples of how statistics can be used in various occupations.
Statistics is the science of conducting studies to collect, organize, summarize, analyze, and draw conclusions
from data.
Biostatistics: The branch of statistics responsible for the proper interpretation of scientific data generated
in biology, public health, and other health sciences (i.e., the biomedical sciences). In these sciences,
subjects are patients, mice, cells, etc.
Branches of Statistics
Descriptive statistics consists of the collection, organization, summarization, and presentation of data.
Inferential statistics consists of generalizing from samples to populations, performing estimations and
hypothesis tests, determining relationships among variables, and making predictions.
Basic Definitions
A population consists of all subjects (human or otherwise) that are being studied.
A sample is a group of subjects selected from a population. If the subjects of a sample are properly selected,
most of the time they should possess the same or similar characteristics as the subjects in the population.
Data are the values (measurements or observations) that the variables can assume.
A collection of data values forms a data set. Each value in the data set is called a data value or a datum.
When data are collected from every subject in the population, it is called a census.
An-Najah National University CH 1 – Page 1
Qualitative variables are variables that have distinct categories according to some characteristic or
attribute.
For example, if subjects are classified according to gender (male or female), then the
variable gender is qualitative. Other examples of qualitative variables are religious preference and
geographic locations.
Quantitative variables are numerical and can be ordered or ranked.
For example, the variable age is numerical, and people can be ranked in order according to the value of their
ages. Other examples of quantitative variables are heights, weights, and body temperatures.
Example 1:
a. Sizes of soft drinks sold by a fast-food restaurant (small, medium, and large)
c. Microwave wattage
d. Number of degrees awarded by a college each year for the last 10 years
e. Ratings of teachers
Quantitative variables can be further classified into two groups: discrete and continuous.
Discrete variables can be assigned values such as 0, 1, 2, 3 and are said to be countable.
Examples of discrete variables are the number of children in a family, the number of students in a classroom,
and the number of calls received by a call center each day for a month. Discrete variables assume values that
can be counted.
Continuous variables, by comparison, can assume an infinite number of values in an interval between any
two specific values.
Temperature, for example, is a continuous variable, since the variable can assume an infinite number of
values between any two given temperatures.
Continuous variables can assume an infinite number of values between any two specific values. They are
obtained by measuring. They often include fractions and decimals.
Example 2:
b. Microwave wattage
c. Number of degrees awarded by a college each year for the last 10 years
Level of Measurement
In addition to being classified as qualitative or quantitative, variables can be classified by how they are
categorized, counted, or measured. There are 4 levels:
The nominal level of measurement classifies data into mutually exclusive (non-overlapping) categories in
which no order or ranking can be imposed on the data.
The ordinal level of measurement classifies data into categories that can be ranked; however, precise
differences between the ranks do not exist.
The interval level of measurement ranks data, and precise differences between units of measure do exist;
however, there is no meaningful zero.
The ratio level of measurement possesses all the characteristics of interval measurement, and there exists a
true zero. In addition, true ratios exist when the same variable is measured on two different members of the
population.
Example 3:
b. Blood types—O, A, B, AB
d. Ratings of teachers
e. Number of degrees awarded by a college each year for the last 10 years
g. Sizes of soft drinks sold by a fast-food restaurant (small, medium, and large)
Sampling Techniques
A random sample is a sample in which all members of the population have an equal chance of being
selected.
A systematic sample is a sample obtained by selecting every 𝑘th member of the population where 𝑘 is a
counting number.
A stratified sample is a sample obtained by dividing the population into subgroups or strata according to
some characteristic relevant to the study. (There can be several subgroups.) Then subjects are selected at
random from each subgroup.
A cluster sample is obtained by dividing the population into sections or clusters and then selecting one or
more clusters at random and using all members in the cluster(s) as the members of the sample.
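The four sampling techniques defined above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the population of subjects numbered 1 to 100, the odd/even strata, and the five "ward" clusters are hypothetical, and the random starting point in the systematic sample is a common practice that the definition above does not spell out.

```python
import random

def simple_random_sample(population, n, rng):
    """Every member has the same chance of being selected."""
    return rng.sample(population, n)

def systematic_sample(population, k, rng):
    """Pick a random start among the first k members, then take every k-th member."""
    start = rng.randrange(k)  # assumption: random start, not stated in the notes
    return population[start::k]

def stratified_sample(population, stratum_of, n_per_stratum, rng):
    """Split the population into strata, then sample at random within each stratum."""
    strata = {}
    for member in population:
        strata.setdefault(stratum_of(member), []).append(member)
    sample = []
    for members in strata.values():
        sample.extend(rng.sample(members, n_per_stratum))
    return sample

def cluster_sample(clusters, n_clusters, rng):
    """Pick whole clusters at random and keep every member of the chosen clusters."""
    chosen = rng.sample(clusters, n_clusters)
    return [member for cluster in chosen for member in cluster]

rng = random.Random(0)            # fixed seed so the sketch is repeatable
subjects = list(range(1, 101))    # hypothetical population: subjects numbered 1..100

random_part = simple_random_sample(subjects, 10, rng)
systematic_part = systematic_sample(subjects, 10, rng)
stratified_part = stratified_sample(subjects, lambda s: s % 2, 5, rng)  # strata: odd / even numbers
wards = [subjects[i:i + 20] for i in range(0, 100, 20)]                 # five hypothetical clusters of 20
cluster_part = cluster_sample(wards, 1, rng)
```

Note the difference between the last two: the stratified sample takes a few subjects from every subgroup, while the cluster sample takes every subject from a few subgroups.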
Example 4:
a. Out of 10 hospitals in a municipality, a researcher selects one and collects records for a 24-hour
period on the types of emergencies that were treated there.
b. A researcher divides a group of students according to gender, major field, and low, average, and high
grade point average. Then she randomly selects six students from each group to answer questions in
a survey.
c. The subscribers to a magazine are numbered. Then a sample of these people is selected using
random numbers.
d. Every 10th bottle of Energized Soda is selected, and the amount of liquid in the bottle is measured.
The purpose is to see if the machines that fill the bottles are working properly.
BIOSTATISTICS
After organizing the data, the researcher must present them so they can be understood by those who will
benefit from reading the study. The most useful method of presenting the data is by constructing statistical
charts and graphs. There are many different types of charts and graphs, and each one has a specific purpose.
Example 1:
Twenty-five patients were given a blood test to determine their blood type. The data set is
A B B AB O O O B AB B B B O
A O A O O O AB AB A O B A
[Figure: bar chart of frequency by blood type (A, AB, B, O).]
The angle of each section in the above pie graph is evaluated as follows:
Notice that the sum of all angles is 72° + 100.8° + 57.6° + 129.6° = 360°
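The angles quoted above can be checked with a short Python sketch. It assumes the usual rule that each section's angle is the class frequency divided by the total, times 360°; the variable names are illustrative.

```python
from collections import Counter

blood_types = (
    "A B B AB O O O B AB B B B O "
    "A O A O O O AB AB A O B A"
).split()

counts = Counter(blood_types)   # frequency of each blood type
n = len(blood_types)            # 25 patients

# Angle of each pie section = (class frequency / total) * 360 degrees
angles = {t: counts[t] * 360 / n for t in ("A", "B", "AB", "O")}

for blood_type, angle in angles.items():
    print(f"{blood_type:>2}: f = {counts[blood_type]}, angle = {angle:.1f} degrees")
```

The frequencies come out as A = 5, B = 7, AB = 4, and O = 9, which gives the four angles 72°, 100.8°, 57.6°, and 129.6° listed in the sum above.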
Example 2:
19 18 22 20 20 21 23 20 19 20 22 21 20
19 19 21 19 20 18 20 22 19 20 21 20
[Figure: Histogram of Age, frequency against age (18 to 23).]
[Figure: frequency against ages, with the horizontal axis running from 17 to 24.]
When the range of the data is large, the data must be grouped into classes that are more than one unit in
width, in what is called a grouped frequency distribution.
Example 3:
The data shown here represent the number of grams per serving of 30 randomly selected brands of cakes.
32 47 51 41 46 30 46 38 34 34 52 48 48 38 43
41 21 24 25 29 33 45 51 32 32 27 23 23 34 35
These data can be grouped into 5 classes as shown in the following table.
Notice that:
The numbers 21, 28, 35, 42, and 49 are called the lower limits.
The numbers 27, 34, 41, 48, and 55 are called the upper limits.
The midpoint of each class is the average of the lower and upper limits.
The difference between any two successive lower limits is the same and it equals the difference
between any two successive upper limits and any two successive midpoints. In the above example
this difference equals 7.
The first class contains the lowest data value and the final class contains the highest data value.
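As a sketch of how the grouped frequency distribution can be tallied, the following Python snippet counts the data values of Example 3 in each class, using the lower and upper limits listed above. The layout of the printed table is an assumption, not a copy of the original table.

```python
grams = [32, 47, 51, 41, 46, 30, 46, 38, 34, 34, 52, 48, 48, 38, 43,
         41, 21, 24, 25, 29, 33, 45, 51, 32, 32, 27, 23, 23, 34, 35]

lower_limits = [21, 28, 35, 42, 49]
upper_limits = [27, 34, 41, 48, 55]

table = []
for lower, upper in zip(lower_limits, upper_limits):
    frequency = sum(lower <= value <= upper for value in grams)  # values inside the class
    midpoint = (lower + upper) / 2                               # average of the two limits
    table.append((lower, upper, midpoint, frequency))

print("Class limits  Midpoint  Frequency")
for lower, upper, midpoint, frequency in table:
    print(f"{lower:>5} - {upper:<5} {midpoint:>8.0f} {frequency:>10}")
```

The midpoints 24, 31, 38, 45, and 52 differ by the class width 7, in agreement with the remarks above, and the frequencies add up to the 30 brands.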
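[Figure: frequency against number of grams, with the class midpoints 24, 31, 38, 45, and 52 on the horizontal axis.]
[Figure: frequency against number of grams, with the horizontal axis running from 17 to 59.]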
Example 4:
Given the following frequency distribution for the scores of health care quality for selected hospitals.
PROBLEMS
Class limits 0.6 – 1.0 1.1 – 1.5 1.6 – 2.0 2.1 – 2.5 2.6 – 3.0 3.1 – 3.5 3.6 – 4.0
Frequency 3 4 7 8 5 2 1
2. Suppose that weights of shipments to the nearest pound are grouped as follows:
Class limits 1 – 75 76 – 150 151 – 225 226 – 300 301 – 375 376 – 450 451 – 525 526 – 600
Frequency 12 21 33 42 19 14 6 5
a. 75 pounds?
b. Less than 200 pounds?
c. 300 pounds or less?
d. Less than 300 pounds?
e. More than 450 pounds?
f. 450 pounds or more?
3. The following are numbers of courses taken by 20 randomly selected college students:
3 4 3 5 1 4 6 3 4 1
3 5 2 2 4 2 3 5 4 4
a. Construct a frequency table and show the relative and cumulative frequencies.
b. Construct a bar chart.
4. Weight category and smoking status were observed for 65 randomly selected adolescents, and the following data
were obtained:
# W S # W S # W S # W S # W S
1 A Yes 14 A No 27 A No 40 A No 53 U Yes
2 A Yes 15 A No 28 O No 41 O Yes 54 A Yes
3 A No 16 O No 29 A No 42 A Yes 55 A No
4 A No 17 O Yes 30 A No 43 A No 56 A No
5 A No 18 A No 31 A Yes 44 A No 57 A Yes
6 A No 19 A No 32 A No 45 A No 58 U No
7 A No 20 A No 33 A No 46 O No 59 A No
8 O No 21 U No 34 A No 47 A No 60 A No
9 U No 22 O No 35 A No 48 A No 61 A No
10 A Yes 23 U No 36 A Yes 49 A No 62 A No
11 O No 24 A No 37 A No 50 A No 63 A No
12 A No 25 O Yes 38 A No 51 A No 64 U Yes
13 A No 26 A Yes 39 A No 52 A Yes 65 A No
*W: weight, S: smoking, U: underweight, O: overweight, A: appropriate weight
Smoking
Yes No
b. Construct a pie chart to show the proportions of weight categories among adolescents who smoke.
c. Construct a pie chart to show the proportions of weight categories among adolescents who do not
smoke.
Answers
2. a. 0 – 12 b. 33 – 66 c. 108 d. 66 – 108 e. 11 f. 11 – 25
[Figure: bar chart of frequency by number of courses (1 to 6).]
Smoking
Yes No
Underweight 2 4
Overweight 3 6
Appropriate 10 40
b. The pie chart to show the proportions of weight categories among smoker adolescents is below.
[Figure: pie chart with categories Appropriate, Overweight, and Underweight.]
c. The pie chart to show the proportions of weight categories among nonsmoker adolescents is below.
[Figure: pie chart with categories Appropriate, Overweight, and Underweight.]
BIOSTATISTICS
Definition:
A statistic is a characteristic or measure obtained by using the data values from a sample.
A parameter is a characteristic or measure obtained by using all the data values from a specific population.
In statistics, Greek letters are used to denote parameters, and Roman letters are used to denote statistics.
For this chapter assume that the data are obtained from samples unless otherwise specified.
I. The Mean
The mean, also known as the arithmetic average, is found by adding the values of the data and
dividing by the total number of values. For example, the mean of 3, 2, 6, 5, and 4 is found by adding
3 + 2 + 6 + 5 + 4 = 20 and dividing by 5; hence, the mean of the data is 20 ÷ 5 = 4. The values of
the data are represented by $X$'s. In this data set, $X_1 = 3$, $X_2 = 2$, $X_3 = 6$, $X_4 = 5$, and $X_5 = 4$. To show
the sum of all the $X$ values, the symbol $\Sigma$ (the capital Greek letter sigma) is used, and $\sum X$ means to
find the sum of the $X$ values in the data set.
The mean is the sum of the values, divided by the total number of values. The symbol $\bar{x}$ represents the
sample mean.
$$\bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n} = \frac{\sum x}{n}$$
where $n$ represents the total number of values in the sample.
For a population, the Greek letter $\mu$ (mu) is used for the mean.
$$\mu = \frac{X_1 + X_2 + \cdots + X_N}{N} = \frac{\sum X}{N}$$
where $N$ represents the total number of values in the population.
Example 1:
The data show the number of patients in a sample of 6 hospitals who acquired an infection while
hospitalized. Find the mean.
110 76 29 38 105 31
Example 2:
Example 3:
The data value "1" in set (b) is an extreme value, and it strongly affects the mean.
The median is the midpoint of the data array. The symbol for the median is 𝑀𝐷
This means the median divides the data set into two equal parts, 50% of the data are below the median and
50% are above the median.
3. If $n$ is even, then the median is $MD = \frac{1}{2}\left(x_{(n/2)} + x_{(n/2+1)}\right)$
Example 4:
The number of children with asthma during a specific year in seven local districts is shown. Find the
median.
253 125 328 417 201 70 90
Example 5:
The data show the number of patients in a sample of 6 hospitals who acquired an infection while
hospitalized. Find the median.
110 76 29 38 105 31
To find the median:
1. Arrange the data → 29, 31, 38, 76, 105, 110
2. 𝑛 = 6, even number
3. $MD = \frac{1}{2}\left(x_{(6/2)} + x_{(6/2+1)}\right) = \frac{1}{2}\left(x_{(3)} + x_{(4)}\right) = \frac{1}{2}(38 + 76) = 57$
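The mean and median calculations can be sketched in Python as follows. The even-$n$ rule is the one used in Example 5; the odd-$n$ rule (take the single middle value of the ordered data) is the standard companion rule and is stated here as an assumption, since that step is not shown above. The function names are illustrative.

```python
def mean(values):
    """Sum of the values divided by the number of values."""
    return sum(values) / len(values)

def median(values):
    """Middle value of the ordered data; average of the two middle values when n is even."""
    ordered = sorted(values)
    n = len(ordered)
    if n % 2 == 1:
        return ordered[n // 2]                           # x_((n+1)/2), with 0-based indexing
    return (ordered[n // 2 - 1] + ordered[n // 2]) / 2   # (x_(n/2) + x_(n/2+1)) / 2

infections = [110, 76, 29, 38, 105, 31]       # hospital data from Examples 1 and 5
asthma = [253, 125, 328, 417, 201, 70, 90]    # district data from Example 4

print(round(mean(infections), 2))   # 64.83
print(median(infections))           # 57.0
print(median(asthma))               # 201
```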
Example 6:
The value that occurs most often in a data set is called the mode.
A data set that has only one value that occurs with the greatest frequency is said to be unimodal. If a data set
has two values that occur with the same greatest frequency, both values are considered to be the mode and
the data set is said to be bimodal. When no data value occurs more than the others, the data set is said to
have no mode.
Example 7:
b) 18.0, 14.0, 34.5, 10.0, 11.3, 10.0, 12.4, 10.0, 12.4, 11.3, 12.4
Since 10.0 and 12.4 each occur 3 times, more often than any other value, there are two
modes, 10.0 and 12.4. The data set is bimodal.
c) 18.0, 14.0, 34.5, 10.0, 11.3, 10.0, 12.4, 10.0, 12.4, 11.3, 12.4, 11.3
Since 10.0, 12.4, and 11.3 each occur 3 times, more often than any other value, there are
three modes, 10.0, 12.4, and 11.3. The data set is trimodal.
d) 18.0, 14.0, 34.5, 10.0, 11.3, 10.0, 12.4, 10.0, 12.4, 11.3, 12.4, 11.3, 14.0, 14.0
A data set can have at most three modes; when more than three values share the greatest frequency, we say there is
no mode. Here 10.0, 11.3, 12.4, and 14.0 each occur 3 times, so the above data have no mode.
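A minimal Python sketch of the mode rules used in Example 7 is given below. It follows the convention of these notes (no mode when all values occur equally often, or when more than three values tie for the greatest frequency); the function name and the `max_modes` parameter are illustrative, and other textbooks may not cap the number of modes.

```python
from collections import Counter

def modes(values, max_modes=3):
    """Return the sorted list of modes, or [] when the data set has no mode."""
    counts = Counter(values)
    top = max(counts.values())
    tied = sorted(value for value, count in counts.items() if count == top)
    all_equal = len(counts) > 1 and len(tied) == len(counts)  # no value occurs more than the others
    if all_equal or len(tied) > max_modes:
        return []
    return tied

set_b = [18.0, 14.0, 34.5, 10.0, 11.3, 10.0, 12.4, 10.0, 12.4, 11.3, 12.4]
set_c = set_b + [11.3]
set_d = set_c + [14.0, 14.0]

print(modes(set_b))   # [10.0, 12.4]        bimodal
print(modes(set_c))   # [10.0, 11.3, 12.4]  trimodal
print(modes(set_d))   # []                  four values tie, so no mode
```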
Example 8:
Since each value occurs as many times as any other value, there is no mode for data set (a) or data set
(b).
Note: Do not say that the mode is zero. That would be incorrect, because in some data zero can be an actual
value.
Example 9:
Age Frequency
18 2
19 6
Mode → 20 9 ← The largest frequency
21 4
22 3
23 1
The mode age is 20, because the age 20 occurs more than any other age.
Exercise 1:
The scores in a short quiz are distributed as given in the following table:
Score 1 2 3 4 5
Relative frequency 0.05 0.15 0.20 0.35 0.25
Exercise 2:
The scores in a short quiz are distributed as given in the following table:
Score 1 2 3 4 5
Cumulative frequency 3 8 18 24 30
Exercise 3:
Find the mean, median and mode for the following data set:
[Figure: bar chart of frequency against x, for x = 1 to 10; the bar heights are not recoverable.]
PROBLEMS
1. Determine the mean, median, and mode for each of the following samples:
a. 20, 22, 23, 26, 29, 30 Answer: $\bar{x} = 25$, $MD = 24.5$, no mode
b. 120, 122, 123, 126, 126, 130, 134 Answer: $\bar{x} = 125.86$, $MD = 126$, mode $= 126$
c. 40, 44, 44, 46, 52, 52, 58, 60, 61, 65, 70, 72 Answer: $\bar{x} = 55.33$, $MD = 55$, modes $= 44, 52$
2. Given the data below, find the mean, the median and the mode:
132, 117, 143, 114, 125, 133, 197, 134, 114, 143, 121, 108, 131, 109, 117, 116, 84, 102, 153, 116, 98,
122, 127, 113, 111, 65, 122, 114
Answer: $\bar{x} = 120.75$, $MD = 117$, mode $= 114$
3. Find the mean, median, and mode for the following ages:
Age 17 18 19 20 21 22 23
Frequency 2 4 8 11 14 7 3
4. Two groups of students were given a test and the following data are obtained:
5. A sample of adult males produced a mean weight of 154 pounds, a median weight of 160 pounds, and
a mode of 157 pounds. If the unit of measurement is converted from pound to kilogram find the
mean, the median and the mode in kilograms. (Hint: 1 kg = 2.205 pounds)
Answer: $\bar{x} = \frac{154}{2.205} = 69.84$ kg, $MD = 72.56$ kg, mode $= 71.20$ kg
6. The mean, the median, and the mode of a certain data set of size $n = 20$ are 123, 125, and 130,
respectively. Two new observations with values 120 and 148 are obtained. Find the new values of the
mean, the median, and the mode.
Answer: $\bar{x} = 124$, $MD = 125$, mode $= 130$, or $130, 120$, or $130, 148$, or $130, 120, 148$
7. Compute the mean, the median and the mode for each of the following:
[Figures a, b, and c: bar charts of frequency against score (16 to 24); the bar heights are not recoverable.]
Answers:
a. $\bar{x} = 20$, $MD = 20$, mode $= 20$
b. $\bar{x} = 19.24$, $MD = 19$, mode $= 18$
c. $\bar{x} = 19.84$, $MD = 20$, modes $= 18, 20$
Example 1:
A testing lab wishes to test two experimental brands of paint to see how long each will last before fading.
The testing lab makes 6 gallons of each brand to test. The results (in months) are shown.
Brand A 10 60 50 30 40 20
Brand B 35 45 30 35 40 25
The mean of brand A is $\mu = \frac{10 + 60 + 50 + 30 + 40 + 20}{6} = 35$ months.
The mean of brand B is $\mu = \frac{35 + 45 + 30 + 35 + 40 + 25}{6} = 35$ months.
Since the means are equal, you might conclude that both brands of paint last equally well, but this is not the
case. Even though the means are the same for the two brands, the spread, or variation, is quite different. You
can see that brand B performs more consistently; it is less variable.
For the spread or variability of a data set, three measures are commonly used: range, variance, and standard
deviation. Each measure will be discussed in this section.
I. The Range
The range is the simplest of the three measures.
The range is the absolute difference between the highest value and the lowest value in the data set. The
symbol 𝑅 is used for the range.
𝑅 = 𝑀𝑎𝑥 – 𝑀𝑖𝑛
Example 2:
The range of brand A shows that 50 months separate the largest data value from the smallest data value.
For brand B, 20 months separate the largest data value from the smallest data value, which is less than
one-half of brand A’s range.
Example 3:
Set 1 40 30 25 15 18
Set 2 40 30 25 15 18 100
The range of data set 2 is very large compared with the range of data set 1, since data set 2 contains an
extreme value (outlier).
Note: One extremely high or low data value can affect the range markedly.
The variance is the average of the squares of the deviations of each value from the mean. The symbol for the
population variance is $\sigma^2$.
The formula for the population variance is
$$\sigma^2 = \frac{(X_1 - \mu)^2 + (X_2 - \mu)^2 + \cdots + (X_N - \mu)^2}{N} = \frac{\sum (X - \mu)^2}{N}$$
You might wonder why the squared deviations are used instead of the actual deviations. One reason is that
the sum of the deviations is always zero: $\sum (X - \mu) = 0$.
Example 4:
Brand A 10 60 50 30 40 20
Brand B 35 45 30 35 40 25
Sample Variance
$$s^2 = \frac{(x_1 - \bar{x})^2 + (x_2 - \bar{x})^2 + \cdots + (x_n - \bar{x})^2}{n - 1} = \frac{\sum (x - \bar{x})^2}{n - 1}$$
Example 5:
The data show the number of patients in a sample of 6 hospitals who acquired an infection while
hospitalized. Find the variance.
110 76 29 38 105 32
$$s^2 = \frac{\sum x^2 - n\bar{x}^2}{n - 1}, \qquad s^2 = \frac{n \sum x^2 - \left(\sum x\right)^2}{n(n - 1)}$$
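The variance formulas can be compared numerically with a short Python sketch. It treats the paint data of Example 4 as populations (as the use of $\mu$ in Example 1 suggests) and the hospital data of Example 5 as a sample; the function names are illustrative.

```python
import math

def population_variance(values):
    """Average squared deviation from the mean (denominator N)."""
    mu = sum(values) / len(values)
    return sum((x - mu) ** 2 for x in values) / len(values)

def sample_variance(values):
    """Definition formula with denominator n - 1."""
    n = len(values)
    x_bar = sum(values) / n
    return sum((x - x_bar) ** 2 for x in values) / (n - 1)

def sample_variance_shortcut(values):
    """Shortcut formula: (n * sum(x^2) - (sum x)^2) / (n * (n - 1))."""
    n = len(values)
    sum_x = sum(values)
    sum_x2 = sum(x * x for x in values)
    return (n * sum_x2 - sum_x ** 2) / (n * (n - 1))

brand_a = [10, 60, 50, 30, 40, 20]
brand_b = [35, 45, 30, 35, 40, 25]
infections = [110, 76, 29, 38, 105, 32]

print(round(population_variance(brand_a), 2))   # 291.67
print(round(population_variance(brand_b), 2))   # 41.67
print(sample_variance(infections))              # 1372.0
print(sample_variance_shortcut(infections))     # 1372.0
print(round(math.sqrt(sample_variance(infections)), 2))  # standard deviation, 37.04
```

Both sample formulas give the same value; the shortcut form avoids computing each deviation, which is convenient for hand calculation.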
Example 6:
Why is it necessary to take the square root? The reason is that since the observations were squared, the unit
of the variance is the square of the unit of the original raw data. Finding the square root of the variance puts
the standard deviation in the same units as the raw data.
Note that the variance and the standard deviation cannot be negative numbers.
Example 7:
Example 8:
Find the standard deviation for the following ages:
Example 9:
The mean and the variance of the data set 25, 15, 18, 21, 16 are 19 and 16.5, respectively.
a) If each value is increased by 3 units, find the new mean and variance.
The new data will be 28, 18, 21, 24, 19.
The new mean is $\bar{x} = 19 + 3 = 22$.
The new variance is $s^2 = \frac{28^2 + 18^2 + 21^2 + 24^2 + 19^2 - 5 \times 22^2}{5 - 1} = 16.5$ (adding a constant does not affect the
variance).
b) If each value is multiplied by 2, find the new mean and variance.
The new data will be 50, 30, 36, 42, 32.
The new mean is $\bar{x} = 2 \times 19 = 38$.
The new variance is $s^2 = \frac{50^2 + 30^2 + 36^2 + 42^2 + 32^2 - 5 \times 38^2}{5 - 1} = 66 = 2^2 \times 16.5$ (multiplying by a constant multiplies
the variance by the square of that constant).
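The two effects in Example 9 can be checked with Python's standard `statistics` module, whose `variance` function uses the $n - 1$ denominator. This is only a numerical check of the example, not a proof of the general rule.

```python
from statistics import mean, variance  # variance() is the sample variance (n - 1)

data = [25, 15, 18, 21, 16]
shifted = [x + 3 for x in data]   # each value increased by 3
scaled = [2 * x for x in data]    # each value multiplied by 2

print(mean(data), variance(data))         # mean 19, variance 16.5
print(mean(shifted), variance(shifted))   # mean 22, variance unchanged at 16.5
print(mean(scaled), variance(scaled))     # mean 38, variance 66 = 2**2 * 16.5
```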
Example 10:
Sample 1 Sample 2
Age 25 years 10 years
Mean Weight 70 kg 30 kg
Variance 200 kg² 200 kg²
In this example the units of measure are the same for the two samples, but the nature of the people in the two
samples is different. In sample 1 all people are 25 years old (adults), but in sample 2 they are
children, so we must use the coefficient of variation to compare the variability.
The coefficient of variation for sample 1 is $CV = \frac{s}{\bar{x}} \cdot 100\% = \frac{\sqrt{200}}{70} \cdot 100\% = 20.20\%$
The coefficient of variation for sample 2 is $CV = \frac{\sqrt{200}}{30} \cdot 100\% = 47.14\%$
Sample 2 has larger variability than sample 1.
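The comparison in Example 10 can be reproduced with a small Python sketch. It assumes the formula used in the example, the standard deviation divided by the mean and expressed as a percentage, with the standard deviation taken as the square root of the variance 200 kg². The function name is illustrative.

```python
import math

def coefficient_of_variation(std_dev, mean):
    """CV = s / x-bar * 100%, a unit-free measure of relative spread."""
    return std_dev / mean * 100

cv_sample_1 = coefficient_of_variation(math.sqrt(200), 70)
cv_sample_2 = coefficient_of_variation(math.sqrt(200), 30)

print(f"{cv_sample_1:.2f}%")   # 20.20%
print(f"{cv_sample_2:.2f}%")   # 47.14%
```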
PROBLEMS
1. Determine the range, variance, and standard deviation for each of the following samples:
a. 20, 22, 23, 26, 29, 30 Answer: $R = 10$, $s^2 = 16$, $s = 4$
b. 120, 122, 123, 126, 126, 130, 134 Answer: $R = 14$, $s^2 = 23.48$, $s = 4.85$
c. 40, 44, 44, 46, 52, 52, 58, 60, 61, 65, 70, 72 Answer: $R = 32$, $s^2 = 113.52$, $s = 10.65$
2. Given the data below, find the range, variance, and standard deviation:
132, 117, 143, 114, 125, 133, 197, 134, 114, 143, 121, 108, 131, 109, 117, 116, 84, 102, 153, 116, 98,
122, 127, 113, 111, 65, 122, 114
Answer: $R = 132$, $s^2 = 535.38$, $s = 23.14$
3. Find the range, variance, and standard deviation for the following ages:
Age 17 18 19 20 21 22 23
Frequency 2 4 8 11 14 7 3
Answer: $R = 6$, $s^2 = 2.22$, $s = 1.49$
4. A group of students were given two tests A and B, the following data are obtained:
5. A sample of adult males produced a mean weight of 154 pounds with a
standard deviation of 28 pounds. If the unit of measurement is converted from pounds to kilograms,
find the mean and the variance after the conversion. (Hint: 1 kg = 2.205 pounds)
Answer: $s^2 = \left(\frac{28}{2.205}\right)^2 = 161.25$ kg², $s = \frac{28}{2.205} = 12.70$ kg
6. The mean of the waiting times in an emergency room is 80.2 minutes with a standard deviation of
10.5 minutes for people who are admitted for additional treatment. The mean waiting time for patients
who are discharged after receiving treatment is 120.6 minutes with a standard deviation of 18.3
minutes. Which times are more variable?
Answer: The waiting time for patients who are discharged after receiving treatment is more variable (CV ≈ 15.17% versus 13.09%).
7. Three sections were given a test and the scores for each section are given below, compare the
variances for these sections:
[Figures a, b, and c: bar charts of frequency against score (16 to 24); the bar heights are not recoverable.]
I. Standard Score
A standard score or 𝑧-score tells how many standard deviations a data value is above or below the mean for
a specific distribution of values.
The $z$-score of a data value $X$ that belongs to a population with mean $\mu$ and standard deviation
$\sigma$ is
$$z = \frac{X - \mu}{\sigma}$$
The $z$-score of a data value $x$ that belongs to a sample with mean $\bar{x}$ and standard deviation $s$ is
$$z = \frac{x - \bar{x}}{s}$$
Note: If the 𝑧-score is positive; the data value is above the mean. If the 𝑧-score is 0; the data value is the same
as the mean. And if the 𝑧-score is negative; then the data value is below the mean.
Example 1:
d) The data value that is 1.2 standard deviations below the mean.
$z = \frac{x - \bar{x}}{s} = \frac{x - 120}{\sqrt{225}} = -1.2 \rightarrow x = 120 - 1.2 \times \sqrt{225} = 102$.
The 𝑧-score can be used to compare the position of two data values from two different data sets. See the
following example.
Example 2:
A student scored 65 on a calculus test that had a mean of 50 and a standard deviation of 10; she scored 30
on a statistics test with a mean of 25 and a standard deviation of 5. Compare her relative positions on the
two tests.
For calculus, the $z$-score is $z = \frac{x - \bar{x}}{s} = \frac{65 - 50}{10} = 1.5$
For statistics, the $z$-score is $z = \frac{x - \bar{x}}{s} = \frac{30 - 25}{5} = 1.0$
Since the 𝑧-score for calculus is larger, her relative position in the calculus class is higher than her
relative position in the statistics class.
Example 3:
A student scored 36 on test A that had a mean of 40 and a standard deviation of 5; she scored 94 on test B
that had a mean of 100 and a standard deviation of 10. Compare her relative positions on the two tests.
For test A, the $z$-score is $z = \frac{x - \bar{x}}{s} = \frac{36 - 40}{5} = -0.8$
For test B, the $z$-score is $z = \frac{x - \bar{x}}{s} = \frac{94 - 100}{10} = -0.6$
The score for test B is relatively higher than the score for test A.
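The comparisons in Examples 2 and 3 can be reproduced with a short Python sketch of the $z$-score formula; the function and variable names are illustrative.

```python
def z_score(value, mean, std_dev):
    """Number of standard deviations that value lies above (+) or below (-) the mean."""
    return (value - mean) / std_dev

# Example 2: calculus versus statistics
z_calculus = z_score(65, 50, 10)
z_statistics = z_score(30, 25, 5)

# Example 3: test A versus test B
z_test_a = z_score(36, 40, 5)
z_test_b = z_score(94, 100, 10)

print(z_calculus, z_statistics)   # 1.5 1.0
print(z_test_a, z_test_b)         # -0.8 -0.6
```

In both examples the larger $z$-score marks the relatively better result, even when both scores are below the mean as in Example 3.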
Note: When all data for a variable are transformed into 𝑧-scores, the resulting distribution will have a mean
of 0 and a standard deviation of 1. See the following example.
Example 4:
a) Find the mean and the standard deviation of the given sample
$\bar{x} = 10$ and $s = 2$
c) Find the mean and the standard deviation for the resulting 𝑧-scores.
The mean of the resulting $z$-scores is $\bar{z} = 0$ and the standard deviation is $s_z = 1$.
II. Percentiles
Percentiles are position measures used in educational and health-related fields to indicate the position of an
individual in a group.
Percentiles divide the data set into 100 equal parts, and they are symbolized by $P_1, P_2, \ldots, P_{99}$.
As shown in the above graph, 1% of data values are less than the first percentile, $P_1$, while 99% are
greater than $P_1$. Similarly, 2% of data values are less than the second percentile, $P_2$, while 98% are
greater than $P_2$, and so on. It is clear that $P_1 \le P_2 \le \cdots \le P_{99}$.
Example 5:
A data set of size 500 has the following percentiles: $P_{20} = 104$, $P_{60} = 127$, $P_{85} = 143$. Find the number of
data values that are:
How to evaluate the $k$th percentile, $P_k$, for a data set containing $n$ observations:
Arrange the data set from the smallest to the largest value: $x_{(1)}, x_{(2)}, \ldots, x_{(n)}$.
The position of $P_k$ is $l = \frac{k}{100} \times n$.
If $l$ is an integer, then $P_k = \frac{1}{2}\left(x_{(l)} + x_{(l+1)}\right)$.
If $l$ is a fraction, then round $l$ up to the next integer, say $L$, and the $k$th percentile will be $P_k = x_{(L)}$.
Example 6:
Twenty-four patients admitted to a hospital are tested for levels of blood sugar with the results:
87, 51, 83, 67, 78, 77, 69, 76, 68, 85, 84, 85,
77, 70, 68, 80, 74, 79, 66, 85, 73, 75, 78, 81.
To find 𝑃20 :
Arrange the data values
1 2 3 4 5 6 7 8 9 10 11 12
51 66 67 68 68 69 70 73 74 75 76 77
13 14 15 16 17 18 19 20 21 22 23 24
77 78 78 79 80 81 83 84 85 85 85 87
Find the position of $P_{20}$: $l = \frac{k}{100} \times n = \frac{20}{100} \times 24 = 4.8$
Since $l$ is not an integer, round it up to 5.
Then $P_{20} = x_{(5)} = 68$
To find $P_{75}$:
Find the position of $P_{75}$: $l = \frac{75}{100} \times 24 = 18$
Since $l$ is an integer, $P_{75} = \frac{1}{2}\left(x_{(18)} + x_{(19)}\right) = \frac{81 + 83}{2} = 82$
To find $P_{85}$:
Find the position of $P_{85}$: $l = \frac{85}{100} \times 24 = 20.4$
Round $l$ up to 21.
Then $P_{85} = x_{(21)} = 85$
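The position rule for percentiles can be sketched in Python as follows. The function name is illustrative, and the sketch assumes $k$ is a whole number from 1 to 99; integer arithmetic is used for the position so that the "is $l$ an integer?" test is not affected by floating-point rounding.

```python
def percentile(values, k):
    """k-th percentile (k an integer from 1 to 99) using the position rule of these notes."""
    ordered = sorted(values)
    n = len(ordered)
    whole, remainder = divmod(k * n, 100)   # position l = k * n / 100
    if remainder == 0:
        # l is an integer: average x_(l) and x_(l+1) (1-based positions)
        return (ordered[whole - 1] + ordered[whole]) / 2
    # l is a fraction: round up to L = whole + 1 and take x_(L)
    return ordered[whole]

blood_sugar = [87, 51, 83, 67, 78, 77, 69, 76, 68, 85, 84, 85,
               77, 70, 68, 80, 74, 79, 66, 85, 73, 75, 78, 81]

print(percentile(blood_sugar, 20))   # 68
print(percentile(blood_sugar, 75))   # 82.0
print(percentile(blood_sugar, 85))   # 85
```

Statistical software often uses other interpolation rules, so values computed elsewhere may differ slightly from this hand method.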
III. Quartiles
The quartiles are 3 values, $Q_1$, $Q_2$, and $Q_3$, that divide an ordered data set into 4 equal parts, each
part containing about 25% of the observations.
The first quartile is denoted by $Q_1$ and it is equal to the 25th percentile, $P_{25}$.
The second quartile is denoted by $Q_2$ and it is equal to the 50th percentile, $P_{50} = MD$.
The third quartile is denoted by $Q_3$ and it is equal to the 75th percentile, $P_{75}$.
Example 7:
The inter-quartile range, denoted by 𝐼𝑄𝑅, is the positive difference between the first and third
quartiles.
𝐼𝑄𝑅 = 𝑄3 − 𝑄1 = 𝑃75 − 𝑃25
Relatively large values of IQR indicate relatively large variability; smaller values indicate less variability.
Note: The 𝐼𝑄𝑅 is used to measure the variability when the data set contains outliers or when the
distribution is badly skewed
Example 8:
V. Outliers
A data set should be checked for extremely high or extremely low values. These values are called outliers.
There are many methods to check a data set for outliers. One of them will be shown in this section.
Example 9:
1.34 1.69 1.78 1.89 2.03 2.16 2.27 2.34 2.39 2.88
2.92 3.13 3.36 3.57 3.67 3.83 5.44 8.63 11.82
Example 10:
Twenty-four patients admitted to a hospital are tested for levels of blood sugar with the results:
87, 51, 83, 67, 78, 77, 69, 76, 68, 85, 84, 85,
77, 70, 68, 80, 74, 79, 66, 85, 73, 75, 78, 81.
𝑄1 = 69.5, 𝑄3 = 82
Check for outliers.
3. There are no data values less than 50.75 or greater than 100.75, so there are no outliers.
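The outlier check in Example 10 can be sketched in Python. The fences 50.75 and 100.75 quoted in step 3 are consistent with the common rule $Q_1 - 1.5 \times IQR$ and $Q_3 + 1.5 \times IQR$, so that rule is assumed here; the quartiles are computed as the 25th and 75th percentiles with the position rule of these notes, and the function names are illustrative.

```python
def percentile(values, k):
    """k-th percentile (integer k, 1 to 99) using the position rule of these notes."""
    ordered = sorted(values)
    whole, remainder = divmod(k * len(ordered), 100)
    if remainder == 0:
        return (ordered[whole - 1] + ordered[whole]) / 2
    return ordered[whole]

def find_outliers(values):
    """Return (lower fence, upper fence, list of outliers) using the assumed 1.5 * IQR rule."""
    q1 = percentile(values, 25)
    q3 = percentile(values, 75)
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    outliers = [x for x in values if x < lower_fence or x > upper_fence]
    return lower_fence, upper_fence, outliers

blood_sugar = [87, 51, 83, 67, 78, 77, 69, 76, 68, 85, 84, 85,
               77, 70, 68, 80, 74, 79, 66, 85, 73, 75, 78, 81]

print(find_outliers(blood_sugar))   # (50.75, 100.75, [])
```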
1. The data value may have resulted from a measurement or observational error. Perhaps the
researcher measured the variable incorrectly.
2. The data value may have resulted from a recording error. That is, it may have been written or
typed incorrectly.
3. The data value may have been obtained from a subject that is not in the defined population. For
example, suppose test scores were obtained from a 7th grade class, but a student in that class was
actually in the 6th grade and had special permission to attend the class. This student might have
scored extremely low on that particular exam on that day.
4. The data value might be a legitimate value that occurred by chance.
There are no hard-and-fast rules on what to do with outliers, nor is there complete agreement among
statisticians on ways to identify them. Obviously, if they occurred as a result of an error, an attempt
should be made to correct the error or else the data value should be omitted entirely. When they
occur naturally by chance, the statistician must make a decision about whether to include them in the
data set.
Note: When a distribution is normal or bell-shaped, data values that are beyond 3 standard deviations of the
mean can be considered suspected outliers.
The box plot is a graph used to detect outliers, and it can be used to determine the direction of skewness.
Calculate the three quartiles, $Q_1$, $MD$, and $Q_3$, and the inner fences.
Detect the outliers (if any).
The largest and the smallest non-outlier observations are called adjacent values.
Draw a horizontal line representing the scale of measurements.
Form a box just above the horizontal line with the left and the right ends at $Q_1$ and $Q_3$.
Draw a vertical line through the box at the location of the median.
Locate the adjacent values using the scale along the horizontal line, and connect them to the box with
horizontal lines.
The outliers are marked on the graph with circles (•) or with stars (∗).
Note: The box plot can be used to describe the shape of a data distribution by looking at the position of
the median line compared to the $Q_1$ and $Q_3$ lines. If the median is close to the middle of the box, the
distribution is approximately symmetric. The distribution is skewed to the right if the median line is
to the left of the centre. The distribution is skewed to the left if the median line is to the right of the
centre.
Example 11:
There are two outliers on the right of the distribution, which makes the distribution skewed to the right
(positively skewed).
PROBLEMS
1. A test (A) of job-related stress was standardized and found to have a mean of 112.6 with a standard
deviation of 13.8. A second test (B) had a mean of 44.6 with a standard deviation of 6.3.
a) Which test is more highly variable? Answer: Test B
b) Ali was given the two tests A and B; his scores were $x_A = 119$ and $x_B = 54$. On which test was the
score better? Answer: Test B
2. Given the following sample: 5, 6, 12, 13, 15, 18, 20, 22, 50
a) Evaluate the interquartile range. Answer: 8
b) Check for outliers. Answer: 50
c) Construct a box plot.
3. A dietitian is interested in comparing the sodium content for real cheese with the sodium content of a
cheese substitute. The data for two random samples are shown. Compare the distributions, using box
plots.
[Figure: side-by-side box plots of sodium content for real cheese and cheese substitute; vertical axis from 0 to 400.]
4. The distribution of a certain data set is symmetric. The first and the second quartiles are 65 and 70
respectively.
a) Find the mean. Answer: 70
b) Find the third quartile. Answer: 75
c) Find the interquartile range. Answer: 10
BIOSTATISTICS
CHAPTER 4: Probability
I. Basic Concepts:
A random experiment is an experiment, the outcome of which cannot be determined with certainty
before performing the experiment.
For example: measuring the rainfall for the next month, testing for blood type, or testing whether a patient is
diabetic.
The set of all possible outcomes is called the sample space and denoted by Ω.
Example 1:
d) Five pints of blood are stored in a hospital laboratory. It is known that exactly two pints are type O,
but it is not known which ones. Pints are selected one by one until getting a pint of type O.
Let O: The selected pint is type O, O*: The selected pint is not type O
Then Ω = {O, O*O, O*O*O, O*O*O*O}.
Exercise:
For Example 1 Part (b), write the sample space if pints are selected one by one until both type O
pints are obtained.
Example 2:
Example 3:
In an experiment that tests three patients for cancer, define the event A as: exactly
two of the three patients do not have cancer.
Here, A = {CC*C*, C*CC*, C*C*C}
Equally likely outcomes are outcomes that have the same probability of occurring.
Example 4:
The following is the sample space of a random experiment with equally likely outcomes: Select 3 people
at random from a population 50% of which are males and 50% are females, and then record the gender
of the selected people, then Ω = {MMM, MMF, MFM, MFF, FMM, FMF, FFM, FFF}. Here, Ω consists of 8
equally likely outcomes.
Exercise:
Give other examples of random experiments whose outcomes are equally likely.
Operations on Sets
The intersection of events A and B, denoted by A ∩ B, is the event that both A and B occur.
If A ∩ B = ∅, then A and B are called disjoint or mutually exclusive events, which means the two
events cannot occur together.
The union of events A and B, denoted by A ∪ B, is the event that A or B or both occur.
The complement of an event A, denoted by Aᶜ, consists of all outcomes in Ω that are not in A. Aᶜ is the
event that A will not occur.
De Morgan's laws:
1) (A ∩ B)ᶜ = Aᶜ ∪ Bᶜ        2) (A ∪ B)ᶜ = Aᶜ ∩ Bᶜ
[Figure: Venn diagrams of events A and B illustrating A ∩ B, A ∪ B, Aᶜ, A ∩ Bᶜ, Aᶜ ∩ B, Aᶜ ∩ Bᶜ, and (A ∩ B)ᶜ]
Example 5:
1. 𝐴 ∩ 𝐵 = {1,3}
2. 𝐴 ∪ 𝐵 = {1,2,3,5,7}
3. 𝐴𝑐 = {2,4,6}
4. A ∩ Bᶜ = {1, 3, 5, 7} ∩ {4, 5, 6, 7} = {5, 7}
5. Aᶜ ∪ B = {2, 4, 6} ∪ {1, 2, 3} = {1, 2, 3, 4, 6}
6. 𝐴 ∩ Ω = 𝐴
7. 𝐴 ∪ Ω = Ω
8. 𝐴 ∩ ∅ = ∅
9. 𝐴 ∪ ∅ = 𝐴
10. (A ∪ B)ᶜ = {1, 2, 3, 5, 7}ᶜ = {4, 6}
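The set operations in Example 5 map directly onto Python's built-in set type. In the sketch below, the sets Ω = {1, …, 7}, A = {1, 3, 5, 7} and B = {1, 2, 3} are inferred from the worked answers (the statement of the example is not reproduced here), so treat them as assumptions.

```python
# Sets inferred from the answers to Example 5 (assumed, not quoted).
omega = set(range(1, 8))
A = {1, 3, 5, 7}
B = {1, 2, 3}

def complement(event):
    """All outcomes of the sample space that are not in the event."""
    return omega - event

print(A & B)              # intersection
print(A | B)              # union
print(complement(A))      # complement of A
print(A & complement(B))  # A occurs and B does not
print(complement(A | B))  # neither A nor B occurs

# De Morgan's laws hold for these sets.
de_morgan_1 = complement(A & B) == complement(A) | complement(B)
de_morgan_2 = complement(A | B) == complement(A) & complement(B)
print(de_morgan_1, de_morgan_2)
```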
1. If an experiment has n equally likely outcomes, r of which constitute an event A, then the probability that
the event A will occur is P(A) = r/n
Example 6:
Given the experiment in Example 4. Find the probability for the following:
a) A: Two of the selected people are males and the other is female.
A = {MMF, MFM, FMM} → P(A) = 3/8
2. The probability that an event 𝐴 will occur is equal to the sum of the probabilities of the outcomes that
make up the event 𝐴
Example 7:
Given Ω = {a, b, c, d} such that P(a) = 0.1, P(b) = 0.2, P(c) = 0.3, and P(d) = 0.4. Find,
b) P(Ω)
P(Ω) = P(a) + P(b) + P(c) + P(d) = 1
9. If A ⊆ B, then P(A) ≤ P(B)
Example 8:
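d) P(Aᶜ ∩ Bᶜ) = P((A ∪ B)ᶜ) = 1 − P(A ∪ B) = 1 − 0.9 = 0.1
e) P(A ∪ Bᶜ) = P(A) + P(Bᶜ) − P(A ∩ Bᶜ) = P(A) + [1 − P(B)] − [P(A) − P(A ∩ B)] = 0.7
[Figure: Venn diagrams of events A and B with region probabilities 0.3, 0.4 and 0.2 inside the circles and 0.1 outside, illustrating A ∪ B, A ∩ Bᶜ, Aᶜ ∩ B, Aᶜ ∩ Bᶜ, and A ∪ Bᶜ]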
Example 9:
Let 𝐴 and 𝐵 be two events, compare the probabilities 𝑃(𝐴), 𝑃(𝐴 ∩ 𝐵), and 𝑃(𝐴 ∪ 𝐵)
Since 𝐴 ∩ 𝐵 ⊆ 𝐴 ⊆ 𝐴 ∪ 𝐵 → 𝑃(𝐴 ∩ 𝐵) ≤ 𝑃(𝐴) ≤ 𝑃(𝐴 ∪ 𝐵)
Example 10:
(True or False) There are two disjoint events A and B such that P(A) = 0.8 and P(B) = 0.3.
If A and B are disjoint, then, using Rule 7, P(A ∪ B) = P(A) + P(B) = 0.8 + 0.3 = 1.1 > 1.
This result contradicts Rule 3, so the statement is False.
Example 11:
Example 12:
c) Is not overweight
P(Oᶜ) = 151/324 = 0.466.
Example 13:
Suppose that the probability that Ali will get an A in biostatistics is about 0.4, Omar will get an A is about
0.35, and because they studied together, the probability that both Ali and Omar will get an A is 0.2.
Determine the probability that:
Now the event “Either Ali or Omar will get an A” can be denoted by A ∪ O,
and P(A ∪ O) = P(A) + P(O) − P(A ∩ O) = 0.4 + 0.35 − 0.2 = 0.55
Example 14:
Among a group of children, 35% suffer from cognitive dissonance as a result of school experiences and
40% have a distorted sense of reality as measured by a standard measuring device. In a group of 200
children for which these percentages hold, a total of 32 have both cognitive dissonance and distorted
sense of reality. How many of these children are free from both problems?
Now, P(C ∩ D) = 32/200 = 0.16, so
P(Cᶜ ∩ Dᶜ) = P((C ∪ D)ᶜ) = 1 − P(C ∪ D) = 1 − (0.35 + 0.40 − 0.16) = 0.41
The number of children that are free from both problems is 200 × 0.41 = 82
Exercise:
Among a group of children, 35% suffer from cognitive dissonance only and 40% have a distorted sense of
reality only. In a group of 200 children for which these percentages hold, a total of 32 have both cognitive
dissonance and distorted sense of reality. How many of these children are free from both problems?
The conditional probability of an event A given an event B is defined as the probability that event
A occurs given that event B has already occurred. It can be evaluated using the following formula:
P(A | B) = P(A ∩ B) / P(B),   P(B) ≠ 0
Example 15:
A class consists of 10 girls and 15 boys. 2 girls and 6 boys are left-handed. One student is selected at
random from this class; determine the probability that the selected student is:
a) Girl → P(G) = 10/(10 + 15) = 0.4
b) Girl and left-handed → P(G ∩ L) = 2/25 = 0.08
c) Left-handed → P(L) = (2 + 6)/25 = 0.32
d) Left-handed if she is a girl → P(L | G) = 2/10 = 0.2, or P(L | G) = P(L ∩ G)/P(G) = (2/25)/(10/25) = 2/10 = 0.2
e) Boy if he is left-handed → P(B | L) = 6/(2 + 6) = 0.75, or P(B | L) = P(L ∩ B)/P(L) = (6/25)/(8/25) = 6/8 = 0.75
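As a cross-check of Example 15, the following sketch computes the same probabilities from a table of counts. The helper names (`prob`, `cond_prob`) are illustrative, and the right-handed counts (8 girls, 9 boys) are derived from the totals given in the example.

```python
from fractions import Fraction

# Class counts from Example 15, keyed by (gender, handedness).
counts = {
    ("girl", "left"): 2, ("girl", "right"): 8,
    ("boy", "left"): 6, ("boy", "right"): 9,
}
total = sum(counts.values())

def prob(event):
    """P(event) for one randomly selected student, as an exact fraction."""
    favourable = sum(n for key, n in counts.items() if event(key))
    return Fraction(favourable, total)

def cond_prob(event, given):
    """P(event | given) = P(event and given) / P(given)."""
    return prob(lambda k: event(k) and given(k)) / prob(given)

def is_girl(k): return k[0] == "girl"
def is_boy(k): return k[0] == "boy"
def is_left(k): return k[1] == "left"

p_girl = prob(is_girl)
p_left = prob(is_left)
p_left_given_girl = cond_prob(is_left, is_girl)
p_boy_given_left = cond_prob(is_boy, is_left)
print(p_girl, p_left, p_left_given_girl, p_boy_given_left)
```

Working with exact fractions avoids rounding and makes it easy to compare against the hand calculation.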
Example 16:
a) Omar will get an A if Ali gets an A → P(O | A) = P(O ∩ A)/P(A) = 0.2/0.4 = 0.5
Example 17:
According to Example 12, what is the probability that the selected patient is:
a) Overweight if he or she has high blood pressure. → P(O | H) = 121/184 = 0.658
b) Not overweight if he or she has high blood pressure. → P(Oᶜ | H) = 63/184 = 0.342
Example 18:
Let A and B be two events from a sample space Ω such that P(A) = 0.6, P(B) = 0.7, and P(A ∩ B) = 0.4,
find:
a) P(A | B) → P(A | B) = P(A ∩ B)/P(B) = 0.4/0.7 = 4/7
b) P(B | A) → P(B | A) = P(A ∩ B)/P(A) = 0.4/0.6 = 2/3. Note that in general P(A | B) ≠ P(B | A)
c) P(A | Ω) → P(A | Ω) = P(A ∩ Ω)/P(Ω) = P(A)/1 = P(A) = 0.6
d) P(Ω | A) → P(Ω | A) = P(A ∩ Ω)/P(A) = P(A)/P(A) = 1
e) P(∅ | A) → P(∅ | A) = P(A ∩ ∅)/P(A) = 0/P(A) = 0
f) P(A | A) → P(A | A) = P(A ∩ A)/P(A) = P(A)/P(A) = 1
g) P(Aᶜ | B) → P(Aᶜ | B) = P(Aᶜ ∩ B)/P(B) = (0.7 − 0.4)/0.7 = 3/7. Notice that P(Aᶜ | B) = 1 − P(A | B)
h) P(Bᶜ | B) → P(Bᶜ | B) = P(Bᶜ ∩ B)/P(B) = P(∅)/P(B) = 0
The multiplication rule can be used to find the probability of two or more events that occur in sequence. For
example, if you select three students from a class of 40 students, you can find the probability that the three
students are left-handed.
Example 19:
Let A and B be two events such that P(A | B) = 0.4, P(A | Bᶜ) = 0.3 and P(B) = 0.6, find:
d) P(B | A) → P(A ∩ B)/P(A) = 0.24/0.36 = 2/3 = 0.667
Example 20:
Suppose that there are 5 smokers in a group of 20 men. Two men are to be selected at random and one
by one from this group. Find the probability that:
c) The two men are smokers (the first is a smoker and the second is a smoker)
P(A ∩ B) = P(B | A) P(A) = (4/19) × (5/20) = 1/19
Notice that:
2. We can find the probability of each outcome in the sample space using the multiplication rule.
3. We can find the probability of any event related to the above experiment; for example, the
probability that at least one of the two men is a smoker equals P({SS, SSᶜ, SᶜS}) = 4/76 + 15/76 + 15/76 = 34/76
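The multiplication rule used in Example 20 can be sketched as a small function that multiplies P(first) by P(second | first) when the two men are drawn without replacement. The function name `p_sequence` is illustrative.

```python
from fractions import Fraction

SMOKERS, MEN = 5, 20
NON_SMOKERS = MEN - SMOKERS

def p_sequence(first_smokes, second_smokes):
    """P(first and second) = P(first) * P(second | first), without replacement."""
    p_first = Fraction(SMOKERS if first_smokes else NON_SMOKERS, MEN)
    left_in_group = SMOKERS if second_smokes else NON_SMOKERS
    if first_smokes == second_smokes:
        left_in_group -= 1  # one member of that group has already been drawn
    p_second_given_first = Fraction(left_in_group, MEN - 1)
    return p_first * p_second_given_first

# Probability of every outcome in the sample space {SS, SS^c, S^cS, S^cS^c}.
outcomes = {(a, b): p_sequence(a, b) for a in (True, False) for b in (True, False)}
p_both = outcomes[(True, True)]
p_at_least_one = sum(p for (a, b), p in outcomes.items() if a or b)
print(p_both, p_at_least_one)
```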
Example 21:
Suppose that 60% of students in a class are boys. And suppose that 30% of boys and 25% of girls wear
eyeglasses. One student is selected at random from this class, determine the probability that
V. Independent Events
When the outcome or occurrence of the first event affects the outcome or occurrence of the second
event in such a way that the probability is changed, the events are said to be dependent events.
When the outcome or occurrence of the first event does not affect the outcome or occurrence of the
second event, so that the probability remains the same, the events are said to be independent
events.
If two events are disjoint, this does not mean the two events are independent.
Example 22:
Let 𝐴 and 𝐵 be two independent events such that 𝑃(𝐴) = 0.4 and 𝑃(𝐵) = 0.7, find
a) P(A | B) → P(A | B) = P(A) = 0.4
Note that A and Bᶜ are independent, Aᶜ and B are independent, and Aᶜ and Bᶜ are independent.
Example 23:
Approximately 10% of men have a type of color blindness that prevents them from distinguishing between red
and green. If 3 men are selected at random, find the probability that:
Exercise:
A group of 100 students is classified according to their scores on a certain test and their sections.
Complete the following table so that the two events "The student's score is high" and "The student is
from section 1" are independent.
                 Score
Section     High (H)   Not high (Hᶜ)   Total
1 (S1)      ____       ____            40
2 (S2)      ____       ____            ____
Total       20         ____            100
PROBLEMS
1. A group of students consists of 3 boys, Ali, Omar and Khalid, and 2 girls, Rana and Huda. One student will
be selected at random from this group.
a) What is the sample space? Ω ={ Ali, Omar, Khalid, Rana, Huda}
b) Find the probability that:
1. Omar will be selected 0.2
2. Omar or Huda will be selected 0.4
3. A boy will be selected 0.6
4. A boy or girl will be selected 1.0
5. Ali will not be selected 0.8
2. A group of students consists of 3 boys, Ali, Omar and Khalid, and 2 girls, Rana and Huda. Two students
will be selected at random from this group.
a) What is the sample space? Ω ={(A,O), (A,K), (A,R), (A,H), (O,K), (O,R), (O,H), (K,R), (K,H), (R,H)}
b) Find the probability that:
1. Omar will be selected 0.4
2. Omar and Huda will be selected 0.1
3. Two boys will be selected 0.3
4. At least one boy will be selected 0.9
5. A boy and a girl will be selected 0.6
6. Two boys or two girls will be selected 0.4
7. Ali will be selected if Omar has been selected 0.25
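The answers to Problem 2 can be verified by listing the sample space and counting. The sketch below assumes, as the listed sample space does, that the order of selection does not matter, so the 10 unordered pairs are equally likely.

```python
from fractions import Fraction
from itertools import combinations

boys = {"Ali", "Omar", "Khalid"}
girls = {"Rana", "Huda"}

# Unordered pairs of students: the 10 equally likely outcomes.
sample_space = [set(pair) for pair in combinations(sorted(boys | girls), 2)]

def prob(event):
    """Classical probability: favourable outcomes / total outcomes."""
    favourable = sum(1 for s in sample_space if event(s))
    return Fraction(favourable, len(sample_space))

p_omar = prob(lambda s: "Omar" in s)
p_two_boys = prob(lambda s: s <= boys)
p_at_least_one_boy = prob(lambda s: bool(s & boys))
p_boy_and_girl = prob(lambda s: bool(s & boys) and bool(s & girls))
# Conditional probability: P(Ali | Omar) = P(Ali and Omar) / P(Omar).
p_ali_given_omar = prob(lambda s: {"Ali", "Omar"} <= s) / p_omar
print(p_omar, p_two_boys, p_at_least_one_boy, p_boy_and_girl, p_ali_given_omar)
```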
3. In a genetics experiment, the researcher mated two Drosophila fruit flies and observed the traits of 300
offspring. The results are shown in the following table:
                 Wing size
Eye color      Normal   Miniature
Normal         140      6
Vermillion     3        151
One of these offspring is randomly selected and observed for the two genetic traits.
What is the probability that the fly has:
a) Normal eye color and normal wing size? 140/300
b) Vermillion eyes? 154/300
c) Either vermillion eyes or miniature wings? 160/300
d) Normal eye color if it has normal wing size? 140/143
4. Out of 100 applicants for a certain job, 70 have some experience, 28 are over 40 years old, and 65 are
men. The distribution of applicants over these three factors is shown here:
            Experience              No Experience
            Over 40   Under 40      Over 40   Under 40
Men         15        40            3         7
Women       5         10            5         15
One person is chosen at random from the 100, find the probability that the selected person is:
a) Over 40 years old 0.28
b) Man and has some experience 0.55
c) Under 40 or woman 0.82
d) Woman if she has no experience 0.667
5. Let A and B be two events such that P(A) = 0.6, P(B) = 0.8 and P(A ∩ B) = 0.5, answer the following:
a) Find P(A ∪ B)   0.9
b) Find P(A ∩ Bᶜ)   0.1
c) Find P(A ∪ Bᶜ)   0.7
d) Find P(Aᶜ ∩ Bᶜ)   0.1
e) Find P(A | B)   0.625
f) Find P(Aᶜ | B)   0.375
g) Find P(B | A)   0.833
h) Find P(A | Bᶜ)   0.5
i) Are 𝐴 and 𝐵 mutually exclusive? No
j) Are 𝐴 and 𝐵 independent events? No
6. Suppose that 20% of adults in a large population are smokers. A random sample of three persons is
selected, find the probability that the sample will contain:
7. Suppose that 30% of adults in a certain population are smokers, 20% of smokers have lung cancer and
10% of nonsmokers have lung cancer. One person is selected at random from this population, find the
probability that the selected person:
8. A survey of people in a given region showed that 20% were smokers. The probability of death due to
lung cancer, given that a person smoked, was roughly ten times the probability of death due to lung
cancer, given that a person did not smoke. The probability of death due to lung cancer in the region is
0.007.
a) What is the probability of death due to lung cancer given that a person is a smoker?
0.025
b) If a person died due to lung cancer, what is the probability that the person was smoker?
0.714
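One way to arrive at the two answers in Problem 8 is to combine the total probability rule with the definition of conditional probability (often called Bayes' rule). The sketch below is only one possible route; the variable names are illustrative.

```python
p_smoker = 0.20
p_death = 0.007
ratio = 10  # P(D | smoker) = ratio * P(D | non-smoker)

# Total probability: P(D) = P(S) * ratio * x + (1 - P(S)) * x,
# where x = P(D | non-smoker). Solve for x.
p_death_given_nonsmoker = p_death / (p_smoker * ratio + (1 - p_smoker))
p_death_given_smoker = ratio * p_death_given_nonsmoker

# P(S | D) = P(D | S) * P(S) / P(D)
p_smoker_given_death = p_death_given_smoker * p_smoker / p_death

print(round(p_death_given_smoker, 3), round(p_smoker_given_death, 3))
```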
9. In a group of 200 students, 160 take an English course, 120 take a mathematics course, and 16 take
neither. Are the two events “takes an English course” and “takes a mathematics course” independent?
Yes
Exercises:
Section 4-1: 10, 12, 13, 15, 17, 18, 19, 20, 21, 24, 28, 35, 39
Section 4-3: 5, 7, 9, 11, 17, 18, 22, 23, 33, 34, 38, 46
An-Najah National University CH 4 – Page 55
Random Variables
In Chapter 1, a variable was defined as a quantity whose value is not fixed. A variable that takes on number
values is a numerical variable. A numerical variable whose value is determined by chance is called a random
variable. A random variable can be defined as follows:
A variable that assumes a unique numerical value for each of the outcomes in the sample space of a
random experiment is called a random variable.
Example 1:
If a random sample of 5 patients is selected from a hospital, a letter such as X can be used to represent the
number of diabetic patients in the sample. The values that X can assume are then 0, 1, 2, 3, 4, and 5. The set of
all possible values of X is called the space of X and is denoted by Ω_X.
Example 2:
If T is the time it takes a student to finish a one-hour exam, then T is a random variable with possible values
from 15 minutes to 60 minutes.
When the space of a random variable X, Ω_X, is a finite or countable set, X is called a discrete random
variable. In general, discrete random variables are obtained from data that can be counted.
Example 3:
When the space of a random variable X, Ω_X, is an interval of real numbers, the random variable X is
called a continuous random variable. In general, continuous random variables are obtained from data that can
be measured rather than counted.
Example 4:
Example 5:
Suppose that 10% of men in a population are color blind. A sample of 3 men is selected at random and
each one of them is checked for color blindness.
Notice that the sample space can be determined using a tree diagram.
b) Construct a probability distribution for the number of color blind men in the sample.
Let the random variable X denote the number of color blind men in the sample.
To construct a probability distribution for X, we must determine all possible values of X and
their probabilities.
When the outcome is BBB, then X = 3 with probability P(3) = 0.1 × 0.1 × 0.1 = 0.001
When the outcome is one of the outcomes BBBᶜ, BBᶜB, BᶜBB, then X = 2 with probability
P(2) = 3 × 0.1 × 0.1 × 0.9 = 0.027
When the outcome is one of the outcomes BBᶜBᶜ, BᶜBBᶜ, BᶜBᶜB, then X = 1 with
probability P(1) = 3 × 0.1 × 0.9 × 0.9 = 0.243
When the outcome is BᶜBᶜBᶜ, then X = 0 with probability P(0) = 0.9 × 0.9 × 0.9 = 0.729
The probability distribution for X is
X      0      1      2      3
P(X)   0.729  0.243  0.027  0.001
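The construction in Example 5 amounts to walking through the tree diagram: list all eight outcomes, multiply the probabilities along each branch, and add up the outcomes that give the same value of X. A minimal sketch, assuming the three men are checked independently (as the products in the example do):

```python
from collections import defaultdict
from itertools import product

p_blind = 0.1
distribution = defaultdict(float)

# Each outcome is a tuple such as (True, False, True); True means color blind.
for outcome in product([True, False], repeat=3):
    probability = 1.0
    for is_blind in outcome:
        probability *= p_blind if is_blind else 1 - p_blind
    x = sum(outcome)  # X = number of color blind men in this outcome
    distribution[x] += probability

for x in sorted(distribution):
    print(x, round(distribution[x], 3))
```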
Example 6:
In a group of 12 men, there are 4 smokers. A sample of 3 men is selected at random from this group.
Let the random variable X denote the number of smokers in the sample. To construct a
probability distribution for X, we must determine all possible values of X and their
probabilities.
When the outcome is SSS, then X = 3 with probability P(3) = (4/12) × (3/11) × (2/10) = 1/55
When the outcome is one of the outcomes SSSᶜ, SSᶜS, SᶜSS, then X = 2 with probability
P(2) = 3 × (4/12) × (3/11) × (8/10) = 12/55
When the outcome is SᶜSᶜSᶜ, then X = 0 with probability P(0) = (8/12) × (7/11) × (6/10) = 14/55
The probability distribution for X is
X      0      1      2      3
P(X)   14/55  28/55  12/55  1/55
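The table in Example 6 can be double-checked with a counting argument instead of the sequential multiplication used in the example: choose x of the 4 smokers and the remaining 3 − x men from the 8 non-smokers, then divide by the number of ways to choose any 3 of the 12 men. This is a different technique from the one in the notes, offered only as a cross-check; `p_smokers` is an illustrative name.

```python
from fractions import Fraction
from math import comb

smokers, group, sample = 4, 12, 3

def p_smokers(x):
    """P(X = x): x smokers and (sample - x) non-smokers among the chosen men."""
    ways = comb(smokers, x) * comb(group - smokers, sample - x)
    return Fraction(ways, comb(group, sample))

distribution = {x: p_smokers(x) for x in range(sample + 1)}
for x, p in distribution.items():
    print(x, p)
```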
Example 7:
a.  X      −5    0     5     10    15
    P(X)   0.2   0.2   0.1   0.2   0.3
b.  Y      2     3     7
    P(Y)   0.5   0.3   0.4
c.  Z      0     2     4     6
    P(Z)   −1.0  1.5   0.3   0.2
Example 8:
The probabilities that a patient will have 0, 1, 2, or 3 medical tests performed on entering a hospital are 6/15,
5/15, 3/15, and 1/15, respectively.
a) Construct a probability distribution for the number of tests that a patient will have.
Let X denote the number of tests that a patient will have; the probability distribution of X is
X      0      1      2      3
P(X)   6/15   5/15   3/15   1/15
f) If 300 patients entered the hospital, how many patients would you expect to have exactly 2
tests?
300 × P(2) = 300 × (3/15) = 60
Example 9:
Let P(X) = 0.48/X, X = 1, 2, 3, 4 be the probability distribution of a discrete random variable X, find
a) P(X = 3)
→ P(X = 3) = 0.48/3 = 0.16
c) P(X ≠ 2)
→ P(X ≠ 2) = 1 − P(X = 2) = 1 − 0.48/2 = 0.76
d) P(X ≤ 2)
→ P(X ≤ 2) = P(X = 1) + P(X = 2) = 0.48/1 + 0.48/2 = 0.72
e) P(X < 2)
→ P(X < 2) = P(X = 1) = 0.48/1 = 0.48
[Figure: bar chart of P(x) against x for x = 1, 2, 3, 4; vertical axis from 0.1 to 0.5]
Let P(X), X ∈ Ω_X, be the probability distribution of a discrete random variable X and let u(X) be a function
of X. The expected value of u(X) is defined by
E[u(X)] = Σ u(X) P(X), where the sum is taken over all X in Ω_X
Example 10:
Let X be a random variable that has a probability distribution, P(X), evaluate the following:
X      1     2     3     4
P(X)   0.4   0.3   0.2   0.1
a) E(X) → E(X) = Σ X P(X) = 1 P(1) + 2 P(2) + 3 P(3) + 4 P(4)
= 1(0.4) + 2(0.3) + 3(0.2) + 4(0.1) = 2
b) E(5) → E(5) = Σ 5 P(X) = 5 P(1) + 5 P(2) + 5 P(3) + 5 P(4)
= 5[P(1) + P(2) + P(3) + P(4)] = 5(0.4 + 0.3 + 0.2 + 0.1) = 5(1) = 5
c) E(3X) → E(3X) = Σ 3X P(X) = 3 Σ X P(X) = 3E(X) = 3(2) = 6
d) E(3X − 5) → E(3X − 5) = Σ (3X − 5) P(X) = Σ 3X P(X) − Σ 5 P(X) = 6 − 5 = 1
e) E(X²) → E(X²) = Σ X² P(X) = 1² P(1) + 2² P(2) + 3² P(3) + 4² P(4)
= 1(0.4) + 4(0.3) + 9(0.2) + 16(0.1) = 5
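The definition E[u(X)] = Σ u(X) P(X) translates into a one-line sum. The sketch below reproduces parts (a) to (e) of Example 10; the function name `expected_value` is illustrative.

```python
# Probability distribution from Example 10.
distribution = {1: 0.4, 2: 0.3, 3: 0.2, 4: 0.1}

def expected_value(u, dist):
    """E[u(X)]: sum of u(x) * P(x) over every x in the space of X."""
    return sum(u(x) * p for x, p in dist.items())

e_x = expected_value(lambda x: x, distribution)
e_const = expected_value(lambda x: 5, distribution)
e_3x = expected_value(lambda x: 3 * x, distribution)
e_3x_minus_5 = expected_value(lambda x: 3 * x - 5, distribution)
e_x_squared = expected_value(lambda x: x ** 2, distribution)
print(e_x, e_const, e_3x, e_3x_minus_5, e_x_squared)
```

Because the probabilities are stored as floating-point numbers, the printed values may differ from the exact answers by a tiny rounding error.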
Properties of Expectation
Let u(X) and v(X) be two functions of a random variable X and let c be a constant, then:
1. E(c) = c
2. E[c u(X)] = c E[u(X)]
3. E[u(X) + v(X)] = E[u(X)] + E[v(X)]
4. E[u(X) − v(X)] = E[u(X)] − E[v(X)]
5. In general, E[u(X) × v(X)] ≠ E[u(X)] × E[v(X)]
6. In general, E[u(X) ÷ v(X)] ≠ E[u(X)] ÷ E[v(X)]
Example 11:
E[(X − 20)²] = E(X² − 40X + 400) = E(X²) − 40E(X) + 400 = 500 − 40(20) + 400 = 100
Mean
Since the expected value of a random variable is the long-term average (or mean value) of the theoretical
population, it is identical to the population mean 𝜇 introduced in Chapter 3. We thus define the mean of a
random variable as follows:
Example 12:
Refer to Example 8, determine the mean number of tests that patients will have on entering a hospital.
μ = E(X) = Σ X P(X) = 0(6/15) + 1(5/15) + 2(3/15) + 3(1/15) = 14/15 ≅ 0.93
Let X be a random variable with probability distribution P(X) and mean μ; the variance of X is defined by
σ² = E[(X − μ)²] = Σ (X − μ)² P(X)
Example 13:
Find the mean, variance and standard deviation for the random variable in Example 10.
1. The mean is μ = E(X) = Σ X P(X) = 1(0.4) + 2(0.3) + 3(0.2) + 4(0.1) = 2
2. The variance is σ² = E[(X − μ)²] = Σ (X − 2)² P(X)
= (1 − 2)²(0.4) + (2 − 2)²(0.3) + (3 − 2)²(0.2) + (4 − 2)²(0.1) = 1
Example 14:
A committee of 3 members is to be selected at random from a group of 2 nurses and 4 doctors. Let X
denote the number of nurses in the selected committee; find the mean and the standard deviation of X.
1. To find the mean of X (the expected value of X), we must construct a probability distribution of X.
Here, the sample space is Ω = {DDD, DDN, DND, DNN, NDD, NDN, NND} and the space of X is
Ω_X = {0, 1, 2}.
The probability of no nurses in the committee is P(X = 0) = P(DDD) = (4/6) × (3/5) × (2/4) = 0.2
Similarly, P(X = 1) = 0.6 and P(X = 2) = 0.2
X      0     1     2
P(X)   0.2   0.6   0.2
Factorial Notation
0! = 1
(n + 1)! = (n + 1) × n!
Example 15:
Evaluate
a) 5! = 5 × 4 × 3 × 2 × 1 = 120
b) 1! = 1
c) 7!/5! = (7 × 6 × 5!)/5! = 7 × 6 = 42
d) 8!/(3! × 5!) = (8 × 7 × 6 × 5!)/((3 × 2 × 1) × 5!) = 8 × 7 = 56
Combinations Rule:
Let r and n be two nonnegative integers such that r ≤ n; then the number of combinations of n objects taken r at a
time is computed as follows:
nCr = C(n, r) = n! / (r! × (n − r)!)
Example 16:
Evaluate
a) 8C3 = C(8, 3) = 8!/(3! × (8 − 3)!) = (8 × 7 × 6 × 5!)/(3 × 2 × 1 × 5!) = 56
b) 10C1 = C(10, 1) = 10!/(1! × (10 − 1)!) = (10 × 9!)/(1 × 9!) = 10
c) 5C0 = C(5, 0) = 5!/(0! × (5 − 0)!) = 5!/(1 × 5!) = 1
d) 7C7 = C(7, 7) = 7!/(7! × (7 − 7)!) = 7!/(7! × 0!) = 7!/(7! × 1) = 1
e) 15C14 = C(15, 14) = 15!/(14! × (15 − 14)!) = (15 × 14!)/(14! × 1!) = (15 × 14!)/(14! × 1) = 15
14
Note:
1. C(n, 0) = C(n, n) = 1
2. C(n, 1) = C(n, n − 1) = n
3. C(n, r) = C(n, n − r)
Example 17:
a) C(100, 0) = 1
b) C(20, 20) = 1
c) C(25, 1) = 25
d) If C(10, 3) = 120, then C(10, 7) = 120
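The factorial formula and the three notes above can be checked with Python's standard library. In this sketch `n_choose_r` is an illustrative helper that follows the formula n!/(r! × (n − r)!) literally, while `math.comb` is the built-in equivalent.

```python
from math import comb, factorial

def n_choose_r(n, r):
    """Number of combinations, computed from n! / (r! * (n - r)!)."""
    return factorial(n) // (factorial(r) * factorial(n - r))

# Example 16, evaluated both ways.
for n, r in [(8, 3), (10, 1), (5, 0), (7, 7), (15, 14)]:
    print(n, r, n_choose_r(n, r), comb(n, r))

# Note 3: choosing r items is the same as leaving out n - r items.
symmetric = all(comb(10, r) == comb(10, 10 - r) for r in range(11))
print(symmetric)
```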
Note:
In the binomial distribution, the combination (nCr) is used to count the number of ways of arranging r successes
within n trials.
Example 18:
Suppose that 15% of people in a large population are left-handed. A random sample of size n = 6 is
selected from this population. In how many ways can the sample contain 2 left-handed people?
{L L R R R R} , {L R L R R R} , {L R R L R R} , {L R R R L R} , {L R R R R L} , {R L L R R R} ,
{R L R L R R} , {R L R R L R} , {R L R R R L} , {R R L L R R} , {R R L R L R} , {R R L R R L} ,
{R R R L L R}, {R R R L R L} , {R R R R L L}
As you see, there are 15 arrangements of 2 left-handed people within 6 people.
We can find the number of arrangements without listing each one of them by evaluating the following
combination: 6C2 = C(6, 2) = 6!/(2! × 4!) = (6 × 5)/(2 × 1) = 15
The outcomes of a binomial experiment and the corresponding probabilities of these outcomes constitute
the binomial distribution.
Let X denote the number of successes in n trials of a binomial experiment; the probability distribution of
X is
P(X) = C(n, X) p^X (1 − p)^(n − X),   X = 0, 1, 2, …, n
where C(n, X) = n!/((n − X)! X!) and n! = n(n − 1)(n − 2) … (2)(1)
The mean of X is μ = np
The variance of X is σ² = np(1 − p)
The standard deviation is σ = √(np(1 − p))
Example 19:
A survey on a large population found that one out of 5 people visited a doctor in any given month. If 10
people are selected at random, find:
a) The probability that exactly 3 will have visited a doctor last month.
This experiment is a binomial experiment for which n = 10, S: the selected person will have
visited the doctor last month, p = 1/5 = 0.2, and X is the number of people (from the 10) who will
have visited a doctor last month.
The probability distribution of X is P(x) = C(10, x) (0.2)^x (0.8)^(10 − x), x = 0, 1, 2, …, 10
Hence, the solution of Part (a) is P(3) = C(10, 3) (0.2)^3 (0.8)^7 = 0.201
b) The probability that at most 2 will have visited a doctor last month.
P(X ≤ 2) = P(0) + P(1) + P(2)
= C(10, 0) (0.2)^0 (0.8)^10 + C(10, 1) (0.2)^1 (0.8)^9 + C(10, 2) (0.2)^2 (0.8)^8 = 0.678
c) The probability that at least 3 will have visited a doctor last month.
P(X ≥ 3) = P(3) + P(4) + ⋯ + P(10) = 1 − [P(0) + P(1) + P(2)] = 1 − 0.678 = 0.322
P(X = 0) = P(0) = C(10, 0) (0.2)^0 (0.8)^10 = 0.107
e) The probability that all the 10 will have visited a doctor last month.
P(X = 10) = P(10) = C(10, 10) (0.2)^10 (0.8)^0 ≅ 0.0
f) The expected number of people (from the 10) who will have visited a doctor last month.
E(X) = μ = np = 10(0.2) = 2
g) The variance of the number of people (from the 10) who will have visited a doctor last month.
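The calculations in Example 19 follow directly from the binomial formula, so they are easy to reproduce. The sketch below uses only the standard library; `binomial_pmf` is an illustrative name for P(X = x).

```python
from math import comb, sqrt

def binomial_pmf(x, n, p):
    """P(X = x) for a binomial random variable with n trials and success probability p."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

n, p = 10, 0.2
p_exactly_3 = binomial_pmf(3, n, p)
p_at_most_2 = sum(binomial_pmf(x, n, p) for x in range(3))
p_at_least_3 = 1 - p_at_most_2  # complement of "at most 2"

mean = n * p
variance = n * p * (1 - p)
std_dev = sqrt(variance)

print(round(p_exactly_3, 3), round(p_at_most_2, 3), round(p_at_least_3, 3))
print(mean, variance, std_dev)
```

Using the complement for "at least 3" avoids summing eight separate terms, which mirrors the shortcut taken in part (c).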
Example 20:
The following histograms show the probability distributions for binomial random variables with 𝑛 = 10
and 𝑝 = 0.2, 0.5, 0.9
[Figure: histogram of the binomial distribution with n = 10 and p = 0.2; horizontal axis X from 0 to 10, vertical axis Probability]
𝑥 0 1 2 3 4 5 6 7 8 9 10
𝑃(𝑥) .107 .268 .302 .201 .088 .026 .006 .001 .000 .000 .000
[Figure: histogram of the binomial distribution with n = 10 and p = 0.5; horizontal axis X from 0 to 10, vertical axis Probability]
𝑥 0 1 2 3 4 5 6 7 8 9 10
𝑃(𝑥) .001 .010 .044 .117 .205 .246 .205 .117 .044 .010 .001
[Figure: histogram of the binomial distribution with n = 10 and p = 0.9; horizontal axis X from 0 to 10, vertical axis Probability]
𝑥 0 1 2 3 4 5 6 7 8 9 10
𝑃(𝑥) .000 .000 .000 .001 .006 .026 .088 .201 .302 .268 .107
PROBLEMS
1. Omar is taking 4 courses for the semester. He believes that the probability distribution for the random
variable 𝑋 = number of courses for which he will get an A grade is given below:
𝑋 0 1 2 3 4
𝑃(𝑋) .15 .20 .30 .20 .15
𝑋 0 1 2 3 4 5
𝑃(𝑋) .10 .30 .40 .10 𝑐 .05
3. 8 pints of blood are stored in a hospital laboratory. It is known that exactly 3 pints are type O, but it is
not known which ones. 2 pints of type O blood are needed. One pint at a time is removed and typed. If it
is type O, it is used; if not, it is labeled and the next pint is tested.
a) Make a probability distribution for the number of pints that must be tested in order to obtain 2 pints
of type O.
𝑋 2 3 4 5 6 7
𝑃(𝑋) 3/28 5/28 6/28 6/28 5/28 3/28
4. Suppose X has a binomial distribution with n = 10 and p = 0.4. Determine the following:
a) The probability distribution of X     P(x) = C(10, x) (0.4)^x (0.6)^(10 − x), x = 0, 1, …, 10
b) 𝑃(𝑋 = 3) 0.215
c) 𝑃(𝑋 ≥ 4) 0.618
d) 𝑃(𝑋 ≥ 9) 0.002
e) 𝑃( 𝑋 = 3 | 𝑋 ≤ 3 ) 0.562
f) Mean of 𝑋 4
g) Variance of 𝑋 2.4
h) 𝐸(𝑋 2 ) 18.4
5. It is known that 20% of a certain variety of flower bulb will not grow. If 15 bulbs are planted, what is the
probability that:
a) What is the probability that, in 6 single births, at least half the babies born are females?
0.633
b) Compare this probability with the result you would obtain if you used a gender ratio of 1 to 1?
0.656
7. Suppose that 30% of adults in a certain large population are smokers. A random sample of size 20 is
selected from this population, find:
b) The probability that the number of smokers in the sample is more than expected. 0.392
8. A multiple-choice quiz has 8 questions with 4 responses (one correct) on each question. To pass, you
must get at least 5 correct.
a) If you guess on every question, what is the probability that you will pass? 0.027
b) If the probability of correct answer is 0.9, what is the probability that you will pass? 0.995
9. Let 𝑋 be a binomial random variable such that the mean is 𝜇 = 5 and the variance is 𝜎 2 = 3.75, find
P(X = 5)   0.202
Section 5-1
Examples: 1, 2, 3, 4
Exercises: 7, 9, 11, 13, 15, 17, 18, 19
Section 5-2
Examples: 6, 10, 11, 12, 13
Exercises: 1, 3, 7, 9, 12
Section 5-3
Examples: 15, 16, 17, 20, 23
Exercises: 3, 4, 5, 11, 17, 28
Introduction
When a random variable X is discrete, you can assign a positive probability to each value that X can take and
get the probability distribution of X. The sum of all probabilities associated with the different values of X is one.
However continuous random variables, such as heights, weights, length of life of a particular product, or time
required to complete a task, can assume the infinitely many values corresponding to points on a line interval.
If you try to assign a positive probability to each of these uncountable values, the probabilities will no longer
sum to 1. Therefore, you must use a different approach to find the probabilities for continuous random
variables.
Definition:
Let X be a continuous random variable with space Ω_X. The probability distribution of X is a function f that
satisfies the following:
1. f(x) ≥ 0, x ∈ Ω_X
2. ∫_{Ω_X} f(x) dx = 1, which means that the area under the curve of f is 1
Example 1:
Determine whether each of the following is a probability distribution of a continuous random variable or
not:
[Figure: three curves labeled (a), (b), and (c)]
The curves in parts (a) and (c) are probability distributions because the area under each of these curves is one.
The area under the curve in part (b) is 1.5, so it is not a probability distribution.
Properties:
Let X be a continuous random variable with space Ω_X and probability distribution f(x), then:
1. P(a ≤ X ≤ b) = ∫_a^b f(x) dx = the area under the curve of f from x = a to x = b
4. The kth percentile, P_k, can be evaluated by solving the equation P(X ≤ P_k) = k/100. This means the
area under the curve of f(x) to the left of x = P_k is k/100
Example 2:
The probability density function of the time it takes a hematology cell counter to complete a test on a blood
sample is f(x) = 0.04, 50 < x < 75 seconds.
a) Find the probability that a test will require exactly 70 seconds to complete.
P(X = 70) = 0, since X is a continuous random variable
b) Find the probability that a test will require more than 70 seconds to complete.
P(X > 70) = 0.04 × (75 − 70) = 0.2
c) What is the percentage of tests that require less than one minute to complete?
P(X < 60) = 0.04 × (60 − 50) = 0.4, that is, 40% of tests.
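Because the density in Example 2 is constant, every probability is the area of a rectangle: height 0.04 times the width of the interval. A minimal sketch of that idea, with `prob_between` as an illustrative helper name:

```python
LOW, HIGH = 50.0, 75.0
DENSITY = 0.04  # f(x) on (LOW, HIGH); zero elsewhere

def prob_between(a, b):
    """P(a < X < b): area of the rectangle under f between a and b."""
    a, b = max(a, LOW), min(b, HIGH)  # clip the interval to where f is nonzero
    return DENSITY * max(b - a, 0.0)

total_area = prob_between(LOW, HIGH)   # must equal 1 for a valid density
p_exactly_70 = prob_between(70, 70)    # a single point has zero width
p_more_than_70 = prob_between(70, HIGH)
p_less_than_60 = prob_between(LOW, 60)
print(total_area, p_exactly_70, p_more_than_70, p_less_than_60)
```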
0.8
of rats, and cholesterol levels of adults, have distributions that are bell-shaped, and these are called
Definition:
f(x) = (1/√(2πσ²)) e^(−(x − μ)²/(2σ²)),   −∞ < x < ∞,   e ≅ 2.7 and π ≅ 3.14
is a normal random variable with mean μ and variance σ², and the notation X: N(μ, σ²) is used to denote
the distribution.
3. The measures of central tendency (mean, median and mode) are equal
4. The area under a normal distribution curve is equal to 1, hence the area to the right of 𝜇 equals the
5. The shape of the distribution is determined by the variance, 𝜎 2 . Large values of 𝜎 2 reduce the
height of the curve and increase the spread; small values of 𝜎 2 increase the height of the curve and
6. About 99.7% of the area under the normal curve lies within 3 standard deviations of the mean.
[Figure: normal curves with mean 120 and standard deviations 10 and 20; horizontal axis from 60 to 180]
[Figure: normal curves with (mean, standard deviation) = (0, 0.45), (0, 1.4), (0, 2.24), and (−2, 0.71); horizontal axis from −5 to 5]
Example 3:
Given that P(75 < X < 80) = 0.34 and P(75 < X < 85) = 0.48, find
g) The 84th percentile → P_84 = 80 since the area to the left of 80 is 0.84.
In part (c) you can see that the area to the left of 80 is 0.84, so 80 is the 84th percentile.
these curves will vary. In particular applications you would have to have a table of areas under the curve for
each variable. To simplify this situation, statisticians use what is called the standard normal distribution.
Definition:
The Standard normal distribution is a normal distribution with a mean of 0 and a standard deviation
of 1. We will use, 𝑍, to denote a standard normal variable.
Example 4:
d) P(Z < 2/3) = P(Z < 0.67) = 0.5 + P(0 < Z < 0.67) = 0.5 + 0.2486 = 0.7486
Example 5:
e) P(1 < Z < c) = 0.1557 → P(0 < Z < c) − P(0 < Z < 1) = P(0 < Z < c) − 0.3413 = 0.1557 →
P(0 < Z < c) = 0.497 → c = 2.75
Theorem:
Let X be a normal random variable with a mean of μ and a standard deviation of σ, and let
Z = (X − μ)/σ
Then the random variable Z is a standard normal random variable.
Example 6:
b) P(X > 38) = P(Z > −1.2) = 0.5 + P(−1.2 < Z < 0) = 0.5 + 0.3849 = 0.8849
d) Find the value of c if P(X < c) = 0.9 → P(Z < d) = 0.9 → P(0 < Z < d) = 0.4, where d = (c − 50)/10
→ d ≅ 1.28 → c = 50 + 1.28 × 10 = 62.8.
e) P_5 → P(X < P_5) = 0.05 → P(Z < d) = 0.05 → P(d < Z < 0) = 0.45, where d = (P_5 − 50)/10
→ d ≅ −1.645 → P_5 = 50 − 1.645 × 10 = 33.55.
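Table lookups like those in Example 6 can be cross-checked with `statistics.NormalDist` from the Python standard library (Python 3.8 or later). The parameters μ = 50 and σ = 10 are inferred from the standardization steps shown in the example, so treat them as an assumption.

```python
from statistics import NormalDist

X = NormalDist(mu=50, sigma=10)  # assumed parameters, inferred from the example

p_above_38 = 1 - X.cdf(38)  # part (b): area to the right of 38
c = X.inv_cdf(0.90)         # part (d): value with 90% of the area to its left
p5 = X.inv_cdf(0.05)        # part (e): the 5th percentile

# The same probability via the z-score and the standard normal distribution.
Z = NormalDist()            # mean 0, standard deviation 1
z = (38 - 50) / 10
p_above_38_via_z = 1 - Z.cdf(z)

print(round(p_above_38, 4), round(c, 2), round(p5, 2))
```

The software values differ slightly from the table-based answers because the table rounds z to two decimal places (for example, 1.28 instead of about 1.2816).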
Example 7:
If P(X > 121) = 0.1587 → P(Z > d) = 0.1587 → P(0 < Z < d) = 0.3413, where d = 21/σ
→ d = 1.00 → σ = 21/1.00 → σ² = (21/1.00)² = 441
The standard normal distribution curve can be used to solve a wide variety of practical problems. The only
requirement is that the variable be normally or approximately normally distributed. To solve problems by
using the standard normal distribution, transform the original variable to a standard normal and use the
Example 8:
A study shows that the systolic blood pressures for adults in a certain population are approximately
normally distributed with mean of 120 and standard deviation of 8.
a) Find the probability that a randomly selected person will have a blood pressure between 110 and 130.
P(110 < X < 130) = P(−1.25 < Z < 1.25) = 2P(0 < Z < 1.25) = 0.7888.
b) What percentage of the population have blood pressures less than 140?
P(X < 140) = P(Z < 2.5) = 0.5 + P(0 < Z < 2.5) = 0.9938
c) For a medical study, a researcher wishes to select people in the middle 60% of the population based on
blood pressure. Find the upper and lower readings that would qualify people to participate in the study.
d) A sample of 5 people is selected at random; find the probability that exactly 2 of the selected people have
blood pressure less than 130.
The probability that a person will have a blood pressure less than 130 is
P(X < 130) = P((X − 120)/8 < (130 − 120)/8) = P(Z < 1.25) = 0.5 + P(0 < Z < 1.25) = 0.8944
The number of people in the sample whose blood pressure is less than 130, say Y, is a binomial random
variable with n = 5 and p = 0.8944
We want to find P(Y = 2) = C(5, 2) (0.8944)² (1 − 0.8944)³ = 10 × (0.8944)² × (0.1056)³ = 0.009
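Part (d) of Example 8 chains two ideas: a normal probability supplies p, and the binomial formula then gives the chance of exactly 2 successes in 5 trials. A short sketch of that chain, using `statistics.NormalDist` in place of the table (so p is not rounded to four decimals):

```python
from math import comb
from statistics import NormalDist

pressure = NormalDist(mu=120, sigma=8)

# Part (a): area between 110 and 130.
p_between = pressure.cdf(130) - pressure.cdf(110)

# Part (d), step 1: probability that one person is below 130.
p_below_130 = pressure.cdf(130)

# Part (d), step 2: binomial probability of exactly 2 such people out of 5.
n, k = 5, 2
p_two_of_five = comb(n, k) * p_below_130**k * (1 - p_below_130)**(n - k)

print(round(p_between, 4), round(p_below_130, 4), round(p_two_of_five, 3))
```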
Example 9:
The final exam scores in a statistics class are normally distributed with a mean of 70 and a standard
deviation of 10.
a) If the lowest passing mark is 60, what proportion of the class fails?
P(X < 60) = P(Z < −1) = 0.5 − 0.3413 = 0.1587.
To be more accurate we must evaluate P(X < 59.5) = 0.1469
b) If the highest 80% are to pass, what should be the lowest passing score?
P(X > c) = 0.8 → P(Z > d) = 0.8 → P(d < Z < 0) = 0.3, where d = (c − 70)/10
Example 10:
The sick-leave time of employees in a firm in a month is normally distributed with a mean of 100 hours and a
standard deviation of 20 hours.
a) What is the probability that the sick-leave time for next month will be between 50 and 80 hours?
Let X denote the sick-leave time per month; then X: N(100, 20²), and we want to find P(50 < X < 80)
b) How much time should be budgeted for sick leave if the budgeted amount should be exceeded with a
probability of only 10%?
Let c denote the time that should be budgeted for sick leave; then P(X > c) = 0.1
P(X > c) = P(Z > (c − 100)/20) = 0.1 → P(0 < Z < (c − 100)/20) = 0.5 − 0.1 = 0.4
By using the table, it is found that (c − 100)/20 = 1.28 → c = 100 + 1.28(20) = 125.6 hours.
interested in knowing how the means of samples of the same size taken from the same population vary
A sampling distribution of sample means is a distribution using the means computed from all possible
random samples of a specific size taken from a population.
Example 11:
a) If a random sample of size n = 2 is to be selected from this population (the selection is with
replacement), find the sampling distribution of the sample mean X̄ and evaluate μ_X̄ and σ_X̄
The following table shows all possible samples with their corresponding means.
Sample 1,1 1,3 1,5 1,7 3,1 3,3 3,5 3,7 5,1 5,3 5,5 5,7 7,1 7,3 7,5 7,7
X̄      1   2   3   4   2   3   4   5   3   4   5   6   4   5   6   7
The sampling distribution of X̄ is
X̄      1     2     3     4     5     6     7
P(X̄)   1/16  2/16  3/16  4/16  3/16  2/16  1/16
μ_X̄ = E(X̄) = Σ X̄ P(X̄) = 1(1/16) + 2(2/16) + 3(3/16) + 4(4/16) + 5(3/16) + 6(2/16) + 7(1/16) = 4 = μ
σ²_X̄ = E[(X̄ − μ_X̄)²] = Σ (X̄ − 4)² P(X̄)
= (1 − 4)²(1/16) + (2 − 4)²(2/16) + (3 − 4)²(3/16) + (4 − 4)²(4/16) + (5 − 4)²(3/16) + (6 − 4)²(2/16) + (7 − 4)²(1/16)
= 5/2 = σ²/n
The following graph shows the distribution of $\bar{X}$ and compares it with the normal distribution.
[Figure: distribution of $\bar{X}$ for n = 2 over the values 1 to 7, compared with a normal curve]
b) If a random sample of size $n = 3$ is to be selected from this population (the selection is with
replacement), find the sampling distribution of the sample mean $\bar{X}$ and evaluate $\mu_{\bar{x}}$ and $\sigma_{\bar{x}}$.
The following table shows all possible samples together with their means.
Sample 1,1,1 1,1,3 1,1,5 1,1,7 1,3,1 1,3,3 1,3,5 1,3,7 1,5,1 1,5,3 1,5,5 1,5,7 1,7,1
𝑋 1 5/3 7/3 3 5/3 7/3 3 11/3 7/3 3 11/3 13/3 3
Sample 1,7,3 1,7,5 1,7,7 3,1,1 3,1,3 3,1,5 3,1,7 3,3,1 3,3,3 3,3,5 3,3,7 3,5,1 3,5,3
𝑋 11/3 13/3 5 5/3 7/3 3 11/3 7/3 3 11/3 13/3 3 11/3
Sample 3,5,5 3,5,7 3,7,1 3,7,3 3,7,5 3,7,7 5,1,1 5,1,3 5,1,5 5,1,7 5,3,1 5,3,3 5,3,5
𝑋 13/3 5 11/3 13/3 5 17/3 7/3 3 11/3 13/3 3 11/3 13/3
Sample 5,3,7 5,5,1 5,5,3 5,5,5 5,5,7 5,7,1 5,7,3 5,7,5 5,7,7 7,1,1 7,1,3 7,1,5 7,1,7
𝑋 5 11/3 13/3 5 17/3 13/3 5 17/3 19/3 3 11/3 13/3 5
Sample 7,3,1 7,3,3 7,3,5 7,3,7 7,5,1 7,5,3 7,5,5 7,5,7 7,7,1 7,7,3 7,7,5 7,7,7
𝑋 11/3 13/3 5 17/3 13/3 5 17/3 19/3 5 17/3 19/3 7
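$\mu_{\bar{x}} = E(\bar{X}) = \sum \bar{X} P(\bar{X}) = 4 = \mu$ and $\sigma_{\bar{x}}^2 = E(\bar{X} - \mu_{\bar{x}})^2 = \sum (\bar{X} - 4)^2 P(\bar{X}) = \frac{5}{3} = \frac{\sigma^2}{n}$
The following graph shows the distribution of $\bar{X}$ and compares it with the normal distribution.
[Figure: distribution of $\bar{X}$ for n = 3, compared with a normal curve]
Notice that the distribution of $\bar{X}$ when $n = 3$ is closer to the normal distribution than the distribution when
$n = 2$.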
If a random sample of $n$ observations is selected from a large population with mean $\mu$ and variance $\sigma^2$,
then
1. The sampling distribution of the sample mean $\bar{X}$ will have mean $\mu_{\bar{X}} = \mu$ and variance $\sigma_{\bar{X}}^2 = \frac{\sigma^2}{n}$.
2. The standard deviation of $\bar{X}$ is called the standard error of $\bar{X}$: $S.E(\bar{X}) = \sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}}$.
3. If the population has a normal distribution, the sampling distribution of $\bar{X}$ will be exactly
normally distributed, regardless of the sample size $n$.
4. If the population distribution is not normal, the sampling distribution of $\bar{X}$ will be
approximately normally distributed for large samples ($n \geq 30$). This theorem is called the
Central Limit Theorem.
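The first property (mean $\mu$, variance $\sigma^2/n$) can be verified by brute force for a small population. The sketch below enumerates every ordered sample drawn with replacement; the population values 1, 3, 5, 7 are an assumption inferred from the sample tables of Example 11, and the function names are illustrative.

```python
from fractions import Fraction
from itertools import product

def sampling_distribution(population, n):
    """Return {sample mean: probability} over all ordered samples of size n
    drawn with replacement, each sample being equally likely."""
    dist = {}
    weight = Fraction(1, len(population) ** n)
    for sample in product(population, repeat=n):
        sample_mean = Fraction(sum(sample), n)
        dist[sample_mean] = dist.get(sample_mean, 0) + weight
    return dist

def mean_and_variance(dist):
    """Mean and variance of a discrete distribution given as {value: probability}."""
    mu = sum(x * p for x, p in dist.items())
    var = sum((x - mu) ** 2 * p for x, p in dist.items())
    return mu, var

population = [1, 3, 5, 7]   # assumed population (mean 4, variance 5)
for n in (2, 3):
    mu, var = mean_and_variance(sampling_distribution(population, n))
    print(n, mu, var)       # the variance should equal 5/n
```

Exact fractions are used so that the variances come out as 5/2 and 5/3 rather than rounded decimals.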
Example 12:
A random sample is selected from a population the mean of which is 𝜇 = 120 and the standard deviation is
𝜎 = 8.
b) How large must the random sample be if we want the standard error of the sample mean to be 0.5 or
less?
$S.E(\bar{X}) = \frac{8}{\sqrt{n}} \leq 0.5 \rightarrow \frac{\sqrt{n}}{8} \geq 2 \rightarrow \sqrt{n} \geq 16 \rightarrow n \geq 256$; the sample size must be 256 or more.
c) How large must the random sample be if we want the standard error of the sample mean to be 2 or less?
$S.E(\bar{X}) \leq 2 \rightarrow \frac{8}{\sqrt{n}} \leq 2 \rightarrow n \geq \left(\frac{8}{2}\right)^2 = 16$. The sample size must be 16 or more.
d) If the sample size is very large, how does this affect the expected value of the sample mean and its
standard error?
The expected value of the sample mean is $E(\bar{X}) = \mu$ for any sample size.
The standard error gets smaller as the sample size increases; when the sample size is very large, the
standard error will be close to 0.
Example 13:
The weights of 10-year-old boys are normally distributed with mean of 44 kilograms and standard deviation
of 5 kilograms.
a) A boy is selected at random, what is the probability that he weighs between 43 and 45 kilograms?
$P(43 < X < 45) = P\left(\frac{43 - 44}{5} < \frac{X - 44}{5} < \frac{45 - 44}{5}\right) = P(-0.2 < Z < 0.2) = 2(0.0793) = 0.1586$.
b) 4 boys are to be selected, what is the probability that their mean weight will be between 43 and 45
kilograms?
Here $\bar{X} \sim N\left(44, \frac{25}{4}\right) \rightarrow P(43 < \bar{X} < 45) = P(-0.4 < Z < 0.4) = 2(0.1554) = 0.3108$
c) A random sample of 25 boys is selected, what is the probability that their mean weight will be between
43 and 45 kilograms?
Here $\bar{X} \sim N\left(44, \frac{25}{25}\right) = N(44, 1) \rightarrow P(43 < \bar{X} < 45) = P(-1 < Z < 1) \cong 2(0.3413) = 0.6826$
d) Suppose that all random samples of size 25 are selected. Give an interval that is symmetric about the population
mean and contains about 95% of all the sample means.
Here, $\bar{X} \sim N(44, 1)$ as shown in the previous part.
We want to find two numbers $a$ and $b$ such that
$P(a < \bar{X} < b) = 0.95$, where 44 is at the center between $a$ and $b$.
$\rightarrow P(a < \bar{X} < 44) = P(44 < \bar{X} < b) = \frac{0.95}{2} = 0.475$.
From the table, $z = 1.96$, so $a = 44 - 1.96(1) \cong 42$ and $b = 44 + 1.96(1) \cong 46$ kilograms.
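The four parts of Example 13 differ only in the standard error $\sigma/\sqrt{n}$, which the following sketch makes explicit. It is an illustrative check with Python's standard library; the helper name `prob_mean_between` is hypothetical, and exact values may differ from table-based answers in the fourth decimal.

```python
from math import sqrt
from statistics import NormalDist

MU, SIGMA = 44, 5   # population mean and standard deviation from Example 13

def prob_mean_between(lo, hi, n):
    """P(lo < sample mean < hi) when the population is N(MU, SIGMA^2)."""
    x_bar = NormalDist(MU, SIGMA / sqrt(n))   # standard error = sigma / sqrt(n)
    return x_bar.cdf(hi) - x_bar.cdf(lo)

# Parts a) to c): one boy (n = 1), 4 boys, 25 boys
for n in (1, 4, 25):
    print(n, round(prob_mean_between(43, 45, n), 4))

# Part d): symmetric interval holding about 95% of the sample means for n = 25
x_bar_25 = NormalDist(MU, SIGMA / sqrt(25))
a, b = x_bar_25.inv_cdf(0.025), x_bar_25.inv_cdf(0.975)
print(round(a, 2), round(b, 2))
```

The probability of landing within 1 kilogram of the mean grows with $n$ because the sample mean becomes less variable.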
Example 14:
The average number of milligrams of sodium in a certain brand of low-salt microwave frozen dinners is 660
mg, and the standard deviation is 35 mg.
a) If a sample of 10 dinners is selected, can you find the probability that the mean of the sample will be
larger than 670 mg? If yes, find it.
We cannot find the probability because the distribution of the sample mean is unknown: the population is not
stated to be normal, and the Central Limit Theorem cannot be applied since $n = 10 < 30$.
b) If a sample of 50 dinners is selected, can you find the probability that the mean of the sample will be
larger than 670 mg? If yes, find it
Here, we can apply the Central Limit Theorem since $n = 50 > 30$.
$P(\bar{X} > 670) = P\left(\frac{\bar{X} - 660}{35/\sqrt{50}} > \frac{670 - 660}{35/\sqrt{50}}\right) = P(Z > 2.02) = 0.5 - 0.4783 = 0.0217$
PROBLEMS
1. Suppose that $f(x) = \frac{1}{2}$, $3 < x < 5$ is a probability distribution function, find:
a) 𝑃 𝑋 < 46 0.9332
b) 𝑃 𝑋 > 38 0.6915
c) 𝑃 32 < 𝑋 < 48 0.9544
d) 𝑃 −2 < 𝑋 < 36 0.1587
e) 𝑐 so that 𝑃 𝑋 < 𝑐 = .95 46.58
f) 𝑐 so that 𝑃 𝑐 < 𝑋 < 40 = .3 36.64
g) 𝑐 so that 𝑃 −𝑐 < 𝑋 − 40 < 𝑐 = .95 7.84
h) The first quartile, 𝑄1 37.32
i) The third quartile, 𝑄3 42.68
5. The IQs of individuals admitted to a school for the mentally retarded are approximately normally
distributed with a mean of 60 and a standard deviation of 10.
6. The time it takes a cell to divide is normally distributed with an average time of one hour and a
standard deviation of 5 minutes.
a) What is the probability that it takes a cell more than 65 minutes to divide? 0.1587
b) What is the proportion of cells that divide in less than 45 minutes? 0.0013
c) What division time is exceeded by 15% of cells? 65.2 minutes
7. A nurse supervisor has found that staff nurses, on the average, complete a certain task in 10 minutes. If
the times required to complete the task are approximately normally distributed with a standard
deviation of 3 minutes, find:
a) The proportion of nurses completing the task in less than 4 minutes. 0.0228
b) The proportion of nurses requiring more than 5 minutes to complete the task. 0.9525
c) The probability that a nurse will complete the task within 3 minutes. 0.0099
8. A doctor goes daily from his home to his clinic. The trip takes 24 minutes on the average, with a
standard deviation of 4 minutes. Assume the distribution of trip times is normally distributed.
a) What is the probability that a trip will take at least 15 minutes? 0.9878
b) If the clinic opens at 9:00 a.m. and he leaves his house at 8:30 a.m. daily, on what proportion of days will he
be late for work? 0.0668
9. Random samples of size 𝑛 were selected from populations with the means and variances given here.
Find the mean and the standard error of the sample mean in each case:
a) If the sampled populations are normal, what is the sampling distribution of 𝑋 for parts a, b, and c.
𝑋 is normally distributed
b) According to the Central Limit Theorem, if the sampled populations are not normal, what can be
said about the sampling distribution of 𝑋 for parts a, b, and c.
If the population is not normal then the sample mean 𝑋 is approximately normally distributed in parts
(a) and (b) since the sample size 𝑛 is greater than 30 in these parts.
In part (c) the distribution of $\bar{X}$ cannot be assumed to be normal since the sample size $n$ is less than 30.
11. A normal population has mean 100 and variance 40. How large must a random sample be if we want
standard error of the sample mean to be 1.5 or less? 18 or more
12. Suppose a random sample of 𝑛 = 25 observations is to be selected from a population that is normally
distributed, with mean equal to 106 and standard deviation equal to 12.
a) Find the probability that the sample mean 𝑋 exceeds 110. 0.0475
b) Find the probability that the sample mean deviates from the population mean by no more than 4
units. 0.905
13. The IQ scores of students at a certain university are normally distributed with a mean of 125 and a
standard deviation of 14.
a) What is the percentage of scores that are greater than 146? 6.68%
b) What is the probability that a random sample of 49 students will have a mean IQ score greater than
128? 0.0668
c) What is the probability that the mean will be between 122.5 and 126.0? 0.5859
14. The normal daily human potassium requirement is in the range of 2000 to 6000 mg. Suppose that the
amount of potassium in a banana is normally distributed with mean 630 mg and standard deviation of
40 mg. If you eat 3 bananas per day, find the probability that your total daily intake of potassium from
the 3 bananas will be in the required range. 0.0559
See and solve the following examples and exercises from the textbook:
Section 6 – 1 :
Examples: 1, 2, 3, 4, 5
Exercises: All odd-number problems
Section 6 – 2 :
Examples: 6, 7, 8, 9, 10
Exercises: 5, 8, 10, 11, 14, 25, 26, 35, 38
Section 6 – 3 :
Examples: 13, 14, 15
Exercises: 9, 11, 15, 17, 18, 23
AREAS UNDER THE NORMAL DISTRIBUTION
z .00 .01 .02 .03 .04 .05 .06 .07 .08 .09
0.0 .0000 .0040 .0080 .0120 .0160 .0199 .0239 .0279 .0319 .0359
0.1 .0398 .0438 .0478 .0517 .0557 .0596 .0636 .0675 .0714 .0753
0.2 .0793 .0832 .0871 .0910 .0948 .0987 .1026 .1064 .1103 .1141
0.3 .1179 .1217 .1255 .1293 .1331 .1368 .1406 .1443 .1480 .1517
0.4 .1554 .1591 .1628 .1664 .1700 .1736 .1772 .1808 .1844 .1879
0.5 .1915 .1950 .1985 .2019 .2054 .2088 .2123 .2157 .2190 .2224
0.6 .2257 .2291 .2324 .2357 .2389 .2422 .2454 .2486 .2517 .2549
0.7 .2580 .2611 .2642 .2673 .2704 .2734 .2764 .2794 .2823 .2852
0.8 .2881 .2910 .2939 .2967 .2995 .3023 .3051 .3078 .3106 .3133
0.9 .3159 .3186 .3212 .3238 .3264 .3289 .3315 .3340 .3365 .3389
1.0 .3413 .3438 .3461 .3485 .3508 .3531 .3554 .3577 .3599 .3621
1.1 .3643 .3665 .3686 .3708 .3729 .3749 .3770 .3790 .3810 .3830
1.2 .3849 .3869 .3888 .3907 .3925 .3944 .3962 .3980 .3997 .4015
1.3 .4032 .4049 .4066 .4082 .4099 .4115 .4131 .4147 .4162 .4177
1.4 .4192 .4207 .4222 .4236 .4251 .4265 .4279 .4292 .4306 .4319
1.5 .4332 .4345 .4357 .4370 .4382 .4394 .4406 .4418 .4429 .4441
1.6 .4452 .4463 .4474 .4484 .4495 .4505 .4515 .4525 .4535 .4545
1.7 .4554 .4564 .4573 .4582 .4591 .4599 .4608 .4616 .4625 .4633
1.8 .4641 .4649 .4656 .4664 .4671 .4678 .4686 .4693 .4699 .4706
1.9 .4713 .4719 .4726 .4732 .4738 .4744 .4750 .4756 .4761 .4767
2.0 .4772 .4778 .4783 .4788 .4793 .4798 .4803 .4808 .4812 .4817
2.1 .4821 .4826 .4830 .4834 .4838 .4842 .4846 .4850 .4854 .4857
2.2 .4861 .4864 .4868 .4871 .4875 .4878 .4881 .4884 .4887 .4890
2.3 .4893 .4896 .4898 .4901 .4904 .4906 .4909 .4911 .4913 .4916
2.4 .4918 .4920 .4922 .4925 .4927 .4929 .4931 .4932 .4934 .4936
2.5 .4938 .4940 .4941 .4943 .4945 .4946 .4948 .4949 .4951 .4952
2.6 .4953 .4955 .4956 .4957 .4959 .4960 .4961 .4962 .4963 .4964
2.7 .4965 .4966 .4967 .4968 .4969 .4970 .4971 .4972 .4973 .4974
2.8 .4974 .4975 .4976 .4977 .4977 .4978 .4979 .4979 .4980 .4981
2.9 .4981 .4982 .4982 .4983 .4984 .4984 .4985 .4985 .4986 .4987
3.0 .4987 .4987 .4987 .4988 .4988 .4989 .4989 .4989 .4990 .4990
3.1 .4990 .4991 .4991 .4991 .4992 .4992 .4992 .4992 .4993 .4993
3.2 .4993 .4993 .4994 .4994 .4994 .4994 .4994 .4995 .4995 .4995
3.3 .4995 .4995 .4996 .4996 .4996 .4996 .4996 .4996 .4996 .4997
3.4 .4997 .4997 .4997 .4997 .4997 .4997 .4997 .4997 .4998 .4998
3.5 .4998 .4998 .4998 .4998 .4998 .4998 .4998 .4998 .4998 .4998
BIOSTATISTICS
Introduction
As stated in Chapter 1, statistics is divided into two branches: descriptive and inferential. In this chapter we
will learn about the first area of inferential statistics, which is estimation. Before discussing this topic, it is
helpful to recall the following concepts:
A parameter is a measure that describes the population of interest, such as the population mean 𝜇, population
proportion 𝑝, and population standard deviation 𝜎. The parameters are considered as constants and they are
difficult to evaluate for large populations, so they are estimated.
A statistic is a measure that describes a sample selected from the population, such as the sample mean $\bar{x}$,
sample proportion $\hat{p}$, and sample standard deviation $s$. The statistics are considered as random variables and
they are computed to estimate the population parameters.
Estimation is the process of estimating the value of a parameter from information obtained from a sample.
Point Estimation
A point estimate is a specific numerical value estimate of a parameter. The best point estimator of the
population mean $\mu$ is the sample mean $\bar{x}$.
For example, suppose a college president wishes to estimate the mean age of students attending classes this
semester. The president could select a random sample of 100 students and find the mean age of these students,
say 21.3 years. From the sample mean, the president could infer that the mean age of all the students is 21.3
years. This type of estimation is called point estimation.
Example 1:
A random sample of 20 men was selected and gave the following data for the hemoglobin reading. Based on
this sample, estimate the mean hemoglobin reading for all men in the population.
17 18 16 19 15 17 15 16 18 15
13 15 14 17 15 17 15 16 14 14
The best point estimator is the sample mean. Here, $\bar{x} = \frac{\sum x}{n} = \frac{17 + 18 + \cdots + 14}{20} = 15.8$
The point estimate for the proportion of men whose hemoglobin readings are above 17 is $\hat{p} = \frac{3}{20} = 0.15$
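Both point estimates in Example 1 can be recomputed from the raw readings. The snippet below is a small illustrative check in Python; the variable names are not part of the original notes.

```python
from statistics import mean

# Hemoglobin readings of the 20 men in Example 1
readings = [17, 18, 16, 19, 15, 17, 15, 16, 18, 15,
            13, 15, 14, 17, 15, 17, 15, 16, 14, 14]

x_bar = mean(readings)                                   # point estimate of mu
p_hat = sum(r > 17 for r in readings) / len(readings)    # share of readings above 17

print(x_bar, p_hat)
```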
1. The estimator should be an unbiased estimator. That is, the expected value of the estimator is equal to
the parameter being estimated.
2. The estimator should be a consistent estimator. That is, the value of the estimator approaches the value
of the parameter estimated as the sample size increases.
Example 2:
Show that the sample mean $\bar{x}$ is an unbiased and consistent estimator of the population mean $\mu$.
2. To show that $\bar{x}$ is a consistent estimator, we must find the variance of $\bar{x}$. From the sampling
distribution of the sample mean, $var(\bar{x}) = \frac{\sigma^2}{n}$. The variance of $\bar{x}$ decreases as $n$ increases, which means
the values of $\bar{x}$ will be closer to $\mu$ as $n$ increases.
In an interval estimate, the parameter is specified as being between two values. For example, an interval
estimate for the mean age of all students might be between 20.9 and 21.7 years, or it might be 21.3 ± 0.4 years.
Either the interval contains the parameter or it does not. A degree of confidence can be assigned before an
interval estimate is made. For instance, you may wish to be 95% confident that the interval contains the true
parameter.
A confidence interval is a specific interval estimate of a parameter determined by using data obtained from a
sample and by using the specific confidence level of the estimate.
The confidence level of an interval estimate of a parameter is the probability that the interval estimate will
contain the parameter. The confidence level is denoted by $(1 - \alpha)100\%$.
For example, a 95% confidence interval means the probability that the interval estimate will contain the
parameter is 0.95, and the probability that the interval estimate will not contain the parameter is 0.05; this
probability is denoted by $\alpha$ (that is, $1 - \alpha = 0.95$ and $\alpha = 0.05$).
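1. Here, the population must be normally distributed or the sample size must satisfy $n \geq 30$.
2. Select a random sample and evaluate the sample mean $\bar{x}$ (which is a point estimator for $\mu$).
3. Evaluate the standard error of $\bar{x}$: $S.E = \frac{\sigma}{\sqrt{n}}$
4. Determine the confidence level $(1 - \alpha)$, and evaluate $z_{\alpha/2}$ by solving the equation $P(Z > z_{\alpha/2}) = \frac{\alpha}{2}$.
For example, $z_{0.1} = 1.28$, $z_{0.05} = 1.645$, $z_{0.025} = 1.96$, $z_{0.01} = 2.33$, and $z_{0.005} = 2.575$
5. A $(1 - \alpha)100\%$ confidence interval for the population mean $\mu$ is
$\left(\bar{x} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}},\ \bar{x} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right) = \bar{x} \pm z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$
6. The term $E = z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$ is called the maximum error of the estimate or the margin of error.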
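The steps above translate almost line by line into code. The following is a minimal sketch, not a reference implementation: the function name `z_interval` is hypothetical, and $z_{\alpha/2}$ is computed with Python's `NormalDist.inv_cdf` instead of being read from a table. The sample call uses illustrative inputs (a sample mean of 61.2 from 85 observations with $\sigma = 8.0$).

```python
from math import sqrt
from statistics import NormalDist

def z_interval(x_bar, sigma, n, confidence):
    """Confidence interval for mu when sigma is known
    (normal population or n >= 30)."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)   # z_{alpha/2}
    margin = z * sigma / sqrt(n)              # margin of error E
    return x_bar - margin, x_bar + margin

low, high = z_interval(x_bar=61.2, sigma=8.0, n=85, confidence=0.95)
print(round(low, 1), round(high, 1))
```

Raising the confidence level increases $z_{\alpha/2}$ and widens the interval, while a larger $n$ narrows it.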
Example 3:
Noise levels at various area urban hospitals were measured in decibels; the population standard deviation
from a previous study was 8.0 decibels.
a) The mean of the noise levels in 85 randomly selected corridors was 61.2 decibels, find a 95%
confidence interval of the true mean (population mean)
A 95% C.I is $61.2 \pm 1.96\frac{8.0}{\sqrt{85}} = 61.2 \pm 1.7 = (59.5, 62.9)$ (margin of error = 1.7).
Based on the given data, we can say the true mean falls between 59.5 and 62.9 with probability of
0.95.
b) The mean of the noise levels in 85 randomly selected corridors was 61.2 decibels, find a 98%
confidence interval of the true mean (population mean)
A 98% C.I is $61.2 \pm 2.33\frac{8.0}{\sqrt{85}} = 61.2 \pm 2.0 = (59.2, 63.2)$ (margin of error = 2.0).
Based on the given data, we can say the true mean falls between 59.2 and 63.2 with probability of 0.98.
c) The mean of the noise levels in 170 randomly selected corridors was 60.8 decibels, find a 95%
confidence interval of the true mean (population mean)
$n = 170 > 30$, $\bar{x} = 60.8$, $\sigma = 8.0$ (known), $\alpha = 0.05 \rightarrow \alpha/2 = 0.025 \rightarrow z_{0.025} = 1.96$
A 95% C.I is $60.8 \pm 1.96\frac{8.0}{\sqrt{170}} = 60.8 \pm 1.2 = (59.6, 62.0)$ (margin of error = 1.2).
Based on the given data, we can say the true mean falls between 59.6 and 62.0 with probability of 0.95.
Note: The larger the sample size, the smaller the margin of error (and the narrower the C.I).
Example 4:
The average cholesterol level for a random sample of 25 adult women is 263 units with a standard deviation
of 43 units. From previous study, it is known that the population is normally distributed with a standard
deviation of 40 units. Find a 90% C.I for the mean cholesterol level for all adult women in the population.
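Here, the population is normally distributed, $n = 25 < 30$ (small), $\bar{x} = 263$, $s = 43$, $\sigma = 40$, and $\alpha = 0.10 \rightarrow$
$\alpha/2 = 0.05 \rightarrow z_{0.05} = 1.645$. Because $\sigma$ is known, the $z$ interval is used with $\sigma = 40$ rather than $s = 43$.
A 90% C.I is $263 \pm 1.645\frac{40}{\sqrt{25}} \cong 263 \pm 13.16 = (249.84, 276.16)$ (margin of error = 13.16).
Based on the given data, we can say the true mean falls between 249.84 and 276.16 with probability of 0.90.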
$E = z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \rightarrow n = \left(\frac{z_{\alpha/2} \cdot \sigma}{E}\right)^2$
Example 5:
A scientist wishes to estimate the average depth of a river. He wants to be 98% confident that the estimate
is accurate within 2 feet. From a previous study, the standard deviation of the depths measured was 4.38
feet. Find the required sample size.
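$E = 2$, $\sigma = 4.38$, $\alpha = 0.02 \rightarrow \frac{\alpha}{2} = 0.01 \rightarrow z_{0.01} = 2.33$; we want to find $n$.
$E = z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \rightarrow 2 = 2.33\frac{4.38}{\sqrt{n}} \rightarrow n = \left(\frac{2.33 \times 4.38}{2}\right)^2 = 26.04$, then round up to get $n = 27$.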
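The sample-size formula always rounds up, because rounding down would give a margin of error slightly larger than requested. A short sketch of the calculation follows; the function name is hypothetical, and the critical value is passed in as read from the table (2.33 for 98% confidence) so that the result matches the hand calculation.

```python
from math import ceil

def sample_size_for_mean(z, sigma, margin):
    """Smallest n for which z * sigma / sqrt(n) does not exceed the margin."""
    n = (z * sigma / margin) ** 2
    # round() removes floating-point noise before rounding up to a whole subject
    return ceil(round(n, 6))

print(sample_size_for_mean(z=2.33, sigma=4.38, margin=2))   # river-depth example
```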
The 𝑡 distribution shares some characteristics of the normal distribution and differs from it in others. The 𝑡
distribution is similar to the standard normal distribution in these ways:
1. It is bell-shaped.
2. It is symmetric about the mean.
3. The mean, median, and mode are equal to 0.
The 𝑡 distribution differs from the standard normal distribution in the following ways:
1. The variance is greater than 1.
2. The 𝑡 distribution is a family of curves based on the concept of degrees of freedom, which is related to
the sample size.
3. As the sample size increases, the 𝑡 distribution approaches the standard normal distribution. See the
following figure.
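[Figure: $t$ distribution curves for df = 1, 5, and 10 compared with the standard normal ($z$) curve; horizontal axis from -3 to 3, vertical axis from 0.0 to 0.4]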
The following table gives the values of 𝑡𝛼 for 𝛼 = 0.1, 0.05, 0.025, 0.01, and 0.005
𝒕𝜶
Example 6:
Here we want to find 𝑡0.05/2 = 𝑡0.025 = 2.06 , the reading under 𝑡0.025 in the row of 𝐷. 𝐹 = 25
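$\left(\bar{x} - t_{\alpha/2}\frac{s}{\sqrt{n}},\ \bar{x} + t_{\alpha/2}\frac{s}{\sqrt{n}}\right) = \bar{x} \pm t_{\alpha/2}\frac{s}{\sqrt{n}}$
8. The term $E = t_{\alpha/2}\frac{s}{\sqrt{n}}$ is called the maximum error of the estimate or the margin of error.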
Example 7:
A sample of 6 college wrestlers had an average weight of 276 pounds with a standard deviation of 12 pounds.
Assuming normality:
a) Find a 90% C.I of the true mean weight of all college wrestlers.
$n = 6$, $\bar{x} = 276$, $s = 12$, $\alpha = 0.10 \rightarrow \alpha/2 = 0.05$, $D.F = 6 - 1 = 5 \rightarrow t_{0.05,5} = 2.015$.
The 90% C.I is $276 \pm 2.015\frac{12}{\sqrt{6}} \cong 276 \pm 10 = (266, 286)$ pounds.
b) If a coach claimed that the average weight of the wrestlers on the team was 310, would the claim be
believable?
No, it would not be, since 310 is not contained in the obtained C.I.
Example 8:
100 randomly selected people were asked how long they slept at night. The mean time was 7.6 hours, and the
standard deviation was 0.9 hour. Construct a 95% C.I of the true mean
A 95% C.I is $7.6 \pm 1.96\frac{0.9}{\sqrt{100}} = 7.6 \pm 0.18 = (7.42, 7.78)$ (margin of error = 0.18).
Example 9:
The number of unhealthy days based on the Air Quality Index for a random sample of metropolitan areas is
shown below. Assuming normality, construct a 98% C.I based on the data.
61 12 6 40 27 38 93 5 13 40
1. Evaluate the sample mean and sample standard deviation. Here, $\bar{x} = 33.5$ and $s = 27.68$.
2. $\alpha = 0.02 \rightarrow \frac{\alpha}{2} = 0.01$, $D.F = 10 - 1 = 9 \rightarrow t_{0.01,9} = 2.821$
3. A 98% C.I is $33.5 \pm 2.821\frac{27.68}{\sqrt{10}} = 33.5 \pm 24.7 = (8.8, 58.2)$
Therefore, one can be 98% confident that the population mean is between 8.8 and 58.2.
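The interval in Example 9 can be rebuilt from the raw data. The sketch below is illustrative: Python's standard library has no $t$ quantile function, so the critical value $t_{0.01,9} = 2.821$ is passed in as read from the $t$ table, and the function name `t_interval` is hypothetical.

```python
from math import sqrt
from statistics import mean, stdev

def t_interval(data, t_crit):
    """Confidence interval for mu when sigma is unknown.
    t_crit is t_{alpha/2} from a t table with len(data) - 1 degrees of freedom;
    the population is assumed to be normal."""
    n = len(data)
    x_bar, s = mean(data), stdev(data)   # stdev divides by n - 1
    margin = t_crit * s / sqrt(n)
    return x_bar - margin, x_bar + margin

unhealthy_days = [61, 12, 6, 40, 27, 38, 93, 5, 13, 40]
low, high = t_interval(unhealthy_days, t_crit=2.821)   # 98% level, 9 d.f.
print(round(low, 1), round(high, 1))
```

The wide interval reflects both the small sample and the large spread of the data.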
Proportions can be obtained from samples or populations. The following symbols will be used.
Example 10:
Suppose that 20% of people in a certain population are diabetic. A random sample of 150 people is selected
and it is found that there are 42 diabetic people in the sample.
1. Select a random sample of size $n$ from the population and evaluate the corresponding sample
proportion $\hat{p}$.
2. The best point estimator of $p$ is $\hat{p}$.
3. The standard error of $\hat{p}$ is $S.E = \sqrt{\frac{p(1 - p)}{n}}$
4. Since $p$ is unknown, the standard error of $\hat{p}$ can be estimated by using the formula $S.E = \sqrt{\frac{\hat{p}(1 - \hat{p})}{n}}$
5. Make sure that both $n\hat{p}$ and $n(1 - \hat{p})$ are greater than or equal to 5.
6. Determine the confidence level $(1 - \alpha)$, and find $z_{\alpha/2}$.
7. A $(1 - \alpha)100\%$ confidence interval for the population proportion $p$ is $\hat{p} \pm z_{\alpha/2}\sqrt{\frac{\hat{p}(1 - \hat{p})}{n}}$
8. The term $E = z_{\alpha/2}\sqrt{\frac{\hat{p}(1 - \hat{p})}{n}}$ is called the maximum error of the estimate or the margin of error.
Example 11:
A sample of 500 nursing applications included 60 from men. Find a 90% C.I of the true population proportion
of men who applied to the nursing program.
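1. $n = 500$, $\hat{p} = \frac{X}{n} = \frac{60}{500} = 0.12$
2. $n\hat{p} = 60 \geq 5$ and $n(1 - \hat{p}) = 440 \geq 5$
3. $\alpha = 0.10 \rightarrow \alpha/2 = 0.05 \rightarrow z_{0.05} = 1.645$
4. A 90% C.I of $p$ is $0.12 \pm 1.645\sqrt{\frac{0.12(1 - 0.12)}{500}} \cong 0.12 \pm 0.024 = (0.096, 0.144)$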
Hence, you can be 90% confident the percentage of applicants who are men is between 9.6% and 14.4%
Also, you can be 90% confident the percentage of applicants who are women is between 85.6% and 90.4%
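The calculation in Example 11 can be sketched in code as follows. This is an illustrative helper, not part of the original notes: the name `proportion_interval` is hypothetical, the size check mirrors the rule that $n\hat{p}$ and $n(1 - \hat{p})$ should be at least 5, and $z_{\alpha/2}$ is computed rather than read from a table.

```python
from math import sqrt
from statistics import NormalDist

def proportion_interval(successes, n, confidence):
    """Large-sample confidence interval for a population proportion."""
    p_hat = successes / n
    # Rule of thumb: n * p_hat and n * (1 - p_hat) should both be at least 5
    if min(n * p_hat, n * (1 - p_hat)) < 5:
        raise ValueError("sample too small for the normal approximation")
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)   # z_{alpha/2}
    margin = z * sqrt(p_hat * (1 - p_hat) / n)           # margin of error E
    return p_hat - margin, p_hat + margin

low, high = proportion_interval(successes=60, n=500, confidence=0.90)
print(round(low, 3), round(high, 3))
```

The interval for the complementary proportion (women) follows by subtracting both endpoints from 1.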
Example 12:
To determine what proportion of people use brand X cough syrup, 500 people are questioned. It is found that
40 use brand X, 160 use some other brand, and 300 do not use cough syrup. Construct a 95% C.I for:
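$E = z_{\alpha/2}\sqrt{\frac{\hat{p}(1 - \hat{p})}{n}} \rightarrow n = \hat{p}(1 - \hat{p})\left(\frac{z_{\alpha/2}}{E}\right)^2$
Notice that the value of $\hat{p}$ can be obtained from a previous study. If no information is given about $\hat{p}$, you
should use $\hat{p} = 0.5$.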
Example 13:
A researcher wishes to estimate, with 95% confidence, the proportion of people who own a home computer.
The researcher wishes to be accurate within 0.02 of the true proportion. Find the required sample size if
a previous study shows that 0.60 of those interviewed had a computer at home.
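Here, $\alpha = 0.05 \rightarrow \frac{\alpha}{2} = 0.025 \rightarrow z_{0.025} = 1.96$, $\hat{p} = 0.60$, and margin of error $E = 0.02$
$\rightarrow n = \hat{p}(1 - \hat{p})\left(\frac{z_{\alpha/2}}{E}\right)^2 = 0.6(1 - 0.6)\left(\frac{1.96}{0.02}\right)^2 = 2304.96$; round up to get $n = 2305$ people.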
Example 14:
$\rightarrow n = 0.5(1 - 0.5)\left(\frac{1.96}{0.02}\right)^2 = 2401$ people.
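The two sample-size calculations above differ only in the prior value used for $\hat{p}$. A minimal sketch, with a hypothetical function name and the table value $z_{0.025} = 1.96$ passed in explicitly:

```python
from math import ceil

def sample_size_for_proportion(z, margin, p_hat=0.5):
    """Required n for estimating a proportion within the given margin.
    p_hat defaults to 0.5, the most conservative choice when no prior study exists."""
    n = p_hat * (1 - p_hat) * (z / margin) ** 2
    # round() removes floating-point noise so an exact result is not pushed up by one
    return ceil(round(n, 6))

print(sample_size_for_proportion(z=1.96, margin=0.02, p_hat=0.60))   # prior study available
print(sample_size_for_proportion(z=1.96, margin=0.02))               # no prior information
```

Since $\hat{p}(1 - \hat{p})$ is largest at 0.5, the default never underestimates the sample size needed.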
PROBLEMS
1. The daily yield for a local chemical plant has averaged 880 tons for the last several years. The quality
control manager would like to know whether this average has changed in recent months. He randomly
selects 50 days from the computer data base and computes the average and standard deviation of the
$n = 50$ yields as $\bar{x} = 871$ tons and $s = 21$ tons, respectively.
a) Construct a 90% C.I for the average daily yield 866 , 876
b) Can we say the mean daily yield has changed? Yes, since the constructed C.I does not contain 880
2. It is recognized that cigarette smoking has a deleterious effect on lung function. In their study of the
effect of cigarette smoking on the carbon monoxide diffusing capacity (DL) of the lung, researchers
found that current smokers had DL readings significantly lower than those of nonsmokers. The carbon
monoxide diffusing capacities for a random sample of 20 current smokers are listed below:
104, 87, 73, 123, 91, 92, 62, 91, 84, 76, 101, 88, 71, 82, 89, 103, 109, 73, 107, and 90
Assuming normality, construct a 98% C.I for the mean DL reading for current smokers in the population.
81.3 , 98.3
3. A water company wishes to discover the mean water consumption for the month of July in all homes in a
certain region. There are 600 homes in the region and a random sample of 15 homes showed a mean
consumption of 11.6 m³ with a standard deviation of 1.2 m³. Assuming normality:
a) Construct a 95% confidence interval for the mean water consumption per home in this region.
10.94 , 12.26
b) Construct a 95% confidence interval for the total water consumption for all homes in this region.
6564 , 7356
4. A peony plant with red petals was crossed with another plant that has streaky petals. 100 seeds from this
cross were collected and germinated, and it was found that 58 plants had red petals.
a) Construct a 90% C.I for the percentage of plants that have red petals. 49.88% , 66.12%
b) A geneticist states that 75% of the offspring resulting from this cross will have red flowers. Based on the
given data, is this claim true?
No, since the constructed C.I does not contain the percentage 75%.
5. Suppose that we want to estimate the proportion of smokers in a certain population of 2500 adults. A
random sample of 100 adults is selected from the population. It is found that 20 of the sampled adults
are smokers.
a) Construct a 95% C.I for the proportion of smokers in the population. 0.12 , 0.28
b) Construct a 95% C.I for the proportion of nonsmokers in the population. 0.72 , 0.88
c) Construct a 95% C.I for the number of smokers in the population. 300 , 700
6. The manager of a machine shop wishes to estimate the average time an operator needs to complete a
simple task. 20 operators are selected at random and timed (in minutes). The observed results are:
7.4 7.1 4.3 4.5 7.3 4.8 5.1 4.7 6.3 5.1
6.5 6.8 7.6 6.1 5.9 6.3 7.3 4.6 5.9 7.6
$\sum x_i = 121.2$, $\sum x_i^2 = 759.02$
a) Construct a 95% confidence interval for the average time for completion of the task among all
operators. (5.53 , 6.59)
b) Construct a 90% confidence interval for the proportion of operators that need more than 7 minutes
to complete a simple task. (0.13 , 0.47)
7. It is desired to estimate the mean number of chocolate chips per cookie for a large national brand. How
many cookies would have to be sampled to estimate the true mean number of chips per cookie within 2
chips with 98% confidence? Assume that $\sigma = 10.1$ chips. $n = 139$
8. A medical researcher wishes to determine the percentage of females who take vitamins. He wishes to be
99% confident that the estimate is within 3% of the true proportion.
a) How large should the sample size be, if a previous study showed that 25% of females took
vitamins? $n = 1383$
b) If no previous study is available, how large should the sample size be? 𝑛 = 1844
See and solve the following examples and exercises from the text book:
Section 7 – 1
Examples: 1, 2, 3, 4
Exercises: 9, 11, 19, 25, 26
Section 7 – 2
Examples: 5, 6, 7
Exercises: 5, 9, 14, 16
Section 7 – 3
Examples: 8, 9, 10, 11
Exercises: 3, 13, 15, 18
BIOSTATISTICS
Researchers are interested in answering many types of questions. For example, a scientist might want to know
whether the earth is warming up. A physician might want to know whether a new medication will lower a
person’s blood pressure. An educator might wish to see whether a new teaching technique is better than a
traditional one. These types of questions can be addressed through statistical hypothesis testing, which is a
decision-making process for evaluating claims about a population. In hypothesis testing, the researcher must:
A statistical hypothesis is a statement about a population parameter. This statement may or may not be true.
There are two types of statistical hypotheses for each situation: the null hypothesis and the alternative
hypothesis.
1. The null hypothesis, symbolized by $H_o$, is a statistical hypothesis that states that there is no difference
between a parameter and a specific value, or that there is no difference between two parameters.
2. The alternative hypothesis, symbolized by $H_1$ or $H_a$, is a statistical hypothesis that states that there is a
difference between a parameter and a specific value, or that there is a difference between two
parameters.
Example 1:
As an illustration of how hypotheses should be stated, three different statistical studies will be used as
examples.
Situation 1:
A medical researcher is interested in finding out whether a medication will have any undesirable side effects. The
researcher is particularly concerned with the pulse rate of the patients who take the medication. Will the pulse
rate increase, decrease, or remain unchanged after a patient takes the medication?
The researcher knows that the mean pulse rate for the population under study is 82 beats per minute, so the
hypotheses for this situation are $H_o: \mu = 82$ versus $H_1: \mu \neq 82$. This test is called a two-tailed test.
Situation 2:
A chemist invents an additive to increase the life of a car battery. If the mean lifetime of the car battery without
the additive is 36 months, then the hypotheses are $H_o: \mu = 36$ versus $H_1: \mu > 36$. This test is called a
right-tailed test.
Situation 3:
A contractor wishes to lower heating bills by using a special type of insulation in houses. If the average of the
monthly heating bills is $78, the hypotheses about heating cost are 𝐻𝑜 : 𝜇 = 78 versus 𝐻1 : 𝜇 < 78. This test is
called a left-tailed test.
Example 2:
State the null and alternative hypotheses for each of the following situations:
a) A researcher thinks that if expectant mothers use vitamin pills, the birth weight of the babies will
increase. The average birth weight of the population is 8.6 pounds.
Solution: 𝐻𝑜 : 𝜇 = 8.6 versus 𝐻1 : 𝜇 > 8.6, right-tailed test.
b) An engineer hypothesized that the percentage of defective compact disks can be decreased by using
robots instead of humans. The percentage of defective disks is 1.8%.
Solution: $H_o: p = 0.018$ versus $H_1: p < 0.018$, left-tailed test.
c) A psychologist feels that playing soft music during a test will change the results of the test. The
psychologist is not sure whether the grades will be higher or lower. In the past, the mean of the grades
was 73.
Solution: 𝐻𝑜 : 𝜇 = 73 versus 𝐻1 : 𝜇 ≠ 73, two-tailed test.
STEP 2
After stating the hypotheses, the researcher designs the study. The researcher selects a sample from the
population and selects the correct statistical test. In situation 1, for instance, the researcher will select a
sample of patients who will be given the drug. After allowing a suitable time for the drug to be absorbed, the
researcher will measure each person’s pulse rate. Then the researcher will evaluate the sample mean and
variance to evaluate the test statistic.
A test statistic (test value) is a value computed from the data obtained from a sample, and it is used to make a
decision about whether the null hypothesis should be rejected.
STEP 3
In the hypothesis testing there are 4 possible outcomes and they are shown below:
                      $H_o$ is true        $H_o$ is false
Reject $H_o$          Type I error         Correct decision
Do not reject $H_o$   Correct decision     Type II error
If a null hypothesis is true and it is rejected, then a type I error is made. In situation 1, for instance, the
medication might not significantly change the pulse rate of all the users in the population; but it might change
the rate, by chance, of the subjects in the sample. In this case, the researcher will reject the null hypothesis
when it is really true, thus committing a type I error.
The probability of type I error is denoted by 𝛼, and it is called the level of significance.
Statisticians generally agree on using three arbitrary significance levels: 𝛼 = 0.10, 0.05, and 0.01. That is, if the null
hypothesis is rejected, the probability of a type I error will be 0.10, 0.05, or 0.01, depending on which level of
significance is used. The level of significance does not have to be 0.10, 0.05, or 0.01. It can be any level,
depending on the seriousness of the type I error.
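The meaning of 𝛼 as the probability of a type I error can be illustrated with a small simulation. The sketch below is an illustration only: the population values (mean 73, standard deviation 8), the sample size, the number of trials and the seed are hypothetical and are not taken from the text. It draws many samples from a population for which 𝐻𝑜 is true and counts how often a two-tailed z-test at 𝛼 = 0.05 rejects 𝐻𝑜.

```python
import math
import random
from statistics import NormalDist


def type_one_error_rate(mu0, sigma, n, alpha, trials, seed=1):
    """Estimate how often a two-tailed z-test rejects a TRUE null hypothesis."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # critical value z_{alpha/2}
    rejections = 0
    for _ in range(trials):
        # H_o is true here: every sample comes from a population with mean mu0
        sample = [rng.gauss(mu0, sigma) for _ in range(n)]
        x_bar = sum(sample) / n
        z = (x_bar - mu0) / (sigma / math.sqrt(n))
        if abs(z) > z_crit:
            rejections += 1  # rejecting a true H_o is a type I error
    return rejections / trials


# Hypothetical settings, for illustration only
rate = type_one_error_rate(mu0=73, sigma=8, n=30, alpha=0.05, trials=4000)
print(f"Observed type I error rate: {rate:.3f}")
```

The observed rate should be close to the chosen 𝛼, since each rejection in this simulation is a wrong decision made purely by chance.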
STEP 4
After a significance level is chosen, a critical value is selected from a distribution table (the 𝑧 or 𝑡 table), depending on
the method of computing the test statistic.
1. The critical value separates the critical (rejection) region from the noncritical region. The symbol for the
critical value is 𝑧𝛼, 𝑧𝛼/2, 𝑡𝛼, … The symbol used depends on the type of the test and on the table used to
select the critical value.
2. The critical or rejection region is the range of values of the test statistic that indicates that there is a
significant difference and that the null hypothesis 𝐻𝑜 should be rejected.
3. The noncritical or non-rejection region is the range of values of the test statistic that indicates that the
null hypothesis 𝐻𝑜 should not be rejected
For right-tailed tests the rejection region lies in the right tail of the distribution, for left-tailed tests the rejection
region lies in the left tail, and for two-tailed tests the rejection region is divided into two smaller
regions, one in the right tail and the other in the left tail.
In all types of tests, the null hypothesis 𝐻𝑜 should be rejected if the test statistic (test value) belongs to the
rejection region. On the other hand, the null hypothesis 𝐻𝑜 should not be rejected if the test statistic (test value)
does not belong to the rejection region.
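The decision rule described above can be sketched as a small function. This is a minimal sketch, not part of the original notes: the function name `reject_h0` and the test values used in the calls are hypothetical, and the critical value is assumed to be the positive number read from the 𝑧 or 𝑡 table.

```python
def reject_h0(test_value, critical_value, tail):
    """Return True when the test value falls in the rejection region.

    critical_value is the positive table value (z_alpha, z_alpha/2, t_alpha, ...).
    tail is "right", "left" or "two".
    """
    if tail == "right":
        return test_value > critical_value
    if tail == "left":
        return test_value < -critical_value
    if tail == "two":
        return abs(test_value) > critical_value
    raise ValueError("tail must be 'right', 'left' or 'two'")


# Hypothetical test values, for illustration only
print(reject_h0(2.10, 1.645, "right"))  # True: 2.10 lies to the right of 1.645
print(reject_h0(-1.20, 1.645, "left"))  # False: -1.20 is not to the left of -1.645
print(reject_h0(-2.30, 1.96, "two"))    # True: |-2.30| exceeds 1.96
```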
1. State the null and alternative hypotheses and identify the claim.
There are 3 types of tests which are:
𝐻𝑜 : 𝜇 = 𝜇𝑜 versus 𝐻1 : 𝜇 ≠ 𝜇𝑜 . This test is called a two-tailed test.
𝐻𝑜 : 𝜇 = 𝜇𝑜 versus 𝐻1 : 𝜇 > 𝜇𝑜 . This test is called a right-tailed test.
𝐻𝑜 : 𝜇 = 𝜇𝑜 versus 𝐻1 : 𝜇 < 𝜇𝑜 . This test is called a left-tailed test.
2. Select a random sample and then compute the test statistic, $z_o = \dfrac{\bar{x} - \mu_o}{\sigma/\sqrt{n}}$.
3. Choose the level of significance 𝛼 that will be used to determine the rejection region as explained in the
following step.
4. Evaluate the critical value which separates the rejection region from the non-rejection region.
For right-tailed tests the rejection region is the region on the right of 𝑧𝛼 . (Reject 𝐻𝑜 if 𝑧𝑜 > 𝑧𝛼 ).
For left-tailed tests the rejection region is the region on the left of −𝑧𝛼 . (Reject 𝐻𝑜 if 𝑧𝑜 < −𝑧𝛼 )
For two-tailed tests the rejection region is the region on the right of 𝑧𝛼/2 and the region on the left
of −𝑧𝛼/2 . (Reject 𝐻𝑜 if |𝑧𝑜| > 𝑧𝛼/2 , that is, if 𝑧𝑜 > 𝑧𝛼/2 or 𝑧𝑜 < −𝑧𝛼/2 )
The following graphs show the rejection region (shaded region) for each type of tests
5. Make the decision to reject or not reject the null hypothesis, 𝐻𝑜 , In all types of tests, the null hypothesis
should be rejected when the value of the test statistic 𝑧𝑜 belongs to the rejection region.
Example 3:
A researcher claims that the average wind speed in a certain city is 8 miles per hour. A sample of 32 days has an
average wind speed of 8.2 miles per hour. The standard deviation of the population is 0.6 miles per hour. At 5%
level of significance, is there sufficient evidence to reject the claim?
Solution: We will use 𝑧-test since the population standard deviation is known.
2. Compute the test statistic $z_o = \dfrac{\bar{x} - \mu_o}{\sigma/\sqrt{n}} = \dfrac{8.2 - 8}{0.6/\sqrt{32}} = 1.89$
3. 𝛼 = 0.05 and the test is two-tailed, and 𝑧𝛼/2 = 𝑧0.025 = 1.96
4. The rejection region is the region on the right of 1.96 or on the left of −1.96
5. Make the decision. Do not reject the null hypothesis 𝐻𝑜 , since the test statistic 𝑧𝑜 does not fall in the
rejection region (𝑧𝑜 falls in the non-rejection region).
6. Summarize the results. At the 5% level of significance, there is not sufficient evidence to reject the claim that
the average wind speed in the city is 8 miles per hour.
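The computation in Example 3 can be checked with a short script. This is a sketch under stated assumptions: the function name `z_test_mean` is hypothetical, and the critical value is taken from Python's `statistics.NormalDist` instead of a printed 𝑧 table, so it carries a few more decimals than the table value.

```python
import math
from statistics import NormalDist


def z_test_mean(x_bar, mu0, sigma, n, alpha, tail="two"):
    """One-sample z-test for a mean when the population standard deviation is known."""
    z = (x_bar - mu0) / (sigma / math.sqrt(n))
    std_normal = NormalDist()  # standard normal distribution
    if tail == "two":
        crit = std_normal.inv_cdf(1 - alpha / 2)  # z_{alpha/2}
        reject = abs(z) > crit
    elif tail == "right":
        crit = std_normal.inv_cdf(1 - alpha)      # z_alpha
        reject = z > crit
    elif tail == "left":
        crit = std_normal.inv_cdf(1 - alpha)      # z_alpha, compared against -z_alpha
        reject = z < -crit
    else:
        raise ValueError("tail must be 'two', 'right' or 'left'")
    return z, crit, reject


# Values from Example 3: x_bar = 8.2, mu_o = 8, sigma = 0.6, n = 32, alpha = 0.05
z, crit, reject = z_test_mean(8.2, 8, 0.6, 32, 0.05, tail="two")
print(f"z = {z:.2f}, critical value = {crit:.2f}, reject H_o: {reject}")
```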
Example 4:
A manager claims that in his factory, the average number of days per year missed by the employees due to
illness is less than 7 days per year. The following data show the number of days missed by 40 employees last
year. Is there sufficient evidence to believe the manager’s claim at 𝛼 = 0.05? Assume the population standard
deviation is 4 days.
0 6 12 3 3 5 4 1 3 9 6 0 7 6 3
4 7 4 7 1 0 8 12 3 2 5 10 5 15 3
2 5 3 11 8 2 2 4 9 1
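2. Compute the test statistic. First, we must evaluate the sample mean (the population standard deviation is given as 𝜎 = 4).
$\bar{x} = \dfrac{\sum x}{n} = \dfrac{201}{40} = 5.025$
$z_o = \dfrac{\bar{x} - \mu_o}{\sigma/\sqrt{n}} = \dfrac{5.025 - 7}{4/\sqrt{40}} = -3.123$
3. 𝛼 = 0.05 and the test is left-tailed, and 𝑧𝛼 = 𝑧0.05 = 1.645
4. The rejection region is the region on the left of −1.645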
5. Make the decision. Reject the null hypothesis 𝐻𝑜 , since the test statistic 𝑧𝑜 falls in the rejection region
6. Summarize the results. At 5% level of significance, there is sufficient evidence to support the claim that
the average number of days per year missed by the employees due to illness is less than 7 days per year.
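Example 4 starts from raw data, so the sample mean has to be computed before the test statistic. The following sketch repeats that calculation from the 40 listed values; the variable names are hypothetical, and the critical value comes from `statistics.NormalDist` in place of the 𝑧 table.

```python
import math
from statistics import NormalDist

# Days missed by the 40 employees in Example 4
days_missed = [
    0, 6, 12, 3, 3, 5, 4, 1, 3, 9, 6, 0, 7, 6, 3,
    4, 7, 4, 7, 1, 0, 8, 12, 3, 2, 5, 10, 5, 15, 3,
    2, 5, 3, 11, 8, 2, 2, 4, 9, 1,
]
mu0, sigma, alpha = 7, 4, 0.05  # hypothesized mean, known population sd, significance level

n = len(days_missed)
x_bar = sum(days_missed) / n                  # sample mean
z = (x_bar - mu0) / (sigma / math.sqrt(n))    # test statistic
z_alpha = NormalDist().inv_cdf(1 - alpha)     # about 1.645
reject = z < -z_alpha                         # left-tailed rejection rule

print(f"n = {n}, mean = {x_bar:.3f}, z = {z:.3f}, reject H_o: {reject}")
```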
Assumptions:
1. State the null and alternative hypotheses and identify the claim.
There are 3 types of tests which are:
𝐻𝑜 : 𝜇 = 𝜇𝑜 versus 𝐻1 : 𝜇 ≠ 𝜇𝑜 , This test is called two-tailed test
𝐻𝑜 : 𝜇 = 𝜇𝑜 versus 𝐻1 : 𝜇 > 𝜇𝑜 , This test is called right-tailed test
𝐻𝑜 : 𝜇 = 𝜇𝑜 versus 𝐻1 : 𝜇 < 𝜇𝑜 , This test is called left-tailed test
2. Compute the test statistic, $t_o = \dfrac{\bar{x} - \mu_o}{s/\sqrt{n}}$.
3. Choose the level of significance 𝛼 that will be used to determine the rejection region as explained in the
following step.
4. Evaluate the critical value which separates the rejection region from the non-rejection region.
For right-tailed tests the rejection region is the region on the right of 𝑡𝛼 . (Reject 𝐻𝑜 if 𝑡𝑜 > 𝑡𝛼 ).
For left-tailed tests the rejection region is the region on the left of −𝑡𝛼 .(Reject 𝐻𝑜 if 𝑡𝑜 < −𝑡𝛼 )
For two-tailed tests the rejection region is the region on the right of 𝑡𝛼/2 and the region on the left of
−𝑡𝛼/2 . (Reject 𝐻𝑜 if |𝑡𝑜| > 𝑡𝛼/2 , that is, if 𝑡𝑜 > 𝑡𝛼/2 or 𝑡𝑜 < −𝑡𝛼/2 )
For the three cases, the degrees of freedom is 𝐷. 𝑓 = 𝑛 − 1
The following graphs show the rejection region (shaded region) for each type of tests
5. Make the decision to reject or not reject the null hypothesis 𝐻𝑜 , In all types of tests, the null hypothesis
should be rejected when the value of the test statistic 𝑡𝑜 belongs to the rejection region.
Example 5:
A medical investigation claims that the average number of infections per week at a hospital is 16.3. A random
sample of 10 weeks had a mean number of 17.7 infections. The sample standard deviation is 1.8. Is there
sufficient evidence to reject the investigator’s claim at 𝛼 = 0.05?
Here we will use 𝑡- test since the population standard deviation is unknown
2. Compute the test statistic. $t_o = \dfrac{\bar{x} - \mu_o}{s/\sqrt{n}} = \dfrac{17.7 - 16.3}{1.8/\sqrt{10}} = 2.46$
4. The rejection region is the region on the right of 2.262 or on the left of −2.262
5. Make the decision. Reject the null hypothesis 𝐻𝑜 , since the test statistic 𝑡𝑜 falls in the rejection region
6. Summarize the results. At 5% level of significance, there is sufficient evidence to reject the claim that
the average number of infections per week at the hospital is 16.3. Or we can say, at 𝛼 = 0.05, the
average number of infections per week at the hospital is significantly different from 16.3.
Example 6:
A physician claims that joggers’ maximal volume oxygen uptake is greater than the average of all adults. A
sample of 15 joggers has a mean of 40.6 ml/kg and a standard deviation of 6 ml/kg. If the average of all adults is
36.7 ml/kg, is there sufficient evidence to support the physician’s claim at 𝛼 = 0.10?
Here we will use 𝑡- test since the population standard deviation is unknown
2. Compute the test statistic. $t_o = \dfrac{\bar{x} - \mu_o}{s/\sqrt{n}} = \dfrac{40.6 - 36.7}{6/\sqrt{15}} = 2.517$
5. Make the decision. Reject the null hypothesis 𝐻𝑜 , since the test statistic 𝑡𝑜 falls in the rejection region
6. Summarize the results. At the 10% level of significance, there is sufficient evidence to support the claim
that the mean maximal volume oxygen uptake of joggers is greater than the average of all adults. Or we can
say, at 𝛼 = 0.10, that joggers’ maximal volume oxygen uptake is significantly greater than the average of
all adults.
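The t statistics in Examples 5 and 6 can be reproduced as follows. This is a minimal sketch: the function name `t_statistic` is hypothetical, and the critical values are typed in from a standard 𝑡 table because the Python standard library has no t distribution. The value 2.262 (two-tailed, 𝛼 = 0.05, 9 degrees of freedom) appears in Example 5; the value 1.345 (right-tailed, 𝛼 = 0.10, 14 degrees of freedom) is a standard table value assumed here for Example 6.

```python
import math


def t_statistic(x_bar, mu0, s, n):
    """Test statistic for a one-sample t-test (population sd unknown)."""
    return (x_bar - mu0) / (s / math.sqrt(n))


# Example 5: two-tailed test, alpha = 0.05, D.f = 10 - 1 = 9
t5 = t_statistic(17.7, 16.3, 1.8, 10)
t5_crit = 2.262                      # t_{0.025} with 9 d.f., from the t table
reject5 = abs(t5) > t5_crit

# Example 6: right-tailed test, alpha = 0.10, D.f = 15 - 1 = 14
t6 = t_statistic(40.6, 36.7, 6, 15)
t6_crit = 1.345                      # t_{0.10} with 14 d.f. (assumed table value)
reject6 = t6 > t6_crit

print(f"Example 5: t = {t5:.2f}, reject H_o: {reject5}")
print(f"Example 6: t = {t6:.3f}, reject H_o: {reject6}")
```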
To test
𝐻𝑜 : 𝜇 = 𝜇𝑜 versus 𝐻1 : 𝜇 ≠ 𝜇𝑜
at a level of significance 𝛼, you can construct a (1 − 𝛼)100% confidence interval for 𝜇. Then reject 𝐻𝑜 if 𝜇𝑜 lies
outside the interval, and do not reject 𝐻𝑜 if 𝜇𝑜 lies inside it.
Example 7:
Sugar is packed in 1-kilogram bags. An inspector suspects the bags may not contain 1 kg. A sample of 50 bags
produces a mean of 0.92 kg and a standard deviation of 0.14 kg.
Solution:
a) $\bar{x} = 0.92$, $s = 0.14$, $n = 50$, $\alpha = 0.05 \rightarrow \alpha/2 = 0.025$, $D.f = 50 - 1 = 49 \rightarrow t_{0.025} \approx 2.01$.
The 95% C.I is $0.92 \mp 2.01 \dfrac{0.14}{\sqrt{50}} = 0.92 \mp 0.04 = (0.88, 0.96)$
We can say the true mean falls between 0.88 kg and 0.96 kg with a confidence level of 0.95.
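The link between a confidence interval and a two-tailed test can be sketched with the sugar-bag data of Example 7. The function name `t_confidence_interval` is hypothetical, and the critical value of about 2.01 for 49 degrees of freedom is a table value assumed here. Checking whether the nominal weight of 1 kg lies inside the interval is a step added for illustration; it is not worked out in the text.

```python
import math


def t_confidence_interval(x_bar, s, n, t_crit):
    """Confidence interval for a mean: x_bar -/+ t_crit * s / sqrt(n)."""
    margin = t_crit * s / math.sqrt(n)
    return x_bar - margin, x_bar + margin


# Example 7: x_bar = 0.92 kg, s = 0.14 kg, n = 50
lower, upper = t_confidence_interval(0.92, 0.14, 50, t_crit=2.01)  # assumed table value

mu0 = 1.0  # hypothesized mean weight under H_o (the labelled 1 kg)
reject = not (lower <= mu0 <= upper)  # reject H_o when mu0 falls outside the interval

print(f"95% C.I = ({lower:.2f}, {upper:.2f}), reject H_o: mu = 1: {reject}")
```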
1. State the null and alternative hypotheses and identify the claim.
There are 3 types of tests which are:
𝐻𝑜 : 𝑝 = 𝑝𝑜 versus 𝐻1 : 𝑝 ≠ 𝑝𝑜 , This test is called two-tailed test
𝐻𝑜 : 𝑝 = 𝑝𝑜 versus 𝐻1 : 𝑝 > 𝑝𝑜 , This test is called right-tailed test
𝐻𝑜 : 𝑝 = 𝑝𝑜 versus 𝐻1 : 𝑝 < 𝑝𝑜 , This test is called left-tailed test
2. Compute the test statistic, $z_o = \dfrac{\hat{p} - p_o}{\sqrt{p_o(1 - p_o)/n}}$, where $\hat{p}$ is the proportion in the sample.
3. Choose the level of significance 𝛼 that will be used to determine the rejection region as explained in the
following step.
4. Evaluate the critical value which separates the rejection region from the non-rejection region.
For right-tailed tests the rejection region is the region on the right of 𝑧𝛼
For left-tailed tests the rejection region is the region on the left of −𝑧𝛼 .
For two-tailed tests the rejection region is the region on the right of 𝑧𝛼/2 and the region on the left
of −𝑧𝛼/2 .
The following graphs show the rejection region (shaded region) for each type of tests
Note: We can use the z-test for proportions only if $n p_o \geq 5$ and $n(1 - p_o) \geq 5$
5. Make the decision to reject or not reject the null hypothesis 𝐻𝑜 . In all types of tests, the null hypothesis
should be rejected when the value of the test statistic 𝑧𝑜 belongs to the rejection region.
Example 8:
A dietitian claims that more than 60% of people are trying to avoid trans fats in their diets. She randomly
selected 200 people and found that 128 people stated that they were trying to avoid trans fats in their diets. At
𝛼 = 0.05, is there sufficient evidence to reject the claim?
Solution: $p_o = 0.6$, $n = 200$, $\hat{p} = \dfrac{128}{200} = 0.64$, $\alpha = 0.05$.
2. Compute the test statistic. $z_o = \dfrac{\hat{p} - p_o}{\sqrt{p_o(1 - p_o)/n}} = \dfrac{0.64 - 0.6}{\sqrt{0.6(1 - 0.6)/200}} = 1.15$
5. Make the decision. Do not reject the null hypothesis 𝐻𝑜 , since the test statistic 𝑧𝑜 does not fall in the
rejection region
6. Summarize the results. At the 5% level of significance, there is not sufficient evidence to support the claim that
more than 60% of people are trying to avoid trans fats in their diets. Or we can say, at 𝛼 = 0.05, the
percentage of people who are trying to avoid trans fats in their diets is not significantly greater than 60%.
Example 9:
An automobile association claims that 54% of fatal car/truck accidents are caused by driver error. A researcher
studies 40 randomly selected fatal accidents and finds that 19 were caused by driver error. Using 𝛼 = 0.05, can
the claim be refuted?
Solution: $p_o = 0.54$, $n = 40$, $\hat{p} = \dfrac{19}{40} = 0.475$, $\alpha = 0.05$.
2. Compute the test statistic. $z_o = \dfrac{\hat{p} - p_o}{\sqrt{p_o(1 - p_o)/n}} = \dfrac{0.475 - 0.54}{\sqrt{0.54(1 - 0.54)/40}} = -0.825$
4. The rejection region is the region on the right of 1.96 or on the left of −1.96
5. Make the decision. Do not reject the null hypothesis 𝐻𝑜 , since the test statistic 𝑧𝑜 does not fall in the
rejection region
6. Summarize the results. At the 5% level of significance, there is not sufficient evidence to reject the claim that
54% of fatal car/truck accidents are caused by driver error.
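The z-test for a proportion, including the check that 𝑛𝑝𝑜 ≥ 5 and 𝑛(1 − 𝑝𝑜) ≥ 5, can be sketched as below and applied to Examples 8 and 9. The function name `z_test_proportion` is hypothetical, and the critical values come from `statistics.NormalDist` in place of the 𝑧 table.

```python
import math
from statistics import NormalDist


def z_test_proportion(successes, n, p0, alpha, tail):
    """One-sample z-test for a proportion; tail is 'two', 'right' or 'left'."""
    # The normal approximation is used only when both expected counts are at least 5
    if n * p0 < 5 or n * (1 - p0) < 5:
        raise ValueError("z-test not appropriate: need n*p0 >= 5 and n*(1 - p0) >= 5")
    p_hat = successes / n
    z = (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)
    if tail == "two":
        crit = NormalDist().inv_cdf(1 - alpha / 2)
        reject = abs(z) > crit
    elif tail == "right":
        crit = NormalDist().inv_cdf(1 - alpha)
        reject = z > crit
    elif tail == "left":
        crit = NormalDist().inv_cdf(1 - alpha)
        reject = z < -crit
    else:
        raise ValueError("tail must be 'two', 'right' or 'left'")
    return p_hat, z, reject


# Example 8: claim p > 0.60, so the test is right-tailed
p8, z8, reject8 = z_test_proportion(128, 200, 0.60, 0.05, "right")
# Example 9: claim p = 0.54, so the test is two-tailed
p9, z9, reject9 = z_test_proportion(19, 40, 0.54, 0.05, "two")

print(f"Example 8: p_hat = {p8:.2f}, z = {z8:.2f}, reject H_o: {reject8}")
print(f"Example 9: p_hat = {p9:.3f}, z = {z9:.3f}, reject H_o: {reject9}")
```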
The 𝑷-value or observed significance level of a statistical test is the smallest value of 𝛼 for which the null
hypothesis 𝐻𝑜 can be rejected.
If the 𝑃-value is less than a pre-assigned significance level 𝛼, (𝑃 − 𝑣𝑎𝑙𝑢𝑒 < 𝛼) then the null hypothesis 𝐻𝑜 can be
rejected, and you can report that the results are statistically significant at 𝛼.
If 𝑃-value ≥ 𝛼, the decision is not to reject the null hypothesis 𝐻𝑜 , and we say that the results are not
statistically significant at 𝛼.
Example 10:
Suppose that we want to test 𝐻𝑜 : 𝜇 = 120 versus 𝐻1 : 𝜇 ≠ 120 at 𝛼 = 0.05. A random sample is
selected and yields a 𝑃-value of 0.0001
Here, we reject the null hypothesis 𝐻𝑜 since 𝑃-value= 0.0001 < 𝛼 = 0.05
Example 11:
Suppose that we want to test 𝐻𝑜 : 𝑝 = 0.54 versus 𝐻1 : 𝑝 < 0.54 at 𝛼 = 0.05. A random sample is selected
and yields a 𝑃-value of 0.20
Here, we do not reject the null hypothesis 𝐻𝑜 since 𝑃-value = 0.20 > 𝛼 = 0.05
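For a z-test, the 𝑃-value is the tail area of the standard normal distribution beyond the observed test statistic. The text does not spell out this calculation, so the sketch below is an added illustration with a hypothetical function name, `p_value_z`. It is applied to the test statistics found earlier in this chapter for the wind speed example (𝑧𝑜 = 1.89, two-tailed) and the missed days example (𝑧𝑜 = −3.123, left-tailed).

```python
from statistics import NormalDist


def p_value_z(z, tail):
    """P-value of a z-test: area in the tail(s) beyond the observed z."""
    phi = NormalDist().cdf  # cumulative standard normal distribution
    if tail == "right":
        return 1 - phi(z)
    if tail == "left":
        return phi(z)
    if tail == "two":
        return 2 * (1 - phi(abs(z)))
    raise ValueError("tail must be 'right', 'left' or 'two'")


alpha = 0.05
p_wind = p_value_z(1.89, "two")      # wind speed example
p_days = p_value_z(-3.123, "left")   # missed days example

for name, p in [("wind speed", p_wind), ("missed days", p_days)]:
    decision = "reject H_o" if p < alpha else "do not reject H_o"
    print(f"{name}: P-value = {p:.4f}, {decision}")
```

Both decisions agree with the ones reached by comparing the test statistic with the critical value, as they must.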
PROBLEMS
1. The daily yield for a local chemical plant has averaged 880 tons for the last several years. The quality
control manager would like to know whether this average has changed in recent months. He randomly
selects 50 days from the computer data base and computes the average and standard deviation of the
𝑛 = 50 yields as 𝑥 = 871 tons and 𝑠 = 21 tons, respectively. At 𝛼 = 0.05, can we say the mean daily yield
has changed?
Two- tailed t-test, reject 𝐻𝑜
2. Standards set by government agencies indicate that adults should not exceed an average daily sodium
intake of 3300 mg. To find out whether adults in the population are exceeding this limit, a sample of 100
adults is selected, and the mean and the standard deviation of daily sodium intake are found to be 3400 mg
and 1100 mg, respectively. Use α = 0.05 to conduct a test of hypothesis
Right-tailed t-test, do not reject 𝐻𝑜
3. It is recognized that cigarette smoking has a deleterious effect on lung function. In their study of the effect
of cigarette smoking on the carbon monoxide diffusing capacity (DL) of the lung, researchers found that
current smokers had DL readings significantly lower than those of nonsmokers. The carbon monoxide
diffusing capacities for a random sample of 20 current smokers are listed below:
104, 87, 73, 123, 91, 92, 62, 91, 84, 76, 101, 88, 71, 82, 89, 103, 109, 73, 107, and 90
Assuming normality, do these data indicate that the mean DL reading for current smokers is significantly
lower than 100? Use 𝛼 = 0.01
Left-tailed t-test, reject 𝐻𝑜
4. It is known that the IQ scores of a certain population of adults are approximately normally distributed with
standard deviation of 15. A random sample of 25 adults selected from this population had a mean IQ score
of 105 with a standard deviation of 14. On the basis of the given data can we conclude that the mean IQ
score for the population is not 100?
Two-tailed z-test, do not reject 𝐻𝑜
5. A random sample of 16 adults selected from a certain normal population yielded a mean weight of 64 kg
with variance of 49. Do the sample data provide sufficient evidence to conclude that the mean weight for
the population is greater than 60 kg?
Right-tailed t-test, reject 𝐻𝑜
6. A dietitian wishes to see if a person’s cholesterol level will decrease if the diet is supplemented by a certain
mineral. Six subjects were pretested, and then they took the mineral supplement for a 5-week period. The
results are shown in the following table. At 𝛼 = 0.10, can it be concluded that the cholesterol level has
decreased? Assume the variable is approximately normally distributed.
Subject 1 2 3 4 5 6
Before 210 235 208 190 172 244
After 190 170 210 188 173 228
7. A peony plant with red petals was crossed with another plant having streaky petals. 100 seeds from this
cross were collected and germinated, and it was found that 58 plants had red petals. A geneticist states that
75% of the offspring resulting from this cross will have red flowers. Test this claim using 𝛼 = 0.01
Two-tailed z-test, reject 𝐻𝑜
8. Suppose that we want to estimate the proportion of smokers in a certain population of adults. A random
sample of 300 adults is selected from the population. It is found that 69 of the sampled adults are smokers.
Test the claim that the percentage of smokers is less than 25%.
Left-tailed z-test, do not reject 𝐻𝑜
See and solve the following examples and exercises from the textbook:
Section 8 – 1
Examples: 1, 2
Exercises: 12, 13
Section 8 – 2
Examples: 3, 4, 5, 6
Exercises: 5, 7, 16, 19, 25
Section 8 – 3
Examples: 8, 9, 10, 11, 12, 13
Exercises: 14, 16, 17, 20
Section 8 – 4
Examples: 17, 18
Exercises: 8, 11, 12, 15, 19
Section 8 – 6
Examples: 30, 31
Exercises: 1, 2, 5
BIOSTATISTICS
CHAPTER 9: Testing the Difference between Two Means and Two Proportions
There are, however, many instances when researchers wish to compare two means. For example, two different
brands of fertilizer might be tested to see whether one is better than the other for growing plants. Or two
brands of cough syrup might be tested to see whether one brand is more effective than the other.
In the comparison of two means, the same basic steps for hypothesis testing shown in Chapter 8 are used, and
the 𝑧- and 𝑡- tests are also used. The 𝑧-test can be used to compare two proportions.
Section 9.1: Testing the Difference between Two Means Using the z-Test
In many cases, researchers may not be interested in the true mean of a certain population; instead, they are
interested in comparing the means of two populations. Here, the hypotheses are:
𝐻𝑜 : 𝜇1 − 𝜇2 = 0 versus 𝐻1 : 𝜇1 − 𝜇2 ≠ 0
Or equivalently 𝐻𝑜 : 𝜇1 = 𝜇2 versus 𝐻1 : 𝜇1 ≠ 𝜇2
The researcher wants to know whether the two true means 𝜇1 and 𝜇2 are different or not.
𝐻𝑜 : 𝜇1 − 𝜇2 = 0 versus 𝐻1 : 𝜇1 − 𝜇2 > 0
The researcher wants to know whether the first mean, 𝜇1 , exceeds the second mean, 𝜇2 , or not.
𝐻𝑜 : 𝜇1 − 𝜇2 = 0 versus 𝐻1 : 𝜇1 − 𝜇2 < 0
The researcher wants to know whether the first mean, 𝜇1 , is lower than the second mean, 𝜇2 ,or not.
To use the z-test, make sure the following assumptions are satisfied:
1. The samples must be independent of each other. That is, there can be no relationship between the subjects
in each sample.
2. The standard deviations, 𝜎1 and 𝜎2 , of both populations must be known.
3. If the sample sizes, 𝑛1 and 𝑛2 , are less than 30, the populations must be normally distributed.
The steps of the hypothesis testing about the difference between means using z-test:
Step 1: State the null and alternative hypotheses and identify the claim.
Note:
Step 4: Evaluate the critical value and find the critical (rejection) region.
For right-tailed tests the rejection region is the region on the right of 𝑧𝛼
For left-tailed tests the rejection region is the region on the left of −𝑧𝛼 .
For two-tailed tests the rejection region is the region on the right of 𝑧𝛼/2 and the region on the left of
−𝑧𝛼/2 .
The following graphs show the rejection region (shaded region) for each type of tests
Step 5: Make the decision to reject or not reject the null hypothesis 𝐻𝑜 .
In all types of tests, the null hypothesis should be rejected when the value of the test statistic 𝑧𝑜 belongs to the
rejection region.
When the null hypothesis is rejected, a confidence interval for the difference between the two means can be found.
The formula for the (1 − 𝛼)100% confidence interval for the difference between two population means is
$(\bar{x}_1 - \bar{x}_2) \mp z_{\alpha/2}\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}$
Example 1:
Analyses of drinking water samples for 100 homes in each of two different sections of a city gave the following
means of lead levels (in parts per million):
            Sample size   Sample mean   Population standard deviation
Section 1   100           34.1          5.9
Section 2   100           36.0          6.0
Do the data provide sufficient evidence to indicate that there is a difference in the two population means? Use
𝛼 = 0.05
Solution:
Here, we will use z-test since the population standard deviations are known.
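3. Compute the test statistic $z_o = \dfrac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}} = \dfrac{34.1 - 36.0}{\sqrt{\dfrac{(5.9)^2}{100} + \dfrac{(6.0)^2}{100}}} = -2.258$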
4. The rejection region is the region on the right of 1.96 or on the left of −1.96
5. Make the decision. Reject the null hypothesis 𝐻𝑜 , since the test statistic 𝑧𝑜 falls in the rejection region.
6. Summarize the results. At the 5% level of significance, there is sufficient evidence to support the claim that
the means of lead levels in water for the 2 sections are different.
Example 2:
Using the data given in Example 1, construct a 95% C.I for the difference between the mean of lead
levels in the sections.
Solution:
A 95% C.I for the difference is $(36 - 34.1) \mp (1.96)\sqrt{\dfrac{(5.9)^2}{100} + \dfrac{(6.0)^2}{100}} = 1.9 \mp 1.65 = (0.25, 3.55)$
Since the constructed interval does not contain the number 0 , we can say the two means are not equal
at 𝛼 = .05
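The test in Example 1 and the interval in Example 2 use the same standard error, which the following sketch makes explicit. The function name `two_sample_z` is hypothetical, and the critical value comes from `statistics.NormalDist` in place of the 𝑧 table.

```python
import math
from statistics import NormalDist


def two_sample_z(x1, x2, sigma1, sigma2, n1, n2):
    """z statistic and standard error for the difference between two means."""
    se = math.sqrt(sigma1**2 / n1 + sigma2**2 / n2)
    return (x1 - x2) / se, se


# Lead levels: Section 1 (mean 34.1, sd 5.9) and Section 2 (mean 36.0, sd 6.0), 100 homes each
z, se = two_sample_z(34.1, 36.0, 5.9, 6.0, 100, 100)

z_crit = NormalDist().inv_cdf(1 - 0.05 / 2)  # z_{0.025}, about 1.96
reject = abs(z) > z_crit                     # two-tailed decision of Example 1

# 95% confidence interval for (mean of Section 2 - mean of Section 1), as in Example 2
diff = 36.0 - 34.1
margin = z_crit * se
ci = (diff - margin, diff + margin)

print(f"z = {z:.3f}, reject H_o: {reject}")
print(f"95% C.I = ({ci[0]:.2f}, {ci[1]:.2f})")
```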
Example 3:
A researcher claims that the mean height for 9-year-old girls exceeds the mean height of 9-year-old boys. Two
random samples yielded the following results. Test the claim at 𝛼 = 0.10.
Boys Girls
Sample size 60 50
Mean height 123.5 126.2
Population variance 98 120
Solution:
Here, we will use z-test since the population standard deviations are known.
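3. The given data are $n_B = 60$, $\bar{x}_B = 123.5$, $\sigma_B^2 = 98$, $n_G = 50$, $\bar{x}_G = 126.2$, $\sigma_G^2 = 120$
4. Compute the test statistic $z_o = \dfrac{\bar{x}_G - \bar{x}_B}{\sqrt{\dfrac{\sigma_G^2}{n_G} + \dfrac{\sigma_B^2}{n_B}}} = \dfrac{126.2 - 123.5}{\sqrt{\dfrac{120}{50} + \dfrac{98}{60}}} = 1.344$
5. The rejection region is the region on the right of 1.282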
6. Reject the null hypothesis 𝐻𝑜 , since the test statistic 𝑧𝑜 falls in the rejection region.
7. At 10% level of significance, the mean height of girls is significantly higher than the mean height of boys
at 9-years-old.
Example 4:
The researcher claims that the mean height for 9-year-old boys exceeds 120 cm. At 𝛼 = 0.10, is there sufficient
evidence to support the claim?
Boys Girls
Sample size 60 50
Mean height 123.5 126.2
Population variance 98 120
Solution:
Here, we will use z-test since the population standard deviations are known.
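4. Compute the test statistic $z_o = \dfrac{\bar{x}_B - \mu_0}{\sigma_B/\sqrt{n_B}} = \dfrac{123.5 - 120}{\sqrt{98/60}} = 2.739$
5. The rejection region is the region on the right of 1.282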
6. Reject the null hypothesis 𝐻𝑜 , since the test statistic 𝑧𝑜 falls in the rejection region.
7. At 10% level of significance, the mean height of 9-year-old boys is significantly higher than 120 cm.
Section 9.2: Testing the Difference between Two Means Using the t-Test
The z-test is used to test the difference between two means when the population standard deviations are
known and the populations are normally distributed, or when both sample sizes are greater than or equal to 30.
In many situations, however, these conditions cannot be met—that is, the population standard deviations are
not known. In these cases, a t-test is used when the two samples are independent and when the samples are
taken from two normally distributed populations. Samples are independent when they are not related.
To use the t-test, make sure the following assumptions are satisfied:
1. The samples must be independent of each other. That is, there can be no relationship between the
subjects in each sample.
2. The standard deviations, 𝜎1 and 𝜎2 , of both populations are not known.
3. The populations must be normally distributed or approximately normally distributed.
The steps of the hypothesis testing about the difference between means using t- test:
Step 1: State the null and alternative hypotheses and identify the claim.
Step 4: Evaluate the critical value and find the critical (rejection) region.
For right-tailed tests the rejection region is the region on the right of 𝑡𝛼
For left-tailed tests the rejection region is the region on the left of −𝑡𝛼 .
For two-tailed tests the rejection region is the region on the right of 𝑡𝛼/2 and the region on the left of
−𝑡𝛼/2 .
The following graphs show the rejection region (shaded region) for each type of tests
Step 5: Make the decision to reject or not reject the null hypothesis 𝐻𝑜 . In all types of tests, the null hypothesis
should be rejected when the value of the test statistic 𝑡𝑜 belongs to the rejection region.
Confidence intervals
Confidence intervals can also be found for the difference between two means with this formula:
The formula for the (1 − 𝛼)100% confidence interval for the difference between two population means is
$(\bar{x}_1 - \bar{x}_2) \mp t_{\alpha/2}\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}$, $D.f = Min\{n_1 - 1, n_2 - 1\}$
Example 5:
The following table gives the percentages of oxygen uptake by air (the rest is by water) for redfish exposed to
two temperature environments.
25°C 49 34 24 32 52 14 28 18 28 47 60
33°C 28 55 45 51 41 27 44 48 54 67 46 59
Assuming normality, do the given data present sufficient evidence to indicate the mean percentage of oxygen
uptake by air for redfish at 25°C is less than the mean at 33°C? Test using α = 0.05
Solution:
Here, we will use t-test since the population standard deviations are unknown.
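3. To compute the test statistic, we must find the means and standard deviations of the two samples using
the formulas given in Chapter 3: $\bar{x}_1 = 35.09$, $s_1 = 14.88$, $\bar{x}_2 = 47.08$, $s_2 = 11.62$
$t_o = \dfrac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}} = \dfrac{35.09 - 47.08}{\sqrt{\dfrac{(14.88)^2}{11} + \dfrac{(11.62)^2}{12}}} = -2.140$
4. The rejection region is the region on the left of −1.812 ($D.f = Min\{11 - 1, 12 - 1\} = 10$)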
5. Make the decision. Reject the null hypothesis 𝐻𝑜 , since the test statistic 𝑡𝑜 falls in the rejection region.
6. Summarize the results. At the 5% level of significance, there is sufficient evidence to support the claim that
the mean percentage of oxygen uptake by air for redfish at 25°C is less than the mean at 33°C.
Example 6:
Using the data in Example 5, construct a 90% C.I for the difference between the mean percentage of oxygen
uptake at 25°C and the mean at 33°C.
Solution:
Here, $\alpha = 0.10 \rightarrow \alpha/2 = 0.05$ and $D.f = Min\{11 - 1, 12 - 1\} = 10 \rightarrow t_{0.05,10} = 1.812$
$(47.08 - 35.09) \mp (1.812)\sqrt{\dfrac{(14.88)^2}{11} + \dfrac{(11.62)^2}{12}} = 11.99 \mp 10.15 = (1.84, 22.14)$
Since the constructed interval does not contain the number 0, we can say the two means are not equal
at 𝛼 = .10; also, we can conclude the mean at 33°C exceeds the mean at 25°C.
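Examples 5 and 6 can be reproduced directly from the raw redfish data. This is a sketch under stated assumptions: the variable names are hypothetical, and the critical value 1.812 (𝑡 with 10 degrees of freedom and a tail area of 0.05) is typed in from the 𝑡 table because the Python standard library has no t distribution. Because the script keeps all decimals of the sample means and standard deviations, its results can differ from the hand calculation in the third decimal place.

```python
import math
from statistics import mean, stdev

# Percentage of oxygen uptake by air for redfish at two temperatures
air_25 = [49, 34, 24, 32, 52, 14, 28, 18, 28, 47, 60]
air_33 = [28, 55, 45, 51, 41, 27, 44, 48, 54, 67, 46, 59]

n1, n2 = len(air_25), len(air_33)
x1, x2 = mean(air_25), mean(air_33)
s1, s2 = stdev(air_25), stdev(air_33)      # sample standard deviations (divisor n - 1)

se = math.sqrt(s1**2 / n1 + s2**2 / n2)    # standard error of the difference
t = (x1 - x2) / se                         # test statistic of Example 5
df = min(n1 - 1, n2 - 1)                   # degrees of freedom rule used in the text
t_crit = 1.812                             # t table value for df = 10, tail area 0.05

reject = t < -t_crit                       # left-tailed test at alpha = 0.05

# 90% confidence interval for (mean at 33 degrees - mean at 25 degrees), as in Example 6
diff = x2 - x1
ci = (diff - t_crit * se, diff + t_crit * se)

print(f"means: {x1:.2f}, {x2:.2f}; sds: {s1:.2f}, {s2:.2f}; df = {df}")
print(f"t = {t:.3f}, reject H_o: {reject}")
print(f"90% C.I = ({ci[0]:.2f}, {ci[1]:.2f})")
```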
Example 7:
The average size of a farm in region 1 is 191 acres. The average size of a farm in region 2 is 199 acres. Assume
the data were obtained from two samples with standard deviations of 38 and 12 acres, respectively, and
sample sizes of 8 and 10, respectively. Can it be concluded at 𝛼 = 0.05 that the average size of the farms in the
two regions is different? Assume the populations are normally distributed.
Solution:
Here, we will use t-test since the population standard deviations are unknown.
$t_o = \dfrac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}} = \dfrac{191 - 199}{\sqrt{\dfrac{(38)^2}{8} + \dfrac{(12)^2}{10}}} = -0.57$
4. The rejection region is the region on the left of −2.365 and on the right of 2.365
5. Do not reject the null hypothesis 𝐻𝑜 , since the test statistic 𝑡𝑜 does not fall in the rejection region.
6. At 5% level of significance, there is not sufficient evidence to support the claim that the average size of
the farms is different.
𝐻𝑜 : 𝑝1 − 𝑝2 = 0 versus 𝐻1 : 𝑝1 − 𝑝2 ≠ 0
Or equivalently 𝐻𝑜 : 𝑝1 = 𝑝2 versus 𝐻1 : 𝑝1 ≠ 𝑝2
Here, the researcher wants to know whether the two true proportions are equal or not.
𝐻𝑜 : 𝑝1 − 𝑝2 = 0 versus 𝐻1 : 𝑝1 − 𝑝2 > 0
Here, the researcher wants to know whether the first proportion exceeds the second proportion or not.
𝐻𝑜 : 𝑝1 − 𝑝2 = 0 versus 𝐻1 : 𝑝1 − 𝑝2 < 0
Here, the researcher wants to know whether the first proportion is lower than the second proportion or not.
To use the z-test, make sure the following assumptions are satisfied:
1. The samples must be independent of each other. That is, there can be no relationship between the subjects
in each sample.
2. $n_1 p_1 \geq 5$ and $n_1(1 - p_1) \geq 5$.
3. $n_2 p_2 \geq 5$ and $n_2(1 - p_2) \geq 5$.
The steps of the hypothesis testing about the difference in proportions using z-test:
Step 1: State the null and alternative hypotheses and identify the claim.
Note:
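3. Since 𝑝1 and 𝑝2 are unknown, the standard error of the difference is estimated by $\sqrt{\bar{p}(1 - \bar{p})\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}$,
where $\bar{p} = \dfrac{n_1\hat{p}_1 + n_2\hat{p}_2}{n_1 + n_2}$ is the pooled sample proportion.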
Step 4: Evaluate the critical value and find the critical (rejection) region.
For right-tailed tests the rejection region is the region on the right of 𝑧𝛼
For left-tailed tests the rejection region is the region on the left of −𝑧𝛼 .
For two-tailed tests the rejection region is the region on the right of 𝑧𝛼/2 and the region on the left of
−𝑧𝛼/2 .
The following graphs show the rejection region (shaded region) for each type of tests
Step 5: Make the decision to reject or not reject the null hypothesis 𝐻𝑜 .
In all types of tests, the null hypothesis should be rejected when the value of the test statistic 𝑧𝑜 belongs to the
rejection region.
An-Najah National University CH 9– Page 137
Confidence intervals for the difference between two proportions can also be found.
The formula for the (1 − 𝛼)100% confidence interval for the difference between two population proportions is
$(\hat{p}_1 - \hat{p}_2) \mp z_{\alpha/2}\sqrt{\dfrac{\hat{p}_1(1 - \hat{p}_1)}{n_1} + \dfrac{\hat{p}_2(1 - \hat{p}_2)}{n_2}}$
Example 8:
Suppose a drug company develops a new drug, designed to prevent colds. The company states that the drug is
equally effective for men and women. To test this claim, they choose a random sample of 100 women and 200
men from the population. At the end of the study, 48 of the women did not catch a cold; and 102 of the men did
not catch a cold. Based on these results, can we reject the company's claim that the drug is equally effective for
men and women? Use 𝛼 = 0.05
Solution:
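6. Compute the test statistic $z_o = \dfrac{\hat{p}_w - \hat{p}_m}{\sqrt{\bar{p}(1 - \bar{p})\left(\dfrac{1}{n_w} + \dfrac{1}{n_m}\right)}} = \dfrac{0.48 - 0.51}{\sqrt{0.5(1 - 0.5)\left(\dfrac{1}{100} + \dfrac{1}{200}\right)}} = -0.490$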
7. The rejection region is the region on the right of 1.96 or on the left of −1.96
8. Do not reject the null hypothesis 𝐻𝑜 , since the test statistic 𝑧𝑜 does not fall in the rejection region.
9. At 5% level of significance, we cannot reject the company's claim that the drug is equally effective for
men and women .
Example 9:
Find the 95% C.I for the difference of proportions for the data in Example 8.
Solution:
A (1 − 𝛼)100% C.I for the difference is $(\hat{p}_m - \hat{p}_w) \mp z_{\alpha/2}\sqrt{\dfrac{\hat{p}_m(1 - \hat{p}_m)}{n_m} + \dfrac{\hat{p}_w(1 - \hat{p}_w)}{n_w}}$
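The test in Example 8 uses the pooled proportion in the standard error, while the confidence interval in Example 9 uses the two sample proportions separately. The sketch below shows both calculations for the cold-prevention data. The function name `two_proportion_z` is hypothetical, the critical value comes from `statistics.NormalDist` in place of the 𝑧 table, and the interval it prints is computed here, not quoted from the text.

```python
import math
from statistics import NormalDist


def two_proportion_z(x1, n1, x2, n2):
    """z statistic for H_o: p1 = p2, using the pooled sample proportion."""
    p1, p2 = x1 / n1, x2 / n2
    p_bar = (x1 + x2) / (n1 + n2)  # pooled proportion
    se_pooled = math.sqrt(p_bar * (1 - p_bar) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se_pooled, p1, p2


# Example 8: 48 of 100 women and 102 of 200 men did not catch a cold
n_w, n_m = 100, 200
z, p_w, p_m = two_proportion_z(48, n_w, 102, n_m)

z_crit = NormalDist().inv_cdf(1 - 0.05 / 2)  # z_{0.025}, about 1.96
reject = abs(z) > z_crit                     # two-tailed decision

# Example 9: 95% confidence interval for p_m - p_w (unpooled standard error)
se_unpooled = math.sqrt(p_m * (1 - p_m) / n_m + p_w * (1 - p_w) / n_w)
diff = p_m - p_w
ci = (diff - z_crit * se_unpooled, diff + z_crit * se_unpooled)

print(f"z = {z:.3f}, reject H_o: {reject}")
print(f"95% C.I for p_m - p_w = ({ci[0]:.2f}, {ci[1]:.2f})")
```

The interval contains 0, which agrees with the decision not to reject 𝐻𝑜 in Example 8.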
Example 10:
In a sample of 200 workers, 45% said that they missed work because of personal illness. Ten years ago in a
sample of 200 workers, 35% said that they missed work because of personal illness. At 𝛼 = 0.01, is there a
difference in the proportion?
Solution:
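3. Here, $\hat{p}_1 = 0.45$, $\hat{p}_2 = 0.35$, $\bar{p} = \dfrac{n_1\hat{p}_1 + n_2\hat{p}_2}{n_1 + n_2} = \dfrac{200(0.45) + 200(0.35)}{200 + 200} = \dfrac{160}{400} = 0.4$
6. Compute the test statistic $z_o = \dfrac{\hat{p}_1 - \hat{p}_2}{\sqrt{\bar{p}(1 - \bar{p})\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}} = \dfrac{0.45 - 0.35}{\sqrt{0.4(1 - 0.4)\left(\dfrac{1}{200} + \dfrac{1}{200}\right)}} = 2.04$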
7. The rejection region is the region on the right of 2.575 or on the left of −2.575
8. Do not reject the null hypothesis 𝐻𝑜 , since the test statistic 𝑧𝑜 does not fall in the rejection region.
9. At the 1% level of significance, there is not sufficient evidence to support the claim that there is a difference in
the proportions.
PROBLEMS
1. At age 9 the average weight (21.3 kg) and the average height (124.5 cm) for both boys and girls are exactly
the same. A random sample of 9-year-olds yielded these results. Estimate the mean difference in height
between boys and girls with 95% confidence interval. Does your interval support the claim?
Boys Girls
Sample size 60 50
Mean height, cm 123.5 126.2
Population variance 98 120
2.7 ∓ 3.94
2. The average length of "short hospital stays" for men is longer than that for women, 5.2 days versus 4.5 days.
A random sample of recent hospital stays for both men and women revealed the following. At 𝛼 = 0.01, is
there sufficient evidence to conclude that the average hospital stay for men is longer than the average
hospital stay for women?
Men Women
Sample size 32 30
Sample mean, days 5.5 4.2
Population standard deviation 1.2 1.5
𝑧𝑜 = 3.75, 𝑟𝑒𝑗𝑒𝑐𝑡 𝐻𝑜
3. According to the Nielsen Media Research, children (ages 2 – 11) spend an average of 21 hours 30 minutes
watching television per week while teens (ages 12 – 17) spend an average of 20 hours 40 minutes. Based on
the sample statistics obtained below, is there sufficient evidence to conclude a difference in average
television watching times between the two groups? Use 𝛼 = 0.01.
Children Teens
Sample size 15 15
Sample mean, hours 22.45 18.50
Sample variance 16.4 18.2
4. Females and males alike from the general adult population volunteer. A random sample of 20 female
college students and 18 male college students indicated these results concerning the amount of time spent
in volunteer service per week. At the 0.10 level of significance, is there sufficient evidence to conclude that
the mean number of volunteer hours per week for male is less than the mean number of volunteer hours
per week for females?
Male Female
Sample size 18 20
Sample mean, hours 2.5 3.8
Sample variance 2.2 3.5
𝑡𝑜 = −2.38, 𝑟𝑒𝑗𝑒𝑐𝑡 𝐻𝑜
5. Health Care Knowledge Systems reported that an insured woman spends on average 2.3 days in the
hospital for a routine childbirth, while an uninsured woman spends on average 1.9 days. Assume that two
samples of 16 women each were used. The standard deviation of the first sample is 0.6 day,
and the standard deviation of the second sample is 0.3 day. At 𝛼 = 0.05, test the claim that the means are
equal. Find the 95% confidence interval for the difference of the means.
𝑡𝑜 = 2.39, 𝑟𝑒𝑗𝑒𝑐𝑡 𝐻𝑜
6. In a sample of 200 men, 130 said they used seat belts. In a sample of 300 women, 63 said they used seat
belts. Test the claim that men are more safety-conscious than women, at 𝛼 = 0.05. Find the 90% confidence
interval for the difference in proportions.
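The test values quoted for Problems 2, 4, and 5 can be checked with a short sketch. It assumes the z statistic with known population standard deviations for Problem 2 and the unpooled (unequal-variance) t statistic for Problems 4 and 5. The function name `two_sample_statistic` is illustrative and does not come from the text.

```python
import math

def two_sample_statistic(mean1, mean2, var1, var2, n1, n2):
    """(mean1 - mean2) divided by the standard error sqrt(var1/n1 + var2/n2)."""
    standard_error = math.sqrt(var1 / n1 + var2 / n2)
    return (mean1 - mean2) / standard_error

# Problem 2: population standard deviations are known, so this is a z statistic.
z_problem2 = two_sample_statistic(5.5, 4.2, 1.2 ** 2, 1.5 ** 2, 32, 30)

# Problem 4: sample variances are given; males minus females (left-tailed test).
t_problem4 = two_sample_statistic(2.5, 3.8, 2.2, 3.5, 18, 20)

# Problem 5: sample standard deviations 0.6 and 0.3 with 16 women per sample.
t_problem5 = two_sample_statistic(2.3, 1.9, 0.6 ** 2, 0.3 ** 2, 16, 16)

print(round(z_problem2, 2), round(t_problem4, 2), round(t_problem5, 2))
# prints: 3.75 -2.38 2.39
```

The three results agree with the answers listed under the problems. Whether to reject the null hypothesis still depends on the critical value read from the z or t table.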
To answer this question, a researcher selects a sample of nurses and doctors and tabulates the data in table
form, as shown.
As the survey indicates, 100 nurses prefer the new procedure, 80 prefer the old procedure, and 20 have no
preference; 30 doctors prefer the new procedure, 60 prefer the old procedure, and 10 have no preference. Since
the main question is whether there is a difference in opinion, the null hypothesis is stated as follows:
𝑯𝒐 : The opinion about the procedure is independent of (not related to) the profession.
𝑯𝟏 : The opinion about the procedure is dependent on (related to) the profession.
If the null hypothesis is not rejected, the test means that both professions feel basically the same way about the
procedure and the differences are due to chance. If the null hypothesis is rejected, the test means that one
group feels differently about the procedure from the other. Remember that rejection does not mean that one
group favors the procedure and the other does not. Perhaps the two groups favor it or both dislike it, but in
different proportions.
To test the null hypothesis by using chi-square test for independence, you must follow the steps:
Step 1
Arrange the data obtained from the sample in a contingency table. The table is made up of 𝑅 rows and 𝐶
columns. Here 𝑅 = 2 and 𝐶 = 3. A contingency table is designated as an 𝑅 × 𝐶 table; in this example the table is a
2 × 3 contingency table. Each block in the table is called a cell and is designated by its row and column position.
For example, the cell with an observed frequency of 80 is designated as 𝑂1,2. The cells are shown below.
Step 2
Compute the expected frequencies, assuming that the null hypothesis is true. These frequencies are computed
by using the observed frequencies given in the table. The expected frequency for each cell (𝐸𝑖,𝑗 ) is computed by
using the formula
$$E_{ij} = \frac{TR_i \times TC_j}{n}$$
Where, 𝑇𝑅𝑖 = the sum of frequencies in the ith row, 𝑇𝐶𝑗 = the sum of frequencies in the jth column, and 𝑛 is the
sample size (the sum of all frequencies)
Note:
1. Sum of observed frequencies in the ith row = Sum of expected frequencies in the ith row = 𝑇𝑅𝑖
2. Sum of observed frequencies in the jth column = Sum of expected frequencies in the jth column = 𝑇𝐶𝑗
3. Sum of all observed frequencies = Sum of all expected frequencies = n
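As a minimal sketch of Step 2, the expected frequencies for the nurses and doctors survey can be computed directly from the row and column totals. The function name `expected_frequencies` is an illustrative choice, not part of the text.

```python
def expected_frequencies(observed):
    """E[i][j] = (total of row i) * (total of column j) / n for a table of counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return [[tr * tc / n for tc in col_totals] for tr in row_totals]

# Rows: nurses, doctors. Columns: prefer new, prefer old, no preference.
observed = [[100, 80, 20],
            [30, 60, 10]]
expected = expected_frequencies(observed)

for row in expected:
    print([round(e, 2) for e in row])
```

Each row of expected frequencies adds up to the same total as the matching row of observed frequencies, which is the property stated in the note above.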
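Step 3
Compute the test value $\chi_o^2 = \sum \frac{(O - E)^2}{E}$, where the sum runs over all cells of the table.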
Step 4
Determine the level of significance 𝛼 and find the critical value $\chi_\alpha^2$ with degrees of freedom $D.f = (R - 1)(C - 1)$.
In this example, $D.f = (2 - 1)(3 - 1) = 2$. If 𝛼 = 0.05, the critical value is $\chi_{0.05,2}^2 = 5.991$.
Step 5
Make the decision. The null hypothesis is rejected if $\chi_o^2 > \chi_\alpha^2$.
In this example, $\chi_o^2 = 11.86 > 5.991 = \chi_\alpha^2$, so the decision is to reject $H_o$.
Step 6
Summarize the results. There is sufficient evidence to support the claim that the opinion is related to (dependent
on) the profession; that is, the doctors and nurses differ in their opinions about the procedure.
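The whole procedure for the nurses and doctors survey can be sketched in a few lines. This is an illustration under the assumption that the test value is the usual sum of (O − E)²/E over all cells. The function name is hypothetical, and the critical value is the table value quoted in the text rather than something the code derives.

```python
def chi_square_independence(observed):
    """Return (test value, degrees of freedom) for an R x C table of counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    chi_sq = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / n  # expected frequency E_ij
            chi_sq += (o - e) ** 2 / e
    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return chi_sq, df

# Rows: nurses, doctors. Columns: prefer new, prefer old, no preference.
observed = [[100, 80, 20],
            [30, 60, 10]]
chi_sq, df = chi_square_independence(observed)

CRITICAL_VALUE = 5.991  # chi-square table value for alpha = 0.05, D.f = 2
reject_h0 = chi_sq > CRITICAL_VALUE
print(f"chi-square = {chi_sq:.2f}, D.f = {df}, reject H_o: {reject_h0}")
```

With unrounded expected frequencies the test value comes out near 11.87, which matches the 11.86 quoted above up to rounding, and the decision is the same.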
Example 1:
A researcher wishes to determine whether there is a relationship between the gender of an individual and the
amount of coffee consumed. A random sample of 68 people is selected, and the following data are obtained:
Coffee consumption
Gender Low Moderate High Total
Male 10 9 8 27
Female 13 16 12 41
Total 23 25 20 68
At 𝛼 = 0.10, can the researcher conclude that coffee consumption is related to gender?
Solution:
Coffee consumption
Gender Low Moderate High Total
Male 10 (9.13) 9 (9.93) 8 (7.94) 27
Female 13 (13.87) 16 (15.07) 12 (12.06) 41
Total 23 25 20 68
5. Make the decision. Do not reject $H_o$, since $\chi_o^2 = 0.283 < 4.605 = \chi_\alpha^2$.
6. Summarize the results. There is not sufficient evidence to support the claim that the amount of coffee a
person consumes is related to the individual's gender.
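The decision in Example 1 can be cross-checked with a p-value. The sketch below relies on a special property of the chi-square distribution with exactly 2 degrees of freedom: the area to the right of x is exp(−x/2). That shortcut does not hold for other degrees of freedom, and it is an addition here rather than a method from the text.

```python
import math

# Observed counts: rows are male, female; columns are low, moderate, high.
observed = [[10, 9, 8],
            [13, 16, 12]]
row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

chi_sq = sum(
    (observed[i][j] - row_totals[i] * col_totals[j] / n) ** 2
    / (row_totals[i] * col_totals[j] / n)
    for i in range(2)
    for j in range(3)
)

# Valid only for D.f = 2: right-tail area and the critical value at alpha = 0.10.
p_value = math.exp(-chi_sq / 2)
critical_value = -2 * math.log(0.10)

print(f"chi-square = {chi_sq:.3f}, critical value = {critical_value:.3f}, p-value = {p_value:.3f}")
```

The test value is about 0.281 when the expected frequencies are not rounded (0.283 with the rounded values shown in the table). The p-value is far above 0.10, so the null hypothesis is not rejected.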
Example 2:
Use the data in Example 1 to construct a 95% confidence interval for:
a) The proportion of the population with a high level of coffee consumption.
b) The proportion of males with a low level of coffee consumption.
c) The difference between the proportions of males and females with a high level of coffee consumption.
a) $0.294 \mp 1.96\sqrt{\frac{0.294(1 - 0.294)}{68}} = 0.294 \mp 0.108 = (0.186, 0.402)$
b) $0.37 \mp 1.96\sqrt{\frac{0.37(1 - 0.37)}{27}} = 0.37 \mp 0.18 = (0.19, 0.55)$
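The intervals in Example 2 can be reproduced with the sketch below. It assumes the usual large-sample interval for a proportion and, for part (c), the analogous interval for a difference of two independent proportions. Part (c) is not worked in the text, so its result comes from this sketch only. The function names are illustrative.

```python
import math

Z_95 = 1.96  # z value for a 95% confidence level

def proportion_ci(successes, n, z=Z_95):
    """Large-sample confidence interval for one proportion."""
    p_hat = successes / n
    margin = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin

def difference_ci(x1, n1, x2, n2, z=Z_95):
    """Large-sample confidence interval for p1 - p2 (independent samples)."""
    p1, p2 = x1 / n1, x2 / n2
    margin = z * math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return (p1 - p2) - margin, (p1 - p2) + margin

ci_a = proportion_ci(20, 68)         # high consumption, whole sample
ci_b = proportion_ci(10, 27)         # low consumption among males
ci_c = difference_ci(8, 27, 12, 41)  # high consumption: males minus females

for label, (low, high) in zip("abc", (ci_a, ci_b, ci_c)):
    print(f"{label}) ({low:.3f}, {high:.3f})")
```

Parts (a) and (b) match the intervals shown above. The interval for part (c) contains zero, which is consistent with the chi-square test in Example 1 finding no relationship.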
When you are testing to see whether a frequency distribution fits a specific pattern, you can use the chi-square
goodness-of-fit test.
For example, suppose as a market analyst you wish to see whether consumers have any preference among five
flavors of a new fruit soda. A sample of 100 people provided these data:
To answer this question, we will use the chi-square goodness-of-fit test as shown below:
Step 1:
State the hypotheses 𝐻𝑜 and 𝐻1 , and identify which one is the claim.
In the above example:
$H_o$: Consumers show no preference for flavors of the fruit soda ($p_M = p_S = p_O = p_L = p_G = \frac{1}{5}$).
Step 2:
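Compute the test value $\chi_o^2$ by evaluating the expected frequencies, $E_i = np_i$.
In this example: $E_M = E_S = E_O = E_L = E_G = 100\left(\frac{1}{5}\right) = 20$
$$\chi_o^2 = \sum \frac{(O - E)^2}{E} = \sum \frac{O^2}{E} - n = \frac{32^2}{20} + \frac{28^2}{20} + \frac{16^2}{20} + \frac{14^2}{20} + \frac{10^2}{20} - 100 = 18$$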
Step 4: Find the rejection region (critical region), which lies to the right of $\chi_\alpha^2$ with $D.f = k - 1$, where $k$ is the
number of categories.
In this example $D.f = k - 1 = 5 - 1 = 4$. At 𝛼 = 0.05, $\chi_{0.05,4}^2 = 9.49$.
Step 5: If $\chi_o^2 > \chi_\alpha^2$, then reject $H_o$. Here $\chi_o^2 = 18 > \chi_{0.05,4}^2 = 9.49$. Reject $H_o$ and accept $H_1$.
We can say at 𝛼 = 0.05 that consumers show a preference among the flavors of the fruit soda.
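The test value for the fruit soda example can be computed in two equivalent ways, as the sketch below shows. The observed counts are the ones that appear in the calculation above. Matching them to flavor names in the order mango, strawberry, orange, lime, grape is an assumption based on the subscripts M, S, O, L, G.

```python
# Observed counts; the flavor order is assumed from the subscripts M, S, O, L, G.
observed = {"mango": 32, "strawberry": 28, "orange": 16, "lime": 14, "grape": 10}

n = sum(observed.values())
expected = n / len(observed)  # equal preference: E = n * (1/5) = 20

# Definition: sum of (O - E)^2 / E over the categories.
chi_sq_definition = sum((o - expected) ** 2 / expected for o in observed.values())

# Shortcut: sum of O^2 / E minus n, valid because the E values add up to n.
chi_sq_shortcut = sum(o ** 2 / expected for o in observed.values()) - n

CRITICAL_VALUE = 9.49  # chi-square table value for alpha = 0.05, D.f = 4
print(chi_sq_definition, chi_sq_shortcut, chi_sq_definition > CRITICAL_VALUE)
```

Both forms give 18. The shortcut saves a subtraction per category when working by hand.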
Example 3:
For the data in the above example, test the claim that the consumers prefer the mango, strawberry, orange,
lime, and grape flavors with ratio 3:3:2:1:1, respectively. Use 𝛼 = 0.10
To answer this question, we will use the chi-square goodness-of-fit test as shown below:
Step 1:
State the hypotheses 𝐻𝑜 and 𝐻1 , and identify which one is the claim.
In the above example:
$p_M = p_S = \frac{3}{3+3+2+1+1} = 0.3$, $p_O = \frac{2}{3+3+2+1+1} = 0.2$, $p_L = p_G = \frac{1}{3+3+2+1+1} = 0.1$
Step 2:
Compute the test value $\chi_o^2$ by evaluating the expected frequencies, $E_i = np_i$.
In this example:
$E_M = np_M = 100(0.3) = 30$,
$E_S = np_S = 100(0.3) = 30$,
$E_O = np_O = 100(0.2) = 20$,
$E_L = np_L = 100(0.1) = 10$,
$E_G = np_G = 100(0.1) = 10$
$$\chi_o^2 = \sum \frac{(O - E)^2}{E} = \sum \frac{O^2}{E} - n = \frac{32^2}{30} + \frac{28^2}{30} + \frac{16^2}{20} + \frac{14^2}{10} + \frac{10^2}{10} - 100 = 2.667$$
Step 4: Find the rejection region (critical region), which lies to the right of $\chi_\alpha^2$ with $D.f = k - 1$, where $k$ is the
number of categories.
In this example $D.f = k - 1 = 5 - 1 = 4$. At 𝛼 = 0.10, $\chi_{0.10,4}^2 = 7.78$.
Step 5: If $\chi_o^2 > \chi_\alpha^2$, then reject $H_o$. Here $\chi_o^2 = 2.667 < \chi_{0.10,4}^2 = 7.78$, so do not reject $H_o$.
We can say at 𝛼 = 0.10 that there is not sufficient evidence to reject the claim that consumers prefer the mango,
strawberry, orange, lime, and grape flavors in the ratio 3:3:2:1:1.
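When the claim is stated as a ratio, the only extra work is turning the ratio into probabilities. The sketch below does this for Example 3. The function name `goodness_of_fit` is illustrative, and the order of the counts follows the order in which the flavors are listed in the example.

```python
def goodness_of_fit(observed, ratio):
    """Return (expected frequencies, test value, degrees of freedom)."""
    n = sum(observed)
    total_parts = sum(ratio)
    expected = [n * part / total_parts for part in ratio]  # E_i = n * p_i
    chi_sq = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return expected, chi_sq, len(observed) - 1

# Mango, strawberry, orange, lime, grape.
observed = [32, 28, 16, 14, 10]
expected, chi_sq, df = goodness_of_fit(observed, ratio=[3, 3, 2, 1, 1])

CRITICAL_VALUE = 7.78  # chi-square table value for alpha = 0.10, D.f = 4
print(expected, round(chi_sq, 3), df, chi_sq > CRITICAL_VALUE)
```

The same function reproduces the equal-preference test of the earlier example when it is called with the ratio 1:1:1:1:1.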
Example 4:
In a certain population, the percentages of people with each blood type are as follows: O, 6%; A, 40%; B, 42%; and
AB, 12%. At a recent blood drive at a large university, the donors were classified as shown below. At 𝛼 = .05, is
there sufficient evidence to conclude that the proportions differ from those stated above?
O A B AB Total
10 65 60 15 150
Solution:
Step 1:
$H_o$: $p_O = 0.06$, $p_A = 0.40$, $p_B = 0.42$, $p_{AB} = 0.12$
$H_1$: not $H_o$ (claim)
Step 2:
𝑛 = 150
$E_O = np_O = 150(0.06) = 9$,
$E_A = np_A = 150(0.40) = 60$,
$E_B = np_B = 150(0.42) = 63$,
and $E_{AB} = np_{AB} = 150(0.12) = 18$
O A B AB Total
Observed 10 65 60 15 150
Expected 9 60 63 18 150
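The remaining steps of Example 4 follow the same pattern as the earlier goodness-of-fit examples. The sketch below carries them through. The test value and decision it produces are computed here, not quoted from the text, and the critical value 7.815 is the standard chi-square table value for 𝛼 = 0.05 with 3 degrees of freedom.

```python
proportions = {"O": 0.06, "A": 0.40, "B": 0.42, "AB": 0.12}
observed = {"O": 10, "A": 65, "B": 60, "AB": 15}

n = sum(observed.values())
expected = {blood_type: n * p for blood_type, p in proportions.items()}

# Contribution of each blood type to the test value: (O - E)^2 / E.
contributions = {
    blood_type: (observed[blood_type] - expected[blood_type]) ** 2 / expected[blood_type]
    for blood_type in observed
}
chi_sq = sum(contributions.values())
df = len(observed) - 1

CRITICAL_VALUE = 7.815  # standard table value, alpha = 0.05, D.f = 3
reject_h0 = chi_sq > CRITICAL_VALUE
print(f"chi-square = {chi_sq:.3f}, D.f = {df}, reject H_o: {reject_h0}")
```

Under these assumptions the test value is about 1.171, well below the critical value, so the null hypothesis would not be rejected: the donor proportions do not differ significantly from the stated ones.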
PROBLEMS
1. Listed below is information regarding organ transplantation for three different years. Based on these data,
is there sufficient evidence at 𝛼 = 0.01 to conclude that a relationship exists between year and type of
transplant?
2. A study is being conducted to determine whether the age of the customer is related to type of movie he or
she rents. A sample of renters gives the data shown here. At 𝛼 = 0.10, is the type of movie selected related
to customer’s age?
Type of movie
Age Documentary Comedy Mystery
12 – 20 14 9 8
21 – 29 15 14 9
30 – 38 9 21 39
39 – 47 7 22 17
48 and over 6 38 12
𝜒𝑜2 = 46.696
3. To test the effectiveness of a new drug, a researcher gives one group of individuals the new drug and
another group a placebo. The results of the study are shown here. At 𝛼 = 0.05, can the researcher conclude
that the drug is effective?
𝜒𝑜2 = 10.637
4. To test whether a die is fair, a student rolled the die 300 times and the following data were obtained
1 2 3 4 5 6 Total
Observed frequency 45 52 60 47 48 48 300
At 𝛼 = 0.10, can the student conclude that the die is fair? 𝜒𝑜2 = 2.92
One technique that can be used to compare two or more population means is the analysis of variance (ANOVA).
This technique is used to test claims involving two or more means. For example, suppose a
researcher wishes to see whether the means of the time it takes three groups of students to solve a computer
problem using Fortran, Basic, and Pascal are different. The researcher will use the ANOVA technique for this
test.
The analysis of variance that is used to compare two or more means is called a one-way analysis of variance
since it contains only one variable. In the previous example, the variable is the type of computer language used.
The analysis of variance can be extended to studies involving two variables, such as type of computer language
used and mathematical background of the students. These studies involve a two-way analysis of variance.
Case Study
A researcher wishes to try three different methods to lower the blood pressure of individuals diagnosed with
high blood pressure. The subjects were randomly assigned to three groups each of 5 subjects; the first group
takes medication, the second group exercises, and the third group follows a special diet. After four weeks, the
reduction in each person’s blood pressure is recorded and the following data obtained:
Medication    Exercise    Diet
10            6           5
12            8           9
9             3           12
15            0           8
13            2           4
The researcher wishes to know whether the three methods are equivalent and, if they are not, which method is
best for lowering blood pressure.
To answer these questions, we will explain how to use the ANOVA technique in the following section.
1. The populations from which the samples were obtained must be normally distributed.
2. The samples must be independent from one another.
3. The variances of the populations must be equal: $\sigma_1^2 = \sigma_2^2 = \cdots = \sigma_k^2 = \sigma^2$
To test whether there are differences between the 𝑘 population means, follow the steps below:
Step 1:
$H_o$: $\mu_1 = \mu_2 = \cdots = \mu_k$
$H_1$: At least one mean is different from the others.
Step 2:
Compute the test value. Even though you are comparing means in the ANOVA, variances are used in the test
instead of means. Two different estimates of the population variance are made.
1. The first estimate is called the between-group variance (denoted by $s_B^2$), and it involves finding the
variance of the means.
2. The second estimate, the within-group variance (denoted by $s_W^2$), is made by computing the mean of the
group variances, and it is not affected by differences in the means.
Note: In this case, the variance $s_W^2$ is also called the pooled variance and is denoted by $s_p^2$.
If there is no difference in the means, the between-group variance estimate will be approximately equal to the
within-group variance estimate, and then the null hypothesis will not be rejected. However, when the means
differ largely, the between-group variance will be much larger than the within-group variance, and then the
null hypothesis will be rejected.
Since variances are compared, this procedure is called analysis of variance (ANOVA).
How to evaluate the between-group variance ($s_B^2$) and the within-group variance ($s_W^2$):
1. Evaluate the mean and the variance for each group, $\bar{x}_1, \bar{x}_2, \ldots, \bar{x}_k$ and $s_1^2, s_2^2, \ldots, s_k^2$.
2. Evaluate the combined mean, $\bar{x}_c = \frac{\sum n_i \bar{x}_i}{n} = \frac{n_1\bar{x}_1 + n_2\bar{x}_2 + \cdots + n_k\bar{x}_k}{n}$, where $n_i$ is the number of observations in
the ith group.
Step 3:
Determine the level of significance 𝛼 and then find the critical value 𝐹𝛼 with 𝐷. 𝑓1 = 𝑘 − 1 and
𝐷. 𝑓2 = 𝑛 − 𝑘 . This critical value is obtained from the table of the 𝐹- distribution.
Step 4:
Make the decision. The decision is to reject the null hypothesis when the evaluated test statistic is
greater than the critical value 𝐹𝑜 > 𝐹𝛼
Step 5:
Summarize the results.
Example 1:
Given the data in the above case study, test the claim that there is no difference among the means at 𝛼 = 0.05.
Solution:
𝐷. 𝑓1 = 𝑘 − 1 = 3 − 1 = 2
𝐷. 𝑓2 = 𝑛 − 𝑘 = 15 − 3 = 12
At 𝛼 = 0.05, 𝐹𝛼 = 𝐹0.05 = 3.89
4. Since 𝐹𝑜 = 9.168 > 𝐹𝛼 = 3.89 , the decision is to reject 𝐻𝑜 and accept 𝐻1 (p-value=0.004)
5. There is sufficient evidence to reject the claim and conclude that at least one mean is different from the
others.
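The test value in Example 1 can be reproduced from the case study data with Python's standard `statistics` module. This is a sketch: it assumes the three columns of the data are medication, exercise, and diet in the order the groups are described, and the variable names are illustrative.

```python
from statistics import mean, variance

# Reduction in blood pressure; column order assumed from the case study text.
groups = {
    "medication": [10, 12, 9, 15, 13],
    "exercise": [6, 8, 3, 0, 2],
    "diet": [5, 9, 12, 8, 4],
}

k = len(groups)                                   # number of groups
n = sum(len(values) for values in groups.values())  # number of all observations

# Combined mean: group means weighted by group sizes.
combined_mean = sum(len(v) * mean(v) for v in groups.values()) / n

# Between-group variance: weighted squared deviations of group means, over k - 1.
s_b2 = sum(len(v) * (mean(v) - combined_mean) ** 2 for v in groups.values()) / (k - 1)

# Within-group variance: pooled sample variances, over n - k.
s_w2 = sum((len(v) - 1) * variance(v) for v in groups.values()) / (n - k)

f_o = s_b2 / s_w2
F_CRITICAL = 3.89  # F table value for alpha = 0.05, D.f1 = 2, D.f2 = 12
print(f"F_o = {f_o:.3f}, reject H_o: {f_o > F_CRITICAL}")
```

The result, about 9.168, agrees with the value used in the decision step.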
Another Notation
In statistical programs such as MINITAB, SPSS, and R, the calculations above are summarized in a table as
follows:
Source            Sum of Squares    D.F.     Mean Squares    𝑭𝒐
Between groups    𝑆𝑆𝐵               𝑘 − 1    𝑀𝑆𝐵             𝑀𝑆𝐵 / 𝑀𝑆𝐸
Within (Error)    𝑆𝑆𝑊 or 𝑆𝑆𝐸        𝑛 − 𝑘    𝑀𝑆𝑊 or 𝑀𝑆𝐸
Total             𝑆𝑆𝑇               𝑛 − 1
1. $SSB$ = sum of squares between groups = $\sum n_i \bar{x}_i^2 - n\bar{x}_c^2$ = the numerator of $s_B^2$
2. $SSW$ or $SSE$ = sum of squares within groups = $\sum (n_i - 1)s_i^2$ = the numerator of $s_W^2$
3. $SST = \sum x^2 - n\bar{x}_c^2 = SSB + SSW$, and it is called the total sum of squares
4. $k$ = number of groups
5. $n = \sum n_i$ = number of all observations
6. Notice that $n - 1 = (k - 1) + (n - k)$
7. $MSB = \frac{SSB}{k - 1} = s_B^2$
8. $MSE = MSW = \frac{SSW}{n - k} = s_W^2$
9. $F_o = MSB / MSE$
As an illustration, the ANOVA table for the previous example is constructed as follows:
Source            Sum of Squares    D.F.    Mean Squares    𝑭𝒐       𝑭𝟎.𝟎𝟓
Between           160.13            2       80.067          9.168    3.89
Within (Error)    104.8             12      8.733
Total             264.93            14
$s_W^2 = \frac{104.8}{12} = 8.733 \rightarrow SSW = 104.8$ and $MSW = s_W^2 = 8.733$
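The entries of the ANOVA table can also be built from the sum-of-squares formulas listed above. The sketch below applies them to the case study data, with the column order again assumed to be medication, exercise, diet, and checks the identity SST = SSB + SSW. Rounding the combined mean before squaring it can shift SSB in the second decimal place, so the sketch keeps full precision.

```python
groups = [
    [10, 12, 9, 15, 13],  # medication (assumed column order)
    [6, 8, 3, 0, 2],      # exercise
    [5, 9, 12, 8, 4],     # diet
]

k = len(groups)
n = sum(len(g) for g in groups)
combined_mean = sum(sum(g) for g in groups) / n

# SSB = sum of n_i * (group mean)^2 minus n * (combined mean)^2
ssb = sum(len(g) * (sum(g) / len(g)) ** 2 for g in groups) - n * combined_mean ** 2

# SSW = sum of (n_i - 1) * s_i^2, i.e. squared deviations from each group mean
ssw = sum(sum((x - sum(g) / len(g)) ** 2 for x in g) for g in groups)

# SST = sum of x^2 minus n * (combined mean)^2
sst = sum(x ** 2 for g in groups for x in g) - n * combined_mean ** 2

msb = ssb / (k - 1)
msw = ssw / (n - k)
f_o = msb / msw

print(f"Between  {ssb:8.2f} {k - 1:4d} {msb:8.3f} {f_o:8.3f}")
print(f"Within   {ssw:8.2f} {n - k:4d} {msw:8.3f}")
print(f"Total    {sst:8.2f} {n - 1:4d}")
```

The degrees of freedom add up in the same way as the sums of squares: 2 + 12 = 14.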
Example 2:
Complete the following ANOVA table. State the hypotheses, and make a decision.
Sum of Mean
Source Squares D.F. Squares 𝑭𝒐 𝑭𝟎.𝟎𝟓
Between ___a___ 3 __c__ __e__ __f__
Within (Error) 42.333 __b__ __d__
Total 92.950 19
Solution:
3. 3 + 𝑏 = 19 → 𝑏 = 19 – 3 = 16
5. 𝑒 = 𝑀𝑆𝐵/𝑀𝑆𝐸 = 𝐹𝑜 = 6.376
6. 𝑓 = 𝐹.05,3,16 = 3.24
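Completing a partly filled ANOVA table is pure arithmetic on the identities SST = SSB + SSW and total D.F. = between D.F. + within D.F. The sketch below fills in the blanks of Example 2; the function and key names are illustrative, and the critical value must still be read from the F table.

```python
def complete_anova_table(ss_within, ss_total, df_between, df_total):
    """Fill in the missing entries of a one-way ANOVA table."""
    ss_between = ss_total - ss_within   # a
    df_within = df_total - df_between   # b
    ms_between = ss_between / df_between  # c
    ms_within = ss_within / df_within     # d
    f_o = ms_between / ms_within          # e
    return {
        "a": ss_between,
        "b": df_within,
        "c": ms_between,
        "d": ms_within,
        "e": f_o,
    }

table = complete_anova_table(ss_within=42.333, ss_total=92.950, df_between=3, df_total=19)
for key, value in table.items():
    print(key, round(value, 3))

F_CRITICAL = 3.24  # f: F table value for alpha = 0.05, D.f1 = 3, D.f2 = 16
print("reject H_o:", table["e"] > F_CRITICAL)
```

Since the between-groups D.F. is 3 and the total D.F. is 19, the table describes 4 groups and 20 observations in all.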
Example 3:
Solution:
Example 4:
The number of grams of fiber per serving for a random sample of three different kinds of food is listed. Is there
sufficient evidence at 𝛼 = 0.05 to conclude that there is a difference in mean fiber content among breakfast
cereals, fruits, and vegetables? Given the following data do a complete ANOVA.
Solution:
4. Make the decision. Since $F_o = 2.109 < F_\alpha = 3.47$, the decision is to not reject $H_o$ (p-value = 0.146).
5. Summarize the results. There is not sufficient evidence to support the claim that the mean fiber contents differ.
6. There is no need to compare means, since the null hypothesis is not rejected.
Source            Sum of Squares    D.F.    Mean Squares    𝑭𝒐       𝑭𝟎.𝟎𝟓
Between           25.68             2       12.84           2.109    3.47
Within (Error)    127.817           21      6.087
Total             153.497           23
PROBLEMS
1. In an experiment to determine the effect of nutrition on the attention spans of elementary school students,
a group of 15 students were randomly assigned to each of 3 meal plans: no breakfast, light breakfast, and
full breakfast. Their attention spans (in minutes) were recorded during a morning reading period and are
shown in the following table. Does the type of breakfast affect the attention spans? Test using 𝛼 = 0.05
2. A researcher wishes to see whether there is any difference in the weight gains of athletes following one of
three special diets. Athletes are randomly assigned to three groups and placed on the diet for 6 weeks. The
weight gains (in pounds) are shown below. At 𝛼 = 0.05, can the researcher conclude that there is a
difference in the diets?
3. The amount of sodium (in mg) in one serving for a random sample of three different kinds of foods is
measured and summarized in the following ANOVA table. Complete the table and test whether there is a
difference in mean sodium amounts among condiments, cereals, and desserts. Use 𝛼 = 0.05
Sum of Mean
Source Squares D.F. Squares 𝑭𝒐 𝑭𝟎.𝟎𝟓
Between 275.4 ____ ______ ______ ______
Within (Error) _______ 19 ______
Total 1366.3 21
𝐹𝑜 = 2.40
See and solve the following examples and exercises from the text book:
Section 12 – 1
Examples: 1, 2
Exercises: 8, 9, 12, 14
Critical values of the 𝐹 distribution at 𝛼 = 0.05 (Df1 across the columns, Df2 down the rows)
Df1
Df2 1 2 3 4 5 6 7 8 9 10 Df2
1 161.4 199.5 215.7 224.6 230.2 234.0 236.8 238.9 240.5 241.9 1
2 18.51 19.00 19.16 19.25 19.30 19.33 19.35 19.37 19.38 19.40 2
3 10.13 9.55 9.28 9.12 9.01 8.94 8.89 8.85 8.81 8.79 3
4 7.71 6.94 6.59 6.39 6.26 6.16 6.09 6.04 6.00 5.96 4
5 6.61 5.79 5.41 5.19 5.05 4.95 4.88 4.82 4.77 4.74 5
6 5.99 5.14 4.76 4.53 4.39 4.28 4.21 4.15 4.10 4.06 6
7 5.59 4.74 4.35 4.12 3.97 3.87 3.79 3.73 3.68 3.64 7
8 5.32 4.46 4.07 3.84 3.69 3.58 3.50 3.44 3.39 3.35 8
9 5.12 4.26 3.86 3.63 3.48 3.37 3.29 3.23 3.18 3.14 9
10 4.96 4.10 3.71 3.48 3.33 3.22 3.14 3.07 3.02 2.98 10
11 4.84 3.98 3.59 3.36 3.20 3.09 3.01 2.95 2.90 2.85 11
12 4.75 3.89 3.49 3.26 3.11 3.00 2.91 2.85 2.80 2.75 12
13 4.67 3.81 3.41 3.18 3.03 2.92 2.83 2.77 2.71 2.67 13
14 4.60 3.74 3.34 3.11 2.96 2.85 2.76 2.70 2.65 2.60 14
15 4.54 3.68 3.29 3.06 2.90 2.79 2.71 2.64 2.59 2.54 15
16 4.49 3.63 3.24 3.01 2.85 2.74 2.66 2.59 2.54 2.49 16
17 4.45 3.59 3.20 2.96 2.81 2.70 2.61 2.55 2.49 2.45 17
18 4.41 3.55 3.16 2.93 2.77 2.66 2.58 2.51 2.46 2.41 18
19 4.38 3.52 3.13 2.90 2.74 2.63 2.54 2.48 2.42 2.38 19
20 4.35 3.49 3.10 2.87 2.71 2.60 2.51 2.45 2.39 2.35 20
21 4.32 3.47 3.07 2.84 2.68 2.57 2.49 2.42 2.37 2.32 21
22 4.30 3.44 3.05 2.82 2.66 2.55 2.46 2.40 2.34 2.30 22
23 4.28 3.42 3.03 2.80 2.64 2.53 2.44 2.37 2.32 2.27 23
24 4.26 3.40 3.01 2.78 2.62 2.51 2.42 2.36 2.30 2.25 24
25 4.24 3.39 2.99 2.76 2.60 2.49 2.40 2.34 2.28 2.24 25
26 4.23 3.37 2.98 2.74 2.59 2.47 2.39 2.32 2.27 2.22 26
27 4.21 3.35 2.96 2.73 2.57 2.46 2.37 2.31 2.25 2.20 27
28 4.20 3.34 2.95 2.71 2.56 2.45 2.36 2.29 2.24 2.19 28
29 4.18 3.33 2.93 2.70 2.55 2.43 2.35 2.28 2.22 2.18 29
30 4.17 3.32 2.92 2.69 2.53 2.42 2.33 2.27 2.21 2.16 30
40 4.08 3.23 2.84 2.61 2.45 2.34 2.25 2.18 2.12 2.08 40
60 4.00 3.15 2.76 2.53 2.37 2.25 2.17 2.10 2.04 1.99 60
120 3.92 3.07 2.68 2.45 2.29 2.17 2.09 2.02 1.96 1.91 120
∞ 3.84 3.00 2.60 2.37 2.21 2.10 2.01 1.94 1.88 1.83 ∞