
Beni-Suef National University

Faculty of Computers and Artificial Intelligence

Probability and Statistics

2nd Year Students


2023-2024
Prepared by
Dr. Alaa H. Abdel-Hamid
Professor of Mathematical Statistics
Contents
Chapter 1  INTRODUCTION TO PROBABILITY
Chapter 2  RANDOM VARIABLES, PROBABILITY FUNCTIONS AND EXPECTATIONS
Chapter 3  SOME IMPORTANT DISCRETE DISTRIBUTIONS
Chapter 4  SOME IMPORTANT CONTINUOUS DISTRIBUTIONS
Chapter 5  SAMPLING THEORY
Chapter 6  POINT AND INTERVAL ESTIMATIONS
Chapter 7  TESTS OF HYPOTHESES
REFERENCES
Chapter 1

INTRODUCTION TO
PROBABILITY

The purpose of this chapter is to present a formal treatment of


the mathematical elements of probability theory.

1.1 Elements of Probability


One of the basic features to be found in repetitive operations,
that is, repeating a trial or experiment over and over again,
under specified conditions is that the outcome varies from trial
to trial. This leads us to analyze the possible outcomes which
could arise if a trial or experiment were performed once.

1.1.1 Some important definitions


Definition 1.1 (Random experiment (E)). A random experiment, denoted by E, is an experiment whose individual outcome cannot be predicted in advance, although the set of all its possible outcomes is known beforehand.
Definition 1.2 (Sample space (S)). The sample space, denoted
by S, is the set of all possible outcomes which could arise if a
random experiment were performed once.

Definition 1.3 (Event). An event is any subset of the finite sample space S.
Example 1.1 (Rolling a die). The sample space, S, when we roll a die is
S = {1, 2, 3, 4, 5, 6}.
Example 1.2 (Tossing a coin). The sample space, when we toss
a coin is
S = {H, T }, H: Head, T : Tail.
Example 1.3 (Rolling a die and tossing a coin). The sample
space can be written in the form
S ={1, 2, 3, 4, 5, 6} × {H, T }
={(1, H), (2, H), . . . , (6, H), (1, T ), (2, T ), . . . , (6, T )}.
Example 1.4. If E is the experiment of tossing a coin until a head occurs, and we count the number of tosses required, we get
S = {1, 2, 3, . . .}.
Example 1.5 (Tossing 3 coins or equivalently tossing one coin
3 times).
S ={H, T } × {H, T } × {H, T }
={HHH, HHT, HT T, HT H, T HH, T T H, T HT, T T T }.
Note that, for simplicity, we write, for example, (H, T, H) ≡ HTH. The number of prime events in S is equal to 2³ = 8.

Remark 1.1.
1. If the event consists of only one element, then it is called a prime event.
2. If the event contains no elements, then it is called an impossible event.
3. If the event is the sample space itself, then it is called a certain event.
4. For example, A = {2} and B = {5} are considered prime events in the S of Example (1.1), but D = {3, 4, 7} is not an event in this sample space since 7 is not an element of S.
From the last definitions, one can notice that the sample
space may be finite as in Examples (1.1), (1.2) and (1.3), and
may be infinite as in Example (1.4).
Definition 1.4 (Complement event). The complement Ac of an
event A consists of all elements in S which are not in A.

1.2 Operations on the Events


If A and B are two events in a sample space S, then we define
the following:
1. A ∪ B : The event which consists of all elements con-
tained in A or B or both. Equivalently, we can say the
event that at least one of the two events A, B occurs.
2. A ∩ B : The event which consists of the elements contained in both A and B. Equivalently, we can say the event that the two events A, B occur at the same time.
3. Ac : The event that A does not occur.
4. A − B : A and not B occurs.
5. (A ∩ B)c : At most one of the two events occurs.
6. (A − B) ∪ (B − A) = (A ∪ B) − (A ∩ B) : Exactly one
of the two events occurs.
Definition 1.5 (Mutually exclusive events). We say that the two events A and B in a sample space S are mutually exclusive events if A ∩ B = ∅, i.e. A and B cannot occur at the same time.

In general, we say that the events A1 , A2 , . . . , An in a sample
space S are mutually exclusive events if

Ai ∩ Aj = ∅, i ̸= j, i, j = 1, 2, . . . , n.

Definition 1.6 (Partition). The events A1 , A2 , . . . , An represent


a partition to the sample space S if the following two conditions
are satisfied.

(i) A₁ ∪ A₂ ∪ · · · ∪ Aₙ ≡ ⋃_{i=1}^n Aᵢ = S,
(ii) Aᵢ ∩ Aⱼ = ∅ ∀ i ≠ j, i, j = 1, 2, . . . , n.
Remark 1.2.
(i) A ∪ Aᶜ = S    (ii) A ∩ Aᶜ = ∅
(iii) A ∪ ∅ = A    (iv) A ∩ ∅ = ∅
(v) A ∪ S = S    (vi) A ∩ S = A
(vii) A − B = A ∩ Bᶜ
(viii) Sᶜ = ∅
(ix) (A ∪ B)ᶜ = Aᶜ ∩ Bᶜ    (x) (A ∩ B)ᶜ = Aᶜ ∪ Bᶜ
(xi) A ∪ (B ∩ C) = (A ∪ B) ∩ (A ∪ C)
Example 1.6. If S = {1, 2, 3, 4, 5, 6}, A = {1, 3, 5}, B = {2, 4, 6}, C = {1, 3, 6}, find
(i) A ∪ B  (ii) A ∩ C  (iii) Aᶜ ∩ C
(iv) Do the two events A and B represent a partition of S?

(i) A ∪ B = {1, 2, 3, 4, 5, 6} = S.
(ii) A ∩ C = {1, 3}.
(iii) Aᶜ ∩ C = {6}.
(iv) Since A ∪ B = S and A ∩ B = ∅, we conclude that A and B represent a partition of S.
Definition 1.7 (Probability function). Let S be a sample space,
a probability function P (.) is a set function with domain D (an
algebra of events) and counterdomain the interval [0,1] which
satisfies the following axioms:

A1 0 ≤ P (A) ≤ 1 ∀A ∈ D.
A2 P (S) = 1
A3 If A₁, A₂, A₃, . . . is a sequence of mutually exclusive events in D, then
P(⋃_{i=1}^∞ Aᵢ) = ∑_{i=1}^∞ P(Aᵢ),
where P(A) is read “the probability of the event A”.
A1, A2 and A3 are sometimes called the axioms of probability.

Remark 1.3. For an algebra D, we have the property: if A₁ and A₂ ∈ D, then A₁ ∪ A₂ ∈ D.
Note also that the probability, P, of an event A in a finite sample space S of equally likely outcomes is calculated as follows:
P(A) = (number of ways the event A can occur) / (number of ways the sample space S can occur).
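For a finite sample space of equally likely outcomes, this counting rule can be checked by direct enumeration. The following Python sketch (an illustration added here, not part of the original notes) computes the probability of the event “an even number occurs” for the die of Example 1.1:

from fractions import Fraction

S = {1, 2, 3, 4, 5, 6}          # sample space of rolling a die (Example 1.1)
A = {2, 4, 6}                   # event: an even number occurs

P_A = Fraction(len(A & S), len(S))   # favorable outcomes / total outcomes
print(P_A)                           # 1/2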
Theorem 1.1. If A and B are any two events in a sample space
S, then we have
(i) P (∅) = 0.
(ii) P (Ac ) = 1 − P (A).
(iii) If A ⊆ B, then P (A) ≤ P (B).
(iv) P (A − B) = P (A) − P (A ∩ B).
(v) P (A ∪ B) = P (A) + P (B) − P (A ∩ B).
Proof. Since ∅ ∪ S = S and ∅ ∩ S = ∅, then
P(∅ ∪ S) = P(∅) + P(S) = P(S) ⇒ P(∅) = 0, and this proves (i).
Since A ∪ Aᶜ = S and A ∩ Aᶜ = ∅, then
P(A ∪ Aᶜ) = P(A) + P(Aᶜ) = P(S) = 1, and hence P(Aᶜ) = 1 − P(A), and this proves (ii).
By writing B as a union of two disjoint events, B = A ∪ (B − A), then
P(B) = P(A) + P(B − A),
but since P(B − A) ≥ 0, then P(A) ≤ P(B), and this proves (iii).
By writing A as a union of two disjoint events, A = (A − B) ∪ (A ∩ B), then
P(A) = P(A − B) + P(A ∩ B), and hence
P(A − B) ≡ P(A ∩ Bᶜ) = P(A) − P(A ∩ B), and this proves (iv).
Similarly, one can prove that P(B − A) = P(B) − P(A ∩ B).
Now, by writing A ∪ B as a union of three mutually exclusive events,
A ∪ B = (A − B) ∪ (A ∩ B) ∪ (B − A),
and hence
P(A ∪ B) = P(A − B) + P(B − A) + P(A ∩ B)
= P(A) − P(A ∩ B) + P(B) − P(A ∩ B) + P(A ∩ B)
= P(A) + P(B) − P(A ∩ B),
and this proves (v).
Example 1.7. Two balanced coins are tossed simultaneously. Let A = {HH}, B = {HT, TH} and C = {HH, TT, HT} be three events in the sample space S of this experiment. Find
(i) P(A ∪ B)  (ii) P(A ∩ C)  (iii) P(C − A)

(i) Here, the sample space is S = {HH, HT, TH, TT}. Since A ∩ B = ∅, then
P(A ∪ B) = P(A) + P(B) = 0.25 + 0.5 = 0.75.

(ii) P (A ∩ C) = P (HH) = 0.25.

(iii) P (C − A) = P (C) − P (A ∩ C) = 0.75 − 0.25 = 0.25.
Example 1.8. Prove that if P (Ac ) = α and P (B c ) = β, then
P (A ∩ B) ≥ 1 − α − β.
Proof. Since P (Ac ) = 1 − P (A), then α = 1 − P (A) ⇒ P (A) =
1 − α.
Similarly, we can get P (B) = 1 − β, and hence
P (A ∩ B) =P (A) + P (B) − P (A ∪ B)
=1 − α + 1 − β + (−P (A ∪ B)).
Since 0 ≤ P (A ∪ B) ≤ 1 ⇒ −P (A ∪ B) ≥ −1, then
P (A ∩ B) ≥ 1 − α + 1 − β − 1 = 1 − α − β.
Example 1.9. If P (A ∩ B c ) = 0.2 and P (B c ) = 0.7. Find
P (A ∪ B).
∵ P (B c ) = 1 − P (B) ⇒ 0.7 = 1 − P (B) ⇒ P (B) = 0.3,
∵ P (A) = P (A ∩ B) + P (A ∩ B c ),
∴ P (A) − P (A ∩ B) = P (A ∩ B c ) = 0.2,
∵ P (A ∪ B) = P (A) + P (B) − P (A ∩ B),
∴ P (A ∪ B) = 0.2 + 0.3 = 0.5.
Example 1.10. A point is selected at random inside an equilat-
eral triangle whose side length is 3. Find the probability that its
distance to any corner is greater than 1.
Let A denote the set of points inside the shaded part contained in the triangle, so that A consists of all points whose distance to every corner is greater than 1, and let S denote the set of points inside the triangle whose side length is 3.
Area of A = area of the triangle − 3 × (area of one sector in this triangle)
= (1/2) × 3 × 3 × sin(π/3) − 3 × (1/2) × 1² × (π/3)
= (9√3)/4 − π/2.
Since P(A) = (area of A)/(area of S),
P(A) = ((9√3)/4 − π/2) / ((9√3)/4).
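A rough Monte Carlo check of this geometric probability is sketched below in Python (illustrative only; the reflection trick for sampling uniformly inside a triangle is a standard device, not taken from these notes):

import math, random

A, B, C = (0.0, 0.0), (3.0, 0.0), (1.5, 3 * math.sqrt(3) / 2)   # triangle of side 3

def sample_point():
    # uniform point in the triangle: reflect (u, v) into the lower simplex
    u, v = random.random(), random.random()
    if u + v > 1:
        u, v = 1 - u, 1 - v
    return (3 * u + 1.5 * v, v * 3 * math.sqrt(3) / 2)

n = 100_000
hits = 0
for _ in range(n):
    p = sample_point()
    if all(math.dist(p, corner) > 1 for corner in (A, B, C)):
        hits += 1

exact = 1 - 2 * math.pi / (9 * math.sqrt(3))   # the ratio computed above
print(hits / n, exact)                          # both approximately 0.597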

1.3 Conditional Probability and Multiplication Rule
Let us consider an experiment of recording the life of a light bulb. Suppose someone is interested in the probability that the bulb will last 90 hours given that it has already lasted 25 hours. Or consider an experiment of sampling from a box containing 90 resistors of which 5 are defective. What is the probability that the third draw results in a defective, given that the first two draws resulted in defectives? Such questions lead us to study the following subject.
Definition 1.8 (Conditional probability). Let A and B be two
events in a sample space S and let P (B) > 0. Then the condi-
tional probability of the event A, given B, denoted by
P(A | B), is defined by
P(A | B) = P(A ∩ B) / P(B).
From the last equation we can write
P(A ∩ B) = P(B)P(A | B) = P(A)P(B | A), by symmetry.
Since B = (A ∩ B) ∪ (Aᶜ ∩ B), we have
P(B) = P{(A ∩ B) ∪ (Aᶜ ∩ B)} = P(A)P(B | A) + P(Aᶜ)P(B | Aᶜ).

Multiplication Rule of Probabilities
If A1 , A2 , . . . , An are n events in a sample space S, then
P(A₁ ∩ A₂ ∩ · · · ∩ Aₙ) = P(A₁)P(A₂ | A₁)P(A₃ | A₁ ∩ A₂) · · · P(Aₙ | A₁ ∩ A₂ ∩ · · · ∩ Aₙ₋₁).
Example 1.11. Two different digits are selected at random from
the digits 1 through 9.
(i) If the sum is even, find the probability that both numbers are odd.
(ii) If the sum is odd, what is the probability that 2 is one of
the numbers selected?
(iii) If 2 is one of the digits selected, what is the probability that
the sum is odd?
Let us solve each part of the problem. There are C_2^9 = 36 unordered pairs of distinct digits.
(i) The sum of the two digits is even when both are odd or both are even. The odd digits are 1, 3, 5, 7, 9, giving C_2^5 = 10 pairs; the even digits are 2, 4, 6, 8, giving C_2^4 = 6 pairs. Hence
P(both odd | sum even) = 10/(10 + 6) = 5/8.
(ii) The sum is odd when one digit is odd and the other is even, giving 5 × 4 = 20 pairs. Among these, the pairs containing 2 are {2, 1}, {2, 3}, {2, 5}, {2, 7}, {2, 9}, i.e. 5 pairs. Hence
P(2 is selected | sum odd) = 5/20 = 1/4.
(iii) There are 8 pairs containing the digit 2, and the sum is odd for exactly the 5 pairs in which 2 is joined with an odd digit. Hence
P(sum odd | 2 is selected) = 5/8.
To summarize:
(i) P(both odd | sum even) = 5/8,
(ii) P(2 selected | sum odd) = 1/4,
(iii) P(sum odd | 2 selected) = 5/8.
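Because the sample space here is tiny, the three conditional probabilities can be verified by brute-force enumeration; the Python sketch below (added for illustration, not part of the original notes) conditions directly on the relevant sub-collections of pairs:

from fractions import Fraction
from itertools import combinations

pairs = list(combinations(range(1, 10), 2))            # all 36 unordered pairs

even_sum = [p for p in pairs if sum(p) % 2 == 0]
odd_sum = [p for p in pairs if sum(p) % 2 == 1]
with_two = [p for p in pairs if 2 in p]

print(Fraction(sum(1 for a, b in even_sum if a % 2 == 1 and b % 2 == 1), len(even_sum)))   # (i)   5/8
print(Fraction(sum(1 for p in odd_sum if 2 in p), len(odd_sum)))                           # (ii)  1/4
print(Fraction(sum(1 for p in with_two if sum(p) % 2 == 1), len(with_two)))                # (iii) 5/8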
Example 1.12. A box contains 8 red, 3 white and 9 blue balls.
Three balls are drawn at random without replacement from this
box. Find the probability that
(i) the three balls are red,
(ii) two balls are red and one is white,
(iii) at least one ball is white,
(iv) one from each color in the drawn balls,
(v) the balls are drawn in the order (red, white, blue).
(i) Let Ri , i = 1, 2, 3, denote the event that the ith drawn ball
is red, W denote the event that the drawn ball is white and
let B denote the event that the drawn ball is blue, then
P(R₁ ∩ R₂ ∩ R₃) = P(R₁)P(R₂ | R₁)P(R₃ | R₁ ∩ R₂) = (8/20) × (7/19) × (6/18) = 14/285.
Or P(R₁ ∩ R₂ ∩ R₃) = C_3^8 / C_3^20 = 14/285.
(ii) P(2 red balls and 1 white ball)
= P(R₁ ∩ R₂ ∩ W) + P(R₁ ∩ W ∩ R₂) + P(W ∩ R₁ ∩ R₂)
= 3 × (8/20) × (7/19) × (3/18) = 7/95.
Or P(2 red balls and 1 white ball) = C_2^8 × C_1^3 / C_3^20 = 7/95.
(iii) P(there is no white ball) = C_3^17 / C_3^20, then
P(drawing at least one white ball) = 1 − P(there is no white ball) = 1 − C_3^17 / C_3^20.
(iv) P(one ball from each color among the drawn balls)
= P(R ∩ W ∩ B) + P(R ∩ B ∩ W) + P(W ∩ R ∩ B) + P(B ∩ R ∩ W) + P(W ∩ B ∩ R) + P(B ∩ W ∩ R)
= 6 × (8/20) × (3/19) × (9/18) = 18/95.
Or P(one from each color) = C_1^8 × C_1^3 × C_1^9 / C_3^20 = 18/95.
(v) P(R ∩ W ∩ B) = (8/20) × (3/19) × (9/18) = 3/95.
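These counting answers are easy to confirm with binomial coefficients; a small Python check (illustrative, not from the original notes) follows:

from fractions import Fraction
from math import comb

total = comb(20, 3)                                   # ways to draw 3 of the 20 balls
print(Fraction(comb(8, 3), total))                    # (i)   14/285
print(Fraction(comb(8, 2) * comb(3, 1), total))       # (ii)  7/95
print(1 - Fraction(comb(17, 3), total))               # (iii) 1 - C(17,3)/C(20,3)
print(Fraction(8 * 3 * 9, total))                     # (iv)  18/95
print(Fraction(8 * 3 * 9, 20 * 19 * 18))              # (v)   3/95 (ordered draw)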
Definition 1.9 (Independence). The two events A and B in
a sample space S are said to be independent if P (A ∩ B) =
P (A) · P (B).

One can show that if A and B are independent, then
P(A | B) = P(A).
The following two examples show that mutually exclusive events are not necessarily independent, and independent events are not necessarily mutually exclusive.
Example 1.13. Consider the sample space of tossing a balanced coin, so S = {H, T}. Take A = {H} and B = {T} ⇒ P(A) = P(B) = 1/2. Since A ∩ B = ∅ ⇒ P(A ∩ B) = 0 ≠ P(A)P(B), we conclude that if the two events A and B are disjoint, this does not in general imply their independence; i.e. disjoint events ⇏ independent events.
Example 1.14. Consider the sample space of tossing two balanced coins, so S = {HH, HT, TH, TT}. Take A = {HH, HT} and B = {TT, HT} ⇒ P(A) = P(B) = 1/2. Since A ∩ B = {HT} ⇒ P(A ∩ B) = 1/4 = P(A)P(B), we conclude that if the two events A and B are independent, this does not in general imply that they are disjoint; i.e. independent events ⇏ disjoint events.
Example 1.15. Show that if A and B are two independent events, then so are Aᶜ and Bᶜ.
P(Aᶜ ∩ Bᶜ) = P((A ∪ B)ᶜ) = 1 − P(A ∪ B)
= 1 − [P(A) + P(B) − P(A ∩ B)]
= 1 − P(A) − P(B) + P(A)P(B)
= P(Aᶜ) − P(B)[1 − P(A)]
= P(Aᶜ) − P(B)P(Aᶜ)
= P(Aᶜ)[1 − P(B)]
= P(Aᶜ)P(Bᶜ).

Theorem 1.2 (Total probability). Suppose that the events A₁, A₂, . . . , Aₙ form a partition of a sample space S and let B be another event in S; then
P(B) = ∑_{i=1}^n P(Aᵢ) P(B | Aᵢ).
Proof. The event B can be written as a union of mutually exclusive events, see Fig. (1.5), in the form
B = (B ∩ A₁) ∪ (B ∩ A₂) ∪ · · · ∪ (B ∩ Aₙ).
∴ P(B) = P(B ∩ A₁) + P(B ∩ A₂) + · · · + P(B ∩ Aₙ) = ∑_{i=1}^n P(B ∩ Aᵢ).
But, since P(B ∩ Aᵢ) = P(Aᵢ) P(B | Aᵢ), then
P(B) = ∑_{i=1}^n P(Aᵢ) P(B | Aᵢ).

Example 1.16. Urn I contains u black and v white balls; urn II


contains x black and y white balls. One ball is transferred from
I to II; one ball is then drawn from II. What is the probability
that it is white?
Let A be the event that the transferred ball from I is black
and let B be the event that the drawn ball from II is white, then
P(B) = P(A)P(B | A) + P(Aᶜ)P(B | Aᶜ)
= (u/(u + v)) × (y/(x + y + 1)) + (v/(u + v)) × ((y + 1)/(x + y + 1)).
Now, if the drawn ball in the last example turns out to be white, what is the probability that the transferred ball was black? The answer to this question leads us to the following theorem.
Theorem 1.3 (Bayes’ formula). If the events A₁, A₂, . . . , Aₙ form a partition of S and B is another event in S, then
P(Aᵢ | B) = P(Aᵢ)P(B | Aᵢ) / ∑_{j=1}^n P(Aⱼ)P(B | Aⱼ),   i = 1, . . . , n.
Proof.
P(Aᵢ | B) = P(Aᵢ ∩ B)/P(B) = P(Aᵢ)P(B | Aᵢ)/P(B) = P(Aᵢ)P(B | Aᵢ) / ∑_{j=1}^n P(Aⱼ)P(B | Aⱼ).
We can now answer the question above:
P(Aᶜ | B) = P(Aᶜ)P(B | Aᶜ) / [P(A)P(B | A) + P(Aᶜ)P(B | Aᶜ)].
Example 1.17. Suppose that it is known that a fraction 0.001 of
the people in a town have tuberculosis (TB). A TB test is given
with the following properties: If the person does have TB, the
test will indicate it with a probability .999. If he does not have
TB, then there is a probability .002 that the test will erroneously
indicate that he does. For one randomly selected person, the
test shows that he has TB. What is the probability that he really
does?
Let A denote the event that the person has TB and E denote
the event that the person is diagnosed to have TB, then
P(A) = 0.001, P(Aᶜ) = 0.999, P(E | A) = 0.999 and P(E | Aᶜ) = 0.002, and hence
P(A | E) = P(A)P(E | A) / [P(A)P(E | A) + P(Aᶜ)P(E | Aᶜ)]
= (0.001 × 0.999) / (0.001 × 0.999 + 0.999 × 0.002)
= 0.333.
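The two-event form of Bayes’ formula used here is simple enough to wrap in a helper; the following Python sketch (an added illustration — the function name and signature are ours, not from the notes) reproduces the computation:

def bayes_two_events(prior, p_pos_given_disease, p_pos_given_healthy):
    # P(A | E) for the partition {A, Ac}: prior times sensitivity over total evidence
    evidence = (prior * p_pos_given_disease
                + (1 - prior) * p_pos_given_healthy)
    return prior * p_pos_given_disease / evidence

print(bayes_two_events(0.001, 0.999, 0.002))   # 0.333...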
Example 1.18. A box contains 10 balls of which 3 are black
and 7 are white. The following game is played: At each trial a
ball is selected at random, its color is noted, and it is replaced
along with two additional balls of the same color. What is the
probability that a black ball is selected in each of the first three
trials?
Let Bᵢ denote the event that a black ball is selected on the i-th trial. By the multiplication rule,
P(B₁ ∩ B₂ ∩ B₃) = P(B₁)P(B₂ | B₁)P(B₃ | B₁ ∩ B₂) = (3/10) × (5/12) × (7/14) = 1/16.
Example 1.19. Two sets of candidates compete for the positions on the Board of Directors of a company. The probabilities of winning are 0.7 and 0.3 for the two sets. If the first set wins, it will introduce a new product with probability 0.4; the corresponding value for the second set is 0.8. If the new product was introduced, what is the probability that the first set won?
Let A and B denote the events that the first and the second set wins, respectively, so P(A) = 0.7 and P(B) = 0.3, and let N be the event of introducing the new product, so P(N | A) = 0.4 and P(N | B) = 0.8. Hence
P(A | N) = P(A)P(N | A) / [P(A)P(N | A) + P(B)P(N | B)]
= (0.7 × 0.4) / (0.7 × 0.4 + 0.3 × 0.8) = 0.28/0.52 ≈ 0.538.

Example 1.20. Three boxes contain balls. The first contains 10 white and 5 black balls, the second contains 7 white and 8 black balls and the third contains 5 white and 10 black balls. One box is chosen at random and then two balls are drawn, without replacement, from this box; they turn out to have the same color. What is the probability that they come from the second box?
Let Aᵢ denote the event of choosing box i, i = 1, 2, 3, and B be the event of drawing two balls of the same color. Since the box is chosen at random, P(A₁) = P(A₂) = P(A₃) = 1/3, and hence
P(B) = P(A₁)P(B | A₁) + P(A₂)P(B | A₂) + P(A₃)P(B | A₃)
= (1/3)[(C_2^10 + C_2^5)/C_2^15] + (1/3)[(C_2^7 + C_2^8)/C_2^15] + (1/3)[(C_2^5 + C_2^10)/C_2^15]
= (1/3)[(10 × 9 + 5 × 4)/(15 × 14)] + (1/3)[(7 × 6 + 8 × 7)/(15 × 14)] + (1/3)[(5 × 4 + 10 × 9)/(15 × 14)]
= 0.504.
∴ P(A₂ | B) = P(A₂)P(B | A₂)/P(B) = 0.156/0.504 = 0.31.
Example 1.21. In a multiple-choice test, assume that there are five choices available for each question. Let p be the probability that a student knows the answer and q = 1 − p the probability that the student guesses. Assume also that the probability that the student gets the right answer given that he guesses is 0.2. If the student got the right answer, what is the probability that the student actually knew it?
Let A denote the event that the student got the right answer and B denote the event that the student knew the right answer. Using Bayes’ formula, we have
P(B | A) = P(A | B)P(B) / [P(A | B)P(B) + P(A | Bᶜ)P(Bᶜ)] = (1 × p)/(1 × p + 0.2 × q).

Exercises I
In the following, suppose that A, B and C are events in a
sample space S.
1. If P (A) = 0.2, P (B) = 0.4, P (A∪B) = 0.5. Determine
(i) P (A ∩ B), (ii) P (Ac ∩ B), (iii) P (A ∩ B c ),
(iv)P (Ac ∩ B c ), (v) P (A | B), (vi) P (B c | Ac ).
2. Show that
P(A | B) − P(A | Bᶜ) = [P(A ∩ B) − P(A)P(B)] / [P(B)P(Bᶜ)].

3. If P (B) = 2/3, P (A ∩ B) = 1/2, P (A ∩ B c ) = 1/4.


Determine (i) P (A), (ii) P (A ∪ B), (iii) P (Ac ∩ B c )
4. If P(A) = 0.5, P(B) = 1/3, P(A ∪ B) = 0.6.
Are A and B independent? Why?
5. If P (A) = 0.3, P (B) = 0.5, P (A ∪ B) = 0.7. Determine
(i) P (Ac | B), (ii) P (A | B c ), (iii) P (Ac | B c ),
(iv) P [(A − B) ∪ (B − A)].
6. If P (B) = P (A | B) = P (C | A ∩ B) = 0.5. What is
P (A ∩ B ∩ C)?
7. Show that the conditional probability function P (· | B),
P (B) ̸= 0, satisfies the axioms of a probability space.
8. A pair of dice is tossed. If the numbers appearing are dif-
ferent, find the probability that the sum is even? [2/5]
9. In a certain college 25% of the boys and 10% of the girls are
studying statistics. The girls constitute 60% of the students
body. If a student is selected at random and is found to
be studying statistics, determine the probability that the
student is a girl. [3/8]

10. A bond issue for the construction of a new public library is
before the voters. A poll showed that 85% of those with a
college education favored the construction of a new library,
but only 20% of those not having a college education did
so. Suppose that 90% of the voting population do not have
a college education. What is the probability that a voter
selected at random who favors the bond issue will be one
with a college education? [.32]
11. A certain cancer diagnostic test is 95% accurate on those
that do have cancer, and 90% accurate on those that do not
have cancer. If 1/2% of the population actually does have
cancer, compute the probability that a particular individual
has cancer if the test finds that he has cancer.
12. In Polya’s urn scheme, an urn initially contains r red balls
and b black balls. At each trial a ball is selected at random,
its color is noted and it is replaced along with c additional
balls of the same color. What is the probability that one
obtains a red ball in each of first three trials?
13. Three machines A, B and C produce respectively 60%, 30%
and 10% of the total number of items of a factory. The
percentages of defective output of this machines are respec-
tively 2%, 3% and 4%. An item is selected at random and
is found defective. Find the probability that the item was
produced by machine A or B? [21/25]

Chapter 2

RANDOM VARIABLES,
PROBABILITY FUNCTIONS
AND EXPECTATIONS

The purpose of this chapter is to present the types of random


variables and types of probability functions in addition to math-
ematical expectation and moments.

2.1 Types of Random Variables

Definition 2.1 (Random variable). Given a random experiment

with a sample space S. A function X, which assigns to each

element s ∈ S one and only one real number X(s) = x, is called

a random variable.

i.e. X : S → ℝ; X(s) = x.

The range of X is the set of real numbers
R_X = {x : X(s) = x; s ∈ S}.

Example 2.1. If we consider, for example, the sample space S

of tossing two balanced coins, and let the random variables X

and Y be defined as follows:

X ≡ number of heads that occur,

Y ≡ | difference between number of heads and number of tails|,

then

S ={HH, HT, T H, T T }

={s1 , s2 , s3 , s4 }

X(s1 ) ≡ X(HH) = 2, X(s2 ) ≡ X(HT ) = 1, X(s3 ) ≡ X(T H) =

1, X(s4 ) ≡ X(T T ) = 0. Similarly,

Y(s₁) ≡ Y(HH) = 2, Y(s₂) ≡ Y(HT) = 0, Y(s₃) ≡ Y(TH) = 0, Y(s₄) ≡ Y(TT) = 2, then

P (X = 2) = P {s ∈ S : X(s) = 2} = P {HH} = 1/4.

Similarly,

P (X = 1) = P {s ∈ S : X(s) = 1} = P {HT, T H} = 1/2,

P (X = 0) = P {s ∈ S : X(s) = 0} = P {T T } = 1/4.

Similarly, for the random variable Y ,

P (Y = 0) = P {s ∈ S : Y (s) = 0} = P {HT, T H} = 1/2,

P (Y = 2) = P {s ∈ S : Y (s) = 2} = P {HH, T T } = 1/2.

Then, we can write
P(X = x) = 1/4 for x = 0, 1/2 for x = 1, 1/4 for x = 2;   P(Y = y) = 1/2 for y = 0, 1/2 for y = 2.   (2.1)

When the range of a random variable X can be counted or enumerated, this random variable is called a discrete random variable, and the probability function defined on this type of random variable is called a probability mass function if the two conditions given in the following definition are satisfied.

Definition 2.2 (Probability mass function). The function p(x), defined on the discrete random variable X, where p(x) ≡ P(X = x) = P{s : X(s) = x; s ∈ S}, represents a probability mass function if the following two conditions are satisfied:
(i) p(x) ≥ 0 ∀ x ∈ R_X,
(ii) ∑ₓ p(x) = 1, the sum being taken over all x ∈ R_X.

Remark 2.1. Some authors, sometimes, denote to the proba-

bility mass function by f (x).

One can notice that the two random variables mentioned in Example (2.1) are discrete random variables, since R_X = {0, 1, 2} and R_Y = {0, 2}, and the functions P(X = x) ≡ p(x) and P(Y = y) ≡ p(y) are probability mass functions, since p(x) ≥ 0 ∀x ∈ {0, 1, 2} and p(0) + p(1) + p(2) = 1. Similarly, these conditions are satisfied for p(y).

If the range of the random variable X cannot be counted, then this random variable is called a continuous random variable, and the probability function defined on this type of random variable is called a probability density function (pdf) if the two conditions given in the following definition are satisfied.

Definition 2.3 (Probability density function). The function f(x), defined on the continuous random variable X, represents a probability density function if the following two conditions are satisfied:
(i) f(x) ≥ 0 ∀x ∈ R_X,
(ii) ∫_{−∞}^{∞} f(x) dx = 1.

Remark 2.2. Condition (i), given above, means that the curve of f(x) lies entirely above the x-axis, while condition (ii) means that the area under this curve, bounded by the domain of x, is equal to 1.

Remark 2.3. The relation between the probability of an event A and the pdf or probability mass function can be written as
P_X(A) ≡ P(X ∈ A) = ∫_A f(x) dx if X is continuous;   P_X(A) = ∑_{x∈A} p(x) if X is discrete.

Example 2.2. Prove that the following functions represent probability functions:
(i) f(x) = 1 − |1 − x|; 0 < x < 2, and f(x) = 0 otherwise.
(ii) f(x) = (1/(σ√(2π))) exp{−(1/2)((x − µ)/σ)²}; −∞ < x < ∞, (−∞ < µ < ∞, σ > 0).
(iii) f(x) = e⁻² 2^x / x!; x = 0, 1, 2, . . . , and f(x) = 0 otherwise.

(i) Since x is defined on an interval, the range can not be

counted, then X is a continuous random variable. From

the definition of absolute value we have



|1 − x| = 1 − x for x < 1; 0 for x = 1; x − 1 for x > 1.
Then we can write f(x) in the form
f(x) = x for 0 < x < 1, and f(x) = 2 − x for 1 ≤ x < 2.
It is clear from the definition of f(x) that f(x) ≥ 0 ∀x ∈ (0, 2). Now, we prove that ∫_{−∞}^{∞} f(x) dx = 1:
∫_{−∞}^{∞} f(x) dx = ∫_0^1 x dx + ∫_1^2 (2 − x) dx
= [x²/2]_0^1 − [(2 − x)²/2]_1^2
= 1/2 − [0 − 1/2] = 1,
and hence f(x) is a pdf.

(ii) Since x is defined on an interval, then the range of X can not

be counted and hence X is a continuous random variable.


To prove that
I = (1/(σ√(2π))) ∫_{−∞}^{∞} exp{−(1/2)((x − µ)/σ)²} dx = 1,
we use the substitution z = (x − µ)/σ ⇒ dx = σ dz; then
I = (1/√(2π)) ∫_{−∞}^{∞} exp(−z²/2) dz = (2/√(2π)) ∫_0^∞ exp(−z²/2) dz,
since exp(−z²/2) is an even function.
Now, put y = z²/2 ⇒ z = √(2y); z > 0 ⇒ dz = (2y)^{−1/2} dy, and hence
I = (2/√(2π)) ∫_0^∞ (2y)^{−1/2} e^{−y} dy = (1/√π) ∫_0^∞ y^{−1/2} e^{−y} dy = Γ(1/2)/√π = 1,
and hence f(x) is a pdf.

Remark 2.4. From the definition of gamma function we

have
∫_0^∞ x^{n−1} exp(−x/β) dx = Γ(n)βⁿ,   Γ(1/2) = √π.

(iii) We know that ∑_{n=0}^∞ xⁿ/n! = eˣ; then
∑_{x=0}^∞ e⁻² 2^x / x! = e⁻² · e² = 1,
and hence f(x) is a probability mass function.
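As a numerical sanity check (illustrative only, not part of the original notes), parts (i) and (iii) can be verified in Python with a midpoint-rule integral and a truncated series:

import math

# (i) integrate f(x) = 1 - |1 - x| over (0, 2) by the midpoint rule
n = 100_000
width = 2 / n
total = sum((1 - abs(1 - (k + 0.5) * width)) * width for k in range(n))
print(total)                                             # approximately 1.0

# (iii) mass function e^(-2) 2^x / x!  (a Poisson form with lambda = 2)
print(sum(math.exp(-2) * 2**x / math.factorial(x) for x in range(60)))  # approximately 1.0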

2.2 Mathematical Expectation and Moments

2.2.1 Mean, variance and moments

Definition 2.4 (Mean). Let X be a random variable having pdf

f (x) or probability mass function p(x), then the mean (expected

value) of X, denoted by E[X] or µX , is defined by



 ∫
 ∞ xf (x)dx; X : Continuous,
−∞
E[X] =
 ∑ xi p(xi );

X : Discrete.
i

26
Definition 2.5 (Variance). Let X be a random variable having

pdf f (x) or probability mass function p(x), then the variance of


X, denoted by V[X] or σ²_X, is defined by
V[X] = E[(X − µ_X)²] = ∫_{−∞}^{∞} (x − µ_X)² f(x) dx if X is continuous;   = ∑ᵢ (xᵢ − µ_X)² p(xᵢ) if X is discrete.

In general we have the following definition

Definition 2.6. Let g(X) be a function of the random variable

X having pdf f (x) or mass function p(x). The expected value

of g(X) is defined by

 ∫
 ∞ g(x)f (x)dx; X : Continuous,
−∞
E[g(X)] =
 ∑ g(xi )p(xi );

X : Discrete.
i

Special Cases

(i) If g(X) = Xʳ, then
E[g(X)] = E[Xʳ] = µ′ᵣ = ∫_{−∞}^{∞} xʳ f(x) dx if X is continuous;   = ∑ᵢ xᵢʳ p(xᵢ) if X is discrete,
which is called the rth moment of the random variable X about the origin, r = 0, 1, 2, . . . . If r = 1, then E[g(X)] = E[X] = µ_X, which is called the mean of the random variable X; i.e. the first moment of X about the origin is itself the expected value of X.

(ii) If g(X) = (X − µ_X)ʳ, then
E[g(X)] = E[(X − µ_X)ʳ] = µᵣ = ∫_{−∞}^{∞} (x − µ_X)ʳ f(x) dx if X is continuous;   = ∑ᵢ (xᵢ − µ_X)ʳ p(xᵢ) if X is discrete,

which is the rth moment of the random variable X about

its mean, r = 0, 1, 2, . . . . It is called also the rth central

moment of X about its mean. If r = 2, then E[g(X)] =

E[(X − µX )2 ] which is called the variance of X, i.e. the

second central moment of X about its mean is itself the

variance of X.

Remark 2.5. The expected value of X, E[X], represents the

center of gravity of the unit mass that is determined by the

density function of X. So the mean of X is a measure of where

the values of the random variable X are centered, but the vari-

ance of X, V [X], represents the moment of inertia of the same

density with respect to a perpendicular axis through the center

of gravity.

Definition 2.7 (Standard deviation). The standard deviation of a random variable X, denoted by σ_X, is defined as the positive square root +√V[X].

Properties of mean and variance

(i) E[c] = c, V [c] = 0 where c is a constant,

(ii) E[cg(X)] = cE[g(X)], V[cg(X)] = c²V[g(X)],
(iii) E[c₁g₁(X₁) + c₂g₂(X₂)] = c₁E[g₁(X₁)] + c₂E[g₂(X₂)], and
V[c₁g₁(X₁) + c₂g₂(X₂)] = c₁²V[g₁(X₁)] + c₂²V[g₂(X₂)] when X₁ and X₂ are independent.

(iv) If g1 (x) ≤ g2 (x) ∀x, then E[g1 (X)] ≤ E[g2 (X)].

The above properties can be proved, simply, by applying the

definitions of mean and variance.

Theorem 2.1. If X is a random variable,

V [X] = E[X 2 ] − (E[X])2 ,

provided E[X 2 ] exists.

Proof.
V[X] = E[(X − E[X])²]
= E[X² − 2X E[X] + (E[X])²]
= E[X²] − 2(E[X])² + (E[X])²
= E[X²] − (E[X])².
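The shortcut V[X] = E[X²] − (E[X])² is easy to confirm numerically; the Python sketch below (added for illustration) uses the distribution of X from Example 2.1, where p(0) = 1/4, p(1) = 1/2, p(2) = 1/4:

xs = [0, 1, 2]
ps = [0.25, 0.5, 0.25]

mean = sum(x * p for x, p in zip(xs, ps))
e_x2 = sum(x**2 * p for x, p in zip(xs, ps))
var_direct = sum((x - mean)**2 * p for x, p in zip(xs, ps))
print(mean, e_x2 - mean**2, var_direct)    # 1.0 0.5 0.5 -- the two variance forms agree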

2.2.2 Measures of skewness and kurtosis

When the curve of the distribution is more extended to the right (left), we say that the curve is skewed to the right (left), or that it has positive (negative) skewness, see Fig. (2.3).

Definition 2.8 (Measure of skewness). The measure of skewness, denoted by γ₁, is defined as the ratio of the third moment about the mean to the cube of the standard deviation, i.e.
γ₁ = µ₃/σ³.
The quantity (mean − median)/(standard deviation) provides an alternative measure of skewness.

If the measure of skewness is positive(negative), then this

means that the mean is bigger(smaller) than the median. When

the curve is symmetric, then the mean is equal to the median

and hence the measure of skewness is equal to zero, since all

moments of odd order about the mean are equal to zero.

The kurtosis is the degree of flatness of a density curve near

its center.

Definition 2.9 (Measure of kurtosis). The measure of kurtosis,

denoted by γ2 , is defined to be the ratio between the fourth

moment about the mean and the square of the variance, i.e.
γ₂ = µ₄/σ⁴.
Positive (negative) values of (µ₄/σ⁴ − 3) are sometimes used to indicate that a density is more peaked (flat) around its center than the density of a normal curve. Curves with γ₂ < 3 are called platykurtic, while those with γ₂ > 3 are called leptokurtic.

Example 2.3. A box contains 8 items of which 2 are defective.

A man selects 3 items from this box; find the expected number of defective items he draws if (i) the draw is without replacement, (ii) the draw is with replacement.

(i) The number of defective items may be 0,1 or 2. If X is a

random variable shows the number of defective items, then the

distribution of X can be written as follows


P(X = 0) = p(0) = C_0^2 C_3^6 / C_3^8 = 20/56,
P(X = 1) = p(1) = C_1^2 C_2^6 / C_3^8 = 30/56,
P(X = 2) = p(2) = C_2^2 C_1^6 / C_3^8 = 6/56.

x        0        1        2
p(x)     20/56    30/56    6/56
x p(x)   0        30/56    12/56

Then
E[X] = ∑_{x=0}^2 x p(x) = 42/56 = 3/4.
The random variable here has the hypergeometric distribution

as we shall see in the next chapter.

Part (ii) is left until the reader knows the binomial distribution, which is explained in the next chapter.

Example 2.4. A balanced coin is tossed until a head or four

tails occurs. Find the expected number, E, of tosses of the coin.

We know that the sample space of tossing a coin one, two,

three or four times consists of 2, 4, 8 or 16 elements, then only

one toss occurs if head occurs the first time, two tosses occur

if the first is tail and the second is head. Three tosses occur if

the first two are tails and the third is head. Four tosses occur if

either T T T H or T T T T occurs. So
p(1) = P{H} = 1/2,   p(2) = P{TH} = 1/4,
p(3) = P{TTH} = 1/8,
p(4) = P{TTTH} + P{TTTT} = 1/16 + 1/16 = 1/8,
then
E = 1 × (1/2) + 2 × (1/4) + 3 × (1/8) + 4 × (1/8) = 15/8.
Sometimes we denote the probability mass function p(x) by

f (x).

Exercises II

1. Let X be a random variable with pdf f (x) = 2x/9, x ∈

A = {x : 0 < x < 3} and let A1 = {x : 0 < x < 1}, A2 =

{x : 2 < x < 3}, determine P (A1 ), P (A2 ), P (A1 ∪ A2 ).

2. Let X be a random variable with probability mass function

f (x) = x/15; x = 1, 2, 3, 4, 5; zero elsewhere. Find P (X =

1 or 2), P (0.5 < X < 2.5) and P (1 ≤ X ≤ 2).

3. (a) Find the value of the constant C for the following func-

tion to be a probability function, where f(x) = C x e^{−2x}; x > 0, zero elsewhere.

(b) If X has the pdf given in (a), find the cdf, F (x), and

then calculate

i. P (X > 3).

ii. P (1 < X < 3 | 0 < X < 2).

4. Find the value of the constant C for the following functions

to be probability functions

(a) f(x) = C e^{−x² + 2x}, −∞ < x < ∞.
(b) f(x) = C C_x^{−8} C_{6−x}^{−4}; x = 0, 1, 2, 3, 4, 5, 6.
(c) f(x) = C eˣ / [2(1 + eˣ)²], −∞ < x < ∞.
(d) f(x) = 1 / (π√(1 − x²)); |x| < 1, zero otherwise.

5. Find the mode of each of the following distributions


(a) f(x) = (1/2)ˣ, x = 1, 2, . . . , zero elsewhere.
(b) f(x) = (x²/2) e^{−x}; 0 < x < ∞, zero elsewhere.

6. Find the median of each of the following distributions


(a) f(x) = C_x^4 (1/4)ˣ (3/4)^{4−x}, x = 0, 1, 2, 3, 4.
(b) f(x) = 3x²; 0 < x < 1, zero elsewhere.

7. Let X be a random variable with pdf f(x) = (x + 2)/18; −2 < x < 4, zero elsewhere. Find E[X], E[(X + 2)³], and E[6X − 2(X + 2)³].

8. A fair coin is tossed four times. Let X denote the number

of heads occurring. Find the distribution, mean, variance

and standard deviation of X.

9. A box contains 10 transistors of which 2 are defective. A transistor is selected from the box and tested until a non-defective one is obtained. Find the expected number of transistors to be chosen. [11/9]

10. A fair coin is tossed until a head appears, Let X denote the

number of tosses required.

(a) Find the density function of X

(b) Find the moment generating function of X and hence

find its mean and variance.

11. Consider the rth central moment of the gamma distribution,
µᵣ = E[(X − αβ)ʳ] = ∫_0^∞ (x − αβ)ʳ x^{α−1} e^{−x/β} / (Γ(α)β^α) dx.
(a) Prove that µᵣ₊₁ = β²[α r µᵣ₋₁ + dµᵣ/dβ], r = 1, 2, . . .
(b) Use the fact that µ₀ = 1, µ₁ = 0 and the differential equation in (a) to calculate the central moments µ₂, µ₃ and µ₄ of the gamma distribution with two parameters (α, β).

Chapter 3

SOME IMPORTANT

DISCRETE DISTRIBUTIONS

In this chapter, some distributions such as the binomial, multi-

nomial, Poisson, and hypergeometric distributions are presented.

3.1 Binomial Distribution

A random experiment whose outcomes have been classified into

two categories, called “success” and “failure”, denoted respectively by S and F, is called a Bernoulli trial (for example, head

or tail, life or death, good or defective, boy or girl, etc). Suppose

that the random experiment consists of n repeated independent

Bernoulli trials and p is the probability of success at each individual trial, then this random experiment is called a binomial

experiment. The term “repeated” is used to indicate that the

probability of success, P (S) ≡ p, remains the same from trial to

trial, thus the probability of failure on each repetition is 1 − p.

Definition 3.1 (Binomial distribution). If the random variable,

X, represents the number of successes in n independent trials of

a binomial experiment, then X subjects to the binomial distri-

bution with probability function




P(X = x) = p(x) = C_x^n pˣ q^{n−x}, x = 0, 1, . . . , n, p + q = 1, and p(x) = 0 otherwise,
where the parameters n and p satisfy n ∈ ℤ⁺ and p ∈ [0, 1].

Remark 3.1. From now on, we use X ∼ b(n, p) to mean that

the random variable X subjects to the binomial distribution

with two parameters n and p.

Conditions of the binomial experiment

1. The experiment has one of two outcomes, one is called suc-

cess with probability p and the other is called failure with

probability q = 1 − p.

2. The experiment is repeated for n independent trials.

3. The probability of success is constant in each trial.

One can now notice that the probability given by the binomial

distribution may arise in the following ways:

(i) When sampling from a finite population with replacement.

(ii) When sampling from an infinite population (often referred

to as an indefinitely large population) with or without re-

placement.

Theorem 3.1. If the random variable X ∼ b(n, p), then the

mean, variance and moment generating function of X are given,

respectively, by np, npq and (pe^t + q)ⁿ.

Proof.
E[X] = ∑_{x=0}^n x p(x) = ∑_{x=0}^n x · n!/(x!(n − x)!) · pˣ q^{n−x},   q = 1 − p,
= np ∑_{x=1}^n (n − 1)!/((x − 1)!(n − x)!) · p^{x−1} q^{n−x}
= np (p + q)^{n−1} = np.
E[X²] = ∑_{x=0}^n x² p(x) = ∑_{x=0}^n [x(x − 1) + x] p(x)
= ∑_{x=2}^n x(x − 1) p(x) + E[X]
= n(n − 1)p² ∑_{x=2}^n (n − 2)!/((x − 2)!(n − x)!) · p^{x−2} q^{n−x} + np
= n(n − 1)p² (p + q)^{n−2} + np
= n(n − 1)p² + np.
Since V[X] = E[X²] − (E[X])², then
V[X] = n(n − 1)p² + np − (np)² = npq.
Now,
m_X(t) = E[e^{tX}] = ∑_{x=0}^n e^{tx} p(x) = ∑_{x=0}^n e^{tx} C_x^n pˣ q^{n−x} = ∑_{x=0}^n C_x^n (pe^t)ˣ q^{n−x} = (pe^t + q)ⁿ.

Example 3.1. The probability that a patient recovers from a

rare blood disease is 0.4. If 15 people are known to have con-

tracted this disease, what is the probability that (a) at least 10

survive, (b) from 3 to 8 survive, and (c) exactly 5 survive.

Let X ∼ b(15, 0.4) denote the number of people that survive. Then
(a) P(X ≥ 10) = 1 − P(X < 10) = 1 − ∑_{x=0}^9 C_x^15 (0.4)ˣ(0.6)^{15−x} = 1 − 0.9662 = 0.0338.
(b) P(3 ≤ X ≤ 8) = ∑_{x=3}^8 C_x^15 (0.4)ˣ(0.6)^{15−x} = 0.9050 − 0.0271 = 0.8779.
(c) P(X = 5) = C_5^15 (0.4)⁵(0.6)¹⁰ = 0.185.
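The three answers can be recomputed directly from the binomial mass function; here is a short Python sketch (added for illustration, not from the original notes):

from math import comb

def binom_pmf(x, n, p):
    return comb(n, x) * p**x * (1 - p)**(n - x)

n, p = 15, 0.4
print(1 - sum(binom_pmf(x, n, p) for x in range(10)))   # (a) ~0.0338
print(sum(binom_pmf(x, n, p) for x in range(3, 9)))     # (b) ~0.8779
print(binom_pmf(5, n, p))                               # (c) ~0.1859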

Example 3.2. A pheasant hunter brings down 75% of the birds

he shoots at. What is the probability that at least 3 of the next

5 pheasants shot at will escape?

Let X ∼ b(5, 0.25) denote the number of escaping pheasants; then
P(X ≥ 3) = ∑_{x=3}^5 C_x^5 (0.25)ˣ(0.75)^{5−x}.

Example 3.3. The moment generating function of a random variable X is ((2/3) + (1/3)e^t)⁹. Show that
P(µ − 2σ < X < µ + 2σ) = ∑_{x=1}^5 C_x^9 (1/3)ˣ(2/3)^{9−x}.
The given function is the moment generating function of the binomial distribution with parameters n = 9, p = 1/3, so µ ≡ E[X] = np = 3 and σ ≡ √V[X] = √(npq) = √2, and hence
P(µ − 2σ < X < µ + 2σ) = P(3 − 2√2 < X < 3 + 2√2)
= P(0.17 < X < 5.83) = P(1 ≤ X ≤ 5)
= ∑_{x=1}^5 C_x^9 (1/3)ˣ(2/3)^{9−x}.

Example 3.4. If x = r is the unique mode of a distribution

which is b(n, p), show that (n + 1)p − 1 < r < (n + 1)p

If x = r is the only mode, then it must satisfy P (X = r+1) <

P (X = r) and P (X = r − 1) < P (X = r), then


n
Cr+1 pr+1 q n−r−1 (n − r)p
< 1 ⇒ < 1 ⇒ (n − r)p < (r + 1)(1 − p).
Crn pr q n−r (r + 1)q

It follows that

r > (n + 1)p − 1. (3.1)

Similarly,
n
Cr−1 pr−1 q n−r+1 rq
< 1 ⇒ < 1 ⇒ r(1 − p) < (n − r + 1)p.
Crn pr q n−r (n − r + 1)p

It follows that

r < (n + 1)p. (3.2)

From (3.1) and (3.2), we have

(n + 1)p − 1 < r < (n + 1)p.

Example 3.5. Let X be b(2, p) and let Y be b(4, p). If P (X ≥

1) = 5/9. Find P (Y ≥ 1)

P(X ≥ 1) = C_1^2 pq + C_2^2 p² = 5/9 ⇒ 2p(1 − p) + p² = 5/9 ⇒ 9p² − 18p + 5 = 0 ⇒ (3p − 5)(3p − 1) = 0, then
p = 1/3 (accepted) or p = 5/3 (rejected).
Thus, P(Y ≥ 1) = 1 − P(Y = 0) = 1 − C_0^4 (2/3)⁴ = 65/81.

3.2 Poisson Distribution

The Poisson distribution appears in many natural phenomena.

Among others, the number of misprints per page in a large text,

the number of telephone calls per minute at some switchboard

and the number of α particles emitted by a radioactive substance

per unite of time.

Definition 3.2 (Poisson distribution). The random variable X

subjects to Poisson distribution, with parameter λ, if its proba-

bility function is given by


P(X = x) = p(x) = e^{−λ} λˣ / x!, x = 0, 1, 2, . . . (λ > 0), and p(x) = 0 otherwise.

Here the random variable X shows the number of successes in

a certain period of time or in a bounded region, and λ represents

the average of this number. The period of time

may be (minute, hour, day, week or month) and the bounded

region may be (page in a book, squared meter from the area or

cubic meter from the volume).

Example 3.6. The average number of accidents on some route is 5 per week. What is the probability that there is no accident on this route in a certain week? What is the probability that 4 accidents or fewer occur in a certain week? What is the probability that more than two accidents occur during two weeks?

Let X be a random variable subject to the Poisson distribution, showing the number of accidents in a certain week.
(i) P(X = 0) = e⁻⁵ 5⁰/0! = e⁻⁵.
(ii) P(X ≤ 4) = ∑_{x=0}^4 e⁻⁵ 5ˣ/x! = 0.4405.
(iii) If X now shows the number of accidents in two weeks, then the average number of accidents in two weeks is λ = 2 × 5 = 10, and hence
P(X > 2) = 1 − P(X ≤ 2) = 1 − ∑_{x=0}^2 e⁻¹⁰ 10ˣ/x! = 0.9972.

Example 3.7. Suppose 220 misprints are distributed randomly

throughout a book of 200 pages. Find the probability that a given

page contains (i) no misprints, (ii) at least two misprints.

Suppose that X is a random variable subject to the Poisson distribution, with parameter λ = 220/200 = 1.1, showing the number of misprints on a given page. Then
(i) P(X = 0) = e^{−1.1} (1.1)⁰/0! = e^{−1.1}.
(ii) P(X ≥ 2) = 1 − P(X < 2) = 1 − [P(X = 0) + P(X = 1)]
= 1 − (e^{−1.1} + 1.1 e^{−1.1}) = 0.301.
Theorem 3.2. If the random variable X subjects to the Poisson distribution with parameter λ, then E[X] = λ, V[X] = λ and m(t) = e^{−λ(1−e^t)}.

Proof.
E[X] = ∑_{x=0}^∞ x p(x) = ∑_{x=0}^∞ x e^{−λ} λˣ/x! = λ ∑_{x=1}^∞ e^{−λ} λ^{x−1}/(x − 1)! = λ.
E[X²] = ∑_{x=0}^∞ x² p(x) = ∑_{x=0}^∞ [x(x − 1) + x] e^{−λ} λˣ/x!
= ∑_{x=2}^∞ e^{−λ} λ² λ^{x−2}/(x − 2)! + E[X]
= λ² + λ.
Since V[X] = E[X²] − (E[X])², then
V[X] = λ² + λ − λ² = λ.
m(t) = ∑_{x=0}^∞ e^{tx} p(x) = ∑_{x=0}^∞ e^{tx} e^{−λ} λˣ/x! = e^{−λ} ∑_{x=0}^∞ (λe^t)ˣ/x! = e^{−λ} e^{λe^t} = e^{−λ(1−e^t)}.

Remark 3.2. A distinguishing property of the Poisson distribution is that its mean is equal to its variance.

The Poisson and binomial distributions have histograms with approximately the same shape when n is large and p is close to zero. Hence, if these two conditions hold, the Poisson distribution, with µ = np, can be used to approximate binomial probabilities. If p is close to one, we can interchange what we have defined to be a success and a failure, thereby changing p to a value close to zero. This approximation is illustrated in the following theorem, which we state without proof:

Theorem 3.3. Let X be a random variable which has a binomial

distribution with parameters n and p. If n → ∞ and p → 0 such

that np = λ, then the binomial distribution tends to the Poisson

distribution with parameter λ = np.

3.3 Hypergeometric Distribution

The hypergeometric probability function provides probabilities

of certain events when a sample of n objects is drawn at random

from a finite population of N objects, where the sampling is done

without replacement.

Definition 3.3 (Hypergeometric distribution). The random vari-

able X subjects to the hypergeometric distribution if its proba-

bility function is given by

P(X = x) ≡ p(x) = C_x^{N₁} C_{n−x}^{N−N₁} / C_n^N, x = 0, 1, . . . , n, and p(x) = 0 otherwise,
where N is a positive integer, N₁ is a nonnegative integer with N₁ ≤ N, and n is a nonnegative integer with n ≤ N.

The random variable X in the hypergeometric distribution represents, say, the number of defective items (drawn without replacement) within a total of n drawn items from a set of N items that includes N₁ defective items.

Theorem 3.4. If the random variable X subjects to the hyper-

geometric distribution with parameters N, n and N1 , then

E[X] = n N₁/N   and   V[X] = n (N₁/N)((N − N₁)/N)((N − n)/(N − 1)).

The proof of the last theorem is omitted.


Remark 3.3. If we set N₁/N = p, then the mean of the hypergeometric distribution coincides with the mean of the binomial distribution, as shown in the following example, but the variance of the hypergeometric distribution is (N − n)/(N − 1) times the variance of the binomial distribution.

Example 3.8. A box contains 8 items of which 2 are defective.

A man selects 3 items from this box. Find the distribution and the expected number of defective items he draws if
(i) the draw is without replacement,
(ii) the draw is with replacement.

Let X be a random variable showing the number of defective items. If the draw occurs without replacement, then X subjects to the hypergeometric distribution with N = 8, N₁ = 2 and n = 3, and hence the distribution of X takes the form
P(X = x) = C_x^2 C_{3−x}^6 / C_3^8, x = 0, 1, 2, and P(X = x) = 0 otherwise.
We know from before that E[X] = n N₁/N = (3 × 2)/8 = 3/4.
If the draw occurs with replacement, then X subjects to the binomial distribution with n = 3, p = N₁/N = 2/8, and hence the distribution of X takes the form
P(X = x) = C_x^3 (2/8)ˣ(6/8)^{3−x}, x = 0, 1, 2, 3, and P(X = x) = 0 otherwise.
In this case, E[X] = np = 3 × 2/8 = 3/4.
Part (i) of this example was solved before in Example (2.3), but (ii) was left to be solved after studying the binomial distribution.
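The closeness of the two distributions in this example, and the equality of their means, can be seen side by side in the following Python sketch (illustrative only, not part of the original notes):

from math import comb

N, N1, n = 8, 2, 3
for x in range(3):     # common support of interest: x = 0, 1, 2
    hyper = comb(N1, x) * comb(N - N1, n - x) / comb(N, n)
    binom = comb(n, x) * (N1 / N)**x * (1 - N1 / N)**(n - x)
    print(x, round(hyper, 4), round(binom, 4))
# both distributions have mean n * N1 / N = 0.75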

Theorem 3.5. If in the hypergeometric distribution N → ∞ such that N₁/N → p, then the hypergeometric distribution tends to the binomial distribution with parameters n and p = N₁/N.

The proof of the last theorem is omitted.

Remark 3.4. The last theorem tells us that for very large N

(population size), sampling with replacement gives approximately

the same probabilities as sampling without replacement.

Example 3.9. A large box contains 150 white mice and 35 gray

mice. A researcher draws (without replacement) 5 mice to per-

form a certain experiment. What is the probability of getting 3

white mice among the 5 selected mice.

Let X ∼ H(5, 150, 185) show the number of white mice in the drawn sample; then
P(X = 3) = C_3^150 C_2^35 / C_5^185,
but since the numbers of mice are large, it is preferable to use the binomial distribution with parameters n = 5 and p = 150/185. Therefore
P(X = 3) ≈ C_3^5 (150/185)³(35/185)² = 0.19.

Example 3.10. The telephone company reports that among 5000

telephones installed in a new subdivision 4000 are nonwhite. If

10 people are called at random, what is the probability that ex-

actly 3 will be talking on white telephones.

Since the population size N = 5000 is large relative to the sample size n = 10, we shall approximate the desired probability by using the binomial distribution. The probability of calling someone with a nonwhite telephone is 4000/5000 = 0.8. Therefore, the probability that exactly 3 of the people called will be talking on white telephones is
P(X = 3) = C_3^10 (0.2)³(0.8)⁷.

Example 3.11. Two dice are thrown 100 times and the number

of “nines” is recorded. What is the probability that x “nines”

occur? That at least three “nines” occur?

It is apparent that we are examining each roll of the two dice for the events “nine” or “not nine”. The probability of obtaining a nine by throwing two dice is 4/36 = 1/9, that is, p = 1/9. Hence, using the binomial distribution,
p(x) = C_x^100 (1/9)ˣ(8/9)^{100−x}, x = 0, 1, 2, . . . , 100.
In answer to the second question, we have
P(X ≥ 3) = 1 − P(X ≤ 2) = 1 − ∑_{x=0}^2 C_x^100 (1/9)ˣ(8/9)^{100−x} = 0.9993.

Exercises III

1. Let X ∼ b(25, 0.2). Evaluate P (X < µX − 2σX ).

2. If X ∼ b(n, p), show that


E[X/n] = p   and   E[(X/n − p)²] = p(1 − p)/n.

3. Criticize the following statements

(a) The mean of binomial distribution is 6 and its standard

deviation is 3.

(b) The mean of binomial distribution is 5 and its standard

deviation is 2.

(c) The mean of Poisson distribution is 5 and its standard

deviation is 4.

4. If X has a Poisson distribution with P [X = 1] = P [X = 2],

what is P [X = 1 or 2]?

5. If X has a Poisson distribution with mean 1, show that

E[| X − 1 |] = 2σX /e?

6. The moment generating function of a random variable X is e^{4(e^t−1)}. Show that P(µ − 2σ < X < µ + 2σ) = 0.931.
7. Find the mode of Poisson distribution with parameter λ.

8. A multiple-choice quiz has 3 questions, each with 4 possible

answers of which only one is the correct answer. (a) What

is the probability that sheer guesswork yields at most two

correct answers. (b) What is the probability that a student,

answer by guesswork, will succeed. (c) What is the proba-

bility that a student, answer by guesswork, will get the full

mark. (d) If the third question has 3 possible answers of

which only one is correct, then find the expected number of

correct answers when a student tries, randomly, to answer

these questions.

9. An insurance company finds that 0.0005 of the population

die from a certain kind of accidents each year. What is the

probability that the company must pay off on more than

three of 10,000 insured risks against such accidents in a

given two years.

10. Suppose that it is known that a certain kind of bacteria is

distributed in water at the rate of two bacteria per cubic

centimeter of water. If we assume that this phenomenon

can be approximated by a Poisson model. What is the

probability that a sample of 2 c.c. will contain at least two

bacteria? [1 − 5e⁻⁴]

11. If 5 cards are dealt from a standard deck of 52 playing cards,

what is the probability that 3 will be heart? [0.0815]

Chapter 4

SOME IMPORTANT

CONTINUOUS

DISTRIBUTIONS

In this chapter, some distributions such as the uniform, normal,

and gamma distributions are presented.

4.1 Uniform Distribution

Definition 4.1 (Uniform distribution). The random variable X

subjects to the uniform (rectangular) distribution, with two pa-

rameters a, b, if its density function is given by

f(x) = 1/(b − a); −∞ < a ≤ x ≤ b < ∞, and f(x) = 0 otherwise.
Theorem 4.1. If X is uniformly distributed over [a, b], then
E[X] = (a + b)/2,   V[X] = (b − a)²/12   and   m_X(t) = (e^{bt} − e^{at})/((b − a)t).
Proof.
E[X] = ∫_{−∞}^{∞} x f(x) dx = ∫_a^b x/(b − a) dx = (b² − a²)/(2(b − a)) = (b + a)/2.
E[X²] = ∫_a^b x²/(b − a) dx = (b³ − a³)/(3(b − a)) = (b² + ab + a²)/3,
V[X] = E[X²] − (E[X])² = (b² + ab + a²)/3 − (a + b)²/4
= (4b² + 4ab + 4a² − 3a² − 6ab − 3b²)/12 = (b² − 2ab + a²)/12 = (b − a)²/12.
m_X(t) = E[e^{tX}] = ∫_a^b e^{tx}/(b − a) dx = (e^{bt} − e^{at})/((b − a)t).

Example 4.1. Suppose X is a continuous random variable with a uniform distribution having mean 1 and variance 4/3. What is P(X < 0)?
If X is uniformly distributed with parameters a, b; a < b, then
E[X] = (a + b)/2 = 1,    (4.1)
V[X] = (b − a)²/12 = 4/3.    (4.2)
From (4.1) we can write a = 2 − b, and substituting in (4.2) yields
(b − 2 + b)²/12 = 4/3 ⇒ (b − 1)² = 4 ⇒ b² − 2b − 3 = 0 ⇒ (b + 1)(b − 3) = 0,
so either b = −1 (and hence a = 3) or b = 3 (and hence a = −1). The first choice for a and b is rejected since a < b; thus the second choice is accepted, and
P(X < 0) = ∫_{−1}^0 (1/4) dx = (1/4)[0 − (−1)] = 1/4.

4.2 Normal Distribution

Definition 4.2 (Normal distribution). The random variable X subjects to the normal distribution, with two parameters µ and σ, if its density function is given by
f(x) = (1/(σ√(2π))) exp[−(1/2)((x − µ)/σ)²],  −∞ < x < ∞,  (−∞ < µ < ∞, σ > 0).    (4.3)
If the random variable X is normally distributed with mean µ and variance σ² (later on we prove that the mean is µ and the variance is σ²), we write X ∼ N(µ, σ²). We will also use the notation Φ_{µ,σ²}(x) for the cumulative distribution function.
If in (4.3) we set z = (x − µ)/σ, then
f_Z(z) = (1/√(2π)) exp(−z²/2),  −∞ < z < ∞,
is the density function of the random variable Z with parameter values µ = 0 and σ² = 1, which is called the standard normal random variable, i.e. Z ∼ N(0, 1).
The graph of the normal distribution is called the normal curve.

Properties of the normal curve

1. The curve is symmetric about a vertical axis through the mean µ and it has a bell shape.
2. The mode, which is the point on the horizontal axis where the curve attains its maximum, occurs at x = µ.
3. The normal curve approaches the horizontal axis asymptotically as x → ±∞.
4. The total area under the curve and above the horizontal axis is equal to 1.

Fortunately, to avoid the use of integral calculus, we are able to transform all of the observations of any normal random variable X to a new set of observations of a standard normal random variable Z with mean zero and variance one, using the transformation
Z = (X − µ)/σ,
where E[Z] = (µ − µ)/σ = 0 and V[Z] = V[X − µ]/σ² = σ²/σ² = 1.

Theorem 4.2. If X ∼ N(µ, σ²), then
E[X] = µ, V[X] = σ².
Proof.
I ≡ E[X] = (1/(σ√(2π))) ∫_{−∞}^{∞} x exp[−(1/2)((x − µ)/σ)²] dx.
Using the transformation z = (x − µ)/σ ⇒ x = σz + µ ⇒ dx = σ dz, then
I = (1/√(2π)) ∫_{−∞}^{∞} (σz + µ) exp(−z²/2) dz
= σ ∫_{−∞}^{∞} (1/√(2π)) z e^{−z²/2} dz + µ ∫_{−∞}^{∞} (1/√(2π)) e^{−z²/2} dz = µ,
since ∫_{−∞}^{∞} z e^{−z²/2} dz = 0 (the integrand is an odd function) and ∫_{−∞}^{∞} (1/√(2π)) e^{−z²/2} dz = 1 (Z ∼ N(0, 1)).
Similarly, one can prove that V[X] = σ².

Example 4.2. Given the normally distributed random variable

X with mean 18 and variance 6.25, find

(i) P (X < 15),

(ii) the value of k such that P (X < k) = 0.2578,

(iii) P (17 < X < 21),

(iv) the value of k such that P (X > k) = 0.1539.

(i) P(X < 15) = P((X − 18)/2.5 < (15 − 18)/2.5) = P(Z < −1.2) = Φ(−1.2) = 0.1151.
(ii) P(X < k) = 0.2578 ⇒ P(Z < (k − 18)/2.5) = 0.2578 ⇒ (k − 18)/2.5 = −0.65 ⇒ k = 16.375.
(iii) P(17 < X < 21) = P((17 − 18)/2.5 < Z < (21 − 18)/2.5) = P(−0.4 < Z < 1.2)
= Φ(1.2) − Φ(−0.4) = 0.8849 − 0.3446 = 0.5403.
(iv) P(X > k) = 0.1539 ⇒ P(Z > (k − 18)/2.5) = 0.1539 ⇔ P(Z < (18 − k)/2.5) = 0.1539
⇒ (18 − k)/2.5 = −1.02 ⇒ k = 20.55.
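Instead of tables, Φ can be evaluated through the error function; the Python sketch below (an added illustration — Φ expressed via math.erf) recomputes parts (i), (iii) and (iv):

from math import erf, sqrt

def phi(z):
    # standard normal cdf
    return 0.5 * (1 + erf(z / sqrt(2)))

mu, sigma = 18, 2.5
print(phi((15 - mu) / sigma))                            # (i)   ~0.1151
print(phi((21 - mu) / sigma) - phi((17 - mu) / sigma))   # (iii) ~0.5403
print(1 - phi((20.55 - mu) / sigma))                     # (iv)  ~0.1539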
Example 4.3. If a set of grades on a statistics examination is approximately normally distributed with a mean of 74 and a standard deviation of 7.9, find (a) the lowest passing grade if the lowest 10% of the students are given Fs, (b) the highest B if the top 5% of the students are given As.
Let X ∼ N(74, (7.9)²) show the grades.
(a) P(X < k) = 0.1 ⇒ P(Z < (k − 74)/7.9) = 0.1 ⇒ (k − 74)/7.9 = −1.28 ⇒ k ≈ 64.
(b) P(X > B) = 0.05 ⇒ P(Z > (B − 74)/7.9) = 0.05 ⇔ P(Z < (74 − B)/7.9) = 0.05
⇒ (74 − B)/7.9 = −1.65 ⇒ B ≈ 87.
Example 4.4. In a mathematics examination the average grade was 82 and the standard deviation was 5. All students with grades from 88 to 94 received a grade of B. If the grades are approximately normally distributed and 8 students received a B grade, how many students took the examination?

Let X ∼ N(82, 25) denote the grade, then

P(88 < X < 94) = P(1.2 < Z < 2.4) = Φ(2.4) − Φ(1.2) = 0.9918 − 0.8849 = 0.1069 ⇒ n × 0.1069 = 8 ⇒ n ≈ 75.
4.2.1 Normal approximation to the binomial

We shall now state (without proof) a theorem that allows us

to use areas under the normal curve to approximate binomial

probabilities when n is sufficiently large.

Theorem 4.3. If X is a binomial random variable with mean

µ = np and variance σ 2 = npq, then the limiting form of the

distribution of
Z = (X − np)/√(npq),
as n → ∞, is the standardized normal distribution N (0, 1).

The probabilities of the binomial are approximated according

to the following:

If a, b and c are nonnegative integers with 0 ≤ a, b, c ≤ n, then

1. P(X = c) = P(c − 0.5 ≤ X ≤ c + 0.5)
            = P((c − 0.5 − np)/√(npq) ≤ Z ≤ (c + 0.5 − np)/√(npq)),

2. P(a ≤ X ≤ b) = P((a − 0.5 − np)/√(npq) ≤ Z ≤ (b + 0.5 − np)/√(npq)),

3. P(a < X < b), P(a < X ≤ b) and P(a ≤ X < b) should be transformed to closed-interval probabilities and then apply (2).

Example 4.5. A drug manufacturer claims that a certain drug cures a blood disease on the average 85% of the time. To check the claim, government testers used the drug on a sample of 100 individuals and decided to accept the claim if 75 or more are cured. What is the probability that the claim will be accepted when the cure probability is in fact 85%?

Let X ∼ b(100, 0.85) denote the number of cured people, then

P(X ≥ 75) = Σ from x = 75 to 100 of C_x^100 (0.85)^x (0.15)^{100−x},

but we notice that the number of individuals is large, so it is preferable to use the normal approximation to the binomial. Therefore

P(X ≥ 75) = P(Z ≥ (75 − 0.5 − (100)(0.85))/√((100)(0.85)(0.15)))
          = P(Z ≥ −2.94) = 1 − Φ(−2.94) = 1 − 0.0016
          = 0.9984.
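Since the binomial sum is available in closed form in SciPy, one can compare the exact probability with the normal approximation (a sketch, assuming SciPy):

    from scipy.stats import binom, norm

    n, p = 100, 0.85
    exact = 1 - binom.cdf(74, n, p)             # exact P(X >= 75)
    mu, sigma = n * p, (n * p * (1 - p)) ** 0.5
    approx = 1 - norm.cdf((74.5 - mu) / sigma)  # with continuity correction
    print(exact, approx)                        # both approximately 0.998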

Example 4.6. A certain pharmaceutical company knows that, on the average, 5% of a certain type of pill has an ingredient that is below the minimum strength and thus unacceptable. What is the probability that at least 2 in a sample of 200 pills will be unacceptable? Find also the mean and standard deviation of the accepted pills.

Let X ∼ b(200, 0.05) denote the number of unacceptable pills, then

P(X ≥ 2) = 1 − [P(X = 1) + P(X = 0)]
         = 1 − [C_1^200 (0.05)^1 (0.95)^{199} + C_0^200 (0.95)^{200}],

but we notice that n = 200 is large, so it is preferable to use the Poisson distribution with parameter λ = (200)(0.05) = 10 or the normal approximation to the binomial. Therefore

P(X ≥ 2) = P(Z ≥ (2 − 0.5 − (200)(0.05))/√((200)(0.05)(0.95)))
         = P(Z ≥ −2.75) = 1 − Φ(−2.75) = 1 − 0.003 = 0.997.

The mean and standard deviation of the accepted pills are equal, respectively, to n(1 − p) = 200(0.95) = 190 and √(n(1 − p)p) = √(200(0.95)(0.05)) ≈ 3.08.

4.3 Gamma Distribution

Definition 4.3 (Gamma distribution). The random variable X is subject to the gamma distribution, with two parameters n, β, if its density function is given by

f(x) = (1/(Γ(n)β^n)) x^{n−1} e^{−x/β},  x ≥ 0  (n, β > 0),   (4.4)

and f(x) = 0 otherwise, where Γ(·) is the gamma function.

Remark 4.1. From the definition of the gamma function, we have

∫ from 0 to ∞ of x^{n−1} e^{−x/β} dx = Γ(n)β^n.

Theorem 4.4. If X has a gamma distribution with parameters

n and β, then

E[X] = nβ, V [X] = nβ 2 .

Proof. The proof is omitted.

Special Cases

1. Exponential distribution.
   If in (4.4) n = 1, then

   f(x) = (1/β) e^{−x/β},  x ≥ 0  (β > 0),

   and f(x) = 0 otherwise, which is the exponential distribution with parameter β.

2. Chi-square distribution.
   If in (4.4) n = m/2, β = 2, then

   f(x) = (1/(Γ(m/2) 2^{m/2})) x^{(m/2)−1} e^{−x/2},  x ≥ 0  (m > 0),

   and f(x) = 0 otherwise, which is the chi-square distribution with m degrees of freedom, denoted by χ²(m).

Remark 4.2. The exponential distribution has been used as a

model for lifetimes of various things.
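The two special cases can be checked numerically: in SciPy's parameterization, gamma(a = n, scale = β) is the density (4.4), so a = 1 gives the exponential and a = m/2, scale = 2 gives χ²(m). A sketch, with arbitrary illustrative values of β, m and x:

    from scipy.stats import gamma, expon, chi2

    beta, m, x = 2.0, 6, 1.5
    print(gamma.pdf(x, a=1, scale=beta), expon.pdf(x, scale=beta))  # equal
    print(gamma.pdf(x, a=m/2, scale=2), chi2.pdf(x, df=m))          # equal
    print(gamma.mean(a=m/2, scale=2), gamma.var(a=m/2, scale=2))    # m and 2m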

Exercises IV

1. Let X have the uniform distribution on the interval [θ1 −

θ2 , θ1 + θ2 ], θ2 > 0. Find θ1 and θ2 so that the mean and

the variance of X are respectively, equal to the mean and

the variance of a distribution which is χ2 (8).

2. Let X ∼ N (5, 10). Find P (0.04 < (X − 5)2 < 38.4).

3. Let X ∼ N (1, 4). Find P (1 < X 2 < 9).


4. Let X ∼ N(µ, σ²). Find b so that P(−b < (X − µ)/σ < b) = 0.9.

5. Let X ∼ N (50, 25). Determine (a) P (X > 62),

(b) P (| X − 50 |< 8), and (c) P (X 2 < 1600).

6. Let X ∼ N (50, 100). Find P (Y ≤ 3137), where Y = X 2 +1.

[.4516]

7. If loge X ∼ N (1, 4), find P (0.5 < X < 2). [.248]

8. If zα is the value of Z such that

   (1/√(2π)) ∫ from zα to ∞ of e^{−z²/2} dz = α,
then find the value of zα when (a) α = 0.05, (b) α = 0.025,

(c) α = 0.01.

9. Show that the graph of a pdf N(µ, σ²) has points of inflection at x = µ − σ and x = µ + σ.

10. For a normal distribution with mean 12 and standard devi-

ation 2, find a value of the variate such that the probability

of the interval from the mean to that value is 0.3159. [13.8]

11. Let X ∼ N(µ, σ²). Prove that

    (a) µ_{2r+2} = σ² µ_{2r} + σ³ (dµ_{2r}/dσ),

    (b) µ′_{r+2} = 2µ µ′_{r+1} + (σ² − µ²) µ′_r + σ³ (dµ′_r/dσ).

12. The I.Q.s of 600 applicants to a certain college are approx-

imately normally distributed with a mean of 115 and a

standard deviation of 12. If the college requires an I.Q. of

at least 95, how many of these students will be rejected on

this basis regardless of their other qualifications?

13. A coin is tossed 400 times. Find the probability of obtaining

(a) Between 185 and 210 heads.

(b) Exactly 205 heads.

(c) Less than 176 or more than 227 heads.

14. The total time, T, taken to complete a certain job is a gamma random variable with pdf f(t) = (β^α/Γ(α)) t^{α−1} e^{−βt}; t ≥ 0, zero elsewhere, where α = 4 and β = 1 (per hour). What fraction of jobs will take longer than 5 hours to complete? [.265]

15. The time to failure, T, of a component is assumed to have a pdf f(t) = α e^{−αt}, t ≥ 0, zero elsewhere.

    (a) What is the probability that the component fails between k and k + 1, where k is an integer?

    (b) If the mean time to failure of the component is 100 hours, what is the probability that any particular component will last more than 200 hours?

Chapter 5

SAMPLING THEORY

Studying the relationships existing between a population and samples drawn from the population is called "sampling theory".

Sampling theory is useful in estimating the unknown population parameters and also in determining whether observed differences between two samples are really due to chance variation or whether they are truly significant.

The purpose of this chapter is to introduce the concept of sampling and to present some distribution results that are related to sampling.

5.1 Population and Samples

Definition 5.1 (Population). The totality of all observations

which are under discussion will be called the population.

Definition 5.2 (Simple random sample). If a sample of size n, say X1, X2, . . . , Xn, is drawn from a population of size N in such a way that every possible sample of size n has the same probability of being selected, then it is called a simple random sample.

Definition 5.3 (Statistic). A statistic is a random variable that depends only on the observed sample.

Example 5.1. If X1, X2, . . . , Xn is a random sample of size n, then each of the following represents a statistic:

1. X̄ = (1/n) Σ from i = 1 to n of Xi [sample mean].

2. µ′_r = (1/n) Σ from i = 1 to n of Xi^r [rth sample moment about 0].

3. µ_r = (1/n) Σ from i = 1 to n of (Xi − X̄)^r [rth sample moment about X̄].
Definition 5.4 (Sample variance). If X1, X2, . . . , Xn represents a random sample of size n, with mean X̄, then

S² = (1/(n − 1)) Σ from i = 1 to n of (Xi − X̄)²;  n > 1,

is defined to be the sample variance.

Definition 5.5 (Sampling distribution). The probability distri-

bution of a statistic is called a sampling distribution.

To construct a sampling distribution, we proceed as follows:

1. From a finite population of size N , randomly draw all pos-

sible samples of size n.

2. Compute the statistic of interest, such as the mean, for each

sample.

3. List in one column the different distinct observed values of

the statistic, and in another column list the corresponding

frequency of occurrence of each distinct observed value of

the statistic.

Definition 5.6 (Standard error). The standard deviation of the

sampling distribution of a statistic is called the standard error

of the statistic.

5.2 Sampling Distribution of the Mean

1. When σ is known.

Let X1 , X2 , . . . , Xn be a random sample of size n drawn

from a population of size N having mean µ and variance

σ 2 , then
 2( )

 σ N − n

 ; if the population is finite and the

 n N −1





 sampling is without replacement,

2
σX̄ =



 σ2



 ; if the population is infinite or the

 n


 sampling is with replacement,

where σX̄ is called the standard error of X̄.


Remark 5.1. The factor (N − n)/(N − 1) is called the "finite population correction" and can be ignored if N is very large (infinite population) or if n represents at most 5 percent of the population, i.e. n/N ≤ 0.05; in this case σX̄² = σ²/n.
Theorem 5.1. If all possible random samples of size n are

drawn with replacement from a finite population of size N

with mean µ and variance σ 2 , then the sampling distribution

of the mean X̄ will be approximately normally distributed

with mean µX̄ = µ and variance σ 2 /n. Hence

Z = (X̄ − µ)/(σ/√n) ∼ N(0, 1).

2. When σ is unknown.

In this case we replace σ by S (standard deviation of the

sample) and then we have the following two cases:

(a) If n ≥ 30, then

    Z = (X̄ − µ)/(S/√n) ∼ N(0, 1).

(b) If n < 30, then

    T = (X̄ − µ)/(S/√n) ∼ t_ν,

    where ν = n − 1 is the degrees of freedom of the t-distribution.
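These sampling-distribution facts are easy to illustrate by simulation. The sketch below (assuming NumPy is available) draws many samples of size n = 36 from a normal population and checks that the sample means have mean µ and standard error σ/√n:

    import numpy as np

    rng = np.random.default_rng(0)
    mu, sigma, n = 10.0, 4.0, 36
    xbars = rng.normal(mu, sigma, size=(20000, n)).mean(axis=1)
    print(xbars.mean())       # ~ 10.0  (mu_xbar = mu)
    print(xbars.std())        # ~ 0.667 = sigma / sqrt(n)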

5.3 Sampling Distribution of the Difference of Means

If we are given two populations, the first with mean µ1 and

variance σ12 , and the second with mean µ2 and variance σ22 . Let

the values of the variable X̄1 represent the means of random

samples of size n1 drawn from the first population and similarly

the values of X̄2 represent the means of random samples of size

n2 drawn from the second population such that the values of X̄1

are independent of the values of X̄2 , then

µ_{X̄1±X̄2} = µ1 ± µ2 and σ²_{X̄1±X̄2} = σ1²/n1 + σ2²/n2.
n1 n2

Theorem 5.2. Suppose that two independent samples of sizes n1

and n2 are drawn from two large populations with means µ1 and

µ2 and variances σ12 and σ22 , respectively. Then the sampling

distribution of X̄1 − X̄2 is approximately normally distributed

with mean and standard error given by

µ_{X̄1−X̄2} = µ1 − µ2 and σ_{X̄1−X̄2} = √(σ1²/n1 + σ2²/n2).

Hence,

Z = ((X̄1 − X̄2) − (µ1 − µ2)) / √(σ1²/n1 + σ2²/n2) ∼ N(0, 1).

Example 5.2. The uric acid values in normal adult males are approximately normally distributed with a mean and standard deviation of 5.7 and 1 mg percent, respectively. Find the probability that a sample of size 9 will yield a mean:

(a) greater than 6, (b) between 5 and 6.

µ = 5.7, σ = 1, n = 9.

(a)
P(X̄ > 6) = P((X̄ − µ)/(σ/√n) > (6 − µ)/(σ/√n))
          = P(Z > 0.9) = 1 − P(Z ≤ 0.9)
          = 1 − 0.8159 = 0.1841.

(b)
P(5 < X̄ < 6) = P(−2.1 < Z < 0.9)
             = 0.8159 − 0.0179 = 0.7980.

Example 5.3. Suppose that a population consists of the following values: 1, 3, 5, 7. Construct the sampling distribution of X̄ based on samples of size two selected without replacement from the above population. Find the mean and variance of the sampling distribution.

The number of drawn samples is equal to C_2^4 = 4!/(2! 2!) = 6. The samples and their means are:

Sample:  (1,3)  (1,5)  (1,7)  (3,5)  (3,7)  (5,7)
x̄:       2      3      4      4      5      6

Then we have the following frequency distribution:

x̄i:       2  3  4   5   6   | Total
fi:       1  1  2   1   1   | 6
x̄i fi:    2  3  8   5   6   | 24
x̄i² fi:   4  9  32  25  36  | 106

E[X̄] = Σ x̄i fi / Σ fi = 24/6 = 4.

σX̄² = Σ x̄i² fi / Σ fi − (Σ x̄i fi / Σ fi)² = 106/6 − 16 = 5/3.

We can note that

µ = (1 + 3 + 5 + 7)/4 = 4 = µX̄,

(σ²/n) × (N − n)/(N − 1) = (5/2) × (2/3) = 5/3 = σX̄².
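The same enumeration can be reproduced programmatically (a sketch):

    from itertools import combinations

    pop = [1, 3, 5, 7]
    xbars = [sum(s) / 2 for s in combinations(pop, 2)]  # the 6 sample means
    m = sum(xbars) / len(xbars)
    v = sum((x - m) ** 2 for x in xbars) / len(xbars)
    print(sorted(xbars))   # [2.0, 3.0, 4.0, 4.0, 5.0, 6.0]
    print(m, v)            # 4.0 and 1.666... = 5/3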

Example 5.4. Suppose it has been established that for a certain type of client the average length of a home visit by a public health nurse is 45 minutes with a standard deviation of 15 minutes, and that for a second type of client the average home visit is 30 minutes long with a standard deviation of 20 minutes. If a nurse randomly visits 35 clients from the first group and 40 from the second group, what is the probability that the average length of home visit will differ between the two groups by 20 or more minutes?

µ1 = 45, µ2 = 30,
σ1 = 15, σ2 = 20,
n1 = 35, n2 = 40.

We don't know here whether the two populations are normal or not. But, since n1 > 30 and n2 > 30, the difference between the two sample means is approximately normally distributed with the following mean and variance:

µ_{X̄1−X̄2} = µ1 − µ2 = 45 − 30 = 15,

σ²_{X̄1−X̄2} = σ1²/n1 + σ2²/n2 = (15)²/35 + (20)²/40 = 16.4286,

and hence

P((X̄1 − X̄2) ≥ 20) = P(Z ≥ (20 − 15)/4.05)
                   = P(Z ≥ 1.23)
                   = 1 − P(Z < 1.23)
                   = 1 − 0.8907 = 0.1093.

Example 5.5. If all possible samples of size 16 are drawn from a normal population with mean equal to 50 and variance 25, what is the probability that a sample mean X̄ will fall in the interval from µX̄ − 1.9σX̄ to µX̄ − 0.4σX̄?

We know that µX̄ = µ = 50 and σX̄ = σ/√n = 5/4.

P(µX̄ − 1.9σX̄ < X̄ < µX̄ − 0.4σX̄) = P(−1.9 < Z < −0.4)
                                 = 0.3446 − 0.0287
                                 = 0.3159.

5.4 Sampling Distribution of the Sample Variance (S²)

When we draw a sample of size n from a normal population

with variance σ 2 , and the sample variance s2 is computed for

each sample, then we have obtained the values of a statistic S 2 .

In practice, the sampling distribution of S 2 has little application

in statistics. Instead, we shall consider the distribution of a ran-

dom variable X 2 , called chi-square, whose values are calculated

from each sample by the formula

χ² = (n − 1)s²/σ².

The distribution of X² = (n − 1)S²/σ² is referred to as the chi-square distribution with ν = n − 1 degrees of freedom.

Example 5.6. Find the probability that a random sample of size 25, from a normal population with σ² = 6, will have a variance (a) greater than 9.1, (b) between 3.462 and 10.745.

(a)
P(S² > 9.1) = P((n − 1)S²/σ² > (n − 1)(9.1)/σ²)
            = P(X² > 24 × 9.1/6)
            = P(X² > 36.4) = 0.05,

using the chi-square table with ν = 24 degrees of freedom.

(b)
P(3.462 < S² < 10.745) = P(24 × 3.462/6 < X² < 24 × 10.745/6)
                       = P(13.848 < X² < 42.98)
                       = 0.95 − 0.01 = 0.94.
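Both probabilities follow directly from the chi-square distribution with 24 degrees of freedom (a sketch using SciPy):

    from scipy.stats import chi2

    n, var, df = 25, 6.0, 24
    print(1 - chi2.cdf(df * 9.1 / var, df))     # (a) ~ 0.05
    print(chi2.cdf(df * 10.745 / var, df)
          - chi2.cdf(df * 3.462 / var, df))     # (b) ~ 0.94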

5.5 Sampling Distribution of the Sample Proportion

To know and understand the distribution of the sample propor-

tion, let us consider the following example

Example 5.7. Suppose that in a certain human population 0.08

are colorblind. If we designate a population proportion by p, we

can say that, in this example, p = 0.08. If a random sample of 150 individuals from this population is selected, what is the probability that the proportion in the sample who are colorblind will be greater than 0.15?

To answer this question we need to know the properties of

the sampling distribution of the sample proportion.

We will denote the sample proportion by p̂. When the sample size is large, the distribution of the sample proportion is approximately normal. The mean of the distribution, µP̂, that is, the average of all possible sample proportions, will be equal to the true population proportion, p, and the variance of the distribution, σP̂², will be equal to p(1 − p)/n. Then

z = (p̂ − p)/√(p(1 − p)/n),   (5.1)

is a value of an approximately standard normal random variable.

Example 5.8. In a random sample of 75 adults, 35 said they

felt that cancer of the breast is curable. If, in the population

from which the sample was drawn, the true proportion who feel

cancer of the breast can be cured is 0.55, what is the probability

of obtaining a sample proportion smaller than that obtained in

this sample?

n = 75, p = 0.55.

P(P̂ < 35/75) = P((P̂ − p)/√(p(1 − p)/n) < ((35/75) − p)/√(p(1 − p)/n))
             = P(Z < ((35/75) − 0.55)/√(0.55 × 0.45/75))
             = P(Z < −1.45) = 0.0735.


Relation (5.1) can be generalized to the difference between two sample proportions as follows:

z = ((p̂1 − p̂2) − (p1 − p2)) / √(p1(1 − p1)/n1 + p2(1 − p2)/n2),

where, for i = 1, 2, p̂i and pi are the sample proportion and population proportion of the ith sample and ith population, respectively, and ni is the sample size drawn from population i.
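Relation (5.1) is easy to apply numerically; the sketch below reproduces Example 5.8:

    from scipy.stats import norm

    n, p, phat = 75, 0.55, 35 / 75
    z = (phat - p) / (p * (1 - p) / n) ** 0.5
    print(z, norm.cdf(z))   # ~ -1.45 and 0.0735, as in Example 5.8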

Exercises V
1. If each observation in a sample is multiplied by k, show that

the sample variance becomes k 2 times its original value.

2. (i) Calculate the variance of the sample 3, 5, 8, 7, 5, and 7.

   (ii) Without calculating, state the variance of the sample 6, 10, 16, 14, 10, and 14.

   (iii) Without calculating, state the variance of the sample 25, 27, 30, 29, 27, and 29.

3. A finite population consists of the numbers 2, 4, and 7.

(i) Construct a frequency histogram for the sampling dis-

tribution of X̄ when samples of size 4 are drawn with

replacement.
(ii) Verify that µX̄ = µ and σX̄² = σ²/n.

(iii) Between what two values would you expect the middle

68% of the sample means to fall?

4. The heights of 1000 students are approximately normally

distributed with a mean of 68.5 inches and a standard de-

viation of 2.7 inches. If 200 random samples of size 25 are

drawn from this population, determine

(i) The expected mean and standard deviation of the sam-

pling distribution of the mean.

(ii) The number of sample means that fall between 66 and

69 inclusive.

(iii) The number of sample means falling below 65.

5. Find the probability that a random sample of 25 observations, from a normal population with variance σ² = 6, will have a variance S² (a) greater than 9.1, (b) between 3.462 and 10.745.

Chapter 6

POINT AND INTERVAL

ESTIMATIONS

Estimation is the first of the two general areas of statistical infer-

ence. The second general area, hypothesis testing, is examined

in the next chapter.

In this chapter we shall consider inferences about unknown

population parameters such as the mean, variance and propor-

tion.

Definition 6.1 (Statistical inference). The procedure whereby

inferences about a population are made on the basis of the re-

sults obtained from a sample drawn from that population is

called statistical inference.

6.1 Methods of Estimation

A population parameter can be estimated by a point or an inter-

val. A point estimate of some population parameter θ is a single

numerical value θ̂ of the statistic Θ̂. For example, the value x̄

of the statistic X̄, computed from a sample of size n, is a point

estimate of the population parameter µ. Similarly, s2 is a point

estimate of the population variance σ 2 .

An interval estimate of a population parameter, θ, is given by two values between which θ lies.

Definition 6.2 (Unbiased estimator). A statistic Θ̂ is said to

be an unbiased estimator of the parameter θ if E(Θ̂) = θ.

The sample mean, the difference between two sample means,

the sample proportion, the difference between two sample pro-

portions are unbiased estimates of their corresponding parame-

ters.

Example 6.1. Prove that S² is an unbiased estimator of σ².

Let X1, X2, . . . , Xn be a random sample of size n; that is, X1, X2, . . . , Xn are independent and identically distributed, each with mean µ and variance σ². Then

∵ X̄ = (X1 + X2 + · · · + Xn)/n,
∴ E[X̄] = E[X1/n] + E[X2/n] + · · · + E[Xn/n] = µ/n + µ/n + · · · + µ/n = µ.

Also, V[X̄] = E[(X̄ − µ)²] = σ²/n² + σ²/n² + · · · + σ²/n² = n(σ²/n²) = σ²/n.

Now,

Σ from i = 1 to n of (Xi − µ)² = Σ [(Xi − X̄) + (X̄ − µ)]²
                              = Σ (Xi − X̄)² + 2(X̄ − µ) Σ (Xi − X̄) + n(X̄ − µ)²
                              = (n − 1)S² + 0 + n(X̄ − µ)².

Taking the expectation of both sides, then

Σ from i = 1 to n of E[(Xi − µ)²] = (n − 1)E[S²] + nE[(X̄ − µ)²]

∴ nσ² = (n − 1)E[S²] + n(σ²/n)

(n − 1)σ² = (n − 1)E[S²]

∴ E[S²] = σ²,

and hence S² is an unbiased estimator of σ².
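Unbiasedness can also be seen by simulation: averaging S² (the n − 1 divisor) over many samples recovers σ², while the n divisor underestimates it by the factor (n − 1)/n. A sketch assuming NumPy:

    import numpy as np

    rng = np.random.default_rng(1)
    samples = rng.normal(0.0, 2.0, size=(100000, 5))  # sigma^2 = 4, n = 5
    print(samples.var(axis=1, ddof=1).mean())         # ~ 4.0  (unbiased S^2)
    print(samples.var(axis=1, ddof=0).mean())         # ~ 3.2 = (4/5) * 4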

6.2 Confidence Intervals

The interval I can be considered a confidence interval for the

population parameter, θ, if we can compute the probability that

I contains θ. This probability is called the confidence coefficient

of the interval.

The procedure of obtaining a confidence interval is to obtain

Q(Θ̂, θ), which is a function of the estimator Θ̂ and the parame-

ter θ such that the distribution of this quantity does not depend

on θ. For fixed α (usually 1% or 5%) we obtain the values Q1

and Q2 such that

P (Q1 ≤ Q(Θ̂, θ) ≤ Q2 ) = 1 − α.

By solving the inequality Q1 ≤ Q(Θ̂, θ) ≤ Q2 with respect to θ,

then

Q1 ≤ Q(Θ̂, θ) ≤ Q2 ⇐⇒ T1 ≤ θ ≤ T2 .

Then we can write

P (Q1 ≤ Q(Θ̂, θ) ≤ Q2 ) = P (T1 ≤ θ ≤ T2 ) = 1 − α,

where T1 and T2 are called the lower and upper limits, respec-

tively, 1 − α is called confidence coefficient.

6.2.1 Confidence interval for the population mean (µ) [σ known]

It is easy now to find a (1 − α)100% confidence interval for µ of a normal distribution with known variance σ². We know that Z ∼ N(0, 1), so taking

Z = (X̄ − µ)/(σ/√n) ≡ Q(X̄, µ),

P(−zα/2 < Z < zα/2) = P(−zα/2 < (X̄ − µ)/(σ/√n) < zα/2) = 1 − α,

P(−zα/2 σ/√n < X̄ − µ < zα/2 σ/√n) = 1 − α,

P(X̄ − zα/2 σ/√n < µ < X̄ + zα/2 σ/√n) = 1 − α.
Theorem 6.1. A (1 − α)100% confidence interval for µ, based on a random sample of size n with mean X̄, is

X̄ − zα/2 σ/√n < µ < X̄ + zα/2 σ/√n,   (6.1)

where zα/2 is the value of the standard normal random variable leaving an area α/2 to the right, i.e. P(Z > zα/2) = α/2.

6.2.2 Confidence interval for the population mean (µ) [σ unknown]

In this case we replace σ² by S² (the sample variance) and then the confidence interval becomes

X̄ − zα/2 S/√n < µ < X̄ + zα/2 S/√n;  n ≥ 30,

X̄ − tα/2 S/√n < µ < X̄ + tα/2 S/√n;  n < 30,

where tα/2 is the value of the random variable T having the t-distribution, with ν = n − 1 degrees of freedom, leaving an area α/2 to the right, i.e. P(T > tα/2) = α/2.

6.2.3 Determination of sample size for estimating means

We present now a method for determining the sample size required for estimating a population mean.

Let e denote the error in estimating the population mean, represented by the half-width of the interval (6.1). So

e = zα/2 σ/√n  ⟹  n = [zα/2 σ/e]².
Example 6.2. The average number of heartbeats per minute for a sample of 49 subjects was found to be 90. If the sample is taken from a normal population with variance 100, find 90%, 95% and 99% confidence intervals for the population mean.

X̄ = 90, σ² = 100, n = 49.

1 − α = 0.90 ⟹ α = 0.1 ⟹ α/2 = 0.05 ⟹ 1 − α/2 = 0.95 ⟹ zα/2 = 1.645.

Since the confidence interval is given by

X̄ − zα/2 σ/√n < µ < X̄ + zα/2 σ/√n,

the 90% confidence interval for µ is given by

90 − 1.645 × 10/7 < µ < 90 + 1.645 × 10/7.

Now,

1 − α = 0.95 ⟹ 1 − α/2 = 0.975 ⟹ zα/2 = 1.96,
1 − α = 0.99 ⟹ 1 − α/2 = 0.995 ⟹ zα/2 = 2.58.

So, 95% and 99% confidence intervals for µ are given, respectively, by

90 − 1.96 × 10/7 < µ < 90 + 1.96 × 10/7,
90 − 2.58 × 10/7 < µ < 90 + 2.58 × 10/7.

Example 6.3. Let X̄ be the mean of a random sample of size n from a distribution which is N(µ, 9). Find n such that P(X̄ − 1 < µ < X̄ + 1) = 0.9.

σ² = 9, e = 1.

1 − α = 0.9 ⟹ 1 − α/2 = 0.95 ⟹ zα/2 = 1.645, then

n = [zα/2 σ/e]² = [1.645 × 3/1]² ≈ 24.35, so we take n = 25 (rounding up to guarantee the stated confidence).

6.3 Confidence Interval for the Difference Between Two Population Means

Suppose that X1 , X2 , . . . , Xn1 is a random sample with mean

X̄ and variance S12 taken from a normal population with mean

µ1 and variance σ12 , and let Y1 , Y2 , . . . , Yn2 be another random

sample with mean Ȳ and variance S22 taken from a normal pop-

ulation with mean µ2 and variance σ22 , then we have the following

cases:

1. Confidence interval for µ1 − µ2 when σ1² and σ2² are known.

   (X̄ − Ȳ) − zα/2 √(σ1²/n1 + σ2²/n2) < µ1 − µ2 < (X̄ − Ȳ) + zα/2 √(σ1²/n1 + σ2²/n2)

   is a (1 − α)100% confidence interval for µ1 − µ2.

2. Confidence interval for µ1 − µ2 when σ1² and σ2² are unknown.

   If n1 and n2 are large (in practice both strictly greater than 60) and we cannot assume that σ1² = σ2², then

   (X̄ − Ȳ) − zα/2 √(S1²/n1 + S2²/n2) < µ1 − µ2 < (X̄ − Ȳ) + zα/2 √(S1²/n1 + S2²/n2)

   is an approximate (1 − α)100% confidence interval for µ1 − µ2.

3. Confidence interval for µ1 − µ2 when σ1² = σ2² = σ² (unknown).

   It is usual to combine the two separate estimators S1² (based on ν1 = n1 − 1 degrees of freedom) and S2² (based on ν2 = n2 − 1 degrees of freedom) into a single (pooled) estimator of the variance, Sp², which is given by

   Sp² = Σ νi Si² / Σ νi = ((n1 − 1)S1² + (n2 − 1)S2²)/(n1 + n2 − 2),

   and hence a (1 − α)100% confidence interval for µ1 − µ2 is then given by

   (X̄ − Ȳ) − tα/2 Sp W < µ1 − µ2 < (X̄ − Ȳ) + tα/2 Sp W,

   where tα/2 is the value of the random variable T having the t-distribution with n1 + n2 − 2 degrees of freedom and W = √(1/n1 + 1/n2).
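A sketch of the pooled-variance interval, using small hypothetical samples (the data below are illustrative only, not from the text):

    import numpy as np
    from scipy.stats import t

    x = np.array([64.0, 66.0, 61.0, 63.0, 65.0])   # hypothetical sample 1
    y = np.array([59.0, 58.0, 62.0, 60.0])         # hypothetical sample 2
    n1, n2 = len(x), len(y)
    sp2 = ((n1 - 1) * x.var(ddof=1) + (n2 - 1) * y.var(ddof=1)) / (n1 + n2 - 2)
    w = (1 / n1 + 1 / n2) ** 0.5
    half = t.ppf(0.975, n1 + n2 - 2) * sp2 ** 0.5 * w
    d = x.mean() - y.mean()
    print(d - half, d + half)   # 95% confidence interval for mu1 - mu2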

Exercises VI

1. An electrical firm manufactures light bulbs that have a

length of life that is approximately normally distributed,

with a standard deviation of 40 hours. If a random sample

of 30 bulbs has an average life of 780 hours, find a 96%

confidence interval for the population mean of all bulbs

produced by this firm.

2. How large a sample is needed in Exercise 1 if we wish to

be 96% confident that our sample mean will be within 10

hours of the true mean?

3. A random sample of 8 cigarettes of a certain brand has an

average nicotine content of 18.6 milligrams and a standard

deviation of 2.4 milligrams. Construct a 99% confidence

interval for the true average nicotine content of this kind of

cigarettes, assuming an approximate normal distribution.

4. Given two random samples of size n1 = 9 and n2 = 16,

from two independent normal populations, with x̄1 = 64,

x̄2 = 59, s1 = 6, and s2 = 5, find a 95% confidence interval

for µ1 − µ2 , assuming that σ1 = σ2 .

Chapter 7

TESTS OF HYPOTHESES

The purpose of hypothesis testing is to aid the clinician, re-

searcher, or administrator in reaching a decision concerning a

population by examining a sample from that population.

Definition 7.1 (Statistical hypothesis). A statistical hypothesis

is an assumption or statement, which may or may not be true,

concerning one or more populations.

Hypotheses that we formulate with the hope of rejecting are called null hypotheses, denoted by H0. The null hypothesis is sometimes referred to as a hypothesis of no difference, since it is a statement of agreement with (or no difference from) conditions presumed to be true in the population of interest.

The rejection of H0 leads to the acceptance of an alternative

hypothesis, denoted by H1 .

Definition 7.2. A type I error has been committed if we reject

the null hypothesis when it is true.

Definition 7.3. A type II error has been committed if we accept

the null hypothesis when it is false.

Definition 7.4. The probability of committing a type I error is

called the level of significance of the test and is denoted by α

i.e. α = P (type I error).

If the alternative hypothesis is one-sided, such as H1 : θ > θ0 or H1 : θ < θ0, the test is called a one-tailed test. The critical region for the alternative hypothesis θ > θ0 lies entirely in the right tail of the distribution, while the critical region for H1 : θ < θ0 lies entirely in the left tail. If the alternative hypothesis is H1 : θ ̸= θ0, then it is called a two-tailed test. The critical region here consists of two tails, one in the left corresponding to θ < θ0 and the other in the right corresponding to θ > θ0.

A test is said to be significant if the null hypothesis is re-

jected at the 0.05 level of significance, and is considered highly

significant if the null hypothesis is rejected at the 0.01 level of

significance.

7.1 Tests Concerning Means, Variances and Proportions

The steps for testing a hypothesis concerning a population pa-

rameter θ against some alternative hypothesis may be summa-

rized as follows:

1. Formulate the null hypothesis, H0 : θ = θ0 .

2. Formulate the alternative hypothesis, H1 : θ > θ0 , θ <

θ0 or θ ̸= θ0 .

3. Choose a level of significance equal to α which may be 0.05

or 0.01.

4. Select the appropriate test statistic and establish the criti-

cal region.

5. Compute the value of the statistic from a random sample

of size n.

6. Conclusion: Reject H0 if the statistic has a value in the

critical region, otherwise accept H0 .

Example 7.1. A doctor developed a new drug and claims that its efficacy has mean µ = 20 with a standard deviation of 0.5. Test the hypothesis that µ = 20 against the alternative that µ ̸= 20, if a random sample of 50 patients is tested and found to have mean x̄ = 19.8. Use the 0.01 level of significance.

1. H0 : µ = 20.

2. H1 : µ ̸= 20.

3. α = 0.01.

4. The test statistic is z = (x̄ − µ)/(σ/√n), so the critical region is z < −zα/2 or z > zα/2, i.e. z < −2.58 or z > 2.58.

5. Computation: x̄ = 19.8, n = 50,
   zc = (x̄ − µ)/(σ/√n) = (19.8 − 20)/(0.5/√50) = −2.828.

6. Conclusion: Reject H0, since zc < −2.58, and conclude that the difference is highly significant.
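The six steps reduce to a few lines of code (a sketch):

    from scipy.stats import norm

    xbar, mu0, sigma, n, alpha = 19.8, 20, 0.5, 50, 0.01
    zc = (xbar - mu0) / (sigma / n ** 0.5)
    zcrit = norm.ppf(1 - alpha / 2)
    print(zc, zcrit, abs(zc) > zcrit)   # -2.83, 2.58, True -> reject H0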

Example 7.2. Test the hypothesis that the average weight of containers of a particular lubricant is 10 ounces if the weights of a random sample of 10 containers are 10.2, 9.7, 10.1, 10.3, 10.1, 9.8, 9.9, 10.4, 10.3 and 9.8 ounces. Use a 0.01 level of significance and assume that the distribution of weights is normal.

n = 10, µ = 10,

x̄ = (1/10) Σ xi = 100.6/10 = 10.06,

s² = (1/(10 × 9)) [10 Σ xi² − (Σ xi)²]  ⟹  s = 0.245.

1. H0 : µ = 10.

2. H1 : µ ̸= 10.

3. α = 0.01.

4. The test statistic is t = (x̄ − µ)/(s/√n), so the critical region is t < −tα/2 or t > tα/2, i.e. t < −3.25 or t > 3.25.

5. Computation: tc = (x̄ − µ)/(s/√n) = (10.06 − 10)/(0.245/√10) = 0.774.

6. Conclusion: Accept H0, since −3.25 < tc < 3.25.

Example 7.3. Let the mean of a certain operation be µ = 50 minutes with standard deviation σ = 10 minutes. New equipment is used, and a random sample of size n = 12 gives a mean x̄ = 42 with standard deviation s = 11.9. Test the hypothesis that the mean µ = 50 against the alternative that µ < 50, using α = 0.05 and α = 0.01. Assume the population is normal.

1. H0 : µ = 50.

2. H1 : µ < 50.

3. α = 0.05.

4. The test statistic is t = (x̄ − µ)/(s/√n), so the critical region is t < −t(α, n−1), i.e. t < −1.796.

5. Computation: tc = (x̄ − µ)/(s/√n) = (42 − 50)/(11.9/√12) = −2.32.

6. Conclusion: Reject H0, since tc < −1.796.

If α = 0.01, the critical region is t < −2.718. In this case we accept H0, since tc > −2.718.

Example 7.4. A simple random sample of 15 nursing students who participated in an experiment took a test to measure manual dexterity. The variance of the sample observations was 1225. We want to know if we can conclude from these data that the population variance is different from 2500.

1. H0 : σ² = 2500.

2. H1 : σ² ̸= 2500.

3. α = 0.05.

4. The test statistic to be used is X² = (n − 1)S²/σ². The critical region is X² > χ²(α/2, n−1) or X² < χ²(1−α/2, n−1), i.e. X² > 26.119 or X² < 5.629.

5. Computations:
   χ²c = (n − 1)s²/σ² = 14 × 1225/2500 = 6.86.

6. Conclusion: Accept H0, since 5.629 < χ²c < 26.119. So, based on these data we are unable to conclude that the population variance is not 2500.

Example 7.5. A research team collected serum amylase data from a sample of healthy subjects and from a sample of hospitalized subjects. They wish to know if they would be justified in concluding that the population means are different. The data consist of serum amylase determinations on n2 = 15 healthy subjects and n1 = 22 hospitalized subjects. The sample means and standard deviations are as follows:

x̄1 = 120 units/ml, s1 = 40 units/ml,
x̄2 = 96 units/ml, s2 = 35 units/ml.

1. H0 : µ1 − µ2 = 0.

2. H1 : µ1 − µ2 ̸= 0.

3. α = 0.05.

4. The test statistic to be used is T = ((x̄1 − x̄2) − (µ1 − µ2)) / (Sp √(1/n1 + 1/n2)). The critical regions are T < −t(α/2, n1+n2−2) and T > t(α/2, n1+n2−2), i.e. T < −2.0301 or T > 2.0301.

5. Computations: the pooled variance is Sp² = ((n1 − 1)s1² + (n2 − 1)s2²)/(n1 + n2 − 2) = (21 × 1600 + 14 × 1225)/35 = 1450, so Sp = 38.08, and

   tc = ((120 − 96) − 0) / (38.08 √(1/15 + 1/22)) = 24/12.75 = 1.88.

6. Conclusion: Accept H0, since −2.0301 < tc < 2.0301.

Example 7.6. The manufacturer of a medicinal product claimed that it was 90% effective in relieving an allergy for a period of 8 hours. In a sample of 200 people who had the allergy, the medicine provided relief for 160 people. Determine whether the manufacturer's claim is legitimate.

Let p denote the probability of obtaining relief from the allergy by using the medicine. Then

1. H0 : p = 0.9, i.e. the claim is correct.

2. H1 : p < 0.9.

3. α = 0.01.

4. The test statistic to be used is Z = (P̂ − p)/√(p(1 − p)/n). The critical region is Z < −zα, i.e. Z < −2.33.

5. Computations:
   zc = (0.8 − 0.9)/√((0.9)(0.1)/200) = −4.71.

6. Conclusion: Reject H0, since zc lies in the critical (rejection) region.

When the null hypothesis to be tested is p1 − p2 = 0, we are hypothesizing that the two population proportions are equal. We use this as justification for combining the results of the two samples to come up with a pooled estimate of the hypothesized common proportion. If this procedure is adopted, one computes

p̄ = (x1 + x2)/(n1 + n2),

where x1 and x2 are the numbers in the first and second samples, respectively, possessing the characteristic of interest. This pooled estimate of p = p1 = p2 is used in computing σ̂(P̂1−P̂2), the estimated standard error of the estimator, as follows:

σ̂(P̂1−P̂2) = √(p̄(1 − p̄)/n1 + p̄(1 − p̄)/n2).

Then the test statistic becomes

Z = ((P̂1 − P̂2) − (p1 − p2)) / σ̂(P̂1−P̂2),   (7.1)

where Z ∼ N(0, 1) if the null hypothesis is true.

Example 7.7. In a study designed to compare a new treatment for migraine headache with the standard treatment, 78 of 100 subjects who received the standard treatment responded favorably, while 90 of 100 subjects who received the new treatment responded favorably. Do these data provide sufficient evidence to indicate that the new treatment is more effective than the standard? The answer is yes if we can reject the null hypothesis that the new treatment is no more effective than the standard.

p̂1 = 78/100, p̂2 = 90/100, p̄ = (90 + 78)/(100 + 100) = 0.84.

1. H0 : p2 − p1 ≤ 0.

2. H1 : p2 − p1 > 0.

3. α = 0.05.

4. The test statistic to be used is given by (7.1). The critical region is Z > zα, i.e. Z > 1.645.

5. Computations:
   zc = (0.90 − 0.78)/√(0.84 × 0.16/100 + 0.84 × 0.16/100) = 0.12/0.0518 = 2.32.

6. Conclusion: Reject H0, since zc > 1.645. So, these data suggest that the new treatment is more effective than the standard.
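A sketch of the pooled two-proportion test for this example:

    from scipy.stats import norm

    x1, n1, x2, n2 = 90, 100, 78, 100     # new vs. standard treatment
    pbar = (x1 + x2) / (n1 + n2)
    se = (pbar * (1 - pbar) * (1 / n1 + 1 / n2)) ** 0.5
    zc = (x1 / n1 - x2 / n2) / se
    print(zc, zc > norm.ppf(0.95))        # ~ 2.32 and True -> reject H0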

7.2 Goodness-of-Fit Test

We shall now consider a test to determine if some population has

a specified distribution. The test is based upon how good a fit

we have between the frequencies of occurrence of observations in

an observed sample and the expected frequencies obtained from

the hypothesized distribution.

Theorem 7.1. A goodness-of-fit test between observed and expected frequencies is based on the quantity

χ² = Σ from i = 1 to k of (oi − ei)²/ei,

where χ² is a value of the random variable X² whose sampling distribution is approximated very closely by the chi-square distribution. The symbols oi and ei represent the observed and expected frequencies, respectively, for the ith cell.

The number of degrees of freedom in a chi-square goodness

of fit test is equal to the number of cells minus the number of

quantities obtained from the observed data, which are used in

the calculations of the expected frequencies.

The critical region falls in the right tail of the chi-square distribution, X² > χ²α.

Remark 7.1.

1. If the number of degrees of freedom is equal to 1, a correction called Yates' correction for continuity is applied. The corrected formula for χ² then becomes

   χ² (corrected) = Σ from i = 1 to k of (|oi − ei| − 0.5)²/ei.

2. When some values of ei are less than 5, it is best to have each ei somewhat larger than 5, which can be done by grouping adjacent cells so that each grouped cell has ei > 5.

Example 7.8. The grades in a statistics course for a particular semester were as follows:

Grade:  A   B   C   D   F
f:      14  18  32  20  16

Test the hypothesis, at the 0.05 level of significance, that the distribution of grades is uniform.

The total number of students is n = 14 + 18 + 32 + 20 + 16 = 100, so the expected value for each grade is ei = np, where p = 1/5, according to the uniform distribution. We have:

Grade:       A   B   C    D   F   | Total
oi:          14  18  32   20  16  |
ei:          20  20  20   20  20  |
(oi − ei)²:  36  4   144  0   16  | 200

1. H0 : The distribution of grades is uniform.

2. H1 : The distribution of grades is not uniform.

3. α = 0.05.

4. The test statistic X², having the chi-square distribution, with the value χ² = Σ from i = 1 to 5 of (oi − ei)²/ei, is used. The critical region is X² > χ²(0.05, 4), i.e. X² > 9.488.

5. Computations:
   χ²c = Σ from i = 1 to 5 of (oi − ei)²/ei = 200/20 = 10.

6. Conclusion: Reject H0, since χ²c > 9.488, and hence we conclude that the distribution of grades is not uniform.
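SciPy's chisquare function performs this test directly (when no expected frequencies are supplied it assumes equal cell probabilities, which is exactly the uniform hypothesis here):

    from scipy.stats import chisquare

    observed = [14, 18, 32, 20, 16]
    result = chisquare(observed)              # expected = 20 in each cell
    print(result.statistic, result.pvalue)    # 10.0 and p ~ 0.040 < 0.05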

Example 7.9. A die is rolled 24 times and the following results are obtained:

Face:  1  2  3  4  5  6
oi:    6  5  2  3  0  8
ei:    4  4  4  4  4  4

Is it a fair die? Use the 0.05 level of significance.

In this experiment none of the expected frequencies exceeds 5, therefore it is necessary to combine cells. If successive pairs of cells are combined, the preceding empirical condition on ei will be satisfied, as shown in the following table:

            1 or 2  3 or 4  5 or 6  | Total
oi:         11      5       8       | 24
ei:         8       8       8       | 24
(oi − ei)²: 9       9       0       | 18

1. H0 : The die is fair.

2. H1 : The die is not fair.

3. α = 0.05.

4. We use in the calculations χ² = Σ from i = 1 to 3 of (oi − ei)²/ei. The critical region is χ² > χ²(0.05, 2) = 5.991.

5. Computations:
   χ²c = Σ from i = 1 to 3 of (oi − ei)²/ei = 18/8 = 2.25.

6. Conclusion: Accept H0, since χ²c < χ² (tabulated), and hence the die is fair.

7.2.1 Contingency tables

The contingency table is used for the purpose of studying the relationship between two variables, each of which has several levels. Consider, for example, the factor A classified into n levels (A1, . . . , An) and the factor B classified into m levels (B1, . . . , Bm), and suppose it is desired to test the hypothesis that there is no relationship between the two factors A and B. If oij denotes the observed frequency in the ith classification Ai of A and the jth classification Bj of B, let

Ri = Σ from j = 1 to m of oij;  i = 1, . . . , n,

Cj = Σ from i = 1 to n of oij;  j = 1, . . . , m,

N = Σ Ri = Σ Cj.

A\B:    B1   B2   ...  Bj   ...  Bm   | Total
A1:     o11  o12  ...  o1j  ...  o1m  | R1
A2:     o21  o22  ...  o2j  ...  o2m  | R2
...
Ai:     oi1  oi2  ...  oij  ...  oim  | Ri
...
An:     on1  on2  ...  onj  ...  onm  | Rn
Total:  C1   C2   ...  Cj   ...  Cm   | N

If the null hypothesis is satisfied (the two factors are independent), then we have, for i = 1, . . . , n, j = 1, . . . , m,

eij = Ri × Cj / N, and then χ² = Σ over i Σ over j of (oij − eij)²/eij.

The number of degrees of freedom is ν = (n − 1)(m − 1), and the null hypothesis will be rejected if χ² > χ²(α, ν).

Example 7.10. A random sample of 30 adults is classified according to sex and the number of hours they watch television during a week:

                 Male  Female
Over 25 hours:   5     9
Under 25 hours:  9     7

Using a 0.01 level of significance, test the hypothesis that a person's sex and time watching television are independent.

1. H0 : A person's sex and time watching television are independent.

2. H1 : A person's sex and time watching television are dependent.

3. α = 0.01.

4. We use χ² = Σ over i Σ over j of (oij − eij)²/eij, where the critical region is χ² > χ²(α, ν) = χ²(0.01, 1) = 6.635.

5. Computations:
   e11 = 14 × 14/30 = 6.5,  e12 = 14 × 16/30 = 7.5,
   e21 = 16 × 14/30 = 7.5,  e22 = 16 × 16/30 = 8.5.

                 Male     Female   | Total
Over 25 hours:   5 (6.5)  9 (7.5)  | 14
Under 25 hours:  9 (7.5)  7 (8.5)  | 16
Total:           14       16       | 30

   Since ν = 1, Yates' correction for continuity is applied (see Remark 7.1):

   χ²c = Σ (|oij − eij| − 0.5)²/eij
       = (1.5 − 0.5)²/6.5 + (1.5 − 0.5)²/7.5 + (1.5 − 0.5)²/7.5 + (1.5 − 0.5)²/8.5
       = 0.538.

6. Conclusion: Accept H0, since χ²c < 6.635.

Example 7.11. In an experiment to study the dependence of hypertension on smoking habits, the following data were taken on 180 individuals:

                  Nonsmokers  Moderate Smokers  Heavy Smokers
Hypertension:     21          36                30
No hypertension:  48          26                19

Test the hypothesis that the presence or absence of hypertension is independent of smoking habits. Use a 0.05 level of significance.

                  Nonsmokers  Moderate Smokers  Heavy Smokers  | Total
Hypertension:     21 (33.35)  36 (29.97)        30 (23.68)     | 87
No hypertension:  48 (35.65)  26 (32.03)        19 (25.32)     | 93
Total:            69          62                49             | 180

1. H0 : The presence or absence of hypertension is independent of smoking habits.

2. H1 : The presence or absence of hypertension is dependent on smoking habits.

3. α = 0.05.

4. We use χ² = Σ over i Σ over j of (oij − eij)²/eij, where the critical region is χ² > χ²(α, ν) = χ²(0.05, 2) = 5.991.

5. Computations:
   e11 = 87 × 69/180 = 33.35,  e21 = 93 × 69/180 = 35.65,
   e12 = 87 × 62/180 = 29.97,  e22 = 93 × 62/180 = 32.03,
   e13 = 87 × 49/180 = 23.68,  e23 = 93 × 49/180 = 25.32.

   χ²c = (21 − 33.35)²/33.35 + (36 − 29.97)²/29.97 + (30 − 23.68)²/23.68
       + (48 − 35.65)²/35.65 + (26 − 32.03)²/32.03 + (19 − 25.32)²/25.32
       = 14.46.

6. Conclusion: Reject H0, since χ²c > 5.991, and hence hypertension and smoking habits are dependent.
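The whole computation, including the expected frequencies, is reproduced by SciPy's chi2_contingency (a sketch; correction=False turns off Yates' correction, which applies only when ν = 1):

    import numpy as np
    from scipy.stats import chi2_contingency

    table = np.array([[21, 36, 30],
                      [48, 26, 19]])
    stat, pvalue, dof, expected = chi2_contingency(table, correction=False)
    print(stat, dof, pvalue)   # ~ 14.46, 2, p ~ 0.0007 -> reject H0
    print(expected)            # matches the e_ij above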

Exercises VII
1. Test the hypothesis that the average weight of containers

of a particular lubricant is 10 ounces if the weights of a

random sample of 10 containers are 10.2, 9.7, 10.1, 10.3,

10.1, 9.8, 9.9, 10.4, 10.3, and 9.8 ounces. Use 0.01 level of

significance and assume that the distribution of weights is

normal.

2. A farmer claims that the average yield of corn of variety

A exceeds the average yield of variety B by at least 12

bushels per acre. To test this claim, 50 acres of each vari-

ety are planted and grown under similar conditions. Vari-

ety A yielded, on the average, 86.7 bushels per acre with a

standard deviation of 6.28 bushels per acre, while variety B

yielded, on the average, 77.8 bushels per acre with a stan-

dard deviation of 5.61 bushels per acre. Test the farmer’s

claim using a 0.05 level of significance.

3. A cigarette-manufacturing firm distributes two brands of

cigarettes. If it is found that 56 of 200 smokers prefer brand

A and that 29 of 150 smokers prefer brand B, can we con-

clude at the 0.06 level of significance that brand A outsells

brand B?

4. A die is tossed 180 times with the following results:

   x:  1   2   3   4   5   6
   f:  28  36  30  36  23  27

   Is this a balanced die? Use a 0.05 level of significance.

5. In a shop study, a set of data was collected to determine whether or not the proportion of defectives produced by workers was the same for the day, evening, or midnight shift worked. The following data were collected on the items produced:

                  Day  Evening  Midnight
   Defective:     45   55       70
   Nondefective:  905  890      870

   What is your conclusion? Use an α = 0.025 level of significance.

REFERENCES

Guttman, I.; Wilks, S. and Hunter, J. (1982). Introductory Engineering Statistics. 3rd Ed., John Wiley & Sons, Inc.

Hogg, R.; McKean, J. and Craig, A. (2005). Introduction to Mathematical Statistics. 6th Ed., Pearson Prentice Hall.

Mood, A.; Graybill, F. and Boes, D. (1982). Introduction to the Theory of Statistics. 3rd Ed., 12th printing, McGraw-Hill.

Rosner, B. (1982). Fundamentals of Biostatistics. PWS Publishers, Duxbury Press, Boston, Massachusetts.
