
Beni-Suef National University

Faculty of Computers and Artificial Intelligence

Probability and Statistics

2nd Year Students


2023-2024
Prepared by
Dr. Alaa H. Abdel-Hamid
Professor of Mathematical Statistics
Contents
Chapter 1  INTRODUCTION TO PROBABILITY
Chapter 2  RANDOM VARIABLES, PROBABILITY FUNCTIONS AND EXPECTATIONS
Chapter 3  SOME IMPORTANT DISCRETE DISTRIBUTIONS
Chapter 4  SOME IMPORTANT CONTINUOUS DISTRIBUTIONS
Chapter 5  SAMPLING THEORY
Chapter 6  POINT AND INTERVAL ESTIMATIONS
Chapter 7  TESTS OF HYPOTHESES
REFERENCES
Chapter 1

INTRODUCTION TO
PROBABILITY

The purpose of this chapter is to present a formal treatment of


the mathematical elements of probability theory.

1.1 Elements of Probability


One of the basic features to be found in repetitive operations,
that is, repeating a trial or experiment over and over again,
under specified conditions is that the outcome varies from trial
to trial. This leads us to analyze the possible outcomes which
could arise if a trial or experiment were performed once.

1.1.1 Some important definitions


Definition 1.1 (Random experiment (E)). A random experiment, denoted by E, is an experiment whose individual outcome cannot be predicted in advance, although the set of all its possible outcomes is known beforehand.
Definition 1.2 (Sample space (S)). The sample space, denoted
by S, is the set of all possible outcomes which could arise if a
random experiment were performed once.

Definition 1.3 (Event). An event is any subset of the finite sample space S.
Example 1.1 (Rolling a die). The sample space, S, when we roll a die is
S = {1, 2, 3, 4, 5, 6}.
Example 1.2 (Tossing a coin). The sample space, when we toss
a coin is
S = {H, T }, H: Head, T : Tail.
Example 1.3 (Rolling a die and tossing a coin). The sample
space can be written in the form
S ={1, 2, 3, 4, 5, 6} × {H, T }
={(1, H), (2, H), . . . , (6, H), (1, T ), (2, T ), . . . , (6, T )}.
Example 1.4. If E is the experiment of tossing a coin until a head occurs, and we count the number of tosses required, we get
S = {1, 2, 3, . . .}.
Example 1.5 (Tossing 3 coins or equivalently tossing one coin
3 times).
S ={H, T } × {H, T } × {H, T }
={HHH, HHT, HT T, HT H, T HH, T T H, T HT, T T T }.
Note that, for simplicity, we write, for example, (H, T, H) ≡ HTH. The number of prime events in S is equal to 2³ = 8.

Remark 1.1.
1. If the event consists of only one element, then it is called a prime event.
2. If the event contains no elements, then it is called an impossible event.
3. If the event is the sample space itself, then it is called a certain event.
4. For example, A = {2} and B = {5} are considered prime events in the S of Example (1.1), but D = {3, 4, 7} is not an event in this sample space since 7 is not an element of S.
From the last definitions, one can notice that the sample
space may be finite as in Examples (1.1), (1.2) and (1.3), and
may be infinite as in Example (1.4).
Definition 1.4 (Complement event). The complement Ac of an
event A consists of all elements in S which are not in A.

1.2 Operations on the Events


If A and B are two events in a sample space S, then we define
the following:
1. A ∪ B : The event which consists of all elements con-
tained in A or B or both. Equivalently, we can say the
event that at least one of the two events A, B occurs.
2. A ∩ B : The event which consists of the elements contained in both A and B. Equivalently, we can say the event that the two events A, B occur at the same time.
3. Ac : The event that A does not occur.
4. A − B : A and not B occurs.
5. (A ∩ B)c : At most one of the two events occurs.
6. (A − B) ∪ (B − A) = (A ∪ B) − (A ∩ B) : Exactly one
of the two events occurs.
Definition 1.5 (Mutually exclusive events). We say that the two events A and B in a sample space S are mutually exclusive events if A ∩ B = ∅, i.e. A and B cannot occur at the same time.

In general, we say that the events A1 , A2 , . . . , An in a sample
space S are mutually exclusive events if

Ai ∩ Aj = ∅, i ̸= j, i, j = 1, 2, . . . , n.

Definition 1.6 (Partition). The events A1 , A2 , . . . , An represent


a partition to the sample space S if the following two conditions
are satisfied.

(i) A₁ ∪ A₂ ∪ · · · ∪ Aₙ ≡ ⋃_{i=1}^n Aᵢ = S,
(ii) Aᵢ ∩ Aⱼ = ∅ ∀ i ≠ j, i, j = 1, 2, . . . , n.
Remark 1.2.
(i) A ∪ Aᶜ = S    (ii) A ∩ Aᶜ = ∅
(iii) A ∪ ∅ = A    (iv) A ∩ ∅ = ∅
(v) A ∪ S = S    (vi) A ∩ S = A
(vii) A − B = A ∩ Bᶜ
(viii) Sᶜ = ∅
(ix) (A ∪ B)ᶜ = Aᶜ ∩ Bᶜ    (x) (A ∩ B)ᶜ = Aᶜ ∪ Bᶜ
(xi) A ∪ (B ∩ C) = (A ∪ B) ∩ (A ∪ C)
Example 1.6. If S = {1, 2, 3, 4, 5, 6}, A = {1, 3, 5}, B = {2, 4, 6}, C = {1, 3, 6}, find
(i) A ∪ B  (ii) A ∩ C  (iii) Aᶜ ∩ C
(iv) Do the two events A and B represent a partition of S?

(i) A ∪ B = {1, 2, 3, 4, 5, 6} = S.
(ii) A ∩ C = {1, 3}.
(iii) Aᶜ ∩ C = {6}.
(iv) Since A ∪ B = S and A ∩ B = ∅, we conclude that A and B represent a partition of S.
Definition 1.7 (Probability function). Let S be a sample space,
a probability function P (.) is a set function with domain D (an
algebra of events) and counterdomain the interval [0,1] which
satisfies the following axioms:

A1 0 ≤ P (A) ≤ 1 ∀A ∈ D.
A2 P (S) = 1
A3 If A₁, A₂, A₃, . . . is a sequence of mutually exclusive events in D, then
P(⋃_{i=1}^∞ Aᵢ) = ∑_{i=1}^∞ P(Aᵢ),
where P(A) is read “the probability of the event A”.
A1, A2 and A3 are sometimes called the axioms of probability.

Remark 1.3. For an algebra D, we have the property: if A₁ and A₂ ∈ D, then A₁ ∪ A₂ ∈ D.
Note also that the probability, P, of an event A in a finite sample space S of equally likely outcomes is calculated as follows:
P(A) = (number of ways the event A can occur) / (number of ways the sample space S can occur).
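For a finite sample space of equally likely outcomes, this counting rule can be checked by direct enumeration. The following Python sketch (an illustration added here, not part of the original notes) computes the probability of the event “an even number occurs” for the die of Example 1.1:

from fractions import Fraction

S = {1, 2, 3, 4, 5, 6}          # sample space of rolling a die (Example 1.1)
A = {2, 4, 6}                   # event: an even number occurs

P_A = Fraction(len(A & S), len(S))   # favorable outcomes / total outcomes
print(P_A)                           # 1/2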
Theorem 1.1. If A and B are any two events in a sample space
S, then we have
(i) P (∅) = 0.
(ii) P (Ac ) = 1 − P (A).
(iii) If A ⊆ B, then P (A) ≤ P (B).
(iv) P (A − B) = P (A) − P (A ∩ B).
(v) P (A ∪ B) = P (A) + P (B) − P (A ∩ B).
Proof. Since ∅ ∪ S = S and ∅ ∩ S = ∅, then
P(∅ ∪ S) = P(∅) + P(S) = P(S) ⇒ P(∅) = 0, and this proves (i).
Since A ∪ Aᶜ = S and A ∩ Aᶜ = ∅, then
P(A ∪ Aᶜ) = P(A) + P(Aᶜ) = P(S) = 1, and hence P(Aᶜ) = 1 − P(A), and this proves (ii).
By writing B as a union of two disjoint events, B = A ∪ (B − A), then
P(B) = P(A) + P(B − A),
but since P(B − A) ≥ 0, then P(A) ≤ P(B), and this proves (iii).
By writing A as a union of two disjoint events, A = (A − B) ∪ (A ∩ B), then
P(A) = P(A − B) + P(A ∩ B), and hence
P(A − B) ≡ P(A ∩ Bᶜ) = P(A) − P(A ∩ B), and this proves (iv).
Similarly, one can prove that P(B − A) = P(B) − P(A ∩ B).
Now, by writing A ∪ B as a union of three mutually exclusive events,
A ∪ B = (A − B) ∪ (A ∩ B) ∪ (B − A),
and hence
P(A ∪ B) = P(A − B) + P(B − A) + P(A ∩ B)
= P(A) − P(A ∩ B) + P(B) − P(A ∩ B) + P(A ∩ B)
= P(A) + P(B) − P(A ∩ B),
and this proves (v).
Example 1.7. Two balanced coins are tossed simultaneously. Let A = {HH}, B = {HT, TH} and C = {HH, TT, HT} be three events in the sample space S of this experiment. Find
(i) P(A ∪ B)  (ii) P(A ∩ C)  (iii) P(C − A)

(i) Here, the sample space is S = {HH, HT, TH, TT}. Since A ∩ B = ∅, then
P(A ∪ B) = P(A) + P(B) = 0.25 + 0.5 = 0.75.

(ii) P (A ∩ C) = P (HH) = 0.25.

(iii) P (C − A) = P (C) − P (A ∩ C) = 0.75 − 0.25 = 0.25.
Example 1.8. Prove that if P (Ac ) = α and P (B c ) = β, then
P (A ∩ B) ≥ 1 − α − β.
Proof. Since P (Ac ) = 1 − P (A), then α = 1 − P (A) ⇒ P (A) =
1 − α.
Similarly, we can get P (B) = 1 − β, and hence
P (A ∩ B) =P (A) + P (B) − P (A ∪ B)
=1 − α + 1 − β + (−P (A ∪ B)).
Since 0 ≤ P (A ∪ B) ≤ 1 ⇒ −P (A ∪ B) ≥ −1, then
P (A ∩ B) ≥ 1 − α + 1 − β − 1 = 1 − α − β.
Example 1.9. If P (A ∩ B c ) = 0.2 and P (B c ) = 0.7. Find
P (A ∪ B).
∵ P (B c ) = 1 − P (B) ⇒ 0.7 = 1 − P (B) ⇒ P (B) = 0.3,
∵ P (A) = P (A ∩ B) + P (A ∩ B c ),
∴ P (A) − P (A ∩ B) = P (A ∩ B c ) = 0.2,
∵ P (A ∪ B) = P (A) + P (B) − P (A ∩ B),
∴ P (A ∪ B) = 0.2 + 0.3 = 0.5.
Example 1.10. A point is selected at random inside an equilat-
eral triangle whose side length is 3. Find the probability that its
distance to any corner is greater than 1.
Let A denote the set of points inside the shaded part contained in the triangle, so that A consists of all points whose distance to every corner is greater than 1, and let S denote the set of points inside the triangle whose side length is 3.
Area of A = area of the triangle − 3 × (area of one sector in this triangle)
= (1/2) × 3 × 3 × sin(π/3) − 3 × (1/2) × 1² × (π/3)
= (9√3)/4 − π/2.
Since P(A) = (area of A)/(area of S),
P(A) = ((9√3)/4 − π/2) / ((9√3)/4).
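A rough Monte Carlo check of this geometric probability is sketched below in Python (illustrative only; the reflection trick for sampling uniformly inside a triangle is a standard device, not taken from these notes):

import math, random

A, B, C = (0.0, 0.0), (3.0, 0.0), (1.5, 3 * math.sqrt(3) / 2)   # triangle of side 3

def sample_point():
    # uniform point in the triangle: reflect (u, v) into the lower simplex
    u, v = random.random(), random.random()
    if u + v > 1:
        u, v = 1 - u, 1 - v
    return (3 * u + 1.5 * v, v * 3 * math.sqrt(3) / 2)

n = 100_000
hits = 0
for _ in range(n):
    p = sample_point()
    if all(math.dist(p, corner) > 1 for corner in (A, B, C)):
        hits += 1

exact = 1 - 2 * math.pi / (9 * math.sqrt(3))   # the ratio computed above
print(hits / n, exact)                          # both approximately 0.597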

1.3 Conditional Probability and Multiplication Rule
Let us consider an experiment of recording the life of a light bulb. Suppose someone is interested in the probability that the bulb will last 90 hours given that it has already lasted 25 hours. Or consider an experiment of sampling from a box containing 90 resistors of which 5 are defective. What is the probability that the third draw results in a defective, given that the first two draws resulted in defectives? Such questions lead us to study the following subject.
Definition 1.8 (Conditional probability). Let A and B be two
events in a sample space S and let P (B) > 0. Then the condi-
tional probability of the event A, given B, denoted by
P(A | B), is defined by
P(A | B) = P(A ∩ B) / P(B).
From the last equation we can write
P(A ∩ B) = P(B)P(A | B) = P(A)P(B | A), by symmetry.
Since B = (A ∩ B) ∪ (Aᶜ ∩ B), we have
P(B) = P{(A ∩ B) ∪ (Aᶜ ∩ B)} = P(A)P(B | A) + P(Aᶜ)P(B | Aᶜ).

Multiplication Rule of Probabilities
If A1 , A2 , . . . , An are n events in a sample space S, then
P(A₁ ∩ A₂ ∩ · · · ∩ Aₙ) = P(A₁)P(A₂ | A₁)P(A₃ | A₁ ∩ A₂) · · · P(Aₙ | A₁ ∩ A₂ ∩ · · · ∩ Aₙ₋₁).
Example 1.11. Two different digits are selected at random from
the digits 1 through 9.
(i) If the sum is even, find the probability that both numbers are odd.
(ii) If the sum is odd, what is the probability that 2 is one of
the numbers selected?
(iii) If 2 is one of the digits selected, what is the probability that
the sum is odd?
Let us solve each part of the problem. There are C_2^9 = 36 unordered pairs of distinct digits.
(i) The sum of the two digits is even when both are odd or both are even. The odd digits are 1, 3, 5, 7, 9, giving C_2^5 = 10 pairs; the even digits are 2, 4, 6, 8, giving C_2^4 = 6 pairs. Hence
P(both odd | sum even) = 10/(10 + 6) = 5/8.
(ii) The sum is odd when one digit is odd and the other is even, giving 5 × 4 = 20 pairs. Among these, the pairs containing 2 are {2, 1}, {2, 3}, {2, 5}, {2, 7}, {2, 9}, i.e. 5 pairs. Hence
P(2 is selected | sum odd) = 5/20 = 1/4.
(iii) There are 8 pairs containing the digit 2, and the sum is odd for exactly the 5 pairs in which 2 is joined with an odd digit. Hence
P(sum odd | 2 is selected) = 5/8.
To summarize:
(i) P(both odd | sum even) = 5/8,
(ii) P(2 selected | sum odd) = 1/4,
(iii) P(sum odd | 2 selected) = 5/8.
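Because the sample space here is tiny, the three conditional probabilities can be verified by brute-force enumeration; the Python sketch below (added for illustration, not part of the original notes) conditions directly on the relevant sub-collections of pairs:

from fractions import Fraction
from itertools import combinations

pairs = list(combinations(range(1, 10), 2))            # all 36 unordered pairs

even_sum = [p for p in pairs if sum(p) % 2 == 0]
odd_sum = [p for p in pairs if sum(p) % 2 == 1]
with_two = [p for p in pairs if 2 in p]

print(Fraction(sum(1 for a, b in even_sum if a % 2 == 1 and b % 2 == 1), len(even_sum)))   # (i)   5/8
print(Fraction(sum(1 for p in odd_sum if 2 in p), len(odd_sum)))                           # (ii)  1/4
print(Fraction(sum(1 for p in with_two if sum(p) % 2 == 1), len(with_two)))                # (iii) 5/8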
Example 1.12. A box contains 8 red, 3 white and 9 blue balls.
Three balls are drawn at random without replacement from this
box. Find the probability that
(i) the three balls are red,
(ii) two balls are red and one is white,
(iii) at least one ball is white,
(iv) one from each color in the drawn balls,
(v) the balls are drawn in the order (red, white, blue).
(i) Let Ri , i = 1, 2, 3, denote the event that the ith drawn ball
is red, W denote the event that the drawn ball is white and
let B denote the event that the drawn ball is blue, then
P(R₁ ∩ R₂ ∩ R₃) = P(R₁)P(R₂ | R₁)P(R₃ | R₁ ∩ R₂) = (8/20) × (7/19) × (6/18) = 14/285.
Or P(R₁ ∩ R₂ ∩ R₃) = C_3^8 / C_3^20 = 14/285.
(ii) P(2 red balls and 1 white ball)
= P(R₁ ∩ R₂ ∩ W) + P(R₁ ∩ W ∩ R₂) + P(W ∩ R₁ ∩ R₂)
= 3 × (8/20) × (7/19) × (3/18) = 7/95.
Or P(2 red balls and 1 white ball) = C_2^8 × C_1^3 / C_3^20 = 7/95.
(iii) P(there is no white ball) = C_3^17 / C_3^20, then
P(drawing at least one white ball) = 1 − P(there is no white ball) = 1 − C_3^17 / C_3^20.
(iv) P(one ball from each color among the drawn balls)
= P(R ∩ W ∩ B) + P(R ∩ B ∩ W) + P(W ∩ R ∩ B) + P(B ∩ R ∩ W) + P(W ∩ B ∩ R) + P(B ∩ W ∩ R)
= 6 × (8/20) × (3/19) × (9/18) = 18/95.
Or P(one from each color) = C_1^8 × C_1^3 × C_1^9 / C_3^20 = 18/95.
(v) P(R ∩ W ∩ B) = (8/20) × (3/19) × (9/18) = 3/95.
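These counting answers are easy to confirm with binomial coefficients; a small Python check (illustrative, not from the original notes) follows:

from fractions import Fraction
from math import comb

total = comb(20, 3)                                   # ways to draw 3 of the 20 balls
print(Fraction(comb(8, 3), total))                    # (i)   14/285
print(Fraction(comb(8, 2) * comb(3, 1), total))       # (ii)  7/95
print(1 - Fraction(comb(17, 3), total))               # (iii) 1 - C(17,3)/C(20,3)
print(Fraction(8 * 3 * 9, total))                     # (iv)  18/95
print(Fraction(8 * 3 * 9, 20 * 19 * 18))              # (v)   3/95 (ordered draw)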
Definition 1.9 (Independence). The two events A and B in
a sample space S are said to be independent if P (A ∩ B) =
P (A) · P (B).

One can show that if A and B are independent, then
P(A | B) = P(A).
The following two examples show that mutually exclusive events are not necessarily independent, and independent events are not necessarily mutually exclusive.
Example 1.13. Consider the sample space of tossing a balanced coin, so S = {H, T}. Take A = {H} and B = {T} ⇒ P(A) = P(B) = 1/2. Since A ∩ B = ∅ ⇒ P(A ∩ B) = 0 ≠ P(A)P(B), we conclude that if the two events A and B are disjoint, this does not in general imply their independence; i.e. disjoint events ⇏ independent events.
Example 1.14. Consider the sample space of tossing two balanced coins, so S = {HH, HT, TH, TT}. Take A = {HH, HT} and B = {TT, HT} ⇒ P(A) = P(B) = 1/2. Since A ∩ B = {HT} ⇒ P(A ∩ B) = 1/4 = P(A)P(B), we conclude that if the two events A and B are independent, this does not in general imply that they are disjoint; i.e. independent events ⇏ disjoint events.
Example 1.15. Show that if A and B are two independent events, then so are Aᶜ and Bᶜ.
P(Aᶜ ∩ Bᶜ) = P((A ∪ B)ᶜ) = 1 − P(A ∪ B)
= 1 − [P(A) + P(B) − P(A ∩ B)]
= 1 − P(A) − P(B) + P(A)P(B)
= P(Aᶜ) − P(B)[1 − P(A)]
= P(Aᶜ) − P(B)P(Aᶜ)
= P(Aᶜ)[1 − P(B)]
= P(Aᶜ)P(Bᶜ).

Theorem 1.2 (Total probability). Suppose that the events A₁, A₂, . . . , Aₙ form a partition of a sample space S and let B be another event in S; then
P(B) = ∑_{i=1}^n P(Aᵢ) P(B | Aᵢ).
Proof. The event B can be written as a union of mutually exclusive events, see Fig. (1.5), in the form
B = (B ∩ A₁) ∪ (B ∩ A₂) ∪ · · · ∪ (B ∩ Aₙ).
∴ P(B) = P(B ∩ A₁) + P(B ∩ A₂) + · · · + P(B ∩ Aₙ) = ∑_{i=1}^n P(B ∩ Aᵢ).
But, since P(B ∩ Aᵢ) = P(Aᵢ) P(B | Aᵢ), then
P(B) = ∑_{i=1}^n P(Aᵢ) P(B | Aᵢ).

Example 1.16. Urn I contains u black and v white balls; urn II


contains x black and y white balls. One ball is transferred from
I to II; one ball is then drawn from II. What is the probability
that it is white?
Let A be the event that the transferred ball from I is black
and let B be the event that the drawn ball from II is white, then
P(B) = P(A)P(B | A) + P(Aᶜ)P(B | Aᶜ)
= (u/(u + v)) × (y/(x + y + 1)) + (v/(u + v)) × ((y + 1)/(x + y + 1)).
Now, if the drawn ball in the last example turns out to be white, what is the probability that the transferred ball was black? The answer to this question leads us to the following theorem.
Theorem 1.3 (Bayes’ formula). If the events A₁, A₂, . . . , Aₙ form a partition of S and B is another event in S, then
P(Aᵢ | B) = P(Aᵢ)P(B | Aᵢ) / ∑_{j=1}^n P(Aⱼ)P(B | Aⱼ),   i = 1, . . . , n.
Proof.
P(Aᵢ | B) = P(Aᵢ ∩ B)/P(B) = P(Aᵢ)P(B | Aᵢ)/P(B) = P(Aᵢ)P(B | Aᵢ) / ∑_{j=1}^n P(Aⱼ)P(B | Aⱼ).
We can now answer the question above:
P(Aᶜ | B) = P(Aᶜ)P(B | Aᶜ) / [P(A)P(B | A) + P(Aᶜ)P(B | Aᶜ)].
Example 1.17. Suppose that it is known that a fraction 0.001 of
the people in a town have tuberculosis (TB). A TB test is given
with the following properties: If the person does have TB, the
test will indicate it with a probability .999. If he does not have
TB, then there is a probability .002 that the test will erroneously
indicate that he does. For one randomly selected person, the
test shows that he has TB. What is the probability that he really
does?
Let A denote the event that the person has TB and E denote
the event that the person is diagnosed to have TB, then
P(A) = 0.001, P(Aᶜ) = 0.999, P(E | A) = 0.999 and P(E | Aᶜ) = 0.002, and hence
P(A | E) = P(A)P(E | A) / [P(A)P(E | A) + P(Aᶜ)P(E | Aᶜ)]
= (0.001 × 0.999) / (0.001 × 0.999 + 0.999 × 0.002)
= 0.333.
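The two-event form of Bayes’ formula used here is simple enough to wrap in a helper; the following Python sketch (an added illustration — the function name and signature are ours, not from the notes) reproduces the computation:

def bayes_two_events(prior, p_pos_given_disease, p_pos_given_healthy):
    # P(A | E) for the partition {A, Ac}: prior times sensitivity over total evidence
    evidence = (prior * p_pos_given_disease
                + (1 - prior) * p_pos_given_healthy)
    return prior * p_pos_given_disease / evidence

print(bayes_two_events(0.001, 0.999, 0.002))   # 0.333...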
Example 1.18. A box contains 10 balls of which 3 are black
and 7 are white. The following game is played: At each trial a
ball is selected at random, its color is noted, and it is replaced
along with two additional balls of the same color. What is the
probability that a black ball is selected in each of the first three
trials?
Let Bᵢ denote the event that a black ball is selected on the i-th trial. By the multiplication rule,
P(B₁ ∩ B₂ ∩ B₃) = P(B₁)P(B₂ | B₁)P(B₃ | B₁ ∩ B₂) = (3/10) × (5/12) × (7/14) = 1/16.
Example 1.19. Two sets of candidates compete for the positions on the Board of Directors of a company. The probabilities of winning are 0.7 and 0.3 for the two sets. If the first set wins, it will introduce a new product with probability 0.4; the corresponding value for the second set is 0.8. If the new product was introduced, what is the probability that the first set won?
Let A and B denote the events that the first and the second set wins, respectively, so P(A) = 0.7 and P(B) = 0.3, and let N be the event of introducing the new product, so P(N | A) = 0.4 and P(N | B) = 0.8. Hence
P(A | N) = P(A)P(N | A) / [P(A)P(N | A) + P(B)P(N | B)]
= (0.7 × 0.4) / (0.7 × 0.4 + 0.3 × 0.8) = 0.28/0.52 ≈ 0.538.

Example 1.20. Three boxes contain balls. The first contains 10 white and 5 black balls, the second contains 7 white and 8 black balls and the third contains 5 white and 10 black balls. One box is chosen at random and then two balls are drawn, without replacement, from this box; they turn out to have the same color. What is the probability that they come from the second box?
Let Aᵢ denote the event of choosing box i, i = 1, 2, 3, and B be the event of drawing two balls of the same color. Since the box is chosen at random, P(A₁) = P(A₂) = P(A₃) = 1/3, and hence
P(B) = P(A₁)P(B | A₁) + P(A₂)P(B | A₂) + P(A₃)P(B | A₃)
= (1/3)[(C_2^10 + C_2^5)/C_2^15] + (1/3)[(C_2^7 + C_2^8)/C_2^15] + (1/3)[(C_2^5 + C_2^10)/C_2^15]
= (1/3)[(10 × 9 + 5 × 4)/(15 × 14)] + (1/3)[(7 × 6 + 8 × 7)/(15 × 14)] + (1/3)[(5 × 4 + 10 × 9)/(15 × 14)]
= 0.504.
∴ P(A₂ | B) = P(A₂)P(B | A₂)/P(B) = 0.156/0.504 = 0.31.
Example 1.21. In a multiple-choice test, assume that there are five choices available for each question. Let p be the probability that a student knows the answer and q = 1 − p the probability that the student guesses. Assume also that the probability that the student gets the right answer given that he guesses is 0.2. If the student got the right answer, what is the probability that the student actually knew it?
Let A denote the event that the student got the right answer and B denote the event that the student knew the right answer. Using Bayes’ formula, we have
P(B | A) = P(A | B)P(B) / [P(A | B)P(B) + P(A | Bᶜ)P(Bᶜ)] = (1 × p)/(1 × p + 0.2 × q).

Exercises I
In the following, suppose that A, B and C are events in a
sample space S.
1. If P (A) = 0.2, P (B) = 0.4, P (A∪B) = 0.5. Determine
(i) P (A ∩ B), (ii) P (Ac ∩ B), (iii) P (A ∩ B c ),
(iv)P (Ac ∩ B c ), (v) P (A | B), (vi) P (B c | Ac ).
2. Show that
P(A | B) − P(A | Bᶜ) = [P(A ∩ B) − P(A)P(B)] / [P(B)P(Bᶜ)].

3. If P (B) = 2/3, P (A ∩ B) = 1/2, P (A ∩ B c ) = 1/4.


Determine (i) P (A), (ii) P (A ∪ B), (iii) P (Ac ∩ B c )
4. If P(A) = 0.5, P(B) = 1/3, P(A ∪ B) = 0.6.
Are A and B independent? Why?
5. If P (A) = 0.3, P (B) = 0.5, P (A ∪ B) = 0.7. Determine
(i) P (Ac | B), (ii) P (A | B c ), (iii) P (Ac | B c ),
(iv) P [(A − B) ∪ (B − A)].
6. If P (B) = P (A | B) = P (C | A ∩ B) = 0.5. What is
P (A ∩ B ∩ C)?
7. Show that the conditional probability function P (· | B),
P (B) ̸= 0, satisfies the axioms of a probability space.
8. A pair of dice is tossed. If the numbers appearing are dif-
ferent, find the probability that the sum is even? [2/5]
9. In a certain college 25% of the boys and 10% of the girls are
studying statistics. The girls constitute 60% of the students
body. If a student is selected at random and is found to
be studying statistics, determine the probability that the
student is a girl. [3/8]

10. A bond issue for the construction of a new public library is
before the voters. A poll showed that 85% of those with a
college education favored the construction of a new library,
but only 20% of those not having a college education did
so. Suppose that 90% of the voting population do not have
a college education. What is the probability that a voter
selected at random who favors the bond issue will be one
with a college education? [.32]
11. A certain cancer diagnostic test is 95% accurate on those
that do have cancer, and 90% accurate on those that do not
have cancer. If 1/2% of the population actually does have
cancer, compute the probability that a particular individual
has cancer if the test finds that he has cancer.
12. In Polya’s urn scheme, an urn initially contains r red balls
and b black balls. At each trial a ball is selected at random,
its color is noted and it is replaced along with c additional
balls of the same color. What is the probability that one
obtains a red ball in each of first three trials?
13. Three machines A, B and C produce respectively 60%, 30%
and 10% of the total number of items of a factory. The
percentages of defective output of this machines are respec-
tively 2%, 3% and 4%. An item is selected at random and
is found defective. Find the probability that the item was
produced by machine A or B? [21/25]

Chapter 2

RANDOM VARIABLES,
PROBABILITY FUNCTIONS
AND EXPECTATIONS

The purpose of this chapter is to present the types of random


variables and types of probability functions in addition to math-
ematical expectation and moments.

2.1 Types of Random Variables

Definition 2.1 (Random variable). Given a random experiment

with a sample space S. A function X, which assigns to each

element s ∈ S one and only one real number X(s) = x, is called

a random variable.

i.e. X : S → ℝ; X(s) = x.

The range of X is the set of real numbers
R_X = {x : X(s) = x; s ∈ S}.

Example 2.1. If we consider, for example, the sample space S

of tossing two balanced coins, and let the random variables X

and Y be defined as follows:

X ≡ number of heads that occur,

Y ≡ | difference between number of heads and number of tails|,

then

S ={HH, HT, T H, T T }

={s1 , s2 , s3 , s4 }

X(s1 ) ≡ X(HH) = 2, X(s2 ) ≡ X(HT ) = 1, X(s3 ) ≡ X(T H) =

1, X(s4 ) ≡ X(T T ) = 0. Similarly,

Y(s₁) ≡ Y(HH) = 2, Y(s₂) ≡ Y(HT) = 0, Y(s₃) ≡ Y(TH) = 0, Y(s₄) ≡ Y(TT) = 2, then

P (X = 2) = P {s ∈ S : X(s) = 2} = P {HH} = 1/4.

Similarly,

P (X = 1) = P {s ∈ S : X(s) = 1} = P {HT, T H} = 1/2,

P (X = 0) = P {s ∈ S : X(s) = 0} = P {T T } = 1/4.

Similarly, for the random variable Y ,

P (Y = 0) = P {s ∈ S : Y (s) = 0} = P {HT, T H} = 1/2,

P (Y = 2) = P {s ∈ S : Y (s) = 2} = P {HH, T T } = 1/2.

Then, we can write
P(X = x) = 1/4 for x = 0, 1/2 for x = 1, 1/4 for x = 2;   P(Y = y) = 1/2 for y = 0, 1/2 for y = 2.   (2.1)

When the range of a random variable X can be counted or enumerated, this random variable is called a discrete random variable, and the probability function defined on this type of random variable is called a probability mass function if the two conditions given in the following definition are satisfied.

Definition 2.2 (Probability mass function). The function p(x), defined on the discrete random variable X, where p(x) ≡ P(X = x) = P{s : X(s) = x; s ∈ S}, represents a probability mass function if the following two conditions are satisfied:
(i) p(x) ≥ 0 ∀ x ∈ R_X,
(ii) ∑ₓ p(x) = 1, the sum being taken over all x ∈ R_X.

Remark 2.1. Some authors, sometimes, denote to the proba-

bility mass function by f (x).

One can notice that the two random variables mentioned in Example (2.1) are discrete random variables, since R_X = {0, 1, 2} and R_Y = {0, 2}, and the functions P(X = x) ≡ p(x) and P(Y = y) ≡ p(y) are probability mass functions, since p(x) ≥ 0 ∀x ∈ {0, 1, 2} and p(0) + p(1) + p(2) = 1. Similarly, these conditions are satisfied for p(y).

If the range of the random variable X cannot be counted, then this random variable is called a continuous random variable, and the probability function defined on this type of random variable is called a probability density function (pdf) if the two conditions given in the following definition are satisfied.

Definition 2.3 (Probability density function). The function f(x), defined on the continuous random variable X, represents a probability density function if the following two conditions are satisfied:
(i) f(x) ≥ 0 ∀x ∈ R_X,
(ii) ∫_{−∞}^{∞} f(x) dx = 1.

Remark 2.2. Condition (i), given above, means that the curve of f(x) lies entirely above the x-axis, while condition (ii) means that the area under this curve, bounded by the domain of x, is equal to 1.

Remark 2.3. The relation between the probability of an event A and the pdf or probability mass function can be written as
P_X(A) ≡ P(X ∈ A) = ∫_A f(x) dx if X is continuous;   P_X(A) = ∑_{x∈A} p(x) if X is discrete.

Example 2.2. Prove that the following functions represent probability functions:
(i) f(x) = 1 − |1 − x|; 0 < x < 2, and f(x) = 0 otherwise.
(ii) f(x) = (1/(σ√(2π))) exp{−(1/2)((x − µ)/σ)²}; −∞ < x < ∞, (−∞ < µ < ∞, σ > 0).
(iii) f(x) = e⁻² 2^x / x!; x = 0, 1, 2, . . . , and f(x) = 0 otherwise.

(i) Since x is defined on an interval, the range can not be

counted, then X is a continuous random variable. From

the definition of absolute value we have



|1 − x| = 1 − x for x < 1; 0 for x = 1; x − 1 for x > 1.
Then we can write f(x) in the form
f(x) = x for 0 < x < 1, and f(x) = 2 − x for 1 ≤ x < 2.
It is clear from the definition of f(x) that f(x) ≥ 0 ∀x ∈ (0, 2). Now, we prove that ∫_{−∞}^{∞} f(x) dx = 1:
∫_{−∞}^{∞} f(x) dx = ∫_0^1 x dx + ∫_1^2 (2 − x) dx
= [x²/2]_0^1 − [(2 − x)²/2]_1^2
= 1/2 − [0 − 1/2] = 1,
and hence f(x) is a pdf.

(ii) Since x is defined on an interval, then the range of X can not

be counted and hence X is a continuous random variable.


To prove that
I = (1/(σ√(2π))) ∫_{−∞}^{∞} exp{−(1/2)((x − µ)/σ)²} dx = 1,
we use the substitution z = (x − µ)/σ ⇒ dx = σ dz; then
I = (1/√(2π)) ∫_{−∞}^{∞} exp(−z²/2) dz = (2/√(2π)) ∫_0^∞ exp(−z²/2) dz,
since exp(−z²/2) is an even function.
Now, put y = z²/2 ⇒ z = √(2y); z > 0 ⇒ dz = (2y)^{−1/2} dy, and hence
I = (2/√(2π)) ∫_0^∞ (2y)^{−1/2} e^{−y} dy = (1/√π) ∫_0^∞ y^{−1/2} e^{−y} dy = Γ(1/2)/√π = 1,
and hence f(x) is a pdf.

Remark 2.4. From the definition of gamma function we

have
∫_0^∞ x^{n−1} exp(−x/β) dx = Γ(n)βⁿ,   Γ(1/2) = √π.

(iii) We know that ∑_{n=0}^∞ xⁿ/n! = eˣ; then
∑_{x=0}^∞ e⁻² 2^x / x! = e⁻² · e² = 1,
and hence f(x) is a probability mass function.
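As a numerical sanity check (illustrative only, not part of the original notes), parts (i) and (iii) can be verified in Python with a midpoint-rule integral and a truncated series:

import math

# (i) integrate f(x) = 1 - |1 - x| over (0, 2) by the midpoint rule
n = 100_000
width = 2 / n
total = sum((1 - abs(1 - (k + 0.5) * width)) * width for k in range(n))
print(total)                                             # approximately 1.0

# (iii) mass function e^(-2) 2^x / x!  (a Poisson form with lambda = 2)
print(sum(math.exp(-2) * 2**x / math.factorial(x) for x in range(60)))  # approximately 1.0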

2.2 Mathematical Expectation and Moments

2.2.1 Mean, variance and moments

Definition 2.4 (Mean). Let X be a random variable having pdf

f (x) or probability mass function p(x), then the mean (expected

value) of X, denoted by E[X] or µX , is defined by



 ∫
 ∞ xf (x)dx; X : Continuous,
−∞
E[X] =
 ∑ xi p(xi );

X : Discrete.
i

26
Definition 2.5 (Variance). Let X be a random variable having

pdf f (x) or probability mass function p(x), then the variance of


X, denoted by V[X] or σ²_X, is defined by
V[X] = E[(X − µ_X)²] = ∫_{−∞}^{∞} (x − µ_X)² f(x) dx if X is continuous;   = ∑ᵢ (xᵢ − µ_X)² p(xᵢ) if X is discrete.

In general we have the following definition

Definition 2.6. Let g(X) be a function of the random variable

X having pdf f (x) or mass function p(x). The expected value

of g(X) is defined by

 ∫
 ∞ g(x)f (x)dx; X : Continuous,
−∞
E[g(X)] =
 ∑ g(xi )p(xi );

X : Discrete.
i

Special Cases

(i) If g(X) = Xʳ, then
E[g(X)] = E[Xʳ] = µ′ᵣ = ∫_{−∞}^{∞} xʳ f(x) dx if X is continuous;   = ∑ᵢ xᵢʳ p(xᵢ) if X is discrete,
which is called the rth moment of the random variable X about the origin, r = 0, 1, 2, . . . . If r = 1, then E[g(X)] = E[X] = µ_X, which is called the mean of the random variable X; i.e. the first moment of X about the origin is itself the expected value of X.

(ii) If g(X) = (X − µ_X)ʳ, then
E[g(X)] = E[(X − µ_X)ʳ] = µᵣ = ∫_{−∞}^{∞} (x − µ_X)ʳ f(x) dx if X is continuous;   = ∑ᵢ (xᵢ − µ_X)ʳ p(xᵢ) if X is discrete,

which is the rth moment of the random variable X about

its mean, r = 0, 1, 2, . . . . It is called also the rth central

moment of X about its mean. If r = 2, then E[g(X)] =

E[(X − µX )2 ] which is called the variance of X, i.e. the

second central moment of X about its mean is itself the

variance of X.

Remark 2.5. The expected value of X, E[X], represents the

center of gravity of the unit mass that is determined by the

density function of X. So the mean of X is a measure of where

the values of the random variable X are centered, but the vari-

ance of X, V [X], represents the moment of inertia of the same

density with respect to a perpendicular axis through the center

of gravity.

Definition 2.7 (Standard deviation). The standard deviation of a random variable X, denoted by σ_X, is defined as the positive square root +√V[X].

Properties of mean and variance

(i) E[c] = c, V [c] = 0 where c is a constant,

(ii) E[cg(X)] = cE[g(X)], V[cg(X)] = c²V[g(X)],
(iii) E[c₁g₁(X₁) + c₂g₂(X₂)] = c₁E[g₁(X₁)] + c₂E[g₂(X₂)], and
V[c₁g₁(X₁) + c₂g₂(X₂)] = c₁²V[g₁(X₁)] + c₂²V[g₂(X₂)] when X₁ and X₂ are independent.

(iv) If g1 (x) ≤ g2 (x) ∀x, then E[g1 (X)] ≤ E[g2 (X)].

The above properties can be proved, simply, by applying the

definitions of mean and variance.

Theorem 2.1. If X is a random variable,

V [X] = E[X 2 ] − (E[X])2 ,

provided E[X 2 ] exists.

Proof.
V[X] = E[(X − E[X])²]
= E[X² − 2X E[X] + (E[X])²]
= E[X²] − 2(E[X])² + (E[X])²
= E[X²] − (E[X])².
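The shortcut V[X] = E[X²] − (E[X])² is easy to confirm numerically; the Python sketch below (added for illustration) uses the distribution of X from Example 2.1, where p(0) = 1/4, p(1) = 1/2, p(2) = 1/4:

xs = [0, 1, 2]
ps = [0.25, 0.5, 0.25]

mean = sum(x * p for x, p in zip(xs, ps))
e_x2 = sum(x**2 * p for x, p in zip(xs, ps))
var_direct = sum((x - mean)**2 * p for x, p in zip(xs, ps))
print(mean, e_x2 - mean**2, var_direct)    # 1.0 0.5 0.5 -- the two variance forms agree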

2.2.2 Measures of skewness and kurtosis

When the curve of the distribution is more extended to the right (left), we say that the curve is skewed to the right (left), or that it has positive (negative) skewness, see Fig. (2.3).

Definition 2.8 (Measure of skewness). The measure of skewness, denoted by γ₁, is defined as the ratio of the third moment about the mean to the cube of the standard deviation, i.e.
γ₁ = µ₃/σ³.
The quantity (mean − median)/(standard deviation) provides an alternative measure of skewness.

If the measure of skewness is positive(negative), then this

means that the mean is bigger(smaller) than the median. When

the curve is symmetric, then the mean is equal to the median

and hence the measure of skewness is equal to zero, since all

moments of odd order about the mean are equal to zero.

The kurtosis is the degree of flatness of a density curve near

its center.

Definition 2.9 (Measure of kurtosis). The measure of kurtosis,

denoted by γ2 , is defined to be the ratio between the fourth

moment about the mean and the square of the variance, i.e.
γ₂ = µ₄/σ⁴.
Positive (negative) values of (µ₄/σ⁴ − 3) are sometimes used to indicate that a density is more peaked (flat) around its center than the density of a normal curve. Curves with γ₂ < 3 are called platykurtic, while those with γ₂ > 3 are called leptokurtic.

Example 2.3. A box contains 8 items of which 2 are defective.

A man selects 3 items from this box; find the expected number of defective items he draws if (i) the draw is without replacement, (ii) the draw is with replacement.

(i) The number of defective items may be 0,1 or 2. If X is a

random variable shows the number of defective items, then the

distribution of X can be written as follows


P(X = 0) = p(0) = C_0^2 C_3^6 / C_3^8 = 20/56,
P(X = 1) = p(1) = C_1^2 C_2^6 / C_3^8 = 30/56,
P(X = 2) = p(2) = C_2^2 C_1^6 / C_3^8 = 6/56.

x        0        1        2
p(x)     20/56    30/56    6/56
x p(x)   0        30/56    12/56

Then
E[X] = ∑_{x=0}^2 x p(x) = 42/56 = 3/4.
The random variable here has the hypergeometric distribution

as we shall see in the next chapter.

Part (ii) is left until the reader knows the binomial distribution, which is explained in the next chapter.

Example 2.4. A balanced coin is tossed until a head or four

tails occurs. Find the expected number, E, of tosses of the coin.

We know that the sample space of tossing a coin one, two,

three or four times consists of 2, 4, 8 or 16 elements, then only

one toss occurs if head occurs the first time, two tosses occur

if the first is tail and the second is head. Three tosses occur if

the first two are tails and the third is head. Four tosses occur if

either T T T H or T T T T occurs. So
p(1) = P{H} = 1/2,   p(2) = P{TH} = 1/4,
p(3) = P{TTH} = 1/8,
p(4) = P{TTTH} + P{TTTT} = 1/16 + 1/16 = 1/8,
then
E = 1 × (1/2) + 2 × (1/4) + 3 × (1/8) + 4 × (1/8) = 15/8.
Sometimes we denote the probability mass function p(x) by

f (x).

Exercises II

1. Let X be a random variable with pdf f (x) = 2x/9, x ∈

A = {x : 0 < x < 3} and let A1 = {x : 0 < x < 1}, A2 =

{x : 2 < x < 3}, determine P (A1 ), P (A2 ), P (A1 ∪ A2 ).

2. Let X be a random variable with probability mass function

f (x) = x/15; x = 1, 2, 3, 4, 5; zero elsewhere. Find P (X =

1 or 2), P (0.5 < X < 2.5) and P (1 ≤ X ≤ 2).

3. (a) Find the value of the constant C for the following func-

tion to be a probability function, where f(x) = C x e^{−2x}; x > 0, zero elsewhere.

(b) If X has the pdf given in (a), find the cdf, F (x), and

then calculate

i. P (X > 3).

ii. P (1 < X < 3 | 0 < X < 2).

4. Find the value of the constant C for the following functions

to be probability functions

(a) f(x) = C e^{−x² + 2x}, −∞ < x < ∞.
(b) f(x) = C C_x^{−8} C_{6−x}^{−4}; x = 0, 1, 2, 3, 4, 5, 6.
(c) f(x) = C eˣ / [2(1 + eˣ)²], −∞ < x < ∞.
(d) f(x) = 1 / (π√(1 − x²)); |x| < 1, zero otherwise.

5. Find the mode of each of the following distributions


(a) f(x) = (1/2)ˣ, x = 1, 2, . . . , zero elsewhere.
(b) f(x) = (x²/2) e^{−x}; 0 < x < ∞, zero elsewhere.

6. Find the median of each of the following distributions


(a) f(x) = C_x^4 (1/4)ˣ (3/4)^{4−x}, x = 0, 1, 2, 3, 4.
(b) f(x) = 3x²; 0 < x < 1, zero elsewhere.

7. Let X be a random variable with pdf f(x) = (x + 2)/18; −2 < x < 4, zero elsewhere. Find E[X], E[(X + 2)³], and E[6X − 2(X + 2)³].

8. A fair coin is tossed four times. Let X denote the number

of heads occurring. Find the distribution, mean, variance

and standard deviation of X.

9. A box contains 10 transistors of which 2 are defective. A transistor is selected from the box and tested until a non-defective one is obtained. Find the expected number of transistors to be chosen. [11/9]

10. A fair coin is tossed until a head appears, Let X denote the

number of tosses required.

(a) Find the density function of X

(b) Find the moment generating function of X and hence

find its mean and variance.

11. Consider the rth central moment of the gamma distribution,
µᵣ = E[(X − αβ)ʳ] = ∫_0^∞ (x − αβ)ʳ x^{α−1} e^{−x/β} / (Γ(α)β^α) dx.
(a) Prove that µᵣ₊₁ = β²[α r µᵣ₋₁ + dµᵣ/dβ], r = 1, 2, . . .
(b) Use the fact that µ₀ = 1, µ₁ = 0 and the differential equation in (a) to calculate the central moments µ₂, µ₃ and µ₄ of the gamma distribution with two parameters (α, β).

Chapter 3

SOME IMPORTANT

DISCRETE DISTRIBUTIONS

In this chapter, some distributions such as the binomial, multi-

nomial, Poisson, and hypergeometric distributions are presented.

3.1 Binomial Distribution

A random experiment whose outcomes have been classified into

two categories, called “success” and “failure”, denoted respectively by S and F, is called a Bernoulli trial (for example, head

or tail, life or death, good or defective, boy or girl, etc). Suppose

that the random experiment consists of n repeated independent

Bernoulli trials and p is the probability of success at each individual trial, then this random experiment is called a binomial

experiment. The term “repeated” is used to indicate that the

probability of success, P (S) ≡ p, remains the same from trial to

trial, thus the probability of failure on each repetition is 1 − p.

Definition 3.1 (Binomial distribution). If the random variable,

X, represents the number of successes in n independent trials of

a binomial experiment, then X subjects to the binomial distri-

bution with probability function




P(X = x) = p(x) = C_x^n pˣ q^{n−x}, x = 0, 1, . . . , n, p + q = 1, and p(x) = 0 otherwise,
where the parameters n and p satisfy n ∈ ℤ⁺ and p ∈ [0, 1].

Remark 3.1. From now on, we use X ∼ b(n, p) to mean that

the random variable X subjects to the binomial distribution

with two parameters n and p.

Conditions of the binomial experiment

1. The experiment has one of two outcomes, one is called suc-

cess with probability p and the other is called failure with

probability q = 1 − p.

2. The experiment is repeated for n independent trials.

3. The probability of success is constant in each trial.

One can now notice that the probability given by the binomial

distribution may arise in the following ways:

(i) When sampling from a finite population with replacement.

(ii) When sampling from an infinite population (often referred

to as an indefinitely large population) with or without re-

placement.

Theorem 3.1. If the random variable X ∼ b(n, p), then the

mean, variance and moment generating function of X are given,

respectively, by np, npq and (pe^t + q)ⁿ.

Proof.
E[X] = ∑_{x=0}^n x p(x) = ∑_{x=0}^n x · n!/(x!(n − x)!) · pˣ q^{n−x},   q = 1 − p,
= np ∑_{x=1}^n (n − 1)!/((x − 1)!(n − x)!) · p^{x−1} q^{n−x}
= np (p + q)^{n−1} = np.
E[X²] = ∑_{x=0}^n x² p(x) = ∑_{x=0}^n [x(x − 1) + x] p(x)
= ∑_{x=2}^n x(x − 1) p(x) + E[X]
= n(n − 1)p² ∑_{x=2}^n (n − 2)!/((x − 2)!(n − x)!) · p^{x−2} q^{n−x} + np
= n(n − 1)p² (p + q)^{n−2} + np
= n(n − 1)p² + np.
Since V[X] = E[X²] − (E[X])², then
V[X] = n(n − 1)p² + np − (np)² = npq.
Now,
m_X(t) = E[e^{tX}] = ∑_{x=0}^n e^{tx} p(x) = ∑_{x=0}^n e^{tx} C_x^n pˣ q^{n−x} = ∑_{x=0}^n C_x^n (pe^t)ˣ q^{n−x} = (pe^t + q)ⁿ.

Example 3.1. The probability that a patient recovers from a

rare blood disease is 0.4. If 15 people are known to have con-

tracted this disease, what is the probability that (a) at least 10

survive, (b) from 3 to 8 survive, and (c) exactly 5 survive.

Let X ∼ b(15, 0.4) denote the number of people that survive. Then
(a) P(X ≥ 10) = 1 − P(X < 10) = 1 − ∑_{x=0}^9 C_x^15 (0.4)ˣ(0.6)^{15−x} = 1 − 0.9662 = 0.0338.
(b) P(3 ≤ X ≤ 8) = ∑_{x=3}^8 C_x^15 (0.4)ˣ(0.6)^{15−x} = 0.9050 − 0.0271 = 0.8779.
(c) P(X = 5) = C_5^15 (0.4)⁵(0.6)¹⁰ = 0.185.
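The three answers can be recomputed directly from the binomial mass function; here is a short Python sketch (added for illustration, not from the original notes):

from math import comb

def binom_pmf(x, n, p):
    return comb(n, x) * p**x * (1 - p)**(n - x)

n, p = 15, 0.4
print(1 - sum(binom_pmf(x, n, p) for x in range(10)))   # (a) ~0.0338
print(sum(binom_pmf(x, n, p) for x in range(3, 9)))     # (b) ~0.8779
print(binom_pmf(5, n, p))                               # (c) ~0.1859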

Example 3.2. A pheasant hunter brings down 75% of the birds

he shoots at. What is the probability that at least 3 of the next

5 pheasants shot at will escape?

Let X ∼ b(5, 0.25) denote the number of escaping pheasants; then
P(X ≥ 3) = ∑_{x=3}^5 C_x^5 (0.25)ˣ(0.75)^{5−x}.

Example 3.3. The moment generating function of a random variable X is ((2/3) + (1/3)e^t)⁹. Show that
P(µ − 2σ < X < µ + 2σ) = ∑_{x=1}^5 C_x^9 (1/3)ˣ(2/3)^{9−x}.
The given function is the moment generating function of the binomial distribution with parameters n = 9, p = 1/3, so µ ≡ E[X] = np = 3 and σ ≡ √V[X] = √(npq) = √2, and hence
P(µ − 2σ < X < µ + 2σ) = P(3 − 2√2 < X < 3 + 2√2)
= P(0.17 < X < 5.83) = P(1 ≤ X ≤ 5)
= ∑_{x=1}^5 C_x^9 (1/3)ˣ(2/3)^{9−x}.

Example 3.4. If x = r is the unique mode of a distribution

which is b(n, p), show that (n + 1)p − 1 < r < (n + 1)p

If x = r is the only mode, then it must satisfy P (X = r+1) <

P (X = r) and P (X = r − 1) < P (X = r), then


n
Cr+1 pr+1 q n−r−1 (n − r)p
< 1 ⇒ < 1 ⇒ (n − r)p < (r + 1)(1 − p).
Crn pr q n−r (r + 1)q

It follows that

r > (n + 1)p − 1. (3.1)

Similarly,
n
Cr−1 pr−1 q n−r+1 rq
< 1 ⇒ < 1 ⇒ r(1 − p) < (n − r + 1)p.
Crn pr q n−r (n − r + 1)p

It follows that

r < (n + 1)p. (3.2)

From (3.1) and (3.2), we have

(n + 1)p − 1 < r < (n + 1)p.

Example 3.5. Let X be b(2, p) and let Y be b(4, p). If P (X ≥

1) = 5/9. Find P (Y ≥ 1)

P(X ≥ 1) = C_1^2 pq + C_2^2 p² = 5/9 ⇒ 2p(1 − p) + p² = 5/9 ⇒ 9p² − 18p + 5 = 0 ⇒ (3p − 5)(3p − 1) = 0, then
p = 1/3 (accepted) or p = 5/3 (rejected).
Thus, P(Y ≥ 1) = 1 − P(Y = 0) = 1 − C_0^4 (2/3)⁴ = 65/81.

3.2 Poisson Distribution

The Poisson distribution appears in many natural phenomena.

Among others, the number of misprints per page in a large text,

the number of telephone calls per minute at some switchboard

and the number of α particles emitted by a radioactive substance

per unite of time.

Definition 3.2 (Poisson distribution). The random variable X

subjects to Poisson distribution, with parameter λ, if its proba-

bility function is given by


P(X = x) = p(x) = e^{−λ} λˣ / x!, x = 0, 1, 2, . . . (λ > 0), and p(x) = 0 otherwise.

Here the random variable X shows the number of successes in

a certain period of time or in a bounded region, and λ represents

the average of this number. The period of time

may be (minute, hour, day, week or month) and the bounded

region may be (page in a book, squared meter from the area or

cubic meter from the volume).

Example 3.6. The average number of accidents on some route is 5 per week. What is the probability that there is no accident on this route in a certain week? What is the probability that 4 accidents or fewer occur in a certain week? What is the probability that more than two accidents occur during two weeks?

Let X be a random variable subject to the Poisson distribution, showing the number of accidents in a certain week.
(i) P(X = 0) = e⁻⁵ 5⁰/0! = e⁻⁵.
(ii) P(X ≤ 4) = ∑_{x=0}^4 e⁻⁵ 5ˣ/x! = 0.4405.
(iii) If X now shows the number of accidents in two weeks, then the average number of accidents in two weeks is λ = 2 × 5 = 10, and hence
P(X > 2) = 1 − P(X ≤ 2) = 1 − ∑_{x=0}^2 e⁻¹⁰ 10ˣ/x! = 0.9972.

Example 3.7. Suppose 220 misprints are distributed randomly

throughout a book of 200 pages. Find the probability that a given

page contains (i) no misprints, (ii) at least two misprints.

Suppose that X is a random variable subject to the Poisson distribution, with parameter λ = 220/200 = 1.1, showing the number of misprints on a given page. Then
(i) P(X = 0) = e^{−1.1} (1.1)⁰/0! = e^{−1.1}.
(ii) P(X ≥ 2) = 1 − P(X < 2) = 1 − [P(X = 0) + P(X = 1)]
= 1 − (e^{−1.1} + 1.1 e^{−1.1}) = 0.301.
Theorem 3.2. If the random variable X subjects to the Poisson distribution with parameter λ, then E[X] = λ, V[X] = λ and m(t) = e^{−λ(1−e^t)}.

Proof.
E[X] = ∑_{x=0}^∞ x p(x) = ∑_{x=0}^∞ x e^{−λ} λˣ/x! = λ ∑_{x=1}^∞ e^{−λ} λ^{x−1}/(x − 1)! = λ.
E[X²] = ∑_{x=0}^∞ x² p(x) = ∑_{x=0}^∞ [x(x − 1) + x] e^{−λ} λˣ/x!
= ∑_{x=2}^∞ e^{−λ} λ² λ^{x−2}/(x − 2)! + E[X]
= λ² + λ.
Since V[X] = E[X²] − (E[X])², then
V[X] = λ² + λ − λ² = λ.
m(t) = ∑_{x=0}^∞ e^{tx} p(x) = ∑_{x=0}^∞ e^{tx} e^{−λ} λˣ/x! = e^{−λ} ∑_{x=0}^∞ (λe^t)ˣ/x! = e^{−λ} e^{λe^t} = e^{−λ(1−e^t)}.

Remark 3.2. A distinguishing property of the Poisson distribution is that its mean is equal to its variance.

The Poisson and binomial distributions have histograms with approximately the same shape when n is large and p is close to zero. Hence, if these two conditions hold, the Poisson distribution, with µ = np, can be used to approximate binomial probabilities. If p is close to one, we can interchange what we have defined to be a success and a failure, thereby changing p to a value close to zero. This approximation is illustrated in the following theorem, which we state without proof:

Theorem 3.3. Let X be a random variable which has a binomial

distribution with parameters n and p. If n → ∞ and p → 0 such

that np = λ, then the binomial distribution tends to the Poisson

distribution with parameter λ = np.

3.3 Hypergeometric Distribution

The hypergeometric probability function provides probabilities

of certain events when a sample of n objects is drawn at random

from a finite population of N objects, where the sampling is done

without replacement.

Definition 3.3 (Hypergeometric distribution). The random vari-

able X subjects to the hypergeometric distribution if its proba-

bility function is given by

P(X = x) ≡ p(x) = C_x^{N₁} C_{n−x}^{N−N₁} / C_n^N, x = 0, 1, . . . , n, and p(x) = 0 otherwise,
where N is a positive integer, N₁ is a nonnegative integer with N₁ ≤ N, and n is a nonnegative integer with n ≤ N.

The random variable X in the hypergeometric distribution represents, say, the number of defective items (drawn without replacement) within a total of n drawn items from a set of N items that includes N₁ defective items.

Theorem 3.4. If the random variable X subjects to the hyper-

geometric distribution with parameters N, n and N1 , then

E[X] = n N₁/N   and   V[X] = n (N₁/N)((N − N₁)/N)((N − n)/(N − 1)).

The proof of the last theorem is omitted.


Remark 3.3. If we set N₁/N = p, then the mean of the hypergeometric distribution coincides with the mean of the binomial distribution, as shown in the following example, but the variance of the hypergeometric distribution is (N − n)/(N − 1) times the variance of the binomial distribution.

Example 3.8. A box contains 8 items of which 2 are defective.

A man selects 3 items from this box. Find the distribution and the expected number of defective items he draws if
(i) the draw is without replacement,
(ii) the draw is with replacement.

Let X be a random variable showing the number of defective items. If the draw occurs without replacement, then X subjects to the hypergeometric distribution with N = 8, N₁ = 2 and n = 3, and hence the distribution of X takes the form
P(X = x) = C_x^2 C_{3−x}^6 / C_3^8, x = 0, 1, 2, and P(X = x) = 0 otherwise.
We know from before that E[X] = n N₁/N = (3 × 2)/8 = 3/4.
If the draw occurs with replacement, then X subjects to the binomial distribution with n = 3, p = N₁/N = 2/8, and hence the distribution of X takes the form
P(X = x) = C_x^3 (2/8)ˣ(6/8)^{3−x}, x = 0, 1, 2, 3, and P(X = x) = 0 otherwise.
In this case, E[X] = np = 3 × 2/8 = 3/4.
Part (i) of this example was solved before in Example (2.3), but (ii) was left to be solved after studying the binomial distribution.
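The closeness of the two distributions in this example, and the equality of their means, can be seen side by side in the following Python sketch (illustrative only, not part of the original notes):

from math import comb

N, N1, n = 8, 2, 3
for x in range(3):     # common support of interest: x = 0, 1, 2
    hyper = comb(N1, x) * comb(N - N1, n - x) / comb(N, n)
    binom = comb(n, x) * (N1 / N)**x * (1 - N1 / N)**(n - x)
    print(x, round(hyper, 4), round(binom, 4))
# both distributions have mean n * N1 / N = 0.75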

Theorem 3.5. If in the hypergeometric distribution N → ∞ such that N₁/N → p, then the hypergeometric distribution tends to the binomial distribution with parameters n and p = N₁/N.

The proof of the last theorem is omitted.

Remark 3.4. The last theorem tells us that for very large N

(population size), sampling with replacement gives approximately

the same probabilities as sampling without replacement.

Example 3.9. A large box contains 150 white mice and 35 gray

mice. A researcher draws (without replacement) 5 mice to per-

form a certain experiment. What is the probability of getting 3

white mice among the 5 selected mice.

Let X ∼ H(5, 150, 185) show the number of white mice in the drawn sample; then
P(X = 3) = C_3^150 C_2^35 / C_5^185,
but since the numbers of mice are large, it is preferable to use the binomial distribution with parameters n = 5 and p = 150/185. Therefore
P(X = 3) ≈ C_3^5 (150/185)³(35/185)² = 0.19.

Example 3.10. The telephone company reports that among 5000

telephones installed in a new subdivision 4000 are nonwhite. If

10 people are called at random, what is the probability that ex-

actly 3 will be talking on white telephones.

Since the population size N = 5000 is large relative to the sample size n = 10, we shall approximate the desired probability by using the binomial distribution. The probability of calling someone with a nonwhite telephone is 4000/5000 = 0.8. Therefore, the probability that exactly 3 of the people called will be talking on white telephones is
P(X = 3) = C_3^10 (0.2)³(0.8)⁷.

Example 3.11. Two dice are thrown 100 times and the number

of “nines” is recorded. What is the probability that x “nines”

occur? That at least three “nines” occur?

It is apparent that we are examining each roll of the two dice for the events “nine” or “not nine”. The probability of obtaining a nine by throwing two dice is 4/36 = 1/9, that is, p = 1/9. Hence, using the binomial distribution,
p(x) = C_x^100 (1/9)ˣ(8/9)^{100−x}, x = 0, 1, 2, . . . , 100.
In answer to the second question, we have
P(X ≥ 3) = 1 − P(X ≤ 2) = 1 − ∑_{x=0}^2 C_x^100 (1/9)ˣ(8/9)^{100−x} = 0.9993.

Exercises III

1. Let X ∼ b(25, 0.2). Evaluate P (X < µX − 2σX ).

2. If X ∼ b(n, p), show that


E[X/n] = p   and   E[(X/n − p)²] = p(1 − p)/n.

3. Criticize the following statements

(a) The mean of binomial distribution is 6 and its standard

deviation is 3.

(b) The mean of binomial distribution is 5 and its standard

deviation is 2.

(c) The mean of Poisson distribution is 5 and its standard

deviation is 4.

4. If X has a Poisson distribution with P [X = 1] = P [X = 2],

what is P [X = 1 or 2]?

5. If X has a Poisson distribution with mean 1, show that

E[| X − 1 |] = 2σX /e?

6. The moment generating function of a random variable X is e^{4(e^t−1)}. Show that P(µ − 2σ < X < µ + 2σ) = 0.931.
7. Find the mode of Poisson distribution with parameter λ.

8. A multiple-choice quiz has 3 questions, each with 4 possible

answers of which only one is the correct answer. (a) What

is the probability that sheer guesswork yields at most two

correct answers. (b) What is the probability that a student,

answer by guesswork, will succeed. (c) What is the proba-

bility that a student, answer by guesswork, will get the full

mark. (d) If the third question has 3 possible answers of

which only one is correct, then find the expected number of

correct answers when a student tries, randomly, to answer

these questions.

9. An insurance company finds that 0.0005 of the population

die from a certain kind of accidents each year. What is the

probability that the company must pay off on more than

three of 10,000 insured risks against such accidents in a

given two years.

10. Suppose that it is known that a certain kind of bacteria is

distributed in water at the rate of two bacteria per cubic

centimeter of water. If we assume that this phenomenon

can be approximated by a Poisson model. What is the

probability that a sample of 2 c.c. will contain at least two

bacteria? [1 − 5e⁻⁴]

11. If 5 cards are dealt from a standard deck of 52 playing cards,

what is the probability that 3 will be heart? [0.0815]

Chapter 4

SOME IMPORTANT

CONTINUOUS

DISTRIBUTIONS

In this chapter, some distributions such as the uniform, normal,

and gamma distributions are presented.

4.1 Uniform Distribution

Definition 4.1 (Uniform distribution). The random variable X

subjects to the uniform (rectangular) distribution, with two pa-

rameters a, b, if its density function is given by

f(x) = 1/(b − a); −∞ < a ≤ x ≤ b < ∞, and f(x) = 0 otherwise.
Theorem 4.1. If X is uniformly distributed over [a, b], then
E[X] = (a + b)/2,   V[X] = (b − a)²/12   and   m_X(t) = (e^{bt} − e^{at})/((b − a)t).
Proof.
E[X] = ∫_{−∞}^{∞} x f(x) dx = ∫_a^b x/(b − a) dx = (b² − a²)/(2(b − a)) = (b + a)/2.
E[X²] = ∫_a^b x²/(b − a) dx = (b³ − a³)/(3(b − a)) = (b² + ab + a²)/3,
V[X] = E[X²] − (E[X])² = (b² + ab + a²)/3 − (a + b)²/4
= (4b² + 4ab + 4a² − 3a² − 6ab − 3b²)/12 = (b² − 2ab + a²)/12 = (b − a)²/12.
m_X(t) = E[e^{tX}] = ∫_a^b e^{tx}/(b − a) dx = (e^{bt} − e^{at})/((b − a)t).

Example 4.1. Suppose X is a continuous random variable with a uniform distribution having mean 1 and variance 4/3. What is P(X < 0)?
If X is uniformly distributed with parameters a, b; a < b, then
E[X] = (a + b)/2 = 1,    (4.1)
V[X] = (b − a)²/12 = 4/3.    (4.2)
From (4.1) we can write a = 2 − b, and substituting in (4.2) yields
(b − 2 + b)²/12 = 4/3 ⇒ (b − 1)² = 4 ⇒ b² − 2b − 3 = 0 ⇒ (b + 1)(b − 3) = 0,
so either b = −1 (and hence a = 3) or b = 3 (and hence a = −1). The first choice for a and b is rejected since a < b; thus the second choice is accepted, and
P(X < 0) = ∫_{−1}^0 (1/4) dx = (1/4)[0 − (−1)] = 1/4.

4.2 Normal Distribution

Definition 4.2 (Normal distribution). The random variable X subjects to the normal distribution, with two parameters µ and σ, if its density function is given by
f(x) = (1/(σ√(2π))) exp[−(1/2)((x − µ)/σ)²],  −∞ < x < ∞,  (−∞ < µ < ∞, σ > 0).    (4.3)
If the random variable X is normally distributed with mean µ and variance σ² (later on we prove that the mean is µ and the variance is σ²), we write X ∼ N(µ, σ²). We will also use the notation Φ_{µ,σ²}(x) for the cumulative distribution function.
If in (4.3) we set z = (x − µ)/σ, then
f_Z(z) = (1/√(2π)) exp(−z²/2),  −∞ < z < ∞,
is the density function of the random variable Z with parameter values µ = 0 and σ² = 1, which is called the standard normal random variable, i.e. Z ∼ N(0, 1).
The graph of the normal distribution is called the normal curve.

Properties of the normal curve

1. The curve is symmetric about a vertical axis through the mean µ and it has a bell shape.
2. The mode, which is the point on the horizontal axis where the curve attains its maximum, occurs at x = µ.
3. The normal curve approaches the horizontal axis asymptotically as x → ±∞.
4. The total area under the curve and above the horizontal axis is equal to 1.

Fortunately, to avoid the use of integral calculus, we are able to transform all of the observations of any normal random variable X to a new set of observations of a standard normal random variable Z with mean zero and variance one, using the transformation
Z = (X − µ)/σ,
where E[Z] = (µ − µ)/σ = 0 and V[Z] = V[X − µ]/σ² = σ²/σ² = 1.

Theorem 4.2. If X ∼ N(µ, σ²), then
E[X] = µ, V[X] = σ².
Proof.
I ≡ E[X] = (1/(σ√(2π))) ∫_{−∞}^{∞} x exp[−(1/2)((x − µ)/σ)²] dx.
Using the transformation z = (x − µ)/σ ⇒ x = σz + µ ⇒ dx = σ dz, then
I = (1/√(2π)) ∫_{−∞}^{∞} (σz + µ) exp(−z²/2) dz
= σ ∫_{−∞}^{∞} (1/√(2π)) z e^{−z²/2} dz + µ ∫_{−∞}^{∞} (1/√(2π)) e^{−z²/2} dz = µ,
since ∫_{−∞}^{∞} z e^{−z²/2} dz = 0 (the integrand is an odd function) and ∫_{−∞}^{∞} (1/√(2π)) e^{−z²/2} dz = 1 (Z ∼ N(0, 1)).
Similarly, one can prove that V[X] = σ².

Example 4.2. Given the normally distributed random variable

X with mean 18 and variance 6.25, find

(i) P (X < 15),

(ii) the value of k such that P (X < k) = 0.2578,

(iii) P (17 < X < 21),

(iv) the value of k such that P (X > k) = 0.1539.

(i) P(X < 15) = P((X − 18)/2.5 < (15 − 18)/2.5) = P(Z < −1.2) = Φ(−1.2) = 0.1151.
(ii) P(X < k) = 0.2578 ⇒ P(Z < (k − 18)/2.5) = 0.2578 ⇒ (k − 18)/2.5 = −0.65 ⇒ k = 16.375.
(iii) P(17 < X < 21) = P((17 − 18)/2.5 < Z < (21 − 18)/2.5) = P(−0.4 < Z < 1.2)
= Φ(1.2) − Φ(−0.4) = 0.8849 − 0.3446 = 0.5403.
(iv) P(X > k) = 0.1539 ⇒ P(Z > (k − 18)/2.5) = 0.1539 ⇔ P(Z < (18 − k)/2.5) = 0.1539
⇒ (18 − k)/2.5 = −1.02 ⇒ k = 20.55.
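Instead of tables, Φ can be evaluated through the error function; the Python sketch below (an added illustration — Φ expressed via math.erf) recomputes parts (i), (iii) and (iv):

from math import erf, sqrt

def phi(z):
    # standard normal cdf
    return 0.5 * (1 + erf(z / sqrt(2)))

mu, sigma = 18, 2.5
print(phi((15 - mu) / sigma))                            # (i)   ~0.1151
print(phi((21 - mu) / sigma) - phi((17 - mu) / sigma))   # (iii) ~0.5403
print(1 - phi((20.55 - mu) / sigma))                     # (iv)  ~0.1539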
Example 4.3. If a set of grades on a statistics examination is approximately normally distributed with a mean of 74 and a standard deviation of 7.9, find (a) the lowest passing grade if the lowest 10% of the students are given Fs, (b) the highest B if the top 5% of the students are given As.
Let X ∼ N(74, (7.9)²) show the grades.
(a) P(X < k) = 0.1 ⇒ P(Z < (k − 74)/7.9) = 0.1 ⇒ (k − 74)/7.9 = −1.28 ⇒ k ≈ 64.
(b) P(X > B) = 0.05 ⇒ P(Z > (B − 74)/7.9) = 0.05 ⇔ P(Z < (74 − B)/7.9) = 0.05
⇒ (74 − B)/7.9 = −1.65 ⇒ B ≈ 87.
Example 4.4. In a mathematics examination the average grade was 82 and the standard deviation was 5. All students with grades from 88 to 94 received a grade of B. If the grades are approximately normally distributed and 8 students received a B grade, how many students took the examination?

Let X ∼ N(82, 25) denote the grade, then

P(88 < X < 94) = P(1.2 < Z < 2.4) = Φ(2.4) − Φ(1.2) = 0.9918 − 0.8849 = 0.1069 ⇒ n × 0.1069 = 8 ⇒ n ≈ 75.
4.2.1 Normal approximation to the binomial

We shall now state (without proof) a theorem that allows us

to use areas under the normal curve to approximate binomial

probabilities when n is sufficiently large.

Theorem 4.3. If X is a binomial random variable with mean

µ = np and variance σ 2 = npq, then the limiting form of the

distribution of
Z = (X − np)/√(npq),
as n → ∞, is the standardized normal distribution N (0, 1).

The probabilities of the binomial are approximated according

to the following:

If a, b and c are nonnegative integers with 0 ≤ a, b, c ≤ n, then

1. P(X = c) = P(c − 0.5 ≤ X ≤ c + 0.5)
            = P((c − 0.5 − np)/√(npq) ≤ Z ≤ (c + 0.5 − np)/√(npq)),

2. P(a ≤ X ≤ b) = P((a − 0.5 − np)/√(npq) ≤ Z ≤ (b + 0.5 − np)/√(npq)),

3. P(a < X < b), P(a < X ≤ b) and P(a ≤ X < b) should be transformed to closed-interval probabilities and then apply (2).

Example 4.5. A drug manufacturer claims that a certain drug cures a blood disease on the average 85% of the time. To check the claim, government testers used the drug on a sample of 100 individuals and decided to accept the claim if 75 or more are cured. What is the probability that the claim will be accepted when the cure probability is in fact 85%?

Let X ∼ b(100, 0.85) denote the number of cured people, then

P(X ≥ 75) = Σ from x = 75 to 100 of C_x^100 (0.85)^x (0.15)^{100−x},

but we notice that the number of individuals is large, so it is preferable to use the normal approximation to the binomial. Therefore

P(X ≥ 75) = P(Z ≥ (75 − 0.5 − (100)(0.85))/√((100)(0.85)(0.15)))
          = P(Z ≥ −2.94) = 1 − Φ(−2.94) = 1 − 0.0016
          = 0.9984.
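Since the binomial sum is available in closed form in SciPy, one can compare the exact probability with the normal approximation (a sketch, assuming SciPy):

    from scipy.stats import binom, norm

    n, p = 100, 0.85
    exact = 1 - binom.cdf(74, n, p)             # exact P(X >= 75)
    mu, sigma = n * p, (n * p * (1 - p)) ** 0.5
    approx = 1 - norm.cdf((74.5 - mu) / sigma)  # with continuity correction
    print(exact, approx)                        # both approximately 0.998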

Example 4.6. A certain pharmaceutical company knows that, on the average, 5% of a certain type of pill has an ingredient that is below the minimum strength and thus unacceptable. What is the probability that at least 2 in a sample of 200 pills will be unacceptable? Find also the mean and standard deviation of the accepted pills.

Let X ∼ b(200, 0.05) denote the number of unacceptable pills, then

P(X ≥ 2) = 1 − [P(X = 1) + P(X = 0)]
         = 1 − [C_1^200 (0.05)^1 (0.95)^{199} + C_0^200 (0.95)^{200}],

but we notice that n = 200 is large, so it is preferable to use the Poisson distribution with parameter λ = (200)(0.05) = 10 or the normal approximation to the binomial. Therefore

P(X ≥ 2) = P(Z ≥ (2 − 0.5 − (200)(0.05))/√((200)(0.05)(0.95)))
         = P(Z ≥ −2.75) = 1 − Φ(−2.75) = 1 − 0.003 = 0.997.

The mean and standard deviation of the accepted pills are equal, respectively, to n(1 − p) = 200(0.95) = 190 and √(n(1 − p)p) = √(200(0.95)(0.05)) ≈ 3.08.

4.3 Gamma Distribution

Definition 4.3 (Gamma distribution). The random variable X is subject to the gamma distribution, with two parameters n, β, if its density function is given by

f(x) = (1/(Γ(n)β^n)) x^{n−1} e^{−x/β},  x ≥ 0  (n, β > 0),   (4.4)

and f(x) = 0 otherwise, where Γ(·) is the gamma function.

Remark 4.1. From the definition of the gamma function, we have

∫ from 0 to ∞ of x^{n−1} e^{−x/β} dx = Γ(n)β^n.

Theorem 4.4. If X has a gamma distribution with parameters

n and β, then

E[X] = nβ, V [X] = nβ 2 .

Proof. The proof is omitted.

Special Cases

1. Exponential distribution.
   If in (4.4) n = 1, then

   f(x) = (1/β) e^{−x/β},  x ≥ 0  (β > 0),

   and f(x) = 0 otherwise, which is the exponential distribution with parameter β.

2. Chi-square distribution.
   If in (4.4) n = m/2, β = 2, then

   f(x) = (1/(Γ(m/2) 2^{m/2})) x^{(m/2)−1} e^{−x/2},  x ≥ 0  (m > 0),

   and f(x) = 0 otherwise, which is the chi-square distribution with m degrees of freedom, denoted by χ²(m).

Remark 4.2. The exponential distribution has been used as a

model for lifetimes of various things.
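The two special cases can be checked numerically: in SciPy's parameterization, gamma(a = n, scale = β) is the density (4.4), so a = 1 gives the exponential and a = m/2, scale = 2 gives χ²(m). A sketch, with arbitrary illustrative values of β, m and x:

    from scipy.stats import gamma, expon, chi2

    beta, m, x = 2.0, 6, 1.5
    print(gamma.pdf(x, a=1, scale=beta), expon.pdf(x, scale=beta))  # equal
    print(gamma.pdf(x, a=m/2, scale=2), chi2.pdf(x, df=m))          # equal
    print(gamma.mean(a=m/2, scale=2), gamma.var(a=m/2, scale=2))    # m and 2m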

Exercises IV

1. Let X have the uniform distribution on the interval [θ1 −

θ2 , θ1 + θ2 ], θ2 > 0. Find θ1 and θ2 so that the mean and

the variance of X are respectively, equal to the mean and

the variance of a distribution which is χ2 (8).

2. Let X ∼ N (5, 10). Find P (0.04 < (X − 5)2 < 38.4).

3. Let X ∼ N (1, 4). Find P (1 < X 2 < 9).


4. Let X ∼ N(µ, σ²). Find b so that P(−b < (X − µ)/σ < b) = 0.9.

5. Let X ∼ N (50, 25). Determine (a) P (X > 62),

(b) P (| X − 50 |< 8), and (c) P (X 2 < 1600).

6. Let X ∼ N (50, 100). Find P (Y ≤ 3137), where Y = X 2 +1.

[.4516]

7. If loge X ∼ N (1, 4), find P (0.5 < X < 2). [.248]

8. If zα is the value of Z such that

   (1/√(2π)) ∫ from zα to ∞ of e^{−z²/2} dz = α,
then find the value of zα when (a) α = 0.05, (b) α = 0.025,

(c) α = 0.01.

9. Show that the graph of a pdf N(µ, σ²) has points of inflection at x = µ − σ and x = µ + σ.

10. For a normal distribution with mean 12 and standard devi-

ation 2, find a value of the variate such that the probability

of the interval from the mean to that value is 0.3159. [13.8]

11. Let X ∼ N(µ, σ²). Prove that

    (a) µ_{2r+2} = σ² µ_{2r} + σ³ (dµ_{2r}/dσ),

    (b) µ′_{r+2} = 2µ µ′_{r+1} + (σ² − µ²) µ′_r + σ³ (dµ′_r/dσ).

12. The I.Q.s of 600 applicants to a certain college are approx-

imately normally distributed with a mean of 115 and a

standard deviation of 12. If the college requires an I.Q. of

at least 95, how many of these students will be rejected on

this basis regardless of their other qualifications?

13. A coin is tossed 400 times. Find the probability of obtaining

(a) Between 185 and 210 heads.

(b) Exactly 205 heads.

(c) Less than 176 or more than 227 heads.

14. The total time, T, taken to complete a certain job is a gamma random variable with pdf f(t) = (β^α/Γ(α)) t^{α−1} e^{−βt}; t ≥ 0, zero elsewhere, where α = 4 and β = 1 (per hour). What fraction of jobs will take longer than 5 hours to complete? [.265]

15. The time to failure, T, of a component is assumed to have a pdf f(t) = α e^{−αt}, t ≥ 0, zero elsewhere.

    (a) What is the probability that the component fails between k and k + 1, where k is an integer?

    (b) If the mean time to failure of the component is 100 hours, what is the probability that any particular component will last more than 200 hours?

Chapter 5

SAMPLING THEORY

Studying the relationships existing between a population and samples drawn from the population is called "sampling theory".

Sampling theory is useful in estimating the unknown population parameters and also in determining whether observed differences between two samples are really due to chance variation or whether they are truly significant.

The purpose of this chapter is to introduce the concept of sampling and to present some distribution results that are related to sampling.

5.1 Population and Samples

Definition 5.1 (Population). The totality of all observations

which are under discussion will be called the population.

Definition 5.2 (Simple random sample). If a sample of size n, say X1, X2, . . . , Xn, is drawn from a population of size N in such a way that every possible sample of size n has the same probability of being selected, then it is called a simple random sample.

Definition 5.3 (Statistic). A statistic is a random variable that depends only on the observed sample.

Example 5.1. If X1, X2, . . . , Xn is a random sample of size n, then each of the following represents a statistic:

1. X̄ = (1/n) Σ from i = 1 to n of Xi [sample mean].

2. µ′_r = (1/n) Σ from i = 1 to n of Xi^r [rth sample moment about 0].

3. µ_r = (1/n) Σ from i = 1 to n of (Xi − X̄)^r [rth sample moment about X̄].
Definition 5.4 (Sample variance). If X1, X2, . . . , Xn represents a random sample of size n, with mean X̄, then

S² = (1/(n − 1)) Σ from i = 1 to n of (Xi − X̄)²;  n > 1,

is defined to be the sample variance.

Definition 5.5 (Sampling distribution). The probability distri-

bution of a statistic is called a sampling distribution.

To construct a sampling distribution, we proceed as follows:

1. From a finite population of size N , randomly draw all pos-

sible samples of size n.

2. Compute the statistic of interest, such as the mean, for each

sample.

3. List in one column the different distinct observed values of

the statistic, and in another column list the corresponding

frequency of occurrence of each distinct observed value of

the statistic.

Definition 5.6 (Standard error). The standard deviation of the

sampling distribution of a statistic is called the standard error

of the statistic.

5.2 Sampling Distribution of the Mean

1. When σ is known.

Let X1 , X2 , . . . , Xn be a random sample of size n drawn

from a population of size N having mean µ and variance

σ 2 , then
 2( )

 σ N − n

 ; if the population is finite and the

 n N −1





 sampling is without replacement,

2
σX̄ =



 σ2



 ; if the population is infinite or the

 n


 sampling is with replacement,

where σX̄ is called the standard error of X̄.


Remark 5.1. The factor (N − n)/(N − 1) is called the "finite population correction" and can be ignored if N is very large (infinite population) or if n represents at most 5 percent of the population, i.e. n/N ≤ 0.05; in this case σX̄² = σ²/n.
Theorem 5.1. If all possible random samples of size n are

drawn with replacement from a finite population of size N

with mean µ and variance σ 2 , then the sampling distribution

of the mean X̄ will be approximately normally distributed

with mean µX̄ = µ and variance σ 2 /n. Hence

Z = (X̄ − µ)/(σ/√n) ∼ N(0, 1).

2. When σ is unknown.

In this case we replace σ by S (standard deviation of the

sample) and then we have the following two cases:

(a) If n ≥ 30, then

    Z = (X̄ − µ)/(S/√n) ∼ N(0, 1).

(b) If n < 30, then

    T = (X̄ − µ)/(S/√n) ∼ t_ν,

    where ν = n − 1 is the degrees of freedom of the t-distribution.
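These sampling-distribution facts are easy to illustrate by simulation. The sketch below (assuming NumPy is available) draws many samples of size n = 36 from a normal population and checks that the sample means have mean µ and standard error σ/√n:

    import numpy as np

    rng = np.random.default_rng(0)
    mu, sigma, n = 10.0, 4.0, 36
    xbars = rng.normal(mu, sigma, size=(20000, n)).mean(axis=1)
    print(xbars.mean())       # ~ 10.0  (mu_xbar = mu)
    print(xbars.std())        # ~ 0.667 = sigma / sqrt(n)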

5.3 Sampling Distribution of the Difference of Means

If we are given two populations, the first with mean µ1 and

variance σ12 , and the second with mean µ2 and variance σ22 . Let

the values of the variable X̄1 represent the means of random

samples of size n1 drawn from the first population and similarly

the values of X̄2 represent the means of random samples of size

n2 drawn from the second population such that the values of X̄1

are independent of the values of X̄2 , then

µ_{X̄1±X̄2} = µ1 ± µ2 and σ²_{X̄1±X̄2} = σ1²/n1 + σ2²/n2.
n1 n2

Theorem 5.2. Suppose that two independent samples of sizes n1

and n2 are drawn from two large populations with means µ1 and

µ2 and variances σ12 and σ22 , respectively. Then the sampling

distribution of X̄1 − X̄2 is approximately normally distributed

with mean and standard error given by

µ_{X̄1−X̄2} = µ1 − µ2 and σ_{X̄1−X̄2} = √(σ1²/n1 + σ2²/n2).

Hence,

Z = ((X̄1 − X̄2) − (µ1 − µ2)) / √(σ1²/n1 + σ2²/n2) ∼ N(0, 1).

Example 5.2. The uric acid values in normal adult males are approximately normally distributed with a mean and standard deviation of 5.7 and 1 mg percent, respectively. Find the probability that a sample of size 9 will yield a mean:

(a) greater than 6, (b) between 5 and 6.

µ = 5.7, σ = 1, n = 9.

(a)
P(X̄ > 6) = P((X̄ − µ)/(σ/√n) > (6 − µ)/(σ/√n))
          = P(Z > 0.9) = 1 − P(Z ≤ 0.9)
          = 1 − 0.8159 = 0.1841.

(b)
P(5 < X̄ < 6) = P(−2.1 < Z < 0.9)
             = 0.8159 − 0.0179 = 0.7980.

Example 5.3. Suppose that a population consists of the following values: 1, 3, 5, 7. Construct the sampling distribution of X̄ based on samples of size two selected without replacement from the above population. Find the mean and variance of the sampling distribution.

The number of drawn samples is equal to C_2^4 = 4!/(2! 2!) = 6. The samples and their means are:

Sample:  (1,3)  (1,5)  (1,7)  (3,5)  (3,7)  (5,7)
x̄:       2      3      4      4      5      6

Then we have the following frequency distribution:

x̄i:       2  3  4   5   6   | Total
fi:       1  1  2   1   1   | 6
x̄i fi:    2  3  8   5   6   | 24
x̄i² fi:   4  9  32  25  36  | 106

E[X̄] = Σ x̄i fi / Σ fi = 24/6 = 4.

σX̄² = Σ x̄i² fi / Σ fi − (Σ x̄i fi / Σ fi)² = 106/6 − 16 = 5/3.

We can note that

µ = (1 + 3 + 5 + 7)/4 = 4 = µX̄,

(σ²/n) × (N − n)/(N − 1) = (5/2) × (2/3) = 5/3 = σX̄².
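The same enumeration can be reproduced programmatically (a sketch):

    from itertools import combinations

    pop = [1, 3, 5, 7]
    xbars = [sum(s) / 2 for s in combinations(pop, 2)]  # the 6 sample means
    m = sum(xbars) / len(xbars)
    v = sum((x - m) ** 2 for x in xbars) / len(xbars)
    print(sorted(xbars))   # [2.0, 3.0, 4.0, 4.0, 5.0, 6.0]
    print(m, v)            # 4.0 and 1.666... = 5/3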

Example 5.4. Suppose it has been established that for a certain type of client the average length of a home visit by a public health nurse is 45 minutes with a standard deviation of 15 minutes, and that for a second type of client the average home visit is 30 minutes long with a standard deviation of 20 minutes. If a nurse randomly visits 35 clients from the first group and 40 from the second group, what is the probability that the average length of home visit will differ between the two groups by 20 or more minutes?

µ1 = 45, µ2 = 30,
σ1 = 15, σ2 = 20,
n1 = 35, n2 = 40.

We don't know here whether the two populations are normal or not. But, since n1 > 30 and n2 > 30, the difference between the two sample means is approximately normally distributed with the following mean and variance:

µ_{X̄1−X̄2} = µ1 − µ2 = 45 − 30 = 15,

σ²_{X̄1−X̄2} = σ1²/n1 + σ2²/n2 = (15)²/35 + (20)²/40 = 16.4286,

and hence

P((X̄1 − X̄2) ≥ 20) = P(Z ≥ (20 − 15)/4.05)
                   = P(Z ≥ 1.23)
                   = 1 − P(Z < 1.23)
                   = 1 − 0.8907 = 0.1093.

Example 5.5. If all possible samples of size 16 are drawn from a normal population with mean equal to 50 and variance 25, what is the probability that a sample mean X̄ will fall in the interval from µX̄ − 1.9σX̄ to µX̄ − 0.4σX̄?

We know that µX̄ = µ = 50 and σX̄ = σ/√n = 5/4.

P(µX̄ − 1.9σX̄ < X̄ < µX̄ − 0.4σX̄) = P(−1.9 < Z < −0.4)
                                 = 0.3446 − 0.0287
                                 = 0.3159.

5.4 Sampling Distribution of the Sample Variance (S²)

When we draw a sample of size n from a normal population

with variance σ 2 , and the sample variance s2 is computed for

each sample, then we have obtained the values of a statistic S 2 .

In practice, the sampling distribution of S 2 has little application

in statistics. Instead, we shall consider the distribution of a ran-

dom variable X 2 , called chi-square, whose values are calculated

from each sample by the formula

χ² = (n − 1)s²/σ².

The distribution of X² = (n − 1)S²/σ² is referred to as the chi-square distribution with ν = n − 1 degrees of freedom.

Example 5.6. Find the probability that a random sample of size 25, from a normal population with σ² = 6, will have a variance (a) greater than 9.1, (b) between 3.462 and 10.745.

(a)
P(S² > 9.1) = P((n − 1)S²/σ² > (n − 1)(9.1)/σ²)
            = P(X² > 24 × 9.1/6)
            = P(X² > 36.4) = 0.05,

using the chi-square table with ν = 24 degrees of freedom.

(b)
P(3.462 < S² < 10.745) = P(24 × 3.462/6 < X² < 24 × 10.745/6)
                       = P(13.848 < X² < 42.98)
                       = 0.95 − 0.01 = 0.94.
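Both probabilities follow directly from the chi-square distribution with 24 degrees of freedom (a sketch using SciPy):

    from scipy.stats import chi2

    n, var, df = 25, 6.0, 24
    print(1 - chi2.cdf(df * 9.1 / var, df))     # (a) ~ 0.05
    print(chi2.cdf(df * 10.745 / var, df)
          - chi2.cdf(df * 3.462 / var, df))     # (b) ~ 0.94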

5.5 Sampling Distribution of the Sample Proportion

To know and understand the distribution of the sample propor-

tion, let us consider the following example

Example 5.7. Suppose that in a certain human population 0.08

are colorblind. If we designate a population proportion by p, we

can say that, in this example, p = 0.08. If a random sample of 150 individuals from this population is selected, what is the probability that the proportion in the sample who are colorblind will be greater than 0.15?

To answer this question we need to know the properties of

the sampling distribution of the sample proportion.

We will denote the sample proportion by p̂. When the sample size is large, the distribution of the sample proportion is approximately normal. The mean of the distribution, µP̂, that is, the average of all possible sample proportions, will be equal to the true population proportion, p, and the variance of the distribution, σP̂², will be equal to p(1 − p)/n. Then

z = (p̂ − p)/√(p(1 − p)/n),   (5.1)

is a value of an approximately standard normal random variable.

Example 5.8. In a random sample of 75 adults, 35 said they

felt that cancer of the breast is curable. If, in the population

from which the sample was drawn, the true proportion who feel

cancer of the breast can be cured is 0.55, what is the probability

of obtaining a sample proportion smaller than that obtained in

this sample?

n = 75, p = 0.55.

P(P̂ < 35/75) = P((P̂ − p)/√(p(1 − p)/n) < ((35/75) − p)/√(p(1 − p)/n))
             = P(Z < ((35/75) − 0.55)/√(0.55 × 0.45/75))
             = P(Z < −1.45) = 0.0735.


Relation (5.1) can be generalized to the difference between two sample proportions as follows:

z = ((p̂1 − p̂2) − (p1 − p2)) / √(p1(1 − p1)/n1 + p2(1 − p2)/n2),

where, for i = 1, 2, p̂i and pi are the sample proportion and population proportion of the ith sample and ith population, respectively, and ni is the sample size drawn from population i.
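Relation (5.1) is easy to apply numerically; the sketch below reproduces Example 5.8:

    from scipy.stats import norm

    n, p, phat = 75, 0.55, 35 / 75
    z = (phat - p) / (p * (1 - p) / n) ** 0.5
    print(z, norm.cdf(z))   # ~ -1.45 and 0.0735, as in Example 5.8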

Exercises V
1. If each observation in a sample is multiplied by k, show that

the sample variance becomes k 2 times its original value.

2. (i) Calculate the variance of the sample 3, 5, 8, 7, 5, and 7.

   (ii) Without calculating, state the variance of the sample 6, 10, 16, 14, 10, and 14.

   (iii) Without calculating, state the variance of the sample 25, 27, 30, 29, 27, and 29.

3. A finite population consists of the numbers 2, 4, and 7.

(i) Construct a frequency histogram for the sampling dis-

tribution of X̄ when samples of size 4 are drawn with

replacement.
(ii) Verify that µX̄ = µ and σX̄² = σ²/n.

(iii) Between what two values would you expect the middle

68% of the sample means to fall?

4. The heights of 1000 students are approximately normally

distributed with a mean of 68.5 inches and a standard de-

viation of 2.7 inches. If 200 random samples of size 25 are

drawn from this population, determine

(i) The expected mean and standard deviation of the sam-

pling distribution of the mean.

(ii) The number of sample means that fall between 66 and

69 inclusive.

(iii) The number of sample means falling below 65.

5. Find the probability that a random sample of 25 observations, from a normal population with variance σ² = 6, will have a variance S² (a) greater than 9.1, (b) between 3.462 and 10.745.

Chapter 6

POINT AND INTERVAL

ESTIMATIONS

Estimation is the first of the two general areas of statistical infer-

ence. The second general area, hypothesis testing, is examined

in the next chapter.

In this chapter we shall consider inferences about unknown

population parameters such as the mean, variance and propor-

tion.

Definition 6.1 (Statistical inference). The procedure whereby

inferences about a population are made on the basis of the re-

sults obtained from a sample drawn from that population is

called statistical inference.

6.1 Methods of Estimation

A population parameter can be estimated by a point or an inter-

val. A point estimate of some population parameter θ is a single

numerical value θ̂ of the statistic Θ̂. For example, the value x̄

of the statistic X̄, computed from a sample of size n, is a point

estimate of the population parameter µ. Similarly, s2 is a point

estimate of the population variance σ 2 .

An interval estimate of a population parameter, θ, is given by two values between which θ lies.

Definition 6.2 (Unbiased estimator). A statistic Θ̂ is said to

be an unbiased estimator of the parameter θ if E(Θ̂) = θ.

The sample mean, the difference between two sample means,

the sample proportion, the difference between two sample pro-

portions are unbiased estimates of their corresponding parame-

ters.

Example 6.1. Prove that S² is an unbiased estimator of σ².

Let X1, X2, . . . , Xn be a random sample of size n; that is, X1, X2, . . . , Xn are independent and identically distributed, each with mean µ and variance σ². Then

∵ X̄ = (X1 + X2 + · · · + Xn)/n,
∴ E[X̄] = E[X1/n] + E[X2/n] + · · · + E[Xn/n] = µ/n + µ/n + · · · + µ/n = µ.

Also, V[X̄] = E[(X̄ − µ)²] = σ²/n² + σ²/n² + · · · + σ²/n² = n(σ²/n²) = σ²/n.

Now,

Σ from i = 1 to n of (Xi − µ)² = Σ [(Xi − X̄) + (X̄ − µ)]²
                              = Σ (Xi − X̄)² + 2(X̄ − µ) Σ (Xi − X̄) + n(X̄ − µ)²
                              = (n − 1)S² + 0 + n(X̄ − µ)².

Taking the expectation of both sides, then

Σ from i = 1 to n of E[(Xi − µ)²] = (n − 1)E[S²] + nE[(X̄ − µ)²]

∴ nσ² = (n − 1)E[S²] + n(σ²/n)

(n − 1)σ² = (n − 1)E[S²]

∴ E[S²] = σ²,

and hence S² is an unbiased estimator of σ².
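Unbiasedness can also be seen by simulation: averaging S² (the n − 1 divisor) over many samples recovers σ², while the n divisor underestimates it by the factor (n − 1)/n. A sketch assuming NumPy:

    import numpy as np

    rng = np.random.default_rng(1)
    samples = rng.normal(0.0, 2.0, size=(100000, 5))  # sigma^2 = 4, n = 5
    print(samples.var(axis=1, ddof=1).mean())         # ~ 4.0  (unbiased S^2)
    print(samples.var(axis=1, ddof=0).mean())         # ~ 3.2 = (4/5) * 4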

6.2 Confidence Intervals

The interval I can be considered a confidence interval for the

population parameter, θ, if we can compute the probability that

I contains θ. This probability is called the confidence coefficient

of the interval.

The procedure of obtaining a confidence interval is to obtain

Q(Θ̂, θ), which is a function of the estimator Θ̂ and the parame-

ter θ such that the distribution of this quantity does not depend

on θ. For fixed α (usually 1% or 5%) we obtain the values Q1

and Q2 such that

P (Q1 ≤ Q(Θ̂, θ) ≤ Q2 ) = 1 − α.

By solving the inequality Q1 ≤ Q(Θ̂, θ) ≤ Q2 with respect to θ,

then

Q1 ≤ Q(Θ̂, θ) ≤ Q2 ⇐⇒ T1 ≤ θ ≤ T2 .

Then we can write

P (Q1 ≤ Q(Θ̂, θ) ≤ Q2 ) = P (T1 ≤ θ ≤ T2 ) = 1 − α,

where T1 and T2 are called the lower and upper limits, respec-

tively, 1 − α is called confidence coefficient.

6.2.1 Confidence interval for the population mean (µ) [σ known]

It is easy now to find a (1 − α)100% confidence interval for µ of a normal distribution with known variance σ². We know that Z ∼ N(0, 1), so taking

Z = (X̄ − µ)/(σ/√n) ≡ Q(X̄, µ),

P(−zα/2 < Z < zα/2) = P(−zα/2 < (X̄ − µ)/(σ/√n) < zα/2) = 1 − α,

P(−zα/2 σ/√n < X̄ − µ < zα/2 σ/√n) = 1 − α,

P(X̄ − zα/2 σ/√n < µ < X̄ + zα/2 σ/√n) = 1 − α.
Theorem 6.1. A (1 − α)100% confidence interval for µ, based on a random sample of size n with mean X̄, is

X̄ − zα/2 σ/√n < µ < X̄ + zα/2 σ/√n,   (6.1)

where zα/2 is the value of the standard normal random variable leaving an area α/2 to the right, i.e. P(Z > zα/2) = α/2.

6.2.2 Confidence interval for the population mean (µ) [σ unknown]

In this case we replace σ² by S² (the sample variance) and then the confidence interval becomes

X̄ − zα/2 S/√n < µ < X̄ + zα/2 S/√n;  n ≥ 30,

X̄ − tα/2 S/√n < µ < X̄ + tα/2 S/√n;  n < 30,

where tα/2 is the value of the random variable T having the t-distribution, with ν = n − 1 degrees of freedom, leaving an area α/2 to the right, i.e. P(T > tα/2) = α/2.

6.2.3 Determination of sample size for estimating means

We present now a method for determining the sample size required for estimating a population mean.

Let e denote the error in estimating the population mean, represented by the half-width of the interval (6.1). So

e = zα/2 σ/√n  ⟹  n = [zα/2 σ/e]².
Example 6.2. The average number of heartbeats per minute for a sample of 49 subjects was found to be 90. If the sample is taken from a normal population with variance 100, find 90%, 95% and 99% confidence intervals for the population mean.

X̄ = 90, σ² = 100, n = 49.

1 − α = 0.90 ⟹ α = 0.1 ⟹ α/2 = 0.05 ⟹ 1 − α/2 = 0.95 ⟹ zα/2 = 1.645.

Since the confidence interval is given by

X̄ − zα/2 σ/√n < µ < X̄ + zα/2 σ/√n,

the 90% confidence interval for µ is given by

90 − 1.645 × 10/7 < µ < 90 + 1.645 × 10/7.

Now,

1 − α = 0.95 ⟹ 1 − α/2 = 0.975 ⟹ zα/2 = 1.96,
1 − α = 0.99 ⟹ 1 − α/2 = 0.995 ⟹ zα/2 = 2.58.

So, 95% and 99% confidence intervals for µ are given, respectively, by

90 − 1.96 × 10/7 < µ < 90 + 1.96 × 10/7,
90 − 2.58 × 10/7 < µ < 90 + 2.58 × 10/7.

Example 6.3. Let X̄ be the mean of a random sample of size n from a distribution which is N(µ, 9). Find n such that P(X̄ − 1 < µ < X̄ + 1) = 0.9.

σ² = 9, e = 1.

1 − α = 0.9 ⟹ 1 − α/2 = 0.95 ⟹ zα/2 = 1.645, then

n = [zα/2 σ/e]² = [1.645 × 3/1]² ≈ 24.35, so we take n = 25 (rounding up to guarantee the stated confidence).

6.3 Confidence Interval for the Difference Between Two Population Means

Suppose that X1 , X2 , . . . , Xn1 is a random sample with mean

X̄ and variance S12 taken from a normal population with mean

µ1 and variance σ12 , and let Y1 , Y2 , . . . , Yn2 be another random

sample with mean Ȳ and variance S22 taken from a normal pop-

ulation with mean µ2 and variance σ22 , then we have the following

cases:

1. Confidence interval for µ1 − µ2 when σ1² and σ2² are known.

   (X̄ − Ȳ) − zα/2 √(σ1²/n1 + σ2²/n2) < µ1 − µ2 < (X̄ − Ȳ) + zα/2 √(σ1²/n1 + σ2²/n2)

   is a (1 − α)100% confidence interval for µ1 − µ2.

2. Confidence interval for µ1 − µ2 when σ1² and σ2² are unknown.

   If n1 and n2 are large (in practice both strictly greater than 60) and we cannot assume that σ1² = σ2², then

   (X̄ − Ȳ) − zα/2 √(S1²/n1 + S2²/n2) < µ1 − µ2 < (X̄ − Ȳ) + zα/2 √(S1²/n1 + S2²/n2)

   is an approximate (1 − α)100% confidence interval for µ1 − µ2.

3. Confidence interval for µ1 − µ2 when σ1² = σ2² = σ² (unknown).

   It is usual to combine the two separate estimators S1² (based on ν1 = n1 − 1 degrees of freedom) and S2² (based on ν2 = n2 − 1 degrees of freedom) into a single (pooled) estimator of the variance, Sp², which is given by

   Sp² = Σ νi Si² / Σ νi = ((n1 − 1)S1² + (n2 − 1)S2²)/(n1 + n2 − 2),

   and hence a (1 − α)100% confidence interval for µ1 − µ2 is then given by

   (X̄ − Ȳ) − tα/2 Sp W < µ1 − µ2 < (X̄ − Ȳ) + tα/2 Sp W,

   where tα/2 is the value of the random variable T having the t-distribution with n1 + n2 − 2 degrees of freedom and W = √(1/n1 + 1/n2).
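A sketch of the pooled-variance interval, using small hypothetical samples (the data below are illustrative only, not from the text):

    import numpy as np
    from scipy.stats import t

    x = np.array([64.0, 66.0, 61.0, 63.0, 65.0])   # hypothetical sample 1
    y = np.array([59.0, 58.0, 62.0, 60.0])         # hypothetical sample 2
    n1, n2 = len(x), len(y)
    sp2 = ((n1 - 1) * x.var(ddof=1) + (n2 - 1) * y.var(ddof=1)) / (n1 + n2 - 2)
    w = (1 / n1 + 1 / n2) ** 0.5
    half = t.ppf(0.975, n1 + n2 - 2) * sp2 ** 0.5 * w
    d = x.mean() - y.mean()
    print(d - half, d + half)   # 95% confidence interval for mu1 - mu2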

Exercises VI

1. An electrical firm manufactures light bulbs that have a

length of life that is approximately normally distributed,

with a standard deviation of 40 hours. If a random sample

of 30 bulbs has an average life of 780 hours, find a 96%

confidence interval for the population mean of all bulbs

produced by this firm.

2. How large a sample is needed in Exercise 1 if we wish to

be 96% confident that our sample mean will be within 10

hours of the true mean?

3. A random sample of 8 cigarettes of a certain brand has an

average nicotine content of 18.6 milligrams and a standard

deviation of 2.4 milligrams. Construct a 99% confidence

interval for the true average nicotine content of this kind of

cigarettes, assuming an approximate normal distribution.

4. Given two random samples of size n1 = 9 and n2 = 16,

from two independent normal populations, with x̄1 = 64,

x̄2 = 59, s1 = 6, and s2 = 5, find a 95% confidence interval

for µ1 − µ2 , assuming that σ1 = σ2 .

Chapter 7

TESTS OF HYPOTHESES

The purpose of hypothesis testing is to aid the clinician, re-

searcher, or administrator in reaching a decision concerning a

population by examining a sample from that population.

Definition 7.1 (Statistical hypothesis). A statistical hypothesis

is an assumption or statement, which may or may not be true,

concerning one or more populations.

Hypotheses that we formulate with the hope of rejecting are called null hypotheses, denoted by H0. The null hypothesis is sometimes referred to as a hypothesis of no difference, since it is a statement of agreement with (or no difference from) conditions presumed to be true in the population of interest.

The rejection of H0 leads to the acceptance of an alternative

hypothesis, denoted by H1 .

Definition 7.2. A type I error has been committed if we reject

the null hypothesis when it is true.

Definition 7.3. A type II error has been committed if we accept

the null hypothesis when it is false.

Definition 7.4. The probability of committing a type I error is

called the level of significance of the test and is denoted by α

i.e. α = P (type I error).

If the alternative hypothesis is one-sided, such as H1 : θ > θ0 or H1 : θ < θ0, the test is called a one-tailed test. The critical region for the alternative hypothesis θ > θ0 lies entirely in the right tail of the distribution, while the critical region for H1 : θ < θ0 lies entirely in the left tail. If the alternative hypothesis is H1 : θ ̸= θ0, then it is called a two-tailed test. The critical region here consists of two tails, one in the left corresponding to θ < θ0 and the other in the right corresponding to θ > θ0.

A test is said to be significant if the null hypothesis is re-

jected at the 0.05 level of significance, and is considered highly

significant if the null hypothesis is rejected at the 0.01 level of

significance.

7.1 Tests Concerning Means, Variances and Proportions

The steps for testing a hypothesis concerning a population pa-

rameter θ against some alternative hypothesis may be summa-

rized as follows:

1. Formulate the null hypothesis, H0 : θ = θ0 .

2. Formulate the alternative hypothesis, H1 : θ > θ0 , θ <

θ0 or θ ̸= θ0 .

3. Choose a level of significance equal to α which may be 0.05

or 0.01.

4. Select the appropriate test statistic and establish the criti-

cal region.

5. Compute the value of the statistic from a random sample

of size n.

6. Conclusion: Reject H0 if the statistic has a value in the

critical region, otherwise accept H0 .

Example 7.1. A doctor developed a new drug and claims that its efficacy has mean µ = 20 with a standard deviation of 0.5. Test the hypothesis that µ = 20 against the alternative that µ ̸= 20, if a random sample of 50 patients is tested and found to have mean x̄ = 19.8. Use the 0.01 level of significance.

1. H0 : µ = 20.

2. H1 : µ ̸= 20.

3. α = 0.01.

4. The test statistic is z = (x̄ − µ)/(σ/√n), so the critical region is z < −zα/2 or z > zα/2, i.e. z < −2.58 or z > 2.58.

5. Computation: x̄ = 19.8, n = 50,
   zc = (x̄ − µ)/(σ/√n) = (19.8 − 20)/(0.5/√50) = −2.828.

6. Conclusion: Reject H0, since zc < −2.58, and conclude that the difference is highly significant.
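The six steps reduce to a few lines of code (a sketch):

    from scipy.stats import norm

    xbar, mu0, sigma, n, alpha = 19.8, 20, 0.5, 50, 0.01
    zc = (xbar - mu0) / (sigma / n ** 0.5)
    zcrit = norm.ppf(1 - alpha / 2)
    print(zc, zcrit, abs(zc) > zcrit)   # -2.83, 2.58, True -> reject H0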

Example 7.2. Test the hypothesis that the average weight of containers of a particular lubricant is 10 ounces if the weights of a random sample of 10 containers are 10.2, 9.7, 10.1, 10.3, 10.1, 9.8, 9.9, 10.4, 10.3 and 9.8 ounces. Use a 0.01 level of significance and assume that the distribution of weights is normal.

n = 10, µ = 10,

x̄ = (1/10) Σ xi = 100.6/10 = 10.06,

s² = (1/(10 × 9)) [10 Σ xi² − (Σ xi)²]  ⟹  s = 0.245.

1. H0 : µ = 10.

2. H1 : µ ̸= 10.

3. α = 0.01.

4. The test statistic is t = (x̄ − µ)/(s/√n), so the critical region is t < −tα/2 or t > tα/2, i.e. t < −3.25 or t > 3.25.

5. Computation: tc = (x̄ − µ)/(s/√n) = (10.06 − 10)/(0.245/√10) = 0.774.

6. Conclusion: Accept H0, since −3.25 < tc < 3.25.

Example 7.3. Let the mean of a certain operation be µ = 50 minutes with standard deviation σ = 10 minutes. New equipment is used, and a random sample of size n = 12 gives a mean x̄ = 42 with standard deviation s = 11.9. Test the hypothesis that the mean µ = 50 against the alternative that µ < 50, using α = 0.05 and α = 0.01. Assume the population is normal.

1. H0 : µ = 50.

2. H1 : µ < 50.

3. α = 0.05.

4. The test statistic is t = (x̄ − µ)/(s/√n), so the critical region is t < −t(α, n−1), i.e. t < −1.796.

5. Computation: tc = (x̄ − µ)/(s/√n) = (42 − 50)/(11.9/√12) = −2.32.

6. Conclusion: Reject H0, since tc < −1.796.

If α = 0.01, the critical region is t < −2.718. In this case we accept H0, since tc > −2.718.

Example 7.4. A simple random sample of 15 nursing students who participated in an experiment took a test to measure manual dexterity. The variance of the sample observations was 1225. We want to know if we can conclude from these data that the population variance is different from 2500.

1. H0 : σ² = 2500.

2. H1 : σ² ̸= 2500.

3. α = 0.05.

4. The test statistic to be used is X² = (n − 1)S²/σ². The critical region is X² > χ²(α/2, n−1) or X² < χ²(1−α/2, n−1), i.e. X² > 26.119 or X² < 5.629.

5. Computations:
   χ²c = (n − 1)s²/σ² = 14 × 1225/2500 = 6.86.

6. Conclusion: Accept H0, since 5.629 < χ²c < 26.119. So, based on these data we are unable to conclude that the population variance is not 2500.

Example 7.5. A research team collected serum amylase data from a sample of healthy subjects and from a sample of hospitalized subjects. They wish to know if they would be justified in concluding that the population means are different. The data consist of serum amylase determinations on n2 = 15 healthy subjects and n1 = 22 hospitalized subjects. The sample means and standard deviations are as follows:

x̄1 = 120 units/ml, s1 = 40 units/ml,
x̄2 = 96 units/ml, s2 = 35 units/ml.

1. H0 : µ1 − µ2 = 0.

2. H1 : µ1 − µ2 ̸= 0.

3. α = 0.05.

4. The test statistic to be used is T = ((x̄1 − x̄2) − (µ1 − µ2)) / (Sp √(1/n1 + 1/n2)). The critical regions are T < −t(α/2, n1+n2−2) and T > t(α/2, n1+n2−2), i.e. T < −2.0301 or T > 2.0301.

5. Computations: the pooled variance is Sp² = ((n1 − 1)s1² + (n2 − 1)s2²)/(n1 + n2 − 2) = (21 × 1600 + 14 × 1225)/35 = 1450, so Sp = 38.08, and

   tc = ((120 − 96) − 0) / (38.08 √(1/15 + 1/22)) = 24/12.75 = 1.88.

6. Conclusion: Accept H0, since −2.0301 < tc < 2.0301.

Example 7.6. The manufacturer of a medicinal product claimed that it was 90% effective in relieving an allergy for a period of 8 hours. In a sample of 200 people who had the allergy, the medicine provided relief for 160 people. Determine whether the manufacturer's claim is legitimate.

Let p denote the probability of obtaining relief from the allergy by using the medicine. Then

1. H0 : p = 0.9, i.e. the claim is correct.

2. H1 : p < 0.9.

3. α = 0.01.

4. The test statistic to be used is Z = (P̂ − p)/√(p(1 − p)/n). The critical region is Z < −zα, i.e. Z < −2.33.

5. Computations:
   zc = (0.8 − 0.9)/√((0.9)(0.1)/200) = −4.71.

6. Conclusion: Reject H0, since zc lies in the critical (rejection) region.

When the null hypothesis to be tested is p1 − p2 = 0, we are hypothesizing that the two population proportions are equal. We use this as justification for combining the results of the two samples to come up with a pooled estimate of the hypothesized common proportion. If this procedure is adopted, one computes

p̄ = (x1 + x2)/(n1 + n2),

where x1 and x2 are the numbers in the first and second samples, respectively, possessing the characteristic of interest. This pooled estimate of p = p1 = p2 is used in computing σ̂(P̂1−P̂2), the estimated standard error of the estimator, as follows:

σ̂(P̂1−P̂2) = √(p̄(1 − p̄)/n1 + p̄(1 − p̄)/n2).

Then the test statistic becomes

Z = ((P̂1 − P̂2) − (p1 − p2)) / σ̂(P̂1−P̂2),   (7.1)

where Z ∼ N(0, 1) if the null hypothesis is true.

Example 7.7. In a study designed to compare a new treatment for migraine headache with the standard treatment, 78 of 100 subjects who received the standard treatment responded favorably, while 90 of 100 subjects who received the new treatment responded favorably. Do these data provide sufficient evidence to indicate that the new treatment is more effective than the standard? The answer is yes if we can reject the null hypothesis that the new treatment is no more effective than the standard.

p̂1 = 78/100, p̂2 = 90/100, p̄ = (90 + 78)/(100 + 100) = 0.84.

1. H0 : p2 − p1 ≤ 0.

2. H1 : p2 − p1 > 0.

3. α = 0.05.

4. The test statistic to be used is given by (7.1). The critical region is Z > zα, i.e. Z > 1.645.

5. Computations:
   zc = (0.90 − 0.78)/√(0.84 × 0.16/100 + 0.84 × 0.16/100) = 0.12/0.0518 = 2.32.

6. Conclusion: Reject H0, since zc > 1.645. So, these data suggest that the new treatment is more effective than the standard.
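A sketch of the pooled two-proportion test for this example:

    from scipy.stats import norm

    x1, n1, x2, n2 = 90, 100, 78, 100     # new vs. standard treatment
    pbar = (x1 + x2) / (n1 + n2)
    se = (pbar * (1 - pbar) * (1 / n1 + 1 / n2)) ** 0.5
    zc = (x1 / n1 - x2 / n2) / se
    print(zc, zc > norm.ppf(0.95))        # ~ 2.32 and True -> reject H0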

7.2 Goodness-of-Fit Test

We shall now consider a test to determine if some population has

a specified distribution. The test is based upon how good a fit

we have between the frequencies of occurrence of observations in

an observed sample and the expected frequencies obtained from

the hypothesized distribution.

Theorem 7.1. A goodness-of-fit test between observed and expected frequencies is based on the quantity

χ² = Σ from i = 1 to k of (oi − ei)²/ei,

where χ² is a value of the random variable X² whose sampling distribution is approximated very closely by the chi-square distribution. The symbols oi and ei represent the observed and expected frequencies, respectively, for the ith cell.

The number of degrees of freedom in a chi-square goodness

of fit test is equal to the number of cells minus the number of

quantities obtained from the observed data, which are used in

the calculations of the expected frequencies.

The critical region falls in the right tail of the chi-square distribution, X² > χ²α.

Remark 7.1.

1. If the number of degrees of freedom is equal to 1, a correction called Yates' correction for continuity is applied. The corrected formula for χ² then becomes

   χ² (corrected) = Σ from i = 1 to k of (|oi − ei| − 0.5)²/ei.

2. When some values of ei are less than 5, it is best to have each ei somewhat larger than 5, which can be done by grouping adjacent cells so that each grouped cell has ei > 5.

Example 7.8. The grades in a statistics course for a particular semester were as follows:

Grade:  A   B   C   D   F
f:      14  18  32  20  16

Test the hypothesis, at the 0.05 level of significance, that the distribution of grades is uniform.

The total number of students is n = 14 + 18 + 32 + 20 + 16 = 100, so the expected value for each grade is ei = np, where p = 1/5, according to the uniform distribution. We have:

Grade:       A   B   C    D   F   | Total
oi:          14  18  32   20  16  |
ei:          20  20  20   20  20  |
(oi − ei)²:  36  4   144  0   16  | 200

1. H0 : The distribution of grades is uniform.

2. H1 : The distribution of grades is not uniform.

3. α = 0.05.

4. The test statistic X², having the chi-square distribution, with the value χ² = Σ from i = 1 to 5 of (oi − ei)²/ei, is used. The critical region is X² > χ²(0.05, 4), i.e. X² > 9.488.

5. Computations:
   χ²c = Σ from i = 1 to 5 of (oi − ei)²/ei = 200/20 = 10.

6. Conclusion: Reject H0, since χ²c > 9.488, and hence we conclude that the distribution of grades is not uniform.
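SciPy's chisquare function performs this test directly (when no expected frequencies are supplied it assumes equal cell probabilities, which is exactly the uniform hypothesis here):

    from scipy.stats import chisquare

    observed = [14, 18, 32, 20, 16]
    result = chisquare(observed)              # expected = 20 in each cell
    print(result.statistic, result.pvalue)    # 10.0 and p ~ 0.040 < 0.05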

Example 7.9. A die is rolled 24 times and the following results are obtained:

Face:  1  2  3  4  5  6
oi:    6  5  2  3  0  8
ei:    4  4  4  4  4  4

Is it a fair die? Use the 0.05 level of significance.

In this experiment none of the expected frequencies exceeds 5, therefore it is necessary to combine cells. If successive pairs of cells are combined, the preceding empirical condition on ei will be satisfied, as shown in the following table:

            1 or 2  3 or 4  5 or 6  | Total
oi:         11      5       8       | 24
ei:         8       8       8       | 24
(oi − ei)²: 9       9       0       | 18

1. H0 : The die is fair.

2. H1 : The die is not fair.

3. α = 0.05.

4. We use in the calculations χ² = Σ from i = 1 to 3 of (oi − ei)²/ei. The critical region is χ² > χ²(0.05, 2) = 5.991.

5. Computations:
   χ²c = Σ from i = 1 to 3 of (oi − ei)²/ei = 18/8 = 2.25.

6. Conclusion: Accept H0, since χ²c < χ² (tabulated), and hence the die is fair.

7.2.1 Contingency tables

The contingency table is used for the purpose of studying the relationship between two variables, each of which has several levels. Consider, for example, the factor A classified into n levels (A1, . . . , An) and the factor B classified into m levels (B1, . . . , Bm), and suppose it is desired to test the hypothesis that there is no relationship between the two factors A and B. If oij denotes the observed frequency in the ith classification Ai of A and the jth classification Bj of B, let

Ri = Σ from j = 1 to m of oij;  i = 1, . . . , n,

Cj = Σ from i = 1 to n of oij;  j = 1, . . . , m,

N = Σ Ri = Σ Cj.

A\B:    B1   B2   ...  Bj   ...  Bm   | Total
A1:     o11  o12  ...  o1j  ...  o1m  | R1
A2:     o21  o22  ...  o2j  ...  o2m  | R2
...
Ai:     oi1  oi2  ...  oij  ...  oim  | Ri
...
An:     on1  on2  ...  onj  ...  onm  | Rn
Total:  C1   C2   ...  Cj   ...  Cm   | N

If the null hypothesis is satisfied (the two factors are independent), then we have, for i = 1, . . . , n, j = 1, . . . , m,

eij = Ri × Cj / N, and then χ² = Σ over i Σ over j of (oij − eij)²/eij.

The number of degrees of freedom is ν = (n − 1)(m − 1), and the null hypothesis will be rejected if χ² > χ²(α, ν).

Example 7.10. A random sample of 30 adults is classified according to sex and the number of hours they watch television during a week:

                 Male  Female
Over 25 hours:   5     9
Under 25 hours:  9     7

Using a 0.01 level of significance, test the hypothesis that a person's sex and time watching television are independent.

1. H0 : A person's sex and time watching television are independent.

2. H1 : A person's sex and time watching television are dependent.

3. α = 0.01.

4. We use χ² = Σ over i Σ over j of (oij − eij)²/eij, where the critical region is χ² > χ²(α, ν) = χ²(0.01, 1) = 6.635.

5. Computations:
   e11 = 14 × 14/30 = 6.5,  e12 = 14 × 16/30 = 7.5,
   e21 = 16 × 14/30 = 7.5,  e22 = 16 × 16/30 = 8.5.

                 Male     Female   | Total
Over 25 hours:   5 (6.5)  9 (7.5)  | 14
Under 25 hours:  9 (7.5)  7 (8.5)  | 16
Total:           14       16       | 30

   Since ν = 1, Yates' correction for continuity is applied (see Remark 7.1):

   χ²c = Σ (|oij − eij| − 0.5)²/eij
       = (1.5 − 0.5)²/6.5 + (1.5 − 0.5)²/7.5 + (1.5 − 0.5)²/7.5 + (1.5 − 0.5)²/8.5
       = 0.538.

6. Conclusion: Accept H0, since χ²c < 6.635.

Example 7.11. In an experiment to study the dependence of hypertension on smoking habits, the following data were taken on 180 individuals:

                  Nonsmokers  Moderate Smokers  Heavy Smokers
Hypertension:     21          36                30
No hypertension:  48          26                19

Test the hypothesis that the presence or absence of hypertension is independent of smoking habits. Use a 0.05 level of significance.

                  Nonsmokers  Moderate Smokers  Heavy Smokers  | Total
Hypertension:     21 (33.35)  36 (29.97)        30 (23.68)     | 87
No hypertension:  48 (35.65)  26 (32.03)        19 (25.32)     | 93
Total:            69          62                49             | 180

1. H0 : The presence or absence of hypertension is independent of smoking habits.

2. H1 : The presence or absence of hypertension is dependent on smoking habits.

3. α = 0.05.

4. We use χ² = Σ over i Σ over j of (oij − eij)²/eij, where the critical region is χ² > χ²(α, ν) = χ²(0.05, 2) = 5.991.

5. Computations:
   e11 = 87 × 69/180 = 33.35,  e21 = 93 × 69/180 = 35.65,
   e12 = 87 × 62/180 = 29.97,  e22 = 93 × 62/180 = 32.03,
   e13 = 87 × 49/180 = 23.68,  e23 = 93 × 49/180 = 25.32.

   χ²c = (21 − 33.35)²/33.35 + (36 − 29.97)²/29.97 + (30 − 23.68)²/23.68
       + (48 − 35.65)²/35.65 + (26 − 32.03)²/32.03 + (19 − 25.32)²/25.32
       = 14.46.

6. Conclusion: Reject H0, since χ²c > 5.991, and hence hypertension and smoking habits are dependent.
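The whole computation, including the expected frequencies, is reproduced by SciPy's chi2_contingency (a sketch; correction=False turns off Yates' correction, which applies only when ν = 1):

    import numpy as np
    from scipy.stats import chi2_contingency

    table = np.array([[21, 36, 30],
                      [48, 26, 19]])
    stat, pvalue, dof, expected = chi2_contingency(table, correction=False)
    print(stat, dof, pvalue)   # ~ 14.46, 2, p ~ 0.0007 -> reject H0
    print(expected)            # matches the e_ij above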

Exercises VII
1. Test the hypothesis that the average weight of containers

of a particular lubricant is 10 ounces if the weights of a

random sample of 10 containers are 10.2, 9.7, 10.1, 10.3,

10.1, 9.8, 9.9, 10.4, 10.3, and 9.8 ounces. Use 0.01 level of

significance and assume that the distribution of weights is

normal.

2. A farmer claims that the average yield of corn of variety

A exceeds the average yield of variety B by at least 12

bushels per acre. To test this claim, 50 acres of each vari-

ety are planted and grown under similar conditions. Vari-

ety A yielded, on the average, 86.7 bushels per acre with a

standard deviation of 6.28 bushels per acre, while variety B

yielded, on the average, 77.8 bushels per acre with a stan-

dard deviation of 5.61 bushels per acre. Test the farmer’s

claim using a 0.05 level of significance.

3. A cigarette-manufacturing firm distributes two brands of

cigarettes. If it is found that 56 of 200 smokers prefer brand

A and that 29 of 150 smokers prefer brand B, can we con-

clude at the 0.06 level of significance that brand A outsells

brand B?

4. A die is tossed 180 times with the following results:

   x:  1   2   3   4   5   6
   f:  28  36  30  36  23  27

   Is this a balanced die? Use a 0.05 level of significance.

5. In a shop study, a set of data was collected to determine whether or not the proportion of defectives produced by workers was the same for the day, evening, or midnight shift worked. The following data were collected on the items produced:

                  Day  Evening  Midnight
   Defective:     45   55       70
   Nondefective:  905  890      870

   What is your conclusion? Use an α = 0.025 level of significance.

REFERENCES

Guttman, I.; Wilks, S. and Hunter, J. (1982). Introductory Engineering Statistics. 3rd Ed., John Wiley & Sons, Inc.

Hogg, R.; McKean, J. and Craig, A. (2005). Introduction to Mathematical Statistics. 6th Ed., Pearson Prentice Hall.

Mood, A.; Graybill, F. and Boes, D. (1982). Introduction to the Theory of Statistics. 3rd Ed., 12th printing, McGraw-Hill.

Rosner, B. (1982). Fundamentals of Biostatistics. PWS Publishers, Duxbury Press, Boston, Massachusetts.
