
Contents

2 Discrete Probability Distributions 2


2.1 Definitions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
2.1.1 Discrete Random Variable . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
2.1.2 Probability Mass Function (pmf) . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
2.1.3 Cumulative Distribution Function (cdf) . . . . . . . . . . . . . . . . . . . . . . . . 2
2.1.4 Expectation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
2.1.5 Variance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
2.1.6 Standard Deviation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
2.1.7 Moments and moment generating function . . . . . . . . . . . . . . . . . . . . . . . 6
2.2 Geometric Distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
2.2.1 Negative Binomial Distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
2.3 Binomial Distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
2.3.1 Multinomial Distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
2.4 Hypergeometric Distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
2.4.1 Binomial distribution as a limiting case of hypergeometric distribution . . . . . . . 15
2.4.2 Generalization of the hypergeometric distribution . . . . . . . . . . . . . . . . . . . 16
2.5 Poisson Distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
2.6 Uniform Distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
2.7 Misc. Practice Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19

Chapter 2

Discrete Probability Distributions

Note: These lecture notes aim to give a clear and crisp presentation of some topics in Probability and
Statistics. Comments/suggestions are welcome via e-mail to Dr. Suresh Kumar: sukuyd@gmail.com.

2.1 Definitions
2.1.1 Discrete Random Variable
Suppose a random experiment results in finitely many or countably infinitely many outcomes, with sample space S.
Then a variable X taking real values x corresponding to each outcome of the random experiment (or each
element of S) is called a discrete random variable. In other words, the discrete random variable X
is a function from the sample space S to the set of real numbers. So, in principle, the discrete random
variable X, being a function on S, can be defined in any way we choose.

2.1.2 Probability Mass Function (pmf )


A function f is said to be the probability mass function of a discrete random variable X if it satisfies the
following three conditions:
(i) f(x) ≥ 0 for each value x of X.
(ii) f(x) = P(X = x), that is, f(x) provides the probability for each value x of X.
(iii) \sum_{x} f(x) = 1, that is, the sum of the probabilities of all values x of X is 1.

2.1.3 Cumulative Distribution Function (cdf )


A function F defined by

F(x) = \sum_{X \le x} f(x)

is called the cumulative distribution function of X. Therefore, F(x) is the sum of the probabilities of all the
values of X starting from its lowest value up to the value x.

Example with finite sample space


Consider the random experiment of tossing two fair coins. Then the sample space is

S = {HH, HT, T H, T T }.

Let X denote the number of heads. Then X is a discrete random variable, namely the function from the
sample space S onto the set {0, 1, 2}, that is,

X : S = {HH, HT, T H, T T } → {0, 1, 2}.

since X(HH) = 2, X(HT ) = 1, X(T H) = 1 and X(T T ) = 0. In tabular form, it can be displayed as

Outcome HH HT TH TT

X=x 2 1 1 0

So here the discrete random variable X assumes only three values: X = 0, 1, 2.


We find that P(X = 0) = 1/4, P(X = 1) = 1/2 and P(X = 2) = 1/4. It is easy to see that the function f
given by

X = x                 0     1     2
f(x) = P(X = x)      1/4   1/2   1/4

is pmf of X. It gives the probability distribution of X.


The cumulative distribution function F of X is given by

X = x                 0     1     2
F(x) = P(X ≤ x)      1/4   3/4    1

Remark: Note that X is a function with domain as the sample space S. So, in the above example, X
could also be defined as the number of tails, and accordingly we could write its pmf and cdf.

Example with countably infinite sample space


Suppose a fair coin is tossed again and again till head appears. Then the sample space is

S = {H, T H, T T H, T T T H, . . . }.

The outcome H corresponds to the possibility of getting head in the first toss. The outcome TH corresponds
to the possibility of getting tail in the first toss and head in the second toss. Likewise, TTH corresponds
to the possibility of getting tails in the first two tosses and head in the third toss, and so on. If X denotes
the number of tosses in this experiment, then X is a function from the sample space S to the set of natural
numbers, and is given by

Outcome H TH TTH ···


X=x 1 2 3 ···

So here the discrete random variable X assumes countably infinite values X = 1, 2, 3, · · · .

The pmf of X is given by

X = x                 1       2         3       ···
f(x) = P(X = x)      1/2   (1/2)^2   (1/2)^3    ···

It can also be written in the closed form

f(x) = (1/2)^x, x = 1, 2, 3, ....

Notice that f(x) ≥ 0 for all x and

\sum_{x} f(x) = \sum_{x=1}^{\infty} (1/2)^x = \frac{1/2}{1 - 1/2} = 1   (∵ the sum of the infinite G.P. a + ar + ar^2 + ... is \frac{a}{1-r}).

The cumulative distribution function F of X is given by

F(x) = \sum_{X \le x} f(x) = \frac{\frac{1}{2}(1 - (1/2)^x)}{1 - \frac{1}{2}} = 1 - (1/2)^x, where x = 1, 2, 3, ....

Note. Determining the cdf can be very useful. For instance, in the above example, suppose it is required
to calculate P(10 ≤ X ≤ 30). Here, one option is to sum all the probabilities from P(X = 10) to
P(X = 30). Instead, we use the cdf to obtain

P(10 ≤ X ≤ 30) = F(30) − F(9) = (1 − 1/2^30) − (1 − 1/2^9) = 1/2^9 − 1/2^30.
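As a quick numerical illustration (our own sketch, not part of the original notes; the helper names are ours), the following Python lines compare the direct summation with the cdf shortcut for this example.

def pmf(x):
    return 0.5 ** x          # f(x) = (1/2)^x, x = 1, 2, 3, ...

def cdf(x):
    return 1 - 0.5 ** x      # F(x) = 1 - (1/2)^x

direct = sum(pmf(x) for x in range(10, 31))   # P(X = 10) + ... + P(X = 30)
via_cdf = cdf(30) - cdf(9)                    # F(30) - F(9)
print(direct, via_cdf)                        # both equal 1/2^9 - 1/2^30 ≈ 0.001953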

2.1.4 Expectation
Let X be a random variable with pmf f. Then, the expectation of X, denoted by E(X), is defined as

E(X) = \sum_{x} x f(x).

More generally, if H(X) is a function of the random variable X, then we define

E(H(X)) = \sum_{x} H(x) f(x).

Ex. Let X denote the number of heads in a toss of two fair coins. Then X assumes the values 0, 1 and
2 with probabilities 1/4, 1/2 and 1/4 respectively. So E(X) = 0 × 1/4 + 1 × 1/2 + 2 × 1/4 = 1.

Note: (1) The expectation E(X) of the random variable X is the theoretical average or mean value of
X. In a statistical setting, the average value, mean value¹ and expected value are synonyms. The mean
value is denoted by µ. So E(X) = µ.

(2) If X is a random variable, then it is easy to verify the following:


(i) E(c) = c
(ii) E(cX) = cE(X)
(iii) E(cX + d) = cE(X) + d
(iv) E(cH(X) + dG(X)) = cE(H(X)) + dE(G(X))
where c, d are constants, and H(X) and G(X) are functions of X. Thus, expectation respects the linearity
property.

¹From your high school mathematics, you know that if we have n distinct values x_1, x_2, ...., x_n with frequencies f_1, f_2, ...., f_n respectively and \sum_{i=1}^{n} f_i = N, then the mean value is

µ = \sum_{i=1}^{n} \frac{f_i x_i}{N} = \sum_{i=1}^{n} \frac{f_i}{N} x_i = \sum_{i=1}^{n} f(x_i) x_i,

where f(x_i) = f_i/N is the probability of occurrence of x_i in the given data set. Obviously, the final expression for µ is the expectation of a random variable X assuming the values x_i with probabilities f(x_i).

(3) The expected or the mean value of the random variable X is a measure of the location of the center
of values of X.
Ex. A lot containing 7 components is sampled by a quality inspector; the lot contains 4 good compo-
nents and 3 defective components. A sample of 3 is taken by the inspector. Find the expected value of
the number of good components in this sample.
Sol. Let X represent the number of good components in the sample. Then the probability distribution of X is

f(x) = \frac{\binom{4}{x}\binom{3}{3-x}}{\binom{7}{3}}, x = 0, 1, 2, 3.

Simple calculations yield f(0) = 1/35, f(1) = 12/35, f(2) = 18/35, and f(3) = 4/35. Therefore,

µ = E(X) = \sum_{x=0}^{3} x f(x) = \frac{12}{7} ≈ 1.7.
Thus, if a sample of size 3 is selected at random over and over again from a lot of 4 good components
and 3 defective components, it will contain, on average, 1.7 good components.
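For readers who wish to verify this arithmetic, here is a small Python sketch (our own illustration, using only the standard library; the helper name f is ours) that reproduces E(X) = 12/7.

from math import comb

def f(x):
    # f(x) = C(4, x) C(3, 3 - x) / C(7, 3), x = 0, 1, 2, 3
    return comb(4, x) * comb(3, 3 - x) / comb(7, 3)

mean = sum(x * f(x) for x in range(4))
print(mean)   # 12/7 ≈ 1.714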

Ex. A salesperson for a medical device company has two appointments on a given day. At the first
appointment, he believes that he has a 70% chance to make the deal, from which he can earn $ 1000
commission if successful. On the other hand, he thinks he only has a 40% chance to make the deal at
the second appointment, from which, if successful, he can make $1500. What is his expected commission
based on his own probability belief? Assume that the appointment results are independent of each other.
Sol. First, we know that the salesperson, for the two appointments, can have 4 possible commission
totals: $0, $1000, $1500, and $2500. We then need to calculate their associated probabilities. By inde-
pendence, we obtain
f (0) = (1 − 0.7)(1 − 0.4) = 0.18,
f (2500) = (0.7)(0.4) = 0.28,
f (1000) = (0.7)(1 − 0.4) = 0.42,
f (1500) = (1 − 0.7)(0.4) = 0.12.

Therefore, the expected commission for the salesperson is


E(X) = (0)(0.18) + (1000)(0.42) + (1500)(0.12) + (2500)(0.28) = $1300.

2.1.5 Variance
Let X and Y be two random variables assuming the values X = 1, 9 and Y = 4, 6. We observe that both
the variables have the same mean values given by µX = µY = 5. However, we see that the values of X
are far away from the mean or the central value 5 in comparison to the values of Y. Thus, the mean
value of a random variable does not account for its variability. In this regard, we define a new parameter
known as variance. It is defined as follows.

If X is a random variable with mean µ, then its variance, denoted by V (X) is defined as the expec-
tation of (X − µ)2 . So, we have
V(X) = E((X − µ)^2) = E(X^2) + µ^2 − 2µE(X) = E(X^2) + E(X)^2 − 2E(X)E(X) = E(X^2) − E(X)^2.

Ex. Let X denote the number of heads in a toss of two fair coins. Then X assumes the values 0, 1 and
2 with probabilities 1/4, 1/2 and 1/4 respectively. So
E(X) = 0 × 1/4 + 1 × 1/2 + 2 × 1/4 = 1,
E(X^2) = (0)^2 × 1/4 + (1)^2 × 1/2 + (2)^2 × 1/4 = 3/2.
∴ V(X) = 3/2 − 1 = 1/2.

Note: (i) The variance V(X) of the random variable X is also denoted by σ^2. So V(X) = σ^2.
(ii) If X is a random variable and c is a constant, then it is easy to verify that V(c) = 0 and V(cX) = c^2 V(X).
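A short Python sketch (our own illustration) that recomputes the mean and variance of the two-coin example via V(X) = E(X^2) − E(X)^2:

pmf = {0: 0.25, 1: 0.5, 2: 0.25}   # number of heads in two fair-coin tosses

mean = sum(x * p for x, p in pmf.items())      # E(X) = 1
ex2 = sum(x**2 * p for x, p in pmf.items())    # E(X^2) = 3/2
var = ex2 - mean**2                            # V(X) = 1/2
print(mean, var)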

2.1.6 Standard Deviation


The variance of a random variable is, by definition, the expectation of the squared deviations of the values
of the random variable from the mean value. So variance carries the squared units of the original data, and
hence often lacks a direct physical interpretation. To overcome this problem, a second measure of variability
is employed, known as the standard deviation, defined as follows.
Let X be a random variable with variance σ^2. Then the standard deviation of X, denoted by σ, is the
non-negative square root of the variance, that is,

σ = \sqrt{V(X)}.
Note: A large standard deviation implies that the random variable X is rather inconsistent and somewhat
hard to predict. On the other hand, a small standard deviation is an indication of consistency and stability.

2.1.7 Moments and moment generating function


Let X be a random variable and k be any positive integer. Then E(X^k) defines the kth ordinary moment
of X. Obviously, E(X) = µ is the first ordinary moment, E(X^2) is the second ordinary moment, and so on.
Further, the ordinary moments can be obtained from the function E(e^{tX}). For, the ordinary moments
E(X^k) are the coefficients of \frac{t^k}{k!} in the expansion

E(e^{tX}) = 1 + t E(X) + \frac{t^2}{2!} E(X^2) + \cdots

Also, we observe that

E(X^k) = \frac{d^k}{dt^k}\left[E(e^{tX})\right]_{t=0}.

Thus, the function E(etX ) generates all the ordinary moments. That is why, it is known as the moment
generating function and is denoted by mX (t). Thus, mX (t) = E(etX ).
In general, the kth moment of a random variable X about any point a is defined as E((X − a)^k).
Obviously, a = 0 for the ordinary moments. Further, E(X − µ_X) = 0 and E((X − µ_X)^2) = σ_X^2. So the
first moment about the mean is 0, while the second moment about the mean yields the variance.

2.2 Geometric Distribution
The geometric distribution arises under the following conditions:
(i) The random experiment consists of a series of independent trials.
(ii) Each trial results in one of two outcomes, namely success (S) and failure (F), which have constant
probabilities p and q = 1 − p, respectively.ᵃ
(iii) X denotes the number of trials required to obtain the first success.
ᵃ Such trials are called Bernoulli trials.
For X = x, the first x − 1 trials result in failures (each with probability q) and the xth trial results in a
success (with probability p). Further, all the trials are independent. So the probability of obtaining the
first success on the xth trial is q^{x-1} p. Thus, the pmf of X, denoted by g(x; p), is given by

g(x; p) = q^{x-1} p, x = 1, 2, 3, ...

The random variable X with this pmf is called a geometric random variable. The name 'geometric' is used
because the probabilities p, qp, q^2 p, .... in succession constitute a geometric progression. Given the value
of the parameter p, the probability distribution of the geometric random variable X is uniquely described.

Mean, variance and mgf of geometric random variable


For the geometric random variable X, we have

(i) µ_X = E(X) = \sum_{x=1}^{\infty} x g(x; p)
= \sum_{x=1}^{\infty} x q^{x-1} p
= p \sum_{x=1}^{\infty} x q^{x-1}
= p \frac{d}{dq}\left(\sum_{x=1}^{\infty} q^x\right)

(∵ term-by-term differentiation is permissible for the convergent power series \sum_{x=1}^{\infty} q^x within its interval of convergence |q| < 1.)

= p \frac{d}{dq}\left(\frac{q}{1-q}\right)
= p \frac{1}{(1-q)^2}
= \frac{1}{p}
(ii) σ_X^2 = E(X^2) − E(X)^2 = E(X(X − 1)) + E(X) − E(X)^2
= p \sum_{x=1}^{\infty} x(x-1) q^{x-1} + \frac{1}{p} − \frac{1}{p^2}
= pq \sum_{x=1}^{\infty} x(x-1) q^{x-2} − \frac{q}{p^2}   (using \frac{1}{p} − \frac{1}{p^2} = −\frac{q}{p^2})
= pq \frac{d^2}{dq^2}\left(\frac{q}{1-q}\right) − \frac{q}{p^2}
= pq \frac{d}{dq}\left(\frac{1}{(1-q)^2}\right) − \frac{q}{p^2}
= pq \frac{2}{(1-q)^3} − \frac{q}{p^2}
= pq \frac{2}{p^3} − \frac{q}{p^2}
= \frac{2q}{p^2} − \frac{q}{p^2}
= \frac{q}{p^2}.

(iii) m_X(t) = E(e^{tX}) = \sum_{x=1}^{\infty} e^{tx} g(x; p)
= p \sum_{x=1}^{\infty} e^{tx} q^{x-1}
= \frac{p}{q} \sum_{x=1}^{\infty} (q e^t)^x
= \frac{p}{q} \cdot \frac{q e^t}{1 − q e^t}   (t < − ln q)
= \frac{p e^t}{1 − q e^t}
Remark: Note that we can easily obtain E(X) and E(X 2 ) from the moment generating function mX (t)
by using

E(X^k) = \frac{d^k}{dt^k}\left[m_X(t)\right]_{t=0},
for k = 1 and k = 2 respectively. In other words the first and second t-derivatives of mX (t) at t = 0
provide us E(X) and E(X 2 ), respectively. Hence we easily get mean and variance from the moment
generating function. Verify!
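One way to carry out this verification symbolically is sketched below in Python, assuming the sympy package is available; it differentiates the geometric mgf at t = 0 (our own illustration, not part of the original notes).

import sympy as sp

t, p = sp.symbols('t p', positive=True)
q = 1 - p
m = p * sp.exp(t) / (1 - q * sp.exp(t))         # geometric mgf

EX = sp.simplify(sp.diff(m, t).subs(t, 0))       # first derivative at t = 0
EX2 = sp.simplify(sp.diff(m, t, 2).subs(t, 0))   # second derivative at t = 0
print(EX)                         # 1/p
print(sp.simplify(EX2 - EX**2))   # (1 - p)/p**2, i.e. q/p^2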

Ex. A fair coin is tossed again and again till head appears. If X denotes the number of tosses in this
experiment, then find the probability distribution of X.
Sol. X is a geometric random variable with p = 1/2. So its pmf is g(x) = (1/2)^x, x = 1, 2, 3, ....

Ex. For a certain manufacturing process, it is known that, on the average, 1 in every 100 items is
defective. What is the probability that the fifth item inspected is the first defective item found?
Sol. Here p = 1/100 = 0.01 and x = 5. So the required probability is (0.01)(0.99)^4 = 0.0096.
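A minimal Python sketch of the geometric pmf, checking the defective-item example (the function name is ours):

def geom_pmf(x, p):
    # g(x; p) = q^(x-1) p, x = 1, 2, 3, ...
    return (1 - p) ** (x - 1) * p

print(geom_pmf(5, 0.01))   # ≈ 0.0096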

2.2.1 Negative Binomial Distribution


In the geometric distribution, X is the number of trials to obtain the first success. Its more general version is
that we choose X as the number of trials to obtain the kth success. Then X is called a negative binomial
random variable with the values X = k, k + 1, k + 2, .... The final trial among the x trials results in a
success, so the remaining k − 1 successes can occur in \binom{x-1}{k-1} ways among the first x − 1 trials.
Overall, there are k successes (each with probability p) and x − k failures (each with probability q), where
the first k − 1 successes can occur in \binom{x-1}{k-1} ways. Also, all the trials are independent. It follows
that the pmf of the negative binomial random variable X, denoted by nb(x; k, p), is given by

nb(x; k, p) = \binom{x-1}{k-1} p^k q^{x-k}, x = k, k + 1, k + 2, ....

If we make a change of variable via y = x − k, then

nb(y; k, p) = \binom{k+y-1}{k-1} p^k q^y = \binom{k+y-1}{y} p^k q^y, y = 0, 1, 2, ....

Here we have used the well-known result \binom{n}{x} = \binom{n}{n-x}.

Note that (1 − q)^{-k} = \sum_{y=0}^{\infty} \binom{k+y-1}{y} q^y is a negative binomial series.

Now, let us show that \sum_{x=k}^{\infty} nb(x; k, p) = 1. For,

\sum_{x=k}^{\infty} nb(x; k, p) = \sum_{x=k}^{\infty} \binom{x-1}{k-1} p^k q^{x-k}
= \sum_{y=0}^{\infty} \binom{y+k-1}{k-1} p^k q^y, where y = x − k
= p^k \sum_{y=0}^{\infty} \binom{y+k-1}{y} q^y
= p^k (1 − q)^{-k}
= p^k p^{-k}
= 1.

Mean, variance and mgf of negative binomial random variable


For the negative binomial random variable X, we have

(i) µ_X = E(X) = \sum_{x=k}^{\infty} x nb(x; k, p)
= \sum_{x=k}^{\infty} x \binom{x-1}{k-1} p^k q^{x-k}
= \sum_{x=k}^{\infty} k \binom{x}{k} p^k q^{x-k}
= \frac{k}{p} \sum_{x=k}^{\infty} \binom{x}{k} p^{k+1} q^{x-k}
= \frac{k}{p} \sum_{y=k+1}^{\infty} \binom{y-1}{(k+1)-1} p^{k+1} q^{y-(k+1)}, where x = y − 1,
= \frac{k}{p} \sum_{y=k+1}^{\infty} nb(y; k+1, p)
= \frac{k}{p} \cdot 1
= \frac{k}{p}

(ii) E((X + 1)X) = \sum_{x=k}^{\infty} (x+1)x nb(x; k, p)
= \sum_{x=k}^{\infty} (x+1)x \binom{x-1}{k-1} p^k q^{x-k}
= \sum_{x=k}^{\infty} (k+1)k \binom{x+1}{k+1} p^k q^{x-k}
= \frac{k(k+1)}{p^2} \sum_{x=k}^{\infty} \binom{x+1}{k+1} p^{k+2} q^{x-k}
= \frac{k(k+1)}{p^2} \sum_{y=k+2}^{\infty} \binom{y-1}{(k+2)-1} p^{k+2} q^{y-(k+2)}, where x = y − 2,
= \frac{k(k+1)}{p^2} \sum_{y=k+2}^{\infty} nb(y; k+2, p)
= \frac{k(k+1)}{p^2} \cdot 1
= \frac{k(k+1)}{p^2}

So V(X) = E((X + 1)X) − E(X) − E(X)^2 = \frac{k(k+1)}{p^2} − \frac{k}{p} − \frac{k^2}{p^2} = \frac{kq}{p^2}.
(iii) m_X(t) = E(e^{tX}) = \sum_{x=k}^{\infty} e^{tx} nb(x; k, p) = \sum_{x=k}^{\infty} e^{tx} \binom{x-1}{k-1} p^k q^{x-k}
= \sum_{y=0}^{\infty} \binom{y+k-1}{k-1} p^k q^y e^{t(y+k)}, where y = x − k
= p^k e^{tk} \sum_{y=0}^{\infty} \binom{y+k-1}{y} (q e^t)^y
= (p e^t)^k (1 − q e^t)^{-k}
= \left(\frac{p e^t}{1 − q e^t}\right)^{k}
Ex. A fair coin is tossed again and again till head appears the third time. If X denotes the number of tosses
in this experiment, find the probability distribution of X.
Sol. X is a negative binomial random variable with p = 1/2, k = 3. So its pmf is nb(x; 3, 1/2) = \binom{x-1}{2} (1/2)^x,
x = 3, 4, 5, ....

Ex. In a championship series, the team that wins four games out of seven is the winner. Suppose that
teams A and B face each other in the championship games and that team A has probability 0.55 of
winning a game over team B.
(a) What is the probability that team A will win the series in 6 games?
(b) What is the probability that team A will win the series?
Sol. (a) Here x = 6, k = 4, p = 0.55. So the required probability is
nb(6; 4, 0.55) = \binom{6-1}{4-1} (0.55)^4 (1 − 0.55)^{6-4} = 0.1853.

(b) The team A can win the championship series in 4th or 5th or 6th or the 7th game. So required
probability is
nb(4; 4, 0.55) + nb(5; 4, 0.55) + nb(6; 4, 0.55) + nb(7; 4, 0.55) = 0.6083.
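The two championship probabilities can be checked with the short Python sketch below (our own illustration; the helper name nb mirrors the notation of the notes):

from math import comb

def nb(x, k, p):
    # P(kth success occurs on trial x) = C(x-1, k-1) p^k q^(x-k)
    return comb(x - 1, k - 1) * p**k * (1 - p) ** (x - k)

print(nb(6, 4, 0.55))                            # (a) ≈ 0.1853
print(sum(nb(x, 4, 0.55) for x in range(4, 8)))  # (b) ≈ 0.6083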

2.3 Binomial Distribution
The binomial distribution arises under the following conditions:
(i) The random experiment consists of a finite number n of independent trials.
(ii) Each trial results in one of two outcomes, namely success (S) and failure (F), which have constant
probabilities p and q = 1 − p, respectively, in each trial.
(iii) X denotes the number of successes in n trials.
There are X = x successes (each with probability p) and n − x failures (each with probability q).
Further, all the trials are independent, and the x successes can take place in \binom{n}{x} ways. So the
probability of x successes in n trials is \binom{n}{x} q^{n-x} p^x. Thus, the pmf of X, denoted by b(x; n, p), is given by

b(x; n, p) = \binom{n}{x} q^{n-x} p^x, x = 0, 1, 2, 3, ..., n.

The random variable X with this pmf is called a binomial random variable. The name 'binomial' is used
because the probabilities \binom{n}{0} q^n, \binom{n}{1} q^{n-1} p, ..., \binom{n}{n} p^n in succession are the terms in the binomial expansion
of (q + p)^n. Once the values of the parameters n and p are given/determined, the pmf uniquely describes
the binomial distribution of X.

Mean, variance and mgf of binomial random variable


For the binomial random variable X, we have
(i) µ_X = E(X) = \sum_{x=0}^{n} x b(x; n, p)
= \sum_{x=0}^{n} x \binom{n}{x} q^{n-x} p^x
= np \sum_{x=1}^{n} \binom{n-1}{x-1} q^{n-x} p^{x-1}
= np \sum_{y=0}^{n-1} \binom{n-1}{y} q^{n-1-y} p^y   (where y = x − 1)
= np(p + q)^{n-1} = np.
(ii) E(X(X − 1)) = \sum_{x=0}^{n} x(x − 1) b(x; n, p)
= \sum_{x=0}^{n} x(x − 1) \binom{n}{x} q^{n-x} p^x
= n(n − 1)p^2 \sum_{x=2}^{n} \binom{n-2}{x-2} q^{n-x} p^{x-2}
= n(n − 1)p^2 \sum_{y=0}^{n-2} \binom{n-2}{y} q^{n-2-y} p^y   (where y = x − 2)
= n(n − 1)p^2 (p + q)^{n-2} = n(n − 1)p^2.

So σ_X^2 = E(X^2) − E(X)^2 = E(X(X − 1)) + E(X) − E(X)^2 = n(n − 1)p^2 + np − n^2 p^2 = npq.
(iii) m_X(t) = E(e^{tX}) = \sum_{x=0}^{n} e^{tx} b(x; n, p)
= \sum_{x=0}^{n} e^{tx} \binom{n}{x} q^{n-x} p^x
= \sum_{x=0}^{n} \binom{n}{x} q^{n-x} (p e^t)^x
= (q + p e^t)^n

Note: In the particular case n = 1, the binomial distribution is called Bernoulli distribution:

b(x; 1, p) = q^{1-x} p^x, x = 0, 1.

Ex. Suppose a die is tossed 5 times. What is the probability of getting exactly 2 fours?
Sol. Here n = 5, p = 1/6, x = 2, and therefore
P(X = 2) = b(2; 5, 1/6) = \binom{5}{2} (1 − 1/6)^{5-2} (1/6)^2 = 0.161.

Ex. In a bombing attack, there is a 50% chance that any bomb will strike the target. At least two direct
hits are required to destroy the target. How many minimum number of bombs must be dropped so that
the probability of hitting the target at least twice is more than 0.99?
Sol. Let n bombs be dropped so that there is at least a 99% chance of hitting the target at least twice.
Let X be the random variable representing the number of bombs striking the target. Then X = 0, 1, 2, ...., n
follows a binomial distribution with p = 1/2, and therefore
P(X ≥ 2) ≥ 0.99 or 1 − P(X = 0) − P(X = 1) ≥ 0.99.
It can be simplified to get 2^n ≥ 100 + 100n. This inequality is satisfied if n ≥ 11. So at least 11 bombs
must be dropped so that there is at least a 99% chance of hitting the target at least twice.
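The inequality can also be checked by brute force, as in the following Python sketch (our own illustration):

from math import comb

def p_at_least_two(n, p=0.5):
    p0 = (1 - p) ** n                          # P(X = 0)
    p1 = comb(n, 1) * p * (1 - p) ** (n - 1)   # P(X = 1)
    return 1 - p0 - p1

n = 1
while p_at_least_two(n) <= 0.99:
    n += 1
print(n)   # 11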

2.3.1 Multinomial Distribution


The binomial experiment becomes a multinomial experiment if we let each trial have more than two
possible outcomes. For example, the drawing of a card from a deck with replacement is a multinomial
experiment if the 4 suits are the outcomes of interest.
In general, if a given trial can result in any one of k possible outcomes o1 , o2 ,. . . , ok with probabilities
p1 , p2 ,. . . , pk , then the multinomial distribution gives the probability that o1 occurs x1 times, o2 occurs
x2 times, . . . , and ok occurs xk times in n independent trials, as follows:
 
f(x_1, x_2, ..., x_k) = \binom{n}{x_1, x_2, ..., x_k} p_1^{x_1} p_2^{x_2} \cdots p_k^{x_k},

where

\binom{n}{x_1, x_2, ..., x_k} = \frac{n!}{x_1! x_2! \cdots x_k!},

x_1 + x_2 + \cdots + x_k = n,  p_1 + p_2 + \cdots + p_k = 1.
Clearly, when k = 2, the multinomial distribution reduces to the binomial distribution.

Ex. The probabilities that a person goes to office by car, bus and train are 1/2, 1/4 and 1/4, respectively.
Find the probability that the person will go to office 2 days by car, 3 days by bus and 1 day by train in
the 6 days.
Sol. \frac{6!}{2!\,3!\,1!} \left(\frac{1}{2}\right)^2 \left(\frac{1}{4}\right)^3 \left(\frac{1}{4}\right)^1.
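A small Python sketch of the multinomial pmf, applied to this example (the function name is ours):

from math import factorial

def multinomial_pmf(counts, probs):
    n = sum(counts)
    coef = factorial(n)
    for c in counts:
        coef //= factorial(c)
    prob = 1.0
    for c, pr in zip(counts, probs):
        prob *= pr ** c
    return coef * prob

print(multinomial_pmf([2, 3, 1], [1/2, 1/4, 1/4]))   # 6!/(2! 3! 1!) (1/2)^2 (1/4)^3 (1/4) ≈ 0.0586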

2.4 Hypergeometric Distribution
The hypergeometric distribution arises under the following conditions:
(i) The random experiment consists of choosing n objects without replacement from a lot of N objects given
that r objects possess a trait or property of our interest in the lot of N objects.
(ii) X denotes the number of objects possessing the trait or property in the selected sample of size n.
See the Venn diagram below for an illustration.

(Venn diagram: the lot of N objects consists of r objects with the trait and N − r without it; the sample of n objects contains x objects with the trait and n − x without it.)
It is easy to see that the x objects with the trait (by definition of X) are to be chosen from the r
objects in \binom{r}{x} ways, while the remaining n − x objects are to be chosen from the N − r objects in \binom{N-r}{n-x}
ways. So the n objects carrying x items with the trait can be chosen from the N objects in \binom{r}{x}\binom{N-r}{n-x}
ways, while \binom{N}{n} is the total number of ways in which n objects can be chosen from N objects. Therefore,
the pmf of X, denoted by h(x; N, r, n), is given by

h(x; N, r, n) = P(X = x) = \frac{\binom{r}{x}\binom{N-r}{n-x}}{\binom{N}{n}}.
The random variable X with this pmf is called hypergeometric random variable. The hypergeometric
distribution is characterized by the three parameters N , r and n. Note that X lies in the range max(0, n+
r − N ) ≤ x ≤ min(n, r). So minimum value of x could be n + r − N instead of 0. To understand this, let
N = 30, r = 20 and n = 15. Then the minimum value of x is n + r − N = 15 + 20 − 30 = 5. For, there are
only N − r = 10 objects without the trait in the 30 items. So a sample of 15 items certainly contains at
least 5 objects with the trait. So in this case, the random variable X takes the values 5, 6, ..., 15. Notice
that the maximum value of x is min(n, r) = min(15, 20) = 15. Similarly, if we choose n = 25, the random
variable X takes the values 15, 16, 17, 18, 19 and 20. If we choose n = 8, the random variable X
takes the values 0, 1, 2, ..., 8.
Next, let us check whether h(x; N, r, n) is a valid pmf. Note that x ∈ [max(0, n + r − N ), min(n, r)].
But we can take x ∈ [0, n] because in situations where this range is not [0, n], we have h(x; N, r, n) = 0.
Also, recall Vandermonde's identity:

\sum_{x=0}^{n} \binom{a}{x}\binom{b}{n-x} = \binom{a+b}{n}, or equivalently \sum_{x=0}^{n} \frac{\binom{a}{x}\binom{b}{n-x}}{\binom{a+b}{n}} = 1.

This identity is understandable in view of the following example.

Suppose a team of n persons is chosen from a group of a men and b women. The number of ways
of choosing the team of n persons from the group of a + b persons is \binom{a+b}{n}, the right hand side of
Vandermonde's identity. We can count the same number of ways by considering that, in the team of n persons,
x persons are men and the remaining n − x persons are women, and then summing over x. This gives the left hand
side of Vandermonde's identity.

Now from Vandermonde's identity, it follows that

\sum_{x=0}^{n} h(x; N, r, n) = \sum_{x=0}^{n} \frac{\binom{r}{x}\binom{N-r}{n-x}}{\binom{N}{n}} = 1.

Thus, h(x; N, r, n) is a valid pmf.

Mean, variance and mgf of hypergeometric random variable


For the hypergeometric random variable X, it can be shown that

µ_X = E(X) = n\frac{r}{N}  and  σ_X^2 = n\left(\frac{r}{N}\right)\left(\frac{N-r}{N}\right)\left(\frac{N-n}{N-1}\right).

For,

µ_X = E(X) = \sum_{x} x h(x; N, r, n)
= \sum_{x} x \frac{\binom{r}{x}\binom{N-r}{n-x}}{\binom{N}{n}}
= n\frac{r}{N} \sum_{x} \frac{\binom{r-1}{x-1}\binom{N-1-(r-1)}{n-1-(x-1)}}{\binom{N-1}{n-1}}
= n\frac{r}{N},

since \sum_{x} \frac{\binom{r-1}{x-1}\binom{N-1-(r-1)}{n-1-(x-1)}}{\binom{N-1}{n-1}} = 1, being the sum of the probabilities for a hypergeometric distribution with parameters N − 1, r − 1 and n − 1.

Likewise, it is easy to find that E(X(X − 1)) = n(n − 1)\left(\frac{r}{N}\right)\left(\frac{r-1}{N-1}\right). Hence, we have

σ_X^2 = E(X(X − 1)) + E(X) − E(X)^2 = n\left(\frac{r}{N}\right)\left(\frac{N-r}{N}\right)\left(\frac{N-n}{N-1}\right).
Just for the sake of completeness, in line with the other distributions, some details of the moment
generating function of the hypergeometric distribution are given below. It is given by

m_X(t) = E(e^{tX}) = \sum_{x} e^{tx} \frac{\binom{r}{x}\binom{N-r}{n-x}}{\binom{N}{n}} = \frac{\binom{N-r}{n}\, {}_2F_1(-n, -r; N - r - n + 1; e^t)}{\binom{N}{n}}.

Here {}_2F_1 is the Gauss hypergeometric function, defined as the sum of the infinite series

{}_2F_1(a, b; c; z) = 1 + \frac{ab}{c}\frac{z}{1!} + \frac{a(a+1)b(b+1)}{c(c+1)}\frac{z^2}{2!} + \cdots,

where a, b, c are constants, and z is the variable of the hypergeometric function.

Also, note that

\frac{d}{dz}\left[{}_2F_1(a, b; c; z)\right] = \frac{ab}{c}\, {}_2F_1(a+1, b+1; c+1; z),

\frac{d^2}{dz^2}\left[{}_2F_1(a, b; c; z)\right] = \frac{a(a+1)b(b+1)}{c(c+1)}\, {}_2F_1(a+2, b+2; c+2; z).

Following this, it can be shown that

µ_X = E(X) = \left[\frac{d}{dt} m_X(t)\right]_{t=0} = n\frac{r}{N}.

Similarly, by calculating the second derivative of m_X(t) at t = 0, the variance can be found as

σ_X^2 = E(X^2) − E(X)^2 = n\left(\frac{r}{N}\right)\left(\frac{N-r}{N}\right)\left(\frac{N-n}{N-1}\right).
Ex. Suppose we randomly select 5 cards without replacement from a deck of 52 playing cards. What is
the probability of getting exactly 2 red cards?
Sol. Here N = 52, r = 26, n = 5, x = 2, and therefore P (X = 2) = h(2; 52, 26, 5) = 0.3251.
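The two-red-cards probability can be reproduced with the following Python sketch (our own illustration; the helper name is ours):

from math import comb

def hypergeom_pmf(x, N, r, n):
    # h(x; N, r, n) = C(r, x) C(N - r, n - x) / C(N, n)
    return comb(r, x) * comb(N - r, n - x) / comb(N, n)

print(hypergeom_pmf(2, 52, 26, 5))   # ≈ 0.3251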

2.4.1 Binomial distribution as a limiting case of hypergeometric distribution


There is an interesting relationship between the hypergeometric and the binomial distributions.
It can be shown that if the population size N → ∞ in such a way that the proportion of successes
r/N → p, and n is held constant, then the hypergeometric probability mass function approaches the
binomial probability mass function.
Proof: We have

h(x; N, r, n) = \frac{\binom{r}{x}\binom{N-r}{n-x}}{\binom{N}{n}} = \frac{r!}{x!\,(r-x)!} \cdot \frac{(N-r)!}{(n-x)!\,(N-r-(n-x))!} \cdot \frac{n!\,(N-n)!}{N!}

= \binom{n}{x} \cdot \frac{r!/(r-x)!}{N!/(N-x)!} \cdot \frac{(N-r)!\,(N-n)!}{(N-x)!\,(N-r-(n-x))!}

= \binom{n}{x} \cdot \frac{r!/(r-x)!}{N!/(N-x)!} \cdot \frac{(N-r)!/(N-r-(n-x))!}{(N-x)!/(N-n)!}

= \binom{n}{x} \cdot \prod_{k=1}^{x} \frac{r-x+k}{N-x+k} \cdot \prod_{m=1}^{n-x} \frac{N-r-(n-x)+m}{N-n+m}.

Now, taking the large-N limit for fixed r/N, n and x, we get the binomial pmf

b(x; n, p) = \binom{n}{x} p^x q^{n-x},

since

\lim_{N \to \infty} \frac{r-x+k}{N-x+k} = \lim_{N \to \infty} \frac{r}{N} = p

and

\lim_{N \to \infty} \frac{N-r-(n-x)+m}{N-n+m} = \lim_{N \to \infty} \frac{N-r}{N} = 1 − p = q.

In practice, this means that we can approximate the hypergeometric probabilities with binomial prob-
abilities, provided N ≫ n. As a rule of thumb, if the population size is more than 20 times the sample size
(N > 20n) or N/n > 20, then we may use binomial probabilities in place of hypergeometric probabilities.

Ex. A manufacturer of automobile tires reports that among a shipment of 5000 sent to a local distributor,
1000 are slightly blemished. If one purchases 10 of these tires at random from the distributor, what is the
probability that exactly 3 are blemished?
Sol. We find P (X = 3) = 0.2013 from binomial distribution, and P (X = 3) = 0.2015 from hypergeometric
distribution.
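The closeness of the two answers is easy to confirm numerically; the Python sketch below (our own illustration) computes the exact hypergeometric probability and its binomial approximation for this example.

from math import comb

N, r, n, x = 5000, 1000, 10, 3

hyper = comb(r, x) * comb(N - r, n - x) / comb(N, n)        # exact hypergeometric
binom = comb(n, x) * (r / N) ** x * (1 - r / N) ** (n - x)  # binomial approximation with p = r/N
print(round(hyper, 4), round(binom, 4))   # ≈ 0.2015 and ≈ 0.2013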

2.4.2 Generalization of the hypergeometric distribution
Consider a lot of N objects given that r1 , r2 , ....., rk objects possess different traits of our interest such that
r1 + r2 + .... + rk = N . Suppose a lot of n objects is randomly chosen (without replacement) where x1 , x2 ,
..., xk objects have the traits as in the r1 , r2 , ....., rk objects, respectively, such that x1 + x2 + .... + xk = n.
Then the probability of the random selection is
f(x_1, x_2, ...., x_k) = \frac{\binom{r_1}{x_1}\binom{r_2}{x_2} \cdots \binom{r_k}{x_k}}{\binom{N}{n}}.

Ex. Ten cards are randomly chosen without replacement from a deck of 52 playing cards. Find the
probability of getting 2 spades, 3 clubs, 4 diamonds and 1 heart.
Sol. Here N = 52, r_1 = 13, r_2 = 13, r_3 = 13, r_4 = 13, n = 10, x_1 = 2, x_2 = 3, x_3 = 4, x_4 = 1. So the
required probability is

\frac{\binom{13}{2}\binom{13}{3}\binom{13}{4}\binom{13}{1}}{\binom{52}{10}}.

2.5 Poisson Distribution


Consider the pmf of the binomial random variable X:
 
b(x; n, p) = \binom{n}{x} p^x (1 − p)^{n-x}, x = 0, 1, 2, ..., n.

Let us calculate the limiting form of the binomial distribution as n → ∞, p → 0, with np = k held constant.
We have

b(x; n, p) = \binom{n}{x} p^x (1 − p)^{n-x}
= \frac{n!}{x!\,(n-x)!}\, p^x (1 − p)^{n-x}
= \frac{n(n-1)\cdots(n-x+1)}{x!}\, p^x (1 − p)^{n-x}
= \frac{(np)(np-p)\cdots(np-xp+p)}{x!}\, (1 − p)^{n-x}
= \frac{(np)(np-p)\cdots(np-xp+p)}{x!}\, (1 − p)^{-x} (1 − p)^{n}
= \frac{k(k-p)\cdots(k-xp+p)}{x!}\, (1 − p)^{-x} (1 − p)^{k/p}   (using np = k).

Thus, in the limit p → 0 (noting that (1 − p)^{k/p} → e^{-k}), we get

p(x; k) = \frac{k(k-0)\cdots(k-0)}{x!}\, (1 − 0)^{-x}\, e^{-k} = \frac{e^{-k} k^x}{x!},

known as the pmf of the Poisson distribution.

Notice that the conditions n → ∞, p → 0 and np = k, intuitively refer to a situation where the sample
space of the random experiment is a continuous interval or medium (thus carrying infinitely many points,
n → ∞); the probability p of discrete occurrences of an event of interest is very small (p → 0) such that
the mean number of occurrences np of the event remains constant k.

Thus, formally the Poisson distribution arises under the following conditions:
(i) The random experiment consists of counting or observing discrete occurrences of an event in a continuous
region or time intervalᵃ of some given size s; this is called a Poisson process or Poisson experiment. For example,
counting the number of airplanes landing at Delhi airport between 9 am and 11 am, or observing the white blood
cells in a sample of blood, are Poisson experiments.
(ii) λ denotes the mean number of occurrences of the event of interest per unit measurement of the given region.
Then k = λs is the expected or mean number of occurrences of the event in a region of size s.
(iii) X denotes the number of occurrences of the event in the region of size s.
a
Note that the specified region could take many forms. For instance, it could be a length, an area, a volume, a period of
time, etc.
Then X is called a Poisson random variable, and its pmf can be proved to be

p(x; k) = \frac{e^{-k} k^x}{x!}, x = 0, 1, 2, ....
The Poisson distribution is characterized by the single parameter k.

Mean, variance and mgf of Poisson random variable



(i) µ_X = E(X) = \sum_{x=0}^{\infty} x p(x; k)
= \sum_{x=1}^{\infty} x \frac{e^{-k} k^x}{x!}
= k e^{-k} \sum_{x=1}^{\infty} \frac{k^{x-1}}{(x-1)!}
= k e^{-k} e^{k}
= k

(ii) σ_X^2 = E(X^2) − E(X)^2 = E(X(X − 1)) + E(X) − E(X)^2
= \sum_{x=2}^{\infty} x(x − 1) \frac{e^{-k} k^x}{x!} + k − k^2
= k^2 e^{-k} \sum_{x=2}^{\infty} \frac{k^{x-2}}{(x-2)!} + k − k^2
= k^2 e^{-k} e^{k} + k − k^2
= k

We notice that µ_X = k = σ_X^2.

(iii) m_X(t) = E(e^{tX}) = \sum_{x=0}^{\infty} e^{tx} p(x; k)
= \sum_{x=0}^{\infty} e^{tx} \frac{e^{-k} k^x}{x!}
= e^{-k} \sum_{x=0}^{\infty} \frac{(k e^t)^x}{x!}
= e^{-k} e^{k e^t}
= e^{k(e^t − 1)}

Ex. A healthy person is expected to have 6000 white blood cells per ml of blood. A person is tested
for white blood cell count by collecting a blood sample of 0.001 ml. Find the probability that the
collected blood sample will carry exactly 3 white blood cells.
Sol. Here λ = 6000, s = 0.001, k = λs = 6 and x = 3, and therefore P(X = 3) = p(3; 6) = \frac{e^{-6} 6^3}{3!}.

Ex. In the last 5 years, 10 students of BITS Pilani are placed with a package of more than one crore.
Find the probability that exactly 7 students will be placed with a package of more than one crore in the
next 3 years.
Sol. Here λ = 10/5 = 2, s = 3, k = λs = 6 and x = 7, and therefore P(X = 7) = p(7; 6) = \frac{e^{-6} 6^7}{7!}.

Note: We proved that the Binomial distribution tends to the Poisson distribution as n → ∞, p → 0 and
np = k remains constant. Thus, we may use Poisson distribution to approximate binomial probabilities
when n is large and p is small. As a rule of thumb this approximation can safely be applied if n > 50 and
np < 5.
Ex. In a certain industrial facility, accidents occur infrequently. It is known that the probability of an
accident on any given day is 0.005 and accidents are independent of each other.
(a) What is the probability that in any given period of 400 days there will be an accident on one day?
(b) What is the probability that there are at most three days with an accident?
Sol. Let X be a binomial random variable with n = 400 and p = 0.005. Thus, np = 2. Using the Poisson
approximation,
(a) P(X = 1) = e^{-2}\, 2^1 = 0.271, and
(b) P(X ≤ 3) = \sum_{x=0}^{3} e^{-2}\, 2^x / x! = 0.857.
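The quality of the Poisson approximation here can be checked directly, as in the following Python sketch (our own illustration):

from math import comb, exp, factorial

n, p, k = 400, 0.005, 2.0

binom_le3 = sum(comb(n, x) * p**x * (1 - p) ** (n - x) for x in range(4))   # exact binomial
poisson_le3 = sum(exp(-k) * k**x / factorial(x) for x in range(4))           # Poisson approximation
print(round(binom_le3, 3), round(poisson_le3, 3))   # ≈ 0.858 (exact) and ≈ 0.857 (Poisson)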

2.6 Uniform Distribution


A random variable X is said to follow a uniform distribution if it assumes a finite number of values, all with
the same chance of occurrence, i.e., with equal probabilities. For instance, if the random variable X assumes n values
x_1, x_2, ...., x_n with equal probabilities P(X = x_i) = 1/n, then it is a uniform random variable with pmf
given by

u(x) = \frac{1}{n}, x = x_1, x_2, ...., x_n.
The moment generating function, mean and variance of the uniform random variable respectively read as

m_X(t) = \frac{1}{n}\sum_{i=1}^{n} e^{t x_i},  µ = \frac{1}{n}\sum_{i=1}^{n} x_i,  σ^2 = \frac{1}{n}\sum_{i=1}^{n} x_i^2 − \left(\frac{1}{n}\sum_{i=1}^{n} x_i\right)^2.

Ex. Suppose a fair die is thrown once. Let X denote the number appearing on the die. Then X is a
discrete random variable assuming the values 1, 2, 3, 4, 5, 6. Also, P(X = 1) = P(X = 2) = P(X = 3) =
P(X = 4) = P(X = 5) = P(X = 6) = 1/6. Thus, X is a uniform random variable.
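A short Python sketch (our own illustration) of the mean and variance formulas applied to the fair-die example:

values = [1, 2, 3, 4, 5, 6]
n = len(values)

mean = sum(values) / n                          # µ = 3.5
var = sum(v**2 for v in values) / n - mean**2   # σ^2 = 35/12 ≈ 2.9167
print(mean, var)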

2.7 Misc. Practice Problems
1. A shipment of 20 similar laptop computers to a retail outlet contains 3 that are defective. If a school
makes a random purchase of 2 of these computers, find the probability distribution for the number
of defectives.
Sol. f (0) = 68/95, f (1) = 51/190 and f (2) = 3/190.

2. Find the probability distribution of the number of heads in a toss of four coins. Also, plot the
probability mass function and probability histogram.
Sol. The total number of points in the sample space is 16. The numbers of points in the sample space
with 0, 1, 2, 3 and 4 heads are \binom{4}{0}, \binom{4}{1}, \binom{4}{2}, \binom{4}{3} and \binom{4}{4}, respectively. So f(0) = \binom{4}{0}/16 = 1/16,
f(1) = \binom{4}{1}/16 = 1/4, f(2) = \binom{4}{2}/16 = 3/8, f(3) = \binom{4}{3}/16 = 1/4 and f(4) = \binom{4}{4}/16 = 1/16.

Thus, f(x) = \binom{4}{x}/16, x = 0, 1, 2, 3, 4.

The probability mass function plot and probability histogram are shown in Figure 2.1.

Figure 2.1: Probability mass function plot and probability histogram

3. If a car agency sells 50% of its inventory of a certain foreign car equipped with side airbags, find a
formula for the probability distribution of the number of cars with side airbags among the next 4
cars sold by the agency.
Sol. f(x) = \binom{4}{x}/16, x = 0, 1, 2, 3, 4.

4. Suppose that the number of cars X that pass through a car wash between 4:00 P.M. and 5:00 P.M.
on any sunny Friday has the following probability distribution:

X = x                 4      5      6     7     8     9
f(x) = P(X = x)      1/12   1/12   1/4   1/4   1/6   1/6

Let g(X) = 2X −1 represent the amount of money, in dollars, paid to the attendant by the manager.
Find the attendant’s expected earnings for this particular time period.
Sol. We find

E(g(X)) = E(2X − 1) = \sum_{x=4}^{9} (2x − 1) f(x) = $12.67.

5. Find the mean and variance of a random variable X with the pmf given by

f (x) = cx, x = 1, 2, 3, ...., n

where c is a constant and n is some fixed natural number.
Sol. Using the condition \sum_{x=1}^{n} f(x) = 1, we get c(1 + 2 + \cdots + n) = 1, or c = \frac{2}{n(n + 1)}.

Now µ = E(X) = \sum_{x=1}^{n} x f(x) = \sum_{x=1}^{n} c x^2 = c\,\frac{n(n + 1)(2n + 1)}{6} = \frac{2n + 1}{3}.

E(X^2) = \sum_{x=1}^{n} x^2 f(x) = \sum_{x=1}^{n} c x^3 = c\,\frac{n^2 (n + 1)^2}{4} = \frac{n(n + 1)}{2}.

σ^2 = E(X^2) − E(X)^2 = \frac{n(n + 1)}{2} − \left(\frac{2n + 1}{3}\right)^2.
6. Consider a random variable X with the pmf given by

f(x) = c\, 2^{-|x|}, x = ±1, ±2, ±3, ....,

where c is a constant. If g(X) = (−1)^{|X|-1} \frac{2^{|X|}}{2|X| − 1}, then show that E(g(X)) exists but E(|g(X)|)
does not exist.

Sol. Using the condition \sum_{x = ±1, ±2, ....} f(x) = 1, we find c = 1/2.

Now E(g(X)) = \sum_{x} g(x) f(x) = \sum_{x = ±1, ±2, ....} (−1)^{|x|-1} \frac{1}{2(2|x| − 1)}, which is an alternating and convergent
series. So E(g(X)) exists. But E(|g(X)|) = \sum_{x = ±1, ±2, ....} \frac{1}{2(2|x| − 1)} is a divergent series, so E(|g(X)|) does
not exist.

7. Let the random variable X represent the number of automobiles that are used for official business
purposes on any given workday. The probability distribution for company A is

x 1 2 3

f (x) 0.3 0.4 0.3

and that for company B is

x 0 1 2 3 4

f (x) 0.2 0.1 0.3 0.3 0.1

Show that the variance of the probability distribution for company B is greater than that for com-
pany A.
Sol. µ_A = 2.0, σ_A^2 = 0.6, µ_B = 2.0 and σ_B^2 = 1.6.

8. Calculate the variance of g(X) = 2X +3, where X is a random variable with probability distribution

x          0     1     2     3
f(x)      1/4   1/8   1/2   1/8

Sol. µ_{2X+3} = 6, σ_{2X+3}^2 = 4.

9. At a “busy time,” a telephone exchange is very near capacity, so callers have difficulty placing
their calls. It may be of interest to know the number of attempts necessary in order to make a
connection. Suppose that we let p = 0.05 be the probability of a connection during a busy time.
Find the probability of a successful call in the fifth attempt.
Sol. Here p = 0.05 and x = 5. So the required probability is (0.05)(0.95)^4 = 0.041.

10. The probability that a certain kind of component will survive a shock test is 3/4. Find the proba-
bility that exactly 2 of the next 4 components tested survive.
Sol. Here n = 4, p = 3/4, x = 2, and therefore
P(X = 2) = b(2; 4, 3/4) = \binom{4}{2} (1 − 3/4)^{4-2} (3/4)^2 = 27/128.

11. The probability that a patient recovers from a rare blood disease is 0.4. If 15 people are known to
have contracted this disease, what is the probability that (a) at least 10 survive, (b) from 3 to 8
survive, and (c) exactly 5 survive?
Sol. (a) 0.0338 (b) 0.8779 (c) 0.1859

12. A large chain retailer purchases a certain kind of electronic device from a manufacturer. The man-
ufacturer indicates that the defective rate of the device is 3%.
(a) The inspector randomly picks 20 items from a shipment. What is the probability that there will
be at least one defective item among these 20?
(b) Suppose that the retailer receives 10 shipments in a month and the inspector randomly tests 20
devices per shipment. What is the probability that there will be exactly 3 shipments each containing
at least one defective device among the 20 that are selected and tested from the shipment?
Sol. (a) Denote by X the number of defective devices among the 20. Then X follows a binomial
distribution with n = 20 and p = 0.03. Hence, P (X ≥ 1) = 1 − P (X = 0) = 0.4562.

(b) In this case, each shipment can either contain at least one defective item or not. Hence, testing
of each shipment can be viewed as a Bernoulli trial with p = 0.4562 from part (a). Assuming
independence from shipment to shipment and denoting by Y the number of shipments containing
at least one defective item, Y follows another binomial distribution with n = 10 and p = 0.4562.
Therefore,
P (Y = 3) = 0.1602.

13. The complexity of arrivals and departures of planes at an airport is such that computer simulation
is often used to model the “ideal” conditions. For a certain airport with three runways, it is known
that in the ideal setting the following are the probabilities that the individual runways are accessed
by a randomly arriving commercial jet:
Runway 1: p1 = 2/9,
Runway 2: p2 = 1/6,
Runway 3: p3 = 11/18.
What is the probability that 6 randomly arriving airplanes are distributed in the following fashion?

Runway 1: 2 airplanes,
Runway 2: 1 airplane,
Runway 3: 3 airplanes.
Sol. \frac{6!}{2!\,1!\,3!} \left(\frac{2}{9}\right)^2 \left(\frac{1}{6}\right)^1 \left(\frac{11}{18}\right)^3.

14. Lots of 40 components each are deemed unacceptable if they contain 3 or more defectives. The
procedure for sampling a lot is to select 5 components at random and to reject the lot if a defective
is found. What is the probability that exactly 1 defective is found in the sample if there are 3
defectives in the entire lot?
Sol. Here N = 40, r = 3, n = 5, x = 1, and therefore P(X = 1) = h(1; 40, 3, 5) = 0.3011.

15. During a laboratory experiment, the average number of radioactive particles passing through a
counter in 1 millisecond is 4. What is the probability that 6 particles enter the counter in a given
millisecond?
Sol. Here k = 4 and x = 6, so the required probability is p(6; 4) = \frac{e^{-4} 4^6}{6!} ≈ 0.1042.

16. Ten is the average number of oil tankers arriving each day at a certain port. The facilities at the
port can handle at most 15 tankers per day. What is the probability that on a given day tankers
have to be turned away?
Sol. Here k = 10 and the required probability is

P(X > 15) = 1 − P(X ≤ 15) = 1 − \sum_{x=0}^{15} P(X = x) = 1 − 0.9513 = 0.0487.

17. In a manufacturing process where glass products are made, defects or bubbles occur, occasionally
rendering the piece undesirable for marketing. It is known that, on average, 1 in every 1000 of these
items produced has one or more bubbles. What is the probability that a random sample of 8000
will yield fewer than 7 items possessing bubbles?
Sol. This is essentially a binomial experiment with n = 8000 and p = 0.001. Since p is very close to 0
and n is quite large, we shall approximate with the Poisson distribution using k = (8000)(0.001) = 8.
Hence, if X represents the number of bubbles, the required probability is P(X < 7) = 0.3134.
