
EE319 Probability & Random Processes (Spring 2020)

Lecture 2020-04-24

Hypergeometric PGF

Prof. Dr.-Ing. Mukhtar Ullah


Head of Electrical Engineering
FAST NUCES, Islamabad
Hyper-geometric PGF
On the combined space of Bernoulli trials, define RVs
𝑋𝑖 ↔ ‘success indicator for trial 𝑖’
𝑆𝑚 ↔ ‘number of successes in a sequence of 𝑚 trials’
𝑆𝑟 ↔ ‘number of successes in a shorter sequence of 𝑟 trials’
which are distributed as
$$X_i \sim \operatorname{Ber}_p, \qquad S_m = \sum_{i=1}^{m} X_i \sim \operatorname{Bin}_{m,p}, \qquad S_r = \sum_{i=1}^{r} X_i \sim \operatorname{Bin}_{r,p}$$
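These distributional facts are easy to sanity-check numerically. Below is a minimal sketch (not from the lecture; it assumes NumPy and SciPy are available, with arbitrary illustrative values of 𝑚 and 𝑝) that simulates the 𝑚 Bernoulli trials many times and compares the empirical distribution of 𝑆𝑚 with Bin𝑚,𝑝.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
m, p, reps = 10, 0.3, 200_000          # assumed illustrative values

X = rng.random((reps, m)) < p          # each row = one run of m Bernoulli trials
S_m = X.sum(axis=1)                    # number of successes per run

empirical = np.bincount(S_m, minlength=m + 1) / reps
theoretical = stats.binom.pmf(np.arange(m + 1), m, p)
print(np.max(np.abs(empirical - theoretical)))   # small, shrinks as reps grows
```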
Define another RV
𝑆𝑟 |𝑆𝑚=𝑛 ↔ ‘number of successes in the 𝑟 trials, given 𝑛 successes in 𝑚 trials’
which, in mathematical parlance, is the restriction of 𝑆𝑟 to the event 𝑆𝑚 = 𝑛. You can think of this restricted sum as the number of successes in 𝑛 random draws, without replacement, from a set of 𝑚 objects, 𝑟 of which are favorable. Recall what is meant by random here – each object (remaining in the set) has the same probability of being picked. It is instructive to picture the sample space of 𝑚 objects as
$$\Omega = \{\omega_1, \dots, \omega_m\} = A \cup A^c$$
where elements of the subset A are favorable (considered a success). That draw-1 is random translates to equal elementary probabilities
$$P(\{\omega_i\}) = \frac{1}{m}, \qquad i \in \{1, \dots, m\}$$
and consequently
$$P(A) = \frac{r}{m}, \qquad P(A^c) = \frac{m-r}{m}$$
We could then express the number of successes in 𝑛 draws as a sum of dependent RVs:
$$S_r^{n,m} = S_r \mid_{S_m = n} = \sum_{i=1}^{n} K_i$$
where
𝐾𝑖 ↔ ‘success indicator for draw 𝑖’
Draw 1 is modeled by the distribution
$$P(K_1 = 0) = P(A^c) = \frac{m-r}{m}, \qquad P(K_1 = 1) = P(A) = \frac{r}{m}$$
Dependency of draw 2 on draw 1 is modeled by the conditional distribution $P(K_2 = k_2 \mid K_1 = k_1)$:
$$
\begin{array}{c|cc}
k_2 \backslash k_1 & 0 & 1 \\ \hline
0 & \dfrac{m-r-1}{m-1} & \dfrac{m-r}{m-1} \\[4pt]
1 & \dfrac{r}{m-1} & \dfrac{r-1}{m-1}
\end{array}
$$
Draws 1-2 are modeled by the joint distribution $P(K_2 = k_2, K_1 = k_1)$:
$$
\begin{array}{c|cc}
k_2 \backslash k_1 & 0 & 1 \\ \hline
0 & \dfrac{(m-r)_2}{(m)_2} & \dfrac{r\,(m-r)}{(m)_2} \\[4pt]
1 & \dfrac{(m-r)\,r}{(m)_2} & \dfrac{(r)_2}{(m)_2}
\end{array}
$$
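This two-draw table can be checked by simulation. The following minimal sketch (an illustration added here, not part of the lecture; it assumes NumPy, and the values 𝑚 = 10, 𝑟 = 4 are arbitrary) estimates the joint distribution of (𝐾1, 𝐾2) from random draws without replacement.

```python
import numpy as np

rng = np.random.default_rng(1)
m, r, reps = 10, 4, 100_000            # assumed illustrative values

# Objects are labeled 0..m-1; labels 0..r-1 count as favorable (success).
draws = np.array([rng.choice(m, size=2, replace=False) for _ in range(reps)])
K = (draws < r).astype(int)            # columns: K1, K2

for k1 in (0, 1):
    for k2 in (0, 1):
        est = np.mean((K[:, 0] == k1) & (K[:, 1] == k2))
        print((k1, k2), round(est, 4))
# e.g. P(K1=1, K2=1) = (r)_2/(m)_2 = 4*3/(10*9) ≈ 0.1333
```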
[Figure: probability tree for the first three draws, from Start to the leaves 000 through 111, where after 𝑖 draws containing 𝑗 successes the next branch has probability (𝑟 − 𝑗)/(𝑚 − 𝑖) for a success and (𝑚 − 𝑟 − 𝑗)/(𝑚 − 𝑖) for a failure.]
Summing over 𝑘1 gives the marginal distribution
$$P(K_2 = 0) = \frac{m-r}{m}, \qquad P(K_2 = 1) = \frac{r}{m}$$
which is identical to that of 𝐾1 despite their dependence.
Dependency of draw 3 on draws 1-2 is modeled by the conditional distribution $P(K_3 = k_3 \mid K_2 = k_2, K_1 = k_1)$:
$$
\begin{array}{c|cc|cc}
 & \multicolumn{2}{c|}{k_3 = 0} & \multicolumn{2}{c}{k_3 = 1} \\
k_2 \backslash k_1 & 0 & 1 & 0 & 1 \\ \hline
0 & \dfrac{m-r-2}{m-2} & \dfrac{m-r-1}{m-2} & \dfrac{r}{m-2} & \dfrac{r-1}{m-2} \\[4pt]
1 & \dfrac{m-r-1}{m-2} & \dfrac{m-r}{m-2} & \dfrac{r-1}{m-2} & \dfrac{r-2}{m-2}
\end{array}
$$

Draws 1-3 are modeled by the joint distribution $P(K_3 = k_3, K_2 = k_2, K_1 = k_1)$:
$$
\begin{array}{c|cc|cc}
 & \multicolumn{2}{c|}{k_3 = 0} & \multicolumn{2}{c}{k_3 = 1} \\
k_2 \backslash k_1 & 0 & 1 & 0 & 1 \\ \hline
0 & \dfrac{(m-r)_3}{(m)_3} & \dfrac{r\,(m-r)_2}{(m)_3} & \dfrac{(m-r)_2\, r}{(m)_3} & \dfrac{(m-r)\,(r)_2}{(m)_3} \\[4pt]
1 & \dfrac{(m-r)_2\, r}{(m)_3} & \dfrac{(r)_2\,(m-r)}{(m)_3} & \dfrac{(m-r)\,(r)_2}{(m)_3} & \dfrac{(r)_3}{(m)_3}
\end{array}
$$
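Because 𝑚 is finite, the three-draw table can also be verified exactly, with no sampling: every ordered triple of distinct objects is equally likely. A minimal sketch (assumed, not from the lecture; 𝑚 = 10, 𝑟 = 4 again arbitrary):

```python
from itertools import permutations
from collections import Counter

m, r = 10, 4                                     # objects 0..r-1 are favorable
counts = Counter()
for triple in permutations(range(m), 3):         # all (m)_3 ordered triples
    counts[tuple(int(obj < r) for obj in triple)] += 1

total = m * (m - 1) * (m - 2)                    # (m)_3 equally likely triples
for pattern in sorted(counts):
    print(pattern, counts[pattern] / total)
# e.g. pattern (1, 1, 1) -> (r)_3/(m)_3 = 4*3*2/(10*9*8) = 1/30
```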
Summing over 𝑘1 gives the joint distribution $P(K_3 = k_3, K_2 = k_2)$:
$$
\begin{array}{c|cc}
k_3 \backslash k_2 & 0 & 1 \\ \hline
0 & \dfrac{(m-r)_2}{(m)_2} & \dfrac{r\,(m-r)}{(m)_2} \\[4pt]
1 & \dfrac{(m-r)\,r}{(m)_2} & \dfrac{(r)_2}{(m)_2}
\end{array}
$$
which is identical to the one for draws 1-2. Dividing by 𝑃 (𝐾2 = 𝑘2 ) gives the conditional distribution $P(K_3 = k_3 \mid K_2 = k_2)$:
$$
\begin{array}{c|cc}
k_3 \backslash k_2 & 0 & 1 \\ \hline
0 & \dfrac{m-r-1}{m-1} & \dfrac{m-r}{m-1} \\[4pt]
1 & \dfrac{r}{m-1} & \dfrac{r-1}{m-1}
\end{array}
$$
Thus, the dependency of draw-3 on draw-2 matches that of draw-2 on draw-1. Following a similar procedure leads to the same conclusion for draws 1 and 3.
Exchangeable RVs Here is the conclusion: all the 𝑛 draws are identical in marginal densities,
$$P(K_i = 0) = \frac{m-r}{m}, \qquad P(K_i = 1) = \frac{r}{m}, \qquad i \in [1\,..\,n]$$
and all draw pairs have identical joint distributions $P(K_j = k_j, K_i = k_i)$:
$$
\begin{array}{c|cc}
k_j \backslash k_i & 0 & 1 \\ \hline
0 & \dfrac{(m-r)_2}{(m)_2} & \dfrac{r\,(m-r)}{(m)_2} \\[4pt]
1 & \dfrac{(m-r)\,r}{(m)_2} & \dfrac{(r)_2}{(m)_2}
\end{array}
$$
and, consequently, identical conditional densities $P(K_j = k_j \mid K_i = k_i)$:
$$
\begin{array}{c|cc}
k_j \backslash k_i & 0 & 1 \\ \hline
0 & \dfrac{m-r-1}{m-1} & \dfrac{m-r}{m-1} \\[4pt]
1 & \dfrac{r}{m-1} & \dfrac{r-1}{m-1}
\end{array}
$$
Stated differently, the RVs 𝐾𝑖 are said to be exchangeable, or interchangeable, though they are dependent. This has some very useful implications. Without knowing the joint distribution of these 𝑛 RVs, we can work out a few expectations by exploiting the fact that the RVs are interchangeable. To start with, all the success indicators 𝐾𝑖 have the same mean
$$\operatorname{E} K_i = \frac{r}{m}, \qquad i \in [1\,..\,n]$$
For all 𝑖, 𝑗 ∈ [1 ·· 𝑛], the expectation of the (pairwise) product 𝐾𝑖 𝐾𝑗 is
$$\operatorname{E}[K_i K_j] = \frac{r}{m}\left(1 + \frac{r-m}{m-1}\,[i \neq j]\right)$$
and the covariance between pairs $(K_i, K_j)$ is
$$\operatorname{Cov}(K_i, K_j) = \operatorname{E}\big[(K_i - \operatorname{E} K_i)(K_j - \operatorname{E} K_j)\big] = \operatorname{E}[K_i K_j] - \operatorname{E} K_i \operatorname{E} K_j = \frac{r}{m}\left(1 - \frac{r}{m}\right)\left(1 - \frac{m}{m-1}\,[i \neq j]\right)$$
The sum of all the success indicators has the mean
$$\operatorname{E} \sum_{i=1}^{n} K_i = \sum_{i=1}^{n} \operatorname{E} K_i = \frac{nr}{m}$$
and variance
$$
\begin{aligned}
\operatorname{Var} \sum_{i=1}^{n} K_i &= \operatorname{E}\left(\sum_{i=1}^{n} K_i - \operatorname{E}\sum_{i=1}^{n} K_i\right)^{\!2} = \operatorname{E}\left[\sum_{i=1}^{n} (K_i - \operatorname{E} K_i)\right]^{2} \\
&= \sum_{i=1}^{n}\sum_{j=1}^{n} \operatorname{E}\big[(K_i - \operatorname{E} K_i)(K_j - \operatorname{E} K_j)\big] = \sum_{i=1}^{n}\sum_{j=1}^{n} \operatorname{Cov}(K_i, K_j) \\
&= \sum_{i=1}^{n}\sum_{j=1}^{n} \frac{r}{m}\left(1 - \frac{r}{m}\right)\left(1 - \frac{m}{m-1}\,[i \neq j]\right) \\
&= \frac{r}{m}\left(1 - \frac{r}{m}\right)\left(\sum_{i=1}^{n}\sum_{j=1}^{n} 1 \;-\; \frac{m}{m-1}\sum_{i=1}^{n}\sum_{j=1}^{n} [i \neq j]\right) \\
&= \frac{r}{m}\left(1 - \frac{r}{m}\right)\left(n^2 - \frac{m}{m-1}\,(n)_2\right) = \frac{nr}{m}\left(1 - \frac{r}{m}\right)\frac{m-n}{m-1}
\end{aligned}
$$
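All of these moment formulas can be confirmed by exact enumeration of the (𝑚)𝑛 equally likely ordered sequences of draws. A small sketch (assumed, not from the lecture; the values 𝑚 = 8, 𝑟 = 3, 𝑛 = 4 are chosen small enough to enumerate):

```python
from itertools import permutations
import numpy as np

m, r, n = 8, 3, 4                                 # assumed illustrative values
# Every ordered sequence of n distinct objects is equally likely; objects
# 0..r-1 are the favorable ones.
seqs = np.array([[int(o < r) for o in p] for p in permutations(range(m), n)])

print(seqs.mean(axis=0))                          # each entry = r/m = 0.375
print((seqs[:, 0] * seqs[:, 1]).mean())           # (r)_2/(m)_2 = 6/56
print(seqs.sum(axis=1).var())                     # exact variance of the sum
print(n*r/m * (1 - r/m) * (m - n) / (m - 1))      # closed form, matches above
```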
Capitalizing on what we have learned so far, consider a particular sequence of 𝑛 draws with 𝑘 successes and 𝑛 − 𝑘 failures. Regardless of how the successes (and failures) are positioned, this sequence is probabilistically no different than the sequence with 𝑘 consecutive successes followed by 𝑛 − 𝑘 consecutive failures. Assume that 𝑘 ∈ [1 ·· 𝑛]. The probability of a streak of 𝑘 successes in the first 𝑘 draws is
$$P\left(\bigcap_{i=1}^{k} \{K_i = 1\}\right) = P(K_1 = 1) \prod_{i=2}^{k} P\left(K_i = 1 \,\Big|\, \bigcap_{j=1}^{i-1} \{K_j = 1\}\right) = \frac{(r)_k}{(m)_k}$$
On the other hand, writing $A_k = \bigcap_{i=1}^{k} \{K_i = 1\}$ for that streak, the conditional probability of a streak of 𝑛 − 𝑘 failures in the last 𝑛 − 𝑘 draws is
$$P\left(\bigcap_{i=k+1}^{n} \{K_i = 0\} \,\Big|\, A_k\right) = \prod_{i=k+1}^{n} P\left(K_i = 0 \,\Big|\, A_k \cap \bigcap_{j=k+1}^{i-1} \{K_j = 0\}\right) = \frac{(m-r)_{n-k}}{(m-k)_{n-k}}$$
Multiplying the two gives the probability of the sequence with 𝑘 consecutive successes followed by 𝑛 − 𝑘 consecutive failures
$$P\left(\bigcap_{i=1}^{k} \{K_i = 1\} \cap \bigcap_{i=k+1}^{n} \{K_i = 0\}\right) = \frac{(r)_k}{(m)_k} \cdot \frac{(m-r)_{n-k}}{(m-k)_{n-k}} = \frac{(r)_k\,(m-r)_{n-k}}{(m)_n}$$
There are $\binom{n}{k}$ different sequences, each with 𝑘 successes and 𝑛 − 𝑘 failures. Though different in how the successes are positioned, all these sequences have the same probability, given by the above expression. Summing all these probabilities gives the probability of 𝑘 successes in 𝑛 draws
$$P\left(\sum_{i=1}^{n} K_i = k\right) = \binom{n}{k} \frac{(r)_k\,(m-r)_{n-k}}{(m)_n} = \binom{m}{n}^{-1} \binom{r}{k} \binom{m-r}{n-k}$$
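As a numerical cross-check (a sketch added here, not part of the lecture), the counting formula agrees with SciPy's built-in hypergeometric PMF. Note that SciPy's parameter order differs from our notation: it takes the population size first, then the number of favorable objects, then the number of draws.

```python
from scipy import stats
from scipy.special import comb

def falling(x, k):
    """Falling factorial (x)_k = x (x-1) ... (x-k+1)."""
    out = 1
    for i in range(k):
        out *= x - i
    return out

m, r, n = 20, 7, 5                               # assumed illustrative values
for k in range(n + 1):
    counting = (comb(n, k, exact=True)
                * falling(r, k) * falling(m - r, n - k) / falling(m, n))
    # SciPy's parameterization: hypergeom.pmf(k, M=m, n=r, N=n)
    print(k, counting, stats.hypergeom.pmf(k, m, r, n))   # columns agree
```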
Following the alternative route of conditioning successes in Bernoulli trials, we arrive at the same hyper-geometric distribution
$$P\big(S_r^{n,m} = k\big) = P(S_r = k \mid S_m = n) = \frac{P(S_r = k)\, P(S_m = n \mid S_r = k)}{P(S_m = n)} = \frac{\operatorname{Bin}(k; r, p)\, \operatorname{Bin}(n-k; m-r, p)}{\operatorname{Bin}(n; m, p)} = \binom{m}{n}^{-1} \binom{r}{k} \binom{m-r}{n-k}$$
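The fact that this ratio of binomial probabilities is free of 𝑝 is worth seeing numerically. A minimal sketch (assumed values for 𝑚, 𝑟, 𝑛, 𝑘; SciPy assumed available):

```python
from scipy import stats

m, r, n, k = 20, 7, 5, 2                     # assumed illustrative values
for p in (0.1, 0.5, 0.9):
    ratio = (stats.binom.pmf(k, r, p)
             * stats.binom.pmf(n - k, m - r, p)
             / stats.binom.pmf(n, m, p))
    print(p, ratio)                          # identical value for every p
print(stats.hypergeom.pmf(k, m, r, n))       # equals that common value
```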
written as $S_r^{n,m} \sim \operatorname{HG}_{n,r,m}$ with support
$$\mathcal{S} = [\max\{0, n-m+r\} \,..\, \min\{n, r\}],$$
the PDF, restricted to the support,
$$f_{S_r^{n,m}}\big|_{\mathcal{S}}(k) = \operatorname{HG}(k; n, r, m) = \binom{m}{n}^{-1} \binom{r}{k} \binom{m-r}{n-k},$$
and the PGF
$$G_{S_r^{n,m}}(z) = \sum_{k=0}^{n} \binom{m}{n}^{-1} \binom{r}{k} \binom{m-r}{n-k}\, z^k$$
Notice the simpler limits in the sum, allowed because generalized binomial coefficients vanish outside the support.
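That remark can be seen directly in code: scipy.special.comb returns 0 outside the classical range, so summing 𝑘 = 0, …, 𝑛 automatically respects the support. A sketch with assumed values:

```python
import numpy as np
from scipy.special import comb

m, r, n = 12, 4, 6                     # support [max(0, n-m+r) .. min(n, r)] = [0..4]
k = np.arange(n + 1)
coeff = comb(r, k) * comb(m - r, n - k) / comb(m, n)   # comb() is 0 for k > r
print(coeff)                           # last two entries are 0, as expected
print(coeff.sum())                     # G(1) = 1
print((k * coeff).sum(), n * r / m)    # G'(1) equals the mean nr/m = 2.0
```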

Distribution properties The first two derivatives are
$$
\begin{aligned}
G'_{S_r^{n,m}}(z) &= \sum_{k=1}^{n} k \binom{m}{n}^{-1} \binom{r}{k} \binom{m-r}{n-k} z^{k-1} \\
&= \sum_{k=1}^{n} k\, \frac{r}{k} \binom{r-1}{k-1} \frac{n}{m} \binom{m-1}{n-1}^{-1} \binom{m-1-(r-1)}{n-1-(k-1)} z^{k-1} \\
&= \frac{nr}{m} \sum_{j=0}^{n-1} \binom{m-1}{n-1}^{-1} \binom{r-1}{j} \binom{m-1-(r-1)}{n-1-j} z^{j} = \frac{nr}{m}\, G_{S_{r-1}^{n-1,m-1}}(z)
\end{aligned}
$$
and
$$G''_{S_r^{n,m}}(z) = \frac{nr}{m}\, G'_{S_{r-1}^{n-1,m-1}}(z) = \frac{nr}{m}\, \frac{(n-1)(r-1)}{m-1}\, G_{S_{r-2}^{n-2,m-2}}(z)$$
Setting 𝑧 = 1 yields the mean
$$\mu_{S_r^{n,m}} = \operatorname{E} S_r^{n,m} = G'_{S_r^{n,m}}(1) = \frac{nr}{m}$$
and the expectation
$$\operatorname{E}\big[S_r^{n,m}\big(S_r^{n,m} - 1\big)\big] = G''_{S_r^{n,m}}(1) = \frac{n(n-1)\,r(r-1)}{m(m-1)}$$
from which the variance can be recovered as
$$
\begin{aligned}
\operatorname{Var} S_r^{n,m} &= \operatorname{E}\big[S_r^{n,m}\big(S_r^{n,m} - 1\big)\big] - \mu_{S_r^{n,m}}\big(\mu_{S_r^{n,m}} - 1\big) \\
&= \frac{n(n-1)\,r(r-1)}{m(m-1)} - \frac{nr}{m}\left(\frac{nr}{m} - 1\right) = \frac{nr}{m}\left(\frac{(n-1)(r-1)}{m-1} + 1 - \frac{nr}{m}\right) \\
&= \frac{nr}{m} \cdot \frac{(r-1)(n-1)\,m + (m-rn)(m-1)}{m(m-1)} = \frac{nr}{m}\left(1 - \frac{r}{m}\right)\frac{m-n}{m-1}
\end{aligned}
$$
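As a final cross-check (a sketch, not from the lecture; parameter values assumed), the PGF-derived mean and variance agree with SciPy's hypergeometric moments:

```python
from scipy import stats

m, r, n = 50, 18, 12                             # assumed illustrative values
mean = n * r / m
var = mean * (1 - r / m) * (m - n) / (m - 1)
print(mean, var)
# SciPy's parameterization: hypergeom(M=m, n=r, N=n)
print(stats.hypergeom.mean(m, r, n), stats.hypergeom.var(m, r, n))
```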
