STAT 2006 Chapter 1 - 2022 - v2 - Polished
Presented by
Simon Cheung
Email: kingchaucheung@cuhk.edu.hk
Simulation Example
Consider flipping two fair coins. The set of possible outcomes is
{HH, HT, TH, TT}. We can simulate the experiment by generating 500 pairs
(U₁, U₂) of Uniform(0, 1) random numbers. We have four possible outcomes:
U₁ < 0.5, U₂ < 0.5
U₁ < 0.5, U₂ ≥ 0.5
U₁ ≥ 0.5, U₂ < 0.5
U₁ ≥ 0.5, U₂ ≥ 0.5
They correspond to the events {HH}, {HT}, {TH}, {TT} respectively.
By inspecting the relative frequency of each event, we can measure their
approximate probabilities.
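The simulation described above can be sketched in a few lines of Python (a minimal illustration, not part of the original slides; the seed and variable names are arbitrary choices):

```python
import random
from collections import Counter

random.seed(1)

# Simulate 500 pairs (U1, U2) of Uniform(0,1) numbers; map U < 0.5 to H, U >= 0.5 to T.
counts = Counter()
n = 500
for _ in range(n):
    u1, u2 = random.random(), random.random()
    outcome = ("H" if u1 < 0.5 else "T") + ("H" if u2 < 0.5 else "T")
    counts[outcome] += 1

# Relative frequencies approximate the true probabilities (each 0.25).
freqs = {k: v / n for k, v in counts.items()}
print(freqs)
```

Each relative frequency should land near 0.25, with random fluctuation shrinking as the number of pairs grows.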
Probability is a real-valued set function P that assigns to each event A in the sample
space S a number P(A), called the probability of the event A, such that the following hold:
• The probability of any event A must be nonnegative, that is, P(A) ≥ 0.
• The probability of the sample space is 1, that is, P(S) = 1.
• Given mutually exclusive events A₁, A₂, A₃, …, where Aᵢ ∩ Aⱼ = ∅ for i ≠ j,
P(A₁ ∪ A₂ ∪ ⋯ ∪ A_k) = P(A₁) + P(A₂) + ⋯ + P(A_k)
and, for a countably infinite collection,
P(⋃ᵢ₌₁^∞ Aᵢ) = Σᵢ₌₁^∞ P(Aᵢ).
Multiplication Theorem
• For any two events A and B with P(B) > 0,
P(A ∩ B) = P(A|B) P(B).
• For any three events A, B, C with P(B ∩ C) > 0,
P(A ∩ B ∩ C) = P(A|B ∩ C) P(B|C) P(C).
P(A ∩ B ∩ C) = P(the sum is 8, composed of double 4s) = 1/36 = P(A) P(B) P(C)
This appears to imply that A, B, C are independent.
• However, P(B ∩ C) = P(sum equals 7 or 8) = 11/36 ≠ P(B) P(C)
• Similarly, P(A ∩ B) ≠ P(A) P(B). Therefore, the requirement that
P(A ∩ B ∩ C) = P(A) P(B) P(C)
is not a strong enough condition to guarantee pairwise independence.
STAT 2006 - Jan 2021 13
Conditional Probability
Law of total probability and Bayes Theorem
• If 0 < P(B) < 1, then P(A) = P(A|B) P(B) + P(A|Bᶜ) P(Bᶜ), for any event A.
• If B₁, B₂, …, B_k are mutually exclusive and exhaustive events (that is, a partition of the
sample space), then, for any event A,
P(A) = Σⱼ₌₁ᵏ P(A|Bⱼ) P(Bⱼ).
• Proof.
We observe that A = (A ∩ B₁) ∪ (A ∩ B₂) ∪ ⋯ ∪ (A ∩ B_k)
and the events A ∩ B₁, A ∩ B₂, …, A ∩ B_k are mutually exclusive, so
P(A) = Σⱼ₌₁ᵏ P(A ∩ Bⱼ) = Σⱼ₌₁ᵏ P(A|Bⱼ) P(Bⱼ).
Example. When coded messages are sent, there may be errors in the transmission. In
particular, Morse code used "dots" and "dashes", which are known to occur in the
proportion of 3:4. Suppose there is interference on the transmission line, and with
probability 1/8 a dot is mistakenly received as a dash, and vice versa. If a single signal
is sent to us, what is the probability that we will receive a dot?
Denote B as the event that the original signal sent is a dot and A as the event that
we receive a dot. Then we have
P(B) = 3/7,  P(Aᶜ|B) = P(A|Bᶜ) = 1/8.
Then
P(A) = P(A|B) P(B) + P(A|Bᶜ) P(Bᶜ) = (1 − 1/8) × 3/7 + 1/8 × (1 − 3/7) = 25/56.
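The law-of-total-probability computation above can be checked with exact rational arithmetic (a small sketch using Python's `fractions`, not part of the original slides):

```python
from fractions import Fraction

# P(B): a dot is sent; dots and dashes occur in the proportion 3:4.
p_B = Fraction(3, 7)
p_flip = Fraction(1, 8)  # probability a signal is received wrongly, in either direction

# Law of total probability: P(A) = P(A|B) P(B) + P(A|B^c) P(B^c)
p_A = (1 - p_flip) * p_B + p_flip * (1 - p_B)
print(p_A)  # 25/56
```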
Bayes Theorem
• For any two events A and B with P(A) > 0 and P(B) > 0,
P(B|A) = P(A|B) P(B) / P(A).
• If B₁, B₂, …, B_k are mutually exclusive and exhaustive events (that is, a partition of the
sample space), and A is any event with P(A) > 0, then for any event Bⱼ,
P(Bⱼ|A) = P(A|Bⱼ) P(Bⱼ) / P(A) = P(A|Bⱼ) P(Bⱼ) / Σᵢ₌₁ᵏ P(A|Bᵢ) P(Bᵢ).
That is, even if the test shows positive on an individual, he/she will only have about a 13%
chance of being infected by the virus.
A tree diagram is a useful graphical display
that shows the outcomes of a set of events.
P(−) = 0.00003 + 0.97706 = 0.97709
P(Vᶜ|−) = 0.97706 / (0.97706 + 0.00003) = 0.99997
Conditional Probability
Example. A book club classifies members as heavy, medium, or light purchasers, and
separate mailings are prepared for each of these groups. Overall 20% of the members are
heavy purchasers, 30% medium, and 50% light. A member is not classified into a group
until 18 months after joining the club, but a test is made of the feasibility of using the first
3 months' purchases to classify members. The following percentages are obtained from
existing records of individuals classified as heavy, medium, or light purchasers.
If a member purchases no books in the first 3 months, what is the probability that the
member is a light purchaser?
Conditional Probability
Example. You are waiting for your bag at the baggage return carousel of an airport.
Suppose that you know that there are 200 bags to come from your flight, and you are
counting the distinct bags that come out. Suppose that 𝑥𝑥 bags have arrived, and your bag
is not among them. What is the probability that your bag will not arrive at all, that is, that it
has been lost (or at least delayed)?
Let us assign values to 𝑃𝑃 𝐴𝐴 based on empirical data. Page 9 of "aucbaggage.pdf" contains
data for the number of missing bags per 1000 passengers for 24 airlines (provided by the
Association of European Airlines (AEA) in 2006). Here are the data for two airlines:
• Air Malta 𝑃𝑃 𝐴𝐴 = 0.0044
• British Airways 𝑃𝑃 𝐴𝐴 = 0.023
Note that
• For Air Malta, P(A|199) = 0.469. So even
when only 1 bag remains to arrive, the
chance is less than half that your bag has
been lost.
• For British Airways, P(A|199) = 0.825.
However, P(A|197) = 0.541 is the first
probability over half.
Example.
Suppose that 5 people, including you and a friend, line up at random. Let the random
variable X denote the number of people standing between you and your friend. Determine
the probability mass function of X.
P_X(0) = (4 × 2! × 3!)/5! = 4/10,  P_X(1) = (3 × 2! × 3!)/5! = 3/10,
P_X(2) = (2 × 2! × 3!)/5! = 2/10,  P_X(3) = (1 × 2! × 3!)/5! = 1/10.
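The counting argument above can be verified by brute force: enumerate all 5! equally likely orderings and tally the gap between the two people of interest (a quick sketch; the labels "A" and "B" for you and your friend are arbitrary):

```python
from itertools import permutations
from collections import Counter

# Label the 5 people; "A" (you) and "B" (your friend) are the pair of interest.
people = ["A", "B", "C", "D", "E"]
counts = Counter()
for perm in permutations(people):
    gap = abs(perm.index("A") - perm.index("B")) - 1  # people standing between A and B
    counts[gap] += 1

total = sum(counts.values())  # 5! = 120 equally likely orderings
pmf = {x: counts[x] / total for x in sorted(counts)}
print(pmf)
```

The tallies reproduce 4/10, 3/10, 2/10, 1/10 for X = 0, 1, 2, 3.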
Hypergeometric distribution
If we randomly select 𝑛𝑛 items without replacement from a set of 𝑁𝑁 items of which 𝑀𝑀 of
the items are of one type and 𝑁𝑁 − 𝑀𝑀 of the items are of a second type, then the
probability mass function of the discrete random variable 𝑋𝑋 is called the hypergeometric
distribution and is of the form
P(X = x) = C(M, x) C(N − M, n − x) / C(N, n),  0 ≤ x ≤ M, x ≤ n, n − x ≤ N − M.
Note that when the samples are drawn with replacement, the discrete random
variable 𝑋𝑋 follows what is called the binomial distribution.
Example.
A crate contains 50 light bulbs of which 5 are defective and 45 are not. A Quality Control
Inspector randomly samples 4 bulbs without replacement. Let 𝑋𝑋 = the number of defective
bulbs selected. Find the probability mass function, 𝑃𝑃𝑋𝑋 (𝑥𝑥), of the discrete random
variable 𝑋𝑋.
P_X(0) = C(5, 0) C(45, 4)/C(50, 4) = 0.647,  P_X(1) = C(5, 1) C(45, 3)/C(50, 4) = 0.3081,
P_X(2) = C(5, 2) C(45, 2)/C(50, 4) = 0.043,  P_X(3) = C(5, 3) C(45, 1)/C(50, 4) = 0.00195,
P_X(4) = C(5, 4) C(45, 0)/C(50, 4) = 0.000022
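These hypergeometric probabilities are easy to reproduce with exact binomial coefficients (a minimal sketch; the helper name is an arbitrary choice):

```python
from math import comb

def hypergeom_pmf(x, N, M, n):
    # P(X = x) = C(M, x) * C(N - M, n - x) / C(N, n)
    return comb(M, x) * comb(N - M, n - x) / comb(N, n)

# Crate of N = 50 bulbs, M = 5 defective, sample n = 4 without replacement.
pmf = [hypergeom_pmf(x, 50, 5, 4) for x in range(5)]
print([round(p, 4) for p in pmf])
assert abs(sum(pmf) - 1.0) < 1e-12  # the pmf sums to 1 over its support
```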
Expectation
If P_X(x) is the pmf of a discrete random variable X with support S, and if the summation
Σ_{x∈S} u(x) P_X(x) exists, then the expected value of u(X) is defined by
E[u(X)] = Σ_{x∈S} u(x) P_X(x).
Example.
What is the average toss of a fair six-sided die?
The pmf of X, the face value of a tossed fair six-sided die, is P_X(x) = 1/6, x = 1, 2, 3, 4, 5, 6.
Thus, the expected face value of X is
E(X) = 1 × 1/6 + 2 × 1/6 + 3 × 1/6 + 4 × 1/6 + 5 × 1/6 + 6 × 1/6 = 3.5.
Example. What is the expected value of a discrete random variable 𝑋𝑋 with pmf
P_X(x) = c/x², x = 1, 2, 3, …,
where c is a constant. Then
E(X) = Σ_{x=1}^∞ x (c/x²) = Σ_{x=1}^∞ c/x = ∞.
Therefore, the expected value of X doesn't exist.
Example. Let 𝑣𝑣 𝑋𝑋 = 𝑋𝑋 − 𝑐𝑐 2 , where 𝑐𝑐 is a constant. Suppose that 𝐸𝐸 𝑋𝑋 − 𝑐𝑐 2 exists.
Find the value of 𝑐𝑐 that minimizes 𝐸𝐸 𝑋𝑋 − 𝑐𝑐 2 .
L(c) = E(X − c)² = E(X² − 2cX + c²)
(d/dc) L(c) = E(−2X + 2c) = −2E(X) + 2c
Set (d/dc) L(c) = 0 to imply that c = E(X).
The sample standard deviation is s = √s². Note that a sample before collection is random. We
often use capital letters X₁, X₂, …, X_n to represent it. Hence, before collection, the sample
mean X̄ and the sample variance S² are random variables.
Let M^(r)(t) be the r-th derivative of M(t) with respect to t. We have E(X^r) = M_X^(r)(0).
Hence, μ_X = M_X^(1)(0) and σ_X² = M_X^(2)(0) − [M_X^(1)(0)]².
Example.
Use the mgf of the Binomial random variable 𝑋𝑋 to determine the mean and the variance of
𝑋𝑋.
M_X(t) = (pe^t + 1 − p)^n
M_X^(1)(t) = n(pe^t + 1 − p)^(n−1) pe^t ⟹ M_X^(1)(0) = np
M_X^(2)(t) = np[e^t (pe^t + 1 − p)^(n−1) + e^(2t) (n − 1) p (pe^t + 1 − p)^(n−2)] ⟹ M_X^(2)(0) = np[1 + (n − 1)p]
It follows that μ_X = np and σ_X² = np[1 + (n − 1)p] − n²p² = np(1 − p).
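The moments obtained from the mgf can be double-checked numerically against the binomial pmf itself (a small sketch; the values n = 10, p = 0.3 are arbitrary):

```python
from math import comb

n, p = 10, 0.3
pmf = [comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)]

mean = sum(k * pk for k, pk in enumerate(pmf))
second = sum(k * k * pk for k, pk in enumerate(pmf))
var = second - mean**2

# Agrees with the mgf results: mu = np and sigma^2 = np(1 - p).
assert abs(mean - n * p) < 1e-6
assert abs(var - n * p * (1 - p)) < 1e-6
print(mean, var)
```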
Bernoulli Trial
Consider an experiment of tossing a coin with sample space S = {H, T}, and define a
discrete random variable X as X(H) = 1 and X(T) = 0. More generally, let S be a sample space
and A be an event. Denote ω as an outcome of the experiment. Define a discrete random
variable X with X(ω) = 1 if ω ∈ A and X(ω) = 0 if ω ∉ A. The random variable X is called a
Bernoulli trial.
Suppose P is a probability function defined on S. Let p = P(A). Then p = P_X(X = 1), called
the probability of success. Examples of a Bernoulli trial:
1. Flipping a coin
2. Rolling a die, where a six is "success"
3. In conducting a political opinion poll, choosing a voter at random to ascertain
whether that voter will vote "yes" in an upcoming election.
P(Y = k) = C(n, k) p^k (1 − p)^(n−k),  k = 0, 1, 2, …, n
Proof
In n independent Bernoulli trials, k of them are successes, each with probability p, and
n − k of them are failures, each with probability 1 − p. In addition, there are C(n, k) different
ways to select the k success trials among the n trials. Thus, the probability of obtaining k
successes from n trials is C(n, k) p^k (1 − p)^(n−k).
E(Y) = Σᵢ₌₁ⁿ E(Xᵢ) = nπ
Let X be the number of germinated seeds in a sample of 20 seeds. X has a binomial
distribution with parameters n = 20 and π = 0.85. Since P(X ≤ 12) = 0.0059, it is highly
improbable that in 20 seeds we would obtain only 12 germinated seeds if π is equal to 0.85. The
germination rate is most likely a value considerably less than 0.85.
Discrete Random Variables
Example
A cable TV company is investigating the feasibility of offering a new service in a large
Midwestern city. For the proposed new service to be economically viable, it is necessary
that at least 50% of their current subscribers add the new service. A survey of 1,218
customers reveals that 516 would add the new service. Do you think the company should
offer the new service in this city?
Let X be the number of customers who would subscribe to the new service in a random
sample of 1,218 customers. If p = 0.5, X has a binomial distribution with parameters n =
1218 and p = 0.5. Since P(X ≤ 516) ≈ 0, observing only 516 subscribers would be extremely
unlikely if half of all customers really wanted the service, so offering the new service is not
a good idea.
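The tail probability P(X ≤ 516) can be computed exactly with big-integer arithmetic (a sketch, not part of the original slides; exact summation is feasible here because n is only 1218):

```python
from math import comb
from fractions import Fraction

n, k = 1218, 516
p = Fraction(1, 2)

# Exact binomial tail P(X <= 516) under p = 0.5.
tail = sum(comb(n, j) for j in range(k + 1)) * p**n
print(float(tail))
```

The result is a vanishingly small probability, which is what justifies the conclusion on the slide.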
Geometric Distribution
Since (d/dt) M(t) = pe^t / [1 − e^t(1 − p)]² and (d²/dt²) M(t) = [pe^t + p(1 − p)e^(2t)] / [1 − e^t(1 − p)]³,
E(X) = (d/dt) M(t) |_(t=0) = 1/p;
E(X²) = (d²/dt²) M(t) |_(t=0) = (2 − p)/p²;
Var(X) = E(X²) − [E(X)]² = (2 − p)/p² − 1/p² = (1 − p)/p².
Select r − 1 successes
from x − 1 trials.
X is equal to the sum of r independent Geometric distributed random variables, each with
the same success probability p.
The mgf of a Negative Binomial random variable X with parameters r and p is given by
E(e^(tX)) = E(e^(t(Y₁+⋯+Y_r))) = ∏ⱼ₌₁ʳ E(e^(tYⱼ)) = [pe^t / (1 − e^t(1 − p))]^r,
where Y₁, Y₂, …, Y_r are independent Geometric random variables with parameter p and
e^t(1 − p) < 1.
E(X) = E(Y₁) + E(Y₂) + ⋯ + E(Y_r) = r/p;
Var(X) = Var(Y₁) + Var(Y₂) + ⋯ + Var(Y_r) = r(1 − p)/p².
Example. An oil company conducts a geological study that indicates that an exploratory oil
well should have a 20% chance of striking oil. What is the probability that the first strike
comes on the third well drilled? ⼀
P(X = 3) = C(2, 0) (1 − p)² p = 0.8² × 0.2 = 0.128.
What is the probability that the third strike comes on the seventh well drilled?
P(X = 7) = C(6, 2) p³ (1 − p)⁴ = 15 × 0.2³ × 0.8⁴ = 0.049.
What is the mean and variance of the number of wells that must be drilled if the oil
company wants to set up three producing wells?
E(X) = r/p = 3/0.2 = 15;  Var(X) = r(1 − p)/p² = 3 × 0.8/0.2² = 60.
A random variable X, taking values in the non-negative integers, has a Poisson distribution
with parameter λ > 0 if
P(X = k) = e^(−λ) λ^k / k!,  k = 0, 1, 2, 3, …
Let 𝑋𝑋 denote the number of events in a given continuous interval. Then 𝑋𝑋 follows an
approximate Poisson process with parameter 𝜆𝜆 > 0 if:
• The number of events occurring in non-overlapping intervals are independent.
• The probability of exactly one event in a short interval of length h = 1/n is approximately
λh = λ/n.
• The probability of exactly two or more events in a short interval is essentially zero.
Consider dividing the given interval into n subintervals; then X has approximately a Binomial
distribution with parameters n and λ/n.
P_X(x) = C(n, x) (λ/n)^x (1 − λ/n)^(n−x)
= (λ^x/x!) [n(n − 1)⋯(n − x + 1)/n^x] (1 − λ/n)^n (1 − λ/n)^(−x)
= (λ^x/x!) (1 − 1/n)(1 − 2/n)⋯(1 − (x − 1)/n) (1 − λ/n)^n (1 − λ/n)^(−x) ⟶ (λ^x/x!) e^(−λ),  x ≪ n,
as n → ∞.
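The limit above can be seen numerically: as n grows, the Binomial(n, λ/n) pmf approaches the Poisson pmf (a small sketch; λ = 3 and x = 2 are arbitrary choices):

```python
from math import comb, exp, factorial

lam, x = 3.0, 2
poisson = exp(-lam) * lam**x / factorial(x)

# Binomial(n, lam/n) pmf at x approaches the Poisson pmf as n grows.
for n in (10, 100, 10000):
    binom = comb(n, x) * (lam / n)**x * (1 - lam / n)**(n - x)
    print(n, binom)

print("poisson:", poisson)
```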
Exponential Distribution
Suppose that X is the number of customers arriving at a bank in one hour. If λ > 0 is the
mean number of customers arriving in one hour, the number of customers arriving in t
hours has a Poisson distribution with mean λt. Let W be the waiting time until the first
customer arrives.
F_W(w) = P(W ≤ w) = 1 − P(W > w) = 1 − P(no customer arrived in [0, w]) = 1 − e^(−λw).
For w > 0, the pdf of W is defined by
f_W(w) = (d/dw) F_W(w) = λe^(−λw),  w > 0.
W has an exponential distribution with parameter λ.
Exponential Distribution
The mgf of an Exponential random variable X with parameter λ is given by
M(t) = E(e^(tX)) = ∫₀^∞ e^(tx) λe^(−λx) dx = λ ∫₀^∞ e^(−(λ−t)x) dx = [−λ/(λ − t) e^(−(λ−t)x)]₀^∞ = λ/(λ − t),
where t < λ.
Since (d/dt) M(t) = λ/(λ − t)² and (d²/dt²) M(t) = 2λ/(λ − t)³, we have
E(X) = (d/dt) M(t) |_(t=0) = 1/λ;
E(X²) = (d²/dt²) M(t) |_(t=0) = 2/λ² ⟹ Var(X) = 1/λ².
Exponential Distribution
Memoryless property
P(X > s + t | X > t) = P(X > s + t, X > t)/P(X > t) = P(X > s + t)/P(X > t) = e^(−λ(s+t))/e^(−λt) = e^(−λs) = P(X > s)
Example
Suppose that the amount of time one spends in a bank is exponentially distributed with
mean 10 minutes, λ = 1/10. What is the probability that a customer will spend more than 15
minutes in the bank? What is the probability that a customer will spend more than 15
minutes in the bank given that he is still in the bank after 10 minutes?
Solution
P(X > 15) = e^(−15λ) = 0.22
P(X > 15 | X > 10) = P(X > 5) = e^(−0.5) = 0.607
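The memoryless property in this example can be confirmed numerically (a short sketch; note e^(−0.5) ≈ 0.607):

```python
from math import exp

lam = 1 / 10  # rate for a mean waiting time of 10 minutes

p_over_15 = exp(-15 * lam)
# Conditional probability by definition: P(X > 15 | X > 10) = P(X > 15)/P(X > 10).
p_cond = exp(-15 * lam) / exp(-10 * lam)
# Memoryless: this should equal P(X > 5).
p_over_5 = exp(-5 * lam)

print(round(p_over_15, 3), round(p_cond, 3), round(p_over_5, 3))
```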
Gamma Distribution
Suppose that X is the number of customers arriving at a bank in one hour. If λ > 0 is the
mean number of customers arriving in one hour, the number of customers arriving in t
hours has a Poisson distribution with mean λt. Let W be the waiting time until the α-th
customer arrives.
F_W(w) = P(W ≤ w) = 1 − P(W > w)
= 1 − P(fewer than α customers arrived in [0, w]) = 1 − Σ_(k=0)^(α−1) (λw)^k e^(−λw)/k!
= 1 − e^(−λw) − Σ_(k=1)^(α−1) (λw)^k e^(−λw)/k!.
Gamma Distribution
For w > 0, the pdf of W is defined by
f_W(w) = (d/dw) F_W(w) = λe^(−λw) − Σ_(k=1)^(α−1) (λ^k/k!) [k w^(k−1) − λw^k] e^(−λw)
= λe^(−λw) + λe^(−λw) Σ_(k=1)^(α−1) [(λw)^k/k! − (λw)^(k−1)/(k − 1)!]
= λe^(−λw) + λe^(−λw) [(λw)^(α−1)/(α − 1)! − 1]
= λ^α w^(α−1) e^(−λw)/(α − 1)!.
Gamma Distribution
Since f_W(w) ∝ λ^α w^(α−1) e^(−λw), write f_W(w) = K λ^α w^(α−1) e^(−λw), w > 0, α > 0, λ > 0. We
can determine that K = 1/Γ(α), where Γ(α) is the Gamma function.
Chi-Square Distribution
Let X follow a gamma distribution with λ = 1/2 and α = r/2, where r is a positive integer. Then
the probability density function of X is
f_X(x) = x^(r/2 − 1) e^(−x/2) / [2^(r/2) Γ(r/2)],  x > 0.
We say that X follows a chi-square distribution with r degrees of freedom, denoted χ²(r).
The mgf of X is M(t) = (1 − 2t)^(−r/2), E(X) = r and Var(X) = 2r.
Normal Distribution
Let X be normally distributed with mean μ and variance σ². The pdf of X is defined by
f_X(x | μ, σ²) = [1/√(2πσ²)] exp[−(x − μ)²/(2σ²)],  −∞ < x < ∞.
The normal distribution plays a central role in statistics. There are three main reasons:
1) It and its associated distributions are very tractable analytically.
2) It has the familiar bell shape, whose symmetry makes it an appealing choice for
many popular models.
3) There is the Central Limit Theorem, which shows that, under mild conditions, the
normal distribution can be used to approximate a large variety of distributions in
large samples.
2) It is symmetrical about the mean μ. Its mode, median and mean are all equal.
3) If X is a normally distributed random variable with mean μ and variance σ², then Z = (X − μ)/σ is normally
distributed with mean 0 and variance 1, called a standard normal random variable. That is, the normal
distribution belongs to the family of location and scale distributions.
4) The cdf of a standard normal random variable Z is defined by
Φ(z) = P(Z < z).
Tables are often available for finding the value of Φ(z) for a given z value. You can visit the website
http://stattrek.com/online-calculator/normal.aspx to evaluate either a z for a given value of Φ(z) or a
value of Φ(z) for a given value of z. Hence
P(a < Z < b) = Φ(b) − Φ(a).
Example.
The United States Environmental Protection Agency (EPA) has developed procedures for
measuring vehicle emission levels of nitrogen oxide. Let 𝑋𝑋 denote the amount of this
pollution in a randomly selected automobile in Houston, Texas. Suppose the distribution of
𝑋𝑋 can be adequately modeled by a normal distribution with a mean level of 𝜇𝜇 = 70 ppb
(parts per billion) and standard deviation of 𝜎𝜎 = 13 ppb.
(a) What is the probability that a randomly selected vehicle will have emission levels less
than 60 ppb?
(b) What is the probability that a randomly selected vehicle will have emission levels
greater than 90 ppb?
(c) What is the probability that a randomly selected vehicle will have emission levels
between 60 and 90 ppb?
(d) A State of Texas environmental agency is going to offer a reduced vehicle license fee to
those vehicles having very low emission levels. As a preliminary pilot project, they will offer this
incentive to the group of vehicle owners having the best 10% of emission levels. What emission level
should the agency use?
0.1 = P(X ≤ x) = P(Z ≤ (x − 70)/13).
It follows that (x − 70)/13 = −1.28, or x = 53.36.
Continuous Random Variables
Illustration of Mathematical Properties
Let 𝑋𝑋 be normally distributed with mean 𝜇𝜇 and variance 𝜎𝜎 2 .
• The mgf of X is
M(t) = E(e^(tX)) = [1/√(2πσ²)] ∫_(−∞)^∞ e^(tx) e^(−(x−μ)²/(2σ²)) dx
= [1/√(2πσ²)] ∫_(−∞)^∞ e^(−[x² − 2(μ+σ²t)x + μ²]/(2σ²)) dx
= e^([(μ+σ²t)² − μ²]/(2σ²)) [1/√(2πσ²)] ∫_(−∞)^∞ e^(−[x − μ − σ²t]²/(2σ²)) dx
= e^(σ²t(2μ+σ²t)/(2σ²)) = e^(μt + σ²t²/2).
• Since Z = (X − μ)/σ has mean 0 and variance 1,
E(X) = E(μ + σZ) = μ
Var(X) = Var(μ + σZ) = σ²
Let X and Y be two discrete random variables, and let S denote the two-dimensional
support of X and Y. Then, the function f_{X,Y}(x, y) = P(X = x, Y = y) is a joint probability
mass function (pmf) if it satisfies the following three conditions:
• 0 ≤ f_{X,Y}(x, y) ≤ 1
• Σ_{(x,y)∈S} f_{X,Y}(x, y) = 1
• P((X, Y) ∈ A) = Σ_{(x,y)∈A} f_{X,Y}(x, y), where A ⊂ S.
Example. Let X and Y be two independent random variables having respective pmfs
f_X(x) = (1 − λ)λ^x and f_Y(y) = (1 − μ)μ^y for x, y = 0, 1, 2, … What is the pmf of Z = min(X, Y)?
Note that for z ≥ 0,
P(Z ≥ z) = P(X ≥ z, Y ≥ z) = P(X ≥ z) P(Y ≥ z) = [Σ_(x=z)^∞ (1 − λ)λ^x][Σ_(y=z)^∞ (1 − μ)μ^y] = λ^z μ^z.
Hence, for any z ≥ 0, P(Z = z) = P(Z ≥ z) − P(Z ≥ z + 1) = λ^z μ^z (1 − λμ).
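The tail-difference argument can be checked numerically for particular parameter values (a brief sketch; λ = 0.6 and μ = 0.4 are arbitrary):

```python
lam, mu = 0.6, 0.4

def tail(z):
    # P(Z >= z) = (lam * mu)^z, from summing the two geometric series.
    return lam**z * mu**z

for z in range(5):
    direct = tail(z) - tail(z + 1)                 # P(Z = z) from the tails
    formula = lam**z * mu**z * (1 - lam * mu)      # closed form from the slide
    assert abs(direct - formula) < 1e-12

print("pmf of Z = min(X, Y) verified for z = 0..4")
```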
Let X and Y be random variables (discrete or continuous) with means μ_X and μ_Y. Let the
joint support of X and Y be S. The covariance of X and Y is defined by
σ_{X,Y} = Cov(X, Y) = E[(X − μ_X)(Y − μ_Y)].
• For the discrete case,
Cov(X, Y) = Σ_{(x,y)∈S} (x − μ_X)(y − μ_Y) f_{X,Y}(x, y)
• For the continuous case,
Cov(X, Y) = ∫∫_S (x − μ_X)(y − μ_Y) f_{X,Y}(x, y) dx dy
Let X and Y be random variables (discrete or continuous) with standard deviations σ_X and
σ_Y. The correlation coefficient of X and Y is defined by
ρ_{X,Y} = Corr(X, Y) = Cov(X, Y)/(σ_X σ_Y) = σ_{X,Y}/(σ_X σ_Y).
In the example, σ_X² = 1² × 0.5 + 2² × 0.5 − 1.5² = 0.25 and σ_Y² = 1² × 0.25 + 2² ×
0.5 + 3² × 0.25 − 2² = 0.5. The correlation coefficient of X and Y is
ρ_{X,Y} = σ_{X,Y}/(σ_X σ_Y) = 0.25/√(0.25 × 0.5) = 0.71.
Interpretation of correlation.
• −1 ≤ ρ_{X,Y} ≤ 1 (this follows since Var(X − tY) ≥ 0 for every t ∈ ℝ)
• If ρ_{X,Y} = 1, then X and Y are perfectly, positively, linearly correlated.
• If ρ_{X,Y} = −1, then X and Y are perfectly, negatively, linearly correlated.
• If ρ_{X,Y} = 0, then X and Y are not linearly correlated at all. That is, X and Y may be
perfectly correlated in some other manner, in a parabolic manner, perhaps, but not in a
linear manner.
• If ρ_{X,Y} > 0, then X and Y are positively, linearly correlated, but not perfectly so.
• If ρ_{X,Y} < 0, then X and Y are negatively, linearly correlated, but not perfectly so.
In our example, we can conclude that X and Y are positively, linearly correlated, but not
perfectly so.
If X and Y are independent, then
Cov(X, Y) = Σ_{x∈S₁} Σ_{y∈S₂} (x − μ_X)(y − μ_Y) f_{X,Y}(x, y) = Σ_{x∈S₁} Σ_{y∈S₂} (x − μ_X)(y − μ_Y) f_X(x) f_Y(y) = 0.
Counterexample
Note that X and Y are dependent. But we have XY = 0 and E(X) = 0. It follows that Cov(X, Y) =
E(XY) − E(X) E(Y) = 0.
Example. Let Z be a standard normal random variable. Z and Z² are dependent, but Cov(Z, Z²) = E(Z³) = 0.
The correlation coefficient
Example. A quality control inspector for a t-shirt manufacturer inspects t-shirts for defects.
She labels each t-shirt she inspects as either good, a second, or defective. The quality
control inspector inspects n = 2 t-shirts. Let X be the number of good t-shirts and Y be
the number of second t-shirts. Assume that the probability that a t-shirt is good is 0.6 and
that a t-shirt is a second is 0.2. Are X and Y independent? What is the correlation between
X and Y?
The joint pmf of X and Y is trinomial: f_{X,Y}(x, y) = [2!/(x! y! (2 − x − y)!)] 0.6^x 0.2^y 0.2^(2−x−y), 0 ≤ x + y ≤ 2.
Since the joint pmf cannot be factorized, X and Y are not independent. Since the marginal
pmf of X is Binomial with parameters n = 2 and p = 0.6, μ_X = 2 × 0.6 = 1.2 and σ_X² =
2 × 0.6 × 0.4 = 0.48. Similarly, the marginal pmf of Y is Binomial with parameters n = 2
and p = 0.2, so μ_Y = 2 × 0.2 = 0.4 and σ_Y² = 2 × 0.2 × 0.8 = 0.32. The expectation of XY is
μ_{XY} = Σ_x Σ_y xy f_{X,Y}(x, y) = f_{X,Y}(x = 1, y = 1) = 0.24. We have
Cov(X, Y) = 0.24 − 1.2 × 0.4 = −0.24 and Corr(X, Y) = Cov(X, Y)/(σ_X σ_Y) = −0.61.
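The trinomial computations in this example can be reproduced by summing over the full support (a small sketch; the variable names are arbitrary):

```python
from math import factorial

pG, pS = 0.6, 0.2  # P(good), P(second); P(defective) = 0.2
n = 2

def f(x, y):
    # Trinomial joint pmf for x good and y second t-shirts out of n = 2.
    return (factorial(n) / (factorial(x) * factorial(y) * factorial(n - x - y))
            * pG**x * pS**y * (1 - pG - pS)**(n - x - y))

support = [(x, y) for x in range(n + 1) for y in range(n + 1 - x)]
mu_X = sum(x * f(x, y) for x, y in support)
mu_Y = sum(y * f(x, y) for x, y in support)
e_XY = sum(x * y * f(x, y) for x, y in support)
cov = e_XY - mu_X * mu_Y

var_X = n * pG * (1 - pG)  # marginal of X is Binomial(2, 0.6)
var_Y = n * pS * (1 - pS)  # marginal of Y is Binomial(2, 0.2)
corr = cov / (var_X * var_Y)**0.5
print(round(cov, 2), round(corr, 2))
```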
Conditional pmf
The conditional pmf of X, given that Y = y, is defined by
g_{X|Y}(x|y) = f_{X,Y}(x, y)/f_Y(y),  f_Y(y) > 0.
The conditional pmf of Y, given that X = x, is defined by
h_{Y|X}(y|x) = f_{X,Y}(x, y)/f_X(x),  f_X(x) > 0.
Note that
Σ_{x∈S₁} g_{X|Y}(x|y) = 1 and Σ_{y∈S₂} h_{Y|X}(y|x) = 1
E(X) = 2 ∫₀¹ x² dx = 2[x³/3]₀¹ = 2/3 and E(Y) = 2/3.
Example. Let X and Y have the pdf f_{X,Y}(x, y) = x + y for 0 < x < 1 and 0 < y < 1. By symmetry,
E(X) = ∫₀¹ x(x + 1/2) dx = [x³/3 + x²/4]₀¹ = 7/12 and E(Y) = 7/12.
Two Continuous Random Variables
E[((X − μ_X)/σ_X)((Y − μ_Y)/σ_Y)] = ∫_(−∞)^∞ ∫_(−∞)^∞ xy [1/(2π√(1 − ρ²))] e^(−(x² − 2ρxy + y²)/(2(1−ρ²))) dx dy
= [1/(2π√(1 − ρ²))] ∫_(−∞)^∞ ∫_(−∞)^∞ xy e^(−[(y − ρx)² + x²(1−ρ²)]/(2(1−ρ²))) dy dx
= [1/(2π√(1 − ρ²))] ∫_(−∞)^∞ x e^(−x²/2) ∫_(−∞)^∞ y e^(−(y − ρx)²/(2(1−ρ²))) dy dx
= [1/(2π√(1 − ρ²))] ∫_(−∞)^∞ x e^(−x²/2) ρx √(2π(1 − ρ²)) dx = [ρ/√(2π)] ∫_(−∞)^∞ x² e^(−x²/2) dx = ρ,
since ∫_(−∞)^∞ x² e^(−x²) dx = √π/2. It follows that Cov(X, Y) = ρσ_X σ_Y.
If 𝑋𝑋 and 𝑌𝑌 have a bivariate normal distribution with correlation coefficient 𝜌𝜌, then 𝑋𝑋 and 𝑌𝑌 are
independent if and only if 𝜌𝜌 = 0.
Proof. Note that
f_{X,Y}(x, y) = [1/(2πσ_X σ_Y √(1 − ρ²))] exp{−[1/(2(1 − ρ²))] [((y − μ_Y)/σ_Y − ρ(x − μ_X)/σ_X)² + (1 − ρ²)((x − μ_X)/σ_X)²]}.
The distribution function technique to find the pdf of X uses the following steps:
• Find the cdf F_X(x) = P(X ≤ x)
• The pdf is f_X(x) = (d/dx) F_X(x)
Example. Let X be a random variable with pdf f_X(x) = 3(1 − x)² for 0 < x < 1. What is the pdf of
Y = (1 − X)³?
Answer. The cdf of Y is given by
F_Y(y) = P(Y ≤ y) = P(1 − X ≤ y^(1/3)) = P(X ≥ 1 − y^(1/3)) = ∫_(1−y^(1/3))^1 3(1 − x)² dx
= [−(1 − x)³]_(1−y^(1/3))^1 = y,  0 < y < 1.
Hence the pdf of Y is f_Y(y) = 1, 0 < y < 1. It follows that Y is U(0, 1) distributed.
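The conclusion that Y is U(0, 1) can be checked by simulation: sample X from its pdf by inverting the cdf F_X(x) = 1 − (1 − x)³, then examine Y = (1 − X)³ (a sketch, not part of the original slides; the seed, sample size, and the 0.3 check point are arbitrary):

```python
import random

random.seed(7)

# Sample X with pdf 3(1-x)^2 via inverse-cdf: F_X(x) = 1 - (1-x)^3.
def sample_x():
    u = random.random()
    return 1 - (1 - u) ** (1 / 3)

ys = [(1 - sample_x()) ** 3 for _ in range(100_000)]

# If Y is U(0,1), its mean is 1/2 and P(Y <= 0.3) is 0.3.
mean_y = sum(ys) / len(ys)
frac_below = sum(y <= 0.3 for y in ys) / len(ys)
print(round(mean_y, 3), round(frac_below, 3))
```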
Let X be a random variable with pdf f_X(x) defined on the support a₁ < x < a₂. Suppose
that Y = u(X) is an invertible function of X on the support, that is, X = u⁻¹(Y) for
u(a₁) < y < u(a₂). Then, the pdf of Y is given by
f_Y(y) = f_X(u⁻¹(y)) |du⁻¹(y)/dy|,
for u(a₁) < y < u(a₂).
Example. Let the pdf of X be f_X(x) = 3x² for 0 < x < 1. Find the pdf of Y = X².
Note that y = x² is a monotone function on (0, 1) with inverse x = √y defined on (0, 1).
Since dx/dy = 1/(2√y), f_Y(y) = 3(√y)² × 1/(2√y) = (3/2)√y, 0 < y < 1.
Example. Note that Y₁ and Y₂ are independent. The marginal pdf of Y₂ is given by
f_{Y₂}(y₂) = [1/(Γ(α + β) θ^(α+β))] y₂^(α+β−1) e^(−y₂/θ),  y₂ > 0.
The marginal pdf of Y₁ is given by
f_{Y₁}(y₁) = [Γ(α + β)/(Γ(α)Γ(β))] y₁^(α−1) (1 − y₁)^(β−1),  0 < y₁ < 1.
It follows that Y₁ has a Beta pdf with parameters α and β and Y₂ has a Gamma pdf with
parameters α + β and θ.
Let 𝑋𝑋1 , 𝑋𝑋2 be a random sample of size 2 and 𝑌𝑌 = 𝑋𝑋1 + 𝑋𝑋2 . We have
𝐸𝐸 𝑌𝑌 = 𝐸𝐸 𝑋𝑋1 + 𝐸𝐸 𝑋𝑋2 .
Proof.
E(Y) = ∫_(−∞)^∞ ∫_(−∞)^∞ (x₁ + x₂) f_{X₁,X₂}(x₁, x₂) dx₁ dx₂
= ∫_(−∞)^∞ ∫_(−∞)^∞ (x₁ + x₂) f_{X₁}(x₁) f_{X₂}(x₂) dx₁ dx₂
= ∫_(−∞)^∞ x₁ f_{X₁}(x₁) [∫_(−∞)^∞ f_{X₂}(x₂) dx₂] dx₁ + ∫_(−∞)^∞ x₂ f_{X₂}(x₂) [∫_(−∞)^∞ f_{X₁}(x₁) dx₁] dx₂
= E(X₁) + E(X₂).
Let 𝑋𝑋1 , 𝑋𝑋2 be a random sample of size 2 and 𝑌𝑌 = 𝑋𝑋1 + 𝑋𝑋2 . We have
𝑉𝑉𝑐𝑐𝑣𝑣 𝑌𝑌 = 𝑉𝑉𝑐𝑐𝑣𝑣 𝑋𝑋1 + 𝑉𝑉𝑐𝑐𝑣𝑣 𝑋𝑋2 .
Proof.
E(X₁X₂) = ∫∫ x₁x₂ f_{X₁,X₂}(x₁, x₂) dx₁ dx₂ = ∫∫ x₁x₂ f_{X₁}(x₁) f_{X₂}(x₂) dx₁ dx₂
= [∫_(−∞)^∞ x₁ f_{X₁}(x₁) dx₁][∫_(−∞)^∞ x₂ f_{X₂}(x₂) dx₂] = E(X₁) E(X₂).
Var(Y) = E[(X₁ − E(X₁)) + (X₂ − E(X₂))]²
= Var(X₁) + Var(X₂) + 2E[(X₁ − E(X₁))(X₂ − E(X₂))]
= Var(X₁) + Var(X₂) + 2[E(X₁X₂) − E(X₁)E(X₂)] = Var(X₁) + Var(X₂).
Let X₁, X₂, …, X_n be independent random variables. Suppose that their joint pdf is
f₁(x₁) f₂(x₂) ⋯ f_n(x_n).
The expected value of Y = u(X₁, X₂, …, X_n) is given by
E(Y) = ∫_(−∞)^∞ ∫_(−∞)^∞ ⋯ ∫_(−∞)^∞ u(x₁, x₂, …, x_n) f₁(x₁) f₂(x₂) ⋯ f_n(x_n) dx₁ dx₂ ⋯ dx_n,
provided that the integral exists. In particular, we have
E[u₁(X₁) u₂(X₂) ⋯ u_n(X_n)] = ∏ⱼ₌₁ⁿ ∫_(−∞)^∞ uⱼ(xⱼ) fⱼ(xⱼ) dxⱼ = E[u₁(X₁)] E[u₂(X₂)] ⋯ E[u_n(X_n)].
Suppose that 𝑋𝑋1 , 𝑋𝑋2 , … , 𝑋𝑋𝑛𝑛 are independent random variables with means 𝜇𝜇1 , 𝜇𝜇2 , … , 𝜇𝜇𝑛𝑛
and variances 𝜎𝜎12 , 𝜎𝜎22 , … , 𝜎𝜎𝑛𝑛2 . Then, the mean and variance of 𝑌𝑌 = 𝑐𝑐1 𝑋𝑋1 + 𝑐𝑐2 𝑋𝑋2 + ⋯ +
𝑐𝑐𝑛𝑛 𝑋𝑋𝑛𝑛 are 𝜇𝜇𝑌𝑌 = 𝑐𝑐1 𝜇𝜇1 + 𝑐𝑐2 𝜇𝜇2 + ⋯ + 𝑐𝑐𝑛𝑛 𝜇𝜇𝑛𝑛 and 𝜎𝜎𝑌𝑌2 = 𝑐𝑐12 𝜎𝜎12 + 𝑐𝑐22 𝜎𝜎22 + ⋯ + 𝑐𝑐𝑛𝑛2 𝜎𝜎𝑛𝑛2 .
Proof.
𝐸𝐸 𝑌𝑌 = 𝐸𝐸 𝑐𝑐1 𝑋𝑋1 + 𝐸𝐸 𝑐𝑐2 𝑋𝑋2 + ⋯ + 𝐸𝐸 𝑐𝑐𝑛𝑛 𝑋𝑋𝑛𝑛 = 𝑐𝑐1 𝜇𝜇1 + 𝑐𝑐2 𝜇𝜇2 + ⋯ + 𝑐𝑐𝑛𝑛 𝜇𝜇𝑛𝑛
𝑉𝑉𝑐𝑐𝑣𝑣 𝑌𝑌 = 𝑉𝑉𝑐𝑐𝑣𝑣 𝑐𝑐1 𝑋𝑋1 + 𝑉𝑉𝑐𝑐𝑣𝑣 𝑐𝑐2 𝑋𝑋2 + ⋯ + 𝑉𝑉𝑐𝑐𝑣𝑣 𝑐𝑐𝑛𝑛 𝑋𝑋𝑛𝑛 = 𝑐𝑐12 𝜎𝜎12 + 𝑐𝑐22 𝜎𝜎22 + ⋯ + 𝑐𝑐𝑛𝑛2 𝜎𝜎𝑛𝑛2
In general, let σ_{i,j} = Cov(Xᵢ, Xⱼ). We have
Var(Y) = Σᵢ₌₁ⁿ cᵢ²σᵢ² + 2 Σᵢ₌₁ⁿ⁻¹ Σⱼ₌ᵢ₊₁ⁿ cᵢcⱼσ_{i,j}.
Let 𝑋𝑋1 , 𝑋𝑋2 , … , 𝑋𝑋𝑛𝑛 be a random sample of size 𝑛𝑛 from a distribution with mean 𝜇𝜇 and
variance 𝜎𝜎 2 . The mean and variance of the sample mean 𝑋𝑋� are
E(X̄) = E(X₁/n + X₂/n + ⋯ + X_n/n) = μ/n + μ/n + ⋯ + μ/n = μ
Var(X̄) = Var(X₁/n + X₂/n + ⋯ + X_n/n) = σ²/n² + σ²/n² + ⋯ + σ²/n² = (σ²/n²) × n = σ²/n
Let 𝑋𝑋1 be the number of heads in two tosses of a fair coin and 𝑋𝑋2 be the number of heads
in three tosses of a fair coin. We let 𝑌𝑌 = 𝑋𝑋1 + 𝑋𝑋2 .
We understand that
• X₁ has a binomial distribution with n = 2 and p = 0.5. Its mgf is M_{X₁}(t) = (1/2 + (1/2)e^t)².
• X₂ has a binomial distribution with n = 3 and p = 0.5. Its mgf is M_{X₂}(t) = (1/2 + (1/2)e^t)³.
Suppose that X₁, X₂, …, X_n are independent random variables with mgfs M_{Xᵢ}(t) =
E(e^(tXᵢ)), i = 1, 2, …, n, respectively. The mgf of Y = Σⱼ₌₁ⁿ aⱼXⱼ is given by
M_Y(t) = E(e^(tY)) = E(e^(ta₁X₁) e^(ta₂X₂) ⋯ e^(ta_nX_n)) = ∏ᵢ₌₁ⁿ E(e^(aᵢtXᵢ)) = ∏ᵢ₌₁ⁿ M_{Xᵢ}(aᵢt).
If X₁, X₂, …, X_n are a random sample from a population with mgf M(t), then
• The mgf of Y = Σⱼ₌₁ⁿ Xⱼ is ∏ⱼ₌₁ⁿ M(t) = [M(t)]ⁿ.
• The mgf of X̄ is [M(t/n)]ⁿ.
Let 𝑋𝑋1 , 𝑋𝑋2 and 𝑋𝑋3 be a random sample of size 3 from a gamma distribution with 𝛼𝛼 = 7 and
𝜃𝜃 = 5. Let 𝑌𝑌 = 𝑋𝑋1 + 𝑋𝑋2 + 𝑋𝑋3 .
What is the distribution of Y?
The mgf of the gamma random variable is M(t) = (1 − 5t)^(−7) for t < 1/5. Thus, the mgf of Y
is M_Y(t) = [M(t)]³ = (1 − 5t)^(−21) for t < 1/5. It follows that Y has a gamma distribution
with α = 21 and θ = 5.
What is the distribution of X̄?
The mgf of X̄ is M_{X̄}(t) = [M(t/3)]³ = (1 − (5/3)t)^(−21) for t < 3/5. Hence, X̄ has a gamma
distribution with α = 21 and θ = 5/3.
For i = 1, 2, …, n, Xᵢ has a Chi-square distribution with rᵢ degrees of freedom. Assume that
X₁, X₂, …, X_n are independent. Then, Y = Σⱼ₌₁ⁿ Xⱼ has a Chi-square distribution with
Σⱼ₌₁ⁿ rⱼ degrees of freedom.
• Answer. The mgf of a Chi-square random variable with r degrees of freedom is M(t) =
(1 − 2t)^(−r/2) for t < 1/2. Hence, the mgf of Y is ∏ⱼ₌₁ⁿ (1 − 2t)^(−rⱼ/2) = (1 − 2t)^(−(1/2) Σⱼ₌₁ⁿ rⱼ)
for t < 1/2.
• In particular, let Z₁, Z₂, …, Z_n be independent standard normal variables. W = Σⱼ₌₁ⁿ Zⱼ²
has a Chi-square distribution with n degrees of freedom.
For 𝑖𝑖 = 1,2, … , 𝑛𝑛, 𝑋𝑋𝑖𝑖 has a normal distribution with mean 𝜇𝜇𝑖𝑖 and variance 𝜎𝜎𝑖𝑖2 . Assume that
𝑋𝑋1 , 𝑋𝑋2 , … , 𝑋𝑋𝑛𝑛 are independent. Then,
W = Σⱼ₌₁ⁿ [(Xⱼ − μⱼ)/σⱼ]²
has a Chi-square distribution with 𝑛𝑛 degrees of freedom.
For 𝑖𝑖 = 1,2, … , 𝑛𝑛, 𝑋𝑋𝑖𝑖 has a normal distribution with mean 𝜇𝜇𝑖𝑖 and variance 𝜎𝜎𝑖𝑖2 . Assume that
𝑋𝑋1 , 𝑋𝑋2 , … , 𝑋𝑋𝑛𝑛 are independent. Then,
Y = Σⱼ₌₁ⁿ cⱼXⱼ
has a normal distribution with mean Σⱼ₌₁ⁿ cⱼμⱼ and variance Σⱼ₌₁ⁿ cⱼ²σⱼ².
Answer. The mgf of a normal random variable with mean μ and variance σ² is M(t) = e^(μt + σ²t²/2).
The mgf of Y is
M_Y(t) = ∏ⱼ₌₁ⁿ M_{Xⱼ}(cⱼt) = exp(Σⱼ₌₁ⁿ cⱼμⱼ t + (1/2) Σⱼ₌₁ⁿ cⱼ²σⱼ² t²).
Example. Let 𝑋𝑋𝑖𝑖 denote the weight of a randomly selected prepackaged one-pound bag of carrots.
Past records suggest that 𝑋𝑋𝑖𝑖 is normally distributed with mean of 1.18 pounds and a standard
deviation of 0.07 pound. Now, let 𝑊𝑊 denote the weight of a randomly selected prepackaged three-
pound bag of carrots. It is known that 𝑊𝑊 is normally distributed with mean 3.22 pounds and
standard deviation 0.09 pound. Selecting bags at random, what is the probability that the sum of
three one-pound bags exceeds the weight of one three-pound bag?
Because bags are selected at random, we assume that 𝑋𝑋1 , 𝑋𝑋2 , 𝑋𝑋3 and 𝑊𝑊 are independent. Let 𝑌𝑌 =
𝑋𝑋1 + 𝑋𝑋2 + 𝑋𝑋3 be the sum of the weights of three one-pound bags. Then, 𝑌𝑌 is normally distributed
with mean 1.18 + 1.18 + 1.18 = 3.54 and variance 0.07² + 0.07² + 0.07² = 0.0147. Since Y and
W are independent, Y − W is normally distributed with mean 3.54 − 3.22 = 0.32 and variance
0.0147 + 0.09² = 0.0228.
P(Y > W) = P(Y − W > 0) = P(Z > (0 − 0.32)/√0.0228) = P(Z > −2.12) = P(Z < 2.12) = 0.9830
Example. GPAs have been recorded for a random sample of 16 from the entering freshman
class at a major university. It can be assumed that the distribution of GPA values is
approximately normal. The sample yielded a mean, x̄ = 3.1, and standard deviation, s =
0.8. The nationwide mean GPA of entering freshmen is μ = 2.7. What is the probability of
getting an X̄ which is greater than or equal to x̄ = 3.1 if the mean GPA of this university is
the same as the nationwide population of students?
Answer.
P(X̄ ≥ x̄) = P(T ≥ (x̄ − 2.7)/(s/√16)) = P(T ≥ (3.1 − 2.7)/(0.8/√16)) = P(T ≥ 2.0) ≅ 0.032,
where T ~ t₁₅.
Chebyshev's inequality:
P(|X − μ| > ε) ≤ σ²/ε².
Proof. WLOG assume that μ = 0.
σ² = E(X²) = ∫_(−∞)^∞ x² f_X(x) dx ≥ ∫_(−∞)^(−ε) x² f_X(x) dx + ∫_ε^∞ x² f_X(x) dx
≥ ∫_(−∞)^(−ε) ε² f_X(x) dx + ∫_ε^∞ ε² f_X(x) dx = ε² P(|X| > ε).
Example. A large drug company has 100 potential new prescription drugs under clinical test.
About 20% of all drugs that reach this stage are eventually licensed for sale. What is the
probability that at least 15 of the 100 drugs are eventually licensed? Assume that the
binomial assumptions are satisfied, and use a normal approximation with continuity correction.
Answer.
Let 𝑋𝑋 be the number of prescription drugs that are eventually licensed for sale. Then 𝑋𝑋 has a
binomial distribution with 𝑛𝑛 = 100 and 𝑝𝑝 = 0.2.
P(X ≥ 15) = P(Z ≥ (14.5 − 100 × 0.2)/√(100 × 0.2 × 0.8)) = P(Z ≥ −1.38) = P(Z ≤ 1.38) = 0.9162.
Note that the continuity correction refers to the approximation 𝑃𝑃 𝑋𝑋 ≥ 15 = 𝑃𝑃 𝑋𝑋 ≥ 14.5 .
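The quality of the continuity-corrected approximation can be gauged by comparing it with the exact binomial tail (a sketch, not part of the original slides):

```python
from math import comb, erf, sqrt

n, p = 100, 0.2

# Exact binomial tail P(X >= 15).
exact = sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(15, n + 1))

# Normal approximation with continuity correction: P(X >= 14.5).
mu, sigma = n * p, sqrt(n * p * (1 - p))
approx = 0.5 * (1 + erf((mu - 14.5) / (sigma * sqrt(2))))

print(round(exact, 4), round(approx, 4))
```

The two values agree to within a few thousandths, which is why the correction is the standard recommendation when approximating a discrete distribution by a continuous one.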
Example. The annual number of earthquakes registering at least 2.5 on the Richter Scale and
having an epicenter within 40 miles of downtown Memphis follows a Poisson distribution
with mean 6.5. What is the probability that at least 9 such earthquakes will strike next year?
Answer. Let 𝑋𝑋 be the number of earthquakes. We answer the question in the following ways:
• Using the Poisson distribution with mean 6.5, 𝑃𝑃 𝑋𝑋 ≥ 9 = 1 − 𝑃𝑃 𝑋𝑋 ≤ 8 = 1 − 0.792 =
0.208.
• Using the normal approximation, the distribution of 𝑋𝑋 is approximated by a normal
distribution with mean 6.5 and variance 6.5.
P(X ≥ 9) = P(Y > 8.5) = P(Z > (8.5 − 6.5)/√6.5) = P(Z > 0.78) = 0.218.