UECM2273 Mathematical Statistics


CHAPTER ONE:
RANDOM VARIABLES AND THEIR DISTRIBUTION

This topic serves only as a revision of Probability & Statistics I and II, so all basic
concepts are stated only briefly. Students enrolled in this course are advised to revise the
material learnt in those two courses, because it will also be tested.

1.1 Probability
Definition 1.1: Suppose S is a sample space associated with an experiment. To every event
A in S (A is a subset of S), we assign a number 𝑃(𝐴), called the probability of A, so that the
following axioms hold:
Axiom 1: 𝑃(𝐴) ≥ 0.
Axiom 2: 𝑃(𝑆) = 1.
Axiom 3: If $A_1, A_2, A_3, \ldots$ form a sequence of pairwise mutually exclusive (disjoint)
events in S (that is, $A_i \cap A_j = \emptyset$ if $i \neq j$), then $P\left(\bigcup_{i=1}^{\infty} A_i\right) = \sum_{i=1}^{\infty} P(A_i)$.

1.2 Discrete Random Variables


Definition 1.2: A random variable, say X, is a function defined over a sample space, S, that
associates a real number, 𝑋(𝑒) = 𝑥, with each possible outcome e in S.

Example 1.1: A four-sided (tetrahedral) die has a different number (1, 2, 3, or 4) affixed to
each side. On any given roll, each of the four numbers is equally likely to occur. A game
consists of rolling the die twice, and the score is the maximum of the two numbers that
occur. Although the score cannot be predicted, we can determine the set of possible values
and define a random variable. In particular, if $e = (i, j)$ where $i, j \in \{1,2,3,4\}$, then $X(e) = \max(i, j)$. The sample space, S, and X are illustrated in Fig. 1.

[Fig. 1: The sample space S consists of the 16 equally likely ordered pairs (i, j) with $i, j \in \{1,2,3,4\}$, arranged in a 4 × 4 grid; each pair is mapped to the value $x = \max(i, j) \in \{1, 2, 3, 4\}$ on the x-axis.]
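As a quick check of this construction, the following Python sketch (standard library only) enumerates the 16 outcomes and tabulates the distribution of $X = \max(i, j)$; it reproduces the probabilities 1/16, 3/16, 5/16, 7/16 used later in Example 1.3.

```python
from fractions import Fraction
from itertools import product

# Sample space: all 16 equally likely ordered pairs (i, j).
outcomes = list(product(range(1, 5), repeat=2))

# pmf of X = max(i, j): count the outcomes mapping to each value x.
pmf = {x: Fraction(sum(1 for i, j in outcomes if max(i, j) == x),
                   len(outcomes))
       for x in range(1, 5)}

print(pmf)  # {1: 1/16, 2: 3/16, 3: 5/16, 4: 7/16}
assert sum(pmf.values()) == 1
```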

Definition 1.3: If the set of all possible values of a random variable, X, is a finite set
$x_1, x_2, \ldots, x_n$, or a countably infinite set $x_1, x_2, \ldots$, then X is called a discrete random
variable. The function
$f(x) = P(X = x)$, where $x = x_1, x_2, \ldots$


that assigns the probability to each possible value x will be called the discrete probability
density function (or probability mass function).

Example 1.2: A supervisor in a manufacturing plant has three men and three women
working for him. He wants to choose two workers for a special job. Not wishing to show
any biases in his selection, he decides to select the two workers at random. Let Y denote the
number of women in his selection. Find the probability density function for Y.

Answer: The supervisor can select two workers from six in $\binom{6}{2} = 15$ ways. Hence, S
contains 15 sample points, which we assume to be equally likely because random sampling
was employed. The number of women selected is $y = 0, 1$, or 2. The number of ways of
selecting y women is $\binom{3}{y}$, and the number of ways of selecting the remaining $2 - y$ men is $\binom{3}{2-y}$.
Thus, the probability of selecting y women is

$f(y) = P(Y = y) = \dfrac{\binom{3}{y}\binom{3}{2-y}}{\binom{6}{2}}, \quad y = 0, 1, 2,$

or, in tabular form:

    y    Calculation    f(y)
    0    3/15           1/5
    1    9/15           3/5
    2    3/15           1/5
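The same pmf can be computed directly; a minimal sketch using Python's standard-library math.comb:

```python
from fractions import Fraction
from math import comb

# f(y) = C(3, y) * C(3, 2 - y) / C(6, 2): choose y women and 2 - y men.
pmf = {y: Fraction(comb(3, y) * comb(3, 2 - y), comb(6, 2)) for y in range(3)}

print(pmf)  # {0: 1/5, 1: 3/5, 2: 1/5}
```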

Theorem 1.4: A function $f(x)$ is a discrete pdf if and only if it satisfies both of the
following properties:
1) $f(x_i) \ge 0$ for all $x_i$, and
2) $\sum_{i} f(x_i) = 1$.

Example 1.3: From Example 1.1, we have

$f(1) = P(X = 1) = \frac{1}{16} > 0$

$f(2) = P(X = 2) = \frac{3}{16} > 0$

$f(3) = P(X = 3) = \frac{5}{16} > 0$

$f(4) = P(X = 4) = \frac{7}{16} > 0$


where $\sum_{i} f(x_i) = f(1) + f(2) + f(3) + f(4) = \frac{1}{16} + \frac{3}{16} + \frac{5}{16} + \frac{7}{16} = 1$.


A similar verification applies to Example 1.2.

Example 1.4: We roll a 12-sided die twice. Each face is marked with an integer, 1 through
12, and each value is equally likely to occur on a single roll of the die. We define a random
variable X to be the maximum obtained on the two rolls. It is not hard to see that for each
value x there is an odd number, $2x - 1$, of ways for that value to occur. Thus, the discrete
pdf of X must have the form
$f(x) = c(2x - 1)$ for $x = 1, 2, \ldots, 12$

In order to determine the value of c, we apply Theorem 1.4(2):

$1 = \sum_{x=1}^{12} f(x) = c \sum_{x=1}^{12} (2x - 1) = c\left[2\sum_{x=1}^{12} x - 12\right] = c\left[\frac{2(12)(13)}{2} - 12\right] = c(12)^2$

So, $c = \frac{1}{144}$.
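A one-line numeric check of this normalising constant, sketched in Python:

```python
# Sum of (2x - 1) over x = 1..12 is 12**2 = 144, so c = 1/144.
assert sum(2 * x - 1 for x in range(1, 13)) == 144
print(sum((2 * x - 1) / 144 for x in range(1, 13)))  # 1.0
```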

1.3 Continuous Random Variables


Definition 1.5: A random variable X is called a continuous random variable if there is a
function 𝑓(𝑥), called the probability density function (pdf) of X, such that the cdf can be
represented as

$F(x) = \int_{-\infty}^{x} f(t)\,dt$

By the Fundamental Theorem of Calculus, the pdf of X can be obtained from the cdf by
differentiation:

$f(x) = \frac{d}{dx} F(x) = F'(x)$

wherever the derivative exists.

Theorem 1.6: A function $f(x)$ is a pdf for some continuous random variable X if and only
if it satisfies the properties
1) $f(x) \ge 0$ for all real x, and
2) $\int_{-\infty}^{\infty} f(x)\,dx = 1$.

Example 1.5: A machine produces copper wire, and occasionally there is a flaw at some
point along the wire. The length of wire (in meters) produced between successive flaws is a
continuous random variable X with pdf of the form

$f(x) = \begin{cases} c(1+x)^{-3} & x > 0 \\ 0 & x \le 0 \end{cases}$

where c is a constant. The value of c can be determined by means of Theorem 1.6(2):

$1 = \int_{-\infty}^{\infty} f(x)\,dx = c\int_{0}^{\infty} (1+x)^{-3}\,dx = \frac{c}{2}$

(using the substitution $u = 1 + x$ and an application of the power rule for integrals). This
implies that the constant is $c = 2$.
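A symbolic check of this calculation, sketched with sympy (an assumed dependency):

```python
import sympy as sp

x, c = sp.symbols('x c', positive=True)

# Solve 1 = integral of c*(1+x)^(-3) over (0, oo) for c.
total = sp.integrate(c * (1 + x)**-3, (x, 0, sp.oo))  # -> c/2
print(sp.solve(sp.Eq(total, 1), c))  # [2]
```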

1.4 Distribution Functions


Definition 1.7: The cumulative distribution function (cdf) of a random variable X
is defined for any real x by
$F(x) = P(X \le x)$

Theorem 1.8: Let X be a discrete random variable with pdf $f(x)$ and cdf $F(x)$. If the
possible values of X are indexed in increasing order, $x_1 < x_2 < x_3 < \cdots$, then
$f(x_1) = F(x_1)$, and
$f(x_i) = F(x_i) - F(x_{i-1})$ for any $i > 1$.
Furthermore, if $x < x_1$ then $F(x) = 0$, and for any other real x,
$F(x) = \sum_{x_i \le x} f(x_i)$.

Theorem 1.9: A function $F(x)$ is a cdf for some random variable X if and only if it satisfies
the following properties:
1) $\lim_{x \to -\infty} F(x) = 0$,
2) $\lim_{x \to \infty} F(x) = 1$,
3) $\lim_{h \to 0^{+}} F(x + h) = F(x)$ (right-continuity),
4) $a < b$ implies $F(a) \le F(b)$.

Continuing from Example 1.5, the cdf for this random variable is given by

$F(x) = P(X \le x) = \int_{-\infty}^{x} f(t)\,dt$

$= \begin{cases} \int_{-\infty}^{0} 0\,dt + \int_{0}^{x} 2(1+t)^{-3}\,dt & x > 0 \\ \int_{-\infty}^{x} 0\,dt & x \le 0 \end{cases}$

$= \begin{cases} 1 - (1+x)^{-2} & x > 0 \\ 0 & x \le 0 \end{cases}$


Note that $\int_{a}^{b} f(x)\,dx = P(a \le X \le b) = P(a < X \le b) = P(a \le X < b) = P(a < X < b) = F(b) - F(a)$ for a continuous random variable X.
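For instance, for the wire-flaw pdf of Example 1.5, an interval probability can be computed either by integrating the pdf or from the cdf; a small Python sketch (scipy is an assumed dependency):

```python
from scipy.integrate import quad

F = lambda x: 1 - (1 + x)**-2 if x > 0 else 0.0   # cdf from Example 1.5
f = lambda x: 2 * (1 + x)**-3 if x > 0 else 0.0   # pdf

p_direct, _ = quad(f, 1, 3)   # integrate the pdf over (1, 3)
p_cdf = F(3) - F(1)           # F(b) - F(a)
print(p_direct, p_cdf)        # both 0.1875
```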

1.5 Expectation
Definition 1.10: If X is a discrete random variable with pdf $f(x)$, then the expected value
(mean) of X is defined by

$E(X) = \sum_{x} x f(x)$

Similarly, if $g(X)$ is a real-valued function of X, then the expected value of $g(X)$ is given
by

$E(g(X)) = \sum_{x} g(x) f(x)$

Example 1.8: A box contains four chips. Two are labelled with the number 2, one is labelled
with a 4, and the other with an 8. The average of the numbers on the four chips is $\frac{2+2+4+8}{4} = 4$. The experiment of choosing a chip at random and recording its number can be expressed
with a discrete random variable X having distinct values $x = 2, 4$, or 8, with $f(2) = \frac{1}{2}$ and
$f(4) = f(8) = \frac{1}{4}$. The corresponding expected value or mean is

$\mu = E(X) = 2\left(\frac{1}{2}\right) + 4\left(\frac{1}{4}\right) + 8\left(\frac{1}{4}\right) = 4$

Definition 1.11: If X is a continuous random variable with pdf $f(x)$, then the expected
value (mean) of X is defined by

$E(X) = \int_{-\infty}^{\infty} x f(x)\,dx$

if the integral is absolutely convergent. Otherwise we say that $E(X)$ does not exist. Similarly,
if $g(X)$ is a real-valued function of X, then the expected value of $g(X)$ is given by

$E(g(X)) = \int_{-\infty}^{\infty} g(x) f(x)\,dx$

In Example 1.5, the mean length of wire between flaws is

$\mu = E(X) = \int_{-\infty}^{0} x(0)\,dx + \int_{0}^{\infty} x[2(1+x)^{-3}]\,dx$

If we make the substitution $t = 1 + x$, then

$\mu = 2\int_{1}^{\infty} (t - 1)t^{-3}\,dt = 2\left(1 - \frac{1}{2}\right) = 1$
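This mean can also be obtained symbolically; a one-line sketch with sympy (assumed available):

```python
import sympy as sp

x = sp.symbols('x', positive=True)

# E(X) for the wire-flaw pdf f(x) = 2(1+x)^(-3), x > 0.
mu = sp.integrate(x * 2 * (1 + x)**-3, (x, 0, sp.oo))
print(mu)  # 1
```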

Definition 1.12: The variance of a (continuous or discrete) random variable X is given by

$\sigma^2 = Var(X) = E[(X-\mu)^2] = E(X^2) - \mu^2 = E(X(X-1)) + E(X) - [E(X)]^2$

and its positive square root $\sigma = \sqrt{Var(X)}$ is called the standard deviation.

In Example 1.8, we have $E(X^2) = 2^2\left(\frac{1}{2}\right) + 4^2\left(\frac{1}{4}\right) + 8^2\left(\frac{1}{4}\right) = 22$, and thus the variance
is $Var(X) = 22 - 4^2 = 6$ and the standard deviation is $\sigma = \sqrt{6} \approx 2.45$.

Definition 1.13: The kth moment about the origin of a random variable X is
$\mu_k' = E(X^k)$
and the kth moment about the mean is
$\mu_k = E[X - E(X)]^k = E(X - \mu)^k$

Note that $\mu_1 = E[X - E(X)] = E(X) - E(X) = 0$ and $\mu_2 = E(X - \mu)^2 = \sigma^2$.

Theorem 1.14 (Bounds on Probability): If X is a random variable and $u(x)$ is a
nonnegative real-valued function, then for any constant $c > 0$,

$P[u(X) \ge c] \le \frac{E[u(X)]}{c}$

Proof: If $A = \{x : u(x) \ge c\}$, then for a continuous random variable,

$E[u(X)] = \int_{-\infty}^{\infty} u(x)f(x)\,dx = \int_{A} u(x)f(x)\,dx + \int_{A'} u(x)f(x)\,dx$

$\ge \int_{A} u(x)f(x)\,dx \ge \int_{A} c f(x)\,dx$

$= cP[X \in A] = cP[u(X) \ge c]$


A special case, known as the Markov inequality, is obtained if $u(x) = |x|^r$ for $r > 0$,
namely

$P[|X| \ge c] \le \frac{E(|X|^r)}{c^r}$

Theorem 1.15 (Chebyshev inequality): If X is a random variable with mean $\mu$ and variance
$\sigma^2$, then for any $k > 0$,
$P[|X - \mu| \ge k\sigma] \le \frac{1}{k^2}$ or $P[|X - \mu| < k\sigma] \ge 1 - \frac{1}{k^2}$

Proof: If $u(X) = (X - \mu)^2$ and $c = k^2\sigma^2$, then using Theorem 1.14,

$P[(X - \mu)^2 \ge k^2\sigma^2] \le \frac{E(X - \mu)^2}{k^2\sigma^2} = \frac{1}{k^2}$

Example 1.9: Suppose that X takes on the values −1, 0, and 1 with probabilities 1/8, 6/8, and
1/8, respectively. Then $\mu = 0$ and $\sigma^2 = 1/4$. For $k = 2$,

$P[-2(0.5) < X - 0 < 2(0.5)] = P[-1 < X < 1] = P[X = 0] = \frac{6}{8} = \frac{3}{4} \ge 1 - \frac{1}{2^2} = \frac{3}{4}$
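This exact calculation can be mirrored numerically; a brief Python sketch (standard library only):

```python
# X takes values -1, 0, 1 with probabilities 1/8, 6/8, 1/8.
vals, probs = [-1, 0, 1], [1/8, 6/8, 1/8]

mu = sum(v * p for v, p in zip(vals, probs))             # 0.0
var = sum((v - mu)**2 * p for v, p in zip(vals, probs))  # 0.25
sigma = var**0.5                                         # 0.5

k = 2
p_within = sum(p for v, p in zip(vals, probs) if abs(v - mu) < k * sigma)
print(p_within, 1 - 1 / k**2)  # 0.75 >= 0.75: the Chebyshev bound is tight here
```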

1.6 Moment Generating Functions


Definition 1.16: If X is a random variable, then the expected value
$M_X(t) = E(e^{tX})$

$= \begin{cases} \sum_{x} e^{tx} f(x), & X \text{ is discrete} \\ \int_{-\infty}^{\infty} e^{tx} f(x)\,dx, & X \text{ is continuous} \end{cases}$

is called the moment generating function (mgf) of X if this expected value exists for all
values of t in some interval of the form $-h < t < h$ for some $h > 0$.

Theorem 1.17: If the mgf of X exists, then

$E(X^r) = M_X^{(r)}(0)$ for all $r = 1, 2, \ldots$

Proof: The series expansion of $e^{tX}$ is

$e^{tX} = 1 + tX + \frac{t^2X^2}{2!} + \frac{t^3X^3}{3!} + \cdots + \frac{t^rX^r}{r!} + \cdots$


Hence, $M_X(t) = E(e^{tX}) = 1 + tE(X) + \frac{t^2}{2!}E(X^2) + \frac{t^3}{3!}E(X^3) + \cdots + \frac{t^r}{r!}E(X^r) + \cdots$

$= 1 + t\mu_1' + \frac{t^2}{2!}\mu_2' + \frac{t^3}{3!}\mu_3' + \cdots + \frac{t^r}{r!}\mu_r' + \cdots$

where $\mu_r'$ is the rth moment about the origin. Differentiating $M_X(t)$ r times with respect to t and setting $t = 0$, we obtain the rth moment about the origin, $\mu_r'$.

Example 1.10: A discrete random variable X has pdf $f(x) = \left(\frac{1}{2}\right)^{x+1}$ if $x = 0, 1, 2, \ldots$, and
zero otherwise. The mgf of X is

$M_X(t) = E(e^{tX}) = \sum_{x=0}^{\infty} e^{tx} f(x) = \sum_{x=0}^{\infty} e^{tx}\left(\frac{1}{2}\right)^{x+1} = \frac{1}{2}\sum_{x=0}^{\infty}\left(\frac{e^t}{2}\right)^x$

We make use of the well-known identity for the geometric series,

$1 + s + s^2 + s^3 + \cdots = \frac{1}{1-s}, \quad -1 < s < 1$

with $s = \frac{e^t}{2}$. The resulting mgf is

$M_X(t) = \frac{1}{2 - e^t}, \quad t < \ln 2$

The first derivative is $M_X'(t) = e^t(2 - e^t)^{-2}$, and thus $E(X) = M_X'(0) = e^0(2 - e^0)^{-2} = 1$.
It is possible to obtain higher derivatives, but the complexity increases with the order of the
derivative.
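The higher-order derivatives can be delegated to a computer algebra system; a sketch with sympy (assumed available):

```python
import sympy as sp

t = sp.symbols('t')
M = 1 / (2 - sp.exp(t))  # mgf from Example 1.10

# E(X^r) is the rth derivative of M at t = 0.
print(sp.diff(M, t, 1).subs(t, 0))  # E(X)   = 1
print(sp.diff(M, t, 2).subs(t, 0))  # E(X^2) = 3
```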

Example 1.11: Consider a continuous random variable X with pdf $f(x) = e^{-x}$ if $x > 0$, and
zero otherwise. The mgf is

$M_X(t) = E(e^{tX}) = \int_{-\infty}^{\infty} e^{tx} f(x)\,dx = \int_{0}^{\infty} e^{tx} e^{-x}\,dx$

$= \int_{0}^{\infty} e^{-(1-t)x}\,dx = \left[-\frac{1}{1-t}e^{-(1-t)x}\right]_{0}^{\infty} = \frac{1}{1-t}, \quad t < 1$

The rth derivative is $M_X^{(r)}(t) = r!\,(1-t)^{-(r+1)}$, and thus the rth moment is $E(X^r) = M_X^{(r)}(0) = r!$. The mean is $\mu = E(X) = 1! = 1$, and the variance is $Var(X) = E(X^2) - \mu^2 = 2! - 1 = 1$.

Theorem 1.18: If $Y = aX + b$, then $M_Y(t) = e^{bt} M_X(at)$.

Proof: $M_Y(t) = E(e^{tY}) = E\left(e^{t(aX+b)}\right) = E(e^{atX}e^{bt}) = e^{bt}E(e^{atX}) = e^{bt}M_X(at)$.


Theorem 1.19 (Uniqueness): If $X_1$ and $X_2$ have respective cdfs $F_1(x)$ and $F_2(x)$, and mgfs
$M_1(t)$ and $M_2(t)$, then $F_1(x) = F_2(x)$ for all real x if and only if $M_1(t) = M_2(t)$ for all t in
some interval $-h < t < h$ for some $h > 0$.

Definition 1.20: The rth factorial moment of X is

$E[X(X-1)\cdots(X-r+1)]$

and the factorial moment generating function (FMGF) of X is $G_X(t) = E(t^X)$, if this
expectation exists for all t in some interval of the form $1 - h < t < 1 + h$.

Theorem 1.21: If X has a FMGF, $G_X(t)$, then

$G_X'(1) = E(X)$
$G_X''(1) = E[X(X-1)]$
$\vdots$
$G_X^{(r)}(1) = E[X(X-1)\cdots(X-r+1)]$
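To see the FMGF at work, take the distribution of Example 1.10, $f(x) = (1/2)^{x+1}$: the same geometric-series identity gives $G_X(t) = E(t^X) = \frac{1}{2-t}$ for $|t| < 2$. A sympy sketch (assumed available) checks the first two factorial moments:

```python
import sympy as sp

t = sp.symbols('t')
G = 1 / (2 - t)  # FMGF of the Example 1.10 distribution

print(sp.diff(G, t, 1).subs(t, 1))  # E(X)        = 1
print(sp.diff(G, t, 2).subs(t, 1))  # E[X(X - 1)] = 2, so E(X^2) = 2 + 1 = 3
```

Note that these values agree with the moments obtained from the mgf in Example 1.10.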


1.7 Special Probability Distributions


Some commonly used discrete distributions are as follows:
(Throughout, $q = 1 - p$.)

Bernoulli, $X \sim Ber(p)$: $f(x) = p^x(1-p)^{1-x}$; $x = 0, 1$; $0 < p < 1$.
  $E(X) = p$; $Var(X) = p(1-p)$; $M_X(t) = pe^t + q$.

Binomial, $X \sim Bin(n, p)$: $f(x) = \binom{n}{x} p^x(1-p)^{n-x}$; $x = 0, 1, 2, \ldots, n$; $0 < p < 1$.
  $E(X) = np$; $Var(X) = np(1-p)$; $M_X(t) = (pe^t + q)^n$, $-\infty < t < \infty$.

Geometric, $X \sim Geo(p)$: $f(x) = p(1-p)^{x-1}$; $x = 1, 2, 3, \ldots$; $0 < p < 1$.
  $E(X) = \frac{1}{p}$; $Var(X) = \frac{1-p}{p^2}$; $M_X(t) = \frac{pe^t}{1 - qe^t}$, $t < -\ln(q)$.

Hypergeometric, $X \sim Hyp(n, M, N)$: $f(x) = \dfrac{\binom{M}{x}\binom{N-M}{n-x}}{\binom{N}{n}}$.
  $E(X) = \frac{nM}{N}$; $Var(X) = n\frac{M}{N}\left(1 - \frac{M}{N}\right)\frac{N-n}{N-1}$; $M_X(t)$ involves the hypergeometric function $F(-n, -M; N - M - n + 1; e^t)$.

Poisson, $X \sim Poi(\mu)$: $f(x) = \frac{e^{-\mu}\mu^x}{x!}$; $x = 0, 1, 2, \ldots$
  $E(X) = \mu$; $Var(X) = \mu$; $M_X(t) = e^{\mu(e^t - 1)}$, $-\infty < t < \infty$.

Negative Binomial, $X \sim NB(r, p)$: $f(x) = \binom{x-1}{r-1} p^r(1-p)^{x-r}$; $x = r, r+1, \ldots$
  $E(X) = \frac{r}{p}$; $Var(X) = \frac{r(1-p)}{p^2}$; $M_X(t) = \left(\frac{pe^t}{1 - qe^t}\right)^r$.

Discrete Uniform, $X \sim DU(N)$: $f(x) = \frac{1}{N}$; $x = 1, 2, \ldots, N$.
  $E(X) = \frac{N+1}{2}$; $Var(X) = \frac{N^2-1}{12}$; $M_X(t) = \frac{e^t(1 - e^{Nt})}{N(1 - e^t)}$.
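Several of these formulas can be spot-checked against scipy.stats (an assumed dependency); a brief sketch for the binomial case:

```python
import numpy as np
from scipy import stats

n, p, t = 10, 0.3, 0.2
X = stats.binom(n, p)

# Mean and variance against the closed forms.
print(X.mean(), n * p)           # 3.0 vs 3.0
print(X.var(), n * p * (1 - p))  # 2.1 vs 2.1

# mgf E(e^{tX}) by direct summation vs (pe^t + q)^n.
xs = np.arange(n + 1)
mgf = np.sum(np.exp(t * xs) * X.pmf(xs))
print(mgf, (p * np.exp(t) + (1 - p))**n)  # equal up to rounding
```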


Some commonly used continuous distributions are as follows:


Uniform, $X \sim U(a, b)$: $f(x) = \frac{1}{b-a}$; $a < x < b$.
  $E(X) = \frac{a+b}{2}$; $Var(X) = \frac{(b-a)^2}{12}$; $M_X(t) = \frac{e^{bt} - e^{at}}{(b-a)t}$.

Gamma, $X \sim Gam(\theta, \kappa)$: $f(x) = \frac{x^{\kappa-1}e^{-x/\theta}}{\theta^\kappa\Gamma(\kappa)}$; $x > 0$, $\theta > 0$, $\kappa > 0$.
  $E(X) = \kappa\theta$; $Var(X) = \kappa\theta^2$; $M_X(t) = \left(\frac{1}{1 - \theta t}\right)^\kappa$, $t < \frac{1}{\theta}$.

Chi-square, $X \sim \chi^2(\nu)$: $f(x) = \frac{1}{2^{\nu/2}\Gamma(\nu/2)} x^{\nu/2 - 1}e^{-x/2}$; $x > 0$, $\nu = 1, 2, \ldots$
  $E(X) = \nu$; $Var(X) = 2\nu$; $M_X(t) = \left(\frac{1}{1 - 2t}\right)^{\nu/2}$, $t < \frac{1}{2}$.

Exponential, $X \sim Exp(\theta)$: $f(x) = \frac{e^{-x/\theta}}{\theta}$; $x > 0$, $\theta > 0$.
  $E(X) = \theta$; $Var(X) = \theta^2$; $M_X(t) = \frac{1}{1 - \theta t}$, $t < \frac{1}{\theta}$.

Weibull, $X \sim Wei(\theta, \beta)$: $f(x) = \frac{\beta x^{\beta-1}}{\theta^\beta} e^{-(x/\theta)^\beta}$; $x > 0$, $\theta > 0$, $\beta > 0$.
  $E(X) = \theta\Gamma\left(1 + \frac{1}{\beta}\right)$; $Var(X) = \theta^2\left[\Gamma\left(1 + \frac{2}{\beta}\right) - \Gamma^2\left(1 + \frac{1}{\beta}\right)\right]$; $M_X(t)$ not tractable.

Pareto, $X \sim Par(\theta, \kappa)$: $f(x) = \frac{\kappa}{\theta(1 + x/\theta)^{\kappa+1}}$; $x > 0$, $\theta > 0$, $\kappa > 0$.
  $E(X) = \frac{\theta}{\kappa-1}$, $\kappa > 1$; $Var(X) = \frac{\theta^2\kappa}{(\kappa-2)(\kappa-1)^2}$, $\kappa > 2$; $M_X(t)$ does not exist.

Normal, $X \sim N(\mu, \sigma^2)$: $f(x) = \frac{1}{\sqrt{2\pi}\sigma} e^{-(x-\mu)^2/(2\sigma^2)}$; $-\infty < x, \mu < \infty$, $\sigma > 0$.
  $E(X) = \mu$; $Var(X) = \sigma^2$; $M_X(t) = e^{\mu t + \sigma^2 t^2/2}$.

Beta, $X \sim Beta(a, b)$: $f(x) = \frac{\Gamma(a+b)}{\Gamma(a)\Gamma(b)} x^{a-1}(1-x)^{b-1}$; $0 < x < 1$, $a > 0$, $b > 0$.
  $E(X) = \frac{a}{a+b}$; $Var(X) = \frac{ab}{(a+b+1)(a+b)^2}$; $M_X(t)$ not tractable.
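As with the discrete table, scipy.stats (assumed available) can confirm these moments; a sketch for the gamma distribution in the $(\theta, \kappa)$ parameterisation above, with illustrative values:

```python
import math
from scipy import stats

theta, kappa = 2.0, 3.0
X = stats.gamma(a=kappa, scale=theta)  # scipy's shape 'a' corresponds to kappa

print(X.mean(), kappa * theta)     # 6.0 vs 6.0
print(X.var(), kappa * theta**2)   # 12.0 vs 12.0

# mgf at t = 0.1: numerical E(e^{tX}) vs the closed form (1/(1 - theta*t))^kappa.
t = 0.1
print(X.expect(lambda x: math.exp(t * x)), (1 / (1 - theta * t))**kappa)
```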


1.8 Joint Distributions


Definition 1.22: The joint probability density function (joint pdf) of the k-dimensional
discrete random variable $\mathbf{X} = (X_1, X_2, \ldots, X_k)$ is defined to be
$f(x_1, x_2, \ldots, x_k) = P(X_1 = x_1, X_2 = x_2, \ldots, X_k = x_k)$
for all possible values $\mathbf{x} = (x_1, x_2, \ldots, x_k)$ of $\mathbf{X}$.

Example 1.12: A bin contains 1000 flower seeds, of which 400 are red flowering. Of
the remaining seeds, 400 are white flowering and 200 are pink flowering. If 10 seeds are
selected at random without replacement, then the number of red flowering seeds, $X_1$, and the
number of white flowering seeds, $X_2$, in the sample are jointly distributed discrete random
variables. The joint pdf of the pair $(X_1, X_2)$ is obtained by the counting technique,
specifically

$f(x_1, x_2) = \dfrac{\binom{400}{x_1}\binom{400}{x_2}\binom{200}{10 - x_1 - x_2}}{\binom{1000}{10}}$

The probability of obtaining exactly two red, five white and three pink flowering seeds is
$f(2, 5) = 0.0331$. Notice that once the numbers of red and white flowering seeds are
specified, the number of pink seeds is also determined, so it suffices to consider only two
variables.
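The quoted value can be reproduced directly; a minimal Python sketch using math.comb:

```python
from math import comb

def f(x1, x2):
    # Joint pmf of the (red, white) counts in a sample of 10 seeds.
    return comb(400, x1) * comb(400, x2) * comb(200, 10 - x1 - x2) / comb(1000, 10)

print(round(f(2, 5), 4))  # 0.0331
```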

Definition 1.23: The joint cumulative distribution function (joint cdf) of the k random
variables $X_1, X_2, \ldots, X_k$ is the function defined by
$F(x_1, x_2, \ldots, x_k) = P[X_1 \le x_1, X_2 \le x_2, \ldots, X_k \le x_k]$

Definition 1.24: A k-dimensional vector-valued random variable $\mathbf{X} = (X_1, X_2, \ldots, X_k)$ is said
to be continuous if there is a function $f(x_1, x_2, \ldots, x_k)$, called the joint pdf of $\mathbf{X}$, such that
the joint cdf can be written as

$F(x_1, x_2, \ldots, x_k) = \int_{-\infty}^{x_k} \cdots \int_{-\infty}^{x_1} f(t_1, t_2, \ldots, t_k)\,dt_1 \cdots dt_k$

for all $\mathbf{x} = (x_1, x_2, \ldots, x_k)$.

Theorem 1.25: A function $f(x_1, x_2, \ldots, x_k)$ is the joint pdf for some vector-valued random
variable $\mathbf{X} = (X_1, X_2, \ldots, X_k)$ if and only if the following properties are satisfied:
1. $f(x_1, x_2, \ldots, x_k) \ge 0$ for all possible values $(x_1, x_2, \ldots, x_k)$,
2. $\sum_{x_1} \cdots \sum_{x_k} f(x_1, x_2, \ldots, x_k) = 1$ for the discrete case, or
$\int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} f(x_1, x_2, \ldots, x_k)\,dx_1 \cdots dx_k = 1$ for the continuous case.

Example 1.13: Let 𝑋 denote the concentration of a certain substance in one trial of an
experiment, and 𝑋 the concentration of the substance in a second trial of the experiment.


Assume that the joint pdf is given by $f(x_1, x_2) = 4x_1x_2$; $0 < x_1 < 1$, $0 < x_2 < 1$, and zero
otherwise. The joint cdf is given by

$F(x_1, x_2) = \int_{0}^{x_2}\int_{0}^{x_1} f(t_1, t_2)\,dt_1\,dt_2 = \int_{0}^{x_2}\int_{0}^{x_1} 4t_1t_2\,dt_1\,dt_2 = x_1^2 x_2^2$

for $0 < x_1 < 1$, $0 < x_2 < 1$.
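A symbolic confirmation of this double integral, sketched with sympy (assumed available):

```python
import sympy as sp

t1, t2, x1, x2 = sp.symbols('t1 t2 x1 x2', positive=True)

F = sp.integrate(4 * t1 * t2, (t1, 0, x1), (t2, 0, x2))
print(F)  # x1**2 * x2**2
```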

Definition 1.26: If the pair $(X_1, X_2)$ of random variables has the joint pdf $f(x_1, x_2)$, then the
marginal pdf's of $X_1$ and $X_2$ are
$f_1(x_1) = \sum_{x_2} f(x_1, x_2)$ and $f_2(x_2) = \sum_{x_1} f(x_1, x_2)$ for the discrete case;
$f_1(x_1) = \int_{-\infty}^{\infty} f(x_1, x_2)\,dx_2$ and $f_2(x_2) = \int_{-\infty}^{\infty} f(x_1, x_2)\,dx_1$ for the continuous case.

Example 1.14: If $f(x_1, x_2) = \frac{n!}{x_1!\,x_2!\,(n - x_1 - x_2)!}\, p_1^{x_1} p_2^{x_2} (1 - p_1 - p_2)^{n - x_1 - x_2}$ is a multinomial
(trinomial) distribution, then the marginal pdf of $X_1$ is

$f_1(x_1) = \sum_{x_2=0}^{n-x_1} f(x_1, x_2)$

$= \frac{n!}{x_1!\,(n-x_1)!}\, p_1^{x_1} \sum_{x_2=0}^{n-x_1} \frac{(n-x_1)!}{x_2!\,[(n-x_1)-x_2]!}\, p_2^{x_2}\, [(1-p_1) - p_2]^{(n-x_1)-x_2}$

$= \frac{n!}{x_1!\,(n-x_1)!}\, p_1^{x_1} \sum_{x_2=0}^{n-x_1} \binom{n-x_1}{x_2} p_2^{x_2}\, [(1-p_1) - p_2]^{(n-x_1)-x_2}$

$= \binom{n}{x_1} p_1^{x_1}\, [p_2 + (1-p_1) - p_2]^{n-x_1}$ (by the binomial theorem)

$= \binom{n}{x_1} p_1^{x_1} (1-p_1)^{n-x_1}$

That is, $X_1 \sim Bin(n, p_1)$.
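This marginalisation can be verified numerically; a small Python sketch with assumed illustrative values $n = 6$, $p_1 = 0.2$, $p_2 = 0.5$:

```python
from math import comb, factorial

n, p1, p2 = 6, 0.2, 0.5
p3 = 1 - p1 - p2

def trinomial(x1, x2):
    # Joint pmf of (X1, X2), with x3 = n - x1 - x2 counts of the third type.
    x3 = n - x1 - x2
    coef = factorial(n) // (factorial(x1) * factorial(x2) * factorial(x3))
    return coef * p1**x1 * p2**x2 * p3**x3

for x1 in range(n + 1):
    marginal = sum(trinomial(x1, x2) for x2 in range(n - x1 + 1))
    binomial = comb(n, x1) * p1**x1 * (1 - p1)**(n - x1)
    assert abs(marginal - binomial) < 1e-12  # X1 ~ Bin(n, p1)
print("marginal of X1 matches Bin(n, p1)")
```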

Definition 1.27: If $\mathbf{X} = (X_1, X_2, \ldots, X_k)$ is a k-dimensional random variable with joint cdf
$F(x_1, x_2, \ldots, x_k)$, then the marginal cdf of $X_j$ is

$F_j(x_j) = \lim_{x_i \to \infty,\, i \ne j} F(x_1, \ldots, x_j, \ldots, x_k)$

and the marginal pdf is

1. $f_j(x_j) = \sum_{x_1} \cdots \sum_{x_k} f(x_1, \ldots, x_j, \ldots, x_k)$ (summing over all $x_i$, $i \ne j$) for the discrete case;
2. $f_j(x_j) = \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} f(x_1, \ldots, x_j, \ldots, x_k)\,dx_1 \cdots dx_k$ (integrating over all $x_i$, $i \ne j$) for the continuous case.


Example 1.15: Let $X_1$, $X_2$ and $X_3$ be continuous with a joint pdf of the form $f(x_1, x_2, x_3) = c$, $0 < x_1 < x_2 < x_3 < 1$, and zero otherwise, where c is a constant. First, note that

$1 = \int_{0}^{1}\int_{0}^{x_3}\int_{0}^{x_2} c\,dx_1\,dx_2\,dx_3 = \int_{0}^{1}\int_{0}^{x_3} cx_2\,dx_2\,dx_3 = \int_{0}^{1} \frac{cx_3^2}{2}\,dx_3 = \frac{c}{6}$

Hence, $c = 6$. Suppose it is desired to find the marginal of $X_3$; we obtain

$f_3(x_3) = \int_{0}^{x_3}\int_{0}^{x_2} 6\,dx_1\,dx_2 = \int_{0}^{x_3} 6x_2\,dx_2 = 3x_3^2$

if $0 < x_3 < 1$, and zero otherwise.

The joint pdf of any subset of the original set of random variables can be obtained by a
similar procedure. For instance, the joint pdf of the pair $(X_1, X_2)$ is as follows:

$f(x_1, x_2) = \int_{-\infty}^{\infty} f(x_1, x_2, x_3)\,dx_3 = \int_{x_2}^{1} 6\,dx_3 = 6(1 - x_2)$

if $0 < x_1 < x_2 < 1$, and zero otherwise.
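A sympy sketch (assumed available) confirming both the constant and the marginal:

```python
import sympy as sp

x1, x2, x3 = sp.symbols('x1 x2 x3', positive=True)

# Total mass of f = c over 0 < x1 < x2 < x3 < 1, computed here with c = 1.
total = sp.integrate(1, (x1, 0, x2), (x2, 0, x3), (x3, 0, 1))
print(total)  # 1/6, hence c = 6

# Marginal of X3 with c = 6.
f3 = sp.integrate(6, (x1, 0, x2), (x2, 0, x3))
print(f3)  # 3*x3**2
```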

1.9 Conditional Distributions


Definition 1.28: If $X_1$ and $X_2$ are discrete or continuous random variables with joint pdf
$f(x_1, x_2)$, then the conditional probability density function (conditional pdf) of $X_2$ given
$X_1 = x_1$ is defined to be

$f(x_2 | x_1) = \frac{f(x_1, x_2)}{f_1(x_1)}$

for values $x_1$ such that $f_1(x_1) > 0$, and zero otherwise. Similarly,

$f(x_1 | x_2) = \frac{f(x_1, x_2)}{f_2(x_2)}$

is the conditional pdf of $X_1$ given $X_2 = x_2$.

Example 1.16: From the previous Example 1.15, the conditional pdf of $X_3$ given $(X_1, X_2) = (x_1, x_2)$ is

$f(x_3 | x_1, x_2) = \frac{f(x_1, x_2, x_3)}{f(x_1, x_2)} = \frac{6}{6(1 - x_2)} = \frac{1}{1 - x_2}$

for $0 < x_1 < x_2 < x_3 < 1$.

Theorem 1.29: If $X_1$ and $X_2$ are random variables with joint pdf $f(x_1, x_2)$ and marginal
pdf's $f_1(x_1)$ and $f_2(x_2)$, then
$f(x_1, x_2) = f_1(x_1) f(x_2 | x_1) = f_2(x_2) f(x_1 | x_2)$

and if $X_1$ and $X_2$ are independent, then

$f(x_2 | x_1) = f_2(x_2)$ and $f(x_1 | x_2) = f_1(x_1)$

1.10 Conditional Expectation


Definition 1.30: If X and Y are jointly distributed random variables, then the conditional
expectation of Y given $X = x$ is given by
1. $E(Y | x) = \sum_{y} y f(y|x)$ if X and Y are discrete,
2. $E(Y | x) = \int_{-\infty}^{\infty} y f(y|x)\,dy$ if X and Y are continuous.

Example 1.17: A certain airborne particle lands at a random point (X, Y) on a triangular
region such that the conditional pdf of Y given $X = x$ is $f(y|x) = \frac{2}{x}$, $0 < y < \frac{x}{2}$. Then, the
conditional expectation is

$E(Y|x) = \int_{0}^{x/2} y \cdot \frac{2}{x}\,dy = \frac{1}{x}\left(\frac{x}{2}\right)^2 = \frac{x}{4}, \quad 0 < x < 2$
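A quick symbolic check of this conditional expectation with sympy (assumed available):

```python
import sympy as sp

x, y = sp.symbols('x y', positive=True)

E_Y_given_x = sp.integrate(y * 2 / x, (y, 0, x / 2))
print(sp.simplify(E_Y_given_x))  # x/4
```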

Theorem 1.31: If X and Y are jointly distributed random variables, then


𝐸[𝐸(𝑌|𝑋)] = 𝐸(𝑌)
and if X and Y are independent random variables, then 𝐸(𝑌|𝑥) = 𝐸(𝑌) and 𝐸(𝑋|𝑦) = 𝐸(𝑋).

Definition 1.32: The conditional variance of Y given $X = x$ is given by

$Var(Y|x) = E\{[Y - E(Y|x)]^2 \,|\, x\} = E(Y^2|x) - [E(Y|x)]^2$

Definition 1.33: The joint moment generating function (joint mgf) of $\mathbf{X} = (X_1, X_2, \ldots, X_k)$,
if it exists, is defined to be

$M_{\mathbf{X}}(\mathbf{t}) = E\left[\exp\left(\sum_{i=1}^{k} t_i X_i\right)\right]$

where $\mathbf{t} = (t_1, t_2, \ldots, t_k)$ and $-h < t_i < h$ for some $h > 0$.

1.11 Special Joint Distributions


1) Extended Hypergeometric distribution, $\mathbf{X} \sim Hyp(n, M_1, M_2, \ldots, M_k, N)$
Suppose that a collection consists of a finite number of items, N, and that there are $k + 1$
different types: $M_1$ of type 1, $M_2$ of type 2, and so on. Select n items at random without
replacement, and let $X_i$ be the number of items of type i that are selected. The vector $\mathbf{X} = (X_1, X_2, \ldots, X_k)$ has an extended hypergeometric distribution and a joint pdf of the form

$f(x_1, x_2, \ldots, x_k) = \dfrac{\prod_{i=1}^{k+1} \binom{M_i}{x_i}}{\binom{N}{n}}$

for all $0 \le x_i \le M_i$, where $M_{k+1} = N - \sum_{i=1}^{k} M_i$ and $x_{k+1} = n - \sum_{i=1}^{k} x_i$.

2) Multinomial distribution, $\mathbf{X} \sim MULT(n, p_1, p_2, \ldots, p_k)$
Suppose that there are $k + 1$ mutually exclusive and exhaustive events, say
$E_1, E_2, \ldots, E_k, E_{k+1}$, which can occur on any trial of an experiment, and let $p_i = P(E_i)$ for
$i = 1, 2, \ldots, k+1$. On n independent trials of the experiment, we let $X_i$ be the number of
occurrences of the event $E_i$. The vector $\mathbf{X} = (X_1, X_2, \ldots, X_k)$ is said to have the multinomial
distribution, which has a joint pdf of the form

$f(x_1, x_2, \ldots, x_k) = \dfrac{n!}{x_1!\,x_2!\cdots x_k!\,x_{k+1}!}\, p_1^{x_1} p_2^{x_2} \cdots p_k^{x_k} p_{k+1}^{x_{k+1}}$

for all $0 \le x_i \le n$, where $x_{k+1} = n - \sum_{i=1}^{k} x_i$ and $p_{k+1} = 1 - \sum_{i=1}^{k} p_i$.

3) Bivariate Normal distribution, $(X, Y) \sim BVN(\mu_1, \mu_2, \sigma_1^2, \sigma_2^2, \rho)$
A pair of continuous random variables X and Y is said to have a bivariate normal distribution
if it has a joint pdf of the form

$f(x, y) = \frac{1}{2\pi\sigma_1\sigma_2\sqrt{1-\rho^2}} \exp\left\{-\frac{1}{2(1-\rho^2)}\left[\left(\frac{x-\mu_1}{\sigma_1}\right)^2 - 2\rho\left(\frac{x-\mu_1}{\sigma_1}\right)\left(\frac{y-\mu_2}{\sigma_2}\right) + \left(\frac{y-\mu_2}{\sigma_2}\right)^2\right]\right\}$

for $-\infty < x < \infty$, $-\infty < y < \infty$, where $-1 < \rho < 1$ is the correlation coefficient of X and Y.

Theorem 1.34: If $(X, Y) \sim BVN(\mu_1, \mu_2, \sigma_1^2, \sigma_2^2, \rho)$, then

1) $X \sim N(\mu_1, \sigma_1^2)$ and $Y \sim N(\mu_2, \sigma_2^2)$, and $\rho$ is the correlation coefficient of X and Y.
2) Conditional on $X = x$, $Y|x \sim N\left(\mu_2 + \rho\frac{\sigma_2}{\sigma_1}(x - \mu_1),\ \sigma_2^2(1 - \rho^2)\right)$.
3) Conditional on $Y = y$, $X|y \sim N\left(\mu_1 + \rho\frac{\sigma_1}{\sigma_2}(y - \mu_2),\ \sigma_1^2(1 - \rho^2)\right)$.
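These conditional-distribution facts can be checked by simulation; a numpy sketch (assumed available) with illustrative parameter values:

```python
import numpy as np

mu1, mu2, s1, s2, rho = 1.0, -2.0, 2.0, 3.0, 0.6
cov = [[s1**2, rho * s1 * s2], [rho * s1 * s2, s2**2]]

rng = np.random.default_rng(0)
x, y = rng.multivariate_normal([mu1, mu2], cov, size=1_000_000).T

# Condition (approximately) on X = x0 by taking a thin slice around x0.
x0 = 2.0
slice_y = y[np.abs(x - x0) < 0.02]

print(slice_y.mean(), mu2 + rho * (s2 / s1) * (x0 - mu1))  # approx -1.1
print(slice_y.var(), s2**2 * (1 - rho**2))                 # approx 5.76
```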

Prepared by Dr. Chang Yun Fah
