UECM2273 Mathematical Statistics
CHAPTER ONE:
RANDOM VARIABLES AND THEIR DISTRIBUTION
This topic serves only as a revision of Probability & Statistics I & II, so all basic
concepts will only be stated briefly. Students enrolled in this course are advised to revise the
material learnt in the two courses mentioned above, because it will also be tested.
1.1 Probability
Definition 1.1: Suppose S is a sample space associated with an experiment. To every event
A in S (A is a subset of S), we assign a number 𝑃(𝐴), called the probability of A, so that the
following axioms hold:
Axiom 1: 𝑃(𝐴) ≥ 0.
Axiom 2: 𝑃(𝑆) = 1.
Axiom 3: If A_1, A_2, A_3, … form a sequence of pairwise mutually exclusive (disjoint)
events in S (that is, A_i ∩ A_j = ∅ if i ≠ j), then P(⋃_{i=1}^{∞} A_i) = ∑_{i=1}^{∞} P(A_i).
Definition 1.3: If the set of all possible values of a random variable X is a finite
set x_1, x_2, …, x_n, or a countably infinite set x_1, x_2, …, then X is called a discrete random
variable. The function
f(x) = P(X = x), where x = x_1, x_2, …
that assigns the probability to each possible value x will be called the discrete probability
density function (or probability mass function).
Example 1.2: A supervisor in a manufacturing plant has three men and three women
working for him. He wants to choose two workers for a special job. Not wishing to show
any biases in his selection, he decides to select the two workers at random. Let Y denote the
number of women in his selection. Find the probability density function for Y.
Answer: The supervisor can select two workers from six in C(6, 2) = 15 ways. Hence, S
contains 15 sample points, which we assume to be equally likely because random sampling
was employed. The number of women that can be selected is Y = 0, 1, or 2. Hence, the
number of ways of selecting y women is C(3, y), and the number of ways of selecting the
remaining 2 − y men is C(3, 2 − y). Thus, the probability of selecting y women is
f(y) = P(Y = y) = C(3, y)C(3, 2 − y) / C(6, 2),  y = 0, 1, 2
or, in tabular form:

 y   Calculation   f(y)
 0      3/15        1/5
 1      9/15        3/5
 2      3/15        1/5
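The pmf above can be checked with a short computation. The sketch below uses only the Python standard library and exact fractions to evaluate the counting formula C(3, y)C(3, 2 − y)/C(6, 2):

```python
from math import comb
from fractions import Fraction

def f(y):
    # choose y of the 3 women and 2 - y of the 3 men, out of C(6, 2) = 15 pairs
    return Fraction(comb(3, y) * comb(3, 2 - y), comb(6, 2))

pmf = {y: f(y) for y in range(3)}
print(pmf)  # {0: Fraction(1, 5), 1: Fraction(3, 5), 2: Fraction(1, 5)}
```

The probabilities sum to 1 exactly, as Theorem 1.4 below requires.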
Theorem 1.4: A function 𝑓(𝑥) is a discrete pdf if and only if it satisfies both of the
following properties:
1) f(x_i) ≥ 0 for all x_i, and
2) ∑_{all x_i} f(x_i) = 1.
Example 1.4: We roll a 12-sided die twice. If each face is marked with an integer, 1 through
12, then each value is equally likely to occur on a single roll of the die. We define a random
variable X to be the maximum obtained on the two rolls. It is not hard to see that for each
value x there are an odd number, 2x − 1, of ways for that value to occur. Thus, the discrete
pdf of X must have the form
f(x) = c(2x − 1) for x = 1, 2, …, 12
Since ∑_{x=1}^{12} (2x − 1) = 12² = 144, normalization gives c = 1/144.
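The normalization is easy to verify directly; a quick sketch:

```python
from fractions import Fraction

# sum_{x=1}^{12} (2x - 1) = 12**2 = 144, so c = 1/144
total = sum(2 * x - 1 for x in range(1, 13))
c = Fraction(1, total)
print(total, c)  # 144 1/144
assert sum(c * (2 * x - 1) for x in range(1, 13)) == 1
```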
For a continuous random variable X with pdf f(x), the cumulative distribution function (cdf) is
F(x) = ∫_{−∞}^{x} f(t) dt
By the Fundamental Theorem of Calculus, the pdf of X can be obtained from the cdf by
differentiation:
f(x) = (d/dx) F(x) = F′(x)
wherever the derivative exists.
Theorem 1.6: A function 𝑓(𝑥) is a pdf for some continuous random variable X if and only
if it satisfies the properties
1) 𝑓(𝑥) ≥ 0 for all real x, and
2) ∫_{−∞}^{∞} f(x) dx = 1.
Example 1.5: A machine produces copper wire, and occasionally there is a flaw at some
point along the wire. The length of wire (in meters) produced between successive flaws is a
continuous random variable X with pdf of the form
f(x) = c(1 + x)^{−3},  x > 0
and zero otherwise. The constant is determined by the normalization requirement:
1 = ∫_{0}^{∞} f(x) dx = c ∫_{0}^{∞} (1 + x)^{−3} dx = c/2
(using the substitution u = 1 + x and an application of the power rule for integrals). This
implies that the constant is c = 2.
Theorem 1.8: Let X be a discrete random variable with pdf f(x) and cdf F(x). If the
possible values of X are indexed in increasing order, x_1 < x_2 < x_3 < ⋯, then
f(x_1) = F(x_1), and
f(x_i) = F(x_i) − F(x_{i−1}) for any i > 1.
Furthermore, if x < x_1 then F(x) = 0, and for any other real x
F(x) = ∑_{x_i ≤ x} f(x_i).
Theorem 1.9: A function F(x) is a cdf for some random variable X if and only if it satisfies
the following properties:
1) lim_{x→−∞} F(x) = 0,
2) lim_{x→∞} F(x) = 1,
3) F(x) is nondecreasing: F(a) ≤ F(b) whenever a < b, and
4) F(x) is right-continuous: lim_{h→0⁺} F(x + h) = F(x).
Continuing Example 1.5, the cdf of this random variable is given by
F(x) = P(X ≤ x) = ∫_{−∞}^{x} f(t) dt
     = { ∫_{−∞}^{0} 0 dt + ∫_{0}^{x} 2(1 + t)^{−3} dt,  x > 0
       { ∫_{−∞}^{x} 0 dt,                               x ≤ 0
     = { 1 − (1 + x)^{−2},  x > 0
       { 0,                 x ≤ 0
Note that for a continuous random variable X,
∫_{a}^{b} f(x) dx = P(a ≤ X ≤ b) = P(a < X ≤ b) = P(a ≤ X < b) = P(a < X < b) = F(b) − F(a)
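With Example 1.5's closed-form cdf, F(x) = 1 − (1 + x)^{−2} for x > 0, interval probabilities can be read off as F(b) − F(a). A small sketch:

```python
def F(x):
    # cdf from Example 1.5: F(x) = 1 - (1 + x)**-2 for x > 0, and 0 otherwise
    return 1 - (1 + x) ** -2 if x > 0 else 0.0

# P(1 < X < 2) = F(2) - F(1) = 8/9 - 3/4 = 5/36
p = F(2) - F(1)
print(round(p, 4))  # 0.1389
```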
1.5 Expectation
Definition 1.10: If X is a discrete random variable with pdf f(x), then the expected value
(mean) of X is defined by
E(X) = ∑_x x f(x)
Similarly, if g(X) is a real-valued function of X, then the expected value of g(X) is given
by
E(g(X)) = ∑_x g(x) f(x)
Example 1.8: A box contains four chips. Two are labelled with the number 2, one is labelled
with a 4, and the other with an 8. The average of the numbers on the four chips is
(2 + 2 + 4 + 8)/4 = 4. The experiment of choosing a chip at random and recording its number
can be expressed with a discrete random variable X having distinct values x = 2, 4, or 8, with
f(2) = 1/2, and f(4) = f(8) = 1/4. The corresponding expected value or mean is
μ = E(X) = 2(1/2) + 4(1/4) + 8(1/4) = 4
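The same computation, done with exact fractions, also illustrates Definition 1.10's formula for E(g(X)) (the choice g(x) = x² is an arbitrary illustration):

```python
from fractions import Fraction

f = {2: Fraction(1, 2), 4: Fraction(1, 4), 8: Fraction(1, 4)}
mean = sum(x * p for x, p in f.items())
print(mean)  # 4
# expectation of a function of X, via E(g(X)) = sum g(x) f(x) with g(x) = x**2
ex2 = sum(x ** 2 * p for x, p in f.items())
print(ex2)  # 22
```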
Definition 1.11: If X is a continuous random variable with pdf f(x), then the expected
value (mean) of X is defined by
E(X) = ∫_{−∞}^{∞} x f(x) dx
if the integral is absolutely convergent. Otherwise we say that E(X) does not exist. Similarly,
if g(X) is a real-valued function of X, then the expected value of g(X) is given by
E(g(X)) = ∫_{−∞}^{∞} g(x) f(x) dx
Continuing Example 1.5, the mean is
μ = E(X) = ∫_{0}^{∞} x · 2(1 + x)^{−3} dx = 2 ∫_{1}^{∞} (t − 1) t^{−3} dt = 2 (1 − 1/2) = 1
(substituting t = 1 + x).
Definition 1.13: The kth moment about the origin of a random variable X is
μ′_k = E(X^k)
and the kth moment about the mean is
μ_k = E[X − E(X)]^k = E(X − μ)^k
If u(x) is a nonnegative function and A = {x : u(x) ≥ c} for some constant c > 0, then
E[u(X)] = ∫_{−∞}^{∞} u(x) f(x) dx ≥ ∫_A u(x) f(x) dx ≥ c ∫_A f(x) dx
        = c P[X ∈ A] = c P[u(X) ≥ c]
so that P[u(X) ≥ c] ≤ E[u(X)]/c.
A special case, known as the Markov inequality, is obtained if u(x) = |x|^r for r > 0,
namely
P[|X| ≥ c] ≤ E(|X|^r) / c^r
Theorem 1.15 (Chebychev inequality): If X is a random variable with mean μ and variance
σ², then for any k > 0,
P[|X − μ| ≥ kσ] ≤ 1/k²  or  P[|X − μ| < kσ] ≥ 1 − 1/k²
Example 1.9: Suppose that X takes on the values −1, 0, and 1 with probabilities 1/8, 6/8, and
1/8, respectively. Then μ = 0 and σ² = 1/4. For k = 2,
P[−2(0.5) < X − 0 < 2(0.5)] = P[−1 < X < 1] = P[X = 0] = 3/4 ≥ 1 − 1/2² = 3/4
so the Chebychev bound is attained in this case.
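The example can be verified directly with exact arithmetic; a sketch:

```python
from fractions import Fraction

f = {-1: Fraction(1, 8), 0: Fraction(6, 8), 1: Fraction(1, 8)}
mu = sum(x * p for x, p in f.items())
var = sum((x - mu) ** 2 * p for x, p in f.items())
assert mu == 0 and var == Fraction(1, 4)

k = 2
sigma = Fraction(1, 2)
# exact probability that |X - mu| < k*sigma, i.e. -1 < X < 1
p_inside = sum(p for x, p in f.items() if abs(x - mu) < k * sigma)
bound = 1 - Fraction(1, k ** 2)
print(p_inside, bound)  # 3/4 3/4
```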
The function
M_X(t) = E(e^{tX}) = { ∑_x e^{tx} f(x),            X is discrete
                     { ∫_{−∞}^{∞} e^{tx} f(x) dx,  X is continuous
is called the moment generating function (mgf) of X if this expected value exists for all
values of t in some interval of the form −ℎ < 𝑡 < ℎ for some ℎ > 0.
M_X(t) = 1 + t μ′_1 + (t²/2!) μ′_2 + (t³/3!) μ′_3 + ⋯ + (t^r/r!) μ′_r + ⋯
where μ′_r is the rth moment about the origin. Differentiating M_X(t) r times with respect
to t and setting t = 0, we obtain the rth moment about the origin, μ′_r = M_X^{(r)}(0).
Example 1.10: A discrete random variable X has pdf f(x) = (1/2)^{x+1} if x = 0, 1, 2, …, and
zero otherwise. The mgf of X is
M_X(t) = E(e^{tX}) = ∑_{x=0}^{∞} e^{tx} f(x) = (1/2) ∑_{x=0}^{∞} (e^t/2)^x = 1/(2 − e^t),  t < ln 2
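Assuming the pdf f(x) = (1/2)^{x+1} as read above, the geometric-series mgf can be checked numerically against a partial sum (a sketch; t = 0.3 is an arbitrary choice satisfying e^t < 2):

```python
import math

def mgf_partial(t, terms=200):
    # partial sum of E(e^{tX}) = sum_{x >= 0} e^{tx} * (1/2)**(x + 1)
    return sum(math.exp(t * x) * 0.5 ** (x + 1) for x in range(terms))

t = 0.3
closed_form = 1 / (2 - math.exp(t))
print(abs(mgf_partial(t) - closed_form) < 1e-9)  # True
```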
Example 1.11: Consider a continuous random variable X with pdf f(x) = e^{−x} if x > 0, and
zero otherwise. The mgf is
M_X(t) = E(e^{tX}) = ∫_{−∞}^{∞} e^{tx} f(x) dx = ∫_{0}^{∞} e^{tx} e^{−x} dx
       = ∫_{0}^{∞} e^{−(1−t)x} dx = [−e^{−(1−t)x}/(1 − t)]_{0}^{∞} = 1/(1 − t),  t < 1
The rth derivative is M_X^{(r)}(t) = r! (1 − t)^{−(r+1)}, and thus the rth moment is E(X^r) =
M_X^{(r)}(0) = r!. The mean is μ = E(X) = 1! = 1, and the variance is Var(X) = E(X²) −
μ² = 2! − 1² = 1.
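The moments can be confirmed by numerical integration of E(X^r) = ∫₀^∞ x^r e^{−x} dx (a midpoint-rule sketch; the truncation point 60 and step count are arbitrary accuracy choices):

```python
import math

def moment(r, upper=60.0, n=200_000):
    # midpoint-rule approximation of E(X^r) = integral of x**r * exp(-x) over (0, inf)
    h = upper / n
    return sum(((i + 0.5) * h) ** r * math.exp(-(i + 0.5) * h) for i in range(n)) * h

mean = moment(1)    # should be 1! = 1
second = moment(2)  # should be 2! = 2
print(round(mean, 4), round(second, 4), round(second - mean ** 2, 4))  # 1.0 2.0 1.0
```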
If Y = aX + b, then M_Y(t) = e^{bt} M_X(at).
Proof: M_Y(t) = E(e^{tY}) = E(e^{(aX+b)t}) = E(e^{bt} e^{atX}) = e^{bt} E(e^{atX}) = e^{bt} M_X(at).
Theorem 1.19 (Uniqueness): If X_1 and X_2 have respective cdfs F_1(x) and F_2(x), and mgfs
M_1(t) and M_2(t), then F_1(x) = F_2(x) for all real x if and only if M_1(t) = M_2(t) for all t in
some interval −h < t < h for some h > 0.
Example 1.12: A bin contains 1000 flower seeds, of which 400 are red flowering. Of
the remaining seeds, 400 are white flowering and 200 are pink flowering. If 10 seeds are
selected at random without replacement, then the number of red flowering seeds, X_1, and the
number of white flowering seeds, X_2, in the sample are jointly distributed discrete random
variables. The joint pdf of the pair (X_1, X_2) is obtained by a counting technique,
specifically
f(x_1, x_2) = C(400, x_1) C(400, x_2) C(200, 10 − x_1 − x_2) / C(1000, 10)
The probability of obtaining exactly two red, five white and three pink flowering seeds is
f(2, 5) ≈ 0.0331. Notice that once the numbers of red and white flowering seeds are
specified, the number of pink seeds is also determined, so it suffices to consider only two
variables.
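The quoted probability can be reproduced with the multivariate hypergeometric counting formula; a sketch using only the standard library:

```python
from math import comb

def f(x1, x2, n=10, red=400, white=400, pink=200):
    # x1 red and x2 white seeds drawn; the remaining n - x1 - x2 must be pink
    x3 = n - x1 - x2
    return comb(red, x1) * comb(white, x2) * comb(pink, x3) / comb(red + white + pink, n)

print(round(f(2, 5), 4))  # approximately 0.0331, as quoted above
```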
Definition 1.23: The joint cumulative distribution function (joint cdf) of the k random
variables X_1, X_2, …, X_k is the function defined by
F(x_1, x_2, …, x_k) = P[X_1 ≤ x_1, X_2 ≤ x_2, …, X_k ≤ x_k]
For continuous random variables with joint pdf f,
F(x_1, x_2, …, x_k) = ∫_{−∞}^{x_k} ⋯ ∫_{−∞}^{x_1} f(t_1, t_2, …, t_k) dt_1 ⋯ dt_k
for all x = (x_1, x_2, …, x_k).
Theorem 1.25: A function f(x_1, x_2, …, x_k) is the joint pdf for some vector-valued random
variable X = (X_1, X_2, …, X_k) if and only if the following properties are satisfied:
1. f(x_1, x_2, …, x_k) ≥ 0 for all possible values (x_1, x_2, …, x_k),
2. ∑_{x_1} ⋯ ∑_{x_k} f(x_1, x_2, …, x_k) = 1 for the discrete case, or
∫_{−∞}^{∞} ⋯ ∫_{−∞}^{∞} f(x_1, x_2, …, x_k) dx_1 ⋯ dx_k = 1 for the continuous case.
Example 1.13: Let X_1 denote the concentration of a certain substance in one trial of an
experiment, and X_2 the concentration of the substance in a second trial of the experiment.
Assume that the joint pdf is given by f(x_1, x_2) = 4 x_1 x_2; 0 < x_1 < 1, 0 < x_2 < 1, and zero
otherwise. The joint cdf is given by
F(x_1, x_2) = ∫_{0}^{x_2} ∫_{0}^{x_1} f(t_1, t_2) dt_1 dt_2 = ∫_{0}^{x_2} ∫_{0}^{x_1} 4 t_1 t_2 dt_1 dt_2 = x_1² x_2²
for 0 < x_1 < 1, 0 < x_2 < 1.
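The joint cdf F(x₁, x₂) = x₁²x₂² can be reproduced by a numeric double integral of the joint pdf (a midpoint-rule sketch; the grid size is an arbitrary accuracy choice):

```python
def F_numeric(x1, x2, n=200):
    # midpoint-rule double integral of f(t1, t2) = 4*t1*t2 over [0, x1] x [0, x2]
    h1, h2 = x1 / n, x2 / n
    total = 0.0
    for i in range(n):
        t1 = (i + 0.5) * h1
        for j in range(n):
            t2 = (j + 0.5) * h2
            total += 4.0 * t1 * t2
    return total * h1 * h2

x1, x2 = 0.5, 0.8
print(abs(F_numeric(x1, x2) - (x1 * x2) ** 2) < 1e-9)  # True
```

The midpoint rule is exact here because the integrand is linear in each variable, so only floating-point rounding remains.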
Definition 1.26: If the pair (X_1, X_2) of random variables has the joint pdf f(x_1, x_2), then the
marginal pdfs of X_1 and X_2 are
f_1(x_1) = ∑_{x_2} f(x_1, x_2) and f_2(x_2) = ∑_{x_1} f(x_1, x_2) for the discrete case
f_1(x_1) = ∫_{−∞}^{∞} f(x_1, x_2) dx_2 and f_2(x_2) = ∫_{−∞}^{∞} f(x_1, x_2) dx_1 for the continuous case
Example 1.14: If
f(x_1, x_2) = [n! / (x_1! x_2! (n − x_1 − x_2)!)] p_1^{x_1} p_2^{x_2} (1 − p_1 − p_2)^{n − x_1 − x_2}
is a multinomial (trinomial) distribution, then the marginal pdf of X_1 is
f_1(x_1) = ∑_{x_2=0}^{n−x_1} f(x_1, x_2)
= ∑_{x_2=0}^{n−x_1} [n! / (x_1! x_2! (n − x_1 − x_2)!)] p_1^{x_1} p_2^{x_2} [(1 − p_1) − p_2]^{(n−x_1)−x_2}
= [n! / (x_1! (n − x_1)!)] p_1^{x_1} ∑_{x_2=0}^{n−x_1} [(n − x_1)! / (x_2! ((n − x_1) − x_2)!)] p_2^{x_2} [(1 − p_1) − p_2]^{(n−x_1)−x_2}
= [n! / (x_1! (n − x_1)!)] p_1^{x_1} ∑_{x_2=0}^{n−x_1} C(n − x_1, x_2) p_2^{x_2} [(1 − p_1) − p_2]^{(n−x_1)−x_2}
= C(n, x_1) p_1^{x_1} [p_2 + (1 − p_1) − p_2]^{n−x_1}   (binomial theorem)
= C(n, x_1) p_1^{x_1} (1 − p_1)^{n−x_1}
That is, X_1 ~ Bin(n, p_1).
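The marginal result X₁ ~ Bin(n, p₁) can be verified exhaustively for a small case with exact arithmetic (a sketch; n = 6, p₁ = 1/4, p₂ = 1/3 are arbitrary choices):

```python
from math import comb, factorial
from fractions import Fraction

n, p1, p2 = 6, Fraction(1, 4), Fraction(1, 3)

def trinomial(x1, x2):
    # joint pmf n!/(x1! x2! x3!) * p1^x1 * p2^x2 * (1 - p1 - p2)^x3, with x3 = n - x1 - x2
    x3 = n - x1 - x2
    coef = factorial(n) // (factorial(x1) * factorial(x2) * factorial(x3))
    return coef * p1 ** x1 * p2 ** x2 * (1 - p1 - p2) ** x3

for x1 in range(n + 1):
    marginal = sum(trinomial(x1, x2) for x2 in range(n - x1 + 1))
    binom = comb(n, x1) * p1 ** x1 * (1 - p1) ** (n - x1)
    assert marginal == binom  # X1 ~ Bin(n, p1), exactly
```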
Example 1.15: Let X_1, X_2 and X_3 be continuous with a joint pdf of the form f(x_1, x_2, x_3) =
c, 0 < x_1 < x_2 < x_3 < 1, and zero otherwise, where c is a constant. First, note that
1 = ∫_{0}^{1} ∫_{0}^{x_3} ∫_{0}^{x_2} c dx_1 dx_2 dx_3 = ∫_{0}^{1} ∫_{0}^{x_3} c x_2 dx_2 dx_3 = ∫_{0}^{1} (c x_3²/2) dx_3 = c/6
so that c = 6. The marginal pdf of X_3 is
f_3(x_3) = ∫_{0}^{x_3} ∫_{0}^{x_2} 6 dx_1 dx_2 = ∫_{0}^{x_3} 6 x_2 dx_2 = 3 x_3²,  0 < x_3 < 1
The joint pdf of any subset of the original set of random variables can be obtained with a
similar procedure. For instance, the joint pdf of the pair (X_1, X_2) is
f(x_1, x_2) = ∫_{x_2}^{1} f(x_1, x_2, x_3) dx_3 = ∫_{x_2}^{1} 6 dx_3 = 6(1 − x_2),  0 < x_1 < x_2 < 1
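As a quick sketch, both the normalizing constant and the marginal of X₃ can be checked: the ordered region 0 < x₁ < x₂ < x₃ < 1 has volume 1/3! = 1/6, and ∫₀¹ 3x² dx = 1.

```python
from fractions import Fraction

# the ordered region is one of 3! = 6 equal-volume orderings of the unit cube
volume = Fraction(1, 6)
c = 1 / volume
assert c == 6

# marginal f3(x3) = 3*x3**2 must integrate to 1 over (0, 1); midpoint-rule check
n = 100_000
h = 1.0 / n
total = sum(3.0 * ((i + 0.5) * h) ** 2 for i in range(n)) * h
print(abs(total - 1.0) < 1e-8)  # True
```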
Example 1.16: From the previous Example 1.15, the conditional pdf of X_3 given (X_1, X_2) =
(x_1, x_2) is
f(x_3 | x_1, x_2) = f(x_1, x_2, x_3) / f(x_1, x_2) = 6 / [6(1 − x_2)] = 1/(1 − x_2)
for 0 < x_1 < x_2 < x_3 < 1.
Theorem 1.29: If X_1 and X_2 are random variables with joint pdf f(x_1, x_2) and marginal
pdfs f_1(x_1) and f_2(x_2), then
f(x_1, x_2) = f_1(x_1) f(x_2 | x_1) = f_2(x_2) f(x_1 | x_2)
Prepared by Dr. Chang Yun Fah Page 14
[UECM2273 MATHEMATICAL STATISTICS] January 14, 2019
Example 1.17: A certain airborne particle lands at a random point (X, Y) on a triangular
region, with the conditional pdf of Y given X = x being f(y|x) = 2/x, 0 < y < x/2. Then, the
conditional expectation is
E(Y | x) = ∫_{0}^{x/2} y (2/x) dy = (2/x) [y²/2]_{0}^{x/2} = x/4,  0 < x < 2
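Reading the conditional pdf as f(y|x) = 2/x on 0 < y < x/2, the result E(Y|x) = x/4 can be checked numerically (a midpoint-rule sketch; the sample points for x are arbitrary):

```python
def cond_mean(x, n=100_000):
    # midpoint-rule approximation of E(Y | X = x) = integral of y * (2/x) over (0, x/2)
    h = (x / 2) / n
    return sum(((i + 0.5) * h) * (2 / x) for i in range(n)) * h

for x in (0.5, 1.0, 1.8):
    assert abs(cond_mean(x) - x / 4) < 1e-9
print("E(Y|x) = x/4 confirmed on sample points")
```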
M_X(𝒕) = E[exp(t_1 X_1 + t_2 X_2 + ⋯ + t_k X_k)]
𝑓(𝑥 , 𝑥 , … , 𝑥 ) =