Mathematical Statistics Lecture Notes: Chapter 0: Review of Probability

Mathematical Statistics Lecture Notes
Note: This document has not been checked for errors and typos. It is only used
as a reference for class preparation before each lecture. The actual notes from
each lecture supersede this document and serve as the final materials for this
course.
Chapter 0: Review of Probability

Review of factorization criteria for independence of random variables
Let X be a random variable (rv) or a vector of rvs.
cumulative distribution function (CDF) of X (discrete and continuous);
(complement of CDF: CCDF)
probability mass function (pmf) of X (discrete);
probability density function (pdf) of X (continuous, also used for discrete);
moment generating function (mgf) of X (discrete and continuous).
Definition 0.1: Jointly distributed random variables $X_1, X_2, \ldots, X_n$ are independent if, and only if (iff), all events of the form $\{X_1 \in A_1\}, \{X_2 \in A_2\}, \ldots, \{X_n \in A_n\}$ are independent.
Definition 0.1 leads to the following equivalent criteria for independence:
Theorem 0.1 (Factorization criteria for independence): Jointly distributed rvs $X_1, X_2, \ldots, X_n$ are independent iff
$$F_{X_1,\ldots,X_n}(x_1,\ldots,x_n) = \prod_{i=1}^{n} F_{X_i}(x_i); \quad (1)$$
$$p_{X_1,\ldots,X_n}(x_1,\ldots,x_n) = \prod_{i=1}^{n} p_{X_i}(x_i), \quad X_1,\ldots,X_n \text{ discrete}; \quad (2)$$
$$f_{X_1,\ldots,X_n}(x_1,\ldots,x_n) = \prod_{i=1}^{n} f_{X_i}(x_i), \quad X_1,\ldots,X_n \text{ continuous}; \quad (3)$$
$$M_{X_1,\ldots,X_n}(t_1,\ldots,t_n) = \prod_{i=1}^{n} M_{X_i}(t_i), \quad (4)$$
if all moment generating functions exist;
$$E\left[\prod_{i=1}^{n} g_i(X_i)\right] = \prod_{i=1}^{n} E\left[g_i(X_i)\right], \quad (5)$$
for all functions $g_i$ for which all the expectations in (5) exist.
Remark 0.1: Suppose $X_1, X_2, \ldots, X_n$ are independent. Choose $g_i(x) = x$ in (5) to obtain
$$E\left[\prod_{i=1}^{n} X_i\right] = \prod_{i=1}^{n} E[X_i]. \quad (6)$$

The converse is not true, however: (6) does not imply independence. For $n = 2$, (6) says only that $X_1$ and $X_2$ are uncorrelated. For some standard multivariate distributions, most notably the multivariate normal, uncorrelated components are in fact independent.
Example 0.1: Consider the discrete random variable $X$ with pmf
$$p_X(x) = 0.25, \quad x = -2, -1, 1, 2.$$
Let $Y = X^2$. Clearly $Y$ is not independent of $X$, since $Y$ is a function of $X$. However,
$$\begin{aligned} Cov(X, Y) &= E[XY] - E[X]E[Y] = E[X^3] - E[X]E[X^2] \\ &= \tfrac{1}{4}(-8 - 1 + 1 + 8) - \tfrac{1}{16}(-2 - 1 + 1 + 2)(4 + 1 + 1 + 4) = 0. \end{aligned}$$
So X and Y are uncorrelated but not independent.
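The computation in Example 0.1 can be checked with exact rational arithmetic. This is a minimal sketch; the helper `expect` is a hypothetical name introduced only for this check. It confirms that the covariance is exactly 0 while one joint probability does not factor, so (2) fails.

```python
from fractions import Fraction

# pmf of X from Example 0.1: P(X = x) = 1/4 for x in {-2, -1, 1, 2}
pmf_x = {x: Fraction(1, 4) for x in (-2, -1, 1, 2)}

def expect(g):
    """E[g(X)] computed exactly from the pmf."""
    return sum(p * g(x) for x, p in pmf_x.items())

# Y = X^2, so E[XY] = E[X^3] and E[Y] = E[X^2]
cov_xy = expect(lambda x: x**3) - expect(lambda x: x) * expect(lambda x: x**2)

# Dependence check: P(X = 1, Y = 1) versus P(X = 1) * P(Y = 1)
p_x1 = pmf_x[1]
p_y1 = pmf_x[1] + pmf_x[-1]   # Y = 1 iff X = 1 or X = -1
p_joint = pmf_x[1]            # X = 1 forces Y = 1

print(cov_xy)                 # 0
print(p_joint, p_x1 * p_y1)   # 1/4 1/8
```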
Example 0.2: Consider the uniform distribution on a disk of radius 1 centered at the origin, which has pdf
$$f_{X,Y}(x, y) = \frac{1}{\pi}, \quad x^2 + y^2 \le 1. \quad (7)$$
Show that X and Y are uncorrelated but dependent, using Theorem 0.1.
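A quick Monte Carlo sketch of Example 0.2 (the sample size and seed are arbitrary assumptions, not part of the notes): the sample covariance is near 0, yet the events $\{|X| > 0.8\}$ and $\{|Y| > 0.8\}$ each have positive probability and never occur together, so the joint probability cannot equal the product of the marginals.

```python
import random

random.seed(1)  # fixed seed so the run is reproducible

def sample_disk():
    """Rejection sampling: a uniform point on the unit disk x^2 + y^2 <= 1."""
    while True:
        x, y = random.uniform(-1, 1), random.uniform(-1, 1)
        if x * x + y * y <= 1:
            return x, y

pts = [sample_disk() for _ in range(100_000)]
n = len(pts)
mx = sum(x for x, _ in pts) / n
my = sum(y for _, y in pts) / n
cov = sum((x - mx) * (y - my) for x, y in pts) / n

# {|X| > 0.8} and {|Y| > 0.8} each have positive probability, but they cannot
# happen together because 0.8^2 + 0.8^2 > 1, so P(A and B) = 0 != P(A) P(B).
p_a = sum(abs(x) > 0.8 for x, _ in pts) / n
p_b = sum(abs(y) > 0.8 for _, y in pts) / n
p_ab = sum(abs(x) > 0.8 and abs(y) > 0.8 for x, y in pts) / n

print(round(cov, 3), p_ab, round(p_a * p_b, 4))
```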

Remark 0.2: An equivalent necessary and sufficient condition for independence is that the joint pdf factors as
$$f_{X_1,\ldots,X_n}(x_1,\ldots,x_n) = \prod_{i=1}^{n} h_i(x_i) \quad (8)$$
and the support set $\{(x_1,\ldots,x_n) : f_{X_1,\ldots,X_n}(x_1,\ldots,x_n) > 0\}$ is a Cartesian product, for example, $\{x_1 \in (a_1, b_1), \ldots, x_n \in (a_n, b_n)\}$.

Review of moment generating functions
Definition 0.2: The moment generating function (mgf) of X is defined by
$$M_X(t) = E\left[e^{tX}\right], \quad (9)$$
and it is said to exist only if this expectation is finite for all t in a neighborhood of 0.
Three uses of the moment generating function are:
(1) Find moments of random variables.
(2) Identify the pdfs of random variables (the mgf is unique).
(3) Establish properties of random variables.

Note: In Definition 0.2, if the moment generating function of X is finite (exists) in a neighborhood of 0 (an open interval about 0), then this function completely determines the distribution of X.
Theorem 0.2: Mgfs are unique, i.e., $M_X = M_Y$ iff $F_X = F_Y$.
Theorem 0.3: If $M_X$ exists, then moments of all orders exist and
$$M_X(t) = \sum_{r=0}^{\infty} \frac{E(X^r)\, t^r}{r!} \quad (10)$$
and
$$E(X^r) = \left.\frac{d^r M_X(t)}{dt^r}\right|_{t=0}, \quad r = 1, 2, \ldots \quad (11)$$

Example 0.3: $X \sim \text{geometric}(p)$ if
$$p_X(x) = p(1-p)^{x-1}, \quad x = 1, 2, \ldots, \quad 0 < p < 1.$$
A. Find the mgf of X.
B. Use the mgf to find E(X) and Var(X).
C. Find $F_X(x)$.

Example 0.4: A kind of capacitor has a defect rate of 100 ppm (parts per million). Assume that whether
one is defective is independent of whether any other is.
A. What is the expected number of capacitors that would have to be inspected in order to find one
defective?
B. What is the probability that a defective capacitor would be found at or before the 10,000th
inspection?
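Examples 0.3 and 0.4 can be checked numerically. This sketch assumes the standard closed form $M_X(t) = pe^t/(1 - (1-p)e^t)$ for $t < -\ln(1-p)$ as the candidate answer to part A, and the illustrative values p = 0.3 and t = 0.1 are arbitrary. It compares that form with the defining series, recovers E(X) and Var(X) from numerical derivatives at t = 0 as in (11), and then evaluates the capacitor questions with p = 100 ppm.

```python
import math

def geom_pmf(x, p):
    """P(X = x) = p (1 - p)^(x - 1), x = 1, 2, ..."""
    return p * (1 - p) ** (x - 1)

def geom_mgf(t, p):
    """Candidate closed form M_X(t) = p e^t / (1 - (1 - p) e^t), for t < -ln(1 - p)."""
    return p * math.exp(t) / (1 - (1 - p) * math.exp(t))

p, t = 0.3, 0.1
# Part A: the defining series E[e^{tX}] versus the closed form
series = sum(math.exp(t * x) * geom_pmf(x, p) for x in range(1, 500))

# Part B: E(X) and Var(X) from central-difference derivatives of the mgf at t = 0
h = 1e-4
m1 = (geom_mgf(h, p) - geom_mgf(-h, p)) / (2 * h)
m2 = (geom_mgf(h, p) - 2 * geom_mgf(0.0, p) + geom_mgf(-h, p)) / h**2
var = m2 - m1**2

# Part C: F_X(x) = 1 - (1 - p)^x, checked against a partial sum of the pmf
cdf_5 = sum(geom_pmf(x, p) for x in range(1, 6))

# Example 0.4: defect rate 100 ppm
p_def = 100e-6
expected_inspections = 1 / p_def               # E(X) = 1/p
prob_by_10000 = 1 - (1 - p_def) ** 10_000      # F_X(10000)

print(series, geom_mgf(t, p))
print(m1, var)                                 # about 1/p and (1 - p)/p^2
print(cdf_5, 1 - (1 - p) ** 5)
print(expected_inspections, round(prob_by_10000, 4))
```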

Theorem 0.4: Let $Y = aX + b$, where a and b are constants. Then
$$M_Y(t) = e^{bt} M_X(at). \quad (12)$$
Example 0.5: Show that (1) if $X \sim N(0,1)$, then $M_X(t) = e^{t^2/2}$, and (2) if $Y \sim N(\mu, \sigma^2)$, then $M_Y(t) = e^{\mu t + \sigma^2 t^2/2}$. Use Theorem 0.4 to show that $Z = aY + b \sim N(a\mu + b, a^2\sigma^2)$.
Solution:

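Definition 0.3: The joint mgf of jointly distributed random variables $X_1, X_2, \ldots, X_n$ is defined by
$$M_{X_1,\ldots,X_n}(t_1,\ldots,t_n) = E\left[\exp\left(\sum_{i=1}^{n} t_i X_i\right)\right], \quad (13)$$
or in equivalent vector form
$$M_{\mathbf{X}}(\mathbf{t}) = E\left[e^{\mathbf{t}^{T}\mathbf{X}}\right]. \quad (14)$$

The joint mgf is said to exist iff the expectation in (13), equivalently (14), exists for all $\mathbf{t}$ in a neighborhood of $\mathbf{0}$.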
Theorem 0.5: As in the univariate case, joint mgfs are unique.
Theorem 0.6: For $r < n$,
$$M_{X_1,\ldots,X_r}(t_1,\ldots,t_r) = M_{X_1,\ldots,X_n}(t_1,\ldots,t_r,0,\ldots,0). \quad (15)$$

Theorem 0.7: If $M_{X_1,X_2}(t_1,t_2)$ exists, then moments of all orders exist and
$$E\left(X_1^r X_2^s\right) = \left.\frac{\partial^{r+s} M_{X_1,X_2}(t_1,t_2)}{\partial t_1^r\, \partial t_2^s}\right|_{(t_1,t_2)=(0,0)}. \quad (16)$$

Theorem 0.8: If $X_1, X_2, \ldots, X_n$ are independent, then
$$M_{\sum_{i=1}^{n} X_i}(t) = \prod_{i=1}^{n} M_{X_i}(t).$$
Example 0.7: Use Theorem 0.8 to show that if $X_1, X_2, \ldots, X_n$ are independent with $X_i \sim N(\mu_i, \sigma_i^2)$ and $a_1, a_2, \ldots, a_n$ are constants, then
$$\sum_{i=1}^{n} a_i X_i \sim N\left(\sum_{i=1}^{n} a_i\mu_i,\; \sum_{i=1}^{n} a_i^2\sigma_i^2\right). \quad (17)$$

The gamma and chi‐square distributions
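Definition 0.4: The gamma function is defined by
$$\Gamma(\alpha) = \int_0^{\infty} t^{\alpha-1} e^{-t}\, dt, \quad \alpha > 0. \quad (18)$$

Some properties of the gamma function:
1. $\Gamma(\alpha)$ generally has no closed form; when $\alpha = n$ is a positive integer, $\Gamma(n) = (n-1)!$.
2. Recursive property: $\Gamma(\alpha + 1) = \alpha\,\Gamma(\alpha)$.
3. $\Gamma\left(\tfrac{1}{2}\right) = \sqrt{\pi}$, which combined with the recursive property gives
$$\Gamma\left(n + \tfrac{1}{2}\right) = \frac{1 \cdot 3 \cdot 5 \cdots (2n-1)}{2^n}\sqrt{\pi} \quad \text{for any positive integer } n.$$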
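Definition 0.5: X has a gamma distribution with shape parameter $\alpha$ and scale parameter $\beta$ [write $X \sim \Gamma(\alpha, \beta)$] if
$$f_X(x) = \frac{1}{\beta^{\alpha}\Gamma(\alpha)}\, x^{\alpha-1} e^{-x/\beta}, \quad x > 0; \; \alpha, \beta > 0. \quad (19)$$

Note: The text uses a different notation for these two parameters.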
Remarks:
1. The exponential distribution is a special case of the gamma distribution; specifically, $\Gamma(1, \beta) = \exp(\beta)$.
2. The gamma CDF cannot be expressed in closed form in general; computer packages can give numerical values. When $\alpha = n$, a positive integer, it can be shown that (p. 113):
$$F_X(x) = 1 - \sum_{k=0}^{n-1} \frac{(x/\beta)^k}{k!} e^{-x/\beta} = 1 - F_Y(n-1), \quad \text{where } Y \sim \text{Poisson}(x/\beta). \quad (20)$$
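Theorem 0.9a: If $X \sim \Gamma(\alpha, \beta)$, then $E(X) = \alpha\beta$ and $Var(X) = \alpha\beta^2$.

Theorem 0.9b: If $X \sim \Gamma(\alpha, \beta)$, then $M_X(t) = (1 - \beta t)^{-\alpha}$, $t < 1/\beta$.

Proof: (Similar to Theorem 0.9a)

Corollary 0.1: If $X \sim \exp(\beta)$, then $M_X(t) = (1 - \beta t)^{-1}$, $t < 1/\beta$.

Theorem 0.10: If $X \sim \Gamma(\alpha, \beta)$, then
$$E(X^r) = \frac{\beta^r\, \Gamma(\alpha + r)}{\Gamma(\alpha)}, \quad r > -\alpha.$$

Theorem 0.11: If $X \sim \Gamma(\alpha, \beta)$, then $cX \sim \Gamma(\alpha, c\beta)$ for $c > 0$.

Theorem 0.12: If $X_1, X_2, \ldots, X_n$ are independent, with $X_i \sim \Gamma(\alpha_i, \beta)$, then
$$\sum_{i=1}^{n} X_i \sim \Gamma\left(\sum_{i=1}^{n}\alpha_i,\; \beta\right).$$
Corollary 0.2: If $X_1, X_2, \ldots, X_n$ are independent, with $X_i \sim \exp(\beta)$, then $\sum_{i=1}^{n} X_i \sim \Gamma(n, \beta)$.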
Example 0.8: Suppose a machine has exponential life with mean time to failure (MTTF) 300 h. When it breaks down, assume that it is immediately replaced by an identical machine or repaired to its previous state. Also, assume that the times between breakdowns are independent.
A. What is the distribution of the time to the fifth breakdown?
B. Assuming that the machines operate continuously, what is the probability that the fifth breakdown occurs after one month (1 mo. ≈ 720 h)?

Chi‐Square Distribution:
Definition 0.6: X has a chi‐square distribution with $\nu$ degrees of freedom (dof) [write $X \sim \chi^2(\nu)$] if $X \sim \Gamma(\nu/2, 2)$, i.e., if
$$f_X(x) = \frac{1}{2^{\nu/2}\,\Gamma(\nu/2)}\, x^{\nu/2 - 1} e^{-x/2}, \quad x > 0; \; \nu > 0.$$
Note: $\nu$ is usually an integer in the context of the chi‐square distribution.
Since the chi‐square is a special case of the gamma distribution, Corollaries 0.3 to 0.6 below follow readily from the theorems on the gamma distribution.
Corollary 0.3: If $X \sim \chi^2(\nu)$, then $E(X) = \nu$ and $Var(X) = 2\nu$.

Corollary 0.4: If $X \sim \chi^2(\nu)$, then $M_X(t) = (1 - 2t)^{-\nu/2}$, $t < 1/2$.

Corollary 0.5: If $X \sim \Gamma(\alpha, \beta)$, then $2X/\beta \sim \chi^2(2\alpha)$.

Corollary 0.6: If $X_1, X_2, \ldots, X_n$ are independent, with $X_i \sim \chi^2(\nu_i)$, then $\sum_{i=1}^{n} X_i \sim \chi^2\left(\sum_{i=1}^{n}\nu_i\right)$.
Proof: (By MGF)

Theorem 0.13: If $Z \sim N(0,1)$ (standard normal), then $Z^2 \sim \chi^2(1)$.
Proof: (By MGF)

Corollary 0.7: If $X_1, X_2, \ldots, X_n$ are independent with $X_i \sim N(\mu_i, \sigma_i^2)$, then
$$\sum_{i=1}^{n}\left(\frac{X_i - \mu_i}{\sigma_i}\right)^2 \sim \chi^2(n).$$
Example 0.9: Redo Example 0.8.B using the $\chi^2$ distribution.
(Assuming that the machines operate continuously, what is the probability that the fifth breakdown occurs after one month (1 mo. ≈ 720 h)?)
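Examples 0.8 and 0.9 can be worked numerically as a cross-check. By Corollary 0.2 the time T to the fifth breakdown is $\Gamma(5, 300)$, so (20) gives $P(T > 720) = P(Y \le 4)$ with $Y \sim$ Poisson(2.4); by Corollary 0.5, $2T/300 \sim \chi^2(10)$, so the same probability is $1 - H(4.8)$ for the $\chi^2(10)$ CDF H. The sketch below computes both; the chi-square CDF is obtained by Simpson integration rather than a table, which is an implementation choice, not the notes' method.

```python
import math

def poisson_cdf(k, lam):
    """P(Y <= k) for Y ~ Poisson(lam)."""
    return sum(lam ** j * math.exp(-lam) / math.factorial(j) for j in range(k + 1))

def chi2_pdf(x, nu):
    """chi-square(nu) pdf = Gamma(nu/2, 2) pdf; written for nu >= 2."""
    return x ** (nu / 2 - 1) * math.exp(-x / 2) / (2 ** (nu / 2) * math.gamma(nu / 2))

def chi2_cdf(x, nu, steps=10_000):
    """H(x; nu) by composite Simpson's rule on [0, x]."""
    h = x / steps
    total = chi2_pdf(0.0, nu) + chi2_pdf(x, nu)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * chi2_pdf(i * h, nu)
    return total * h / 3

mttf, k, t = 300.0, 5, 720.0    # MTTF, breakdown number, one month in hours

# Example 0.8.B: T ~ Gamma(5, 300), so by (20) P(T > 720) = P(Y <= 4), Y ~ Poisson(720/300)
p_gamma = poisson_cdf(k - 1, t / mttf)

# Example 0.9: 2T/300 ~ chi-square(10), so P(T > 720) = 1 - H(4.8; 10)
p_chi2 = 1 - chi2_cdf(2 * t / mttf, 2 * k)

print(round(p_gamma, 4), round(p_chi2, 4))
```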

Summary:

Transformation Method
Any real‐valued function of a rv X is itself a rv, the distribution of which is determined by the distribution of X.
Three main techniques for finding the distribution of $Y = u(X)$:
1) MGF method, e.g., $X \sim N(0,1) \Rightarrow X^2 \sim \chi^2(1)$
2) CDF approach: find $F_Y(y)$ from $F_X(x)$
3) Transformation method (Jacobian method): find $f_Y(y)$ from $f_X(x)$
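1) CDF Approach
Ex 0.10: Let X be exponential with pdf $f_X(x) = a e^{-ax}$, $x > 0$ (rate a). Find $f_Y(y)$, where $Y = e^{bX}$, $b > 0$.
Sol: $F_X(x) = 1 - e^{-ax}$, $0 < x < \infty$, so
$$\begin{aligned} F_Y(y) &= P(Y \le y) = P(e^{bX} \le y) = P\left(X \le \tfrac{1}{b}\ln y\right) \quad (\text{CDF of } X) \\ &= F_X\left(\tfrac{1}{b}\ln y\right) = 1 - e^{-\frac{a}{b}\ln y} = 1 - y^{-a/b}, \\ f_Y(y) &= \frac{dF_Y(y)}{dy} = \frac{a}{b}\, y^{-(a/b + 1)} \equiv c\, y^{-(c+1)}, \quad 1 < y < \infty, \; c = a/b. \end{aligned}$$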
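A simulation sketch of Ex 0.10: draw X with rate a, set $Y = e^{bX}$, and compare the empirical CDF of Y with $F_Y(y) = 1 - y^{-a/b}$ from the CDF approach. The values a = 2, b = 1, the seed, and the sample size are illustrative assumptions; `random.expovariate` takes the rate as its argument.

```python
import math
import random

random.seed(2)
a, b = 2.0, 1.0                 # assumed illustrative values: rate a, exponent b
c = a / b

ys = [math.exp(b * random.expovariate(a)) for _ in range(100_000)]

def F_Y(y):
    """CDF from the CDF approach: F_Y(y) = 1 - y^(-a/b) for y > 1."""
    return 1 - y ** (-c) if y > 1 else 0.0

for y in (1.5, 2.0, 4.0):
    emp = sum(v <= y for v in ys) / len(ys)
    print(y, round(emp, 3), round(F_Y(y), 3))
```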
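Formally, let $Y = u(X)$, where $u(\cdot)$ is a real‐valued function, and define $A_y = \{x \mid u(x) \le y\}$.
The CDF approach is to find
$$F_Y(y) = P(u(X) \le y) = P(X \in A_y),$$
which, when u is increasing, reduces to $F_Y(y) = F_X(u^{-1}(y))$.

Thm 0.14: Let $\mathbf{X} = (X_1, \ldots, X_k)$ be a k‐dimensional vector of continuous r.v.'s with joint pdf $f(x_1, \ldots, x_k)$. If $Y = u(\mathbf{X})$, then
$$F_Y(y) = P(u(\mathbf{X}) \le y) = \int \cdots \int_{A_y} f(x_1, \ldots, x_k)\, dx_1 \cdots dx_k, \quad A_y = \{\mathbf{x} \mid u(\mathbf{x}) \le y\}.$$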
The relationship between x and y can be:
1) one‐to‐one: ❶ discrete; ❷ continuous
2) not one‐to‐one: ❸ n‐to‐one (several x values map to the same y), whereas one‐to‐n does not define a function; ❹ n!‐to‐one (e.g., order statistics)
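❶ one‐to‐one (discrete)
Thm 0.15: If X is a discrete rv with pmf $f_X(x)$ and $Y = u(X)$ is one‐to‐one, i.e., $y = u(x) \Leftrightarrow x = u^{-1}(y) = w(y)$, then the pmf of Y is
$$f_Y(y) = f_X(w(y)), \quad y \in B, \quad B = \{y \mid f_Y(y) > 0\}.$$
Ex 0.12: Suppose $X \sim BIN(n, p)$. Find the pmf of $Y = n - X$.
Sol: We know $f_X(x) = \binom{n}{x} p^x (1-p)^{n-x}$, $x = 0, 1, \ldots, n$.
Find the inverse function $w(\cdot)$: $Y = n - X \Rightarrow X = n - Y = w(Y)$.
$$\therefore f_Y(y) = f_X(w(y)) = \binom{n}{n-y} p^{n-y}(1-p)^{y} = \binom{n}{y} q^{y}(1-q)^{n-y}, \quad q = 1 - p.$$
Hence, $Y \sim BIN(n, q)$. #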

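❷ one‐to‐one (continuous)
Thm 0.16: If X is a continuous rv with pdf $f_X(x)$ and $Y = u(X)$ is one‐to‐one from $A = \{x \mid f_X(x) > 0\}$ to $B = \{y \mid f_Y(y) > 0\}$, and the derivative $dw(y)/dy$ is continuous and nonzero on B, then the pdf of Y is
$$f_Y(y) = f_X(w(y))\left|\frac{dw(y)}{dy}\right|, \quad y \in B.$$

Ex 0.13: In Ex 0.10, $f_X(x) = a e^{-ax}$, $Y = e^{bX}$, and $f_Y(y) = \frac{a}{b}\, y^{-(a/b+1)}$. Now find $f_Y(y)$ using the transformation method.
Sol: STEP 1: Find the inverse: $x = w(y) = \frac{1}{b}\ln y$.
STEP 2: Take the derivative: $\frac{dw(y)}{dy} = \frac{1}{by}$.
STEP 3: $f_Y(y) = f_X(w(y))\left|\frac{dw(y)}{dy}\right| = a e^{-\frac{a}{b}\ln y}\cdot\frac{1}{by} = \frac{a}{b}\, y^{-(a/b+1)}$, $y > 1$. #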
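❸ Not one‐to‐one:
Partition A into disjoint subsets $A_1, A_2, \ldots$ such that $u(x)$ is one‐to‐one over each $A_j$, with inverse $w_j(y)$; the pmf or pdf is then:
$$f_Y(y) = \sum_j f_X(w_j(y)) \quad \text{(discrete)}$$
$$f_Y(y) = \sum_j f_X(w_j(y))\left|\frac{d}{dy}w_j(y)\right| \quad \text{(continuous)}$$
Ex 0.14: Let $f_X(x) = \frac{4}{31}\left(\frac{1}{2}\right)^x$, $x = -2, -1, 0, 1, 2$, and consider $Y = |X|$. Find $f_Y(y)$.

Sol: $B = \{0, 1, 2\}$.
$$f_Y(0) = f_X(w(0)) = f_X(0) = \frac{4}{31},$$
$$f_Y(1) = f_X(w_1(1)) + f_X(w_2(1)) = f_X(-1) + f_X(1) = \frac{4}{31}\left(\frac{1}{2}\right)^{-1} + \frac{4}{31}\left(\frac{1}{2}\right)^{1} = \frac{10}{31},$$
$$f_Y(2) = f_X(w_1(2)) + f_X(w_2(2)) = f_X(-2) + f_X(2) = \frac{16}{31} + \frac{1}{31} = \frac{17}{31}.$$
#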
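The "sum over the preimage" rule for a non‐one‐to‐one discrete map is easy to express directly. This sketch uses the pmf of Ex 0.14, $f_X(x) = \frac{4}{31}(1/2)^x$ for x = −2, ..., 2 (read off from the values 4/31 and 10/31 in the solution), and the helper name `pmf_of_function` is hypothetical.

```python
from collections import defaultdict
from fractions import Fraction

# Ex 0.14: f_X(x) = (4/31) (1/2)^x, x = -2, ..., 2
f_X = {x: Fraction(4, 31) * Fraction(1, 2) ** x for x in range(-2, 3)}

def pmf_of_function(f, u):
    """pmf of Y = u(X): add f_X(x) over every x in the preimage of each y."""
    f_Y = defaultdict(Fraction)
    for x, p in f.items():
        f_Y[u(x)] += p
    return dict(f_Y)

f_Y = pmf_of_function(f_X, abs)
print(sum(f_X.values()))   # 1
for y in sorted(f_Y):
    print(y, f_Y[y])       # 0 4/31, 1 10/31, 2 17/31
```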
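Joint Transformations:
Thm 0.17: If $\mathbf{X}$ is a vector of rv's with joint pdf $f_{\mathbf{X}}(\mathbf{x})$ and $\mathbf{Y} = (Y_1, \ldots, Y_k) = U(\mathbf{X}) = (u_1(\mathbf{X}), \ldots, u_k(\mathbf{X}))$ defines a one‐to‐one transformation, then the joint pdf of $\mathbf{Y}$ is:
Continuous: $f_{\mathbf{Y}}(y_1, \ldots, y_k) = f_{\mathbf{X}}(x_1, \ldots, x_k)\,|J|$, where $x_1, \ldots, x_k$ are the solutions of $\mathbf{y} = U(\mathbf{x})$ (expressed in terms of $y_1, \ldots, y_k$) and the Jacobian is $J = \det\left[\partial x_i / \partial y_j\right]$.
Discrete: $f_{\mathbf{Y}}(y_1, \ldots, y_k) = f_{\mathbf{X}}(x_1, \ldots, x_k)$.
Ex 0.15: Let $X_1, X_2$ be independent and exponential, $X_i \sim EXP(1)$, with $(x_1, x_2) \in A = \{(x_1, x_2) \mid 0 < x_1, 0 < x_2\}$. Define $Y_1 = X_1$ and $Y_2 = X_1 + X_2$.
1) Find the range of $(Y_1, Y_2)$, i.e., $(y_1, y_2) \in B$.
2) Find $f_{Y_1,Y_2}(y_1, y_2)$.
Sol:
1) $0 < x_1 = y_1$ and $0 < x_2 = y_2 - y_1 \Rightarrow B = \{(y_1, y_2) \mid 0 < y_1 < y_2\}$.
2) $f_{Y_1,Y_2}(y_1, y_2) = f_{X_1,X_2}(x_1, x_2)\,|J| = f_{X_1,X_2}(y_1, y_2 - y_1)\,|J|$, where
$$J = \det\begin{bmatrix} \partial x_1/\partial y_1 & \partial x_1/\partial y_2 \\ \partial x_2/\partial y_1 & \partial x_2/\partial y_2 \end{bmatrix} = \det\begin{bmatrix} 1 & 0 \\ -1 & 1 \end{bmatrix} = 1,$$
so
$$f_{Y_1,Y_2}(y_1, y_2) = f_{X_1}(y_1)\, f_{X_2}(y_2 - y_1)\cdot 1 = e^{-y_1} e^{-(y_2 - y_1)} = e^{-y_2}, \quad 0 < y_1 < y_2.$$
Extension: $f_{Y_2}(y_2) = \int_0^{y_2} f_{Y_1,Y_2}(y_1, y_2)\, dy_1 = \int_0^{y_2} e^{-y_2}\, dy_1 = y_2 e^{-y_2} \rightarrow \Gamma(2, 1)$, $0 < y_2$. #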
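Ex 0.16:
a) Let $f_X(x) = 2x$, $0 < x < 1$, and $Y = X^2$. Find $f_Y(y)$.

b) Let $f_X(x) = x^2/3$, $-1 < x < 2$, and $Y = X^2$. Find $f_Y(y)$.
Sol:
a) Within $0 < x < 1$, $Y = X^2$ is one‐to‐one: $Y = X^2 \Rightarrow X = \sqrt{y} = w(y)$. Hence
$$f_Y(y) = f_X(w(y))\left|\frac{dw(y)}{dy}\right| = f_X(\sqrt{y})\,\frac{d\sqrt{y}}{dy} = 2\sqrt{y}\cdot\frac{1}{2\sqrt{y}} = 1, \quad 0 < y < 1.$$
b) For $-1 < x < 1$, $Y = X^2$ is 2‐to‐1; for $1 < x < 2$, $Y = X^2$ is 1‐to‐1.
Split $-1 < x < 1$ into:
$-1 < x < 0$: $x = -\sqrt{y} = w_1(y) \Rightarrow f_X(w_1(y))\left|\frac{dw_1(y)}{dy}\right| = \frac{y}{3}\cdot\frac{1}{2\sqrt{y}} = \frac{\sqrt{y}}{6}$, $0 < y < 1$;
$0 < x < 1$: $x = \sqrt{y} = w_2(y) \Rightarrow f_X(w_2(y))\left|\frac{dw_2(y)}{dy}\right| = \frac{y}{3}\cdot\frac{1}{2\sqrt{y}} = \frac{\sqrt{y}}{6}$, $0 < y < 1$.
$1 < x < 2$: $x = \sqrt{y} = w_3(y) \Rightarrow f_X(w_3(y))\left|\frac{dw_3(y)}{dy}\right| = \frac{\sqrt{y}}{6}$, $1 < y < 4$.
$$\therefore f_Y(y) = \begin{cases} \frac{\sqrt{y}}{6} + \frac{\sqrt{y}}{6} = \frac{\sqrt{y}}{3}, & 0 < y < 1, \\ \frac{\sqrt{y}}{6}, & 1 < y < 4. \end{cases}$$
Part b) can also be solved using the CDF approach:
When $0 < y < 1$ (i.e., $-1 < x < 1$): $F_Y(y) = P(Y \le y) = P(X^2 \le y) = P(-\sqrt{y} \le X \le \sqrt{y}) = F_X(\sqrt{y}) - F_X(-\sqrt{y})$, so
$$f_Y(y) = \frac{dF_Y(y)}{dy} = f_X(\sqrt{y})\,\frac{d\sqrt{y}}{dy} + f_X(-\sqrt{y})\,\frac{d\sqrt{y}}{dy}.$$
When $1 < y < 4$ (i.e., $1 < x < 2$): since $-\sqrt{y} \le -1 < X$, $F_Y(y) = P(X^2 \le y) = P(X \le \sqrt{y}) = F_X(\sqrt{y})$, so
$$f_Y(y) = \frac{dF_Y(y)}{dy} = f_X(\sqrt{y})\,\frac{d\sqrt{y}}{dy}.$$
#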
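A Monte Carlo check of Ex 0.16 b): sample X from $f_X(x) = x^2/3$ on (−1, 2) by inverting $F_X(x) = (x^3 + 1)/9$, square it, and compare the empirical CDF of Y with the CDF obtained by integrating the piecewise $f_Y$ above, namely $\frac{2}{9}y^{3/2}$ on (0, 1) and $(y^{3/2} + 1)/9$ on (1, 4). The seed and sample size are arbitrary assumptions.

```python
import math
import random

random.seed(3)

def sample_x():
    """Inverse-CDF draw from f_X(x) = x^2/3 on (-1, 2), where F_X(x) = (x^3 + 1)/9."""
    v = 9 * random.random() - 1              # v = x^3
    return math.copysign(abs(v) ** (1 / 3), v)

def F_Y(y):
    """CDF of Y = X^2 obtained by integrating the piecewise f_Y."""
    if y <= 0:
        return 0.0
    if y < 1:
        return 2 * y ** 1.5 / 9              # integral of sqrt(t)/3 from 0 to y
    return min(1.0, (y ** 1.5 + 1) / 9)      # 2/9 + integral of sqrt(t)/6 from 1 to y

ys = [sample_x() ** 2 for _ in range(100_000)]
for y in (0.5, 1.0, 2.25, 3.5):
    emp = sum(v <= y for v in ys) / len(ys)
    print(y, round(emp, 3), round(F_Y(y), 3))
```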
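Ex 0.17: Let $X \perp Y$, each $\sim U(0,1)$. Find $f_U(u)$, where $U = X + Y$.
Sol: Idea:
1) Define a dummy r.v. $V = Y$ (this is equivalent to the convolution $f_U(u) = \int f_X(t) f_Y(u - t)\, dt$).
2) Find $f_{U,V}(u, v)$.
3) Take the marginal $f_U(u)$.

1) Define $V = Y$ ($V = X$ also works):
$$\begin{cases} U = X + Y \\ V = Y \end{cases} \Rightarrow \begin{cases} X = U - V \\ Y = V \end{cases} \Rightarrow |J| = \left|\det\begin{bmatrix} \partial X/\partial U & \partial X/\partial V \\ \partial Y/\partial U & \partial Y/\partial V \end{bmatrix}\right| = \left|\det\begin{bmatrix} 1 & -1 \\ 0 & 1 \end{bmatrix}\right| = 1.$$
2) $f_{U,V}(u, v) = f_{X,Y}(u - v, v)\,|J| = 1$ on the region found in step 3.
3) $\begin{cases} 0 < X < 1 \\ 0 < Y < 1 \end{cases} \Rightarrow \begin{cases} 0 < U - V < 1 \\ 0 < V < 1 \end{cases} \Rightarrow \begin{cases} V < U < 1 + V \\ 0 < V < 1 \end{cases}$, so
$$f_U(u) = \int f_{U,V}(u, v)\, dv = \begin{cases} \int_0^{u} dv = u, & 0 < u < 1, \\ \int_{u-1}^{1} dv = 2 - u, & 1 < u < 2. \end{cases}$$
#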
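Summary:
*** One‐to‐one: $Y = U(X)$, $X = U^{-1}(Y) = w(Y)$, $A = \{x \mid f_X(x) > 0\}$, $B = \{y \mid f_Y(y) > 0\}$
Discrete: $f_Y(y) = f_X(w(y))$, $y \in B$
Continuous: $f_Y(y) = f_X(w(y))\left|\frac{dw(y)}{dy}\right|$, $y \in B$
*** Not one‐to‐one: Partition A into disjoint subsets $A_1, A_2, \ldots, A_m$ on which u is one‐to‐one, with inverses $w_j$
Discrete: $f_Y(y) = \sum_{j=1}^{m} f_X(w_j(y))$
Continuous: $f_Y(y) = \sum_{j=1}^{m} f_X(w_j(y))\left|\frac{dw_j(y)}{dy}\right|$
*** Joint Distribution:
Discrete: If not one‐to‐one, then partition A into $A_1, A_2, \ldots, A_m$ s.t. $\mathbf{y} = U(\mathbf{x})$ has a unique solution $\mathbf{x}_j = (x_{1j}, \ldots, x_{kj})$ over $A_j$; then
$$f_{\mathbf{Y}}(y_1, \ldots, y_k) = \sum_{j=1}^{m} f_{\mathbf{X}}(x_{1j}, \ldots, x_{kj}).$$
Continuous:
$$f_{\mathbf{Y}}(y_1, \ldots, y_k) = f_{\mathbf{X}}(x_1, \ldots, x_k)\,|J|, \quad J = \det\begin{bmatrix} \partial x_1/\partial y_1 & \cdots & \partial x_1/\partial y_k \\ \vdots & \ddots & \vdots \\ \partial x_k/\partial y_1 & \cdots & \partial x_k/\partial y_k \end{bmatrix}.$$
If not one‐to‐one:
$$f_{\mathbf{Y}}(y_1, \ldots, y_k) = \sum_{j=1}^{m} f_{\mathbf{X}}(x_{1j}, \ldots, x_{kj})\,|J_j|.$$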

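Order statistics
Example: $x_1, x_2, x_3$ are the exam scores of 3 students, independent, each with pdf $f(x)$. Let $y_1 = \min(x_1, x_2, x_3)$, $y_2$ = the middle value, and $y_3 = \max(x_1, x_2, x_3)$. There are $3!$ orderings of the x's:
$x_1 < x_2 < x_3$, $x_1 < x_3 < x_2$, $x_2 < x_1 < x_3$, $x_2 < x_3 < x_1$, $x_3 < x_1 < x_2$, $x_3 < x_2 < x_1$,
and each one maps one‐to‐one onto $y_1 < y_2 < y_3$, so the transformation is $3!$‐to‐one. Summing over the $3!$ orderings:
$$\begin{aligned} f_{Y_1,Y_2,Y_3}(y_1, y_2, y_3) &= \sum_{j=1}^{3!} f_{X_1,X_2,X_3}(x_{1j}, x_{2j}, x_{3j})\,|J_j| = \sum_{j=1}^{3!} f(x_{1j}) f(x_{2j}) f(x_{3j})\,|J_j| \\ &= \sum_{j=1}^{3!} f(y_1) f(y_2) f(y_3)\,|J_j|. \end{aligned}$$
Each $J_j$ is the determinant of a permutation matrix; e.g., for the ordering $x_2 < x_3 < x_1$ ($x_1 = y_3$, $x_2 = y_1$, $x_3 = y_2$),
$$J = \det\begin{bmatrix} \partial x_1/\partial y_1 & \partial x_1/\partial y_2 & \partial x_1/\partial y_3 \\ \partial x_2/\partial y_1 & \partial x_2/\partial y_2 & \partial x_2/\partial y_3 \\ \partial x_3/\partial y_1 & \partial x_3/\partial y_2 & \partial x_3/\partial y_3 \end{bmatrix} = \det\begin{bmatrix} 0 & 0 & 1 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \end{bmatrix} = 1, \quad |J| = 1.$$
Hence $f_{Y_1,Y_2,Y_3}(y_1, y_2, y_3) = 3!\, f(y_1) f(y_2) f(y_3)$, $y_1 < y_2 < y_3$.
In general, the joint pdf of the order statistics for n iid (independent and identically distributed) r.v.'s is:
$$f_{Y_1,\ldots,Y_n}(y_1, \ldots, y_n) = n!\prod_{i=1}^{n} f(y_i).$$
Thm 0.18: If the $X_i$'s are an iid sample from a population with continuous pdf $f(x)$, then the joint pdf of the order statistics $Y_1, \ldots, Y_n$ is
$$g(y_1, \ldots, y_n) = n!\, f(y_1)\cdots f(y_n)$$
for $y_1 < y_2 < \cdots < y_n$, and zero otherwise.
***What is $g_k(y_k)$, the marginal pdf of the k‐th order statistic?
$$g_k(y_k) = \frac{n!}{(k-1)!\,(n-k)!}\,[F(y_k)]^{k-1}[1 - F(y_k)]^{n-k} f(y_k).$$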
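***How about the CDF?
Let's focus on the min and the max.
CDF of $Y_1$ (min):
$$\begin{aligned} G_1(y) = P(Y_1 \le y) &= 1 - P(Y_1 > y) = 1 - P(\min(X_1, \ldots, X_n) > y) \\ &= 1 - P(X_1 > y, X_2 > y, \ldots, X_n > y) = 1 - P(X_1 > y)P(X_2 > y)\cdots P(X_n > y) \\ &= 1 - [1 - F(y)][1 - F(y)]\cdots[1 - F(y)] = 1 - [1 - F(y)]^n. \end{aligned}$$
CDF of $Y_n$ (max):
$$\begin{aligned} G_n(y) = P(Y_n \le y) &= P(\max(X_1, \ldots, X_n) \le y) = P(X_1 \le y, X_2 \le y, \ldots, X_n \le y) \\ &= P(X_1 \le y)P(X_2 \le y)\cdots P(X_n \le y) = F(y)\cdot F(y)\cdots F(y) = [F(y)]^n. \end{aligned}$$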
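The min and max CDFs can be checked by simulation. This sketch assumes iid U(0,1) draws, so F(y) = y, and uses an arbitrary n = 5, evaluation point y = 0.3, and seed.

```python
import random

random.seed(4)
n, reps = 5, 50_000
mins, maxs = [], []
for _ in range(reps):
    xs = [random.random() for _ in range(n)]   # iid U(0,1), so F(y) = y
    mins.append(min(xs))
    maxs.append(max(xs))

y = 0.3
emp_min = sum(m <= y for m in mins) / reps
emp_max = sum(m <= y for m in maxs) / reps
print(round(emp_min, 3), round(1 - (1 - y) ** n, 3))   # G_1(y) = 1 - [1 - F(y)]^n
print(round(emp_max, 3), round(y ** n, 3))             # G_n(y) = [F(y)]^n
```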

Chapter 1. Sampling Distribution

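Probability vs Statistics:

Def. 1.1: A function of observable r.v.'s, $T = t(X_1, \ldots, X_n)$, which does not depend on any unknown parameters, is called a statistic.
Examples: $T = \bar{x} = \sum_{i=1}^{n} x_i / n$ is a statistic (n is known); $\sum_{i=1}^{n} x_i / \sigma$ is not a statistic if σ is unknown; order statistics such as $T = x_{1:n}$ or $T = x_{n:n}$ are statistics.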

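Def 1.2: The set of r.v.'s $X_1, \ldots, X_n$ is said to be a random sample (rs) of size n from a population with density function $f(x)$ if the joint pdf has the form
$$f_{X_1,\ldots,X_n}(x_1, \ldots, x_n) = f(x_1)\cdots f(x_n).$$

Ex. 1.1: Let $X_1, \ldots, X_n$ be a rs of size $n = 2k$ from $EXP(\theta)$. Find $f_{X_1,\ldots,X_n}(x_1, \ldots, x_n)$.
Solution: Since $f(x_i) = \frac{1}{\theta}e^{-x_i/\theta}$ for all i and the $X_i$'s are a rs,
$$f_{X_1,\ldots,X_n}(x_1, \ldots, x_n) = f(x_1)\cdots f(x_n) = \frac{1}{\theta^n}\, e^{-\sum_{i=1}^{n} x_i/\theta}, \quad 0 < x_i \;\forall i.$$
#

Ex. 1.2: Let $X_1, \ldots, X_n$ be a rs of size $n = 2k$ taken from $U(0,1)$. Find the probability of a joint event $\{X_1 \in I_1, X_2 \in I_2, \ldots, X_n \in I_n\}$ in which each $I_i$ is one half of (0, 1), i.e., $(0, \tfrac12)$ or $(\tfrac12, 1)$.
Solution: Since the $X_i$'s are a rs, the probability factors:
$$P(X_1 \in I_1, \ldots, X_n \in I_n) = P(X_1 \in I_1)\,P(X_2 \in I_2)\cdots P(X_n \in I_n) = \tfrac12\cdot\tfrac12\cdots\tfrac12 = \left(\tfrac12\right)^{2k} = \left(\tfrac14\right)^{k}.$$
#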

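Def 1.3: The sample mean is a r.v., defined as $\bar{X} = \sum_{i=1}^{n} X_i / n$. Note: $E(\bar{X})$ is not a r.v. (it is a constant).

Thm 1.1: If $X_1, \ldots, X_n$ is a rs from $f(x)$ with $E(X) = \mu$ and $Var(X) = \sigma^2$, then $E(\bar{X}) = \mu$ and $Var(\bar{X}) = \sigma^2/n$.
Proof: 1. $E(\bar{X}) = E\left(\frac{\sum_{i=1}^{n} X_i}{n}\right)$ (substitute the definition of $\bar{X}$)
2. $= \frac{1}{n}\sum_{i=1}^{n} E(X_i)$, since $\int \sum_i (\cdot)\, dx = \sum_i \int (\cdot)\, dx$; this linearity holds even if the $X_i$'s are not independent.
3. $= \frac{1}{n}\sum_{i=1}^{n}\mu$ (substitute the definition of $E(X_i)$)
4. $= \mu$ (definition of summation) #
Proof Summary: $E(\bar{X}) = E\left(\frac{\sum X_i}{n}\right) = \frac{\sum E(X_i)}{n} = \mu$, even if the $X_i$'s are not independent.

FOLLOWING SIMILAR LOGIC:
$$\begin{aligned} Var(\bar{X}) &= Var\left(\frac{\sum_{i=1}^{n} X_i}{n}\right) = \frac{1}{n^2}\, Var\left(\sum_{i=1}^{n} X_i\right) \\ &= \frac{1}{n^2}\sum_{i=1}^{n} Var(X_i) \quad (\text{Note: this step ONLY works if the } X_i\text{'s are uncorrelated, e.g., independent}) \\ &= \frac{1}{n^2}\, n\sigma^2 = \frac{\sigma^2}{n}. \end{aligned}$$
#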

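Def 1.4: An estimator T is said to be an unbiased estimator of $\tau(\theta)$ if $E(T) = \tau(\theta)$ for all $\theta \in \Omega$.
Examples of $\tau(\theta)$: $\sigma^2$, $\ln\sigma$, $e^{\theta}$, ...

How to estimate $\sigma^2$? First, μ is estimated by $\sum x_i / n = \bar{x}$.
$\frac{\sum (x_i - \mu)^2}{n} \rightarrow \sigma^2$, because $E[(x_i - \mu)^2] = \sigma^2$; but this is not a statistic when μ is unknown.
$\frac{\sum (x_i - \bar{x})^2}{n - 1} \rightarrow \sigma^2$ is a statistic.
Is $E\left[\frac{\sum (x_i - \bar{x})^2}{n - 1}\right] = \sigma^2$? Thm 1.2 answers this.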
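Thm 1.2: If $X_1, \ldots, X_n$ is a rs of size n from $f(x)$ with $E(X) = \mu$ and $Var(X) = \sigma^2$, define the sample variance
$$S^2 = \frac{\sum_{i=1}^{n}(X_i - \bar{X})^2}{n - 1}.$$
Then $E(S^2) = \sigma^2$.
Proof:
$$\begin{aligned} E(S^2) &= \frac{1}{n-1}E\left[\sum_{i=1}^{n}(X_i - \bar{X})^2\right] = \frac{1}{n-1}E\left[\sum_{i=1}^{n}\left(X_i^2 - 2X_i\bar{X} + \bar{X}^2\right)\right] \\ &= \frac{1}{n-1}\left[\sum_{i=1}^{n}E(X_i^2) - 2E\left(\bar{X}\sum_{i=1}^{n}X_i\right) + nE(\bar{X}^2)\right]. \end{aligned}$$
Substep: $\sum X_i = n\bar{X}$, so $E\left(\bar{X}\sum X_i\right) = nE(\bar{X}\bar{X}) = nE(\bar{X}^2)$.
$$\begin{aligned} E(S^2) &= \frac{1}{n-1}\left[\sum_{i=1}^{n}E(X_i^2) - 2nE(\bar{X}^2) + nE(\bar{X}^2)\right] = \frac{1}{n-1}\left[\sum_{i=1}^{n}E(X_i^2) - nE(\bar{X}^2)\right] \\ &= \frac{1}{n-1}\left[n(\mu^2 + \sigma^2) - n\left([E(\bar{X})]^2 + Var(\bar{X})\right)\right]. \end{aligned}$$

IF the $X_i$'s are independent:
$$E(S^2) = \frac{1}{n-1}\left[n\mu^2 + n\sigma^2 - n\left(\mu^2 + \frac{\sigma^2}{n}\right)\right] = \frac{1}{n-1}(n-1)\sigma^2 = \sigma^2.$$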
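IF the $X_i$'s are not iid (e.g., if there is a correlation between $X_i$ and $X_{i+j}$):
$$Var(\bar{X}) = Var\left(\frac{\sum_{i=1}^{n}X_i}{n}\right) = \frac{1}{n^2}Var\left(\sum_{i=1}^{n}X_i\right) = \frac{1}{n^2}\sum_{i=1}^{n}\sum_{k=1}^{n}Cov(X_i, X_k).$$
Substep: $Cov(X, Y + Z) = Cov(X, Y) + Cov(X, Z)$, even if Y and Z are correlated.
Assume the covariance depends only on the lag j: $c_j = Cov(X_i, X_{i+j}) = \rho_j\sigma^2$. Since there are $n - j$ pairs at lag j,
$$\begin{aligned} Var(\bar{X}) &= \frac{1}{n^2}\left[n\sigma^2 + 2\sum_{j=1}^{n-1}(n - j)c_j\right] = \frac{1}{n^2}\left[n\sigma^2 + 2\sum_{j=1}^{n-1}(n - j)\sigma^2\rho_j\right] \\ &= \frac{\sigma^2}{n}\left[1 + 2\sum_{j=1}^{n-1}\left(1 - \frac{j}{n}\right)\rho_j\right]. \end{aligned}$$
If $\rho_j > 0$: $Var(\bar{X}) > \sigma^2/n$ (bigger variance, not good).
If $\rho_j < 0$: $Var(\bar{X}) < \sigma^2/n$ (smaller variance, good).
$$Var(\bar{X}) = \frac{\sigma^2}{n_{\text{eff}}}, \quad n_{\text{eff}} = \frac{n}{1 + 2\sum_{j=1}^{n-1}\left(1 - \frac{j}{n}\right)\rho_j} \rightarrow \text{effective sample size}.$$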
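The lag formula can be checked against the brute-force double sum $\frac{1}{n^2}\sum_i\sum_k Cov(X_i, X_k)$. This sketch assumes two illustrative correlation patterns, $\rho_j = 0.5^j$ and $\rho_j = (-0.5)^j$ (both give valid covariance matrices), with σ² = 4 and n = 20; these values are not from the notes.

```python
def rho_pos(j):
    """Assumed lag-j correlation rho_j = 0.5^j (rho_0 = 1)."""
    return 0.5 ** j

def rho_neg(j):
    """Assumed alternating lag-j correlation rho_j = (-0.5)^j."""
    return (-0.5) ** j

def var_xbar_direct(sigma2, rho, n):
    """(1/n^2) * sum_i sum_k Cov(X_i, X_k), with Cov(X_i, X_k) = rho(|i - k|) * sigma2."""
    return sum(sigma2 * rho(abs(i - k)) for i in range(n) for k in range(n)) / n**2

def inflation(rho, n):
    """1 + 2 * sum_{j=1}^{n-1} (1 - j/n) * rho_j."""
    return 1 + 2 * sum((1 - j / n) * rho(j) for j in range(1, n))

sigma2, n = 4.0, 20
for rho in (rho_pos, rho_neg):
    formula = sigma2 / n * inflation(rho, n)
    n_eff = n / inflation(rho, n)
    print(rho.__name__, var_xbar_direct(sigma2, rho, n), formula, round(n_eff, 2))
```

With positive lag correlation the effective sample size falls below n; with the alternating pattern it exceeds n, matching the two cases above.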

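SUMMARY:
$\mu \rightarrow \bar{x}$, with $E(\bar{x}) = \mu$ and $Var(\bar{x}) = \sigma^2/n$;
$\sigma^2 \rightarrow S^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1}$.
Notes: "~" means "follows"; $\xrightarrow{d}$ means convergence in distribution.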

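Def 1.5: Normal Distribution: If
$$f(x) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}, \quad -\infty < x < \infty,$$
then $X \sim N(\mu, \sigma^2)$, where $-\infty < \mu < \infty$ and $\sigma > 0$; and if $\mu = 0$ and $\sigma = 1$, then X is called a standard normal r.v., with $f(x) = \frac{1}{\sqrt{2\pi}}e^{-x^2/2}$ (note $\int_{-\infty}^{\infty} e^{-x^2/2}\, dx = \sqrt{2\pi}$).
Thm 8.3: Central Limit Theorem (CLT): If $X_1, \ldots, X_n$ is a rs from a distribution with mean μ and variance $\sigma^2 < \infty$, then the limiting distribution of
$$Z_n = \frac{\sum_{i=1}^{n} X_i - n\mu}{\sqrt{n}\,\sigma} = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$$
is standard normal: $Z_n \xrightarrow{d} Z \sim N(0,1)$ as $n \rightarrow \infty$ ($\xrightarrow{d}$ means convergence in distribution).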

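Ex 8.1: If $X_i \sim N(\mu_i, \sigma_i^2)$, $i = 1, \ldots, n$, are independent, then $Y = \sum_{i=1}^{n} a_i X_i \sim N\left(\sum a_i\mu_i, \sum a_i^2\sigma_i^2\right)$.
Thm 8.4: If $X_1, \ldots, X_n$ is a rs from $N(\mu, \sigma^2)$, then $\bar{X} \sim N(\mu, \sigma^2/n)$.

Logic: $\bar{X} = \sum_{i=1}^{n}\frac{1}{n}X_i \Rightarrow a_i = \frac{1}{n} \longrightarrow \bar{X} \sim N\left(\sum\frac{\mu}{n}, \sum\frac{\sigma^2}{n^2}\right) = N\left(\mu, \frac{\sigma^2}{n}\right)$.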

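Def 1.6: If 0 < p < 1, then a 100p‐th percentile of the distribution of a continuous random variable X is a solution $x_p$ to the equation
$$F_X(x_p) = p, \quad \text{i.e.,} \quad P(X \le x_p) = p.$$
[Figure: pdf $f_X(x)$ with area p to the left of $x_p$.]

The standard normal density is symmetric about zero:
$$\Phi(-z) = 1 - \Phi(z), \quad \text{where } \Phi \text{ is the CDF of } N(0,1), \quad \text{so } z_p = -z_{1-p}.$$
[Figure: standard normal pdf with area p to the left of $z_p = -z_{1-p}$ and area p to the right of $z_{1-p}$.]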
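Ex 1.3: Suppose X is the lifetime of a battery and is claimed to be N(60, 36). To test the claim, 25 batteries are life‐tested, $x_1, \ldots, x_{25}$. If the claim is true, what value should the average life of the 25 batteries exceed 95% of the time?
First identify the quantities in question: $x_1, \ldots, x_{25}$ → average life of the 25 batteries → $\bar{x}$; "95% of the time" → $P(\bar{x} > c) = 0.95$, with $x_i \sim N(60, 36)$. Find c.
Solution: $E(\bar{x}) = 60$, $Var(\bar{x}) = \frac{36}{25}$.
$$\begin{aligned} P(\bar{x} > c) &= 1 - P(\bar{x} \le c) = 1 - P\left(\frac{\bar{x} - 60}{\sqrt{36/25}} \le \frac{c - 60}{\sqrt{36/25}}\right) = 1 - \Phi\left(\frac{c - 60}{6/5}\right) = 0.95 \\ &\Rightarrow \Phi\left(\frac{c - 60}{6/5}\right) = 0.05 \Rightarrow \frac{c - 60}{6/5} = z_{0.05} = -z_{0.95}. \end{aligned}$$
Now look at the normal table and use linear interpolation to get $z_{0.95}$: $\Phi(1.64) = 0.9495$ and $\Phi(1.65) = 0.9505$, so $z_{0.95} = 1.645$ and $z_{0.05} = -1.645$.
$$\therefore c = 60 + \frac{6}{5}z_{0.05} = 60 - \frac{6}{5}(1.645) = 58.026.$$
#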
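Ex 1.3 can be reproduced with the standard library's `statistics.NormalDist` (Python 3.8+), whose `inv_cdf` returns a percentile directly, so no table interpolation is needed. This is a sketch for checking the arithmetic, not a replacement for the table method in the notes.

```python
from statistics import NormalDist

mu, sigma, n = 60, 6, 25
xbar_dist = NormalDist(mu, sigma / n ** 0.5)   # xbar ~ N(60, 36/25)

c = xbar_dist.inv_cdf(0.05)     # P(xbar > c) = 0.95  <=>  P(xbar <= c) = 0.05
z95 = NormalDist().inv_cdf(0.95)
print(round(c, 3))              # 58.026
print(round(z95, 4))            # 1.6449
```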

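Ex 1.4: Suppose $Z \sim N(0,1)$.
❶ $Z^2 \sim \chi^2(1) = \Gamma\left(\tfrac12, 2\right)$.
❷ If $X \sim \Gamma(\alpha, \beta)$, then $F(x) = H\left(\frac{2x}{\beta}; 2\alpha\right)$, where F is the CDF of $\Gamma(\alpha, \beta)$ and H is the CDF of $\chi^2(2\alpha)$.
❸ Why learn $\chi^2$? (1) $\chi^2(n) = \sum_{i=1}^{n} e_i^2$, where the $e_i$'s are iid $N(0,1)$. (2) It is related to the distribution of $S^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1}$.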

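Thm 1.5: If $X_1, \ldots, X_n$ is a r.s. from $N(\mu, \sigma^2)$, then
❶ $\bar{X}$ and the terms $X_i - \bar{X}$, $i = 1, \ldots, n$, are independent;
❷ $\bar{X}$ and $S^2$ are independent;
❸ $\frac{(n-1)S^2}{\sigma^2} \sim \chi^2(n-1)$ ← $n - 1$ dof (degrees of freedom).
Proof:
1) Use the transformation $Y_1 = \bar{X}$, $Y_2 = X_2 - \bar{X}$, ..., $Y_n = X_n - \bar{X}$ → $f_{Y_1,\ldots,Y_n} = f_{Y_1}\cdot f_{Y_2,\ldots,Y_n}$.
2) $S^2 = \frac{\sum (X_i - \bar{X})^2}{n-1}$ is a function of $X_i - \bar{X}$, $i = 1, \ldots, n$, only ⇒ $\bar{X} \perp S^2$.
3)
$$V_1 = \frac{\sum (X_i - \mu)^2}{\sigma^2} = \frac{\sum\left[(X_i - \bar{X}) + (\bar{X} - \mu)\right]^2}{\sigma^2} = \frac{\sum (X_i - \bar{X})^2}{\sigma^2} + \frac{n(\bar{X} - \mu)^2}{\sigma^2} = V_2 + V_3,$$
since the cross term $2(\bar{X} - \mu)\sum (X_i - \bar{X}) = 0$.
$\frac{X_i - \mu}{\sigma} \sim N(0,1) \rightarrow \left(\frac{X_i - \mu}{\sigma}\right)^2 \sim \chi^2(1) \rightarrow V_1 = \sum\left(\frac{X_i - \mu}{\sigma}\right)^2 \sim \chi^2(n)$;
$\frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \sim N(0,1) \rightarrow V_3 = \left(\frac{\bar{X} - \mu}{\sigma/\sqrt{n}}\right)^2 \sim \chi^2(1)$.
Since $V_2 \perp V_3$ (by ❷), $M_{V_1}(t) = M_{V_2}(t)\cdot M_{V_3}(t)$, with $M_{V_1}(t) = (1-2t)^{-n/2}$ and $M_{V_3}(t) = (1-2t)^{-1/2}$, so
$$M_{V_2}(t) = \frac{(1-2t)^{-n/2}}{(1-2t)^{-1/2}} = (1-2t)^{-(n-1)/2}.$$
$\therefore V_2 = \frac{(n-1)S^2}{\sigma^2} \sim \chi^2(n-1)$. #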
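A simulation sketch of part ❸ of Thm 1.5: for repeated normal samples, $(n-1)S^2/\sigma^2$ should have the $\chi^2(n-1)$ mean n − 1 and variance 2(n − 1) (Corollary 0.3). The values μ = 10, σ = 2, n = 8, the number of repetitions, and the seed are arbitrary assumptions; matching two moments is a consistency check, not a proof.

```python
import random
import statistics

random.seed(5)
mu, sigma, n, reps = 10.0, 2.0, 8, 20_000

v2 = []    # (n - 1) S^2 / sigma^2 for each simulated sample
for _ in range(reps):
    xs = [random.gauss(mu, sigma) for _ in range(n)]
    m = sum(xs) / n
    s2 = sum((x - m) ** 2 for x in xs) / (n - 1)   # sample variance S^2
    v2.append((n - 1) * s2 / sigma**2)

mean_v2 = statistics.fmean(v2)
var_v2 = sum((v - mean_v2) ** 2 for v in v2) / (reps - 1)
print(round(mean_v2, 2), n - 1)          # chi-square(n-1) mean
print(round(var_v2, 2), 2 * (n - 1))     # chi-square(n-1) variance
```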

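We can compute the percentiles of $S^2$ by using the chi‐square table. Let $\chi^2_{\gamma}(\nu)$ denote the 100γ‐th percentile of $\chi^2(\nu)$, where ν is the dof. Then
$$\gamma = P\left[\chi^2(n-1) \le \chi^2_{\gamma}(n-1)\right] = P\left[\frac{(n-1)S^2}{\sigma^2} \le \chi^2_{\gamma}(n-1)\right],$$
so the 100γ‐th percentile of the distribution of $S^2$, say $c_{\gamma}$, is
$$c_{\gamma} = \frac{\sigma^2\, \chi^2_{\gamma}(n-1)}{n-1}.$$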

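Ex. 1.5: Let $Z \sim N(0,1)$.
(a) Find $P(Z^2 < 3.84)$ using the normal table.
(b) Find $P(Z^2 < 3.84)$ using the chi‐square table.

Sol: a) $P(Z^2 < 3.84) = P(-\sqrt{3.84} < Z < \sqrt{3.84}) = P(|Z| < \sqrt{3.84}) = P(|Z| < 1.96)$.
From the normal table, $z_{1-\alpha/2} = 1.96 \Rightarrow 1 - \frac{\alpha}{2} = 0.975 \Rightarrow \alpha = 0.05 \Rightarrow P(Z^2 < 3.84) = 1 - \alpha = 1 - 0.05 = 0.95$.
b) First, note: $Z \sim N(0,1) \rightarrow Z^2 \sim \chi^2(1)$, so the dof is 1.
$P(Z^2 < 3.84) = P(\chi^2(1) < 3.84) = 0.95$. #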

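Snedecor’s F Distribution: developed by George W. Snedecor; "F" commemorates Sir Ronald Fisher.

Thm. 1.6: If $V_1 \sim \chi^2(\gamma_1)$ and $V_2 \sim \chi^2(\gamma_2)$ are independent, then $F = \frac{V_1/\gamma_1}{V_2/\gamma_2}$ has the following pdf:
$$g(x; \gamma_1, \gamma_2) = \frac{\Gamma\left(\frac{\gamma_1 + \gamma_2}{2}\right)}{\Gamma\left(\frac{\gamma_1}{2}\right)\Gamma\left(\frac{\gamma_2}{2}\right)}\left(\frac{\gamma_1}{\gamma_2}\right)^{\gamma_1/2} x^{\gamma_1/2 - 1}\left(1 + \frac{\gamma_1}{\gamma_2}x\right)^{-\frac{\gamma_1 + \gamma_2}{2}}, \quad x > 0.$$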

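Why F?
Answer: When we compare $\sigma_1^2$ and $\sigma_2^2$: for independent samples of sizes $n_1$ and $n_2$ from normal populations,
$$V_i = \frac{(n_i - 1)S_i^2}{\sigma_i^2} \sim \chi^2(n_i - 1), \quad i = 1, 2,$$
$$\therefore F = \frac{V_1/(n_1 - 1)}{V_2/(n_2 - 1)} = \frac{S_1^2/\sigma_1^2}{S_2^2/\sigma_2^2} = \frac{S_1^2}{S_2^2}\cdot\frac{\sigma_2^2}{\sigma_1^2} \sim F(n_1 - 1, n_2 - 1).$$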

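Percentile: $P\left[X \le f_{\gamma}(\gamma_1, \gamma_2)\right] = \gamma$.
For small values of γ, e.g., γ = 0.01, we can use the fact that if $X \sim F(\gamma_1, \gamma_2)$, then $Y = 1/X \sim F(\gamma_2, \gamma_1)$.
Then:
$$\begin{aligned} \gamma &= P\left[X \le f_{\gamma}(\gamma_1, \gamma_2)\right] = P\left[\frac{1}{X} \ge \frac{1}{f_{\gamma}(\gamma_1, \gamma_2)}\right] = P\left[Y \ge \frac{1}{f_{\gamma}(\gamma_1, \gamma_2)}\right] = 1 - P\left[Y < \frac{1}{f_{\gamma}(\gamma_1, \gamma_2)}\right] \\ &\Rightarrow P\left[Y \le \frac{1}{f_{\gamma}(\gamma_1, \gamma_2)}\right] = 1 - \gamma \Rightarrow \frac{1}{f_{\gamma}(\gamma_1, \gamma_2)} = f_{1-\gamma}(\gamma_2, \gamma_1) \Rightarrow f_{\gamma}(\gamma_1, \gamma_2) = \frac{1}{f_{1-\gamma}(\gamma_2, \gamma_1)}. \end{aligned}$$
Ex. 1.6: γ = 0.01, $\gamma_1 = 3$, $\gamma_2 = 5$. Find $f_{0.01}(3, 5)$, the 1st percentile.
Sol: $f_{0.01}(3, 5) = \frac{1}{f_{0.99}(5, 3)} = \frac{1}{28.24} \approx 0.0354$.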

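Thm. 1.7: If $X \sim F(\gamma_1, \gamma_2)$, then
$$E(X) = \frac{\gamma_2}{\gamma_2 - 2}, \quad \gamma_2 > 2,$$
$$Var(X) = \frac{2\gamma_2^2(\gamma_1 + \gamma_2 - 2)}{\gamma_1(\gamma_2 - 2)^2(\gamma_2 - 4)}, \quad \gamma_2 > 4.$$
(Since $X = \frac{V_1/\gamma_1}{V_2/\gamma_2}$ with $V_1 \perp V_2$, $E(X) = \frac{\gamma_2}{\gamma_1}E(V_1)E\left(V_2^{-1}\right)$.)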

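Ex. 1.7: Two vendors make the same product. We take independent r.s.'s of size 31 from each and measure the lengths of the products (assumed normal). If the length is equally variable for the two vendors ($\sigma_1 = \sigma_2$), what is $P(S_1/S_2 > 1.5)$?
Sol: We know $n_1 = n_2 = 31 \Rightarrow \frac{S_1^2}{S_2^2}\cdot\frac{\sigma_2^2}{\sigma_1^2} \sim F(30, 30)$.
If $\sigma_1 = \sigma_2$, then, since $S_1, S_2 > 0$,
$$P\left(\frac{S_1}{S_2} > 1.5\right) = P\left(\frac{S_1^2}{S_2^2} > 2.25\right) = P(F(30, 30) > 2.25) = 1 - 0.985 = 0.015 \quad \text{(found by web calculator)}.$$
#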
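Instead of a web calculator, the tail probability in Ex 1.7 can be approximated by integrating the Thm 1.6 pdf numerically. This sketch evaluates the pdf on the log scale with `math.lgamma` for stability and uses composite Simpson's rule on [0, 2.25]; the step count is an arbitrary accuracy choice.

```python
import math

def f_pdf(x, g1, g2):
    """F(g1, g2) pdf from Thm 1.6, computed on the log scale for stability."""
    if x <= 0:
        return 0.0
    log_c = (math.lgamma((g1 + g2) / 2) - math.lgamma(g1 / 2) - math.lgamma(g2 / 2)
             + (g1 / 2) * math.log(g1 / g2))
    return math.exp(log_c + (g1 / 2 - 1) * math.log(x)
                    - ((g1 + g2) / 2) * math.log1p(g1 * x / g2))

def f_cdf(x, g1, g2, steps=20_000):
    """P(F <= x) by composite Simpson's rule on [0, x]."""
    h = x / steps
    total = f_pdf(0.0, g1, g2) + f_pdf(x, g1, g2)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * f_pdf(i * h, g1, g2)
    return total * h / 3

# Ex 1.7: P(S1/S2 > 1.5) = P(F(30, 30) > 1.5^2)
tail = 1 - f_cdf(1.5**2, 30, 30)
print(round(tail, 3))
```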

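Student’s t Distribution: by William Gosset

Why?
Answer:
$$\frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \sim N(0,1), \qquad \frac{\bar{X} - \mu}{S/\sqrt{n}} \sim t(n-1).$$

Thm 1.8: If $Z \sim N(0,1)$ and $V \sim \chi^2(\gamma)$ are independent, then the distribution of $T = \frac{Z}{\sqrt{V/\gamma}}$ is referred to as Student's t distribution with γ dof, denoted by $T \sim t(\gamma)$.
The pdf of T is given by:
$$f(t; \gamma) = \frac{\Gamma\left(\frac{\gamma + 1}{2}\right)}{\Gamma\left(\frac{\gamma}{2}\right)}\cdot\frac{1}{\sqrt{\gamma\pi}}\left(1 + \frac{t^2}{\gamma}\right)^{-\frac{\gamma + 1}{2}}, \quad -\infty < t < \infty.$$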
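Thm 1.9: If $X_1, \ldots, X_n$ is a r.s. from $N(\mu, \sigma^2)$, then $T = \frac{\bar{X} - \mu}{S/\sqrt{n}} \sim t(n-1)$.
Proof:
$$T = \frac{\bar{X} - \mu}{S/\sqrt{n}} = \frac{\dfrac{\bar{X} - \mu}{\sigma/\sqrt{n}}}{\sqrt{\dfrac{(n-1)S^2/\sigma^2}{n-1}}} \sim t(n-1),$$
since the numerator is $N(0,1)$, $(n-1)S^2/\sigma^2 \sim \chi^2(n-1)$, and the two are independent (Thm 1.5). #

As $n \rightarrow \infty$, $t(n) \xrightarrow{d} N(0,1)$.
(1) $X \sim N(\mu, \sigma^2)$, n small (e.g., n = 5) ⟹ $\frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$ is exactly $N(0,1)$, and $\frac{\bar{X} - \mu}{S/\sqrt{n}}$ is exactly $t(n-1)$;
(2) $X \nsim N(\mu, \sigma^2)$, n small (e.g., n = 10) ⟹ $\frac{\bar{X} - \mu}{S/\sqrt{n}}$ is approximately $t(n-1)$;
(3) $X \nsim N(\mu, \sigma^2)$, n large (e.g., n > 20) ⟹ $\frac{\bar{X} - \mu}{S/\sqrt{n}}$ is asymptotically $N(0,1)$.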

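Thm 1.10: The t distribution with 1 degree of freedom, t(1), is the Cauchy distribution, with pdf
$$f(y) = \frac{1}{\pi(1 + y^2)}, \quad -\infty < y < \infty.$$
Proof: With γ = 1, $f(t; 1) = \frac{\Gamma(1)}{\Gamma(1/2)}\cdot\frac{1}{\sqrt{\pi}}(1 + t^2)^{-1}$, where $\Gamma(1) = 1$ and $\Gamma(1/2) = \sqrt{\pi}$, so $f(t; 1) = \frac{1}{\pi(1 + t^2)} = f(y)$ with $y \equiv t$. #

When Y is Cauchy, $E(Y)$ does not exist (the defining integral diverges), and neither does $Var(Y)$.

If $U \sim N(0,1)$ and $V \sim N(0,1)$ are independent, then $\frac{U}{V} \sim t(1)$ (Cauchy).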

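Beta Distribution: developed by Karl Pearson
Why learn Beta? (1) It is flexible. (2) It is related to the order statistics.

If $X \sim F(\gamma_1, \gamma_2)$, then $Y = \frac{(\gamma_1/\gamma_2)X}{1 + (\gamma_1/\gamma_2)X}$ has the pdf
$$f(y; a, b) = \frac{\Gamma(a + b)}{\Gamma(a)\Gamma(b)}\, y^{a-1}(1 - y)^{b-1}, \quad 0 < y < 1,$$
where $a = \frac{\gamma_1}{2}$ and $b = \frac{\gamma_2}{2}$. This is called the Beta distribution with parameters a > 0 and b > 0, denoted by $Y \sim BETA(a, b)$.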

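The F distribution can be expressed by Beta:
$$Y = \frac{(\gamma_1/\gamma_2)x}{1 + (\gamma_1/\gamma_2)x} \Rightarrow Y + \frac{\gamma_1}{\gamma_2}xY = \frac{\gamma_1}{\gamma_2}x \Rightarrow Y = \frac{\gamma_1}{\gamma_2}x(1 - Y) \Rightarrow x = \frac{\gamma_2}{\gamma_1}\cdot\frac{Y}{1 - Y}.$$

The percentiles of Beta can be computed from the percentiles of F. With $\gamma_1 = 2a$ and $\gamma_2 = 2b$, the 100γ‐th percentile of $BETA(a, b)$ is
$$y_{\gamma} = \frac{\frac{a}{b}f_{\gamma}(2a, 2b)}{1 + \frac{a}{b}f_{\gamma}(2a, 2b)} = \frac{a f_{\gamma}(2a, 2b)}{b + a f_{\gamma}(2a, 2b)}.$$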

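The Beta distribution is related to the order statistics:
The pdf of the k‐th order statistic, $X_{k:n}$, is
$$g_k(x_{k:n}) = \frac{n!}{(k-1)!\,(n-k)!}\,[F(x_{k:n})]^{k-1}[1 - F(x_{k:n})]^{n-k} f(x_{k:n}),$$
and the Beta pdf is
$$f(y; a, b) = \frac{\Gamma(a + b)}{\Gamma(a)\Gamma(b)}\, y^{a-1}(1 - y)^{b-1}.$$
In particular, for a sample from U(0, 1), $F(x) = x$ and $f(x) = 1$ on (0, 1), so $X_{k:n} \sim BETA(k, n - k + 1)$.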
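A sketch of the order‐statistic/Beta link for uniform samples: the simulated mean of $X_{k:n}$ should match the BETA(k, n − k + 1) mean $k/(n+1)$, and the constant $\frac{n!}{(k-1)!(n-k)!}$ should equal $\frac{\Gamma(a+b)}{\Gamma(a)\Gamma(b)}$ with a = k, b = n − k + 1. The choices n = 7, k = 3, the repetition count, and the seed are arbitrary.

```python
import math
import random

random.seed(6)
n, k, reps = 7, 3, 40_000

# k-th order statistic of n iid U(0,1) draws
kth = [sorted(random.random() for _ in range(n))[k - 1] for _ in range(reps)]

a, b = k, n - k + 1
mean_beta = a / (a + b)                            # BETA(a, b) mean = k/(n+1)
coef = math.factorial(n) / (math.factorial(k - 1) * math.factorial(n - k))
beta_const = math.gamma(a + b) / (math.gamma(a) * math.gamma(b))

print(round(sum(kth) / reps, 3), round(mean_beta, 3))
print(coef, beta_const)                            # same normalizing constant
```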

Chapter 2: Point Estimation

Def 2.1: A statistic $T = t(X_1, \ldots, X_n)$ used to estimate the value of $\tau(\theta)$ is called an estimator of $\tau(\theta)$, and an observed value of the statistic, $t = t(x_1, \ldots, x_n)$, is called an estimate of $\tau(\theta)$.
T ⟶ r.v.
t ⟶ known (observed) value
Examples of $\tau(\theta)$: $\theta$, $\ln\theta$, $e^{\theta}$, etc.
Four methods for estimating parameters:
1) Method of Moments Estimators (MME)
2) Method of Maximum Likelihood (MLE)
3) Minimax Estimator
4) Bayes Estimator
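Toss a coin 3 times:
Let $x_i = \begin{cases} 1 & \text{if heads} \\ 0 & \text{otherwise} \end{cases}$, e.g., $(x_1, x_2, x_3) = (1, 0, 1)$.
How to estimate p = P(x = 1) = P(head)?
Write down the pmf:
1) $L = f(x_1; p)f(x_2; p)f(x_3; p) = p^{\sum x_i}(1 - p)^{3 - \sum x_i}$.
We want to find the p that maximizes this probability:
2)
$$\begin{aligned} \frac{dL}{dp} &= \sum x_i\, p^{\sum x_i - 1}(1 - p)^{3 - \sum x_i} - \left(3 - \sum x_i\right)p^{\sum x_i}(1 - p)^{2 - \sum x_i} \\ &= p^{\sum x_i - 1}(1 - p)^{2 - \sum x_i}\left[\sum x_i(1 - p) - \left(3 - \sum x_i\right)p\right] = 0 \\ &\Rightarrow \sum x_i - p\sum x_i - 3p + p\sum x_i = 0 \Rightarrow p^* = \frac{\sum x_i}{3} = \bar{x}. \end{aligned}$$

The MLE for p is $\hat{p} = \bar{x}$ (for the data (1, 0, 1), $\hat{p} = 2/3$).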
Maximum Likelihood Estimator
Def 2.2: The joint density function of n rv's $X_1, \ldots, X_n$ evaluated at $x_1, \ldots, x_n$, say $f(x_1, \ldots, x_n; \theta)$, is referred to as a likelihood function. If $X_1, \ldots, X_n$ represents a rs from $f(x; \theta)$, then $L(\theta) = f(x_1; \theta)\cdots f(x_n; \theta)$.
The maximum likelihood principle:
Choose θ, for a given observed set of data, such that the observed data would have been most likely to occur.
Def 2.3: Let $L(\theta) = f(x_1, \ldots, x_n; \theta)$, $\theta \in \Omega$, be the joint pdf of $X_1, \ldots, X_n$. For a given set of observations $(x_1, \ldots, x_n)$, a value $\hat{\theta} = t(x_1, \ldots, x_n)$ in Ω at which $L(\theta)$ is a maximum is called a maximum likelihood estimate (MLE) of θ, i.e.,
$$f(x_1, \ldots, x_n; \hat{\theta}) = \max_{\theta \in \Omega} f(x_1, \ldots, x_n; \theta),$$
and $\hat{\theta} = t(X_1, \ldots, X_n)$ is called the maximum likelihood estimator.
Solving for the maximum likelihood estimator:
1) Write down $L(\theta)$, aka the likelihood function.
2) Find the $\hat{\theta}$ that maximizes $L(\theta)$, in two steps:
a. Take $\ln(\cdot)$ of $L(\theta)$ (reasoning explained below).
b. Take the partial derivative with respect to θ, set it to 0, and solve for $\hat{\theta}$.
3) Check that $\hat{\theta}$ is a maximizer, not a minimizer:
a. Take the second partial derivative of $\ln L(\theta)$; it should be negative at $\hat{\theta}$.
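Ex 2.1: Let the $X_i$'s be a r.s. from $POI(\theta)$. Find $\hat{\theta}$.

Sol: STEP 1: Write down $L(\theta)$:
$$f(x_i) = \frac{e^{-\theta}\theta^{x_i}}{x_i!}, \quad i = 1, \ldots, n,$$
$$\therefore L(\theta) = \prod_{i=1}^{n} f(x_i) = \prod_{i=1}^{n}\frac{e^{-\theta}\theta^{x_i}}{x_i!} = \frac{e^{-n\theta}\theta^{\sum x_i}}{\prod x_i!}.$$
STEP 2: Find the $\hat{\theta}$ that maximizes $L(\theta)$. If $\hat{\theta}$ maximizes $L(\theta)$, it also maximizes $\ln L(\theta)$, because $\ln(\cdot)$ is a monotonically increasing function.
$$\ln L(\theta) = -n\theta + \sum x_i\ln\theta - \sum\ln x_i!$$
$$\frac{\partial}{\partial\theta}\ln L(\theta) = -n + \frac{\sum x_i}{\theta} = 0 \Rightarrow \hat{\theta} = \frac{\sum x_i}{n} = \bar{x}.$$
STEP 3: Verify that $\hat{\theta}$ is a maximizer, not a minimizer, by checking whether $\frac{\partial^2}{\partial\theta^2}\ln L(\theta) < 0$:
$$\frac{\partial^2}{\partial\theta^2}\ln L(\theta) = -\frac{\sum x_i}{\theta^2} < 0 \;\; (\text{when } \textstyle\sum x_i > 0) \Rightarrow \hat{\theta} \text{ is a maximizer.}$$
#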
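Ex. 2.2: Let the $X_i$'s be a RS from $EXP(1, \eta)$ (scale parameter 1, location parameter η), the two‐parameter exponential distribution. Find $\hat{\eta}$.
Sol: STEP 1: $f(x) = \begin{cases} e^{-(x - \eta)}, & \eta < x \\ 0, & \text{otherwise} \end{cases}$
$$\therefore L(\eta) = \prod_{i=1}^{n} f(x_i) = \begin{cases} e^{-\sum (x_i - \eta)}, & \text{all } x_i\text{'s} > \eta \\ 0, & \text{otherwise.} \end{cases}$$

STEP 2: $\ln L(\eta) = -\sum (x_i - \eta)$ when all $x_i$'s > η, and
$$\frac{\partial\ln L(\eta)}{\partial\eta} = n > 0,$$
so the derivative is never 0: $L(\eta)$ increases in η up to $x_{1:n}$ and is 0 beyond it. From the graph of $L(\eta)$, its maximum occurs at $x_{1:n}$:
$$\therefore \hat{\eta} = x_{1:n}.$$
STEP 3: (Trivial from STEP 2)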
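A poll was conducted at the Univ. of West Florida: 356 upper‐classmen were asked the question
x = "How many times have you switched majors?"
Q: Is $x \sim POI(\theta)$?

# of major changes | Observed frequency | Expected frequency
0 | 237 | 230.4
1 | 90 | 100.2
2 | 22 | 21.8
3 | 7 | 3.8

Sol: From Ex. 2.1, we know
$$\hat{\theta} = \bar{x} = \frac{\sum x_i}{n} = \frac{0(237) + 1(90) + 2(22) + 3(7)}{356} = \frac{155}{356} \approx 0.435.$$
$$P(x = 0) = \frac{e^{-\hat{\theta}}\hat{\theta}^{0}}{0!} = e^{-0.435} \approx 0.647, \qquad \therefore \text{expected count for } x = 0: \; 356(0.647) \approx 230.4.$$
[Figure: bar chart of the observed and expected frequencies for 0 to 3 major changes.]
Since the observed and expected frequencies are close, we believe that $x \sim POI(\theta)$ (a formal statement requires a hypothesis test; see the later lecture on hypothesis tests). #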
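The expected column can be recomputed from the observed counts: $\hat{\theta}$ is the sample mean (Ex 2.1), and each expected count is $356 \cdot P(X = k)$ under $POI(\hat{\theta})$. This sketch computes $P(X = k)$ for each listed k exactly and uses the unrounded $\hat{\theta} = 155/356$, so the last digit can differ slightly from a table built with $\hat{\theta}$ rounded to 0.435.

```python
import math

counts = {0: 237, 1: 90, 2: 22, 3: 7}              # observed frequencies from the table
n = sum(counts.values())                            # 356 students
theta_hat = sum(k * c for k, c in counts.items()) / n   # MLE = sample mean (Ex 2.1)

def poi_pmf(k, theta):
    """P(X = k) for X ~ POI(theta)."""
    return math.exp(-theta) * theta**k / math.factorial(k)

expected = {k: n * poi_pmf(k, theta_hat) for k in counts}
print(round(theta_hat, 3))                          # 0.435
print({k: round(e, 1) for k, e in expected.items()})
```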
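Invariance Property
In Ex. 2.1, suppose we want to estimate $\tau(\theta) = P(x = 0) = e^{-\theta}$ using MLE. Can we use $\hat{\tau} = e^{-\hat{\theta}}$?
STEP 1: Reparameterize: $f(x; \theta) = \frac{e^{-\theta}\theta^{x}}{x!} \rightarrow f(x; \tau) = \frac{\tau(-\ln\tau)^{x}}{x!}$, because $\tau = e^{-\theta}$ and $\theta = -\ln\tau$.
$$L(\tau) = \prod_{i=1}^{n} f(x_i; \tau) = \prod_{i=1}^{n}\frac{\tau(-\ln\tau)^{x_i}}{x_i!} = \frac{\tau^n(-\ln\tau)^{\sum x_i}}{\prod x_i!}$$
$$\ln L(\tau) = n\ln\tau + \sum x_i\ln(-\ln\tau) - \sum\ln x_i!$$
STEP 2:
$$\frac{d}{d\tau}\ln L(\tau) = \frac{n}{\tau} + \sum x_i\cdot\frac{1}{-\ln\tau}\cdot\left(-\frac{1}{\tau}\right) = \frac{n}{\tau} + \frac{\sum x_i}{\tau\ln\tau} = 0 \Rightarrow -\ln\tau = \frac{\sum x_i}{n} \Rightarrow \hat{\tau} = e^{-\sum x_i/n} = e^{-\bar{x}} = e^{-\hat{\theta}}.$$
Answer: yes.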
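Thm 2.1: Invariance Property: If $\hat{\theta}$ is the MLE of θ and $\tau(\theta)$ is a function of θ, then $\tau(\hat{\theta})$ is the MLE of $\tau(\theta)$.
Thm 2.2: If $\hat{\boldsymbol{\theta}} = (\hat{\theta}_1, \ldots, \hat{\theta}_k)$ is the MLE of $\boldsymbol{\theta} = (\theta_1, \ldots, \theta_k)$, then the MLE of $\boldsymbol{\tau} = (\tau_1(\boldsymbol{\theta}), \ldots, \tau_r(\boldsymbol{\theta}))$ is
$$\hat{\boldsymbol{\tau}} = (\hat{\tau}_1, \ldots, \hat{\tau}_r) = (\tau_1(\hat{\boldsymbol{\theta}}), \ldots, \tau_r(\hat{\boldsymbol{\theta}})).$$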
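Ex. 2.4: Let the $X_i$'s be a rs from $P(x; \mu) = \frac{e^{-\mu^2}\mu^{2x}}{x!}$, $x = 0, 1, 2, \ldots$ Find $\hat{\mu}$.

Substitution: $\beta = \mu^2 \Rightarrow P(x; \beta) = \frac{e^{-\beta}\beta^{x}}{x!} \Rightarrow POI(\beta)$.

Sol: STEP 1:
$$L(\mu) = \prod_{i=1}^{n} P(x_i; \mu) = \prod_{i=1}^{n}\frac{e^{-\mu^2}\mu^{2x_i}}{x_i!} = \frac{e^{-n\mu^2}\mu^{2\sum x_i}}{\prod x_i!}$$
$$\ln L(\mu) = 2\sum x_i\ln\mu - n\mu^2 - \sum\ln x_i!$$
STEP 2:
$$\frac{d}{d\mu}\ln L(\mu) = \frac{2\sum x_i}{\mu} - 2n\mu = 0 \Rightarrow \mu^2 = \frac{\sum x_i}{n} \Rightarrow \hat{\mu} = \sqrt{\frac{\sum x_i}{n}} = \sqrt{\bar{x}} \quad \left(\hat{\mu} = -\sqrt{\bar{x}} \text{ discarded}\right).$$
STEP 3:
$$\frac{d^2}{d\mu^2}\ln L(\mu) = -\frac{2\sum x_i}{\mu^2} - 2n < 0 \Rightarrow \text{maximizer}.$$
If you recognized the POI substitution, then by invariance $\hat{\mu} = \sqrt{\hat{\beta}} = \sqrt{\bar{x}}$. #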
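Ex. 2.5: Let the $X_i$'s ~ $EXP(\theta)$ be the lifetimes of n components. We only observe the first r failures: $x_{1:n}, x_{2:n}, \ldots, x_{r:n}$. Find $\hat{\theta}$ based on $x_{1:n}, x_{2:n}, \ldots, x_{r:n}$.
Sol: STEP 1: Write down $L(\theta)$, where $f(x) = \frac{1}{\theta}e^{-x/\theta}$ and $F(x) = 1 - e^{-x/\theta}$:
$$\begin{aligned} L(\theta) &= f(x_{1:n}, x_{2:n}, \ldots, x_{r:n}; \theta) = \frac{n!}{(n-r)!}\left[\prod_{i=1}^{r} f(x_{i:n})\right]\left[1 - F(x_{r:n})\right]^{n-r} \\ &= \frac{n!}{(n-r)!}\left[\prod_{i=1}^{r}\frac{1}{\theta}e^{-x_{i:n}/\theta}\right]\left[e^{-x_{r:n}/\theta}\right]^{n-r} = \frac{n!}{(n-r)!\,\theta^{r}}\exp\left[-\frac{\sum_{i=1}^{r}x_{i:n} + (n-r)x_{r:n}}{\theta}\right]. \end{aligned}$$
[Figure: n items on test; the failure times $x_{1:n} < x_{2:n} < \cdots < x_{r:n}$ are what we observe, the test stops at $x_{r:n}$, and $x_{r+1:n}, \ldots, x_{n:n}$ are not observed.]
Define $T = \sum_{i=1}^{r}x_{i:n} + (n-r)x_{r:n}$ ⇒ the total survival time of the n items until the experiment is terminated.
STEP 2: Find $\hat{\theta}$:
$$\ln L(\theta) = \text{constant} - r\ln\theta - \frac{T}{\theta}, \qquad \frac{\partial\ln L(\theta)}{\partial\theta} = -\frac{r}{\theta} + \frac{T}{\theta^2} = 0 \Rightarrow \hat{\theta} = \frac{T}{r}.$$
STEP 3: Verify:
$$\left.\frac{\partial^2\ln L(\theta)}{\partial\theta^2}\right|_{\theta = \hat{\theta}} = \frac{r}{\hat{\theta}^2} - \frac{2T}{\hat{\theta}^3} = -\frac{r^3}{T^2} < 0.$$
#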
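A simulation sketch of the Type II censored estimator $\hat{\theta} = T/r$ from Ex 2.5: generate n exponential lifetimes, keep only the first r order statistics, and form the total time on test. The values θ = 100, n = 2000, r = 1500 and the seed are hypothetical choices for illustration; `random.expovariate` takes the rate 1/θ.

```python
import random

random.seed(7)
theta, n, r = 100.0, 2000, 1500      # assumed illustrative values

lifetimes = sorted(random.expovariate(1 / theta) for _ in range(n))
observed = lifetimes[:r]             # only x_{1:n}, ..., x_{r:n} are seen

# Total survival time on test: the r observed failure times plus the
# (n - r) unfailed items, each of which survived until x_{r:n}
T = sum(observed) + (n - r) * observed[-1]
theta_hat = T / r
print(round(theta_hat, 1))           # should be close to theta
```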
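Ex. 2.6: Let the $X_i$'s be a rs from $N(\mu, \sigma^2)$. Find $\hat{\mu}$ and $\hat{\sigma}^2$.
Sol: $f(x; \mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}}e^{-\frac{(x-\mu)^2}{2\sigma^2}}$
$$\therefore L(\mu, \sigma^2) = \prod_{i=1}^{n}\frac{1}{\sqrt{2\pi\sigma^2}}e^{-\frac{(x_i-\mu)^2}{2\sigma^2}} = (2\pi\sigma^2)^{-n/2}e^{-\frac{\sum (x_i-\mu)^2}{2\sigma^2}}$$
$$\ln L(\mu, \sigma^2) = -\frac{n}{2}\ln(2\pi\sigma^2) - \frac{\sum (x_i - \mu)^2}{2\sigma^2}$$
$$\frac{\partial}{\partial\mu}\ln L(\mu, \sigma^2) = \frac{\sum (x_i - \mu)}{\sigma^2} = 0 \Rightarrow \sum (x_i - \mu) = 0 \Rightarrow \hat{\mu} = \frac{\sum x_i}{n} = \bar{x}$$
$$\frac{\partial}{\partial\sigma^2}\ln L(\mu, \sigma^2) = -\frac{n}{2\sigma^2} + \frac{\sum (x_i - \mu)^2}{2\sigma^4} = 0 \Rightarrow \hat{\sigma}^2 = \frac{1}{n}\sum (x_i - \hat{\mu})^2 = \frac{1}{n}\sum (x_i - \bar{x})^2 = \frac{n-1}{n}\cdot\frac{\sum (x_i - \bar{x})^2}{n-1} = \frac{n-1}{n}S^2$$
$\therefore \hat{\mu} = \bar{x}$ and $\hat{\sigma}^2 = \frac{n-1}{n}S^2$. Since $S^2$ is unbiased, $\hat{\sigma}^2$ is biased. #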
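Ex 2.7: Let the $X_i$'s be a RS from $f(x; \theta, \eta) = \theta\eta^{\theta}x^{-(\theta+1)}$, $\eta < x$, $0 < \theta$, $0 < \eta < \infty$.
Find $\hat{\theta}$ and $\hat{\eta}$.
Sol: STEP 1:
$$L(\theta, \eta) = \prod_{i=1}^{n} f(x_i; \theta, \eta) = \prod_{i=1}^{n}\theta\eta^{\theta}x_i^{-(\theta+1)} = \theta^{n}\eta^{n\theta}\prod_{i=1}^{n}x_i^{-(\theta+1)}, \quad \eta < x_{1:n}$$
$$\ln L(\theta, \eta) = n\ln\theta + n\theta\ln\eta - (\theta + 1)\sum\ln x_i$$
STEP 2:
$$\frac{\partial\ln L(\theta, \eta)}{\partial\eta} = \frac{n\theta}{\eta} > 0 \Rightarrow L \text{ increases in } \eta \text{ on } \eta < x_{1:n} \Rightarrow \hat{\eta} = x_{1:n},$$
$$\frac{\partial\ln L(\theta, \eta)}{\partial\theta} = \frac{n}{\theta} + n\ln\eta - \sum\ln x_i = 0 \Rightarrow \hat{\theta} = \frac{n}{\sum\ln(x_i/\hat{\eta})} = \frac{n}{\sum\ln(x_i/x_{1:n})}.$$
STEP 3: Verify. #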

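****Find the MLE of the α percentile?
Suppose $x_{\alpha}$ is the α percentile; $x_{\alpha}$ is a function of the parameters (e.g., θ, η).
E.g., $X \sim EXP(\theta, \eta)$ (η location, θ scale), and $X_1, \ldots, X_n$ is a RS. Find $\hat{x}_{\alpha}$.
$$F(x_{\alpha}) = \alpha = 1 - e^{-(x_{\alpha} - \eta)/\theta} \Rightarrow x_{\alpha} = -\theta\ln(1 - \alpha) + \eta$$
⇒ use the invariance property: $\hat{x}_{\alpha} = -\hat{\theta}\ln(1 - \alpha) + \hat{\eta}$.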
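Methods of Moment Estimators: by Karl Pearson
Method of Moments Estimator: Suppose X is a continuous rv whose pdf, $f(x; \theta_1, \ldots, \theta_k)$, has k unknown parameters. The j‐th population (theoretical) moment about the origin is
$$\mu'_j = E(X^j) = \int_{-\infty}^{\infty} x^j f(x; \theta_1, \ldots, \theta_k)\, dx, \quad j = 1, \ldots, k.$$
The j‐th population moment about the mean is
$$\mu_j = E\left[(X - E(X))^j\right] = E\left[(X - \mu)^j\right].$$
Let $X_1, \ldots, X_n$ be a rs from $f(x; \theta_1, \ldots, \theta_k)$. The MMEs $\hat{\theta}_1, \ldots, \hat{\theta}_k$ are the solutions of
$$\frac{\sum x_i}{n} = \mu'_1 = E(X) = \int x f(x; \theta_1, \ldots, \theta_k)\, dx$$
$$\frac{\sum x_i^2}{n} = \mu'_2 = E(X^2) = \int x^2 f(x; \theta_1, \ldots, \theta_k)\, dx$$
$$\vdots$$
$$\frac{\sum x_i^k}{n} = \mu'_k = E(X^k) = \int x^k f(x; \theta_1, \ldots, \theta_k)\, dx$$
If X is discrete: $\frac{\sum x_i^j}{n} = \sum_{\forall x} x^j P(X = x; \theta_1, \ldots, \theta_k)$, $j = 1, \ldots, k$.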
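***Idea: the sample moments computed from $x_1, x_2, \ldots, x_n$ estimate the theoretical moments:
$$\text{Sample moments} \begin{cases} \dfrac{x_1}{n} + \dfrac{x_2}{n} + \cdots + \dfrac{x_n}{n} \longrightarrow \mu'_1 = \text{mean of } X \\ \dfrac{x_1^2}{n} + \dfrac{x_2^2}{n} + \cdots + \dfrac{x_n^2}{n} \longrightarrow \mu'_2 = \text{2nd moment of } X \\ \dfrac{x_1^3}{n} + \dfrac{x_2^3}{n} + \cdots + \dfrac{x_n^3}{n} \longrightarrow \mu'_3 = \text{3rd moment of } X \\ \quad\vdots \\ \dfrac{x_1^j}{n} + \dfrac{x_2^j}{n} + \cdots + \dfrac{x_n^j}{n} \longrightarrow \mu'_j = j\text{th moment of } X \end{cases}$$
Def 2.5: The first k sample moments are
$$M'_j = \frac{\sum_{i=1}^{n} x_i^j}{n}, \quad j = 1, 2, \ldots, k.$$
Def 2.6: Let $X_1, \ldots, X_n$ be a rs from $f(x; \theta_1, \ldots, \theta_k)$. The method of moments estimators (MMEs) $\hat{\theta}_1, \ldots, \hat{\theta}_k$ are the solutions of
$$\mu'_j = E(X^j) = \int x^j f(x; \hat{\theta}_1, \ldots, \hat{\theta}_k)\, dx = M'_j = \frac{\sum x_i^j}{n}, \quad j = 1, \ldots, k.$$
If X is discrete, then the equations become
$$\sum_{\forall x} x^j P(X = x; \hat{\theta}_1, \ldots, \hat{\theta}_k) = \frac{\sum x_i^j}{n}, \quad j = 1, \ldots, k.$$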
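Ex 2.8: Let $X \sim P(k; \theta) = \theta^{k}(1 - \theta)^{1-k}$, k = 0, 1. If $(x_1, \ldots, x_5) = (1, 0, 1, 1, 0)$, find $\hat{\theta}$.
Theoretical moment: $E(X) = 0\cdot P(0) + 1\cdot P(1) = 0\cdot(1 - \theta) + 1\cdot\theta = \theta$.
Sample moment: $\frac{\sum x_i}{n} = \frac{1 + 0 + 1 + 1 + 0}{5} = \frac{3}{5} = 0.6$.
$\therefore \hat{\theta} = 0.6 = \bar{x}$. #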
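Ex 2.9: Let the $X_i$'s be a RS from $EXP(1, \eta)$, where $f(x; \theta, \eta) = \frac{1}{\theta}e^{-(x-\eta)/\theta}$, $\eta < x$; with θ = 1, $f(x; 1, \eta) = e^{-(x-\eta)}$. Find the MME $\hat{\eta}$.
Sol:
$$E(X) = 1 + \eta = \frac{\sum x_i}{n} \Rightarrow \hat{\eta} = \bar{x} - 1.$$
Which one is better, this MME or the MLE $\hat{\eta} = x_{1:n}$ (Ex. 2.2)? See the next topic.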
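Ex 2.10: Let $X \sim N(\mu, \sigma^2)$, and let the $X_i$'s be a RS. Find the MMEs $\hat{\mu}$ and $\hat{\sigma}^2$.
Sol:
(1) $E(X) = \mu = \bar{x} \Rightarrow \hat{\mu} = \bar{x}$
(2) $E(X^2) = \sigma^2 + \mu^2 = \frac{\sum x_i^2}{n} \Rightarrow \hat{\sigma}^2 = \frac{\sum x_i^2}{n} - \bar{x}^2 = \frac{\sum (x_i - \bar{x})^2}{n} = \frac{n-1}{n}S^2$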
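Ex 2.11: Let $X \sim P(k; r, p) = \binom{k-1}{r-1}p^{r}(1-p)^{k-r}$, $k = r, r+1, \ldots$ (Negative Binomial Distribution). Find $\hat{r}$ and $\hat{p}$.
Write $X = \sum_{i=1}^{r}Y_i$, where the $Y_i$'s are independent geometric(p).
1) $E(X) = E\left(\sum_{i=1}^{r}Y_i\right) = \frac{r}{p} = \bar{x}$
2) $E(X^2) = Var(X) + [E(X)]^2$, where, since the $Y_i$'s are independent,
$$Var(X) = \sum_{i=1}^{r}Var(Y_i) = r\cdot\frac{1 - p}{p^2},$$
so
$$\frac{r(1-p)}{p^2} + \frac{r^2}{p^2} = \frac{\sum x_i^2}{n}.$$
Solve for r and p:
$$\Rightarrow \begin{cases} \hat{r} = \dfrac{\bar{x}^2}{\frac{n-1}{n}S^2 + \bar{x}} \\[2ex] \hat{p} = \dfrac{\bar{x}}{\frac{n-1}{n}S^2 + \bar{x}} \end{cases}$$
#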

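Criterion for Evaluating Estimators:

Def 2.7: An estimator T is said to be an unbiased estimator of $\tau(\theta)$ if $E(T) = \tau(\theta)$ for all $\theta \in \Omega$. Otherwise, we say that T is a biased estimator of $\tau(\theta)$.

Alternate Definition: A point estimator $\hat{\theta}$ is called an unbiased estimator of the parameter θ if $E(\hat{\theta}) = \theta$ for all possible values of θ. Otherwise $\hat{\theta}$ is said to be biased. Furthermore, the bias of $\hat{\theta}$ is given by $B = E(\hat{\theta}) - \theta$. Note: estimator bias is a property of the estimator itself and can occur even with a perfect random sample (e.g., $\hat{\sigma}^2$ in Ex. 2.6); it is different from sampling bias, which occurs when the sample does not accurately represent the population from which it is drawn.

Thm 2.3: $E(\bar{X}) = \mu$ and $E(S^2) = \sigma^2$.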

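Ex. 2.12: Let the $X_i$'s be a rs from $U(0, \theta)$. Find an unbiased estimator for θ based on $X_{1:n}$.
Sol: From the review lecture, we know
$$F_{X_{1:n}}(x) = 1 - [1 - F(x)]^n = 1 - \left(1 - \frac{x}{\theta}\right)^n, \quad 0 < x < \theta,$$
$$\therefore f_{X_{1:n}}(x) = \frac{dF_{X_{1:n}}(x)}{dx} = \frac{n}{\theta}\left(1 - \frac{x}{\theta}\right)^{n-1}.$$
$$\begin{aligned} \therefore E(X_{1:n}) &= \int_0^{\theta} x f_{X_{1:n}}(x)\, dx = \int_0^{\theta}\frac{nx}{\theta}\left(1 - \frac{x}{\theta}\right)^{n-1}dx = -\int_0^{\theta} x\, d\left(1 - \frac{x}{\theta}\right)^{n} \\ &= -x\left(1 - \frac{x}{\theta}\right)^{n}\Big|_0^{\theta} + \int_0^{\theta}\left(1 - \frac{x}{\theta}\right)^{n}dx \quad (\text{integration by parts}) \\ &= 0 - \frac{\theta}{n+1}\left(1 - \frac{x}{\theta}\right)^{n+1}\Big|_0^{\theta} = \frac{\theta}{n+1}. \end{aligned}$$
If $T = (n+1)X_{1:n}$, then
$$E(T) = E[(n+1)X_{1:n}] = (n+1)\cdot\frac{\theta}{n+1} = \theta.$$
∴ T is an unbiased estimator for θ. #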

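Ex 2.13: Let the $X_i$'s be a RS from $EXP(\theta)$, where θ is the mean. Find an unbiased estimator for the rate $1/\theta$.
Sol: The MLE for θ is $\bar{x}$, which is unbiased for θ:
$$E(\bar{x}) = E\left(\frac{\sum x_i}{n}\right) = \frac{\sum E(x_i)}{n} = \frac{\sum\theta}{n} = \theta.$$
However, $E(T) = \theta$ does not imply $E(g(T)) = g(\theta)$ in general (e.g., $E(e^{T}) \ne e^{\theta}$).
Q: Is $\frac{1}{\bar{x}}$ unbiased for $\frac{1}{\theta}$? What is $E\left(\frac{1}{\bar{x}}\right)$?
Since $x_i \sim EXP(\theta) = \Gamma(1, \theta)$:
$$\frac{2x_i}{\theta} \sim \chi^2(2) \quad \text{(Corollary 0.5)}, \qquad Y = \frac{2n\bar{x}}{\theta} = \sum\frac{2x_i}{\theta} \sim \chi^2(2n) \quad \text{(Corollary 0.6)}.$$
By Theorem 0.10 with $Y \sim \Gamma(n, 2)$ and $r = -1$:
$$E\left(\frac{1}{Y}\right) = \frac{\theta}{2n}E\left(\frac{1}{\bar{x}}\right) = \frac{2^{-1}\Gamma(n-1)}{\Gamma(n)} = \frac{1}{2(n-1)}$$
$$\therefore E\left(\frac{1}{\bar{x}}\right) = \frac{2n}{\theta}\cdot\frac{1}{2(n-1)} = \frac{n}{n-1}\cdot\frac{1}{\theta} \Rightarrow E\left(\frac{n-1}{n}\cdot\frac{1}{\bar{x}}\right) = \frac{1}{\theta}.$$
$\therefore T = \frac{n-1}{n}\cdot\frac{1}{\bar{x}}$ is unbiased for $\frac{1}{\theta}$ (n ≥ 2). #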

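***Second Criterion ⇒ Variance of the Estimator:
Ex 2.14: Let the $X_i$'s be a RS from $EXP(\theta)$. We know $\hat{\theta}_1 = \bar{x}$ is unbiased for θ. Find another estimator $\hat{\theta}_2$ that is unbiased for θ.
Sol: Try $X_{1:n}$:
$$P(X_{1:n} > x) = P(X_1 > x, \ldots, X_n > x) = \left(e^{-x/\theta}\right)^n = e^{-nx/\theta} \Rightarrow X_{1:n} = \min(X_1, \ldots, X_n) \sim EXP\left(\frac{\theta}{n}\right)$$
$$\therefore E(X_{1:n}) = \frac{\theta}{n} \Rightarrow \hat{\theta}_2 = nX_{1:n} \text{ is unbiased for } \theta.$$
Q: Which one is better? Is $Var(\hat{\theta}_1)$ greater than or less than $Var(\hat{\theta}_2)$?
$$Var(\hat{\theta}_1) = Var(\bar{x}) = \frac{Var(x)}{n} = \frac{\theta^2}{n}, \qquad Var(\hat{\theta}_2) = Var(nX_{1:n}) = n^2 Var(X_{1:n}) = n^2\cdot\frac{\theta^2}{n^2} = \theta^2$$
$\Rightarrow Var(\hat{\theta}_1) < Var(\hat{\theta}_2)$ (for n > 1) ∴ $\hat{\theta}_1$ is better. #
Thm: If $S^2$ is the variance of a random sample from an infinite population with finite variance $\sigma^2$, then $S^2$ is an unbiased estimator for $\sigma^2$.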
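Ex 2.15: Let the $X_i$'s be a RS from $UNIF(0, \theta)$. Consider $\hat{\theta}_1 = \frac{2}{n}\sum x_i$ and $\hat{\theta}_2 = \frac{n+1}{n}X_{n:n}$.
(1) Are they unbiased?
(2) If so, is $Var(\hat{\theta}_1)$ greater or less than $Var(\hat{\theta}_2)$?
Sol:
1) $E(\hat{\theta}_1) = \frac{2}{n}\sum E(x_i) = \frac{2}{n}\sum\frac{\theta}{2} = \theta$.
Note: since $X_i \sim U(0, \theta)$ has $F(x) = x/\theta$, $0 < x < \theta$,
$$F_{X_{n:n}}(x) = P(X_{n:n} \le x) = P(\max(X_1, \ldots, X_n) \le x) = P(X_1 \le x)\cdots P(X_n \le x) = F(x)\cdots F(x) = \left(\frac{x}{\theta}\right)^n \Rightarrow f_{X_{n:n}}(x) = \frac{nx^{n-1}}{\theta^n}.$$
$$E(\hat{\theta}_2) = \frac{n+1}{n}E(X_{n:n}) = \frac{n+1}{n}\int_0^{\theta} x\cdot\frac{nx^{n-1}}{\theta^n}dx = \frac{n+1}{\theta^n}\int_0^{\theta}x^n dx = \frac{n+1}{\theta^n}\cdot\frac{1}{n+1}x^{n+1}\Big|_0^{\theta} = \theta.$$
2) $Var(\hat{\theta}_1) = Var\left(\frac{2}{n}\sum x_i\right) = \frac{4}{n^2}\sum Var(x_i) = \frac{4}{n^2}\cdot n\cdot\frac{\theta^2}{12} = \frac{\theta^2}{3n}$.
$$Var(\hat{\theta}_2) = \left(\frac{n+1}{n}\right)^2 Var(X_{n:n}) = \left(\frac{n+1}{n}\right)^2\left[E(X_{n:n}^2) - [E(X_{n:n})]^2\right],$$
$$E(X_{n:n}^2) = \int_0^{\theta}x^2 f_{X_{n:n}}(x)\,dx = \frac{n}{\theta^n}\int_0^{\theta}x^{n+1}dx = \frac{n}{\theta^n}\cdot\frac{1}{n+2}x^{n+2}\Big|_0^{\theta} = \frac{n}{n+2}\theta^2,$$
$$\therefore Var(\hat{\theta}_2) = \left(\frac{n+1}{n}\right)^2\left[\frac{n}{n+2}\theta^2 - \left(\frac{n}{n+1}\theta\right)^2\right] = \frac{(n+1)^2}{n(n+2)}\theta^2 - \theta^2 = \frac{\theta^2}{n(n+2)}.$$
$$\therefore \frac{Var(\hat{\theta}_1)}{Var(\hat{\theta}_2)} = \frac{\theta^2/(3n)}{\theta^2/(n(n+2))} = \frac{n+2}{3} > 1 \quad (n > 1).$$
Hence, $\hat{\theta}_2$ is better. #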
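A Monte Carlo comparison of the two estimators in Ex 2.15: both sample means should sit near θ, and the sample variances should be near $\theta^2/(3n)$ and $\theta^2/(n(n+2))$, with ratio about $(n+2)/3$. The values θ = 10, n = 5, the repetition count, and the seed are arbitrary assumptions.

```python
import random
import statistics

random.seed(8)
theta, n, reps = 10.0, 5, 40_000
est1, est2 = [], []
for _ in range(reps):
    xs = [random.uniform(0, theta) for _ in range(n)]
    est1.append(2 * sum(xs) / n)            # theta_hat_1 = (2/n) * sum x_i
    est2.append((n + 1) / n * max(xs))      # theta_hat_2 = (n+1)/n * x_{n:n}

mean1, mean2 = statistics.fmean(est1), statistics.fmean(est2)
var1 = sum((e - mean1) ** 2 for e in est1) / (reps - 1)
var2 = sum((e - mean2) ** 2 for e in est2) / (reps - 1)

print(round(mean1, 2), round(mean2, 2))                 # both close to theta
print(round(var1, 3), theta**2 / (3 * n))               # theta^2 / (3n)
print(round(var2, 3), theta**2 / (n * (n + 2)))         # theta^2 / (n(n+2))
print(round(var1 / var2, 2), (n + 2) / 3)               # ratio (n+2)/3
```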

Ex 2.16: During World War II, a very simple statistical procedure was developed for estimating German war production. Every piece of German equipment (V‐2 rockets, tanks, or even automobile tires) was stamped with a serial number that indicated the order in which it was manufactured. If the total number of, say, Mark I tanks produced by a certain date was N, each would bear one of the integers 1 to N.
As the war progressed, some of these numbers became known to the Allies, either by the direct capture of a piece of equipment or from records seized when a command post was overrun.
The problem was to estimate N using the sample of "captured" serial numbers, ordered as
$$1\le x_{1:n}<x_{2:n}<\cdots<x_{n:n}\le N$$
Q: How to estimate N using $x_1,\dots,x_n$?
Sol: The 1st method assumes every set of n serial numbers is equally likely:
$$P(X_{1:n}=x_{1:n},X_{2:n}=x_{2:n},\dots,X_{n:n}=x_{n:n})=\frac{1}{\binom{N}{n}}$$
N is estimated by adding the average gap to the maximum order statistic:
$$\hat N_1=x_{n:n}+\frac{1}{n-1}\sum_{i=2}^{n}\left(x_{i:n}-x_{i-1:n}-1\right)$$
E.g., for the captured numbers 2, 6, 8:
$$\hat N_1=8+\frac{1}{2}\left[(6-2-1)+(8-6-1)\right]=8+2=10$$
The 2nd method uses a bias-corrected version of the discrete MLE $x_{n:n}$:
$$\hat N_2=\frac{n+1}{n}x_{n:n}-1$$
It can be shown that both $\hat N_1$ and $\hat N_2$ are unbiased for N. #
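The two estimators of Ex 2.16 are simple enough to compute directly. This is a minimal sketch; the helper name `tank_estimates` is hypothetical and the input is assumed to be a list of at least two distinct serial numbers.

```python
def tank_estimates(serials):
    """Compute the two serial-number estimators of N from Ex 2.16.

    serials: distinct captured serial numbers (any order), at least 2 of them.
    Returns (N1, N2) where
      N1 = max + average gap between consecutive captured numbers
      N2 = (n+1)/n * max - 1
    """
    xs = sorted(serials)
    n = len(xs)
    gaps = [xs[i] - xs[i - 1] - 1 for i in range(1, n)]  # numbers skipped between captures
    n1 = xs[-1] + sum(gaps) / (n - 1)
    n2 = (n + 1) / n * xs[-1] - 1
    return n1, n2

print(tank_estimates([2, 6, 8]))  # approximately (10.0, 9.67)
```

For the worked example 2, 6, 8 this reproduces $\hat N_1=10$, while $\hat N_2=\frac{4}{3}\cdot8-1=\frac{29}{3}$.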

Uniformly Minimum Variance Unbiased Estimator: Let $x_1,\dots,x_n$ be a RS of size n from $f(x;\theta)$. An estimator $T^*$ of $\tau(\theta)$ is called a uniformly minimum variance unbiased estimator (UMVUE) of $\tau(\theta)$ if:
1) $T^*$ is unbiased for $\tau(\theta)$,
and
2) for any other unbiased estimator T of $\tau(\theta)$, $Var(T^*)\le Var(T)$, $\forall\theta\in\Omega$.
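Thm 2.4: Cramer-Rao Lower Bound (CRLB): Let $x_1,\dots,x_n$ be a RS from $f(x;\theta)$. If $f(x;\theta)$ is differentiable and the range of x does not depend on θ, then for any unbiased estimator T of $\tau(\theta)$ we have:
Cramer-Rao Lower Bound (CRLB):
$$Var(T)\ge\frac{[\tau'(\theta)]^2}{E\left[\left(\frac{\partial}{\partial\theta}\ln f(x_1,\dots,x_n;\theta)\right)^2\right]}$$
or, for a RS,
$$Var(T)\ge\frac{[\tau'(\theta)]^2}{n\,E\left[\left(\frac{\partial}{\partial\theta}\ln f(x;\theta)\right)^2\right]}$$
Note: $E\left[\left(\frac{\partial}{\partial\theta}\ln f(x_1,\dots,x_n;\theta)\right)^2\right]$ is the Fisher information.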
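Proof of CRLB theorem: Define $U(x_1,\dots,x_n;\theta)\equiv U\equiv\frac{\partial}{\partial\theta}\ln f(x_1,\dots,x_n;\theta)$
$$\Rightarrow U=\frac{1}{f(x_1,\dots,x_n;\theta)}\cdot\frac{\partial}{\partial\theta}f(x_1,\dots,x_n;\theta)$$
E(U) = 0 because:
$$E(U)=\int\cdots\int U(x_1,\dots,x_n;\theta)\,f(x_1,\dots,x_n;\theta)\,dx_1\cdots dx_n=\int\cdots\int\frac{\partial}{\partial\theta}f(x_1,\dots,x_n;\theta)\,dx_1\cdots dx_n$$
$$=\frac{\partial}{\partial\theta}\int\cdots\int f(x_1,\dots,x_n;\theta)\,dx_1\cdots dx_n=\frac{\partial}{\partial\theta}(1)=0$$
(Moving $\frac{\partial}{\partial\theta}$ outside the integral is where the regularity conditions, including a range of x that does not depend on θ, are needed.)
If $T=t(x_1,\dots,x_n)$ is unbiased for $\tau(\theta)$, then
$$\tau(\theta)=E(T)=\int\cdots\int t(x_1,\dots,x_n)\,f(x_1,\dots,x_n;\theta)\,dx_1\cdots dx_n$$
$$\tau'(\theta)=\frac{\partial}{\partial\theta}\int\cdots\int t(x_1,\dots,x_n)\,f(x_1,\dots,x_n;\theta)\,dx_1\cdots dx_n=\int\cdots\int t(x_1,\dots,x_n)\,\frac{\partial}{\partial\theta}f(x_1,\dots,x_n;\theta)\,dx_1\cdots dx_n$$
$$=\int\cdots\int t(x_1,\dots,x_n)\,U\,f(x_1,\dots,x_n;\theta)\,dx_1\cdots dx_n=E(TU)$$
$$\therefore Cov(T,U)=E(TU)-E(T)E(U)=E(TU)=\tau'(\theta)$$
Since the correlation ρ of T and U satisfies $-1\le\rho\le1$:
$$\rho^2=\frac{Cov^2(T,U)}{Var(T)\,Var(U)}\le1\;\Rightarrow\;Var(T)\ge\frac{Cov^2(T,U)}{Var(U)}=\frac{[\tau'(\theta)]^2}{E(U^2)-[E(U)]^2}=\frac{[\tau'(\theta)]^2}{E(U^2)}=\frac{[\tau'(\theta)]^2}{E\left[\left(\frac{\partial}{\partial\theta}\ln f(x_1,\dots,x_n;\theta)\right)^2\right]}$$
If $x_1,\dots,x_n$ is a RS, then $f(x_1,\dots,x_n;\theta)=\prod_{i=1}^{n}f(x_i;\theta)$
$$\therefore U(x_1,\dots,x_n;\theta)=\frac{\partial}{\partial\theta}\sum_{i=1}^{n}\ln f(x_i;\theta)=\sum_{i=1}^{n}\frac{\partial}{\partial\theta}\ln f(x_i;\theta)$$
Also,
$$E(U^2)\overset{E(U)=0}{=}Var(U)=Var\left(\sum_{i=1}^{n}\frac{\partial}{\partial\theta}\ln f(x_i;\theta)\right)=n\,Var\left(\frac{\partial}{\partial\theta}\ln f(x;\theta)\right)=n\,E\left[\left(\frac{\partial}{\partial\theta}\ln f(x;\theta)\right)^2\right]$$
$$\therefore Var(T)\ge\frac{[\tau'(\theta)]^2}{n\,E\left[\left(\frac{\partial}{\partial\theta}\ln f(x;\theta)\right)^2\right]}\ \#$$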
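Ex 2.17: Let RS $x_i$'s $\sim N(\mu,\sigma^2)$. Find CRLB(μ).
Sol:
1) $\tau(\mu)=\mu\Rightarrow\tau'(\mu)=1$
2) $f(x;\mu,\sigma^2)=\frac{1}{\sqrt{2\pi\sigma^2}}e^{-\frac{(x-\mu)^2}{2\sigma^2}}$
$$\ln f(x;\mu,\sigma^2)=\ln\frac{1}{\sqrt{2\pi\sigma^2}}-\frac{(x-\mu)^2}{2\sigma^2}$$
$$\frac{\partial}{\partial\mu}\ln f(x;\mu,\sigma^2)=0+\frac{x-\mu}{\sigma^2}=\frac{x-\mu}{\sigma^2}$$
$$E\left[\left(\frac{x-\mu}{\sigma^2}\right)^2\right]=\frac{1}{\sigma^2}E\left[\left(\frac{x-\mu}{\sigma}\right)^2\right]=\frac{1}{\sigma^2}E(Z^2)=\frac{1}{\sigma^2}\left[Var(Z)+(EZ)^2\right]=\frac{1}{\sigma^2},\quad Z=\frac{x-\mu}{\sigma}\sim N(0,1)$$
3) $\therefore CRLB(\mu):\ Var(T)\ge\frac{1}{n\cdot\frac{1}{\sigma^2}}=\frac{\sigma^2}{n}$
$\therefore Var(\bar{x})=\frac{\sigma^2}{n}=CRLB(\mu)$, so $\bar{x}$ is the UMVUE of μ. #
**$$CRLB(\mu^2)=\frac{[\tau'(\mu)]^2}{n\,E\left[\left(\frac{\partial}{\partial\mu}\ln f(x;\mu,\sigma^2)\right)^2\right]}\overset{\tau(\mu)=\mu^2}{=}\frac{(2\mu)^2}{n\,E\left[\left(\frac{\partial}{\partial\mu}\ln f(x;\mu,\sigma^2)\right)^2\right]}=(2\mu)^2\,CRLB(\mu)=\frac{4\mu^2\sigma^2}{n}$$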
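Ex 2.18: RS $x_i$'s $\sim BIN(1,p)$. Find CRLB(p).
Sol: See Quiz. $CRLB(p)=\frac{p(1-p)}{n}$
Q: What estimator can achieve this bound? $Var(\bar{x})=\frac{p(1-p)}{n}$, so $\bar{x}$ does.
Note:
a) "≥" becomes "=" iff $\rho^2=1$ ⇒ the estimator is a linear function of U: $T=aU+b$, where $U=\frac{\partial}{\partial\theta}\ln f(x_1,\dots,x_n;\theta)$.
E.g., in Ex 2.18, $U=\frac{\sum x_i-np}{p(1-p)}$, so $T=\bar{x}=aU+b$ (U is a linear function of the $x_i$'s).
b) Suppose:
If $T_1$ is unbiased for $\tau(\theta)$ and $T_1$ achieves $CRLB(\tau(\theta))$ ⇒ $T_1=a_1U+b_1$
If $T_2$ is unbiased for $g(\tau(\theta))$ and $T_2$ achieves $CRLB(g(\tau(\theta)))$ ⇒ $T_2=a_2U+b_2$
$\Rightarrow T_2=aT_1+b$
$$\Rightarrow g(\tau(\theta))=E(T_2)=a\,E(T_1)+b=a\,\tau(\theta)+b$$
$\therefore$ (Thm 2.5) $g(\tau(\theta))=a\,\tau(\theta)+b$
E.g., $x_i$'s $\sim BIN(1,p)$: we know $T=\bar{x}$ achieves CRLB(p).
$\therefore$ Only $ap+b$ admits an unbiased estimator with variance equal to the lower bound. For instance,
$$\tau(p)=p^2\Rightarrow CRLB(p^2)=\frac{(2p)^2\,p(1-p)}{n}=(2p)^2\,CRLB(p)$$
and no unbiased estimator of $p^2$ attains it.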
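Ex 2.19: Let RS $x_i$'s $\sim f(x;\theta)=\frac{2x}{\theta^2},\ 0<x<\theta$.
a) Find an unbiased estimator T for θ.
b) What is Var(T)? Compare Var(T) with CRLB(θ).
Sol:
a) Try $\bar{x}$ and see how $E(\bar{x})$ differs from θ.
$$E(x)=\int_0^\theta x\cdot f(x;\theta)\,dx=\int_0^\theta x\cdot\frac{2x}{\theta^2}dx=\frac{2x^3}{3\theta^2}\Big|_0^\theta=\frac{2}{3}\theta$$
Try $T=\frac{3}{2}\bar{x}$:
$$E(T)=E\left(\frac{3}{2}\bar{x}\right)=\frac{3}{2}E\left(\frac{\sum x_i}{n}\right)=\frac{3}{2}\cdot\frac{\sum E(x_i)}{n}=\frac{3}{2}\cdot\frac{n}{n}\cdot\frac{2}{3}\theta=\theta$$
b) $Var(T)=Var\left(\frac{3}{2}\bar{x}\right)=\frac{9}{4}Var(\bar{x})=\frac{9}{4n}Var(x)$
$$Var(x)=E(x^2)-[E(x)]^2=\int_0^\theta x^2\cdot\frac{2x}{\theta^2}dx-\frac{4}{9}\theta^2=\frac{2x^4}{4\theta^2}\Big|_0^\theta-\frac{4}{9}\theta^2=\frac{\theta^2}{2}-\frac{4}{9}\theta^2=\frac{1}{18}\theta^2$$
$$\therefore Var(T)=\frac{9}{4n}\cdot\frac{1}{18}\theta^2=\frac{\theta^2}{8n}$$
Compute CRLB(θ):
$$\frac{\partial}{\partial\theta}\ln f(x;\theta)=\frac{\partial}{\partial\theta}\ln\frac{2x}{\theta^2}=\frac{\partial}{\partial\theta}\left[\ln2x-2\ln\theta\right]=-\frac{2}{\theta}$$
$$\therefore CRLB(\theta)=\frac{1}{n\,E\left[\left(\frac{\partial}{\partial\theta}\ln f(x;\theta)\right)^2\right]}=\frac{1}{n\,E\left[\frac{4}{\theta^2}\right]}=\frac{\theta^2}{4n}$$
$$Var(T)=\frac{\theta^2}{8n}<\frac{\theta^2}{4n}=CRLB(\theta)$$
The "bound" is violated because the range $0<x<\theta$ depends on θ, so the regularity condition of Thm 2.4 fails. #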
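A quick simulation makes the point of Ex 2.19 concrete: the unbiased estimator has variance θ²/(8n), below the formal CRLB θ²/(4n). This is a minimal sketch; the helper `simulate_T`, the inverse-CDF sampler, and the values θ = 2, n = 5 are illustrative assumptions, not from the notes.

```python
import math
import random
import statistics

def simulate_T(theta, n, reps=20000, seed=2):
    """Monte Carlo sketch for Ex 2.19 (illustration only).

    Samples from f(x; theta) = 2x/theta^2 on (0, theta) by inverse CDF:
    F(x) = (x/theta)^2, so x = theta * sqrt(U) with U ~ UNIF(0, 1).
    Returns the simulated mean and variance of T = (3/2) * xbar.
    """
    rng = random.Random(seed)
    ts = []
    for _ in range(reps):
        xs = [theta * math.sqrt(rng.random()) for _ in range(n)]
        ts.append(1.5 * statistics.fmean(xs))
    return statistics.fmean(ts), statistics.pvariance(ts)

theta, n = 2.0, 5
mean_T, var_T = simulate_T(theta, n)
print(mean_T)                                          # close to theta = 2
print(var_T, theta**2 / (8 * n), theta**2 / (4 * n))   # simulated, exact, formal "CRLB"
```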

Thm 2.5: If an unbiased estimator for $\tau(\theta)$ exists whose variance achieves the CRLB, then only a linear function of $\tau(\theta)$ will admit an unbiased estimator whose variance achieves the corresponding CRLB.
Def 2.9: The relative efficiency of an unbiased estimator T of $\tau(\theta)$ to another unbiased estimator $T^*$ of $\tau(\theta)$ is:
$$re(T,T^*)=\frac{Var(T^*)}{Var(T)}$$
$T^*$ is efficient if $re(T,T^*)\le1$ for all unbiased T and all $\theta\in\Omega$.
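Ex 2.19: From Ex 2.14 ($x_i$'s $\sim EXP(\theta)$), we have $\hat\theta_1=\bar{x}$ and $\hat\theta_2=n\,x_{1:n}$. Both are unbiased for θ. We also know that $\hat\theta_1$ is the UMVUE. What is $re(\hat\theta_2,\hat\theta_1)$?
Sol:
$$re(\hat\theta_2,\hat\theta_1)=\frac{Var(\hat\theta_1)}{Var(\hat\theta_2)}=\frac{Var(\bar{x})}{Var(n\,x_{1:n})}=\frac{\theta^2/n}{n^2\,Var(x_{1:n})}\overset{x_{1:n}\sim EXP(\theta/n)}{=}\frac{\theta^2/n}{n^2\cdot\theta^2/n^2}=\frac{1}{n}$$
$\therefore\hat\theta_2$ is a very poor estimator for θ when n is large. #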
Def 2.6: Mean Squared Error and Bias: If T is an estimator of $\tau(\theta)$, then the bias is $b(T)=E(T)-\tau(\theta)$ and the Mean Squared Error (MSE) is:
$$MSE(T)=E\left[(T-\tau(\theta))^2\right]$$
Thm 2.6: $MSE(T)=Var(T)+[b(T)]^2$
Proof: Homework 3 Q.1
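Ex 2.20: RS $x_i$'s $\sim EXP(1,\eta)$, where η is a location parameter. We know that
MME: $\hat\eta_1=\bar{x}-1$
MLE: $\hat\eta_2=x_{1:n}$
Compare the MSEs.
Sol: Need to find $E(\hat\eta_1)$, $E(\hat\eta_2)$, $Var(\hat\eta_1)$, $Var(\hat\eta_2)$.
$E(\hat\eta_1)=E(\bar{x}-1)=E(\bar{x})-1=(1+\eta)-1=\eta\Rightarrow$ unbiased
$E(\hat\eta_2)=E(x_{1:n})=E(x_{1:n}-\eta)+\eta=\frac{1}{n}+\eta\Rightarrow$ biased
Note: $x_{1:n}-\eta=\min(x_1,\dots,x_n)-\eta=\min(x_1-\eta,\dots,x_n-\eta)=\min(Y_1,\dots,Y_n)$, where $Y_i=x_i-\eta\sim EXP(1)$.
$\therefore Z\equiv x_{1:n}-\eta\sim EXP\left(\frac{1}{n}\right)$
$Var(\hat\eta_1)=Var(\bar{x}-1)=Var(\bar{x})=\frac{Var(x)}{n}=\frac{Var(Y)}{n}=\frac{1}{n}$, where $Y\sim EXP(1)$
$Var(\hat\eta_2)=Var(x_{1:n})=Var(x_{1:n}-\eta)=Var(Z)=\frac{1}{n^2}$, where Z is defined in the Note
$$MSE(\hat\eta_1)=Var(\hat\eta_1)+[E(\hat\eta_1)-\eta]^2=\frac{1}{n}+0=\frac{1}{n}$$
$$MSE(\hat\eta_2)=Var(\hat\eta_2)+[E(\hat\eta_2)-\eta]^2=\frac{1}{n^2}+\left(\frac{1}{n}+\eta-\eta\right)^2=\frac{2}{n^2}$$
$\therefore MSE(\hat\eta_2)=\frac{2}{n^2}\le\frac{1}{n}=MSE(\hat\eta_1)$ for $n\ge2$ (strictly for n > 2) $\Rightarrow\hat\eta_2$ is better. #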
Bayes and Minimax Estimators
Def 2.11: Loss Function: If T is an estimator of $\tau(\theta)$, then the loss function is any real-valued function $L(t;\theta)$ such that:
$L(t;\theta)\ge0$ for every t and θ
and
$L(t;\theta)=0$ when $t=\tau(\theta)$
E.g.,
Absolute Error: $L(T;\theta)=|T-\tau(\theta)|\Rightarrow L(t;\theta)=|t-\tau(\theta)|$ when the $x_i$'s are observed
Squared Error: $L(T;\theta)=[T-\tau(\theta)]^2$
Def 2.12: Risk Function: The risk function is defined to be the expected loss:
$$R_T(\theta)=E[L(T;\theta)]$$
Def 2.13: Admissible Estimator: An estimator $T_1$ is a better estimator than $T_2$ iff
$R_{T_1}(\theta)\le R_{T_2}(\theta)\ \forall\theta\in\Omega$
and
$R_{T_1}(\theta)<R_{T_2}(\theta)$ for at least one θ.
An estimator is admissible iff there is no better estimator.
** It is not called the "best" estimator because it is the "best" only for a particular loss function $L(t;\theta)$.

Def 2.14: Minimax Estimator:
An estimator $T_1$ is a minimax estimator if
$$\max_\theta R_{T_1}(\theta)\le\max_\theta R_T(\theta)$$
for every estimator T.
[Table: risk values $R_T(\theta)$ of the estimators $T_1, T_2, \dots$ at $\theta_1, \theta_2, \theta_3, \dots$]
Def 2.15: Bayes Risk:
For a RS from $f(x;\theta)$, the Bayes risk of an estimator T relative to a risk function $R_T(\theta)$ and pdf $p(\theta)$ is the average risk with respect to $p(\theta)$:
$$A_T=E_\theta[R_T(\theta)]=\int_\Omega R_T(\theta)\,p(\theta)\,d\theta$$
⇒ The Bayes estimator is the one that gives the smallest (Bayes) risk.
Bayes Estimator: For a RS from $f(x;\theta)$, the Bayes estimator $T^*$ relative to the risk function $R_T(\theta)$ and pdf $p(\theta)$ is the estimator with the minimum expected risk:
$$E_\theta[R_{T^*}(\theta)]\le E_\theta[R_T(\theta)]\ \text{for every estimator }T,\quad\text{where }R_T(\theta)=E_{X|\theta}[L(T;\theta)]$$
Posterior Distribution: The conditional density of θ given the sample observation $X=(x_1,\dots,x_n)$ is called the posterior density (or posterior pdf) and is given by:
$$f_{\theta|x}(\theta)=\frac{f(x_1,\dots,x_n|\theta)\,p(\theta)}{\int f(x_1,\dots,x_n|\theta)\,p(\theta)\,d\theta}$$
⇒ The posterior distribution combines the prior information about θ with the updated sample information X.

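Ex 2.12: RS $x_i$'s $\sim N(\mu,2)$ (data variance 2), and the prior $p(\mu)$ is $N(0,\sigma^2)$, where the prior variance $\sigma^2$ is treated as a fixed hyperparameter. Find $f_{\mu|x}(\mu)$.
Sol:
$$f_{\mu|x}(\mu)=\frac{f(x_1,\dots,x_n|\mu)\,p(\mu)}{\int f(x_1,\dots,x_n|\mu)\,p(\mu)\,d\mu}\qquad\text{(the denominator is not a function of }\mu)$$
$$f(x_1,\dots,x_n|\mu)\,p(\mu)=\prod_{i=1}^{n}\frac{1}{\sqrt{2\pi\cdot2}}e^{-\frac{(x_i-\mu)^2}{2\cdot2}}\cdot\frac{1}{\sqrt{2\pi\sigma^2}}e^{-\frac{\mu^2}{2\sigma^2}}$$
$$=\frac{1}{(2\pi\cdot2)^{n/2}\sqrt{2\pi\sigma^2}}\,e^{-\frac{\sum(x_i-\mu)^2}{4}-\frac{\mu^2}{2\sigma^2}}\quad\Rightarrow\ \text{expand }\sum(x_i-\mu)^2=\sum(x_i-\bar{x})^2+n(\bar{x}-\mu)^2$$
$$=c_1\,e^{-\frac{n(\bar{x}-\mu)^2}{4}-\frac{\mu^2}{2\sigma^2}},\qquad\text{Note: }c_1=\frac{e^{-\sum(x_i-\bar{x})^2/4}}{(2\pi\cdot2)^{n/2}\sqrt{2\pi\sigma^2}}$$
$$=c_1\,e^{-\frac{n\bar{x}^2}{4}}\,e^{-\frac{1}{2}\left[\left(\frac{n}{2}+\frac{1}{\sigma^2}\right)\mu^2-n\bar{x}\mu\right]}=c_2\,e^{-\frac{(\mu-\mu^*)^2}{2\sigma^{*2}}}$$
Note: $\mu^*=\frac{n\bar{x}/2}{\frac{n}{2}+\frac{1}{\sigma^2}}$ and $\sigma^{*2}=\frac{1}{\frac{n}{2}+\frac{1}{\sigma^2}}$, and $c_2$ collects every factor that does not involve μ.
$$\therefore f_{\mu|x}(\mu)=\frac{c_2}{c_3}\,e^{-\frac{(\mu-\mu^*)^2}{2\sigma^{*2}}},\quad c_3=\int f(x_1,\dots,x_n|\mu)\,p(\mu)\,d\mu$$
Since $\int f_{\mu|x}(\mu)\,d\mu=1$:
$$\frac{c_2}{c_3}\int e^{-\frac{(\mu-\mu^*)^2}{2\sigma^{*2}}}d\mu=1\;\Rightarrow\;\frac{c_2}{c_3}\sqrt{2\pi\sigma^{*2}}=1\;\Rightarrow\;\frac{c_2}{c_3}=\frac{1}{\sqrt{2\pi\sigma^{*2}}}$$
$\therefore f_{\mu|x}(\mu)=\frac{1}{\sqrt{2\pi\sigma^{*2}}}e^{-\frac{(\mu-\mu^*)^2}{2\sigma^{*2}}}$, i.e., $\mu|x\sim N(\mu^*,\sigma^{*2})$. #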
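The closed-form posterior of Ex 2.12 can be checked numerically by normalizing likelihood × prior on a fine grid. This is a minimal sketch: the helper names, the prior variance value 1.0, the simulated data (true mean 1.5, seed 0), and the grid limits are all illustrative assumptions; the notes keep σ² symbolic.

```python
import math
import random

def normal_posterior(xs, data_var=2.0, prior_var=1.0):
    """Closed form from Ex 2.12: x_i ~ N(mu, data_var), prior mu ~ N(0, prior_var).
    Returns (mu*, sigma*^2)."""
    n = len(xs)
    xbar = sum(xs) / n
    precision = n / data_var + 1 / prior_var        # 1 / sigma*^2 = n/2 + 1/sigma^2
    mu_star = (n * xbar / data_var) / precision
    return mu_star, 1 / precision

def grid_posterior_mean(xs, data_var=2.0, prior_var=1.0, lo=-10.0, hi=10.0, m=20001):
    """Brute-force check: normalize likelihood * prior on a grid and take its mean."""
    step = (hi - lo) / (m - 1)
    grid = [lo + k * step for k in range(m)]
    logw = [-sum((x - mu) ** 2 for x in xs) / (2 * data_var) - mu ** 2 / (2 * prior_var)
            for mu in grid]
    top = max(logw)                                 # subtract the max to avoid underflow
    w = [math.exp(v - top) for v in logw]
    return sum(mu * wk for mu, wk in zip(grid, w)) / sum(w)

rng = random.Random(0)
xs = [rng.gauss(1.5, math.sqrt(2.0)) for _ in range(10)]
print(normal_posterior(xs))      # (mu*, sigma*^2)
print(grid_posterior_mean(xs))   # should agree with mu* to many decimals
```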

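**** There is no need to compute two separate expectations to find the Bayes estimator:
$$E_\theta[R_T(\theta)]=\int R_T(\theta)\,p(\theta)\,d\theta=\int E_{X|\theta}[L(T;\theta)]\,p(\theta)\,d\theta$$
$$=\int\!\!\int L(T;\theta)\,f(x|\theta)\,p(\theta)\,dx\,d\theta=\int\!\!\int L(T;\theta)\,f(x,\theta)\,dx\,d\theta$$
$$=\int\!\!\int L(T;\theta)\,f(\theta|x)\,f(x)\,dx\,d\theta=\int\left[\int L(T;\theta)\,f(\theta|x)\,d\theta\right]f(x)\,dx=\int E_{\theta|x}[L(T;\theta)]\,f(x)\,dx$$
Since $f(x)\ge0$, minimizing the inner posterior expected loss for each x minimizes the Bayes risk.
Thm 2.7: If $x_1,\dots,x_n$ denotes a RS from $f(x|\theta)$, then the Bayes estimator is the estimator that minimizes the expected loss relative to the posterior distribution:
$$E_{\theta|x}[L(T;\theta)]$$
Thm 2.8: The Bayes estimator T of $\tau(\theta)$ under the squared error loss function $L(T;\theta)=[T-\tau(\theta)]^2$ is the conditional mean of $\tau(\theta)$ relative to the posterior:
$$T=E_{\theta|x}[\tau(\theta)]=\int\tau(\theta)\,f_{\theta|x}(\theta)\,d\theta$$
If $\tau(\mu)=\mu\Rightarrow T=E_{\mu|x}(\mu)=\int\mu\,f_{\mu|x}(\mu)\,d\mu=\mu^*$ (from Ex 2.12)
E.g.:
$\tau(\mu)=\mu^2\Rightarrow T=E_{\mu|x}(\mu^2)=Var(\mu|x)+[E(\mu|x)]^2=\sigma^{*2}+\mu^{*2}$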
Ex 11.2.5: Let $X_1,\dots,X_n$ be a sample from a geometric distribution with parameter p, 0 < p < 1. Assume that the prior distribution of p is BETA with α = 4 and β = 4.
a) Find the posterior distribution of p.
b) Find the Bayes estimate under the quadratic (squared) loss function.
Sol:
a) Because $p\sim BETA(4,4)$, the prior density is
$$\frac{\Gamma(8)}{\Gamma(4)\Gamma(4)}p^{3}(1-p)^{3}=140\,p^3(1-p)^3,\quad0<p<1$$
Because the r.v.'s $X_i$ have a geometric distribution with parameter p, $P(X=x)=p(1-p)^{x-1},\ x=1,2,\dots$, the likelihood is given by
$$L(X_1,\dots,X_n|p)=\prod_{i=1}^{n}p(1-p)^{x_i-1}=p^n(1-p)^{\sum x_i-n}$$
The product of the likelihood function and the prior is given by
$$p^n(1-p)^{\sum x_i-n}\cdot140\,p^3(1-p)^3=140\,p^{n+3}(1-p)^{\sum x_i-n+3}$$
Because the posterior is proportional to the prior of p times the likelihood, factoring out the constant 140 leaves the kernel of a beta distribution with $\alpha-1=n+3$ and $\beta-1=\sum x_i-n+3$. Therefore the posterior is $BETA\left(n+4,\ \sum x_i-n+4\right)$.
b) Recall that for a $BETA(\alpha,\beta)$ random variable, the mean is $\frac{\alpha}{\alpha+\beta}$. Because the Bayes estimate under squared loss is the posterior mean, the mean of $BETA\left(n+4,\sum x_i-n+4\right)$ gives
$$\hat p=\frac{n+4}{(n+4)+\left(\sum x_i-n+4\right)}=\frac{n+4}{\sum x_i+8}$$
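The conjugate update of Ex 11.2.5 reduces to two additions. The sketch below is a minimal illustration; the function name `geometric_beta_posterior` and the data [3, 1, 4, 2] are hypothetical, and x is assumed to count trials up to and including the first success (x ≥ 1), as in the likelihood above.

```python
from fractions import Fraction

def geometric_beta_posterior(xs, a=4, b=4):
    """Beta-geometric conjugate update sketch (Ex 11.2.5).

    xs: observed geometric counts (trials until first success, x >= 1).
    Prior p ~ BETA(a, b). Returns posterior (alpha, beta) and the posterior
    mean, which is the Bayes estimate under squared-error loss.
    """
    n, total = len(xs), sum(xs)
    alpha_post = a + n                 # one success per observation
    beta_post = b + total - n          # failures observed before each success
    return alpha_post, beta_post, Fraction(alpha_post, alpha_post + beta_post)

print(geometric_beta_posterior([3, 1, 4, 2]))  # (8, 10, Fraction(4, 9))
```

Here n = 4 and $\sum x_i=10$, so the formula $\frac{n+4}{\sum x_i+8}=\frac{8}{18}=\frac{4}{9}$ matches.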
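Ex 2.22: RS $x_i$'s $\sim POI(\theta)$. Assume the prior density for θ is $GAM(\alpha,\beta)$, i.e., $p(\theta)=\frac{1}{\beta^\alpha\Gamma(\alpha)}\theta^{\alpha-1}e^{-\theta/\beta},\ \theta>0$. Find $\hat\theta$ under the squared error loss.
Sol: Compute the posterior (Note: $f(x|\theta)=\frac{e^{-\theta}\theta^{x}}{x!}$):
$$f_{\theta|x}(\theta)=\frac{f(x_1,\dots,x_n|\theta)\,p(\theta)}{\int f(x_1,\dots,x_n|\theta)\,p(\theta)\,d\theta}=\frac{\prod_{i=1}^{n}\frac{e^{-\theta}\theta^{x_i}}{x_i!}\cdot\frac{1}{\beta^\alpha\Gamma(\alpha)}\theta^{\alpha-1}e^{-\theta/\beta}}{\int_0^\infty\prod_{i=1}^{n}\frac{e^{-\theta}\theta^{x_i}}{x_i!}\cdot\frac{1}{\beta^\alpha\Gamma(\alpha)}\theta^{\alpha-1}e^{-\theta/\beta}\,d\theta}$$
$$=\frac{e^{-n\theta}\theta^{\sum x_i}\,\theta^{\alpha-1}e^{-\theta/\beta}}{\int_0^\infty e^{-n\theta}\theta^{\sum x_i}\,\theta^{\alpha-1}e^{-\theta/\beta}\,d\theta}=\frac{\theta^{\sum x_i+\alpha-1}e^{-\theta\left(n+\frac{1}{\beta}\right)}}{\int_0^\infty\theta^{\sum x_i+\alpha-1}e^{-\theta\left(n+\frac{1}{\beta}\right)}d\theta}$$
Note: Let $\alpha^*=\sum x_i+\alpha$ and $\beta^*=\frac{1}{n+\frac{1}{\beta}}$. Then
$$f_{\theta|x}(\theta)=\frac{\frac{1}{\beta^{*\alpha^*}\Gamma(\alpha^*)}\theta^{\alpha^*-1}e^{-\theta/\beta^*}}{\int_0^\infty\frac{1}{\beta^{*\alpha^*}\Gamma(\alpha^*)}\theta^{\alpha^*-1}e^{-\theta/\beta^*}d\theta}=\frac{1}{\beta^{*\alpha^*}\Gamma(\alpha^*)}\theta^{\alpha^*-1}e^{-\theta/\beta^*}\qquad\text{(Note: the denominator is 1)}$$
$$\therefore f_{\theta|x}(\theta)\text{ is }GAM(\alpha^*,\beta^*)=GAM\left(\sum x_i+\alpha,\ \frac{1}{n+\frac{1}{\beta}}\right)$$
$$\therefore\text{By Thm 2.8, }\hat\theta=E_{\theta|x}(\theta)=\alpha^*\beta^*=\frac{\sum x_i+\alpha}{n+\frac{1}{\beta}}=\frac{n}{n+\frac{1}{\beta}}\,\bar{x}+\frac{\frac{1}{\beta}}{n+\frac{1}{\beta}}\,\alpha\beta$$
⇒ a weighted average of $\bar{x}$ and the prior mean $E(\theta)=\alpha\beta$.
The risk is:
$$R(\theta)=E\left[(\hat\theta-\theta)^2\right]=Var(\hat\theta)+[E(\hat\theta)-\theta]^2$$
Using $E(x_i)=\theta$ and $Var(x_i)=\theta$:
$$E(\hat\theta)=\frac{n\theta+\alpha}{n+\frac{1}{\beta}},\qquad Var(\hat\theta)=\frac{n\,Var(x_i)}{\left(n+\frac{1}{\beta}\right)^2}=\frac{n\theta}{\left(n+\frac{1}{\beta}\right)^2}$$
$$\therefore R(\theta)=\frac{n\theta}{\left(n+\frac{1}{\beta}\right)^2}+\left(\frac{n\theta+\alpha}{n+\frac{1}{\beta}}-\theta\right)^2$$
As $\beta\to\infty$ with $\alpha\to0$ (a vague prior), $\hat\theta\to\bar{x}$ and $R(\theta)\to\frac{\theta}{n}=Var(\bar{x})$. #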
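Ex 2.23: Predicting the annual number of hurricanes that will hit the U.S. mainland is a problem receiving a great deal of public attention (e.g., four major hurricanes struck Florida in summer 2004, and Hurricane Katrina struck New Orleans in August 2005). Assume the number of hurricanes reaching the mainland is Poisson distributed with a yearly expected number θ, and the prior distribution of θ is gamma, i.e., $p(\theta)\propto\theta^{\alpha-1}e^{-\theta/\beta},\ 0<\theta$. What is the Bayes estimator of θ? Assume a squared loss function.
Sol: The prior is $GAM(\alpha,\beta)$ and $f(x)\sim POI(\theta)$.
Because the oldest data are the least reliable, we use them only to estimate α and β:
$$E(\theta)=\alpha\beta=\frac{88}{50}$$
$\therefore$ We guess: $\alpha=88,\ \beta=\frac{1}{50}$
$$\Rightarrow p(\theta)=\frac{1}{\left(\frac{1}{50}\right)^{88}\Gamma(88)}\theta^{87}e^{-50\theta}$$
From Ex 2.22, the posterior is $GAM\left(\sum x_i+\alpha,\ \frac{1}{n+1/\beta}\right)=GAM\left(164+88,\ \frac{1}{100+50}\right)=GAM\left(252,\ \frac{1}{150}\right)$
$$\therefore\hat\theta=252\cdot\frac{1}{150}=1.68\approx1.7\ \frac{\text{hurricanes}}{\text{year}}\ \#$$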
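The gamma-Poisson update from Ex 2.22 applied to Ex 2.23 can be written as a tiny function. This is a sketch under the notes' parameterization (β is the gamma scale, so the prior mean is αβ); the function name `gamma_poisson_bayes` is hypothetical.

```python
def gamma_poisson_bayes(total_count, n_periods, alpha, beta):
    """Gamma-Poisson update sketch (Ex 2.22/2.23); beta is the gamma *scale*.

    Prior theta ~ GAM(alpha, beta) with mean alpha*beta; data are n_periods
    Poisson counts summing to total_count. Returns the posterior
    (alpha*, beta*) and the posterior mean = Bayes estimate under squared loss.
    """
    alpha_post = total_count + alpha
    beta_post = 1.0 / (n_periods + 1.0 / beta)
    return alpha_post, beta_post, alpha_post * beta_post

a_post, b_post, est = gamma_poisson_bayes(164, 100, alpha=88, beta=1 / 50)
print(a_post, round(1 / b_post, 6), round(est, 4))  # 252 150.0 1.68
```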
Thm 2.9: The Bayes estimator $\hat\theta$ of θ under the absolute error loss
$$L(\hat\theta;\theta)=|\hat\theta-\theta|$$
is the median of the posterior $f_{\theta|x}(\theta)$.
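***SUMMARY:
Loss function: $L(T;\theta)$
↓
Risk function: $R_T(\theta)=E[L(T;\theta)]$, which leads to three criteria:
• $R_{T_1}(\theta)\le R_{T_2}(\theta)\ \forall\theta\in\Omega$ (strict for some θ) ⇒ Admissible estimator (best for this $L(T;\theta)$)
• $\max_\theta R_{T_1}(\theta)\le\max_\theta R_T(\theta)$ ⇒ Minimax estimator
• $E_\theta[R_{T^*}(\theta)]\le E_\theta[R_T(\theta)]$ ⇒ Bayes estimator:
  - Squared loss: $\hat\tau(\theta)=E_{\theta|x}[\tau(\theta)]$ (mean of the posterior)
  - Absolute error loss: $\hat\theta$ = median of $f_{\theta|x}(\theta)$
E.g. of Bayes: $\hat\theta=E_{\theta|x}(\theta)=\int\theta\,f_{\theta|x}(\theta)\,d\theta$ (mean of the posterior of θ)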

Chapter 3: Sufficiency and Completeness

Sufficiency
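Ex 3.1: A coin is tossed n times: $x_i=\begin{cases}0 & \text{if heads}\\1 & \text{otherwise}\end{cases}$, so RS $x_i$'s $\sim BIN(1,\theta)$. What information do we need if we want to estimate θ?
Sol: $f(x_1,\dots,x_n;\theta)=\theta^{\sum x_i}(1-\theta)^{n-\sum x_i},\quad x_i=0,1$
Define: $S=\sum_{i=1}^{n}x_i\sim BIN(n,\theta)$, $f(s;\theta)=\binom{n}{s}\theta^s(1-\theta)^{n-s}$
Now, given S = s, what extra information about θ can be obtained from the $x_i$'s?
$$f_{x_1,\dots,x_n|s}(x_1,\dots,x_n)=\frac{P(X_1=x_1,X_2=x_2,\dots,X_n=x_n;S=s)}{f(s;\theta)}=\frac{f(x_1,\dots,x_n)}{\binom{n}{s}\theta^s(1-\theta)^{n-s}}=\frac{\theta^{\sum x_i}(1-\theta)^{n-\sum x_i}}{\binom{n}{s}\theta^s(1-\theta)^{n-s}}=\frac{1}{\binom{n}{s}}$$
which is not a function of θ (for $\sum x_i=s$).
$\therefore$ Knowing $S=\sum x_i$ tells everything about θ that $x_1,\dots,x_n$ contain. #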
Jointly Sufficient Statistics: Let $X=(x_1,\dots,x_n)$ have joint pdf $f(x;\theta)$ and let $S=(S_1,\dots,S_k)$ be a k-dimensional statistic. Then $S_1,\dots,S_k$ is a set of jointly sufficient statistics for θ if, for any other vector of statistics T, the conditional pdf of T given S = s, denoted by $f_{T|s}(t)$, does not depend on θ.

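Ex. 3.2: RS $x_i$'s $\sim EXP(\theta)$. Find a sufficient statistic.
Sol:
$$f_{x_1,\dots,x_n}(x_1,\dots,x_n;\theta)=\prod_{i=1}^{n}\frac{1}{\theta}e^{-x_i/\theta}I_{(0,\infty)}(x_i)=\frac{1}{\theta^n}e^{-\sum x_i/\theta}\cdot\underbrace{\prod_{i=1}^{n}I_{(0,\infty)}(x_i)}_{=1}$$
This suggests $S=\sum x_i\sim\Gamma(n,\theta)\Rightarrow f(s;\theta)=\frac{1}{\theta^n\Gamma(n)}s^{n-1}e^{-s/\theta},\ s>0$
$$\Rightarrow f_{X|s}(x|s)=\frac{f_{x_1,\dots,x_n}(x_1,\dots,x_n;\theta)}{f(s;\theta)}=\frac{\frac{1}{\theta^n}e^{-\sum x_i/\theta}}{\frac{1}{\theta^n\Gamma(n)}s^{n-1}e^{-s/\theta}}=\frac{\Gamma(n)}{s^{n-1}}$$
which does not depend on θ.
$\therefore S=\sum x_i$ is sufficient. #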
Thm 3.1: (Neyman) Factorization Criterion: If $x_1,\dots,x_n$ have joint pdf $f(x_1,\dots,x_n;\theta)$, and if $S=(S_1,\dots,S_k)$, then $S_1,\dots,S_k$ are jointly sufficient for θ iff
$$f(x_1,\dots,x_n;\theta)=g(S;\theta)\cdot h(x_1,\dots,x_n)$$
where $g(S;\theta)$ does not depend on $x_1,\dots,x_n$ except through S, and $h(x_1,\dots,x_n)$ does not involve θ.
(For proof see text 1)
Ex 3.3: In Ex 3.1, $x_i$'s $\sim BIN(1,\theta)$, we showed by definition that $S=\sum x_i$ is sufficient for θ. Now, use the factorization criterion to show the sufficiency of $S=\sum x_i$.
Sol:
$$f(x_1,\dots,x_n;\theta)=\theta^{\sum x_i}(1-\theta)^{n-\sum x_i}=\theta^s(1-\theta)^{n-s}\cdot1=g(s;\theta)\cdot h(x_1,\dots,x_n)$$
$\therefore S=\sum x_i$ is sufficient for θ because $g(s;\theta)=\theta^s(1-\theta)^{n-s}$ and $h(x_1,\dots,x_n)=1$. #
**Note: Any one-to-one function of S is also sufficient.
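Ex 3.4: RS $x_i$'s $\sim UNIF(0,\theta)$. Find a sufficient statistic for θ.
Sol:
$$f(x_1,\dots,x_n;\theta)=\frac{1}{\theta^n},\quad0<x_i<\theta,\ i=1,\dots,n$$
Method 1: By definition. Consider $S=x_{n:n}$ and check whether $\frac{f(x_1,\dots,x_n;\theta)}{f_{x_{n:n}}(x_{n:n})}$ depends on θ:
$$F_{x_{n:n}}(x)=[F(x)]^n=\left(\frac{x}{\theta}\right)^n\;\Rightarrow\;f_{x_{n:n}}(x)=n\frac{x^{n-1}}{\theta^n},\quad0<x<\theta$$
Compute the conditional distribution:
$$\frac{f(x_1,\dots,x_n;\theta)}{f_{x_{n:n}}(x_{n:n})}=\frac{1/\theta^n}{n\,x_{n:n}^{n-1}/\theta^n}=\frac{1}{n\,x_{n:n}^{n-1}},\ \text{which does not depend on }\theta$$
$\therefore S=x_{n:n}$ is sufficient for θ.
Method 2: By factorization:
$$f(x_1,\dots,x_n;\theta)=\frac{1}{\theta^n},\ 0<x_i<\theta\ \forall i\iff0<x_{1:n}\ \&\ x_{n:n}<\theta$$
$$=\frac{1}{\theta^n}I_{(0,\theta)}(x_{n:n})\,I_{(0,\infty)}(x_{1:n}),\quad\text{where }I_{(a,b)}(x)=\begin{cases}1, & a<x<b\\0, & \text{otherwise}\end{cases}$$
$$=g(x_{n:n},\theta)\,h(x_1,\dots,x_n),\quad\text{where }g(x_{n:n},\theta)=\frac{1}{\theta^n}I_{(0,\theta)}(x_{n:n})\text{ and }h(x_1,\dots,x_n)=I_{(0,\infty)}(x_{1:n})$$
$\Rightarrow\therefore S=x_{n:n}$ is sufficient for θ.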

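Ex 5.4.8: Let $X_1,\dots,X_n$ denote a random sample from $U(0,\theta)$ with pdf
$$f(x)=\frac{1}{\theta},\quad0<x<\theta,\ \theta>0$$
Show that $X_{(n)}=\max_i X_i$ is sufficient for θ, using the factorization theorem.
Sol: The likelihood function of the sample is:
$$f(x_1,\dots,x_n)=\begin{cases}\frac{1}{\theta^n}, & \text{if }0<x_1,\dots,x_n<\theta\\0, & \text{otherwise}\end{cases}$$
We can now write $f(x_1,\dots,x_n)$ as
$$f(x_1,\dots,x_n)=h(x_1,\dots,x_n)\,g(\theta;x_{(n)}),\quad\text{for all }x_1,\dots,x_n$$
where
$$h(x_1,\dots,x_n)=\begin{cases}1, & \text{if }x_1,\dots,x_n>0\\0, & \text{otherwise}\end{cases}$$
and
$$g(\theta;x_{(n)})=\begin{cases}\frac{1}{\theta^n}, & \text{if }0<x_{(n)}<\theta\\0, & \text{otherwise}\end{cases}$$
Hence, by the factorization theorem, $X_{(n)}$ is sufficient for θ.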
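Ex 3.5: RS $x_i$'s $\sim f(x;\sigma)=\frac{1}{2\sigma}e^{-|x|/\sigma},\ -\infty<x<\infty,\ \sigma>0$
Q: Find a sufficient statistic for σ.
Sol: Use factorization:
$$f(x_1,\dots,x_n;\sigma)=\prod_{i=1}^{n}\frac{1}{2\sigma}e^{-|x_i|/\sigma}=\frac{1}{2^n}\cdot\frac{1}{\sigma^n}e^{-\sum|x_i|/\sigma},\quad\text{where }S=\sum_{i=1}^{n}|x_i|$$
$$=h(x_1,\dots,x_n)\cdot g(S,\sigma),\quad\text{where }h(x_1,\dots,x_n)=\frac{1}{2^n}\text{ and }g(S,\sigma)=\frac{1}{\sigma^n}e^{-S/\sigma}$$
$\therefore S=\sum|x_i|$ is sufficient for σ. #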

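Ex 3.6: RS $x_i$'s from a Weibull distribution with shape α and scale β:
$$f(x;\alpha,\beta)=\frac{\alpha}{\beta^\alpha}x^{\alpha-1}e^{-(x/\beta)^\alpha},\quad x>0$$
With the shape fixed at α = 2 (a one-parameter Weibull):
$$f(x;2,\beta)=\frac{2}{\beta^2}x\,e^{-(x/\beta)^2},\quad x>0$$
Q:
a) Find a sufficient statistic for β.
b) Use part a) to find a UMVUE for β.
Sol:
a) $$f(x_1,\dots,x_n;2,\beta)=\prod_{i=1}^{n}\frac{2}{\beta^2}x_ie^{-x_i^2/\beta^2}I_{(0,\infty)}(x_i),\quad\text{where }I\text{ is the indicator function}$$
$$=\underbrace{2^n\prod_{i=1}^{n}x_iI_{(0,\infty)}(x_i)}_{h(x_1,\dots,x_n)}\cdot\underbrace{\beta^{-2n}e^{-\sum x_i^2/\beta^2}}_{g(S,\beta)},\quad\text{where }S=\sum_{i=1}^{n}x_i^2$$
$\therefore S=\sum x_i^2$ is sufficient for β.
b) (See later discussion)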
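Ex 3.7: RS $x_i$'s from a one-parameter Weibull distribution with the scale fixed at 2:
$$f(x;\alpha,2)=\frac{\alpha}{2^\alpha}x^{\alpha-1}e^{-(x/2)^\alpha},\quad x>0$$
Find a sufficient statistic for α.
Sol:
$$f(x_1,\dots,x_n;\alpha,2)=\prod_{i=1}^{n}\frac{\alpha}{2^\alpha}x_i^{\alpha-1}e^{-(x_i/2)^\alpha}I_{(0,\infty)}(x_i)$$
$$=\prod_{i=1}^{n}I_{(0,\infty)}(x_i)\cdot\alpha^n2^{-n\alpha}\left(\prod_{i=1}^{n}x_i\right)^{\alpha-1}e^{-\sum x_i^\alpha/2^\alpha}=h(x_1,\dots,x_n)\cdot g(x_1,\dots,x_n;\alpha)$$
Here g depends on the data through $\sum x_i^\alpha$, and this sum itself changes with α, so no fixed lower-dimensional summary of the data appears in g.
$\therefore S=(x_1,\dots,x_n)$ is jointly sufficient for α. #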
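Note: The exponential family of probability distributions (e.g., Poisson, normal, gamma, and Bernoulli) has density functions of the form:
$$f(x;\theta)=\begin{cases}\exp[k(x)c(\theta)+S(x)+d(\theta)], & x\in B\\0, & x\notin B\end{cases}$$
where B does not depend on the parameter θ.
Thm: Let $X_1,\dots,X_n$ be a random sample from a population with pdf or pmf of the exponential form
$$f(x;\theta)=\begin{cases}\exp[k(x)c(\theta)+S(x)+d(\theta)], & x\in B\\0, & x\notin B\end{cases}$$
where B does not depend on the parameter θ. The statistic $\sum_{i=1}^{n}k(X_i)$ is sufficient for θ.
Proof: The joint density is
$$f(x_1,\dots,x_n;\theta)=\exp\left[c(\theta)\sum_{i=1}^{n}k(x_i)+\sum_{i=1}^{n}S(x_i)+n\,d(\theta)\right]=\exp\left[c(\theta)\sum_{i=1}^{n}k(x_i)+n\,d(\theta)\right]\exp\left[\sum_{i=1}^{n}S(x_i)\right]$$
Using the factorization theorem, the statistic $\sum_{i=1}^{n}k(X_i)$ is sufficient.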
***dim(S) may or may not be equal to dim(θ).
Def 3.2: Minimal Sufficient Statistic: S is a minimal sufficient statistic for θ if S is sufficient and $\dim(S)\le\dim(T)$ for every other sufficient statistic T.

**Methods for verifying whether a set of statistics is minimally sufficient are given in Wasan's 1970 book "Parametric Estimation" (McGraw-Hill).

Thm 3.2: Let S be a set of jointly sufficient statistics for θ:
1) If $\hat\theta$ is a unique MLE, then $\hat\theta$ is a function of S.
2) If $\hat\theta$ is a unique MLE and jointly sufficient for θ, then $\hat\theta$ is minimally sufficient and a function of S.

**Two main uses of sufficient statistics:
1) Compress information for estimating parameters
2) Improve the accuracy of estimators
Thm 3.3: Rao-Blackwell
Let $x_1,\dots,x_n$ have joint pdf $f(x_1,\dots,x_n;\theta)$, and let $S=(S_1,\dots,S_k)$ be a vector of jointly sufficient statistics for θ. If T is any unbiased estimator of $\tau(\theta)$, and if $T^*=E(T|S)$, then
1) $T^*$ is a function of S and does not depend on θ
2) $T^*$ is an unbiased estimator of $\tau(\theta)$
3) $Var(T^*)\le Var(T)$ for every θ, and $Var(T^*)<Var(T)$ for some θ unless $T^*=T$ (with probability 1)

Note:
1) We can restrict attention to functions of sufficient statistics.
2) If we find an unbiased T, then we can improve it by $E[T|S]$.
3) If there is only one $T^*=h(S)=E(T|S)$ that is unbiased, then it must be the UMVUE.

Def 3.3: Complete: A sufficient statistic S is called complete if the unbiased estimator based on it is unique (after adjustment). Formally, S is complete if $E_\theta[u(S)]=0$ for all $\theta\in\Omega$ implies $P_\theta(u(S)=0)=1$ for all θ; hence at most one function of S is unbiased for a given $\tau(\theta)$.
E.g., $S=\sum x_i$ is biased for θ; adjust: $\frac{S}{n}$ is unbiased for θ.
Thm 3.4: Lehmann-Scheffé (L-S)
Let $x_1,\dots,x_n$ have joint pdf $f(x_1,\dots,x_n;\theta)$, and let S be a vector of jointly complete sufficient statistics for θ. If $T^*=t^*(S)$ is a statistic that is unbiased for $\tau(\theta)$ and a function of S, then $T^*$ is a UMVUE of $\tau(\theta)$.
(Proof: See Text 1)
STEPS:
1) Find S that is complete sufficient for θ.
2) Find an unbiased estimator T for $\tau(\theta)$.
 a. Yes (2.1): if T is a function of S, then DONE ⇒ T is the UMVUE.
 b. No (2.2): otherwise, compute $T^*=E(T|S)$ ⇒ $T^*$ is the UMVUE.
Def 3.4: K-Parameter Regular Exponential Class (REC):
A density function is said to be a member of the regular K-parameter exponential class if:
• it can be expressed in the form
$$f(x;\theta)=c(\theta)\,h(x)\exp\left[\sum_{j=1}^{k}q_j(\theta)\,t_j(x)\right],\quad x\in A=\{x:f(x;\theta)>0\}$$
and zero otherwise, where $\theta=(\theta_1,\dots,\theta_k)$ is a vector of k unknown parameters;
• the parameter space is the interval set
$$\Omega=\{\theta\,|\,a_i<\theta_i<b_i,\ i=1,\dots,k\}$$
where the $a_i$'s and $b_i$'s are known constants (which can be ±∞); and
• it satisfies regularity conditions 1, 2, and 3a or 3b, given by:
1) The set $A=\{x:f(x;\theta)>0\}$ does not depend on θ (so the Uniform is NOT REC).
2) The $q_j(\theta)$ are nontrivial, functionally independent, continuous functions of θ: $q_j(\theta)\neq g(q_i(\theta))$.
3) a) For a continuous rv, the derivatives $t_j'(x)\not\equiv0$ are linearly independent continuous functions of x over A.
 b) For a discrete rv, the $t_j(x)$ are nontrivial, linearly independent functions of x on A.
Notes:
1) Notation: $f(x;\theta)\in REC$ or $REC(q_1,\dots,q_k)$
2) The pdf of a REC member can also be expressed as:
$$f(x;\theta)=\exp\left[\sum_{j=1}^{k}q_j(\theta)\,t_j(x)+\ln c(\theta)+\ln h(x)\right]$$
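Ex 3.8: $x\sim EXP(\theta)$. Show $EXP(\theta)\in REC$.
Sol:
$$f(x;\theta)=\begin{cases}\frac{1}{\theta}e^{-x/\theta}, & x>0\\0, & \text{otherwise}\end{cases}=\frac{1}{\theta}e^{-x/\theta}I_{(0,\infty)}(x)=\underbrace{\frac{1}{\theta}}_{c(\theta)}\underbrace{I_{(0,\infty)}(x)}_{h(x)}\exp\Big[\underbrace{-\frac{1}{\theta}}_{q(\theta)}\cdot\underbrace{x}_{t(x)}\Big]$$
$$f(x;\theta)=c(\theta)\,h(x)\exp[q(\theta)\,t(x)]$$
Other conditions:
1) $0<\theta<\infty$
2) $A=\{x:0<x<\infty\}$ does not depend on θ
3) $q(\theta)=-\frac{1}{\theta}$, nontrivial
4) $t(x)=x$ ($t'(x)=1$), nontrivial
$\therefore EXP(\theta)\in REC$ #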
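Ex 3.9: $x\sim BIN(1,p)$. Show that $BIN(1,p)\in REC$.
Sol:
$$f(x;p)=p^x(1-p)^{1-x}\cdot I_{\{0,1\}}(x)=e^{x\ln p+(1-x)\ln(1-p)}\cdot I_{\{0,1\}}(x)=e^{\ln(1-p)}\cdot e^{x\ln\frac{p}{1-p}}\cdot I_{\{0,1\}}(x)=(1-p)\cdot I_{\{0,1\}}(x)\exp\left[\ln\frac{p}{1-p}\cdot x\right]$$
$c(p)=1-p,\ h(x)=I_{\{0,1\}}(x),\ q(p)=\ln\frac{p}{1-p},\ t(x)=x$
1) $A=\{x:f(x;p)>0\}=\{0,1\}$ does not depend on p
2) $q(p)=\ln\frac{p}{1-p}$: non-trivial
3) $t(x)=x$: non-trivial
$\Rightarrow\therefore BIN(1,p)\in REC$ #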

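Ex 3.10: $x\sim\Gamma(\alpha,\beta)$, $f(x;\alpha,\beta)=\frac{1}{\beta^\alpha\Gamma(\alpha)}x^{\alpha-1}e^{-x/\beta}I_{(0,\infty)}(x)$. Show that $\Gamma(\alpha,\beta)\in REC$.
Sol:
$$f(x;\alpha,\beta)=\underbrace{\frac{1}{\beta^\alpha\Gamma(\alpha)}}_{c(\alpha,\beta)}\underbrace{I_{(0,\infty)}(x)}_{h(x)}\exp\Big[\underbrace{(\alpha-1)}_{q_1(\alpha,\beta)}\underbrace{\ln x}_{t_1(x)}+\underbrace{\left(-\frac{1}{\beta}\right)}_{q_2(\alpha,\beta)}\underbrace{x}_{t_2(x)}\Big]$$
1) $A=\{x:f(x;\alpha,\beta)>0\}=(0,\infty)$ does not depend on (α, β)
2) $q_1(\alpha,\beta)=\alpha-1$ and $q_2(\alpha,\beta)=-\frac{1}{\beta}$: non-trivial, functionally independent
3) $t_1'(x)=\frac{1}{x}$ and $t_2'(x)=1$: linearly independent
$\therefore\Gamma(\alpha,\beta)\in REC$ #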

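Thm 3.5: If $x_1,\dots,x_n$ is a random sample from a member of the regular exponential class $REC(q_1,\dots,q_k)$, then the statistics
$$(S_1,\dots,S_k)=\left(\sum_{i=1}^{n}t_1(x_i),\ \sum_{i=1}^{n}t_2(x_i),\ \dots,\ \sum_{i=1}^{n}t_k(x_i)\right)$$
are a minimal set of complete sufficient statistics for $\theta_1,\dots,\theta_k$.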

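Ex 3.11: In HW#3 Q.4, $x_i\sim EXP(\theta)$ and $T=\frac{n-1}{n\bar{x}}$ is unbiased for $\frac{1}{\theta}$. Find the UMVUE for $\frac{1}{\theta}$.
Sol: We know that $\bar{x}$ achieves CRLB(θ) ⇒ $\bar{x}$ is the UMVUE for θ.
$\because\frac{1}{\theta}$ is not a linear function of θ,
$\therefore$ by Thm 2.5, no unbiased estimator can achieve $CRLB\left(\frac{1}{\theta}\right)$.
But a UMVUE may still exist.
From Ex 3.8, $EXP(\theta)\in REC$.
By Thm 3.5, $S=\sum_{i=1}^{n}t(x_i)=\sum_{i=1}^{n}x_i$ is minimal complete & sufficient.
Also, $T=\frac{n-1}{n\bar{x}}=\frac{n-1}{\sum x_i}$ is a function of S and unbiased.
$\therefore T$ is the UMVUE for $\frac{1}{\theta}$ by L-S (Thm 3.4). #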

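Ex 3.13: Find the UMVUE for μ and σ² of $N(\mu,\sigma^2)$.
Sol:
STEP 1: Find the complete & sufficient statistics for μ and σ².
Express $N(\mu,\sigma^2)$ in REC form:
$$f(x;\mu,\sigma^2)=\frac{1}{\sqrt{2\pi\sigma^2}}\exp\left[-\frac{(x-\mu)^2}{2\sigma^2}\right]=\frac{1}{\sqrt{2\pi\sigma^2}}\exp\left[-\frac{x^2}{2\sigma^2}+\frac{x\mu}{\sigma^2}-\frac{\mu^2}{2\sigma^2}\right]$$
$$=\frac{1}{\sqrt{2\pi\sigma^2}}e^{-\frac{\mu^2}{2\sigma^2}}\,I_{(-\infty,\infty)}(x)\exp\left[-\frac{1}{2\sigma^2}x^2+\frac{\mu}{\sigma^2}x\right]$$
$c(\mu,\sigma^2)=\frac{1}{\sqrt{2\pi\sigma^2}}e^{-\frac{\mu^2}{2\sigma^2}},\ h(x)=I_{(-\infty,\infty)}(x),\ q_1(\mu,\sigma^2)=-\frac{1}{2\sigma^2},\ t_1(x)=x^2,\ q_2(\mu,\sigma^2)=\frac{\mu}{\sigma^2},\ t_2(x)=x$
1) $A=\{x:f(x;\mu,\sigma^2)>0\}=(-\infty,\infty)$ does not depend on (μ, σ²)
2) $q_1(\mu,\sigma^2)$ and $q_2(\mu,\sigma^2)$: non-trivial, functionally independent
3) $t_1'(x)=2x$ and $t_2'(x)=1$: not linear functions of each other
$\therefore N(\mu,\sigma^2)\in REC$ and $(S_1,S_2)=\left(\sum x_i^2,\ \sum x_i\right)$ is minimal complete & sufficient.

STEP 2: Find unbiased estimators that are functions of $S_1$ and $S_2$.
MLEs: $\hat\mu=\frac{\sum x_i}{n}=g(S_2)$ and $\hat\sigma^2=\frac{\sum(x_i-\bar{x})^2}{n}$
$$S^2=\frac{n}{n-1}\hat\sigma^2=\frac{\sum(x_i-\bar{x})^2}{n-1}=\frac{\sum x_i^2-2\frac{(\sum x_i)^2}{n}+\frac{(\sum x_i)^2}{n}}{n-1}=\frac{\sum x_i^2-\frac{(\sum x_i)^2}{n}}{n-1}=g(S_1,S_2)$$
Since $E(\bar{x})=\mu$ and $E(S^2)=\sigma^2$,
$\therefore$ by Thm 3.4 (L-S Thm), $\bar{x}$ and $S^2$ are the UMVUEs for μ and σ². #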

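Ex 3.14: RS $x_i$'s $\sim f(x;\theta)=\theta^2xe^{-\theta x},\ 0<x<\infty$, where θ > 0. Find the UMVUE for θ.
Sol: STEP 1: Note that this is $\Gamma\left(2,\frac{1}{\theta}\right)$.
The REC form is:
$$f(x;\theta)=\theta^2\,x\,e^{-\theta x}$$
$c(\theta)=\theta^2,\ h(x)=x,\ q(\theta)=-\theta,\ t(x)=x$
1) $A=\{x:f(x;\theta)>0\}=(0,\infty)$ does not depend on θ
2) $q(\theta)=-\theta$: non-trivial
3) $t'(x)=1\neq0$
$\therefore S=\sum x_i$ is complete & sufficient.
STEP 2: Find an unbiased estimator for θ.
Try: $E(S)=E\left(\sum x_i\right)=\sum E(x_i)=\frac{2n}{\theta}$, but
$$E\left(\frac{1}{S}\right)\neq\frac{1}{E(S)}$$
Since $S=\sum x_i\sim\Gamma\left(2n,\frac{1}{\theta}\right)$:
$$E\left(\frac{1}{S}\right)=\frac{\Gamma(2n-1)}{\Gamma(2n)}\cdot\frac{1}{1/\theta}=\frac{(2n-2)!}{(2n-1)!}\,\theta=\frac{\theta}{2n-1}$$
$$\therefore T=\frac{2n-1}{S}=\frac{2n-1}{\sum x_i}=\frac{2n-1}{n\bar{x}}\text{ is unbiased for }\theta.$$
T is also the UMVUE for θ because it is a function of the complete & sufficient statistic. #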
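A short simulation can confirm that $T=\frac{2n-1}{\sum x_i}$ is unbiased while the naive plug-in $\frac{2n}{\sum x_i}=\frac{2}{\bar{x}}$ is not. This sketch relies on Python's `random.gammavariate(alpha, beta)`, whose second argument is a scale, so Γ(2, 1/θ) is `gammavariate(2.0, 1/theta)`; the helper names, θ = 1.5, n = 4, and the seed are illustrative assumptions.

```python
import random
import statistics

def umvue_theta(xs):
    """T = (2n - 1) / sum(x_i), the UMVUE of theta in Ex 3.14."""
    return (2 * len(xs) - 1) / sum(xs)

def check_unbiased(theta=1.5, n=4, reps=40000, seed=3):
    """Monte Carlo sketch: average T over samples from f(x; theta) = theta^2 x e^(-theta x),
    i.e. Gamma(shape=2, scale=1/theta). Also averages the naive 2n/sum(x_i) for contrast."""
    rng = random.Random(seed)
    t_vals, naive_vals = [], []
    for _ in range(reps):
        xs = [rng.gammavariate(2.0, 1.0 / theta) for _ in range(n)]
        t_vals.append(umvue_theta(xs))
        naive_vals.append(2 * n / sum(xs))   # plug-in 2/xbar, biased upward
    return statistics.fmean(t_vals), statistics.fmean(naive_vals)

print(check_unbiased())  # first value near 1.5; second near 1.5 * 8/7 (about 1.714)
```

The second value reflects $E\left(\frac{2n}{S}\right)=\frac{2n}{2n-1}\theta$, which follows from the $E(1/S)$ computation above.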

***Summary:
1) S is sufficient and $\hat\theta$ (MLE) is unique ⇒ $\hat\theta=f(S)$
2) S is sufficient and T is unbiased for $\tau(\theta)$ ⇒ $T^*=E(T|S)$ is unbiased for $\tau(\theta)$ and $Var(T^*)\le Var(T)$
3) S is complete and sufficient and $T^*=t(S)$ is unbiased for $\tau(\theta)$ ⇒ $T^*=t(S)$ is the UMVUE
4) REC ⇒ $S_1=\sum t_1(x_i),\dots,S_k=\sum t_k(x_i)$ are minimal complete & sufficient

Chapter 4: Interval Estimation
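Ex 4.1: Suppose 6.5, 9.2, 9.9, and 12.4 is a RS of size 4 from $N(\theta,0.8^2)$, i.e.,
$$f(x;\theta)=\frac{1}{\sqrt{2\pi}\,0.8}e^{-\frac{(x-\theta)^2}{2(0.8)^2}},\quad-\infty<x<\infty$$
What values of θ are believable?
Sol: $\hat\theta=\bar{x}=\frac{\sum x_i}{4}=9.5$
$\because x_i\sim N(\theta,0.8^2)$, $\therefore\bar{x}\sim N\left(\theta,\frac{0.8^2}{4}\right)$
$$\Rightarrow\frac{\bar{x}-\theta}{0.8/\sqrt4}=\frac{\bar{x}-\theta}{0.4}\sim N(0,1)$$
$$1-\alpha=P\left(Z_{\alpha/2}<\frac{\bar{x}-\theta}{0.8/\sqrt4}<Z_{1-\alpha/2}\right)=P\left(\bar{x}-Z_{1-\alpha/2}\frac{0.8}{\sqrt4}<\theta<\bar{x}-Z_{\alpha/2}\frac{0.8}{\sqrt4}\right)$$
For α = 0.05: $Z_{0.025}=-1.96,\ Z_{0.975}=1.96$:
$$P\left(\bar{x}-1.96\frac{0.8}{\sqrt4}<\theta<\bar{x}+1.96\frac{0.8}{\sqrt4}\right)=0.95,\ \text{i.e., }(9.5-0.784,\ 9.5+0.784)=(8.716,\ 10.284)$$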
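The known-σ interval of Ex 4.1 can be computed with the standard library's `statistics.NormalDist`. This is a minimal sketch; the function name `z_interval` is hypothetical.

```python
from statistics import NormalDist, fmean

def z_interval(data, sigma, conf=0.95):
    """Two-sided CI for a normal mean when sigma is known (Ex 4.1)."""
    n = len(data)
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)   # Z_{1 - alpha/2}
    half = z * sigma / n ** 0.5
    xbar = fmean(data)
    return xbar - half, xbar + half

lo, hi = z_interval([6.5, 9.2, 9.9, 12.4], sigma=0.8)
print(round(lo, 3), round(hi, 3))  # 8.716 10.284
```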
Def 4.1: An interval $(l(x_1,\dots,x_n),\ u(x_1,\dots,x_n))$ is called a 100γ% Confidence Interval for θ if:
$$P\left(l(X_1,\dots,X_n)<\theta<u(X_1,\dots,X_n)\right)=\gamma$$
where 0 < γ < 1. The observed values $l(x_1,\dots,x_n)$ and $u(x_1,\dots,x_n)$ are called the lower and upper confidence limits, respectively.
• If $P\left(l(X_1,\dots,X_n)<\theta\right)=\gamma$, then $l(x_1,\dots,x_n)$ is called a one-sided lower 100γ% confidence limit for θ.
• If $P\left(\theta<u(X_1,\dots,X_n)\right)=\gamma$, then $u(x_1,\dots,x_n)$ is called a one-sided upper 100γ% confidence limit for θ.
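Ex 4.2: RS $x_i$'s $\sim EXP(\theta)$. Find a one-sided lower 100γ% CL for θ.
Sol: $\gamma=P(L<\theta)\Rightarrow$ What is L?
$\because\bar{x}=\frac{\sum x_i}{n}$ is sufficient for θ, and $\frac{2n\bar{x}}{\theta}\sim\chi^2(2n)$:
$$\gamma=P\left(\frac{2n\bar{x}}{\theta}<\chi^2_\gamma(2n)\right)\quad\leftarrow\chi^2_\gamma(2n)\text{ is the }\gamma\text{ percentile of the chi-square with 2n dof}$$
$$=P\left(\frac{2n\bar{x}}{\chi^2_\gamma(2n)}<\theta\right)$$
$\therefore l(x)=\frac{2n\bar{x}}{\chi^2_\gamma(2n)}$ is the one-sided lower 100γ% CL for θ.
Similarly, $u(x)=\frac{2n\bar{x}}{\chi^2_{1-\gamma}(2n)}$ is the one-sided upper 100γ% CL for θ.
$\left(\frac{2n\bar{x}}{\chi^2_{1-\alpha_2}(2n)},\ \frac{2n\bar{x}}{\chi^2_{\alpha_1}(2n)}\right)$ with $\alpha_1+\alpha_2=\alpha$ is a two-sided 100(1-α)% CI for θ.
*Note: For a two-sided CI, if $\alpha_1=\alpha_2=\alpha/2$, then it is the "equal-tailed" choice.
⇒ It is the best choice (smallest width) if the pdf of the pivot is a symmetric, single-hump distribution; for a skewed pivot such as the chi-square, unequal tails can give a shorter interval.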
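The interval of Ex 4.2 needs chi-square percentiles. To stay within the standard library, the sketch below uses the closed-form chi-square CDF that holds for an even number of degrees of freedom (which 2n always is) and inverts it by bisection; this is an illustrative substitute for a table or a statistics library, and all function names are hypothetical.

```python
import math

def chi2_cdf_even(x, dof):
    """CDF of chi-square with an even number of dof (dof = 2n), closed form:
    P(chi2_{2n} <= x) = 1 - exp(-x/2) * sum_{k=0}^{n-1} (x/2)^k / k!."""
    n = dof // 2
    half = x / 2.0
    term, total = 1.0, 1.0
    for k in range(1, n):
        term *= half / k
        total += term
    return 1.0 - math.exp(-half) * total

def chi2_ppf_even(p, dof, tol=1e-10):
    """Percentile chi2_p(dof) by bisection (even dof only)."""
    lo, hi = 0.0, 1.0
    while chi2_cdf_even(hi, dof) < p:   # expand until the bracket contains the root
        hi *= 2
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if chi2_cdf_even(mid, dof) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def exp_mean_ci(data, alpha=0.05):
    """Equal-tailed 100(1-alpha)% CI for theta in EXP(theta), using 2n*xbar/theta ~ chi2(2n)."""
    n = len(data)
    s = 2 * sum(data)                      # 2n * xbar
    return (s / chi2_ppf_even(1 - alpha / 2, 2 * n),
            s / chi2_ppf_even(alpha / 2, 2 * n))

print(exp_mean_ci([0.8, 2.1, 0.4, 1.7, 1.0]))
```

A one-sided lower 100γ% limit is `2 * sum(data) / chi2_ppf_even(gamma, 2 * n)`, matching $l(x)$ above.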
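Ex 4.3: RS $x_i$'s $\sim N(\mu,\sigma^2)$. What sample size n is needed to achieve a precision d (or width w = 2d) of a CI for μ?
Sol: In Ex 4.1, the 100(1-α)% CI for μ is $\left(\bar{x}-Z_{1-\alpha/2}\frac{\sigma}{\sqrt n},\ \bar{x}+Z_{1-\alpha/2}\frac{\sigma}{\sqrt n}\right)$
$$\therefore E(\text{width})=E(w)=E(u-l)=E\left(2Z_{1-\alpha/2}\frac{\sigma}{\sqrt n}\right)=2Z_{1-\alpha/2}\frac{\sigma}{\sqrt n}=2d,\quad\text{where }d=Z_{1-\alpha/2}\frac{\sigma}{\sqrt n}$$
$$\Rightarrow2Z_{1-\alpha/2}\frac{\sigma}{\sqrt n}=2d\;\Rightarrow\;n=\left(\frac{2Z_{1-\alpha/2}\,\sigma}{2d}\right)^2=\left(\frac{2Z_{1-\alpha/2}\,\sigma}{w}\right)^2$$
$$n\propto\frac{1}{w^2}\ \#$$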
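In practice the formula of Ex 4.3 is rounded up to the next integer. A minimal sketch (the function name `sample_size` is hypothetical):

```python
import math
from statistics import NormalDist

def sample_size(sigma, d, alpha=0.05):
    """Smallest n with half-width Z_{1-alpha/2} * sigma / sqrt(n) <= d (Ex 4.3)."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return math.ceil((z * sigma / d) ** 2)

print(sample_size(sigma=1.0, d=0.5))   # 16
print(sample_size(sigma=1.0, d=0.25))  # 62: halving the width roughly quadruples n
```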
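Ex 4.4: For a certain new model of microwave oven, it is desired to set a guarantee period so that only γ% of the ovens sold will have had a major failure in this length of time. Assume that the time to the 1st major failure is $N(\mu,\sigma^2)$; then the guarantee period should end at $t_\gamma=\mu+Z_\gamma\sigma$. Suppose the company will charge C dollars if a customer purchases the insurance for the oven, and the cost of fixing a failed oven is F dollars. Given a r.s. of times to failure $x_1,\dots,x_n$, how do you determine the insurance policy (pricing)?
Sol: $T=\bar{x}+Z_\gamma\hat\sigma$, where $E(S)\neq\sigma$ (i.e., $E(\sqrt{S^2})\neq\sigma$).
An unbiased estimator for σ is
$$\hat\sigma=\frac{\Gamma\left(\frac{n-1}{2}\right)}{\sqrt2\,\Gamma\left(\frac{n}{2}\right)}\sqrt{\sum_{i=1}^{n}(x_i-\bar{x})^2}=\sqrt{\frac{n-1}{2}}\,\frac{\Gamma\left(\frac{n-1}{2}\right)}{\Gamma\left(\frac{n}{2}\right)}\,S$$
Make sure $C\ge\gamma F$.
Duration | Insurance Price (Premium) | How it is set
0.5 yr | $16 | $C_{0.5}=\gamma_{0.5}F$, where $\gamma_{0.5}$ satisfies $0.5=\bar{x}+Z_{\gamma_{0.5}}\hat\sigma$
1 yr | $50 | $C_1=\gamma_1F$, where $\gamma_1$ satisfies $1=\bar{x}+Z_{\gamma_1}\hat\sigma$
2 yr | $97.7 | $C_2=\gamma_2F$, where $\gamma_2$ satisfies $2=\bar{x}+Z_{\gamma_2}\hat\sigma$
E.g., F = 100, $\bar{x}=1$, $\hat\sigma=0.5$:
$C_{0.5}=\gamma_{0.5}F\Rightarrow\gamma_{0.5}=\frac{C_{0.5}}{F}$
For 0.5 yr: $0.5=1+Z_{\gamma_{0.5}}(0.5)\Rightarrow Z_{\gamma_{0.5}}=-1\Rightarrow\gamma_{0.5}=\Phi(-1)=0.159\approx16\%\Rightarrow C_{0.5}=0.16\times100=16$ dollars
For 1 yr: $1=1+Z_{\gamma_1}(0.5)\Rightarrow Z_{\gamma_1}=0\Rightarrow\gamma_1=50\%\Rightarrow C_1=0.5\times100=50$ dollars #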
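The premium table of Ex 4.4 follows from one normal CDF evaluation per duration. This sketch treats $\bar{x}$ and $\hat\sigma$ as the true mean and SD (as the worked example does); the function name `premium` is hypothetical.

```python
from statistics import NormalDist

def premium(duration, xbar, sigma_hat, fix_cost):
    """Break-even premium C = gamma * F for a guarantee of the given length (Ex 4.4).

    gamma = P(failure time < duration) under N(xbar, sigma_hat^2), i.e. the gamma
    that solves duration = xbar + Z_gamma * sigma_hat.
    """
    gamma = NormalDist(mu=xbar, sigma=sigma_hat).cdf(duration)
    return gamma * fix_cost

for years in (0.5, 1, 2):
    print(years, round(premium(years, xbar=1.0, sigma_hat=0.5, fix_cost=100), 1))
# 0.5 15.9
# 1 50.0
# 2 97.7
```

The notes round $\gamma_{0.5}=0.159$ to 16%, which gives the $16 entry; the unrounded value is about $15.9.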
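Def 4.2: Pivotal Quantity
If $Q=q(x_1,\dots,x_n;\theta)$ is a r.v. that is a function only of $x_1,\dots,x_n$ and θ, then Q is called a pivotal quantity (PQ) if its distribution does not depend on θ or any other unknown parameters.
Def 4.3:
Location parameters: $f(x;\eta)=f_0(x-\eta)$, where $f_0$ is free of η
Scale parameters: $f(x;\theta)=\frac{1}{\theta}f_0\left(\frac{x}{\theta}\right)$, where $f_0$ is free of θ
Location & Scale parameters: $f(x;\theta,\eta)=\frac{1}{\theta}f_0\left(\frac{x-\eta}{\theta}\right)$, where $f_0$ is free of θ and η
E.g.s:
Location: $x\sim EXP(1,\eta)$:
$$f(x;\eta)=\begin{cases}e^{-(x-\eta)}, & x>\eta\\0, & \text{otherwise}\end{cases}\qquad f_0(x)=\begin{cases}e^{-x}, & x>0\\0, & \text{otherwise}\end{cases}$$
Scale: $x\sim EXP(\theta)$:
$$f(x;\theta)=\begin{cases}\frac{1}{\theta}e^{-x/\theta}, & x>0\\0, & \text{otherwise}\end{cases}\qquad f_0(x)=\begin{cases}e^{-x}, & x>0\\0, & \text{otherwise}\end{cases}$$
Location & Scale: $N(\mu,\sigma^2)$:
$$f(x;\mu,\sigma)=\frac{1}{\sqrt{2\pi}\,\sigma}e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2}=\frac{1}{\sigma}f_0\left(\frac{x-\mu}{\sigma}\right),\qquad f_0(x)=\frac{1}{\sqrt{2\pi}}e^{-\frac{x^2}{2}}$$
Thm 4.1: RS $x_i\sim f(x;\theta)$ or $f(x;\theta_1,\theta_2)$. Assume the MLE $\hat\theta$ or $(\hat\theta_1,\hat\theta_2)$ exists; then:
1) θ location: $Q=\hat\theta-\theta$ is a PQ (Ex 4.1)
2) θ scale: $Q=\frac{\hat\theta}{\theta}$ is a PQ (Ex 4.2)
3) $\theta_1$ location & $\theta_2$ scale: $Q_1=\frac{\hat\theta_1-\theta_1}{\hat\theta_2}$ and $Q_2=\frac{\hat\theta_2}{\theta_2}$ are PQs for $\theta_1$ & $\theta_2$, respectively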
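Ex 4.5: RS $x_i$'s $\sim N(\mu,\sigma^2)$, where both μ and σ² are unknown. Find 100(1-α)% CIs for μ and σ².
Sol: $\hat\theta_1=\hat\mu=\bar{x}$, $\hat\theta_2^2=\hat\sigma^2=\frac{n-1}{n}S^2\Rightarrow\hat\sigma=\sqrt{\frac{n-1}{n}}\,S$
$$Q_1=\frac{\bar{x}-\mu}{\hat\sigma}=\frac{\bar{x}-\mu}{\sqrt{\frac{n-1}{n}}\,S}=\frac{1}{\sqrt{n-1}}\cdot\frac{\bar{x}-\mu}{S/\sqrt n},\qquad\frac{\bar{x}-\mu}{S/\sqrt n}\sim t(n-1)$$
$$\therefore1-\alpha=P\left(t_{\alpha/2}(n-1)<\frac{\bar{x}-\mu}{S/\sqrt n}<t_{1-\alpha/2}(n-1)\right)=P\left(\bar{x}-t_{1-\alpha/2}(n-1)\frac{S}{\sqrt n}<\mu<\bar{x}+t_{1-\alpha/2}(n-1)\frac{S}{\sqrt n}\right)$$
(using $t_{\alpha/2}(n-1)=-t_{1-\alpha/2}(n-1)$ by symmetry)
$\therefore\left(\bar{x}-t_{1-\alpha/2}(n-1)\frac{S}{\sqrt n},\ \bar{x}+t_{1-\alpha/2}(n-1)\frac{S}{\sqrt n}\right)$ is a 100(1-α)% CI for μ.
$$Q_2=\frac{\hat\sigma^2}{\sigma^2}=\frac{\frac{n-1}{n}S^2}{\sigma^2}=\frac{1}{n}\cdot\frac{(n-1)S^2}{\sigma^2}\;\Rightarrow\;nQ_2=\frac{(n-1)S^2}{\sigma^2}\sim\chi^2(n-1)$$
$$\therefore1-\alpha=P\left(\chi^2_{\alpha/2}(n-1)<\frac{(n-1)S^2}{\sigma^2}<\chi^2_{1-\alpha/2}(n-1)\right)=P\left(\frac{(n-1)S^2}{\chi^2_{1-\alpha/2}(n-1)}<\sigma^2<\frac{(n-1)S^2}{\chi^2_{\alpha/2}(n-1)}\right)$$
$\therefore\left(\frac{(n-1)S^2}{\chi^2_{1-\alpha/2}(n-1)},\ \frac{(n-1)S^2}{\chi^2_{\alpha/2}(n-1)}\right)$ is a 100(1-α)% CI for σ². #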
For $N(\mu,\sigma^2)$:
• CI for μ: σ known ⇒ Ex 4.1; σ unknown ⇒ Ex 4.5
• CI for σ²: μ known ⇒ Ex 4.6; μ unknown ⇒ Ex 4.5
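Ex 4.6: RS $x_i$'s $\sim N(\mu,\sigma^2)$, μ is known. Find a 100(1-α)% CI for σ².
Sol: STEP 1: When μ is unknown, $\frac{(n-1)S^2}{\sigma^2}=\frac{\sum(x_i-\bar{x})^2}{\sigma^2}\sim\chi^2(n-1)$; when μ is known, replace $\bar{x}$ by μ:
$$\frac{\sum(x_i-\mu)^2}{\sigma^2}=\sum\left(\frac{x_i-\mu}{\sigma}\right)^2\sim\chi^2(n)$$
STEP 2:
$$1-\alpha=P\left(\chi^2_{\alpha/2}(n)<\frac{\sum(x_i-\mu)^2}{\sigma^2}<\chi^2_{1-\alpha/2}(n)\right)=P\left(\frac{\sum(x_i-\mu)^2}{\chi^2_{1-\alpha/2}(n)}<\sigma^2<\frac{\sum(x_i-\mu)^2}{\chi^2_{\alpha/2}(n)}\right)$$
$\therefore\left(\frac{\sum(x_i-\mu)^2}{\chi^2_{1-\alpha/2}(n)},\ \frac{\sum(x_i-\mu)^2}{\chi^2_{\alpha/2}(n)}\right)$ is a 100(1-α)% CI for σ² when μ is known. #

Compare the two CIs for σ² with μ known/unknown:
μ known: $\left(\frac{\sum_{i=1}^{n}(x_i-\mu)^2}{\chi^2_{1-\alpha/2}(n)},\ \frac{\sum_{i=1}^{n}(x_i-\mu)^2}{\chi^2_{\alpha/2}(n)}\right)$
μ unknown: $\left(\frac{\sum_{i=1}^{n}(x_i-\bar{x})^2}{\chi^2_{1-\alpha/2}(n-1)},\ \frac{\sum_{i=1}^{n}(x_i-\bar{x})^2}{\chi^2_{\alpha/2}(n-1)}\right)$
(Estimating μ by $\bar{x}$ costs one degree of freedom.)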
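Two Sample Problem:
Note: "⋚" poses the question of whether the relation is less than, equal to, or greater than.
$x_i$'s $\sim N(\mu_1,\sigma_1^2)$ and $Y_j$'s $\sim N(\mu_2,\sigma_2^2)$: is $\mu_1-\mu_2\lesseqgtr0$? Is $\frac{\sigma_1^2}{\sigma_2^2}\lesseqgtr1$?
**Mean $\mu_1-\mu_2$: $x_1,\dots,x_{n_1}\sim N(\mu_1,\sigma_1^2)$, $Y_1,\dots,Y_{n_2}\sim N(\mu_2,\sigma_2^2)$

❶ $\sigma_1^2$ and $\sigma_2^2$ are known
$\because\sum a_ix_i\sim N\left(\sum a_i\mu_i,\ \sum a_i^2\sigma_i^2\right)$ for independent normal $x_i$'s,
$$\bar{x}=\sum\frac{x_i}{n_1}\sim N\left(\mu_1,\frac{\sigma_1^2}{n_1}\right),\quad\bar{Y}=\sum\frac{Y_j}{n_2}\sim N\left(\mu_2,\frac{\sigma_2^2}{n_2}\right)\;\Rightarrow\;\bar{x}-\bar{Y}\sim N\left(\mu_1-\mu_2,\ \frac{\sigma_1^2}{n_1}+\frac{\sigma_2^2}{n_2}\right)$$
$$\Rightarrow\frac{(\bar{x}-\bar{Y})-(\mu_1-\mu_2)}{\sqrt{\frac{\sigma_1^2}{n_1}+\frac{\sigma_2^2}{n_2}}}\sim N(0,1)$$
$$\therefore1-\alpha=P\left[Z_{\alpha/2}<\frac{(\bar{x}-\bar{Y})-(\mu_1-\mu_2)}{\sqrt{\frac{\sigma_1^2}{n_1}+\frac{\sigma_2^2}{n_2}}}<Z_{1-\alpha/2}\right]=P\left(\bar{x}-\bar{Y}-Z_{1-\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1}+\frac{\sigma_2^2}{n_2}}<\mu_1-\mu_2<\bar{x}-\bar{Y}+Z_{1-\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1}+\frac{\sigma_2^2}{n_2}}\right)$$
$\therefore\left(\bar{x}-\bar{Y}-Z_{1-\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1}+\frac{\sigma_2^2}{n_2}},\ \bar{x}-\bar{Y}+Z_{1-\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1}+\frac{\sigma_2^2}{n_2}}\right)$ is a 100(1-α)% CI for $\mu_1-\mu_2$ when $\sigma_1^2$ & $\sigma_2^2$ are known.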

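❷ $\sigma_1^2=\sigma_2^2=\sigma^2$ unknown
Thm 4.2: RS of size $n_1$, $x_i$'s $\sim N(\mu_1,\sigma^2)$, with sample variance $S_1^2$, and RS of size $n_2$, $Y_j$'s $\sim N(\mu_2,\sigma^2)$, with sample variance $S_2^2$. Define the pooled sample variance:
$$S_p^2=\frac{(n_1-1)S_1^2+(n_2-1)S_2^2}{n_1+n_2-2}=\frac{(n_1-1)\frac{\sum(x_i-\bar{x})^2}{n_1-1}+(n_2-1)\frac{\sum(Y_j-\bar{Y})^2}{n_2-1}}{n_1+n_2-2}=\frac{\sum(x_i-\bar{x})^2+\sum(Y_j-\bar{Y})^2}{n_1+n_2-2}$$
Then:
(i) $\frac{(n_1+n_2-2)S_p^2}{\sigma^2}\sim\chi^2(n_1+n_2-2)$
(ii) $E(S_p^2)=\sigma^2$
(iii) $\frac{\bar{Y}-\bar{X}-(\mu_2-\mu_1)}{S_p\sqrt{\frac{1}{n_1}+\frac{1}{n_2}}}\sim t(n_1+n_2-2)$
Proof:
i) $\because\frac{(n_1-1)S_1^2}{\sigma^2}\sim\chi^2(n_1-1)$ is independent of $\frac{(n_2-1)S_2^2}{\sigma^2}\sim\chi^2(n_2-1)$ (from Thm 1.5),
$$\therefore\frac{(n_1-1)S_1^2+(n_2-1)S_2^2}{\sigma^2}=\frac{(n_1+n_2-2)S_p^2}{\sigma^2}\sim\chi^2(n_1+n_2-2)$$
ii) $E\left[\frac{(n_1+n_2-2)S_p^2}{\sigma^2}\right]=n_1+n_2-2\Rightarrow E(S_p^2)=\sigma^2$
iii) $\because\bar{X}\perp S_1^2$, $\bar{Y}\perp S_2^2$, and the two samples are independent, $\therefore\bar{Y}-\bar{X}\perp S_p^2$.
$$\therefore\frac{\bar{Y}-\bar{X}-(\mu_2-\mu_1)}{\sigma\sqrt{\frac{1}{n_1}+\frac{1}{n_2}}}\sim N(0,1)$$
$$t=\frac{\dfrac{\bar{Y}-\bar{X}-(\mu_2-\mu_1)}{\sigma\sqrt{\frac{1}{n_1}+\frac{1}{n_2}}}}{\sqrt{\dfrac{(n_1+n_2-2)S_p^2}{\sigma^2}\cdot\dfrac{1}{n_1+n_2-2}}}\;\Rightarrow\;\frac{\bar{Y}-\bar{X}-(\mu_2-\mu_1)}{S_p\sqrt{\frac{1}{n_1}+\frac{1}{n_2}}}\sim t(n_1+n_2-2)$$
$$\therefore1-\alpha=P\left[t_{\alpha/2}(n_1+n_2-2)<\frac{\bar{Y}-\bar{X}-(\mu_2-\mu_1)}{S_p\sqrt{\frac{1}{n_1}+\frac{1}{n_2}}}<t_{1-\alpha/2}(n_1+n_2-2)\right]$$
$$\Rightarrow\left(\bar{Y}-\bar{X}-t_{1-\alpha/2}(n_1+n_2-2)\,S_p\sqrt{\frac{1}{n_1}+\frac{1}{n_2}},\ \bar{Y}-\bar{X}+t_{1-\alpha/2}(n_1+n_2-2)\,S_p\sqrt{\frac{1}{n_1}+\frac{1}{n_2}}\right)$$
is a 100(1-α)% CI for $\mu_2-\mu_1$. #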
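The pooled-variance interval of Thm 4.2 is sketched below. To keep the example dependency-free, the t percentile $t_{1-\alpha/2}(n_1+n_2-2)$ is passed in as an argument (for instance, looked up in a t table) rather than computed; the function name `pooled_t_ci` and the sample data are illustrative assumptions.

```python
import math
from statistics import fmean, variance

def pooled_t_ci(x, y, t_crit):
    """CI for mu2 - mu1 from Thm 4.2 (equal, unknown variances).

    t_crit is the percentile t_{1-alpha/2}(n1 + n2 - 2); it is passed in
    (e.g. from a t table) rather than computed here.
    """
    n1, n2 = len(x), len(y)
    sp2 = ((n1 - 1) * variance(x) + (n2 - 1) * variance(y)) / (n1 + n2 - 2)  # pooled S_p^2
    half = t_crit * math.sqrt(sp2) * math.sqrt(1 / n1 + 1 / n2)
    diff = fmean(y) - fmean(x)
    return diff - half, diff + half

# Illustrative data; 2.776 is the table value t_{0.975}(4) for n1 + n2 - 2 = 4.
print(pooled_t_ci([1, 2, 3], [3, 4, 5], t_crit=2.776))
```

`statistics.variance` uses the n-1 divisor, so it returns $S_1^2$ and $S_2^2$ as defined in the theorem.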
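*If $\sigma_1^2\neq\sigma_2^2$, both unknown, then an approximate PQ is:
$$\frac{\bar{Y}-\bar{X}-(\mu_2-\mu_1)}{\sqrt{\frac{S_1^2}{n_1}+\frac{S_2^2}{n_2}}}\;\dot\sim\;t(\gamma),\quad\text{where }\gamma=\frac{\left(\frac{S_1^2}{n_1}+\frac{S_2^2}{n_2}\right)^2}{\frac{(S_1^2/n_1)^2}{n_1-1}+\frac{(S_2^2/n_2)^2}{n_2-1}}$$
**Variance: $\frac{\sigma_2^2}{\sigma_1^2}$
❸ When $\mu_1,\mu_2$ are unknown
$$\frac{(n_1-1)S_1^2}{\sigma_1^2}\sim\chi^2(n_1-1),\quad\frac{(n_2-1)S_2^2}{\sigma_2^2}\sim\chi^2(n_2-1)$$
$$\therefore\frac{\frac{(n_1-1)S_1^2}{\sigma_1^2}\Big/(n_1-1)}{\frac{(n_2-1)S_2^2}{\sigma_2^2}\Big/(n_2-1)}=\frac{S_1^2}{S_2^2}\cdot\frac{\sigma_2^2}{\sigma_1^2}\sim F(n_1-1,n_2-1)$$
$$\therefore1-\alpha=P\left(f_{\alpha/2}(n_1-1,n_2-1)<\frac{S_1^2}{S_2^2}\cdot\frac{\sigma_2^2}{\sigma_1^2}<f_{1-\alpha/2}(n_1-1,n_2-1)\right)$$
$\therefore\left(\frac{S_2^2}{S_1^2}f_{\alpha/2}(n_1-1,n_2-1),\ \frac{S_2^2}{S_1^2}f_{1-\alpha/2}(n_1-1,n_2-1)\right)$ is a 100(1-α)% CI for $\frac{\sigma_2^2}{\sigma_1^2}$. #
❹ When $\mu_1$ and $\mu_2$ are both known
Find a 100(1-α)% CI for $\frac{\sigma_2^2}{\sigma_1^2}$ (See HW#5)
** $x_1,\dots,x_{n_1}\sim EXP(\theta_1)\perp Y_1,\dots,Y_{n_2}\sim EXP(\theta_2)$
I.B: Find a 100(1-α)% CI for the ratio of $\theta_1$ and $\theta_2$ (See HW#5)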
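****SUMMARY:
I. $x_i\sim EXP(\theta)$
 A: θ, with $x_1,\dots,x_n\sim EXP(\theta)$ (Ex 4.2): $\left(\frac{2n\bar{x}}{\chi^2_{1-\alpha/2}(2n)},\ \frac{2n\bar{x}}{\chi^2_{\alpha/2}(2n)}\right)$
 B: $x_1,\dots,x_{n_1}\sim EXP(\theta_1)$, $Y_1,\dots,Y_{n_2}\sim EXP(\theta_2)$: HW
II. $x_i\sim N(\mu,\sigma^2)$
 A: μ
  1) σ known (Ex 4.1): $\bar{x}\pm Z_{1-\alpha/2}\frac{\sigma}{\sqrt n}$
  2) σ unknown (Ex 4.5): $\bar{x}\pm t_{1-\alpha/2}(n-1)\frac{S}{\sqrt n}$
 B: σ²
  1) μ unknown (Ex 4.5): $\left(\frac{(n-1)S^2}{\chi^2_{1-\alpha/2}(n-1)},\ \frac{(n-1)S^2}{\chi^2_{\alpha/2}(n-1)}\right)$
  2) μ known (Ex 4.6): $\left(\frac{\sum(x_i-\mu)^2}{\chi^2_{1-\alpha/2}(n)},\ \frac{\sum(x_i-\mu)^2}{\chi^2_{\alpha/2}(n)}\right)$
III. $n_1$: $x_i\sim N(\mu_1,\sigma_1^2)$; $n_2$: $Y_j\sim N(\mu_2,\sigma_2^2)$
 A: $\mu_1-\mu_2$
  1) $\sigma_1^2,\sigma_2^2$ known (Case ❶, same approach as Ex 4.1): $\bar{x}-\bar{Y}\pm Z_{1-\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1}+\frac{\sigma_2^2}{n_2}}$
  2) $\sigma_1^2=\sigma_2^2=\sigma^2$ unknown (after Thm 4.2), CI for $\mu_2-\mu_1$: $\bar{Y}-\bar{X}\pm t_{1-\alpha/2}(n_1+n_2-2)\,S_p\sqrt{\frac{1}{n_1}+\frac{1}{n_2}}$, where $S_p^2$ is the pooled sample variance
 B: $\frac{\sigma_2^2}{\sigma_1^2}$
  1) $\mu_1,\mu_2$ unknown (Case ❸, after Thm 4.2): $\left(\frac{S_2^2}{S_1^2}f_{\alpha/2}(n_1-1,n_2-1),\ \frac{S_2^2}{S_1^2}f_{1-\alpha/2}(n_1-1,n_2-1)\right)$
  2) $\mu_1,\mu_2$ known: HW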

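Approximated CIs
Def 4.4: Consider a r.v. $x_n$ indexed by the sample size n (e.g., $\bar{x}_n=\frac{\sum_{i=1}^{n}x_i}{n}$). We say that $x_n$ converges in probability to a constant c iff for every ε > 0:
$$\lim_{n\to\infty}P(|x_n-c|<\varepsilon)=1,\quad\text{written }x_n\xrightarrow{P}c$$
Def 4.5: A sequence of r.v.'s $x_n$ is said to be consistent for $\tau(\theta)$ if $x_n\xrightarrow{P}\tau(\theta)$.
(Also called "simple consistency")
Consistency: The estimator $\hat\theta_n$ is said to be a consistent estimator of θ if, ∀ε > 0,
$$\lim_{n\to\infty}P(|\hat\theta_n-\theta|\le\varepsilon)=1$$
or equivalently
$$\hat\theta_n\xrightarrow{P}\theta$$
Def 4.6: A sequence of r.v.'s $x_n$ is called mean square error (MSE) consistent for $\tau(\theta)$ if
$$\lim_{n\to\infty}E\left[(x_n-\tau(\theta))^2\right]=0\ \text{for every }\theta\in\Omega$$
Def 4.7: A sequence of r.v.'s $x_n$ is said to be asymptotically unbiased for $\tau(\theta)$ if
$$\lim_{n\to\infty}E(x_n)=\tau(\theta)\ \text{for every }\theta\in\Omega$$
Note: $E(x_n)$ may differ from $\tau(\theta)$ for every finite n.
$$MSE(T_n)=E\left[(T_n-\tau(\theta))^2\right]=Var(T_n)+[E(T_n)-\tau(\theta)]^2$$
**∴ $MSE(T_n)\to0$ if $Var(T_n)\to0$ and $T_n$ is asymptotically unbiased.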

Thm 4.3: A sequence $T_n$ of estimators of $\tau(\theta)$ is MSE consistent iff
1) $\lim_{n\to\infty}Var(T_n)=0$ and
2) $\lim_{n\to\infty}E(T_n)=\tau(\theta)$
Thm 4.4: If a sequence $T_n$ is MSE consistent, it is also simply consistent.
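Ex 4.7: RS $x_i$'s $\sim EXP(\theta)$, $T_n=\frac{1}{\bar{x}}$. Show that $T_n$ is biased for $\frac{1}{\theta}$, but it is simply consistent for $\frac{1}{\theta}$.
Sol: $\because\frac{2n\bar{x}}{\theta}\sim\chi^2(2n)$, and for $Y\sim\chi^2(k)$, $E\left(\frac{1}{Y}\right)=\frac{1}{k-2}$ and $E\left(\frac{1}{Y^2}\right)=\frac{1}{(k-2)(k-4)}$:
$$\Rightarrow E(T_n)=E\left(\frac{1}{\bar{x}}\right)=E\left(\frac{2n}{\theta}\cdot\frac{\theta}{2n\bar{x}}\right)=\frac{2n}{\theta}\cdot\frac{1}{2(n-1)}=\frac{n}{n-1}\cdot\frac{1}{\theta}\neq\frac{1}{\theta}$$
$$Var(T_n)=E(T_n^2)-[E(T_n)]^2=\frac{n^2}{(n-1)(n-2)\theta^2}-\frac{n^2}{(n-1)^2\theta^2}=\frac{n^2}{(n-1)^2(n-2)\theta^2}$$
As $n\to\infty$:
$$\lim_{n\to\infty}E(T_n)=\lim_{n\to\infty}\frac{n}{n-1}\cdot\frac{1}{\theta}=\frac{1}{\theta}\quad\text{(asymptotically unbiased)}$$
$$\lim_{n\to\infty}Var(T_n)=\lim_{n\to\infty}\frac{n^2}{(n-1)^2(n-2)\theta^2}=0$$
By Thm 4.3, $T_n$ is MSE consistent.
$\therefore$ By Thm 4.4, $T_n$ is also simply consistent. #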
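Sufficient Condition for Consistency of an Unbiased Estimator:
An unbiased estimator $\hat\theta$ of θ is a consistent estimator for θ if $\lim_{n\to\infty}Var(\hat\theta)=0$.
Chebyshev's Inequality: $P(|X-\mu|\ge\varepsilon)\le\frac{E[(X-\mu)^2]}{\varepsilon^2}$
Test for Consistency: Let $\hat\theta$ be an estimator of θ and let $Var(\hat\theta)$ be finite. If
$$\lim_{n\to\infty}E\left[(\hat\theta-\theta)^2\right]=0$$
then $\hat\theta$ is a consistent estimator of θ.
Proof: Using Chebyshev's inequality, we obtain
$$P\left(|\hat\theta-\theta|\ge\varepsilon\right)\le\frac{E\left[(\hat\theta-\theta)^2\right]}{\varepsilon^2}$$
Because
$$\lim_{n\to\infty}E\left[(\hat\theta-\theta)^2\right]=0\quad\text{by hypothesis,}$$
the right-hand side converges to zero. Thus,
$$\lim_{n\to\infty}P\left(|\hat\theta-\theta|\ge\varepsilon\right)=0\ \text{for every }\varepsilon>0,$$
which is exactly the definition of consistency. Consequently, $\hat\theta$ is a consistent estimator of θ.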

Procedure to test for consistency:
1) Check whether the estimator $\hat\theta$ is unbiased or not.
2) Calculate $Var(\hat\theta)$ and $B(\hat\theta)$, the bias of $\hat\theta$.
3) An unbiased estimator is consistent if $Var(\hat\theta)\to0$ as $n\to\infty$.
4) A biased estimator is consistent if both $Var(\hat\theta)\to0$ and $B(\hat\theta)\to0$ as $n\to\infty$.

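Ex. Let $X_1,\dots,X_n$ be a random sample from a $N(\mu,\sigma^2)$ population.
1) Show that the sample variance $S^2$ is a consistent estimator for σ².
2) Show that the maximum likelihood estimators for μ and σ² are consistent estimators for μ and σ².
Sol:
a) We have already seen that $E(S^2)=\sigma^2$, and hence $S^2$ is an unbiased estimator of σ². Because the sample is drawn from a normal distribution, we know that $\frac{(n-1)S^2}{\sigma^2}$ has a chi-square distribution with (n-1) dof, and
$$Var\left(\frac{(n-1)S^2}{\sigma^2}\right)=2(n-1)$$
Thus,
$$2(n-1)=Var\left(\frac{(n-1)S^2}{\sigma^2}\right)=\frac{(n-1)^2}{\sigma^4}Var(S^2)\;\Rightarrow\;Var(S^2)=\frac{2\sigma^4}{n-1}\to0\ \text{as }n\to\infty$$
Hence $S^2$ is a consistent estimator of the variance of a normal population.
b) We have seen that the MLE of μ is $\hat\mu=\bar{X}$, and that of σ² is $\hat\sigma^2=\frac{1}{n}\sum_{i=1}^{n}(X_i-\bar{X})^2$. Now $\hat\mu$ is an unbiased estimator of μ, and $Var(\bar{X})=\frac{\sigma^2}{n}\to0$ as $n\to\infty$. Therefore, by the sufficient condition for consistency above, $\bar{X}$ is a consistent estimator of μ.

Now we will use the identity
$$E\left[(\hat\theta-\theta)^2\right]=Var(\hat\theta)+[B(\hat\theta)]^2$$
The MLE for σ² is biased with
$$E(\hat\sigma^2)=\frac{n-1}{n}\sigma^2$$
and
$$B(\hat\sigma^2)=\frac{n-1}{n}\sigma^2-\sigma^2=-\frac{1}{n}\sigma^2$$
Also, $\hat\sigma^2=\frac{1}{n}\sum(X_i-\bar{X})^2=\frac{n-1}{n}S^2$. Using part (a), we get
$$Var(\hat\sigma^2)=\left(\frac{n-1}{n}\right)^2Var(S^2)=\left(\frac{n-1}{n}\right)^2\frac{2\sigma^4}{n-1}=\frac{2(n-1)\sigma^4}{n^2}$$
Therefore,
$$\lim_{n\to\infty}B(\hat\sigma^2)=\lim_{n\to\infty}\left(-\frac{\sigma^2}{n}\right)=0,\qquad\lim_{n\to\infty}Var(\hat\sigma^2)=\lim_{n\to\infty}\frac{2(n-1)\sigma^4}{n^2}=0$$
By the test for consistency, $\hat\sigma^2=\frac{1}{n}\sum(X_i-\bar{X})^2$ is a consistent estimator of σ².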
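Thm. 4.5: If $T_n\xrightarrow{P}\tau(\theta)$, then $g(T_n)\xrightarrow{P}g(\tau(\theta))$ for any continuous function g(t). (Invariance Property)
Ex: in probability, $\lim g(T_n)=g(\lim T_n)=g(\tau(\theta))$; e.g., for $x_i\sim EXP(\theta)$, $\bar{x}\xrightarrow{P}\theta$, so $\frac{1}{\bar{x}}\xrightarrow{P}\frac{1}{\theta}$.
But $E[g(T_n)]\neq g(E[T_n])$ in general; e.g., in Ex 4.7, $E\left(\frac{1}{\bar{x}}\right)=\frac{n}{n-1}\cdot\frac{1}{\theta}\neq\frac{1}{E(\bar{x})}=\frac{1}{\theta}$.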

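Difference between unbiasedness and consistency****:
Ex: $T_n=\frac{\sum_{i=1}^{n}x_i}{n+1}$ is an estimator for E[X]; show $T_n\xrightarrow{P}E[X]$.
1) $\lim_{n\to\infty}E(T_n)=\lim_{n\to\infty}E\left(\frac{\sum x_i}{n+1}\right)=\lim_{n\to\infty}\frac{n}{n+1}E(X)=E(X)$
2) $\lim_{n\to\infty}Var(T_n)=\lim_{n\to\infty}Var\left(\frac{\sum x_i}{n+1}\right)=\lim_{n\to\infty}\frac{n\sigma^2}{(n+1)^2}=0$
$T_n$ is MSE consistent for E(X) $\xrightarrow{\text{Thm 4.4}}T_n\xrightarrow{P}E(X)$
However, $E(T_n)=E\left(\frac{\sum x_i}{n+1}\right)=\frac{n}{n+1}E\left(\frac{\sum x_i}{n}\right)=\frac{n}{n+1}E(X)\neq E(X)$ → biased
Consistency ↛ unbiasedness
Ex. $T_n=\frac{1}{2}x_1+\frac{1}{2}\cdot\frac{\sum_{i=2}^{n}x_i}{n-1}$: $E(T_n)=\frac{1}{2}E(x_1)+\frac{1}{2}E\left(\frac{\sum_{i=2}^{n}x_i}{n-1}\right)=\frac{1}{2}E(X)+\frac{1}{2}\cdot\frac{(n-1)E(X)}{n-1}=E(X)$
1) Unbiased, hence asymptotically unbiased
2) $\lim_{n\to\infty}Var(T_n)=\lim_{n\to\infty}\left[\frac{1}{4}Var(x_1)+\frac{1}{4}Var\left(\frac{\sum_{i=2}^{n}x_i}{n-1}\right)\right]=\lim_{n\to\infty}\left[\frac{1}{4}\sigma^2+\frac{1}{4}\cdot\frac{\sigma^2}{n-1}\right]=\frac{1}{4}\sigma^2\neq0$
⇒ Not consistent
Consistency ↚ unbiasedness (unbiasedness does not imply consistency)
Ex. $T_n=\frac{\sum_{i=1}^{n}x_i}{2n}$: $\lim_{n\to\infty}Var(T_n)=\lim_{n\to\infty}Var\left(\frac{\sum x_i}{2n}\right)=\lim_{n\to\infty}\frac{\sigma^2}{4n}=0$, but
$$E(T_n)=E\left(\frac{1}{2}\cdot\frac{\sum x_i}{n}\right)=\frac{1}{2}E(X)\neq E(X)$$
$Var(T_n)\xrightarrow{n\to\infty}0$ ↛ unbiasedness (here $T_n\xrightarrow{P}\frac{1}{2}E(X)$, so it is not consistent for E(X) either)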

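Def 4.8: If $Y_n\sim G_n(y)$, then $Y_n$ is said to converge in distribution to $Y\sim G(y)$ if
$$\lim_{n\to\infty}G_n(y)=G(y)\ \text{at every point y where G is continuous. Denoted: }Y_n\xrightarrow{d}Y$$
Ex 4.8: RS $x_1,\dots,x_n\sim UNIF(0,1)$. If $Y_n=X_{n:n}$, show $Y_n\xrightarrow{d}Y$, where $Y\sim G(y)=\begin{cases}0, & y<1\\1, & 1\le y\end{cases}$
Sol: $\because G_n(y)=\begin{cases}0, & y\le0\\y^n, & 0<y<1\\1, & 1\le y\end{cases}\xrightarrow{n\to\infty}\begin{cases}0, & y<1\\1, & 1\le y\end{cases}=G(y)$
Thm 4.6: Slutsky's Theorem: If $X_n\xrightarrow{P}C$ and $Y_n\xrightarrow{d}Y$, then
1) $X_n+Y_n\xrightarrow{d}C+Y$
2) $X_nY_n\xrightarrow{d}CY$
3) $\frac{Y_n}{X_n}\xrightarrow{d}\frac{Y}{C}$, $C\neq0$
Thm 4.7: If $Y_n\xrightarrow{d}Y$, then $g(Y_n)\xrightarrow{d}g(Y)$ for any continuous function g independent of n. (Invariance Property)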
Thm 4.8: CLT: For a RS of $x_i$'s from any distribution with mean μ and variance $\sigma^2<\infty$,
$$\frac{\frac{\sum_{i=1}^{n}x_i}{n}-\mu}{\sigma/\sqrt n}\xrightarrow{d}N(0,1),\quad n\to\infty$$
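Ex. 4.9: RS $x_i$'s $\sim BIN(1,p)$. Find a quantity that converges to N(0,1).
Sol: We know $\hat p=\frac{\sum x_i}{n}=\bar{X}$ and $Var(X)=p(1-p)$.
$$\Rightarrow\frac{\bar{X}-E(\bar{X})}{\sqrt{Var(\bar{X})}}=\frac{\hat p-p}{\sqrt{\frac{p(1-p)}{n}}}\xrightarrow{d}N(0,1)\ \text{by the CLT}$$
Since $\hat p\xrightarrow{P}p\Rightarrow\hat p(1-\hat p)\xrightarrow{P}p(1-p)$,
$$\Rightarrow\frac{\hat p(1-\hat p)}{p(1-p)}\xrightarrow{P}1\;\Rightarrow\;\sqrt{\frac{\hat p(1-\hat p)}{p(1-p)}}\xrightarrow{P}1$$
$$\frac{\hat p-p}{\sqrt{\frac{\hat p(1-\hat p)}{n}}}=\frac{\dfrac{\hat p-p}{\sqrt{p(1-p)/n}}}{\sqrt{\dfrac{\hat p(1-\hat p)}{p(1-p)}}}\xrightarrow{d}\frac{N(0,1)}{1}=N(0,1)\qquad\text{Note: numerator}\xrightarrow{d}N(0,1),\ \text{denominator}\xrightarrow{P}1$$
$$\therefore\frac{\hat p-p}{\sqrt{\frac{\hat p(1-\hat p)}{n}}}\xrightarrow{d}N(0,1),\ \text{i.e.,}\sim AN(0,1)$$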
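Inverting the approximate pivot of Ex 4.9 gives the familiar large-sample interval $\hat p\pm Z_{1-\alpha/2}\sqrt{\hat p(1-\hat p)/n}$. A minimal sketch (the function name `wald_ci` is hypothetical; the interval is only reliable for large n with $\hat p$ away from 0 and 1):

```python
import math
from statistics import NormalDist

def wald_ci(successes, n, conf=0.95):
    """Large-sample CI for p from the pivot (p_hat - p) / sqrt(p_hat(1 - p_hat)/n) ~ AN(0, 1)."""
    p_hat = successes / n
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    half = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half, p_hat + half

print(wald_ci(50, 100))  # approximately (0.402, 0.598)
```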
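Thm 4.9: If $Y_\gamma\sim\chi^2(\gamma)$, then $Z_\gamma=\frac{Y_\gamma-\gamma}{\sqrt{2\gamma}}\xrightarrow{d}Z\sim N(0,1)$ as $\gamma\to\infty$.
Proof: $\because Y_\gamma=\sum_{i=1}^{\gamma}x_i$, where the $x_i\sim\chi^2(1)$ are independent,
$$\therefore\bar{X}_\gamma=\frac{\sum_{i=1}^{\gamma}x_i}{\gamma}=\frac{Y_\gamma}{\gamma}\quad\text{and}\quad E(\bar{X}_\gamma)=1,\ Var(\bar{X}_\gamma)=\frac{Var(x_i)}{\gamma}=\frac{2}{\gamma}$$
By the CLT:
$$\frac{\bar{X}_\gamma-E(\bar{X}_\gamma)}{\sqrt{Var(\bar{X}_\gamma)}}=\frac{\frac{Y_\gamma}{\gamma}-1}{\sqrt{\frac{2}{\gamma}}}=\frac{Y_\gamma-\gamma}{\sqrt{2\gamma}}\xrightarrow{d}N(0,1)\ \#$$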
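Thm 4.10:
If $\frac{\sqrt n\,(X_n-\mu)}{\sigma}\xrightarrow{d}N(0,1)$ (write $X_n\sim AN\left(\mu,\frac{\sigma^2}{n}\right)$; Note: AN = Approximately Normal),
and if $g(x)$ is differentiable at μ with $g'(\mu)\neq0$,
$$\text{then}\quad\frac{g(X_n)-g(\mu)}{|\sigma g'(\mu)|/\sqrt n}\sim AN(0,1)$$
***Note: This result is important because: If $X_n\sim AN\left(\mu,\frac{\sigma^2}{n}\right)$, then $g(X_n)\sim AN\left(g(\mu),\frac{[g'(\mu)]^2\sigma^2}{n}\right)$.
Thm 4.11 and 4.12: RS $x_i$'s from any distribution with mean μ and $\sigma^2<\infty$; then $\bar{X}_n\xrightarrow{P}\mu$ and $S_n^2\xrightarrow{P}\sigma^2$ (or $S_n\xrightarrow{P}\sigma$).
Proof: (Use Chebyshev's Inequality)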
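The delta method of Thm 4.10 can be checked by simulation. In the sketch below the example is an assumption, not from the notes: $x_i\sim EXP(\theta=2)$ (so μ = 2, σ² = 4) and g(x) = 1/x, giving the delta-method variance $[g'(\mu)]^2\sigma^2/n=\frac{1}{\mu^2n}$; n = 200, the replication count, and the seed are arbitrary. Python's `random.expovariate` takes the rate 1/μ.

```python
import random
import statistics

def delta_method_check(n=200, reps=5000, seed=4):
    """Compare simulated Var(g(xbar)) with the delta-method value (Thm 4.10).

    Assumed example: x_i ~ EXP(theta = 2), so mu = 2, sigma^2 = 4, and
    g(x) = 1/x with g'(mu) = -1/mu^2. Delta method:
    Var(g(xbar)) ~ g'(mu)^2 * sigma^2 / n = 1 / (mu^2 n) here.
    """
    rng = random.Random(seed)
    mu, sigma2 = 2.0, 4.0
    gs = []
    for _ in range(reps):
        xbar = statistics.fmean(rng.expovariate(1 / mu) for _ in range(n))
        gs.append(1 / xbar)
    delta_var = (1 / mu**2) ** 2 * sigma2 / n
    return statistics.pvariance(gs), delta_var

print(delta_method_check())  # the two variances should be close for large n
```

For comparison, the exact variance from Ex 4.7, $\frac{n^2}{(n-1)^2(n-2)\theta^2}$, is about 0.00128 here versus the delta-method 0.00125.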

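Ex. 4.10: RS $X_i$'s $\sim N(\mu,\sigma^2)$. Find the limiting distribution of $S_n^2$.
Sol: We know $V_n=\frac{(n-1)S_n^2}{\sigma^2}\sim\chi^2(n-1)$.
By Thm 4.9,
$$\frac{V_n-(n-1)}{\sqrt{2(n-1)}}\xrightarrow{d}Z\sim N(0,1)$$
$$\frac{\frac{(n-1)S_n^2}{\sigma^2}-(n-1)}{\sqrt{2(n-1)}}\ \overset{\text{simplify}}{=}\ \frac{\sqrt{n-1}\,(S_n^2-\sigma^2)}{\sqrt2\,\sigma^2}=\frac{S_n^2-\sigma^2}{\sqrt{\frac{2\sigma^4}{n-1}}}\sim AN(0,1)$$
$$\therefore S_n^2\sim AN\left(\sigma^2,\frac{2\sigma^4}{n-1}\right)\ \#$$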
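Thm 4.13: RS $X_i$'s from any distribution with mean μ and variance $\sigma^2<\infty$; then $\frac{\bar{X}-\mu}{S/\sqrt n}\sim AN(0,1)$.
Proof:
$$\frac{\bar{X}-\mu}{S/\sqrt n}=\frac{\bar{X}-\mu}{\sigma/\sqrt n}\cdot\frac{\sigma}{S}\xrightarrow{d}N(0,1)\cdot1=N(0,1)$$
because $S\xrightarrow{P}\sigma\Rightarrow\frac{\sigma}{S}\xrightarrow{P}1$ (Slutsky's Theorem). #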

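Claim:
Let $x_1, \dots, x_n$ be a random sample from pdf $f(x; \theta)$, $\theta \in \Omega$. Then the MLE $\hat\theta_n$ is asymptotically normal and asymptotically efficient, i.e.:
$\hat\theta_n \sim AN\left(\theta, \frac{1}{n\,E\left[\left(\frac{\partial}{\partial\theta}\ln f(X;\theta)\right)^2\right]}\right)$
if the following regularity conditions are satisfied:
0) The pdfs are distinct, i.e., $\theta \neq \theta' \Rightarrow f(x_i; \theta) \neq f(x_i; \theta')$.
1) The pdfs have common support for all $\theta$.
2) The true value $\theta_0$ is an interior point of $\Omega$.
3) The integral $\int f(x;\theta)\,dx$ can be differentiated twice under the integral sign as a function of $\theta$.
4) The pdf $f(x;\theta)$ is three times differentiable as a function of $\theta$. Further, $\forall \theta_0 \in \Omega$, $\exists$ a constant $c > 0$ and a function $M(x)$ such that
$\left|\frac{\partial^3}{\partial\theta^3}\ln f(x;\theta)\right| \le M(x)$, with $E_{\theta_0}[M(X)] < \infty$, $\forall\, \theta_0 - c < \theta < \theta_0 + c$, $\forall x$ in the support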
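As a hedged illustration of the claim, the sketch below assumes an exponential model parametrized by its rate λ, $f(x;\lambda) = \lambda e^{-\lambda x}$, for which the MLE is $\hat\lambda = 1/\bar x$ and $E[(\partial_\lambda \ln f(X;\lambda))^2] = 1/\lambda^2$, so the claim gives $\hat\lambda \sim AN(\lambda, \lambda^2/n)$. The model, the parameter values and the function name are assumptions made only for this example.

```python
import math
import random
import statistics


def mle_rate_draws(lam=2.0, n=200, reps=4000, seed=11):
    """Draw `reps` MLEs lambda_hat = 1 / xbar from EXP samples with rate `lam`."""
    rng = random.Random(seed)
    return [1.0 / statistics.fmean(rng.expovariate(lam) for _ in range(n))
            for _ in range(reps)]


lam, n = 2.0, 200
draws = mle_rate_draws(lam, n)
# sqrt of 1 / (n * I(lambda)) with Fisher information I(lambda) = 1 / lambda^2
crlb_sd = math.sqrt(lam**2 / n)
print(f"mean of MLEs = {statistics.fmean(draws):.3f} (true lambda = {lam})")
print(f"SD of MLEs = {statistics.stdev(draws):.4f}, sqrt(CRLB) = {crlb_sd:.4f}")
```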

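IV: Approximated PQ (assume n large)
1) RS $x_i$'s from any distribution with mean $\mu$ and $\sigma^2$ both unknown
Approximated PQ for $\mu$: $\frac{\bar x - \mu}{\sigma/\sqrt{n}} \sim AN(0,1)$ by CLT, but $\sigma$ is unknown, so replace $\sigma$ with $S$ (Thm 4.13):
$\frac{\bar x - \mu}{S/\sqrt{n}} \sim AN(0,1)$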
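2) $x_1, \dots, x_{n_1} \perp Y_1, \dots, Y_{n_2}$ from any distributions with $(\mu_1, \sigma_1^2)$ and $(\mu_2, \sigma_2^2)$, all unknown
Approximated PQ for $\mu_2 - \mu_1$:
$\frac{\bar Y - \bar X - (\mu_2 - \mu_1)}{\sqrt{\frac{S_1^2}{n_1} + \frac{S_2^2}{n_2}}} \sim AN(0,1)$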
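3) $x_1, \dots, x_n \sim BIN(1, p)$:
Approximated PQ for $p$: $\frac{\bar X - p}{\sqrt{\frac{p(1-p)}{n}}} \sim AN(0,1)$
$\rightarrow \frac{\bar X - p}{\sqrt{\frac{\bar X(1-\bar X)}{n}}} \sim AN(0,1)$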
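4) $x_1, \dots, x_{n_1} \perp Y_1, \dots, Y_{n_2}$ from $BIN(1, p_1)$ and $BIN(1, p_2)$, respectively.
Approximated PQ for $p_2 - p_1$:
$\bar X \sim AN\left(p_1, \frac{\sigma_1^2}{n_1}\right)$, $\bar Y \sim AN\left(p_2, \frac{\sigma_2^2}{n_2}\right)$, with $\sigma_1^2 = p_1(1-p_1)$, $\sigma_2^2 = p_2(1-p_2)$
$\Rightarrow \bar Y - \bar X \sim AN\left(p_2 - p_1, \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}\right)$
$\Rightarrow \frac{\bar Y - \bar X - (p_2 - p_1)}{\sqrt{\frac{\bar Y(1-\bar Y)}{n_2} + \frac{\bar X(1-\bar X)}{n_1}}} \sim AN(0,1)$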
5) $x_1, \dots, x_n$ from $POI(\mu)$:
Approximated PQ for $\mu$ (see Quiz):
$\frac{\bar X - \mu}{\sqrt{\mu/n}} \sim AN(0,1)$
$\frac{\bar X - \mu}{\sqrt{\bar X/n}} \sim AN(0,1)$

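****Summary:****
Convergence (stronger ⇒ weaker): MSE consistent ⇒ simply consistent ($X_n \xrightarrow{p} \mu$) ⇒ convergence in distribution ($X_n \xrightarrow{d} X$)
1) MSE consistent ⇔ (i) asymptotically unbiased and (ii) $\lim_{n\to\infty} Var(X_n) = 0$; MSE consistent ⇒ $X_n \xrightarrow{p} \mu$
   If $X_n \xrightarrow{p} \mu$ (const), then $g(X_n) \xrightarrow{p} g(\mu)$, g is continuous
   If $Y_n \xrightarrow{d} Y$ (r.v.), then $g(Y_n) \xrightarrow{d} g(Y)$, g is continuous
2) Slutsky ($X_n \xrightarrow{p} \mu$, $Y_n \xrightarrow{d} Y$): $X_n + Y_n \xrightarrow{d} \mu + Y$, $X_n \cdot Y_n \xrightarrow{d} \mu Y$
3) CLT: $\bar X_n \sim AN\left(\mu, \frac{\sigma^2}{n}\right)$; also $S_n \xrightarrow{p} \sigma$
4) If $X_n \sim AN\left(\mu, \frac{\sigma^2}{n}\right)$, then $g(X_n) \sim AN\left(g(\mu), \frac{[g'(\mu)]^2\sigma^2}{n}\right)$
5) $\frac{\bar X - \mu}{S/\sqrt{n}} \sim AN(0,1)$
6) $\hat\theta \sim AN(\theta, CRLB(\theta))$ with regularity conditions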

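Case: 100(1−α)% CI:
a) xi's are Normal and σ known: $\left(\bar X - Z_{1-\alpha/2}\frac{\sigma}{\sqrt{n}},\ \bar X + Z_{1-\alpha/2}\frac{\sigma}{\sqrt{n}}\right)$
b) xi's are Normal but σ unknown: $\left(\bar X - t_{1-\alpha/2}(n-1)\frac{S}{\sqrt{n}},\ \bar X + t_{1-\alpha/2}(n-1)\frac{S}{\sqrt{n}}\right)$
c) xi's are NOT Normal but n large: $\left(\bar X - Z_{1-\alpha/2}\frac{S}{\sqrt{n}},\ \bar X + Z_{1-\alpha/2}\frac{S}{\sqrt{n}}\right)$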
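Ex. 4.11: In order to compare the speed of two types of CPUs, Intel Pentium 4 (I) vs AMD Athlon XP (II), 18 random samples of Pentium 4 were taken: the average processing time is $\bar x = 1198$, $S_1 = 3.2$. 12 random samples of Athlon XP were taken: the average processing time is $\bar y = 1202$, $S_2 = 4.3$. Assume both samples are Normal with equal variances.
What is the 95% CI for $\mu_2 - \mu_1$?
Sol: $\because \sigma_1^2 = \sigma_2^2 = \sigma^2$ unknown, this is Case 3 (see the CI case table).
$\therefore$ the 100(1−α)% CI for $\mu_2 - \mu_1$ is:
$\left(\bar y - \bar x - t_{1-\alpha/2}(n_1+n_2-2)\,S_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}},\ \bar y - \bar x + t_{1-\alpha/2}(n_1+n_2-2)\,S_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}\right)$,
where $S_p^2 = \frac{(n_1-1)S_1^2 + (n_2-1)S_2^2}{n_1+n_2-2}$
$\Rightarrow \alpha = 0.05$, $n_1 = 18$, $n_2 = 12$, $n_1+n_2-2 = 28$, $t_{0.975}(28) = 2.048$
$\Rightarrow S_p = 3.67$
$\therefore$ the 95% CI for $\mu_2 - \mu_1$ is $(1.198, 6.802)$ #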
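The arithmetic of Ex 4.11 can be checked in a few lines. The t quantile 2.048 is passed in from the t table, as in the notes (the Python standard library has no t distribution); `pooled_t_ci` is a hypothetical helper name.

```python
import math


def pooled_t_ci(xbar, s1, n1, ybar, s2, n2, t_quantile):
    """CI for mu2 - mu1 assuming normal samples with a common unknown variance.

    t_quantile is t_{1-alpha/2}(n1 + n2 - 2), read from a t table.
    """
    sp = math.sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))  # pooled S_p
    margin = t_quantile * sp * math.sqrt(1 / n1 + 1 / n2)
    diff = ybar - xbar
    return sp, (diff - margin, diff + margin)


sp, ci = pooled_t_ci(xbar=1198, s1=3.2, n1=18, ybar=1202, s2=4.3, n2=12, t_quantile=2.048)
print(round(sp, 2), tuple(round(v, 3) for v in ci))  # 3.67 (1.198, 6.802)
```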
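Ex. 4.12: In Ex 4.11, suppose now we take $n_1 = 21$ samples and obtain $S_1 = 3.2$, and $n_2 = 16$ and $S_2 = 4.3$. Assume the samples are independent and follow $N(\mu_1, \sigma_1^2)$ and $N(\mu_2, \sigma_2^2)$, where $\mu_i, \sigma_i^2$, $i = 1, 2$ are unknown. What is the 90% CI for $\sigma_2^2/\sigma_1^2$?
Sol: This is Case 3.B.1: since $\frac{S_1^2/\sigma_1^2}{S_2^2/\sigma_2^2} \sim F(n_1-1, n_2-1)$, the CI is
$\left(\frac{S_2^2}{S_1^2} f_{\alpha/2}(n_1-1, n_2-1),\ \frac{S_2^2}{S_1^2} f_{1-\alpha/2}(n_1-1, n_2-1)\right)$
$\Rightarrow n_1 = 21$, $S_1 = 3.2$, $n_2 = 16$, $S_2 = 4.3$, $\alpha = 0.1$
$f_{1-\alpha/2}(n_1-1, n_2-1) = f_{0.95}(20, 15) = 2.33$
$f_{\alpha/2}(n_1-1, n_2-1) = f_{0.05}(20, 15) = \frac{1}{f_{0.95}(15, 20)} = \frac{1}{2.20} \approx 0.45$
$\therefore$ the 90% CI for $\frac{\sigma_2^2}{\sigma_1^2}$ is $(0.813, 4.207)$ #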
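The F-based interval of Ex 4.12 reduces to two multiplications once the table values are known. In this sketch the F quantiles are passed in from an F table (no F distribution in the standard library), and `variance_ratio_ci` is a hypothetical name.

```python
def variance_ratio_ci(s1, s2, f_lower, f_upper):
    """CI for sigma2^2 / sigma1^2 from two independent normal samples.

    f_lower = f_{alpha/2}(n1 - 1, n2 - 1) and f_upper = f_{1-alpha/2}(n1 - 1, n2 - 1)
    are read from an F table; the degrees of freedom enter only through them.
    """
    ratio = s2**2 / s1**2           # S2^2 / S1^2
    return ratio * f_lower, ratio * f_upper


lo, hi = variance_ratio_ci(s1=3.2, s2=4.3, f_lower=0.45, f_upper=2.33)
print(f"({lo:.3f}, {hi:.3f})")  # (0.813, 4.207)
```

Passing the unrounded `f_lower=1/2.20` instead of 0.45 moves the lower endpoint to about 0.821.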
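Ex 4.13: Assume that at HSBC, the amount of a customer's deposit follows $N(\mu, \sigma^2)$, where $\sigma = 1000$ and $\mu$ is unknown. Suppose we want to estimate $\mu$ so that the absolute error (precision) is at most 50 dollars at a 95% confidence level. What sample size is needed?
Sol: WANT: $P(|\bar x - \mu| \le 50) = 0.95$
$\Rightarrow P(-50 \le \bar x - \mu \le 50) = 0.95$
$\Rightarrow P\left(\frac{-50}{\sigma/\sqrt{n}} \le \frac{\bar x - \mu}{\sigma/\sqrt{n}} \le \frac{50}{\sigma/\sqrt{n}}\right) = 0.95$
$\Rightarrow \frac{50}{\sigma/\sqrt{n}} = Z_{1-\alpha/2} = Z_{0.975} = 1.96$
$\Rightarrow n = \left(\frac{1.96\,\sigma}{50}\right)^2 = 1536.64 \xrightarrow{\text{round up}} n = 1537$ #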
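The sample-size formula of Ex 4.13 can be wrapped in a small function. This sketch assumes Python 3.8+ for `statistics.NormalDist`, whose `inv_cdf` supplies $Z_{1-\alpha/2}$ instead of a table; `n_for_mean` is a hypothetical name.

```python
import math
from statistics import NormalDist


def n_for_mean(sigma, d, conf):
    """Smallest n with P(|xbar - mu| <= d) >= conf for normal data with known sigma.

    Solves d / (sigma / sqrt(n)) = z_{1 - alpha/2} for n and rounds up.
    """
    alpha = 1 - conf
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return math.ceil((z * sigma / d) ** 2)


print(n_for_mean(sigma=1000, d=50, conf=0.95))  # 1537
```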
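Ex 4.14: Buying a 30-sec commercial break during the telecast of Super Bowl XXIX (the 29th Super Bowl, the championship game of the NFL) cost approximately 1,000,000 dollars. Potential sponsors want to find out how many people might be watching. In a survey of 1015 potential viewers, 734 said they will watch more than a quarter of the advertisements aired during the game. We want a 90% CI for that probability.
Sol: Case IV.3
Assuming n = 1015 is large, let p = probability that a potential viewer watches the commercials, and
$X_i = \begin{cases} 1, & \text{when person } i \text{ watches} \\ 0, & \text{otherwise} \end{cases}$, $\bar X = \frac{\sum X_i}{n}$
Approximated PQ is $\frac{\bar X - p}{\sqrt{\bar X(1-\bar X)/n}}$:
$1 - \alpha \approx P\left[-Z_{1-\alpha/2} \le \frac{\bar X - p}{\sqrt{\frac{\bar X(1-\bar X)}{n}}} \le Z_{1-\alpha/2}\right]$
$= P\left[\bar X - Z_{1-\alpha/2}\sqrt{\frac{\bar X(1-\bar X)}{n}} \le p \le \bar X + Z_{1-\alpha/2}\sqrt{\frac{\bar X(1-\bar X)}{n}}\right]$
$\Rightarrow \bar X = \frac{734}{1015} \approx 0.72$, $Z_{1-\alpha/2} = Z_{0.95} = 1.64$
$\therefore$ the 90% CI for p is $(0.697, 0.743)$ #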
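The large-sample interval of Ex 4.14 can be sketched as below, assuming Python 3.8+ for `statistics.NormalDist`; `wald_ci` is a hypothetical helper name. Using the unrounded $\bar X = 734/1015 \approx 0.7232$ and $z \approx 1.6449$, the interval comes out at about (0.700, 0.746), slightly shifted from the (0.697, 0.743) obtained by rounding $\bar X$ to 0.72 first.

```python
import math
from statistics import NormalDist


def wald_ci(successes, n, conf):
    """Large-sample CI for p based on (Xbar - p) / sqrt(Xbar(1 - Xbar)/n) ~ AN(0, 1)."""
    p_hat = successes / n
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    se = math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - z * se, p_hat + z * se


lo, hi = wald_ci(734, 1015, conf=0.90)
print(f"({lo:.3f}, {hi:.3f})")  # (0.700, 0.746)
```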
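**** Question: What n is needed if we want $|\bar X - p| \le d$ with 95% confidence?
Sol: $1 - \alpha = P(|\bar X - p| \le d) = P(-d \le \bar X - p \le d) = P\left(\frac{-d}{\sqrt{p(1-p)/n}} \le \frac{\bar X - p}{\sqrt{p(1-p)/n}} \le \frac{d}{\sqrt{p(1-p)/n}}\right)$
When n is large, $\frac{d}{\sqrt{p(1-p)/n}} = Z_{1-\alpha/2}$
$\Rightarrow n = p(1-p)\cdot\left(\frac{Z_{1-\alpha/2}}{d}\right)^2$
Since $p(1-p) \le \frac{1}{4}$ for every p, the worst case is $n^* = \frac{1}{4}\left(\frac{Z_{1-\alpha/2}}{d}\right)^2$
If $d = 0.01$, $\alpha = 5\%$, then: $n^* = \frac{1}{4}\left(\frac{1.96}{0.01}\right)^2 = 9604$ #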
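The worst-case sample size for a proportion can be computed the same way. This sketch assumes Python 3.8+ for `statistics.NormalDist`; `n_for_proportion` and its optional `p` argument (a prior guess for p) are hypothetical.

```python
import math
from statistics import NormalDist


def n_for_proportion(d, conf, p=None):
    """Sample size so that |Xbar - p| <= d with probability about `conf`.

    Without a guess for p, use the worst case p(1 - p) <= 1/4.
    """
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    pq = 0.25 if p is None else p * (1 - p)
    return math.ceil(pq * (z / d) ** 2)


print(n_for_proportion(d=0.01, conf=0.95))  # 9604
```

Supplying a guess such as `p=0.72` (for instance from a pilot survey) lowers the requirement to 7745, because $p(1-p) = 0.2016 < 1/4$.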
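Ex 4.15: A manufacturer inspects a RS of 200 items from a process and finds that 20% are defective. After an improvement process, another RS of size 200 is taken and 12% are defective. Does the improvement process help at a 90% confidence level?
Sol: Case IV.4. Let $p_1$ = defective rate after improvement and $p_2$ = defective rate before improvement, so $\bar Y = 0.2$, $\bar X = 0.12$. Approximated PQ is:
$1 - \alpha \approx P\left[-Z_{1-\alpha/2} \le \frac{\bar Y - \bar X - (p_2 - p_1)}{\sqrt{\frac{\bar Y(1-\bar Y)}{n_2} + \frac{\bar X(1-\bar X)}{n_1}}} \le Z_{1-\alpha/2}\right]$
$= P\left[\bar Y - \bar X - Z_{1-\alpha/2}\sqrt{\frac{\bar Y(1-\bar Y)}{n_2} + \frac{\bar X(1-\bar X)}{n_1}} \le p_2 - p_1 \le \bar Y - \bar X + Z_{1-\alpha/2}\sqrt{\frac{\bar Y(1-\bar Y)}{n_2} + \frac{\bar X(1-\bar X)}{n_1}}\right]$
$\alpha = 0.1$, $n_1 = 200$, $n_2 = 200$
$\Rightarrow$ the 90% CI for $p_2 - p_1$ is $(0.02, 0.14)$ #
Since the whole interval lies above 0, the defective rate is lower after the improvement, i.e. the improvement process helps at the 90% confidence level.
Width $= 0.14 - 0.02 = 0.12$; $\frac{D}{\text{Width}} = \frac{0.02}{0.12} = \frac{2}{12} = \frac{1}{6} \approx 0.167$ #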
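The two-proportion interval of Ex 4.15 can be sketched as follows, assuming Python 3.8+ for `statistics.NormalDist`; the helper name `two_proportion_ci` is hypothetical. With the exact $z_{0.95} \approx 1.6449$ it reproduces (0.02, 0.14) to two decimals.

```python
import math
from statistics import NormalDist


def two_proportion_ci(x_bar, n1, y_bar, n2, conf):
    """Large-sample CI for p2 - p1, with Xbar ~ AN(p1, .) and Ybar ~ AN(p2, .)."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    se = math.sqrt(y_bar * (1 - y_bar) / n2 + x_bar * (1 - x_bar) / n1)
    diff = y_bar - x_bar
    return diff - z * se, diff + z * se


# Ex 4.15: Xbar = 0.12 (after improvement), Ybar = 0.20 (before improvement)
lo, hi = two_proportion_ci(x_bar=0.12, n1=200, y_bar=0.20, n2=200, conf=0.90)
print(f"({lo:.2f}, {hi:.2f})", "improvement helps" if lo > 0 else "no clear improvement")
```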
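Paired-Sample Method:
For example, when measuring the effectiveness of a diet plan, let $X_i$ and $Y_i$ be the weight of the ith individual before and after the implementation of the diet plan, respectively. Then $X_1, \dots, X_n$ would be independent (same for $Y_1, \dots, Y_n$), but the pair $(X_i, Y_i)$ would be dependent.
Note: If $(X_i, Y_i) \sim BVN(\mu_1, \mu_2, \sigma_1^2, \sigma_2^2, \rho)$, called BiVariate Normal, then $X_i \sim N(\mu_1, \sigma_1^2)$, $Y_i \sim N(\mu_2, \sigma_2^2)$ and $D_i = Y_i - X_i$ is normal (normal marginals alone do not imply bivariate normality).
Difference: $D_i = Y_i - X_i$, $i = 1, \dots, n$; $X_i, Y_i$ are not independent, but $D_i \perp D_j$ ($i \neq j$) and $D_i \sim N(\mu_D, \sigma_D^2)$
A PQ for $\mu_D = \mu_2 - \mu_1$: $T = \frac{\bar D - \mu_D}{S_D/\sqrt{n}} \sim t(n-1)$,
where $\bar D = \frac{\sum D_i}{n}$, $S_D^2 = \frac{\sum (D_i - \bar D)^2}{n-1}$
$\therefore$ the 100(1−α)% CI for $\mu_D$ is:
$\left(\bar d - t_{1-\alpha/2}(n-1)\frac{S_D}{\sqrt{n}},\ \bar d + t_{1-\alpha/2}(n-1)\frac{S_D}{\sqrt{n}}\right)$ #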

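Ex 4.16: A RS of 41 marginally overweight non-smoking men was taken and their blood pressure was measured. After a diet plan had been implemented for 3 months, their blood pressure was measured again. We want a 99% CI for $\mu_D = \mu_2 - \mu_1$.
Sol: $\alpha = 0.01$, $t_{1-\alpha/2}(n-1) = t_{0.995}(40) = 2.704$, $\bar d = 9$, $S_D = 2.6$
$\therefore$ the 99% CI for $\mu_D$ is
$\left(\bar d - t_{1-\alpha/2}(n-1)\frac{S_D}{\sqrt{n}},\ \bar d + t_{1-\alpha/2}(n-1)\frac{S_D}{\sqrt{n}}\right) = \left(9 - 2.704\cdot\frac{2.6}{\sqrt{41}},\ 9 + 2.704\cdot\frac{2.6}{\sqrt{41}}\right)$
$\Rightarrow (7.9, 10.1)$ #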
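The paired-sample interval only needs the summary statistics of the differences. In this sketch the t quantile 2.704 is passed in from the t table, and `paired_t_ci` is a hypothetical name.

```python
import math


def paired_t_ci(d_bar, s_d, n, t_quantile):
    """CI for mu_D from the paired differences D_i.

    t_quantile = t_{1 - alpha/2}(n - 1), read from a t table.
    """
    margin = t_quantile * s_d / math.sqrt(n)
    return d_bar - margin, d_bar + margin


lo, hi = paired_t_ci(d_bar=9, s_d=2.6, n=41, t_quantile=2.704)
print(f"({lo:.1f}, {hi:.1f})")  # (7.9, 10.1)
```

With raw before/after lists, `statistics.fmean` and `statistics.stdev` applied to the differences give `d_bar` and `s_d`.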
Other Methods:
1) General Method
2) CDF Approach
There are other, more advanced topics for CI estimation:
1) Conservative CIs
2) Monte Carlo Simulation
3) Bayesian Intervals ⇒ "Getting something from nothing"
4) Bootstrap CIs
5) CIs for stochastic processes (nonstationary processes)
(the CI is a function of time t.)
6) Prediction Intervals
7) Tolerance Intervals
Ex. 4.19: RS $x_1, \dots, x_n \sim \Gamma(1, \beta)$ (the true distribution is $\Gamma(1, 100) = EXP(100)$, i.e. $\beta = 100$), $n = 20$:
131.7, 182.7, 73.3, 10.7, 150.4, 42.3, 22.2, 17.9, 264.0, 154.4, 4.3, 215.6, 61.9, 10.8, 48.8, 22.5, 8.8, 150.6, 103.0, 85.9
Find the 100(1−α)% Bootstrap percentile CI for $\beta$.
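Sol:
1) Resample with replacement 3000 times
(m = # of Bootstrap samples = 3000, b = sample size of each Bootstrap sample = n = 20):
$x^*_1 = (x^*_{1,1}, x^*_{1,2}, \dots, x^*_{1,b})$, $x^*_2 = (x^*_{2,1}, x^*_{2,2}, \dots, x^*_{2,b})$, …, $x^*_m = (x^*_{m,1}, \dots, x^*_{m,b})$
e.g.: $x^*_1$: 4.3, 4.3, 4.3, 10.8, 10.8, 10.8, 10.8, 17.9, 22.5, 42.3, 48.8, 48.8, 85.9, 131.7, 131.7, 150.4, 154.4, 154.4, 264.0, 265.6
2) Compute $\hat\theta^*_j = \bar x^*_j$, $j = 1, \dots, 3000$, and order them: $\hat\theta^*_{(1)} \le \hat\theta^*_{(2)} \le \dots \le \hat\theta^*_{(3000)}$
3) The 100(1−α)% percentile Bootstrap CI for $\theta$ is (the 100(α/2)th percentile, the 100(1−α/2)th percentile) of the ordered $\hat\theta^*_j$.
e.g., $\alpha/2 = 0.05$: $0.05 \times 3000 = 150$ and $3000 - 1 - 150 = 2849$, i.e. about 5% of the ordered values are cut from each tail:
$\hat\theta^*_{(1)} \le \dots \le \hat\theta^*_{(150)} \le \dots \le \hat\theta^*_{(2849)} \le \dots \le \hat\theta^*_{(3000)}$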
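Suppose α = 0.1; the 90% CI is
1) $(61.655, 120.48) \Rightarrow w = 58.825$, covers $\beta = 100$
At least 3 other CIs for $\beta$:
2) $\frac{\bar X - \beta}{S/\sqrt{n}} \sim AN(0,1)$ ⇒ $\bar X = 90.59$, $S = 82.39$, $n = 20$
CI ⇒ $\left(\bar x - Z_{1-\alpha/2}\frac{S}{\sqrt{n}},\ \bar x + Z_{1-\alpha/2}\frac{S}{\sqrt{n}}\right) = (60.37, 120.8) \Rightarrow w = 60.43$
3) Case I.A:
CI ⇒ $\left(\frac{2n\bar X}{\chi^2_{1-\alpha/2}(2n)},\ \frac{2n\bar X}{\chi^2_{\alpha/2}(2n)}\right)$, $\alpha = 0.1$, $\bar X = 90.59$
$= (64.99, 136.69) \Rightarrow w = 71.7$, covers $\beta = 100$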

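4) Treat the 3000 Bootstrap sample means as a RS of size 3000,
i.e., $\bar x^*_1, \bar x^*_2, \dots, \bar x^*_{3000}$ ⇒ $\bar{\bar x}^* = \frac{\sum_{j=1}^{3000} \bar x^*_j}{3000}$,
$S^{*2} = \frac{\sum_{j=1}^{3000}\left(\bar x^*_j - \bar{\bar x}^*\right)^2}{2999}$
$\therefore$ CI ⇒ $\left(\bar{\bar x}^* - Z_{1-\alpha/2}\,S^*,\ \bar{\bar x}^* + Z_{1-\alpha/2}\,S^*\right)$ #
Here $S^*$ is the spread of the bootstrap means, so it already estimates the standard error of $\bar X$ and is not divided by $\sqrt{3000}$; dividing by $\sqrt{3000}$ would give an interval for the mean of the bootstrap distribution, not for $\beta$.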
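The resampling steps of Ex 4.19 can be sketched in Python under a few stated assumptions: `random.Random(1)` fixes the seed only for reproducibility, each bootstrap sample has size b = n = 20, and the helper names are hypothetical. The sketch also computes a normal-bootstrap interval in the spirit of method 4, using the SD of the bootstrap means directly as the standard error of $\bar X$ (not divided by $\sqrt{3000}$). Note that the 20 values as listed average 88.09 rather than the 90.59 used in the notes, so the output will not match the notes' intervals exactly.

```python
import random
import statistics
from statistics import NormalDist

# The n = 20 observations listed in Ex 4.19.
data = [131.7, 182.7, 73.3, 10.7, 150.4, 42.3, 22.2, 17.9, 264.0, 154.4,
        4.3, 215.6, 61.9, 10.8, 48.8, 22.5, 8.8, 150.6, 103.0, 85.9]


def bootstrap_means(x, m=3000, seed=1):
    """Steps 1-2: m bootstrap sample means (each of size len(x), with replacement), sorted."""
    rng = random.Random(seed)
    return sorted(statistics.fmean(rng.choices(x, k=len(x))) for _ in range(m))


def percentile_ci(sorted_means, alpha):
    """Step 3: cut about m * alpha / 2 ordered values from each tail."""
    m = len(sorted_means)
    k = round(m * alpha / 2)                    # 150 when m = 3000 and alpha = 0.1
    return sorted_means[k], sorted_means[m - 1 - k]


def normal_bootstrap_ci(sorted_means, alpha):
    """Method 4: centre +/- z * SD of the bootstrap means (SD estimates SE of xbar)."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    centre = statistics.fmean(sorted_means)
    s_star = statistics.stdev(sorted_means)
    return centre - z * s_star, centre + z * s_star


boot = bootstrap_means(data)
print("percentile 90% CI:", percentile_ci(boot, alpha=0.1))
print("normal-bootstrap 90% CI:", normal_bootstrap_ci(boot, alpha=0.1))
```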

Chapter 5: Hypothesis Tests

STEPS for HT:
Problem (make a decision) → Define ($H_0$ vs $H_1$) → Experiment (collect data $x_1, \dots, x_n$) → Computation ($\bar X$, $S$) → Decision (reject $H_0$ or not, based on the $\alpha$ level)
Ex 5.1: An automobile company is testing whether a certain additive can help increase gas mileage. Without the additive, cars get 25.0 mpg on average with $\sigma = 2.4$ mpg ($\sigma$ is assumed known for simplicity). Will using the additive increase the gas mileage? Assume $x_i$'s $\sim N(\mu, 2.4^2)$.
Sol:
1) Problem: Effective or not?
2) Define:
$H_0: \mu = 25.0$ (Not effective; null hypothesis)
$H_1: \mu > 25.0$ (Effective; alternative hypothesis)
3) Experiment:
Using the additive, 30 cars are tested on a road trip from Boston to LA, giving $x_1, x_2, \dots, x_{30}$
4) Computation: $\bar x = 26.3$ mpg

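5) Decision: If $\bar X$ is too big, then we believe $H_1$ (how big?)
Use the 5% rule ("beyond all reasonable doubt"):
$0.05 = P(\text{Reject } H_0 \mid H_0 \text{ true}) = P(\bar X \ge \bar x^* \mid \mu = 25)$
$= P\left(\frac{\bar X - 25}{2.4/\sqrt{30}} \ge \frac{\bar x^* - 25}{2.4/\sqrt{30}} \,\middle|\, \mu = 25\right)$
$= P\left(Z \ge \frac{\bar x^* - 25}{2.4/\sqrt{30}}\right)$
$\Rightarrow \frac{\bar x^* - 25}{2.4/\sqrt{30}} = Z_{0.95} = 1.64$
$\Rightarrow \bar x^* = 25.718$
$\therefore \bar x = 26.3 > 25.718 = \bar x^*$ ⇒ Reject $H_0$ #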
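The decision rule in step 5 can be sketched as an upper one-sided z test, assuming Python 3.8+ for `statistics.NormalDist`; `upper_tail_z_test` is a hypothetical name. With the exact $Z_{0.95} \approx 1.6449$ the critical value is about 25.72 (the notes use 1.64 and get 25.718); either way $\bar x = 26.3$ leads to rejecting $H_0$.

```python
import math
from statistics import NormalDist


def upper_tail_z_test(xbar, mu0, sigma, n, alpha):
    """Upper one-sided z test of H0: mu = mu0 vs H1: mu > mu0 (sigma known).

    Returns the critical value xbar* and whether H0 is rejected.
    """
    z = NormalDist().inv_cdf(1 - alpha)         # Z_{1-alpha}
    x_crit = mu0 + z * sigma / math.sqrt(n)
    return x_crit, xbar >= x_crit


x_crit, reject = upper_tail_z_test(xbar=26.3, mu0=25.0, sigma=2.4, n=30, alpha=0.05)
print(f"xbar* = {x_crit:.3f}, reject H0: {reject}")
```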
* If we reject $H_0: \mu = 25$ using a 5% rule, then we say that $\bar x = 26.3$ ($> \bar x^*$) is statistically significant.
* If $H_0$ is NOT rejected ⇒ Fail to reject $H_0$
⇒ Not enough evidence to overturn the presumption; this $\not\Rightarrow$ $H_0$ is true.
Def 5.1:
1) Any function of the observed data whose numerical value dictates whether H0 is accepted or rejected is called a test statistic, e.g. $\bar x$.
2) The set of values for the test statistic that results in the null hypothesis being rejected is called the critical region, denoted by C.
3) The particular point in C that separates the rejection region from the acceptance region is called the critical value, e.g. $\bar x^* = 25.718$.
Def 5.2: The probability that the test statistic lies in the critical region when $H_0$ is true is called the level of significance (the size of the test), denoted by $\alpha$.
Def 5.3:
1) TYPE I Error: Reject a true H0
2) TYPE II Error: Fail to reject a false H0
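*To compute P[Type II], we need to know the decision rule first.
E.g., consider $H_0: \mu = \mu_0$ vs. $H_1: \mu = \mu_1 > \mu_0$, reject $H_0$ at level $\alpha$.
Test: $\alpha = P(\text{Reject } H_0 \mid H_0) = P(\bar X \ge \bar x^* \mid \mu_0)$ [$x_i$'s Normal, independent] $= P\left(\frac{\bar X - \mu_0}{\sigma/\sqrt{n}} \ge \frac{\bar x^* - \mu_0}{\sigma/\sqrt{n}} \,\middle|\, \mu_0\right) = P(Z \ge Z_{1-\alpha})$
$P(T\,II) = P(\text{Accept } H_0 \mid H_1)$
$= P\left(\frac{\bar x - \mu_0}{\sigma/\sqrt{n}} < Z_{1-\alpha} \,\middle|\, \mu_1\right)$
$= P\left(\frac{\bar x - \mu_1}{\sigma/\sqrt{n}} + \frac{\mu_1 - \mu_0}{\sigma/\sqrt{n}} < Z_{1-\alpha} \,\middle|\, \mu_1\right)$
$= P\left(Z < Z_{1-\alpha} + \frac{\mu_0 - \mu_1}{\sigma/\sqrt{n}} \,\middle|\, \mu_1\right)$
$= \Phi\left(Z_{1-\alpha} + \frac{\mu_0 - \mu_1}{\sigma/\sqrt{n}}\right)$
In Ex 5.1, if $H_1: \mu = 25.75$, then
$P(T\,II) = \Phi\left(Z_{0.95} + \frac{25 - 25.75}{2.4/\sqrt{30}}\right) = \Phi(1.64 - 1.71) = \Phi(-0.07) = 0.4721$: not a good decision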
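The Type II error formula can be evaluated directly, assuming Python 3.8+ for `statistics.NormalDist`; `type_ii_error` is a hypothetical name. With unrounded quantiles it gives about 0.4734 for $\mu_1 = 25.75$, close to the table-based 0.4721.

```python
import math
from statistics import NormalDist


def type_ii_error(mu0, mu1, sigma, n, alpha):
    """P(Type II) = Phi(Z_{1-alpha} + (mu0 - mu1) / (sigma / sqrt(n))) for the upper one-sided z test."""
    nd = NormalDist()
    z = nd.inv_cdf(1 - alpha)
    return nd.cdf(z + (mu0 - mu1) / (sigma / math.sqrt(n)))


beta = type_ii_error(mu0=25.0, mu1=25.75, sigma=2.4, n=30, alpha=0.05)
print(f"P(Type II) = {beta:.4f}")
```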

**Simple and Composite Hypotheses:
Def 5.4: If a hypothesis completely specifies $f(x; \mu)$, then it is called a simple hypothesis; otherwise it is a composite hypothesis.
E.g.,
$H_0: \mu = 25$ (Simple)
$H_1: \mu = 25.75$ (Simple)
$H_1: \mu > 25$ (Composite)

**One-sided and Two-sided Tests:
$H_0: \mu = 25$ versus
$H_1: \mu > 25$ (upper one-sided test)
$H_1: \mu < 25$ (lower one-sided test)
$H_1: \mu \neq 25$ (two-sided test)

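**For a composite null hypothesis, the size of the test is bounded by $\alpha$.
$\Omega_0 = (-\infty, \mu_0] \leftarrow H_0: \mu \le \mu_0$ vs $H_1: \mu > \mu_0 \rightarrow \Omega - \Omega_0 = (\mu_0, \infty)$; reject $H_0$ at level $\alpha$, i.e., if $\frac{\bar x - \mu_0}{\sigma/\sqrt{n}} \ge Z_{1-\alpha}$
1) We know that when $\mu = \mu_0$, $P(T\,I) = \alpha$
2) Now we need to show that $P(T\,I) \le \alpha$ when $\mu < \mu_0$
When $\mu < \mu_0$: $P(T\,I) = P(\text{Reject } H_0 \mid H_0) = P\left(\frac{\bar x - \mu_0}{\sigma/\sqrt{n}} \ge Z_{1-\alpha} \,\middle|\, \mu\right)$
$= P\left(\frac{\bar x - \mu}{\sigma/\sqrt{n}} \ge Z_{1-\alpha} + \frac{\mu_0 - \mu}{\sigma/\sqrt{n}} \,\middle|\, \mu\right)$
$= 1 - \Phi\left(Z_{1-\alpha} + \frac{\mu_0 - \mu}{\sigma/\sqrt{n}}\right)$ ($\Phi(\cdot)$: CDF of $N(0,1)$)
$\le 1 - \Phi(Z_{1-\alpha}) = \alpha$, since $\mu_0 - \mu > 0$ and $\Phi$ is increasing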

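Def 5.5: For composite hypotheses $H_0: \mu \in \Omega_0$ versus $H_1: \mu \in \Omega - \Omega_0$, the size of the test is
$\alpha = \max_{\mu \in \Omega_0} \pi(\mu) = \max_{\mu \in \Omega_0} P(\text{Reject } H_0 \mid \mu)$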

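*Express P(T II) in terms of α:
$P(T\,II) = \Phi\left(Z_{1-\alpha} + \frac{\mu_0 - \mu_1}{\sigma/\sqrt{n}}\right) = \beta$
$\Rightarrow Z_{1-\alpha} + \frac{\mu_0 - \mu_1}{\sigma/\sqrt{n}} = Z_\beta = -Z_{1-\beta}$
$\Rightarrow n = \left[\frac{(Z_{1-\alpha} + Z_{1-\beta})\,\sigma}{\mu_1 - \mu_0}\right]^2$
In Ex 5.1, $\mu_0 = 25$, $\sigma = 2.4$, $\alpha = 0.05$, $\beta = 0.1$:
1) When $\mu_1 = 25.75$ ⇒ $n = 87.73$ ⇒ 88
2) When $\mu_1 = 26.8$ ⇒ $n = 15.23$ ⇒ 16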
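The sample-size formula can be checked numerically, assuming Python 3.8+ for `statistics.NormalDist`; `n_for_alpha_beta` is a hypothetical name. With unrounded quantiles the raw values are about 87.69 and 15.22, which round up to the same 88 and 16.

```python
import math
from statistics import NormalDist


def n_for_alpha_beta(mu0, mu1, sigma, alpha, beta):
    """Smallest n for an upper one-sided z test with size alpha and P(Type II) = beta at mu1."""
    nd = NormalDist()
    z_a = nd.inv_cdf(1 - alpha)   # Z_{1-alpha}
    z_b = nd.inv_cdf(1 - beta)    # Z_{1-beta}
    return math.ceil(((z_a + z_b) * sigma / (mu1 - mu0)) ** 2)


for mu1 in (25.75, 26.8):
    print(mu1, n_for_alpha_beta(mu0=25.0, mu1=mu1, sigma=2.4, alpha=0.05, beta=0.1))
```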

** $P(T\,II) = P(\text{Fail to reject } H_0 \mid H_1 \text{ true}) = \beta$
$1 - P(T\,II) = 1 - \beta$ = probability of identifying a true $H_1$

Def. 5.6: The power function (power curve), $\pi(\mu)$, of a test of H0 is the probability of rejecting H0 when the true value of the parameter is $\mu$.

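** $\pi(\mu) = 1 - \beta = 1 - \Phi\left(Z_{1-\alpha} + \frac{\mu_0 - \mu}{\sigma/\sqrt{n}}\right)$
$Z_{1-\alpha} + \frac{\mu_0 - \mu}{\sigma/\sqrt{n}}$ is decreasing in $\mu$
$\Rightarrow \Phi\left(Z_{1-\alpha} + \frac{\mu_0 - \mu}{\sigma/\sqrt{n}}\right)$ is decreasing in $\mu$
$\Rightarrow \pi(\mu) = 1 - \Phi\left(Z_{1-\alpha} + \frac{\mu_0 - \mu}{\sigma/\sqrt{n}}\right)$ is increasing in $\mu$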

**Two usages of the power curve:
1) It characterizes the "performance" of a test
2) It provides a way of comparing the power of different tests

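*The power function for a two-sided test ($H_0: \mu = \mu_0$ vs $H_1: \mu \neq \mu_0$):
$\pi(\mu) = 1 - \beta = 1 - P(\text{Fail to reject } H_0 \mid \mu)$
$= 1 - P\left(-Z_{1-\alpha/2} < \frac{\bar x - \mu_0}{\sigma/\sqrt{n}} < Z_{1-\alpha/2} \,\middle|\, \mu\right)$
$= 1 - P\left(-Z_{1-\alpha/2} + \frac{\mu_0 - \mu}{\sigma/\sqrt{n}} < \frac{\bar x - \mu}{\sigma/\sqrt{n}} < Z_{1-\alpha/2} + \frac{\mu_0 - \mu}{\sigma/\sqrt{n}} \,\middle|\, \mu\right)$
$= 1 - \Phi\left(Z_{1-\alpha/2} + \frac{\mu_0 - \mu}{\sigma/\sqrt{n}}\right) + \Phi\left(-Z_{1-\alpha/2} + \frac{\mu_0 - \mu}{\sigma/\sqrt{n}}\right)$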
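The two-sided power function can be tabulated directly. This sketch assumes Python 3.8+ for `statistics.NormalDist` and reuses the Ex 5.1 settings ($\mu_0 = 25$, $\sigma = 2.4$, $n = 30$, $\alpha = 0.05$) purely as sample inputs; `power_two_sided` is a hypothetical name. At $\mu = \mu_0$ the power equals $\alpha$, and it grows symmetrically as $\mu$ moves away from $\mu_0$ in either direction.

```python
import math
from statistics import NormalDist


def power_two_sided(mu, mu0, sigma, n, alpha):
    """pi(mu) for the two-sided z test of H0: mu = mu0 (sigma known)."""
    nd = NormalDist()
    z = nd.inv_cdf(1 - alpha / 2)                 # Z_{1-alpha/2}
    shift = (mu0 - mu) / (sigma / math.sqrt(n))
    return 1 - nd.cdf(z + shift) + nd.cdf(-z + shift)


for mu in (24.0, 25.0, 25.5, 26.0, 26.5):
    print(mu, round(power_two_sided(mu, mu0=25.0, sigma=2.4, n=30, alpha=0.05), 4))
```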

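Def 5.7: The P-value associated with a test statistic is the probability of getting a value of that statistic at least as extreme as the one actually observed (relative to H1), given that H0 is true.
E.g., in Ex 5.1 ($\bar x = 26.3$, $n = 30$): P-value $= P(\bar X \ge 26.3 \mid \mu = 25) = 1 - \Phi\left(\frac{26.3 - 25}{2.4/\sqrt{30}}\right) = 1 - \Phi(2.97) \approx 0.0015$
1) The smaller the p-value, the stronger the evidence to reject H0
2) A large p-value ($\ge \alpha$) is in favor of H0 (fail to reject H0)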
