Probability Theory and Mathematical Statistics: Homework 2, Vitaliy Pozdnyakov


Part 1

1. Prove that the frequency of an event A is an efficient estimator of probability P(A)

Let X₁, X₂, …, Xₙ be independent random variables, each distributed as:

P(A) = P(Xᵢ = 1) = p
P(Ā) = P(Xᵢ = 0) = 1 − p = q

Let the frequency be θ̂_p = m/n, where m is the number of occurrences of the event A (i.e. Xᵢ = 1) and n is the total number of trials.

M(Xᵢ) = 1·p + 0·q = p
M(X) = M(X₁ + X₂ + ⋯ + Xₙ) = nM(Xᵢ) = np
M(m/n) = M((X₁ + X₂ + ⋯ + Xₙ)/n) = (1/n)M(X) = np/n = p

Thereby m/n is an unbiased estimator.

D(Xᵢ) = (1 − p)²p + (0 − p)²q = q²p + p²q = pq(p + q) = pq
D(X) = D(X₁ + X₂ + ⋯ + Xₙ) = npq
D(m/n) = D((X₁ + X₂ + ⋯ + Xₙ)/n) = D(X)/n² = pq/n

Let φ(1,p) and φ(0,p) be the probabilities of the values 1 and 0 respectively; then

φ(1,p) = p and φ(0,p) = 1 − p


Fisher information:

I(p) = M[(ln φ(X,p))′_p]² = [φ′(1,p)/φ(1,p)]²·φ(1,p) + [φ′(0,p)/φ(0,p)]²·φ(0,p) = 1/p + 1/(1 − p) = 1/(p(1 − p)) = 1/(pq)

min D(θ̂_p) = 1/(nI(p)) = pq/n = D(m/n)

Thereby m/n is an efficient estimator.
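This efficiency can also be seen numerically. A small simulation sketch (not part of the original solution; p = 0.3, n = 100 are assumed values): the empirical variance of the frequency m/n should sit right at the bound pq/n.

```python
import random

random.seed(0)
p, n, trials = 0.3, 100, 20000  # assumed values for illustration

# Estimate p by the frequency m/n, repeated over many independent samples
estimates = []
for _ in range(trials):
    m = sum(1 for _ in range(n) if random.random() < p)
    estimates.append(m / n)

mean_est = sum(estimates) / trials
var_est = sum((e - mean_est) ** 2 for e in estimates) / trials

print(round(mean_est, 2))  # close to p = 0.3 (unbiased)
print(round(var_est, 4))   # close to pq/n = 0.0021 (attains the bound)
```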
2. The sample (X₁, …, Xₙ) of a random variable X follows the exponential distribution with parameter 1/θ (E(1/θ)). Prove that X̄ is an efficient estimator of θ.

Let λ = 1/θ and Xᵢ ∼ Exp(λ).

M(Xᵢ) = λ⁻¹
M(X̄) = M((X₁ + ⋯ + Xₙ)/n) = nλ⁻¹/n = λ⁻¹

Thereby X̄ is an unbiased estimator of the parameter λ⁻¹.

D(Xᵢ) = λ⁻²
D(X̄) = D((X₁ + ⋯ + Xₙ)/n) = nλ⁻²/n² = λ⁻²/n = 1/(nλ²)
Let φ(x,λ) = λe^(−λx)

Fisher information: I(λ) = M[(ln φ(x,λ))′_λ]² = M[(ln λ − λx)′_λ]² = M[1/λ − x]² = M[1/λ² − 2x/λ + x²]
= 1/λ² − (2/λ)M[X] + M[X²] = 1/λ² − 2/λ² + 2!/λ² = λ⁻²

and then, by reparameterization (I(θ) = I(λ)·(dλ/dθ)² with dλ/dθ = −λ²):

I(λ⁻¹) = λ⁻²·λ⁴ = λ²

min D = 1/(nI(λ⁻¹)) = 1/(nλ²) = D(X̄)

Thereby X̄ is an efficient estimator of the parameter λ⁻¹ = θ.
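The bound 1/(nλ²) = θ²/n can likewise be checked by simulation; a sketch with assumed values θ = 2 and n = 50 (not part of the original solution):

```python
import random

random.seed(1)
theta, n, trials = 2.0, 50, 20000  # assumed values for illustration

# X-bar over repeated Exp(lambda = 1/theta) samples of size n
means = []
for _ in range(trials):
    sample = [random.expovariate(1 / theta) for _ in range(n)]
    means.append(sum(sample) / n)

grand_mean = sum(means) / trials
var_of_mean = sum((m - grand_mean) ** 2 for m in means) / trials

print(round(grand_mean, 2))   # close to theta = 2 (unbiased)
print(round(var_of_mean, 3))  # close to theta**2 / n = 0.08 (attains the bound)
```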
3. The random variable 𝑋 can be 0 or 1 with probability 0.5

a. What are the mean μ and variance σ² of X?

μ = M(X) = 0·0.5 + 1·0.5 = 0.5

σ² = D(X) = (0 − 0.5)²·0.5 + (1 − 0.5)²·0.5 = 0.25·0.5 + 0.25·0.5 = 0.25 = μ/2
b. The sample of X: X₁, …, X₉. Let's take a look at five different estimators of the mean μ:

μ̂₁ = 0.45
μ̂₂ = X₁
μ̂₃ = X̄
μ̂₄ = X₁ + (1/3)X₂
μ̂₅ = (2/3)X₁ + (2/3)X₂ − (1/3)X₃

Which of the estimators are unbiased? What is the bias of each estimator? Which one is the most efficient?

M(μ̂₁) = M(0.45) = 0.45 ≠ μ ⇒ it is a biased estimator, Bias = 0.45 − 0.5 = −0.05

M(μ̂₂) = M(X₁) = μ ⇒ it is an unbiased estimator, Bias = 0

M(μ̂₃) = M(X̄) = (1/9)M(X₁ + ⋯ + X₉) = 0.5·9/9 = 0.5 = μ ⇒ it is an unbiased estimator, Bias = 0

M(μ̂₄) = M(X₁ + (1/3)X₂) = M(X₁) + (1/3)M(X₂) = 15/30 + 5/30 = 20/30 ≠ μ ⇒ it is a biased estimator, Bias = 20/30 − 15/30 = 5/30

M(μ̂₅) = (2/3)M(X₁) + (2/3)M(X₂) − (1/3)M(X₃) = (2/3)μ + (2/3)μ − (1/3)μ = μ ⇒ it is an unbiased estimator, Bias = 0
Let φ(1,μ) = μ and φ(0,μ) = 1 − μ = μ (since μ = 0.5).

Fisher information: I(μ) = [φ′(1,μ)/φ(1,μ)]²·φ(1,μ) + [φ′(0,μ)/φ(0,μ)]²·φ(0,μ) = 1/μ + 1/μ = 2/μ

min D = 1/(nI(μ)) = 1/(9·(2/μ)) = μ/18

Consider only unbiased estimators

D(μ̂₂) = D(X₁) = μ/2 ≠ min D ⇒ it is not an efficient estimator

D(μ̂₃) = D(X̄) = (1/9²)D(X₁ + ⋯ + X₉) = 9(μ/2)/9² = μ/18 = min D

Answer: μ̂₃ is an efficient estimator (the most efficient).
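Since each Xᵢ takes only the values 0 and 1, the claim can be verified exactly by enumerating all 2⁹ equally likely samples; a small cross-check script (added for illustration):

```python
from itertools import product

# Enumerate all 2^9 equally likely samples (X1..X9 each 0 or 1, p = 0.5)
outcomes = list(product([0, 1], repeat=9))
p_each = 1 / len(outcomes)

def mean_and_var(estimator):
    vals = [estimator(x) for x in outcomes]
    m = sum(vals) * p_each
    return m, sum((v - m) ** 2 for v in vals) * p_each

m3, v3 = mean_and_var(lambda x: sum(x) / 9)                        # mu-hat-3
m5, v5 = mean_and_var(lambda x: (2 * x[0] + 2 * x[1] - x[2]) / 3)  # mu-hat-5

print(round(m3, 2), round(v3, 4))  # 0.5 and mu/18 ≈ 0.0278
print(round(m5, 2), round(v5, 4))  # 0.5 but a larger variance (0.25)
```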

4. A topographer measures the area of a rectangular plot. The measuring procedure is such that the length X and the width Y of the plot are measured independently. The real value of the length is 10 and of the width is 5. The distributions of X and Y:

X takes the values 8, 10, 11 with probabilities 1/4, 1/4, 1/2; Y takes the values 4, 6 with probabilities 1/2, 1/2.

The estimate of the area A = XY is a random variable.


Is X an unbiased estimator of the length? Proof:

M(X) = 8·(1/4) + 10·(1/4) + 11·(1/2) = 8/4 + 10/4 + 22/4 = 40/4 = 10 ⇒ it is an unbiased estimator
Is Y an unbiased estimator of the width? Proof:

M(Y) = 4·(1/2) + 6·(1/2) = 5 ⇒ it is an unbiased estimator


Is A = XY an unbiased estimator of the area? Proof:

XY = 10·5 = 50

M(XY) = Σᵢ₌₁³ Σⱼ₌₁² xᵢyⱼ P(X = xᵢ)P(Y = yⱼ)
= 8·4·(1/8) + 8·6·(1/8) + 10·4·(1/8) + 10·6·(1/8) + 11·4·(1/4) + 11·6·(1/4)
= 32/8 + 48/8 + 40/8 + 60/8 + 88/8 + 132/8
= 400/8 = 50 ⇒ it is an unbiased estimator
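These three expectations can be checked directly from the distribution table; a short script using the same probabilities:

```python
# Distribution table used in the solution above
x_dist = {8: 1/4, 10: 1/4, 11: 1/2}
y_dist = {4: 1/2, 6: 1/2}

mx = sum(x * p for x, p in x_dist.items())
my = sum(y * p for y, p in y_dist.items())
# Independence: P(X = x, Y = y) = P(X = x) * P(Y = y)
mxy = sum(x * y * px * py for x, px in x_dist.items() for y, py in y_dist.items())

print(mx, my, mxy)  # 10.0 5.0 50.0
```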
5. Two measurements X₁, X₂ of the side of a square are taken. Let the measurements be independent with mean α (α is the real side length of the square) and variance σ². What is the MSE of the area estimator X₁X₂?

α² is the real area of the square.

MSE(X₁X₂) = D(X₁X₂) + Bias(X₁X₂, α²)²
D(X₁X₂) = D(X₁)D(X₂) + α²D(X₁) + α²D(X₂) = σ⁴ + α²σ² + α²σ² = σ⁴ + 2α²σ²
M(X₁X₂) = M(X₁)M(X₂) = α² ⇒ Bias = 0
MSE(X₁X₂) = σ⁴ + 2α²σ²
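A Monte Carlo sanity check of MSE = σ⁴ + 2α²σ². The normal distribution and the values α = 3, σ = 0.5 are assumptions for illustration only; the derivation above requires nothing beyond independence, mean α, and variance σ².

```python
import random

random.seed(2)
alpha, sigma, trials = 3.0, 0.5, 200000  # assumed values for illustration

# Average squared error of the area estimate X1*X2 against the true area alpha^2
total = 0.0
for _ in range(trials):
    x1 = random.gauss(alpha, sigma)
    x2 = random.gauss(alpha, sigma)
    total += (x1 * x2 - alpha ** 2) ** 2

mse = total / trials
theory = sigma ** 4 + 2 * alpha ** 2 * sigma ** 2

print(round(mse, 2), theory)  # both close to 4.5625
```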
6. The mean λ of a Poisson distribution can be 1 or 2. One measurement of the random variable X has been taken. Two estimators of λ are defined as:

λ̂ = 1 if X = 0; 1.5 if X = 1; 2 if X ≥ 2
λ̃ = 0 if X = 0; 1 if X = 1; 2 if X ≥ 2

a) What are the mean and variance of λ̂?

Poisson distribution: P(X = m) = λᵐ e^(−λ)/m!
𝑀(𝑋) = 𝜆, 𝐷(𝑋) = 𝜆
For 𝜆 = 1

M(λ̂) = 1·P(X = 0, λ = 1) + 1.5·P(X = 1, λ = 1) + 2·P(X ≥ 2, λ = 1)

[P(X ≥ 2, λ = 1) = 1 − P(X = 0, λ = 1) − P(X = 1, λ = 1) = 1 − 2e⁻¹]

= e⁻¹ + 1.5e⁻¹ + 2(1 − 2e⁻¹)
= 2.5e⁻¹ + 2 − 4e⁻¹ = 2 − 1.5e⁻¹
In [6]:

import math

In [27]:

2 - 1.5 * math.exp(-1)

Out[27]:

1.4481808382428365

𝑀(𝜆̂ ) ≈ 1.45
D(λ̂) = (1 − (2 − 1.5e⁻¹))²·e⁻¹ + (1.5 − (2 − 1.5e⁻¹))²·e⁻¹ + (2 − (2 − 1.5e⁻¹))²·(1 − 2e⁻¹)
= (1.5e⁻¹ − 1)²·e⁻¹ + (1.5e⁻¹ − 0.5)²·e⁻¹ + (1.5e⁻¹)²·(1 − 2e⁻¹)

In [16]:

round((1.5 * math.exp(-1) - 1) ** 2 * math.exp(-1)
      + (1.5 * math.exp(-1) - 0.5) ** 2 * math.exp(-1)
      + (1.5 * math.exp(-1)) ** 2 * (1 - 2 * math.exp(-1)), 4)

Out[16]:

0.1553

D(λ̂) ≈ 0.16
For λ = 2

M(λ̂) = 1·P(X = 0, λ = 2) + 1.5·P(X = 1, λ = 2) + 2·P(X ≥ 2, λ = 2)

[P(X = 0, λ = 2) = e⁻², P(X = 1, λ = 2) = 2e⁻², P(X ≥ 2, λ = 2) = 1 − 3e⁻²]

= e⁻² + 1.5·2e⁻² + 2(1 − 3e⁻²)
= e⁻² + 3e⁻² + 2 − 6e⁻²
= 2 − 2e⁻²

In [17]:

round(2 - 2 * math.exp(-2), 4)

Out[17]:

1.7293

M(λ̂) ≈ 1.73
D(λ̂) = (1 − (2 − 2e⁻²))²·e⁻² + (1.5 − (2 − 2e⁻²))²·2e⁻² + (2 − (2 − 2e⁻²))²·(1 − 3e⁻²)
= (2e⁻² − 1)²·e⁻² + (2e⁻² − 0.5)²·2e⁻² + (2e⁻²)²·(1 − 3e⁻²)

In [19]:

round((2 * math.exp(-2) - 1) ** 2 * math.exp(-2)
      + (2 * math.exp(-2) - 0.5) ** 2 * 2 * math.exp(-2)
      + (2 * math.exp(-2)) ** 2 * (1 - 3 * math.exp(-2)), 4)

Out[19]:

0.1297

D(λ̂) ≈ 0.13
Answer:

for λ = 1: M(λ̂) ≈ 1.45, D(λ̂) ≈ 0.16

for λ = 2: M(λ̂) ≈ 1.73, D(λ̂) ≈ 0.13

b) What are the mean and variance of λ̃?

For 𝜆 = 1

M(λ̃) = 0·P(X = 0, λ = 1) + 1·P(X = 1, λ = 1) + 2·P(X ≥ 2, λ = 1)

[P(X ≥ 2, λ = 1) = 1 − P(X = 0, λ = 1) − P(X = 1, λ = 1) = 1 − 2e⁻¹]

= e⁻¹ + 2(1 − 2e⁻¹)
= e⁻¹ + 2 − 4e⁻¹ = 2 − 3e⁻¹
In [21]:

2 - 3 * math.exp(-1)

Out[21]:

0.896361676485673

M(λ̃) ≈ 0.9

D(λ̃) = (0 − (2 − 3e⁻¹))²·e⁻¹ + (1 − (2 − 3e⁻¹))²·e⁻¹ + (2 − (2 − 3e⁻¹))²·(1 − 2e⁻¹)
= (3e⁻¹ − 2)²·e⁻¹ + (3e⁻¹ − 1)²·e⁻¹ + 9e⁻²·(1 − 2e⁻¹)
In [34]:

(3 * math.exp(-1) - 2) ** 2 * math.exp(-1) \
+ (3 * math.exp(-1) - 1) ** 2 * math.exp(-1) \
+ 9 * math.exp(-2) * (1 - 2 * math.exp(-1))

Out[34]:

0.6213796567276975

D(λ̃) ≈ 0.62
For λ = 2

M(λ̃) = 0·P(X = 0, λ = 2) + 1·P(X = 1, λ = 2) + 2·P(X ≥ 2, λ = 2)

[P(X = 1, λ = 2) = 2e⁻², P(X ≥ 2, λ = 2) = 1 − 3e⁻²]

= 2e⁻² + 2(1 − 3e⁻²)
= 2e⁻² + 2 − 6e⁻²
= 2 − 4e⁻²

In [35]:

round(2 - 4 * math.exp(-2), 4)

Out[35]:

1.4587

M(λ̃) ≈ 1.46
D(λ̃) = (0 − (2 − 4e⁻²))²·e⁻² + (1 − (2 − 4e⁻²))²·2e⁻² + (2 − (2 − 4e⁻²))²·(1 − 3e⁻²)
= (4e⁻² − 2)²·e⁻² + (4e⁻² − 1)²·2e⁻² + (4e⁻²)²·(1 − 3e⁻²)

In [36]:

round((4 * math.exp(-2) - 2) ** 2 * math.exp(-2)
      + (4 * math.exp(-2) - 1) ** 2 * 2 * math.exp(-2)
      + (4 * math.exp(-2)) ** 2 * (1 - 3 * math.exp(-2)), 4)

Out[36]:

0.519

D(λ̃) ≈ 0.52
Answer:

for λ = 1: M(λ̃) ≈ 0.9, D(λ̃) ≈ 0.62

for λ = 2: M(λ̃) ≈ 1.46, D(λ̃) ≈ 0.52

c) Which estimator is more accurate?

For λ = 1

MSE(λ̂) = D(λ̂) + Bias(λ̂, λ)² ≈ 0.16 + 0.45²

In [37]:

round(0.16 + 0.45 ** 2, 4)

Out[37]:

0.3625

≈ 0.36

MSE(λ̃) = D(λ̃) + Bias(λ̃, λ)² ≈ 0.62 + 0.1²
≈ 0.63
For λ = 2

MSE(λ̂) = D(λ̂) + Bias(λ̂, λ)² ≈ 0.13 + 0.27²

In [38]:

round(0.13 + 0.27 ** 2, 4)

Out[38]:

0.2029

≈ 0.2

MSE(λ̃) = D(λ̃) + Bias(λ̃, λ)² ≈ 0.52 + 0.54²
≈ 0.81

Answer: λ̂ is more accurate than λ̃ in both cases.
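All means and variances in this problem follow from the Poisson pmf together with P(X ≥ 2) = 1 − P(X = 0) − P(X = 1); a compact exact cross-check (note that for λ = 2 the probability of X = 0 is e⁻², not e⁻¹):

```python
import math

def poisson_pmf(m, lam):
    return lam ** m * math.exp(-lam) / math.factorial(m)

def moments(values, lam):
    # values = (estimate returned at X = 0, at X = 1, at X >= 2)
    probs = [poisson_pmf(0, lam), poisson_pmf(1, lam)]
    probs.append(1 - probs[0] - probs[1])
    mean = sum(v * p for v, p in zip(values, probs))
    var = sum((v - mean) ** 2 * p for v, p in zip(values, probs))
    return mean, var

for lam in (1, 2):
    m_hat, d_hat = moments((1, 1.5, 2), lam)  # lambda-hat
    m_til, d_til = moments((0, 1, 2), lam)    # lambda-tilde
    # lam, then M and D of lambda-hat, then M and D of lambda-tilde
    print(lam, round(m_hat, 2), round(d_hat, 2), round(m_til, 2), round(d_til, 2))
```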
7. Let X₁, X₂, X₃ be a sample of a random variable with mean μ and variance σ². Let's take a look at two estimators of σ²:

1. σ̂₁² = c₁(X₁ − μ)²
2. σ̂₂² = c₂(X₁ − μ)² + c₂(X₂ − μ)² + c₂(X₃ − μ)²

a) What must c₁ and c₂ be so that σ̂₁² and σ̂₂² are unbiased estimators of σ²?

M(σ̂₁²) = M[c₁(X₁ − μ)²] = c₁M[(X₁ − μ)²] = c₁σ² ⇒ c₁ = 1

M(σ̂₂²) = M[c₂(X₁ − μ)² + c₂(X₂ − μ)² + c₂(X₃ − μ)²] = c₂M[(X₁ − μ)²] + c₂M[(X₂ − μ)²] + c₂M[(X₃ − μ)²] = 3c₂σ² ⇒ c₂ = 1/3
b) Find the ratio efficiency(σ̂₁²) / efficiency(σ̂₂²)

If c₁ = 1 and c₂ = 1/3 then Bias = 0, so

MSE(σ̂₁²) = D(σ̂₁²) = D[(X₁ − μ)²]

MSE(σ̂₂²) = D[(1/3)(X₁ − μ)² + (1/3)(X₂ − μ)² + (1/3)(X₃ − μ)²] = (1/9)D[(X₁ − μ)²] + (1/9)D[(X₂ − μ)²] + (1/9)D[(X₃ − μ)²] = (1/3)D[(Xᵢ − μ)²]

Thereby MSE(σ̂₁²)/MSE(σ̂₂²) = D[(X₁ − μ)²] / ((1/3)D[(Xᵢ − μ)²]) = 3
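The factor of 3 can be checked by simulation. For normal data D[(X − μ)²] = 2σ⁴, so the ratio is exactly 3; the N(0, 1) distribution below is an assumption for illustration only:

```python
import random

random.seed(3)
mu, sigma, trials = 0.0, 1.0, 100000  # assumed N(0, 1) data for illustration

# Accumulate squared errors of both estimators against the true sigma^2
se1 = se2 = 0.0
for _ in range(trials):
    xs = [random.gauss(mu, sigma) for _ in range(3)]
    est1 = (xs[0] - mu) ** 2                   # sigma1^2-hat (c1 = 1)
    est2 = sum((x - mu) ** 2 for x in xs) / 3  # sigma2^2-hat (c2 = 1/3)
    se1 += (est1 - sigma ** 2) ** 2
    se2 += (est2 - sigma ** 2) ** 2

print(round(se1 / se2, 1))  # close to 3
```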

Part 2

1. Is (the mean of the empirical distribution)² an unbiased estimator of (the mean of the random variable)²? Proof.

Let X̄ be the mean of the empirical distribution and μ the mean of the random variable.

D(X̄) = M(X̄²) − M(X̄)² ⇒ M(X̄²) = D(X̄) + M(X̄)² = D(X̄) + μ²
D(X̄) = D((X₁ + ⋯ + Xₙ)/n) = (1/n²)D(X₁ + ⋯ + Xₙ) = (1/n²)·nσ² = σ²/n

Thereby X̄² is a biased estimator of μ², with Bias = σ²/n.
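The size of the bias, σ²/n, is visible in simulation; a sketch with assumed N(2, 1) data and n = 10, where the bias should come out near 1/10:

```python
import random

random.seed(4)
mu, sigma, n, trials = 2.0, 1.0, 10, 100000  # assumed values for illustration

# Average of X-bar^2 over many samples, compared with mu^2
total = 0.0
for _ in range(trials):
    xbar = sum(random.gauss(mu, sigma) for _ in range(n)) / n
    total += xbar ** 2

bias = total / trials - mu ** 2
print(round(bias, 3))  # close to sigma**2 / n = 0.1
```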
2. Let X be a random variable ∼ N(147.8, 12.3²).

a. What is 𝑃(𝑋 < 163.3) ?

a = 147.8, σ² = 12.3²

P(X < 163.3) = ∫_−∞^163.3 (1/√(2π·12.3²)) e^(−(x − 147.8)²/(2·12.3²)) dx ≈ 0.9

b. Let X̄ be the empirical mean and s² the empirical variance obtained from a sample of size n = 25 of the random variable X. What is P(X̄ ≤ 150.9)?

X̄ ≤ 150.9 ⇒ (X₁ + ⋯ + X₂₅)/25 ≤ 150.9 ⇒ X₁ + ⋯ + X₂₅ ≤ 3772.5

Let Y = X₁ + ⋯ + X₂₅ ⇒ M(Y) = M(X₁ + ⋯ + X₂₅) = 25·147.8 = 3695

and D(Y) = D(X₁ + ⋯ + X₂₅) = 25·12.3² = 3782.25

Thereby Y ∼ N(3695, 3782.25)

P(Y < 3772.5) = ∫_−∞^3772.5 (1/√(2π·3782.25)) e^(−(x − 3695)²/(2·3782.25)) dx ≈ 0.9
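Parts (a) and (b) reduce to the same z-score, since (163.3 − 147.8)/12.3 = 77.5/61.5 ≈ 1.26, so both probabilities are ≈ 0.9. A quick check via the error function:

```python
import math

def normal_cdf(x, mean, sd):
    # Phi((x - mean) / sd) expressed via the error function
    return 0.5 * (1 + math.erf((x - mean) / (sd * math.sqrt(2))))

print(round(normal_cdf(163.3, 147.8, 12.3), 3))                  # part a, ≈ 0.896
print(round(normal_cdf(3772.5, 3695.0, math.sqrt(3782.25)), 3))  # part b, ≈ 0.896
```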
c. What must a and b equal so that P(a ≤ s² ≤ b) = 0.95?

s² = [(X₁ − 147.8)² + ⋯ + (X₂₅ − 147.8)²]/25
Let Y = X − 147.8 ⇒ M(Y) = M(X − 147.8) = M(X) − 147.8 = 0

and D(Y) = D(X − 147.8) = D(X) = 12.3²

Thereby Y ∼ N(0, 12.3²) ⇒ Z = (Y − 0)/12.3 ∼ N(0, 1)

and a ≤ s² ≤ b ⇒ a ≤ (12.3²/25)(Z₁² + ⋯ + Z₂₅²) ≤ b ⇒ (25/12.3²)a ≤ Z₁² + ⋯ + Z₂₅² ≤ (25/12.3²)b

⇒ (25/12.3²)a ≤ W ≤ (25/12.3²)b, where W ∼ χ²(k = 25)

Let a = 0 ⇒ P(0 ≤ W ≤ 37.652) ≈ 0.95

37.652 = (25/12.3²)b ⇒ b = 37.652·12.3²/25

In [65]:

round(37.652 * 12.3 ** 2 / 25, 2)

Out[65]:

227.85

Answer: a = 0, b ≈ 227.85
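A Monte Carlo sanity check that P(s² ≤ 37.652·σ²/25) ≈ 0.95 for σ = 12.3 (the scale factor is σ² = 12.3², which is why b lands near 228):

```python
import random

random.seed(5)
mu, sigma, n, trials = 147.8, 12.3, 25, 50000
b = 37.652 * sigma ** 2 / n  # upper bound derived above, ≈ 227.85

# Fraction of simulated samples whose empirical variance s^2 stays below b
hits = 0
for _ in range(trials):
    s2 = sum((random.gauss(mu, sigma) - mu) ** 2 for _ in range(n)) / n
    hits += s2 <= b

print(round(hits / trials, 3))  # close to 0.95
```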
3. The table shows the average January temperature per year in two cities (the values are loaded into a DataFrame below):

a) What is the empirical mean?

X̄ = (1/n)Σxᵢ
In [77]:

import pandas as pd

In [40]:

df = pd.DataFrame({
'city1': [-19.2, -14.8, -19.6, -11.1, -9.4, -16.9, -13.7, -4.9, -13.9, -9.4, -8.
'city2': [-21.8, -15.4, -20.8, -11.3, -11.6, -19.2, -13.0, -7.4, -15.1, -14.4, -

In [67]:

mean1 = 0
for i in list(df['city1']):
    mean1 += i
mean1 = mean1 / len(df)
print('The empirical mean for City 1:', round(mean1, 2))

mean2 = 0
for i in list(df['city2']):
    mean2 += i
mean2 = mean2 / len(df)
print('The empirical mean for City 2:', round(mean2, 2))

The empirical mean for City 1: -11.88

The empirical mean for City 2: -13.75

b) What is the empirical variance?

s² = (1/n)Σ(xᵢ − X̄)² (a biased estimator)

In [71]:

var1 = 0
for i in list(df['city1']):
    var1 += (i - mean1) ** 2
var1 = var1 / len(df)
print('The empirical variance for City 1:', round(var1, 2))

var2 = 0
for i in list(df['city2']):
    var2 += (i - mean2) ** 2
var2 = var2 / len(df)
print('The empirical variance for City 2:', round(var2, 2))

The empirical variance for City 1: 22.14

The empirical variance for City 2: 20.09

c) What is the empirical correlation coefficient?

corr = cov/(s₁s₂), where cov = (1/n)Σ x₁ᵢx₂ᵢ − X̄₁·X̄₂
In [75]:

prod_mean = 0
for i in range(len(df)):
    prod_mean += list(df['city1'])[i] * list(df['city2'])[i]
prod_mean = prod_mean / len(df)
cov = prod_mean - mean1 * mean2
corr = cov / (var1 ** (1/2) * var2 ** (1/2))
print('The empirical correlation coefficient:', round(corr, 2))

The empirical correlation coefficient: 0.96
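For reference, pandas computes the same three quantities directly; note ddof=0 for the population (biased) variance, since the formulas above divide by n. The four temperature values below are a stand-in (the full data frame is truncated in the cell above); only the calls matter.

```python
import pandas as pd

# Assumed stand-in data for illustration
df = pd.DataFrame({'city1': [-19.2, -14.8, -19.6, -11.1],
                   'city2': [-21.8, -15.4, -20.8, -11.3]})

means = df.mean()                     # empirical means
variances = df.var(ddof=0)            # biased (population) variances, divide by n
corr = df['city1'].corr(df['city2'])  # Pearson correlation coefficient

print(round(means['city1'], 2), round(variances['city1'], 2), round(corr, 2))
```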

d) Create the histogram using python/R

In [82]:

import matplotlib.pyplot as plt


In [98]:

df.hist(figsize=(10,5), bins=5, grid=False)


plt.show()
