Ch2 Confidence Intervals For Normal Samples Part 2
Chi-Squared Distribution
Let us remember the gamma distribution. A continuous random variable X is said to have a gamma distribution with parameters α > 0 and λ > 0, shown as X ∼ Gamma(α, λ), if its PDF is given by
f_X(x) = (λ^α x^(α−1) e^(−λx)) / Γ(α), for x > 0,

and f_X(x) = 0 otherwise.
Now, we would like to define a closely related distribution, called the chi-squared distribution. Suppose that Z1, Z2, ⋯, Zn are independent standard normal random variables, and consider the random variable

Y = Z1² + Z2² + ⋯ + Zn².
It can be shown that the random variable Y has, in fact, a gamma distribution with parameters α = n/2 and λ = 1/2, i.e.,

Y ∼ Gamma(n/2, 1/2).
Figure 8.5 shows the PDF of the χ²(n) distribution for some values of n.
Figure 8.5 - The PDF of the χ²(n) distribution for some values of n.
So, let us summarize the definition and some properties of the chi-squared distribution.
Definition 8.1.
If Z1, Z2, ⋯, Zn are independent standard normal random variables, the random variable Y defined as

Y = Z1² + Z2² + ⋯ + Zn²

is said to have a chi-squared distribution with n degrees of freedom, shown as

Y ∼ χ²(n).
Properties:
1. The chi-squared distribution is a special case of the gamma distribution. More specifically,

Y ∼ Gamma(n/2, 1/2).

Thus,

f_Y(y) = (1 / (2^(n/2) Γ(n/2))) y^(n/2 − 1) e^(−y/2), for y > 0.
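Since the chi-squared PDF is just a gamma PDF with α = n/2 and λ = 1/2, it is straightforward to evaluate numerically. The following sketch (Python rather than the MATLAB used elsewhere in this chapter, purely as an illustration) codes the density above and checks that it integrates to approximately 1:

```python
import math

def chi2_pdf(y, n):
    """PDF of the chi-squared distribution with n degrees of freedom,
    i.e., Gamma(alpha = n/2, lambda = 1/2) evaluated at y."""
    if y <= 0:
        return 0.0
    return y ** (n / 2 - 1) * math.exp(-y / 2) / (2 ** (n / 2) * math.gamma(n / 2))

# Crude Riemann-sum check that the density integrates to (approximately) 1:
n = 4
dy = 0.001
total = sum(chi2_pdf(i * dy, n) * dy for i in range(1, 100000))
print(round(total, 3))
```

The Riemann sum over (0, 100] is close to 1 because the χ²(4) tail beyond 100 is negligible.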
2. E Y = n, Var(Y) = 2n.
3. For any p ∈ [0, 1] and n ∈ ℕ, we define χ²_{p,n} as the real value for which

P(Y > χ²_{p,n}) = p,

where Y ∼ χ²(n).
Figure 8.6 shows χ²_{p,n}. In MATLAB, to compute χ²_{p,n}, you can use the following command: chi2inv(1 − p, n).
Figure 8.6 - The definition of χ²_{p,n}.
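If MATLAB is not at hand, χ²_{p,n} can also be approximated with standard-library tools. The sketch below uses the Wilson–Hilferty cube-root approximation in Python; note that it is an approximation, not the exact value that chi2inv returns:

```python
import math
from statistics import NormalDist

def chi2_upper_quantile(p, n):
    """Approximate chi^2_{p,n}, the value with P(Y > chi^2_{p,n}) = p for
    Y ~ chi-squared(n), via the Wilson-Hilferty approximation:
    (Y/n)^(1/3) is roughly normal with mean 1 - 2/(9n) and variance 2/(9n)."""
    z = NormalDist().inv_cdf(1 - p)   # standard normal (1 - p)-quantile
    c = 2.0 / (9.0 * n)
    return n * (1 - c + z * math.sqrt(c)) ** 3

print(round(chi2_upper_quantile(0.025, 9), 2))  # close to chi2inv(0.975, 9) = 19.02
print(round(chi2_upper_quantile(0.975, 9), 2))  # close to chi2inv(0.025, 9) = 2.70
```

The approximation is quite accurate for moderate n; for n = 9 it matches the exact upper quantile 19.02 to two decimals and is within a few hundredths of the lower one.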
Now, why do we need the chi-squared distribution? One reason is the following theorem, which we will use in estimating the variance of normal random variables.
Theorem 8.3.
Let X1 , X2 , ⋯ , Xn be i.i.d. N (μ, σ 2 ) random variables. Also, let S 2 be the sample variance for this random sample. Then, the random variable Y defined as
Y = (n − 1)S²/σ² = (1/σ²) ∑_{i=1}^{n} (Xi − X̄)²

has a chi-squared distribution with n − 1 degrees of freedom, i.e., Y ∼ χ²(n − 1). Moreover, X̄ and S² are independent random variables.
The t-Distribution
The next distribution that we need is the Student's t-distribution (or simply the t-distribution). Here, we provide the definition and some properties of the t-distribution.
Definition 8.2.
Let Z ∼ N(0, 1) and Y ∼ χ²(n), where n ∈ ℕ. Also assume that Z and Y are independent. The random variable T defined as

T = Z / √(Y/n)

is said to have a t-distribution with n degrees of freedom, shown as

T ∼ T(n).
Properties:
1. The t-distribution has a bell-shaped PDF centered at 0, but its PDF is more spread out than the normal PDF (Figure 8.7).
2. E T = 0, for n > 1. But E T is undefined for n = 1.
3. Var(T) = n/(n − 2), for n > 2. But Var(T) is undefined for n = 1, 2.
4. As n becomes large, the t density approaches the standard normal PDF. More formally, T(n) converges in distribution to the standard normal distribution as n → ∞.
5. For any p ∈ [0, 1] and n ∈ ℕ, we define t_{p,n} as the real value for which

P(T > t_{p,n}) = p.

Since the t-distribution has a symmetric PDF, we have

t_{1−p,n} = −t_{p,n}.

In MATLAB, to compute t_{p,n}, you can use the following command: tinv(1 − p, n).
Figure 8.7 shows the PDF of the t-distribution for some values of n and compares them with the PDF of the standard normal distribution. As we see, the t density is more spread out than the standard normal PDF. Figure 8.8 shows t_{p,n}.
Figure 8.7 - The PDF of t-distribution for some values of n compared with the standard normal PDF.
Why do we need the t-distribution? One reason is the following theorem, which we will use in estimating the mean of normal random variables.
Theorem 8.4.
Let X1 , X2 , ⋯ , Xn be i.i.d. N (μ, σ 2 ) random variables. Also, let S 2 be the sample variance for this random sample. Then, the random variable T defined as
T = (X̄ − μ) / (S/√n)

has a t-distribution with n − 1 degrees of freedom, i.e., T ∼ T(n − 1). To see this, note that Z = (X̄ − μ)/(σ/√n) has a N(0, 1) distribution, and by Theorem 8.3 the random variable

Y = (n − 1)S²/σ²

has a χ²(n − 1) distribution and is independent of Z. Therefore,

T = Z / √(Y/(n − 1)) = (X̄ − μ) / (S/√n)

has a t-distribution with n − 1 degrees of freedom.
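A quick Monte Carlo sketch of this theorem (Python, with hypothetical parameters μ = 0, σ = 1, n = 5): the statistic T = (X̄ − μ)/(S/√n) should exceed ±1.96 noticeably more often than the roughly 5% a standard normal would, because T ∼ T(4) has heavier tails:

```python
import math
import random
import statistics

random.seed(1)
mu, sigma, n, trials = 0.0, 1.0, 5, 20000   # hypothetical parameters

exceed = 0
for _ in range(trials):
    sample = [random.gauss(mu, sigma) for _ in range(n)]
    xbar = statistics.mean(sample)
    s = statistics.stdev(sample)             # sample standard deviation S
    t = (xbar - mu) / (s / math.sqrt(n))
    if abs(t) > 1.96:
        exceed += 1

print(exceed / trials)   # noticeably larger than 0.05, as expected for t(4)
```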
Here, we assume that X1 , X2 , X3 , . . ., Xn is a random sample from a normal distribution N (μ, σ 2 ) , and our goal is to find an interval estimator for μ. We no longer require n to be large. Thus, n could be any positive integer. There are two possible scenarios depending on whether σ 2 is known or not.
If the value of σ 2 is known, we can
easily find a confidence interval for μ. This can be done using exactly the same method that we used to estimate μ for a general distribution for the case of large n. More specifically, we know that the random variable
Q = (X̄ − μ) / (σ/√n)

has a N(0, 1) distribution. In particular, Q is a function of the Xi's and μ, and its distribution does not depend on μ. Thus, Q is a pivotal quantity, and we conclude that

[X̄ − z_{α/2} σ/√n, X̄ + z_{α/2} σ/√n]

is a (1 − α)100% confidence interval for μ.
Assumptions: A random sample X1, X2, X3, ..., Xn is given from a N(μ, σ²) distribution, where Var(Xi) = σ² is known.
Parameter to be Estimated: μ = E Xi.
Confidence Interval: [X̄ − z_{α/2} σ/√n, X̄ + z_{α/2} σ/√n] is a (1 − α)100% confidence interval for μ.
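As a quick illustration, the known-σ interval can be computed directly. The sketch below (Python, with hypothetical inputs: a sample mean of 9.26 from n = 10 observations and a known σ = 2) wraps the formula above:

```python
import math
from statistics import NormalDist

def normal_mean_ci(xbar, sigma, n, alpha=0.05):
    """(1 - alpha)100% CI for mu when sigma is known:
    [xbar - z_{alpha/2} sigma/sqrt(n), xbar + z_{alpha/2} sigma/sqrt(n)]."""
    z = NormalDist().inv_cdf(1 - alpha / 2)   # z_{alpha/2}
    half = z * sigma / math.sqrt(n)
    return xbar - half, xbar + half

# Hypothetical numbers, for illustration only:
lo, hi = normal_mean_ci(xbar=9.26, sigma=2.0, n=10)
print(round(lo, 2), round(hi, 2))
```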
The more interesting case is when we do not know the variance σ 2 . More specifically, we are given X1 , X2 , X3 , . . ., Xn , which is a random sample from a normal distribution N (μ, σ 2 ) , and our goal is to find an interval estimator for μ. However, σ 2 is also unknown. In this case, using Theorem 8.4, we conclude that the random variable T
defined as
T = (X̄ − μ) / (S/√n)
has a t-distribution with n − 1 degrees of freedom, i.e., T ∼ T (n − 1) . Here, the random variable T is a pivotal quantity, since it is a function of the Xi 's and μ, and its distribution does not depend on μ or any other unknown parameters. Now that we have a pivot, the next step is to find a (1 − α) interval for T . Using the definition of tp,n , a
(1 − α) interval for T can be stated as

P(−t_{α/2,n−1} ≤ T ≤ t_{α/2,n−1}) = 1 − α.
Therefore,
P(−t_{α/2,n−1} ≤ (X̄ − μ)/(S/√n) ≤ t_{α/2,n−1}) = 1 − α,
which is equivalent to
P(X̄ − t_{α/2,n−1} S/√n ≤ μ ≤ X̄ + t_{α/2,n−1} S/√n) = 1 − α.
We conclude that [X̄ − t_{α/2,n−1} S/√n, X̄ + t_{α/2,n−1} S/√n] is a (1 − α)100% confidence interval for μ.
Assumptions: A random sample X1, X2, X3, ..., Xn is given from a N(μ, σ²) distribution, where μ = E Xi and Var(Xi) = σ² are unknown.
Parameter to be Estimated: μ = E Xi.
Confidence Interval: [X̄ − t_{α/2,n−1} S/√n, X̄ + t_{α/2,n−1} S/√n] is a (1 − α)100% confidence interval for μ.
Example 8.20
A farmer weighs 10 randomly chosen watermelons from his farm and obtains the following values (in lbs):
7.72 9.58 12.38 7.77 11.27 8.80 11.10 7.80 10.17 6.00
Assuming that the weight is normally distributed with mean μ and variance σ 2 , find a 95% confidence interval for μ.
Solution
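A minimal sketch of the computation in Python (the critical value t_{0.025,9} ≈ 2.262 is taken from a standard t-table; in MATLAB it is tinv(0.975, 9)):

```python
import math
import statistics

weights = [7.72, 9.58, 12.38, 7.77, 11.27, 8.80, 11.10, 7.80, 10.17, 6.00]
n = len(weights)
xbar = statistics.mean(weights)       # sample mean, about 9.26
s = statistics.stdev(weights)         # sample standard deviation S, about 1.99
t_crit = 2.262                        # t_{0.025,9} from a t-table

half = t_crit * s / math.sqrt(n)
print(round(xbar - half, 2), round(xbar + half, 2))   # 95% CI for mu
```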
Now, suppose that we would like to estimate the variance of a normal distribution. More specifically, assume that X1 , X2 , X3 , . . ., Xn is a random sample from a normal distribution N (μ, σ 2 ) , and our goal is to find an interval estimator for σ 2 . We assume that μ is also unknown. Again, n could be any positive integer.
By Theorem 8.3, the
random variable Q defined as
Q = (n − 1)S²/σ² = (1/σ²) ∑_{i=1}^{n} (Xi − X̄)²

has a chi-squared distribution with n − 1 degrees of freedom. Thus, Q is a pivotal quantity, and a (1 − α) interval for Q can be stated as

P(χ²_{1−α/2,n−1} ≤ Q ≤ χ²_{α/2,n−1}) = 1 − α.
Therefore,
P(χ²_{1−α/2,n−1} ≤ (n − 1)S²/σ² ≤ χ²_{α/2,n−1}) = 1 − α,
which is equivalent to
P((n − 1)S²/χ²_{α/2,n−1} ≤ σ² ≤ (n − 1)S²/χ²_{1−α/2,n−1}) = 1 − α.
We conclude that [(n − 1)S²/χ²_{α/2,n−1}, (n − 1)S²/χ²_{1−α/2,n−1}] is a (1 − α)100% confidence interval for σ².
Assumptions: A random sample X1, X2, X3, ..., Xn is given from a N(μ, σ²) distribution, where μ = E Xi and Var(Xi) = σ² are unknown.
Parameter to be Estimated: σ² = Var(Xi).
Confidence Interval: [(n − 1)S²/χ²_{α/2,n−1}, (n − 1)S²/χ²_{1−α/2,n−1}] is a (1 − α)100% confidence interval for σ².
Example 8.21
For the data given in Example 8.20, find a 95% confidence interval for σ 2 . Again, assume that the weight is normally distributed with mean μ and variance σ 2 , where μ and σ are unknown.
Solution
As before, using the data we obtain

X̄ = 9.26, S² = 3.96.

Here, n = 10 and α = 0.05, so we need χ²_{0.025,9} ≈ 19.02 and χ²_{0.975,9} ≈ 2.70. These values can be obtained in MATLAB using the commands chi2inv(0.975, 9) and chi2inv(0.025, 9), respectively. Thus, we can obtain a 95% confidence interval for σ² as
[(n − 1)S²/χ²_{α/2,n−1}, (n − 1)S²/χ²_{1−α/2,n−1}] = [9 × 3.96/19.02, 9 × 3.96/2.70]
= [1.87, 13.20].
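The same computation can be scripted; the Python sketch below reuses the critical values 19.02 and 2.70 quoted above:

```python
import statistics

weights = [7.72, 9.58, 12.38, 7.77, 11.27, 8.80, 11.10, 7.80, 10.17, 6.00]
n = len(weights)
s2 = statistics.variance(weights)     # sample variance S^2, about 3.96

# Chi-squared critical values quoted in the text
# (chi2inv(0.975, 9) and chi2inv(0.025, 9) in MATLAB):
chi2_hi, chi2_lo = 19.02, 2.70

lo = (n - 1) * s2 / chi2_hi
hi = (n - 1) * s2 / chi2_lo
print(round(lo, 2), round(hi, 2))     # 95% CI for sigma^2
```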