
8.3.3 Confidence Intervals for Normal Samples


In the above discussion, we assumed n to be large so that we could use the CLT. An interesting aspect of the confidence intervals that we obtained was that they often did not depend on the details of the distribution from which we obtained the random sample. That is, the confidence intervals only depended on statistics such as $\bar{X}$ and $S^2$.

What if n is not large? In this case, we cannot use the CLT, so we need to use the probability distribution from which the random sample is obtained. A very important case is when we have a sample $X_1, X_2, X_3, \ldots, X_n$ from a normal distribution. Here, we would like to discuss how to find interval estimators for the mean and the variance of a normal distribution. Before doing so, we need to introduce two probability distributions that are related to the normal distribution. These distributions are useful when finding interval estimators for the mean and the variance of a normal distribution.

Chi-Squared Distribution

Let us remember the gamma distribution. A continuous random variable X is said to have a gamma distribution with parameters α > 0 and λ > 0, shown as X ∼ Gamma(α, λ), if its PDF is given by
$$f_X(x) = \begin{cases} \dfrac{\lambda^{\alpha} x^{\alpha-1} e^{-\lambda x}}{\Gamma(\alpha)} & x > 0 \\[2mm] 0 & \textrm{otherwise} \end{cases}$$

Now, we would like to define a closely related distribution, called the chi-squared distribution. We know that if $Z_1, Z_2, \cdots, Z_n$ are independent standard normal random variables, then the random variable

$$X = Z_1 + Z_2 + \cdots + Z_n$$

is also normal. More specifically, $X \sim N(0, n)$. Now, if we define a random variable Y as

$$Y = Z_1^2 + Z_2^2 + \cdots + Z_n^2,$$

then Y is said to have a chi-squared distribution with n degrees of freedom, shown by

$$Y \sim \chi^2(n).$$

It can be shown that the random variable Y has, in fact, a gamma distribution with parameters $\alpha = \frac{n}{2}$ and $\lambda = \frac{1}{2}$,

$$Y \sim \textrm{Gamma}\left(\frac{n}{2}, \frac{1}{2}\right).$$

Figure 8.5 shows the PDF of the $\chi^2(n)$ distribution for some values of n.

Figure 8.5 - The PDF of the $\chi^2(n)$ distribution for some values of n.
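As a quick numerical check of this relationship, the following MATLAB sketch (assuming the Statistics Toolbox functions chi2pdf and gampdf are available; the chosen n and evaluation points are only illustrative) compares the chi-squared PDF with the corresponding gamma PDF. Note that MATLAB's gampdf uses a shape/scale parameterization, so $\textrm{Gamma}(\alpha, \lambda)$ corresponds to gampdf(y, alpha, 1/lambda).

% Compare the chi-squared PDF with the equivalent gamma PDF.
n = 5;                       % degrees of freedom (illustrative choice)
y = 0:0.5:10;                % evaluation points
p_chi2  = chi2pdf(y, n);     % chi-squared PDF with n degrees of freedom
p_gamma = gampdf(y, n/2, 2); % Gamma(n/2, 1/2) PDF, scale = 1/lambda = 2
max(abs(p_chi2 - p_gamma))   % should be (numerically) zero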

So, let us summarize the definition and some properties of the chi-squared distribution.

The Chi-Squared Distribution

Definition 8.1.
If $Z_1, Z_2, \cdots, Z_n$ are independent standard normal random variables, the random variable Y defined as

$$Y = Z_1^2 + Z_2^2 + \cdots + Z_n^2$$

is said to have a chi-squared distribution with n degrees of freedom, shown by

$$Y \sim \chi^2(n).$$

Properties:

1. The chi-squared distribution is a special case of the gamma distribution. More specifically,

$$Y \sim \textrm{Gamma}\left(\frac{n}{2}, \frac{1}{2}\right).$$

Thus,

$$f_Y(y) = \frac{1}{2^{\frac{n}{2}} \, \Gamma\left(\frac{n}{2}\right)} \, y^{\frac{n}{2}-1} e^{-\frac{y}{2}}, \quad \textrm{for } y > 0.$$

2. $EY = n$, $\textrm{Var}(Y) = 2n$.

3. For any $p \in [0,1]$ and $n \in \mathbb{N}$, we define $\chi^2_{p,n}$ as the real value for which

$$P(Y > \chi^2_{p,n}) = p,$$

where $Y \sim \chi^2(n)$. Figure 8.6 shows $\chi^2_{p,n}$. In MATLAB, to compute $\chi^2_{p,n}$ you can use the following command: chi2inv(1 − p, n).

Figure 8.6 - The definition of $\chi^2_{p,n}$.
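For example, a minimal MATLAB sketch of this command (the particular values of p and n are arbitrary) computes $\chi^2_{p,n}$ and verifies its defining property using chi2cdf:

% Compute the critical value chi^2_{p,n} and check that P(Y > chi^2_{p,n}) = p.
p = 0.05;  n = 9;            % illustrative choices
c = chi2inv(1 - p, n);       % chi^2_{p,n}: the value with right-tail probability p
[c, 1 - chi2cdf(c, n)]       % second entry should be (numerically) equal to p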

Now, why do we need the chi-squared distribution? One reason is the following theorem, which we will use in estimating the variance of normal random variables.
Theorem 8.3.
Let $X_1, X_2, \cdots, X_n$ be i.i.d. $N(\mu, \sigma^2)$ random variables. Also, let $S^2$ be the sample variance for this random sample. Then, the random variable Y defined as

$$Y = \frac{(n-1)S^2}{\sigma^2} = \frac{1}{\sigma^2} \sum_{i=1}^{n} (X_i - \bar{X})^2$$

has a chi-squared distribution with $n-1$ degrees of freedom, i.e., $Y \sim \chi^2(n-1)$. Moreover, $\bar{X}$ and $S^2$ are independent random variables.
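To build intuition for this theorem, one can check it by simulation. The following MATLAB sketch (a minimal illustration; the parameter values are arbitrary) draws many normal samples, forms $Y = (n-1)S^2/\sigma^2$ for each, and compares the empirical mean and variance of Y with the $\chi^2(n-1)$ values $n-1$ and $2(n-1)$:

% Simulation check of Theorem 8.3: (n-1)S^2/sigma^2 ~ chi^2(n-1).
mu = 2; sigma = 3; n = 8; m = 1e5;      % illustrative parameters, m = number of samples
X = mu + sigma * randn(n, m);           % each column is one sample of size n
S2 = var(X);                            % sample variance of each column (uses n-1)
Y = (n - 1) * S2 / sigma^2;             % the random variable in Theorem 8.3
[mean(Y), var(Y)]                       % should be close to [n-1, 2*(n-1)] = [7, 14]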

The t-Distribution
The next distribution that we need is the Student's t-distribution (or simply the t-distribution). Here, we provide the definition and some properties of the t-distribution.

The t-Distribution

Definition 8.2.
Let $Z \sim N(0,1)$, and $Y \sim \chi^2(n)$, where $n \in \mathbb{N}$. Also assume that Z and Y are independent. The random variable T defined as

$$T = \frac{Z}{\sqrt{Y/n}}$$

is said to have a t-distribution with n degrees of freedom, shown by

$$T \sim T(n).$$

Properties:

1. The t-distribution has a bell-shaped PDF centered at 0, but its PDF is more spread out than the normal PDF (Figure 8.7).
2. $ET = 0$, for $n > 1$. But $ET$ is undefined for $n = 1$.
3. $\textrm{Var}(T) = \frac{n}{n-2}$, for $n > 2$. But, $\textrm{Var}(T)$ is undefined for $n = 1, 2$.

4. As n becomes large, the t density approaches the standard normal PDF. More formally, we can write

$$T(n) \ \xrightarrow{d} \ N(0,1).$$

5. For any $p \in [0,1]$ and $n \in \mathbb{N}$, we define $t_{p,n}$ as the real value for which

$$P(T > t_{p,n}) = p.$$

Since the t-distribution has a symmetric PDF, we have

$$t_{1-p,n} = -t_{p,n}.$$

In MATLAB, to compute $t_{p,n}$ you can use the following command: tinv(1 − p, n).

Figure 8.7 shows the PDF of the t-distribution for some values of n and compares them with the PDF of the standard normal distribution. As we see, the t density is more spread out than the standard normal PDF. Figure 8.8 shows $t_{p,n}$.

Figure 8.7 - The PDF of the t-distribution for some values of n compared with the standard normal PDF.

Figure 8.8 - The definition of $t_{p,n}$.
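As an illustration of the last two properties, the following MATLAB sketch (the chosen values of p and n are arbitrary) computes $t_{p,n}$ with tinv, checks the symmetry relation $t_{1-p,n} = -t_{p,n}$, and compares a large-n quantile with the standard normal quantile from norminv:

% Compute t_{p,n} and check symmetry and the large-n normal approximation.
p = 0.025;
t_small = tinv(1 - p, 5);        % t_{0.025,5}
t_sym   = tinv(p, 5);            % t_{0.975,5}, should equal -t_{0.025,5}
t_large = tinv(1 - p, 200);      % t_{0.025,200}, close to the normal quantile
z       = norminv(1 - p);        % z_{0.025}, approximately 1.96
[t_small, t_sym, t_large, z]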

Why do we need the t-distribution? One reason is the following theorem which we will use in estimating the mean of normal random variables.
Theorem 8.4.
Let $X_1, X_2, \cdots, X_n$ be i.i.d. $N(\mu, \sigma^2)$ random variables. Also, let $S^2$ be the sample variance for this random sample. Then, the random variable T defined as

$$T = \frac{\bar{X} - \mu}{S/\sqrt{n}}$$

has a t-distribution with $n-1$ degrees of freedom, i.e., $T \sim T(n-1)$.


Proof:
Define the random variable Z as

$$Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}.$$

Then, $Z \sim N(0,1)$. Also, define the random variable Y as

$$Y = \frac{(n-1)S^2}{\sigma^2}.$$

Then, by Theorem 8.3, $Y \sim \chi^2(n-1)$. Moreover, Z and Y are independent, since $\bar{X}$ and $S^2$ are independent (Theorem 8.3). We conclude that the random variable

$$T = \frac{Z}{\sqrt{\frac{Y}{n-1}}} = \frac{\bar{X} - \mu}{S/\sqrt{n}}$$

has a t-distribution with $n-1$ degrees of freedom.
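As with Theorem 8.3, the result can also be checked numerically. The MATLAB sketch below (an illustrative simulation, not part of the proof; all parameter values are arbitrary) generates many samples, forms $T = (\bar{X}-\mu)/(S/\sqrt{n})$ for each, and compares an empirical tail probability with the one predicted by the $T(n-1)$ distribution:

% Simulation check of Theorem 8.4: (Xbar - mu)/(S/sqrt(n)) ~ T(n-1).
mu = 1; sigma = 2; n = 6; m = 1e5;        % illustrative parameters
X = mu + sigma * randn(n, m);             % each column is one sample of size n
T = (mean(X) - mu) ./ (std(X) / sqrt(n)); % one value of T per sample
c = tinv(0.95, n - 1);                    % t_{0.05,n-1}
[mean(T > c), 0.05]                       % empirical vs. theoretical tail probability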

Confidence Intervals for the Mean of Normal Random Variables

Here, we assume that $X_1, X_2, X_3, \ldots, X_n$ is a random sample from a normal distribution $N(\mu, \sigma^2)$, and our goal is to find an interval estimator for $\mu$. We no longer require n to be large. Thus, n could be any positive integer. There are two possible scenarios depending on whether $\sigma^2$ is known or not.

If the value of $\sigma^2$ is known, we can easily find a confidence interval for $\mu$. This can be done using exactly the same method that we used to estimate $\mu$ for a general distribution for the case of large n. More specifically, we know that the random variable

$$Q = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$$

has a $N(0,1)$ distribution. In particular, Q is a function of the $X_i$'s and $\mu$, and its distribution does not depend on $\mu$. Thus, Q is a pivotal quantity, and we conclude that $\left[\bar{X} - z_{\frac{\alpha}{2}} \frac{\sigma}{\sqrt{n}}, \ \bar{X} + z_{\frac{\alpha}{2}} \frac{\sigma}{\sqrt{n}}\right]$ is a $(1-\alpha)100\%$ confidence interval for $\mu$.

Assumptions: A random sample $X_1, X_2, X_3, \ldots, X_n$ is given from a $N(\mu, \sigma^2)$ distribution, where $\textrm{Var}(X_i) = \sigma^2$ is known.

Parameter to be Estimated: $\mu = EX_i$.

Confidence Interval: $\left[\bar{X} - z_{\frac{\alpha}{2}} \frac{\sigma}{\sqrt{n}}, \ \bar{X} + z_{\frac{\alpha}{2}} \frac{\sigma}{\sqrt{n}}\right]$ is a $(1-\alpha)100\%$ confidence interval for $\mu$.
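A minimal MATLAB sketch of this interval is given below; the data and the value of $\sigma$ are made up purely for illustration, and norminv supplies $z_{\frac{\alpha}{2}}$:

% (1-alpha)100% confidence interval for mu when sigma is known.
x = [51.2 49.8 50.5 52.1 48.9];           % hypothetical observations
sigma = 1.5;                              % known standard deviation (assumed)
alpha = 0.05;
n = length(x);
z = norminv(1 - alpha/2);                 % z_{alpha/2}
ci = [mean(x) - z*sigma/sqrt(n), mean(x) + z*sigma/sqrt(n)]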

The more interesting case is when we do not know the variance $\sigma^2$. More specifically, we are given $X_1, X_2, X_3, \ldots, X_n$, which is a random sample from a normal distribution $N(\mu, \sigma^2)$, and our goal is to find an interval estimator for $\mu$. However, $\sigma^2$ is also unknown. In this case, using Theorem 8.4, we conclude that the random variable T defined as

$$T = \frac{\bar{X} - \mu}{S/\sqrt{n}}$$

has a t-distribution with $n-1$ degrees of freedom, i.e., $T \sim T(n-1)$. Here, the random variable T is a pivotal quantity, since it is a function of the $X_i$'s and $\mu$, and its distribution does not depend on $\mu$ or any other unknown parameters. Now that we have a pivot, the next step is to find a $(1-\alpha)$ interval for T. Using the definition of $t_{p,n}$, a $(1-\alpha)$ interval for T can be stated as

$$P\left(-t_{\frac{\alpha}{2},n-1} \leq T \leq t_{\frac{\alpha}{2},n-1}\right) = 1 - \alpha.$$

Therefore,

$$P\left(-t_{\frac{\alpha}{2},n-1} \leq \frac{\bar{X} - \mu}{S/\sqrt{n}} \leq t_{\frac{\alpha}{2},n-1}\right) = 1 - \alpha,$$

which is equivalent to

$$P\left(\bar{X} - t_{\frac{\alpha}{2},n-1}\frac{S}{\sqrt{n}} \leq \mu \leq \bar{X} + t_{\frac{\alpha}{2},n-1}\frac{S}{\sqrt{n}}\right) = 1 - \alpha.$$

We conclude that $\left[\bar{X} - t_{\frac{\alpha}{2},n-1}\frac{S}{\sqrt{n}}, \ \bar{X} + t_{\frac{\alpha}{2},n-1}\frac{S}{\sqrt{n}}\right]$ is a $(1-\alpha)100\%$ confidence interval for $\mu$.

Assumptions: A random sample $X_1, X_2, X_3, \ldots, X_n$ is given from a $N(\mu, \sigma^2)$ distribution, where $\mu = EX_i$ and $\textrm{Var}(X_i) = \sigma^2$ are unknown.

Parameter to be Estimated: $\mu = EX_i$.

Confidence Interval: $\left[\bar{X} - t_{\frac{\alpha}{2},n-1}\frac{S}{\sqrt{n}}, \ \bar{X} + t_{\frac{\alpha}{2},n-1}\frac{S}{\sqrt{n}}\right]$ is a $(1-\alpha)100\%$ confidence interval for $\mu$.

Example 8.20

A farmer weighs 10 randomly chosen watermelons from his farm and obtains the following values (in lbs):

7.72 9.58 12.38 7.77 11.27 8.80 11.10 7.80 10.17 6.00

Assuming that the weight is normally distributed with mean $\mu$ and variance $\sigma^2$, find a 95% confidence interval for $\mu$.

Solution
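One way to carry out the computation is with the MATLAB sketch below (a sketch of the calculation, using the interval $\left[\bar{X} - t_{\frac{\alpha}{2},n-1}\frac{S}{\sqrt{n}}, \ \bar{X} + t_{\frac{\alpha}{2},n-1}\frac{S}{\sqrt{n}}\right]$ derived above). With this data it gives $\bar{X} \approx 9.26$ and $S^2 \approx 3.96$ (consistent with Example 8.21 below), and a 95% confidence interval of approximately [7.84, 10.68].

% 95% confidence interval for mu with unknown sigma (data of Example 8.20).
x = [7.72 9.58 12.38 7.77 11.27 8.80 11.10 7.80 10.17 6.00];
alpha = 0.05;
n    = length(x);
xbar = mean(x);                                   % sample mean, about 9.26
S    = std(x);                                    % sample standard deviation
t    = tinv(1 - alpha/2, n - 1);                  % t_{0.025,9}, about 2.262
ci   = [xbar - t*S/sqrt(n), xbar + t*S/sqrt(n)]   % about [7.84, 10.68]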

Confidence Intervals for the Variance of Normal Random Variables

Now, suppose that we would like to estimate the variance of a normal distribution. More specifically, assume that $X_1, X_2, X_3, \ldots, X_n$ is a random sample from a normal distribution $N(\mu, \sigma^2)$, and our goal is to find an interval estimator for $\sigma^2$. We assume that $\mu$ is also unknown. Again, n could be any positive integer. By Theorem 8.3, the random variable Q defined as

$$Q = \frac{(n-1)S^2}{\sigma^2} = \frac{1}{\sigma^2} \sum_{i=1}^{n} (X_i - \bar{X})^2$$

has a chi-squared distribution with $n-1$ degrees of freedom, i.e., $Q \sim \chi^2(n-1)$. In particular, Q is a pivotal quantity since it is a function of the $X_i$'s and $\sigma^2$, and its distribution does not depend on $\sigma^2$ or any other unknown parameters. Using the definition of $\chi^2_{p,n}$, a $(1-\alpha)$ interval for Q can be stated as

$$P\left(\chi^2_{1-\frac{\alpha}{2},n-1} \leq Q \leq \chi^2_{\frac{\alpha}{2},n-1}\right) = 1 - \alpha.$$

Therefore,

$$P\left(\chi^2_{1-\frac{\alpha}{2},n-1} \leq \frac{(n-1)S^2}{\sigma^2} \leq \chi^2_{\frac{\alpha}{2},n-1}\right) = 1 - \alpha,$$

which is equivalent to

$$P\left(\frac{(n-1)S^2}{\chi^2_{\frac{\alpha}{2},n-1}} \leq \sigma^2 \leq \frac{(n-1)S^2}{\chi^2_{1-\frac{\alpha}{2},n-1}}\right) = 1 - \alpha.$$

We conclude that $\left[\frac{(n-1)S^2}{\chi^2_{\frac{\alpha}{2},n-1}}, \ \frac{(n-1)S^2}{\chi^2_{1-\frac{\alpha}{2},n-1}}\right]$ is a $(1-\alpha)100\%$ confidence interval for $\sigma^2$.

Assumptions: A random sample $X_1, X_2, X_3, \ldots, X_n$ is given from a $N(\mu, \sigma^2)$ distribution, where $\mu = EX_i$ and $\textrm{Var}(X_i) = \sigma^2$ are unknown.

Parameter to be Estimated: $\textrm{Var}(X_i) = \sigma^2$.

Confidence Interval: $\left[\frac{(n-1)S^2}{\chi^2_{\frac{\alpha}{2},n-1}}, \ \frac{(n-1)S^2}{\chi^2_{1-\frac{\alpha}{2},n-1}}\right]$ is a $(1-\alpha)100\%$ confidence interval for $\sigma^2$.

Example 8.21

For the data given in Example 8.20, find a 95% confidence interval for $\sigma^2$. Again, assume that the weight is normally distributed with mean $\mu$ and variance $\sigma^2$, where $\mu$ and $\sigma^2$ are unknown.

Solution
As before, using the data we obtain

$$\bar{X} = 9.26, \quad S^2 = 3.96.$$

Here, $n = 10$, $\alpha = 0.05$, so we need

$$\chi^2_{0.025,9} = 19.02, \quad \chi^2_{0.975,9} = 2.70.$$

The above values can be obtained in MATLAB using the commands chi2inv(0.975, 9) and chi2inv(0.025, 9), respectively. Thus, we can obtain a 95% confidence interval for $\sigma^2$ as

$$\left[\frac{(n-1)S^2}{\chi^2_{\frac{\alpha}{2},n-1}}, \ \frac{(n-1)S^2}{\chi^2_{1-\frac{\alpha}{2},n-1}}\right] = \left[\frac{9 \times 3.96}{19.02}, \ \frac{9 \times 3.96}{2.70}\right]$$

$$= [1.87, 13.20].$$

Therefore, [1.87, 13.20] is a 95% confidence interval for σ 2 .
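For completeness, a short MATLAB sketch that reproduces this computation directly from the data is given below; rounding of the intermediate values explains any small differences from the hand calculation above.

% 95% confidence interval for sigma^2 (data of Examples 8.20 and 8.21).
x = [7.72 9.58 12.38 7.77 11.27 8.80 11.10 7.80 10.17 6.00];
alpha = 0.05;
n  = length(x);
S2 = var(x);                                     % sample variance, about 3.96
lo = (n-1)*S2 / chi2inv(1 - alpha/2, n-1);       % lower limit, uses chi^2_{0.025,9}
hi = (n-1)*S2 / chi2inv(alpha/2, n-1);           % upper limit, uses chi^2_{0.975,9}
ci = [lo, hi]                                    % about [1.87, 13.20]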
