
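Academic Year 112, Department of Business Administration, Statistics   Topic 8: Sampling Methods and the Central Limit Theorem   Instructor: 林晉禾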

Topic 8: Sampling Methods and the Central Limit Theorem

Population Parameters

___________ ___________
______ ______

Sample Statistics

◎Sampling:The method of selecting a ___________ from the _____________.

To understand the characteristics of a population, statistical inference relies on sample surveys: the
population is inferred from a sample. How to select a sample from the population, how to estimate, and how to
judge the accuracy and error of an estimate are important issues that will be discussed one by one later.
Topic 8 begins with issues related to sampling, including:

I. Sampling Methods
   1. Probability Sampling
   2. Non-probability Sampling
II. Error of Estimation
   1. Non-sampling Errors
   2. Sampling Errors
III. Sampling Distributions
   1. The Sampling Distribution of the Sample Mean
      ◆ The Central Limit Theorem
   2. The Sampling Distribution of the Sample Proportions
   3. The Sampling Distribution of the Difference of Two Sample Means
   4. The Sampling Distribution of the Difference of Two Proportions
IV. Sampling from the Normal Distribution
   1. Normal Distribution
   2. Chi-squared Distribution
   3. t Distribution
   4. F Distribution

I. Sampling Methods

In research, if data on the whole population can be obtained, they can be analyzed directly,
but population data are usually not easy to obtain. So, _____________ is
used to obtain a ________________ sample to infer the population. There are many practical
reasons why we prefer to select portions or samples of a population to observe and measure:

1 To contact the whole population would be _________________.


2 __________________________________________________________________________ .

For example, to know how many bacteria are in the water of Sun Moon Lake, in theory all
the water in the lake would have to be tested. Because the amount of water is practically
unlimited, only a portion of the water is taken for testing and used to draw inferences about
the overall water quality.

3 ______________________________________. For example, when checking the quality of
cherries, if every cherry is checked, then by the time the check is completed all the cherries
have been eaten. Therefore, quality is usually checked by sampling.

4 __________________________________. For example, the federal government wants to
estimate the monthly food price index, which in principle requires the prices of all food items,
such as bread and milk, sold in all stores. Since the prices of items such as bread and milk vary
little among stores, it is appropriate to collect the food prices of only some stores and use them
to estimate the monthly food price index.

Sampling method refers to the process of obtaining sample data. The main sampling
methods are _________________ and _________________________:

1. Probability or Scientific Sampling

The probability that each item or individual in the population will be selected is
__________________________, and common probability sampling includes:

1 Simple Random Sampling



2 Systematic Random Sampling

3 Stratified Random Sampling

4 Cluster Sampling


Explanation:

(1) Simple Random Sampling

Definition: Each item or individual in the population has _____________ probability of being
selected.

When to use: Applicable when there is a _________________.


For example: The general manager of a company decides to select 10 of its 750 employees
for a business trip.
• Each employee has the same chance of being selected.
• How is simple random sampling carried out?
1 Write the names of the 750 employees on 750 slips of paper, put them in a box,
and then draw 10 slips in sequence. 【Writing every employee's name on paper is
very time-consuming → use a table of random numbers instead.】
(※ The drawn slips are not put back into the box, so the chance of each remaining
slip being picked increases with every draw, but the difference is not significant.)
2 Use a table of random numbers.
Use a computer to assign each employee a number from 1 to 750.
When using a random number table, a _________________ is randomly
selected in the table, and then 10 three-digit numbers between 001 and 750 are
read off. 【You can also use a computer to generate the random numbers.】
These numbers correspond to 10 employees on the list.

✓ There are only 750 employees, so numbers above 750 are skipped: 961 and the
next number to its right, 784, are skipped, and the third selected employee
number is 189.
✓ If a number is repeated, skip it.

Note: Sampling without replacement from a finite population is sometimes called simple
random sampling.
[The probability of each selection does increase slightly because the slip is not
replaced. However, the differences are very small when the population is large. In
other words, if the population size N is large compared to the sample size n,
$X_1, X_2, \dots, X_n$ are nearly independent and some approximate probability calculations
can be made assuming they are independent.]

Note: Simple random sampling is not easy to use in some situations. For example, suppose a
store wants to study how long customers stay in the store. Here simple random sampling
is not an effective method, because there is no customer list and therefore no
way to specify customers by random numbers.
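
As a minimal sketch of the computer-generated alternative to a random number table, the Python snippet below draws 10 distinct employee numbers from 1 to 750 without replacement. The function name `simple_random_sample` and the seed value are illustrative assumptions, not part of the handout.

```python
import random


def simple_random_sample(population_size, sample_size, seed=None):
    """Draw distinct unit numbers from 1..population_size, each equally likely."""
    rng = random.Random(seed)
    # random.sample draws without replacement, so no number can repeat.
    return sorted(rng.sample(range(1, population_size + 1), sample_size))


# 750 employees, 10 to be selected; the seed is an arbitrary choice for repeatability.
chosen = simple_random_sample(750, 10, seed=2024)
print(chosen)
```

Because the draw is without replacement, the "skip repeated numbers" rule of the random number table is handled automatically.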

(2) Systematic Random Sampling

Definition: All items or individuals of the population are arranged _________________, a
_________________ is randomly selected, and then every kth item or individual after it is
selected; that is, the sample takes every kth member at a fixed interval.
When to use: Applicable when the population is randomly ordered or shows no cyclic variation.
For example: To understand the consumption habits of customers, the manager of
a supermarket decides to use systematic random sampling to select 25 customers for
a survey.
• How is systematic random sampling carried out?
The supermarket manager can start from the 6th customer who enters the supermarket
and then select every 10th customer after that (16th, 26th, 36th, ...) until the goal of
25 customers is reached.
Note: When the physical order is related to a characteristic of the population, systematic
random sampling is not appropriate because the sample may be biased. For example, if we
want to audit invoices that are filed in ascending order by amount, systematic random
sampling cannot guarantee that the sample is unbiased. In this case, other sampling
methods should be used.
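
The supermarket example can be sketched in a few lines of Python. The helper name `systematic_sample` is a hypothetical choice; the start (6), interval (10) and sample size (25) come from the example above, while the random choice of a starting point at the end uses an arbitrary seed.

```python
import random


def systematic_sample(start, k, n):
    """Return the positions start, start + k, start + 2k, ... (n positions in total)."""
    return [start + i * k for i in range(n)]


# Supermarket example: begin with the 6th customer, then take every 10th customer.
positions = systematic_sample(start=6, k=10, n=25)
print(positions[:4], "...", positions[-1])

# In practice the starting point is itself chosen at random among the first k units.
random_start = random.Random(1).randint(1, 10)  # the seed is an arbitrary assumption
print(systematic_sample(random_start, k=10, n=25)[:4])
```

With a start of 6 and an interval of 10, the 25th selected customer is the 246th to enter the store.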

【EXAMPLE 1】(Two sampling methods are used)

We decide to select 100 customers over 4 days, Mon.~Thur. We will select 25 customers a
day and begin the sampling at different times each day: 8 am, 11 am, 4 pm, and 7 pm.


(3) Stratified Random Sampling

Definition: The population is divided into _______________ groups according to
_____________________, called "__________", and the required proportion of the sample is
selected from each layer by simple random sampling. There is
__________ variability between layers and _______ variability within layers, with the items
within each layer sharing some homogeneous properties.
When to use:Applicable when the population can be clearly divided into several groups based
on certain characteristics.
For example: Surveying men's and women's thoughts about student internships.
• How is stratified random sampling carried out?
First, the population is divided into two groups, men and women, and then several
students are selected from each group for the survey.
In this example, stratified random sampling ensures that every layer is represented
in the sample by at least one observation.

Note:Under stratified random sampling, the number of people selected from every layer may
be proportional to the relative frequency of each layer in the population.
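
Proportional allocation can be illustrated with a short Python sketch. The student counts (600 men, 400 women), the ID strings and the function names are hypothetical and serve only to make the example runnable.

```python
import random


def proportional_allocation(strata_sizes, n):
    """Split a total sample size n across strata in proportion to stratum size."""
    total = sum(strata_sizes.values())
    # Rounding can make the parts sum to slightly more or less than n in general.
    return {name: round(n * size / total) for name, size in strata_sizes.items()}


def stratified_sample(strata, n, seed=None):
    """Draw a simple random sample (without replacement) inside every stratum."""
    rng = random.Random(seed)
    sizes = {name: len(members) for name, members in strata.items()}
    allocation = proportional_allocation(sizes, n)
    return {name: rng.sample(strata[name], allocation[name]) for name in strata}


# Hypothetical student body: 600 men and 400 women, identified by ID strings.
strata = {
    "men": [f"M{i}" for i in range(600)],
    "women": [f"W{i}" for i in range(400)],
}
sample = stratified_sample(strata, n=50, seed=7)
print({name: len(ids) for name, ids in sample.items()})
```

With these counts, a sample of 50 is split 30 to 20, matching the 60% and 40% shares of the two layers.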

(4) Cluster Sampling

Definition: The population is divided into _________ (regions, tribes, etc.) according to
a specific condition (natural, such as geographic boundaries, or artificial). Then, several
of these clusters are randomly selected, and a sample is formed either by sampling within the
selected clusters or by taking a census of them.
There is _______ variation between clusters and _________ variation within clusters.

When to use:It is suitable for population items or individuals dispersed over a wide geographic
area.

For example: School management wants to know what students think about school policies.
The principal decides to randomly select 10 classes and conduct interviews
with all students in those classes.
• How is cluster sampling carried out?

First, the students of the school are divided into groups according to their classes,
and then 10 classes are randomly selected. Finally, it is decided whether to interview
a sample of students from the 10 classes or to interview all of them.

Note: The advantage of cluster sampling is that it makes it possible to sample unlimited
populations and to conduct concentrated surveys in nearby locations, which saves time and cost.
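
A minimal Python sketch of the school example follows. The school layout (40 classes of 30 students), the naming scheme and the function `cluster_sample` are assumptions made for illustration; the handout only specifies that 10 classes are selected.

```python
import random


def cluster_sample(clusters, n_clusters, seed=None):
    """Randomly pick whole clusters and keep every member of each chosen cluster."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    return {name: clusters[name] for name in chosen}


# Hypothetical school: 40 classes of 30 students each.
classes = {
    f"class_{c:02d}": [f"s{c:02d}_{i:02d}" for i in range(30)]
    for c in range(1, 41)
}
selected = cluster_sample(classes, n_clusters=10, seed=3)

# Census inside the chosen clusters: every student in the 10 classes is interviewed.
interviewees = [s for students in selected.values() for s in students]
print(len(selected), len(interviewees))
```

Interviewing only a random subset inside each selected class, instead of all its students, would correspond to the "sampling interviews" option mentioned above.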


2. Non-probability Sampling
In practical research, probability sampling is not always feasible, for example when the chance
of each unit being selected cannot be determined. In these cases only non-probability
sampling can be done, and the process mostly involves subjective judgment. For example,
following the principles of convenience and economy, the items or individuals who are
most easily contacted are selected as the sample. In a survey on the consumption of Taipei
residents, interviewing passersby directly is called Convenience Sampling. Convenience
sampling is not time-consuming, but the sample is biased, and conclusions drawn from
such data should not be extended to larger populations.

II. Error of Estimation

When using sample statistics to estimate the population parameters, since each item or
individual in the population is _________, even if the sampling method is very objective, there
is still some difference between the sample statistics and the population parameters, which is
the error of _________. Estimation errors can be divided into non-sampling errors and
sampling errors.

1. Non-sampling Errors

Non-sampling errors are errors that occur when _________ and _________ data, including
Sampling Bias, Nonresponse Error, and Measurement Error. Such errors can also occur in
a complete census.

(1) Sampling Bias

Sampling bias refers to the tendency to select samples from a certain part of the population,
and _______________ is a type of sampling bias. _______________ occurs when the
proportion of one segment of the population is _________ in a sample than it is in the population.
For example, surveying high school students at school to measure youth drug use produces
sampling bias, because young people who are homeschooled or who have dropped out are not
included.

(2) Nonresponse Error

Nonresponse error refers to the difference in opinion between subjects who provided a response
and those who _________. It is the error [or bias] that arises from the inability to obtain survey
responses from some sample individuals. At this point, the collected sample observations may
not be representative of the target population. Nonresponse can be improved through the use
of callbacks or rewards/incentives.
For example, if managers in a certain field are selected and their workload is surveyed,
managers with a high workload may not do the survey because they do not have enough time.


(3) Measurement Error

Measurement error is the _____________ of recorded data values. It may result from
incorrect or inaccurate responses from subjects, such as responses that do not reflect
their true feelings, which in turn may be caused by vague or leading questions.

Measurement errors may be intentional or unintentional.

For example, asking “Given that at the age of 18 people are old enough to fight and die for
their country, don't you think they should be able to drink alcohol as well?” versus “Do you
think 18-year-olds should be able to drink alcohol?”

【EXAMPLE 2】
To determine the public’s opinion of the police department, the police chief obtains a cluster
sample of 15 census tracts within his jurisdiction and samples all households in the randomly
selected tracts. Uniformed police officers go door to door to conduct the survey. Determine the
type of bias.

2. Sampling Error

Sampling error is the difference between __________________ and _____________________ .


This difference comes from the randomness of the sampling process, the sampling method and
the inference method. Sampling error occurs mainly because the sample provides incomplete
information about the population. In general, sampling error cannot be completely avoided
unless the entire population can be observed. One way to reduce sampling error is to increase
the sample size.

★Estimation error is inevitable, so we can only try to reduce this error.


★Response bias is not the opposite of non-response error. It refers to the tendency of
respondents to provide inaccurate or dishonest answers for various reasons.

【EXAMPLE 3】Sampling Error

The number of rooms rented at the Foxtrot Inn in June 2017 is shown in the table below. Find the
mean of the population. Select 3 random samples of 5 days. Calculate the mean rooms rented for
each sample and compare it to the population mean. What is the sampling error in each case?


Exercise:
1. Every fifth adult entering an airport is checked for extra security screening. What sampling
technique is used?
Answer:
(The airport is sampling every 5th adult entering an airport for extra security screening.)

2. At a local technical school, five auto repair classes are randomly selected and all of the
students from each class are interviewed. What sampling technique is used?
Answer:
(All students on selected auto repair classes (clusters) are interviewed.)

3. A writer for an art magazine randomly selects and interviews fifty male and fifty female
artists. What sampling technique is used?
Answer:
(The writer interviews fifty artists from each of two groups, male and female (strata)).

4. A statistics student interviews everyone in his apartment building to determine who owns
a cell phone. What sampling technique is used?
Answer:
(The statistics student is simply relying on people who are in his apartment building to obtain
the sample data.)

5. The names of 100 employees are written on 100 cards. The cards are placed in a bag, and
three names are picked from the bag. What sampling technique is used?
Answer:
(Every person has an equal chance of being selected.)

6. Distinguish between nonsampling error and sampling error. Choose the correct answer below.
A. Nonsampling error is the error that results because a sample is being used to estimate
information about a population. Sampling error is the error that results from
undercoverage, nonresponse bias, response bias, or data-entry errors.
B. Nonsampling error is the error that results from the process of obtaining the data.
Sampling error is the error that results from undercoverage, nonresponse bias, response
bias, or data-entry errors.
C. Nonsampling error is the error that results from undercoverage, nonresponse bias,
response bias, or data-entry errors. Sampling error is the error that results from the
process of obtaining the data.
D. Nonsampling error is the error that results from the process of obtaining the data.
Sampling error is the error that results because a sample is being used to estimate
information about a population.
Answer:

III. Sampling Distributions

Since statistics are actually random variables associated with a given sample, they will vary
from sample to sample. Therefore, they have probability distributions associated with them.

Definition

Suppose $X_1, X_2, \dots, X_n$ is a random sample drawn from the population, so that the random
sample is composed of n random variables. Define $T(X_1, X_2, \dots, X_n)$ as a real-valued
function of the sample $(X_1, X_2, \dots, X_n)$; then $Y = T(X_1, X_2, \dots, X_n)$ is a sample statistic.

The statistic Y is a random variable whose probability distribution is ______________________.


Explanation:

(1) Take 【EXAMPLE 3】, the number of rooms rented per day at the Foxtrot Inn in June 2017, as
an example. The population is the number of rooms rented on each of the 30 days of
June 2017. A sample of 5 days (n = 5) is randomly selected from these 30 days, giving
$(X_1, X_2, \dots, X_5)$, which may be (4, 7, 4, 3, 1), (3, 3, 2, 3, 6), (0, 0, 3, 3, 3), etc.
Suppose $T(X_1, X_2, \dots, X_5)$ is defined as the average of the sample $(X_1, X_2, \dots, X_5)$,
that is, $T(X_1, X_2, \dots, X_5) = \frac{\sum_{i=1}^{5} X_i}{5}$. Its value varies with the random
sample drawn each time, so $Y = T(X_1, X_2, \dots, X_5)$ is a statistic and is a random variable.
★ $T(X_1, X_2, \dots, X_n)$ can be defined as the mean, variance or standard deviation of
$(X_1, X_2, \dots, X_n)$.

• Sample _________: $T(X_1, X_2, \dots, X_n) \equiv \bar{X} = \frac{\sum X_i}{n}$

• Sample _________: $T(X_1, X_2, \dots, X_n) \equiv S^2 = \frac{\sum (X_i - \bar{X})^2}{n-1}$

• Sample __________________: $T(X_1, X_2, \dots, X_n) \equiv S = \sqrt{S^2} = \sqrt{\frac{\sum (X_i - \bar{X})^2}{n-1}}$

→ The definition of a statistic is very broad, with the only restriction being that a statistic
cannot be a function of a parameter.

(2) About sampling distributions:

1 A statistic is a characteristic measure of the _________, which can be the sample mean,
the sample variance, the sample standard deviation, the sample maximum, the sample
minimum, the sample median, etc. It is also a random variable.
2 The probability distribution of a statistic is called the _________ distribution or the
distribution of the sample statistic, which is also a kind of probability model. It is a
probability model of the results of repeated sampling and has two functions:
i It can measure the amount of uncertainty or error in a statistical inference.
ii It can explain the reliability of the inference results.
3 The sampling distribution is the basis of statistical inference. Three factors affect the
sampling distribution: __________________, sample size, and _______________.
If any one of these three factors differs, the sampling distribution will differ.

1. The Sampling Distribution of the Sample Mean

(1) Definition

Assuming that $X_1, X_2, \dots, X_n$ is a random sample drawn from the population, and the
random sample is composed of n random variables, let $\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n}$;
then $\bar{X}$ is the sample mean. Its probability distribution is $f(\bar{x})$, called the
sampling distribution of the ____________.
Explanation:

Taking [EXAMPLE 3], the number of rooms rented by the Foxtrot Inn in June 2017, as
an example, the _______________ will change from sample to sample. In the first 5-day
sample (4, 7, 4, 3, 1) the sample mean is 3.8 rooms; in the second sample (3, 3, 2, 3, 6) the
sample mean is 3.4 rooms. The population mean is 3.13 rooms. If the means
of all possible ($C^{N}_{n} = C^{30}_{5}$) 5-day samples are arranged into a probability
distribution, this distribution is called the sampling distribution of the sample mean.

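Group             X1  X2  X3  X4  X5   $\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n}$   $\bar{x}$               $f(\bar{x})$
First group       4   7   4   3   1    _________                                  $\bar{x}_1$             $1/C^{N}_{n}$
Second group      3   3   2   3   6    _________                                  $\bar{x}_2$             $1/C^{N}_{n}$
⋮                                      ⋮                                          ⋮                       ⋮
_________ group                                                                   $\bar{x}_{C^{N}_{n}}$   $1/C^{N}_{n}$

The mean and variance of $\bar{X}$: $E(\bar{X})$, $V(\bar{X})$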


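(2) Important characteristics of the sample mean

Assuming that $X_1, X_2, \dots, X_n$ is a random sample drawn from the population (mean $\mu$,
variance $\sigma^2$), and the random sample is composed of n random variables, then

1 Expected Value: $E(\bar{X}) = \mu_{\bar{X}} = \mu$

2 Variance:

Infinite Population: $\mathrm{Var}(\bar{X}) = \sigma_{\bar{X}}^2 = \frac{\sigma^2}{n}$;
Finite Population: $\mathrm{Var}(\bar{X}) = \sigma_{\bar{X}}^2 = \frac{\sigma^2}{n} \cdot \frac{N-n}{N-1}$

3 Standard Deviation:

Infinite Population: $\text{S.D.} = \sigma_{\bar{X}} = \sqrt{\sigma_{\bar{X}}^2} = \frac{\sigma}{\sqrt{n}}$;

Finite Population: $\text{S.D.} = \sigma_{\bar{X}} = \sqrt{\sigma_{\bar{X}}^2} = \frac{\sigma}{\sqrt{n}} \sqrt{\frac{N-n}{N-1}}$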
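
The finite-population formulas can be checked by brute force on a tiny population. The sketch below enumerates every possible sample of size 2 drawn without replacement from a hypothetical population of five values (the numbers are made up for illustration) and compares the mean and variance of the resulting sample means with $\mu$ and $\frac{\sigma^2}{n} \cdot \frac{N-n}{N-1}$.

```python
from itertools import combinations
from statistics import mean, pvariance

# Hypothetical population of N = 5 values (not data from the handout).
population = [2, 4, 6, 8, 10]
N, n = len(population), 2

# Every possible sample of size n without replacement, each equally likely.
sample_means = [mean(s) for s in combinations(population, n)]

mu = mean(population)            # population mean: 6
sigma2 = pvariance(population)   # population variance: 8

mean_of_means = mean(sample_means)                # E(X-bar)
var_of_means = pvariance(sample_means)            # Var(X-bar) over all samples
fpc_variance = sigma2 / n * (N - n) / (N - 1)     # formula with correction: 3.0

print(len(sample_means), mean_of_means, var_of_means, fpc_variance)
```

The enumeration gives 10 samples whose means average to 6 with variance 3, which agrees with the finite-population formula; the uncorrected value $\sigma^2 / n = 4$ would overstate the variance here.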


(3) Pattern of sample mean sampling distribution

1 Population distribution is normal distribution


The sampling distribution of the sample mean is also the normal distribution regardless of the
sample size.
2 Population distribution is not normal distribution

When the sample is large enough, according to the Central Limit Theorem (CLT), the
sampling distribution of the sample mean will approach the normal distribution.
◆Central Limit Theorem(CLT)

Regardless of the distribution of the population, suppose its mean is μ and its variance
is 𝜎² (𝜎² < ∞), and a simple random sample of size n is drawn from the population.
If the sample size is large enough, the sampling distribution of the sample mean will
approach the normal distribution.
Sample size is large enough means:_________

Explanation:

1 If the population is a normal distribution, the sampling distribution of the sample mean is

the _________ distribution regardless of the sample size.
2 If the population is not a normal distribution:

◆Central Limit Theorem(CLT)

It can be seen from the figure below that, regardless of the shape of the population distribution,
as the sample size increases, the sampling distribution of the sample mean will approach the
_________ distribution.

If the population is symmetric but not normally distributed, the sampling distribution of the
sample mean will be close to the normal distribution as long as the sample size is above 10. If
the population is skewed or thick-tailed, the sampling distribution of the sample mean will
approach the normal distribution when the sample size is above 30.
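
The behaviour described by the Central Limit Theorem can be observed with a small simulation. The sketch below is only an illustration under assumed settings: the population is exponential with mean 1 (a strongly right-skewed choice that is not from the handout), and the number of replications and the seed are arbitrary.

```python
import random
from math import sqrt


def sample_means(n, reps=5000, seed=0):
    """Means of `reps` samples of size n from an exponential population (mean 1, s.d. 1)."""
    rng = random.Random(seed)
    return [sum(rng.expovariate(1.0) for _ in range(n)) / n for _ in range(reps)]


def mean_and_sd(values):
    m = sum(values) / len(values)
    var = sum((v - m) ** 2 for v in values) / (len(values) - 1)
    return m, sqrt(var)


def skewness(values):
    """Average cubed z-score; values near 0 indicate a symmetric shape."""
    m, s = mean_and_sd(values)
    return sum(((v - m) / s) ** 3 for v in values) / len(values)


for n in (2, 10, 30):
    means = sample_means(n)
    m, s = mean_and_sd(means)
    # Columns: n, mean of the sample means, their s.d., sigma / sqrt(n), skewness
    print(n, round(m, 3), round(s, 3), round(1 / sqrt(n), 3), round(skewness(means), 3))
```

As n grows, the standard deviation of the simulated means should track $\sigma / \sqrt{n}$ and the skewness should shrink toward 0, which is the sense in which the sampling distribution approaches the normal shape.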


3 If it is an infinite population, the sampling distribution of its sample mean is $\bar{X} \sim$ _______________;
If it is a finite population, the sampling distribution of its sample mean is $\bar{X} \sim$ _______________

(4) Z-score for a given value of $\bar{X}$ (sample mean)

1 Infinite population

$$Z = \frac{\text{sample mean} - \mu}{\text{standard error}} = \frac{\bar{x} - \mu_{\bar{X}}}{\sigma_{\bar{X}}} = \frac{\bar{x} - \mu}{\dfrac{\sigma}{\sqrt{n}}}$$

2 Finite population

$$Z = \frac{\text{sample mean} - \mu}{\text{standard error}} = \frac{\bar{x} - \mu_{\bar{X}}}{\sigma_{\bar{X}}} = \frac{\bar{x} - \mu}{\dfrac{\sigma}{\sqrt{n}} \sqrt{\dfrac{N-n}{N-1}}}$$

【EXAMPLE 4】(Mean of the sampling distribution)


Tartus Industries has seven production employees (the population). The hourly earnings of
each employee are given in the table.

1. What is the population mean?


2. What is the sampling distribution of the sample mean for samples of size 2?
3. What is the mean of the sampling distribution?
4. What observations can be made about the population and the sampling distribution?


【EXAMPLE 5】
When a production machine is properly calibrated, it requires an average of 25 seconds per unit
produced, with a standard deviation of 3 seconds. For a simple random sample of n = 36 units,
the sample mean is found to be 26.2 seconds per unit.
a. What z-score corresponds to the sample mean of $\bar{x}$ = 26.2 seconds?
b. When the machine is properly calibrated, what is the probability that the mean for a simple
random sample of this size will be at least 26.2 seconds?
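
One way to set up the calculation for this example is sketched below in Python, using the infinite-population Z-score formula and treating the sampling distribution of the mean as normal (n = 36 is taken as large enough, which is an assumption). The variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma = 25.0, 3.0     # calibrated mean and standard deviation (seconds)
n, x_bar = 36, 26.2       # sample size and observed sample mean

standard_error = sigma / sqrt(n)       # 3 / 6 = 0.5
z = (x_bar - mu) / standard_error      # about 2.4

# Upper-tail probability P(X-bar >= 26.2) under the standard normal curve.
upper_tail = 1 - NormalDist().cdf(z)   # about 0.0082
print(round(z, 2), round(upper_tail, 4))
```

A tail probability this small suggests that a sample mean of 26.2 seconds would be unusual if the machine were properly calibrated.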

【EXAMPLE 6】
The Quality Assurance Dept. for Cola, Inc. maintains records regarding the amount of cola in
its jumbo bottle. The actual amount of cola in each bottle varies a small amount from one bottle
to another. Records indicate the amounts of cola follow the normal distribution, the mean
amount of cola in the bottles is 31.2 ounces, and the standard deviation is 0.4 ounces. At 8 a.m.
today, the quality technician randomly selected 16 bottles from the filling line. The mean
amount was 31.38 ounces. Is this an unlikely result? Is it likely that the process is putting too
much soda in the bottle? Is the sampling error of 0.18 ounce unusual?
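
As a worked sketch of the arithmetic, assuming the infinite-population Z-score formula applies (the population is stated to be normal, so the sample mean is normal for n = 16):

```latex
z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}}
  = \frac{31.38 - 31.2}{0.4 / \sqrt{16}}
  = \frac{0.18}{0.1}
  = 1.8,
\qquad
P(Z \ge 1.8) = 1 - 0.9641 = 0.0359
```

A sampling error of 0.18 ounce or more would therefore occur in fewer than 4% of samples of this size if the process mean were really 31.2 ounces.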


2. The Sampling Distribution of the Sample Proportions

(1) Definition

Assuming that $X_1, X_2, \dots, X_n$ is a random sample drawn from the population, composed
of n random variables, and that the number of "successes" among the n observations is X, then the
success rate is _________, denoted $\hat{p}$ and called the sample proportion. Its probability
distribution is $f(\hat{p})$, which is called the sampling distribution of the __________________.
Explanation:
According to the data, 42% of the company's employees have financial licenses. The
following 3 graphs describe the sampling distribution for sample proportions and sample
sizes n of 10, 50, and 100, respectively.

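(2) Important characteristics of the sample proportion

The important characteristics of the sample proportion are:
1 Expected Value: $E(\hat{p}) = \mu_{\hat{p}} = p$

2 Variance:

Infinite population: $\mathrm{Var}(\hat{p}) = \sigma_{\hat{p}}^2 = \frac{p(1-p)}{n}$;

Finite population: $\mathrm{Var}(\hat{p}) = \sigma_{\hat{p}}^2 = \frac{p(1-p)}{n} \cdot \frac{N-n}{N-1}$

3 Standard Deviation:

Infinite population: $\text{S.D.} = \sigma_{\hat{p}} = \sqrt{\sigma_{\hat{p}}^2} = \sqrt{\frac{p(1-p)}{n}}$;

Finite population: $\text{S.D.} = \sigma_{\hat{p}} = \sqrt{\sigma_{\hat{p}}^2} = \sqrt{\frac{p(1-p)}{n}} \sqrt{\frac{N-n}{N-1}}$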

Explanation:
$E(\hat{p}) = E\left(\frac{X}{n}\right) =$ ______________________________________________________

The variance and standard deviation of $\hat{p}$ vary depending on whether the population is
infinite or finite:

1 Infinite Population

$V(\hat{p}) = V\left(\frac{X}{n}\right) =$ ________________________________________________________________________
$\sigma_{\hat{p}} = \text{S.D.}(\hat{p}) = \sqrt{\mathrm{Var}(\hat{p})} =$ ___________________________

2 Finite Population

☆ Use the finite population correction factor $\frac{N-n}{N-1}$

If the sample size n is not small relative to the population size N [rule of thumb: the sample
size is greater than 5% of the population (n > 5%N)], or when sampling from a finite population
is done without replacement, the variance and standard deviation of $\hat{p}$ need to be adjusted
with the finite population correction factor $\frac{N-n}{N-1}$, so

$V(\hat{p}) = \sigma_{\hat{p}}^2 =$ ____________________________________

$\sigma_{\hat{p}} = \text{S.D.}(\hat{p}) = \sqrt{\mathrm{Var}(\hat{p})} =$ ____________________________________
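
The effect of the finite population correction factor can be seen numerically. In the sketch below the function name `proportion_standard_error` is hypothetical, p = 0.42 echoes the financial-license example above, and the sample size n = 50 and population size N = 500 are assumed values chosen so that n exceeds 5% of N.

```python
from math import sqrt


def proportion_standard_error(p, n, N=None):
    """Standard deviation of the sample proportion.

    If a finite population size N is given, the variance is multiplied by the
    finite population correction factor (N - n) / (N - 1).
    """
    variance = p * (1 - p) / n
    if N is not None:
        variance *= (N - n) / (N - 1)
    return sqrt(variance)


infinite_se = proportion_standard_error(0.42, 50)        # no correction
finite_se = proportion_standard_error(0.42, 50, N=500)   # n is 10% of N
print(round(infinite_se, 4), round(finite_se, 4))
```

The corrected standard error is smaller, and it falls to zero when n = N, since a complete census leaves no sampling variability.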
(3) The shape of the Sampling Distribution of the Sample Proportions

1 Sampling Distribution of Small Sample Proportions

Infinite Population: $\hat{p} \sim$ Binomial distribution, with mean $p$ and variance
$\mathrm{Var}(\hat{p}) = \sigma_{\hat{p}}^2 = \frac{p(1-p)}{n}$

Finite Population: $\hat{p} \sim$ hypergeometric distribution, with mean $p$ and variance
$\mathrm{Var}(\hat{p}) = \sigma_{\hat{p}}^2 = \frac{p(1-p)}{n} \cdot \frac{N-n}{N-1}$

2 Sampling Distribution of Large Sample Proportions

When the sample is large enough, according to the Central Limit Theorem (CLT), the sampling
distribution of the sample proportion will approach the _________ distribution, with mean $p$
and variance $\mathrm{Var}(\hat{p}) = \sigma_{\hat{p}}^2 = \frac{p(1-p)}{n}$.

Sample size is large enough means _________ and __________________


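(4) Z-score for a given value of $\hat{p}$ (sample proportion)

1 Infinite Population

$$Z = \frac{\hat{p} - p}{\sigma_{\hat{p}}} = \frac{\hat{p} - p}{\sqrt{\dfrac{p(1-p)}{n}}}$$

2 Finite Population

$$Z = \frac{\hat{p} - p}{\sigma_{\hat{p}}} = \frac{\hat{p} - p}{\sqrt{\dfrac{p(1-p)}{n}} \sqrt{\dfrac{N-n}{N-1}}}$$

$\hat{p}$ = the sample proportion value of interest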

【EXAMPLE 7】
The campaign manager for a political candidate claims that 55% of registered voters favor the
candidate over her strongest opponent. Assuming that this claim is true, what is the probability
that in a simple random sample of 300 voters, at least 60% would favor the candidate over her
strongest opponent?
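
A possible setup for this example is sketched below, assuming p = 0.55 is the population proportion favoring the candidate, that the voter population is large enough to be treated as infinite, and that the normal approximation is adequate for n = 300. The helper name `z_for_proportion` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def z_for_proportion(p_hat, p, n):
    """Z-score of a sample proportion under the infinite-population formula."""
    return (p_hat - p) / sqrt(p * (1 - p) / n)


z = z_for_proportion(p_hat=0.60, p=0.55, n=300)   # about 1.74
upper_tail = 1 - NormalDist().cdf(z)              # P(p-hat >= 0.60), about 0.041
print(round(z, 2), round(upper_tail, 3))
```

The same function applies to the other proportion examples in this section by changing the three arguments.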

【EXAMPLE 8】
A survey of 500 adults aged 18-29 showed 285 ate fast food for dinner at least once in the past
week. Find the sample proportion of individuals surveyed who ate fast food for dinner at least
once in the past week.

【EXAMPLE 9】
According to the Centers for Disease Control and Prevention, 18.8% of school-aged children,
aged 6-11 years, were overweight in 2004. In a random sample of 90 school-aged children,
aged 6-11 years, what is the probability that at least 19% are overweight?


【EXAMPLE 10】
Sixteen percent of Americans do not have health insurance. Suppose a simple random sample
of 500 Americans is obtained. What is the probability that more than 20% of the sample do not
have health insurance?

3. The Sampling Distribution of the Difference of Two Sample Means

(1) Definition

Suppose there are two quantitative populations whose sizes are $N_1$ and $N_2$, whose means are
$\mu_1$ and $\mu_2$, and whose variances are $\sigma_1^2$ and $\sigma_2^2$. Random samples of
sizes $n_1$ and $n_2$ are drawn from the two populations, and the samples are independent of each
other. If $\bar{X}_1$ and $\bar{X}_2$ denote the respective sample means, then $\bar{X}_1 - \bar{X}_2$
is the difference between the two sample means, and its probability distribution is called the
sampling distribution of the difference of two sample means.


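(2) Important characteristics of the difference of two sample means

The important characteristics of the difference of two sample means are:
1 Expected Value: $E(\bar{X}_1 - \bar{X}_2) = \mu_{(\bar{X}_1 - \bar{X}_2)} = \mu_1 - \mu_2$

2 Variance:

Infinite Population: $\mathrm{Var}(\bar{X}_1 - \bar{X}_2) = \sigma_{(\bar{X}_1 - \bar{X}_2)}^2 = \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}$

Finite Population: $\mathrm{Var}(\bar{X}_1 - \bar{X}_2) = \sigma_{(\bar{X}_1 - \bar{X}_2)}^2 = \frac{\sigma_1^2}{n_1} \cdot \frac{N_1 - n_1}{N_1 - 1} + \frac{\sigma_2^2}{n_2} \cdot \frac{N_2 - n_2}{N_2 - 1}$

3 Standard Deviation:

Infinite Population: $\text{S.D.} = \sigma_{(\bar{X}_1 - \bar{X}_2)} = \sqrt{\sigma_{(\bar{X}_1 - \bar{X}_2)}^2} = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$

Finite Population: $\text{S.D.} = \sigma_{(\bar{X}_1 - \bar{X}_2)} = \sqrt{\sigma_{(\bar{X}_1 - \bar{X}_2)}^2} = \sqrt{\frac{\sigma_1^2}{n_1} \cdot \frac{N_1 - n_1}{N_1 - 1} + \frac{\sigma_2^2}{n_2} \cdot \frac{N_2 - n_2}{N_2 - 1}}$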

(3) The pattern of sampling distribution of the difference of two sample means.

1 Population distribution is normal distribution


Regardless of the sample size, the sampling distribution of the difference of the two sample
means is also the normal distribution.
2 Population distribution is not normal distribution

When the sample sizes are large enough, according to the Central Limit Theorem (CLT), the
sampling distribution of the difference of the two sample means will approach the normal
distribution.
Explanation:

1 Population distribution is normal distribution

Using the linear combination theorem of normal random variables, it can be shown that if the
populations are normally distributed, $\bar{X}_1 - \bar{X}_2$ is also normally distributed.
2 Population distribution is not normal distribution

If the populations are not normally distributed, then when $n_1$ and $n_2$ are large enough,
according to _________, $\bar{X}_1$ and $\bar{X}_2$ are each approximately normally distributed.
By the CLT (writing the second parameter as the variance),

$$\bar{X}_1 \sim N\left(\mu_1, \frac{\sigma_1^2}{n_1}\right) \text{ as } n_1 \uparrow, \qquad \bar{X}_2 \sim N\left(\mu_2, \frac{\sigma_2^2}{n_2}\right) \text{ as } n_2 \uparrow$$


According to the linear combination theorem of normal random variables, the sampling
distribution of 𝑋̅1 −𝑋̅2 also approaches the _________ distribution.

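(4) Z-score for a given value of $(\bar{X}_1 - \bar{X}_2)$

1 Infinite Population

$$Z = \frac{(\bar{X}_1 - \bar{X}_2) - E(\bar{X}_1 - \bar{X}_2)}{\sigma_{(\bar{X}_1 - \bar{X}_2)}} = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}}$$

2 Finite Population

$$Z = \frac{(\bar{X}_1 - \bar{X}_2) - E(\bar{X}_1 - \bar{X}_2)}{\sigma_{(\bar{X}_1 - \bar{X}_2)}} = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{\sigma_1^2}{n_1} \cdot \dfrac{N_1 - n_1}{N_1 - 1} + \dfrac{\sigma_2^2}{n_2} \cdot \dfrac{N_2 - n_2}{N_2 - 1}}}$$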

【EXAMPLE 11】
In a study of annual family expenditures for general health care, two populations were
surveyed with the following results:

Population 1: $n_1 = 40$, $\bar{X}_1 = \$346$; Population 2: $n_2 = 35$, $\bar{X}_2 = \$300$.

If the variances of the populations are $\sigma_1^2 = 2800$ and $\sigma_2^2 = 3250$, what is the
probability of obtaining sample results $(\bar{X}_1 - \bar{X}_2)$ as large as those shown if there
is no difference in the means of the two populations?
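
The numbers in this example can be plugged into the infinite-population Z-score formula as follows. This is a sketch that assumes both populations are large, that the normal approximation holds for samples of 40 and 35, and that "as large as" refers to the upper tail; the function name `z_two_means` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def z_two_means(x1, x2, var1, var2, n1, n2, mu_diff=0.0):
    """Z-score of (x1 - x2) when the true difference of means is mu_diff."""
    standard_error = sqrt(var1 / n1 + var2 / n2)
    return ((x1 - x2) - mu_diff) / standard_error


# No difference in population means, so mu_diff = 0.
z = z_two_means(346, 300, var1=2800, var2=3250, n1=40, n2=35)   # about 3.60
upper_tail = 1 - NormalDist().cdf(z)
print(round(z, 2), round(upper_tail, 5))
```

The standard error is $\sqrt{70 + 92.86} \approx 12.76$, so a difference of \$46 lies about 3.6 standard errors above zero and the corresponding tail probability is well below 0.001.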


4. The Sampling Distribution of the Difference of Two Proportions

(1) Definition

Suppose there are two qualitative populations whose sizes are $N_1$ and $N_2$ and whose population
proportions are $p_1$ and $p_2$. Two independent samples are drawn from the two populations.
If the sample sizes are $n_1$ and $n_2$ and the sample proportions are $\hat{p}_1$ and $\hat{p}_2$,
then $\hat{p}_1 - \hat{p}_2$ is the difference between the two sample proportions, and its
probability distribution is called the sampling distribution of the difference of two sample
proportions.

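(2) Important characteristics of the difference of two sample proportions
1 Expected Value: $E(\hat{p}_1 - \hat{p}_2) = \mu_{(\hat{p}_1 - \hat{p}_2)} = p_1 - p_2$

2 Variance:

Infinite Population: $\mathrm{Var}(\hat{p}_1 - \hat{p}_2) = \sigma_{(\hat{p}_1 - \hat{p}_2)}^2 = \frac{p_1(1-p_1)}{n_1} + \frac{p_2(1-p_2)}{n_2}$

Finite Population: $\mathrm{Var}(\hat{p}_1 - \hat{p}_2) = \sigma_{(\hat{p}_1 - \hat{p}_2)}^2 = \frac{p_1(1-p_1)}{n_1} \cdot \frac{N_1 - n_1}{N_1 - 1} + \frac{p_2(1-p_2)}{n_2} \cdot \frac{N_2 - n_2}{N_2 - 1}$

3 Standard Deviation:

Infinite Population: $\text{S.D.} = \sigma_{(\hat{p}_1 - \hat{p}_2)} = \sqrt{\sigma_{(\hat{p}_1 - \hat{p}_2)}^2} = \sqrt{\frac{p_1(1-p_1)}{n_1} + \frac{p_2(1-p_2)}{n_2}}$

Finite Population: $\text{S.D.} = \sigma_{(\hat{p}_1 - \hat{p}_2)} = \sqrt{\sigma_{(\hat{p}_1 - \hat{p}_2)}^2} = \sqrt{\frac{p_1(1-p_1)}{n_1} \cdot \frac{N_1 - n_1}{N_1 - 1} + \frac{p_2(1-p_2)}{n_2} \cdot \frac{N_2 - n_2}{N_2 - 1}}$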

(3) The shape of the Sampling Distribution of the Difference of Two Proportions

When the sample sizes 𝑛1 and 𝑛2 are large enough, according to the Central Limit Theorem
(CLT), the sampling distribution of the difference between the two sample proportions will
approach the normal distribution.

Explanation:
When 𝑛1 and 𝑛2 are large enough, according to the _____________________________,
𝑝̂1 and 𝑝̂2 are each approximately normally distributed. By CLT,

$$\hat{p}_1 \sim N\left(p_1, \frac{p_1(1-p_1)}{n_1}\right) \text{ as } n_1 \uparrow, \qquad \hat{p}_2 \sim N\left(p_2, \frac{p_2(1-p_2)}{n_2}\right) \text{ as } n_2 \uparrow$$

According to the linear combination theorem of normal random variables, the sampling
distribution of 𝑝̂1 − 𝑝̂2 also approaches the ______________________.

(4) Z-score for a given value of (𝑝̂1 − 𝑝̂2 )

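① Infinite Population

$$Z = \frac{(\hat{p}_1-\hat{p}_2) - E(\hat{p}_1-\hat{p}_2)}{\sigma_{(\hat{p}_1-\hat{p}_2)}} = \frac{(\hat{p}_1-\hat{p}_2) - (p_1-p_2)}{\sqrt{\dfrac{p_1(1-p_1)}{n_1} + \dfrac{p_2(1-p_2)}{n_2}}}$$

② Finite Population

$$Z = \frac{(\hat{p}_1-\hat{p}_2) - E(\hat{p}_1-\hat{p}_2)}{\sigma_{(\hat{p}_1-\hat{p}_2)}} = \frac{(\hat{p}_1-\hat{p}_2) - (p_1-p_2)}{\sqrt{\dfrac{p_1(1-p_1)}{n_1}\cdot\dfrac{N_1-n_1}{N_1-1} + \dfrac{p_2(1-p_2)}{n_2}\cdot\dfrac{N_2-n_2}{N_2-1}}}$$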

【EXAMPLE 12】
In a certain area of a large city it is hypothesized that 40% of the houses are in a dilapidated
condition. A random sample of 75 houses from this section and 90 houses from another section
yield a difference, (𝑝̂1 − 𝑝̂2 ) , of 0.09. If there is no difference between the two areas in the
proportion of dilapidated houses, what is the probability of observing a difference this large or
larger?
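
The infinite-population Z-score for a difference of proportions can be sketched in Python as follows. The function name `z_diff_props` is illustrative, and the sketch assumes that "no difference between the two areas" means both areas share the hypothesized proportion $p_1 = p_2 = 0.40$.

```python
from math import sqrt
from statistics import NormalDist


def z_diff_props(d_hat, p1, p2, n1, n2):
    """Z-score of an observed difference d_hat = p1_hat - p2_hat
    for two independent samples from infinite populations."""
    se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return (d_hat - (p1 - p2)) / se


# Values from EXAMPLE 12; p1 = p2 = 0.40 is the assumed common proportion.
z12 = z_diff_props(0.09, 0.40, 0.40, 75, 90)
p12 = 1 - NormalDist().cdf(z12)  # P(difference >= 0.09), upper tail

print(f"z = {z12:.3f}")          # about 1.175
print(f"P(Z >= z) = {p12:.3f}")  # about 0.12
```

With these inputs a difference of 0.09 or more is not unusual: it would occur in roughly 12% of such sample pairs.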


Exercise:
1. Sampling distributions describe the distribution of
a) parameters.
b) statistics.
c) both parameters and statistics.
d) neither parameters nor statistics.

2. The Central Limit Theorem is important in statistics because


a) for a large n, it says the population is approximately normal.
b) for any population, it says the sampling distribution of the sample mean is approximately
normal, regardless of the sample size.
c) for a large n, it says the sampling distribution of the sample mean is approximately normal,
regardless of the shape of the population.
d) for any sized sample, it says the sampling distribution of the sample mean is approximately
normal.

3. For air travelers, one of the biggest complaints is of the waiting time between when the
airplane taxis away from the terminal until the flight takes off. This waiting time is known to
have a right skewed distribution with a mean of 10 minutes and a standard deviation of 8
minutes. Suppose 100 flights have been randomly sampled. Describe the sampling distribution
of the mean waiting time between when the airplane taxis away from the terminal until the
flight takes off for these 100 flights.
a) Distribution is right skewed with mean = 10 minutes and standard error = 0.8 minutes.
b) Distribution is right skewed with mean = 10 minutes and standard error = 8 minutes.
c) Distribution is approximately normal with mean = 10 minutes and standard error = 0.8
minutes.
d) Distribution is approximately normal with mean = 10 minutes and standard error = 8 minutes.

4. The scores of all pro golfers on a particular course have a mean of 70 and a standard
deviation of 3.0. Suppose 36 pro golfers played the course today. Find the probability that the
mean score of the 36 pro golfers exceeded 71.

5. The distribution of the number of loaves of bread sold per week by a large bakery over the
past 5 years has a mean of 7,750 and a standard deviation of 145 loaves. Suppose a random
sample of n = 40 weeks has been selected. What is the approximate probability that the mean
number of loaves sold in the sampled weeks exceeds 7,895 loaves?


6. Major league baseball salaries followed a normal distribution with a mean of $3.26 million
and a standard deviation of $1.2 million in a certain year in the past. Suppose a sample of 100
major league players was taken. What was the standard error for the sample mean salary? What
is the approximate probability that the mean salary of the 100 players exceeded $3.5 million?

7. The amount of bleach a machine pours into bottles has a mean of 36 oz. with a standard
deviation of 0.15 oz. Suppose we take a random sample of 36 bottles filled by this machine.
What is the probability that the mean of the sample is between 35.94 and 36.06 oz.?

8. According to a survey, only 15% of customers who visited the web site of a major retail store
made a purchase. Random samples of size 50 are selected. What proportion of the samples will
have between 20% and 30% of customers who will make a purchase after visiting the web site?

9. According to an article, 19% of the entire population in a developing country have high-
speed access to the Internet. Random samples of size 200 are selected from the country’s
population. ______ % will have between 14% and 24% who have high-speed access to the
Internet.

10. Times spent studying by students in the week before final exams follow a normal
distribution with standard deviation 8 hours. A random sample of 4 students was taken from a
population of 50 in order to estimate the mean study time for the population of all students.
Use the finite population correction. What is the standard error of all the sample means? What
is the probability that the sample mean differs from the population mean by less than 2 hours?


四、Sampling from the Normal Distribution

This section describes the properties of samples drawn from a population that follows a
normal distribution, which remains one of the most widely used statistical models. Many useful
properties of sample statistics, as well as many well-known sampling distributions, are derived
from sampling from the normal distribution.

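1. Normal Distribution

Suppose 𝑋1 , 𝑋2 , ⋯ , 𝑋𝑛 are n random variables drawn from the same normal population $N(\mu, \sigma^2)$. Then

(1) The sampling distribution of the sample mean $\bar{X} = \dfrac{\sum_{i=1}^{n} X_i}{n}$ is $N\left(\mu, \dfrac{\sigma^2}{n}\right)$.

(2) The sampling distribution of the sample sum $S = \sum_{i=1}^{n} X_i$ is $N(n\mu, n\sigma^2)$.

(3) For two independent samples of sizes $n_1$ and $n_2$ drawn from $N(\mu_1, \sigma_1^2)$ and $N(\mu_2, \sigma_2^2)$, the sampling distribution of the difference of two sample means $\bar{X}_1 - \bar{X}_2$ is $N\left(\mu_1 - \mu_2, \dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}\right)$.

(4) Under the same conditions, the sampling distribution of the sum of two sample means $\bar{X}_1 + \bar{X}_2$ is $N\left(\mu_1 + \mu_2, \dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}\right)$.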
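
Properties (1) and (2) can be illustrated with a small simulation. This is only a sketch: the values of `mu`, `sigma`, `n`, the number of repetitions, and the seed are hypothetical choices made for the illustration, not values from the handout.

```python
import random
from statistics import fmean, pvariance

rng = random.Random(0)                     # fixed seed so the run is repeatable
mu, sigma, n, reps = 10.0, 3.0, 9, 20000   # hypothetical population and sample size

means, sums = [], []
for _ in range(reps):
    sample = [rng.gauss(mu, sigma) for _ in range(n)]
    total = sum(sample)
    sums.append(total)        # one draw of the sample sum S
    means.append(total / n)   # one draw of the sample mean

# Theory: the sample mean has mean mu = 10 and variance sigma^2 / n = 1.
print(fmean(means), pvariance(means))
# Theory: the sample sum has mean n * mu = 90 and variance n * sigma^2 = 81.
print(fmean(sums), pvariance(sums))
```

The simulated means and variances should land close to the theoretical values, with small differences due to simulation noise.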

2. $\chi^2$ Distribution (Chi-squared Distribution)

Definition

Assume that there are n independent normal random variables 𝑋1 , 𝑋2 , ⋯ , 𝑋𝑛 with means $\mu_1, \mu_2, \cdots, \mu_n$ and variances $\sigma_1^2, \sigma_2^2, \cdots, \sigma_n^2$. Let $\chi^2 = \sum_{i=1}^{n} \left(\dfrac{X_i - \mu_i}{\sigma_i}\right)^2$; then the sampling distribution of $\chi^2$ is a chi-square distribution with υ degrees of freedom (here υ = n).

$$f(x) = \frac{1}{2^{\upsilon/2}\,\Gamma\left(\frac{\upsilon}{2}\right)}\, x^{\frac{\upsilon}{2}-1} e^{-\frac{x}{2}}, \quad 0 < x < \infty$$

υ is the degrees of freedom.

Explanation:

(1) The degrees of freedom υ are the parameter that characterizes the chi-square distribution.
Degrees of freedom are the number of values that can vary freely in a calculation. In the case
of the chi-square statistic, they are the number of terms making up the statistic that can vary
freely.

For example: There is a set of data X with 4 observations: 15, 30, 25, 10. The mean of this
group of data is 20. Given a mean of 20 in this set of data, only 3 values can vary freely,
because the fourth is then determined. In other words, the degrees of freedom are n − 1 = 3.

In statistics, degrees of freedom are usually equal to the number of observations minus the
number of estimated population parameters:
(# of observations) minus (# of parameters estimated)
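
The four-observation example can be traced in a few lines of Python. This is a minimal sketch; the variable names are illustrative only.

```python
data = [15, 30, 25, 10]
mean = sum(data) / len(data)             # 20.0

# Deviations from the sample mean always sum to zero,
# which is the single constraint that removes one degree of freedom.
deviations = [x - mean for x in data]

# Once the mean is fixed, choosing any three values forces the fourth.
free_part = data[:-1]                    # 15, 30, 25 chosen freely
forced_last = mean * len(data) - sum(free_part)

print(sum(deviations))   # 0.0
print(forced_last)       # 10.0
```

Only three of the four values carry independent information about the spread, which is why the degrees of freedom are 4 − 1 = 3.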

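(2) The characteristics of the Chi-squared Distribution:

Expected Value: $E(\chi^2_\upsilon) = \upsilon$

Variance: $V(\chi^2_\upsilon) = 2\upsilon$

Skewness: $\alpha_3 = \sqrt{\dfrac{8}{\upsilon}}$

Kurtosis: $\alpha_4 = 3 + \dfrac{12}{\upsilon}$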
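
The expected value and variance of the chi-squared distribution can be checked by simulation, building each draw as a sum of squared standard normal variables. The degrees of freedom `nu`, the number of repetitions, and the seed are hypothetical choices for this sketch.

```python
import random
from statistics import fmean, pvariance

rng = random.Random(1)   # fixed seed so the run is repeatable
nu, reps = 5, 20000      # hypothetical degrees of freedom and repetitions

# Each draw is a sum of nu squared N(0, 1) variables.
chi2_draws = [
    sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(nu))
    for _ in range(reps)
]

sim_mean = fmean(chi2_draws)
sim_var = pvariance(chi2_draws)

print(sim_mean)  # theory: E = nu = 5
print(sim_var)   # theory: V = 2 * nu = 10
```

The simulated values differ slightly from 5 and 10 because of simulation noise, and the gap shrinks as `reps` grows.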

(3) Facts related to chi-square random variables: Let $\chi^2_\upsilon$ be a chi-square random variable
with υ degrees of freedom.

① If Z is a N(0,1) random variable, then $Z^2 \sim \chi^2_1$; that is, the square of a standard normal
random variable is a chi-square random variable with 1 degree of freedom.

② If 𝑋1 , 𝑋2 , ⋯ , 𝑋𝑛 are independent and $X_i \sim \chi^2_{\upsilon_i}$, then
$(X_1 + X_2 + \cdots + X_n) \sim \chi^2_{(\upsilon_1+\upsilon_2+\cdots+\upsilon_n)}$; that is, the sum of independent chi-square
variables is again a chi-square variable, and the degrees of freedom add.

③ Suppose that n random variables 𝑋1 , 𝑋2 , ⋯ , 𝑋𝑛 come from the same normal population $N(\mu, \sigma^2)$. Then $\chi^2 = \sum_{i=1}^{n} \left(\dfrac{X_i - \mu}{\sigma}\right)^2$ also follows a chi-square distribution, with υ = n degrees of freedom.

④ Suppose that n random variables 𝑋1 , 𝑋2 , ⋯ , 𝑋𝑛 come from the same normal population $N(\mu, \sigma^2)$ and $S^2 = \dfrac{1}{n-1}\sum_{i=1}^{n}(X_i - \bar{X})^2$. Then $\dfrac{(n-1)S^2}{\sigma^2}$ follows a chi-square distribution with $(n-1)$ degrees of freedom.

$$S^2 = \frac{1}{n-1}\sum_{i=1}^{n}(X_i - \bar{X})^2 \;\Longrightarrow\; (n-1)S^2 = \sum_{i=1}^{n}(X_i - \bar{X})^2 \;\Longrightarrow\; \frac{(n-1)S^2}{\sigma^2} = \sum_{i=1}^{n}\left(\frac{X_i - \bar{X}}{\sigma}\right)^2$$

We lost a degree of freedom when we used the sample mean rather than the true mean.


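❖ The mean and variance of $S^2 = \dfrac{1}{n-1}\sum_{i=1}^{n}(X_i - \bar{X})^2$:

Expected Value: $E(S^2) = \sigma^2$

Variance: $V(S^2) = \dfrac{2\sigma^4}{n-1}$

$$\frac{(n-1)S^2}{\sigma^2} \sim \chi^2_{n-1} \;\Longrightarrow\; S^2 \sim \frac{\sigma^2}{n-1}\chi^2_{n-1}$$

$$E(S^2) = E\left(\frac{\sigma^2}{n-1}\chi^2_{n-1}\right) = \frac{\sigma^2}{n-1}E(\chi^2_{n-1}) = \frac{\sigma^2}{n-1}(n-1) = \sigma^2$$

$$V(S^2) = V\left(\frac{\sigma^2}{n-1}\chi^2_{n-1}\right) = \frac{\sigma^4}{(n-1)^2}V(\chi^2_{n-1}) = \frac{\sigma^4}{(n-1)^2}\cdot 2(n-1) = \frac{2\sigma^4}{n-1}$$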
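
The two results for $S^2$ can be checked by simulation. This sketch assumes a hypothetical normal population with `sigma = 2` and samples of size `n = 5`, so theory gives $E(S^2) = 4$ and $V(S^2) = 2 \cdot 16 / 4 = 8$; the seed and repetition count are also arbitrary choices.

```python
import random
from statistics import fmean, pvariance, variance

rng = random.Random(2)                     # fixed seed so the run is repeatable
mu, sigma, n, reps = 0.0, 2.0, 5, 20000    # hypothetical settings

# statistics.variance divides by (n - 1), matching the definition of S^2.
s2_draws = [
    variance([rng.gauss(mu, sigma) for _ in range(n)])
    for _ in range(reps)
]

theory_mean = sigma ** 2                   # sigma^2
theory_var = 2 * sigma ** 4 / (n - 1)      # 2 * sigma^4 / (n - 1)

print(fmean(s2_draws), theory_mean)
print(pvariance(s2_draws), theory_var)
```

The simulated mean and variance of $S^2$ should sit close to 4 and 8, up to simulation noise.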

3. Student’s t Distribution

Assume that n random variables 𝑋1 , 𝑋2 , ⋯ , 𝑋𝑛 come from the same normal population
$N(\mu, \sigma^2)$; then $Z = \dfrac{\bar{X}-\mu}{\sigma/\sqrt{n}} \sim N(0,1)$. If σ is known and $\bar{X}$ is calculated, then $\dfrac{\bar{X}-\mu}{\sigma/\sqrt{n}}$ can be used
to infer μ because μ is the only unknown. However, most of the time, σ is unknown.
Therefore, with σ unknown, Student estimates μ through the distribution of the t-statistic
$t = \dfrac{\bar{X}-\mu}{s/\sqrt{n}}$, which has $n-1$ degrees of freedom.

❖ Why does Student need to use Student’s t Distribution?

Student has a small sample of data and wants to know the difference between the sample
mean and the population mean, but does not know the variance and cannot estimate it
accurately.

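❖ The current situation is:

① N(0,1)

② Compare $\bar{X}$ and μ

③ $\sigma^2$ is unknown

④ Small sample (otherwise, we can estimate $\sigma^2$ with $S^2$)

→ rewrite

$$t = \frac{\bar{X}-\mu}{s/\sqrt{n}} = \frac{\dfrac{\bar{X}-\mu}{\sigma/\sqrt{n}}}{\dfrac{s}{\sqrt{n}}\cdot\dfrac{1}{\sigma/\sqrt{n}}} = \frac{Z}{\sqrt{\dfrac{S^2}{\sigma^2}}}$$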
When 𝑛 ⟶ ∞, Student’s t Distribution will approach the Z distribution

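❖ Student finds that the probability density function (pdf) of the Student’s t Distribution is:

$$f(t) = \frac{\Gamma\left(\frac{\upsilon+1}{2}\right)}{\sqrt{\upsilon\pi}\;\Gamma\left(\frac{\upsilon}{2}\right)}\left(1 + \frac{t^2}{\upsilon}\right)^{-\frac{\upsilon+1}{2}}, \quad -\infty < t < \infty$$

υ is the degrees of freedom.

❖ The characteristics of the t Distribution:

Expected Value: $E(t) = 0$

Variance: $V(t) = \dfrac{\upsilon}{\upsilon-2}, \; \upsilon > 2$

Skewness: $\alpha_3 = 0$

Kurtosis: $\alpha_4 = \dfrac{3(\upsilon-2)}{\upsilon-4}, \; \upsilon > 4$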
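
The variance formula for the t distribution can be illustrated by simulating the t-statistic from small normal samples. This is only a sketch: the population values, the sample size `n = 10` (so `nu = 9`), the seed, and the helper name `t_stat` are hypothetical choices.

```python
import random
from math import sqrt
from statistics import fmean, pvariance, stdev

rng = random.Random(3)                      # fixed seed so the run is repeatable
mu, sigma, n, reps = 50.0, 8.0, 10, 20000   # hypothetical settings


def t_stat(sample, mu):
    """t = (x_bar - mu) / (s / sqrt(n)), with s the sample standard deviation."""
    return (fmean(sample) - mu) / (stdev(sample) / sqrt(len(sample)))


t_draws = [
    t_stat([rng.gauss(mu, sigma) for _ in range(n)], mu)
    for _ in range(reps)
]

nu = n - 1
theory_var = nu / (nu - 2)   # 9 / 7, larger than the N(0, 1) variance of 1

print(fmean(t_draws))                  # theory: 0
print(pvariance(t_draws), theory_var)
```

The simulated variance exceeds 1, reflecting the heavier tails that come from replacing σ with s in a small sample.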

4. Snedecor’s F Distribution

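Snedecor’s F Distribution is usually used to compare the variances of two data sets.

Assume two independent random samples $X_1, X_2, \cdots, X_{n_1} \sim N(\mu_1, \sigma_1^2)$ and
$Y_1, Y_2, \cdots, Y_{n_2} \sim N(\mu_2, \sigma_2^2)$, where $S_1^2$ and $S_2^2$ are the sample variances. Then

$$\frac{(n_1-1)S_1^2}{\sigma_1^2} \sim \chi^2_{n_1-1}; \qquad \frac{(n_2-1)S_2^2}{\sigma_2^2} \sim \chi^2_{n_2-1}$$

The value F:

$$\frac{S_1^2/\sigma_1^2}{S_2^2/\sigma_2^2} \sim \frac{\chi^2_{n_1-1}/(n_1-1)}{\chi^2_{n_2-1}/(n_2-1)}$$

which is the F distribution with $(n_1-1,\, n_2-1)$ degrees of freedom.

When $\sigma_1^2 = \sigma_2^2$:

$$\frac{S_1^2}{S_2^2} \sim \frac{\chi^2_{n_1-1}/(n_1-1)}{\chi^2_{n_2-1}/(n_2-1)}$$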

❖ Why do you need Snedecor’s F Distribution?

To compare the degree of variation of two populations, the quantity of interest is $\dfrac{\sigma_1^2}{\sigma_2^2}$,
and the information actually observed is $\dfrac{S_1^2}{S_2^2}$ (the ratio of the sample variances). Because

$$\frac{S_1^2/S_2^2}{\sigma_1^2/\sigma_2^2} = \frac{S_1^2/\sigma_1^2}{S_2^2/\sigma_2^2},$$

Snedecor’s F Distribution can be used to compare the degree of variation between the
two groups of data.


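❖ The probability density function (pdf) of Snedecor’s F Distribution:

Assuming that $U \sim \chi^2_{\upsilon_1}$ and $V \sim \chi^2_{\upsilon_2}$ are independent, then $W = \dfrac{U/\upsilon_1}{V/\upsilon_2}$ follows Snedecor’s F Distribution, $W \sim F_{\upsilon_1,\upsilon_2}$, and
its probability density function is:

$$f(w) = \frac{\Gamma\left(\frac{\upsilon_1+\upsilon_2}{2}\right)}{\Gamma\left(\frac{\upsilon_1}{2}\right)\Gamma\left(\frac{\upsilon_2}{2}\right)}\left(\frac{\upsilon_1}{\upsilon_2}\right)^{\frac{\upsilon_1}{2}} w^{\frac{\upsilon_1}{2}-1}\left(1 + \frac{\upsilon_1}{\upsilon_2}w\right)^{-\frac{\upsilon_1+\upsilon_2}{2}} \quad \text{for } w > 0$$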
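❖ The characteristics of the F Distribution:

Expected Value: $E(w) = \dfrac{\upsilon_2}{\upsilon_2-2}, \; \upsilon_2 > 2$

Variance: $V(w) = \dfrac{2\upsilon_2^2(\upsilon_1+\upsilon_2-2)}{\upsilon_1(\upsilon_2-2)^2(\upsilon_2-4)}, \; \upsilon_2 > 4$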
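
The expected value of the F distribution can be checked by simulating the ratio of two sample variances drawn from normal populations with equal variance. This is a sketch under assumed settings: the sample sizes `n1 = 6` and `n2 = 12` (so the degrees of freedom are 5 and 11), the common `sigma`, the seed, and the helper names `f_mean` and `f_var` are all hypothetical.

```python
import random
from statistics import fmean, variance


def f_mean(nu2):
    """E(W) for W ~ F(nu1, nu2); requires nu2 > 2."""
    return nu2 / (nu2 - 2)


def f_var(nu1, nu2):
    """V(W) for W ~ F(nu1, nu2); requires nu2 > 4."""
    return 2 * nu2 ** 2 * (nu1 + nu2 - 2) / (nu1 * (nu2 - 2) ** 2 * (nu2 - 4))


rng = random.Random(4)                   # fixed seed so the run is repeatable
n1, n2, sigma, reps = 6, 12, 3.0, 20000  # hypothetical settings, equal variances

# With sigma1 = sigma2, the ratio S1^2 / S2^2 follows F(n1 - 1, n2 - 1).
ratios = [
    variance([rng.gauss(0.0, sigma) for _ in range(n1)])
    / variance([rng.gauss(0.0, sigma) for _ in range(n2)])
    for _ in range(reps)
]

print(fmean(ratios), f_mean(n2 - 1))   # theory: 11 / 9
print(f_var(n1 - 1, n2 - 1))           # theoretical variance: 3388 / 2835
```

Note that the expected value depends only on the denominator degrees of freedom and is slightly above 1, even when the two population variances are equal.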
