Topic 8 Sampling Methods and The Central Limit Theorem (No Answer)
Population Parameters
___________ ___________
______ ______
Sample Statistics
To understand the characteristics of a population, statistical inference draws conclusions about the population from sample surveys. Several questions matter here: how to select samples from the population, how to estimate population parameters, and how to judge the accuracy and error of those estimates. Each will be discussed in turn. Topic 8 begins with the issues related to sampling, including:
I. Sampling Methods
1. Probability Sampling
2. Non-probability Sampling
II. Error of Estimation
1. Non-sampling Errors
2. Sampling Errors
III. Sampling Distributions
1. The Sampling Distribution of the Sample Mean
◆ The Central Limit Theorem
2. The Sampling Distribution of the Sample Proportion
3. The Sampling Distribution of the Difference of Two Sample Means
4. The Sampling Distribution of the Difference of Two Sample Proportions
IV. Sampling from the Normal Distribution
1. Normal Distribution
2. Chi-squared Distribution
3. t Distribution
4. F Distribution
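Academic Year 112, Statistics (Department of Business Administration), Topic 8: Sampling Methods and the Central Limit Theorem, Instructor: 林晉禾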
I. Sampling Methods
In research, if population data can be obtained, they can be analyzed directly, but population data are usually difficult to obtain. Therefore, _____________ is used to obtain a ________________ sample from which to infer the population. There are many practical reasons why we prefer to select portions, or samples, of a population to observe and measure:
② __________________________________________________________________________ .
For example, to know how many bacteria are in the water of Sun Moon Lake, in theory all of the water in the lake would have to be tested. Because the water is effectively unlimited, only a portion of it is tested, and inferences about the overall water quality are drawn from that portion.
A sampling method is the procedure used to obtain sample data. The main types of sampling methods are _________________ and _________________________:
In probability sampling, the probability that each item or individual in the population will be selected is __________________________. Common probability sampling methods include:
Explanation:
✓ There are only 750 employees, so numbers above 750 are skipped. Both 961 and the next number to its right, 784, are skipped, so the third employee selected is number 189.
✓ If a number is repeated, skip it.
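The two skip rules above can be sketched in Python as follows. This is a minimal illustration that assumes the table is read as a row of three-digit numbers. Apart from 961, 784 and 189, the numbers in `row` are hypothetical. The last line shows the standard-library shortcut that replaces a printed random-number table.

```python
import random

def select_from_number_table(numbers, population_size, sample_size):
    """Read numbers in order from a random-number table and keep valid IDs.

    IDs outside 1..population_size are skipped, and so are repeats
    (sampling without replacement).
    """
    chosen = []
    for num in numbers:
        if 1 <= num <= population_size and num not in chosen:
            chosen.append(num)
            if len(chosen) == sample_size:
                break
    return chosen

# Hypothetical table row: only 961, 784 and 189 come from the explanation above;
# the other numbers are made up for illustration.
row = [631, 61, 961, 784, 189, 61, 502]
print(select_from_number_table(row, population_size=750, sample_size=4))
# → [631, 61, 189, 502]

# Software equivalent: draw 4 distinct employee IDs from 1..750.
print(random.sample(range(1, 751), 4))
```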
Note: Sampling without replacement from a finite population is sometimes called simple
random sampling.
[The probability of each selection does increase slightly because the slip is not
replaced. However, the differences are very small when the population is large. In
other words, if the population size N is large compared to the sample size n,
Note: Simple random sampling is difficult to use in some situations. For example, suppose a store wants to study how long customers stay in the store. Simple random sampling is not effective here, because there is no customer list and therefore no way to assign random numbers to customers.
We decide to select 100 customers over 4 days, Monday through Thursday. We will select 25 customers a day and begin the sampling at a different time each day: 8 am, 11 am, 4 pm, and 7 pm.
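A scheme like the one above is usually carried out as systematic sampling: after a random start, every k-th arriving customer is selected. The sketch below is a minimal illustration of that idea. The step k = 10 and the 250 arrivals per day are assumptions chosen so that one day yields 25 customers; neither number comes from these notes.

```python
import random

def systematic_sample(items, k, start=None, rng=random):
    """Take every k-th item after a random start among the first k items."""
    if start is None:
        start = rng.randrange(k)  # 0-based start position in 0..k-1
    return items[start::k]

# Hypothetical: 250 customers arrive after the day's start time and every 10th one
# is selected, which gives the 25 customers needed for that day.
arrivals = list(range(1, 251))          # customer arrival order 1..250
day_sample = systematic_sample(arrivals, k=10, start=3)
print(len(day_sample), day_sample[:3])  # → 25 [4, 14, 24]
```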
Note: Under stratified random sampling, the number of individuals selected from each stratum may be proportional to that stratum's relative frequency in the population.
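Proportional allocation, as described in the note above, can be sketched as follows. This is a minimal illustration, and the stratum names and sizes are hypothetical. Proportional shares are rarely whole numbers, so the sketch floors them and then gives the leftover units to the strata with the largest fractional parts (largest-remainder rounding). That rounding rule is one common choice, not something these notes prescribe.

```python
import math

def proportional_allocation(stratum_sizes, n):
    """Split a total sample size n across strata in proportion to stratum size.

    Fractional allocations are floored; leftover units go to the strata with
    the largest fractional parts, so the allocations always sum to n.
    """
    N = sum(stratum_sizes.values())
    exact = {s: n * size / N for s, size in stratum_sizes.items()}
    alloc = {s: math.floor(x) for s, x in exact.items()}
    leftover = n - sum(alloc.values())
    by_remainder = sorted(exact, key=lambda s: exact[s] - alloc[s], reverse=True)
    for s in by_remainder[:leftover]:
        alloc[s] += 1
    return alloc

# Hypothetical strata (e.g., departments) with 300, 150 and 50 people; n = 50.
print(proportional_allocation({"A": 300, "B": 150, "C": 50}, 50))
# → {'A': 30, 'B': 15, 'C': 5}
```

Within each stratum, the allocated number of individuals would then be drawn by simple random sampling.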
Definition: The population is divided into _________ (regions, tribes, etc.) according to a specific condition, which may be natural (such as geographic boundaries) or artificial. Several of these clusters are then selected at random, and the sample is formed from either a random sample or a census of each selected cluster. There is _______ variation between clusters and _________ variation within clusters.
When to use: It is suitable when the population's items or individuals are dispersed over a wide geographic area.
For example: School management wants to know what students think about school policies. The principal decides to randomly select 10 classes and interview all of the students in those classes.
How to do the cluster sampling?
First, the students of the school are divided into groups according to their classes, and then 10 classes are randomly selected. Finally, it is decided whether to interview a sample of the students in the 10 classes or to interview all of them.
Note: The advantage of cluster sampling is that it makes it possible to sample very large populations and to concentrate the survey in nearby locations, which saves time and costs.
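The school example can be sketched in Python as follows. This is a minimal illustration: whole classes (clusters) are chosen at random, and then every student in each chosen class is kept. The 30 classes of 40 students and the ID format are hypothetical.

```python
import random

def cluster_sample(clusters, num_clusters, rng=random):
    """Randomly choose whole clusters, then keep every member of each chosen cluster."""
    chosen = rng.sample(sorted(clusters), num_clusters)
    return {name: clusters[name] for name in chosen}

# Hypothetical school: 30 classes of 40 students each (IDs are made up).
classes = {f"class_{c:02d}": [f"s{c:02d}_{i:02d}" for i in range(40)] for c in range(1, 31)}
picked = cluster_sample(classes, 10, rng=random.Random(42))
interviewees = [student for members in picked.values() for student in members]
print(len(picked), len(interviewees))  # → 10 400
```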
2. Non-probability Sampling
In practical research, probability sampling is not always feasible; for example, the probability that each item is selected may be impossible to determine. In such cases only non-probability sampling can be used, and the process mostly involves subjective judgment. For example, under the principles of convenience and economy, the items or individuals that are easiest to contact are selected as the sample. In a survey of the spending of Taipei residents, directly interviewing passersby is called convenience sampling. Convenience sampling is not time-consuming, but the sample is biased, and conclusions drawn from such data should not be generalized to larger populations.
II. Error of Estimation
When sample statistics are used to estimate population parameters, each item or individual in the population is _________. As a result, some difference between the sample statistics and the population parameters remains even when the sampling method is completely objective; this difference is the error of _________. Estimation errors can be divided into non-sampling errors and sampling errors.
1. Non-sampling Errors
Non-sampling errors are errors that occur when _________ and _________ data, including sampling bias, nonresponse error, and measurement error. Such errors can occur even in a complete census.
Sampling bias refers to the tendency to select samples from a certain part of the population, and _______________ is a type of sampling bias: it occurs when the proportion of one segment of the population is lower in the sample than it is in the population.
For example, surveying high school students at school to measure drug use among young people introduces sampling bias, because high school students who are homeschooled or who have dropped out are not included.
Nonresponse error refers to the difference in opinion between subjects who provided a response and those who _________. It is the error (or bias) that arises from the inability to obtain survey responses from some individuals in the sample. In that case, the collected sample observations may not be representative of the target population. Nonresponse can be reduced through callbacks or rewards and incentives.
For example, suppose managers in a certain field are selected and surveyed about their workload. Managers with a heavy workload may not complete the survey because they do not have enough time.
Measurement error is the _____________ of recorded data values. It may result from incorrect or inaccurate responses from subjects, for example, responses that do not reflect their true feelings because of vague or leading questions.
For example, asking “Given that at the age of 18 people are old enough to fight and die for
their country, don't you think they should be able to drink alcohol as well?” versus “Do you
think 18-year-olds should be able to drink alcohol?”
【EXAMPLE 2】
To determine the public’s opinion of the police department, the police chief obtains a cluster
sample of 15 census tracts within his jurisdiction and samples all households in the randomly
selected tracts. Uniformed police officers go door to door to conduct the survey. Determine the
type of bias.
2. Sampling Error
The numbers of rooms rented at the Foxtrot Inn each day in June 2017 are shown in the table below. Find the population mean. Select 3 random samples of 5 days, calculate the mean number of rooms rented for each sample, and compare it with the population mean. What is the sampling error in each case?
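The comparison asked for above can be sketched as follows. The sketch treats sampling error as the sample mean minus the population mean. It uses the three 5-day samples and the population mean of 3.13 rooms that these notes quote in the sampling-distribution section (III). This is an illustration of the arithmetic only; the daily table itself is not reproduced here.

```python
from statistics import mean

POPULATION_MEAN = 3.13  # June 2017 population mean as quoted in Section III of these notes

def sampling_error(sample, population_mean):
    """Sampling error = sample statistic (here the sample mean) - population parameter."""
    return mean(sample) - population_mean

# The three 5-day samples quoted in Section III for the Foxtrot Inn.
samples = [(4, 7, 4, 3, 1), (3, 3, 2, 3, 6), (0, 0, 3, 3, 3)]
for s in samples:
    print(s, mean(s), round(sampling_error(s, POPULATION_MEAN), 2))
```

A positive error means that the sample overestimates the population mean, and a negative error means that it underestimates it.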
Exercise:
1. Every fifth adult entering an airport is checked for extra security screening. What sampling
technique is used?
Answer:
(The airport is sampling every 5th adult entering an airport for extra security screening.)
2. At a local technical school, five auto repair classes are randomly selected and all of the
students from each class are interviewed. What sampling technique is used?
Answer:
(All students in the selected auto repair classes (clusters) are interviewed.)
3. A writer for an art magazine randomly selects and interviews fifty male and fifty female
artists. What sampling technique is used?
Answer:
(The writer interviews fifty artists from each of two groups (strata), male and female.)
4. A statistics student interviews everyone in his apartment building to determine who owns
a cell phone. What sampling technique is used?
Answer:
(The statistics student is simply relying on people who are in his apartment building to obtain
the sample data.)
5. The names of 100 employees are written on 100 cards. The cards are placed in a bag, and
three names are picked from the bag. What sampling technique is used?
Answer:
(Every person has an equal chance of being selected.)
6. Distinguish between nonsampling error and sampling error. Choose the correct answer below.
A. Nonsampling error is the error that results because a sample is being used to estimate
information about a population. Sampling error is the error that results from
undercoverage, nonresponse bias, response bias, or data-entry errors.
B. Nonsampling error is the error that results from the process of obtaining the data.
Sampling error is the error that results from undercoverage, nonresponse bias, response
bias, or data-entry errors.
C. Nonsampling error is the error that results from undercoverage, nonresponse bias,
response bias, or data-entry errors. Sampling error is the error that results from the
process of obtaining the data.
D. Nonsampling error is the error that results from the process of obtaining the data.
Sampling error is the error that results because a sample is being used to estimate
information about a population.
Answer:
III. Sampling Distributions
Because statistics are random variables computed from a given sample, they vary from sample to sample. Therefore,
Suppose 𝑋1, 𝑋2, ⋯, 𝑋𝑛 is a random sample drawn from the population, consisting of n random variables. If T(𝑋1, 𝑋2, ⋯, 𝑋𝑛) is a real-valued function of the sample (𝑋1, 𝑋2, ⋯, 𝑋𝑛), then Y = T(𝑋1, 𝑋2, ⋯, 𝑋𝑛) is a sample statistic.
(1) Take 【EXAMPLE 3】, the number of rooms rented per day at the Foxtrot Inn in June 2017, as an example. The population is the number of rooms rented on each of the 30 days in June 2017. Five days are randomly selected from the 30 days in the population (n = 5), and the numbers of rooms rented on those days give (𝑋1, 𝑋2, ⋯, 𝑋5). Possible samples include (4, 7, 4, 3, 1), (3, 3, 2, 3, 6), and (0, 0, 3, 3, 3). If T(𝑋1, 𝑋2, ⋯, 𝑋5) is defined as the average of the sample (𝑋1, 𝑋2, ⋯, 𝑋5), that is, $T(X_1, X_2, \dots, X_5) = \frac{\sum X_i}{5}$, it varies
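Sample _________: $T(X_1, X_2, \dots, X_n) \equiv \bar{X} = \frac{\sum X_i}{n}$
Sample _________: $T(X_1, X_2, \dots, X_n) \equiv S^2 = \frac{\sum (X_i - \bar{X})^2}{n-1}$
Sample __________________: $T(X_1, X_2, \dots, X_n) \equiv S = \sqrt{S^2} = \sqrt{\frac{\sum (X_i - \bar{X})^2}{n-1}}$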
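The three statistics defined above map directly onto Python's standard `statistics` module. `statistics.variance` and `statistics.stdev` use the n − 1 divisor, matching $S^2$ and $S$. The sketch applies them to the first Foxtrot Inn 5-day sample quoted in the example.

```python
import statistics

sample = [4, 7, 4, 3, 1]                 # first Foxtrot Inn 5-day sample from the notes
x_bar = statistics.mean(sample)          # sum(X_i) / n
s2 = statistics.variance(sample)         # sum((X_i - x_bar)^2) / (n - 1)
s = statistics.stdev(sample)             # sqrt(S^2)
print(x_bar, s2, round(s, 4))            # → 3.8 4.7 2.1679
```

`statistics.pvariance` and `statistics.pstdev` divide by n instead. They correspond to a population variance and standard deviation, not to $S^2$ and $S$.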
→ The definition of a statistic is very broad; the only restriction is that a statistic must not depend on any unknown population parameter.
① A statistic is a numerical characteristic of the _________, such as the sample mean, the sample variance, the sample standard deviation, the sample maximum, the sample minimum, or the sample median. It is also a random variable.
② The probability distribution of a statistic is called a _________ distribution, or the distribution of the sample statistic. It is also a kind of probability model, one that describes the results of repeated sampling, and it has two functions:
(1) Definition
Assume that 𝑋1, 𝑋2, ⋯, 𝑋𝑛 is a random sample drawn from the population, consisting of n random variables. Let $\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n}$; then $\bar{X}$ is the sample mean. Its probability distribution, $f(\bar{x})$, is called the sampling distribution of the ____________.
Explanation:
Take 【EXAMPLE 3】, the number of rooms rented at the Foxtrot Inn in June 2017, as an example: the _______________ changes from sample to sample. For the first 5-day sample, (4, 7, 4, 3, 1), the sample mean is 3.8 rooms; for the second, (3, 3, 2, 3, 6), the sample mean is 3.4 rooms. The population mean is 3.13 rooms. If the means of all $C^{N}_{n} = C^{30}_{5}$ possible 5-day samples are organized into a probability distribution, this distribution is called the sampling distribution of the sample mean.
𝑋1   𝑋2   𝑋3   𝑋4   𝑋5   $\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n}$        $\bar{x}$   $f(\bar{x})$
⋮    ⋮    ⋮    ⋮    ⋮    ⋮                                           ⋮           ⋮
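The table above can be built by brute force: list every possible sample, compute its mean, and count how often each mean occurs. The sketch does this exactly with `Fraction`. The Foxtrot Inn daily data are not reproduced in these notes, so the sketch uses a hypothetical toy population of N = 4 values with n = 2. The last line only counts how many 5-day samples June's 30 days would give.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations
from math import comb

def sampling_distribution_of_mean(population, n):
    """Exact distribution of the sample mean over all C(N, n) samples drawn without replacement."""
    means = [Fraction(sum(s), n) for s in combinations(population, n)]
    total = len(means)
    return {m: Fraction(c, total) for m, c in sorted(Counter(means).items())}

# Hypothetical toy population (not the Foxtrot Inn data): N = 4, n = 2.
pop = [1, 2, 3, 4]
dist = sampling_distribution_of_mean(pop, 2)
for m, p in dist.items():
    print(float(m), p)                      # each distinct x_bar and f(x_bar)
print(sum(m * p for m, p in dist.items()))  # mean of the sampling distribution, equals mu = 5/2
print(comb(30, 5))  # number of possible 5-day samples from June's 30 days → 142506
```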
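(2) Important characteristics of the sample mean
③ Standard Deviation:
Infinite population: $\text{S.D.} = \sigma_{\bar{X}} = \sqrt{\sigma_{\bar{X}}^2} = \frac{\sigma}{\sqrt{n}}$;
Finite population: $\text{S.D.} = \sigma_{\bar{X}} = \sqrt{\sigma_{\bar{X}}^2} = \frac{\sigma}{\sqrt{n}}\sqrt{\frac{N-n}{N-1}}$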
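The two standard-deviation formulas above can be checked numerically. The sketch computes $\sigma/\sqrt{n}$, applies the finite population correction when N is given, and compares the result with the exact standard deviation of all sample means from a hypothetical toy population (N = 6, n = 3, sampled without replacement). The two printed values should agree.

```python
from itertools import combinations
from math import sqrt
from statistics import fmean, pstdev

def standard_error_of_mean(sigma, n, N=None):
    """sigma / sqrt(n), multiplied by sqrt((N - n) / (N - 1)) when a finite N is given."""
    se = sigma / sqrt(n)
    if N is not None:
        se *= sqrt((N - n) / (N - 1))  # finite population correction
    return se

# Brute-force check on a hypothetical toy population (N = 6, n = 3).
population = [2, 4, 4, 5, 7, 9]
sigma = pstdev(population)                                   # population sigma (divisor N)
all_means = [fmean(s) for s in combinations(population, 3)]  # every sample, no replacement
print(pstdev(all_means), standard_error_of_mean(sigma, 3, N=6))
```

Without the correction factor, the formula would overstate the spread here. The correction matters most when n is a sizable fraction of N.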
If the population is normally distributed, the sampling distribution of the sample mean is also normal, regardless of the sample size.
② The population distribution is not normal
When the sample is large enough, according to the Central Limit Theorem (CLT), the sampling distribution of the sample mean approaches the normal distribution.
◆ Central Limit Theorem (CLT)
Suppose a population, whatever its distribution, has mean μ and variance σ² (σ² < ∞). Draw a simple random sample of n observations from the population. If the sample size is large enough, the sampling distribution of the sample mean approaches the normal distribution.
A sample size that is large enough means: _________
Explanation:
① If the population is normally distributed, the sampling distribution of the sample mean is the _________ distribution regardless of the sample size.
② If the population is not normally distributed:
It can be seen from the figure below that, regardless of the shape of the population distribution, as the sample size increases, the sampling distribution of the sample mean approaches the _________ distribution.
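The behaviour described above can also be seen in a quick simulation. This is a minimal sketch that assumes a right-skewed Exponential(1) population (μ = 1, σ = 1), chosen for illustration and not taken from these notes. As n grows from 2 to 30, the mean of the simulated sample means stays near μ and their standard deviation tracks σ/√n. Their skewness also shrinks toward 0, the value for a normal distribution.

```python
import random
import statistics

def simulate_sample_means(n, reps=20_000, seed=1):
    """Draw `reps` samples of size n from an Exponential(1) population (mu = 1, sigma = 1)
    and return the list of sample means."""
    rng = random.Random(seed)
    return [statistics.fmean(rng.expovariate(1.0) for _ in range(n)) for _ in range(reps)]

def skewness(xs):
    """Moment skewness: average cubed z-score (0 for a symmetric distribution)."""
    m = statistics.fmean(xs)
    s = statistics.pstdev(xs)
    return statistics.fmean(((x - m) / s) ** 3 for x in xs)

for n in (2, 30):
    means = simulate_sample_means(n)
    # columns: n, mean of x_bar, SD of x_bar, theoretical sigma/sqrt(n), skewness of x_bar
    print(n, round(statistics.fmean(means), 3), round(statistics.stdev(means), 3),
          round(1 / n ** 0.5, 3), round(skewness(means), 2))
```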
③ For an infinite population, the sampling distribution of the sample mean is $\bar{X} \sim$ _______________;
for a finite population, the sampling distribution of the sample mean is $\bar{X} \sim$ _______________.
(4) Z-score for a given value of $\bar{X}$ (the sample mean)
① Infinite population
$Z = \frac{\text{sample mean} - \mu}{\text{standard error}} = \frac{\bar{x} - \mu_{\bar{X}}}{\sigma_{\bar{X}}} = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}}$
② Finite population
$Z = \frac{\text{sample mean} - \mu}{\text{standard error}} = \frac{\bar{x} - \mu_{\bar{X}}}{\sigma_{\bar{X}}} = \frac{\bar{x} - \mu}{\frac{\sigma}{\sqrt{n}}\sqrt{\frac{N-n}{N-1}}}$
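The two Z-score formulas above can be sketched as one helper. `statistics.NormalDist` (Python 3.8+) turns the result into a tail probability. The numbers below (μ = 50, σ = 10, n = 25, x̄ = 53, N = 200) are hypothetical and are not taken from the examples that follow. The probability assumes that the sampling distribution of $\bar{X}$ is normal, as the CLT discussion above justifies.

```python
from math import sqrt
from statistics import NormalDist

def z_for_sample_mean(x_bar, mu, sigma, n, N=None):
    """Z = (x_bar - mu) / sigma_xbar, with the finite population correction if N is given."""
    se = sigma / sqrt(n)
    if N is not None:
        se *= sqrt((N - n) / (N - 1))
    return (x_bar - mu) / se

# Hypothetical: mu = 50, sigma = 10, n = 25, observed x_bar = 53.
z = z_for_sample_mean(53, 50, 10, 25)
print(z, round(1 - NormalDist().cdf(z), 4))   # → 1.5 0.0668  (P(X_bar >= 53))

# The same sample drawn from a finite population of N = 200: a smaller standard error, a larger Z.
print(round(z_for_sample_mean(53, 50, 10, 25, N=200), 3))
```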
【EXAMPLE 5】
When a production machine is properly calibrated, it requires an average of 25 seconds per unit
produced, with a standard deviation of 3 seconds. For a simple random sample of n = 36 units,
the sample mean is found to be 26.2 seconds per unit.
a. What z-score corresponds to the sample mean of x̄ = 26.2 seconds?
b. When the machine is properly calibrated, what is the probability that the mean for a simple
random sample of this size will be at least 26.2 seconds?
【EXAMPLE 6】
The Quality Assurance Dept. for Cola, Inc. maintains records regarding the amount of cola in
its jumbo bottle. The actual amount of cola in each bottle varies a small amount from one bottle
to another. Records indicate the amounts of cola follow the normal distribution, the mean
amount of cola in the bottles is 31.2 ounces, and the standard deviation is 0.4 ounces. At 8 a.m.
today, the quality technician randomly selected 16 bottles from the filling line. The mean
amount was 31.38 ounces. Is this an unlikely result? Is it likely that the process is putting too much cola in the bottles? Is the sampling error of 0.18 ounce unusual?
(1) Definition
Assume that 𝑋1, 𝑋2, ⋯, 𝑋𝑛 is a random sample drawn from the population, consisting of n random variables. If X is the number of "successes" among the n, the success rate _________ is called the sample proportion 𝑝̂. Its probability distribution, f(𝑝̂), is called the sampling distribution of the __________________.
Explanation:
According to company data, 42% of the company's employees hold financial licenses. The following three graphs show the sampling distribution of the sample proportion for sample sizes n = 10, 50, and 100, respectively.
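(2) Important characteristics of the sample proportion
The important characteristics of the sample proportion are:
① Expected Value: $E(\hat{p}) = \mu_{\hat{p}} = p$
② Variance:
Infinite population: $\mathrm{Var}(\hat{p}) = \sigma_{\hat{p}}^2 = \frac{p(1-p)}{n}$;
Finite population: $\mathrm{Var}(\hat{p}) = \sigma_{\hat{p}}^2 = \frac{p(1-p)}{n}\cdot\frac{N-n}{N-1}$
③ Standard Deviation:
Infinite population: $\text{S.D.} = \sigma_{\hat{p}} = \sqrt{\sigma_{\hat{p}}^2} = \sqrt{\frac{p(1-p)}{n}}$;
Finite population: $\text{S.D.} = \sigma_{\hat{p}} = \sqrt{\sigma_{\hat{p}}^2} = \sqrt{\frac{p(1-p)}{n}}\sqrt{\frac{N-n}{N-1}}$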
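For an infinite population, the number of successes X in the sample is binomial, so the exact distribution of $\hat{p} = X/n$ can be listed. Its mean and standard deviation can then be compared with the formulas above. The sketch uses p = 0.42 and n = 10, 50, 100, the values from the financial-license example. It treats the company as an infinite population, which is an assumption, since no population size is given.

```python
from math import comb, sqrt

def sample_proportion_distribution(p, n):
    """Exact pmf of p_hat = X / n when X ~ Binomial(n, p) (infinite population)."""
    return {k / n: comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)}

p = 0.42  # share of employees with financial licenses, from the example
for n in (10, 50, 100):
    dist = sample_proportion_distribution(p, n)
    mean_p_hat = sum(v * prob for v, prob in dist.items())
    sd_p_hat = sqrt(sum((v - mean_p_hat) ** 2 * prob for v, prob in dist.items()))
    # columns: n, E(p_hat), S.D.(p_hat) from the pmf, sqrt(p(1-p)/n) from the formula
    print(n, round(mean_p_hat, 4), round(sd_p_hat, 4), round(sqrt(p * (1 - p) / n), 4))
```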
Explanation:
$E(\hat{p}) = E\left(\frac{X}{n}\right) =$ ______________________________________________________
The variance and standard deviation of 𝑝̂ depend on whether the population is infinite or finite:
① Infinite Population
$V(\hat{p}) = V\left(\frac{X}{n}\right) =$ ________________________________________________________________________
$\sigma_{\hat{p}} = \text{S.D.}(\hat{p}) = \sqrt{\mathrm{Var}(\hat{p})} =$ ___________________________
② Finite Population
☆ Use the finite population correction factor $\frac{N-n}{N-1}$
When a finite population is sampled without replacement and the sample size n is not small relative to the population size N [rule of thumb: the sample is more than 5% of the population (n > 5%N)], the variance and standard deviation of 𝑝̂ need to be adjusted with the finite population correction factor $\frac{N-n}{N-1}$, so
When the sample is large enough, according to the Central Limit Theorem (CLT), the sampling distribution of the sample proportion approaches the _________ distribution with mean 𝑝 and variance $\mathrm{Var}(\hat{p}) = \sigma_{\hat{p}}^2 = \frac{p(1-p)}{n}$.
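① Infinite Population
$Z = \frac{\hat{p} - p}{\sigma_{\hat{p}}} = \frac{\hat{p} - p}{\sqrt{\frac{p(1-p)}{n}}}$
② Finite Population
$Z = \frac{\hat{p} - p}{\sigma_{\hat{p}}} = \frac{\hat{p} - p}{\sqrt{\frac{p(1-p)}{n}}\sqrt{\frac{N-n}{N-1}}}$
𝑝̂ = the sample proportion value of interest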
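The two Z formulas above can be sketched as one helper that uses the normal approximation. The numbers (p = 0.42, n = 100, and the question "what is P(𝑝̂ ≥ 0.50)?") are hypothetical. The check that np and n(1 − p) are both at least 5 is a common textbook rule of thumb for when the approximation is reasonable. It is an assumption added here, not a condition stated in these notes.

```python
from math import sqrt
from statistics import NormalDist

def z_for_sample_proportion(p_hat, p, n, N=None):
    """Z = (p_hat - p) / sigma_p_hat, with the finite population correction if N is given."""
    se = sqrt(p * (1 - p) / n)
    if N is not None:
        se *= sqrt((N - n) / (N - 1))
    return (p_hat - p) / se

# Hypothetical: p = 0.42, n = 100; probability that p_hat is at least 0.50.
p, n = 0.42, 100
# Common rule of thumb (an assumption, not from these notes): n*p and n*(1-p) both >= 5.
assert n * p >= 5 and n * (1 - p) >= 5
z = z_for_sample_proportion(0.50, p, n)
print(round(z, 3), round(1 - NormalDist().cdf(z), 4))
```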
【EXAMPLE 7】
The campaign manager for a political candidate claims that 55% of registered voters favor the candidate over her strongest opponent. Assuming that this claim is true, what is the probability that in a simple random sample of 300 voters, at least 60% would favor the candidate over her strongest opponent?
【EXAMPLE 8】
A survey of 500 adults aged 18-29 showed 285 ate fast food for dinner at least once in the past
week. Find the sample proportion of individuals surveyed who ate fast food for dinner at least
once in the past week.
【EXAMPLE 9】
According to the Centers for Disease Control and Prevention, 18.8% of school-aged children,
aged 6-11 years, were overweight in 2004. In a random sample of 90 school-aged children,
aged 6-11 years, what is the probability that at least 19% are overweight?
【EXAMPLE 10】
Sixteen percent of Americans do not have health insurance. Suppose a simple random sample of 500 Americans is obtained. What is the probability that more than 20% of the sample do not have health insurance?
(1) Definition
Suppose there are two quantitative populations with sizes 𝑁1 and 𝑁2, means 𝜇1 and 𝜇2, and variances $\sigma_1^2$ and $\sigma_2^2$. Random samples of sizes 𝑛1 and 𝑛2 are drawn from the two populations, and the samples are independent of each other. If 𝑋̅1 and 𝑋̅2 denote the two sample means, then 𝑋̅1 − 𝑋̅2 is the difference between the two sample means. Its probability distribution is called the sampling distribution of the difference of two sample means.
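(2) Important characteristics of the difference between two sample means
The important characteristics of the difference between two sample means are:
① Expected Value: $E(\bar{X}_1 - \bar{X}_2) = \mu_{(\bar{X}_1 - \bar{X}_2)} = \mu_1 - \mu_2$
② Variance:
Infinite population: $\mathrm{Var}(\bar{X}_1 - \bar{X}_2) = \sigma_{(\bar{X}_1 - \bar{X}_2)}^2 = \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}$
Finite population: $\mathrm{Var}(\bar{X}_1 - \bar{X}_2) = \sigma_{(\bar{X}_1 - \bar{X}_2)}^2 = \frac{\sigma_1^2}{n_1}\cdot\frac{N_1 - n_1}{N_1 - 1} + \frac{\sigma_2^2}{n_2}\cdot\frac{N_2 - n_2}{N_2 - 1}$
③ Standard Deviation:
Infinite population: $\text{S.D.} = \sigma_{(\bar{X}_1 - \bar{X}_2)} = \sqrt{\sigma_{(\bar{X}_1 - \bar{X}_2)}^2} = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$
Finite population: $\text{S.D.} = \sigma_{(\bar{X}_1 - \bar{X}_2)} = \sqrt{\sigma_{(\bar{X}_1 - \bar{X}_2)}^2} = \sqrt{\frac{\sigma_1^2}{n_1}\cdot\frac{N_1 - n_1}{N_1 - 1} + \frac{\sigma_2^2}{n_2}\cdot\frac{N_2 - n_2}{N_2 - 1}}$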
(3) The shape of the sampling distribution of the difference of two sample means
If both populations are normally distributed, then regardless of the sample sizes, the sampling distribution of the difference of the two sample means is also normal.
② The population distributions are not normal
When the sample sizes are large enough, according to the Central Limit Theorem (CLT), the sampling distribution of the difference of the two sample means approaches the normal distribution.
Explanation:
According to the linear combination theorem of normal random variables, the sampling
distribution of 𝑋̅1 −𝑋̅2 also approaches the _________ distribution.
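① Infinite Population
$Z = \frac{(\bar{X}_1 - \bar{X}_2) - E(\bar{X}_1 - \bar{X}_2)}{\sigma_{(\bar{X}_1 - \bar{X}_2)}} = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}}$
② Finite Population
$Z = \frac{(\bar{X}_1 - \bar{X}_2) - E(\bar{X}_1 - \bar{X}_2)}{\sigma_{(\bar{X}_1 - \bar{X}_2)}} = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{\sigma_1^2}{n_1}\cdot\frac{N_1 - n_1}{N_1 - 1} + \frac{\sigma_2^2}{n_2}\cdot\frac{N_2 - n_2}{N_2 - 1}}}$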
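The Z formulas for $\bar{X}_1 - \bar{X}_2$ can be sketched as follows. The variances 2800 and 3250 echo EXAMPLE 11. The sample sizes (40 and 50), the observed difference of 30, and $\mu_1 - \mu_2 = 0$ are hypothetical, because that example's table is not reproduced in these notes. Optional N1 and N2 apply the finite population corrections.

```python
from math import sqrt
from statistics import NormalDist

def se_diff_means(var1, n1, var2, n2, N1=None, N2=None):
    """Standard deviation of X1_bar - X2_bar for independent samples."""
    term1 = var1 / n1
    term2 = var2 / n2
    if N1 is not None:
        term1 *= (N1 - n1) / (N1 - 1)  # finite population correction, population 1
    if N2 is not None:
        term2 *= (N2 - n2) / (N2 - 1)  # finite population correction, population 2
    return sqrt(term1 + term2)

# Variances as in EXAMPLE 11; sample sizes and the observed difference are hypothetical.
se = se_diff_means(2800, 40, 3250, 50)   # sqrt(70 + 65) = sqrt(135)
z = (30 - 0) / se                        # observed x1_bar - x2_bar = 30, mu1 - mu2 = 0 (assumed)
print(round(se, 3), round(z, 3), round(1 - NormalDist().cdf(z), 4))
```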
【EXAMPLE 11】
In a study of annual family expenditures for general health care, two populations were
surveyed with the following results:
If the variances of the populations are $\sigma_1^2 = 2800$ and $\sigma_2^2 = 3250$, what is the probability of
(1) Definition
Suppose there are two qualitative populations with sizes 𝑁1 and 𝑁2 and population proportions 𝑝1 and 𝑝2. Two independent samples of sizes 𝑛1 and 𝑛2 are drawn from the two populations, with sample proportions 𝑝̂1 and 𝑝̂2. Then 𝑝̂1 − 𝑝̂2 is the difference between the two sample proportions, and its probability distribution is called the sampling distribution of the difference of two sample proportions.
(2) Important characteristics of the difference between two sample proportions
① Expected Value: $E(\hat{p}_1 - \hat{p}_2) = \mu_{(\hat{p}_1 - \hat{p}_2)} = p_1 - p_2$
② Variance:
Infinite population: $\mathrm{Var}(\hat{p}_1 - \hat{p}_2) = \sigma_{(\hat{p}_1 - \hat{p}_2)}^2 = \frac{p_1(1-p_1)}{n_1} + \frac{p_2(1-p_2)}{n_2}$
③ Standard Deviation:
Infinite population: $\text{S.D.} = \sigma_{(\hat{p}_1 - \hat{p}_2)} = \sqrt{\sigma_{(\hat{p}_1 - \hat{p}_2)}^2} = \sqrt{\frac{p_1(1-p_1)}{n_1} + \frac{p_2(1-p_2)}{n_2}}$
When the sample sizes 𝑛1 and 𝑛2 are large enough, according to the Central Limit Theorem (CLT), the sampling distribution of the difference between the two sample proportions approaches the normal distribution.
Explanation:
When 𝑛1 and 𝑛2 are large enough, according to the _____________________________, 𝑝̂1 and 𝑝̂2 are each approximately normally distributed. By the CLT, $\hat{p}_1 \sim N\left(p_1, \frac{p_1(1-p_1)}{n_1}\right)$ as $n_1 \uparrow$ and $\hat{p}_2 \sim N\left(p_2, \frac{p_2(1-p_2)}{n_2}\right)$ as $n_2 \uparrow$.
According to the linear combination theorem of normal random variables, the sampling distribution of 𝑝̂1 − 𝑝̂2 also approaches the ______________________.
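① Infinite Population
$Z = \frac{(\hat{p}_1 - \hat{p}_2) - E(\hat{p}_1 - \hat{p}_2)}{\sigma_{(\hat{p}_1 - \hat{p}_2)}} = \frac{(\hat{p}_1 - \hat{p}_2) - (p_1 - p_2)}{\sqrt{\frac{p_1(1-p_1)}{n_1} + \frac{p_2(1-p_2)}{n_2}}}$
② Finite Population
$Z = \frac{(\hat{p}_1 - \hat{p}_2) - E(\hat{p}_1 - \hat{p}_2)}{\sigma_{(\hat{p}_1 - \hat{p}_2)}} = \frac{(\hat{p}_1 - \hat{p}_2) - (p_1 - p_2)}{\sqrt{\frac{p_1(1-p_1)}{n_1}\cdot\frac{N_1 - n_1}{N_1 - 1} + \frac{p_2(1-p_2)}{n_2}\cdot\frac{N_2 - n_2}{N_2 - 1}}}$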
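The Z formulas for $\hat{p}_1 - \hat{p}_2$ can be sketched the same way. All numbers here are hypothetical: both populations have p = 0.30, the samples have 100 and 120 items, and the observed difference is 0.10. The sketch also relies on the normal approximation, so it assumes that n₁ and n₂ are large enough for the CLT argument above to apply.

```python
from math import sqrt
from statistics import NormalDist

def z_diff_proportions(diff_hat, p1, n1, p2, n2, N1=None, N2=None):
    """Z for an observed p1_hat - p2_hat, with optional finite population corrections."""
    var1 = p1 * (1 - p1) / n1
    var2 = p2 * (1 - p2) / n2
    if N1 is not None:
        var1 *= (N1 - n1) / (N1 - 1)
    if N2 is not None:
        var2 *= (N2 - n2) / (N2 - 1)
    return (diff_hat - (p1 - p2)) / sqrt(var1 + var2)

# Hypothetical: p1 = p2 = 0.30, samples of 100 and 120, observed difference 0.10.
z = z_diff_proportions(0.10, 0.30, 100, 0.30, 120)
print(round(z, 3), round(1 - NormalDist().cdf(z), 4))  # P(difference >= 0.10)
```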
【EXAMPLE 12】
In a certain area of a large city it is hypothesized that 40% of the houses are in a dilapidated
condition. A random sample of 75 houses from this section and 90 houses from another section
yield a difference, (𝑝̂1 − 𝑝̂2 ) , of 0.09. If there is no difference between the two areas in the
proportion of dilapidated houses, what is the probability of observing a difference this large or
larger?
Exercise:
1. Sampling distributions describe the distribution of
a) parameters.
b) statistics.
c) both parameters and statistics.
d) neither parameters nor statistics.
3. For air travelers, one of the biggest complaints is the waiting time between when the
airplane taxis away from the terminal until the flight takes off. This waiting time is known to
have a right skewed distribution with a mean of 10 minutes and a standard deviation of 8
minutes. Suppose 100 flights have been randomly sampled. Describe the sampling distribution
of the mean waiting time between when the airplane taxis away from the terminal until the
flight takes off for these 100 flights.
a) Distribution is right skewed with mean = 10 minutes and standard error = 0.8 minutes.
b) Distribution is right skewed with mean = 10 minutes and standard error = 8 minutes.
c) Distribution is approximately normal with mean = 10 minutes and standard error = 0.8
minutes.
d) Distribution is approximately normal with mean = 10 minutes and standard error = 8 minutes.
4. The scores of all pro golfers on a particular course have a mean of 70 and a standard deviation of 3.0. Suppose 36 pro golfers played the course today. Find the probability that the mean score of the 36 pro golfers exceeded 71.
5. The distribution of the number of loaves of bread sold per week by a large bakery over the
past 5 years has a mean of 7,750 and a standard deviation of 145 loaves. Suppose a random
sample of n = 40 weeks has been selected. What is the approximate probability that the mean
number of loaves sold in the sampled weeks exceeds 7,895 loaves?
6. In a certain past year, major league baseball salaries followed a normal distribution with a mean of $3.26 million and a standard deviation of $1.2 million. Suppose a sample of 100 major league players was taken. What was the standard error for the sample mean salary? What is the approximate probability that the mean salary of the 100 players exceeded $3.5 million?
7. The amount of bleach a machine pours into bottles has a mean of 36 oz. with a standard
deviation of 0.15 oz. Suppose we take a random sample of 36 bottles filled by this machine.
What is the probability that the mean of the sample is between 35.94 and 36.06 oz.?
8. According to a survey, only 15% of customers who visited the web site of a major retail store
made a purchase. Random samples of size 50 are selected. What proportion of the samples will
have between 20% and 30% of customers who will make a purchase after visiting the web site?
9. According to an article, 19% of the entire population in a developing country have high-speed access to the Internet. Random samples of size 200 are selected from the country's population. ______% of the samples will have between 14% and 24% of people with high-speed access to the Internet.
10. Times spent studying by students in the week before final exams follow a normal
distribution with standard deviation 8 hours. A random sample of 4 students was taken from a
population of 50 in order to estimate the mean study time for the population of all students.
Use the finite population correction. What is the standard error of all the sample means? What
is the probability that the sample mean differs from the population mean by less than 2 hours?
This section describes the characteristics of samples drawn from a normally distributed population. The normal distribution remains one of the most widely used statistical models. Many useful properties of sample statistics, as well as many well-known sampling distributions, are derived from sampling from the normal distribution.
1. Normal Distribution
Suppose the n variables 𝑋1, 𝑋2, ⋯, 𝑋𝑛 come from the same normal population N(μ, σ²); then
(1) The sampling distribution of the sample mean $\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n}$ is $N\left(\mu, \frac{\sigma^2}{n}\right)$.
(3) The sampling distribution of the difference of two sample means $\bar{X}_1 - \bar{X}_2$ is $N\left(\mu_1 - \mu_2, \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}\right)$.
(4) The sampling distribution of the sum of two sample means $\bar{X}_1 + \bar{X}_2$ is $N\left(\mu_1 + \mu_2, \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}\right)$.
2. $\chi^2$ Distribution (Chi-squared Distribution)
Definition
Assume that 𝑋1, 𝑋2, ⋯, 𝑋𝑛 are n independent normal random variables with means 𝜇1, 𝜇2, ⋯, 𝜇𝑛 and variances $\sigma_1^2, \sigma_2^2, \dots, \sigma_n^2$. If $\chi^2 = \sum_{i=1}^{n}\left(\frac{X_i - \mu_i}{\sigma_i}\right)^2$, then $\chi^2 \sim \chi^2_{\upsilon}$ with $\upsilon = n$,
where υ is the degrees of freedom.
Explanation:
(1) The degrees of freedom υ are the parameter of the chi-square distribution. Degrees of freedom are the number of values in a calculation that are free to vary. For a chi-square statistic, they are the number of its component variables that can vary freely.
For example: A data set X has 4 observations: 15, 30, 25, 10. The mean of this data set is 20. Given that the mean is 20, only 3 of the values can vary freely, because the fourth is then determined. In other words, the degrees of freedom are n − 1 = 3.
In statistics, degrees of freedom usually equal the number of observations minus the number of estimated population parameters:
degrees of freedom = (# of observations) − (# of parameters estimated)
Expected Value: $E(\chi^2_\upsilon) = \upsilon$
Variance: $V(\chi^2_\upsilon) = 2\upsilon$
Skewness: $\alpha_3 = \sqrt{\frac{8}{\upsilon}}$
Kurtosis: $\alpha_4 = 3 + \frac{12}{\upsilon}$
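The definition and the first two moments above can be checked by simulation. The sketch builds a chi-square variable with υ degrees of freedom as a sum of υ squared independent N(0, 1) draws. It then compares the simulated mean and variance with υ and 2υ. The choice υ = 5, the number of repetitions, and the seed are arbitrary.

```python
import random
import statistics

def simulate_chi_square(df, reps=50_000, seed=7):
    """Sum of df squared independent N(0, 1) draws, repeated reps times."""
    rng = random.Random(seed)
    return [sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(df)) for _ in range(reps)]

df = 5
draws = simulate_chi_square(df)
print(round(statistics.fmean(draws), 2), "vs E =", df)         # E(chi2_v) = v
print(round(statistics.variance(draws), 2), "vs V =", 2 * df)  # V(chi2_v) = 2v
```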
(3) Facts related to chi-square random variables: let $\chi^2_\upsilon$ be a chi-square random variable with υ degrees of freedom.
① If Z is an N(0,1) random variable, then $Z^2 \sim \chi^2_1$; that is, the square of a standard normal random variable is a chi-square random variable with 1 degree of freedom.
② If 𝑋1, 𝑋2, ⋯, 𝑋𝑛 are independent and $X_i \sim \chi^2_{\upsilon_i}$, then $X_1 + X_2 + \cdots + X_n \sim \chi^2_{\upsilon_1 + \upsilon_2 + \cdots + \upsilon_n}$. That is, a sum of independent chi-square variables is still a chi-square variable, and the degrees of freedom add.
③ Suppose that the n random variables 𝑋1, 𝑋2, ⋯, 𝑋𝑛 come from the same normal population N(μ, σ²). Then the sampling distribution of $\chi^2 = \sum_{i=1}^{n}\left(\frac{X_i - \mu}{\sigma}\right)^2$ is also a chi-square distribution, with n degrees of freedom.
We lost a degree of freedom when we used the sample mean rather than the true mean.
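❖ The mean and variance of $S^2 = \frac{1}{n-1}\sum_{i=1}^{n}(X_i - \bar{X})^2$:
Expected Value: $E(S^2) = \sigma^2$
Variance: $V(S^2) = \frac{2\sigma^4}{n-1}$
Because $\frac{(n-1)S^2}{\sigma^2} \sim \chi^2_{n-1}$, we have $S^2 \sim \frac{\sigma^2}{n-1}\chi^2_{n-1}$, so
$E(S^2) = E\left(\frac{\sigma^2}{n-1}\chi^2_{n-1}\right) = \frac{\sigma^2}{n-1}E(\chi^2_{n-1}) = \frac{\sigma^2}{n-1}(n-1) = \sigma^2$
$V(S^2) = V\left(\frac{\sigma^2}{n-1}\chi^2_{n-1}\right) = \frac{\sigma^4}{(n-1)^2}V(\chi^2_{n-1}) = \frac{\sigma^4}{(n-1)^2}\cdot 2(n-1) = \frac{2\sigma^4}{n-1}$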
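The two results above, $E(S^2) = \sigma^2$ and $V(S^2) = 2\sigma^4/(n-1)$, can be checked by simulation. This sketch assumes a hypothetical normal population N(10, 2²) and samples of size n = 5, so the targets are 4 and 8. $S^2$ is computed with the n − 1 divisor, as in the definition.

```python
import random
import statistics

def simulate_sample_variances(mu, sigma, n, reps=40_000, seed=11):
    """Return S^2 (divisor n - 1) for reps independent normal samples of size n."""
    rng = random.Random(seed)
    out = []
    for _ in range(reps):
        xs = [rng.gauss(mu, sigma) for _ in range(n)]
        m = sum(xs) / n
        out.append(sum((x - m) ** 2 for x in xs) / (n - 1))
    return out

sigma, n = 2.0, 5          # hypothetical population N(10, 2^2), samples of size 5
s2 = simulate_sample_variances(10.0, sigma, n)
print(round(statistics.fmean(s2), 2), "vs", sigma**2)                   # E(S^2) = sigma^2
print(round(statistics.variance(s2), 2), "vs", 2 * sigma**4 / (n - 1))  # V(S^2) = 2 sigma^4 / (n - 1)
```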
3. Student's t Distribution
Assume that n random variables 𝑋1, 𝑋2, ⋯, 𝑋𝑛 come from the same normal population N(μ, σ²); then $Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \sim N(0,1)$. If σ is known and $\bar{X}$ has been calculated, $\frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$ can be used to infer μ, because μ is the only unknown. However, σ is usually unknown. Therefore, with σ unknown, Student estimated μ through the distribution of the t statistic $t = \frac{\bar{X} - \mu}{S/\sqrt{n}}$.
Student had a small sample of data. He wanted to know the difference between the sample mean and the population mean, but he did not know the variance and could not estimate it accurately.
① N(0,1)
② Compare $\bar{X}$ and μ
③ $\sigma^2$ is unknown
④ Small sample (otherwise, we could estimate $\sigma^2$ with $S^2$)
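→ Rewrite:
$t = \frac{\bar{X} - \mu}{S/\sqrt{n}} = \frac{\dfrac{\bar{X} - \mu}{\sigma/\sqrt{n}}}{\dfrac{S}{\sqrt{n}} \cdot \dfrac{1}{\sigma/\sqrt{n}}} = \frac{Z}{\sqrt{S^2/\sigma^2}}$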
When 𝑛 ⟶ ∞, Student’s t Distribution will approach the Z distribution
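The heavier tails caused by replacing σ with S, and their disappearance as n grows, can be seen in a small simulation. The sketch draws samples from a hypothetical N(0, 1) population, computes $t = \bar{X}/(S/\sqrt{n})$, and records how often |t| exceeds 1.96, the cutoff that a Z value exceeds only about 5% of the time. With n = 5 the share is well above 5%, and with n = 100 it is close to 5%. The cutoff, sample sizes, and seed are illustrative choices.

```python
import random
from math import sqrt

def tail_share_of_t(n, cutoff=1.96, reps=10_000, seed=3):
    """Share of simulated t = (x_bar - mu) / (S / sqrt(n)) values with |t| > cutoff,
    using samples of size n from a N(0, 1) population (so mu = 0)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
        m = sum(xs) / n
        s = sqrt(sum((x - m) ** 2 for x in xs) / (n - 1))
        if abs(m / (s / sqrt(n))) > cutoff:
            hits += 1
    return hits / reps

# For Z, P(|Z| > 1.96) is about 0.05; with S in place of sigma the tails are heavier.
for n in (5, 100):
    print(n, tail_share_of_t(n))
```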
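❖ Student found that the probability density function (pdf) of Student's t distribution is:
$f(t) = \frac{\Gamma\left(\frac{\upsilon+1}{2}\right)}{\sqrt{\upsilon\pi}\,\Gamma\left(\frac{\upsilon}{2}\right)}\left(1 + \frac{t^2}{\upsilon}\right)^{-\frac{\upsilon+1}{2}}, \quad -\infty < t < \infty$
where υ is the degrees of freedom.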
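The density above can be evaluated directly. The sketch works on the log scale with `math.lgamma` so that the gamma functions do not overflow for large υ. It shows the peak at t = 0 rising toward the standard normal peak $1/\sqrt{2\pi} \approx 0.3989$ as υ grows. At υ = 1 the density is the Cauchy density, whose peak is $1/\pi$.

```python
from math import exp, lgamma, log, pi, sqrt

def t_pdf(t, v):
    """Student's t density with v degrees of freedom (log-gamma used for stability)."""
    log_c = lgamma((v + 1) / 2) - lgamma(v / 2) - 0.5 * log(v * pi)
    return exp(log_c - (v + 1) / 2 * log(1 + t * t / v))

def normal_pdf(z):
    """Standard normal density, for comparison."""
    return exp(-z * z / 2) / sqrt(2 * pi)

for v in (1, 5, 30, 1000):
    print(v, round(t_pdf(0.0, v), 5), round(normal_pdf(0.0), 5))
```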
❖ Characteristics of the t distribution:
Expected Value: $E(t) = 0$
Variance: $V(t) = \frac{\upsilon}{\upsilon-2}, \ \upsilon > 2$
Skewness: $\alpha_3 = 0$
Kurtosis: $\alpha_4 = \frac{3(\upsilon-2)}{\upsilon-4}, \ \upsilon > 4$
4. Snedecor's F Distribution
Snedecor's F distribution is usually used to compare the variances of two data sets.
$\frac{(n_1-1)S_1^2}{\sigma_1^2} \sim \chi^2_{n_1-1}, \quad \frac{(n_2-1)S_2^2}{\sigma_2^2} \sim \chi^2_{n_2-1}$
The value F: $\frac{S_1^2/\sigma_1^2}{S_2^2/\sigma_2^2} \sim \frac{\chi^2_{n_1-1}/(n_1-1)}{\chi^2_{n_2-1}/(n_2-1)}$
When $\sigma_1^2 = \sigma_2^2$: $\frac{S_1^2}{S_2^2} \sim \frac{\chi^2_{n_1-1}/(n_1-1)}{\chi^2_{n_2-1}/(n_2-1)}$
To compare the degree of variation of two populations, look at $\frac{\sigma_1^2}{\sigma_2^2}$; the related sample information is $\frac{S_1^2}{S_2^2}$, the ratio of the sample variances. Through $\frac{S_1^2/S_2^2}{\sigma_1^2/\sigma_2^2} = \frac{S_1^2/\sigma_1^2}{S_2^2/\sigma_2^2}$, Snedecor's F distribution can be used to compare the degree of variation between the two populations.
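$f(w) = \frac{\Gamma\left(\frac{\upsilon_1+\upsilon_2}{2}\right)}{\Gamma\left(\frac{\upsilon_1}{2}\right)\Gamma\left(\frac{\upsilon_2}{2}\right)}\left(\frac{\upsilon_1}{\upsilon_2}\right)^{\frac{\upsilon_1}{2}} w^{\frac{\upsilon_1}{2}-1}\left(1 + \frac{\upsilon_1}{\upsilon_2}w\right)^{-\frac{\upsilon_1+\upsilon_2}{2}}, \quad \text{for } w > 0$
where $\upsilon_1$ and $\upsilon_2$ are the numerator and denominator degrees of freedom.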
❖ Characteristics of the F distribution:
Expected Value: $E(w) = \frac{\upsilon_2}{\upsilon_2 - 2}, \ \upsilon_2 > 2$
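The link between the F distribution and the ratio $S_1^2/S_2^2$ can be checked by simulation. This is a minimal sketch that assumes two hypothetical normal populations with equal variances and samples of sizes 6 and 11, so υ₁ = 5 and υ₂ = 10. The average simulated ratio should be close to $E(w) = \upsilon_2/(\upsilon_2 - 2) = 1.25$.

```python
import random

def sample_variance(xs):
    """S^2 with divisor n - 1."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

def simulate_f_ratios(n1, n2, sigma=1.0, reps=20_000, seed=5):
    """S1^2 / S2^2 for independent normal samples with equal population variances,
    which should follow F with (n1 - 1, n2 - 1) degrees of freedom."""
    rng = random.Random(seed)
    ratios = []
    for _ in range(reps):
        s1 = sample_variance([rng.gauss(0.0, sigma) for _ in range(n1)])
        s2 = sample_variance([rng.gauss(0.0, sigma) for _ in range(n2)])
        ratios.append(s1 / s2)
    return ratios

n1, n2 = 6, 11                      # hypothetical sample sizes: v1 = 5, v2 = 10
ratios = simulate_f_ratios(n1, n2)
v2 = n2 - 1
print(round(sum(ratios) / len(ratios), 3), "vs E(w) =", v2 / (v2 - 2))
```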