Chapter 2 Students-Sta408
SAMPLING DISTRIBUTION
A population consists of all elements of the group that we are interested in studying.
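Population and sample formulas
Mean
Population: $\mu = \frac{\sum X}{N}$
Sample: $\bar{x} = \frac{\sum x}{n}$
Variance
Population: $\sigma^2 = \frac{1}{N}\left[\sum X^2 - \frac{(\sum X)^2}{N}\right]$
Sample: $s^2 = \frac{1}{n-1}\left[\sum x^2 - \frac{(\sum x)^2}{n}\right]$
Standard deviation
Population: $\sigma = \sqrt{\frac{1}{N}\left[\sum X^2 - \frac{(\sum X)^2}{N}\right]}$
Sample: $s = \sqrt{\frac{1}{n-1}\left[\sum x^2 - \frac{(\sum x)^2}{n}\right]}$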
Notation
Parameter: summary measure of a population
Statistic: summary measure of a sample
Quantity: Population, Sample
Total: N, n
Mean: $\mu$, $\bar{x}$
Standard deviation: $\sigma$, $s$
Variance: $\sigma^2$, $s^2$
The Sampling Distribution of The Mean
Example 2.1
Population: A lecturer gives a 10-point quiz to all 5 students in a class. The results of the quiz were 2, 4, 6, 8, 9.
Sample: A lecturer gives a 10-point quiz to only 5 of the 10 students in Class C. The results of the quiz were 2, 5, 6, 8, 9.
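Mean
Population: $\mu = \frac{\sum X}{N} = \frac{29}{5} = 5.8$
Sample: $\bar{x} = \frac{\sum x}{n} = \frac{30}{5} = 6$
Standard Deviation
Population: $\sigma = \sqrt{\frac{1}{N}\left[\sum X^2 - \frac{(\sum X)^2}{N}\right]} = \sqrt{\frac{1}{5}\left[201 - \frac{29^2}{5}\right]} = 2.5612$
Sample: $s = \sqrt{\frac{1}{n-1}\left[\sum x^2 - \frac{(\sum x)^2}{n}\right]} = \sqrt{\frac{1}{5-1}\left[210 - \frac{30^2}{5}\right]} = 2.7386$
Variance
Population: $\sigma^2 = \frac{1}{N}\left[\sum X^2 - \frac{(\sum X)^2}{N}\right] = \frac{1}{5}\left[201 - \frac{29^2}{5}\right] = 6.56$
Sample: $s^2 = \frac{1}{n-1}\left[\sum x^2 - \frac{(\sum x)^2}{n}\right] = \frac{1}{5-1}\left[210 - \frac{30^2}{5}\right] = 7.5$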
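The population and sample calculations for the two sets of quiz scores can be checked with a short script. This is a minimal sketch using Python's standard `statistics` module; the variable names are illustrative and not part of the slides.

```python
import statistics

population = [2, 4, 6, 8, 9]  # quiz scores of all 5 students
sample = [2, 5, 6, 8, 9]      # quiz scores of 5 of the 10 students in Class C

# Population measures divide by N.
mu = statistics.fmean(population)
sigma2 = statistics.pvariance(population)
sigma = statistics.pstdev(population)

# Sample measures divide by n - 1.
x_bar = statistics.fmean(sample)
s2 = statistics.variance(sample)
s = statistics.stdev(sample)

print(f"mu = {mu:.1f}, sigma^2 = {sigma2:.2f}, sigma = {sigma:.4f}")
print(f"x_bar = {x_bar:.1f}, s^2 = {s2:.2f}, s = {s:.4f}")
```

The printed values should agree with the hand calculations: 5.8, 6.56 and 2.5612 for the population, and 6, 7.5 and 2.7386 for the sample.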
CENTRAL LIMIT THEOREM (CLT)
Rule of thumb: n ≥ 30 is considered large.
Central Limit Theorem
• The CLT states that the sampling distribution of the sample mean will be normal or approximately normal if the sample is large (n ≥ 30).
• If the original population is normally distributed or approximately normal, then the distribution of the sample mean will be normally distributed for any sample size n.
• If the original population is not normally distributed, the distribution of the sample mean will be approximately normally distributed for a sample size of 30 or more.
Sampling Distribution of the Sample Mean, $\bar{x}$
• The sampling distribution of the sample mean is the distribution of the means computed from all possible random samples of a specific size taken from a population.
• The Central Limit Theorem on the Distribution of the Sample Mean
The mean of the sample means will be the same as the population mean: $\mu_{\bar{x}} = \mu$.
The standard deviation of the sample means will be smaller than the standard deviation of the population, and it will be equal to the population standard deviation divided by the square root of the sample size: $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$.
If the population is not normally distributed or there is no information regarding the population, then the distribution of the sample means tends to be normally distributed when the sample size is sufficiently large, that is, when n ≥ 30.
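The two properties of the sample mean stated above (its mean equals the population mean and its variance equals the population variance divided by n) can be verified by listing every possible sample. The sketch below is an illustration only: it reuses the five quiz scores 2, 4, 6, 8, 9 as the population and assumes samples of size 2 drawn with replacement, a choice made here for convenience rather than taken from the slides.

```python
from itertools import product
from statistics import fmean, pvariance

population = [2, 4, 6, 8, 9]  # small population of quiz scores
n = 2                         # sample size (illustrative choice)

# Every ordered sample of size n drawn with replacement (5**2 = 25 samples).
sample_means = [fmean(sample) for sample in product(population, repeat=n)]

mu = fmean(population)
sigma2 = pvariance(population)

mean_of_means = fmean(sample_means)     # should equal mu
var_of_means = pvariance(sample_means)  # should equal sigma2 / n

print(mean_of_means, mu)
print(var_of_means, sigma2 / n)
```

With only five values the distribution of the sample means is not yet close to normal; the sketch checks the mean and variance relations only, not the normality claim.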
WITHOUT SAMPLE
Formula for z score (or standard score). Used to gain information about an individual value when the variable is normally distributed.
$Z = \frac{X - \mu}{\sigma}$, where $X \sim N(\mu, \sigma^2)$
WITH SAMPLE (CLT)
Formula for the z value for the central limit theorem. Used to gain information about a sample mean when the variable is normally distributed or when n ≥ 30.
$Z = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}}$, where $\bar{X} \sim N\left(\mu, \frac{\sigma^2}{n}\right)$
$\bar{X}$ = sample mean
$\mu$ = population mean
$\sigma$ = population standard deviation
n = sample size
Application of CLT
Example 2.2
A production firm manufactures light bulbs that have a length of life that is approximately
normally distributed with mean 800 hours and a standard deviation of 40 hours.
A manager observes that his income per day averages RM1000 with a
standard deviation of RM200. He selected a random sample of 30 days.
a) Describe the distribution of the sample mean.
b) What is the probability that the mean income for the sample of 30
days exceeds RM1050? (Ans: 0.0853)
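The manager's income question can be worked numerically as a check. The sketch below assumes the CLT applies (n = 30), so the sample mean is treated as normal with mean 1000 and standard error 200/√30; it uses `NormalDist` from Python's standard library, and the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 1000, 200, 30

# a) Sampling distribution of the mean: approximately N(mu, sigma^2 / n).
standard_error = sigma / sqrt(n)
sampling_dist = NormalDist(mu, standard_error)

# b) P(sample mean > 1050) and the corresponding z value.
z = (1050 - mu) / standard_error
p_exceeds = 1 - sampling_dist.cdf(1050)

print(f"standard error = {standard_error:.4f}")
print(f"z = {z:.4f}, P(mean > 1050) = {p_exceeds:.4f}")
```

The unrounded probability comes out near 0.0855. The quoted answer of 0.0853 corresponds to rounding z to 1.37 before reading the normal table.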
ESTIMATION
(Point Estimation and Interval Estimation)
A sampling distribution is the distribution of a sample statistic (mean, standard deviation, proportion).
These statistics and other numerical descriptive measures computed from
the samples can be used not only to describe the sample but also to
make inferences about the population parameters in the form of estimates
and hypothesis tests.
[Diagram: Population → Sample → compute statistic → make inferences/conclusions (estimation, hypothesis testing)]
INTRODUCTION TO ESTIMATION
Definition
Estimation: The process by which sample data are used to indicate the value of an unknown quantity in the population.
Estimator: Sample measures (statistics) are used to estimate population measures (parameters).
Types of Estimation
1. Point estimation: estimation of a single value.
2. Interval estimation: estimation of two numbers (upper limit, UL, and lower limit, LL) that form an interval.
Properties of Estimation
1. Unbiased: the expected value of the sample estimate equals the parameter being estimated.
2. Efficient: $Var(\hat{\theta}_1) < Var(\hat{\theta}_2)$
3. Consistent: as n increases, the standard error of the estimate becomes smaller, $\lim_{n \to \infty} Var(\hat{\theta}) = 0$
Introduction to Estimation
• There are three properties of the best estimators: unbiased, efficient, and consistent.
The estimator should be unbiased. That is, the expected value (the mean) of the estimates obtained from samples of a given size is equal to the parameter being estimated: $E(\hat{\theta}) = \theta$.
The estimator should be relatively efficient. That is, of all the statistics that can be used to estimate a parameter, the relatively efficient estimator has the smallest variance: $\hat{\theta}_1$ is a more efficient estimator of $\theta$ than $\hat{\theta}_2$ if $Var(\hat{\theta}_1) < Var(\hat{\theta}_2)$.
The estimator should be consistent. For a consistent estimator, as the sample size increases, the value of the estimator approaches the value of the parameter estimated: $\lim_{n \to \infty} Var(\hat{\theta}) = 0$.
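Unbiasedness can be illustrated with the sample variance. The sketch below is an illustration under assumptions not stated in the slides: it takes the values 2, 4, 6, 8, 9 as a population, lists every sample of size 2 drawn with replacement, and compares the average of the n − 1 divisor estimates with the average of the n divisor estimates.

```python
from itertools import product
from statistics import fmean, pvariance, variance

population = [2, 4, 6, 8, 9]
n = 2  # sample size (illustrative choice)
sigma2 = pvariance(population)  # true population variance

# Every ordered sample of size n drawn with replacement.
samples = list(product(population, repeat=n))

# Expected value of each estimator = its average over all equally likely samples.
avg_unbiased = fmean([variance(s) for s in samples])  # divisor n - 1
avg_biased = fmean([pvariance(s) for s in samples])   # divisor n

print(f"sigma^2 = {sigma2:.2f}")
print(f"E[s^2 with n - 1] = {avg_unbiased:.2f}")
print(f"E[s^2 with n]     = {avg_biased:.2f}")
```

Under these assumptions the n − 1 version averages to the population variance, while the n version averages to (n − 1)/n times it, which is why the n − 1 divisor is used for the sample variance.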
Two types of estimation
• There are two types of estimation: point estimation and interval estimation.
Point estimation
• The value of a sample statistic that is used to estimate a population parameter.
• To generalize the estimate to the population, the sample must be a random sample.
• A random sample is a sample in which each element in the population has an equal chance of being included.
• For example, the sample mean $\bar{x}$ is a point estimate of the population mean μ.
• Similarly, the sample proportion p is a point estimate of the population proportion P.
Interval estimation
• An inferential statistical procedure used to estimate population parameters from sample data through the building of confidence intervals.
• A confidence interval is a range of values computed from sample data that has a known probability of capturing some population parameter of interest.
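EXAMPLE
The following table indicates the best point estimator for each parameter.
Parameter: Best point estimator
$\mu$: $\bar{x} = \frac{\sum x}{n}$
$\sigma^2$: $s^2 = \frac{1}{n-1}\left[\sum x^2 - \frac{(\sum x)^2}{n}\right]$
$\sigma$: $s = \sqrt{\frac{1}{n-1}\left[\sum x^2 - \frac{(\sum x)^2}{n}\right]}$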
Example 2.5
mean variance
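One Population Mean
$\sigma^2$ known: $\mu = \bar{x} \pm z_{\alpha/2} \frac{\sigma}{\sqrt{n}}$
$\sigma^2$ unknown: $\mu = \bar{x} \pm t_{\alpha/2, df} \frac{s}{\sqrt{n}}$, $df = n - 1$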
Confidence interval for Mean $\mu$ (Variance is known)
Formula ($\sigma^2$ known): $\mu = \bar{x} \pm z_{\alpha/2} \frac{\sigma}{\sqrt{n}}$
Example 2.6
The average lifetime of a product from a sample of 30 items is found to be 48 months. It is
estimated that the standard deviation of the population is 3 months. Find the 95% confidence
interval for the average lifetime of the product.
Now, repeat the same problem by finding the 90% confidence interval for the average lifetime.
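Both intervals asked for in Example 2.6 can be computed with a small helper. This is a minimal sketch, assuming the population standard deviation of 3 months is treated as known so that the z-based interval applies; `z_interval` is a hypothetical helper name, and the critical value comes from `NormalDist.inv_cdf` in Python's standard library.

```python
from math import sqrt
from statistics import NormalDist


def z_interval(x_bar, sigma, n, confidence):
    """Confidence interval for a mean when sigma is known."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)  # z_{alpha/2}
    margin = z * sigma / sqrt(n)
    return x_bar - margin, x_bar + margin


ci_95 = z_interval(x_bar=48, sigma=3, n=30, confidence=0.95)
ci_90 = z_interval(x_bar=48, sigma=3, n=30, confidence=0.90)

print(f"95% CI: ({ci_95[0]:.3f}, {ci_95[1]:.3f})")
print(f"90% CI: ({ci_90[0]:.3f}, {ci_90[1]:.3f})")
```

The 90% interval is narrower than the 95% interval, since a lower confidence level uses a smaller critical value.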
Confidence interval for Mean $\mu$ (Variance is unknown)
Formula ($\sigma^2$ unknown): $\mu = \bar{x} \pm t_{\alpha/2, df} \frac{s}{\sqrt{n}}$, $df = n - 1$
Example 2.7
The time taken (in seconds) to connect to the internet via a dial-in service for a
sample of 35 nights gave a mean of 26.46 and a standard deviation of 10.81.
Find a 98% confidence interval on the mean time required to access the
internet during the night.
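Example 2.7 can be worked with the t-based interval. The sketch below is one way to do it, with the critical value typed in by hand: 2.441 is the value of $t_{0.01, 34}$ read from a standard t-table, which is an assumption of this sketch rather than something given in the slides. `t_interval` is a hypothetical helper name.

```python
from math import sqrt


def t_interval(x_bar, s, n, t_crit):
    """Confidence interval for a mean when sigma is unknown.

    t_crit is the table value t_{alpha/2, n-1}, supplied by the caller.
    """
    margin = t_crit * s / sqrt(n)
    return x_bar - margin, x_bar + margin


# Assumption: t_{0.01, 34} = 2.441, read from a t-table (98% confidence, df = 34).
T_CRIT_98_DF34 = 2.441

lower, upper = t_interval(x_bar=26.46, s=10.81, n=35, t_crit=T_CRIT_98_DF34)
print(f"98% CI: ({lower:.2f}, {upper:.2f})")
```

If SciPy is available, the table lookup could instead be done with `scipy.stats.t.ppf(0.99, 34)`; the hand-entered value keeps the sketch free of third-party dependencies.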
Interval Estimation: One Population Mean (Variance Unknown)
Example 2.8
Example 2.9
The breaking strengths of 11 bundles of wool fibres have a sample mean of 436.5 and a sample
standard deviation of 11.90. Assume the breaking strengths of the population are normally
distributed.
Construct a 90% confidence interval for the mean breaking strength of wool fibres.
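A worked calculation for this example is sketched below. It assumes the table value $t_{0.05, 10} = 1.812$, which is read from a standard t-table and is not given in the slides.

```latex
\begin{aligned}
n &= 11, \quad \bar{x} = 436.5, \quad s = 11.90, \quad df = n - 1 = 10, \quad t_{0.05,\,10} = 1.812 \\
\mu &= \bar{x} \pm t_{\alpha/2,\,df}\,\frac{s}{\sqrt{n}}
     = 436.5 \pm 1.812 \cdot \frac{11.90}{\sqrt{11}}
     \approx 436.5 \pm 6.50 \\
&\Rightarrow \quad 430.00 < \mu < 443.00
\end{aligned}
```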
Example 2.10: The R & D department of an industry requires that the mean life of the
light bulbs produced exceed 4000 hours, with a standard deviation of less
than 150 hours, before they can be supplied to the market. A sample of 15 bulbs was
tested and the lengths of life are as follows (hours):
4300 4302 4415 4483 4301 4446 4478 4319 3985 4483 4377 4401 4346 4261 4353
a) Show that the mean length of life of the bulbs is 4350 hours.
b) Estimate the mean life of the light bulbs using a 98% confidence interval. Using the
results, is the industry ready to supply the light bulbs? Explain your answer.
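Both parts of Example 2.10 can be checked from the raw data. This is a sketch only: the critical value 2.624 for $t_{0.01, 14}$ is read from a standard t-table and entered by hand, which is an assumption of the sketch, and the variable names are illustrative.

```python
from math import sqrt
from statistics import fmean, stdev

lifetimes = [4300, 4302, 4415, 4483, 4301, 4446, 4478, 4319,
             3985, 4483, 4377, 4401, 4346, 4261, 4353]

n = len(lifetimes)
x_bar = fmean(lifetimes)  # part (a): sample mean
s = stdev(lifetimes)      # sample standard deviation (divisor n - 1)

# Assumption: t_{0.01, 14} = 2.624 from a t-table (98% confidence, df = 14).
T_CRIT = 2.624
margin = T_CRIT * s / sqrt(n)
lower, upper = x_bar - margin, x_bar + margin

print(f"n = {n}, mean = {x_bar:.1f}, s = {s:.2f}")
print(f"98% CI: ({lower:.2f}, {upper:.2f})")
```

Under these assumptions the whole interval lies above 4000 hours and the sample standard deviation is below 150 hours. Comparing s with 150 here is a descriptive check only, not a formal test on the variance.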
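CONFIDENCE INTERVAL FOR TWO POPULATION MEANS
Two Population Means
Dependent samples
$\mu_d = \bar{d} \pm t_{\alpha/2, df} \frac{s_d}{\sqrt{n}}$, $df = n - 1$
Independent samples
$\sigma^2$ known:
$\mu_1 - \mu_2 = (\bar{x}_1 - \bar{x}_2) \pm z_{\alpha/2} \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$
$\sigma^2$ unknown, assume $\sigma_1^2 = \sigma_2^2$:
$\mu_1 - \mu_2 = (\bar{x}_1 - \bar{x}_2) \pm t_{\alpha/2, df} \, s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$
$df = n_1 + n_2 - 2$, $s_p = \sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}}$
$\sigma^2$ unknown, assume $\sigma_1^2 \neq \sigma_2^2$:
$\mu_1 - \mu_2 = (\bar{x}_1 - \bar{x}_2) \pm t_{\alpha/2, df} \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$
$df = \frac{\left(\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}\right)^2}{\frac{\left(s_1^2 / n_1\right)^2}{n_1 - 1} + \frac{\left(s_2^2 / n_2\right)^2}{n_2 - 1}}$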
CONFIDENCE INTERVAL FOR DIFFERENCE BETWEEN TWO POPULATION MEANS
INDEPENDENT SAMPLE
• Two samples are independent if they are drawn from two different
populations and the elements of the first sample have no relationship to
the elements of the second sample.
• Example: To determine the difference in mean pH of rainfall in Shah
Alam and Klang.
DEPENDENT SAMPLE
• Two samples are dependent if they are drawn from two different
populations and the elements of the first sample are related to the
elements of the second sample.
• Example: To determine the effectiveness of Kevin Zahari’s diet program,
each participant’s weight is measured before and after the program.
Confidence Interval for Difference between Two Population Means (variances are known)
Example 2.11: An experiment was conducted in which two types of engines, A and B were
compared. Gas mileage in miles per gallon was measured. 75 experiments were conducted using
engine type A and 50 experiments were done for engine type B. The gasoline used and other
conditions were held constant. The average gas mileage for engine A was 42 miles per gallon and the
average for engine B was 36 miles per gallon. Find a 96% confidence interval on 𝜇𝐴 − 𝜇𝐵 , where 𝜇𝐴
and 𝜇𝐵 are population mean gas mileage for engine A and engine B, respectively. Assume that the
population standard deviations are 8 and 6 for engine A and B respectively.
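Example 2.11 can be worked with the known-variance formula for two independent samples. The sketch below uses `NormalDist.inv_cdf` from Python's standard library for the critical value; the variable names are illustrative and not part of the slides.

```python
from math import sqrt
from statistics import NormalDist

# Summary data from Example 2.11 (miles per gallon).
x_bar_a, sigma_a, n_a = 42, 8, 75  # engine A
x_bar_b, sigma_b, n_b = 36, 6, 50  # engine B
confidence = 0.96

alpha = 1 - confidence
z = NormalDist().inv_cdf(1 - alpha / 2)  # z_{alpha/2} = z_{0.02}

diff = x_bar_a - x_bar_b
standard_error = sqrt(sigma_a**2 / n_a + sigma_b**2 / n_b)
lower = diff - z * standard_error
upper = diff + z * standard_error

print(f"z = {z:.4f}, SE = {standard_error:.4f}")
print(f"96% CI for mu_A - mu_B: ({lower:.3f}, {upper:.3f})")
```

A hand calculation that rounds the critical value to 2.05 gives limits that differ slightly in the second decimal place.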
Confidence Interval for Difference between Two Population Means (variances are unknown and assumed equal)
$s_p = \sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}}$
Example 2.12: An insurance company wants to know if the average speed at which men drive cars is
greater than that of women drivers. The company took a random sample of 26 cars driven by men
on a highway and found the mean speed to be 72 miles per hour with a standard deviation of 2.2
miles per hour. Another sample of 16 cars driven by women on the same highway gave a mean
speed of 68 miles per hour with standard deviation of 2.5 miles per hour. Assume that the speeds at
which all men and all women drive cars on this highway are both normally distributed with the same
population standard deviation.
Construct a 98% confidence interval for the difference between the mean speeds of cars driven by
all men and all women on this highway. (Ans: $s_p = 2.317$, (2.216, 5.784))
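The quoted answer for Example 2.12 can be reproduced with the pooled-variance formula. This sketch assumes the table value $t_{0.01, 40} = 2.423$, entered by hand from a standard t-table; the variable names are illustrative.

```python
from math import sqrt

# Summary data from Example 2.12 (miles per hour).
n1, x_bar1, s1 = 26, 72, 2.2  # men
n2, x_bar2, s2 = 16, 68, 2.5  # women

df = n1 + n2 - 2
# Pooled standard deviation under the equal-variance assumption.
sp = sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / df)

# Assumption: t_{0.01, 40} = 2.423 from a t-table (98% confidence, df = 40).
T_CRIT = 2.423
margin = T_CRIT * sp * sqrt(1 / n1 + 1 / n2)
diff = x_bar1 - x_bar2
lower, upper = diff - margin, diff + margin

print(f"df = {df}, sp = {sp:.3f}")
print(f"98% CI for mu_1 - mu_2: ({lower:.3f}, {upper:.3f})")
```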
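Confidence Interval for Difference between Two Population Means (variances are unknown and assumed unequal)
Formula
$\mu_1 - \mu_2 = (\bar{x}_1 - \bar{x}_2) \pm t_{\alpha/2, df} \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$
Where
$df = \frac{\left(\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}\right)^2}{\frac{\left(s_1^2 / n_1\right)^2}{n_1 - 1} + \frac{\left(s_2^2 / n_2\right)^2}{n_2 - 1}}$
Assumptions:
i. Populations are normally distributed
ii. Population variances $\sigma_1^2$ and $\sigma_2^2$ are unknown but the variances are assumed to be different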
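The degrees-of-freedom formula for the unequal-variance case is the most error-prone part of this interval to evaluate by hand, so a small helper may be useful. In the sketch below, `welch_df` is a hypothetical function name, and the inputs are borrowed from the summary statistics of the men and women speed example purely for illustration; that example itself assumes equal variances.

```python
def welch_df(s1, n1, s2, n2):
    """Approximate degrees of freedom for the unequal-variance t interval."""
    v1 = s1**2 / n1  # s1^2 / n1
    v2 = s2**2 / n2  # s2^2 / n2
    return (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))


# Illustrative inputs only: s1 = 2.2, n1 = 26, s2 = 2.5, n2 = 16.
df = welch_df(s1=2.2, n1=26, s2=2.5, n2=16)
print(f"df = {df:.2f}")
```

The result is generally not a whole number. A common convention, assumed here and not stated in the slides, is to round it down before looking up the t value.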
Example 2.14: A set of facilitation tools to help with data analysis for problem solving is being
developed by a group of statisticians at UiTM. In order to test effectiveness of these tools, a group of
research officers were asked to analyze and produce a built-in report for a set of data on the
computer. Twelve equally capable research officers were randomly selected and six were randomly
assigned a standard procedure to complete the task. The other six were asked to do the task using
the developed facilitation tools. The response measured was the time to completion (in minutes).
DEPENDENT SAMPLE
Formula
The (1-α)100% confidence interval for the mean difference
between two observations from matched samples, $\mu_d$:
$\mu_d = \bar{d} \pm t_{\alpha/2, df} \frac{s_d}{\sqrt{n}}$, $df = n - 1$
Matched or paired samples
Example 2.15: The manufacturer of a gasoline additive claimed that the use of this
additive increases gasoline mileage. A random sample of six cars was selected and
these cars were driven for one week without the gasoline additive and then for one
week with the gasoline additive. The following table gives the miles per gallon for
these cars without and with the gasoline additive
Construct a 95% confidence interval for the difference in mean mileage per gallon
for cars without and with the gasoline additive. (Ans:(-3.2150,-0.2184))
Confidence Interval for Difference Between Two Population Means - Dependent samples
Example 2.16: Ariff is the Human Resources Director at the head office of a reputable bank in Ipoh. Ariff finds that absenteeism among the bank’s
employees is quite high, leading to poor morale and slow performance. To boost employee performance and lower absenteeism,
he sent the bank’s employees to attend “The Innersole of Highly Effective People”, a training program conducted by Top Performers
Sdn. Bhd. To test the effectiveness of the training program, he selected a random sample of 12 employees and gathered data on the number
of days these employees were absent from work in the six months before the training program. He then collected the same data for the six months after the
training program. The data are shown in the table.
Employee Number of days absent from work
Before Program After Program
A 14 8
B 9 7
C 10 6
D 6 3
E 7 8
F 9 5
G 11 6
H 5 3
I 7 4
J 12 10
K 10 5
L 12 6
Determine and interpret the 95% confidence interval for the mean difference in number of days employees were absent before and after training
program. (Ans: (2.1326, 4.7008))
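The paired interval for Example 2.16 can be computed from the table. The sketch below takes the differences as before minus after and assumes the table value $t_{0.025, 11} = 2.201$, entered by hand from a standard t-table; the variable names are illustrative.

```python
from math import sqrt
from statistics import fmean, stdev

# Days absent for employees A to L.
before = [14, 9, 10, 6, 7, 9, 11, 5, 7, 12, 10, 12]
after = [8, 7, 6, 3, 8, 5, 6, 3, 4, 10, 5, 6]

# Paired differences, defined here as before minus after.
d = [b - a for b, a in zip(before, after)]
n = len(d)
d_bar = fmean(d)
s_d = stdev(d)

# Assumption: t_{0.025, 11} = 2.201 from a t-table (95% confidence, df = 11).
T_CRIT = 2.201
margin = T_CRIT * s_d / sqrt(n)
lower, upper = d_bar - margin, d_bar + margin

print(f"d_bar = {d_bar:.4f}, s_d = {s_d:.4f}")
print(f"95% CI for mu_d: ({lower:.4f}, {upper:.4f})")
```

The limits agree with the quoted answer to about three decimal places; small differences in the last digits come from rounding in the hand calculation.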
Example 2.17: Many engineering students have problems with data analysis using statistical software. A professor who
teaches a statistics for engineering course offered a two-day workshop on this topic. The following table gives the test scores of
seven engineering students before and after they attended the workshop.
Before 56 69 48 74 65 71 58
After 62 73 44 85 71 70 69
a) Show that the 95% confidence interval for the difference in mean test scores before and after attending the workshop is
between -9.94 and 0.51.
b) Can we conclude that attending the workshop increases the test score?
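A worked calculation for part (a) is sketched below, taking each difference as before minus after and assuming the table value $t_{0.025, 6} = 2.447$, which is read from a standard t-table and not given in the slides.

```latex
\begin{aligned}
d &: -6,\ -4,\ 4,\ -11,\ -6,\ 1,\ -11, \qquad \sum d = -33, \quad \sum d^2 = 347, \quad n = 7 \\
\bar{d} &= \frac{\sum d}{n} = \frac{-33}{7} \approx -4.714 \\
s_d &= \sqrt{\frac{1}{n-1}\left[\sum d^2 - \frac{(\sum d)^2}{n}\right]}
     = \sqrt{\frac{1}{6}\left[347 - \frac{(-33)^2}{7}\right]} \approx 5.648 \\
\mu_d &= \bar{d} \pm t_{\alpha/2,\,df}\,\frac{s_d}{\sqrt{n}}
       = -4.714 \pm 2.447 \cdot \frac{5.648}{\sqrt{7}}
       \approx -4.714 \pm 5.224 \\
&\Rightarrow \quad -9.94 < \mu_d < 0.51
\end{aligned}
```

The interval contains 0, which is the fact part (b) turns on.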
CONFIDENCE INTERVAL FOR VARIANCE
INTERVAL ESTIMATION FOR POPULATION VARIANCE
[Diagram: Confidence interval for variance, based on the Chi-Square distribution and the F-distribution (characteristics and formula for each)]
Example 2.20:
The manufacturer of a small battery-powered tape recorder decides to include four
alkaline batteries with its product. Two battery suppliers are being considered; each
has its own brand (brand 1 and brand 2). The supervising inspector of incoming quality
believes that the battery lifetimes follow a normal distribution with equal variances. A
sample experiment is conducted: each of ten batteries (five of each brand) is
connected to a test device that places a small drain on the battery power and records
the battery lifetime. The following results (in hours) are obtained:
Brand 1 43 48 38 41 51
Brand 2 30 26 37 31 34
Construct a 95% confidence interval for the ratio of the variances of the lifetimes of the
batteries of the two brands. Interpret the confidence interval obtained. Does the interval
support the supervising inspector’s belief that the variances of the lifetimes of the two
brands are equal? (Ans: (0.167, 15.371))
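The quoted interval can be reproduced from the data. The sketch below assumes the usual F-based interval for a ratio of two variances, $\frac{s_1^2}{s_2^2} \cdot \frac{1}{F_{\alpha/2; \nu_1, \nu_2}} < \frac{\sigma_1^2}{\sigma_2^2} < \frac{s_1^2}{s_2^2} \cdot F_{\alpha/2; \nu_2, \nu_1}$, because the formula on the slide did not survive extraction. The critical value $F_{0.025; 4, 4} = 9.60$ is read from a standard F-table and entered by hand.

```python
from statistics import variance

brand_1 = [43, 48, 38, 41, 51]  # lifetimes in hours
brand_2 = [30, 26, 37, 31, 34]

s1_sq = variance(brand_1)  # sample variance, divisor n - 1
s2_sq = variance(brand_2)
ratio = s1_sq / s2_sq

# Assumption: F_{0.025; 4, 4} = 9.60 from an F-table. Both samples have
# 4 degrees of freedom, so the same value serves both limits.
F_CRIT = 9.60
lower = ratio / F_CRIT
upper = ratio * F_CRIT

print(f"s1^2 = {s1_sq:.1f}, s2^2 = {s2_sq:.1f}, ratio = {ratio:.4f}")
print(f"95% CI for sigma1^2 / sigma2^2: ({lower:.3f}, {upper:.3f})")
```

Under these assumptions the interval contains 1, which is the value the ratio takes when the two variances are equal.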
Thank you