
Statistical Inference
about Population Mean

Statistical Inference About Mean
q Sampling Distribution of Sample Mean
q Confidence Interval
q Test of Significance
Simple Conditions
1. We have an SRS from the population of interest.
2. The variable we measure has an exactly Normal distribution N(µ,σ) in the population.
3. We don’t know the population mean µ, but we do know the population standard deviation σ.

Note: the conditions are all unrealistic.


Point Estimation: Sample Mean
q Generate an SRS of size 25 from N(13,2).
q X̄ = 12.7
q Generate 1000 SRSs of size 25 from N(13,2).
q What is the sampling distribution of X̄?
Point Estimation: Sample Mean
q Generate an SRS of size 25 from N(13,2).
q X̄ = 12.7
q Generate 1000 SRSs of size 25 from N(13,2).
q What is the sampling distribution of X̄? It is N(13, 0.4).
[Figure: histogram of the 1000 sample means with the N(13,0.4) density overlaid; x-axis "Values of sample mean", y-axis "Density".]
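We can check this with a short simulation in R (a minimal sketch; the seed and object names are arbitrary):

  set.seed(2024)
  mean(rnorm(25, mean = 13, sd = 2))                             # one SRS of size 25; sample mean near 13
  xbars <- replicate(1000, mean(rnorm(25, mean = 13, sd = 2)))   # 1000 sample means
  c(mean(xbars), sd(xbars))                                      # close to 13 and 2/sqrt(25) = 0.4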


Point Estimate is Not Enough
q We need to provide an interval of reasonable values for the population mean µ: a Confidence Interval (CI).
q Use what we learned about the sampling distribution of the sample mean.
Confidence Interval
For a Population Mean

Confidence Interval for a Population Mean
q The sample mean of an SRS of size 25 from N(µ,2) is 12.7.
q How close to µ is 12.7 likely to be?
[Figure: the N(µ, σ/√n) sampling distribution of the sample mean, centered at µ; x-axis "values of x̄", y-axis "Density".]
Confidence Interval for a Population Mean
q In 95% of all samples of size 25, the sample mean will be within 0.8 (two standard deviations) of µ.
q In 95% of all samples of size 25, the distance between the sample mean and µ, i.e. |X̄ − µ|, will be within 0.8.
[Figure: the N(µ, 0.4) sampling distribution with the central 95% area shaded between µ − 0.8 and µ + 0.8; x-axis "values of x̄".]
Confidence Interval for a Population Mean
q In 95% of all samples of size 25, the distance between the sample mean and µ, i.e. |X̄ − µ|, will be within 0.8.
q In 95% of all samples of size 25, the interval (X̄ − 0.8, X̄ + 0.8) will capture µ.
[Figure: the N(µ, 0.4) sampling distribution with the central 95% area shaded between µ − 0.8 and µ + 0.8; x-axis "values of x̄".]
General Form of Confidence Intervals
q An interval calculated from the data, which has the form:
estimate ± margin of error
q A confidence level C, where C is the probability that the
interval will capture the true parameter value in repeated
samples. In other words, the confidence level is the
success rate for the method.
§ We usually choose a confidence level of 90% or higher.
§ The most common confidence level is 95%.
Confidence Level
To say that we are 95%
confident is shorthand for
“95% of all possible samples
of a given size from this
population will yield intervals
that capture the unknown
parameter.”

Confidence Interval for a Population Mean
q Choose an SRS of size n from a Normal population having unknown mean µ and known standard deviation σ. A level C confidence interval for µ is:
    X̄ ± z* · σ/√n
§ The value z* is called the critical value, which is found from the N(0,1) distribution.
Previous Example
    X̄ ± z* · σ/√n
q The sample mean of an SRS of size 25 from N(µ,2) is 12.7.
q A 95% confidence interval for µ is
    12.7 ± 1.96 × 0.4 = (11.92, 13.48)
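The same interval can be computed in R (a minimal sketch; only the numbers above come from the example, the object names are ours):

  xbar  <- 12.7                      # observed sample mean
  sigma <- 2                         # known population standard deviation
  n     <- 25                        # sample size
  zstar <- qnorm(0.975)              # 95% critical value, about 1.96
  m     <- zstar * sigma / sqrt(n)   # margin of error, about 0.78
  c(xbar - m, xbar + m)              # roughly (11.92, 13.48)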
Find Critical Values for a Given Confidence Level C
q We can use tables of the Normal distribution or the t distribution.
q Example: For a 98% confidence level, z* = 2.326.
[Figure: standard Normal curve with central area C between −z* and z*, and area (1−C)/2 in each tail.]
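The critical value can also be found with software; for example, in R (z* is the value with central area C under N(0,1)):

  C <- 0.98
  qnorm(1 - (1 - C) / 2)   # upper (1-C)/2 cutoff of N(0,1); about 2.326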
The Margin of Error
    m = z* · σ/√n
q Depends on z*, which is determined by the confidence level C.
q Higher (lower) confidence level C implies a larger (smaller) margin of error m.
What is a Good Confidence Interval?
q High confidence and a small margin of error
q High confidence suggests our method almost always
gives correct answers.
q A small margin of error suggests we have pinned down
the parameter precisely.
Small Margin of Error
    m = z* · σ/√n
q z* gets smaller (the same as a lower confidence level C).
q σ is smaller. It is easier to pin down µ when σ is smaller.
q n gets larger.
§ We must take four times as many observations to cut the margin of error in half.
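A quick numeric check in R of the "four times as many observations" fact (assuming σ = 2 and 95% confidence, as in the earlier example):

  sigma <- 2
  zstar <- qnorm(0.975)
  zstar * sigma / sqrt(25)    # margin of error with n = 25, about 0.78
  zstar * sigma / sqrt(100)   # with four times as many observations, about 0.39 (half)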
Choose the Sample Size
q No control over the population variability σ.
q We can choose the sample size n.
q The confidence interval for a population mean will have a specified margin of error m when the sample size is
    m = z* · σ/√n  ⟺  n = (z*σ/m)²
Example
q Measure density of bacteria in solution.
q Measurement equipment has standard deviation σ = 1 × 10⁶ bacteria/ml of fluid.
q How many measurements should you make to obtain a margin of error of at most 0.5 × 10⁶ bacteria/ml with a confidence level of 95%?
Example
q σ = 1 × 10⁶ bacteria/ml of fluid
q m = 0.5 × 10⁶ bacteria/ml
q For a 95% confidence interval, z* = 1.96
    n = (z*σ/m)²  ⟹  n = (1.96 × 1 / 0.5)² = 3.92² = 15.3664
q Using only 15 measurements will not be enough to ensure that m is no more than 0.5 × 10⁶. Therefore, we need at least 16 measurements.
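The same calculation in R (a minimal sketch; units are 10⁶ bacteria/ml):

  sigma <- 1
  m     <- 0.5
  zstar <- qnorm(0.975)            # about 1.96
  n     <- (zstar * sigma / m)^2   # about 15.37
  ceiling(n)                       # round up: at least 16 measurements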
Some Cautions
q The data should be an SRS from the population.
q Confidence intervals are not resistant to outliers.
q If n is small (<15) and the population is not Normal, the true
confidence level of this method will be different from C.
q The standard deviation σ of the population must be known.
q The margin of error in a confidence interval covers only
uncertainty from random sampling.
Tests of Significance
(Significance Tests)
(Hypothesis Tests)
For a Population Mean
Test of Significance
q Goal: to assess evidence in the data about some claim
concerning a population.
q Formal procedure for comparing observed data with a claim
(also called a hypothesis) whose truth we want to assess.
q The claim is a statement about a parameter, like the population
mean µ.
q We express the results of a significance test in terms of a
probability, called the P-value, that measures how well the data
and the claim agree.
Steps of Significance Tests
1. Hypotheses
2. Test statistic
3. P-value
4. Conclusion
Example
q Population distribution is N(µ,5).
q Test a claim: µ = 5.
q An SRS of size 25 with the sample mean X̄ = 2.5.
q What can we conclude about the claim based on the data?
Example
q Population distribution is N(µ,5).
q Test a claim: µ = 5.
q An SRS of size 25 with the sample mean X̄ = 2.5.
q What is the parameter? What is the hypothesis? What is the information from the data?
§ Parameter: population mean µ
§ Hypothesis: µ = 5
§ Data: X̄ = 2.5
Example
We can use software to simulate 1000 sets of SRSs of size 25 from N(5,5) (assume µ = 5).
    X̄ ~ N(µ, σ/√n) = N(5, 5/√25) = N(5,1)
[Figure: histogram of the 1000 simulated sample means, generated in R as round(rnorm(1000, 5, 1), 1); x-axis "Values of x̄", y-axis "Count".]
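A minimal R sketch of this simulation (the seed and object names are arbitrary):

  set.seed(1)
  xbars <- replicate(1000, mean(rnorm(25, mean = 5, sd = 5)))   # 1000 sample means under mu = 5
  hist(xbars, xlab = "Values of x-bar")
  mean(xbars <= 2.5)   # proportion of simulated means at or below 2.5; close to pnorm(-2.5) = 0.0062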
Example
We can use software to simulate 1000 sets of SRSs of size 25 from N(5,5) (assume µ = 5).
q X̄ ~ N(5,1)
q In 1000 sets of SRSs, there were only 10 times when the sample mean was as small as the observed X̄ = 2.5.
[Figure: histogram of the 1000 simulated sample means, with the observed value x̄ = 2.5 marked in the left tail.]
Example
We can say that, assuming the actual parameter value is µ = 5, there is only a small chance that the sample mean would be as small as 2.5; i.e., the observed statistic is so unlikely that it gives convincing evidence that the claim is not true.
q In 1000 sets of SRSs, there were only 10 times when the sample mean was as small as the observed X̄ = 2.5.
[Figure: the same histogram of 1000 simulated sample means, with the observed value x̄ = 2.5 marked in the left tail.]
Steps of Significance Tests
1. Hypotheses
2. Test statistic
3. P-value
4. Conclusion
Hypotheses
q Null hypothesis (H0): The claim needs to be tested.
§ Often, the null hypothesis is a statement of “no effect” or “no
difference”.
q Alternative hypothesis (Ha): The claim about the population
for which we’re trying to find evidence.
§ The alternative is one-sided if it states either that a parameter is (1)
larger than the null hypothesis value, or (2) smaller than the null
hypothesis value.
§ It is two-sided if it states that the parameter is different from the null
value.
Example
q H0: µ=5 versus Ha: µ≠5 (two-sided alternative)
q H0: µ=5 versus Ha: µ<5 (one-sided alternative)
q H0: µ=5 versus Ha: µ>5 (one-sided alternative)
Logic of Significance Tests
q Courtroom analogy: the "null hypothesis" plays the role of "innocence", and the data act as the "prosecutor" that must present convincing evidence against it.
Test Statistic is Evidence
q Calculated from the sample data.
q Measures how far the data diverge from what we would expect if the null hypothesis H0 were true.

    z = (estimate − hypothesized value) / (standard deviation of the estimate)

§ When H0 is true, we expect the estimate to be near the parameter value specified in H0.
§ Large values of the statistic indicate stronger evidence against H0.
Previous Example
q H0: µ = 5 versus Ha: µ < 5
q X̄ = 2.5
q Test statistic is
    Z = (X̄ − 5) / SD(X̄) = (X̄ − 5) / 1
q The observed value of the test statistic from the data is
    z = (2.5 − 5) / 1 = −2.5
P-value
q The probability, computed assuming H0 is true, that the statistic would take a value as or more extreme than the one actually observed.
q In 1000 sets of SRSs, there were only 10 times when the sample mean was as small as the observed X̄ = 2.5, so the simulated P-value is 0.01.
[Figure: histogram of the 1000 simulated sample means, with x̄ = 2.5 marked in the left tail.]
How to Calculate the P-value
without Doing Repeated Sampling
The probability, computed assuming H0 is true, that the statistic would take a value as or more extreme than the one actually observed.

Previous Example:
q H0: µ = 5 versus Ha: µ < 5
q x̄ = 2.5
q Observed test statistic value from the data is z = −2.5
q P-value = Pr(X̄ ≤ 2.5) if H0 is true
          = Pr(Z ≤ −2.5) if H0 is true
How to Calculate the P-value
without Actual Repeated Sampling
P-value = Pr(Z ≤ −2.5) if H0 is true
q What is the distribution of Z if H0 is true?
    X̄ ~ N(5,1) if H0 is true
    Z = (X̄ − 5) / 1 ~ N(0,1)
q P-value is the area to the left of −2.5.
q P-value = 0.0062.
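The same tail area from software (a quick check in R):

  z <- (2.5 - 5) / 1   # observed test statistic
  pnorm(z)             # area to the left of -2.5 under N(0,1), about 0.0062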
P-value
q Measures the strength of the evidence from the data
against a null hypothesis
q Small P-values
§ Observed result is unlikely to occur when H0 is true
§ Strong evidence against H0
q Large P-values
§ Observed result is likely to occur by chance when H0 is true
§ Fail to give convincing evidence against H0
P-value
H0: µ = 5 vs Ha: µ < 5
[Figure: the N(5,1) sampling distribution of the sample mean, with the observed values x̄ = 2.5 and x̄ = 4 marked.]
q X̄ = 2.5 has a smaller P-value and provides stronger evidence against H0.
Draw Conclusion
q One of two decisions: reject H0 or fail to reject H0
q Significance level (level of significance): α
q The most common level is α = 0.05.
q If P-value < α
§ Reject H0
§ Conclude Ha (in context)
§ The result is statistically significant at level α
q If P-value ≥ α
§ Fail to reject H0 (never “accept H0“)
§ Cannot conclude Ha (in context)
Four Steps of Significance Tests

1. State the null and alternative hypotheses.

2. Calculate the value of the test statistic.

3. Find the P-value for the observed data.

4. State a conclusion.
Example
q Question: Does the job satisfaction of assembly-line workers differ
when their work is machine-paced rather than self-paced?
q Study: chose 18 subjects at random from a company with over 200
workers who assembled electronic devices
q Half of the workers were assigned at random to each of two groups.
Both groups did similar assembly work, but one group was allowed
to pace themselves while the other group used an assembly line that
moved at a fixed pace.
q After two weeks, all the workers took a test of job satisfaction. Then
they switched work set-ups and took the test again after two more
weeks.
Example
q Question: Does the job satisfaction of assembly-line workers differ when their
work is machine-paced rather than self-paced?
q Study: chose 18 subjects at random from a company with over 200 workers who
assembled electronic devices
q Half of the workers were assigned at random to each of two groups. Both groups
did similar assembly work, but one group was allowed to pace themselves while
the other group used an assembly line that moved at a fixed pace.
q After two weeks, all the workers took a test of job satisfaction. Then they
switched work set-ups and took the test again after two more weeks.
q Is this design a matched-pairs design?
A. Yes.
B. No.
Example
q Question: Does the job satisfaction of assembly-line workers differ when
their work is machine-paced rather than self-paced?
q Study: chose 18 subjects at random from a company with over 200
workers who assembled electronic devices
q Half of the workers were assigned at random to each of two groups. Both
groups did similar assembly work, but one group was allowed to pace
themselves while the other group used an assembly line that moved at a
fixed pace.
q After two weeks, all the workers took a test of job satisfaction. Then they
switched work set-ups and took the test again after two more weeks.
q This is a matched-pairs design.
Example
q Question: Does the job satisfaction of assembly-line workers differ when their
work is machine-paced rather than self-paced?
q Study: chose 18 subjects at random from a company with over 200 workers who
assembled electronic devices
q Half of the workers were assigned at random to each of two groups. Both groups
did similar assembly work, but one group was allowed to pace themselves while
the other group used an assembly line that moved at a fixed pace.
q After two weeks, all the workers took a test of job satisfaction. Then they
switched work set-ups and took the test again after two more weeks.
q The response variable is the difference in satisfaction scores, self-paced minus
machine-paced. The sample mean score is 17.
q Suppose job satisfaction scores follow a Normal distribution with standard
deviation σ = 60.
Example
q Hypotheses:
H0: µ = 0 versus Ha: µ ≠ 0
§ This is a two-sided alternative, i.e., we are interested in
determining if a difference, positive or negative, exists.

q Test statistic is
    z = (x̄ − µ₀) / (σ/√n) = (17 − 0) / (60/√18) ≈ 1.20
P-value
q Two-sided P-value
P-value = P(Z < –1.2 or Z > 1.2)
= 2 P(Z < –1.2)
= 2 P(Z > 1.2)
= (2)(0.1151) = 0.2302
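A short R sketch of this test (numbers taken from the example above; object names are ours):

  xbar <- 17; mu0 <- 0; sigma <- 60; n <- 18
  z <- (xbar - mu0) / (sigma / sqrt(n))   # about 1.20
  2 * pnorm(-abs(z))                      # two-sided P-value, about 0.23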
P-value
q Two-sided P-value = 0.2302
q If H0 is true, there is a chance of 23% that we would see
results at least as extreme as those in the sample.
q At significance level α=0.05, the P-value is larger than α.
q This means there is not strong evidence against H0 or in
favor of Ha.
P-values and Confidence Intervals
For a two-sided alternative,
q If we reject the null hypothesis at significance level α (e.g. α = 0.05), the level C = 1 − α (C = 95%) confidence interval will NOT cover the hypothesized value of the parameter.
q If the level C (e.g. C = 95%) confidence interval does not cover the hypothesized value of the parameter, we reject the null hypothesis at significance level α = 1 − C (α = 0.05).
More Comments about P-Values
q Reporting the P-value is a better way to summarize a test than
simply stating whether or not H0 is rejected. This is because P-
values quantify how strong the evidence is against H0. The smaller
the P-value, the stronger the evidence.
§ There is no practical difference between 4.9% and 5.1%.
§ It is the order of magnitude of the P-value that matters: “somewhat significant,”
“significant,” or “very significant.”
q P-values do not provide specific information about the true
population mean µ.
q If you desire a likely range of values for the parameter, use a
confidence interval.
Approximating a P-value
q In the absence of suitable technology, you can get
approximate P-values by comparing your test statistic with
critical values from a table.
q For example, H0: µ = 0 versus Ha: µ > 0
q Test statistic z = –0.276.
q P-value is the area under N(0,1) to the right of –0.276.
q Note that because N(0,1) is symmetric about 0, the area
under N(0,1) to the right of –0.276 is equal to the area to the
left of 0.276.
Approximating a P-value
q Find the area to the left of 0.276 under N(0,1)
q 0.276 is between 0.27 and 0.28
q The area to the left of 0.27 is 0.6064 and the area to the left
of 0.28 is 0.6103.
q The area to the left of 0.276 is between 0.6064 and 0.6103.
q 0.6064 < P-value < 0.6103
q There is no reason whatsoever to reject H0 in this case as the
P-value is so large.
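With software, the table bracketing is unnecessary; a one-line check in R:

  pnorm(0.276)   # area to the left of 0.276 under N(0,1), about 0.609, consistent with the table bounds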
How to Choose α
q What are the consequences of rejecting the null
hypothesis when it is actually true (an innocent person was
convicted of a crime)?
§ A smaller α indicates less chance to commit such errors.
q Are you conducting a preliminary study?
§ With a larger α, you will be less likely to miss an
interesting result.
Statistical Significance Might Not Be
Practically Important
q Statistical significance doesn’t tell you about the
magnitude of the effect, only that there is one.
q An effect could be too small to be relevant.
§ A drug is found to lower patient temperature by an average of
0.4°C (P-value < 0.01). But clinical benefits of temperature
reduction only appear for a decrease of 1°C or more.

q With a large enough sample size, significance can be


reached even for the tiniest effect.
Don’t Ignore Lack of Significance
q Failing to reject the null hypothesis does not mean accepting
the null hypothesis.
q For instance, the sample size could be too small.
Potential Importance of Small Effects
q In some cases, effects that may appear to be trivial can be
very important.
§ Example: Improving the format of a computerized test reduces the
average response time by about 2 seconds. Although this effect is
small, it is important because this is done millions of times a year.
The cumulative time savings of using the better format is gigantic.

q Always think about the context.


Type I error and Type II error
q Type I error: reject H0 when H0 is true
q Type II error: fail to reject H0 when H0 is false

                                      Truth about the population
                                      H0 true              H0 false (Ha true)
Conclusion based    Reject H0         Type I error         Correct conclusion
on sample           Fail to reject H0 Correct conclusion   Type II error
Type I Error Rate
q The probability of a Type I error is the probability of
rejecting H0 when it is really true.
q This is exactly the significance level of the test, α.
q Determining the significance level α is equivalent to determining
the probability of a Type I error.
Power of Test
q Power of a significance test is the probability of rejecting
H0 when Ha is true.
q The probability of a Type II error is the probability of
failing to reject H0 when Ha is true.
q Test power = 1 – Type II error rate
Power of Test
q Test power is not a fixed number.
q Previous example: Test H0: µ = 5 versus Ha: µ < 5 with an SRS of size 25.
q Test power is a function of the values of µ in Ha.
[Figure: power curve for H0: µ = 5 vs Ha: µ < 5; x-axis "µ in Ha" (0 to 5), y-axis "Prob. of rejecting H0" (0.05 to 1.00), with power increasing as µ falls farther below 5.]
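A hedged sketch of how such a power curve could be computed in R (assuming σ = 5, n = 25, and α = 0.05 as in the example; the function name is ours):

  power_z <- function(mu, mu0 = 5, sigma = 5, n = 25, alpha = 0.05) {
    crit <- mu0 - qnorm(1 - alpha) * sigma / sqrt(n)  # reject H0 when x-bar falls below this cutoff
    pnorm(crit, mean = mu, sd = sigma / sqrt(n))      # P(reject H0) when the true mean is mu
  }
  power_z(c(5, 4, 3))   # about 0.05, 0.26, 0.64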
When Do We Need a Larger Sample Size?
q If you insist on a smaller significance level (such as 1%
rather than 5%)
q If you insist on higher power (such as 99% rather than 90%)
q At any significance level and desired power, detecting a
small difference between H0 and Ha requires a larger sample
than detecting a large difference.
Common Practices
q State H0 and Ha as in a test of significance.
q Think of the problem as a decision problem, so the
probabilities of Type I and Type II errors are relevant.
q Consider only tests in which the probability of a Type I
error is no greater than a specified α.
q Among these tests, select a test that makes the
probability of a Type II error as small as possible.
