
1

Hypothesis Testing for a Single Sample

Assoc. Prof. Prapaisri Sudasna-na-Ayudthya, KU


2

Statistics and Sampling Distributions

• Statistical methods are used to make decisions about a process:
  – Is the process out of control?
  – Is the process average you were given the true value?
  – What is the true process variability?

Assoc.Prof. Prapaisri Sudasna-na-Ayudthya, KU


3

Statistics and Sampling Distributions

• Statistics are quantities calculated from a random sample taken from a population of interest.
• The probability distribution of a statistic is called a sampling distribution.

Assoc.Prof. Prapaisri Sudasna-na-Ayudthya, KU


4

Sampling from a Normal Distribution

• Let X represent measurements taken from a normal distribution, X ~ N(µ, σ²).
• Select a sample of size n, at random, and calculate the sample mean, x̄. Then
    x̄ ~ N(µ, σ²/n)
Assoc.Prof. Prapaisri Sudasna-na-Ayudthya, KU
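For readers who want to verify this result numerically, here is a minimal Python sketch (not part of the original slides) that simulates the sampling distribution of x̄; the values µ = 10, σ = 2, and n = 25 are arbitrary assumptions chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma, n, reps = 10.0, 2.0, 25, 100_000   # assumed values, for illustration only

# Draw many samples of size n and compute each sample mean.
sample_means = rng.normal(mu, sigma, size=(reps, n)).mean(axis=1)

# The sample means should be centered at mu with standard deviation sigma/sqrt(n) = 0.4.
print("mean of x-bar:", round(sample_means.mean(), 3))
print("std of x-bar :", round(sample_means.std(ddof=1), 3))
```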
5

Sampling from a Normal Distribution

• Chi-square (χ²) Distribution
  – Furthermore, the sampling distribution of
        y = Σᵢ₌₁ⁿ (xᵢ − x̄)² / σ² = (n − 1)S² / σ²
    is chi-square with n − 1 d.f. when sampling from a normal population.

Assoc.Prof. Prapaisri Sudasna-na-Ayudthya, KU


6

Sampling from a Normal Distribution

• t-distribution
  – If X is a standard normal random variable and Y is an independent chi-square random variable with k degrees of freedom, then
        t = X / √(Y/k)
    follows a t-distribution with k degrees of freedom.
Assoc.Prof. Prapaisri Sudasna-na-Ayudthya, KU
7

Sampling from a Normal Distribution

• F-distribution
  – If W and Y are two independent chi-square random variables with u and v degrees of freedom, respectively, then
        F = (W/u) / (Y/v)
    is distributed as F with u numerator d.f. and v denominator d.f.

Assoc.Prof. Prapaisri Sudasna-na-Ayudthya, KU


8

Point Estimation of Process Parameters

• Parameters are values representing the population, e.g. µ, σ².
• Parameters in reality are often unknown and must be estimated.
• Statistics are estimates of parameters, e.g. x̄, S².
Assoc.Prof. Prapaisri Sudasna-na-Ayudthya, KU
9

Point Estimation of Process Parameters
Two properties of good point
estimators
1. The point estimator should be
unbiased.
2. The point estimator should
have minimum variance.
Assoc.Prof. Prapaisri Sudasna-na-Ayudthya, KU
10

Statistical Inference for a Single Sample

Two categories of statistical inference:
1. Parameter Estimation
2. Hypothesis Testing

Assoc.Prof. Prapaisri Sudasna-na-Ayudthya, KU


11

Hypothesis
Testing

Assoc.Prof. Prapaisri Sudasna-na-Ayudthya, KU


12

Statistical Inference for a Single Sample

• A statistical hypothesis is a statement about the values of the parameters of a probability distribution.
    H0: µ = µ0
    H1: µ ≠ µ0

Assoc.Prof. Prapaisri Sudasna-na-Ayudthya, KU


13

• A hypothesis consists of two parts:
  1. Null hypothesis (H0): the statement we seek evidence against and hope to reject (disprove).
  2. Alternative hypothesis (H1): the statement we want to show is true.

Assoc.Prof. Prapaisri Sudasna-na-Ayudthya, KU


14

Statistical Inference for a Single Sample

• Steps in Hypothesis Testing
  – Identify the parameter of interest.
  – State the null hypothesis, H0, and the alternative hypothesis, H1.
  – Choose a significance level, α.
Assoc.Prof. Prapaisri Sudasna-na-Ayudthya, KU
15

  – State the appropriate test statistic.
  – (State the rejection region.)
  – Compare the value of the test statistic to the rejection region. Can the null hypothesis be rejected?
    Equivalently, reject H0 when the p-value (Sig.) < α.
Assoc.Prof. Prapaisri Sudasna-na-Ayudthya, KU
16

Statistical Inference for a Single Sample

• Example: An automobile manufacturer claims that a particular automobile averages 35 mpg (highway).
  – Suppose we are interested in testing this claim. We will sample 25 of these automobiles and, under identical conditions, calculate the average mpg for the sample.
Assoc.Prof. Prapaisri Sudasna-na-Ayudthya, KU
17

– Before actually collecting the data, we decide that if we get a sample average less than 33 mpg or more than 37 mpg, we will reject the maker's claim. (These are the critical values.)

Assoc.Prof. Prapaisri Sudasna-na-Ayudthya, KU


18

Statistical Inference for a Single Sample

• Example (continued)
  – H0: µ = 35    H1: µ ≠ 35
• From the sample of 25 cars, the average mpg was found to be 31.5. What is your conclusion?

[Figure: rejection regions on the x̄ axis: reject for x̄ below 33 or above 37; do not reject between 33 and 37 (critical values 33 and 37, hypothesized mean 35).]

Assoc.Prof. Prapaisri Sudasna-na-Ayudthya, KU


19

[Figure: rejection regions. One-tailed tests: for H1: <, the rejection area α is in the left tail; for H1: >, α is in the right tail. Two-tailed test (H1: ≠): area α/2 in each tail, with (1 − α)100% in the non-rejection region.]
Assoc.Prof. Prapaisri Sudasna-na-Ayudthya, KU


20

Statistical Inference for a Single Sample
Choice of Critical Values
• How are the critical values
chosen?
• Wouldn’t it be easier to decide
“how much room for error you
will allow” instead of finding the
exact critical values for every
problem you encounter?

Assoc.Prof. Prapaisri Sudasna-na-Ayudthya, KU


21

Statistical Inference for a Single Sample

Significance Level
• The level of significance, α, determines the size of the rejection region.
• The level of significance is a probability. It is also known as the probability of a "Type I error" (we want this to be small).

Assoc.Prof. Prapaisri Sudasna-na-Ayudthya, KU


22

• Type I error: rejecting the null hypothesis when it is true.
  How small? Usually we want α ≤ 0.10.

Assoc.Prof. Prapaisri Sudasna-na-Ayudthya, KU


23

                     Test result
Fact                 Fail to reject H0    Reject H0
H0 is "true"         1 − α                α
H1 is "true"         β                    1 − β

Assoc.Prof. Prapaisri Sudasna-na-Ayudthya, KU


24

Statistical Inference for a Single Sample

Types of Error
• Type I error: rejecting the null hypothesis when it is true.
    Pr(Type I error) = α
• Type II error: not rejecting the null hypothesis when it is false.
    Pr(Type II error) = β
Assoc.Prof. Prapaisri Sudasna-na-Ayudthya, KU
25

Statistical Inference for a Single Sample

Power of a Test
• The Power of a test of hypothesis
is given by 1 - β
• That is, 1 - β is the probability
of correctly rejecting the null
hypothesis

Assoc.Prof. Prapaisri Sudasna-na-Ayudthya, KU


26

Inference on the Mean of a Population, Variance Known

Hypothesis Testing
• Hypotheses: H0: µ = µ0    H1: µ ≠ µ0
• Test Statistic:
    Z0 = (x̄ − µ0) / (σ/√n)
• Significance Level, α
• Rejection Region: Z0 < −Zα/2 or Z0 > Zα/2
• If Z0 falls into the rejection region above, reject H0.

Assoc.Prof. Prapaisri Sudasna-na-Ayudthya, KU


27

Inference on the Mean of a Population, Variance Known

Example
• Hypotheses: H0: µ = 175    H1: µ > 175
• Test Statistic: Z0 = (182 − 175) / (10/√25) = 3.50
• Significance Level, α = 0.05
• Rejection Region: Z0 > Zα = 1.645
• Since 3.50 > 1.645, reject H0 and conclude that the lot mean pressure strength exceeds 175 psi.

Assoc.Prof. Prapaisri Sudasna-na-Ayudthya, KU
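As a quick numerical check of this example, the following Python sketch reproduces the calculation with x̄ = 182, µ0 = 175, σ = 10, n = 25, and α = 0.05 (the code itself is an illustration and is not part of the original slides).

```python
from math import sqrt

from scipy.stats import norm

xbar, mu0, sigma, n, alpha = 182.0, 175.0, 10.0, 25, 0.05

z0 = (xbar - mu0) / (sigma / sqrt(n))   # test statistic, = 3.50
z_crit = norm.ppf(1 - alpha)            # right-tail critical value, about 1.645

print(f"Z0 = {z0:.2f}, critical value = {z_crit:.3f}")
if z0 > z_crit:
    print("Reject H0: the lot mean pressure strength exceeds 175 psi.")
else:
    print("Fail to reject H0.")
```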


28

Inference on the Mean of a Population, Variance Known

Confidence Intervals
• A general 100(1 − α)% two-sided confidence interval on the true population mean, µ, is given by bounds L and U such that
    P[L ≤ µ ≤ U] = 1 − α

Assoc.Prof. Prapaisri Sudasna-na-Ayudthya, KU


29

• 100(1 − α)% one-sided confidence intervals are:
    Upper: P[µ ≤ U] = 1 − α
    Lower: P[L ≤ µ] = 1 − α

Assoc.Prof. Prapaisri Sudasna-na-Ayudthya, KU


30

Inference on the Mean of a Population, Variance Known

Confidence Interval on the Mean with Variance Known
• Two-Sided:
    P[ x̄ − Zα/2 σ/√n ≤ µ ≤ x̄ + Zα/2 σ/√n ] = 1 − α

Assoc.Prof. Prapaisri Sudasna-na-Ayudthya, KU
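A minimal sketch of this two-sided interval, assuming the summary values from the earlier example (x̄ = 182, σ = 10, n = 25) and α = 0.05; the code is illustrative only.

```python
from math import sqrt

from scipy.stats import norm

xbar, sigma, n, alpha = 182.0, 10.0, 25, 0.05   # assumed values, for illustration

z = norm.ppf(1 - alpha / 2)          # Z_{alpha/2}, about 1.96
half_width = z * sigma / sqrt(n)
lower, upper = xbar - half_width, xbar + half_width

print(f"{100 * (1 - alpha):.0f}% CI for mu: ({lower:.2f}, {upper:.2f})")
```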


31

The Use of P-Values in Hypothesis Testing

• If it is not enough to know whether the test statistic Z0 falls into the rejection region, a measure of just how significant the test statistic is can be computed: the P-value.
• P-values are probabilities associated with the test statistic Z0.
Assoc.Prof. Prapaisri Sudasna-na-Ayudthya, KU
32

The Use of P-Values in Hypothesis Testing

Definition
• The P-value is the smallest
level of significance that
would lead to rejection of the
null hypothesis H0.

Assoc.Prof. Prapaisri Sudasna-na-Ayudthya, KU


33

P-value
• One-sided test:
    for H1: < (left-tailed), P-value = P(test statistic ≤ calculated value)
    for H1: > (right-tailed), P-value = P(test statistic ≥ calculated value)

Assoc.Prof. Prapaisri Sudasna-na-Ayudthya, KU


34

• Two-sided test (H1: ≠):
    P-value = 2 · P(test statistic ≤ calculated value) if the calculated value lies in the left tail
    P-value = 2 · P(test statistic ≥ calculated value) if the calculated value lies in the right tail

Assoc.Prof. Prapaisri Sudasna-na-Ayudthya, KU


35

The Use of P-Values in Hypothesis Testing

Example
• Reconsider the previous example. The test statistic was calculated to be Z0 = 3.50 for a right-tailed hypothesis test. The P-value for this problem is then
    P = 1 − Φ(3.50) = 0.00023
• Thus, H0: µ = 175 would be rejected at any level of significance α ≥ P = 0.00023, i.e., reject when P ≤ α.
Assoc.Prof. Prapaisri Sudasna-na-Ayudthya, KU
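The P-value above can be checked directly; this short computation is an illustration, not part of the slides.

```python
from scipy.stats import norm

z0 = 3.50
p_value = 1 - norm.cdf(z0)          # right-tailed test: P = 1 - Phi(3.50)
print(f"P-value = {p_value:.5f}")   # about 0.00023
```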
36

Inference on the Mean of a Population, Variance Unknown

Hypothesis Testing
• Hypotheses: H0: µ = µ0    H1: µ ≠ µ0
• Test Statistic:
    t0 = (x̄ − µ0) / (S/√n)
• Significance Level, α
• Reject H0 if |t0| > tα/2, n−1
Assoc.Prof. Prapaisri Sudasna-na-Ayudthya, KU
37

Inference on the Mean of a Population, Variance Unknown

Confidence Interval on the Mean with Variance Unknown
• Two-Sided:
    P[ x̄ − tα/2, n−1 · S/√n ≤ µ ≤ x̄ + tα/2, n−1 · S/√n ] = 1 − α

Assoc.Prof. Prapaisri Sudasna-na-Ayudthya, KU


38

Inference on the Mean of a Population, Variance Unknown

Computer Output
Minitab Output

Welcome to Minitab, press F1 for help.

One-Sample T: Strength

Test of mu = 50 vs mu not = 50

Variable     N     Mean   StDev   SE Mean
Strength    16   49.864   1.661     0.415

Variable   95.0% CI            T      P
Strength   (48.979, 50.750)  -0.33  0.749

Assoc.Prof. Prapaisri Sudasna-na-Ayudthya, KU
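The Minitab results can be reproduced from the summary statistics reported in the output (n = 16, x̄ = 49.864, s = 1.661, hypothesized mean 50). The sketch below is an illustration; scipy.stats.ttest_1samp would need the raw data, so the statistic, P-value, and confidence interval are computed by hand from the formulas on the previous slides.

```python
from math import sqrt

from scipy.stats import t

n, xbar, s, mu0, alpha = 16, 49.864, 1.661, 50.0, 0.05   # values from the Minitab output

se = s / sqrt(n)                          # standard error, about 0.415
t0 = (xbar - mu0) / se                    # test statistic, about -0.33
p_value = 2 * t.cdf(-abs(t0), df=n - 1)   # two-sided P-value, about 0.749

t_crit = t.ppf(1 - alpha / 2, df=n - 1)                  # t_{0.025, 15}
lower, upper = xbar - t_crit * se, xbar + t_crit * se    # about (48.979, 50.749)

print(f"SE = {se:.3f}, t0 = {t0:.2f}, P = {p_value:.3f}")
print(f"95% CI: ({lower:.3f}, {upper:.3f})")
```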


39

Inference on the Variance of a Normal Distribution

Hypothesis Testing
• Hypotheses: H0: σ² = σ0²    H1: σ² ≠ σ0²
• Test Statistic:
    χ0² = (n − 1)S² / σ0²
• Significance Level, α
• Rejection Region: χ0² > χ²α/2, n−1 or χ0² < χ²1−α/2, n−1

Assoc.Prof. Prapaisri Sudasna-na-Ayudthya, KU


40

Inference on the Variance of a Normal Distribution

Confidence Interval on the Variance
• Two-Sided:
    P[ (n − 1)s²/χ²α/2, n−1 ≤ σ² ≤ (n − 1)s²/χ²1−α/2, n−1 ] = 1 − α

Assoc.Prof. Prapaisri Sudasna-na-Ayudthya, KU
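A minimal sketch of the χ² test on a variance and the corresponding two-sided confidence interval; the data values and the hypothesized variance σ0² = 0.10 below are hypothetical and used only to illustrate the formulas.

```python
import numpy as np
from scipy.stats import chi2

# Hypothetical measurements and hypothesized variance (illustration only).
x = np.array([16.8, 17.2, 17.4, 16.9, 16.5, 17.1, 17.3, 16.6])
sigma0_sq, alpha = 0.10, 0.05

n = len(x)
s_sq = x.var(ddof=1)                      # sample variance S^2

chi2_0 = (n - 1) * s_sq / sigma0_sq       # test statistic
lo_crit = chi2.ppf(alpha / 2, df=n - 1)
hi_crit = chi2.ppf(1 - alpha / 2, df=n - 1)
reject = chi2_0 > hi_crit or chi2_0 < lo_crit

# Two-sided confidence interval on sigma^2.
ci = ((n - 1) * s_sq / hi_crit, (n - 1) * s_sq / lo_crit)

print(f"chi2_0 = {chi2_0:.2f}, reject H0: {reject}")
print(f"95% CI for sigma^2: ({ci[0]:.4f}, {ci[1]:.4f})")
```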


41

Inference on a Population Proportion

Hypothesis Testing
• Hypotheses: H0: p = p0    H1: p ≠ p0
• Test Statistic:
    Z0 = (X − np0) / √(np0(1 − p0))
• Significance Level, α
• Rejection Region: |Z0| ≥ Zα/2
Assoc.Prof. Prapaisri Sudasna-na-Ayudthya, KU
42

Inference on a Population Proportion

Confidence Interval on the Population Proportion
• Two-Sided:
    P[ p̂ − Zα/2 √(p̂(1 − p̂)/n) ≤ p ≤ p̂ + Zα/2 √(p̂(1 − p̂)/n) ] = 1 − α

Assoc.Prof. Prapaisri Sudasna-na-Ayudthya, KU
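A minimal sketch of the normal-approximation test and interval for a proportion; the counts (X = 12 nonconforming units in n = 200, with p0 = 0.05) are hypothetical and chosen only to show the formulas in code.

```python
from math import sqrt

from scipy.stats import norm

x_count, n, p0, alpha = 12, 200, 0.05, 0.05   # hypothetical counts, for illustration

z0 = (x_count - n * p0) / sqrt(n * p0 * (1 - p0))   # test statistic
p_value = 2 * (1 - norm.cdf(abs(z0)))               # two-sided P-value

p_hat = x_count / n
z = norm.ppf(1 - alpha / 2)
half = z * sqrt(p_hat * (1 - p_hat) / n)

print(f"Z0 = {z0:.2f}, P = {p_value:.3f}")
print(f"95% CI for p: ({p_hat - half:.3f}, {p_hat + half:.3f})")
```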


43

CH X

Tests of Hypotheses for Two Samples

Assoc. Prof. Prapaisri Sudasna-na-Ayudthya, KU


44

Statistical Inference for Two Samples

• The previous sections presented hypothesis testing and confidence intervals for a single population parameter.
• These results are now extended to the case of two independent populations.
• We consider statistical inference on the difference in population means, µ1 − µ2.

Assoc.Prof. Prapaisri Sudasna-na-Ayudthya, KU


45

Inference For a Difference in Means, Variances Known

Assumptions
1. X11, X12, …, X1n1 is a random sample from population 1.
2. X21, X22, …, X2n2 is a random sample from population 2.
3. The two populations represented by X1 and X2 are independent.
4. Both populations are normal, or if they are not normal, the conditions of the central limit theorem apply.
Assoc.Prof. Prapaisri Sudasna-na-Ayudthya, KU
46

Inference For a Difference in Means, Variances Known

Null Hypothesis: H0: µ1 − µ2 = ∆0
Test Statistic:
    Z0 = (X̄1 − X̄2 − ∆0) / √(σ1²/n1 + σ2²/n2)

Assoc.Prof. Prapaisri Sudasna-na-Ayudthya, KU


47

Inference For a Difference in Means, Variances Known

Hypothesis Tests for a Difference in Means, Variances Known

Alternative Hypothesis       Rejection Criterion
H1: µ1 − µ2 ≠ ∆0             Z0 > Zα/2 or Z0 < −Zα/2
H1: µ1 − µ2 > ∆0             Z0 > Zα
H1: µ1 − µ2 < ∆0             Z0 < −Zα

Assoc.Prof. Prapaisri Sudasna-na-Ayudthya, KU


48

Inference For a Difference in Means, Variances Known

Confidence Interval on a Difference in Means, Variances Known
• The 100(1 − α)% confidence interval on the difference in means is given by
    x̄1 − x̄2 − Zα/2 √(σ1²/n1 + σ2²/n2) ≤ µ1 − µ2 ≤ x̄1 − x̄2 + Zα/2 √(σ1²/n1 + σ2²/n2)

Assoc.Prof. Prapaisri Sudasna-na-Ayudthya, KU
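A minimal sketch of the two-sample Z statistic and confidence interval with known variances; every number below is hypothetical and used only to put the formulas into code.

```python
from math import sqrt

from scipy.stats import norm

# Hypothetical summary values (illustration only).
xbar1, xbar2 = 121.3, 118.7
sigma1_sq, sigma2_sq = 8.0, 10.0      # known population variances
n1, n2 = 10, 12
delta0, alpha = 0.0, 0.05

se = sqrt(sigma1_sq / n1 + sigma2_sq / n2)
z0 = (xbar1 - xbar2 - delta0) / se
p_value = 2 * (1 - norm.cdf(abs(z0)))        # two-sided P-value

z = norm.ppf(1 - alpha / 2)
ci = (xbar1 - xbar2 - z * se, xbar1 - xbar2 + z * se)

print(f"Z0 = {z0:.2f}, P = {p_value:.3f}")
print(f"95% CI for mu1 - mu2: ({ci[0]:.2f}, {ci[1]:.2f})")
```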


49

Inference For a Difference in Means, Variances Unknown

Hypothesis Tests for a Difference in Means, Case I: σ1² = σ2² = σ²
• The point estimator for µ1 − µ2 is X̄1 − X̄2, where
    V(X̄1 − X̄2) = σ²/n1 + σ²/n2 = σ²(1/n1 + 1/n2)

Assoc.Prof. Prapaisri Sudasna-na-Ayudthya, KU


50

Inference For a Difference in Means, Variances Unknown

Hypothesis Tests for a Difference in Means, Case I: σ1² = σ2² = σ²
• The pooled estimate of σ², denoted by Sp², is defined by
    Sp² = [(n1 − 1)S1² + (n2 − 1)S2²] / (n1 + n2 − 2)

Assoc.Prof. Prapaisri Sudasna-na-Ayudthya, KU


51

Inference For a Difference in Means, Variances Unknown

Hypothesis Tests for a Difference in Means, Case I: σ1² = σ2² = σ²
Null Hypothesis: H0: µ1 − µ2 = ∆0
Test Statistic:
    t0 = (X̄1 − X̄2 − ∆0) / (Sp √(1/n1 + 1/n2))

Assoc.Prof. Prapaisri Sudasna-na-Ayudthya, KU


52

Inference For a Difference in Means, Variances Unknown

Hypothesis Tests for a Difference in Means, Variances Unknown

Alternative Hypothesis       Rejection Criterion
H1: µ1 − µ2 ≠ ∆0             t0 > tα/2, n1+n2−2 or t0 < −tα/2, n1+n2−2
H1: µ1 − µ2 > ∆0             t0 > tα, n1+n2−2
H1: µ1 − µ2 < ∆0             t0 < −tα, n1+n2−2

Assoc.Prof. Prapaisri Sudasna-na-Ayudthya, KU
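A minimal sketch of the pooled t test (Case I) on hypothetical data, with the hand calculation checked against scipy.stats.ttest_ind(equal_var=True), which implements the same pooled-variance test.

```python
from math import sqrt

import numpy as np
from scipy import stats

# Hypothetical samples (illustration only).
x1 = np.array([92.3, 90.1, 91.8, 93.0, 89.7, 92.6, 91.2, 90.8])
x2 = np.array([89.5, 90.2, 88.7, 89.9, 91.0, 88.4, 90.5, 89.1])

n1, n2 = len(x1), len(x2)
sp_sq = ((n1 - 1) * x1.var(ddof=1) + (n2 - 1) * x2.var(ddof=1)) / (n1 + n2 - 2)
t0 = (x1.mean() - x2.mean()) / (sqrt(sp_sq) * sqrt(1 / n1 + 1 / n2))
p_value = 2 * stats.t.sf(abs(t0), df=n1 + n2 - 2)

t_sp, p_sp = stats.ttest_ind(x1, x2, equal_var=True)   # same pooled test in SciPy

print(f"by hand: t0 = {t0:.3f}, P = {p_value:.4f}")
print(f"scipy  : t0 = {t_sp:.3f}, P = {p_sp:.4f}")
```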


53

Inference For a Difference in Means, Variances Unknown

Hypothesis Tests for a Difference in Means, Case II: σ1² ≠ σ2²
Null Hypothesis: H0: µ1 − µ2 = ∆0
Test Statistic:
    t0* = (X̄1 − X̄2 − ∆0) / √(S1²/n1 + S2²/n2)

Assoc.Prof. Prapaisri Sudasna-na-Ayudthya, KU


54

Inference For a Difference in Means, Variances Unknown

Hypothesis Tests for a Difference in Means, Case II: σ1² ≠ σ2²
• The degrees of freedom for t0* are given by
    ν = (S1²/n1 + S2²/n2)² / [ (S1²/n1)²/(n1 − 1) + (S2²/n2)²/(n2 − 1) ]
Assoc.Prof. Prapaisri Sudasna-na-Ayudthya, KU
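A minimal sketch of Welch's approximate t test (Case II), computing the degrees-of-freedom formula above by hand and comparing with scipy.stats.ttest_ind(equal_var=False); the data are hypothetical.

```python
import numpy as np
from scipy import stats

# Hypothetical samples with unequal spread (illustration only).
x1 = np.array([14.2, 15.1, 13.8, 14.9, 15.4, 14.6])
x2 = np.array([12.1, 16.3, 11.8, 17.0, 13.5, 15.2, 12.9, 16.1])

n1, n2 = len(x1), len(x2)
v1, v2 = x1.var(ddof=1) / n1, x2.var(ddof=1) / n2   # the S_i^2/n_i terms

t0 = (x1.mean() - x2.mean()) / np.sqrt(v1 + v2)
nu = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))   # Welch d.f.
p_value = 2 * stats.t.sf(abs(t0), df=nu)

t_w, p_w = stats.ttest_ind(x1, x2, equal_var=False)   # Welch's test in SciPy

print(f"by hand: t0* = {t0:.3f}, nu = {nu:.1f}, P = {p_value:.4f}")
print(f"scipy  : t0* = {t_w:.3f}, P = {p_w:.4f}")
```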
55

Inference For a Difference in Means, Variances Unknown

Confidence Interval on a Difference in Means, Case I: σ1² = σ2² = σ²
• The 100(1 − α)% confidence interval on the difference in means is given by
    x̄1 − x̄2 − tα/2, n1+n2−2 · sp √(1/n1 + 1/n2) ≤ µ1 − µ2 ≤ x̄1 − x̄2 + tα/2, n1+n2−2 · sp √(1/n1 + 1/n2)

Assoc.Prof. Prapaisri Sudasna-na-Ayudthya, KU


56

Inference For a Difference in Means, Variances Unknown

Confidence Interval on a Difference in Means, Case II: σ1² ≠ σ2²
• The 100(1 − α)% confidence interval on the difference in means is given by
    x̄1 − x̄2 − tα/2, ν √(s1²/n1 + s2²/n2) ≤ µ1 − µ2 ≤ x̄1 − x̄2 + tα/2, ν √(s1²/n1 + s2²/n2)

Assoc.Prof. Prapaisri Sudasna-na-Ayudthya, KU


57

Paired Data
• Observations in an experiment are
often paired to prevent extraneous
factors from inflating the estimate
of the variance.
• Difference is obtained on each pair
of observations, dj = x1j – x2j,
where j = 1, 2, …, n.
• Test the hypothesis that the mean
of the difference, µd, is zero.

Assoc.Prof. Prapaisri Sudasna-na-Ayudthya, KU


58

Paired Data
• The differences, dj, represent the "new" set of data, with summary statistics:
    d̄ = (1/n) Σⱼ₌₁ⁿ dj
    Sd² = Σⱼ₌₁ⁿ (dj − d̄)² / (n − 1)

Assoc.Prof. Prapaisri Sudasna-na-Ayudthya, KU


59

Paired Data
Hypothesis Testing
• Hypotheses: H0: µd = 0    H1: µd ≠ 0
• Test Statistic:
    t0 = d̄ / (Sd/√n)
• Significance Level, α
• Rejection Region: |t0| > tα/2, n−1

Assoc.Prof. Prapaisri Sudasna-na-Ayudthya, KU
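A minimal sketch of the paired t test on hypothetical before/after measurements, checked against scipy.stats.ttest_rel, which performs the same test.

```python
from math import sqrt

import numpy as np
from scipy import stats

# Hypothetical paired measurements (illustration only).
before = np.array([68.5, 71.2, 69.8, 72.4, 70.1, 69.3, 71.8, 70.6])
after = np.array([67.9, 70.5, 69.9, 71.6, 69.4, 68.8, 71.1, 70.0])

d = before - after                       # paired differences d_j
n = len(d)
t0 = d.mean() / (d.std(ddof=1) / sqrt(n))
p_value = 2 * stats.t.sf(abs(t0), df=n - 1)

t_rel, p_rel = stats.ttest_rel(before, after)   # same paired test in SciPy

print(f"by hand: t0 = {t0:.3f}, P = {p_value:.4f}")
print(f"scipy  : t0 = {t_rel:.3f}, P = {p_rel:.4f}")
```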


60

Inferences on the Variances of Two Normal Distributions

Hypothesis Testing
• Consider testing the hypothesis that the variances of two independent normal distributions are equal:
    H0: σ1² = σ2²    H1: σ1² ≠ σ2²
• Assume random samples of sizes n1 and n2 are taken from populations 1 and 2, respectively.
Assoc.Prof. Prapaisri Sudasna-na-Ayudthya, KU
61

Inferences on the Variances of Two Normal Distributions

Hypothesis Testing
• Hypotheses: H0: σ1² = σ2²    H1: σ1² ≠ σ2²
• Test Statistic:
    F0 = S1² / S2²
• Significance Level, α
• Rejection Region: F0 > Fα/2, n1−1, n2−1 or F0 < F1−α/2, n1−1, n2−1

Assoc.Prof. Prapaisri Sudasna-na-Ayudthya, KU


62

Inferences on the Variances of Two Normal Distributions

Alternative Hypothesis    Test Statistic     Rejection Region
H1: σ1² < σ2²             F0 = S2²/S1²       F0 > Fα, n2−1, n1−1
H1: σ1² > σ2²             F0 = S1²/S2²       F0 > Fα, n1−1, n2−1

Assoc.Prof. Prapaisri Sudasna-na-Ayudthya, KU


63

Inferences on the Variances of Two Normal Distributions

Confidence Interval on the Ratio of the Variances of Two Normal Distributions
• The 100(1 − α)% two-sided confidence interval on the ratio of variances is given by
    (S1²/S2²) F1−α/2, n2−1, n1−1 ≤ σ1²/σ2² ≤ (S1²/S2²) Fα/2, n2−1, n1−1

Assoc.Prof. Prapaisri Sudasna-na-Ayudthya, KU
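A minimal sketch of the two-sided F test for equal variances together with the confidence interval on σ1²/σ2²; the two samples are hypothetical, and SciPy's F-distribution quantiles stand in for the tabled percentage points.

```python
import numpy as np
from scipy.stats import f

# Hypothetical samples (illustration only).
x1 = np.array([3.1, 2.8, 3.4, 3.0, 2.9, 3.3, 3.2, 2.7, 3.5, 3.0])
x2 = np.array([2.6, 3.9, 2.2, 4.1, 3.0, 2.4, 3.7, 2.8])

n1, n2 = len(x1), len(x2)
s1_sq, s2_sq = x1.var(ddof=1), x2.var(ddof=1)
alpha = 0.05

F0 = s1_sq / s2_sq
lo_crit = f.ppf(alpha / 2, dfn=n1 - 1, dfd=n2 - 1)       # slide notation: F_{1-alpha/2, n1-1, n2-1}
hi_crit = f.ppf(1 - alpha / 2, dfn=n1 - 1, dfd=n2 - 1)   # slide notation: F_{alpha/2, n1-1, n2-1}
reject = F0 > hi_crit or F0 < lo_crit

# Two-sided CI on sigma1^2/sigma2^2; note the swapped degrees of freedom in the quantiles.
ci = (F0 * f.ppf(alpha / 2, dfn=n2 - 1, dfd=n1 - 1),
      F0 * f.ppf(1 - alpha / 2, dfn=n2 - 1, dfd=n1 - 1))

print(f"F0 = {F0:.3f}, reject H0: {reject}")
print(f"95% CI for sigma1^2/sigma2^2: ({ci[0]:.3f}, {ci[1]:.3f})")
```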


64

What If We Have More Than Two Populations?

Example: Investigating the effect of one factor (with several levels) on some response.

Hardwood          Observations
Concentration    1   2   3   4   5   6   Total    Avg
5%               7   8  15  11   9  10      60   10.00
10%             12  17  13  18  19  15      94   15.67
15%             14  18  19  17  16  18     102   17.00
20%             19  25  22  23  18  20     127   21.17
Overall                                    383   15.96

Assoc.Prof. Prapaisri Sudasna-na-Ayudthya, KU


65

What If We Have More Than Two Populations?
Analysis of Variance
• Always a good practice to compare
the levels of the factor using
graphical methods such as boxplots.
• Comparative boxplots show the
variability of the observations
within a factor level and the
variability between factor levels.
Assoc.Prof. Prapaisri Sudasna-na-Ayudthya, KU
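The hardwood-concentration data above can be analyzed with a one-way analysis of variance. The sketch below uses scipy.stats.f_oneway; the data values are taken from the table, while the use of SciPy here is simply an illustration and not part of the original lecture.

```python
from scipy.stats import f_oneway

# Tensile-strength observations by hardwood concentration (values from the table above).
conc_05 = [7, 8, 15, 11, 9, 10]
conc_10 = [12, 17, 13, 18, 19, 15]
conc_15 = [14, 18, 19, 17, 16, 18]
conc_20 = [19, 25, 22, 23, 18, 20]

F0, p_value = f_oneway(conc_05, conc_10, conc_15, conc_20)
print(f"F0 = {F0:.2f}, P = {p_value:.6f}")
# A small P-value indicates that mean tensile strength differs across concentration levels.
```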
66

What If We Have More Than Two Populations?

[Figure: comparative boxplots of tensile strength (psi) versus hardwood concentration (5%, 10%, 15%, 20%).]

Assoc.Prof. Prapaisri Sudasna-na-Ayudthya, KU
