STAL2073 Chapter4 2020 2021 6af2b89854d8f7bccb4ff4 231017 010847

10/20/2020
STAL2073 – Chapter 4
Concept of Hypothesis Testing,

Parametric Hypothesis Testing for
One and Two Populations
Hypothesis
• Hypothesis: a statement about a population.
• To validate a hypothesis, testing is required: a sample must be obtained and a testing procedure must be followed.
• E.g., a pollutant released by a factory is said to pollute the river, causing growth retardation of the fish in the river.
• How can this hypothesis be tested and validated?
• Let's begin with some basic concepts: null hypothesis, alternative hypothesis, Type I error, Type II error.
Basic Concept in Hypothesis Testing
Type I Error (α) and Type II Error (β)
Decision Made               Real Situation
                            Ho CORRECT       Ha CORRECT
Accept Ho                   OK               Type II Error
Reject Ho (or Accept Ha)    Type I Error     OK
Type I Error = error committed when Ho is rejected although Ho is true.

Type II Error = error committed when Ho is accepted although Ho is false.
When making a decision we may commit a Type I Error (probability α) when we reject Ho, or a Type II Error (probability β) when we accept Ho.
An optimal testing procedure keeps both α and β as small as possible. However, β is typically much larger than α, and only α can be set directly: a 95% or 99% level of confidence corresponds to α = 0.05 or α = 0.01, respectively.
Hence, in hypothesis testing we reject Ho if the computed statistic allows us to do so, and the probability of making an error is then only α (5%, 1% or lower).
However, if the computed statistic does not allow us to reject Ho and we decide to accept Ho, we risk committing a Type II Error, whose probability β is much larger than α.
For this reason, when we fail to reject Ho we should not accept it. Instead, we say that the data do not support Ha, which calls for collecting further evidence or modifying Ha.
Example
• Assume X ~ N(µ, 16) and a sample of size n = 25 is drawn from this population.
• What are the optimal values of α and β for this test?
• α is usually set directly by the researcher, e.g. α = 0.05 (5%) or α = 0.01 (1%). This means the probability of making a Type I Error is fixed at 5% (or 1%), which is considered acceptable.
• In scientific papers, researchers usually report a p-value instead of α. What is a p-value, and how is it related to α?
• β, on the other hand, cannot be set directly by the researcher. Instead, the researcher chooses the rejection criterion (or Area of Rejection) so that the Type II Error is as small as possible.
Rejection Area (or Rejection Set)

$S = \{\bar{X} \mid \bar{X} \ge c\}$
• With this Area of Rejection, we reject Ho if the sample mean equals or exceeds c.
• The value of c is determined by the value of α.
• The following is based on the CLT.
• We need to use the standard normal distribution.

• P{Z ≥ 1.645} = 0.05, hence 1.645 = (c − 10) × 5/4, or c = 11.316 (here 5/4 = √n/σ with σ = 4 and n = 25, and 10 is the value of µ under Ho).
• Hence $S = \{\bar{X} \mid \bar{X} \ge 11.316\}$ is the optimum Rejection Area.
• If the sample mean we collected equals or exceeds 11.316, we reject Ho at a significance level of 0.05 or 5% (a level of confidence of 95%).
• What is the Type II Error (β) for this Rejection Area?
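The calculation of c, and one possible answer to the question about β, can be sketched in Python with the standard library. This is only an illustration: β depends on the true mean, which the slide does not specify, so the alternative value `mu1 = 12` below is an assumption, as are all variable names.

```python
from statistics import NormalDist

sigma = 4.0      # population standard deviation, from X ~ N(mu, 16)
n = 25           # sample size
mu0 = 10.0       # mean under Ho, as implied by the slide's equation
alpha = 0.05

se = sigma / n ** 0.5                      # standard error of the sample mean
z_alpha = NormalDist().inv_cdf(1 - alpha)  # upper-alpha critical value, about 1.645
c = mu0 + z_alpha * se                     # boundary of the rejection area

# beta depends on the true mean; mu1 = 12 is an assumed value for illustration only.
mu1 = 12.0
beta = NormalDist(mu1, se).cdf(c)          # P(sample mean < c) when mu = mu1

print(f"c = {c:.3f}")
print(f"beta at mu = {mu1}: {beta:.3f}")
```

Choosing a true mean closer to 10 makes β larger, which is why β cannot be fixed in advance the way α can.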
Hypothesis Testing involving 1 population
• Parametric hypothesis testing requires the normality assumption
• Hypothesis testing for 1 mean:
• Case I: Population is normally distributed and σ² is known.
• Test I (Two-tail test)
$H_o: \mu = \mu_0$ vs. $H_a: \mu \neq \mu_0$
• Test II
$H_o: \mu \le \mu_0$ vs. $H_a: \mu > \mu_0$
• Test III
$H_o: \mu \ge \mu_0$ vs. $H_a: \mu < \mu_0$
• Test statistic used: $Z = \dfrac{\bar{x} - \mu_0}{\sigma / \sqrt{n}}$
• Rejection:
Test I: Reject Ho if |Z| > Zα/2
Test II: Reject Ho if Z > Zα
Test III: Reject Ho if Z < −Zα
• Case II: Population is normally distributed but σ² is unknown.
For the same tests above, use this test statistic:
$t = \dfrac{\bar{x} - \mu_0}{s / \sqrt{n}}$
• Rejection:
Test I: Reject Ho if |t|>tα/2,n-1
Test II: Reject Ho if t >tα,n-1
Test III: Reject Ho if t < - tα,n-1
• Case III: Population is not normally distributed but sample size n > 30
Use the CLT. The testing procedure is similar to Case II.
• Case IV: Population is not normally distributed and sample size n < 30 (small sample size)
Requires transformation of variables or application of non-parametric testing procedures.
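For Case I (σ known), the three rejection rules can be sketched as a small Python function. The function name `z_test_decision` and the numbers in the usage line are illustrative assumptions, not part of the slides; critical values come from the standard library's normal distribution rather than a table.

```python
from statistics import NormalDist

def z_test_decision(xbar, mu0, sigma, n, alpha=0.05, test="I"):
    """Return (Z, reject) for Case I: normal population, sigma known.

    test = "I"   : Ha: mu != mu0 (two-tail)
    test = "II"  : Ha: mu >  mu0 (upper tail)
    test = "III" : Ha: mu <  mu0 (lower tail)
    """
    z = (xbar - mu0) / (sigma / n ** 0.5)
    std_normal = NormalDist()
    if test == "I":
        reject = abs(z) > std_normal.inv_cdf(1 - alpha / 2)
    elif test == "II":
        reject = z > std_normal.inv_cdf(1 - alpha)
    elif test == "III":
        reject = z < -std_normal.inv_cdf(1 - alpha)
    else:
        raise ValueError("test must be 'I', 'II' or 'III'")
    return z, reject

# Illustrative numbers only (not from the slides): xbar = 52, mu0 = 50, sigma = 6, n = 36.
z, reject = z_test_decision(52, 50, 6, 36, alpha=0.05, test="II")
print(round(z, 2), reject)
```

For Case II the same structure applies with s in place of σ, but the critical values must then come from the t distribution with n − 1 degrees of freedom, which the Python standard library does not provide.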
• P-value of a hypothesis test

The p-value plays the same role as α but is computed from the exact value of the test statistic (e.g. Z or t): it is the probability, when Ho is true, of obtaining a test statistic at least as extreme as the one observed.
If p < α then Ho is rejected at significance level α, i.e. with a level of confidence higher than 95% or 99%.
p < 0.05 (level of confidence is higher than 95%)
p < 0.01 (level of confidence is higher than 99%)
p < 0.0001 (highly significant)
If p > α = 0.05 (non-significant), Ho is not rejected.
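For the Z statistic, the p-value of each of the three tests can be written out explicitly. This is the standard definition rather than something stated on the slide, and $z_{obs}$ is an assumed symbol for the computed value of the test statistic.

$$
p = \begin{cases}
2\,P(Z \ge |z_{obs}|) & \text{Test I (two-tail)} \\
P(Z \ge z_{obs}) & \text{Test II (upper tail)} \\
P(Z \le z_{obs}) & \text{Test III (lower tail)}
\end{cases}
$$

The same expressions hold for a t statistic, with the t distribution with n − 1 degrees of freedom in place of Z.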
• Confidence interval
In addition to hypothesis testing, confidence
intervals of population parameters (mean, variance)
can be computed
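• Case I: (1 − α)100% confidence interval for µ of a normally distributed population N(µ, σ²), σ is known

$P\left[\bar{x} - z_{\alpha/2}\dfrac{\sigma}{\sqrt{n}} \le \mu \le \bar{x} + z_{\alpha/2}\dfrac{\sigma}{\sqrt{n}}\right] = 1 - \alpha$

$\left[\bar{x} - z_{\alpha/2}\dfrac{\sigma}{\sqrt{n}},\ \bar{x} + z_{\alpha/2}\dfrac{\sigma}{\sqrt{n}}\right]$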
• Case II: As in Case I but variance is unknown

$\left[\bar{x} - t_{\alpha/2,\,n-1}\dfrac{s}{\sqrt{n}},\ \bar{x} + t_{\alpha/2,\,n-1}\dfrac{s}{\sqrt{n}}\right]$
• Case III: Population is not normal but n > 30

Same as in Case II:
$\left[\bar{x} - t_{\alpha/2,\,n-1}\dfrac{s}{\sqrt{n}},\ \bar{x} + t_{\alpha/2,\,n-1}\dfrac{s}{\sqrt{n}}\right]$
• Interpretation of Confidence Interval:

If α = 0.05 and a sample of size n is taken repeatedly 100 times, with a confidence interval computed for each sample, then out of the 100 intervals about 95 are expected to contain µ and about 5 are not.
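The repeated-sampling interpretation can be checked with a short simulation. This is a sketch under assumed values: the "true" µ and σ, the sample size and the number of repeats are chosen for illustration only, and σ is treated as known so that the Case I interval applies.

```python
import random
from statistics import NormalDist, mean

random.seed(1)                # fixed seed so the run is reproducible

mu, sigma = 50.0, 4.0         # assumed "true" population parameters (illustrative)
n = 25                        # sample size
alpha = 0.05
repeats = 1000                # more than 100 repeats, to reduce simulation noise

z = NormalDist().inv_cdf(1 - alpha / 2)   # about 1.96
half_width = z * sigma / n ** 0.5         # sigma is treated as known (Case I)

covered = 0
for _ in range(repeats):
    sample = [random.gauss(mu, sigma) for _ in range(n)]
    xbar = mean(sample)
    if xbar - half_width <= mu <= xbar + half_width:
        covered += 1

coverage = covered / repeats
print(f"{coverage:.3f} of the intervals contain mu")
```

The observed proportion should be close to, but not exactly, 0.95, because the 95% describes the long-run behaviour of the procedure rather than any single batch of intervals.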
Testing for Data normality

• For a small sample size (n < 30), normality testing is required prior to parametric hypothesis testing
• Several methods can be used:
– qq-plot
– Shapiro-Wilk normality test
• We will learn to do these testing procedures in R
Hypothesis testing involving variance (1 sample)
• Three types of tests
Case I: Ho: σ² = σ0² vs. Ha: σ² ≠ σ0² (two-tail)
Case II: Ho: σ² ≤ σ0² vs. Ha: σ² > σ0² (one-tail)
Case III: Ho: σ² ≥ σ0² vs. Ha: σ² < σ0² (one-tail)
• If X ~ N(µ, σ²) then
$\chi^2 = \dfrac{(n-1)s^2}{\sigma^2}$ will be distributed $\chi^2(n-1)$
• Test statistic: $\chi^2 = \dfrac{(n-1)s^2}{\sigma_0^2}$
• Rejection area at significance level α
Case I: Reject Ho if $\chi^2 < \chi^2_{1-\alpha/2,\,n-1}$ or $\chi^2 > \chi^2_{\alpha/2,\,n-1}$
Case II: Reject Ho if $\chi^2 > \chi^2_{\alpha,\,n-1}$
Case III: Reject Ho if $\chi^2 < \chi^2_{1-\alpha,\,n-1}$
• Confidence interval (CI) for σ²
$\left[\dfrac{(n-1)s^2}{\chi^2_{\alpha/2,\,n-1}},\ \dfrac{(n-1)s^2}{\chi^2_{1-\alpha/2,\,n-1}}\right]$
Testing involving variance of 2 populations (2 samples)
• Three types of tests
Case I: Ho: σA² = σB² vs. Ha: σA² ≠ σB² (Two-tail)
Case II: Ho: σA² ≤ σB² vs. Ha: σA² > σB² (One-tail)
Case III: Ho: σA² ≥ σB² vs. Ha: σA² < σB² (One-tail)
• Test Statistic:
$F = \dfrac{s_A^2}{s_B^2}$
• If both samples are drawn from normally distributed populations, then F follows an F distribution with (nA − 1, nB − 1) degrees of freedom.
TEST I:
Reject Ho if F < F1-α/2,nA-1, nB-1
or F > Fα/2,nA-1, nB-1
TEST II:
Reject Ho if F > Fα,nA-1, nB-1
TEST III
Reject Ho if F < F1-α,nA-1, nB-1
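Printed F tables usually list only upper-tail critical values, so the lower critical values needed in TEST I and TEST III are typically obtained from the reciprocal relation below. This is a standard property of the F distribution rather than something stated on the slide; $\nu_1 = n_A - 1$ and $\nu_2 = n_B - 1$ are assumed shorthand for the two degrees of freedom.

$$
F_{1-\alpha,\,\nu_1,\,\nu_2} = \frac{1}{F_{\alpha,\,\nu_2,\,\nu_1}}
$$

Note that the degrees of freedom swap places on the right-hand side.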
5. Hypothesis testing of two population means (Independent samples)
• Two samples are said to be independent if one does not influence the other
CASE I (two-tail)
$H_o: \mu_1 = \mu_2$ vs. $H_a: \mu_1 \neq \mu_2$
CASE II (one-tail)
$H_o: \mu_1 \le \mu_2$ vs. $H_a: \mu_1 > \mu_2$
CASE III (one-tail)
$H_o: \mu_1 \ge \mu_2$ vs. $H_a: \mu_1 < \mu_2$
CASE I (Both populations are normal and variances are known)
Test statistic:
$Z = \dfrac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}}$
The rejection criterion is the same as for testing 1 population.
CASE II (Both populations are normally distributed but variances are unknown and unequal)
Test statistic: $t = \dfrac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}$
The rejection criteria are the same as before, but the computed test statistic is compared with the t distribution with degrees of freedom (df)
$df = \dfrac{\left(\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}\right)^2}{\dfrac{(s_1^2/n_1)^2}{n_1 - 1} + \dfrac{(s_2^2/n_2)^2}{n_2 - 1}}$
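The unequal-variance statistic and its degrees of freedom are tedious to evaluate by hand, so a small helper is sketched below. The function name `welch_t` is an assumption, and the numbers in the usage line are the Pool A and Pool B summary statistics from Example # 4, used here purely as convenient inputs (that example itself goes on to pool the variances).

```python
def welch_t(xbar1, s1_sq, n1, xbar2, s2_sq, n2):
    """Return (t, df) for two independent samples with unequal variances."""
    v1 = s1_sq / n1          # estimated variance of the first sample mean
    v2 = s2_sq / n2          # estimated variance of the second sample mean
    t = (xbar1 - xbar2) / (v1 + v2) ** 0.5
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df

# Summary statistics borrowed from Example # 4 for illustration.
t, df = welch_t(105.5, 285.5, 12, 78.4, 520.5, 11)
print(f"t = {t:.2f}, df = {df:.1f}")
```

The resulting df is generally not an integer; when a printed t table is used, it is commonly rounded down to the nearest whole number.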
CASE III (Both are normally distributed, variances unknown but equal)
Test statistic: $t = \dfrac{\bar{x}_1 - \bar{x}_2}{s_{pooled}\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}$
$s^2_{pooled} = \dfrac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$
The rejection criteria are the same as before, but the test statistic is compared with the t distribution with df = n1 + n2 − 2.
CASE IV (Both are not normal, variances are unknown but equal and n1, n2 > 30)
Using the CLT, CASE IV is similar to CASE III.
Hypothesis testing involving 2 dependent samples
Comparing 2 samples from the same locations or the same sampling units.
Example: To test the effectiveness of a fish food pellet, fish weights are measured before and after the pellet is administered.
For each fish we have two readings: Before and After.
However, we cannot treat these as two independent samples; they are two dependent samples.
We consider the difference between the two samples, d = weight after − weight before.
Hypothesis testing:
Ho: µd = 0 vs. Ha: µd ≠ 0 (two-tail)
Ho: µd ≤ 0 vs. Ha: µd > 0 (one-tail)
Ho: µd ≥ 0 vs. Ha: µd < 0 (one-tail)
If the population is normally distributed and the variance is known, we can use:
$Z_d = \dfrac{\bar{d}}{\sigma_d / \sqrt{n}}$
Otherwise, as before, use the t distribution.
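When the variance is unknown, the paired computation reduces to a one-sample t statistic on the differences. The sketch below uses made-up before and after weights for 6 fish, purely to show the mechanics; none of these numbers come from the slides.

```python
from statistics import mean, stdev

# Hypothetical before/after weights (g) for 6 fish; these numbers are made up
# purely to illustrate the computation and do not come from the slides.
before = [50.2, 47.8, 53.1, 49.5, 51.0, 48.4]
after = [55.1, 50.9, 57.8, 52.0, 56.3, 51.7]

d = [a - b for a, b in zip(after, before)]   # per-fish difference, after minus before
n = len(d)
d_bar = mean(d)
s_d = stdev(d)                               # sample standard deviation (n - 1 divisor)

mu_d0 = 0.0                                  # value of mu_d under Ho
t = (d_bar - mu_d0) / (s_d / n ** 0.5)       # compare with t(alpha, n - 1) from a t table

print(f"d_bar = {d_bar:.3f}, s_d = {s_d:.3f}, t = {t:.2f}, df = {n - 1}")
```

Working with the differences removes the fish-to-fish variation, which is why the two columns must not be analysed as independent samples.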
Example #1:
Assume X ~ N(µ, (78.5)²). One sample of n = 100 is taken and $\bar{x}$ = 520.
The researcher is interested in testing this hypothesis:
Ho: µ = 500 vs. Ha: µ ≠ 500
using α = 0.05.
This is a case of a two-tail test with σ known.
Test statistic:
$Z = \dfrac{520 - 500}{78.5 / \sqrt{100}} = 2.55$
Reject Ho if |Z| > Zα/2 = Z0.025 = 1.96.
Decision: Reject Ho. What is the p-value for this test?
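One way to answer the p-value question is to evaluate the two-tail probability numerically. The sketch below uses Python's standard-library normal distribution; the variable names are assumptions.

```python
from statistics import NormalDist

xbar, mu0, sigma, n = 520, 500, 78.5, 100   # values from Example #1

z = (xbar - mu0) / (sigma / n ** 0.5)
p_value = 2 * (1 - NormalDist().cdf(abs(z)))   # two-tail: area in both tails beyond |z|

print(f"Z = {z:.2f}, p-value = {p_value:.4f}")
```

The p-value comes out at roughly 0.011, which is below α = 0.05 and so agrees with the decision to reject Ho, but it is above 0.01, so Ho would not be rejected at α = 0.01.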
Example #2
X ~ N(µ, σ²), σ is unknown and
n = 25, $\bar{x}$ = 520, s = 75
Ho: µ = 500 vs. Ha: µ ≠ 500
Test statistic: $t = \dfrac{520 - 500}{75 / \sqrt{25}} = 1.33$
From the t distribution, t0.025,24 = 2.064
Decision: Fail to reject Ho, since |t| = 1.33 < 2.064; the data do not support Ha.
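The same conclusion can be read off the 95% confidence interval for µ (the σ-unknown case), using the values given in this example:

$$
\bar{x} \pm t_{0.025,\,24}\,\frac{s}{\sqrt{n}} = 520 \pm 2.064 \times \frac{75}{\sqrt{25}} = 520 \pm 30.96 = [489.04,\ 550.96]
$$

The interval contains the hypothesized value 500, which is consistent with failing to reject Ho in the two-tail test at α = 0.05.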
Example #3
Testing involving variance of 1 population.
Consider the following data: 31.2 32.5 30.8 31.5 29.5 31.1 31.3 30.7 26.7 29.2 32.1 28.3 31.6 29.2 31.5 29.7 30.4 31.0 29.1 30.5
Ho: σ ≤ 0.5 vs. Ha: σ > 0.5
From the sample (n = 20), s = 1.41
$\chi^2 = \dfrac{(20 - 1)s^2}{\sigma_0^2} = 19\left(\dfrac{1.41}{0.5}\right)^2 = 151.1 > \chi^2_{0.05,\,19} = 30.1435$
Decision: Reject Ho.
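The figures in this example can be reproduced from the raw data. The sketch below uses only the Python standard library, so the critical value is copied from the slide rather than computed; the variable names are assumptions.

```python
from statistics import stdev

data = [31.2, 32.5, 30.8, 31.5, 29.5, 31.1, 31.3, 30.7, 26.7, 29.2,
        32.1, 28.3, 31.6, 29.2, 31.5, 29.7, 30.4, 31.0, 29.1, 30.5]

sigma0 = 0.5                 # standard deviation under Ho
n = len(data)
s = stdev(data)              # sample standard deviation, n - 1 divisor

chi2 = (n - 1) * s ** 2 / sigma0 ** 2

# Upper 5% critical value of chi-square with 19 df, taken from the slide
# (table value), since the standard library has no chi-square quantile function.
chi2_crit = 30.1435

print(f"s = {s:.2f}, chi2 = {chi2:.1f}, reject Ho: {chi2 > chi2_crit}")
```

The statistic computed from the unrounded s differs slightly from the slide's 151.1, which uses s rounded to 1.41; the decision is unaffected.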

Example # 4
A researcher is of the opinion that the fish in Pool A are bigger than those in Pool B. One sample of n = 12 is taken from Pool A and another of n = 11 from Pool B. All fish are weighed.
Pool A (Weight in g): [76.9 107.2 85.5 112.3 90.1 114.8 96.4 119.8 98.9 124.9 104.4 134.5]
Pool B (Weight in g): [41.9 84.8 52.8 91.8 58.3 95.2 61.3 104.9 74.9 114.3 82.3]
Before hypothesis testing can be done, because both sample sizes are < 30, both data sets must be tested for normality.
• Assuming this is done (we will learn later how to do this in R), the next step is to test whether the variances are equal.
$\bar{x}_A = 105.5,\ \bar{x}_B = 78.4$
$s_A^2 = 285.5,\ s_B^2 = 520.5$
$F = \dfrac{520.5}{285.5} = 1.82 < F_{0.025,\,10,\,11} = 3.53$
Decision: Fail to reject Ho. The variances can be treated as equal.
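The sample means, variances and F ratio can be reproduced from the two weight lists. In this sketch the critical value is the table value quoted on the slide, and the variable names are assumptions.

```python
from statistics import mean, variance

pool_a = [76.9, 107.2, 85.5, 112.3, 90.1, 114.8, 96.4, 119.8, 98.9, 124.9, 104.4, 134.5]
pool_b = [41.9, 84.8, 52.8, 91.8, 58.3, 95.2, 61.3, 104.9, 74.9, 114.3, 82.3]

var_a = variance(pool_a)     # sample variance, n - 1 divisor
var_b = variance(pool_b)

# The slide puts the larger sample variance in the numerator, so the numerator
# df belongs to Pool B (11 - 1 = 10) and the denominator df to Pool A (12 - 1 = 11).
f_stat = max(var_a, var_b) / min(var_a, var_b)

f_crit = 3.53                # F(0.025; 10, 11), table value quoted on the slide

print(f"mean A = {mean(pool_a):.1f}, mean B = {mean(pool_b):.1f}")
print(f"var A = {var_a:.1f}, var B = {var_b:.1f}")
print(f"F = {f_stat:.2f}, reject Ho: {f_stat > f_crit}")
```

Placing the larger variance on top means only the upper critical value is needed for the two-tail test, which is presumably why the slide compares against F0.025 rather than looking up a lower-tail value.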
Hence we "pool" (average) the variances.
$s_p^2 = \dfrac{11 s_A^2 + 10 s_B^2}{11 + 10} = \dfrac{11(285.5) + 10(520.5)}{21} = 397.4$
$t = \dfrac{\bar{x}_A - \bar{x}_B}{s_p\sqrt{\dfrac{1}{12} + \dfrac{1}{11}}} = \dfrac{105.5 - 78.4}{\sqrt{397.4\left(\dfrac{1}{12} + \dfrac{1}{11}\right)}} = 3.25$
Can Ho be rejected at α = 0.05? What is the p-value of this test?
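The pooled variance and t statistic can be recomputed from the raw weights, and the first question can then be explored with a table lookup. In this sketch the one-tail direction (Ha: µA > µB) is inferred from the researcher's stated opinion, and the critical value 1.721 for t(0.05; 21) is a standard t-table value that does not appear on the slides.

```python
from statistics import mean, variance

pool_a = [76.9, 107.2, 85.5, 112.3, 90.1, 114.8, 96.4, 119.8, 98.9, 124.9, 104.4, 134.5]
pool_b = [41.9, 84.8, 52.8, 91.8, 58.3, 95.2, 61.3, 104.9, 74.9, 114.3, 82.3]
n_a, n_b = len(pool_a), len(pool_b)

# Pooled variance: each sample variance weighted by its degrees of freedom.
sp2 = ((n_a - 1) * variance(pool_a) + (n_b - 1) * variance(pool_b)) / (n_a + n_b - 2)

t_stat = (mean(pool_a) - mean(pool_b)) / (sp2 * (1 / n_a + 1 / n_b)) ** 0.5
df = n_a + n_b - 2

# Assumed upper-tail test (Ha: mu_A > mu_B); t(0.05; 21) = 1.721 is a
# standard t-table value, not a number given on the slides.
t_crit = 1.721

print(f"sp2 = {sp2:.1f}, t = {t_stat:.2f}, df = {df}, reject Ho: {t_stat > t_crit}")
```

An exact p-value needs the t distribution's cumulative function, which the Python standard library does not provide; a statistics package or a t table would be required for that part.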
Example # 5
The effectiveness of a pellet food for fish growth was tested on 12 fish in a laboratory. The researcher believes that the food can increase the weight of the fish by 10 g within a week.
This is an example of dependent samples (before and after, with both readings obtained from the same subject).
A study such as this cannot be treated as 2 independent samples.
Ho: µd ≤ 10 vs. Ha: µd > 10
$s_d = \sqrt{\dfrac{1}{n-1}\sum (d - \bar{d})^2} = \sqrt{\dfrac{1}{n-1}\left[\sum d^2 - \dfrac{\left(\sum d\right)^2}{n}\right]} = 5.116$
$t = \dfrac{\bar{d} - 10}{s_d / \sqrt{n}} = 2.34$
Can Ho be rejected?
Important Concepts
• Hypothesis testing concepts
• Type I, Type II error, α, β, p-value
• Rejection area (or set)
• Hypothesis testing for 1 population with normal
distribution with known variance
• Hypothesis testing for 1 population with normal
distribution with unknown variance
• Central Limit Theorem
• Testing involving 2 populations
• Testing involving variance