10/20/2020

STAL2073 – Chapter 4

Concept of Hypothesis Testing,


Parametric Hypothesis Testing for
One and Two Populations

Hypothesis
• Hypothesis: a statement regarding the population
• To validate a hypothesis, testing is required. A
sample must be obtained, and a testing procedure
must be followed.
• E.g., a pollutant released by a factory is said to
pollute the river, causing growth retardation of
fish in the river
• How do we test and validate this hypothesis?
• Let's begin by understanding some basic
concepts: null hypothesis, alternative hypothesis,
Type I error, Type II error


Basic Concepts in Hypothesis Testing

Type I Error (α) and Type II Error (β)

Decision made              Real situation: Ho correct   Real situation: Ha correct
Accept Ho                  OK                           Type II error
Reject Ho (or accept Ha)   Type I error                 OK


Type I error = error committed when Ho is rejected although Ho is true

Type II error = error committed when Ho is accepted although Ho is false

When making a decision we may commit a Type I error (with probability α) when we reject Ho,
or a Type II error (with probability β) when we accept Ho.

An optimal testing procedure keeps both α and β as small as possible. However, β is
usually much larger than α (β >> α), and only α can be set directly, e.g. α=0.05 or
α=0.01, equivalent to a 95% or 99% level of confidence, respectively.

Hence, in hypothesis testing we reject Ho if the computed statistic allows us to do so,
and the probability of making an error is then only α (5%, 1% or lower).

However, if the computed statistic does not allow us to reject Ho and we decide to
accept Ho, we risk committing a Type II error (β), which is >> α.

When we fail to reject Ho, we therefore should not accept it. Instead, we say that the
data do not support Ha, which requires collecting further evidence or modifying Ha.


Example

• Assume X ~ N(µ,16) and a sample of size n=25 is drawn
from this population
• What are the optimal values of α and β for this test?
• α is usually set directly by the researcher, e.g. α=0.05 or 5%
(or α=0.01, 1%). This means the probability of making a Type I
error is set to 5% (or 1%), which is acceptable.
• In scientific papers, researchers usually report the p-value
instead of α. What is the p-value, and how is it related to α?
• β, on the other hand, cannot be set directly by the researcher,
but the rejection criterion (or Area of Rejection) can be chosen
so that the Type II error is as small as possible.

Rejection Area (or Rejection Set)

$S = \{\bar{X} \mid \bar{X} \ge c\}$
• In this Area of Rejection, we reject Ho if the sample mean equals or exceeds c
• The value of c is determined by the value of α.
• The following is based on the CLT


• We need to use the standard normal distribution
• P{Z ≥ 1.645} = 0.05, hence 1.645 = (c − 10) × 5/4, or
c = 11.316 (here µo = 10 under Ho and σ/√n = 4/5)
• Hence $S = \{\bar{X} \mid \bar{X} \ge 11.316\}$ is the optimum Rejection Area
• If the sample mean we collect equals or exceeds
11.316, we reject Ho at a significance
level of 0.05 or 5% (or level of confidence of 95%)
• What is the Type II error (β) for this Rejection
Area?
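
The cut-off c and the question about β can be explored numerically. The sketch below uses only Python's standard library and assumes a one-sided test with µo = 10, as implied by the term (c − 10). The alternative mean `mu1 = 12` is a hypothetical value chosen for illustration; β can only be computed once a specific alternative is assumed.

```python
from statistics import NormalDist

mu0 = 10.0    # mean under Ho, implied by (c - 10) in the slide
sigma = 4.0   # from X ~ N(mu, 16)
n = 25
alpha = 0.05
mu1 = 12.0    # hypothetical true mean under Ha (assumption, not from the slides)

se = sigma / n ** 0.5                       # standard error of the sample mean
z_alpha = NormalDist().inv_cdf(1 - alpha)   # upper-tail critical value, about 1.645
c = mu0 + z_alpha * se                      # reject Ho when the sample mean >= c

# beta = P(sample mean < c) when the true mean is mu1
beta = NormalDist(mu1, se).cdf(c)

print(f"c = {c:.3f}, beta at mu1 = {mu1}: {beta:.3f}")
```

Moving `mu1` closer to µo makes β larger, which is why failing to reject Ho is weak evidence on its own.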

Hypothesis Testing Involving 1 Population
• Parametric hypothesis testing requires the normality assumption
• Hypothesis testing for 1 mean:
• Case I: Population is normally distributed and σ² is known.
• Test I (two-tail test)
$H_o: \mu = \mu_o$ vs. $H_a: \mu \ne \mu_o$

• Test II
$H_o: \mu \le \mu_o$ vs. $H_a: \mu > \mu_o$

• Test III
$H_o: \mu \ge \mu_o$ vs. $H_a: \mu < \mu_o$

• Test statistic used: $Z = \dfrac{\bar{x} - \mu_o}{\sigma/\sqrt{n}}$


• Rejection:
Test I: Reject Ho if |Z| > zα/2
Test II: Reject Ho if Z > zα
Test III: Reject Ho if Z < −zα

• Case II: Population is normally distributed but σ² is unknown.
For the same tests above, use this test statistic:
$t = \dfrac{\bar{x} - \mu_o}{s/\sqrt{n}}$


• Rejection:
Test I: Reject Ho if |t|>tα/2,n-1
Test II: Reject Ho if t >tα,n-1
Test III: Reject Ho if t < - tα,n-1

• Case III: Population is not normally distributed but sample size n > 30
Use the CLT. The testing procedure is similar to Case II

• Case IV: Population is not normally distributed and sample size n < 30 (small sample size)
Requires transformation of variables or application of
non-parametric testing procedures


• P-value of a hypothesis test
The p-value is the probability, computed under Ho, of obtaining a test statistic
(e.g. Z or t) at least as extreme as the value actually observed. It is compared
with α.
If p < α then Ho is rejected at significance level α (level of confidence
higher than 95% or 99%).
p < 0.05 (level of confidence is higher than 95%)
p < 0.01 (level of confidence is higher than 99%)
p < 0.0001 (highly significant)
If p > α = 0.05 (not significant), Ho is not rejected.
• Confidence interval
In addition to hypothesis testing, confidence
intervals of population parameters (mean, variance)
can be computed
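
To make the link between a test statistic and its p-value concrete, here is a minimal sketch for the Z statistic using Python's standard library. The function name `z_p_value` and the `tail` labels are illustrative choices, not part of the course material.

```python
from statistics import NormalDist

def z_p_value(z, tail="two"):
    """Return the p-value of a Z statistic.

    tail = "two"   for Ha: mu != mu_o
    tail = "upper" for Ha: mu >  mu_o
    tail = "lower" for Ha: mu <  mu_o
    """
    phi = NormalDist().cdf  # standard normal cumulative distribution function
    if tail == "two":
        return 2 * (1 - phi(abs(z)))
    if tail == "upper":
        return 1 - phi(z)
    if tail == "lower":
        return phi(z)
    raise ValueError("tail must be 'two', 'upper' or 'lower'")

# A statistic exactly at the two-tail critical value gives p close to alpha.
print(round(z_p_value(1.96), 3))
```

Ho is rejected when the returned value is below the chosen α, which is equivalent to the critical-value rules listed for Tests I to III.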


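• Case I: (1−α)100% confidence interval for µ of a
normally distributed population N(µ,σ²), σ is
known

$P\left[\bar{x} - z_{\alpha/2}\dfrac{\sigma}{\sqrt{n}} \le \mu \le \bar{x} + z_{\alpha/2}\dfrac{\sigma}{\sqrt{n}}\right] = 1 - \alpha$

$\left[\bar{x} - z_{\alpha/2}\dfrac{\sigma}{\sqrt{n}},\ \bar{x} + z_{\alpha/2}\dfrac{\sigma}{\sqrt{n}}\right]$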

• Case II: As in Case I but the variance is unknown

$\left[\bar{x} - t_{\alpha/2,\,n-1}\dfrac{s}{\sqrt{n}},\ \bar{x} + t_{\alpha/2,\,n-1}\dfrac{s}{\sqrt{n}}\right]$


• Case III: Population is not normal but n > 30
Same as in Case II:

$\left[\bar{x} - t_{\alpha/2,\,n-1}\dfrac{s}{\sqrt{n}},\ \bar{x} + t_{\alpha/2,\,n-1}\dfrac{s}{\sqrt{n}}\right]$

• Interpretation of Confidence Interval:
If α=0.05 and a sample of size n is taken
repeatedly 100 times and for each sample a
confidence interval is computed, then out of the
100 intervals, about 95 are expected to contain µ and about 5 are not.
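
This repeated-sampling interpretation can be checked by simulation. The sketch below assumes a normal population with known σ (Case I) and uses hypothetical values µ = 10, σ = 4, n = 25; the number of repeats and the random seed are also arbitrary choices.

```python
import random
from statistics import NormalDist, mean

def coverage(mu=10.0, sigma=4.0, n=25, alpha=0.05, repeats=2000, seed=1):
    """Fraction of simulated z-intervals that contain the true mean mu."""
    rng = random.Random(seed)                 # fixed seed so the run is repeatable
    z = NormalDist().inv_cdf(1 - alpha / 2)   # z_{alpha/2}
    half_width = z * sigma / n ** 0.5
    hits = 0
    for _ in range(repeats):
        xbar = mean(rng.gauss(mu, sigma) for _ in range(n))
        if xbar - half_width <= mu <= xbar + half_width:
            hits += 1
    return hits / repeats

print(coverage())
```

The printed fraction should be near 0.95 but not exactly 0.95, which is the point of the interpretation: the coverage is a long-run property of the procedure, not of any single interval.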


Testing for Data Normality
• For a small sample size of n < 30, normality
testing is required prior to parametric
hypothesis testing
• Several methods can be used:
– Q-Q plot
– Shapiro-Wilk normality test
• We will learn to do these testing procedures in R
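
Although the course does this in R, the idea behind a Q-Q plot can be sketched with Python's standard library: sort the data and pair each value with the quantile a fitted normal distribution would give at the same position. The plotting positions (i − 0.5)/n are one common convention and are an assumption here, the sample values are hypothetical, and the correlation of the pairs is only an informal summary, not the Shapiro-Wilk test.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

def qq_points(data):
    """Return (theoretical quantile, observed value) pairs for a normal Q-Q plot."""
    xs = sorted(data)
    n = len(xs)
    fitted = NormalDist(mean(xs), stdev(xs))
    probs = [(i - 0.5) / n for i in range(1, n + 1)]  # assumed plotting positions
    return [(fitted.inv_cdf(p), x) for p, x in zip(probs, xs)]

def qq_correlation(data):
    """Pearson correlation of the Q-Q pairs; values near 1 suggest normality."""
    pairs = qq_points(data)
    t_mean = mean(t for t, _ in pairs)
    o_mean = mean(o for _, o in pairs)
    sxy = sum((t - t_mean) * (o - o_mean) for t, o in pairs)
    sxx = sum((t - t_mean) ** 2 for t, _ in pairs)
    syy = sum((o - o_mean) ** 2 for _, o in pairs)
    return sxy / sqrt(sxx * syy)

# Hypothetical sample, for illustration only
sample = [4.1, 5.3, 4.8, 5.9, 5.0, 4.6, 5.5, 5.2, 4.4, 5.7]
r = qq_correlation(sample)
print(f"Q-Q correlation: {r:.3f}")
```

If the points fall close to a straight line (correlation near 1), the normality assumption looks reasonable; strong curvature suggests otherwise.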


Hypothesis Testing Involving Variance (1 sample)
• Three types of tests
Case I: Ho: σ² = σo² vs. Ha: σ² ≠ σo² (two-tail)
Case II: Ho: σ² ≤ σo² vs. Ha: σ² > σo² (one-tail)
Case III: Ho: σ² ≥ σo² vs. Ha: σ² < σo² (one-tail)
• If X~N(µ,σ²) then
$\chi^2 = \dfrac{(n-1)s^2}{\sigma^2}$ will be distributed $\chi^2(n-1)$

• Test statistic: $\chi^2 = \dfrac{(n-1)s^2}{\sigma_o^2}$


• Rejection area at significance level α

Case I: Reject Ho if $\chi^2 < \chi^2_{1-\alpha/2,\,n-1}$ or $\chi^2 > \chi^2_{\alpha/2,\,n-1}$

Case II: Reject Ho if $\chi^2 > \chi^2_{\alpha,\,n-1}$

Case III: Reject Ho if $\chi^2 < \chi^2_{1-\alpha,\,n-1}$

• Confidence interval (CI) for σ²

$\left[\dfrac{(n-1)s^2}{\chi^2_{\alpha/2,\,n-1}},\ \dfrac{(n-1)s^2}{\chi^2_{1-\alpha/2,\,n-1}}\right]$


Testing Involving Variances of 2 Populations (2 samples)
• Three types of tests
Case I: Ho: σA² = σB² vs. Ha: σA² ≠ σB² (two-tail)
Case II: Ho: σA² ≤ σB² vs. Ha: σA² > σB² (one-tail)
Case III: Ho: σA² ≥ σB² vs. Ha: σA² < σB² (one-tail)
• Test statistic:

$F = \dfrac{s_A^2}{s_B^2}$
• If both samples are drawn from normally distributed
populations, then F follows an F distribution with nA−1 and nB−1 degrees of freedom.


TEST I:
Reject Ho if $F < F_{1-\alpha/2,\,n_A-1,\,n_B-1}$ or $F > F_{\alpha/2,\,n_A-1,\,n_B-1}$

TEST II:
Reject Ho if $F > F_{\alpha,\,n_A-1,\,n_B-1}$

TEST III:
Reject Ho if $F < F_{1-\alpha,\,n_A-1,\,n_B-1}$


5. Hypothesis testing of two population means (Independent samples)
• Two samples are said to be independent if one does not influence the
other
CASE I (two-tail)
$H_o: \mu_1 = \mu_2$ vs. $H_a: \mu_1 \ne \mu_2$

CASE II (one-tail)
$H_o: \mu_1 \le \mu_2$ vs. $H_a: \mu_1 > \mu_2$

CASE III (one-tail)
$H_o: \mu_1 \ge \mu_2$ vs. $H_a: \mu_1 < \mu_2$


CASE I (Both populations are normal and variances are known)

Test statistic:
$Z = \dfrac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}}$

The rejection criterion is the same as for testing 1 population


CASE II (Both populations are normally distributed but variances are unknown and unequal)

Test statistic:
$t = \dfrac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}$

The rejection criterion is the same as before, but the computed test statistic is compared
with the t distribution with degrees of freedom (df)

$df = \dfrac{\left(\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}\right)^2}{\dfrac{\left(s_1^2/n_1\right)^2}{n_1 - 1} + \dfrac{\left(s_2^2/n_2\right)^2}{n_2 - 1}}$
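
Because the degrees-of-freedom formula is easy to mistype by hand, a small helper may be useful. This is a sketch of the unequal-variance statistic and df exactly as written above; the function name `welch_t` and the summary statistics in the usage line are hypothetical.

```python
from math import sqrt

def welch_t(x1bar, s1_sq, n1, x2bar, s2_sq, n2):
    """Return (t, df) for two independent samples with unequal variances."""
    v1 = s1_sq / n1   # s1^2 / n1
    v2 = s2_sq / n2   # s2^2 / n2
    t = (x1bar - x2bar) / sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df

# Hypothetical summary statistics, for illustration only
t, df = welch_t(x1bar=12.0, s1_sq=4.0, n1=10, x2bar=10.0, s2_sq=9.0, n2=15)
print(f"t = {t:.3f}, df = {df:.2f}")
```

The resulting df is generally not an integer; when reading a printed t table it is usually rounded down to stay on the conservative side.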


CASE III (Both are normally distributed, variances unknown but equal)

Test statistic:
$t = \dfrac{\bar{x}_1 - \bar{x}_2}{s_{pooled}\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}$

$s^2_{pooled} = \dfrac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$

The rejection criterion is the same as before, but the test statistic is compared
with the t distribution with df = n1+n2−2.


CASE IV (Both are not normal, variances are unknown but equal and n1, n2 > 30)

Using the CLT, CASE IV is similar to CASE III.


Hypothesis testing involving 2 dependent samples
Comparing 2 samples from the same locations or the same sampling units.
Example: To test the effectiveness of a fish food pellet, fish weights are
measured before and after the pellet is administered.

For each fish we have two readings: Before and After.

We cannot treat these as two independent samples; they are two
dependent samples.

We consider the difference between the two readings, d = weight after − weight before


Hypothesis testing:
Ho: µd = 0 vs. Ha: µd ≠ 0 (two-tail)
Ho: µd ≤ 0 vs. Ha: µd > 0 (one-tail)
Ho: µd ≥ 0 vs. Ha: µd < 0 (one-tail)

If the population is normally distributed and the variance is known, we can use:
$Z_d = \dfrac{\bar{d}}{\sigma_d/\sqrt{n}}$

Otherwise, as before, use the t distribution.


Example #1
Assume X ~ N(µ,(78.5)²). One sample of
n=100 is taken and x̄ = 520.
The researcher wants to test this hypothesis:
Ho: µ=500 vs. Ha: µ ≠ 500
using α=0.05.
This is a two-tail test with σ known.
Test statistic:
$Z = \dfrac{520 - 500}{78.5/\sqrt{100}} = 2.55$
Reject Ho if |Z| > zα/2 = z0.025 = 1.96.
Decision: Reject Ho. What is the p-value for this
test?
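
The p-value asked for here can be obtained from the standard normal distribution. The following is a minimal sketch with Python's standard library using the numbers of Example #1; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

n, xbar, sigma, mu0, alpha = 100, 520.0, 78.5, 500.0, 0.05

z = (xbar - mu0) / (sigma / sqrt(n))             # test statistic
p_value = 2 * (1 - NormalDist().cdf(abs(z)))     # two-tail p-value
z_crit = NormalDist().inv_cdf(1 - alpha / 2)     # about 1.96

print(f"Z = {z:.2f}, critical value = {z_crit:.2f}, p-value = {p_value:.4f}")
print("Reject Ho" if p_value < alpha else "Fail to reject Ho")
```

Both routes agree: |Z| exceeds the critical value and, equivalently, the p-value is below α.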

Example #2
X~N(µ, σ²), σ is unknown and
n=25, x̄=520, s=75
Ho: µ=500 vs. Ha: µ ≠ 500

Test statistic: $t = \dfrac{520 - 500}{75/\sqrt{25}} = 1.33$

From the t distribution, t0.025,24 = 2.064

Decision: Fail to reject Ho; the data do not support Ha.
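
The same decision can be read from the 95% confidence interval of Case II. The sketch below reuses the tabulated value t0.025,24 = 2.064 quoted above rather than computing it, since Python's standard library has no t distribution; everything else follows the formulas in these notes.

```python
from math import sqrt

n, xbar, s, mu0 = 25, 520.0, 75.0, 500.0
t_crit = 2.064   # t_{0.025,24}, tabulated value quoted in the example

se = s / sqrt(n)                 # estimated standard error, 75 / 5
t_stat = (xbar - mu0) / se       # test statistic
ci = (xbar - t_crit * se, xbar + t_crit * se)   # 95% confidence interval for mu

reject = abs(t_stat) > t_crit
print(f"t = {t_stat:.2f}, 95% CI = ({ci[0]:.2f}, {ci[1]:.2f}), reject Ho: {reject}")
```

The interval contains the hypothesised value 500, which is another way of seeing that Ho is not rejected at α=0.05.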

Example #3
Testing involving the variance of 1 population.
Consider the following data: 31.2 32.5 30.8
31.5 29.5 31.1 31.3 30.7 26.7 29.2 32.1 28.3
31.6 29.2 31.5 29.7 30.4 31.0 29.1 30.5
Ho: σ ≤ 0.5 vs. Ha: σ > 0.5
From the sample standard deviation formula, s = 1.41
$\chi^2 = \dfrac{(20-1)s^2}{\sigma_o^2} = 19\left(\dfrac{1.41}{0.5}\right)^2 = 151.1 > \chi^2_{0.05,19} = 30.1435$

Decision: Reject Ho.
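
The value of s and the test statistic can be reproduced from the raw data. This sketch uses Python's standard library and compares against the tabulated critical value 30.1435 quoted above. Because it keeps s unrounded, the statistic comes out slightly different from the hand calculation with s = 1.41; the decision is unchanged.

```python
from statistics import stdev

data = [31.2, 32.5, 30.8, 31.5, 29.5, 31.1, 31.3, 30.7, 26.7, 29.2,
        32.1, 28.3, 31.6, 29.2, 31.5, 29.7, 30.4, 31.0, 29.1, 30.5]
sigma0 = 0.5
chi2_crit = 30.1435   # chi-square_{0.05,19}, tabulated value quoted in the example

n = len(data)
s = stdev(data)                            # sample standard deviation (divisor n - 1)
chi2 = (n - 1) * s ** 2 / sigma0 ** 2      # test statistic

print(f"s = {s:.2f}, chi-square = {chi2:.1f}")
print("Reject Ho" if chi2 > chi2_crit else "Fail to reject Ho")
```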



Example #4
A researcher is of the opinion that fish in Pool A are bigger than
those in Pool B. One sample of n=12 is taken from Pool A
and another with n=11 from Pool B. All fish are weighed.

Pool A (Weight in g): [76.9 107.2 85.5 112.3 90.1 114.8 96.4
119.8 98.9 124.9 104.4 134.5]
Pool B (Weight in g): [41.9 84.8 52.8 91.8 58.3 95.2 61.3 104.9
74.9 114.3 82.3]

Before hypothesis testing can be done, because both sample
sizes are < 30, both data sets must be tested for normality


• Assuming this is done (later we learn in R how
to do this), the next step is to test if the
variances are equal.

$\bar{x}_A = 105.5,\ \bar{x}_B = 78.4$
$s_A^2 = 285.5,\ s_B^2 = 520.5$


$F = \dfrac{520.5}{285.5} = 1.82 < F_{0.025,10,11} = 3.53$

Decision: Fail to reject Ho. The data give no evidence that the variances differ, so they are treated as equal.


Hence we "pool" (average) the variances.

$s_p^2 = \dfrac{11 s_A^2 + 10 s_B^2}{11 + 10} = \dfrac{11(285.5) + 10(520.5)}{21} = 397.4$

$t = \dfrac{\bar{x}_A - \bar{x}_B}{\sqrt{s_p^2\left(\dfrac{1}{12} + \dfrac{1}{11}\right)}} = \dfrac{105.5 - 78.4}{\sqrt{397.4\left(\dfrac{1}{12} + \dfrac{1}{11}\right)}} \approx 3.25$

Can Ho be rejected at α=0.05? What is the p-value of this test?
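
The summary statistics, the F ratio, the pooled variance and the t statistic can all be recomputed from the two pools. The sketch below uses only Python's standard library and stops at the test statistic; the critical value or p-value for 21 degrees of freedom would still come from a t table or from R. With unrounded means the t statistic rounds to 3.25.

```python
from math import sqrt
from statistics import mean, variance

pool_a = [76.9, 107.2, 85.5, 112.3, 90.1, 114.8, 96.4,
          119.8, 98.9, 124.9, 104.4, 134.5]
pool_b = [41.9, 84.8, 52.8, 91.8, 58.3, 95.2, 61.3, 104.9,
          74.9, 114.3, 82.3]

n_a, n_b = len(pool_a), len(pool_b)
var_a, var_b = variance(pool_a), variance(pool_b)   # sample variances (divisor n - 1)

# F ratio with the larger variance on top, as in the calculation above
f_stat = var_b / var_a

# Pooled variance and pooled two-sample t statistic (equal variances assumed)
sp_sq = ((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2)
t_stat = (mean(pool_a) - mean(pool_b)) / sqrt(sp_sq * (1 / n_a + 1 / n_b))

print(f"F = {f_stat:.2f}, pooled variance = {sp_sq:.1f}, t = {t_stat:.2f}")
```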


Example #5
The effectiveness of a pellet food for fish growth was tested on 12
fish in a laboratory. The researcher believes that the food can
increase the weight of the fish by 10 g within a week.

This is an example of
dependent samples (before and
after, but readings are obtained
from the same subject).

A study such as this cannot be
treated as 2 independent
samples.

Ho: µd ≤ 10 vs. Ha: µd > 10

$s_d = \sqrt{\dfrac{1}{n-1}\sum (d - \bar{d})^2} = \sqrt{\dfrac{\sum d^2 - \dfrac{\left(\sum d\right)^2}{n}}{n-1}} = 5.116$

$t = \dfrac{\bar{d} - 10}{s_d/\sqrt{n}} = 2.34$

Can Ho be rejected?
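
The paired calculation can be sketched as follows. The before and after weights of the 12 fish are not reproduced in these notes, so the readings below are hypothetical and will not give s_d = 5.116 or t = 2.34; the point is only to show the steps and to confirm that the two formulas for s_d agree.

```python
from math import sqrt
from statistics import mean

# Hypothetical before/after weights (g) for 12 fish, for illustration only
before = [100.2, 98.5, 105.1, 110.4, 95.0, 102.3,
          99.8, 104.6, 101.1, 97.4, 108.9, 103.0]
after = [113.0, 109.1, 120.4, 118.9, 111.2, 112.0,
         117.5, 116.3, 119.8, 104.9, 122.1, 118.4]
mu_d0 = 10.0   # hypothesised mean gain under Ho

d = [a - b for a, b in zip(after, before)]   # d = weight after - weight before
n = len(d)
d_bar = mean(d)

# Definition form and computational (shortcut) form of s_d
sd_definition = sqrt(sum((x - d_bar) ** 2 for x in d) / (n - 1))
sd_shortcut = sqrt((sum(x * x for x in d) - sum(d) ** 2 / n) / (n - 1))

t_stat = (d_bar - mu_d0) / (sd_definition / sqrt(n))
print(f"d_bar = {d_bar:.2f}, s_d = {sd_definition:.3f}, t = {t_stat:.2f}")
```

The statistic is then compared with the upper-tail critical value of the t distribution with n − 1 = 11 degrees of freedom.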


Important Concepts
• Hypothesis testing concepts
• Type I, Type II error, α, β, p-value
• Rejection area (or set)
• Hypothesis testing for 1 population with normal
distribution with known variance
• Hypothesis testing for 1 population with normal
distribution with unknown variance
• Central Limit Theorem
• Testing involving 2 populations
• Testing involving variance
