
Contents

7.1. Two sample tests with numerical data


7.2. Analysis of variance
7.3. Post hoc analysis

DATA ANALYSIS WITH PYTHON
Lecture 7: Tests of Hypothesis


7.1. Two sample tests with numerical data

• Comparing two independent samples
  - Independent samples z test for the difference in two means
  - Pooled-variance t test for the difference in two means
• Comparing two related samples
  - Paired-sample z test for the mean difference
  - Paired-sample t test for the mean difference

Notation for the two groups:

                            Sample                              Population
                            Group 1        Group 2              Group 1        Group 2
  N                         $n_1$          $n_2$                infinite       infinite
  Mean                      $\bar{X}_1$    $\bar{X}_2$          $\mu_1$        $\mu_2$
  SD (standard deviation)   $S_1$          $S_2$                $\sigma_1$     $\sigma_2$
  Difference                $\bar{X}_1 - \bar{X}_2$             $\mu_1 - \mu_2$
  Status                    Known                               Known/unknown
7.1.1 Pooled-Variance t Test (Variances Unknown) for two independent samples

• Assumptions
  - Both populations are normally distributed
  - Samples are randomly and independently drawn
  - Population variances are unknown but assumed equal
  - If the populations are not normal, large sample sizes are needed

• Setting up the hypotheses
  $H_0: \mu_1 = \mu_2$
  $H_1: \mu_1 \neq \mu_2$

• Calculate the pooled sample variance as an estimate of the common population variance

  $$S_p^2 = \frac{(n_1 - 1) S_1^2 + (n_2 - 1) S_2^2}{(n_1 - 1) + (n_2 - 1)}$$

  $S_p^2$: pooled sample variance    $n_1$: size of sample 1
  $S_1^2$: variance of sample 1      $n_2$: size of sample 2
  $S_2^2$: variance of sample 2
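The pooled variance formula can be sketched in a few lines of standard-library Python. This is a minimal illustration, not the lecture's own code: the function name `pooled_variance` is hypothetical, and the data are the Machine1 and Machine2 columns of the one-way ANOVA example in section 7.2, reused here only to have concrete numbers.

```python
from statistics import variance


def pooled_variance(sample1, sample2):
    """Pooled sample variance S_p^2 of two independent samples."""
    n1, n2 = len(sample1), len(sample2)
    s1_sq = variance(sample1)  # sample variance S_1^2 (divisor n1 - 1)
    s2_sq = variance(sample2)  # sample variance S_2^2 (divisor n2 - 1)
    return ((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / ((n1 - 1) + (n2 - 1))


# Illustrative data: two columns of the machine example in section 7.2.
machine1 = [25.40, 26.31, 24.10, 23.74, 25.10]
machine2 = [23.40, 21.80, 23.50, 22.75, 21.60]

sp2 = pooled_variance(machine1, machine2)
print(f"S_p^2 = {sp2:.4f}")
```

Because both samples have the same size here, the pooled variance is simply the average of the two sample variances.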

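7.1.1 Pooled-Variance t Test (Variances Unknown)

• Compute the sample statistic

  $$t = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{S_p^2 \left( \frac{1}{n_1} + \frac{1}{n_2} \right)}}, \qquad df = n_1 + n_2 - 2$$

  where $(\mu_1 - \mu_2)$ is the hypothesized difference and

  $$S_p^2 = \frac{(n_1 - 1) S_1^2 + (n_2 - 1) S_2^2}{(n_1 - 1) + (n_2 - 1)}$$

• p-value or critical value (CV) solution
  - Reject $H_0$ when the tail p-value is below $\alpha/2 = 0.05/2 = .025$, or equivalently when the test statistic falls below $-CV$ or above $+CV$.
  - [Figure: two-tailed test with a rejection region of area $\alpha/2 = .025$ in each tail, below $-CV$ and above $+CV$; horizontal axis centred at 0 and labelled Z]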
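The test statistic and the critical-value decision above can be sketched with NumPy and SciPy. The sketch assumes a hypothesized difference of 0 and $\alpha = 0.05$, and reuses the Machine1 and Machine2 columns of the section 7.2 example purely as illustrative data. The variable names are hypothetical. `scipy.stats.ttest_ind` with `equal_var=True` is SciPy's pooled-variance t test and serves as a cross-check of the manual computation.

```python
import numpy as np
from scipy import stats

# Illustrative data: two columns of the machine example in section 7.2.
machine1 = np.array([25.40, 26.31, 24.10, 23.74, 25.10])
machine2 = np.array([23.40, 21.80, 23.50, 22.75, 21.60])

n1, n2 = len(machine1), len(machine2)
df = n1 + n2 - 2

# Pooled sample variance S_p^2 (ddof=1 gives the sample variance).
sp2 = ((n1 - 1) * machine1.var(ddof=1) + (n2 - 1) * machine2.var(ddof=1)) / df

# t statistic with hypothesized difference mu_1 - mu_2 = 0 (assumption).
hypothesized_diff = 0.0
t_manual = (machine1.mean() - machine2.mean() - hypothesized_diff) / np.sqrt(
    sp2 * (1 / n1 + 1 / n2)
)

# Cross-check with SciPy's pooled-variance t test (two-sided p-value).
t_scipy, p_two_sided = stats.ttest_ind(machine1, machine2, equal_var=True)

# Critical-value solution for a two-tailed test at alpha = 0.05.
alpha = 0.05
cv = stats.t.ppf(1 - alpha / 2, df)
reject_h0 = abs(t_manual) > cv

print(f"t = {t_manual:.4f}, df = {df}, CV = {cv:.4f}, p = {p_two_sided:.4f}")
print("Reject H0" if reject_h0 else "Do not reject H0")
```

Comparing the two-sided p-value with $\alpha$ gives the same decision as comparing $|t|$ with the critical value.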
7.1.1 Pooled-Variance t Test computed with Python

What can we conclude?

7.1.2 Comparing two independent samples

• Different data sources
  - Unrelated
  - Independent: the sample selected from one population has no effect or bearing on the sample selected from the other population
• Use the difference between the 2 sample means
• Use the Z test or the pooled-variance t test

7.1.2 Independent Sample Z Test (Variances Known)

• Assumptions
  - Samples are randomly and independently drawn from normal distributions
  - Population variances are known
• Test statistic

  $$Z = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}}$$

z test statistic computed with Python
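A minimal sketch of the Z statistic using only the Python standard library follows. The function name `two_sample_z` is hypothetical, and the population standard deviations `sigma1 = sigma2 = 1.0` are assumed values chosen only for illustration, since the lecture does not supply known variances. The data are again the Machine1 and Machine2 columns of the section 7.2 example.

```python
from math import sqrt
from statistics import NormalDist, mean


def two_sample_z(sample1, sample2, sigma1, sigma2, hypothesized_diff=0.0):
    """Z statistic and two-sided p-value when population SDs are known."""
    n1, n2 = len(sample1), len(sample2)
    standard_error = sqrt(sigma1**2 / n1 + sigma2**2 / n2)
    z = (mean(sample1) - mean(sample2) - hypothesized_diff) / standard_error
    p_two_sided = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_two_sided


# Illustrative data: two columns of the machine example in section 7.2.
machine1 = [25.40, 26.31, 24.10, 23.74, 25.10]
machine2 = [23.40, 21.80, 23.50, 22.75, 21.60]

# sigma1 and sigma2 are hypothetical "known" population SDs.
z, p = two_sample_z(machine1, machine2, sigma1=1.0, sigma2=1.0)
print(f"Z = {z:.4f}, two-sided p-value = {p:.6f}")
```

With real data, the known population standard deviations would replace the assumed values, and the p-value would be compared with the chosen $\alpha$.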
7.2. One-way analysis of variance – F Test

• Evaluate the difference among the mean responses of more than two populations
• Assumptions
  - Samples are randomly and independently drawn
  - Populations are normally distributed
  - Populations have equal variances

• Hypotheses of one-way ANOVA
  - $H_0: \mu_1 = \mu_2 = \cdots = \mu_c$
    All population means are equal
  - $H_1$: Not all $\mu_i$ are the same
    At least one population mean is different (others may be the same!)
    Does not mean that all population means are different

• One-way ANOVA (treatment effect present)
  $H_0: \mu_1 = \mu_2 = \cdots = \mu_c$
  $H_1$: Not all $\mu_i$ are the same
  The null hypothesis is NOT true
  [Figure: two example arrangements of the population means $\mu_1$, $\mu_2$, $\mu_3$ in which they are not all equal]

• One-way ANOVA (partition of total variation)
  Total variation (SST) = variation due to group (SSA) + variation due to random sampling (SSW)
• Total variation

  $$SST = \sum_{j=1}^{c} \sum_{i=1}^{n_j} (X_{ij} - \bar{\bar{X}})^2$$

  $X_{ij}$: the $i$-th observation in group $j$
  $n_j$: the number of observations in group $j$
  $n$: the total number of observations in all groups
  $c$: the number of groups

  $$\bar{\bar{X}} = \frac{\sum_{j=1}^{c} \sum_{i=1}^{n_j} X_{ij}}{n}$$
  $\bar{\bar{X}}$: the overall or grand mean

  Written out term by term:

  $$SST = (X_{11} - \bar{\bar{X}})^2 + (X_{21} - \bar{\bar{X}})^2 + \cdots + (X_{n_c c} - \bar{\bar{X}})^2$$

  [Figure: response X for Group 1, Group 2, Group 3, with each observation's deviation measured from the grand mean]

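• Among-group variation

  $$SSA = \sum_{j=1}^{c} n_j (\bar{X}_j - \bar{\bar{X}})^2, \qquad MSA = \frac{SSA}{c - 1}$$

  $\bar{X}_j$: the sample mean of group $j$
  $\bar{\bar{X}}$: the overall or grand mean

  Written out term by term:

  $$SSA = n_1 (\bar{X}_1 - \bar{\bar{X}})^2 + n_2 (\bar{X}_2 - \bar{\bar{X}})^2 + \cdots + n_c (\bar{X}_c - \bar{\bar{X}})^2$$

  [Figure: two group distributions with means $\mu_i$ and $\mu_j$; caption: variation due to differences among groups]
  [Figure: response X for Group 1, Group 2, Group 3, showing the group means $\bar{X}_1$, $\bar{X}_2$, $\bar{X}_3$ around the grand mean $\bar{\bar{X}}$]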
• Within-group variation

  $$SSW = \sum_{j=1}^{c} \sum_{i=1}^{n_j} (X_{ij} - \bar{X}_j)^2, \qquad MSW = \frac{SSW}{n - c}$$

  $\bar{X}_j$: the sample mean of group $j$
  $X_{ij}$: the $i$-th observation in group $j$

  Summing the variation within each group and then adding over all groups:

  $$SSW = (X_{11} - \bar{X}_1)^2 + (X_{21} - \bar{X}_1)^2 + \cdots + (X_{n_c c} - \bar{X}_c)^2$$

  [Figure: response X for Group 1, Group 2, Group 3, with each observation's deviation measured from its own group mean $\bar{X}_1$, $\bar{X}_2$, $\bar{X}_3$]
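The three sums of squares can be computed directly from their definitions, which also confirms the partition SST = SSA + SSW. The sketch below uses NumPy and the three-machine data from the worked example later in this section. The variable names (`groups`, `grand_mean`, and so on) are illustrative choices, not part of the lecture.

```python
import numpy as np

# Data from the one-way ANOVA example later in this section.
groups = {
    "Machine1": np.array([25.40, 26.31, 24.10, 23.74, 25.10]),
    "Machine2": np.array([23.40, 21.80, 23.50, 22.75, 21.60]),
    "Machine3": np.array([20.00, 22.20, 19.75, 20.60, 20.40]),
}

all_obs = np.concatenate(list(groups.values()))
grand_mean = all_obs.mean()

# Total variation: every observation against the grand mean.
sst = ((all_obs - grand_mean) ** 2).sum()

# Among-group variation: group means against the grand mean, weighted by n_j.
ssa = sum(len(x) * (x.mean() - grand_mean) ** 2 for x in groups.values())

# Within-group variation: every observation against its own group mean.
ssw = sum(((x - x.mean()) ** 2).sum() for x in groups.values())

print(f"SST = {sst:.4f}, SSA = {ssa:.4f}, SSW = {ssw:.4f}")
print(f"SSA + SSW = {ssa + ssw:.4f}")
```

The printed SSA and SSW agree with the values 47.164 and 11.0532 used in the worked example, and their sum equals SST.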

• F test statistic
  - Test statistic

    $$F = \frac{MSA}{MSW}$$

    MSA is the mean squares among; MSW is the mean squares within
  - Degrees of freedom
    $df_1 = c - 1$
    $df_2 = n - c$

• Summary table

  Source of Variation   Degrees of Freedom   Sum of Squares    Mean Squares (Variance)   F Statistic
  Among (Factor)        c - 1                SSA               MSA = SSA/(c - 1)         MSA/MSW
  Within (Error)        n - c                SSW               MSW = SSW/(n - c)
  Total                 n - 1                SST = SSA + SSW
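The summary table is mechanical once SSA, SSW, $c$ and $n$ are known. A small sketch in plain Python is given below. The helper `anova_summary` and its dictionary layout are hypothetical, and the input values are the ones from the machine example later in this section.

```python
def anova_summary(ssa, ssw, c, n):
    """Build the one-way ANOVA summary table from SSA, SSW, c groups, n observations."""
    df_among = c - 1
    df_within = n - c
    msa = ssa / df_among   # mean squares among
    msw = ssw / df_within  # mean squares within
    return {
        "among": {"df": df_among, "SS": ssa, "MS": msa, "F": msa / msw},
        "within": {"df": df_within, "SS": ssw, "MS": msw},
        "total": {"df": n - 1, "SS": ssa + ssw},
    }


# Values taken from the machine example later in this section.
table = anova_summary(ssa=47.1640, ssw=11.0532, c=3, n=15)
for source, row in table.items():
    print(source, {key: round(value, 4) for key, value in row.items()})
```

Keeping the table as a plain dictionary makes each cell easy to check against the hand calculation.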
• Features of one-way ANOVA – F statistic
  - The F statistic is the ratio of the among estimate of variance to the within estimate of variance
    - The ratio must always be positive
    - $df_1 = c - 1$ will typically be small
    - $df_2 = n - c$ will typically be large
  - The ratio should be close to 1 if the null hypothesis is true
  - If the null hypothesis is false
    - The numerator should be greater than the denominator
    - The ratio should be larger than 1

  [Figure: F distribution starting at 0 with an upper-tail rejection region of area $\alpha = 0.05$ beyond the critical value (CV), $F = 3.89$; the region is labelled "the null hypothesis is false"]

• One-way ANOVA example

  Machine1   Machine2   Machine3
  25.40      23.40      20.00
  26.31      21.80      22.20
  24.10      23.50      19.75
  23.74      22.75      20.60
  25.10      21.60      20.40

  $\bar{X}_1 = 24.93$, $\bar{X}_2 = 22.61$, $\bar{X}_3 = 20.59$, $\bar{\bar{X}} = 22.71$
  $n_j = 5$, $c = 3$, $n = 15$

  $$SSA = 5 \left[ (24.93 - 22.71)^2 + (22.61 - 22.71)^2 + (20.59 - 22.71)^2 \right] = 47.164$$
  $$SSW = 4.2592 + 3.112 + 3.682 = 11.0532$$
  $$MSA = SSA/(c - 1) = 47.164/2 = 23.5820$$
  $$MSW = SSW/(n - c) = 11.0532/12 = .9211$$

• One-way ANOVA example – summary table

  Source of Variation   Degrees of Freedom   Sum of Squares   Mean Squares (Variance)   F Statistic
  Among (Factor)        3 - 1 = 2            47.1640          23.5820                   MSA/MSW = 25.60
  Within (Error)        15 - 3 = 12          11.0532          .9211
  Total                 15 - 1 = 14          58.2172
• One-way ANOVA example – solutions

  $H_0: \mu_1 = \mu_2 = \mu_3$
  $H_1$: Not all equal
  $\alpha = .05$
  $df_1 = 2$, $df_2 = 12$

  Critical value(s): 3.89
  [Figure: F distribution starting at 0 with a rejection region of area $\alpha = 0.05$ to the right of 3.89]

  Test statistic:

  $$F = \frac{MSA}{MSW} = \frac{23.5820}{.9211} = 25.60$$

  Because $F = 25.60$ exceeds the critical value 3.89, $H_0$ is rejected at $\alpha = .05$.

• F test statistic computed with Python
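One way to reproduce this example in Python is `scipy.stats.f_oneway`, which returns the F statistic and its p-value, together with `scipy.stats.f.ppf` for the critical value. This is a sketch of one possible approach, not necessarily the code shown in the lecture. The variable names are illustrative.

```python
from scipy import stats

machine1 = [25.40, 26.31, 24.10, 23.74, 25.10]
machine2 = [23.40, 21.80, 23.50, 22.75, 21.60]
machine3 = [20.00, 22.20, 19.75, 20.60, 20.40]

# One-way ANOVA F test across the three machines.
f_stat, p_value = stats.f_oneway(machine1, machine2, machine3)

# Upper-tail critical value of F with df1 = c - 1 and df2 = n - c.
alpha = 0.05
c = 3
n = len(machine1) + len(machine2) + len(machine3)
critical_value = stats.f.ppf(1 - alpha, dfn=c - 1, dfd=n - c)

reject_h0 = f_stat > critical_value
print(f"F = {f_stat:.2f}, critical value = {critical_value:.2f}, p-value = {p_value:.6f}")
print("Reject H0" if reject_h0 else "Do not reject H0")
```

The F statistic and critical value round to the 25.60 and 3.89 of the hand calculation, so the p-value approach and the critical-value approach lead to the same decision.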

7.3. Post hoc analysis

• Questions:
  - Is there any difference between the three groups?
  - If there is a difference, which group is different from the others?

  Machine1   Machine2   Machine3
  25.40      23.40      20.00
  26.31      21.80      22.20
  24.10      23.50      19.75
  23.74      22.75      20.60
  25.10      21.60      20.40

• Some methods of post hoc analysis
  - LSD (least significant difference) or Fisher’s method
  - Bonferroni’s method
  - Duncan’s multiple range test
  - Scheffé’s method
  - Tukey’s Honest Significant Difference
  - Dunnett’s test

• Tukey’s Honest Significant Difference
  - HSD = Honest Significant Difference
  - The calculation uses the average number of observations per group
  - If the computed Q statistic is greater than the theoretical Q value (the critical value of Tukey’s Studentized range distribution), the difference between the two groups is statistically significant.

• Tukey’s test is computed with Python as follows:
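A possible way to run Tukey's test on the three-machine data is `scipy.stats.tukey_hsd`, sketched below. This assumes SciPy 1.8 or later, where that function is available, and it is not necessarily the code the lecture showed. For equal group sizes, the standard form of the statistic is $Q = |\bar{X}_i - \bar{X}_j| / \sqrt{MSW/n}$, which is assumed to be what the slide intends. SciPy reports the pairwise mean differences and their p-values rather than Q itself.

```python
from scipy import stats

machine1 = [25.40, 26.31, 24.10, 23.74, 25.10]
machine2 = [23.40, 21.80, 23.50, 22.75, 21.60]
machine3 = [20.00, 22.20, 19.75, 20.60, 20.40]
names = ["Machine1", "Machine2", "Machine3"]

# Pairwise comparisons of all group means (requires SciPy >= 1.8).
result = stats.tukey_hsd(machine1, machine2, machine3)

alpha = 0.05
pairs = [(0, 1), (0, 2), (1, 2)]
significant = {}
for i, j in pairs:
    diff = result.statistic[i, j]   # mean of group i minus mean of group j
    p_value = result.pvalue[i, j]   # p-value for this pairwise comparison
    significant[(i, j)] = p_value < alpha
    print(f"{names[i]} vs {names[j]}: diff = {diff:.2f}, p = {p_value:.4f}, "
          f"significant = {significant[(i, j)]}")
```

Each pair whose p-value is below $\alpha$ corresponds to a computed Q statistic above the theoretical Q value, so the output answers which machines differ from which.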
Practice exercise - Python

Q & A …