Type I and Type II Errors


Type I and Type II errors

Type I error refers to the situation when we reject the null hypothesis when it is true (H0 is wrongly rejected). It is denoted by α.

Type II error refers to the situation when we accept the null hypothesis when it is false (H0 is wrongly accepted). It is denoted by β.
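To make the Type I error concrete, here is a minimal simulation sketch (not from the slides): it repeatedly tests a true null hypothesis with a two-sample t-test and counts how often H0 is wrongly rejected, which should happen in roughly α of the experiments. The sample sizes, normal population, and α below are illustrative assumptions.

```python
# A minimal simulation sketch (not from the slides): when H0 is true, a test
# run at significance level alpha should commit a Type I error (wrongly
# reject H0) in roughly alpha of repeated experiments.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
alpha, n_sims, n = 0.05, 10_000, 30
false_rejections = 0

for _ in range(n_sims):
    # Both samples come from the same population, so H0 (equal means) is true.
    a = rng.normal(loc=0.0, scale=1.0, size=n)
    b = rng.normal(loc=0.0, scale=1.0, size=n)
    _, p = stats.ttest_ind(a, b)
    if p < alpha:
        false_rejections += 1  # a Type I error

print(f"Estimated Type I error rate: {false_rejections / n_sims:.3f}")  # close to 0.05
```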
Power and Sample Size
• The power of a test is the probability of correctly rejecting the null hypothesis.
• It is denoted by 1 − β, where β is the probability of a Type II error.
• The power of a test improves as the sample size increases.
• Power is used to determine the necessary sample size.
• The power of a hypothesis test depends on the true difference between the population means.
• A larger sample size is required to detect a smaller difference in the means.
• In general, the effect size d is the difference between the means (commonly standardized by dividing by the standard deviation, as in Cohen's d).
• It is important to consider an appropriate effect size for the
problem at hand
Power and Sample Size
A larger sample size better identifies a fixed effect size
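As a rough illustration of how power, effect size, and sample size trade off, here is a minimal sketch (not from the slides) using the statsmodels power calculator for a two-sample t-test; the effect sizes, α, and target power are illustrative assumptions.

```python
# A minimal sketch (not from the slides) of a power / sample-size calculation
# for comparing two means with a t-test.
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()

# Sample size per group needed to detect a medium effect (d = 0.5)
# with 80% power at the 5% significance level.
n_per_group = analysis.solve_power(effect_size=0.5, alpha=0.05, power=0.8)
print(f"Required n per group for d = 0.5: {n_per_group:.1f}")

# With that same sample size, a smaller effect (d = 0.2) is detected with
# much lower power, illustrating that smaller differences need larger samples.
power_small = analysis.solve_power(effect_size=0.2, alpha=0.05, nobs1=64)
print(f"Power for d = 0.2 with n = 64 per group: {power_small:.2f}")
```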
ANOVA: Analysis of Variance
• Used when there are more than 2 populations (groups)
• ANOVA is a generalization of the hypothesis testing of the difference of
two population means.
• ANOVA tests if any of the population means differ from the other
population means.
• The null hypothesis of ANOVA is that all the population means are equal.
The alternative hypothesis is that at least one pair of the population
means is not equal
• H0: µ1 = µ2 = … = µk
• HA: µi ≠ µj for at least one pair of i, j
ANOVA: Analysis of Variance
• Analysis of Variance (ANOVA) is a hypothesis-testing technique used to test the equality
of two or more population (or treatment) means by examining the variances of samples
that are taken.
• ANOVA allows one to determine whether the differences between the samples are
simply due to random error (sampling error) or whether there are systematic treatment
effects that cause the mean in one group to differ from the mean in another.
• ANOVA is based on comparing the variance (or variation) between the data samples to
the variation within each particular sample.
• If the between-groups variation is much larger than the within-group variation, the
means of the different samples are unlikely to be equal.
• If the between-groups and within-group variations are of approximately the same
magnitude, there will be no significant difference between the sample means.
Assumptions of ANOVA:
(i) All populations involved follow a normal distribution.
(ii) All populations have the same variance (or standard deviation).
(iii) The samples are randomly selected and independent of one another
• Since ANOVA assumes the populations involved follow a normal
distribution, ANOVA falls into a category of hypothesis tests known as
parametric tests.
• If the populations involved do not follow a normal distribution, an
ANOVA test cannot be used to examine the equality of the population
means.
ANOVA: Analysis of Variance
• Calculate the test statistic.
• The goal is to test whether the clusters formed by each population are more
tightly grouped than the spread across all the populations.
• Let the total number of populations be k.
• The total number of samples N is randomly split into the k groups. The number of samples in the i-th group is denoted as $n_i$, and the mean of that group is denoted as $\bar{y}_i$, where $i \in [1, k]$. The mean of all the samples is denoted as $\bar{y}_0$ (the grand mean).
• The between-groups mean sum of squares, $S_B^2$, is an estimate of the between-groups variance. It measures how the population means vary with respect to the grand mean, or the mean spread across all the populations:

$$S_B^2 = \frac{1}{k-1}\sum_{i=1}^{k} n_i\,(\bar{y}_i - \bar{y}_0)^2$$
ANOVA: Analysis of Variance
• The within-group mean sum of squares, $S_W^2$, is an estimate of the within-group variance. It quantifies the spread of values within groups:

$$S_W^2 = \frac{1}{N-k}\sum_{i=1}^{k}\sum_{j=1}^{n_i} (y_{ij} - \bar{y}_i)^2$$
ANOVA: Analysis of Variance
• The F-test statistic in ANOVA is the ratio of the two variance estimates, $F = S_B^2 / S_W^2$, and can be thought of as a measure of how different the means are relative to the variability within each group.
• The larger the observed F-test statistic, the greater the likelihood that the differences between the means are due to something other than chance alone.
• The F-test statistic is used to test the hypothesis that the observed effects are not due to chance; that is, whether the means are significantly different from one another.
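A minimal sketch (not from the slides) of these definitions in code: it computes $S_B^2$, $S_W^2$, and their ratio F by hand on three made-up groups and checks the result against scipy.stats.f_oneway.

```python
# A minimal sketch (not from the slides): compute the ANOVA F-statistic
# "by hand" as F = S_B^2 / S_W^2 and check it against scipy.stats.f_oneway.
# The three groups below are made-up illustrative data.
import numpy as np
from scipy import stats

groups = [np.array([4.1, 5.0, 4.7, 5.3]),
          np.array([5.9, 6.2, 5.5, 6.0]),
          np.array([4.8, 5.1, 5.4, 4.9])]

k = len(groups)
N = sum(len(g) for g in groups)
grand_mean = np.concatenate(groups).mean()

# Between-groups and within-group mean sums of squares.
s2_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups) / (k - 1)
s2_within = sum(((g - g.mean()) ** 2).sum() for g in groups) / (N - k)

f_manual = s2_between / s2_within
f_scipy, p_value = stats.f_oneway(*groups)

print(f"F by hand: {f_manual:.3f}  F from scipy: {f_scipy:.3f}  p = {p_value:.4f}")
```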
ANOVA
for comparing means between more than 2 groups

Variance: average of squared differences from mean.


Hypothesis of One-Way ANOVA
• H0: μ1 = μ2 = μ3 = … = μc
• All population means are equal
• i.e., no treatment effect (no variation in means among groups)

• H1: Not all of the population means are the same
• At least one population mean is different
• i.e., there is a treatment effect
• Does not mean that all population means are different (some pairs may be the same)
The F-distribution
• A ratio of variances follows an F-distribution:

$$\frac{\sigma^2_{\text{between}}}{\sigma^2_{\text{within}}} \sim F_{n,m}$$

• The F-test tests the hypothesis that two variances are equal.
• F will be close to 1 if the sample variances are equal.

$$H_0: \sigma^2_{\text{between}} = \sigma^2_{\text{within}} \qquad H_a: \sigma^2_{\text{between}} \neq \sigma^2_{\text{within}}$$
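For reference, the F-distribution itself is available in scipy; the sketch below (not from the slides) looks up a 5% critical value and an upper-tail p-value for the 3 and 36 degrees of freedom that appear in the worked example later on. The observed F value used here is illustrative.

```python
# A minimal sketch (not from the slides): looking up F-distribution critical
# values and p-values with scipy. The degrees of freedom (3 and 36) match the
# 4-group, n = 10 worked example later; the observed F value is illustrative.
from scipy import stats

df_between, df_within = 3, 36  # k - 1 and N - k for k = 4 groups of n = 10

# Critical value: reject H0 at the 5% level if the observed F exceeds this.
f_crit = stats.f.ppf(0.95, df_between, df_within)

# Upper-tail p-value for an observed F-statistic.
f_observed = 1.14
p_value = stats.f.sf(f_observed, df_between, df_within)

print(f"5% critical value: {f_crit:.2f}   p-value for F = {f_observed}: {p_value:.3f}")
```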
How to calculate ANOVAs by hand…
 
Treatment 1   Treatment 2   Treatment 3   Treatment 4
y11           y21           y31           y41
y12           y22           y32           y42
y13           y23           y33           y43
y14           y24           y34           y44
y15           y25           y35           y45
y16           y26           y36           y46
y17           y27           y37           y47
y18           y28           y38           y48
y19           y29           y39           y49
y110          y210          y310          y410

(n = 10 observations per group, k = 4 groups)

The group means:

$$\bar{y}_{1\cdot} = \frac{\sum_{j=1}^{10} y_{1j}}{10} \qquad \bar{y}_{2\cdot} = \frac{\sum_{j=1}^{10} y_{2j}}{10} \qquad \bar{y}_{3\cdot} = \frac{\sum_{j=1}^{10} y_{3j}}{10} \qquad \bar{y}_{4\cdot} = \frac{\sum_{j=1}^{10} y_{4j}}{10}$$

The (within) group variances:

$$\frac{\sum_{j=1}^{10} (y_{1j} - \bar{y}_{1\cdot})^2}{10-1} \qquad \frac{\sum_{j=1}^{10} (y_{2j} - \bar{y}_{2\cdot})^2}{10-1} \qquad \frac{\sum_{j=1}^{10} (y_{3j} - \bar{y}_{3\cdot})^2}{10-1} \qquad \frac{\sum_{j=1}^{10} (y_{4j} - \bar{y}_{4\cdot})^2}{10-1}$$
Sum of Squares Within (SSW), or Sum of
Squares Error (SSE)
Take the numerators of the four within-group variances above and add them across all groups and observations:

$$SSW = \sum_{j=1}^{10} (y_{1j} - \bar{y}_{1\cdot})^2 + \sum_{j=1}^{10} (y_{2j} - \bar{y}_{2\cdot})^2 + \sum_{j=1}^{10} (y_{3j} - \bar{y}_{3\cdot})^2 + \sum_{j=1}^{10} (y_{4j} - \bar{y}_{4\cdot})^2 = \sum_{i=1}^{4}\sum_{j=1}^{10} (y_{ij} - \bar{y}_{i\cdot})^2$$
Sum of Squares Between (SSB), or Sum of Squares
Regression (SSR)

The overall mean of all 40 observations (the "grand mean"):

$$\bar{y}_{\cdot\cdot} = \frac{\sum_{i=1}^{4}\sum_{j=1}^{10} y_{ij}}{40}$$

The Sum of Squares Between (SSB) measures the variability of the group means compared to the grand mean (the variability due to the treatment):

$$SSB = 10 \times \sum_{i=1}^{4} (\bar{y}_{i\cdot} - \bar{y}_{\cdot\cdot})^2$$
Total Sum of Squares (SST)

The Total Sum of Squares (TSS) is the squared difference of every observation from the overall mean (the numerator of the variance of Y):

$$TSS = \sum_{i=1}^{4}\sum_{j=1}^{10} (y_{ij} - \bar{y}_{\cdot\cdot})^2$$
Partitioning of Variance

$$\sum_{i=1}^{4}\sum_{j=1}^{10} (y_{ij} - \bar{y}_{i\cdot})^2 \;+\; 10 \times \sum_{i=1}^{4} (\bar{y}_{i\cdot} - \bar{y}_{\cdot\cdot})^2 \;=\; \sum_{i=1}^{4}\sum_{j=1}^{10} (y_{ij} - \bar{y}_{\cdot\cdot})^2$$

SSW + SSB = TSS
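The partition can be checked numerically; the sketch below (not from the slides) computes SSW, SSB, and TSS by hand on made-up groups and confirms SSW + SSB = TSS.

```python
# A minimal sketch (not from the slides): compute SSW, SSB, and TSS directly
# and verify the partition SSW + SSB = TSS. The groups are made-up data.
import numpy as np

groups = [np.array([3.0, 4, 5, 4]),
          np.array([6.0, 7, 8, 7]),
          np.array([4.0, 5, 6, 5])]

all_obs = np.concatenate(groups)
grand_mean = all_obs.mean()

# Within: squared deviations of observations from their own group mean.
ssw = sum(((g - g.mean()) ** 2).sum() for g in groups)
# Between: squared deviations of group means from the grand mean, weighted by group size.
ssb = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
# Total: squared deviations of every observation from the grand mean.
tss = ((all_obs - grand_mean) ** 2).sum()

print(f"SSW = {ssw:.1f}  SSB = {ssb:.1f}  TSS = {tss:.1f}  SSW + SSB = {ssw + ssb:.1f}")
```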


ANOVA Table
     

Between (k groups): d.f. = k − 1; Sum of squares = SSB (sum of squared deviations of group means from the grand mean); Mean Sum of Squares = SSB/(k − 1); F-statistic = [SSB/(k − 1)] / [SSW/(nk − k)], distributed as F(k − 1, nk − k); p-value: go to the F chart.

Within (n individuals per group): d.f. = nk − k; Sum of squares = SSW (sum of squared deviations of observations from their group mean); Mean Sum of Squares = s² = SSW/(nk − k).

Total variation: d.f. = nk − 1; Sum of squares = TSS (sum of squared deviations of observations from the grand mean); TSS = SSB + SSW.
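An ANOVA table like this can also be produced programmatically; the sketch below (not from the slides) uses statsmodels on a small made-up data frame, assuming pandas and statsmodels are available.

```python
# A minimal sketch (not from the slides): produce a one-way ANOVA table with
# statsmodels. The data frame below is made-up illustrative data.
import pandas as pd
from statsmodels.formula.api import ols
from statsmodels.stats.anova import anova_lm

df = pd.DataFrame({
    "y": [3, 4, 5, 4, 6, 7, 8, 7, 4, 5, 6, 5],
    "group": ["a"] * 4 + ["b"] * 4 + ["c"] * 4,
})

# Fit the one-way ANOVA model and print d.f., sums of squares, mean squares,
# F-statistic, and p-value for the group factor and the residual (within) term.
model = ols("y ~ C(group)", data=df).fit()
print(anova_lm(model))
```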
Example

Treatment 1 Treatment 2 Treatment 3 Treatment 4


60 inches 50 48 47
67 52 49 67
42 43 50 54
67 67 55 67
56 67 56 68
62 59 61 65
64 67 61 65
59 64 60 56
72 63 59 60
71 65 64 65
Example

Step 1) Calculate the sum of squares between groups:

Mean for group 1 = 62.0
Mean for group 2 = 59.7
Mean for group 3 = 56.3
Mean for group 4 = 61.4

Grand mean = 59.85

SSB = [(62-59.85)² + (59.7-59.85)² + (56.3-59.85)² + (61.4-59.85)²] × n per group = 19.65 × 10 = 196.5


Example

Step 2) Calculate the sum of squares within groups:

(60-62)² + (67-62)² + (42-62)² + (67-62)² + (56-62)² + (62-62)² + (64-62)² + (59-62)² + (72-62)² + (71-62)² + (50-59.7)² + (52-59.7)² + (43-59.7)² + (67-59.7)² + (67-59.7)² + (59-59.7)² + … + (sum of 40 squared deviations) = 2060.6
Step 3) Fill in the ANOVA table

Source of variation    d.f.    Sum of squares    Mean Sum of Squares    F-statistic    p-value
Between                  3          196.5               65.5               1.14          0.344
Within                  36         2060.6               57.2
Total                   39         2257.1
INTERPRETATION of ANOVA:
How much of the variance in height is explained by treatment group?
R² = "Coefficient of Determination" = SSB/TSS = 196.5/2257.1 ≈ 9%
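The worked example above can be reproduced directly; the sketch below (not from the slides) feeds the four treatment columns from the data table to scipy and also computes R², which should come out near F ≈ 1.14, p ≈ 0.344, and R² ≈ 9%.

```python
# A minimal sketch (not from the slides): reproduce the worked example above
# with scipy, using the four treatment columns from the data table.
from scipy import stats

t1 = [60, 67, 42, 67, 56, 62, 64, 59, 72, 71]
t2 = [50, 52, 43, 67, 67, 59, 67, 64, 63, 65]
t3 = [48, 49, 50, 55, 56, 61, 61, 60, 59, 64]
t4 = [47, 67, 54, 67, 68, 65, 65, 56, 60, 65]
all_obs = t1 + t2 + t3 + t4

f_stat, p_value = stats.f_oneway(t1, t2, t3, t4)

# R^2 = SSB / TSS: share of the total variation explained by treatment group.
grand_mean = sum(all_obs) / len(all_obs)
ssb = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in (t1, t2, t3, t4))
tss = sum((y - grand_mean) ** 2 for y in all_obs)

print(f"F = {f_stat:.2f}, p = {p_value:.3f}, R^2 = {ssb / tss:.1%}")
```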
Coefficient of Determination

$$R^2 = \frac{SSB}{SSB + SSE} = \frac{SSB}{SST}$$

The amount of variation in the outcome variable (dependent variable) that is explained by the predictor (independent variable).
ANOVA example
Table 6. Mean micronutrient intake from the school lunch by school
                        S1 (a), n=25    S2 (b), n=25    S3 (c), n=25    P-value (d)
Calcium (mg)    Mean        117.8           158.7           206.5          0.000
                SD (e)       62.4            70.5            86.2
Iron (mg)       Mean          2.0             2.0             2.0          0.854
                SD            0.6             0.6             0.6
Folate (μg)     Mean         26.6            38.7            42.6          0.000
                SD           13.1            14.5            15.1
Zinc (mg)       Mean          1.9             1.5             1.3          0.055
                SD            1.0             1.2             0.4

(a) School 1 (most deprived; 40% subsidized lunches).
(b) School 2 (medium deprived; <10% subsidized).
(c) School 3 (least deprived; no subsidization, private school).
(d) ANOVA; significant differences are highlighted in bold (P<0.05).

FROM: Gould R, Russell J, Barker ME. School lunch menus and 11 to 12 year old children's food choice in three secondary schools in England-are the nutritional standards being met? Appetite. 2006 Jan;46(1):86-92.
Answer

Step 1) calculate the sum of squares between groups:


Mean for School 1 = 117.8
Mean for School 2 = 158.7
Mean for School 3 = 206.5

Grand mean: 161

SSB = [(117.8-161)² + (158.7-161)² + (206.5-161)²] × 25 per group = 98,113


Answer

Step 2) calculate the sum of squares within groups:


 
S.D. for S1 = 62.4
S.D. for S2 = 70.5
S.D. for S3 = 86.2

Therefore, since each group's variance = SS/(n − 1), the sum of squares within is Σ (nᵢ − 1) × SDᵢ²:

(24) × [62.4² + 70.5² + 86.2²] = 391,066
Answer

Step 3) Fill in the ANOVA table

Source of variation    d.f.    Sum of squares    Mean Sum of Squares    F-statistic    p-value
Between                  2          98,113            49,056                 9           <.05
Within                  72         391,066             5,431
Total                   74         489,179

R² = 98,113/489,179 = 20%
School explains 20% of the variance in lunchtime calcium intake in these kids.
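Because only means, SDs, and group sizes are reported here, the sketch below (not from the slides) shows the same summary-statistics calculation in Python, ending with the F-distribution p-value; it should land near the F ≈ 9 and R² ≈ 20% above.

```python
# A minimal sketch (not from the slides): one-way ANOVA from summary statistics
# (group means, SDs, and sample sizes) rather than raw data, using the calcium
# values from the table above.
from scipy import stats

means = [117.8, 158.7, 206.5]
sds = [62.4, 70.5, 86.2]
ns = [25, 25, 25]
k, N = len(means), sum(ns)

grand_mean = sum(n * m for n, m in zip(ns, means)) / N

# Between: weighted squared deviations of group means from the grand mean.
ssb = sum(n * (m - grand_mean) ** 2 for n, m in zip(ns, means))
# Within: (n_i - 1) * SD_i^2 summed over groups.
ssw = sum((n - 1) * sd ** 2 for n, sd in zip(ns, sds))

f_stat = (ssb / (k - 1)) / (ssw / (N - k))
p_value = stats.f.sf(f_stat, k - 1, N - k)  # upper-tail p-value

print(f"F = {f_stat:.1f}, p = {p_value:.4f}, R^2 = {ssb / (ssb + ssw):.1%}")
```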
