Quantitative Techniques - II: Dr. Pritha Guha

MORE THAN TWO POPULATIONS
Comparing the means of more than two populations
• Annual savings using public transportation in 4 large American cities (in $):

• We believe that the mean annual savings for the 4 cities are the same. To check this, we test
$H_0$: the mean savings are the same for all cities
$H_1$: the mean savings vary across the cities
boxplot(savings ~ city, col = c("red", "blue", "yellow", "plum"),
        main = "Boxplot of Annual savings using public transportation in 4 large American cities")
Comparing the means of more than two populations: Set up
• Suppose we have k (≥ 3) samples.
• We would like to know whether they are from the same distribution.
• Assumption: all samples are from normal distributions with the same (unknown) variance.
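• Samples:
• Sample 1: $X_{11}, X_{12}, \dots, X_{1n_1} \overset{\text{IID}}{\sim} N(\mu_1, \sigma^2)$
• Sample 2: $X_{21}, X_{22}, \dots, X_{2n_2} \overset{\text{IID}}{\sim} N(\mu_2, \sigma^2)$
⋮
• Sample k: $X_{k1}, X_{k2}, \dots, X_{kn_k} \overset{\text{IID}}{\sim} N(\mu_k, \sigma^2)$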

• $H_0: \mu_1 = \mu_2 = \dots = \mu_k$, $H_1$: at least one $\mu_i$ is different


Comparing the means of more than two populations: Analysis of Variance (ANOVA)
• Analysis of variance (ANOVA) compares the means of multiple populations.
• A one-way layout is an experimental setup in which independent measurements are made under several treatments (levels of a single factor, i.e., groups).
• The sample sizes for every group may or may not be the same.
Comparing the means of more than two populations: Set up
Set Up
• $X_{ij}$: $j$th observation from the $i$th group, $j = 1, 2, \dots, n_i$ and $i = 1, 2, \dots, k$
• Thus $X_{ij} \sim N(\mu_i, \sigma^2)$, $1 \le j \le n_i$, $1 \le i \le k$.
• Model: $X_{ij} = \mu_i + \epsilon_{ij}$, $j = 1, 2, \dots, n_i$, $i = 1, 2, \dots, k$. So,
$\epsilon_{ij} \overset{\text{IID}}{\sim} N(0, \sigma^2)$, $j = 1, 2, \dots, n_i$, $i = 1, 2, \dots, k$.

Assumptions
• Normality: all samples come from normal distributions.
• Independence: the observations are independent, both within and across samples.
• Equal variance: all populations have the same variance $\sigma^2$.
• Any difference between the populations is therefore only through their means.
Alternative Representation
• Let $\mu$: overall mean,
$\alpha_i$: differential effect of the $i$th factor/treatment/group; then
$\mu_i = \mu + \alpha_i$
• The model becomes: $X_{ij} = \mu + \alpha_i + \epsilon_{ij}$, $j = 1, 2, \dots, n_i$, $i = 1, 2, \dots, k$
• We now test for
$H_0: \alpha_1 = \alpha_2 = \dots = \alpha_k = 0$, $H_1$: at least one $\alpha_i$ is not 0

• A Model Restriction: the $\alpha_i$'s are differential effects from the overall mean level, thus
• $\sum_{i=1}^{k} n_i \alpha_i = 0$, if the model is unbalanced (unequal $n_i$).
• $\sum_{i=1}^{k} \alpha_i = 0$, if the model is balanced (all $n_i$ equal).
Some Estimates
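• The grand sample mean, $\bar{X}_{00} = \frac{1}{n}\sum_{i=1}^{k}\sum_{j=1}^{n_i} X_{ij}$, where $n = n_1 + n_2 + \dots + n_k$, is unbiased for the overall mean $\mu$.
Thus $\hat{\mu} = \bar{X}_{00}$.
• The sample mean for each group, $\bar{X}_{i0} = \frac{1}{n_i}\sum_{j=1}^{n_i} X_{ij}$, is unbiased for $\mu_i = \mu + \alpha_i$.
Thus $\hat{\mu} + \hat{\alpha}_i = \bar{X}_{i0}$.
• Hence $\hat{\alpha}_i = \bar{X}_{i0} - \bar{X}_{00}$.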
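As a small check of these estimators, the sketch below simulates hypothetical data from $X_{ij} = \mu + \alpha_i + \epsilon_{ij}$ (the group sizes, group means and standard deviation are arbitrary choices for illustration, not taken from the slides) and computes $\hat{\mu}$ and $\hat{\alpha}_i$. The estimates automatically satisfy the unbalanced restriction $\sum_i n_i \hat{\alpha}_i = 0$.

```r
# Hypothetical data (illustration only): k = 3 groups of unequal size,
# simulated from X_ij = mu + alpha_i + eps_ij with eps_ij ~ N(0, 2^2)
set.seed(1)
n_i   <- c(5, 7, 6)
group <- factor(rep(c("A", "B", "C"), times = n_i))
x     <- rnorm(sum(n_i), mean = rep(c(10, 12, 11), times = n_i), sd = 2)

mu_hat    <- mean(x)                 # grand mean, X-bar_00
group_avg <- tapply(x, group, mean)  # group means, X-bar_i0 (levels A, B, C)
alpha_hat <- group_avg - mu_hat      # alpha-hat_i = X-bar_i0 - X-bar_00

mu_hat
alpha_hat
# Weighted effects sum to zero (up to rounding error): sum_i n_i * alpha-hat_i
sum(n_i * alpha_hat)
```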
The Sum of Squares
• Sum of squared variation between the groups/treatments (SSB/SSTR):
$\sum_{i=1}^{k} n_i (\bar{X}_{i0} - \bar{X}_{00})^2 = \sum_{i=1}^{k} n_i \hat{\alpha}_i^2$
• Sum of squared variation within the groups/treatments (SSE/SSW):
$\sum_{i=1}^{k}\sum_{j=1}^{n_i} (X_{ij} - \bar{X}_{i0})^2 = \sum_{i=1}^{k}\sum_{j=1}^{n_i} (X_{ij} - \hat{\mu} - \hat{\alpha}_i)^2$
• Total Sum of Squares (SST): $\sum_{i=1}^{k}\sum_{j=1}^{n_i} (X_{ij} - \bar{X}_{00})^2$

• A Result: SST can be partitioned as follows: SST=SSTR+SSE
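One way to see why the partition holds: add and subtract the group mean inside each deviation. The cross-product term vanishes because deviations from a group mean sum to zero within that group.

```latex
\begin{aligned}
\sum_{i=1}^{k}\sum_{j=1}^{n_i}(X_{ij}-\bar{X}_{00})^2
&= \sum_{i=1}^{k}\sum_{j=1}^{n_i}\big[(X_{ij}-\bar{X}_{i0}) + (\bar{X}_{i0}-\bar{X}_{00})\big]^2 \\
&= \underbrace{\sum_{i=1}^{k}\sum_{j=1}^{n_i}(X_{ij}-\bar{X}_{i0})^2}_{SSE}
 + \underbrace{\sum_{i=1}^{k} n_i(\bar{X}_{i0}-\bar{X}_{00})^2}_{SSTR}
 + 2\sum_{i=1}^{k}(\bar{X}_{i0}-\bar{X}_{00})\underbrace{\sum_{j=1}^{n_i}(X_{ij}-\bar{X}_{i0})}_{=\,0}
\end{aligned}
```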


In R
• PublicT = read.csv(file.choose(), header = TRUE)
• attach(PublicT)
• names(PublicT)

• # Calculation by hand
• unique(city)

• xGT = mean(savings)                    # grand mean
• xB = mean(savings[city == "Boston"])   # group mean for Boston
• xNY = mean(savings[city == "NY"])
• xSF = mean(savings[city == "SF"])
• xC = mean(savings[city == "Chicago"])

In R
• SST = $\sum_{i=1}^{k}\sum_{j=1}^{n_i} (X_{ij} - \bar{X}_{00})^2$

• SSB/SSTR = $\sum_{i=1}^{k} n_i (\bar{X}_{i0} - \bar{X}_{00})^2$
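These quantities (and SSE) can be computed directly from the data. The sketch below uses simulated, hypothetical values `x` and `g` in place of `savings` and `city` (24 observations, 6 per city, with an arbitrary mean and SD); with the real data attached, substitute `savings` and `city`.

```r
# Hypothetical stand-in for the savings data; with PublicT attached, use
# x <- savings; g <- factor(city)
set.seed(42)
g <- factor(rep(c("Boston", "Chicago", "NY", "SF"), each = 6))
x <- rnorm(length(g), mean = 500, sd = 50)

grand <- mean(x)               # X-bar_00
means <- tapply(x, g, mean)    # X-bar_i0, one per city
sizes <- tapply(x, g, length)  # n_i

SST  <- sum((x - grand)^2)              # total sum of squares
SSTR <- sum(sizes * (means - grand)^2)  # between-group sum of squares
SSE  <- sum((x - ave(x, g))^2)          # within-group: ave() gives each obs its group mean

c(SST = SST, SSTR = SSTR, SSE = SSE)
all.equal(SST, SSTR + SSE)     # checks the partition SST = SSTR + SSE
```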


A Test Statistic
Under $H_0$:
• $SSTR/\sigma^2 \sim \chi^2_{k-1}$
• $SSE/\sigma^2 \sim \chi^2_{n-k}$, where $n = n_1 + n_2 + \dots + n_k$ (this part holds whether or not $H_0$ is true)
• SSTR and SSE are independent.

Test Statistic
• Under $H_0$, $F = \dfrac{SSTR/(k-1)}{SSE/(n-k)} \sim F_{k-1,\,n-k}$

Note: We define
• $MSTR = SSTR/(k-1)$
• $MSE = SSE/(n-k)$
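A sketch of computing MSTR, MSE, the observed F and its p-value by hand, then cross-checking against R's `aov()`. The data are simulated and hypothetical (4 groups of 6); `pf(..., lower.tail = FALSE)` gives the upper-tail probability $P(F_{k-1,n-k} > F_{obs})$.

```r
# Hypothetical data: k = 4 groups of 6 observations (illustration only)
set.seed(7)
g <- factor(rep(c("G1", "G2", "G3", "G4"), each = 6))
x <- rnorm(length(g), mean = rep(c(10, 11, 10, 12), each = 6), sd = 1.5)

k <- nlevels(g)
n <- length(x)
SSTR <- sum(tapply(x, g, length) * (tapply(x, g, mean) - mean(x))^2)
SSE  <- sum((x - ave(x, g))^2)

MSTR  <- SSTR / (k - 1)
MSE   <- SSE / (n - k)
F_obs <- MSTR / MSE
p_val <- pf(F_obs, df1 = k - 1, df2 = n - k, lower.tail = FALSE)

# Cross-check with the built-in one-way ANOVA table
tab <- summary(aov(x ~ g))[[1]]
c(F_by_hand = F_obs, F_aov = tab[["F value"]][1])
c(p_by_hand = p_val, p_aov = tab[["Pr(>F)"]][1])
```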
Analysis Of Variance (ANOVA) Table

• Under $H_0$, $F \sim F_{k-1,\,n-k}$.
• Reject $H_0$ at level $\alpha$ if the observed $F > F_{k-1,\,n-k;\,\alpha}$.
• Rejection of 𝐻0 means that not all group means are equal.
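For reference, the standard layout of a one-way ANOVA table in the notation above is sketched below. R's `summary()` of an `aov` fit prints the same columns, labelled `Df`, `Sum Sq`, `Mean Sq`, `F value` and `Pr(>F)`, with the within-group row labelled `Residuals` and no Total row.

```latex
\begin{array}{l c c c c}
\text{Source} & \text{df} & \text{Sum of squares} & \text{Mean square} & F \\ \hline
\text{Between groups (treatment)} & k-1 & SSTR & MSTR = SSTR/(k-1) & MSTR/MSE \\
\text{Within groups (error)} & n-k & SSE & MSE = SSE/(n-k) & \\ \hline
\text{Total} & n-1 & SST & &
\end{array}
```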
In R
• PublicT.Anova=aov(savings~city)
• summary(PublicT.Anova)
Analysis Of Variance (ANOVA) Table

• Cut-off at 5% level: $F_{3,20;\,0.05} = 3.098$


• Observed F-statistic > cut-off
• Reject $H_0$: the mean savings are not the same for all the cities.
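The cut-off can be obtained with `qf()`. Here $k = 4$ cities and $n - k = 20$ residual degrees of freedom (as on the slide), so $n = 24$. The observed F to compare against comes from `summary(PublicT.Anova)`.

```r
alpha <- 0.05
k <- 4                  # number of cities
n <- 24                 # total observations, so that n - k = 20
cutoff <- qf(1 - alpha, df1 = k - 1, df2 = n - k)
round(cutoff, 3)        # 3.098
# Decision rule: reject H0 if the observed F from summary(PublicT.Anova) exceeds cutoff
```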
Why do we reject 𝐻0 for large F?
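• Under $H_0$ all group means are equal, so the group sample means differ only through random error.
• Under $H_0$, MSTR should therefore be small; MSE estimates $\sigma^2$ whether or not $H_0$ holds.
• So F = MSTR/MSE should be relatively small (close to 1) under $H_0$, while MSTR is inflated when the means differ.
• Reject $H_0$ for large values of the observed F.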
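The intuition can be made precise through the expected mean squares, a standard result for the one-way model with $\sum_i n_i \alpha_i = 0$: MSE is unbiased for $\sigma^2$ regardless of $H_0$, while MSTR carries an extra non-negative term that is zero exactly when all $\alpha_i = 0$.

```latex
E(MSE) = \sigma^2,
\qquad
E(MSTR) = \sigma^2 + \frac{1}{k-1}\sum_{i=1}^{k} n_i \alpha_i^2
```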
Bartlett's Test for Homogeneity of Variance

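• To test $H_0: \sigma_1^2 = \sigma_2^2 = \dots = \sigma_k^2$ against $H_1$: not all $\sigma_i^2$'s are equal.
• Assumption: the data in each group come from a normal distribution (the test is sensitive to departures from normality).
• Under $H_0$, the test statistic approximately follows $\chi^2_{k-1}$.
• Reject $H_0$ at significance level $\alpha$ if
• the observed test statistic value > $\chi^2_{k-1;\,\alpha}$
• or the p-value < $\alpha$.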
In R

bartlett.test(savings ~ city)
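To show what `bartlett.test()` computes, the sketch below evaluates Bartlett's statistic by hand. With $S_i^2$ the group sample variances and $S_p^2 = \sum_i (n_i-1)S_i^2/(n-k)$ the pooled variance, $T = \dfrac{(n-k)\ln S_p^2 - \sum_i (n_i-1)\ln S_i^2}{1 + \frac{1}{3(k-1)}\left(\sum_i \frac{1}{n_i-1} - \frac{1}{n-k}\right)}$, referred to $\chi^2_{k-1}$. The data are simulated and hypothetical (three groups with deliberately different SDs); the hand computation should match the built-in function.

```r
# Hypothetical data: 3 groups with different spreads (illustration only)
set.seed(123)
sizes <- c(8, 10, 9)
g <- factor(rep(c("A", "B", "C"), times = sizes))
x <- rnorm(length(g), mean = 0, sd = rep(c(1, 2, 1.5), times = sizes))

k   <- nlevels(g)
dfi <- tapply(x, g, length) - 1   # n_i - 1 for each group
s2  <- tapply(x, g, var)          # group sample variances S_i^2
dfe <- sum(dfi)                   # n - k
sp2 <- sum(dfi * s2) / dfe        # pooled variance S_p^2

corr <- 1 + (sum(1 / dfi) - 1 / dfe) / (3 * (k - 1))   # Bartlett's correction factor
stat <- (dfe * log(sp2) - sum(dfi * log(s2))) / corr
pval <- pchisq(stat, df = k - 1, lower.tail = FALSE)

bt <- bartlett.test(x ~ g)
c(stat_by_hand = stat, stat_builtin = unname(bt$statistic))
c(p_by_hand = pval, p_builtin = bt$p.value)
```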
A Problem
• Does salary depend on gender?
• A dataset consisting of observations on three variables for 52 tenure-track professors in a small college was collected to examine this question (see DisSalary.csv).
• The variables are:
• Gender: Male/Female
• JobRank: Full Professor (full), Associate Professor (associate), Assistant Professor
(assistant)
• Salary: salary of the faculty member ('000 Rs.)

• Assume that the data are normally and independently distributed.


• Suppose that we want to test, at the 5% level of significance, whether mean salary differs by gender.
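DSal = read.csv(file.choose(), header = TRUE)   # choose DisSalary.csv
boxplot(Salary ~ Gender, data = DSal, col = c("red", "blue"), main = "Boxplot of salary based on gender")
library(ggplot2)   # ggplot() and geom_boxplot()
library(plotly)    # ggplotly() for an interactive version of the plot
p2 <- ggplot(DSal, aes(x = Gender, y = Salary, fill = Gender)) +
  geom_boxplot()
ggplotly(p2)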
A Problem: Continued
• 𝐻0 : 𝐻1 :

• Cut-off value:
• p-value:
• Conclusion:
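A hedged sketch of how these pieces could be obtained in R with a one-way ANOVA of Salary on Gender. The data frame below is a hypothetical stand-in for DisSalary.csv (the 20/32 gender split and the salary distribution are invented); replace it with the real file. With only two groups, this F test is equivalent to the pooled-variance two-sample t test ($F = t^2$).

```r
# Hypothetical stand-in for DisSalary.csv (52 professors); with the real data use
# DSal = read.csv(file.choose(), header = TRUE)
set.seed(2024)
DSal <- data.frame(
  Gender = factor(rep(c("Female", "Male"), times = c(20, 32))),
  Salary = rnorm(52, mean = 60, sd = 10)
)

fit <- aov(Salary ~ Gender, data = DSal)
summary(fit)                              # observed F and its p-value

k <- nlevels(DSal$Gender)                 # 2 groups
n <- nrow(DSal)                           # 52 observations
qf(0.95, df1 = k - 1, df2 = n - k)        # 5% cut-off of F(1, 50)
```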
