
Quintin Hunt

Exam 3
###1.
Because sex and race have no missing values, the only two variables of concern are sei and
ann.income. I subtracted the 252 cases missing sei, ann.income, or both from the 624 total cases.
That leaves 372 cases with both sei and ann.income.

d=read.csv(file.choose())
> sum(is.na(d$sei)|is.na(d$ann.income))
[1] 252
> 624-252
[1] 372
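
The same count can also be obtained directly instead of by subtraction. The sketch below uses a small made-up data frame (`toy` and its values are illustrative assumptions, not the exam data) to show that `complete.cases()` agrees with the subtraction approach:

```r
# Toy data standing in for the exam file; values are invented for illustration.
toy <- data.frame(
  sei        = c(43.1, NA, 62.2, 29.3, NA, 37.7),
  ann.income = c("LOW", "HIGH", NA, "MED", NA, "HIGH")
)

# Rows missing sei, ann.income, or both (the quantity subtracted above).
n.missing <- sum(is.na(toy$sei) | is.na(toy$ann.income))

# Rows with both values present, counted two equivalent ways.
n.by.subtraction <- nrow(toy) - n.missing
n.complete <- sum(complete.cases(toy[, c("sei", "ann.income")]))

# Keep only the complete rows, as an analysis subset would.
toy.sub <- toy[complete.cases(toy[, c("sei", "ann.income")]), ]
```

Applied to the real data, the same subsetting step would presumably yield the 372-row subset that the later commands call subd.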

###2.
> summary(subd$sei)
Min. 1st Qu. Median Mean 3rd Qu. Max.
0.20 29.28 37.70 43.46 62.20 92.30

> describe(subd$sei)
vars n mean sd median trimmed mad min max range skew kurtosis
1 1 372 43.46 19.03 37.7 42.86 18.24 0.2 92.3 92.1 0.27 -0.64
se
1 0.99
Neither the skewness (0.27) nor the kurtosis (-0.64) exceeds 1 in absolute value, and there are
no extreme outliers. There is some evidence of bimodality and perhaps slight skewness. Many
points fall outside the confidence band in the QQ plot, but they stay fairly close to it.

###3.

I will do a three-way ANOVA. This test is appropriate because I am analyzing 3 categorical
independent variables with SEI as the outcome. The ANN.INCOME variable has 3 levels, the SEX
variable has 2 levels, and RACE has 2 levels as well. Accordingly, a 2x2x3 design is appropriate.

###4.
The design is unbalanced because the group sizes differ.
The counts below show that it is also disproportional.

> subd$sex.ann.race2=factor(paste(subd$sex, subd$ann.income, subd$race, sep=":"))
> table(subd$sex.ann.race2)

F:HIGH:NON-WHITE     F:HIGH:WHITE  F:LOW:NON-WHITE      F:LOW:WHITE
              14              101               11               23
 F:MED:NON-WHITE      F:MED:WHITE M:HIGH:NON-WHITE     M:HIGH:WHITE
              10               34               22               55
 M:LOW:NON-WHITE      M:LOW:WHITE  M:MED:NON-WHITE      M:MED:WHITE
              16               45               12               29
> length(subd$sex.ann.race2)
[1] 372

> SexF <- sum(table3[1:6])
> SexM <- sum(table3[7:12])
> Sext <- c(SexF, SexM)
> AnnL <- sum(table3[c(3, 4, 9, 10)])
> AnnM <- sum(table3[c(5, 6, 11, 12)])
> AnnH <- sum(table3[c(1, 2, 7, 8)])
> Annt <- c(AnnL, AnnM, AnnH)
> RaW <- sum(table3[c(2,4,6,8,10,12)])
> RaN <- sum(table3[c(1,3,5,7,9,11)])
> Rat <- c(RaW, RaN)
> Sext
[1] 193 179
> Annt
[1] 95 85 192
> Rat
[1] 287 85
> N <- nrow(subd)
> N
[1] 372
> Sex.p = Sext/N
> Ann.p = Annt/N
> Rac.p = Rat/N
> sum(Sex.p)
[1] 1
> sum(Ann.p)
[1] 1
> sum(Rac.p)
[1] 1
> Sext
[1] 193 179
> Annt
[1] 95 85 192
> Rat
[1] 287 85
> Sex.p
[1] 0.5188172 0.4811828
> Rac.p
[1] 0.7715054 0.2284946
> Ann.p
[1] 0.2553763 0.2284946 0.5161290

> test.prop <- Rac.p[2]*Sex.p[1]*Ann.p[3]
> test.prop
[1] 0.06118552
> test.prop*N
[1] 22.76101
> round(test.prop*N) ==table3[1]
F:HIGH:NON-WHITE
FALSE
> round(test.prop*N) ==table3[2]
F:HIGH:WHITE
FALSE
> round(test.prop*N) ==table3[3]
F:LOW:NON-WHITE
FALSE
> round(test.prop*N) ==table3[4]
F:LOW:WHITE
TRUE
> round(test.prop*N) ==table3[5]
F:MED:NON-WHITE
FALSE
> table3[4]
F:LOW:WHITE
23
> round(test.prop*N)
[1] 23
>
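
The comparisons above check one expected count (NON-WHITE, female, HIGH income) against several cells, and the only match, F:LOW:WHITE, is a different cell that happens to contain 23 cases. A fuller check is sketched below: it computes the expected count for every cell under proportionality (the product of the three marginal proportions times N) and compares each with its own observed count. The counts are copied from the table in this section; the object names (`obs`, `expected`) are illustrative assumptions.

```r
# Observed cell counts copied from the table above.
# In the sex:income:race labels, race changes fastest, then income, then sex.
counts <- c(14, 101, 11, 23, 10, 34, 22, 55, 16, 45, 12, 29)
obs <- array(counts, dim = c(2, 3, 2),
             dimnames = list(race   = c("NON-WHITE", "WHITE"),
                             income = c("HIGH", "LOW", "MED"),
                             sex    = c("F", "M")))
N <- sum(obs)

# Marginal proportions for each factor.
p.race   <- apply(obs, 1, sum) / N
p.income <- apply(obs, 2, sum) / N
p.sex    <- apply(obs, 3, sum) / N

# Expected count per cell if the design were proportional.
expected <- outer(outer(p.race, p.income), p.sex) * N
dimnames(expected) <- dimnames(obs)

# Side-by-side comparison for one cell, then for all twelve.
c(observed = obs["NON-WHITE", "HIGH", "F"],
  expected = expected["NON-WHITE", "HIGH", "F"])
all(round(expected) == obs)
```

For F:HIGH:NON-WHITE the expected count is about 22.8 against an observed 14, which supports the conclusion that the design is disproportional.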
###5.

> sei.mod=lm(sei~sex*ann.income*race, contrasts=list(sex=contr.sum, ann.income=contr.sum,
race=contr.sum), data=subd)
> leveneTest(sei.mod)
Levene's Test for Homogeneity of Variance (center = median)
Df F value Pr(>F)
group 11 1.5242 0.1206
360
>

I do not reject the null hypothesis that the population variance is the same for all 12 groups
[F(11, 360) = 1.52, p = .121].

> VAR=tapply(subd$sei, subd$sex.ann.race, var)
> VAR
F:HIGH:NON-WHITE     F:HIGH:WHITE  F:LOW:NON-WHITE      F:LOW:WHITE
        249.6303         359.8849         224.7780         165.2449
 F:MED:NON-WHITE      F:MED:WHITE   F:NA:NON-WHITE       F:NA:WHITE
        232.2582         281.7281               NA               NA
M:HIGH:NON-WHITE     M:HIGH:WHITE  M:LOW:NON-WHITE      M:LOW:WHITE
        436.1599         373.6142         239.7553         248.0221
 M:MED:NON-WHITE      M:MED:WHITE   M:NA:NON-WHITE       M:NA:WHITE
        211.9582         372.0102               NA               NA

I needed to recreate the grouping factor so that it excludes the levels with missing ann.income.

> subd$sex.ann.race2=factor(paste(subd$sex, subd$ann.income, subd$race, sep=":"))
> table(subd$sex.ann.race2)

F:HIGH:NON-WHITE     F:HIGH:WHITE  F:LOW:NON-WHITE      F:LOW:WHITE
              14              101               11               23
 F:MED:NON-WHITE      F:MED:WHITE M:HIGH:NON-WHITE     M:HIGH:WHITE
              10               34               22               55
 M:LOW:NON-WHITE      M:LOW:WHITE  M:MED:NON-WHITE      M:MED:WHITE
              16               45               12               29
> length(subd$sex.ann.race2)
[1] 372
> VAR=tapply(subd$sei, subd$sex.ann.race2, var)
> VAR
F:HIGH:NON-WHITE     F:HIGH:WHITE  F:LOW:NON-WHITE      F:LOW:WHITE
        249.6303         359.8849         224.7780         165.2449
 F:MED:NON-WHITE      F:MED:WHITE M:HIGH:NON-WHITE     M:HIGH:WHITE
        232.2582         281.7281         436.1599         373.6142
 M:LOW:NON-WHITE      M:LOW:WHITE  M:MED:NON-WHITE      M:MED:WHITE
        239.7553         248.0221         211.9582         372.0102

> max(VAR)
[1] 436.1599
> min(VAR)
[1] 165.2449
> max(VAR)/min(VAR)
[1] 2.639475
>
There does not appear to be a violation of the assumption of homogeneity of variance. The
largest variance is less than 4 times the smallest (ratio = 2.64).

> NS=aggregate(subd$sei,list(SEX=subd$sex, ANN.INCOME = subd$ann.income, RACE = subd$race),
length )
> MEANS=aggregate(subd$sei,list(SEX=subd$sex, ANN.INCOME = subd$ann.income, RACE =
subd$race),mean )
> SDS=aggregate(subd$sei,list(SEX=subd$sex, ANN.INCOME = subd$ann.income, RACE =
subd$race), sd)
> group.table=cbind(NS, MEANS[4], SDS[4])
> colnames(group.table)[4:6]=c("N", "MEAN", "SD")
> group.table
SEX ANN.INCOME RACE N MEAN SD
1 F HIGH NON-WHITE 14 38.25714 15.79969
2 M HIGH NON-WHITE 22 40.99091 20.88444
3 F LOW NON-WHITE 11 31.30000 14.99260
4 M LOW NON-WHITE 16 34.32500 15.48403
5 F MED NON-WHITE 10 34.96000 15.24002
6 M MED NON-WHITE 12 44.40000 14.55878
7 F HIGH WHITE 101 49.66436 18.97063
8 M HIGH WHITE 55 54.78545 19.32910
9 F LOW WHITE 23 32.23043 12.85476
10 M LOW WHITE 45 35.64889 15.74872
11 F MED WHITE 34 39.82647 16.78476
12 M MED WHITE 29 42.26552 19.28757
>

> tapply(subd$sei, list(SEX=subd$sex), mean)


SEX
F M
43.21762 43.72570
> tapply(subd$sei, list(ANN.INCOME=subd$ann.income), mean)
ANN.INCOME
HIGH LOW MED
49.30573 34.09474 40.73176
> tapply(subd$sei, list(RACE=subd$race), mean)
RACE
NON-WHITE WHITE
37.80353 45.13798

> mean(subd$sei)
[1] 43.4621
Cell means of SEI by gender, race, and annual income:

                       Male                  Female
Annual Income    White     Non-White    White     Non-White
LOW              35.6489   34.325       32.23     31.3
MED              42.265    44.4         39.8265   34.96
HIGH             54.785    40.9909      49.664    38.2571

Marginal Means:
GENDER           Female     Male
                 43.21762   43.7257
RACE             Non-White  White
                 37.80353   45.13798
ANN.INCOME       Low        Med        High
                 34.09474   40.73176   49.30573
Grand Mean       43.4621

> tapply(subd$sei, subd$sex.ann.race2, describe)


$`F:HIGH:NON-WHITE`
vars n mean sd median trimmed mad min max range skew kurtosis se
1 1 14 38.26 15.8 34.6 36.73 13.79 20.9 73.9 53 0.82 -0.57 4.22

$`F:HIGH:WHITE`
vars n mean sd median trimmed mad min max range skew kurtosis
1 1 101 49.66 18.97 50.2 49.43 20.31 0.2 92.3 92.1 -0.09 -0.2
se
1 1.89

$`F:LOW:NON-WHITE`
vars n mean sd median trimmed mad min max range skew kurtosis se
1 1 11 31.3 14.99 31.1 31.59 11.27 5.5 54.5 49 -0.25 -1.12 4.52

$`F:LOW:WHITE`
vars n mean sd median trimmed mad min max range skew kurtosis se
1 1 23 32.23 12.85 28.7 31.24 8.3 13.1 62.5 49.4 0.85 -0.02 2.68

$`F:MED:NON-WHITE`
vars n mean sd median trimmed mad min max range skew kurtosis se
1 1 10 34.96 15.24 30.2 32.92 11.86 22 64.2 42.2 0.8 -1.01 4.82

$`F:MED:WHITE`
vars n mean sd median trimmed mad min max range skew kurtosis se
1 1 34 39.83 16.78 33.7 38.59 7.26 0.2 80.9 80.7 0.62 0.15 2.88

$`M:HIGH:NON-WHITE`
vars n mean sd median trimmed mad min max range skew kurtosis
1 1 22 40.99 20.88 37.25 42.23 15.86 0.2 71.6 71.4 -0.16 -0.93
se
1 4.45

$`M:HIGH:WHITE`
vars n mean sd median trimmed mad min max range skew kurtosis
1 1 55 54.79 19.33 62.5 56.47 17.49 0.6 80.9 80.3 -0.78 0.07
se
1 2.61

$`M:LOW:NON-WHITE`
vars n mean sd median trimmed mad min max range skew kurtosis se
1 1 16 34.32 15.48 28.45 33.32 5.19 13.1 69.6 56.5 0.98 -0.2 3.87

$`M:LOW:WHITE`
vars n mean sd median trimmed mad min max range skew kurtosis
1 1 45 35.65 15.75 32.1 34.13 10.53 14.6 70.2 55.6 0.96 -0.22
se
1 2.35

$`M:MED:NON-WHITE`
vars n mean sd median trimmed mad min max range skew kurtosis se
1 1 12 44.4 14.56 40.9 44.25 13.05 17.1 73.2 56.1 0.14 -0.52 4.2

$`M:MED:WHITE`
vars n mean sd median trimmed mad min max range skew kurtosis
1 1 29 42.27 19.29 35.4 41.08 11.42 17.1 84.2 67.1 0.78 -0.67
se
1 3.58
library(car)  # provides qqPlot()
par(mfrow=c(2,3))  # six plots per page, so the 12 groups fill two pages
for(i in 1:12){
# SEI values for the i-th sex:income:race group
grp = levels(subd$sex.ann.race2)[i]
y = subd$sei[subd$sex.ann.race2==grp]
qqPlot(y, ylab="SEI", main = paste(grp, " (n = ", length(y), ")", sep=""))
}
par(mfrow = c(1,1))

The QQ plots are presented above. One of the plots shows some evidence of skewness, but no
group's skewness exceeds 1 in absolute value. Departures from the normal distribution are small
overall. In two groups (F:LOW:NON-WHITE and F:MED:NON-WHITE) the kurtosis exceeds 1 in absolute
value, but only slightly (-1.12 and -1.01). There does not appear to be an extreme violation of
the normality assumption.

There is some evidence against the assumption of normality, but because most groups look
approximately normal and there was no problem with the variances, I will go ahead with the test.

###6.

> summary(sei.mod)

Call:
lm(formula = sei ~ sex * ann.income * race, data = subd, contrasts = list(sex = contr.sum,
ann.income = contr.sum, race = contr.sum))
Residuals:
Min 1Q Median 3Q Max
-54.185 -12.561 -4.085 13.836 42.636

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 39.8878 1.1548 34.540 < 2e-16 ***
sex1 -2.1814 1.1548 -1.889 0.059699 .
ann.income1 6.0366 1.5108 3.996 7.82e-05 ***
ann.income2 -6.5118 1.6636 -3.914 0.000108 ***
race1 -2.5157 1.1548 -2.178 0.030026 *
sex1:ann.income1 0.2177 1.5108 0.144 0.885487
sex1:ann.income2 0.5706 1.6636 0.343 0.731819
sex1:race1 -0.3517 1.1548 -0.305 0.760901
ann.income1:race1 -3.7848 1.5108 -2.505 0.012680 *
ann.income2:race1 1.9521 1.6636 1.173 0.241416
sex1:ann.income1:race1 0.9485 1.5108 0.628 0.530510
sex1:ann.income2:race1 0.4500 1.6636 0.271 0.786917
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 17.72 on 360 degrees of freedom
Multiple R-squared: 0.1584, Adjusted R-squared: 0.1327
F-statistic: 6.16 on 11 and 360 DF, p-value: 2.732e-09

> summary(sei.mod)$fstatistic
value numdf dendf
6.159856 11.000000 360.000000
> 1 - pf(summary(sei.mod)$fstatistic[1], summary(sei.mod)$fstatistic[2],
summary(sei.mod)$fstatistic[3])
value
2.732248e-09

> a.table = Anova(sei.mod, type=3)


> a.table
Anova Table (Type III tests)

Response: sei
Sum Sq Df F value Pr(>F)
(Intercept) 374789 1 1193.0062 < 2.2e-16 ***
sex 1121 1 3.5682 0.05970 .
ann.income 6935 2 11.0379 2.226e-05 ***
race 1491 1 4.7454 0.03003 *
sex:ann.income 68 2 0.1088 0.89695
sex:race 29 1 0.0927 0.76090
ann.income:race 1978 2 3.1478 0.04413 *
sex:ann.income:race 232 2 0.3689 0.69179
Residuals 113096 360
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
>

The omnibus null hypothesis is rejected [F(11, 360) = 6.16, p < .001]. With it rejected, I
will test each effect in the model.
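
As a cross-check, the omnibus F can be recovered from the model R-squared and its degrees of freedom alone. The sketch below assumes only the values printed in the summary output above (R-squared = 0.1584035, with 11 and 360 df); the variable names are illustrative.

```r
# Values copied from the summary(sei.mod) output above.
r2  <- 0.1584035   # multiple R-squared
df1 <- 11          # numerator df: 12 cells - 1
df2 <- 360         # denominator df: 372 cases - 12 cells

# Omnibus F = (explained variance per df) / (unexplained variance per df).
f.omnibus <- (r2 / df1) / ((1 - r2) / df2)

# Upper-tail p-value; lower.tail = FALSE avoids computing 1 - pf(...).
p.omnibus <- pf(f.omnibus, df1, df2, lower.tail = FALSE)

round(f.omnibus, 2)
p.omnibus
```

This should agree with the F-statistic line of the summary output up to rounding.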

The main effects of RACE (p = .030) and ANN.INCOME (p < .001) are significant; SEX falls just
short of significance (p = .060). There is also a significant interaction between RACE and
ANN.INCOME (p = .044). Neither the other two-way interactions nor the three-way interaction is
significant. Before performing post hoc tests to explore the statistically significant effects,
I will compute eta-squared values to estimate an effect size for each effect.

> sst <- var(subd$sei)*(length(subd$sei)-1)


> sst
[1] 134382.5
> eta.table=matrix(0, nrow=(nrow(a.table)-1), ncol=2)
> colnames(eta.table)=c("SOURCE","eta Squared")
> for(i in 1:(nrow(a.table)-1)){
+ eta.table[i,1] = rownames(a.table[i,])
+ eta.table[i,2]=round(a.table[i,1]/sst,3)
+}
> print(eta.table,quote=FALSE)
SOURCE eta Squared
[1,] (Intercept) 2.789
[2,] sex 0.008
[3,] ann.income 0.052
[4,] race 0.011
[5,] sex:ann.income 0.001
[6,] sex:race 0
[7,] ann.income:race 0.015
[8,] sex:ann.income:race 0.002
>
> summary(sei.mod)$r.squared
[1] 0.1584035
>
> sum(as.numeric(eta.table[2:8,2]))
[1] 0.089
> .008+.052+.011+.001+.015+.002
[1] 0.089
>

The overall R2 for the model is .158, while the eta-squared values of the individual effects sum
to .089. The difference, .069, means that variance the effects share in common accounts for 6.9%
of the variance in SEI.
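
The arithmetic behind these figures can be reproduced from the printed ANOVA table alone. The sketch below copies the Type III sums of squares and the total sum of squares from the output above; the object names are illustrative assumptions. The intercept row is left out because its sum of squares is not part of the variance in SEI, which is why the 2.789 printed for it is not a meaningful eta-squared.

```r
# Type III sums of squares copied from the ANOVA table above.
ss <- c(sex = 1121, ann.income = 6935, race = 1491,
        "sex:ann.income" = 68, "sex:race" = 29,
        "ann.income:race" = 1978, "sex:ann.income:race" = 232)
ss.resid <- 113096     # residual sum of squares
sst      <- 134382.5   # total sum of squares computed above

# Eta-squared for each effect: its SS as a share of the total SS.
eta.sq <- round(ss / sst, 3)

# Model R-squared from the residual SS, and the part of it that is
# not attributed to any single effect.
r2     <- 1 - ss.resid / sst
shared <- round(r2, 3) - sum(eta.sq)

eta.sq
shared
```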

###7.

> mse <- a.table$"Sum Sq"[9]/a.table$Df[9]
> mse
[1] 314.1552

> race.effect <- (mean(subd$sei[subd$race == "WHITE"]) - (mean(subd$sei[subd$race == "NON-WHITE"])))/sqrt(mse)
> race.effect
[1] 0.4138047
>

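RACE has two levels (WHITE and NON-WHITE), so its main effect can be summarized by a single
standardized mean difference: the difference between the two race means divided by the square
root of the MSE. The main effect size for RACE is .414.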
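
The same effect size can be checked by hand from the printed marginal means and the MSE. This is a minimal sketch using values copied from the output above; the variable names are illustrative.

```r
# Marginal race means and MSE copied from the output above.
mean.white    <- 45.13798
mean.nonwhite <- 37.80353
mse           <- 314.1552

# Standardized mean difference: raw difference in SEI points
# divided by the pooled within-cell standard deviation sqrt(MSE).
race.d <- (mean.white - mean.nonwhite) / sqrt(mse)
round(race.d, 3)
```

In raw terms the two groups differ by about 7.3 SEI points, which is roughly four tenths of a within-cell standard deviation.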

The null hypotheses for the post hoc tests on ANN.INCOME:

H0: μlow = μmed
H0: μmed = μhigh
H0: μhigh = μlow

> exam.lsd <- pairwise.t.test(subd$sei, subd$ann.income, p.adjust.method="none")
> exam.lsd

Pairwise comparisons using t tests with pooled SD

data: subd$sei and subd$ann.income

    HIGH    LOW
LOW 5.5e-11 -
MED 0.00028 0.01370

P value adjustment method: none

With the Fisher LSD test (which gives the smallest p-values because no adjustment is applied), all
three null hypotheses are rejected because the p-values are all below .05. There appears to be a
statistically significant difference between every pair of income groups: low vs. medium, medium
vs. high, and low vs. high.
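
Because the LSD test applies no correction, it is worth checking whether the conclusion survives a more conservative adjustment. The sketch below applies `p.adjust()` to the three printed p-values. These are rounded values copied from the output above, so the adjusted results are approximate, and the comparison labels are illustrative.

```r
# Unadjusted LSD p-values copied from the pairwise output above.
p.lsd <- c("HIGH-LOW" = 5.5e-11, "HIGH-MED" = 0.00028, "LOW-MED" = 0.01370)

# More conservative alternatives that control the familywise error rate.
p.bonf <- p.adjust(p.lsd, method = "bonferroni")
p.holm <- p.adjust(p.lsd, method = "holm")

# TRUE for each comparison that stays below .05 after adjustment.
rbind(bonferroni = p.bonf < 0.05, holm = p.holm < 0.05)
```

With these three p-values every comparison stays below .05 under both adjustments, so the conclusion does not depend on using the unadjusted test.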

> library(gplots)

Attaching package: ‘gplots’

> plotmeans(sei~ann.income, data=subd)
>
The plots suggest a difference between WHITE and NON-WHITE at HIGH income but not at LOW income.
I will therefore do a simple effects analysis, holding income level fixed.

> Low.mod <-lm(sei~race*sex, contrasts=list(race=contr.sum, sex=contr.sum), data=Low)
> Med.mod <-lm(sei~race*sex, contrasts=list(race=contr.sum, sex=contr.sum), data=Med)
> High.mod <-lm(sei~race*sex, contrasts=list(race=contr.sum, sex=contr.sum), data=High)
> Low.aov <- Anova(Low.mod, type=3)
> Med.aov <- Anova(Med.mod, type=3)
> High.aov <- Anova(High.mod, type=3)
> Low.aov
Anova Table (Type III tests)

Response: sei
Sum Sq Df F value Pr(>F)
(Intercept) 81345 1 362.9952 <2e-16 ***
race 23 1 0.1035 0.7484
sex 189 1 0.8456 0.3602
race:sex 1 1 0.0032 0.9553
Residuals 20392 91
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
>
> Med.aov
Anova Table (Type III tests)

Response: sei
Sum Sq Df F value Pr(>F)
(Intercept) 105436 1 353.8539 <2e-16 ***
race 30 1 0.1013 0.7511
sex 571 1 1.9156 0.1701
race:sex 198 1 0.6654 0.4171
Residuals 24135 81
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
> High.aov
Anova Table (Type III tests)

Response: sei
Sum Sq Df F value Pr(>F)
(Intercept) 232778 1 638.2301 < 2.2e-16 ***
race 4381 1 12.0124 0.0006548 ***
sex 426 1 1.1669 0.2814159
race:sex 39 1 0.1078 0.7430351
Residuals 68568 188
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
>

> n<- aggregate(subd$sei, list(RACE=subd$race, ANN.INN=subd$ann.income), length)
> means<- aggregate(subd$sei, list(RACE=subd$race, ANN.INN=subd$ann.income), mean)
> sds<- aggregate(subd$sei, list(RACE=subd$race, ANN.INN=subd$ann.income), sd)
> vars<- aggregate(subd$sei, list(RACE=subd$race, ANN.INN=subd$ann.income), var)
> group.table <- cbind(n, means[3], sds[3], vars[3])
> colnames(group.table)[3:6]=c("N", "MEAN", "SD", "VAR")
> group.table
RACE ANN.INN N MEAN SD VAR
1 NON-WHITE HIGH 36 39.92778 18.87439 356.2426
2 WHITE HIGH 156 51.46987 19.19300 368.3712
3 NON-WHITE LOW 27 33.09259 15.06876 227.0676
4 WHITE LOW 68 34.49265 14.82547 219.7944
5 NON-WHITE MED 22 40.10909 15.28762 233.7113
6 WHITE MED 63 40.94921 17.87340 319.4583
>
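The cell means for WHITE and NON-WHITE are very similar at LOW and MED income but clearly
different at HIGH income (51.47 vs. 39.93). This is consistent with the simple effects tests, in
which the effect of race is significant only in the HIGH income group.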
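
To put the three race differences on a common scale, each one can be divided by the square root of the MSE from the full three-way model, the same standardization used for the main effect of race. This is a sketch under that assumption; the cell means are copied from group.table above and the object names are illustrative.

```r
# Cell means by race and income, copied from group.table above.
means <- data.frame(
  income    = c("LOW", "MED", "HIGH"),
  white     = c(34.49265, 40.94921, 51.46987),
  non.white = c(33.09259, 40.10909, 39.92778)
)
mse <- 314.1552   # pooled error term from the full three-way model

# WHITE minus NON-WHITE difference at each income level, raw and
# standardized by sqrt(MSE).
means$diff <- means$white - means$non.white
means$d    <- round(means$diff / sqrt(mse), 2)
means
```

The standardized difference is below 0.1 at LOW and MED income and about 0.65 at HIGH income, which matches the pattern described above.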
