Lab 7 - ANOVA 2

Mansi Kumari (7908159)

2023-03-15

Learning Objectives

By the end of this lab, you should have a grasp on the following concepts:

• How to construct confidence intervals for group means after performing ANOVA with confint.
• How to construct confidence intervals for differences in group means with confint.
• How to test for differences in means after performing ANOVA, using TukeyHSD.

Instructions

To complete this worksheet, add code as needed into the R code chunks given below. Do not delete the
question text. All text should be in complete English sentences. Be sure to change the author of this file to
reflect your name and student number.
To properly see the questions, knit this .Rmd file to .pdf and view the output. You will have a link in your
email that takes you to the Crowdmark submission page. Once you have completed the worksheet, knit it
to .pdf and upload your output to Crowdmark.

Exercises
Import the Games200 dataset, which contains information on a sample of video games released in 2019.

Games200 <- read.csv("~/Downloads/Games200.csv")

Perform ANOVA on this dataset to compare the mean Metascore between each platform.

my.aov = aov(Metascore ~ Platform, data = Games200)
summary(my.aov)

##              Df Sum Sq Mean Sq F value  Pr(>F)
## Platform      3    923  307.80   5.261 0.00164 **
## Residuals   196  11468   58.51
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Use the aggregate function to see the mean Metascore of each platform.

means = aggregate(Metascore ~ Platform, data = Games200, FUN = mean)
means

##        Platform Metascore
## 1            PC  74.63462
## 2 PlayStation 4  71.48889
## 3        Switch  72.24675
## 4      Xbox One  78.11538

Calculate a 95% confidence interval for the mean Metascore of the PC games, without using any functions
not introduced yet in the lab.

t.star = qt(0.975, 200 - 4)


t.star

## [1] 1.972141

MSE <- 58.51
ns = aggregate(Metascore ~ Platform, data = Games200, FUN = length)
ns

##        Platform Metascore
## 1            PC        52
## 2 PlayStation 4        45
## 3        Switch        77
## 4      Xbox One        26

moe.PC = t.star*sqrt(MSE/52)
moe.PC

## [1] 2.09195

c(74.63 - moe.PC, 74.63 + moe.PC)

## [1] 72.53805 76.72195
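
The hand calculation above follows the usual pooled-variance interval for a single group mean after ANOVA. Written out with the values printed in the output (the symbols N for the total sample size and k for the number of groups are notation introduced here, so N - k = 200 - 4 = 196):

```latex
\bar{y}_{\mathrm{PC}} \pm t^{*}_{N-k}\sqrt{\frac{\mathrm{MSE}}{n_{\mathrm{PC}}}}
  = 74.63 \pm 1.972141\sqrt{\frac{58.51}{52}}
  = 74.63 \pm 2.09195
  = (72.53805,\ 76.72195)
```

Because 74.63 and 58.51 are rounded values, this interval can differ in the second or third decimal place from one computed from the unrounded mean and MSE.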

Use confint to calculate a 95% confidence interval for the mean Metascore of the PC games.

confint(my.aov, level = 0.95)

##                            2.5 %      97.5 %
## (Intercept)           72.5426369 76.72659387
## PlatformPlayStation 4 -6.2171296 -0.07432336
## PlatformSwitch        -5.0956006  0.31987628
## PlatformXbox One      -0.1426438  7.10418226
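
In this output, the (Intercept) row is the interval for the mean of the reference group (PC, the first platform alphabetically), and the remaining rows are intervals for each platform's difference from PC. The sketch below checks that reading by redoing the hand calculation and comparing it with the (Intercept) row from confint. It assumes R's built-in PlantGrowth dataset as a stand-in, since Games200.csv is not bundled with this text. The names fit, mse, manual.ci, and confint.ci are illustrative.

```r
# Stand-in data: PlantGrowth ships with R (weight by group: ctrl, trt1, trt2).
# It is NOT the lab's Games200 data; it only illustrates the technique.
fit <- aov(weight ~ group, data = PlantGrowth)

# Take the pieces of the hand calculation from the model instead of retyping them.
mse    <- sum(residuals(fit)^2) / df.residual(fit)  # mean squared error
t.star <- qt(0.975, df.residual(fit))               # critical value for 95%

ref   <- levels(PlantGrowth$group)[1]               # reference group
y.ref <- PlantGrowth$weight[PlantGrowth$group == ref]
moe   <- t.star * sqrt(mse / length(y.ref))         # margin of error

manual.ci  <- c(mean(y.ref) - moe, mean(y.ref) + moe)
confint.ci <- confint(fit, level = 0.95)["(Intercept)", ]

manual.ci
confint.ci
```

Computing the MSE from the fitted model avoids the small rounding error that comes from copying 58.51 out of the summary table.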

Convert the Platform variable in the Games200 dataset into the factor data type.

Games200$Platform = factor(Games200$Platform)

Use relevel to set Switch as the reference group in the Games200 dataset.

Games200$Platform = relevel(Games200$Platform, ref = "Switch")

Recreate the ANOVA model comparing mean Metascores between platforms, and use confint to get the
95% confidence interval for the Metascores of the Switch games, as well as the 95% confidence intervals for
the differences in mean Metascores, compared to the Switch.

my.aov = aov(Metascore ~ Platform, data = Games200)

confint(my.aov, level = 0.95)

##                            2.5 %    97.5 %
## (Intercept)           70.5276042 73.965902
## PlatformPC            -0.3198763  5.095601
## PlatformPlayStation 4 -3.5885209  2.072792
## PlatformXbox One       2.4469035  9.290359
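
Releveling changes only which group the other groups are compared against. The interval for PC relative to Switch here, (-0.3198763, 5.095601), is the earlier Switch-relative-to-PC interval with its endpoints negated and swapped. The sketch below shows the same behaviour, again assuming the built-in PlantGrowth dataset as a stand-in for Games200. The names pg, fit.ctrl, fit.trt1, and flipped are illustrative.

```r
# Stand-in data (not the lab's Games200): PlantGrowth, groups ctrl, trt1, trt2.
fit.ctrl <- aov(weight ~ group, data = PlantGrowth)   # reference group: ctrl

# Copy the data and make trt1 the reference group instead.
pg <- PlantGrowth
pg$group <- relevel(pg$group, ref = "trt1")
fit.trt1 <- aov(weight ~ group, data = pg)            # reference group: trt1

ci.ctrl <- confint(fit.ctrl)   # rows: (Intercept), grouptrt1, grouptrt2
ci.trt1 <- confint(fit.trt1)   # rows: (Intercept), groupctrl, grouptrt2

# The (trt1 - ctrl) interval, negated and reversed, is the (ctrl - trt1) interval.
flipped <- -rev(unname(ci.ctrl["grouptrt1", ]))

levels(pg$group)
flipped
unname(ci.trt1["groupctrl", ])
```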

Exercise: Load in the MLB200 dataset. This dataset contains the Names, Teams, Positions,
Heights, Weights, and Ages of a sample of 200 MLB players (taken a few years ago). The
positions have been recorded as Pit for Pitcher, Cat for Catcher, Out for Outfielder, and Inf
for Infielder, and Cat is the default reference group. Use aov to calculate an ANOVA table
comparing the mean heights of players by their position, and use confint to calculate a 99%
confidence interval for the mean height of the Catchers.

MLB200 <- read.csv("~/Downloads/MLB200.csv")
aov.MLB <- aov(Height ~ Position, data = MLB200)
confint(aov.MLB, level = 0.99)

##                  0.5 %    99.5 %
## (Intercept) 71.1501079 73.516559
## PositionInf -1.2112002  1.724021
## PositionOut -0.9055260  2.072193
## PositionPit  0.7637008  3.358094

Exercise: Relevel this dataset so that Out is the reference group.

MLB200$Position <- factor(MLB200$Position)
MLB200$Position <- relevel(MLB200$Position, ref = "Out")
aov.MLB <- aov(Height ~ Position, data = MLB200)
confint(aov.MLB, level = 0.99)

##                  0.5 %     99.5 %
## (Intercept) 72.0129633 73.8203700
## PositionCat -2.0721926  0.9055260
## PositionInf -1.5801341  0.9262879
## PositionPit  0.4290525  2.5260757

Exercise: Use confint to calculate a 99% confidence interval for the mean height of the
Outfielders, as well as a 99% confidence interval for the difference in the mean heights between
the Outfielders and the Pitchers.

confint(aov.MLB,level = 0.99)

##                  0.5 %     99.5 %
## (Intercept) 72.0129633 73.8203700
## PositionCat -2.0721926  0.9055260
## PositionInf -1.5801341  0.9262879
## PositionPit  0.4290525  2.5260757

## [1] " The 99% confidence interval for the mean height of the Outfielders is (72.0129633 ,73.8203700)\

Use TukeyHSD to calculate the confidence intervals needed to test for differences in the mean Metascores
between each platform.

my.Tukey = TukeyHSD(my.aov, conf.level = 0.95)


my.Tukey

##   Tukey multiple comparisons of means
##     95% family-wise confidence level
##
## Fit: aov(formula = Metascore ~ Platform, data = Games200)
##
## $Platform
##                              diff       lwr        upr     p adj
## PC-Switch               2.3878621 -1.169851  5.9455749 0.3062905
## PlayStation 4-Switch   -0.7578644 -4.477080  2.9613513 0.9522093
## Xbox One-Switch         5.8686314  1.372804 10.3644589 0.0047747
## PlayStation 4-PC       -3.1457265 -7.181260  0.8898074 0.1841909
## Xbox One-PC             3.4807692 -1.280054  8.2415921 0.2337460
## Xbox One-PlayStation 4  6.6264957  1.743804 11.5091875 0.0030276
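
A pair of groups differs significantly at the family-wise level when its interval excludes zero, which happens exactly when its adjusted p-value is below 1 minus the confidence level. In the table above, that holds for Xbox One-Switch and Xbox One-PlayStation 4. The sketch below shows one way to read this off programmatically. It assumes the built-in PlantGrowth dataset as a stand-in for Games200, and the names tk, excludes.zero, and small.p are illustrative.

```r
# Stand-in data (not the lab's Games200): PlantGrowth, groups ctrl, trt1, trt2.
fit <- aov(weight ~ group, data = PlantGrowth)

# TukeyHSD returns a list with one matrix per factor; the columns are
# diff, lwr, upr, and "p adj", and the row names are the group pairs.
tk <- TukeyHSD(fit, conf.level = 0.95)$group

# Two equivalent ways to flag a significant pairwise difference.
excludes.zero <- unname(tk[, "lwr"] > 0 | tk[, "upr"] < 0)
small.p       <- unname(tk[, "p adj"] < 0.05)

data.frame(pair = rownames(tk), excludes.zero, small.p)
```

For a 99% family-wise level, the threshold on the adjusted p-value would be 0.01 instead of 0.05.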

Plot the confidence intervals calculated above.

plot(my.Tukey)

[Plot: 95% family-wise confidence intervals for the differences in mean levels of Platform, one interval per pair of platforms; horizontal axis runs from about -5 to 10.]

Exercise: Use the TukeyHSD function to create 99% confidence intervals comparing the mean
heights for each position in the MLB200 dataset. Which groups are significantly different?

MLB200.HSD<- TukeyHSD(aov.MLB, conf.level = 0.99)


MLB200.HSD

##   Tukey multiple comparisons of means
##     99% family-wise confidence level
##
## Fit: aov(formula = Height ~ Position, data = MLB200)
##
## $Position
##               diff        lwr      upr     p adj
## Cat-Out -0.5833333 -2.3885149 1.221848 0.7384508
## Inf-Out -0.3269231 -1.8463907 1.192545 0.9051375
## Pit-Out  1.4775641  0.2062863 2.748842 0.0017933
## Inf-Cat  0.2564103 -1.5230080 2.035829 0.9686972
## Pit-Cat  2.0608974  0.4880990 3.633696 0.0003082
## Pit-Inf  1.8044872  0.5700659 3.038909 0.0000427

Exercise: Plot the confidence intervals calculated in the previous exercise.

plot(MLB200.HSD)

[Plot: 99% family-wise confidence intervals for the differences in mean levels of Position, one interval per pair of positions; horizontal axis runs from about -2 to 3.]
