R Cheat Sheet v3
nrow
nrow(dataframe)
summary
summary(dataframe)
summary(dataframe$variable)
table
table(dataframe$variable)
table(dataframe$variable1, dataframe$variable2,
dnn=c("NameVariable1", "NameVariable2"))
Output: Frequencies
Best for: Showing frequencies of categorical variables only
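As a minimal sketch of the table calls above, assuming the built-in mtcars data set stands in for "dataframe" and that cyl and gear are treated as categorical (the dimension labels are illustrative only):

```r
# mtcars ships with R; it is only a stand-in for "dataframe" here.
one_way <- table(mtcars$cyl)                      # frequencies of one variable
two_way <- table(mtcars$cyl, mtcars$gear,
                 dnn = c("Cylinders", "Gears"))   # crosstab with labeled dimensions
print(one_way)
print(two_way)
```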
prop.table
prop.table(table(dataframe$variable)) #proportion of all observations
prop.table(table(dataframe$variable1, dataframe$variable2)) #cell proportions
prop.table(table(dataframe$variable1, dataframe$variable2), 1) #row proportions
prop.table(table(dataframe$variable1, dataframe$variable2), 2) #column proportions
Output: Proportions
Best for: Showing proportions for one or more categorical variables in a data frame
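The margin argument (none, 1, or 2) decides what each proportion is taken relative to. A small sketch, again assuming the built-in mtcars data as a stand-in for "dataframe":

```r
# Crosstab of two variables treated as categorical (illustrative choice).
tab <- table(mtcars$cyl, mtcars$gear)

cell_props <- prop.table(tab)      # all cells together sum to 1
row_props  <- prop.table(tab, 1)   # each row sums to 1
col_props  <- prop.table(tab, 2)   # each column sums to 1

# Multiply by 100 if you want percentages instead of proportions.
round(100 * row_props, 1)
```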
freqtable
freqtable(dataframe$variable)
aggregate
aggregate(IRvariable ~ CategoricalVariable, dataframe, mean)
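For illustration, a sketch of the aggregate call that computes the mean of an IR variable within each category. It assumes the built-in mtcars data, with mpg as the IR variable and cyl as the categorical variable:

```r
# Mean miles per gallon for each number of cylinders.
group_means <- aggregate(mpg ~ cyl, mtcars, mean)
print(group_means)   # one row per category, one column per variable
```

Swapping mean for another summary function such as median or sd gives that statistic per category instead.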
Descriptive Statistics
range
range(dataframe$IRvariable)
max
max(dataframe$IRvariable)
var
var(dataframe$IRvariable)
Output: Variance
sd
sd(dataframe$IRvariable)
IQR
IQR(dataframe$IRvariable)
quantile
quantile(dataframe$IRvariable)
quantile(dataframe$IRvariable, seq(0, 1, 0.1))
*0 - lowest percentile to report, as a decimal (between 0 and 1)
*1 - highest percentile to report, as a decimal (between 0 and 1)
*0.1 - step between percentiles, as a decimal (0.1 gives every 10th percentile)
Output: Percentiles
Best for: IR variables only; getting a general sense of where the data are concentrated;
a good checkpoint before creating categories from IR variables
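A short sketch of the two quantile calls, assuming mtcars$mpg stands in for dataframe$IRvariable:

```r
# Default call: minimum, quartiles, and maximum.
quartiles <- quantile(mtcars$mpg)

# seq(0, 1, 0.1) generates 0, 0.1, ..., 1, so this returns the deciles.
deciles <- quantile(mtcars$mpg, seq(0, 1, 0.1))
print(quartiles)
print(deciles)
```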
hist
hist(dataframe$IRvariable)
Output: Histogram
Best for: Graphic display of IR variable; watch out for outliers
pie
pie(table(dataframe$CategoricalVariable))
barplot
barplot(table(dataframe$variable))
scatterplot
plot(dataframe$dependentvariable~dataframe$independentvariable)
abline(lm(dataframe$dependentvariable~dataframe$independentvariable))
boxplot
boxplot(dataframe$IRvariable~dataframe$CategoricalVariable)
Significance Testing
t.test
t.test(dataframe$IRvariable, mu=X, conf.level=0.95)
Output: One-sample t-test: test statistic, degrees of freedom, p-value, confidence interval,
and your sample mean
Best for: Performing single-sample hypothesis testing on IR variables
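A minimal one-sample sketch, assuming mtcars$mpg as the IR variable and a hypothesized mean of 20 (both chosen only for illustration):

```r
# Test whether the mean of mpg differs from a hypothesized value of 20.
result <- t.test(mtcars$mpg, mu = 20, conf.level = 0.95)

result$statistic   # test statistic (t)
result$parameter   # degrees of freedom
result$p.value     # p-value
result$conf.int    # confidence interval for the mean
result$estimate    # sample mean
```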
prop.test
Notes: All inputs into this function are actual numbers, not variables. The first
(‘FrequencyofYes’) is the number of cases in the category you’re measuring
(e.g. to test whether the proportion of smokers differs significantly from the
proportion of nonsmokers, you’d put the total number of smokers in this spot).
The second (‘TotalResponses’) is the total number of responses, excluding
any NAs.
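As a sketch of how the two numbers are passed, assuming made-up counts (45 "yes" responses out of 120 valid responses) and variable names chosen to mirror the notes above:

```r
# Hypothetical counts, for illustration only.
frequency_of_yes <- 45    # e.g. number of smokers
total_responses  <- 120   # all valid responses, NAs already excluded

# With no other arguments, prop.test compares the proportion against 0.5.
result <- prop.test(frequency_of_yes, total_responses)

result$estimate   # sample proportion
result$p.value    # p-value
result$conf.int   # confidence interval for the proportion
```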
t.test
t.test(IRVariable~CategoricalVariable, dataframe)
Notes: The categorical variable can have a maximum of two “levels” (categories) for
this test to work. See Lab 6 for a workaround.
Output: Two sample hypothesis test: test statistic, degrees of freedom, p-value,
confidence interval, and raw means for both categories being tested.
Best for: Performing two-sample hypothesis test where one variable is categorical and the
other is interval ratio (the test is comparing the means of the IR variable)
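A two-sample sketch, assuming the built-in ToothGrowth data, where len is the IR variable and supp is a categorical variable with exactly two levels:

```r
# supp has two levels (OJ and VC), so the formula interface works directly.
result <- t.test(len ~ supp, ToothGrowth)

result$statistic   # test statistic
result$parameter   # degrees of freedom (Welch-adjusted by default)
result$p.value     # p-value
result$conf.int    # confidence interval for the difference in means
result$estimate    # mean of len in each category
```

By default t.test does not assume equal variances (the Welch test), which is why the degrees of freedom are usually not a whole number.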
prop.test
The second section needs to match the first; if we use the example we were already
working with, TotalDepG1 would be the total number of moms who don’t smoke,
and TotalDepG2 would be the total number of moms who do smoke.
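A sketch of a two-sample proportion test in which both vectors list the groups in the same order. All counts and the yes_g1/yes_g2 names are hypothetical; only TotalDepG1 and TotalDepG2 come from the notes above:

```r
# Hypothetical counts, for illustration only.
yes_g1 <- 30;  TotalDepG1 <- 100   # group 1: cases of interest, group total
yes_g2 <- 48;  TotalDepG2 <- 110   # group 2: cases of interest, group total

# First vector: counts of interest. Second vector: group totals, same order.
result <- prop.test(c(yes_g1, yes_g2), c(TotalDepG1, TotalDepG2))

result$estimate   # proportion in each group
result$p.value    # p-value for the difference in proportions
```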
ANOVA
Post Hoc
TukeyHSD
TukeyHSD(anova.results)
Notes: Must be run on the results of a previous ANOVA. Uses a 95% confidence level by default.
Output: Shows matched pairs with difference in means, lower and upper bounds
of the confidence interval, and, most importantly, p-adjusted.
Best for: Determining where the significant difference is detected in ANOVA
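A sketch of the ANOVA-then-Tukey workflow. It assumes the ANOVA was fitted with aov() (the fitting call is not shown in this sheet) and uses the built-in PlantGrowth data, where group has three levels:

```r
# Fit the ANOVA first; storing it as anova.results matches the call above.
anova.results <- aov(weight ~ group, data = PlantGrowth)
summary(anova.results)

# Post hoc comparison of every pair of groups.
posthoc <- TukeyHSD(anova.results)
posthoc$group   # columns: diff, lwr, upr, p adj
```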
Bonferroni
pairwise.t.test(dataframe$variable1, dataframe$variable2,
p.adjust.method = "bonferroni")
Notes: Doesn’t rely on having done any other hypothesis testing beforehand.
More conservative than Tukey.
Output: Matrix of relationships between both variables showing adjusted p-value.
Best for: Detailed hypothesis test for two variables
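A sketch of the Bonferroni-adjusted pairwise test, assuming the built-in PlantGrowth data with weight as the IR variable and group as the categorical variable:

```r
# First argument: the IR variable. Second argument: the grouping variable.
result <- pairwise.t.test(PlantGrowth$weight, PlantGrowth$group,
                          p.adjust.method = "bonferroni")

# Lower-triangular matrix of adjusted p-values, one per pair of groups.
result$p.value
```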
chisq.test
chisq.test(CrosstabDataframe, correct=FALSE)
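A sketch of a chi-square test on a crosstab, assuming a made-up 2x2 table of counts (the labels and numbers are hypothetical). In practice the crosstab would usually come from table(dataframe$variable1, dataframe$variable2):

```r
# Hypothetical 2x2 crosstab of counts.
crosstab <- matrix(c(30, 70, 48, 62), nrow = 2,
                   dimnames = list(Smokes = c("Yes", "No"),
                                   Group  = c("G1", "G2")))

# correct = FALSE turns off the continuity correction used for 2x2 tables.
result <- chisq.test(crosstab, correct = FALSE)

result$statistic   # chi-square statistic
result$parameter   # degrees of freedom
result$p.value     # p-value
result$expected    # expected counts under independence
```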
Measures of Association
cor
cor(dataframe$dependent, dataframe$independent,
method="pearson", use="complete.obs")
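A sketch of the cor call, assuming mtcars$mpg as the dependent variable and mtcars$wt as the independent variable:

```r
# use = "complete.obs" drops any pair of observations that contains an NA.
r <- cor(mtcars$mpg, mtcars$wt,
         method = "pearson", use = "complete.obs")
print(r)   # negative: heavier cars tend to get fewer miles per gallon
```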
cor.test
lm
lm(dataframe$variable1 ~ dataframe$variable2)
test_regression <- lm(dataframe$depvariable1 ~ dataframe$indepvariable2)
summary(test_regression)
Output: Coefficients with their standard errors, test statistics, and p-values
(the Pr(>|t|) column). Also includes degrees of freedom, R-squared, and adjusted R-squared.
Best for: Performing bivariate and multivariate regression
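A sketch of a bivariate and a multivariate regression, assuming the built-in mtcars data. The data= form used here is an equivalent alternative to the dataframe$variable style above, and the choice of variables is illustrative:

```r
# Bivariate: mpg (dependent) predicted by wt (independent).
test_regression <- lm(mpg ~ wt, data = mtcars)
summary(test_regression)

# The coefficient table on its own: estimate, std. error, t value, Pr(>|t|).
coef(summary(test_regression))

# Multivariate: add more independent variables with +.
multi_regression <- lm(mpg ~ wt + hp, data = mtcars)
summary(multi_regression)$adj.r.squared
```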
*bolded are longer-style references that weren’t mentioned in this cheat sheet