
Statistics Quick Reference Card & R Commands
by Anthony Tanbakuchi. Version 1.8.2
http://www.tanbakuchi.com
ANTHONY@TANBAKUCHI·COM
Get R at: http://www.r-project.org
R commands: bold typewriter text

1 Misc R

To make a vector / store data: x=c(x1, x2, ...)
Help: general RSiteSearch("Search Phrase")
Help: function ?functionName
Get column of data from table: tableName$columnName
List all variables: ls()
Delete all variables: rm(list=ls())

$\sqrt{x}$ = sqrt(x)  (1)
$x^n$ = x^n  (2)
$n$ = length(x)  (3)
$T$ = table(x)  (4)

2 Descriptive Statistics

2.1 NUMERICAL

Let x=c(x1, x2, x3, ...)

total $= \sum_{i=1}^{n} x_i$ = sum(x)  (5)
min = min(x)  (6)
max = max(x)  (7)
six number summary: summary(x)  (8)
$\mu = \dfrac{\sum x_i}{N}$ = mean(x)  (9)
$\bar{x} = \dfrac{\sum x_i}{n}$ = mean(x)  (10)
$\tilde{x} = P_{50}$ = median(x)  (11)
$\sigma = \sqrt{\dfrac{\sum (x_i - \mu)^2}{N}}$  (12)
$s = \sqrt{\dfrac{\sum (x_i - \bar{x})^2}{n - 1}}$ = sd(x)  (13)
$CV = \dfrac{\sigma}{\mu} = \dfrac{s}{\bar{x}}$  (14)

2.2 RELATIVE STANDING

$z = \dfrac{x - \mu}{\sigma} = \dfrac{x - \bar{x}}{s}$  (15)

Percentiles:
$P_k = x_i$, (sorted x)
$k = \dfrac{i - 0.5}{n} \cdot 100\%$  (16)

To find $x_i$ given $P_k$, $i$ is:
1. $L = (k/100\%)\,n$
2. if $L$ is an integer: $i = L + 0.5$; otherwise $i = L$ and round up.

2.3 VISUAL

All plots have optional arguments:
• main="" sets title
• xlab="", ylab="" sets x/y-axis label
• type="p" for point plot
• type="l" for line plot
• type="b" for both points and lines
Ex: plot(x, y, type="b", main="My Plot")

Plot Types:
hist(x)  histogram
stem(x)  stem & leaf
boxplot(x)  box plot
plot(T)  bar plot, T=table(x)
plot(x,y)  scatter plot, x, y are ordered vectors
plot(t,y)  time series plot, t, y are ordered vectors
curve(expr, xmin,xmax)  plot expr involving x

2.4 ASSESSING NORMALITY

Q-Q plot: qqnorm(x); qqline(x)

3 Probability

Number of successes $x$ with $n$ possible outcomes. (Don't double count!)

$P(A) = \dfrac{x_A}{n}$  (17)
$P(\bar{A}) = 1 - P(A)$  (18)
$P(A \text{ or } B) = P(A) + P(B) - P(A \text{ and } B)$  (19)
$P(A \text{ or } B) = P(A) + P(B)$ if A, B mut. excl.  (20)
$P(A \text{ and } B) = P(A) \cdot P(B|A)$  (21)
$P(A \text{ and } B) = P(A) \cdot P(B)$ if A, B independent  (22)
$n! = n(n-1)\cdots 1$ = factorial(n)  (23)
${}_nP_k = \dfrac{n!}{(n-k)!}$  Perm. no elem. alike  (24)
$\dfrac{n!}{n_1!\,n_2! \cdots n_k!}$  Perm. $n_1$ alike, ...  (25)
${}_nC_k = \dfrac{n!}{(n-k)!\,k!}$ = choose(n,k)  (26)

4 Discrete Random Variables

$P(x_i)$: probability distribution  (27)
$E = \mu = \sum x_i \cdot P(x_i)$  (28)
$\sigma = \sqrt{\sum (x_i - \mu)^2 \cdot P(x_i)}$  (29)

4.1 BINOMIAL DISTRIBUTION

$\mu = n \cdot p$  (30)
$\sigma = \sqrt{n \cdot p \cdot q}$  (31)
$P(x) = {}_nC_x\, p^x q^{(n-x)}$ = dbinom(x, n, p)  (32)

4.2 POISSON DISTRIBUTION

$P(x) = \dfrac{\mu^x \cdot e^{-\mu}}{x!}$ = dpois(x, µ)  (33)

5 Continuous random variables

CDF $F(x)$ gives area to the left of $x$, $F^{-1}(p)$ expects $p$ is area to the left.

$f(x)$: probability density  (34)
$E = \mu = \int_{-\infty}^{\infty} x \cdot f(x)\,dx$  (35)
$\sigma = \sqrt{\int_{-\infty}^{\infty} (x - \mu)^2 \cdot f(x)\,dx}$  (36)
$F(x)$: cumulative prob. density (CDF)  (37)
$F^{-1}(x)$: inv. cumulative prob. density  (38)
$F(x) = \int_{-\infty}^{x} f(x')\,dx'$  (39)
$p = P(x < x') = F(x')$  (40)
$x' = F^{-1}(p)$  (41)
$p = P(x > a) = 1 - F(a)$  (42)
$p = P(a < x < b) = F(b) - F(a)$  (43)

5.1 UNIFORM DISTRIBUTION

$p = P(u < u') = F(u')$ = punif(u', min=0, max=1)  (44)
$u' = F^{-1}(p)$ = qunif(p, min=0, max=1)  (45)

5.2 NORMAL DISTRIBUTION

$f(x) = \dfrac{1}{\sqrt{2\pi\sigma^2}} \cdot e^{-\frac{1}{2}\frac{(x-\mu)^2}{\sigma^2}}$  (46)
$p = P(z < z') = F(z')$ = pnorm(z')  (47)
$z' = F^{-1}(p)$ = qnorm(p)  (48)
$p = P(x < x') = F(x')$ = pnorm(x', mean=µ, sd=σ)  (49)
$x' = F^{-1}(p)$ = qnorm(p, mean=µ, sd=σ)  (50)

5.3 t-DISTRIBUTION

$p = P(t < t') = F(t')$ = pt(t', df)  (51)
$t' = F^{-1}(p)$ = qt(p, df)  (52)

5.4 $\chi^2$-DISTRIBUTION

$p = P(\chi^2 < \chi^{2\prime}) = F(\chi^{2\prime})$ = pchisq(X2', df)  (53)
$\chi^{2\prime} = F^{-1}(p)$ = qchisq(p, df)  (54)

5.5 F-DISTRIBUTION

$p = P(F < F') = F(F')$ = pf(F', df1, df2)  (55)
$F' = F^{-1}(p)$ = qf(p, df1, df2)  (56)

6 Sampling distributions

$\mu_{\bar{x}} = \mu$,  $\sigma_{\bar{x}} = \dfrac{\sigma}{\sqrt{n}}$  (57)
$\mu_{\hat{p}} = p$,  $\sigma_{\hat{p}} = \sqrt{\dfrac{pq}{n}}$  (58)

7 Estimation

7.1 CONFIDENCE INTERVALS

proportion: $\hat{p} \pm E$, $E = z_{\alpha/2} \cdot \sigma_{\hat{p}}$  (59)
mean (σ known): $\bar{x} \pm E$, $E = z_{\alpha/2} \cdot \sigma_{\bar{x}}$  (60)
mean (σ unknown, use s): $\bar{x} \pm E$, $E = t_{\alpha/2} \cdot \sigma_{\bar{x}}$, $df = n - 1$  (61)
variance: $\dfrac{(n-1)s^2}{\chi^2_R} < \sigma^2 < \dfrac{(n-1)s^2}{\chi^2_L}$, $df = n - 1$  (62)
2 proportions: $\Delta\hat{p} \pm z_{\alpha/2} \cdot \sqrt{\dfrac{\hat{p}_1\hat{q}_1}{n_1} + \dfrac{\hat{p}_2\hat{q}_2}{n_2}}$  (63)
2 means (indep): $\Delta\bar{x} \pm t_{\alpha/2} \cdot \sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}$, $df \approx \min(n_1 - 1, n_2 - 1)$  (64)
matched pairs: $\bar{d} \pm t_{\alpha/2} \cdot \dfrac{s_d}{\sqrt{n}}$, $d_i = x_i - y_i$, $df = n - 1$  (65)

7.2 CI CRITICAL VALUES (TWO SIDED)

$z_{\alpha/2} = F_z^{-1}(1 - \alpha/2)$ = qnorm(1-alpha/2)  (66)
$t_{\alpha/2} = F_t^{-1}(1 - \alpha/2)$ = qt(1-alpha/2, df)  (67)
$\chi^2_L = F_{\chi^2}^{-1}(\alpha/2)$ = qchisq(alpha/2, df)  (68)
$\chi^2_R = F_{\chi^2}^{-1}(1 - \alpha/2)$ = qchisq(1-alpha/2, df)  (69)

7.3 REQUIRED SAMPLE SIZE

proportion: $n = \hat{p}\hat{q}\left(\dfrac{z_{\alpha/2}}{E}\right)^2$, ($\hat{p} = \hat{q} = 0.5$ if unknown)  (70)
mean: $n = \left(\dfrac{z_{\alpha/2} \cdot \hat{\sigma}}{E}\right)^2$  (71)
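As an illustration of the confidence interval for a mean when σ is unknown (equations 61 and 67), here is a minimal sketch in R. The sample vector x is made up for the example and is not from the card; the standard error is estimated as s/√n. The result is cross-checked against the interval that t.test() reports.

```r
# Hypothetical sample data (made up for this example, not from the card)
x <- c(4.1, 5.2, 6.3, 5.8, 4.9, 5.5)

n     <- length(x)   # sample size, eq. (3)
xbar  <- mean(x)     # sample mean, eq. (10)
s     <- sd(x)       # sample standard deviation, eq. (13)
alpha <- 0.05        # 1 - alpha = 95% confidence level

# Two-sided critical value t_{alpha/2} with df = n - 1, eq. (67)
t_crit <- qt(1 - alpha/2, df = n - 1)

# Margin of error E = t_{alpha/2} * s / sqrt(n), eq. (61)
E  <- t_crit * s / sqrt(n)
ci <- c(lower = xbar - E, upper = xbar + E)
print(ci)

# Cross-check against the interval computed by t.test()
ci_builtin <- t.test(x, conf.level = 1 - alpha)$conf.int
print(ci_builtin)
```

Computing the interval by hand and comparing it with t.test() is a quick way to confirm that the critical value and the degrees of freedom were chosen correctly.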
8 Hypothesis Tests

Test statistic and R function (when available) are listed for each.

Optional arguments for hypothesis tests:
alternative="two.sided" can be: "two.sided", "less", "greater"
conf.level=0.95 constructs a 95% confidence interval. Standard CI only when alternative="two.sided".

Optional arguments for power calculations & Type II error:
alternative="two.sided" can be: "two.sided" or "one.sided"
sig.level=0.05 sets the significance level α.

8.1 1-SAMPLE PROPORTION

$H_0: p = p_0$
prop.test(x, n, p=p0, alternative="two.sided")
$z = \dfrac{\hat{p} - p_0}{\sqrt{p_0 q_0 / n}}$  (72)

8.2 1-SAMPLE MEAN (σ KNOWN)

$H_0: \mu = \mu_0$
$z = \dfrac{\bar{x} - \mu_0}{\sigma / \sqrt{n}}$  (73)

8.3 1-SAMPLE MEAN (σ UNKNOWN)

$H_0: \mu = \mu_0$
t.test(x, mu=µ0, alternative="two.sided")
Where x is a vector of sample data.
$t = \dfrac{\bar{x} - \mu_0}{s / \sqrt{n}}$, $df = n - 1$  (74)
Required Sample size:
power.t.test(delta=h, sd=σ, sig.level=α, power=1-β, type="one.sample", alternative="two.sided")

8.4 2-SAMPLE PROPORTION TEST

$H_0: p_1 = p_2$ or equivalently $H_0: \Delta p = 0$
prop.test(x, n, alternative="two.sided")
where: x=c(x1, x2) and n=c(n1, n2)
$z = \dfrac{\Delta\hat{p} - \Delta p_0}{\sqrt{\dfrac{\bar{p}\bar{q}}{n_1} + \dfrac{\bar{p}\bar{q}}{n_2}}}$, $\Delta\hat{p} = \hat{p}_1 - \hat{p}_2$  (75)
$\bar{p} = \dfrac{x_1 + x_2}{n_1 + n_2}$, $\bar{q} = 1 - \bar{p}$  (76)
Required Sample size:
power.prop.test(p1=p1, p2=p2, power=1-β, sig.level=α, alternative="two.sided")

8.5 2-SAMPLE MEAN TEST

$H_0: \mu_1 = \mu_2$ or equivalently $H_0: \Delta\mu = 0$
t.test(x1, x2, alternative="two.sided")
where: x1 and x2 are vectors of sample 1 and sample 2 data.
$t = \dfrac{\Delta\bar{x} - \Delta\mu_0}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}$, $df \approx \min(n_1 - 1, n_2 - 1)$, $\Delta\bar{x} = \bar{x}_1 - \bar{x}_2$  (77)
Required Sample size:
power.t.test(delta=h, sd=σ, sig.level=α, power=1-β, type="two.sample", alternative="two.sided")

8.6 2-SAMPLE MATCHED PAIRS TEST

$H_0: \mu_d = 0$
t.test(x, y, paired=TRUE, alternative="two.sided")
where: x and y are ordered vectors of sample 1 and sample 2 data.
$t = \dfrac{\bar{d} - \mu_{d0}}{s_d / \sqrt{n}}$, $d_i = x_i - y_i$, $df = n - 1$  (78)
Required Sample size:
power.t.test(delta=h, sd=σ, sig.level=α, power=1-β, type="paired", alternative="two.sided")

8.7 TEST OF HOMOGENEITY, TEST OF INDEPENDENCE

$H_0: p_1 = p_2 = \cdots = p_n$ (homogeneity)
$H_0$: X and Y are independent (independence)
chisq.test(D)
Enter table: D=data.frame(c1, c2, ...), where c1, c2, ... are column data vectors.
Or generate table: D=table(x1, x2), where x1, x2 are ordered vectors of raw categorical data.
$\chi^2 = \sum \dfrac{(O_i - E_i)^2}{E_i}$, $df = (\text{num rows} - 1)(\text{num cols} - 1)$  (79)
$E_i = \dfrac{(\text{row total})(\text{column total})}{(\text{grand total})} = n p_i$  (80)
For 2 × 2 contingency tables, you can use the Fisher Exact Test:
fisher.test(D, alternative="greater")
(must specify alternative as greater)

9 Linear Regression

9.1 LINEAR CORRELATION

$H_0: \rho = 0$
cor.test(x, y)
where: x and y are ordered vectors.
$r = \dfrac{\sum (x_i - \bar{x})(y_i - \bar{y})}{(n - 1)\, s_x s_y}$, $t = \dfrac{r - 0}{\sqrt{\dfrac{1 - r^2}{n - 2}}}$, $df = n - 2$  (81)

9.2 MODELS IN R

MODEL TYPE            EQUATION                                          R MODEL
linear 1 indep var    $y = b_0 + b_1 x_1$                               y~x1
. . . 0 intercept     $y = 0 + b_1 x_1$                                 y~0+x1
linear 2 indep vars   $y = b_0 + b_1 x_1 + b_2 x_2$                     y~x1+x2
. . . interaction     $y = b_0 + b_1 x_1 + b_2 x_2 + b_{12} x_1 x_2$    y~x1+x2+x1*x2
polynomial            $y = b_0 + b_1 x_1 + b_2 x_2^2$                   y~x1+I(x2^2)

9.3 REGRESSION

Simple linear regression steps:
1. Make sure there is a significant linear correlation.
2. results=lm(y~x)  Linear regression of y on x vectors
3. results  View the results
4. plot(x, y); abline(results)  Plot regression line on data
5. plot(x, results$residuals)  Plot residuals

$y = b_0 + b_1 x_1$  (82)
$b_1 = \dfrac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2}$  (83)
$b_0 = \bar{y} - b_1 \bar{x}$  (84)

9.4 PREDICTION INTERVALS

To predict y when x = 5 and show the 95% prediction interval with regression model in results:
predict(results, newdata=data.frame(x=5), int="pred")

10 ANOVA

10.1 ONE WAY ANOVA

1. results=aov(depVarColName~indepVarColName, data=tableName)  Run ANOVA with data in tableName, factor data in indepVarColName column, and response data in depVarColName column.
2. summary(results)  Summarize results
3. boxplot(depVarColName~indepVarColName, data=tableName)  Boxplot of levels for factor

$F = \dfrac{MS(\text{treatment})}{MS(\text{error})}$, $df_1 = k - 1$, $df_2 = N - k$  (85)
To find required sample size and power see power.anova.test(...)

11 Loading and using external data and tables

11.1 LOADING EXCEL DATA

1. Export your table as a CSV file (comma separated file) from Excel.
2. Import your table into MyTable in R using:
   MyTable=read.csv(file.choose())

11.2 LOADING AN .RDATA FILE

You can either double click on the .RData file or use the menu:
• Windows: File→Load Workspace. . .
• Mac: Workspace→Load Workspace File. . .

11.3 USING TABLES OF DATA

1. To see all the available variables type: ls()
2. To see what's inside a variable, type its name.
3. If the variable tableName is a table, you can also type names(tableName) to see the column names or type head(tableName) to see the first few rows of data.
4. To access a column of data type tableName$columnName

An example demonstrating how to get the women's height data and find the mean:

> ls() # See what variables are defined
[1] "women" "x"
> head(women) # Look at the first few entries
  height weight
1     58    115
2     59    117
3     60    120
> names(women) # Just get the column names
[1] "height" "weight"
> women$height # Display the height data
[1] 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72
> mean(women$height) # Find the mean of the heights
[1] 65
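The simple linear regression steps (section 9.3) and the prediction interval (section 9.4) can be tried on R's built-in women table. The following is a minimal sketch under some assumptions: weight is taken as the response and height as the predictor, and the height of 66 used for the prediction is an arbitrary value picked for illustration. The slope and intercept are also computed by hand from equations 83 and 84 so they can be compared with the lm() fit.

```r
# Built-in 'women' table with columns height and weight
data(women)

# Step 1: check for a significant linear correlation (section 9.1)
ct <- cor.test(women$height, women$weight)
print(ct$p.value)

# Step 2: linear regression of weight on height; 'results' as on the card
results <- lm(weight ~ height, data = women)

# Step 3: view the fitted coefficients
print(coef(results))

# Slope and intercept by hand, eqs. (83) and (84)
x  <- women$height
y  <- women$weight
b1 <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)
b0 <- mean(y) - b1 * mean(x)

# Section 9.4: 95% prediction interval at height = 66 (arbitrary example value).
# The newdata column name must match the predictor name used in the model.
pred <- predict(results, newdata = data.frame(height = 66),
                interval = "prediction")
print(pred)
```

Steps 4 and 5 (plotting the fitted line and the residuals) are left out here because they only produce graphics; plot(x, y); abline(results) and plot(x, results$residuals) can be run directly after the code above.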
