EquationSheet PDF

Statistics Quick Reference 2.
3 V ISUAL 5 Continuous random variables 6 Sampling distributions

Card & R Commands All plots have optional arguments: CDF F(x) gives area to the left of x, F −1 (p) expects p
µx̄ = µ σx̄ = √
σ
(57)
by Anthony Tanbakuchi. Version 1.8.2 • main="" sets title is area to the left.
n
http://www.tanbakuchi.com • xlab="", ylab="" sets x/y-axis label
• type="p" for point plot f (x) : probability density (34) r
pq
ANTHONY @ TANBAKUCHI · COM Z ∞ µ p̂ = p σ p̂ = (58)
Get R at: http://www.r-project.org • type="l" for line plot E =µ= x · f (x) dx (35) n
R commands: bold typewriter text • type="b" for both points and lines −∞
Ex: plot(x, y, type="b", main="My Plot") rZ
∞ 7 Estimation
1 Misc R Plot Types: σ= (x − µ)2 · f (x) dx (36)
hist(x) histogram
−∞ 7.1 C ONFIDENCE I NTERVALS
To make a vector / store data: x=c(x1, x2, ...)
stem(x) stem & leaf F(x) : cumulative prob. density (CDF) (37)
Help: general RSiteSearch("Search Phrase") proportion: p̂ ± E, E = zα/2 · σ p̂ (59)
Help: function ?functionName boxplot(x) box plot F −1 (x) : inv. cumulative prob. density (38)
plot(T) bar plot, T=table(x) mean (σ known): x̄ ± E, E = zα/2 · σx̄ (60)
Get column of data from table: Z x
tableName$columnName plot(x,y) scatter plot, x, y are ordered vectors F(x) = f (x0 ) dx0 (39) mean (σ unknown, use s): x̄ ± E, E = tα/2 · σx̄ , (61)
−∞
List all variables: ls() plot(t,y) time series plot, t, y are ordered vectors
curve(expr, xmin,xmax) plot expr involving x p = P(x < x0 ) = F(x0 ) (40) d f = n−1
Delete all variables: rm(list=ls())
0
x =F −1
(p) (41) (n − 1)s2 (n − 1)s2
√ variance: < σ2 < , (62)
x = sqrt(x) (1) 2.4 A SSESSING N ORMALITY p = P(x > a) = 1 − F(a) (42)
2
χR χ2L
xn = x∧ n (2) Q-Q plot: qqnorm(x); qqline(x) d f = n−1
p = P(a < x < b) = F(b) − F(a) (43)
r
n = length(x) (3) pˆ1 qˆ1 pˆ2 qˆ2
3 Probability 5.1 U NIFORM DISTRIBUTION 2 proportions: ∆ p̂ ± zα/2 · + (63)
T = table(x) (4) n1 n2
Number of successes x with n possible outcomes. s
(Don’t double count!) p = P(u < u0 ) = F(u0 ) s21 s2
2 Descriptive Statistics 2 means (indep): ∆x̄ ± tα/2 · + 2, (64)
xA = punif(u’, min=0, max=1) (44) n1 n2
2.1 N UMERICAL P(A) = (17)
n u0 = F −1 (p) = qunif(p, min=0, max=1) (45) d f ≈ min (n1 − 1, n2 − 1)
Let x=c(x1, x2, x3, ...) P(Ā) = 1 − P(A) (18) sd
n matched pairs: d¯ ± tα/2 · √ , di = xi − yi , (65)
total = ∑ xi = sum(x) (5) P(A or B) = P(A) + P(B) − P(A and B) (19) 5.2 N ORMAL DISTRIBUTION n
i=1 P(A or B) = P(A) + P(B) if A, B mut. excl. (20) 2
d f = n−1
1 (x−µ)
min = min(x) (6) −1
P(A and B) = P(A) · P(B|A) (21) f (x) = √ · e 2 σ2 (46)
2πσ 2
max = max(x) (7) 7.2 CI C RITICAL VALUES ( TWO SIDED )
P(A and B) = P(A) · P(B) if A, B independent (22) p = P(z < z0 ) = F(z0 ) = pnorm(z’) (47)
six number summary : summary(x) (8) n! = n(n − 1) · · · 1 = factorial(n) (23) zα/2 = Fz−1 (1 − α/2) = qnorm(1-alpha/2) (66)
∑ xi z0 = F −1 (p) = qnorm(p) (48)
µ= = mean(x) (9) =
n!
Perm. no elem. alike (24) p = P(x < x0 ) = F(x0 ) tα/2 = Ft−1 (1 − α/2) = qt(1-alpha/2, df) (67)
N n Pk
(n − k)!
∑ xi = pnorm(x’, mean=µ, sd=σ) (49) χ2L = Fχ−1
2 (α/2) = qchisq(alpha/2, df) (68)
x̄ = = mean(x) (10) n!
n = Perm. n1 alike, . . . (25)
n1 !n2 ! · · · nk ! x0 = F −1 (p) χ2R = Fχ−1
2 (1 − α/2) = qchisq(1-alpha/2, df)
x̃ = P50 = median(x) (11)
n! = qnorm(p, mean=µ, sd=σ) (50) (69)
nCk = = choose(n,k) (26)
s
2 (n − k)!k!
∑ (xi − µ)
σ= (12)
N 5.3 t- DISTRIBUTION 7.3 R EQUIRED SAMPLE SIZE
s 4 Discrete Random Variables z 2
∑ (xi − x̄)
2
p = P(t < t 0 ) = F(t 0 ) = pt(t’, df) (51) proportion: n = p̂q̂
α/2
, (70)
s= = sd(x) (13) E
n−1 P(xi ) : probability distribution (27)
t 0 = F −1 (p) = qt(p, df) (52)
σ s ( p̂ = q̂ = 0.5 if unknown)
CV = = (14) E = µ = ∑ xi · P(xi ) (28)
µ x̄ χ2 - DISTRIBUTION zα/2 · σ̂ 2

q
5.4
σ = ∑ (xi − µ)2 · P(xi ) (29) mean: n = (71)
E
2.2 R ELATIVE STANDING 2
p = P(χ < χ ) = F(χ ) 20 20
x−µ x − x̄ 4.1 B INOMIAL DISTRIBUTION
z= = (15)
σ s = pchisq(X 2 ’, df) (53)
Percentiles: 20
µ = n· p (30) χ =F −1
(p) = qchisq(p, df) (54)
Pk = xi , (sorted x) √
σ = n· p·q (31)
i − 0.5 5.5 F - DISTRIBUTION
k= · 100% (16) P(x) = nCx px q(n−x) = dbinom(x, n, p) (32)
n
To find xi given Pk , i is: p = P(F < F 0 ) = F(F 0 )
1. L = (k/100%)n 4.2 P OISSON DISTRIBUTION = pf(F’, df1, df2) (55)
2. if L is an integer: i = L + 0.5; µx · e−µ
otherwise i=L and round up. P(x) = = dpois(x, µ) (33) F 0 = F −1 (p) = qf(p, df1, df2) (56)
x!
8 Hypothesis Tests 8.6 2- SAMPLE MATCHED PAIRS TEST 9.4 P REDICTION I NTERVALS
Test statistic and R function (when available) are listed for each. H0 : µd = 0 To predict y when x = 5 and show the 95% prediction interval with regression
Optional arguments for hypothesis tests: t.test(x, y, paired=TRUE, alternative="two.sided") model in results:
alternative="two.sided" can be: where: x and y are ordered vectors of sample 1 and sample 2 data. predict(results, newdata=data.frame(x=5),
"two.sided", "less", "greater" d¯ − µd0 int="pred")
conf.level=0.95 constructs a 95% confidence interval. Standard CI t= √ , di = xi − yi , d f = n − 1 (78)
sd / n
only when alternative="two.sided". 10 ANOVA
Required Sample size:
Optional arguments for power calculations & Type II error: 10.1 O NE WAY ANOVA
power.t.test(delta=h, sd =σ, sig.level=α, power=1 −
alternative="two.sided" can be:
β, type ="paired", alternative="two.sided") 1. results=aov(depVarColName∼indepVarColName,
"two.sided" or "one.sided"
data=tableName) Run ANOVA with data in TableName,
sig.level=0.05 sets the significance level α.
8.7 T EST OF HOMOGENEITY, TEST OF INDEPENDENCE factor data in indepVarColName column, and response data in
depVarColName column.
8.1 1- SAMPLE PROPORTION H0 : p1 = p2 = · · · = pn (homogeneity)
2. summary(results) Summarize results
H0 : X and Y are independent (independence)
H0 : p = p0 3. boxplot(depVarColName∼indepVarColName,
chisq.test(D)
prop.test(x, n, p=p0 , alternative="two.sided") data=tableName) Boxplot of levels for factor
Enter table: D=data.frame(c1, c2, ...), where c1, c2, ... are
p̂ − p0 column data vectors.
z= p (72) MS(treatment)
p0 q0 /n Or generate table: D=table(x1, x2), where x1, x2 are ordered vectors F= , d f1 = k − 1, d f2 = N − k (85)
MS(error)
of raw categorical data.
8.2 1- SAMPLE MEAN (σ KNOWN ) To find required sample size and power see power.anova.test(...)
(Oi − Ei )2
H0 : µ = µ0 χ2 = ∑ , d f = (num rows - 1)(num cols - 1) (79)
Ei 11 Loading and using external data and tables
x̄ − µ0 (row total)(column total)
z= √ (73) Ei = = npi (80) 11.1 L OADING EXCEL DATA
σ/ n (grand total)
1. Export your table as a CSV file (comma seperated file) from Excel.
For 2 × 2 contingency tables, you can use the Fisher Exact Test:
2. Import your table into MyTable in R using:
8.3 1- SAMPLE MEAN (σ UNKNOWN ) fisher.test(D, alternative="greater")
MyTable=read.csv(file.choose())
H0 : µ = µ0 (must specify alternative as greater)
t.test(x, mu=µ0 , alternative="two.sided")
Where x is a vector of sample data. 9 Linear Regression 11.2 L OADING AN .RDATA FILE
x̄ − µ0 You can either double click on the .RData file or use the menu:
t = √ , d f = n−1 (74) 9.1 L INEAR CORRELATION • Windows: File→Load Workspace. . .
s/ n
Required Sample size:
H0 : ρ = 0 • Mac: Workspace→Load Workspace File. . .
cor.test(x, y)
where: x and y are ordered vectors. 11.3 U SING TABLES OF DATA
β, type ="one.sample", alternative="two.sided")
∑ (xi − x̄)(yi − ȳ) r−0 1. To see all the available variables type: ls()
r= , t=q , d f = n−2 (81)
8.4 2- SAMPLE PROPORTION TEST (n − 1)sx sy 1−r2 2. To see what’s inside a variable, type its name.
n−2
H0 : p1 = p2 or equivalently H0 : ∆p = 0 3. If the variable tableName is a table, you can also type
prop.test(x, n, alternative="two.sided") names(tableName) to see the column names or type
9.2 M ODELS IN R head(tableName) to see the first few rows of data.
where: x=c(x1 , x2 ) and n=c(n1 , n2 ) MODEL TYPE EQUATION R MODEL 4. To access a column of data type tableName$columnName
∆ p̂ − ∆p0 linear 1 indep var y = b0 + b1 x1 y∼x1
z= q , ∆ p̂ = p̂1 − p̂2 (75) An example demonstrating how to get the women’s height data and find the
p̄q̄ p̄q̄ . . . 0 intercept y = 0 + b1 x1 y∼0+x1
n + n 1 2 mean:
linear 2 indep vars y = b0 + b1 x1 + b2 x2 y∼x1+x2
x1 + x2 . . . inteaction y = b0 + b1 x1 + b2 x2 + b12 x1 x2 y∼x1+x2+x1*x2
p̄ = , q̄ = 1 − p̄ (76) > ls() # See what variables are defined
n1 + n2 polynomial y = b0 + b1 x1 + b2 x22 y∼x1+I(x2∧ 2)
Required Sample size: [1] "women" "x"
power.prop.test(p1=p1 , p2=p2 , power=1 − β, > head(women) #Look at the first few entries
9.3 R EGRESSION height weight
sig.level=α, alternative="two.sided")
Simple linear regression steps: 1 58 115
1. Make sure there is a significant linear correlation. 2 59 117
8.5 2- SAMPLE MEAN TEST 2. results=lm(y∼x) Linear regression of y on x vectors 3 60 120
H0 : µ1 = µ2 or equivalently H0 : ∆µ = 0 3. results View the results > names(women) # Just get the column names
t.test(x1, x2, alternative="two.sided") 4. plot(x, y); abline(results) Plot regression line on data [1] "height" "weight"
where: x1 and x2 are vectors of sample 1 and sample 2 data. 5. plot(x, results$residuals) Plot residuals > women$height # Display the height data
∆x̄ − ∆µ0 [1] 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72
t=r d f ≈ min (n1 − 1, n2 − 1), ∆x̄ = x̄1 − x̄2 (77)
s21 s22 y = b0 + b1 x1 (82) > mean(women$height) # Find the mean of the heights
n1 + n2 [1] 65
∑ (xi − x̄)(yi − ȳ)
Required Sample size: b1 = (83)
∑(xi − x̄)2
β, type ="two.sample", alternative="two.sided") b0 = ȳ − b1 x̄ (84)

EquationSheet PDF

Uploaded by

Document Information

Original Title

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

EquationSheet PDF

Uploaded by

Copyright:

Available Formats

Statistics Quick Reference 2.

3 V ISUAL 5 Continuous random variables 6 Sampling distributions

You might also like