Cheat Sheet
1. n ‘Bernoulli trials’: n trials such that each trial has only two possible
outcomes: ‘success’ (1) or ‘failure’ (0)
𝑋 = # of successes in n trials
μ = np, σ = √(np(1 − p))
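As a quick check of the two formulas, here is a minimal Python sketch. It assumes p denotes the probability of success on a single trial (the sheet does not define p explicitly); the function name and the numbers are illustrative only.

```python
import math

def binomial_mean_sd(n, p):
    """Mean and standard deviation of X = number of successes in n trials.

    Assumption: p is the probability of success on each trial.
    """
    if n < 0 or not 0.0 <= p <= 1.0:
        raise ValueError("need n >= 0 and 0 <= p <= 1")
    mean = n * p                      # mu = np
    sd = math.sqrt(n * p * (1 - p))   # sigma = sqrt(np(1 - p))
    return mean, sd

# Hypothetical example: 100 trials with success probability 0.5
mean, sd = binomial_mean_sd(100, 0.5)
print(mean, sd)  # 50.0 5.0
```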
Poisson distribution with rate λ: μ = λ, σ = √λ
[Figure: standard Normal curve, Z axis marked from -3 to 3]
The Z-score tells you how many standard deviations you are away from the mean.
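The Z-score can be computed directly from a value, the mean and the standard deviation. The sketch below uses hypothetical numbers (mean 100, standard deviation 15) purely for illustration.

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies away from the mean."""
    if sd <= 0:
        raise ValueError("standard deviation must be positive")
    return (x - mean) / sd

# Hypothetical example: a value of 130 when the mean is 100 and the sd is 15
print(z_score(130, 100, 15))  # 2.0
```

A positive result means the value lies above the mean, a negative one below it.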
Transformations
z = (x − μ)/σ, z1 = (x1 − μ)/σ
Prob(x > x1) = Prob(z > z1)
In the table of Normal Probability: Areas to the right of indicated Z values
Prob(Z > value) = F(Z)

Central limit theorem (CLT)
For a random sample of size n taken from any population X with mean μ and standard deviation σ:
The sample mean X̄ follows (approximately, for large n) a Normal distribution, with mean μ and standard deviation σ/√n.
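The table lookup can be mimicked in code. The sketch below is one possible way to do it with Python's standard library: it standardises x1 and returns the area to the right of the resulting Z value. The second function assumes the usual CLT result that the sample mean has standard deviation σ/√n. Function names and numbers are illustrative.

```python
import math
from statistics import NormalDist

def prob_greater(x1, mean, sd):
    """Prob(x > x1) for a Normal variable, via z1 = (x1 - mean) / sd."""
    z1 = (x1 - mean) / sd
    # Area to the right of z1 under the standard Normal curve
    return 1.0 - NormalDist().cdf(z1)

def prob_sample_mean_greater(x1, mean, sd, n):
    """Prob(sample mean > x1), assuming the sample mean is approximately
    Normal with standard deviation sd / sqrt(n)."""
    return prob_greater(x1, mean, sd / math.sqrt(n))

# Hypothetical population: mean 100, standard deviation 15
print(round(prob_greater(115, 100, 15), 4))                   # z1 = 1
print(round(prob_sample_mean_greater(103, 100, 15, 25), 4))   # z1 = 1 as well
```

Both calls standardise to z1 = 1, so they return the same right-tail area; the sample mean needs a much smaller deviation from μ to reach the same Z value.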
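Confidence interval for the mean:
( X̄ − Z_{α/2} · σ/√n , X̄ + Z_{α/2} · σ/√n )
If we don’t know the population standard deviation σ, use the sample standard deviation s instead.
Confidence interval for a proportion x/n:
( x/n − Z_{α/2} · √((x/n)(1 − x/n)/n) , x/n + Z_{α/2} · √((x/n)(1 − x/n)/n) )
Sample size n for a given Max Error E: n = Z²_{α/2} / (4E²)

H0: μ = μ0
HA: μ ≠ μ0
z_obs = (X̄ − μ0) / (s/√n)
[Figure: distribution of X̄ centred at μ0, with the z axis below; Reject H0 to the left of −Z_{α/2}, Accept H0 between −Z_{α/2} and Z_{α/2}, Reject H0 to the right of Z_{α/2}]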
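The interval, sample-size and test-statistic formulas on this part of the sheet can be sketched in Python as follows. This is a minimal illustration, not a definitive implementation. It assumes the sample-size formula is the conservative proportion version n = Z²_{α/2}/(4E²) and that a two-sided test of H0: μ = μ0 is intended; rounding n up to a whole number is my own choice. All function names and example numbers are hypothetical.

```python
import math
from statistics import NormalDist

def z_critical(confidence=0.95):
    """Z_{alpha/2} for a two-sided interval with the given confidence level."""
    alpha = 1 - confidence
    return NormalDist().inv_cdf(1 - alpha / 2)

def mean_ci(xbar, sd, n, confidence=0.95):
    """Interval xbar -/+ Z_{alpha/2} * sd / sqrt(n).
    Pass the sample standard deviation s if sigma is unknown."""
    half = z_critical(confidence) * sd / math.sqrt(n)
    return xbar - half, xbar + half

def proportion_ci(x, n, confidence=0.95):
    """Interval for a proportion based on x successes in n trials."""
    p_hat = x / n
    half = z_critical(confidence) * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half, p_hat + half

def sample_size_for_max_error(max_error, confidence=0.95):
    """n = Z^2 / (4 E^2), rounded up (rounding is an assumption)."""
    z = z_critical(confidence)
    return math.ceil(z ** 2 / (4 * max_error ** 2))

def z_observed(xbar, mu0, s, n):
    """Test statistic z_obs = (xbar - mu0) / (s / sqrt(n))."""
    return (xbar - mu0) / (s / math.sqrt(n))

# Hypothetical sample: n = 100, mean 52, sample sd 10, testing mu0 = 50
print(mean_ci(52, 10, 100))
print(sample_size_for_max_error(0.05))   # 385
z_obs = z_observed(52, 50, 10, 100)
print(z_obs, abs(z_obs) > z_critical(0.95))  # 2.0 True
```

In the example, |z_obs| = 2.0 exceeds Z_{α/2} ≈ 1.96, so the observation falls in a rejection region at the 5% level.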
p-value = probability of a result at least as extreme (in the direction of HA) as that observed, assuming H0 is true.
For a two-sided test, the p-value is the probability of a result at least as extreme (on either side) as that observed, assuming H0 is true.
P-value = 2 P( Z ≥ |Z_observed| )
Decision rule:
If p-value < α: you reject H0
If p-value > α: you fail to reject (i.e., you accept) H0

Regression analysis
• Regression analysis is used to:
– Predict the value of a dependent variable based on at least one independent variable
– Explain the impact of changes in an independent variable on the dependent variable
• Jargon:
– Dependent variable: The variable we wish to predict or explain
– Independent variable: The variable used to explain the dependent variable
• Types of regression models:
– Simple Regression: Use one independent variable to predict the dependent variable
– Multiple Regression: Use more than one independent variable to predict the dependent variable
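The two-sided p-value and the decision rule given earlier on this part of the sheet can be sketched in a few lines of Python. The function names are illustrative, and the choice to treat p-value = α as "fail to reject" is an assumption, since the sheet only covers the strict inequalities.

```python
from statistics import NormalDist

def two_sided_p_value(z_obs):
    """P-value = 2 * P(Z >= |z_obs|) under the standard Normal."""
    return 2 * (1 - NormalDist().cdf(abs(z_obs)))

def decide(p_value, alpha=0.05):
    """Reject H0 when p-value < alpha; otherwise fail to reject.
    Assumption: a p-value exactly equal to alpha is not rejected."""
    return "reject H0" if p_value < alpha else "fail to reject H0"

# Hypothetical observed statistic
p = two_sided_p_value(2.0)
print(round(p, 4), decide(p))
```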
Postulated vs. estimated Model
Postulated model (for the population)
Y = A + B1X1 + B2X2 + B3X3 + ⋯ + ε
Definitions:
Y – dependent variable
Xi – independent variables
A, Bi – unknown regression parameters
εi – random error term
Estimated model (based on a sample)
Y = a + b1X1 + b2X2 + b3X3 + ⋯ + ε̂
Definitions:
a, bi – regression coefficients
ε̂i – residual error

Testing Bi: is there evidence of a relationship between Xi and Y?
Objective: Is Bi different from zero? If not, we want it out of the model.
Test:
H0: Bi = 0 (there is no relationship between Xi and Y)
HA: Bi ≠ 0 (there is a relationship)
Statistic: If H0 is true, then
t-stat = (bi − Bi)/s_bi = bi/s_bi ~ Z (approx.)
[Figure: distribution of the t-stat, with "reject H0" regions in both tails]
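To make the test on a coefficient concrete, here is a minimal sketch for the simple-regression case (one independent variable), written in plain Python. It estimates a and b by least squares, computes the standard error s_b from the residuals using the usual n − 2 degrees of freedom (a standard textbook formula, not stated on the sheet), and forms t-stat = b / s_b. The data are made up for illustration.

```python
import math

def simple_regression_t_stat(xs, ys):
    """Least-squares fit of Y = a + bX, returning (a, b, s_b, t_stat)."""
    n = len(xs)
    if n != len(ys) or n < 3:
        raise ValueError("need at least 3 paired observations")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx                  # estimated slope
    a = y_bar - b * x_bar          # estimated intercept
    # Residual errors of the estimated model
    residuals = [y - (a + b * x) for x, y in zip(xs, ys)]
    sse = sum(e ** 2 for e in residuals)
    if sse == 0:
        raise ValueError("perfect fit: standard error is zero")
    s_b = math.sqrt(sse / (n - 2) / sxx)   # standard error of b
    t_stat = b / s_b                       # under H0: B = 0
    return a, b, s_b, t_stat

# Hypothetical data
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
a, b, s_b, t_stat = simple_regression_t_stat(xs, ys)
# Compare |t-stat| with Z_{alpha/2} (about 1.96 at the 5% level)
print(round(a, 2), round(b, 2), abs(t_stat) > 1.96)
```

With multiple regression the same t-stat is formed for each bi, but the standard errors come from the full design matrix, which statistical software normally reports.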
Coefficient of determination R² and adjusted R²
• Measures of how good the model is
• R² measures the proportion of variation in the dependent variable (Y) that is explained by the model
• Always true: 0 ≤ R² ≤ 1
• If more variables are added, R² goes up or stays the same. That happens regardless of what variables are added – that’s not good.
Adjusted-R² accounts for the number of variables in the model. It penalizes having too many variables (“overfitting”). It is used to compare models of different sizes.
In contrast to R², it can go up or down when a variable is added to the model.

Possible problems with regression
• Multicollinearity (MC)
– A strong linear relationship between two independent variables
• Why is it a problem?
– One of the independent variables becomes redundant; may result in unstable/unreliable coefficients
• How to spot it?
– Rule of thumb: the correlation between two independent variables is higher than 0.7 in absolute value
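The quantities discussed here can be sketched in plain Python. The sheet gives no formulas, so the code assumes the standard definitions: R² = 1 − SSE/SST, and adjusted R² = 1 − (1 − R²)(n − 1)/(n − k − 1) with n observations and k independent variables. The multicollinearity check applies the 0.7 rule of thumb to the correlation between two independent variables. Names and data are illustrative.

```python
import math

def r_squared(ys, fitted):
    """Proportion of the variation in Y explained by the fitted values."""
    y_bar = sum(ys) / len(ys)
    sst = sum((y - y_bar) ** 2 for y in ys)               # total variation
    sse = sum((y - f) ** 2 for y, f in zip(ys, fitted))   # unexplained variation
    return 1 - sse / sst

def adjusted_r_squared(r2, n, k):
    """Assumed standard formula: penalizes R^2 for the number of variables k."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

def correlation(xs, zs):
    """Pearson correlation between two variables."""
    n = len(xs)
    x_bar, z_bar = sum(xs) / n, sum(zs) / n
    sxz = sum((x - x_bar) * (z - z_bar) for x, z in zip(xs, zs))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    szz = sum((z - z_bar) ** 2 for z in zs)
    return sxz / math.sqrt(sxx * szz)

def looks_multicollinear(xs, zs, threshold=0.7):
    """Rule of thumb: |correlation| above the threshold is a warning sign."""
    return abs(correlation(xs, zs)) > threshold

# Hypothetical example: same R^2, but more variables lowers adjusted R^2
print(adjusted_r_squared(0.9, 11, 2), adjusted_r_squared(0.9, 11, 5))
print(looks_multicollinear([1, 2, 3, 4], [2, 4, 6, 8]))  # True
```

Holding R² fixed at 0.9 with 11 observations, adjusted R² drops as k grows from 2 to 5, which is the penalty for model size described above.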