Stat Final Notes111

CH1: DESCRIPTIVE STATISTICS

- Categorical: Nominal (region); Ordinal (low, high) | Numerical: has units; interval (temp), ratio (age)
- Boxplot: 5-number summary: min, Q1, median, Q3, max | Mean > Median -> right-skewed curve; vice versa
- Quartiles (Q): even number of values > split the series into two equal halves and find the median of each; odd number of values > remove Q2, the left and right sides are the two halves
- Variance (Var) | Mean & SD: affected by extreme values; Md & Q: robust
Categorical: Two-way / Contingency tables
- (With proportion/%) Joint distribution: each cell gives its proportion of the TOTAL sample
- Marginal distribution: the outermost 'Total' row and column
- Conditional distribution: denominator = total no. of the condition (the small totals on the margin, eg 577 in the original example table)
- Lurking variable: overlooked variable w/ important effect
- Simpson's Paradox: change in direction of the variables' association when data are separated into groups defined by a THIRD variable
- How to answer: 3rd variable is __. *x* is the worst level of the 3rd variable, as its rate of delay is the *highest*. *A* has a lot of the *worst option*, B does not; the absolute no. of delays for *A* > relatively *smaller/…*
➢ Numerical: Scatterplot > explanatory on x-axis, response on y-axis | Correlation (r): between -1 and 1, 0 = no correlation
- High/moderate/low; +ve/-ve/zero; no units; NOT robust | Correlation does NOT imply causation

CH2: PROBABILITY (PROBI)

- A∪B: OR; A∩B: AND; A^C: complement | Mutually exclusive > if one happens, the other cannot > roll a die: getting 6 = not getting 1-5
- 3. Addition Rule for MuEx: P(A∪B) = P(A) + P(B); P(A∩B) = 0
- 5. General Addition Rule: P(A∪B) = P(A) + P(B) - P(A∩B)
- Independent > one does not affect the other > roll a die twice, the events of the 2 trials are ind. > must satisfy Rule 6
- 6. Multiplication Rule: P(A∩B) = P(A)*P(B)
- Bayes' Rule: P(A|B) = P(B|A)*P(A)/P(B)
- Conditional: P(A|B) = P(A∩B)/P(B) > P(B) ≠ 0
- Multiplicative Rule for dependent events: P(A∩B) = P(A|B)*P(B) = P(B|A)*P(A)
- Law of Total Probi: P(A) = P(A∩B) + P(A∩B^C) > eg P(Ace) = P(Ace and Red) + P(Ace and Black) = 2/52 + 2/52 = 4/52

CH3: DISCRETE RANDOM VARIABLE

- Discrete (PMF) > steps to solve questions: frequency (no. of x)/total no. OR P(A∩B) = P(A)*P(B) > when outcomes involve 2/+ ind. events
- Covariance (Cov) > Cov(X, X) = Var(X)
- Correlation (r): rXY = Cov(X, Y)/(SD(X)*SD(Y)) > any one of the terms can be found from the other two
- Possible question type: Given SDX, SDY, r; find SD(X+Y): 1. Cov(X,Y) = r*SDX*SDY 2. Var(X+Y) = SDX² + SDY² + 2Cov(X,Y) 3. take the square root to get SD(X+Y)
- Coefficient of Variation (CV) > CVX = SDX/µX; Sharpe Ratio (S) > S(X) = (µ - rf)/SD
- For mixed portfolio S(0.5X+0.5Y): 1. E(0.5X+0.5Y) = 0.5E(X) + 0.5E(Y) 2. SD(0.5X+0.5Y) = √Var(0.5X+0.5Y) [check whether X and Y are dependent! > if dependent, get Cov(X,Y) from the 'any 2 give 1' relation for r above]
- Properties of variation
- Bernoulli Distribution: 1 = success, 0 = failure; fixed, constant probi of success 'p'; independence
➢ P(X=1) = p, P(X=0) = 1-p; E(X) = p; Var(X) = p(1-p) = pq
- Binomial Distribution: Criteria: same as above & n identical trials; X = total no. of successes in n trials; p = probi of success in EACH trial > for Y~Binomial(n, p): P(Y=k) = C(n,k)*p^k*q^(n-k)
- Steps to solve questions: 1st find n and p (may be from ratios). Eg: US boys:girls at birth = 1.09:1, what proportion of US families w/ 6 children have at least 3 boys?
- E(X) = np; Var(X) = npq

CH4: CONTINUOUS RANDOM VARIABLE (PDF)

- Uniform Distribution > f(x) = 1/(d-c) for c≤x≤d; f(x) = 0 otherwise (x<c; x>d) < state both branches!
- X ~ Uniform[c, d], c<d | P(a≤X≤b): if c≤a<b≤d: (b-a)/(d-c); if a<c or d<b: only the area under the curve between a & b counts (density is 0 outside [c, d]) | µX = median = (c+d)/2; σX = (d-c)/√12
- Empirical Rule
- Skewness K3: K3 ≈ 0 'symmetric'; K3 > 0 'right-skewed'
- Normal probi: If X~N(µ, σ²), Z = (X-µ)/σ is standard normal, mean 0 and SD 1 > Z~N(0,1)
- Steps to solve questions: 1. Take X and compute Z 2. Look up the probi for Z 3. Check whether the target probi lies above or below Z > if above Z > look up the probi for -Z 4. For the probi between two values > subtract
- Normal approx to Binomial: n large, p close to 0.5 > use X~N(µ, σ²) > µ = np; σ² = npq | need np ≥ 5; nq ≥ 5
- Smaller df, t-distribution has fatter tails

CH5: SAMPLING DISTRIBUTION (Samp dis)

- Central Limit Theorem (CLT): E(X̄) = µ; SD(X̄) = σ/√n | samp dis = normal when n large enough
- σ(X̄) < σ(popu) | popu σ not given > estimate by sample SD (√Var) | t-distribution, df = n-1, for n < 30
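The Binomial question in CH3 (US boys:girls at birth = 1.09:1, families with 6 children, at least 3 boys) can be worked through as a small sketch. It follows the notes' steps: first find n and p from the ratio, then sum the Binomial probabilities. The function and variable names are hypothetical, and the calculation assumes each birth is an independent trial with the same p, as the Binomial criteria require.

```python
import math

def binomial_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

def prob_at_least(k_min, n, p):
    """P(X >= k_min) for X ~ Binomial(n, p)."""
    return sum(binomial_pmf(k, n, p) for k in range(k_min, n + 1))

# Step 1: find n and p. The ratio 1.09:1 gives p = 1.09 / (1.09 + 1).
boys_to_girls = 1.09
p_boy = boys_to_girls / (boys_to_girls + 1)
n_children = 6

# Step 2: P(at least 3 boys) = P(X=3) + P(X=4) + P(X=5) + P(X=6).
answer = prob_at_least(3, n_children, p_boy)

# E(X) = np and Var(X) = npq, as listed in the notes.
mean = n_children * p_boy
variance = n_children * p_boy * (1 - p_boy)

# Rule of thumb from CH4: the Normal approximation needs np >= 5 and nq >= 5.
normal_approx_ok = mean >= 5 and n_children * (1 - p_boy) >= 5

print(f"p = {p_boy:.4f}, P(X >= 3) = {answer:.4f}")
print(f"E(X) = {mean:.3f}, Var(X) = {variance:.3f}, normal approx ok: {normal_approx_ok}")
```

With n = 6 both np and nq fall below 5, so the exact Binomial sum is used here rather than the Normal approximation.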
CH6: CONFIDENCE INTERVAL

- Proportion: convert the given % of confidence to α > find z(α/2) from the table > find SE with the given p̂: SE = √(p̂(1-p̂)/n) > interval = p̂ ± z(α/2)*SE
- Interpretation: We are x% confident that the required popu proportion lies within this interval
- Mean (the above is for p): X̄ ± z(α/2)*σ/√n | For n < 30, z(α/2) changes to t(α/2, n-1)
- Sample size n requirement > to keep the margin of error within a range, a minimum n is required
- One formula for proportion, one for mean (σ unknown > use SD)
- !! For proportion, L is a % range; for mean, L is a numerical value range | Round up the answer if n is not an integer

CH7: HYPOTHESIS TESTING

- Type I error: Reject a true null hypo > α / significance level > smaller α requires more opposing evidence
- Type II error: Do not reject a false null hypo > β | 1 - β / power: probi of correctly rejecting a false null
- β varies inversely with α; small α when the cost of rejecting the truth (Type I) is high (cancer eg)
- Large α when there is interest in changing the default
//Hypo testing methods// (illustrated by popu MEAN)
❖ Z-test (critical value) > Requirement: popu standard deviation σ is KNOWN; SRS
- Popu proportion: z = (p̂ - p0)/√(p0(1-p0)/n)
➢ 1-sided / Tail tests: For 'less than' Ha > lower-tail test; vice versa
- 1. Cal 'z' of the given sample > z = (X̄-µ)/(σ/√n) 2. Find z(α) from the normal table (eg α = 0.05, z(α) = 1.645) 3. Rejection rule: Reject H0 in favor of Ha if z > z(α) ('greater than' Ha); vice versa
- How to phrase it: Reject H0: … in favor of Ha: … at the … significance level
➢ 2-sided test: For 'not equal to' Ha | 1. Find α/2 2. Find z(α/2) & -z(α/2)
- Rejection rule: Reject H0 in favor of Ha if z > z(α/2) OR z < -z(α/2) !! Different Ha, different rejection rules
- For n < 30 -> apply the t-test! -> don't forget to still use α/2 for 2-sided!
❖ P-value: Rejection rule: p-value ≤ α, we can reject H0
- 1-sided / Tail tests: 1. Cal 'z' of the given sample 2. Find the area under the curve to the RIGHT of 'z' ('greater than' Ha) from the table > p-value! 3. Apply the rejection rule > p-value smaller than α > reject H0! > this is the z-test expressed in another way
- 2-sided test: 1. Cal 'p-value' based on the calculated 'z' > then ×2 > REAL P-VALUE
- Under the same rejection rule, reject only when the real p-value ≤ α !! SAME rejection rule all the time !!
❖ Chi-Square Test for Independence > H0: the 2 variables are statistically independent; Ha: dependent
- Observed table > data collected; Artificial/expected table > assuming the 2 variables are independent
- Oij = observed cell frequency; ri = total obs in ith row; cj = total obs in jth column
- Expected cell frequency for ith row & jth column under independence: Eij = (ri*cj)/n
▪ Requirement: each expected frequency ≥ 5; observations obtained from SRS
- χ² = Σ(Oij - Eij)²/Eij > do NOT round Eij to integers! | χ²(α) with (r-1)(c-1) df | Reject H0 if χ² > χ²(α)

CH8: (SIMPLE) LINEAR REGRESSION

- Probabilistic: hypothesize deterministic (exact) part + random error > Y = deterministic part + ε
- Y = β0 + β1X + ε | β0: popu y-intercept; β1: popu slope; ε: random error > E(ε) = 0! | µY|X: β0 + β1X
- 'Best fit': Least Square Estimation (LSE): minimize Sum of Square Error (SSE)
- 1. b1 = sxy/s²x > numerator Cov(X,Y), denominator Var(X) 2. Find X̄ and Ȳ 3. b0 = Ȳ - b1X̄ 4. ŷ = b0 + b1x
- Danger of Extrapolation outside the experimental region > the relationship may become curved outside the exp region
- For each value of X, the value of Y is normally distributed with some mean (may depend linearly on X) and a σ (does not depend on X) > the σ is a constant, same for all values > σ of Y is also the σ of all ε
- Model assumptions (LINE): 1. Linearity: mean value of Y has a linear relation with X; εi is independent of Xi > E(εi) = 0 > 'mean zero assumption' 2. Independence between errors for different X 3. Normality (distributed) of errors for different X 4. Equal variance for different X
▪ Check for 'LINE': "E": residual plot evenly spread > homoscedastic; "N": QQ plot > standard normal quantiles VS residual quantiles; "I": most likely violated when X is time stamps (time-series data)
- Residuals: ei (ε̂i) = yi - ŷi | Mean Square Error (MSE): s² = MSE = SSE/(n-2) | Standard Error: s = √MSE
- Steps: 1. Plug the X's into ŷ = b0 + b1x to obtain ŷ 2. Sum up all (y - ŷ)² for SSE 3. Cal MSE and SE
- SST = Σ(yi - ȳ)² = SSR + SSE | Coefficient of Determination R² = SSR/SST = 1 - SSE/SST | Between 0 and 1: 1 = perfect match; 0 = no linear relationship
- R² interpretation: About …% of the sample variation in … (Y) can be explained by the linear regression model where we use … (X) to predict … (Y)
- (Correlation) r² = R² > can find R² by: 1. Find SD(X), SD(Y), Cov(X,Y) 2. Cal r 3. r² = R²!
- b1 = sxy/s²x = r*sy/sx; sign of r is the same as the sign of b1
- Testing for slope β1: when Y & X have no linear relationship > H0: β1 = 0 (default: no relation); H1: β1 ≠ 0
- SE of b1: s_b1 = s/√Σ(xi - x̄)² > Recap s = √(SSE/(n-2)) | The t-test uses df = n-2, n-2, n-2 !!!
- Roughly the same as CH7. Steps (Given b0, b1, s): find the z-/t-statistic for α or α/2 from the tables > find s_b1 > cal b1/s_b1 > apply the rejection rules
- Test for y-int β0: same procedure, except that the standard error of b0 is used
- Point on the regression line corresponding to a particular value X0: ŷ = b0 + b1X0
- Confidence interval for the mean value of Y; prediction interval for an individual observed value of Y
- CI: ŷ ± t(α/2, n-2)*s*√(1/n + (X0 - X̄)²/Σ(xi - x̄)²) | PI: ŷ ± t(α/2, n-2)*s*√(1 + 1/n + (X0 - X̄)²/Σ(xi - x̄)²) | The term inside the square root (without the '1 +') is the distance value
- Confidence Interval question: Find the 95% CI for mean sales when X is 4 (X0); Given b0, b1, s
- Prediction Interval question: Use the 95% PI to predict sales when X is 4 (X0); Given b0, b1, s | everything is the same except the final step > the PI needs the +1! +1! +1!
- Factors affecting interval width (+/- as relationship): Confidence level 1-α (+); Data dispersion s (+); Sample size n (-); Distance of X0 from mean X̄ (+) > hence the curved shape of the interval bands around the line
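The CH8 steps (b1, b0, SSE, s, R², the slope t-statistic, then CI and PI at X0) can be sketched end to end. The five data points and the function names below are hypothetical and only for illustration. The critical value 3.182 is t(0.025, 3) read from a t-table, since n = 5 gives df = n-2 = 3.

```python
import math

def fit_simple_regression(x, y):
    """Least-squares fit following the notes' steps; returns the key quantities."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    ss_xx = sum((xi - x_bar) ** 2 for xi in x)
    ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sst = sum((yi - y_bar) ** 2 for yi in y)
    b1 = ss_xy / ss_xx            # equals s_xy / s_x^2, the (n-1) factors cancel
    b0 = y_bar - b1 * x_bar
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    s = math.sqrt(sse / (n - 2))  # standard error, s = sqrt(MSE)
    s_b1 = s / math.sqrt(ss_xx)   # standard error of the slope
    return {
        "n": n, "x_bar": x_bar, "ss_xx": ss_xx, "b0": b0, "b1": b1,
        "sse": sse, "s": s, "s_b1": s_b1, "t_slope": b1 / s_b1,
        "r": ss_xy / math.sqrt(ss_xx * sst), "r2": 1 - sse / sst,
    }

def intervals_at(fit, x0, t_crit):
    """Return (y_hat, CI half-width, PI half-width) at x0."""
    y_hat = fit["b0"] + fit["b1"] * x0
    distance = 1 / fit["n"] + (x0 - fit["x_bar"]) ** 2 / fit["ss_xx"]
    ci_half = t_crit * fit["s"] * math.sqrt(distance)
    pi_half = t_crit * fit["s"] * math.sqrt(1 + distance)  # the PI needs the +1
    return y_hat, ci_half, pi_half

# Hypothetical data, for illustration only.
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
fit = fit_simple_regression(x, y)

# t(0.025, df = n-2 = 3) = 3.182 from a t-table, for 95% two-sided intervals.
y_hat, ci_half, pi_half = intervals_at(fit, x0=4, t_crit=3.182)

print(f"y_hat = {fit['b0']:.3f} + {fit['b1']:.3f}x, R^2 = {fit['r2']:.4f}")
print(f"t for slope = {fit['t_slope']:.2f} (reject H0: beta1 = 0 if |t| > 3.182)")
print(f"95% CI at X0=4: {y_hat:.3f} +/- {ci_half:.3f}")
print(f"95% PI at X0=4: {y_hat:.3f} +/- {pi_half:.3f}")
```

The PI is always wider than the CI at the same X0 because of the extra 1 under the square root, and both widen as X0 moves away from X̄.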
