Data Mining1
• $h_i = \frac{1}{n} + \frac{(x_i - \bar{x})^2}{\sum_j (x_j - \bar{x})^2}$
• An observation with leverage greater than 2(m+1)/n or 3(m+1)/n may be considered
a high-leverage point (m is the number of predictors).
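The leverage formula and cutoff above can be sketched in a few lines. This is a minimal illustration for a single predictor, not a definitive implementation: the function names `leverages` and `high_leverage_points`, the `factor` argument (2 or 3), and the data are all assumptions made for the example.

```python
def leverages(x):
    """Leverage h_i = 1/n + (x_i - mean)^2 / sum_j (x_j - mean)^2 (one predictor)."""
    n = len(x)
    mean_x = sum(x) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    return [1.0 / n + (xi - mean_x) ** 2 / sxx for xi in x]


def high_leverage_points(x, m=1, factor=2):
    """Indices whose leverage exceeds factor * (m + 1) / n (factor is 2 or 3)."""
    n = len(x)
    cutoff = factor * (m + 1) / n
    return [i for i, h in enumerate(leverages(x)) if h > cutoff]


# Hypothetical data: the last x value lies far from the others.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 20.0]
h = leverages(x)
flagged = high_leverage_points(x, m=1, factor=2)
print([round(v, 3) for v in h])
print(flagged)
```

With this data only the point at x = 20 is flagged. The leverages sum to m + 1, which is why the cutoffs are multiples of the average leverage (m+1)/n.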
Simple Linear Regression
• Influential observation:
• Omitting such a point noticeably changes the fitted regression equation.
• One way to measure this is Cook's distance, given by
$D_i = \frac{\sum_j (\hat{y}_j - \hat{y}_{j(i)})^2}{(k+1) \cdot MSE}$
where $D_i$ is the Cook's distance of the ith observation and k is the number
of predictors in the model. $\hat{y}_j$ is the predicted value of the jth observation
when the ith observation is included in the fit, and $\hat{y}_{j(i)}$ is the predicted
value of the jth observation after excluding the ith observation.
• An observation with a Cook's distance greater than 1 is considered highly influential.
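The definition above can be followed literally by refitting the line once per omitted point. The sketch below does this for simple linear regression (k = 1). It assumes MSE = SSE/(n − k − 1); the names `fit_line` and `cooks_distances` and the data are hypothetical.

```python
def fit_line(x, y):
    """Least-squares intercept and slope for one predictor."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def cooks_distances(x, y):
    """Cook's distance for each observation, by leave-one-out refitting."""
    n, k = len(x), 1
    b0, b1 = fit_line(x, y)
    fitted = [b0 + b1 * xi for xi in x]
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    mse = sse / (n - k - 1)  # assumed definition of MSE
    distances = []
    for i in range(n):
        # Refit without observation i, then predict all n points.
        a0, a1 = fit_line(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        fitted_without_i = [a0 + a1 * xj for xj in x]
        shift = sum((f - g) ** 2 for f, g in zip(fitted, fitted_without_i))
        distances.append(shift / ((k + 1) * mse))
    return distances


# Hypothetical data: the last point is far out in x and off the trend.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 10.0]
y = [1.1, 1.9, 3.2, 3.9, 5.1, 3.0]
d = cooks_distances(x, y)
print([round(v, 3) for v in d])
print([i for i, v in enumerate(d) if v > 1])
```

Refitting n times is slow for large data sets. It is used here only because it mirrors the definition term by term.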
Assumptions
• Normality of errors
• E(e) = 0
• Var(e) = σ² (constant variance)
• Breusch-Pagan test (bptest) to check for constant variance
• Independence
• Model evaluation
• AIC (Akaike information criterion)
• BIC (Bayesian information criterion)
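As a rough illustration of how AIC and BIC compare models, the sketch below scores an intercept-only model against a straight-line model. It assumes normally distributed errors, the usual definitions AIC = 2p − 2 ln L and BIC = p ln(n) − 2 ln L, and that p counts the regression coefficients plus the error variance. The function names and data are hypothetical. Lower values indicate the preferred model.

```python
import math


def gaussian_log_likelihood(sse, n):
    """Maximized log-likelihood of a normal-error model with variance SSE/n."""
    return -0.5 * n * (math.log(2 * math.pi) + math.log(sse / n) + 1.0)


def aic(sse, n, p):
    return 2 * p - 2 * gaussian_log_likelihood(sse, n)


def bic(sse, n, p):
    return p * math.log(n) - 2 * gaussian_log_likelihood(sse, n)


def sse_intercept_only(y):
    mean_y = sum(y) / len(y)
    return sum((yi - mean_y) ** 2 for yi in y)


def sse_line(x, y):
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    return sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))


# Hypothetical data with a clear linear trend.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1]
n = len(y)

# p = 2 for intercept + variance, p = 3 for intercept + slope + variance.
aic_mean, bic_mean = aic(sse_intercept_only(y), n, 2), bic(sse_intercept_only(y), n, 2)
aic_line, bic_line = aic(sse_line(x, y), n, 3), bic(sse_line(x, y), n, 3)
print(round(aic_mean, 2), round(aic_line, 2))
print(round(bic_mean, 2), round(bic_line, 2))
```

Software packages differ in which constants they keep, so only differences between models fitted to the same data are meaningful.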
Interpretation
• Multiple R
• R² = SSR/SST
• Coefficient of determination
• Adjusted R² = 1 − (1 − R²)(n − 1)/(n − k − 1) = 1 − MSE/MST
• Model building (variable selection)
• Standard Error
• Variable selection & comparisons of models
• Precision
• F-test
• t-test
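The quantities listed above can be computed directly from the sums of squares. The following is a sketch for simple linear regression (k = 1), assuming MSE = SSE/(n − k − 1), MST = SST/(n − 1), F = (SSR/k)/MSE and t = slope / standard error of the slope. The function name `regression_summary` and the data are hypothetical.

```python
import math


def regression_summary(x, y):
    """R, R^2, adjusted R^2, standard errors, t and F for one predictor."""
    n, k = len(x), 1
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    fitted = [b0 + b1 * xi for xi in x]

    sst = sum((yi - mean_y) ** 2 for yi in y)               # total
    ssr = sum((fi - mean_y) ** 2 for fi in fitted)          # regression
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))  # error
    mse = sse / (n - k - 1)
    mst = sst / (n - 1)
    r2 = ssr / sst
    se_slope = math.sqrt(mse / sxx)

    return {
        "multiple_r": math.sqrt(r2),
        "r2": r2,
        "adj_r2": 1 - (1 - r2) * (n - 1) / (n - k - 1),
        "adj_r2_alt": 1 - mse / mst,   # same value, second form
        "residual_se": math.sqrt(mse),
        "se_slope": se_slope,
        "t_slope": b1 / se_slope,
        "f_stat": (ssr / k) / mse,
    }


# Hypothetical data.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.0, 4.1, 5.9, 8.3, 9.8, 12.2]
summary = regression_summary(x, y)
for name, value in summary.items():
    print(name, round(value, 4))
```

With a single predictor the F statistic equals the square of the slope's t statistic, so the two tests agree. With several predictors the F-test assesses the model as a whole, while each t-test assesses one coefficient.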