
Simple Linear Regression

Study online at quizlet.com/_ra1gb

1. alpha definition: The Y-intercept; the expected value of Y when X = 0.
2. Assumption for Simple Linear Model: Y = alpha + beta(X) + error term, where the error term is assumed to be normally distributed, N(0, variance).
3. Assumptions for E(Y/X) (the expected values of Y given X):
   1) The values of Y given X must be independently distributed as N(alpha + betaX, variance)
   2) Normality of distribution
   3) Linearity: is it appropriate to perform a linear regression?
   4) Constant variance (homoscedasticity): the variance of Y is the same at every value of X
   5) Independence of errors (the errors of the response variables are not correlated with each other)
8. Estimating alpha, beta, and the population variance:
   1) b = Sum[(X - Xmean)(Y - Ymean)] / Sum[(X - Xmean)^2] (slope)
   2) a = Ymean - b(Xmean) (intercept)
   OR
   SSxx = Sum(X^2) - (Sum of X)^2 / n
   SSyy = Sum(Y^2) - (Sum of Y)^2 / n
   SSxy = Sum(XY) - (Sum of X)(Sum of Y) / n
   thus,
   b = SSxy / SSxx
   and s^2 = 1/(n-2) x (SSyy - b SSxy)
9. Explanatory Variable: The independent variable, the predictor variable, the X axis, x
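The two routes to the estimates in entry 8 can be checked against each other with a short sketch. The data below are made up for illustration and do not come from the flashcards. The sketch assumes the standard computational forms SSxx = Sum(X^2) - (Sum of X)^2 / n, SSyy = Sum(Y^2) - (Sum of Y)^2 / n and SSxy = Sum(XY) - (Sum of X)(Sum of Y) / n.

```python
# Toy data (made up for illustration; not from the flashcards).
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
n = len(xs)

x_mean = sum(xs) / n
y_mean = sum(ys) / n

# Form 1: deviations from the means.
b_dev = (sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
         / sum((x - x_mean) ** 2 for x in xs))
a_dev = y_mean - b_dev * x_mean

# Form 2: computational sums of squares.
ss_xx = sum(x * x for x in xs) - sum(xs) ** 2 / n
ss_yy = sum(y * y for y in ys) - sum(ys) ** 2 / n
ss_xy = sum(x * y for x, y in zip(xs, ys)) - sum(xs) * sum(ys) / n

b = ss_xy / ss_xx                    # slope
a = y_mean - b * x_mean              # intercept
s2 = (ss_yy - b * ss_xy) / (n - 2)   # estimate of the error variance

print(a, b, s2)
```

Both forms should give the same slope and intercept up to rounding. The computational form only needs running totals, which is why it is common in hand calculation.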
4. beta definition: Regression coefficient; the slope of the line; the amount of change in Y for a 1-unit change in X. When the slope = 0 the line is horizontal, which indicates no linear relationship.
5. Confidence Intervals for Alpha and Beta: These are the confidence intervals around the slope and the y-intercept.
   95% CI = estimate +- critical t-value x standard error
   The interpretation is "We are 95% confident that the true population slope lies between b(lower) and b(upper)." If the population were sampled 100 times, about 95 of the resulting intervals would be expected to capture the true beta (the slope).
6. Confidence intervals for E(Y/X): This is the confidence interval around the expected value (or mean) of Y given X.
   The interpretation is "We are 95% confident that the expected or true mean value of the dependent variable lies between y(lower) and y(upper) when the independent variable is x."
   In stata this is the coefficient +- the critical t-value x SE
7. epsilon definition: Error term (since not all points will lie on the line exactly). The error term accounts for other factors that are influencing the data.
10. Fitted Line equation: Y(hat) = a + bX, where Y(hat) is the predicted value.
11. F-stat: The F-stat is a measure of appropriateness. In other words, the F-stat's statistical significance tells us whether or not it was appropriate to run a linear regression in the first place. On stata output the F-stat = (MS reg) / (MS res) = t-stat squared, where the t-stat is that of the slope.
12. Hypothesis Testing Steps:
   1) Test for overall significance of the linear regression
   2) Test for individual regression coefficients
   3) Calculate the confidence intervals for alpha and beta
   4) Calculate the confidence intervals for the expected value of Y given X
   5) Calculate the prediction interval for Y given X
13. Interpretation of Regression: If the independent variable = 0, the average, or expected value, of the dependent variable would be the y-intercept.
   For every one-unit increase in the independent variable (predictor), we can expect the dependent variable to increase by the slope.
14. Least Squares Regression: The method that minimizes the sum of the squares of the vertical distances of the observations from the line. We find the difference between the observed value and the expected (predicted) value (otherwise known as the error or residual). Each error is squared because we are not interested in whether it is + or -. The method chooses the intercept and slope in a way that makes the sum of the squared residuals as small as possible.
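The least-squares idea in entry 14 can be sketched numerically: fit the line, then nudge the intercept and slope and see that the sum of squared residuals only goes up. The data and the helper names `sse` and `fit` are made up for this illustration.

```python
def sse(a, b, xs, ys):
    """Sum of squared residuals for the line Y(hat) = a + b*X."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))


def fit(xs, ys):
    """Least-squares intercept and slope for one predictor."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    b = (sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
         / sum((x - x_mean) ** 2 for x in xs))
    return y_mean - b * x_mean, b


# Toy data (made up for illustration).
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

a, b = fit(xs, ys)
best = sse(a, b, xs, ys)

# Nudge the intercept and slope away from the fitted values.
nudged = [sse(a + da, b + db, xs, ys)
          for da in (-0.5, 0.0, 0.5)
          for db in (-0.2, 0.0, 0.2)
          if (da, db) != (0.0, 0.0)]

print(best, min(nudged))
```

Every nudged line has a larger sum of squared residuals than the fitted one, which is what "as small as possible" means in practice.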
15. Predicted value: The expected value, Y(hat)
16. Prediction interval for (Y/X): This gives a 95% interval for the predicted value of a single observation, and thus has much greater uncertainty than the confidence interval for the expected (Y/X).
17. Regression Analysis Objective: To estimate the population function with the sample function; to find out how the average value of the dependent variable varies with the independent variable.
18. Relationship between Correlation, t-test, and F-test: For simple regression, all of these tests are the same and will produce the same p-value and thus the same significance level.
19. Relationship between the F-stat and the t-stat: The F-stat = the t-stat squared
26. Test for Individual Regression Coefficients:
   1) Check assumptions hold
   2) Test [H0: alpha = 0] [H1: alpha does not = 0] with t-stat (df = n-2)
   3) Test [H0: beta = 0] [H1: beta does not = 0] with t-stat (df = n-2)
27. Test for overall Significance of Linear Regression:
   1) Find the total variation of Y with no knowledge of X (SSyy = SS residuals + SS regression); the larger the proportion of the SS regression [the sum of (predicted - mean) squared], the better the regression line models the variability in Y given X
   2) Find the variance of the error terms, s-squared, which approximates sigma squared; MS res = SS res / (n-2)
   3) Calculate the F-stat and interpret
   4) Interpret r-squared
28. X definition: Predictor / Independent Variable
29. Y definition: Response / Dependent Variable
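Entries 19, 26 and 27 can be tied together in a short sketch: split the total variation, form the F-stat, and compare it with the squared t-stat of the slope. The data are made up. The standard error of the slope, sqrt(MS res / SSxx), is the usual textbook formula and is not stated on the cards. `T_CRIT` is an assumed value read from a t table for df = 3.

```python
import math

# Toy data (made up for illustration).
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
n = len(xs)
x_mean = sum(xs) / n
y_mean = sum(ys) / n

ss_xx = sum((x - x_mean) ** 2 for x in xs)
ss_xy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
ss_yy = sum((y - y_mean) ** 2 for y in ys)   # total variation of Y

b = ss_xy / ss_xx
a = y_mean - b * x_mean

ss_res = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
ss_reg = ss_yy - ss_res                      # SSyy = SS res + SS reg

ms_reg = ss_reg / 1                          # 1 df: one predictor
ms_res = ss_res / (n - 2)                    # estimates sigma squared
f_stat = ms_reg / ms_res

se_b = math.sqrt(ms_res / ss_xx)             # standard error of the slope
t_stat = b / se_b                            # tests H0: beta = 0, df = n - 2

# Assumed: two-sided 95% critical t-value for df = 3, taken from a t table.
T_CRIT = 3.182
ci_low = b - T_CRIT * se_b
ci_high = b + T_CRIT * se_b

print(f_stat, t_stat ** 2, (ci_low, ci_high))
```

On this toy data the F-stat and the squared t-stat agree. The 95% interval for the slope includes 0, which matches a t-stat smaller than the critical value.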
20. Residual: The error; the difference between the observed value and the expected (predicted) value. The vertical distance between the observed value and the fitted line.
21. Residuals equation: e = Y - Y(hat); the observed value minus the predicted value
22. Response Variable: The dependent variable, the Y axis, y
23. r-squared: This is a measure of goodness of fit; it is a measure of how closely the points fit the line. The higher the r-squared the better the fit. In stata output r-squared can be found by dividing the model SS by the total SS (equivalently, 1 minus the residual SS divided by the total SS).
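As a rough check on entry 23, r-squared can be computed from the sums of squares in two ways: model SS over total SS, or 1 minus residual SS over total SS. The data and the helper name `r_squared_both_ways` are made up for this sketch.

```python
def r_squared_both_ways(xs, ys):
    """Return r-squared as (SS model / SS total, 1 - SS res / SS total)."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    b = (sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
         / sum((x - x_mean) ** 2 for x in xs))
    a = y_mean - b * x_mean

    fitted = [a + b * x for x in xs]
    ss_total = sum((y - y_mean) ** 2 for y in ys)
    ss_model = sum((f - y_mean) ** 2 for f in fitted)
    ss_res = sum((y - f) ** 2 for y, f in zip(ys, fitted))
    return ss_model / ss_total, 1 - ss_res / ss_total


# Toy data (made up for illustration).
r2_model, r2_resid = r_squared_both_ways([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(r2_model, r2_resid)
```

The two values agree because the total SS splits into model SS plus residual SS for a least-squares line with an intercept.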
24. Simple Linear Regression: Describes the relationship between the values of two continuous variables where there is only one explanatory/independent variable; however, if no model is available then the mean of Y is the best estimate.
25. Sqrt of the Mean Squared Error: This is the standard deviation of the residuals and a measure of accuracy; it measures the differences between the values predicted by a model and the observed values. It represents the sample standard deviation of the differences between predicted values and observed values; it aggregates the magnitude of errors into one value. It is scale dependent.
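A minimal sketch of entry 25 follows. It assumes the residual degrees of freedom are n - 2, as in MS res = SS res / (n-2) for simple regression. The helper name `root_mse` and the data are made up. The fitted values are those of the line Y(hat) = 2.2 + 0.6X at X = 1 to 5.

```python
import math


def root_mse(observed, predicted, n_params=2):
    """Square root of SS res / (n - n_params); n_params=2 is intercept + slope."""
    ss_res = sum((y - f) ** 2 for y, f in zip(observed, predicted))
    return math.sqrt(ss_res / (len(observed) - n_params))


# Toy data (made up) and fitted values from Y(hat) = 2.2 + 0.6X.
ys = [2, 4, 5, 4, 5]
y_hat = [2.8, 3.4, 4.0, 4.6, 5.2]

rmse = root_mse(ys, y_hat)

# Scale dependence: measuring Y in units 100 times smaller
# multiplies the root MSE by 100.
rmse_scaled = root_mse([100 * y for y in ys], [100 * f for f in y_hat])

print(rmse, rmse_scaled)
```

Because the value is in the units of Y, it can be compared across models fitted to the same response but not across responses measured on different scales.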
