Lecture 10
When we say that μy|x is related to x by a straight line, we mean that the different mean
weekly fuel consumptions and the corresponding average hourly temperatures lie on a straight line.
Perhaps using this simpler rendition…
And the general form of the regression equation…

MAIN FORMULA: μy|x = β0 + β1x (equivalently, an individual observation is y = β0 + β1x + ε)
Exercise: Least Square Point Estimates for
Fuel Consumption Case
SSE = Σ eᵢ² = Σ (yᵢ − ŷᵢ)²

The standard error s is the point estimate of the residual standard deviation σ:

s = √MSE = √( SSE / (n − 2) )
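As a quick sketch (using made-up numbers standing in for the fuel consumption data, not the lecture's actual values), SSE and the standard error can be computed directly:

```python
import numpy as np

# Hypothetical data: x = average hourly temperature, y = weekly fuel consumption.
x = np.array([28.0, 28.0, 32.5, 39.0, 45.9, 57.8, 58.1, 62.5])
y = np.array([12.4, 11.7, 12.4, 10.8, 9.4, 9.5, 8.0, 7.5])
n = len(x)

# Least squares point estimates b0 and b1.
b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()

y_hat = b0 + b1 * x            # fitted values
residuals = y - y_hat          # e_i = y_i - y_hat_i

SSE = np.sum(residuals ** 2)   # sum of squared errors
MSE = SSE / (n - 2)            # mean squared error, df = n - 2
s = np.sqrt(MSE)               # standard error: point estimate of sigma
```

Note that with an intercept in the model, the residuals always sum to zero, which is a useful sanity check on the fit.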
We can use hypothesis testing to determine
whether a significant association exists
A regression model is not likely to be useful unless there is a
significant relationship between x and y
To test significance, we use the null hypothesis: H0: β1 = 0
Versus the alternative hypothesis: Ha: β1 ≠ 0
Note: The test will only be valid if SRM assumptions hold.
Alternative hypothesis: Ha: β1 ≠ 0
Reject H0 if: |t| > t_α/2, i.e. t > t_α/2 or t < −t_α/2
P-value: twice the area under the t distribution to the right of |t|

t = b1 / s_b1, where s_b1 = s / √SS_xx

Remember: In this case, degrees of freedom = n − 2
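A minimal sketch of this slope test, again on hypothetical data (any real analysis would use the lecture's actual observations):

```python
import numpy as np
from scipy import stats

# Hypothetical temperature / fuel-consumption data.
x = np.array([28.0, 28.0, 32.5, 39.0, 45.9, 57.8, 58.1, 62.5])
y = np.array([12.4, 11.7, 12.4, 10.8, 9.4, 9.5, 8.0, 7.5])
n = len(x)

SS_xx = np.sum((x - x.mean()) ** 2)
b1 = np.sum((x - x.mean()) * (y - y.mean())) / SS_xx
b0 = y.mean() - b1 * x.mean()
s = np.sqrt(np.sum((y - (b0 + b1 * x)) ** 2) / (n - 2))  # standard error

s_b1 = s / np.sqrt(SS_xx)        # standard error of the slope
t_stat = b1 / s_b1               # t = b1 / s_b1, df = n - 2
p_value = 2 * stats.t.sf(abs(t_stat), df=n - 2)  # two-sided p-value

reject_H0 = p_value < 0.05       # test H0: beta1 = 0 at alpha = 0.05
```

The same numbers come straight out of `scipy.stats.linregress`, which is the easier route in practice.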
The same approach applies when determining whether the
y-intercept β0 is significant
To test significance, we use the null hypothesis: H0: β0 = 0
Versus the alternative hypothesis: Ha: β0 ≠ 0
Note: The test will only be valid if SRM assumptions hold.
If the intercept is not statistically significant, we can drop it from the
model (unless common sense dictates otherwise)
Alternative hypothesis: Ha: β0 ≠ 0
Reject H0 if: |t| > t_α/2, i.e. t > t_α/2 or t < −t_α/2
P-value: twice the area under the t distribution to the right of |t|

t = b0 / s_b0, where s_b0 = s √( 1/n + x̄² / SS_xx )

Remember: In this case, degrees of freedom = n − 2
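The intercept test can be sketched the same way; the only change from the slope test is the standard-error formula (hypothetical data again):

```python
import numpy as np
from scipy import stats

# Hypothetical temperature / fuel-consumption data.
x = np.array([28.0, 28.0, 32.5, 39.0, 45.9, 57.8, 58.1, 62.5])
y = np.array([12.4, 11.7, 12.4, 10.8, 9.4, 9.5, 8.0, 7.5])
n = len(x)

SS_xx = np.sum((x - x.mean()) ** 2)
b1 = np.sum((x - x.mean()) * (y - y.mean())) / SS_xx
b0 = y.mean() - b1 * x.mean()
s = np.sqrt(np.sum((y - (b0 + b1 * x)) ** 2) / (n - 2))

# s_b0 = s * sqrt(1/n + x_bar^2 / SS_xx)
s_b0 = s * np.sqrt(1.0 / n + x.mean() ** 2 / SS_xx)
t_stat = b0 / s_b0                                # t = b0 / s_b0
p_value = 2 * stats.t.sf(abs(t_stat), df=n - 2)   # two-sided p-value
```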
F Test checks the significance of the overall
regression relationship between x and y
For simple regression, this is
another way to test the null
hypothesis
H0: β1 = 0
Very useful in multiple
regression
Test statistic based on the F
distribution. Reject H0 if F(model) > F_α or p-value < α
F is based on 1 numerator and n − 2
denominator degrees of freedom

F = (Explained variation) / [ (Unexplained variation) / (n − 2) ]
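The F statistic can be sketched directly from the explained and unexplained variation (hypothetical data; in simple regression, F equals the square of the slope's t statistic):

```python
import numpy as np
from scipy import stats

# Hypothetical temperature / fuel-consumption data.
x = np.array([28.0, 28.0, 32.5, 39.0, 45.9, 57.8, 58.1, 62.5])
y = np.array([12.4, 11.7, 12.4, 10.8, 9.4, 9.5, 8.0, 7.5])
n = len(x)

b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()
y_hat = b0 + b1 * x

explained = np.sum((y_hat - y.mean()) ** 2)   # explained variation (SSR)
unexplained = np.sum((y - y_hat) ** 2)        # unexplained variation (SSE)

F = explained / (unexplained / (n - 2))       # 1 and n-2 degrees of freedom
p_value = stats.f.sf(F, 1, n - 2)             # area to the right of F
```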
And we can apply the concept of the
confidence interval for β1 and β0 as well…
Note that the confidence intervals here apply to β0 and β1 separately
100(1 − α)% Confidence Interval for β1: [ b1 ± t_α/2 s_b1 ]
(and similarly for β0: [ b0 ± t_α/2 s_b0 ])
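A sketch of the 95% confidence interval for the slope, on the same hypothetical data:

```python
import numpy as np
from scipy import stats

# Hypothetical temperature / fuel-consumption data.
x = np.array([28.0, 28.0, 32.5, 39.0, 45.9, 57.8, 58.1, 62.5])
y = np.array([12.4, 11.7, 12.4, 10.8, 9.4, 9.5, 8.0, 7.5])
n = len(x)

SS_xx = np.sum((x - x.mean()) ** 2)
b1 = np.sum((x - x.mean()) * (y - y.mean())) / SS_xx
b0 = y.mean() - b1 * x.mean()
s = np.sqrt(np.sum((y - (b0 + b1 * x)) ** 2) / (n - 2))
s_b1 = s / np.sqrt(SS_xx)

alpha = 0.05
t_crit = stats.t.ppf(1 - alpha / 2, n - 2)     # t_{alpha/2} with n-2 df
ci_b1 = (b1 - t_crit * s_b1, b1 + t_crit * s_b1)
```

If the interval excludes zero, that agrees with rejecting H0: β1 = 0 at the same α.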
Confidence and Prediction Intervals
The point on the regression line corresponding to a particular value x0
of the independent variable x is ŷ = b0 + b1x0. It is unlikely that
this value will equal the mean value of y when x equals x0
Therefore, we need to place bounds on how far the predicted value
might be from the actual value
We can do this by calculating a confidence interval for the mean
value of y and a prediction interval for an individual value of y
CONFIDENCE INTERVAL: [ ŷ ± t_α/2 s √(distance value) ]

PREDICTION INTERVAL: [ ŷ ± t_α/2 s √(1 + distance value) ]

where distance value = 1/n + (x0 − x̄)² / SS_xx
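Both intervals can be sketched at a chosen x0 (the data and x0 below are hypothetical); the prediction interval is always the wider of the two because of the extra "1 +" under the square root:

```python
import numpy as np
from scipy import stats

# Hypothetical temperature / fuel-consumption data.
x = np.array([28.0, 28.0, 32.5, 39.0, 45.9, 57.8, 58.1, 62.5])
y = np.array([12.4, 11.7, 12.4, 10.8, 9.4, 9.5, 8.0, 7.5])
n = len(x)

SS_xx = np.sum((x - x.mean()) ** 2)
b1 = np.sum((x - x.mean()) * (y - y.mean())) / SS_xx
b0 = y.mean() - b1 * x.mean()
s = np.sqrt(np.sum((y - (b0 + b1 * x)) ** 2) / (n - 2))

x0 = 40.0                                   # hypothetical new temperature
y_hat0 = b0 + b1 * x0                       # point prediction at x0
dist = 1.0 / n + (x0 - x.mean()) ** 2 / SS_xx   # "distance value"
t_crit = stats.t.ppf(0.975, n - 2)

# 95% confidence interval for the MEAN of y at x0.
ci = (y_hat0 - t_crit * s * np.sqrt(dist),
      y_hat0 + t_crit * s * np.sqrt(dist))

# 95% prediction interval for an INDIVIDUAL y at x0.
pi = (y_hat0 - t_crit * s * np.sqrt(1 + dist),
      y_hat0 + t_crit * s * np.sqrt(1 + dist))
```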
The value of the simple correlation coefficient (r) is not the slope of the least
squares line; the latter is b1.
Note: Coefficient of correlation is related to
covariance
A measure of the strength of a linear relationship is the
covariance sxy
s_xy = Σᵢ₌₁ⁿ (xᵢ − x̄)(yᵢ − ȳ) / (n − 1)
You do not need to bother with these formulas if you will be using statistical
software to compute correlation
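Indeed, software handles this; a minimal sketch of the covariance and its link to the correlation coefficient (r = s_xy divided by the two sample standard deviations), on hypothetical data:

```python
import numpy as np

# Hypothetical paired data.
x = np.array([28.0, 28.0, 32.5, 39.0, 45.9, 57.8, 58.1, 62.5])
y = np.array([12.4, 11.7, 12.4, 10.8, 9.4, 9.5, 8.0, 7.5])
n = len(x)

# Sample covariance s_xy.
s_xy = np.sum((x - x.mean()) * (y - y.mean())) / (n - 1)

# Correlation coefficient from the covariance.
r = s_xy / (x.std(ddof=1) * y.std(ddof=1))
```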
Sample Calculation for Spearman’s
Correlation
[Worked table omitted: ranks for each pair, their differences, and the squared differences (Diff²), with example cases labelled "Not so good" and "Good".]
Source: http://people.duke.edu/~rnau/testing.htm
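The sample calculation above can be sketched in code; with no tied ranks, Spearman's formula rho = 1 − 6 Σ Diff² / (n(n² − 1)) matches the rank correlation exactly (the paired scores below are made up for illustration):

```python
import numpy as np
from scipy import stats

# Hypothetical paired scores (no ties in either variable).
a = np.array([86, 97, 99, 100, 101, 103, 106, 110, 112, 113])
b = np.array([2, 20, 28, 27, 50, 29, 7, 17, 6, 12])
n = len(a)

rank_a = stats.rankdata(a)
rank_b = stats.rankdata(b)
d2 = (rank_a - rank_b) ** 2                  # squared rank differences (Diff²)

# Spearman's formula (valid when there are no ties).
rho = 1 - 6 * d2.sum() / (n * (n ** 2 - 1))
```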
Autocorrelation and time series data
The independence assumption is frequently violated in time series data:
time-ordered error terms may be autocorrelated. There are two
types of autocorrelation, as seen in a plot of residuals in time order…
Positive autocorrelation:
Residual plot has a cyclical pattern
positive error term in time period i tends to be followed by another
positive value in i+k (some later period); negative error term in
time period i tends to be followed by another negative value in i+k
Negative autocorrelation:
Residual plot has an alternating pattern
positive error term in time period i tends to be followed by a
negative value in i+k (some later period); negative error term in
time period i tends to be followed by a positive value in i+k
Note: When residuals are independent of each other, the residual plot
should display a random pattern
Source: Bowerman (2009)
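A quick numeric companion to the residual plot is the lag-1 sample autocorrelation of the residuals: negative values go with the alternating pattern, positive values with the cyclical one. A sketch on made-up residuals with an alternating pattern:

```python
import numpy as np

# Hypothetical time-ordered residuals with an alternating sign pattern,
# i.e. the negative-autocorrelation case described above.
e = np.array([1.2, -0.9, 1.1, -1.3, 0.8, -1.0, 1.4, -0.7, 0.9, -1.1])

# Lag-1 sample autocorrelation of the residuals.
e_c = e - e.mean()
r1 = np.sum(e_c[1:] * e_c[:-1]) / np.sum(e_c ** 2)
```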
The Durbin-Watson Test for Autocorrelation
[Decision scale omitted: d runs from 0 to 4 with critical values dL, dU, 4 − dU, and 4 − dL; values of d near 0 suggest positive autocorrelation, values near 4 suggest negative autocorrelation, and values near 2 suggest no autocorrelation.]
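The Durbin-Watson statistic itself is d = Σ (eₜ − eₜ₋₁)² / Σ eₜ², computed over the time-ordered residuals. A sketch on the same kind of made-up alternating residuals (which should push d toward 4):

```python
import numpy as np

# Hypothetical time-ordered residuals (alternating signs).
e = np.array([1.2, -0.9, 1.1, -1.3, 0.8, -1.0, 1.4, -0.7, 0.9, -1.1])

# Durbin-Watson statistic: sum of squared successive differences
# divided by the sum of squared residuals.
d = np.sum(np.diff(e) ** 2) / np.sum(e ** 2)
```

The computed d is then compared against the tabulated dL and dU bounds for the given n and number of predictors.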