03 Hme 712 Week 7 Post Regression Tests For MLR Audio Transcript


This video is about post-regression tests following multiple linear regression. This is important because you can put any kind of data into Stata and ask Stata to calculate a model, and it will give you a model, but that doesn't mean it is necessarily a good model that is actually predicting correctly. We need to do the post-regression tests in order to make sure that this is, in fact, a model we can rely on. Unfortunately, in most journal articles there is no mention of post-regression tests; this is unfortunate, but in your research reports, if linear regression is carried out, we expect to see post-regression tests. So in this slideshow, we're
going to find out how to define the residuals, because the residuals are what we use for many of the post-regression tests. We're going to define them for multiple linear regression, and then show two tests: first, are these residuals normally distributed (they should be, for a good model), and second, is the variance of the residuals constant across the range of y-hat (it should be, for a good model)? For simple linear regression, where you've only got one covariate (for example, weight is your outcome and height is your covariate, or predictor variable), the residual is simply the vertical distance from the observed point to the fitted straight line. That's called the residual: it's the little bit left over that the model did not explain. Algebraically, it's y minus y-hat: the y are your observed values (those blue dots), and the y-hat are the values on the red line (the predicted values). That's the algebraic definition of a residual, but if you just think of it conceptually, it's the vertical distance from the point to the straight line. In this hypothetical
example, we've illustrated the calculation of the residual using a hypothetical relationship: y-hat equals 1.5 plus 2 (that's our beta coefficient) times the exposure. We've got the exposures listed there; when we substitute into this formula we get y-hat, and the residual is simply the difference between the observed value and y-hat, the predicted value. In that way, the residuals can be calculated.
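The arithmetic above can be sketched in a few lines of Python. The fitted relationship y-hat = 1.5 + 2 × exposure is the one from the slide; the exposure and observed values below are invented purely for illustration:

```python
# Residuals for the hypothetical fitted line y_hat = 1.5 + 2 * exposure.
# The exposure/observed values here are made up for illustration.
exposures = [1.0, 2.0, 3.0, 4.0]
observed  = [3.9, 5.1, 7.8, 9.2]

def y_hat(x):
    """Predicted value from the hypothetical model y_hat = 1.5 + 2x."""
    return 1.5 + 2.0 * x

# residual = observed - predicted, one per data point
residuals = [y - y_hat(x) for x, y in zip(exposures, observed)]
for x, y, r in zip(exposures, observed, residuals):
    print(f"x={x}: observed={y}, predicted={y_hat(x)}, residual={r:+.2f}")
```

Each residual is just observed minus predicted; for a good model they should scatter around zero with no pattern.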
In multiple linear regression, we calculate the residual in exactly the same way: it's simply the observed value y minus the predicted value y-hat, where the y-hats are obtained by solving the regression equation. Now, however, there won't be a nice scatter plot with a fitted straight line, because which one of the covariates would you put on the x-axis? You may have several to choose between. But we don't usually plot a graph of the residuals like this; it's not
necessary. There are many good post-regression tests for multiple linear regression, but we're just going to do two. First, the residuals must be approximately normally distributed: if you were to plot a histogram, it should be nice and bell-shaped, or, if the numbers are small, you could do a Shapiro-Wilk test, where you want a p-value greater than 0.05. Second, the residuals must have constant variation across the values of y-hat: you plot the residuals against y-hat and make sure that the variation you see is more or less constant across the range of y-hat.
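The first of these checks can be sketched as follows. This uses simulated data, not the course's blood pressure data set, and assumes NumPy and SciPy are available (SciPy's `stats.shapiro` is the Shapiro-Wilk test):

```python
import numpy as np
from scipy import stats

# Sketch of the normality check on simulated data: fit a multiple linear
# regression by least squares, form the residuals y - y_hat, and run a
# Shapiro-Wilk test on them (p > 0.05 is what we want for a good model).
rng = np.random.default_rng(42)
n = 100
x1 = rng.normal(170, 10, n)                # e.g. a height-like covariate
x2 = rng.normal(40, 12, n)                 # e.g. an age-like covariate
y = 90 + 0.3 * x1 + 0.4 * x2 + rng.normal(0, 5, n)  # outcome + normal noise

X = np.column_stack([np.ones(n), x1, x2])  # design matrix with intercept
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
residuals = y - X @ beta                   # residual = observed - predicted

stat, p = stats.shapiro(residuals)
print(f"Shapiro-Wilk W = {stat:.3f}, p = {p:.3f}")
```

With an intercept in the model the residuals sum to (numerically) zero; the question the test answers is whether their shape is approximately normal.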
Here's the histogram of the residuals from the systolic blood pressure data set; you'll see that it's actually approximately normally distributed. There are also more sophisticated tests available, such as the Shapiro-Wilk or the Shapiro-Francia; you may find those used sometimes. For the even variance across
the range, again we've done this graphically: we just plotted the residuals against the fitted values (the predicted values y-hat), and we find that yes, the variation is more or less constant across the range, so we would be happy. There are more sophisticated tests you might see, such as the Breusch-Pagan / Cook-Weisberg test; in Stata it's done with the hettest command, which stands for heteroskedasticity test (don't worry about the name).
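The idea behind that test can be sketched by hand: regress the squared residuals on the fitted values and ask whether they are related; if they are, the variance is not constant. This is a rough hand-rolled illustration on simulated data (assuming NumPy and SciPy), not a substitute for Stata's built-in hettest:

```python
import numpy as np
from scipy import stats

# Hand-rolled sketch of the Breusch-Pagan / Cook-Weisberg idea:
# an auxiliary regression of squared residuals on the fitted values,
# with LM statistic n * R^2 compared against a chi-square distribution.
rng = np.random.default_rng(1)
n = 200
x1 = rng.normal(0, 1, n)
x2 = rng.normal(0, 1, n)
y = 2 + 1.5 * x1 - 0.8 * x2 + rng.normal(0, 1, n)  # constant-variance noise

X = np.column_stack([np.ones(n), x1, x2])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
fitted = X @ beta
resid = y - fitted

# Auxiliary regression: squared residuals on the fitted values.
Z = np.column_stack([np.ones(n), fitted])
gamma, *_ = np.linalg.lstsq(Z, resid**2, rcond=None)
aux_fitted = Z @ gamma
ss_tot = np.sum((resid**2 - np.mean(resid**2)) ** 2)
ss_res = np.sum((resid**2 - aux_fitted) ** 2)
r2_aux = 1 - ss_res / ss_tot

lm = n * r2_aux                  # Lagrange multiplier statistic
p = stats.chi2.sf(lm, 1)         # p > 0.05 means no evidence of heteroskedasticity
print(f"LM = {lm:.2f}, p = {p:.3f}")
```

A large LM (small p) would say the spread of the residuals changes with the fitted values, which is exactly what the graphical check is looking for.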
So in conclusion, we found that the F-test p-value was less than 0.01, which shows the model is not just guessing. We found that the t-tests on each of the coefficients all had p-values less than 0.05, so that's convincing that they all belong in the model. The R-squared was 0.74, meaning the model explains 74 percent of the observed variation in systolic blood pressure. All these things are quite good, but the cherry on the cake, if you like, is that the residuals are also normally distributed, and they have more or less constant variation across the range of predicted (y-hat) values, so we would be quite happy with this model.
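Where those summary numbers come from can be sketched too. Again this uses simulated data (not the blood pressure data set) and assumes NumPy and SciPy; it shows how R-squared and the overall F-test are computed from the residuals:

```python
import numpy as np
from scipy import stats

# Sketch of the summary statistics on simulated data: R-squared and the
# overall F-test (is the model better than just guessing the mean of y?).
rng = np.random.default_rng(7)
n, k = 150, 2                               # n observations, k covariates
x1 = rng.normal(0, 1, n)
x2 = rng.normal(0, 1, n)
y = 1 + 2 * x1 + 3 * x2 + rng.normal(0, 2, n)

X = np.column_stack([np.ones(n), x1, x2])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta

ss_res = np.sum(resid**2)                   # unexplained variation
ss_tot = np.sum((y - y.mean()) ** 2)        # total variation in y
r2 = 1 - ss_res / ss_tot                    # share of variation explained

f_stat = ((ss_tot - ss_res) / k) / (ss_res / (n - k - 1))
p = stats.f.sf(f_stat, k, n - k - 1)        # overall F-test p-value
print(f"R^2 = {r2:.2f}, F = {f_stat:.1f}, p = {p:.4g}")
```

With a strong simulated signal the F-test p-value is tiny and R-squared is high, mirroring the "model is not just guessing" conclusion in the lecture.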
