RESEARCH METHODS LESSON 17 - Simple Regression
REGRESSION
The most commonly used form of regression is linear regression, and the
most common type of linear regression is called ordinary least squares
regression.
Linear regression uses an existing data set of paired measurements of two
variables, X and Y, to develop a model for predicting the value of the
dependent variable, Y, for given values of X.
ELEMENTS OF A REGRESSION EQUATION
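The regression equation is written as Y = a + bX + e, where
Y is the value of the dependent variable; what is being predicted.
a is the constant; the value of Y when X = 0.
b or Beta, the coefficient of X; the slope of the regression line; how much Y
changes for each one-unit change in X.
X is the value of the independent variable; what is used to predict Y.
e is the error term; the error in predicting the value of Y, given the value of X (it
is not displayed in most regression equations).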
For example, say we know what the average speed is of cars on the
freeway when we have 2 highway patrols deployed (average speed=75 mph) or
10 highway patrols deployed (average speed=35 mph). But what will be the
average speed of cars on the freeway when we deploy 5 highway patrols?
From our known data, we can use the regression formula (calculations
not shown) to compute the values of a and b and obtain the following equation:
Y = 85 + (-5) X, where
Y is the average speed of cars on the freeway
X is the number of highway patrols deployed
That is, the average speed of cars on the freeway when there are no
highway patrols working (X=0) will be 85 mph. For each additional highway
patrol car working, the average speed will drop by 5 mph. For five patrols (X=5),
Y = 85 + (-5) (5) = 85 - 25 = 60 mph
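The calculations not shown above can be sketched in a few lines of Python. This is a minimal illustration using the standard least squares formulas for the slope and intercept, not the lesson's own worksheet; the function name fit_line is a label chosen here, and the only data are the two observations given in the text.

```python
def fit_line(xs, ys):
    """Return (a, b) for the least squares line Y = a + b*X."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope: co-variation of X and Y divided by the variation in X.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx
    # Intercept: the fitted line passes through the point of means.
    a = mean_y - b * mean_x
    return a, b


# The two known observations from the text: patrols deployed, average speed (mph).
patrols = [2, 10]
speeds = [75, 35]

a, b = fit_line(patrols, speeds)
predicted = a + b * 5  # average speed with 5 patrols deployed
print(a, b, predicted)  # 85.0 -5.0 60.0
```

With only two observations the line passes through both points exactly, so there is no error left to measure; the significance statistics in this lesson need more data points than that.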
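r2
r2 is the proportion of the variance in the dependent variable Y that is
explained by the independent variable X.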
s.e.b
s.e.b is the standard error of the computed value of b. A t-test for statistical
significance of the coefficient is conducted by dividing the value of b by its
standard error. By rule of thumb, a t-value greater than 2.0 in absolute value
is usually statistically significant, but you must consult a t-table to be sure.
If the t-value indicates that the b coefficient is statistically significant, this
means that the independent variable or X (number of patrol cars deployed)
should be kept in the regression equation, since it has a statistically significant
relationship with the dependent variable or Y (average speed in mph). If the
relationship were not statistically significant, the value of the b coefficient
would be (statistically speaking) indistinguishable from zero.
F
F is a test for statistical significance of the regression equation as a whole. It is
obtained by dividing the explained variance by the unexplained variance. By
rule of thumb, an F-value greater than 4.0 is usually statistically significant,
but you must consult an F-table to be sure. If F is significant, then the
regression equation helps us to understand the relationship between X and Y.
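The three statistics defined above can be computed directly from a data set. The sketch below assumes the usual textbook formulas for a regression with one X variable (unexplained variance taken over n - 2 degrees of freedom, explained variance over 1); the function name regression_summary and the five data points are made up for illustration and do not come from the lesson.

```python
import math


def regression_summary(xs, ys):
    """Fit Y = a + b*X by least squares and return a, b, r2, s.e.b, t and F."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x

    # Split the total variation in Y into explained and unexplained parts.
    total_ss = sum((y - mean_y) ** 2 for y in ys)
    residual_ss = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    explained_ss = total_ss - residual_ss

    r2 = explained_ss / total_ss
    # Unexplained variance: residual sum of squares over n - 2 degrees of freedom.
    residual_variance = residual_ss / (n - 2)
    se_b = math.sqrt(residual_variance / sxx)
    t = b / se_b
    # Explained variance has 1 degree of freedom when there is a single X.
    f = (explained_ss / 1) / residual_variance
    return {"a": a, "b": b, "r2": r2, "se_b": se_b, "t": t, "F": f}


# Made-up illustrative data (not from the lesson).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
stats = regression_summary(x, y)
for name, value in stats.items():
    print(f"{name}: {value:.3f}")
```

The printed t and F values still have to be compared against a t-table and an F-table for the appropriate degrees of freedom; the code only produces the statistics.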
r2 = .9
Knowing the value of X (the number of patrol cars deployed), we can explain
90% of the variance in Y (the average speed of motorists on the freeway).
s.e.b = 1.5
Dividing b by s.e.b, we obtain a value for t = -5/1.5 = -3.3. Consulting a t-table,
we find that the coefficient is statistically significant. This means that the
independent variable X (number of patrol cars deployed) should be kept in the
regression equation, since it has a statistically significant relationship with the
dependent variable Y (average speed in mph).
F= 8.4
From the F-table, we see that the regression equation as a whole is statistically
significant. This means that the regression equation is helping us to
understand the relationship between X and Y.
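As a quick check on the arithmetic above, the reported values can be run through the two rules of thumb. This is only a rough screen under the thresholds quoted in this lesson (2.0 for t, 4.0 for F); the helper names are chosen here for illustration, and the tables remain the final authority.

```python
def t_value(b, se_b):
    """t statistic for a coefficient: the coefficient divided by its standard error."""
    return b / se_b


def passes_rule_of_thumb(statistic, threshold):
    """Rough screen only; the size of the statistic is compared, ignoring its sign."""
    return abs(statistic) > threshold


# Values reported in the text for the highway patrol regression.
b = -5
se_b = 1.5
f_value = 8.4

t = t_value(b, se_b)
print(round(t, 1))                          # -3.3
print(passes_rule_of_thumb(t, 2.0))         # True
print(passes_rule_of_thumb(f_value, 4.0))   # True
```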
Example: The motor pool wants to know if it costs more to maintain cars that
are driven more often.
Hypothesis: maintenance costs are affected by car mileage
Null hypothesis: there is no relationship between mileage and maintenance
costs
Data are gathered on each car in the motor pool, regarding number of miles
driven in a given year, and maintenance costs for that year. Here is a sample of
the data collected.
a = 50; the cost of maintenance when X = 0. If there is no mileage on the car,
then the yearly cost of maintenance is $50.
b = .03; the amount that Y increases for each one-unit increase in X. For each
extra mile driven (X), the cost of yearly maintenance increases by $.03.
s.e.b = .0005; the value of b divided by s.e.b is 60.0. The t-table indicates that
the b coefficient of X is statistically significant (it is related to Y).
r2 = .90; we can explain 90% of the variance in repair costs for different
vehicles if we know the mileage for each car.
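The motor pool results can be turned into a small prediction helper. The coefficients below are the ones reported above; the 10,000-mile input is a hypothetical figure chosen for illustration, and the function name predicted_cost is likewise an assumption of this sketch.

```python
# Coefficients reported in the text for the motor pool example.
A = 50.0        # constant: yearly maintenance cost in dollars at zero mileage
B = 0.03        # slope: extra dollars of yearly maintenance per mile driven
SE_B = 0.0005   # standard error of the slope


def predicted_cost(miles):
    """Predicted yearly maintenance cost in dollars for a car driven `miles` miles."""
    return A + B * miles


# 10,000 miles is a hypothetical input, not a figure from the lesson.
cost = predicted_cost(10_000)
t = B / SE_B
print(f"predicted cost: ${cost:.2f}")
print(f"t = {t:.1f}")
```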
However, if the trend of the dependent variable over time is not linear,
then linear regression will not capture the relationship. Linear regression fails
to capture seasonal, cyclical, and counter-cyclical trends in time series data.
Nor does it capture changes in the direction of a time series or changes in its
rate of change over time. For time series regression, it is important to obtain a
plot of the data over time and inspect it for possible non-linear trends.

There is also a problem if the values at one point in the time series are
determined or strongly influenced by values at a previous time. This is called
auto-correlation; it occurs when the values of the dependent variable over time
are not randomly distributed.

Linear regression can be used with interrupted time series research designs.
For example, say a policy is implemented to reduce the number of accidents
among teenage drivers.
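A minimal sketch of an interrupted time series comparison follows. It fits one least squares line to the periods before the policy, one to the periods after, and one to all periods pooled together. The accident rates and the policy year are made up for illustration (they are not data from the lesson), and fit_line is a helper name chosen here.

```python
def fit_line(xs, ys):
    """Return (a, b, r2) for the least squares line Y = a + b*X."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    # With a single X, r2 is the squared correlation between X and Y.
    r2 = sxy ** 2 / (sxx * syy)
    return a, b, r2


# Made-up accident rates for ten consecutive years (not data from the lesson).
rates = [40, 44, 48, 52, 56, 56, 52, 48, 44, 40]
years = list(range(1, len(rates) + 1))  # time periods numbered 1 to n
policy_year = 6  # assumed: the policy takes effect at the start of year 6

split = policy_year - 1
pre = fit_line(years[:split], rates[:split])
post = fit_line(years[split:], rates[split:])
pooled = fit_line(years, rates)

for label, (a, b, r2) in [("pre", pre), ("post", post), ("pooled", pooled)]:
    print(f"{label}: a={a:.1f}, b={b:.1f}, r2={r2:.2f}")
```

In this contrived data set the rate rises before the policy and falls after it, so the two separate lines fit well while the single pooled line explains almost none of the variance.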
To test its effect, one regression equation is computed for the time periods
before the policy was introduced and a second for the periods after it. If there
is a difference between the two equations, then the policy has had an effect. If
all the data points (both pre- and post-) had been included in a single
regression equation, the amount of variance explained (r2) would be quite
low. This is because, if there is a change after the policy is introduced, the
trend is no longer linear. Instead, there are two different linear trends, one
before the policy was introduced, and another, different one after it was
introduced.

In setting up the data for time series regression, the researcher must
remember to number the years (or other time periods) consecutively from
1 to n. These are the values for the independent (X) variable. The value of the
dependent variable is the accident rate. For example,