
STEPWISE REGRESSION

Source: http://business.fullerton.edu/
Begin by performing a normal multiple regression. If all variables are
shown as significant (p-values < α), then STOP -- the complete model is
good.
But if Significance F is low and one or more of the p-values for the
t-tests are high, forward stepwise regression can be used to develop the
best model that contains some of the variables, as follows.
Step 1:
Do simple linear regressions of y vs. each x variable
individually. Select the x variable with the lowest p-value. (Suppose it
is X3.)
Step 2:
Do all possible 2-variable regressions in which one of the
two variables is X3.
If none of the 2-variable regressions gives low p-values for both X3
and the other variable -- STOP -- use the model utilizing only X3.
If one or more of the 2-variable models gives low p-values for both X3
and the second variable, select the model with the lowest p-values.
(Suppose it is the one with X3 and X5.) -- GO TO STEP 3.
Step 3:
Do all possible 3-variable regressions in which two of the
three variables are X3 and X5.
If none of the 3-variable regressions gives low p-values for each of
X3, X5, and the other variable -- STOP -- use the model utilizing only
X3 and X5.
If one or more of the 3-variable models gives low p-values for X3, X5
and the third variable, select the model with the lowest p-values.
GO TO STEP 4 and continue this process.
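
The steps above can be sketched as a short loop. This is only an illustration, not the source's own implementation: the function name forward_stepwise and the callback pvalues_for are hypothetical, the cutoff alpha = 0.05 is an assumption (the text only says "low"), and "the model with the lowest p-values" is read here as the model whose newly added variable has the smallest p-value. Instead of fitting regressions, the demo looks up the p-values printed in the Example that follows.

```python
def forward_stepwise(candidates, pvalues_for, alpha=0.05):
    """Return the chosen variables in order of entry.

    pvalues_for(variables) must return {variable: p-value} for the
    regression of y on exactly those variables (intercept excluded).
    """
    selected = []
    remaining = list(candidates)
    while remaining:
        best_var, best_p = None, None
        for var in remaining:
            trial = selected + [var]
            pvals = pvalues_for(trial)
            # Keep a trial model only if every variable in it is significant.
            if all(pvals[v] < alpha for v in trial):
                # Assumed tie-break: smallest p-value for the new variable.
                if best_p is None or pvals[var] < best_p:
                    best_var, best_p = var, pvals[var]
        if best_var is None:
            break  # STOP: no larger model has low p-values throughout
        selected.append(best_var)
        remaining.remove(best_var)
    return selected


# P-values copied from the Example printouts (one dict per fitted model).
PRINTOUTS = [
    {"X1": 0.004241}, {"X2": 0.000126}, {"X3": 0.017116},
    {"X4": 7.84e-05}, {"X5": 0.049403},
    {"X4": 8.94e-06, "X1": 0.000362},
    {"X4": 0.010657, "X2": 0.017223},
    {"X4": 0.001196, "X3": 0.281504},
    {"X4": 0.000415, "X5": 0.236057},
    {"X1": 0.006037, "X4": 0.000728, "X2": 0.334647},
    {"X1": 0.000822, "X4": 3.41e-05, "X3": 0.562546},
    {"X1": 0.000125, "X4": 3.61e-05, "X5": 0.043515},
    {"X1": 0.002809, "X4": 0.002534, "X5": 0.030133, "X2": 0.184792},
    {"X1": 0.000484, "X4": 0.000186, "X5": 0.058472, "X3": 0.775403},
]
PVALUES = {frozenset(model): model for model in PRINTOUTS}


def lookup(variables):
    """Stand-in for a real regression fit: return the printed p-values."""
    return PVALUES[frozenset(variables)]


chosen = forward_stepwise(["X1", "X2", "X3", "X4", "X5"], lookup)
print(chosen)  # ['X4', 'X1', 'X5']
```

With real data, lookup would be replaced by a function that fits the regression and returns the coefficient p-values; the loop itself would not change.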

Example
Here is the printout from a model of Y vs. X1, X2, X3, X4, and X5.
There is low Significance F, but 2 of the p-values (X2 and X3) are high.
ANOVA
            df  SS        MS        F         Significance F
Regression  5   82624266  16524853  18.79356  9.16E-06
Residual    14  12309961  879282.9
Total       19  94934227

           Coefficients  Standard Error  t Stat    P-value   Lower 95%  Upper 95%
Intercept  -1350.67      1326.782        -1.01801  0.325946  -4196.34   1494.996
X1         105.1368      37.21172        2.825368  0.013489  25.32554   184.9481
X2         -905.579      688.1833        -1.3159   0.209349  -2381.59   570.4283
X3         4.038254      33.28221        0.121334  0.905151  -67.3451   75.42157
X4         732.1831      257.4505        2.843976  0.013003  180.0062   1284.36
X5         23.08303      10.08736        2.288312  0.038187  1.447773   44.71829
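
As a reading aid, the derived columns of this printout can be recomputed from the others. The sketch below uses only numbers copied from the printout and the usual definitions (MS = SS / df, F = MS regression / MS residual, t Stat = coefficient / standard error). The variable names are illustrative, and the last line backs out the t multiplier implied by the printed 95% interval rather than looking it up.

```python
# ANOVA rows, copied from the full-model printout (Y vs. X1..X5).
ss_regression, df_regression = 82624266, 5
ss_residual, df_residual = 12309961, 14

ms_regression = ss_regression / df_regression  # compare with printed MS 16524853
ms_residual = ss_residual / df_residual        # compare with printed MS 879282.9
f_stat = ms_regression / ms_residual           # compare with printed F 18.79356

# X1 row of the coefficient table.
coef, std_err = 105.1368, 37.21172
lower_95, upper_95 = 25.32554, 184.9481

t_stat = coef / std_err                        # compare with printed t Stat 2.825368
# The interval is coef +/- t_crit * std_err, so its half-width gives t_crit.
t_crit = (upper_95 - lower_95) / (2 * std_err)

print(ms_regression, ms_residual, f_stat, t_stat, t_crit)
```

The implied multiplier comes out near 2.145, which is the two-sided 5% critical value of a t distribution with 14 degrees of freedom, the residual df in the ANOVA table.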

Step 1: Do 5 1-variable regressions


X1:
           Coefficients  Standard Error  t Stat    P-value
Intercept  705.574       1093.339        0.645339  0.526849
X1         162.3509      49.62806        3.271353  0.004241

X2:
           Coefficients  Standard Error  t Stat    P-value
Intercept  5510          455.4713        12.09736  4.43E-10
X2         -3298.56      678.9765        -4.85813  0.000126

X3:
           Coefficients  Standard Error  t Stat    P-value
Intercept  1829.596      943.2457        1.939681  0.068254
X3         130.3296      49.62046        2.62653   0.017116

X4:
           Coefficients  Standard Error  t Stat    P-value
Intercept  33.24607      852.302         0.039007  0.969314
X4         1209.819      238.2256        5.07846   7.84E-05

X5:
           Coefficients  Standard Error  t Stat    P-value
Intercept  1921.712      1099.356        1.748034  0.097494
X5         42.24776      20.0507         2.107047  0.049403

X4 has the lowest p-value.
Do 2-variable regressions with X4.
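
To show where the P-value column comes from, the sketch below converts each Step 1 t Stat into a two-sided p-value and then picks the smallest. It is an illustration only: the function name two_sided_t_pvalue is hypothetical, it relies on the finite trigonometric series for Student's t distribution with integer degrees of freedom (so no statistics library is needed), and it assumes 20 observations, inferred from the Total df of 19 in the ANOVA table.

```python
import math


def two_sided_t_pvalue(t_stat, df):
    """Two-sided p-value for a t statistic with integer df >= 1."""
    theta = math.atan(abs(t_stat) / math.sqrt(df))
    sin_t, cos_t = math.sin(theta), math.cos(theta)
    if df % 2 == 0:
        # Even df: sin * (1 + 1/2 cos^2 + (1*3)/(2*4) cos^4 + ... + cos^(df-2) term)
        term, total = 1.0, 1.0
        for k in range(1, df // 2):
            term *= (2 * k - 1) / (2 * k) * cos_t ** 2
            total += term
        inside = sin_t * total
    else:
        # Odd df: (2/pi) * (theta + sin * (cos + 2/3 cos^3 + ... + cos^(df-2) term))
        total = 0.0
        if df > 1:
            term = total = cos_t
            for k in range(1, (df - 1) // 2):
                term *= (2 * k) / (2 * k + 1) * cos_t ** 2
                total += term
        inside = (2 / math.pi) * (theta + sin_t * total)
    # 'inside' is P(|T| <= |t_stat|); the p-value is the remaining tail area.
    return 1.0 - inside


# t Stats of the slope in each 1-variable regression, copied from Step 1.
t_stats = {"X1": 3.271353, "X2": -4.85813, "X3": 2.62653,
           "X4": 5.07846, "X5": 2.107047}
n_obs = 20              # assumption: Total df = 19 implies 20 observations
df_residual = n_obs - 2  # one slope plus the intercept

p_values = {name: two_sided_t_pvalue(t, df_residual) for name, t in t_stats.items()}
best = min(p_values, key=p_values.get)
print(best)  # X4
```

Because the p-value shrinks as |t Stat| grows for a fixed df, choosing the lowest p-value in Step 1 is the same as choosing the largest |t Stat|.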

Step 2: 2-variable regressions with X4


X4 and X1:
           Coefficients  Standard Error  t Stat    P-value
Intercept  -2083.08      764.2981        -2.72548  0.014388
X4         1062.177      170.179         6.241527  8.94E-06
X1         127.3128      28.7017         4.435724  0.000362

X4 and X2:
           Coefficients  Standard Error  t Stat    P-value
Intercept  2381.845      1156.512        2.059508  0.0551
X4         764.6601      266.6007        2.868185  0.010657
X2         -1954.61      740.6114        -2.63918  0.017223

X4 and X3:
           Coefficients  Standard Error  t Stat    P-value
Intercept  -271.984      890.1006        -0.30556  0.763646
X4         1059.013      272.7572        3.882622  0.001196
X3         47.64925      42.83959        1.112271  0.281504

X4 and X5:
           Coefficients  Standard Error  t Stat    P-value
Intercept  -529.912      957.4169        -0.55348  0.587141
X4         1099.614      251.4775        4.372614  0.000415
X5         18.61115      15.15154        1.228334  0.236057

The model with X4 and X1 has the lowest p-values.
Do 3-variable regressions with X1 and X4.

Step 3: 3-variable regressions with X1 and X4


X1, X4, and X2:
           Coefficients  Standard Error  t Stat    P-value
Intercept  -915.611      1400.646        -0.65371  0.522586
X1         108.5795      34.33533        3.162327  0.006037
X4         921.6408      221.2157        4.166254  0.000728
X2         -712.454      716.1868        -0.99479  0.334647

X1, X4, and X3:
           Coefficients  Standard Error  t Stat    P-value
Intercept  -2105.84      780.2997        -2.69876  0.015812
X1         136.6601      33.26264        4.108516  0.000822
X4         1116.86       196.6308        5.679982  3.41E-05
X3         -20.7029      35.00935        -0.59135  0.562546

X1, X4, and X5:
           Coefficients  Standard Error  t Stat    P-value
Intercept  -2782.66      761.0356        -3.65641  0.00213
X1         130.5134      25.98578        5.022496  0.000125
X4         931.9743      164.9015        5.651702  3.61E-05
X5         21.36134      9.745077        2.192014  0.043515

In the model with X1, X4, and X5, the p-values suggest all three
variables are significant.
Do 4-variable models that include X1, X4, and X5.


Step 4: 4-variable regressions with X1, X4, and X5

X1, X4, X5, and X2:
           Coefficients  Standard Error  t Stat    P-value
Intercept  -1388.72      1246.139        -1.11441  0.28264
X1         107.5962      30.16421        3.567017  0.002809
X4         749.4844      207.1954        3.617283  0.002534
X5         22.82502      9.531333        2.394735  0.030133
X2         -879.915      632.9993        -1.39007  0.184792

X1, X4, X5, and X3:
           Coefficients  Standard Error  t Stat    P-value
Intercept  -2776.57      784.0729        -3.54121  0.002962
X1         134.6924      30.38378        4.433037  0.000484
X4         959.9247      195.1911        4.91787   0.000186
X5         20.85893      10.18438        2.048129  0.058472
X3         -9.42256      32.43436        -0.29051  0.775403

Neither 4-variable model adds a new variable: each has large p-values.

Best model includes only X1, X4, and X5:

Y = -2782.66 + 130.5134 X1 + 931.9743 X4 + 21.36134 X5.
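
For completeness, the fitted equation can be wrapped in a small function. The coefficients are those of the X1, X4, and X5 printout in Step 3; the function name predict_y and the input values in the usage line are hypothetical and chosen only to show the arithmetic.

```python
def predict_y(x1, x4, x5):
    """Predicted Y from the final 3-variable model (Step 3 coefficients)."""
    return -2782.66 + 130.5134 * x1 + 931.9743 * x4 + 21.36134 * x5


# Hypothetical inputs, not taken from the source data.
print(round(predict_y(x1=10, x4=2, x5=50), 2))  # 1454.49
```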
