Regression Analysis: Business Statistics Class Lecturer: Dr. Phan Nguyen Ky Phuc
Regression Analysis
Problem: given a data set with one output and multiple inputs:
Y 72 76 78 70 68 80 82 65 62 90
X1 12 11 15 10 11 16 14 8 8 18
X2 5 8 6 5 3 9 12 4 3 10
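Assume that the relationship between the output and the inputs is linear, so it can be expressed as

$$Y = \beta_0 + \sum_{i=1}^{k} \beta_i X_i + \varepsilon$$

where $\varepsilon \sim N(0, \sigma^2)$ can be interpreted as noise. In this model $\beta_0, \beta_1, \beta_2, \ldots, \beta_k$ are unknown.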
Our objective is therefore to construct a prediction model $\hat{Y} = b_0 + \sum_{i=1}^{k} b_i X_i$ of Y that produces the minimum error on the given data set.
The squared error between the forecasting model and the output of a record is

$$\text{error}^2 = (\hat{Y} - Y)^2 = \left(b_0 + \sum_{i=1}^{k} b_i X_i - Y\right)^2$$
For example, for the 1st record, $Y = 72$, $X_1 = 12$, $X_2 = 5$, the squared error is

$$\text{error}^2 = (b_0 + 12 b_1 + 5 b_2 - 72)^2$$
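Since we consider the whole data set, the squared error must be summed over all records:

$$L = \sum_{n} \text{error}^2 = \sum_{n} (\hat{Y} - Y)^2 = \sum_{n} \left(b_0 + \sum_{i=1}^{k} b_i X_i - Y\right)^2$$

where n is the number of records in the data set and $\sum_{n}$ denotes the sum over those n records.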
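As a rough illustration of the two formulas above, the following Python sketch evaluates the squared error of one record and the total L for a given set of coefficients. The function names and the trial coefficients are hypothetical and chosen only for illustration; they are not the fitted values.

```python
# Data set from the lecture (10 records, 2 inputs).
Y = [72, 76, 78, 70, 68, 80, 82, 65, 62, 90]
X1 = [12, 11, 15, 10, 11, 16, 14, 8, 8, 18]
X2 = [5, 8, 6, 5, 3, 9, 12, 4, 3, 10]


def squared_error(b, x, y):
    """Squared error of one record: (b0 + sum_i b_i * x_i - y)^2."""
    y_hat = b[0] + sum(bi * xi for bi, xi in zip(b[1:], x))
    return (y_hat - y) ** 2


def total_squared_error(b, inputs, y):
    """L: the squared error summed over all n records."""
    records = zip(*inputs)  # one tuple (x1, x2, ...) per record
    return sum(squared_error(b, x, yj) for x, yj in zip(records, y))


# Hypothetical trial coefficients (b0, b1, b2), NOT the least-squares solution.
b_trial = (50.0, 1.5, 1.0)
first = squared_error(b_trial, (X1[0], X2[0]), Y[0])  # 1st record only
total = total_squared_error(b_trial, (X1, X2), Y)     # all 10 records
print(first, total)
```

Trying different values of `b_trial` shows how L changes with the coefficients; the least-squares fit is the choice that makes `total` as small as possible.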
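To minimize the total squared error, we take the first derivative of L with respect to each coefficient and set it equal to zero:

$$\frac{\partial L}{\partial b_0} = 2 \sum_{n} \left(b_0 + \sum_{i=1}^{k} b_i X_i - Y\right), \qquad \frac{\partial L}{\partial b_m} = 2 \sum_{n} X_m \left(b_0 + \sum_{i=1}^{k} b_i X_i - Y\right), \quad m = 1, \ldots, k$$

$$\frac{\partial L}{\partial b_0} = 0 \;\Rightarrow\; \sum_{n} \left(b_0 + \sum_{i=1}^{k} b_i X_i - Y\right) = 0, \qquad \frac{\partial L}{\partial b_m} = 0 \;\Rightarrow\; \sum_{n} X_m \left(b_0 + \sum_{i=1}^{k} b_i X_i - Y\right) = 0$$

Solving this system of k + 1 linear equations gives the values of $b_0, b_1, \ldots, b_k$.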
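The system obtained by setting the derivatives to zero can be solved numerically. The sketch below, in plain Python, builds that linear system for the lecture data and solves it by Gauss-Jordan elimination. The helper names `normal_equations` and `solve` are hypothetical, and the elimination routine is one possible solver, not necessarily what the course software uses.

```python
# Data set from the lecture (10 records, 2 inputs).
Y = [72, 76, 78, 70, 68, 80, 82, 65, 62, 90]
X1 = [12, 11, 15, 10, 11, 16, 14, 8, 8, 18]
X2 = [5, 8, 6, 5, 3, 9, 12, 4, 3, 10]


def normal_equations(inputs, y):
    """Augmented matrix of the system dL/db = 0.

    Row 0 comes from dL/db0 = 0, row m from dL/db_m = 0. The columns are
    built from a column of ones (for b0) followed by each input column.
    """
    cols = [[1.0] * len(y)] + [[float(v) for v in col] for col in inputs]
    return [
        [sum(p * q for p, q in zip(ci, cj)) for cj in cols]
        + [sum(p * t for p, t in zip(ci, y))]
        for ci in cols
    ]


def solve(aug):
    """Gauss-Jordan elimination with partial pivoting on an augmented matrix."""
    m = [row[:] for row in aug]
    size = len(m)
    for p in range(size):
        # Swap in the row with the largest pivot for numerical stability.
        pivot = max(range(p, size), key=lambda r: abs(m[r][p]))
        m[p], m[pivot] = m[pivot], m[p]
        # Eliminate column p from every other row.
        for r in range(size):
            if r != p:
                factor = m[r][p] / m[p][p]
                m[r] = [a - factor * c for a, c in zip(m[r], m[p])]
    return [m[r][-1] / m[r][r] for r in range(size)]


b0, b1, b2 = solve(normal_equations((X1, X2), Y))
print(round(b0, 3), round(b1, 3), round(b2, 3))
```

For this data set the solution should come out at roughly b0 = 47.16, b1 = 1.60 and b2 = 1.15. In practice a library routine such as `numpy.linalg.lstsq` would normally be preferred over hand-written elimination.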
After fitting the model, two concerns remain:
1. Whether this regression model is valid (good enough to explain the data).
2. Whether we can exclude some inputs, i.e., simplify the current model while still explaining the data.
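To answer the 1st concern, we use the F-test from the analysis of variance (ANOVA) table, whose general form is:

Source       SS    Dof           MS                          F
Regression   SSR   k             MSR = SSR / k               F = MSR / MSE
Error        SSE   n - (k + 1)   MSE = SSE / (n - (k + 1))
Total        SST   n - 1

n: number of data records
k: number of inputs

$$R^2 = 1 - \frac{SSE}{SST}; \qquad R^2_{adj} = 1 - \frac{MSE}{MST}, \quad \text{where } MST = \frac{SST}{n - 1}$$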
ANOVA table
Source Dof SS MS F p
Regression 2 630.54 315.27 86.34 0.000
Error 7 25.56 3.65
Total 9 656.10
R-sq = 96.1% R-sq(adj) = 95.0%
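The entries of this table can be recomputed from the data with a short Python sketch. To keep it short, the coefficients are hard-coded as rounded least-squares estimates for this data set; that is an assumption of the sketch, as are the variable names.

```python
# Data set from the lecture (10 records, 2 inputs).
Y = [72, 76, 78, 70, 68, 80, 82, 65, 62, 90]
X1 = [12, 11, 15, 10, 11, 16, 14, 8, 8, 18]
X2 = [5, 8, 6, 5, 3, 9, 12, 4, 3, 10]

# ASSUMPTION: rounded least-squares coefficients for this data set,
# hard-coded here instead of being re-derived.
b0, b1, b2 = 47.1649, 1.5990, 1.1487

n, k = len(Y), 2  # number of records, number of inputs
y_bar = sum(Y) / n
y_hat = [b0 + b1 * x1 + b2 * x2 for x1, x2 in zip(X1, X2)]

# Sums of squares.
sst = sum((y - y_bar) ** 2 for y in Y)               # total
sse = sum((y - yh) ** 2 for y, yh in zip(Y, y_hat))  # error
ssr = sst - sse                                      # regression

# Mean squares: each sum of squares divided by its degrees of freedom.
msr = ssr / k
mse = sse / (n - (k + 1))
mst = sst / (n - 1)

f_stat = msr / mse
r2 = 1 - sse / sst
r2_adj = 1 - mse / mst

print(f"SSR={ssr:.2f} SSE={sse:.2f} SST={sst:.2f}")
print(f"MSR={msr:.2f} MSE={mse:.2f} F={f_stat:.2f}")
print(f"R-sq={r2:.1%} R-sq(adj)={r2_adj:.1%}")
```

Up to rounding, the printed sums of squares, F statistic and R-sq values should agree with the table. The p-value is not computed here because it requires the F distribution, which the Python standard library does not provide.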
To answer the 2nd concern, we use a t-test for each coefficient.
In the t-test for coefficient $\beta_i$, the hypotheses are $H_0: \beta_i = 0$ versus $H_1: \beta_i \neq 0$. Running the software on the above data set, we obtain: