10 - Regression Analysis
Regression Analysis
Problem: given a data set with one output and multiple inputs
Y 72 76 78 70 68 80 82 65 62 90
X1 12 11 15 10 11 16 14 8 8 18
X2 5 8 6 5 3 9 12 4 3 10
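Assume that the relationship between the output and the inputs is linear, so it can be expressed as
$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \varepsilon$
where $\beta_0, \beta_1, \beta_2$ are the model coefficients and $\varepsilon$ is a random error.
The squared error between the forecasting model and the output of record $i$ is
$\left(e^{(i)}\right)^2 = \left(y^{(i)} - \beta_0 - \beta_1 x_1^{(i)} - \beta_2 x_2^{(i)}\right)^2$
where the superscript $(i)$ is the index of the record.
Since we consider the whole data set, the squared error must be summed over all $n$ records:
$SSE = \sum_{i=1}^{n} \left(y^{(i)} - \beta_0 - \beta_1 x_1^{(i)} - \beta_2 x_2^{(i)}\right)^2$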
INTERNATIONAL UNIVERSITY (IU) Engineering Probability & Statistic
ISE Department Lecturer: Phan Nguyễn Kỳ Phúc
--------------------o0o------------------
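To minimize the summed squared error $SSE$, we take its first derivative with respect to each coefficient $\beta_j$ and set it equal to zero:
$\frac{\partial SSE}{\partial \beta_j} = 0, \quad j = 0, 1, 2$
Solving this system of linear equations gives the least squares estimates of the coefficients.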
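The minimization step can be sketched in a few lines of Python for the two-input data set above. This is only an illustration, assuming the intercept is eliminated first so that the remaining equations form a 2x2 system in the two slopes; the helper names (`mean`, `centered_cross_sum`) and the variable names are hypothetical and do not come from the notes. Only the standard library is used.

```python
# Data from the example above (one output Y, two inputs X1 and X2).
Y = [72, 76, 78, 70, 68, 80, 82, 65, 62, 90]
X1 = [12, 11, 15, 10, 11, 16, 14, 8, 8, 18]
X2 = [5, 8, 6, 5, 3, 9, 12, 4, 3, 10]


def mean(values):
    return sum(values) / len(values)


def centered_cross_sum(a, b):
    """Sum of (a_i - mean(a)) * (b_i - mean(b)) over all records."""
    ma, mb = mean(a), mean(b)
    return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b))


# Setting the derivatives of the summed squared error to zero and
# eliminating the intercept leaves a 2x2 linear system in the slopes.
s11 = centered_cross_sum(X1, X1)
s22 = centered_cross_sum(X2, X2)
s12 = centered_cross_sum(X1, X2)
s1y = centered_cross_sum(X1, Y)
s2y = centered_cross_sum(X2, Y)

det = s11 * s22 - s12 ** 2              # Cramer's rule for the 2x2 system
b1 = (s1y * s22 - s12 * s2y) / det      # slope of X1
b2 = (s11 * s2y - s12 * s1y) / det      # slope of X2
b0 = mean(Y) - b1 * mean(X1) - b2 * mean(X2)   # intercept

# Summed squared error at the fitted coefficients.
sse = sum((y - b0 - b1 * x1 - b2 * x2) ** 2 for y, x1, x2 in zip(Y, X1, X2))
print(f"b0 = {b0:.3f}, b1 = {b1:.3f}, b2 = {b2:.3f}, SSE = {sse:.2f}")
```

The summed squared error printed here should agree with the Error row of the ANOVA table given later in these notes.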
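Once the model is fitted, two concerns remain:
1. Whether this regression model is valid (good enough to explain the data).
2. Whether we can exclude some inputs, i.e., simplify the current model while still explaining the data.
To answer the 1st concern we use the ANOVA table and its F-test:
Source       Dof         SS    MS                      F
Regression   k           SSR   MSR = SSR/k             F = MSR/MSE
Error        n - k - 1   SSE   MSE = SSE/(n - k - 1)
Total        n - 1       SST
where SST = SSR + SSE and
n: number of data records
k: number of inputs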
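Another approach
The same model can be written in matrix form. Let
$\mathbf{y} = \left(y^{(1)}, \dots, y^{(n)}\right)^T$ and let row $i$ of the matrix $\mathbf{X}$ be $\left(1, x_1^{(i)}, \dots, x_k^{(i)}\right)$,
where the superscript denotes the index of the record. The response can be expressed as
$\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon}$
where $\boldsymbol{\beta} = (\beta_0, \beta_1, \dots, \beta_k)^T$ is the vector of model parameters and $\boldsymbol{\varepsilon}$ is the vector of errors. The least squares estimate is
$\hat{\boldsymbol{\beta}} = (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{y}$
and its covariance matrix is estimated by
$\hat{\sigma}^2 (\mathbf{X}^T\mathbf{X})^{-1}, \quad \hat{\sigma}^2 = \frac{SSE}{n - k - 1}$
The square roots of the main diagonal elements of this matrix are the standard errors of the model parameters.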
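The matrix approach can be sketched as follows for the data set above. It is a minimal illustration, assuming the covariance matrix in question is the usual estimate $\hat{\sigma}^2 (X^T X)^{-1}$; the helpers `transpose`, `matmul`, and `inverse` are hypothetical names written here so that the block runs with the standard library only.

```python
import math

Y = [72, 76, 78, 70, 68, 80, 82, 65, 62, 90]
X1 = [12, 11, 15, 10, 11, 16, 14, 8, 8, 18]
X2 = [5, 8, 6, 5, 3, 9, 12, 4, 3, 10]

# Design matrix: the leading 1 in each row carries the intercept.
X = [[1.0, x1, x2] for x1, x2 in zip(X1, X2)]
n, p = len(X), len(X[0])                # p = k + 1 model parameters


def transpose(m):
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    bt = transpose(b)
    return [[sum(u * v for u, v in zip(row, col)) for col in bt] for row in a]


def inverse(m):
    """Gauss-Jordan inverse with partial pivoting (adequate for small matrices)."""
    size = len(m)
    aug = [list(map(float, row)) + [float(i == j) for j in range(size)]
           for i, row in enumerate(m)]
    for col in range(size):
        pivot = max(range(col, size), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(size):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[col])]
    return [row[size:] for row in aug]


Xt = transpose(X)
XtX_inv = inverse(matmul(Xt, X))
Xty = matmul(Xt, [[y] for y in Y])
beta = [row[0] for row in matmul(XtX_inv, Xty)]     # least squares estimates

residuals = [y - sum(b * x for b, x in zip(beta, row)) for y, row in zip(Y, X)]
sse = sum(e * e for e in residuals)
sigma2 = sse / (n - p)                  # estimate of the error variance
# Standard errors: square roots of the diagonal of sigma2 * (X^T X)^{-1}.
std_err = [math.sqrt(sigma2 * XtX_inv[j][j]) for j in range(p)]

for name, b, se in zip(("intercept", "X1", "X2"), beta, std_err):
    print(f"{name:9s} estimate = {b:7.3f}  std. error = {se:.4f}")
```

For larger problems a numerical library would normally replace the hand-written inverse; it is spelled out here only to keep the example self-contained.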
ANOVA table
Source                          Dof   SS       MS       F       p
Regression (number of inputs)   2     630.54   315.27   86.34   0.000
Error                           7     25.56    3.65
Total (number of data - 1)      9     656.10
R-sq = 96.1%   R-sq(adj) = 95.0%
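The entries of this table can be recomputed from the raw data with a short script. The sketch below is illustrative only: the variable names are hypothetical, and the critical value `F_CRITICAL = 4.74` (5 percent level, 2 and 7 degrees of freedom) is taken from a standard F table, not from these notes.

```python
Y = [72, 76, 78, 70, 68, 80, 82, 65, 62, 90]
X1 = [12, 11, 15, 10, 11, 16, 14, 8, 8, 18]
X2 = [5, 8, 6, 5, 3, 9, 12, 4, 3, 10]
n, k = len(Y), 2                        # number of records, number of inputs


def mean(values):
    return sum(values) / len(values)


def centered_cross_sum(a, b):
    ma, mb = mean(a), mean(b)
    return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b))


# Least squares fit for two inputs (2x2 system in the slopes).
s11, s22 = centered_cross_sum(X1, X1), centered_cross_sum(X2, X2)
s12 = centered_cross_sum(X1, X2)
s1y, s2y = centered_cross_sum(X1, Y), centered_cross_sum(X2, Y)
det = s11 * s22 - s12 ** 2
b1 = (s1y * s22 - s12 * s2y) / det
b2 = (s11 * s2y - s12 * s1y) / det
b0 = mean(Y) - b1 * mean(X1) - b2 * mean(X2)

fitted = [b0 + b1 * x1 + b2 * x2 for x1, x2 in zip(X1, X2)]
y_bar = mean(Y)

ssr = sum((f - y_bar) ** 2 for f in fitted)          # regression sum of squares
sse = sum((y - f) ** 2 for y, f in zip(Y, fitted))   # error sum of squares
sst = sum((y - y_bar) ** 2 for y in Y)               # total sum of squares

msr = ssr / k
mse = sse / (n - k - 1)
f_stat = msr / mse
r_sq = ssr / sst
r_sq_adj = 1 - mse / (sst / (n - 1))

F_CRITICAL = 4.74   # assumption: tabulated F value, alpha = 0.05, dof (2, 7)

print(f"Regression  dof={k}  SS={ssr:.2f}  MS={msr:.2f}  F={f_stat:.2f}")
print(f"Error       dof={n - k - 1}  SS={sse:.2f}  MS={mse:.2f}")
print(f"Total       dof={n - 1}  SS={sst:.2f}")
print(f"R-sq = {r_sq:.1%}, R-sq(adj) = {r_sq_adj:.1%}")
print("Model is significant" if f_stat > F_CRITICAL else "Model is not significant")
```

Comparing the F statistic with a tabulated critical value replaces the p-value column, which would need the F distribution from a statistics library.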
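To answer the 2nd concern we use the t-test for each coefficient.
In the t-test for coefficient $\beta_j$, the hypotheses are $H_0: \beta_j = 0$ versus $H_1: \beta_j \neq 0$. When we run the software on the above data set, it reports the estimate, standard error, t statistic, and p-value of each coefficient; an input whose coefficient is not significantly different from zero can be excluded from the model.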
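The per-coefficient t-test can be sketched for the two slopes of the example data. This is an illustration under stated assumptions, not the software output referred to above: the standard errors use the closed form for the two-input case, the names are hypothetical, and `T_CRITICAL = 2.365` (two-sided 5 percent level, 7 degrees of freedom) comes from a standard t table.

```python
import math

Y = [72, 76, 78, 70, 68, 80, 82, 65, 62, 90]
X1 = [12, 11, 15, 10, 11, 16, 14, 8, 8, 18]
X2 = [5, 8, 6, 5, 3, 9, 12, 4, 3, 10]
n, k = len(Y), 2


def mean(values):
    return sum(values) / len(values)


def centered_cross_sum(a, b):
    ma, mb = mean(a), mean(b)
    return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b))


# Least squares fit for two inputs.
s11, s22 = centered_cross_sum(X1, X1), centered_cross_sum(X2, X2)
s12 = centered_cross_sum(X1, X2)
s1y, s2y = centered_cross_sum(X1, Y), centered_cross_sum(X2, Y)
det = s11 * s22 - s12 ** 2
b1 = (s1y * s22 - s12 * s2y) / det
b2 = (s11 * s2y - s12 * s1y) / det
b0 = mean(Y) - b1 * mean(X1) - b2 * mean(X2)

sse = sum((y - b0 - b1 * x1 - b2 * x2) ** 2 for y, x1, x2 in zip(Y, X1, X2))
mse = sse / (n - k - 1)

# Standard errors of the slopes: diagonal entries of MSE * (X^T X)^{-1},
# written out explicitly for the two-input case.
se_b1 = math.sqrt(mse * s22 / det)
se_b2 = math.sqrt(mse * s11 / det)

T_CRITICAL = 2.365  # assumption: tabulated t value, alpha = 0.05 two-sided, 7 dof

t_stats = {"X1": b1 / se_b1, "X2": b2 / se_b2}
for name, t_stat in t_stats.items():
    # Reject H0: beta_j = 0 when |t| exceeds the critical value.
    verdict = "keep input" if abs(t_stat) > T_CRITICAL else "candidate for exclusion"
    print(f"{name}: t = {t_stat:.2f} -> {verdict}")
```

An input is flagged as a candidate for exclusion only when its t statistic falls inside the acceptance region; the intercept is left out of the loop because the 2nd concern is about the inputs.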
Assignments
Question 1: The following data indicate the gain in reading speed versus the number of weeks in
the program of 10 students in a speed-reading program.
Weeks 2 3 8 11 4 5 9 7 5 7
Speed 21 42 102 130 52 57 105 85 62 90
1. Plot a scatter diagram to see if a linear relationship is indicated.
3. Estimate the expected gain of a student who plans to take the program for 7 weeks.
Question 2: The following data set presents the heights of 12 male law school classmates whose
law school examination scores were roughly equal. It also gives their first year salaries. Each of
them went into corporate law. The height is in inches and the salary in units of $1,000.
Height 64 65 66 67 69 70 72 72 74 74 75 76
Salary 91 94 88 103 77 96 105 88 122 102 90 114
1. Do the above data establish the hypothesis that a lawyer’s salary is related to his height? Use
the 5 percent level of significance.
Question 3: Fit a multiple linear regression equation to the following data set.
X1 X2 X3 X4 Y
1 11 16 4 275
2 10 9 3 183
3 9 4 2 140
4 8 1 1 82
5 7 2 1 97
6 6 1 -1 122
7 5 4 -2 146
8 4 9 -3 246
9 3 16 -4 359
19 2 25 -5 482
Question 4:
4. Test the hypothesis that the mean response at the input levels x1 = x2 = x3 = 1 is 8.5.
X1 X2 X3 Y
7.1 0.68 4 41.53
9.9 0.64 1 63.75
3.6 0.58 1 16.38
9.3 0.21 3 45.54
2.3 0.89 5 15.52
4.6 0.00 8 28.55
0.2 0.37 5 5.65
5.4 0.11 3 25.02
8.2 0.87 4 52.49
7.1 0.00 6 38.05
4.7 0.76 0 30.76
5.4 0.87 8 39.69
1.7 0.52 1 17.59
1.9 0.31 3 13.22
9.2 0.19 5 50.98
Question 5: The cost of producing power per kilowatt hour is a function of the load factor and
the cost of coal in cents per million Btu. The following data were obtained from 12 mills.
Load factor   Coal cost   Power cost
84 14 4.1
81 16 4.4
73 22 5.6
74 24 5.1
67 20 5.0
87 29 5.3
77 26 5.4
76 15 4.8
69 29 6.1
82 24 5.5
90 25 4.7
88 13 3.9
1. Estimate the relationship.
2. Test the hypothesis that the coefficient of the load factor is equal to 0.
3. Determine a 95 percent prediction interval for the power cost when the load factor is 85 and
the coal cost is 20.