
INTERNATIONAL UNIVERSITY (IU) Engineering Probability & Statistic

ISE Department Lecturer: Phan Nguyễn Kỳ Phúc


--------------------o0o------------------

Regression Analysis
Problem: given a data set with one output (Y) and multiple inputs (X1, X2):

Y 72 76 78 70 68 80 82 65 62 90

X1 12 11 15 10 11 16 14 8 8 18

X2 5 8 6 5 3 9 12 4 3 10

Assume that the relationship between the output and the inputs is linear, so it can be expressed as

$$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + e$$

where $e$ can be interpreted as noise. In this model, $\beta_0, \beta_1, \beta_2$ are unknown.


Our objective is to build a prediction model of Y that produces the minimum error on the given data set.

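Assume that our forecasting model is

$$\hat{Y} = b_0 + b_1 X_1 + b_2 X_2$$

where $b_0, b_1, b_2$ are the estimates of $\beta_0, \beta_1, \beta_2$. The squared error between the forecast and the output of record $i$ is

$$(Y_i - \hat{Y}_i)^2 = (Y_i - b_0 - b_1 X_{1i} - b_2 X_{2i})^2$$

For example, for the 1st record ($Y = 72$, $X_1 = 12$, $X_2 = 5$) the squared error is

$$(72 - b_0 - 12 b_1 - 5 b_2)^2$$

Since we consider the whole data set, the squared errors are summed over all records:

$$SSE = \sum_{i=1}^{n} (Y_i - b_0 - b_1 X_{1i} - b_2 X_{2i})^2$$

where n is the number of records in the data set.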

For the above data set, n=10.
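As a minimal sketch (Python with NumPy, which the notes themselves do not use), the sum of squared errors can be evaluated for any candidate coefficients $b_0, b_1, b_2$. The trial coefficients below are hypothetical and arbitrary; they only show that every choice gives a different SSE, and least squares looks for the choice that makes it smallest.

```python
import numpy as np

# Example data set from the notes (n = 10 records)
Y = np.array([72, 76, 78, 70, 68, 80, 82, 65, 62, 90], dtype=float)
X1 = np.array([12, 11, 15, 10, 11, 16, 14, 8, 8, 18], dtype=float)
X2 = np.array([5, 8, 6, 5, 3, 9, 12, 4, 3, 10], dtype=float)


def squared_error(i, b0, b1, b2):
    """Squared error of record i (0-based) under Y_hat = b0 + b1*X1 + b2*X2."""
    return (Y[i] - (b0 + b1 * X1[i] + b2 * X2[i])) ** 2


def sse(b0, b1, b2):
    """Sum of the squared errors over all n records."""
    residuals = Y - (b0 + b1 * X1 + b2 * X2)
    return float(np.sum(residuals ** 2))


# Hypothetical trial coefficients, for illustration only
trial = (50.0, 1.5, 1.0)
print(squared_error(0, *trial))  # 1st record: (72 - 73)^2 -> 1.0
print(sse(*trial))               # -> 34.75
```

Plugging in the fitted coefficients from the software output (47.165, 1.5990, 1.1487) gives about 25.56, which is smaller than 34.75 and matches the SSE reported in the ANOVA table.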


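To minimize the sum of squared errors SSE, we take the first partial derivatives with respect to $b_0, b_1, b_2$ and set them equal to zero:

$$\frac{\partial SSE}{\partial b_0} = -2\sum_{i=1}^{n}(Y_i - b_0 - b_1 X_{1i} - b_2 X_{2i}) = 0$$
$$\frac{\partial SSE}{\partial b_1} = -2\sum_{i=1}^{n}X_{1i}(Y_i - b_0 - b_1 X_{1i} - b_2 X_{2i}) = 0$$
$$\frac{\partial SSE}{\partial b_2} = -2\sum_{i=1}^{n}X_{2i}(Y_i - b_0 - b_1 X_{1i} - b_2 X_{2i}) = 0$$

Solving this system of linear equations (the normal equations) gives the values of $b_0, b_1, b_2$.

With the above example, the normal equations are

$$10 b_0 + 123 b_1 + 65 b_2 = 743$$
$$123 b_0 + 1615 b_1 + 869 b_2 = 9382$$
$$65 b_0 + 869 b_1 + 509 b_2 = 5040$$

whose solution is $b_0 = 47.165$, $b_1 = 1.5990$, $b_2 = 1.1487$, i.e. $\hat{Y} = 47.165 + 1.5990 X_1 + 1.1487 X_2$.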
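The normal equations can be assembled from the column sums and solved numerically. A minimal sketch, assuming NumPy is available; `np.linalg.solve` solves the 3x3 linear system directly.

```python
import numpy as np

# Example data set from the notes
Y = np.array([72, 76, 78, 70, 68, 80, 82, 65, 62, 90], dtype=float)
X1 = np.array([12, 11, 15, 10, 11, 16, 14, 8, 8, 18], dtype=float)
X2 = np.array([5, 8, 6, 5, 3, 9, 12, 4, 3, 10], dtype=float)
n = len(Y)

# Coefficient matrix of the normal equations (one row per partial derivative)
A = np.array([
    [n, X1.sum(), X2.sum()],
    [X1.sum(), (X1 * X1).sum(), (X1 * X2).sum()],
    [X2.sum(), (X1 * X2).sum(), (X2 * X2).sum()],
])
# Right-hand side: sum(Y), sum(X1*Y), sum(X2*Y)
rhs = np.array([Y.sum(), (X1 * Y).sum(), (X2 * Y).sum()])

b0, b1, b2 = np.linalg.solve(A, rhs)
print(f"Y_hat = {b0:.3f} + {b1:.4f} X1 + {b2:.4f} X2")
# -> Y_hat = 47.165 + 1.5990 X1 + 1.1487 X2
```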

The next two questions we are concerned with are:

• Whether this regression model is valid (good enough to explain the data).
• Whether we can exclude some inputs, i.e., simplify the current model while still explaining the data.

To answer the first question we use the F-test of $H_0: \beta_1 = \beta_2 = \dots = \beta_k = 0$ against $H_1$: at least one $\beta_j \neq 0$.

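ANOVA Table for Multiple Regression

Source of Variation   Sum of Squares   Dof         Mean Square             F Ratio
Regression            SSR              k           MSR = SSR/k             F = MSR/MSE
Error                 SSE              n - k - 1   MSE = SSE/(n - k - 1)
Total                 SST              n - 1

where $SST = \sum_{i=1}^{n}(Y_i - \bar{Y})^2$, $SSE = \sum_{i=1}^{n}(Y_i - \hat{Y}_i)^2$ and $SSR = SST - SSE$. The null hypothesis that all input coefficients are zero is rejected at significance level α if $F > F_{\alpha,\,k,\,n-k-1}$.

n: number of data records

k: number of inputs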

In the above example k=2, n=10.
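The ANOVA quantities and the F ratio can be computed directly from the fitted values. This sketch assumes NumPy and uses `np.linalg.lstsq` for the fit; α = 0.05 is an assumed significance level, and the critical value $F_{0.05,2,7} \approx 4.74$ is read from an F table rather than computed.

```python
import numpy as np

# Example data set from the notes
Y = np.array([72, 76, 78, 70, 68, 80, 82, 65, 62, 90], dtype=float)
X1 = np.array([12, 11, 15, 10, 11, 16, 14, 8, 8, 18], dtype=float)
X2 = np.array([5, 8, 6, 5, 3, 9, 12, 4, 3, 10], dtype=float)

# Design matrix: a column of ones for b0, then one column per input
X = np.column_stack([np.ones(len(Y)), X1, X2])
n, p = X.shape
k = p - 1  # number of inputs

b, *_ = np.linalg.lstsq(X, Y, rcond=None)  # least squares estimates b0, b1, b2
Y_hat = X @ b

SST = float(np.sum((Y - Y.mean()) ** 2))  # total variation, dof n - 1
SSE = float(np.sum((Y - Y_hat) ** 2))     # unexplained variation, dof n - k - 1
SSR = SST - SSE                           # variation explained by the model, dof k
MSR = SSR / k
MSE = SSE / (n - k - 1)
F = MSR / MSE

F_crit = 4.74  # F_{0.05, 2, 7} from an F table (assumed alpha = 0.05)
print(f"SSR = {SSR:.2f}, SSE = {SSE:.2f}, SST = {SST:.2f}, F = {F:.2f}")
print("Reject H0: the model is significant" if F > F_crit else "Do not reject H0")
```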


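Another approach: matrix form

The data can be written in matrix form as

$$Y = \begin{pmatrix} Y^{(1)} \\ Y^{(2)} \\ \vdots \\ Y^{(n)} \end{pmatrix},\quad X = \begin{pmatrix} 1 & x_1^{(1)} & \cdots & x_k^{(1)} \\ 1 & x_1^{(2)} & \cdots & x_k^{(2)} \\ \vdots & \vdots & & \vdots \\ 1 & x_1^{(n)} & \cdots & x_k^{(n)} \end{pmatrix},\quad \beta = \begin{pmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_k \end{pmatrix},\quad e = \begin{pmatrix} e^{(1)} \\ e^{(2)} \\ \vdots \\ e^{(n)} \end{pmatrix}$$

where the superscript denotes the index of the record. The response can then be expressed as

$$Y = X\beta + e$$

To find the estimate $B = (b_0, b_1, \dots, b_k)'$ of $\beta$, the following formula is applied:

$$B = (X'X)^{-1} X'Y$$

where X' is the transpose of X.

The covariance matrix of B is given as follows:

$$\operatorname{Cov}(B) = \sigma^2 (X'X)^{-1}$$

where $\sigma^2$, the variance of the noise e, is estimated by $s^2 = MSE = SSE/(n-k-1)$.

The square roots of the main diagonal elements of this matrix (with $\sigma^2$ replaced by $s^2$) are the standard errors of the model parameters.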
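The matrix formulas translate almost line by line into NumPy. A minimal sketch, assuming NumPy; it forms $(X'X)^{-1}$ explicitly to mirror the formulas, although `np.linalg.solve` or `np.linalg.lstsq` is numerically preferable for larger or ill-conditioned problems.

```python
import numpy as np

# Example data set (n = 10 records, k = 2 inputs)
Y = np.array([72, 76, 78, 70, 68, 80, 82, 65, 62, 90], dtype=float)
X1 = np.array([12, 11, 15, 10, 11, 16, 14, 8, 8, 18], dtype=float)
X2 = np.array([5, 8, 6, 5, 3, 9, 12, 4, 3, 10], dtype=float)

# Design matrix X: row i is (1, x1^(i), x2^(i))
X = np.column_stack([np.ones(len(Y)), X1, X2])
n, p = X.shape  # p = k + 1 parameters

XtX_inv = np.linalg.inv(X.T @ X)
B = XtX_inv @ X.T @ Y                        # B = (X'X)^-1 X'Y
residuals = Y - X @ B
s2 = float(residuals @ residuals) / (n - p)  # s^2 = SSE / (n - k - 1) = MSE
cov_B = s2 * XtX_inv                         # estimated covariance matrix of B
std_err = np.sqrt(np.diag(cov_B))            # standard errors of b0, b1, b2

for name, coef, se in zip(["Constant", "X1", "X2"], B, std_err):
    print(f"{name:8s} {coef:9.4f} {se:7.4f}")
```

The coefficients and standard errors agree with the Coef and Stdev columns of the software output for this data set (47.165, 1.5990, 1.1487 and 2.470, 0.2810, 0.3052) up to rounding.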

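ANOVA table for the example

• Ratio of sums of squares: squared error of the proposed model / squared error of the simplest model, i.e. SSE/SST, so that R-sq = 1 - SSE/SST.
• The simplest model assumes that every output value just varies randomly around one average value, so its squared error is $SST = \sum_{i}(Y_i - \bar{Y})^2$.
• SST = SSR (error reduced by the model) + SSE (error remaining after the model).

Source       Dof   SS       MS       F       p
Regression   2     630.54   315.27   86.34   0.000
Error        7     25.56    3.65
Total        9     656.10

(Dof: Regression = k, the number of inputs; Error = n - k - 1; Total = n - 1, the number of data records minus 1.)

R-sq = 96.1%   R-sq(adj) = 95.0%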
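The notes above (SST as the squared error of the simplest, mean-only model; SST = SSR + SSE) and the two R-sq values can be checked numerically. A minimal sketch assuming NumPy; the adjusted formula $1 - \frac{SSE/(n-k-1)}{SST/(n-1)}$ is the standard definition and is assumed to be what the software reports.

```python
import numpy as np

# Example data set from the notes
Y = np.array([72, 76, 78, 70, 68, 80, 82, 65, 62, 90], dtype=float)
X1 = np.array([12, 11, 15, 10, 11, 16, 14, 8, 8, 18], dtype=float)
X2 = np.array([5, 8, 6, 5, 3, 9, 12, 4, 3, 10], dtype=float)
X = np.column_stack([np.ones(len(Y)), X1, X2])
n, k = len(Y), 2

# Simplest model: predict every output by the overall average
SST = float(np.sum((Y - Y.mean()) ** 2))

# Proposed model: least squares fit on X1 and X2
b, *_ = np.linalg.lstsq(X, Y, rcond=None)
SSE = float(np.sum((Y - X @ b) ** 2))
SSR = SST - SSE                                     # error reduced by the model
SSR_direct = float(np.sum((X @ b - Y.mean()) ** 2))  # same value, since SST = SSR + SSE

ratio = SSE / SST  # squared error of proposed model / squared error of simplest model
R2 = 1 - ratio
R2_adj = 1 - (SSE / (n - k - 1)) / (SST / (n - 1))
print(f"R-sq = {100 * R2:.1f}%  R-sq(adj) = {100 * R2_adj:.1f}%")
# -> R-sq = 96.1%  R-sq(adj) = 95.0%
```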


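To answer the second question we use the t-test for each coefficient.

In the t-test for coefficient $\beta_j$ the hypotheses are $H_0: \beta_j = 0$ versus $H_1: \beta_j \neq 0$, with test statistic $t = b_j / \operatorname{se}(b_j)$ (the t-ratio). Running the software on the above data set, we obtain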

Predictor Coef Stdev t-ratio p


Constant 47.165 2.470 19.09 0.000
X1 1.5990 0.2810 5.69 0.000
X2 1.1487 0.3052 3.76 0.007

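To look up the critical value in the t-table, use the degrees of freedom of SSE, n - k - 1 (= 7 here). At α = 0.05, $t_{0.025,7} = 2.365$; all three |t| values exceed it (equivalently, all p-values are below 0.05), so neither input can be excluded.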
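The t-ratio column is each coefficient divided by its standard error. A minimal sketch assuming NumPy; the critical value $t_{0.025,7} = 2.365$ is read from a t table for a two-sided test at an assumed α = 0.05.

```python
import numpy as np

# Example data set from the notes
Y = np.array([72, 76, 78, 70, 68, 80, 82, 65, 62, 90], dtype=float)
X1 = np.array([12, 11, 15, 10, 11, 16, 14, 8, 8, 18], dtype=float)
X2 = np.array([5, 8, 6, 5, 3, 9, 12, 4, 3, 10], dtype=float)
X = np.column_stack([np.ones(len(Y)), X1, X2])
n, p = X.shape

XtX_inv = np.linalg.inv(X.T @ X)
B = XtX_inv @ X.T @ Y
s2 = float(np.sum((Y - X @ B) ** 2)) / (n - p)  # MSE, dof n - k - 1 = 7
std_err = np.sqrt(s2 * np.diag(XtX_inv))
t_ratio = B / std_err

t_crit = 2.365  # t_{0.025, 7} from a t table (assumed alpha = 0.05, two-sided)
for name, t in zip(["Constant", "X1", "X2"], t_ratio):
    decision = "reject H0" if abs(t) > t_crit else "do not reject H0"
    print(f"{name:8s} t = {t:6.2f}  {decision}")
```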

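Finding the prediction interval for the output

Given an input $x = (x_1, \dots, x_k)$, we need an interval for the output Y(x). Let $x_0 = (1, x_1, \dots, x_k)'$ and let $B = (b_0, b_1, \dots, b_k)'$ be the estimated coefficients.

With 100(1-α) percent confidence, Y(x) will lie between

$$x_0'B \pm t_{\alpha/2,\,n-k-1}\; s\,\sqrt{1 + x_0'(X'X)^{-1}x_0}, \qquad s = \sqrt{SSE/(n-k-1)}$$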
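A minimal sketch of the interval above, assuming NumPy. The input $x_1 = 12$, $x_2 = 6$ is a hypothetical value chosen only for illustration, α = 0.05 is assumed, and $t_{0.025,7} = 2.365$ is read from a t table. For comparison, it also computes the narrower confidence interval for the mean response at the same input, which drops the 1 under the square root.

```python
import numpy as np

# Example data set from the notes
Y = np.array([72, 76, 78, 70, 68, 80, 82, 65, 62, 90], dtype=float)
X1 = np.array([12, 11, 15, 10, 11, 16, 14, 8, 8, 18], dtype=float)
X2 = np.array([5, 8, 6, 5, 3, 9, 12, 4, 3, 10], dtype=float)
X = np.column_stack([np.ones(len(Y)), X1, X2])
n, p = X.shape

XtX_inv = np.linalg.inv(X.T @ X)
B = XtX_inv @ X.T @ Y
s = np.sqrt(np.sum((Y - X @ B) ** 2) / (n - p))  # s = sqrt(SSE / (n - k - 1))

x0 = np.array([1.0, 12.0, 6.0])  # hypothetical input X1 = 12, X2 = 6 (leading 1 for b0)
t_crit = 2.365                   # t_{0.025, 7} from a t table (assumed alpha = 0.05)

y0 = float(x0 @ B)                       # point prediction x0'B
q = float(x0 @ XtX_inv @ x0)             # x0'(X'X)^-1 x0
half_pred = t_crit * s * np.sqrt(1 + q)  # half-width for a single future Y(x)
half_mean = t_crit * s * np.sqrt(q)      # half-width for the mean response at x

print(f"point prediction: {y0:.2f}")
print(f"95% prediction interval: ({y0 - half_pred:.2f}, {y0 + half_pred:.2f})")
print(f"95% CI for the mean response: ({y0 - half_mean:.2f}, {y0 + half_mean:.2f})")
```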


Assignments

Question 1: The following data indicate the gain in reading speed versus the number of weeks in
the program of 10 students in a speed-reading program.

weeks 2 3 8 11 4 5 9 7 5 7
Speed 21 42 102 130 52 57 105 85 62 90
1. Plot a scatter diagram to see if a linear relationship is indicated.

2. Find the least squares estimates of the regression coefficients.

3. Estimate the expected gain of a student who plans to take the program for 7 weeks.

Question 2: The following data set presents the heights of 12 male law school classmates whose
law school examination scores were roughly equal. It also gives their first year salaries. Each of
them went into corporate law. The height is in inches and the salary in units of $1,000.

Height 64 65 66 67 69 70 72 72 74 74 75 76
Salary 91 94 88 103 77 96 105 88 122 102 90 114

1. Do the above data establish the hypothesis that a lawyer’s salary is related to his height? Use
the 5 percent level of significance.

2. What was the null hypothesis in part 1?

Question 3: Fit a multiple linear regression equation to the following data set.

X1 X2 X3 X4 Y
1 11 16 4 275
2 10 9 3 183
3 9 4 2 140
4 8 1 1 82
5 7 2 1 97
6 6 1 -1 122
7 5 4 -2 146
8 4 9 -3 246
9 3 16 -4 359
19 2 25 -5 482
Question 4:

1. Fit a multiple linear regression equation to the following data set.

2. Test the hypothesis that β0 = 0.

3. Test the hypothesis that β3 = 0.

4. Test the hypothesis that the mean response at the input levels x1 = x2 = x3 = 1 is 8.5.


X1 X2 X3 Y
7.1 0.68 4 41.53
9.9 0.64 1 63.75
3.6 0.58 1 16.38
9.3 0.21 3 45.54
2.3 0.89 5 15.52
4.6 0.00 8 28.55
0.2 0.37 5 5.65
5.4 0.11 3 25.02
8.2 0.87 4 52.49
7.1 0.00 6 38.05
4.7 0.76 0 30.76
5.4 0.87 8 39.69
1.7 0.52 1 17.59
1.9 0.31 3 13.22
9.2 0.19 5 50.98

Question 5: The cost of producing power per kilowatt hour is a function of the load factor and
the cost of coal in cents per million Btu. The following data were obtained from 12 mills.

Load factor   Coal cost   Power cost
84 14 4.1
81 16 4.4
73 22 5.6
74 24 5.1
67 20 5.0
87 29 5.3
77 26 5.4
76 15 4.8
69 29 6.1
82 24 5.5
90 25 4.7
88 13 3.9
1. Estimate the relationship.

2. Test the hypothesis that the coefficient of the load factor is equal to 0.

3. Determine a 95 percent prediction interval for the power cost when the load factor is 85 and
the coal cost is 20.
