MP1.4 Final Report

Introduction
Regression analysis is a statistical method used to estimate the relationship between a dependent
variable, conventionally plotted on the y-axis, and one or more independent variables, conventionally
plotted on the x-axis (Corporate Finance Institute, 2020). It is a reliable method for identifying which
variables have an impact on the topic of interest, which is the dependent variable. The most commonly
used models are simple linear and multiple linear regression. Simple linear regression evaluates the
relationship between a single independent variable and a dependent variable. A simple linear model can
be expressed as:
Y = a + bX + ε
where,
Y = dependent variable
X = independent variable
a = intercept
b = slope
ε = residual error
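As a minimal sketch of how a and b can be estimated, the usual least-squares formulas can be written in
a few lines of Python. The function name fit_line and the sample points are illustrative assumptions and
are not part of the machine problem.

```python
def fit_line(x, y):
    """Return (a, b) for the least-squares line Y = a + bX."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    # Slope: co-variation of X and Y divided by the variation of X.
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    b = sxy / sxx
    # Intercept: the fitted line passes through the point of means.
    a = y_mean - b * x_mean
    return a, b


# Hypothetical points that lie exactly on Y = 1 + 2X.
a, b = fit_line([0, 1, 2, 3], [1, 3, 5, 7])
print(a, b)
```

Because the sample points lie exactly on a line, the residual error ε is zero for every point in this
example.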
Meanwhile, multiple linear regression deals with the relationship between a dependent variable
and two or more independent variables. This can be expressed mathematically as:
Y = a + bX1 + cX2 + dX3 + ε
where,
Y = dependent variable
X1, X2, X3 = independent variables
a = intercept
b, c, d = slopes
ε = residual error
Furthermore, to judge whether a proposed model is the best possible model, its R² value should be 1 or
as close as possible to 1. R² measures how well the model fits the data and always lies between 0 and 1
(0 to 100%) (Frost, 2017). R² is calculated using the equation:
R² = explained variation / total variation
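The ratio above can be computed directly from the observed and predicted values. The sketch below
assumes a least-squares fit with an intercept, for which the explained variation equals the total
variation minus the residual variation; the function name r_squared and the sample values are
illustrative only.

```python
def r_squared(y, y_pred):
    """R^2 = 1 - (residual variation / total variation)."""
    y_mean = sum(y) / len(y)
    ss_tot = sum((yi - y_mean) ** 2 for yi in y)               # total variation
    ss_res = sum((yi - pi) ** 2 for yi, pi in zip(y, y_pred))  # unexplained variation
    return 1 - ss_res / ss_tot


y = [2.0, 4.0, 6.0, 8.0]
print(r_squared(y, y))          # predictions equal to the data
print(r_squared(y, [5.0] * 4))  # predicting the mean of y everywhere
```

The two calls show the extremes: predictions that reproduce the data give an R² of 1, and predicting
only the mean gives an R² of 0.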
Results
The machine problem was divided into four parts: the first two covered linear regression and the last
two covered multiple regression. Problems 1 and 2 were linear regression problems solved using MS Excel
and Polymath, respectively. The given data for problems 1 and 2 are the following:
x, mole percentage    y, yield
20                    73
20                    78
30                    85
40                    90
40                    91
50                    87
50                    86
50                    91
60                    75
70                    65
Table 1. Parameters for Problems 1 and 2.
For problem 1, the data shown in Table 1 were fitted to several models: linear, quadratic (order = 2),
cubic (order = 3), quartic (order = 4), and quintic (order = 5). The quintic model was the best fit for
the data. This was concluded from the R² values: the model with the R² value closest to 1 is the most
fitting. The equation obtained from the quintic model is
y = 1E-6x^5 - 0.0002x^4 + 0.0172x^3 - 0.6165x^2 + 11.869x - 19.
Figure 1. Spreadsheet for Problem 1.
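The comparison of polynomial orders can be reproduced outside the spreadsheet. The sketch below fits
orders 1 to 5 to the Table 1 data by least squares and computes R² for each. It is not the Excel
trendline procedure itself: the helper names solve and poly_r2 are assumptions, and x is rescaled to
[-1, 1] before fitting, which changes the coefficients but not the fitted curve or its R².

```python
# Table 1 data: x = mole percentage, y = yield.
x = [20, 20, 30, 40, 40, 50, 50, 50, 60, 70]
y = [73, 78, 85, 90, 91, 87, 86, 91, 75, 65]


def solve(A, b):
    """Solve the square system A s = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= factor * M[col][k]
    sol = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(M[i][j] * sol[j] for j in range(i + 1, n))
        sol[i] = (M[i][n] - tail) / M[i][i]
    return sol


def poly_r2(x, y, degree):
    """Fit a polynomial of the given degree by least squares and return its R^2."""
    lo, hi = min(x), max(x)
    t = [(2 * xi - lo - hi) / (hi - lo) for xi in x]  # rescaled x in [-1, 1]
    V = [[ti ** k for k in range(degree + 1)] for ti in t]
    m = degree + 1
    # Normal equations: (V^T V) coef = V^T y
    A = [[sum(row[i] * row[j] for row in V) for j in range(m)] for i in range(m)]
    rhs = [sum(row[i] * yi for row, yi in zip(V, y)) for i in range(m)]
    coef = solve(A, rhs)
    pred = [sum(ck * vk for ck, vk in zip(coef, row)) for row in V]
    y_mean = sum(y) / len(y)
    ss_tot = sum((yi - y_mean) ** 2 for yi in y)
    ss_res = sum((yi - pi) ** 2 for yi, pi in zip(y, pred))
    return 1 - ss_res / ss_tot


r2_by_degree = {d: poly_r2(x, y, d) for d in range(1, 6)}
for d, r2 in r2_by_degree.items():
    print(d, round(r2, 4))
```

R² cannot decrease as the order increases, because each lower-order polynomial is a special case of the
next one.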
For problem 2, Polymath was used to find the best model for the data in Table 1. Polymath was easier and
quicker to use than MS Excel. The equation obtained for problem 2 is the same as that for problem 1,
except for the ordering of its terms and the number of significant figures. This shows that, for linear
regression problems, MS Excel and Polymath produce the same outcome.
For problem 3, MS Excel was used to fit the given data to a multiple linear model. In this data set, the
independent variables are Wpc1, Wpc2, Wpc3, and Wpc4, while the dependent variable is hard_heat. This
means that all the weight-percent columns were selected as the X range and the values under hard_heat
were selected as the Y range. Excel's Data Analysis tool was used to determine the best model.
Wpc1  Wpc2  Wpc3  Wpc4  hard_heat
7     26    6     60    78.7
1     29    15    52    74.3
11    56    8     20    104.3
11    31    8     47    87.6
7     52    6     33    95.9
11    55    9     22    109.2
3     71    17    6     102.7
1     31    22    44    72.5
2     54    18    22    93.1
21    47    4     26    115.9
1     40    23    34    83.8
11    66    9     12    113.3
10    68    8     12    109.4
Table 2. Given Data for Problem 3.
The model is hard_heat = 24.25183 · Wpc1^0.078458 · Wpc2^0.315851 · Wpc3^0.004391 · Wpc4^0.005101. This
was accepted because the R² value was 0.947, which is still very close to 1. Since the R² value was
acceptable, the intercept and the coefficients of X variables 1, 2, 3, and 4 from the regression output
were used to complete the model for hard_heat. Because the application used was MS Excel, the final
form of the model depends on how the user chooses to set it up.
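One way to arrive at a product-of-powers model of this kind is to take the natural logarithm of every
variable and run an ordinary multiple linear regression on the logged values. The report does not state
that this is exactly what was done in Excel, so the sketch below is only an assumption about the
procedure; the helper names solve and least_squares are also illustrative.

```python
import math

# Table 2 data: (Wpc1, Wpc2, Wpc3, Wpc4, hard_heat).
data = [
    (7, 26, 6, 60, 78.7), (1, 29, 15, 52, 74.3), (11, 56, 8, 20, 104.3),
    (11, 31, 8, 47, 87.6), (7, 52, 6, 33, 95.9), (11, 55, 9, 22, 109.2),
    (3, 71, 17, 6, 102.7), (1, 31, 22, 44, 72.5), (2, 54, 18, 22, 93.1),
    (21, 47, 4, 26, 115.9), (1, 40, 23, 34, 83.8), (11, 66, 9, 12, 113.3),
    (10, 68, 8, 12, 109.4),
]


def solve(A, b):
    """Solve the square system A s = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= factor * M[col][k]
    sol = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(M[i][j] * sol[j] for j in range(i + 1, n))
        sol[i] = (M[i][n] - tail) / M[i][i]
    return sol


def least_squares(X, y):
    """Solve the normal equations (X^T X) coef = X^T y."""
    m = len(X[0])
    A = [[sum(row[i] * row[j] for row in X) for j in range(m)] for i in range(m)]
    rhs = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(m)]
    return solve(A, rhs)


# ln(hard_heat) = ln(k) + e1*ln(Wpc1) + e2*ln(Wpc2) + e3*ln(Wpc3) + e4*ln(Wpc4)
X = [[1.0] + [math.log(v) for v in row[:4]] for row in data]
ln_y = [math.log(row[4]) for row in data]
coef = least_squares(X, ln_y)
k = math.exp(coef[0])   # multiplicative constant of the power-law model
exponents = coef[1:]    # one exponent per Wpc variable

# R^2 evaluated on the logged values, which is where the fit is linear.
pred = [sum(c * v for c, v in zip(coef, row)) for row in X]
mean_ln_y = sum(ln_y) / len(ln_y)
ss_tot = sum((v - mean_ln_y) ** 2 for v in ln_y)
ss_res = sum((v - p) ** 2 for v, p in zip(ln_y, pred))
r2_log = 1 - ss_res / ss_tot
print(round(k, 4), [round(e, 4) for e in exponents], round(r2_log, 3))
```

The R² printed here refers to the logged variables, so it need not equal an R² computed on hard_heat
itself.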
Lastly, for problem 4, Polymath was used to find the model for the given data. As in problem 3, the
model was obtained using multiple linear regression. In Table 3, the independent variables are PA and
PB, while the dependent variable is rate.
PA PB Rate
0.1044 0.1036 0.5051
0.1049 0.2871 0.6302
0.1030 0.5051 0.6342
0.2582 0.507 1.3155
0.2608 0.3100 1.5663
0.2407 0.4669 1.5981
0.3501 0.0922 1.6217
0.3437 0.1944 1.8976
0.3494 0.5389 2.1780
0.4778 0.1017 2.1313
0.4880 0.2580 2.7227
0.5014 0.5037 3.1632
Table 3. Given Data for Problem 4.
The model for this problem is rate = -0.3039 + 5.481PA + 1.028PB. This model was accepted because the
R² value was 0.957, which is close enough to 1 to be acceptable. Polymath reports the model together
with the values of its coefficients, so the user only needs to substitute the solved values for the
corresponding variables.
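The Polymath result can be cross-checked with a short least-squares calculation on the Table 3 data.
The sketch below solves the normal equations for rate = a + b·PA + c·PB. It is not Polymath's own
algorithm, and the helper names solve and fit_linear are assumptions.

```python
# Table 3 data: (PA, PB, rate).
data = [
    (0.1044, 0.1036, 0.5051), (0.1049, 0.2871, 0.6302), (0.1030, 0.5051, 0.6342),
    (0.2582, 0.507, 1.3155), (0.2608, 0.3100, 1.5663), (0.2407, 0.4669, 1.5981),
    (0.3501, 0.0922, 1.6217), (0.3437, 0.1944, 1.8976), (0.3494, 0.5389, 2.1780),
    (0.4778, 0.1017, 2.1313), (0.4880, 0.2580, 2.7227), (0.5014, 0.5037, 3.1632),
]


def solve(A, b):
    """Solve the square system A s = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= factor * M[col][k]
    sol = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(M[i][j] * sol[j] for j in range(i + 1, n))
        sol[i] = (M[i][n] - tail) / M[i][i]
    return sol


def fit_linear(rows, y):
    """Least-squares coefficients [a, b, c, ...] for y = a + b*x1 + c*x2 + ..."""
    X = [[1.0] + list(r) for r in rows]  # leading 1.0 carries the intercept
    m = len(X[0])
    A = [[sum(row[i] * row[j] for row in X) for j in range(m)] for i in range(m)]
    rhs = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(m)]
    return solve(A, rhs)


rows = [(pa, pb) for pa, pb, _ in data]
rate = [r for _, _, r in data]
a, b, c = fit_linear(rows, rate)

pred = [a + b * pa + c * pb for pa, pb in rows]
mean_rate = sum(rate) / len(rate)
ss_tot = sum((r - mean_rate) ** 2 for r in rate)
ss_res = sum((r - p) ** 2 for r, p in zip(rate, pred))
r2 = 1 - ss_res / ss_tot
print(round(a, 4), round(b, 3), round(c, 3), round(r2, 3))
```

The coefficients and R² computed this way should agree with the reported model to within rounding.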

Discussion
1. What was the basis of the choice of the best model in problem number 1? Explain in terms of variance.
The basis for choosing the best model in problem 1 is the R² value. The ideal R² value is 1, so the
model with the R² value closest to 1 was chosen. In terms of variance, R² is the fraction of the total
variation in the yield that is explained by the model. In problem 1, the model that produced the R²
value closest to 1 was the quintic, or 5th-order polynomial, model. Its R² value was 0.962, compared
with 0.0345, 0.9364, 0.9402, and 0.9598 for the linear, quadratic, cubic, and quartic models,
respectively, so the quintic model explains about 96.2% of the variation in the yield.
2. What does the r² value stand for? Why is r² used, instead of r?
R² measures how well the regression model fits the data, that is, the strength of the relationship
between the model and the dependent variable. It lies between 0 and 1 (0 to 100%), which is why the
desired R² value is the one closest to 1. R² is used instead of R because R² shows the percentage of
the variation in the dependent variable that is explained by all the independent variables together.
Meanwhile, R only represents the correlation between a single independent variable and a dependent
variable. It can therefore be concluded that R² is the more useful measure of fit.
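For a straight-line fit with one independent variable, the two quantities are directly related: the
square of the correlation coefficient r equals the R² of the fitted line. The sketch below checks this
on the Table 1 data; the variable names are illustrative assumptions.

```python
import math

# Table 1 data: x = mole percentage, y = yield.
x = [20, 20, 30, 40, 40, 50, 50, 50, 60, 70]
y = [73, 78, 85, 90, 91, 87, 86, 91, 75, 65]

n = len(x)
x_mean = sum(x) / n
y_mean = sum(y) / n
sxx = sum((xi - x_mean) ** 2 for xi in x)
syy = sum((yi - y_mean) ** 2 for yi in y)
sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))

# Correlation coefficient between x and y.
r = sxy / math.sqrt(sxx * syy)

# R^2 of the least-squares line y = a + b*x.
b = sxy / sxx
a = y_mean - b * x_mean
ss_res = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
r2 = 1 - ss_res / syy

print(round(r, 4), round(r ** 2, 4), round(r2, 4))
```

With several independent variables there is no single r between x and y, while R² is still defined for
the model as a whole.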
3. Discuss the similarities / differences of the models produced using different software packages.
The software packages used for the machine problem are MS Excel and Polymath. Both can identify the
most appropriate model for a given data set. For both linear regression and multiple regression,
Polymath provides the model more quickly and easily. In MS Excel, linear regression requires trial and
error, and for multiple regression the model produced is an exponential equation that the user must
also set up themselves. Polymath, in contrast, provides its own model, and for multiple regression the
given model is a linear equation. However, for linear regression, trial and error is still needed to
find the right fit, which depends on the R² values.
Conclusion
The machine problem was successful because all the objectives of the activity were achieved. The
objectives were to solve curve-fitting problems using MS Excel and Polymath and to determine the best
model for a given set of data. The student was able to solve the problems and determine the best model
using both software packages. MS Excel was used for problems 1 and 3, while Polymath was used for
problems 2 and 4. Each package was used to determine a model for linear regression data and for
multiple regression data. For linear regression, the same data were given to both Polymath and Excel,
and the two applications produced the same outcome. However, for multiple regression, the outcomes of
Polymath and Excel differed from each other. This could be because Polymath uses its own model, while
in Excel the user provides the initial model. Also, the final model from Excel is exponential, while
the one from Polymath is linear. As for which software is easier and quicker to use, the student
concluded that Polymath is better than Excel for solving curve-fitting problems and determining the
best model for a given data set.
References
Corporate Finance Institute. (2020, February 11). Regression Analysis. Corporate Finance Institute.
https://corporatefinanceinstitute.com/resources/knowledge/finance/regression-analysis/
Frost, J. (2017, April 16). How To Interpret R-squared in Regression Analysis. Statistics by Jim.
https://statisticsbyjim.com/regression/interpret-r-squared-regression/
R vs R Squared | Learn Top 8 Key difference with Comparision Table. (2019, December 23).
EDUCBA. https://www.educba.com/r-vs-r-squared/

Figure 4. Spreadsheet for Problem 4.
