Assignment Responsion 08 Linear Regression Line: By: Panji Indra Wadharta 03411640000037


ASSIGNMENT RESPONSION 08

LINEAR REGRESSION LINE

By:
PANJI INDRA WADHARTA
03411640000037

DEPARTMENT OF GEOPHYSICS ENGINEERING


FACULTY OF CIVIL, ENVIRONMENTAL AND GEO ENGINEERING
INSTITUT TEKNOLOGI SEPULUH NOPEMBER
SURABAYA
2019
ABSTRACT
This report presents an application of MATLAB to mathematical analysis. The
objective is to apply a numerical method to an example that shows the appeal of
MATLAB as a programming tool. The least-squares method is used to construct the
linear regression line, and a MATLAB script was written to solve the problem.
Based on the experiment, the script listed in Section 4.1 can be used to construct
the linear regression line of a data set. The original run reported an error of
4.5475e-13; evaluating $SSE = SS_{yy} - \beta_1 SS_{xy}$ on the data in Table 3.1
gives about 4.62.

Keywords: Linear, MATLAB Code, Regression


CHAPTER I
INTRODUCTION

1.1 Background
Numerical methods are techniques by which mathematical problems are
formulated so that they can be solved with arithmetic and logical operations.
Because digital computers excel at performing such operations, numerical
methods are sometimes referred to as computer mathematics. Under the traditional
teaching mode, complex equations take a long time to solve by hand and require a
calculator to obtain a precise answer, and some students find them hard to follow.
As a result, numerical courses tend to be seen as abstract, difficult to
understand and uninteresting.
MATLAB is a numerical computing and graphics software package from
MathWorks, launched in 1984, and is now one of the most widely used
scientific computing packages in the world. It offers powerful scientific
computing and data visualization functions and integrates a variety of
toolboxes for different fields. Simulation in MATLAB is therefore an effective
way to address the problems above. The objective of this report is to construct
the linear regression line with MATLAB code written by the author. A description
of the code and its results is given in the sections below.

1.2 Scope of Problem


From the background, the problems addressed in this report are how to
construct the linear regression line in MATLAB and how to reduce the value of
the error estimate.

1.3 Objective
The objectives are to write the MATLAB code and to minimize the error
estimate with MATLAB.
CHAPTER II
LITERATURE

2.1 Goodness of Fit of a Straight Line to Data


Once the scatter diagram of the data has been drawn and the model
assumptions have been at least visually verified (and perhaps the correlation
coefficient r computed to quantitatively verify the linear trend), the next step
in the analysis is to find the straight line that best fits the data.
To each point in the data set there is associated an “error,” the positive or
negative vertical distance from the point to the line: positive if the point is above
the line and negative if it is below the line. The error can be computed as the
actual y-value of the point minus the y-value $\hat{y}$ that is “predicted” by
inserting the x-value of the data point into the formula for the line:
error at data point $(x, y)$ = (true $y$) − (predicted $y$) = $y - \hat{y}$
A first thought for a measure of the goodness of fit of the line to the data
would be simply to add the errors at every point, but this cannot work well in
general. A line that does not fit the data perfectly can still have a sum of
errors equal to zero, because positive and negative errors cancel. Instead,
goodness of fit is measured by the sum of the squares of the errors. Squaring
eliminates the minus signs, so no cancellation can occur.

The goodness of fit of a line $\hat{y} = mx + b$ to a set of $n$ pairs $(x, y)$ of
numbers in a sample is the sum of the squared errors:

$\sum (y - \hat{y})^2$

(n terms in the sum, one for each data pair).
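
The cancellation described above can be illustrated with a minimal MATLAB sketch. The five data pairs and the line $\hat{y} = 0.5x - 1$ are assumed values chosen for illustration only; they are not the data of this report, and the variable names are hypothetical.

```matlab
% Illustrative data and line (assumed values, not taken from this report)
x = [2 2 6 8 10];
y = [0 1 2 3 3];
yhat = 0.5*x - 1;        % predicted y-values on the line yhat = 0.5x - 1

err = y - yhat;          % signed error at each data point
sumErr = sum(err);       % positive and negative errors cancel
sse = sum(err.^2);       % squaring removes the signs before adding

fprintf('sum of errors = %g, sum of squared errors = %g\n', sumErr, sse);
```

The line misses two of the five points, yet the plain sum of errors is 0, while the sum of squared errors is 2.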

2.2 The Least Squares Regression Line


Given any collection of pairs of numbers (except when all the x-values are
the same) and the corresponding scatter diagram, there always exists exactly one
straight line that fits the data better than any other, in the sense of minimizing the
sum of the squared errors. It is called the least squares regression line. Moreover,
there are formulas for its slope and y-intercept.
Its slope $\beta_1$ and y-intercept $\beta_0$ are computed using the formulas
$\beta_1 = \frac{SS_{xy}}{SS_{xx}}$ and $\beta_0 = \bar{y} - \beta_1 \bar{x}$
where:
$SS_{xx} = \sum x^2 - \frac{1}{n}\left(\sum x\right)^2$, $SS_{xy} = \sum xy - \frac{1}{n}\left(\sum x\right)\left(\sum y\right)$,
$\bar{x}$ is the mean of all the x-values, $\bar{y}$ is the mean of all the y-values,
and $n$ is the number of pairs in the data set.
The equation $\hat{y} = \beta_1 x + \beta_0$ specifying the least squares regression
line is called the least squares regression equation.
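
The formulas above can be sketched in MATLAB as follows, using the data of Table 3.1. This is a minimal sketch, not the script of Section 4.1. The variable names are illustrative, and the comparison with the built-in `polyfit` is an added cross-check that does not appear in the original report.

```matlab
% Data from Table 3.1 of this report
x = [28.7 24.8 26 30.5 23.8 24.6 23.8 20.4 21.6 22.1];
y = [2 3 3 3 4 4 5 5 5 6];
n = numel(x);

% Sums of squares as defined in Section 2.2
SSxx = sum(x.^2) - (1/n)*sum(x)^2;
SSxy = sum(x.*y) - (1/n)*sum(x)*sum(y);

% Slope and intercept of the least squares regression line
b1 = SSxy/SSxx;
b0 = mean(y) - b1*mean(x);

% Cross-check: polyfit with degree 1 returns [slope intercept]
p = polyfit(x, y, 1);
fprintf('formulas: b1 = %.4f, b0 = %.4f\n', b1, b0);
fprintf('polyfit : b1 = %.4f, b0 = %.4f\n', p(1), p(2));
```

Note that the squares and products are taken element by element (`x.^2`, `x.*y`) before summing; squaring the sum instead gives a different quantity.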

2.3 The Sum of the Squared Errors (SSE)


In general, in order to measure the goodness of fit of a line to a set of data,
we must compute the predicted y-value $\hat{y}$ at every point in the data set,
compute each error, square it, and then add up all the squares. In the case of the
least squares regression line, however, the line that best fits the data, the sum of
the squared errors can be computed directly from the data using the following
formula.
$SSE = SS_{yy} - \beta_1 SS_{xy}$
where $SS_{yy} = \sum y^2 - \frac{1}{n}\left(\sum y\right)^2$.
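
As a hedged check of the shortcut formula, the sketch below computes SSE both ways for the data of Table 3.1: point by point from the predicted values, and directly from the sums of squares. The variable names `sseDirect` and `sseShortcut` are illustrative.

```matlab
% Data from Table 3.1 of this report
x = [28.7 24.8 26 30.5 23.8 24.6 23.8 20.4 21.6 22.1];
y = [2 3 3 3 4 4 5 5 5 6];
n = numel(x);

% Sums of squares
SSxx = sum(x.^2) - (1/n)*sum(x)^2;
SSxy = sum(x.*y) - (1/n)*sum(x)*sum(y);
SSyy = sum(y.^2) - (1/n)*sum(y)^2;

% Least squares regression line
b1 = SSxy/SSxx;
b0 = mean(y) - b1*mean(x);

% SSE computed point by point: predict, subtract, square, add
sseDirect = sum((y - (b1*x + b0)).^2);

% SSE computed with the shortcut formula
sseShortcut = SSyy - b1*SSxy;

fprintf('direct = %.6f, shortcut = %.6f\n', sseDirect, sseShortcut);
```

The two values agree up to rounding. The shortcut holds only for the least squares line; for any other line, SSE has to be computed point by point.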
CHAPTER III
METHODOLOGY

3.1 Data Information

The data used in this research are data on the age and value of used
automobiles (Table 3.1). The sums of x and y and the products xy and x² were
then computed in Excel (Table 3.2).
Table 3.1. Data on age and value of used automobiles
Age (x)    Value (y)
28.7       2
24.8       3
26         3
30.5       3
23.8       4
24.6       4
23.8       5
20.4       5
21.6       5
22.1       6

Table 3.2. Data statistics calculated in Excel

Age (x)    Value (y)    x·y      x²
28.7       2            57.4     823.69
24.8       3            74.4     615.04
26         3            78       676
30.5       3            91.5     930.25
23.8       4            95.2     566.44
24.6       4            98.4     605.16
23.8       5            119      566.44
20.4       5            102      416.16
21.6       5            108      466.56
22.1       6            132.6    488.41
Sum of x = 246.3; sum of y = 40
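
The Excel column sums can also be reproduced in MATLAB. The sketch below is an alternative to the spreadsheet step, not part of the original workflow, and the variable names are illustrative.

```matlab
% Data from Table 3.1 of this report
x = [28.7 24.8 26 30.5 23.8 24.6 23.8 20.4 21.6 22.1];
y = [2 3 3 3 4 4 5 5 5 6];

sumX  = sum(x);        % sum of the x column
sumY  = sum(y);        % sum of the y column
sumXY = sum(x.*y);     % sum of the x*y column (element-wise product)
sumX2 = sum(x.^2);     % sum of the x^2 column (element-wise square)

fprintf('sum x = %.1f, sum y = %d\n', sumX, sumY);
fprintf('sum xy = %.1f, sum x^2 = %.2f\n', sumXY, sumX2);
```

For these data the sums of the last two columns are 956.5 and 6154.15, which are the quantities the formulas of Section 2.2 require.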

3.2 Flowchart
The flowchart of the algorithm used in the MATLAB code is shown in Figure 3.1.

Figure 3.1. Flowchart for Matlab Code


CHAPTER IV
RESULT AND DISCUSSION
4.1 MATLAB Code
% Least-squares regression line for the data in Table 3.1

clear
clc

% Data set of age (x) and value (y) of used automobiles
x = [28.7 24.8 26 30.5 23.8 24.6 23.8 20.4 21.6 22.1];
y = [2 3 3 3 4 4 5 5 5 6];
n = length(x);

% Calculate SSxx, SSxy and SSyy:
%   SSxx = sum(x^2) - (1/n)*(sum(x))^2
%   SSxy = sum(x*y) - (1/n)*(sum(x))*(sum(y))
%   SSyy = sum(y^2) - (1/n)*(sum(y))^2
% The squares and products are taken element-wise before summing.
ssxx = sum(x.^2) - (1/n)*sum(x)^2;
ssxy = sum(x.*y) - (1/n)*sum(x)*sum(y);
ssyy = sum(y.^2) - (1/n)*sum(y)^2;

% Calculate the mean of x and y
x_ = mean(x);
y_ = mean(y);

% Calculate the parameters b1 (slope) and b0 (intercept) of the regression line
b1 = ssxy/ssxx;
b0 = y_ - b1*x_;

% Evaluate the regression line y = b1*x + b0 at the data points
Y = b1*x + b0;

% Plot the data and the regression line
plot(x, y, '.', x, Y, 'MarkerSize', 50)
title('Least-squares linear regression')

% Calculate the error (SSE); no semicolon, so the value is printed.
% Note: the name 'error' shadows the MATLAB built-in function error.
error = ssyy - b1*ssxy

4.2 Result
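The script prints the sum of the squared errors, $SSE = SS_{yy} - \beta_1 SS_{xy}$,
in the command window as the variable ‘error’. The run reported here gave
4.5475e-13. An SSE this close to zero would mean that the line passes through
every data point, which the scatter of the data rules out. A value of this size is
floating-point rounding residue: it appears when $(\sum x)^2$ is used in place of
$\sum x^2$ (and likewise for $\sum xy$ and $\sum y^2$), because the expression then
cancels algebraically to zero. Evaluating the formulas of Chapter II on the data in
Table 3.1 gives $SS_{xx} = 87.781$, $SS_{xy} = -28.7$ and $SS_{yy} = 14$, hence
$\beta_1 \approx -0.327$, $\beta_0 \approx 12.05$ and $SSE \approx 4.62$. The slope is
negative, so the regression line falls to the right, and the correlation
coefficient $r = SS_{xy}/\sqrt{SS_{xx} SS_{yy}} \approx -0.82$ indicates a fairly strong
negative linear relationship between the two variables. The plot of the
regression line and the data is: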
Figure 4.1. Plot Regression Line and Scatter Plot Data

Figure 4.2. Command Window Result
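
The strength of the linear relationship discussed in the result can be quantified with the correlation coefficient r mentioned in Section 2.1. The sketch below is an added check rather than part of the original script. It assumes the standard relation $r = SS_{xy}/\sqrt{SS_{xx} SS_{yy}}$, which is not stated in Chapter II, and compares it with the built-in `corrcoef`.

```matlab
% Data from Table 3.1 of this report
x = [28.7 24.8 26 30.5 23.8 24.6 23.8 20.4 21.6 22.1];
y = [2 3 3 3 4 4 5 5 5 6];
n = numel(x);

% Sums of squares
SSxx = sum(x.^2) - (1/n)*sum(x)^2;
SSxy = sum(x.*y) - (1/n)*sum(x)*sum(y);
SSyy = sum(y.^2) - (1/n)*sum(y)^2;

% Correlation coefficient from the sums of squares (assumed standard formula)
r = SSxy/sqrt(SSxx*SSyy);

% Cross-check: corrcoef returns a 2-by-2 matrix; the off-diagonal entry is r
R = corrcoef(x, y);
fprintf('r (formula) = %.4f, r (corrcoef) = %.4f\n', r, R(1,2));
```

The sign of r always matches the sign of the slope, since both share the numerator $SS_{xy}$.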


CHAPTER V
CONCLUSION
Based on the experiment, the script listed in Section 4.1 can be used to
construct the linear regression line of a data set. The original run reported an
error of 4.5475e-13; evaluating $SSE = SS_{yy} - \beta_1 SS_{xy}$ on the data in
Table 3.1 gives about 4.62.
REFERENCES
https://saylordotorg.github.io/text_introductory-statistics/s14-04-the-least-squares-regression-l.html, accessed on 19 May 2019, 04:56

Cohen, J., Cohen, P., West, S.G., & Aiken, L.S. (2003). Applied multiple
regression/correlation analysis for the behavioral sciences (2nd ed.). Hillsdale,
NJ: Lawrence Erlbaum Associates.
