Lin Reg Introanim

You might also like

Download as ppt, pdf, or txt
Download as ppt, pdf, or txt
You are on page 1of 15

Linear Regression

Least Squares Method:


an introduction
We are given the following ordered pairs:
(1.2,1), (1.3,1.6), (1.7,2.7), (2,2),
(3,1.8), (3,3), (3.8,3.3), (4,4.2). They
are shown in the scatterplot below:
If we draw a line, not the best line,
necessarily, but a line, as shown, we can
begin to consider how well it fits the
data. From each data point, we construct
a vertical line segment to the line.
This distance gives us an indication of
Squaringthe
the error, the difference between this
error, which may
predicted and actual y values.
be positive or
negative, gives
all positive
values, an
advantage in
finding a total.
The sum of the
squares gives us
a measure of the
scatter of the
We try drawing another line, this time a
horizontal line is shown. The squares are
still fairly large.
This line seems like a better fit. It
has a positive slope and an intercept
that seems reasonable. The total sum
of the squares is less than that for
the two previous lines.
Wow, this line with a negative slope
does not fit so well. The sum of the
squares will be very large. This would
make a very poor model.
Again, this line looks much better. This
is clearly a better model than some of
the earlier attempts.
This line does not look as good as the
purple or pink ones.
This line has larger squares than some
of the others. This is not the best
model.
This is the line based on calculations.
This is very similar to the
ŷ = .4967 purple
+ 0.7813 x one.The
equation is .
(The graphs are just approximations, and
are not exact.)
This shows the approximate sums of the
squares of the previous examples. The
smaller this quantity, the better the
model. Fortunately, we have a technique
that allows us to go straight to the
equation without all the guesswork.
Each colored
square
represents the
total area of
the squares for
an earlier
example. The
yellow was the
worst of the
proposals, and
the dark green
Below are the standardized coordinates.
All ordered pairs (x,y) are now
represented (Zx, Zy ). The best fit line
will pass through the origin.
Remember, to
standardize
is to
calculate a
z-score.
x −x
Z=
sx
The line drawn is the best fit line.
The difference between the data point and
the line is shown.
In order to find the best fit line we want
to minimize the quantity

2 This is the
Area =
∑( Zy −Ẑy) standardized sum of
n −1 the squares of the
differences, divided
by degrees of freedom
to adjust for sample
size.
The equation of the best fit line is
ŷ = a + bx wher
e
sy
b=r and a = y−bx
sx

This means that the equation can be


found if you have the means and
standard deviations for both x and y,
even without knowing all of the data
values.
We usually make use of technology to
carry out these calculations, and
formulas are always provided, but do
know how to use the formulas.

You might also like