Linear Regression: Method of Least Squares
Obviously all the attributes and the outcome must be numerical for this model to make
any sense.
In the case where we have only one attribute, linear regression corresponds to the idea
of a ‘line of best fit’ in the form:

  y = β₀ + β₁x

where β₀ is the intercept and β₁ is the slope.
For example, if we were running a restaurant that was reservation only, we might want to
create a model of how many people who make reservations actually show up. We could use this
model to predict how much business we were likely going to do, and make plans based on that.
In this case, our attribute, x, would be the number of reservations and our outcome, y, would be
the number of meals that are actually served.
In other words, we are finding values β₀ and β₁ such that the sum of the squared differences
between the observed data and what the model predicts is minimized (thus the method of least squares).
The closer our model matches the data, the smaller the difference will be, and thus the smaller
this expression will be. For example, if our data happens to form a perfectly straight line, then
our model will fit the data exactly and the expression will be 0.
Putting it into the form of a function of our values β₀ and β₁, we have

  F(β₀, β₁) = Σ (yᵢ − β₀ − β₁xᵢ)²

where the sum runs over the n data points (xᵢ, yᵢ).
To find values which minimize this function, we have to find values such that the following
holds:

  ∂F/∂β₀ = 0 and ∂F/∂β₁ = 0
Example
Suppose we have two data points: (10, 20) and (11, 22). Then our function, F, will be the
following:

  F(β₀, β₁) = (20 − β₀ − 10β₁)² + (22 − β₀ − 11β₁)²

Taking the partial derivatives gives

  ∂F/∂β₀ = −2(20 − β₀ − 10β₁) − 2(22 − β₀ − 11β₁)
  ∂F/∂β₁ = −20(20 − β₀ − 10β₁) − 22(22 − β₀ − 11β₁)

Combining terms and setting each equal to 0, we have the system of equations we have to
solve for:

  2β₀ + 21β₁ = 42
  21β₀ + 221β₁ = 442

whose solution is β₀ = 0 and β₁ = 2; that is, the line y = 2x, which passes exactly through
both points.
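As a quick sanity check of the worked example, the 2×2 system for these two points (2β₀ + 21β₁ = 42 and 21β₀ + 221β₁ = 442) can be solved with Cramer's rule in a few lines of pure Python; the helper name solve_2x2 is ours, not part of the text:

```python
# Verify the worked example: solve the 2x2 normal-equation system
#   2*b0 + 21*b1 = 42
#   21*b0 + 221*b1 = 442
# using Cramer's rule (no libraries needed).

def solve_2x2(a, b, c, d, e, f):
    """Solve the system a*x + b*y = e, c*x + d*y = f for (x, y)."""
    det = a * d - b * c  # determinant of the coefficient matrix
    return (e * d - b * f) / det, (a * f - e * c) / det

b0, b1 = solve_2x2(2, 21, 21, 221, 42, 442)
print(b0, b1)  # -> 0.0 2.0, i.e. the line y = 2x
```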
But what about the general case? We start by taking the partial derivatives of our function:

  ∂F/∂β₀ = −2 Σ (yᵢ − β₀ − β₁xᵢ)
  ∂F/∂β₁ = −2 Σ xᵢ(yᵢ − β₀ − β₁xᵢ)
Let’s look at the first equation first. To begin we set it to 0 and divide out by −2 on both
sides to get the following:

  Σ (yᵢ − β₀ − β₁xᵢ) = 0
Then we break up the sum and move the β₀ and β₁ terms to the left side. β₁ is a constant so we
also move it outside the sum:

  Σ β₀ + β₁ Σ xᵢ = Σ yᵢ

The β₀ sum is just β₀ added to itself n times, so we have

  nβ₀ + β₁ Σ xᵢ = Σ yᵢ
For the xᵢ and yᵢ sums we can observe the following:

  Σ xᵢ = n x̄ and Σ yᵢ = n ȳ

where x̄ and ȳ are the means of the xᵢ and yᵢ. Substituting these in and dividing by n gives
β₀ in terms of β₁:

  β₀ = ȳ − β₁x̄
Now for the second equation. Again setting it to 0 and dividing by −2, we have:

  Σ xᵢ(yᵢ − β₀ − β₁xᵢ) = 0

We distribute the xᵢ, move the β₀ and β₁ terms to the left side, and factor out the constants
β₀ and β₁:

  β₀ Σ xᵢ + β₁ Σ xᵢ² = Σ xᵢyᵢ

Then we substitute β₀ = ȳ − β₁x̄ and Σ xᵢ = n x̄, move everything not multiplied by β₁ over to
the right, and factor out β₁, and we finally get β₁ by itself:

  β₁ (Σ xᵢ² − n x̄²) = Σ xᵢyᵢ − n x̄ȳ
So the general formulas for our values (in the case where we only have one attribute) are:

  β₁ = (Σ xᵢyᵢ − n x̄ȳ) / (Σ xᵢ² − n x̄²)
  β₀ = ȳ − β₁x̄
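These closed-form formulas translate directly into code. A minimal pure-Python sketch (the helper name fit_line is ours, and the reservation/meal numbers are just the two points from the earlier example):

```python
# Least-squares line fit using the closed-form formulas
#   b1 = (sum(x*y) - n*xbar*ybar) / (sum(x^2) - n*xbar^2)
#   b0 = ybar - b1*xbar

def fit_line(xs, ys):
    """Return (b0, b1) for the least-squares line y = b0 + b1*x."""
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    sxy = sum(x * y for x, y in zip(xs, ys))   # sum of x_i * y_i
    sxx = sum(x * x for x in xs)               # sum of x_i^2
    b1 = (sxy - n * xbar * ybar) / (sxx - n * xbar * xbar)
    b0 = ybar - b1 * xbar
    return b0, b1

# Reservations vs. meals served, as in the restaurant example:
b0, b1 = fit_line([10, 11], [20, 22])
print(b0, b1)  # -> 0.0 2.0
```

The same result as the hand-solved system above, as expected.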
Alternative formulas
Some textbooks use the following formulas for β₀ and β₁:

  β₁ = Sxy / Sxx and β₀ = ȳ − β₁x̄

where

  Sxy = Σ (xᵢ − x̄)(yᵢ − ȳ) and Sxx = Σ (xᵢ − x̄)²
We can show that these two formulas are equivalent. Let’s begin with Sxy:

  Sxy = Σ (xᵢ − x̄)(yᵢ − ȳ)
      = Σ (xᵢyᵢ − x̄yᵢ − ȳxᵢ + x̄ȳ)
      = Σ xᵢyᵢ − x̄ Σ yᵢ − ȳ Σ xᵢ + n x̄ȳ
      = Σ xᵢyᵢ − n x̄ȳ − n x̄ȳ + n x̄ȳ
      = Σ xᵢyᵢ − n x̄ȳ

So we have exactly the numerator of the earlier formula for β₁. The same expansion with yᵢ
replaced by xᵢ shows Sxx = Σ xᵢ² − n x̄², the denominator, so the two formulas agree.
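The equivalence is also easy to confirm numerically. A short check on some made-up sample data (the data values here are purely illustrative):

```python
# Numeric check that the centered formula Sxy / Sxx matches
# (sum(x*y) - n*xbar*ybar) / (sum(x^2) - n*xbar^2) on sample data.

xs = [1.0, 2.0, 4.0, 7.0]
ys = [2.1, 3.9, 8.2, 13.8]
n = len(xs)
xbar, ybar = sum(xs) / n, sum(ys) / n

# Centered ("textbook") form:
sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
sxx = sum((x - xbar) ** 2 for x in xs)
b1_centered = sxy / sxx

# Raw-sums form derived above:
b1_raw = (sum(x * y for x, y in zip(xs, ys)) - n * xbar * ybar) \
         / (sum(x * x for x in xs) - n * xbar * xbar)

print(abs(b1_centered - b1_raw) < 1e-12)  # -> True
```

In floating point the centered form is generally the better-behaved of the two, since the raw-sums form subtracts two large, nearly equal quantities.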