Linear Regression: Method of Least Squares

Linear Regression

Linear regression is a method for finding a linear model that relates an outcome, $y$,
to a set of attributes, $\{x_1, x_2, \ldots, x_n\}$, in the following form:

$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_n x_n$$

Obviously all the attributes and the outcome must be numerical for this model to make
any sense.

In the case where we have only one attribute, linear regression corresponds to the idea
of a ‘line of best fit’ in the form:

$$y = \beta_0 + \beta_1 x$$

where $\beta_0$ is the y-intercept and $\beta_1$ is the slope.

(Line of best fit)

For example, if we were running a restaurant that was reservation only, we might want to
create a model of how many people who make reservations actually show up. We could use this
model to predict how much business we were likely going to do, and make plans based on that.
In this case, our attribute, x, would be the number of reservations and our outcome, y, would be
the number of meals that are actually served.

Method of Least Squares


So the question that remains is: how do we find the values of the intercept, $\beta_0$, and the
slope, $\beta_1$? One method is the method of least squares. For simplicity’s sake, we’ll only
look at the case where there is only one attribute.
Suppose we are given a set of historical data, $\{(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)\}$, where
each pair $(x_i, y_i)$ is an instance where $x_i$ and $y_i$ occurred together. (Using the restaurant
example, (100, 80) would represent a night where there were 100 reservations but only 80 people
showed up.) When we use the method of least squares, we want to find $\beta$ values such that the
following expression is minimized:

$$\sum_{i=1}^{n} \bigl(y_i - (\beta_0 + \beta_1 x_i)\bigr)^2$$

In other words, we are finding $\beta$ values such that the sum of the squared differences between
the observed data and what the model predicts is minimized (thus the method of least squares).
The closer our model matches the data, the smaller the differences will be, and thus the smaller
this expression will be. For example, if our data happens to form a perfectly straight line, then
our model will fit the data exactly and the expression will be 0.
Putting it into the form of a function of our $\beta$ values, we have

$$F(\beta_0, \beta_1) = \sum_{i=1}^{n} (y_i - \beta_0 - \beta_1 x_i)^2$$
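As a minimal sketch, the function $F(\beta_0, \beta_1) = \sum_i (y_i - \beta_0 - \beta_1 x_i)^2$ can be written directly in Python. The name `sum_squared_error` and the two sample points (chosen to lie exactly on the line $y = 2x$) are illustrative assumptions, not part of the original text.

```python
def sum_squared_error(b0, b1, data):
    """Return F(b0, b1): the sum over all (x, y) pairs of (y - b0 - b1*x)**2."""
    return sum((y - b0 - b1 * x) ** 2 for x, y in data)


# Illustrative points lying exactly on the line y = 2x.
line_data = [(10, 20), (11, 22)]

print(sum_squared_error(0, 2, line_data))  # 0: the line y = 2x fits exactly
print(sum_squared_error(1, 2, line_data))  # 2: shifting the line up by 1 adds 1 per point
```

The second call shows that any line other than the exact fit gives a strictly larger value, which is what the minimization exploits.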

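To find $\beta$ values which minimize this function, we have to find values such that the following
holds:

$$\frac{\partial F}{\partial \beta_0} = 0, \qquad \frac{\partial F}{\partial \beta_1} = 0$$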

Example
Suppose we have two data points: (10, 20) and (11, 22). Then our function, $F$, will be the
following:

$$F(\beta_0, \beta_1) = (20 - \beta_0 - 10\beta_1)^2 + (22 - \beta_0 - 11\beta_1)^2$$

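And our partial derivatives are the following:

$$\frac{\partial F}{\partial \beta_0} = -2(20 - \beta_0 - 10\beta_1) - 2(22 - \beta_0 - 11\beta_1)$$

$$\frac{\partial F}{\partial \beta_1} = -20(20 - \beta_0 - 10\beta_1) - 22(22 - \beta_0 - 11\beta_1)$$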

Combining terms and setting each equal to 0, we have the system of equations we have to
solve:

$$4\beta_0 + 42\beta_1 - 84 = 0$$

$$42\beta_0 + 442\beta_1 - 884 = 0$$

And our solution comes out to:

$$\beta_0 = 0, \qquad \beta_1 = 2$$

So our model for the outcome is:

$$y = 2x$$

Which perfectly matches our data:

$$2 \cdot 10 = 20, \qquad 2 \cdot 11 = 22$$
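The example can be checked numerically. The sketch below is only an illustration: the function names `F`, `dF_db0` and `dF_db1` are assumptions introduced here, and the partial derivatives are written out by hand from the squared-error function for the two data points (10, 20) and (11, 22). It confirms that $\beta_0 = 0$, $\beta_1 = 2$ makes both partial derivatives vanish.

```python
# The two data points from the example.
data = [(10, 20), (11, 22)]


def F(b0, b1):
    """Sum of squared differences between the data and the line b0 + b1*x."""
    return sum((y - b0 - b1 * x) ** 2 for x, y in data)


def dF_db0(b0, b1):
    """Partial derivative of F with respect to b0."""
    return sum(-2 * (y - b0 - b1 * x) for x, y in data)


def dF_db1(b0, b1):
    """Partial derivative of F with respect to b1."""
    return sum(-2 * x * (y - b0 - b1 * x) for x, y in data)


# At b0 = 0, b1 = 2 the function and both partial derivatives are zero.
print(F(0, 2), dF_db0(0, 2), dF_db1(0, 2))  # 0 0 0

# At a different point, e.g. b0 = 1, b1 = 2, the derivatives are non-zero.
print(dF_db0(1, 2), dF_db1(1, 2))  # 4 42
```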

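But what about the general case? We start by taking the partial derivatives of our function:

$$\frac{\partial F}{\partial \beta_0} = \sum_{i=1}^{n} -2\,(y_i - \beta_0 - \beta_1 x_i)$$

$$\frac{\partial F}{\partial \beta_1} = \sum_{i=1}^{n} -2\,x_i\,(y_i - \beta_0 - \beta_1 x_i)$$

So this is the system of equations we have to solve:

$$\sum_{i=1}^{n} -2\,(y_i - \beta_0 - \beta_1 x_i) = 0$$

$$\sum_{i=1}^{n} -2\,x_i\,(y_i - \beta_0 - \beta_1 x_i) = 0$$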

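Let’s look at the first equation first. To begin, we divide both sides by -2 to get the
following:

$$\sum_{i=1}^{n} (y_i - \beta_0 - \beta_1 x_i) = 0$$

Then we break up the sum and collect the $\beta_0$ and $\beta_1$ terms on the left side. $\beta_1$ is a
constant so we also move it outside the sum.

$$\sum_{i=1}^{n} \beta_0 + \beta_1 \sum_{i=1}^{n} x_i = \sum_{i=1}^{n} y_i$$

The $\beta_0$ sum is just $\beta_0$ added to itself $n$ times, so we have

$$n\beta_0 + \beta_1 \sum_{i=1}^{n} x_i = \sum_{i=1}^{n} y_i$$

For the $x_i$ and $y_i$ sums we can observe the following, where $\bar{x}$ and $\bar{y}$ are the means of
the $x_i$ and the $y_i$:

$$\sum_{i=1}^{n} x_i = n\bar{x}, \qquad \sum_{i=1}^{n} y_i = n\bar{y}$$

So our equation becomes:

$$n\beta_0 + n\beta_1\bar{x} = n\bar{y} \quad\Longrightarrow\quad \beta_0 = \bar{y} - \beta_1\bar{x}$$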


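Let’s switch gears now to the other equation. Dividing by -2 again like the first time, we have the
following:

$$\sum_{i=1}^{n} x_i\,(y_i - \beta_0 - \beta_1 x_i) = 0$$

Distributing the $x_i$ and collecting the $\beta_0$ and $\beta_1$ terms on the left side we have:

$$\beta_0 \sum_{i=1}^{n} x_i + \beta_1 \sum_{i=1}^{n} x_i^2 = \sum_{i=1}^{n} x_i y_i$$

Now we plug in the equation we found for $\beta_0$, namely $\beta_0 = \bar{y} - \beta_1\bar{x}$:

$$(\bar{y} - \beta_1\bar{x}) \sum_{i=1}^{n} x_i + \beta_1 \sum_{i=1}^{n} x_i^2 = \sum_{i=1}^{n} x_i y_i$$

We distribute the sum of the $x_i$, factor out the $\beta_1$, and get this:

$$\bar{y} \sum_{i=1}^{n} x_i + \beta_1 \left( \sum_{i=1}^{n} x_i^2 - \bar{x} \sum_{i=1}^{n} x_i \right) = \sum_{i=1}^{n} x_i y_i$$

Then we move the $\bar{y}$ sum over to the right and factor out the $x_i$, and we finally get $\beta_1$ by itself:

$$\beta_1 = \frac{\sum_{i=1}^{n} x_i y_i - \bar{y} \sum_{i=1}^{n} x_i}{\sum_{i=1}^{n} x_i^2 - \bar{x} \sum_{i=1}^{n} x_i} = \frac{\sum_{i=1}^{n} x_i\,(y_i - \bar{y})}{\sum_{i=1}^{n} x_i\,(x_i - \bar{x})}$$

So the general formulas for our $\beta$ values (in the case where we only have one attribute) are
the following, where $\bar{x}$ and $\bar{y}$ are the means of the $x_i$ and the $y_i$:

$$\beta_1 = \frac{\sum_{i=1}^{n} x_i\,(y_i - \bar{y})}{\sum_{i=1}^{n} x_i\,(x_i - \bar{x})}, \qquad \beta_0 = \bar{y} - \beta_1\bar{x}$$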
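The closed-form result for one attribute, $\beta_1 = \sum x_i (y_i - \bar{y}) / \sum x_i (x_i - \bar{x})$ and $\beta_0 = \bar{y} - \beta_1 \bar{x}$, translates almost line for line into code. The following is a minimal sketch; the function name `fit_line` and the error handling for identical x values are assumptions added for illustration.

```python
def fit_line(data):
    """Return (b0, b1) for the least-squares line through a list of (x, y) pairs."""
    n = len(data)
    x_bar = sum(x for x, _ in data) / n  # mean of the x values
    y_bar = sum(y for _, y in data) / n  # mean of the y values

    numerator = sum(x * (y - y_bar) for x, y in data)
    denominator = sum(x * (x - x_bar) for x, _ in data)
    if denominator == 0:
        # Happens when every x is the same, so no slope can be determined.
        raise ValueError("all x values are identical; slope is undefined")

    b1 = numerator / denominator
    b0 = y_bar - b1 * x_bar
    return b0, b1


# The two points from the worked example give intercept 0 and slope 2.
print(fit_line([(10, 20), (11, 22)]))  # (0.0, 2.0)
```

For the restaurant setting, `data` would hold one (reservations, meals served) pair per night, and `b0 + b1 * x` would then estimate the meals served for `x` reservations.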

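Alternative formulas
Some textbooks use the following formulas for $\beta_0$ and $\beta_1$:

$$\beta_1 = \frac{S_{xy}}{S_{xx}}, \qquad \beta_0 = \bar{y} - \beta_1\bar{x}$$

where

$$S_{xy} = \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y}), \qquad S_{xx} = \sum_{i=1}^{n} (x_i - \bar{x})^2$$

We can show that the two formulas for $\beta_1$ are equivalent. Let’s begin with $S_{xy}$, using the fact
that $\sum y_i = n\bar{y}$:

$$S_{xy} = \sum_{i=1}^{n} x_i\,(y_i - \bar{y}) - \bar{x} \sum_{i=1}^{n} (y_i - \bar{y}) = \sum_{i=1}^{n} x_i\,(y_i - \bar{y}) - \bar{x}\,(n\bar{y} - n\bar{y}) = \sum_{i=1}^{n} x_i\,(y_i - \bar{y})$$

Similarly for $S_{xx}$:

$$S_{xx} = \sum_{i=1}^{n} x_i\,(x_i - \bar{x}) - \bar{x} \sum_{i=1}^{n} (x_i - \bar{x}) = \sum_{i=1}^{n} x_i\,(x_i - \bar{x})$$

So we have

$$\frac{S_{xy}}{S_{xx}} = \frac{\sum_{i=1}^{n} x_i\,(y_i - \bar{y})}{\sum_{i=1}^{n} x_i\,(x_i - \bar{x})} = \beta_1$$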
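The equivalence can also be spot-checked numerically. In the sketch below, the function names are assumptions, and the data set is hypothetical: only the pair (100, 80) comes from the restaurant example, the other pairs are made up for illustration. The code computes the slope once as $S_{xy}/S_{xx}$ with $S_{xy} = \sum (x_i - \bar{x})(y_i - \bar{y})$ and $S_{xx} = \sum (x_i - \bar{x})^2$, and once as $\sum x_i (y_i - \bar{y}) / \sum x_i (x_i - \bar{x})$.

```python
import math


def means(data):
    """Return (x_bar, y_bar) for a list of (x, y) pairs."""
    n = len(data)
    return sum(x for x, _ in data) / n, sum(y for _, y in data) / n


def slope_from_s_values(data):
    """Slope computed as S_xy / S_xx, using deviations from the means."""
    x_bar, y_bar = means(data)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in data)
    s_xx = sum((x - x_bar) ** 2 for x, _ in data)
    return s_xy / s_xx


def slope_from_general_formula(data):
    """Slope computed as sum x_i (y_i - y_bar) / sum x_i (x_i - x_bar)."""
    x_bar, y_bar = means(data)
    numerator = sum(x * (y - y_bar) for x, y in data)
    denominator = sum(x * (x - x_bar) for x, _ in data)
    return numerator / denominator


# Hypothetical (reservations, meals served) pairs; only (100, 80) is from the text.
nights = [(100, 80), (120, 95), (80, 66), (150, 118)]

a = slope_from_s_values(nights)
b = slope_from_general_formula(nights)
print(math.isclose(a, b))  # the two forms agree up to floating-point rounding
```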

Sample Correlation Coefficient


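To see how good a fit we have, we use the following value, called the sample correlation
coefficient:

$$r = \frac{S_{xy}}{\sqrt{S_{xx}\,S_{yy}}} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2 \, \sum_{i=1}^{n} (y_i - \bar{y})^2}}$$

The following is true about the sample correlation coefficient:

1. $-1 \le r \le 1$
2. The larger the value of $|r|$, the better the model fits the data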
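As a rough sketch, the sample correlation coefficient can be computed as below. This assumes the standard definition $r = S_{xy} / \sqrt{S_{xx} S_{yy}}$, where each $S$ value is a sum of products of deviations from the means; the function name `sample_correlation` and the zero-variance check are assumptions added for illustration.

```python
import math


def sample_correlation(data):
    """Return r = S_xy / sqrt(S_xx * S_yy) for a list of (x, y) pairs."""
    n = len(data)
    x_bar = sum(x for x, _ in data) / n
    y_bar = sum(y for _, y in data) / n

    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in data)
    s_xx = sum((x - x_bar) ** 2 for x, _ in data)
    s_yy = sum((y - y_bar) ** 2 for _, y in data)
    if s_xx == 0 or s_yy == 0:
        # r is undefined when all x values or all y values are identical.
        raise ValueError("x or y has zero variance; r is undefined")

    return s_xy / math.sqrt(s_xx * s_yy)


# Points on a perfectly straight, increasing line give r = 1.
print(sample_correlation([(10, 20), (11, 22)]))  # 1.0
```

Points on a perfectly straight, decreasing line give $r = -1$, which fits just as well; this is why the quality of fit is judged by the size of $r$ rather than its sign.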
