Everything You Need To Know About Linear Regression - by Sushant Patrikar - Towards Data Science
Linear Regression is the first stepping stone in the field of Machine Learning. If you are new to Machine Learning, or you are a math geek who wants to know all the math behind Linear Regression, then you are at the same spot I was 9 months ago. Here we will look at the math of linear regression and understand the mechanism behind it.
https://towardsdatascience.com/everything-you-need-to-know-about-linear-regression-b791e8f4bd7a 1/20
5/31/23, 9:36 AM Everything You Need To Know About Linear Regression | by Sushant Patrikar | Towards Data Science
Introduction
Linear Regression. Breaking it down, we get two words: ‘Linear’ and ‘Regression’. Thinking mathematically, ‘Linear’ suggests something related to a straight line, while ‘Regression’ means a technique for determining the statistical relationship between two or more variables.
Putting it together, Linear Regression is all about finding the equation of a line that fits the given data closely enough to predict future values.
Hypothesis
Now, what’s this hypothesis? It’s nothing but the equation of the line we were talking about. Let’s look at the equation below.

y = mx + c
Does this look familiar? It is the equation of a straight line, and this is a hypothesis. Let’s rewrite it in a slightly different notation:

h(x) = Θ₀ + Θ₁x

We have just replaced y with h(x), and c, m with Θ₀ and Θ₁ respectively. h(x) will be our predicted value. This is the most common way of writing a hypothesis in Machine Learning.
Now, to understand this hypothesis, we will take the example of housing prices. Suppose you collect the sizes of different houses in your locality and their respective prices. The hypothesis can be represented as

price = Θ₀ + Θ₁ · size

where Θ₀ plays the role of a base price.
Now all you have to do is find the appropriate base price and the value of Θ₁ based on
your dataset so that you can predict the price of any house when given its size.
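As a quick sketch in Python (the base price and per-unit rate below are made-up numbers for illustration, not values from any real dataset):

```python
# Hypothesis h(x) = Θ₀ + Θ₁·x: predict a house price from its size.
def hypothesis(theta0, theta1, size):
    return theta0 + theta1 * size

# Made-up example: base price of 50,000 plus 120 per unit of size.
print(hypothesis(50_000, 120, 1_000))  # 170000
```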
To say it more technically, we have to tune the values of Θ₀ & Θ₁ in such a way that our
line fits the dataset in the best way possible. Now we need some metric to determine
the ‘best’ line, and we have it. It’s called a cost function. Let’s look into it.
The cost function is

J(Θ₀, Θ₁) = (1/2m) · Σᵢ₌₁ᵐ (h(x⁽ⁱ⁾) − y⁽ⁱ⁾)²

Here m means the total number of examples in your dataset. In our example, m will be the total number of houses in our dataset.
Now look at our cost function carefully. To compute it, we need the predicted value h(x⁽ⁱ⁾), i.e. the predicted price, for each of the m examples: m predicted prices corresponding to m houses.
Now, to calculate h(x), we need a base price (Θ₀) and a value for Θ₁. Note that these are the values which we will tune to find our best fit. We need something to start with, so we will randomly initialize these two values.
If you look closely at the cost function, you’ll find that what we are doing is just averaging the squares of the distances between predicted values and actual values over all m examples.
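This averaging can be sketched in Python on a tiny made-up dataset of m = 4 houses (the sizes and prices are invented so the true line is exactly y = 1 + 2x):

```python
# Cost J(Θ₀, Θ₁) = (1/2m) · Σ (h(x⁽ⁱ⁾) − y⁽ⁱ⁾)²
def cost(theta0, theta1, xs, ys):
    m = len(xs)
    predictions = [theta0 + theta1 * x for x in xs]  # h(x) for each example
    return sum((p - y) ** 2 for p, y in zip(predictions, ys)) / (2 * m)

sizes = [1.0, 2.0, 3.0, 4.0]   # made-up house sizes
prices = [3.0, 5.0, 7.0, 9.0]  # made-up prices, exactly 1 + 2·size

print(cost(1.0, 2.0, sizes, prices))  # 0.0: this line fits perfectly
print(cost(0.0, 0.0, sizes, prices))  # 20.5: a worse line costs more
```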
Look at the graph above, where m = 4. The points on the blue line are predicted values, while the red points are actual values. Each green line is the distance between an actual value and its predicted value.
So what the cost function calculates is just the mean of the squared lengths of the green lines. We also divide by 2 to simplify some future calculations, as we will see.
Linear Regression tries to minimize this cost by finding the proper values of Θ₀ and Θ₁.
How? By using Gradient Descent.
Gradient Descent
Gradient Descent is a very important algorithm when it comes to Machine Learning.
Right from Linear Regression to Neural Networks, it is used everywhere.
The update rule for each weight is

Θⱼ := Θⱼ − α · ∂J(Θ₀, Θ₁) / ∂Θⱼ

This is how we update our weights. The update rule is executed in a loop, and it takes us toward the minimum of the cost function. The α is a constant learning rate, which we will talk about in a minute.
Understanding Gradient Descent
So basically, we update each weight by subtracting from it the partial derivative of our cost function w.r.t. that weight.
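The update rule can be watched in action on a toy one-parameter cost, J(θ) = (θ − 3)², whose derivative is 2(θ − 3). This example (starting point, learning rate, iteration count) is made up for illustration:

```python
# Repeated updates θ := θ − α · dJ/dθ walk θ toward the minimum at θ = 3.
def gradient(theta):
    return 2 * (theta - 3)  # derivative of (θ − 3)²

theta = 0.0   # arbitrary starting point
alpha = 0.1   # learning rate

for _ in range(100):
    theta -= alpha * gradient(theta)

print(round(theta, 6))  # very close to 3.0
```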
But how is this taking us to the minimum cost? Let’s visualize it. For easy
understanding, let’s assume that Θ₀ is 0 for now.
Now let’s see how the cost is dependent on the value of Θ₁. Since this is a quadratic
equation, the graph of Θ₁ vs J(Θ) will be a parabola and it will look something like this
with Θ₁ on the x-axis and J(Θ) on the y-axis.
Our goal is to reach the minimum of the cost function, which we will get when our Θ₁
will be equal to Θₘᵢₙ.
Suppose, the Θ₁ gets initialized as shown in the figure. The cost corresponding to
current Θ₁ is equal to the blue dot on the graph.
We are subtracting the derivative of the cost function w.r.t Θ₁ multiplied by some
constant.
The derivative of the cost function w.r.t. Θ₁ gives the slope of the curve at that point, which in this case is positive. So we are subtracting a positive quantity from our current value of Θ₁. This forces Θ₁ to move to the left and slowly converge toward Θₘᵢₙ, where our cost function is minimum. Here comes the role of α, our learning rate: it decides how far we descend in one iteration. Also, one point to note here is that as we are moving toward the minimum,
the slope of the curve is also getting less steep. That means, as we approach the minimum value, we will be taking smaller and smaller steps.
Eventually, the slope will become zero at the minimum of the curve and then Θ₁ will
not be updated.
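This self-slowing behaviour is easy to check numerically on the toy cost J(θ) = θ² (gradient 2θ, minimum at 0); the numbers here are invented for the demonstration:

```python
# With a fixed learning rate, the step size α·|slope| shrinks on its own
# as θ approaches the minimum, because the slope itself shrinks.
theta, alpha = 8.0, 0.1
steps = []
for _ in range(5):
    step = alpha * 2 * theta  # this iteration's step size
    theta -= step
    steps.append(step)

print(steps)  # each step is smaller than the one before it
```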
Think of it like this. Suppose a man is at the top of a valley and wants to get to the bottom. He goes down the slope, taking larger steps where the slope is steep and smaller steps where it is gentle. He decides his next position based on his current position, and he stops when he reaches the bottom of the valley, which was his goal.
If Θ₁ is instead initialized to the left of Θₘᵢₙ, the slope at that point will be negative. In gradient descent we subtract the slope, and subtracting a negative quantity is the same as adding. So Θ₁ will keep increasing until it reaches the point where the cost becomes minimum.
The above figure is a good depiction of gradient descent. Note how the steps are
getting smaller and smaller as we are reaching the minimum.
Similarly, the value of Θ₀ will also be updated using gradient descent. I did not show it,
because we need to update the values of Θ₀ and Θ₁ simultaneously which will result in
a 3-dimensional graph (cost on one axis, Θ₀ on one axis and Θ₁ on another axis) which
becomes kind of hard to visualize.
Derivative of Cost Function
We use the partial derivatives of the cost function in gradient descent. For Θ₀, differentiating J gives

∂J/∂Θ₀ = (1/m) · Σᵢ₌₁ᵐ (h(x⁽ⁱ⁾) − y⁽ⁱ⁾)
Similarly, for Θ₁:

∂J/∂Θ₁ = (1/m) · Σᵢ₌₁ᵐ (h(x⁽ⁱ⁾) − y⁽ⁱ⁾) · x⁽ⁱ⁾
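Putting the hypothesis, the cost, and both derivatives together, here is a minimal end-to-end sketch. The data is made up so the true line is y = 1 + 2x, and the learning rate and iteration count are arbitrary choices, not values from the article:

```python
# Gradient descent for linear regression, updating Θ₀ and Θ₁ together:
#   ∂J/∂Θ₀ = (1/m) Σ (h(x⁽ⁱ⁾) − y⁽ⁱ⁾)
#   ∂J/∂Θ₁ = (1/m) Σ (h(x⁽ⁱ⁾) − y⁽ⁱ⁾) · x⁽ⁱ⁾
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]   # exactly 1 + 2x
m = len(xs)

theta0, theta1 = 0.0, 0.0   # starting guesses
alpha = 0.05                # learning rate

for _ in range(5000):
    errors = [theta0 + theta1 * x - y for x, y in zip(xs, ys)]
    grad0 = sum(errors) / m
    grad1 = sum(e * x for e, x in zip(errors, xs)) / m
    # Simultaneous update: both gradients are computed from the old values.
    theta0, theta1 = theta0 - alpha * grad0, theta1 - alpha * grad1

print(round(theta0, 3), round(theta1, 3))  # close to 1.0 and 2.0
```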
In this visualization, you can see how the line fits itself to the dataset. Note that initially the line moves very quickly, but as the cost decreases, its movement slows down.
Email: sushantpatrikarml@gmail.com
Github: https://github.com/sushantPatrikar
LinkedIn: https://www.linkedin.com/in/sushant-patrikar/
Website: https://sushantpatrikar.github.io/