5a - Linear Regression
FSD Team
Informatics, UII
October 2, 2022
[Figure: mood plotted against time in minutes (1, 15, 30, 45, 60).]
I am not necessarily always happier as time passes.
[Figure: weight gain in kg vs. cookies eaten: eat 1 cookie, gain 2 kg; eat 2 cookies, gain 4 kg; eat 3 cookies, gain 6 kg.]
[Figure: the cookie data points with a line drawn through them (kg vs. cookies, axes up to 10 cookies and 20 kg).]
With the line, you can predict the weight gain for any (positive) number of cookies
eaten.
The line has the form

y = θ0 + θ1 x,

where θ0 is the intercept and θ1 is the slope. The intercept indicates the y-coordinate through which the line passes; it is the value of y when x = 0.
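For the cookie figures, the line passes through (1, 2), (2, 4), and (3, 6), so θ0 = 0 and θ1 = 2. A minimal R sketch of using such a line to predict (predict_gain is an illustrative name, not from the slides):

# Cookie line: y = 0 + 2x
theta0 <- 0   # intercept
theta1 <- 2   # slope

# Predicted weight gain (kg) for x cookies
predict_gain <- function(x) theta0 + theta1 * x

predict_gain(3)    # 6 kg, matching the figure
predict_gain(10)   # extrapolating beyond the observed data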
From the previous example, we have y = 0 + 2x. Now let’s play a bit with the slope and
intercept.
[Figure: lines through the origin with slopes θ1 = 5, 4, 3, 2, and 1/2, plus lines with negative slopes θ1 = −1 and θ1 = −2 (kg vs. cookies).]
[Figure: lines with varying intercepts θ0 = 10, 6, and 0, plus lines with negative intercepts θ0 = −6 and θ0 = −10 (kg vs. cookies).]
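These pictures are easy to reproduce. A quick R sketch (the fixed companion parameter in each loop is an illustrative choice, not from the slides):

# Empty plot region matching the figures
plot(NULL, xlim = c(0, 10), ylim = c(-20, 20),
     xlab = "cookies", ylab = "Kg")

# Vary the slope θ1 (intercept fixed at 0)
for (s in c(5, 3, 2, 1, -1, -2)) abline(a = 0, b = s)

# Vary the intercept θ0 (slope fixed at 2 for illustration)
for (b0 in c(10, 6, 0, -6, -10)) abline(a = b0, b = 2)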
The previous example illustrates the idea behind linear regression: based on the data we have, we want to predict the weight gained, given the number of cookies we eat, by fitting a line.

The same recipe applies elsewhere. Based on the house-price data below, a typical question is, e.g.: what is the price of a house if the area is 558 m2?
[Figure: scatter plot of house prices (200–600) against area in m2 (1,000–4,000).]
Solution: we can find a line (model) that fits the data we have, and use the line to predict ŷ based on x̂.
This is called model generalization, i.e., you obtain a model from one data set and apply it to other data set(s). This is a fundamental concept in Machine & Deep Learning.
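In R, that fit-then-apply pattern looks like the following sketch (house_data and its values are made-up stand-ins for the scatter above, not real data):

# Toy data set standing in for the house-price scatter
house_data <- data.frame(area  = c(1000, 1800, 2500, 3200, 4000),
                         price = c(220, 330, 430, 540, 630))

# Obtain a model from this data set...
fit <- lm(price ~ area, data = house_data)

# ...and apply it to new, unseen input (x̂ = 558 m2)
predict(fit, newdata = data.frame(area = 558))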
[Figure: the house-price scatter with one candidate line drawn through it (price vs. area in m2, up to 5,000 m2).]
Draw a line like this?
[Figure: the same scatter with several other candidate lines.]
But which line? What are the criteria for a good line/model? Try to define some.
Informally, we can think of a good line/model as one that is generally close to all the data points.
To see how close a line is, we can measure the distances between the data points and the line. The total distance is often called the error; the lower it is, the better.
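As a quick R sketch of that idea (toy numbers, and total absolute distance as the informal error measure; the slides formalize it with squares below):

# A candidate line ŷ = θ0 + θ1 x and some toy data points
x <- c(1000, 1800, 2500, 3200)
y <- c(220, 330, 430, 540)
theta0 <- 50; theta1 <- 0.15

pred  <- theta0 + theta1 * x   # the line's value at each data point
error <- sum(abs(pred - y))    # total vertical distance to the line
error                          # lower is better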
Linear regression is about finding the best line by selecting the one with the minimum error. How?
Recall that we can search lines by changing the values of intercept θ0 and slope θ1 in
y = θ0 + θ1 x. But of course, we do not want to search randomly.
For a data point (x(i), y(i)), the line predicts hθ(x(i)), i.e., the error at that point is hθ(x(i)) − y(i).

[Figure: a fitted line with the vertical gap hθ(x(i)) − y(i) between a data point and the line.]
The total error, often called the cost function in ML/DL, can thus be written

J(θ) = (1/2) ∑_{i=1}^{m} (hθ(x(i)) − y(i))².

The function above is called ordinary least squares.
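A direct R translation of this cost (cost_J is an illustrative name; the data are the cookie points from earlier):

# J(θ) = (1/2) Σ (hθ(x(i)) − y(i))² for hθ(x) = θ0 + θ1 x
cost_J <- function(theta0, theta1, x, y) {
  h <- theta0 + theta1 * x   # predictions hθ(x(i))
  sum((h - y)^2) / 2
}

x <- c(1, 2, 3); y <- c(2, 4, 6)
cost_J(0, 2, x, y)   # 0: the line y = 2x fits these points perfectly
cost_J(0, 3, x, y)   # 7: a worse line gives a higher cost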
Here is the plot.

[Figure: the cost J(θ) (scale ×10⁷) plotted against θ for θ from −50 to 200; a bowl-shaped curve.]
[Figure: the same cost curve with its lowest point marked "I am here!". Annotations: "Hey, we get some clue here!" and "Obviously we know which one the minimum cost is, don't we? That is, look at 'I am here!'"]
Each cost (data point) above represents the error of one line/model. The question: how do we find the minimum error?
How do we find the minimum error? Answer: simply go down to the bottom of the valley.
[Figure: the cost curve with a point high on its slope marked "Start here!".]
Typically, your randomly initialized line returns the "Start here!" cost.
Initially, guess θ and compute the cost (see "Start here!"). Then repeatedly update the parameters via

θj := θj − α (∂/∂θj) J(θ)

simultaneously for all j = 0, . . . , n, stepping down the curve ("down here", "down again...", and so on)...
[Figure: successive points descending the cost curve until "stop here" at the bottom.]
...until reaching the minimum error.
The corresponding parameter θ at "stop here" should give you the fittest line/model.
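For concreteness, a minimal gradient-descent sketch in R (the learning rate α, the iteration count, and the toy data are assumptions for illustration). For this J, the derivatives are ∂J/∂θ0 = Σ(hθ(x(i)) − y(i)) and ∂J/∂θ1 = Σ(hθ(x(i)) − y(i)) x(i):

# Gradient descent for J(θ) = (1/2) Σ (hθ(x(i)) − y(i))²
x <- c(1, 2, 3, 4); y <- c(2, 4, 6, 8)   # toy data lying on y = 2x
theta <- c(10, -3)                       # an arbitrary "Start here!" guess
alpha <- 0.01                            # learning rate α

for (step in 1:5000) {
  h <- theta[1] + theta[2] * x             # predictions hθ(x(i))
  grad <- c(sum(h - y), sum((h - y) * x))  # ∂J/∂θ0 and ∂J/∂θ1
  theta <- theta - alpha * grad            # simultaneous update
}
theta   # ≈ (0, 2): gradient descent recovers the line y = 2x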
In more detail

The learning procedure above is the typical approach in most (parametric) models of machine and deep learning.

Note that we have so far been assuming that the y in our data set (x(i), y(i)); i = 1, . . . , m is continuous, e.g., house price, height, temperature, etc.
Example in R: a simple linear regression of medv (median house value) on lstat (percent lower-status population) in the Boston data set.

library(MASS)

lm.fit <- lm(medv ~ lstat, data = Boston)

# check result
lm.fit
summary(lm.fit)

# plot result
attach(Boston)
plot(lstat, medv)
abline(lm.fit)
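To use the fitted model for prediction, predict() takes new values of lstat (a usage sketch):

predict(lm.fit, newdata = data.frame(lstat = c(5, 10, 15)))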
Example 2