5a - Linear Regression
FSD Team
Informatics, UII
October 2, 2022
[Figure: mood plotted against time in minutes (1, 15, 30, 45, 60).]
I am not necessarily always happier as time passes.
[Figure: weight gain in kg vs. cookies eaten: eat 1 cookie, gain 2 kg; eat 2 cookies, gain 4 kg; eat 3 cookies, gain 6 kg.]
[Figure: the cookie data points with a line drawn through them (kg vs. cookies, axes up to 10 cookies and 20 kg).]
With the line, you can predict the weight gain for any (positive) number of cookies
eaten.
The line has the form

y = θ0 + θ1 x,

where θ0 is the intercept and θ1 is the slope. The intercept indicates the y-coordinate through which the line passes; it is the value of y when x = 0.
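For the cookie figures, the line passes through (1, 2), (2, 4), and (3, 6), so θ0 = 0 and θ1 = 2. A minimal R sketch of using such a line to predict (predict_gain is an illustrative name, not from the slides):

# Cookie line: y = 0 + 2x
theta0 <- 0   # intercept
theta1 <- 2   # slope

# Predicted weight gain (kg) for x cookies
predict_gain <- function(x) theta0 + theta1 * x

predict_gain(3)    # 6 kg, matching the figure
predict_gain(10)   # extrapolating beyond the observed data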
From the previous example, we have y = 0 + 2x. Now let’s play a bit with the slope and
intercept.
[Figure: lines through the origin with slopes θ1 = 5, 4, 3, 2, and 1/2, plus lines with negative slopes θ1 = −1 and θ1 = −2 (kg vs. cookies).]
[Figure: lines with varying intercepts θ0 = 10, 6, and 0, plus lines with negative intercepts θ0 = −6 and θ0 = −10 (kg vs. cookies).]
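These pictures are easy to reproduce. A quick R sketch (the fixed companion parameter in each loop is an illustrative choice, not from the slides):

# Empty plot region matching the figures
plot(NULL, xlim = c(0, 10), ylim = c(-20, 20),
     xlab = "cookies", ylab = "Kg")

# Vary the slope θ1 (intercept fixed at 0)
for (s in c(5, 3, 2, 1, -1, -2)) abline(a = 0, b = s)

# Vary the intercept θ0 (slope fixed at 2 for illustration)
for (b0 in c(10, 6, 0, -6, -10)) abline(a = b0, b = 2)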
The previous example illustrates the idea behind linear regression: based on the data we have, we want to predict the weight gained, given the number of cookies we eat, by fitting a line.

The same recipe applies elsewhere. Based on the house-price data below, a typical question is, e.g.: what is the price of a house if the area is 558 m2?
[Figure: scatter plot of house prices (200–600) against area in m2 (1,000–4,000).]
Solution: we can find a line (model) that fits the data we have, and use the line to predict ŷ based on x̂.
This is called model generalization, i.e., you obtain a model from one data set and apply it to other data set(s). This is a fundamental concept in Machine & Deep Learning.
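In R, that fit-then-apply pattern looks like the following sketch (house_data and its values are made-up stand-ins for the scatter above, not real data):

# Toy data set standing in for the house-price scatter
house_data <- data.frame(area  = c(1000, 1800, 2500, 3200, 4000),
                         price = c(220, 330, 430, 540, 630))

# Obtain a model from this data set...
fit <- lm(price ~ area, data = house_data)

# ...and apply it to new, unseen input (x̂ = 558 m2)
predict(fit, newdata = data.frame(area = 558))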
[Figure: the house-price scatter with one candidate line drawn through it (price vs. area in m2, up to 5,000 m2).]
Draw a line like this?
[Figure: the same scatter with several other candidate lines.]
But which line? What are the criteria for a good line/model? Try to define some.
Informally, we can think of a good line/model as one that is generally close to all the data points.
To see how close a line is, we can measure the distances between the data points and the line. The total distance is often called the error; the lower it is, the better.
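As a quick R sketch of that idea (toy numbers, and total absolute distance as the informal error measure; the slides formalize it with squares below):

# A candidate line ŷ = θ0 + θ1 x and some toy data points
x <- c(1000, 1800, 2500, 3200)
y <- c(220, 330, 430, 540)
theta0 <- 50; theta1 <- 0.15

pred  <- theta0 + theta1 * x   # the line's value at each data point
error <- sum(abs(pred - y))    # total vertical distance to the line
error                          # lower is better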
Linear regression is about finding the best line by selecting the one with the minimum error. How?
Recall that we can search lines by changing the values of intercept θ0 and slope θ1 in
y = θ0 + θ1 x. But of course, we do not want to search randomly.
For a data point (x(i), y(i)), the line predicts hθ(x(i)), i.e., the error at that point is hθ(x(i)) − y(i).

[Figure: a fitted line with the vertical gap hθ(x(i)) − y(i) between a data point and the line.]
The total error, often called the cost function in ML/DL, can thus be written

J(θ) = (1/2) ∑_{i=1}^{m} (hθ(x(i)) − y(i))².

The function above is called ordinary least squares.
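A direct R translation of this cost (cost_J is an illustrative name; the data are the cookie points from earlier):

# J(θ) = (1/2) Σ (hθ(x(i)) − y(i))² for hθ(x) = θ0 + θ1 x
cost_J <- function(theta0, theta1, x, y) {
  h <- theta0 + theta1 * x   # predictions hθ(x(i))
  sum((h - y)^2) / 2
}

x <- c(1, 2, 3); y <- c(2, 4, 6)
cost_J(0, 2, x, y)   # 0: the line y = 2x fits these points perfectly
cost_J(0, 3, x, y)   # 7: a worse line gives a higher cost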
Here is the plot.

[Figure: the cost J(θ) (scale ×10⁷) plotted against θ for θ from −50 to 200; a bowl-shaped curve.]
[Figure: the same cost curve with its lowest point marked "I am here!". Annotations: "Hey, we get some clue here!" and "Obviously we know which one the minimum cost is, don't we? That is, look at 'I am here!'"]
Each cost (data point) above represents the error of one line/model. The question: how do we find the minimum error?
How do we find the minimum error? Answer: simply go down to the bottom of the valley.
[Figure: the cost curve with a point high on its slope marked "Start here!".]
Typically, your randomly initialized line returns the "Start here!" cost.
Initially, guess θ and compute the cost (see "Start here!"). Then repeatedly update the parameters via

θj := θj − α (∂/∂θj) J(θ)

simultaneously for all j = 0, . . . , n, stepping down the curve ("down here", "down again...", and so on)...
[Figure: successive points descending the cost curve until "stop here" at the bottom.]
...until reaching the minimum error.
The corresponding parameter θ at "stop here" should give you the fittest line/model.
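For concreteness, a minimal gradient-descent sketch in R (the learning rate α, the iteration count, and the toy data are assumptions for illustration). For this J, the derivatives are ∂J/∂θ0 = Σ(hθ(x(i)) − y(i)) and ∂J/∂θ1 = Σ(hθ(x(i)) − y(i)) x(i):

# Gradient descent for J(θ) = (1/2) Σ (hθ(x(i)) − y(i))²
x <- c(1, 2, 3, 4); y <- c(2, 4, 6, 8)   # toy data lying on y = 2x
theta <- c(10, -3)                       # an arbitrary "Start here!" guess
alpha <- 0.01                            # learning rate α

for (step in 1:5000) {
  h <- theta[1] + theta[2] * x             # predictions hθ(x(i))
  grad <- c(sum(h - y), sum((h - y) * x))  # ∂J/∂θ0 and ∂J/∂θ1
  theta <- theta - alpha * grad            # simultaneous update
}
theta   # ≈ (0, 2): gradient descent recovers the line y = 2x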
In more detail

The learning procedure above is the typical approach in most (parametric) models of machine and deep learning.

Note that we have so far been assuming that the y in our data set (x(i), y(i)); i = 1, . . . , m is continuous, e.g., house price, height, temperature, etc.
Example in R: a simple linear regression of medv (median house value) on lstat (percent lower-status population) in the Boston data set.

library(MASS)

lm.fit <- lm(medv ~ lstat, data = Boston)

# check result
lm.fit
summary(lm.fit)

# plot result
attach(Boston)
plot(lstat, medv)
abline(lm.fit)
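To use the fitted model for prediction, predict() takes new values of lstat (a usage sketch):

predict(lm.fit, newdata = data.frame(lstat = c(5, 10, 15)))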
Example 2