
Linear Regression

FSD Team

Informatics, UII

October 2, 2022

FSD Team (Informatics, UII) Linear Regression October 2, 2022 1 / 50


How do we see the world?

The more cookies I eat, the more weight I gain.


How do we see the world?

[Figure: mood plotted over minutes 1–60, fluctuating up and down]

I am not necessarily always happier as time passes.


How do we see the world?

If I eat 1 cookie per day, I gain 2 kg of weight.

If I eat 2 cookies per day, I gain 4 kg of weight.

The more cookies I eat, the more weight I gain.

We call such an example a linear model. For example,

If A goes up, so does B, OR
If A goes up, B goes down.

Given two variables, a linear relationship between them indicates consistent directions of change.


How do we see the world?

I just woke up and

I am happy (minute 1)
I am angry as no one WhatsApps me (minute 10)
I am happy as one WhatsApp message comes (minute 11)
I am nervous as I’ll have an exam (minute 30)

I am not necessarily always happier as time passes.

We call such an example a nonlinear model. For example, if A goes up, B alternates up and down.


How do we see the world?

If I eat 1 cookie per day, I gain 2 kg of weight.

If I eat 2 cookies per day, I gain 4 kg of weight.

What if I eat 3 cookies? How many kg of weight will I gain?


How do we see the world?

I just woke up and

I am happy (minute 1)
I am angry as no one WhatsApps me (minute 10)
I am happy as one WhatsApp message comes (minute 11)
I am nervous as I’ll have an exam (minute 30)

Will I be happy in minute 110? Or nervous?


Say it in Mathematics

[Figure: weight gained (kg) vs. cookies eaten, with points at (1, 2) for “eat 1 cookie” and (2, 4) for “eat 2 cookies”]

What if I eat 3 cookies?

We can draw a line that passes through the two points. Using the line, we can predict “what if I eat 3 cookies?”

[Figure: the same line extended over 1–10 cookies, weight up to 20 kg]

With the line, you can predict the weight gain for any (positive) number of cookies eaten.


Say it in Mathematics

In terms of a mathematical function, a line can be represented by

y = θ0 + θ1 x,

where θ0 is the intercept and θ1 is the slope.

The intercept indicates the y-coordinate at which the line crosses the y-axis, i.e., the value of y when x = 0.

The slope represents how steep the line is.
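As a quick numeric check, here is a minimal sketch (in Python; the function name is our own choice, not from the slides) that evaluates y = θ0 + θ1 x and shows how the intercept and slope control the prediction:

```python
def line(x, theta0, theta1):
    """Evaluate y = theta0 + theta1 * x."""
    return theta0 + theta1 * x

# Cookie example: intercept 0, slope 2 (gain 2 kg per cookie per day).
print(line(1, 0, 2))   # 2
print(line(2, 0, 2))   # 4
# The intercept is the value of y at x = 0; the slope sets the steepness.
print(line(0, 5, 2))   # 5
```

Changing theta1 rotates the line; changing theta0 shifts it up or down without changing its steepness.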


Say it in Mathematics

[Figure: the line y = 2x over cookies 1–10, weight in kg up to 20]

From the previous example, we have y = 0 + 2x. Now let’s play a bit with the slope and intercept.


Say it in Mathematics

[Figure: lines through the origin with slopes θ1 = 2, 3, 4, 5, each steeper than the last]

From y = 0 + 2x, we have θ1 = 2. See what happens if we increase the slope θ1.


Say it in Mathematics

[Figure: lines through the origin with slopes θ1 = 5, 4, 3, 2, 1, −1, −2; positive slopes rise, negative slopes fall]

From y = 0 + 2x, we have θ1 = 2. See what happens if we decrease the slope θ1.


Say it in Mathematics

[Figure: parallel lines of slope 2 with intercepts θ0 = 0, 6, 10]

From y = 0 + 2x, we have θ0 = 0. See what happens if we increase the intercept θ0.


Say it in Mathematics

[Figure: parallel lines of slope 2 with intercepts θ0 = 10, 6, 0, −6, −10]

From y = 0 + 2x, we have θ0 = 0. See what happens if we decrease the intercept θ0.


Linear regression

The previous example illustrates the idea behind linear regression: based on the data we have, we want to predict the weight gained, given the number of cookies we eat, by fitting a line.

Next, we will describe linear regression in more detail.


Real-world cases

In real-world cases, the data sets are often of the form
{(x(i), y(i)); i = 1, . . . , m}. For example,

House area (x)    Price (y)
2104              400
1600              330
2400              369
1416              232
3000              540
...               ...

Based on the data above, a typical question is, e.g., what is the price of a house if the area is 558 m²?
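To make this question concrete, here is a sketch (Python with NumPy rather than the R used later in these slides; the exact numbers are only illustrative) that fits a least-squares line to the five rows above and evaluates it at 558 m². Note that 558 lies below the smallest area in the table, so this is an extrapolation:

```python
import numpy as np

area  = np.array([2104, 1600, 2400, 1416, 3000])
price = np.array([400, 330, 369, 232, 540])

# Least-squares fit of price = theta0 + theta1 * area.
# polyfit with deg=1 returns [slope, intercept].
theta1, theta0 = np.polyfit(area, price, deg=1)

predicted = theta0 + theta1 * 558   # rough estimate for a 558 m^2 house
print(round(theta1, 4), round(predicted, 1))
```

The slope comes out positive, matching the intuition that larger houses cost more.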


Plot the data

[Figure: scatter plot of price vs. area in m², areas roughly 1,000–4,000, prices roughly 200–600]

Does the data distribution form a linear fashion? A typical question is, e.g., what is the price of a house if the area is 558 m²?

Note that we are interested in predicting unseen ŷ based on x̂ from another population.

Solution: we can find a line (model) that fits the data we have, and use the line to predict ŷ based on x̂.

This is called model generalization, i.e., you obtain a model from one data set and apply it to other data set(s). This is a fundamental concept in Machine & Deep Learning.


Linear regression

[Figure: the same price-vs-area scatter plot, with candidate lines drawn through it]

The data do form a linear fashion, but how do we find a good line/model?

Draw a line by connecting all the points? Recall that our objective is model generalization; such a model will not fit other data well.

Draw a single straight line instead? But which line? What are the criteria of a good line/model? Let’s define one.

Informally, we can think of a good line/model as one that is generally close to all data points.

To indicate how close a line is, we can measure the distances between the data points and the line. The total distance is often called the error; the lower it is, the better.

Linear regression is about finding the best line by selecting the one with the minimum error. How?

Recall that we can search lines by changing the values of the intercept θ0 and slope θ1 in y = θ0 + θ1 x. But of course, we do not want to search randomly.


Linear Regression

More formally, a straight line can be represented by

h(x) = Σ_{i=0}^{n} θi xi = θ^T x,

where x0 = 1 so that θ0 plays the role of the intercept.

The error term indicates the distance between a point y(i) and the line hθ(x(i)), i.e., hθ(x(i)) − y(i).

[Figure: a fitted line with a vertical segment marking the error hθ(x(i)) − y(i) at one data point]


Linear Regression

The total error, often called the cost function in ML/DL, can thus be written

J(θ) = (1/2) Σ_{i=1}^{m} (hθ(x(i)) − y(i))².

Minimizing this function is known as ordinary least squares. Here is the plot.

[Figure: cost (×10⁷) plotted against θ over −50 to 200, a U-shaped curve]
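To see the cost in action, a minimal sketch (our own variable names, not from the slides) evaluates J(θ) on the cookie data for two candidate slopes; the line that actually generated the data pays zero cost, while a worse line pays a positive one:

```python
def cost(theta0, theta1, xs, ys):
    """J(theta) = 1/2 * sum of squared errors of the line against the data."""
    return 0.5 * sum((theta0 + theta1 * x - y) ** 2 for x, y in zip(xs, ys))

xs, ys = [1, 2], [2, 4]        # cookies eaten, kg gained
print(cost(0, 2, xs, ys))      # 0.0  (the true line y = 2x)
print(cost(0, 3, xs, ys))      # 2.5  (a too-steep line pays a positive cost)
```

Sweeping theta1 over a range and plotting the resulting costs would trace out exactly the U-shaped curve shown above.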


Ordinary least square

[Figure: the U-shaped cost curve, annotated “Hey, we get some clue here!” and “I am here!” at the minimum]

Obviously we can see which cost is the minimum, can’t we? That is, look at “I am here!”.

Each cost (data point) above represents the error term of one line/model. The question: how do we find the minimum error? Answer: simply go down to the bottom of the valley.


Ordinary least square

[Figure: the cost curve with an initial cost marked “Start here!”]

Typically, your randomly initialized line returns the “start here” cost. How can we go down?

We can use gradient descent to update the parameter θ so as to get the minimum cost.

We now use the term model parameter to represent the slope and intercept.


Gradient descent

[Figure: the cost curve, with the cost stepping down from “Start here!” through “down here”, “down again...”, and “so on...”, until “stop here” at the minimum]

Initially guess θ and compute the cost (see “Start here!”).

Repeatedly update the parameter (see “down here”) via

θj := θj − α (∂/∂θj) J(θ)

simultaneously for all j = 0, . . . , n, until reaching the minimum error.

Once you reach the minimum error, the corresponding parameter θ should give you the best-fitting line/model.


Linear Regression: the procedure

1 Pick an initial line/model hθ(x) by randomly choosing the parameter θ.

2 Compute the corresponding cost function, e.g.,

J(θ) = (1/2) Σ_{i=1}^{m} (hθ(x(i)) − y(i))²,

3 Update the line/model by updating θ so that J(θ) becomes smaller, using, e.g., the gradient descent update

θj := θj − α (∂/∂θj) J(θ),

where α is the learning rate.

4 Repeat steps 2 and 3 until convergence.
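The four steps above can be sketched end-to-end in Python (a minimal, unoptimized illustration; the data, initial guess, learning rate, and iteration count are our own choices, not from the slides). On the cookie data it recovers θ0 ≈ 0 and θ1 ≈ 2:

```python
xs, ys = [1.0, 2.0, 3.0], [2.0, 4.0, 6.0]   # cookies eaten, kg gained

theta0, theta1 = 0.5, 0.0                   # step 1: arbitrary initial parameters
alpha = 0.05                                # learning rate

for _ in range(5000):                       # step 4: repeat until (roughly) converged
    # step 2 uses the errors h(x) - y; step 3 needs the partial derivatives
    # of J(theta) = 1/2 * sum (h(x) - y)^2 with respect to each parameter.
    errs = [theta0 + theta1 * x - y for x, y in zip(xs, ys)]
    grad0 = sum(errs)                            # dJ/dtheta0
    grad1 = sum(e * x for e, x in zip(errs, xs)) # dJ/dtheta1
    # Simultaneous update of both parameters.
    theta0, theta1 = theta0 - alpha * grad0, theta1 - alpha * grad1

print(round(theta0, 3), round(theta1, 3))   # approximately 0.0 and 2.0
```

Note the simultaneous update: both gradients are computed from the old parameters before either parameter changes, exactly as the update rule requires.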


Linear regression: a summary

In more detail: the learning procedure above is a typical approach in most (parametric) models of machine and deep learning.

Note that we have so far been assuming that the y in our data set
{(x(i), y(i)); i = 1, . . . , m} is continuous, e.g., house price, height, temperature, etc.

What if y is discrete, e.g., pass/fail, yes/no, 1/0? That leads to a nonlinear fashion and a classification task.


Practice in R



Example 1

# Linear regression
library(MASS)
lm.fit <- lm(medv ~ lstat, data = Boston)

# check result
lm.fit
summary(lm.fit)

# check other values
names(lm.fit)

# plot result
attach(Boston)
plot(lstat, medv)
abline(lm.fit)
Example 2

# Multiple linear regression
library(MASS)
lm.fit <- lm(medv ~ lstat + age, data = Boston)
summary(lm.fit)

# with all variables in the data
lm.fit <- lm(medv ~ ., data = Boston)
summary(lm.fit)

# with all variables except age
lm.fit1 <- lm(medv ~ . - age, data = Boston)
summary(lm.fit1)
