Applied Machine Learning: Multiple Linear Regression
Machine Learning
Lecture: 3
Multiple Linear Regression
Andrew Ng
Outline
3.1 Multiple features
3.2 Gradient descent for multiple variables
3.3 Feature Scaling
3.4 Stochastic Gradient Descent
3.5 Mini-batch Gradient Descent
3.6 Polynomial regression
3.1 Multiple features
Single feature (variable)

Size (feet²)    Price ($1000s)
2104            460
1416            232
1534            315
852             178
…               …
Multiple features (variables)

Size (feet²)  Bedrooms  Floors  Age (years)  Price ($1000s)
2104          5         1       45           460
1416          3         2       40           232
1534          3         2       30           315
852           2         1       36           178
…             …         …       …            …
Notation:
m = number of training examples
n = number of features
x^(i) = input (features) of the i-th training example
x_j^(i) = value of feature j in the i-th training example
Hypothesis

Single variable:   h_θ(x) = θ_0 + θ_1 x

Multiple variables:   h_θ(x) = θ_0 + θ_1 x_1 + θ_2 x_2 + … + θ_n x_n
3.2 Gradient descent for multiple variables

Hypothesis:  h_θ(x) = θ_0 x_0 + θ_1 x_1 + … + θ_n x_n = θᵀx   (with the convention x_0 = 1)

Cost function:  J(θ) = (1 / 2m) Σ_{i=1}^m (h_θ(x^(i)) − y^(i))²

Gradient descent:
Repeat until convergence {
  θ_j := θ_j − α ∂J(θ)/∂θ_j   (simultaneously for every j = 0, 1, …, n)
}
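As a quick sketch of the two definitions above (using NumPy; function and variable names are my own, and X is assumed to already contain the x_0 = 1 column):

```python
import numpy as np

def hypothesis(theta, X):
    """h_theta(x) = theta^T x, evaluated for every row of X at once."""
    return X @ theta

def cost(theta, X, y):
    """J(theta) = 1/(2m) * sum_i (h_theta(x^(i)) - y^(i))^2"""
    m = len(y)
    residual = hypothesis(theta, X) - y
    return residual @ residual / (2 * m)
```

With θ = [0, 1] on the two points (x=0, y=0) and (x=1, y=1), the fit is exact and the cost is 0.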
Gradient Descent for single variable

Repeat until convergence {
  θ_0 := θ_0 − α (1/m) Σ_{i=1}^m (h_θ(x^(i)) − y^(i))
  θ_1 := θ_1 − α (1/m) Σ_{i=1}^m (h_θ(x^(i)) − y^(i)) x^(i)
}
Gradient Descent for multiple variables

Repeat until convergence {   (simultaneously for every j = 0, 1, …, n)
  θ_0 := θ_0 − α (1/m) Σ_{i=1}^m (h_θ(x^(i)) − y^(i)) x_0^(i)
  θ_1 := θ_1 − α (1/m) Σ_{i=1}^m (h_θ(x^(i)) − y^(i)) x_1^(i)
  ⋮
  θ_n := θ_n − α (1/m) Σ_{i=1}^m (h_θ(x^(i)) − y^(i)) x_n^(i)
}
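The simultaneous updates above can be vectorized into a single matrix expression; a minimal NumPy sketch (names are my own, X includes the intercept column):

```python
import numpy as np

def gradient_descent(X, y, alpha=0.1, n_iters=5000):
    """Batch gradient descent for linear regression.

    X : (m, n+1) design matrix whose first column is all ones (x_0 = 1).
    """
    m, n = X.shape
    theta = np.zeros(n)
    for _ in range(n_iters):
        # All theta_j are updated simultaneously from the same residuals.
        grad = X.T @ (X @ theta - y) / m
        theta -= alpha * grad
    return theta
```

On noiseless data generated from y = 2 + 3x, the routine recovers θ ≈ (2, 3).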
3.3 Feature Scaling
Feature Scaling

[Figure: contour plots of J(θ_1, θ_2) — elongated ellipses when the features have very different scales vs. nearly circular contours after scaling, so gradient descent converges faster]

Source: https://stackoverflow.com/questions/46686924/why-scaling-data-is-very-important-in-neural-networklstm/46688787#46688787
Feature scaling

Standardization (or Z-score normalization):   x′_i = (x_i − μ) / s

Min-max scaling (or normalization):   x′_i = (x_i − x_min) / (x_max − x_min)
https://sebastianraschka.com/Articles/2014_about_feature_scaling.html
https://developers.google.com/machine-learning/data-prep/transform/normalization
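Both transformations are a couple of lines in NumPy (a sketch, applied per feature column; function names are my own):

```python
import numpy as np

def standardize(X):
    """Z-score normalization: (x - mean) / standard deviation, per feature."""
    return (X - X.mean(axis=0)) / X.std(axis=0)

def min_max_scale(X):
    """Min-max scaling: maps each feature into the range [0, 1]."""
    xmin, xmax = X.min(axis=0), X.max(axis=0)
    return (X - xmin) / (xmax - xmin)
```

After standardization every feature has mean 0 and standard deviation 1; after min-max scaling every feature lies in [0, 1].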
3.4 Stochastic Gradient Descent
Stochastic Gradient Descent

Batch gradient descent uses all m examples for each update:

  θ_j := θ_j − α (1/m) Σ_{i=1}^m (h_θ(x^(i)) − y^(i)) x_j^(i)

Stochastic gradient descent instead updates the parameters from a single example x^(i) at a time, which is much cheaper per step when m is large.
Batch Gradient Descent
Repeat until convergence {
  θ_j := θ_j − α (1/m) Σ_{i=1}^m (h_θ(x^(i)) − y^(i)) x_j^(i)   (for every j = 0, 1, …, n)
}

Stochastic Gradient Descent
1. Randomly shuffle the training examples.
2. Repeat {
  for i := 1, …, m {
    θ_j := θ_j − α (h_θ(x^(i)) − y^(i)) x_j^(i)   (for every j = 0, 1, …, n)
  }
}
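The shuffle-then-update loop can be sketched in NumPy as follows (names and defaults are my own):

```python
import numpy as np

def sgd(X, y, alpha=0.05, n_epochs=500, seed=0):
    """Stochastic gradient descent: shuffle, then update on one example at a time."""
    rng = np.random.default_rng(seed)
    m, n = X.shape
    theta = np.zeros(n)
    for _ in range(n_epochs):
        for i in rng.permutation(m):           # step 1: randomly shuffle the examples
            residual = X[i] @ theta - y[i]     # error on a single example
            theta -= alpha * residual * X[i]   # step 2: per-example update of every theta_j
    return theta
```

On noiseless linear data the per-example updates still settle at the least-squares solution; on noisy data the iterates keep wandering near the optimum unless α is decreased (see the learning schedule below).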
Learning schedule

To help stochastic gradient descent converge, the learning rate α is typically decreased over time, e.g. α = const1 / (iteration number + const2).

Ref: Bottou, L. (2012). Stochastic gradient descent tricks. In Neural Networks: Tricks of the Trade (pp. 421–436). Springer, Berlin, Heidelberg.
3.5 Mini-batch Gradient Descent
Mini-batch gradient descent
Batch gradient descent: Use all m examples in each iteration
Stochastic gradient descent: Use 1 example in each iteration
Mini-batch gradient descent: Use b examples in each iteration
Credit: Andrew Ng
Mini-batch gradient descent

Say b = 10, m = 1000.

Repeat {
  for i := 1, 11, 21, 31, …, 991 {
    θ_j := θ_j − α (1/10) Σ_{k=i}^{i+9} (h_θ(x^(k)) − y^(k)) x_j^(k)   (for every j = 0, 1, …, n)
  }
}
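A NumPy sketch of the mini-batch loop (names and defaults are my own; the batch average makes each step a small-scale version of the batch update):

```python
import numpy as np

def minibatch_gd(X, y, b=10, alpha=0.1, n_epochs=500, seed=0):
    """Mini-batch gradient descent: update on b examples at a time."""
    rng = np.random.default_rng(seed)
    m, n = X.shape
    theta = np.zeros(n)
    for _ in range(n_epochs):
        order = rng.permutation(m)              # shuffle once per epoch
        for start in range(0, m, b):
            batch = order[start:start + b]      # next b example indices
            # Average gradient over just this batch.
            grad = X[batch].T @ (X[batch] @ theta - y[batch]) / len(batch)
            theta -= alpha * grad
    return theta
```

With b = 1 this reduces to stochastic gradient descent, and with b = m to batch gradient descent.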
3.6 Polynomial regression
Housing prices prediction

Example: with the features frontage and depth, rather than fitting h_θ(x) = θ_0 + θ_1 · frontage + θ_2 · depth, we can create a new feature area = frontage × depth and fit h_θ(x) = θ_0 + θ_1 · area. Defining better features can yield a better model.
Polynomial regression

[Figure: housing price (y) plotted against size (x), with curved fits through the data]

A polynomial model such as h_θ(x) = θ_0 + θ_1 x + θ_2 x² + θ_3 x³ is still linear regression: treat x_1 = x, x_2 = x², x_3 = x³ as separate features. Feature scaling then becomes very important, since x, x², and x³ have very different ranges.
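A minimal sketch of the feature-construction idea, on made-up data (the synthetic curve and all names are my own; the scaled system is solved by least squares for brevity, though gradient descent on the scaled features would also work):

```python
import numpy as np

# Cubic model of price vs. size: treat x, x^2, x^3 as three separate features.
x = np.linspace(0.5, 2.0, 50)            # synthetic "size" values (made up)
y = 1.0 + 2.0 * x - 0.5 * x ** 3         # synthetic "price" curve (made up)

P = np.column_stack([x, x ** 2, x ** 3])
P = (P - P.mean(axis=0)) / P.std(axis=0)  # feature scaling is essential here:
                                          # x, x^2, x^3 span very different ranges
X = np.column_stack([np.ones(len(x)), P])  # add the intercept column x_0 = 1

theta, *_ = np.linalg.lstsq(X, y, rcond=None)
```

Since the data is an exact cubic, the fitted model reproduces y; on real housing data the same construction fits a smooth curve through noisy points.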