AI Explanation Basic 2
DeepLearning.AI makes these slides available for educational purposes. You may not use or distribute
these slides for commercial purposes. You may make copies of these slides and use or distribute them for
educational purposes as long as you cite DeepLearning.AI as the source of the slides.
Multiple Features
Multiple features (variables)

Size in feet² (x)    Price ($) in 1000s (y)
2104                 400
1416                 232
1534                 315
852                  178
...                  ...

One feature: f_{w,b}(x) = wx + b
Andrew Ng
Multiple features (variables)

Size in    Number of    Number of    Age of home    Price ($)
feet²      bedrooms     floors       in years       in $1000s
2104       5            1            45             460
1416       3            2            40             232
1534       3            2            30             315
852        2            1            36             178
...        ...          ...          ...            ...

Notation:
x_j     = j-th feature
n       = number of features
x^(i)   = features of the i-th training example
x_j^(i) = value of feature j in the i-th training example
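The notation above maps directly onto a 2-D NumPy array; a small sketch using the table's values (row i holds x^(i), column j holds feature j; note that the math counts from 1 while code counts from 0):

```python
import numpy as np

# Each row is one training example x^(i); each column is one feature x_j:
# size (feet^2), # bedrooms, # floors, age (years)
X = np.array([
    [2104, 5, 1, 45],
    [1416, 3, 2, 40],
    [1534, 3, 2, 30],
    [ 852, 2, 1, 36],
])
y = np.array([460, 232, 315, 178])  # price in $1000s

m, n = X.shape   # m = number of training examples, n = number of features
x_2 = X[1]       # features of the 2nd training example (math: x^(2))
x_2_3 = X[1, 2]  # value of feature 3 in the 2nd example (math: x_3^(2))
```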
Model:
Previously (one feature): f_{w,b}(x) = wx + b
With n features:          f_{w,b}(x) = w_1 x_1 + w_2 x_2 + ... + w_n x_n + b

Writing the parameters as a vector w = [w_1 ... w_n] and the features as a vector x = [x_1 ... x_n], with b still a single number, the model is a dot product:
f_{w,b}(x) = w · x + b
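The dot-product form is a one-liner in NumPy; a quick sketch using the illustrative parameter and feature values that appear in the slides (these are not fitted values):

```python
import numpy as np

w = np.array([1.0, 2.5, -3.3])  # illustrative parameters, not fitted values
b = 4.0
x = np.array([10.0, 20.0, 30.0])

# f_{w,b}(x) = w_1 x_1 + ... + w_n x_n + b, computed as a dot product
f = np.dot(w, x) + b
```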
Linear Regression
with Multiple Variables
Vectorization
Part 1
Parameters and features
w = [w_1  w_2  w_3]   (linear algebra: count from 1)
b is a number
x = [x_1  x_2  x_3]

w = np.array([1.0, 2.5, -3.3])   # code: count from 0
b = 4
x = np.array([10, 20, 30])

Without vectorization:
f = 0
for j in range(0, n):   # n = 3 here
    f = f + w[j] * x[j]
f = f + b
Linear Regression
with Multiple Variables
Vectorization
Part 2
Without vectorization:
for j in range(0, 16):
    f = f + w[j] * x[j]
The loop runs one step at a time: at t_0 it computes f + w[0] * x[0], at t_1 it computes f + w[1] * x[1], ..., at t_15 it computes f + w[15] * x[15].

Vectorization:
f = np.dot(w, x)
In one step, the hardware multiplies each pair w[0] * x[0], w[1] * x[1], ..., w[15] * x[15] in parallel, then efficiently adds them up:
w[0]*x[0] + w[1]*x[1] + ... + w[15]*x[15]
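A runnable sketch of the comparison: both routes produce the same number, but np.dot hands the whole multiply-and-sum to optimized hardware instead of stepping through j = 0 ... 15 one at a time (the random 16-element vectors here are just for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
w = rng.standard_normal(16)
x = rng.standard_normal(16)

# Without vectorization: one multiply-add per loop step (t_0, t_1, ..., t_15)
f_loop = 0.0
for j in range(0, 16):
    f_loop = f_loop + w[j] * x[j]

# With vectorization: all 16 products computed together, then summed
f_vec = np.dot(w, x)
```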
Gradient descent
w = [w_1  w_2  ...  w_16]
d = [d_1  d_2  ...  d_16]   (the derivative terms)

w = np.array([0.5, 1.3, ..., 3.4])
d = np.array([0.3, 0.2, ..., 0.4])

Compute w_j = w_j - 0.1 d_j for j = 1 ... 16.

Without vectorization:
w_1 = w_1 - 0.1 d_1
w_2 = w_2 - 0.1 d_2
...
w_16 = w_16 - 0.1 d_16

for j in range(0, 16):
    w[j] = w[j] - 0.1 * d[j]

With vectorization:
w = w - 0.1 d

w = w - 0.1 * d
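The two update styles can be checked against each other; a sketch with a learning rate of 0.1 (the slide's actual 16 values are elided, so random stand-ins are used here):

```python
import numpy as np

rng = np.random.default_rng(1)
w = rng.standard_normal(16)  # parameters
d = rng.standard_normal(16)  # derivative terms

# Without vectorization: update each w_j in a loop
w_loop = w.copy()
for j in range(0, 16):
    w_loop[j] = w_loop[j] - 0.1 * d[j]

# With vectorization: one array operation updates all 16 parameters
w_vec = w - 0.1 * d
```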
Linear Regression
with Multiple Variables
Cost function: J(w_1, ..., w_n, b), or in vector notation J(w, b)

Gradient descent:
repeat {
    w_j = w_j - α (∂/∂w_j) J(w, b)
    b   = b   - α (∂/∂b) J(w, b)
}
(Equivalently, with the parameters written out: w_j = w_j - α (∂/∂w_j) J(w_1, ..., w_n, b), and likewise for b.)
Gradient descent

One feature:
repeat {
    w = w - α (1/m) Σ_{i=1}^{m} (f_{w,b}(x^(i)) - y^(i)) x^(i)      ← this sum is (∂/∂w) J(w, b)
    b = b - α (1/m) Σ_{i=1}^{m} (f_{w,b}(x^(i)) - y^(i))
}   simultaneously update w, b

n features (n ≥ 2):
repeat {
    w_1 = w_1 - α (1/m) Σ_{i=1}^{m} (f_{w,b}(x^(i)) - y^(i)) x_1^(i)    ← this sum is (∂/∂w_1) J(w, b)
    ...
    w_n = w_n - α (1/m) Σ_{i=1}^{m} (f_{w,b}(x^(i)) - y^(i)) x_n^(i)
    b   = b   - α (1/m) Σ_{i=1}^{m} (f_{w,b}(x^(i)) - y^(i))
}   simultaneously update w_j (for j = 1, ..., n) and b
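The n-feature update rule above can be sketched end-to-end in NumPy. The function name and toy data below are illustrative; the gradients are the (1/m) Σ (f - y) x_j and (1/m) Σ (f - y) expressions from the slide, and all of w and b are updated simultaneously:

```python
import numpy as np

def gradient_descent(X, y, alpha=0.1, num_iters=1000):
    """Linear regression via batch gradient descent. X: (m, n), y: (m,)."""
    m, n = X.shape
    w = np.zeros(n)
    b = 0.0
    for _ in range(num_iters):
        err = X @ w + b - y      # f_{w,b}(x^(i)) - y^(i) for all i, shape (m,)
        dj_dw = (X.T @ err) / m  # (1/m) sum over i of err_i * x_j^(i)
        dj_db = err.mean()       # (1/m) sum over i of err_i
        w = w - alpha * dj_dw    # simultaneous update of all w_j ...
        b = b - alpha * dj_db    # ... and of b
    return w, b

# Toy data generated from a known line: y = 2*x_1 - 3*x_2 + 1
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 2))
y = 2 * X[:, 0] - 3 * X[:, 1] + 1
w, b = gradient_descent(X, y)
```

On this toy data the recovered parameters should be very close to (2, -3) and 1.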
An alternative to gradient descent

Normal equation:
• Only for linear regression
• Solves for w, b without iterations

Disadvantages:
• Doesn't generalize to other learning algorithms.
• Slow when the number of features is large (> 10,000).

What you need to know:
• The normal equation method may be used in machine learning libraries that implement linear regression.
• Gradient descent is the recommended method for finding parameters w, b.
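For reference, the closed-form route the slide alludes to can be sketched with np.linalg.lstsq, which solves the least-squares problem directly; appending a column of ones folds b into the solve. This is an illustrative sketch, not necessarily how any particular library implements it:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 2))
y = 2 * X[:, 0] - 3 * X[:, 1] + 1

# Append a column of ones so the last coefficient plays the role of b
Xb = np.hstack([X, np.ones((X.shape[0], 1))])
theta, *_ = np.linalg.lstsq(Xb, y, rcond=None)  # min ||Xb @ theta - y||^2
w, b = theta[:-1], theta[-1]
```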
Practical Tips for
Linear Regression
Feature Scaling
Part 1
Feature and parameter values
price = w_1 x_1 + w_2 x_2 + b
x_1: size (feet²), range 300 - 2,000
x_2: # bedrooms, range 0 - 5
Example house: x_1 = 2000, x_2 = 5, price = $500k

If w_1 = 50, w_2 = 0.1, b = 50:
price = 50 * 2000 + 0.1 * 5 + 50 = $100,050.5k   (far off)
If w_1 = 0.1, w_2 = 50, b = 50:
price = 0.1 * 2000 + 50 * 5 + 50 = $500k   (a reasonable estimate)
When a feature's range of values is large, a good model tends to choose a small parameter for it, and vice versa.
Feature size and parameter size
When a feature x_j takes on large values (like size in feet²), its parameter w_j tends to be small; when the feature's values are small (like # bedrooms), the parameter tends to be large.
[Plots: scatter of the features (x_1 = size in feet², x_2 = # bedrooms) and contour plot of the cost J(w, b) over the parameters (w_1, w_2)]
Feature size and gradient descent
[Plots: before rescaling, the feature scatter plot (x_1 = size in feet², x_2 = # bedrooms) is stretched and the contours of J(w, b) are tall, narrow ellipses, so gradient descent can bounce back and forth before reaching the minimum. After both features are rescaled, the contours are more nearly circular and gradient descent takes a more direct path to the global minimum.]
Practical Tips for
Linear Regression
Feature Scaling
Part 2
Feature scaling (dividing by the maximum)
300 ≤ x_1 ≤ 2000        0 ≤ x_2 ≤ 5

x_1,scaled = x_1 / 2000        x_2,scaled = x_2 / 5

After rescaling:
0.15 ≤ x_1,scaled ≤ 1        0 ≤ x_2,scaled ≤ 1
[Scatter plots of x_2 (# bedrooms) vs. x_1 (size in feet²), before and after rescaling]
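Dividing each feature by its maximum is a one-liner in NumPy; a sketch where the sample rows reuse the slide's ranges and are otherwise illustrative:

```python
import numpy as np

X = np.array([[2000.0, 5.0],
              [ 300.0, 0.0],
              [1000.0, 2.0]])  # columns: size in feet^2, # bedrooms

X_scaled = X / X.max(axis=0)   # x_j,scaled = x_j / max of feature j
```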
Mean normalization
300 ≤ x_1 ≤ 2000        0 ≤ x_2 ≤ 5

x_1 := (x_1 - μ_1) / (2000 - 300)        x_2 := (x_2 - μ_2) / (5 - 0)

After normalizing:
-0.18 ≤ x_1 ≤ 0.82        -0.46 ≤ x_2 ≤ 0.54
[Scatter plots of x_2 (# bedrooms) vs. x_1 (size in feet²), before and after normalization]
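Mean normalization subtracts each feature's mean μ_j and divides by its range (max - min); a sketch on illustrative data:

```python
import numpy as np

X = np.array([[2000.0, 5.0],
              [ 300.0, 0.0],
              [1000.0, 2.0]])

mu = X.mean(axis=0)                 # mu_j for each feature
span = X.max(axis=0) - X.min(axis=0)  # e.g. 2000 - 300 and 5 - 0
X_norm = (X - mu) / span            # x_j := (x_j - mu_j) / range of x_j
```

After this, each feature is centered at 0 and stays within [-1, 1].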
Z-score normalization
Using the mean μ_j and standard deviation σ_j of each feature:
300 ≤ x_1 ≤ 2000, σ_1 = 450        0 ≤ x_2 ≤ 5, σ_2 = 1.4

x_1 := (x_1 - μ_1) / σ_1        x_2 := (x_2 - μ_2) / σ_2

After normalizing:
-0.67 ≤ x_1 ≤ 3.1        -1.6 ≤ x_2 ≤ 1.9
[Scatter plots of x_2 (# bedrooms) vs. x_1 (size in feet²), before and after normalization, with the normal-distribution curve of each feature]
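Z-score normalization divides by the standard deviation σ_j instead of the range; a sketch on the same illustrative data (afterwards each column has mean 0 and standard deviation 1):

```python
import numpy as np

X = np.array([[2000.0, 5.0],
              [ 300.0, 0.0],
              [1000.0, 2.0]])

mu = X.mean(axis=0)        # mu_j
sigma = X.std(axis=0)      # sigma_j
X_norm = (X - mu) / sigma  # x_j := (x_j - mu_j) / sigma_j
```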
Feature scaling
Aim for about -1 ≤ x_j ≤ 1 for each feature x_j; ranges like -3 ≤ x_j ≤ 3 or -0.3 ≤ x_j ≤ 0.3 are also acceptable.

0 ≤ x_1 ≤ 3               okay, no rescaling needed
-2 ≤ x_2 ≤ 0.5            okay, no rescaling needed
-100 ≤ x_3 ≤ 100          too large: rescale
-0.001 ≤ x_4 ≤ 0.001      too small: rescale
98.6 ≤ x_5 ≤ 105          values too large: rescale
Practical Tips for
Linear Regression
Checking Gradient Descent for Convergence
Make sure gradient descent is working correctly
Objective: min over w, b of J(w, b). J(w, b) should decrease after every iteration.

[Learning curve: J(w, b) plotted against # iterations (0, 100, 200, 300, 400). The curve shows J(w, b) after 100 iterations, then after 200 iterations, and J(w, b) has likely converged by 400 iterations, i.e. gradient descent has found parameters w, b close to the global minimum. The # of iterations needed varies.]

Automatic convergence test:
Let ε ("epsilon") be 10⁻³. If J(w, b) decreases by ≤ ε in one iteration, declare convergence (we have found parameters w, b that get J close to its global minimum).
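The automatic convergence test can be folded into the training loop; a sketch with ε = 10⁻³ on illustrative toy data (the cost function here is the usual squared-error J(w, b)):

```python
import numpy as np

def cost(X, y, w, b):
    err = X @ w + b - y
    return (err @ err) / (2 * len(y))  # J(w, b)

# Toy data from a known line, so gradient descent has a clean minimum
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 2))
y = 2 * X[:, 0] - 3 * X[:, 1] + 1

w, b, alpha, epsilon = np.zeros(2), 0.0, 0.1, 1e-3
J_prev = cost(X, y, w, b)
for it in range(10000):
    err = X @ w + b - y
    w = w - alpha * (X.T @ err) / len(y)
    b = b - alpha * err.mean()
    J = cost(X, y, w, b)
    if J_prev - J <= epsilon:  # decreased by <= epsilon: declare convergence
        break
    J_prev = J
```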
Practical Tips for
Linear Regression
Choosing the
Learning Rate
Identify problems with gradient descent
If the plot of J(w, b) vs. # iterations oscillates or keeps increasing, either there is a bug in the code or the learning rate α is too large.
A common bug is a plus sign in the update, w_1 = w_1 + α d_1; use a minus sign: w_1 = w_1 - α d_1.
If α is too large, each step can overshoot the minimum, so J(w, b) can go up instead of down.
If α is too small, gradient descent still works but takes a lot more iterations to converge.
[Plots of J(w, b) vs. # iterations and of J(w, b) vs. parameter w_1 illustrating each case]
Values of α to try:
... 0.001    0.01    0.1    1 ...
For each candidate α, run gradient descent for a handful of iterations and plot J(w, b) against # iterations; pick the largest value (or something slightly smaller) for which J(w, b) decreases steadily.
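A sketch of trying several values of α: run a few iterations with each and keep the largest α whose cost decreases every iteration (the candidate values, helper name, and toy data are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 2))
y = 2 * X[:, 0] - 3 * X[:, 1] + 1

def run(alpha, iters=50):
    """Return the cost history of a short gradient-descent run."""
    w, b = np.zeros(2), 0.0
    history = []
    for _ in range(iters):
        err = X @ w + b - y
        w = w - alpha * (X.T @ err) / len(y)
        b = b - alpha * err.mean()
        history.append((err @ err) / (2 * len(y)))
    return history

best = None
for alpha in [0.001, 0.01, 0.1, 1.0]:          # candidate learning rates
    J = run(alpha)
    if all(j0 >= j1 for j0, j1 in zip(J, J[1:])):  # J decreased each step
        best = alpha                               # keep the largest that works
```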
Practical Tips for
Linear Regression
Feature Engineering
Feature engineering
f_{w,b}(x) = w_1 x_1 + w_2 x_2 + b
Feature engineering: using intuition to design new features, by transforming or combining the original features.
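A sketch of the idea with a hypothetical lot-size example (the frontage/depth names and numbers are illustrative, not from the slides): two raw features are combined into a new one, area = frontage × depth, which the model can then weight on its own:

```python
import numpy as np

# Hypothetical raw features: lot frontage and depth, in feet
frontage = np.array([60.0, 40.0, 80.0, 30.0])
depth    = np.array([100.0, 80.0, 120.0, 90.0])

# Engineered feature: land area, a combination of the original two
area = frontage * depth

# New design matrix with three features: x_1, x_2, and x_3 = x_1 * x_2
X = np.column_stack([frontage, depth, area])
```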
Practical Tips for
Linear Regression
Polynomial Regression
Polynomial regression
Fit a curve to price (y) as a function of size (x) by creating powers of x as additional features:
f_{w,b}(x) = w_1 x + w_2 x² + w_3 x³ + b
or, with just a quadratic term:
f_{w,b}(x) = w_1 x + w_2 x² + b
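Polynomial regression reuses the linear-regression machinery on engineered powers of x; a sketch fitting w_1 x + w_2 x² + b with least squares on toy data (note that engineered features like x² have very different ranges from x, so feature scaling matters when using gradient descent):

```python
import numpy as np

x = np.linspace(0, 4, 50)        # "size"
y = 1.0 + 2.0 * x + 0.5 * x**2   # toy quadratic "price"

# Engineered features: x and x^2, plus a ones column standing in for b
Xp = np.column_stack([x, x**2, np.ones_like(x)])
theta, *_ = np.linalg.lstsq(Xp, y, rcond=None)
w1, w2, b = theta
```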
Choice of features
Another option is a square-root feature, which keeps rising but flattens out as size grows:
f_{w,b}(x) = w_1 x + w_2 √x + b
[Plot of price (y) vs. size (x) with the fitted curve]