DeepLearning.AI makes these slides available for educational purposes. You may not use or distribute
these slides for commercial purposes. You may make copies of these slides and use or distribute them for
educational purposes as long as you cite DeepLearning.AI as the source of the slides.
Making recommendations
Predicting movie ratings
User rates movies using one to five stars

Movie                   Alice(1)  Bob(2)  Carol(3)  Dave(4)
Love at last            5         5       0         0
Romance forever         5         ?       ?         0
Cute puppies of love    ?         4       0         ?
Nonstop car chases      0         0       5         4

n_u = no. of users
n_m = no. of movies
Andrew Ng
Collaborative Filtering
For user j: Predict user j's rating for movie i as w(j) · x(i) + b(j)
Cost function
Notation:
r(i,j) = 1 if user j has rated movie i (0 otherwise)
y(i,j) = rating given by user j on movie i (if defined)
w(j), b(j) = parameters for user j
x(i) = feature vector for movie i
Cost function
To learn parameters w(j), b(j) for user j:

$$J\left(w^{(j)}, b^{(j)}\right) = \frac{1}{2} \sum_{i:\,r(i,j)=1} \left(w^{(j)} \cdot x^{(i)} + b^{(j)} - y^{(i,j)}\right)^2 + \frac{\lambda}{2} \sum_{k=1}^{n} \left(w_k^{(j)}\right)^2$$
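The per-user cost can be sketched in plain Python as follows. This is a minimal illustration, not the course's implementation: the function name `user_cost`, the two-feature movie vectors, and the parameter values are all hypothetical.

```python
def user_cost(w, b, X, y_col, r_col, lam):
    """Regularized squared-error cost J(w(j), b(j)) for a single user j.

    w     : list of n weights for this user
    b     : bias for this user
    X     : list of movie feature vectors x(i), each of length n
    y_col : y(i,j) for every movie i (ignored where r(i,j) = 0)
    r_col : r(i,j) for every movie i (1 if rated, 0 otherwise)
    lam   : regularization strength lambda
    """
    squared_error = 0.0
    for x_i, y_i, r_i in zip(X, y_col, r_col):
        if r_i == 1:  # only movies the user has rated contribute
            prediction = sum(w_k * x_k for w_k, x_k in zip(w, x_i)) + b
            squared_error += (prediction - y_i) ** 2
    regularization = sum(w_k ** 2 for w_k in w)
    return 0.5 * squared_error + 0.5 * lam * regularization


# Hypothetical movie features (illustrative values only).
X = [[0.9, 0.0], [1.0, 0.1], [0.99, 0.0], [0.1, 1.0]]
y_user = [5, 5, None, 0]   # None marks an unrated movie
r_user = [1, 1, 0, 1]

cost_no_reg = user_cost([5.0, 0.0], 0.0, X, y_user, r_user, lam=0.0)
cost_with_reg = user_cost([5.0, 0.0], 0.0, X, y_user, r_user, lam=1.0)
```

Only rated movies enter the squared-error sum, while the regularization term depends on the weights alone.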
Collaborative Filtering
To learn x(i):

$$J\left(x^{(i)}\right) = \frac{1}{2} \sum_{j:\,r(i,j)=1} \left(w^{(j)} \cdot x^{(i)} + b^{(j)} - y^{(i,j)}\right)^2 + \frac{\lambda}{2} \sum_{k=1}^{n} \left(x_k^{(i)}\right)^2$$
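              j=1   j=2   j=3
i=1  Movie1   5     5     ?

Cost function to learn w(1), b(1), …, w(n_u), b(n_u):

$$\min_{w^{(1)}, b^{(1)}, \dots, w^{(n_u)}, b^{(n_u)}} \; \frac{1}{2} \sum_{j=1}^{n_u} \sum_{i:\,r(i,j)=1} \left(w^{(j)} \cdot x^{(i)} + b^{(j)} - y^{(i,j)}\right)^2 + \frac{\lambda}{2} \sum_{j=1}^{n_u} \sum_{k=1}^{n} \left(w_k^{(j)}\right)^2$$

Cost function to learn x(1), …, x(n_m):

$$\min_{x^{(1)}, \dots, x^{(n_m)}} \; \frac{1}{2} \sum_{i=1}^{n_m} \sum_{j:\,r(i,j)=1} \left(w^{(j)} \cdot x^{(i)} + b^{(j)} - y^{(i,j)}\right)^2 + \frac{\lambda}{2} \sum_{i=1}^{n_m} \sum_{k=1}^{n} \left(x_k^{(i)}\right)^2$$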
Gradient Descent
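Linear regression (course 1):
repeat {
    $w_i = w_i - \alpha \frac{\partial}{\partial w_i} J(w, b)$
    $b = b - \alpha \frac{\partial}{\partial b} J(w, b)$
}
Collaborative filtering:
repeat {
    $w_i^{(j)} = w_i^{(j)} - \alpha \frac{\partial}{\partial w_i^{(j)}} J(w, b, x)$
    $b^{(j)} = b^{(j)} - \alpha \frac{\partial}{\partial b^{(j)}} J(w, b, x)$
    $x_k^{(i)} = x_k^{(i)} - \alpha \frac{\partial}{\partial x_k^{(i)}} J(w, b, x)$
}
Parameters: w, b, x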
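The collaborative filtering update can be sketched in plain Python. This is an illustrative sketch under assumptions, not the course's code: it takes J(w, b, x) to be the squared error over rated entries plus both regularization terms, derives the gradients by hand, and uses hypothetical names (`predict`, `cost`, `gradient_step`), two features per movie, and arbitrary values for alpha and lambda.

```python
import random


def predict(w_j, b_j, x_i):
    """Predicted rating w(j) . x(i) + b(j)."""
    return sum(wk * xk for wk, xk in zip(w_j, x_i)) + b_j


def cost(W, b, X, Y, R, lam):
    """J(w, b, x): squared error over rated entries plus regularization."""
    err = 0.0
    for i in range(len(X)):
        for j in range(len(W)):
            if R[i][j] == 1:
                err += (predict(W[j], b[j], X[i]) - Y[i][j]) ** 2
    reg = sum(v * v for row in W for v in row) + sum(v * v for row in X for v in row)
    return 0.5 * err + 0.5 * lam * reg


def gradient_step(W, b, X, Y, R, lam, alpha):
    """One simultaneous gradient-descent update of w, b and x (in place)."""
    n = len(X[0])
    dW = [[lam * v for v in row] for row in W]   # regularization part of dJ/dw
    dX = [[lam * v for v in row] for row in X]   # regularization part of dJ/dx
    db = [0.0] * len(b)
    for i in range(len(X)):
        for j in range(len(W)):
            if R[i][j] == 1:
                e = predict(W[j], b[j], X[i]) - Y[i][j]
                for k in range(n):
                    dW[j][k] += e * X[i][k]
                    dX[i][k] += e * W[j][k]
                db[j] += e
    # All gradients were computed from the old values; now apply them.
    for j in range(len(W)):
        for k in range(n):
            W[j][k] -= alpha * dW[j][k]
        b[j] -= alpha * db[j]
    for i in range(len(X)):
        for k in range(n):
            X[i][k] -= alpha * dX[i][k]


random.seed(0)
n_features = 2
# Ratings for four movies and four users; None marks "?".
Y = [[5, 5, 0, 0], [5, None, None, 0], [None, 4, 0, None], [0, 0, 5, 4]]
R = [[0 if y is None else 1 for y in row] for row in Y]
W = [[random.uniform(-0.1, 0.1) for _ in range(n_features)] for _ in range(4)]
X = [[random.uniform(-0.1, 0.1) for _ in range(n_features)] for _ in range(4)]
b = [0.0] * 4

initial_cost = cost(W, b, X, Y, R, lam=0.1)
for _ in range(200):
    gradient_step(W, b, X, Y, R, lam=0.1, alpha=0.01)
final_cost = cost(W, b, X, Y, R, lam=0.1)
```

Unlike linear regression, the movie features x are updated alongside w and b, so all three sets of gradients are computed before any parameter changes.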
Collaborative Filtering
Binary labels: favs, likes and clicks
Binary labels
Example applications
1. Did user j purchase an item after being shown?
2. Did user j fav/like an item?
3. Did user j spend at least 30 seconds with an item?
4. Did user j click on an item?
Meaning of ratings:
1 - engaged after being shown item
0 - did not engage after being shown item
? - item not yet shown
From regression to binary classification
Previously:
Predict y(i,j) as w(j) · x(i) + b(j)
For binary labels:
Predict that the probability of y(i,j) = 1
is given by g(w(j) · x(i) + b(j)),
where $g(z) = \frac{1}{1 + e^{-z}}$
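The binary prediction can be sketched as below. The function names and the example parameter values are hypothetical; the sigmoid is written in a branch form only to avoid floating-point overflow for large negative z.

```python
import math


def sigmoid(z):
    """g(z) = 1 / (1 + e^(-z)), computed without overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_probability(w_j, b_j, x_i):
    """Estimated probability that y(i,j) = 1 for user j and item i."""
    z = sum(wk * xk for wk, xk in zip(w_j, x_i)) + b_j
    return sigmoid(z)


# Hypothetical parameters and features: z = 2.0 - 0.5 + 0.5 = 2.0
p = predict_probability([2.0, -1.0], 0.5, [1.0, 0.5])
```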
Cost function for binary application
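Previous cost function:

$$\frac{1}{2} \sum_{(i,j):\,r(i,j)=1} \left(w^{(j)} \cdot x^{(i)} + b^{(j)} - y^{(i,j)}\right)^2 + \frac{\lambda}{2} \sum_{i=1}^{n_m} \sum_{k=1}^{n} \left(x_k^{(i)}\right)^2 + \frac{\lambda}{2} \sum_{j=1}^{n_u} \sum_{k=1}^{n} \left(w_k^{(j)}\right)^2$$

Loss for binary labels y(i,j):

$$f_{(w,b,x)}(x) = g\left(w^{(j)} \cdot x^{(i)} + b^{(j)}\right)$$

Loss for single example:

$$L\left(f_{(w,b,x)}(x), y^{(i,j)}\right) = -y^{(i,j)} \log\left(f_{(w,b,x)}(x)\right) - \left(1 - y^{(i,j)}\right) \log\left(1 - f_{(w,b,x)}(x)\right)$$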
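As a rough sketch, the single-example loss can be evaluated and accumulated over the shown pairs as follows. Summing the loss over all pairs with r(i,j) = 1 is an assumption made here for illustration, and the function names and toy data are hypothetical.

```python
import math


def sigmoid(z):
    """g(z) = 1 / (1 + e^(-z)), computed without overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def binary_loss(f, y):
    """Loss for one example: -y log(f) - (1 - y) log(1 - f)."""
    return -y * math.log(f) - (1 - y) * math.log(1 - f)


def binary_cost(W, b, X, Y, R):
    """Sum of the single-example loss over all pairs with r(i,j) = 1."""
    total = 0.0
    for i, x_i in enumerate(X):
        for j, w_j in enumerate(W):
            if R[i][j] == 1:
                z = sum(wk * xk for wk, xk in zip(w_j, x_i)) + b[j]
                total += binary_loss(sigmoid(z), Y[i][j])
    return total


# Toy data: 2 items, 2 users; None marks an item not yet shown ("?").
W = [[0.0, 0.0], [0.0, 0.0]]
b = [0.0, 0.0]
X = [[1.0, 0.0], [0.0, 1.0]]
Y = [[1, 0], [None, 1]]
R = [[1, 1], [0, 1]]
total_cost = binary_cost(W, b, X, Y, R)
```

With all parameters at zero every prediction is 0.5, so each of the three shown pairs contributes log 2 to the total.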
Recommender Systems
implementation
Mean normalization
Users who have not rated any movies

Movie                   Alice(1)  Bob(2)  Carol(3)  Dave(4)  Eve(5)
Love at last            5         5       0         0        ?
Romance forever         5         ?       ?         0        ?
Cute puppies of love    ?         4       0         ?        ?
Nonstop car chases      0         0       5         4        ?
Swords vs. karate       0         0       5         ?        ?

Ratings matrix:
5 5 0 0 ?
5 ? ? 0 ?
? 4 0 ? ?
0 0 5 4 ?
0 0 5 0 ?
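$$\min_{\substack{w^{(1)}, \dots, w^{(n_u)} \\ b^{(1)}, \dots, b^{(n_u)} \\ x^{(1)}, \dots, x^{(n_m)}}} \; \frac{1}{2} \sum_{(i,j):\,r(i,j)=1} \left(w^{(j)} \cdot x^{(i)} + b^{(j)} - y^{(i,j)}\right)^2 + \frac{\lambda}{2} \sum_{j=1}^{n_u} \sum_{k=1}^{n} \left(w_k^{(j)}\right)^2 + \frac{\lambda}{2} \sum_{i=1}^{n_m} \sum_{k=1}^{n} \left(x_k^{(i)}\right)^2$$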
Mean Normalization
User 5 (Eve):
Prediction for movie i: w(5) · x(i) + b(5) + μ_i, where μ_i is the mean rating of movie i
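Mean normalization can be sketched as follows, using the rows of the ratings matrix from the slide. The function names are hypothetical, and the sketch assumes that a user with no ratings ends up with w = 0 and b = 0, so the prediction for that user reduces to the movie mean μ_i.

```python
def movie_means(Y, R):
    """Mean rating of each movie, taken over rated entries only."""
    means = []
    for y_row, r_row in zip(Y, R):
        rated = [y for y, r in zip(y_row, r_row) if r == 1]
        means.append(sum(rated) / len(rated) if rated else 0.0)
    return means


def normalize(Y, R, means):
    """Subtract each movie's mean from its rated entries (unrated stay 0)."""
    return [[(y - mu) if r == 1 else 0.0 for y, r in zip(y_row, r_row)]
            for y_row, r_row, mu in zip(Y, R, means)]


# Ratings matrix rows as listed on the slide; None marks "?".
Y = [[5, 5, 0, 0, None],
     [5, None, None, 0, None],
     [None, 4, 0, None, None],
     [0, 0, 5, 4, None],
     [0, 0, 5, 0, None]]
R = [[0 if y is None else 1 for y in row] for row in Y]

mu = movie_means(Y, R)
Y_norm = normalize(Y, R, mu)

# Assumed parameters for a user with no ratings (Eve): w = 0, b = 0.
w_eve, b_eve = [0.0, 0.0], 0.0
x_any = [0.3, 0.7]  # hypothetical movie features; irrelevant when w = 0
eve_predictions = [sum(wk * xk for wk, xk in zip(w_eve, x_any)) + b_eve + m
                   for m in mu]
```

Training then runs on `Y_norm` instead of `Y`, and μ_i is added back at prediction time.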
Recommender Systems
implementational detail
TensorFlow implementation
Derivatives in ML
Gradient descent algorithm
Repeat until convergence
Learning rate
Derivative
[Plot: horizontal axis from -0.5 to 2.5, vertical axis from 0 to 3]
$J = (wx - 1)^2$
Fix b = 0 for this example
tf.Variables are the parameters we want to optimize
Auto Diff / Auto Grad: tape.gradient computes $\frac{\partial}{\partial w} J(w)$

import tensorflow as tf

w = tf.Variable(3.0)  # tf.Variables are the parameters we want to optimize
x = 1.0
y = 1.0  # target value
alpha = 0.01
iterations = 30
for iter in range(iterations):
    # Use TensorFlow's GradientTape to record the steps
    # used to compute the cost J, to enable auto differentiation.
    with tf.GradientTape() as tape:
        fwb = w*x  # f(x), with b fixed at 0
        costJ = (fwb - y)**2
    # Use the gradient tape to calculate the gradients
    # of the cost with respect to the parameter w.
    [dJdw] = tape.gradient( costJ, [w] )
    # Run one step of gradient descent by updating the value of w.
    w.assign_add(-alpha * dJdw)
Implementation in TensorFlow
Gradient descent algorithm
Repeat until convergence

import tensorflow as tf
from tensorflow import keras

# Instantiate an optimizer.
optimizer = keras.optimizers.Adam(learning_rate=1e-1)
iterations = 200
for iter in range(iterations):
    # Use TensorFlow's GradientTape
    # to record the operations used to compute the cost
    with tf.GradientTape() as tape:
        ...  # cost computation (not shown here)

Dataset credit: Harper and Konstan. 2015. The MovieLens Datasets: History and Context
Collaborative Filtering
Find item k with x(k) similar to x(i), i.e. with smallest distance

$$\sum_{l=1}^{n} \left(x_l^{(k)} - x_l^{(i)}\right)^2 = \left\| x^{(k)} - x^{(i)} \right\|^2$$
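Finding related items by smallest squared distance can be sketched as below. The function names and the feature values are hypothetical; in practice the vectors would be the learned x(i).

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


def most_similar(features, i, top_k=1):
    """Indices of the top_k items whose features are closest to item i."""
    others = [k for k in range(len(features)) if k != i]
    others.sort(key=lambda k: squared_distance(features[k], features[i]))
    return others[:top_k]


# Hypothetical learned feature vectors for four items.
features = [[0.9, 0.0], [1.0, 0.1], [0.1, 1.0], [0.0, 0.9]]
nearest = most_similar(features, 0)
two_nearest = most_similar(features, 0, top_k=2)
```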
Limitations of Collaborative Filtering
Cold start problem. How to
Content-based Filtering
Collaborative filtering vs Content-based filtering
Collaborative filtering:
Recommend items to you based on the ratings of users who
gave ratings similar to yours
Content-based filtering:
Recommend items to you based on features of the user and the item,
to find a good match
r(i,j) = 1 if user j has rated item i
y(i,j) = rating given by user j on item i (if defined)
Examples of user and item features
User features ($x_u^{(j)}$ for user j):
• Age
• Gender
• Country
• Movies watched
• Average rating per genre
• …
Movie features ($x_m^{(i)}$ for movie i):
• Year
• Genre/Genres
• Reviews
• Average rating
• …
The vector sizes of $x_u^{(j)}$ and $x_m^{(i)}$ could be different.
Content-based filtering: Learning to match
Predict rating of user j on movie i as $v_u^{(j)} \cdot v_m^{(i)}$,
where $v_u^{(j)}$ is computed from $x_u^{(j)}$ and $v_m^{(i)}$ is computed from $x_m^{(i)}$
Content-based Filtering
User network: x_u → 128 → 64 → 32 units → v_u
Movie network: x_m → 256 → 128 → 32 units → v_m
Prediction: g(v_u · v_m) to predict the probability that y(i,j) is 1
Neural network architecture
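[Diagram: x_u feeds the user network, which outputs v_u; x_m feeds the movie network, which outputs v_m; v_u and v_m are combined to give the prediction]
Cost function:

$$J = \sum_{(i,j):\,r(i,j)=1} \left(v_u^{(j)} \cdot v_m^{(i)} - y^{(i,j)}\right)^2 + \text{NN regularization term}$$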
Learned user and item vectors:
$v_u^{(j)}$ is a vector of length 32 that describes user j with features $x_u^{(j)}$
$v_m^{(i)}$ is a vector of length 32 that describes movie i with features $x_m^{(i)}$
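Given user and movie vectors, the prediction and the squared-error part of the cost can be sketched as follows. The vectors here are hypothetical and have length 2 rather than 32 to keep the example readable; the networks that would produce them and the NN regularization term are omitted.

```python
def dot(a, b):
    """Dot product v_u . v_m."""
    return sum(p * q for p, q in zip(a, b))


def content_based_cost(V_u, V_m, Y, R):
    """Sum of (v_u(j) . v_m(i) - y(i,j))^2 over pairs with r(i,j) = 1.

    The NN regularization term is left out of this sketch.
    """
    total = 0.0
    for i, v_m in enumerate(V_m):
        for j, v_u in enumerate(V_u):
            if R[i][j] == 1:
                total += (dot(v_u, v_m) - Y[i][j]) ** 2
    return total


# Hypothetical vectors for 2 users and 2 movies.
V_u = [[1.0, 0.0], [0.0, 1.0]]
V_m = [[5.0, 0.0], [0.0, 4.0]]
Y = [[5, 0], [None, 3]]   # None marks an unrated pair
R = [[1, 1], [0, 1]]

prediction = dot(V_u[1], V_m[1])          # user 2 on movie 2
cost_value = content_based_cost(V_u, V_m, Y, R)
```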
Advanced implementation
Recommending from a large catalogue
How to efficiently find recommendations from
a large set of items?
• Movies 1000+
• Ads 1m+
• Songs 10m+
• Products 10m+
[Diagram: the user and item networks map x_u and x_m to v_u and v_m, which produce the predictions]
Two steps: Retrieval & Ranking
Retrieval:
• Generate large list of plausible item candidates
  e.g.
  1) For each of the last 10 movies watched by the user,
     find 10 most similar movies
Ranking:
• Take list retrieved and rank using learned model predictions
[Diagram: the user and item networks map x_u and x_m to v_u and v_m, which produce the prediction]
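The two steps can be sketched as below. This is a simplified illustration with hypothetical names and data: retrieval uses squared distance between item vectors to find similar items, already-watched items are dropped from the candidates (an assumption of this sketch), and ranking scores each candidate with the dot product between the user vector and the item vector.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


def dot(a, b):
    """Dot product, used here as the model's score for a user/item pair."""
    return sum(p * q for p, q in zip(a, b))


def retrieve(item_vecs, watched, per_item=10):
    """For each watched item, collect its per_item most similar items."""
    candidates = set()
    for w in watched:
        others = [k for k in item_vecs if k != w]
        others.sort(key=lambda k: sq_dist(item_vecs[k], item_vecs[w]))
        candidates.update(others[:per_item])
    return candidates - set(watched)  # assumption: skip items already watched


def rank(candidates, user_vec, item_vecs):
    """Order candidates by predicted score, highest first."""
    return sorted(candidates, key=lambda k: dot(user_vec, item_vecs[k]),
                  reverse=True)


# Hypothetical item vectors and user vector.
item_vecs = {"A": [1.0, 0.0], "B": [0.9, 0.1], "C": [0.0, 1.0],
             "D": [0.1, 0.9], "E": [0.5, 0.5]}
user_vec = [1.0, 0.2]

candidates = retrieve(item_vecs, watched=["A"], per_item=2)
ranked = rank(candidates, user_vec, item_vecs)
```

Retrieval only needs item-to-item distances, which do not depend on the user, while ranking runs the model on the much smaller candidate list.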
Retrieval step
Advanced implementation
Ethical use of recommender systems
What is the goal of the recommender system?
Recommend:
Ethical considerations with recommender systems
Other problematic cases:
Content-based Filtering
TensorFlow Implementation
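import tensorflow as tf

# User network: maps user features x_u to a 32-dimensional vector v_u.
user_NN = tf.keras.models.Sequential([
    tf.keras.layers.Dense(256, activation='relu'),
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Dense(32)
])

# Item network: maps item features x_m to a 32-dimensional vector v_m.
item_NN = tf.keras.models.Sequential([
    tf.keras.layers.Dense(256, activation='relu'),
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Dense(32)
])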