
Sequence-Aware Recommender

Systems

Massimo Quadrana, Pandora Media, USA


mquadrana@pandora.com

Paolo Cremonesi, Politecnico di Milano


paolo.cremonesi@polimi.it

Dietmar Jannach, AAU Klagenfurt, Austria


dietmar.jannach@aau.at
About today’s presenter(s)
• Paolo Cremonesi
– Politecnico di Milano, Italy

• Dietmar Jannach
– University of Klagenfurt, Austria

• Massimo Quadrana
– Pandora, Italy
2
Today’s tutorial based on
• M. Quadrana, P. Cremonesi, D. Jannach,
"Sequence-Aware Recommender Systems“
ACM Computing Surveys, 2018

• Link to paper:
bit.ly/sequence-aware-rs

3
About you?

4
Agenda
• 13:30 – 15:00 Part 1
– Introduction, Problem definition
• 15:00 – 15:30 Coffee break
• 15:30 - 16:15 Part 2
– Algorithms
• 16:15 – 16:45 Part 3
– Evaluation
• 16:45 – 17:00
– Closing, Discussion

5
Part I: Introduction
Recommender Systems
• A central part of our daily user experience
– They help us locate potentially interesting things
– They serve as filters in times of information overload
– They have an impact on user behavior and business

7
Recommendations everywhere

8
A field with a tradition
The last 18 years in Recsys research:

• 2000
recommendation = matrix completion
• 2010
recommendation = learning to rank

9
Common problem abstraction (1):
matrix completion
• Goal
– Learn missing ratings
• Quality assessment
– Error between true and estimated ratings

Item1 Item2 Item3 Item4 Item5


Alice 5 ? 4 4 ?
Paolo 3 1 ? 3 3
Susan ? 3 ? ? 5
Bob 3 ? 1 5 4
George ? 5 5 ? 1
10
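To make the matrix-completion abstraction concrete, here is a minimal sketch (not part of the original slides) that factorizes the small rating matrix above with plain stochastic gradient descent; the latent dimension, learning rate, and regularization weight are illustrative choices.

```python
import numpy as np

# Rating matrix from the slide; 0 marks a missing rating to be predicted.
R = np.array([
    [5, 0, 4, 4, 0],   # Alice
    [3, 1, 0, 3, 3],   # Paolo
    [0, 3, 0, 0, 5],   # Susan
    [3, 0, 1, 5, 4],   # Bob
    [0, 5, 5, 0, 1],   # George
], dtype=float)

n_users, n_items = R.shape
k = 2                                           # latent dimension (illustrative)
rng = np.random.default_rng(0)
P = rng.normal(scale=0.1, size=(n_users, k))    # user factors
Q = rng.normal(scale=0.1, size=(n_items, k))    # item factors

observed = np.argwhere(R > 0)
for epoch in range(200):
    for u, i in observed:
        err = R[u, i] - P[u] @ Q[i]
        P[u] += 0.01 * (err * Q[i] - 0.02 * P[u])   # SGD step with L2 regularization
        Q[i] += 0.01 * (err * P[u] - 0.02 * Q[i])

print(np.round(P @ Q.T, 2))   # estimated ratings, including the missing cells
```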
Common problem abstraction (2):
learning to rank
• Goal
– Learn relative ranking of items
• Quality assessment
– Error between true and estimated ranking

11
A field with a tradition

• But are we focusing on


the most important problems?

12
Real-world problem situations

• User intent
• Short-term intent/context vs. long term taste
• Order can matter
• Order matters (constraints)
• Interest drift
• Reminders
• Repeated purchases

13
User intent
• Our user searched and listened to
“Last Christmas” by Wham!
• Should we, …
– Play more songs by Wham!?
– More pop Christmas songs?
– More popular songs from the 1980s?
– Play more songs with
controversial user feedback?

14
User intent
• Knowing the user’s intention can be crucial

15
Short-term intent/context vs.
long term taste
• Here’s what the customer purchased during the
last weeks

• Now, the user returns to the shop and browses these items

16
Short-term intent/context vs.
long term taste
• What to recommend?
• Some plausible options
– Only shoes
– Mostly Nike shoes
– Maybe also some T-shirts

products purchased
in the past

products browsed
in the current session
17
Short-term intent/context vs.
long term taste
• Using the matrix completion formulation
– One trains a model based only on past actions
– Without the context, the algorithm will most probably recommend mostly (Nike) T-shirts and some trousers
– Is this what you expect?

products purchased
in the past

products browsed
in the current session
18
Order can matter
• Next track recommendation
• What to recommend next should suit the
previous tracks

19
Order matters (order constraints)
• Is it meaningful to recommend Star Wars III, if
the user has not seen the previous episodes?

20
Interest drift

Before having a family After having a family

21
Reminders
• Should we recommend items that the user
already knows?
• Amazon does

22
Repeated purchases
• When should we remind users (through recommendations)?

23
Algorithms depend on …
• The choice of the best recommender algorithm
depends on
– Goal (user task, application domain, context, …)
– Data available
– Target quality

24
Examples based on goal …
• Movie recommendation
– if you watched a SF movie …

25
Examples based on goal …
• Movie recommendation
– if you watched a SF movie …
– … you recommend another SF movie
• Algorithm: most similar item

26
Examples based on goal …
• On-line store recommendations
– if you bought a coffee maker …

27
Examples based on goal …
• On-line store recommendations
– if you bought a coffee maker …
– … you recommend cups or coffee beans (not
another coffee maker)
• Algorithm: frequently bought together
user who bought … also bought

28
Example based on data …
• If you have
– all items with many ratings
– no item attributes
• You need CF

• If you have
– no ratings
– all items with many attributes
• You need CBF

29
Examples for
sequence-aware algorithms …
• Goal
– next track recommendation, …

• Data
– new users, …

• Need for quality


– context, intent, …

30
Implications for research
• Many of the scenarios cannot be addressed by a
matrix completion or learning-to-rank problem
formulation
– No short-term context
– Only one single user-item interaction pair
– Often only one type of interaction (e.g., ratings)
– Rating timestamps sometimes exist, but might be
disconnected from actually experiencing/using the
item

31
Sequence-Aware Recommenders
• A family of recommenders that
– uses different input data,
– often bases the recommendations on certain types
of sequential patterns in the data,
– addresses the mentioned practical problem settings

32
Problem Characterization

33
Characterizing Sequence-Aware
Recommender Systems

34
Inputs
• Ordered (and time-stamped) set of user actions
– Known or anonymous
(beyond the session)

35
Inputs
• Actions are usually connected with items
– Exceptions: search terms,
category navigation, …

36
Inputs

(figure-only slides 37–40)
Inputs
• Different types of actions
– Item purchase/consumption, item view, add-to-
catalog, add-to-wish-list, …

41
Inputs
• Attributes of actions: user/item details, context
– Dwelling times, item discounts, etc.

42
Inputs
• Typical: Enriched clickstream data

43
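To make the input format concrete, a toy enriched clickstream log could look like the following; the field names are illustrative, not a standard schema.

```python
# Toy enriched clickstream: one record per user action, ordered by timestamp.
clickstream = [
    {"session": "s1", "user": None,  "item": 42, "action": "view",
     "timestamp": 1520000000, "dwell_secs": 35, "discount": 0.0},
    {"session": "s1", "user": None,  "item": 42, "action": "add-to-wish-list",
     "timestamp": 1520000040, "dwell_secs": None, "discount": 0.0},
    {"session": "s2", "user": "u17", "item": 7,  "action": "purchase",
     "timestamp": 1520003600, "dwell_secs": None, "discount": 0.1},
]
```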
Inputs: differences with traditional
RSs
• for each user, all sequences contain
one single action (item)

44
Inputs: differences with traditional
RSs (with time-stamp)
• for each user, there is
one single sequence
with several actions (items)

45
Output (1)
• One (or more) ordered list of items
• The list can have different interpretations,
based on goal, domain, application scenario

• Usual item-ranking
tasks
– list of alternatives
for a given item
– complements or
accessories

47
Output (2)
• An ordered list of items
• The list can have different interpretations,
based on goal, domain, application scenario

• Suggested sequence
of actions
– next-track music
recommendations

48
Output (3)
• An ordered list of items
• The list can have different interpretations,
based on goal, domain, application scenario

• Strict sequence
of actions
– course learning
recommendations

49
Typical computational tasks
• Find sequence-related patterns in the data, e.g.,
– co-occurrence patterns
– sequential patterns
– distance patterns
• Reasoning about order constraints
– weak and strong constraints
• Relate patterns with user profile and current
point in time
– e.g., items that match the current session

51
Abstract problem characterization:
Item-ranking or list-generation
• Some definitions
– U : users
– I : items
– L : ordered list of items of length k
– L* : set of all possible lists L of length up to k
– f(u, L) : utility function, with u ∈ U and L ∈ L*

L_u = argmax_{L ∈ L*} f(u, L),  for each u ∈ U

– Task: learn f(u, L) from sequences A of past user actions
53
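A toy sketch of the abstract formulation above: the hypothetical utility f(u, L) combines per-item relevance with a bonus for good transitions inside the list, and L_u is found by brute force over all candidate lists of length up to k; all scores are made up for illustration.

```python
from itertools import permutations

# Toy relevance of items for one user and pairwise transition bonuses
# (all numbers are made up for illustration).
relevance = {"a": 0.9, "b": 0.7, "c": 0.4}
transition = {("a", "b"): 0.3, ("b", "c"): 0.2}

def f(user, L):
    """Utility of an ordered list L: item relevance plus transition bonuses."""
    score = sum(relevance[i] for i in L)
    score += sum(transition.get((x, y), 0.0) for x, y in zip(L, L[1:]))
    return score

# L_u = argmax over all candidate lists of length up to k (here k = 2)
candidates = [list(p) for k in (1, 2) for p in permutations(relevance, k)]
best = max(candidates, key=lambda L: f("u", L))
print(best)   # e.g. ['a', 'b'], thanks to the a -> b transition bonus
```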
Abstract problem characterization
• Utility function is not limited to scoring
individual items
• The utility of entire lists can be assessed
– including, e.g., transition between objects,
fulfillment of order constraints, diversity aspects
• The design of the utility function depends on
the purpose of the system
– e.g., provide logical continuation, show alternatives,
show accessories, …
Jannach, D. and Adomavicius, G.: "Recommendations with a Purpose". In: Proceedings of the 10th ACM
Conference on Recommender Systems (RecSys 2016). Boston, Massachusetts, USA, 2016, pp. 7-10

54
On recommendation purposes
• Often, researchers are not explicit about the
purpose
– Traditionally, could be information filtering or
discovery, with conflicting goals
• Commonly used abstraction
– e.g., predict hidden rating
• For sequence-aware recommenders
– often: predict next (hidden) action, e.g., for a given
session beginning

55
Relation to other areas
• Implicit feedback recommender systems
– Sequence-aware recommenders are often built on
implicit feedback signals (action logs)
– Problem formulation is however not based on matrix
completion
• Context-aware recommender systems
– Sequence-aware recommenders often are special
forms of context-aware systems
– Here: Interactional context is relevant, which is only
implicitly defined through the user’s actions

56
Relation to other areas
• Time-Aware RSs
– Sequence-aware recommenders do not necessarily
need explicit timestamps
– Time-Aware RSs use explicit time
• e.g., to detect long-term user drifts
• Other:
– interest drift
– user-modeling

57
Categorization

58
Categorization of tasks
• Four main categories
– Context adaptation
– Trend detection
– Repeated recommendation
– Consideration of order constraints and sequential
patterns
• Notes
– Categories are not mutually exclusive
– All types of problems based on the same problem
characterization, but with different utility functions,
and using the data in different ways
59
Context adaptation
• Traditional context-aware recommenders are
often based on the representational context
– defined set of variables and observations, e.g.,
weather, time of the day etc.
• Here, the interactional context is relevant
– no explicit representation of the variables
– contextual situation has to be inferred from user
actions

60
How much past information is
considered?
• Last-N interactions based recommendation:
– Often used in Next-Point-Of-Interest
recommendation scenarios
– In many cases only the very last visited location is
considered
– Also: “Customers who bought … also bought”
• Reasons to limit oneself:
– Not more information available
– Previous information not relevant

61
How much past information is
considered?
• Session-based recommendation:
– Short-term only
– Only last sequence of actions of the current user is
known
– User might be anonymous
• Session-aware recommendation:
– Short-term + Long-term
– In addition, past sessions of the current user are known
– Allows for personalized session-based
recommendation
Quadrana, M., Karatzoglou, A., Hidasi, B., Cremonesi, P.: “Personalizing Session-based Recommendations with Hierarchical Recurrent Neural Networks”. Proceedings ACM RecSys 2017: 130-137
62
Session-based recommendation

Anonym 1

Anonym 2

Anonym 3

Time
63
Session-based recommendation

Soccer

Anonym 1

Cartoons

Anonym 2
NBA

Anonym 3

Time
64
Session-aware recommendation

Soccer

User 1

Cartoons

User 2
NBA

User 1

Time
65
Session-aware recommendation

Soccer

User 1

Sports!

NBA

User 1

Time
66
Session-aware recommendation

Soccer

User 1

Sports!

Cupcakes

User 1

Time
67
What to find?
• Next
• Alternatives
• Complements
• Continuations

68
What to pick
• One
• All

69
Trend detection
• Less explored than context adaptation
• Community trends:
– Consider the recent or seasonal popularity of items,
e.g., in the fashion domain and, in particular, in the
news domain
• Individual trends:
– E.g., natural interest drift
• Over time, because of influence of other people, because of
a recent purchase, because something new was discovered
(e.g., a new artist)
Jannach, D., Ludewig, M. and Lerche, L.: "Session-based Item Recommendation in E-Commerce: On Short-
Term Intents, Reminders, Trends, and Discounts". User-Modeling and User-Adapted Interaction, Vol. 27(3-
5). Springer, 2017, pp. 351-392
70
Repeated Recommendation
• Identifying repeated user behavior patterns:
– Recommendation of repeat purchases or actions
• E.g., ink for a printer, next app to open after call on mobile
– Patterns can be mined from the individual or the
community as a whole
• Repeated recommendations as reminders
– Remind users of things they found interesting in the past
• To remind them of things they might have forgotten
• As navigational shortcuts, e.g., in a decision situation
• Timing as an interesting question in both situations

71
Consideration of Order Constraints
and Observed Sequential Patterns
• Two types of sequentiality information
– External domain knowledge: strict or weak ordering
constraints
• Strict, e.g., sequence of learning courses
• Weak, e.g., when recommending sequels to movies
– Information that is mined from the user behavior
• Learn that one movie is always consumed after another
• Predict next web page, e.g., using sequential pattern mining
techniques

72
Review of existing works (1)
• 100+ papers reviewed
– Focus on last N interactions common
• But more emphasis on session-based / session-aware
approaches in recent years

73
Review of existing works (2)
• 100+ papers reviewed
– Very limited work for other problems:
• Repeated recommendation, trend detection, consideration
of constraints

74
Review of existing works (3)
• Application domains

75
Summary of first part
• Matrix completion abstraction not well suited
for many practical problems
• In reality, rich user interaction logs are available
• Different types of information can be derived
from the sequential information in the logs
– and used for special recommendation tasks, in
particular for the prediction of the next-action
• Coming next:
– Algorithms and evaluation

76
Part II: Algorithms
Agenda
• 13:30 – 15:00 Part 1
– Introduction, Problem definition
• 15:00 – 15:30 Coffee break
• 15:30 - 16:15 Part 2
– Algorithms
• 16:15 – 16:45 Part 3
– Evaluation
• 16:45 – 17:00
– Closing, Discussion

78
Taxonomy
• Sequence Learning
– Frequent Pattern Mining
– Sequence Modeling
– Distributed Item Representations
– Supervised Models with Sliding Window
• Sequence-aware Matrix Factorization
• Hybrids
– Factorized Markov Chains
– LDA/Clustering + sequence learning
• Others
– Graph-based, Discrete-optimization
79
Sequence Learning (SL)
• Useful in application domains where input data
has an inherent sequential nature
– Natural Language Processing
– Time-series prediction
– DNA modelling
– Sequence-Aware Recommendation

80
SL: Frequent Pattern Mining (FPM)
1. Discover user consumption patterns
• e.g. Association rules, (Contiguous) Sequential Patterns
2. Look for patterns matching partial transactions
3. Rank items by confidence of matched rules
• FPM Applications
– Page prefetching and recommendations
– Personalized FPM for next-item recommendation
– Next-app prediction

81
SL: Frequent Pattern Mining (FPM)
• Pros
– Easy to implement
– Explainable predictions

• Cons
– Choice of the minimum support/confidence
thresholds
– Data sparsity
– Limited scalability

82
SL: Sequence Modeling
• Sequences of past user actions as
time series with discrete observations
– Timestamps used only to order user actions
• SM aims to learn models from past observations
(user actions) to predict future ones
• Categories of SM models
– Markov Models
– Reinforcement Learning
– Recurrent Neural Networks

83
SL: Sequence Modeling
Markov Models
• Stochastic processes over discrete random
variables
– Finite history (= order of the model) → user actions depend on a limited # of most recent actions
• Applications
– Online shops
– Playlist generation
– Variable Order Markov Models for news
recommendation
– Hidden Markov Models for contextual next track
prediction
84
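A minimal sketch of a first-order Markov model over sessions (toy data, not from the tutorial): transition probabilities are estimated by counting consecutive item pairs, and candidate next items are ranked by the estimated probability of following the last observed item.

```python
from collections import Counter, defaultdict

# Toy training sessions (ordered item sequences)
sessions = [["a", "b", "c"], ["a", "b", "d"], ["b", "c", "d"]]

# First-order Markov model: count transitions between consecutive items
counts = defaultdict(Counter)
for s in sessions:
    for prev, nxt in zip(s, s[1:]):
        counts[prev][nxt] += 1

def recommend_next(last_item, n=2):
    """Rank candidate next items by estimated transition probability."""
    total = sum(counts[last_item].values())
    ranked = counts[last_item].most_common(n)
    return [(item, c / total) for item, c in ranked]

print(recommend_next("b"))   # e.g. [('c', 0.67), ('d', 0.33)]
```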
SL: Sequence Modeling
Reinforcement Learning
• Learn by sequential interactions with the
environment
• Generate recommendations (actions)
and collect user feedback (reward)
• Markov Decisions Processes (MDPs)
• Applications
– Online e-commerce services
– Sequential relationships between the attributes of
items explored in a user session

85
SL: Sequence Modeling
Recurrent Neural Networks (RNN)
• Distributed real-valued hidden state models
with non-linear dynamics
– Hidden state: latent representation of user state
within/across sessions
– Update the hidden state on the current input and its
previous value, then use it to predict the probability
for the next action
• Trained end-to-end with gradient descent
• Applications
– Next-click prediction with RNNs

86
SL:
Distributed Item Representations
• Dense, lower-dimensional representations of
the items
– derived from sequences of events
– preserve the sequential relationships between items
• Similar to latent factor models
– “similar” items are projected into similar vectors
– every item is associated with a real-valued
embedding vector
– its projection into a lower-dimensional space
– certain item transition properties are preserved
• e.g., co-occurrence of items in similar contexts
87
SL:
Distributed Item Representations
• Different approaches are possible
– Prod2Vec
– Latent Markov Embedding (LME)
– …

88
SL: Distributed Item
Representations - Prod2Vec
v_i = vector representing item i
i_k = item at step k of the session

• Learn vectors v_i in order to maximize the joint probability of having items i_{k-2}, i_{k-1}, i_{k+1}, i_{k+2} given item i_k
• Sliding window of size N
– e.g., N = 5 (items k–2, k–1, k, k+1, k+2)
89
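Prod2Vec-style item embeddings can be prototyped by treating sessions as sentences and items as words. The sketch below uses gensim's skip-gram Word2Vec as a stand-in for the original model; parameter names assume gensim 4.x, and note that gensim's window counts items per side, so window=2 corresponds to the size-5 window above.

```python
from gensim.models import Word2Vec

# Sessions as "sentences" of item IDs (toy data)
sessions = [
    ["i1", "i2", "i3", "i4", "i5"],
    ["i2", "i3", "i5", "i6"],
    ["i1", "i4", "i6", "i7"],
]

# Skip-gram maximizes the probability of the surrounding items
# i_{k-2}, i_{k-1}, i_{k+1}, i_{k+2} given item i_k (2 items per side).
model = Word2Vec(sessions, vector_size=32, window=2, sg=1, min_count=1, epochs=50)

print(model.wv.most_similar("i3", topn=3))   # items occurring in similar contexts
```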
SL: Supervised Learning with
Sliding Windows

90
Sequence-aware Matrix
Factorization
• Sequence information usually derived from
timestamps

→ Time-aware recommender systems

91
Hybrid Methods
• Sequence Learning +
Matrix Completion (CF or CBF)

92
Statistics

93
Going deep: FPM

94
FPM
• Association Rule Mining
– items co-occurring within the same sessions
• no check on order
• if you like A and B, you also like C (aka: learning to rank)
• Sequential Pattern Mining
– Items co-occurring in the same order
• no check on distance
• If you watch A and later watch B, you will later watch C
• Contiguous Sequential Pattern Mining
– Items co-occurring in the same order and at the same distance
• If you watch A and B one after the other, you will now watch C
95
FPM
• Two-step approach
1. Offline: rule mining
2. Online: rule matching (with current user session)
• Rules have
– Confidence: conditional probability
– Support: number of examples (main parameter)
• higher thresholds -> fewer rules
– few rules: difficult to find rules matching a session
• lower thresholds -> more rules
– many rules: noisy rules (low quality)

96
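A compact sketch of the two-step approach above, restricted to simple pairwise association rules A → B (no order check): offline, rules are kept if their support reaches a threshold; online, rules whose antecedent occurs in the current session are used to rank candidate items by confidence. Data and threshold are illustrative.

```python
from collections import Counter
from itertools import combinations

sessions = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "d"}]
MIN_SUPPORT = 2   # illustrative threshold

# 1. Offline: mine pairwise co-occurrence rules A -> B with support and confidence
item_count = Counter(i for s in sessions for i in s)
pair_count = Counter(p for s in sessions for p in combinations(sorted(s), 2))

rules = []
for (a, b), sup in pair_count.items():
    if sup >= MIN_SUPPORT:
        rules.append((a, b, sup / item_count[a]))   # rule a -> b with its confidence
        rules.append((b, a, sup / item_count[b]))   # rule b -> a with its confidence

# 2. Online: match rules against the current session and rank candidate items
current_session = {"a"}
scores = Counter()
for antecedent, consequent, conf in rules:
    if antecedent in current_session and consequent not in current_session:
        scores[consequent] = max(scores[consequent], conf)

print(scores.most_common())   # items ranked by confidence of matched rules
```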
FPM
• More constrained FPM methods produce fewer
rules and are computationally more complex
– CSPM -> SPM -> ARM
• Efficiency and number of mined rules decrease
with the number N of past user actions (items)
considered in the mining
– N=1 : users who like A also like B
– N=2 : users who like A and B also like C
– …

97
Going deep: RNN

98
Simple Recurrent Neural Network
• Hidden state is used to predict the output
– Computed from the current input and the previous hidden state

𝒚𝟏 𝒚𝟐

𝒉𝟎 𝒉𝟏 𝒉𝟐

𝒙𝟏 𝒙𝟐
99
Simple Recurrent Neural Network
• Hidden state is used to predict the output
– Computed from the current input and the previous hidden state
• Three weight matrices
– h_t = f(x_t^T W_x + h_{t-1}^T W_h + b_h)
– y_t = f(h_t^T W_y)
𝒚𝟏 𝒚𝟐

𝒉𝟎 𝒉𝟏 𝒉𝟐

𝒙𝟏 𝒙𝟐
100
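The recurrence above can be written out in a few lines of numpy. This is only an illustrative forward pass (no training); the dimensions, random weights, and the tanh/softmax choices for f are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n_items, n_hidden = 5, 8

Wx = rng.normal(scale=0.1, size=(n_items, n_hidden))    # input-to-hidden
Wh = rng.normal(scale=0.1, size=(n_hidden, n_hidden))   # hidden-to-hidden
Wy = rng.normal(scale=0.1, size=(n_hidden, n_items))    # hidden-to-output
bh = np.zeros(n_hidden)

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# Session as a sequence of item indices; x_t is the 1-of-N coded item
session = [2, 3, 0]
h = np.zeros(n_hidden)                      # h_0 = 0
for item in session:
    x = np.eye(n_items)[item]               # one-hot input x_t
    h = np.tanh(x @ Wx + h @ Wh + bh)       # h_t = f(x_t^T Wx + h_{t-1}^T Wh + b_h)
    y = softmax(h @ Wy)                     # y_t = f(h_t^T Wy), next-item probabilities
    print(np.argmax(y))                     # most probable next item at each step
```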
Simple Recurrent Neural Network
• Item of each event as a 1-of-N coded vector

• Example:
– Output at event 2, item 4: y2 = [0 0 0 1 0]
– Input at event 2, item 3: x2 = [0 0 1 0 0]
104
Distributed Item Representations:
Prod2Vec
v_i = vector representing item i
i_k = item at step k of the session

• Learn vectors v_i in order to maximize the joint probability of having items i_{k-2}, i_{k-1}, i_{k+1}, i_{k+2} given item i_k
• Sliding window of size N
– e.g., N = 5 (items k–2, k–1, k, k+1, k+2)


105
Simple Recurrent Neural Network
• Fairly effective in modeling sequences
– good for session-based problems
• Problems
– exploding gradients → careful regularization
– careful initialization
– how to manage attributes of events?
– how to manage multiple sessions?
106
Multiple sessions
• Naïve solution: concatenation
• Whole user history as a single sequence
• Trivial implementation but limited effectiveness
[Figure: a single RNN unrolled over the concatenation of Session 1 … Session N of User 1]

107
Multiple sessions
• Naïve solution: concatenation
• Whole user history as a single sequence
• Trivial implementation but limited effectiveness
[Figure: a single RNN unrolled over the concatenation of Session 1 … Session N of User 1]

→ Hierarchical RNN (HRNN)


108
HRNN Architecture

Session 1 Session 2

User 1

109
Architecture

Session 1

Within session hidden states:


• Initialization: s_{1,0} = 0
• Update: s_{1,n} = RNN_ses(i_{1,n}, s_{1,n-1})
Across session hidden states:
• Initialization: c_0 = 0
User 1

110
Architecture

Session 1

Across session:
• Update: c_m = RNN_usr(s_m, c_{m-1})
– s_m: last session-state, c_{m-1}: previous user-state
User 1

111
Architecture – HRNN Init

Session 1 Session 2

User 1

from the 2nd session on:


• Initialization: s_{m,0} = f(W_init · c_{m-1} + b_init)
• Update: s_{m,n} = RNN_ses(i_{m,n}, s_{m,n-1})

112
Architecture – HRNN All

Session 1 Session 2

User 1

from the 2nd session on:


• Initialization: s_{m,0} = f(W_init · c_{m-1} + b_init)
• Update: s_{m,n} = RNN_ses(i_{m,n}, s_{m,n-1}, c_{m-1})

113
Architecture - Complete

Session 1 Session 2

User 1

114
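A compact PyTorch sketch of the HRNN idea, not the authors' implementation: a session-level GRU processes the items of each session, a user-level GRU updates the user state from the final session state, and that user state initializes the next session (the "HRNN Init" variant). Embedding sizes, tanh as the initialization nonlinearity f, and the toy data are illustrative.

```python
import torch
import torch.nn as nn

n_items, d_item, d_ses, d_usr = 100, 32, 64, 64

item_emb = nn.Embedding(n_items, d_item)
rnn_ses = nn.GRUCell(d_item, d_ses)        # within-session RNN
rnn_usr = nn.GRUCell(d_ses, d_usr)         # across-session RNN
init_ses = nn.Linear(d_usr, d_ses)         # s_{m,0} = f(W_init c_{m-1} + b_init)
out = nn.Linear(d_ses, n_items)            # scores over the item catalogue

def run_user(sessions):
    """sessions: list of sessions, each a list of item indices for one user."""
    c = torch.zeros(1, d_usr)                       # user state c_0 = 0
    scores = []
    for m, session in enumerate(sessions):
        # first session starts from zero; later sessions start from the user state
        s = torch.zeros(1, d_ses) if m == 0 else torch.tanh(init_ses(c))
        for item in session:
            x = item_emb(torch.tensor([item]))
            s = rnn_ses(x, s)                       # s_{m,n} = RNN_ses(i_{m,n}, s_{m,n-1})
            scores.append(out(s))                   # next-item scores after each event
        c = rnn_usr(s, c)                           # c_m = RNN_usr(s_m, c_{m-1})
    return scores

scores = run_user([[3, 7, 1], [7, 2]])              # two sessions of the same user
print(scores[-1].argmax().item())                   # predicted next item
```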
Architecture - Complete

Session 1 Session 2

Two identical sessions from


different users will produce
different recommendations

User 1

115
Number of hidden layers
• One is enough

Add more layers if you have


• many sessions
or
• many interactions per session

116
Including attributes
Naïve solutions fail
[Diagram: the item ID one-hot vector and the feature vector are concatenated, fed through a hidden layer and f(), producing scores on items for the next ItemID]
117


Late-merging → Parallel RNNs
[Diagram: two parallel subnets, one fed with the item ID one-hot vector and one with the feature vector, each with its own hidden layer; the subnet outputs are combined with weights w_ID and w_features into a weighted output, and f() produces the scores on items for the next ItemID]
118
Training Parallel RNN
• Straight backpropagation fails → alternative training methods
• Train one subnet at a time and freeze the others (like in ALS!)
• Depending on the frequency of iteration
– Residual training (train fully)
– Alternating training (per epoch)
– Interleaving training (per mini-batch)

119
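A self-contained toy sketch of the alternating-training idea (one possible reading, not the authors' code): in each epoch only one subnet is left trainable and the others are frozen, analogous to ALS. Two linear layers stand in for the actual RNN subnets and the merge weights are omitted.

```python
import torch
import torch.nn as nn

# Two toy "subnets" standing in for the ID-based and feature-based RNN branches
id_subnet = nn.Linear(8, 4)
feat_subnet = nn.Linear(6, 4)
subnets = [id_subnet, feat_subnet]
opt = torch.optim.SGD([p for net in subnets for p in net.parameters()], lr=0.1)

x_id, x_feat = torch.randn(16, 8), torch.randn(16, 6)
target = torch.randn(16, 4)

for epoch in range(6):
    active = subnets[epoch % len(subnets)]          # alternating training: one subnet per epoch
    for net in subnets:
        for p in net.parameters():
            p.requires_grad_(net is active)         # freeze every other subnet
    pred = id_subnet(x_id) + feat_subnet(x_feat)    # merged output (merge weights omitted)
    loss = ((pred - target) ** 2).mean()
    opt.zero_grad()
    loss.backward()                                 # gradients only for the active subnet
    opt.step()
```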
Part III: Evaluation and Datasets
Agenda
• 13:30 – 15:00 Part 1
– Introduction, Problem definition
• 15:00 – 15:30 Coffee break
• 15:30 - 16:15 Part 2
– Algorithms
• 16:15 – 16:45 Part 3
– Evaluation
• 16:45 – 17:00
– Closing, Discussion

121
Traditional evaluation approaches
• Off-line
– evaluation metrics
• Error metrics: RMSE, MAE
• Classification metrics: Precision, Recall, …
• Ranking metrics: MAP, MRR, NDCG, …
– dataset partitioning (hold-out, leave-one-out, …)
– available datasets
• On-line evaluation
– user studies
– field test
Dataset partitioning
• Splitting the data into training and test sets
– event-level vs. session-level
[Figure: users × time (events) grid split into training and test, either at the event level or at the session level]
Dataset partitioning
• Community-level vs. user-level
[Figure: community-level vs. user-level partitioning of the users' event timelines into profile, training, and test portions]
Dataset partitioning
• session-based or session-aware task
→ use session-level partitioning
– better mimics real systems trained on past sessions

• session-based task
→ use session-level + user-level partitioning
– to test on new users
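A sketch of a session-level, time-based split, assuming each session carries a start timestamp: sessions starting after a pivot date go to the test set; for the user-level variant, all sessions of some held-out users are additionally removed from training. Data and pivot are toy values.

```python
from datetime import datetime

# Toy sessions: (user_id, session_start, [item ids]); timestamps are illustrative
sessions = [
    ("u1", datetime(2018, 3, 1), ["a", "b"]),
    ("u2", datetime(2018, 3, 5), ["c", "a", "d"]),
    ("u1", datetime(2018, 3, 20), ["b", "d"]),
    ("u3", datetime(2018, 3, 25), ["e"]),
]

pivot = datetime(2018, 3, 15)   # session-level, time-based split point
train = [s for s in sessions if s[1] < pivot]
test = [s for s in sessions if s[1] >= pivot]

# user-level variant for session-based tasks: also hold out all sessions of some users
held_out_users = {"u3"}
train = [s for s in train if s[0] not in held_out_users]
print(len(train), len(test))
```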
Split size
• Fixed
– e.g., 80% training / 20% test

• Time rolling
– Move the time-split forward in time
Target Items
• Sequence agnostic prediction
– e.g., similar to matrix completion
• Given-N next item prediction
– e.g., next track prediction, predict a sequence of events
• Given-N next item prediction with look-ahead
• Given-N next item prediction with look-back
– e.g., how many events are required to predict a final product

[Figure: target items highlighted along the time (events) axis for each prediction scenario]

• In some scenarios, the same item can be consumed many times
– e.g., next track recommendation, repeated recommendations, …
Evaluation metrics
• Classification metrics
– Precision, Recall, …
– good for context-adaptation (traditional matrix-completion task)
• Ranking metrics
– MAP, NDCG, MRR, …
– good with fixed-sized splits
• Multi-metrics
– Ranking + Others (Diversity)
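A sketch of two ranking metrics commonly used for next-item prediction, Recall@k and MRR@k, computed over pairs of (ranked recommendation list, hidden next item); the example data is made up.

```python
def recall_at_k(recs, target, k):
    """1 if the hidden next item appears in the top-k recommendations, else 0."""
    return int(target in recs[:k])

def mrr_at_k(recs, target, k):
    """Reciprocal rank of the hidden next item within the top-k, 0 if absent."""
    return 1.0 / (recs.index(target) + 1) if target in recs[:k] else 0.0

# Each test case: ranked recommendation list + the item the user actually chose next
cases = [(["a", "b", "c", "d"], "b"),
         (["c", "a", "d", "b"], "d"),
         (["b", "c", "a", "d"], "e")]
k = 3
print(sum(recall_at_k(r, t, k) for r, t in cases) / len(cases))   # Recall@3
print(sum(mrr_at_k(r, t, k) for r, t in cases) / len(cases))      # MRR@3
```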
Datasets
Name Domain Users Items Events Sessions
Amazon EC 20M 6M 143M
Epinions EC 40k 110k 181k
Ta-feng EC 32k 24k 829k
TMall EC 1k 10k 5k
Retailrocket EC 1.4M 235k 2.7M
Microsoft WWW 27k 8k 55k 32k
MSNBC WWW 1.3M 1k 476k
Delicious WWW 8.8k 3.3k 60k 45k
CiteULike WWW 53k 1.8k 2.1M 40k
Outbrain WWW 700M 560 2B
139
Datasets
Name Domain Users Items Events Sessions
AOL Query 650k 17M 2.5M
Adressa News 15k 923 2.7M
Foursquare_2 POI 225k 22.5M
Gowalla_1 POI 54k 367k 4M
Gowalla_2 POI 196k 6.4M
30Music Music 40k 5.6M 31M 2.7M

140
Thank you for your attention

Massimo Quadrana, Pandora Media, USA


mquadrana@pandora.com

Paolo Cremonesi, Politecnico di Milano


paolo.cremonesi@polimi.it

Dietmar Jannach, AAU Klagenfurt, Austria


dietmar.jannach@aau.at
