
BEARS: TOWARDS AN EVALUATION FRAMEWORK FOR

BANDIT-BASED INTERACTIVE RECOMMENDER SYSTEMS


Andrea Barraza-Urbina 1, Georgia Koutrika 2, Mathieu d’Aquin 1 and Conor Hayes 1

Offline Evaluation for Recommender Systems Workshop (REVEAL’18)


The ACM Conference on Recommender Systems 2018 (RecSys’18)
Vancouver, Canada
1 Data Science Institute, The Insight Centre for Data Analytics, National University of Ireland Galway, Galway, Ireland
2 Athena Research Center, Athens, Greece

October 7th, 2018


AGENDA

Introduction

BEARS Evaluation Framework

Conclusion and Future Work

VIDEO OF THE PRESENTATION: https://youtu.be/GIH_ArJ-ylk


AGENDA

Introduction

BEARS Evaluation Framework

Conclusion and Future Work


EXPLOITATION VS. EXPLORATION IN RECSYS

Exploitation
• Exploit the known user model.
• Maximize the user's immediate satisfaction.
• Classical RecSys approaches.
• Accuracy-based metrics.

Exploration
• Explore other preferences the user might have.
• Adapt to user taste drift.
• Maximize long-term user satisfaction.
• Face user/item cold-start (Active Learning).
• Face the Filter Bubble.
• Beyond-accuracy metrics.

4
REINFORCEMENT LEARNING (RL)

• Branch of Machine Learning.
• Learn by interacting with the environment, which provides reward signals.
• Trial-and-error search / learning.

[Figure: an agent that "tries to walk" and learns from reinforcement.]

5
RECSYS AS REINFORCEMENT LEARNING

RL can offer a proper framework to represent RecSys that learn from evaluative
feedback when interacting with an uncertain environment.

• Agent = RS Agent: selects an action a_t, i.e. a recommendation (a single item or a list).
• Environment = RS Environment: the target user, who returns the next state s_{t+1}
  and a reward r_{t+1} in the form of explicit/implicit user feedback.
• At each step, the agent observes the current state s_t and reward r_t before acting.
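As a minimal sketch of one interaction step under this mapping (it uses the agent/environment interface that appears later in the Base Experiment pseudo-code; any other detail is an illustrative assumption, not BEARS code):

def interaction_step(agent, user_env):
    action = agent.step()                            # a_t: recommend a single item or a list
    observation, reward = user_env.step(action)      # s_{t+1}: user state, r_{t+1}: user feedback
    agent.observation(observation, action, reward)   # update the agent's user model
    return observation, reward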

6
THE MULTI-ARMED BANDIT PROBLEM (MAB)

Recent RecSys solutions have proposed to address the exploration-exploitation
trade-off by using Multi-Armed Bandits.

• Each arm (action) pays off according to an unknown reward probability
  distribution with mean mu_k.
• Over T trials, the agent must decide at each step whether to exploit or explore,
  so as to maximize the total payoff.
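As a minimal illustration of this setting (all names and numbers here are assumptions for illustration, not BEARS code), a K-armed bandit with unknown Bernoulli reward probabilities could be simulated as:

import random

class BernoulliBandit:
    """K arms, each paying reward 1 with an unknown probability (its mean reward)."""

    def __init__(self, means):
        self.means = means                    # hidden from the agent

    def pull(self, arm):
        # Reward is 1 with probability means[arm], else 0 (e.g. click / no click).
        return 1 if random.random() < self.means[arm] else 0

# T trials with a uniformly random policy; a bandit algorithm aims to do better
# by balancing exploration and exploitation.
bandit = BernoulliBandit([0.05, 0.10, 0.02])
total_payoff = sum(bandit.pull(random.randrange(3)) for _ in range(1000))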

7
MULTI-ARMED BANDITS AND RECSYS

• Actions: the candidate items to recommend.
• Action Value Estimates: the estimated click-through rate of each item
  approximates its expected reward, with CTR estimate = # clicks / # impressions.
• Action Selection Strategy: e.g. Epsilon-Greedy, which exploits (recommends the
  item with the highest estimated CTR) with probability 1 - ε and explores
  (recommends another item at random) with probability ε.
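A short sketch of epsilon-greedy selection over CTR estimates (function names and the example counts are assumptions for illustration):

import random

def ctr_estimate(clicks, impressions):
    # CTR estimate = # clicks / # impressions
    return clicks / impressions if impressions > 0 else 0.0

def epsilon_greedy(ctr_estimates, epsilon=0.1):
    if random.random() < epsilon:
        return random.randrange(len(ctr_estimates))        # explore with probability epsilon
    return max(range(len(ctr_estimates)),
               key=lambda i: ctr_estimates[i])              # exploit with probability 1 - epsilon

# Three items with hypothetical click/impression counts.
estimates = [ctr_estimate(5, 100), ctr_estimate(12, 90), ctr_estimate(1, 40)]
chosen_item = epsilon_greedy(estimates, epsilon=0.1)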

8
EVALUATION CHALLENGES OF BANDIT-BASED
INTERACTIVE RECSYS

• Evaluation based on Supervised Learning is inappropriate to test how a system
  learns over time from interactive feedback.
• There is no standard methodology to evaluate bandit-based interactive RecSys offline.
• Each work proposes its own evaluation approach.
• No concrete benchmarks.
• Hard to compare related works.
• Not all evaluation set-up details are necessarily shared.
• Difficult to reproduce/replicate the evaluation set-up.

9
BEARS: BANDIT-BASED RECOMMENDER SYSTEM
EVALUATION FRAMEWORK

Goals:
• Enable faster and reproducible evaluation in a common open-source platform.
• Provide simple and extensible building blocks based on the Reinforcement
  Learning framework.
• Allow sharing of benchmark problem settings and reusable implementations of
  solution approaches.
• Incorporate flexible tools to easily keep track of metrics throughout the
  interaction.

10
AGENDA

Introduction

BEARS Evaluation Framework

Conclusion and Future Work


BEARS: BANDIT-BASED RECOMMENDER SYSTEM
EVALUATION FRAMEWORK

MAIN COMPONENTS

• Experiment: assesses the agent-environment interaction.
• Evaluator: keeps track of multiple metrics over time steps.
• RS Agent: the solution approach.
• RS Environment: a representation of the real-world problem setting.

12
RS AGENT
• Value Function: the Agent's knowledge about the expected payoff of actions;
  updated from the observation/state and reward (Update Knowledge).
• Action Value Estimates: the bridge between the two components.
• Action Selection Policy: uses the Action Value Estimates to select the next
  action, considering both exploration and exploitation goals (Select Next Action).

Flow: observation/state, reward → Value Function → Action Value Estimates →
Action Selection Policy → action.

Example #1 (Classic RS):
• Value Function: Matrix Factorization.
• Action Value Estimates: estimated expected reward only for the actions
  available at the current state s_t.
• Action Selection Policy: Greedy.

Example #2 (Simple Bandit):
• Value Function: use the observed average reward as an estimate of each
  arm/action value.
• Action Selection Policy: ε-Greedy.
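A rough sketch of Example #2 (Simple Bandit), decomposed into the Value Function and Action Selection Policy blocks described above. agent.step() and agent.observation() follow the Experiment pseudo-code later in this deck; all other class and method names are illustrative assumptions, not the BEARS API:

import random

class AverageValueFunction:
    def __init__(self, n_actions):
        self.counts = [0] * n_actions
        self.estimates = [0.0] * n_actions      # Action Value Estimates

    def update(self, action, reward):
        self.counts[action] += 1
        # Incrementally update the observed average reward of the chosen arm.
        self.estimates[action] += (reward - self.estimates[action]) / self.counts[action]

class EpsilonGreedyPolicy:
    def __init__(self, epsilon=0.1):
        self.epsilon = epsilon

    def select(self, estimates):
        if random.random() < self.epsilon:      # explore
            return random.randrange(len(estimates))
        return max(range(len(estimates)), key=lambda i: estimates[i])  # exploit

class SimpleBanditAgent:
    def __init__(self, n_actions, epsilon=0.1):
        self.value_function = AverageValueFunction(n_actions)
        self.policy = EpsilonGreedyPolicy(epsilon)

    def step(self):
        return self.policy.select(self.value_function.estimates)

    def observation(self, observation, action, reward):
        self.value_function.update(action, reward)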

13
RS ENVIRONMENT

• Models the Agent's problem setting: the domain or application scenario where
  the Agent is deployed.
• The RS Environment interface aims to be compatible with the OpenAI Gym toolkit 1.

The Environment captures the dynamics of users and items: given an action a_t in
state s_t, it returns the next observation/state s_{t+1} and the reward r_{t+1},
i.e. R(s_t, a_t) = r_{t+1}.
1 https://gym.openai.com/
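A sketch of a toy RS Environment in this spirit. The step() signature follows the base_experiment pseudo-code in this deck (observation, reward); real OpenAI Gym environments return additional values (done, info), so everything here is an illustrative assumption rather than the BEARS interface:

import random

class BernoulliUserEnvironment:
    """Toy environment: a simulated user clicks item i with a fixed, hidden probability."""

    def __init__(self, click_probabilities):
        self.click_probabilities = click_probabilities   # dynamics of users and items

    def reset(self):
        return 0                                         # initial observation/state (trivial here)

    def step(self, action):
        # R(s_t, a_t) = r_{t+1}: reward 1 if the simulated user clicks the recommended item.
        reward = 1 if random.random() < self.click_probabilities[action] else 0
        observation = 0                                  # next state (static in this toy example)
        return observation, reward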

14
EXPERIMENT

An Experiment measures the performance of a single RS Agent when interacting
with a specific RS Environment, in terms of pre-defined metrics.

BEARS considers two common experiments:

• Base Experiment
• Monte-Carlo Experiment

15
EXPERIMENT

BASE EXPERIMENT: executes a single Episode, i.e. a complete sequence of T
agent-environment interactions. At every step, the Evaluator records a snapshot
of the metric values.

def base_experiment(agent, env, evaluator, T):
    for _ in range(T):
        action = agent.step()
        observation, reward = env.step(action)
        agent.observation(observation, action, reward)
        evaluator.score(observation, action, reward)
    return evaluator.results()

16
EXPERIMENT

MONTE-CARLO EXPERIMENT: executes E Episodes. Uses repeated sampling to account
for the stochastic properties of the RS Agent and/or RS Environment.

Before each episode run:
• Informs the Evaluator that a new episode will be executed.
• Resets the Agent and the Environment.

After an episode run has executed:
• Informs the Evaluator that the episode run has finished.

Result: summary statistics about the overall performance of the RS Agent across
the different episode runs.
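A sketch of this loop, built on the base_experiment pseudo-code from the previous slide; reset() on the agent/environment and the Evaluator's episode start/end methods are assumptions about the interface, not the actual BEARS API:

def monte_carlo_experiment(agent, env, evaluator, T, E):
    for _ in range(E):
        evaluator.start_episode()                # inform the Evaluator: new episode
        agent.reset()
        env.reset()
        for _ in range(T):                       # one complete episode of T interactions
            action = agent.step()
            observation, reward = env.step(action)
            agent.observation(observation, action, reward)
            evaluator.score(observation, action, reward)
        evaluator.end_episode()                  # inform the Evaluator: episode finished
    return evaluator.results()                   # summary statistics across episode runs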

17
EVALUATOR

• Supports the collection of metric values throughout an Experiment.
• Holds an array of Scorer components: each Scorer measures a different Metric.
• A Scorer combines a Metric with a Time Aggregator and an Episode Aggregator.
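A structural sketch of this composition. Only evaluator.score() and evaluator.results() appear in the deck's pseudo-code; the class layout, the aggregator interface (add(), series) and the simplified results() are assumptions for illustration:

class Scorer:
    """One Scorer per Metric; it feeds metric values into its aggregators."""

    def __init__(self, metric, time_aggregator, episode_aggregator):
        self.metric = metric
        self.time_aggregator = time_aggregator
        self.episode_aggregator = episode_aggregator

    def score(self, observation, action, reward):
        self.time_aggregator.add(self.metric(observation, action, reward))

class Evaluator:
    """Collects metric values throughout an Experiment via an array of Scorers."""

    def __init__(self, scorers):
        self.scorers = scorers

    def score(self, observation, action, reward):
        for scorer in self.scorers:
            scorer.score(observation, action, reward)

    def results(self):
        # Simplified: return the time-aggregated series of each Scorer.
        return [scorer.time_aggregator.series for scorer in self.scorers]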

18
EVALUATOR: METRIC AND TIME AGGREGATOR

Every Checkpoint time steps (t = 1, 2, ..., T):

• Assess the issued reward r_t according to the Metric:
  m_t = metric(observation, action, reward)
• Keep track of the Time Aggregated score score_agg[t]:

  DEFAULT      score_agg[t] = m_t
  CUMULATIVE   score_agg[t] = score_agg[t-1] + m_t
  AVERAGE      score_agg[t] = score_agg[t-1] + (m_t - score_agg[t-1]) / (n + 1),
               where n is the number of checkpoints aggregated so far.

Result: Time series of time aggregated metric values.
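A sketch of the three time-aggregation modes above as one small class (the class name, mode strings and interface are assumptions for illustration):

class TimeAggregator:
    """Keeps the time-aggregated score series using one of the three modes above."""

    def __init__(self, mode="cumulative"):
        self.mode = mode
        self.series = []        # time series of time-aggregated metric values
        self.n = 0              # number of checkpoints aggregated so far

    def add(self, m_t):
        if not self.series or self.mode == "default":
            agg = m_t                                       # DEFAULT: latest metric value
        elif self.mode == "cumulative":
            agg = self.series[-1] + m_t                     # CUMULATIVE: running sum
        else:                                               # AVERAGE: incremental mean
            agg = self.series[-1] + (m_t - self.series[-1]) / (self.n + 1)
        self.series.append(agg)
        self.n += 1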

19
EVALUATOR: EPISODE AGGREGATOR

• Captures performance across different episode runs.
• The Evaluator is informed of the start and end of each episode.

At the end of an Episode:
• Aggregate the collected episode sample (score_agg) into the Episode Aggregated score.
• score_episodes = incrementally updated summary statistics (mean, min, max, var, std)
  per checkpoint.
Result: Episode Aggregated time series of summary statistics related to metric values.
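A sketch of incrementally updated per-checkpoint summary statistics across episodes, using Welford's online algorithm for mean and variance (class and method names are assumptions for illustration):

import math

class EpisodeAggregator:
    def __init__(self):
        self.count = 0           # number of episodes aggregated so far
        self.mean = []
        self.m2 = []             # sum of squared deviations, per checkpoint
        self.min = []
        self.max = []

    def add_episode(self, episode_series):       # one time-aggregated series per episode
        self.count += 1
        for i, x in enumerate(episode_series):
            if i >= len(self.mean):              # first time this checkpoint is seen
                self.mean.append(x); self.m2.append(0.0)
                self.min.append(x); self.max.append(x)
            else:
                delta = x - self.mean[i]
                self.mean[i] += delta / self.count
                self.m2[i] += delta * (x - self.mean[i])
                self.min[i] = min(self.min[i], x)
                self.max[i] = max(self.max[i], x)

    def summary(self):
        # Population variance per checkpoint; call after at least one episode.
        var = [m2 / self.count for m2 in self.m2]
        return {"mean": self.mean, "min": self.min, "max": self.max,
                "var": var, "std": [math.sqrt(v) for v in var]}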

20
BEARS: BANDIT-BASED RECOMMENDER SYSTEM
EVALUATION FRAMEWORK

Elements to build Experiments using BEARS:

• Experiment: Base Experiment, Monte Carlo Experiment
• Agent: Value Function, Action Selection Policy
• Environment
• Evaluator: Scorer (Metric, Time Aggregator, Episode Aggregator)
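As a closing illustration, the building blocks sketched on the previous slides (all of them illustrative assumptions, not the actual BEARS implementations) could be wired together like this:

def click_metric(observation, action, reward):
    return reward                                          # metric: observed click/reward

env = BernoulliUserEnvironment([0.05, 0.10, 0.02])
agent = SimpleBanditAgent(n_actions=3, epsilon=0.1)
evaluator = Evaluator([Scorer(click_metric,
                              TimeAggregator(mode="average"),
                              EpisodeAggregator())])

results = base_experiment(agent, env, evaluator, T=1000)   # time series of average reward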

21
AGENDA

Introduction

BEARS Evaluation Framework

Conclusion and Future Work


CONCLUSION AND FUTURE WORK

MOTIVATION
• Standard RecSys evaluation methods are inappropriate to test
solutions that learn through interaction.
• Reproducing past results can be a challenge due to the lack of tools
and benchmarks.

PROPOSED SOLUTION
• BEARS is an open-source evaluation framework designed to enable the
construction of shareable experiments.

INTERESTING EXTENSIONS
• Expand built-in tools to include more baseline Agents, benchmark
Environments, common Experiments and Scorers.
• Broaden the use of BEARS to include RecSys approaches based on RL
concepts beyond the bandit approach.

23
We would like to hear your feedback.
For any comments, contributions or requests to test the
framework, please write to:
Andrea Barraza
<andrea.barraza@insight-centre.org>

VIDEO OF THE PRESENTATION: https://youtu.be/GIH_ArJ-ylk
