BEARS: Towards An Evaluation Framework For Bandit-Based Interactive Recommender Systems
Introduction
REINFORCEMENT LEARNING (RL)
Branch of Machine Learning.
[Figure: learning to walk through reinforcement.]
RECSYS AS REINFORCEMENT LEARNING
[Diagram: the RS Agent sends an action to the RS Environment, which returns a state and a reward.]
THE MULTI-ARMED BANDIT PROBLEM (MAB)
[Figure: bandit arms with unknown payoffs ("?"), pulled over a number of trials.]
Maximize total payoff.
Exploit or Explore
MULTI-ARMED BANDITS AND RECSYS
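[Figure: two candidate items, each with an approximate click-through rate (CTR); one is marked "Explore".]
$\text{CTR estimate} = \dfrac{\#\,\text{clicks}}{\#\,\text{impressions}}$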
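The CTR estimate is a simple ratio of clicks to impressions. The sketch below computes it for two items; the item names and counts are invented for illustration and are not taken from the slides.

```python
def ctr_estimate(clicks: int, impressions: int) -> float:
    """Empirical click-through rate; 0.0 when the item was never shown."""
    if impressions == 0:
        return 0.0
    return clicks / impressions


# Hypothetical (clicks, impressions) counts for two items.
counts = {"item_a": (30, 1000), "item_b": (2, 20)}
estimates = {item: ctr_estimate(c, n) for item, (c, n) in counts.items()}
```

Here `item_b` has the higher estimate, but it rests on only 20 impressions, so it is far less reliable than the estimate for `item_a`. Showing `item_b` more often to firm up its estimate is the "explore" side of the trade-off.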
EVALUATION CHALLENGES OF BANDIT-BASED
INTERACTIVE RECSYS
BEARS: BANDIT-BASED RECOMMENDER SYSTEM
EVALUATION FRAMEWORK
Goals:
• Enable faster, reproducible evaluation on a common open-source platform.
• Provide simple, extensible building blocks based on the
Reinforcement Learning framework.
• Allow sharing of benchmark problem settings and reusable
implementations of solution approaches.
• Incorporate flexible tools that make it easy to track metrics throughout the
interaction.
AGENDA
Introduction
MAIN COMPONENTS
RS AGENT
• Value Function: the Agent's knowledge about the expected payoff of actions.
• Action Selection Policy: uses action value estimates to select the next action,
considering both exploration and exploitation goals.
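One way to picture these two components is the minimal sketch below. It assumes an epsilon-greedy policy, which is a common baseline chosen here for illustration and not necessarily what BEARS ships. The class name and constructor parameters are hypothetical; the method names `step` and `observation` follow the episode pseudocode in the slides.

```python
import random


class EpsilonGreedyAgent:
    """Hypothetical agent: a value function plus an action selection policy."""

    def __init__(self, n_actions, epsilon=0.1, seed=0):
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.counts = [0] * n_actions    # times each action was taken
        self.values = [0.0] * n_actions  # value function: estimated payoff per action

    def step(self):
        """Action selection policy: explore with probability epsilon, else exploit."""
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(len(self.values))
        return max(range(len(self.values)), key=self.values.__getitem__)

    def observation(self, observation, action, reward):
        """Update the value estimate of the chosen action with an incremental mean."""
        self.counts[action] += 1
        self.values[action] += (reward - self.values[action]) / self.counts[action]
```

Keeping the value estimates separate from the selection rule means the policy could be swapped for another one without touching how payoffs are estimated.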
RS ENVIRONMENT
$\mathcal{R}(s_t, a_t) = r_{t+1}$
Input: action $a_t$.
Dynamics of users and items.
Output: observation/state $s_{t+1}$, reward $r_{t+1}$.
1 https://gym.openai.com/
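As a rough illustration of the environment's role, the sketch below simulates users who click on each item with a fixed probability. The class name, the click probabilities, and the choice to return no observation are all assumptions made for this example. Only the `step(action)` call returning an observation and a reward follows the slides.

```python
import random


class BernoulliEnvironment:
    """Hypothetical environment: each action (item) is clicked with a fixed probability."""

    def __init__(self, click_probs, seed=0):
        self.click_probs = list(click_probs)
        self.rng = random.Random(seed)

    def step(self, action):
        """Return (observation, reward) for the recommended item."""
        clicked = self.rng.random() < self.click_probs[action]
        reward = 1.0 if clicked else 0.0
        observation = None  # stateless bandit setting: nothing to observe
        return observation, reward
```

A richer environment could return user or context features as the observation and change click probabilities over time to model user and item dynamics.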
EXPERIMENT
Episode
    action = agent.step()
    observation, reward = env.step(action)
    agent.observation(observation, action, reward)
    evaluator.score(observation, action, reward)
    return evaluator.results()
Evaluator: records a snapshot of the metric values.
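The episode steps can be sketched as a runnable loop. The three stand-in classes and the `run_episode` helper with its `n_steps` parameter are hypothetical and exist only to make the example self-contained. The four calls inside the loop are the ones shown on the slide.

```python
import random


class RandomAgent:
    """Stand-in agent that picks actions uniformly at random."""

    def __init__(self, n_actions, seed=0):
        self.n_actions = n_actions
        self.rng = random.Random(seed)

    def step(self):
        return self.rng.randrange(self.n_actions)

    def observation(self, observation, action, reward):
        pass  # a learning agent would update its value estimates here


class FixedRewardEnv:
    """Stand-in environment with a fixed reward per action."""

    def __init__(self, rewards):
        self.rewards = rewards

    def step(self, action):
        return None, self.rewards[action]


class CumulativeRewardEvaluator:
    """Stand-in evaluator that snapshots the cumulative reward at every step."""

    def __init__(self):
        self.total = 0.0
        self.snapshots = []

    def score(self, observation, action, reward):
        self.total += reward
        self.snapshots.append(self.total)

    def results(self):
        return self.snapshots


def run_episode(agent, env, evaluator, n_steps):
    """Run one episode of agent-environment interaction and return the metric snapshots."""
    for _ in range(n_steps):
        action = agent.step()
        observation, reward = env.step(action)
        agent.observation(observation, action, reward)
        evaluator.score(observation, action, reward)
    return evaluator.results()


results = run_episode(RandomAgent(2), FixedRewardEnv([1.0, 1.0]),
                      CumulativeRewardEvaluator(), n_steps=5)
```

Because both actions pay 1.0 in this toy setting, the snapshots simply count the steps taken so far.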
Result: Summary statistics about the overall performance of the RS Agent across
the different episode runs.
EVALUATOR
• Metric Scorer
• Time Aggregator
• Episode Aggregator
EVALUATOR: TIME AGGREGATOR
AVERAGE: $\bar{m}_t = \bar{m}_{t-1} + \dfrac{m_t - \bar{m}_{t-1}}{t + 1}$
where $m_t$ is the metric value at time step $t$ and $\bar{m}_t$ is its running average.
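An average over time can be maintained incrementally, without storing the full history of metric values. The sketch below assumes time steps are indexed from 0, so the divisor at step t is t + 1. The function name and input values are illustrative only.

```python
def running_average(values):
    """Return the running average after each time step, computed incrementally."""
    averages = []
    avg = 0.0
    for t, value in enumerate(values):
        # New average = old average + (new value - old average) / number of values seen.
        avg = avg + (value - avg) / (t + 1)
        averages.append(avg)
    return averages


# Example metric values, e.g. 1.0 for a click and 0.0 for no click.
averages = running_average([1.0, 0.0, 1.0, 0.0])
```

Each update needs only the previous average and the step index, which keeps the aggregator cheap to run at every interaction.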
EVALUATOR: EPISODE AGGREGATOR
Summary statistics: var, std
Result: Episode-aggregated time series of summary statistics for the metric values.
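Aggregation across episodes can be sketched as follows: at each time step, the metric values from all episodes are summarized. The function name, the inclusion of the mean, and the use of population (not sample) variance are assumptions made for this example.

```python
from statistics import mean, pstdev, pvariance


def aggregate_episodes(episodes):
    """Summarize metric values across episodes, one summary per time step.

    `episodes` is a list of equal-length metric time series, one per episode.
    zip() stops at the shortest series if lengths differ.
    """
    aggregated = []
    for values_at_t in zip(*episodes):
        aggregated.append({
            "mean": mean(values_at_t),
            "var": pvariance(values_at_t),  # population variance (assumption)
            "std": pstdev(values_at_t),     # population standard deviation
        })
    return aggregated


# Two hypothetical episodes with three time steps each.
episodes = [[1.0, 2.0, 3.0], [3.0, 2.0, 5.0]]
summary = aggregate_episodes(episodes)
```

The result is itself a time series, so it can be plotted to show how the spread between episode runs evolves during the interaction.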
MOTIVATION
• Standard RecSys evaluation methods are inappropriate to test
solutions that learn through interaction.
• Reproducing past results can be a challenge due to the lack of tools
and benchmarks.
PROPOSED SOLUTION
• BEARS is an open-source evaluation framework designed to enable the
construction of shareable experiments.
INTERESTING EXTENSIONS
• Expand built-in tools to include more baseline Agents, benchmark
Environments, common Experiments and Scorers.
• Broaden the use of BEARS to include RecSys approaches based on RL
concepts beyond the bandit approach.
We would like to hear your feedback.
For comments, contributions, or requests to test the
framework, please write to:
Andrea Barraza
<andrea.barraza@insight-centre.org>