
BEARS: TOWARDS AN EVALUATION FRAMEWORK FOR

BANDIT-BASED INTERACTIVE RECOMMENDER SYSTEMS


Andrea Barraza-Urbina 1, Georgia Koutrika 2, Mathieu d’Aquin 1 and Conor Hayes 1

Offline Evaluation for Recommender Systems Workshop (REVEAL’18)


The ACM Conference on Recommender Systems 2018 (RecSys’18)
Vancouver, Canada
1 Data Science Institute, The Insight Centre for Data Analytics, National University of Ireland Galway, Galway, Ireland
2 Athena Research Center, Athens, Greece

October 7th, 2018


AGENDA

Introduction

BEARS Evaluation Framework

Conclusion and Future Work

VIDEO OF THE PRESENTATION: https://youtu.be/GIH_ArJ-ylk


AGENDA

Introduction

BEARS Evaluation Framework

Conclusion and Future Work


EXPLOITATION VS. EXPLORATION IN RECSYS

Exploitation
• Exploit the known user model.
• Maximize the user's immediate satisfaction.
• Classical RecSys approaches.
• Accuracy-based metrics.

Exploration
• Explore other preferences the user might have.
• Adapt to user taste drift.
• Maximize long-term user satisfaction.
• Face user/item cold-start (Active Learning).
• Face the Filter Bubble.
• Beyond-accuracy metrics.

4
REINFORCEMENT LEARNING (RL)

• Branch of Machine Learning.
• Learn by interacting with the environment, which provides reward signals.
• Trial-and-error search / learning.

[Figure: an agent that "tries to walk" and learns from reinforcement.]

5
RECSYS AS REINFORCEMENT LEARNING

RL can offer a proper framework to represent RecSys that learn from evaluative
feedback when interacting with an uncertain environment.

• Agent = RS Agent: selects an action a_t, i.e. a recommendation (a single item or a list).
• Environment = RS Environment: the target user, who returns the next state s_{t+1}
  and a reward r_{t+1} in the form of explicit/implicit user feedback.
• At each step, the agent observes the current state s_t and reward r_t before acting.
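As a minimal sketch of one interaction step under this mapping (it uses the agent/environment interface that appears later in the Base Experiment pseudo-code; any other detail is an illustrative assumption, not BEARS code):

def interaction_step(agent, user_env):
    action = agent.step()                            # a_t: recommend a single item or a list
    observation, reward = user_env.step(action)      # s_{t+1}: user state, r_{t+1}: user feedback
    agent.observation(observation, action, reward)   # update the agent's user model
    return observation, reward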

6
THE MULTI-ARMED BANDIT PROBLEM (MAB)

Recent RecSys solutions have proposed to address the exploration-exploitation
trade-off by using Multi-Armed Bandits.

• Each arm (action) pays off according to an unknown reward probability
  distribution with mean mu_k.
• Over T trials, the agent must decide at each step whether to exploit or explore,
  so as to maximize the total payoff.
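As a minimal illustration of this setting (all names and numbers here are assumptions for illustration, not BEARS code), a K-armed bandit with unknown Bernoulli reward probabilities could be simulated as:

import random

class BernoulliBandit:
    """K arms, each paying reward 1 with an unknown probability (its mean reward)."""

    def __init__(self, means):
        self.means = means                    # hidden from the agent

    def pull(self, arm):
        # Reward is 1 with probability means[arm], else 0 (e.g. click / no click).
        return 1 if random.random() < self.means[arm] else 0

# T trials with a uniformly random policy; a bandit algorithm aims to do better
# by balancing exploration and exploitation.
bandit = BernoulliBandit([0.05, 0.10, 0.02])
total_payoff = sum(bandit.pull(random.randrange(3)) for _ in range(1000))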

7
MULTI-ARMED BANDITS AND RECSYS

• Actions: the candidate items to recommend.
• Action Value Estimates: the estimated click-through rate of each item
  approximates its expected reward, with CTR estimate = # clicks / # impressions.
• Action Selection Strategy: e.g. Epsilon-Greedy, which exploits (recommends the
  item with the highest estimated CTR) with probability 1 - ε and explores
  (recommends another item at random) with probability ε.
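A short sketch of epsilon-greedy selection over CTR estimates (function names and the example counts are assumptions for illustration):

import random

def ctr_estimate(clicks, impressions):
    # CTR estimate = # clicks / # impressions
    return clicks / impressions if impressions > 0 else 0.0

def epsilon_greedy(ctr_estimates, epsilon=0.1):
    if random.random() < epsilon:
        return random.randrange(len(ctr_estimates))        # explore with probability epsilon
    return max(range(len(ctr_estimates)),
               key=lambda i: ctr_estimates[i])              # exploit with probability 1 - epsilon

# Three items with hypothetical click/impression counts.
estimates = [ctr_estimate(5, 100), ctr_estimate(12, 90), ctr_estimate(1, 40)]
chosen_item = epsilon_greedy(estimates, epsilon=0.1)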

8
EVALUATION CHALLENGES OF BANDIT-BASED
INTERACTIVE RECSYS

• Evaluation based on Supervised Learning is inappropriate to test how a system
  learns over time from interactive feedback.
• There is no standard methodology to evaluate bandit-based interactive RecSys offline.
• Each work proposes its own evaluation approach.
• No concrete benchmarks.
• Hard to compare related works.
• Not all evaluation set-up details are necessarily shared.
• Difficult to reproduce/replicate the evaluation set-up.

9
BEARS: BANDIT-BASED RECOMMENDER SYSTEM
EVALUATION FRAMEWORK

Goals:
• Enable faster and reproducible evaluation in a common open-source platform.
• Provide simple and extensible building blocks based on the Reinforcement
  Learning framework.
• Allow sharing of benchmark problem settings and reusable implementations of
  solution approaches.
• Incorporate flexible tools to easily keep track of metrics throughout the
  interaction.

10
AGENDA

Introduction

BEARS Evaluation Framework

Conclusion and Future Work


BEARS: BANDIT-BASED RECOMMENDER SYSTEM
EVALUATION FRAMEWORK

MAIN COMPONENTS

• Experiment: assesses the agent-environment interaction.
• Evaluator: keeps track of multiple metrics over time steps.
• RS Agent: the solution approach.
• RS Environment: a representation of the real-world problem setting.

12
RS AGENT
• Value Function: the Agent's knowledge about the expected payoff of actions;
  updated from the observation/state and reward (Update Knowledge).
• Action Value Estimates: the bridge between the two components.
• Action Selection Policy: uses the Action Value Estimates to select the next
  action, considering both exploration and exploitation goals (Select Next Action).

Flow: observation/state, reward → Value Function → Action Value Estimates →
Action Selection Policy → action.

Example #1 (Classic RS):
• Value Function: Matrix Factorization.
• Action Value Estimates: estimated expected reward only for the actions
  available at the current state s_t.
• Action Selection Policy: Greedy.

Example #2 (Simple Bandit):
• Value Function: use the observed average reward as an estimate of each
  arm/action value.
• Action Selection Policy: ε-Greedy.
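A rough sketch of Example #2 (Simple Bandit), decomposed into the Value Function and Action Selection Policy blocks described above. agent.step() and agent.observation() follow the Experiment pseudo-code later in this deck; all other class and method names are illustrative assumptions, not the BEARS API:

import random

class AverageValueFunction:
    def __init__(self, n_actions):
        self.counts = [0] * n_actions
        self.estimates = [0.0] * n_actions      # Action Value Estimates

    def update(self, action, reward):
        self.counts[action] += 1
        # Incrementally update the observed average reward of the chosen arm.
        self.estimates[action] += (reward - self.estimates[action]) / self.counts[action]

class EpsilonGreedyPolicy:
    def __init__(self, epsilon=0.1):
        self.epsilon = epsilon

    def select(self, estimates):
        if random.random() < self.epsilon:      # explore
            return random.randrange(len(estimates))
        return max(range(len(estimates)), key=lambda i: estimates[i])  # exploit

class SimpleBanditAgent:
    def __init__(self, n_actions, epsilon=0.1):
        self.value_function = AverageValueFunction(n_actions)
        self.policy = EpsilonGreedyPolicy(epsilon)

    def step(self):
        return self.policy.select(self.value_function.estimates)

    def observation(self, observation, action, reward):
        self.value_function.update(action, reward)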

13
RS ENVIRONMENT

• Models the Agent's problem setting: the domain or application scenario where
  the Agent is deployed.
• The RS Environment interface aims to be compatible with the OpenAI Gym toolkit 1.

The Environment captures the dynamics of users and items: given an action a_t in
state s_t, it returns the next observation/state s_{t+1} and the reward r_{t+1},
i.e. R(s_t, a_t) = r_{t+1}.
1 https://gym.openai.com/
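A sketch of a toy RS Environment in this spirit. The step() signature follows the base_experiment pseudo-code in this deck (observation, reward); real OpenAI Gym environments return additional values (done, info), so everything here is an illustrative assumption rather than the BEARS interface:

import random

class BernoulliUserEnvironment:
    """Toy environment: a simulated user clicks item i with a fixed, hidden probability."""

    def __init__(self, click_probabilities):
        self.click_probabilities = click_probabilities   # dynamics of users and items

    def reset(self):
        return 0                                         # initial observation/state (trivial here)

    def step(self, action):
        # R(s_t, a_t) = r_{t+1}: reward 1 if the simulated user clicks the recommended item.
        reward = 1 if random.random() < self.click_probabilities[action] else 0
        observation = 0                                  # next state (static in this toy example)
        return observation, reward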

14
EXPERIMENT

An Experiment measures the performance of a single RS Agent when interacting
with a specific RS Environment, in terms of pre-defined metrics.

BEARS considers two common experiments:

• Base Experiment
• Monte-Carlo Experiment

15
EXPERIMENT

BASE EXPERIMENT: executes a single Episode, i.e. a complete sequence of T
agent-environment interactions. At every step, the Evaluator records a snapshot
of the metric values.

def base_experiment(agent, env, evaluator, T):
    for _ in range(T):
        action = agent.step()
        observation, reward = env.step(action)
        agent.observation(observation, action, reward)
        evaluator.score(observation, action, reward)
    return evaluator.results()

16
EXPERIMENT

MONTE-CARLO EXPERIMENT: executes E Episodes. Uses repeated sampling to account
for the stochastic properties of the RS Agent and/or RS Environment.

Before each episode run:
• Informs the Evaluator that a new episode will be executed.
• Resets the Agent and the Environment.

After an episode run has executed:
• Informs the Evaluator that the episode run has finished.

Result: summary statistics about the overall performance of the RS Agent across
the different episode runs.
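A sketch of this loop, built on the base_experiment pseudo-code from the previous slide; reset() on the agent/environment and the Evaluator's episode start/end methods are assumptions about the interface, not the actual BEARS API:

def monte_carlo_experiment(agent, env, evaluator, T, E):
    for _ in range(E):
        evaluator.start_episode()                # inform the Evaluator: new episode
        agent.reset()
        env.reset()
        for _ in range(T):                       # one complete episode of T interactions
            action = agent.step()
            observation, reward = env.step(action)
            agent.observation(observation, action, reward)
            evaluator.score(observation, action, reward)
        evaluator.end_episode()                  # inform the Evaluator: episode finished
    return evaluator.results()                   # summary statistics across episode runs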

17
EVALUATOR

• Supports the collection of metric values throughout an Experiment.
• Holds an array of Scorer components: each Scorer measures a different Metric.
• A Scorer combines a Metric with a Time Aggregator and an Episode Aggregator.
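A structural sketch of this composition. Only evaluator.score() and evaluator.results() appear in the deck's pseudo-code; the class layout, the aggregator interface (add(), series) and the simplified results() are assumptions for illustration:

class Scorer:
    """One Scorer per Metric; it feeds metric values into its aggregators."""

    def __init__(self, metric, time_aggregator, episode_aggregator):
        self.metric = metric
        self.time_aggregator = time_aggregator
        self.episode_aggregator = episode_aggregator

    def score(self, observation, action, reward):
        self.time_aggregator.add(self.metric(observation, action, reward))

class Evaluator:
    """Collects metric values throughout an Experiment via an array of Scorers."""

    def __init__(self, scorers):
        self.scorers = scorers

    def score(self, observation, action, reward):
        for scorer in self.scorers:
            scorer.score(observation, action, reward)

    def results(self):
        # Simplified: return the time-aggregated series of each Scorer.
        return [scorer.time_aggregator.series for scorer in self.scorers]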

18
EVALUATOR: METRIC AND TIME AGGREGATOR

Every Checkpoint time steps (t = 1, 2, ..., T):

• Assess the issued reward r_t according to the Metric:
  m_t = metric(observation, action, reward)
• Keep track of the Time Aggregated score score_agg[t]:

  DEFAULT      score_agg[t] = m_t
  CUMULATIVE   score_agg[t] = score_agg[t-1] + m_t
  AVERAGE      score_agg[t] = score_agg[t-1] + (m_t - score_agg[t-1]) / (n + 1),
               where n is the number of checkpoints aggregated so far.

Result: Time series of time aggregated metric values.
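A sketch of the three time-aggregation modes above as one small class (the class name, mode strings and interface are assumptions for illustration):

class TimeAggregator:
    """Keeps the time-aggregated score series using one of the three modes above."""

    def __init__(self, mode="cumulative"):
        self.mode = mode
        self.series = []        # time series of time-aggregated metric values
        self.n = 0              # number of checkpoints aggregated so far

    def add(self, m_t):
        if not self.series or self.mode == "default":
            agg = m_t                                       # DEFAULT: latest metric value
        elif self.mode == "cumulative":
            agg = self.series[-1] + m_t                     # CUMULATIVE: running sum
        else:                                               # AVERAGE: incremental mean
            agg = self.series[-1] + (m_t - self.series[-1]) / (self.n + 1)
        self.series.append(agg)
        self.n += 1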

19
EVALUATOR: EPISODE AGGREGATOR

• Captures performance across different episode runs.
• The Evaluator is informed of the start and end of each episode.

At the end of an Episode:
• Aggregate the collected episode sample (score_agg) into the Episode Aggregated score.
• score_episodes = incrementally updated summary statistics (mean, min, max, var, std)
  per checkpoint.
Result: Episode Aggregated time series of summary statistics related to metric values.
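A sketch of incrementally updated per-checkpoint summary statistics across episodes, using Welford's online algorithm for mean and variance (class and method names are assumptions for illustration):

import math

class EpisodeAggregator:
    def __init__(self):
        self.count = 0           # number of episodes aggregated so far
        self.mean = []
        self.m2 = []             # sum of squared deviations, per checkpoint
        self.min = []
        self.max = []

    def add_episode(self, episode_series):       # one time-aggregated series per episode
        self.count += 1
        for i, x in enumerate(episode_series):
            if i >= len(self.mean):              # first time this checkpoint is seen
                self.mean.append(x); self.m2.append(0.0)
                self.min.append(x); self.max.append(x)
            else:
                delta = x - self.mean[i]
                self.mean[i] += delta / self.count
                self.m2[i] += delta * (x - self.mean[i])
                self.min[i] = min(self.min[i], x)
                self.max[i] = max(self.max[i], x)

    def summary(self):
        # Population variance per checkpoint; call after at least one episode.
        var = [m2 / self.count for m2 in self.m2]
        return {"mean": self.mean, "min": self.min, "max": self.max,
                "var": var, "std": [math.sqrt(v) for v in var]}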

20
BEARS: BANDIT-BASED RECOMMENDER SYSTEM
EVALUATION FRAMEWORK

Elements to build Experiments using BEARS:

• Experiment: Base Experiment, Monte Carlo Experiment
• Agent: Value Function, Action Selection Policy
• Environment
• Evaluator: Scorer (Metric, Time Aggregator, Episode Aggregator)
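As a closing illustration, the building blocks sketched on the previous slides (all of them illustrative assumptions, not the actual BEARS implementations) could be wired together like this:

def click_metric(observation, action, reward):
    return reward                                          # metric: observed click/reward

env = BernoulliUserEnvironment([0.05, 0.10, 0.02])
agent = SimpleBanditAgent(n_actions=3, epsilon=0.1)
evaluator = Evaluator([Scorer(click_metric,
                              TimeAggregator(mode="average"),
                              EpisodeAggregator())])

results = base_experiment(agent, env, evaluator, T=1000)   # time series of average reward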

21
AGENDA

Introduction

BEARS Evaluation Framework

Conclusion and Future Work


CONCLUSION AND FUTURE WORK

MOTIVATION
• Standard RecSys evaluation methods are inappropriate to test
solutions that learn through interaction.
• Reproducing past results can be a challenge due to the lack of tools
and benchmarks.

PROPOSED SOLUTION
• BEARS is an open-source evaluation framework designed to enable the
construction of shareable experiments.

INTERESTING EXTENSIONS
• Expand built-in tools to include more baseline Agents, benchmark
Environments, common Experiments and Scorers.
• Broaden the use of BEARS to include RecSys approaches based on RL
concepts beyond the bandit approach.

23
We would like to hear your feedback.
For any comments, contributions or requests to test the
framework, please write to:
Andrea Barraza
<andrea.barraza@insight-centre.org>

VIDEO OF THE PRESENTATION: https://youtu.be/GIH_ArJ-ylk
