Sarit Kraus
Bar-Ilan University
Israel
Motivation: Explaining MARL Black-Box Policies
• A MARL policy is difficult to understand
• Why do we need to explain MARL?
  – Improve system transparency
  – Improve understandability
  – Increase user satisfaction
  – Increase user trust
  – Enable better human-agent cooperation
Explainable Multi-Agent
Reinforcement Learning For
Temporal Queries
• Kayla Boggess¹, Sarit Kraus², and Lu Feng¹
• ¹ University of Virginia
• ² Bar-Ilan University
• {kjb5we, lu.feng}@virginia.edu, sarit@cs.biu.ac.il
(IJCAI 2023)
Policy Summarization: Policy Abstraction
• Train a joint policy for N agents
• Generate abstract features for a new state space containing adequate information for explanation
• Convert each training sample to its corresponding abstract state
• Compute transition probabilities via frequency counting
[Figure: example abstract state with features Victim_Detected, Obstacle_Not_Detected, Fire_Not_Detected, Victim_Not_Complete, Obstacle_Not_Complete, Fire_Not_Complete]
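The abstraction and frequency-counting steps above can be sketched as follows. This is a minimal sketch, not the paper's implementation: the feature-extraction logic, the observation dictionaries, and the trajectory format are all illustrative assumptions.

```python
from collections import Counter, defaultdict

def abstract_state(obs):
    # Map a raw observation to a tuple of boolean abstract features.
    # Feature names follow the slide; the extraction logic is a placeholder.
    return (
        obs.get("victim_detected", False),
        obs.get("obstacle_detected", False),
        obs.get("fire_detected", False),
    )

def estimate_transitions(trajectories):
    # Estimate P(s' | s) over abstract states by frequency counting:
    # count observed abstract-state transitions, then normalize per state.
    counts = defaultdict(Counter)
    for traj in trajectories:
        abstracted = [abstract_state(obs) for obs in traj]
        for s, s_next in zip(abstracted, abstracted[1:]):
            counts[s][s_next] += 1
    return {
        s: {s2: c / sum(nxt.values()) for s2, c in nxt.items()}
        for s, nxt in counts.items()
    }
```

Each trajectory of raw training samples is first mapped into abstract-state space, so the resulting transition model is defined over the small abstract state space rather than the original one.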
Policy Summarization
• Search the policy graph for the most probable path
• Extract agent cooperation and task sequences (if an agent satisfies a task, assign the task to all agents involved)
• Generate a chart to show to the user
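One standard way to find the most probable path in such a policy graph is Dijkstra's algorithm on -log-probability edge weights, so that minimizing summed weights maximizes the product of transition probabilities. The dictionary-of-dictionaries graph format is an assumption for this sketch, not the authors' data structure.

```python
import heapq
from math import log

def most_probable_path(transitions, start, goal):
    # transitions: state -> {next_state: probability}
    # Dijkstra on weights -log(p): the cheapest path is the most probable one.
    frontier = [(0.0, start, [start])]
    settled = {}
    while frontier:
        cost, state, path = heapq.heappop(frontier)
        if state == goal:
            return path
        if state in settled and settled[state] <= cost:
            continue
        settled[state] = cost
        for nxt, p in transitions.get(state, {}).items():
            heapq.heappush(frontier, (cost - log(p), nxt, path + [nxt]))
    return None  # goal unreachable
```

For example, with transitions `{"a": {"b": 0.9, "c": 0.1}, "b": {"d": 1.0}, "c": {"d": 1.0}}`, the path through `"b"` is returned because 0.9 × 1.0 beats 0.1 × 1.0.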
Hypotheses
• H1: Chart-based summarizations lead to better user performance than GIF-based summarizations.
• H2: Chart-based summarizations yield higher user ratings on explanation goodness metrics than GIF-based summarizations.
User Study: Policy Summarization (116 subjects)
[Results figure; reported standard deviations: 0.6 and 0.4]
• Construct a policy abstraction MMDP given π
• User query expressed in PCTL*

Example query:
            Task 1     Task 2    Task 3
Robot I     Obstacle   Victim    -
Robot II    Obstacle   -         Fire
Robot III   -          -         Fire

1. Query checking with temporal logic
   Feasible query: “Your plan is feasible.”
2. Guided rollout
   Feasible query: “Your plan is feasible.”
3. Explanation generation
   Infeasible query: “The robots cannot rescue the victim
   because Robot I needs Robot III to help rescue the victim.”
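One plausible wiring of the three stages above can be sketched as a small dispatcher. All three helper callables (`check_query`, `guided_rollout`, `explain_failure`) are hypothetical placeholders, and the exact control flow between the stages is my reading of the slide, not a confirmed detail of the method.

```python
def answer_query(query, abstraction, check_query, guided_rollout, explain_failure):
    # Stage 1: model-check the PCTL* query against the policy abstraction.
    # Stage 2: confirm an apparently feasible query by rolling out the
    #          learned policy (the abstraction is an approximation).
    # Stage 3: if the query is infeasible, generate an explanation.
    if check_query(abstraction, query) and guided_rollout(abstraction, query):
        return "Your plan is feasible."
    return explain_failure(abstraction, query)
```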
            Task 1    Task 2
Robot I     -         -
Robot II    Fire      Obstacle
Explanation Generation
• Find the highest U-value in the policy abstraction
  – Task U+1 is the failed task causing infeasibility
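Under the assumption that the U-value counts the prefix of tasks completed along the most probable abstract path (my reading of the slide, not a confirmed detail of the method), the failed-task rule can be sketched as:

```python
def find_failed_task(task_completed):
    # task_completed[i] is True if task i+1 is completed along the
    # most probable abstract path. U is the length of the completed
    # prefix; task U+1 (1-based) is the failed task causing infeasibility.
    u = 0
    for done in task_completed:
        if not done:
            break
        u += 1
    return u + 1 if u < len(task_completed) else None  # None: all tasks done
```

For instance, if tasks 1 and 2 succeed but task 3 does not, U = 2 and the explanation points at task 3.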
Real-time XMRL
[Map figure: House, Hill, Pond, Bridge, Impassable Forest, Path]
Augmented Reality for XMRL
• Maps and summaries shown on the user’s pad
[Map figure: Pond, Bridge, Impassable Forest, Impassable Path, Sea, Beach, Beach House]
Human Study: Performance (23 subjects)
Human Study: Questionnaires
Conclusions of MARL Explanations
• Developed a method to generate explanations that answer temporal user queries for multi-agent reinforcement learning
ADESSE: Advice Explanations in Complex Repeated Decision-Making Environments
(Prediction + DRL)
Feng Schleibaum
An Explainable Agent Consisting of Two Components Provides Advice to a Human Decision-maker.
Taxi Repositioning Game
Fighting Fire
Contrastive Explainable
Clustering with Differential
Privacy