E0234 PPT
DECISION
Harsh Vishwakarma
21532
MTech, CSA
Table of Contents
Introduction
Experiments
Introduction
What is interpretability?
- Interpretability is a model's ability to provide insights into its decisions or inner workings, whether intrinsically (by design) or through post-hoc techniques.
- Complex models such as Transformers do not provide interpretations out of the box, so post-hoc techniques are typically applied. Interpretations can be represented as rules, heatmaps, and feature importance scores, among others.
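As a hedged illustration of the "feature importance" representation for a Transformer, the sketch below turns one layer's attention tensor into a score per token by averaging over heads and reading the row of a chosen query position (e.g. [CLS]). The functions `token_importance` and `rank_tokens`, the head-averaging choice, and the toy attention values are all assumptions for illustration, not the method presented in these slides.

```python
from typing import List, Sequence, Tuple

# attention[h][i][j]: weight that head h assigns to token j while encoding token i.
Attention = List[List[List[float]]]


def token_importance(attention: Attention, query_index: int = 0) -> List[float]:
    """Average the heads and read one query row as per-token importance.

    Assumption: row `query_index` (e.g. the [CLS] position) summarizes the
    sequence; this is one common heuristic, not a prescribed method.
    """
    num_heads = len(attention)
    seq_len = len(attention[0][query_index])
    return [
        sum(attention[h][query_index][j] for h in range(num_heads)) / num_heads
        for j in range(seq_len)
    ]


def rank_tokens(tokens: Sequence[str], scores: Sequence[float]) -> List[Tuple[str, float]]:
    """Pair each token with its score, most important first."""
    return sorted(zip(tokens, scores), key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    tokens = ["[CLS]", "great", "movie"]
    # Toy attention for one layer with two heads and three tokens (made-up numbers).
    attention = [
        [[0.2, 0.5, 0.3], [0.1, 0.8, 0.1], [0.3, 0.3, 0.4]],
        [[0.4, 0.4, 0.2], [0.2, 0.6, 0.2], [0.1, 0.5, 0.4]],
    ]
    print(rank_tokens(tokens, token_importance(attention)))
```

Averaging over heads is only one possible aggregation; other operations over the heads (or across layers) would yield different rankings from the same attention weights.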
Interpretability of Transformers
Optimus Transformer Interpretability
How the attention matrix is interpreted
Operations over Attention Matrices
Metric for evaluating the interpretations
Proposed solution
- The authors suggest replacing tokens with a special token such as "[UNK]" (unknown token) instead of deleting them entirely. Replacing a token with "[UNK]" neutralizes its influence while minimally affecting the context of the remaining tokens.
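A minimal sketch of how "[UNK]" replacement could feed a faithfulness check, assuming a perturbation-style test: mask the k tokens an interpretation ranks highest and measure how far the model's predicted probability falls. The functions `replace_with_unk` and `faithfulness_drop` and the toy model `toy_predict` are hypothetical illustrations, not the authors' exact metric.

```python
from typing import Callable, Iterable, List, Sequence

UNK = "[UNK]"


def replace_with_unk(tokens: Sequence[str], positions: Iterable[int]) -> List[str]:
    """Swap the chosen positions for [UNK] instead of deleting them.

    Sequence length and the positions of all other tokens stay unchanged,
    so the surrounding context is disturbed as little as possible.
    """
    masked = set(positions)
    return [UNK if i in masked else tok for i, tok in enumerate(tokens)]


def faithfulness_drop(
    predict: Callable[[Sequence[str]], float],
    tokens: Sequence[str],
    scores: Sequence[float],
    k: int,
) -> float:
    """Drop in the predicted probability after masking the k top-scored tokens.

    A larger drop suggests the interpretation picked tokens the model relies on.
    """
    top_k = sorted(range(len(tokens)), key=lambda i: scores[i], reverse=True)[:k]
    return predict(tokens) - predict(replace_with_unk(tokens, top_k))


def toy_predict(tokens: Sequence[str]) -> float:
    """Hypothetical sentiment model: confident only when it sees 'great'."""
    return 0.9 if "great" in tokens else 0.2


if __name__ == "__main__":
    tokens = ["the", "movie", "was", "great"]
    scores = [0.05, 0.2, 0.05, 0.7]  # made-up interpretation scores
    print(replace_with_unk(tokens, [3]))
    print(faithfulness_drop(toy_predict, tokens, scores, k=1))
```

Because the masked sequence keeps its original length, every remaining token stays at its original position, which is the property that makes replacement less disruptive than deletion.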
Experiments