
AN ATTENTION MATRIX FOR EVERY DECISION

Harsh Vishwakarma
21532
MTech, CSA

Deep Learning for NLP (E0-334)
Table of Contents

Introduction

Optimus Transformer Interpretability

Operations over Attention Matrices

Metric to measure the interpretations

Experiments
Introduction

What is the most important aspect of a model in a real-world scenario?

▶ The model needs to be interpretable.
▶ The proposed approach enhances performance on binary and multi-label data.
▶ The authors mainly focus on a new technique that selects the most faithful attention-based interpretation among the several that can be obtained by combining different head, layer, and matrix operations.
Interpretability

What is interpretability?

▶ A model's ability to provide insights into its decisions or inner workings, whether intrinsically or not, is referred to as interpretability.
▶ Complex models, such as transformers, cannot provide interpretations out of the box, so post-hoc techniques are typically applied. Representations of an interpretation include, among others, rules, heatmaps, and feature importance.
Interpretability of Transformers

How can we interpret the results generated by a transformer?

▶ The most popular transformer-specific interpretability approach is the use of self-attention scores.
▶ We can also generate attention maps to check which parts of the input receive the most attention for a particular input instance (see the sketch below).
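A minimal sketch of how such attention maps can be obtained, assuming a Hugging Face BERT-style classifier; the model name and input sentence are illustrative, not from the slides:

    import torch
    from transformers import AutoTokenizer, AutoModelForSequenceClassification

    model_name = "bert-base-uncased"  # illustrative backbone; in practice a fine-tuned classifier
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForSequenceClassification.from_pretrained(model_name)

    inputs = tokenizer("the movie was surprisingly good", return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs, output_attentions=True)

    # outputs.attentions: one tensor per layer, each of shape (batch, heads, S, S)
    attn = torch.stack(outputs.attentions)      # (layers, batch, heads, S, S)
    attn_map = attn.mean(dim=(0, 2))[0]         # average over layers and heads -> (S, S)
    tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
    for tok, row in zip(tokens, attn_map):
        print(f"{tok:>12s}", " ".join(f"{a:.2f}" for a in row.tolist()))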
Feature-importance-based methods

▶ We can consider techniques like Layer-wise Relevance Propagation (LRP), which trace how relevance flows backward through the network during backpropagation to attribute the prediction to each input feature.
▶ Some ready-to-use interpretability methods that build on a similar idea are LIME, Integrated Gradients (IG), and SHAP (a simple gradient-based sketch follows below).
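As a simple illustration of this family (not LRP itself), here is a gradient-times-input saliency over the token embeddings, assuming a Hugging Face classifier; all names are illustrative:

    import torch
    from transformers import AutoTokenizer, AutoModelForSequenceClassification

    tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
    model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased")

    inputs = tokenizer("the plot was dull", return_tensors="pt")
    embeds = model.get_input_embeddings()(inputs["input_ids"]).detach()  # (1, S, H)
    embeds.requires_grad_(True)

    logits = model(inputs_embeds=embeds, attention_mask=inputs["attention_mask"]).logits
    logits[0, logits[0].argmax()].backward()     # gradient of the predicted class score

    # Token importance: |gradient . embedding| summed over the hidden dimension
    importance = (embeds.grad * embeds).sum(dim=-1).abs()[0].detach()
    for tok, score in zip(tokenizer.convert_ids_to_tokens(inputs["input_ids"][0]), importance):
        print(f"{tok:>12s} {score.item():.4f}")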
How is interpretability evaluated?

▶ Comprehensibility: the percentage of non-zero weights in an interpretation. The lower this number, the easier it is for end users to comprehend the interpretation.
▶ Faithfulness score: eliminates the token with the highest importance score from the examined instance and measures how much the prediction changes. Higher changes signify better interpretations. (A sketch of both metrics follows below.)
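A minimal sketch of the two evaluation ideas, written against a generic predict_proba(text) helper that returns the probability of the predicted class (an assumption, not an API from the slides):

    import numpy as np

    def comprehensibility(weights):
        """Fraction of non-zero weights in an interpretation (lower = easier to read)."""
        weights = np.asarray(weights, dtype=float)
        return np.count_nonzero(weights) / len(weights)

    def faithfulness(tokens, weights, predict_proba):
        """Remove the highest-weighted token and measure the change in prediction."""
        top = int(np.argmax(np.abs(weights)))
        original = predict_proba(" ".join(tokens))
        perturbed = predict_proba(" ".join(t for i, t in enumerate(tokens) if i != top))
        return original - perturbed   # larger drop => more faithful interpretation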
Optimus Transformer Interpretability

Objective: Given a transformer model f and an input sequence x = [t1, ..., tS] consisting of S tokens ti, i = 1, ..., S, the goal is to extract a local interpretation z = [w1, ..., wS], where wi ∈ ℝ signifies the influence of token ti on the model's decision f(x), based on the model's self-attention scores.
Using attention scores

We know that the attention scores for each token are generated as:

▶ A = softmax(QKᵀ/√d + mask), where A ∈ ℝ^(S×S) and S is the sequence length.
▶ To obtain importance scores of both polarities for the interpretations, the authors also consider the matrix without the softmax, which they denote A* (see the sketch below).
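A minimal sketch of computing A and its pre-softmax variant A* for a single head, with shapes and values chosen purely for illustration:

    import torch
    import torch.nn.functional as F

    S, d = 6, 64                         # sequence length and head dimension (illustrative)
    Q = torch.randn(S, d)
    K = torch.randn(S, d)
    mask = torch.zeros(S, S)             # 0 = keep; -inf entries would mask positions

    scores = Q @ K.T / d ** 0.5 + mask   # A*: unnormalized scores, can be negative
    A = F.softmax(scores, dim=-1)        # A: rows sum to 1, all entries non-negative
    A_star = scores                      # keeping A* preserves both polarities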
How the attention matrix is interpreted
Operations over Attention Matrices

Aggregation of attention matrices: The process involves aggregating the attention matrices across all heads within each self-attention layer.

Head operations: Common operations are applied to the attention matrices of each head. Averaging and summing essentially give the same token-importance order, differing only in the magnitude of the scores assigned to tokens (a sketch of the head operations follows below).
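A minimal sketch of head-level operations on one layer's attention tensor; the operation names are illustrative, not the paper's exact list:

    import torch

    def aggregate_heads(layer_attn: torch.Tensor, op: str = "mean") -> torch.Tensor:
        """Collapse the head dimension (num_heads, S, S) into a single (S, S) matrix."""
        if op == "mean":
            return layer_attn.mean(dim=0)
        if op == "sum":                 # same ranking as "mean", only magnitudes differ
            return layer_attn.sum(dim=0)
        if op == "max":
            return layer_attn.max(dim=0).values
        raise ValueError(f"unknown head operation: {op}")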
Operations over Attention Matrices

Final interpretation vector:

▶ Matrix operations like "From [CLS]" and "To [CLS]" extract the attention involving the special [CLS] token that is typically prepended in text classification tasks, i.e. the attention the [CLS] token gives to, and receives from, the other tokens (see the sketch below).
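A minimal sketch of reducing an aggregated S×S attention matrix to a per-token interpretation vector via [CLS]-based operations, assuming [CLS] sits at index 0:

    import torch

    def matrix_to_vector(attn: torch.Tensor, op: str = "from_cls") -> torch.Tensor:
        """Reduce an (S, S) attention matrix to an (S,) token-importance vector."""
        if op == "from_cls":       # attention the [CLS] token pays to every token
            return attn[0, :]
        if op == "to_cls":         # attention every token pays to [CLS]
            return attn[:, 0]
        if op == "mean_columns":   # average attention each token receives
            return attn.mean(dim=0)
        raise ValueError(f"unknown matrix operation: {op}")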
Operations over Attention Matrices

Selecting the best interpretation

▶ Select the best set of operations by iterating through different combinations of operations across the layers and heads of the transformer model.
▶ For a single instance, the method iterates through the various combinations of head, layer, and matrix operations, evaluates the faithfulness of each resulting interpretation with the metric, and selects the combination with the highest faithfulness score as the best interpretation (see the sketch below).
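A minimal sketch of that selection loop; the operation lists, build_interpretation, and faithfulness_fn are placeholders standing in for the paper's concrete choices:

    from itertools import product

    HEAD_OPS = ["mean", "sum", "max"]
    LAYER_OPS = ["mean", "sum", "max"]
    MATRIX_OPS = ["from_cls", "to_cls", "mean_columns"]

    def best_interpretation(attentions, build_interpretation, faithfulness_fn):
        """Return (interpretation, ops, score) with the highest faithfulness."""
        best = (None, None, float("-inf"))
        for ops in product(HEAD_OPS, LAYER_OPS, MATRIX_OPS):
            z = build_interpretation(attentions, *ops)   # token-importance vector
            score = faithfulness_fn(z)
            if score > best[2]:
                best = (z, ops, score)
        return best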
Metric to measure the interpretations

Ranked Faithful Truthfulness (RFT)

▶ Objective: RFT is designed to measure the importance of each token in the interpretation of the model's output. It evaluates the impact of removing each token from the input sequence on the model's prediction, taking into account the weight the interpretation assigns to that token (a sketch follows below).
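A hedged sketch of an RFT-style score based only on the description above, not the paper's exact formula: perturb each token in turn, measure the change in the predicted probability, and credit that change by the interpretation weight. predict_proba and perturb are assumed helpers:

    import numpy as np

    def rft_style_score(tokens, weights, predict_proba, perturb):
        """tokens: list of strings; weights: interpretation z; perturb(tokens, i) neutralizes token i."""
        base = predict_proba(tokens)
        weights = np.asarray(weights, dtype=float)
        score = 0.0
        for i in np.argsort(-np.abs(weights)):            # most important tokens first
            change = base - predict_proba(perturb(tokens, int(i)))
            score += weights[i] * change                  # reward changes the interpretation predicted
        return score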
Metric to measure the interpretations

Problems that arise from removing tokens

▶ In sequence-based models like recurrent neural networks and transformers, the removal of tokens disrupts the context for the remaining tokens.
▶ This disruption is especially crucial in transformer models, which employ positional encoding to understand the sequence structure.
▶ Research suggests that simply removing words or tokens from a sequence can produce texts that are out-of-distribution for the transformer model.
Metric to measure the interpretations

Proposed solution

▶ The authors suggest replacing tokens with a special token such as "[UNK]" (unknown token) instead of deleting them entirely. Replacing a token with "[UNK]" neutralizes its influence while minimally affecting the context for the remaining tokens (see the sketch below).
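A minimal sketch of this perturbation with a Hugging Face tokenizer/model pair (names assumed): the token is replaced with [UNK] so the positions of the remaining tokens are preserved.

    import torch

    def predict_with_unk(model, tokenizer, input_ids, attention_mask, position):
        """Class probabilities after masking one position with the [UNK] token."""
        ids = input_ids.clone()
        ids[0, position] = tokenizer.unk_token_id    # neutralize the token, keep its slot
        with torch.no_grad():
            logits = model(input_ids=ids, attention_mask=attention_mask).logits
        return torch.softmax(logits, dim=-1)[0]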

Experiments

The figures below compare ground-truth and Optimus Prime interpretability heatmaps:

▶ Results on the HateSpeech dataset (figure)
▶ Results on the Assignment1 Classification1 dataset (figure)
▶ Results on the SNLI dataset (figure)
▶ Results on the HoC dataset (figure)
Thank You!
