Studying - Dual Graph Convolutional Networks For Aspect-Based Sentiment Analysis


https://aclanthology.org/2021.acl-long.494.pdf

Graph Convolutional Network (GCN)

Reference: https://www.youtube.com/watch?v=GXhBEj1ZtE8

Reference: https://purelyvivid.github.io/2019/07/07/GCN_1/

Given a graph with n nodes, it can be represented as an adjacency matrix A ∈ R^{n×n}.

h_i^l = σ( Σ_{j=1}^{n} A_ij W^l h_j^{l−1} + b^l ) → the hidden state representation of the i-th node at the l-th layer   (Eq. 1)

A_ij = 1 if the i-th node is connected to the j-th node, 0 otherwise

W^l → a weight matrix

b^l → a bias term

σ → an activation function (e.g., ReLU)
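As a concrete illustration of Eq. (1), here is a minimal PyTorch sketch of a single GCN layer. This is not from the paper; the class and variable names are my own.

import torch
import torch.nn as nn
import torch.nn.functional as F

class GCNLayer(nn.Module):
    """Minimal sketch of Eq. (1): h_i^l = sigma( sum_j A_ij W^l h_j^{l-1} + b^l )."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.linear = nn.Linear(in_dim, out_dim)   # holds W^l and b^l

    def forward(self, A, H):
        # A: (batch, n, n) adjacency matrix; H: (batch, n, in_dim) node features h^{l-1}
        AH = torch.bmm(A, H)                       # neighborhood aggregation: sum_j A_ij * h_j^{l-1}
        return F.relu(self.linear(AH))             # sigma(W^l (A H) + b^l)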


Proposed DualGCN

In the ABSA task, a sentence-aspect pair (s, a) is given:

a = {a_1, a_2, ..., a_m} → an aspect, a sub-sequence of the entire sentence
s = {w_1, w_2, ..., w_n} → the entire sentence of n words
BiLSTM/ BERT: word embeddings ↦ hidden contextual representations
Reference: https://www.youtube.com/watch?v=UYPa347-DdE&t=826s

For the BiLSTM encoder

x = {x_1, x_2, ..., x_n} → the word embeddings of the sentence, taken from an embedding lookup table E ∈ R^{|V|×d_e}
|V| → the size of the vocabulary
d_e → the dimensionality of word embeddings
H = {h_1, h_2, ..., h_n} → hidden state vectors, h_i ∈ R^{2d}
d → the dimensionality of a hidden state vector output by a unidirectional LSTM
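A minimal PyTorch sketch of this embedding + BiLSTM step (illustrative only; module and variable names are assumptions, not the authors' code):

import torch
import torch.nn as nn

class BiLSTMEncoder(nn.Module):
    """Map word ids to contextual hidden states H = {h_1, ..., h_n}, h_i in R^{2d}."""
    def __init__(self, vocab_size, emb_dim, hidden_dim):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, emb_dim)   # lookup table E in R^{|V| x d_e}
        self.bilstm = nn.LSTM(emb_dim, hidden_dim,
                              batch_first=True, bidirectional=True)

    def forward(self, word_ids):
        x = self.embedding(word_ids)   # (batch, n, d_e) word embeddings
        H, _ = self.bilstm(x)          # (batch, n, 2d) hidden state vectors
        return H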

For the BERT encoder

input: [CLS] sentence [SEP] aspect [SEP] ↦ aspect-aware hidden representations of the sentence
[CLS] sentence [SEP] aspect [SEP] → the template for a sentence-aspect pair
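A short sketch of building this input with the Hugging Face transformers library (my own example, not part of the notes; the sentence and aspect strings are made up):

from transformers import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased")

sentence = "The food was great but the service was slow"
aspect = "service"
# Passing a text pair produces "[CLS] sentence [SEP] aspect [SEP]" automatically.
inputs = tokenizer(sentence, aspect, return_tensors="pt")
H = model(**inputs).last_hidden_state   # aspect-aware hidden representations of the sentence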

--------------------------

Syntax-based GCN (SynGCN)

Idea: the probability matrix representing dependencies between words contains richer syntactic information than the final discrete output of a dependency parser.
LAL-Parser: dependency parser ↦ dependency probability matrix

Reference: https://www.youtube.com/watch?v=9erBrs-VIqc&t=855s

SynGCN module using Eq. (1): (A^syn, H) ↦ H^syn

Eq. (1) → h_i^l = σ( Σ_{j=1}^{n} A_ij W^l h_j^{l−1} + b^l )

A^syn ∈ R^{n×n} → the syntax-based adjacency matrix (the syntactic encoding)

H → the hidden state vectors from BiLSTM

H^syn = {h_1^syn, h_2^syn, ..., h_n^syn} → the syntactic graph representation, h_i^syn ∈ R^d

h_i^syn → the hidden representation of the i-th node

{h_a1^syn, h_a2^syn, ..., h_am^syn} → the hidden representations of the aspect nodes
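A small usage sketch of the SynGCN step, reusing the GCNLayer sketch above with dummy shapes (all names and sizes here are illustrative, not the authors' implementation):

import torch

n, d = 10, 100                                         # hypothetical: n words, hidden size d
A_syn = torch.softmax(torch.randn(1, n, n), dim=-1)    # stand-in for the LAL-Parser dependency probability matrix
H = torch.randn(1, n, 2 * d)                           # BiLSTM hidden states (h_i in R^{2d})
H_syn = GCNLayer(2 * d, d)(A_syn, H)                   # Eq. (1) with A^syn, giving H^syn of shape (1, n, d)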

--------------------------

Semantic-based GCN (SemGCN)


Idea: the attention matrix produced by self-attention, which can also be viewed as an edge-weighted directed graph, can represent semantic correlations between words.

Self-Attention

Reference: https://www.youtube.com/watch?v=hYdO9CscNes

A^sem = softmax( (Q W^Q)(K W^K)^T / √d ) → the adjacency matrix of SemGCN, A^sem ∈ R^{n×n}

Q and K → equal to the graph representations from the previous layer of the SemGCN module

W^Q and W^K → learnable weight matrices

d → the dimensionality of the input node feature.


Note that we use only one self-attention head to obtain an attention score matrix for a
sentence

SemGCN module using Eq. (1): (A^sem, H) ↦ H^sem

H^sem = {h_1^sem, h_2^sem, ..., h_n^sem} → the semantic graph representation, h_i^sem ∈ R^d

h_i^sem → the hidden representation of the i-th node

{h_a1^sem, h_a2^sem, ..., h_am^sem} → the hidden representations of the aspect nodes
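Putting the two steps together, a minimal PyTorch sketch of the SemGCN module: single-head self-attention to build A^sem, then the GCNLayer sketch from above. Class and variable names are my own.

import math
import torch
import torch.nn as nn

class SemGCN(nn.Module):
    """Build A^sem = softmax((Q W^Q)(K W^K)^T / sqrt(d)) and apply Eq. (1)."""
    def __init__(self, dim):
        super().__init__()
        self.W_Q = nn.Linear(dim, dim, bias=False)   # W^Q
        self.W_K = nn.Linear(dim, dim, bias=False)   # W^K
        self.gcn = GCNLayer(dim, dim)                # the GCNLayer sketched earlier

    def forward(self, H):
        # H: (batch, n, d) graph representations from the previous layer (Q = K = H)
        Q, K = self.W_Q(H), self.W_K(H)
        A_sem = torch.softmax(Q @ K.transpose(-1, -2) / math.sqrt(H.size(-1)), dim=-1)
        return self.gcn(A_sem, H), A_sem             # H^sem and the adjacency matrix A^sem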

--------------------------

BiAffine Module: (H^syn, H^sem) ↦ H^syn', (H^syn, H^sem) ↦ H^sem'

BiAffine Module → a mutual BiAffine transformation as a bridge

Note that the biaffine transformation is a method to incorporate an attention mechanism into binary relations.

Reference: https://aclanthology.org/Y18-1052.pdf

H^syn' = softmax(H^syn W_1 (H^sem)^T) H^sem

H^sem' = softmax(H^sem W_2 (H^syn)^T) H^syn

W_1 and W_2 → trainable parameters.
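A minimal PyTorch sketch of this mutual biaffine bridge, assuming the reading of the two softmax equations above (class and parameter names are my own):

import torch
import torch.nn as nn

class BiAffine(nn.Module):
    """H_syn' = softmax(H_syn W1 H_sem^T) H_sem;  H_sem' = softmax(H_sem W2 H_syn^T) H_syn."""
    def __init__(self, dim):
        super().__init__()
        self.W1 = nn.Parameter(torch.randn(dim, dim))
        self.W2 = nn.Parameter(torch.randn(dim, dim))

    def forward(self, H_syn, H_sem):
        # H_syn, H_sem: (n, d) node representations from SynGCN and SemGCN
        A1 = torch.softmax(H_syn @ self.W1 @ H_sem.T, dim=-1)
        A2 = torch.softmax(H_sem @ self.W2 @ H_syn.T, dim=-1)
        return A1 @ H_sem, A2 @ H_syn                # H_syn', H_sem'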

f: (h_a1^syn, h_a2^syn, ..., h_am^syn) ↦ h_a^syn, (h_a1^sem, h_a2^sem, ..., h_am^sem) ↦ h_a^sem

f → an average pooling function applied over the aspect node representations

h_a^syn = f(h_a1^syn, h_a2^syn, ..., h_am^syn) → the final feature representation of SynGCN

h_a^sem = f(h_a1^sem, h_a2^sem, ..., h_am^sem) → the final feature representation of SemGCN

r = [h_a^syn, h_a^sem] → the concatenation of the two final feature representations
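A small sketch of the pooling and concatenation step with dummy tensors (shapes and names are illustrative):

import torch

m, d = 2, 100                                   # hypothetical: m aspect words, hidden size d
h_syn_aspect = torch.randn(m, d)                # {h_a1^syn, ..., h_am^syn}
h_sem_aspect = torch.randn(m, d)                # {h_a1^sem, ..., h_am^sem}
h_a_syn = h_syn_aspect.mean(dim=0)              # f: average pooling -> h_a^syn
h_a_sem = h_sem_aspect.mean(dim=0)              # f: average pooling -> h_a^sem
r = torch.cat([h_a_syn, h_a_sem], dim=-1)       # r = [h_a^syn, h_a^sem], shape (2d,)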

--------------------------

σ: r ↦ p
σ → the softmax function, p(a) = softmax(W_p r + b_p)

p → a sentiment probability distribution

W_p and b_p → the learnable weight and bias
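A minimal sketch of this classification head in PyTorch (the hidden size and the number of polarity classes are hypothetical, and r is a stand-in tensor):

import torch
import torch.nn as nn

d, num_polarities = 100, 3                      # hypothetical hidden size and |C| polarity classes
r = torch.randn(2 * d)                          # stand-in for r = [h_a^syn, h_a^sem]
classifier = nn.Linear(2 * d, num_polarities)   # W_p and b_p
p = torch.softmax(classifier(r), dim=-1)        # sentiment probability distribution p(a)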

--------------------------

Regularizer

the orthogonal regularizer: R_O = ‖A^sem (A^sem)^T − I‖_F

Idea: the semantically related terms of each word should not overlap. Therefore, we
encourage the attention probability distributions over words to be orthogonal

I → an identity matrix

‖·‖_F → the Frobenius norm, ‖A‖_F = √( Σ_{i=1}^{m} Σ_{j=1}^{n} |a_ij|² )

the differential regularizer: R_D = 1 / ‖A^sem − A^syn‖_F

Idea: the two representations should contain significantly distinct information captured by the
syntactic dependency and the semantic correlation

Note that the regularizer is only restrictive to A^sem
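A minimal PyTorch sketch of the two regularizers (my own function names; A_sem and A_syn are treated as single n×n matrices here):

import torch

def orthogonal_regularizer(A_sem):
    """R_O = ||A_sem A_sem^T - I||_F."""
    I = torch.eye(A_sem.size(-1), device=A_sem.device)
    return torch.norm(A_sem @ A_sem.transpose(-1, -2) - I, p='fro')

def differential_regularizer(A_sem, A_syn):
    """R_D = 1 / ||A_sem - A_syn||_F; call with A_syn.detach() so only A_sem is restricted."""
    return 1.0 / torch.norm(A_sem - A_syn, p='fro')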

Loss Function

l_T = l_C + λ_1 R_O + λ_2 R_D + λ_3 ‖Θ‖_2
λ_1, λ_2, λ_3 → regularization coefficients

Reference: https://allen108108.github.io/blog/2019/10/22/L1%20,%20L2%20Regularization%20到底正則化了什麼%20_/

Θ → represents all trainable model parameters

l_C → a standard cross-entropy loss

l_C = − Σ_{(s,a)∈D} Σ_{c∈C} log p(a)

D → contains all sentence-aspect pairs

C → the collection of distinct sentiment polarities

p(a) = softmax(W_p r + b_p)
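Putting the pieces together, a sketch of the total training loss using the regularizer functions sketched above (the logits, labels, params, and lambda values are placeholders):

import torch.nn.functional as F

def total_loss(logits, labels, A_sem, A_syn, params, lam1, lam2, lam3):
    """l_T = l_C + lambda_1 R_O + lambda_2 R_D + lambda_3 ||Theta||_2 (sketch)."""
    l_c = F.cross_entropy(logits, labels)                    # standard cross-entropy l_C
    r_o = orthogonal_regularizer(A_sem)                      # R_O
    r_d = differential_regularizer(A_sem, A_syn.detach())    # R_D, restrictive to A_sem only
    l2 = sum(p.pow(2).sum() for p in params)                 # parameter regularization over Theta
    return l_c + lam1 * r_o + lam2 * r_d + lam3 * l2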

Introduction
The key point in solving the ABSA task is to model the dependency relationship between an
aspect and its corresponding opinion expressions.

Experiments
The shallower, the stronger the correlation.
