Studying - Dual Graph Convolutional Networks For Aspect-Based Sentiment Analysis
Reference: https://www.youtube.com/watch?v=GXhBEj1ZtE8
Reference: https://purelyvivid.github.io/2019/07/07/GCN_1/
Given a graph with $n$ nodes, the graph can be represented as an adjacency matrix $A \in \mathbb{R}^{n \times n}$.
$h_i^l = \sigma \left( \sum_{j=1}^{n} A_{ij} W^l h_j^{l-1} + b^l \right)$ → the hidden state representation of the $i$-th node at the $l$-th layer
$A_{ij} = \begin{cases} 1, & \text{if the } i\text{-th node is connected to the } j\text{-th node} \\ 0, & \text{otherwise} \end{cases}$
$W^l$ → a weight matrix
$b^l$ → a bias term
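A minimal PyTorch sketch of this GCN layer (the class name GCNLayer and the choice of ReLU for σ are assumptions, not fixed by the notes):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GCNLayer(nn.Module):
    """One GCN layer: h_i^l = sigma(sum_j A_ij W^l h_j^{l-1} + b^l)."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.linear = nn.Linear(in_dim, out_dim)  # holds W^l and b^l

    def forward(self, h, A):
        # h: (batch, n, in_dim) node states; A: (batch, n, n) adjacency matrix
        # bmm(A, h) computes sum_j A_ij * h_j^{l-1} for every node i
        return F.relu(self.linear(torch.bmm(A, h)))
```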
$x = \{x_1, x_2, \cdots, x_n\}$ → the word embeddings of the sentence from an embedding lookup table
$E \in \mathbb{R}^{|V| \times d_e}$ → the embedding lookup table
$|V|$ → the size of the vocabulary
$d_e$ → the dimensionality of word embeddings
$H = \{h_1, h_2, \cdots, h_n\}$ → hidden state vectors $h_i \in \mathbb{R}^{2d}$ produced by a BiLSTM, where $d$ is the dimensionality of a hidden state vector output by a unidirectional LSTM
input: [CLS] sentence [SEP] aspect [SEP] (a sentence-aspect pair) ↦ aspect-aware hidden representations of the sentence
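A sketch of encoding the sentence-aspect pair with the HuggingFace tokenizer (bert-base-uncased is an assumed variant; the example sentence is illustrative):

```python
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
bert = AutoModel.from_pretrained("bert-base-uncased")

sentence = "The battery life is great but the screen is dim."
aspect = "battery life"

# Passing a text pair yields: [CLS] sentence [SEP] aspect [SEP]
inputs = tokenizer(sentence, aspect, return_tensors="pt")
hidden = bert(**inputs).last_hidden_state  # aspect-aware hidden representations
```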
--------------------------
Idea: the probability matrix representing dependencies between words contains richer syntactic information than the final discrete output of a dependency parser.
LAL-Parser: dependency parser ↦ dependency probability matrix
Reference: https://www.youtube.com/watch?v=9erBrs-VIqc&t=855s
Eq. (1) → $h_i^l = \sigma \left( \sum_{j=1}^{n} A_{ij} W^l h_j^{l-1} + b^l \right)$
$H^{syn}$ → the syntactic graph representation, $h_i^{syn} \in \mathbb{R}^{d}$
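A short sketch of how SynGCN could consume the probability matrix, reusing the hypothetical GCNLayer above (the shapes and the random stand-in matrix are assumptions):

```python
import torch

batch, n, d = 2, 10, 64
H = torch.randn(batch, n, d)  # contextual hidden states (BiLSTM/BERT)
# Stand-in for the LAL-Parser dependency probability matrix (rows sum to 1)
A_syn = torch.softmax(torch.randn(batch, n, n), dim=-1)

syn_gcn = GCNLayer(d, d)      # GCNLayer as defined in the sketch above
H_syn = syn_gcn(H, A_syn)     # the syntactic graph representation H^syn
```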
--------------------------
Self-Attention
Reference: https://www.youtube.com/watch?v=hYdO9CscNes
$A^{sem} = \mathrm{softmax}\left( \frac{Q W^Q \times (K W^K)^T}{\sqrt{d}} \right)$ → the adjacency matrix of SemGCN, $A^{sem} \in \mathbb{R}^{n \times n}$
$W^Q$ and $W^K$ → learnable weight matrices
$H^{sem}$ → the semantic graph representation, $h_i^{sem} \in \mathbb{R}^{d}$
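A sketch of building A^sem with single-head self-attention, where Q = K = the hidden states H (the class name is an assumption):

```python
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

class SemanticAdjacency(nn.Module):
    """A_sem = softmax(Q W^Q (K W^K)^T / sqrt(d)), with Q = K = H."""
    def __init__(self, d):
        super().__init__()
        self.W_Q = nn.Linear(d, d, bias=False)
        self.W_K = nn.Linear(d, d, bias=False)
        self.scale = math.sqrt(d)

    def forward(self, H):
        # H: (batch, n, d) hidden states; scores: (batch, n, n)
        scores = self.W_Q(H) @ self.W_K(H).transpose(1, 2) / self.scale
        return F.softmax(scores, dim=-1)  # row-stochastic adjacency A_sem
```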
--------------------------
BiAffine Module: $(H^{syn}, H^{sem}) \mapsto H^{syn\prime}$, $(H^{syn}, H^{sem}) \mapsto H^{sem\prime}$
Reference: https://aclanthology.org/Y18-1052.pdf
f → an average pooling function applied over the aspect node representations
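A sketch of a mutual BiAffine exchange consistent with these mappings (the parameterization with two learnable matrices W1/W2 is one reading of the module, so treat it as an assumption):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BiAffine(nn.Module):
    """Mutual BiAffine bridge: each representation attends over the other."""
    def __init__(self, d):
        super().__init__()
        self.W1 = nn.Linear(d, d, bias=False)
        self.W2 = nn.Linear(d, d, bias=False)

    def forward(self, H_syn, H_sem):
        # H_syn' gathers semantic features for each node, and vice versa
        H_syn2 = F.softmax(self.W1(H_syn) @ H_sem.transpose(1, 2), dim=-1) @ H_sem
        H_sem2 = F.softmax(self.W2(H_sem) @ H_syn.transpose(1, 2), dim=-1) @ H_syn
        return H_syn2, H_sem2
```

For a contiguous aspect span, f could then be, e.g., `r = H_syn2[:, start:end].mean(dim=1)` (a hypothetical slice, assuming the aspect occupies positions start..end).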
--------------------------
$\sigma: r \mapsto p$
$\sigma$ → softmax function, $p_a = \mathrm{softmax}(W_p r + b_p)$
--------------------------
Regularizer
Idea: the semantically related terms of each word should not overlap; therefore, the attention probability distributions over words are encouraged to be orthogonal
the orthogonal regularizer: $R_O = \left\| A^{sem} {A^{sem}}^{T} - I \right\|_F$
$I$ → an identity matrix
$F$ → the Frobenius norm, $\| A \|_F = \sqrt{\sum_{i=1}^{m} \sum_{j=1}^{n} | a_{ij} |^2}$
the differential regularizer: $R_D = \frac{1}{\left\| A^{sem} - A^{syn} \right\|_F}$
Idea: the two representations should contain significantly distinct information captured by the
syntactic dependency and the semantic correlation
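A sketch of both regularizers for a batch of adjacency matrices (the batch-mean reduction and the epsilon guarding the division are assumptions):

```python
import torch

def orthogonal_regularizer(A_sem):
    # R_O = || A_sem A_sem^T - I ||_F : pushes the rows of A_sem (the attention
    # probability distributions over words) toward orthogonality
    n = A_sem.size(-1)
    I = torch.eye(n, device=A_sem.device)
    gram = A_sem @ A_sem.transpose(-1, -2)
    return torch.linalg.matrix_norm(gram - I, ord="fro").mean()

def differential_regularizer(A_sem, A_syn, eps=1e-8):
    # R_D = 1 / || A_sem - A_syn ||_F : pushes the two adjacencies apart
    return (1.0 / (torch.linalg.matrix_norm(A_sem - A_syn, ord="fro") + eps)).mean()
```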
Loss Function
$l_T = l_C + \lambda_1 R_O + \lambda_2 R_D + \lambda_3 \| \Theta \|_2$
$\lambda_1, \lambda_2, \lambda_3$ → regularization coefficients
Reference: https://allen108108.github.io/blog/2019/10/22/L1%20,%20L2%20Regularization%20到底正則化了什麼%20_/
$l_C = -\sum_{(s,a) \in \mathcal{D}} \sum_{c \in \mathcal{C}} \log p(a)$
$p_a = \mathrm{softmax}(W_p r + b_p)$
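Putting the loss together (a sketch; `logits` stands for $W_p r + b_p$ before the softmax, and the squared L2 penalty is a common stand-in for the $\| \Theta \|_2$ term):

```python
import torch.nn.functional as F

def total_loss(logits, labels, A_sem, A_syn, model, lam1, lam2, lam3):
    # l_C: cross-entropy over (sentence, aspect) pairs; F.cross_entropy
    # applies log-softmax internally, matching -log p(a)
    l_C = F.cross_entropy(logits, labels)
    R_O = orthogonal_regularizer(A_sem)            # from the sketch above
    R_D = differential_regularizer(A_sem, A_syn)
    l2 = sum(p.pow(2).sum() for p in model.parameters())
    return l_C + lam1 * R_O + lam2 * R_D + lam3 * l2
```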
Introduction
The key point in solving the ABSA task is to model the dependency relationship between an
aspect and its corresponding opinion expressions.
Experiments
The shallower, the stronger the correlation.