Natural Language Processing with Deep Learning (CS224n, 2019)
Christopher Manning
Lecture 16: Coreference Resolution
CuuDuongThanCong.com https://fb.com/tailieudientucntt
Announcements
• We plan to get HW5 grades back tomorrow before the add/drop
deadline
Lecture Plan:
Lecture 16: Coreference Resolution
1. What is Coreference Resolution? (15 mins)
2. Applications of coreference resolution (5 mins)
3. Mention Detection (5 mins)
4. Some Linguistics: Types of Reference (5 mins)
Four Kinds of Coreference Resolution Models
5. Rule-based (Hobbs Algorithm) (10 mins)
6. Mention-pair models (10 mins)
7. Mention ranking models (15 mins)
• Including the current state-of-the-art coreference system!
8. Mention clustering model (5 mins – only partial coverage)
9. Evaluation and current results (10 mins)
1. What is Coreference Resolution?
• Identify all mentions that refer to the same real world entity
A couple of years later, Vanaja met Akhila at the local park.
Akhila’s son Prajwal was just two months younger than her son
Akash, and they went to the same school. For the pre-school
play, Prajwal was chosen for the lead role of the naughty child
Lord Krishna. Akash was to be a tree. She resigned herself to
make Akash the best tree that anybody had ever seen. She
bought him a brown T-shirt and brown trousers to represent the
tree trunk. Then she made a large cardboard cutout of a tree’s
foliage, with a circular opening in the middle for Akash’s face.
She attached red balls to it to represent fruits. It truly was the
nicest tree. From The Star by Shruthi Rao, with some shortening.
Applications
• Full text understanding
• information extraction, question answering, summarization, …
• “He was born in 1961” (Who?)
Applications
• Full text understanding
• Machine translation
• languages have different features for gender, number,
dropped pronouns, etc.
Applications
• Full text understanding
• Machine translation
• Dialogue Systems
“Book tickets to see James Bond”
“Spectre is playing near you at 2:00 and 3:00 today. How many
tickets would you like?”
“Two tickets for the showing at three”
Coreference Resolution in Two Steps
1. Detect the mentions (easy)
2. Cluster the mentions (hard)
3. Mention Detection
• Mention: span of text referring to some entity
• Three kinds of mentions:
1. Pronouns
• I, your, it, she, him, etc.
2. Named entities
• People, places, etc.
3. Noun phrases
• “a dog,” “the big fluffy cat stuck in the tree”
Mention Detection
• Span of text referring to some entity
• For detection: use other NLP systems
1. Pronouns
• Use a part-of-speech tagger
2. Named entities
• Use a NER system (like hw3)
3. Noun phrases
• Use a parser (especially a constituency parser – next week!)
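The pipeline above can be sketched as follows. Everything here is an illustrative assumption, not any particular library's API: the function name, the end-exclusive token spans, and the idea that upstream POS/NER/parsing systems have already produced their outputs.

```python
# Sketch: combine three mention sources into one candidate list.
# Assumes upstream systems already produced POS tags, NER spans,
# and NP spans as (start, end) token ranges, end-exclusive.

def detect_mentions(tokens, pos_tags, ner_spans, np_spans):
    candidates = set()
    # 1. Pronouns: single-token spans from the POS tagger
    for i, tag in enumerate(pos_tags):
        if tag in ("PRP", "PRP$"):          # Penn Treebank pronoun tags
            candidates.add((i, i + 1))
    # 2. Named entities from the NER system
    candidates.update(ner_spans)
    # 3. Noun phrases from the (constituency) parser
    candidates.update(np_spans)
    # The set dedupes spans proposed by more than one system
    return sorted(candidates, key=lambda s: (s[0], -s[1]))

tokens   = "I voted for Nader because he was smart".split()
pos_tags = ["PRP", "VBD", "IN", "NNP", "IN", "PRP", "VBD", "JJ"]
mentions = detect_mentions(tokens, pos_tags,
                           ner_spans=[(3, 4)],    # "Nader" from NER
                           np_spans=[(3, 4)])     # "Nader" again from parser
print(mentions)   # [(0, 1), (3, 4), (5, 6)] -- each span appears once
```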
Mention Detection: Not so Simple
• Marking all pronouns, named entities, and NPs as mentions
over-generates mentions
• Are these mentions?
• It is sunny
• Every student
• No student
• The best donut in the world
• 100 miles
How to deal with these bad mentions?
• Could train a classifier to filter out spurious mentions
Can we avoid a pipelined system?
• We could train a single classifier specifically for mention
detection, rather than using a POS tagger, NER system, and parser.
4. On to Coreference! First, some linguistics
• Coreference is when two mentions refer to the same entity in
the world
• Barack Obama traveled to … Obama
Anaphora vs Coreference
• Coreference with named entities
  text: Barack Obama … Obama → both map to the same entity in the world
• Anaphora
  text: Barack Obama … he → the anaphor he depends on its antecedent Barack Obama
Not all anaphoric relations are coreferential
• Not all noun phrases have reference
Anaphora vs. Coreference
• Not all anaphoric relations are coreferential
Anaphora vs. Cataphora
Cataphora
Four Kinds of Coreference Models
5. Traditional pronominal anaphora resolution:
Hobbs’ naive algorithm
1. Begin at the NP immediately dominating the pronoun
2. Go up tree to first NP or S. Call this X, and the path p.
3. Traverse all branches below X to the left of p, left-to-right,
breadth-first. Propose as antecedent any NP that has an NP or S
between it and X
4. If X is the highest S in the sentence, traverse the parse trees of
the previous sentences in order of recency. Traverse each
tree left-to-right, breadth-first. When an NP is encountered,
propose it as antecedent. If X is not the highest node, go to step 5.
Hobbs’ naive algorithm (1976)
5. From node X, go up the tree to the first NP or S. Call it X, and
the path p.
6. If X is an NP and the path p to X came from a non-head phrase
of X (a specifier or adjunct, such as a possessive, PP, apposition, or
relative clause), propose X as antecedent
(The original said “did not pass through the N’ that X immediately
dominates”, but the Penn Treebank grammar lacks N’ nodes….)
7. Traverse all branches below X to the left of the path, in a
left-to-right, breadth-first manner. Propose any NP encountered as
the antecedent
8. If X is an S node, traverse all branches of X to the right of the
path but do not go below any NP or S encountered. Propose
any NP as the antecedent.
9. Go to step 4
Until the deep learning era, Hobbs' algorithm was still often used as a feature in ML systems!
Hobbs Algorithm Example
Knowledge-based Pronominal Coreference
• She poured water from the pitcher into the cup until it was full
• She poured water from the pitcher into the cup until it was empty
6. Coreference Models: Mention Pair
“I voted for Nader because he was most aligned with my values,” she said.
I Nader he my she
Coreference Cluster 1
Coreference Cluster 2
Mention Pair Training
• N mentions in a document
• y_ij = 1 if mentions m_i and m_j are coreferent, −1 otherwise
• Just train with the regular cross-entropy loss (it looks a bit
different because this is binary classification)
I Nader he my she
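The mention-pair loss can be sketched directly from the slide's conventions: binary cross-entropy summed over every pair (i, j) with j < i, labels y_ij ∈ {1, −1}. The data layout (lower-triangular nested lists) is an illustrative choice, not from the lecture.

```python
import math

# Minimal sketch of mention-pair training: binary cross-entropy over
# all pairs (i, j), j < i. p[i][j] is the model's probability that
# mentions m_i and m_j corefer; y[i][j] = 1 if coreferent, -1 otherwise.

def mention_pair_loss(p, y):
    loss = 0.0
    for i in range(len(p)):
        for j in range(i):
            if y[i][j] == 1:
                loss -= math.log(p[i][j])        # push p up
            else:
                loss -= math.log(1.0 - p[i][j])  # push p down
    return loss

# Toy document with 3 mentions; lower-triangular probability table.
p = [[], [0.9], [0.2, 0.5]]
y = [[], [1],   [-1, 1]]
print(round(mention_pair_loss(p, y), 3))   # -ln 0.9 - ln 0.8 - ln 0.5
```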
Mention Pair Test Time
• Coreference resolution is a clustering task, but we are only
scoring pairs of mentions… what to do?
• Pick some threshold (e.g., 0.5) and add coreference links
between mention pairs where the predicted probability is above the threshold
• Take the transitive closure to get the clustering
“I voted for Nader because he was most aligned with my values,” she said.
I Nader he my she
Even though the model did not predict this coreference link,
I and my are coreferent due to transitivity
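The threshold-then-transitive-closure procedure can be sketched with a small union-find; the mention indexing and probability dictionary are illustrative choices.

```python
# Sketch of the test-time procedure: threshold the pairwise
# probabilities, then take the transitive closure with union-find,
# so mentions can end up clustered without a direct predicted link.

def cluster_mentions(n, pair_probs, threshold=0.5):
    parent = list(range(n))
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path compression
            x = parent[x]
        return x
    for (i, j), p in pair_probs.items():
        if p > threshold:
            parent[find(i)] = find(j)       # union the two clusters
    clusters = {}
    for m in range(n):
        clusters.setdefault(find(m), []).append(m)
    return sorted(clusters.values())

# Mentions: 0=I, 1=Nader, 2=he, 3=my, 4=she.
# "I" and "my" are never linked directly, but both link to "she".
probs = {(0, 4): 0.8, (3, 4): 0.7, (1, 2): 0.9}
print(cluster_mentions(5, probs))   # [[0, 3, 4], [1, 2]]
```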
Mention Pair Models: Disadvantage
• Suppose we have a long document with the following mentions
  • Ralph Nader … he … his … him … <several paragraphs>
    … voted for Nader because he …
  [Figure: linking the final he to the nearby Nader is relatively easy,
  but linking it all the way back to Ralph Nader is almost impossible]
Coreference Models: Mention Ranking
NA  I  Nader  he  my  she

  J = −Σ_{i=2}^{N} log ( Σ_{j=1}^{i−1} 1(y_ij = 1) · p(m_j, m_i) )

• Iterate through the candidate antecedents (previously occurring
mentions); for the ones that are coreferent with the current mention,
we want the model to assign a high probability p(m_j, m_i)
Coreference Models: Training
• We want the current mention m_i to be linked to any one of the
candidate antecedents it is coreferent with.
• Mathematically, we want to maximize this probability:

  Σ_{j=1}^{i−1} 1(y_ij = 1) · p(m_j, m_i)

  (iterate through the candidate antecedents, i.e., previously
  occurring mentions; for the ones that are coreferent, we want the
  model to assign a high probability)
• The model could produce 0.9 probability for one of the correct
antecedents and low probability for everything else, and the
sum will still be large
Coreference Models: Training
• We want to maximize Σ_{j=1}^{i−1} 1(y_ij = 1) · p(m_j, m_i) for each mention m_i
• Turning this into a loss function:

  J = −Σ_{i=2}^{N} log ( Σ_{j=1}^{i−1} 1(y_ij = 1) · p(m_j, m_i) )

  Iterate over all the mentions in the document (the outer sum);
  the usual trick of taking the negative log turns the likelihood into a loss
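The mention-ranking loss can be sketched as follows; the nested-list layout is an illustrative choice, and labels use 1/0 rather than 1/−1 for brevity.

```python
import math

# Sketch of the mention-ranking objective:
#   J = -sum_{i=2}^{N} log sum_{j=1}^{i-1} 1(y_ij = 1) * p(m_j, m_i)
# One correct antecedent with high probability is enough, because the
# probabilities are summed BEFORE the log. (In practice every mention
# has the NA dummy as a fallback antecedent, so the inner sum is
# never empty; this toy example just gives each mention a true link.)

def ranking_loss(p, y):
    # p[i][j]: probability that m_j is the antecedent of m_i (j < i)
    # y[i][j]: 1 if m_i and m_j are coreferent, else 0
    loss = 0.0
    for i in range(1, len(p)):
        mass = sum(p[i][j] for j in range(i) if y[i][j] == 1)
        loss -= math.log(mass)
    return loss

p = [[], [0.7], [0.1, 0.8]]
y = [[], [1],   [0, 1]]
print(round(ranking_loss(p, y), 3))   # -ln 0.7 - ln 0.8
```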
Mention Ranking Models: Test Time
NA I Nader he my she
How do we compute the probabilities?
A. Non-Neural Coref Model: Features
• Person/Number/Gender agreement
• Jack gave Mary a gift. She was excited.
• Semantic compatibility
• … the mining conglomerate … the company …
• Certain syntactic constraints
• John bought him a new car. [him cannot be John]
• More recently mentioned entities are preferred for reference
• John went to a movie. Jack went as well. He was not busy.
• Grammatical Role: Prefer entities in the subject position
• John went to a movie with Jack. He was not busy.
• Parallelism:
• John went with Jack to a movie. Joe went with him to a bar.
• …
B. Neural Coref Model
• Standard feed-forward neural network
• Input layer: word embeddings and a few categorical features
  [Figure: feed-forward network; a hidden layer h3 feeds a final score s = W4 h3 + b4]
Neural Coref Model: Inputs
• Embeddings
• Previous two words, first word, last word, head word, … of
each mention
• The head word is the “most important” word in the mention – you can
find it using a parser. e.g., the fluffy cat stuck in the tree → head is cat
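As a rough illustration (an assumption for this sketch, not the lecture's method, which uses a parser), a common English heuristic takes the last noun before any post-modifier such as a PP or relative clause:

```python
# Heuristic head finder: last noun before a preposition/relativizer.
# Real systems use parser-derived head rules; this is only a sketch.

def guess_head(tokens, pos_tags):
    head = None
    for tok, tag in zip(tokens, pos_tags):
        if tag.startswith("NN"):
            head = tok                   # remember the latest noun
        elif tag in ("IN", "WDT") and head is not None:
            break                        # stop at a post-modifier
    return head

tokens = "The fluffy cat stuck in the tree".split()
tags   = ["DT", "JJ", "NN", "VBN", "IN", "DT", "NN"]
print(guess_head(tokens, tags))   # cat
```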
C. End-to-end Model
• Current state-of-the-art model for coreference resolution
(Kenton Lee et al. from UW, EMNLP 2017)
• Mention ranking model
• Improvements over simple feed-forward NN
• Use an LSTM
• Use attention
• Do mention detection and coreference end-to-end
• No mention detection step!
• Instead consider every span of text (up to a certain length) as a
candidate mention
• a span is just a contiguous sequence of words
End-to-end Model
• First embed the words in the document using a word embedding matrix
and a character-level CNN
  General Electric said the Postal Service contacted the company
  [Figure 1, Lee et al. 2017: first step of the end-to-end coreference
  resolution model, which computes embedding representations of spans
  for scoring potential entity mentions. Low-scoring spans are pruned
  so that only a manageable number is considered; in general the model
  considers all spans up to a maximum width, but the figure depicts
  only a small subset.]
End-to-end Model
• Then run a bidirectional LSTM over the document
  General Electric said the Postal Service contacted the company
End-to-end Model
• Next, represent each span of text i going from START(i) to END(i)
as a vector
  General Electric said the Postal Service contacted the company
End-to-end Model
• Next, represent each span of text i going from START(i) to END(i)
as a vector
• General, General Electric, General Electric said, …, Electric,
Electric said, … will all get their own vector representations
End-to-end Model
• Next, represent each span of text i going from START(i) to END(i)
as a vector. For example, "the postal service" gets its own span
representation:

  g_i = [x*_START(i), x*_END(i), x̂_i, φ(i)]

  • x*_START(i) and x*_END(i): the BiLSTM hidden states for the
    span's start and end
  • x̂_i: an attention-based representation of the words in the span
    (details next slide)
  • φ(i): additional features
• This generalizes the recurrent span representations recently
proposed for question answering (Lee et al., 2016)
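The span-representation wiring above can be sketched in a few lines. Everything numeric here is a random stand-in (the word embeddings, the "BiLSTM" states, the attention vector, and all sizes are illustrative assumptions); only the shapes and the attention-weighted average are the point.

```python
import numpy as np

# Sketch of g_i = [x*_START(i), x*_END(i), x_hat_i, phi(i)]
# (Lee et al., 2017). Random vectors stand in for trained components.

rng = np.random.default_rng(0)
T, E, H = 10, 8, 8                 # doc length, embedding size, hidden size
x      = rng.normal(size=(T, E))   # word (+ char CNN) embeddings
x_star = rng.normal(size=(T, H))   # stand-in for BiLSTM outputs
w_alpha = rng.normal(size=H)       # attention scoring vector (FFNN omitted)

def span_representation(start, end, phi):
    # attention scores over the words in the span (end inclusive)
    alpha = x_star[start:end + 1] @ w_alpha
    a = np.exp(alpha - alpha.max())
    a /= a.sum()                        # softmax attention weights a_{i,t}
    x_hat = a @ x[start:end + 1]        # weighted average of word embeddings
    return np.concatenate([x_star[start], x_star[end], x_hat, phi])

g = span_representation(2, 4, phi=np.array([3.0]))  # phi(i): e.g. span width
print(g.shape)   # (25,) = H + H + E + 1
```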
End-to-end Model
• x̂_i is an attention-weighted average of the word embeddings in
the span:

  Attention scores:   α_t = w_α · FFNN_α(x*_t)
  Attention weights:  a_{i,t} = exp(α_t) / Σ_{k=START(i)}^{END(i)} exp(α_k)
  Span head:          x̂_i = Σ_{t=START(i)}^{END(i)} a_{i,t} · x_t

• [Figure: for each span, word & character embeddings (x) feed a
bidirectional LSTM (x*), which feeds the span head (x̂) and the span
representation (g), which is scored to give the mention score (s_m)]
End-to-end Model
• Lastly, score every pair of spans to decide if they are coreferent
mentions:

  s(i, j) = 0                                if j = ε
  s(i, j) = s_m(i) + s_m(j) + s_a(i, j)      otherwise

• The score combines three factors: (1) whether span i is a mention,
(2) whether span j is a mention, and (3) whether j is an antecedent of i
• Here s_m(i) is a unary score for span i being a mention, and
s_a(i, j) is a pairwise score for span j being an antecedent of span i
• The scoring functions take the span representations as input and
are computed via standard feed-forward neural networks:

  s_m(i) = w_m · FFNN_m(g_i)
  s_a(i, j) = w_a · FFNN_a([g_i, g_j, g_i ∘ g_j, φ(i, j)])
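A shape-level sketch of these scoring functions follows; single random ReLU layers stand in for the trained FFNNs, and all sizes are illustrative assumptions.

```python
import numpy as np

# Sketch of the end-to-end model's scores:
#   s_m(i)   = w_m . FFNN_m(g_i)
#   s_a(i,j) = w_a . FFNN_a([g_i, g_j, g_i * g_j, phi(i,j)])
#   s(i,j)   = 0 if j is the NA dummy, else s_m(i) + s_m(j) + s_a(i,j)

rng = np.random.default_rng(1)
D = 6                                   # span-representation size

def ffnn(v, W, w):
    return w @ np.maximum(W @ v, 0.0)   # one hidden ReLU layer

W_m, w_m = rng.normal(size=(4, D)), rng.normal(size=4)
W_a, w_a = rng.normal(size=(4, 3 * D + 1)), rng.normal(size=4)

def s_m(g):
    return ffnn(g, W_m, w_m)

def score(g_i, g_j=None, phi=np.zeros(1)):
    if g_j is None:                     # j = NA: the dummy antecedent
        return 0.0
    pair = np.concatenate([g_i, g_j, g_i * g_j, phi])
    return s_m(g_i) + s_m(g_j) + ffnn(pair, W_a, w_a)

g1, g2 = rng.normal(size=D), rng.normal(size=D)
print(score(g1))        # 0.0 for the NA antecedent
print(score(g1, g2))    # some real-valued coreference score
```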
Coreference Models: Clustering-Based
Coreference Models: Clustering-Based
Coreference with Agglomerative Clustering
  Google recently … the company announced Google Plus ... the product features ...
  Cluster 1: Google    Cluster 2: the company    Cluster 3: Google Plus    Cluster 4: the product
  Merge step: are Cluster 1 and Cluster 2 coreferent?
Clustering Model Architecture
From Clark & Manning, 2016
Merge clusters c1 = {Google, the company} and c2 = {Google Plus, the product}?
  Mention Pairs → Mention-Pair Representations → Cluster-Pair Representation → Score
Clustering Model Architecture: Producing Cluster-Pair Representations
• First produce a vector for each pair of mentions using the
mention-pair encoder
  • e.g., the output of the hidden layer in the feedforward neural
    network model
Clustering Model Architecture: Producing Cluster-Pair Representations
• Then apply a pooling operation over the matrix R_m(c1, c2) of
mention-pair representations to get a cluster-pair representation
r_c(c1, c2) (using max and average pooling)
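The pooling step can be sketched as follows; the mention-pair encoder is faked with random vectors, and the vector size is an illustrative assumption.

```python
import numpy as np

# Sketch of the cluster-pair representation: stack the mention-pair
# vectors for every (m in c1, m' in c2) pair into a matrix R_m, then
# max- and average-pool over the pairs and concatenate.

rng = np.random.default_rng(2)
d = 5                                    # mention-pair vector size

def cluster_pair_rep(c1_size, c2_size, pair_vec):
    R = np.stack([pair_vec(i, j)
                  for i in range(c1_size) for j in range(c2_size)])
    return np.concatenate([R.max(axis=0), R.mean(axis=0)])

pair_vec = lambda i, j: rng.normal(size=d)   # stand-in encoder output
r_c = cluster_pair_rep(2, 2, pair_vec)       # two 2-mention clusters
print(r_c.shape)   # (10,) = d (max pool) + d (avg pool)
```

The merge score is then just a dot product of `r_c` with a learned weight vector.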
Clustering Model Architecture: Scoring a Candidate Merge
• Score the candidate cluster merge by taking the dot product of the
cluster-pair representation with a weight vector
Clustering Model: Training
9. Coreference Evaluation
• Many different metrics: MUC, CEAF, LEA, B-CUBED, BLANC
• Often report the average over a few different metrics
  [Figure: example system clusterings scored against two gold
  clusters; a mostly-correct clustering gets P = 4/5, R = 4/6, while
  a poor one gets P = 1/5, R = 1/3]
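One of the listed metrics, B-CUBED, can be sketched concretely; the toy clusters below are made up for illustration, not the slide's example.

```python
# Sketch of B-CUBED: per-mention precision is the fraction of the
# mention's predicted cluster that is truly coreferent with it;
# recall swaps the roles of system and gold clusters.

def b_cubed(system, gold):
    sys_of = {m: c for c in system for m in c}    # mention -> its cluster
    gold_of = {m: c for c in gold for m in c}
    mentions = list(sys_of)
    p = sum(len(set(sys_of[m]) & set(gold_of[m])) / len(sys_of[m])
            for m in mentions) / len(mentions)
    r = sum(len(set(sys_of[m]) & set(gold_of[m])) / len(gold_of[m])
            for m in mentions) / len(mentions)
    return p, r

# Toy example: one wrong mention ("e") placed in the first cluster.
system = [{"a", "b", "c", "e"}, {"d", "f"}]
gold   = [{"a", "b", "c"}, {"d", "e", "f"}]
p, r = b_cubed(system, gold)
print(round(p, 3), round(r, 3))   # 0.75 0.778
```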
System Performance
• OntoNotes dataset: ~3000 documents labeled by humans
• English and Chinese data
System Performance
Model               English   Chinese
Lee et al. (2010)     ~55       ~50     ← rule-based system, used to be state-of-the-art!
Where do neural scoring models help?
Example Wins
Anaphor                                Antecedent
the country's leftist rebels          the guerillas
the company                            the New York firm
216 sailors from the “USS Cole”        the crew
the gun                                the rifle
Conclusion
• Coreference is a useful, challenging, and linguistically interesting
task
• Many different kinds of coreference resolution systems
• Systems are getting better rapidly, largely due to better neural
models
• But overall, results are still not amazing
• Try out a coreference system yourself!
• http://corenlp.run/ (ask for coref in Annotations)
• https://huggingface.co/coref/