Annotation of Anaphora and Coreference For Automatic Processing
Annotation:
Linguistically motivated: tries to capture certain
phenomena (usually focuses on anaphora)
Application motivated: limited relations are
encoded (usually focuses on coreference)
Structure
1. Background information
2. The MUC annotation for coreference
3. The NP4E corpus
4. Event coreference and NP coreference
5. Conclusions
Anaphora and anaphora resolution
Anaphora: cohesion which points back to some previous item (Halliday and Hasan, 1976)
The "pointing back" word is called an anaphor, and the entity to which it refers or for which it stands is its antecedent (Mitkov, 2002)
The process of determining the antecedent of an
anaphor is called anaphora resolution (Mitkov,
2002)
Anaphora resolution can be seen as a process of
filling empty or almost empty expressions with
information from other expressions
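The idea of "filling" an anaphor with information from its antecedent can be illustrated with a deliberately naive resolver. This is a sketch under stated assumptions, not the method discussed in these slides: the `Mention` class, its gender/number features and the recency-plus-agreement rule are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Mention:
    text: str
    position: int      # token offset in the text
    gender: str        # hypothetical feature values: "f", "m" or "n"
    number: str        # hypothetical feature values: "sg" or "pl"
    is_pronoun: bool


def resolve(anaphor: Mention, mentions: List[Mention]) -> Optional[Mention]:
    """Naive baseline (an assumption, not the slides' method): pick the closest
    preceding non-pronominal mention that agrees with the anaphor in gender
    and number. Returns None if no candidate qualifies."""
    candidates = [
        m for m in mentions
        if m.position < anaphor.position
        and not m.is_pronoun
        and m.gender == anaphor.gender
        and m.number == anaphor.number
    ]
    return max(candidates, key=lambda m: m.position, default=None)


# "Sophia Loren says she will always be grateful to Bono."
mentions = [
    Mention("Sophia Loren", 0, "f", "sg", False),
    Mention("she", 3, "f", "sg", True),
    Mention("Bono", 9, "m", "sg", False),
]
antecedent = resolve(mentions[1], mentions)
print(f"she -> {antecedent.text}")  # she -> Sophia Loren
```

Real resolvers weigh many more factors (syntax, salience, semantic compatibility); recency and agreement alone fail on cases such as cataphora, where the antecedent follows the anaphor.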
Coreference and coreference resolution
When the anaphor refers to an antecedent and both have the same referent in the real world, they are termed coreferential (Mitkov, 2002)
Coreferential chains:
{Sophia Loren, she, the actress, her, she},
{Bono, the U2 singer},
{a thunderstorm},
{a plane}
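Annotations are often stored as pairwise links between a mention and its antecedent; the coreferential chains above are then the equivalence classes induced by those links. A minimal sketch using a union-find structure, assuming hypothetical integer mention IDs and links (the text the chains were drawn from is not reproduced here):

```python
from typing import Dict, List, Tuple


def build_chains(mention_ids: List[int], links: List[Tuple[int, int]]) -> List[List[int]]:
    """Group mentions into chains from (anaphor, antecedent) links using union-find."""
    parent = {m: m for m in mention_ids}

    def find(x: int) -> int:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for anaphor, antecedent in links:
        parent[find(anaphor)] = find(antecedent)

    chains: Dict[int, List[int]] = {}
    for m in mention_ids:  # assumes IDs follow document order, so chains keep that order
        chains.setdefault(find(m), []).append(m)
    return list(chains.values())


# Hypothetical mention IDs; singletons such as "a plane" form chains of length one.
texts = {1: "Sophia Loren", 2: "she", 3: "Bono", 4: "a thunderstorm",
         5: "a plane", 6: "the actress", 7: "her", 8: "she", 9: "the U2 singer"}
links = [(2, 1), (6, 2), (7, 6), (8, 7), (9, 3)]

for chain in build_chains(sorted(texts), links):
    print([texts[m] for m in chain])
```

This prints the four chains listed above, in the same order.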
Examples of anaphoric expressions from Mitkov (2002)
Indirect anaphora: Although the store had only just
opened, the food hall was busy and there were long
queues at the tills.
Identity-of-sense anaphora: The man who gave
his paycheck to his wife was wiser than the man who
gave it to his mistress.
Verb and adverb anaphora: Stephanie sang, as
did Mike
Bound anaphora: Every man has his own agenda
Cataphora: The elevator opened for him on the 14th
floor, and Alec stepped out quickly.
Anaphora vs. coreference
There are many anaphoric expressions which are
not coreferential
Most of the coreferential expressions are anaphoric
(Sophia Loren, the actress)
Coreferential expressions that may or may not be
anaphoric
(Sophia Loren, the actress Sophia Loren) – not anaphoric?
(the actress Sophia Loren, Sophia Loren) – anaphoric
Coreferential expressions which are not anaphoric
(Sophia Loren, Sophia Loren)
Cross-document coreference is not anaphora
Substitution test
To determine whether two expressions are coreferential, a substitution test is used:
Sophia Loren says she will always be grateful to Bono. → Sophia Loren says Sophia Loren will always be grateful to Bono.
John has his own agenda. → John has John’s own agenda.
Every man has his own agenda. → Every man has every man’s own agenda. ??
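The mechanical half of the test, rewriting the sentence with the candidate antecedent in place of the anaphor, is easy to script; judging whether the result still means the same thing is left to the annotator. A minimal sketch, assuming pre-tokenised input and a hypothetical `possessive` flag that the caller sets for possessive determiners such as "his" (the word "her" is ambiguous between object and possessive, so the sketch does not try to guess):

```python
from typing import List


def substitute(tokens: List[str], index: int, antecedent: str, possessive: bool = False) -> str:
    """Replace the anaphor at tokens[index] with its candidate antecedent.

    If the anaphor is a possessive determiner, the antecedent gets a
    possessive 's so that "his own agenda" becomes "John's own agenda".
    """
    replacement = antecedent + "'s" if possessive else antecedent
    return " ".join(tokens[:index] + [replacement] + tokens[index + 1:])


print(substitute("Sophia Loren says she will always be grateful to Bono".split(), 3, "Sophia Loren"))
print(substitute("John has his own agenda".split(), 2, "John", possessive=True))
print(substitute("Every man has his own agenda".split(), 3, "every man", possessive=True))
```

The third rewrite is the one marked ?? above: the substitution does not yield an equivalent sentence, which is why the bound anaphor fails the test.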
Anaphora & coreference in computational linguistics
Anaphora and coreference resolution are important preprocessing steps for a wide range of applications such as machine translation, information extraction, automatic summarisation, etc.
(i) annotate identity-of-reference direct nominal anaphora
(ii) annotate definite descriptions which stand in any of the identity, synonymy, generalisation, specialisation, or copula relationships with an antecedent
(iii) annotate definite NPs in a copula relation as coreferential
(iv) annotate definite appositional and bracketed phrases as coreferential with the NP of which they are a part
(v) annotate NPs at all levels from base to complex and co-ordinated NPs
(vi) familiarise yourself with the use of unfamiliar, highly specialised terminology by searching through the text
(vii) annotate bound anaphors

(i) annotate indefinite predicate nominals that are linked to other elements by perception verbs as coreferential with those elements
(ii) annotate identity-of-sense anaphora
(iii) annotate indirect anaphora between markables
(iv) annotate cross-document coreference
(v) annotate indefinite NPs in copula relations with other NPs as coreferential
(vi) annotate non-permanent or “potential” coreference between markables
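Guidelines like these decide which noun phrases become markables and which links between them are recorded. To make the nesting in guideline (v) concrete, here is a sketch that reads inline coreference markup back into markables. It assumes SGML-style `<COREF ID=".." REF="..">` tags in the spirit of the MUC coreference task; the exact attribute set and the sample sentences are illustrative assumptions, not a copy of any corpus file.

```python
import re
from typing import Dict, Optional, Tuple

TAG = re.compile(r'<COREF\s+([^>]*)>|</COREF>')
ATTR = re.compile(r'(\w+)="([^"]*)"')


def parse_coref(marked_up: str) -> Tuple[str, Dict[str, Tuple[str, Optional[str]]]]:
    """Strip COREF tags and return (plain_text, markables).

    markables maps each ID to (surface string, REF or None). A stack of
    open tags lets markables nest, e.g. a location inside a facility NP.
    """
    text = ""
    stack = []  # open tags as (attributes, start offset in the plain text)
    markables: Dict[str, Tuple[str, Optional[str]]] = {}
    pos = 0
    for match in TAG.finditer(marked_up):
        text += marked_up[pos:match.start()]
        pos = match.end()
        if match.group(1) is not None:  # opening tag
            stack.append((dict(ATTR.findall(match.group(1))), len(text)))
        else:  # closing tag: the markable spans from its start to here
            attrs, start = stack.pop()
            markables[attrs["ID"]] = (text[start:], attrs.get("REF"))
    text += marked_up[pos:]
    return text, markables


sample = ('<COREF ID="1">Sophia Loren</COREF> says <COREF ID="2" TYPE="IDENT" REF="1">she</COREF> '
          'will always be grateful to <COREF ID="3">Bono</COREF>.')
plain, marks = parse_coref(sample)
print(plain)  # Sophia Loren says she will always be grateful to Bono.
print(marks)
```

A regular expression alone cannot pair nested tags correctly, which is why the sketch keeps a stack of open markables.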
the operation
TYPE: attack
TIME: Dec. 17
REF: stormed
TARGET: the Japanese ambassador's residence in Lima (FACILITY)
ATTACKER: MRTA rebels (PERSON)
PLACE: Lima (LOCATION)
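An event annotation like the one above is essentially a template: a trigger plus typed argument slots. One way to hold it in code is sketched below; the class names and the choice to store slot fillers as text plus an optional semantic class are assumptions for illustration, not the format of the annotation tool used for the corpus.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Argument:
    text: str
    sem_class: Optional[str] = None  # e.g. FACILITY, PERSON, LOCATION


@dataclass
class Event:
    type: str
    trigger: str                     # the REF expression, e.g. "stormed"
    slots: Dict[str, Argument] = field(default_factory=dict)

    def describe(self) -> str:
        """Render the template one slot per line, mirroring the layout above."""
        lines = [f"TYPE: {self.type}", f"REF: {self.trigger}"]
        for role, arg in self.slots.items():
            suffix = f" ({arg.sem_class})" if arg.sem_class else ""
            lines.append(f"{role}: {arg.text}{suffix}")
        return "\n".join(lines)


attack = Event(
    type="attack",
    trigger="stormed",
    slots={
        "TIME": Argument("Dec. 17"),
        "TARGET": Argument("the Japanese ambassador's residence in Lima", "FACILITY"),
        "ATTACKER": Argument("MRTA rebels", "PERSON"),
        "PLACE": Argument("Lima", "LOCATION"),
    },
)
print(attack.describe())
```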
Issues with event annotation
Very difficult annotation task
At times it is difficult to decide the tense of an event
in direct speech
Whether to include demands, promises or threats in CONTACT events (or use them only as a signal of modality)
Whether to make a distinction between
speaker/hearer in CONTACT events (especially in
the case of demands, promises or threats)
What do coreferential events indicate?
(Hasler and Orasan 2009)
Zaire said on Monday its warplanes were bombing three key rebel-held towns in its eastern
border provinces and that the raids would increase in intensity.
a333 TRIGGER: bombing
ATTACKER: Zaire: ID=44: CHAIN=5: ORGANISATION
MEANS: its warplanes: ID=46: CHAIN=46: VEHICLE
PLACE: three key rebel-held towns in its eastern border provinces: ID=48:
CHAIN=14: LOCATION
TARGET: three key rebel-held towns in its eastern border provinces: ID=48:
CHAIN=14: LOCATION
TIME: Monday: ID=45: CHAIN=7
“Since this morning the FAZ (Zaire army) has been bombing Bukavu, Shabunda and
Walikale”, said a defence ministry statement in the capital Kinshasa.
a334 TRIGGER: bombing
ATTACKER: the FAZ (Zaire army): ID=53: CHAIN=53: ORGANISATION
MEANS: –
PLACE: Bukavu, Shabunda and Walikale: ID=55: CHAIN=14: LOCATION
TARGET: Bukavu, Shabunda and Walikale: ID=55: CHAIN=14: LOCATION
TIME: this morning: ID=52: CHAIN=52
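One way to see what the two bombing events share is to compare their arguments slot by slot through the CHAIN identifiers above. A minimal sketch, assuming each event is reduced to a mapping from slot name to CHAIN id (an empty slot, such as MEANS in a334, is simply absent):

```python
from typing import Dict


def compare_arguments(e1: Dict[str, int], e2: Dict[str, int]) -> Dict[str, str]:
    """Classify each slot as same_chain, different_chain or missing (empty in one event)."""
    result = {}
    for role in sorted(set(e1) | set(e2)):
        c1, c2 = e1.get(role), e2.get(role)
        if c1 is None or c2 is None:
            result[role] = "missing"
        elif c1 == c2:
            result[role] = "same_chain"
        else:
            result[role] = "different_chain"
    return result


# CHAIN values copied from the a333 and a334 annotations above
a333 = {"ATTACKER": 5, "MEANS": 46, "PLACE": 14, "TARGET": 14, "TIME": 7}
a334 = {"ATTACKER": 53, "PLACE": 14, "TARGET": 14, "TIME": 52}

for role, relation in compare_arguments(a333, a334).items():
    print(f"{role}: {relation}")
```

Only PLACE and TARGET fall in the same chain; the ATTACKER fillers ("Zaire" vs. "the FAZ (Zaire army)") and the TIME fillers ("Monday" vs. "this morning") sit in different chains.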
Referential relations between arguments
104 chains considered:
22 (21.15%) contained only coreferential NPs
23 (22.12%) contained only non-coreferential NPs
9 chains ignored
50 (48.08%) contained a mixture of coreferential and
non-coreferential NPs
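The percentages follow directly from the counts over the 104 chains, and recomputing them is a quick consistency check (the four categories sum to 104). A small sketch, with the category labels paraphrased from the list above:

```python
from typing import Dict


def percentages(counts: Dict[str, int]) -> Dict[str, float]:
    """Share of each category in the total, in percent, rounded to two decimals."""
    total = sum(counts.values())
    return {label: round(100 * n / total, 2) for label, n in counts.items()}


counts = {
    "only coreferential NPs": 22,
    "only non-coreferential NPs": 23,
    "ignored": 9,
    "mixture": 50,
}
print(sum(counts.values()))  # 104
for label, pct in percentages(counts).items():
    print(f"{label}: {pct}%")
```

Rounded to two decimals this gives 21.15%, 22.12%, 8.65% for the ignored chains, and 48.08% for the mixed chains.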