NLP 2 Adv
Outline
Cohesion and Coherence
Ambiguity
Natural Language Generation
Cohesion & Coherence
Cohesion
Definition 1:
Cohesion is the grammatical and lexical relationship within a
text or sentence. It can be defined as the links
that hold a text together and give it meaning.
Cohesion exists when the links between sentences,
words and phrases are visible or easily
understandable.
Ref
http://en.wikipedia.org/wiki/Cohesion_(linguistics)
Cohesion
Definition 2:
Cohesion is a semantic relation between an
element in the text and some other element
that is crucial to the interpretation of it.
Ref
Halliday et al., 1976
Example:
Ahmad belongs to Peshawar. He is in the final year
of his M.Sc.
Here "He" can only be interpreted by linking it to "Ahmad".
Coherence
Definition 1:
A quality of sentences, paragraphs, and essays
when all parts are clearly connected.
Coherence is when the theme or the main idea
of the essay or writing piece is understandable.
A text has coherence if its constituent
sentences follow on one from the other in an
orderly fashion so that the reader can make
sense of the entire text.
Ref: http://grammar.about.com/od/c/g/coherenceterm.htm (retrieved 30 Oct, 2010)
Coherence
Example 1:
There once was a farmer in a small village.
He worked hard day and night in his fields
to feed his wife and children.
Example 2:
Ali was a studious student and got 900
marks. His percentage is 92.1%.
Coherence
Definition 2:
Coherence is a semantic property of discourse
formed through the interpretation of each
individual sentence relative to the interpretation
of other sentences, with "interpretation" implying
interaction between the text and the reader.
Ref: Teun A. van Dijk, p. 93
http://www.criticism.com/da/coherence.php
Coherence
Definition 3:
Coherence in linguistics is what makes a text semantically
meaningful.
It is especially dealt with in text linguistics.
http://en.wikipedia.org/wiki/Coherence_(linguistics), retrieval date: 27 Oct, 2010
Definition 4:
When sentences, ideas, and details fit together clearly,
readers can follow along easily, and the writing is coherent.
http://home.ku.edu.tr/~doregan/Writing/Cohesion.html, Retrieved 30 Oct, 2010
Why coherence?
The text-based features which provide cohesion in a
text do not necessarily help achieve coherence, that
is, they do not always contribute to the
meaningfulness of a text, be it written or spoken.
It has been stated that a text coheres only if the world
around is also coherent.
Cohesive devices
Definition:
The links within the text that hold it
together are called cohesive devices.
Categories of cohesive devices
A cohesive text is created in many different ways. In
Cohesion in English, M.A.K. Halliday and Ruqaiya
Hasan (1976) identify five general categories of
cohesive devices that create coherence in texts:
Reference,
Ellipsis,
Substitution,
Conjunction,
Lexical cohesion.
Ref: http://www.criticism.com/da/coherence.php
References or referring expressions
For example:
1) John helped Mary.
2) He was kind.
John: Referent / Correlate
He: Anaphor / Anaphoric Device (AD)
Anaphoric and Cataphoric Devices
The referring elements (pronouns) in a text that refer backward to their
corresponding referents (nouns) are called anaphoric
devices, or anaphors.
For example:
Bell is a powerful player but unfortunately he will not take part in the
trophy due to injury.
Anaphoric device: he
The referring elements (pronouns) in a text that refer forward to their
corresponding referents (nouns) are called cataphoric devices,
or cataphors.
For example:
As her father went abroad, Nighat took control of the organization by
herself.
Cataphoric device: her
Antecedent
The referent in an anaphoric or cataphoric text to which the
anaphoric or cataphoric device refers is called the antecedent.
It is also called the correlate.
For example:
Bell is a powerful player but unfortunately he will not
take part in the trophy due to injury.
Antecedent: Bell
As her father went abroad, Nighat took control of the
organization by herself.
Antecedent: Nighat
Types of Anaphora (On the basis of position of
anaphor and its antecedent)
Intra-sentential/Sentence internal anaphora:
The anaphora in which the AD and its antecedent both
occur in the same sentence is called sentence internal.
Reflexive pronouns
(himself, herself, itself, themselves) are typical examples of intra-sentential anaphora.
Possessive pronouns
(his, her, hers, its, their, theirs) can often be used as intra-sentential
anaphors too, and are often in the same clause as their antecedent.
For example:
[John]1 took [his]1 [hat]2 off and hung [it]2 on a peg.
Types of Anaphora (Cont..)
Inter-sentential/Sentence external anaphora:
The anaphora in which the AD and its antecedent
do not occur in the same sentence is called
sentence external or inter-sentential anaphora.
For example:
[Jehansher]1 Khan was a senior squash player. [He]1 has
won several trophies.
[John]1 took his hat off and hung it on a peg. [He]1 was very
tired and therefore went to sleep.
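The positional distinction above can be sketched in a few lines of Python. This is only an illustration: the Mention record, its fields, and the function names are hypothetical, and mention positions are assumed to be given rather than detected.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mention:
    """A mention located by sentence index and token index (both 0-based)."""
    text: str
    sentence: int
    token: int


def anaphora_type(device: Mention, antecedent: Mention) -> str:
    """Same sentence -> intra-sentential, otherwise inter-sentential."""
    if device.sentence == antecedent.sentence:
        return "intra-sentential"
    return "inter-sentential"


def direction(device: Mention, antecedent: Mention) -> str:
    """Antecedent before the device -> anaphoric, after it -> cataphoric."""
    before = (antecedent.sentence, antecedent.token) < (device.sentence, device.token)
    return "anaphoric" if before else "cataphoric"


# "John took his hat off and hung it on a peg. He was very tired ..."
john = Mention("John", sentence=0, token=0)
his = Mention("his", sentence=0, token=2)
he = Mention("He", sentence=1, token=0)

print(anaphora_type(his, john))  # same sentence
print(anaphora_type(he, john))   # different sentences
```

The direction helper covers the anaphoric versus cataphoric distinction from the earlier slides with the same position data.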
Types of Anaphora
(On the basis of grammatical category of the anaphor and antecedent)
is in love)
B)
Jane told Mary she was in danger. (ambiguous)
Jane warned Mary she was in danger.
Anaphora resolution
Anaphora resolution is the problem of determining what a
pronoun or a noun phrase refers to.
Consider the following Discourse:
Been to Karachi
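Anaphora resolution as defined above can be illustrated with a deliberately naive resolver. This is a minimal sketch and not a method from these slides: it links each pronoun to the most recent preceding name of matching gender. The GENDER lexicon, the PRONOUN_GENDER table, and the function name resolve_pronouns are all assumptions made for the example.

```python
import re

# Hypothetical toy lexicon (assumption): gender of the names used in the examples.
GENDER = {"John": "m", "Bell": "m", "Mary": "f", "Jane": "f"}
PRONOUN_GENDER = {"he": "m", "him": "m", "his": "m", "she": "f", "her": "f"}


def resolve_pronouns(text):
    """Return (pronoun, antecedent) pairs using a recency heuristic.

    Each pronoun is linked to the most recent preceding name whose gender
    matches; the antecedent is None when no candidate has been seen.
    """
    links = []
    seen = []  # names encountered so far, in textual order
    for token in re.findall(r"[A-Za-z]+", text):
        if token in GENDER:
            seen.append(token)
        elif token.lower() in PRONOUN_GENDER:
            wanted = PRONOUN_GENDER[token.lower()]
            candidates = [name for name in seen if GENDER[name] == wanted]
            links.append((token, candidates[-1] if candidates else None))
    return links


print(resolve_pronouns("John helped Mary. He was kind."))
print(resolve_pronouns("Jane told Mary she was in danger."))
```

In the second call the heuristic picks "Mary" simply because it is the nearest candidate, although the sentence is genuinely ambiguous. Recency and agreement alone cannot settle such cases.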
Origin of the word ellipsis
Derived from Greek, the word 'ellipsis' means
“the omission of words that could be
understood from the context”.
Ellipsis is the non-expression of one or more
sentence elements whose meaning can be
reconstructed either from the context or from a
person’s knowledge of the world.
e-clause & a-clause
e-clause:
The clause from which the material is missing is often
referred to as the elliptical clause (e-clause).
Example:
He is rich, but his brother is not ᶲ. (e-clause: "his brother is not ᶲ")
a-clause:
The clause from which the interpretation of the missing
material is derived is referred to as antecedent clause (a-
clause).
Example:
He is rich, but his brother is not ᶲ. (a-clause: "He is rich")
Types of ellipsis
Some of the types of ellipsis are:
Noun Phrase Ellipsis
Verb Phrase Ellipsis
Gapping
Stripping
Sluicing
Ellipsis in WH-Constructions
Ellipsis in Q-Constructions
Ellipsis resolution
Ellipsis resolution is an active area of research.
All natural languages exhibit ellipsis in
text as well as speech,
although its forms differ from language to language.
Linguists follow different approaches to its
resolution, as described by Shalom
Lappin (Lappin and Lease).
Examples of ellipsis resolution
He is rich, but his brother is not ᶲ.
(= He is rich but his brother is not rich).
Bob ᶲ and Tom ate cheese.
(= Bob (ate cheese) and Tom ate cheese).
John realizes that he is a fool, but Bill does not ᶲ,
even though his wife does ᶲ.
(= John realizes that John is a fool, but Bill does not
realize that Bill is a fool, even though Bill's wife
realizes that Bill is a fool.)
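The first example above can be resolved mechanically by copying the predicate from the a-clause into the e-clause. The sketch below handles only the pattern "X is P, but Y is not ᶲ" and is not the approach described by Lappin. The function name resolve_copula_ellipsis and the regular expression are assumptions made for illustration.

```python
import re

# Matches "X is P, but Y is not ᶲ." where ᶲ marks the elided material.
PATTERN = re.compile(
    r"^(?P<subj>.+?) is (?P<pred>[^,]+), but (?P<subj2>.+?) is not\s*ᶲ?\s*\.$"
)


def resolve_copula_ellipsis(sentence: str) -> str:
    """Fill the e-clause gap with the predicate of the a-clause.

    Sentences that do not fit the pattern are returned unchanged.
    """
    match = PATTERN.match(sentence)
    if match is None:
        return sentence
    subj, pred, subj2 = match["subj"], match["pred"], match["subj2"]
    return f"{subj} is {pred}, but {subj2} is not {pred}."


print(resolve_copula_ellipsis("He is rich, but his brother is not ᶲ."))
```

The third example above shows why pattern copying is not enough in general: the pronouns "he" and "his" must also be re-resolved inside the reconstructed clauses.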
Substitution
Substitution is very similar to ellipsis in the effect
it has on the text. It occurs when, instead of
leaving a word or phrase out as in ellipsis, the writer
replaces it with another, more general word.
Example:
"Which ice-cream would you like?"
Lappin, S., “A Sequenced Model of Anaphora and Ellipsis Resolution”, 2003 .
Ambiguity
Natural languages are inherently ambiguous.
[McDonald 1992]
What is NLG? (Ingredients of NLG)
Goal:
Computer software which produces understandable and
appropriate texts in English or other human languages
Input:
Some underlying non-linguistic representation of information
Output:
Documents, reports, explanations, help messages, and other
kinds of texts
Knowledge sources required:
Knowledge of target language and of the domain
Example System #1: FoG
Function:
Produces textual weather reports in English and French
Input:
Graphical/numerical weather depiction
User:
Environment Canada (Canadian Weather Service)
Developer:
CoGenTex
Status:
Fielded, in operational use since 1992
FoG: Input
FoG: Output
Example System #2: PlanDoc
Function:
Produces a report describing the simulation options that an engineer
has explored
Input:
A simulation log file
User:
Southwestern Bell Telephone Company (Texas)
Developer:
Bellcore and Columbia University
Status:
Fielded, in operational use since 1996
PlanDoc: Input
RUNID fiberall FIBER 6/19/93 act yes
FA 1301 2 1995
FA 1201 2 1995
FA 1401 2 1995
FA 1501 2 1995
ANF co 1103 2 1995 48
ANF 1201 1301 2 1995 24
ANF 1401 1501 2 1995 24
END. 856.0 670.2
PlanDoc: Output
This saved fiber refinement includes all DLC
changes in Run-ID ALLDLC. RUN-ID FIBERALL
demanded that PLAN activate fiber for CSAs 1201,
1301, 1401 and 1501 in 1995 Q2. It requested
the placement of a 48-fiber cable from the CO to
section 1103 and the placement of 24-fiber cables
from section 1201 to section 1301 and from
section 1401 to section 1501 in the second quarter
of 1995. For this refinement, the resulting 20 year
route PWE was $856.00K, a $64.11K savings over the
BASE plan and the resulting 5 year IFC was $670.20K,
a $60.55K savings over the BASE plan.
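The mapping from the PlanDoc log to the second output sentence can be sketched as follows. The field layout "FA <CSA> <quarter> <year>" is an assumption inferred by comparing the sample input with the sample output, not documented PlanDoc behaviour, and the function names are hypothetical.

```python
from collections import defaultdict

SAMPLE_LOG = """RUNID fiberall FIBER 6/19/93 act yes
FA 1301 2 1995
FA 1201 2 1995
FA 1401 2 1995
FA 1501 2 1995
ANF co 1103 2 1995 48
END. 856.0 670.2"""


def join_list(items):
    """Join items as 'a, b and c'."""
    if len(items) == 1:
        return items[0]
    return ", ".join(items[:-1]) + " and " + items[-1]


def fiber_activation_sentences(log_text):
    """Group FA lines by (year, quarter) and emit one sentence per group."""
    groups = defaultdict(list)
    for line in log_text.splitlines():
        fields = line.split()
        # Assumed layout: FA <CSA> <quarter> <year>
        if len(fields) == 4 and fields[0] == "FA":
            _, csa, quarter, year = fields
            groups[(year, quarter)].append(csa)
    sentences = []
    for (year, quarter), csas in sorted(groups.items()):
        label = "CSA" if len(csas) == 1 else "CSAs"
        sentences.append(
            f"PLAN activates fiber for {label} {join_list(sorted(csas))} "
            f"in {year} Q{quarter}."
        )
    return sentences


print(fiber_activation_sentences(SAMPLE_LOG))
```

Grouping four log lines into one sentence is already a small instance of aggregation: the report mentions the quarter once instead of four times.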
Example System #3: STOP
Function:
Produces a personalized smoking-cessation leaflet
Input:
Questionnaire about smoking attitudes, beliefs, history
User:
NHS (British Health Service)
Developer:
University of Aberdeen
Status:
Undergoing clinical evaluation to determine its effectiveness
STOP: Input
SMOKING QUESTIONNAIRE
Please answer by marking the most appropriate box for each question like this:
Please read the questions carefully. If you are not sure how to answer, just give the best answer you can.
Q2 Home situation:
Live alone | Live with husband/wife/partner | Live with other adults | Live with children
Q4 Does anyone else in your household smoke? (If so, please mark all boxes which apply)
husband/wife/partner other family member others
TEMSIS: Output Summary
Le 21/7/1998 à la station de mesure de
Völklingen -City, la valeur moyenne maximale
d'une demi-heure (Halbstundenmittelwert) pour
l'ozone atteignait 104.0 µg/m³. Par conséquent,
selon le decret MIK (MIK-Verordnung), la valeur
limite autorisée de 120 µg/m³ n'a pas été
dépassée.
Der höchste Halbstundenmittelwert für Ozon an
der Meßstation Völklingen -City erreichte am
21. 7. 1998 104.0 µg/m³, womit der gesetzlich
zulässige Grenzwert nach MIK-Verordnung von 120
µg/m³ nicht überschritten wurde.
Types of NLG Applications
Automated document production
weather forecasts, simulation reports, letters, ...
Presentation of information to people in an
understandable fashion
medical records, expert system reasoning, ...
Teaching
information for students in CAL systems
Entertainment
jokes (?), stories (??), poetry (???)
An Architecture for Generation
[Figure: generation pipeline; only the final stage is recoverable]
sentence plans -> Linguistic Realizer (6. Syntax, Morphology, Orthography) -> surface text
Text Plans
Common representation : tree
Leaf nodes = messages
Internal nodes = message groupings
Simple text plans: templates OK
Complex text plans: require full representation
language
(e.g., TAMERLAN, DIOGENES)
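The tree representation described above can be sketched with two small classes. This is an illustrative data structure only and not the TAMERLAN or DIOGENES representation. The class names Message and Group and the function linearize are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Message:
    """Leaf node: one unit of content to be expressed."""
    content: str


@dataclass
class Group:
    """Internal node: a labelled grouping of messages or sub-groups."""
    label: str
    children: List[Union["Group", Message]] = field(default_factory=list)


def linearize(node):
    """Depth-first, left-to-right traversal giving the order of messages."""
    if isinstance(node, Message):
        return [node.content]
    ordered = []
    for child in node.children:
        ordered.extend(linearize(child))
    return ordered


plan = Group("summary-then-details", [
    Message("There are 20 trains a day from Aberdeen to Glasgow"),
    Group("details", [
        Message("The next train is the express"),
        Message("It leaves at 10am"),
    ]),
])

print(linearize(plan))
```

The left-to-right order of the leaves fixes the order in which messages appear in the text, while the internal nodes record which messages belong together.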
Sentence Plans
Simple: templates (select & fill)
Complex: abstract representation
(SPL: Sentence Planning Language)
Example SPL Expression
(S1/exist
  :object (O1/train
    :cardinality 20
    :relations ((R1/period
                  :value daily)
                (R2/source
                  :value Aberdeen)
                (R3/destination
                  :value Glasgow))))
There are 20 trains a day from Aberdeen to Glasgow
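How such an expression might be turned into the sentence above can be sketched with a toy realizer. The dictionary encoding of the SPL expression, the PERIOD_PHRASE table, and the function realize_exist are assumptions for illustration. A real realizer would use a grammar rather than a single format string.

```python
# Hypothetical dictionary encoding of the SPL expression shown above.
SPL = {
    "type": "exist",
    "object": {
        "type": "train",
        "cardinality": 20,
        "relations": {
            "period": "daily",
            "source": "Aberdeen",
            "destination": "Glasgow",
        },
    },
}

# Assumed wording for period values.
PERIOD_PHRASE = {"daily": "a day", "weekly": "a week"}


def realize_exist(spl):
    """Realize an 'exist' sentence plan as an English existential sentence."""
    obj = spl["object"]
    count = obj["cardinality"]
    singular = count == 1
    verb = "is" if singular else "are"               # subject-verb agreement
    noun = obj["type"] if singular else obj["type"] + "s"  # naive plural
    rel = obj["relations"]
    return (
        f"There {verb} {count} {noun} {PERIOD_PHRASE[rel['period']]} "
        f"from {rel['source']} to {rel['destination']}"
    )


print(realize_exist(SPL))
```

Even this tiny case needs morphology (the plural "trains") and agreement ("are"), which is the work assigned to the realizer stage.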
Content Determination
Messages (raw content)
User Model (influences content)
Is Reasoning Required?
Find a train from Aberdeen to Leeds
(It requires two trains to get there)
Deep Reasoning Systems
represent the user’s goals as well as any
immediate query
utilize plan recognition & reasoning
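The Aberdeen to Leeds query shows why content determination may need reasoning: no single message answers it. A minimal sketch using breadth-first search over a timetable follows. The DIRECT table, including the connection through Edinburgh, is invented for the example, and the function names are hypothetical.

```python
from collections import deque

# Hypothetical direct services (assumption: not taken from the slides).
DIRECT = {
    "Aberdeen": ["Glasgow", "Edinburgh"],
    "Edinburgh": ["Leeds"],
    "Glasgow": [],
    "Leeds": [],
}


def find_route(start, goal):
    """Breadth-first search: returns a route with the fewest trains, or None."""
    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nxt in DIRECT.get(path[-1], []):
            if nxt not in visited:
                visited.add(nxt)
                queue.append(path + [nxt])
    return None


def describe(route):
    """Turn a route into the content that should be communicated."""
    if route is None:
        return "No route found."
    trains = len(route) - 1
    if trains == 1:
        return f"There is a direct train from {route[0]} to {route[-1]}."
    changes = " and ".join(route[1:-1])
    return f"It requires {trains} trains to get there, changing at {changes}."


print(describe(find_route("Aberdeen", "Leeds")))
```

The search result, not the raw timetable, becomes the message handed to the later planning stages.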
Discourse Planning
Structure messages into a coherent text
Example: start with a summary, then give
details
Discourse relations, e.g.:
elaboration: More specifically, X
exemplification: For example, X
contrast / exception: However, X
Rhetorical Structure Theory (RST)
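The cue phrases listed above can be attached to messages with a simple lookup. This is a minimal sketch: the CUES table copies the three phrases from the list, while the function connect and its argument names are hypothetical. It does not implement RST itself.

```python
# Cue phrases taken from the list above, keyed by discourse relation.
CUES = {
    "elaboration": "More specifically,",
    "exemplification": "For example,",
    "contrast": "However,",
}


def connect(first, relation, second):
    """Join two sentences, marking the relation of the second to the first."""
    return f"{first} {CUES[relation]} {second}"


print(connect(
    "There are 20 trains a day from Aberdeen to Glasgow.",
    "exemplification",
    "the next train leaves at 10am.",
))
```

An explicit cue phrase makes the relation between the two messages visible to the reader, which is how discourse planning contributes to cohesion.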
Sentence Aggregation
No aggregation (1 sentence / message)
Relative Clause
..which leaves at 10am
Conjunction
..and the next train is the express
Combinations
..and the next train is the express
which leaves at 10am
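The aggregation options above can be sketched as small string-combining functions. This illustration works on finished strings for brevity, whereas a real sentence planner would combine abstract sentence plans. The function names and the lead clause used in the example are assumptions.

```python
def no_aggregation(messages):
    """One sentence per message."""
    return " ".join(message[0].upper() + message[1:] + "." for message in messages)


def relative_clause(main, modifier):
    """Attach a property as a relative clause: '<main> which <modifier>'."""
    return f"{main} which {modifier}"


def conjunction(first, second):
    """Coordinate two clauses with 'and'."""
    return f"{first} and {second}"


lead = "there are 20 trains a day from Aberdeen to Glasgow"  # assumed lead clause
express = "the next train is the express"
departure = "leaves at 10am"

print(no_aggregation([lead, express]))
print(conjunction(lead, relative_clause(express, departure)))
```

The last call combines both devices and reproduces the "Combinations" pattern: a conjunction whose second clause carries a relative clause.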
Lexicalization
Choosing words to realize concepts or
relations
Example:
(action/change
(measure outside_temperature)
(delta (quantity/deg_F -10)))
(*A-INGEST
(AGENT *O-BOB)
(PATIENT *O-CHOCOLATE)) => "eat"
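The *A-INGEST example above can be sketched as a lookup that inspects the frame before choosing a word. Only the mapping to "eat" comes from the slide. The LIQUIDS set, the "drink" alternative, and the function lexicalize are assumptions added to show why word choice may depend on the arguments.

```python
# Hypothetical set of liquid objects (assumption, not from the slides).
LIQUIDS = {"*O-MILK", "*O-WATER"}


def lexicalize(frame):
    """Choose a verb for a semantic frame given as a dict.

    *A-INGEST is realized as "eat" (as on the slide) unless the patient
    is a known liquid, in which case "drink" is chosen (an assumption).
    """
    if frame["action"] == "*A-INGEST":
        return "drink" if frame["patient"] in LIQUIDS else "eat"
    raise ValueError(f"no lexical entry for {frame['action']}")


frame = {"action": "*A-INGEST", "agent": "*O-BOB", "patient": "*O-CHOCOLATE"}
print(lexicalize(frame))
```

The same concept can map to different words depending on context, so lexicalization is a choice made at generation time rather than a fixed dictionary lookup.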
Case Creation
• Additional structure is required to
realize the meaning of the
semantic representation
(*A-KICK
(AGENT *O-JOHN)
(PATIENT *O-BALL))