
Natural-language generation

From Wikipedia, the free encyclopedia


Natural-language generation (NLG) is the natural-language processing task of
generating natural language from a machine-representation system such as a
knowledge base or a logical form. Psycholinguists prefer the term language
production when such formal representations are interpreted as models for mental
representations.

An NLG system can be thought of as a translator that converts data into a
natural-language representation. However, the methods to produce the final language
are different from those of a compiler due to the inherent expressivity of natural
languages. NLG has existed for a long time but commercial NLG technology has only
recently become widely available.

NLG may be viewed as the opposite of natural-language understanding: whereas in
natural-language understanding, the system needs to disambiguate the input sentence
to produce the machine representation language, in NLG the system needs to make
decisions about how to put a concept into words.

Systems that generate form letters are a simple example. These do not typically
involve grammar rules, but may generate a letter to a consumer, e.g. stating that a
credit card spending limit was reached. To put it another way, simple systems use a
template not unlike a Word document mail merge, but more complex NLG systems
dynamically create text. As in other areas of natural-language processing, this can
be done using either explicit models of language (e.g., grammars) and the domain,
or using statistical models derived by analysing human-written texts[1].
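
The template approach described above can be illustrated with a minimal sketch in Python. The letter wording and the field names (name, last4, limit) are hypothetical and chosen only for illustration; they are not taken from any particular system.

```python
from string import Template

# Hypothetical form-letter template; the placeholders are illustrative assumptions.
LETTER = Template(
    "Dear $name,\n"
    "Your credit card ending in $last4 has reached its spending limit of $limit."
)


def render_letter(name: str, last4: str, limit: str) -> str:
    """Fill the fixed template with customer data, as a mail merge would."""
    return LETTER.substitute(name=name, last4=last4, limit=limit)


if __name__ == "__main__":
    print(render_letter("Ms. Smith", "1234", "2,000 GBP"))
```

No grammar rules are involved here: the wording is fixed in advance and only the slots change, which is what distinguishes such systems from NLG systems that create text dynamically.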

Contents
1 Example
2 Stages
3 Applications
4 Evaluation
5 See also
6 References
7 Further reading
8 External links
Example
The Pollen Forecast for Scotland system[2] is an example of a simple NLG
system that could essentially be a template. This system takes as input six
numbers, which give predicted pollen levels in different parts of Scotland. From
these numbers, the system generates a short textual summary of pollen levels as its
output.

For example, using the historical data for 1 July 2005, the software produces:

Grass pollen levels for Friday have increased from the moderate to high levels of
yesterday with values of around 6 to 7 across most parts of the country. However,
in Northern areas, pollen levels will be moderate with values of 4.

In contrast, the actual forecast (written by a human meteorologist) from this data
was:

Pollen counts are expected to remain high at level 6 over most of Scotland, and
even level 7 in the south east. The only relief is in the Northern Isles and far
northeast of mainland Scotland with medium levels of pollen count.

Comparing these two illustrates some of the choices that NLG systems must make;
these are further discussed below.
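
A rough sketch of how a template-like system of this kind might work is given below in Python. It is not the actual Pollen Forecast for Scotland implementation: the region names, the numeric thresholds for "low", "moderate" and "high", and the sentence wording are all assumptions made for illustration.

```python
from collections import Counter


def level_word(value: int) -> str:
    """Map a numeric pollen level to a word (thresholds are assumptions)."""
    if value >= 6:
        return "high"
    if value >= 4:
        return "moderate"
    return "low"


def summarise(levels: dict[str, int]) -> str:
    """Describe the most common level first, then list the exceptions."""
    words = {region: level_word(value) for region, value in levels.items()}
    majority = Counter(words.values()).most_common(1)[0][0]

    # Numeric values observed in the regions that share the majority level.
    main_values = sorted({v for r, v in levels.items() if words[r] == majority})
    if len(main_values) == 1:
        value_text = f"values of {main_values[0]}"
    else:
        value_text = f"values of around {main_values[0]} to {main_values[-1]}"

    sentences = [
        f"Pollen levels will be {majority} with {value_text} "
        "across most parts of the country."
    ]
    exceptions = [(r, v) for r, v in levels.items() if words[r] != majority]
    for i, (region, value) in enumerate(exceptions):
        opener = "However, in" if i == 0 else "In"
        sentences.append(
            f"{opener} {region}, pollen levels will be {words[region]} "
            f"with values of {value}."
        )
    return " ".join(sentences)


if __name__ == "__main__":
    # Six hypothetical regional readings.
    print(summarise({
        "the north": 4, "the north west": 6, "the central belt": 6,
        "the north east": 6, "the south west": 7, "the south east": 7,
    }))
```

Even this small sketch has to decide which regions to single out and which word to use for each level, which are the kinds of choices mentioned above.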

Stages
The process to generate text can be as simple as keeping a list of canned text that
is copied and pasted, possibly linked with some glue text. The results may be
satisfactory in simple domains such as horoscope machines or generators of
personalised business letters. However, a sophisticated NLG system needs to include
stages of planning and merging of information to enable the generation of text that
looks natural and does not become repetitive. The typical stages of natural-
language generation, as proposed by Dale and Reiter,[3] are:

Content determination: Deciding what information to mention in the text. For
instance, in the pollen example above, deciding whether to explicitly mention that
pollen level is 7 in the south east.

Document structuring: Overall organisation of the information to convey. For
example, deciding to describe the areas with high pollen levels first, instead of
the areas with low pollen levels.

Aggregation: Merging of similar sentences to improve readability and naturalness.
For instance, merging the two following sentences:

Grass pollen levels for Friday have increased from the moderate to high levels of
yesterday

and

Grass pollen levels will be around 6 to 7 across most parts of the country

into the following single sentence:

Grass pollen levels for Friday have increased from the moderate to high levels of
yesterday with values of around 6 to 7 across most parts of the country.

Lexical choice: Putting words to the concepts. For example, deciding whether "medium"
or "moderate" should be used when describing a pollen level of 4.

Referring expression generation: Creating referring expressions that identify
objects and regions. For example, deciding to use "in the Northern Isles and far
northeast of mainland Scotland" to refer to a certain region in Scotland. This task
also includes making decisions about pronouns and other types of anaphora.

Realization: Creating the actual text, which should be correct according to the
rules of syntax, morphology, and orthography. For example, using "will be" for the
future tense of "to be".
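
The stages listed above can be sketched as a toy pipeline in Python, with one function per stage. This is only an illustration under stated assumptions, not the architecture of any real system: the Message type, the function names, the level thresholds and the sentence pattern are all hypothetical.

```python
from dataclasses import dataclass
from itertools import groupby


@dataclass(frozen=True)
class Message:
    region: str
    level: int


def determine_content(readings: dict[str, int]) -> list[Message]:
    # Content determination: this sketch simply mentions every reading.
    return [Message(region, level) for region, level in readings.items()]


def structure(messages: list[Message]) -> list[Message]:
    # Document structuring: describe the highest levels first.
    return sorted(messages, key=lambda m: m.level, reverse=True)


def aggregate(messages: list[Message]) -> list[tuple[int, list[str]]]:
    # Aggregation: merge adjacent messages that report the same level.
    return [
        (level, [m.region for m in group])
        for level, group in groupby(messages, key=lambda m: m.level)
    ]


def lexicalise(level: int) -> str:
    # Lexical choice: pick a word for the level (thresholds are assumptions).
    if level >= 6:
        return "high"
    if level >= 4:
        return "moderate"
    return "low"


def refer(regions: list[str]) -> str:
    # Referring expression generation: name one or more regions.
    if len(regions) == 1:
        return f"the {regions[0]}"
    return "the " + ", ".join(regions[:-1]) + " and " + regions[-1]


def realise(level: int, regions: list[str]) -> str:
    # Realization: produce a sentence, using "will be" for the future tense.
    return f"Pollen levels will be {lexicalise(level)} ({level}) in {refer(regions)}."


def generate(readings: dict[str, int]) -> str:
    groups = aggregate(structure(determine_content(readings)))
    return " ".join(realise(level, regions) for level, regions in groups)


if __name__ == "__main__":
    print(generate({"north": 4, "south east": 7, "south west": 7}))
```

Keeping each stage in its own function makes the individual decisions visible; a real system would replace each of these one-line rules with considerably more elaborate logic.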

An alternative approach to NLG is to use "end-to-end" machine learning to build a
system, without having separate stages as above[4]. In other words, we build an
NLG system by training a machine learning algorithm (often an LSTM) on a large data
set of input data and corresponding (human-written) output texts. The end-to-end
approach has perhaps been most successful in image captioning[5], that is,
automatically generating a textual caption for an image.

Applications
The popular media has paid the most attention to NLG systems which generate jokes
(see computational humor), but from a commercial perspective, the most successful
NLG applications have been data-to-text systems which generate textual summaries of
databases and data sets; these systems usually perform data analysis as well as
text generation. Research has shown that textual summaries can be more effective
than graphs and other visuals for decision support[6] [7] [8], and that computer-
generated texts can be superior (from the reader's perspective) to human-written
texts [9].

The first commercial data-to-text systems produced weather forecasts from weather
data. The earliest such system to be deployed was FoG,[10] which was used by
Environment Canada to generate weather forecasts in French and English in the early
1990s. The success of FoG triggered other work, both research and commercial.
Recent applications include the UK Met Office's text-enhanced forecast.[11]

Currently there is considerable commercial interest in using NLG to summarise
financial and business data. Indeed, Gartner has said that NLG will become a
standard feature of 90% of modern BI and analytics platforms[12]. NLG is also being
used commercially in automated journalism, chatbots, generating product
descriptions for e-commerce sites, summarising medical records[13] [14], and
enhancing accessibility (for example by describing graphs and data sets to blind
people[15]).

An example of an interactive use of NLG is the WYSIWYM framework. It stands for
"What you see is what you meant" and allows users to see and manipulate the
continuously rendered view (NLG output) of an underlying formal language document
(NLG input), thereby editing the formal language without learning it.

Content generation systems assist human writers and make the writing process more
efficient and effective. A content generation tool based on web mining using search
engine APIs has been built.[16] The tool imitates the cut-and-paste writing
scenario where a writer forms their content from various search results. Relevance
verification is essential to filter out irrelevant search results; it is based on
matching the parse tree of a query with the parse trees of candidate answers.[17]
In an alternative approach, a high-level structure of human-authored text is used
to automatically build a template for a new topic for an automatically written
Wikipedia article.[18]

Several companies have been started since 2009 which build systems that transform
data into narrative using NLG and AI techniques. These include Phrasetech, Arria
NLG, Automated Insights, Narrative Science, Retresco, and Yseop.

Evaluation
As in other scientific fields, NLG researchers need to test how well their systems,
modules, and algorithms work. This is called evaluation. There are three basic
techniques for evaluating NLG systems:

Task-based (extrinsic) evaluation: give the generated text to a person, and assess
how well it helps them perform a task (or otherwise achieves its communicative
goal). For example, a system which generates summaries of medical data can be
evaluated by giving these summaries to doctors, and assessing whether the summaries
help doctors make better decisions.[14]

Human ratings: give the generated text to a person, and ask them to rate the
quality and usefulness of the text.

Metrics: compare generated texts to texts written by people from the same input
data, using an automatic metric such as BLEU.

The ultimate goal is to determine how useful NLG systems are at helping people,
which is what the first of the above techniques measures. However, task-based
evaluations are time-consuming and expensive, and can be difficult to carry out
(especially if they require subjects with specialised expertise, such as doctors).
Hence (as in other areas of NLP) task-based evaluations are the exception, not the
norm.
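
To give a sense of how an automatic metric compares a generated text with human-written references, the following Python sketch computes clipped ("modified") n-gram precision, which is one ingredient of BLEU. It is not a full BLEU implementation: the brevity penalty and the combination of several n-gram orders are omitted, tokenisation is plain whitespace splitting, and the function names are hypothetical.

```python
from collections import Counter


def ngrams(tokens: list[str], n: int) -> list[tuple[str, ...]]:
    """Return all contiguous n-grams of a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def modified_precision(candidate: str, references: list[str], n: int = 1) -> float:
    """Clipped n-gram precision of a candidate text against reference texts."""
    cand_counts = Counter(ngrams(candidate.split(), n))
    if not cand_counts:
        return 0.0

    # For each n-gram, the largest count seen in any single reference.
    max_ref: Counter = Counter()
    for ref in references:
        for gram, count in Counter(ngrams(ref.split(), n)).items():
            max_ref[gram] = max(max_ref[gram], count)

    # A candidate n-gram is credited at most as often as it occurs in a reference.
    clipped = sum(min(count, max_ref[gram]) for gram, count in cand_counts.items())
    return clipped / sum(cand_counts.values())


if __name__ == "__main__":
    print(modified_precision("the the the the", ["the cat is on the mat"]))
```

Clipping stops a candidate from scoring well simply by repeating a common word; a score of this kind measures overlap with the reference texts only, not whether the generated text helps a reader.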

Researchers have recently been assessing how well human ratings and metrics
correlate with (predict) task-based evaluations. Work is being conducted in the
context of Generation Challenges[19] shared-task events. Initial results suggest
that human ratings are much better than metrics in this regard. In other words,
human ratings usually do predict task-effectiveness at least to some degree
(although there are exceptions), while ratings produced by metrics often do not
predict task-effectiveness well. These results are preliminary. In any case, human
ratings are the most popular evaluation technique in NLG; this is in contrast to
machine translation, where metrics are widely used.
