
Machine Translation Post-editing

Oryslava Bryska
April 28th, 2021
Agenda today

1. Definition and key points of MT PE research. Paradigms of MT reception.
2. PE levels and guidelines. How MT works.
3. Recent developments. Future arrays of research in MT PE.

Key terms: Post-editing, Human-assisted MT paradigm in PE, Machine-assisted MT paradigm in PE, Cognitive effort, Temporal effort, Technical effort, Level of PE, Automatic PE, Interactive PE


Introduction
Definition and scope

"Post-editing of machine translation (MT) is now increasingly implemented in the human translation workflow after studies in both industry and academia have demonstrated the efficacy of this practice."

Lucas Nunes Vieira, 2017

The use of machine translation (MT) in the human translation workflow is now a common practice. A number of studies confirm the efficacy of post-editing in terms of greater quality and speed.
Post-editing defined
Evolution of the notion

Working definition (Allen, 2003): the term used for the correction of MT output by human linguists/editors.

The term has been commonly used in:
- subfields of natural language processing and MT, including automated error correction and optical character recognition;
- TM;
- controlled languages;
- a separate translation-related service with its own standard (ISO 2017).

Distinguishing factor of the process: editing of a pre-translated text rather than translation from scratch (Wagner 1985; Veale and Way 1997).

Previously defined as: editing, modification and/or correction of a pre-translated text that has been processed by an MT system from a source language into (a) target language(s).
Human-assisted MT vs Machine-assisted HT

Human-assisted MT:
- Passive activity, with editors closing the gap between defective MT outputs and high-quality translations;
- Often monolingual pre-editors and post-editors;
- Undesirable final step in MT application;
- Post-editors were viewed as 'human partners';
- Evoked negative perceptions of MT;
- Etc.

Machine-assisted HT:
- Humans are at the centre of translation production;
- MT is often used as a part of CAT tools (TMs and terminology resources);
- Achievements in terms of effort and quality;
- Etc.
Pioneer articles describe the different tasks, processes and profiles in post-editing (Vasconcellos and León 1985, Wagner 1985, 1987, Vasconcellos 1986, 1989, 1992, 1993, Senez 1998) and different levels of post-editing: rapid and conventional (Loffler-Laurain 1983, 1986). Jeffrey Allen (2003, 2010), Ana Guerberof (2009, 2013), Ke Hu and Patrick Cadwell (2016), Lucas Nunes Vieira (2017).

Major findings revolve around the following aspects:

• Significance of specifications to identify the extent to which MT output is acceptable (quality);
• Metrics on how much human effort is necessary to improve such imperfect texts (temporal, cognitive and technical);
• Productivity of PE as opposed to human translation from scratch:
 Post-editing practice demonstrates greater quality and speed (e.g. Plitt and Masselot 2010; Green, Heer, and Manning 2013);
 Research on different levels of post-editing as the result (e.g. Carl et al. 2011; Koponen 2012; Koponen et al. 2012; de Almeida 2013).
User acceptance of MT output: PE vs HT

 Bowker and Ehgoetz (2007) explore user acceptance of machine translation output among a group of 121 professors in the Arts Faculty at the University of Ottawa.
 Analysis of different documents according to speed, quality and cost.
 The results show that two thirds of the participants chose the post-edited option and one third the human translation. Still, evaluators chose different documents, showing that they had different preferences even within a broader category (post-edited versus human translation).
 The researchers also point out that language professional participants are more linguistically sensitive to language quality than those who are not language professionals, and that the latter might be more tolerant of linguistic errors.
Quality: Clarity, Accuracy and Style parameters

 Fiederer and O'Brien (2009) also examine the question of quality in machine-translated texts.
 Eleven raters evaluated sentences according to the Clarity, Accuracy and Style parameters:
 Equal scores for translated and post-edited sentences with regard to Clarity;
 Higher scores for post-edited sentences with regard to Accuracy;
 Higher scores for translated sentences in terms of Style.
 Further, raters chose primarily the translated sentences as their favourite sentences (63 percent of the sentences, as opposed to 37 percent for post-edited sentences).
Temporal effort

 Flournoy and Duran (2009) investigate whether post-editing MT output is faster than translating from scratch within the context of integrating MT in the Adobe production workflow.
 Two tests: a small pilot of 800-2,000 words and a second, larger project of around 200,000 words, using two MT engines: PROMT for Russian and Language Weaver for Spanish and French, trained with Adobe data and lexicons.
 MT quality and editing speed vary significantly between files as related to the quality of the source text (if the two tests are compared);
 the integration of a Globalization Management System (GMS) and MT requires thoughtful consideration before implementing MT in the regular translation process (otherwise benefits from translation memories might be overlooked);
 feedback from translators interferes with their speed and should, therefore, be compensated;
 post-editing requires senior and skilled translators, because novice translators trusted MT output more readily than senior ones, and this can lead, presumably, to a lower final quality.
Cognitive effort

 Sharon O'Brien has studied various aspects of post-editing: post-editors' profiles and course contents for post-editing (2002), and the correlation between post-editing effort and machine translatability, suggesting a new methodology for measuring source-text difficulty and cognitive effort (2006a).
 Research on eye-tracking and translation memory matches (2006b) serves as a methodology for recording a translator's interaction with translation technology and explains differences in cognitive effort with different translation memory match types.
 In fact, most uses of MT are similar to the uses of TMs.
 Four participants familiar with SDL Trados translated a text from English into German and French using different match types, including fuzzy matches and some matches from MT, with Systran as the engine:
 exact matches (100 percent matches) present the least cognitive effort;
 no matches demand the greatest load;
 as the fuzzy match value decreases, the cognitive load increases, and MT matches appear to be equivalent to 80-90 percent fuzzy matches in TMs in terms of cognitive load.
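As a rough illustration of how a TM-style fuzzy match percentage can be derived (commercial CAT tools use their own proprietary algorithms; this sketch simply uses Python's standard-library difflib similarity ratio, and the example segments are invented):

```python
from difflib import SequenceMatcher

def fuzzy_match_score(segment: str, tm_entry: str) -> int:
    """Return a TM-style match percentage between a new source
    segment and a stored translation-memory source segment."""
    ratio = SequenceMatcher(None, segment.lower(), tm_entry.lower()).ratio()
    return round(ratio * 100)

new_segment = "Click the Save button to store your changes."
tm_segment = "Click the Save button to keep your changes."
print(fuzzy_match_score(new_segment, tm_segment))  # a high fuzzy match, roughly in the 80-100% band
```

A score of 100 corresponds to an exact match (least cognitive effort, per O'Brien's findings), while decreasing scores correspond to fuzzy matches that demand increasing effort to repair.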
MT PE in a non-professional context

 García (2010) explores the use of machine translation and post-editing in a non-professional context.
 Two markers assessed the quality of the translation and judged the final results.
 Both markers rated the post-edited segments higher than those translated without MT, but they showed significant differences between them.
 In 2011, García set up a second phase of the previous study, with 14 students translating from English to Chinese.
 Regarding quality, post-edited texts scored higher than translations, but evaluators seemed to show a "big disparity" in their assessments of the translations.
Quality

 Carl et al. (2011) compare the post-editing experience in a group of translation students and professionals.
 The quality of the four translations (two manual and two post-edited) was evaluated by seven native Danish speakers (four were professional translators).
 Low level of agreement among reviewers when ranking the translations, both in terms of inter- and intra-coder agreement. Further, this low agreement suggests that the assessment of translation quality is "too difficult".
Quality: subjectivity

 Koehn (2012) reports on the experience of running evaluation assignments to measure machine translation quality using automatic and manual metrics.
 Low agreement among evaluators when assessing fluency and adequacy in output quality;
 subjectivity in the judgment of translations made by others, and possibly even of translators' own translations.
Levels and Guidelines
• Fit-for-purpose PE levels
• Criteria of PE evaluation

Approaches towards levels of PE
• An early report at the European Commission mentions a 'rapid' and a 'full' post-editing level, where the main differences between the two were the time spent on the task and the quality of the final product (Wagner 1985).
• 'Minimal' post-editing has also been mentioned as a fuzzy level in between 'rapid' and 'full' (Allen 2003).
• 'Minimal', 'light', 'moderate' and 'full' (van Egdom and Pluymaekers 2019).
• The Translation Automation User Society (TAUS) differentiated between two levels of expected quality: 'good enough quality' and 'human translation quality' (TAUS, 2016).
• The two most popularly used are 'light' and 'full' (according to the comparative study by Hu and Cadwell 2016).
Factors that determine the level of PE (Allen, 2003)

• the user/client requirements,
• the volume of documentation expected to be processed,
• the expectation with regard to the level of quality for reading the final draft of the translated product,
• the translation turn-around time,
• the use of the document with regard to the life expectancy and perishability of the information,
• the use of the final text, in the range from information gisting to publishable information.
Main approaches (Allen 2003)

Inbound approach:
- MT for acquisition or assimilation
- the process of translating to understand
- bypasses human intervention by presenting raw MT output text to readers
- users determine their own threshold of acceptance of MT output (no PE)
- a strictly minimal amount of corrections on documents that usually contain perishable information (rapid PE)

Outbound approach:
- MT for dissemination
- the process of translating to communicate
- applying to a raw translation the corrections appropriate for published documents aimed at many people (full PE)
- the only domain with non-post-edited or limited post-edited information is that of weather bulletins (the Météo system: 90-95% MT accuracy)
Mostly used classification of PE levels

 Even though approaches to post-editing levels are wide-ranging, the most popular levels are often referred to as just 'light' and 'full' post-editing (see Hu and Cadwell 2016).
 Influential guidelines published by the Translation Automation User Society (TAUS) refer to these levels based on two standards of expected target-text quality, namely:
 'good enough'
 'similar or equal to human translation' (Massardo et al. 2016: 17-18).
These two quality standards roughly correspond to 'light' and 'full' post-editing, respectively.
Guidelines for Light PE
Guidelines for Full PE

Case-study: ECTS (Wagner 1985)
Full PE guidelines:
Do retain as much of the raw translation as possible. Resist the temptation to delete and rewrite too much. Remember that many of the words you need are there somewhere, but probably in the wrong order.
Don't allow yourself to hesitate too long over any particular problem: put in a marker and go back to the problem later if necessary.
Don't worry if the style of the translation is repetitive or pedestrian: there is no need to change words simply for the sake of elegant variation.
Don't embark on time-consuming research. Use only rapid research aids (Eurodicautom, knowledgeable colleagues, specialised terminology). If a terminology problem is insoluble, bring it to the attention of the requester by putting a question mark in the margin.
Case-study: ECTS (cont)

 Do make changes only when they are absolutely necessary, i.e. correct
only words or phrases that are:
 a) nonsensical
 b) wrong
and, if there is enough time left,
 c) ambiguous.
Case study: General Motors

CASL (Controlled Automotive Service Language) project:
• Well-rounded case of establishing and using documentation for PE
• The use of the Society of Automotive Engineers (SAE) J2450 standard metric for translation quality
• Several prioritized categories of errors rated as unacceptable in translated texts:
 wrong terms,
 syntactic errors,
 omissions,
 word-structure or agreement errors,
 misspelling,
 punctuation errors,
 miscellaneous errors
• Does not address stylistic considerations
• Identifying and correcting errors of the above-mentioned categories (minimal level of PE)
• Weights for each type of error (serious and minor)
Case-study: Pan-American Health Organization

Procedure of PE staff training:
• Post-editors receive post-editing 'macros'
• Basic guidance about how to take advantage of the raw MT output text
• How to avoid extensive reordering of concepts
• How to respect phrases enclosed in 'reliability marks' in the output
• How to deal with context-sensitive alternate translations
Case study: Microsoft (Groves and Schmidtke, 2009)

 The source text, the raw machine output and the post-edited text are used for the analysis.
 Microsoft reports improvements in the quality of the MT and related productivity increases from 5-10 percent to 10-20 percent for certain languages, although they signal variations in post-editing productivity for the same language depending on project, product, different file deliveries of the same project, and between different translators.
 Translators report issues related to terminology, grammar, and incorrect handling of mark-up and formatting (tagging).
 To analyze the post-editing patterns, two data sets are used: English into German and English into French. Using their own edit distance techniques (edit distance being the number of modifications a human editor is required to make to a system translation so that the resulting edited translation counts as accurate), they find that for French the edit distance is 5.60 whereas the German score is 8.81, indicating a greater post-editing effort for German.
 The most common types of edits are deletion and insertion of function words (especially determiners), as well as edits in punctuation, especially actions related to inserting or deleting commas.
 They also give a detailed report on a structure-based comparison for each language.
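Groves and Schmidtke used their own edit-distance techniques; the general idea can be sketched with a standard word-level Levenshtein distance (the function and example sentences below are illustrative, not Microsoft's actual metric):

```python
def edit_distance(raw_mt: list[str], post_edited: list[str]) -> int:
    """Word-level Levenshtein distance: the minimum number of word
    insertions, deletions and substitutions needed to turn the raw
    MT output into the post-edited version."""
    m, n = len(raw_mt), len(post_edited)
    # dp[i][j] = distance between the first i raw words and first j edited words
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        dp[i][0] = i
    for j in range(n + 1):
        dp[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if raw_mt[i - 1] == post_edited[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,         # deletion
                           dp[i][j - 1] + 1,         # insertion
                           dp[i - 1][j - 1] + cost)  # substitution
    return dp[m][n]

raw = "the machine translation output is edit".split()
fixed = "the machine translation output is edited".split()
print(edit_distance(raw, fixed))  # 1: one word substituted
```

Averaging such distances over many segments gives per-language scores like the 5.60 (French) vs 8.81 (German) figures above, where a higher score means more post-editing effort.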
Recent Developments
• Static vs interactive mode in PE
• Automatic MT PE
• NMT PE
• MT Literary texts PE
Static vs interactive PE modes

Static Mode:
• MT output is generated first and then edited statically as a separate step.

Interactive PE Mode:
• Translators interact with MT systems while the final version of the text is generated;
• MT can be used to predict and complete the HT as it is typed;
• The MT system reacts to, while also learning from, the human edits on the fly, in a reciprocal interactive manner.
Automatic PE
(key research by Astudillo, Graca and Martins, 2018)

i. Developing post-editing models that are able to learn from patterns found in aligned parallel data, with the raw output on one side and the post-edited version on the other.
ii. Improving the MT output after it has been generated.
iii. Some previous research in this respect suggests that interactive post-editing takes more time but that it may lead to products of higher quality relative to static PE (e.g., Green et al. 2014).
Automatic vs Human PE

Neural MT PE:
- Monolingual PE is impossible without reference to the source text
- Higher fluency of NMT makes errors harder to identify and correct, even in bilingual editing conditions
- Requires greater expertise in translation to post-edit
- Cognitive effort in NMT PE may well be similar to the effort of HT from scratch
MT PE of literary texts

The introduction of NMT has opened perspectives for literary text PE:
• Using NMT to post-edit a novel led to a 36% increase in translators' words-per-time productivity relative to translation from scratch (Toral, Wieling and Way 2018).
• The same study found that even PBSMT led to an 18% increase in translators' productivity compared to the from-scratch condition.
• The MT system was tailored to literary content (Toral and Way 2018), which is likely to have played a significant role in the productivity boost.
• Traditional assessment methodologies employed in MT research, such as automatic evaluations and sentence-level assessments, may need to be re-examined if research in this area is to stay true to readers' and literary translators' perceptions of quality (Doherty 2016).
• The BLEU metric (Papineni et al. 2002), bilingual evaluation understudy, an algorithm developed at IBM for evaluating the quality of machine-translated text, would need to give way to reader-centred evaluation approaches.
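To make the mechanics of BLEU concrete, here is a minimal single-reference, sentence-level sketch (the real metric as defined by Papineni et al. is corpus-level, supports multiple references, and has standard smoothing variants; the function name and example sentences are illustrative):

```python
import math
from collections import Counter

def bleu(candidate: str, reference: str, max_n: int = 4) -> float:
    """Sentence-level BLEU sketch: geometric mean of modified n-gram
    precisions (n = 1..max_n) times a brevity penalty."""
    cand, ref = candidate.split(), reference.split()
    precisions = []
    for n in range(1, max_n + 1):
        cand_ngrams = Counter(tuple(cand[i:i + n]) for i in range(len(cand) - n + 1))
        ref_ngrams = Counter(tuple(ref[i:i + n]) for i in range(len(ref) - n + 1))
        overlap = sum((cand_ngrams & ref_ngrams).values())  # clipped n-gram matches
        precisions.append(overlap / max(sum(cand_ngrams.values()), 1))
    if min(precisions) == 0:
        return 0.0
    geo_mean = math.exp(sum(math.log(p) for p in precisions) / max_n)
    # brevity penalty punishes candidates shorter than the reference
    bp = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / len(cand))
    return bp * geo_mean

print(bleu("the cat sat on the mat", "the cat sat on the mat"))  # 1.0
```

The sketch also makes the criticism above visible: the score rewards n-gram overlap with a fixed reference, so a fluent literary rendering that departs from the reference wording scores poorly, which is why reader-centred evaluation is argued for instead.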
Future arrays of research in PE

 PE as a method for translating literary text is clearly emerging as a trend, venturing into a territory that for years has been off-limits for research in this area.
 Redefining important concepts and methodologies may be required by NMT PE.
 Going beyond the sentence as the default level of analysis is one of the most productive tendencies.
 Quality assessment will also need to revisit theoretical paradigms that might not be consistent with the pragmatic approach, such as automatic evaluations.
 More technological integration and interaction between translators and MT systems call for more research on the implications of different working methods.
 The human-machine interface is still relatively poorly understood in relation to issues such as decision-making, agency and cognitive processing, so research on these subjects should hopefully continue to evolve.
To sum up:

 Post-editing is an encompassing term that may describe not only static editing of MT outputs but also MT as an additional source of matches in CAT environments
 May be offered as a separate service
 May serve as a way of improving translators' productivity
 Major research and development in PE revolves around effort and quality
 Successful PE is based on a clear specification of requirements
 The two most popularly distinguished PE levels are light PE (with accuracy and semantics prevailing) and full PE (additionally covering correct terminology, grammar, punctuation, syntax and formatting, as well as the editing of culture-bound notions)
 These roughly correspond to "good enough" vs "similar or equal to human translation"
 The translator's role in light PE may be focused on terminological checks and content sign-off
 In NMT PE, professional translation expertise is a requirement in the case of full PE.
Please define the terms

 Post-editing
 Human-assisted MT paradigm in PE
 Machine-assisted MT paradigm in PE
 Cognitive effort
 Temporal effort
 Technical effort
 Level of PE
 Automatic PE
 Interactive PE
Thank you
See you at the next lecture!
