
Perspectives

Studies in Translation Theory and Practice

ISSN: (Print) (Online) Journal homepage: https://www.tandfonline.com/loi/rmps20

A human evaluation of English-Slovak machine translation

Dasa Munkova, Ludmila Panisova & Katarina Welnitzova

To cite this article: Dasa Munkova, Ludmila Panisova & Katarina Welnitzova (2023) A human
evaluation of English-Slovak machine translation, Perspectives, 31:6, 1142-1161, DOI:
10.1080/0907676X.2022.2116989

To link to this article: https://doi.org/10.1080/0907676X.2022.2116989

Published online: 15 Sep 2022.


Full Terms & Conditions of access and use can be found at


https://www.tandfonline.com/action/journalInformation?journalCode=rmps20
PERSPECTIVES
2023, VOL. 31, NO. 6, 1142–1161
https://doi.org/10.1080/0907676X.2022.2116989

A human evaluation of English-Slovak machine translation


Dasa Munkova (a), Ludmila Panisova (a) and Katarina Welnitzova (b)
(a) Constantine the Philosopher University in Nitra, Nitra, Slovakia; (b) University of Ss. Cyril and Methodius,
Trnava, Slovakia

ABSTRACT
The paper aims to obtain an error profile for machine translation
(MT) from English into Slovak. We present an adjusted framework
for MT evaluation, which is based on Vanko’s categorical
framework, but reflects machine translation peculiarities of
synthetic and/or inflectional languages. Based on the framework,
we analyse the errors generated by Google Translate and identify
the most frequent categories of errors occurring in machine
translation when translating newspaper articles from English into
Slovak. While we have seen research on widely-spoken
languages, such as English or other major official EU languages,
little is known about Slovak, which is also an official EU language.
This paper provides the first human MT evaluation study of
English-Slovak machine translation using professional translators
for a more detailed depiction of translation quality. Our research
has revealed that the highest numbers of errors occurred in the
sphere of lexical semantics, as well as in syntactic-semantic
correlativeness, both being closely related. Additionally, based on
the results of the Cochran Q test, we show how individual MT
errors located in the examined categories differ in co-incidence
and in how they impact translation quality.

ARTICLE HISTORY
Received 24 February 2021
Accepted 21 August 2022

KEYWORDS
Machine translation; error analysis; English language; Slovak language; error typology

1. Introduction
The use of machine translation (MT) in the translation industry is closely related to
matters of business (Nunes Vieira & Alonso, 2020, p. 179). According to the Mordor
Intelligence Industry report¹, the MT market reached $153.8 million in 2020 and is
expected to reach $230.67 million by 2026. The MT market is still growing due to the
ever-increasing use of computer-assisted translation (CAT) tools that make the translation
process more effective. This increase is mainly due to the growth of emerging economies
and the globalization of the market, as well as the demand for content localization
and time- and cost-effective translation. However, this fact does not indicate that
machine translation is perfect: satisfaction of the target consumer plays one of the key
roles in the determination of the quality of the translation (Way, 2018).
In the European Union (EU), translation quality among 24 official languages plays a
key role in successful communication between its citizens and EU or national institutions
(Vardaro et al., 2019). In the EU and/or in European countries, more than half of
translation companies use MT systems (Loock, 2020). MT is a viable alternative to ensure
equal access to information for all people, not only in the EU but across society as a
whole. Access to information can be seen as a human right necessary for participating in
society and as a means of ensuring equality (Nurminen & Koponen, 2020, p. 151).
Even in online education, machine translation offers an alternative solution to existing
forms of MOOC (Massive Open Online Course) translation (Hu et al., 2020).

CONTACT Dasa Munkova dmunkova@ukf.sk Constantine the Philosopher University in Nitra, Tr. A. Hlinku 1, Nitra 949 74, Slovakia
© 2022 Informa UK Limited, trading as Taylor & Francis Group

1.1. Research objectives


This research aims at the human evaluation of MT output from an analytical language
into an inflectional language. We present an adjusted framework based on Vanko’s fra-
mework (2017) to assess how well neural MT (Google Translate), in the English-Slovak
direction, models the specific linguistic features in the spheres of predication, modal and
communication sentence framework, syntactic-semantic correlativeness, compound/
complex sentences, and lexical semantics.
The research comprises two partial objectives:
The first objective deals with errors occurring in neural MT output when translating
journalistic texts from English into Slovak. We attempt to identify the most frequently
occurring types of errors during machine translation into Slovak and we categorize
them into five main categories of error rate according to Vanko’s categorical framework.
The second, subsequent objective focuses on a comparison of the types of errors found
in machine translation. We mutually compare the incidence of the types of errors for
each main category of error rate. We state the following null statistical hypothesis:
H0: There is no statistically significant difference in the incidence of error segments among
the examined categories (I. Predication, II. Modal and communication sentence framework,
III. Syntactic-semantic correlativeness, IV. Compound/complex sentences, and V. Lexical
semantics).

Our goal is not to compare statistical and neural MT, but to linguistically describe the
MT error profile of current technology (neural MT) using a hierarchical three-level error
scheme implemented into a categorical framework that is linguistically-motivated and
considers translation into a synthetic language – Slovak.
The structure of the paper is as follows. In the next section, we provide a brief review
of manual MT evaluation. We subsequently explain why we had to extend Vaňko’s framework
with further areas of error with regard to Slovak and describe the methodology,
including the data and results. Finally, we discuss the findings and directions for future
research.

2. MT evaluation
In the translation research community, there is no clear definition for translation quality.
The main issue is how to express and measure translation quality and what measure
should be used to assess the translation quality. Different views on translation bring
different approaches to the evaluation process (compare House, 2014). In academia,
there is a focus on the theoretical and educational aspects as related to translation
quality (Castilho et al., 2018, p. 11). This is contrary to the industry, where the focus
is on quantitative indicators of quality and the satisfaction of the end-user (client, con-
sumer, or buyer etc.). This resulted in two distinct paradigms in quality measurement
and comparison (Drugan, 2013, p. 123): top-down translation quality models (based
on traditional industry approaches and error rate), and bottom-up translation quality
models (based on emerging strategies, i.e. drawing on technological features to
enhance translation quality).
According to Popović (2018), one possible method of assessing the systematic and
semantic equivalence of a translation lies in the classification of MT errors, which can
provide us with basic information about the errors that the MT system produces and/or
models. The error classification can be carried out manually, automatically, or in a
combined manner (in this paper, we focus on manual classification). Hu (2020) suggests that
error classification is of great importance not only for human but also for automatic evaluation.
In human MT evaluation, bilingual judges assess the MT quality, i.e. they identify and classify
errors based on certain criteria referring to error schemes (Bojar, 2011; Costa et al., 2015;
Popović, 2018). The error schemes can comprise one or more levels with several
categories. Federico et al. (2014) used a simple error scheme (one level with four categories:
reordering errors, lexicon errors, missing words, and morphology errors), which is based on
linear mixed-effects models, for translation from English into Chinese, Arabic, and Russian. A
more complex error scheme with a hierarchical structure was proposed by Vilar et al.
(2006), comprising three levels with several error categories in English-Spanish and
Chinese-English directions (e.g. the first level consists of five categories: missing
words, word order error, incorrect words, unknown words, and punctuation error).
Bojar (2011), inspired by Vilar et al. (2006), used a similar three-level hierarchical
error scheme, without the unknown words error category, and analysed error types in
English-Czech machine translation. Another hierarchical four-level error scheme was
proposed by Costa et al. (2015); it was the first to be associated with Romance languages
(morphologically richer languages) and is not focused on English errors. It is
linguistically-motivated, i.e. it indicates the language level where the error occurs or is located
(orthography, lexis, grammar, semantics, and discourse).
Currently, the DQF-MQM hierarchical error scheme is coming to the fore for analytic
quality evaluation. The DQF-MQM framework harmonizes the Dynamic Quality Framework
(DQF) from TAUS with the European Commission-funded Multidimensional
Quality Metrics (MQM). This harmonized model shares the same basic structure of
both, covering 20 of the most common issue types arising in translation quality
assessment (compare TAUS, 2015 or Lommel et al., 2014). The TAUS DQF Error Typology contains
six error categories plus four additional features (20+ subtypes). MQM consists of 100+
issue types in a hierarchy, of which the DQF Error Typology is a subset.
In this context, Slovak, which is a synthetic language, relies mostly
on a rich range of inflectional morphemes that convey the relations among words
in a sentence and their meaning. Since there are substantial grammatical differences
between, for example, English and Slovak, there is also a need to focus on different kinds of
translation errors and, hence, to design a new framework for error typology which may better
cover the important issues in the analysis of English-Slovak translations. Vanko (2017)
designed a categorical framework for error analysis. Besides traditional linguistic foun-
dations, including analytical procedures, he also applied a pragmatic-communicative
aspect, involving the human addressee. The aim of Vanko’s framework is to be as
exhaustive as possible in covering all the specific features typical of Slovak. Most of all, it
can be illustrated by the sphere of syntactic-semantic correlativeness in which there are
nominal morphosyntactic categories, such as Concord in determinative syntagm and
Agreement in determinative syntagm. While in English the term concord covers only
the agreement between subject and verb, in Slovak it also includes the relation
between the head of the noun phrase and its modifiers. In addition, the category of
Agreement in determinative syntagm is focused on the analysis of the relation
between a predicate and an object, which is, in Slovak, expressed using cases and inflec-
tional morphemes. It is not possible to trace similar relations in predominantly analytical
English grammar. The categorical framework is a three-level hierarchical error scheme
which corresponds to the core of MQM, as well as the DQF Error Typology. The
sphere of Language corresponds to the categories of Predication, Syntactic-semantic cor-
relativeness, Compound/complex sentence, and, partially, to the category of Modal and
communication sentence framework (e.g. negation). Accuracy is related mainly to an
incorrect meaning (transfer) in the text of the target language, the omission of
lexemes, etc. and is represented in the field of Lexical semantics together with the field
of Terminology understood as an inadequate transfer of a term from the source language
to the target language. The last, Style, includes those errors related to a mismatch between
the style of the source and target texts. In the framework, it is reflected in the subcategory
of Stylistic compatibility belonging to Lexical semantics (Vanko, 2017, p. 100).
There are few studies on the manual evaluation of MT output in the context of Slovak
(Absolon et al., 2018; Bánik et al., 2019; Wrede et al., 2020; Welnitzová et al., 2021;
Welnitzová & Munkova, 2021). Welnitzová et al. (2021) examined the quality of machine
translation using MQM, focusing on the fluency of machine translation in
Slovak. They concluded that MT output is comprehensible, and that a reader can
understand the meaning of the text (2021, p. 228).
Munkova and Munk (2016) determined MT quality using automatic evaluation metrics
in the English/German-Slovak direction, but also vice versa, from Slovak to English/
German. Vičič et al. (2017, p. 60) showed, in the direction Czech-Slovak using automatic
metric HTER, that the quality of MT output from Czech to Slovak (only in this direction)
was satisfactory. Munkova & Munk et al. (2016) and Munk et al. (2018) showed that
automatic evaluation metrics are reliable and valid for both directions. They pointed out
that automatic evaluation metrics are a good indicator of MT quality, but do not provide
detailed linguistic information on the accuracy and/or translation error rate (Munkova
et al., 2020; Munkova & Munk, 2015). However, they have proposed a new methodology
to evaluate MT quality using automatic metrics and residuals, which allows a more
detailed examination and analysis of the extreme cases (either sentences or segments) in
terms of accuracy or error rate (Munk & Munkova, 2018). Benková et al. (2021) also
used the same methodology in a comparative analysis of the quality of statistical and
neural machine translation from English to Slovak.

3. Framework for MT error classification in Slovak


As Popović (2018) states, defining a suitable set of error classes is a challenging task
because it is necessary to decide which error types are of interest for the given task
and how many details are needed.
The classification of categories within our framework was based on preliminary
research of English-Slovak MT output (Vanko, 2017), which revealed the most relevant
errors according to their frequency in the analysed texts. However, the preliminary
research conducted in 2014–2018 (Munkova et al., 2017) showed that many errors fell
into the category Others, outside the scope of the original framework
designed by Vanko in 2017, so the original framework had to be extended to classify
the errors in as detailed a manner as possible. As a result, the new adjusted framework
was designed in 2020. Similar to the earlier model, it is based on the definition of
linguistically-motivated categories for performing error analysis of MT output. It also
consists of five main linguistic issues (categories): 1. Predication, 2. Modal and
communication sentence framework, 3. Syntactic-semantic correlativeness, 4. Compound/complex
sentences, and 5. Lexical semantics (Table 1). However, each of the categories contains
more subcategories, shown in italics in Table 1 so that they can be easily identified.
The highest increase in the number of subcategories can be found in the categories of
predication, syntactic-semantic correlativeness and lexical semantics. The first two categories
are directly related to the specifics of Slovak as a synthetic language, i.e. errors in
declension, as well as a relatively free word order depending on the case of the words used in a
sentence and their meaning. Analytical languages do not formally express the difference
between the nominative and accusative cases, i.e. the word order in such languages is
firmly fixed. In English, there is only one way to express the meaning The policeman
killed the robber using the active form of the main verb kill, whereas in Slovak there are
two possibilities: Policajt [The policeman] zabil [killed] lupiča [the robber] or Lupiča
zabil policajt. In the case of the second sentence, a literal English translation ignoring
the declension rules of the Slovak language could be as follows: The robber killed the
policeman. However, this translation is not correct. The positions of the subject in the
nominative (the policeman) and the object in the accusative (the robber) have been swapped in
Slovak, but the meaning of the sentence is the same. This means that different inflectional
morphemes expressing the different cases in Slovak lead to a looser word order in which
the subject and object may change their position while still preserving their function, and
the meaning of both Slovak sentences is the same.
In addition, the category of lexical semantics has been extended to 19 subcategories of
MT errors which appeared to be quite frequent in the preliminary research (Munková &
Vanko, 2017). In this case, the high number of subcategories is directly related to the
style and genre of the analysed texts.
Although the framework (2020) focuses primarily on the analysis of MT output, it
can also be used for the evaluation of human translations (produced by, for example,
language learners, non-native speakers, inexperienced translators, as well as students
of translation studies).

4. Materials and methods


4.1. Materials
For our research purposes, we excerpted 63 news articles from the British online newspaper
The Guardian (including news from politics, sports, show business, and technology).
The sample represented the same style and genre (they were all news articles, not
essays or editorials). The articles were obtained in 2016–2017 and translated into
Slovak in 2018 by two professional translators. Altogether, we analysed 1903 segments. Using the
TreeTagger corpus tool (Schmid, 1994), we pre-processed the English source texts (STs),
their Slovak machine translations (MTs), as well as their human translations (HTs)
(Table 2). The reason for the corpus processing was to determine the readability and
lexico-grammatical structure of the examined texts (especially for English STs, Slovak NMTs
and Slovak HTs).

Table 1. Adjusted framework for MT error classification (2020).

I. Predication
    Predicative categories
        Tense
        Mode
    Congruency categories
        Congruency in person
        Congruency in number
        Congruency in gender
    Non-finite verb or other word class instead of finite verb functioning as a predicate
    Missing verb in predication
    Sentence with or without subject (one-member sentence with subject or two-member sentence without subject)
    Sentence with or without agent
    Descriptive and reflexive passive verb forms
    Incorrectly identified subject in sentence
    Incorrectly identified predicate in sentence
    Incorrect form of complex verb phrase
    Others

II. Modal and communication sentence framework
    Modality
        Negation
        Necessity
        Possibility
        Intention
        Obligation
    Communication functions
        Interrogativeness
        Directiveness
        Optativeness
    Others

III. Syntactic-semantic correlativeness
    Nominal morphosyntax
        Concord in determinative syntagm
        Agreement in determinative syntagm (without preposition)
        Agreement in determinative syntagm (with preposition)
        Adjunction in determinative syntagm
        Adjunction in coordinated syntagm(s)
        Incorrect case in syntagm
        Incorrect number
        Incorrect position of a word in determinative syntagm
    Pronominal morphosyntax
    Numeral morphosyntax
    Verbal morphosyntax
        Non-prepositional phrases
        Prepositional phrases
        Incorrect aspect
    Word order
    Other morphosyntactic phenomena
        Prepositions
        Incorrect transfer of word class
        Redundant or missing comma in a simple sentence
        Other punctuation
        Lower case letter in the beginning of sentence or headline
    Others

IV. Compound/complex sentences
    Identification of number of sentences
    Identification of semantic relations between sentences
    Connectiveness between sentences (omission of conjunctions)
    Conjunctions in compound, complex and compound-complex sentence
    Comma in compound, complex and compound-complex sentence
    Time shifts between sentences
    Other phenomena

V. Lexical semantics
    Adequate transfer of word’s meaning
    Polysemy
    Homonymy
    Semantic compatibility
    Stylistic compatibility
    Terms
    Derivation
    Omission
    Untranslated lexeme
    Translation of lexeme into different language or use of foreign expression
    Redundant lexeme
    Literal translation
    Compounds
    Proper names
    Abbreviations and symbols
    Explication
    Technical shortcomings
    Other phenomena
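
The results refer to subcategories by identifiers such as I.1 or III.14, which Table 1 itself does not print. The following minimal Python sketch shows one way to resolve them, assuming that the identifiers simply follow the row order of the lowest-level entries in Table 1. This numbering is an assumption, not something the framework states explicitly, although the resulting counts per category (14, 9, 20, 7, 18) agree with the degrees of freedom plus one reported for the Cochran Q tests. The names FRAMEWORK and subcategory are hypothetical.

```python
# Lowest-level entries of Table 1, in row order, per main category.
# ASSUMPTION: identifiers such as "III.14" index into this row order.
FRAMEWORK = {
    "I": [
        "Tense", "Mode",
        "Congruency in person", "Congruency in number", "Congruency in gender",
        "Non-finite verb or other word class instead of finite verb",
        "Missing verb in predication",
        "Sentence with or without subject",
        "Sentence with or without agent",
        "Descriptive and reflexive passive verb forms",
        "Incorrectly identified subject in sentence",
        "Incorrectly identified predicate in sentence",
        "Incorrect form of complex verb phrase",
        "Others",
    ],
    "II": [
        "Negation", "Necessity", "Possibility", "Intention", "Obligation",
        "Interrogativeness", "Directiveness", "Optativeness",
        "Others",
    ],
    "III": [
        "Concord in determinative syntagm",
        "Agreement in determinative syntagm (without preposition)",
        "Agreement in determinative syntagm (with preposition)",
        "Adjunction in determinative syntagm",
        "Adjunction in coordinated syntagm(s)",
        "Incorrect case in syntagm",
        "Incorrect number",
        "Incorrect position of a word in determinative syntagm",
        "Pronominal morphosyntax",
        "Numeral morphosyntax",
        "Non-prepositional phrases", "Prepositional phrases", "Incorrect aspect",
        "Word order",
        "Prepositions",
        "Incorrect transfer of word class",
        "Redundant or missing comma in a simple sentence",
        "Other punctuation",
        "Lower case letter in the beginning of sentence or headline",
        "Others",
    ],
    "IV": [
        "Identification of number of sentences",
        "Identification of semantic relations between sentences",
        "Connectiveness between sentences (omission of conjunctions)",
        "Conjunctions in compound, complex and compound-complex sentence",
        "Comma in compound, complex and compound-complex sentence",
        "Time shifts between sentences",
        "Other phenomena",
    ],
    "V": [
        "Adequate transfer of word's meaning", "Polysemy", "Homonymy",
        "Semantic compatibility", "Stylistic compatibility", "Terms",
        "Derivation", "Omission", "Untranslated lexeme",
        "Translation of lexeme into different language or use of foreign expression",
        "Redundant lexeme", "Literal translation", "Compounds", "Proper names",
        "Abbreviations and symbols", "Explication", "Technical shortcomings",
        "Other phenomena",
    ],
}


def subcategory(identifier):
    """Resolve an identifier such as 'III.14' to its subcategory name."""
    category, index = identifier.split(".")
    return FRAMEWORK[category][int(index) - 1]


print(subcategory("III.14"))
print(subcategory("V.1"))
```

Under this assumed numbering, III.14 resolves to Word order and V.1 to Adequate transfer of word's meaning, which matches how the discussion describes the most frequent errors in those two categories.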
According to Biber and Conrad (2009), a typical feature of newspaper writing is its
written register and its emphasis on information. Since the aim of the text is to report
and describe events which have happened, there is a preference for nominal features, i.e.
nouns in all realizations (nouns, nouns as pre-modifiers, post-modifiers, nouns in noun
phrases), then prepositional phrases, and attributive adjectives.

4.2. Procedure and method


In May 2021, the examined English articles were machine translated into Slovak using neural
Google Translate (NGT) and imported into the OSTPERE system (Munková &
Kapusta et al., 2016), in which three translators (PhD students in Translation Studies)
classified the errors identified according to the proposed adjusted framework (2020).
To determine the inter-annotator concordance within each error category, we used
Kendall’s coefficient of concordance, namely Kendall’s W for agreement between
ranks, where 0 represents no agreement at all, and 1 represents complete agreement
(Siegel & Castellan, 1988).
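
As an illustration of the coefficient, the following Python sketch computes Kendall's W from the ranks that m raters assign to n items, together with the associated chi-square statistic m(n − 1)W on n − 1 degrees of freedom. It is a simplified textbook version that assumes complete rankings without ties. The study's annotation data are categorical and would require a tie correction, so this is not the authors' exact computation, and the function names are hypothetical.

```python
def kendalls_w(ranks):
    """Kendall's W for ranks[j][i] = rank given by rater j to item i.

    ASSUMPTION: each rater supplies a full ranking 1..n with no ties.
    """
    m = len(ranks)        # number of raters
    n = len(ranks[0])     # number of ranked items
    totals = [sum(rater[i] for rater in ranks) for i in range(n)]
    mean_total = m * (n + 1) / 2
    s = sum((t - mean_total) ** 2 for t in totals)
    return 12 * s / (m ** 2 * (n ** 3 - n))


def chi_square_from_w(w, m, n):
    """Chi-square statistic and degrees of freedom used to test W."""
    return m * (n - 1) * w, n - 1


# Three raters in complete agreement on four items.
print(kendalls_w([[1, 2, 3, 4], [1, 2, 3, 4], [1, 2, 3, 4]]))  # 1.0

# Two raters with exactly opposite rankings.
print(kendalls_w([[1, 2, 3], [3, 2, 1]]))  # 0.0

# Three annotators, 1903 segments, W as reported for one category.
print(chi_square_from_w(0.8457, m=3, n=1903))
```

With three annotators and 1903 segments, W = 0.8457 yields a statistic of roughly 4825.6 on 1902 degrees of freedom, which is close to the ChiSq values reported in the results; the small gap is what one would expect from W being rounded to four decimals.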
To test the differences in the proportion of error segments in the examined neural MT
(NMT) texts, we used a non-parametric Cochran Q test, given that the examined vari-
ables come from an unknown distribution.
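
For readers unfamiliar with the test, the sketch below implements the standard Cochran Q statistic for a binary table in which each row is a segment and each column an error subcategory (1 = the segment contains that error). The layout of the table and the name cochran_q are assumptions made for illustration, and the toy data are invented, not taken from the study.

```python
def cochran_q(table):
    """Cochran's Q for a binary table: rows are blocks (segments),
    columns are treatments (error subcategories).

    Returns the Q statistic and its degrees of freedom (k - 1).
    """
    k = len(table[0])
    col_totals = [sum(row[j] for row in table) for j in range(k)]
    row_totals = [sum(row) for row in table]
    grand_total = sum(row_totals)
    denominator = k * grand_total - sum(r * r for r in row_totals)
    if denominator == 0:
        # Happens only when every row is all 0s or all 1s.
        raise ValueError("Q is undefined for this table")
    numerator = (k - 1) * (k * sum(c * c for c in col_totals) - grand_total ** 2)
    return numerator / denominator, k - 1


# Invented toy data: 4 segments, 3 error subcategories.
toy = [
    [1, 0, 0],
    [1, 0, 0],
    [1, 1, 0],
    [1, 0, 0],
]
q, df = cochran_q(toy)
print(q, df)  # 6.5 2
```

Under the null hypothesis, Q approximately follows a chi-square distribution with k − 1 degrees of freedom, which is why a category with 14 subcategories is tested on 13 degrees of freedom.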

5. Results
We will separately analyse and interpret the results according to the main categories I, II,
III, IV, and V.

Table 2. Dataset composition.

Feature type         Feature name                               GT_NMT     Human translations   Source texts
Readability          Average sentence length                    17.12034   17.83447             19.26274
                     Average word length                        5.696361   5.81015              4.996122
                     # short sentences (n < 10)                 469        441                  395
                     # long sentences (n >= 10)                 1434       1462                 1508
Lexico-grammatical   Frequency of proper nouns                  1501       1522                 3078
                     Frequency of nouns                         10070      10740                8627
                     Frequency of adjectives                    3324       3467                 2968
                     Frequency of adverbs                       933        966                  1667
                     Frequency of verbs                         5198       5287                 6473
                     Frequency of pronominals                   2371       2570                 2124
                     Frequency of particles                     592        782                  149
                     Frequency of foreign words                 841        841                  0
                     Frequency of interjections                 3          3                    3
                     Frequency of numerals                      617        619                  777
                     Frequency of prepositions & conjunctions   6028       6250                 6697
                     Frequency of interpunction                 5958       5881                 3547
I. Predication
Within the Predication category, there was a high agreement (Kendall W = 0.8457) in
determining the individual subcategories between the annotators, i.e. significant agree-
ment between evaluators (ChiSq = 4825.3285; df = 1902; p <0.0001). The most frequent
error segments were identified in subcategory I.1 (142 error segments), followed by I.5
(82 error segments), and I.11 (61 error segments). On the contrary, the least frequent
error segments were identified in I.8 (3 error segments) and I.10 (5 error segments).
Based on the results of the Cochran Q Test (Cochran Q Test: Q = 486.879, df = 13, p <
0.001), there are statistically significant differences in the occurrence of individual
error subcategories with respect to the examined segments. The most frequent error cat-
egory (I.1) represents only 7.41% of error segments out of the total number of segments
(Figure 1).

Figure 1. Predication – proportion of error segments.

II. Modal and communication sentence framework
Within the second category, there was a high agreement (Kendall W = 0.8458) in
determining the individual subcategories between the annotators, i.e. significant agreement
between evaluators (ChiSq = 4826.1855; df = 1902; p < 0.0001). The most frequent error
segments were identified in subcategory II.3 (19 error segments), followed by II.2 (3
error segments) and II.4 (3 error segments). Conversely, no error segments were found
in subcategories II.5, II.6, II.8, and II.9. Based on the results of the Cochran Q Test
(Cochran Q Test: Q = 84.000, df = 8, p < 0.001), there are statistically significant
differences in the occurrence of individual error subcategories with respect to the
examined segments. The most frequent error category (II.3) represents only 1% of
the total number of segments (Figure 2).

Figure 2. Modal and communication sentence framework – proportion of error segments.

III. Syntactic-semantic correlativeness
Within the third category, there was a high agreement (Kendall W = 0.8342) in determining
individual subcategories between annotators, i.e. significant agreement between
evaluators (ChiSq = 4825.3285; df = 1902; p < 0.0001). The most frequent error segments
were identified in subcategory III.14 (325 error segments), followed by III.18 (225
error segments), and III.3 (101 error segments). In contrast, no errors were found in
subcategory III.19, and errors in III.5 and III.10 were found in only three error
segments. Based on the results of the Cochran Q Test (Cochran Q Test: Q = 2007.924, df
= 19, p < 0.001), there are statistically significant differences in the occurrence of
individual error subcategories with respect to the examined segments. The most frequent error
category (III.14) represents only 17.24% of the total number of segments (Figure 3).

Figure 3. Syntactic-semantic correlativeness – proportion of error segments.

IV. Compound/complex sentences
Within the fourth category, there was again a high agreement (Kendall W = 0.8494) in
determining the individual subcategories between the annotators, i.e. significant agreement
between evaluators (ChiSq = 4846.7676; df = 1902; p < 0.0001). The most frequent error
segments were identified in subcategory IV.4 (168 error segments), followed by IV.3 (140 error
segments). In contrast, IV.7 (6 error segments) and IV.1 (29 error segments) were the least
frequent. Based on the results of the Cochran Q Test (Cochran Q Test: Q = 300.509, df = 6, p
< 0.001), there are statistically significant differences in the occurrence of individual error
subcategories with respect to the examined segments. The most frequent error category
(IV.4) represents only 8.83% of the total number of segments (Figure 4).

V. Lexical semantics
Within the last category, there was again a high agreement (Kendall W = 0.856) in
determining individual subcategories between annotators, i.e. significant agreement between
evaluators (ChiSq = 4884.3697; df = 1902; p < 0.0001). The most frequent error segments
were identified in subcategory V.1 (1348 error segments), followed by V.4 (184 error seg-
ments). Conversely, the least frequent error segments were in V.10 (13 error segments)
and V.6 (17 error segments). Based on the results of the Cochran Q Test (Cochran Q
Test: Q = 12481.500, df = 17, p <0.001), there are statistically significant differences in
the occurrence of individual error subcategories with respect to the examined segments.
The most frequent error category (V.1) represents up to 70.84% of the total number of
segments (Figure 5).
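
The percentages quoted in this section can be read as the share of all 1903 analysed segments in which a given subcategory occurs. The short Python sketch below reproduces that arithmetic for three of the reported counts; the helper name share_of_segments is hypothetical, and rounding to two decimals is an assumption about how the figures were presented.

```python
TOTAL_SEGMENTS = 1903  # segments analysed in the study


def share_of_segments(error_segments, total=TOTAL_SEGMENTS):
    """Percentage of all segments that contain a given error subcategory."""
    return round(100 * error_segments / total, 2)


# Error-segment counts as reported for the most frequent subcategories.
reported = {"V.1": 1348, "IV.4": 168, "II.3": 19}
for identifier, count in reported.items():
    print(identifier, share_of_segments(count))
# V.1 70.84
# IV.4 8.83
# II.3 1.0
```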

Figure 4. Compound/complex sentences – proportion of error segments.

Figure 5. Lexical semantics – proportion of error segments.

6. Discussion
In this section, we will analyse and interpret the most common mistakes that are relevant
to the newspaper style in the context of the source and target languages with the aim of
obtaining an MT error profile from English into Slovak. The highest number of MT error
segments was detected in Lexical semantics (1771), representing 77.30% of the total
of 1903 segments. The second was Syntactic-semantic correlativeness
(914), representing 48.03%. 435 error segments were recorded in Predication, followed
by Compound/complex sentences (426). The lowest number of error segments (33) was
identified in Modal and communication sentence framework, representing only 1.73%.
In the discussion, we will follow the aforementioned order while interpreting the
results of our research.

V. Lexical semantics
The category of Adequate transfer of word meaning was the most significant in terms of the
frequency of MT errors. The inadequate transfer of word meaning from English to Slovak
can be caused by the degree of homonymy and polysemy in both languages. In
general, English word-stock is richer in homonymy, polysemy and even synonymy,
which may lead to the incorrect choice of meaning during the translation process and
thus a high number of MT errors. Preliminary results indicate not only an increased
incidence of errors in the sphere of adequate transfer of the word meaning from English to
Slovak, but also in the translation of abbreviations and symbols, persistent problems in
the translation of English polysemy and homonymy into Slovak, and increased incidence
of literal translation. As a result, during our research we detected many word combinations
and sentence structures in the NMT output that are unnatural in Slovak, for
example,

ST: This time last year, the unprecedented year-on-year decline in quarterly revenue could
be explained by the enormous success of the iPhone 6, and the comparative wet blanket of
the iPhone 6s.

MT: Tentokrát v minulom roku sa bezprecedentný medziročný pokles štvrťročných výnosov
dá vysvetliť enormným úspechom iPhone 6 a porovnateľnou mokrou prikrývkou iPhone 6s.

While in English the expression wet blanket means a person or thing that dampens or
discourages one’s enthusiasm or enjoyment, its literal translation into Slovak as mokrou
prikrývkou does not bear the same meaning and is considered to be incorrect in the target
language. The correct Slovak expression bearing the same metaphorical meaning is
studenou sprchou.

III. Syntactic-semantic correlativeness


The highest number of errors was related to the category of word order. Syntax and
grammatical constructions are not as fixed in Slovak newspaper writing, while in
English syntactic relations are expressed by auxiliaries and a more fixed word order,
mainly because of the analytical character of English, so the word position in syntactic
structures indicates the function of sentence elements (Welnitzová & Munkova, 2021).
Due to their position, the subject and subject clause are then quite easily recognizable in
English sentences. Based on large-scale corpus analyses, Biber and Conrad (2009)
characterize newspaper writing as very rich in nouns in all realizations,
including nouns and noun phrases used as pre-modifiers or post-modifiers of various
sentence elements. Our research has revealed that MT errors usually arise in cases of
multiple-noun phrases. NGT is usually able to identify the subject and the object of the
sentence, but it fails to translate them correctly when the head of a noun phrase is pre-modified
by a complex noun phrase, for example,
ST: He added that he was at the meeting because he was ‘petrified at what is going on with
Islamophobia’ and was worried by the emergence of a ‘no Muslims, no Syrians, no refugee
culture’.

MT: Dodal, že bol na stretnutí, pretože bol “skamenený nad tým, čo sa deje s islamofóbiou” a bol
znepokojený vznikom “žiadnych moslimov, žiadnych Sýrčanov a žiadnej utečeneckej kultúry”.

HT: Dodal, že bol na stretnutí, pretože bol “šokovaný tým, čo sa deje s islamofóbiou” a bol
znepokojený vznikom “kultúry bez Moslimov, Sýrčanov a utečencov”.

In the original English sentence there is a direct object consisting of a complex noun
phrase the emergence of a 'no Muslims, no Syrians, no refugee culture', where the head
of the noun phrase, emergence, is pre-modified by the definite article the and
post-modified by a prepositional phrase consisting of the preposition of, the indefinite
article a, and a complex noun phrase no Muslims, no Syrians, no refugee culture.
This should have been translated into Slovak as the complex noun phrase kultúry bez
Moslimov, Sýrčanov a utečencov, in which the head, the noun kultúry, is post-modified
by the prepositional phrase bez Moslimov, Sýrčanov a utečencov. Since English
adjectives sometimes do not contain specific derivational morphemes, they can be placed
in front of the noun phrase without any change in their structure. Slovak adjectives, on
the other hand, have their own specific set of suffixes attached to the stem, and they need
to be declined appropriately depending on the case and number of the head of the noun
phrase and its structure. NGT has correctly declined the noun phrases functioning as
pre-modifiers in the complex noun phrase: žiadni Moslimovia (Nominative) – žiadnych
moslimov (Genitive), žiadni Sýrčania (Nominative) – žiadnych Sýrčanov (Genitive) and
žiadna utečenecká kultúra (Nominative) – žiadnej utečeneckej kultúry (Genitive).
However, NGT has followed the original English sentence pattern in the Slovak
translation as well. The result is a literal translation that copies the original English
word order, which is grammatically incorrect in the target language.

I. Predication
The category of tense was one of the most significant in terms of the frequency of MT
errors. As Kroeger (2005) states, tense marking indicates the time when an event
occurred or when a situation existed. In our paper, however, the term tense is used only
for time reference that is marked grammatically, i.e. by purely grammatical
elements such as affixes, auxiliaries, or particles. Many errors in the translation of
verbs stem from the fact that English has a higher number of tenses and verb forms than
Slovak, and many of them have no counterparts in Slovak. For example, the present
perfect tense in English can be translated into simple or progressive forms of the Slovak
past or present tense, depending on the meaning of the sentence and the message
conveyed. Consequently, an inadequate tense shift during translation may cause logical
discrepancies and shifts in meaning, for example,
ST: The Department for Education says more teachers are ‘entering our classrooms than
those choosing to leave or retire.’

MT: Ministerstvo školstva tvrdí, že do našich tried “vstupuje viac učiteľov ako tých, ktorí sa
rozhodli odísť alebo odísť do dôchodku”.

[Back translation: The Department for Education says more teachers are ‘entering our class-
rooms than those chose to leave or retire.’]

In the English sentence, the present continuous forms of the verbs enter and choose were
used. NMT rendered the first verb correctly: the Slovak equivalent of enter, vstúpiť
(infinitive), appears in its correctly conjugated present-tense form vstupuje. However,
for the second verb, choose, the simple past form rozhodli of its Slovak equivalent
rozhodnúť was used instead of the grammatically and semantically correct simple
present form rozhodnú.

IV. Compound/complex sentences


In this sphere, the highest number of errors was related to Conjunctions in compound,
complex, and compound-complex sentences. NMT is usually successful in translating
English nominal that-clauses in which the subordinating conjunction that has been
omitted, for example,
ST: Agricultural sector representatives said they have encountered a sharp decline in appli-
cations for work from EU citizens in the wake of the vote last June.

MT: Zástupcovia poľnohospodárskeho sektoru uviedli, že po hlasovaní v júni minulého roku
narazili na prudký pokles žiadostí občanov EÚ o prácu.

NGT has been able to identify the omission of the subordinating conjunction that and
to add the appropriate subordinating Slovak conjunction že into the translation and, as a
result, to maintain the correct semantic relations between subordinate and superordinate
clause within the Slovak complex sentence. However, when that has been used in a sen-
tence as a restrictive relative pronoun, translation errors by NGT have been observed in
the examined newspaper articles, for example,
ST: ‘We are very clear we want to see a strong and successful EU, now and into the future,
that we can have a mature and constructive partnership,’ she said.

MT: “Sme si veľmi jasní, že chceme vidieť silnú a úspešnú EÚ, teraz aj do budúcnosti, že
môžeme mať zrelé a konštruktívne partnerstvo,” uviedla.

HT: “Stojíme si za tým, že chceme vidieť silnú a úspešnú Európsku úniu nielen teraz, ale
aj v budúcnosti. Takú, s ktorou môžeme mať zrelý a konštruktívny partnerský vzťah,”
dodala.

NGT has used the same Slovak conjunction že as in the case of subordinate that-clauses.
However, when that is used as a relative pronoun, it has to be translated into
Slovak differently, i.e. by means of the relative pronoun ktorá, which has to be
declined in accordance with the rules of Slovak grammar (ktorou). Moreover, the
pronoun needs to be combined with the Slovak preposition s (with) to maintain the
correct meaning of the translated text.

II. Modal and communication sentence framework


The lowest number of errors was detected in this sphere. Within it, the highest number of
errors was identified in the category of Possibility, which represents only 1% of the total
number of MT error segments. What is interesting within this category is that we have
recorded an overuse of modal verbs expressing possibility in NMT output, for
example,
ST: Business critics of the scheme had originally feared it would be inflexible, while colleges
said that amendments announced during the summer recess would have seen cuts of up to
50% for apprenticeships for the poorest teenagers.

MT: Podnikateľskí kritici systému sa pôvodne obávali, že bude nepružný, zatiaľ čo vysoké
školy uviedli, že novely ohlásené počas letných prázdnin by mohli viesť k zníženiu až o
50% v prípade učňovskej prípravy pre najchudobnejších tínedžerov.

[Back translation: Business critics of the scheme had originally feared it would be inflexible,
while colleges said that amendments announced during the summer recess could have seen
cuts of up to 50% for apprenticeships for the poorest teenagers.]

In the original sentence there is a backshift related to the use of indirect speech, as a
result of which the modal verb will is replaced by would, which expresses the same
meaning, i.e. prediction. However, NGT has translated the sentence using the Slovak
modal verb mohli (could), which expresses a possibility that is not present in the original
sentence. As a result, NGT has not only used a different modal verb from the one in the
original text, but has also caused a shift in meaning, which may have a crucial impact on
the message conveyed in the original sentence.

7. Conclusion
The study offers a new insight into the manual evaluation of MT quality. The results and
conclusions of the study offer one substantial theoretical contribution to the field of
translation quality assessment and two practical contributions.
The theoretical contribution lies in the proposed framework for manually evaluating
MT quality in the context of synthetic languages, which corresponds with the four
dimensions of the MQM-DQF Error Typology framework. The framework is based on Vanko's
categorical framework, but is adjusted for the purpose of classifying MT errors, which
have their own highly specific characteristics.
From a practical point of view, the findings of the study offer a closer understanding of
the errors occurring in NMT, i.e. they reveal the linguistic features of newspaper texts
machine translated from an analytical language into a synthetic language: what the
machine (NGT), based on neural networks, can and cannot translate correctly
with respect to the newspaper style in the context of the source (English) and target (Slovak)
languages. The second practical contribution, which follows on from the first, consists
in the identification of 'machine-translationese' (Loock, 2020). To our knowledge, there
are no similarly extensive, linguistically motivated studies focusing on NMT into Slovak
in which the error rate of neural machine translation is analysed in detail.
The highest number of MT errors was recorded in the sphere of Lexical semantics.
They are related to inadequate transfer of word meaning, as well as semantic
incompatibility caused by the degree of homonymy, synonymy, and polysemy in
both languages. In addition, there is an increased incidence of literal translation, as well
as MT errors in abbreviations and symbols.
In the sphere of syntactic and semantic correlativeness, the highest number of errors is
related to the category of word order. Our research has revealed that NGT was usually
able to identify the subject and the object of the sentence, but MT errors usually arose in
cases of multiple-noun phrases. NGT failed to translate these sentence elements
correctly when the head of their noun phrase was pre-modified or post-modified by
multiple-noun phrases.
In the sphere of predication, we agree with Panisova and Munkova (2021) that the cat-
egory of tense was one of the most significant due to the frequency of errors. The typical
errors occurring in NMT were related to incorrect shifts between present and past tenses
and their simple and progressive aspects.
In the sphere of compound and complex sentences, the highest number of errors was
related to the incorrect translation of subordinating conjunctions. A typical example is
the translation of the conjunction that. NMT was usually successful with English
nominal that-clauses in which the subordinating conjunction that had been omitted,
but it failed to translate that correctly when it was used as a relative pronoun. The
lowest number of errors was detected in the sphere of Modal and communication
sentence framework, which corresponds to the results of the study by Panisova and
Munkova (2021).
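The error profile summarized in the paragraphs above amounts to a frequency count of annotated error segments per sphere and category. A minimal Python sketch of such a tally follows. The annotation records are invented placeholders for illustration only, not data from the study, and the function name `error_profile` is hypothetical.

```python
from collections import Counter

# Hypothetical annotations: one (sphere, category) pair per error segment.
# These records are illustrative placeholders, not data from the study.
annotations = [
    ("Lexical semantics", "Literal translation"),
    ("Lexical semantics", "Literal translation"),
    ("Lexical semantics", "Polysemy"),
    ("Syntactic-semantic correlativeness", "Word order"),
    ("Predication", "Tense"),
    ("Compound/complex sentences", "Conjunctions"),
]


def error_profile(records):
    """Return, per sphere, its error count, share of all errors, and top category."""
    total = len(records)
    by_sphere = Counter(sphere for sphere, _ in records)
    by_pair = Counter(records)
    profile = {}
    # most_common() orders spheres from the most to the least error-prone.
    for sphere, count in by_sphere.most_common():
        # Pick the most frequent (sphere, category) pair within this sphere.
        top_pair = max(
            (pair for pair in by_pair if pair[0] == sphere),
            key=lambda pair: by_pair[pair],
        )
        profile[sphere] = {
            "errors": count,
            "share": count / total,
            "top_category": top_pair[1],
        }
    return profile


profile = error_profile(annotations)
for sphere, stats in profile.items():
    print(f"{sphere}: {stats['errors']} errors ({stats['share']:.0%}), "
          f"most frequent category: {stats['top_category']}")
```

Keeping the sphere and the category as a pair per segment means the same records can be regrouped later, for example by text or by annotator, without re-annotating.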
The research also has certain limitations: the examined texts are not extensive and come
from a single genre, and the evaluation was manual, although the annotators achieved a
high level of agreement. For this reason, in future work, we would like to focus on the
connection between manual and automatic error classifications of MT output, using
automatic MT evaluation metrics. Furthermore, we intend to extend the examined texts
(MT outputs), not only in terms of writing style, but also in terms of language pairs.
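Agreement between annotators can be quantified in several ways, and the passage above does not say which coefficient was used. As one common option, and purely as an assumption, the sketch below computes Cohen's kappa for two annotators who each assign one error category per segment. The labels are invented for illustration and `cohen_kappa` is a hypothetical helper, not part of the study.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same segments."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("Both annotators must label the same non-empty set of segments.")
    n = len(labels_a)
    # Observed agreement: proportion of segments with identical labels.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected chance agreement from each annotator's own label distribution.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used one and the same label throughout
    return (observed - expected) / (1 - expected)


# Illustrative labels only (not the study's annotations).
annotator_1 = ["tense", "tense", "word order", "word order"]
annotator_2 = ["tense", "word order", "word order", "word order"]
kappa = cohen_kappa(annotator_1, annotator_2)
print(f"kappa = {kappa:.2f}")
```

Kappa corrects raw percentage agreement for the agreement expected by chance, which matters when a few categories dominate the annotations.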

Note
1. Retrieved 27 June, 2022, from https://www.mordorintelligence.com/industry-reports/machine-translation-market

Disclosure statement
No potential conflict of interest was reported by the author(s).

Funding
This work was supported by Slovak Research and Development Agency: [Grant Number APVV-
18-0473]; Vedecká Grantová Agentúra MŠVVaŠ SR a SAV: [Grant Number VEGA-1/0809/18].

Notes on contributors
Daša Munková worked at the Department of Translation Studies, Faculty of Arts, Constantine the
Philosopher University (CPU) in Nitra from 2010 to 2021. Since 2021, she has worked as a
professor at the NLP Lab at the Department of Informatics, Faculty of Natural Sciences, CPU in Nitra. Her
research interests focus on computational linguistics, machine translation, and translation quality
assessment. The dominant area of her research interest lies in machine translation quality evalu-
ation. Currently, she is the principal investigator of the project APVV - Classification Model of
Machine Translation Error Rate: A Step Toward Objectifying the Translation Assessment. She
initiated the first meeting of the Slovak translation companies (language service providers) and
five universities in Slovakia within the Elia Exchange (EE) forum. She is a member of the
Slovak Society of Translators of Professional Literature (SSPOL).
Ľudmila Pánisová studied translation and interpreting in the Department of Translation Studies,
Faculty of Arts, CPU in Nitra, where she now works as an assistant professor. In 2012 she
successfully defended her Ph.D. thesis Stylistic Aspects of Translation. She gives lectures and
seminars in English and Slovak linguistics, the history of the English language, and non-literary
translation, and she actively translates non-literary and journalistic texts. In her academic
research, which focuses on the translation of Slovak literature into foreign languages, stylistics,
and comparative linguistics, she has cooperated with Slovak as well as foreign academic
institutions. In 2014 she published her first monograph, Slovak literature in English Translation -
Past and Present (1832-2013), which has been appreciated and acknowledged by members of the
Slovak as well as the foreign academic community and by professional translators.
Katarína Welnitzová is an associate professor with the Department of British and American
Studies, Faculty of Arts, University of Ss. Cyril and Methodius in Trnava (Slovakia). She obtained
a PhD in 2008 on the topic of Non-verbal Communication in Consecutive Interpreting. Prior to
this, she obtained an MA at Comenius University in Bratislava with research on interpreting. Her
teaching focuses on English Phonetics and Phonology, Translation, Interpreting, Non-verbal
Communication, and Computer Technologies in Translation. In her research, she examines machine
translation, evaluation of MT, and post-editing of MT in the direction English - Slovak. She is a
member of the research team coordinated by Prof. Munkova. She is the author of monographs and
articles published in scientific journals.

ORCID
Dasa Munkova http://orcid.org/0000-0002-1003-7929
Ludmila Panisova http://orcid.org/0000-0002-9081-212X
Katarina Welnitzova http://orcid.org/0000-0003-3324-8320

References
Absolon, J., Munkova, D., & Welnitzova, K. (2018). Machine translation: Translation of the future?
Machine translation in the context of the Slovak language. VERBUM, 78.
Bánik, T., Benko, Ľ, Machová, R., Munk, M., & Munková, D. (2019). Wie irrt die Maschine?
Probleme der maschinellen Übersetzung. Verlag Dr. Kovač, 204.
Benková, L., Munkova, D., Benko, Ľ, & Munk, M. (2021). Evaluation of English-Slovak neural and
statistical machine translation. Applied Sciences, 11(7), 1–17. https://doi.org/10.3390/app11072948
Biber, D., & Conrad, S. (2009). Register, genre, and style. Cambridge University Press.
Bojar, O. (2011). Analyzing error types in English-Czech machine translation. The Prague Bulletin
of Mathematical Linguistics, 95(1), 63–76. https://doi.org/10.2478/v10108-011-0005-2
Castilho, S., Doherty, S., Gaspari, F., & Moorkens, J. (2018). Approaches to human and machine
translation quality assessment. In J. Moorkens, S. Castilho, F. Gaspari, & S. Doherty (Eds.),
Translation quality assessment: From principles to practice (pp. 9–39). Springer.
Costa, Â, Ling, W., Luís, T., Correia, R., & Coheur, L. (2015). A linguistically motivated taxonomy
for machine translation error analysis. Machine Translation, 29(2), 127–161.
https://doi.org/10.1007/s10590-015-9169-0
Drugan, J. (2013). Quality in professional translation: Assessment and improvement. Bloomsbury
Academic.
Federico, M., Negri, M., Bentivogli, L., & Turchi, M. (2014). Assessing the impact of translation
errors on machine translation quality with mixed-effects models. In Proceedings of the 2014
Conference on Empirical Methods in Natural Language Processing (EMNLP). Doha, Qatar,
25–29 October 2014. Association for Computational Linguistics, pp. 1643–1653.
House, J. (2014). Translation quality assessment: Past and present. Routledge.
Hu, K. (2020). A reception study of machine translated subtitles for MOOCs [Unpublished doctoral
dissertation]. Retrieved 30 January, 2021, from http://doras.dcu.ie/24084/1/thesis_KE%20HU.pdf
Hu, K., O’Brien, S., & Kenny, D. (2020). A reception study of machine translated subtitles for
MOOCs. Perspectives: Studies in Translation Theory and Practice, 28(4), 521–538.
https://doi.org/10.1080/0907676X.2019.1595069
Kroeger, P. R. (2005). Analyzing grammar (An introduction). Cambridge University Press.
Lommel, A., Uszkoreit, H., & Burchardt, A. (2014). Multidimensional Quality Metrics (MQM): a
framework for declaring and describing translation quality metrics. Revista Tradumàtica, 12,
455–463. https://doi.org/10.5565/rev/tradumatica.77
Loock, R. (2020). No more rage against the machine: How the corpus-based identification of
machine-translationese can lead to student empowerment. The Journal of Specialised
Translation, 34, 150–170.
Munk, M., & Munkova, D. (2018). Detecting errors in machine translation using residuals and
metrics of automatic evaluation. Journal of Intelligent & Fuzzy Systems, 34(5), 3211–3223.
https://doi.org/10.3233/JIFS-169504
Munk, M., Munkova, D., & Benko, Ľ. (2018). Towards the use of entropy as a measure for the
reliability of automatic MT evaluation metrics. Journal of Intelligent & Fuzzy Systems, 34(5),
3225–3233. https://doi.org/10.3233/JIFS-169505
Munkova, D., Hajek, P., Munk, M., & Skalka, J. (2020). Evaluation of machine translation quality
through the metrics of error rate and accuracy. Procedia Computer Science, 171, 1327–1336.
https://doi.org/10.1016/j.procs.2020.04.142
Munkova, D., Kapusta, J., & Drlík, M. (2016). System for post-editing and automatic error classification
on machine translation. In DIVAI 2016: 11th International Scientific Conference on Distance
Learning in Applied Informatics, Štúrovo, Slovakia, May 2–4 2016; Nitra: UKF, pp. 571–581.
Munkova, D., & Munk, M. (2014). An automatic evaluation of machine translation and Slavic
languages. In 2014 IEEE 8th International Conference on Application of Information and
Communication Technologies- AICT2014: Kazakhstan, Astana, 15–17 October 2014, Astana:
IEEE, 2014, pp. 447–451.
Munkova, D., & Munk, M. (2015). Automatic evaluation of machine translation through the
residual analysis. In ICIC 2015 Advanced Intelligent Computing Theories and Applications, PT
III: Lecture Notes in Artificial Intelligence 9227. Lecture Notes in Computer Science, Berlin:
Springer Verlag, pp. 481–490.
Munkova, D., & Munk, M. (2016). Evalvácia strojového prekladu. UKF, 173.
Munkova, D., Munk, M., Kapusta, J., & Reichel, J. (2016). Evaluation of machine translation
output in context of inflectional languages. In IEEE 2016: 10th International Conference on
Application of Information and Communication Technologies (AICT), Baku 12–14 October
2016, Baku: IEEE, pp. 85–90.
Munkova, D. & Vanko, J. (Eds.). (2017). Mýliť sa je ľudské (ale aj strojové): analýza chýb strojového
prekladu do slovenčiny. UKF, 260.
Nunes Vieira, L., & Alonso, E. (2020). Translating perceptions and managing expectations: An
analysis of management and production perspectives on machine translation. Perspectives:
Studies in Translation Theory and Practice, 28(2), 163–184.
https://doi.org/10.1080/0907676X.2019.1646776
Nurminen, M., & Koponen, M. (2020). Machine translation and fair access to information.
Translation Spaces, 9(1), 150–169. https://doi.org/10.1075/ts.00025.nur
Panisova, L., & Munkova, D. (2021). Špecifiká strojového prekladu publicistických textov z
anglického do slovenského jazyka. In FORLANG: Cudzie jazyky v akademickom prostredí,
23–24 júna 2021. Technická univerzita v Košiciach, pp. 281–290.
Popović, M. (2018). Error classification and analysis for machine translation quality assessment. In
J. Moorkens, S. Castilho, F. Gaspari, & S. Doherty (Eds.), Translation quality assessment: From
principles to practice (pp. 129–158). Springer.
Schmid, H. (1994). Part-of-speech tagging with neural networks. In Proceedings of the
International Conference on Computational Linguistics, Kyoto, pp. 172–176.
Siegel, S., & Castellan, N. J., Jr. (1988). Nonparametric statistics for the behavioral sciences (2nd ed.).
McGraw-Hill Book Company.
TAUS. (2015). DQF and MQM harmonized to create an industry-wide quality standard – TAUS.
Retrieved 30 January, 2021, from
https://www.taus.net/academy/news/press-release/dqf-and-mqm-harmonized-to-create-an-industry-wide-quality-standard
Vanko, J. (2017). Kategoriálny rámec pre analýzu chýb strojového prekladu. In D. Munková, & J.
Vaňko (Eds.), Mýliť sa je ľudské (ale aj strojové) (pp. 83–100). UKF v Nitre.
Vardaro, J., Schaeffer, M., & Hansen-Schirra, S. (2019). Translation quality and error recognition
in professional neural machine translation post-editing. Informatics, 6(3), 41.
https://doi.org/10.3390/informatics6030041
Vičič, J., Kuboň, V., & Homola, P. (2017). Česílko goes open-source. The Prague Bulletin of
Mathematical Linguistics, 107(1), 57–66. https://doi.org/10.1515/pralin-2017-0004
Vilar, D., Xu, J., D'Haro, L. F., & Ney, H. (2006). Error analysis of statistical machine
translation output. In Proceedings of LREC2006, Genoa, Italy, 22–28 May 2006. European Language
Resources Association, pp. 697–702.
Way, A. (2018). Quality expectations of machine translation. In J. Moorkens, S. Castilho, F.
Gaspari, & S. Doherty (Eds.), Translation quality assessment: From principles to practice (pp.
159–178). Springer.
Welnitzová, K., Jakubičková, B., & Králik, R. (2021). Human-Computer interaction in translation
activity: Fluency of machine translation. RUDN Journal of Psychology and Pedagogics, 18(1),
217–234. https://doi.org/10.22363/2313-1683-2021-18-1-217-234
Welnitzová, K., & Munkova, D. (2021). Sentence-structure errors of machine translation into
Slovak. Topics in Linguistics, 22(1), 78–92. https://doi.org/10.2478/topling-2021-0006
Wrede, O., Munkova, D., & Welnitzová, K. (2020). Effektivität des post-editings maschineller
Übersetzung: Eine Fallstudie zur Übersetzung von Rechtstexten aus dem Slowakischen ins
Deutsche. Lingua et Vita, 9(17), 117–127.
