
See discussions, stats, and author profiles for this publication at: https://www.researchgate.net/publication/272102857

Oakes, Michael P., and Meng Ji, eds. 2012. Quantitative Methods in Corpus-
Based Translation Studies: A Practical Guide to Descriptive Translation
Research

Article in Target · January 2015


DOI: 10.1075/target.27.1.13zan


Federico Zanettin
Università degli Studi di Perugia


John Benjamins Publishing Company

This is a contribution from Target 27:1


© 2015. John Benjamins Publishing Company
This electronic file may not be altered in any way.
The author(s) of this article is/are permitted to use this PDF file to generate printed copies to be
used by way of offprints, for their personal use only.
Permission is granted by the publishers to post this file on a closed server which is accessible
only to members (students and faculty) of the author’s/s’ institute. It is not permitted to post
this PDF on the internet, or to share it on sites such as Mendeley, ResearchGate, Academia.edu.
Please see our rights policy on https://benjamins.com/#authors/rightspolicy
For any other use of this material prior written permission should be obtained from the
publishers or through the Copyright Clearance Center (for USA: www.copyright.com).
Please contact rights@benjamins.nl or consult our website: www.benjamins.com
Oakes, Michael P., and Meng Ji, eds. 2012. Quantitative Methods in
Corpus-Based Translation Studies: A Practical Guide to Descriptive
Translation Research. Studies in Corpus Linguistics 51. Amsterdam: John
Benjamins. X + 361 pp. ISBN 978-90-272-0356-4. € 99.00 / US$ 149.00
(HB)
Reviewed by Federico Zanettin (University of Perugia, Italy)

The purpose of this volume is to “provide a comprehensive guidebook to the essential
quantitative methods in corpus-based translation studies” (vii), thus filling
a gap in the literature, which still lacks a systematic description of how statistical
techniques used in corpus linguistics can be applied to translation research. The
volume is introduced by a short preface and divided into four sections: “theoreti-
cal explorations,” “essential corpus statistics,” “quantitative explorations of literary
translation” and “quantitative explorations of translation lexis,” for a total of 13
chapters. Statistical tables are provided as appendixes, as are a keyword and author
index.
Chapter 1 by Lewandowska-Tomaszczyk proposes an approach termed
CogCorpLing (Cognitive Corpus Linguistics). The data discussed derive from
three parallel texts in two directions of translation and belonging to different
genres (a technical guide book, a romantic novel and a collection of poetry). First,
basic statistical analyses of word frequencies and keyword lists are used to develop
and compare textual profiles of the source and target texts. Then, the author ex-
amines verbs of emotion and the constructional, collocational and metaphorical
contexts in which they appear in source texts and translations, compared to their
behavior in reference corpora in the two languages. She is thus able to show how
similarity relations are established in translation between lexical and conceptual
clusters rather than between single words, providing quantitative evidence of
reconceptualization in translation. Chapter 2 by Gries and Wulff aims to demonstrate
how the statistical family of methods called regression analysis can be
applied to corpus-based translation research to test the strength of correlations
between dependent and independent variables. The authors use corpus data re-
garding clause position (main vs. adverbial) and type (causal vs. temporal) and
sentence length to show how these factors are correlated and how the correlation
may change across languages. After a brief theoretical introduction to the method
and a description of the data used, the authors show how the data can be pro-
cessed using the open source programming language and statistical package R.

Target 27:1 (2015), 138–144.  doi 10.1075/target.27.1.13zan


issn 0924–1884 / e-issn 1569–9986 © John Benjamins Publishing Company

In Chapter 3 Meng Ji provides a corpus-based stylistic study of two late twentieth-century
Chinese translations of Cervantes’s Don Quijote. More specifically, she
compares the distribution of archaisms in Don Quijote’s speeches in two transla-
tions and in the source text by carrying out a linear regression analysis of their
frequency. The aim is to establish which Chinese translation follows more closely
the pattern of use of archaisms in the original, and the results are subjected to a
number of statistical tests to determine whether the correlations found are sta-
tistically significant. Ji then examines the frequency of use of Chinese idiomatic
expressions in the two translations and relates it to language change, and shows
how the two translations follow the same pattern observed in two similarly time-
distanced Chinese reference corpora.
Chapter 4 by Hareide and Hofland provides a detailed and informative pre-
sentation of the steps followed in the compilation of a Norwegian-Spanish parallel
corpus (NSPC). Basic corpus statistics such as word and sentence number and
length are presented, as are possible research questions to be pursued on the basis
of these. The authors then set out to find whether variables such as genre or trans-
lator’s gender show a statistically significant correlation with sentence length and
alignment type. The corpus (NSPC) will be used in conjunction with the similarly
designed English-Spanish parallel corpus P-ACTRES (Izquierdo et al. 2008), for
instance to test hypotheses concerning universal features of translation.
In Chapter 5 Oakes provides an overview of descriptive statistics methods and
tests frequently used in corpus linguistics which, he suggests, can be usefully ap-
plied to the study of corpora of translated texts, for instance to describe quantita-
tive differences between different translations of the same source text. The author
explains how to use quantitative summary data as well as visual representations.
Basic concepts and methods are carefully exemplified, while others are presented
in less detail and may thus be less easily digested. In Chapter 6 Ke describes how
clustering techniques can be used to discriminate among, or group together, texts
in a corpus. He begins by explaining what clustering is and the different steps
involved in a clustering task, and provides details of statistical techniques which
can be applied to each task when dealing with textual documents. The methods il-
lustrated are then applied to a corpus of student translations, showing that clusters
created using statistical techniques tend to match clusters derived from human
evaluation of translation quality.
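
By way of illustration, one of the tests covered in this part of the volume, the chi-squared test, can be sketched in a few lines of Python. This is only a minimal sketch for a 2×2 table without continuity correction; the function name and the counts are hypothetical and are not taken from the volume.

```python
def chi_squared_2x2(a, b, c, d):
    """Pearson chi-squared statistic (no continuity correction) for the table
    [[a, b], [c, d]], e.g. row 1 = translation A, row 2 = translation B,
    column 1 = occurrences of a word, column 2 = all other tokens."""
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column total must be non-zero")
    return n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)


# Hypothetical counts: a word occurs 30 times in 1,000 tokens of one
# translation and 10 times in 1,000 tokens of another.
stat = chi_squared_2x2(30, 970, 10, 990)
print(round(stat, 2))  # prints 10.2
```

With one degree of freedom, a value above 3.84 is conventionally taken as significant at the 0.05 level, so in this invented case the difference in frequency would count as statistically significant.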
The second part of the volume begins with Chapter 7 by Ji and Oakes, which
presents a study of different English translations of a Chinese classic novel. The
authors provide background information on the novelist, the novel and the trans-
lators, and explain how the texts were annotated with grammatical and semantic
information. They illustrate the tests which were used to assess the statistical significance
of measures of difference between the translations in terms of sentence
length and frequency of fixed phrases and of words of ‘emotion’ and ‘value.’ They
explain which tests should be used according to the type of data, and provide for-
mulas and R command lines to replicate them. In Chapter 8 Patton and Can draw
on authorship attribution and stylometry studies to investigate the relationship
between James Joyce’s Dubliners and a Turkish translation of it using five style
markers, that is sentence length, most frequent words, length of types and tokens
and their ratio. The aim is to identify translation invariant characteristics, that is
how closely the distribution of these markers in the translation resembles that in
the original text, given the features specific to each language. The statistical tech-
nique adopted, called discriminant analysis, “uses the information available in a
set of independent variables to predict the value of a categorical dependent vari-
able” (218). The authors suggest that this technique provides a consistency check
which can be used for plagiarism detection, “where the potential plagiarized copy
can be assumed to be a translation of the original” (227).
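
Style markers of this kind are simple to compute. The following Python sketch derives three of them from a raw text; the tokenization rules and the function name are simplifying assumptions made here for illustration and do not reproduce the procedure followed by Patton and Can.

```python
import re


def style_markers(text):
    """Return mean sentence length (in tokens), type/token ratio and
    mean token length (in characters) for a raw text.
    Assumption: sentences end at ., ! or ?; tokens are runs of word characters."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    tokens = re.findall(r"\w+", text.lower())
    if not sentences or not tokens:
        raise ValueError("text must contain at least one token")
    types = set(tokens)
    return {
        "mean_sentence_length": len(tokens) / len(sentences),
        "type_token_ratio": len(types) / len(tokens),
        "mean_token_length": sum(len(t) for t in tokens) / len(tokens),
    }


markers = style_markers("The cat sat. The dog sat down.")
print(markers)
```

Since the type/token ratio depends on text length, such values would normally be compared only across samples of similar size.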
In Chapter 9, Rybicki also uses stylometrics to compare translations of the same
and different authors by the same and different translators into the same and different
languages. More specifically, he uses Burrows’s Delta, a multivariate analysis method
which takes into account the frequencies of the most frequent words. By applying
an R script he developed himself to a quite sizable corpus of literary translations,
the author is able to visualize text similarity as cluster analysis tree diagrams,
showing that this method groups together works by the same author, while it generally
fails to identify works by the same translator.
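
The general idea behind Delta can be sketched as follows: the relative frequencies of the most frequent words are standardized as z-scores across the corpus, and the distance between two texts is the mean absolute difference between their z-scores. The Python code below is a simplified illustration and not Rybicki’s R script; the function name, the use of the population standard deviation and the toy data are assumptions made here.

```python
from collections import Counter
from statistics import mean, pstdev


def burrows_delta(texts, n_words=3):
    """texts: dict mapping a text name to a non-empty list of tokens.
    Returns a dict mapping each pair of names to its Delta distance."""
    counts = {name: Counter(tokens) for name, tokens in texts.items()}
    total = Counter()
    for c in counts.values():
        total.update(c)
    # The n most frequent words in the corpus as a whole
    vocab = [w for w, _ in total.most_common(n_words)]
    # Relative frequency of each of these words in each text
    rel = {name: [counts[name][w] / len(texts[name]) for w in vocab]
           for name in texts}
    # z-scores per word (assumption: population standard deviation)
    z = {name: [] for name in texts}
    for i in range(len(vocab)):
        column = [rel[name][i] for name in texts]
        mu, sigma = mean(column), pstdev(column)
        for name in texts:
            z[name].append((rel[name][i] - mu) / sigma if sigma else 0.0)
    names = list(texts)
    return {(a, b): mean(abs(x - y) for x, y in zip(z[a], z[b]))
            for i, a in enumerate(names) for b in names[i + 1:]}


# Toy data: A and B have identical word distributions, C differs.
toy = {"A": "the a the of".split(),
       "B": "the a the of".split(),
       "C": "of of of a".split()}
print(burrows_delta(toy))
```

In this toy example texts A and B are at distance zero from each other and at a positive distance from C; a matrix of such pairwise distances is the kind of input from which cluster analysis tree diagrams are drawn.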
Chapter 10 by Ji is an investigation into the history of translation from
European languages into Chinese. More specifically, the author uses a corpus of
five nineteenth century dictionaries in a pilot experiment which aims to test the
validity of statistical methods for the study of patterns of development in early
Chinese scientific translation. It is argued that scientific translation was one of
the main factors responsible for the systematic introduction of disyllabic words in
modern Chinese, as opposed to the lexis of ancient Chinese which was predomi-
nantly monosyllabic. The statistical technique of Hierarchical Cluster Analysis
(HCA) is used to compare the frequencies of different-length tokens, and the dis-
tribution of functional particles (similar to affixes in inflectional languages) in the
five texts. It is shown how HCA can be used to obtain a binary classification of the
descriptive categories into which functional characters extracted from the corpus
were initially manually classified.
In Chapter 11 Sotov uses a trilingual parallel corpus in which German and
Russian translations are aligned with original ancient Indian cultic poetry. He
looks at the strategies used to translate Vedic ambiguous proper names of mythological
figures, and correlates strategy type and degree of consensus among translators
with ambiguity, defined as a function of variable contextual ‘constraints’
determined by ‘location’ (whether a term occurs in ‘core’ or ‘peripheral’ texts)
and ‘co-text’ (that is by the frequency of the words which accompany a term). A
statistically significant correlation is found between the degree of ambiguity of a
source term and the strategy used to translate it: the two translators usually agreed
on a transcription strategy when confronted with intertextually and contextually
less ambiguous terms, and on an adjustment strategy in the case of more ambigu-
ous contexts. In Chapter 12 Jenset and McGillivray investigate the extent to which
three different multivariate analysis methods are suited for investigating the use of
derivational suffixes in a corpus of translations into English (TEC, Laviosa 1997,
Baker 1999). The authors explore suffix productivity, i.e., the distribution of four
English suffixes which appear frequently in hapax legomena, and the general dis-
tribution of suffixes in the corpus. A correlation is found with text type, while none
is found for the variables source language and translator’s cultural/linguistic back-
ground. Since both indexes for suffix productivity and distribution in translated
texts match those which were found to characterize general English in a study
using the same methods (Baayen 1994), the authors suggest that this is in line with
the proposed translation universal of normalization (Baker 1993).
Finally, in Chapter 13 De Sutter, Delaere and Plevoets contend that studies
of translation universals have so far usually tended to be monodimensional, that
is they have compared translated and non-translated texts focusing on a single
linguistic feature and without taking into proper account the influence of vari-
ables such as text type and source language. They thus propose to test whether
one non-controversial hypothesis, conservatism (Kenny 2001, aka normalization),
here defined in terms of a higher level of lexical formality, holds independently
of these variables. Using advanced statistical techniques such as correspondence
and regression analysis the authors compute ten lexical variables (‘profiles’) and
correlate them to eight varieties (‘lects’) corresponding to five text types, in order
to establish degrees of lexical formality. The data are taken from the bidirectional
Dutch Parallel Corpus (Macken et al. 2011). The results show that translated texts
behave differently from non-translated texts, but also that much variation is cor-
related with text type and source language.
The essays are generally well written and edited, notwithstanding the occa-
sional typo, missing reference, or inconsistency. For instance, in the first chapter
some of the data discussed in the narrative are not found in the tables, and in one
case they are even at odds with them. That is, while the author states that all the
translations examined are longer (i.e., contain more tokens) than the respective
source texts, the figures provided in the associated table indicate otherwise (7). In
Chapter 7 a study by Li et al. (2011) is reported as dealing with translation univer-
sals, whereas the authors are not in fact concerned with this topic (178).


This volume draws attention to the intersection between corpus-based translation
studies and corpus statistics by including articles which would otherwise
probably be found scattered in computational linguistics journals and conference
proceedings, and could thus go largely ignored by ‘mainstream’ corpus-based trans-
lation studies. Conversely, while some articles are conversant with such literature,
others show little familiarity with it. Thus, the volume seems to fall somewhere in
between the two intended readerships: on the one hand corpus and computational
linguists interested in translation studies; on the other researchers in (descriptive)
translation studies who use corpus linguistics as a methodological tool and are
interested in gaining a deeper knowledge and understanding of statistical meth-
ods. The first type of readers may better appreciate technical details, though they
will probably find some methodological explanations redundant and perhaps the
exposure to the concerns of corpus-based translation studies literature too limited.
Most of the second type of readers will instead not be able to identify a clear pro-
gression in the ordering of chapters and will be confronted with a very steep learn-
ing curve. Thus, while the procedures used in the first chapter do not appear to be
too cognitively demanding, a knowledge of essential statistical methods, concepts
and terminology seems to be taken for granted in the second chapter, which will
simply go over the head of anyone without a basic but sturdy knowledge of statis-
tics. In Chapter 6 some methods and tests are explained in detail and through ex-
amples, and will thus be understood by readers without any mathematical and sta-
tistical training, whereas others assume some knowledge of statistics and will thus
be opaque to readers lacking that background. Some data, concepts and methods
used in earlier chapters are only explained in more detail in later ones, without
explicit cross-references. For example, the data used to exemplify the methods de-
scribed in Chapter 5, different English translations of the Chinese novel Dream of
a Red Chamber, are properly accounted for only in Chapter 7. On the other hand,
the explanation of statistical methods found in Chapter 5 partly overlaps with the
explanation of some of the same methods (e.g., tf-idf and chi-squared) found in
the following chapter. Some statistical methods and tests are carefully illustrated
in non-statistical jargon, while others are simply introduced as “commonly used”
or “best” for one type of assessment, and explained in rather technical terms, e.g.,
“the statistics is largely based on the squared vertical distance between the points
on the curves for each type or token length” (224). In such cases, as new tests, for-
mulas, tables, diagrams and graphs are introduced, the reader with no mathemati-
cal or statistical training will have to take the author(s)’ word on their usefulness,
but will probably feel baffled and somewhat frustrated not to be able to grasp the
rationale behind them. Some research questions (especially in the more technical
chapters) may appear to be formulated in order to apply statistical methods and
techniques, rather than the other way around.


This volume is a thematically coherent collection of articles and an important
contribution to corpus-based translation studies. Altogether, it makes a convincing
argument for the need to assess the statistical significance of theoretical claims
based on quantitative corpus data; it offers an extensive survey of descriptive and
exploratory statistical methods which can be applied to translation studies; it
provides an introduction to software tools and techniques which can be used to
generate graphs and perform calculations; it suggests a range of insights on inves-
tigations which can be carried out through statistical means. However, given the
interdisciplinary nature of translation studies, the volume would have benefited
from a more down-to-earth approach guiding the reader more closely through the
various steps needed to understand the role which can be played by corpus statistics.
This might have been done by a proper introduction acquainting the readers
with the main concepts and methods of corpus statistics as applied in the various
chapters and in translation studies more generally, rather than scattering
such information throughout the volume, which thus falls short of expectations
as a proper guidebook.

References

Baayen, R. Harald. 1994. “Derivational Productivity and Text Typology.” Journal of Quantitative
Linguistics 1 (1): 16–34. DOI: 10.1080/09296179408589996
Baker, Mona. 1993. “Corpus Linguistics and Translation Studies: Implications and Applications.”
In Text and Technology: In Honour of John Sinclair, ed. by Mona Baker, Gill Francis, and
Elena Tognini-Bonelli, 233–250. Amsterdam: John Benjamins. DOI: 10.1075/z.64.15bak
Baker, Mona. 1999. “The Role of Corpora in Investigating the Linguistic Behaviour of
Professional Translators.” International Journal of Corpus Linguistics 4 (2): 281–298. 
DOI: 10.1075/ijcl.4.2.05bak
Izquierdo, Marlèn, Knut Hofland, and Øystein Reigem. 2008. “The ACTRES Parallel Corpus: An
English–Spanish Translation Corpus.” Corpora 3: 31–41. DOI: 10.3366/E1749503208000051
Kenny, Dorothy. 2001. Lexis and Creativity in Translation: A Corpus-based Study. Manchester:
St. Jerome.
Laviosa, Sara. 1997. “How Comparable Can ‘Comparable Corpora’ Be?” Target 9 (2): 289–319.
DOI: 10.1075/target.9.2.05lav
Li, Defeng, Chunling Zhang, and Kanglong Liu. 2011. “Translation Style and Ideology: A
Corpus-assisted Analysis of Two English Translations of Hongloumeng.” Literary and
Linguistic Computing 26 (2): 153–166. DOI: 10.1093/llc/fqr001
Macken, Lieve, Orphée De Clercq, and Hans Paulussen. 2011. “Dutch Parallel Corpus: A Balanced
Copyright-Cleared Parallel Corpus.” Meta 56 (2): 374–390. DOI: 10.7202/1006182ar


Reviewer’s address
Federico Zanettin
Department of Political Sciences
University of Perugia
Via A. Pascoli 23
06123 Perugia
Italy
federico.zanettin@unipg.it
