Download as pdf or txt
Download as pdf or txt
You are on page 1of 6

Translation Memory Database in the Translation Process

Sanja Seljan, Ph.D.


Faculty of Humanities and Social Sciences, Department of Information Sciences
Ivana Lučića 3, 10 000 Zagreb, Croatia, tel/fax: +385 1 600 24 31
sseljan@ffzg.hr

Damir Pavuna, M.Sc.


Integra d.o.o., A. Stipančića 18, 10 000 Zagreb, Croatia, tel:+385 1 38 33 447
damir.pavuna@integra.hr

Abstract. Translation memory (TM) today widely condition to be used on certain type of text of
used Computer-Assisted Translation (CAT) tool, considerable volume. It is consistent, fast but the
based on matching of source and target language biggest lacks of TMs are non-existence of knowledge
segments is intended for future reuse of already of the language to be translated and context
translated domain restricted type of texts. It is insensitivity.
consistent, fast and cost-saving but the biggest lacks In spite of this TMs have been implemented in
are non-existence of language knowledge and context various multinational companies and EU, often with
insensitivity. Often integrated with machine MT software and other CAT tools, where the
translation (MT) software, TMs have been translation process is integrated into document
implemented in various multinational companies and processing workflow.
EU. In the paper the experience in TM use is presented
The neural network approach is judged and in translation to Croatian on the text of the restricted
confronted to the knowledge based approach. domain, with respect to 10 years experience and ever
Experience of TMs use in Croatian organizations is growing TM. Necessity of using some kind of
presented and suggestions for further integration into language knowledge is discussed and solutions are
document processing workflow given. proposed.

Keywords. Translation Memory (TM), database, 2. Translation demands in EU


Computer-Assisted Translation (CAT) Tools,
translation unit, cost-saving, speed, consistency, Translation demands in EU surpass human
terminology, alignment, languages. capacities from several sides (huge number of pages
to be translated every day, 20 official languages, need
1. Introduction for consistency, for data-sharing, extremely short
deadlines, insufficient number of translators,
Everyday need for quick access to information and considerable volumes of text, translation costs,
documentation, translation into other languages, specialized training courses, requirements for every
increasingly short time of product life and new member state, etc. ref [18] this is the area where
documentation production, translation needs in systematic use of CAT and MT tools could facilitate
business, education, science and culture have caused written communication by offering fast and cost-
use Computer-Assisted Translation (CAT) tools in the effective raw translations.
translation process. With the last enlargement of EU consisting of 25
Translation memory (TM), in its simplest form a member countries and 20 languages (Czech, Danish,
database, is today widely used Computer-Assisted Dutch, English, Estonian, Finnish, French, German,
Translation (CAT) tool intended for future reuse of Greek, Hungarian, Italian, Latvian, Lithuanian,
already translated texts. Based on matching source Maltese, Polish, Portuguese, Slovak, Slovene,
and target language segments, TM does not translate Spanish, Swedish), the complexity of translation goes
anything by itself and differs from the Machine up to 20 x 19 = 380 language combinations. With
Translation (MT) software. such an overload of translation work, use of CAT
In the TM new segments are compared to the tools, among which translation memories (TMs), has
database content, and the resulting output (exact, been recommended by Directorate-General for
fuzzy or no match) is reviewed and finalised by the Translation of the European Commission.
translator. Need to overcome language barriers that exist not
Primary reasons for implementing TM technology only in EU, but also in national government
are consistency, speed and cost-saving, but under institutions, agencies, scientific, business and
educational institutions, in industry, multinational
associations, etc. show that use of technology in the analysis goes to EURODICAUTOM, the central
translation process of the domain specialized text terminology database which is redesigned to a modern
have become necessary relational database management system. Retrieved
terminology can be also used in TWB environment
3. Integrated TMs using MULTITERM terminology base.
The best results are obtained for the language pairs
The main body responsible for translation and that combine TM (TWB) and Systran [19] such as En-
written communication, Directorate-General for Fr, En-Es, En-It, Fr-En, Fr-Es, Fr-It, Es-En, Es-Fr,
Translation (DGT) of the European Commission (EC) with the main idea of using MT system Systran when
employs more than 2000 professionals has there is no satisfactory proposition from the TM
recommended use of CAT/MT tools and methods in (TWB) database.
order to cope with overload of translation work:
a) Administration and document tools – for 4. Translation Memory
electronic submission translation requests and
integration into electronic archiving system In the TM the efficient translation is achieved
b) Use of CAT tools such as: through three main functions:
- Terminology tools MAT tool breaks texts into segments (sentences or
- TM at two levels: local and central sentence fragments) and presents the segments in a
- Machine translation convenient way, to make translating easier and faster.
c) Voice recognition In some tools each segment is presented in a special
According del Pino [4], six institutions and bodies box and the user can accept the offered translation,
of the EU (Council, Commission, Court of Auditors, modify it or enter the new one.
Economic and Social Committee, Committee of the - Translation of each segment is saved together
Regions, Translation Centre) have published a tender with the source text. Source text and translation
in 1996 for the acquisition of a software package are treated as translation units (TUs). One can
providing "integrated translation support tool", i.e. return to a segment at any time to check the
translation memory software integrated with word- translation. There are special functions which
processing programs, with text alignment utility and a help navigating through the text and finding
terminology management program. segments which need to be translated or revised
For the reasons of short deadlines, expenses, large (quality control).
volumes and need for consistency EU has integrated - The main function of a MAT tool is to save the
all modules of the translation process and had added translation unit in a database, called TM which
the translation memory into EURAMIS system. This can be re-used for any other text or even in the
type of integration into "Translation Workstation" same text. Through special "fuzzy search"
discovers the accelerated need for integration of features the search functions of MAT tools can
communication tools, especially translation tools into also find segments which do not match 100%.
organisational workflow. This saves time and effort and helps the translator
The acronym EURAMIS, standing for European to use terminology more consistently.
Advanced Multilingual Information System used in - Along with storing of aligned segments of source
the Commission's Translation Service is e-mail base and target languages, translation memory grows,
client-server application that gives access to a number and the result of the segment to be translated can
of language tools and services and is based on the be offered in the form of exact match, fuzzy
following principles [2]: match or when no match is found.
- Storage of linguistic resources of general interest
in one place (Linguistic Resources Database, 4.1. Reasons to use
LRD) in order to make them available to all users
in the translation service Need for consistent use of terminology, data-
- One internal format because of easier exchange sharing of common resources, re-use of already
of results translated and revised text suggest used of Translation
- Mass treatment of linguistic data on a server by Memory, in its simplest form a database.
modular programs TM is organised as a database, where each source
Integration of machine translation software language segment is stored together with target
(SYSTRAN) with translation memory (TWB – language segment, which is called "translation unit".
TRADOS) combines both types of advantages TMs can be organised for numerous language
depending on the users' preferences and on the type of pairs and most often are integrated with terminology
document to be translated, while post-editing can be bases, alignment system and word-processor, but also
done in a Word format. with machine translation systems, as in EU translation
Automatic lookup of the terminology in Systran workflow. TMs are intended to augment efficiency,
goes first to very powerful morphological analyser consistency, reduce costs, enable data-sharing, and
that is used as a lemmatiser and the result of this not to replace the human.
"MultiTerm", then "Transit" TM system by Star used
4.2. Output with "Termstar", then "Déjà vu" by Atril, "TM2" by
IBM, "XL8" by Globalware, among which Trados
TM does not translate anything by itself and merged with SDL into SDLX currently has
differs from the MT software. In the TM new competitive advantage.
segments are compared to the database content and In order to allow better interoperability between
any new sentence that has to be translated is firstly TM systems, the special group of the LISA
searched in the database and the matched value is (Localisation Industry Standards Association) named
calculated (it can be defined by the user). The OSCAR group (Open Standards for Container/
resulting output (exact, fuzzy or no match) is Content Allowing Re-use) has defined standards for
reviewed and finalised by the translator. If the mach the development and maintenance of open standards
value is 100% (i.e. exact mach) then the translation is [15] for the language industry which include:
inserted from the database into the text. If the match - TMX (Translation Memory eXchange) format –
value is below 100% but above the defined the standard for the exchange of translation
percentage (i.e. “fuzzy mach”) the user can accept the memory data
offered translation and correct it. Language segments - TBX – the standard for exchange of structured
whose values are below the defined margin have to be terminological data
translated manually and then new proposals will be - SRX – the standard for exchange of information
stored for future use. about how translation tools segment text
In the case of already existing source and target TMX is a certifiable standard format (since 1998),
language texts, the text can be aligned on the level of created to allow easier exchange of TM data between
word, phrase, clause, sentence or even paragraph and tools and/or translation vendors with little or no loss
then integrated into TM dbase. In the TM system, text of critical data during the process. TMX provides a
information system is separated form the file standard XML format for the representation of
formatting code, so the original page layout can be Translation Memory (TM) data. According to
saved. OSCAR research group, TMs database represent
strategic corporate assets, of worth for international
business and TMX is actually a way to protects them
4.3. Type of text
against market and technology changes. It also
indicates that although TMX is free and open format,
Primary reasons for implementing TM technology only software products that have been certified as
are consistency, speed and cost-saving, but under conforming to TMX by LISA's testing lab are
certain conditions. Besides the basic condition for the licensed to carry TMX logo.
source text to be in electronic form, and cleaned from
miss spellings, only certain types of texts are suitable
for translation using TM. There are several basic 5. Pro and con of TMs
conditions all of whom should be confirmed in order
to proceed with TM translation: As translation work done by TM is much faster,
- Repetitive type of documentation of the same the resulting profitability of the TM system is
domain that tends to appear regularly or at least measured through lower costs, shorter time,
several times, so the old versions could be reused terminology consistency and saved layout of the page,
- Larger volumes of text (e.g. manuals, decisions, and one of the key advantages is that translation units
technology reports, catalogues, etc.) in TM can be organised as pairs among number of
- Style of the text having many repetitions (terms, languages.
phrases or sentences repeat and could be re-used Despite of mentioned savings, the hidden costs
in new translation of the same domain) should be also mentioned, such as costs of software,
containing specialized vocabulary, short and maintenance, education for work with TM system,
simply structured sentences with not many building of glossaries and TM revision. The biggest
pronouns and adjectives, standardized lacks of TMs are non-existence of knowledge of the
terminology, universal tables and graphics language to be translated and context insensitivity.
- Need for consistency of terminology Among obstacles in use of TMs, several ones
- Short deadlines could appear [10]:
- When professional classic type of translation - It is possible that TM tools do not fit into existing
takes too long translation or localization process regarding
approval of translation changes
- Customization of the TM systems and sufficient
4.4. Format
training
- Significant investment (purchase of software,
There are different practical solutions offered by
importing of past translations into TM database
different companies, such as "Translator's
(i.e. alignment process), training, if necessary
Workbench" by Trados used together with the
additional terminology tools, maintenance costs
corresponding terminology management solution
(upgrading with memory, fast network card, hard - How to organise data-sharing of translation
disk) project among several translators during
- Protection of TM investment by developing interactive work (simultaneously, updating TM
proper strategy for maintaining TM database and approved changes)
(data on frequency of updated, regular - Scanning the document for references pointing to
distribution, backup), ownership of the TM extra documents
(regulated by agreement), confidentiality, support - Can translators cope with augmented on-screen
of the TMX format information flow?
- Legalities – whether the intellectual property of - As integration of revisions into the TM workflow
the TM belongs to freelancer, vendor or end- is of special importance for the translation in
client question and for next ones, the following
questions should be considered:
6. TMs in organizational workflow - Will reviser update TM used by translator asking
him for permission
When considering introduction of large-scale of - Who takes responsibility for the final version of
TM software, even if the organisation is entirely the TM and of the document
based on translation experience and organisationally - Necessity to include labels when changes in the
independent, according del Pinto [4] the following document or TM are made (person, date,
should be taken into consideration: document – most TM tools have all these data in
After considering mentioned demands for type of the structure of every change made in TM
text (volume, repetitiveness, consistency, cost- database – they have the trace of changes)
effectiveness, time), the next stage would be building - How is the backup performed (it is very
of TM database from the existing data after important to have periodical backup and labelled
considering the following questions: TM history)
- Use of existing data – are they usable with a Our experience in doing revisions it is that revised
reasonable degree of effort, particularly whether TM becomes Read Only TM and every new
original text and translation exist in electronic translation must pass revision to become Read Only
form and in the same format. If not, is scanning TM that has the permission to be used. For every
of some key documents a reasonable alternative language there is one translator that is in charge for
and if yes does the translation organisation have a TM and does the revisions and approves the TM to
convenient system for retrieval of all necessary become read only.
documents?
- Can alignment be lightly edited or must be of 7. TMs in Croatian organizations
perfect quality? How much time does it take or is
it better to subcontract it? Unfortunately there is no reliable information
- If suitable data do not exist or are not worth about the use of TM and CAT tools in Croatia.
converting, will translators accept working with Implementation of CAT tools requires organizational,
TMs without any significant benefits at the educational and professional changes, so not many
beginning and will TM be result of their companies have implemented it jet. According to
interactive work within reasonable time? information that Integra d.o.o. has from their long
According to our experience another very experience in the field, and as Trados exclusive dealer
important fact is the text repetition. The organisation for Croatia, there are no more then 10 companies that
which has a lot of similar translations can seriously have more then one TM workstation and 200
consider buying TM product. For instance technical freelance translators that use TM to help them
manuals or agreement revisions are ideal for TM use producing translations. MAT tools are rather often in
because every new document has many same or use but mostly as a little bit more intelligent
slightly changed sentences and savings are significant. dictionaries and not as tool that produces translations!
On the other hand TM has noting to do with According to our experience in using CAT tools
translating classical literature. It is very important (more the 10 years of using CAT tools, mostly
because no translator is going to make this big effort Trados) especially TM, this is a very useful tool that
to implement TM in his work if there is no evident can really drop down your costs, increase your
savings in his work to be done. productivity and quality of translation. When talking
When working with TM the following should be about technical manuals and their translation (e.g.
considered, the main question is a way to organise localisation of IT industry manuals) our analysis have
preparation of interactive work, which might include: shown us that even with a new field (e.g. media
- Decision on the most suitable TM, if there are products) we have very fast - after 3. updating of
several of them and does the TM contain products reached 30% of translation saving. On the
necessary material (previous versions, related other hand with our old customers we can, on some
documents, etc.) very similar products, reach 50% of savings! This
seems to be some kind of TM threshold in savings.
At the moment we are considering steps to use [2] Blatt, Achim. EURAMIS: Added Value by
TM tools with some kind of knowledge about Integration. Terminologie et traduction 1, pp.
Croatian language, and we hope that it can increase 59-73, 1998.
our translation savings, productivity and quality. The [3] Cavalli-Sforza, Violetta ; Brown, Ralf;
biggest lack is that TM does not have any knowledge
Carbonell, Jaime; Jansen, Peter J; Kim, Jae
about the language to be translated. TM is neither
sensitive to context problems nor to cultural or D. Challenges in Using an Example-Based
symbolic differences of language pairs. Necessity of MT System for a Transnational Digital
using some kind of language knowledge is obvious. Government Project. Proceedings of EAMT,
2004.
8. Conclusion [4] Del Pino, Santiago. Using Translation
Memory Software (TMS): An Organisational
Although TM saves time during translation by Checklist. Terminologie et Traduction, pp.
using existing translations, it also creates some new 132-139; 1998.
tasks (management, revision, preparation, post- [5] Directorate-General for Translation of the
editing). It the document is translated into several European Commission. Translation Tools
languages, some additional gains can arise. TM brings and Workflow; 2005. http://europa.eu.int/
also additional demands regarding protection of comm/dgs/translation/bookshelf,/
copyright, property of TM and additional expenses
toolsandworkflowen.pdf [10/1/2006]
when requiring additional linguistic resources, as well
as significant investment of in-house training. One of [6] Directorate-General for Translation of the
important benefits of TM could be complementarities European Commission. Translation Tools
among translators who could benefit from the and Workflow; 2005. http://europa.eu.int/
previous work of their colleagues, but knowing that comm/dgs/ translation/bookshelf,/
the price of translation with TM differs from the toolsandworkflowen.pdf [10/1/2006]
professional translation. As the language changes, TM [7] Directorate-General for Translation of the
could be also considered as a linguistic archive. European Commission. Translating for a
Introduction of TM into organisational workflow Multilingual Community; 2005.
imposed numerous questions, such as: Has the http://europa.eu.int/comm/dgs/translation/
introduction of TM ensured kind of balance between
bookshelf/brochure_en.pdf [10/1/2006]
data-sharing and confidence of data, between job
sharing, data quality and heavy management. Are [8] Dovedan, Zdravko; Seljan, Sanja; Vučković
acquired data in the TM valuable for future translation Kristina. Strojno prevođenje u procesu
projects? Is TM database valuable for new translation komunikacije, pp. 283-291. Informatologia
tasks and how is this new type of workflow accepted 35 (4); 2002.
among translators? etc. [9] EAGLES Evaluation Working Group.
Although there are potential risks and Benchmarking translation memories.
inconveniences, TM and MT represent different types http://www.issco.unige.ch/ewg95/node157.ht
of CAT tools that will augment efficiency ml#SECTION001043000000000000000
(consistency, data-sharing) and bring some savings [10] Heuberger, Andres. What you Need to
(cost, time, effort). Although TM can be seen as a
Know about Translation Memories?
pure database intended for archiving of language
pairs, it can be enriched by linguistic component Multilingual webmaster.com
related to the proper language. http://www.multilingualwebmaster.com/
TM does respond to another aspect of the human library/trmemories.html [12/2/2006]
intelligence: its capacity and pragmatism integrating [11] Hodasz, Gabor; Grobler, Tamas; Kis,
technology and language for the purpose of Balazs. Translation Memory as a Robust
automation and computerisation of the translation Example-based Translation System.
process, its integration in the translation workstation Proceedings of EAMT, 2004.
and organisational workflow with the main aim: tool [12] Hutchins, John. The State of Machine
for realisation positive human needs. Translation in Europe and Future
Prospects. HLT Central; 2002.
References: http://ccl.pku.edu.cn/doubtfire/NLP/
Machine_Translation/Overview/Article%2
[1] Andersen, Poul. Translation Tools for the 0-%20MT_John_Hutchins.htm
CEEC Candidates for EU Membership – an [20/10/2005]
Overview.http://ec.europa.eu/comm/translati [13] Hutchins, John. Towards a new vision for
on/reading/articles/pdf/1998_01_tt_andersen. MT. Proceedings of MT Summit; 2001
pdf [10/10/2005] Sept 18-22; Santiago de Compostela,
Spain. http://www.translationdirectory.com [18] Seljan, Sanja; Pavuna, Damir. Why
/article402.htm [15/10/2005] Machine-Assisted Translation (MAT)
[14] Hutchins, John. Computer-based Tools for Croatian? Proceedings of ITI
translation tools, terminology and International Conference, pp. 469-474.
documentation workflow: report from Cavtat, 2006.
recent EAMT workshops. 4th Infoterm [19] Ulrich, Heidi. La mise en place du
Symposium "Terminology work and Translator’s Workbench (TWB):
knowledge transfer”, Vienna, 1998. Concurrence avec SYSTRAN et élément
[15] LISA (Localisation Industry Standards humain. Terminologie et traduction 1, pp.
Association): OSCAR (Open Standards for 102-116; 1998.
Container/ Content Allowing Re-use) [20] Zetzsche, Jost. Translation Memories: The
http://www.lisa.org/sigs/oscar/ [12/2/2006] Discovery of Assets: Recognizing
[16] Loffler-Laurian, Anne-Maire. La opportunities and overcoming obstacles to
traduction automatique. Villeneuve TM sharing. MultiLingual Computing &
d'Ascq: Presse Universitaire du Technology 72, vol. 16 (4), pp. 43-45;
Septentrion; 1996. 2005.
[17] Petrits, Angeliki. EC Systran: The
Commission’s Machine Translation
System; 2001. http://europa.eu.int/
comm/translation/reading/articles/pdf/2001
_mt_mtfullen.pdf [15/1/2006]

You might also like