TMF Normalization of Arabic Technical and Scientific Terms*
Chihebeddine Ammar1, Kais Haddar1, and Laurent Romary2

1 MIRACL Laboratory & Department of Computing and Telecommunication,
Faculty of Sciences, Sfax University, Sfax, Tunisia
{chihebeddine.ammar, kais.haddar}@fss.rnu.tn
2 INRIA & Humboldt University, Berlin, Germany
laurent.romary@inria.fr
Abstract. Multilingual terminological data are an essential component for
many industries, and developing these resources is very expensive. In
addition, data exchange and fusion are important aspects of building
most terminological databases (TDBs). We noticed a lack of robust
and rigorous terminological databases for the Arabic language, and even fewer
standardized ones. In this context, we present a process for organizing the
collection of terminological resources in order to develop a standardized,
multidisciplinary Arabic terminology (according to the TMF standard, ISO 16642),
based on a thorough study of the typology and characteristics of Arabic
technical and scientific terms.
Keywords: TMF, normalization, terminology, Arabic terminological database,
interoperability
1 Introduction
The main objective of terminology work is to document the terms used to designate
the concepts of a given discipline. It usually involves manual or semi-automated analysis
of documents to identify candidate terms. Developing and maintaining
terminology resources is an extremely long and difficult process that requires continuous
human expertise from multiple domains.
Nowadays, the importance of using specific terminology, and thus ensuring clear
and effective communication between specialists, has become increasingly evident.
Indeed, the need to standardize the use of terminology has become very important
in professional circles and among the various agencies drawn together by shared
interests. To facilitate cooperation and avoid redundancy, it is therefore appropriate
to use standards and guidelines for creating and using terminological
data collections and for sharing and exchanging such data.
* This work is partially funded by the European Commission H2020 project Parthenos (Grant
agreement: 654119, Call: H2020-INFRADEV-1-2014-1).
Technical and scientific documents are written in specialist language. They are
very rich in well-presented terminology and cover multiple technical and scientific
fields. In fact, scientists, technical editors and inventors are best placed to present the
scientific and technical terms of a field, since they carefully choose the words, terms and
named entities of a specific domain when drafting their scientific papers,
technical standards or patent applications.
For many reasons (some of which are listed in Section 3), Arabic is
considered a poorly covered language, in the sense that the work of Arabic linguists
and terminologists seems insufficient to make Arabic a language of technology.
Consequently, there is a lack of standardized multidisciplinary Arabic
terminological databases and, as a result, no possibility of merging and interchanging
terminological data.
Another issue is that Arabic scientific and technical documents may contain ambiguities
and semantic issues due to regional specificities. Within the same
field, terms may be rendered differently from one scientific or technical
document to another and from one country to another.
In this paper, through a thorough study of the characteristics and typology
of Arabic technical and scientific terms (drawn from a collection of patents, scientific
articles and manuals), we propose a process for developing a standardized multidisciplinary
Arabic terminological database from a corpus of Arabic technical and scientific
documents. The originality of our work lies in the automatic extraction of Arabic
terms and their presentation in a standardized format.
This paper is organized as follows. Section 2 presents
previous work on existing standards for terminological products and on Arabic and
non-Arabic terminological databases. Section 3 details the characteristics of
the Arabic terms in our corpus. Section 4 presents our method for automatically
generating a standardized terminological database. Section 5 is devoted to
experimentation and evaluation, and Section 6 concludes and outlines some perspectives.
2 Previous Work
In this section, we detail the concept of standardization by presenting some existing
standards. Then, we give an overview of some of the work that has resulted from them.
2.1 Normalization of terminological products

Normalization is the process leading to an agreement to make things work together.
When applied to terminology databases, normalization leads to an agreement on
which technical terms will be used in a standard, and it specifies the characteristics by
which the selected terms must be understood.
Technical Committee TC 37 of the International Organization for Standardization
(ISO), which is responsible for standardizing methodology in terminology,
has long sought to facilitate the exchange of terminological data by producing
many standards, among which we can mention the following:
ISO 704: Terminology work — Principles and methods [10] which is an ISO
standard that establishes the basic principles and methods of preparing and compiling
terminologies both inside and outside of the standardization framework and describes
the links between objects, concepts and their terminological representations.
ISO 12620: Terminology and Other Language and Content Resources –
Specification of Data Categories and Management of a Data Category Registry for
Language Resources [9] which lists the data category names that should be used to
identify the different information that is usually found on a terminology record.
ISO 16642: Terminological Markup Framework (TMF) [8], [13] and [14] which
provides a framework for the representation of terminological databases in XML.
The Text Encoding Initiative (TEI; see footnotes 1 and 2) consortium [12] provides guidelines for
describing terminological data so that users of terminology databases can exchange
database records.
The organization of the data differs according to the chosen approach (onomasiological
or semasiological). These two approaches are inverse but complementary.
Today, the communities of terminologists and lexicographers have agreed to standardize
a single approach, the onomasiological one. The main reason is that its relatively simple
hierarchical organization of concepts is well suited to modeling multilingual
and multicultural repositories. That is why these communities
standardized, as the only valid ones, the principles and methods
corresponding to the onomasiological approach (ISO 704, Terminology: principles
and methods).
They then standardized an open catalog of data categories capable of describing
terminological and lexicographical data (ISO 12620). Finally, they standardized a
common framework for implementing terminologies, which ensures the interoperability
and reusability of terminological resources independently of any particular terminological
database. This common framework, TMF (ISO 16642), requires either that the different
databases comply with the TMF XML representation, the Generic Mapping Tool (GMT), which is a
canonical XML representation for computerized terminology markup, or
that the terminological resources be reformatted according to this same XML
mapping tool.
It is therefore clear that the standardization effort has been substantial and has
focused on TMF and on adapting new versions of ISO 12620 to the new methods.
In fact, the reference standard for multilingual terminologies within the ISO family
is ISO 16642 (TMF), a meta-model that decomposes the
organization of a terminological database into basic components arranged in hierarchical
logical levels, as shown in Fig. 1.
1 http://www.tei-c.org/index.xml
2 http://www.tei-c.org/Vault/P4/doc/html/TE.html
The TDC (Terminological Data Collection) is the level of the database itself. It is
the union of several terminological records, in various languages, dedicated
to the same purpose; it is, in effect, the collection of informative data on the concepts
of a specific domain. In addition to the terminological information in the strict sense,
it includes global and complementary information. The GI (Global Information) and CI
(Complementary Information) levels store important reference data used to manage
or operate the database. These data do not belong directly to the terminology collection.
Fig. 1. TMF meta-model.
The structural scheme that best supports the onomasiological approach is organized
into four levels: TE (Terminological Entry), LS (Language Section), TS (Term Section) and
TCS (Term Component Section). The TE contains information about terminological
units (the concepts specific to a topic, their terms, etc.). The LS is the part of a terminological
entry that contains the information for one language. The TS is the part of a language section
that gives information about a term. The TCS is the part of a term section that gives linguistic
information about the lexical components of the term.
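The nesting of these levels can be pictured with a small data model. The following Python sketch is only illustrative: the class and field names (for example `concept_id` and `features`) are our own assumptions, not names defined by ISO 16642, and the identifier "C1" is hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class TermComponentSection:
    """TCS: linguistic information about one lexical component of a term."""
    features: Dict[str, str] = field(default_factory=dict)


@dataclass
class TermSection:
    """TS: information about one term."""
    term: str
    features: Dict[str, str] = field(default_factory=dict)
    components: List[TermComponentSection] = field(default_factory=list)


@dataclass
class LanguageSection:
    """LS: the terms of one language for a given concept."""
    lang: str
    terms: List[TermSection] = field(default_factory=list)


@dataclass
class TerminologicalEntry:
    """TE: one concept, described across languages."""
    concept_id: str
    languages: List[LanguageSection] = field(default_factory=list)


@dataclass
class TerminologicalDataCollection:
    """TDC: the database itself, with global and complementary information."""
    global_info: Dict[str, str] = field(default_factory=dict)
    complementary_info: Dict[str, str] = field(default_factory=dict)
    entries: List[TerminologicalEntry] = field(default_factory=list)


# One concept ("elevator") with an Arabic and an English language section.
entry = TerminologicalEntry(
    concept_id="C1",  # hypothetical identifier
    languages=[
        LanguageSection("ar", [TermSection("مصعد")]),
        LanguageSection("en", [TermSection("elevator")]),
    ],
)
tdc = TerminologicalDataCollection(entries=[entry])
```

The point of the sketch is the containment order (TDC, TE, LS, TS, TCS): the concept sits at the top of each entry and the terms hang below it, which is what makes the model onomasiological.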
2.2 Arabic terminological databases

For languages other than Arabic, there are many terminological databases. For example, IATE
(footnote 3), the multilingual terminology database of the European Union, contains 8.4 million
terms in 23 languages, covering EU-specific terminology as well as multiple fields such as
agriculture and information technology. The terminology portal TermSciences (footnote 4) [11]
offers a TMF-standardized, multidisciplinary scientific terminology.
For the Arabic language, there are several terminological databases containing Arabic
concepts and terms. For instance, WIPO Pearl (footnote 5), the multilingual terminology portal of
the World Intellectual Property Organization, gives access to scientific and technical
terms in ten languages, including Arabic, derived from patent documents. It contains
15,000 concepts and 90,000 terms. ARABTERM (footnote 6) is an online technical dictionary
organized by industry sector that allows user participation. Finally, UN-ARABTERM
(footnote 7) is a multilingual terminology database that offers technical and specialized
terms in four of the UN's official languages: Arabic, English, French and Spanish.
3 http://iate.europa.eu
4 http://www.termsciences.fr/
5 http://www.wipo.int/reference/en/wipopearl
Some previous works use automatic methods to extract Arabic terms and expressions.
For instance, the authors of [3] use a hybrid approach combining grammatical
patterns and statistical methods to extract Arabic multi-word terms. The authors of [2] use
a combination of three approaches, relying on linguistic information, frequency
counts and statistical measures, to extract multi-word expressions.
Very few studies and projects have addressed the
standardization of Arabic terminology products. Those that exist have adopted the standards
mentioned in the previous section. One of these projects
is CARTAGO [7], an international project that pools skills and provides
networking facilities to collect standardized multilingual
terminology resources in the field of e-learning and make them widely available.
These previous works have three major disadvantages. First, some of
them ([3], [2], WIPO Pearl, ARABTERM and UN-ARABTERM) do not provide (or
are not) standardized Arabic TDBs, so it is very difficult to exchange their terms
or merge them with other TDBs. Second, some of them build their TDBs
manually (WIPO Pearl, UN-ARABTERM, ARABTERM and CARTAGO). Third,
some of them provide a small number of Arabic terms and concepts
for only one field (CARTAGO) or for a very small number of fields (WIPO
Pearl and UN-ARABTERM).
3 Characteristics of Arabic Technical and Scientific Terms
In this section, we detail the characteristics of the Arabic technical and scientific terms in
our corpus. First, we explain the impact of foreign languages on Arabic terms. Second,
we describe terminological disparities. Next, we discuss the translation of
terms into other languages. Then, we present multi-word terms. Finally, we give examples
of conceptual and terminological relationships.
3.1 The impact of foreign languages and cultures

In periods when the Arab world experienced technical and scientific progress (e.g. the
Andalusian civilization), the Arabic language served as a source of terminological and
conceptual borrowing for other languages, notably Latin ones.
Since the industrial and socio-cultural revolutions in the Western
world, the reverse has been happening. The development of any language depends on
its ability to express the complex notions of the modern world of science and technology
and to be a tool capable of conveying intellectual life in its entirety. The Arabic
language is, like any other language, a satisfactory instrument for expressing
the world of which it is the correlate. It is only in scientific and technical terminology
that it may seem insufficient. Unfortunately, Arabic linguists and terminologists
have been unable to bridge the terminological and conceptual gaps arising from socio-cultural
and technological development, or to ensure the consistency of a normalized
nomenclature of concepts. That is why the Arabic language is often criticized for its
slow modernization in terms of technical terminology.
6 http://www.arabterm.org/
7 http://unterm.un.org/dgaacs/arabterm.nsf
As a result, we noted in many patents and manuals the use of technical terms in
French or English rather than their Arabic equivalents (e.g. jump guard). It seems that
Arabic-speaking inventors and technical editors cannot find the appropriate technical terms in
Arabic. We also noted intensive use of the Arabization of
English or French terms, which gives such terms a double identity: the
Arabized form and the Arabic equivalent. For instance, the word
“code” in the term “elevator code” has the Arabized form ‘كود’ kuwdo (Buckwalter
transliteration, see footnote 8) and the Arabic equivalent ‘رمز’ ramozo. Another phenomenon is
periphrasis, which is due to the abundance of foreign neologisms for which
no one-word equivalent can be found. For example, the word “microphone” can
be rendered as ‘ميكروفون’ miykoruwfuwno (an Arabization) or as ‘مكبر الصوت’ mukabGiro
AloSawoto, literally “amplifier of the sound” (a periphrasis).
3.2 Terminological disparities
The Arab world extends over a large geographic area with historical and socio-cultural
specificities. Conceptual, terminological, semantic and taxonomic
disparities exist between the Mashreq (Middle East) and Maghreb (Arab Maghreb) regions,
between countries, between urban and rural areas, and between the literary or administrative
language and the dialectal or popular language. For instance, an inventor or technical editor
from the Mashreq uses the term ‘شفرة’ $aforapo to designate the term “code”, whereas an inventor
from the Maghreb uses the term ‘كود’ kuwdo.
Other disparities are caused by term variation: inflectional
(e.g. ‘مصعد ثنائي الطوابق’ miSoEado vunaA}iy AloTawabiqo “double-deck elevator” and
‘مصاعد ثنائية الطوابق’ maSaEido vunaA}iyGapo AloTawabiqo “double-deck elevators”),
graphic, syntactic and morphosyntactic variations.
3.3 Terms’ Translation
Most Arabs are at least bilingual, and the majority have mastered French or English.
For this reason, technical and scientific documents generally give a translation
of technical and scientific terms or keywords in the same paragraph. These translations
are usually of very high quality because they are made by professional human
translators, and they facilitate the construction of multilingual terminology
databases.
8 Buckwalter Arabic Transliteration System: http://www.qamus.org/transliteration.htm
3.4 Multi-word terms
There are two types of terms: single-word terms and multi-word (or complex) terms.
The structures of Arabic complex units that can be lexicalized are mainly
Noun+Adjective, Noun+Noun and Noun+Preposition+Noun. Other, more complex
structures exist. Our work is to extract terms from complex sentences or, more precisely, to
reduce the dispersion in the modeling of complex sentence content while keeping the
semantics of the base term. For example, from the sentence ‘زر يعمل عن طريق الضغط عليه’
zirGo yaEomalo Eano Tariyqo AloDGagoTo Ealayoho “button works by pressing it”,
the resulting term is ‘زر بالضغط’ zirGo biAloDGagoTo
“pushbutton”.
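To make the three main structures concrete, the sketch below scans a part-of-speech-tagged token sequence for them. It is a simplified stand-in, not the transducer-based method used in this work: the tag names, the hand-tagged input and the assumption that the proclitic preposition bi- has already been split from its noun are all ours.

```python
# The three structures named in the text, as sequences of coarse POS tags.
PATTERNS = [
    ("Noun", "Preposition", "Noun"),
    ("Noun", "Adjective"),
    ("Noun", "Noun"),
]


def find_candidates(tagged):
    """Return every token span whose POS tags match one of PATTERNS."""
    candidates = []
    for start in range(len(tagged)):
        for pattern in PATTERNS:
            window = tagged[start:start + len(pattern)]
            if tuple(tag for _, tag in window) == pattern:
                candidates.append(" ".join(word for word, _ in window))
    return candidates


# Hypothetical, hand-tagged input in Buckwalter transliteration
# ("pushbutton", with the preposition bi- segmented beforehand).
tagged_term = [
    ("zirGo", "Noun"),
    ("bi", "Preposition"),
    ("AloDGagoTo", "Noun"),
]
print(find_candidates(tagged_term))
```

Pattern matching of this kind only proposes candidates; reducing a full sentence such as “button works by pressing it” to its base term requires the additional rules discussed in Sections 4 and 5.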
3.5 Conceptual relationships
The identification of concepts and of the semantic relationships between them is an important
element of terminology practice. Semantic relationships
express the sense relations between concepts. They include hierarchical
relationships such as hyperonymy, hyponymy, holonymy and meronymy, and non-hierarchical
relationships such as synonymy, antonymy, opposition, etc.
The concepts of the multi-domain corpus adopted for our work show a
set of these relations. One of them is synonymy: some concepts have the same signified
but different signifiers. For example, ‘مصعد بدون وزن عكسي’ miSoEado biduwno
wazono Eakosiy is synonymous with ‘مصعد بدون وزن معادل’ miSoEado biduwno wazono
muEaAdilo; both mean “elevator without counterweight” in English. Here, the two expressions
share the same part (‘مصعد بدون وزن’ miSoEado biduwno wazono “elevator without weight”)
and differ by two synonymous words (‘معادل’ muEaAdilo “equivalent” and ‘عكسي’ Eakosiy
“reverse”).
Hierarchical relationships between concepts can run, first, from the generic
concept to the specific concept(s) (from hyperonym to hyponyms). For example,
hyperonym: ‘مصعد’ miSoEado “elevator”; hyponyms: ‘مصعد بدون وزن عكسي’ miSoEado
biduwno wazono Eakosiy “elevator without counterweight”, ‘مصعد متعدد المقصورات’ miSoEado
mutaEadGido AlomaqSuwraAto “multi-car elevator”. Second, they can run from the whole to
its parts (from holonym to meronyms). For example, holonym: ‘مصعد’
miSoEado “elevator”; meronyms: ‘عربة’ Earabapo “car”, ‘باب’ baAbo “door”, ‘زر’
zirGo “button”, etc.
3.6 Terminological relationships
We found many types of term relations in our corpus, depending on the nature of the
terms (single-word or multi-word). One example is synonymy: ‘تحديد الهوية بموجات الراديو’
taHodido AlohawiyGapo bimawojaAto AlorGaAdoyuw is synonymous with ‘التعرف بترددات الراديو’
AlotaEarGofo bitaradGudaAto AlorGaAdoyuw; both mean “Radio Frequency Identification” in
English. The term also has an acronym, ‘ت ت ر’ (“RFID”).
4 Our proposed method
Our proposed method consists of three steps: first, the construction of a Data Category Registry
(DCR) and TMF modeling; then, the processing of the characteristics of Arabic terms;
finally, the population of the TDB.
4.1 Construction of Data Category Registry and TMF modeling

ISO 12620 provides an inventory of well-defined data categories with standardized
names that function as data element types or as predefined values. We built a
Data Category Registry (DCR) that includes a set of carefully chosen data categories
corresponding to the information that we want to represent and suited to Arabic
terms.
Fig. 2. TMF meta-model levels and associated Data Categories.

Then, as shown in Fig. 2 above, we associate these data categories with the abstract
structural skeleton, i.e. with the TMF levels (see Fig. 1). As a result, we obtain a TML
(Terminological Markup Language).
TMF is based entirely on XML, which is beneficial for our terminological
database: we can easily edit the database when more information
about the entries becomes available. This is why we express the association of the necessary
data categories with the TMF meta-model levels as an XML Schema, which can then be used to check
that a given Terminological Data Collection is compatible with the TML.
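The idea of anchoring data categories on levels and then checking conformance can be sketched without an XML Schema. In the Python sketch below, the registry content is an assumption made for illustration: only the placement of “definition” at the Term Section level is stated in this paper, the other placements are guesses, and Fig. 2 remains the authoritative mapping.

```python
# Hypothetical DCR: which data categories may appear at which TMF level.
# Only "definition" at TS is taken from the text; the rest is assumed.
DCR = {
    "TE": {"conceptIdentifier", "broaderConceptGeneric",
           "specificConcept", "associatedConcept"},
    "LS": {"languageIdentifier"},
    "TS": {"term", "definition", "abbreviation", "acronym",
           "administrativeStatus", "geographicalUsage"},
}


def check_level(level, categories):
    """Return the data categories that the DCR does not allow at this level."""
    allowed = DCR.get(level, set())
    return sorted(set(categories) - allowed)


# A term section carrying a term and its definition conforms.
print(check_level("TS", ["term", "definition"]))
# A definition placed on a language section is reported.
print(check_level("LS", ["languageIdentifier", "definition"]))
```

An XML Schema plays the same role declaratively: it rejects a collection in which a data category appears at a level where the TML does not allow it.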
4.2 Processing of Arabic terms’ characteristics

It is very important to decide how to treat each of the term characteristics observed in our
corpus. Some of them can be handled by including the appropriate data categories
in our TDB. For example, the complex sentences from which we extract
multi-word terms can be included as definitions of those terms, using the “definition”
data category at the “TermSection” level (see Fig. 2 above).
Translations of terms can be treated as equivalents of the Arabic terms by including
additional “LanguageSection” sections (English or French), which gives us a multilingual
TDB.
Terminological relationships are handled in two ways. Abbreviations,
acronyms, etc. are represented explicitly by their appropriate data categories
(“abbreviation”, “acronym”). Synonymy relationships, however,
are represented implicitly in the TDB: within the same “TerminologicalEntry”
(the same concept), any term (in a “TermSection”) is synonymous with the other terms
in the same language (“LanguageSection”) [15]. Similarly, we consider the Arabizations,
periphrases and inflectional, graphic, syntactic and morphosyntactic variants of a
term to be its synonyms.
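Because synonymy is implicit, it can be read off the structure of an entry. The following Python sketch assumes a deliberately simplified representation, a dictionary from language code to the list of terms in that language section; this layout is ours and is not the GMT format.

```python
from itertools import combinations


def implicit_synonyms(entry):
    """Return (lang, term_a, term_b) for terms sharing an entry and a language."""
    pairs = []
    for lang, terms in entry.items():
        for a, b in combinations(terms, 2):
            pairs.append((lang, a, b))
    return pairs


# One concept ("elevator without counterweight"), Buckwalter transliteration.
entry = {
    "ar": [
        "miSoEado biduwno wazono Eakosiy",
        "miSoEado biduwno wazono muEaAdilo",
    ],
    "en": ["elevator without counterweight"],
}
print(implicit_synonyms(entry))
```

Terms in different language sections of the same entry are never paired here: they are equivalents (translations), not synonyms.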
For conceptual relationships, hierarchical and non-hierarchical relations are represented
by their appropriate data categories. For example, an associative relation can be
represented by the “associatedConcept” data category, the “is a” and “part of” relations
by the “broaderConceptGeneric” data category, and the “has a” relation by the
“specificConcept” data category.
4.3 Population of the TDB

The conversion of terminology data into the ISO 16642 format consists in using the Generic
Mapping Tool (GMT) structural format recommended by the standard to implement
the data categories selected in the DCR and anchored on the meta-model. As a
result, we obtain a TMF-normalized TDB of Arabic technical and scientific terms. Fig.
3 shows an example of a terminological entry.
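As a rough illustration of what producing such an entry involves, the sketch below builds a nested XML structure with Python's standard library. The element names `struct` and `feat` and the `type` attribute reflect our understanding of GMT and should be checked against ISO 16642; the identifier "C1" and the choice of data categories are hypothetical, and the output is not claimed to reproduce the entry of Fig. 3.

```python
import xml.etree.ElementTree as ET


def feat(parent, type_, text):
    """Attach one data category (feature) to a structural node."""
    node = ET.SubElement(parent, "feat", {"type": type_})
    node.text = text
    return node


def build_entry(concept_id, terms_by_lang):
    """Build a TE node with one LS per language and one TS per term."""
    te = ET.Element("struct", {"type": "TE"})
    feat(te, "conceptIdentifier", concept_id)
    for lang, terms in terms_by_lang.items():
        ls = ET.SubElement(te, "struct", {"type": "LS"})
        feat(ls, "languageIdentifier", lang)
        for term in terms:
            ts = ET.SubElement(ls, "struct", {"type": "TS"})
            feat(ts, "term", term)
    return te


entry = build_entry("C1", {"ar": ["مصعد"], "en": ["elevator"]})
print(ET.tostring(entry, encoding="unicode"))
```

Keeping the structural nodes generic and putting all the specifics in the `type` attributes is what lets one format carry any selection of data categories from the DCR.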
To implement our TDB, we use a symbolic approach
based on extraction rules that automatically extract terms and annotate them in the GMT
format of TMF. The rules, which are built manually, express the structure of the information
to extract and take the form of transducers. These transducers generally exploit
morphosyntactic information as well as the information contained in the resources (lexicons
or dictionaries). Moreover, they allow the description of the possible sequences of constituents
of Arabic terms.
Fig. 3. Terminological entry in GMT format
5 Experimentation and Evaluation
In this paper, we continue the work of [1], in which the authors built a transducer cascade
using the CasSys tool [5] of the Unitex platform to extract terms and annotate them in
TMF format from a corpus of Arabic technical and scientific documents. However,
we noticed that many important characteristics of terms (Section 3) were not
considered or represented in the resulting TDB. In this paper, we take them into account by
including the appropriate section levels and data categories. For example, the “definition”
of a multi-word term, taken from the complex sentence, can be included in the “TermSection”.
An example is shown in Fig. 4, which is a transducer that recognizes the
term ‘مصعد يعمل بالجر ببكرة محزوزة’ miSoEado yaEomalo biAlojarGo bibakorapo
maHozuwzapo “Elevator works by traction with splined roller”. The boxes between
the two brackets “(definition” and “definition)” contain the complex sentence
used as the definition of the term. The term itself is extracted by concatenating
the three variables $Nc1$, $Prps$ and $Nc2$, where Nc = Noun, Prps =
Preposition, V = Verb and Adj = Adjective.
Fig. 4. Transducer of terms extraction
Another example is the recognition of terminological relations such as synonymy.
We have developed dictionaries that list synonyms, and we use the lexical
composition of terms to recognize their synonyms using transducers. A term
can be either a noun (single-word term), in which case synonymy is easily detected because
our dictionaries contain synonyms, or a noun phrase (NP = Noun + Complement
modifier) (multi-word term). The complement modifier, in turn, can be an
adjectival phrase (AP), a prepositional phrase (PP) or a noun phrase (NP). For example,
for the two synonymous phrases ‘مصعد متعدد المقصورات’ miSoEado mutaEadGido
AlomaqSuwraAto and ‘مصعد متعدد العربات’ miSoEado mutaEadGido AloEarabaAto
“multi-car elevator”, we have:
• T1 = NP1 and T2 = NP2
• NP1 = Noun1 + NP11 and NP2 = Noun2 + NP21
• NP11 = Noun11 + Noun12 and NP21 = Noun21 + Noun22
With:
• Noun1 = Noun2 = ‘مصعد’ = “elevator”
• Noun11 = Noun21 = ‘متعدد’ = “multi”
• Noun12 = ‘المقصورات’, a synonym of Noun22 = ‘العربات’ (“car”)
So, T1 is a synonym of T2.
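The decomposition above amounts to comparing two terms component by component. The Python sketch below implements a flattened version of that comparison, word by word rather than through nested noun phrases, and uses a one-entry synonym dictionary that we made up for the example; the transducers and dictionaries actually used in this work are not reproduced here.

```python
# Hypothetical synonym dictionary (Buckwalter transliteration).
SYNONYMS = {
    frozenset({"AlomaqSuwraAto", "AloEarabaAto"}),
}


def same_or_synonym(a, b):
    """Two components match if they are identical or listed as synonyms."""
    return a == b or frozenset({a, b}) in SYNONYMS


def are_synonym_terms(t1, t2):
    """Two terms are synonyms if they align component by component."""
    w1, w2 = t1.split(), t2.split()
    return len(w1) == len(w2) and all(
        same_or_synonym(a, b) for a, b in zip(w1, w2)
    )


t1 = "miSoEado mutaEadGido AlomaqSuwraAto"
t2 = "miSoEado mutaEadGido AloEarabaAto"
print(are_synonym_terms(t1, t2))
```

This flat alignment only covers terms with the same number of components in the same order; periphrases or syntactic variants would need the structural rules described in the text.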

We also noticed that many other data categories cannot be included automatically
in the TDB through the transducer cascade, for example “conceptIdentifier”,
“termsIdentifier”, “administrativeStatus” and “geographicalUsage”. The
intervention of a terminologist is therefore needed to correct terminological data and
to supply the missing data categories. We thus developed a Java interface that handles
the XML file produced by the transducer cascade (it eliminates the duplicate
terms resulting from automatic extraction) and enables the terminologist
to correct erroneous entries and to add, in the appropriate data categories, the
information that could not be extracted automatically.
Our study corpus consists of 350 documents (patents, manuals and scientific papers),
and our test corpus consists of 1000 multidisciplinary documents. Our resources
are dictionaries (one of 230700 entries for inflected nouns, one of
1966052 entries for inflected verbs (from [4]), and dictionaries for prepositions, personal
nouns, etc.), trigger words and extraction rules. By enlarging our dictionaries, including
synonyms and handling the characteristics of Arabic terms, we obtained satisfactory results.
We increased the precision from 0.97 to 0.98 by reducing ambiguity using the MADA [6] toolkit
for Arabic morphological disambiguation, and we obtained a recall of 0.96, with a total of
3521 correctly extracted terms.
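For readers who want to relate these figures to one another, the sketch below recalls the usual definitions of precision and recall. Only the count of 3521 correct terms and the two rounded scores come from the text; the totals of extracted and reference terms are not reported, so the values derived here are back-of-the-envelope assumptions.

```python
def precision(correct, extracted):
    """Share of extracted terms that are correct."""
    return correct / extracted


def recall(correct, reference):
    """Share of reference terms that were extracted correctly."""
    return correct / reference


correct = 3521                     # reported in the text
extracted = round(correct / 0.98)  # assumption: implied by precision 0.98
reference = round(correct / 0.96)  # assumption: implied by recall 0.96

print(round(precision(correct, extracted), 2))
print(round(recall(correct, reference), 2))
```

Since the published scores are rounded to two decimals, the derived totals are only approximate.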
6 Conclusion
In this paper, we pointed out the lack of multilingual and multidisciplinary Arabic terminological
databases. We studied the characteristics of the Arabic technical and scientific
terms in our corpus. We also took advantage of the interoperable GMT format
provided by the TMF standard and created a TML suited to Arabic terms by including the
appropriate data categories in the TMF meta-model. Finally, we used a rule-based
approach to extract terms and build a terminological database, in association with a
Java interface. In the future, we aim to enlarge our terminological database by extracting
terms from scientific and technical documents and annotating them in TMF format
using a statistical approach. We also aim to combine the two approaches into a hybrid
one.
References
1. Ammar, C., Haddar, K., Romary, L.: Automatic Construction of a TMF Terminological
Database Using a Transducer Cascade. In: Recent Advances in Natural Language
Processing (RANLP ’15), pp. 17-23, Hissar (2015)
2. Attia, M., Toral, M., Tounsi, L., Pecina, P., Genabith, J.V.: Automatic extraction of Arabic
multiword expressions. In: Proceedings of the International Conference on Language
Resources and Evaluation LREC’10, pp. 19-26 (2010)
3. Boulaknadel, S., Daille, D., Aboutajdine, D.: A Multi-Word Term Extraction Program for
Arabic Language. In: Proceedings of the International Conference on Language Resources
and Evaluation LREC’08, pp. 630–634, Marrakech (2008)
4. Doumi, N., Lehireche, L., Maurel, D., Khater, M.: Using finite-state transducers to build
lexical resources for Unitex Arabic package. In: Second Symposium for Researcher
Students in Natural Language Processing and its Applications CEC-TAL’15, pp. 90-100,
Sousse (2015)
5. Friburger, N., Maurel, D.: Finite-state transducer cascades to extract named entities in texts.
In: Theoretical Computer Science, vol. 313, pp. 93-104 (2004)
6. Habash, N., Rambow, O., Roth, R.: MADA+TOKAN: A Toolkit for Arabic Tokenization,
Diacritization, Morphological Disambiguation, POS Tagging, Stemming and Lemmatization.
In: Proceedings of the 2nd International Conference on Arabic Language Resources
and Tools, pp. 102-109, Cairo (2009)
7. Hudrisier, H., Ben Henda, M.: CARTAGO : une terminologie large langues de
l’enseignement électronique à distance, dans un contexte de co-élaboration multilingue de
documents normatifs. In : Séminaire international Les outils d'aide à la traduction, Union
Latine, Bucarest (2008)
8. ISO 16642, Computer applications in terminology - Terminological markup framework
(TMF). (2003)
9. ISO 12620:1999, Computer applications in terminology – Data categories. (1999)
10. ISO 704: Terminology work — Principles and methods. (2000)
11. Khayari, M., Schneider, S., Kramer, I., Romary, L.: Unification of multi-lingual scientific
terminological resources using the ISO 16642 standard. The TermSciences initiative. In:
International Workshop Acquiring and representing multilingual, specialized lexicons: the
case of biomedicine, 6 p., Genoa (2006)
12. Burnard, L., Sperberg-McQueen, C.M.: The Design of the TEI Encoding Scheme. Computers
and the Humanities, vol. 29(1), pp. 17–39 (1995)
13. Romary, L.: An Abstract Model for the Representation of Multilingual Terminological
Data: TMF Terminological Markup Framework. In: 5th TermNet Symposium TAMA’01,
Antwerp (2001)
14. Romary, L., Kramer, I., Salmon-Alt, S., Roumier J.: Gestion de données terminologiques :
principes, modèles, méthodes. Widad Mustafa El Hadi. Terminologie et accès à
l'information, Hermes, 13 p. (2006)
15. Romary, L., Van Campenhoudt, M.: Normalisation des échanges de données en
terminologie : les cas des relations dites conceptuelles. In : 4ème Rencontre Terminologie et
Intelligence Artificielle TIA'01, 10 p., Nancy (2001)
