Terminology
Practice - Session 2 - 23-11-2021

Christian Olalla-Soler

The terminographic workflow

1. Plan your termbase.

2. Collect reference materials.

3. Extract terms.

4. Systematise terms.

5. Create your termbase.


2. Collect reference materials
1. Create a corpus for each language.

2. Extract a list of candidate terms in each language.

3. Create a list of bilingual candidate terms.

4. Systematise the terms.


1. Create a corpus for each language
▪ Find online reference materials for your domain.
▪ Between 5 and 10 texts per language. The texts must be comparable.

▪ The text genre(s) that make up your corpus must be homogeneous.

▪ Establish text selection criteria (e.g., texts for semi-experts*, specific text genres, authoritative sources, non-translated texts, etc.), descriptive criteria (time period, country, etc.) and comparability criteria (year of production of the text, text genre, etc.).

▪ Use search operators!

▪ You will need to list your selection criteria, the searches you performed, and the URLs in your readme.txt file!
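The search and documentation steps above can be kept reproducible with a small script. This is an illustrative sketch, not part of the course materials: it assumes a search engine that supports Google-style operators (exact phrases in quotes, `site:`, `filetype:`, and a minus sign to exclude a word), and the helper names `build_query` and `write_readme`, as well as the readme.txt layout, are hypothetical.

```python
from __future__ import annotations

from pathlib import Path


def build_query(phrase: str, site: str | None = None, filetype: str | None = None,
                exclude: tuple[str, ...] = ()) -> str:
    """Build one search query with Google-style operators (assumed syntax)."""
    parts = [f'"{phrase}"']                       # quotes: exact phrase
    if site:
        parts.append(f"site:{site}")              # restrict to one website/domain
    if filetype:
        parts.append(f"filetype:{filetype}")      # restrict to one file format
    parts.extend(f"-{word}" for word in exclude)  # minus sign: exclude a word
    return " ".join(parts)


def write_readme(path: Path, criteria: list[str], queries: list[str], urls: list[str]) -> None:
    """Record selection criteria, searches and URLs in plain-text sections (hypothetical layout)."""
    lines = ["Selection criteria:", *criteria, "",
             "Searches performed:", *queries, "",
             "URLs:", *urls]
    path.write_text("\n".join(lines) + "\n", encoding="utf-8")


if __name__ == "__main__":
    query = build_query("stem cells", site="example.org", filetype="pdf", exclude=("news",))
    print(query)  # "stem cells" site:example.org filetype:pdf -news
```

Calling `write_readme(Path("readme.txt"), criteria, queries, urls)` after each search session keeps the three lists the readme needs in one place; `example.org` stands in for whatever authoritative sources you actually use.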
▪ Texts for experts vs. texts for semi-experts:

                                     For experts      For semi-experts
Are terms signalled?                 Not frequently   Frequently, with orthotypographic features (e.g., bold or italics)
Are terms defined?                   Not frequently   Frequently and in an explicit way
Are terms used in context?           Frequently       Frequently
Are terms linked to other sources?   Not frequently   Frequently
Are terms represented graphically?   Could happen     Could happen
▪ Use the online reference materials to build a manual corpus for each language.
▪ Be careful when converting documents to txt format!

▪ 1 text = 1 txt file.

▪ Be careful with metadata in the file!

▪ Use AntConc to extract (both simple and complex) terms from your corpora.
▪ Use corpus-based (concordances, clusters, collocates) and corpus-driven methods
(wordlists, n-grams, keyword lists).
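To make the corpus-driven side concrete, the sketch below computes a wordlist and n-gram counts over a folder with one .txt file per text. It only illustrates what such lists contain; it is not AntConc, whose tokenization and settings differ. It assumes the files were saved as UTF-8 (the `utf-8-sig` codec also drops a byte-order mark that some editors add), and the folder name `corpus_en` and all function names are hypothetical.

```python
from __future__ import annotations

import re
from collections import Counter
from pathlib import Path

# A token is a run of letters, optionally joined by an inner hyphen or apostrophe
# ("stem-cell", "cell's"). Digits and punctuation are ignored.
TOKEN = re.compile(r"[^\W\d_]+(?:['’-][^\W\d_]+)*")


def tokenize(text: str) -> list[str]:
    """Lower-case the tokens so that "Stem" and "stem" are counted together."""
    return [token.lower() for token in TOKEN.findall(text)]


def load_corpus(folder: Path) -> list[list[str]]:
    """Read every .txt file in the folder (1 text = 1 file) and tokenize it.

    A UnicodeDecodeError here usually means a file was not converted to UTF-8.
    """
    return [tokenize(path.read_text(encoding="utf-8-sig"))
            for path in sorted(folder.glob("*.txt"))]


def wordlist(texts: list[list[str]]) -> Counter:
    """Frequency of every word form across the whole corpus."""
    return Counter(token for tokens in texts for token in tokens)


def ngrams(texts: list[list[str]], n: int) -> Counter:
    """Frequency of every sequence of n words, counted inside each text only."""
    counts: Counter = Counter()
    for tokens in texts:
        for i in range(len(tokens) - n + 1):
            counts[" ".join(tokens[i:i + n])] += 1
    return counts


if __name__ == "__main__":
    corpus = load_corpus(Path("corpus_en"))  # hypothetical folder name
    print(wordlist(corpus).most_common(20))
    print(ngrams(corpus, 2).most_common(20))  # frequent bigrams: candidate complex terms
```

Counting n-grams per file rather than over one concatenated string avoids spurious sequences that span the end of one text and the start of the next.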
▪ Expand your bilingual comparable corpus with BootCaT.
▪ Include your report.csv file in the project folder!
2. Extract a list of candidate terms in each language
▪ Analyse your expanded corpora with AntConc and, for each language,
extract simple terms and complex terms (+ keep track of collocations/phraseologisms).
▪ Do your terms adequately describe the domain you chose?

▪ Are the terms you extracted representative of the breadth and depth of your domain?
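For simple terms, a keyword list (words that are unusually frequent in your domain corpus compared with a larger general-language reference corpus) is a common shortcut. AntConc's Keyword List tool produces one; the sketch below shows the arithmetic behind one keyness statistic AntConc supports, log-likelihood, so the scores are easier to interpret. With a = frequency in your corpus (size c) and b = frequency in the reference corpus (size d), the expected frequencies are E1 = c(a+b)/(c+d) and E2 = d(a+b)/(c+d), and LL = 2(a ln(a/E1) + b ln(b/E2)). The function names and the choice of reference corpus are assumptions, and AntConc's own settings may give different numbers.

```python
from __future__ import annotations

import math
from collections import Counter


def log_likelihood(freq_study: int, freq_ref: int, size_study: int, size_ref: int) -> float:
    """Two-cell log-likelihood: LL = 2 * (a*ln(a/E1) + b*ln(b/E2)).

    a, b = frequency of the word in the study and reference corpus;
    E1, E2 = frequencies expected if the word were equally common in both.
    """
    total = size_study + size_ref
    expected_study = size_study * (freq_study + freq_ref) / total
    expected_ref = size_ref * (freq_study + freq_ref) / total
    ll = 0.0
    if freq_study > 0:  # the term 0 * ln(0) is taken as 0
        ll += freq_study * math.log(freq_study / expected_study)
    if freq_ref > 0:
        ll += freq_ref * math.log(freq_ref / expected_ref)
    return 2 * ll


def keywords(study: Counter, ref: Counter, top: int = 20) -> list[tuple[str, float]]:
    """Rank words that are relatively more frequent in the study corpus than in the reference."""
    size_study, size_ref = sum(study.values()), sum(ref.values())
    scored = []
    for word, freq in study.items():
        freq_ref = ref[word]  # Counter returns 0 for unseen words
        if freq / size_study > freq_ref / size_ref:  # positive keywords only
            scored.append((word, log_likelihood(freq, freq_ref, size_study, size_ref)))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top]


if __name__ == "__main__":
    study = Counter({"stem": 5, "cells": 4, "the": 5})      # toy domain counts
    reference = Counter({"the": 50, "house": 45, "cells": 5})  # toy general counts
    for word, score in keywords(study, reference):
        print(f"{word}\t{score:.2f}")
```

Any word-frequency `Counter` works as input. A high score only says a word is typical of your corpus; whether it is a term, and whether your list covers the breadth and depth of the domain, still needs your own judgement.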
3. Create a list of bilingual candidate terms
▪ Try to match the terms you extracted across the two languages, and enter them in a txt file in this format:
Term_EN TAB Term_IT TAB
Example: stem cells TAB cellule staminali

▪ Add information on synonyms, collocations or notes:


Term_EN TAB Term_IT TAB Syn_EN: XXX
Term_EN TAB Term_IT TAB Syn_EN: XXX; Syn_IT: XXX
Term_EN TAB Term_IT TAB Coll_EN: XXX
Term_EN TAB Term_IT TAB Coll_EN: XXX; Coll_IT: XXX
Term_EN TAB Term_IT TAB Coll_EN: XXX; Coll_IT: XXX; Syn_EN: XXX; Syn_IT: XXX
Term_EN TAB Term_IT TAB Note: XXX
Term_EN TAB Term_IT TAB Coll_EN: XXX; Coll_IT: XXX; Syn_EN: XXX; Syn_IT: XXX; Note: XXX
Example: adult TAB adulte TAB Note: adjective refers to "stem cells"/"cellule staminali"; Syn_EN: somatic; Syn_IT: somatiche
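If you keep your matched terms in a spreadsheet or script, generating the lines programmatically avoids accidentally typing spaces where a TAB is required. This is a minimal sketch, assuming the template is followed literally: a TAB after Term_EN, a TAB after Term_IT (written even when no extra information follows), and optional `Label: value` fields separated by "; ". The function name is hypothetical; fields are written in the order you pass them, since the example above puts Note before the synonyms.

```python
from __future__ import annotations

# Labels used in the templates above.
ALLOWED_LABELS = {"Coll_EN", "Coll_IT", "Syn_EN", "Syn_IT", "Note"}


def glossary_line(term_en: str, term_it: str, **info: str) -> str:
    """Return 'Term_EN<TAB>Term_IT<TAB>Label: value; Label: value'."""
    unknown = set(info) - ALLOWED_LABELS
    if unknown:
        raise ValueError(f"unknown label(s): {sorted(unknown)}")
    for text in (term_en, term_it, *info.values()):
        if "\t" in text or "\n" in text:
            raise ValueError(f"TABs and line breaks are reserved as separators: {text!r}")
    # Keyword arguments keep the order in which they were passed.
    extras = "; ".join(f"{label}: {value}" for label, value in info.items() if value)
    return f"{term_en}\t{term_it}\t{extras}"


if __name__ == "__main__":
    print(glossary_line("stem cells", "cellule staminali"))
    print(glossary_line("adult", "adulte",
                        Note='adjective refers to "stem cells"/"cellule staminali"',
                        Syn_EN="somatic", Syn_IT="somatiche"))
```

The second call reproduces the "adult" example line, with real TAB characters in place of the word TAB.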
▪ Don’t:
  ▪ Add headings (column names) such as “EN/IT” or “Term_EN” and “Term_IT”.
  ▪ Precede terms with bullets, hyphens or numbers (e.g., “· stem cell” or “1. stem cell”).
  ▪ Add empty lines.
  ▪ Classify terms (e.g., into simple vs. complex terms). Just order them alphabetically.

▪ Be careful with:
  ▪ Singular vs. plural forms: choose whichever form of the term is more frequent.
  ▪ Verbs: always give the infinitive or base form.
  ▪ Capitalization: capitalize only if the term requires it. English tends to capitalize words (e.g., in titles), but that does not mean the term itself is capitalized.

▪ Check the example on Moodle.

▪ You will need to add this txt file to your project folder! Name it “Glossary.txt”.
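The formatting rules above can be checked automatically before you submit. The sketch below flags empty lines, lines that start with a bullet, hyphen or number, heading rows and lines without a TAB, then sorts the entries alphabetically and writes Glossary.txt as UTF-8. It is a hypothetical helper, not a course tool: it sorts by the whole line (so by the English term, assuming Term_EN comes first), and `str.casefold` ordering follows code points rather than a dictionary-style collation, so accented initials may sort unexpectedly. It cannot check the linguistic rules (singular/plural, base forms of verbs, capitalization); those still need a manual review.

```python
from __future__ import annotations

import re
from pathlib import Path

# Line-initial markers to reject: "- term", "· term", "• term", "▪ term", "* term", "1. term", "2) term"
LIST_MARKER = re.compile(r"^\s*(?:[-·•▪*]|\d+[.)])\s")
HEADING_LABELS = {"EN/IT", "Term_EN", "Term_IT"}


def check_line(line: str) -> list[str]:
    """Return the formatting problems found in one glossary line (empty list = OK)."""
    problems = []
    if not line.strip():
        problems.append("empty line")
    if LIST_MARKER.match(line):
        problems.append("starts with a bullet, hyphen or number")
    columns = line.split("\t")
    if {column.strip() for column in columns[:2]} & HEADING_LABELS:
        problems.append("looks like a heading row")
    if len(columns) < 2:
        problems.append("no TAB between Term_EN and Term_IT")
    return problems


def write_glossary(lines: list[str], path: Path = Path("Glossary.txt")) -> None:
    """Refuse badly formatted lines, then write the entries in alphabetical order."""
    errors = []
    for line in lines:
        problems = check_line(line)
        if problems:
            errors.append(f"{line!r}: {', '.join(problems)}")
    if errors:
        raise ValueError("\n".join(errors))
    ordered = sorted(lines, key=str.casefold)  # case-insensitive alphabetical order
    path.write_text("\n".join(ordered) + "\n", encoding="utf-8")


if __name__ == "__main__":
    for sample in ["1. stem cell\tcellula staminale\t", "Term_EN\tTerm_IT\t", "adult\tadulte\t"]:
        print(repr(sample), check_line(sample))
```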
