Terminology Create A Corpus
Terminology
Practice - Session 2 - 23-11-2021
Christian Olalla-Soler
3. Extract terms.
4. Systematise terms.
1. Create a corpus for each language
▪ Find online reference materials for your domain.
▪ Select between 5 and 10 texts per language. The texts must be comparable.
▪ List your selection criteria, the searches you performed, and the URLs in your readme.txt file!
1. Create a corpus for each language
▪ Texts for experts vs. texts for semi-experts:
▪ Use the online reference materials to build a manual corpus for each language.
▪ Be careful when converting documents to .txt format!
▪ Use AntConc to extract (both simple and complex) terms from your corpora.
▪ Use corpus-based (concordances, clusters, collocates) and corpus-driven methods
(wordlists, n-grams, keyword lists).
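To make the corpus-driven and corpus-based methods above more concrete, here is a minimal Python sketch of what a wordlist, an n-gram list, and a simple concordance compute. It is not AntConc and does not reproduce its settings; the function names (`tokenize`, `wordlist`, `ngrams`, `kwic`) and the tokenisation rule are assumptions made for illustration only.

```python
import re
from collections import Counter

# Assumed tokenisation rule: runs of letters, optionally joined by a
# hyphen or apostrophe (e.g. "stem-cell"). AntConc's own token
# definition is configurable and may differ.
TOKEN_RE = re.compile(r"[^\W\d_]+(?:[-'’][^\W\d_]+)*")


def tokenize(text):
    """Split text into lowercase word tokens."""
    return [token.lower() for token in TOKEN_RE.findall(text)]


def wordlist(tokens):
    """Corpus-driven: frequency of every single word (simple-term candidates)."""
    return Counter(tokens)


def ngrams(tokens, n):
    """Corpus-driven: frequency of every run of n words (complex-term candidates).

    Note: this naive version also counts n-grams that cross sentence boundaries.
    """
    return Counter(
        " ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)
    )


def kwic(tokens, node, width=3):
    """Corpus-based: keyword-in-context lines for a word you already suspect is a term."""
    lines = []
    for i, token in enumerate(tokens):
        if token == node:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append((left, token, right))
    return lines


if __name__ == "__main__":
    sample = "Stem cells are cells. Stem-cell therapy uses stem cells."
    tokens = tokenize(sample)
    print(wordlist(tokens).most_common(3))
    print(ngrams(tokens, 2).most_common(3))
    print(kwic(tokens, "cells", width=2))
```

The difference between the two families of methods shows up in the inputs: `wordlist` and `ngrams` need only the corpus, whereas `kwic` needs a search word chosen by you.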
1. Create a corpus for each language
▪ Expand your bilingual comparable corpus with BootCat.
▪ Include your report.csv file in the project folder!
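On the warning about .txt conversion: a frequent problem is that converted files end up in mixed encodings, so accented characters (common in Italian) appear garbled in the concordancer. The sketch below shows one possible clean-up step in Python. The fallback encoding `cp1252` and the function names `to_clean_utf8` and `convert_folder` are assumptions, not part of the assignment; check the result by eye before adding files to your corpus.

```python
import unicodedata
from pathlib import Path


def to_clean_utf8(raw, fallback_encoding="cp1252"):
    """Decode raw bytes to text, trying UTF-8 first, then an assumed fallback."""
    try:
        text = raw.decode("utf-8-sig")  # utf-8-sig also drops a leading BOM
    except UnicodeDecodeError:
        # Assumption: non-UTF-8 files come from a Windows tool using cp1252.
        text = raw.decode(fallback_encoding, errors="replace")
    # Use one representation for accented letters (e + combining accent -> é).
    text = unicodedata.normalize("NFC", text)
    # Remove invisible soft hyphens that some PDF exports leave inside words.
    text = text.replace("\u00ad", "")
    # Use Unix line endings throughout.
    return text.replace("\r\n", "\n").replace("\r", "\n")


def convert_folder(source_dir, target_dir):
    """Write a UTF-8 copy of every .txt file in source_dir into target_dir."""
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    for path in sorted(Path(source_dir).glob("*.txt")):
        cleaned = to_clean_utf8(path.read_bytes())
        (target / path.name).write_text(cleaned, encoding="utf-8")
```

A call such as `convert_folder("corpus_it_raw", "corpus_it")` (hypothetical folder names) would leave the original files untouched and write the cleaned copies to a separate folder.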
2. Extract a list of candidate terms
▪ Are the terms you extracted representative of the breadth and depth of your domain?
3. Create a list of bilingual candidate terms
▪ Try to match the terms you extracted across the two languages, and enter them in a .txt file in this format:
Term_EN TAB Term_IT TAB
Example: stem cells TAB cellule staminali
▪ Don’t:
  ▪ Add headings (column names) like “EN/IT” or “Term_EN” “Term_IT”.
  ▪ Precede the term with hyphens or numbers. E.g., “· stem cell” or “1. stem cell”.
  ▪ Classify terms (e.g., simple vs. complex terms). Just order them alphabetically.
▪ Be careful with:
  ▪ Selecting a singular or plural form of the term. Choose the one that is more frequent.
  ▪ Capitalization. Capitalize only if the term requires it. English tends to capitalize (e.g., in titles), but the term is not necessarily capitalized.
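The formatting rules above can be checked mechanically before you hand in the file. The following Python sketch is one possible way to do so: it strips leading bullets and numbering, drops a heading row, orders the pairs alphabetically by the English term, and writes one tab-separated pair per line. The names `clean_term`, `format_term_list`, and `HEADER_WORDS` are hypothetical, and the output follows the example line (no tab after the Italian term), which is an assumption. Choosing between singular and plural and deciding on capitalization still have to be done by hand.

```python
import re

# Leading bullets or hyphens, or list numbering such as "1." / "2)"
# followed by a space. Terms like "3D printing" are left untouched.
LEADING_MARKER = re.compile(r"^\s*(?:[-–·•*▪]+\s*|\d+[.)]\s+)")

# Assumed set of strings that only ever appear as column headings.
HEADER_WORDS = {"en", "it", "term_en", "term_it"}


def clean_term(term):
    """Remove list markers and surrounding whitespace from one term."""
    return LEADING_MARKER.sub("", term).strip()


def format_term_list(pairs):
    """Return the text of a tab-separated EN/IT term list, sorted by EN term."""
    cleaned = set()
    for en, it in pairs:
        en, it = clean_term(en), clean_term(it)
        if not en or not it:
            continue  # skip incomplete pairs
        if en.lower() in HEADER_WORDS and it.lower() in HEADER_WORDS:
            continue  # skip a heading row such as "Term_EN<TAB>Term_IT"
        cleaned.add((en, it))
    ordered = sorted(cleaned, key=lambda p: (p[0].casefold(), p[1].casefold()))
    return "".join(f"{en}\t{it}\n" for en, it in ordered)


if __name__ == "__main__":
    raw_pairs = [
        ("Term_EN", "Term_IT"),
        ("1. stem cells", "cellule staminali"),
        ("· bone marrow", "midollo osseo"),
    ]
    print(format_term_list(raw_pairs), end="")
```

Writing the returned string to a file with `encoding="utf-8"` keeps accented Italian characters intact.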