Terminology Extraction Tools
Technical terms are important for knowledge mining, especially in the biomedical area, where a
vast number of documents are available. The number of terms (e.g., names of genes, proteins,
chemical compounds, drugs, and organisms) is increasing at an astounding rate in the biomedical
literature. Existing terminological resources and scientific databases cannot keep up to date with
the growth of neologisms. A domain-independent method for term recognition is very useful to
automatically
Uploading: Texts may be submitted for analysis in any of the following ways:
entering the text you would like to analyze into the topmost text window;
specifying a text file (*.txt or *.pdf) from your computer's hard drive;
entering a URL of a Web resource (*.html or *.pdf).
fivefilters: This is a free software project to enable easy term extraction through a web service.
Given some text, it will return a list of terms with the most relevant first.
The list is returned in JSON format. It is a free alternative to Yahoo's Term Extraction service. It is
being developed as part of the Five Filters project to promote alternative, non-corporate media.
Languages supported: English
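As a rough illustration of consuming a relevance-ordered JSON term list like the one described above, the following minimal sketch parses a payload and keeps the first few terms. The payload shape (a flat JSON array of strings), the function name top_terms, and the sample data are all assumptions made for illustration; the actual response format of the service may differ.

```python
import json


def top_terms(payload, limit=5):
    """Parse a JSON array of terms and return the first `limit` entries.

    Assumes (hypothetically) that the payload is a flat JSON array of
    strings already ordered from most to least relevant.
    """
    terms = json.loads(payload)
    if not isinstance(terms, list):
        raise ValueError("expected a JSON array of terms")
    return [str(term) for term in terms[:limit]]


# Hypothetical sample payload, not real service output.
sample = '["term extraction", "web service", "json"]'
print(top_terms(sample, limit=2))
```

Because the list is assumed to arrive sorted by relevance, truncating it is enough to get the top terms; no re-ranking is done on the client side.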
Maui-indexer: Maui automatically identifies the main topics in text documents. Depending on the
task, topics are tags, keywords, keyphrases, vocabulary terms, descriptors, index terms or titles
of Wikipedia articles. It also shows how keyphrases can be extracted from document text.
File formats supported: text, PDF, Microsoft Word.
VocabGrabber analyzes any text, generating lists of the most useful vocabulary words and
showing how those words are used in context. VocabGrabber creates a list of vocabulary from the
text, which can then be sorted, filtered, and saved. By selecting any word on the list, it is possible
to see a snapshot of the Visual Thesaurus map and definitions for that word, along with
examples of the word in the text.
Languages supported: English
File formats supported: all formats.
Anchovy: a free multilingual cross-platform glossary editor and term extraction tool based on
the open Glossary Markup Language (GlossML) format.
Bibclassify - A module in CDS Invenio (CERN’s document server software) for automatic
assignment of terms from SKOS vocabularies, developed on the High Energy Physics vocabulary.
Developed in a collaboration between CERN and DESY.
Extractor - Commercial software for keyword extraction in different languages. There is also a
demo. Developed at the National Research Council of Canada.
Topia term extractor - Part-of-speech and frequency-based term extraction tool implemented in
Python.
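To give a feel for frequency-based term extraction in general, here is a minimal Python sketch. It is not Topia's implementation: in place of part-of-speech tagging it uses a small, hand-picked stopword list (an assumption) to break the text into candidate word runs, and the names STOPWORDS, extract_terms, and min_count are hypothetical.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption); it stands in for the
# part-of-speech filtering that a real extractor would perform.
STOPWORDS = {
    "a", "an", "and", "are", "as", "at", "by", "for", "in",
    "is", "it", "of", "on", "or", "the", "to", "with",
}


def extract_terms(text, min_count=2):
    """Return (term, count) pairs for candidate terms, most frequent first.

    Candidates are 1- to 3-word sequences taken from runs of consecutive
    non-stopwords. Punctuation is ignored, so a run may cross a sentence
    boundary; this is a simplification of the sketch.
    """
    words = re.findall(r"[a-z]+(?:-[a-z]+)*", text.lower())
    counts = Counter()
    run = []
    for word in words + [None]:  # None is a sentinel that flushes the last run
        if word is not None and word not in STOPWORDS:
            run.append(word)
            continue
        # Count every contiguous 1- to 3-word sequence inside the run.
        for n in (1, 2, 3):
            for i in range(len(run) - n + 1):
                counts[" ".join(run[i:i + n])] += 1
        run = []
    return [(term, c) for term, c in counts.most_common() if c >= min_count]


text = "Gene expression data is noisy. Gene expression varies by tissue."
for term, count in extract_terms(text):
    print(term, count)
```

Raising min_count trades recall for precision: rare candidates are dropped, which removes most accidental word combinations but also any genuine term that appears only once.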
Yahoo term extraction - Web-service based content analysis via term extraction, includes a
demo.