GAURAV SINGH MLS-203 2nd ASSIGNMENT
Vocabulary control refers to the process of creating, maintaining, and using a controlled vocabulary, where
a limited set of terms must be used to index documents, and to search for these documents, in a particular
system. A controlled vocabulary may be defined as a list of terms, showing their relationships, that is
used to represent the specific subject of a document.
An information system may help the user by explicitly assigning index terms (that is, words or
notations) to documents and by controlling, at least in alphabetical (word) systems, the semantic
and often the syntactic relationships between these index terms. The words (which may be subject
headings or descriptors) are assigned from recognized subject heading lists or thesauri, and the
notations from recognized classification schedules; such a system therefore uses a controlled
vocabulary. A controlled vocabulary is one in which there is only one term or notation for any
one concept. The Library of Congress List of Subject Headings is an example of a controlled
alphabetical vocabulary, and the Dewey Decimal Classification is an example of a notational
vocabulary (by definition, all notational vocabularies must be controlled).
A certain degree of structure is introduced in a controlled vocabulary so that terms whose meanings
are related are brought together or linked in some way. An uncontrolled vocabulary, by contrast, is
an unlimited set of terms drawn from natural language and used to describe the contents of
documents.
In addition, for word-based systems, the controlled vocabulary identifies synonymous terms and
selects one preferred term among them. For homonyms, it explicitly identifies the multiple
concepts expressed by that word or phrase. In short, vocabulary control helps overcome the
problems that arise from describing a document's subject in natural language. If vocabulary
control is not exercised, different indexers, or the same indexer on different occasions, might use
different terms for the same concept when indexing documents on the same subject. A searcher
might then use yet another set of terms for that subject at the time of searching. This, in turn,
would result in a ‘mismatch’ and thus affect information retrieval.
In library and information science, a controlled vocabulary is a carefully selected list of words
and phrases used to tag units of information (documents or works) so that they can be retrieved
more easily by a search. Controlled vocabularies solve the problems of homographs, synonyms and
polysemes by establishing a one-to-one correspondence (a bijection) between concepts and
authorized terms. In short, controlled vocabularies reduce the ambiguity inherent in ordinary human
languages, where the same concept can be given different names, and they ensure consistency.
For example, in the Library of Congress Subject Headings (a subject heading system that uses
a controlled vocabulary), authorized terms—subject headings in this case—have to be chosen
to handle choices between variant spellings of the same word (American versus British), choice
among scientific and popular terms (cockroach versus Periplaneta americana), and choices
between synonyms (automobile versus car), among other difficult issues.
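The synonym problem can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not how any particular system works. The `USE` table, its entries, and the choice of which variant is authorized are all hypothetical; they are not the actual LCSH headings. Both the indexer's term and the searcher's term are mapped to the single authorized term before the index is consulted, so the two variants can no longer mismatch.

```python
# Minimal sketch of synonym control. The USE table and the choice of
# authorized terms are hypothetical, not actual LCSH headings.
USE = {
    "colour": "color",
    "periplaneta americana": "cockroach",
    "car": "automobile",
}

def authorized(term: str) -> str:
    """Return the authorized term for a word or phrase (follows a USE reference)."""
    key = term.strip().lower()
    return USE.get(key, key)

# Index two documents with whatever term each indexer happened to choose.
index: dict[str, set[str]] = {}
for doc_id, term in [("doc1", "Car"), ("doc2", "automobile")]:
    index.setdefault(authorized(term), set()).add(doc_id)

def search(term: str) -> set[str]:
    """Map the searcher's term to the authorized term before the lookup."""
    return index.get(authorized(term), set())

print(sorted(search("car")))         # ['doc1', 'doc2']
print(sorted(search("Automobile")))  # ['doc1', 'doc2']
```

Without the `authorized()` calls, the two documents would be filed under different keys, and a search for one variant would miss the other document.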
Choices of authorized terms are based on the principles of user warrant (what terms users are
likely to use), literary warrant (what terms are generally used in the literature and documents),
and structural warrant (terms chosen by considering the structure and scope of the controlled
vocabulary).
Controlled vocabularies also typically handle the problem of homographs with qualifiers. For
example, the term pool has to be qualified to refer to either swimming pool or the game pool
to ensure that each authorized term or heading refers to only one concept.
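Here is a minimal sketch of the qualifier technique. It assumes a hypothetical list of two headings for "pool", and the scope notes are illustrative only. Each qualified heading stands for exactly one concept. An ambiguous entry word therefore leads the user to the alternatives instead of a mixed set of documents.

```python
# Hypothetical qualified headings: each one names exactly one concept.
HEADINGS = {
    "Pool (Swimming)": "A tank of water for swimming.",
    "Pool (Game)": "A cue sport played on a table with pockets.",
}

def candidates(word: str) -> list[str]:
    """List every qualified heading whose unqualified base term matches `word`."""
    word = word.strip().lower()
    return sorted(h for h in HEADINGS if h.split(" (")[0].lower() == word)

# An ambiguous entry word offers the user a choice between concepts.
for heading in candidates("pool"):
    print(heading, "-", HEADINGS[heading])
```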
There are two main kinds of controlled vocabulary tools used in libraries: subject headings and
thesauri. While the differences between the two are diminishing, there are still some minor
differences.
Subject headings were designed to be used by catalogers to describe books in library catalogs, while
thesauri were used by indexers to apply index terms to documents and articles. Subject headings tend
to be broader in scope, describing whole books, while thesauri tend to be more specialized, covering
very specific disciplines. Also, because of the card catalog system, subject headings tend to have
terms in indirect order (for example, "Chemistry, Organic" rather than "Organic chemistry"), though
this practice is being dropped with the rise of automated systems. Thesaurus terms, by contrast, are
always in direct order. Subject headings also tend to combine several concepts into one
pre-coordinated heading, while thesauri tend to use singular direct terms. Lastly, thesauri list not
only equivalent terms but also narrower, broader and related terms among various authorized and
non-authorized terms, while historically most subject headings did not.
integral part of bibliographic control, which is the function by which libraries collect, organize,
and disseminate documents.
The word 'thesaurus' comes from the Greek term 'thesauros', meaning a storehouse or treasury of
words. The Oxford English Dictionary defines "thesaurus" as an archaeological term, "a treasury
of a temple, etc.", and quotes its use in 1736 as a treasury or storehouse of knowledge. It is also
defined as "a book of words or of information about a particular field or a set of concepts,
especially a dictionary of synonyms". A dictionary lists words in alphabetical order along with
their meanings, synonyms, etc., but a thesaurus assembles all the words related to an idea in one
place. Modern usage may be said to date from 1852, when Peter Mark Roget conceived of his
thesaurus as a classification of ideas. Roget's Thesaurus had nothing to do with information
retrieval, but his novel idea was later profitably employed in compiling thesauri for information
retrieval.
Different experts and organisations have offered a number of definitions of 'thesaurus'. The most
comprehensive one comes from the International Organization for Standardization (ISO), which
defines a thesaurus by its functions and by its structure. In terms of functions, it states that "a
thesaurus is a terminological control device used in translating from the natural language of
documents, indexers or users into a more constrained 'system language' (documentation language,
information language)". In terms of structure, the Standard says that "a thesaurus is a controlled
and dynamic vocabulary of semantically and generically related terms which covers a specific
domain of knowledge". In short, a thesaurus may be defined as a compilation of descriptors for use
in an information retrieval system, arranged in alphabetical order and showing the various types of
relationships existing between the descriptors. An information retrieval thesaurus is a kind of
semantic network of concepts.
iv. homonyms are differentiated by qualifiers;
b) it shows the intrinsic semantic relationships existing between terms, and thus provides a
system of references between terms;
c) it helps the indexer and the searcher in the choice of preferred terms;
d) it provides a hierarchical display of terms so that a search can be broadened or narrowed
systematically;
e) it increases the speed of retrieval through the use of indexing terms and search terms; and
f) it provides a map of a given subject field, which helps the user understand the structure of the field.
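Function (d), broadening or narrowing a search through the hierarchy, can be sketched as follows. The hierarchy is hypothetical and stores only BT links. NT links are derived from them, which is one simple way to keep the two directions consistent. That design is assumed here; no standard prescribes it.

```python
from typing import Optional

# Hypothetical hierarchy: descriptor -> its broader term (BT).
BT = {
    "Dogs": "Mammals",
    "Cats": "Mammals",
    "Mammals": "Animals",
    "Birds": "Animals",
}

def broaden(term: str) -> Optional[str]:
    """Move one step up the hierarchy (BT); None means the term is at the top."""
    return BT.get(term)

def narrower(term: str) -> list[str]:
    """Direct narrower terms (NT), derived by reversing the BT links."""
    return sorted(t for t, b in BT.items() if b == term)

def narrow_all(term: str) -> list[str]:
    """The term plus every narrower term below it, depth first."""
    result = [term]
    for nt in narrower(term):
        result.extend(narrow_all(nt))
    return result

print(broaden("Dogs"))        # Mammals
print(narrow_all("Animals"))  # ['Animals', 'Birds', 'Mammals', 'Cats', 'Dogs']
```

A searcher who gets too few hits for "Dogs" can move up to "Mammals". A searcher who wants everything under "Animals" can combine, with OR, all the terms returned by `narrow_all()`.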
Based on the nature of terminology control, there are mainly two types of thesauri:
a) controlled thesauri, which allow only one term (the preferred term) to denote a concept for the
purpose of indexing and searching; and
b) free language thesauri, which allow all terms denoting a concept to be used for indexing
and searching.
Controlled thesauri can be maintained manually, but free language thesauri require machine
maintenance and retrieval.
The internal form of individual entries and the arrangement of the entries in relation to one
another constitute the structure of a thesaurus. Cross-references make explicit the way in which
entries relate to each other in a network of concepts. Each entry in a thesaurus consists of a group
of terms that are related to the descriptor in different ways. The different terms in an entry are
displayed in the following format:
DESCRIPTOR
(with a scope note whenever needed)
Synonyms and quasi-synonyms
(displaying the equivalence relationship; denoted by the relationship indicators USE and UF (Use For))
Broader Terms
(displaying the hierarchical, superordinate relationship; denoted by BT)
Narrower Terms
(displaying the hierarchical, subordinate relationship; denoted by NT)
Related Terms
(displaying the associative relationship; denoted by RT)
Top Term
(displaying the broadest term of the hierarchy to which the descriptor belongs; denoted by TT. The top
term is not repeated when all the descriptors belong to the same broad class.)
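The entry format maps naturally onto a small record type. The sketch below is hypothetical: the `Entry` class, its field names and the sample terms are illustrations, not a standard data model. It prints an entry with the indicators UF, BT, NT, RT and TT. It also generates the USE references that point from each non-preferred synonym back to the descriptor.

```python
from dataclasses import dataclass, field

@dataclass
class Entry:
    """One thesaurus entry (hypothetical field names, not from any standard)."""
    descriptor: str
    scope_note: str = ""                          # shown only when needed
    uf: list[str] = field(default_factory=list)   # non-preferred synonyms (Use For)
    bt: list[str] = field(default_factory=list)   # broader terms
    nt: list[str] = field(default_factory=list)   # narrower terms
    rt: list[str] = field(default_factory=list)   # related terms
    tt: str = ""                                  # top term; empty if not shown

    def display(self) -> str:
        """Render the entry with its relationship indicators, one per line."""
        lines = [self.descriptor.upper()]
        if self.scope_note:
            lines.append(f"  ({self.scope_note})")
        for tag, terms in (("UF", self.uf), ("BT", self.bt),
                           ("NT", self.nt), ("RT", self.rt)):
            lines.extend(f"  {tag} {t}" for t in terms)
        if self.tt:
            lines.append(f"  TT {self.tt}")
        return "\n".join(lines)

    def use_references(self) -> list[str]:
        """The reciprocal USE entries, one per non-preferred synonym."""
        return [f"{t} USE {self.descriptor}" for t in self.uf]

entry = Entry("Automobiles", uf=["Cars", "Motor cars"], bt=["Motor vehicles"],
              nt=["Electric automobiles"], rt=["Roads"], tt="Vehicles")
print(entry.display())
print("\n".join(entry.use_references()))
```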
A thesaurus may be either alphabetical or classified, and it may or may not include a graphical
display. In an alphabetical thesaurus, the descriptors, each followed by its relationships, are listed
in alphabetical sequence. In a classified thesaurus, the descriptors are listed according to the
hierarchical relationships represented in the thesaurus, and appropriate indentation shows the
levels of the hierarchy. Graphical displays are multi-dimensional ways of representing the
relationships between terms. They show such relationships with arrows or lines, or by placing terms
in concentric circles that show the hierarchy. Reciprocal entries appear for each term in a
thesaurus whenever a relationship, whether hierarchical or non-hierarchical, is established
between two terms (Fidel, 1991).
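The rule that every relationship gets a reciprocal entry can be automated. In this minimal sketch, the function name and sample links are hypothetical. Each relationship is entered once and its reciprocal is added automatically: BT pairs with NT, USE with UF, and RT with RT.

```python
from collections import defaultdict

# Reciprocal of each relationship indicator.
RECIPROCAL = {"BT": "NT", "NT": "BT", "RT": "RT", "USE": "UF", "UF": "USE"}

def with_reciprocals(links: list[tuple[str, str, str]]) -> dict[str, set[tuple[str, str]]]:
    """Given (term, indicator, other) links, add the reciprocal of each one.

    Returns a mapping from each term to its set of (indicator, other term) pairs.
    """
    entries: defaultdict[str, set[tuple[str, str]]] = defaultdict(set)
    for term, rel, other in links:
        entries[term].add((rel, other))
        entries[other].add((RECIPROCAL[rel], term))  # the reciprocal entry
    return dict(entries)

# Hypothetical links, each entered only once.
links = [
    ("Dogs", "BT", "Mammals"),
    ("Dogs", "RT", "Pets"),
    ("Canines", "USE", "Dogs"),
]
entries = with_reciprocals(links)
for term in sorted(entries):
    print(term, sorted(entries[term]))
```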
The classaurus is also a vocabulary control device. Dr. Ganesh Bhattacharyya developed it at DRTC, and
it incorporates features of both a faceted classification scheme and a conventional alphabetical
thesaurus. It is an elementary category-based (faceted) systematic scheme of hierarchical
classification in the verbal plane, incorporating all the necessary and sufficient features of a
conventional information retrieval thesaurus. Like any classification scheme, it displays hierarchical
relationships among terms in its schedules. Like a faceted classification scheme, it has separate
schedules for each of the Elementary Categories (Entity, Property and Action) and for common modifiers
such as Form, Time, Place and Environment. Like any thesaurus, each term in the hierarchical schedules
is enriched by synonyms, quasi-synonyms, etc. Unlike a thesaurus, however, a classaurus does not
include associatively related terms (RTs), because of its category-based (faceted) structure. The
argument is that a term in one elementary category has a high chance of being associatively related
to a term in another category, depending on the subject of the document. RTs should therefore not be
dictated by the designer of the classaurus. Instead, the document itself should dictate them, since any
term may be associatively related to other terms depending on the nature of the thought content of
the document. The classaurus has two parts: the Systematic Part and the Alphabetical Index Part
(Leise, 2008).
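The link between the Systematic Part and the Alphabetical Index Part of a classaurus can be sketched as below. The schedule fragment (rice, yield, irrigation) is a hypothetical example under the three elementary categories. The " > " path notation is an assumption standing in for indentation. As the text prescribes, no RTs are stored. Every term and synonym in the index simply points back to its place in the systematic schedule.

```python
# Hypothetical fragment of a Systematic Part: elementary category ->
# list of (hierarchical path, synonyms). " > " marks each level down.
SYSTEMATIC = {
    "Entity": [("Plants", []), ("Plants > Rice", ["Paddy"])],
    "Property": [("Yield", ["Productivity"])],
    "Action": [("Irrigation", ["Watering"])],
}

def alphabetical_index(systematic: dict) -> dict[str, tuple[str, str]]:
    """Map every term and synonym to its (category, path) in the Systematic Part."""
    index: dict[str, tuple[str, str]] = {}
    for category, schedule in systematic.items():
        for path, synonyms in schedule:
            term = path.split(" > ")[-1]  # the term itself is the last level
            for name in [term, *synonyms]:
                index[name] = (category, path)
    return dict(sorted(index.items()))  # alphabetical order

for name, (category, path) in alphabetical_index(SYSTEMATIC).items():
    print(f"{name}: see {category} / {path}")
```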
The Thesaurofacet was developed by Jean Aitchison and others for the English Electric Company. It
is basically a faceted classification integrated with a thesaurus. The Thesaurofacet consists of two
sections: a) a faceted classification scheme, and b) an alphabetical thesaurus. Here, the thesaurus
replaces the alphabetical subject index that normally follows the schedules in a conventional
faceted classification. Terms appear twice: once in the schedule and once in the alphabetical
thesaurus.
A taxonomy is an orderly classification for a defined domain. It may also be known as a faceted
vocabulary. It comprises controlled vocabulary terms (generally only preferred terms) organized
into a hierarchical structure. Each term in a taxonomy is in one or more parent/child (broader/
narrower) relationships to other terms in the taxonomy. There can be different types of parent/child
relationships, such as whole/part, genus/species, or instance relationships. However, in good
practice, all children of a given parent share the same type of relationship.
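A taxonomy's parent/child structure can be checked mechanically, including the good-practice rule that siblings share one relationship type. This minimal sketch assumes a hypothetical fragment in which each term has a single parent. A real taxonomy may allow several parents, as the text notes.

```python
from typing import Optional

# child -> (parent, relationship type); None marks a top term.
# Hypothetical fragment; each term has one parent for simplicity.
Taxonomy = dict[str, Optional[tuple[str, str]]]

TAXONOMY: Taxonomy = {
    "Vehicles": None,
    "Cars": ("Vehicles", "genus/species"),
    "Bicycles": ("Vehicles", "genus/species"),
    "Car engines": ("Cars", "whole/part"),
    "Car wheels": ("Cars", "whole/part"),
}

def mixed_parents(taxonomy: Taxonomy) -> list[str]:
    """Parents whose children use more than one relationship type."""
    types: dict[str, set[str]] = {}
    for link in taxonomy.values():
        if link is None:
            continue  # top terms have no parent
        parent, rel = link
        types.setdefault(parent, set()).add(rel)
    return sorted(p for p, rels in types.items() if len(rels) > 1)

def path(term: str, taxonomy: Taxonomy) -> list[str]:
    """Broader-to-narrower path from the top term down to `term`."""
    chain = [term]
    while (link := taxonomy[chain[-1]]) is not None:
        chain.append(link[0])
    return chain[::-1]

print(path("Car wheels", TAXONOMY))  # ['Vehicles', 'Cars', 'Car wheels']
print(mixed_parents(TAXONOMY))       # []
bad = {**TAXONOMY, "Sports cars": ("Cars", "genus/species")}
print(mixed_parents(bad))            # ['Cars']
```

In the last case, "Cars" mixes whole/part children (engines, wheels) with a genus/species child (sports cars), which is the situation the good-practice rule warns against.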
A taxonomy may differ from a thesaurus in that it generally has shallower hierarchies and a less
complicated structure. For example, it often has no equivalent (synonyms or variant terms) or
related terms (associative relationships). The scientific classifications of animals and plants are
well-known examples of taxonomies. A partial display of Flavobacteria in the taxonomy of the
U.S. National Center for Biotechnology Information is above. In common usage, the term
taxonomy may also refer to any classification or placement of terms or headings into categories,
particularly a controlled vocabulary used as a navigation structure for a Web site.
Whereas the vocabularies discussed above are the ones most commonly used for art information,
discussions of controlled vocabularies may also include ontologies. In common usage in computer
science, an ontology is a formal, machine-readable specification of a conceptual model in which
concepts, properties, relationships, functions, constraints, and axioms are all explicitly defined.
Such an ontology is not a controlled vocabulary, but it uses one or more controlled vocabularies
for a defined domain and expresses the vocabulary in a representation language that has a
grammar for using vocabulary
terms to express something meaningful. Ontologies generally divide the realm of knowledge
that they represent into the following areas: individuals, classes, attributes, relations, and
events. The grammar of the ontology links these areas together by formal constraints that
determine how the vocabulary terms or phrases may be used together. There are several grammars
or languages for ontologies, both proprietary and standards-based. An ontology is used to
make queries and assertions. Ontologies have some characteristics in common with faceted
taxonomies and thesauri, but ontologies use strict semantic relationships among terms and
attributes with the goal of knowledge representation in machine-readable form, whereas
thesauri provide tools for cataloging and retrieval. Ontologies are used in the Semantic Web,
artificial intelligence, software engineering, and information architecture as a form of
knowledge representation in electronic form about a particular domain of knowledge.
In the example above, each item in the ontology belongs to the subclass above it. Items can also
belong to various other classes, although the relationships may be different. For example, a
watercolor is a painting, but it may also be classified as a drawing because it is a work on paper.
Van Gogh’s Irises could be classified with oil paintings (with the relationship type "medium is")
but also with Post-Impressionist art (with the relationship type "style/period is"). Relationships in
ontologies are defined according to strict rules, which are different from the equivalence,
hierarchical, and associative relationships used in the thesauri and other vocabularies discussed
above.
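The typed relationships in this example can be sketched as subject-relation-object triples. This is a toy illustration, not an ontology language such as OWL. The relation names come from the paragraph above, and the triples are illustrative. The rule that only declared relation names may be used is a hypothetical stand-in for the formal constraints that an ontology grammar imposes.

```python
# Relation names permitted by this toy "grammar" (a hypothetical rule).
ALLOWED_RELATIONS = {"is a", "medium is", "style/period is"}

triples: list[tuple[str, str, str]] = []

def assert_fact(subject: str, relation: str, obj: str) -> None:
    """Record a triple, rejecting relation names the grammar does not define."""
    if relation not in ALLOWED_RELATIONS:
        raise ValueError(f"undefined relation: {relation!r}")
    triples.append((subject, relation, obj))

def query(subject: str, relation: str) -> list[str]:
    """All objects linked to the subject by the given relation."""
    return sorted(o for s, r, o in triples if s == subject and r == relation)

assert_fact("Watercolor", "is a", "Painting")
assert_fact("Watercolor", "is a", "Drawing")  # also a work on paper
assert_fact("Irises", "medium is", "Oil painting")
assert_fact("Irises", "style/period is", "Post-Impressionism")

print(query("Watercolor", "is a"))         # ['Drawing', 'Painting']
print(query("Irises", "style/period is"))  # ['Post-Impressionism']
```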
Although there are many new concepts and ideas to learn about in the world of vocabulary control,
especially in an enterprise setting, many indexers find the work of creating controlled vocabularies
to be both a natural extension of their indexing skills and an important new income stream.
Fidel, R. (1991). Searchers' selection of search keys: II. Controlled vocabulary or free-text
searching. Journal of the American Society for Information Science, 42(7), 501–514.
https://doi.org/10.1002/(SICI)1097-4571(199108)42:7<501::AID-ASI5>3.0.CO;2-V
Leise, F. (2008). Controlled vocabularies: An introduction. The Indexer: The International
Journal of Indexing, 26(3), 121–126. https://doi.org/10.3828/indexer.2008.37
Mukherjee, B. (2017). Vocabulary control: Subject heading lists and thesauri (Unit 15,
pp. 21–38). http://www.egyankosh.ac.in/bitstream/123456789/33118/1/Unit-15.pdf