GAURAV SINGH MLS-203 2nd ASSIGNMENT


SUBMITTED BY

NAME- GAURAV SINGH


ROLL NO- 07
COURSE- MLISC
BATCH- 2020-2021

SUBMITTED TO – MS. SHAZIA ALVI

Vocabulary Control refers to the process of creating, maintaining, and using a controlled vocabulary, where
a limited set of terms must be used to index documents, and to search for these documents, in a particular
system. It may be defined as a list of terms showing their relationships and used to represent the specific
subject of the document.

JAMIA MILLIA ISLAMIA


2021

An information system may help the user by explicitly assigning index terms (that is, words or
notations) to the documents and controlling, at least in the case of alphabetical (word) systems,
the semantic and often the syntactic relationships between these index terms. The words (which
may be subject headings or descriptors) are assigned from recognized subject heading lists or
thesauri, and the notations from recognized classification schedules, and thus use a controlled
vocabulary. A controlled vocabulary is one in which there is only one term or notation in the
vocabulary for any one concept. The Library of Congress List of Subject Headings is an
example of a controlled alphabetical vocabulary, and the Dewey Decimal Classification is an
example of a notational vocabulary. (By definition, all notational vocabularies must be
controlled.)

The term 'vocabulary control' refers to a limited set of terms that must be used to index
documents, and to search for these documents, in a particular system. It may be defined as
a list of terms showing their relationships and used to represent the specific subject of a
document. A certain degree of structure is introduced in a controlled vocabulary so that terms
whose meanings are related are brought together or linked in some way. An uncontrolled
vocabulary is an unlimited set of terms drawn from natural language and used for describing
the contents of documents.
In addition, for word-based systems, the controlled vocabulary identifies synonymous terms and
selects one preferred term among them. For homonyms, it explicitly identifies the multiple
concepts expressed by that word or phrase. In short, vocabulary control helps in overcoming
problems that arise from the natural language used to express a document's subject. Hence, if
vocabulary control is not exercised, different indexers, or the same indexer on different
occasions, might use different terms for the same concept when indexing documents dealing with
the same subject, and searchers might use yet another set of terms for that subject at the time of
searching. This, in turn, would result in a 'mismatch' and thus affect information retrieval.

In library and information science, controlled vocabulary is a carefully selected list of words
and phrases, which are used to tag units of information (document or work) so that they may
be more easily retrieved by a search. Controlled vocabularies solve the problems of
homographs, synonyms and polysemes by a bijection between concepts and authorized terms.
In short, controlled vocabularies reduce ambiguity inherent in normal human languages where
the same concept can be given different names and ensure consistency.
For example, in the Library of Congress Subject Headings (a subject heading system that uses
a controlled vocabulary), authorized terms—subject headings in this case—have to be chosen
to handle choices between variant spellings of the same word (American versus British), choice
among scientific and popular terms (cockroach versus Periplaneta americana), and choices
between synonyms (automobile versus car), among other difficult issues.
Choices of authorized terms are based on the principles of user warrant (what terms users are
likely to use), literary warrant (what terms are generally used in the literature and documents),
and structural warrant (terms chosen by considering the structure and scope of the controlled
vocabulary).
Controlled vocabularies also typically handle the problem of homographs with qualifiers. For
example, the term pool has to be qualified to refer to either swimming pool or the game pool
to ensure that each authorized term or heading refers to only one concept.
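
The control of synonyms and homographs described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how the Library of Congress Subject Headings are actually stored: the table contents, the choice of which variant counts as the authorized term, the parenthetical qualifier format, and the function name authorized_terms are all invented for the example.

```python
# Hypothetical entry vocabulary: each variant (synonym, variant spelling,
# scientific name) points to exactly one authorized term.
ENTRY_VOCABULARY = {
    "car": "automobile",
    "automobile": "automobile",
    "colour": "color",
    "color": "color",
    "periplaneta americana": "cockroach",
    "cockroach": "cockroach",
}

# Hypothetical homograph table: one word, several concepts, each with a
# qualified heading so that every heading names only one concept.
HOMOGRAPHS = {
    "pool": ["pool (game)", "pool (swimming)"],
}


def authorized_terms(word):
    """Return the authorized heading(s) that a searcher's word leads to."""
    key = word.strip().lower()
    if key in HOMOGRAPHS:
        # Ambiguous word: return every qualified heading to choose from.
        return list(HOMOGRAPHS[key])
    if key in ENTRY_VOCABULARY:
        return [ENTRY_VOCABULARY[key]]
    return []  # the word is not in the controlled vocabulary


print(authorized_terms("Car"))
print(authorized_terms("pool"))
```

Because every variant resolves to a single authorized term, an indexer who thinks "car" and a searcher who types "automobile" arrive at the same heading, which is the consistency that vocabulary control is meant to provide.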

There are basically two objectives for having a controlled vocabulary:


a) to promote the consistent representation of the subject matter of documents by indexers and
searchers, thereby avoiding the dispersion of related documents, through control of synonymous
and nearly synonymous expressions and by distinguishing among homographs; and
b) to facilitate the conduct of a comprehensive search, by bringing together in some way the terms
that are most closely related semantically.

There are two main kinds of controlled vocabulary tools used in libraries: subject headings and
thesauri. While the differences between the two are diminishing, there are still some minor
differences.
Subject headings were designed for catalogers to describe books in library catalogs, while thesauri
were used by indexers to apply index terms to documents and articles. Subject headings tend to be
broader in scope, describing whole books, while thesauri tend to be more specialized, covering very
specific disciplines. Also, because of the card catalog system, subject headings tend to have terms
that are in indirect order (though with the rise of automated systems this is being removed), while
thesaurus terms are always in direct order and tend to be singular, direct terms. Lastly,
thesauri list not only equivalent terms but also narrower, broader, and related terms among
various authorized and non-authorized terms, while historically most subject headings did not.

A subject heading has been defined as a word or group of words indicating a subject under which
all materials dealing with the same theme are entered in a catalogue or bibliography, or arranged
in a file. Credit should go to Crestadoro who, in his book 'The Art of Making Catalogues',
published in 1856, was the first to recognize that the cataloguer should provide a standardized
guide to the subject content of a book by giving it a heading.
A vocabulary control device depends on a master list of terms that can be assigned to
documents. Such a master list of terms is called a 'List of Subject Headings'. A list of subject
headings contains the subject access terms (preferred terms) to be used in the cataloguing or
indexing operation at hand. When there are synonymous terms for a given subject, these terms
are included in the list as these direct the searcher to the preferred terms for the subject. The
links from non-preferred terms are called "see" references, and the links to related terms are
called "see also" references. This is accomplished through a control system, called a 'subject
authority system', which, for each term, documents the basis for decisions on the term and on
what links connect it with other terms. The rules for subject headings in a dictionary catalogue
were formulated by Charles Ammi Cutter in 1876 in his 'Rules for a Dictionary Catalog'. These
rules formed the basis of subject headings in American libraries for years to come and are a strong
force even today. In respect of subject cataloguing, Cutter stated two objectives: a) to enable
a person to find a book of which the subject is known, and b) to show what the library has on a
given subject. The first objective refers to the need to locate individual items, and the second
refers to the need to collocate materials on the same subject. It was on the basis of these needs
that Cutter set forth his basic principles of subject entry. They are important because the impact
of his principles on construction and maintenance of subject headings is still discernible today.
Two popular subject heading lists are Library of Congress Subject Headings (LCSH) and Sears
List of Subject Headings. Sears List of Subject Headings, first published by Minnie Earl Sears
in 1923, has served as a standard authority list for subject cataloging in small and medium-sized
libraries, delivering a basic list of essential headings, together with patterns and examples to guide
the cataloger in creating further headings as needed. It is available as a print publication and an
online database. The Library of Congress Subject Headings (LCSH) comprise a thesaurus (in
the information science sense, a controlled vocabulary) of subject headings, maintained by the
United States Library of Congress, for use in bibliographic records. LC Subject Headings are an
integral part of bibliographic control, which is the function by which libraries collect, organize,
and disseminate documents.

The word 'thesaurus' comes from the Greek term 'thesauros', meaning a storehouse or treasury.
The Oxford English Dictionary defines "thesaurus" as an archaeological term, "a treasury
of temple, etc.", and quotes its use in 1736 as a treasury or storehouse of knowledge. Another
dictionary definition is "a book of words or of information about a particular field or a set of
concepts, specially a dictionary of synonyms". A dictionary lists words along with their meanings,
synonyms, etc. in alphabetical order, but a thesaurus assembles all words related to an idea in
one place. Modern usage may be said to date from 1852, when Peter Mark Roget thought of
his thesaurus as a classification of ideas. Roget's Thesaurus had nothing to do with information
retrieval, but his novel idea was later profitably employed in the compilation of thesauri for
information retrieval.
There are a number of definitions of 'thesaurus' provided by different experts and organisations.
The most comprehensive one has been provided by the International Organization for
Standardization (ISO), on the basis of the structure and functions of a thesaurus. In terms of
functions, it states that "a thesaurus is a terminological control device used in translating from
the natural language of documents, indexers or users into a more constrained 'system language'
(documentation language, information language)". In terms of structure, the Standard says, "a
thesaurus is a controlled and dynamic vocabulary of semantically and generically related terms
which covers a specific domain of knowledge". In short, a thesaurus may be defined as a
compilation of descriptors for use in an information retrieval system, arranged in alphabetical
order and manifesting the various types of relationships existing between the descriptors. An
information retrieval thesaurus is a kind of semantic network of concepts.

The major functions of a thesaurus include the following:

a) it provides a standard vocabulary for a given subject field by exercising control on the
vocabulary of terms used in an indexing language. Methods of controlling the vocabulary are:
i. out of all possible synonyms and quasi-synonyms, only one term is selected as a
descriptor,
ii. the scope of the meaning of the term is clearly indicated in a scope note, so that the
selected meaning is unambiguous,
iii. a definite rule is followed for compound terms, and word forms, number (singular/plural), and
spellings are standardized, and
iv. homonyms are differentiated by qualifiers;
b) it shows the intrinsic semantic relationships existing between terms, and thus provides a
system of references between terms;
c) it helps the indexer and the searcher in the choice of preferred terms;
d) it provides a hierarchical display of terms so that a search can be broadened or narrowed
systematically;
e) it increases the speed of retrieval by the use of indexing terms and search terms; and
f) it provides a map of a given subject field, which helps in understanding the structure of the field.
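
The function of broadening or narrowing a search through the hierarchical display can be sketched as follows. This is a minimal sketch, assuming a thesaurus whose narrower-term links are held in a plain dictionary; the terms and the function names broader_of and narrow_expand are hypothetical and not taken from any real thesaurus.

```python
# Hypothetical narrower-term (NT) links; the terms are invented for illustration.
NARROWER = {
    "vehicles": ["automobiles", "bicycles"],
    "automobiles": ["electric cars", "sports cars"],
}


def broader_of(term):
    """Return the broader term (BT) of `term`, or None for a top term."""
    for parent, children in NARROWER.items():
        if term in children:
            return parent
    return None


def narrow_expand(term):
    """Return `term` plus every narrower term beneath it, depth first."""
    result = [term]
    for child in NARROWER.get(term, []):
        result.extend(narrow_expand(child))
    return result


# Searching on a term and all its narrower terms retrieves documents
# indexed at any more specific level of the hierarchy.
print(narrow_expand("automobiles"))
# Stepping up to the broader term widens the search.
print(broader_of("automobiles"))
```

A searcher who finds too few documents under "automobiles" can step up to "vehicles", while one who wants everything on automobiles can search the expanded list in a single pass.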

Based on the nature of terminology control, there are mainly two types of thesauri:
a) controlled thesauri, which allow only one term (the preferred term) to denote a concept for the
purpose of indexing and searching; and
b) free language thesauri, which allow the use of all terms that denote a concept for indexing
and searching. Controlled thesauri can be maintained manually, but free language thesauri
require machine maintenance and retrieval.

The internal form of individual entries and the arrangement of various entries in relation to one
another constitute the structure of a thesaurus. Cross-references make explicit the way in which
entries relate to each other in a network of concepts. Each entry in a thesaurus consists of a pack
of terms, which are related to it in different ways. The different terms in the entry are displayed
in the following format:
DESCRIPTOR
(with scope note whenever needed)
Synonyms and quasi-synonyms
(displaying equivalence relationship and denoted by the relationship indicators USE and UF (Use For))
Broader Terms
(displaying hierarchical, superordinate relationship and denoted by BT)
Narrower Terms
(displaying hierarchical, subordinate relationship and denoted by NT)
Related Terms
(displaying associative relationship and denoted by RT)
Top Term
(displaying hierarchical, superordinate relationship and denoted by TT. The top term or TT is not
repeated when all the descriptors belong to the same broad class.)
A thesaurus may be either alphabetical, or classified, and it may or may not include a graphical
display. In an alphabetical thesaurus, the descriptors followed by their relationships are listed in
alphabetical sequences. In a classified thesaurus, the descriptors are listed in accordance with the
hierarchical relationships represented in the thesaurus. The various levels of hierarchy are shown
by appropriate indentations. The graphical displays are multi-dimensional ways of representing
the relationships between terms. Such relationships are indicated by arrows or lines, or by presenting
terms in concentric circles showing hierarchy. Reciprocal entries appear for each term in a
thesaurus whenever a relationship, whether hierarchical or non-hierarchical, is established
between two terms (Fidel, 1991).
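
The entry structure and the rule of reciprocal entries can be illustrated with a small data structure. The sketch below is an assumption-laden illustration rather than a standard implementation: the class name Thesaurus, its methods relate and display, and the sample terms are hypothetical. Only the relationship indicators (USE, UF, BT, NT, RT) come from the entry format described above.

```python
from collections import defaultdict

# Reciprocal of each relationship indicator used in a thesaurus entry.
RECIPROCAL = {"BT": "NT", "NT": "BT", "RT": "RT", "USE": "UF", "UF": "USE"}


class Thesaurus:
    def __init__(self):
        # term -> indicator -> set of related terms
        self.entries = defaultdict(lambda: defaultdict(set))

    def relate(self, term, indicator, other):
        """Record `term INDICATOR other` together with its reciprocal entry."""
        self.entries[term][indicator].add(other)
        self.entries[other][RECIPROCAL[indicator]].add(term)

    def display(self, term):
        """Return the entry for `term` as lines of an alphabetical display."""
        lines = [term.upper()]
        for indicator in ("USE", "UF", "BT", "NT", "RT"):
            for other in sorted(self.entries[term][indicator]):
                lines.append(f"  {indicator} {other}")
        return lines


t = Thesaurus()
t.relate("automobiles", "BT", "vehicles")  # also files: vehicles NT automobiles
t.relate("automobiles", "UF", "cars")      # also files: cars USE automobiles
t.relate("automobiles", "RT", "roads")     # also files: roads RT automobiles

print("\n".join(t.display("automobiles")))
print("\n".join(t.display("vehicles")))
```

Recording the reciprocal at the moment a relationship is entered means the two halves of a link cannot drift apart, which is the practical point of requiring reciprocal entries.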

A classaurus is also a vocabulary control device, developed by Dr. Ganesh Bhattacharyya at DRTC, that incorporates
in itself features of both a faceted classification scheme and a conventional alphabetical
thesaurus. It is an elementary category-based (faceted) systematic scheme of hierarchical classification in
the verbal plane, incorporating all the necessary and sufficient features of a conventional information retrieval
thesaurus. Like any classification scheme, it displays hierarchical relationships among terms in its
schedules. Like a faceted classification scheme, it has separate schedules for each of the Elementary
Categories (Entity, Property, and Action) and for common modifiers like Form, Time, Place, and
Environment. Like any thesaurus, each of the terms in the hierarchic schedules is enriched by synonyms,
quasi-synonyms, etc. Unlike a thesaurus, a classaurus does not include other associatively related terms
(RTs) because of its category-based (faceted) structure. A term in one elementary category
has a high chance of being associatively related to a term in another category, depending on the
subject of the document. It is assumed that RTs should not be dictated by the designer of the classaurus;
rather, they should be dictated by the document itself, since any term may be associatively related to other
terms depending on the nature of the thought content of the document. The classaurus has two parts: the
Systematic Part and the Alphabetical Index Part (Leise, 2008).

The Thesauro-facet was developed by Jean Aitchison and others for the English Electric Company. It
is basically a faceted classification integrated with a thesaurus. The Thesauro-facet consists of two
sections: a) a faceted classification scheme, and b) an alphabetical thesaurus. Here, the thesaurus
replaces the alphabetical subject index, which normally follows the schedules in a conventional
faceted classification. Terms appear twice: once in the schedule and once in the alphabetical
thesaurus.

A taxonomy is an orderly classification for a defined domain. It may also be known as a faceted
vocabulary. It comprises controlled vocabulary terms (generally only preferred terms) organized
into a hierarchical structure. Each term in a taxonomy is in one or more parent/child
(broader/narrower) relationships to other terms in the taxonomy. There can be different types of
parent/child relationships, such as whole/part, genus/species, or instance relationships. However,
in good practice, all children of a given parent share the same type of relationship.
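
The good-practice rule that all children of a parent share one relationship type lends itself to a simple check. The sketch below assumes a taxonomy stored as (child, parent, relationship type) rows; the sample terms and the function name parents_with_mixed_types are invented for illustration.

```python
# Hypothetical taxonomy rows: (child, parent, relationship type).
TAXONOMY = [
    ("dogs", "mammals", "genus/species"),
    ("cats", "mammals", "genus/species"),
    ("engines", "automobiles", "whole/part"),
    ("wheels", "automobiles", "whole/part"),
    ("sedans", "automobiles", "genus/species"),  # mixes types under one parent
]


def parents_with_mixed_types(taxonomy):
    """Return parents whose children do not all share one relationship type."""
    types_by_parent = {}
    for child, parent, rel_type in taxonomy:
        types_by_parent.setdefault(parent, set()).add(rel_type)
    return sorted(p for p, types in types_by_parent.items() if len(types) > 1)


print(parents_with_mixed_types(TAXONOMY))
```

Here "automobiles" is flagged because it has both parts (engines, wheels) and kinds (sedans) as children; a taxonomy editor would normally split these into separate branches.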
A taxonomy may differ from a thesaurus in that it generally has shallower hierarchies and a less
complicated structure. For example, it often has no equivalent (synonyms or variant terms) or
related terms (associative relationships). The scientific classifications of animals and plants are
well-known examples of taxonomies. A partial display of Flavobacteria in the taxonomy of the
U.S. National Center for Biotechnology Information is above. In common usage, the term
taxonomy may also refer to any classification or placement of terms or headings into categories,
particularly a controlled vocabulary used as a navigation structure for a Web site.

Folksonomy is a neologism referring to an assemblage of concepts represented by terms and
names (called tags) that are compiled through social tagging. Social tagging is the decentralized
practice and method by which individuals and groups create, manage, and share tags (terms,
names, etc.) to annotate and categorize digital resources in an online social environment. This
method is also referred to as social classification, social indexing, mob indexing, and folk
categorization. Social tagging is not necessarily collaborative, because the effort is typically not
organized; individuals are not actually working together or in concert, and standardization and
common vocabulary are not employed.
Folksonomies do not typically have hierarchical structure or preferred terms for concepts, and they
may not even cluster synonyms. They are not considered authoritative because they are typically
not compiled by experts. Furthermore, they are by definition not applied to documents by
professional indexers. Given that it is impossible for the large and varied community of creators
and users of Web content to independently add metadata in a consistent manner, folksonomies are
generally characterized by nonstandard, idiosyncratic terminology. Although they do not support
organized searching and other types of browsing as well as tags from controlled vocabularies
applied by professionals, folksonomies can be useful in situations where controlled tagging is not
possible: they can also provide additional access points not included in more formal vocabularies.
There may be great potential for enhanced retrieval by linking terms and names from folksonomies
to more rigorously structured controlled vocabularies. (Mukherjee, 2017)
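
One way to picture the linking of folksonomy tags to a controlled vocabulary is a lookup that normalizes each tag and maps it to an authorized term where one exists. This is a sketch under stated assumptions only: the mapping table, the normalization rule (lowercase, letters and digits only), and the function names normalize and link_tags are hypothetical.

```python
# Hypothetical mapping from normalized tag forms to authorized terms.
CONTROLLED = {
    "cat": "Cats",
    "cats": "Cats",
    "kitty": "Cats",
    "nyc": "New York (N.Y.)",
    "newyork": "New York (N.Y.)",
}


def normalize(tag):
    """Lowercase a tag and drop everything except letters and digits."""
    return "".join(ch for ch in tag.lower() if ch.isalnum())


def link_tags(tags):
    """Split user tags into authorized terms and unmatched free tags."""
    matched, unmatched = set(), []
    for tag in tags:
        term = CONTROLLED.get(normalize(tag))
        if term is not None:
            matched.add(term)
        else:
            unmatched.append(tag)
    return sorted(matched), unmatched


print(link_tags(["Kitty", "#cats", "New York", "toread"]))
```

Tags that match are collapsed onto one authorized term, while unmatched tags such as "toread" are kept as they are, so they can still serve as the additional access points mentioned above.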

Whereas the vocabularies discussed above are the ones most commonly used for art information,
discussions of controlled vocabularies may also include ontologies. In common usage in computer
science, an ontology is a formal, machine-readable specification of a conceptual model in which
concepts, properties, relationships, functions, constraints, and axioms are all explicitly defined.
Such an ontology is not a controlled vocabulary, but it uses one or more
controlled vocabularies for a defined domain and
expresses the vocabulary in a representative language that has a grammar for using vocabulary
terms to express something meaningful. Ontologies generally divide the realm of knowledge
that they represent into the following areas: individuals, classes, attributes, relations, and
events. The grammar of the ontology links these areas together by formal constraints that
determine how the vocabulary terms or phrases may be used together. There are several grammars
or languages for ontologies, both proprietary and standards-based. An ontology is used to
make queries and assertions. Ontologies have some characteristics in common with faceted
taxonomies and thesauri, but ontologies use strict semantic relationships among terms and
attributes with the goal of knowledge representation in machine-readable form, whereas
thesauri provide tools for cataloging and retrieval. Ontologies are used in the Semantic Web,
artificial intelligence, software engineering, and information architecture as a form of
knowledge representation in electronic form about a particular domain of knowledge.
In the example above, each item in the ontology belongs to the subclass above it. Items can also
belong to various other classes, although the relationships may be different. For example, a
watercolor is a painting, but it may also be classified as a drawing because it is a work on paper.
Van Gogh’s Irises could be classified with oil paintings (with the relationship type medium is)
but also with Post-Impressionist art (with relationship type style/period is). Relationships in
ontologies are defined according to strict rules, which are different from the equivalence,
hierarchical, and associative relationships used for thesauri and the other vocabularies discussed
above.
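
The idea of typed relationships in an ontology can be illustrated with subject/relationship/object triples. The sketch below is a plain-Python illustration, not an ontology language such as those used on the Semantic Web; the relationship labels "instance of" and "subclass of", the triple list, and the function name query are assumptions made for the example, while "medium is" and "style/period is" follow the Irises example in the text.

```python
# Hypothetical triples: (subject, relationship type, object).
TRIPLES = [
    ("Irises", "instance of", "painting"),
    ("Irises", "medium is", "oil paint"),
    ("Irises", "style/period is", "Post-Impressionist"),
    ("watercolor", "subclass of", "painting"),
    ("watercolor", "subclass of", "drawing"),
]


def query(subject=None, relation=None, obj=None):
    """Return the triples matching every argument that is not None."""
    return [
        (s, r, o)
        for (s, r, o) in TRIPLES
        if (subject is None or s == subject)
        and (relation is None or r == relation)
        and (obj is None or o == obj)
    ]


# What medium is Irises?
print(query(subject="Irises", relation="medium is"))
# Which classes does watercolor fall under?
print(query(subject="watercolor", relation="subclass of"))
```

Because each link carries an explicit relationship type, the same item can sit in several classes without ambiguity about how it relates to each.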

Although there are many new concepts and ideas to learn about in the world of vocabulary control,
especially in an enterprise setting, many indexers find the work of creating vocabulary control to
be both a natural extension of their indexing skills and an important new income stream.

Fidel, R. (1991). Searchers' selection of search keys: II. Controlled vocabulary or
free-text searching. Journal of the American Society for Information Science,
42(7), 501–514.
https://doi.org/10.1002/(SICI)1097-4571(199108)42:7<501::AID-ASI5>3.0.CO;2-V
Leise, F. (2008). Controlled vocabularies: An introduction. The Indexer: The
International Journal of Indexing, 26(3), 121–126.
https://doi.org/10.3828/indexer.2008.37
Mukherjee, B. (2017). Vocabulary control: Subject heading lists and
thesauri. Unit 15, 21–38.
http://www.egyankosh.ac.in/bitstream/123456789/33118/1/Unit-15.pdf
