Assignment #4 Final Paper

Sam Ross
LIS 602-202
Fall 2017
A controlled vocabulary or thesaurus can often be vital in the process of information retrieval. Before explaining why, the two concepts must be defined and distinguished from each other. As Pomerantz (2015) explains, a controlled vocabulary is a set of rules that dictates how to represent a specific kind of data, and it is specific to an individual metadata element (p. 32). This leads to the establishment of a number (large or small) of specific defining terms that connect certain ideas. Thesauri develop this further by building a structure around how the terms are organized and connected to one another. A thesaurus does not govern how words may or may not be used; a thesaurus governs the relationship among words (Pomerantz, 2015, p. 38).

To explore these ideas in practice, this paper examines how a controlled vocabulary and thesaurus have been employed by the Library of Congress in its Thesaurus for Graphic Materials. Divided into two separate vocabularies until 2007, the thesaurus includes more than 7,000 subject terms and 650 genre/format terms to index types of photographs, prints, design drawings, ephemera, and other pictures (Library of Congress, 2017). By deconstructing the attributes of this thesaurus (and the controlled vocabulary it organizes), the author hopes to define the thesaurus's history, intended audience, scope, and purpose within the broader context of the course readings for LIS 602-202.

Background

The Library of Congress Thesaurus for Graphic Materials originated as a consolidated list of subject headings used to index materials in the Library's Prints and Photographs Division. As Alexander and Meehleib (2001) recount, such a list was needed because previously developed subject heading lists, such as the Library of Congress Subject Headings (LCSH), were insufficient for staff cataloging needs:

LCSH, with over 200,000 terms developed primarily to accommodate textual materials, included terms that when applied to images seemed to overlap conceptually. LCSH also lacked terms for the kinds of subjects frequently depicted in visual materials, which are typically too specific to be the topic of a book, for example, "Yin yang (Symbol)," "Moonlight," and "Corn husking." On the other hand, AAT, developed by the Getty Art History Information Program, contained roughly 120,000 terms focusing on art and architecture in the Western world, but lacked terms for abstract concepts often represented in allegorical prints, cartoons, and posters, as well as terms for people and activities, all necessary for cataloging large and diverse image collections. (p. 192)

As Gilliland (2008) discusses, "In general, all information objects, regardless of the physical or intellectual form they take, have three features (content, context, and structure), all of which can and should be reflected through metadata" (p. 2). Measured against these features, the terms available through LCSH lacked the context that catalogers needed to describe a visual medium for future identification. Likewise, AAT had proven structurally incompatible with describing abstract concepts.

Where one cannot adequately describe a concept within existing controlled vocabularies, therefore, a new one is needed. In the case of the Thesaurus for Graphic Materials (TGM), a new list of subject headings covering the range of subjects in graphic images was originally defined in 1980 and documented in TGM I; a second thesaurus, TGM II, was defined and released the same year around a separate controlled vocabulary of genre and physical description terms. Although both were originally released independently, they were consolidated into one database in 2007, by which point both had received substantial updates and revisions since their original editions.

Theoretical Underpinning

Before the structure of the thesaurus can be analyzed, it must first be established why a thesaurus and controlled vocabulary are necessary. As Fast et al. (2002) explain, a controlled vocabulary is a way to insert an interpretive layer of semantics between the term entered by the user and the underlying database in order to better represent the user's terms. It is a constrained list of terms that allows both the database and the user to operate more efficiently in addressing the user's need. Satisfactory results are much more likely when both the database and the user share a term that acts as a consensus for a concept.

This is not an easy process because, as Day (2014) points out, "natural language is filled with ambiguities" (p. 39). For example, in describing a visual work about racism, one might use the terms "bigotry" and "prejudice" interchangeably. However, the choice between the two would change the results returned by any search for works on the topic. To settle such cases, the TGM designates a single term: in this example, "prejudice," not "bigotry," is used to index all relevant items.
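The preferred-term idea can be made concrete with a short Python sketch of an entry vocabulary, in which a searcher's term is redirected to the authorized term before the index is consulted. It assumes, purely for illustration, that "bigotry" is recorded as a non-preferred term pointing to "prejudice"; the names USE_REFERENCES, INDEX, normalize, and search, and the item identifiers, are hypothetical and do not describe how the Library of Congress actually stores the TGM.

```python
# Hypothetical entry vocabulary: each non-preferred (entry) term points to
# the one preferred term used for indexing. Illustrative data, not TGM's.
USE_REFERENCES = {
    "bigotry": "prejudice",
}

# Hypothetical index: preferred term -> identifiers of items indexed under it.
INDEX = {
    "prejudice": ["item-001", "item-002"],
}


def normalize(term):
    """Return the preferred term for a searcher's term, if one is recorded."""
    key = term.strip().lower()
    return USE_REFERENCES.get(key, key)


def search(term):
    """Look up items under the preferred form of the searcher's term."""
    return INDEX.get(normalize(term), [])


print(search("Bigotry"))    # ['item-001', 'item-002']
print(search("prejudice"))  # ['item-001', 'item-002']
print(search("racism"))     # []
```

Without the redirect, a search for "bigotry" would return nothing even though relevant items exist, which is the retrieval problem described above.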

It is also necessary to develop a common organizational system for the standardized vocabulary, in which controlled terms are defined through their relationships with other terms in the vocabulary in order to put them in context. Relevance to a subject can be established through a hierarchy of relationships in which each term represents a specific idea that expresses something definite about the item. Using this kind of subject analysis to place each term in relation to broader, narrower, and related terms helps to provide users with subject access to information, to collocate information resources of a like nature, and to provide a logical location for similar tangible items (Taylor & Joudrey, 2009, p. 305).
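The broader, narrower, and related relationships described above can be sketched as a small data structure. This is a minimal Python illustration under stated assumptions: the Term and Thesaurus classes and the example relationships (Military equipment, Weapons, Bazookas; Moonlight and Night) are hypothetical and are not taken from the TGM itself. It shows two properties of thesaurus relationships: broader and narrower links are reciprocal, and related links are symmetric.

```python
from dataclasses import dataclass, field


@dataclass
class Term:
    """An authorized term and its relationships to other terms."""
    label: str
    broader: set = field(default_factory=set)   # BT: broader terms
    narrower: set = field(default_factory=set)  # NT: narrower terms
    related: set = field(default_factory=set)   # RT: related terms


class Thesaurus:
    def __init__(self):
        self.terms = {}

    def term(self, label):
        # Create a term on first use so relationships can be added in any order.
        return self.terms.setdefault(label, Term(label))

    def add_broader(self, narrower, broader):
        # BT and NT are reciprocal: recording one side records the other.
        self.term(narrower).broader.add(broader)
        self.term(broader).narrower.add(narrower)

    def add_related(self, a, b):
        # RT is symmetric: each term points to the other.
        self.term(a).related.add(b)
        self.term(b).related.add(a)

    def with_narrower(self, label):
        """Return the term together with every term beneath it in the hierarchy."""
        found, stack = set(), [label]
        while stack:
            current = stack.pop()
            if current not in found:
                found.add(current)
                stack.extend(self.term(current).narrower)
        return found


# Hypothetical relationships for illustration only; not taken from the TGM.
thesaurus = Thesaurus()
thesaurus.add_broader("Weapons", "Military equipment")
thesaurus.add_broader("Bazookas", "Weapons")
thesaurus.add_related("Moonlight", "Night")

print(sorted(thesaurus.with_narrower("Military equipment")))
# ['Bazookas', 'Military equipment', 'Weapons']
print(thesaurus.term("Bazookas").broader)  # {'Weapons'}
print(thesaurus.term("Night").related)     # {'Moonlight'}
```

Gathering a term together with everything beneath it is one way a search system could collocate items indexed under narrower terms.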

Intended Audience

Designed to function as a tool for both catalogers and researchers, TGM I and TGM II contain abundant scope notes and cross references (Alexander & Meehleib, 2001, p. 194). Catalogers must apply TGM subject headings carefully and deliberately to describe the content of the material accurately and to provide context, so that researchers can identify and find the material again more easily and directly. Researchers then use the same subject headings to identify the information they are seeking more efficiently.

Neither the modern consolidated Thesaurus for Graphic Materials nor the Prints & Photographs Online Catalog (PPOC) that provides access to it outlines a formal intended audience. The two thesauri within the consolidated Thesaurus for Graphic Materials, however, have traditionally taken different approaches to defining their intended audiences. The 1995 printed edition of TGM I, for example, states that TGM I is designed as a tool both for those who create catalog records and for those who search for them. TGM II, on the other hand, took a more purpose-oriented approach, focusing on the different actions ("aid," "assist," "help," "make") that the thesaurus could support for a variety of audiences and offering only examples of possible users, such as a student of lithography looking for examples of lithographs or a scholar who needs to retrieve photographically illustrated books rapidly.

Both approaches are relatively common. Controlled vocabularies often identify an intended audience explicitly in order to clarify their appropriate use. The U.S. National Library of Medicine, for example, explains in its introduction to Medical Subject Headings (MeSH) that "The MeSH vocabulary is designed for use by NLM for indexing and searching of the MEDLINE database of journal citations and other data. This enables retrieval systems, such as NLM's PubMed, to provide subject searching of the data" (U.S. National Library of Medicine, n.d.). MeSH establishes a formal, limited set of query terms that all NLM member institutions and information seekers are expected to use to find MEDLINE information relevant to their queries. The implication, however, is that the intended audience is already fluent in this controlled vocabulary, or will already understand how to navigate and employ it.

In other cases, the intended audience is defined not only by the organization laying out a standard but also by who has traditionally employed that standard, or by whom the organization specifically wants to use its vocabulary to find information. The Getty Research Institute demonstrates this in the intended audience it outlines for its Art & Architecture Thesaurus, which includes museums, libraries, visual resource collections, archives, conservation projects, cataloging projects, and bibliographic projects (J. Paul Getty Trust, n.d.).

Scope and Purpose

Originally designed for two different groups of terms, TGM I and TGM II have likewise always had separately defined purposes. Developed to support the cataloging and retrieval needs of the Library of Congress Prints and Photographs Division, TGM I is offered to other institutions in the hope that it will fill similar needs and will promote standardization in image cataloging (Library of Congress, n.d.). TGM II, as noted in the previous section, was intended for a much broader range of activities, which its 1995 print edition lists:

TGM II terms will:

- assist research into the development and distribution of a particular genre or technical process;
- aid retrieval of information about aspects of graphic materials frequently requested by people who want to understand how a certain technique is performed;
- aid selection of materials for exhibitions or class demonstrations;
- assist collection preservation, since collections are handled less when the catalog provides more specific access;
- help collection management by providing, for example, the information needed to calculate the quantity of glass transparencies held by an institution;
- aid cataloging, since pinpointing a process or format may help to date or identify an image;
- make cataloging more consistent and encourage specificity by providing standard terminology in a ready reference format;
- assist institutions in disseminating information about their collections through database networks or other means. (Library of Congress, n.d.)

The disparity in purpose between the two thesauri reflects the different ways their terms are used in cataloging. TGM I terms are largely descriptive, naming the various subjects depicted in a graphic work, and the vocabulary continually incorporates new concepts as necessary. As Alexander and Meehleib (2001) point out, "TGM I contains more than 6,300 authorized terms with approximately 5,000 cross references. Several hundred terms are added each year. Recent cataloging has produced new terms such as 'Body painting,' 'Desert islands,' 'Bazookas,' 'Lame duck,' and 'Diapers'" (p. 193). TGM I has therefore come to cover a significant and flexible range of terms for nearly anything that could be depicted in visual materials.

TGM II also covers a wide scope, but its controlled vocabulary generally centers on the forms that graphic materials take, such as drawings and lithographs; it has more than 600 authorized terms with more than 450 cross references (Alexander & Meehleib, 2001, p. 193). Because TGM II has fewer terms and a narrower orientation, the Library of Congress can be much more explicit about what its terms are meant for and how they should be used.

On a theoretical level, the separation in intended purpose between the two thesauri bears some resemblance to the entity-relationship model of the Functional Requirements for Bibliographic Records (FRBR). FRBR defines four entities that express different aspects of a bibliographic resource: work, expression, manifestation, and item.

Croissant (2012) quotes the official definitions of these four entities from the body that created them, the International Federation of Library Associations and Institutions (IFLA):

A work is a distinct intellectual or artistic creation.

An expression is the intellectual or artistic realization of a work in the form of alpha-numeric, musical, or choreographic notation, sound, image, object, movement, etc., or any combination of such forms.

A manifestation is the physical embodiment of an expression of a work.

An item is a single exemplar of a manifestation...a single physical object. (p. 8)

In the context of the TGM, TGM II would represent a controlled vocabulary for describing a manifestation or item. TGM I, on the other hand, does not fit neatly under work or expression, because it describes something depicted by the work rather than the work itself. FRBR divides its entities into three groups: Group 1 comprises the four entities defined above, Group 2 the persons and corporate bodies responsible for them, and Group 3 the subjects of works. According to Tillett (2003), Group 3 entities are the subjects of works and can be concepts, objects, events, places, and any of the Group 1 or Group 2 entities (p. 3). In other words, the TGM I controlled vocabulary is oriented around identifying Group 3 entities: the concepts, objects, events, and places depicted in a work.
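The mapping proposed above can be illustrated with a simplified model of the FRBR Group 1 chain. This is one possible Python sketch, not an official FRBR or Library of Congress schema: subject terms in the style of TGM I attach to the work (where FRBR links works to their Group 3 subjects), while genre and physical characteristic terms in the style of TGM II attach to the manifestation. The class names, the index_terms helper, and the record itself (title, terms, identifier) are invented for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Work:
    """FRBR Group 1: a distinct intellectual or artistic creation."""
    title: str
    # Links to Group 3 subjects (what the work depicts): TGM I-style terms.
    subjects: List[str] = field(default_factory=list)


@dataclass
class Expression:
    """The realization of a work in a particular form."""
    work: Work
    form: str


@dataclass
class Manifestation:
    """The physical embodiment of an expression."""
    expression: Expression
    # Genre and physical characteristic terms: TGM II-style terms.
    genre_format_terms: List[str] = field(default_factory=list)


@dataclass
class Item:
    """A single exemplar of a manifestation."""
    manifestation: Manifestation
    identifier: str


def index_terms(item):
    """Gather the subject and genre/format terms that apply to one item."""
    manifestation = item.manifestation
    work = manifestation.expression.work
    return {"subjects": work.subjects, "genre/format": manifestation.genre_format_terms}


# Invented record for illustration; not drawn from the PPOC.
work = Work("Harbor at night", subjects=["Moonlight", "Harbors"])
expression = Expression(work, form="image")
manifestation = Manifestation(expression, genre_format_terms=["Lithographs"])
item = Item(manifestation, identifier="example-item-1")

print(index_terms(item))
# {'subjects': ['Moonlight', 'Harbors'], 'genre/format': ['Lithographs']}
```

Keeping the two kinds of terms on different entities mirrors the division of labor between the two thesauri: one vocabulary says what an image shows, the other what kind of object it is.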

Conclusion

Controlled vocabularies and the thesauri that organize them are not perfect. Drabinski (2013) notes that controlled vocabularies fail to account fully for the chronological context of words whose meanings have changed over time and which may be used differently in different kinds of search queries. Olson (2001) even questions the legitimacy of controlled vocabularies, remarking that in imposing a controlled vocabulary we construct both a limited system for the representation of information and a universality/diversity binary opposition, and that controlled vocabularies hide their exclusions under the guise of neutrality (p. 640).

Within this context, it seems ironic that the Library of Congress Thesaurus for Graphic Materials was originally compiled because the established LCSH had proven insufficient for cataloging needs. Yet it is within this construction that a resolution might be found. By clearly establishing the context, scope, audience, and purpose for which a thesaurus is constructed, one can better tailor its controlled vocabulary to the needs of the intended audience without having to decentralize the vocabulary, as Olson suggests. When information seekers understand the information they are looking for, a contextual thesaurus like the Thesaurus for Graphic Materials can be a valuable tool in their search.

Bibliography

Alexander, A., & Meehleib, T. (2001). The Thesaurus for Graphic Materials: Its history, use, and
future. Cataloging & Classification Quarterly, 31(3-4), 189-212.
https://doi.org/10.1300/J104v31n03_04

Croissant, C. (2012). FRBR and RDA: What they are and how they may affect the future of
libraries. Theological Librarianship, 5(2), 6-18. Retrieved December 7, 2017, from
https://www.sbt.ti.ch/doc/forum/RDA/BN/Croissant_FRBR_and_RDA.pdf

Day, R. E. (2014). Indexing it all: the subject in the age of documentation, information, and
data. Cambridge, MA: The MIT Press.

Drabinski, E. (2013). Queering the catalog: Queer theory and the politics of correction. The
Library Quarterly, 83(2), 94-111.

Fast, K., Leise, F., & Steckel, M. (2002). What is a controlled vocabulary? Retrieved December
6, 2017, from http://boxesandarrows.com/what-is-a-controlled-vocabulary/

Gilliland, A. (2008). Setting the stage. In M. Baca (Ed.), Introduction to metadata.
Retrieved December 5, 2017, from
http://www.getty.edu/research/publications/electronic_publications/intrometadata/setting.pdf

J. Paul Getty Trust. (n.d.). About the AAT (Getty Research Institute). Retrieved December 6,
2017, from http://www.getty.edu/research/tools/vocabularies/aat/about.html

Library of Congress. (2017). Thesaurus for Graphic Materials. Retrieved December 5, 2017, from
http://www.loc.gov/pictures/collection/tgm/

Library of Congress. (n.d.). Thesaurus for Graphic Materials I: Subject Terms (TGM I)
introduction (1995 printed edition). Retrieved December 6, 2017, from
https://www.loc.gov/rr/print/tgm1/ia.html

Library of Congress. (n.d.). Thesaurus for Graphic Materials II: Genre and Physical
Characteristic Terms (TGM II) introduction. Retrieved December 6, 2017, from
https://www.loc.gov/rr/print/tgm2/ii.html

Olson, H. A. (2001). The power to name: Representation in library catalogs. Signs: Journal of
Women in Culture and Society, 26(3), 639-668.

Pomerantz, J. (2015). Metadata. Cambridge, MA: The MIT Press.

Taylor, A. G., & Joudrey, D. N. (2009). The organization of information. Westport, CT:
Libraries Unlimited.

Tillett, B. B. (2003). What is FRBR? A conceptual model for the bibliographic universe.
Technicalities, 25(5). Retrieved December 7, 2017, from
https://www.loc.gov/cds/downloads/FRBR.PDF

U.S. National Library of Medicine. (n.d.). Use of MeSH in online retrieval. Retrieved
December 6, 2017, from https://www.nlm.nih.gov/mesh/intro_retrieval.html
