Samuel Ross - Assignment #4 - Final Paper
Sam Ross
LIS 602-202
Fall 2017
A controlled vocabulary or thesaurus can often be vital in the process of information
retrieval. Before examining why this is so, it is necessary to define the two concepts and
distinguish them from each other. As Pomerantz (2015) explains, a controlled vocabulary
is a set of rules that dictates how to represent a specific kind of data and is specific to an
individual metadata element (p. 32). This leads to the establishment of a number (large or
small) of specific defining terms which connect certain ideas. A thesaurus develops this further
by building a structure around how the terms are organized and connected to one another.
"A thesaurus does not govern how words may or may not be used; a thesaurus governs the
relationship among words" (Pomerantz, 2015, p. 38).
To explore these ideas in practice, this paper looks at how a controlled vocabulary and
thesaurus have been employed by the Library of Congress in its Thesaurus for Graphic
Materials. Maintained as two separate vocabularies until 2007, the thesaurus
includes more than 7,000 subject terms and 650 genre/format terms to index types of
photographs, prints, design drawings, ephemera, and other pictures (Library of Congress, 2017).
In deconstructing attributes of this thesaurus (and the controlled vocabulary it organizes), the
author hopes to define the thesaurus's history, intended audience, scope and purpose within the
Background
The Library of Congress Thesaurus for Graphic Materials originated as a consolidated list of a
number of subject headings used to index materials in the Library's Prints and Photographs
Division. As Alexander and Meehleib (2001) recount, there was a need for such a list of terms
because previously developed subject heading lists, such as the Library of Congress Subject
LCSH, with over 200,000 terms developed primarily to accommodate textual materials, included
terms that when applied to images seemed to overlap conceptually. LCSH also lacked terms for the
kinds of subjects frequently depicted in visual materials, which are typically too specific to be the
topic of a book, for example, "Yin yang (Symbol)," "Moonlight," and "Corn husking." On the other
hand, AAT, developed by the Getty Art History Information Program, contained roughly 120,000
terms focusing on art and architecture in the Western world, but lacked terms for abstract concepts
often represented in allegorical prints, cartoons, and posters, as well as terms for people and activities,
all necessary for cataloging large and diverse image collections. (p. 192)
As Gilliland (2008) discusses, "In general, all information objects, regardless of the physical or
intellectual form they take, have three features: content, context, and structure, all of which
can and should be reflected through metadata" (p. 2). The metadata that cataloguers needed to
describe the materials they were sorting for future identification could not be found among the
terms available through LCSH, which were not suited to describing a visual medium. Likewise, AAT
Therefore, where one cannot adequately detail a concept within existing controlled
vocabularies, a new one is demanded. In the case of the Thesaurus for Graphic Materials
(TGM), this new list of subject headings covering the range of subjects of graphic images was
originally defined in 1980 and written up in the TGM I; a second thesaurus, the TGM II, was
defined and released the same year around a separate controlled vocabulary identifying genre
and physical description terms. While the two were originally released independently, they were
consolidated into one main database in 2007, although both had received substantial updating
Theoretical Underpinning
Before the structure of the thesaurus can be analyzed, it must first be established why the
thesaurus and controlled vocabulary are necessary. As Fast et al. (2002) explain, a controlled
vocabulary is a way to insert an interpretive layer of semantics between the term entered by the
user and the underlying database to better represent the terms of the user. A constrained list of
terms allows both the database and the user to operate more efficiently in addressing the
user's need. It is much more likely that satisfactory results will be identified if both the database
This is not an easy process because, as Day (2014) points out, "Natural language is filled
with ambiguities" (p. 39). For example, in describing a visual work about racism, one might use
the terms "bigotry" or "prejudice" interchangeably. However, the choice between the two terms
would change the results returned by any search for works on the topic. Therefore,
in the case of the example provided, "prejudice" is used for indexing all representative items in
vocabulary where these controlled terms are defined through their relationships with other terms
within the vocabulary in order to put them in context. Relevance to the subject can be established
through a hierarchy of relationships, and each term represents a specific idea that expresses
something definite about the item. Using this kind of subject analysis to connect
terms to broader terms, narrower terms and other related terms provides
users with subject access to information, collocates information resources of a like nature, and
provides a logical location for similar tangible items (Taylor & Joudrey, 2009, p. 305).
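The relationships described above can be sketched in code. The following is a minimal Python illustration, assuming a tiny hypothetical thesaurus in which "bigotry" is a non-preferred entry word that points to the authorized term "prejudice." The class name Term, the functions resolve and expand, and every broader, narrower, and related term shown are invented for the example; none of them is taken from the Thesaurus for Graphic Materials or from any Library of Congress system.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Term:
    """One authorized (preferred) term and its thesaurus relationships."""
    label: str
    broader: list[str] = field(default_factory=list)   # BT: broader terms
    narrower: list[str] = field(default_factory=list)  # NT: narrower terms
    related: list[str] = field(default_factory=list)   # RT: related terms


# Hypothetical entries for illustration only; these relationships are
# assumptions and are not copied from any published thesaurus.
TERMS = {
    "prejudice": Term(
        "Prejudice",
        broader=["Attitudes"],
        narrower=["Racism"],
        related=["Discrimination"],
    ),
}

# Cross references ("USE"): non-preferred entry words mapped to the
# authorized term that an indexer would actually apply.
USE = {"bigotry": "prejudice"}


def resolve(query: str) -> Term | None:
    """Return the authorized term for a query, following a USE reference."""
    key = query.strip().lower()
    key = USE.get(key, key)
    return TERMS.get(key)


def expand(query: str) -> list[str]:
    """Return the authorized label plus its narrower terms for searching."""
    term = resolve(query)
    if term is None:
        return []
    return [term.label, *term.narrower]
```

Under these assumptions, a search for "bigotry" and a search for "prejudice" resolve to the same authorized heading, which is the collocation effect that Taylor and Joudrey describe.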
Intended Audience
"Designed to function as a tool for both catalogers and researchers, TGM I and TGM II contain
abundant scope notes and cross references" (Alexander & Meehleib, 2001, p. 194). Cataloguers must
use the Thesaurus for Graphic Materials subject headings carefully and deliberately to describe
the content of the material accurately and give it context, so that identifying and
finding the material again is easier and more direct for researchers. Researchers then use
the subject headings to identify the information they are seeking more efficiently.
Neither the modern consolidated Thesaurus for Graphic Materials nor the Prints &
Photographs Online Catalog (PPOC) designed to search it outlines a formal intended audience.
The two thesauri within the consolidated Thesaurus for Graphic Materials, however, have
traditionally taken different approaches in defining their intended audiences. The 1995 printed
edition of the TGM I, for example, stated that TGM I is designed as a tool both for those who
create catalog records and for those who search for them. TGM II, on the other hand, took a more
purpose-oriented approach, focusing on all the different actions ("aid," "assist," "help," "make")
that the thesaurus could support for a variety of audiences, providing only examples of possible
Both these approaches are relatively common. Controlled vocabularies often identify an
intended audience clearly in order to convey their appropriate use. The
U.S. National Library of Medicine, for example, explains in its introduction to Medical
Subject Headings (MeSH) that "The MeSH vocabulary is designed for use by NLM for indexing
and searching of the MEDLINE database of journal citations and other data. This enables
retrieval systems, such as NLM's PubMed, to provide subject searching of the data." MeSH
establishes a formal, limited set of query terms that all NLM member institutions and
information seekers are expected to use in order to find MEDLINE information relevant
to their query. The implication, however, is that the intended audience is already fluent in or will
In other cases, the intended audience is defined not only by the organization laying out a
standard, but also by those who have traditionally employed that standard or by those whom the
organization specifically wants to use its vocabulary to find the information they seek. This is
demonstrated by the intended audience that the Getty Research Institute outlines for its Art and
Originally designed for two different groups of subject headings, the TGM I and II have likewise
always had separately defined purposes. "Developed to support the cataloging and retrieval needs
of the Library of Congress Prints and Photographs Division, TGM I is offered to other
institutions in the hope that it will fill similar needs and will promote standardization in image
cataloging" (Library of Congress, n.d.). TGM II, as discussed in the previous section, was intended
for a much broader range of activities, as described in the 1995 print version:
assist research into the development and distribution of a particular genre or technical process;
aid retrieval of information about aspects of graphic materials frequently requested by people who
assist collection preservation, since collections are handled less when the catalog provides more
specific access;
help collection management by providing, for example, the information needed to calculate the
aid cataloging, since pinpointing a process or format may help to date or identify an image;
make cataloging more consistent and encourage specificity by providing standard terminology in a
assist institutions in disseminating information about their collections through database networks or
The disparity between these two thesauri in terms of purpose reflects the variation in how their
terms are used in cataloguing. TGM I terms are largely descriptive, intended to name the
various subjects depicted in a graphic work, and the list constantly incorporates
new concepts as necessary. As Alexander and Meehleib (2001) point out, "TGM I contains more than
6,300 authorized terms with approximately 5,000 cross references. Several hundred terms are
added each year. Recent cataloging has produced new terms such as 'Body painting,' 'Desert
islands,' 'Bazookas,' 'Lame duck,' and 'Diapers'" (p. 193). TGM I has therefore come to cover
a broad and flexible range of descriptors for anything that could be construed as visual
material.
TGM II also covers a wide scope, but generally builds a controlled vocabulary around
the forms taken by different graphic materials, such as drawings and
lithographs, with more than 600 authorized terms and more than 450 cross references
(Alexander & Meehleib, 2001, p. 193). Because TGM II has fewer terms and a narrower orientation,
the Library of Congress can be much more explicit about what the terms are meant for and how they
should be used.
On a theoretical level, the separation in intended purpose of the thesauri bears some
relationship model. FRBR outlines four different terms that express different things about a
Croissant (2012) cites from the official definitions of these four terms outlined by the
An expression is the intellectual or artistic realization of a work in the form of alpha-numeric, musical, or
choreographic notation, sound, image, object, movement, etc., or any combination of such forms.
In the context of the TGM, TGM II would represent a controlled vocabulary of subject headings
to describe a manifestation or item. TGM I, on the other hand, is somewhat broader than the
concepts outlined through "work" or "expression," as the TGM I describes something depicted by
the work, not the work itself. Under FRBR, those four terms belong to the first of three sets of
entities, known as groups. Group 3 entities, according to Tillett
(2003), "are the subjects of works. These can be concepts, objects, events, places, and any of the
Group 1 or Group 2 entities" (p. 3), the latter of which describe the persons or organizations
related to the work. In other words, the TGM I controlled vocabulary is oriented around
describing the concepts, objects, events, and places in a work, expression, manifestation
or subject of an entity.
Conclusion
Controlled vocabularies and the thesauri that compile them can be imperfect. Drabinski
(2013) notes that controlled vocabularies fail to account for the full chronological context of
words whose meanings have changed over time and which may be used differently in different kinds
of search queries. Olson (2001) even questions the legitimate need for controlled vocabularies, remarking
that in imposing controlled vocabulary we construct both a limited system for the representation
actually hide their exclusions under the guise of neutrality. (p. 640)
Within this context, it seems ironic that the Library of Congress Thesaurus for Graphic
Materials was originally compiled because the established LCSH methodology had proved
insufficient for the needs of cataloguing. However, it is within this construction that a resolution
might be found. By clearly establishing the context, scope, audience and purpose for which a
thesaurus is constructed, one can better tailor the controlled vocabulary to fit the needs of the
desired audience without needing to decentralize the vocabulary as Olson suggests. When
information seekers understand the information they are looking for, a contextual thesaurus like
the Thesaurus for Graphic Materials can be a great tool in assisting their search.
Bibliography
Alexander, A., & Meehleib, T. (2001). The Thesaurus for Graphic Materials: Its History, Use,
and Future. Cataloging & Classification Quarterly, 31(3-4), 189-212.
https://doi.org/10.1300/J104v31n03_04
Croissant, C. (2012). FRBR and RDA: What They Are and How They May Affect the
Future of Libraries. Theological Librarianship, 5(2), 6-18. Retrieved December 07, 2017, from
https://www.sbt.ti.ch/doc/forum/RDA/BN/Croissant_FRBR_and_RDA.pdf
Day, R. E. (2014). Indexing it all: the subject in the age of documentation, information, and
data. Cambridge, MA: The MIT Press.
Drabinski, E. (2013). Queering the catalog: Queer theory and the politics of correction. The
Library Quarterly, 83(2), 94-111.
Fast, K., Leise, F., & Steckel, M. (2002). What is a Controlled Vocabulary? Retrieved December
06, 2017, from http://boxesandarrows.com/what-is-a-controlled-vocabulary/
Gilliland, A. (2008). Setting the Stage. In M. Baca (Ed.), Introduction to Metadata.
Retrieved December 5, 2017, from
http://www.getty.edu/research/publications/electronic_publications/intrometadata/setting.pdf
J. Paul Getty Trust (n.d.). About the AAT (Getty Research Institute). Retrieved December 06,
2017, from http://www.getty.edu/research/tools/vocabularies/aat/about.html
Library of Congress. (2017). Thesaurus for Graphic Materials. Retrieved December 5, 2017, from
http://www.loc.gov/pictures/collection/tgm/
Library of Congress. (n.d.). Thesaurus for Graphic Materials I: Subject Terms (TGM I)
INTRODUCTION (1995 printed edition). Retrieved December 06, 2017, from
https://www.loc.gov/rr/print/tgm1/ia.html
Library of Congress. (n.d.). Thesaurus for Graphic Materials II: Genre and Physical
Characteristic Terms (TGM II) INTRODUCTION. Retrieved December 06, 2017, from
https://www.loc.gov/rr/print/tgm2/ii.html
Olson, H. A. (2001). The power to name: Representation in library catalogs. Signs: Journal of
Women in Culture and Society, 26(3), 639-668.
Taylor, A. G., & Joudrey, D. N. (2009). The organization of information. Westport, CT:
Libraries Unlimited.
Tillett, B. B. (2003). What Is FRBR? A Conceptual Model for the Bibliographic Universe.
Technicalities, 25(5). Retrieved December 07, 2017, from
https://www.loc.gov/cds/downloads/FRBR.PDF
U.S. National Library of Medicine. (n.d.). Use of MeSH in Online Retrieval. Retrieved
December 06, 2017, from https://www.nlm.nih.gov/mesh/intro_retrieval.html