
Kara Craig

LIS 882
Spring 2018/2019

Metadata Schema Report

History and Background

Preservation Metadata: Implementation Strategies, or PREMIS, was conceptualized by the
Online Computer Library Center (OCLC) and Research Libraries Group (RLG) in order to develop a core
set of metadata elements specifically geared toward digital preservation that could easily be implemented
by any institution focused on preservation work. PREMIS expanded on standards established by
the Preservation Metadata Framework Working Group, an earlier OCLC and RLG initiative.
This earlier working group’s goal was “to develop a framework outlining the types of information that
should be associated with an archived digital object” and mapped its own metadata elements to an
already existing information model from the Open Archival Information System (Caplan & Guenther,
2005, p. 112). Though this earlier iteration developed metadata elements and standards for preservation
work, the PREMIS working group saw that it was lacking detail and clear specifications for the elements
(Caplan & Guenther, 2005, p. 112). With this, the working group specifically wanted to “develop a data
dictionary of core metadata elements to be applied to archived objects, give guidance on the
implementation of that metadata element set in preservation systems, and suggest best practice for
populating those elements” (Caplan & Guenther, 2005, p. 112).

The PREMIS working group was divided into two subgroups, one of those being the
Implementation Strategies Subgroup. The group was tasked with surveying “encoding, storage, and
management of preservation metadata within digital preservation systems”, among other things (Caplan &
Guenther, 2005, p. 113). After surveying a variety of institutions, including libraries and archives at an
international level, the group found that, one, “there [was] very little experience with digital preservation”
and, two, “those engaged in digital preservation still [lacked] a common vocabulary and, to a large extent,
a common conceptual framework”, to the point where “thirty-three different metadata element sets or rule
sets were mentioned by at least one repository” (Caplan & Guenther, 2005, pp. 114-115). These findings
clearly illustrated a dire need not only for standardized practice and a single shared metadata framework
but also for foundational knowledge of, and experience in, digital preservation.

PREMIS

It is important to note that PREMIS, unlike other schemas that may focus on administrative or
descriptive metadata, is specifically concerned with metadata based around the preservation of digital
files in order to “ensure the long-term usability of a digital resource” and the “viability, renderability,
understandability, authenticity and identity” of a digital object (Caplan, 2009, p. 3; Pomerantz, 2017).
PREMIS is “not concerned with discovery and access” nor does it “attempt to define detailed format-
specific metadata” (Caplan, 2009, p. 4). In this vein, the following information about a file is particularly
relevant when using PREMIS:
● Inhibitors - “any features of an object intended to inhibit access, use, or migration. Inhibitors
include password protection and encryption”
● Provenance/digital provenance - “is the record of the chain of custody and change history of a
digital object”
● Significant properties - characteristics of an object that should be maintained through
preservation actions
● Rights - such as “copyright status, license terms and special permissions” (Caplan, 2009, pp. 6-7).

Core Entities & Semantic Units


The PREMIS data dictionary identifies four core entities: objects, events, agents, and rights, with
intellectual entities being an important part of the PREMIS data model. Each entity is described by
semantic units, which are conceptually similar to elements. Semantic units are defined as “[pieces] of
information or knowledge” (Caplan, 2009, p. 7). Some semantic units act as containers for semantic
components, which are similar to subelements.

An intellectual entity is essentially a single conceptual item, a “set of content that is considered a
single intellectual unit for purposes of management and description”, while the object is the concrete unit
of representation for that conceptual item (Caplan, 2009, p. 9). PREMIS does not provide dedicated
semantic units for this entity (mostly because of redundancy with other vocabularies and schemas that
already exist) but provides a semantic unit that falls under the Object Entity to represent it (Pomerantz,
2017).

The object entity has three subtypes: file (basically the computer file of the object), bitstream (the
parts of that computer file), and representation (“the set of files, including structural metadata, needed for
a complete and reasonable rendition of an Intellectual Entity”) (Caplan, 2009, p. 9). The information that
can be included in the object entity includes an identifier (semantic unit objectIdentifier), the size and format
of the object (contained in the objectCharacteristics semantic unit), and so on. It is important to note that the
object entity is the only entity that allows the record to show relationships with other object entities. The
object entity is discussed in more depth below.

Events are essentially actions done to the object over time, such as the creation and modification
of the digital object (Pomerantz, 2017). Some event semantic units include eventIdentifier, eventType,
eventDateTime, eventDetailInformation, and eventOutcomeInformation.
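
As a rough illustration of how these event semantic units fit together, the following Python sketch builds a single event record with the standard library. The identifier, date, and descriptions are hypothetical values invented for the example, and the namespace URI and the component names eventIdentifierType, eventIdentifierValue, eventDetail, and eventOutcome are assumed to follow PREMIS 3.0; they are not taken from the sources cited above.

```python
import xml.etree.ElementTree as ET

# Assumed PREMIS 3.0 namespace; the "premis" prefix matches the examples in this report.
PREMIS_NS = "http://www.loc.gov/premis/v3"
ET.register_namespace("premis", PREMIS_NS)


def q(name):
    """Return the namespace-qualified tag for a PREMIS semantic unit."""
    return f"{{{PREMIS_NS}}}{name}"


def build_event(event_id, event_type, when, detail, outcome):
    """Build a premis:event element from plain strings (all values hypothetical)."""
    event = ET.Element(q("event"))

    # eventIdentifier is a container for its two semantic components.
    identifier = ET.SubElement(event, q("eventIdentifier"))
    ET.SubElement(identifier, q("eventIdentifierType")).text = "local"
    ET.SubElement(identifier, q("eventIdentifierValue")).text = event_id

    ET.SubElement(event, q("eventType")).text = event_type
    ET.SubElement(event, q("eventDateTime")).text = when

    # The two "Information" units are also containers.
    detail_info = ET.SubElement(event, q("eventDetailInformation"))
    ET.SubElement(detail_info, q("eventDetail")).text = detail

    outcome_info = ET.SubElement(event, q("eventOutcomeInformation"))
    ET.SubElement(outcome_info, q("eventOutcome")).text = outcome
    return event


event = build_event("E001", "migration", "2015-02-23T10:00:00",
                    "Converted TIFF to JPEG 2000", "success")
event_xml = ET.tostring(event, encoding="unicode")
print(event_xml)
```

The sketch only shows the container/component nesting; a real record would normally draw eventType from a controlled vocabulary and link the event to the objects and agents involved.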

Agents are the persons, organizations, or software that carry out these events or otherwise affect the
object, including those involved in the rights of the object (McCargar, 2005). Information/units
include: an identifier for the agent (the agentIdentifier semantic unit), the agent's name (agentName),
and a designation of the type of agent, such as person, organization, or software (agentType), among others
(Caplan, 2009, p. 10).

Rights cover the permissions, copyright, intellectual property, and so on, associated with
the digital object. The rightsStatement semantic unit requires the semantic components
rightsStatementIdentifier and rightsBasis.

Object Entity

As stated previously, each entity has corresponding semantic units which have their own
semantic components. Below is a brief XML example of <objectIdentifier>, from the Library of Congress’
PREMIS XML Usage Examples (Library of Congress, 2017).

<premis:objectIdentifier>
  <premis:objectIdentifierType>local</premis:objectIdentifierType>
  <premis:objectIdentifierValue>001</premis:objectIdentifierValue>
</premis:objectIdentifier>

Above, <objectIdentifier> acts as a container element for the two components, where
<objectIdentifierType> contains the domain where the object identifier is unique (usually from a controlled
vocabulary) and <objectIdentifierValue> has the actual value or identifier for the object, per that domain.
The below example shows an object semantic unit with a relationship to another object in the same
record.

<premis:object xsi:type="premis:file">
  <premis:objectIdentifier>
    <premis:objectIdentifierType>local</premis:objectIdentifierType>
    <premis:objectIdentifierValue>001</premis:objectIdentifierValue>
  </premis:objectIdentifier>
  <premis:preservationLevel>
    <premis:preservationLevelType>logical preservation</premis:preservationLevelType>
    <premis:preservationLevelValue>emulation</premis:preservationLevelValue>
    <premis:preservationLevelRole
        authority="preservationLevelRole"
        authorityURI="http://id.loc.gov/vocabulary/preservation/preservationLevelRole"
        valueURI="http://id.loc.gov/vocabulary/preservation/preservationLevelRole/int">intention</premis:preservationLevelRole>
    <premis:preservationLevelRationale>institutional policy</premis:preservationLevelRationale>
    <premis:preservationLevelDateAssigned>2015-02-23</premis:preservationLevelDateAssigned>
  </premis:preservationLevel>
  <premis:relationship>
    <premis:relationshipType
        authority="relationshipType"
        authorityURI="http://id.loc.gov/vocabulary/preservation/relationshipType"
        valueURI="http://id.loc.gov/vocabulary/preservation/relationshipType/str">structural</premis:relationshipType>
    <premis:relationshipSubType
        authority="relationshipSubType"
        authorityURI="http://id.loc.gov/vocabulary/preservation/relationshipSubType"
        valueURI="http://id.loc.gov/vocabulary/preservation/relationshipSubType/hsp">has part</premis:relationshipSubType>
    <premis:relatedObjectIdentifier>
      <premis:relatedObjectIdentifierType>local</premis:relatedObjectIdentifierType>
      <premis:relatedObjectIdentifierValue>007</premis:relatedObjectIdentifierValue>
    </premis:relatedObjectIdentifier>
  </premis:relationship>
</premis:object>

Briefly, <premis:object> acts as a container element for all of the object-related information, and the
xsi:type attribute identifies the object as a file. <premis:objectIdentifier> and <premis:preservationLevel>
hold the information identifying the object described in the record and the preservation
functions that apply to it. The <premis:relationship> element contains information about related
object “007”, where the relationshipType is “structural” and the relationshipSubType is “has part”,
meaning “the object contains the related object” (PREMIS Data Dictionary, 2008, p. 110).
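
The relationship described above can also be read programmatically. The Python sketch below is a minimal illustration that assumes a trimmed copy of the record (the authority attributes and the preservationLevel block are omitted) and the PREMIS 3.0 namespace URI. The function name describe_relationships is a hypothetical helper, not part of any PREMIS tooling.

```python
import xml.etree.ElementTree as ET

# Assumed PREMIS 3.0 namespace, mapped to the prefix used in this report.
NS = {"premis": "http://www.loc.gov/premis/v3"}

# A trimmed copy of the record above, with namespace declarations added so it parses alone.
RECORD = """
<premis:object xmlns:premis="http://www.loc.gov/premis/v3"
               xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
               xsi:type="premis:file">
  <premis:objectIdentifier>
    <premis:objectIdentifierType>local</premis:objectIdentifierType>
    <premis:objectIdentifierValue>001</premis:objectIdentifierValue>
  </premis:objectIdentifier>
  <premis:relationship>
    <premis:relationshipType>structural</premis:relationshipType>
    <premis:relationshipSubType>has part</premis:relationshipSubType>
    <premis:relatedObjectIdentifier>
      <premis:relatedObjectIdentifierType>local</premis:relatedObjectIdentifierType>
      <premis:relatedObjectIdentifierValue>007</premis:relatedObjectIdentifierValue>
    </premis:relatedObjectIdentifier>
  </premis:relationship>
</premis:object>
"""


def describe_relationships(xml_text):
    """Return (object id, relationship type, subtype, related object id) tuples."""
    obj = ET.fromstring(xml_text.strip())
    subject = obj.findtext(
        "premis:objectIdentifier/premis:objectIdentifierValue", namespaces=NS)
    results = []
    for rel in obj.findall("premis:relationship", NS):
        results.append((
            subject,
            rel.findtext("premis:relationshipType", namespaces=NS),
            rel.findtext("premis:relationshipSubType", namespaces=NS),
            rel.findtext(
                "premis:relatedObjectIdentifier/premis:relatedObjectIdentifierValue",
                namespaces=NS),
        ))
    return results


relationships = describe_relationships(RECORD)
for subject, rel_type, sub_type, related in relationships:
    print(f"object {subject} --[{rel_type} / {sub_type}]--> object {related}")
```

Reading the tuple left to right mirrors the prose: object 001 has a structural, "has part" relationship to object 007.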

PREMIS in the Future

According to Donaldson and Conway in “Implementing PREMIS”, widespread adoption of the
PREMIS Data Dictionary and its implementation guidelines has yet to occur, and they attempted to
determine what barriers might be causing this (Donaldson & Conway, 2010, p. 276). They cite a study
which found that the most frequent barriers in regard to PREMIS adoption were the “lack of
training/expertise and perceived lack of knowledge necessary to be confident in the ability to implement
PREMIS” (Donaldson & Conway, 2010, p. 276). Though these institutions had intended on implementing
PREMIS, many had not adopted it yet (Donaldson & Conway, 2010, p. 276). These hurdles could be
applicable to any institution attempting to implement and adapt to a new schema in their management
systems and may not be exclusive to PREMIS. As with many things, widespread training may be needed
in order to get these institutions familiar and comfortable with the data dictionary and schema, as they
seem interested in actually implementing it.

-------------------------------------------

Resources/Further Reading

PREMIS Homepage, which includes data dictionaries and schemas, implementation tools, and other
resources - https://www.loc.gov/standards/premis/index.html

PREMIS Data Dictionary - PREMIS Version 3.0 (current version) -
https://www.loc.gov/standards/premis/v3/index.html

PREMIS Resources page, which includes a wide variety of key documents, papers and articles,
presentations, and other resources - https://www.loc.gov/standards/premis/bibliography.html

YouTube - Metadata MOOC 4-9: PREMIS Data Dictionary for Preservation Metadata, Part 1
(https://www.youtube.com/watch?v=-_rntZXG7TY) and Part 2
(https://www.youtube.com/watch?v=2JFaC6kFXpo) - introductory videos on PREMIS from Jeffrey Pomerantz of
UNC Chapel Hill

History and Background

The Metadata Encoding and Transmission Standard (METS) is an encoding standard and XML
schema “designed for the purpose of creating XML document instances that express the hierarchical
structure of digital library objects, and the associated descriptive and administrative metadata” (Cundiff,
2004, p. 53). METS was born out of the Making of America II project formed by UC Berkeley and the
Digital Library Federation with the goal of “[creating] a proposed digital library object standard by
encoding defined descriptive, administrative, and structural metadata, along with primary content, inside a
digital library object” (Cundiff, 2004, p. 52). According to the METS page from the Library of Congress,
without structural metadata for digital library objects such as digital/ebooks, “the page image or text files
comprising the digital work are of little use, and without technical metadata regarding the digitization
process, scholars may be unsure of how accurate a reflection of the original the digital version provides”
(“METS: An Overview & Tutorial,” 2017). The development of MoA II was focused on creating a standard
encoding to “serve as a digital object transfer syntax” and “function as a data format for use with digital
libraries” and digital repositories (Cundiff, 2004, p. 52). But MoA II was insufficient in that it “did
not provide a vocabulary for expressing descriptive or administrative metadata” and its structural
metadata elements were too limited and only supported text and still image material (McDonough, 2006,
p. 148). Alongside this “was a desire for METS to facilitate the exchange and interoperability of digital
library objects across digital library systems and to provide support for the long-term preservation of
digital library objects” (McDonough, 2006, p. 148).

Taking these issues and aims into consideration, METS was developed for “digital objects that
comprise text, images, audio, and video file”, acting as a “digital wrapper” in order to “relate the
components of a digital resource” (“METS: A Data Standard,” 2005, p. 1). These could be individual
tracks on an album, chapters in a book, the audio and visual aspects of a video, and so on. The reasoning is
that, to “assure the integrity of the overall object and to facilitate the use of it, the structural relationship of
these files needs to be captured” (“METS: A Data Standard,” 2005, p. 1).

METS Document/Subsections

METS was designed to “promote interoperability of digital content between digital library systems
and contribute to the preservation of digital library materials” (McDonough, 2006, p. 148). To that end, METS
does not prescribe a single vocabulary. For descriptive metadata, it is
recommended that users employ the following schemas/vocabularies: MARCXML, MODS, and Simple
Dublin Core; for technical metadata, MIX and TextMD (Cundiff, 2004, p. 62). METS is currently on version
1.12.

METS has seven major subsections: METS header, descriptive metadata, administrative
metadata, file inventory, structural map, structural links, and the behaviors section.

The METS header section (metsHdr) contains information about the METS record itself, such as
the name of the record, the creator, etc.

The descriptive metadata section (dmdSec) contains the descriptive metadata for the object, such
as a unique ID, information related to creation, etc. This section can also point to “metadata in external
documents or systems” with the mdRef element, or “[embed] descriptive metadata from a different
namespace in the METS document” with the mdWrap element (Cundiff, 2004, p. 55).

The administrative metadata section (amdSec) is divided into four subsections covering technical
metadata (techMD), rights metadata (rightsMD), metadata about the original source (sourceMD), and
digital provenance metadata about the object (digiProvMD). Like dmdSec, the amdSec may contain metadata
that is external to the document or from other namespaces.

The file section (fileSec) lists all of the individual files that comprise the digital object. This section
can also be used “to record links to content files residing externally to the METS file… allows for files to
be grouped together into sets” (McDonough, 2006, p. 150).

The structural map (structMap) is the only mandatory section and is considered the core or
backbone of the METS document. It is “the means by which the hierarchical structure and the sequence
of the components of a digital object are expressed” (Cundiff, 2004, p. 58). Information here would be, for
example, the specific order of chapters within a digital book.

The structural links section (structLink) refers back to the structMap section,
showing the relationships between elements/components in structMap.

The behavior section (behaviorSec) contains information regarding how to interact with the object
when “viewing” it, such as the software needed and actions that need to be performed, like page turning.

Because METS is so extensive, only the structMap and structLink sections will be
detailed here.

The Structural Map (structMap) and Structural Link (structLink)

As previously stated, the Structural Map section (structMap) provides a kind of hierarchical and
sequential structuring of the components contained within the digital object. Essentially, it provides the
overall organization of the resource. In XML, this information is encoded in the element <structMap>
while the hierarchy is expressed in the <div> elements that are nested within the section (METS
Overview). The <div> element tells how the objects in the record need to be displayed (Pomerantz, 2017).

Within a <div> element, two kinds of child elements can point to the content relevant to that <div>,
such as the content listed in the <fileSec> element, whether that content sits in the current METS record
or in another, external METS record (METS Manual). The <fptr> element references
<file> elements that already exist in the current METS record. The <mptr> element references content that
exists in a METS record external to the one currently being described. XML examples of both are below:

<mets:div TYPE="page" LABEL="Blank page">
  <mets:fptr FILEID="epi01m"/>
  <mets:fptr FILEID="epi01r"/>
  <mets:fptr FILEID="epi01t"/>
</mets:div>

In the <fptr> elements, the FILEID attribute is used to point to the files that are recorded in the fileSec of the record.
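
To make the pointer mechanism concrete, the Python sketch below resolves each <fptr> in a page <div> to the matching <file> entry in the fileSec. The small METS document, its example.org URLs, the fileGrp USE values, and the MIME types are all hypothetical, and files_for_div is an illustrative helper rather than part of any METS library. Only two of the three FILEID values from the example above are used, to keep the sketch short.

```python
import xml.etree.ElementTree as ET

NS = {
    "mets": "http://www.loc.gov/METS/",
    "xlink": "http://www.w3.org/1999/xlink",
}

# Hypothetical METS fragment: two files in the fileSec and one page that points to both.
METS_DOC = """
<mets:mets xmlns:mets="http://www.loc.gov/METS/"
           xmlns:xlink="http://www.w3.org/1999/xlink">
  <mets:fileSec>
    <mets:fileGrp USE="master">
      <mets:file ID="epi01m" MIMETYPE="image/tiff">
        <mets:FLocat LOCTYPE="URL" xlink:href="http://example.org/master/01.tif"/>
      </mets:file>
    </mets:fileGrp>
    <mets:fileGrp USE="reference">
      <mets:file ID="epi01r" MIMETYPE="image/jpeg">
        <mets:FLocat LOCTYPE="URL" xlink:href="http://example.org/reference/01.jpg"/>
      </mets:file>
    </mets:fileGrp>
  </mets:fileSec>
  <mets:structMap TYPE="physical">
    <mets:div TYPE="page" LABEL="Blank page">
      <mets:fptr FILEID="epi01m"/>
      <mets:fptr FILEID="epi01r"/>
    </mets:div>
  </mets:structMap>
</mets:mets>
"""


def files_for_div(root, div):
    """Follow each fptr FILEID in a div to its file entry in the fileSec."""
    files_by_id = {f.get("ID"): f for f in root.iterfind(".//mets:file", NS)}
    href_attr = f"{{{NS['xlink']}}}href"  # namespace-qualified attribute name
    resolved = []
    for fptr in div.findall("mets:fptr", NS):
        file_el = files_by_id[fptr.get("FILEID")]
        flocat = file_el.find("mets:FLocat", NS)
        resolved.append(
            (file_el.get("ID"), file_el.get("MIMETYPE"), flocat.get(href_attr)))
    return resolved


root = ET.fromstring(METS_DOC.strip())
page = root.find("mets:structMap/mets:div", NS)
page_files = files_for_div(root, page)
for file_id, mime_type, href in page_files:
    print(file_id, mime_type, href)
```

A lookup table keyed on ID is used because FILEID is simply a reference to the ID attribute of a <file> element elsewhere in the same record.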

<mets:structMap TYPE="physical">
  <mets:div TYPE="multivolume book" LABEL="Martial Epigrams I &amp; II" DMDID="DMD1">
    <mets:div TYPE="volume" LABEL="Volume I">
      <mets:mptr LOCTYPE="URL" xlink:href="http://www.loc.gov/standards/mets/documentation/MatrialEpigrams.xml"/>
    </mets:div>
    <mets:div TYPE="volume" LABEL="Volume II">
      <mets:mptr LOCTYPE="URL" xlink:href="http://www.loc.gov/standards/mets/documentation/MatialEpigramsII.xml"/>
    </mets:div>
  </mets:div>
</mets:structMap>

In this example, <mptr> is being used to point to the external METS records via a URI.

Attributes of the <div> element, such as TYPE and LABEL, describe each component
of the structure. The TYPE attribute “specifies the type of structural division
that the <div> element represents”, such as a chapter, page, track, etc. The LABEL attribute
identifies what the <div> element is displayed as, such as a table of contents, a chapter title/number, and
so on. The latter attribute is especially relevant to hierarchical arrangements within the <structMap>
(Digital Library Federation, 2007, p. 59). An XML example of a simple <structMap> section is below:

<mets:structMap TYPE="physical">
<mets:div TYPE="book" LABEL="Martial Epigrams II" DMDID="DMD1">
<mets:div TYPE="page" LABEL="Blank page"/>
<mets:div TYPE="page" LABEL="Page i: Series title page"/>
<mets:div TYPE="page" LABEL="Page ii: Blank page"/>
<mets:div TYPE="page" LABEL="Page iii: Title page"/>
<mets:div TYPE="page" LABEL="Page iv: Publication info"/>
<mets:div TYPE="page" LABEL="Page v: Table of contents"/>
<mets:div TYPE="page" LABEL="Page vi: Blank page"/>
<mets:div TYPE="page" LABEL="Page 1: Half title page"/>
<mets:div TYPE="page" LABEL="Page 2 (Latin)"/>
<mets:div TYPE="page" LABEL="Page 3 (English)"/>
<mets:div TYPE="page" LABEL="Page 4 (Latin)"/>
<mets:div TYPE="page" LABEL="Page 5 (English)"/>
<mets:div TYPE="page" LABEL="Page 6 (Latin)"/>
<mets:div TYPE="page" LABEL="Page 7 (English)"/>
</mets:div>
</mets:structMap>
(Digital Library Federation, 2007, p. 58)

In the above example, what is being described is a physical book, based on the TYPE attributes in both
the <structMap> and <div> elements, where the <div> elements break the book down into its individual pages
(TYPE=“page”). Each LABEL provides the page number together with a description of what that page
contains.
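
The hierarchy that the nested <div> elements express can be walked with a short recursive function. The Python sketch below is a minimal illustration using a shortened copy of the example above (the first three pages only, with the DMDID attribute left out); outline is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

NS = {"mets": "http://www.loc.gov/METS/"}

# Shortened copy of the structMap example, with the namespace declared so it parses alone.
STRUCT_MAP = """
<mets:structMap xmlns:mets="http://www.loc.gov/METS/" TYPE="physical">
  <mets:div TYPE="book" LABEL="Martial Epigrams II">
    <mets:div TYPE="page" LABEL="Blank page"/>
    <mets:div TYPE="page" LABEL="Page i: Series title page"/>
    <mets:div TYPE="page" LABEL="Page ii: Blank page"/>
  </mets:div>
</mets:structMap>
"""


def outline(div, depth=0):
    """Return one indented 'TYPE: LABEL' line per div, in document order."""
    lines = ["  " * depth + f"{div.get('TYPE')}: {div.get('LABEL')}"]
    for child in div.findall("mets:div", NS):
        lines.extend(outline(child, depth + 1))
    return lines


struct_map = ET.fromstring(STRUCT_MAP.strip())
book_outline = outline(struct_map.find("mets:div", NS))
print("\n".join(book_outline))
```

Nesting depth carries the hierarchy and document order carries the sequence, which is why the function never needs to sort anything.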

The Structural Link Section element <structLink> “allows for the specification of hyperlinks
between the different components of a METS structure that are delineated in a structural map” (Digital
Library Federation, 2007, p. 76). The Structural Map Link (smLink) element, which is repeatable,
can express a link between any two <div> elements in the <structMap> section (Cundiff, 2004,
p. 56).
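
As a rough sketch of how such a link might be followed, the Python example below reads one <smLink> and looks up the two <div> elements it connects. The document is hypothetical, and it assumes that smLink identifies its endpoints through xlink:from and xlink:to attributes holding the ID values of the linked <div> elements; resolve_links is an illustrative helper name.

```python
import xml.etree.ElementTree as ET

NS = {
    "mets": "http://www.loc.gov/METS/",
    "xlink": "http://www.w3.org/1999/xlink",
}
XLINK = "{http://www.w3.org/1999/xlink}"  # prefix for namespace-qualified attributes

# Hypothetical record: a table of contents page linked to the chapter it lists.
DOC = """
<mets:mets xmlns:mets="http://www.loc.gov/METS/"
           xmlns:xlink="http://www.w3.org/1999/xlink">
  <mets:structMap TYPE="logical">
    <mets:div TYPE="book" LABEL="Hypothetical book">
      <mets:div ID="TOC" TYPE="page" LABEL="Table of contents"/>
      <mets:div ID="CH1" TYPE="chapter" LABEL="Chapter 1"/>
    </mets:div>
  </mets:structMap>
  <mets:structLink>
    <mets:smLink xlink:from="TOC" xlink:to="CH1" xlink:title="Go to Chapter 1"/>
  </mets:structLink>
</mets:mets>
"""


def resolve_links(root):
    """Return (source LABEL, target LABEL) for every smLink in the record."""
    labels = {
        div.get("ID"): div.get("LABEL")
        for div in root.iterfind(".//mets:div", NS)
        if div.get("ID")
    }
    return [
        (labels[link.get(XLINK + "from")], labels[link.get(XLINK + "to")])
        for link in root.iterfind("mets:structLink/mets:smLink", NS)
    ]


links = resolve_links(ET.fromstring(DOC.strip()))
for source, target in links:
    print(f"{source} -> {target}")
```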

METS in the Future


McDonough points to challenges that may arise from “flexibility and interoperability”, in that the
two “are seemingly contradicting and can create challenges in that two records from different institutions
could be completely different”. Another area of potential difficulty is “METS’ support of the use of arbitrary
extension schema”, which leads to inevitable variability because “it does not constrain where metadata… may
be stored, nor the specific format in which metadata or content is stored” (McDonough, 2006, p. 153).
Though these issues are raised by a single source, there could be an opportunity for more collaboration in
order to further standardize the already expansive METS schema.

-------------------------------------------

Resources/Further Reading

METS Homepage, which includes link to METS schemas and documentation, example documents,
news, and other resources - https://www.loc.gov/standards/mets/mets-home.html

METS: PRIMER AND REFERENCE MANUAL - http://www.loc.gov/standards/mets/METSPrimer.pdf

METS: An Overview & Tutorial - https://www.loc.gov/standards/mets/METSOverview.v2.html

YouTube - Metadata MOOC 4-12: Metadata Encoding and Transmission Standard (METS)
(https://www.youtube.com/watch?v=i0Uet7MLqrg) - introductory video on METS from Jeffrey Pomerantz
of UNC Chapel Hill

References

Caplan, P. (2009). Understanding PREMIS. Washington, DC: Library of Congress. Retrieved from
http://www.loc.gov/standards/premis/understanding-premis.pdf

Caplan, P., & Guenther, R. (2005). Practical Preservation: The PREMIS Experience. Library
Trends, 54(1), 111–124. Retrieved from
https://doi-org.dom.idm.oclc.org/10.1353/lib.2006.0002

Cundiff, M. V. (2004). An introduction to the Metadata Encoding and Transmission Standard
(METS). Library Hi Tech, 22(1), 52–64.
https://doi.org/10.1108/07378830410524495

Digital Library Federation. (2007). METS: Metadata Encoding and Transmission Standard: Primer
and reference manual. Washington, DC: Digital Library Federation. Retrieved from
http://www.loc.gov/standards/mets/METSPrimer.pdf

Donaldson, D. R., & Conway, P. (2010). Implementing PREMIS: A case study of the Florida
Digital Archive. Library Hi Tech, 28(2), 273–289.
https://doi.org/10.1108/07378831011047677

Lavoie, B. (2008). PREMIS With a Fresh Coat of Paint: Highlights from the Revision of the
PREMIS Data Dictionary for Preservation Metadata. D-Lib Magazine, 14(5/6). Retrieved
from http://www.dlib.org/dlib/may08/lavoie/05lavoie.html

Library of Congress. (2017). METS: An Overview & Tutorial. Retrieved from
https://www.loc.gov/standards/mets/METSOverview.v2.html

Library of Congress. (2017). [XML Example 1 of PREMIS version 3.0]. PREMIS XML Usage
Examples. Retrieved from
https://www.loc.gov/standards/premis/v3/sample-records/PREMIS%203%20example%201.xml

McCargar, V. (2005). No Pain, No Metadata. Seybold Report: Analyzing Publishing
Technologies, 5(6), 10–12. Retrieved from
https://dom.idm.oclc.org/login?url=http://search.ebscohost.com/login.aspx?direct=true&db=a9h&AN=17534587&site=ehost-live&scope=site

McDonough, J. P. (2006). METS: Standardized Encoding for Digital Library Objects.
International Journal of Digital Libraries, 6(2), 148–158.
https://doi.org/10.1007/s00799-005-0132-1

Pomerantz, J. [Jeffrey Pomerantz]. (2017, January 11). Metadata MOOC 4-9: PREMIS Data
Dictionary for Preservation Metadata, Part 1 [Video file]. Retrieved from
https://www.youtube.com/watch?v=-_rntZXG7TY

Pomerantz, J. [Jeffrey Pomerantz]. (2017, January 11). Metadata MOOC 4-10: PREMIS Data
Dictionary for Preservation Metadata, Part 2 [Video file]. Retrieved from
https://www.youtube.com/watch?v=2JFaC6kFXpo

Pomerantz, J. [Jeffrey Pomerantz]. (2017, January 11). Metadata MOOC 4-12: Metadata
Encoding and Transmission Standard (METS) [Video file]. Retrieved from
https://www.youtube.com/watch?v=i0Uet7MLqrg&list=LLHTLEjC4ytlqnFHtEfL6t5g

PREMIS Editorial Committee. (2008). PREMIS Data Dictionary for Preservation Metadata,
version 2.0. Library of Congress. Retrieved from
http://www.loc.gov/standards/premis/v2/premis-2-0.pdf

UCSD Digital Library Program. (2005). METS: A Data Standard for Access and Preservation
Now and into the Future. Digital Letters, Summer (8). Retrieved from
http://web.archive.org/web/20060313123817/http://gort.ucsd.edu/dlpwg/dletters/issue8.pdf
