Craig Schemareport
LIS 882
Spring 2018/2019
The PREMIS working group was divided into two subgroups, one of which was the
Implementation Strategies Subgroup. This subgroup was tasked with surveying the “encoding, storage, and
management of preservation metadata within digital preservation systems”, among other things (Caplan &
Guenther, 2005, p. 113). After surveying a variety of institutions, including libraries and archives, at an
international level, the group found that, one, “there [was] very little experience with digital preservation”
and, two, “those engaged in digital preservation still [lacked] a common vocabulary and, to a large extent,
a common conceptual framework”, to the point where “thirty-three different metadata element sets or rule
sets were mentioned by at least one repository” (Caplan & Guenther, 2005, pp. 114-115). These findings
clearly illustrated a pressing need not only for standardized practice and a single shared metadata
framework but also for foundational knowledge of, and experience in, digital preservation.
PREMIS
It is important to note that PREMIS, unlike other schemas that may focus on administrative or
descriptive metadata, is specifically concerned with metadata based around the preservation of digital
files in order to “ensure the long-term usability of a digital resource” and the “viability, renderability,
understandability, authenticity and identity” of a digital object (Caplan, 2009, p. 3; Pomerantz, 2017).
PREMIS is “not concerned with discovery and access” nor does it “attempt to define detailed
format-specific metadata” (Caplan, 2009, p. 4). In this vein, the following kinds of information about a
file are particularly relevant when using PREMIS:
● Inhibitors - “any features of an object intended to inhibit access, use, or migration. Inhibitors
include password protection and encryption”
● Provenance/digital provenance - “is the record of the chain of custody and change history of a
digital object”
● Significant properties - characteristics of an object that should be maintained through
preservation actions
● Rights - such as “copyright status, license terms and special permissions” (Caplan, 2009, pp. 6-7).
An intellectual entity is essentially a single conceptual item, a “set of content that is considered a
single intellectual unit for purposes of management and description”, while the object is the concrete unit
of representation for that conceptual item (Caplan, 2009, p. 9). PREMIS does not provide dedicated
semantic units for this entity (mostly because they would be redundant with other vocabularies and
schemas that already exist) but provides a semantic unit that falls under the Object Entity to represent it
(Pomerantz, 2017).
The object entity has three subtypes: file (basically the computer file of the object), bitstream (the
parts of that computer file), and representation (“the set of files, including structural metadata, needed for
a complete and reasonable rendition of an Intellectual Entity”) (Caplan, 2009, p. 9). The information that
can be recorded for the object entity includes an identifier (semantic unit objectIdentifier), the size and
format of the object (contained in the objectCharacteristics semantic unit), and so on. It is important to
note that the object entity is the only entity that allows the record to show relationships with other object
entities. The object entity is covered in a little more depth later.
Events are essentially actions done to the object over time, such as the creation and modification
of the digital object (Pomerantz, 2017). Some event semantic units include eventIdentifier, eventType,
eventDateTime, eventDetailInformation, and eventOutcomeInformation.
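As a rough illustration of how the event semantic units listed above nest in XML, the sketch below builds a single event with Python's standard library. The PREMIS 3 namespace URI, the sub-component names (eventIdentifierType, eventIdentifierValue, eventDetail, eventOutcome), and all sample values are assumptions made for this example and are not taken from the sources cited here; the result is not validated against the PREMIS schema.

```python
import xml.etree.ElementTree as ET

# Assumed PREMIS 3 namespace; adjust if a different version is in use.
PREMIS_NS = "http://www.loc.gov/premis/v3"
ET.register_namespace("premis", PREMIS_NS)


def q(name: str) -> str:
    """Return a namespace-qualified tag name for ElementTree."""
    return f"{{{PREMIS_NS}}}{name}"


def build_event(event_id: str, event_type: str, when: str,
                detail: str, outcome: str) -> ET.Element:
    """Build one event element holding the five semantic units named in the text."""
    event = ET.Element(q("event"))

    # eventIdentifier is a container for a type/value pair (assumed sub-components).
    ident = ET.SubElement(event, q("eventIdentifier"))
    ET.SubElement(ident, q("eventIdentifierType")).text = "local"
    ET.SubElement(ident, q("eventIdentifierValue")).text = event_id

    ET.SubElement(event, q("eventType")).text = event_type
    ET.SubElement(event, q("eventDateTime")).text = when

    # The two "Information" units are containers as well (assumed sub-components).
    detail_info = ET.SubElement(event, q("eventDetailInformation"))
    ET.SubElement(detail_info, q("eventDetail")).text = detail
    outcome_info = ET.SubElement(event, q("eventOutcomeInformation"))
    ET.SubElement(outcome_info, q("eventOutcome")).text = outcome
    return event


# Hypothetical sample values, for illustration only.
event = build_event("E001", "creation", "2015-02-23T10:00:00",
                    "hypothetical detail note", "success")
unit_names = [child.tag.split("}", 1)[1] for child in event]
print(ET.tostring(event, encoding="unicode"))
```

Here `unit_names` lists the five semantic units in the order they were added, which matches the order in which the text introduces them.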
Agents are the persons, organizations, or software that carry out these events or otherwise affect
the object, including those involved in the rights of the object (McCargar, 2005). Information/units
include: an identifier for the agent (the agentIdentifier semantic unit), the agent's name (agentName),
and a designation of the type of agent, such as person, organization, or software (agentType), among
others (Caplan, 2009, p. 10).
Rights cover the permissions, copyright, intellectual property, and so on, of the digital object.
The rightsStatement semantic unit requires the semantic components rightsStatementIdentifier and
rightsBasis.
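The requirement that every rightsStatement carry both a rightsStatementIdentifier and a rightsBasis can be checked mechanically. The following is a minimal sketch, not a substitute for schema validation: the namespace URI, the identifier sub-components, the sample values, and the helper name `missing_components` are all assumptions introduced for illustration.

```python
import xml.etree.ElementTree as ET

# Assumed PREMIS 3 namespace.
PREMIS_NS = "http://www.loc.gov/premis/v3"
NS = {"premis": PREMIS_NS}

# Hypothetical record: the second statement deliberately lacks an identifier.
RIGHTS_XML = f"""
<premis:rights xmlns:premis="{PREMIS_NS}">
  <premis:rightsStatement>
    <premis:rightsStatementIdentifier>
      <premis:rightsStatementIdentifierType>local</premis:rightsStatementIdentifierType>
      <premis:rightsStatementIdentifierValue>R001</premis:rightsStatementIdentifierValue>
    </premis:rightsStatementIdentifier>
    <premis:rightsBasis>copyright</premis:rightsBasis>
  </premis:rightsStatement>
  <premis:rightsStatement>
    <premis:rightsBasis>license</premis:rightsBasis>
  </premis:rightsStatement>
</premis:rights>
"""

# The two components the text describes as required.
REQUIRED = ("rightsStatementIdentifier", "rightsBasis")


def missing_components(statement: ET.Element) -> list:
    """Return the required components that a rightsStatement does not contain."""
    return [name for name in REQUIRED
            if statement.find(f"premis:{name}", NS) is None]


root = ET.fromstring(RIGHTS_XML)
report = [missing_components(s) for s in root.findall("premis:rightsStatement", NS)]
print(report)  # [[], ['rightsStatementIdentifier']]
```

The first statement passes, while the second is flagged because it has a basis but no identifier.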
Object Entity
As stated previously, each entity has corresponding semantic units which have their own
semantic components. Below is a brief XML example of <objectIdentifier>, from the Library of Congress’
PREMIS XML Usage Examples (Library of Congress, 2017).
<premis:objectIdentifier>
  <premis:objectIdentifierType>local</premis:objectIdentifierType>
  <premis:objectIdentifierValue>001</premis:objectIdentifierValue>
</premis:objectIdentifier>
Above, <objectIdentifier> acts as a container element for the two components, where
<objectIdentifierType> contains the domain where the object identifier is unique (usually from a controlled
vocabulary) and <objectIdentifierValue> has the actual value or identifier for the object, per that domain.
The below example shows an object semantic unit with a relationship to another object in the same
record.
<premis:object xsi:type="premis:file">
  <premis:objectIdentifier>
    <premis:objectIdentifierType>local</premis:objectIdentifierType>
    <premis:objectIdentifierValue>001</premis:objectIdentifierValue>
  </premis:objectIdentifier>
  <premis:preservationLevel>
    <premis:preservationLevelType>logical preservation</premis:preservationLevelType>
    <premis:preservationLevelValue>emulation</premis:preservationLevelValue>
    <premis:preservationLevelRole
        authority="preservationLevelRole"
        authorityURI="http://id.loc.gov/vocabulary/preservation/preservationLevelRole"
        valueURI="http://id.loc.gov/vocabulary/preservation/preservationLevelRole/int">intention</premis:preservationLevelRole>
    <premis:preservationLevelRationale>institutional policy</premis:preservationLevelRationale>
    <premis:preservationLevelDateAssigned>2015-02-23</premis:preservationLevelDateAssigned>
  </premis:preservationLevel>
  <premis:relationship>
    <premis:relationshipType
        authority="relationshipType"
        authorityURI="http://id.loc.gov/vocabulary/preservation/relationshipType"
        valueURI="http://id.loc.gov/vocabulary/preservation/relationshipType/str">structural</premis:relationshipType>
    <premis:relationshipSubType
        authority="relationshipSubType"
        authorityURI="http://id.loc.gov/vocabulary/preservation/relationshipSubType"
        valueURI="http://id.loc.gov/vocabulary/preservation/relationshipSubType/hsp">has part</premis:relationshipSubType>
    <premis:relatedObjectIdentifier>
      <premis:relatedObjectIdentifierType>local</premis:relatedObjectIdentifierType>
      <premis:relatedObjectIdentifierValue>007</premis:relatedObjectIdentifierValue>
    </premis:relatedObjectIdentifier>
  </premis:relationship>
</premis:object>
Briefly, <premis:object> acts as a container element for all of the object-related information, and the
xsi:type attribute identifies the object type as a file. <premis:objectIdentifier> holds the information
identifying the object being recorded, and <premis:preservationLevel> holds the specific preservation
functions to be applied to the object. The <premis:relationship> element contains information about related
object “007”, where the relationshipType is “structural” and the relationshipSubType is “has part”,
meaning “the object contains the related object” (PREMIS Data Dictionary, 2008, p. 110).
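To show how a record like the one above might be read programmatically, the sketch below parses a trimmed-down copy of the object and pulls out its type, identifier, and relationships. The namespace declarations are assumptions (the excerpt above omits them), the vocabulary attributes are left out for brevity, and the function name `summarize` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumed namespace URIs; the excerpt in the text does not show its declarations.
NS = {
    "premis": "http://www.loc.gov/premis/v3",
    "xsi": "http://www.w3.org/2001/XMLSchema-instance",
}

# Trimmed copy of the example object (vocabulary attributes omitted).
OBJECT_XML = """
<premis:object xmlns:premis="http://www.loc.gov/premis/v3"
               xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
               xsi:type="premis:file">
  <premis:objectIdentifier>
    <premis:objectIdentifierType>local</premis:objectIdentifierType>
    <premis:objectIdentifierValue>001</premis:objectIdentifierValue>
  </premis:objectIdentifier>
  <premis:relationship>
    <premis:relationshipType>structural</premis:relationshipType>
    <premis:relationshipSubType>has part</premis:relationshipSubType>
    <premis:relatedObjectIdentifier>
      <premis:relatedObjectIdentifierType>local</premis:relatedObjectIdentifierType>
      <premis:relatedObjectIdentifierValue>007</premis:relatedObjectIdentifierValue>
    </premis:relatedObjectIdentifier>
  </premis:relationship>
</premis:object>
"""


def summarize(obj: ET.Element) -> dict:
    """Collect the object's subtype, identifier value, and relationships."""
    relationships = [
        (
            rel.findtext("premis:relationshipType", namespaces=NS),
            rel.findtext("premis:relationshipSubType", namespaces=NS),
            rel.findtext(
                "premis:relatedObjectIdentifier/premis:relatedObjectIdentifierValue",
                namespaces=NS,
            ),
        )
        for rel in obj.findall("premis:relationship", NS)
    ]
    return {
        # xsi:type is a namespaced attribute, so it is looked up by its full name.
        "type": obj.get(f"{{{NS['xsi']}}}type"),
        "id": obj.findtext(
            "premis:objectIdentifier/premis:objectIdentifierValue", namespaces=NS
        ),
        "relationships": relationships,
    }


summary = summarize(ET.fromstring(OBJECT_XML))
print(summary)
```

For this input, the summary reports a file object “001” with one structural “has part” relationship to object “007”.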
-------------------------------------------
Resources/Further Reading
PREMIS Homepage, which includes data dictionaries and schemas, implementation tools, and other
resources - https://www.loc.gov/standards/premis/index.html
PREMIS Resources page, which includes a wide variety of key documents, papers and articles,
presentations, and other resources - https://www.loc.gov/standards/premis/bibliography.html
Youtube - Metadata MOOC 4-9: PREMIS Data Dictionary for Preservation Metadata, Part 1
(https://www.youtube.com/watch?v=-_rntZXG7TY) and Part 2
(https://www.youtube.com/watch?v=2JFaC6kFXpo) - introductory videos on PREMIS from Jeffrey
Pomerantz of UNC Chapel Hill
The Metadata Encoding and Transmission Standard (METS) is an encoding standard and XML
schema “designed for the purpose of creating XML document instances that express the hierarchical
structure of digital library objects, and the associated descriptive and administrative metadata” (Cundiff,
2004, p. 53). METS was born out of the Making of America II (MoA II) project formed by UC Berkeley and
the Digital Library Federation with the goal of “[creating] a proposed digital library object standard by
encoding defined descriptive, administrative, and structural metadata, along with primary content, inside a
digital library object” (Cundiff, 2004, p. 52). According to the METS page from the Library of Congress,
without structural metadata for digital library objects such as digital books, “the page image or text files
comprising the digital work are of little use, and without technical metadata regarding the digitization
process, scholars may be unsure of how accurate a reflection of the original the digital version provides”
(“METS: An Overview & Tutorial,” 2017). The development of MoA II focused on creating a standard
encoding to “serve as a digital object transfer syntax” and “function as a data format for use with digital
libraries” and digital repositories (Cundiff, 2004, p. 52). However, MoA II was not sufficient in that it “did
not provide a vocabulary for expressing descriptive or administrative metadata” and its structural
metadata elements were too limited, supporting only text and still image material (McDonough, 2006,
p. 148). Alongside these limitations “was a desire for METS to facilitate the exchange and interoperability
of digital library objects across digital library systems and to provide support for the long-term preservation
of digital library objects” (McDonough, 2006, p. 148).
Taking these issues and aims into consideration, METS was developed for “digital objects that
comprise text, images, audio, and video file”, acting as a “digital wrapper” in order to “relate the
components of a digital resource” (“METS: A Data Standard,” 2005, p. 1). These components could be
individual tracks on an album, chapters in a book, the audio and visual aspects of a video, and so on. The
underlying idea is that, in order to “assure the integrity of the overall object and to facilitate the use of it,
the structural relationship of these files needs to be captured” (“METS: A Data Standard,” 2005, p. 1).
METS Document/Subsections
METS was designed to “promote interoperability of digital content between digital library systems
and contribute to the preservation of digital library materials” (McDonough, 2006, p. 148). Accordingly,
METS does not have a single or prescribed vocabulary. Instead, it is recommended that users employ the
following schemas/vocabularies: for descriptive metadata, MARCXML, MODS, and Simple Dublin Core;
for technical metadata, MIX and TextMD (Cundiff, 2004, p. 62). METS is currently on version 1.12.
METS has seven major subsections: METS header, descriptive metadata, administrative
metadata, file inventory, structural map, structural links, and the behaviors section.
The METS header section (metsHdr) contains information about the METS record itself, such as
the name of the record, the creator, etc.
The descriptive metadata section (dmdSec) contains the descriptive metadata for the object, such
as a unique ID, information related to creation, etc. This section can also point to “metadata in external
documents or systems” with the mdRef element, or “[embed] descriptive metadata from a different
namespace in the METS document” with the mdWrap element (Cundiff, 2004, p. 55).
The administrative metadata section (amdSec) is divided into four subsections: technical
metadata (techMD), rights metadata (rightsMD), metadata about the original source (sourceMD), and
provenance metadata about the object (digiProvMD). Like dmdSec, the amdSec may contain metadata
that is external to the document or from other namespaces.
The file section (fileSec) lists all of the individual files that comprise the digital object. This section
can also be used “to record links to content files residing externally to the METS file” and “allows for files
to be grouped together into sets” (McDonough, 2006, p. 150).
The structural map (structMap) is the only mandatory section and is considered the core or
backbone of the METS document. It is “the means by which the hierarchical structure and the sequence
of the components of a digital object are expressed” (Cundiff, 2004, p. 58). Information here would be, for
example, the specific order of chapters within a digital book.
The structural links section (structLink) refers specifically to the structMap section, showing the
relationships between elements/components in structMap.
The behavior section (behaviorSec) contains information regarding how to interact with the object
when “viewing” it, such as the software needed and actions that need to be performed, like page turning.
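The seven sections described above can be pictured as the top-level children of a METS document. The sketch below assembles an empty skeleton in that order and refuses to build one without a structMap, reflecting the statement that structMap is the only mandatory section. The namespace URI and the helper names are assumptions for illustration, and the empty elements would not pass schema validation as they stand.

```python
import xml.etree.ElementTree as ET

# Assumed METS namespace.
METS_NS = "http://www.loc.gov/METS/"
ET.register_namespace("mets", METS_NS)

# The seven major sections, in the order the text lists them.
SECTION_ORDER = [
    "metsHdr", "dmdSec", "amdSec", "fileSec",
    "structMap", "structLink", "behaviorSec",
]


def build_skeleton(sections: set) -> ET.Element:
    """Create an empty METS root holding the requested sections in order."""
    if "structMap" not in sections:
        raise ValueError("structMap is the only mandatory section")
    root = ET.Element(f"{{{METS_NS}}}mets")
    for name in SECTION_ORDER:
        if name in sections:
            ET.SubElement(root, f"{{{METS_NS}}}{name}")
    return root


def local_names(root: ET.Element) -> list:
    """Strip the namespace from each child tag for easier reading."""
    return [child.tag.split("}", 1)[1] for child in root]


skeleton = build_skeleton({"structMap", "metsHdr", "fileSec"})
print(local_names(skeleton))  # ['metsHdr', 'fileSec', 'structMap']
```

Passing a set lets the caller name sections in any order while the output still follows the fixed section order.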
Because METS is so extensive, only the structMap and structLink sections will be discussed in
detail.
As previously stated, the Structural Map section (structMap) provides a kind of hierarchical and
sequential structuring of the components contained within the digital object. Essentially, it provides the
overall organization of the resource. In XML, this information is encoded in the element <structMap>
while the hierarchy is expressed in the <div> elements that are nested within the section (METS
Overview). The <div> element tells how the objects in the record need to be displayed (Pomerantz, 2017).
Within the <div> element(s) are two other elements that reference content relevant to that <div>
element, either within the current METS record, such as the content listed in the <fileSec> element, or in
another, external METS record (METS Manual). The <fptr> element references <file> elements that
already exist in the current METS record; its FILEID attribute points to the files recorded in the fileSec of
the record. The <mptr> element references content that exists in a METS record external to the one
currently being described. An XML example of <mptr> is below:
<mets:structMap TYPE="physical">
  <mets:div TYPE="multivolume book" LABEL="Martial Epigrams I &amp; II" DMDID="DMD1">
    <mets:div TYPE="volume" LABEL="Volume I">
      <mets:mptr LOCTYPE="URL" xlink:href="http://www.loc.gov/standards/mets/documentation/MatrialEpigrams.xml"/>
    </mets:div>
    <mets:div TYPE="volume" LABEL="Volume II">
      <mets:mptr LOCTYPE="URL" xlink:href="http://www.loc.gov/standards/mets/documentation/MatialEpigramsII.xml"/>
    </mets:div>
  </mets:div>
</mets:structMap>
In this example, <mptr> is being used to point to the external METS records via a URI.
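For the internal case, where <fptr> points into the same record, the lookup can be sketched as follows. The METS fragment, its file IDs, and the example.org URLs are hypothetical and exist only to make the sketch self-contained; the function name `resolve_fptrs` is likewise an assumption.

```python
import xml.etree.ElementTree as ET

# Assumed namespace URIs for METS and XLink.
NS = {
    "mets": "http://www.loc.gov/METS/",
    "xlink": "http://www.w3.org/1999/xlink",
}

# Hypothetical record: two page images listed in fileSec and referenced from structMap.
METS_XML = """
<mets:mets xmlns:mets="http://www.loc.gov/METS/"
           xmlns:xlink="http://www.w3.org/1999/xlink">
  <mets:fileSec>
    <mets:fileGrp USE="master">
      <mets:file ID="FILE001" MIMETYPE="image/tiff">
        <mets:FLocat LOCTYPE="URL" xlink:href="http://example.org/page1.tif"/>
      </mets:file>
      <mets:file ID="FILE002" MIMETYPE="image/tiff">
        <mets:FLocat LOCTYPE="URL" xlink:href="http://example.org/page2.tif"/>
      </mets:file>
    </mets:fileGrp>
  </mets:fileSec>
  <mets:structMap TYPE="physical">
    <mets:div TYPE="book" LABEL="Hypothetical book">
      <mets:div TYPE="page" LABEL="Page 1">
        <mets:fptr FILEID="FILE001"/>
      </mets:div>
      <mets:div TYPE="page" LABEL="Page 2">
        <mets:fptr FILEID="FILE002"/>
      </mets:div>
    </mets:div>
  </mets:structMap>
</mets:mets>
"""


def resolve_fptrs(root: ET.Element) -> list:
    """Pair each div's LABEL with the location of every file it points to."""
    # Index the fileSec entries by their ID attribute.
    files = {f.get("ID"): f for f in root.iterfind(".//mets:file", NS)}
    resolved = []
    for div in root.iterfind(".//mets:structMap//mets:div", NS):
        for fptr in div.findall("mets:fptr", NS):
            flocat = files[fptr.get("FILEID")].find("mets:FLocat", NS)
            resolved.append((div.get("LABEL"), flocat.get(f"{{{NS['xlink']}}}href")))
    return resolved


pairs = resolve_fptrs(ET.fromstring(METS_XML))
print(pairs)
```

Each page <div> is matched to the file whose ID equals its FILEID, which is the indirection that lets the structural map stay separate from the file inventory.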
Attributes within the <div> element, such as TYPE and LABEL, help describe each component in
the structure. The TYPE attribute “specifies the type of structural division that the <div> element
represents”, such as a chapter, page, track, etc. The LABEL attribute essentially identifies what the <div>
element is displayed as, such as a table of contents, a chapter title/number, and so on. The LABEL
attribute is especially relevant to hierarchical arrangements within the <structMap> (Digital Library
Federation, 2007, p. 59). An XML example of a simple <structMap> section is below:
<mets:structMap TYPE="physical">
  <mets:div TYPE="book" LABEL="Martial Epigrams II" DMDID="DMD1">
    <mets:div TYPE="page" LABEL="Blank page"/>
    <mets:div TYPE="page" LABEL="Page i: Series title page"/>
    <mets:div TYPE="page" LABEL="Page ii: Blank page"/>
    <mets:div TYPE="page" LABEL="Page iii: Title page"/>
    <mets:div TYPE="page" LABEL="Page iv: Publication info"/>
    <mets:div TYPE="page" LABEL="Page v: Table of contents"/>
    <mets:div TYPE="page" LABEL="Page vi: Blank page"/>
    <mets:div TYPE="page" LABEL="Page 1: Half title page"/>
    <mets:div TYPE="page" LABEL="Page 2 (Latin)"/>
    <mets:div TYPE="page" LABEL="Page 3 (English)"/>
    <mets:div TYPE="page" LABEL="Page 4 (Latin)"/>
    <mets:div TYPE="page" LABEL="Page 5 (English)"/>
    <mets:div TYPE="page" LABEL="Page 6 (Latin)"/>
    <mets:div TYPE="page" LABEL="Page 7 (English)"/>
  </mets:div>
</mets:structMap>
(Digital Library Federation, 2007, p. 58).
In the above example, what is being described is a physical book, as indicated by the TYPE attributes in
both the <structMap> and <div> elements, where the nested <div> elements break the book down into
its individual pages (TYPE=“page”). The LABEL provides each page's number along with a description of
what that page displays.
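Because the hierarchy lives entirely in nested <div> elements, a short recursive walk is enough to turn a <structMap> into a readable outline. The sketch below uses a shortened copy of the example with an assumed namespace declaration; the function name `outline` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumed METS namespace.
NS = {"mets": "http://www.loc.gov/METS/"}

# Shortened copy of the structMap example, with a namespace declaration added.
STRUCT_MAP_XML = """
<mets:structMap xmlns:mets="http://www.loc.gov/METS/" TYPE="physical">
  <mets:div TYPE="book" LABEL="Martial Epigrams II">
    <mets:div TYPE="page" LABEL="Page i: Series title page"/>
    <mets:div TYPE="page" LABEL="Page 1: Half title page"/>
  </mets:div>
</mets:structMap>
"""


def outline(element: ET.Element, depth: int = 0) -> list:
    """Return one indented 'TYPE: LABEL' line per div, in document order."""
    lines = []
    for div in element.findall("mets:div", NS):
        lines.append(f"{'  ' * depth}{div.get('TYPE')}: {div.get('LABEL')}")
        # Recurse into child divs, indenting one level deeper.
        lines.extend(outline(div, depth + 1))
    return lines


lines = outline(ET.fromstring(STRUCT_MAP_XML))
print("\n".join(lines))
# book: Martial Epigrams II
#   page: Page i: Series title page
#   page: Page 1: Half title page
```

The indentation in the output mirrors the nesting depth of each <div>, while the order of lines follows the sequence of the elements.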
The Structural Link Section element <structLink> “allows for the specification of hyperlinks
between the different components of a METS structure that are delineated in a structural map” (Digital
Library Federation, 2007, p. 76). The Structural Map Link (smLink) element, which is repeatable, can
express a link between any two <div> elements in the <structMap> section (Cundiff, 2004, p. 56).
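A link of this kind can be resolved by matching the two ends of each smLink against the ID attributes of the <div> elements. The sketch below assumes that smLink carries xlink:from and xlink:to attributes holding those IDs; the record, its IDs and labels, and the function name `resolve_links` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumed namespace URIs for METS and XLink.
NS = {
    "mets": "http://www.loc.gov/METS/",
    "xlink": "http://www.w3.org/1999/xlink",
}
XLINK = f"{{{NS['xlink']}}}"

# Hypothetical record: two pages of a site, with a hyperlink from one to the other.
METS_XML = """
<mets:mets xmlns:mets="http://www.loc.gov/METS/"
           xmlns:xlink="http://www.w3.org/1999/xlink">
  <mets:structMap TYPE="logical">
    <mets:div ID="SITE" TYPE="website" LABEL="Hypothetical site">
      <mets:div ID="P1" TYPE="page" LABEL="Home page"/>
      <mets:div ID="P2" TYPE="page" LABEL="About page"/>
    </mets:div>
  </mets:structMap>
  <mets:structLink>
    <mets:smLink xlink:from="P1" xlink:to="P2"/>
  </mets:structLink>
</mets:mets>
"""


def resolve_links(root: ET.Element) -> list:
    """Translate each smLink into a (source LABEL, target LABEL) pair."""
    # Index every div in the structural map by its ID attribute.
    divs = {d.get("ID"): d for d in root.iterfind(".//mets:structMap//mets:div", NS)}
    links = []
    for sm_link in root.iterfind("mets:structLink/mets:smLink", NS):
        source = divs[sm_link.get(f"{XLINK}from")]
        target = divs[sm_link.get(f"{XLINK}to")]
        links.append((source.get("LABEL"), target.get("LABEL")))
    return links


links = resolve_links(ET.fromstring(METS_XML))
print(links)  # [('Home page', 'About page')]
```

Keeping links in structLink rather than inside the <div> elements means non-hierarchical connections, such as hyperlinks between sibling pages, do not disturb the hierarchy recorded in the structural map.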
-------------------------------------------
Resources/Further Reading
METS Homepage, which includes link to METS schemas and documentation, example documents,
news, and other resources - https://www.loc.gov/standards/mets/mets-home.html
Youtube - Metadata MOOC 4-12: Metadata Encoding and Transmission Standard (METS)
(https://www.youtube.com/watch?v=i0Uet7MLqrg) - introductory video on METS from Jeffrey Pomerantz
of UNC Chapel Hill
References
Caplan, P. (2009). Understanding PREMIS. Washington, DC: Library of Congress. Retrieved from
http://www.loc.gov/standards/premis/understanding-premis.pdf
Caplan, P., & Guenther, R. (2005). Practical Preservation: The PREMIS Experience. Library
Trends, 54(1), 111–124. https://doi.org/10.1353/lib.2006.0002
Cundiff, M. V. (2004). An introduction to the Metadata Encoding and Transmission Standard
(METS). Library Hi Tech, 22(1), 52–64.
https://doi.org/10.1108/07378830410524495
Digital Library Federation. (2007). METS: Metadata Encoding and Transmission Standard: Primer
and reference manual. Washington, DC: Digital Library Federation. Retrieved from
http://www.loc.gov/standards/mets/METSPrimer.pdf
Donaldson, D. R., & Conway, P. (2010). Implementing PREMIS: A case study of the Florida
Digital Archive. Library Hi Tech, 28(2), 273–289.
https://doi.org/10.1108/07378831011047677
Lavoie, B. (2008). PREMIS With a Fresh Coat of Paint: Highlights from the Revision of the
PREMIS Data Dictionary for Preservation Metadata. D-Lib Magazine, 14(5/6). Retrieved
from http://www.dlib.org/dlib/may08/lavoie/05lavoie.html
Library of Congress. (2017). [XML Example 1 of PREMIS version 3.0]. PREMIS XML Usage
Examples. Retrieved from
https://www.loc.gov/standards/premis/v3/sample-records/PREMIS%203%20example%201.xml
Pomerantz, J. [Jeffrey Pomerantz]. (2017, January 11). Metadata MOOC 4-9: PREMIS Data
Dictionary for Preservation Metadata, Part 1 [Video file]. Retrieved from
https://www.youtube.com/watch?v=-_rntZXG7TY
Pomerantz, J. [Jeffrey Pomerantz]. (2017, January 11). Metadata MOOC 4-10: PREMIS Data
Dictionary for Preservation Metadata, Part 2 [Video file]. Retrieved from
https://www.youtube.com/watch?v=2JFaC6kFXpo
Pomerantz, J. [Jeffrey Pomerantz]. (2017, January 11). Metadata MOOC 4-12: Metadata
Encoding and Transmission Standard (METS) [Video file]. Retrieved from
https://www.youtube.com/watch?v=i0Uet7MLqrg&list=LLHTLEjC4ytlqnFHtEfL6t5g
PREMIS Editorial Committee. (2008). PREMIS Data Dictionary for Preservation Metadata, version 2.0.
Library of Congress. Retrieved from
http://www.loc.gov/standards/premis/v2/premis-2-0.pdf
UCSD Digital Library Program. (2005). METS: A Data Standard for Access and Preservation
Now and into the Future. Digital Letters, Summer (8). Retrieved from
http://web.archive.org/web/20060313123817/http://gort.ucsd.edu/dlpwg/dletters/issue8.pdf