PreservationMetadata NCDDworkshop 2014 Titia VD Werf
adapted from:
Rebecca Guenther, “Metadata for preservation of digital objects:
background, functions, and standards” – Preservation Metadata Workshop (1),
Hilversum, The Netherlands, 4 March 2014
OUTLINE
1. General introduction to preservation metadata
2. The PREMIS Data Dictionary
3. A use case: the Preservation Health Check
Introduction to preservation
metadata
metadata
Function:
• Discovery
• Access
• Management
• Control intellectual property rights
• Identification
• Certify authenticity
• Mark content structure
• Indicate status
• Describe processes
• Etc.
Type:
• Descriptive
• Administrative
• Technical
• Rights/Access
• Structural
• Meta-metadata
• Etc.
digital preservation
Digital preservation is part and parcel of the “management and
preservation” tasks and responsibilities of a heritage institution.
preservation metadata in 2000
“We can then say that the main problem metadata
for long term preservation will help to solve is the
problem of technological obsolescence.” (p.4)
http://www.kb.nl/sites/default/files/docs/NEDLIBmetadata.pdf
preservation metadata in 2002
“Preservation metadata (…) is the information
necessary to maintain the viability,
renderability, and understandability of digital
resources over the long-term.” (p.1)
http://www.oclc.org/content/dam/research/activities/pmwg/pm_framework.pdf?urlm=161391
preservation metadata in 2005
“Preservation metadata (…) metadata supporting
the functions of maintaining viability,
renderability, understandability, authenticity,
and identity in a preservation context.” (p. ix)
http://www.loc.gov/standards/premis/
The SPOT Model for risk assessment
http://www.dlib.org/dlib/september12/vermaaten/09vermaaten.html
Six essential properties of successful digital preservation (the SPOT Model identifies threats to each):
• Availability
• Identity
• Persistence
• Renderability
• Understandability
• Authenticity
metadata and preservation metadata
METADATA: “Structured information that describes, explains, locates,
or otherwise makes it easier to retrieve, use, or manage an
information resource”
PRESERVATION METADATA: metadata supporting and documenting the
digital preservation process
• Provenance:
– The chain of custody/ownership of the digital object; info about the
depositor; etc.
• Authenticity:
– The documentation of changes affecting the authenticity of the digital object
during the preservation process
• Preservation Activity:
– The documentation of actions taken to preserve the digital object
• Technical Environment:
– The documentation of the dependencies on and changes in the technical
environment needed to render and use the digital object
• Rights:
– The documentation of the rights and permissions for carrying out
preservation activities on the digital object (duplication, migration,
transformations)
OAIS Information Model
Preservation Description Information
Implementation choices: e.g. fixity information in source AIP, plus a
log of data integrity checks and their outcomes kept separate from the AIP.
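The implementation choice above can be illustrated with a minimal Python sketch. It assumes the fixity information is a SHA-256 digest stored with the AIP and that the integrity-check log is a plain list held outside it. The identifier `obj-001`, the field names, and the helper functions are hypothetical and are not taken from OAIS or PREMIS.

```python
import hashlib
import json
from datetime import datetime, timezone


def compute_digest(data: bytes, algorithm: str = "sha256") -> str:
    """Return the hex digest of `data` using the named hashlib algorithm."""
    return hashlib.new(algorithm, data).hexdigest()


def check_fixity(object_id, data, stored_digest, algorithm="sha256"):
    """Compare a fresh digest with the stored one and return a log entry."""
    outcome = "pass" if compute_digest(data, algorithm) == stored_digest else "fail"
    return {
        "object_id": object_id,
        "event_type": "fixity check",
        "event_date": datetime.now(timezone.utc).isoformat(),
        "algorithm": algorithm,
        "outcome": outcome,
    }


# Fixity information as it might be stored inside the AIP (hypothetical values).
content = b"example payload"
aip_fixity = {"algorithm": "sha256", "digest": compute_digest(content)}

# The integrity-check log is kept outside the AIP, here as a plain list.
integrity_log = []
integrity_log.append(check_fixity("obj-001", content, aip_fixity["digest"]))
# Simulate a corrupted copy to show a failing check.
integrity_log.append(check_fixity("obj-001", content + b"!", aip_fixity["digest"]))

print(json.dumps(integrity_log, indent=2))
```

Keeping the log separate means that repeated checks do not alter the AIP itself; only the stored digest is part of the package.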
OAIS compliance relevant to preservation metadata
Functions of a trusted digital repository
relevant to preservation metadata
• Maintains precise descriptions of actions necessary to ensure
that objects are preserved
• Has mechanisms for monitoring and notification when formats
are becoming obsolete
• Uses tools and resources such as format registries to
establish semantic and technical context
• Has processes for storage media and/or hardware changes
• Tracks and manages intellectual property rights and
restrictions
• Ensures that agreements applicable to access conditions are
adhered to
• Maintains descriptive metadata for access and retrieval and
associates it with the object
PREMIS
Standards that address preservation
metadata: technical
• PREMIS
• Images
– NISO Z39.87 and MIX
– Adobe and XMP (Extensible Metadata Platform)
– Exif (Exchangeable Image File Format)
– IPTC (International Press Telecommunications Council)/XMP
• Text: textMD
• Sound
– AES57-2011: Audio Object XML Schema
– AES60-2011: Core Audio Metadata
– AudioMD (Library of Congress)
• Video
– VideoMD
– SMPTE RP210
– Technical metadata in EBUCore, PBCore
– U.S. Federal Agencies Digitization Guidelines
– MPEG-7 and MPEG-21 for video
Standards that address preservation
metadata: Structural
• METS
• PREMIS
• MPEG-21 Digital Item Declaration
• OAI-ORE
• Specific format types
– MXF
– AVI
Standards that address preservation
metadata: Rights
• PREMIS
• METS Rights
• CDL Copyright schema
• Creative Commons
• PLUS for images
• MPEG-21 REL for moving images
• ONIX for licensing terms
• Full rights expression languages
– XrML/MPEG-21
– ODRL
PREMIS Data Dictionary
• May 2005: Data Dictionary for Preservation
Metadata: Final Report of the PREMIS Working Group
• Data Dictionary:
– Comprehensive view of information needed to support digital preservation
• Guidelines/recommendations to support creation, use, management
– Based on deep pool of institutional experiences in setting up and managing operational
capacity for digital preservation
Guiding principles: “implementable,
core preservation metadata”
• Preservation metadata: maintain viability, renderability,
understandability, authenticity, identity in a preservation
context
PREMIS data model entities:
• Intellectual Entities
• Objects
• Events
• Agents
• Rights Statements
Intellectual Entities
• Set of content that is considered a
single intellectual unit for purposes of
management and description (e.g., a
book, a photograph, a map, a
database)
Example:
• Intellectual Entity: Da Vinci Code by Dan Brown
– Representation 1: page image version
– Representation 2: ebook version
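The relationship between one Intellectual Entity and its representations can be sketched as a small data structure. This is only an illustration: the class names, field names, and file names below are hypothetical and are not PREMIS semantic units.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Representation:
    """One rendition of the content, made up of one or more files."""
    label: str
    files: List[str] = field(default_factory=list)


@dataclass
class IntellectualEntity:
    """A single intellectual unit that may have several representations."""
    title: str
    representations: List[Representation] = field(default_factory=list)


# File names are invented for illustration only.
book = IntellectualEntity(
    title="Da Vinci Code by Dan Brown",
    representations=[
        Representation("page image version", ["page_0001.tiff", "page_0002.tiff"]),
        Representation("ebook version", ["book.epub"]),
    ],
)

for rep in book.representations:
    print(f"{book.title}: {rep.label} ({len(rep.files)} file(s))")
```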
Semantic units pertaining to Agents
• Agent Identifier
• Agent Name
• Agent Type
• Agent Note
• Agent Extension
• Linking Event Identifier
• Linking Rights Identifier
Rights Statements
• An agreement with a rights holder
that grants permission for the
repository to undertake an
action(s) associated with an
Object(s) in the repository.
• Not a full rights expression
language; focuses exclusively on
permissions that take the form:
– Agent X grants Permission Y to the repository in regard to Object Z.
Example:
• Priscilla Caplan grants FCLA digital repository permission to
make three copies of metadata_fundamentals.pdf for
preservation purposes.
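The “Agent X grants Permission Y to the repository in regard to Object Z” form can be sketched as a simple record. The class below is a hypothetical illustration, not the PREMIS rights schema; the field names and the `permits` helper are assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RightsStatement:
    granting_agent: str  # Agent X
    permission: str      # Permission Y
    object_id: str       # Object Z
    repository: str      # the repository receiving the permission

    def describe(self) -> str:
        """Render the statement in the 'X grants Y in regard to Z' form."""
        return (f"{self.granting_agent} grants {self.repository} permission to "
                f"{self.permission} in regard to {self.object_id}.")

    def permits(self, action: str, object_id: str) -> bool:
        """True only if this statement covers exactly this action and object."""
        return action == self.permission and object_id == self.object_id


stmt = RightsStatement(
    granting_agent="Priscilla Caplan",
    permission="make three copies for preservation purposes",
    object_id="metadata_fundamentals.pdf",
    repository="FCLA digital repository",
)

print(stmt.describe())
print(stmt.permits("make three copies for preservation purposes",
                   "metadata_fundamentals.pdf"))
```

Matching on exact strings keeps the sketch short; a real repository would more likely use controlled vocabulary terms for the permitted acts.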
Semantic units pertaining to Rights
http://www.loc.gov/standards/premis/
Implementation resources
• Tools:
– XML schema
– PREMIS-in-METS toolbox <http://pim.fcla.edu>
– Controlled vocabularies at http://id.loc.gov
– RDF/OWL ontology for use as Linked Data
• Guidelines:
– PREMIS conformance statement
– PREMIS & METS guidelines
• Community Working groups on special topics
• Implementation Fairs
• Others:
– Understanding PREMIS (available in multiple languages)
– PIG Forum
– Implementation Registry
– Tools Registry
Some implementers …
• DAITSS (Florida)
• Ex Libris Rosetta
• OCLC’s Digital Archive™
• Archivematica
• HathiTrust
• TIPR (Towards Interoperable Preservation
Repositories)
– FCLA, NYU and Cornell
• Digital libraries in Spain
– Mandated for use in cultural heritage preservation
repositories
See: http://www.loc.gov/premis/premis-registry.html
PREMIS Conformance
• Conformance statement issued in 2010
• PREMIS Conformance Working Group active now
• Levels of conformance:
– Level 1
A repository uses an internal metadata schema whose elements can be
mapped to PREMIS. The mapped metadata can satisfy the principles of
use at both the semantic unit and Data Dictionary levels. The repository
is able to produce documentation demonstrating such mapping for
representative samples of its holdings.
– Level 2
A repository implements the PREMIS Data Dictionary as its internal
metadata schema in a way that satisfies the principles of use at both the
semantic unit and Data Dictionary levels and in a form that does not
require further mapping or conversion.
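Level 1 conformance relies on a documented mapping from an internal schema to PREMIS. A minimal sketch of such a mapping follows. The internal field names on the left and the sample record are hypothetical; the names on the right are PREMIS semantic units for Objects, and a real mapping would need to be checked against the Data Dictionary.

```python
# Hypothetical internal field names mapped to PREMIS semantic unit names.
INTERNAL_TO_PREMIS = {
    "file_id": "objectIdentifierValue",
    "checksum": "messageDigest",
    "checksum_type": "messageDigestAlgorithm",
    "byte_count": "size",
    "file_format": "formatName",
}


def to_premis(record: dict) -> dict:
    """Translate an internal record into PREMIS-named fields.

    Raises KeyError if the record contains a field with no documented mapping,
    so gaps in the mapping surface immediately.
    """
    unmapped = sorted(set(record) - set(INTERNAL_TO_PREMIS))
    if unmapped:
        raise KeyError(f"no PREMIS mapping for: {unmapped}")
    return {INTERNAL_TO_PREMIS[key]: value for key, value in record.items()}


# Sample record with invented values.
sample = {
    "file_id": "obj-001",
    "checksum": "abc123",
    "checksum_type": "SHA-256",
    "byte_count": 2048,
    "file_format": "PDF",
}

print(to_premis(sample))
```

Running this over representative samples of the holdings is one way to produce the documentation that Level 1 asks for.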
URLs, etc.
• PREMIS Maintenance Activity:
http://www.loc.gov/standards/premis/
What is the Preservation Health Check
Pilot?
- Open Planets Foundation (OPF)
A community hub for digital preservation whose main goal is
to jointly manage and improve tools and research
outcomes for practical use.
- OCLC Research
A community resource for shared R&D that addresses
challenges facing libraries and archives in a rapidly
changing information technology environment.
- Bibliothèque nationale de France
The BnF runs a fully operational trusted digital repository
(SPAR). They volunteered to become a PHC-pilot site.
The Preservation Health Check
proposition
As part of their preservation management task, repository
managers need to be able to monitor the preservation
status of the content of their repository.
We are looking at regular “routine check-ups” that can
support this monitoring task.
– Monitoring should be made easy (automatically
generated reports or dashboard)
– Monitoring should be based on objective data,
generated by the repository (e.g. preservation
metadata)
The analogy
The research question
If a Preservation Health Check is a monitoring activity to be
performed on a repository with digital content:
1. What are empirical indicators (i.e. measures) for PHCs?
2. Are preservation metadata recorded by repositories
useful as health indicators for PHCs?
[Figure: PREMIS Data Model (Events, Rights, Agents) set against the SPOT Model properties (Persistence, Renderability, Understandability, Authenticity)]
Findings: coverage
SPOT property        # of PREMIS semantic units*
Availability         16
Identity             19
Persistence          10
Renderability        15
Understandability    14
Authenticity         16
*Container level only; Agents, Events, Rights considered one semantic unit
Findings: coverage
• What does coverage in terms of “number of PREMIS
semantic units” mean?
• More meaningful: Do the PREMIS semantic units
address the threats associated with a SPOT property?
Logic for assessing Persistence
• If storage medium information is not available in PREMIS metadata,
the PHC will need to take other information sources into account –
such as audit reports generated by storage management systems.
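The fallback logic above might be sketched as follows. This is an assumption-laden illustration, not the pilot's actual implementation: `storageMedium` is the PREMIS semantic unit name, while the function, the audit-report structure, and its `storage_medium` key are hypothetical.

```python
from typing import Optional


def assess_persistence(premis_storage: dict,
                       audit_report: Optional[dict] = None) -> dict:
    """Find storage medium information, preferring PREMIS metadata.

    Falls back to an audit report from a storage management system when
    the PREMIS metadata does not record a storage medium.
    """
    medium = premis_storage.get("storageMedium")
    if medium:
        return {"source": "PREMIS", "storage_medium": medium}
    if audit_report and audit_report.get("storage_medium"):
        return {"source": "audit report",
                "storage_medium": audit_report["storage_medium"]}
    # Neither source has the information: flag it for follow-up.
    return {"source": None, "storage_medium": None}


print(assess_persistence({"storageMedium": "magnetic tape"}))
print(assess_persistence({}, {"storage_medium": "hard disk"}))
print(assess_persistence({}))
```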
Thank You!
titia.vanderwerf@oclc.org
©2014 OCLC. This work is licensed under a Creative Commons Attribution 3.0 Unported License. Suggested attribution: “This
work uses content from [presentation title] © OCLC, used under a Creative Commons Attribution license:
http://creativecommons.org/licenses/by/3.0/”