Metadata - Data About Data
Kashif Rabbani
1 Introduction
Metadata is a term widely used in data science today, yet it is often misunderstood for lack of appropriate background knowledge. Philip Bagley [1] coined the term metadata in November 1968, but the concept itself dates back thousands of years, to the first libraries.
The first catalog, created for the Library of Alexandria in 245 BC, was called the Pinakes (Πίνακες in Ancient Greek). It was devised to solve the critical problem of quickly finding a book of interest. As an analogy, it worked much like the way we once scrolled through a VHS tape. The attributes used in these catalogs were the same as those used in today’s libraries, e.g. title, genre, and author.
The second development in library catalogs was the codex, a catalog in book form also called the shelf-list. The third, and most revolutionary, was the card catalog, introduced at the time of the French Revolution. The card catalog atomized the shelf-list in two dimensions: 1) records for individual items, and 2) headers or categories shared by those items. If we think about it, breaking data into records (individual items) and categories shared across those records essentially invents the spreadsheet. This two-dimensional atomization later led to the invention of databases.
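The two-dimensional atomization of the card catalog can be sketched in a few lines of Python, assuming each card is a dictionary mapping a category to a value: the individual records become rows, and the categories found on the cards become the column headers of a table. The sample cards are illustrative, not taken from any real catalog.

```python
# Each dictionary stands for one catalog card, i.e. one individual record.
cards = [
    {"title": "Metadata", "author": "Pomerantz", "genre": "Information science"},
    {"title": "Moby-Dick", "author": "Melville", "genre": "Fiction"},
    {"title": "Science and Sanity", "author": "Korzybski"},  # this card lacks a genre
]


def to_table(records):
    """Atomize records into two dimensions: column headers and one row per record."""
    # Categories: the union of keys across all cards, in first-seen order.
    headers = []
    for record in records:
        for key in record:
            if key not in headers:
                headers.append(key)
    # Records: one row per card; a category missing from a card becomes "".
    rows = [[record.get(h, "") for h in headers] for record in records]
    return headers, rows


headers, rows = to_table(cards)
print(headers)
for row in rows:
    print(row)
```

Once the categories are shared column headers, every card fits the same grid, which is exactly the shape a spreadsheet or a relational table takes.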
Let’s lay some groundwork for a technical definition of metadata. According to Alfred Korzybski (a Polish-American scholar recognized as the founder of general semantics),
The map is not the territory.
We encounter different types of maps in daily life, for example road maps (such as Google Maps), topographic maps, and nautical charts. Each of these maps serves a specific purpose, and they are generally not interchangeable. What they have in common is that each simplifies the abundance and complexity of the physical world down to the details one needs in a specific situation. In this sense, maps serve as a language for reducing the complexity of daily life. For example, we do not need topographic information (the shape and features of land surfaces) when planning a road trip; we only need weather, traffic, and road information. The map is therefore a separate, simpler object than the territory it describes. Likewise, metadata is a map: a way to simplify the complexity of an object.
When metadata performs its task well, its existence fades into the background. As an elementary example, every piece of information we recall while retracing our steps to find our lost house keys is metadata.
2 Metadata Standards
There are hundreds of metadata standards for different domain-specific areas. However, this report does not aim to overwhelm readers with metadata standards. A typology of metadata standards is formed to illustrate the
3 Metadata Types
Perhaps the best-known and most widely used type of metadata is descriptive metadata. However, it is not the only type. Different communities view metadata from different angles and have therefore come up with new types of metadata and new metadata standards. We discuss eight types of metadata in detail below.
and other web features. The purpose of Dublin Core therefore seems to fall short here, because search engines did not build their foundations on the Dublin Core metadata standards. Should we now declare Dublin Core a failure? No: Dublin Core drove the first initiative to implement the RDF data model. The best-known projects built on RDF data models are the Digital Public Library of America, Europeana, and DBpedia.
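To illustrate what expressing Dublin Core in the RDF data model looks like, here is a minimal Python sketch that turns a simple Dublin Core record into RDF triples in the N-Triples syntax. The namespace `http://purl.org/dc/elements/1.1/` is that of the Dublin Core Metadata Element Set; the subject URI `http://example.org/books/metadata` is hypothetical, and the record describes the Pomerantz book from the reference list. Real projects would normally rely on an RDF library rather than hand-written serialization.

```python
# Dublin Core Metadata Element Set, version 1.1, namespace.
DC = "http://purl.org/dc/elements/1.1/"

# Hypothetical URI for the resource being described.
SUBJECT = "http://example.org/books/metadata"

record = {
    "title": "Metadata",
    "creator": "Pomerantz, J.",
    "publisher": "MIT Press",
    "date": "2015",
}


def escape_literal(value):
    """Escape backslashes, double quotes and newlines for an N-Triples string literal."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def to_ntriples(subject_uri, fields):
    """Emit one triple per Dublin Core element: <subject> <predicate> "literal" ."""
    lines = []
    for element, value in fields.items():
        predicate = DC + element
        lines.append(f'<{subject_uri}> <{predicate}> "{escape_literal(value)}" .')
    return "\n".join(lines)


print(to_ntriples(SUBJECT, record))
```

Each Dublin Core element becomes a predicate URI, so the flat record turns into a small graph that can be merged with triples from other sources.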
DBpedia extracts structured information from the Wikipedia project and stores it as RDF, which is available on the World Wide Web. It allows Wikipedia resources to be queried semantically for their properties and relationships, and it links them to other RDF ontologies. It is also regarded as one of the best efforts in decentralized Linked Data.
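DBpedia is queried in practice through SPARQL. The Python sketch below is not SPARQL and does not contact DBpedia; it is a dependency-free illustration of the underlying idea: matching triple patterns that contain variables (`?work`, `?artist`) against RDF-style triples and joining the results. The `ex:` identifiers are hypothetical stand-ins for DBpedia resources and properties.

```python
# Hypothetical triples in the style of DBpedia; "ex:" abbreviates an example namespace.
triples = [
    ("ex:Mona_Lisa", "ex:author", "ex:Leonardo_da_Vinci"),
    ("ex:Mona_Lisa", "ex:museum", "ex:Louvre"),
    ("ex:The_Last_Supper", "ex:author", "ex:Leonardo_da_Vinci"),
    ("ex:Leonardo_da_Vinci", "ex:birthPlace", "ex:Vinci"),
]


def is_var(term):
    """Terms starting with '?' are variables, as in SPARQL."""
    return term.startswith("?")


def match(pattern, triple, bindings):
    """Unify one triple pattern with one triple; return the extended bindings or None."""
    extended = dict(bindings)
    for p, t in zip(pattern, triple):
        if is_var(p):
            if p in extended and extended[p] != t:
                return None  # variable already bound to a different value
            extended[p] = t
        elif p != t:
            return None  # constant term does not match
    return extended


def query(patterns, data):
    """Evaluate a conjunction of triple patterns by joining their matches in turn."""
    results = [{}]
    for pattern in patterns:
        next_results = []
        for bindings in results:
            for triple in data:
                extended = match(pattern, triple, bindings)
                if extended is not None:
                    next_results.append(extended)
        results = next_results
    return results


# Which works were made by an artist born in Vinci?
answers = query([("?work", "ex:author", "?artist"),
                 ("?artist", "ex:birthPlace", "ex:Vinci")], triples)
for row in answers:
    print(row["?work"], row["?artist"])
```

Each pattern extends or filters the bindings produced by the previous one, which is the core of a SPARQL basic graph pattern; a real endpoint adds indexing, filters, and much more.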
Europeana was started to preserve European cultural heritage in digital form. Leonardo da Vinci’s famous Mona Lisa painting is one example of the items found in Europeana. More than 3,000 institutions have contributed to Europeana, which lets users explore Europe’s cultural and scientific heritage.
The nice thing about standards is that there are so many of them to choose
from. – Admiral Grace Hopper
useful and extensible at the same time. In the following subsections, we discuss three types of metadata that fall under administrative metadata.
Rights Metadata It provides information about the access-control rules and regulations that apply to a resource. Digital resources very often raise copyright issues, so a schema is needed to capture data about the rights attached to them; recall the “rights” element of Dublin Core. The Dublin Core standard extends this with three more elements: 1) Access Rights: the policies and rights governing access to the resource; 2) Rights Holder: the individual or organization that holds the rights; 3) License: the legal document that grants permission to use the resource.
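A rights record using these elements might look like the following Python sketch. The term URIs come from the DCMI Metadata Terms namespace (`http://purl.org/dc/terms/`); the record values and the `may_access` rule are hypothetical, chosen only to show how such metadata can drive access decisions.

```python
# DCMI Metadata Terms namespace, which defines rights, accessRights, rightsHolder and license.
DCTERMS = "http://purl.org/dc/terms/"

# Hypothetical rights metadata for a digitized item.
rights_record = {
    DCTERMS + "rights": "Copyright Example Museum",
    DCTERMS + "accessRights": "public",
    DCTERMS + "rightsHolder": "Example Museum",
    DCTERMS + "license": "https://creativecommons.org/licenses/by/4.0/",
}

RIGHTS_TERMS = ("accessRights", "rightsHolder", "license")


def missing_rights_terms(record):
    """Return the rights-related terms that the record leaves empty or omits."""
    return [term for term in RIGHTS_TERMS if not record.get(DCTERMS + term)]


def may_access(record, user_is_member):
    """Toy policy (an assumption): 'public' items are open, anything else needs membership."""
    if record.get(DCTERMS + "accessRights") == "public":
        return True
    return user_is_member


print(missing_rights_terms(rights_record))
print(may_access(rights_record, user_is_member=False))
```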
3.5 Meta-Metadata
The Metadata Encoding and Transmission Standard (METS) started in the early 2000s in response to an enormous increase in data from digital resources held by libraries, museums, archives, and cultural-heritage institutions. This growth brought an exponential increase in metadata schemas and standards for those resources. Popular repositories that came into existence include arXiv.org, Fedora, EPrints, and DSpace, some of which are still maintained and well known. The proliferation created the problem of reproducing content and functionality across repositories. To solve it, METS provided a standard structure for metadata about resources and enabled data exchange among different repositories.
METS records metadata in documents. A METS document is a mechanism for expressing the relationships that exist between a digital library object and its pieces of content. A METS document has seven parts: the Header, Descriptive Metadata, Administrative Metadata, File Section, Structural Map, Structural Links, and Behavior.
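The seven parts of a METS document can be sketched with Python’s standard `xml.etree.ElementTree`. The METS schema names the sections `metsHdr`, `dmdSec`, `amdSec`, `fileSec`, `structMap`, `structLink`, and `behaviorSec`, under the namespace `http://www.loc.gov/METS/`. This is a minimal skeleton under those assumptions: the IDs and label are made up, and the output has not been validated against the METS schema.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
ET.register_namespace("mets", METS_NS)  # serialize tags with the "mets:" prefix


def q(tag):
    """Qualify a tag name with the METS namespace."""
    return f"{{{METS_NS}}}{tag}"


def mets_skeleton(label):
    """Build an empty METS document containing its seven major sections, in schema order."""
    root = ET.Element(q("mets"), {"LABEL": label})
    ET.SubElement(root, q("metsHdr"))                  # header: about the METS document itself
    ET.SubElement(root, q("dmdSec"), {"ID": "DMD1"})   # descriptive metadata
    ET.SubElement(root, q("amdSec"), {"ID": "AMD1"})   # administrative metadata
    file_sec = ET.SubElement(root, q("fileSec"))       # inventory of content files
    group = ET.SubElement(file_sec, q("fileGrp"))
    ET.SubElement(group, q("file"), {"ID": "FILE1"})
    struct_map = ET.SubElement(root, q("structMap"))   # hierarchical structure of the object
    div = ET.SubElement(struct_map, q("div"), {"LABEL": label, "DMDID": "DMD1"})
    ET.SubElement(div, q("fptr"), {"FILEID": "FILE1"})  # ties the structure to a file
    ET.SubElement(root, q("structLink"))               # links between structMap divisions
    ET.SubElement(root, q("behaviorSec"))              # executable behaviors
    return root


doc = mets_skeleton("Sample digitized book")
print(ET.tostring(doc, encoding="unicode"))
```

The structural map is where the relationships are made explicit: its `div` points to the descriptive metadata through `DMDID` and to a content file through `fptr`.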
4 Domain-Specific Metadata
Metadata is everywhere, but some of the most visible areas are healthcare, the environment, geospatial data, education, the music industry, and the automobile industry. We discuss each domain in detail below.
4.2 Education
Education is a broad field, and plenty of learning resources are available online to support learning pathways. Metadata comes into the picture when we need to standardize learning objects. In 2002, the Institute of Electrical and Electronics Engineers (IEEE) published the Learning Object Metadata (LOM) standard for describing learning objects.
Teaching is the other side of learning, and learning objects support both teaching and learning around a single learning objective. Because most of these learning resources are digital, it is easy to standardize their distribution under a single metadata body. LOM defines a set of categories, each containing a specific set of elements. As a result of this initiative, many education systems adopted LOM, e.g. the learning management systems (LMS) used in K-12². LOM categories include the Educational category, comprising elements such as TypicalAgeRange and TypicalLearningTime, and the Rights category, comprising the Copyright element.
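A LOM-style record can be pictured as categories that each hold a set of elements. The Python sketch below models this as a nested dictionary. It is a simplified, hypothetical record: the actual IEEE LOM schema defines more categories and elements and represents values differently, and the age check assumes the simple `low-high` range format used here.

```python
# Hypothetical, simplified LOM-style record: category -> element -> value.
lom_record = {
    "General": {"Title": "Introduction to Fractions", "Language": "en"},
    "Educational": {
        "TypicalAgeRange": "8-10",
        "TypicalLearningTime": "PT45M",  # an ISO 8601 duration: 45 minutes
    },
    "Rights": {"Copyright": "yes"},
}


def get_element(record, category, element, default=None):
    """Look up one element inside one category of the record."""
    return record.get(category, {}).get(element, default)


def suits_age(record, age):
    """Check a learner's age against TypicalAgeRange, assuming a 'low-high' string."""
    age_range = get_element(record, "Educational", "TypicalAgeRange")
    if age_range is None:
        return False
    low, high = (int(part) for part in age_range.split("-"))
    return low <= age <= high


print(get_element(lom_record, "Educational", "TypicalLearningTime"))
print(suits_age(lom_record, 9))
```

Grouping elements by category lets a learning management system ask targeted questions, such as "is this object suitable for a nine-year-old?", without parsing the learning object itself.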
4.3 Transcripts
As the heading does not convey much about this domain, let us dig into its essence. Educational institutes issue transcripts, degrees, and certificates to their students, but not every institute in a state is linked with the others. A reliable alternative to verifying transcripts via physical mail became a necessity. Parchment³ is a company that uses metadata to develop schemas representing students’ degree programs and courses in a well-structured way. This area has only recently been standardized in higher education. Parchment facilitates the verification of transcripts across institutes and companies by enabling easy import and export of students’ transcripts and credentials.
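Parchment’s actual schema is not described here, so the following Python sketch uses a hypothetical transcript schema to show the general idea: a structured record of a degree program and its courses that one institution exports as JSON and another imports without loss.

```python
import json
from dataclasses import dataclass, field, asdict


# Hypothetical schema for illustration; it is not Parchment's actual format.
@dataclass
class Course:
    code: str
    title: str
    credits: float
    grade: str


@dataclass
class Transcript:
    student_id: str
    institution: str
    degree_program: str
    courses: list = field(default_factory=list)  # list of Course objects


def export_transcript(transcript):
    """Serialize a transcript (including nested courses) to JSON."""
    return json.dumps(asdict(transcript), sort_keys=True)


def import_transcript(text):
    """Rebuild a Transcript and its Course objects from exported JSON."""
    data = json.loads(text)
    data["courses"] = [Course(**c) for c in data["courses"]]
    return Transcript(**data)


original = Transcript("S-001", "Example University", "BSc Computer Science",
                      [Course("CS101", "Programming I", 6.0, "A")])
payload = export_transcript(original)
restored = import_transcript(payload)
print(restored == original)
```

Because both sides agree on the schema, the receiving institution can verify the record field by field instead of reading a scanned document.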
4.4 Publishing
Publications and descriptive metadata have been interrelated for many decades. Traditionally, this metadata consisted only of publisher details, publication date, ISBN, and the like. With the arrival of e-books and online self-publishing platforms, however, it has drawn much more attention. Amazon Kindle Direct Publishing and Lulu are two such modern self-publishing platforms. It has been observed that the quality and richness of the metadata attached to these publications is critical to whether readers discover a title at all.
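One way to make metadata quality concrete is an automated check. The Python sketch below flags missing fields in a publication record and verifies the ISBN-13 check digit (weights alternate 1 and 3, and the weighted sum must be divisible by 10). The list of required fields is an assumption, not the requirement of any particular platform, and the sample ISBN is a commonly used example value.

```python
# Hypothetical set of fields a publishing platform might expect.
REQUIRED_FIELDS = ("title", "author", "publisher", "publication_date", "isbn")


def isbn13_is_valid(isbn):
    """ISBN-13 rule: weights 1, 3, 1, 3, ...; the weighted digit sum must be divisible by 10."""
    digits = isbn.replace("-", "").replace(" ", "")
    if len(digits) != 13 or any(c not in "0123456789" for c in digits):
        return False
    total = sum(int(d) * (3 if i % 2 else 1) for i, d in enumerate(digits))
    return total % 10 == 0


def metadata_problems(record):
    """List quality problems in a publication record; an empty list means it passed."""
    problems = [f"missing {name}" for name in REQUIRED_FIELDS if not record.get(name)]
    isbn = record.get("isbn")
    if isbn and not isbn13_is_valid(isbn):
        problems.append("invalid ISBN-13")
    return problems


book = {
    "title": "Example Title",
    "author": "A. Writer",
    "publisher": "Example Press",
    "publication_date": "2020-01-01",
    "isbn": "978-0-306-40615-7",
}
print(metadata_problems(book))
print(metadata_problems({"title": "Draft", "isbn": "978-0-306-40615-8"}))
```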
² https://en.wikipedia.org/wiki/K%E2%80%9312
³ https://www.parchment.com/
Maps are now in everyone’s pocket, and most businesses also use maps to visualize different aspects of their projects. The geospatial domain makes extensive use of metadata: geospatial metadata describes maps, Geographic Information System (GIS) files, imagery, and other location-based resources. Metadata is part of the dataset, and it provides context for the data.
Metadata contains information about the data’s origin, custodianship, copyright, and reuse. It is now widely used in spatial data communities for sharing and transferring information. Geographic metadata is responsible for making users aware of geographic data’s limitations, suitability, indexing, and restrictions.
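As an illustration of how geographic metadata supports indexing and suitability decisions, this Python sketch stores a bounding box (west, east, south, and north coordinates) in each record and finds the datasets covering a location by consulting only the metadata. The catalogue entries and field names are hypothetical and much simpler than those of real geospatial metadata standards.

```python
from dataclasses import dataclass


@dataclass
class BoundingBox:
    """Spatial extent in decimal degrees (longitude/latitude assumed)."""
    west: float
    east: float
    south: float
    north: float

    def contains(self, lon, lat):
        # Simple check; it ignores boxes that cross the antimeridian (west > east).
        return self.west <= lon <= self.east and self.south <= lat <= self.north


# Hypothetical catalogue of dataset metadata records.
catalogue = [
    {"title": "Example city road network", "custodian": "Example Mapping Agency",
     "extent": BoundingBox(west=4.7, east=5.1, south=52.2, north=52.5)},
    {"title": "Example coastal imagery", "custodian": "Example Survey",
     "extent": BoundingBox(west=-10.0, east=-5.0, south=50.0, north=55.0)},
]


def datasets_covering(lon, lat, records):
    """Use only the metadata, never the data itself, to find datasets covering a point."""
    return [r["title"] for r in records if r["extent"].contains(lon, lat)]


print(datasets_covering(4.9, 52.37, catalogue))
```

Searching extents rather than opening every GIS file is what makes catalogue-style discovery of spatial data practical.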
Geospatial Metadata Tools There are a few well-known Geographic Information System (GIS) tools, e.g. ArcGIS, pycsw, and OSGeo. ArcGIS⁵ is the best known and most widely adopted in industry because it enables users to create and use maps, compile geographic data, and share and manage geographic information. pycsw⁶ and OSGeo⁷ are used as frameworks to manage and create geospatial data.
5 Use of Metadata
5.2 Paradata
This term is mostly used for metadata about learning resources, in both education and research. In the context of education, paradata describes educational resources; in the context of research methodology, it is mostly used to create metadata records and schemas for the large, sometimes confidential, datasets used in extensive experiments. Examples include metadata records about the origin of a dataset and the timeline of its collection and use.
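A paradata record of the kind described above can be sketched in Python, assuming it holds a dataset’s origin, a confidentiality flag, and a dated timeline of collection and usage events. The class and field names are hypothetical.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Paradata:
    """Hypothetical paradata record: where a dataset came from and what happened to it, when."""
    dataset: str
    origin: str
    confidential: bool
    events: list = field(default_factory=list)  # (date, description) pairs

    def log(self, when, description):
        """Record an event and keep the timeline in chronological order."""
        self.events.append((when, description))
        self.events.sort()

    def timeline_days(self):
        """Number of days between the first and the last logged event."""
        if not self.events:
            return 0
        return (self.events[-1][0] - self.events[0][0]).days


record = Paradata("survey-2020", "Example field survey", confidential=True)
record.log(date(2020, 3, 1), "data collection started")
record.log(date(2020, 5, 30), "data collection finished")
record.log(date(2020, 6, 15), "dataset used in experiment A")
print(record.timeline_days())
```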
6 Conclusion
We have discussed a few aspects of metadata in this report; given the breadth of the topic, we cannot capture every detail. Metadata is thriving in every domain, and it has good and bad aspects depending on how it is used. The author regards metadata as a parasite and a matter of perspective: we rarely know whether it is there, whether it is harmful, what type it is, or what its uses and characteristics are.
References
1. History of information (1968), http://www.historyofinformation.com/detail.php?entryid=4241
2. FGDC (2010), https://www.fgdc.gov/metadata
3. LePage, A.: Introduction to Metadata. Edited by Murtha Baca. Getty Research Institute (2009), http://www.getty.edu/research/conducting_research/standards/intrometadata
4. Pomerantz, J.: Metadata. MIT Press (2015)