Automatic Cataloging & Classification
Eric Childress
OCLC Research
Can machines be leveraged for this human labor?
Status quo
Input
– Baseline metadata
• Critical data present
• Accurate tagging
• Accurate values
Human labor
Output
– Ideal: Enriched metadata
The answer:
– Yes…with caveats
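The three baseline criteria (critical data present, accurate tagging, accurate values) are the kind of checks a machine can run before any enrichment starts. The sketch below is a minimal illustration and assumes a lot. Records are plain Python dicts, the element names loosely follow Dublin Core, and `CRITICAL_ELEMENTS`, `ALLOWED_ELEMENTS`, the four-digit-year rule, and `check_baseline` are all hypothetical.

```python
import re
from dataclasses import dataclass, field

# Hypothetical element sets (loosely Dublin Core-style); substitute your own schema.
CRITICAL_ELEMENTS = ("title", "creator", "date", "identifier")
ALLOWED_ELEMENTS = set(CRITICAL_ELEMENTS) | {"subject", "description", "publisher", "language"}
YEAR = re.compile(r"^\d{4}$")  # assumed value rule: dates recorded as four-digit years

@dataclass
class BaselineReport:
    missing: list = field(default_factory=list)     # critical data absent
    unknown: list = field(default_factory=list)     # element names outside the schema (tagging)
    bad_values: list = field(default_factory=list)  # blank or malformed values

    @property
    def ok(self) -> bool:
        return not (self.missing or self.unknown or self.bad_values)

def check_baseline(record: dict) -> BaselineReport:
    """Check one record against the three baseline criteria."""
    report = BaselineReport()
    for element in CRITICAL_ELEMENTS:
        if element not in record:
            report.missing.append(element)
    for element, value in record.items():
        if element not in ALLOWED_ELEMENTS:
            report.unknown.append(element)
        elif not str(value).strip():
            report.bad_values.append(element)
        elif element == "date" and not YEAR.match(str(value)):
            report.bad_values.append(element)
    return report
```

Records that fail the check would go to a person rather than straight to automated enrichment. That routing is one way to act on "with caveats".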
Automation approaches
– Library of Congress
– NSF-funded NSDL projects
– AMeGA
– iVia software
– RLG’s Automatic Exposure
Library of Congress
BEAT (Bibliographic Enrichment Advisory Team) activities & projects:
– MARC records from harvesting:
• E-CIP
• Web access to publications in series
– Numerous enrichment activities:
• TOCs: E-CIP, ONIX, dTOC project, more
• Reviews: HNET, Outstanding Reference Sources, HLAS reviews, MARS Best Free Reference Sites
• Contributor biographical information, ONIX descriptions, sample texts
• Links to e-versions of various texts
• Special projects for select LC collections
– Work with bibliographies & pathfinders
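Several of the BEAT enrichments listed above come down to adding fields to an existing MARC record. A table of contents fits a 505 (Formatted Contents Note), and a link to an electronic version fits an 856 (Electronic Location and Access). The sketch below is not LC's BEAT tooling. It represents a record as a plain list of `(tag, indicators, subfields)` tuples rather than using a real MARC library, and `toc_field`, `link_field`, and `enrich` are hypothetical helpers.

```python
from typing import List, Tuple

# A field is (tag, indicators, [(subfield_code, value), ...]); this tuple
# layout is an illustrative stand-in for a real MARC library's objects.
Field = Tuple[str, str, List[Tuple[str, str]]]

def toc_field(chapter_titles: List[str]) -> Field:
    """Build a 505 Formatted Contents Note; first indicator 0 = complete contents."""
    return ("505", "0 ", [("a", " -- ".join(chapter_titles) + ".")])

def link_field(url: str, materials: str) -> Field:
    """Build an 856 link; indicators 4 = HTTP access, 1 = version of the resource."""
    return ("856", "41", [("3", materials), ("u", url)])

def enrich(record: List[Field], new_fields: List[Field]) -> List[Field]:
    """Return a copy of the record with new fields added, skipping exact duplicates."""
    enriched = list(record)
    for f in new_fields:
        if f not in enriched:
            enriched.append(f)
    # Keep fields in tag order; sorted() is stable, so same-tag fields keep their order.
    return sorted(enriched, key=lambda f: f[0])
```

The duplicate check makes enrichment safe to rerun. That matters when the same record is touched by several feeds, such as E-CIP, ONIX, and dTOC.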
NSDL-related projects (selected)
MetaExtract: An NLP System to Automatically Assign Metadata
– CNLP (Syracuse U) & SIS (Syracuse U)
– Builds on several previous projects including:
• Breaking the MetaData Generation Bottleneck [2000-2002]
– CNLP (Syracuse U) & U Washington iSchool
– Application of NLP to automatically generate metadata for course-oriented materials
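The NLP systems above go well beyond simple word counting. Still, the basic idea of deriving subject keywords from a document's own text can be shown with a term-frequency baseline. This sketch is a deliberately crude stand-in, not MetaExtract's method. The stopword list and `suggest_keywords` are hypothetical.

```python
import re
from collections import Counter
from typing import List

# Small illustrative stopword list; real systems use far larger lists plus
# linguistic processing (part-of-speech tagging, phrase detection, etc.).
STOPWORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "on", "for",
             "is", "are", "with", "by", "this", "that", "be", "as", "at", "it"}

def suggest_keywords(text: str, k: int = 5, min_len: int = 3) -> List[str]:
    """Return the k most frequent non-stopword terms as candidate keywords."""
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter(t for t in tokens if t not in STOPWORDS and len(t) >= min_len)
    # most_common() orders equal counts by first occurrence in the text.
    return [term for term, _ in counts.most_common(k)]
```

For example, `suggest_keywords("Photosynthesis lab. In this lab students measure photosynthesis rates.", k=2)` returns `["photosynthesis", "lab"]`.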
Lenny
– Cornell NSDL group & INFOMINE
– Orchestrated application of a suite of activities
• OAI harvesting with metadata augmentation using iVia
• Loosely-coupled third party services to provide metadata enhancements (correction, augmentation) to metadata destined for a central repository
• Interactions orchestrated by centralized software application
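The Lenny design uses independent enhancement services, coordinated by one central application, that correct or augment harvested records. It can be sketched as a pipeline of functions that share only a record format. This is a minimal sketch under assumptions. Records are dicts, and the two services and `orchestrate` are hypothetical rather than Lenny's actual components. A failing service is logged and skipped so the others can still run.

```python
from typing import Callable, Dict, List, Tuple

Record = Dict[str, str]
Service = Callable[[Record], Record]

def normalize_whitespace(record: Record) -> Record:
    """Hypothetical correction service: collapse runs of whitespace in every value."""
    return {k: " ".join(v.split()) for k, v in record.items()}

def add_format_from_identifier(record: Record) -> Record:
    """Hypothetical augmentation service: infer a MIME type from the identifier's extension."""
    mime_types = {".pdf": "application/pdf", ".html": "text/html"}
    ident = record.get("identifier", "").lower()
    for ext, mime_type in mime_types.items():
        if ident.endswith(ext) and "format" not in record:
            return {**record, "format": mime_type}
    return record

def orchestrate(record: Record, services: List[Service]) -> Tuple[Record, List[Tuple[str, str]]]:
    """Run each service in turn; a failing service is logged and skipped, not fatal."""
    current = dict(record)
    log = []
    for service in services:
        try:
            current = service(dict(current))  # hand each service its own copy
            log.append((service.__name__, "ok"))
        except Exception as exc:  # services are independent, so one failure shouldn't stop the rest
            log.append((service.__name__, f"failed: {exc}"))
    return current, log
```

Passing each service a copy keeps one service's partial changes from leaking into the record if it fails partway through.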
MetaExtract study findings
Auto-generated versus manually-assigned:
– Comparable
• Performance in Retrieval
• Quality of most elements (for Browsing)
– Better
• Coverage of metadata elements
Auto-generated versus full-text:
– Comparable
• Performance in Retrieval
– Better
• Enables Fielded searching
• Enables Browsing of results
– Provides useful structuring of data
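The advantage of structured metadata over full text can be seen in a small example. A full-text search for "darwin" matches both a book by Darwin and a book about him. A fielded search on the creator element separates them, and grouping on a field gives a browse list. The records, field names, and functions below are hypothetical illustrations, not part of the MetaExtract study.

```python
from typing import Dict, List

Record = Dict[str, str]

RECORDS: List[Record] = [  # hypothetical sample records
    {"id": "r1", "title": "On the Origin of Species", "creator": "Darwin, Charles", "subject": "Evolution"},
    {"id": "r2", "title": "Darwin: A Life", "creator": "Smith, Jane", "subject": "Naturalists -- Biography"},
]

def full_text_search(records: List[Record], term: str) -> List[str]:
    """Match the term anywhere in the record, ignoring structure."""
    term = term.lower()
    return [r["id"] for r in records if any(term in v.lower() for v in r.values())]

def fielded_search(records: List[Record], field: str, term: str) -> List[str]:
    """Match the term only within one metadata element."""
    term = term.lower()
    return [r["id"] for r in records if term in r.get(field, "").lower()]

def browse(records: List[Record], field: str) -> Dict[str, List[str]]:
    """Group record ids by a field's value, as a browse index would."""
    index: Dict[str, List[str]] = {}
    for r in records:
        index.setdefault(r.get(field, "(none)"), []).append(r["id"])
    return dict(sorted(index.items()))
```

Here `full_text_search(RECORDS, "darwin")` returns both ids, while `fielded_search(RECORDS, "creator", "darwin")` returns only `"r1"`.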
Other projects
AMeGA (Automatic Metadata Generation Applications Project)
– UNC-CH SILS Metadata Research Center
– Research initiated to fulfill LC Bibliographic Control Action Plan 4.2 (deliver specifications for tools to effect automated processing of Web-based resources)
– Final report identifies and recommends functionalities for automatic metadata generation applications
iVia software
– Developed by INFOMINE and in use by NSDL and various other digital library projects; LC is looking at using iVia
– Sophisticated open source harvester software that can assign LCSH and LCC
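The slides do not say how iVia assigns LCSH and LCC. As a hedged illustration of automatic class assignment in general, the toy classifier below scores a text against small term profiles for three real LCC classes: QA (Mathematics), QH (Natural history, Biology), and Z (Bibliography, Library science). The profiles, the threshold, and `suggest_class` are all hypothetical. The classifier declines to guess when the evidence is thin, which is one practical form of "with caveats".

```python
import re
from typing import Dict, Optional, Set

# Hypothetical, hand-made term profiles for three real LCC classes.
CLASS_PROFILES: Dict[str, Set[str]] = {
    "QA": {"algebra", "calculus", "geometry", "mathematics", "theorem"},
    "QH": {"biology", "ecology", "evolution", "species", "organisms"},
    "Z": {"cataloging", "classification", "libraries", "metadata", "bibliography"},
}

def suggest_class(text: str, min_overlap: int = 2) -> Optional[str]:
    """Return the class whose profile shares the most terms with the text, or None."""
    terms = set(re.findall(r"[a-z]+", text.lower()))
    scores = {cls: len(terms & profile) for cls, profile in CLASS_PROFILES.items()}
    best = max(scores, key=scores.get)
    # Below the threshold, defer to a human cataloger rather than guess.
    return best if scores[best] >= min_overlap else None
```

Production classifiers train on large sets of already-cataloged records instead of hand-made profiles. The defer-below-threshold pattern still applies.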
Automatic Exposure
– RLG-led initiative advocates capturing standard technical metadata about digital images automatically, as part of image creation
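Much of an image's technical metadata is already present in the file when the image is created. A capture step therefore only has to read it out rather than have someone key it in. As a stand-in for capture at creation time, the sketch below reads dimensions, bit depth, and color type from the IHDR header chunk of a PNG file using only the standard library. The function name and output keys are hypothetical, and a real workflow would map them to whatever technical metadata standard it follows.

```python
import struct
import zlib
from typing import Dict

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
COLOR_TYPES = {0: "grayscale", 2: "truecolor", 3: "indexed-color",
               4: "grayscale with alpha", 6: "truecolor with alpha"}

def png_technical_metadata(data: bytes) -> Dict[str, object]:
    """Extract basic technical metadata from a PNG's IHDR chunk."""
    if data[:8] != PNG_SIGNATURE:
        raise ValueError("not a PNG file")
    # The first chunk must be IHDR: 4-byte length (13), 4-byte type, 13 data bytes, 4-byte CRC.
    length, chunk_type = struct.unpack(">I4s", data[8:16])
    if chunk_type != b"IHDR" or length != 13:
        raise ValueError("missing IHDR chunk")
    width, height, bit_depth, color_type, _compression, _filter, interlace = \
        struct.unpack(">IIBBBBB", data[16:29])
    (crc,) = struct.unpack(">I", data[29:33])
    if crc != zlib.crc32(data[12:29]):  # CRC covers chunk type + chunk data
        raise ValueError("IHDR CRC mismatch")
    return {"format": "image/png", "width": width, "height": height,
            "bit_depth": bit_depth, "color_type": COLOR_TYPES.get(color_type, "unknown"),
            "interlaced": interlace == 1, "file_size": len(data)}
```

Pass the function the raw bytes of a PNG file, for example `png_technical_metadata(open("scan.png", "rb").read())`, where `scan.png` is a placeholder name.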
OCLC activities
Recommended reading:
– Liddy, Elizabeth, “Metadata: A Promising Solution” in EDUCAUSE Review, v. 40, n. 3 (May/June 2005)
OCLC Research links:
– Automatic classification projects
– SchemaTrans
– ResearchWorks