
Knowledge Organization:
Library Tools and
Taxonomies for the Web
Jan Herd jher@loc.gov
Business Reference Services
Science, Technology & Business Division
The Library of Congress
2
Is the Web too big to organize?
One billion pages
1.5 million pages added daily
Selection of sites by collection development specialists/reference librarians

3
Librarians work in
corporate settings
Yahoo.com (directory)
Northern Light.com
(search engine)
Amazon.com (e-book seller)
Microsoft.com
4
OCLC Library Corporation
Cooperatively Catalogs:
45 Million Works
350,000 Web sites and growing

5
Traditional Library Tools
on the Web
Medical Subject Headings 1996
Web Dewey 2000
Classification Web 2001 (LCSH & LCC)

6
Importance of controlled
vocabulary as metadata
American Library Association
Subject Analysis Committee (SAC)
Subcommittee on Metadata and
Subject Analysis recommendations

http://www.ala.org/alcts/organization/ccs/metarept2.html

7
Controlled Vocabularies
Why We Need Them
• Used “behind” search engines
• Standard in online databases
• New adherents (e.g., Web content managers using taxonomies)
• They work!

8
Sherry Vellucci, Associate
Professor, St. John’s Univ., during
the Conference on Bibliographic
Control for the New Millennium:
“authority control is not only
wonderful, but critical. Controlled
vocabulary mediating tools
should cover Subjects, Genres,
Gazetteers, Names and Titles,
etc.”
9
Metathesauri/Subject Correlations
• Unified Medical Language System (UMLS) maps over 60 medical and health care thesauri into one Metathesaurus
  http://www.nlm.nih.gov/pubs/factsheets/umlsmeta.html
• Classification Web
  Library of Congress Subject Headings and LC Classification correlations
  http://classweb.loc.gov

10
Mapping:
Standard information exchange systems
• Dublin Core to MARC
  http://lcweb.loc.gov/marc/dccross.html
• MARC to Dublin Core
  http://www.loc.gov/marc/marc2dc.html
• XML-MARC Crosswalk
  http://lcweb.loc.gov/marc/marcsgml.html (files must be downloaded)
• MARC to XML to MARC Converter
  http://www.logos.com/marc/default.asp
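
The Dublin Core to MARC direction of such a crosswalk can be pictured as a lookup table. The following is only a minimal sketch: the few tag and subfield assignments are recalled from the Library of Congress unqualified Dublin Core crosswalk and should be checked against the page linked above, and the names `DC_TO_MARC` and `dc_to_marc` and the tuple output format are assumptions made for illustration.

```python
# Illustrative subset of an unqualified Dublin Core -> MARC 21 crosswalk.
# Assumption: verify each tag/subfield against the LC crosswalk before use.
DC_TO_MARC = {
    "title": ("245", "a"),
    "creator": ("720", "a"),
    "subject": ("653", "a"),
    "description": ("520", "a"),
    "publisher": ("260", "b"),
    "date": ("260", "c"),
}


def dc_to_marc(record):
    """Turn {dc_element: [values]} into a sorted list of (tag, subfield, value)."""
    fields = []
    for element, values in record.items():
        target = DC_TO_MARC.get(element.lower())
        if target is None:
            continue  # elements outside this small table are skipped
        tag, subfield = target
        for value in values:
            fields.append((tag, subfield, value))
    return sorted(fields)


example = {
    "Title": ["Knowledge Organization"],
    "Creator": ["Herd, Jan"],
    "Subject": ["Taxonomies", "Metadata"],
    "Rights": ["not mapped in this sketch"],
}

for tag, subfield, value in dc_to_marc(example):
    print(f"{tag} ${subfield} {value}")
```

A production crosswalk also has to handle indicators, qualified elements, and repeatable fields, which this table deliberately ignores.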

22
Mapping:
Specialized information exchange
systems
Standard Industrial Classification (SIC codes)
to
North American Industry Classification System (NAICS codes)

23
SIC Code Example
• Major group 73 = Business services
• 737 = Computer programming, data processing, and other computer related services
• 7372 = Prepackaged software
• Equivalent NAICS codes are:
  • Sector 51 = Information
  • 511 = Publishing industries
  • 5112 = Software publishers (with cross-reference to Sector 42 for reselling packaged software)
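
Because both schemes are hierarchical by digit prefix, the example above can be sketched as two small tables plus a concordance. This is a hedged illustration only: it contains just the codes named on the slide, the single pair in `SIC_TO_NAICS` stands in for the real concordance (which is many-to-many), and the function names are hypothetical.

```python
# Only the codes named in the example; real tables are far larger.
SIC = {
    "73": "Business services",
    "737": "Computer programming, data processing, and other computer related services",
    "7372": "Prepackaged software",
}
NAICS = {
    "51": "Information",
    "511": "Publishing industries",
    "5112": "Software publishers",
}
# Assumption: a one-to-one pair for illustration; the real mapping is many-to-many.
SIC_TO_NAICS = {"7372": "5112"}


def ancestry(code, scheme):
    """Return (code, title) pairs from the broadest prefix down to the code itself."""
    return [
        (code[:n], scheme[code[:n]])
        for n in range(2, len(code) + 1)
        if code[:n] in scheme
    ]


def translate(sic_code):
    """Map a SIC code to the NAICS hierarchy it falls under, or None if unmapped."""
    naics_code = SIC_TO_NAICS.get(sic_code)
    if naics_code is None:
        return None
    return ancestry(naics_code, NAICS)


for code, title in ancestry("7372", SIC):
    print("SIC", code, title)
for code, title in translate("7372"):
    print("NAICS", code, title)
```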

25
Using old and new tools for
knowledge organization on the
Web

Water into Wine


26
What is a Taxonomy?

A high-level information search device constructed to provide a means of understanding, navigating, and gaining access to intellectual capital.

27
History of Taxonomies
Aristotle, 384 - 322 B.C.
Kallimachos, 305 - 240 B.C., Library of Alexandria
Carl Linnaeus, 1707-1778

28
“Classification” is
used much more
frequently than
“Taxonomy”, in all
fields of study.

29
Numerous formal
taxonomies are
maintained by
government and
commercial
enterprises
30
Taxonomies are used in:
• Customized search engines
• Interfaces in web portals

31
Service Codes
CODE TITLE
A Research and Development
B Special Studies and Analysis ‑ Not R&D
C Architect and Engineering Services ‑ Construction
D Information Technology Services, including Telecommunication Services
E Purchase of Structures and Facilities
F Natural Resources and Conservation Services
G Social Services
H Quality Control, Testing and Inspection Services
J Maintenance, Repair, and Rebuilding of Equipment
K Modification of Equipment
L Technical Representative Services
M Operation of Government‑Owned Facilities
N Installation of Equipment
P Salvage Services
Q Medical Services
R Professional, Administrative and Management Support Services
S Utilities and Housekeeping Services
T Photographic, Mapping, Printing, and Publication Services
U Education and Training Services
V Transportation, Travel and Relocation Services
W Lease or Rental of Equipment
X Lease or Rental of Facilities
Y Construction of Structures and Facilities
Z Maintenance, Repair or Alteration of Real Property
34
How do we define taxonomies in a wired world?
• Taxonomy: a classification of elements within a domain
• Domain: a sphere of knowledge, influence, or activity
• Classification: the operation of grouping elements and establishing relationships between them (or the product of that operation)
• Relationship: a defined linkage between two elements
• Element: an object or concept

Crandall, Mike. “Taxonomies for the Real World: The Business Imperative to Simply Content Access.” TFPL Taxonomies for Business Conference, London, Oct. 23, 2000.

37
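
These definitions map fairly directly onto a small data model. The sketch below is one possible reading, not Crandall's own formulation: the class names mirror the defined terms, while the `"broader"` relationship kind (borrowed from thesaurus practice) and the method names are assumptions introduced for illustration.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Element:
    """An object or concept."""
    name: str


@dataclass(frozen=True)
class Relationship:
    """A defined linkage between two elements."""
    source: Element
    kind: str
    target: Element


@dataclass
class Taxonomy:
    """A classification of elements within a domain."""
    domain: str
    elements: set = field(default_factory=set)
    relationships: list = field(default_factory=list)

    def relate(self, source, kind, target):
        """Register both elements and the linkage between them."""
        src, tgt = Element(source), Element(target)
        self.elements.update({src, tgt})
        self.relationships.append(Relationship(src, kind, tgt))

    def related(self, name, kind):
        """Names directly linked from `name` by a relationship of `kind`."""
        return [
            r.target.name
            for r in self.relationships
            if r.source.name == name and r.kind == kind
        ]

    def ancestors(self, name, kind="broader"):
        """Follow `kind` links transitively, nearest first, visiting each name once."""
        found, queue = [], [name]
        while queue:
            for parent in self.related(queue.pop(0), kind):
                if parent not in found:
                    found.append(parent)
                    queue.append(parent)
        return found


industry = Taxonomy(domain="industry")
industry.relate("Prepackaged software", "broader", "Computer services")
industry.relate("Computer services", "broader", "Business services")
print(industry.ancestors("Prepackaged software"))
```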
What are Taxonomies Good For?
Taxonomies are applied to:
• Items (aka resources): individual pieces of information (documents, people...)
By the use of:
• Metadata (aka properties, attributes): information describing types of data
Which may or may not use values from a:
• Vocabulary: selection of terms, classified or sorted
To create:
• Content: an item and its associated metadata

Crandall, Mike. “Taxonomies for the Real World: The Business Imperative to Simply Content Access.” TFPL Taxonomies for Business Conference, London, Oct. 23, 2000.

38
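
The chain of item, metadata, vocabulary, and content can be sketched in a few lines. This is a minimal illustration under stated assumptions: the `VOCABULARY` terms, the property names, and the function `make_content` are all hypothetical. Properties that have no vocabulary are accepted as free text, reflecting the "may or may not" above.

```python
# Hypothetical controlled vocabulary: only "subject" is controlled here.
VOCABULARY = {"subject": {"Taxonomies", "Metadata", "Classification"}}


def make_content(item, metadata, vocabulary=VOCABULARY):
    """Pair an item with its metadata, checking controlled properties only."""
    for prop, value in metadata.items():
        allowed = vocabulary.get(prop)
        if allowed is not None and value not in allowed:
            raise ValueError(f"{value!r} is not a controlled term for {prop!r}")
    return {"item": item, "metadata": dict(metadata)}


content = make_content(
    "slides.ppt",
    {"subject": "Taxonomies", "author": "Jan Herd"},  # "author" is free text
)
print(content)
```

Rejecting uncontrolled values at tagging time is one design choice; a looser system might accept them and flag them for review instead.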
Challenges
• Information management across divisions of your agency
• Agency global intranets/Internet portals
• Global or national document management, including technical documentation
• Incorporating taxonomy technology into agency technology and information policies
• Cost of building a taxonomy
• Moving a taxonomy from overhead to being a core part of your agency’s information management

39
More Challenges
• Certification of the taxonomy by an authoritative body
• Finding common ground across multiple taxonomies or schemas with similar terms and different meanings
• Ensuring the ongoing integrity of the taxonomy with constant maintenance
• Acceptance by developers of tagging tools
• Integrating with a legacy system and external content

40
The core expertise required for constructing a taxonomy is:
• Systems Analyst who understands specifications for creating taxonomies
• Domain/Subject expert in the subject of the taxonomy
• Computational linguist, AI engineer
• Linguist and/or Lexicographer
• Database/Application Development Expert
• Administrative Support
• Review Support

41
Example of a custom taxonomy marked up in XBRL:

<?xml version="1.0" encoding="utf-8"?>
<schema xmlns:xbrl="http://www.xbrl.org/core/2000-07-31"
        targetNamespace="http://www.xbrl.org/us/gaap/ci/2000-07-31">
  <import namespace="http://www.xbrl.org/core/2000-07-31/"
          schemaLocation="http://www.xbrl.org/core/2000-07-31/xbrl-meta-2000-07-31.xsd"/>
  <element name="propertyPlantAndEquipmentGrossNote.purchasedSoftwareForInternalUse"
           type="monetary">
    <annotation>
      <documentation>this is software that...</documentation>
      <appinfo>
        <xbrl:rollup
            to="ci:propertyPlantAndEquipmentNetNote.propertyPlantAndEquipmentGrossNote"
            weight="1" order="7.5"/>
        <xbrl:label xml:lang="en">Purchased software for internal use</xbrl:label>
        <xbrl:reference name="GPSI" number="73" chapter="11" paragraph="b"
                        subparagraph="i"/>
      </appinfo>
    </annotation>
  </element>
</schema>
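
To show how software might read such markup, here is a minimal sketch using Python's standard `xml.etree.ElementTree`. It is not an XBRL processor: `SAMPLE` is a trimmed copy of the markup above (XML declaration, import, documentation, and reference omitted), and the function name `summarize` and its output keys are assumptions chosen for illustration.

```python
import xml.etree.ElementTree as ET

XBRL_NS = "http://www.xbrl.org/core/2000-07-31"

# Trimmed copy of the slide's markup, kept well-formed for parsing.
SAMPLE = """
<schema xmlns:xbrl="http://www.xbrl.org/core/2000-07-31"
        targetNamespace="http://www.xbrl.org/us/gaap/ci/2000-07-31">
  <element name="propertyPlantAndEquipmentGrossNote.purchasedSoftwareForInternalUse"
           type="monetary">
    <annotation>
      <appinfo>
        <xbrl:rollup
            to="ci:propertyPlantAndEquipmentNetNote.propertyPlantAndEquipmentGrossNote"
            weight="1" order="7.5"/>
        <xbrl:label xml:lang="en">Purchased software for internal use</xbrl:label>
      </appinfo>
    </annotation>
  </element>
</schema>
"""


def summarize(xml_text):
    """List each taxonomy element with its label and the parent it rolls up to."""
    root = ET.fromstring(xml_text.strip())
    ns = {"xbrl": XBRL_NS}
    rows = []
    for el in root.iter("element"):
        label = el.find(".//xbrl:label", ns)
        rollup = el.find(".//xbrl:rollup", ns)
        rows.append({
            "name": el.get("name"),
            "type": el.get("type"),
            "label": label.text if label is not None else None,
            "parent": rollup.get("to") if rollup is not None else None,
            "weight": float(rollup.get("weight")) if rollup is not None else None,
        })
    return rows


for row in summarize(SAMPLE):
    print(row["label"], "->", row["parent"])
```

The `rollup` element is what makes the markup a taxonomy rather than a flat list: each element names the parent it aggregates into.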

42
Recommendations:
• Actively seek out existing taxonomies in the target discipline or subject area. If your needs are met in part by an existing taxonomy, use it and build on it.
• Look at the intended purpose of the taxonomy and select appropriate software tools.
• Consider scalability of the taxonomy. Look at the big picture and see how the taxonomy will be able to hook into others.
• Consider utilizing numerical taxonomy as a schema in the metadata in order to merge documents in foreign languages.
• Accommodate new standards whenever possible.
• Document “Best Practices” while creating the taxonomy and review them regularly.
• Maintain and update the taxonomy continually.

44
[Diagram: relationships among the following components]
Meta Model (describes how taxonomies are created)
Existing Taxonomy in your Field
Your Agency Taxonomy
Related Taxonomy of other agency in same field
Related Taxonomy of other agency hooked to one above
Core Schema (describes how document is to be created)
Electronic Document in XML
45
Efficient Web information
retrieval systems
in the form of search engines
or Web portals
require continued support and
improvement of:
46
• Web-based classification and numerical taxonomic tools to use in
• Web-based cataloging tools such as CORC, which provides metadata based on
• Taxonomies such as controlled vocabularies/thesauri, which will be hooked together using
• Metathesauri and standard information exchange systems such as MARC-XML

47
And this is the house that
Jack built…

With a wine cellar...

48
