
Taxonomy Strategies


Taxonomy & metadata strategies for effective content management

Melbourne, Sydney, Canberra Masterclass

6-15 June 2007

Copyright 2007 Taxonomy Strategies LLC. All rights reserved.

Today's agenda
9:00-9:10    10 min  Introduction
9:10-9:15    5 min   Warm-up exercise
9:15-9:45    30 min  Taxonomy fundamentals: Building taxonomies
9:45-10:00   15 min  Taxonomy exercise
10:00-10:30  30 min  Taxonomy fundamentals: Taxonomy business case
10:30-11:00  30 min  Tea Break
11:00-12:00  60 min  Taxonomy governance
12:00-12:30  30 min  Capabilities self-assessment
12:30-13:30  60 min  Lunch
13:30-14:30  60 min  Taxonomy benchmarking
14:30-14:45  15 min  Benchmarking exercise
14:45-15:15  30 min  Tea Break
15:15-16:15  60 min  Content tagging
16:15-16:30  15 min  Tagging exercise
16:30-17:00  30 min  Q&A

Taxonomy Strategies LLC The business of organized

Who I am: Joseph Busch

• Over 25 years in the business of organized information.
  - Founder, Taxonomy Strategies LLC
  - Director, Solutions Architecture, Interwoven
  - VP, Infoware, Metacode Technologies (acquired by Interwoven, November 2000)
  - Program Manager, Getty Foundation
  - Manager, Pricewaterhouse
• Metadata and taxonomies community leadership.
  - President, American Society for Information Science & Technology
  - Director, Dublin Core Metadata Initiative
  - Adviser, National Research Council Computer Science and Telecommunications Board
  - Reviewer, National Science Foundation Division of Information and Intelligent Systems
  - Founder, Networked Knowledge Organization Systems/Services

What we do

Organize Stuff


For us, taxonomy work includes:

• Metadata specification defines the properties needed to describe content so that it can be found & used.
• Vocabularies are collections of terms that are used to specify some of the metadata properties.
  - Some vocabularies are big and hierarchical, some are small and flat.
• An application profile specifies what metadata & vocabularies are required, and then represents them formally.
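One way to picture how the three pieces fit together is the minimal Python sketch below. The class names (`Vocabulary`, `Property`, `ApplicationProfile`), the `validate` method, and the sample terms are all hypothetical, chosen only for illustration; they are not a formal application profile syntax.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class Vocabulary:
    """A controlled list of allowed terms (hypothetical structure)."""
    name: str
    terms: Set[str]


@dataclass
class Property:
    """One metadata property; vocabulary=None means free text."""
    name: str
    required: bool = False
    vocabulary: Optional[Vocabulary] = None


@dataclass
class ApplicationProfile:
    """States which properties and vocabularies a record must use."""
    properties: List[Property] = field(default_factory=list)

    def validate(self, record: Dict[str, str]) -> List[str]:
        """Return a list of problems; an empty list means the record conforms."""
        problems = []
        for prop in self.properties:
            value = record.get(prop.name)
            if value is None:
                if prop.required:
                    problems.append(f"missing required property: {prop.name}")
                continue
            if prop.vocabulary is not None and value not in prop.vocabulary.terms:
                problems.append(f"{prop.name}: {value!r} is not in {prop.vocabulary.name}")
        return problems


# Illustrative data only (assumed, not from a real profile).
audience = Vocabulary("Audience", {"General Public", "Students"})
profile = ApplicationProfile([
    Property("title", required=True),
    Property("audience", vocabulary=audience),
])
print(profile.validate({"title": "Sun safety tips", "audience": "Students"}))
print(profile.validate({"audience": "Astronauts"}))
```

The first record conforms; the second is reported for a missing required property and for a term outside the vocabulary.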

Recent & current projects: Government Commercial



Who are you? What sectors do you work in?

Your Role
• Administrator
• Records Manager
• Content Manager
• Communications
• Editor
• Information Architect
• Usability Expert
• Librarian
• Knowledge Engineer
• Ontologist
• Chief Information Officer

Industrial Sector
• Agriculture & Processing
  - Food, Lumber, Pulp & Paper
• Financial Services
  - Banking & Insurance
• Government
  - Public administration
  - Public safety
• High Tech
  - Computers, Software & Telecommunications
• Heavy Manufacturing
  - Steel, Automobiles & Aircraft
• Manufacturing
  - Consumer Products
• Medical & Health Care
• Mining & Refining
  - Petrochemicals, Oil & Gas
• Pharmaceuticals

Why are you here?

• What are the key questions that you want answered in today's workshop?
• Please rank the questions from the most important (5) to the least important (1).
• Please provide your job title, organization and department; your name is optional.

Priority (1-5) | Questions

Your title or role:
Your org or industry:
Your dept:
Your name: (optional)


The Taxonomy problem: How to pick from > 5,000 faucets? By:
• Category
• Price
• Brand
• Color/Finish
• # Handles
• Series Name
• Water Filter?
• Faucet Spray
• Handle Shape
• Soap Dispenser?

The main issue: What goes here?

• When do the things in the list change?
• How do we maintain the list?
• What rules do we follow?


Seven phases of taxonomy development

Timeline: weeks 1-12

Phase                     Activity
1 Identify Objectives     Conduct interviews
2 Inventory Resources     Identify, gather & review resources
3 Specify Metadata        Define fields & purpose
4 Model Content           Define content chunks & XML DTDs
5 Specify Vocabularies    Compile controlled vocabularies
6 Specify Procedures      Develop workflow, rules & procedures
7 Test & Train            Manually tag small sample


Taxonomy design phases need to be iterated

Stages: Plan & Prototype, Alpha Dev & Test, Beta D&T, Final D&T

1 Identify Objectives
  Plan & Prototype: Interview core team and stakeholders
  Alpha Dev & Test: Review tagged samples, default procedures
  Beta D&T: Interview alpha users
  Final D&T: Interview beta users

2 Inventory Resources
  Plan & Prototype: Identify, gather & review resources
  Alpha Dev & Test: Gather additional resources, if any
  Beta D&T: Gather additional sources, if any

3 Specify Metadata
  Plan & Prototype: Define fields & purpose
  Alpha Dev & Test: Revise if needed, bake into alpha CMS
  Beta D&T: Modify CMS for beta
  Final D&T: Modify for 1.0

4 Model Content
  Plan & Prototype: Define content chunks & XML DTDs
  Alpha Dev & Test: Revise if needed, bake into alpha CMS
  Beta D&T: Modify CMS for beta
  Final D&T: Modify for 1.0

5 Specify Vocabularies
  Plan & Prototype: Compile controlled vocabularies
  Alpha Dev & Test: Revise, use in alpha CMS
  Beta D&T: Revise, use in beta CMS
  Final D&T: Revise using team procedure

6 Specify Procedures
  Plan & Prototype: Develop workflow rules & procedures
  Alpha Dev & Test: alpha workflows in CMS
  Beta D&T: Modify & extend workflows
  Final D&T: Finalize procedure materials

7 Test & Train
  Plan & Prototype: Manually tag small sample
  Alpha Dev & Test: Use alpha CMS to tag larger sample
  Beta D&T: Use beta CMS to tag larger sample
  Final D&T: Finalize training materials & train staff


Licensing an existing taxonomy

See Factiva's taxonomy
• There are usually license fees, but these will be less than the effort to develop an equivalent taxonomy.
• But pre-existing taxonomies rarely fit an organization's needs and may require extensive customization.

• Adopt a faceted approach.
• Reuse existing (especially internal) vocabularies for as many of the facets as possible.
• Plan on doing full-custom Content Type and Topic taxonomies.


Free sources for 8 common taxonomies

Taxonomy | Definition | Potential Sources
Organization | Organizational structure. | SP 800-87, U.S. Government Manual, Your organizational structure, etc.
Content Type | Structured list of the various types of content being managed or used. | Dublin Core Type Vocabulary, AGLS Document Type, Your records management policy, etc.
[Industry] | Broad market categories such as lines of business, life events, or industry codes. | SIC, NAICS, Your market segments, etc.
[Location] | Place of operations or constituencies. | FIPS 5-2, FIPS 55-3, ISO 3166, UN Statistics Div, US Postal Service, Your sales regions, etc.
Business Activity | Business activities or functions performed to accomplish mission and goals. | Federal Enterprise Architecture Business Reference Model, Enterprise ontology, Your business functions, etc.
Topic | Business topics relevant to your mission & goals. | Federal Register Thesaurus, NAL Agricultural Thesaurus, Your research areas, etc.
Audience | Subset of constituents to whom a piece of content is directed or is intended to be used by. | GEM, ERIC Thesaurus, IEEE LOM, Your psycho-graphics or personas, etc.
Products & Services | Names of products/programs and services. | ERP system, Your products and services, etc.


Typical product catalog: A-Z, then idiosyncratic categories



How to analyze existing product catalog categories: Principles and priorities

Preparing a product catalog for facet browsing (aka Guided Navigation) requires a category hierarchy and additional attributes.

Principles
1. Categories and subcategories that could be swapped are candidates for conversion to attributes.
2. Repeated lists of subcategories signal a possible need for an attribute.
3. The number of attributes should not exceed six or seven, so not all attribute candidates should be used.
   Avoid selecting strongly correlated attributes, such as Weight and Shipping

Priorities
1. Choose Categories that apply to many products over those with few products.
2. Choose Attributes that apply to many Categories over those that apply only to very few categories.
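The second principle (repeated lists of subcategories) lends itself to a quick mechanical check. The sketch below is one possible way to do it, assuming the catalog is available as a mapping from category to subcategory labels; the function name `attribute_candidates`, the `min_repeats` threshold, and the sample catalog are invented for illustration.

```python
from collections import Counter
from typing import Dict, List


def attribute_candidates(catalog: Dict[str, List[str]], min_repeats: int = 2) -> List[str]:
    """Return subcategory labels that recur under at least `min_repeats` categories."""
    # Count each label once per category, so duplicates inside one category do not inflate it.
    counts = Counter(label for subs in catalog.values() for label in set(subs))
    return sorted(label for label, n in counts.items() if n >= min_repeats)


# Hypothetical catalog: finishes repeat under several categories.
catalog = {
    "Kitchen Faucets": ["Chrome", "Stainless", "Pull-Out"],
    "Bath Faucets": ["Chrome", "Stainless", "Widespread"],
    "Shower Heads": ["Chrome", "Rain"],
}
print(attribute_candidates(catalog))  # ['Chrome', 'Stainless']
```

Here the repeated finish labels surface as candidates for a single Color/Finish attribute; a person still has to decide which candidates to keep under the six-or-seven limit.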


Product categories example: Wireless carrier

Top-level categories: Accessories, Content, Phones, Services

• Accessories
  - Batteries
  - Cases
  - Chargers
  - Data
  - Hands-Free
  - Headsets
  - Miscellaneous
• Content
  - Purchased
  - Subscription
• Phones
  - Versatile Phones
  - Smart Devices
  - Basic Phones
  - Prepaid Phones
  - International Only Phones
  - Mobile Broadband Cards
• Services
  - Conferencing
  - Internet / Data
  - Landline Phone
  - Network & Roaming
  - Relay Services
  - Solutions
  - Wireless Data


Product attributes example: Digital cameras in an electronics catalog


• Types of attributes
  - Generic attributes
    Brand/Product Family/Model
    Price Range
    Usually Ships
  - Merchandising attributes
    Usage (E-mail, Internet Browsing, Programming, ...)
    Segment (Home, Business, Education, Government, ...)
    Region & Country
    Most Popular
    New
    Related Products
  - Specialized attributes
    Capacity (Battery; Memory; MB; GB; BPS, ...)
    Resolution (DPI; Megapixels; XGA, XGA, UXGA, ...)
    Size (Display; Screen; ...)
    Standard (a, b, g, n, ...; scsi, ata, sata, eide, ...; dimm, simm, ...)
    Type (Camera; Battery; Display; Printer; Server; Storage; Switch; ...)

Example facet values with product counts:
  Megapixels: 3 Megapixels (4), 4 Megapixels (5), 5 Megapixels (27), 6-8 Megapixels (21)
  Brand: Canon (15), Fuji (10), Kodak (17), Nikon (8), Olympus (9)
  Type: Point & Shoot (25), Digital SLR (10), Packages (5)
  Price Range: $100-250 (5), $250-500 (16), $500-1000 (19), More than $1000 (3)
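The counts shown next to each facet value in a catalog like this can be derived directly from tagged product records. The following is a minimal sketch of that idea, assuming each product is a flat dictionary of attribute values; `facet_counts`, `refine`, and the three sample records are hypothetical and are not taken from any particular search product.

```python
from collections import Counter, defaultdict
from typing import Dict, List


def facet_counts(products: List[Dict[str, str]], facets: List[str]) -> Dict[str, Dict[str, int]]:
    """Count how many products carry each value of each requested facet."""
    counts = defaultdict(Counter)
    for product in products:
        for facet in facets:
            if facet in product:
                counts[facet][product[facet]] += 1
    return {facet: dict(values) for facet, values in counts.items()}


def refine(products: List[Dict[str, str]], **selected: str) -> List[Dict[str, str]]:
    """Keep only products matching every selected facet value."""
    return [p for p in products if all(p.get(k) == v for k, v in selected.items())]


# Invented sample records for illustration.
products = [
    {"brand": "Canon", "type": "Point & Shoot", "megapixels": "5 Megapixels"},
    {"brand": "Canon", "type": "Digital SLR", "megapixels": "6-8 Megapixels"},
    {"brand": "Nikon", "type": "Digital SLR", "megapixels": "6-8 Megapixels"},
]
print(facet_counts(products, ["brand", "type"]))
print(facet_counts(refine(products, brand="Canon"), ["type"]))
```

Recomputing the counts after each refinement is what lets a faceted interface show only the values that still lead to results.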


Faceted taxonomy theory & practice

• How many terms are needed to provide sufficient granularity? Not as many as you think!
• Post-coordinate indexing allows several simple controlled vocabularies to be combined, rather than using a single large pre-coordinated vocabulary.


The power of faceted taxonomy

• 4 independent categories of 10 nodes each have the same discriminatory power as one hierarchy of 10,000 nodes (10^4).
  - Easier to maintain
  - Easier to tag by content authors
  - Can be easier to navigate
• It's more effective to increase the number of facets than to increase the number of terms per facet.

Example: four facets of 10 terms each
  Facet 1: Advocacy; Contractors & Grantees; Environmental Professionals; Federal Facilities; General Public; Industry; Kids; Researchers & Scientists; Small Business; Students
  Facet 2: Advisory; Exposure; Food Safety; Health Assessment; Health Effect; Health Risk; Occupational Health; Pesticide Effects; Sun Protection; Toxicity
  Facet 3: Agriculture & Cattle; Automobile Repair; Chemical; Dry Cleaning; Electronics & Computer; Energy; Extractive Industries; Food Processing; Leather Tanning & Finishing; Metal Finishing
  Facet 4: Allergen; Biological Contaminant; Carcinogen; Chemical; Explosive; Liquid Waste; Microorganism; Ozone; Pesticide; Radioactive Waste
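The arithmetic behind this claim is easy to check. The short sketch below compares the number of distinct tag combinations with the number of terms someone has to maintain; the second configuration (two facets of 100 terms) is an assumed comparison case added for illustration, not a figure from the slides.

```python
from math import prod


def discriminatory_power(facet_sizes):
    """Number of distinct combinations when one term is chosen from each facet."""
    return prod(facet_sizes)


def terms_to_maintain(facet_sizes):
    """Total number of terms across all facets."""
    return sum(facet_sizes)


four_small_facets = [10, 10, 10, 10]
two_large_facets = [100, 100]  # assumed alternative with the same power

print(discriminatory_power(four_small_facets), terms_to_maintain(four_small_facets))  # 10000 40
print(discriminatory_power(two_large_facets), terms_to_maintain(two_large_facets))    # 10000 200
```

Both configurations distinguish 10,000 combinations, but the four-facet design needs 40 terms rather than 200, and a single pre-coordinated hierarchy would need all 10,000 nodes.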


Automatically created taxonomies

• Documents can be clustered based on similarities and differences.
• Problems:
  - Typically only a single hierarchy
  - No overall plan
  - Results hard for people to navigate

What does North mean on this map?


Automatic taxonomy construction software

• Software can scan large quantities of content and extract statistically significant words and phrases.
• Example:
  - Archive of 10 publications analyzed for topics related to copyright.
• Software does a poor job of
  - De-duplication.
  - Turning significant words and phrases into a larger structure.
  - Discriminating between gold and garbage.
• Software is good for
  - Getting an understanding of the key noun phrases in a large collection.
  - Providing test cases for evaluating a taxonomy.

Source: Sample data courtesy of nStein.

Most popular flickr tags on 20 Feb 2007

Sort flickr categories into 5 or fewer groups. Then label each group.

Taxonomy exercise Facet grouping

• Universal taxonomy facets
  - By location (spatially)
  - By time (chronologically)
  - By type (genre)
  - By physical properties (size, color, shape, etc.)
  - By subject (topic)
Richard Saul Wurman. Information Architects (1996)


Taxonomy exercise Facet grouping

Sort flickr categories into 5 or fewer groups. Then label each group.

Business case and motivations for taxonomies

• How are we going to use content, metadata, and taxonomies in applications to obtain business benefits?


What technology analysts have said: Add metadata to search on!

• Adding metadata to unstructured content allows it to be managed like structured content. Applications that use structured content work better.
• Enriching content with structured metadata is critical for supporting search and personalized content delivery.
• Content that has been adequately tagged with metadata can be leveraged in usage tracking, personalization and improved searching.
• Better structure equals better access: Taxonomy serves as a framework for organizing the ever-growing and changing information within a company. The many dimensions of taxonomy can greatly facilitate Web site design, content management, and search engineering. If well done, taxonomy will allow for structured Web content, leading to improved information access.

Fundamentals of taxonomy ROI

• Tagging content using a taxonomy is a cost, not a benefit.
• There is no benefit without exposing the tagged content to users in some way that cuts costs or improves revenues.
• Putting taxonomy into operation requires UI changes and/or backend system changes, as well as data changes.
• You need to determine those changes, and their costs, as part of the ROI.


Product utilization: Taxonomy compared to search

• Conversion rate increases.
  - Double digit increase.
  - More than a 10% increase.
  - Otto Group (Kaleidoscope, Freemans, Grattan, and lookagain catalogs): 130% increase.
• Lift in average order size.


Product catalog: Taxonomy compared to search

Benefit: Increased conversion rate & revenue lift

Web sales net income                  |      | $80,000,000
Increased conversion rate             | 30%  | $24,000,000
Order size lift                       | 10%  | $8,000,000
Potential revenue increase per year   |      | $32,000,000
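The revenue figures can be reproduced with a few lines of arithmetic. In the sketch below the function name `revenue_lift` and its parameter names are illustrative assumptions; percentages are passed as whole numbers so the results stay exact integers.

```python
def revenue_lift(web_sales: int, conversion_increase_pct: int, order_size_lift_pct: int):
    """Return (conversion gain, order-size gain, total) in dollars per year."""
    conversion_gain = web_sales * conversion_increase_pct // 100
    order_size_gain = web_sales * order_size_lift_pct // 100
    return conversion_gain, order_size_gain, conversion_gain + order_size_gain


conversion_gain, order_size_gain, total = revenue_lift(80_000_000, 30, 10)
print(f"${conversion_gain:,} + ${order_size_gain:,} = ${total:,}")
# $24,000,000 + $8,000,000 = $32,000,000
```

Swapping in your own sales figure and more conservative percentages gives a quick sensitivity check on the business case.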


Usability research: Taxonomy compared to search

• We found that users preferred a browsing oriented interface for a browsing task, and a direct search interface when they knew precisely what they wanted.
  Marti Hearst (and others)
• The category interface is superior to the list interface in both subjective and objective measures.
  Hao Chen & Susan Dumais


Usability research: Taxonomy compared to search

[Chart: Median search time in seconds (axis 0 to 140), Category interface vs. List interface, for targets in the top 20 results and not in the top 20 results. Callouts: Category is 36% faster; Category is 48% faster.]

Source: Chen & Dumais

Time saved: Taxonomy compared to search

1 hour per day searching x 36% faster = 22 minutes each day
22 minutes x 250 working days per year = 5,500 minutes or 92 hours per year
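The same estimate can be written as a small calculation. This is only a sketch: the variable names are invented, and the rounding to whole minutes and whole hours follows the figures quoted above.

```python
search_minutes_per_day = 60   # 1 hour per day spent searching
speedup_pct = 36              # category interface is 36% faster
working_days_per_year = 250

minutes_saved_per_day = round(search_minutes_per_day * speedup_pct / 100)   # 21.6 rounds to 22
minutes_saved_per_year = minutes_saved_per_day * working_days_per_year      # 5500
hours_saved_per_year = round(minutes_saved_per_year / 60)                   # 91.67 rounds to 92

print(minutes_saved_per_day, minutes_saved_per_year, hours_saved_per_year)
```

Multiplying the hours saved by a loaded hourly rate and by headcount turns the per-person figure into an organization-wide estimate.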


Time saved: Taxonomy compared to search


Increase service efficiency

Number of call center calls per month                                     | 50,000
Average cost per call                                                     | $20
Call response costs per month                                             | $1,000,000
Total call response costs per year                                        | $12,000,000
Percentage of self-serviced calls due to improved information browsing   | 30%
Service costs savings per year                                            | $3,600,000
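The call center table follows one chain of multiplications, sketched below. The function name `service_savings` and its parameters are assumptions made for this example; the self-service rate is passed as a whole-number percentage to keep the arithmetic exact.

```python
def service_savings(calls_per_month: int, cost_per_call: int, self_service_pct: int):
    """Return (monthly cost, yearly cost, yearly savings) in dollars."""
    monthly_cost = calls_per_month * cost_per_call
    yearly_cost = monthly_cost * 12
    yearly_savings = yearly_cost * self_service_pct // 100
    return monthly_cost, yearly_cost, yearly_savings


monthly_cost, yearly_cost, yearly_savings = service_savings(50_000, 20, 30)
print(f"${monthly_cost:,} per month, ${yearly_cost:,} per year, ${yearly_savings:,} saved")
```

The 30% self-service rate is the input to question first, since the savings scale linearly with it.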


Trusted advisers: Taxonomy avoids costs

• The amount of time wasted in futile searching for vital information is enormous, leading to staggering costs
  Sue Feldman
• Sun's usability experts calculated that 21,000 employees were wasting an average of six minutes per day due to inconsistent intranet navigation structures. When lost time was multiplied by staff salaries, the estimated productivity loss exceeded $10M per year, about $500 per employee per year.
  Jakob Nielsen


Knowledge workers spend up to 2.5 hours each day looking for information




But find what they are looking for only 40% of the time.
Source: Kit Sims Taylor

Knowledge workers spend more time re-creating existing content than creating new content



Recreating existing content 25%

Creating new content 8%

Source: Kit Sims Taylor (cited by Sue Feldman in her original article)

Cost saved by not recreating content

Benefit: Increase in productivity

Number of employees                                        | 100
Average employee salary                                    | $80,000
Employee costs per year                                    | $8,000,000
Increase in productivity from not re-creating content      | 25%
Employee cost savings per year                             | $2,000,000


Business case summary

1. Classifications and classification-like schemes are being used to facilitate information seeking in the workplace, and on the web.
2. Users take advantage of (and prefer) this type of scheme (faceted navigation) when it is made available in the user interface.
3. Hierarchical or facet navigation can be guided by the User Interface.
4. Facet navigation is best combined with keyword searching. E.g., keyword search followed by faceted navigation of results.


Taxonomy requires a business process

• Taxonomies must change, gradually, over time if they are to remain relevant.
• Maintenance processes need to be specified so that the changes are based on rational cost/benefit decisions.


Taxonomy governance can be viewed as a standards process

• Taxonomy must evolve, but in a predictable way.
• Team structure, with an appeals process
  - Taxonomy stewardship is a part-time role at most organizations.
  - Team needs to make decisions based on costs and benefits.
• Documentation and educational materials.
• Comment-handling responsibilities (part of error-correction process)
• Issue Logs.
• Release Schedule.


Taxonomy governance: Change process overview

[Diagram: Taxonomy governance environment]

1: External controlled vocabularies (CVs) change on their own schedule.
2: Taxonomy Team decides when to update snapshots of external CVs.
3: Team adds value to snapshots through definitions, synonyms, classification rules, training materials, etc.
4: Updated versions of CVs are published to consumers.

CV Sources: External Standard Vocabularies; Internally Created CVs; Subject Codes; Competencies; Other Internal Sources
Taxonomy Tool: Working copies of CVs, maintained by the Taxonomy Team
CV Consumers: Site Search Tool; Portal; Tagging Tool; Search UI
Other diagram labels: Taxonomy Facets; Project Archives; Working Papers

CV = Controlled Vocabulary
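Step 2 of the change process (deciding when to update a snapshot of an external CV) is easier when the team can see what has changed since the last snapshot. A minimal sketch, assuming each vocabulary version is available as a set of term labels; `diff_snapshot` and the sample terms are hypothetical.

```python
from typing import Dict, List, Set


def diff_snapshot(snapshot: Set[str], external: Set[str]) -> Dict[str, List[str]]:
    """Compare the team's working snapshot with the current external vocabulary."""
    return {
        "added": sorted(external - snapshot),     # terms the source has introduced
        "removed": sorted(snapshot - external),   # terms the source has dropped
    }


# Invented example: the external source renamed one term and added another.
snapshot = {"Allergen", "Carcinogen", "Ozone"}
external = {"Allergen", "Carcinogen", "Ground-Level Ozone", "Microorganism"}

changes = diff_snapshot(snapshot, external)
print(changes)
```

A set difference cannot tell a rename from a removal plus an addition, so the output is a worklist for the Taxonomy Team rather than an automatic update.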

Who should build the taxonomy?

• The taxonomy (and metadata specification) should be produced by a cross-functional team which includes business, technical, information management, and content creation stakeholders.
• The team should plan on maintaining the taxonomy as well as building it.
  - Maintenance will not (usually) be anyone's full-time job.
  - Exact mix of people on team will change.
• It should be built in an iterative fashion, with more content and broader review for each iteration.


Taxonomy governance: Generic team charter

• Taxonomy Team is responsible for maintaining:
  - The Taxonomy, a multi-faceted classification scheme.
  - Associated taxonomy materials, such as:
    Editorial Style Guides.
    Taxonomy Training Materials.
    Metadata Standard.
  - Team rules and procedures for change management.
• Taxonomy Team will consider costs and benefits of suggested changes.
• Taxonomy Team will:
  - Manage relationship between providers of source vocabularies and consumers of the Taxonomy.
  - Identify new opportunities for use of the Taxonomy across the enterprise to improve information management practices.
  - Promote awareness and use of the Taxonomy.


Taxonomy governance team: Generic roles

Business Lead
  - Keeps committee on track with larger business objectives.
  - Balances cost/benefit issues to decide appropriate levels of effort.
  - Obtains needed resources if those on committee can't accomplish a particular task.

Technical Specialist
  - Estimates costs of proposed changes in terms of amount of data to be retagged, additional storage and processing burden, software changes, etc.
  - Helps obtain data from various systems.

Taxonomy Specialist
  - Suggests potential taxonomy changes based on analysis of query logs, indexer feedback.
  - Makes edits to taxonomy, installs into system with aid of IT specialist.

Content Specialist
  - Committee's liaison to content creators.
  - Estimates costs of proposed changes in terms of editorial process changes, additional or reduced workload, etc.

Content Owners
  - Reality check on process change suggestions.


Where taxonomy changes come from

[Diagram: sources of taxonomy change requests]

End User (outside the firewall): Application UI and Application Logic; query log analysis
Tagging Staff: Tagging UI and Content Tagging Logic; staff notes missing concepts
Requests from other parts of the organization

Taxonomy Editor: Recommendations by Editor
1. Small taxonomy changes (labels, synonyms)
2. Large taxonomy changes (retagging, application changes)
3. New "best bets" content.

Taxonomy Team: Team Considerations
1. Business goals.
2. Changes in user experience.
3. Retagging cost.
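Query log analysis, one of the change sources above, can start very simply: count what users search for and list frequent queries that match no taxonomy label. The sketch below assumes the log is a plain list of query strings and the taxonomy a list of labels; `missing_concepts`, the `min_count` threshold, and the sample data are made up for illustration.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def missing_concepts(queries: Iterable[str], taxonomy_labels: Iterable[str],
                     min_count: int = 2) -> List[Tuple[str, int]]:
    """Return frequent queries (lower-cased) that match no taxonomy label exactly."""
    known = {label.lower() for label in taxonomy_labels}
    counts = Counter(q.strip().lower() for q in queries if q.strip())
    return [(q, n) for q, n in counts.most_common() if n >= min_count and q not in known]


# Invented log and labels.
query_log = ["Sun Protection", "radon", "Radon ", "radon", "toxicity", "mold", "mold"]
labels = ["Sun Protection", "Toxicity"]

print(missing_concepts(query_log, labels))  # [('radon', 3), ('mold', 2)]
```

Exact matching is deliberately naive; in practice the Taxonomy Editor would also check synonyms before recommending a new term.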

Taxonomy maintenance processes

• Different organizations will need to consider their own change processes.
  - Organization 1: A custodian is responsible for the content, but checks facts with department heads before making changes.
  - Organization 2: Analysts suggest changes, editors approve, copyeditors verify consistency.
  - Organization 3: Marketing reps ask for a change, taxonomy editor makes demo, web representative approves it.
• Change process MUST also consider cost of implementing the change
  - Retagging data.
  - Reconfiguring auto-classifier.
  - Retraining staff.
  - Changes in user expectations.


Taxonomy maintenance workflow

[Diagram: workflow in the Taxonomy Tool]
1. Suggest new name/category
2. Review new name
3. Copy edit new name
4. Add to enterprise Taxonomy

Other diagram label: Sys Admin


Sample taxonomy editor: Data Harmony

Hierarchy Browser

Standard Term Info



Taxonomy editing tools vendors

• Most popular taxonomy editor is MS Excel.
• An immature area: no vendors are in the upper-right quadrant!
• High functionality / high cost products ($100K+)
• MultiTes is widely used, cheap with

[Chart: vendor quadrant. Axes: Ability to Execute, Completeness of Vision. Quadrant labels shown: Niche Players, Visionaries.]


Taxonomy maturity model

• Taxonomy governance processes must fit the organization.
• As consultants, we notice different levels of maturity in the business processes around content management, taxonomy, and metadata.
• Honestly assess your organization's metadata maturity in order to design appropriate governance processes.
• We are starting to define a maturity model, similar to the Software Capability Maturity Model (CMM)
  - Initial: Ad hoc, each project begins from scratch.
  - Repeatable: Procedures defined and used, but not standardized across organization or are misapplied to projects.
  - Defined: Standard processes are tailored for project needs. Strategic training for long-range goals is in place.
  - Managed: Projects managed using quantitative quality measures. Process itself is measured and controlled.
  - Optimizing: Continual process improvement. Extremely accurate project estimation.


Purpose of maturity model

• Estimating the maturity of an organization's information management processes tells us:
  - How involved the taxonomy development and maintenance process should be
    Overly sophisticated processes will fail.
  - What to recommend as first steps.
• Maturity is not a goal, it is a characterization of an organization's methods for achieving particular goals.
• Mature processes have expenses which must be justified by consequent cost savings or revenue gains.
• IT Maturity may not be core to your business.


Taxonomy maturity scorecard

Maturity levels (columns): Initial | Repeatable | Defined | Managed | Optimizing

Rows scored against those levels:
Organizational Structure
  - Executive Sponsorship
  - Budgeting
  - Hiring & Training
Quality Assurance
  - Manual Processes
  - Automated Processes
Project Management
  - Estimating & Scheduling
  - Cost Control
  - Project Methodology
Design and Execution
  - Planning
  - Design Excellence
  - Development Maturity

Notes:
1 X is starting to examine search query logs, which is an important first step in improving search. But this is only an isolated example.
2 IT has a project methodology they are trying to use across all projects. But not all business units have project methodologies.


Taxonomy governance self-assessment

• Rate your organization's overall taxonomy maturity (scale: Basic ... 4 5 6 Mature).
• What type of change was most recently made to your organization's taxonomy management environment?
  Standards / Tools / People / Data Quality
• What is the area for your organization's taxonomy management environment improvement?
  Standards / Tools / Functionality / Data Quality
• Does the search engine index more than 4 repositories around the organization?
• Are system features and metadata fields added based on cost/benefit analysis, or because they are easy to do with the current applications and tools?
  Cost/Benefit / Easy
• Are applications and tools acquired after requirements have been analyzed, or are major purchases sometimes made to use up year-end money?
  Requirements / Year-End
• Are there hiring and training practices for metadata and taxonomy positions?
  Yes / No
  If there is training, describe it briefly.
• Is there a process in place to examine search query logs?
• Is there an organization-wide metadata standard, such as the Dublin Core, for use by search tools?
• Are there established qualitative and quantitative measures of metadata quality?
  If there are measures, describe them briefly.
• Is there an ongoing data cleansing procedure to look for any redundant, obsolete or trivial content (ROT)?
  If there is a process, describe it briefly.
• Can the CEO explain the return on investment (ROI) for content management, search and metadata?


2005 Maturity survey: Search practices

n=87
Columns: Not current practice | Being developed | In practice | Former practice | NA or Unknown

Search Box in standard place on all web pages.
  20% (12) | 11% (7) | 62% (38) | 2% (1) | 5% (3)
Search engine indexes multiple repositories in addition to web sites.
  25% (15) | 21% (13) | 44% (27) | 2% (1) | 8% (5)
Spell Checking.
  31% (19) | 18% (11) | 38% (23) | 0% (0) | 13% (8)
Synonym Searching.
  41% (25) | 23% (14) | 30% (18) | 0% (0) | 7% (4)
Search results grouped by date, location, or other factors in addition to simple relevance score.
  37% (22) | 20% (12) | 37% (22) | 0% (0) | 7% (4)
Queries are logged and the logs are regularly examined.
  31% (19) | 25% (15) | 31% (19) | 5% (3) | 8% (5)
Common queries identified, 'best' pages for those queries are found, and search engine configured to return them at the top. (Best Bets)
  46% (28) | 25% (15) | 21% (13) | 0% (0) | 8% (5)
Advanced computation of relevance based on data in addition to the text of the document.
  43% (26) | 16% (10) | 25% (15) | 0% (0) | 16% (10)
A faceted search tool, such as Endeca, has been implemented for the organization's external site or product catalog search.
  68% (41) | 7% (4) | 10% (6) | 0% (0) | 15% (9)
A faceted search tool, such as Endeca, has been implemented for the organization's internal website(s) or portal.
  57% (34) | 15% (9) | 17% (10) | 0% (0) | 12% (7)


2005 Maturity survey: Metadata practices

n=87
Columns: Not current practice | Being developed | In practice | Former practice | NA or Unknown

Metadata standards are developed for the needs of each system with no overall attempt to unify them.
  22% (13) | 12% (7) | 37% (22) | 20% (12) | 10% (6)
An Organization-wide metadata standard exists and new systems consider it during development.
  37% (22) | 37% (22) | 20% (12) | 0% (0) | 7% (4)
The Organization-wide metadata standard is based on the Dublin Core.
  52% (30) | 16% (9) | 21% (12) | 0% (0) | 12% (7)
Multiple repositories comply with metadata standard.
  52% (31) | 20% (12) | 17% (10) | 0% (0) | 12% (7)
A Cataloging Policy document exists to teach people how to tag data in compliance with organizational metadata standard.
  48% (29) | 20% (12) | 20% (12) | 0% (0) | 12% (7)
The Cataloging Policy document is revised periodically.
  48% (29) | 15% (9) | 17% (10) | 0% (0) | 20% (12)
A centralized metadata repository exists to aggregate and unify metadata from disparate sources.
  57% (34) | 17% (10) | 17% (10) | 0% (0) | 10% (6)
Metadata is manually entered into web forms.
  15% (9) | 12% (7) | 61% (36) | 3% (2) | 8% (5)
Metadata is generated automatically by software.
  38% (23) | 18% (11) | 27% (16) | 2% (1) | 15% (9)
Metadata is generated automatically, then reviewed manually for correction.
  48% (29) | 18% (11) | 17% (10) | 2% (1) | 15% (9)


2005 Maturity survey: Taxonomy practices

n=87

Practice | Not current practice | Being developed | In practice | Former practice | NA or Unknown
Org Chart Taxonomy: one based primarily on the structure of the organization. | 36% (21) | 10% (6) | 34% (20) | 5% (3) | 15% (9)
Products Taxonomy: one based primarily on the products and/or services offered by the organization. | 37% (22) | 10% (6) | 32% (19) | 5% (3) | 15% (9)
Content Types Taxonomy: one based primarily on the different types of documents. | 28% (16) | 21% (12) | 40% (23) | 5% (3) | 7% (4)
Topical Taxonomy: one based primarily on topics of interest to the site users. | 20% (12) | 36% (21) | 34% (20) | 3% (2) | 7% (4)
Faceted Taxonomy: one which uses several of the approaches above. | 32% (19) | 29% (17) | 34% (20) | 0% (0) | 5% (3)
The Taxonomy, or a portion of it, was licensed from an outside taxonomy vendor. | 75% (44) | 3% (2) | 14% (8) | 0% (0) | 8% (5)
The Taxonomy follows a written 'style guide' to ensure its consistency over time. | 47% (28) | 22% (13) | 20% (12) | 0% (0) | 10% (6)
The Taxonomy is maintained using a taxonomy editing tool other than MS Excel. | 35% (21) | 17% (10) | 40% (24) | 2% (1) | 7% (4)
The Taxonomy was validated on a representative sample of content during its development. | 28% (17) | 22% (13) | 33% (20) | 3% (2) | 13% (8)
A Roadmap for the future evolution of the Taxonomy has been developed. | 38% (23) | 40% (24) | 13% (8) | 0% (0) | 8% (5)


Taxonomy testing methods

Walk-through | Show & explain | Taxonomist, SME, Team | Rough taxonomy | Approach; Appropriateness to task
Walk-through | Check conformance to editorial rules | Taxonomist | Draft taxonomy; Editorial rules | Consistent look and feel
Usability Testing | Contextual analysis (card sorting, scenario testing, etc.) | Users | Rough taxonomy; Tasks & answers | Tasks are completed successfully; Time to complete task is reduced
User Satisfaction | Survey | Users | Rough taxonomy; UI mockup; Search prototype | Reaction to taxonomy; Reaction to new interface; Reaction to search results
Tagging Samples | Tag sample content with taxonomy | Taxonomist, Team, Indexers | Sample content; Rough taxonomy (or better) | Content fit; Fills out content inventory; Training materials for people & algorithms

Walk-through method Show & explain

Content Type: Award; Case Study; Contract & Warranty; Demo; Magazine; News & Event; Product Information; Services; Solution; Specification; Technical Note; Tool; Training; White Paper; Other Content Types

Product Family: Desktops; MP3 Players; Monitors; Networking; Notebooks; Printers; Projectors; Servers; Services; Storage; Televisions; Other Brands

Line of Business: All; Home & Home Office; Gaming; Government, Education & Healthcare; Medium & Large Business; Small Business

All; Asia-Pacific; Canada; EMEA; Japan; Latin America & Caribbean; United States

Business & Finance; Interpersonal Development; IT Professionals Technical Training; IT Professionals Training & Certification; PC Productivity; Personal Computing Proficiency

Banking & Finance; Communications; E-Business; Education; Government; Healthcare; Hospitality; Manufacturing; Petro-chemicals; Retail / Wholesale; Technology; Transportation; Other Industries

Assessment, Design & Implementation; Deployment; Enterprise Support; Client Support; Managed Lifecycle; Asset Recovery & Recycling; Training

All; Business; Employee; Education; Gaming Enthusiast; Home; Investor; Job Seeker; Media; Partner; Shopper (First Time, Experienced, Advanced); Supplier


Walk-through method Editorial rules consistency check

• Abbreviations
• Ampersands
• Capitalization
• General, More, Other
• Languages & character sets
• Length limits
• Multiple parents
• Plural vs. singular form
• Scope notes
• Serial comma
• Sources of terms
• Spaces
• Synonyms & acronyms
• Term order (Alphabetic or )
• Term label order (Direct vs. inverted)

Rule Name | Editorial Rule
Abbreviations | Abbreviations, other than colloquial terms and acronyms, shall not be used in term labels. Example: Public Information. NOT: Public Info.
Ampersands | The ampersand [&] character shall be used instead of the word "and". Example: Licensing & Compliance. NOT: Licensing and Compliance.
Capitalization | Title case capitalization shall be used. Example: Customer Service. NOT: CUSTOMER SERVICE. NOT: Customer service. NOT: customer service.
General, More, Other | The term labels General, More, and Other shall be used for categories which contain content items that are not further classifiable. Examples: Other Property; Other Services; General Information; General Audience.


Task-based testing*

* Based on Donna Maurer's usability work with the Australian government

• 15 representative questions were selected
  - Perspective of various organizational units
  - Most frequent website searches
  - Most frequently accessed website content
  - Correct answers to the questions were agreed in advance by team.
• 15 users were tested
  - Did not work for the organization
  - Represented target audiences
• Testers were asked where they would look
  - Under which facet: Topic, Commodity, or Geography?
  - Then, under which category?
  - Then, under which sub-category?
  - Tester choices were recorded
• Testers were asked to think aloud
  - Notes were taken on what they said
• Pre- and post-test questions were asked
  - Tester answers were recorded

Task-based testing Representative questions

1. How much cotton is imported from China?
2. What are the impacts of "mad cow" disease on U.S. meat production, sales?
3. What is the average farm income level in your state?
4. How much of our diet comes from fast food?
5. How many people receive WIC benefits (Special Supplemental Nutrition Program for Women, Infants, and Children)?
6. How much acreage is planted to genetically engineered corn?
7. What is the cost of foodborne illness in the United States?
8. What part of food costs go to farmers, retailers?
9. Which States produce the most tobacco?
10. What percentage of farms in the United States are small farms?
11. What are the costs and benefits associated with providing more traceability in the U.S. food supply?
12. How many people in America don't get enough to eat?
13. What is behind the trade balance (surplus or deficit) in agricultural goods?
14. What is the extent of conservation compliance? How does that impact farmers' decisions?
15. What are the impacts of foreign trade restrictions on U.S. farmers, U.S. food prices?

Task-based testing Closed card sorting

3. What is the average farm income level in your state?

1. Topics
2. Commodities
3. Geographic Coverage

1. Topics
1.1 Agricultural Economy
1.2 Agriculture-Related Policy
1.3 Diet, Health & Safety
1.4 Farm Financial Conditions
1.5 Farm Practices & Management
1.6 Food & Agricultural Industries
1.7 Food & Nutrition Assistance
1.8 Natural Resources & Environment
1.9 Rural Economy
1.10 Trade & International Markets

1.4 Farm Financial Conditions
1.4.1 Costs of Production
1.4.2 Commodity Outlook
1.4.3 Farm Financial Management & Performance
1.4.4 Farm Income
1.4.5 Farm Household Financial Well-being
1.4.6 Lenders & Financial Markets
1.4.7 Taxes


Task based testing Card sort analysis

Find-it Tasks
1. Cotton 2. Mad cow 3. Farm income 4. Fast food 5. WIC 6. GE corn 7. Foodborne illness 8. Food costs 9. Tobacco 10. Small farms 11. Traceability 12. Hunger 13. Trade balance 14. Conservation 15. Trade restrictions

User 1
Cotton Cattle Farm Income

User 2
Cotton Food Safety Farm Income Asia Cattle

User 3

User 4
Cotton Cattle Farm Income

User 5
Cotton Cattle Farm Income Diet Quality & Nutrition WIC Program Corn

US States

Food Consumption Diet Quality & Nutrition WIC Program Corn WIC Program Corn

Food Expenditures Diet Quality & Nutrition WIC Program Corn WIC Program Corn

Foodborne Disease Foodborne Disease Consumer Food Safety Food Prices Tobacco Farm Structure Food System Food Security Market Structure Tobacco Farm Structure Labeling Policy Food Security Market Analysis Tobacco Farm Structure Food Safety Innovations Food Security

Foodborne Disease Foodborne Disease Food Expenditures Retailing & Wholesaling Tobacco Farm Structure Tobacco Farm Structure

Food Safety Policy Food Prices Food Security Food Security Commodity Trade

Commodity Trade Trade & Intl Markets

Commodity Trade Market Analysis

Cropping Practices Conservation PolicyConservation PolicyConservation PolicyConservation Policy Trade Policy Food Safety & Trade WTO Market Analysis Commodity Trade


Task based testing Card sort results

• In 80% of the trials users looked for information under the categories that we expected them to look for it.
• Breaking up topics into facets makes it easier to find information, especially information related to commodities.


Task based testing Card sort results

Test Questions | % Correct | % Agree
1. Cotton | 91% | 82%
2. Mad cow | 73% | 64%
3. Farm income | 100% | 55%
4. Fast food | 91% | 73%
5. WIC | 100% | 100%
6. GE corn | 100% | 100%
7. Foodborne illness | 82% | 82%
8. Food costs | 55% | 27%
9. Tobacco | 100% | 100%
10. Small farms | 91% | 91%
11. Traceability | 36% | 18%
12. Hunger | 100% | 73%
13. Trade balance | 36% | 64%
14. Conservation | 91% | 91%
15. Trade restrictions | 55% | 36%

Legend: Possible change required. Change required.

Policy of Traceability needs to be clarified. Use quasi-synonyms.

On these trials, only 50% looked in the right category, & only 27-36% agreed on the category.

Possible error in categorization of this question because 64% thought the answer should be Commodity Trade.
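
The two percentages can be computed directly from the recorded tester choices. The sketch below is only an illustration: the function name, the sample answers, and the definition of "% Agree" (share of testers choosing the single most popular category) are assumptions, not the method documented on the slides.

```python
from collections import Counter

def score_task(choices, accepted):
    """Return (pct_correct, pct_agree) for one find-it task.

    choices  -- the category each tester picked (one string per tester)
    accepted -- set of categories the team agreed in advance were correct
    """
    n = len(choices)
    pct_correct = 100 * sum(c in accepted for c in choices) / n
    # Assumed definition of agreement: share of testers who picked the
    # single most popular category, whether or not it was an accepted one.
    top_votes = Counter(choices).most_common(1)[0][1]
    pct_agree = 100 * top_votes / n
    return round(pct_correct), round(pct_agree)

# Hypothetical answers from 11 testers for one task (illustrative only).
choices = ["Commodity Trade"] * 7 + ["Trade & Intl Markets"] * 4
print(score_task(choices, accepted={"Trade & Intl Markets"}))  # (36, 64)
```

A task where agreement is high but correctness is low suggests the expected category, not the testers, may be wrong.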



Task-based testing User satisfaction survey

• Was it easy, medium or difficult to choose the appropriate Topic?
  Easy Medium Difficult

• Was it easy, medium or difficult to choose the appropriate Commodity?
  Easy Medium Difficult

• Was it easy, medium or difficult to choose the appropriate Geographic Coverage?
  Easy Medium Difficult


User satisfaction survey Results

[Chart: average difficulty rating by facet (Topic, Commodity, Geography). Vertical axis runs from 0.50 to 2.00, labeled "Difficult" and "More Difficult" toward the top.]




User interface survey Which search UI is better?

• Criteria
  - User satisfaction
  - Success completing tasks
  - Confidence in results
  - Fewer dead ends
• Methodology
  - Design tasks from specific to general
  - Time performance
  - Calculate success rates
  - Survey subjective criteria
  - Pay attention to survey hygiene: participant selection, counterbalancing, t-scores

Source: Yee, Swearingen, Li, & Hearst

User interface survey Results (1)

Which interface would you rather use for these tasks? | Google-like Baseline | Faceted Category
Find images of roses | 15 | 16
Find all works from a certain period | 2 | 30
Find pictures by 2 artists in the same media | 1 | 29

Overall assessment | Google-like Baseline | Faceted Category
More useful for your usual tasks | 4 | 28
Easiest to use | 8 | 23
Most flexible | 6 | 24
More likely to result in dead-ends | 28 | 3
Helped you learn more | 1 | 31
Overall preference | 2 | 29

Source: Yee, Swearingen, Li, & Hearst

User interface survey Results (2)

[Chart: mean ratings on a 0 to 9 scale for the Google-like Baseline and Faceted Category interfaces. Rating labels as extracted: Use, Easy, Simple, Flexible, Tedious, Interesting, Easy to Browse, Enjoyable, Overwhelming.]

Source: Yee, Swearingen, Li, & Hearst


Tagging samples How many items?

Goal | Number of Items | Criteria
Illustrate metadata schema | 1-3 | Random (excluding junk)
Develop training documentation | 10-20 | Show typical & unusual cases
Qualitative test of small vocabulary (<100 categories) | 25-50 | Random (excluding junk)
Quantitative test of vocabularies* | 3-10X number of categories | Use computer-assisted methods when more than 10-20 categories. Pre-existing metadata is the most meaningful.

* Quantitative methods require large amounts of tagged content. This requires specialists, or software, to do tagging. Results may be very different from how real users would categorize content.

Tagging samples Manually tagged metadata sample

Attribute | Value
Title | Jupiter's Ring System
URL |
Description | Overview of the Jupiter ring system. Many images, animations and references are included for both the scientist and the public.
Content Types | Web Sites; Animations; Images; Reference Sources
Audiences | Educators; Students
Organizations | Ames Research Center
Missions & Projects | Voyager; Galileo; Cassini; Hubble Space Telescope
Locations | Jupiter
Business Functions | Scientific and Technical Information
Disciplines | Planetary and Lunar Science
Time Period | 1979-1999

Tagging samples Spreadsheet for tagging 10s-100s of items

1) Clickable URLs for sample content
2) Review small sample and describe
3) Drop-down for tagging (including Other entry for the unexpected)
4) Flag questions

Rough bulk tagging Facet demo (1)

• Collections: 4 content sources
  - NTRS, SIRTF, Webb, Lessons Learned
• Taxonomy
  - Converted MultiTes format into RDF for Seamark
• Metadata
  - Converted from existing metadata on web pages, or
  - Created using simple automatic classifier (string matching with terms & synonyms)
  - 250k items, ~12 metadata fields, 1.5 weeks effort
• OOTB Seamark user interface, plus logo
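
A simple classifier of the kind mentioned above (string matching with terms & synonyms) might look like the following sketch. The vocabulary entries and function names are hypothetical and chosen for illustration; this is not the classifier used in the demo.

```python
import re

# Hypothetical mini-vocabulary: preferred term -> list of synonyms.
VOCABULARY = {
    "Hubble Space Telescope": ["HST", "Hubble"],
    "Jupiter": ["Jovian"],
    "Lessons Learned": ["lesson learned"],
}

def build_patterns(vocabulary):
    """Compile one case-insensitive, whole-word regex per preferred term."""
    patterns = {}
    for term, synonyms in vocabulary.items():
        # Longest labels first so longer phrases win over their substrings.
        labels = sorted([term, *synonyms], key=len, reverse=True)
        alternation = "|".join(re.escape(label) for label in labels)
        patterns[term] = re.compile(rf"\b(?:{alternation})\b", re.IGNORECASE)
    return patterns

def classify(text, patterns):
    """Return the sorted preferred terms whose label or synonym occurs in text."""
    return sorted(term for term, rx in patterns.items() if rx.search(text))

PATTERNS = build_patterns(VOCABULARY)
print(classify("New HST images of the Jovian ring system", PATTERNS))
```

Tags produced this way are rough: they record that a term occurs, not that the item is about it, which is why the slide calls this bulk tagging "rough".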



Rough bulk tagging Facet demo (2)



Document distribution How evenly does it divide the content?

• Documents do not distribute uniformly across categories
• Zipf (1/x) distribution is expected behavior
• 80/20 rule in action (actually 70/20 rule)

[Chart: Measured v Expected Distribution of Top 10 Content Types in Library of Congress Database. Y-axis: Number of Records (0 to 350,000). X-axis: Top 10 Content Types. Annotations: "Leading candidate for splitting"; "Leading candidates for merging".]


Document distribution How evenly does it divide the content?

• Methodology: 115 randomly selected URLs from corporate intranet search index were manually categorized. Inaccessible files and junk were removed.
• Results: Slightly more uniform than Zipf distribution. Above the curve is better than expected.

[Chart: Measured v Expected Intranet Content Type Distribution. Y-axis: # Documents (0 to 25). X-axis: Content Type (News & Events; People, Groups & Places; Operations & Internal Communications; Regulations, Policies, Procedures & Templates; Papers & Presentations; Other & Unclassified; Marketing & Sales; Programs, Proposals, Plans & Schedules; Manuals & Learning Materials).]
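
The "expected" curve in a measured v expected comparison can be approximated by spreading the total number of documents over the ranked categories in proportion to 1/rank. The sketch below assumes that simple form of the Zipf curve; the category counts are hypothetical and are not the figures behind the chart.

```python
def zipf_expected(total, n_categories):
    """Expected count per rank if documents follow a 1/rank (Zipf) curve."""
    harmonic = sum(1 / r for r in range(1, n_categories + 1))
    return [total / (r * harmonic) for r in range(1, n_categories + 1)]

def compare_to_zipf(counts):
    """Return (category, measured, expected) tuples, largest category first."""
    ranked = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)
    expected = zipf_expected(sum(counts.values()), len(ranked))
    return [(name, n, round(e, 1)) for (name, n), e in zip(ranked, expected)]

# Hypothetical manually categorized sample (illustrative counts only).
counts = {
    "News & Events": 22,
    "Papers & Presentations": 12,
    "Marketing & Sales": 9,
    "Other & Unclassified": 7,
}
for name, measured, expected in compare_to_zipf(counts):
    position = "above" if measured > expected else "at or below"
    print(f"{name}: measured {measured}, expected {expected} ({position} the curve)")
```

A top category far above the curve is a candidate for splitting; tail categories far below it are candidates for merging.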



Document distribution How does taxonomy shape match that of content?

• Hierarchical taxonomies allow comparison of fit between content and taxonomy areas
• 25,380 resources tagged with taxonomy of 179 terms. (Avg. of 2 terms per resource)
• Counts of terms and documents summed within taxonomy hierarchy
• Roughly Zipf distributed (top 20 terms: 79%; top 30 terms: 87%)
• Mismatches between term% and document% flagged

Term Group | % Terms | % Docs
Administrators | 7.8 | 15.8
Community Groups | 2.8 | 1.8
Counselors | 3.4 | 1.4
Federal Funds Recipients and Applicants | 9.5 | 34.4
Librarians | 2.8 | 1.1
News Media | 0.6 | 3.1
Other | 7.3 | 2.0
Parents and Families | 2.8 | 6.0
Policymakers | 4.5 | 11.5
Researchers | 2.2 | 3.6
School Support Staff | 2.2 | 0.2
Student Financial Aid Providers | 1.7 | 0.7
Students | 27.4 | 7.0
Teachers | 25.1 | 11.4

Source: Courtesy Keith Stubbs, U.S. Dept. of Ed.
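
Flagging mismatches between term% and document% can be sketched as a ratio test. The rows below use figures from this slide; the 3x threshold and the wording of the flags are assumptions for illustration, since the slide does not state how mismatches were defined.

```python
# (term group, % of taxonomy terms, % of tagged documents), as reported on the slide.
ROWS = [
    ("Administrators", 7.8, 15.8),
    ("Federal Funds Recipients and Applicants", 9.5, 34.4),
    ("Researchers", 2.2, 3.6),
    ("Students", 27.4, 7.0),
    ("Teachers", 25.1, 11.4),
]

def flag_mismatches(rows, ratio=3.0):
    """Flag groups whose share of terms and share of documents differ by `ratio` or more."""
    flagged = []
    for group, pct_terms, pct_docs in rows:
        if pct_docs >= ratio * pct_terms:
            flagged.append((group, "many documents, few terms: consider adding terms"))
        elif pct_terms >= ratio * pct_docs:
            flagged.append((group, "many terms, few documents: consider merging terms"))
    return flagged

FLAGGED = flag_mismatches(ROWS)
for group, note in FLAGGED:
    print(f"{group}: {note}")
```

Lowering the assumed ratio flags more groups; the right threshold depends on how much editorial effort is available for review.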


Usability testing How intuitive (repeatable) are the categorizations (1)?

• Methodology: Closed Card Sort
  - For alpha test of a grocery site
  - 15 testers put each of 71 best-selling product types into one of 10 pre-defined categories
  - Categories where fewer than 14 of 15 testers put product into same category were flagged
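
The flagging step can be sketched as a count of the most popular category per product. The product names and tester answers below are hypothetical, and the function name is an illustration rather than a tool used in the test.

```python
from collections import Counter

def flag_products(sorts, min_agree=14):
    """Return products whose most popular category got fewer than min_agree votes.

    sorts -- dict mapping product -> list of categories chosen, one per tester
    """
    flagged = {}
    for product, categories in sorts.items():
        category, votes = Counter(categories).most_common(1)[0]
        if votes < min_agree:
            flagged[product] = (category, votes, len(categories))
    return flagged

# Hypothetical closed card sort results from 15 testers.
sorts = {
    "Milk": ["Dairy"] * 15,
    "Eggs": ["Dairy"] * 11 + ["Meat & Poultry"] * 4,
}
print(flag_products(sorts))
```

Flagged products are candidates for placement in more than one category (poly-hierarchy) or for clearer category labels.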



Usability testing How intuitive (repeatable) are the categorizations (2)?



Usability testing How intuitive (repeatable) are the categorizations?

% of Testers | Cumulative % of Products | With Poly-Hierarchy
15/15 | 54% | 69%
14/15 | 70% | 83%
13/15 | 77% | 93%
12/15 | 83% | 100%
11/15 | 85% | 100%
<11/15 | 100% | 100%

The #1 underused source of quantitative information on how to improve your taxonomy?

Query Logs & Click Trails



Query log & click trail examination Who are the users & what are they looking for?
• Only 30-40% of organizations regularly examine their logs*.
• Sophisticated software available, but don't wait.
• 80% of value comes from basic reports


Query log & click trail examination Query log

UltraSeek Reporting
• Top queries
• Queries with no results
• Queries with no click-through
• Most requested documents
• Query trend analysis
• Complete server usage
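
The first three reports need nothing more than a log of queries with result and click counts. The sketch below assumes a hypothetical log format of (query, number of results, number of clicks) tuples; it does not reproduce UltraSeek's own report format or API.

```python
from collections import Counter

# Hypothetical log rows: (query text, number of results, number of clicks).
LOG = [
    ("farm income", 120, 1),
    ("Farm Income", 120, 0),
    ("wic", 35, 2),
    ("tracability", 0, 0),
    ("mad cow", 48, 0),
]

def normalize(query):
    """Fold case and surrounding whitespace so variants count as one query."""
    return query.strip().lower()

def top_queries(log, n=10):
    """Most frequent queries as (query, count) pairs."""
    return Counter(normalize(q) for q, _, _ in log).most_common(n)

def queries_with_no_results(log):
    """Queries that returned nothing: candidates for synonyms or new content."""
    return sorted({normalize(q) for q, results, _ in log if results == 0})

def queries_with_no_clickthrough(log):
    """Queries that returned results but never led to a click."""
    clicks = Counter()
    for q, results, n_clicks in log:
        if results > 0:
            clicks[normalize(q)] += n_clicks
    return sorted(q for q, total in clicks.items() if total == 0)

print(top_queries(LOG, 3))
print(queries_with_no_results(LOG))
print(queries_with_no_clickthrough(LOG))
```

Misspellings and unexpected wording in the no-results report are a direct source of synonyms for the taxonomy.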




Query log & click trail examination Click trail packages

• iWebTrack
• NetTracker
• OptimalIQ
• SiteCatalyst
• Visitorville
• WebTrends


Summary Start a Measure & Improve mindset

• Taxonomy changes do not stand alone
  - Search system improvements
  - Navigation improvements
  - Content improvements
  - Process improvements


Benchmarking exercise
• What are 5 representative questions that your users ask or tasks that your users do when using your application?
• Is it currently easy, medium or difficult to answer these questions or accomplish these tasks?

Questions or Tasks | Rating (Easy/Medium/Difficult)


Conclusion What is a good taxonomy?

• Incremental, extensible process that identifies and enables owners, and engages stakeholders.
• Quick implementation that provides measurable results as quickly as possible.
• A means to an end, and not the end in itself.
• Not perfect, but it does the job it is supposed to do, such as improving search and navigation.
• Improved over time, and maintained.



Tagging Overview
• Tagging is better than the words that happen to occur in a piece of content.
• All tagging is useful
  - End user tagging
  - Tagging by librarians
  - Automated tagging by OS and algorithms
• Content should be tagged throughout its lifecycle, each time the content is handled and used so that it accrues value or its significance is diminished.


MS Office: File Properties

How many people fill this

How many people click on this?


What is social tagging?

• End user tagging
• Easy, intuitive tagging interfaces
• Almost instantaneous feedback
  - Enables people to tag & re-tag content in response to seeing their tags in context with other tags.
• Emergent categories
  - Resembles open card sort process in which patterns emerge rather than validating categories using closed card sorts.


Social tagging innovators

• flickr founders
  - Caterina Fake
  - Stewart Butterfield
• del.icio.us founder
  - Joshua Schachter
• del.icio.us & flickr are now both part of Yahoo!
• As of April 2006 flickr had 130 million photos posted by 3 million registered users.


Four tagging rules for end users

Rule | Description
Use specific terms | Apply the most specific terms when tagging content. But do not tag every possible topic, just the ones that are most important or best characterize the content as a whole.
Use multiple terms | Use as many terms as necessary to describe overall what the content is about & why it is important. Do not over-tag.
Use appropriate terms | Only fill in the facets & values that make sense. Not all facets apply to all content.
Consider how content will be used | Anticipate how the content will be searched for in the future, & how to make it easy to find. Remember that search engines can only operate on explicit information.

• Content Tagging
• Tagging Interface


Requirements for a tagging interface

• Automated form fill-in (automatically fills in known data)
• Tagging precedents (see tags already assigned by others)
• Controlled vocabularies, e.g., with pull-down list
• Multi-valued tags
• Geo-tagging
• Group tagging
• Clean-up tag tools, e.g., alpha list
• Batch editing
• Share/Don't share (Public/Private)
• Identified owner (who can be emailed)
• Almost immediate feedback, e.g., tag cloud


Form fill-in: Automatically filled-in known data



Form fill-in: Automatically filled-in known data

Manual form fill-in w/ check boxes, pull-down lists, etc.

Auto keyword & summarization



Form fill-in: Automatically filled-in known data

Auto-categorization Rules & pattern matching

Parse & lookup (recognize names)



Tagging precedents: See tags assigned by others



Multi-valued group tagging



Group geo-tagging



Group geo-tagging



Clean up tag tools: Alpha list



Batch edit



Share or don't share tagging


Bulk tagging
• ID collection of related content items by pattern or context
• Then, apply same attributes to all content items
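
As a rough sketch of the two steps above, the example below selects items by a path pattern and applies the same attributes to each. The item structure, paths, and attribute values are hypothetical; a real content management system would expose its own API for this.

```python
from fnmatch import fnmatchcase

def bulk_tag(items, pattern, attributes):
    """Apply the same attributes to every item whose path matches pattern.

    Returns the number of items tagged. Existing metadata on an item is kept
    unless an attribute with the same name is supplied.
    """
    tagged = 0
    for item in items:
        if fnmatchcase(item["path"], pattern):
            item.setdefault("metadata", {}).update(attributes)
            tagged += 1
    return tagged

# Hypothetical content items identified by path.
items = [
    {"path": "/press/2007/release-01.html"},
    {"path": "/press/2007/release-02.html"},
    {"path": "/hr/policies/leave.doc"},
]
count = bulk_tag(items, "/press/2007/*",
                 {"Content Type": "Press Release", "Audience": "Media"})
print(count, items[0]["metadata"])
```

Tagging a folder works the same way in principle: the folder path is the pattern, and items dropped into it inherit the folder's attributes.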



Tag a folder
• Drag & drop content items into folder
• Then, content items inherit properties of folder


• Approve & improve mindset

Create Content Review & Improve

Add Metadata Review & Improve


Interactive rewards
• Almost instantaneous exposure of tags in simple user interfaces on the web provides positive reinforcement for user tagging that simply did not exist before.
• For example,
  - Most popular
  - Tag clouds
  - Alerts


Most popular

Another example is most emailed from, e.g., the NY




Tag cloud



• New (content selected by date)
• Subscriptions (content selected by tags)
• Interest (content selected by other people)
• Individual (content selected for you by other people)


Taxonomy Strategies


Is faceted indexing the future of social tagging?


Tagging exercise: Blog tagging (a)

ALA Tech Source.


Tagging exercise: Blog tagging (b)


Tagging exercise: Taxonomy facets, definitions

Taxonomy Facets | Descriptions
Business activity | Use for common business function or activity such as finance, marketing and sales.
Industry / Product | Use for content that is about or related to an industrial sector or product such as construction equipment.
Geography | Use for content that is about a region, country or city.
Organization | Use for named organizations, brands and business entities.
Person / Role | Use for named people and the roles people have in organizations.
Content Type | Use for content genres such as letters, memos and reports.
Audience | Use to indicate the intended audience.
Topic | Use for other business and associated topics that the content is about or related to.

Tagging exercise: Taxonomy facets, values

Business activity: Accounting; Auditing; Finance; HR management; IT; Marketing; Operations management; Sales

Geography: Africa; Americas; Antarctica; Asia; Europe; Oceania; Global; Historical geography; Oceans & seas; Regions

Industry / Product: Agriculture; Mining; Utilities; Construction; Manufacturing; Wholesale trade; Retail trade; Transportation & warehousing; Information; Finance & insurance; Real estate; Professional; Management; Administrative support; Education; Health care; Arts, entertainment & recreation; Accommodation & food; Other services; Public administration

Organization / Entity: Business entities; Companies & brands; Government agencies; International NGOs; Organization types

People / Role: Business Leaders; Thought Leaders; Political Leaders; Roles

Content Type: Basic facts & information; Blog; Brochure; Database; E-mail; Letter; Memo; Multimedia; Report; Newsletter; Podcast; Press Release; Research & Analysis; RSS Feed

Audience: Consumer; Employee; Manager; Executive



• There are lessons to be learned from web tagging about how to get good metadata in document and content management applications.
• Document and content management system tagging must be simple, and it must make relevant work products easier to find almost instantaneously.


Taxonomy Strategies


Joseph A. Busch + 415-377-7912
