Taxonomies and Metadata in Information Architecture
- 1. Taxonomies and Metadata for Information Architecture
Alice Redmond-Neal
Thesaurus Development Manager
Access Innovations, Inc. - Booth 217
ared@accessinn.com
Internet Librarian 2005
- 2. What we’ll cover
Key definitions
Taxonomies, metadata, information architecture
How taxonomies and metadata influence
information architecture
Using taxonomies to enhance retrieval
2 Copyright © 2005 Access Innovations, Inc.
- 3. Key points
A taxonomy provides both a browsable outline
and descriptive metadata.
Metadata provide efficient searchable handles
for content.
Taxonomy-based subject metadata yield the
most precise retrieval.
Taxonomy is the basis for information
architecture.
Information architecture that takes full
advantage of taxonomy and subject metadata
supports findability.
- 4. What’s a taxonomy?
Words
Controlled vocabulary for a subject area
Descriptive labels
Hierarchy
Simple hierarchical view of a thesaurus
Knowledge organization system
Storage and retrieval aid
- 5. Info retrieval starts with a
knowledge organization system
Ranked from not complex to highly complex:
Uncontrolled list
Name authority file
Synonym set/ring
Controlled vocabulary
Taxonomy
Thesaurus
Ontology
Semantic network
- 6. Structure of
controlled vocabularies
In order of increasing complexity: List of words, Synonyms, Taxonomy, Thesaurus
Ambiguity control: Synonyms, Taxonomy, Thesaurus
Synonym control: Synonyms, Taxonomy, Thesaurus
Hierarchical relationships: Taxonomy, Thesaurus
Associative relationships: Thesaurus
- 7. Taxonomy? Thesaurus?
Often used interchangeably
Thesaurus is a taxonomy with extras
Related Terms
Nonpreferred Terms (USE/Used for)
Scope Notes
more
Use the word your audience understands
Avoid confusion with Roget’s thesaurus
- 8. Taxonomy view and Thesaurus Term Record view
- 9. Basic taxonomy / thesaurus features
Hierarchy structure
Broader Terms = more general concepts
Narrower Terms = more specific concepts
Related Terms = conceptual cousins
Term equivalents
Facets
Scope notes
Other elements as needed
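The features listed above can be pictured as fields on a term record. The following is a minimal sketch, not the structure of any particular thesaurus tool: the class name, field names, and sample terms are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class TermRecord:
    """One taxonomy/thesaurus term with its basic relationships."""
    label: str
    broader: list[str] = field(default_factory=list)    # Broader Terms: more general concepts
    narrower: list[str] = field(default_factory=list)   # Narrower Terms: more specific concepts
    related: list[str] = field(default_factory=list)    # Related Terms: conceptual cousins
    used_for: list[str] = field(default_factory=list)   # term equivalents (nonpreferred terms)
    scope_note: str = ""


# Hypothetical sample records, chosen only to illustrate the fields.
RECORDS = {
    "Biology": TermRecord(
        "Biology",
        broader=["Natural science"],
        narrower=["Microorganisms"],
        related=["Medicine"],
    ),
    "Microorganisms": TermRecord(
        "Microorganisms",
        broader=["Biology"],
        used_for=["germs", "microbes"],
    ),
}


def preferred_term(word, records=RECORDS):
    """Return the preferred term for a word, following Used-for equivalents."""
    lowered = word.lower()
    for record in records.values():
        if lowered == record.label.lower():
            return record.label
        if lowered in (variant.lower() for variant in record.used_for):
            return record.label
    return None
```

Looking up a nonpreferred word such as "germs" returns the preferred term, which is the behavior the later slides on term equivalents rely on.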
- 10. Perspectives on taxonomies
Taxonomist
(aka Lexicographer, Thesaurus builder)
Indexer
Information architect
Searcher
Each has a different view and need for words in
retrieving information.
Each need relates to using a taxonomy for
indexing / categorizing content.
- 11. Taxonomies for
information retrieval online
Conceptual framework for web content –
reflects organization of knowledge in a
domain
Foundation for information architecture
Term records contain valuable info
Often 3 levels deep – depends on domain
May be displayed in full or part, modified,
or hidden
- 12. Taxonomy display
depends on purpose
Descriptive taxonomy:
Includes term variants, synonyms, nonpreferred terms
Query term expansion links synonyms to valid taxonomy term
Supports discovery through hierarchy, Related Terms
Used primarily at indexing stage of content mgmt workflow to categorize documents
Navigational taxonomy:
Reflects user’s mental model
Reflects user’s vernacular
Supports discovery through browsing
May be modified version of full taxonomy
Hidden taxonomy:
Usually alphabetic links to terms
May recognize term variants
- 13. Taxonomy
provides a way to describe the content --
the basis for subject metadata
Metadata
provide a way for that description to be
captured for a website
- 14. What’s metadata?
“When it comes to definitions, metadata is a
slippery fish.”
Data about data
Tags used to describe documents, pages, images,
software, video and audio files, and other content objects
for the purposes of improved navigation and retrieval
Finding tool
Keywords not displayed to the viewer but available to
search engines
Viewable in HTML keyword meta tag field of
most web sites
Peter Morville, Louis Rosenfeld
- 15. Data about data - like what?
Title
Author name
Date of creation
Language used in the creation
Publisher
Subject of the creation
Keywords... our focus re: taxonomies
Other stuff, depending on need
Dublin Core is a well-known metadata standard,
but metadata schemas are commonly custom-designed
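As one illustration of "data about data", the elements above can be written out as HTML meta tags using the common `DC.` name prefix for Dublin Core. This is only a sketch: the function name is hypothetical, and the language code and subject terms in the sample record are assumptions rather than values from the presentation.

```python
from html import escape


def dublin_core_meta(record):
    """Render a metadata record as one <meta> tag per element value."""
    lines = []
    for element, value in record.items():
        values = value if isinstance(value, list) else [value]
        for single in values:
            content = escape(single, quote=True)
            lines.append(f'<meta name="DC.{element}" content="{content}">')
    return "\n".join(lines)


# Sample record; "language" and "subject" values are assumed for illustration.
RECORD = {
    "title": "Taxonomies and Metadata for Information Architecture",
    "creator": "Alice Redmond-Neal",
    "date": "2005",
    "language": "en",
    "publisher": "Access Innovations, Inc.",
    "subject": ["Taxonomies", "Metadata"],
}

print(dublin_core_meta(RECORD))
```

A custom schema would simply use different element names; the subject element is where taxonomy terms would go.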
- 16. How does metadata work?
Search engine / web crawler looks at the
HTML header on a web page
View Page source
Subject Metadata is one part of the HTML
header
<META NAME="KEYWORDS" CONTENT= … >
- 18. <META NAME="KEYWORDS" CONTENT=
"content management software, xml thesaurus,
data management, database management system,
concept extraction, document management software,
information management system, information retrieval,
knowledge extraction, knowledge management software,
machine aided indexing, taxonomy management system,
text management, text retrieval,
thesaurus management software, xml">
A search including these words/phrases will retrieve this website.
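To see roughly what a crawler has to do with a tag like this, the sketch below pulls the keyword phrases out of a page with Python's standard `html.parser`. The class and function names are hypothetical, and real search engines differ in how (and whether) they use this field.

```python
from html.parser import HTMLParser


class KeywordExtractor(HTMLParser):
    """Collect the phrases listed in a page's keywords meta tag."""

    def __init__(self):
        super().__init__()
        self.keywords = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag != "meta":
            return
        attributes = {name: (value or "") for name, value in attrs}
        if attributes.get("name", "").lower() != "keywords":
            return
        for part in attributes.get("content", "").split(","):
            phrase = " ".join(part.split())  # collapse line breaks and extra spaces
            if phrase:
                self.keywords.append(phrase)


def extract_keywords(html_text):
    parser = KeywordExtractor()
    parser.feed(html_text)
    return parser.keywords


PAGE = """<html><head>
<META NAME="KEYWORDS" CONTENT=
"content management software, xml thesaurus,
data management">
</head><body></body></html>"""

print(extract_keywords(PAGE))
```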
- 19. Taxonomy terms as metadata
Most precise topic identifiers –
100% relevant
Searchable as metadata
Gives more precise results than free text
search – if you know what you’re looking for
Prevents hit on random occurrence of your
query word
- 20. What’s Information Architecture?
The art and science of structuring and
classifying web sites and intranets to help
people find and manage information
Content + Structure + Function =
the basis for
User Experience
Peter Morville, Louis Rosenfeld
- 22. What’s an Information Architect?
“I’m an Information Architect.
I organize huge amounts of information
on big web sites and intranets
so that people can actually find what they want.
Think of me as an Internet Librarian.”
Peter Morville, Louis Rosenfeld
- 23. What IA is not
Graphic / visual design
Software development
Content management
Knowledge management
Coding (HTML, etc.)
Usability engineering
Library science
- 24. Information Architecture –
major components
Taxonomies
Metadata
Organization
Search
Labeling
Navigation
- 25. 1–Taxonomies aid site organization
Taxonomy provides
Framework for content organization
Hierarchical outline of your content by subject
categories
Basis for faceted browsing
- 26. Categories
show clearly
what’s covered
in this domain
- 27. Value of Category search
Searchers find info 50% faster using
browsable categories than using list
returned from free text search
Results even stronger when results not in top
20 returns
Searchers prefer browsable category
search
Chen, H., and Dumais, S.
- 28. MediaSleuth – displaying
taxonomy categories improves IA
MediaSleuth is:
Online source of educational media
Videos, software, audio, etc.
Over 96,000 products, nearly 64,000 titles
Based on NICEM database (National
Information Center for Educational Media)
Content – Excellent
Findability – ?
- 31. MediaSleuth draws on XML-tagged elements under the hood
- 32. Machine Aided Indexer (M.A.I.)
suggests taxonomy descriptors
- 33. Taxonomy terms on documents
help sort and organize the content
M.A.I. suggests the correct terms from the
taxonomy as descriptors
M.A.I. rulebase recognizes term equivalents
germs -> Microorganisms
vaccin* -> Pharmaceutical drugs
Recognizing term equivalents
enables enhanced search
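A rulebase of this kind can be imagined as a table of text patterns pointing at taxonomy terms. The sketch below is not how M.A.I. works internally; it is a hypothetical, much-simplified matcher that only supports exact words and a trailing `*` wildcard, using the two rules from the slide.

```python
import re

# Text pattern -> taxonomy descriptor. A trailing * matches any word ending.
RULES = {
    "germs": "Microorganisms",
    "vaccin*": "Pharmaceutical drugs",
}


def compile_rule(pattern):
    """Turn a rule pattern into a case-insensitive whole-word regex."""
    if pattern.endswith("*"):
        body = re.escape(pattern[:-1]) + r"\w*"
    else:
        body = re.escape(pattern)
    return re.compile(r"\b" + body + r"\b", re.IGNORECASE)


COMPILED_RULES = [(compile_rule(pattern), term) for pattern, term in RULES.items()]


def suggest_descriptors(text):
    """Suggest taxonomy descriptors for a document based on its words."""
    suggestions = []
    for regex, term in COMPILED_RULES:
        if regex.search(text) and term not in suggestions:
            suggestions.append(term)
    return suggestions


print(suggest_descriptors("Vaccination protects children against germs."))
```

In a hybrid workflow, a human indexer would review such suggestions before they are stored as descriptors.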
- 34. Taxonomy descriptors
become subject metadata
Selected descriptors are XML-tagged and
stored with document
Descriptors available as webpage metadata
Metatags enable precise document retrieval
Term equivalence enables query expansion
in search (coming)
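The path from descriptor to subject metadata might look like the sketch below, which stores selected descriptors as XML and then exposes the same terms as a keywords meta tag. The element names, function names, and sample title are hypothetical; MediaSleuth's actual XML schema is not shown in the slides.

```python
import xml.etree.ElementTree as ET
from html import escape


def document_xml(title, descriptors):
    """Store a document's selected descriptors as XML-tagged elements."""
    doc = ET.Element("document")
    ET.SubElement(doc, "title").text = title
    container = ET.SubElement(doc, "descriptors")
    for term in descriptors:
        ET.SubElement(container, "descriptor").text = term
    return ET.tostring(doc, encoding="unicode")


def keywords_meta(descriptors):
    """Expose the same descriptors as webpage subject metadata."""
    content = escape(", ".join(descriptors), quote=True)
    return f'<meta name="keywords" content="{content}">'


descriptors = ["Body growth", "Microorganisms"]
print(document_xml("Growing Up", descriptors))  # "Growing Up" is an invented title
print(keywords_meta(descriptors))
```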
- 35. Search: body growth
Complete database
Free text search
8 hits — some irrelevant
Free text search on titles
6 hits — limited recall
Search by taxonomy descriptor (AKA category)
470 hits
100% relevant
100% recall
1,100 document sample
Category search results
3 hits
- 36. Sidebar: Recall, Precision, and Relevance
Search for body growth
Documents: A, D, E, H, I, J
Documents tagged “body growth”: B, C, F, G
If you retrieve B, C, F, G: 100% recall, 100% precision, 100% relevant
If you retrieve B, C: 50% recall, 100% precision, 100% relevant
If you retrieve B, C, H, J: 50% recall, 50% precision, 50% relevant
If you retrieve A, D, E, H: 0% recall, 0% precision, 0% relevant
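The four cases in this sidebar can be checked with a few lines of code. This sketch uses the standard definitions (recall = relevant retrieved / all relevant; precision = relevant retrieved / all retrieved); the function name is an assumption.

```python
def recall_and_precision(retrieved, relevant):
    """Return (recall, precision) for a set of retrieved documents."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    recall = len(hits) / len(relevant) if relevant else 0.0
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    return recall, precision


# Documents tagged "body growth" in the sidebar example.
TAGGED = {"B", "C", "F", "G"}

for retrieved in (["B", "C", "F", "G"], ["B", "C"], ["B", "C", "H", "J"], ["A", "D", "E", "H"]):
    recall, precision = recall_and_precision(retrieved, TAGGED)
    print(retrieved, f"recall={recall:.0%}", f"precision={precision:.0%}")
```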
- 37. Display taxonomy categories
to improve MediaSleuth search
Results from sample of 1,100 documents
(not all categories are populated)
- 41. Facets offer finer organization
Add details about any term
Pre-established aspects that pertain to each item
Cross-cut a taxonomic hierarchy
Basis for fine-tuning search results
Market group / audience
Price
Color
Size range
Source / company
Other attributes, varying by domain and need
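Because facets cut across the hierarchy, faceted browsing amounts to filtering items on several attributes at once and showing counts for the remaining choices. A minimal sketch, with an invented catalog and hypothetical function names:

```python
from collections import Counter

# Hypothetical catalog items; the facet values are invented for illustration.
ITEMS = [
    {"name": "Rain jacket", "department": "Outdoor", "color": "Red", "price": 80},
    {"name": "Trail shoes", "department": "Outdoor", "color": "Blue", "price": 120},
    {"name": "Desk lamp", "department": "Home", "color": "Red", "price": 35},
]


def filter_by_facets(items, max_price=None, **selected):
    """Keep items that match every selected facet value and the price ceiling."""
    matches = []
    for item in items:
        if any(item.get(facet) != value for facet, value in selected.items()):
            continue
        if max_price is not None and item["price"] > max_price:
            continue
        matches.append(item)
    return matches


def facet_counts(items, facet):
    """Count items per facet value, e.g. to label links as 'Red (2)'."""
    return Counter(item[facet] for item in items)


print([item["name"] for item in filter_by_facets(ITEMS, color="Red")])
print(facet_counts(ITEMS, "department"))
```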
- 42. Facets describe all / most items by Department, Price, Color, other Attributes
- 43. “Taxonomies and faceted models
provide users with tools
to see the forest and
quickly focus on a specific tree.”
Sullivan, D.
- 44. Alternative ways to display
content organization
Alphabetically
Chronologically
Geographically
Permuted list of taxonomy terms
Content management system
management system, Content
system, Content management
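A permuted list like the one above can be generated by rotating each multiword term so that every word gets a turn in the lead position. A small sketch (the function name is an assumption):

```python
def permuted_entries(term):
    """Return one entry per word, with that word rotated to the front."""
    words = term.split()
    entries = []
    for i in range(len(words)):
        lead = " ".join(words[i:])
        trail = " ".join(words[:i])
        entries.append(f"{lead}, {trail}" if trail else lead)
    return entries


for entry in permuted_entries("Content management system"):
    print(entry)
```

Sorting the entries for all terms alphabetically lets a user find "Content management system" under C, M, or S.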
- 45. 2–Taxonomies aid search
Taxonomy provides
Authority terms of a controlled vocabulary
Synonyms and other alternative expressions
Typos (lathes, laiths, laths, layth…)
Obsolete names (Cooper’s plane / Lamb’s
tongue)
Query expansion
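Query expansion can be sketched as a lookup that maps whatever the searcher typed to the authority term before searching the descriptors. The equivalence table and the tiny document index below are hypothetical; a production system would draw both from the thesaurus and the indexed collection.

```python
# Hypothetical equivalence table: search word -> preferred taxonomy term.
EQUIVALENTS = {
    "competencies": "Professional competencies",
    "thesaurus": "Thesauri",
    "taxonomy": "Taxonomies",
    "germs": "Microorganisms",
}

# Hypothetical index: document id -> descriptors assigned at indexing time.
INDEX = {
    "doc1": {"Thesauri", "Taxonomies"},
    "doc2": {"Professional competencies"},
    "doc3": {"Taxonomies"},
}


def expand_query(word):
    """Map a search word to its preferred term, or return it unchanged."""
    cleaned = word.strip()
    return EQUIVALENTS.get(cleaned.lower(), cleaned)


def search_by_descriptor(word, index=INDEX):
    """Return ids of documents whose descriptors include the expanded term."""
    term = expand_query(word)
    return sorted(doc for doc, descriptors in index.items() if term in descriptors)


print(search_by_descriptor("taxonomy"))
```

Typos and obsolete names would be handled the same way, as extra keys pointing at the current preferred term.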
- 46. Search: kangaroo
Leverage
taxonomy term
information
to aid search
- 49. SLA search
Interpret search word
“competencies”
as taxonomy term
Professional
competencies
- 50. Returns all documents in
Professional competencies
category
- 51. Search: thesaurus
Interpret “thesaurus”
as term Thesauri,
return all documents
in that category.
- 52. Search “taxonomy”
Search in XML descriptor field returns all documents in that category: 27 hits
Search in original metadata: 1 hit
Solution: Include descriptors with metadata!
- 53. 3–Taxonomies aid labeling
Taxonomy provides
Basis for labels on site/portal
Concepts that can be re-worded for audience
- 54. SLA website and thesaurus
Navigational Taxonomy for end user
Descriptive Taxonomy for Indexers
- 55. Adapt taxonomy terms for labeling
What words do users use? Gather variants from
Search logs
User focus groups
Subject matter experts
Tailor site/portal labels to typical users
Include variants as Nonpreferred terms
(USE/Used for equivalents) in taxonomy
M.A.I. can also capture variants as rules without
formalizing them as Nonpreferred terms
- 56. 4–Taxonomies aid navigation
Taxonomy provides
Major categories
Expansion to Narrower Terms
Additional term information
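Navigation built on a taxonomy typically shows the major categories first and expands to Narrower Terms on demand. The sketch below renders a nested taxonomy to a chosen depth; the sample hierarchy is an assumption, and the function name is hypothetical.

```python
# Hypothetical two-level taxonomy: term -> narrower terms.
TAXONOMY = {
    "Natural science": {"Biology": {}, "Botany": {}, "Medicine": {}},
    "Physical science": {"Astronomy": {}, "Chemistry": {}, "Physics": {}},
}


def outline(tree, max_depth, depth=0):
    """Return indented lines for all terms down to max_depth levels."""
    lines = []
    if depth >= max_depth:
        return lines
    for term, narrower in tree.items():
        lines.append("  " * depth + term)
        lines.extend(outline(narrower, max_depth, depth + 1))
    return lines


print("\n".join(outline(TAXONOMY, max_depth=1)))  # top categories only
print("\n".join(outline(TAXONOMY, max_depth=2)))  # expanded to Narrower Terms
```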
- 57. Taxonomy Expanded
Top categories
Categories & additional
information
- 58. Drop-down menus
reflect Narrower Terms
and Related Terms
- 59. Integrate taxonomy
to enhance findability
Browsable categories of a directory
Browsable faceted navigation
Smart search for term equivalents
Taxonomy terms (original or modified) as
labels
Navigation aids incorporate taxonomy
terms and relationships
- 60. Use software tools to support IA
Thesaurus creation / management tools
ANSI/NISO standards compliant
Support features you need
Customizable fields
Import ability
Categorization tools
Human / automatic / hybrid categorizer
Content management systems
- 61. TAXONOMY
Foundation of information architecture
Source of subject metadata
Path to portal usability
Diagram labels: Your Content; TAXONOMY; Content organization; Portal labels; Search aid; Navigation; ABC Company; Your Portal
Sample taxonomy shown in diagram:
Natural science
Biology
Botany
Medicine
Physical science
Astronomy
Chemistry
Physics
- 62. Recap
Taxonomies and metadata are cornerstones of
information architecture
Taxonomies are the basis for content organization
Taxonomies provide a browsable outline of your
content
Subject metadata using taxonomy terms yield 100%
relevant retrieval
Taxonomies are the basis for search, labeling, and
navigation in information architecture
Tools that recognize synonyms (query expansion)
improve taxonomy implementation
- 63. References
Aitchison, J., Gilchrist, A., and Bawden, D. Thesaurus
Construction and Use: A Practical Manual (4th edition).
Aslib, 2000.
Chen, H., and Dumais, S. Bringing order to the web:
automatically categorizing search results. Proceedings of
the ACM SIGCHI Conference on Human Factors in
Computing Systems (CHI '00). ACM, 2000, 145-152.
Rosenfeld, L., and Morville, P. Information Architecture
for the World Wide Web. O'Reilly, 1998.
Sullivan, D. Proven Portals: Best Practices for Planning,
Designing, and Developing Enterprise Portals.
Addison-Wesley, 2003.
- 64. Thank you!
Questions?
Alice Redmond-Neal
Access Innovations, Inc.
Data Harmony software
Thesaurus Master and Machine Aided Indexer
ared@accessinn.com
(505) 998-0800
Editor's Notes
- Who is IA? Who is into taxonomies? Who is generally curious? Disclaimer: not librarian, IA, programmer, etc. Have something to offer on taxonomies, defer to many for IA stuff--- share!
- Any search requiring a taxonomy term search is impossible unless the site shows the taxonomy. (MediaSleuth did in the past) Whatever we can do to improve user’s experience (search and find) is good.
- Taxonomy allows you to sort out content by conceptual categories – by topic or subject -- by “aboutness” There are other forms of organization – alpha, chronological, geographical, audience, etc.
- Special attention to Non-Preferred Term -- goldmine
- All potentially applicable for a website’s IA
- History of the term – Jack Myers coined the term “metadata” for products associated with his MetaModel and for his company, The Metadata Company, and registered a trademark for the term in 1986. Found in Page Source or Page Info for any website.
- If you know what you’re looking for -- Return to this point later and talk about smarter searching.
- Function = Findability
- There are other forms of organization – alpha, chronological, geographical, audience, etc. Taxonomy organizes by topic, by subject, by aboutness.
- Room for improvement
- Under the hood – the content management workflow stage, including indexing
- Recognizing term equivalents – important point, we’ll see more on this later.
- Facets work especially well when most items in the database can be described in multiple ways, have numerous aspects to consider…. E-commerce products, pharmaceuticals, etc.
- Search recognizes singular/plural and stemming (kangarooers, kangaroo-paws) Links to Broader and Narrower Terms and to Related Terms
- For MediaSleuth, we are progressing toward doing just that and more. I introduced Machine Aided Indexer or MAI earlier. It is the categorizing assistant that prompts taxonomy terms for indexing—ultimately for subject metadata— based on words in a document. Those words come from a wide range of synonyms that writers use. MAI expands on the query, using its rulebase to link the search word to taxonomy terms. Let’s follow a search on the word “germs”
- SLA also uses MAI behind the scenes to match search words to terms in their taxonomy and then to corresponding documents.
- A search on the word “competencies” returns all documents in the category Professional competencies — documents for which MAI had suggested that term from their taxonomy.
- Searching the word “thesaurus” (read as taxonomy term Thesauri ) yields 3 documents by looking at the descriptors, but 0 hits by looking at the original metadata supplied with the document.
- Searching on the word “taxonomy” yielded 27 documents with Taxonomies as an indexing term, but only one having that word in the document’s original metadata. SLA’s search system takes advantage of two kinds of search: 1—a targeted search for document descriptors drawn from their own taxonomy, including the synonyms for taxonomy terms, and 2—a search of the original metadata. The two could be combined by including all document descriptors in the subject metadata.
- Expands to 2nd and 3rd levels of taxonomy, includes Related Terms