On Your MARC, Get Set, Code!
Hosted by Core: Leadership, Infrastructure, Futures
March 23, 2022
Presenters
Paul Daybell
Archival Cataloging Librarian
Andrea Payant
Metadata Librarian
Liz Woolcott
Cataloging and Metadata Services Unit Head
Project Team Leadership
Anna-Maria Arnljots
Metadata Assistant
Paul Daybell
Archival Cataloging Librarian
Kurt Meyer
Government Information and E-Resource Cataloger
Andrea Payant
Metadata Librarian
Becky Skeen
Special Collection Cataloging Librarian
Liz Woolcott
Cataloging and Metadata Services Unit Head
Full Research Team
• Anna-Maria Arnljots
• Josee Butler
• Ryan Bushman (Stats)
• Paul Daybell
• Barbara Fleming
• Maddie Gardner
• Alisha Grant
• Bryn Larsen
• Sabrina Leatham
• Rachel Olsen
• Andrea Payant
• Kurt Meyer
• Jessica Mills
• Abby Rodabough
• MaKayla Roundy
• Melanie Shaw
• Becky Skeen
• Sara Skindelien
• Seth Westenburg
• Liz Woolcott

Background
• Multi-year research into user search behavior for all metadata standards employed by the unit
  - First phase: MARC
  - Second phase: EAD
  - Current phase: Dublin Core
• Project started just as the library moved everyone to work from home
• Whole unit was able to participate in the coding project
Problem Statement
How well do MARC records perform in a typical user search process?
Research Questions
• What is the frequency and placement of MARC records in search results lists?
• Where are search terms located in MARC records?
Table of Contents
Log Analysis (Liz)
Methodology (Andrea)
Results and Analysis (Paul)
Programs and Resources (Liz)

Log Analysis
What is log analysis?
What kind of data can we get from it?
• Rezarta Islamaj Dogan, G. Craig Murray, Aurélie Névéol, Zhiyong
Lu, Understanding PubMed® user search behavior through log
analysis, Database, 2009,
“Web logs can capture a number of informative
aspects of a user’s interaction, including timing,
query term selection and paths taken through a
Web site.”
ENCORE
Single search box presented on the library homepage
Web Logs
Example of time-stamped web logs from Google Analytics

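Breaking down the URL
Senvironmental sociology__P1__O-date__X0__T__Ks@2000e@2020?lang=eng&suite=cobalt
Example of search results page
lus/C__Senvironmental sociology__P1__O-date__X0__T__Ks@2000e@2020?lang=eng&suite=cobalt
Example of record page
(This is exclusively from Sierra.)
_Rb4067331__Senvironmental sociology__P1__O-date__X0__T__Ks@2000e@2020?lang=eng&suite=cobalt
Example of advanced search page
C__S(environmental sociology) a:(Gustavo Medina) f:a y:[2000-2020]__U__X0?lang=eng&suite=cobalt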
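The URL breakdown on these slides can be sketched in code. This is a minimal illustration, not the project's actual tooling: the function name `parse_encore_url` is hypothetical, and the reading of the prefixes (`R` for a record ID, `S` for the search term, `P` for the results page number, `O` for the sort order) is an assumption inferred from the example URLs. Segments whose meaning the slides do not explain are kept under `uncoded`.

```python
from urllib.parse import unquote


def parse_encore_url(url: str) -> dict:
    """Code one logged URL into page type, search term, page number, and sort."""
    # Drop the query string (?lang=...&suite=...) and decode %20-style escapes.
    path = unquote(url.split("?", 1)[0])
    start = path.find("C__")
    if start == -1:
        return {"page_type": "other"}

    coded = {"page_type": "search", "record_id": None, "search_term": None,
             "page": None, "sort": None, "uncoded": []}
    for segment in path[start + 3:].split("__"):
        prefix, value = segment[:1], segment[1:]
        if prefix == "R" and coded["record_id"] is None and coded["search_term"] is None:
            # Assumed: a leading R segment marks a record page and carries the bib ID.
            coded["page_type"] = "record"
            coded["record_id"] = value
        elif prefix == "S" and coded["search_term"] is None:
            coded["search_term"] = value
        elif prefix == "P" and value.isdigit():
            coded["page"] = int(value)
        elif prefix == "O":
            coded["sort"] = value
        else:
            # Segments such as X0, T, or Ks@2000e@2020 are left for manual coding.
            coded["uncoded"].append(segment)
    return coded


example = "C__Senvironmental sociology__P1__O-date__X0__T__Ks@2000e@2020?lang=eng&suite=cobalt"
print(parse_encore_url(example))
```

Sorting each logged URL by `page_type` is one way to separate search results pages from record pages before any manual coding.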

Web Scraping
Exported list of all URLs accessed the previous day, sorted by time

• Uploaded into Airtable
• Assigned ID
• Sorted for search vs. record page
Web Logs
Search results page URLs fed into Octoparse
Each item on a search results page is numbered, uploaded into Airtable, and linked with the URL that generated the item.

Search Results Scraped
Poll 1
What types of data do you have experience coding (if any)?
• Extract search terms
• Coded for:
  - Page Type
  - Advanced Search fields used
  - Facets Used
  - Page #
URL Content

• URLs grouped into search sessions
• Assigned a search ID
• Put in order of occurrence
• Search re-run for QC
• Coded for:
  - Search term construction
  - Search Categories (known item, topical, etc.)
  - User Path
  - Known Item Titles
Search Queries
• Extracted from URL/Search Query coding
• Coded for:
  - Format/Genre type
  - Availability
  - Physical/Electronic
  - Location
  - Steps to access (e-resources)
  - Listed by (in Encore)
  - Final content provider
  - Check-outs
  - Discoverability in Google Scholar and Microsoft Academic
    o Step to access (e-resources)
Known Items
• Filtered for just Sierra records
• BIB # extracted from URL
• MARC record copy/pasted from WebPac
• MARC record coded for:
  - Creator
  - Material Type
  - MARC field where search term is found
  - Fields not present
  - Word Count
MARC Records
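The "MARC field where search term is found" and "Word Count" codes can be illustrated with a small sketch. The slides describe this coding as manual work, so the code below is only an approximation of the idea. It assumes a record is represented as a plain mapping from MARC tag to a list of field strings; the function names and the sample record are hypothetical, and matching is a simple case-insensitive whole-word comparison.

```python
import re


def words(text: str) -> list:
    """Lower-cased word tokens of a string."""
    return re.findall(r"\w+", text.lower())


def fields_matching_terms(record: dict, query: str) -> dict:
    """Map each MARC tag to the query words that appear somewhere in that field."""
    terms = words(query)
    matches = {}
    for tag, values in record.items():
        field_words = set(words(" ".join(values)))
        found = [term for term in terms if term in field_words]
        if found:
            matches[tag] = found
    return matches


def word_count(record: dict) -> int:
    """Total number of word tokens across all fields of the record."""
    return sum(len(words(value)) for values in record.values() for value in values)


# Hypothetical record, invented for illustration only.
sample_record = {
    "245": ["An invitation to environmental sociology"],
    "520": ["A survey of rural water policy."],
    "650": ["Environmental sciences", "Sociology"],
}
print(fields_matching_terms(sample_record, "environmental sociology"))
print(word_count(sample_record))
```

Here both the 245 and the 650 would be coded as containing the search terms, while the 520 would not.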
1,040 Search Sessions Coded
13,312 MARC Records Coded
609 Known Items identified and coded

Results and
MARC Fields
Results and Analysis
Research Question #1
What is the frequency and placement of MARC
records in search results lists?
Analysis 1.1:
How frequently are MARC records showing up in search results?
                                 Batch 1   Batch 2   Batch 3  Combined
MARC-based catalog records          5264      3299      4749     13312
Records from other platforms       20326     17560     16811     54697
Total Records                      25603     20859     21560     68022
Percent MARC records              20.56%    15.82%    22.03%    19.57%
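As a worked check of the "Percent MARC records" row, the sketch below recomputes each percentage as MARC-based records divided by the stated total. The counts are copied from the slide; the variable and function names are illustrative. One caveat: the two Batch 1 component rows as printed sum to 25,590 rather than the stated 25,603, so the sketch uses the stated totals, which reproduce the slide's percentages, and reports the gap.

```python
# Counts copied from the Analysis 1.1 slide.
BATCHES = {
    "Batch 1": {"marc": 5264, "other": 20326, "total": 25603},
    "Batch 2": {"marc": 3299, "other": 17560, "total": 20859},
    "Batch 3": {"marc": 4749, "other": 16811, "total": 21560},
    "Combined": {"marc": 13312, "other": 54697, "total": 68022},
}


def percent_marc(marc: int, total: int) -> float:
    """Share of MARC-based records among all records, as a percentage."""
    return round(100 * marc / total, 2)


def row_gap(counts: dict) -> int:
    """Stated total minus the sum of the two component rows (0 when they agree)."""
    return counts["total"] - (counts["marc"] + counts["other"])


for name, counts in BATCHES.items():
    pct = percent_marc(counts["marc"], counts["total"])
    print(f"{name}: {pct:.2f}% MARC (row gap: {row_gap(counts)})")
```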

Analysis 1.2: Is there a difference between locally created records and vendor-supplied records in the frequency of listing in search results?
Record Creator                                    # Records in  % Total records  # Records  % Total records
                                                  results list  in results list   accessed         accessed
Vendor                                                   7,727           58.05%        163           39.00%
Cataloging and Metadata Services                         5,066           38.06%        239           57.18%
Distance Campus Libraries                                  410            3.08%          5            1.20%
Record unavailable at time of coding                        52            0.39%          2            0.48%
Patron Services, Library Media Collections, or
  Resource Sharing and Document Delivery                    33            0.25%          8            1.91%
Acquisitions                                                16            0.12%          0            0.00%
Unknown                                                      5            0.04%          1            0.24%
Natural History Library                                      3            0.02%          0            0.00%
Total                                                   13,312                         418
Analysis 1.3:
How are MARC records ranked in the search results list?
• The most common position for MARC records in a search result set of 25 items is position 4
• MARC records appear in the top five search results 25.35% of the time
Analysis 1.4:
Where do MARC records for known items rank in the search results list?
Percentage of Times Available Whole Object Appeared in Search Results by Position Number
Position         Total #  % in results
Result 1             125         18.7%
Result 2             107         16.0%
Result 3              61          9.1%
Result 4              49          7.3%
Result 5              37          5.5%
Results 6-10         104         15.6%
Results 11-15         67         10.0%
Results 16-20         56          8.4%
Results 21-25         35          5.2%
Research Question #2
Where are search terms located in
MARC records?

Poll 2
Besides the title (245) field, what field do you think most frequently contained user search terms?
Analysis 2.1:
What fields are used most in retrieving records?
MARC Fields Where Search Terms Were Located (Top 5)
MARC Field  Number of Records
245                      9100
505                      4998
650                      4806
520                      3700
600                      1328
Analysis 2.2:
For records accessed by the patron, is there a difference in
where search terms are located?
• The 245 Title statement remained highest, appearing 64% more often than the
next most utilized field
• Instead of the 505 Formatted Contents Note being in second place, the 650
Subject Added Entry is the next most used field
• The 505 Formatted Contents Note and 520 Summary fields retained a spot in
the top four fields
Analysis 2.3:
For locally created records and vendor-supplied records, is
there a difference in where search terms are located?
Percentage of fields used in record retrieval (top 5 most frequent)
Field  Field Description                    CMS Records  Vendor Records
245    Title Statement                           43.80%          51.64%
505    Formatted Contents Note                   28.13%          69.65%
650    Subject Added Entry - Topical             40.89%          56.58%
520    Summary, etc.                             23.41%          76.03%
600    Subject Added Entry - Personal Name       59.94%          32.68%

Analysis 2.4:
What fields are not present in the records?
                             CMS Not Present  CMS Present  Vendor Not Present  Vendor Present
Author (both 1xx and 7xx)              0.75%       99.25%               1.18%          98.82%
Subject (any authorized)               4.46%       95.54%               6.73%          93.27%
505 Formatted Contents Note           63.96%       36.04%              45.54%          54.46%
520 Summary Note                      75.60%       24.40%              50.45%          49.55%
All Categories Present: CMS 14.86%, Vendor 33.26%
Analysis 2.5:
Which fields would make the greatest impact if not included in the record?
• The top four fields with the greatest impact on retrieval, if not found in a record:
505, 245, 520, and 650
• Without the 505 or 520, 16.86% of all records appearing in results would not
have shown up
• In contrast, without 650 and 600 fields, only 0.66% of records would not have
appeared in the search results
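The "would not have shown up" figures rest on a simple idea: a record is lost when every field that matched the search terms is among the fields removed. The sketch below shows that logic under stated assumptions. It treats each retrieved record as the set of MARC tags where search terms were found and assumes retrieval depends only on those coded fields. The function name and the sample data are hypothetical and are not the study's data.

```python
def share_lost_without(match_sets: list, removed: set) -> float:
    """Fraction of retrieved records whose matching fields are all in `removed`."""
    retrieved = [tags for tags in match_sets if tags]
    if not retrieved:
        return 0.0
    # A record is lost only if none of its matching fields would remain.
    lost = [tags for tags in retrieved if tags <= removed]
    return len(lost) / len(retrieved)


# Hypothetical coding results: one set of matching tags per record in the results.
sample_matches = [
    {"245", "505"},
    {"505"},
    {"520"},
    {"650", "600"},
    {"245"},
]
print(share_lost_without(sample_matches, {"505", "520"}))
print(share_lost_without(sample_matches, {"650", "600"}))
```

In this toy data, removing the 505 and 520 loses the two records that matched only in those notes, while the record that matched in both the 245 and the 505 is still retrieved through its title.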
MARC Fields
Results and Analysis
MARC Fields Findings

• Non-MARC records have an advantage over MARC records
• MARC vendor records appear more often than locally created MARC records
• 80% of all records in search results are non-MARC
• 25% of MARC records place in the top 5 search results
• 505/520: occur more frequently in vendor records
• 1xx/6xx/7xx: occur at the same rate in vendor and locally created records
Title fields are most important overall, but…
505:
• Ranked higher than 245 for records where search terms matched only one field
• Consistently in the top 4 fields that retrieved a record (along with 520)
• If missing, 12% of all MARC results would not have been displayed
Subject fields are important, BUT…
• 3rd most important field for matching search terms
• 2nd most important field for records viewed by patrons
• 1xx fields were much more likely to be “clicked on”
• 0.66% would not have been displayed if field were missing
• 1 instance of subject fields being “clicked on”
MARC Take-Aways
• Catalogers will retain the ability to make the best judgment for each record, but will be asked to consider the following guidelines:
  - More emphasis on creating 505 and 520 notes in local records
  - Less emphasis on 6xx fields as an entry point
  - More emphasis on 1xx fields as an entry point

Programs and Resources
Poll 3
I have used the following programs:
Pros and Cons: Google Analytics
• Google Analytics
Pros
  - Lots of data
  - Customizable reports
  - Good export options (PDF, Google Sheets, CSV, Excel)
  - Runs constantly, good for historical data
Cons
  - Privacy issues
  - Only downloads 5,000 at a time
  - Institution chosen
Pros and Cons: Octoparse
• Octoparse
Pros
  - Free option (under 10, trial)
  - Speeds up the data collection process
  - Can be simple (autodetect)
  - Fast
  - Export into Excel, CSV, HTML, JSON
Cons
  - Free version is limited in projects
  - Sometimes skips records, need to keep track
  - Slight learning curve

Pros and Cons: Airtable
Pros
  - Linking
  - Flexible
  - Dynamic dashboards
  - Multi-user + Versioning
  - Communication (commenting, tagging)
  - Color Coding
  - Views
  - Codebooks
Cons
  - Subscription
  - Structuring can be complex
  - Simplistic dashboard
Alternative Programs
Web log
  - Matomo
  - Open Web Analytics
Web Scraping
  - ScrapingBot
  - ParseHub
  - Data Scraper (Chrome browser
  - Web Scraper (Chrome and cloud
  - Scraper (Chrome browser
Data Coding
  - Excel
  - Dedoose
  - QDA Miner Lite
  - Google Sheets
Next Steps
In process
  - Dublin Core Discoverability
  - Encore vs Google Scholar
  - Search query construction
  - Controlled field analysis
MARC Discoverability
EAD Discoverability
User search habits in Encore
Full Procedures:
Article with final results:
Liz Woolcott, Andrea Payant, Becky Skeen & Paul Daybell (2021) Missing the MARC:
Utilization of MARC Fields in the Search Process, Cataloging & Classification Quarterly,
59:1, 28-52, DOI: 10.1080/01639374.2021.1881010
Related articles
Robert Heaton & Liz Woolcott. Unraveling the (Search) String: Assessing Library Discovery
Layers Using Patron Queries. Library Assessment Conference, January 2021,

Coding Group
• Anna-Maria Arnljots
• Josee Butler
• Ryan Bushman (Stats)
• Paul Daybell
• Barbara Fleming
• Maddie Gardner
• Alisha Grant
• Bryn Larsen
• Sabrina Leatham
• Rachel Olsen
• Andrea Payant
• Kurt Meyer
• Jessica Mills
• Abby Rodabough
• MaKayla Roundy
• Melanie Shaw
• Becky Skeen
• Sara Skindelien
• Seth Westenburg
• Liz Woolcott
Anna-Maria Arnljots
Metadata Assistant
Paul Daybell
Archival Cataloging Librarian
Kurt Meyer
Government Information and E-
Resource Cataloger
Andrea Payant
Metadata Librarian
Becky Skeen
Special Collection Cataloging Librarian
Liz Woolcott
Cataloging and Metadata Services Unit Head
Thank You!


On Your MARC, Get Set, Code!

  • 1. On Your MARC, Get Set, Code! Hosted by Core: Leadership, Infrastructure, Futures March 23, 2022
  • 2. Presenters Paul Daybell Archival Cataloging Librarian Andrea Payant Metadata Librarian Liz Woolcott Cataloging and Metadata Services Unit Head
  • 3. Project Team Leadership Anna-Maria Arnljots Metadata Assistant Paul Daybell Archival Cataloging Librarian Kurt Meyer Government Information and E- Resource Cataloger Andrea Payant Metadata Librarian Becky Skeen Special Collection Cataloging Librarian Liz Woolcott Cataloging and Metadata Services Unit Head
  • 4. Full Research Team • Anna-Maria Arnljots • Josee Butler • Ryan Bushman (Stats) • Paul Daybell • Barbara Fleming • Maddie Gardner • Alisha Grant • Bryn Larsen • Sabrina Leatham • Rachel Olsen • Andrea Payant • Kurt Meyer • Jessica Mills • Abby Rodabough • MaKayla Roundy • Melanie Shaw • Becky Skeen • Sara Skindelien • Seth Westenburg • Liz Woolcott
  • 5. Background • Multi-year research into user search behavior for all metadata standards employed by the unit  First phase: MARC  Second phase: EAD  Current phase: Dublin Core • Project started just as the library moved everyone to work from home • Whole unit was able to participate in the coding project
  • 6. Problem Statement How well do MARC records perform in a typical user search process?
  • 7. Research Questions • What is the frequency and placement of MARC records in search results lists? • Where are search terms located in MARC records?
  • 8. Table of Contents Log Analysis (Liz) Methodology (Andrea) Results and Analysis (Paul) Programs and Resources (Liz)
  • 9. Log Analysis What is log analysis? What kind of data can we get from it?
  • 10. • Rezarta Islamaj Dogan, G. Craig Murray, Aurélie Névéol, Zhiyong Lu, Understanding PubMed® user search behavior through log analysis, Database, 2009, “Web logs can capture a number of informative aspects of a user’s interaction, including timing, query term selection and paths taken through a Web site.”
  • 11. ENCORE Single search box presented on the library homepage
  • 12. Web Logs Example of time- stamped web logs from Google Analytics
  • 13. Breaking down the URL Senvironmental sociology__P1__O-date__X0__T__Ks@2000e@2020?lang=eng&suite=cobalt
  • 14. ENCORE Example of search results page lus/C__Senvironmental sociology__P1__O-date__X0__T__Ks@2000e@2020?lang=eng&suite=cobalt
  • 15. ENCORE Example of record page (This is exclusively from Sierra.) _Rb4067331__Senvironmental sociology__P1__O-date__X0__T__Ks@2000e@2020?lang=eng&suite=cobalt lus/C__Senvironmental sociology__P1__O-date__X0__T__Ks@2000e@2020?lang=eng&suite=cobalt
  • 16. ENCORE Example of advanced search page C__S(environmental sociology) a:(Gustavo Medina) f:a y:[2000-2020]__U__X0?lang=eng&suite=cobalt
  • 20. WEB LOGS Exported list of all URLs accessed the previous day, sorted by time
  • 21. • Uploaded into Airtable • Assigned ID • Sorted for search vs. record page Web Logs Search results page URLs fed into Octoparse
  • 24. WEB SCRAPE Each item on a search results page is numbered, uploaded into Airtable, and linked with the URL that generated the item.
  • 27. Poll 1 What types of data do you have experience coding (if any)?
  • 28. CODING • Extract search terms • Coded for:  Page Type  Advanced Search fields used  Facets Used  Page # URL Content
  • 29. CODING • URLS grouped into search sessions • Assigned a search ID • Put in order of occurrence • Search re-run for QC • Coded for:  Search term construction  Search Categories (known item, topical, etc.)  User Path  Known Item Titles Search Queries
  • 30. CODING • Extracted from URL/Search Query coding • Coded for:  Format/Genre type  Availability  Physical/Electronic  Location  Steps to access (e-resources)  Listed by (in Encore)  Final content provider  Check-outs  Discoverability in Google Scholar and Microsoft Academic o Step to access (e-resources) Known Items
  • 31. CODING • Filtered for just Sierra records • BIB # extracted from URL • MARC record copy/pasted from WebPac • MARC record coded for:  Creator  Material Type  MARC field where search term is found  Fields not present  Word Count MARC Records
  • 32. 1,040 13,312 609 Search Sessions Coded MARC Records Coded Known Items identified and coded
  • 35. Research Question #1 What is the frequency and placement of MARC records in search results lists?
  • 36. Batch 1 Batch 2 Batch 3 Combined MARC-based catalog records 5264 3299 4749 13312 Records from other platforms 20326 17560 16811 54697 Total Records 25603 20859 21560 68022 Percent MARC records 20.56% 15.82% 22.03% 19.57% Analysis 1.1: How frequently are MARC records showing up in search results?
  • 37. Analysis 1.2: Is there a difference between locally created records and vendor supplied records in the frequency of listing in search results? Record Creator # Records in results list % Total records in results list # Records accessed % Total records accessed Vendor 7,727 58.05% 163 39.00% Cataloging and Metadata Services 5,066 38.06% 239 57.18% Distance Campus Libraries 410 3.08% 5 1.20% Record unavailable at time of coding 52 0.39% 2 0.48% Patron Services, Library Media Collections, or Resource Sharing and Document Delivery 33 0.25% 8 1.91% Acquisitions 16 0.12% 0 0.00% Unknown 5 0.04% 1 0.24% Natural History Library 3 0.02% 0 0.00% Total 13,312 418
  • 38. Analysis 1.3: How are MARC records ranked in the search results list? • Most common position for MARC records in a search result set of 25 items, is position 4 • MARC records appear in the top five search results 25.35% of the time
  • 39. Analysis 1.4: Where do MARC records for known items rank in the search results list? Percentage of Times Available Whole Object Appeared in Search Results by Position Number Result 1 Result 2 Result 3 Result 4 Result 5 Results 6-10 Results 11-15 Results 16-20 Results 21-25 Total # 125 107 61 49 37 104 67 56 35 % in results 18.7% 16.0% 9.1% 7.3% 5.5% 15.6% 10.0% 8.4% 5.2%
  • 40. Research Question #2 Where are search terms located in MARC records?
  • 41. Poll 2 Besides the title (245) field, what field do you think most frequently contained user search terms?
  • 42. Analysis 2.1: What fields are used most in retrieving records? 9100 4998 4806 3700 1328 245 505 650 520 600 Number of Records MARC Fields MARC Fields Where Search Terms Were Located (Top 5)
  • 43. Analysis 2.2: For records accessed by the patron, is there a difference in where search terms are located? • The 245 Title statement remained highest, appearing 64% more often than the next most utilized field • Instead of the 505 Formatted Contents Note being in second place, the 650 Subject Added Entry is the next most used field • The 505 Formatted Contents Note and 520 Summary fields retained a spot in the top four fields
  • 44. Analysis 2.3: For locally created records and vendor-supplied records, is there a difference in where search terms are located? Percentage of fields used in record retrieval (top 5 most frequent) Field Field Description CMS Records Vendor Records 245 Title Statement 43.80% 51.64% 505 Formatted Contents Note 28.13% 69.65% 650 Subject Added Entry - Topical 40.89% 56.58% 520 Summary, etc. 23.41% 76.03% 600 Subject Added Entry – Personal Name 59.94% 32.68%
  • 45. Analysis 2.4: What fields are not present in the records? CMS Vendor Not Present Present Not Present Present Author (both 1xx and 7xx) 0.75% 99.25% 1.18% 98.82% Subject (any authorized) 4.46% 95.54% 6.73% 93.27% 505 Formatted Contents Note 63.96% 36.04% 45.54% 54.46% 520 Summary Note 75.60% 24.40% 50.45% 49.55% All Categories Present 14.86% 33.26%
  • 46. Analysis 2.5: Which fields would make the greatest impact if not included in the record? • The top four fields with the greatest impact on retrieval, if not found in a record: 505, 245, 520, and 650 • Without the 505 or 520, 16.86% of all records appearing in results would not have shown up • In contrast, without 650 and 600 fields, only 0.66% of records would not have appeared in the search results
  • 49. Analysis
    • Non-MARC records have an advantage over MARC
    • MARC vendor records appear more often than locally created MARC records
    • 80% of all records in search results are non-MARC
    • 25% of MARC records place in the top 5 search results
    • 505/520 occur more frequently in vendor records
    • 1xx/6xx/7xx occur at the same rate in vendor and locally created records
  • 50. Analysis
    Title fields are most important overall, but… the 505:
    • Ranked higher than 245 for records where search terms matched only one field
    • Was consistently in the top 4 fields that retrieved a record (along with 520)
    • If missing, 12% of all MARC results would not have been displayed
  • 51. Analysis
    Subject fields are important:
    • 3rd most important field for matching search terms
    • 2nd most important field for records viewed by patrons
    BUT…
    • 1xx fields were much more likely to be “clicked on”
    • 0.66% of records would not have been displayed if the field were missing
    • 1 instance of subject fields being “clicked on”
  • 52. MARC Take-Aways
    • Catalogers will retain the ability to use their best judgment for each record, but will be asked to consider the following guidelines:
      - More emphasis on creating 505 and 520 notes in local records
      - Less emphasis on 6xx fields as an entry point
      - More emphasis on 1xx fields as an entry point
  • 54. Poll 3 I have used the following programs:
  • 55. Pros and Cons: Google Analytics
    Pros
      - Lots of data
      - Customizable reports
      - Good export options (PDF, Google Sheets, CSV, Excel)
      - Runs constantly – good for historical data
    Cons
      - Privacy issues
      - Only downloads 5,000 at a time
      - Institution chosen
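Because exports are capped at 5,000 at a time, a longer log has to be downloaded in pieces and stitched back together. The sketch below shows one possible way to merge CSV chunks with Python's standard library. The column names, the sample rows, and the `merge_exports` helper are hypothetical, and the approach assumes every chunk shares the same header.

```python
import csv
import io

def merge_exports(chunks):
    """Concatenate CSV exports that share a header, dropping duplicate rows."""
    header, rows, seen = None, [], set()
    for chunk in chunks:
        reader = csv.reader(io.StringIO(chunk))
        chunk_header = next(reader, None)
        if chunk_header is None:
            continue  # skip empty chunks
        if header is None:
            header = chunk_header
        elif chunk_header != header:
            raise ValueError("export chunks have different headers")
        for row in reader:
            key = tuple(row)
            if key not in seen:  # overlapping chunks may repeat rows
                seen.add(key)
                rows.append(row)
    return header, rows

# Hypothetical export chunks (made-up data) that overlap by one row.
chunk_a = (
    "timestamp,search_term\n"
    "2020-03-01 09:00,birds of utah\n"
    "2020-03-01 09:05,utah history\n"
)
chunk_b = (
    "timestamp,search_term\n"
    "2020-03-01 09:05,utah history\n"
    "2020-03-01 09:10,western wildlife\n"
)
header, rows = merge_exports([chunk_a, chunk_b])
print(header, len(rows))
```

Dropping exact duplicates is a design choice that suits overlapping downloads; if identical rows can legitimately occur, the de-duplication step should be removed.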
  • 56. Pros and Cons: Octoparse
    Pros
      - Free option (under 10, trial)
      - Speeds up the data collection process
      - Can be simple – autodetect
      - Fast
      - Export into Excel, CSV, HTML, JSON
    Cons
      - Free version is limited in projects
      - Sometimes skips records, need to keep track
      - Slight learning curve
  • 57. Pros and Cons: Airtable
    Pros
      - Linking
      - Flexible
      - Dynamic dashboards
      - Multi-user + Versioning
      - Communication (commenting, tagging)
      - Color Coding
      - Views
      - Codebooks
    Cons
      - Subscription
      - Structuring can be complex
      - Simplistic dashboard
  • 58. Alternative Programs
    Web log generation
      - Matomo
      - Open Web Analytics
    Web Scraping
      - ScrapingBot
      - ParseHub
      - Data Scraper (Chrome browser extension)
      - Web Scraper (Chrome and cloud extension)
      - Scraper (Chrome browser extension)
    Data Coding
      - Excel
      - Dedoose
      - QDA Miner Lite
      - Google Sheets
  • 59. Next Steps
    PROJECTS
    In process
      - Dublin Core Discoverability
      - Encore vs Google Scholar
    Upcoming
      - Search query construction
      - Controlled field analysis
    Completed
      - MARC Discoverability
      - EAD Discoverability
      - User search habits in Encore
  • 60. Resources
    Full Procedures:
    Article with final results:
      - Liz Woolcott, Andrea Payant, Becky Skeen & Paul Daybell (2021). Missing the MARC: Utilization of MARC Fields in the Search Process. Cataloging & Classification Quarterly, 59:1, 28-52. DOI: 10.1080/01639374.2021.1881010
    Related articles:
      - Robert Heaton & Liz Woolcott. Unraveling the (Search) String: Assessing Library Discovery Layers Using Patron Queries. Library Assessment Conference, January 2021. Unraveling-the-Search-String.pdf
  • 61. Coding Group • Anna-Maria Arnljots • Josee Butler • Ryan Bushman (Stats) • Paul Daybell • Barbara Fleming • Maddie Gardner • Alisha Grant • Bryn Larsen • Sabrina Leatham • Rachel Olsen • Andrea Payant • Kurt Meyer • Jessica Mills • Abby Rodabough • MaKayla Roundy • Melanie Shaw • Becky Skeen • Sara Skindelien • Seth Westenburg • Liz Woolcott
  • 62. Questions?
    Anna-Maria Arnljots, Metadata Assistant
    Paul Daybell, Archival Cataloging Librarian
    Kurt Meyer, Government Information and E-Resource Cataloger
    Andrea Payant, Metadata Librarian
    Becky Skeen, Special Collection Cataloging Librarian
    Liz Woolcott, Cataloging and Metadata Services Unit Head
    Thank You!