The Java I/O package supports Java's basic input/output system for accessing external data from sources like files and networks. It defines streams as logical entities that produce or consume information, with byte streams for binary data and character streams for Unicode text. Streams are linked to physical devices and behave consistently across different types of devices. The package defines hierarchies of input and output stream classes, with abstract base classes like InputStream/OutputStream for bytes and Reader/Writer for characters.
This document provides an overview of input/output (I/O) streams in Java, including character streams, byte streams, file streams, pipe streams, filter streams, object serialization, and random access files. It also discusses the standard input, output, and error streams in Java and how to read from and write to these streams.
File handling presentation, including Java file handling code: update, delete, search, view, and insert operations on files are all covered. If you face any issues, contact me at 03244064060, by email at azeemaj101@gmail.com, or on Twitter @azeemaj101.
This document summarizes XML parsing techniques including DOM, SAX, and Microsoft XML DOM objects. DOM builds a hierarchical model of the XML document as a tree structure in memory. SAX is event-based and parses the document sequentially, triggering events. Microsoft XML DOM provides classes that map to the W3C DOM standard for manipulating XML documents. The document compares DOM and SAX, describing their advantages and disadvantages. It also outlines common DOM objects and their properties and methods for traversing and manipulating the XML document tree.
Streams are used to transfer data between a program and source/destination. They transfer data independently of the source/destination. Streams are classified as input or output streams depending on the direction of data transfer, and as byte or character streams depending on how the data is carried. Common stream classes in Java include FileInputStream, FileOutputStream, FileReader, and FileWriter for reading from and writing to files. Exceptions like FileNotFoundException may occur if a file cannot be opened.
The document discusses byte stream classes in Java. Byte streams are rooted in two abstract classes: InputStream and OutputStream. InputStream provides methods for reading bytes of data sequentially. FileInputStream and FileOutputStream are subclasses that allow reading and writing bytes from/to files. FileInputStream can be constructed using a file path or a File object and overrides InputStream methods like read() to access file bytes.
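A minimal sketch of the byte-stream pattern described above, assuming a placeholder file name "data.bin"; FileInputStream also accepts a File object in place of the path string:

```java
import java.io.FileInputStream;
import java.io.IOException;

public class ByteReadDemo {
    public static void main(String[] args) throws IOException {
        // try-with-resources closes the stream automatically.
        try (FileInputStream in = new FileInputStream("data.bin")) {
            int b;
            // read() returns the next byte (0-255), or -1 at end of file.
            while ((b = in.read()) != -1) {
                System.out.printf("%02x ", b);
            }
        }
    }
}
```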
Android developer Dmitry Dogar talks about how to organize data persistence in Android using the new Room library. The topic was inspired by a Google Developer Group meetup.
This document provides an introduction to XML DOM (Document Object Model) including:
- XML DOM defines a standard for accessing and manipulating XML documents and is a W3C standard.
- The DOM presents an XML document as a tree structure with elements, attributes, and text as nodes.
- The DOM is separated into three levels: Core DOM, XML DOM, and HTML DOM.
- DOM properties and methods allow accessing and modifying nodes, and DOM parsing converts an XML document into accessible DOM objects.
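As a minimal sketch of DOM parsing with Java's standard JAXP API (the file name "books.xml" and the "title" element are illustrative assumptions, not from the document):

```java
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

public class DomDemo {
    public static void main(String[] args) throws Exception {
        DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
        // Parsing builds the whole document as an in-memory tree of nodes.
        Document doc = builder.parse("books.xml");
        NodeList titles = doc.getElementsByTagName("title");
        for (int i = 0; i < titles.getLength(); i++) {
            System.out.println(titles.item(i).getTextContent());
        }
    }
}
```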
Slides about exception handling in Java and file and I/O handling in Java, covering the built-in Java packages for exceptions; aimed at beginners in programming.
The document discusses input/output streams in Java. There are two types of streams: byte streams and character streams. Byte streams handle input and output of bytes for binary files, while character streams handle input and output of characters for text files. Java also defines three standard streams for input, output, and errors that are represented by System.in, System.out, and System.err respectively. The document provides examples of different stream types and how they are used for input and output in Java programs.
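A short, hedged illustration of the three standard streams (plain standard-library Java, not code from the document):

```java
import java.io.IOException;

public class StdStreamsDemo {
    public static void main(String[] args) throws IOException {
        System.out.println("Normal output goes to System.out");
        System.err.println("Diagnostics go to System.err");
        // System.in is an InputStream; read() returns one byte at a time.
        System.out.print("Type a character and press Enter: ");
        int c = System.in.read();
        System.out.println("You typed: " + (char) c);
    }
}
```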
This document discusses Java I/O and streams. It begins by introducing files and the File class, which provides methods for obtaining file properties and manipulating files. It then discusses reading and writing files using byte streams like FileInputStream and FileOutputStream. Character streams like PrintWriter and BufferedReader are presented for console I/O. Other stream classes covered include buffered streams, object streams for serialization, and data streams for primitive types. The key methods of various stream classes are listed.
The java.io package contains classes for input and output in Java. It includes abstract classes like InputStream, OutputStream, Reader, and Writer as well as concrete subclasses like FileInputStream, FileOutputStream, BufferedReader, and PrintWriter. The classes use the decorator pattern and handle byte streams for binary data and character streams for text. Exceptions like IOException must be caught when using these classes so that resources are properly closed.
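A brief sketch of the decorator pattern and exception handling the summary mentions; the file names are placeholders:

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;
import java.io.PrintWriter;

public class DecoratorDemo {
    public static void main(String[] args) {
        // Decorators: BufferedReader adds buffering and readLine() to FileReader;
        // PrintWriter adds formatted output on top of FileWriter.
        // try-with-resources closes both streams even if an IOException occurs.
        try (BufferedReader in = new BufferedReader(new FileReader("in.txt"));
             PrintWriter out = new PrintWriter(new FileWriter("out.txt"))) {
            String line;
            while ((line = in.readLine()) != null) {
                out.println(line.toUpperCase());
            }
        } catch (IOException e) {
            System.err.println("I/O error: " + e.getMessage());
        }
    }
}
```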
In this session you will learn:
Streams
Using a stream
Manipulating the input data
Basics of the LineReader constructor
The LineWriter class
Flushing the buffer
PrintWriter
About FileDialogs
Typical FileDialog window
FileDialog constructors
Useful FileDialog methods I
Useful FileDialog methods II
Serialization
Conditions for serializability
Writing objects to a file
For more information, visit this link: https://www.mindsmapped.com/courses/software-development/online-java-training-for-beginners/
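As a hedged illustration of the serialization topics listed above (the Point class and the file name are invented for this sketch, not taken from the session):

```java
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class SerializeDemo {
    // Condition for serializability: the class (and the classes of its
    // non-transient fields) must implement java.io.Serializable.
    static class Point implements Serializable {
        private static final long serialVersionUID = 1L;
        int x, y;
        Point(int x, int y) { this.x = x; this.y = y; }
    }

    public static void main(String[] args) throws IOException {
        try (ObjectOutputStream out =
                 new ObjectOutputStream(new FileOutputStream("point.ser"))) {
            out.writeObject(new Point(3, 4)); // the object graph is written as bytes
        }
    }
}
```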
The document discusses input and output in Java using the java.io package. It provides examples of reading keyboard input using BufferedReader and reading file input using FileReader. It also provides examples of writing console output using System.out and writing to files using PrintWriter. The document explains that java.io streams provide independence from the source or destination of the input/output.
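A minimal sketch combining the two patterns this summary mentions, keyboard input via BufferedReader and file output via PrintWriter; "notes.txt" is a placeholder name:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.PrintWriter;

public class ConsoleToFileDemo {
    public static void main(String[] args) throws IOException {
        // InputStreamReader bridges the byte stream System.in to characters;
        // BufferedReader adds readLine().
        BufferedReader keyboard =
                new BufferedReader(new InputStreamReader(System.in));
        System.out.print("Enter a line: ");
        String line = keyboard.readLine();
        try (PrintWriter out = new PrintWriter("notes.txt")) { // overwrites the file
            out.println(line);
        }
    }
}
```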
Protocol Buffers are a language-neutral, platform-neutral way of serializing structured data. They were developed at Google to address issues with encoding structured data for communication between systems. Protocol Buffers define the data structure in a .proto file, which is then used to generate code for easily reading and writing the structured data in multiple languages. They provide a smaller data size and faster parsing than XML and allow the data structure to be updated while maintaining backwards and forwards compatibility.
Part 4 of the tutorials at DC2008, Berlin (International Conference on Dublin Core and Metadata Applications). See also parts 1-3 by Jane Greenberg, Pete Johnston, and Mikael Nilsson on DC history, concepts, and other schemas. This part focuses on practical issues.
The Missing Link: Metadata Conversion Workflows for Everyone
This document describes workflows developed by Utah State University and the University of Nevada, Las Vegas to streamline metadata creation between special collections and digital initiatives departments. The workflows allow for converting finding aid information into Dublin Core for uploading item records to a digital repository, and batch linking digitized content to finding aids. The processes are designed to be taught easily and performed by various staff levels to automate metadata work and make it more flexible.
The document discusses different approaches to data management and persistence in applications, including:
1) Storing objects directly in files or using a database management system (DBMS) to store data in tables while hiding physical storage details.
2) Design questions around persistence such as whether to use files, a relational or object DBMS, and how to structure the logical and physical layers.
3) Common techniques for mapping objects to relational databases like normalization, handling inheritance and associations.
4) Alternatives for designing data management classes like adding persistence methods to classes or using broker classes.
The document discusses the ADO.Net Entity Framework 4.0 and the need for object-relational mapping (ORM) tools. It covers Entity Data Modeling (EDM) components like conceptual models, storage models, and mappings. It also discusses database first and model first approaches to EDM creation. Additional topics covered include LINQ to Entities, working with stored procedures, customizing entities with T4 templates, and using POCO entities.
Metadata management for data storage spaces:
INDEXATOR is a metadata management tool that addresses the problems of organising, documenting, storing and sharing data in a research unit or infrastructure, and fits perfectly into a data management plan of a collective.
The central idea is that the storage space becomes the data repository, so the metadata should go to the data and not the other way around.
Given the diversity of domains, the approach chosen is to be both as flexible and as pragmatic as possible by allowing each collective to choose its own (controlled) vocabulary corresponding to the reality of its field and activities. The main idea is to be able to "capture" the user's metadata as easily as possible using their vocabulary. It is possible to define the whole terminology using a spreadsheet.
The choice was made for the JSON format, which is very appropriate for describing metadata, readable by both humans and machines.
This tool is built around a web interface coupled with a MongoDB database. The web interface allows you to i) Describe a dataset using metadata of various types (Description), ii) Search datasets by their metadata (Accessibility).
The data science process document outlines the typical steps involved in a data science project including: 1) setting research goals, 2) retrieving data from internal or external sources, 3) preparing data through cleansing and transformation, 4) performing exploratory data analysis, 5) building models using techniques like machine learning or statistics, and 6) presenting and automating results. It also discusses challenges in working with different file formats and the importance of understanding various formats as a data scientist.
Data binding allows linking data from a database or text file to HTML elements. It has features like on-demand content retrieval, asynchronous processing, and sorting and filtering of data. The data binding architecture consists of four main components - a data source object, data consumers, a binding agent, and a table repetition agent. It allows accessing and manipulating data from any database through a web browser.
The document discusses data binding and describes:
1) Data binding associates data from a database with HTML elements to display the data. It allows sorting and filtering of data.
2) The architecture of data binding includes data source objects, data consumers, a binding agent, and a table repetition agent.
3) Sorting and filtering of data with a tabular data control allows reordering and restricting the display of data from a CSV file.
This document provides an overview of getting started with IDA and navigating disassemblies:
- Launching IDA involves choosing a file to analyze which loads the file and displays it. The history allows reopening recent files.
- The initial analysis populates various windows like Functions and disassembles the code. Data displays include graph, text, hex, and named views.
- Navigation uses double-clicks, addresses, and the stack frame. Searches find text or binary patterns.
- Common tasks involve naming locations and variables, transforming code/data, and recognizing data structures.
Python is open source and has many libraries for data wrangling and visualization that make data scientists' lives easier. For data wrangling, pandas is used, as it represents tabular data and has functions to parse data from different sources, clean data, handle missing values, merge data sets, and so on. To visualize data, the low-level matplotlib library can be used, and it is also the base package for higher-level packages such as seaborn, which draws well-customized plots in just one line of code. Python's Dash framework makes it possible to build interactive web applications in Python code without JavaScript or HTML. These Dash applications can be published on any server, as well as on clouds like Google Cloud, and for free on the Heroku cloud.
DataFinder concepts and example: General (20100503)
DataFinder is a lightweight client-server solution for centralized data management. It was created by the German Aerospace Center (DLR) to address the problems of absent data organization structures and no centralized policy for data management. DataFinder provides graphical user interfaces and uses a logical data store concept to organize data across distributed storage locations according to a configurable data model. It can be customized through Python scripts to integrate with different environments and automate tasks like data migration.
Putting Historical Data in Context: how to use DSpace-GLAM
This document discusses using DSpace and DSpace-GLAM to manage digital cultural heritage data. It provides an overview of DSpace's data model and functionality for ingesting, describing, and sharing digital objects. It then introduces DSpace-GLAM, an extension of DSpace developed for cultural heritage institutions. DSpace-GLAM adds additional entity types, relationships, and metadata to better represent cultural concepts. It also provides tools for visualizing and analyzing datasets.
The document discusses various technologies for metasearching or cross-searching multiple databases at once, including Z39.50 for real-time searching, SRU/SRW web services, and OAI-PMH for metadata harvesting. It explains concepts like XML, web services, SOAP, and WSDL, and provides examples of how technologies like Z39.50, SRU, and OAI-PMH enable searching across different data sources.
A brief history in TimeSeries data at Environment Canada. An Enterprise view of how FME can be integrated into departmental data management activities.
DataFinder is software developed by the German Aerospace Center (DLR) to help scientists and engineers efficiently manage and organize their large and growing scientific data sets. It provides a structured way to organize data through customizable data models and metadata, and can integrate various storage resources. DataFinder was created in Python due to its ease of use and maintainability. It uses a client-server model with a WebDAV server to manage metadata and data structures, and can access different storage backends. Customizations through Python scripts allow users to automate tasks and integrate it into their workflows.
The MIDESS Project explored sharing digital content like images between university repositories. It tested standards like OAI-PMH and METS for exchanging metadata and objects. While these standards allow some interoperability, repositories implemented them differently, preventing full sharing. The project highlighted ongoing issues around information architecture, repository functionality for multimedia, and integrating repositories into broader systems.
This document discusses requirements for designing a framework to analyze text datasets. It identifies several key variations in importing datasets related to file sources, formats and schemas. It then proposes using high-level reader classes to handle different datasets. The document outlines the STAT domain model which includes concepts like RawCorpus to represent raw document collections, Processor to process data, Corpus to represent data for machine learning, Trainer for algorithms, Model to store learned parameters, Classifier to classify documents, Prediction for output classifications, Evaluator to evaluate predictions and Evaluation for results.
DataFinder: A Python Application for Scientific Data Management
DataFinder is a Python application developed by the German Aerospace Center (DLR) for efficient management of large scientific and technical data sets. It provides a structured way to organize data through customizable data models and flexible use of distributed storage resources. DataFinder uses a client-server model with a WebDAV server to store metadata and data. It allows integration of data management into scientific workflows through a Python API and scripting.
This document provides an overview of week 4 topics for a Code Club on wrangling data with Python. The key points covered are:
- Merging data from different data frames and working with "tidy data" formats.
- Taking first steps in reshaping datasets, including long to wide data transforms and pivots.
- Various strategies for cleaning datasets, such as handling empty/duplicate rows and columns, cleaning text strings, and mapping values.
- The melt and pivot functions for reshaping data between wide and long formats.
This chapter discusses data design concepts, file processing systems, database systems, and web-based data design. It explains key data design terminology and how to draw entity relationship diagrams to represent relationships between entities. The chapter also covers database models, data storage and access methods, and data control measures to ensure security and integrity.
Understanding Insider Security Threats: Types, Examples, Effects, and Mitigat...
Today’s digitally connected world presents a wide range of security challenges for enterprises. Insider security threats are particularly noteworthy because they have the potential to cause significant harm. Unlike external threats, insider risks originate from within the company, making them more subtle and challenging to identify. This blog aims to provide a comprehensive understanding of insider security threats, including their types, examples, effects, and mitigation techniques.
To help you choose the best DiskWarrior alternative, we've compiled a comparison table summarizing the features, pros, cons, and pricing of six alternatives.
We are honored to launch and host this event for our UiPath Polish Community, with the help of our partners - Proservartner!
We certainly hope we have managed to pique your interest in the subjects to be presented and the incredible networking opportunities at hand, too!
Check out our proposed agenda below 👇👇
08:30 ☕ Welcome coffee (30')
09:00 Opening note/ Intro to UiPath Community (10')
Cristina Vidu, Global Manager, Marketing Community @UiPath
Dawid Kot, Digital Transformation Lead @Proservartner
09:10 Cloud migration - Proservartner & DOVISTA case study (30')
Marcin Drozdowski, Automation CoE Manager @DOVISTA
Pawel Kamiński, RPA developer @DOVISTA
Mikolaj Zielinski, UiPath MVP, Senior Solutions Engineer @Proservartner
09:40 From bottlenecks to breakthroughs: Citizen Development in action (25')
Pawel Poplawski, Director, Improvement and Automation @McCormick & Company
Michał Cieślak, Senior Manager, Automation Programs @McCormick & Company
10:05 Next-level bots: API integration in UiPath Studio (30')
Mikolaj Zielinski, UiPath MVP, Senior Solutions Engineer @Proservartner
10:35 ☕ Coffee Break (15')
10:50 Document Understanding with my RPA Companion (45')
Ewa Gruszka, Enterprise Sales Specialist, AI & ML @UiPath
11:35 Power up your Robots: GenAI and GPT in REFramework (45')
Krzysztof Karaszewski, Global RPA Product Manager
12:20 🍕 Lunch Break (1hr)
13:20 From Concept to Quality: UiPath Test Suite for AI-powered Knowledge Bots (30')
Kamil Miśko, UiPath MVP, Senior RPA Developer @Zurich Insurance
13:50 Communications Mining - focus on AI capabilities (30')
Thomasz Wierzbicki, Business Analyst @Office Samurai
14:20 Polish MVP panel: Insights on MVP award achievements and career profiling
Paradigm Shifts in User Modeling: A Journey from Historical Foundations to Em...
Slides of the tutorial entitled "Paradigm Shifts in User Modeling: A Journey from Historical Foundations to Emerging Trends" held at UMAP'24: 32nd ACM Conference on User Modeling, Adaptation and Personalization (July 1, 2024 | Cagliari, Italy).
Scaling Connections in PostgreSQL Postgres Bangalore(PGBLR) Meetup-2 - Mydbops
This presentation, delivered at the Postgres Bangalore (PGBLR) Meetup-2 on June 29th, 2024, dives deep into connection pooling for PostgreSQL databases. Aakash M, a PostgreSQL Tech Lead at Mydbops, explores the challenges of managing numerous connections and explains how connection pooling optimizes performance and resource utilization.
Key Takeaways:
* Understand why connection pooling is essential for high-traffic applications
* Explore various connection poolers available for PostgreSQL, including pgbouncer
* Learn the configuration options and functionalities of pgbouncer
* Discover best practices for monitoring and troubleshooting connection pooling setups
* Gain insights into real-world use cases and considerations for production environments
This presentation is ideal for:
* Database administrators (DBAs)
* Developers working with PostgreSQL
* DevOps engineers
* Anyone interested in optimizing PostgreSQL performance
Contact info@mydbops.com for PostgreSQL Managed, Consulting and Remote DBA Services
This document discusses reading and writing files in .NET applications. It describes the key classes for input and output streams, including File, Directory, StreamReader, and StreamWriter. These classes allow applications to read from and write to files to store and transfer data.
Java uses streams to handle input/output operations. Streams provide a standardized way to read from and write to various sources and sinks like files, networks, and buffers. There are byte streams that handle input/output of bytes and character streams that handle characters. Common stream classes include FileInputStream, FileOutputStream, BufferedReader, and BufferedWriter which are used to read from and write to files and console. Streams can be chained together for complex I/O processing.
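A short sketch of stream chaining; the particular chain (data stream over buffered stream over file stream) and the file name are illustrative choices, not from the document:

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;

public class ChainingDemo {
    public static void main(String[] args) throws IOException {
        // Write primitives through a chain:
        // DataOutputStream -> BufferedOutputStream -> FileOutputStream.
        try (DataOutputStream out = new DataOutputStream(
                 new BufferedOutputStream(new FileOutputStream("nums.bin")))) {
            out.writeInt(42);
            out.writeDouble(3.14);
        }
        // Read them back through the mirror-image input chain.
        try (DataInputStream in = new DataInputStream(
                 new BufferedInputStream(new FileInputStream("nums.bin")))) {
            System.out.println(in.readInt() + " " + in.readDouble());
        }
    }
}
```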
Object-oriented programming Undergraduate Course Presentations
java.io streams and files in Java
University of Vale do Itajaí
Univali
Incremental Tecnologia
English version
Measuring the Impact of Network Latency at Twitter (ScyllaDB)
Widya Salim and Victor Ma will outline the causal impact analysis, framework, and key learnings used to quantify the impact of reducing Twitter's network latency.
7 Most Powerful Solar Storms in the History of Earth.pdf (Enterprise Wired)
Solar storms (geomagnetic storms) result from accelerated charged particles moving at high velocities in the solar environment due to coronal mass ejections (CMEs).
Details of description part II: Describing images in practice - Tech Forum 2024 (BookNet Canada)
This presentation explores the practical application of image description techniques. Familiar guidelines will be demonstrated in practice, and descriptions will be developed “live”! If you have learned a lot about the theory of image description techniques but want to feel more confident putting them into practice, this is the presentation for you. There will be useful, actionable information for everyone, whether you are working with authors, colleagues, alone, or leveraging AI as a collaborator.
Link to presentation recording and transcript: https://bnctechforum.ca/sessions/details-of-description-part-ii-describing-images-in-practice/
Presented by BookNet Canada on June 25, 2024, with support from the Department of Canadian Heritage.
TrustArc Webinar - 2024 Data Privacy Trends: A Mid-Year Check-In (TrustArc)
Six months into 2024, and it is clear the privacy ecosystem takes no days off!! Regulators continue to implement and enforce new regulations, businesses strive to meet requirements, and technology advances like AI have privacy professionals scratching their heads about managing risk.
What can we learn about the first six months of data privacy trends and events in 2024? How should this inform your privacy program management for the rest of the year?
Join TrustArc, Goodwin, and Snyk privacy experts as they discuss the changes we’ve seen in the first half of 2024 and gain insight into the concrete, actionable steps you can take to up-level your privacy program in the second half of the year.
This webinar will review:
- Key changes to privacy regulations in 2024
- Key themes in privacy and data governance in 2024
- How to maximize your privacy program in the second half of 2024
Implementations of Fused Deposition Modeling in real world (Emerging Tech)
The presentation showcases the diverse real-world applications of Fused Deposition Modeling (FDM) across multiple industries:
1. **Manufacturing**: FDM is utilized in manufacturing for rapid prototyping, creating custom tools and fixtures, and producing functional end-use parts. Companies leverage its cost-effectiveness and flexibility to streamline production processes.
2. **Medical**: In the medical field, FDM is used to create patient-specific anatomical models, surgical guides, and prosthetics. Its ability to produce precise and biocompatible parts supports advancements in personalized healthcare solutions.
3. **Education**: FDM plays a crucial role in education by enabling students to learn about design and engineering through hands-on 3D printing projects. It promotes innovation and practical skill development in STEM disciplines.
4. **Science**: Researchers use FDM to prototype equipment for scientific experiments, build custom laboratory tools, and create models for visualization and testing purposes. It facilitates rapid iteration and customization in scientific endeavors.
5. **Automotive**: Automotive manufacturers employ FDM for prototyping vehicle components, tooling for assembly lines, and customized parts. It speeds up the design validation process and enhances efficiency in automotive engineering.
6. **Consumer Electronics**: FDM is utilized in consumer electronics for designing and prototyping product enclosures, casings, and internal components. It enables rapid iteration and customization to meet evolving consumer demands.
7. **Robotics**: Robotics engineers leverage FDM to prototype robot parts, create lightweight and durable components, and customize robot designs for specific applications. It supports innovation and optimization in robotic systems.
8. **Aerospace**: In aerospace, FDM is used to manufacture lightweight parts, complex geometries, and prototypes of aircraft components. It contributes to cost reduction, faster production cycles, and weight savings in aerospace engineering.
9. **Architecture**: Architects utilize FDM for creating detailed architectural models, prototypes of building components, and intricate designs. It aids in visualizing concepts, testing structural integrity, and communicating design ideas effectively.
Each industry example demonstrates how FDM enhances innovation, accelerates product development, and addresses specific challenges through advanced manufacturing capabilities.
INDIAN AIR FORCE FIGHTER PLANES LIST.pdf (jackson110191)
These fighter aircraft have uses outside of traditional combat situations. They are essential in defending India's territorial integrity, averting dangers, and delivering aid to those in need during natural calamities. Additionally, the IAF improves its interoperability and fortifies international military alliances by working together and conducting joint exercises with other air forces.
Coordinate Systems in FME 101 - Webinar Slides (Safe Software)
If you’ve ever had to analyze a map or GPS data, chances are you’ve encountered and even worked with coordinate systems. As historical data continually updates through GPS, understanding coordinate systems is increasingly crucial. However, not everyone knows why they exist or how to effectively use them for data-driven insights.
During this webinar, you’ll learn exactly what coordinate systems are and how you can use FME to maintain and transform your data’s coordinate systems in an easy-to-digest way, accurately representing the geographical space that it exists within. During this webinar, you will have the chance to:
- Enhance Your Understanding: Gain a clear overview of what coordinate systems are and their value
* Learn Practical Applications: Why we need datums and projections, plus units between coordinate systems
- Maximize with FME: Understand how FME handles coordinate systems, including a brief summary of the 3 main reprojectors
- Custom Coordinate Systems: Learn how to work with FME and coordinate systems beyond what is natively supported
- Look Ahead: Gain insights into where FME is headed with coordinate systems in the future
Don’t miss the opportunity to improve the value you receive from your coordinate system data, ultimately allowing you to streamline your data analysis and maximize your time. See you there!
How RPA Help in the Transportation and Logistics Industry.pptx (SynapseIndia)
Revolutionize your transportation processes with our cutting-edge RPA software. Automate repetitive tasks, reduce costs, and enhance efficiency in the logistics sector with our advanced solutions.
Are you interested in dipping your toes in the cloud native observability waters, but as an engineer you are not sure where to get started with tracing problems through your microservices and application landscapes on Kubernetes? Then this is the session for you, where we take you on your first steps in an active open-source project that offers a buffet of languages, challenges, and opportunities for getting started with telemetry data.
The project is called OpenTelemetry, but before diving into the specifics, we'll start with de-mystifying key concepts and terms such as observability, telemetry, instrumentation, cardinality, and percentile, to lay a foundation. After understanding the nuts and bolts of observability and distributed traces, we'll explore the OpenTelemetry community: its Special Interest Groups (SIGs), repositories, and how to become not only an end-user, but possibly a contributor. We will wrap up with an overview of the components in this project, such as the Collector, the OpenTelemetry protocol (OTLP), its APIs, and its SDKs.
Attendees will leave with an understanding of key observability concepts, become grounded in distributed tracing terminology, be aware of the components of openTelemetry, and know how to take their first steps to an open-source contribution!
Key Takeaways: Open source, vendor neutral instrumentation is an exciting new reality as the industry standardizes on openTelemetry for observability. OpenTelemetry is on a mission to enable effective observability by making high-quality, portable telemetry ubiquitous. The world of observability and monitoring today has a steep learning curve and in order to achieve ubiquity, the project would benefit from growing our contributor community.
The DealBook is our annual overview of the Ukrainian tech investment industry. This edition comprehensively covers the full year 2023 and the first deals of 2024.
1. CONTENTdm Interoperability -- Leveraging resources; repurposing collections. ALA Annual, New Orleans, LA, June 23rd, Friday, 9 am to noon. Claire Cocco, Product Manager; Geri Ingram, Customer Service Specialist; DiMeMa, Inc.
2. Agenda Part 1 9:00 to 10:15 Mainstream digital objects into existing workflows Importing from legacy systems Exporting Example of collaborative development for interoperability METS transform (courtesy of CDL) [BREAK 10:15 TO 10:30]
3. Agenda Part 2 10:30 to 11:30 Customizing and integrating your CONTENTdm site Web templates Custom Queries and Results Configuration files
4. Agenda Part 3 11:30 to Noon Handling Finding Aids Importing EAD files into CONTENTdm
5. Setting the context: fully engaged in digital library transformation Library services and collections expanding to encompass all Traditional to digital Licensed Reformatted Sharing Preserving
6. Leveraging resources Staff time and skills throughout the organization and/or consortium Existing metadata in some form Existing digital collections (images and transcripts)
7. Why? For better customer service In order to mainstream your processing and amplify your efforts. Your digital collections should ultimately be mainstreamed into regular workflows, similar to the ones used for other materials (whether that’s done centrally or in a distributed fashion). This includes selection, technical processing (cataloging, organizing, importing), integration with site vis-à-vis presentation and archiving.
8. Mainstreaming processing of digital formats (Part 1 of 3) Importing from other systems to CONTENTdm Exporting from CONTENTdm Example of collaborative development for interoperability CONTENTdm Standard Export METS transform for import
9. I. Importing from other systems to CONTENTdm: metadata only (when records describe items that are not yet scanned; replace "null" files at a later time), or metadata AND their digital files.
10. From an OPAC or other database system When you have… Individual image files cataloged already And can export from an OPAC or other dbms Or where you have compound digital objects ready for migration
11. Migration steps: Prepare the collection and the import files Cross-walk metadata to Dublin Core Configure the CONTENTdm collection fields Export and prep data in a tab-delimited ASCII file Import the file to CONTENTdm
12. Data prep: common problems in tab-delimited data files: extra data in columns or rows; extra tabs at end of line; extra CRs at end of file (there should be only 1 CR); carriage returns or tabs inside metadata; referenced files must exist; 0 versus O; the error may occur in a previous record, so check a few rows before and after the error; file names are required, not full pathnames.
13. Data prep: troubleshooting with Excel: use Microsoft Excel to open the file and view the data; each row should be an item with the last column as the filename; work with small batches to find errors, adding items until the record with the error is found; use Excel's "CLEAN" function to remove invisible characters; importing images from a directory without using a tab-delimited file checks for any type of imaging error.
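As an illustrative complement to the Excel workflow above (not part of the original deck), a short Java sketch that flags rows whose column count does not match the collection schema; the file name and expected width are assumptions:

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

public class TabDelimitedCheck {
    public static void main(String[] args) throws IOException {
        int expectedColumns = 5; // assumed width of the collection schema
        try (BufferedReader in = new BufferedReader(new FileReader("import.txt"))) {
            String line;
            int row = 0;
            while ((line = in.readLine()) != null) {
                row++;
                // The -1 limit keeps trailing empty fields, so extra tabs
                // at the end of a line are counted rather than silently dropped.
                String[] fields = line.split("\t", -1);
                if (fields.length != expectedColumns) {
                    System.err.printf("Row %d has %d columns, expected %d%n",
                            row, fields.length, expectedColumns);
                }
            }
        }
    }
}
```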
14. Demo: MARC to DC. Export MARC records to a tab-delimited text file (using the ILS or MarcEdit); format and clean up the text file to conform to your CONTENTdm Collection schema; import the file (with or without images) to the Collection.
15. Importing compound objects For documents, postcards, monographs and picture cubes Can do singly or in batch Much easier to start with singles, then set up for batch when process is smooth
16. Migrate compound objects from another database system Where you have many compound digital objects to migrate Prepare the collection and the import files Cross-walk metadata to Dublin Core Configure the CONTENTdm collection fields Configure folders for scans and transcripts (if appropriate) Choose an import method based on your data structure Create tab-delimited ASCII file(s) appropriate to the method Import the files to CONTENTdm in batches
17. Multiple compound object wizard Documented in online tutorial Today’s demo described in handout Four import methods for multiple object loading Compound object (same as single, but upload batched) Directory Structure (most flexible and efficient) Object List (useful when NO page-level metadata) Job List Time allowing, demonstrate three different object types using 3 of 4 methods
18. Choose a multiple compound import method based on your data:

                 Compound Object   Directory Structure   Object List (no page-level metadata)
    Postcards    YES               YES                   YES *
    Documents    YES               YES *                 YES
    Monograph    YES *             YES                   YES

    * Will demo
19. Do you have page-level metadata for the compound objects?
    - Yes: Are your scan files separated into compound object directories?
      - Yes: use DIRECTORY STRUCTURE.
      - No: create compound object directories for EACH compound object, then use DIRECTORY STRUCTURE.
    - No: Do you have one tab-delimited text file containing ALL the objects?
      - Yes: Are they all the same type of compound object?
        - Yes: use OBJECT LIST.
        - No: break up into batches by type, then use OBJECT LIST.
      - No: Do you have tab-delimited text files for EACH compound object?
        - Yes: use DIRECTORY STRUCTURE.
        - No: create a text file listing all compound objects and object metadata, or create a text file for each compound object.
20. Every one of the four CONTENTdm compound object importing methods Requires object-level metadata Requires preparation File naming, keeping sort order in mind Each object has its own directory for scans May use tab-delimited text file(s) Accommodates transcripts
21. A word about descriptive page-level metadata Supported by some but not all four import methods NOT supported by Object List At page level, Title is the only required field Technical metadata can be generated by the Template Creator
22. More on transcripts Typescripts and transcripts Require a field designated with the data type “Full Text Search” Inserted into the metadata field of the scanned page during import Through use of a .txt file, or By Template Creator if the OCR Extension is in use Or by “Directory Import” as with early versions of CONTENTdm Transcripts and typescripts are supported by all four methods (i.e., not considered “metadata” for purposes of this discussion)
23. Demo: Import Multiple Compound Objects Monograph using Compound Object method Postcards using Object List method Documents using Directory Structure method
24. II. Exporting from CONTENTdm To ASCII tab-delimited with field headers To XML: Standard Dublin Core: only DC Custom: all fields, including local, but not structure CDM Standard: all fields, including structure
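To make the XML flavors concrete, a Standard Dublin Core record in an export looks roughly like the sketch below; the wrapper element and all field values here are invented for illustration, and the exact structure depends on the export option chosen.

    <record xmlns:dc="http://purl.org/dc/elements/1.1/">
      <dc:title>Main Street, looking north</dc:title>
      <dc:creator>Unknown photographer</dc:creator>
      <dc:date>1912</dc:date>
      <dc:type>Image</dc:type>
      <dc:identifier>postcard_0042</dc:identifier>
    </record>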
25. III. Examples of collaboration for interoperability Web integration through search engines, RSS OAI harvesting Enable at collection or server level Choose to suppress <pagedata> or not WorldCat registration Open WorldCat integration
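Harvesting follows the standard OAI-PMH request pattern. The base URL below is a placeholder (the real endpoint depends on your server configuration), but the verbs and parameters are part of the protocol:

    http://your.server.edu/oai/oai.php?verb=Identify
    http://your.server.edu/oai/oai.php?verb=ListRecords&metadataPrefix=oai_dc
    http://your.server.edu/oai/oai.php?verb=ListRecords&metadataPrefix=oai_dc&set=postcards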
26. CONTENTdm and a new METS transform Info available on USC in July Code at SourceForge Windows-oriented
28. What is/are METS? Why is/are METS good? What is 7train? How do I use 7train? What do I get from 7train? How do I get 7train?
29. What is/are METS? METS (Metadata Encoding and Transmission Standard) is an XML-based standard for encoding metadata to describe objects (digital or otherwise) within a digital library. See http://www.loc.gov/standards/mets/ for more information
30. What is/are METS? A METS document is built from six major sections; structMap is required, the others are optional:
    metsHdr: metadata about this particular METS document (encoder, contact info, etc.)
    dmdSec: descriptive metadata (title, author, subjects, etc.)
    amdSec: metadata for the management of the object (technical details, object history, etc.)
    fileSec: a list of the files that make up the object
    structMap: a description of the structure of the object, i.e. how the files fit together
    behaviorSec: what to do with the object (machine-actionable instructions)
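A minimal METS instance, with invented identifiers, dates, and file names, might look like this sketch:

    <?xml version="1.0" encoding="UTF-8"?>
    <mets xmlns="http://www.loc.gov/METS/"
          xmlns:dc="http://purl.org/dc/elements/1.1/"
          xmlns:xlink="http://www.w3.org/1999/xlink">
      <metsHdr CREATEDATE="2005-07-01T12:00:00"/>
      <dmdSec ID="dmd1">
        <mdWrap MDTYPE="DC">
          <xmlData>
            <dc:title>Main Street, looking north</dc:title>
          </xmlData>
        </mdWrap>
      </dmdSec>
      <fileSec>
        <fileGrp>
          <file ID="file1" MIMETYPE="image/jpeg">
            <FLocat LOCTYPE="URL" xlink:href="page1.jpg"/>
          </file>
        </fileGrp>
      </fileSec>
      <structMap>
        <div TYPE="postcard" LABEL="Sample object">
          <div TYPE="page" ORDER="1">
            <fptr FILEID="file1"/>
          </div>
        </div>
      </structMap>
    </mets>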
31. Why METS? To be able to add your objects to other collections and increase the visibility of your institution's assets.
32. What is 7train? 7train is an XSL-based tool for converting XML documents - in this case CONTENTdm exports describing objects managed in the CONTENTdm system - into METS objects suitable for submission to a digital library system, such as the California Digital Library's Online Archive of California. 7train is a platform-independent, standalone tool that was designed to work on any system and to be simple to use.
33. How does 7train work? It is as easy as dragging your CONTENTdm XML export file onto an executable file.
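To illustrate the XSL approach (this is not 7train's actual stylesheet, and the input element names are assumptions about the shape of a CONTENTdm export: one <record> per file, with <title> and <page file="..."/> children), a transform of this kind maps export records into METS sections:

    <?xml version="1.0" encoding="UTF-8"?>
    <xsl:stylesheet version="1.0"
        xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
        xmlns:mets="http://www.loc.gov/METS/"
        xmlns:dc="http://purl.org/dc/elements/1.1/">
      <xsl:output method="xml" indent="yes"/>
      <xsl:template match="/record">
        <mets:mets>
          <mets:dmdSec ID="dmd1">
            <mets:mdWrap MDTYPE="DC">
              <mets:xmlData>
                <dc:title><xsl:value-of select="title"/></dc:title>
              </mets:xmlData>
            </mets:mdWrap>
          </mets:dmdSec>
          <mets:structMap>
            <mets:div TYPE="document">
              <xsl:for-each select="page">
                <mets:div TYPE="page" ORDER="{position()}" LABEL="{@file}"/>
              </xsl:for-each>
            </mets:div>
          </mets:structMap>
        </mets:mets>
      </xsl:template>
    </xsl:stylesheet>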
37. References & Links 7train Home: http://seventrain.sourceforge.net 7train Download: http://seventrain.sourceforge.net/7train_download.html CONTENTdm: http://www.dimema.com METS: http://www.loc.gov/standards/mets/ XSL: http://www.w3.org/Style/XSL/ The California Digital Library: http://www.cdlib.org The Online Archive of California: http://www.oac.cdlib.org
38. Interoperability [Architecture diagram: existing and new CONTENTdm libraries (10K/50K/unlimited objects) and other CONTENTdm sites feed a CONTENTdm Multi-Site Server; OAI supplies XML Dublin Core to the Web, a regional union catalog, and other digital archives; MARC records and Open WorldCat connect WorldCat and OPACs, serving librarians, archivists, and library users.]
39. BREAK—15 minutes This concludes Part 1 To come after the break: Part 2 Customization Part 3 Finding Aids
40. Customizing and integrating your CONTENTdm site (Part 2 of 3) Web templates Custom Queries and Results Configuration files
41. CONTENTdm Web Templates Customizable for integration Designed to support a broad range of users Small to large organizations Beginners to experts Use out of the box with minimal customization Basic customization requires minimal HTML skills Fully customizable, including advanced extensions Based on a PHP API (PHP: Hypertext Preprocessor; API: Application Programming Interface)
42. Basic Customizations Minimal skills needed Easy to make changes Global include files Variables Recommend all organizations do basic customizations Header (name/logo), contact e-mail address, colors, about page, home page http://www.contentdm.com/help4/custom/templates.html
43. Getting Started Access to Web server docs directory HTML editor or text editor Design plan Logo or other graphics Backup copy of original files
44. Customization Demo http://sr.contentdmdemo.com Files located in /cdm4 directory /includes/global_header.php /client/LOC_global.php /client/STY_global_style.php about.php browse.php results.php New logo saved in /cdm4/images/
45. Advanced Customizations Experience with HTML, PHP, and JavaScript needed Customize the look of each collection University of Nevada, Reno Web Template extensions E-commerce (University of Utah, Oregon State University) Comment forms (SENYLRC, Enoch Pratt Free Library, OSU) Custom metadata display (University of Oregon) QuickTime video (Williams College) http://www.contentdm.com/customers/index.html
46. Examples of Advanced Customizations University of Nevada, Reno http://imageserver.library.unr.edu/ University of Utah http://www.lib.utah.edu/digital/bodmer/ Oregon State University http://digitalcollections.library.oregonstate.edu/cdm4/client/bracero/ SENYLRC http://www.hrvh.org/ Enoch Pratt Free Library http://www.mdch.org/ Williams College http://contentdm.williams.edu/
47. Customization Tips Always make a backup! Be aware of encoding (UTF-8 vs. ASCII) See what other users are doing Share, borrow, and copy ideas and code http://www.contentdm.com/customers/index.html Listserv Document changes Document which files are edited and what code changes are made, to ease upgrading to newer versions
48. Custom Queries and Results (CQR) Create predefined, custom queries Virtual collections Guide users to specific results Integrate with other sites Multiple options Simple hyperlink, drop-down list, index box, text box, browse Easy to use Wizard generates code to copy and paste into Web pages Documentation http://www.contentdm.com/help4/custom/cqr.html http://www.contentdm.com/USC/tutorials/cqr.pdf
49. CQR DEMO Generate code using CQR Copy and paste into Web pages May need to change the path Customize as desired (see the hypothetical snippet below)
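The wizard's output is ordinary HTML you paste into a page. As a shape-of-the-thing illustration only (the query parameter below is a placeholder, not the wizard's actual output), a predefined query can be as simple as a hyperlink to the results page:

    <!-- hypothetical CQR link; paste the wizard-generated version instead -->
    <a href="http://your.server.edu/cdm4/results.php?query=postcards">
      Browse the postcard collection
    </a>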
50. Configuration Files Customizable files that reside on the server Stop words Full text field stop words – fullstop.txt Automatic hyperlink stop words – stopwords.txt http://www.contentdm.com/help4/custom/stopwords.html Image viewer Customize how images are displayed – imageconf.txt For all collections or per collection http://www.contentdm.com/help4/custom/zoompan.html
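Both stop word files are plain text. Assuming one entry per line (check the stop words documentation above for the exact syntax), a fragment might look like:

    a
    an
    and
    of
    the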
51. imageconf.txt Demo Located in the /conf directory on the CONTENTdm server Can be changed globally or for individual collections To change the zoom and pan default settings for a particular collection, copy the imageconf.txt file from the Server/conf directory to the index/etc directory of the collection(s) you wish to modify Make a backup copy!
52. Introduction to Finding Aids How many of you have them? Are they digital documents or paper? If digital, are they XML? Basic approach: create documents or monographs and use HTTP links XML approach: use the EAD DTD and a style sheet for display
54. Current EAD Support Import of EAD files Automatic text extraction from EAD files when: The file extension of the EAD is .xml The file includes a header record beginning with DOCTYPE ead The collection has a full text search field The full text search field is empty when the item is added to the collection Up to 128,000 characters are extracted from the following fields and placed in the full text search field: titleproper, title, unittitle, persname, famname, corpname, genreform (a sketch of this kind of extraction follows)
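This is not CONTENTdm's implementation, but a minimal Python sketch of what extraction along these lines does; the file name is a placeholder, and the element list comes from the slide above. It assumes an EAD 2002 file without namespaces and reads only each element's direct text.

    import xml.etree.ElementTree as ET

    FIELDS = {"titleproper", "title", "unittitle",
              "persname", "famname", "corpname", "genreform"}
    LIMIT = 128_000   # character cap described above

    tree = ET.parse("finding_aid.xml")   # hypothetical EAD file
    chunks = []
    for elem in tree.iter():
        # collect the direct text of matching elements (nested markup ignored)
        if elem.tag in FIELDS and elem.text:
            chunks.append(" ".join(elem.text.split()))

    full_text = " ".join(chunks)[:LIMIT]
    print(full_text[:200])               # preview what would be indexed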
55. Current EAD Support Display determined by the style sheet (XSLT, CSS) Client-side parsing Affected by the Web browser (see the declarations below)
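Client-side display is wired up with standard declarations at the top of each finding aid; the paths below are placeholders for wherever the DTD and stylesheet actually live on your server:

    <?xml version="1.0" encoding="UTF-8"?>
    <!DOCTYPE ead SYSTEM "http://your.server.edu/ead/ead.dtd">
    <?xml-stylesheet type="text/xsl" href="http://your.server.edu/ead/ead.xsl"?>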
57. EAD Demo Configure Full Text Search field Store DTD and style sheet on server Edit path to DTD and XSLT in EAD files Import (single or batch) Add metadata Custom thumbnail if desired Upload, approve, index
58. Custom EAD Extension Example by Oregon State University Terry Reese, [email_address] Customized Web templates Client-side or server-side parsing Integrates display in templates VBScript for extracting metadata from EAD to a tab-delimited text file www.contentdm.com/USC/templates/index.asp
60. Announcing new exposure for your CONTENTdm Collections Collection of Collections http://collections.contentdmdemo.com/ (also featured at contentdm.com/customers) Harvesting metadata from Collection sites at: http://primarysources.contentdmdemo.com Uses CONTENTdm Multi-site server