This document describes workflows developed by Utah State University and the University of Nevada, Las Vegas to streamline metadata creation between special collections and digital initiatives departments. The workflows convert finding aid information into Dublin Core for uploading item records to a digital repository, and batch-link digitized content back to finding aids. The processes are designed to be easily taught and performed by staff at all levels, automating metadata work and making it more flexible.
ARK de Triumph: Linking Finding Aids & Digital Libraries Using a Low-Tech App... – Andrea Payant
This document describes a low-tech approach developed by Utah State University to link finding aids to digital content using Archival Resource Keys (ARKs). The approach aims to make the process flexible and accessible to various library staff and student workers without requiring in-depth XML training. It utilizes common office tools like Excel and Word along with XML editors. Dublin Core metadata is used to meet standards for two different consortiums while ARKs serve as persistent identifiers independent of any digital repository system. Step-by-step workflows are documented for applying this approach to born-digital archival materials.
Transparent Licenses: Making user rights clear (OLA Super Conference 2015) – Hong (Jenny) Jing
Recent changes to Canada’s Copyright Act have propelled copyright and licensed use into the spotlight at colleges and universities in Canada. This session will look at Queen’s and University of Toronto libraries’ experience implementing a licensing permissions workflow using the OCUL Usage Rights database (OUR). The systems covered are 360 Link, Summon, Voyager OPAC, and Endeca. We will explain how to implement the license links with and without using an API.
This document provides an overview of the open source electronic resource management system CORAL. It begins with a brief history of ERM systems and an introduction to CORAL. Next, it reviews literature about CORAL implementations at various universities. It then provides a tour of CORAL's modules for resources, licensing, organizations, and usage statistics. The document concludes with a case study of CORAL's implementation at East Carolina University and their experiences getting the most out of the system.
Discovery layer decisions, configurations and strategies – Ray Schwartz
What are discovery layers
What brought about this topic and the five libraries chosen
How did they implement
How have they assessed
What modifications were made
Conclusions
Getting on the Same Page: Aligning ERM and LibGuides Content – NASIG
The document discusses efforts at the University of North Texas libraries to align their electronic resource management (ERM) system data with their LibGuides subject and course guides. This included cleaning up subject headings, migrating data from their ERM to populate the LibGuides A-Z database list, and using spreadsheets to match records and import fields to enhance the A-Z list entries. The goals were to centralize electronic resource information management, improve the user experience of finding resources, and establish workflows for regular synchronization between the ERM and LibGuides systems.
Many projects in the digital humanities involve either digitization or enrichment of existing digital materials. In both cases, the process can be understood as a workflow. This paper intends to discuss new forms of interaction for workflow tools. We use the workflow present in the Orlando Project to illustrate how structured surfaces can be useful in helping scholars track changes across large projects.
This work was presented in a panel session at SDH-SEMI 2012 in Waterloo, Canada.
See more here: http://luciano.fluxo.art.br/?p=316
AALL 2015: Hands on Linked Data Tools for Catalogers: MarcEdit and MARCNext – Terry Reese
MarcEdit and MARCNext provide linked data tools for catalogers to experiment with BIBFRAME and linked data concepts. The MARCNext tools allow visualization of metadata in various BIBFRAME formats and embedding URIs. The Link Identifiers tool embeds URIs from sources like VIAF, ID.LOC.GOV, and FAST in authority fields. The SPARQL Browser allows querying SPARQL endpoints. The Zepheira training plugin transforms records for the LibHub initiative. However, many linking services are still evolving and local implementations vary, posing challenges for widespread adoption.
Presentation given to our local cataloging and discovery unit. The meeting discussed the current state of Linked Data in libraries, as well as how we can experiment with tools like MarcEdit.
Linked Data is exploding in the library world, but the biggest problems libraries face are finding the time and money to convert their records, investigating Linked Data programs, finding community support, and handling the various other issues that arise as part of developing new methods. Likewise, one of the biggest hurdles for libraries and linked data is that they do not know what to do to get involved. As we have fewer people available and smaller budgets each year, we would like to explore ways in which libraries can get involved in the process without expending an undue amount of their already dwindling resources. To see how linked data can be applied, we will look at the example of the Smithsonian Libraries (SIL). Over the past 18 months, SIL has been preparing for the transition from MARC to linked open data. This session will talk about various SIL projects and initiatives (such as the FAST headings project and the introduction of Wikidata and Wikibase); how to incorporate linked data elements into MARC records; and how to develop staff and give them proficiency with new tools and workflows.
Heidy Berthoud, Head, Resource Description, Smithsonian Libraries
COMPanion Corporation Alexandria by Nancy Garcia, Luis Mercado, Elizabeth Tan... – Louminous Mercado
Take a look at the history, strengths, & weaknesses of the COMPanion Corporation Alexandria system as well as whether we recommend it or not. The only way you'll find out is by giving our presentation a look!
This presentation was provided by Fred Reiss of the University of Oklahoma for the NISO webinar, Integrating Library Management Systems, held on June 8, 2016.
This presentation was given by Michael Lauruhn of Elsevier Labs during the NISO Virtual Conference, BIBFRAME & Real World Applications of Linked Bibliographic Data, held on June 15, 2016.
Presentation given at Cilip ARLG/MmIT day conference on "Research(er) Workflows in the Real World" on 9 Dec 2019 at the British Library Conference Centre. Conference summary at: https://mmitblog.wordpress.com/2020/01/20/researcher-workflows-in-the-real-world-a-guest-review-from-our-bursary-winner/
Georgia Tech Drupal Users Group - February 2015 Meeting – Eric Sembrat
This document summarizes the February 2015 meeting of the Georgia Tech Drupal Users Group. It includes announcements about upcoming Drupal conferences, presentations from guest speakers on introductory and advanced Views topics, and relationships within Views. Presenters discussed how to map out data connections and leverage relationships to build powerful Views, Panels, and Pages.
#OSSPARIS19: How ONLYOFFICE helps organize research work... – Paris Open Source Summit
ONLYOFFICE, developed by Ascensio System SIA, is an open-source office suite based on the HTML5 Canvas element that offers a complete range of online editing tools for text documents, spreadsheets, and presentations.
This presentation begins with an overview of the basic principles:
- support for all common formats,
- a rich range of formatting tools,
- identical display of content regardless of the browser used,
- resources for extending the editors' functionality,
- advanced co-editing capabilities,
- secure real-time data transfer.
The number of universities and schools opting for open-source alternatives to the popular solutions offered by the big brands grows every year. ONLYOFFICE solutions are currently used by more than 30 educational institutions in France, such as the thirteen Sorbonne universities, the Université de Grenoble, the Université de Nantes, the École Nationale d'Ingénieurs de Brest, and the public institution Campus Condorcet.
In this part, Jeremy Maton, Systems and Network Administrator at the Institut de Biologie de Lille, presents how ONLYOFFICE is integrated within their research unit and helps organize its workflow.
This presentation was provided by Ellen Bishop of the Florida Virtual Campus for the NISO webinar, Integrating Library Management Systems, held on June 8, 2016.
Show 'Em What You've Got: Exposing Finding Aids with ArchivesSpace – Angela Kroeger
This document summarizes the University of Nebraska at Omaha's implementation of ArchivesSpace to create online finding aids for its archives and special collections. It describes ArchivesSpace and why it was chosen, the process of loading legacy data and enhancing records, and ongoing work to add collections. The presentation includes a live demo of the public and staff interfaces of UNO's ArchivesSpace installation.
Don’t make me think: biodiversity data publishing made easy – Vince Smith
Presented by Vince Smith at the iEvoBio 2013 meeting in Snowbird, Utah, USA on 25th June, 2013. The presentation coauthors are Alice Heaton, Laurence Livermore, Simon Rycroft and Ben Scott from the Natural History Museum, London, and Lyubomir Penev from Pensoft Publishing, Bulgaria.
This document summarizes a presentation given by Judy McNally and Doreen Herold of Lehigh University about the challenges facing their technical services department and how they are adapting workflows to address changing trends. Key challenges include acquiring fewer print materials, an explosion of digital resources, reduced budgets, and changing staff roles. The department is shifting from print to electronic serials, outsourcing more work, and cross-training staff. Staff are taking on new roles like resolving access issues for electronic journals and doing more batch cataloging of materials like ETDs and SpringerLink titles. The department is also exploring new cataloging solutions like OLE.
Design decisions relating to ChiToolbox, presented at the Kick-off Meeting for OpenVibSpec, 3 February 2020, in Bochum, Germany.
ChiToolbox is an open source MATLAB toolbox for handling data from hyperspectral imaging experiments.
https://bitbucket.org/AlexHenderson/ChiToolbox/
https://openvibspec.org/
Walk this way: Online content platform migration experiences and collaboration – NASIG
In this session, a librarian and a publisher share their perspectives on content platform migrations, and the Working Group Co-chairs will describe the group’s efforts to date and expected outcomes. Our publisher-side speaker will describe issues they must consider when their content migrates, such as providing continuous access, persistent linking, communicating with stakeholders, and working with vendors. Our librarian speaker will describe their experience and steps they take during migrations, such as receiving notifications about migrations, identifying affected e-resources, updating local systems to ensure continuous access, and communicating with their front-line staff and patrons.
The document discusses different NoSQL database models and when each may be appropriate to use. It notes that relational databases can scale with enough effort, but using the proper NoSQL model for one's data avoids unnecessary layers of abstraction. Key-value stores are best for simple dictionaries or session data, while document stores allow for querying inner document values and are well-suited for documents like blogs. Column-family databases are optimized for high write volumes with small chance of collisions. Graph databases are best for data inherently involving nodes and relationships like social networks. The best approach is "polyglot persistence", using the database model that best represents each slice of data rather than forcing all data into a single model.
This document summarizes Susan Johns-Smith's work coordinating library system integrations at Pittsburg State University. It describes integrating the university's discovery platform, Summon, with their ILS after migrating from Dynix to Sierra in 2014. The integration projects included mapping MARC data in Summon to items in the online catalog, managing additions and deletions daily via FTP, and refreshing the full database quarterly. It also discusses integrating Sierra with their Encore discovery interface and implementing various APIs for functions like patron loads and fines/fees. Finally, it provides strategies for basic integration approaches and considerations around profile setup, APIs, identity management, and utilizing discovery integration.
Informatics and data analysis - McMahon - MEWE 2013 – mcmahonUW
Trina McMahon discusses best practices for computer-based work in bioinformatics and data analysis. She recommends treating computer work with the same care as lab work, learning to use the command line interface, and seeking help from local and global networks. She also stresses the importance of organizing files consistently, modifying systems as needs change, and staying up to date by reading literature and following experts on social media.
This document provides an introduction and overview of an IS220 Database Systems course. It outlines that the course will cover topics like database design, file organization, indexing and hashing, query processing and optimization, transactions, object-oriented and XML databases. It notes that the class will be 70% theory and 30% hands-on assignments completed in pairs. Assessment will include group work, tests, and a final exam. Class rules require punctuality, use of English, dressing professionally, and minimum 80% attendance.
CSV-X is a schema language, model, and processing engine for non-uniform CSV enabling annotation, validation, cross-referencing, Linked Data, RDF serialization, and transformation to other formats.
This document summarizes a research paper about reengineering PDF documents containing complex software specifications into multilayer hypertext interfaces. The paper proposes extracting the logical structure and text from PDFs, transforming them into XML, and generating multiple interconnected HTML pages. It describes techniques for extracting figures, tables, lists and concepts to produce navigable outputs that improve on original PDFs and HTML conversions. The framework is evaluated on its usability and architecture with the goal of future work expanding its capabilities to other document formats.
This document provides an overview of the CS639: Data Management for Data Science course. It discusses that data science is becoming increasingly important as more fields utilize data-driven approaches. The course will teach students the basics of managing and analyzing data to obtain useful insights. It will cover topics like data storage, predictive analytics, data integration, and communicating findings. The goal is for students to learn fundamental concepts and design data science workflows and pipelines. The course will include lectures, programming assignments, a midterm, and final exam.
The document provides an overview of database management systems (DBMS). It begins by introducing the presenters and the objective: to make the audience knowledgeable about DBMS fundamentals and improvements. The contents section outlines topics such as the introduction, data, information, database components, what a DBMS is, the database administrator, database languages, advantages and disadvantages of DBMS, examples of DBMS like SQL Server, and applications of DBMS.
Enterprise Data World 2018 - Building Cloud Self-Service Analytical Solution – Dmitry Anoshin
This session will cover building a modern Data Warehouse by migrating from a traditional DW platform into the cloud, using Amazon Redshift and the cloud ETL tool Matillion in order to provide Self-Service BI for the business audience. It will cover the technical migration path of a DW with PL/SQL ETL to Amazon Redshift via Matillion ETL, with a detailed comparison of modern ETL tools. Moreover, this talk will focus on working backward through the process, i.e. starting from the business audience and the needs that drive changes in the old DW. Finally, it will cover the idea of self-service BI, and the author will share a step-by-step plan for building an efficient self-service environment using the modern BI platform Tableau.
Australian Service Manager User Group. Presentation deck from our Knowledge Event in February 2015. Head to our website to see a recording of the event.
The document discusses the history of database management and database models through 6 generations from 1900 to present. It describes the evolution from early manual record keeping systems to current big data technologies. Key database models discussed include hierarchical, network, relational, object-oriented, and dimensional models. The document also covers topics like data warehousing and data mining.
This document provides an overview of a course on data structures and algorithms. The course covers fundamental data structures like arrays, stacks, queues, lists, trees, hashing, and graphs. It emphasizes good programming practices like modularity, documentation and readability. Key concepts covered include data types, abstract data types, algorithms, selecting appropriate data structures based on efficiency requirements, and the goals of learning commonly used structures and analyzing structure costs and benefits.
Day 4 - Excel Automation and Data Manipulation – UiPathCommunity
👉 Check out our full 'Africa Series - Automation Student Developers (EN)' page to register for the full program: https://bit.ly/Africa_Automation_Student_Developers
In this fourth session, we shall learn how to automate Excel-related tasks and manipulate data using UiPath Studio.
📕 Detailed agenda:
About Excel Automation and Excel Activities
About Data Manipulation and Data Conversion
About Strings and String Manipulation
💻 Extra training through UiPath Academy:
Excel Automation with the Modern Experience in Studio
Data Manipulation with Strings in Studio
👉 Register here for our upcoming Session 5/ June 25: Making Your RPA Journey Continuous and Beneficial: https://community.uipath.com/events/details/uipath-lagos-presents-session-5-making-your-automation-journey-continuous-and-beneficial/
The data science process document outlines the typical steps involved in a data science project including: 1) setting research goals, 2) retrieving data from internal or external sources, 3) preparing data through cleansing and transformation, 4) performing exploratory data analysis, 5) building models using techniques like machine learning or statistics, and 6) presenting and automating results. It also discusses challenges in working with different file formats and the importance of understanding various formats as a data scientist.
A machine learning and data science pipeline for real companies – DataWorks Summit
Comcast is one of the largest cable and telecommunications providers in the country built on decades of mergers, acquisitions, and subscriber growth. The success of our company depends on keeping our customers happy and how quickly we can pivot with changing trends and new technologies. Data abounds within our internal data centers and edge networks as well as both the private and public cloud across multiple vendors.
Within such an environment and given such challenges, how do we get AI, machine learning, and data science platforms built so our company can respond to the market, predict our customers’ needs and create new revenue generating products that delight our customers? If you don’t happen to be our friends and colleagues at Google, Facebook, and Amazon, what are technologies, strategies, and toolkits you can employ to bring together disparate data sets and quickly get them into the hands of your data scientists and then into your own production systems for use by your customers and business partners?
We’ll explore our journey and evolution and look at specific technologies and decisions that have gotten us to where we are today and demo how our platform works.
Speaker
Ray Harrison, Comcast, Enterprise Architect
Prashant Khanolkar, Comcast, Principal Architect Big Data
The document provides an overview of database management systems (DBMS). It defines DBMS as software that creates, organizes, and manages databases. It discusses key DBMS concepts like data models, schemas, instances, and database languages. Components of a database system including users, software, hardware, and data are described. Popular DBMS examples like Oracle, SQL Server, and MS Access are listed along with common applications of DBMS in various industries.
Importance of Data - Where to find it, how to store, manipulate, and characterize it
Artificial Intelligence (AI) - Introduction to AI & ML Technologies/Applications
Machine Learning (ML), Basic Machine Learning algorithms
Applications of AI & ML in Marketing, Sales, Finance, Operations, Supply Chain & Human Resources
Data Governance
Legal and Ethical Issues
Robotic Process Automation (RPA)
Internet of Things (IoT)
Cloud Computing
Scoping Level of Effort and Getting the Right Resources for the Job – Jason Kaufman
In this session, you will learn proven methods used for scoping the level of effort involved in achieving successful content project outcomes. You will also learn how to leverage this data to strengthen the case for the resources needed to successfully deliver on project goals.
How Skroutz S.A. utilizes Deep Learning and Machine Learning techniques to efficiently serve product categorization! Based on my talk at Athens PyData meetup!
The global PyData network promotes discussion of best practices, new approaches, and emerging technologies for data management, processing, analytics, and visualization
This document outlines the curriculum for the course "Elective Theory II - Data Science and Big Data" for the VI semester of the Diploma in Computer Engineering program. The course covers 5 units over 80 hours on data science fundamentals, data modeling, and big data concepts including storage and processing. The objectives are to understand data science techniques, apply data analysis in Python and Excel, learn about big data characteristics and technologies like Hadoop, and explore applications of big data. Topics include linear regression, classification models, MapReduce, and using big data in fields such as marketing, healthcare, and advertising.
Agile Data Science: Building Hadoop Analytics Applications – Russell Jurney
This document discusses building agile analytics applications with Hadoop. It outlines several principles for developing data science teams and applications in an agile manner. Some key points include:
- Data science teams should be small, around 3-4 people with diverse skills who can work collaboratively.
- Insights should be discovered through an iterative process of exploring data in an interactive web application, rather than trying to predict outcomes upfront.
- The application should start as a tool for exploring data and discovering insights, which then becomes the palette for what is shipped.
- Data should be stored in a document format like Avro or JSON rather than a relational format to reduce joins and better represent semi-structured
Similar to The Missing Link: Metadata Conversion Workflows for Everyone (20)
Avoiding a Level of Discontent in Finding Aids: An Analysis of User Engagemen... – Andrea Payant
As part of a multi-faceted research project examining user engagement with various types of descriptive metadata, Utah State University Libraries Cataloging and Metadata Services unit (CMS) investigated the discoverability of local Encoded Archival Description (EAD) finding aids. The research team put two versions of the same finding aid online, with one described at the file (box or folder) level and the other at the item level. Over a year later, the team pulled the analytics for each guide and assessed which descriptive level was most frequently accessed. The research team also looked at the type of search terms patrons utilized and where in the finding aid they were located. Usage data shows that personal names are the most common type of search term, that search terms are most commonly found in the Collection Inventory, and that the availability of item-level description improves discovery by an average of 6,100% over file-level descriptions.
How are MARC records performing in our search environment? This presentation will look at the process and results of a research project that analyzed how users’ search terms matched up with MARC fields, as well as how and where MARC records were displayed in search results lists. Presenters will discuss the process, the results of the project, and outline how attendees can implement similar research projects at their institutions, including tools and techniques they can use to analyze how their own records are surfacing in a search environment.
This document outlines best practices for building digital collections through community crowdsourcing efforts. It discusses strategies for gathering metadata and historical information from local communities in person through meetings with historical groups and individual interviews, as well as online through web forms and comments. Lessons learned include the importance of community partnerships, making the process approachable, and thanking contributors to encourage further participation.
At Utah State University, a pilot project is under development to evaluate the benefits of tracking data sets and faculty publications using the online catalog and the Library’s institutional repository.
With federal mandates to make publications and data open, universities look for solutions to track compliance. At Utah State University, the Sponsored Programs Office follows up with researchers to determine where data has been or will be deposited, per the terms of their grant.
Interested in making this publicly discoverable, the Library, Sponsored Programs, and Research Office are working together to pilot a project that enables the creation of publicly accessible MARC and Dublin Core records for data deposited by USU faculty. This project aims to make data sets, as well as publications, visible in research portals such as WorldCat, as well through Google searches.
This presentation will describe the project and anticipated benefits, as well as outline the roles of the cataloging staff and data librarian, and the involvement of the Research Office.
Mitigating the Risk: identifying Strategic University Partnerships for Compli... – Andrea Payant
Payant, A., Rozum, B., Woolcott, L. (2016). Mitigating the Risk: Identifying Strategic University Partnerships for Compliance Tracking of Research Data and Publications. International Federation of Library Associations (IFLA) Satellite Conference: Data in Libraries: The Big Picture
Just Keep Cataloging: How One Cataloging Unit Changed Their Workflows to Fit ... – Andrea Payant
Utah State University Libraries Cataloging and Metadata Services (CMS) unit, including student workers, transitioned to remote cataloging in March 2020 due to the COVID-19 pandemic. The presentation will outline the process undertaken by supervisors to evaluate and modify services and workflows to continue cataloging materials through the different phases of library capacity from shutting down most of the library, to a hybrid limited staff capacity, through staff back in the library full-time.
But Were We Successful: Using Online Asynchronous Focus Groups to Evaluate Li... – Andrea Payant
USU launched a program in 2016 to connect researchers seeking federal funding with librarians to assist them with data management. This program assisted over 100 researchers, but was it successful? Our presentation will discuss how we evaluated the success of this program using online asynchronous focus groups (OAFG) in conjunction with a traditional survey. Our cross-institutional research team will share our findings as well as the challenges and successes of using OAFGs to assess library services.
Assessment and Visualization Tools for Technical Services – Andrea Payant
A survey and demonstration of open source, freely available tools to help technical services units assess their work, collect and analyze data, create infographics, and visually demonstrate their impact on the library and their patrons.
The document discusses research data management at Utah State University (USU). It provides a history of USU's data management efforts beginning in 2013 with the creation of a campus committee and the hiring of a Data Librarian in 2015. The librarians developed a compliance program to meet federal requirements for data sharing and launched it in 2016. They now provide standard resources like a website and consultations, as well as non-standard services like annual communication with researchers regarding data deposit requirements. The document concludes with suggestions for backing up data using the "Rule of 3," describing data adequately, and organizing data files and directories.
liwalaawiiloxhbakaa (How We Lived): The Grant Bulltail Absáalooke (Crow Natio... – Andrea Payant
USU was selected to host a unique collection of oral histories from Grant Bulltail, Crow Storyteller and 2019 NEA National Heritage Fellow, representing the stories and knowledge of the Crow Nation as passed down by his ancestors. The collection spans 20+ years of field work and collaboration across library departments and regional partners.
Crowdsourcing Metadata Practices at USU – Andrea Payant
USU Libraries’ Cataloging and Metadata Unit has successfully investigated several methods to engage the public to involve them in the creation of metadata for USU’s Digital History Collections. Most, if not all the techniques we have tested have yielded positive results and have improved the relevancy and accuracy of our descriptive metadata.
Homeward Bound: How to Move an Entire Cataloging Unit to Remote Work – Andrea Payant
Utah State University Libraries Cataloging and Metadata Services (CMS) unit, including student workers, transitioned to remote cataloging in March 2020 due to the COVID-19 pandemic. This presentation will outline the process undertaken by supervisors to evaluate and modify services and workflows to continue cataloging service during the time when the library was shut down.
The document summarizes a research project conducted by the Cataloging and Metadata Services unit at Utah State University to analyze user search behavior and the performance of MARC records in search results. The project involved analyzing web logs of searches, scraping search results pages, and coding records and fields in Airtable. Key findings included that MARC records make up around 20% of search results on average, vendor records appear more often than locally created records, and the 245 and 505 fields were most important for retrieving records while the 505, 520 and 650 fields had the greatest impact if missing from records. Guidelines for cataloging practice were proposed based on the findings.
Outlines the development of the two single-service point and education initiatives, describes feedback gathered from a survey, and discusses how the Cataloging and Metadata Services unit plans to adapt services based on findings
Charting Communication: Assessment and Visualization Tools for Mapping the Co... – Andrea Payant
The document summarizes a study conducted by Becky Skeen, Liz Woolcott, and Andrea Payant at Utah State University on assessing communication patterns within their cataloging and metadata services department. They used interaction logs filled out by staff weekly and an anonymous survey distributed to other library departments. The study found lower than expected interaction with other technical services units and higher interaction with special collections. It also contradicted stereotypes of catalogers being withdrawn by finding most interactions were social. The data analysis tools used included Excel, Qualtrics, Tableau and OpenRefine. Conducting this assessment on a regular basis and expanding the research was recommended to provide more useful insights into communication over time.
Memes of Resistance, Election Reflections, and Voices from Drug Court: Social... – Andrea Payant
Folklorists and librarians have long championed social justice and advocacy issues. Today, the skills garnered through principled academic discourse, community-based ethnographic fieldwork, and ethical librarianship are being utilized to collect, preserve, present, and educate around social themes and issues. USU folklorists and librarians are working to create robust digital collections that focus on timely social issues with informed and ethical metadata.
Giving Credit Where Credit is Due: Author and Funder IDs – Andrea Payant
A process for incorporating standardized funder and author identifiers into institutional repository and ILS records associated with funded research data
VOCAB for Collaboration: How “Work Language” Can Help You Win at Teamwork – Andrea Payant
Clair Canfield's VOCAB model provides a framework for effective collaboration through vulnerability, ownership, communication, acceptance, and boundaries. The document discusses each element of the model and provides tips for incorporating them into teamwork. It suggests taking time for reflection, setting group agreements, embracing different communication styles, taking accountability, and accepting realities outside of one's control. Practicing these concepts can help teams work through challenges, utilize individual strengths, and adapt to change.
Can You Scan This For Me? Making the Most of Patron Digitization Request in t... – Andrea Payant
This document discusses Utah State University's process for handling patron requests to digitize materials from the archives. It outlines the evolution from self-serve scanning to a mediated scanning service with a charge. The main challenges are lack of consistency, turnaround time, and documentation. The solution was to create an online digitization request form and standardized workflow. Initial results showed around 90 requests since implementation, with most being made available online. Next steps include linking digital items to finding aids and expanding the process to more complex requests within collections.
Wisdom of the Crowd: Successful Ways to Engage the Public in Metadata Creation – Andrea Payant
Utah State University Libraries’ Cataloging and Metadata Unit has successfully used several methods to engage the public in metadata creation for USU’s Digital History Collections.
AI Risk Management: ISO/IEC 42001, the EU AI Act, and ISO/IEC 23894 – PECB
As artificial intelligence continues to evolve, understanding the complexities and regulations regarding AI risk management is more crucial than ever.
Amongst others, the webinar covers:
• ISO/IEC 42001 standard, which provides guidelines for establishing, implementing, maintaining, and continually improving AI management systems within organizations
• insights into the European Union's landmark legislative proposal aimed at regulating AI
• framework and methodologies prescribed by ISO/IEC 23894 for identifying, assessing, and mitigating risks associated with AI systems
Presenters:
Miriama Podskubova - Attorney at Law
Miriama is a seasoned lawyer with over a decade of experience. She specializes in commercial law, focusing on transactions, venture capital investments, IT, digital law, and cybersecurity, areas she was drawn to through her legal practice. Alongside preparing contract and project documentation, she ensures the correct interpretation and application of European legal regulations in these fields. Beyond client projects, she frequently speaks at conferences on cybersecurity, online privacy protection, and the increasingly pertinent topic of AI regulation. As a registered advocate of the Slovak bar, a certified data privacy professional in the European Union (CIPP/E), and a member of the international association ELA, she helps tech-focused startups and entrepreneurs, as well as international chains, to properly set up their business operations.
Callum Wright - Founder and Lead Consultant
Callum Wright is a seasoned cybersecurity, privacy, and AI governance expert. With over a decade of experience, he has dedicated his career to protecting digital assets, ensuring data privacy, and establishing ethical AI governance frameworks. His diverse background includes significant roles in security architecture, AI governance, risk consulting, and privacy management across various industries. Through thorough testing and successful implementation, he has consistently delivered exceptional results.
Throughout his career, he has taken on multifaceted roles, from leading technical project management teams to owning solutions that drive operational excellence. His conscientious and proactive approach is unwavering, whether he is working independently or collaboratively within a team. His ability to connect with colleagues on a personal level underscores his commitment to fostering a harmonious and productive workplace environment.
Date: June 26, 2024
Tags: ISO/IEC 42001, Artificial Intelligence, EU AI Act, ISO/IEC 23894
-------------------------------------------------------------------------------
Find out more about ISO training and certification services
Training: ISO/IEC 42001 Artificial Intelligence Management System - EN | PECB
Webinars: https://pecb.com/webinars
Article: https://pecb.com/article
-------------------------------------------------------------------------------
Beginner's Guide to Bypassing Falco Container Runtime Security in Kubernetes ... – anjaliinfosec
This presentation, crafted for the Kubernetes Village at BSides Bangalore 2024, delves into the essentials of bypassing Falco, a leading container runtime security solution in Kubernetes. Tailored for beginners, it covers fundamental concepts, practical techniques, and real-world examples to help you understand and navigate Falco's security mechanisms effectively. Ideal for developers, security professionals, and tech enthusiasts eager to enhance their expertise in Kubernetes security and container runtime defenses.
Top Profile Creation Sites List - Boost Your Online Presence – monikakhanna42677
Looking to enhance your digital profile? Check out our ultimate list of profile creation sites. Perfect for SEO and gaining high-quality backlinks.
Visit site:- https://www.seoworld.in/high-pr-profile-creation-sites-list/
Hospital pharmacy and it's organization (1).pdf – ShwetaGawande8
The document discusses hospital pharmacy and its organization, covering: the definition of hospital pharmacy, the functions of hospital pharmacy, the objectives of hospital pharmacy, the location and layout of hospital pharmacy, personnel and floor space requirements, and the responsibilities and functions of the hospital pharmacist.
Understanding and Interpreting Teachers’ TPACK for Teaching Multimodalities i... – Neny Isharyanti
Presented as a plenary session in iTELL 2024 in Salatiga on 4 July 2024.
The plenary focuses on understanding and interpreting the TPACK competence teachers need to be adept at teaching multimodality in the digital age. It juxtaposes the results of research on multimodality with its contextual implementation in the teaching of the English subject in the Indonesian Emancipated Curriculum.
Environmental science 1. What is environmental science and components of envir... – Deepika
Environmental science for degree, engineering, and pharmacy backgrounds. You can learn about the multidisciplinary nature of the field and natural resources, with notes, examples, and studies.
1. What is environmental science and what are the components of environmental science?
2. The multidisciplinary nature of the field.
3. Natural resources and their types.
Satta Matka Dpboss Kalyan Matka Results Kalyan Chart – Mohit Tripathi
SATTA MATKA DPBOSS KALYAN MATKA RESULTS KALYAN CHART KALYAN MATKA MATKA RESULT KALYAN MATKA TIPS SATTA MATKA MATKA COM MATKA PANA JODI TODAY BATTA SATKA MATKA PATTI JODI NUMBER MATKA RESULTS MATKA CHART MATKA JODI SATTA COM INDIA SATTA MATKA MATKA TIPS MATKA WAPKA ALL MATKA RESULT LIVE ONLINE MATKA RESULT KALYAN MATKA RESULT DPBOSS MATKA 143 MAIN MATKA KALYAN MATKA RESULTS KALYAN CHART
Kalyan Matka Kalyan Result Satta Matka Result Satta Matka Kalyan Satta Matka Kalyan Open Today Satta Matka Kalyan
Kalyan today kalyan trick kalyan trick today kalyan chart kalyan today free game kalyan today fix jodi kalyan today matka kalyan today open Kalyan jodi kalyan jodi trick today kalyan jodi trick kalyan jodi ajj ka.
How to Purchase Products in Different Units of Measure (UOM) in Odoo 17 – Celine George
In these slides, we will discuss how Odoo makes it easier to configure Purchase UOM for products, create purchase orders, convert units, confirm purchase orders, and receive products. Let's explore how these features can benefit our business.
Cross-Cultural Leadership and Communication – MattVassar1
Business is done in many different ways across the world. How you connect with colleagues and communicate feedback constructively differs tremendously depending on where a person comes from. Drawing on the culture map from cultural anthropologist Erin Meyer, this class discusses how best to manage effectively across the invisible lines of culture.
No, it's not a robot: prompt writing for investigative journalism – Paul Bradshaw
How to use generative AI tools like ChatGPT and Gemini to generate story ideas for investigations, identify potential sources, and help with coding and writing.
A talk from the Centre for Investigative Journalism Summer School, July 2024
Slide Presentation from a Doctoral Virtual Open House presented on June 30, 2024 by staff and faculty of Capitol Technology University
Covers degrees offered, program details, tuition, financial aid and the application process.
The Missing Link: Metadata Conversion Workflows for Everyone
1. Create It Once, Use It Again…and Again…and Again…
Cross-platform Repurposing of Archival Metadata
Andrea Payant
Sara Skindelien
Liz Woolcott
Utah State University
Carol Ou
Katherine Rankin
University of Nevada, Las Vegas
Cory Nimer
Brigham Young University
2. The Missing Link
Metadata Conversion Workflows for Everyone
Andrea Payant
Metadata Specialist
andrea.payant@usu.edu
Sara Skindelien
Special Collections Assistant
sara.skindelien@usu.edu
Liz Woolcott
Head, Cataloging & Metadata
Liz.woolcott@usu.edu
CIMA Annual Conference
2016
3. Pilot Project
Working conditions
• No archival management system
• Hand-coded EAD guides
• Legacy finding aids
• No consistent use of spreadsheets
• Digital repository for archival material
• Contribute to two consortiums
• Need to meet both standards
4. Pilot Project
What we needed
• Streamline/automate metadata creation
• Link digitized images between EAD and CONTENTdm
• Make work flexible
• Work can be done by anyone (library staff, student workers, curators)
• Lower the tech barrier
• XML transformations require in-depth training – is there another way?
• Document procedures
5. Pilot Project
SCA-Digital (SCA-D) Workflow Group
• What/Who
• Group composed of Special Collections and Archives staff, Digital Initiatives staff, and Metadata staff
• Purpose
• Streamline workflows between Special Collections and Digital Initiatives
• Primary focus on metadata creation – the most time-consuming task
• Timeline
• 2014-2015
• Results (View report: https://usu.box.com/s/fqyn5usd9b4wf6pcg7466bwt3oeam4q6)
• Developed two workflows:
• Automation of EAD to Dublin Core conversion
• Digital content linking
• Digital Assessment Checklist
• Tackled two retro metadata projects
6. Two processes, step-by-step
Workflow for converting HTML finding aid inventory into Dublin Core: https://workflowy.com/s/mRrejmDAtj
Workflow for Digital Content Linking: https://workflowy.com/s/Ekz41aSze2
8. Repurposing EAD Container Lists
Problem: We needed a simple, low-tech option to convert our legacy finding aids into Dublin Core-compliant metadata for digitization.
Solution: Opted for a “copy/paste” process because it was by far the easiest method to develop and teach. EVERYBODY can copy/paste.
Tools: Microsoft Office (Excel specifically), Oxygen XML Editor, & CONTENTdm
Methods: In less than 10 easy steps we adjusted data using common Excel spreadsheet formulas and batch imported the data into the digital collection management system.
17. Step 5: Insert collection information
Add the Collection Name, Collection Number, and Collection URL at the top for automatic exporting to the Dublin Core sheet.
20. Step 6: Filenames, provided by the Digital Initiatives staff, are added for each item.
Step 7: Save the Excel spreadsheet as a new tab-delimited file.
Step 8: Open in a text editor such as Notepad and save the file again for batch uploading into CONTENTdm.
22. Batch Linking Digital Content
OVERVIEW
Procedure 1 – Exporting and Spreadsheet Clean-Up
o Outcome: Create a tab-delimited file – repurpose existing metadata
Procedure 2 – Mail Merge
o Outcome: Use metadata to create container lists in XML for EAD finding aids and complete batch linking
Procedure 3 – Uploading the Finding Aid
o Outcome: Perform quality control and upload to Archives West
23. Batch Linking Digital Content
Procedure 1 – Exporting and Spreadsheet Clean-Up
• Export metadata from CONTENTdm
• Open the tab-delimited file in Excel and edit as needed
24. Batch Linking Digital Content
Procedure 2 – Mail Merge
• Use an XML container list template – copy & paste into a new Word document
• Use the mail merge feature in Word to automatically populate container list fields from your source file
• Edit the merged document
25. Batch Linking Digital Content
Procedure 3 – Uploading the Finding Aid
• Copy & paste the new container list from Word into the <dsc> section of the master XML document
26. What we learned
- Training needs
• Be prepared to teach/re-teach
• Helping them see the bigger picture
How are users going to access the material?
How will these descriptions look in all applicable systems (CDM, Archives West, etc.)?
- Develop and train everyone on Best Practices
- Fluency with Excel
• Excel will mess with dates – make sure these are formatted correctly
- Compliance with multiple standards
• DACS allows “circa” dates, RDA prefers “approximate”, ISO standards do not
• Need to be machine-readable and human readable
- Future applications of this process will change (e.g., adopting ArchivesSpace)
27. Want to try it out?
Workflow for Digital Content Linking:
https://workflowy.com/s/Ekz41aSze2
Workflow for converting HTML finding aid inventory into Dublin Core:
https://workflowy.com/s/mRrejmDAtj
Visit our Blog/Find our presentation slides here:
http://usucataloging.wix.com/usucatalogers
No ArchivesSpace
Hand-coded EAD (or use template)
Used CONTENTdm as digital repository
Contributor to two consortiums, need to meet both standards
ArchivesWest (for EADs)
MWDL (for digital content)
Some batch loading of digital content
Relied on spreadsheets populated row by row
Sara and Andrea will be demonstrating two processes – converting HTML finding aid inventories into Dublin Core metadata and Digital Content Linking. All the step-by-step procedures are available at the links above, if you want to try them out later. We will also show these links at the end of the presentation.
In the days before standardization, finding aid formats were as unique as the people creating them. This made legacy finding aids difficult to convert into spreadsheets. In addition, we found that XML stylesheets vary with each collection. We needed a simple, low-tech option to convert our legacy finding aids into Dublin Core-compliant data for digitization. After extensive research, we opted for the copy/paste process because it was by far the easiest method to develop and teach. Everybody can copy the HTML table-formatted container list and paste it into an Excel spreadsheet. We also wanted to use the tools we already had on hand – Excel, Oxygen, CONTENTdm. We did not want to purchase or design new software, since that would have been counterproductive to our goal of maintaining a low technological bar. So, by developing a strategy that involves fewer than 10 steps, we adjusted data using common spreadsheet formulas and an XML editor to batch import the data into the digital collection management system.
Step 1: We copied the table-formatted container list from the online finding aid.
We then open our plain, old, run-of-the-mill spreadsheet. Or is it?
Step 2: We pasted the HTML table-formatted container list into a blank sheet, which we titled “Raw HTML Copy”. We want to separate the identifying numbers in Column B – the 01:01: and so forth – from the title, and place that data in its own column.
We accomplish this by inserting a column and entering the formula =RIGHT(C1, LEN(C1)-7), with 7 representing the number of characters you want removed from the cell.
We now have the title isolated into its own cell.
Step 3: Insert another column to hold the identifiers. Type in the first three identifiers (1:01, 1:02, 1:03), highlight those three cells, then grab the black fill-handle square at the bottom corner and drag down to your last item to autofill the cells with consecutive numbers.
The identifiers have now been separated from the title into their own column.
Step 4: Copy the corresponding columns from the Raw HTML sheet into the EAD sheet. But beware: select Paste Special instead of plain Paste, so that only the data is carried over and NOT the formulas; otherwise your data fields will not export correctly and will display hashtags instead of values.
Step 5: Insert the collection name (Herald Journal Photograph Collection), collection number (P0001), and collection URL into the first three rows.
Through embedded Excel formulas, the collection information is then effortlessly exported from the EAD sheet over to the Dublin Core sheet – into Source, Physical Collection Name, Physical Collection Number, Box, Item, Call Number, & Collection Inventory URL for each item. Review the Dublin Core sheet for complete exportation and clean up the sheet to remove empty columns. For instance, this collection did not have folder information, so the sheet exported zeros in the folder fields; those will need to be removed.
Step 6: Insert filenames provided by our Digital Initiatives staff.
Step 7: Save the spreadsheet as a tab-delimited file.
Step 8: Open the file in a text editor such as Notepad, delete any trailing spaces, and save the file again for batch uploading into CONTENTdm. And now Andrea will explain batch linking of digital content.
Overview
This is a brief outline of the procedures in the workflow we have created to batch upload and embed links to digital content in EAD finding aids (the process works best when an entire collection has been digitized). There are three main procedures. First, export the digital collection metadata into a tab-delimited file, then edit that metadata in order to repurpose it for linking. Second, use the mail merge function in Microsoft Word to automatically create an XML-format container list that can be copied and pasted directly into the XML document for the EAD finding aid. Finally, perform quality control on the XML document and upload the content to Archives West.
Here is a more detailed look at the process
The first step is to export metadata from your digital asset management system – in our case the system is CONTENTdm and the process is pretty simple.
In CONTENTdm administration, select the collections tab, go to the export option in the menu, and make the appropriate selections for the metadata export. CONTENTdm then creates a tab-delimited text file; right-click the file, choose “Save Link As,” and save it to your computer. This text file can now be opened in Microsoft Excel – click through the text import wizard until the process is finished.
The result should be a spreadsheet that looks something like this with a lot of fields for the collection metadata
You will then edit it down to only the information needed to create an EAD container list: component numbers, component levels, any necessary hierarchical containers for box, folder, or item, plus the title, format, date, and the ARK URLs for linking the digital content.
Once you have finished making the necessary edits to your spreadsheet, you can move on to the next step: using the mail merge function in Microsoft Word to create a new XML container list for EAD with links to digital content embedded. To begin, you will need a template like the one you see here.
This template should represent the XML coding needed for a single item in your EAD finding aid, and you want to be sure to include the digital archival object (<dao>) and xlink tagging, which are necessary for the content linking to operate effectively.
The parts of the XML template highlighted here in angle brackets are variable, while the rest of the text is constant, or fixed.
Mail merge will use each row of data in your spreadsheet to populate these variable fields, duplicating the template for each item in your collection.
To perform the mail merge you first go to the mailings tab in Word and click “Start Mail Merge” and then make sure “Normal Word Document” is selected.
Second, you click “Select Recipients” and choose “Use Existing List”; a new window opens to select a table – you select your spreadsheet – then another window opens for you to confirm the spreadsheet as the data source for the merge.
Next, you will assign fields from your spreadsheet to the corresponding EAD elements in the xml template. You begin by highlighting the first EAD element, then you go to “Insert Merge Field” then select the matching field from the drop down list of data source options. You repeat the same process for each of the EAD elements in your template.
Once you have finished, you complete the merge by selecting “Finish & Merge,” then “Edit Individual Documents,” then “All.”
You will now have a new Word document that should look like this: the XML for each individual item in your collection appears on its own page, with information inserted from your spreadsheet.
You will then want to make any necessary edits to the XML (like removing empty tags or getting rid of extra white space).
Then, for the final phase of the process, you copy the entire container list in Word and paste it into the <dsc> section of the master XML file for your collection's EAD finding aid. You can then perform quality control on the XML; once finished, you can upload your new EAD finding aid, complete with links to the digital objects.
Throughout the creation of this workflow we have learned a few things, and we can offer some suggestions for anyone seeking to implement this process:
First, there will most likely be training needs – be prepared to teach and re-teach as necessary. Also make sure those involved in the process understand the overall purpose of their work and the benefits that result from it – for example, teach how users access the material and what the description differences are in each system.
You will also need to be sure that everyone is aware of and using your institution's best practices and standards, to ensure consistency from all parties involved in the process.
This workflow involves Excel quite a bit, so a certain level of fluency with the program is needed – for example, formatting cells in the spreadsheet can be tricky, especially when working with dates for your collection.
You will also need to make sure there is compliance across multiple standards – for example, DACS allows “circa” dates but ISO standards do not – and keep in mind the overall need for the information to be machine-readable as well as human-readable.
Finally, be aware of and consider future applications of the process (for example, we anticipate adopting ArchivesSpace at some point and will no doubt have to adapt our workflows for it).
If you would like to try out the process, you can access our detailed workflows, as well as the slides from this presentation, at these sites.
You are also welcome to contact any of us if you have further questions.