Using OpenURL Activity Data
Activity Data Online Exchange Event
Sheila Fraser, 2nd June 2011

What are the OpenURL Router Data?
  • Learners, teachers and researchers in UK HE institutions seek journals and papers for academic use
  • A paper may be available from many different service providers, so which is the “appropriate copy” for a user?
  • The OpenURL Router directs the request to the appropriate institutional resolver
  • Each redirect request is logged, providing a record of the article the user was attempting to find
  • The Router also logs non-bibliographic metadata: “lookup” requests (registry searches) and preferred button image requests
[Diagram: existing process. A request reaches the OpenURL Router, which redirects it to one of several institutional resolvers and logs the request as Level 0 data (log).]

What are we doing in this project?
  • Aim 1: Make this data available under open licence
  • Aim 2: Develop prototype service using this activity data
  • Aim 3: Explore including other institutions’ data in aggregation
[Diagram: the existing process (request, redirect request, log request) produces Level 0 data (log). Institutions are surveyed to enable opt-out.]
Process to Level 1:
  • Exclude data from opted-out institutions
  • Anonymise IP addresses
  • Anonymise institution and remove button and lookup data that would identify institution
  • Parse OpenURL request into constituent parts
Process to Level 2:
  • Include only redirect to resolver requests
Level 2 data: use for prototypes and services

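The Level 1 and Level 2 processing steps could be sketched roughly as follows. This is only an illustration under stated assumptions: the raw entry layout, the `type` values, the salted-hash pseudonyms and every function name here are hypothetical, and the slides do not say how the project actually anonymises IP addresses or institutions.

```python
import hashlib

# Hypothetical Level 0 log entries; the real log layout is not given in the slides.
LEVEL0 = [
    {"ip": "192.0.2.10", "institution": "inst-a", "type": "redirect",
     "query": "issn=1234-5678&volume=3"},
    {"ip": "192.0.2.11", "institution": "inst-b", "type": "redirect",
     "query": "issn=8765-4321"},
    {"ip": "192.0.2.10", "institution": "inst-a", "type": "lookup",
     "query": "q=example"},
    {"ip": "192.0.2.12", "institution": "inst-a", "type": "button",
     "query": ""},
]
OPTED_OUT = {"inst-b"}       # institutions that opted out in the survey
SALT = b"example-secret"     # assumption: a secret salt kept out of the released data


def pseudonym(value, salt=SALT):
    """Replace an identifier with a salted hash (an assumed technique)."""
    return hashlib.sha256(salt + value.encode("utf-8")).hexdigest()[:16]


def to_level1(entries, opted_out):
    """Exclude opted-out institutions and replace identifying values."""
    records = []
    for entry in entries:
        if entry["institution"] in opted_out:
            continue
        record = {
            "encryptedUserIP": pseudonym(entry["ip"]),
            "type": entry["type"],
        }
        # Assumed reading of the slide: button and lookup entries keep no
        # institution-identifying detail; only redirects keep their request data.
        if entry["type"] == "redirect":
            record["institutionResolverID"] = pseudonym(entry["institution"])
            record["query"] = entry["query"]
        records.append(record)
    return records


def to_level2(level1):
    """Keep only redirect-to-resolver requests."""
    return [record for record in level1 if record["type"] == "redirect"]


level1 = to_level1(LEVEL0, OPTED_OUT)
level2 = to_level2(level1)
```

A salted hash keeps the same user or institution recognisable across records without exposing the original value, but it is pseudonymisation rather than a guarantee of anonymity, so it should be read as a placeholder for whatever method the project chose.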
What’s in the data set?
Log-specific data (based on OpenURL Router log entries):
  • logDate (Date the record was logged)
  • logTime (Time the record was logged in format HH:MM:SS)
  • encryptedUserIP (Anonymised IP address/session identifier)
Request-specific data (based on the OpenURL standard):
  • institutionResolverID (Anonymised institutional identifier)
  • routerRedirectIdentifier (Redirect identifier passed as part of the URL)
  • aulast (Last author)
  • aufirst (First author)
  • auinit (First author's first and middle initials)
  • auinit1 (First author's first initial)
  • auinitm (First author's middle initial)
  • au (Full name of a single author)
  • aucorp (Organization or corporation that is the author or creator of the document)
  • atitle (Article title)
  • title (Journal title, for compatibility with version 0.1)
  • jtitle (Journal title)
  • stitle (Short journal title)
  • date (Date of publication)
  • ssn (Season (chronology). Legitimate values are spring, summer, fall, winter)
  • quarter (Quarter (chronology). Legitimate values are 1, 2, 3, 4)
  • volume (Volume designation, usually expressed as a number but could be non-numeric)
  • part (Part can be a special subdivision of a volume or it can be the highest level division of the journal. Parts are often designated with letters or names)
  • issue (Designation of published issue of a journal)
  • spage (First page number. Pages are not always numeric)
  • epage (Second (ending) page number)
  • pages (Start and end pages, e.g. 53-58)
  • artnum (Article number assigned by the publisher)
  • issn (International Standard Serial Number)
  • eissn (ISSN for electronic version of the journal)
  • isbn (International Standard Book Number)
  • coden (Alphanumeric bibliographic code)
  • sici (Serial Item Contribution Identifier)
  • genre
  • btitle (The title of the book; can also be expressed as title)
  • place (Place of publication)
  • pub (Publisher name)
  • edition (Statement of the edition of the book)
  • tpages (Total pages)
  • series (The title of a series in which the book or document was issued)
  • doi (Digital Object Identifier)
  • sid (Service ID, the provider of the item (journal, article etc.))
Further details: http://openurl.ac.uk/doc/data/data.html

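To illustrate how a request might be parsed into the fields listed above, here is a minimal sketch using Python's standard `urllib.parse`. It assumes the request arrives as a simple key/value query string whose keys match the field names; the sample query, the `FIELDS` subset and the `parse_openurl` helper are hypothetical, and the Router's real parsing may differ.

```python
from urllib.parse import parse_qs

# A subset of the request-specific fields from the list above.
FIELDS = ("aulast", "atitle", "jtitle", "date", "volume", "issue",
          "spage", "issn", "doi", "genre", "sid")


def parse_openurl(query):
    """Split a query string into the chosen fields; absent fields become None."""
    parsed = parse_qs(query)
    # parse_qs returns a list per key; keep the first value for each known field.
    return {name: (parsed[name][0] if name in parsed else None) for name in FIELDS}


# Invented example request, not taken from the real data set.
sample = ("genre=article&issn=1234-5678&volume=12&issue=3&spage=53"
          "&atitle=An+Example+Title&aulast=Smith&unknown=x")
record = parse_openurl(sample)
```

Keys outside `FIELDS` (such as `unknown` in the sample) are dropped, which keeps every record the same shape whatever the referring service sends.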
How might the data be used?
  • Article/journal recommendations
  • Student analysis
  • Research thesis
  • Publishers comparing listings with texts sought
  • Identifying priorities for eJournal preservation
  • Innovative services to meet your users’ needs
  • Other, unanticipated uses

Explore including other institutions’ data
  • Can it be aggregated? Data compatibility (OpenURL standard)
  • What are the issues?
      • Legal: under the DPA, personal data cannot be shared without permission
      • Technical: Can we, and how do we, extract resolver data at the same level of detail? How to identify duplicates? Regular sharing and maintainability?
      • Financial: What potential effort could be involved?
  • What other issues are there?

Editor's Notes

  1. Intro: what will be covered in the 10 minutes. What the OpenURL Router Data are; project aims and data process; what’s in the data set; how the data might be used (more ideas please); issues in including other institutions’ data (suggestions and comments welcome).
  2. Key points: the data are logs of OpenURL Router requests from the existing process; an entry may be OpenURL data or non-bibliographic.
  3. Key points: 3 aims – make data available under open licence, develop prototype service and explore adding others’ data to the aggregation
  4. (1) Highlight some different aspects of the OpenURL request data, e.g. Article Title, Journal Title, Book Title, Author, ISSN, DOI … (2) What can the data tell us? Maybe some simple things, like the most sought articles, journals and books (although the data are limited to requests that have gone through the Router). For example, if one user has made several requests in a short space of time, we can infer that those requests are linked in some way, and develop a recommender prototype based on those links. The challenge is that we need a large body of data from which to make links.
  5. If for recommendations or prioritisation then more is better…so how do we get more data?
  6. What is ‘critical mass’ and how do we get to it? Reputation issues for institutions? Purpose of aggregation?
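
The idea in note 4, linking requests that one user makes within a short space of time, could be prototyped along the following lines. This is a sketch only: the sample requests, the 30-minute window, the `issn:` item keys and the function names are all assumptions, not the project's recommender.

```python
from collections import Counter, defaultdict
from datetime import datetime, timedelta
from itertools import combinations

# Invented (encryptedUserIP, logDate logTime, item) tuples for illustration.
REQUESTS = [
    ("u1", "2011-04-01 10:00:00", "issn:1111-1111"),
    ("u1", "2011-04-01 10:05:00", "issn:2222-2222"),
    ("u1", "2011-04-01 15:00:00", "issn:3333-3333"),
    ("u2", "2011-04-01 11:00:00", "issn:1111-1111"),
    ("u2", "2011-04-01 11:10:00", "issn:2222-2222"),
    ("u2", "2011-04-01 11:20:00", "issn:4444-4444"),
]


def link_counts(requests, window=timedelta(minutes=30)):
    """Count item pairs requested by the same user within the time window."""
    by_user = defaultdict(list)
    for user, stamp, item in requests:
        when = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S")
        by_user[user].append((when, item))
    counts = Counter()
    for events in by_user.values():
        events.sort()  # chronological order, so t2 >= t1 below
        for (t1, a), (t2, b) in combinations(events, 2):
            if a != b and t2 - t1 <= window:
                counts[frozenset((a, b))] += 1
    return counts


def recommend(item, counts, top=3):
    """Return the items most often linked to the given item."""
    scores = Counter()
    for pair, n in counts.items():
        if item in pair:
            (other,) = pair - {item}
            scores[other] += n
    return [other for other, _ in scores.most_common(top)]


counts = link_counts(REQUESTS)
suggestions = recommend("issn:1111-1111", counts)
```

With this sample data, `issn:2222-2222` is linked to `issn:1111-1111` by two users and so ranks first. Pair counts stay low until many sessions are available, which is the "large body of data" challenge raised in the note.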