Shining a light on our analytics and usage data
Ben Showers, Jisc
Joy Palmer, Mimas
Graham Stone, University of Huddersfield
This work is licensed under a Creative Commons Attribution 3.0 Unported License
Why do this?
Library Data at Huddersfield
Library Impact Data Project
To support the hypothesis that…
“There is a statistically significant correlation across a
number of universities between library activity data
and student attainment”
Library Impact Data Project 1
Original data requirements for each student:
• Final grade achieved
• Number of books borrowed
• Number of times e-resources were accessed
• Number of times each student entered the library, e.g. via a
turnstile system that requires identity card access
• School/Faculty
– Showed a statistically significant relationship between:
• Final grade achieved
• Number of books borrowed
• Number of times e-resources were accessed
– Across all 8 partners
Library Impact Data Project 1
Library Impact Data Project
Phase I looked at over 33,000 students across 8 universities
Phase II looked at around 2,000 FT undergraduate students at Huddersfield
Library Impact Data Project 2
Now with additional data:
• Demographics
• Discipline
• Retention
• On/off campus use
• Breadth and depth of e-resource usage
• UCAS points (entry data)
• Correlations for Phase 1
Library Impact Data Project 2
Conclusions:
• We showed statistical significance for demographics such as
age, gender, ethnicity and country of origin
• We showed statistical significance across top level subjects
and within these disciplines
• We showed a connection between library use and retention
• We showed the depth and breadth of a collection may make
a difference
Drivers for change 1
And we know all this is firmly on libraries’ radars
Our survey:
How important will analytics be to
academic libraries now and in the
future, and what is the potential for a
service in this area?
What about sharing your data about usage with other
institutions?
There’s a significant appetite for analytics services… but more hesitation over
sharing entry data and other student data than over other forms of usage
data.
Only 46% would be willing to share data if the institution was named.
But if institutional identity can be anonymised, that changes to 91%
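A shared service could respond to that jump from 46% to 91% by never exposing institution names in shared outputs. As a minimal sketch (an assumption about how this might be done, not how LAMP does it), names could be replaced with keyed HMAC-SHA256 pseudonyms; the `pseudonymise` helper and the `INST-` prefix are hypothetical.

```python
import hashlib
import hmac


def pseudonymise(institution: str, secret_key: bytes, length: int = 12) -> str:
    """Return a stable, non-reversible label for an institution name.

    Hypothetical helper: the same name and key always give the same label,
    so benchmarks stay comparable over time, but without the key nobody can
    link a label back to a name by hashing candidate names themselves.
    """
    # Normalise case and surrounding whitespace so trivial variants match.
    normalised = institution.strip().lower()
    digest = hmac.new(secret_key, normalised.encode("utf-8"), hashlib.sha256)
    return "INST-" + digest.hexdigest()[:length]


if __name__ == "__main__":
    key = b"replace-with-a-secret-held-only-by-the-service"
    print(pseudonymise("University of Huddersfield", key))
```

Pseudonyms alone don't guarantee anonymity: an institution with an unusually large or small cohort might still be recognisable from its figures, so limits on how small a reported group can be would probably be needed as well.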
And is this a current strategic priority?
What about in the next five years?
Cue
Can we collect data from institutions and create tools that allow
libraries to analyze how their resources are being used, when
and by whom?
Can this dashboard also give institutions the tools to
compare or even benchmark usage
against other institutions?
What about the benefits of scale?
What data can we use and get a hold of?
UCAS data, loan data, eResource logins...
(but not data on usage of individual items)
(yet)
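These three sources live in different systems, so a first practical step is joining them per student. A minimal sketch, assuming each source has already been exported with a shared (ideally pseudonymised) student ID; the `StudentUsage` class, the `combine_sources` function and the field names are hypothetical, and, as on the slide, item-level usage is left out.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class StudentUsage:
    student_id: str
    profile: dict            # e.g. discipline, mode of study, UCAS points
    loans: int = 0
    eresource_logins: int = 0


def combine_sources(profiles, loan_events, login_events):
    """Join per-student profile data with loan and e-resource login counts.

    profiles: {student_id: {attribute: value}} from student records/UCAS data
    loan_events: iterable of student IDs, one entry per loan
    login_events: iterable of student IDs, one entry per e-resource login
    Students with no loans or logins keep a count of 0 (Counter returns 0 for
    missing keys); events for IDs with no profile are dropped.
    """
    loans = Counter(loan_events)
    logins = Counter(login_events)
    return {
        sid: StudentUsage(sid, attrs, loans[sid], logins[sid])
        for sid, attrs in profiles.items()
    }
```

Keeping zero counts is deliberate: the "who's not borrowing?" question later in the deck depends on non-users being present in the data rather than silently dropped by the join.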
Our collaborators and data providers:
data wrangling:
Getting, analyzing, cleaning, and presenting that data
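Much of the wrangling is normalising fields that each institution records differently (the editor's notes mention ethnicity and country of origin). A minimal sketch of one approach: a lookup table that flags unknown values for a person to review instead of guessing. `COUNTRY_MAP` and both functions are hypothetical, and a real mapping would need far more entries.

```python
# Hypothetical mapping from institution-specific spellings to a shared vocabulary.
COUNTRY_MAP = {
    "uk": "United Kingdom",
    "united kingdom": "United Kingdom",
    "great britain": "United Kingdom",
    "china": "China",
    "people's republic of china": "China",
    "prc": "China",
}


def normalise_field(raw, mapping):
    """Map a raw value to the shared vocabulary, or None if it is not recognised.

    Case and runs of whitespace are ignored. Returning None rather than
    guessing means unrecognised values can be listed for manual review.
    """
    if raw is None:
        return None
    key = " ".join(raw.strip().lower().split())
    return mapping.get(key)


def unmapped_values(raw_values, mapping):
    """Distinct raw values that still need a mapping rule, sorted for review."""
    return sorted(
        {v for v in raw_values if v is not None and normalise_field(v, mapping) is None}
    )
```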
A brief (important) word on ethics
Should we be holding and analyzing this kind of data?
• Data protection issues & ‘Big brother’ concerns
• All students pay the same fees – shouldn’t they be treated the same?
But what if we didn’t do this?
• What would the reaction be if it was found that we had this data but didn’t act
on it?
• We have a duty to care for the individual wellbeing of our students
Working with the API to present the data…
How should users work with the data?
What do they want to be able to do?
What do they do?
What does the system do?
The Epic User Stories
• connect the library with the university mission
• contribute to the institutional analytics effort
• demonstrate value added to users
• ensure value from major investments
• develop investment business cases
• impact student measures of satisfaction, such as NSS
• address measures of equality and diversity of opportunity
• inform / justify library policy and decisions as evidence-led
• engage stakeholders in productive dialogue
• identify a basket of measures covering all key areas
• inform librarian professional development
• enable the sector to understand the questions to be answered
Job stories
JiscLAMP – What did we achieve?
• LAMP project outputs
– We managed to clean up and process the data from all of the partners
– We created a prototype – our analytics engine
– We performed a benchmarking exercise
• We showed that the idea of a shared library analytics service was feasible
What can we do with the data?
» We can demonstrate usage by cohorts:
› Department
› Degree name
› Course
› Course ‘type’?
› Gender/Ethnicity/Nationality/Disability/Age
› Level of attainment
› Attendance mode (full time/part time)
› UCAS points
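Each of these cohort breakdowns boils down to grouping students by one or more attributes and summarising a usage metric. A sketch, assuming combined per-student records look like the dicts in the docstring (the attribute names and `usage_by_cohort` are hypothetical). It reports the per-student mean alongside the raw total, since totals largely reflect how many students a cohort contains: the trap described in the humanities vs social sciences example later on.

```python
from collections import defaultdict


def usage_by_cohort(students, keys, metric="loans"):
    """Summarise one usage metric for each cohort defined by `keys`.

    students: iterable of dicts, e.g.
        {"discipline": "Humanities", "mode": "Full time", "loans": 4}
    keys: attribute names that define a cohort, e.g. ("discipline",),
        or ("discipline", "mode") for a two-way breakdown.
    Returns {cohort_tuple: {"students": n, "total": t, "mean": t / n}}.
    """
    totals = defaultdict(int)
    counts = defaultdict(int)
    for s in students:
        cohort = tuple(s[k] for k in keys)
        totals[cohort] += s[metric]
        counts[cohort] += 1
    return {
        c: {"students": counts[c], "total": totals[c], "mean": totals[c] / counts[c]}
        for c in counts
    }
```

Passing `("discipline", "mode")` gives the discipline by mode-of-study breakdown used later in the deck.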
What can we do with the data?
And we can demonstrate correlations between usage and
attainment, usage and cohort (and attainment and cohort)
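For the usage and attainment correlation, one common option when attainment is an ordinal scale (e.g. degree classification coded 1 to 4) is Spearman's rank correlation. This is an assumption for illustration; the slides don't say which test LAMP or the Library Impact Data Project used. The sketch is pure Python so it runs without SciPy; `average_ranks` and `spearman` are hypothetical names.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j over the run of values tied with values[order[i]].
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("a constant sequence has no rank correlation")
    return cov / sqrt(var_x * var_y)
```

A coefficient near +1 or -1 indicates a strong monotonic relationship; as the project's own hypothesis wording suggests, it is a correlation and says nothing about cause.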
And we can potentially signal if
findings are statistically significant or not…
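One way to signal significance for a gap like the psychology-by-gender example in the editor's notes is a permutation test on loans per student: it compares means (so a bigger cohort alone can't create the gap) and makes no assumptions about how loans are distributed. This is a swapped-in technique for illustration, not the method the tool uses; `permutation_test` and its parameters are hypothetical.

```python
import random
from statistics import mean


def permutation_test(group_a, group_b, n_permutations=10_000, seed=0):
    """Two-sided permutation test for a difference in mean loans per student.

    Repeatedly shuffles the pooled values, splits them into groups of the
    original sizes, and counts how often the difference in means is at least
    as large as the observed one. Returns (observed_difference, p_value).
    A small p-value (e.g. < 0.05) suggests the gap is unlikely to be a fluke
    of which students happen to fall into which group.
    """
    observed = mean(group_a) - mean(group_b)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    rng = random.Random(seed)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        diff = mean(pooled[:n_a]) - mean(pooled[n_a:])
        if abs(diff) >= abs(observed):
            extreme += 1
    # The +1 terms count the observed labelling as one of the permutations.
    return observed, (extreme + 1) / (n_permutations + 1)
```

A fixed `seed` makes the p-value reproducible between runs, which matters if everyone looking at the same data should see the same significance flag.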
But where exactly does the user journey and workflow begin?
Within the system or outside of it?
How much do we assume the user is analysing the data?
And how much analysis should the tool perform on behalf of the end user?
» Here’s a simple question:
How do humanities and social
science students use books? *
*All the data used in this presentation is completely made up. Any resemblance
to real university library usage data, living or dead, is purely coincidental.
» And a simple answer
Humanities bigger users than
social sciences
» But wait!
Humanities still bigger users,
but the difference isn’t so stark
» What about other factors?
In both disciplines, full-time
students borrow more books
than part-time ones
The difference is smaller for
social scientists than for
humanities students
Part-time students seem quite
similar across disciplines
» Who’s not borrowing?
So although the means were
fairly similar (2.9 and 2.3),
proportionally there are lots
more social scientists who have
never borrowed a book.
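The mean alone hides the non-borrowers, so a cohort summary could report the share of students with zero loans next to it. A minimal sketch; `borrowing_profile` is a hypothetical helper that takes one cohort's per-student loan counts, zeros included.

```python
def borrowing_profile(loans_by_student):
    """Mean loans per student and the share of students who never borrowed.

    loans_by_student: per-student loan counts for one cohort, including 0s.
    Two cohorts can share a mean yet differ sharply in their non-borrowers.
    """
    if not loans_by_student:
        raise ValueError("empty cohort")
    n = len(loans_by_student)
    zeros = sum(1 for count in loans_by_student if count == 0)
    return {"mean": sum(loans_by_student) / n, "never_borrowed": zeros / n}
```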
» Who’s not borrowing?
There’s clearly a big problem
with part-time students in the
humanities
In the social sciences, everyone
is equally in need of help
Social science part-timers are
more likely to borrow books
than humanities part-timers –
why might that be?
» Who’s not borrowing?
Shows median borrowing
(humanities = 3, social sciences = 2)
Shows upper and lower quartiles
Shows max and min values
Standard way of describing data –
but is it useful here?
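The box plot's five numbers can be computed with Python's standard `statistics.quantiles`. A sketch, with `five_number_summary` a hypothetical name; quartile definitions vary between tools, so a spreadsheet may give slightly different values for the same data.

```python
from statistics import quantiles


def five_number_summary(values):
    """Min, lower quartile, median, upper quartile and max, as drawn in a box plot.

    Uses the 'inclusive' quartile method, which treats the data as the whole
    population and interpolates between neighbouring values.
    """
    if len(values) < 2:
        raise ValueError("need at least two values")
    q1, median, q3 = quantiles(values, n=4, method="inclusive")
    return {"min": min(values), "q1": q1, "median": median, "q3": q3, "max": max(values)}
```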
Where do we go from here? Phase 2
» We have funding for
Phase 2
» We’re now testing the
‘ugly prototype’
» Currently putting
together the new
project and marcomms
plan
Our key areas for focus 2014-15
» Make the data beautiful and
compelling…
› Develop a dashboard UI
through iterative testing
and development
» Usage data to ‘profile’
individuals, e.g. for REF
or intervention
purposes?
› What are the ethical
or legal issues?
» eResource item level
usage and the current
approach of the UK
Access Management
Federation
› Is it possible to crack
that nut?
» NSS data and SCONUL
stats. Integration would
be of major value.
› How can we bring
that data into scope?
» Data literacy
› What does it mean?
Who needs it?
› What needs to be
automated and what
needs to be taught as
a skillset?
» Benchmarking.
› The killer app?
› Is there a business
case for the service if
it doesn’t provide the
capability to compare
across institutions?
› How would this work?
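If benchmarking does go ahead, one simple form is telling an institution where its figure (say, mean loans per student) sits among its anonymised peers. A minimal sketch; `percentile_rank` and the `min_peers` threshold are hypothetical, the threshold being one way to avoid comparisons against so few peers that they could be identified.

```python
def percentile_rank(own_value, peer_values, min_peers=5):
    """Percentage of peer institutions whose value is below `own_value`.

    Ties count as half, a common convention for percentile ranks.
    Refuses to compare against fewer than `min_peers` peers.
    """
    if len(peer_values) < min_peers:
        raise ValueError("too few peers for an anonymous comparison")
    below = sum(1 for v in peer_values if v < own_value)
    equal = sum(1 for v in peer_values if v == own_value)
    return 100.0 * (below + 0.5 * equal) / len(peer_values)
```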
JiscLAMP – Phase 2
• Workshop with SCONUL (London 7 May 2014)
• Key contacts/relationships for next phase
– HESA (NSS)
– Shibboleth/Athens
– SCONUL (performance group)
• Developing the business case and model
How can you get involved?
» Follow and comment on our
blog:
http://jisclamp.mimas.ac.uk
» Attend a LAMP workshop
(tba)
» Become a data contributor!
› email: b.showers@jisc.ac.uk
› email: joy.palmer@manchester.ac.uk
› email: g.stone@hud.ac.uk

Editor's Notes

  1. Background image – something along the lines of data visualisations
  2. So. How did we get here? Before I get into the details of the LAMP project I want to give a bit of background and context. Jisc has been involved in looking at activity or usage data, and how we can exploit it, for some time now. Back in 2007 we started looking at activity data and personalisation, when it was clear that models of consumption were being completely turned on their head by the likes of Amazon. And of course, right from the beginning we’ve been interested in how this might apply to the context of the library, and particularly the discovery interface: to improve existing services, to gain insights into user behaviour, and to measure the impact of the library.
  3. And while Huddersfield was the only institution experimenting with data to quite this extent, all of this has of course been very much on most libraries’ radars for a while now. Indeed, some of the more competitive library system vendors are building in analytics tools so that some usage data can be tracked at an institutional level. Just over a year ago we worked with RLUK and SCONUL to conduct a survey to understand fully how important analytics will be to academic libraries now and in the future, and also to see what opportunities there might be for a shared service in this area.
  4. Sixty-one institutions responded to the survey, so a pretty good sample. What we were particularly interested in was the appetite libraries and institutions had for sharing their data. We found that while there was an obvious appetite for analytics services (no surprise there), there was more hesitation over sharing usage data. These concerns were around being ‘named’ and sharing (potentially) sensitive information that could be distorted. In other words, the data might not tell the story an institution wants to tell (think of a headline like ‘most chemistry majors who get a first never set foot in the library’). But this concern diminished if there were reassurances that institutional identity could be anonymised. Respondents could see the value in sharing for purposes such as profiling and benchmarking.
  5. We asked people whether this was a key priority. Most felt that it was an ‘important but not essential’ strategic priority.
  6. But this changed when we asked about the next five years, when most felt that it would become a top priority. So this is clearly an area of potential opportunity and growth.
  7. Cue the Library Analytics and Metrics Project, or LAMP. We started our work early last year, following on from the survey results and also from the work of the Library Impact Data Project.
  8. As an innovation project, we wanted to explore whether we can “collect data from institutions and create tools that allow libraries to analyze how their resources are being used, when and by whom.” Many systems vendors are already providing tools in some of these areas, but while they can provide stats on usage, what they can’t do is show which resources are being used by which groups. They can’t tell you how resources are being used by first-year law students, or second-year students from China, for example, or by students who were close to failing the previous year. The ability to work with UCAS data allows us to do this, and so to add significantly more value to the tools.
  9. What we’re also interested in, though, is what the benefits of scaling this activity up might be. Can we also give institutions the means to compare and benchmark themselves against one another? Is this another area where we can add value, by exploiting the potential of this being an above-campus, shared service?
  10. But of course there were a great many unknowns we had to work through first to assess the feasibility of all this. The first key challenge is getting the data, which sits in different systems and is often ‘managed’ by different parts of the institution; this is particularly true of the UCAS data. The data we needed (and knew could be available from most institutions, at least from somewhere) was that precious UCAS data (so we can identify the characteristics of users), but we also needed loan data and eResource logins. What we didn’t take was usage of individual items, at least not for now. I’ll come back to that.
  11. When we initially put feelers out to find out who might be able to work with us on this, we thought we might get two or three. But we were very pleased to end up with six institutions, all of whom could provide us with the data we needed and who agreed to trial this with us.
  12. The next phase of work was really around data wrangling: getting the data (or getting the person who can get the data, and the permission to use it); analysing and normalising it; and then developing an API for presenting the data in a visualisation dashboard.
  13. Because without a graphical visualisation dashboard, this is what your visualisation tool looks like. (Actually, this is a spreadsheet showing how we’re normalising fields such as ethnicity or country of origin.)
  14. And it’s been in developing the API, and considering the requirements of that API, that we’ve come into our most challenging and interesting phase of work. In doing this we’ve needed to ask questions about how users work with the data. What do they want to be able to do? And then there are more fundamental questions about what analysis the user undertakes, and what the system does on behalf of the user, which I’ll also come back to.
  16. And here’s our first basic wireframe, which starts to think in more detail about functionality and the different tasks that users might want to accomplish (for example, tracking usage by discipline or by demographics such as country or ethnicity). But this was still very much exploratory work, and although it looks sensible, as we discovered, the devil is very much in the detail.
  17. First and foremost, we’ve had to work within the constraints of the data that we had. Here is what we *can* do with the data from those institutions.
  18. And here are some screenshots of the very basic prototype as it’s functioning right now, with live data. Here is a pie chart showing loans activity among students studying for a bachelor’s in psychology, broken down by gender. Apparently women borrow a lot more – or do they? Or are there simply more women in that discipline? What story is this data actually telling? I’ll let Ellen get into these types of issues.
  19. For example, we can potentially signal whether findings are statistically significant or not. So is usage by female psychology students in the first graph something to take note of or not? This is something we’re going to integrate into the tool in the next phase: the ability to know whether what you’re seeing is significant or not.
  20. I’m going to walk you through how complicated it is to structure the service so that it actually answers the question the user wants to ask. Here’s a fairly simple question, which we know from the use cases and job stories is a common interest (although the things people compare will vary).
  21. The answer is pretty straightforward (the data is made up, by the way; you can tell because it is so tidy): humanities students have borrowed more books in this time period than social science students.
  22. But of course this might be because there are more humanities students than social science students, so what the user might really want to know is the average use per student. If we calculate this, we can see that humanities students are still the bigger users, but the difference is less obvious than in the simple presentation of counts. Would the user know that this is what they want, though, when they first ask the question? Would they think about the number of users and how it might affect the total books borrowed?
  23. And of course there are other factors besides discipline which might affect usage. For example, mode of study might change how things look and, unsurprisingly, in our example it does! In both disciplines, full-time students are bigger borrowers than part-time ones, but the difference is smaller for social scientists than for humanities students.
  24. And remember the original question: how do humanities and social science students use books? Well, what about people who aren’t using books at all? Nothing we’ve looked at so far tells you anything about them, but it’s probably quite important to know. This graph shows us that although the mean use was fairly similar in both subjects, there are a lot more social scientists who have never borrowed a book.
  25. We can get even more fancy: this is a box plot, which is a fairly common way of showing the distribution of a dataset. It shows the median (basically, if you lined up, say, 100 data points in numerical order from low to high, the median is the value around position 50), the upper and lower quartiles (in the same line-up, the values around positions 75 and 25) and also the maximum and minimum values (positions 100 and 1 in the line-up). People who are familiar with statistics would get a lot of useful information from this chart, but is it just going to be confusing to the non-expert user?
  26. Can usage data be used to ‘profile’ individuals, e.g. for REF or intervention purposes? What are the ethical or legal issues? eResource item-level usage: is it possible to crack that nut? How are other countries tackling this? And is this service viable without that data? NSS data and SCONUL stats: how can we bring that data into scope and help streamline reporting processes? Data literacy: what does it mean? Who needs it? What needs to be automated and what needs to be taught? Benchmarking: the killer app? We need data first…