What’s so special about the social sciences?  Peter Burnhill Director, EDINA national academic data centre,  University of Edinburgh, Scotland UK Bloomsbury Conference on e-publishing and e-publications University College London, 24/25 June 2010
short answer:  Some things but not everything Overview for a longer answer Autobiographic Apologia Yesterday and Yesteryears Research publication and data for research in the social sciences some evidence (some old and re-used)  all that is digital are not data Societal Big Challenges a sense of place Our shared task ease and continuity of access citation and linking Linked Data: Semantic Web anyone? Socio-Informatics & the Internet Will not take the full two hours ….
Social Science Research Council  [now ESRC] ‘ Scientific Officer’ for Economic & Social History and Statistics (left to do MSc Statistics at London School of Economics) Scottish Education Data Archive,  until mid ‘80s Survey statistician: school leavers, YTS, 16-19 cohort surveys; demand for HE  Graduate School, Faculty of Social Science,  1987 – 1997 Senior Lecturer, teaching quantitative/survey methods Edinburgh University Data Library,  mid- 1980s & on Manager: set-up and development President of IASSIST, 1997 – 2001: social science data professionals ESRC Regional Research Laboratory  for  Scotland  1986/90 Co-director: early days of Geographical Information Systems (GIS) EDINA national data centre,  mid-1990s to present:  my day job Director: set-up and continuous development Digital Curation Centre,  2004/05 as Interim Director  set-up & definition of  ‘data curation + digital preservation’ autobiography as commentary
digital curation: ... digital objects and data, over their life-cycle, for current & future generations of use ... = f(data curation & digital preservation) data curation [when high current/ongoing interest] actions needed to maintain and utilise digital data & research results over entire life-cycle data creation & management; adding value; generating new sources of information & knowledge, for use digital preservation [for longevity;fall off in interest] long-run technological/legal accessibility & usability storage, maintenance & accessibility of information content in digital material over the long-term, for use OAIS concept of designated community What is this digital curation anyway? Taken from a PPT to JISC in July 2004 …
Taken from a PPT to JISC in July 2004 … The term Digital Curation is a rather recent invention. The Digital Data Curation Task Force - Report of the Task Force Strategy Discussion Day (2002) states Tony Hey took up the term which had been used by Dr John Taylor, Director General of the Research Councils, to distinguish the actions involved in caring for digital data beyond its original use, from digital preservation. The concept’s reach extends beyond libraries. The e-Science Curation Report (2003) proposed the following distinctions: Curation: The activity of managing and promoting the use of data from its point of creation, to ensure it is fit for contemporary purpose, and available for discovery and re-use. For dynamic datasets this may mean continuous enrichment or updating to keep it fit for purpose. Higher levels of curation will also involve maintaining links with annotation and with other published materials. Archiving: A curation activity which ensures that data is properly selected, stored, can be accessed and that its logical and physical integrity is maintained over time, including security and authenticity. Preservation: An activity within archiving in which specific items of data are maintained over time so that they can still be accessed and understood through changes in technology.
UK funding councils   JISC Sub-Committees JISC Collections acting as platform for network-level services  &   helping to build the JISC Integrated Information Environment  research, learning & teaching in UK  universities & colleges Research  Councils UK National Data Centres
EDINA Management Board met yesterday  to review its 3-year Strategy and its Budget from JISC for the coming year
Reading & Reference Room: supporting scholarly communication the Depot  international Open Access   facility to support  self  deposit of peer-reviewed papers SUNCAT  UK serials union catalogue: what’s held where No longer host specialist Abstract & Index databases … EDINA Strategy in this area just reviewed following: RLUK/JISC Resource Discovery Task Force SCONUL Shared Services Business Case EDINA Focus Groups on ‘ ease and continuity of access’ Arts, Humanities & Social Sciences new technologies
authorisation licence  to use Ensuring  researchers, students and their teachers have ease and continuity of access to online scholarly resources ‘ ease’ ‘ continuity’ P.Burnhill, Edinburgh 2009 open restricted access   to content & services usability post-cancellation back content preservation Creative Commons  licensing discoverability Search (Re-)Use Modify/Combine Share (Issue/Publish) additional considerations Should apply to different types of resource:  typically  journal articles,  but also now OER learning materials, data etc
Finds the agencies looking after an e-journal, and the volumes being preserved
Geo-spatial resources: Map & Data Place
Multimedia resources: Sound & Pictures Show platform for search and download of film, video and audio wide range of subject coverage, including documentary film Licensed for use in learning, teaching and research Being re-worked as the Digital Media Hub, combining Film & Sound Online initial 600 hours of film, digitised for downloading NewsFilm Online 3000 hours of material from ITN & Reuters Over 4TB of clips to download Release of product from JISC Digitisation programmes Plus Education Image Gallery of still photography Visual and Sound Materials Portal project Discovering all sorts of audio-visual material Special interest for social science as non-print record of the 20th Century: the first A-V century With new forms of research material to use and to master
Defining the Social Sciences a collection of disciplines that variously apply theorising and systematic method to the study of human society  from family to politics, from law/religion to economy:  of what it is to be human and our interaction among ourselves and with our environment, whether on land, sea or the Internet Teaching draws upon schooling: social arithmetic of Qualified Empirical Statements We make  provisional  statements about the world  in the language of  our  theory and the context of time & place on basis of evidence derived from the  [real]  world conditioned by our theory and choice of systematic method seeking to qualify our statements with imperative to express our measures of uncertainty
Pattern of research publication in the social sciences ‘The Four Literatures of Social Science’ (Diana Hicks, 2004) Handbook of Quantitative Science and Technology Studies, Henk Moed (Ed.) All more trans-disciplinary than comparable scientific literatures international journal articles the SSCI indexed currency of evaluation books can have a high citation/impact national knowledge developed in context embedded in their society; influenced by national trends & policy concerns non-scholarly publications knowledge into application enlightenment or knowledge transfer to the non-scholarly public Hicks states Burnhill and Tubby-Hille (1994) “investigated this issue in some depth [with] publications database from [ESRC] grant reports [and] survey ... Assigning non peer reviewed journals to ... enlightenment ... suggests that psychologists, statisticians and geographers do not publish much in non-scholarly literature. Other fields do. Even economics, normally quite scientific in its publication patterns, exhibits a healthy percentage of articles in non-scholarly venues. Linguistics, education and sociology lead in share of non-scholarly publications.” ‘On measuring the relation between social science research activity and research publication’ Research Evaluation 4 (3) December 1994
Pattern of research publication in the social sciences Table from Burnhill and Tubby-Hille (1994) reproduced in Vasilakos et al (2007) ‘Evaluating the Performance of UK Research in Economics’, [sponsored by the Royal Economic Society] Keele Economics Research Papers, ISSN 1740-231X www.keele.ac.uk/depts/ec/kerp
Pattern of research publication in the social sciences  from Burnhill and Tubby-Hille (1994), not yet reproduced by anyone
Pattern of research publication in the social sciences Following the trace to Keele Economics Research Papers, ISSN 1740-231X www.keele.ac.uk/depts/ec/kerp led me to:
What’s special about social sciences: policy & action “philosophers have only interpreted the world, the point is to change it” Karl Marx (1845), Thesis 11 published in 1924 in German & Russian translation; in English in 1938; appeared in Engels’ edited version in 1888, as ‘Theses on Feuerbach’ Not the moment to debate origins of social science: of Hume, Ferguson, Smith, Hegel, Marx, Kant, Jung, Parsons, Durkheim, Popper etc – even Jeremy Bentham (UCL), nor of modern theorists, but along with development and shifts in theory … what is key is that … the practice of social science, and the modality of peer communication and publication in the discipline, has much to do with its connection to the urgency of interaction with agencies of civil society
Six Strategic Challenges: Global Economic Performance, Policy & Management Health & Well-being Environment, Energy & Resilience Security, Conflict & Justice Social Diversity & Population Dynamics New Technology, Innovation & Skills UK: ESRC  Strategic Plan  & Societal Big Challenges
Six Strategic Challenges: Global Economic Performance, Policy & Management Health & Well-being Environment, Energy & Resilience Security, Conflict & Justice Social Diversity & Population Dynamics New Technology, Innovation & Skills UK: ESRC  Strategic Plan  & Societal Big Challenges 7. Public Debt & the ConDem Government
Data as scholarship: a cultural shift? Preserve or Perish “ You are not finished until you have done the research, published the results,  and  published the data, receiving formal credit for everything.” Mark A. Parsons (2006) International Polar Year “ A scholar’s positive contribution is measured by the sum of the original data that he contributes. Hypotheses come and go but data remain.” in  Advice to a Young Investigator  (1897) Santiago Ramón y Cajal  (Nobel Prize winner, 1906)
What’s special about social sciences:  third party data Demand for data to carry out  secondary data analysis   Social sciences do not generate all the data they need to address their research questions Do not command the resources (funding/expertise)  few research groups and Government could get funding to manufacture original data ESRC-led  National Data Strategy, 14 Actions: potential research value of new types of data (transactions data and tracking records)  new data infrastructures  via EU and Euro Strategy Forum for Research Infrastructures improved access to Census of Population data a geo-spatial resources advisory service (JISC/ESRC)  collaborative agreements with agencies within and outside UK sharing of data resources across ‘North/South’ global networks Explains why data libraries and archives have been around so long IASSIST   International Association for Social Science Information Service & Technology annual conference since 1974;  www.iassistdata.org DISC-UK  a group of data libraries in UK universities  (including EUDL) Providing  ease of access to data held elsewhere  (including UKDA) Datashare  project to support institutional responsibilities for data  alongside  Institutional Repositories
Note: Not all that is digital are data  (& vice versa) Data derive importance from their evidential value the empirical base  for (scholarly) statement & decision-making Provenance (how data are derived) is very important Differences in ways that disciplines in Humanities & Social Sciences assess scholarship and evidence in what they regard as data, as value for their subject mix of approach to  epistemology, inc  document tradition Data represented (encoded) as numbers or words  - often derived from observation (with issues of  phenomenology!) or as pictures or sounds  (not encoded - pre-data?) access to (now digitally/digitised) record of experience or algorithmic models  (as with physical & life sciences) modelling is widespread in economics, psychology, social statistics, geography etc
Our shared task: To ensure ease & continuing access  to record of scholarship research publications and research data  Consider at least three types of (research) data:  Supplementary data  [enhanced publication] multimedia files:  part of the published article that presents research argument and conclusions more than linear text, limited tabular and graphical display enhances user experience with various multimedia objects Research dataset(s)  upon which conclusions based check analysis of those data to support statements made  Database(s)  from which datasets were assembled for reproducibility (exposure to refutation) and new work  via alternative analysis and updates to the database(s) these are curated  in situ – by data centres / originators
Citation and linking Citation of the datasets used (Type B data) verification of analysis, that the figures and conclusions accurately reflect those data Citation of database(s) (Type C data) for reproducibility (exposure to refutation) to prompt new work via alternative analysis and updates to the database(s) to credit those who curate the data needed for scholarship Plus hyperlink to the database from the published article … and back again from the database to the published article Links to presentations, blogs, websites, funders etc related to the same research activity and same researcher(s) (Type D data?)
Obtaining the citation at source CIESIN “Most of our datasets and products contain a suggested citation on the Web site as to where the data was obtained” “Whenever possible, we urge you to cite the use of data and web resources in the reference section” http://sedac.ciesin.columbia.edu/citations/ How to Cite Statistics Canada Products: “This guide has been developed for authors, editors, researchers, academics, students, librarians and data librarians. It describes, in three steps, how to build your reference when citing Statistics Canada products” http://www.statcan.gc.ca/pub/12-591-x/12-591-x2006001-eng.htm Get it from those who make the data available: the data publishers cf Cataloguing in Publication!
Link remains the key verb But need to shift attention from Linking resolver   (unidirectional) From metadata reference to full text of article SICI-Citation | Z39.50 DOI | OpenURL | http to Linked Data  (relational, bi-directional)  Between resources in the weave of the Web Using URIs as names for things  Not just URLs (the addresses on the web) but the URIs Using RDF/XML to define the relationships between the resources  RDF triples: subject / relationship / object
Resource Description Framework  (RDF) Resource Description Framework (RDF), and URIs framework for representing information in Web; identifiers http://www.w3.org/TR/rdf-concepts/  http://www.w3.org/TR/rdf-primer/
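A minimal sketch of the subject / relationship / object idea, and of why such links can be followed in both directions. It assumes made-up example.org URIs for one article and one dataset, and borrows the Dublin Core terms `references` and `isReferencedBy` as the relationship URIs. Plain Python tuples stand in for an RDF store; none of this is taken from the systems shown in the talk.

```python
# Hypothetical URIs (assumptions): names for things, not necessarily pages that resolve.
ARTICLE = "http://example.org/article/123"
DATASET = "http://example.org/dataset/456"

# Dublin Core terms used here as the relationship (predicate) URIs.
REFERENCES = "http://purl.org/dc/terms/references"
IS_REFERENCED_BY = "http://purl.org/dc/terms/isReferencedBy"

# Each triple is (subject, relationship, object).
triples = [
    (ARTICLE, REFERENCES, DATASET),        # article -> data
    (DATASET, IS_REFERENCED_BY, ARTICLE),  # data -> article ("and back again")
]


def objects(subject, relationship):
    """Return every object linked from `subject` by `relationship`."""
    return [o for s, r, o in triples if s == subject and r == relationship]


def to_ntriples(rows):
    """Serialise the triples as N-Triples lines (all three terms are URIs here)."""
    return "\n".join(f"<{s}> <{r}> <{o}> ." for s, r, o in rows)


print(objects(ARTICLE, REFERENCES))
print(objects(DATASET, IS_REFERENCED_BY))
print(to_ntriples(triples))
```

Unlike a link resolver, which only leads from a reference to the full text, the same small graph answers the question from either end.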
RDF graph: Article & Supplementary Data  http://www.emeraldinsight.com/fig/0350570303002.png Build and publish as metadata in XML format to be found on the web Publishing text and data/multimedia content in XML will delight researchers Researchers want to access ‘article as data’, via computational algorithm
uses Linked Data
Parse to ‘mark up’ archaeological site record (metadata)
 
 
Enriching resources with contextual metadata: overcoming the sparse metadata problem that inhibits discoverability, using ancillary information in the metadata, evoking ‘has Event’ relation. Initial focus on (digitised) 20th Century newsfilm footage. Sparse Metadata, the only data we have: 1st October 1995; Cyprus; Disturbance (street disturbance); British soldiers; Broadcast on TV News
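One way to picture the enrichment step, as a sketch only: take the sparse record from the slide (date, place, a couple of keywords) and attach a `hasEvent` relation whenever an external event matches on place and date. The event list, its identifiers and the matching rule are all invented placeholders for illustration; the project's actual method is not described here.

```python
from datetime import date

# The sparse record from the slide: a date, a place and a few keywords.
clip = {
    "date": date(1995, 10, 1),
    "place": "Cyprus",
    "keywords": ["street disturbance", "British soldiers"],
}

# Hypothetical ancillary source of dated, placed events (invented placeholders).
events = [
    {"id": "event-A", "place": "Cyprus", "start": date(1995, 9, 28), "end": date(1995, 10, 3)},
    {"id": "event-B", "place": "Cyprus", "start": date(1996, 8, 11), "end": date(1996, 8, 14)},
    {"id": "event-C", "place": "Malta", "start": date(1995, 10, 1), "end": date(1995, 10, 1)},
]


def enrich(record, candidates):
    """Return a copy of `record` with a 'hasEvent' list of matching event ids."""
    matches = [
        e["id"]
        for e in candidates
        if e["place"] == record["place"] and e["start"] <= record["date"] <= e["end"]
    ]
    return {**record, "hasEvent": matches}


enriched = enrich(clip, events)
print(enriched["hasEvent"])
```

The original record is left untouched, so the added context stays distinguishable from what the archive actually supplied.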
finding related text  for mining and so  auto-creating metadata to improve discoverability  and  provide/enhance context
 
Digital Library as applied Information Science Michael Buckland, Presidential Address, American Society for Information Science, JASIS’s 50th (1998) 2 traditions/mentalities co-exist in Information Science Document tradition: signifying record-ness Computational tradition: various uses of formal techniques non-convergent mentalities working to build the ‘digital library’ modernisation of library services infrastructure to access complex databases Aside: first met Clifford Lynch when visiting Professor Buckland in UC Berkeley on occasion of IASSIST Conference in 1994
[email_address] http://edina.ac.uk Tel.: +44 (0)131 650 3302 Fax: +44 (0)131 650 3308 Time for me to stop …  Hoping that I have left some space/place for questions Thank you Acknowledgements
ISSN Register E-J Preservation Registry Service E-Journal Preservation Registry SERVICES: user requirements (a) (b) Data dependency Piloting an E-journals Preservation Registry Service METADATA on extant e-journals METADATA on preservation action Abstract Data Model: Figure 1 in reference paper in Serials, March 2009 Digital Preservation Agencies e.g. CLOCKSS, Portico; BL, KB; UK LOCKSS Alliance etc.
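The data dependency in that abstract model can be sketched as a join between the two sets of metadata: one describing extant e-journals, one describing preservation action. The sketch assumes the ISSN is the shared key; the class names, field names, ISSNs, titles and agencies below are placeholders, not the registry's real schema or content.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Journal:
    """Metadata on an extant e-journal (hypothetical fields)."""
    issn: str
    title: str


@dataclass(frozen=True)
class PreservationAction:
    """Metadata on preservation action by one agency (hypothetical fields)."""
    issn: str
    agency: str
    volumes: tuple


# Placeholder rows standing in for the ISSN Register and for agency reports.
journals = {
    j.issn: j
    for j in [Journal("0000-0001", "Example Journal A"), Journal("0000-0002", "Example Journal B")]
}
actions = [
    PreservationAction("0000-0001", "Agency X", (1, 2, 3)),
    PreservationAction("0000-0001", "Agency Y", (2, 3)),
]


def who_preserves(issn):
    """Map each agency looking after the journal to the volumes it preserves."""
    if issn not in journals:
        raise KeyError(f"ISSN not in register: {issn}")
    return {a.agency: list(a.volumes) for a in actions if a.issn == issn}


print(who_preserves("0000-0001"))
print(who_preserves("0000-0002"))
```

An empty result for a registered journal is itself informative: the title is known to exist but no agency has reported preserving it.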
Author (article) Reader (article) Publisher article serial issue Library (serial) Licence Challenge to Ensure Continuing Access peer review peer exchange Informal: ‘invisible college’ and the ‘gift economy’ Institutional arrangement Licensed Online Access Forma£ Economy ILL/docdel Continuity of access learned society Long term digital preservation E-prints Institutional Repositories free to web access E-prints Subject Repositories
Author (article) Reader (article) Publisher article serial issue Library (serial) Licence Increasing dominance of The Web peer to peer exchange Informal: ‘invisible college’ and the ‘gift economy’ Institutional arrangement Forma£ Economy free to web access Role of Institutional Repositories? Web 2.0/3.0: Semantic web mash-ups, Blogs, RSS feeds, Wikis
Research Data Creator Researcher Generates (curates) data for own purpose, or as part of team … wants/has to ‘put’ it somewhere for use by others (perhaps to be recognised by a peer community) Key User (Researcher) Verbs: Discover data of interest Locate service on that data with documentation on provenance etc Request permission to use service Access to service/data Evidential value of data in analysis as ‘object of desire’
…… .. The term “curation” builds on our understanding of the word “curator”, somebody who keeps something for the public good, whose value often needs to be brought out by the curator. Firstly, this open context implies more support for explicit policies with regard to data sharing, and it has major implications for structuring and tools.  Secondly, the digital curator is store-keeper but he is also closely linked to promoting new science, making sure that his user-base is solid, sufficient, and looking forward to identify new ways to serve present and future researchers. The digital curator should take an active role in promoting and adding value to his holdings, hold exhibitions, run joint events; he should manage the value of his collection.
More definitions There does seem to be a lack of clarity. Some terms worth distinguishing are:  data preservation  : a general term probably equivalent to digital preservation in this context  digital preservation  : could be, and probably is, interpreted as simply ensuring the original bits and bytes are accessible  digital information preservation  : this is what is referred to in the OAIS standard - what is important is not the original "bits and bytes" but the content. An OAIS ensures that the content is accessible, understandable and usable.  curation  : general term - taking care of things  if someone currently calls themselves a “curator” – do we accept their definition? data curation  : looking after and adding value to data  digital curation  : looking after and somehow "adding value" to digital data. This probably implies creating some new data from the existing, in order to make the latter more useful and "fit for purpose".  information curation  : not seen in the wild evidence  : bit preservation plus authenticity and trust?
licence  to use ‘ ease’ P.Burnhill, Edinburgh 2009 usability open preservation post-cancellation back content restricted access   to content & services who/WAYF authentication licence registry entitlement history location registry/discovery content registry archiving registry UKAMF registry Suncat &  Zetoc OpenURL Router [Curation is additional but has relation to ease and continuing access.]  Use case: article–length work published in e-journals ISSN Register as a key content registry; need registry of ToC   ‘ continuity’ Ensuring  researchers, students and their teachers have ease and continuity of access to online scholarly resources

More Related Content

What's So Special about the Social Sciences

  • 1. What’s so special about the social sciences? Peter Burnhill Director, EDINA national academic data centre, University of Edinburgh, Scotland UK Bloomsbury Conference on e-publishing and e-publications University College London, 24/25 June 2010
  • 2. short answer: Some things but not everything Overview for a longer answer Autobiographic Apologia Yesterday and Yesteryears Research publication and data for research in the social sciences some evidence (some old and re-used) all that is digital are not data Societal Big Challenges a sense of place Our shared task ease and continuity of access citation and linking Linked Data: Semantic Web anyone? Socio-Informatics & the Internet Will not take the full two hours ….
  • 3. Social Science Research Council [now ESRC] ‘ Scientific Officer’ for Economic & Social History and Statistics (left to do MSc Statistics at London School of Economics) Scottish Education Data Archive, until mid ‘80s Survey statistician: school leavers, YTS, 16-19 cohort surveys; demand for HE Graduate School, Faculty of Social Science, 1987 – 1997 Senior Lecturer, teaching quantitative/survey methods Edinburgh University Data Library, mid- 1980s & on Manager: set-up and development President of IASSIST, 1997 – 2001: social science data professionals ESRC Regional Research Laboratory for Scotland 1986/90 Co-director: early days of Geographical Information Systems (GIS) EDINA national data centre, mid-1990s to present: my day job Director: set-up and continuous development Digital Curation Centre, 2004/05 as Interim Director set-up & definition of ‘data curation + digital preservation’ autobiography as commentary
  • 4. digital curation: ... digital objects and data, over their life-cycle, for current & future generations of use ... = f(data curation & digital preservation) data curation [when high current/ongoing interest] actions needed to maintain and utilise digital data & research results over entire life-cycle data creation & management; adding value; generating new sources of information & knowledge, for use digital preservation [for longevity;fall off in interest] long-run technological/legal accessibility & usability storage, maintenance & accessibility of information content in digital material over the long-term, for use OAIS concept of designated community What is this digital curation anyway? Taken from a PPT to JISC in July 2004 …
  • 5. Taken from a PPT to JISC in July 2004 … The term Digital Curation is a rather recent invention . The Digital Data Curation Task Force - Report of the Task Force Strategy Discussion Day (2002) states Tony Hey took up the term which had been used by Dr John Taylor, Director General of the Research Councils, to distinguish the actions involved in caring for digital data beyond its original use, from digital preservation. The concept’s reach extends beyond libraries. The e-Science Curation Report (2003) proposed the following distinctions: Curation : The activity of, managing and promoting the use of data from its point of creation, to ensure it is fit for contemporary purpose, and available for discovery and re-use. For dynamic datasets this may mean continuous enrichment or updating to keep it fit for purpose. Higher levels of curation will also involve maintaining links with annotation and with other published materials. Archiving : A curation activity which ensures that data is properly selected, stored, can be accessed and that its logical and physical integrity is maintained over time, including security and authenticity. Preservation : An activity within archiving in which specific items of data are maintained over time so that they can still be accessed and understood through changes in technology.
  • 6. UK funding councils JISC Sub-Committees JISC Collections acting as platform for network-level services & helping to build the JISC Integrated Information Environment research, learning & teaching in UK universities & colleges Research Councils UK National Data Centres
  • 7. EDINA Management Board met yesterday to review its 3-year Strategy and its Budget from JISC for the coming year
  • 8. Reading & Reference Room: supporting scholarly communication the Depot international Open Access facility to support self deposit of peer-reviewed papers SUNCAT UK serials union catalogue: what’s held where No longer host specialist Abstract & Index databases … EDINA Strategy in this area just reviewed following: RLUK/JISC Resource Discovery Task Force SCONUL Shared Services Business Case EDINA Focus Groups on ‘ ease and continuity of access’ Arts, Humanities & Social Sciences new technologies
  • 9. Reading & Reference Room: supporting scholarly communication the Depot international Open Access facility to support self deposit of peer-reviewed papers SUNCAT UK serials union catalogue: what’s held where No longer host specialist Abstract & Index databases … EDINA Strategy in this area just reviewed following: RLUK/JISC Resource Discovery Task Force SCONUL Shared Services Business Case EDINA Focus Groups on ‘ ease and continuity of access’ Arts, Humanities & Social Sciences new technologies
  • 10. authorisation licence to use Ensuring researchers, students and their teachers have ease and continuity of access to online scholarly resources ‘ ease’ ‘ continuity’ P.Burnhill, Edinburgh 2009 open restricted access to content & services usability post-cancellation back content preservation Creative Commons licensing discoverability Search (Re-)Use Modify/Combine Share (Issue/Publish) additional considerations Should apply to different types of resource: typically journal articles, but also now OER learning materials, data etc
  • 11. Finds the agencies looking after e-journal, and the volumes being preserved
  • 13. Multimedia resources: Sound & Pictures Show platform for search and download of film, video and audio wide range of subject coverage, including documentary film Llicensed for use in learning, teaching and research Being re-worked as the Digital Media Hub, combining Film & Sound Online initial 600 hours of film, digitised for downloading NewsFilm Online 3000 hours of material from ITN & Reuters Over 4TBs of clips to download Release of product from JISC Digitisation programmes Plus Education Image Gallery of still photography Visual and Sound Materials Portal project Discovering all sorts of audio-visual material Special interest for social science as record on non-print record of 20 th Century: the first A-V century With new forms of research material to use and to master
  • 14. Defining the Social Sciences a collection of disciplines that variously apply theorising and systematic method to the study of human society from family to politics, from law/religion to economy: of what it is to be human and our interaction among ourselves and with our environment, whether on land, sea or the Internet Teaching draws upon schooling: social arithmetic of Qualified Empirical Statements We make provisional statements about the world in the language of our theory and the context of time & place on basis of evidence derived from the [real] world conditioned by our theory and choice of systematic method seeking to qualify our statements with imperative to express our measures of uncertainty
  • 15. Pattern of research publication in the social sciences ‘ The Four Literatures of Social Science’ (Diana Hicks, 2004) Handbook of Quantitative Science and Technology Studies, Henk Moed (Ed ) All more trans-disciplinary than comparable scientific literatures international journal articles the SSCI indexed currency of evaluation books can have a high citation/impact national knowledge developed in context embedded in their society ; influenced by national trends & policy concerns non-scholarly publications knowledge into application enlightenment or knowledge transfer to the non-scholarly public Hicks states Burnhill and Tubby-Hille (1994) “investigated this issue in some depth [with] publications database from [ESRC] grant reports [and] survey .. .. Assigning non peer reviewed journals to .. enlightenment .. suggests that psychologists, statisticians and geographers do not publish much in non-scholarly literature. Other fields do. Even economics, normally quite scientific in its publication patterns, exhibits a healthy percentage of articles in non-scholarly venues. Linguistics, education and sociology lead in share of non-scholarly publications.” ‘ On measuring the relation between social science research activity and research publication’ Research Evaluation 4 (3) December 1994
  • 16. Pattern of research publication in the social sciences ‘ The Four Literatures of Social Science’ (Diana Hicks, 2004) Handbook of Quantitative Science and Technology Studies, Henk Moed (Ed ) international journal articles books national embedded in their society non-scholarly publications enlightenment or knowledge transfer to the non-scholarly public Hicks states Burnhill and Tubby-Hille (1994) “investigated this issue in some depth [with] publications database from [ESRC] grant reports [and] survey .. .. Assigning non peer reviewed journals to .. enlightenment .. suggests that psychologists, statisticians and geographers do not publish much in non-scholarly literature. Other fields do. Even economics , normally quite scientific in its publication patterns, exhibits a healthy percentage of articles in non-scholarly venues. Linguistics, education and sociology lead in share of non-scholarly publications. ” ‘ On measuring the relation between social science research activity and research publication’ Research Evaluation 4 (3) December 1994
  • 17. Pattern of research publication in the social sciences Table from Burnhill and Tubby-Hille (1994) reproduced in Vasilakos et al (2007) ‘Evaluating the Performance of UK Research in Economics’, [sponsored by the Royal Economic Society] Keele Economics Research Papers, ISSN 1740-231X www.keele.ac.uk/depts/ec/kerp
  • 18. Pattern of research publication in the social sciences from Burnhill and Tubby-Hille (1994), not yet reproduced by anyone
  • 19. Pattern of research publication in the social sciences Following the trace to Keele Economics Research Papers, ISSN 1740-231X www.keele.ac.uk/depts/ec/kerp led me to:
  • 20. What’s special about social sciences: policy & action “philosophers have only interpreted the world, the point is to change it” Karl Marx (1845), Thesis 11 published in 1924 in German & Russian translation; in English in 1938 appeared in Engels’ edited version in 1888, as ‘Theses on Feuerbach’ Not the moment to debate origins of social science: of Hume, Ferguson, Smith, Hegel, Marx, Kant, Jung, Parsons, Durkheim, Popper etc – even Jeremy Bentham (UCL), nor of modern theorists, but along with development and shifts in theory … what is key is that … the practice of social science, and the modality of peer communication and publication in the discipline, has much to do with its connection to the urgency of interaction with agencies of civil society
  • 21. Six Strategic Challenges: Global Economic Performance, Policy & Management Health & Well-being Environment, Energy & Resilience Security, Conflict & Justice Social Diversity & Population Dynamics New Technology, Innovation & Skills UK: ESRC Strategic Plan & Societal Big Challenges
  • 22. Six Strategic Challenges: Global Economic Performance, Policy & Management Health & Well-being Environment, Energy & Resilience Security, Conflict & Justice Social Diversity & Population Dynamics New Technology, Innovation & Skills UK: ESRC Strategic Plan & Societal Big Challenges 7. Public Debt & the ConDem Government
  • 23. Data as scholarship: a cultural shift? Preserve or Perish “You are not finished until you have done the research, published the results, and published the data, receiving formal credit for everything.” Mark A. Parsons (2006) International Polar Year “A scholar’s positive contribution is measured by the sum of the original data that he contributes. Hypotheses come and go but data remain.” in Advice to a Young Investigator (1897) Santiago Ramón y Cajal (Nobel Prize winner, 1906)
  • 24. What’s special about social sciences: third party data Demand for data to carry out secondary data analysis Social sciences do not generate all the data they need to address their research questions Do not command the resources (funding/expertise) few research groups and Government could get funding to manufacture original data ESRC-led National Data Strategy, 14 Actions: potential research value of new types of data (transactions data and tracking records) new data infrastructures via EU and Euro Strategy Forum for Research Infrastructures improved access to Census of Population data a geo-spatial resources advisory service (JISC/ESRC) collaborative agreements with agencies within and outside UK sharing of data resources across ‘North/South’ global networks Explains why data libraries and archives have been around so long IASSIST International Association for Social Science Information Service & Technology annual conference since 1974; www.iassistdata.org DISC-UK a group of data libraries in UK universities (including EUDL) Providing ease of access to data held elsewhere (including UKDA) Datashare project to support institutional responsibilities for data alongside Institutional Repositories
  • 25. Note: Not all that is digital are data (& vice versa) Data derive importance from their evidential value the empirical base for (scholarly) statement & decision-making Provenance (how data are derived) is very important Differences in ways that disciplines in Humanities & Social Sciences assess scholarship and evidence in what they regard as data, as value for their subject mix of approach to epistemology, inc document tradition Data represented (encoded) as numbers or words - often derived from observation (with issues of phenomenology!) or as pictures or sounds (not encoded - pre-data?) access to (now digitally/digitised) record of experience or algorithmic models (as with physical & life sciences) modelling is widespread in economics, psychology, social statistics, geography etc
  • 26. Our shared task: To ensure ease & continuing access to record of scholarship research publications and research data Consider at least three types of (research) data: Type A: Supplementary data [enhanced publication] multimedia files: part of the published article that presents research argument and conclusions more than linear text, limited tabular and graphical display enhances user experience with various multimedia objects Type B: Research dataset(s) upon which conclusions based check analysis of those data to support statements made Type C: Database(s) from which datasets were assembled for reproducibility (exposure to refutation) and new work via alternative analysis and updates to the database(s) these are curated in situ – by data centres / originators
  • 27. Citation and linking Citation of the datasets used (Type B data) verification of analysis, that the figures and conclusions accurately reflect those data Citation of database(s) (Type C data) for reproducibility (exposure to refutation) to prompt new work via alternative analysis and updates to the database(s) to credit those who curate the data needed for scholarship Plus hyperlink to the database from the published article … and back again from the database to the published article Links to presentations, blogs, websites, funders etc related to the same research activity and same researcher(s) (Type D data?)
  • 28. Obtaining the citation at source CIESIN “Most of our datasets and products contain a suggested citation on the Web site as to where the data was obtained” “Whenever possible, we urge you to cite the use of data and web resources in the reference section” http://sedac.ciesin.columbia.edu/citations/ How to Cite Statistics Canada Products: “This guide has been developed for authors, editors, researchers, academics, students, librarians and data librarians. It describes, in three steps, how to build your reference when citing Statistics Canada products” http://www.statcan.gc.ca/pub/12-591-x/12-591-x2006001-eng.htm Get it from those who make the data available: the data publishers cf Cataloguing in Publication!
  • 29. Link remains the key verb But need to shift attention from Linking resolver (unidirectional) From metadata reference to full text of article SICI-Citation | Z39.50 DOI | OpenURL | http to Linked Data (relational, bi-directional) Between resources in the weave of the Web Using URIs as names for things Not just URLs (the addresses on the web) but the URIs Using RDF/XML to define the relationships between the resources RDF triples: subject / relationship / object
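The linking-resolver model named on this slide can be illustrated with a minimal sketch, assuming an OpenURL 1.0 (Z39.88-2004) key/encoded-value query for a journal article. The resolver address, the ISSN and the function name `build_openurl` are placeholders introduced for illustration only; the volume, issue and year are those of the Research Evaluation article cited on slide 15.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Hypothetical resolver address; a real one would belong to a library or service.
RESOLVER_BASE = "https://resolver.example.org/openurl"


def build_openurl(issn, volume, issue, year, atitle=None):
    """Return an OpenURL 1.0 key/encoded-value query describing a journal article."""
    params = [
        ("url_ver", "Z39.88-2004"),
        ("rft_val_fmt", "info:ofi/fmt:kev:mtx:journal"),
        ("rft.genre", "article"),
        ("rft.issn", issn),
        ("rft.volume", volume),
        ("rft.issue", issue),
        ("rft.date", year),
    ]
    if atitle:
        params.append(("rft.atitle", atitle))
    # urlencode percent-encodes reserved characters in the values.
    return RESOLVER_BASE + "?" + urlencode(params)


# "0000-0000" is a placeholder ISSN, not the journal's real one.
url = build_openurl("0000-0000", "4", "3", "1994")
print(url)

# The link points one way only: from the metadata reference towards the full text.
fields = parse_qs(urlsplit(url).query)
print(fields["rft.volume"], fields["rft.date"])
```

Nothing in such a query lets the target resource point back at whatever cited it, which is the limitation the slide contrasts with Linked Data.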
  • 30. Resource Description Framework (RDF) Resource Description Framework (RDF), and URIs framework for representing information in Web; identifiers http://www.w3.org/TR/rdf-concepts/ http://www.w3.org/TR/rdf-primer/
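As a rough illustration of the subject / relationship / object idea, the sketch below links an article and a dataset in both directions and writes the result as N-Triples. The two `example.org` URIs and the helper names are hypothetical; the Dublin Core terms `references` and `isReferencedBy` are one possible choice of relationship vocabulary, not one prescribed by the slides.

```python
# Hypothetical URIs standing in for a published article and the dataset it analysed.
ARTICLE = "http://example.org/article/123"
DATASET = "http://example.org/dataset/456"

# Dublin Core terms, used here as an assumed relationship vocabulary.
REFERENCES = "http://purl.org/dc/terms/references"
IS_REFERENCED_BY = "http://purl.org/dc/terms/isReferencedBy"


def link_both_ways(article, dataset):
    """Return two triples: article -> dataset and dataset -> article."""
    return [
        (article, REFERENCES, dataset),
        (dataset, IS_REFERENCED_BY, article),
    ]


def to_ntriples(triples):
    """Serialise (subject, relationship, object) URI triples, one statement per line."""
    return "\n".join(f"<{s}> <{p}> <{o}> ." for s, p, o in triples)


def related(triples, uri):
    """List the objects that the given subject URI points to."""
    return [o for s, _, o in triples if s == uri]


triples = link_both_ways(ARTICLE, DATASET)
print(to_ntriples(triples))
print(related(triples, ARTICLE))  # the dataset, reached from the article
print(related(triples, DATASET))  # the article, reached from the dataset
```

Because both resources are named by URIs and each direction is an explicit statement, a reader starting from either the article or the dataset can follow the link to the other.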
  • 31. RDF graph: Article & Supplementary Data http://www.emeraldinsight.com/fig/0350570303002.png Build and publish as metadata in XML format to be found on the web Publishing text and data/multimedia content in XML will delight researchers Researchers want to access ‘article as data’, via computational algorithm
  • 34. Parse to ‘mark up’ archaeological site record (metadata)
  • 37. Overcoming sparse metadata problem that inhibits discoverability using ancillary information in the metadata evoking ‘has Event’ relation Initial focus on (digitised) 20th Century newsfilm footage Enriching resources with contextual metadata Sparse Metadata The only data we have: 1st October 1995 Cyprus Disturbance (street disturbance) British soldiers Broadcast on TV News
  • 38. finding related text for mining and so auto-creating metadata to improve discoverability and provide/enhance context
  • 40. Digital Library as applied Information Science Michael Buckland, Presidential Address, American Society for Information Science, JASIS’s 50th (1998) 2 traditions/mentalities co-exist in Information Science Document tradition: signifying record-ness Computational tradition: various uses of formal techniques non-convergent mentalities working to build the ‘digital library’ modernisation of library services infrastructure to access complex databases Aside: first met Clifford Lynch when visiting Professor Buckland in UC Berkeley on occasion of IASSIST Conference in 1994
  • 41. [email_address] http://edina.ac.uk Tel.: +44 (0)131 650 3302 Fax: +44 (0)131 650 3308 Time for me to stop … Hoping that I have left some space/place for questions Thank you Acknowledgements
  • 43. ISSN Register E-J Preservation Registry Service E-Journal Preservation Registry SERVICES: user requirements (a) (b) Data dependency Piloting an E-journals Preservation Registry Service METADATA on extant e-journals METADATA on preservation action Abstract Data Model: Figure 1 in reference paper in Serials, March 2009 Digital Preservation Agencies e.g. CLOCKSS, Portico; BL, KB; UK LOCKSS Alliance etc.
  • 44. Author (article) Reader (article) Publisher article serial issue Library (serial) Licence Challenge to Ensure Continuing Access peer review peer exchange Informal: ‘invisible college’ and the ‘gift economy’ Institutional arrangement Licensed Online Access Forma£ Economy ILL/docdel Continuity of access learned society Long term digital preservation E-prints Institutional Repositories free to web access E-prints Subject Repositories
  • 45. Author (article) Reader (article) Publisher article serial issue Library (serial) Licence Increasing dominance of The Web peer to peer exchange Informal: ‘invisible college’ and the ‘gift economy’ Institutional arrangement Forma£ Economy free to web access Role of Institutional Repositories? Web 2.0/3.0: Semantic web mash-ups, Blogs, RSS feeds, Wikis
  • 46. Research Data Creator Researcher Generates (curates) data for own purpose, or as part of team … wants/has to ‘put’ it somewhere for use by others (perhaps to be recognised by a peer community) Key User (Researcher) Verbs: Discover data of interest Locate service on that data with documentation on provenance etc Request permission to use service Access to service/data, Evidential value of data in analysis as ‘object of desire’
  • 47. …… .. The term “curation” builds on our understanding of the word “curator”, somebody who keeps something for the public good, whose value often needs to be brought out by the curator. Firstly, this open context implies more support for explicit policies with regard to data sharing, and it has major implications for structuring and tools. Secondly, the digital curator is store-keeper but he is also closely linked to promoting new science, making sure that his user-base is solid, sufficient, and looking forward to identify new ways to serve present and future researchers. The digital curator should take an active role in promoting and adding value to his holdings, hold exhibitions, run joint events; he should manage the value of his collection.
  • 48. More definitions There does seem to be a lack of clarity. Some terms worth distinguishing are: data preservation : a general term probably equivalent to digital preservation in this context digital preservation : could be, and probably is, interpreted as simply ensuring the original bits and bytes are accessible digital information preservation : this is what is referred to in the OAIS standard - what is important is not the original "bits and bytes" but the content. An OAIS ensures that the content is accessible, understandable and usable. curation : general term - taking care of things if someone currently calls themselves a “curator” – do we accept their definition? data curation : looking after and adding value to data digital curation : looking after and somehow "adding value" to digital data. This probably implies creating some new data from the existing, in order to make the latter more useful and "fit for purpose". information curation : not seen in the wild evidence : bit preservation plus authenticity and trust?
  • 49. licence to use ‘ ease’ P.Burnhill, Edinburgh 2009 usability open preservation post-cancellation back content restricted access to content & services who/WAYF authentication licence registry entitlement history location registry/discovery content registry archiving registry UKAMF registry Suncat & Zetoc OpenURL Router [Curation is additional but has relation to ease and continuing access.] Use case: article–length work published in e-journals ISSN Register as a key content registry; need registry of ToC ‘ continuity’ Ensuring researchers, students and their teachers have ease and continuity of access to online scholarly resources

Editor's Notes

  1. As many of you will know, JISC is the Joint Information Systems Committee of the UK funding bodies for higher and further education. It has a number of sub-committees which help inform policy and also watch over programmes of funding and the operation of services, such as those provided by the two National Data Centres. It has also set up a company, JISC Collections, as a legal body to broker licences.
  2. EDINA may be less familiar, at least to all of you. It is a national academic data centre, established in 1995 following the success of the University of Edinburgh putting forward its Data Library in an open competition to set up three data centres capable of hosting and providing access to bibliographic datasets and numeric research data. The other two were BIDS, which subsequently moved into the private sector as Ingenta, and MIDAS, the data centre at the University of Manchester, now renamed Mimas. The mission of EDINA, which incidentally is the older poetic name for Edinburgh, is to enhance productivity of research, learning and teaching in the UK. It used to host a range of key A&I databases like BIOSIS Previews, Compendex, Inspec, Art Abstracts etc, but now the services on journal …. As you can see, EDINA is funded by JISC … <click>
  3. This illustrates how, in the Sociology of Science, evidence from one decade is re-used in the next, especially since this relates to research awards made in 1984/5. See also the role played by ‘research papers’, even given an ISSN.
  4. Digital is a medium, which along with telematics makes different things possible, but why conflate digital media with data? A digital surrogate of an analogue work may be regarded as ‘data’, but only to the extent to which the analogue work was previously thought of as having evidential value.
  5. Type A: if it's part of the published work then should we look to the preservation agencies, the national deposit libraries and CLOCKSS, LOCKSS and Portico, for access over the longer term? And do publishers see these as costly files to maintain in the short to medium term? Or do publishers want to hand the responsibility to subject and institutional data repositories? Type B: and with knowledge of UKDA, ADS but also the (growing but problematic ??) call for data to be held in institutional repositories, some recommendations on what is the 'right thing' to do, and how that can be done with ease - a Repository Junction task! For Type C, I intend to propose what editors should require by way of citation and URL link. I am on the hunt for such editorial practice.
  6. Focus here on ‘article-length’ work rather than longer working papers or book-length work, nor correspondence, annotation & criticism, nor text books.