The open repositories community has made great strides in recent years in addressing interoperability and policy, and in making the case for open access and sharing. One aspect of open research that has come to prominence is the importance of software as a fundamental part of reproducible research, which in turn raises issues around the preservation of software.
In this short presentation, I will describe some of the work that the Software Sustainability Institute (SSI) has been doing to address the structural and policy issues which currently present a barrier to the deposit and use of software in open repositories.
Where does it go from here? The role of software in digital repositories
1. www.software.ac.uk
Where does it go from here?
The Place of Software in Digital Repositories
12 July 2012
OR2012, Edinburgh
Neil Chue Hong (@npch)
N.ChueHong@software.ac.uk
Software Sustainability Institute
2. Software is pervasive in research
3. The Software Sustainability Institute
A national facility for building better software
• Better software enables better research
• Software reaches boundaries in its development cycle that prevent improvement, growth and adoption
• Providing the expertise and services needed to negotiate to the next stage
• Software reviews and refactoring, collaborations to develop your project, guidance and best practice on software development, project management, community building, publicity and more…
Supported by EPSRC Grant EP/H043160/1
4. Software Sustainability: preservation vs sustainability
Sustainability?
Image courtesy of London Permaculture under CC-by-nc-sa license
Image courtesy of Mortati under CC-by-nc-nd
Preservation?
5. Why are you considering software sustainability?
Purpose:
- Achieve legal compliance
- Create heritage value
- Enable continued access to data
- Encourage software reuse
JISC-funded, with Curtis+Cartwright
http://www.software.ac.uk/resources/preserving-software-resources
6. How are you going to choose the right approach?
Approach:
- Preservation (techno-centric)
- Emulation (data-centric)
- Migration (functionality-centric)
- Transition (process-centric)
- Hibernation (knowledge-centric)
- Deprecation
7. Software Carpentry
www.software.ac.uk
• Helping scientists be more productive by
teaching them basic computing skills
• How to use
repositories
properly
is a key skill
• http://software-carpentry.org
8. Just the Nature of the problem?
Hacking is fun
Maintenance is not fun
Statistics courtesy of Greg Wilson, Software Carpentry, from Nature article
Published online 13 October 2010 | Nature 467, 775-777 (2010)
doi:10.1038/467775a
10. Slide from Carole Goble, JCDL 2012
[Figure: diagram of reproducibility terms. Labels: Reuse, Review, Refresh, Rerun, Repeat, Reproduce, Replay, Repurpose, Recover, Reconstruct, Repair; New State, Same State; Good enough to verify; Reproduce with new Data; Reproduce with new Method; Data, Method, Method only, Publication; Documentation, Provenance, Execution (link data and code)]
Drummond C Replicability is not Reproducibility: Nor is it Good Science, online
Peng RD, Reproducible Research in Computational Science Science 2 Dec 2011: 1226-1227.
11. The most important: Reward
• How do we reward people for important software
contributions?
• Traditionally: publish a research paper that happens to
mention software
Can we provide more direct, acceptable software citations?
• A Research Software Impact Manifesto
http://www.software.ac.uk/blog/2011-05-02-publish-or-be-damned-alternative-impact-manifesto-research-software
NB Authorship is hard
13. Boundary
What do we choose to keep:
- Workflow?
- Software that runs workflow?
- Software referenced by workflow?
- Software dependencies?
What’s the minimum citable part?
14. Granularity
Function
Library / Suite / Package
Algorithm
Program
…
15. Versioning
Why do we version?
- To indicate a change
- To allow sharing
- To confer special status
[Figure: version history across copies. Public: v1, v2, v3. Personal copies: v1, v2, v2a; v3, v3a; v2a.]
17. Differing roles, different repositories
backup sharing archiving
Timescales Ingest
Policy Metadata
Licensing Assurance
18. Software Metapapers
• Create a complete scholarly record including “standard”
publication, method, dataset and models, and software
e.g. modelling and simulation, statistical analysis
Enable replay, reproduction and reuse
• Pragmatic approach is to create a metadata record for
the software, and link it to a copy of the software in
some storage infrastructure
This is a software metapaper
Peer-review the metadata, not the software
• Journal of Open Research Software:
http://openresearchsoftware.metajnl.com/
See: http://openresearchsoftware.metajnl.com/faq/
and the work by B. Matthews et al: The Significant Properties of Software: A Study
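As a rough illustration of the "metadata record linked to a copy of the software" idea, the sketch below models a metapaper record as a small Python data class. Every field name, the example values and the repository.example.org URL are hypothetical choices made for this sketch; they are not the schema used by the Journal of Open Research Software.

```python
import json
from dataclasses import dataclass, field, asdict


@dataclass
class SoftwareMetapaper:
    """Hypothetical metadata record describing one deposited piece of software."""
    title: str
    authors: list
    version: str
    license_id: str   # e.g. an identifier such as "BSD-3-Clause"
    archive_url: str  # link to the stored copy of the software
    related_outputs: list = field(default_factory=list)  # papers, datasets (optional)

    def missing_fields(self):
        """Return the names of required fields that are still empty."""
        return [
            name
            for name, value in asdict(self).items()
            if name != "related_outputs" and not value
        ]


# Example record; all values are placeholders.
record = SoftwareMetapaper(
    title="Example simulation code",
    authors=["A. Researcher"],
    version="1.0",
    license_id="BSD-3-Clause",
    archive_url="https://repository.example.org/software/1234",
)

# Serialise the record so it can be reviewed separately from the software itself.
print(json.dumps(asdict(record), indent=2))
```

Keeping the record separate from the code mirrors the point that it is the metadata, not the software, that gets peer-reviewed.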
19. An acceptable repository
• Metapaper references an instance of software,
stored in a “suitable” repository
Clear access / deposit / preservation policy
Adherence to standards
Ability to easily “transfer”
Sustainability of hosting organisation
Ability to monitor, check integrity (obsolescence?)
• We may be storing
Binaries, source code (as text or archived), virtual
machines(!)
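One way the "check integrity" requirement could be approached is with a fixity check: record a checksum at deposit time and recompute it later. This is a minimal sketch using Python's standard hashlib; the function names and the choice of SHA-256 are assumptions made here, not something the slides prescribe.

```python
import hashlib


def sha256_of(path, chunk_size=65536):
    """Compute the SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_intact(path, recorded_digest):
    """Compare the file's current digest with the one recorded at deposit."""
    return sha256_of(path) == recorded_digest.lower()
```

The same check works for binaries, source archives and virtual machine images, since it treats each deposited item as an opaque file.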
20. Potential for confusion
• ‘The right license for all parts of the scholarly record’
Victoria Stodden, Enabling Reproducible Research: Open
Licensing for Scientific Innovation
• Commonly used OSI approved licenses include:
Apache License, 2.0 (Apache-2.0)
BSD 3-Clause “New” or “Revised” license (BSD-3-Clause)
BSD 2-Clause “Simplified” or “FreeBSD” license (BSD-2-Clause)
GNU General Public License (GPL)
GNU Library or “Lesser” General Public License (LGPL)
MIT license (MIT)
Mozilla Public License 2.0 (MPL-2.0)
Common Development and Distribution License (CDDL-1.0)
Eclipse Public License (EPL-1.0)
• Does enabling the deposit of software just confuse
those already depositing publications/data?
21. 5 Stars of Software?
• Do we need a 5 stars for software?
Existence – there is accurate
metadata that defines the software
Availability – you can access and run
the software
Openness – the software has an
open, permissive license
Assured – the software provides
ways of assuring its correctness
Linked – the related data,
dependencies and papers are
indicated
c.f. 5 Stars of Linked Data (Berners-Lee), 5 Stars of Online Journals (Shotton)
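If such a scheme were adopted, a repository could score a deposit against the five criteria. The sketch below is one possible reading: it assumes each star is awarded independently (rather than cumulatively), and the criterion keys are names invented for this example.

```python
# Hypothetical keys for the five proposed criteria.
STAR_CRITERIA = ("existence", "availability", "openness", "assured", "linked")


def star_rating(checks):
    """Count how many of the five criteria a deposit satisfies.

    `checks` maps criterion names to booleans; missing names count as unmet.
    """
    return sum(1 for name in STAR_CRITERIA if checks.get(name, False))


# Example: metadata exists and the software runs, but nothing else is confirmed.
example = {"existence": True, "availability": True}
print(star_rating(example))
```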
22. Take home points
1) Researchers are developing more software
than ever, and trying to do it better
2) They want to be rewarded for creating a
complete scholarly record – this includes
software
3) We still don’t know the best way to shift
from one repository role to another when it
comes to software!
Backup -> sharing -> archiving
Editor's Notes
Steven Gray here at CASA has produced a proof of concept showing the last hour's snowfall in the UK as Tweets and the last 24 hours in postcode districts (the important part here is the data underneath, not the Tweets as such). Based on Ben Marsh’s work.
I ended up doing this because we needed to fix the basics:
Reproducible research
Software credit / career paths
Software skills
Drawing on pool of specialists to drive the continued improvement and impact of research software developed by and for researchers
Providing services for research software users and developers
Developing research community interactions and capacity
Promoting research software best practice and capability
Clarifying the Purposes and Benefits of Software Preservation: http://softwarepreservation.jiscinvolve.org/wp/about/
There is a spectrum of approaches
Statistics from Greg Wilson
Are academics software developers?
Can research consortia manage production?
Are timing constraints different?
What is the role of the PI in software development management?
Are the skills for software and research the same?
c.f. work of James Howison
Based on study done for Cameron Neylon’s Beyond Impact workshop
Is it more important to sustain the software that this workflow references, or the workflow itself?
At what level do you reference, at what level do you deposit?
Made more difficult than data because of the fluidly changing collaborative nature of software development – not just adding to the contributor pool
Based on OR2012 workshop outputs
Want to move towards OSI licenses which are similar in spirit to CC-BY e.g. BSD, Apache
C.f. 5 Stars of Linked Data (Berners-Lee): Available w/ open license, machine-readable, non-proprietary format, open standards, linked to provide context
5 Stars of Online Journals (Shotton): Peer Review, Open Access, Enriched Content, Available Datasets, Machine-readable metadata
What about community?