Community-developed guidelines for managers of software registries and repositories will help efforts to increase awareness and recognition of software as a research product.
Software is a fundamental element of the scientific process, and cataloguing scientific software helps make it discoverable. During 2019-2020, the Task Force on Best Practices for Software Registries of the FORCE11 Software Citation Implementation Working Group worked to create Nine Best Practices for Scientific Software Registries and Repositories. In this post, we explain why scientific software registries and repositories are important, why we wanted to create a list of best practices for them, the process we followed, what the best practices include, and what the next steps for this community are.
Why are scientific software registries and repositories important?
Scientific software registries and repositories support identifying and finding software, provide information for software citation, foster long-term preservation and reuse of computational methods, and ultimately, improve research reproducibility and replicability.
Why did we write these guidelines?
Managers of scientific software registries and repositories have been working independently to run their services and provide useful information and tools to users in different communities. The Best Practices for Software Registries Task Force participants had different perspectives representing a heterogeneous set of resources, but came together for the common goal of creating a list of best practices for scientific software registries. These shared practices help to raise awareness of software as a research output, enable credit for software creators, and guide curators working on software catalogues through the steps to consider when setting up their software registries. In the longer term, we hope to improve the interoperability of the software metadata supported by different services.
The goals that we considered for writing the guidelines were:
to have a minimal number of best practices, easy to adopt by repository managers
to be broadly applicable to most or all of our resources
to be descriptive on a meta level, not prescriptive, and focused on what the best practices should do or provide, not on what a suggested policy or element should specifically say.
What are the best practices?
Our guidelines, listed below, provide an overview of the key points to take into consideration when creating a software registry. They are:
Provide a public scope statement (examples)
Provide guidance for users
Provide guidance to software contributors (examples)
Establish an authorship policy
Share your metadata schema (examples)
Stipulate conditions of use (examples)
State a privacy policy
Provide a retention policy (examples)
Disclose your end-of-life policy (examples)
Our pre-print explains each guideline in more detail and gives a longer list of the implementations we found while working on these practices.
What process did we follow to produce the guidelines?
Representatives from numerous software registries and repositories were involved in the FORCE11 Software Citation Implementation Working Group (SCIWG). Alice Allen proposed that we form a task force within the SCIWG for writing up some best practices for the registries and repositories, and with acceptance by the co-chairs of the SCIWG and interest from relevant people, the Task Force on Best Practices for Software Registries was formed. Initially, we gathered information from members of this Task Force to learn more about each resource and to identify some of our overlapping interests. We then identified potential best practices based on prior issues we experienced running our services and discussed what each potential practice might include or exclude.
Through iterative deliberations, we determined which of the potential practices were the most broadly applicable. With generous funding from the Alfred P. Sloan Foundation, we hosted a workshop for scientific registries and repositories, part of which was devoted to gathering final consensus around the Best Practices. The workshop included registries that were not part of the Task Force, resulting in a broader set of contributions to the final list.
What are the next steps for the group?
Our goal is to continue our efforts by implementing these practices more uniformly in our own registries and repositories and reducing the burdens of adoption. We have created SciCodes, a consortium of scientific software registries and repositories, which is now defining the next priorities to tackle, such as tracking the impact of good metadata, improving interoperability between registries, and making our metadata more discoverable by search engines and services such as Google Scholar, ORCID, and discipline indexes. We are also sharing tools and ideas in a series of presentations that are recorded and available for viewing on the SciCodes website, so please check them out!
This article is cross-posted on the SciCodes website, the ASCL blog, the US Research Software Sustainability Institute blog, the UK Software Sustainability Institute blog, and the FORCE11 blog.
Alejandra Gonzalez-Beltran leads the Data and Software Engineering Group (DSEG) in the Scientific Computing Department at the Science and Technology Facilities Council, part of UK Research and Innovation. Alejandra’s work revolves around developing models, methods, and software tools for data science and innovative scholarly communication with the aim of enabling Findable, Accessible, Interoperable and Reusable (FAIR) data, research reproducibility and aggregation of research results. DSEG mainly focuses on the design, development and support of high-quality FAIR software to support FAIR data management of the large-scale facilities at the Rutherford Appleton Laboratory, which include lasers, particle accelerators, neutron and muon sources.
Alice Allen is the Editor of the Astrophysics Source Code Library (ASCL), which works to improve the transparency and reproducibility of astronomy research by making the computational methods used in this research more discoverable. She is a member of the FORCE11 Software Citation Implementation Working Group, the SciCodes Consortium, and the Astronomy Picture of the Day Evaluation and Advisory Committee.
Allen Lee is a computer scientist and research software engineer working to (hopefully) improve our abilities to understand and sustainably evolve with the complex adaptive systems that we collectively navigate. Allen contributes to open science initiatives like the Network for Computational Modeling in the Social and Ecological Sciences (https://comses.net), helps conduct research in collective action and the commons 🤲, and serves as a maintainer and instructor for The Carpentries.
Daniel Garijo is a Researcher at the Ontology Engineering Group of the Universidad Politécnica de Madrid. Daniel's research activities focus on e-Science and the Semantic web, specifically on how to increase the understandability of software and scientific workflows using provenance, metadata, intermediate results and Linked Data.
Tom Morrell is the Research Data Specialist at Caltech Library. He is responsible for managing the CaltechDATA institutional data and software repository and helping researchers effectively store and share their data and software. Tom also contributes to the FORCE11 Software Citation Implementation Working Group, SciCodes Consortium, and InvenioRDM repository development.