The Contributions of Scientific Software to Scientific Discovery

Omicron spectrogram of LIGO-Livingston detector’s data, see image credit below.

In July 2021, the SoftwareX journal published a special issue on gravitational waves and the software that contributed to their detection. The editor, Kate Keahey, joined with colleagues to write an introductory editorial to that special issue. Kate kindly took time from her busy schedule to discuss with Rinku Gupta (Editor-in-chief of the BSSw.io website) her concern that society – researchers, funding agencies, academia – continues to fail to credit software as a major contributor to scientific discovery.

Rinku Gupta: Kate, let’s start with a brief discussion of the journal. You co-founded SoftwareX in 2015. What motivated that decision? And how far do you think the journal has come in realizing your aims?

Kate Keahey: My colleagues and I discussed the fact that many people don’t think of software as a scientific instrument and thus its role in enabling science is not well understood or appreciated. This has many undesirable consequences on research publishing, funding structures, and career development. Software developers might be thanked in the acknowledgments of a paper that focuses on research results, but their libraries or tools are not always recognized as original scientific contributions, which is what they often are. Nor was the actual impact of the software being highlighted. That situation was, and still is, troublesome: it not only means that excellent software contributions often go unnoticed, but also that scientists and software engineers who put their energies into software design and development often are not professionally recognized for those efforts and can find themselves in a career dead end. This means that talented and ambitious individuals drift away from scientific software and into other avenues of creating impact, which then leaves research software development efforts underserved and poses a barrier to progress in science.

Out of this came the idea of creating a journal that specifically recognizes software as a scientific instrument -- in other words, not just any software system but original software contributions that have an impact on scientific discovery. This approach takes a metaphor from scientific publishing -- via a peer-reviewed journal -- and applies it to a new medium. The jury is still out on whether this is the best way to recognize and share scientific software instruments, but it certainly has struck a chord with the community. One piece of evidence for this is that SoftwareX received a PROSE award from the Association of American Publishers in 2016 for innovation in journal publishing.

Rinku: This is only the second special issue of SoftwareX since its founding. What motivated you now to organize this special issue, and why did you choose gravitational waves for your focus?

Kate: The idea occurred to me when I was preparing a presentation for the Society for Scholarly Publishing back in 2018. In the opening remarks, I used examples of scientific instrument designers from the past, such as Antonie van Leeuwenhoek, who invented the microscope, and John Harrison, inventor of the marine chronometer, whom we first described in the introductory SoftwareX editorial. Those scientists were recognized for their contributions to science; for example, John Harrison received the Copley Medal, the most prestigious scientific recognition awarded by The Royal Society. And then I wondered if I could find examples of similar recognition for designers of instruments that enabled modern discoveries. At that time, if you thought of your favorite scientific breakthroughs, gravitational waves detection came to mind immediately. It was very much in the news, and that particular discovery would have been largely enabled by software. So, I thought that if we wanted to demonstrate the impact of software on modern science, we had to take a big, groundbreaking discovery like that and look at what role software played in enabling it. Organizing a special issue seemed like a perfect tool for doing that because it would both tell a story of a discovery from the software side and also credit the software's authors. The special issue took a while to gestate. In particular, finding the right co-editors was critical to the venture, and I was very lucky to collaborate with Peter Couvares and Frederique Marion on it; they supplied thorough knowledge not only of the community and all the influential software instruments, but also of how those instruments were combined and evolved to eventually form an ecosystem that enabled the discovery.

As an addendum to this: when I gave the talk back in 2018, many people came up to me afterwards and said they never thought of software as a scientific instrument, and that it was a new perspective for them. I found it surprising at the time: I thought it was something that everybody would have known and we just needed to find vehicles to recognize it. But of course, it is easy to think this way if all of your career has been devoted to scientific software. It's not always clear if you work in other spheres.

Rinku: The editorial emphasizes numerous areas in which software played an essential part in the gravitational wave research and detector development. Were there any aspects that surprised you as you were collecting the articles for this special issue?

Kate: Yes, one interesting thing is understanding the dynamics and interplay between the very directed approach to development adopted in part of the project and the eventual need for innovation and integration of multiple independent efforts -- there are some lessons here for future scientific ventures of this kind. Another was understanding what it means to build software instruments supporting a discovery that takes three decades to accomplish; during this time many of the software tools, services, and even standards evolved or were replaced. Code management platforms went from CVS and SVN to Git; data formats underwent evolution from various proprietary formats to XML, JSON, or HDF; RPC requests were replaced by SOAP and REST, and so forth. This created many opportunities and efficiencies, but also meant that tools critical for scientific work had to be upgraded, rewritten, or replaced, so that instrument designers constantly had to strike a balance between providing much needed new capabilities and just keeping pace with the evolving software ecosystem. My father has a pair of scissors that was manufactured roughly a hundred years ago and he still uses it daily. Software is not like that: it depends on a rapidly changing ecosystem and thus adapting it as the ecosystem evolves will always be a necessary -- and large -- component of any kind of development.

Rinku: Why is software sustainability so important and yet so difficult for long-term projects? What lessons can we learn from the LIGO project?

Kate: Sustainability of scientific software is sometimes simplistically understood as something that would make long-term maintenance and evolution of a particular software tool “somebody else’s problem.” Implementing a desired functionality is one thing -- but implementing it in such a way that it is extensible and maintainable, takes into account evolving hardware and software trends and can adapt to them, and integrates tools in a way that anticipates evolution and upgrades, is much harder, requiring significant experience, skillful decision making, and above all sound architecting. Doing it well sometimes means compromising on your “ideal scenario” or going out of your way in the interest of creating something that leverages other effort rather than contributes to a proliferation of tools that all do a very similar thing, slightly differently.

To give you an example: I lead a project called Chameleon that provides infrastructure for systems research -- effectively, a bare metal reconfigurable cloud with some additional enhancements to support as many experiments as possible. Historically, those types of infrastructures were developed in-house. In contrast, we went the extra mile to build on a mainstream open source platform called OpenStack. This was hard to do at first, but we see the benefits now: since we are using a mainstream platform, it is familiar to most users and operators -- and if it is not familiar, they acquire transferable skills. We leverage the effort of a community of developers and reviewers that numbers in the thousands -- and were able to significantly contribute to the project ourselves, extending the impact of our development beyond our original mission. And last but not least, a mainstream platform is of course compatible with other mainstream platforms; this means that there is extensive community and commercial investment in tooling, documentation, and other support for things like converting OpenStack images to Amazon images and other functions that tie a specific solution into the ecosystem of existing tools and services. All these benefits come because we took the time to factor our problem in terms of an existing ecosystem of tools rather than simply develop things from scratch. This is hard to do and is often discarded because the cost comes up front and is borne largely by the developing team, while benefits come later in the form of lower operational costs and lower training costs for users, and thus often revert to the community rather than developers themselves. Ultimately, the core cost of scientific software instruments has to be borne by science because almost by definition those are not systems of commercial interest, but the trick is to leverage an existing or evolving software ecosystem by sound architecting to make that cost as small as possible.

Rinku: At the end of the editorial, you say that you “hope that this demonstration of the role scientific software instruments played in gravitational waves detection forcefully collides with existing perceptions of the role of software in scientific discovery.” Your choice of the words “forcefully collides” is interesting. Do you think that the misconceptions about software’s value are caused by lack of knowledge, or something more sinister?

Kate: I think it is largely a lack of knowledge. Software has become a major player in the construction of scientific instruments only over the last 20-30 years. It also has the disadvantage of being intangible -- it is zeros and ones -- so you can’t really take a picture of yourself with it! One of the colleagues who co-founded SoftwareX with me related a conversation with a CERN scientist in which he was told that software cannot be a scientific instrument because a scientific instrument is made out of metal. So, there are certain preconceptions that we are dealing with: software has snuck up on us as a major element of science, and our consciousness has simply not caught up with it yet. Things are changing, however, not least because the role software is playing in the construction of scientific instruments is increasing.

Rinku: Thank you for taking the time to answer our questions, Kate. We hope, as you do, that your efforts will pay off in having the role of scientific software in scientific discovery recognized and appreciated.

Image credit

Omicron spectrogram of LIGO-Livingston detector’s data around the time of GW170817, using data after glitch subtraction. (Source: Florent Robinet et al: "Omicron: A tool to characterize transient noise in gravitational-wave detectors," SoftwareX, July-Dec 2020).

Author bio

Kate Keahey is one of the pioneers of infrastructure cloud computing. She created the Nimbus project, recognized as one of the first open source Infrastructure-as-a-Service implementations, and continues to work on research and development projects aligning cloud computing concepts with the needs of scientific applications and infrastructure. To facilitate such research for the community at large, Kate created and leads the Chameleon project, providing a deeply reconfigurable, large-scale, and open experimental platform for Computer Science research. To foster the recognition of contributions to science made by software projects, Kate co-founded and serves as co-Editor-in-Chief of the SoftwareX journal, a new format designed to publish software contributions. Kate is a Senior Scientist at the Mathematics and Computer Science Division at Argonne National Laboratory and the Consortium for Advanced Science and Engineering at the University of Chicago.
