Reflecting on Our Community: The SC21 BoF on Software Engineering and Reuse in Modeling, Simulation, and Data Analytics for Science and Engineering

Back in 2015, a small group of people organized a session in the Supercomputing conference’s “Birds of a Feather” (BoF) track to give people interested in software for computational science and engineering (and beyond) an opportunity to share their thoughts, network, and start building a community. The organizing committee has changed over the years, but we’ve been fortunate to be able to offer this BoF every year since then (link below). The BoFs have proven to be an interesting and enjoyable way to find out what’s on the minds of folks in this community.

We’ve settled on a format for the BoF that includes 3-minute lightning talks, chosen by the organizing committee to highlight activities and ideas that seem important and timely, followed (when we’re in person) by a discussion with the audience, which is always vigorous, wide-ranging, and interesting (and pretty tiring for the people running around with the microphones). This year, with COVID-19 concerns and travel restrictions imposed by many employers, the BoF was held online, and we decided to run it a little differently. After the usual lightning talks, participants split into breakout discussion groups organized around the topics of the lightning talks and a few others pre-chosen by the organizing committee. Below we summarize the talks and the breakout discussions and offer a sample of what was on the minds of our community as of November 2021.

Lightning talks

We had six lightning talks this year, covering a wide range of topics with speakers from the US, UK, Germany, and Australia.

Ecosystems are the Future! by Benjamin Brown (Office of Advanced Scientific Computing Research (ASCR), Office of Science, U.S. Dept. of Energy). Ben discussed the importance of ecosystems in his vision for the future of the ASCR high performance computing and networking user facilities (ALCF, NERSC, OLCF, and ESnet) and highlighted scientific software as one of the key ecosystems. He closed with the message “Software ecosystems are research infrastructure!”
Open Source for Researchers by Yo Yehudi (Wellcome Trust). Yo encouraged us to up our game with our open source software projects, making them more open and more accessible to others. She touched on README files, roadmaps, contributor guides, codes of conduct, a preferred citation, contact information, and the use of an issue tracker, and she also suggested some resources to help folks get started contributing to open source software.
The Internat. CSE Master Program at TUM by Michael Bader (Technical University of Munich, TUM). Michael described the International Master’s program in Computational Science and Engineering, which has been offered at TUM since 2001 and currently serves approximately 50 students per year. The four-semester program combines classes in computer science, numerical analysis, and scientific computing; one of its key challenges is guiding students from diverse backgrounds toward becoming experts in software development for supercomputing applications.
Senior Level RSE career paths (with an s) by Daniel S. Katz (University of Illinois at Urbana-Champaign). Dan presented ideas to help define paths for career progression for research software engineers (RSEs), looking particularly at offering a richer set of opportunities at senior levels to allow RSEs to explore different roles, emphasizing different skills.
FAIR 4 Research Software (FAIR4RS) by Michelle Barker (Research Software Alliance). Michelle presented the emerging idea of applying the FAIR principles (findability, accessibility, interoperability, reusability) to research software, noting “Software is not just another type of data.” The Research Data Alliance, FORCE11, and the Research Software Alliance are working together to develop the FAIR4RS principles and guidelines for implementing them.
Highlights from the IEEE CS Ad Hoc Committee on Open Science & Reproducibility by Manish Parashar (University of Utah). In 2019, the National Academies of Sciences, Engineering, and Medicine (NASEM) published a report on Reproducibility and Replicability in Science. Manish described work by the IEEE Computer Society, building upon the NASEM report, to develop an action plan to improve and recognize reproducibility in the society’s publications and conferences and through its technical committees.

Breakout discussions

While we had a good turnout for the BoF, we weren’t able to form discussion groups for all of the topics on the table. The participants ended up settling on four topics from the list.

Computational science ecosystems

The main takeaway from this discussion group was that RSEs are crucial to the future success of computational science ecosystems, including projects ranging from small (individual investigators) to very large. The RSE profession in the U.S. (and many other countries) is still at a very early stage: the role is not universally recognized or named, and real career paths are only beginning to gel. Even so, there are already many positive examples where RSEs have been brought into software-intensive projects and have made notable contributions. However, many organizations and principal investigators still feel that they shouldn’t or can’t propose project budgets that openly include RSEs. In addition to pointing out the need for further work throughout the community to recognize and define RSE roles and suitable career paths, the group called on funding agencies to encourage and support the use of RSEs and to help build this component of the workforce.

Training programs

This discussion began with a deeper look at the TUM program, which the group concluded represents a good model worth replicating elsewhere. There was some discussion of software engineering from a “research software” perspective versus an “industrial software” perspective. The group was concerned that the two may differ enough that a class focusing on one might leave students thinking they were prepared for both when, in fact, they might not be. However, there was no consensus on what should be taught. The group acknowledged that more research is needed into best practices for research software and into how (and whether) they differ from those for industrial software; there are many ideas and opinions on the subject, but few robust scientific studies.

The topic then shifted to the training of RSEs. Participants discussed the need for background in soft skills, project management, and software quality assurance. In response to a question about the current state of practice in the community, participants noted that the US Research Software Engineer Association (US-RSE) has been discussing training curricula but hasn’t made real progress yet. They also mentioned the INTERSECT project, funded by the US National Science Foundation, which has been delayed due to COVID-19, and a current call from a UK funding agency for RSE training programs (links to both below). Finally, in the wrap-up discussion, someone observed that the scale and scope of what we expect of an RSE is nearly equivalent to a PhD, and that perhaps we need to think about advanced degree programs in research software engineering.

FAIR4RS: Findability, Accessibility, Interoperability, and Reusability for Research Software

The gist of this discussion might be summarized by a comment from Michelle Barker, lightning speaker and one of the leaders of the FAIR4RS movement. She noted that the FAIR principles for data have been around for about five years, whereas FAIR4RS is still very new, so the current focus is on raising awareness of the concept and getting people to start thinking about it. Another participant noted that the UK Society of Research Software Engineering wanted to endorse the FAIR principles but found them very aspirational (though that may have been an artifact of the document on which they based their discussions). Others in the group suggested treating the FAIR principles more as suggestions, since universal compliance may not be reasonable to expect. Nevertheless, there was general consensus that FAIRness is an important topic for scientific research software, and the group was glad to see so many different organizations working to spread awareness of the FAIR principles.

The discussion touched on the possible value of branding mechanisms; whether open source software projects, by their nature, might already incorporate (some) FAIR principles, or be more likely to do so; and at what point in a project’s lifecycle one should begin to focus on FAIRness. The group noted the need for good showcase examples to facilitate adoption of the FAIR principles. Dan Katz noted that this was an element of the FAIR4HEP (high-energy physics) project in which he’s involved (link below). Another observation to come out of that work is that the definition of FAIRness can be very community dependent. In FAIR4HEP, for example, is your community all physicists? High-energy physicists? Only members of the Compact Muon Solenoid (CMS) experimental collaboration? Overall, many open questions remain for both FAIR and FAIR4RS.

Reproducibility

The discussion of reproducibility quickly moved on from the NASEM and IEEE reports described in Manish’s presentation (links below), which focus on publications, to reproducibility considerations in the scientific workflows themselves. The group noted that reproducibility concerns place a significant burden on the domain scientists who typically lead this kind of work. Participants felt that the benefits of reproducible workflows vastly outweigh the challenges, but that practitioners may have a hard time appreciating this unless ways can be found to reduce the burden. One direction the group thought could be useful was an emphasis on open source software solutions for workflows, which can more easily serve as templates or building blocks for new workflows, thereby simplifying reproducibility. It was pointed out that there must be a balance between “closed box” use of scientific tools and having to understand every detail; it would be useful to think and work in terms of well-defined abstractions that facilitate reproducibility, while still being able to drill into the numerical, mathematical, or physical details as needed. The group also noted the role that RSEs could play in facilitating reproducibility by bringing knowledge of appropriate solutions to the projects they support. Finally, the group noted that both data and software are typically needed to reproduce today’s computational science results, yet the two are still rarely found together, especially for large-scale datasets; being able to define a reproducible workflow that recreates the datasets can help.

Summary

In the wrap-up discussion, the BoF participants observed that although none of the focus areas were nominally about research software engineers, the topic came up unprompted and played a prominent role in most of the conversations. Perhaps that is the most significant takeaway from the BoF overall: although there is still a lot of work to do on the role, career paths, and recognition of RSEs, they already play prominent and indispensable roles in the current practice of computational science and engineering, and they are likely to become even more so in the future.

Resources mentioned

Links to the BoF website and to other resources mentioned in the presentations and discussions:

BoF series website, including notes from past meetings: http://bit.ly/swe-cse-bof
A list of software-focused events at SC21: https://bssw.io/events/sc21-software-related-events
A longer version of Ben Brown’s talk about ecosystems and his vision for the future of ASCR high performance computing and networking facilities: https://youtu.be/ItYuCtS4QH4?t=4971
A longer version of Dan Katz’s presentation about RSE career paths: https://doi.org/10.5281/zenodo.5531839
RDA FAIR4RS Working Group: https://www.rd-alliance.org/group/fair-4-research-software-fair4rs-wg/case-statement/fair-research-software-wg-case-statement
FAIR4HEP project: https://fair4hep.github.io/
INTERSECT RSE training project: https://intersect-training.github.io/
UK funding opportunity for RSE training: https://www.ukri.org/opportunity/support-the-development-of-research-software-engineering/
NASEM report on Reproducibility and Replicability in Science: https://www.nap.edu/catalog/25303/reproducibility-and-replicability-in-science
IEEE CS Ad Hoc Committee on Open Science and Reproducibility web site, with links to survey and report: https://www.computer.org/volunteering/boards-and-committees/open-science-reproducibility
