The Research Software Engineers in HPC workshop (RSE-HPC-2023) was held last November as part of the SC23 conference in Denver, CO, USA. Following up on the successful workshops of the previous three years, the 2023 session was a half-day event including nine speakers, about 85 in-person participants, and a number of online participants.
Introduction
This year’s workshop theme was “Growing the RSE Community,” with many of the talks and discussions focusing on RSE training and mentoring. The workshop was divided into four main sessions: a featured talk, lightning talks, a panel discussion, and breakout discussion groups.
Featured talk
The program of the RSE-HPC workshop started with a featured talk by Weronika Filinger from Edinburgh Parallel Computing Centre. Her talk was entitled “UNIVERSE-HPC - towards a sustainable RSE training ecosystem,” and was co-authored by Jeremy Cohen and Neil Chue Hong. She began by introducing challenges for RSE education and training in HPC: many RSEs lack dedicated time for skill development, and learning is often task-oriented, which can lead to gaps. Designing personalized pathways requires some knowledge of the skills and how they fit into the overall landscape, and establishing “common” or “typical” learning pathways is hard. Finding suitable training content takes a lot of time, and using training from different sources is difficult. The UNIVERSE-HPC project aims to understand and nurture an integrated vision for education in RSE and HPC. This includes defining a training framework for RSEs specializing in HPC, identifying and integrating course materials, developing new materials to fill gaps, and enabling more people from a wide diversity of disciplines and backgrounds to obtain the skills and experience they require to have a successful RSE career. The project is organized around four main topics:
- Competencies and skills: Identifying competencies and skills required by RSEs as they progress through their careers and understanding where these competencies are already being taught.
- Learning pathways: Define different curricula, learning pathways, and delivery mechanisms for providing training in RSE competencies, including as-taught programs, online/self-paced learning, and professional development.
- Course development and delivery: Development of missing modules. Packaging of existing and new modules into different formats. Pilot delivery of courses.
- Community support and contributions: Facilitate professional networking and peer support for RSEs. Develop a community of maintainers for open materials.
Weronika then presented the project’s approach to making the RSE ecosystem more sustainable, both at the project/institutional level and for the whole community. The project suggests better sharing practices through more exchanges between projects and efforts, a focus on making training resources FAIR (Findable, Accessible, Interoperable, and Reusable), persistence and citability by design, documenting and sharing processes (not just the outcomes, and successful ones as well as unsuccessful ones), and accreditation and certification.
The project developed new processes such as pilot operations guidance, pilot survey and analysis, training content creation, and the Byte-sized RSE delivery format. They plan to document and share some of these, e.g., the pilot operations guidance. Weronika closed the talk with several questions on which the project would like input: which processes are valuable to the community, how and with whom to share them, whether there are existing resources they should be aware of, and, on accreditation and certification:
- What are the use cases for accreditation/certification of RSE training?
- Who needs to be involved to make this successful?
- What are the steps that UNIVERSE-HPC should lead, and what needs to be led by other organizations?
After the talk, the discussion revolved around training in interpersonal skills and barriers to sharing training materials such as funding, effort, and how to make content ready to use in different contexts.
Lightning talks
The workshop brought together four speakers to discuss views on research software engineering and the achievements of Research Software Engineers.
Nicole Brewer from Arizona State University discussed five tips for putting the “research” into research software engineering. Her work implements user-friendly tools to translate scientific workflows into web applications. Nicole shared her experience of working with researchers and using open source and often idiosyncratic tools. She promoted an approach of working with vague requirements, of dedicating time to journaling progress and reflecting on it, and of RSEs creating their own onboarding materials as they learn.
Arien Tamerus from the University of Cambridge described a collaboration with the Institute of Computing for Climate Science (ICCS). Arien discussed the Virtual Earth System Research Institute, which has a group of models, each of which represents a disruptive aspect of climate modeling. RSEs in the ICCS play a key role by providing engineering support, but also by transferring engineering skills through training and a range of activities from sustainability reviews to code clinics.
Jamie Quinn from University College London shared his experiences of being a Trustee of the Society of Research Software Engineering: both those activities that go well and those that the Society is struggling with. After introducing the activities that are going well, such as the RSE Conference, he focused on areas that need more work including updating processes and systems. Engaging people in work conducted by the Society was identified as a particular issue, along with the delegation of activities to Trustees and others in the RSE community. He ended with a vision for improving value for RSEs including those based in industry.
Jenny Wong discussed her experiences as a senior RSE at the University of Birmingham. She introduced her work with RSE Midlands, a regional RSE group in the UK, and the way in which it has brought together the local community for regular meetings. She discussed her work in equality, diversity, inclusivity, and accessibility both in HPC-SIG, a UK-based national HPC community, and in the international Women in HPC organization. She focused on the benefits of mentoring and the importance of role models, and discussed how her work in this area had been supported by her employment, which allows 10% of her time for self-development activities.
Panel
We next held a panel on RSE training and mentoring. The panelists were Francesca Schiavello from the Hartree Centre, Science and Technology Facilities Council, UK; Helen Kershaw from the National Center for Atmospheric Research, US; Samantha Wittke from CSC - IT Center for Science Ltd, Finland; and Ian Cosden from Princeton University. The panelists first discussed their backgrounds, which were quite varied.
- Francesca provided examples of working as an RSE. She had little computer science training in her undergraduate program, but then got into research software in grad school, and had lots of zigzags to find something that was interesting and the skills needed to do this. She didn't have the needed info at the start, but has learned along the way, and is happy now.
- Helen's takeaway was that mentoring is a really effective way of improving scientific software; it has massive community value because there isn't a clear path otherwise.
- Samantha described herself as a geoinformatics specialist and a CodeRefinery co-organizer and instructor; she is also part of the Nordic-RSE board. CodeRefinery builds and offers lessons, and builds community around research software.
- Ian leads the RSE group at Princeton, is chair of US-RSE, and is co-lead of INTERSECT. INTERSECT develops curriculum (with CodeRefinery and UNIVERSE-HPC), creates modular lessons, and puts them together as a 5-day training event, where students learn and interact with each other and with the instructors.
The panel then discussed questions from the audience:
- How do we coordinate all the different projects that are building training material?
  We need to bring the community of trainers and training programs together, like in this workshop. We need to think about barriers to reuse and how to overcome them. INTERSECT has worked on collecting the different materials: https://intersect-training.org/training-links/.
- Are there mentoring methods that don't require one-to-one interaction?
  One answer is that groups of students and RSEs can help each other out.
- How do you get funding for providing training?
  For some public-sector organizations, this is part of their remit.
- Where can I find more advanced HPC training material?
  We have communities where people can contribute such material, which is welcome.
- Are reflections on training methods shared?
  Some organizers use post-workshop surveys and discussions, and in some cases longer-term surveys, which lead to blog posts and improvements.
- Is it a better use of resources to have formal programs or informal mentoring opportunities?
  Formal paid internships lead to a more diverse staff.
- How do we train and mentor existing staff, not just students and new staff?
  If people want to learn new tools, it doesn't matter whether they are new or existing staff. Some organizations view learning new skills as essential and provide employees with time for development, which can include training. Some project leads/PIs might be unhappy with staff taking the time for training, but training can make the RSE more valuable and pays off in the long run, and the training time might be paid for by funding that isn't from the project. In general, academia is supportive of training and new learning, so some RSEs might be more concerned about this than their project leaders are.
- How does mentoring work with different levels of coding, from custom scripts to reusable products?
  It's useful to have mentoring span different levels, not to focus on just one level.
- How do we get people who need training to come?
  Many who would benefit don't realize it, don't know about the training, or don't want to attend for other reasons, such as not seeing the time as valuable. One answer is to focus the advertising on what a person will be able to do afterward rather than what the training is. Another is to have open office hours that can lead to discussion, which can either solve problems or point to training as a next step.
- How do we advertise opportunities for training and practical experiences for students or others who want to be RSEs, like summer programs?
  We need to collaborate on this, but no clear ideas came up. Many current opportunities are found by chance.
Breakout sessions
We organized an interactive session on the topic of training/mentoring/preparation of RSEs. We split the attendees in the room into seven working groups of 5-8 people. Each working group discussed the following questions and reported back to the plenary at the end of the session:
1. What is the most important training/mentoring/preparation RSEs need?
2. What good examples of programs exist?
3. What material is available now?
4. What are the gaps, specifically in HPC?
5. Who should help provide what’s missing?
Key topics that were mentioned include the following.
- There are already well-defined training and mentoring programs, but they are not communicated between organizations.
- Intermediate/advanced training is harder, and there are gaps.
- There is value in having well-defined pathways, but we need to be careful how they are defined so that they apply to all types of RSEs.
- Bigger centers contribute to the sustainability of training.
- There's a role for industrial collaboration, and this is particularly valuable in hackathons.
Answers to "1. What is the most important training/mentoring/preparation RSEs need?" included that there are some good starting points, such as the Carpentries or advanced training by HPC centers. Different groups noted that RSEs must share a common language with researchers and that soft skills like communication are as important as technical skills. Learning technical skills for RSEs needs clear terminology, a definition of what is needed, patience, ability to learn, and, ideally, mentoring and the availability of “good enough” practices for RSEs. Another important aspect is whether training is offered with consideration of diversity, equity, and inclusion, and during working hours instead of in the leisure time of RSEs.
Answers to "2. What good examples of programs exist?" and "3. What material is available now?" included:
- US-RSE: https://us-rse.org/resources/rses/
- Carpentries: https://carpentries.org/community-lessons/
- ARCHER courses: https://www.archer2.ac.uk/training/courses/
- PRACE Training Portal: https://training.prace-ri.eu/
- ENCCS: https://enccs.se/events/
- INTERSECT: https://intersect-training.org/index.html
- CodeRefinery: https://coderefinery.org/lessons/
- Sciware: https://sciware.flatironinstitute.org/
- SIGHPC: https://www.sighpc.org/
- RSE competencies toolkit: https://rsetoolkit.github.io/rse-competencies-toolkit/
- UNIVERSE-HPC: https://www.universe-hpc.ac.uk/
- Pittsburgh Supercomputing Center has a series of sessions on YouTube: https://www.youtube.com/@XSEDETraining
- OLCF CUDA Training Series, new HIP series ongoing: https://www.olcf.ornl.gov/cuda-training-series/ and https://www.olcf.ornl.gov/hip-training-series/
- Portals at HPC centers in Europe: https://hartreetraining.stfc.ac.uk/moodle/local/hartree/index.php and https://www.fz-juelich.de/en/ias/jsc/education/training-courses and https://www.hpc.cineca.it/content/training
- NVIDIA offerings
- Chameleon Cloud
- International High Performance Computing Summer School
- Argonne Training Program for Extreme Scale Computing (ATPESC)
- US DoD HPCMP PET program (open only to HPCMP users)
- University-offered courses, supercomputer documentation, code communities, SC partnership
- Hackathons (e.g., partnering with industry) for mentoring
- HPC Europa (as an example of mentorship: more targeted to HPC researchers, but was useful for RSEs)
Many answers to "4. What are the gaps, specifically in HPC?" focused on accessibility: economic accessibility, time zone accessibility, outreach to RSEs, the need for certified instructors, and a missing overview of what training is available. Some participants said that example pathways are missing, as well as specialized training such as platform-agnostic HPC courses. Since there are still many members of the HPC community who do not know about RSEs, it is hard to know what training is necessary and missing. Scaling of training is also a challenge, and train-the-trainer models could help with this.
The discussion around "5. Who should help provide what’s missing?" elucidated that there are many stakeholders who could fill in gaps, from university-level support to RSE societies to vendors. Also, the community at large, including researchers, people who can fund and develop material, and the open source community, can fill in gaps and support funded training programs.
Conclusions
We believe the workshop was successful in helping to connect RSEs with one another, and to connect them with resources and ideas to take back to their home institutions. Many enthusiastic conversations took place during and after the sessions. The energy level in the breakout discussions was particularly noticeable; it was clear that participants enjoyed the chance to share challenges and successes.
We would like to thank everyone who participated, especially the committee members and speakers who helped make the workshop possible. And we hope to be able to continue the discussions in the next workshop at SC24!
Author bios
Charles Ferenbaugh is the Computer Science Lead for the Eulerian Applications Project at Los Alamos National Laboratory. He received a Ph.D. in Mathematics from Princeton University in 1992. He spent several years working at Raytheon developing high-performance signal processing software, before coming on staff at LANL in 2001. At LANL he has been a software developer contributing to large multiphysics code projects running on supercomputer clusters. He has also been a part of LANL research efforts in advanced architectures and programming models. He was a founding steering committee member of the US Research Software Engineer Association.
Sandra Gesing is the inaugural Executive Director of the US Research Software Engineer Association (US-RSE) and a Senior Researcher at the San Diego Supercomputer Center. Her research focuses on science gateways, computational workflows, and distributed computing, which inherently leads to highly interdisciplinary projects. She is especially interested in the sustainability of research software, the usability of computational methods, and the reproducibility of research results. She advocates for improving career paths for research software engineers and facilitators and for incentivizing their work via means beyond the traditional academic reward system.
Simon Hettrick is Deputy Director of the Software Sustainability Institute and a Director of the Southampton Research Software Group. Simon's research focuses on the use of software in the research community with the aim of understanding practices and demographics. Simon is a passionate advocate for Research Software Engineers. He orchestrated a campaign to gain recognition for this community, which has grown from a handful of people in 2013 to a substantial international community. He was the founding chair of the UK's Association of Research Software Engineers and was a founding Trustee of the Society of Research Software Engineering.
Daniel S. Katz is Chief Scientist at the National Center for Supercomputing Applications (NCSA) and Research Associate Professor in Computer Science, Electrical and Computer Engineering, and the School of Information Sciences (iSchool) at the University of Illinois Urbana-Champaign. Dan's interest is in the development and use of advanced cyberinfrastructure to solve challenging problems at multiple scales. His technical research interests are in applications, algorithms, fault tolerance, and programming in parallel and distributed computing, including HPC, Grid, Cloud, etc. He is also interested in policy issues, including citation and credit mechanisms and practices associated with software and data, organization and community practices for collaboration, and career paths for computing researchers.