To develop software that can push the limits of accelerated molecular dynamics, the EXAALT team is working with the IDEAS-ECP project to adopt continuous integration practices.
The Exascale Atomistic capability for Accuracy, Length and Time (EXAALT) is an ECP-funded materials modeling framework designed to leverage extreme-scale parallelism to produce accelerated molecular dynamics simulations. The end goal is to allow the user to access the most appropriate combination of accuracy, length, and time for the problem at hand by trading off the costs of various forms of parallelism. One application of EXAALT is modeling the surface of a fusion reactor (shown above is the interior of a tokamak at MIT, photograph by Chris Bolin, Wikimedia Commons). As shown in Figure 1, EXAALT is a collection of multiple packages, each having its own dependencies. At the heart of EXAALT is the Large-scale Atomic/Molecular Massively Parallel Simulator (LAMMPS), developed at Sandia National Laboratories. To enable simulations with ab initio accuracy and extended time scales, EXAALT also relies on two other critical packages, LATTE and ParSplice, both developed at Los Alamos National Laboratory (LANL).
Software Challenges in EXAALT
In 2017, LANL focused its annual IS&T Co-Design Summer School program on the topic of accelerated molecular dynamics (AMD). The idea was to gather a small group of elite graduate students to help optimize the performance of ParSplice, the AMD driver of EXAALT. During the early weeks of that summer, the developers of ParSplice quickly recognized that their rapidly evolving code had become difficult for the typical (and even advanced) computational scientist to compile and run. The reason was that the existing build system (or lack thereof) had become prohibitively difficult to navigate.
The summer students overcame their early technical difficulties and accomplished great feats. However, the experience inspired the EXAALT team to be more proactive about future productivity issues. Specifically, the team decided to collaborate closely with members of the IDEAS-ECP project to adopt modern and sustainable software-development practices. In the short term, this decision meant implementing a more user-friendly and portable build system (compared with the manual Makefile-based compilation of several different packages). In the longer term, it meant developing a continuous-integration (CI) pipeline to first automate build testing and ultimately accelerate all quality-control efforts.
Productivity and Sustainability Improvement Planning
The EXAALT-IDEAS collaboration, which is still ongoing between researchers at LANL and Argonne National Laboratory (ANL), has proven beneficial to both teams: it provides EXAALT with technical advice and provides IDEAS with clear insight into the fundamental needs of an ECP application project. To help the IDEAS team map their collaboration efforts onto a manageable set of tasks, the group leveraged the Productivity and Sustainability Improvement Planning (PSIP) process. For the first stage of the collaboration, the construction of a minimal end-to-end CMake build system, this process was used implicitly for project planning and execution. The work required CMake script/module implementations within all three major framework components (LAMMPS example: CMakeLists.txt). For the second (ongoing) stage, PSIP was followed more explicitly by compiling the planning/tracking cards shown in Figure 2 (in summarized form).
One significant advantage of the PSIP management approach is that it forces the team to specify the 4-6 steps needed to reach a given goal. In this case, the process helped formulate the actionable items needed to lay the foundation for CI within the existing EXAALT software repository. Although PSIP can be used to manage the goals of any software project, the specific details of each step are highly dependent on the project. For example, different projects will most likely need to work with slightly different technologies to build a practical CI pipeline. Specific details will depend on where and how the repository is organized, as well as the limitations/capabilities of the existing library dependencies. For EXAALT, after careful discussion between teams, it was decided that the CI pipeline would need to depend on four key technologies:
- CMake: To manage end-to-end compilation and, through CTest, test execution
- Boost: To implement and organize functionality tests (integration, regression, and unit) that are run through CTest
- GitLab CI: To automatically build and test the software framework (using CMake) to validate new repository commits (usually inside Docker containers)
- Docker: To generate standard system images (with library dependencies) for use in GitLab CI
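To make the testing piece more concrete, the following is a minimal sketch of the kind of regression check that such a pipeline might run. It is not EXAALT code and, to stay self-contained, it uses only the C++ standard library rather than Boost. Every name here (`within_tolerance`, `lj_pair_energy`, `TestCase`, `count_failures`, `example_suite`) is hypothetical, and the Lennard-Jones pair energy in reduced units is only a stand-in for a quantity that a real simulation would compute.

```cpp
#include <cassert>
#include <cmath>
#include <functional>
#include <string>
#include <vector>

// Hypothetical helper: true when `value` agrees with `reference`
// to within a relative tolerance `rel_tol`.
bool within_tolerance(double value, double reference, double rel_tol) {
    assert(rel_tol >= 0.0);  // a negative tolerance is a programming error
    return std::fabs(value - reference) <= rel_tol * std::fabs(reference);
}

// Stand-in for a computed quantity (assumption, not EXAALT code):
// Lennard-Jones pair energy in reduced units, U(r) = 4 (r^-12 - r^-6).
double lj_pair_energy(double r) {
    const double inv_r6 = 1.0 / std::pow(r, 6);
    return 4.0 * (inv_r6 * inv_r6 - inv_r6);
}

// A named check that returns true on success.
struct TestCase {
    std::string name;
    std::function<bool()> check;
};

// Run every check and report how many failed.
int count_failures(const std::vector<TestCase>& cases) {
    int failures = 0;
    for (const auto& test_case : cases) {
        if (!test_case.check()) {
            ++failures;
        }
    }
    return failures;
}

// Hypothetical suite comparing computed values against known references.
std::vector<TestCase> example_suite() {
    return {
        // The LJ minimum sits at r = 2^(1/6), where U = -1.
        {"lj_minimum_energy", [] {
             const double r_min = std::pow(2.0, 1.0 / 6.0);
             return within_tolerance(lj_pair_energy(r_min), -1.0, 1e-12);
         }},
        // The LJ potential crosses zero at r = 1.
        {"lj_zero_crossing", [] {
             return std::fabs(lj_pair_energy(1.0)) < 1e-12;
         }},
    };
}
```

A `main` function that returns `count_failures(example_suite())` would exit with a non-zero code whenever a check fails. By default, CTest treats a non-zero exit code from a test registered with `add_test` as a failure, so an executable of this shape could be picked up by the CI pipeline without further glue.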
As illustrated in Figure 2, most of the work detailed in the PSIP cards was carried out by members of the IDEAS-EXAALT collaboration by the end of July 2018. That said, the completion of these PSIP cards does not mean that the EXAALT team is finished improving its CI and testing infrastructure. Like most aspects of software engineering, PSIP is an iterative process, and the initial plan may need to change if unexpected roadblocks emerge. Whether or not a PSIP card can be followed to completion, documenting, revising, and repeating the process makes sense when a natural finishing point is reached.
At this stage, the EXAALT team members have successfully adopted continuous integration and are ready to apply the PSIP process to improve their CI pipeline further. The plan is to modify the existing infrastructure to interface with ECP-supported facilities (e.g., ALCF and OLCF). They will also expand on the Boost testing suite to tackle a related software development issue: code coverage.
For more resources on PSIP, please refer to the following articles on the BSSw.io site:
- The BSSw.io PSIP page
- Planning for Better Software: PSIP Tools
- Lightweight Software Process Improvement using Productivity and Sustainability Improvement Planning (PSIP)
- What makes PSIP suitable for the Exascale Computing Project?
- FLASH5 Refactoring and PSIP
Author Bio
Richard Zamora is an assistant computer scientist in the ALCF Data Science group at Argonne National Laboratory. His research focuses on the development and optimization of parallel software for high-performance computing and machine learning. Before joining Argonne, Richard worked in the Theoretical Division at Los Alamos National Laboratory, where he was heavily involved in the design and application of accelerated molecular dynamics algorithms. While working on the EXAALT package at LANL, he helped manage the official software repository, and he has since taken a special interest in sustainable and productive development practices.