Junchao Zhang

Affiliation: Argonne National Laboratory

GitHub

MPI debugging resources and community hub

"There is always one more bug to fix." When code crashes, it has to be debugged, which makes debugging an essential part of code development. Debugging is a detective process of finding and solving mysteries. In high-performance computing (HPC), however, it can be particularly challenging. Many HPC codes are written using the Message Passing Interface (MPI) and run on anywhere from a handful to millions of processes. Debugging these codes requires following many clues across many processes and can be an exciting, frustrating, or even desperate task. It is important to know the tools and best practices that help you fix bugs, save time, and feel more fulfilled. Unfortunately, debugging resources for MPI are limited and scattered throughout the HPC community. BSSw Fellow Junchao Zhang aims to fill this gap by creating a community hub that introduces MPI debugging tools and best practices and lets MPI code developers share their tips and tricks. The project will keep beginners in mind and focus mostly on freely available tools.

Junchao is a research software engineer in the Mathematics and Computer Science Division of Argonne National Laboratory. He is a developer of PETSc (the Portable, Extensible Toolkit for Scientific Computation), a widely used math library written in C. His primary focus is on the MPI communication module and the GPU backends in PETSc, and he often needs to debug PETSc codes. Before joining PETSc, he was an MPICH developer, and he still collaborates closely with the MPICH team at Argonne.