This event is a part of the "Best Practices for HPC Software Developers" webinar series, produced by the IDEAS Productivity Project. The HPC Best Practices webinars address issues faced by developers of computational science and engineering (CSE) software on high-performance computers (HPC) and occur approximately monthly.
|Reducing Technical Debt with Reproducible Containers
|Date and Time
|2020-11-04 01:00 pm EST
|Tanu Malik (DePaul University)
|Registration, Information, and Archives
Webinars are free and open to the public, but advance registration is required through the Event website. Archives (recording, slides, Q&A) will be posted at the same link soon after the event.
Computational experiments can be challenging to reproduce; researchers have to choose between pursuing a fast-paced research agenda and developing well-organized, sufficiently documented, and easily reproducible software. Like incurring fiscal debt, there are often tactical reasons to take on technical debt in scientific software—such as deferring documentation, organization, refactoring, and unit tests when pursuing a new idea or meeting a conference deadline. However, more often than not, researchers do not repay this technical debt, leading to irreproducible experiments.
The webinar will describe different levels of technical debt and quantify the cost of not repaying the technical debt. The presenter will introduce isolation in containers as a powerful mechanism for reducing portability debt and describe limitations of current container tools. The presenter will introduce a vision of a reproducible container that aims to automate repayment of different types of technical debt, and will describe the current state of this vision with three tools that use isolation, encapsulation, and monitoring to include necessary and sufficient content in the container—both in terms of software and data, and describe the contents of the container. Finally, the presenter will show results of using reproducible containers on domain science and HPC use cases, and provide guidance.
Tanu Malik is an assistant professor in the School of Computing, DePaul University. At DePaul, she directs the Data Systems and Optimization Lab. Her research interests span topics in data provenance, database systems, distributed systems, and cyber-infrastructure for scientific data management. Her group is currently developing methods and systems for improving conduct of reproducible science in computational and data science disciplines. Tanu received the 2019 NSF CAREER award for her work on computational reproducibility. She is also a 2019 Better Scientific Software Fellow. Tanu has actively collaborated with scientists across several institutions. Her research is funded by the National Science Foundation, the Department of Energy, the Sloan Foundation, and the Bloomberg Foundation. Tanu received her PhD in Computer Science from the Johns Hopkins University and she previously worked as a Research Associate Scientist at The University of Chicago.