The promise of digital artifacts for reproducibility
Science must communicate. The bedrock of scientific communication is the written description of experiments, which must be reproduced or replicated before their results can be accepted as science. As computation plays an increasingly important role in scientific disciplines, digital assets are becoming a fundamental building block of scientific communication that goes far beyond the digital representation of textual documents: digital artifacts can represent complex workflows that, given adequate resources, automatically repeat experiments, reproduce results, and validate claims of contributions in research labs as well as classrooms, with the potential to greatly accelerate learning and scientific progress.
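To make this idea concrete, the following is a minimal sketch of what such an executable digital artifact could look like. The experiment, its seed, and the tolerance are purely illustrative placeholders, not part of any real study or tool.

```python
# Minimal, hypothetical sketch of an "executable" digital artifact:
# re-run a (stand-in) experiment and check that the output matches the
# value claimed alongside it. All names and values are illustrative.
import random
import statistics


def run_experiment(seed: int = 42, n: int = 1000) -> float:
    """Stand-in for a real experiment: a seeded simulation."""
    rng = random.Random(seed)
    samples = [rng.gauss(0.0, 1.0) for _ in range(n)]
    return statistics.mean(samples)


if __name__ == "__main__":
    claimed = run_experiment()     # in practice, the value reported in the paper
    reproduced = run_experiment()  # the run a reader or reviewer repeats
    assert abs(reproduced - claimed) < 1e-12, "claim not reproduced"
    print(f"claim reproduced: mean = {reproduced:.6f}")
```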
Practical challenges
In practice, communicating digital artifacts has proven challenging. Communication can occur over time, in which a 10-year-old digital artifact still performs its function, or over space, in which a digital artifact performs its function at a different location. The challenges stem from the contexts required to run digital artifacts: these contexts evolve continually and rapidly because of software and hardware infrastructure upgrades, security patches, configuration changes, and hardware aging. As a result, communication over time fails within weeks or months, and digital artifacts stop working unless they are updated. Reproducibility requires communication over both time and space, including communication among reviewers, researchers, instructors, students, and practitioners. This communication is labor intensive, and because the contexts of all these stakeholders evolve independently, communication over space is likely to fail immediately.
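One small mitigation is to record the execution context alongside every result, so that a later failure to reproduce can be traced to a context that has drifted. The sketch below is an illustration using only the Python standard library, not an established tool; the result value is a placeholder.

```python
# Illustrative sketch: capture the execution context alongside a result so
# that a failure to reproduce years later can be diagnosed as context drift.
import json
import platform
import sys
from datetime import datetime, timezone


def capture_context() -> dict:
    """Record the parts of the environment that tend to evolve over time."""
    return {
        "python": sys.version,
        "platform": platform.platform(),
        "machine": platform.machine(),
        "captured_at": datetime.now(timezone.utc).isoformat(),
    }


if __name__ == "__main__":
    record = {"result": 0.1234, "context": capture_context()}  # placeholder result
    with open("result_with_context.json", "w") as f:
        json.dump(record, f, indent=2)
    print(json.dumps(record, indent=2))
```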
Open source software engineering
Meanwhile, open source software communities have evolved sophisticated technologies for sharing and organizing digital assets according to principles that have enabled collaborations of unprecedented scale and speed, including versioning, branching, pull requests, and continuous integration. As scientific communities embrace open science and the FAIR principles, many scientists are discovering the value of these open source software techniques and strategies for research practices.
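As one illustration of carrying these practices into research code, a repository might ship a regression test like the sketch below, so that continuous integration re-validates a published claim on every pull request. The analysis function, input data, and expected value are hypothetical placeholders.

```python
# Illustrative sketch of a CI-style regression test for a research claim:
# every pull request re-runs the analysis and compares it to the value
# reported in the (hypothetical) paper. All names and numbers are placeholders.
import unittest


def analysis(data):
    """Stand-in for the real analysis pipeline."""
    return sum(data) / len(data)


PUBLISHED_VALUE = 2.5  # value reported in the hypothetical paper


class TestReproducibility(unittest.TestCase):
    def test_claim_is_reproduced(self):
        data = [1.0, 2.0, 3.0, 4.0]  # fixed input shipped with the artifact
        self.assertAlmostEqual(analysis(data), PUBLISHED_VALUE, places=9)


if __name__ == "__main__":
    unittest.main()
```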
ACM REP '23
The 2023 ACM Conference on Reproducibility and Replicability (ACM REP ’23) takes place June 27-29, 2023, in Santa Cruz, California, USA. It evolved out of Practical Reproducible Evaluation of Computational Systems (P-RECS), a series of annual workshops convened at the ACM International Symposia on High-Performance Parallel and Distributed Computing (ACM HPDC) from 2018 through 2022, and out of the ACM Emergent Interest Group on Reproducibility and Replicability (EIGREP). ACM REP brings together experts and practitioners interested in the advancement and conduct of reproducible science in computing disciplines. The first-of-its-kind conference will serve as a premier forum for the exchange and presentation of the concepts, tools, techniques, practice, and state of the art in reproducibility and replicability. Many computer science conferences already have their own reproducibility initiatives, including awards for distinguished reproducibility artifacts. ACM REP will be an important forum for sharing innovative best practices across these communities, with a program consisting of peer-reviewed articles, invited talks, panels, posters, and demonstrations.
ACM REP '23 offers a 3-day program featuring three keynote speakers: Torsten Hoefler (ETH Zurich), Juliana Freire (NYU), and Grigori Fursin (Co-chair of the MLCommons task force on automation and reproducibility, President of the cTuning Foundation, and Founder of cKnowledge.org). Thanks to the authors who submitted papers, the program chairs Tanu Malik and Jay Lofstead, and their fantastic program committee, eleven peer-reviewed papers will be presented in four sessions spanning advancing reproducibility, testing reproducibility, costs and benefits, and benchmarking reproducibility. Thanks to Johannes Pietrzyk, all presented papers will be published in the ACM proceedings. Thanks to the tutorial authors and the tutorial chair Alexandru Uta, the third day of the conference is dedicated to three tutorials. Thanks to the local arrangements team of Stephanie Lieggi and Yelena Martinovskaya, the conference will take place on the beautiful campus of the University of California, Santa Cruz. The event is designed for hybrid attendance and welcomes remote participation. In-person attendance will be rewarded with receptions on the evenings of the first and second days.
ACM REP community resources
ACM REP is introducing two resources for the reproducibility community and asks the community to help maintain them:
The Index for Conferences with Reproducibility is a Google Sheet that aims to list all conferences with reproducibility-related programs and awards, along with their awardees. Anyone with information about a conference that should be listed is highly encouraged to update the spreadsheet. A link to the Index will be posted on the ACM REP website.
The ACM REP Speaker Directory allows the nomination of individuals who should be invited to speak about reproducibility topics. The directory is intended for event organizers who would like to include a reproducibility component in their program. To nominate someone, use this form. Nominations will be reviewed by the ACM REP Steering Committee at regular intervals and published on the ACM REP website.
Update
A summary of the ACM REP 2023 conference was published on 2023-07-13.
Author bios
Carlos Maltzahn is the PI of the Open Source Program Office (OSPO), UC Santa Cruz, and the founder and director of the UC Santa Cruz Center for Research in Open Source Software (CROSS). He also co-founded the Systems Research Lab, known for its cutting-edge work on programmable storage systems, big data storage and processing, scalable data management, distributed system performance management, and practical reproducible evaluation of computer systems. Carlos joined UC Santa Cruz in 2004, after five years at NetApp working on network intermediaries and storage systems. In 2005 he co-founded and became a key mentor on Sage Weil’s Ceph project. In 2008 Carlos became a member of the computer science faculty at UC Santa Cruz and has since graduated nine Ph.D. students. Carlos earned an M.S. and a Ph.D. in Computer Science from the University of Colorado at Boulder. His work is funded by nonprofits, government, and industry, including the National Science Foundation, the U.S. Department of Energy, the Alfred P. Sloan Foundation, and CROSS.
Philippe Bonnet is a professor at the IT University of Copenhagen. He is a Marie Curie fellow with a track record of successful research projects under DARPA, NSF (while a research associate at Cornell University), EU, and Danish funding (first at the University of Copenhagen and, since 2009, at ITU). Philippe is an experimental computer scientist with a background in database management. For twenty years, he has explored the design, implementation, and evaluation of database systems in the context of successive generations of computer classes, in particular wireless sensor networks and cloud computing. From 2011 to 2015, Philippe managed the CLyDE project, which promoted open-channel SSDs and resulted in two contributions to the Linux kernel and two patents. Currently, Philippe's research focuses on computational storage. Philippe is co-author, with Dennis Shasha of New York University, of a reference book on database tuning.