Software Sustainability in the Molecular Sciences

Published Nov 14, 2019
Author Theresa L. Windus and T. Daniel Crawford
TOPICS
Better Collaboration
Projects and Organizations
Conferences and Workshops
Better Planning
Design
Software Interoperability

The molecular sciences -- including chemistry, materials, biophysics and biochemistry -- have a long history of developing software to answer core scientific questions. The field also has a long history of challenges to software sustainability. This blog article discusses some of the software sustainability challenges and the opportunities/possible solutions that the Molecular Sciences Software Institute (MolSSI) is working toward with the molecular sciences software development community.

The MolSSI is an NSF-funded project that is a nexus for science, education, and cooperation for the global computational molecular sciences community. Funded in 2016, the MolSSI seeks to provide software expertise and infrastructure, education and training, and community engagement and leadership in molecular sciences software development. The fundamental purpose of the MolSSI is to serve and enhance the software development efforts of the broad field of computational molecular science.

Challenges

Software, hardware, and educational challenges

Software in the molecular sciences ranges from small utility programs used to manipulate inputs and outputs, to analysis programs, to libraries that provide a particular functionality, to monolithic scientific codes that are capable of many types of simulations. Each code is usually developed and optimized with a particular computing environment in mind (such as a laptop or workstation, a small cluster, or the emerging exascale and quantum computers). Taken together, this software runs on the complete suite of computational hardware, from laptops to supercomputers. In addition, the software is developed in many different languages (Fortran, Python, C++, C, and scripting languages, for example) with dependencies on different math libraries and compiling environments.

The diversity and evolving set of computational hardware and software stacks lead to significant sustainability challenges. Molecular computational scientists, as with most software developers, want their software to be portable and run on multiple platforms (as appropriate). This is one reason why most developers are happy to use math libraries that are optimized by the hardware and software vendors to perform well on a particular platform. Most of these libraries have common application programming interfaces (APIs) that allow the developer to link in different math libraries without having to make changes to the application software. Unfortunately, APIs and shared data formats are not common for most of the rest of the molecular sciences software ecosystem. Thus, sharing of software is not as easy as it should be. Also, new or faster algorithms for particular hardware are usually not adopted by other codes. Therefore, developers end up optimizing similar application software on the same hardware and software stacks. This extra work translates into more resources being needed across the community to sustain the software.
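
As a rough illustration of why a common API makes libraries interchangeable, here is a minimal Python sketch. It is not taken from any real math library: the names `MatMulBackend`, `NaiveBackend`, `TransposedBackend`, and `gram_matrix` are hypothetical, and the two backends merely stand in for a reference and a vendor-tuned implementation. The point is only that the application function depends on the shared interface and not on either backend.

```python
from typing import List, Protocol

Matrix = List[List[float]]


class MatMulBackend(Protocol):
    """Hypothetical shared interface that every backend agrees to implement."""

    def matmul(self, a: Matrix, b: Matrix) -> Matrix: ...


class NaiveBackend:
    """Reference backend: textbook triple loop over indices."""

    def matmul(self, a: Matrix, b: Matrix) -> Matrix:
        rows, inner, cols = len(a), len(b), len(b[0])
        return [
            [sum(a[i][k] * b[k][j] for k in range(inner)) for j in range(cols)]
            for i in range(rows)
        ]


class TransposedBackend:
    """Stand-in for a tuned backend: transposes b once, then pairs rows."""

    def matmul(self, a: Matrix, b: Matrix) -> Matrix:
        b_columns = list(zip(*b))
        return [
            [sum(x * y for x, y in zip(row, col)) for col in b_columns]
            for row in a
        ]


def gram_matrix(backend: MatMulBackend, a: Matrix) -> Matrix:
    """Application code: computes A^T A using only the shared interface."""
    a_transposed = [list(col) for col in zip(*a)]
    return backend.matmul(a_transposed, a)
```

Swapping `NaiveBackend()` for `TransposedBackend()` in a call to `gram_matrix` requires no change to the application function, which is the property the math libraries already offer and most of the rest of the ecosystem lacks.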

Much of the early software in the field was developed in an academic setting where few, if any, software engineering practices were used. To be fair, the wealth of software engineering practices in vogue today was not taught (or even developed) at the time of these early programs. In addition, there was generally no formal training of molecular scientists in computer science, including the use of languages and data structures. Even today, graduate degrees in the molecular sciences generally require a minimum number of courses, and adding courses is often discouraged in favor of more time spent on research. Undergraduate curricula in the molecular sciences already include a significant amount of coursework, and adding requirements can be difficult, especially if the faculty are not convinced that computational training is necessary. Unfortunately, even with the advances made in computer science and engineering as well as changes in the social environment in some molecular science departments, most of the computer science training for computational molecular scientists is still ad hoc, although the wealth of online materials and courses makes obtaining this knowledge much easier than it used to be.

Socioeconomic challenges

Sustainability is as much a socioeconomic problem as a technical one; thus, there is a supply and a demand side to software sustainability. On the supply side, the cost model for the program needs to be considered, as well as the commitment of the developer community. If the software is open source, this includes the cost of the developers for the initial development, as well as the incentives and appropriate licensing models that facilitate external developer communities. Of importance here are low barriers to entry -- ease of development, documentation, communication among the software developers, etc. Developing this community can take time and effort, and therefore expense. On the demand side, the software development must align with user needs. Consistent and frequent communication with potential and existing users is necessary in order to ensure that the demand will be high. Factors that determine the engagement of users include tangible metrics such as software quality, as well as intangible metrics such as the external user community's confidence in the code and process. The software development must be driven by maximum quality and community confidence in the code. For some software, a cost recovery model will require providing support of, and services around, software as opposed to selling per-use licenses. In addition, it must be understood that not all software should be sustained. Natural attrition of software is something we should expect in an ever-changing socioeconomic environment.

Another social challenge in the molecular sciences is that academic software is usually developed only as part of the process of producing simulations or verifying theoretical developments -- to get the physics right -- for publications in journals and toward advancement in a graduate degree. In other words, the software is often not the primary focus; it is just a vehicle used toward a publication -- the primary metric still used in most science fields. Thus, developers often skip the upfront cost necessary for clear and effective design decisions, instead taking shortcuts in the development to "just get the software working," skipping testing for edge cases, and putting off documentation (that then never appears). As a result, the software may be difficult to maintain and, therefore, difficult to sustain. Often, those developers who do take the time to produce well-thought-out, hardened software produce fewer publications. Although the social environment is changing, software still is not considered as important a product as a publication; projects such as the NSF-funded URSSI project seek to change this status quo. Careers for such developers often lead into industry positions outside of the molecular sciences. While these careers can be satisfying, such moves mean that some of the best computational engineers are enticed out of the very field that needs them.

Opportunities/possible solutions

Meeting these challenges can be daunting; however, significant progress has been made. The molecular sciences community itself has come to recognize that traditional development methods have sustainability challenges and has sought out opportunities to change the environment. Indeed, the fact that NSF is now funding not only the MolSSI effort but other computational science software development efforts such as the Institute for Research and Innovation in Software for High Energy Physics and the Science Gateways Community Institute is a significant sign that the culture is changing. These funding opportunities allow the community to implement significant changes in our software ecosystem that might otherwise not be possible. Here we outline some of those possible solutions that we have been involved in as part of the MolSSI project.

Education and outreach

As one of the most significant parts of our agenda, the MolSSI has engaged in a large education and outreach effort. The MolSSI offers summer schools, educational workshops, and tutorials at the undergraduate, graduate, and postgraduate levels that focus on best practices in software engineering and applying those techniques to important topics in the molecular sciences. Several of our workshops also address programming issues related to changing hardware and high-performance computing. To date, these activities have reached over 300 students, and best software practices are becoming the de facto standard for the field. Materials also are made available online in a tutorial form for those who cannot make the formal meetings or who have specific needs. In addition, the MolSSI has developed an online resource page for best practices and has partnered with the Better Scientific Software effort to provide general resources from the broader computational sciences community. Furthermore, the MolSSI has awarded software fellowships to approximately 50 graduate students and postdocs with the intent of providing in-depth training in software best practices and engineering. These educational opportunities are gaining a foothold in the community, and we expect they will have a long-term impact on how the community as a whole develops software as these students progress into the next phases of their careers.

The MolSSI also sponsors approximately eight software workshops per year, reaching more than 500 participants, to understand the challenges of software development in different pockets of the community. Most of these workshops are open discussions of bottlenecks to software development as well as issues associated with creating sustainable software. The workshop reports include specific, actionable recommendations for the MolSSI to aid the developments in the communities. Many of these recommendations -- such as creating API standards and key infrastructure to help codes work together to enable world-class science solutions -- have huge potential to increase software sustainability in the community. We note that most of these workshops are led by members of the community and not by the MolSSI itself; we are a key partner in the workshops, but the organizers determine the primary topics of discussion.

Software and data API standards

The MolSSI has also taken a leadership role in the development of code and data API standards. These common APIs will facilitate a level playing field for all developers, enabling even the smallest software projects to gain recognition for a unique feature or performance optimization. In addition, the MolSSI is developing infrastructure software that uses these APIs to enable tasks that have proven to be difficult in our community, such as developing commonly accessible data sets using multiple quantum chemistry codes (the QCArchive) and enabling the coupling of multiple quantum and molecular mechanics codes to perform very large, complex molecular simulations (the MolSSI Driver Interface). Ultimately, these types of standardizations enable fair competition, with the users picking the most successful products for their needs.
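
To make the idea of a shared data format concrete, the following Python sketch shows a versioned, JSON-serializable molecule record that two independent codes could both read and write. It is an assumption-laden illustration only: the schema name, the field names, and the choice of a flat coordinate list are invented here and are not the formats used by the QCArchive or the MolSSI Driver Interface.

```python
import json
from dataclasses import asdict, dataclass
from typing import List

# Hypothetical schema identifiers, invented for this example.
SCHEMA_NAME = "example_molecule"
SCHEMA_VERSION = 1


@dataclass
class Molecule:
    symbols: List[str]      # element symbols, one per atom
    geometry: List[float]   # flat list: x, y, z for each atom (units assumed agreed upon)
    charge: int = 0

    def __post_init__(self) -> None:
        # Reject records whose coordinates do not match the atom count.
        if len(self.geometry) != 3 * len(self.symbols):
            raise ValueError("geometry must hold three coordinates per atom")


def to_json(mol: Molecule) -> str:
    """Serialize a molecule together with the schema name and version."""
    record = {
        "schema_name": SCHEMA_NAME,
        "schema_version": SCHEMA_VERSION,
        **asdict(mol),
    }
    return json.dumps(record, sort_keys=True)


def from_json(text: str) -> Molecule:
    """Parse a record, refusing anything that does not declare this schema."""
    record = json.loads(text)
    if (record.get("schema_name") != SCHEMA_NAME
            or record.get("schema_version") != SCHEMA_VERSION):
        raise ValueError("unsupported schema")
    return Molecule(
        symbols=record["symbols"],
        geometry=record["geometry"],
        charge=record["charge"],
    )
```

Carrying the schema name and version inside every record is one common way to let a format evolve without silently breaking the codes that consume it; a real standard would also have to settle units, required fields, and validation rules.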

Best practices and new metrics

Of course, this primarily MolSSI-developed infrastructure software faces its own sustainability challenges. However, the MolSSI strives to practice what it preaches: designing and developing with its users (mostly developers of simulation codes, but also actual end users); using thoughtful design to enable modularity, separation of concerns, reusability, and ease of use; using standard APIs that enable a broad swath of developers to engage; developing documentation (both user and developer); using distributed version control and hosting platforms such as GitHub, along with automated testing and messaging tools, to engage new developers and track issues; and building the user community through seminars, workshops, and personal communication. The MolSSI will also continue to engage with commercial interests to provide training (as appropriate) and services associated with the software that is being developed.

Finally, the MolSSI is working to encourage new metrics within the field to reward those who take the software development path -- other than a large salary at an industrial position not related to molecular sciences. These efforts include encouraging developers to use DOIs for their software and datasets, raising the level of a software release to the same level as a publication, and using data within tools such as GitHub to show productivity, as well as encouraging the continued formation of positions related to scientific software development.

Acknowledgments

This blog post is based on a white paper at the 2019 Collegeville Workshop on Sustainable Scientific Software (CW3S19) and is cross-posted on the BSSw and URSSI sites.

Author bios

Distinguished Professor Theresa Windus is the Liberal Arts and Sciences Dean's Professor at Iowa State University and a Laboratory Associate with Ames Laboratory. She received her bachelor’s degree in chemistry, mathematics, and computer science from Minot State University and her Ph.D. from Iowa State University. She is currently the Director of the NWChemEx DOE Exascale Computing Project and Deputy Director of the MolSSI. Her research focuses on the development of high-performance computational chemistry methods and their use in practical applications. She is a Fellow of the American Association for the Advancement of Science and has garnered multiple research and teaching awards.

Dr. T. Daniel Crawford is University Distinguished Professor of Chemistry at Virginia Tech and the Director of the Molecular Sciences Software Institute in Blacksburg, Virginia. He received his bachelor's degree in chemistry in 1992 from Duke University and his Ph.D. in 1996 from the University of Georgia. His research focuses on quantum chemical models of molecular response properties in liquid environments. He is a Fellow of the American Chemical Society and the winner of the 2010 Dirac Medal of the World Association of Theoretical and Computational Chemists.

Copyright © 2020 Better Scientific Software under MIT License