
Containers for Deploying Workflow Systems and Application Codes


Published: Aug 28, 2023
Author: Karan Vahi
Topics: Better Performance, High-Performance Computing (HPC), Better Development, Release and Deployment

Scientific workflows are a key enabler of complex scientific computations. They capture the interdependencies between processing steps in data analysis and simulation pipelines, as well as the mechanisms to execute those steps reliably and efficiently. Workflows can capture complex processes, promote sharing and reuse, and provide the provenance information necessary for verifying scientific results and for scientific reproducibility. They also promise to lower the barrier for end scientists to use large HPC resources.

Pegasus

The Pegasus (https://pegasus.isi.edu) workflow management system (WMS) is used for production-grade science in a number of scientific domains. Pegasus allows users to describe their pipelines in a high-level, resource-agnostic manner and then execute them on a variety of execution environments, ranging from local campus clusters and computational clouds to large national cyberinfrastructure such as the Open Science Grid (OSG), the National Science Foundation's ACCESS program, and various DOE supercomputing resources. A key benefit of using Pegasus is its data management capabilities: it automatically transfers the data required by the workflow to the compute nodes, stages generated outputs to a location of the user's choosing, cleans up data that is no longer required, and ensures the integrity of the data during workflow execution.
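As a minimal sketch of what such a resource-agnostic description looks like, the following uses the Pegasus 5.x Python API; the transformation name, file names, and planning options are illustrative placeholders rather than a specific published example.

```python
# Minimal sketch of a resource-agnostic pipeline in the Pegasus 5.x
# Python API. The transformation name ("preprocess") and file names
# are illustrative placeholders.
from Pegasus.api import *

wf = Workflow("example-pipeline")

# Logical files: Pegasus resolves where these live and moves them
# to and from the compute nodes automatically.
raw = File("input.txt")
cleaned = File("output.txt")

# One job in the pipeline; which site it runs on is decided at
# planning time, keeping the description resource-agnostic.
job = (
    Job("preprocess")
    .add_args("-i", raw, "-o", cleaned)
    .add_inputs(raw)
    .add_outputs(cleaned)
)
wf.add_jobs(job)

# Plan the abstract workflow onto a concrete site and submit it.
wf.plan(submit=True)
```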

Use of containers in a workflow to manage application dependencies

In the context of scientific workflows, container technologies are especially interesting for two reasons:

  1. Containers give end users a fully defined, portable execution environment, and thus an important tool for making their scientific work reproducible; and
  2. Containers decrease reliance on the system administrators of centrally managed compute clusters to deploy scientific codes and their dependencies. System administrators often have a conflicting goal of providing a stable, slow-moving, multi-user environment and may not be willing to install libraries or packages that a scientist's application code requires.

Pegasus makes it easy for users to describe the container that a job in a workflow requires, and it supports the major container technologies: Docker, Singularity, and Shifter. Once a container is described, Pegasus ensures that it is pulled automatically at runtime to the node where the job runs, along with the job's input data. Figure 1 provides an overview of how a job in a Pegasus workflow executes on a node, pulls in the container and input data, and stages out the generated outputs; a code sketch of how a container is described follows the figure.

Figure 1. Containerized Job Setup on a Compute Node by Pegasus.
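The sketch below shows one way this description looks in the Pegasus 5.x Python API: a container is declared in the transformation catalog and attached to an executable, so every job using that executable runs inside the container. The image name, site name, and executable path are hypothetical assumptions for illustration.

```python
# Sketch: declaring a container and associating it with a job's
# executable in the Pegasus 5.x transformation catalog. The image
# name, site, and executable path are hypothetical placeholders.
from Pegasus.api import *

tc = TransformationCatalog()

# The container image that Pegasus pulls to the compute node at
# runtime (Container.SINGULARITY and Container.SHIFTER play the
# same role for those technologies).
app_container = Container(
    "myapp-container",
    Container.DOCKER,
    image="docker://example/myapp:latest",  # hypothetical image
)
tc.add_containers(app_container)

# The transformation (executable) that jobs refer to by name; the
# pfn is the path to the executable inside the container.
preprocess = Transformation(
    "preprocess",
    site="condorpool",
    pfn="/usr/bin/preprocess",
    is_stageable=False,
    container=app_container,
)
tc.add_transformations(preprocess)
tc.write()  # write the catalog out for the planner
```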

New training material in the form of Jupyter notebooks has recently been integrated into the main Pegasus tutorial. It covers the basics of packaging code into a Docker container, pushing it to an image registry such as Docker Hub, associating the container with specific jobs in the workflow, and then running the workflow using Pegasus. The notebooks can be found in the Pegasus GitHub repository.

Use of containers for deploying a workflow submit node

Another emerging use case for containers in workflows is to deploy the workflow system itself as a container within the "science DMZ" of an HPC center. Several large DOE supercomputing facilities, such as the Oak Ridge Leadership Computing Facility (OLCF) and the National Energy Research Scientific Computing Center (NERSC), give users access to a Kubernetes environment within their DMZ, from which users can spin up containers and submit jobs to the centers' HPC clusters.

For Pegasus WMS there is now a containerized setup for a "workflow submit host" with Pegasus and HTCondor installed. This container lets the user provision pilot jobs for workflows on supported HPC clusters using htcondor annex. The workflow submit host does not run any compute jobs itself; rather, it submits jobs to a remote cluster. The container is therefore meant to be deployed on a host to which the compute nodes of the cluster you are submitting jobs to can connect back. That host can be within the science DMZ of the HPC facility or any system with a public IP address that allows the necessary inbound connections. An example of the former case is deploying the container on the Spin Cluster at NERSC to submit jobs to Perlmutter. Instructions on how to do this deployment can be found here.

Author bio

Karan Vahi is a Senior Computer Scientist in the Science Automation Technologies group at the USC Information Sciences Institute. He has worked in the field of scientific workflows since 2002 and has been closely involved in the development of the Pegasus Workflow Management System, where he is currently the architect and lead developer in charge of core development. His work on implementing integrity checking for scientific workflows in Pegasus won the Best Paper and Phil Andrews Most Transformative Research Award at PEARC19. He is also a 2022 Better Scientific Software Fellow.


