Julia provides a mathematical front end to LLVM to provide easy and performant CPU and GPU access and lightweight interoperability with existing C, Fortran, R, and Python codes, coupled with a rich unified ecosystem for packaging, data science, and interactive computing. Hence, Julia fills a gap at the intersection of high performance and high productivity for scientific software.
Current requirements for scientific software have expanded beyond number crunching, with reproducibility, AI workflows, data analysis and visualization, continuous integration and continuous deployment (CI/CD) pipelines, packaging, and interactive computing taking central roles in the scientific discovery process.
The current status quo is to use a compiled language (Fortran, C, C++) for performance-critical code, while a higher-level language (e.g., Python) is preferred for portions of the code that are not performance sensitive, with the promise of higher productivity.
Nevertheless, we typically interact with third-party components within the ecosystem, such as build systems, packaging, and programming models to access heterogeneous hardware, e.g., graphical processing units (GPUs).
In this model, we must deal with a many-body ecosystem, with code bases usually composed of a base language + X, in which X may comprise a long list of components, e.g., Python, C, C++, Fortran, CMake, Make, Catch, doctest, pytest, pybind11, conda, pip, Jupyter, apt, yum, etc., with no guarantees of interoperability among them. Thus, overall, economics and productivity suffer in a way that may scale with the number and variety of components required, as well as with project size and performance portability requirements.
Why do we keep creating new languages and ecosystems?
We are in a constant search for new approaches that empower practitioners by lowering existing technical, economic, and social barriers. Fortran allowed access to a "formula translator" model of programming in the 1950s, while C has enabled "portable assembly" for systems programming since the 1970s, and Python has succeeded in the twenty-first century as a friendly interface that enhances productivity. Julia enables an evolutionary approach to today's scientific software development, which is highly exploratory and constantly adapting to new and often unexpected science requirements, such as those arising from COVID-19.
As shown in Figure 1, Julia's high-productivity plus high-performance layer builds upon LLVM for both CPU and GPU access, along with a unified open-source packaging and data science ecosystem hosted on GitHub. Julia also provides lightweight interoperability with existing C and Fortran codes. Hence, the value proposition is not to replace a particular language, but rather to reduce current costs in the scientific software development process (e.g., from prototyping to publication with Python + X).
The ecosystem is NOT an afterthought
In Julia, the project description and dependencies are the starting point when creating a new package via TOML files. Just inspect any Julia package's source code on GitHub and see the Project.toml file for a list of dependencies and version compatibility information. In addition, Julia provides unit testing, interactive computing via the read-eval-print loop (REPL), a standard library with mathematical and data abstractions, and a unified package manager with access to a rich ecosystem for scientific computing, data science, visualization, and AI.
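As an illustration, a freshly generated package carries a Project.toml along these lines. This is a hypothetical sketch: the names, UUIDs, and versions below are placeholders, not real registry entries.

```toml
name = "MyPackage"                             # hypothetical package name
uuid = "00000000-0000-0000-0000-000000000000"  # placeholder; Pkg generates a real UUID
authors = ["Jane Doe <jane@example.com>"]
version = "0.1.0"

[deps]
Statistics = "00000000-0000-0000-0000-000000000000"  # placeholder registry UUID

[compat]
julia = "1.6"
```

The [compat] section is what lets the registry and package manager coordinate versions up front, rather than leaving breakage for the end user to discover.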
So how does this circumstance differ from Python's ecosystem? Recently, GitHub Actions bumped the Python version on some of its runners to 3.11, causing problems for many Python packages. As of March 2023, 59.4% of the most popular Python packages did not yet indicate support for that version of Python on PyPI. Thus, the cost of this coordination is passed to the end user until package developers can react. In contrast, Julia promotes a "predictive" rather than "reactive" maintenance approach, in which packages in Julia's General registry must meet certain requirements. We don't live in a perfect world, so the value of this coordination lies not only in "not breaking the API" or "fixing bugs", but also in enriching user-developer communication through an open-source process for package updates prior to deployment.
This model of "batteries included" is not new for more targeted languages, such as R or MATLAB, but it is indeed new for general languages that put performance (Julia) and safety (Rust) at the forefront. I find myself writing more tests and verifying my ideas in the REPL when using Julia, rather than writing boilerplate code as I would in a general-purpose language or dealing with mismatched package versions.
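As a minimal sketch of what those "batteries" look like, the built-in Test standard library covers the common cases; the function and test values below are hypothetical examples, not from any real package.

```julia
using Test

# A small function and its unit tests using the built-in Test stdlib.
harmonic_mean(a, b) = 2a * b / (a + b)

@testset "harmonic_mean" begin
    @test harmonic_mean(1.0, 1.0) == 1.0
    @test harmonic_mean(2.0, 6.0) == 3.0
    # No method is defined for strings, so this should throw
    @test_throws MethodError harmonic_mean("a", "b")
end
```

The same `@testset` blocks run locally from the REPL and in CI via the package manager's `Pkg.test`.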
Making heterogeneous hardware more accessible
JuliaGPU and JuliaParallel provide information on the packages that provide access to several vendors' GPUs, e.g., NVIDIA (CUDA.jl), AMD (AMDGPU.jl), and Intel (oneAPI.jl). These high-level interfaces provide an excellent mathematical playground for exploring fine-granularity parallelization on GPUs. The CUDA.jl docs are a great starting point for those familiar with NVIDIA's CUDA or who want to learn about custom GPU kernel programming. Julia uses an integrated GPUCompiler.jl layer, whereas Python's PyCUDA and CuPy require programmers to pass custom kernels as strings.
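To give a flavor of this, the sketch below writes a "saxpy" as a fused broadcast over plain CPU arrays; with CUDA.jl, the same expression runs on the GPU by swapping `Array` for `CuArray` (shown only as comments here, since running them requires an NVIDIA GPU). The array sizes and values are arbitrary illustrations.

```julia
# A "saxpy" (y = a*x + y) expressed with broadcasting; the same
# expression runs on CPU Arrays or, with CUDA.jl, on GPU CuArrays.
a = 2.0f0
x = ones(Float32, 1_000)
y = fill(3.0f0, 1_000)

y .= a .* x .+ y        # fused broadcast, no temporaries allocated

println(sum(y))  # 5000.0

# With CUDA.jl the only change is the array type (GPU required):
#   using CUDA
#   xd, yd = CuArray(x), CuArray(y)
#   yd .= a .* xd .+ yd   # compiles to and runs as a GPU kernel
```

This is the sense in which the GPU interfaces are a "mathematical playground": the numerical expression stays the same, and the array type selects the hardware.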
Compose to prevent inheritance bloat
Julia does not support object-oriented programming in the same manner as C++ or Python. Julia projects are organized by modules; proper data locality and composition are achieved using "data container" structs and type hierarchy trees in which abstract types have no members (see related discussions). Think of composition as derived "has-a" based, instead of derived "is-a" based. This weak coupling prevents deep hierarchies of classes that can quickly get out of hand, while encouraging software developers to think of structs as data containers to which operations are applied, as one would do in languages like pre-2003 Fortran, R, or C.
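A minimal sketch of this style, with hypothetical type names: a `Simulation` "has-a" mesh rather than "is-a" mesh, and generic functions dispatch on an abstract type that carries no fields.

```julia
# Composition ("has-a") instead of inheritance ("is-a").
abstract type AbstractMesh end   # abstract types have no members

struct Grid1D <: AbstractMesh
    x::Vector{Float64}           # struct as a plain data container
end

# A simulation *has* a mesh; it does not inherit from it.
struct Simulation{M<:AbstractMesh}
    mesh::M
    time::Float64
end

# Operations are generic functions applied to the data containers.
npoints(m::Grid1D) = length(m.x)
npoints(s::Simulation) = npoints(s.mesh)

sim = Simulation(Grid1D(collect(0.0:0.1:1.0)), 0.0)
println(npoints(sim))  # 11
```

New mesh types can be added later by subtyping `AbstractMesh` and defining `npoints`, without touching `Simulation` or building a class hierarchy.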
Interoperate with existing software
Julia enables lightweight reusability of existing Fortran and C infrastructure via the @ccall macro. Similarly, Python and R interoperability is possible with PyCall.jl and RCall.jl, respectively. Thus, Julia promotes reuse over reinvention, which is both important and useful given the volume of mature scientific software.
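For example, a C library function can be called directly with no wrapper code or build step. The sketch below calls `strlen` and `toupper` from the C standard library, assuming a standard libc is loaded in the process, as on Linux or macOS.

```julia
# Call libc's strlen via @ccall: argument and return types are annotated inline.
len = Int(@ccall strlen("Julia"::Cstring)::Csize_t)
println(len)  # 5

# Call libc's toupper on a single character code.
up = @ccall toupper('j'::Cint)::Cint
println(Char(up))  # J
```

The same mechanism scales from one-off calls like these to full bindings for existing Fortran and C scientific libraries.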
Jupyter and Pluto.jl notebooks
Computational notebooks, Jupyter in particular, have been widely adopted in science and need no introduction. Jupyter is commonly distributed via Anaconda, with Python kernels requiring a conda environment to manage the required dependencies before launching the server and web client interface. Notebooks are stored in the *.ipynb file format, which is based on JSON.
Jupyter also supports interactive Julia notebooks via the IJulia kernel package, which can understand information readily available in Project.toml files. This approach is really neat when using services like mybinder.org for distributing and sharing notebook projects "as-is" with a broader audience.
Pluto.jl is the Julia-exclusive alternative that favors "reactive" notebooks for interactivity, essentially leveraging the fact that packaging is part of the language. There is no need to set up an environment: just launch Pluto from the REPL (illustrated below), import package dependencies directly into a notebook, and save it as a Julia file (.jl) in which text (in Markdown) and code cells are identified simply by annotations. The first time Pluto is launched, it provides several sample notebooks showing what can be done; the introduction to Plots.jl is shown in Figure 2. I enjoy the plug-and-play approach, in which the mathematical syntax, software ecosystem, and packaging simplify my work.
Launching a Pluto Notebook from the REPL.
julia> using Pluto
julia> Pluto.run()
┌ Info:
│ Opening http://localhost:1234/?secret=AZ8Ynd82 in your default browser... ~ have fun!
│ Press Ctrl+C in this terminal to stop Pluto
The Julia community is where the real value of Julia lies. It is very enthusiastic about helping others and engages through modern tools, such as the Julia Slack and Discord channels, along with each package's GitHub issue tracker. JuliaCon is the annual community gathering; a variety of interesting talks and tutorials from there can be found on YouTube. Many contributions and much support come from JuliaHub (formerly Julia Computing) as part of its mission.
Last summer we organized a full-day workshop, entitled Julia for Oak Ridge National Laboratory Science (JuFOS), which, to our surprise, attracted 101 registrations from a range of scientific domains. Roughly 90% of the participants responded that they wanted to learn more about Julia, while roughly 50% indicated an interest in alternatives to the current status quo for building scientific workflows in the high-performance plus high-productivity space.
For people focused on high-performance computing (HPC), it is worth noting that the community is invested in performance from day one. Several members of the community shared our thoughts about HPC and Julia in a recent paper. Meanwhile, building the community has kept many of us very busy. Many venues have been organized in recent years, including a tutorial and BoFs organized by the U.S. Department of Energy Exascale Computing Project (ECP), a Supercomputing BoF, a JuliaCon minisymposium, and a monthly JuliaHPC call to provide exposure and highlight the work done by community members.
It's also worth mentioning that unifying and coordinated initiatives coming out of the ECP, such as Spack and E4S, are an invaluable source of HPC packages that can be leveraged within the Julia ecosystem.
Where to start
For a more technical introduction to the Julia ecosystem, you might want to start with a blog article that I recently updated: First Project Using the Julia Language. The article includes links to numerous other resources that can help you get started. I recommend that anyone trying the language for the first time use Visual Studio Code, which offers excellent Julia support through its extensions marketplace, along with the upcoming Julia v1.9 release for a better experience. I also encourage seeking out colleagues who may be using Julia and building a local community, including sharing experiences, tips, and examples. Ultimately, scientific software, in Julia or other languages, benefits from communities. As a bonus, I've found that using GitHub Copilot with the simple Julia APIs can be a welcome boost to the productivity of writing and porting code to Julia. Its autocompletion-like capabilities save on typing (but not thinking).
Julia is part of the natural evolution of programming languages. Powered by LLVM and a carefully thought-out ecosystem, Julia's design decisions and value proposition target the high-performance plus high-productivity space. Mastering a new programming language can be a steep initial investment, and eventual adoption is a result of both technical and non-technical factors. I believe it is important to expose the scientific computing community to the value proposition of newer alternatives like Julia, and that is the goal of this article. The actual value is ultimately determined by each user and project, and their particular scientific software needs.
I want to thank the many people in the community for enabling our efforts, in particular the IDEAS and PROTEAS-TUNE sub-projects within ECP, the Sustainable Research Pathways program, and the Bluestone project.
William F. Godoy is a Senior Computer Scientist in the Computer Science and Mathematics Division at Oak Ridge National Laboratory (ORNL). His interests are in the areas of HPC, scientific software, programming models, data, and parallel I/O. At ORNL, he has contributed to scientific software projects funded by the Exascale Computing Project and ORNL's neutron science facilities. Godoy received a PhD in Mechanical Engineering from the University at Buffalo, The State University of New York. He is a 2022 BSSw Fellowship honorable mention, a member of the United States Research Software Engineer Association and ACM, and an IEEE senior member serving in several technical venues.