
SC24: The Importance of Open Standards in High Performance Computing

Rob Mueller-Albrecht

A Quick Snapshot from Supercomputing 2024 (SC24)

 

In November, at SC24 in Atlanta, Dr. Tom Deakin, Senior Lecturer in Advanced Computer Systems at the University of Bristol (UK), took to the stage in Intel’s exhibitor booth theater to highlight the opportunities that the variety of heterogeneous architectures brings to applications, and how application performance and portability can be rigorously measured and compared across diverse architectures. He shared strategies for writing performance-portable applications and presented the roles that the ISO languages C++ and Fortran, as well as parallel programming models and abstractions such as OpenMP*, SYCL*, and Kokkos*, play in the ever-changing heterogeneous landscape.

Dr. Deakin takes us on a journey down the yellow brick road, on a quest toward the seemingly elusive goal of true scalability across varied distributed computing infrastructures.

How can we best achieve the “Three Ps” of Performance, Portability, and Productivity?

AI and HPC workloads execute on an ever more complex and diverse set of CPUs paired with GPUs, AI accelerators, and custom accelerators, and supercomputer update cycles are accelerating. Currently, about two-thirds of the supercomputers running at research institutions don’t take advantage of GPUs and accelerators at scale, but that share is rapidly shrinking.

At the same time, many of the most challenging workloads have a long development history spanning decades. Those workloads need to be brought into the modern world of scalable heterogeneous computing, and in a way that is future-proof and will scale across architectural changes that have yet to be determined.

The question this talk is thus trying to answer is: which programming paradigm can show us the path to the future? How do developers write and update their applications so that they know they will get good performance as they migrate from one system to another?

This gets us to the key concept of performance portability:

 

“A code is performance portable if we can get a similar fraction of peak performance on a range of different target architectures.”

  • Needs to be a good fraction of the best achievable performance (i.e., hand-optimized)
  • The range of architectures depends on your goal, but it is important to allow for future developments
  • Consistency in the distribution of performance across systems

 

Pennycook, Sewall, Jacobsen, Deakin, McIntosh-Smith, “Navigating Performance, Portability, and Productivity,” https://doi.org/10.1109/MCSE.2021.3097276
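
To make that definition measurable, the work cited above uses a harmonic-mean formulation: collect each platform’s efficiency (as a fraction of peak or of the best hand-optimized result), average the reciprocals, and score zero if the code does not run on every platform in the chosen set. The sketch below is an illustration of that formulation rather than code from the talk, and the example efficiency values are hypothetical.

```cpp
#include <iostream>
#include <vector>

// Harmonic-mean performance portability metric (after Pennycook et al.):
// given per-platform efficiencies e_i in (0, 1] -- each a fraction of
// peak or of the best hand-optimized performance -- the score is
// |H| / sum(1/e_i), and it is defined as 0 if the application fails to
// run on any platform in the chosen set H.
double performance_portability(const std::vector<double>& efficiencies) {
    if (efficiencies.empty()) return 0.0;
    double reciprocal_sum = 0.0;
    for (double e : efficiencies) {
        if (e <= 0.0) return 0.0;  // unsupported platform => score of 0
        reciprocal_sum += 1.0 / e;
    }
    return static_cast<double>(efficiencies.size()) / reciprocal_sum;
}

int main() {
    // Hypothetical efficiencies measured on three different architectures.
    std::vector<double> e = {0.80, 0.65, 0.72};
    std::cout << "Performance portability: "
              << performance_portability(e) << "\n";  // roughly 0.72
}
```

The harmonic mean deliberately penalizes outliers: a single platform with very poor efficiency drags the whole score down, which is exactly the consistency-across-systems property called out in the bullets above.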

 

 

This clearly is a balancing act:

  • We would like to achieve an abstraction layer allowing a single codebase that runs on all the different architectures.
  • But we also want to be able to accommodate specialization: optimization for the unique underlying hardware features of a given architecture.

In his talk, Dr. Deakin explores how far we can reasonably push toward a single code base that runs everywhere.
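
As a small illustration of that balance (a sketch under my own assumptions, not an example from the presentation), a single SYCL source file can run unchanged on CPUs, GPUs, and other accelerators while still querying device properties at run time to specialize its tuning parameters per target:

```cpp
#include <sycl/sycl.hpp>
#include <algorithm>
#include <iostream>

// One codebase, with room for specialization: the same SYCL source
// compiles for any supported device, while run-time device queries
// leave room to tune launch parameters per target.
int main() {
    sycl::queue q;                        // default device: CPU, GPU, or accelerator
    sycl::device dev = q.get_device();

    // Query hardware characteristics through the portable device API...
    size_t max_wg = dev.get_info<sycl::info::device::max_work_group_size>();
    size_t cu     = dev.get_info<sycl::info::device::max_compute_units>();

    // ...and derive a device-specific launch parameter from them.
    // The cap of 256 is an illustrative choice, not a recommendation.
    size_t wg = std::min<size_t>(max_wg, 256);

    std::cout << "Device: " << dev.get_info<sycl::info::device::name>() << "\n"
              << "Compute units: " << cu
              << ", chosen work-group size: " << wg << "\n";
    // A real kernel would now be launched with an nd_range built from wg.
}
```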

Every parallel programming model is slightly different, and each approaches the abstraction of parallel kernels and parallel loops, and of offloading code across multiple device options, in its own way.
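
To see how differently the models express the same idea, the sketch below writes one vector addition twice: once as a SYCL kernel submitted to a queue, and once as an OpenMP target-offload loop. It is an illustration rather than code from the presentation; the SYCL pointers are assumed to be USM allocations (for example, from sycl::malloc_shared), and the OpenMP mapping clauses are one reasonable choice among several.

```cpp
#include <sycl/sycl.hpp>
#include <cstddef>

// The same element-wise vector addition, expressed in two of the models
// the talk discusses. Kernel details and the USM pointer style are
// illustrative choices, not code taken from the presentation.

// SYCL: the parallel loop is an explicit kernel submitted to a queue.
// a, b, and c are assumed to point to USM allocations
// (e.g., created with sycl::malloc_shared).
void add_sycl(const float* a, const float* b, float* c,
              std::size_t n, sycl::queue& q) {
    q.parallel_for(sycl::range<1>{n}, [=](sycl::id<1> i) {
        c[i] = a[i] + b[i];
    }).wait();
}

// OpenMP target offload: the same loop, annotated with directives that
// move the data and distribute the iterations across the device.
void add_openmp(const float* a, const float* b, float* c, std::size_t n) {
    #pragma omp target teams distribute parallel for \
        map(to: a[0:n], b[0:n]) map(from: c[0:n])
    for (std::size_t i = 0; i < n; ++i) {
        c[i] = a[i] + b[i];
    }
}
```

With the Intel oneAPI DPC++/C++ Compiler, the SYCL version builds with icpx -fsycl, while the OpenMP version needs the compiler’s OpenMP offload options, which depend on the target device.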

The YouTube* video featuring the full presentation explores a variety of them: OpenMP, OpenCL*, SYCL, Kokkos, and CUDA*, and examines how they match up to the promise of performance portability and to the future of an open, standardized, architecture-agnostic yet adaptable approach to accelerated software development.

Check out the full video here:


 University of Bristol: The Role of Open Standard Programming Models for HPC

 

The presentation sketches out a hopeful vision of the future for C/C++ and Fortran in high-performance exascale computing. The LLVM* Compiler Infrastructure Project embraces many of the tenets of open-source, openly governed software development, and it is evolving to meet the advent of heterogeneous parallelism.

“Code once, run anywhere” is achievable, and we are on the journey toward it.

Join the Open Accelerated Computing Revolution

Accelerated heterogeneous computing is finding its way into every facet of technology development. By embracing openness, you can tap into an active ecosystem of software developers and tackle the big question of multi-architecture scalability.

Get Started with oneAPI

If you find the possibilities for accelerated compute discussed here intriguing, check out the latest LLVM-based Intel® oneAPI DPC++/C++ Compiler and Intel® Fortran Compiler, available stand-alone, as part of the Intel® oneAPI Base Toolkit or Intel® oneAPI HPC Toolkit, or through the toolkit selector’s essentials packages.

 

About the Author
Rob enables developers to streamline programming efforts across multiarchitecture compute devices for high-performance applications, taking advantage of Intel's family of development tools. He has more than 20 years of experience in technical consulting, software architecture, and platform engineering, working in IoT, edge, and embedded software as well as hardware developer enablement.