
Ready for Exascale - Bringing Science Into the Future

Adam_Wolf
Employee

The latest episode of the "Code Together" podcast, Ready for Exascale - Bringing Science Into the Future, features an insightful discussion between host Tony Mongkolsmai and Scott Parker, lead for Performance Tools and Programming Models at the Argonne Leadership Computing Facility (ALCF). The conversation centers on the progress made and the challenges overcome in porting scientific applications to the Aurora supercomputer, a cutting-edge exascale system at Argonne National Laboratory.

Overview of Aurora's Development

Aurora is notable as the first GPU-based system deployed by Argonne and the first of its kind to utilize the Intel® Data Center GPU Max Series at such a large scale. The transition from CPU- to GPU-based computing presents a unique set of challenges for the Argonne team, which is tasked with preparing 40 scientific applications to run efficiently on a new architecture and platform.

Porting Challenges: CPU to GPU and NVIDIA to Intel

Parker explains that the applications being ported are complex scientific codes, many of which have been in development for decades. These applications are technically sophisticated, involving intricate mathematics and modeling, and transitioning them to a new platform is a significant undertaking.

  1. CPU to GPU Transition:
    • Many applications were originally designed for CPU-only systems and had to be adapted to run on GPUs. This required a rethinking of programming models and a deep understanding of GPU architectures.
    • CPU-based applications typically rely on programming models such as MPI (Message Passing Interface) and OpenMP (Open Multi-Processing). Aurora supports both, and recent versions of OpenMP allow code to be offloaded to GPUs. Both models let developers write parallel code: MPI distributes work across multiple nodes, while OpenMP provides parallelism within a node and the ability to run on the GPU (see the OpenMP offload sketch after this list). Intel's MPI and OpenMP implementations on Aurora, and Intel's ongoing engagement in identifying and fixing issues on the system, were critical to the success of this collaboration.
  2. NVIDIA to Intel Transition:
    • Applications previously optimized for NVIDIA's CUDA environment face a challenge because CUDA is proprietary to NVIDIA.
    • Intel offers SYCL, an open programming model supported by the Intel® oneAPI compilers, as an alternative and a path for porting CUDA-based applications; tools like Intel's SYCLomatic automate parts of that transition. Because CUDA for the most part runs only on NVIDIA hardware, running on Intel GPUs requires rewriting the code in something else, and SYCL was a common choice for the Aurora developers (see the SYCL sketch after this list).
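
To make the model discussion concrete, here is a minimal, hypothetical sketch of OpenMP GPU offload in C++. It is not code from an Aurora application, and the compiler invocation in the comment is an assumption based on the Intel oneAPI toolchain (check the current documentation). In a real application, MPI would typically wrap around a loop like this to distribute work across nodes.

```cpp
// Hypothetical sketch: a simple vector update offloaded to a GPU with OpenMP.
// With the Intel oneAPI compiler this is typically built with something like
//   icpx -fiopenmp -fopenmp-targets=spir64 saxpy_omp.cpp   (flags assumed)
#include <vector>
#include <cstdio>

int main() {
    const int n = 1 << 20;
    std::vector<float> x(n, 1.0f), y(n, 2.0f);
    const float a = 3.0f;
    float* xp = x.data();
    float* yp = y.data();

    // Map the arrays to the device, run the loop there, and copy y back.
    #pragma omp target teams distribute parallel for \
        map(to: xp[0:n]) map(tofrom: yp[0:n])
    for (int i = 0; i < n; ++i) {
        yp[i] = a * xp[i] + yp[i];
    }

    std::printf("y[0] = %f\n", y[0]);  // expect 5.0
    return 0;
}
```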
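
The same loop can be expressed as a minimal SYCL sketch, again a hypothetical example rather than an Aurora code, illustrating the style a CUDA kernel typically maps to after conversion with a tool like SYCLomatic.

```cpp
// Hypothetical sketch: the same vector update written with a SYCL queue and
// unified shared memory. Built with the Intel oneAPI compiler, e.g.
//   icpx -fsycl saxpy_sycl.cpp
#include <sycl/sycl.hpp>
#include <cstdio>

int main() {
    constexpr int n = 1 << 20;
    sycl::queue q;  // default device selection; a GPU if one is available

    float* x = sycl::malloc_shared<float>(n, q);
    float* y = sycl::malloc_shared<float>(n, q);
    for (int i = 0; i < n; ++i) { x[i] = 1.0f; y[i] = 2.0f; }
    const float a = 3.0f;

    // Submit the kernel; each work-item handles one element.
    q.parallel_for(sycl::range<1>(n), [=](sycl::id<1> i) {
        y[i] = a * x[i] + y[i];
    }).wait();

    std::printf("y[0] = %f\n", y[0]);  // expect 5.0
    sycl::free(x, q);
    sycl::free(y, q);
    return 0;
}
```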

Developers face the complex decision of selecting the right programming model for their applications. Parker notes that many teams experiment with different models to find the best fit. Options include:

  • OpenMP: A natural choice for teams already using this model, especially those working with Fortran.
  • SYCL: Favored by teams transitioning from CUDA due to its conceptual similarities and available conversion tools.
  • Kokkos and RAJA: Developed by the Department of Energy (DOE) for applications requiring portability across various platforms, including CPUs and different GPU architectures.

The choice of programming model often depends on the developers' familiarity and the specific needs of their applications. C++ codes have the flexibility to choose among several models, while Fortran codes typically continue using OpenMP unless they are rewritten in C++.
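
For comparison, below is a minimal, hypothetical Kokkos version of the same kind of loop (not an Aurora code). The point of the model is that the loop is written once and the backend, such as OpenMP, CUDA, HIP, or SYCL, is selected when Kokkos itself is built.

```cpp
// Hypothetical sketch: the same vector update expressed once against Kokkos.
// Requires a Kokkos installation configured for the desired backend.
#include <Kokkos_Core.hpp>
#include <cstdio>

int main(int argc, char* argv[]) {
    Kokkos::initialize(argc, argv);
    {
        const int n = 1 << 20;
        Kokkos::View<float*> x("x", n), y("y", n);
        Kokkos::deep_copy(x, 1.0f);
        Kokkos::deep_copy(y, 2.0f);
        const float a = 3.0f;

        // parallel_for dispatches to the default execution space.
        Kokkos::parallel_for("saxpy", n, KOKKOS_LAMBDA(const int i) {
            y(i) = a * x(i) + y(i);
        });
        Kokkos::fence();

        // Copy the result back to host memory for a quick check.
        auto y_host = Kokkos::create_mirror_view(y);
        Kokkos::deep_copy(y_host, y);
        std::printf("y[0] = %f\n", y_host(0));  // expect 5.0
    }
    Kokkos::finalize();
    return 0;
}
```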

Progress and Achievements

Over the years, significant progress has been made. Initially, many applications faced issues running efficiently on Aurora, but through collaborative efforts, including hackathons and workshops, substantial improvements were achieved. Currently, a majority of the applications are running well on the GPUs, and many have achieved excellent performance metrics.

The team monitors application readiness and performance using a color-coded system: red indicates non-functional applications, while shades of green represent different levels of performance. This systematic tracking has shown a positive trend, with most applications now running efficiently.

Future Directions and Scaling Challenges

Looking ahead, the goal is to enable more scientific breakthroughs by leveraging Aurora’s full potential. The team at Argonne is focused on moving applications from development into production, enabling researchers to conduct groundbreaking science on this powerful platform.

Scaling applications to utilize Aurora’s 10,000+ nodes remains a critical focus. Many applications have already run at the 1,000-2,000 node scale and several at the 4,000 node scale, but achieving full scalability is an ongoing process. Challenges include balancing computational load, managing communication overhead, and optimizing performance for fixed-size problems.
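
As a rough illustration of the fixed-size (strong) scaling question, the short sketch below works through the arithmetic with made-up timings; none of the numbers are measured Aurora results.

```cpp
// Hypothetical strong-scaling arithmetic: for a fixed-size problem, speedup is
// the baseline runtime divided by the runtime on more nodes, and parallel
// efficiency is that speedup divided by the increase in node count.
#include <cstdio>

int main() {
    const double t_base   = 100.0;  // runtime on the baseline partition (s), assumed
    const int    n_base   = 1000;   // baseline node count, assumed
    const double t_scaled = 30.0;   // runtime on the larger partition (s), assumed
    const int    n_scaled = 4000;   // larger node count, assumed

    const double speedup    = t_base / t_scaled;                        // ~3.33x
    const double efficiency = speedup / (double(n_scaled) / n_base);    // ~0.83

    std::printf("speedup: %.2fx, parallel efficiency: %.0f%%\n",
                speedup, efficiency * 100.0);
    return 0;
}
```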

Innovations and the Road Ahead

In the coming months, the team aims to expand the range of applications running on Aurora, looking beyond the initial 40 to potentially hundreds more, covering diverse scientific domains. This expansion is facilitated by improved toolchains and greater stability in programming frameworks.

A noteworthy development is the experimental adaptation of the HIP programming model, originally developed for AMD GPUs, to run on Intel GPUs. This offers another layer of portability for applications written in CUDA or HIP, providing more options for developers aiming to leverage Aurora’s capabilities.
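
For reference, here is a minimal, hypothetical HIP kernel of the kind such a portability path would need to support. The episode does not cover build or runtime details for targeting Intel GPUs with HIP, so treat this purely as an illustration of the programming model rather than a documented workflow.

```cpp
// Hypothetical sketch: the same vector update as a HIP kernel, written as it
// might appear in a code originally targeting AMD GPUs.
#include <hip/hip_runtime.h>
#include <cstdio>

__global__ void saxpy(int n, float a, const float* x, float* y) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) y[i] = a * x[i] + y[i];
}

int main() {
    const int n = 1 << 20;
    float *x = nullptr, *y = nullptr;
    hipMallocManaged(reinterpret_cast<void**>(&x), n * sizeof(float));
    hipMallocManaged(reinterpret_cast<void**>(&y), n * sizeof(float));
    for (int i = 0; i < n; ++i) { x[i] = 1.0f; y[i] = 2.0f; }

    const int threads = 256;
    const int blocks  = (n + threads - 1) / threads;
    hipLaunchKernelGGL(saxpy, dim3(blocks), dim3(threads), 0, 0, n, 3.0f, x, y);
    hipDeviceSynchronize();

    std::printf("y[0] = %f\n", y[0]);  // expect 5.0
    hipFree(x);
    hipFree(y);
    return 0;
}
```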

Conclusion

The transition to Aurora marks a significant milestone in computational science, providing researchers with unprecedented computing power to tackle complex scientific challenges. The collaborative efforts at Argonne and the continued innovation in programming models and toolchains promise to unlock new scientific discoveries, making Aurora a cornerstone of modern computational research. As the development phase matures, the focus shifts to scientific discovery, fulfilling the promise of exascale computing to push the boundaries of knowledge across multiple scientific fields.

We encourage you to check out Intel’s other AI Tools and framework optimizations and learn about the unified, open, standards-based oneAPI programming model that forms the foundation of Intel’s AI Software Portfolio.

 

The Guests:

Scott Parker

Lead for Performance Tools and Programming Models at the ALCF 

 


About the Author
AI Software Marketing Engineer creating insightful content about the cutting-edge AI and ML technologies and software tools coming out of Intel