
oneAPI DevSummit for AI & HPC 2023 Summary: Development & Optimization of a SYCL Backend for libCEED


Susan Kahler, AI/ML Products and Solutions Marketing Manager, Intel | LinkedIn

This blog summarizes the Tech Talk: Development and Optimization of a SYCL Backend for libCEED, delivered at the oneAPI DevSummit for AI and HPC in December 2023. In this tech talk, we learn about the development of a SYCL backend for the libCEED library on the Aurora supercomputer, which is currently being deployed at the Argonne Leadership Computing Facility. LibCEED is an open-source library, written in C, developed as part of the Exascale Computing Project (ECP). The library provides interfaces for Fortran, Python, Julia, and Rust and can run on CPU (serial, AVX, LIBXSMM) and GPU (CUDA, HIP) backends.

In this tech talk, the presenter discusses:

  • Software design of libCEED (runtime compilation)
  • SYCL online compiler
  • Optimization of hotspot kernels

LibCEED is a portable library that provides an API through which applications can share efficient kernels for element-based discretizations, and it can be interfaced with scientific applications such as MFEM. The libCEED host application generates device code at runtime based on user input, and that code is then compiled to run on GPU devices.
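To make this concrete, below is a minimal sketch of how a host application selects a libCEED backend at runtime through a resource string. The "/gpu/sycl/ref" resource and the small vector example are illustrative assumptions, and the integer error codes returned by libCEED calls are ignored for brevity; this is not code from the talk.

#include <ceed.h>
#include <cstdio>

int main() {
  Ceed ceed;
  // The backend (CPU, CUDA, HIP, or SYCL) is chosen at runtime from a resource
  // string; libCEED then generates and compiles device code for that backend.
  // "/gpu/sycl/ref" is an assumed resource name used here for illustration.
  CeedInit("/gpu/sycl/ref", &ceed);

  // Create a small libCEED vector to exercise the selected backend.
  CeedVector x;
  CeedVectorCreate(ceed, 16, &x);
  CeedVectorSetValue(x, 1.0);
  std::printf("libCEED backend initialized\n");

  CeedVectorDestroy(&x);
  CeedDestroy(&ceed);
  return 0;
}

Because applications such as MFEM go through this same API, switching among the CPU, CUDA, HIP, and SYCL backends requires only a different resource string.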

Intel's SYCL online compiler extension allows for NVRTC-like runtime compilation, but it is currently restricted to OpenCL C/Level Zero kernels. The Argonne team needed to implement workarounds because OpenCL C supports neither templates (a C++ feature) nor function pointers. The presenter walks through an example of how they used the Intel online compiler SYCL extension, and the team is currently working with Intel to extend support to SYCL C++ source code.
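As a rough illustration of that workflow, the sketch below specializes an OpenCL C kernel string on the host (the textual workaround for OpenCL C's missing templates) and compiles it at runtime with the online compiler extension. The header path, namespace, and compile() call follow the sycl_ext_intel_online_compiler extension and may differ between oneAPI releases; treat them, and the axpy kernel itself, as assumptions rather than the team's actual code.

#include <sycl/sycl.hpp>
#include <sycl/ext/intel/experimental/online_compiler.hpp>  // extension header; path may vary by release
#include <exception>
#include <iostream>
#include <string>

namespace syclex = sycl::ext::intel::experimental;

// OpenCL C has no templates, so the host specializes the kernel source by
// textual substitution before compiling it at runtime.
std::string make_axpy_source(const std::string &scalar_type) {
  return "__kernel void axpy(" + scalar_type + " a,\n"
         "                   __global const " + scalar_type + " *x,\n"
         "                   __global " + scalar_type + " *y) {\n"
         "  size_t i = get_global_id(0);\n"
         "  y[i] += a * x[i];\n"
         "}\n";
}

int main() {
  const std::string src = make_axpy_source("double");  // specialize for double precision
  try {
    // Compile the OpenCL C source to a device binary (SPIR-V) at runtime,
    // similar to what NVRTC provides for CUDA.
    syclex::online_compiler<syclex::source_language::opencl_c> compiler;
    auto spirv = compiler.compile(src);
    std::cout << "Compiled " << spirv.size() << " bytes of SPIR-V\n";
    // The binary would then be wrapped in a sycl::kernel_bundle via backend
    // interop (Level Zero or OpenCL) and launched from a sycl::queue.
  } catch (const std::exception &e) {
    std::cerr << "Online compilation failed: " << e.what() << "\n";
  }
  return 0;
}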

Using the SYCL fluids example to benchmark performance, they examined the key hotspot kernels and applied optimization strategies to improve performance. The three strategies used were specialization constants; appropriate work-group sizes and barriers; and the large register file option in combination with a SIMD width of 16. Together, these strategies resulted in significant performance improvements.
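Two of these strategies can be sketched in plain SYCL 2020: a specialization constant lets a value known only at runtime (here, a hypothetical number of field components) be folded in when the kernel is JIT-compiled for the device, and a required sub-group size pins the SIMD width to 16. The toy kernel, the work-group size of 128, and the component count of 3 are illustrative choices, not values from the talk.

#include <sycl/sycl.hpp>

// Specialization constant: set on the host at enqueue time, folded into the
// kernel as a constant when it is JIT-compiled for the device.
constexpr sycl::specialization_id<int> num_components;

int main() {
  sycl::queue q;
  constexpr size_t n = 1024;
  double *x = sycl::malloc_shared<double>(n * 3, q);
  for (size_t i = 0; i < n * 3; ++i) x[i] = 1.0;

  q.submit([&](sycl::handler &h) {
    h.set_specialization_constant<num_components>(3);  // value known only at runtime
    h.parallel_for(
        sycl::nd_range<1>{sycl::range<1>{n}, sycl::range<1>{128}},  // explicit work-group size
        [=](sycl::nd_item<1> it, sycl::kernel_handler kh)
            [[sycl::reqd_sub_group_size(16)]] {                     // pin SIMD width to 16
          const int nc = kh.get_specialization_constant<num_components>();
          const size_t i = it.get_global_id(0);
          double sum = 0.0;
          for (int c = 0; c < nc; ++c)  // loop bound becomes a JIT-time constant
            sum += x[c * n + i];
          x[i] = sum;
        });
  });
  q.wait();
  sycl::free(x, q);
  return 0;
}

The large register file option is typically enabled through compiler or driver settings rather than in the kernel source, so it is not shown here.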

Lastly, the presenter discussed the need for systematic profiling of the runtime-compiled code in libCEED to gather further insight into performance bottlenecks using the Intel® VTune™ Profiler, a tool for optimizing application performance, system performance, and system configuration across HPC, cloud, IoT, media, and storage workloads.

Note: This work was done on a pre-production supercomputer with early versions of the Aurora SDK and does not represent final results.


Watch the full video recording here and download the slides to learn more about the project.

About the Speakers

Umesh Unnikrishnan - Argonne National Laboratory/Postdoctoral Appointee

Dr. Umesh Unnikrishnan is a postdoctoral appointee at the Argonne National Laboratory. He has a background in computational fluid dynamics and high-performance computing. His work at the Argonne Leadership Computing Facility focuses on the development of finite-element-based software libraries for running multi-physics scientific codes on GPU computing platforms. Umesh holds a PhD in Aerospace Engineering from the Georgia Institute of Technology.

Kris Rowe - Argonne National Laboratory/Assistant Computational Scientist

Kris is an Assistant Computational Scientist at Argonne National Laboratory’s Leadership Computing Facility. An applied mathematician by training, Kris holds a PhD from the University of Waterloo—with research focusing on geophysical fluid dynamics, high-order methods for incompressible flows, and adaptive mesh refinement. During his postdoc at Cornell University, he went undercover in the Civil and Environmental Engineering Department to study internal waves radiated by the wakes of submersible vehicles and design fast algorithms for high-order methods. Currently, as part of the Performance Engineering Group at ALCF, he is working to prepare computational science and engineering applications for the Aurora Exascale supercomputer.

Varsha Madananth - Intel Corporation/Applications Engineer

Varsha Madananth is an Applications Engineer at Intel. She has been working on enabling workloads on PVC (Ponte Vecchio), with a focus on SYCL. She also has experience in performance optimization using parallel programming techniques such as vectorization, threading, and heterogeneous compute, as well as compiler optimizations and micro-architecture tuning.


Useful resources

 

About the Author
Susan is a Product Marketing Manager for AI/ML at Intel. She holds a Ph.D. in Human Factors and Ergonomics, having used analytics to quantify and compare mental models of how humans learn complex operations. Throughout her well-rounded career, she has held roles in user-centered design, product management, customer insights, consulting, and operational risk.