Technion* oneAPI CoE: Performance Acceleration, Portability, and Scalability of OpenMP* GPU Offload

Rob_Mueller-Albrecht · ‎05-31-2023

Technion’s Experience with Intel® oneAPI DPC++/C++ Compiler and Intel® Data Center GPU Max Series

At ISC* High Performance 2023 in Hamburg, Germany, last week, the Intel Extreme Performance Users Group (IXPUG) offered a deep-dive workshop. Technion – Israel Institute of Technology took the opportunity to present its findings on the “Portability and Scalability of OpenMP Offloading on State-of-the-art Accelerators”. In their talk, the presenters previewed and expanded on the paper by Yehonatan Fridman, Guy Tamir, and Gal Oren, which is available from the Cornell University* Computer Science e-print archive (arXiv).

The results they present are very insightful. I strongly encourage you to take the time to read the publication in full:

Portability and Scalability of OpenMP Offloading on State-of-the-art Accelerators

Y Fridman, G Tamir, G Oren - arXiv preprint arXiv:2304.04276, 2023

They stress the importance of code portability across different platforms and GPU architectures.

In addition, they point out the role that support for the latest OpenMP standards, together with open developer frameworks such as LLVM* and oneAPI, plays in driving the vision of cross-architecture heterogeneous compute forward.

The findings are topped off with a detailed performance analysis of the LULESH benchmark running on the Intel® Data Center GPU Max Series 1100 and the NVIDIA* A100 Tensor Core GPU.

In the Taylor-von Neumann-Sedov blast problem, a volume of uniform density and temperature is initialized. Then a large quantity of thermal energy is injected at the center. This over-pressurized region rapidly develops into a shock wave, which expands in a well-defined self-similar fashion.
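The self-similar expansion mentioned above is commonly summarized by the Sedov-Taylor scaling law, in which the shock radius grows with the two-fifths power of time. The following is a minimal sketch of that relation, not code from LULESH or from the paper. The function name `shock_radius` and the default value of the dimensionless constant `xi` are assumptions made for illustration; the true constant is of order one and depends on the gas properties.

```cpp
#include <cassert>
#include <cmath>

// Sedov-Taylor similarity scaling for a spherical blast wave:
//   R(t) = xi * (E * t^2 / rho0)^(1/5)
// energy: thermal energy injected at the center
// rho0:   uniform ambient density
// t:      time elapsed since the energy was injected
// xi:     dimensionless constant of order one; the default of 1.0 is a
//         placeholder assumption, not a value taken from the paper.
double shock_radius(double energy, double rho0, double t, double xi = 1.0) {
    assert(energy > 0.0 && rho0 > 0.0 && t >= 0.0);
    return xi * std::pow(energy * t * t / rho0, 0.2);
}
```

Under this scaling, letting 32 times as much time elapse increases the shock radius by a factor of 4, and injecting 32 times as much energy increases it by a factor of 2.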

The Livermore* Unstructured Lagrangian Explicit Shock Hydrodynamics (LULESH) algorithm, implemented at Lawrence Livermore National Laboratory*, numerically solves a simplified version of this problem, one for which an analytic answer is known. The algorithm is highly parallelizable and serves as a great proxy for real-world scientific computational problems. For more details, you can have a look at the source code on GitHub*.

Technion Israel Institute of Technology advancing the vision of oneAPI

Technion is one of over 30 Intel® oneAPI Centers of Excellence that contribute to open accelerated computing and propel the next generation of innovation with open standards, collaboration, and support as part of the oneAPI ecosystem.

Leading the center are Dr. Gal Oren and Prof. Hagit Attiya from the Henry and Marilyn Taub Faculty of Computer Science at the Technion – Israel Institute of Technology, in collaboration with Prof. Danny Hendler from the Computer Science Department of Ben-Gurion University.

In addition to actively leveraging oneAPI in their research, they developed a comprehensive online course, “Shared-Memory Parallelism: CPUs, GPUs and In-Between.” The course provides an introduction to the parallel programming frameworks used for modern multi-core and many-core processor architectures.

These programming models scale from parallel compute offload to clusters of modern multi-core and many-core configurations.

Some of the topics covered are:

OpenMP
Creating Threads
Synchronization
Parallel Loops
Data Environment
Memory Model
Irregular Parallelism and Tasks
Memory Access / non-uniform memory access
Thread Affinity
SIMD–Vectorization
Heterogeneous Architectures
Intel oneAPI Base and HPC Toolkits
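To give a flavor of how several of these topics fit together, here is a minimal host-side sketch that combines a parallel loop, a reduction (one way to avoid explicit synchronization), and SIMD vectorization in a single OpenMP directive. It is an illustration only and is not taken from the course material; the function name `dot` is hypothetical.

```cpp
#include <cassert>
#include <vector>

// Dot product using a combined parallel-loop and SIMD construct.
// The reduction clause gives every thread (and SIMD lane) a private partial
// sum and combines them at the end, so no explicit locking is needed.
// If the compiler is invoked without OpenMP enabled, the pragma is ignored
// and the loop simply runs sequentially.
double dot(const std::vector<double>& a, const std::vector<double>& b) {
    assert(a.size() == b.size());
    const double* pa = a.data();
    const double* pb = b.data();
    const long n = static_cast<long>(a.size());
    double sum = 0.0;

    #pragma omp parallel for simd reduction(+:sum)
    for (long i = 0; i < n; ++i) {
        sum += pa[i] * pb[i];
    }
    return sum;
}
```

Because floating-point addition is not associative, the parallel result can differ from a sequential one in the last few bits for general inputs.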

The detailed course outline can be found here.

Please check out Technion’s press release to find out how they endeavor to enable a new generation of developers to push the performance of heterogeneous applications to the limit by taking advantage of oneAPI.

“Technion’s oneAPI Center of Excellence, the first in Israel, is an exciting step forward preparing students for a multi-architecture computing world by teaching them SYCL and oneAPI,” says Scott Apeland, Senior Director of Intel Developer Ecosystem Programs. “This oneAPI Center brings open, standards-based programming skills to students to innovate, drive research, and advance science and industry.”

Tackling Heterogeneous Compute

Performance in scientific computing, complex physics modeling, finance, and machine learning / artificial intelligence has been mostly driven by advances in the adoption of heterogeneous compute.

Different implementations exist from a variety of silicon vendors such as AMD* and NVIDIA*. Ensuring interoperability and scalability across CPUs, GPUs, and accelerators, however, requires common open frameworks.

Programmers may not want to deal with several different instruction set architectures (ISAs) in a single application when they offload its compute-intensive parts to the GPU or other devices. What is needed is a programming model that makes the hardware layer transparent and provides a high level of usability.
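As a rough illustration of what such hardware transparency can look like with OpenMP, the sketch below uses one loop for both cases: it is offloaded when the runtime reports at least one device and runs on the host otherwise. The helper names `available_devices` and `scale` are hypothetical, and the code is a sketch under these assumptions rather than anything taken from the Technion paper.

```cpp
#include <cassert>
#ifdef _OPENMP
#include <omp.h>
#endif

// Number of offload devices visible to the OpenMP runtime. Returns 0 when no
// device is present or when the code was compiled without OpenMP support.
int available_devices() {
#ifdef _OPENMP
    return omp_get_num_devices();
#else
    return 0;
#endif
}

// Scale an array in place. The same source line expresses the computation
// for the GPU and for the host; no device-specific ISA appears anywhere.
void scale(double* x, int n, double factor) {
    assert(x != nullptr && n >= 0);
    const bool use_device = available_devices() > 0;

    // if(target: ...) applies only to the target part of the combined
    // construct: when false, the region executes on the host instead.
    #pragma omp target teams distribute parallel for if(target: use_device) map(tofrom: x[0:n])
    for (int i = 0; i < n; ++i) {
        x[i] *= factor;
    }
}
```

Many OpenMP runtimes fall back to the host by default when no device is found, so the explicit `if` clause mainly serves to make that decision visible in the code.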

OpenMP is the most widely used standard for multi-threading in scientific computing applications. For those developing in C++, SYCL* provides an organic extension to the language standard. Intel fully embraces both those standards and open frameworks.

This embrace is most evident in Intel’s active promotion and spearheading of oneAPI. oneAPI is an open, cross-industry, standards-based, unified, multiarchitecture, multi-vendor programming model that delivers a common developer experience across accelerator architectures – for faster application performance, more productivity, and greater innovation. The oneAPI initiative encourages collaboration on the oneAPI specification and compatible oneAPI implementations across the ecosystem.

In their research, Technion takes advantage of Intel’s implementation of oneAPI and its constituent tools and optimizations across HPC, AI, and other domains. For the analysis referenced in their paper, they used the Intel® oneAPI DPC++/C++ Compiler, which builds on the open-source LLVM backend and supports OpenMP 5.1.

OpenMP for GPUs

OpenMP support for GPUs has steadily increased over the years. Offloading capability was added as early as 2013 with the OpenMP API 4.0, and the set of OpenMP target constructs has expanded continuously ever since. The Intel oneAPI DPC++/C++ Compiler fully embraces OpenMP and provides advanced OpenMP library support.

The latest OpenMP 5.x versions have introduced many new target offload and host-based features to the programming model with advanced scientific and exascale many-core computing in mind.
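For readers new to the target constructs, here is a small sketch of the general pattern: a `target data` region keeps the arrays resident on the device while two offloaded kernels run back to back. The function name `saxpy_then_sum` is hypothetical and the example is not from the LULESH sources. With the Intel compiler, a build line along the lines of `icpx -fiopenmp -fopenmp-targets=spir64 example.cpp` is the usual starting point; treat the exact flags as an assumption to verify against your compiler version.

```cpp
#include <cassert>
#include <vector>

// Computes y = a * x + y on the device, then returns the sum of y.
// Both kernels execute inside one target data region, so x and y are copied
// to the device once and y is copied back once at the end of the region.
double saxpy_then_sum(double a, const std::vector<double>& xv,
                      std::vector<double>& yv) {
    assert(xv.size() == yv.size());
    const double* x = xv.data();
    double* y = yv.data();
    const int n = static_cast<int>(xv.size());
    double sum = 0.0;

    #pragma omp target data map(to: x[0:n]) map(tofrom: y[0:n])
    {
        // Kernel 1: element-wise update, spread across teams and threads.
        #pragma omp target teams distribute parallel for
        for (int i = 0; i < n; ++i) {
            y[i] = a * x[i] + y[i];
        }

        // Kernel 2: reduction; the scalar result is mapped back to the host.
        #pragma omp target teams distribute parallel for reduction(+:sum) map(tofrom: sum)
        for (int i = 0; i < n; ++i) {
            sum += y[i];
        }
    }
    return sum;
}
```

Grouping kernels under one data region matters on discrete GPUs, where repeated host-to-device transfers can easily dominate the runtime of small kernels.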

oneAPI: The Future

Driving open standards and open parallel software development frameworks via the oneAPI industry initiative, Intel and its partners in the oneAPI Academic Centers of Excellence are ready to usher in the future of heterogeneous compute. We invite you to join us on the journey towards the next generation of scalable multi-platform, multi-architecture software development for science, enterprise and beyond.

Additional Resources

Get the Software

Test it for yourself today by downloading and installing the Intel® oneAPI Base Toolkit, just the Intel® oneAPI DPC++/C++ Compiler and Intel® Fortran Compiler, or any of the other oneAPI-powered AI or HPC tools from Intel. In addition to these download locations, many of the tools are also available through partner repositories.