oneAPI and SYCL* Accelerate Matrix Solver with a Common Codebase for Multivendor GPUs

Nikita_Shiledarbaxi · 08-12-2024

Speed up the Study of Phenomena in Disordered Systems

Authors:

Nikita Shiledarbaxi, Software Product Marketing Engineer, Intel

Rob Mueller-Albrecht, Software Tools Marketing Manager, Intel

Shiquan Su, Software Technical Consulting Engineer, Intel

Fengrui Zhang, Software Technical Consulting Engineer, Intel

At SC23, Fengrui Zhang (Intel), Shiquan Su (Intel), Xiao Zhu (University of Washington), Xiao Liang (Pittsburgh Supercomputing Center), and Yang Wang (Pittsburgh Supercomputing Center) presented a SYCL* implementation of the MuST (Multiple Scattering Theory) framework. The MuST framework provides a high-accuracy and efficient numerical approach to the study and simulation of quantum phenomena in random, locally self-consistent, but largely disordered systems. At the center of its compute-intensive calculations is a matrix solver, which, in its original form, is implemented in Fortran.

The most time-consuming part of the CPU-specific Fortran code, i.e., the matrix inversion step, was significantly accelerated using Intel® oneAPI tools and the SYCL programming framework. Moving to a single, optimized SYCL codebase allows you to harness the computational power not just of the CPU, but additionally of GPUs from diverse vendors like Intel, NVIDIA* and AMD*.

This blog will give you an overview of how oneAPI tools and SYCL helped accelerate matrix solvers on cross-vendor GPUs and achieve code portability and scalability for deployment on different parallel supercomputers. Before going into these details, let us briefly discuss the MuST project and the oneAPI tools used in the experiment:

1. Intel® oneAPI Math Kernel Library (oneMKL)

oneMKL is a high-performance library for accelerating and optimizing math routines on Intel® architectures. It is an extensive collection of math functions, including linear algebra, vector math, Fast Fourier Transforms (FFTs), and more. It enables offloading computations to GPUs for parallel executions using OpenMP* and SYCL frameworks.

The oneMKL Interfaces Project, an open-source implementation of the oneMKL specification, provides SYCL functionalities to perform numerical computations across a wide range of domains, each with relevant code samples available at the oneAPI GitHub repository.

2. Intel® Fortran Compiler and Intel oneAPI DPC++/C++ Compiler: An Overview

Intel® Fortran Compiler, named ifx, is a oneAPI-powered compiler based on LLVM* technology. It offers efficient compilation, compliance with the latest Fortran language standards, and high execution performance for Fortran code on Intel architectures, both CPU and GPU. Intel® oneAPI DPC++/C++ Compiler, the world’s first fully SYCL 2020 compliant compiler, is another LLVM-based, industry-standard, cross-architecture compiler for C, C++, and SYCL code.

3. Codeplay* oneAPI Plugins for NVIDIA* and AMD* GPUs

Codeplay* provides oneAPI Plugins for NVIDIA* and AMD* GPUs that bring all the benefits of heterogeneous computing to your hardware. The plugins add support for the Intel® oneAPI Base Toolkit to NVIDIA and AMD GPUs. Furthermore, these plugins have been fully open-sourced, allowing you to build them for your own implementation of oneAPI.

The plugins thus enable multi-architecture, cross-vendor programming by unlocking the potential of the SYCL programming framework on NVIDIA and AMD GPUs.

Fig. 1: Codeplay’s oneAPI Plugins Enable oneAPI Programming on NVIDIA and AMD GPUs

About The MuST Framework

MuST is an open-source computational framework designed for the study of quantum phenomena in disordered materials. It is a software suite for electronic structure calculations based on multiple scattering theory, an approach also known as the Korringa-Kohn-Rostoker (KKR) method or Green’s function method. The framework enables the first-principles study (i.e., the study of electronic structure) of random alloys and disorder effects in quantum materials.

The MuST project, funded by the U.S. National Science Foundation (NSF), is a collaborative effort of expert researchers from high-performance computing (HPC), condensed matter physics, applied mathematics, applied materials science, and software engineering communities.

Learn more about the MuST framework on the project website and GitHub.

Challenge: Slow Matrix Inversion

The original Fortran code of the MuST framework is CPU-specific and employs a block LU algorithm to perform matrix inversion. The matrix inversion step is so computationally expensive that it takes approximately 80% to 90% of the application's total execution time. The challenge is to speed up the matrix inversion and make it platform-independent, so that it runs faster across diverse architectures.
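To make the cost of this step concrete, LU-based inversion can be sketched in plain C++. This is a textbook, unblocked LU factorization with partial pivoting followed by one triangular solve pair per column of the inverse. It is not the block LU algorithm used in MuST and it is not oneMKL code. The names lu_factor and lu_inverse, and the column-major layout (chosen to match Fortran storage), are assumptions made for illustration.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <utility>
#include <vector>

using cplx = std::complex<double>;

// Matrices are n x n and column-major: element (i, j) lives at a[i + j * n].

// Factor A into P*A = L*U in place, using partial (row) pivoting.
// On return, the strict lower triangle holds L (unit diagonal implied),
// the upper triangle holds U, and piv[k] is the row swapped with row k.
// Returns false if an exactly zero pivot is met (singular matrix).
bool lu_factor(std::vector<cplx>& a, std::size_t n, std::vector<std::size_t>& piv) {
    assert(a.size() == n * n);
    piv.assign(n, 0);
    for (std::size_t k = 0; k < n; ++k) {
        // Choose the row with the largest magnitude in column k.
        std::size_t p = k;
        for (std::size_t i = k + 1; i < n; ++i)
            if (std::abs(a[i + k * n]) > std::abs(a[p + k * n])) p = i;
        piv[k] = p;
        if (std::abs(a[p + k * n]) == 0.0) return false;
        if (p != k)
            for (std::size_t j = 0; j < n; ++j) std::swap(a[k + j * n], a[p + j * n]);
        // Eliminate below the pivot and update the trailing submatrix.
        for (std::size_t i = k + 1; i < n; ++i) {
            a[i + k * n] /= a[k + k * n];
            for (std::size_t j = k + 1; j < n; ++j)
                a[i + j * n] -= a[i + k * n] * a[k + j * n];
        }
    }
    return true;
}

// Build the inverse from the factors by solving A x = e_j for every column j.
std::vector<cplx> lu_inverse(const std::vector<cplx>& lu,
                             const std::vector<std::size_t>& piv, std::size_t n) {
    assert(lu.size() == n * n && piv.size() == n);
    std::vector<cplx> inv(n * n, cplx(0.0, 0.0));
    for (std::size_t j = 0; j < n; ++j) {
        cplx* x = &inv[j * n];  // column j of the inverse, starts as e_j
        x[j] = 1.0;
        // Apply the recorded row swaps to the right-hand side.
        for (std::size_t k = 0; k < n; ++k)
            if (piv[k] != k) std::swap(x[k], x[piv[k]]);
        // Forward substitution with the unit lower triangle L.
        for (std::size_t i = 0; i < n; ++i)
            for (std::size_t k = 0; k < i; ++k) x[i] -= lu[i + k * n] * x[k];
        // Back substitution with the upper triangle U.
        for (std::size_t i = n; i-- > 0;) {
            for (std::size_t k = i + 1; k < n; ++k) x[i] -= lu[i + k * n] * x[k];
            x[i] /= lu[i + i * n];
        }
    }
    return inv;
}
```

Both the factorization and the column-by-column solves do work proportional to n cubed, which is consistent with inversion dominating the runtime once the matrices are large.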

Proposed Solution: Accelerate Matrix Inversion with SYCL and oneAPI

Fig. 3 shows the workflow of the matrix inversion stage in the MuST framework. The original, slow inversion method employs a block LU algorithm that runs only on CPUs. Accelerating it with NVIDIA* cuSOLVER library function calls makes it faster, but locks the resulting code to NVIDIA GPUs.

To free the matrix inversion code from vendor lock-in and accelerate it on multi-vendor GPUs, the solution proposed at SC23 makes interlanguage calls to the oneMKL SYCL API from the Fortran-coded matrix inversion. As shown in Fig. 2, these calls are implemented using the ISO_C_BINDING intrinsic module for Fortran/C interoperability and a zgetr_i() function that internally calls oneMKL SYCL functions.

Fig. 2: Matrix Inversion Interlanguage Calls Using ISO_C_BINDING
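As an illustration of this interlanguage pattern, here is a minimal sketch of the C++ side of such a call. The function name invert_matrix_c, its argument list, and the Fortran interface shown in the comment are hypothetical and are not taken from the MuST sources. In a SYCL build, the body would hand the matrix to oneMKL LAPACK routines (for example oneapi::mkl::lapack::getrf followed by getri). A dependency-free Gauss-Jordan inversion stands in for that here, so the sketch compiles with any C++ compiler.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical Fortran-side declaration, for context only:
//
//   interface
//     subroutine invert_matrix_c(a, n, info) bind(C, name="invert_matrix_c")
//       use iso_c_binding, only: c_double_complex, c_int
//       complex(c_double_complex), intent(inout) :: a(*)
//       integer(c_int), value :: n
//       integer(c_int), intent(out) :: info
//     end subroutine invert_matrix_c
//   end interface
//
// extern "C" disables C++ name mangling so the Fortran linker can find the symbol.
// std::complex<double> has the same memory layout as complex(c_double_complex).
// The matrix is column-major, as Fortran stores it, and is inverted in place.
// info follows a LAPACK-like convention: 0 on success, k > 0 if the k-th pivot is zero.
extern "C" void invert_matrix_c(std::complex<double>* a, int n, int* info) {
    assert(a != nullptr && info != nullptr && n >= 0);
    auto at = [a, n](int i, int j) -> std::complex<double>& {
        return a[static_cast<std::size_t>(j) * n + i];
    };
    std::vector<int> perm(n);
    *info = 0;

    // Stand-in for the accelerated library call: Gauss-Jordan with row pivoting.
    for (int k = 0; k < n; ++k) {
        int p = k;
        for (int i = k + 1; i < n; ++i)
            if (std::abs(at(i, k)) > std::abs(at(p, k))) p = i;
        if (std::abs(at(p, k)) == 0.0) { *info = k + 1; return; }
        perm[k] = p;
        if (p != k)
            for (int j = 0; j < n; ++j) std::swap(at(k, j), at(p, j));

        // Scale the pivot row; the pivot slot starts accumulating the inverse.
        const std::complex<double> pivot = at(k, k);
        at(k, k) = 1.0;
        for (int j = 0; j < n; ++j) at(k, j) /= pivot;

        // Eliminate column k from every other row.
        for (int i = 0; i < n; ++i) {
            if (i == k) continue;
            const std::complex<double> f = at(i, k);
            at(i, k) = 0.0;
            for (int j = 0; j < n; ++j) at(i, j) -= f * at(k, j);
        }
    }

    // Row swaps during elimination become column swaps of the inverse,
    // undone in reverse order.
    for (int k = n - 1; k >= 0; --k)
        if (perm[k] != k)
            for (int i = 0; i < n; ++i) std::swap(at(i, k), at(i, perm[k]));
}
```

Keeping the exported signature to plain pointers and integers means the Fortran caller needs no knowledge of SYCL queues or device memory; those details can stay behind the C interface.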

The resulting SYCL version of the code can harness the potential of accelerated GPUs from diverse vendors, including Intel, NVIDIA and AMD.

Fig. 3: Matrix Inversion in MuST Framework

The proposed solution uses the Intel Fortran Compiler to compile the Fortran code and the Intel oneAPI DPC++/C++ Compiler to compile the oneMKL SYCL function calls for cross-vendor GPUs.

When the computations were performed on a GPU, the proposed method of matrix inversion, although simpler, ran faster than the more advanced, original block LU method. By adopting the proposed solution, the physicists on the project can therefore focus on the science rather than on devising complex algorithms for the computations.

What’s Next?

Get started with oneMKL, Intel oneAPI DPC++/C++ Compiler, and Codeplay’s oneAPI plugins for NVIDIA and AMD GPUs to develop an optimized, common SYCL codebase capable of yielding high performance across multi-vendor, heterogeneous architectures. We encourage you to explore other AI, HPC, and Rendering tools in Intel’s oneAPI-powered software portfolio.

Get The Software

Download the standalone versions of oneMKL, Intel Fortran Compiler, and Intel oneAPI DPC++/C++ Compiler.

oneMKL and Intel oneAPI DPC++/C++ Compiler are also included in the Intel oneAPI Base Toolkit.

You can also install Intel oneAPI DPC++/C++ Compiler and Intel Fortran Compiler as parts of the Intel® HPC Toolkit.