
Intel at ISC High Performance 2023 – Leading with Software

MaxTerry
Employee

The explosion of Large Language Models (LLMs) over the past year has generated intense interest and discussion, as millions of people are now using generative AI applications such as ChatGPT in their daily lives. What has been less widely discussed is the emerging scientific “AI boom” in which the disciplines of AI and HPC converge to fuel AI-enabled research.

“There’s this synergistic relationship between AI and HPC,” Jeff McVeigh emphasized in an Intel special presentation at ISC 2023. AI is already enabling HPC workloads such as climate and weather modeling, drug discovery, and high energy physics.

“We’re seeing the opportunities for dramatically reducing the number of iterations for an experiment by using hyperparameter optimization. We’re also seeing the design principles of large-scale supercomputers being applied to large-scale AI systems like the Azure system, with 2,500 CPUs, 10,000 GPUs, and a high ranking on the latest Top500 list.”

One recent example: Argonne National Laboratory,* in collaboration with Intel and HPE, announced plans to create a series of generative AI models for the scientific research community. “The project aims to leverage the full potential of the Aurora supercomputer to produce a resource that can be used for downstream science at the Department of Energy labs and in collaboration with others,” said Rick Stevens, Argonne* associate laboratory director. These generative AI models for science will be trained on general text, code, scientific texts, and structured scientific data from biology, chemistry, materials science, physics, medicine, and other sources. This international collaboration will also include Department of Energy laboratories; U.S. and international universities; nonprofit organizations; and international partners, such as RIKEN.*

But what is needed to truly unleash the potential of converged AI and HPC? One challenge is managing the growing diversity of accelerator hardware–GPUs, FPGAs, specialty AI ASICs, and even CPUs with built-in accelerators. As different accelerator architectures emerged, they required unique languages, tools, and libraries, creating costly, time-consuming complexity for developers. The tendency towards walled gardens encourages vendor lock-in and limits developers’ ability to reuse code. As a result, hardware choice is dictated by the software.

In this dynamic, rapidly changing architectural landscape, progress is hindered by a lack of the openness and choice that would let developers easily leverage all the architectures and resources available to them.

Unifying software stacks is critical to maximize performance and development productivity on processors from multiple vendors without the limitations of proprietary lock-in. That’s why Intel supports the oneAPI industry initiative: a programming model for multiple architectures and vendors based on SYCL*, the open, standards-based extension of the widely used C++ language from the Khronos Group. oneAPI is an open, vendor-agnostic, community-driven industry specification that enables developers to program across CPUs, GPUs, FPGAs, and other accelerators.


Intel’s own implementation of oneAPI provides tools developers can use to optimize workload acceleration on Intel hardware and make it faster and easier to build applications blending HPC and AI. At the same time, oneAPI plug-ins from Codeplay, available in the Intel® oneAPI Base Toolkit, make it possible for developers to target their code to Nvidia and AMD GPUs.

A growing ecosystem providing freedom of choice

Intel helped launch the oneAPI industry initiative in 2019. One of the remarkable observations at ISC 2023 was seeing how the ecosystem has evolved and grown in adoption, showing strong multivendor and multiarchitecture momentum with implementations on NVIDIA* GPUs, AMD* CPUs and GPUs, and Arm* CPUs.1

McVeigh was joined by Dr. Hartwig Anzt of the University of Tennessee, Knoxville, who shared his views on the growth of the oneAPI ecosystem. “There are several reasons,” said Dr. Anzt. “SYCL is an open standard running on different architectures–not only Intel GPUs, but also CPUs–and on hardware from different vendors. And that is exactly what the community wants. The community doesn’t want to be locked into one vendor and then rewrite all the code when they get a new system.”

“The community has a lot of momentum. The community is really excited. You see that many people in the community actually go the extra mile. They do not do only what they get paid for; they think SYCL is a really cool thing and want to try it out on a different platform. And this is why I think SYCL will have a good future.”


oneAPI was indeed well represented at ISC 2023. In addition to McVeigh’s presentation, there were numerous sessions on multiarchitecture accelerated computing, oneAPI, and SYCL, including participation by several oneAPI Academic Centers of Excellence.

  • Heterogeneous Programming in Modern C++ with Khronos SYCL: Aksel Alpay of the University of Heidelberg, Dr. Tom Deakin of the University of Bristol, James Brodman of Intel, and Michael Wong of Codeplay led a Birds of a Feather session. Alpay and Deakin were joined by Igor Baratta of University of Cambridge, Igor Vorobstov of Intel, and Rod Burns of Codeplay to conduct a related tutorial.
  • The Intel Extreme Performance Users Group held a day-long workshop on Communication, I/O, and Storage at Scale on Next-Generation Platforms–Scalable Infrastructures. Sessions included:
    • Performance Portability for Next-Generation Heterogeneous Systems, by Dr. Tom Deakin, University of Bristol
    • Building a Productive, Open, Accelerated Programming Model for Now and the Future, by Joe Curley of Intel
    • Next-Gen Acceleration with Multi-Hybrid Devices–Is GPU Enough?, by Dr. Taisuke Boku, Center for Computational Sciences, University of Tsukuba
    • Portability and Scalability of OpenMP Offloading on State-of-the-art Accelerators, with Yehonatan Fridman of NRCN and Ben-Gurion University, Guy Tamir of Intel, Dr. Gal Oren of NRCN and the Department of Computer Science, Technion – Israel Institute of Technology
  • Porting Numerical Integration Codes from CUDA to oneAPI: A Case Study: presented by Ioannis Sakiotis of Old Dominion University.
  • HPC on Heterogeneous Hardware: Dr. Anzt participated in this workshop with representatives of University of Tsukuba, Sandia National Laboratories, University of York, Polytechnic University of Valencia, Nvidia, and Argonne National Laboratory.
  • A SYCL Extension to Enable Approximate Computing on Heterogeneous Systems: presented by Lorenzo Carpentieri, University of Salerno.
  • Live demos of Intel® oneAPI tools at the Intel booth included:
    • Visualizing Fusion Energy Simulations with GPU Ray Tracing: Demonstrating interactive ray tracing of UKAEA's dataset from their fusion reactor in Kitware's ParaView application using Intel OSPRay on Intel® Data Center GPU Max Series.
    • Portable Performance Across CPUs and GPUs: Demonstrating performance and portability of two scientific applications using SYCL (DPEcho) and Fortran+OpenMP offloading (GRILLIX) for both CPUs and GPUs.

Looking forward, Joe Curley stated in an interview with Inside HPC that the community-driven oneAPI standard "will continue to work on the breadth of support for multiple architectures, continue to make the language more productive, work on time to performance, and of course bring out support for our new GPUs."

See more about Intel at ISC High Performance 2023

Learn more about Intel® oneAPI developer tools and Intel-optimized AI frameworks

Get involved and join the oneAPI community!

 

  1. oneAPI ecosystem examples:
  • Fujitsu and Riken implemented the oneAPI Deep Neural Network Library (oneDNN) on Arm CPUs on the supercomputer Fugaku
  • University of Heidelberg has used SYCL on AMD CPUs and GPUs
  • The National Energy Research Scientific Computing Center (NERSC) at Lawrence Berkeley National Lab (Berkeley Lab), in collaboration with Argonne, signed a contract with Codeplay to enhance the LLVM SYCL™ GPU compiler for NVIDIA® A100 GPUs
  • GROMACS molecular dynamics workload executed on AMD and Nvidia GPUs, as well as Intel GPUs and CPUs, from a single binary executable
  • TensorFlow uses oneDNN to accelerate models, with significant performance improvement
  • Recent benchmarks have shown that SYCL™ performance on Nvidia® and AMD GPUs matches that of native system languages

*Other names and brands may be claimed as the property of others.  SYCL is a trademark of the Khronos Group Inc.

About the Author
Technology marketing leader with a passion for connecting customers to Intel value!