
Our Future with Hierarchical Heterogeneous Computing


Posted on behalf of Tim Mattson,  Senior Principal Engineer, Intel Corporation


The recent passing of Intel’s co-founder, Gordon Moore, brings to mind the huge changes in computer architecture over the past decade. The market demands high-performance, power-efficient computing that minimizes the total cost of ownership. This forces us to keep pushing the limits of Moore’s Law. Since a processor running workloads matched to its architecture delivers better performance per watt, market demands have also pushed us to innovate with specialized processor architectures. The need for a general-purpose CPU will never go away, but as key workloads evolve, there will be a drive to build processors optimized to support them. For example, workloads in AI benefit greatly from the data-parallel, throughput-optimized design of a GPU. This has made the GPU a fundamental part of the modern data center. This trend will only continue, leading to a new “Golden Age of computer architecture” with heterogeneous systems that exploit multiple architectures. With ever larger models in AI and HPC’s increasing demands, these heterogeneous systems will include heterogeneity at multiple levels: accelerators inside a processor, processors based on multiple architectures on a node, and different node types across a distributed system. Heterogeneity won’t just be ubiquitous. It will be hierarchical.

Hierarchical heterogeneous computing creates a tremendous opportunity for innovation in hardware. For software developers, however, this Golden Age in computer architecture is scary. If developers have to write different code for each architecture and optimize separately for each distinct system, they won’t be able to keep up. Experience has shown that we must make heterogeneous computing a natural extension of the developer’s intention: one that allows developers to write code once and reuse that code across multiple platforms, without burdening the programmer with excess complexity or with compatibility and optimization challenges.


Figure 1. A Golden Age for hardware-optimized, heterogeneous, accelerated computing. Statistics are from the Evans Data Global Development Survey Report 22.1, June 2022. Slide is from the Accelerating Developer Innovation Through An Open Ecosystem session at Intel Innovation 2022.

An Open Ecosystem for hierarchical heterogeneous computing

A hierarchical system that embraces heterogeneity at every level is where hardware is moving. How will we support this in software? At the lowest level are programming languages and supporting libraries for writing programs that run on the CPUs and GPUs that make up a node. For C++ programmers, this means the SYCL standard: a programming language that exploits the GPU execution model across the GPUs on the market, but at the same time works quite well across the cores and vector units of a CPU. oneAPI’s SYCL implementation, Data Parallel C++, has some of the broadest multiarchitecture and multivendor support. For C and Fortran programmers, we have OpenMP. This open standard, integrated with the Intel compilers, supports directive-driven programming for both the CPU and the GPU.

SYCL and OpenMP are open standards that work across the processor hierarchy. Intel does not buy in to the walled-garden concept. An open ecosystem for software means you write code once and it maps onto all mainstream processors, even ones that come from companies other than Intel. That is what it means to respect the developer community and focus on a developer-first mindset.

SYCL and OpenMP are mature technologies. The challenge is to fit them into large-scale distributed systems. Writing parallel code for distributed computing is familiar to HPC programmers, and when we encapsulate it behind a fixed API, AI programmers can work with distributed systems as well. Direct programming of distributed systems, especially when combined with the heterogeneity on the node, is a new challenge for most programmers.

In my group’s work in Intel Labs, we believe the foundational distributed computing layer is based on a partitioned global address space. Nodes open windows into their memory, and any node in the system can use put/get operations (one-sided communication) to move data around the system. This powerful technique, well known in the HPC community, works across the concurrency models of multiarchitecture processors.
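To make the one-sided model concrete, here is a minimal single-process sketch of memory windows and put/get operations. All names (`Window`, `Node`, `put`, `get`) are hypothetical illustrations, not any real PGAS API; a real runtime (e.g., MPI RMA or OpenSHMEM) would place each window in a remote node’s memory.

```python
# Toy sketch of one-sided put/get over memory "windows" (all names hypothetical).
# In a real PGAS runtime each window lives on a remote node; here every
# window lives in a single Python process for illustration.

class Window:
    """A region of a node's memory exposed for one-sided access."""
    def __init__(self, size):
        self.buf = [0] * size

class Node:
    def __init__(self, node_id):
        self.id = node_id
        self.windows = {}

    def expose(self, name, size):
        """Open a named window into this node's memory."""
        self.windows[name] = Window(size)

# One-sided operations: the target node's CPU does not participate.
def put(target, window, offset, data):
    w = target.windows[window]
    w.buf[offset:offset + len(data)] = data

def get(target, window, offset, count):
    w = target.windows[window]
    return w.buf[offset:offset + count]

nodes = [Node(i) for i in range(4)]
for n in nodes:
    n.expose("halo", 8)

put(nodes[2], "halo", 0, [10, 20, 30])   # write directly into node 2's window
print(get(nodes[2], "halo", 0, 4))       # -> [10, 20, 30, 0]
```

The key property the sketch preserves is that data movement is initiated entirely by one side: node 2 never posts a matching receive, which is what lets the model work across the differing concurrency models of multiarchitecture processors.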


Figure 2. Software enables this golden age of hardware-optimized, heterogeneous, accelerated computing.

A partitioned global address space, however, is challenging for programmers new to distributed parallel computing. In my research, we believe we can hide that complexity behind an abstraction of distributed data structures. You don’t put and get buffers into memory windows. Instead, you put and get into array index ranges and leave it to the system to map those onto the partitioned global address space.
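The idea can be sketched as a distributed array whose global index space is partitioned into blocks, one per node. This is a hypothetical illustration (the class and method names are mine, not a real library): users put and get index ranges, and the array maps each global index to a (node, local offset) pair behind the scenes.

```python
# Sketch of a distributed array hiding the PGAS layer behind global index
# ranges (all names hypothetical). Each block of the global index space
# maps to one node's partition; a real system would reach remote
# partitions with one-sided put/get into memory windows.

class DistributedArray:
    def __init__(self, size, num_nodes):
        self.size = size
        self.block = -(-size // num_nodes)           # ceil(size / num_nodes)
        # One local buffer per "node", simulated in-process here.
        self.partitions = [[0] * self.block for _ in range(num_nodes)]

    def _locate(self, i):
        """Map a global index to (node, local offset)."""
        return divmod(i, self.block)

    def put(self, start, values):
        for k, v in enumerate(values):
            node, off = self._locate(start + k)
            self.partitions[node][off] = v

    def get(self, start, count):
        out = []
        for i in range(start, start + count):
            node, off = self._locate(i)
            out.append(self.partitions[node][off])
        return out

a = DistributedArray(size=16, num_nodes=4)
a.put(6, [7, 8, 9])        # this range spans the boundary between nodes 1 and 2
print(a.get(5, 5))         # -> [0, 7, 8, 9, 0]
```

Notice that the put at global index 6 silently crosses a partition boundary: the caller works purely in array indices, and the mapping layer decides which node owns each element, which is exactly the complexity the abstraction is meant to hide.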

We know this can work in principle based on work in the HPC community. An open ecosystem, to be effective, must be open to everyone. That means we have to connect to the masses of programmers working in high-level languages such as Python, people who won’t be comfortable using SYCL, let alone a partitioned global address space. How are we going to serve those people?

This is where the work in my group on machine programming comes in. As I described in my Intel Innovation 2022 blog post, we need to meet programmers on their level. We need human-focused expressions of programmer intent that we transform into low-level code that runs across nodes using the oneAPI languages.

This is research. We are not talking about products we’ll deliver any time soon. We have a vision, however, and my research group, working closely with the oneAPI team, is working to make this a reality. We won’t rest until we have an open ecosystem that connects to every developer to create software that runs on hierarchical heterogeneous systems. We have a great starting point with oneAPI. Fancy dreams about machine programming generating code based on high-level expressions of programmer intent are fine, but we need a foundation to build on, and that foundation is oneAPI.

So, taking this out of the realm of research and toward a “Golden Age for Software Developers”: we can only do this because of the open ecosystem Intel is enabling today. None of this would be possible without the oneAPI industry initiative. You can see an overview of the oneAPI industry initiative and Intel developer tools, along with my conversation with Andrew Richards on the benefits of SYCL. Readers can also see more of my perspective on the adoption of oneAPI in my blog: Centers of Excellence Lead Adoption of oneAPI’s Vision (intel.com). Much more can be found by viewing the Intel Innovation 2022 session Accelerating Developer Innovation Through An Open Ecosystem.


Notices & Disclaimers

Performance varies by use, configuration and other factors. Learn more at www.Intel.com/PerformanceIndex.

No product or component can be absolutely secure. 

Your costs and results may vary. 

Intel technologies may require enabled hardware, software or service activation.

Statements in this document that refer to future plans or expectations are forward-looking statements.  These statements are based on current expectations and involve many risks and uncertainties that could cause actual results to differ materially from those expressed or implied in such statements.  For more information on the factors that could cause actual results to differ materially, see our most recent earnings release and SEC filings at www.intc.com.

© Intel Corporation.  Intel, the Intel logo, and other Intel marks are trademarks of Intel Corporation or its subsidiaries.  Other names and brands may be claimed as the property of others.