Data Center

Intel Labs Research Receives Outstanding Paper Award at HPEC 2024

ScottBair

Scott Bair is a key voice at Intel Labs, sharing insights into innovative research for inventing tomorrow’s technology.

Highlights:

  • This year’s IEEE High Performance Extreme Computing Virtual Conference will be hosted through the Engagez platform from September 23rd to 27th.
  • Intel’s presentation of an HPC-specific language model, MONOCODER, has received an Outstanding Paper Award at this year’s conference.
  • Intel has five accepted works at HPEC 2024, which include an AI solution for post-silicon power-performance development and validation; a discussion of how to characterize the power-performance tradeoff across Intel Xeon CPU parameters in a variety of use cases; an evaluation of the resilience of HPC program loops; and an analysis of the impact of Intel Optimized Power Mode on power consumption.

The IEEE High Performance Extreme Computing Conference is an annual virtual event for experts, researchers, and students from around the world to connect on improving the performance of hardware, software, systems, and applications. This year’s conference will be hosted through the Engagez platform from September 23rd to 27th.

Intel Labs is proud to announce that one of Intel’s accepted works, “MONOCODER: Domain-Specific Code Language Model for HPC Codes and Tasks,” has received an Outstanding Paper Award at the 2024 conference.

Greater access to powerful compute resources has led to a growing trend in AI for software developers to create large language models (LLMs) that address a variety of programming tasks. LLMs applied to the high-performance computing (HPC) domain are extremely large and demand expensive compute resources for training. This is partly because LLMs for HPC tasks are obtained by fine-tuning existing LLMs that support several natural and/or programming languages. However, the research team posited that it should not be necessary to use LLMs trained on languages unrelated to HPC for HPC-related tasks. Thus, the work aims to question the design choices made by existing LLMs by developing smaller language models (LMs) for specific domains, which the team calls domain-specific LMs.

Specifically, the team started with HPC as a domain and built an HPC-specific LM, named MONOCODER, which is orders of magnitude smaller than existing LMs but delivers better performance on non-HPC and HPC code. Researchers pre-trained MONOCODER on an HPC-specific dataset (named HPCORPUS) of C and C++ programs mined from GitHub. They then evaluated the performance of MONOCODER against state-of-the-art multilingual LLMs. Results demonstrate that MONOCODER, although much smaller than existing LMs, outperforms other LLMs on normalized-perplexity tests (in relation to model size) while also delivering competitive CodeBLEU scores for high-performance and parallel code generation. In other words, the results suggest that MONOCODER understands HPC code better than state-of-the-art LLMs.
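For readers unfamiliar with perplexity-based evaluation, the sketch below shows how such a comparison is typically computed: perplexity is the exponential of a causal language model's mean cross-entropy on held-out code, reported alongside the model's parameter count. The model name and code snippet here are illustrative assumptions, not the paper's exact setup.

```python
# Minimal sketch: perplexity of a causal code LM on a C snippet.
# "gpt2" is a placeholder model, not MONOCODER itself.
import math
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # illustrative stand-in
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

code = "for (int i = 0; i < n; i++) { a[i] = b[i] + c[i]; }"
inputs = tokenizer(code, return_tensors="pt")

with torch.no_grad():
    # The returned loss is the mean token-level cross-entropy;
    # perplexity is simply its exponential.
    loss = model(**inputs, labels=inputs["input_ids"]).loss

perplexity = math.exp(loss.item())
num_params = sum(p.numel() for p in model.parameters())
print(f"perplexity={perplexity:.2f} with {num_params / 1e6:.0f}M parameters")
```

Reporting perplexity together with parameter count is what makes a size-normalized comparison like the paper's possible: a small model with perplexity close to a much larger one is doing more with less.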

Intel’s other contributed works at the conference include an AI solution for post-silicon power-performance development and validation; a discussion of how to characterize the power-performance tradeoff across Intel Xeon CPU parameters in a variety of use cases; an evaluation of the resilience of HPC program loops; and an analysis of the impact of Intel Optimized Power Mode on power consumption.

Contributed Papers

Artificial Intelligence Solution on Intel Xeon Processor Power and Performance Engineering

Major cloud service providers (CSPs) are setting up high-performance infrastructures to meet cloud customers’ diverse computing demands. To help CSP customers invest in the right areas when building high-performance Xeon-based systems, Intel devotes significant resources to the power-performance features of its Xeon products. However, tailoring services to unique customer needs carries large engineering costs and does not scale. This paper introduces an AI solution, named Bench Counselor, for post-silicon power-performance development and validation. Bench Counselor can suggest the most valuable hardware investment areas for a given customer’s usage or benchmarking methodology while reducing engineering effort. By training AI models on historical Xeon processor performance results and system hardware configurations, Bench Counselor can efficiently assist power and performance engineers in categorizing outliers and debugging. The AI solution can also provide heuristics on the most valuable investment areas for achieving significant performance gains.
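The abstract does not disclose Bench Counselor's internals, but the general pattern it describes, training a model on historical results and hardware configurations and then flagging outliers, might look roughly like the following sketch. All feature names, values, and thresholds are hypothetical.

```python
# Illustrative sketch only: not Bench Counselor's actual design.
# Train a regressor on historical (config -> benchmark score) data,
# then flag new runs whose measured score deviates from the prediction.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)

# Hypothetical features: core count, frequency (GHz), memory channels, TDP (W).
X = rng.uniform([16, 2.0, 4, 150], [64, 3.8, 8, 350], size=(500, 4))
# Hypothetical benchmark score with measurement noise.
y = 0.6 * X[:, 0] * X[:, 1] + 5.0 * X[:, 2] + rng.normal(0, 5, 500)

model = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, y)

# Flag a new result that deviates more than 15% from the model's expectation.
new_config = np.array([[48, 3.2, 8, 300]])
measured = 95.0
predicted = model.predict(new_config)[0]
if abs(measured - predicted) / predicted > 0.15:
    print(f"outlier: measured {measured:.1f} vs expected {predicted:.1f}")
```

The same trained model can also rank feature importances, which is one plausible way such a tool could surface "most valuable investment areas" from historical data.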

Intel Xeon Optimization for Efficient Media Workload Acceleration

This paper discusses key methodologies for performing workload-affinity characterization, along with how to characterize the power-performance tradeoff across fine-grained Intel Xeon CPU parameters in a variety of popular industry media use cases. Key results from the detailed study, combined with business acumen, helped define the first media-workload-optimized Intel Xeon CPU.

Investigating Resilience of Loops in HPC Programs: A Semantic Approach with LLMs

Transient hardware faults, resulting from particle strikes, are significant concerns in High-Performance Computing (HPC) systems. As these systems scale, the likelihood of soft errors rises. Traditional methods like Error-Correcting Codes (ECCs) and checkpointing address many of these errors, but some evade detection, leading to silent data corruptions (SDCs). This paper evaluates the resilience of HPC program loops, which are crucial for performance and error handling, by analyzing their computational patterns, known as the thirteen dwarfs of parallelism. Researchers employ fault injection techniques to quantify SDC rates and utilize Large Language Models (LLMs) with prompt engineering to identify the loop semantics of the dwarfs in real source code. Contributions include defining and summarizing loop patterns for each dwarf, quantifying their resilience, and leveraging LLMs for precise identification of these patterns. These insights enhance the understanding of loop resilience, aiding in the development of more resilient HPC applications.  
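To make the fault-injection methodology concrete, here is a minimal sketch of the general technique: flip a single bit in one operand of a loop and check whether the final result silently diverges from a fault-free "golden" run. This illustrates bit-flip injection in general, not the paper's actual framework.

```python
# Minimal fault-injection sketch (illustrative, not the paper's tooling):
# inject one random bit flip into a reduction loop and estimate the SDC rate.
import random
import struct

def flip_random_bit(x: float) -> float:
    """Flip one random bit in the IEEE-754 double representation of x."""
    bits = struct.unpack("<Q", struct.pack("<d", x))[0]
    bits ^= 1 << random.randrange(64)
    return struct.unpack("<d", struct.pack("<Q", bits))[0]

def reduction(data, fault_at=None):
    total = 0.0
    for i, v in enumerate(data):
        if i == fault_at:
            v = flip_random_bit(v)  # injected transient fault
        total += v
    return total

data = [float(i) for i in range(1000)]
golden = reduction(data)  # fault-free reference result

trials, sdcs = 1000, 0
for _ in range(trials):
    result = reduction(data, fault_at=random.randrange(len(data)))
    if result != golden:  # silent data corruption: wrong answer, no crash
        sdcs += 1

print(f"estimated SDC rate: {sdcs / trials:.2%}")
```

Repeating this experiment per loop pattern (one per dwarf) is the kind of campaign that lets resilience be compared across computational patterns.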

Power Efficient Deep Learning Acceleration using Intel Xeon® Processors

With the exponential growth of AI applications in data centers, one of the foremost concerns is power consumption. Intel Optimized Power Mode (OPM) aims to lower power draw and reduce cooling costs when servers are not at full utilization. Most data center deployments keep the platform workload mix around the 30% to 40% utilization range to balance total cost of ownership with the capacity to absorb demand spikes. In this paper, performance/watt was measured on 5th Gen Intel Xeon Scalable processors using both generative AI and non-generative AI workloads to assess the impact of OPM on power consumption. Results show that OPM yields up to a 25% improvement in performance/watt at 25% server utilization. The performance or performance/watt improvement varies depending on the use cases running. Meanwhile, at 100% server utilization, performance/watt with OPM is comparable to out-of-the-box performance.
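As a back-of-the-envelope illustration of the performance/watt metric used in this paper, the sketch below compares a hypothetical run with OPM off and on. All numbers are invented for illustration; they are not measurements from the paper.

```python
# Toy illustration of the performance/watt comparison; values are made up.
def perf_per_watt(throughput_ops: float, avg_power_w: float) -> float:
    return throughput_ops / avg_power_w

# Hypothetical run at 25% server utilization: same throughput, lower power with OPM.
baseline = perf_per_watt(throughput_ops=10_000, avg_power_w=400)
opm = perf_per_watt(throughput_ops=10_000, avg_power_w=320)

improvement = (opm - baseline) / baseline
print(f"perf/watt improvement with OPM: {improvement:.0%}")  # 25% in this toy case
```

The example shows why low-utilization gains can be large while full-load gains vanish: at 100% utilization the server draws near its maximum power either way, so the ratio barely moves.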

About the Author
Scott Bair is a Senior Technical Creative Director for Intel Labs, chartered with growing awareness for Intel’s leading-edge research activities, such as AI, neuromorphic computing, and quantum computing. Scott is responsible for driving marketing strategy, messaging, and asset creation for Intel Labs and its joint research activities. In addition to his work at Intel, he has a passion for audio technology and is an active father of five children. Scott has over 23 years of experience in the computing industry bringing new products and technology to market. During his 15 years at Intel, he has worked in a variety of roles spanning R&D, architecture, strategic planning, product marketing, and technology evangelism. Scott has an undergraduate degree in Electrical and Computer Engineering and a Master of Business Administration from Brigham Young University.