Comparing HPC Workload performance between Intel and AMD on AWS (Part 3 of 3)

Mohan_Potheri · 07-20-2023

In part 1 of this blog series, we looked at the HPC landscape and the tools available on AWS for HPC deployment. In part 2, we compared the performance of Intel and AMD HPC instances on AWS for GROMACS. In this final part 3, we compare the performance of Intel and AMD HPC instances on AWS for OpenFOAM.

OpenFOAM:

OpenFOAM (Open Field Operation and Manipulation) is an open-source computational fluid dynamics (CFD) software package. It provides a comprehensive set of tools for solving complex fluid flow and heat transfer problems using numerical simulation methods. OpenFOAM is widely used in academia, industry, and research institutions for a variety of applications, including aerospace, automotive, energy, and environmental engineering.

Here are some key features and aspects of OpenFOAM:

Open-Source Nature: OpenFOAM is released under the GNU General Public License (GPL) and is freely available for use, modification, and distribution. Its open-source nature encourages collaboration, knowledge sharing, and community contributions, allowing users to customize and extend the software to meet their specific needs.
Finite Volume Method: OpenFOAM utilizes the finite volume method, a numerical technique for solving partial differential equations that describe fluid flow and heat transfer phenomena. This approach discretizes the domain into finite control volumes and solves the governing equations based on conservation principles for mass, momentum, and energy.
Wide Range of Solvers: OpenFOAM offers a comprehensive suite of solvers that can handle various types of flow problems. It includes solvers for incompressible and compressible flows, laminar and turbulent flows, multiphase flows, reacting flows, and heat transfer problems. These solvers are designed to accurately simulate complex flow phenomena using robust numerical algorithms.
Meshing Capabilities: OpenFOAM provides meshing tools for generating high-quality computational grids. It supports structured, unstructured, and hybrid meshing methods, allowing users to generate meshes that suit their specific simulation requirements. OpenFOAM also includes mesh manipulation and conversion utilities, which facilitate the preprocessing stage of CFD simulations.
Boundary Conditions and Models: OpenFOAM offers a wide range of boundary conditions and physical models to represent various flow conditions and phenomena. It provides options for specifying velocity inlets, pressure outlets, walls, symmetry planes, and other boundary conditions. Additionally, OpenFOAM includes turbulence models, such as the popular k-epsilon and k-omega models, as well as models for heat transfer, combustion, multiphase flows, and more.
Parallel Computing: OpenFOAM supports parallel computing, allowing users to distribute computational tasks across multiple cores or nodes for faster simulations. It utilizes Message Passing Interface (MPI) for parallelization and can take advantage of high-performance computing (HPC) clusters or multi-core workstations to speed up simulations.
Extensibility and Customization: OpenFOAM is designed to be highly extensible and customizable. Users can add their own solvers, models, and utilities to the software to address specific simulation requirements or research objectives. This flexibility allows for the development of new capabilities and the adaptation of OpenFOAM to emerging computational fluid dynamics challenges.
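As a rough illustration of the finite volume idea described in the list above (this is not OpenFOAM code), the following Python sketch solves steady 1D heat conduction by writing a flux balance for each control volume. The function name, the uniform grid, and the fixed-temperature boundaries are assumptions chosen for the example.

```python
def solve_steady_diffusion_1d(n_cells, length, t_left, t_right, conductivity=1.0):
    """Cell-centre temperatures for steady 1D conduction with fixed end temperatures."""
    dx = length / n_cells
    # Conductance between two neighbouring cell centres (distance dx).
    g_inner = conductivity / dx
    # Conductance between a boundary face and the adjacent cell centre (distance dx / 2).
    g_wall = conductivity / (dx / 2.0)

    lower = [0.0] * n_cells  # coefficient of T[i - 1]
    diag = [0.0] * n_cells   # coefficient of T[i]
    upper = [0.0] * n_cells  # coefficient of T[i + 1]
    rhs = [0.0] * n_cells

    for i in range(n_cells):
        # Conservation for cell i: the fluxes through its two faces sum to zero.
        if i > 0:
            lower[i] = -g_inner
            diag[i] += g_inner
        else:
            diag[i] += g_wall
            rhs[i] += g_wall * t_left
        if i < n_cells - 1:
            upper[i] = -g_inner
            diag[i] += g_inner
        else:
            diag[i] += g_wall
            rhs[i] += g_wall * t_right

    # Thomas algorithm: forward elimination, then back substitution.
    for i in range(1, n_cells):
        w = lower[i] / diag[i - 1]
        diag[i] -= w * upper[i - 1]
        rhs[i] -= w * rhs[i - 1]
    temps = [0.0] * n_cells
    temps[-1] = rhs[-1] / diag[-1]
    for i in range(n_cells - 2, -1, -1):
        temps[i] = (rhs[i] - upper[i] * temps[i + 1]) / diag[i]
    return temps


if __name__ == "__main__":
    print(solve_steady_diffusion_1d(5, 1.0, 100.0, 200.0))
```

OpenFOAM applies the same conservation principle to 3D unstructured meshes with far more general equations; the sketch only shows how a per-cell balance turns into a linear system.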

OpenFOAM provides a powerful and versatile platform for performing CFD simulations, enabling engineers, scientists, and researchers to study and analyze a wide range of fluid flow and heat transfer problems. Its open-source nature, robust solvers, and extensive capabilities make it a popular choice for those seeking a flexible and customizable CFD software package.

Performance testing for OpenFOAM:

The motorbike Simulation Test Case:

This benchmark offers a free comparison of various CPUs. The test case is the well-known motorBike tutorial[v], which is a standard part of the OpenFOAM installation. The case settings are intentionally left at their defaults, and the test case can be run on any computer. Three mesh settings of different sizes were used. The base mesh (size S) has about 8.6 * 10^6 cells (the tutorial mesh has about 3.2 * 10^5 cells). Appendix A shows the configuration used.

The total simulation time was calculated as the combined time of the blockMesh, decomposePar, snappyHexMesh, potentialFoam, and simpleFoam steps.
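A minimal Python sketch of how such a combined time could be measured is shown below. The stage names come from the text; the exact command lines (the use of mpirun, the 384 ranks, and the -parallel and -overwrite options) are assumptions for illustration and are not the scripts used in the actual tests.

```python
import subprocess
import time

# Hypothetical command lines for the five stages named above.
# The real benchmark scripts may launch these differently (for example via a scheduler).
STAGES = [
    ("blockMesh", ["blockMesh"]),
    ("decomposePar", ["decomposePar"]),
    ("snappyHexMesh", ["mpirun", "-np", "384", "snappyHexMesh", "-parallel", "-overwrite"]),
    ("potentialFoam", ["mpirun", "-np", "384", "potentialFoam", "-parallel"]),
    ("simpleFoam", ["mpirun", "-np", "384", "simpleFoam", "-parallel"]),
]


def time_stages(stages, case_dir=".", runner=subprocess.run):
    """Run each stage in order and return wall-clock seconds per stage plus a total."""
    timings = {}
    for name, command in stages:
        start = time.perf_counter()
        # check=True stops the pipeline if a stage exits with an error.
        runner(command, cwd=case_dir, check=True)
        timings[name] = time.perf_counter() - start
    timings["total"] = sum(timings.values())
    return timings


# Example call (requires an OpenFOAM environment and a prepared case directory):
# timings = time_stages(STAGES, case_dir="motorBike")
```

Passing the runner as a parameter keeps the timing logic testable without an OpenFOAM installation.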

Figure 3: The Motorbike Simulation Test case

The tests were performed in June 2023 on AWS ParallelCluster in the AWS Region us-east-2. All head nodes and compute nodes used General Purpose SSD (gp3) volumes as local storage. The benchmarks were run on 6 nodes of Intel based Amazon EC2 hpc6id instances and compared with runs on 4 nodes of AMD based Amazon EC2 hpc6a instances. These node counts were chosen so that both clusters had the same total of 384 cores. The config files and scripts used for the testing are shown in Appendix A. The following IaaS instance configuration was used to compare performance between the Intel (HPC6id) and AMD (HPC6a) instances.
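The core matching can be checked with a few lines of Python. The per-node core and memory figures are taken from the instance tables in the disclosure text of this post; the class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeType:
    name: str
    cores: int       # physical cores per instance
    memory_gib: int  # memory per instance in GiB


# Figures as listed in the instance tables of this post.
HPC6ID = NodeType("hpc6id.32xlarge", cores=64, memory_gib=1024)
HPC6A = NodeType("hpc6a.48xlarge", cores=96, memory_gib=384)


def cluster_summary(node, count):
    """Total cores and memory per core for `count` instances of `node`."""
    return {
        "instance": node.name,
        "nodes": count,
        "total_cores": node.cores * count,
        "memory_per_core_gib": node.memory_gib / node.cores,
    }


if __name__ == "__main__":
    for node, count in ((HPC6ID, 6), (HPC6A, 4)):
        print(cluster_summary(node, count))
```

Both clusters come out at 384 cores, while the memory available per core differs between the two instance types.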

Category	Attribute	Config1	Config2
Run Info
	Cumulus Run ID	N/A	N/A
	Benchmarks	OpenFOAM Motorbike L, XL	OpenFOAM Motorbike L, XL
	Date	June 2023	June 2023
	Test by	Intel	Intel
CSP and VM Config
	Cloud	AWS	AWS
	Region
	Instance Type	HPC6a in AWS ParallelCluster	HPC6id in AWS ParallelCluster
	CPU(s)	384	384
	Microarchitecture	AWS Nitro	AWS Nitro
	Instance Cost
	Number of Instances or VMs (if cluster)
	Iterations and result choice (median, average, min, max)
Memory
	Memory	1536 GB	6144 GB
	DIMM Config
	Memory Capacity / Instance
Network Info
	Network BW / Instance	100 Gbps EFA	200 Gbps EFA
	NIC Summary
Storage Info
	Storage: Network Storage	FSx for Lustre 1.2 TiB	FSx for Lustre 1.2 TiB

Table 4: Instance and Benchmark Details for OpenFOAM

Multiple runs of the Large and X-Large use cases were performed on HPC6id (Intel) and HPC6a (AMD), each configured to use a total of 384 cores. The results are shown below.

Large	HPC6id (Intel)	HPC6a (AMD)
Run 1	252	369
Run 2	252	368
Run 3	256	369
Average	253	369

X-Large	HPC6id (Intel)	HPC6a (AMD)
Run 1	886	1472
Run 2	884	1479
Run 3	881	1467
Average	884	1473


Average runtime	HPC6id (Intel)	HPC6a (AMD)
Large (42 million Cells)	253	369
X-Large (145 million Cells)	884	1473

Relative performance (HPC6a = 100%)	HPC6id (Intel)	HPC6a (AMD)
Large (42 million Cells)	145.53%	100.00%
X-Large (145 million Cells)	166.69%	100.00%

Table 5: Raw Data from OpenFOAM Testing
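The percentages in Table 5 appear consistent with dividing the average HPC6a runtime by the average HPC6id runtime. The Python sketch below performs that calculation, assuming this is how the relative figures were derived (the post does not state the formula). It uses the rounded run times from the table, so the last digits may differ slightly from the published percentages.

```python
from statistics import mean

# Run times copied from Table 5.
RUNS = {
    "Large": {"HPC6id": [252, 252, 256], "HPC6a": [369, 368, 369]},
    "X-Large": {"HPC6id": [886, 884, 881], "HPC6a": [1472, 1479, 1467]},
}


def relative_performance(runs, baseline="HPC6a", candidate="HPC6id"):
    """Ratio of mean baseline runtime to mean candidate runtime for each case.

    A value above 1.0 means the candidate finished faster than the baseline.
    """
    return {
        case: mean(by_instance[baseline]) / mean(by_instance[candidate])
        for case, by_instance in runs.items()
    }


if __name__ == "__main__":
    for case, ratio in relative_performance(RUNS).items():
        print(f"{case}: {ratio:.2%} of baseline performance")
```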

Figure 4: Chart comparing absolute performance in runtimes for motorbike OpenFOAM benchmark between Intel and AMD instance types.

Figure 5: Performance comparison by percentage for OpenFOAM benchmarks

OpenFOAM Results:

The results show that the Intel based HPC6id instances perform 46-67% faster than the AMD based HPC6a instances: HPC6a runtimes were about 1.46x (Large) and 1.67x (X-Large) those of HPC6id. Intel is an established player in the HPC market, with many software optimizations for HPC workloads that could further improve the performance of its instances relative to AMD. Any potential differences in instance and cluster cost are easily compensated for by the improved performance shown in the results.
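One way to reason about the cost trade-off is to assume that the cost of a job scales as nodes x price per node-hour x runtime. Under that assumption (the post lists no prices, and the function name below is hypothetical), the following Python sketch computes the per-node price ratio at which two clusters cost the same per job, using the node counts and average runtimes from Table 5.

```python
def breakeven_node_price_ratio(nodes_a, runtime_a, nodes_b, runtime_b):
    """Price ratio (node A price / node B price) at which both clusters cost the same per job.

    Assumes cost per job = nodes * price per node-hour * runtime, with both
    runtimes given in the same unit. Cluster A is cheaper per job whenever its
    actual per-node price ratio is below the returned value.
    """
    return (nodes_b * runtime_b) / (nodes_a * runtime_a)


if __name__ == "__main__":
    # A = 6 x HPC6id, B = 4 x HPC6a; runtimes are the averages from Table 5.
    for case, intel_runtime, amd_runtime in (("Large", 253, 369), ("X-Large", 884, 1473)):
        ratio = breakeven_node_price_ratio(6, intel_runtime, 4, amd_runtime)
        print(f"{case}: break-even HPC6id/HPC6a node price ratio = {ratio:.3f}")
```

Plugging in current on-demand prices for the two instance types then shows which cluster is cheaper per job for a given case.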

Conclusion:

Intel has been the leader and innovator in HPC over the past few decades. Intel based cloud instances on Amazon EC2 provide the agility and scalability of the cloud for HPC workloads. Intel works with all major HPC workloads to optimize them for performance and scalability on its processors, both on-premises and in the cloud. Intel contributes to many HPC libraries and open-source projects, such as MPI, including through the Intel oneAPI HPC Toolkit [vi]. Our head-to-head comparison between Intel and AMD based instances on Amazon EC2 for common HPC workloads such as GROMACS and OpenFOAM showcases Intel's superiority in performance and scalability. Intel's superior performance more than compensates for any differences in cost with AMD for HPC workloads.

Bibliography:

[i] Amazon Elastic Compute Cloud (Amazon EC2) Hpc6id instances, powered by 3rd Generation Intel Xeon Scalable processors, offer cost-effective price performance for memory-bound and data-intensive high-performance computing (HPC) workloads in Amazon EC2.

[ii] Amazon EC2 AMD based HPC6a instances: AMD claims that Amazon EC2 Hpc6a instances offer the best price performance for compute-intensive high-performance computing (HPC) workloads in Amazon EC2. Most of the cost savings claims can be attributed to the higher core count per socket for AMD compared with Intel.

[iii] The PEP (Performance Evaluation Project) benchmark for GROMACS is a widely recognized benchmark used to assess the performance of computer systems and architectures in running molecular dynamics simulations using GROMACS.

[iv] Satellite Tobacco Mosaic Virus (STMV) is a small, icosahedral plant virus that worsens the symptoms of infection by Tobacco Mosaic Virus (TMV).

[v] This benchmark offers a free comparison of various CPU processors. The test case is a well-known Motor Bike tutorial which is a standard part of OpenFOAM installation.

[vi] The Intel® oneAPI Base Toolkit includes powerful data-centric libraries, advanced analysis tools, and Intel® Distribution for Python* for near-native code performance of core Python numerical, scientific, and machine learning packages.

Disclosure text:

Tests were performed April-June 2023 on AWS in region us-east-1. Full configuration details are shown in Tables 1 and 4. The total core counts of the AWS ParallelCluster configurations were matched for Intel and AMD at 384 cores for the comparison.

Instance Size	Physical Cores	Memory (GiB)	EFA Network Bandwidth (Gbps)	Network Bandwidth (Gbps)*
hpc6id.32xlarge	64	1024	200	25

6 x HPC6id instances were used in an AWS ParallelCluster with the following configuration:

The Amazon Elastic Compute Cloud (Amazon EC2) Hpc6id instances, powered by 3rd Generation Intel Xeon Scalable processors, offer cost-effective price performance for memory-bound and data-intensive high performance computing (HPC) workloads in Amazon EC2.

Instance Size	Physical Cores	Memory (GiB)	EFA Network Bandwidth (Gbps)	Network Bandwidth (Gbps)*
hpc6a.48xlarge	96	384	100	25

4 x HPC6a instances were used in an AWS ParallelCluster with the following configuration:

The Amazon EC2 Hpc6a instance features the 3rd generation AMD EPYC 7003 series processors with up to 3.6 GHz all-core turbo frequency built on a 7nm process node for increased efficiency.

DB client machine details: For the client machine, we used the EC2 instance type c6i.4xlarge with 16 vCPUs (8 cores), 32 GB of memory, a 75 GB gp2 storage volume, and 12.5 Gbps network bandwidth, powered by 3rd Generation Intel Xeon Scalable processors. The client machines used the following Amazon Machine Image (AMI): Canonical, Ubuntu, 20.04 LTS, amd64 focal image built on 2022-09-14 (ami-0149b2da6ceec4bb0). All instances, including the client instances, were run in the us-east-1 region. Benchmarking Software:

FSX Shared storage details:

File system type: Lustre
Deployment type: Persistent 2
Data compression type: NONE
Storage type: SSD
Storage capacity: 1.2 TiB
Throughput per unit of storage: 125 MB/s/TiB
Total throughput: 150 MB/s
Root Squash: Disabled
Lustre version: 2.12

Notices & Disclaimers:

Performance varies by use, configuration, and other factors. Learn more on the Performance Index site.

Performance results are based on testing as of dates shown in configurations and may not reflect all publicly available updates. See backup for configuration details. No product or component can be absolutely secure.

Your costs and results may vary. For further information please refer to Legal Notices and Disclaimers.

Intel technologies may require enabled hardware, software, or service activation.

© Intel Corporation. Intel, the Intel logo, and other Intel marks are trademarks of Intel Corporation or its subsidiaries. Other names and brands may be claimed as the property of others.