Comparing HPC Workload Performance Between Intel and AMD on AWS (Part 1 of 3)

Mohan_Potheri · 07-10-2023

Intel leadership in HPC:

Intel has been a significant player in the field of high-performance computing (HPC) for several decades. The company has consistently strived to push the boundaries of computing performance, power efficiency, and scalability, making it a trusted provider of HPC solutions.

Intel's leadership in HPC can be attributed to several key factors:

Processor Architecture: Intel's x86 architecture has long been a dominant force in the HPC market. Intel Xeon processors are widely used in high-performance computing clusters and supercomputers due to their strong performance, advanced features, and broad software ecosystem.
Technology Innovation: Intel has been at the forefront of technology innovation, continuously introducing new processor generations with higher core counts, improved instructions per cycle (IPC), and enhanced memory bandwidth. These advancements enable HPC users to tackle complex workloads and simulations more efficiently.
Software Optimization: Intel invests heavily in optimizing software for its processors, collaborating with software developers to ensure that applications and tools are highly optimized for Intel architecture. This optimization enables users to extract maximum performance from Intel processors, enhancing overall HPC capabilities. The Intel MPI Library, a widely used MPI implementation for HPC, is one example of Intel's contribution.
Co-design Approach: Intel takes a co-design approach to develop HPC solutions. This involves collaborating closely with system vendors, software developers, and end-users to understand their requirements and tailor solutions accordingly. By working closely with the HPC community, Intel can deliver technologies that address specific needs and deliver optimal performance.
Research and Development: Intel invests significantly in research and development, exploring new technologies and architectures to drive the future of HPC. This commitment to advancing the field ensures that Intel remains at the forefront of innovation and capable of addressing the evolving needs of the HPC community.
Industry Partnerships: Intel collaborates with industry partners, academic institutions, and national laboratories to foster innovation and drive HPC advancements. These partnerships enable Intel to leverage collective expertise, resources, and insights to develop and deliver cutting-edge solutions.

It is important to note that the HPC landscape is competitive, and other companies, such as AMD, NVIDIA, and IBM, also play significant roles in the market. However, Intel's long-standing presence, technological advancements, and strong partnerships have solidified its leadership position in the HPC space.

Leveraging AWS Cloud for HPC:

High-Performance Computing (HPC) in the cloud refers to the practice of leveraging cloud computing infrastructure and services to perform computationally intensive tasks that require significant computational power, memory, or storage resources. It allows organizations and individuals to access high-performance computing capabilities on-demand without the need to invest in and maintain dedicated HPC hardware.

Using Amazon Web Services (AWS) for high-performance computing (HPC) in the cloud offers several advantages:

Scalability: AWS provides virtually unlimited scalability, allowing users to easily scale up or down their HPC infrastructure based on workload demands. With AWS, you can provision and deploy a cluster of any size, from small-scale testing environments to large-scale supercomputing clusters, without the need for upfront investments in hardware.
On-Demand Resources: AWS offers a vast range of compute resources, including powerful instances optimized for different HPC workloads. Users can choose from a variety of instance types, such as compute-optimized instances for high-performance computing, memory-optimized instances for data-intensive workloads, and GPU instances for accelerated computing tasks. These resources are available on-demand, enabling users to quickly access the required computational power when needed.
Cost Efficiency: AWS provides flexible pricing options that can help optimize costs for HPC workloads. Users can leverage spot instances, which offer significant discounts compared to on-demand instances, to run non-critical and fault-tolerant workloads at lower costs. Additionally, AWS provides cost management tools and monitoring capabilities to help users understand resource utilization, identify cost-saving opportunities, and optimize spending.
Wide Range of Services: AWS offers a comprehensive suite of services that can enhance HPC workflows. For example, Amazon S3 provides highly durable and scalable object storage for data archiving and sharing. AWS Batch and AWS Lambda offer job scheduling and serverless computing options, respectively, simplifying the management of HPC workloads. Additionally, services like Amazon Elastic File System (EFS) and Amazon FSx for Lustre provide scalable and high-performance file systems for storing and accessing data.
Global Infrastructure: AWS has a global infrastructure footprint, with data centers located in multiple regions worldwide. This distributed infrastructure allows users to deploy HPC clusters closer to their data sources and end-users, minimizing latency and improving performance. Additionally, AWS offers tools and services for data transfer and replication, enabling efficient data movement between regions.
Security and Compliance: AWS provides robust security measures to protect HPC workloads and data. AWS implements physical and logical security controls, and users can leverage features such as virtual private clouds (VPCs), network access control lists (ACLs), and security groups to isolate and secure their HPC environments. AWS also supports compliance programs and frameworks, such as HIPAA, GDPR, and ISO 27001, which help HPC workloads meet regulatory requirements.
Collaboration and Integration: AWS facilitates collaboration and integration within HPC workflows. It supports integrations with popular HPC software packages and frameworks, making it easier to migrate existing workloads to the cloud. AWS also provides features like identity and access management (IAM), which allow users to manage access permissions and control resource sharing among team members or collaborators.

These advantages make AWS a compelling choice for running HPC workloads in the cloud, offering scalability, cost efficiency, a wide range of services, global infrastructure, security, and collaboration capabilities.

AWS ParallelCluster:

AWS ParallelCluster is an open-source cluster management tool provided by Amazon Web Services (AWS) for deploying and managing high-performance computing (HPC) clusters on AWS infrastructure. It simplifies the process of setting up, configuring, and scaling HPC clusters, allowing users to focus on their computational workloads rather than the underlying infrastructure.

Key features and capabilities of AWS ParallelCluster include:

Cluster Management: AWS ParallelCluster provides a command-line interface (CLI) and a CloudFormation template to create, configure, and manage HPC clusters. It allows users to specify the desired cluster configuration, including the number and types of instances, storage options, networking, and software stack.
Elastic Scalability: AWS ParallelCluster enables automatic scaling of cluster resources based on workload demands. Users can define scaling policies that automatically add or remove compute nodes based on predefined conditions, such as CPU utilization or queue length. This flexibility allows clusters to dynamically adapt to changing workload requirements, optimizing resource utilization and cost efficiency.
Customizable Software Stack: With AWS ParallelCluster, users can customize the software environment on their HPC clusters. It supports a wide range of HPC software packages and applications, allowing users to define their preferred tools, libraries, and compilers. This flexibility enables researchers and developers to work with the software stack they are familiar with and tailor it to their specific needs.
Integration with AWS Services: AWS ParallelCluster integrates with other AWS services, providing additional capabilities and services for HPC workloads. For example, users can leverage Amazon Elastic File System (EFS) or Amazon FSx for Lustre as shared file systems for data storage, AWS Identity and Access Management (IAM) for access control, and AWS Batch or AWS Lambda for task scheduling and job management.
Cost Optimization: AWS ParallelCluster offers various features to optimize costs. It provides detailed monitoring and logging capabilities to track resource utilization and identify potential cost-saving opportunities. Users can take advantage of spot instances to significantly reduce costs, leveraging spare EC2 capacity. Additionally, users can define policies to automatically terminate idle resources, ensuring that clusters are not consuming unnecessary resources when not in use.
Flexibility and Portability: AWS ParallelCluster is an open-source tool, allowing users to take advantage of its flexibility and portability. Users can modify and extend the tool's functionality to meet their specific requirements. It also supports hybrid and multi-cloud deployments, enabling users to extend their clusters across different cloud providers or on-premises environments.
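
As a rough illustration of the configuration items listed above (instance types, networking, shared storage, and scaling limits), the sketch below builds a cluster description in Python and prints it as JSON. The key names follow the ParallelCluster 3 configuration schema as we understand it. The region, subnet ID, key pair name, head node type, and the hpc6id.32xlarge instance size are placeholder assumptions rather than values from this article, so check everything against the current ParallelCluster documentation before use.

```python
import json


def build_cluster_config(region, instance_type, max_nodes, subnet_id, key_name):
    """Return a ParallelCluster 3-style cluster description as a plain dict.

    All argument values are supplied by the caller; nothing here is validated
    against AWS, so treat the result as a draft to review, not a tested config.
    """
    return {
        "Region": region,
        "Image": {"Os": "alinux2"},
        "HeadNode": {
            "InstanceType": "c6i.xlarge",  # assumed head node size
            "Networking": {"SubnetId": subnet_id},
            "Ssh": {"KeyName": key_name},
        },
        "Scheduling": {
            "Scheduler": "slurm",
            # Minutes an idle compute node waits before it is terminated.
            "SlurmSettings": {"ScaledownIdletime": 10},
            "SlurmQueues": [
                {
                    "Name": "compute",
                    "CapacityType": "ONDEMAND",  # SPOT is the other option
                    "Networking": {
                        "SubnetIds": [subnet_id],
                        "PlacementGroup": {"Enabled": True},
                    },
                    "ComputeResources": [
                        {
                            "Name": "hpc-nodes",
                            "InstanceType": instance_type,
                            "MinCount": 0,  # scale to zero when idle
                            "MaxCount": max_nodes,
                            "Efa": {"Enabled": True},
                        }
                    ],
                }
            ],
        },
        "SharedStorage": [
            {
                "Name": "scratch",
                "StorageType": "FsxLustre",
                "MountDir": "/fsx",
                "FsxLustreSettings": {
                    "StorageCapacity": 1200,  # GiB
                    "DeploymentType": "SCRATCH_2",
                },
            }
        ],
    }


if __name__ == "__main__":
    # Hypothetical placeholder values; replace with your own.
    config = build_cluster_config(
        region="us-east-2",
        instance_type="hpc6id.32xlarge",
        max_nodes=8,
        subnet_id="subnet-EXAMPLE",
        key_name="example-key",
    )
    print(json.dumps(config, indent=2))
```

Because YAML parsers generally accept JSON, the printed output could in principle be saved to a file and passed to the pcluster create-cluster command through its --cluster-configuration option, although that step is not tested here.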

Overall, AWS ParallelCluster simplifies the process of deploying and managing HPC clusters on AWS, providing flexibility, scalability, and cost optimization. It enables users to focus on their HPC workloads while leveraging the benefits of cloud infrastructure and services.

AWS ParallelCluster is available as an Intel Select Solution for simulation and modeling. Configurations are verified to meet the standards set by the Intel HPC Platform Specification, use specific Intel instance types, and are configured to use the Elastic Fabric Adapter (EFA) networking interface. AWS ParallelCluster is the first cloud solution to meet the requirements for the Intel Select Solutions program.

Intel Instances for HPC on AWS:

AWS and Intel have a 16+ year relationship dedicated to developing, building, and supporting cloud services that are designed to manage cost and complexity, accelerate business outcomes, and scale to meet current and future computing requirements. Intel® processors provide the foundation of many cloud computing services deployed on AWS. Amazon Elastic Compute Cloud (Amazon EC2) instances powered by Intel® Xeon® Scalable processors have the largest breadth, global reach, and availability of compute instances across AWS geographies.

Amazon EC2 Hpc6id instances, powered by 3rd Generation Intel Xeon Scalable processors, offer cost-effective price performance for memory-bound and data-intensive HPC workloads. Hpc6id instances deliver up to 2.2x better price performance than comparable x86-based instances for data-intensive HPC workloads such as finite element analysis (FEA).

AMD claims that Amazon EC2 Hpc6a instances offer the best price performance for compute-intensive HPC workloads in Amazon EC2. Most of the claimed cost savings can be attributed to AMD's higher core count per socket. The AMD and Intel instances differ in both core count and memory: the AMD instances have more cores, while the Intel instances have more memory. Intel instances have four times the memory capacity of the AMD instances, providing more headroom for HPC applications that need it.
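
Price-performance claims like the ones above come down to simple arithmetic: work completed per dollar spent. The sketch below shows one common way to compute and compare it. The function names and all of the numbers are hypothetical and are not measurements or prices for Hpc6id or Hpc6a.

```python
def price_performance(throughput_per_hour, price_per_hour):
    """Return work completed per dollar (for example, jobs per dollar)."""
    if price_per_hour <= 0:
        raise ValueError("price_per_hour must be positive")
    return throughput_per_hour / price_per_hour


def relative_price_performance(candidate, baseline):
    """Ratio of candidate to baseline price-performance.

    Each argument is a (throughput_per_hour, price_per_hour) tuple.
    A result above 1.0 means the candidate does more work per dollar.
    """
    return price_performance(*candidate) / price_performance(*baseline)


if __name__ == "__main__":
    # Hypothetical figures chosen only to make the arithmetic easy to follow.
    baseline = (10.0, 5.0)   # 10 jobs/hour at $5/hour -> 2 jobs per dollar
    candidate = (12.0, 4.0)  # 12 jobs/hour at $4/hour -> 3 jobs per dollar
    ratio = relative_price_performance(candidate, baseline)
    print(f"Candidate price-performance is {ratio:.2f}x the baseline")
```

Note that a higher ratio can come from faster runs, a lower hourly price, or both, which is why raw performance and price performance need to be reported separately when comparing instances.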

In this three-part blog series, we will compare the performance of Intel and AMD HPC instances on AWS for two common HPC workloads: molecular dynamics (GROMACS) and computational fluid dynamics (OpenFOAM). In part 2 of the series, we will look at GROMACS performance on the Intel and AMD instances.