Alphabet Soup: How to Pick the Right AWS EC2 Instance

Chris_Norman · 06-20-2023

In this article, I explore the tools that Amazon gives you to understand Amazon Web Services (AWS) instance capabilities, crack the code of the alphabet soup naming scheme, and highlight the importance of understanding your workload bottlenecks. I also touch on some of the acceleration capabilities in 4th Generation Intel® Xeon® Scalable Processors that we’re taking advantage of in open source software.

One of the first steps in my exploration of open source cloud native technologies was to fire up an Amazon Elastic Compute Cloud (EC2) instance to try some ideas out. Armed with newly minted AWS credentials, and forewarned by AWS Cloud Practitioner Essentials training, I knew there was a wide range of AWS instances to choose from. I wasn’t quite prepared for just how many there are, though.

How do you choose which one is the right one?

Fortunately, there is an open source command-line tool from AWS, amazon-ec2-instance-selector, that can help guide us. I followed the instructions to install it on my Ubuntu system:

curl -Lo ec2-instance-selector https://github.com/aws/amazon-ec2-instance-selector/releases/download/v2.4.1/ec2-instance-selector-`uname | tr '[:upper:]' '[:lower:]'`-amd64 && chmod +x ec2-instance-selector

Then I added my credentials and default region:

export AWS_ACCESS_KEY_ID="..."
export AWS_SECRET_ACCESS_KEY="..."
export AWS_REGION="us-west-2"

Then I could use the tool to list all the EC2 instance types:

./ec2-instance-selector

At the time of writing, the us-west-2 region offered 628 instance types in total. Filtering those by CPU architecture,

./ec2-instance-selector --cpu-architecture x86_64

shows that 497 of them are based on Intel Architecture. (The x86_64 filter also matches AMD-based instances, so this counts instances that run Intel Architecture code, not only instances with Intel processors.) There’s a nice interactive mode that lets you narrow down the options by parameters such as architecture, memory size and network performance:

./ec2-instance-selector -o interactive
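The two counts quoted above can be turned into the 79% share that comes up later in this article. The following is a minimal sketch, not part of the tool itself: count_types and share_pct are hypothetical helper names, and count_types assumes the selector's default output prints one instance type per line.

```shell
# Count the instance types that match a set of selector filters.
# Assumption: the default output is one instance type per line.
count_types() {
  ./ec2-instance-selector "$@" | wc -l
}

# share_pct PART TOTAL: print PART as a percentage of TOTAL, one decimal place.
share_pct() {
  awk -v part="$1" -v total="$2" 'BEGIN { printf "%.1f\n", part / total * 100 }'
}

# Using the counts quoted above (497 x86_64 types out of 628):
share_pct 497 628   # prints 79.1

# To recompute from live data (needs AWS credentials and network access):
#   share_pct "$(count_types --cpu-architecture x86_64)" "$(count_types)"
```

The live numbers will drift as AWS adds instance types, so expect a different result from the one quoted here.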

As I want to explore some of the work our Intel engineers are doing, it makes sense for me to choose an Intel Architecture-based instance, but how do I narrow it down any further?

Amazon breaks down the various instance types on their website by category: general purpose, compute optimized, memory optimized, accelerated computing, storage optimized and HPC optimized.

Similarly, we have a page that highlights the Intel instances. Looking at all these, we can start to see some patterns in the naming scheme, but it still seems arbitrary.

There must be a way to crack the code of all these instance types. A few Google searches led me to Justin Garrison, who gives a good overview of how the instances are named. The first letter is the family or workload category (general purpose, compute, memory, etc.). The number is the generation, from 1 to 7. (This explains why our processors from the same family end up with different generation numbers, depending on the family or category of AWS instance.) The next letter or letters indicate capabilities or attributes of the instance, for example i for an Intel processor, d for instance store volumes or n for network optimizations. Lastly, the descriptor at the end indicates the size of the instance.
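To make the naming scheme concrete, here is a small sketch that splits an instance type into those four parts. The pattern is my own simplification for illustration, not an official AWS grammar, and parse_instance_type is a hypothetical helper name. Names that don't fit the family/generation/attributes/size shape simply produce no output.

```shell
# Split an EC2 instance type such as "r7iz.2xlarge" into
# family letters, generation number, attribute letters and size.
# Simplified pattern for illustration; prints nothing if the name does not fit.
parse_instance_type() {
  printf '%s\n' "$1" | sed -n -E \
    's/^([a-z]+)([0-9]+)([a-z-]*)\.([a-z0-9-]+)$/family=\1 generation=\2 attributes=\3 size=\4/p'
}

parse_instance_type r7iz.2xlarge   # family=r generation=7 attributes=iz size=2xlarge
parse_instance_type m5.large       # family=m generation=5 attributes= size=large
```

In the first example, r is the memory optimized family, 7 the generation, iz the attribute letters and 2xlarge the size. The second has no attribute letters at all.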

Knowing the characteristics of your workload is key to ensuring that you pick the right size of instance. While vCPU count and memory tend to increase linearly with instance size, network and storage performance may not, and they are a lot trickier to figure out. Benchmarking your workload on your laptop is no substitute for trying it out in an EC2 production environment, and you should ensure that you can collect data in production to optimize your instances.
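One way to see how these figures scale across sizes is to ask the EC2 API directly. This is a rough sketch that assumes the AWS CLI is installed and configured with credentials and a default region; the AWS CLI is not otherwise used in this article. compare_sizes is a hypothetical helper name, and the m6i family in the example call is just an illustrative choice.

```shell
# Print vCPUs, memory (MiB) and the advertised network performance
# for each instance type passed as an argument.
# Assumption: the AWS CLI is installed and configured.
compare_sizes() {
  aws ec2 describe-instance-types \
    --instance-types "$@" \
    --query 'InstanceTypes[].[InstanceType, VCpuInfo.DefaultVCpus, MemoryInfo.SizeInMiB, NetworkInfo.NetworkPerformance]' \
    --output table
}

# Example call (makes a live API request, so it is left commented out):
#   compare_sizes m6i.large m6i.xlarge m6i.2xlarge
```

The table only shows advertised figures. It is a starting point for choosing sizes to benchmark, not a replacement for measuring your own workload.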

Of course, cost plays a key part in deciding which instance to use. But in his article “A case for a balanced approach with Intel based instances for the AWS Well-Architected framework”, my colleague Mohan Potheri makes a compelling argument that using cost per core as the sole metric is flawed, and that a more nuanced approach to matching workload to instance type should be considered. He argues that there are risks in migrating from Intel to non-Intel architectures if re-platforming is the main approach to cost reduction: a potential loss in performance due to the absence of Intel hardware and software features in the target platform, and additional operational burdens in supporting that new platform. On the other hand, there are significant opportunities to improve performance and sustainability, and to reduce costs, by leveraging the hardware acceleration and the software optimizations for workloads running on Intel instances.

When 79% of the instance types in the Amazon EC2 cloud are based on Intel Architecture (497 of the 628 listed for us-west-2 at the time of writing), it’s a strong bet that the code you’re running has been compiled for Intel Architecture. No debugging of code due to architecture porting issues will be needed (I can’t promise that there won’t be other bugs, though). We have many software engineers working on hardware drivers, firmware, middleware and frameworks, as well as optimization work in runtime languages and applications. The work we do lower in the stack helps the developers higher up the stack. Intel-optimized runtime and framework code makes code run faster and more efficiently. Optimizing workloads for Intel instances and adjusting the instance sizing appropriately is a prudent way to mitigate risk and benefit from the associated cost savings.

Potheri also talks about taking advantage of the built-in accelerators of the latest 4th Gen Intel® Xeon® Scalable processors which include features like:

- Intel® Advanced Matrix Extensions (Intel® AMX) significantly accelerates HPC, deep learning training and inference, ideal for workloads like risk and fraud detection, genomic sequencing, climate science, natural language processing, recommendation systems and image recognition.

- Intel® In-Memory Analytics Accelerator (Intel® IAA) helps run database and analytics workloads faster, ideal for in-memory databases, open-source databases, and data stores.

- Intel® Data Streaming Accelerator (Intel® DSA) drives high performance for storage, networking, and data-intensive workloads by improving streaming data movement and transformation operations.

- Intel Xeon CPU Max Series provides high bandwidth memory that helps reduce bottlenecks of memory-bound applications and deliver much improved performance for modeling, AI, HPC and data analytics.
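On a Linux instance, one quick way to check whether the CPU exposes Intel AMX is to look for the amx_* flags that recent kernels report in /proc/cpuinfo (for example amx_tile, amx_bf16 and amx_int8). This is a minimal sketch, and amx_flags is a hypothetical helper name. It only detects AMX; it says nothing about Intel IAA or Intel DSA, which are separate devices.

```shell
# Print the distinct AMX-related CPU flags found in a cpuinfo-style file.
# Defaults to /proc/cpuinfo (Linux only); prints nothing if none are found.
amx_flags() {
  grep -o -w 'amx_[a-z0-9_]*' "${1:-/proc/cpuinfo}" 2>/dev/null | sort -u
}

if [ -n "$(amx_flags)" ]; then
  echo "AMX flags found:"
  amx_flags
else
  echo "No AMX flags reported on this machine"
fi
```

An empty result can also mean the kernel is too old to report these flags, so treat it as a hint, not proof.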

The 4th Gen Intel® Xeon® Scalable processors (Sapphire Rapids) power the R7iz instances, which aren’t generally available in AWS just yet. They are being previewed, though, and you can read more about them or sign up on the preview list to try them out. You know I will be itching to explore these new capabilities as they become more widely available. Pass the soup!

About the Author

Chris Norman is an Open Source Advocate who has promoted the use of open source ecosystems for over a decade. You can find him as pixelgeek on Twitter, Mastodon, IRC and GitHub.

Photo by Sigmund on Unsplash