Untying the new HCI and Virtualization Decision Knot – Part 3

IllyseSheaffer
Employee

We started this conversation by discussing the issues facing many data center managers with the recent disruption created by Broadcom’s purchase of VMware. Then, I identified key questions customers should ask themselves before making serious decisions while avoiding common landmines and traps.

First, I need to start with the fact that VMware is the most popular hypervisor in use today, and for good reason. No other single product can do everything that VMware provides in terms of a software-defined data center, and its service has kept pace with the changing needs of its customers (all of us) for decades. Those products and services are still there and aren’t going away.

Some things have changed recently, including VMware’s offerings and pricing structures, leaving many folks wondering what else is out there. In this final chapter of our conversation, we’ll look at some available options. Basically, what can we do?

Let's Take a Closer Look

If you’re in the “Virtualization Compute-Level-Only” space, you have some options:

  • KVM or Xen (open source)
  • Microsoft Hyper-V
  • Oracle Linux Virtualization Manager (based on KVM)
  • Red Hat OpenShift Virtualization
  • XenServer (Citrix hypervisor)

New UI Learning Curves

Not all of these have the look and feel many of us are accustomed to, so there is a learning curve, both for the UI and for how a different hypervisor organizes its options and views. This isn’t insurmountable, but it’s real, and like many things, it gets easier over time.

Are You an Over-Subscriber?

It’s critical to note that not all the options listed above will match the vCPU to CPU over-subscription we experience with VMware.

Today, with VMware, many customers running traditional VMs use compute overcommit to “over-subscribe” virtual CPUs to physical cores. If you’re running generic traditional workload VMs on the VMware hypervisor with hardware that is three to five years old or older, you are most likely seeing a vCPU-to-CPU oversubscription in the range of 2.5 to 3.5 (depending on the CPU and your specific workloads). With today’s Intel Xeon Scalable processors, you can safely provide a 4.5 to 5.0 vCPU to 1 CPU oversubscription with limited risk. Of course, your actual “mileage” will always vary based on your workloads, so understanding your over-subscription rates, and what is or is not supported by each option you consider, will affect your chosen solution. Pro Tip: Always validate your specific environments in advance to avoid risk.
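To make the oversubscription arithmetic concrete, here is a minimal Python sketch. The function names, the 2,000-vCPU estate, and the 64-core host are hypothetical values chosen for illustration; only the ratios come from the ranges discussed above. It ignores failover headroom and memory, so treat it as a first-pass estimate rather than a sizing tool.

```python
import math


def vcpu_capacity(physical_cores: int, ratio: float) -> int:
    """Return how many vCPUs a host can carry at a given vCPU-to-core ratio."""
    return math.floor(physical_cores * ratio)


def hosts_needed(total_vcpus: int, cores_per_host: int, ratio: float) -> int:
    """Return the host count needed to carry total_vcpus (no failover spare)."""
    return math.ceil(total_vcpus / vcpu_capacity(cores_per_host, ratio))


if __name__ == "__main__":
    # Hypothetical estate: 2,000 vCPUs on 64-core hosts.
    for ratio in (3.0, 4.5):
        print(f"ratio {ratio}: {hosts_needed(2000, 64, ratio)} hosts")
```

Running the same estate at two ratios shows how strongly the supported oversubscription of a hypervisor drives host count, which is why it is worth validating before you commit to an alternative.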

Fully Open Source or “Based” on Open Source?

While we’re on risk avoidance or acceptance, there are things to consider when choosing between a fully open-source Xen or KVM solution and a solution based on open source but backed by an Independent Software Vendor, like Red Hat OpenShift Virtualization, XenServer, or Oracle Linux Virtualization Manager. What’s the difference? Support. If you’re running critical workloads on a hypervisor, you most likely need to know that support is ready 24x7x365 rather than going online and waiting for a community response, which may or may not be helpful. On-demand support matters, especially during outages, and it does come at a cost.

What About HCI?

For those working with hyper-converged infrastructure (HCI) compute and storage virtualization, there are other options above and beyond VMware. This is not an exhaustive list, but it includes the ones I encounter most frequently:

  • Azure Stack HCI
  • Harvester (SUSE)
  • Nutanix
  • Red Hat OpenShift Virtualization with Ceph
  • Verge.io

If you keep up with industry reviews, you know that Nutanix rates at the top along with VMware for HCI, both in Gartner and in other industry reviews. Nutanix is unique in that it supports three different hypervisors on its software-defined storage: its own AHV hypervisor, VMware, or Hyper-V. Azure Stack HCI has been around for many years; however, it does require that both the compute and storage are Microsoft-based. Understanding the storage services an HCI solution offers could be critical if you need encryption at rest, snapshots, clones, or even deduplication; not all HCI solutions provide these services natively.

One major consideration with the Nutanix and Azure Stack HCI offerings is that they can be purchased as an engineered turnkey solution, meaning that the hardware used to run their software stack is built by the OEMs, down to the BIOS and drivers, specifically for what they offer.

The pro? Engineered systems provide ease of purchase and confidence that validated software and hardware are holistically supported and maintained. When you get updates or patches, the hardware and software updates have been validated to work together, reducing management and testing effort at the customer end. You can be confident that any issues have been addressed before the patches arrive at your portal to be applied.

The con? There is a lack of hardware flexibility. You cannot use your current on-hand hardware to build any of the engineered solutions. On 5/21/2024, Nutanix and Dell announced the Nutanix AHV hypervisor on PowerFlex storage offering. Azure Stack HCI has an additional option to overcome the flexibility objection: customer-installed hardware, chosen from its hardware compatibility list, running its software. This option requires more management of each node. The customer needs to apply patches individually and test to ensure the patches will not cause an invalid configuration.

All that said… one concern customers have is whether they can repurpose the hardware they invested in “if” they choose to leave the current solution. With all the options above, you can run other workloads on the hardware if you choose to leave later.

Also note that many HCI providers marry compute and storage virtualization into one offering, including Harvester, Azure Stack HCI, Verge.io, and Scale.io. Before 5/21/2024, Nutanix would have been part of this marriage, but it now provides an additional option with PowerFlex storage. Just note that flexibility comes at a cost, and understanding how you want to procure and build your HCI environment is a critical part of your decision.

What About Containers/Re-platforming?

Given the disruption and analysis required, this may be the time to consider re-platforming from traditional VMs to microservices. Doing so successfully takes both time and talent. It is not a quick fix for avoiding licensing costs on the front end; there could be savings down the road, but a full analysis is needed and the proper development teams would have to be assembled. Alternatively, is your company using containers today? Do you need or want to manage containers and traditional VMs in a single pane of glass? Not all hypervisors or HCI providers offer container and VM management in one dashboard. Requiring both in one tool will simplify your operations, but it will also narrow your choices.

Support

No matter which option you choose, review where your deployments are located and compare the support offered during business hours with the support offered outside them. Enterprise customers need consistent support, and waiting for business hours in the USA while you are in EMEA, especially during an outage, is frustrating. Do not make assumptions, as they may catch you out in the middle of a critical outage.

Other Overlooked Costs

There are other important features and details that are easy to miss. For example, disaster recovery/site recovery or automation: are these services turnkey today, and can they be integrated into another solution, or will they need to be redesigned or rebuilt, which may entail added resources and cost?

Be sure to validate that your most important ISV software will work on your chosen solution. Will your new ISV support the hardware you require for your workloads, like GPUs or other accelerators? One critical piece of software to validate is backup: is it supported on other hypervisors/HCI solutions? Is your current virtualization environment integrated with ServiceNow, BMC, or other ticketing or automation systems? Knowing your current investments and how they will integrate holistically into a new environment will take time, but it is worth the planning effort.

Moving VMware to a Public Cloud

Moving from an on-prem subscription to a VMware public cloud solution like VMware Cloud on AWS (VMC), Azure VMware Solution, or Google Cloud VMware Engine means moving from a CapEx to an OpEx model. If you are looking for reduced VMware subscription costs, you may not find savings here; depending on your current infrastructure, it may even cost you more. The pro of moving to a VMware public cloud offering is the ease of migration, but it can be like moving for the sake of moving. It does let you get out of the data center business and could let you modernize and re-platform your workloads on your own timeframe; however, you should ask yourself the ‘why’ behind this move, as chances are it will not decrease your total spend.

Going Cloud Native

I’d be remiss if I didn’t mention the option of going to a native cloud offering like AWS, Azure, or Google. Taking traditional VMs directly to a public cloud instance doesn’t always come with live migration or automatic failover of a VM, nor with disaster recovery. You can design the services you need for your most critical business needs into the solution. Understanding security and roles/responsibilities is also important as you move to native cloud. If you are looking at public cloud instances, my nugget to share is to understand your current networking rates and data usage per hypervisor; moving data into the cloud is typically free, yet moving it out is not. Companies often don’t think about networking costs until the bill catches up with them months later. Your networking team can help you understand the bandwidth within your data center.
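As a rough illustration of why per-hypervisor data usage matters, the following Python sketch totals outbound traffic and applies a per-GB egress rate. The host names, traffic volumes, rate, and free allowance are all hypothetical placeholders, not any provider's actual pricing; substitute the measured numbers from your networking team and the published rates of the provider you are evaluating.

```python
def monthly_egress_cost(
    outbound_gb_per_host: dict[str, float],
    price_per_gb: float,
    free_allowance_gb: float = 0.0,
) -> float:
    """Estimate a monthly egress bill from per-hypervisor outbound traffic.

    All inputs are assumptions supplied by the caller; no real provider
    pricing is built in.
    """
    total_gb = sum(outbound_gb_per_host.values())
    billable_gb = max(total_gb - free_allowance_gb, 0.0)
    return billable_gb * price_per_gb


if __name__ == "__main__":
    # Hypothetical measurements: GB leaving each hypervisor per month.
    measured = {"host-01": 1200.0, "host-02": 800.0}
    # Hypothetical rate and allowance, for illustration only.
    print(monthly_egress_cost(measured, price_per_gb=0.05, free_allowance_gb=100.0))
```

Even a flat-rate model like this makes the point: the cost scales with data leaving the cloud, so workloads that are chatty toward on-prem systems deserve a closer look before they move.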

While Weighing Your Options to Stay or Go

After all of this…let’s talk about optimizing your current investment in VMware.

One of the most common questions I receive is, “How do I improve consolidation costs in this new pricing environment?”

Please note that at the time of writing, VMware subscription pricing is per core, with a minimum of 16 cores per socket. A 24-core or 28-core socket has the same per-core subscription cost as a 16-core socket; there are no per-core discounts for using more cores. I mention this because some people now focus on buying only 16-core-socket hypervisors for their infrastructure, while others are asking for the largest core counts available. 16-core-socket hypervisors may be a great solution for a smaller environment with minimal VMs. The largest core-count sockets may be a good fit for multiple thousands of VMs, but as you consolidate VMs with a 4.5 vCPU to 1 CPU oversubscription, those VMs also require memory. As such, your core count might not be the biggest cost; it could be the memory.
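The per-core model with a 16-core floor can be sketched in a few lines of Python. This is an illustrative sketch only: it assumes the minimum applies per socket, the function names are made up for this example, and the price of 100 per core is a placeholder rather than a real list price.

```python
MIN_LICENSED_CORES_PER_SOCKET = 16  # assumption: the 16-core floor is per socket


def licensed_cores(cores_per_socket: int, sockets: int) -> int:
    """Return the billable core count for one host under a per-socket minimum."""
    return max(cores_per_socket, MIN_LICENSED_CORES_PER_SOCKET) * sockets


def subscription_cost(cores_per_socket: int, sockets: int, price_per_core: float) -> float:
    """Return the subscription cost for one host at a flat per-core price."""
    return licensed_cores(cores_per_socket, sockets) * price_per_core


if __name__ == "__main__":
    price = 100.0  # hypothetical per-core price, for illustration only
    for cores in (8, 16, 24, 28):
        cost = subscription_cost(cores, sockets=2, price_per_core=price)
        print(f"{cores}-core sockets: {licensed_cores(cores, 2)} billable cores, cost {cost}")
```

Under these assumptions, a socket with fewer than 16 cores is billed as if it had 16, and every core above 16 costs exactly the same as the first 16, so the choice of core count comes down to consolidation and memory rather than a license discount.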

Memory cost is another important factor to keep in mind. The more VMs you add, the more memory they require; thus, understanding those costs is very important. Today, a 128GB DDR5 DIMM is almost 4x the cost of a 64GB DDR5 DIMM. 64GB and 96GB DDR5 DIMMs are a sweet spot. Intel provides 8 memory channels per socket with 2 DIMMs per channel, allowing you to size your memory correctly at the lowest cost while getting incredible memory bandwidth performance. A third impact of consolidating or densifying is networking capacity: if you are still on 10GbE, it may no longer perform well enough for acceptable failover times and storage needs unless you add more 10GbE ports to the host. Also consider what is sometimes called blast radius: what ratio of VMs to a hypervisor are you comfortable with? We already talked about “your mileage will vary” by workload, so I advise testing this first to reduce that exposure.
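The DIMM trade-off can be worked through with a small Python sketch. It assumes 8 channels per socket with 2 DIMMs per channel (16 slots), normalizes a 64GB DIMM to a cost of 1.0, and approximates the 128GB DIMM at 4.0 based on the "almost 4x" figure above. No relative price is given here for 96GB DIMMs, so they are left out rather than guessed; the names and the 1TB target are hypothetical.

```python
import math

CHANNELS_PER_SOCKET = 8
DIMMS_PER_CHANNEL = 2
SLOTS_PER_SOCKET = CHANNELS_PER_SOCKET * DIMMS_PER_CHANNEL  # 16 slots

# Relative cost per DIMM, normalized to the 64GB part (128GB is "almost 4x").
RELATIVE_DIMM_COST = {64: 1.0, 128: 4.0}


def dimms_for_capacity(target_gb: int, dimm_gb: int) -> int:
    """Return the DIMM count needed per socket, or raise if the slots run out."""
    needed = math.ceil(target_gb / dimm_gb)
    if needed > SLOTS_PER_SOCKET:
        raise ValueError(f"{target_gb}GB needs {needed} x {dimm_gb}GB DIMMs; only {SLOTS_PER_SOCKET} slots")
    return needed


def relative_memory_cost(target_gb: int, dimm_gb: int) -> float:
    """Return the memory cost per socket in units of one 64GB DIMM."""
    return dimms_for_capacity(target_gb, dimm_gb) * RELATIVE_DIMM_COST[dimm_gb]


if __name__ == "__main__":
    # Hypothetical target: 1TB (1024GB) per socket.
    for size in (64, 128):
        print(f"{size}GB DIMMs: {dimms_for_capacity(1024, size)} DIMMs, relative cost {relative_memory_cost(1024, size)}")
```

With these assumptions, reaching 1TB per socket with 128GB DIMMs costs about twice as much as with 64GB DIMMs and leaves half the channels' second slots empty; the larger DIMM only becomes necessary when the capacity target exceeds what 16 smaller DIMMs can provide.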

One last nugget to consider: To optimize for license costs, consider separating your Windows VMs from your Linux VMs on different clusters or hypervisors with affinity.

Are you thinking the possibilities seem endless? We get it. This can feel like every stone turned over reveals a new one underneath. The good news is that we’re here to help you find the best opportunities and avoid the pitfalls others frequently overlook. It’s certainly worth giving some thought....

 

Notices and Disclaimers

Performance varies by use, configuration, and other factors. Learn more on the Performance Index site.
Performance results are based on testing as of dates shown in configurations and may not reflect all publicly available updates. See backup for configuration details. No product or component can be absolutely secure.
Your costs and results may vary.
Intel technologies may require enabled hardware, software, or service activation.
© Intel Corporation. Intel, the Intel logo, and other Intel marks are trademarks of Intel Corporation or its subsidiaries. Other names and brands may be claimed as the property of others.