
Poor network performance with Intel 810-C and 128 Core Systems

SecurityFun23
Novice
330 Views

Hello!

 

We have been seeing network performance problems with some of our new servers, and we are hoping to get some advice on a possible fix and/or tuning.

 

First, some background.

These powerful servers are running either 2x AMD EPYC 7H12 (64 Core) or 2x AMD EPYC 7763 (64 Core) processors with 512 MB of RAM. That is a total of 128 cores / 256 vCPUs. They all have Intel 810-C network cards.

 

After a clean reboot of the servers, performance is good for a few minutes, but it degrades over the next 10 or so minutes and tends to get even worse over time.

 

The easiest way to explain this is with a Ceph benchmark test between some of the servers. When the servers are working well, we are getting around 4,500 MB/s write speed. After 10 minutes or so, that speed starts dropping randomly to under 1,000 MB/s, and over time it gets to the point that it is pretty consistently under 100 MB/s, although it will still sometimes jump back up to better performance. So it's very random.

 

We have tested this with Ubuntu 20.04, Ubuntu 22.04, CentOS Stream 9, and Rocky Linux 9.1. We have tried the driver included with the kernel as well as the Intel-provided ice driver, versions 1.9.11 and 1.10.1.2.2.

 

We have the identical configuration working with excellent network performance on systems based on 2x AMD EPYC 7302 (16 Core) processors. So we suspect that this has something to do with the way the network interfaces interact with so many processors.

 

Now, after a lot of testing, we have found that if we decrease the number of channels available to the interface via "ethtool -L <interface> combined 8", we actually get substantially better performance. I say "better", but instead of the roughly 4,500 MB/s we see when things are good in the beginning, we are seeing 1,500-2,000 MB/s. This is much better than widely varying performance that sometimes drops to 0, but it's still not as good as the other, lower-spec servers that we have.
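For anyone who wants to apply the same channel setting to several ports in one go, here is a minimal sketch that wraps the ethtool call quoted above. The helper names `channel_cmd` and `set_channels` are made up for illustration, the interface name in the comment is only an example, and running it requires root and an installed ethtool.

```python
import subprocess


def channel_cmd(iface, combined):
    """Build the ethtool command that sets the combined channel count."""
    if combined < 1:
        raise ValueError("combined channel count must be at least 1")
    return ["ethtool", "-L", iface, "combined", str(combined)]


def set_channels(ifaces, combined):
    """Apply the same combined channel count to every listed interface.

    Raises subprocess.CalledProcessError if ethtool rejects the change.
    """
    for iface in ifaces:
        subprocess.run(channel_cmd(iface, combined), check=True)


# Example (as root, interface name is a placeholder):
#   set_channels(["enp33s0f0"], 8)
```

Keeping the command construction separate from the call that runs it makes it easy to print and review the command before touching a production interface.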

Normally, increasing the number of channels should boost performance, and the number of channels normally defaults to the number of vCPUs. However, in this case the default number of channels for a system with 256 vCPUs is 128 or 252 (I think this depends on the firmware or driver version). With this many channels, performance is inconsistent and varies widely. I don't understand exactly why this occurs, or why lowering the number of channels to 4, 8, or 16 helps. I suspect there is some sort of interrupt contention or something similar going on. One thing that I did notice is that in "/proc/interrupts" the interrupt counts for the ice interrupts vary from a few tens of thousands to a few million instead of being evenly distributed.
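The uneven interrupt counts mentioned above can be summarized with a small script instead of reading /proc/interrupts by eye. This is only a sketch: it assumes the ice queue interrupts are named with an "ice-" prefix (for example "ice-enp33s0f0-TxRx-0"), which may differ between driver versions, and the function names are invented for illustration.

```python
def irq_totals(interrupts_text, match="ice-"):
    """Sum per-CPU counts for every IRQ whose name contains `match`.

    `interrupts_text` is the full content of /proc/interrupts.
    Returns a dict mapping IRQ name to its total interrupt count.
    """
    lines = interrupts_text.splitlines()
    ncpu = len(lines[0].split())  # header row: CPU0 CPU1 ...
    totals = {}
    for line in lines[1:]:
        _label, sep, rest = line.partition(":")
        fields = rest.split()
        # Skip summary rows (ERR, MIS, ...) that lack per-CPU columns.
        if not sep or len(fields) <= ncpu:
            continue
        name = fields[-1]
        if match in name:
            totals[name] = sum(int(f) for f in fields[:ncpu])
    return totals


def spread(totals):
    """Return (lowest, highest) total across the matched IRQs."""
    counts = sorted(totals.values())
    return counts[0], counts[-1]


# Example on a live system:
#   with open("/proc/interrupts") as f:
#       print(spread(irq_totals(f.read())))
```

A large gap between the lowest and highest totals is the kind of skew described above; comparing the numbers right after boot and again after the slowdown sets in may show which queues are affected.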

So to summarize, we think that there is some sort of contention that occurs when using the Intel 810-C cards with this many cores, and we are hoping that someone can help us figure out how to tune this properly.

 

Thank you!


12 Replies
Mike_Intel
Moderator
287 Views

Hello SecurityFun23,


Thank you for posting in Intel Ethernet Communities. 


For us to further check the issue, please provide the following details.


  1. Are you using onboard 810-C or is it a PCIe card?
  2. Can you share some screenshots of the issue?
  3. Can you share the link of the driver that you are using?


If you have questions, please let us know. In case we do not hear from you, we will follow up after 3 working days. Thank you.


Best regards,

Michael L.

Intel® Customer Support


SecurityFun23
Novice
278 Views
Hello!

Thank you for your response. To answer your questions:

1) These are PCI express cards.

2) I tried to explain the general issue above. I'm not exactly sure what screenshots would be useful for you. We could share a dump of a Ceph performance test showing the varying throughput if that is helpful.

3) Here is the link to the latest ice driver which we have tested with:

https://sourceforge.net/projects/e1000/files/ice%20stable/1.10.1.2.2/ice-1.10.1.2.2.tar.gz/download

We have also just used the Ubuntu 22.04 default driver with the same results:
$ ethtool -i enp33s0f0
driver: ice
version: 5.15.0-56-generic
firmware-version: 4.00 0x800139bc 21.5.9
expansion-rom-version:
bus-info: 0000:21:00.0
supports-statistics: yes
supports-test: yes
supports-eeprom-access: yes
supports-register-dump: yes
supports-priv-flags: yes
Mike_Intel
Moderator
258 Views

Hello SecurityFun23,


Thank you for the prompt reply. For us to further investigate the issue, please provide the following details:


  1. Please provide photos of the cards on both sides for us to validate if the cards are not OEM units.
  2. Please generate the SSU logs of the system for us to check. Kindly download the utility below:


https://www.intel.com/content/www/us/en/download/18895/intel-system-support-utility-for-the-linux-op...


If you have questions, please let us know. In case we do not hear from you, we will follow up after 3 working days. Thank you.


Best regards,

Michael L.

Intel® Customer Support


SecurityFun23
Novice
235 Views

Michael,

 

These systems are primarily Dell systems with Dell-provided Intel 810-C cards. We can provide you an SSU log if that's helpful, but I'd rather send that directly than post it on the public forum, if that's possible. That said, I'm hoping there is some general guidance about performance when using Intel cards on systems with a large number of vCPUs that can be shared.

 

We appreciate anything you can provide!

 

Mike_Intel
Moderator
227 Views

Hello SecurityFun23,


Thank you for the prompt reply. As much as we would like to look into the issue further, you mentioned that the cards are from Dell, which means they are all OEM units. Please understand that we have limited support for OEM units because they may have been altered by the OEM. They should have custom drivers, software, and firmware for the cards. It is best to contact Dell for further assistance.


If you have questions, please let us know. In case we do not hear from you, we will follow up after 3 working days. Thank you.


Best regards,

Michael L.

Intel® Customer Support


SecurityFun23
Novice
224 Views
We understand you may not be able to provide direct support for these specific cards. However, we would appreciate any information, guidelines, white papers, etc. you may have as to how to configure these cards in general with large multi core systems. Thank you!
Mike_Intel
Moderator
203 Views

Hello SecurityFun23,


I understand. Let me check what we can recommend regarding the issue that you encountered. Please give us 2 to 3 working days to provide an update. We may also ask for some additional details to investigate the issue.


Best regards,

Michael L.

Intel® Customer Support


Mike_Intel
Moderator
168 Views

Hello SecurityFun23,


Thank you for patiently waiting for our update.  


Upon further checking, and as much as we want to assist you, the components are manufactured by Dell and we do not support OEM units. We understand that an Intel Ethernet chipset is embedded in the Dell adapter, but it would still be best to get in touch with Dell for further support because they have already customized the firmware, driver, and software to enable or alter features to best suit your system. They are in the best position to assist you so that you do not lose features or customizations of the Ethernet adapter.


Please accept our apology for the inconvenience.

 

If you have questions, please let us know. In case we do not hear from you, we will follow up after 3 working days.

Thank you.

 

Best regards,

Michael L.

Intel® Customer Support


Mike_Intel
Moderator
143 Views

Hello SecurityFun23,

 

I hope you're having a wonderful day. I am just sending a soft follow-up, hoping that you are now talking with Dell for further assistance.

 

If you have questions, please let us know. In case we do not hear from you, we will follow up after 3 working days.

Thank you.

 

Best regards,

Michael L.

Intel® Customer Support


Mike_Intel
Moderator
113 Views

Hello SecurityFun23,

 

I hope this message finds you well. I am just sending another follow up hoping that you are now talking with Dell for further assistance. Since we have not heard back from you, I need to close this inquiry. 


If you need any additional information, please submit a new question as this thread will no longer be monitored.


Thank you and stay safe.

 

Best regards,

Michael L.

Intel® Customer Support


SecurityFun23
Novice
106 Views

Michael,

Sorry, I have been slow in responding.  Yes, we have been in contact with Dell on this as well.  However, the most useful information has come out of our own investigations.  

Following up on our initial finding: lowering the channels to 8 instead of the automatic value of 128 (for the 1.10.x driver) or 252 (for the 1.9.x driver) helps bring the performance back to an "alright level". What we have seen now is that if you set the channels to a higher value such as 32 or 64, you can maintain good performance BUT only if you do this right away, before putting load onto the system. Once the poor performance sets in, adjusting the channels will never get you back to the full expected performance (although it will make the performance more consistent).

We also found that setting the processor to use "linear" for the "MADT Core Enumeration" in the BIOS was important to get reliable performance. Such a setting may have a different name on some systems, but its effect is to make the CPU numbers in /proc/cpuinfo run through the first processor, and then the second processor. The other option is "Round Robin". In that case, the CPUs are allocated such that the odd numbers come from one processor and the even numbers from the other processor. I believe the reason this makes a difference is that the Intel drivers automatically allocate the IRQs to CPU cores starting at the beginning and counting up. So if you use linear, the IRQs will be completely (or mostly, depending on the number of channels) allocated to the first NUMA node. If you use round robin, they will be split across the 2 NUMA nodes. This makes me speculate that if you had your network cards in the second NUMA node (ours are in the first), then you might have to use the "intel-balance.sh" with a custom set of vCPUs to force the balancing to the second NUMA node.
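To make the linear versus round-robin reasoning concrete, here is a toy model of where queue interrupts would land under each enumeration. It rests on simplifying assumptions that are not confirmed anywhere in this thread: one NUMA node per socket, 64 CPU numbers per node counted before moving to the next, and queue N's IRQ being placed on CPU N. Real numbering also depends on SMT siblings and BIOS NUMA settings, so treat it as an illustration only and check the actual layout with lscpu.

```python
from collections import Counter


def node_of_cpu(cpu, mode, cpus_per_node=64, nodes=2):
    """Map a CPU number to a NUMA node under a simplified enumeration."""
    if mode == "linear":
        # CPU numbers fill the first node, then the second, and so on.
        return (cpu // cpus_per_node) % nodes
    if mode == "round_robin":
        # Consecutive CPU numbers alternate between the nodes.
        return cpu % nodes
    raise ValueError("mode must be 'linear' or 'round_robin'")


def queues_per_node(num_queues, mode, cpus_per_node=64, nodes=2):
    """Count how many queue IRQs land on each node if queue i uses CPU i."""
    return dict(Counter(node_of_cpu(q, mode, cpus_per_node, nodes)
                        for q in range(num_queues)))


print(queues_per_node(32, "linear"))       # all 32 queues on node 0
print(queues_per_node(32, "round_robin"))  # 16 queues on each node
```

Under this model, 32 channels with linear enumeration stay entirely on the first node, while round robin splits them evenly across both, which matches the behavior described above.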

The fact that the default number of channels leads to good performance initially, which then drops and varies widely, makes me think there is still a bug in the drivers, the kernel, or the cards themselves that doesn't play well with so many channels.

However, we are no longer blocked, and we will continue to monitor this issue. We are getting decent results with this method, but we are not sure whether the results will remain good over a long period or whether performance will degrade over time.

 
 
SecurityFun23
Novice
101 Views

Hello!

 

I just found the recently updated document (Updated last week) - "Intel® Ethernet 800 Series Linux Performance Tuning Guide, NEX Cloud Networking Group (NCNG), January 2023" at https://cdrdv2-public.intel.com/636781/800%20Series%20Linux%20Performance%20Tuning%20Guide_Rev1.1.pd...

 

This seems to refer to the exact issue we have been having. I have pasted section 5.2 below. It specifically states that you may need to reduce the number of channels on high CPU count systems due to resource contention! In other words, this is exactly what we were suspecting from the beginning. If there is any more detail available on this "resource contention", I would be interested in understanding it, as this could help us better tune the systems in the future.

 

5.2 Tx/Rx Queues

The default number of queues enabled for each Ethernet port by the driver at initialization is equal to the total number of CPUs available in the platform. This works well for many platform and workload configurations. However, in platforms with high core counts and/or high Ethernet port density, this configuration can cause resource contention. Therefore, it might be necessary in some cases to modify the default for each port in the system.

It is recommended in these cases to reduce the default queue count for each port to no more than the number of CPUs available in the NUMA node local to the adapter port. In some cases, when attempting to balance resources on high port count implementations, it might be necessary to reduce this number even further.

• To modify queue configuration:
The following example sets the port to 32 Tx/Rx queues:

     ethtool -L ethX combined 32

Example output:

ethtool -l ethX
Channel parameters for ethX:
Pre-set maximums:
RX: 96
TX: 96
Other: 1
Combined: 96
Current hardware settings:
RX: 0
TX: 0
Other: 1
Combined: 32
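The guide's recommendation (no more queues than CPUs in the NUMA node local to the port) can be checked with a short script. This is a sketch under assumptions: the function names are invented here, the parsing only expects the "Current hardware settings" and "Combined:" labels shown in the example output above, and the sysfs paths in the trailing comment are the usual Linux locations but may be missing or report node -1 on some platforms.

```python
def parse_cpulist(text):
    """Expand a sysfs cpulist such as '0-63,128-191' into CPU numbers."""
    cpus = []
    for part in text.strip().split(","):
        if not part:
            continue
        lo, _, hi = part.partition("-")
        cpus.extend(range(int(lo), int(hi or lo) + 1))
    return cpus


def current_combined(ethtool_l_output):
    """Return the Combined value from the current-settings section."""
    in_current = False
    for line in ethtool_l_output.splitlines():
        line = line.strip()
        if line.startswith("Current hardware settings"):
            in_current = True
        elif in_current and line.startswith("Combined:"):
            return int(line.split(":", 1)[1])
    raise ValueError("no current Combined value found")


def within_local_cpus(combined, local_cpus):
    """True if the queue count does not exceed the NUMA-local CPU count."""
    return combined <= len(local_cpus)


# On a live system the inputs would typically come from:
#   /sys/class/net/<iface>/device/numa_node        (node of the adapter)
#   /sys/devices/system/node/node<N>/cpulist       (CPUs in that node)
#   the output of: ethtool -l <iface>
```

Note that the cpulist includes SMT siblings, so on a system with hyperthreading enabled the NUMA-local count is the number of logical CPUs, not physical cores.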
