Hello!
I have a problem with an Intel XL710: once traffic exceeds 22 Gbps, some processor cores reach 100% IRQ load and throughput drops.
The latest firmware and driver are installed:
# ethtool -i ens2
driver: i40e
version: 2.24.6
firmware-version: 9.40 0x8000ecc0 1.3429.0
expansion-rom-version:
bus-info: 0000:d8:00.0
supports-statistics: yes
supports-test: yes
supports-eeprom-access: yes
supports-register-dump: yes
supports-priv-flags: yes
I read this guide
and tried every configuration it suggests, but nothing changed:
set_irq_affinity local ens2
set_irq_affinity all ens2
ethtool -C ens2 adaptive-rx off adaptive-tx off rx-usecs 0 tx-usecs 0
ethtool -C ens2 adaptive-rx off adaptive-tx off rx-usecs 5000 tx-usecs 20000
ethtool -C ens2 adaptive-rx off adaptive-tx off rx-usecs 10000 tx-usecs 20000
ethtool -g ens2
ethtool -C ens2 adaptive-rx off adaptive-tx off rx-usecs 84 tx-usecs 84
ethtool -C ens2 adaptive-rx off adaptive-tx off rx-usecs 62 tx-usecs 62
ethtool -S ens2 | grep drop
ethtool -S ens2 | grep drop
ethtool -S ens2 | grep drop
ethtool -S ens2 | grep drop
ethtool -C ens2 adaptive-rx off adaptive-tx off rx-usecs 336 tx-usecs 84
ethtool -S ens2 | grep drop
ethtool -S ens2 | grep drop
ethtool -S ens2 | grep drop
ethtool -C ens2 adaptive-rx off adaptive-tx off rx-usecs 672 tx-usecs 84
ethtool -S ens2 | grep drop
ethtool -S ens2 | grep drop
ethtool -S ens2 | grep drop
ethtool -S ens2 | grep drop
ethtool -S ens2 | grep drop
ethtool -S ens2 | grep drop
ethtool -C ens2 adaptive-rx off adaptive-tx off rx-usecs 8400 tx-usecs 840
ethtool -C ens2 adaptive-rx off adaptive-tx off rx-usecs 4200 tx-usecs 840
ethtool -S ens2 | grep drop
ethtool -S ens2 | grep drop
ethtool -S ens2 | grep drop
ethtool -S ens2 | grep drop
ethtool -C ens2 adaptive-rx off adaptive-tx off rx-usecs 4200 tx-usecs 1680
ethtool -S ens2 | grep drop
ethtool -S ens2 | grep drop
ethtool -C ens2 adaptive-rx off adaptive-tx off rx-usecs 3200 tx-usecs 3200
ethtool -a ens2
ethtool ens2
ethtool -i ens2
ethtool -i ens2
ethtool ens2
ethtool -C ens2 adaptive-rx on
ethtool -c ens2
ethtool -C ens2 adaptive-tx on
ethtool -c ens2
history | grep ens2
ethtool -m ens2
ethtool -C ens2 adaptive-rx off adaptive-tx off rx-usecs 4200 tx-usecs 1600
history | grep ens2
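For orientation when reading the rx-usecs values tried above: with adaptive moderation off, rx-usecs acts as a minimum gap between interrupts on a queue, so the per-queue interrupt rate is capped at roughly 1,000,000 / rx-usecs. The sketch below only does that arithmetic. The helper name `max_irq_rate` is hypothetical, and it does not check whether the driver accepts or rounds a given value.

```shell
# Upper bound on interrupts per second for one queue at a fixed rx-usecs
# value (adaptive moderation off). Hypothetical helper, arithmetic only.
max_irq_rate() {
    usecs=$1
    if [ "$usecs" -eq 0 ]; then
        # Assumption: 0 is treated as "no throttling" by the driver.
        echo "unlimited"
    else
        echo $((1000000 / usecs))
    fi
}

for u in 0 62 84 336 672; do
    echo "rx-usecs=$u -> $(max_irq_rate "$u") interrupts/s per queue (upper bound)"
done
```

The actual settings in effect can be read back with `ethtool -c ens2`, as was done in the history above.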
Server configuration:
80 cores Intel(R) Xeon(R) Gold 6230 CPU @ 2.10GHz
RAM 960 GB
8 x SAMSUNG MZQLB7T6HMLA-000AZ NVME disks
Ethernet Controller XL710 for 40GbE QSFP+ (rev 02)
Which settings of this network card can be changed to solve the problem?
On another server with a similar configuration but different network cards, everything is fine:
Server configuration:
88 cores Intel(R) Xeon(R) CPU E5-2699 v4 @ 2.20GHz
RAM 512 GB
8 x SAMSUNG MZQLB7T6HMLA-00007 NVME disks
4 x 82599ES 10-Gigabit SFI/SFP+ Network Connection
Hi Simon!
Sorry for the late reply.
We use many different Intel network cards, from 1000 Mbit/s cards on the e1000e and igb drivers to 100 Gbit/s cards on the ice driver.
None of them has shown such a high IRQ load except this server with the XL710.
For example, a server graph with E810-C:
With 84 Gbps of traffic, there are only 1.16k IRQ.
On the XL710, by contrast, there are already more than 2k IRQ at 20 Gbit/s of traffic. Why?
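One way to put a number on the interrupt rates being compared here is to sample /proc/interrupts twice and take the difference. This is a minimal sketch, assuming the NIC's interrupt rows contain the interface name and that the graphs count interrupts per second. `sum_nic_irqs` and `IFACE` are hypothetical names, not from this thread.

```shell
# Sum the per-CPU counters of every /proc/interrupts row whose text
# matches the given pattern (for example the interface name).
# Reads /proc/interrupts-formatted text on stdin.
sum_nic_irqs() {
    awk -v pat="$1" '
        NR == 1 { ncpu = NF; next }    # header row: one column per CPU
        $0 ~ pat { for (i = 2; i <= ncpu + 1; i++) sum += $i }
        END { print sum + 0 }
    '
}

# Usage sketch: runs only when IFACE is set, e.g. IFACE=ens2.
if [ -r /proc/interrupts ] && [ -n "${IFACE:-}" ]; then
    a=$(sum_nic_irqs "$IFACE" < /proc/interrupts)
    sleep 1
    b=$(sum_nic_irqs "$IFACE" < /proc/interrupts)
    echo "$IFACE: $((b - a)) interrupts/s"
fi
```

Running the same measurement on the XL710 and E810-C servers at a comparable traffic level would make the two figures directly comparable.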
Hello AndriiV,
Greetings!
Thank you for writing to us and for sharing further information. We are actively working on this case with our level 2 team and will get back to you with an update as soon as possible.
Thank you for your patience.
Regards,
Subhashish.
Hi AndriiV,
Thank you for your patience.
Please let us know whether you have tried the tunings we recommended. If so, did you notice any difference in the results?
Regarding the high IRQ load: higher-end NICs typically employ more sophisticated interrupt coalescing mechanisms that aggregate multiple network events into a single interrupt, thereby reducing the frequency of interrupts and mitigating the IRQ load.
Interrupt affinity, interrupt moderation parameters, and CPU scaling governors can all affect how interrupts are handled and distributed across CPU cores.
Along with the tunings suggested in the previous email, please try the steps below to configure the CPU scaling governor.
To view the current CPU scaling governor -> # cpupower frequency-info
To change the CPU scaling governor to a specific mode -> # cpupower frequency-set -g performance
Kindly answer a few questions:
+ In the link, you shared the graph for bonding results too. Have you encountered the same issue with bonding?
+ Does the issue occur across all platforms? Please try the following scenarios:
- Test with different XL710 NICs to determine if the issue is with the card or something else.
- Try inserting the same NIC in another platform to determine if the issue is platform-specific or not.
Regards,
Hayat
Intel Customer Support
Hi Hayat!
To view the current CPU scaling governor -> #cpupower frequency-info
To change the CPU scaling governor to a specific mode -> #cpupower frequency-set -g performance
# cpupower frequency-info
analyzing CPU 0:
driver: intel_pstate
CPUs which run at the same hardware frequency: 0
CPUs which need to have their frequency coordinated by software: 0
maximum transition latency: Cannot determine or is not supported.
hardware limits: 800 MHz - 3.90 GHz
available cpufreq governors: performance powersave
current policy: frequency should be within 800 MHz and 3.90 GHz.
The governor "performance" may decide which speed to use
within this range.
current CPU frequency: Unable to call hardware
current CPU frequency: 2.80 GHz (asserted by call to kernel)
boost state support:
Supported: yes
Active: yes
# cpupower frequency-set -g performance
Setting cpu: 0
Setting cpu: 1
Setting cpu: 2
Setting cpu: 3
Setting cpu: 4
Setting cpu: 5
Setting cpu: 6
Setting cpu: 7
Setting cpu: 8
Setting cpu: 9
Setting cpu: 10
Setting cpu: 11
Setting cpu: 12
Setting cpu: 13
Setting cpu: 14
Setting cpu: 15
Setting cpu: 16
Setting cpu: 17
Setting cpu: 18
Setting cpu: 19
Setting cpu: 20
Setting cpu: 21
Setting cpu: 22
Setting cpu: 23
Setting cpu: 24
Setting cpu: 25
Setting cpu: 26
Setting cpu: 27
Setting cpu: 28
Setting cpu: 29
Setting cpu: 30
Setting cpu: 31
Setting cpu: 32
Setting cpu: 33
Setting cpu: 34
Setting cpu: 35
Setting cpu: 36
Setting cpu: 37
Setting cpu: 38
Setting cpu: 39
Setting cpu: 40
Setting cpu: 41
Setting cpu: 42
Setting cpu: 43
Setting cpu: 44
Setting cpu: 45
Setting cpu: 46
Setting cpu: 47
Setting cpu: 48
Setting cpu: 49
Setting cpu: 50
Setting cpu: 51
Setting cpu: 52
Setting cpu: 53
Setting cpu: 54
Setting cpu: 55
Setting cpu: 56
Setting cpu: 57
Setting cpu: 58
Setting cpu: 59
Setting cpu: 60
Setting cpu: 61
Setting cpu: 62
Setting cpu: 63
Setting cpu: 64
Setting cpu: 65
Setting cpu: 66
Setting cpu: 67
Setting cpu: 68
Setting cpu: 69
Setting cpu: 70
Setting cpu: 71
Setting cpu: 72
Setting cpu: 73
Setting cpu: 74
Setting cpu: 75
Setting cpu: 76
Setting cpu: 77
Setting cpu: 78
Setting cpu: 79
I had already set this in a startup script:
# index of the last CPU listed in /proc/cpuinfo
cpucores=$(awk '/^processor/{print $3}' /proc/cpuinfo | tail -1)
for i in $(seq 0 "$cpucores");
do
echo performance > /sys/devices/system/cpu/cpu$i/cpufreq/scaling_governor
done;
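A quick way to confirm that a loop like the one above took effect on every core is to read the governors back from sysfs. This is a minimal sketch. The function name `check_governors` and its optional second argument (an alternative sysfs root, useful for dry runs) are hypothetical.

```shell
# Print every CPU whose scaling governor differs from the wanted one.
# $1 = wanted governor, $2 = CPU sysfs directory (defaults to the real one).
# Returns 0 when all readable governors match, non-zero otherwise.
check_governors() {
    want=$1
    root=${2:-/sys/devices/system/cpu}
    bad=0
    for f in "$root"/cpu[0-9]*/cpufreq/scaling_governor; do
        [ -r "$f" ] || continue        # skip CPUs without cpufreq support
        cur=$(cat "$f")
        if [ "$cur" != "$want" ]; then
            echo "$f: $cur"
            bad=$((bad + 1))
        fi
    done
    [ "$bad" -eq 0 ]
}
```

Calling `check_governors performance` prints nothing when every core already runs the performance governor, which is consistent with the `cpupower frequency-info` output above.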
+ In the link, you shared the graph for bonding results too. Have you encountered the same issue with bonding?
No. In my experience, the 82599 cards are the most problem-free.
+ Does the issue occur across all platforms? Please try the following scenarios below:
- Test with different XL710 NICs to determine if the issue is with the card or something else.
- Try inserting the same NIC in another platform to determine if the issue is platform-specific or not.
Unfortunately, we currently have neither a second server with an XL710 card nor another XL710 card in this data center. We plan to replace the card with 4 x 82599.
Hi AndriiV,
Thank you for your response. Please allow us some time to check this internally. We will get back to you as soon as we have an update.
Regards,
Simon