XL710 too many irq

AndriiV
Novice

Hello!

I have a problem with an Intel XL710: once traffic reaches about 22 Gbps, some CPU cores hit 100% IRQ load and throughput drops.

atop.png
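For readers without the screenshot, the per-core interrupt load can also be read from /proc/interrupts. The sketch below sums the per-CPU counters of every vector whose description mentions the interface. It assumes the queue vectors carry the interface name (for example something like i40e-ens2-TxRx-0); the function name irq_totals_per_cpu is made up for illustration.

```shell
# Sum per-CPU interrupt counts for all vectors of one interface.
# $1: interface name; stdin: contents of /proc/interrupts
irq_totals_per_cpu() {
    awk -v ifname="$1" '
        NR == 1 { ncpu = NF; next }        # header row: one field per CPU
        index($0, ifname) {                # rows that mention the interface
            for (i = 1; i <= ncpu; i++) total[i - 1] += $(i + 1)
        }
        END { for (c = 0; c < ncpu; c++) print "CPU" c, total[c] + 0 }
    '
}
```

On the affected host this would be called as `irq_totals_per_cpu ens2 < /proc/interrupts`. Taking two snapshots a few seconds apart and comparing them shows which cores are receiving the interrupts. Note that a plain substring match on ens2 would also match an interface named ens20.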

 

The latest firmware and driver are installed:

# ethtool -i ens2
driver: i40e
version: 2.24.6
firmware-version: 9.40 0x8000ecc0 1.3429.0
expansion-rom-version:
bus-info: 0000:d8:00.0
supports-statistics: yes
supports-test: yes
supports-eeprom-access: yes
supports-register-dump: yes
supports-priv-flags: yes

graphs.png

I read this guide and tried all the configurations it suggests, but nothing changed:

set_irq_affinity local ens2
set_irq_affinity all ens2
ethtool -C ens2 adaptive-rx off adaptive-tx off rx-usecs 0 tx-usecs 0
ethtool -C ens2 adaptive-rx off adaptive-tx off rx-usecs 5000 tx-usecs 20000
ethtool -C ens2 adaptive-rx off adaptive-tx off rx-usecs 10000 tx-usecs 20000
ethtool -g ens2
ethtool -C ens2 adaptive-rx off adaptive-tx off rx-usecs 84 tx-usecs 84
ethtool -C ens2 adaptive-rx off adaptive-tx off rx-usecs 62 tx-usecs 62
ethtool -S ens2 | grep drop    # run 4 times
ethtool -C ens2 adaptive-rx off adaptive-tx off rx-usecs 336 tx-usecs 84
ethtool -S ens2 | grep drop    # run 3 times
ethtool -C ens2 adaptive-rx off adaptive-tx off rx-usecs 672 tx-usecs 84
ethtool -S ens2 | grep drop    # run 6 times
ethtool -C ens2 adaptive-rx off adaptive-tx off rx-usecs 8400 tx-usecs 840
ethtool -C ens2 adaptive-rx off adaptive-tx off rx-usecs 4200 tx-usecs 840
ethtool -S ens2 | grep drop    # run 4 times
ethtool -C ens2 adaptive-rx off adaptive-tx off rx-usecs 4200 tx-usecs 1680
ethtool -S ens2 | grep drop    # run 2 times
ethtool -C ens2 adaptive-rx off adaptive-tx off rx-usecs 3200 tx-usecs 3200
ethtool -a ens2
ethtool ens2
ethtool -i ens2    # run 2 times
ethtool ens2
ethtool -C ens2 adaptive-rx on
ethtool -c ens2
ethtool -C ens2 adaptive-tx on
ethtool -c ens2
history | grep ens2
ethtool -m ens2
ethtool -C ens2 adaptive-rx off adaptive-tx off rx-usecs 4200 tx-usecs 1600
history | grep ens2
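
The history above polls the drop counters by eye after each coalescing change. A small helper can make that comparison explicit by diffing two saved snapshots of `ethtool -S` output. This is only a sketch: it assumes the usual "name: value" layout of that output, and the function name drop_deltas is hypothetical.

```shell
# Print every counter whose name contains "drop" and whose value changed.
# $1, $2: files holding two `ethtool -S <iface>` snapshots (older one first)
drop_deltas() {
    awk -F': *' '
        /drop/ {
            name = $1
            gsub(/^[ \t]+/, "", name)              # strip the leading indent
            if (FNR == NR) before[name] = $2       # still reading the first file
            else if ($2 != before[name]) print name, $2 - before[name]
        }
    ' "$1" "$2"
}
```

A possible workflow is `ethtool -S ens2 > before.txt`, apply one `ethtool -C` change, wait under load, `ethtool -S ens2 > after.txt`, then `drop_deltas before.txt after.txt`. Running `ethtool -c ens2` after each change also confirms which values actually took effect, since a driver may reject settings outside its supported range.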

 

 

 

 

Server configuration:

80 cores  Intel(R) Xeon(R) Gold 6230 CPU @ 2.10GHz

RAM 960 GB

8 x SAMSUNG MZQLB7T6HMLA-000AZ NVME disks

Ethernet Controller XL710 for 40GbE QSFP+ (rev 02)

 

What can be done with the settings of this network card to solve the problem?

 

On another server with a similar configuration but different network cards, everything is fine:

atop_2.png

graphs_2.png

Server configuration:

88 cores Intel(R) Xeon(R) CPU E5-2699 v4 @ 2.20GHz

RAM 512 GB

8 x SAMSUNG MZQLB7T6HMLA-00007 NVME disks

4 x 82599ES 10-Gigabit SFI/SFP+ Network Connection

AndriiV
Novice

Hi Simon!

Sorry for the late reply.

We use many different Intel network cards, from 1 Gbit/s cards on the e1000e and igb drivers to 100 Gbit/s cards on the ice driver.

We have not seen such a high IRQ load anywhere except on this server with the XL710.

For example, a server graph with E810-C:

graphs_7.png

With 84 Gbps traffic, only 1.16k IRQ.

On the XL710 server, traffic of just 20 Gbit/s already produces more than 2k IRQ. Why?
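
To put the two figures side by side, the IRQ value can be normalised per Gbit/s of traffic. The sketch below only does that arithmetic; it takes the numbers as read off the graphs (1.16k at 84 Gbps, and 2k as a lower bound at 20 Gbit/s) and assumes both graphs use the same unit. The function name irq_per_gbps is made up for illustration.

```shell
# IRQ figure divided by traffic in Gbit/s, printed with one decimal place.
# $1: IRQ value from the graph; $2: traffic in Gbit/s
irq_per_gbps() {
    awk -v irq="$1" -v gbps="$2" 'BEGIN { printf "%.1f\n", irq / gbps }'
}

irq_per_gbps 1160 84    # E810-C server
irq_per_gbps 2000 20    # XL710 server (lower bound)
```

Under those assumptions the XL710 server shows roughly seven times more IRQ per Gbit/s than the E810-C server.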

IntelSupport
Community Manager

Hello AndriiV,

Greetings!


Thank you for writing to us and for sharing further information. We are actively working on this case with our level 2 team and will get back to you with an update as soon as possible.


Thank you for your patience.



Regards,

Subhashish.


Hayat
Employee

Hi AndriiV, 


Thank you for your patience.


Please let us know whether you have tried the tunings that we recommended. If so, have you noticed any difference in the results?


With regard to the high IRQ load: higher-end NICs typically employ more sophisticated interrupt coalescing mechanisms that aggregate multiple network events into a single interrupt, thereby reducing the frequency of interrupts and mitigating the IRQ load.
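
As a rough illustration of what a fixed coalescing interval means, the sketch below converts an rx-usecs or tx-usecs value into an upper bound on interrupts per second for a single queue vector. It assumes the value acts as the minimum spacing between two interrupts on that vector and that 0 means no moderation; the function name max_irq_rate is hypothetical, and the driver documentation remains the authority on how a given NIC interprets the setting.

```shell
# Upper bound on interrupts per second for one queue vector.
# $1: coalescing interval in microseconds (rx-usecs or tx-usecs)
max_irq_rate() {
    if [ "$1" -eq 0 ]; then
        echo "unlimited"               # assumption: 0 disables moderation
    else
        echo $((1000000 / $1))         # one second divided by the interval
    fi
}

max_irq_rate 62
max_irq_rate 84
```

The bound applies per vector, so the total across the card also depends on how many queues are active.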


Interrupt affinity, interrupt moderation parameters, and CPU scaling governors can all affect how interrupts are handled and distributed across CPU cores.
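
To check how the interrupt affinity is currently set, one option is to list the IRQ numbers that belong to the interface and read the CPU list of each from /proc/irq/N/smp_affinity_list. This is a sketch under the assumption that the vector descriptions in /proc/interrupts contain the interface name; both function names are made up for illustration.

```shell
# Print the IRQ numbers whose description mentions the interface.
# $1: interface name; stdin: contents of /proc/interrupts
irq_numbers_for() {
    awk -v ifname="$1" 'NR > 1 && index($0, ifname) { sub(/:$/, "", $1); print $1 }'
}

# Show which CPUs each of those IRQs may run on.
# Reads live /proc files, so it is meant to be run on the affected host.
show_affinity() {
    irq_numbers_for "$1" < /proc/interrupts | while read -r irq; do
        printf 'IRQ %s -> CPUs %s\n' "$irq" "$(cat "/proc/irq/$irq/smp_affinity_list")"
    done
}
```

Calling `show_affinity ens2` before and after running set_irq_affinity shows whether the vectors really moved to the intended cores.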


Along with the tunings we suggested in the previous email, please try the steps below to configure the CPU scaling governor.


To view the current CPU scaling governor -> # cpupower frequency-info

To change the CPU scaling governor to a specific mode -> # cpupower frequency-set -g performance
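
Since cpupower frequency-info reports on one CPU at a time by default, a quick way to confirm that every core uses the same governor is to count the values in sysfs. The sketch below assumes the standard cpufreq sysfs layout; the function name governor_summary and its optional root argument are illustrative only.

```shell
# Count how many CPUs use each scaling governor.
# $1 (optional): sysfs CPU directory, defaults to /sys/devices/system/cpu
governor_summary() {
    root="${1:-/sys/devices/system/cpu}"
    cat "$root"/cpu[0-9]*/cpufreq/scaling_governor 2>/dev/null \
        | sort | uniq -c | awk '{ print $2 "=" $1 }'
}
```

On an 80-CPU host where the change was applied everywhere, `governor_summary` would be expected to print a single line for the performance governor.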


 

Kindly clarify a few questions:


+ In the link, you shared the graph for bonding results too. Have you encountered the same issue with bonding?

+ Does the issue occur across all platforms? Please try the following scenarios:

  - Test with different XL710 NICs to determine if the issue is with the card or something else.

  - Try inserting the same NIC in another platform to determine if the issue is platform-specific or not.


Regards,

Hayat

Intel Customer Support


AndriiV
Novice

Hi Hayat!

 

To view the current CPU scaling governor -> # cpupower frequency-info

To change the CPU scaling governor to a specific mode -> # cpupower frequency-set -g performance

# cpupower frequency-info
analyzing CPU 0:
  driver: intel_pstate
  CPUs which run at the same hardware frequency: 0
  CPUs which need to have their frequency coordinated by software: 0
  maximum transition latency:  Cannot determine or is not supported.
  hardware limits: 800 MHz - 3.90 GHz
  available cpufreq governors: performance powersave
  current policy: frequency should be within 800 MHz and 3.90 GHz.
                  The governor "performance" may decide which speed to use
                  within this range.
  current CPU frequency: Unable to call hardware
  current CPU frequency: 2.80 GHz (asserted by call to kernel)
  boost state support:
    Supported: yes
    Active: yes
# cpupower frequency-set -g performance
Setting cpu: 0
Setting cpu: 1
Setting cpu: 2
Setting cpu: 3
Setting cpu: 4
Setting cpu: 5
Setting cpu: 6
Setting cpu: 7
Setting cpu: 8
Setting cpu: 9
Setting cpu: 10
Setting cpu: 11
Setting cpu: 12
Setting cpu: 13
Setting cpu: 14
Setting cpu: 15
Setting cpu: 16
Setting cpu: 17
Setting cpu: 18
Setting cpu: 19
Setting cpu: 20
Setting cpu: 21
Setting cpu: 22
Setting cpu: 23
Setting cpu: 24
Setting cpu: 25
Setting cpu: 26
Setting cpu: 27
Setting cpu: 28
Setting cpu: 29
Setting cpu: 30
Setting cpu: 31
Setting cpu: 32
Setting cpu: 33
Setting cpu: 34
Setting cpu: 35
Setting cpu: 36
Setting cpu: 37
Setting cpu: 38
Setting cpu: 39
Setting cpu: 40
Setting cpu: 41
Setting cpu: 42
Setting cpu: 43
Setting cpu: 44
Setting cpu: 45
Setting cpu: 46
Setting cpu: 47
Setting cpu: 48
Setting cpu: 49
Setting cpu: 50
Setting cpu: 51
Setting cpu: 52
Setting cpu: 53
Setting cpu: 54
Setting cpu: 55
Setting cpu: 56
Setting cpu: 57
Setting cpu: 58
Setting cpu: 59
Setting cpu: 60
Setting cpu: 61
Setting cpu: 62
Setting cpu: 63
Setting cpu: 64
Setting cpu: 65
Setting cpu: 66
Setting cpu: 67
Setting cpu: 68
Setting cpu: 69
Setting cpu: 70
Setting cpu: 71
Setting cpu: 72
Setting cpu: 73
Setting cpu: 74
Setting cpu: 75
Setting cpu: 76
Setting cpu: 77
Setting cpu: 78
Setting cpu: 79

 

 

 

 

I had already set this with a startup script:

# Set the "performance" governor on every CPU that exposes cpufreq
for gov in /sys/devices/system/cpu/cpu[0-9]*/cpufreq/scaling_governor; do
    echo performance > "$gov"
done

+ In the link, you shared the graph for bonding results too. Have you encountered the same issue with bonding?

No. In my experience, the 82599 cards are the most problem-free.


+ Does the issue occur across all platforms? Please try the following scenarios below:

- Test with different XL710 NICs to determine if the issue is with the card or something else.

- Try inserting the same NIC in another platform to determine if the issue is platform-specific or not.

Unfortunately, we currently do not have a second server with an XL710 card, or another such card, in this data center. We plan to replace it with 4 x 82599.

Simon-Intel
Employee

Hi AndriiV,


Thank you for your response. Please allow us some time to check this internally. We will get back to you as soon as we have an update.


Regards,

Simon

