XL710 too many irq

AndriiV
Novice
7,111 Views

Hello!

I have a problem with an Intel XL710: once traffic exceeds 22 Gbps, some processor cores hit 100% IRQ load and throughput drops.

 

atop.png

 

The latest firmware and driver are installed:

 

 

 

# ethtool -i ens2
driver: i40e
version: 2.24.6
firmware-version: 9.40 0x8000ecc0 1.3429.0
expansion-rom-version:
bus-info: 0000:d8:00.0
supports-statistics: yes
supports-test: yes
supports-eeprom-access: yes
supports-register-dump: yes
supports-priv-flags: yes

 

 

 

 

graphs.png

I read this guide

and tried all the configurations it suggests, but nothing changed.

 

 

 

set_irq_affinity local ens2
set_irq_affinity all ens2
ethtool -C ens2 adaptive-rx off adaptive-tx off rx-usecs 0 tx-usecs 0
ethtool -C ens2 adaptive-rx off adaptive-tx off rx-usecs 5000 tx-usecs 20000
ethtool -C ens2 adaptive-rx off adaptive-tx off rx-usecs 10000 tx-usecs 20000
ethtool -g ens2
ethtool -C ens2 adaptive-rx off adaptive-tx off rx-usecs 84 tx-usecs 84
ethtool -C ens2 adaptive-rx off adaptive-tx off rx-usecs 62 tx-usecs 62
ethtool -S ens2 | grep drop
ethtool -S ens2 | grep drop
ethtool -S ens2 | grep drop
ethtool -S ens2 | grep drop
ethtool -C ens2 adaptive-rx off adaptive-tx off rx-usecs 336 tx-usecs 84
ethtool -S ens2 | grep drop
ethtool -S ens2 | grep drop
ethtool -S ens2 | grep drop
ethtool -C ens2 adaptive-rx off adaptive-tx off rx-usecs 672 tx-usecs 84
ethtool -S ens2 | grep drop
ethtool -S ens2 | grep drop
ethtool -S ens2 | grep drop
ethtool -S ens2 | grep drop
ethtool -S ens2 | grep drop
ethtool -S ens2 | grep drop
ethtool -C ens2 adaptive-rx off adaptive-tx off rx-usecs 8400 tx-usecs 840
ethtool -C ens2 adaptive-rx off adaptive-tx off rx-usecs 4200 tx-usecs 840
ethtool -S ens2 | grep drop
ethtool -S ens2 | grep drop
ethtool -S ens2 | grep drop
ethtool -S ens2 | grep drop
ethtool -C ens2 adaptive-rx off adaptive-tx off rx-usecs 4200 tx-usecs 1680
ethtool -S ens2 | grep drop
ethtool -S ens2 | grep drop
ethtool -C ens2 adaptive-rx off adaptive-tx off rx-usecs 3200 tx-usecs 3200
ethtool -a ens2
ethtool ens2
ethtool -i ens2
ethtool -i ens2
ethtool ens2
ethtool -C ens2 adaptive-rx on
ethtool -c ens2
ethtool -C ens2 adaptive-tx on
ethtool -c ens2
history | grep ens2
ethtool -m ens2
ethtool -C ens2 adaptive-rx off adaptive-tx off rx-usecs 4200 tx-usecs 1600
history | grep ens2
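
For reference when reading the rx-usecs values in this history: with adaptive moderation off, rx-usecs is the minimum gap between interrupts on a queue, so the per-queue interrupt rate is capped at roughly 1,000,000 / rx-usecs. The sketch below only does that arithmetic. The function name is made up for illustration, and the range and step size the i40e driver actually accepts are not checked here, so very large values in the history may have been rejected or clamped.

```shell
# Hypothetical helper: upper bound on interrupts per second for one queue,
# given an interrupt moderation interval in microseconds.
max_irq_rate() {
    local usecs="$1"
    if [ "$usecs" -le 0 ]; then
        # 0 disables moderation, so there is no fixed ceiling.
        echo "unlimited"
        return 0
    fi
    echo $(( 1000000 / usecs ))
}

# A few of the values tried above.
for u in 0 62 84 336 672; do
    printf 'rx-usecs %-5s -> at most %s interrupts/s per queue\n' \
        "$u" "$(max_irq_rate "$u")"
done
```

The ceiling applies per queue, so the total across the card also depends on how many queues are active.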

 

 

 

 

Server configuration:

80 cores  Intel(R) Xeon(R) Gold 6230 CPU @ 2.10GHz

RAM 960 GB

8 x SAMSUNG MZQLB7T6HMLA-000AZ NVME disks

Ethernet Controller XL710 for 40GbE QSFP+ (rev 02)

 

What can be done with the settings of this network card to solve the problem?

 

On another server with a similar configuration but different network cards, everything is fine.

atop_2.png

graphs_2.png

Server configuration:

88 cores Intel(R) Xeon(R) CPU E5-2699 v4 @ 2.20GHz

RAM 512 GB

8 x SAMSUNG MZQLB7T6HMLA-00007 NVME disks

4 x 82599ES 10-Gigabit SFI/SFP+ Network Connection

 

 

0 Kudos
47 Replies
AndriiV
Novice
2,581 Views

Hi Simon!

Sorry for my late answer.

We use a lot of different Intel network cards from 1000Mbit/s on the e1000e and igb drivers to 100Gbit/s on the ice driver.

We have not seen such a high IRQ load anywhere except on this server with the XL710.

For example, a server graph with E810-C:

graphs_7.png

With 84 Gbps traffic, only 1.16k IRQ.

On the XL710 server, with only 20 Gbit/s of traffic, there are already more than 2k IRQ. Why?
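
To compare interrupt rates between servers without relying on graphs, one option is to sample /proc/interrupts twice and divide the difference by the interval. This is a minimal sketch: the function names are invented for illustration, and it assumes the driver's IRQ names contain the interface name (as in `i40e-ens2-TxRx-0`), which may differ between driver versions.

```shell
# Sum the per-CPU counters of every /proc/interrupts line that contains
# a given string (for example an interface name such as ens2).
sum_irqs() {
    local pattern="$1" file="${2:-/proc/interrupts}"
    awk -v pat="$pattern" '
        NR == 1 { ncpu = NF; next }      # header row: one field per CPU
        index($0, pat) {                 # plain substring match, no regex
            for (i = 2; i <= ncpu + 1; i++) total += $i
        }
        END { printf "%.0f\n", total }
    ' "$file"
}

# Interrupts per second from two counter readings taken `seconds` apart.
irq_rate() {
    local before="$1" after="$2" seconds="$3"
    echo $(( (after - before) / seconds ))
}

# Example use on a live system:
#   before=$(sum_irqs ens2); sleep 10; after=$(sum_irqs ens2)
#   irq_rate "$before" "$after" 10
```

Running the same measurement on the XL710 and E810 servers at a similar traffic level would give directly comparable numbers.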

0 Kudos
IntelSupport
Community Manager
2,317 Views

Hello AndriiV,

Greetings!


Thank you for writing to us and for sharing further information. We are actively working on this case with our level 2 team. We will get back to you with an update at the earliest.


Thank you for your patience.



Regards,

Subhashish.


0 Kudos
Hayat
Employee
2,264 Views

Hi AndriiV, 


Thank you for your patience.


Please let us know whether you have tried the tunings that we recommended. If so, have you noticed any difference in the results?


With regard to the high IRQ load: higher-end NICs typically employ more sophisticated interrupt coalescing mechanisms that aggregate multiple network events into a single interrupt, thereby reducing the frequency of interrupts and mitigating the IRQ load.


Interrupt affinity, interrupt moderation parameters, and CPU scaling governors can all affect how interrupts are handled and distributed across CPU cores.


Along with the tunings we suggested in the previous email, please try the steps below to configure the CPU scaling governor.


To view the current CPU scaling governor -> #cpupower frequency-info

To change the CPU scaling governor to a specific mode -> #cpupower frequency-set -g performance


 

Kindly clarify a few questions:


+ In the link, you had shared the graph for bonding results too. Have you encountered the same issue with bonding?

+ Does the issue occur across all platforms? Please try the following scenarios:

  - Test with different XL710 NICs to determine if the issue is with the card or something else.

  - Try inserting the same NIC in another platform to determine if the issue is platform-specific or not.


Regards,

Hayat

Intel Customer Support


0 Kudos
AndriiV
Novice
2,253 Views

Hi Hayat!

 

To view the current CPU scaling governor -> #cpupower frequency-info

To change the CPU scaling governor to a specific mode -> #cpupower frequency-set -g performance

 

 

 

 

 

# cpupower frequency-info
analyzing CPU 0:
  driver: intel_pstate
  CPUs which run at the same hardware frequency: 0
  CPUs which need to have their frequency coordinated by software: 0
  maximum transition latency:  Cannot determine or is not supported.
  hardware limits: 800 MHz - 3.90 GHz
  available cpufreq governors: performance powersave
  current policy: frequency should be within 800 MHz and 3.90 GHz.
                  The governor "performance" may decide which speed to use
                  within this range.
  current CPU frequency: Unable to call hardware
  current CPU frequency: 2.80 GHz (asserted by call to kernel)
  boost state support:
    Supported: yes
    Active: yes
# cpupower frequency-set -g performance
Setting cpu: 0
Setting cpu: 1
Setting cpu: 2
Setting cpu: 3
Setting cpu: 4
Setting cpu: 5
Setting cpu: 6
Setting cpu: 7
Setting cpu: 8
Setting cpu: 9
Setting cpu: 10
Setting cpu: 11
Setting cpu: 12
Setting cpu: 13
Setting cpu: 14
Setting cpu: 15
Setting cpu: 16
Setting cpu: 17
Setting cpu: 18
Setting cpu: 19
Setting cpu: 20
Setting cpu: 21
Setting cpu: 22
Setting cpu: 23
Setting cpu: 24
Setting cpu: 25
Setting cpu: 26
Setting cpu: 27
Setting cpu: 28
Setting cpu: 29
Setting cpu: 30
Setting cpu: 31
Setting cpu: 32
Setting cpu: 33
Setting cpu: 34
Setting cpu: 35
Setting cpu: 36
Setting cpu: 37
Setting cpu: 38
Setting cpu: 39
Setting cpu: 40
Setting cpu: 41
Setting cpu: 42
Setting cpu: 43
Setting cpu: 44
Setting cpu: 45
Setting cpu: 46
Setting cpu: 47
Setting cpu: 48
Setting cpu: 49
Setting cpu: 50
Setting cpu: 51
Setting cpu: 52
Setting cpu: 53
Setting cpu: 54
Setting cpu: 55
Setting cpu: 56
Setting cpu: 57
Setting cpu: 58
Setting cpu: 59
Setting cpu: 60
Setting cpu: 61
Setting cpu: 62
Setting cpu: 63
Setting cpu: 64
Setting cpu: 65
Setting cpu: 66
Setting cpu: 67
Setting cpu: 68
Setting cpu: 69
Setting cpu: 70
Setting cpu: 71
Setting cpu: 72
Setting cpu: 73
Setting cpu: 74
Setting cpu: 75
Setting cpu: 76
Setting cpu: 77
Setting cpu: 78
Setting cpu: 79

 

 

 

 

I had already done this with a startup script:

 

 

 

cpucores=`cat /proc/cpuinfo | awk '/^processor/{print $3}' | tail -1`

for i in `seq 0 $cpucores`;
do
echo performance > /sys/devices/system/cpu/cpu$i/cpufreq/scaling_governor
done;
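
An equivalent sketch that loops over the sysfs entries directly instead of counting processors in /proc/cpuinfo, plus a quick check of what is actually set. The function names and the optional root-directory argument are illustrative only (the argument exists so the loop can be pointed at a test directory). On a real system the default path is used and the script must run as root.

```shell
# Write a governor to every CPU that exposes a cpufreq directory.
set_governor() {
    local governor="$1" root="${2:-/sys/devices/system/cpu}"
    local f
    for f in "$root"/cpu[0-9]*/cpufreq/scaling_governor; do
        [ -e "$f" ] || continue      # glob matched nothing
        echo "$governor" > "$f"
    done
}

# Print each distinct governor with the number of CPUs using it.
count_governors() {
    local root="${1:-/sys/devices/system/cpu}"
    cat "$root"/cpu[0-9]*/cpufreq/scaling_governor 2>/dev/null | sort | uniq -c
}

# Example use as root:
#   set_governor performance
#   count_governors
```

If `count_governors` prints a single line with the full CPU count next to `performance`, the startup script has taken effect on every core.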

 

 

 

 

+ In the link , you had shared the graph for bonding results too. Have you encountered the same issue with bonding?

No. In my experience, the 82599 cards are the most trouble-free.

 

+ Does the issue occur across all platforms? Please try the following scenarios below:

- Test with different XL710 NICs to determine if the issue is with the card or something else.

- Try inserting the same NIC in another platform to determine if the issue is platform-specific or not.

 

Unfortunately, we currently do not have a second server with an XL710 card or another such card in this data center.  We want to replace it with 4x82599.

0 Kudos
Simon-Intel
Employee
2,245 Views

Hi AndriiV,


Thank you for your response. Please allow us some time to check this internally. We will get back to you as soon as we have an update.


Regards,

Simon


0 Kudos
MACM
Employee
1,740 Views

Hi AndriiV,


Greetings from Intel.


Hope you are doing great.


We noticed one small mistake in the startup script quoted in the email below. We hope the highlighted character is a typo in the email, not in the script.


 echo performance ] /sys/devices/system/cpu/cpu$i/cpufreq/scaling_governor 


Correct syntax : echo performance | sudo tee /sys/devices/system/cpu/cpu$i/cpufreq/scaling_governor


Regards,

Ali


0 Kudos
AndriiV
Novice
1,719 Views

Hello, MACM!

I see my message like this:

Screenshot 2024-07-11 103627.png

0 Kudos
Simon-Intel
Employee
1,712 Views

Hello AndriiV,


Good Day!


Please allow us some time while we check this internally with our team. We will get back to you as soon as we have an update.


Best regards,

Simon


0 Kudos
Irwan_Intel
Moderator
1,615 Views

Hello AndriiV,


We have emailed you to request remote access. Please share if there's any way for us to access the system.


Regards,

Irwan_Intel


0 Kudos
AndriiV
Novice
1,596 Views

Hello Irwan!

I can grant root access via SSH to this server. How can I send you credentials?

0 Kudos
Fikri_Intel
Employee
1,573 Views

Hi AndriiV,


We have emailed you separately so that you can share the access credentials.




Regards,

Fikri


0 Kudos
AndriiV
Novice
1,462 Views

Hello all!

We decided to try another network card on this server.

NIC: Intel XXV710-DA2 2x25GbE

Cable: Intel XXV4DACBL2M passive 100GBASE-CR4 QSFP28 to 4x25GBASE-CR SFP28

At first, auto-negotiation advertised only 10Gbps on each port.

After 6Mbps of traffic everything hung.

After

 

ethtool --set-priv-flags ens1f0 disable-fw-lldp on
ethtool --set-priv-flags ens1f1 disable-fw-lldp on

 

almost everything was fine, but traffic did not rise above 9 Gbit/s on each port. DCB was also disabled on the switch.

As it turned out, this cable is not on the compatibility list of the Juniper QFX5200-32C.

It was replaced by an H3C QSFP28-4SFP28-CU-3M Compatible 100G QSFP28 to 4 x 25G SFP28 Passive Direct Attach Copper Breakout Cable.

After that, auto-negotiation advertised 25Gbps on each port.

But above 20 Gbps on the bonding interface (of these two ports), exactly the same problems as with the XL710 were observed.

graphs_9.png

Judging by the graphs, we are hitting a ceiling of about 20 Gbps.
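
The ceiling can also be checked on the host itself from the kernel byte counters rather than from the monitoring graphs. A minimal sketch of the arithmetic, with a made-up function name; the counter path in the comment is the standard sysfs location, and `bond0` is only an example interface.

```shell
# Average throughput in Gbit/s from two readings of a byte counter
# taken `seconds` apart.
gbps() {
    awk -v a="$1" -v b="$2" -v s="$3" \
        'BEGIN { printf "%.2f\n", (b - a) * 8 / s / 1000000000 }'
}

# Example use on a live system:
#   c=/sys/class/net/bond0/statistics/tx_bytes
#   a=$(cat "$c"); sleep 10; b=$(cat "$c")
#   gbps "$a" "$b" 10
```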

 

We have an almost identical server with an Intel XXV710-DA2 2x25Gb NIC.

Server configuration:

72 cores  Intel(R) Xeon(R) Gold 6154 CPU @ 3.00GHz

RAM 768 GB

8 x SAMSUNG MZQLB7T6HMLA-00007 NVME disks

Ethernet Controller Intel Corporation Ethernet Controller XXV710 for 25GbE SFP28 (rev 02)

It is connected to an Arista DCS-7060CX-32S switch by an Intel® Ethernet QSFP28 to SFP28 Twinaxial Breakout XXV4DACBL2M 952303 NDAQGF-I202 100GbE/4x25GbE/4x10GbE Direct Attach cable.

There is no such problem on that server, and LLDP is enabled there (the default):

 

# ethtool --show-priv-flags ens2f0
Private flags for ens2f0:
MFP                     : off
total-port-shutdown     : off
LinkPolling             : off
flow-director-atr       : on
veb-stats               : off
hw-atr-eviction         : off
link-down-on-close      : off
legacy-rx               : off
disable-source-pruning  : off
mac-source-pruning      : on
disable-fw-lldp         : off
rs-fec                  : on
base-r-fec              : on
multiple-traffic-classes: off
vf-vlan-pruning         : off
vf-source-pruning       : on
mdd-auto-reset-vf       : off
vf-true-promisc-support : off
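
When comparing several ports or servers, a single flag can be pulled out of this output with a small filter. This is a sketch only: `priv_flag` is a hypothetical helper name, and it assumes the `name : value` layout shown above.

```shell
# Read `ethtool --show-priv-flags <iface>` output on stdin and print the
# value (on/off) of one flag. Prints nothing if the flag is absent.
priv_flag() {
    local name="$1"
    awk -F':' -v flag="$name" '
        {
            key = $1
            gsub(/[ \t]+$/, "", key)             # trim padding before the colon
            if (key == flag) {
                val = $2
                gsub(/^[ \t]+|[ \t]+$/, "", val) # trim spaces around the value
                print val
            }
        }
    '
}

# Example use on a live system:
#   for i in ens2f0 ens2f1; do
#       printf '%s: %s\n' "$i" "$(ethtool --show-priv-flags "$i" | priv_flag disable-fw-lldp)"
#   done
```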

 

After all these tests, I am increasingly inclined to conclude that the problem lies in an incompatibility between Intel and Juniper, and that there are clearly some problems with LLDP/DCB.

 

0 Kudos
AndriiV
Novice
1,355 Views

Hello all! A few days ago we connected this server using a new scheme.

NIC: 2 ports X710 and 2 ports XXV710

Cable: Intel XXV4DACBL2M passive 100GBASE-CR4 QSFP28

Switch: Juniper QFX5200-32C 

I used Linux software bonding to set up a 4x 10Gbps connection:

 

 

# cat /proc/net/bonding/bond0
Ethernet Channel Bonding Driver: v3.7.1 (April 27, 2011)

Bonding Mode: IEEE 802.3ad Dynamic link aggregation
Transmit Hash Policy: layer3+4 (1)
MII Status: up
MII Polling Interval (ms): 100
Up Delay (ms): 200
Down Delay (ms): 200

802.3ad info
LACP rate: fast
Min links: 0
Aggregator selection policy (ad_select): stable
System priority: 65535
System MAC address: 3c:fd:fe:a8:62:04
Active Aggregator Info:
        Aggregator ID: 1
        Number of ports: 4
        Actor Key: 15
        Partner Key: 5
        Partner Mac Address: 88:e6:4b:6d:d5:24

Slave Interface: ens1f0
MII Status: up
Speed: 10000 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: 3c:fd:fe:a8:62:04
Slave queue ID: 0
Aggregator ID: 1
Actor Churn State: none
Partner Churn State: none
Actor Churned Count: 0
Partner Churned Count: 1
details actor lacp pdu:
    system priority: 65535
    system mac address: 3c:fd:fe:a8:62:04
    port key: 15
    port priority: 255
    port number: 1
    port state: 63
details partner lacp pdu:
    system priority: 127
    system mac address: 88:e6:4b:6d:d5:24
    oper key: 5
    port priority: 127
    port number: 41
    port state: 63

Slave Interface: ens1f1
MII Status: up
Speed: 10000 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: 3c:fd:fe:a8:62:05
Slave queue ID: 0
Aggregator ID: 1
Actor Churn State: none
Partner Churn State: none
Actor Churned Count: 1
Partner Churned Count: 1
details actor lacp pdu:
    system priority: 65535
    system mac address: 3c:fd:fe:a8:62:04
    port key: 15
    port priority: 255
    port number: 2
    port state: 63
details partner lacp pdu:
    system priority: 127
    system mac address: 88:e6:4b:6d:d5:24
    oper key: 5
    port priority: 127
    port number: 40
    port state: 63

Slave Interface: eno3
MII Status: up
Speed: 10000 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: ac:1f:6b:59:c9:7e
Slave queue ID: 0
Aggregator ID: 1
Actor Churn State: none
Partner Churn State: none
Actor Churned Count: 1
Partner Churned Count: 1
details actor lacp pdu:
    system priority: 65535
    system mac address: 3c:fd:fe:a8:62:04
    port key: 15
    port priority: 255
    port number: 3
    port state: 63
details partner lacp pdu:
    system priority: 127
    system mac address: 88:e6:4b:6d:d5:24
    oper key: 5
    port priority: 127
    port number: 38
    port state: 63

Slave Interface: eno4
MII Status: up
Speed: 10000 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: ac:1f:6b:59:c9:7f
Slave queue ID: 0
Aggregator ID: 1
Actor Churn State: none
Partner Churn State: none
Actor Churned Count: 1
Partner Churned Count: 1
details actor lacp pdu:
    system priority: 65535
    system mac address: 3c:fd:fe:a8:62:04
    port key: 15
    port priority: 255
    port number: 4
    port state: 63
details partner lacp pdu:
    system priority: 127
    system mac address: 88:e6:4b:6d:d5:24
    oper key: 5
    port priority: 127
    port number: 39
    port state: 63
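
The `port state: 63` values in this output are a bit field. The sketch below decodes it using the LACP state bit order from IEEE 802.3ad / 802.1AX (activity, timeout, aggregation, synchronization, collecting, distributing, defaulted, expired); the function name is made up for illustration.

```shell
# Decode the one-byte LACP state value that the bonding driver prints
# as "port state" into the names of the bits that are set.
decode_lacp_state() {
    local state="$1" bit=1 out="" name
    for name in activity timeout aggregation synchronization \
                collecting distributing defaulted expired; do
        if [ $(( state & bit )) -ne 0 ]; then
            out="${out:+$out }$name"
        fi
        bit=$(( bit * 2 ))
    done
    echo "$out"
}

decode_lacp_state 63
```

A value of 63 has the six low bits set and neither `defaulted` nor `expired`, which is what a healthy, fully negotiated member port is expected to show on both the actor and partner side.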

 

 

 

LLDP was also disabled:

 

 

# ethtool --set-priv-flags ens1f1 disable-fw-lldp on
# ethtool --set-priv-flags ens1f0 disable-fw-lldp on
# ethtool --set-priv-flags eno3 disable-fw-lldp on
# ethtool --set-priv-flags eno4 disable-fw-lldp on

 

 

IRQ affinity was also set:

 

 

# set_irq_affinity -x local ens1f0 ens1f1 eno3 eno4
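
Because the bond mixes add-in ports (ens1f0, ens1f1) with onboard ports (eno3, eno4), the ports may sit on different NUMA nodes, and "local" means something different for each. A minimal sketch for checking this from sysfs and for confirming where an individual IRQ ended up. The function names and the optional root arguments are illustrative; the sysfs and procfs file names are the standard ones for PCI devices and IRQs.

```shell
# Print the NUMA node and the CPUs local to a physical network interface.
# sysroot defaults to /sys and is a parameter only to allow testing.
nic_locality() {
    local iface="$1" sysroot="${2:-/sys}"
    local dev="$sysroot/class/net/$iface/device"
    printf '%s numa_node=%s local_cpus=%s\n' "$iface" \
        "$(cat "$dev/numa_node")" "$(cat "$dev/local_cpulist")"
}

# Print the CPU list a given IRQ number is currently allowed to run on.
irq_cpus() {
    local irq="$1" procroot="${2:-/proc}"
    cat "$procroot/irq/$irq/smp_affinity_list"
}

# Example use on a live system:
#   for i in ens1f0 ens1f1 eno3 eno4; do nic_locality "$i"; done
#   irq_cpus 45
```

This only works for physical ports. Virtual interfaces such as bond0 have no `device` link in sysfs.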

 

 

Results:

We've hit a ceiling of 23Gbps.
Interrupts are still eating up the entire CPU

graphs_10.png

 

When I tried to enable LLDP on two cards, traffic stopped flowing through them and errors started pouring in:

 

 

Aug  9 11:02:49 15224 kernel: [245313.055731] i40e 0000:19:00.1: VSI seid 390 Tx ring 0 disable timeout
Aug  9 11:02:49 15224 kernel: [245313.062170] bond0: link status down for interface eno4, disabling it in 200 ms
Aug  9 11:02:49 15224 kernel: [245313.070168] bond0: link status down for interface eno4, disabling it in 200 ms
Aug  9 11:02:49 15224 kernel: [245313.078169] bond0: link status down for interface eno4, disabling it in 200 ms
Aug  9 11:02:49 15224 kernel: [245313.086168] bond0: link status down for interface eno4, disabling it in 200 ms
Aug  9 11:02:49 15224 kernel: [245313.094173] bond0: link status down for interface eno4, disabling it in 200 ms
Aug  9 11:02:49 15224 kernel: [245313.102171] bond0: link status down for interface eno4, disabling it in 200 ms
Aug  9 11:02:49 15224 kernel: [245313.114168] bond0: link status down for interface eno4, disabling it in 200 ms
Aug  9 11:02:49 15224 kernel: [245313.122168] bond0: link status down for interface eno4, disabling it in 200 ms
Aug  9 11:02:49 15224 kernel: [245313.134170] bond0: link status down for interface eno4, disabling it in 200 ms
Aug  9 11:02:49 15224 kernel: [245313.146189] bond0: link status down for interface eno4, disabling it in 200 ms
Aug  9 11:02:49 15224 kernel: [245313.154167] bond0: link status down for interface eno4, disabling it in 200 ms
Aug  9 11:02:49 15224 kernel: [245313.162170] bond0: link status down for interface eno4, disabling it in 200 ms
Aug  9 11:02:49 15224 kernel: [245313.174168] bond0: link status down for interface eno4, disabling it in 200 ms
Aug  9 11:02:49 15224 kernel: [245313.182167] bond0: link status down for interface eno4, disabling it in 200 ms
Aug  9 11:02:49 15224 kernel: [245313.194183] bond0: link status down for interface eno4, disabling it in 200 ms
Aug  9 11:02:49 15224 kernel: [245313.202168] bond0: link status down for interface eno4, disabling it in 200 ms
Aug  9 11:02:49 15224 kernel: [245313.210167] bond0: link status down for interface eno4, disabling it in 200 ms
Aug  9 11:02:49 15224 kernel: [245313.221934] i40e 0000:19:00.1: VSI seid 0 Tx ring 767 disable timeout
Aug  9 11:02:49 15224 kernel: [245313.222170] bond0: link status down for interface eno4, disabling it in 200 ms
Aug  9 11:02:49 15224 kernel: [245313.230167] bond0: link status down for interface eno4, disabling it in 200 ms
Aug  9 11:02:49 15224 kernel: [245313.238165] bond0: link status down for interface eno4, disabling it in 200 ms
Aug  9 11:02:49 15224 kernel: [245313.246166] bond0: link status down for interface eno4, disabling it in 200 ms
Aug  9 11:02:49 15224 kernel: [245313.258167] bond0: link status down for interface eno4, disabling it in 200 ms
Aug  9 11:02:49 15224 kernel: [245313.270170] bond0: link status down for interface eno4, disabling it in 200 ms
Aug  9 11:02:49 15224 kernel: [245313.278189] bond0: link status down for interface eno4, disabling it in 200 ms
Aug  9 11:02:49 15224 kernel: [245313.284384] i40e 0000:19:00.1: FW LLDP is disabled, attempting SW DCB
Aug  9 11:02:49 15224 kernel: [245313.286167] bond0: link status down for interface eno4, disabling it in 200 ms
Aug  9 11:02:49 15224 kernel: [245313.294167] bond0: link status down for interface eno4, disabling it in 200 ms
Aug  9 11:02:49 15224 kernel: [245313.295276] i40e 0000:19:00.1: SW DCB initialization succeeded.
Aug  9 11:02:49 15224 kernel: [245313.302170] bond0: link status down for interface eno4, disabling it in 200 ms
Aug  9 11:02:49 15224 kernel: [245313.303844] i40e 0000:19:00.1: MAC source pruning enabled on all VFs
Aug  9 11:02:49 15224 kernel: [245313.310167] bond0: link status down for interface eno4, disabling it in 200 ms
Aug  9 11:02:49 15224 kernel: [245313.318240] bond0: link status down for interface eno4, disabling it in 200 ms
Aug  9 11:02:49 15224 kernel: [245313.330170] bond0: link status down for interface eno4, disabling it in 200 ms
Aug  9 11:02:49 15224 kernel: [245313.333712] i40e 0000:19:00.1: Set default VSI failed, err I40E_ERR_ADMIN_QUEUE_ERROR, aq_err I40E_AQ_RC_EINVAL
Aug  9 11:02:49 15224 kernel: [245313.333715] i40e 0000:19:00.1: Failed to restore promiscuous setting: off, err I40E_ERR_ADMIN_QUEUE_ERROR aq_err I40E_AQ_RC_EINVAL
Aug  9 11:02:49 15224 kernel: [245313.333717] i40e 0000:19:00.1: VF BW shares not restored
Aug  9 11:02:49 15224 kernel: [245313.333772] i40e 0000:19:00.1: FW LLDP is disabled
Aug  9 11:02:54 15224 kernel: [245318.023464] i40e 0000:19:00.0: VSI seid 391 Tx ring 0 disable timeout
Aug  9 11:02:54 15224 kernel: [245318.078151] bond0: link status down for interface eno3, disabling it in 200 ms
Aug  9 11:02:54 15224 kernel: [245318.086148] bond0: link status down for interface eno3, disabling it in 200 ms
Aug  9 11:02:54 15224 kernel: [245318.089907] i40e 0000:19:00.0: VSI seid 0 Tx ring 767 disable timeout
Aug  9 11:02:54 15224 kernel: [245318.098147] bond0: link status down for interface eno3, disabling it in 200 ms
Aug  9 11:02:54 15224 kernel: [245318.106153] bond0: link status down for interface eno3, disabling it in 200 ms
Aug  9 11:02:54 15224 kernel: [245318.114146] bond0: link status down for interface eno3, disabling it in 200 ms
Aug  9 11:02:54 15224 kernel: [245318.122145] bond0: link status down for interface eno3, disabling it in 200 ms
Aug  9 11:02:54 15224 kernel: [245318.130146] bond0: link status down for interface eno3, disabling it in 200 ms
Aug  9 11:02:54 15224 kernel: [245318.138144] bond0: link status down for interface eno3, disabling it in 200 ms
Aug  9 11:02:54 15224 kernel: [245318.146148] bond0: link status down for interface eno3, disabling it in 200 ms
Aug  9 11:02:54 15224 kernel: [245318.154154] bond0: link status down for interface eno3, disabling it in 200 ms
Aug  9 11:02:54 15224 kernel: [245318.162156] bond0: link status down for interface eno3, disabling it in 200 ms
Aug  9 11:02:54 15224 kernel: [245318.174161] bond0: link status down for interface eno3, disabling it in 200 ms
Aug  9 11:02:54 15224 kernel: [245318.182145] bond0: link status down for interface eno3, disabling it in 200 ms
Aug  9 11:02:54 15224 kernel: [245318.190154] bond0: link status down for interface eno3, disabling it in 200 ms
Aug  9 11:02:54 15224 kernel: [245318.198146] bond0: link status down for interface eno3, disabling it in 200 ms
Aug  9 11:02:54 15224 kernel: [245318.206152] bond0: link status down for interface eno3, disabling it in 200 ms
Aug  9 11:02:54 15224 kernel: [245318.218154] bond0: link status down for interface eno3, disabling it in 200 ms
Aug  9 11:02:54 15224 kernel: [245318.226148] bond0: link status down for interface eno3, disabling it in 200 ms
Aug  9 11:02:54 15224 kernel: [245318.234235] bond0: link status down for interface eno3, disabling it in 200 ms
Aug  9 11:02:54 15224 kernel: [245318.242286] bond0: link status down for interface eno3, disabling it in 200 ms
Aug  9 11:02:54 15224 kernel: [245318.250159] bond0: link status down for interface eno3, disabling it in 200 ms
Aug  9 11:02:54 15224 kernel: [245318.262150] bond0: link status down for interface eno3, disabling it in 200 ms
Aug  9 11:02:54 15224 kernel: [245318.274212] bond0: link status down for interface eno3, disabling it in 200 ms
Aug  9 11:02:54 15224 kernel: [245318.386149] bond0: link status down for interface eno3, disabling it in 200 ms
Aug  9 11:02:54 15224 kernel: [245318.394147] bond0: link status down for interface eno3, disabling it in 200 ms
Aug  9 11:02:54 15224 kernel: [245318.402150] bond0: link status down for interface eno3, disabling it in 200 ms
Aug  9 11:02:54 15224 kernel: [245318.410151] bond0: link status down for interface eno3, disabling it in 200 ms
Aug  9 11:02:54 15224 kernel: [245318.418153] bond0: link status down for interface eno3, disabling it in 200 ms
Aug  9 11:02:54 15224 kernel: [245318.426144] bond0: link status down for interface eno3, disabling it in 200 ms
Aug  9 11:02:54 15224 kernel: [245318.434148] bond0: link status down for interface eno3, disabling it in 200 ms
Aug  9 11:02:54 15224 kernel: [245318.442160] bond0: link status down for interface eno3, disabling it in 200 ms
Aug  9 11:02:54 15224 kernel: [245318.454147] bond0: link status down for interface eno3, disabling it in 200 ms
Aug  9 11:02:54 15224 kernel: [245318.462146] bond0: link status down for interface eno3, disabling it in 200 ms
Aug  9 11:02:54 15224 kernel: [245318.463467] i40e 0000:19:00.0: VSI seid 391 Tx ring 0 disable timeout
Aug  9 11:02:54 15224 kernel: [245318.470149] bond0: link status down for interface eno3, disabling it in 200 ms
Aug  9 11:02:54 15224 kernel: [245318.478153] bond0: link status down for interface eno3, disabling it in 200 ms
Aug  9 11:02:54 15224 kernel: [245318.486144] bond0: link status down for interface eno3, disabling it in 200 ms
Aug  9 11:02:54 15224 kernel: [245318.494144] bond0: link status down for interface eno3, disabling it in 200 ms
Aug  9 11:02:54 15224 kernel: [245318.502143] bond0: link status down for interface eno3, disabling it in 200 ms
Aug  9 11:02:54 15224 kernel: [245318.510143] bond0: link status down for interface eno3, disabling it in 200 ms
Aug  9 11:02:54 15224 kernel: [245318.522155] bond0: link status down for interface eno3, disabling it in 200 ms
Aug  9 11:02:54 15224 kernel: [245318.534344] bond0: link status down for interface eno3, disabling it in 200 ms
Aug  9 11:02:54 15224 kernel: [245318.546147] bond0: link status down for interface eno3, disabling it in 200 ms
Aug  9 11:02:54 15224 kernel: [245318.554145] bond0: link status down for interface eno3, disabling it in 200 ms
Aug  9 11:02:54 15224 kernel: [245318.562145] bond0: link status down for interface eno3, disabling it in 200 ms
Aug  9 11:02:54 15224 kernel: [245318.570154] bond0: link status down for interface eno3, disabling it in 200 ms
Aug  9 11:02:54 15224 kernel: [245318.582151] bond0: link status down for interface eno3, disabling it in 200 ms
Aug  9 11:02:54 15224 kernel: [245318.590144] bond0: link status down for interface eno3, disabling it in 200 ms
Aug  9 11:02:54 15224 kernel: [245318.598152] bond0: link status down for interface eno3, disabling it in 200 ms
Aug  9 11:02:54 15224 kernel: [245318.606144] bond0: link status down for interface eno3, disabling it in 200 ms
Aug  9 11:02:54 15224 kernel: [245318.614149] bond0: link status down for interface eno3, disabling it in 200 ms
Aug  9 11:02:54 15224 kernel: [245318.619470] i40e 0000:19:00.0: VSI seid 0 Tx ring 767 disable timeout
Aug  9 11:02:54 15224 kernel: [245318.622143] bond0: link status down for interface eno3, disabling it in 200 ms
Aug  9 11:02:54 15224 kernel: [245318.630145] bond0: link status down for interface eno3, disabling it in 200 ms
Aug  9 11:02:54 15224 kernel: [245318.638146] bond0: link status down for interface eno3, disabling it in 200 ms
Aug  9 11:02:54 15224 kernel: [245318.646144] bond0: link status down for interface eno3, disabling it in 200 ms
Aug  9 11:02:54 15224 kernel: [245318.654145] bond0: link status down for interface eno3, disabling it in 200 ms
Aug  9 11:02:54 15224 kernel: [245318.662142] bond0: link status down for interface eno3, disabling it in 200 ms
Aug  9 11:02:54 15224 kernel: [245318.670146] bond0: link status down for interface eno3, disabling it in 200 ms
Aug  9 11:02:54 15224 kernel: [245318.678147] bond0: link status down for interface eno3, disabling it in 200 ms
Aug  9 11:02:54 15224 kernel: [245318.682065] i40e 0000:19:00.0: FW LLDP is disabled, attempting SW DCB
Aug  9 11:02:54 15224 kernel: [245318.686145] bond0: link status down for interface eno3, disabling it in 200 ms
Aug  9 11:02:54 15224 kernel: [245318.692934] i40e 0000:19:00.0: SW DCB initialization succeeded.
Aug  9 11:02:54 15224 kernel: [245318.698145] bond0: link status down for interface eno3, disabling it in 200 ms
Aug  9 11:02:54 15224 kernel: [245318.701717] i40e 0000:19:00.0: MAC source pruning enabled on all VFs
Aug  9 11:02:55 15224 kernel: [245318.710161] bond0: link status down for interface eno3, disabling it in 200 ms
Aug  9 11:02:55 15224 kernel: [245318.718144] bond0: link status down for interface eno3, disabling it in 200 ms
Aug  9 11:02:55 15224 kernel: [245318.724405] i40e 0000:19:00.0: Set default VSI failed, err I40E_ERR_ADMIN_QUEUE_ERROR, aq_err I40E_AQ_RC_EINVAL
Aug  9 11:02:55 15224 kernel: [245318.724408] i40e 0000:19:00.0: Failed to restore promiscuous setting: off, err I40E_ERR_ADMIN_QUEUE_ERROR aq_err I40E_AQ_RC_EINVAL
Aug  9 11:02:55 15224 kernel: [245318.724410] i40e 0000:19:00.0: VF BW shares not restored
Aug  9 11:02:55 15224 kernel: [245318.724466] i40e 0000:19:00.0: FW LLDP is disabled
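
Most of this log is the same bonding message repeated. To see the distinct messages at a glance, the timestamps can be stripped and identical lines counted. A minimal sketch, assuming the syslog line layout shown above; the function name is made up, and the log source in the usage comment is an example that varies by distribution.

```shell
# Read kernel log lines on stdin, drop everything up to and including the
# "[seconds.micros] " stamp, and count identical messages, most frequent first.
summarize_kernel_log() {
    sed -E 's/^.*kernel: \[ *[0-9]+\.[0-9]+\] //' | sort | uniq -c | sort -rn
}

# Example use on a live system (log path is an assumption):
#   grep 'kernel:' /var/log/syslog | summarize_kernel_log | head
```

With the repeats collapsed, the i40e lines stand out: the Tx ring disable timeouts, the fallback to SW DCB, and the `I40E_AQ_RC_EINVAL` admin queue errors.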

 

 

 

0 Kudos
IntelSupport
Community Manager
943 Views

Hi Andrii,


Greetings!!


Our performance team has been trying to connect to your system, but they have encountered some difficulties. To resolve this issue, we would appreciate your assistance in troubleshooting the connection.


Best Regards,

Vishal


0 Kudos
AndriiV
Novice
899 Views

Hello!

We moved that server to another data centre and connected it to an Arista DCS-7060CX-32S switch with a 2x25Gbps Intel XXV710.

There is no problem with LLDP (it is now enabled). The IRQ load is lower, but still about 3 times higher than with the 4x10Gbps Intel 82599ES.

2x25Gbps Intel XXV710:

[graph: 2x25Gbps Intel XXV710]

4x10Gbps Intel 82599ES:

[graph: 4x10Gbps Intel 82599ES]

0 Kudos
Simon-Intel
Employee
910 Views

Hello AndriiV,

 

Thank you for contacting Intel.

 

This is the first follow-up regarding the issue you reported to us.

 

We wanted to inquire whether you had the opportunity to review our previous message.

 

Feel free to reply to this message, and we'll be more than happy to assist you further.

 

Regards,

Simon

Intel Server Support



0 Kudos
AndriiV
Novice
899 Views

Hello, Simon! Sorry for the late answer. I have sent the new IP for this server via e-mail.

0 Kudos
Simon-Intel
Employee
898 Views

Hello Andrii,

 

Thank you for sharing the credentials. We will get back to you as soon as we have an update.

 

Regards,

Simon

Intel Server Support



0 Kudos
AndriiV
Novice
727 Views

Hello!

Some new observations.

After moving the server to another data center and connecting it to another switch (Arista DCS-7060CX-32S), the problem of high IRQ load above 20 Gbps remained.

We have another server with an XXV710 card connected to the same switch, and to the same 100Gbps port using an Intel XXV4DACBL2M passive 100GBASE-CR4 QSFP28 cable.

The configuration on the switch for these servers:

 

 

 

interface Ethernet9/1
   description 12-fdc (158178)
   speed forced 25gfull
   switchport access vlan 2724
   switchport
   channel-group 12 mode active
!
interface Ethernet9/2
   description 12-fdc (158178)
   speed forced 25gfull
   switchport
   channel-group 12 mode active
!
interface Ethernet9/3
   description 02-fdc (152224)
   speed forced 25gfull
   switchport access vlan 2724
   switchport
   channel-group 2 mode active
!
interface Ethernet9/4
   description 02-fdc (152224)
   speed forced 25gfull
   switchport access vlan 2724
   switchport
   channel-group 2 mode active
!

 

 

 

Here are yesterday's graphs for these two servers:

[graph: cache server inbound traffic, 4.7 Gbps at peak]

[graph: file server inbound traffic, 1 Gbps at peak]

 

Server with ID 152224 is the cache server. At peak there was 22 Gbps of outbound traffic and 4.8 Gbps of inbound traffic.

Server with ID 158178 is the file server. At peak there was 39.5 Gbps of outbound traffic and 1.1 Gbps of inbound traffic.

 

As you can see, the IRQ load is higher when inbound traffic is higher.
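
One way to put a number on this relationship is to look at how many received packets each interrupt handles on average: a low figure would suggest that interrupt moderation is not batching much on the receive side. This is a rough sketch under that assumption, with made-up function names; the counter path is the standard sysfs location, and the interrupt delta has to be measured separately over the same interval, for example from /proc/interrupts.

```shell
# Average received packets handled per interrupt over a sampling window.
# Both arguments are deltas (after minus before) over the same window.
packets_per_irq() {
    local rx_packets="$1" irqs="$2"
    if [ "$irqs" -eq 0 ]; then
        echo "n/a"
        return 0
    fi
    awk -v p="$rx_packets" -v i="$irqs" 'BEGIN { printf "%.1f\n", p / i }'
}

# Read the kernel's cumulative RX packet counter for an interface.
rx_packet_count() {
    local iface="$1" sysroot="${2:-/sys}"
    cat "$sysroot/class/net/$iface/statistics/rx_packets"
}

# Example: 400000 packets and 20000 interrupts in the same 10 s window.
packets_per_irq 400000 20000
```

Comparing this ratio between the cache server and the file server at peak would show whether the difference in IRQ load simply tracks the inbound packet rate.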

My previous conclusion about incompatibility with Juniper switches was incorrect, although LLDP does not work with Juniper.

 

0 Kudos
Azeem_Intel
Employee
452 Views

Hello AndriiV,


Greetings for the day!


Would it be helpful to set up a live debug call with you? If so, please let us know a suitable day and time this week. We would suggest Thursday or Friday, but we're flexible and can accommodate your availability.



Regards,

Azeem Ulla

Intel Customer Support Technician

intel.com/vroc


0 Kudos
AndriiV
Novice
427 Views

Hello Azeem!

You can call me on Thursday by phone, FaceTime, Telegram, Viber, or Signal, or write to me on any of these messengers. I will be waiting for your call between 10:00 and 16:00 GMT+2.

0 Kudos