Ethernet Products

E810-XXV PCIe NIC RDMA with UD Queue Pair, has Long Send Time Every 2^24 Sends.

latency_hunter
New Contributor I
10,426 Views

Setup

I am testing worst-case latency in a setup with two directly connected E810-XXV NICs. I am running the standard Debian 12 kernel, 6.1.0-26-amd64.

I have upgraded to the latest ice and irdma drivers:

 

[  154.144293] ice: Intel(R) Ethernet Connection E800 Series Linux Driver - version 1.15.5
[  154.144377] ice: Copyright (C) 2018-2024 Intel Corporation
[  154.227153] ice 0000:04:00.0: fw 7.7.3 api 1.7.11 nvm 4.70 0x8001f7bb 1.3755.0 [8086:159b] [8086:0003]
...
[  155.655299] irdma driver version: 1.15.15

 

Findings

My code sets up a UD queue pair, using a socket to exchange connection information, and times a round trip: it posts an IBV_WR_SEND and then waits for a response from the other side. I noticed huge jumps in some RTTs (25+ ms) and isolated where the delay was coming from. I then simplified the code further so that data flows in one direction only, and implemented the test as follows:

LOOP:
1) Start timer
2) Post send request:

3) Poll send completion:

4) End Timer
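
The four steps above can be sketched as a small timing harness. This is only an illustration of the measurement structure, not the actual test code: `timed_send_loop`, `FakeLink`, and all parameters are hypothetical names, and the NIC is replaced by a stand-in driven by a virtual clock so the example runs without RDMA hardware. In the real test, steps 2 and 3 would be the verbs post-send and completion-poll calls.

```python
import time


def timed_send_loop(post_send, poll_completion, iterations, warn_ns,
                    clock=time.perf_counter_ns):
    """Time post + poll for each message and record the slow ones."""
    stats = {"min": None, "max": 0, "total": 0, "slow": []}
    for count in range(1, iterations + 1):      # 1-indexed, like the logs
        start = clock()                         # 1) start timer
        post_send(count)                        # 2) post send request
        poll_completion()                       # 3) poll send completion
        elapsed = clock() - start               # 4) end timer
        stats["total"] += elapsed
        stats["max"] = max(stats["max"], elapsed)
        if stats["min"] is None or elapsed < stats["min"]:
            stats["min"] = elapsed
        if elapsed > warn_ns:
            stats["slow"].append((count, elapsed))
    return stats


class FakeLink:
    """Hypothetical stand-in for the NIC, driven by a virtual clock."""

    def __init__(self, stall_at, normal_ns=3_000, stall_ns=25_000_000):
        self.stall_at = set(stall_at)
        self.normal_ns = normal_ns
        self.stall_ns = stall_ns
        self.now = 0        # virtual time in ns
        self.pending = 0    # how long the in-flight send will take

    def clock(self):
        return self.now

    def post_send(self, count):
        stalled = count in self.stall_at
        self.pending = self.stall_ns if stalled else self.normal_ns

    def poll_completion(self):
        self.now += self.pending    # completion arrives after the delay


link = FakeLink(stall_at={5})
stats = timed_send_loop(link.post_send, link.poll_completion,
                        iterations=10, warn_ns=1_000_000, clock=link.clock)
print(stats["slow"])
```

Passing the clock in as a parameter keeps the harness deterministic when it is exercised against a simulated device; with real hardware the default monotonic clock would be used instead.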

 

With this simplified test, I still got unexpectedly long latencies, specifically at the send completion polling. The receiver side also detects the slow packet, so this is not just the completion poll taking a long time to return; the packet itself is getting stuck on the way out.

 

Data and Explanation

 

[ 8751.362085] ib_client: WARN - It took longer than 27125379 ns to wait for send, count: 8388610
[ 8803.219365] ib_client: WARN - It took longer than 25279562 ns to wait for send, count: 25165826
[ 8855.077630] ib_client: WARN - It took longer than 16410736 ns to wait for send, count: 41943042
[ 8906.934875] ib_client: WARN - It took longer than 13447023 ns to wait for send, count: 58720258
[ 8958.793107] ib_client: WARN - It took longer than 29723372 ns to wait for send, count: 75497474
[ 9010.650323] ib_client: WARN - It took longer than 29989750 ns to wait for send, count: 92274690
[ 9062.508529] ib_client: WARN - It took longer than 28589231 ns to wait for send, count: 109051906
[ 9114.364223] ib_client: WARN - It took longer than 29764279 ns to wait for send, count: 125829122
[ 9166.221406] ib_client: WARN - It took longer than 29413178 ns to wait for send, count: 142606338
[ 9168.820274] min: 2117 ns, max: 29989750 ns, mean: 3063 ns
[ 9168.820276] <3000 : 71619390
[ 9168.820276] [3000, 4000) : 71646397
[ 9168.820277] [4000, 5000) : 176602
[ 9168.820278] [5000, 5500) : 9
[ 9168.820278] [5500, 6000) : 154
[ 9168.820279] [6000, 6500) : 134
[ 9168.820279] [6500, 7000) : 516
[ 9168.820280] [7000, 7500) : 293
[ 9168.820280] [7500, 8000) : 917
[ 9168.820281] [8000, 9000) : 1075
[ 9168.820281] [9000, 10000) : 1139
[ 9168.820282] [10000, 15000) : 1031
[ 9168.820282] [15000, 20000) : 0
[ 9168.820283] [20000, 25000) : 0
[ 9168.820283] [25000, 30000) : 0
[ 9168.820284] [30000, 50000) : 0
[ 9168.820284] [50000, 100000) : 0
[ 9168.820285] [100000, 200000) : 0
[ 9168.820285] [200000, 300000) : 0
[ 9168.820286] [300000, 500000) : 0
[ 9168.820286] [500000, 1000000) : 0
[ 9168.820287] [1000000, 2000000) : 0
[ 9168.820287] [2000000, 3000000) : 0
[ 9168.820288] [3000000, 4000000) : 0
[ 9168.820288] >4000000 : 9

 

The above data is from a test that uses the kernel-space ib_* interfaces. It shows a log line for each slow completion poll, followed by a histogram of all completions. The count field in the WARN lines is the 1-indexed message count, so the first very long completion occurs on message 8388610. After that, another long completion occurs every 16777216 (2^24) messages. This is not a time-based slowdown: if I slow down my send loop, the same message numbers show the slowdown.
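
The spacing described above can be checked directly from the WARN lines. The short sketch below uses only the timestamps and counts copied from the kernel log; the variable names are illustrative. It confirms that the first stall sits 2 messages past 2^23 and that every later stall is exactly 2^24 messages after the previous one. It also derives the average time per message between stalls from the dmesg timestamps, which comes out at roughly 3.09 us, in line with the reported 3063 ns mean.

```python
# (dmesg timestamp in seconds, message count) copied from the WARN lines.
warns = [
    (8751.362085, 8388610),
    (8803.219365, 25165826),
    (8855.077630, 41943042),
    (8906.934875, 58720258),
    (8958.793107, 75497474),
    (9010.650323, 92274690),
    (9062.508529, 109051906),
    (9114.364223, 125829122),
    (9166.221406, 142606338),
]

counts = [count for _, count in warns]
steps = [b - a for a, b in zip(counts, counts[1:])]

first_offset = counts[0] - 2**23            # distance of first stall from 2^23
every_step_is_2_24 = all(step == 2**24 for step in steps)

# Average wall-clock time per message between consecutive stalls, in ns.
ns_per_msg = [
    (t1 - t0) * 1e9 / (c1 - c0)
    for (t0, c0), (t1, c1) in zip(warns, warns[1:])
]

print(first_offset, every_step_is_2_24)
print([round(ns) for ns in ns_per_msg])
```

The number of WARN lines (9) also matches the 9 entries in the >4000000 histogram bucket, so every outlier in the histogram is accounted for by this pattern.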

8388610 is also suspiciously close to 2^24/2 (2^23 = 8388608).
My only idea is that this may be related to sq_psn (a 24-bit field), which you need to set on a UD queue pair. However, changing this value to 0 or to any random value does not change when the issue shows up: it still first appears at message 8388610 and then every 2^24 messages after.
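
To make the sq_psn reasoning concrete, here is a small model of a 24-bit counter that starts at the configured PSN and increments once per send. It is a sketch under assumptions, not a description of how the E810 or irdma actually work: `first_send_index` and `TRIGGER` are hypothetical, and `TRIGGER` is chosen only so that a starting PSN of 0 reproduces the observed first stall. The value 0x215fa1 is the PSN that appears in the ibv_ud_pingpong output later in this thread.

```python
PSN_MOD = 1 << 24   # PSNs are 24 bits wide


def first_send_index(start_psn, target_psn):
    """1-indexed send whose PSN equals target_psn, assuming send n
    carries PSN (start_psn + n - 1) mod 2^24."""
    return (target_psn - start_psn) % PSN_MOD + 1


# Hypothetical trigger value, picked only so that start_psn = 0
# lands on the observed first stall at message 8388610.
TRIGGER = 0x800001

for start in (0, 0x215fa1, 0xFFFFFF):
    print(hex(start), first_send_index(start, TRIGGER))
```

In this model the stall index moves whenever the starting PSN changes, while the 2^24 period stays the same. Since the observed index stays at 8388610 regardless of sq_psn, the measurements look more consistent with some other per-queue counter that always starts from the same value, although that is only an inference from the data.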

 

I have also replicated the issue using the userspace ibverbs interface by adding timing instrumentation to the ud_pingpong.c example in rdma-core. The same issue is present, with long latencies at the same message counts {8388610, 25165826, 41943042, ...}

 

These are not scheduler interruptions: my kernel-space implementation does not get interrupted, and the user-space testing is on isolated cores where scheduler interruptions have been reduced to 60 us. Also, if it were some sort of preemption, it would not always happen at the same message.

Other Tests/Question

I have another version of this RTT test that uses an RC queue pair, and it does not suffer from the same issue.

 

Any other ideas as to what could be causing these long completions?

Thanks.

0 Kudos
34 Replies
pujeeth
Employee
6,770 Views

Hello latency_hunter


Greetings!


Apologies for the delayed response.


Kindly let us know if you still require assistance with this case.

Please feel free to reply to this email. We're here to assist you every step of the way.


Regards,

Pujeeth

Intel Customer Support Technician


0 Kudos
latency_hunter
New Contributor I
6,765 Views

That would be nice. It seems pretty clear that the RoCE implementation has a bug.

I have verified this issue does not show up with the same code and Mellanox ConnectX-6 SmartNIC HW.

 

Thanks.

0 Kudos
pujeeth
Employee
6,743 Views

Hello latency_hunter


Greetings!


Thank you for the update. In order to proceed further with troubleshooting, we request that you share the information below:


1) Kindly share the driver and firmware version of the NIC card.

2) Kindly provide the system details.

3) Please share front and back pictures of the NIC card, clearly showing the serial number and MM ID markings.

4) Kindly share the SSU logs.


Intel® System Support Utility for the Linux* Operating System

https://www.intel.com/content/www/us/en/download/18895/intel-system-support-utility-for-the-linux-operating-system.html


Regards

Pujeeth

Intel customer support technician




0 Kudos
pujeeth
Employee
6,710 Views

Hello latency_hunter,


Thank you for contacting Intel.

 

This is the first follow-up regarding the issue you reported to us.

We wanted to inquire whether you had the opportunity to review the plan of action (POA) we provided.

 

Feel free to reply to this email, and we'll be more than happy to assist you further.


Regards,

Pujeeth

Intel Customer Support Technician


0 Kudos
latency_hunter
New Contributor I
6,674 Views

I have, still working on getting the pictures and SSU logs. Should have them in a day or two.

0 Kudos
pujeeth
Employee
6,667 Views

Hello latency_hunter,


Greetings!


Thank you for the update, kindly keep us posted.


Regards

Pujeeth_Intel


0 Kudos
pujeeth
Employee
6,615 Views

Hello latency_hunter,


Thank you for contacting Intel.

 

This is the first follow-up regarding the issue you reported to us.

We wanted to inquire whether you had the opportunity to review the plan of action (POA) we provided.

 

Feel free to reply to this email, and we'll be more than happy to assist you further.


Regards,

Pujeeth

Intel Customer Support Technician



0 Kudos
pujeeth
Employee
6,574 Views

Hello latency_hunter,


Thank you for contacting Intel.

 

We will proceed to close this case. If you find that you still require assistance, we kindly request that you respond to the case. This will allow us to either reopen the current case or initiate a new one.


Regards,

Pujeeth_Intel



0 Kudos
latency_hunter
New Contributor I
6,568 Views

I'm still working on getting the requested information. Because it took a while for you to get back to us, we repurposed the server we were using. We are working on getting another one set up.

0 Kudos
pujeeth
Employee
6,566 Views

Hello latency_hunter,


Thank you for the update, kindly keep us posted.


Regards

Pujeeth_Intel


0 Kudos
latency_hunter
New Contributor I
6,531 Views

I am on a Debian 12 standard 6.1.0-26-amd64 Linux kernel.

 

I have upgraded to the latest ice and irdma drivers:

[ 154.144293] ice: Intel(R) Ethernet Connection E800 Series Linux Driver - version 1.15.5
[ 154.144377] ice: Copyright (C) 2018-2024 Intel Corporation
[ 154.227153] ice 0000:04:00.0: fw 7.7.3 api 1.7.11 nvm 4.70 0x8001f7bb 1.3755.0 [8086:159b] [8086:0003]
...
[ 155.655299] irdma driver version: 1.15.15

 

Pictures attached.

 

SSU logs attached.

0 Kudos
pujeeth
Employee
6,517 Views

Hello latency_hunter


Greetings!


Thank you for sharing the details. We would like to request that you review the supported operating systems for the E810-XXV.


Supported Operating Systems for Retail Intel® Ethernet Adapters

https://www.intel.com/content/www/us/en/support/articles/000025890/ethernet-products.html


Regards

Pujeeth_Intel


0 Kudos
pujeeth
Employee
6,430 Views

Hello latency_hunter


Greetings!


We wanted to follow up on this case. Please feel free to respond to this email at your earliest convenience.

 

Regards,

Pujeeth

Intel Customer Support Technician


0 Kudos
latency_hunter
New Contributor I
6,276 Views

I did read the support matrix. I'll replicate the issue on Debian 11 if you'd like and report back.

0 Kudos
pujeeth
Employee
6,249 Views

Hello latency_hunter


Greetings!


Thank you for the update, kindly keep us posted.


Regards

Pujeeth_Intel


0 Kudos
pujeeth
Employee
6,161 Views

Hello latency_hunter,


Thank you for contacting Intel.

 

This is the first follow-up regarding the issue you reported to us.

 

Feel free to reply to this email, and we'll be more than happy to assist you further.


Regards,

Pujeeth_Intel


0 Kudos
latency_hunter
New Contributor I
6,146 Views

I have replicated the issue as documented above on a Debian 11 machine (SSU logs attached).

Linux x1cymac 5.10.0-34-amd64 #1 SMP Debian 5.10.234-1 (2025-02-24) x86_64 GNU/Linux

 

I installed the latest drivers, so the versions are slightly different.

[    1.637963] ice: Intel(R) Ethernet Connection E800 Series Linux Driver - version 1.16.3
[    1.718722] ice 0000:02:00.0: fw 7.7.3 api 1.7.11 nvm 4.70 0x8001f7bb 1.3755.0 [8086:159b] [8086:0003]
[  475.879715] irdma driver version: 1.16.10



Example ibv_ud_pingpong output on the new machine:

bla@x1cymac:~/e810_test/rdma-core$ taskset -c 2 ./build/bin/ibv_ud_pingpong -g 1 -n 100000000
  local address:  LID 0x0001, QPN 0x000004, PSN 0x215fa1: GID ::ffff:192.168.1.106
Using psn of 0x215fa1
Dest GID = 00:00:00:00:00:00:00:00:00:00:ff:ff:c0:a8:01:67
  remote address: LID 0x0000, QPN 0x001199, PSN 0x09c669, GID ::ffff:192.168.1.103
 - Long send time: 20009908 ns, index: 8388610 -
 - Long send time: 56279375 ns, index: 25165826 -
 - Long send time: 52510283 ns, index: 41943042 -
 - Long send time: 37830943 ns, index: 58720258 -
 - Long send time: 54142370 ns, index: 75497474 -
 - Long send time: 43183471 ns, index: 92274690 -


204800000000 bytes in 1300.34 seconds = 1259.98 Mbit/sec
100000000 iters in 1300.34 seconds = 13.00 usec/iter
Max send time was: 56279375 ns at index: 25165826 and last was 4557 ns

The delays are not always the same length, but they are almost always over 10 ms, and they happen at exactly the same indices.
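
As a cross-check on the user-space run, the sketch below parses the "Long send time" lines from the output above and compares their indices with what the 2^23 + 2 start and 2^24 period predict for a run of 100000000 iterations (the `-n` value on the command line). The regular expression and variable names are illustrative; the numbers are copied from the output.

```python
import re

OUTPUT = """\
 - Long send time: 20009908 ns, index: 8388610 -
 - Long send time: 56279375 ns, index: 25165826 -
 - Long send time: 52510283 ns, index: 41943042 -
 - Long send time: 37830943 ns, index: 58720258 -
 - Long send time: 54142370 ns, index: 75497474 -
 - Long send time: 43183471 ns, index: 92274690 -
"""

ITERATIONS = 100_000_000        # from "-n 100000000"
RUN_SECONDS = 1300.34           # from the summary line

pattern = re.compile(r"Long send time: (\d+) ns, index: (\d+)")
events = [(int(index), int(ns)) for ns, index in pattern.findall(OUTPUT)]

observed = [index for index, _ in events]
predicted = list(range(2**23 + 2, ITERATIONS + 1, 2**24))

total_stall_ns = sum(ns for _, ns in events)
shortest_stall_ns = min(ns for _, ns in events)
stall_share = total_stall_ns / (RUN_SECONDS * 1e9)

print(observed == predicted)
print(total_stall_ns, shortest_stall_ns)
print(f"{stall_share:.4%} of the run spent in long sends")
```

In this run the predicted and observed indices agree, and the stalls add up to a small fraction of the total run time, which is why they barely move the mean while dominating the worst case.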

0 Kudos
pujeeth
Employee
6,112 Views

Hello latency_hunter,


Thank you for sharing the SSU logs. Upon reviewing the logs, we see that the system is manufactured by Supermicro. Kindly let us know if this adapter came with the system or was purchased separately.


Regards

Pujeeth_Intel


0 Kudos
latency_hunter
New Contributor I
6,108 Views

It was purchased separately.

0 Kudos
Fikri_Intel
Employee
6,077 Views

Hi latency_hunter,


Thank you for your response.


Would you be able to share a picture of the NIC showing its label, so that we can check the markings?


Along with that, please let us know if you are using the latest driver and firmware from the links below:

1- https://www.intel.com/content/www/us/en/download/15084/intel-ethernet-adapter-complete-driver-pack.html

2- https://www.intel.com/content/www/us/en/download/19624/non-volatile-memory-nvm-update-utility-for-intel-ethernet-network-adapter-e810-series.html


Looking forward to your response.




Regards,

Fikri O.


0 Kudos