The environment: two servers (Intel(R) Xeon(R) Gold 6248R CPU @ 3.00GHz) connected with Mellanox ConnectX-5 NICs. All tests run within the local socket.
According to my understanding, with DDIO enabled an RDMA read fetches data directly from memory, bypassing the LLC, if the data is not valid in the LLC. An RDMA write performs a 'write update' or 'write allocate' into the LLC. Once data is valid in the LLC after an RDMA write, an RDMA read can access it from the LLC.
With DDIO disabled, neither RDMA read nor RDMA write can access data in the LLC; both go directly to memory.
What I expect: disabling DDIO should increase the latency of RDMA write but have little influence on RDMA read.
The experiment shows:
Command:
ib_read/write_lat -d mlx5_0 -R address
Result:
| latency/us   | DDIO enabled | DDIO disabled |
|--------------|--------------|---------------|
| ib_read_lat  | 1.773        | 1.897         |
| ib_write_lat | 0.933        | 0.938         |
It seems that the latency of RDMA write is barely affected by DDIO, while the latency of RDMA read increases noticeably. This result is the opposite of my understanding.
Here is one more interesting result:
Read latency: 1.80 us
Raw write latency: 0.93 us
Write latency after disabling IBV_SEND_INLINE: 1.32 us
Write latency after disabling IBV_SEND_INLINE and DDIO: 1.32 us
(See the sketch below for what toggling IBV_SEND_INLINE means in the write path.)
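For context on the "disabling IBV_SEND_INLINE" numbers above, here is a minimal libibverbs sketch of how the inline flag is typically toggled on a small RDMA write. This is not the perftest code: the QP, MR, and remote address/rkey are assumed to come from an already established connection, and IBV_SEND_INLINE additionally assumes the QP was created with a sufficient max_inline_data.

```c
#include <infiniband/verbs.h>
#include <stdint.h>
#include <string.h>

/* Sketch: post a small RDMA WRITE with or without IBV_SEND_INLINE.
 * qp, mr, buf, remote_addr and rkey are assumed to be set up elsewhere. */
static int post_rdma_write(struct ibv_qp *qp, struct ibv_mr *mr, void *buf,
                           uint64_t remote_addr, uint32_t rkey, int use_inline)
{
    struct ibv_sge sge = {
        .addr   = (uintptr_t)buf,
        .length = 8,                 /* small payload, well within one cacheline */
        .lkey   = mr->lkey,
    };
    struct ibv_send_wr wr, *bad_wr = NULL;

    memset(&wr, 0, sizeof(wr));
    wr.opcode              = IBV_WR_RDMA_WRITE;
    wr.sg_list             = &sge;
    wr.num_sge             = 1;
    wr.send_flags          = IBV_SEND_SIGNALED;
    if (use_inline)
        wr.send_flags     |= IBV_SEND_INLINE;   /* payload is copied into the WQE,
                                                   so the NIC skips the DMA read of
                                                   the local buffer */
    wr.wr.rdma.remote_addr = remote_addr;
    wr.wr.rdma.rkey        = rkey;

    return ibv_post_send(qp, &wr, &bad_wr);
}
```

With inline enabled the payload travels inside the work request itself, which is one reason a small write can be cheaper than a read even before DDIO enters the picture.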
My questions are:
Q1: Is my understanding of RDMA and DDIO correct?
Q2: Why does DDIO affect RDMA read instead of RDMA write? This really confuses me.
Q3: Why are the latencies of RDMA read and RDMA write still not equal after disabling IBV_SEND_INLINE and DDIO? Is there any other feature of the processor, the RDMA NIC, or the driver that improves write performance?
Hello oleotiger,
Thank you for contacting Intel Customer Support.
We recommend checking the following URL about Intel® Data Direct I/O Technology (Intel® DDIO):
A Primer
If this document does not have the information that you need, please let us know so we can keep searching on our end.
Best regards,
Sergio S.
Intel Customer Support Technician
For firmware updates and troubleshooting tips, visit: https://intel.com/support/serverbios
Hello oleotiger,
We are following your case and would like to know if you need more assistance.
Best regards,
Sergio S.
Intel Customer Support Technician
For firmware updates and troubleshooting tips, visit: https://intel.com/support/serverbios
I have read the document and it gives me a better understanding of DDIO, but the experiment results still conflict with its description of DDIO.
As the document shows, with DDIO enabled a network read triggers a memory load when the data is not valid in the LLC, and a cache read when the data is valid in the LLC.
I perform RDMA write--read--write--read of the same address within one cacheline (a minimal verbs sketch of this sequence is given at the end of this post). The cache or memory access of each read or write operation can be summarized as follows:
| DDIO | Operation    | Client: op to data | Client: access target | Server: op to data | Server: access target |
|------|--------------|--------------------|-----------------------|--------------------|-----------------------|
| ON   | first write  | read               | memory                | write allocate     | cache                 |
| ON   | first read   | write allocate     | cache                 | read               | cache                 |
| ON   | second write | read update        | cache                 | write update       | cache                 |
| ON   | second read  | write              | cache                 | read               | cache                 |
| OFF  | first write  | read               | memory                | write              | memory                |
| OFF  | first read   | write              | memory                | read               | memory                |
| OFF  | second write | read               | memory                | write              | memory                |
| OFF  | second read  | write              | memory                | read               | memory                |
So with DDIO on, the second write should have lower latency than the first write, and the latencies of the first read and the second read should be similar.
Q1: Why does the experiment show that the second read still has lower latency than the first read with DDIO on?
With DDIO disabled, the first and second writes should have similar latency, since both access memory on both sides.
Q2: Why does the experiment show that the first write has lower latency than the second write?
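For reference, here is a minimal sketch of how I issue the write--read--write--read sequence against one remote address with verbs. It is a simplified illustration rather than my full tool: connection setup (PD/CQ/QP, exchanging the remote address and rkey) is assumed to happen elsewhere, return-value checks are omitted, and the remote buffer is assumed to be cacheline-aligned and registered with remote read/write access.

```c
#include <infiniband/verbs.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Post one RDMA WRITE or RDMA READ to the same remote address and
 * busy-poll the CQ until its completion arrives. */
static void post_and_wait(struct ibv_qp *qp, struct ibv_cq *cq,
                          enum ibv_wr_opcode op, struct ibv_sge *sge,
                          uint64_t remote_addr, uint32_t rkey)
{
    struct ibv_send_wr wr, *bad_wr = NULL;
    struct ibv_wc wc;

    memset(&wr, 0, sizeof(wr));
    wr.opcode              = op;            /* IBV_WR_RDMA_WRITE or IBV_WR_RDMA_READ */
    wr.sg_list             = sge;
    wr.num_sge             = 1;
    wr.send_flags          = IBV_SEND_SIGNALED;
    wr.wr.rdma.remote_addr = remote_addr;   /* same cacheline-sized target every time */
    wr.wr.rdma.rkey        = rkey;

    ibv_post_send(qp, &wr, &bad_wr);
    while (ibv_poll_cq(cq, 1, &wc) == 0)
        ;                                   /* spin until the work completes */
}

/* write -> read -> write -> read against a single remote cacheline. */
void run_sequence(struct ibv_qp *qp, struct ibv_cq *cq, struct ibv_pd *pd,
                  uint64_t remote_addr, uint32_t rkey)
{
    void *buf = NULL;
    posix_memalign(&buf, 64, 64);           /* cacheline-aligned, cacheline-sized */
    struct ibv_mr *mr = ibv_reg_mr(pd, buf, 64, IBV_ACCESS_LOCAL_WRITE);

    struct ibv_sge sge = {
        .addr = (uintptr_t)buf, .length = 64, .lkey = mr->lkey,
    };

    post_and_wait(qp, cq, IBV_WR_RDMA_WRITE, &sge, remote_addr, rkey); /* first write  */
    post_and_wait(qp, cq, IBV_WR_RDMA_READ,  &sge, remote_addr, rkey); /* first read   */
    post_and_wait(qp, cq, IBV_WR_RDMA_WRITE, &sge, remote_addr, rkey); /* second write */
    post_and_wait(qp, cq, IBV_WR_RDMA_READ,  &sge, remote_addr, rkey); /* second read  */

    ibv_dereg_mr(mr);
    free(buf);
}
```

Each step is timed by taking timestamps around the individual post_and_wait calls (not shown above).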
Hi oleotiger,
I am also studying the relationship between DDIO and RDMA. I test RDMA performance using the perftest tool, but I find it hard to target the same address in the cache or in memory with perftest alone. So how do you "do RDMA write--read--write--read of the same address within a cacheline"? Did you write a verbs application? And how do you restrict the operations to a specific cacheline?
I would really appreciate your reply!
Hi oleotiger,
I am interested in your test cases. Could you share your verbs tools for deeper analysis?
I have some suggestions and questions:
1. For Q1, we should consider the impact of the SQ/CQ during the test. Reading WQEs and writing CQEs may hit the cache on the second operation. Do you use the same QP for both the read and the write tests?
2. The ib_*_lat tests loop for n iterations, and a one-shot measurement can be accidental. Are there any tricks to get accurate data (see the timing sketch after this list for what I have in mind)?
3. For Q2, if the data is accurate, I would also like to understand why the results in Q2 are more scattered than those in Q1.
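Regarding point 2, here is a rough sketch of the kind of per-iteration timing I have in mind, so that warm-up effects and outliers stay visible instead of being averaged away. It assumes a connected QP/CQ and a prepared SGE/remote address; the names are placeholders, not perftest internals.

```c
#include <infiniband/verbs.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <time.h>

#define ITERS 1000

static double lat_us[ITERS];

/* Time each RDMA READ individually instead of reporting only an average. */
void measure_read_latency(struct ibv_qp *qp, struct ibv_cq *cq,
                          struct ibv_sge *sge, uint64_t raddr, uint32_t rkey)
{
    struct ibv_send_wr wr, *bad_wr = NULL;
    struct ibv_wc wc;
    struct timespec t0, t1;

    for (int i = 0; i < ITERS; i++) {
        memset(&wr, 0, sizeof(wr));
        wr.opcode              = IBV_WR_RDMA_READ;
        wr.sg_list             = sge;
        wr.num_sge             = 1;
        wr.send_flags          = IBV_SEND_SIGNALED;
        wr.wr.rdma.remote_addr = raddr;
        wr.wr.rdma.rkey        = rkey;

        clock_gettime(CLOCK_MONOTONIC, &t0);
        ibv_post_send(qp, &wr, &bad_wr);
        while (ibv_poll_cq(cq, 1, &wc) == 0)
            ;                                /* busy-poll for the completion */
        clock_gettime(CLOCK_MONOTONIC, &t1);

        lat_us[i] = (t1.tv_sec - t0.tv_sec) * 1e6 +
                    (t1.tv_nsec - t0.tv_nsec) * 1e-3;
    }

    /* Dump the raw per-iteration samples; medians/percentiles can be
     * computed offline to separate cold and warm behaviour. */
    for (int i = 0; i < ITERS; i++)
        printf("%d %.3f\n", i, lat_us[i]);
}
```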
I would really appreciate your reply!
Hello oleotiger,
We appreciate the additional information. Please allow us to check it, and we will get back to you.
Best regards,
Sergio S.
Intel Customer Support Technician
For firmware updates and troubleshooting tips, visit: https://intel.com/support/serverbios
Hello oleotiger,
Thank you for waiting for our update. Unfortunately, the information that you need cannot be disclosed publicly; we can only share information that is publicly available on the public Intel website.
We do apologize for this inconvenience.
Best regards,
Sergio S.
Intel Customer Support Technician
For firmware updates and troubleshooting tips, visit: https://intel.com/support/serverbios