DMA scans with I/OAT have inconsistent results

Timo_D_
Beginner

Hello everyone,

After reading this article about the I/OAT DMA engine, I have been trying to build an I/OAT-powered scan operation. The basic principle is to divide the input data into equal-sized chunks, which are then copied one after another into one of two local chunk-sized buffers, alternating between them. We use two buffers because eventually we would like to process one buffer while the DMA engine loads the other (and then swap them), overlapping data transfer and computation.
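To make the intended scheme concrete, here is a minimal sketch of the double-buffering idea. It is not the attached program: the DMA engine is replaced by a hypothetical `async_copy` helper that runs `memcpy` on another thread via `std::async`, and `double_buffered_count` is an illustrative name. It may need `-pthread` when compiling.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <future>
#include <vector>

// Hypothetical stand-in for an asynchronous DMA copy: a plain memcpy executed
// on another thread. A real I/OAT copy would be submitted through SPDK instead.
static std::future<void> async_copy(std::uint32_t* dst, const std::uint32_t* src,
                                    std::size_t n) {
    return std::async(std::launch::async,
                      [=] { std::memcpy(dst, src, n * sizeof(std::uint32_t)); });
}

// Counts occurrences of `needle` in `data`, scanning chunk by chunk while the
// next chunk is already being copied into the other buffer.
std::size_t double_buffered_count(const std::vector<std::uint32_t>& data,
                                  std::size_t chunk_len, std::uint32_t needle) {
    assert(chunk_len > 0);
    if (data.empty()) return 0;

    const std::size_t chunks = (data.size() + chunk_len - 1) / chunk_len;
    std::vector<std::uint32_t> buf[2] = {std::vector<std::uint32_t>(chunk_len),
                                         std::vector<std::uint32_t>(chunk_len)};
    // Length of chunk c (the last chunk may be shorter).
    auto len_of = [&](std::size_t c) {
        return std::min(chunk_len, data.size() - c * chunk_len);
    };

    std::size_t total = 0;
    std::future<void> pending = async_copy(buf[0].data(), data.data(), len_of(0));
    for (std::size_t c = 0; c < chunks; ++c) {
        pending.get();  // chunk c has now fully arrived in buf[c % 2]
        if (c + 1 < chunks) {
            // Start loading the next chunk into the other buffer ...
            pending = async_copy(buf[(c + 1) % 2].data(),
                                 data.data() + (c + 1) * chunk_len, len_of(c + 1));
        }
        // ... while the current buffer is being scanned.
        const std::vector<std::uint32_t>& cur = buf[c % 2];
        const auto end = cur.begin() + static_cast<std::ptrdiff_t>(len_of(c));
        total += static_cast<std::size_t>(std::count(cur.begin(), end, needle));
    }
    return total;
}
```

Calling `double_buffered_count(data, chunk_len, 5)` should give the same count as a plain `std::count` over `data`. The important point is that a buffer is only scanned after its own copy has been confirmed complete.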

However, we have noticed that the DMA-copied data in the local buffer does not appear to be stable: the results vary from one iteration to the next. The program in the attached file uses the approach described above to scan a large area of memory chunk by chunk, counting the occurrences of the number 5. It is also tested with data from multiple NUMA nodes, in this case 0, 2, and 5. At least on our test system, the result varies with each iteration:

[Data on 0]
 sequential I/OAT
  result: 16384
  result: 16074
  result: 16388
  result: 16335
  result: 16168
[Data on 2]
 sequential I/OAT
  result: 1496
  result: 16330
  result: 16233
  result: 16211
  result: 16388
[Data on 5]
 sequential I/OAT
  result: 16081
  result: 16150
  result: 16104
  result: 16211
  result: 16066

The attached file can be compiled by adding it as an executable target to the "ex4" directory of the blog post linked above. Alternatively, it should compile with the following command line (provided that the libraries and headers are available in the correct paths):

g++-7 -std=c++14 -fno-strict-aliasing -march=native -m64 -D_GNU_SOURCE -fPIC -fstack-protector -Wl,-z,relro,-z,now -Wl,-z,noexecstack ioat_inconsistencies.cpp -o bug_report -lnuma -lspdk_ioat -lspdk_util -lspdk_env_dpdk -lspdk_log -lrt -ldl -lrte_eal -lrte_mempool -lrte_ring -pthread

 

Can somebody confirm this inconsistent behavior on their machine? Or am I missing something about how the DMA memory works in general?

Thanks in advance for any help

Timo

DANIEL_V_Intel
Employee

Hi Timo,

In your test code, you use copy_done[idx] as a flag to determine when a copy is complete, but the flag is never cleared. After the first two iterations of the chunk_count loop, copy_done[0] and copy_done[1] will already be 1 by the time you call spdk_ioat_submit_copy(), so the spdk_ioat_process_events() loop will probably exit early (before the copy is done). The scan then reads a buffer that the DMA engine may still be writing to.

Re-initializing copy_done[idx] to 0 before calling spdk_ioat_submit_copy() makes the test code work for me.
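The effect can be reproduced without any hardware. The sketch below is only an illustration, not the attached test code: `MockDmaChannel` is a hypothetical single-threaded stand-in loosely modeled on the submit-then-poll pattern, and `scan_chunks`, `mark_done`, and the `reset_flag` switch are made-up names. Only `copy_done` and `idx` are taken from the discussion above.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <deque>
#include <vector>

// Hypothetical mock of a DMA channel: copies are queued on submit and only
// carried out (one per call) when the caller polls for events.
class MockDmaChannel {
public:
    using Callback = void (*)(void* arg);

    void submit_copy(void* cb_arg, Callback cb, void* dst, const void* src,
                     std::size_t bytes) {
        pending_.push_back(Request{cb_arg, cb, dst, src, bytes});
    }

    // Completes at most one queued copy and fires its completion callback.
    void process_events() {
        if (pending_.empty()) return;
        Request r = pending_.front();
        pending_.pop_front();
        std::memcpy(r.dst, r.src, r.bytes);
        r.cb(r.cb_arg);
    }

private:
    struct Request {
        void* cb_arg;
        Callback cb;
        void* dst;
        const void* src;
        std::size_t bytes;
    };
    std::deque<Request> pending_;
};

// Completion callback: sets the flag passed as cb_arg.
static void mark_done(void* arg) { *static_cast<bool*>(arg) = true; }

// Counts `needle` chunk by chunk through two alternating buffers.
// With reset_flag == false the stale flag lets the wait loop exit immediately.
std::size_t scan_chunks(const std::vector<int>& data, std::size_t chunk_len,
                        int needle, bool reset_flag) {
    assert(chunk_len > 0 && data.size() % chunk_len == 0);
    MockDmaChannel chan;
    std::vector<int> buf[2] = {std::vector<int>(chunk_len),
                               std::vector<int>(chunk_len)};
    bool copy_done[2] = {false, false};
    std::size_t total = 0;

    for (std::size_t c = 0; c * chunk_len < data.size(); ++c) {
        const std::size_t idx = c % 2;
        if (reset_flag) copy_done[idx] = false;  // the fix: clear before submit
        chan.submit_copy(&copy_done[idx], mark_done, buf[idx].data(),
                         data.data() + c * chunk_len, chunk_len * sizeof(int));
        while (!copy_done[idx]) chan.process_events();  // wait for completion
        total += static_cast<std::size_t>(
            std::count(buf[idx].begin(), buf[idx].end(), needle));
    }
    return total;
}
```

For the 16-element input `{5,0,0,0, 0,0,0,0, 5,5,0,0, 5,0,0,0}` with `chunk_len = 4`, `scan_chunks(data, 4, 5, true)` returns the correct count of 4, while `scan_chunks(data, 4, 5, false)` returns 2, because chunks 2 and 3 are never copied before the buffers are scanned again. In this mock the wrong result is deterministic; with a real DMA engine the copy is in flight during the scan, which would explain results that change from run to run.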

Timo_D_
Beginner

That was it, how embarrassing! Thank you very much.
