
DMA scans with I/OAT have inconsistent results


Hello everyone,

After reading this article about the I/OAT DMA engine, I have been trying to build an I/OAT-powered scan operation. The basic idea is to divide the input data into equal-sized chunks, which are copied one after another into one of two local chunk-sized buffers in alternating fashion. We use two buffers because we would eventually like to process one buffer while the DMA engine loads the other (and then swap them), overlapping data transfer and computation.

However, we have noticed that the DMA-copied data in the local buffer is not stable: the results vary from one iteration to the next. The program in the attached file uses the approach described above to scan a large area of memory chunk by chunk, counting the occurrences of the number 5. We also test with data placed on multiple NUMA nodes, in this case 0, 2, and 5. On our test system, at least, the output looks like this:

[Data on 0]
 sequential I/OAT
  result: 16384
  result: 16074
  result: 16388
  result: 16335
  result: 16168
[Data on 2]
 sequential I/OAT
  result: 1496
  result: 16330
  result: 16233
  result: 16211
  result: 16388
[Data on 5]
 sequential I/OAT
  result: 16081
  result: 16150
  result: 16104
  result: 16211
  result: 16066

The attached file can be compiled by adding it as an executable target to the "ex4" directory of the blog post linked above. Alternatively, it should compile with the following command line (provided the libraries and headers are available in the correct paths):

g++-7 -std=c++14 -fno-strict-aliasing -march=native -m64 -D_GNU_SOURCE -fPIC -fstack-protector -Wl,-z,relro,-z,now -Wl,-z,noexecstack ioat_inconsistencies.cpp -o bug_report -lnuma -lspdk_ioat -lspdk_util -lspdk_env_dpdk -lspdk_log -lrt -ldl -lrte_eal -lrte_mempool -lrte_ring -pthread


Can somebody confirm this inconsistent behavior on their machine? Or am I missing something about how the DMA memory works in general?

Thanks in advance for any help



Hi Timo,

In your test code, you use copy_done[idx] as a flag to determine when a copy is complete. After the first two iterations of the chunk_count loop, however, copy_done[0] and copy_done[1] are already 1 by the time you call spdk_ioat_submit_copy(), so the spdk_ioat_process_events() loop will probably exit early, before the copy is done.

Re-initializing copy_done[idx] to 0 before calling spdk_ioat_submit_copy() makes the test code work for me.
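For anyone who finds this thread later, here is a minimal sketch of the failure mode. It does not use SPDK at all: MockDma, submit(), poll(), and scan() are hypothetical stand-ins I made up for illustration, and the mock completes one queued copy per poll() call, which is only a rough approximation of how a polled completion queue behaves. The reset_flag parameter switches between the original behavior and the fix.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <deque>
#include <vector>

// Hypothetical stand-in for an asynchronous DMA channel (NOT the SPDK API).
struct MockDma {
    struct Request {
        std::uint32_t* dst;
        const std::uint32_t* src;
        std::size_t count;
        bool* done;
    };
    std::deque<Request> pending;

    // Queue a copy; nothing is copied until poll() is called.
    void submit(std::uint32_t* dst, const std::uint32_t* src,
                std::size_t count, bool* done) {
        pending.push_back({dst, src, count, done});
    }

    // Complete at most one queued copy and raise its completion flag.
    void poll() {
        if (pending.empty()) return;
        Request r = pending.front();
        pending.pop_front();
        std::memcpy(r.dst, r.src, r.count * sizeof(std::uint32_t));
        *r.done = true;
    }
};

// Count occurrences of 5, copying chunk by chunk into two alternating buffers.
std::size_t scan(const std::vector<std::uint32_t>& data, std::size_t chunk,
                 bool reset_flag) {
    assert(chunk > 0);
    MockDma dma;
    std::vector<std::uint32_t> buf[2] = {std::vector<std::uint32_t>(chunk),
                                         std::vector<std::uint32_t>(chunk)};
    bool copy_done[2] = {false, false};
    std::size_t hits = 0;

    for (std::size_t c = 0; c * chunk < data.size(); ++c) {
        const std::size_t idx = c % 2;
        const std::size_t n = std::min(chunk, data.size() - c * chunk);

        // The fix: clear the flag before every submission. Without this,
        // the flag is still set from the previous use of this buffer.
        if (reset_flag) copy_done[idx] = false;

        dma.submit(buf[idx].data(), data.data() + c * chunk, n, &copy_done[idx]);

        // With a stale flag this loop exits at once and the buffer below
        // still holds data from an earlier chunk.
        while (!copy_done[idx]) dma.poll();

        hits += static_cast<std::size_t>(
            std::count(buf[idx].begin(), buf[idx].begin() + n, 5u));
    }
    return hits;
}
```

With reset_flag set to true every chunk is copied before it is counted. With reset_flag set to false, only the first two chunks are ever waited for, and every later iteration counts whatever is left in the buffer. On real hardware the copy may or may not have landed by then, which would fit the run-to-run variation in the output above.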


That was it, how embarrassing! Thank you very much.