We have implemented a heterogeneous system with an Intel CPU and a Xilinx FPGA to meet 10 Gbps network data processing requirements. Everything is a custom design running in userspace; there are no kernel drivers. Each CPU has its own mmap'ed memory region; its virtual address is translated to a physical address, which is passed to the FPGA. The FPGA then writes data over PCIe directly to those physical addresses.
We tried to use Intel DDIO to inject the inbound I/O traffic into the LLC instead of main memory, but we came across a situation:
If there is packet flow on the network within ~3 seconds of the system being turned on, the FPGA writes from PCIe into the relevant memory and the system works without any problems (the PCM tool shows high LLC occupancy). But if the first message does not arrive within 3 seconds, the CPU can no longer read any value from the FPGA at all. Our theory: the inbound data from the FPGA is allocated into the LLC (write allocate) and then evicted to memory, and somehow the CPU's cache line is never invalidated, so the CPU never sees the values the FPGA wrote.
We would be very pleased if you could point us to a related document, datasheet, tool, etc. that would help our understanding.
Unfortunately I gave wrong information: the issue has nothing to do with a timeout. As I now understand it, the application starts a thread on a different CPU after 3 seconds, and that thread evicts the LLC space used for FPGA communication. From what I've read, this is called the Leaky DMA problem. So the question becomes: can another CPU be prevented from allocating into the LLC space reserved for FPGA communication?
Thanks in advance