Intel® Quartus® Prime Software

Issues on PCI Transfers and Kernels Execution Overlapping.

Altera_Forum

Hi everybody,

We have a piece of software that uses a double-buffering pipeline to overlap FPGA execution, PCI transfers, and CPU-side computation.

The software is very throughput-intensive: in each cycle of the pipeline we perform many PCI transfers and execute many kernels on the FPGA.

Ideally, we would keep both the PCI bus and the FPGA as busy as possible. We know that our FPGA board has only one DMA engine, so transfers to and from the FPGA will be serialized, but we can live with that.

However, we see a few issues:

1. Sometimes, while a PCI transfer is in flight, no kernel execution is scheduled on the FPGA until the transfer has completed. Note that the two tasks are independent, and the completion of one should not stall the execution of the other. (See the red rectangles in the attached first.png file.)

2. The reverse also happens: no PCI transfer is executed until the kernels currently running on the FPGA have finished. (See the orange rectangle in the attached first.png file.)

 

3. [The major problem] After a few hundred cycles (thousands of kernels and PCI transfers), time gaps start to appear between one task and the next. These gaps get worse and worse over time and eventually become unacceptable: a cycle ends up lasting around 30 seconds instead of the usual 3.5 to 4 seconds. Admittedly, the software runs for about an hour and performs hundreds of thousands of PCI transfers and kernel executions, but we believe this should not happen, and it does not occur when using a GPU.

We have no idea whether this is a problem with the driver or with the FPGA itself, but we suspect something like "the driver keeps a list of events/objects and has to traverse the whole list every time, so it gets slower and slower the more tasks you issue." (See the purple rectangle in the attached second.png file.)
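
To make the hypothesis above concrete, here is a toy model in C. It is not the driver's code and nothing in it is taken from the Intel FPGA runtime: the names `tracker`, `enqueue_command`, and `total_scan_cost` are made up for illustration, and so is the assumption that every command completes immediately. It only shows how per-command cost would grow if a runtime scanned a list of tracked commands on every enqueue and never dropped the completed ones.

```c
#include <stddef.h>
#include <stdlib.h>

/* Toy model (assumption): one list node per command ever enqueued. */
typedef struct node {
    struct node *next;
    int done;               /* 1 once the command has completed */
} node;

typedef struct {
    node *head;
    size_t length;
} tracker;

/* Enqueue one command and return how many list nodes were visited.
 * With prune == 0, completed entries are never freed, so each scan
 * covers every command issued so far. */
size_t enqueue_command(tracker *t, int prune)
{
    size_t visited = 0;
    node **link = &t->head;

    while (*link) {
        node *cur = *link;
        visited++;
        if (prune && cur->done) {
            *link = cur->next;      /* unlink and free a completed entry */
            free(cur);
            t->length--;
        } else {
            link = &cur->next;
        }
    }

    node *n = malloc(sizeof *n);
    if (!n)
        return visited;
    n->next = NULL;
    n->done = 1;                    /* toy assumption: completes at once */
    *link = n;                      /* append at the tail */
    t->length++;
    return visited;
}

/* Total number of nodes visited while issuing n commands. */
size_t total_scan_cost(size_t n, int prune)
{
    tracker t = { NULL, 0 };
    size_t total = 0;

    for (size_t i = 0; i < n; i++)
        total += enqueue_command(&t, prune);

    while (t.head) {                /* clean up whatever is left */
        node *next = t.head->next;
        free(t.head);
        t.head = next;
    }
    return total;
}
```

In this model, issuing 1000 commands costs 499,500 node visits when nothing is pruned and 999 when completed entries are dropped. Quadratic growth of this kind would produce gaps that widen over a long run, which is consistent with the symptom, though it does not prove that the driver behaves this way.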

 

 

We would like to know whether any of you have experienced similar problems.

Any ideas, comments, or suggestions are welcome!

OS: CentOS 7.4 

FPGA board: Nallatech 510T 

Software environment: Intel FPGA SDK for OpenCL v17.1 build 270 and BSP R001.005.0004
Altera_Forum

Update 1:

Following our intuition about problem #3, we found an easy workaround: if we release and recreate all the command queues (clReleaseCommandQueue() and clCreateCommandQueue()) at every cycle of our pipeline, the problem disappears.

We have not yet been able to check whether this affects other boards; we plan to do so later this week.
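
For anyone who wants to try the same workaround, a minimal sketch of the per-cycle recycling follows. So that it compiles without the OpenCL headers or a device, queue creation and release go through two hypothetical callbacks. In real host code the create callback would wrap `clCreateCommandQueue(context, device, properties, &err)`, and the release callback would wrap `clFinish(queue)` followed by `clReleaseCommandQueue(queue)`. The names `queue_pool`, `recycle_queues`, `stub_stats`, and `live_queues_after` are assumptions, and so are the use of two queues and the choice to drain each queue before releasing it. None of these come from the post.

```c
#include <stddef.h>

/* Hypothetical callbacks standing in for the OpenCL calls:
 *   create  -> clCreateCommandQueue(context, device, properties, &err)
 *   release -> clFinish(queue); clReleaseCommandQueue(queue);          */
typedef void *(*queue_create_fn)(void *user);
typedef void  (*queue_release_fn)(void *queue, void *user);

typedef struct {
    void **queues;              /* caller-owned array of queue handles */
    size_t count;
    queue_create_fn create;
    queue_release_fn release;
    void *user;                 /* e.g. a struct holding context and device */
} queue_pool;

/* Release every existing queue and create a fresh one in its place.
 * Returns 0 on success, -1 if a queue could not be recreated. */
int recycle_queues(queue_pool *p)
{
    for (size_t i = 0; i < p->count; i++) {
        if (p->queues[i]) {
            p->release(p->queues[i], p->user);
            p->queues[i] = NULL;
        }
        p->queues[i] = p->create(p->user);
        if (!p->queues[i])
            return -1;
    }
    return 0;
}

/* Counting stubs, only to exercise the sketch without a device. */
typedef struct { int created; int released; } stub_stats;

static void *stub_create(void *user)
{
    stub_stats *s = user;
    s->created++;
    return s;                   /* any non-NULL handle will do here */
}

static void stub_release(void *queue, void *user)
{
    (void)queue;
    ((stub_stats *)user)->released++;
}

/* Run `cycles` pipeline cycles over two queues (assumed: one for
 * transfers, one for kernels) and return how many queues are alive. */
int live_queues_after(int cycles, stub_stats *s)
{
    void *handles[2] = { NULL, NULL };
    queue_pool p = { handles, 2, stub_create, stub_release, s };

    for (int c = 0; c < cycles; c++) {
        if (recycle_queues(&p) != 0)
            return -1;
        /* ... enqueue this cycle's transfers and kernels here ... */
    }
    return s->created - s->released;
}
```

The point of the sketch is that the number of live queues stays constant no matter how many cycles run, so any per-queue bookkeeping in the runtime is discarded every cycle. Draining a queue before releasing it keeps the cycle boundary explicit, at the cost of a synchronization point each cycle.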