- thread 1 (feeder): has a loop that feeds data into OpenCL and queues kernels
- thread 2 (consumer): waits for results and reads output data.
- an OpenCL Host command queue with out-of-order execution enabled
- what might be causing this?
- how to debug this on the Yocto platform?
I have attached a reproducer for this issue along with the text output it produces (source code plus the program's output). Compiled with: gcc gpu_issue.c -o gpu_issue -L ./opt/intel/opencl -lOpenCL
Note that there is no output after the call to map buffer 2. If I modify the code to select a CPU device then the call to map buffer 2 succeeds.
Thank you for your report, I can confirm this is a GPU driver problem.
We are looking into possible solutions for it, so it may be difficult to provide a timeline for the fix at the moment.
In the meantime, if you could provide more information about what you want to accomplish, I may be able to suggest another solution for your use case.
Thanks for the information and the offer to help us find a way to work around the issue. Below is an outline of the processing constraints we need to satisfy.
Our system processes a real-time data stream on a very short time cycle (in the millisecond region), so low-latency processing is as important as raw speed. To allow for varying processing latency we use multiple input and output buffers in a cyclic fashion. Also, on the OpenCL implementation we used on another platform (ARM Mali), we found we could reduce latency by queueing tasks ahead and using the out-of-order queueing feature.
Would you be able to give us more information about the issue and what we must avoid doing?
What is happening in the code is that MapBuffer is called with the blocking flag set to CL_TRUE on an out-of-order queue.
There are no events in the wait list, so to the driver this means: map this buffer for me now, even if it may still be in use by the GPU. Please confirm that this is what you intended.
If you do want such immediate access to the buffer storage and synchronization is not needed, then you can use a second out-of-order queue on which this MapBuffer operation will actually happen. Currently the driver improperly waits for the blocked unmap operation to complete before servicing the MapBuffer call; this wait shouldn't be present on an out-of-order queue.
If you want to actually synchronize with the previous unmap call, then the code should use events.
Thanks Michal. Yes, the "map now" behaviour is intended. Now that I understand what the issue is, I have been able to reorder a couple of things to avoid this problem, so I have our application running on the GPU. Unfortunately it's not yet fast enough, but I'll open a new topic to ask for help with that.