I am using mmap to share system memory between processes, but it seems OpenCL doesn't know how to use this memory.
A simple test program is attached. If clEnqueueWriteBuffer uses mmap'ed memory, it fails, but everything works fine with any other memory, such as aligned_
Any help is appreciated.
Thank you for this report. We will be glad to investigate, but first I'm wondering if there is a reason to use clEnqueueWriteBuffer instead of clEnqueueMapBuffer. Since Intel GPUs share physical memory with the CPUs a separate write (copy) is often not necessary. Performance could be better with map/unmap instead.
Thanks for your reply. I know it should be more efficient with other functions such as clCreateBuffer(...CL_MEM_USE_HOST_PTR...), but they all failed. This program was written to illustrate the memory problem. I suspect something is wrong with how OpenCL gets the physical memory address from an mmap'ed memory pointer.
This change worked for me, from:
char* vaddr = (char*)mmap(NULL, st.st_size, PROT_READ, MAP_SHARED, fd, 0);
to:
char* vaddr = (char*)mmap(NULL, st.st_size, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0);
Thanks for your help. My source does use PROT_READ|PROT_WRITE in the mmap call. I also tried PROT_READ alone and the latest SRB4.1_linux64.zip; all failed. I also tried a different Linux kernel, linux-4.10.8.tar.xz, instead of linux-4.4.6; that made no difference. Or is the CPU not supported correctly? I am using a NUC with a 6100U. I tried a NUC7i3BNH too, with the same result.
Anything else I can try?
I built a CentOS system as MediaServerStudioEssentials2017R2 requires, which should be the golden reference, but it still failed.
Are you using
err = clEnqueueWriteBuffer(queue, input, CL_TRUE, 0, 4096, vaddr, 0, NULL, NULL);
or
err = clEnqueueWriteBuffer(queue, input, CL_TRUE, 0, 4096, test, 0, NULL, NULL);
when you see the failure?
The processor, OS, and install procedure you are using should be fine. To double check, you could try some of our other samples.
Hopefully I've reproduced what you are seeing now.
int fd = open("/dev/mem", O_RDWR);
char* vaddr = (char*)mmap(NULL, MAP_SIZE, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
...
err = clEnqueueWriteBuffer(queue, input, CL_TRUE, 0, 4096, vaddr, 0, NULL, NULL);
Platform: Intel(R) OpenCL
Device: Intel(R) HD Graphics
Error: Failed to write to source array!
The error code value is -5 (CL_OUT_OF_RESOURCES).
Is this the same failure you're seeing?
By the way, if I mmap a regular file instead of /dev/mem I do not see the error -- clEnqueueWriteBuffer completes with CL_SUCCESS.
Our developers are looking at this case too. I'll let you know when there are updates.
Our findings so far:
- mmap for regular files seems to work as expected, no issues
- mmap for /dev/mem is a special case that is not supported by the current design.
Could you tell us a bit more about why you need to work with /dev/mem? There has been extensive discussion on this topic. It would be difficult to add support for /dev/mem in a general way, and it is not likely to perform well. However, with more info on what you hope to do we may be able to find a workaround you could implement in your application.
We don't work with /dev/mem itself, but in a similar way. We have a PCI Express device that captures video frames and transfers them to system memory. Our Linux driver allocates contiguous physical memory so the hardware knows the DMA address. When frame data arrives, our applications can process it without a time-consuming copy by using mmap. Since the applications share the data memory, we can chain processing stages easily. We have a frame-processing algorithm that works fine on the CPU, but its performance is relatively low, which is why we want OpenCL. Currently we have to copy the data into an application-allocated buffer to run OpenCL and copy it back when finished, which hurts performance. Any advice on how to share memory between the hardware and applications on Linux would be appreciated.
Thanks for these additional details. It is surprising to find that this case does not work. If there is any additional info you can give us about how the driver allocates and shares the memory this could help. Kernel system calls could be especially useful. Please feel free to move to private messages if any of this could be sensitive information.
As another path, this approach for memory transfers may be faster:
- Use CL_MEM_ALLOC_HOST_PTR in clCreateBuffer to let the runtime allocate memory in a way that enables zero copy for you. A description of the different allocation modes is here. This pointer will not change for the lifetime of the cl_mem buffer.
- Use clEnqueueMapBuffer to obtain a CPU pointer to the driver-allocated memory. For proper synchronization, just make sure the buffer is in this mapped state when the camera uploads the data.
- Whenever the buffer needs to be used by the GPU, call clEnqueueUnmapMemObject.
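As a sketch, the three steps above might look like this in host code; ctx, queue, and FRAME_SIZE are assumed to already exist in your application, and error checking is trimmed:

```
cl_int err;

/* 1. Let the runtime allocate memory in a zero-copy-friendly way. */
cl_mem buf = clCreateBuffer(ctx, CL_MEM_ALLOC_HOST_PTR | CL_MEM_READ_ONLY,
                            FRAME_SIZE, NULL, &err);

/* 2. Map to get a stable CPU pointer; the capture side fills it
      while the buffer is in the mapped state. */
void *host = clEnqueueMapBuffer(queue, buf, CL_TRUE, CL_MAP_WRITE,
                                0, FRAME_SIZE, 0, NULL, NULL, &err);
/* ... capture driver / camera writes the frame into 'host' ... */

/* 3. Unmap before the GPU touches the buffer; on integrated GPUs
      this should not require a copy. */
err = clEnqueueUnmapMemObject(queue, buf, host, 0, NULL, NULL);
/* ... enqueue kernels that read 'buf' ... */
```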
Does this approach still fail if you use CL_MEM_ALLOC_HOST_PTR instead of CL_MEM_USE_HOST_PTR?
We are using a kernel parameter to reserve physical memory.
We also tried CMA but got the same result. It's common practice for device drivers to use dma_alloc_coherent to allocate a small piece of memory, but that also failed in our test.
Our driver uses request_mem_region to claim the reserved memory. When an application opens our device and calls mmap, the driver uses remap_pfn_range to map the physical memory into user space. It works similarly to /dev/mem; the source can be found in the Linux kernel tree.
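Reserving physical memory with a boot parameter is typically done with memmap=; a sketch, where the size and address are purely illustrative (not necessarily what was used here):

```
# /etc/default/grub -- hide 16 MiB at 0x20000000 from the kernel allocator
# so a driver can later claim it with request_mem_region().
# Note the escaped '$' inside the GRUB config file.
GRUB_CMDLINE_LINUX="memmap=16M\$0x20000000"
```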
I read the article about CL_MEM_ALLOC_HOST_PTR. It seems clCreateBuffer only allocates memory in virtual address space. It's hard to share this memory with hardware that requires non-pageable, contiguous physical memory, and I don't know how to share this memory between different user-space programs.
The dev team has considered this case from many angles. Unfortunately the conclusion is that this scenario is not supported and there is no quick path to adding support for this scenario. For now your approach of making a copy may be the best you can do. If you are not already, making sure the buffer you are copying into is aligned may help -- but a copy will still be required.
Thanks for your details and patience as we worked through this case. Even though I can't make any promises or give you any timelines, the team is very interested in exploring ways to access data directly from a device like this in the future.