I am using mmap to share system memory between processes, but it seems OpenCL doesn't know how to use this memory.
A simple test program is attached. If clEnqueueWriteBuffer uses mmap'ed memory, it fails, but everything works fine with any other memory, such as aligned_
Any help is appreciated.
Thank you for this report. We will be glad to investigate, but first I'm wondering if there is a reason to use clEnqueueWriteBuffer instead of clEnqueueMapBuffer. Since Intel GPUs share physical memory with the CPUs a separate write (copy) is often not necessary. Performance could be better with map/unmap instead.
Thanks for your reply. I know it should be more efficient with other functions such as clCreateBuffer(...CL_MEM_USE_HOST_PTR...), but they all failed. This program was written to illustrate the memory problem. I suspect something is wrong with how OpenCL gets the physical memory address from an mmap'ed memory pointer.
This change worked for me, from:
char* vaddr = (char*)mmap(NULL, st.st_size, PROT_READ, MAP_SHARED, fd, 0);
to:
char* vaddr = (char*)mmap(NULL, st.st_size, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0);
Thanks for your help. My source does use PROT_READ|PROT_WRITE in the mmap call. I also tried PROT_READ alone and the latest SRB4.1_linux64.zip; all failed. I also tried a different Linux kernel, linux-4.10.8.tar.xz, instead of linux-4.4.6; that made no difference. Or is the CPU not supported correctly? I am using a NUC with a 6100U. I tried a NUC7i3BNH too, with the same result.
Anything else I can try?
I built a CentOS system as MediaServerStudioEssentials2017R2 requires, which should be the golden reference, but it still failed.
Are you using
err = clEnqueueWriteBuffer(queue, input, CL_TRUE, 0, 4096, vaddr, 0, NULL, NULL);
or
err = clEnqueueWriteBuffer(queue, input, CL_TRUE, 0, 4096, test, 0, NULL, NULL);
when you see the failure?
The processor, OS, and install procedure you are using should be fine. To double check, you could try some of our other samples.
Hopefully I've reproduced what you are seeing now.
int fd = open("/dev/mem", O_RDWR);
char* vaddr = (char*)mmap(NULL, MAP_SIZE, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
...
err = clEnqueueWriteBuffer(queue, input, CL_TRUE, 0, 4096, vaddr, 0, NULL, NULL);
Platform: Intel(R) OpenCL
Device: Intel(R) HD Graphics
Error: Failed to write to source array!
The error code value is -5 (CL_OUT_OF_RESOURCES).
Is this the same failure you're seeing?
By the way, if I mmap a regular file instead of /dev/mem I do not see the error -- clEnqueueWriteBuffer completes with CL_SUCCESS.
Our developers are looking at this case too. I'll let you know when there are updates.
Our findings so far:
- mmap for regular files seems to work as expected, no issues
- mmap for /dev/mem is a special case that is not supported by the current design.
Could you tell us a bit more about why you need to work with /dev/mem? There has been extensive discussion on this topic. It would be difficult to add support for /dev/mem in a general way, and it is not likely to perform well. However, with more info on what you hope to do we may be able to find a workaround you could implement in your application.
We don't work with /dev/mem itself, but in a similar way. We have a PCI Express device that captures video frames and transfers them to system memory. Our Linux driver allocates contiguous physical memory so the hardware knows the DMA address. When frame data arrives, our applications can process it without a time-consuming copy by using mmap. Since the applications share the data memory, we can chain processing stages easily. We have a frame-processing algorithm that works fine on the CPU, but its performance is relatively low, which is why we want OpenCL. Currently we have to copy the data into an application-allocated buffer to run OpenCL and copy it back when finished, which hurts performance. Any advice on how to share memory between the hardware and applications on Linux would be appreciated.
Thanks for these additional details. It is surprising to find that this case does not work. If there is any additional info you can give us about how the driver allocates and shares the memory this could help. Kernel system calls could be especially useful. Please feel free to move to private messages if any of this could be sensitive information.
As another path, this approach for memory transfers may be faster:
- Use CL_MEM_ALLOC_HOST_PTR in clCreateBuffer to let the runtime allocate memory in a way that enables zero copy for you. A description of the different allocation modes is here. This pointer will not change for the lifetime of the cl_mem buffer.
- Use clEnqueueMapBuffer to obtain a CPU pointer to the driver-allocated memory. For proper synchronization, just make sure the buffer is in this mapped state when the camera uploads the data.
- Whenever the buffer needs to be used by the GPU, call clEnqueueUnmapMemObject.
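As a sketch, the three steps above might look like this in host code; ctx, queue, and FRAME_SIZE are assumed to already exist in your application, and error checking is trimmed:

```
cl_int err;

/* 1. Let the runtime allocate memory in a zero-copy-friendly way. */
cl_mem buf = clCreateBuffer(ctx, CL_MEM_ALLOC_HOST_PTR | CL_MEM_READ_ONLY,
                            FRAME_SIZE, NULL, &err);

/* 2. Map to get a stable CPU pointer; the capture side fills it
      while the buffer is in the mapped state. */
void *host = clEnqueueMapBuffer(queue, buf, CL_TRUE, CL_MAP_WRITE,
                                0, FRAME_SIZE, 0, NULL, NULL, &err);
/* ... capture driver / camera writes the frame into 'host' ... */

/* 3. Unmap before the GPU touches the buffer; on integrated GPUs
      this should not require a copy. */
err = clEnqueueUnmapMemObject(queue, buf, host, 0, NULL, NULL);
/* ... enqueue kernels that read 'buf' ... */
```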
Does this approach still fail if you use CL_MEM_ALLOC_HOST_PTR instead of CL_MEM_USE_HOST_PTR?
We are using a kernel parameter to reserve physical memory.
We also tried CMA but got the same result. It's common practice for device drivers to use dma_alloc_coherent to allocate a small piece of memory, but that also failed in our test.
Our driver uses request_mem_region to claim the reserved memory. When an application opens our device and calls mmap, the driver uses remap_pfn_range to map the physical memory into user space. It works similarly to /dev/mem; the source can be found in the Linux kernel tree.
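Reserving physical memory with a boot parameter is typically done with memmap=; a sketch, where the size and address are purely illustrative (not necessarily what was used here):

```
# /etc/default/grub -- hide 16 MiB at 0x20000000 from the kernel allocator
# so a driver can later claim it with request_mem_region().
# Note the escaped '$' inside the GRUB config file.
GRUB_CMDLINE_LINUX="memmap=16M\$0x20000000"
```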
I read the article about CL_MEM_ALLOC_HOST_PTR. It seems clCreateBuffer only allocates memory in virtual address space. It's hard to share this memory with hardware that requires non-pageable, contiguous physical memory, and I don't know how to share this memory between different user-space programs.
The dev team has considered this case from many angles. Unfortunately the conclusion is that this scenario is not supported and there is no quick path to adding support for this scenario. For now your approach of making a copy may be the best you can do. If you are not already, making sure the buffer you are copying into is aligned may help -- but a copy will still be required.
Thanks for your details and patience as we worked through this case. Even though I can't make any promises or give you any timelines, the team is very interested in exploring ways to access data directly from a device like this in the future.