OpenCL* for CPU
Ask questions and share information on Intel® SDK for OpenCL™ Applications and OpenCL™ implementations for Intel® CPU
Announcements
This forum covers OpenCL* for CPU only. OpenCL* for GPU questions can be asked in the GPU Compute Software forum. Intel® FPGA SDK for OpenCL™ questions can be asked in the FPGA Intel® High Level Design forum.

failed to access mmap memory

cheng_z_
Beginner
519 Views

I am using mmap to share system memory between processes, but it seems OpenCL doesn't know how to use this memory.

A simple test program is attached. If clEnqueueWriteBuffer is given mmapped memory, it fails, but everything works fine with any other memory, such as a buffer from aligned_alloc or the stack.

Any help is appreciated.
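The attached test program is not shown in the thread, but the cross-process sharing setup being described can be sketched roughly as follows (the file name, sizes, and function name are illustrative, not from the original program):

```c
/* Hypothetical sketch of the kind of cross-process mmap sharing described
 * above: a parent and a forked child share one MAP_SHARED file-backed
 * region; the child writes a byte and the parent observes it. */
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/wait.h>
#include <unistd.h>

/* Returns the byte the child wrote, or -1 on failure. */
int shared_mmap_demo(void)
{
    char path[] = "/tmp/mmap_demo_XXXXXX";
    int fd = mkstemp(path);
    if (fd < 0) return -1;
    unlink(path);                       /* file vanishes when fd closes */
    if (ftruncate(fd, 4096) != 0) return -1;

    unsigned char *vaddr = mmap(NULL, 4096, PROT_READ | PROT_WRITE,
                                MAP_SHARED, fd, 0);
    if (vaddr == MAP_FAILED) return -1;

    pid_t pid = fork();
    if (pid == 0) {                     /* child: producer */
        vaddr[0] = 42;
        _exit(0);
    }
    waitpid(pid, NULL, 0);              /* parent: consumer */
    int result = vaddr[0];
    munmap(vaddr, 4096);
    close(fd);
    return result;                      /* 42 if the mapping is shared */
}
```

A pointer like `vaddr` here is what the test program then hands to clEnqueueWriteBuffer.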

16 Replies
Jeffrey_M_Intel1
Employee

Thank you for this report.  We will be glad to investigate, but first I'm wondering whether there is a reason to use clEnqueueWriteBuffer instead of clEnqueueMapBuffer.  Since Intel GPUs share physical memory with the CPU, a separate write (copy) is often not necessary, and performance could be better with map/unmap.

cheng_z_
Beginner

Hi Jeffrey,

Thanks for your reply. I know it should be more efficient with other functions such as clCreateBuffer(...CL_MEM_USE_HOST_PTR...), but they all fail. This program was written to illustrate the memory problem. I think there must be something wrong with how OpenCL gets the physical memory address from an mmapped pointer.

Cheng_Z_2
Beginner

Can anyone help on this issue?

Jeffrey_M_Intel1
Employee

Sorry for the delay. I am running some more tests to understand what is going on.  I'll get back to you with an update tomorrow.

Jeffrey_M_Intel1
Employee

This change worked for me:

char* vaddr = (char*)mmap(NULL, st.st_size, PROT_READ, MAP_SHARED, fd, 0);

to

char* vaddr = (char*)mmap(NULL, st.st_size, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0);


Cheng_Z_2
Beginner

Hi Jeffrey,

Thanks for your help. My source already uses PROT_READ|PROT_WRITE for mmap. I also tried PROT_READ alone, and the latest SRB4.1_linux64.zip; all failed. I have also tried a different Linux kernel, linux-4.10.8.tar.xz instead of linux-4.4.6, which made no difference. Could the CPU be unsupported? I am using a NUC with a 6100U, and I tried a NUC7i3BNH as well, with the same result.

Anything else I can try?

Cheng_Z_2
Beginner

Hi Jeffrey,

I built a CentOS system as MediaServerStudioEssentials2017R2 requires, which should be the golden reference, but it still failed.

Are you using 

err = clEnqueueWriteBuffer(queue, input, CL_TRUE, 0, 4096, vaddr, 0, NULL, NULL);

instead of

err = clEnqueueWriteBuffer(queue, input, CL_TRUE, 0, 4096, test, 0, NULL, NULL);


Jeffrey_M_Intel1
Employee

The processor, OS, and install procedure you are using should be fine.  To double check, you could try some of our other samples.

Hopefully I've reproduced what you are seeing now.

With

int fd = open("/dev/mem", O_RDWR);
char* vaddr = (char*)mmap(NULL, MAP_SIZE, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
...
err = clEnqueueWriteBuffer(queue, input, CL_TRUE, 0, 4096, vaddr, 0, NULL, NULL);

I see

Platform: Intel(R) OpenCL
Device: Intel(R) HD Graphics
Error: Failed to write to source array!

The error code value is -5 (CL_OUT_OF_RESOURCES). Is this the same failure you're seeing?

By the way, if I mmap a regular file instead of /dev/mem I do not see the error -- clEnqueueWriteBuffer completes with CL_SUCCESS.

Our developers are looking at this case too.  I'll let you know when there are updates.


Cheng_Z_2
Beginner

Hi Jeffrey,

The error is what I'm seeing.

The samples work fine on my current platform.

Looking forward to hearing from you.

Jeffrey_M_Intel1
Employee

Our findings so far:

  • mmap for regular files seems to work as expected, no issues
  • mmap for /dev/mem is a special case, not supported by the current design.

Could you tell us a bit more about why you need to work with /dev/mem?  There has been extensive discussion on this topic.  It would be difficult to add support for /dev/mem in a general way, and it is not likely to perform well.  However, with more info on what you hope to do we may be able to find a workaround you could implement in your application.

Cheng_Z_2
Beginner

Hi Jeffrey,

We don't work with /dev/mem directly, but in a similar way. We have a PCI Express device which captures video frames and transfers them to system memory. Our Linux driver allocates contiguous physical memory so that the hardware knows the DMA address. When frame data is received, our applications can process the data without a time-consuming copy by using mmap. Since the applications share the data memory, we can chain data-processing stages easily. We have a frame-processing algorithm that works fine on the CPU, but its performance is relatively low; that's why we use OpenCL. Currently we have to copy the data into an application-allocated buffer to run OpenCL, and copy it back when finished, which decreases performance. Any advice on how to share memory between the hardware and applications on Linux would be appreciated.

BR

Jeffrey_M_Intel1
Employee

Thanks for these additional details.  It is surprising to find that this case does not work.  If there is any additional info you can give us about how the driver allocates and shares the memory this could help.  Kernel system calls could be especially useful.  Please feel free to move to private messages if any of this could be sensitive information.

As another path, this approach for memory transfers may be faster:

  1. Use CL_MEM_ALLOC_HOST_PTR in clCreateBuffer to let the runtime allocate memory in a way that enables zero copy for you.  A description of the different allocation modes is here.  This pointer will not change for the lifetime of the cl_mem buffer.
  2. Use clEnqueueMapBuffer to obtain a CPU pointer to the driver-allocated memory.  For proper synchronization, make sure the buffer is in this mapped state when the camera uploads the data.
  3. Whenever the buffer needs to be used by the GPU, call clEnqueueUnmapMemObject.

Does this approach still fail if you use CL_MEM_ALLOC_HOST_PTR instead of CL_MEM_USE_HOST_PTR?

Cheng_Z_2
Beginner

Hi Jeffrey,

We are using a kernel parameter to reserve physical memory.

BOOT_IMAGE=/boot/vmlinuz-4.10.8 memmap=2000M$0x8000000

We also tried CMA but got the same result. It's common practice for device drivers to use dma_alloc_coherent to allocate a small piece of memory, but that also failed in our test.

Our driver uses request_mem_region to claim the reserved memory. When the application opens our device and calls mmap, our driver uses remap_pfn_range to map the physical memory into user space. It works similarly to /dev/mem, whose source can be found in the Linux kernel source.

I read the article about CL_MEM_ALLOC_HOST_PTR. It seems clCreateBuffer only allocates virtual memory. It's hard to share that memory with hardware, which requires non-pageable, contiguous physical memory, and I don't know how to share it between different user-space programs.

Jeffrey_M_Intel1
Employee

Thanks for the additional details.  The dev team is considering this helpful input.

Jeffrey_M_Intel1
Employee

The dev team has considered this case from many angles.  Unfortunately the conclusion is that this scenario is not supported, and there is no quick path to adding support for it.  For now, your approach of making a copy may be the best option.  If you are not already doing so, making sure the buffer you are copying into is aligned may help, but a copy will still be required.

Thanks for your details and patience as we worked through this case.  Even though I can't make any promises or give you a timeline, the team is very interested in exploring ways to access data directly from a device like this in the future.


Cheng_Z_2
Beginner

OK, I will continue with the memory copy. Please let me know if there is any progress. Thanks for your help.
