OpenCL* for CPU
Ask questions and share information on Intel® SDK for OpenCL™ Applications and OpenCL™ implementations for Intel® CPU.
Announcements
This forum covers OpenCL* for CPU only. OpenCL* for GPU questions can be asked in the GPU Compute Software forum. Intel® FPGA SDK for OpenCL™ questions can be asked in the FPGA Intel® High Level Design forum.

Max size error when creating a buffer with CL_MEM_ALLOC_HOST_PTR

If_09
Beginner
814 Views

Hello all,

The max mem alloc size (CL_DEVICE_MAX_MEM_ALLOC_SIZE) of my CPU device (i5-3470) is 4266006528 bytes (less than 4GB) and that of the GPU (HD 2500) is 425721856 bytes (less than 512MB).

Now I am creating a simple buffer: clInput = clCreateBuffer(context, CL_MEM_READ_ONLY | CL_MEM_ALLOC_HOST_PTR, sizeof(type) * elements, NULL, &err);

But I am getting CL_INVALID_BUFFER_SIZE for the GPU when the size reaches 512MB. This would have made sense if I were allocating the buffer in GPU memory. The only purpose of using the alloc_host_ptr flag was to use the max_mem_alloc size of the CPU, which is 4GB. Am I doing something wrong, or is it a bug?
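
For illustration, the size check implied by this question can be sketched in plain C. This is only a sketch under assumptions: fits_max_alloc is a hypothetical helper, not an OpenCL call, and max_alloc stands for whatever value clGetDeviceInfo reports for CL_DEVICE_MAX_MEM_ALLOC_SIZE on the device the buffer is created for.

```c
#include <stdint.h>

/* Hypothetical helper (not part of OpenCL): returns 1 if a buffer of
 * elem_size * elements bytes is no larger than max_alloc, else 0.
 * max_alloc is assumed to be the device's CL_DEVICE_MAX_MEM_ALLOC_SIZE. */
int fits_max_alloc(uint64_t elem_size, uint64_t elements, uint64_t max_alloc)
{
    /* Reject requests whose byte count would overflow 64 bits. */
    if (elem_size != 0 && elements > UINT64_MAX / elem_size)
        return 0;
    return elem_size * elements <= max_alloc;
}
```

With the numbers from this post, fits_max_alloc(1, 512ULL * 1024 * 1024, 425721856ULL) gives 0 for the GPU limit, while the same request checked against the CPU limit of 4266006528ULL gives 1.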

6 Replies
Raghupathi_M_Intel

I was able to reproduce the error. I will debug this and get back to you. Thanks for reporting.

Raghu

Raghupathi_M_Intel

Sorry, I checked my test program again. My CPU device reports CL_DEVICE_MAX_MEM_ALLOC_SIZE as 536838144, which is slightly less than 512MB, to be precise. If I pass this exact number as the size of my buffer, I don't get the error. Can you use gpucapsviewer to see what your CPU device reports for this number? Then try creating a buffer whose size is less than or equal to it, and let me know if you are still getting the error.
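
As a rough illustration of "less than or equal to this number", the largest element count that still fits can be computed as below. This is a sketch only: max_elements is a hypothetical helper name, and max_alloc is assumed to be the value the device reports for CL_DEVICE_MAX_MEM_ALLOC_SIZE.

```c
#include <stdint.h>

/* Hypothetical helper: largest number of elements of elem_size bytes whose
 * total size does not exceed max_alloc. Returns 0 when elem_size is 0. */
uint64_t max_elements(uint64_t max_alloc, uint64_t elem_size)
{
    if (elem_size == 0)
        return 0;
    return max_alloc / elem_size;  /* integer division rounds down */
}
```

For 4-byte elements and the 536838144-byte limit reported above, this gives 134209536 elements.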

I am curious why you got 4GB and I am getting 512MB.

Raghu

If_09
Beginner

Hi, thanks for the prompt response. I checked CL_DEVICE_MAX_MEM_ALLOC_SIZE for the CPU again and found the same 4266006528 bytes. I am reading this number in my OpenCL host code using clGetDeviceInfo(device_id, CL_DEVICE_MAX_MEM_ALLOC_SIZE, sizeof(tmpLong), &tmpLong, NULL) for the CPU device. I am even able to create 2GB buffers on the CPU (OpenCL).

I am not familiar with gpucapsviewer. I tried typing it at the command prompt but got an unrecognized command error.

Maxim_S_Intel
Employee

If 09 wrote:
the only purpose of using alloc_host_ptr flag was to use the max_mem_alloc size of cpu which is 4GB.

Hi,

The ALLOC_HOST_PTR flag just tells the runtime to mirror the memory allocation on the host. The actual buffer is being created on the device.

With ALLOC_HOST_PTR you take advantage of the up-front allocation: when you call clEnqueueMapBuffer, the mapped memory is already allocated, so you just get the pointer, which is fast. In contrast, if you call clEnqueueMapBuffer on a regular buffer, the runtime first needs to allocate the mapped memory. Finally, without mapping you would need to allocate the memory yourself and use clEnqueueWriteBuffer.

If you are not going to populate and/or read the buffer from the host code, you don't need any flags. Otherwise, the best way to avoid copying from the CL buffer into your internal structures (and back) is actually to use USE_HOST_PTR.

If_09
Beginner

Please correct me if I am wrong. According to my experimental results and my understanding, ALLOC_HOST_PTR allocates the buffer on the host, so the kernel accesses the buffer from host memory, and this access is very slow. USE_HOST_PTR, on the other hand, allocates the buffer on the device, so kernel execution was faster than when the buffer was created using ALLOC_HOST_PTR.
