Hello all,
The max mem alloc size (CL_DEVICE_MAX_MEM_ALLOC_SIZE) of my CPU device (i5-3470) is 4266006528 bytes (just under 4 GB), and that of my GPU (HD 2500) is 425721856 bytes (under 512 MB).
I am creating a simple buffer: clInput = clCreateBuffer(context, CL_MEM_READ_ONLY | CL_MEM_ALLOC_HOST_PTR, sizeof(type) * elements, NULL, &err);
But I get CL_INVALID_BUFFER_SIZE for the GPU when the size reaches 512 MB. This would make sense if I were allocating the buffer in GPU memory, but the only purpose of using the CL_MEM_ALLOC_HOST_PTR flag was to get the CPU's max mem alloc size, which is 4 GB. Am I doing something wrong, or is this a bug?
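For reference, the size check behind this question can be sketched in plain C. This is only an illustration under stated assumptions: `fits_max_alloc` is a hypothetical helper, not an OpenCL call, and the two limits are the figures quoted above. The OpenCL specification allows clCreateBuffer to return CL_INVALID_BUFFER_SIZE when the size is 0 or exceeds CL_DEVICE_MAX_MEM_ALLOC_SIZE, so a pre-check along these lines may help narrow down which device limit is being hit.

```c
#include <stdint.h>

/* Limits quoted in the post above, in bytes. */
#define CPU_MAX_ALLOC 4266006528ULL /* i5-3470 */
#define GPU_MAX_ALLOC 425721856ULL  /* HD 2500 */

/* Hypothetical helper, not part of the OpenCL API.
 * Returns 1 if elem_size * elements is nonzero and no larger than
 * max_alloc, and 0 otherwise. */
int fits_max_alloc(uint64_t elem_size, uint64_t elements, uint64_t max_alloc)
{
    if (elem_size == 0 || elements == 0)
        return 0; /* a zero-byte buffer is never accepted */
    /* Dividing instead of multiplying avoids 64-bit overflow. */
    return elements <= max_alloc / elem_size;
}
```

With these numbers, a 512 MB request (536870912 bytes) passes the check against the CPU limit but fails it against the GPU limit, which matches the behaviour described.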
I was able to reproduce the error. I will debug this and get back to you. Thanks for reporting.
Raghu
Sorry, I checked my test program again. My CPU device reports CL_DEVICE_MAX_MEM_ALLOC_SIZE as 536838144, which is slightly less than 512 MB. When I pass this exact number as the size of my buffer, I don't get the error. Can you use gpucapsviewer to see what your CPU device reports for this value, and try creating a buffer whose size is less than or equal to it? Let me know if you still get the error.
I am curious why you got 4 GB and I am getting 512 MB.
Raghu
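Since the thread compares several raw byte counts against "512 MB" and "4 GB", a small unit conversion may make the gaps easier to see. This is a minimal sketch: `bytes_to_mib` is a hypothetical helper, and it assumes the binary convention of 1 MiB = 1048576 bytes.

```c
#include <stdint.h>

/* Hypothetical helper: converts a byte count to MiB,
 * assuming 1 MiB = 1048576 bytes. */
double bytes_to_mib(uint64_t bytes)
{
    return (double)bytes / 1048576.0;
}
```

Applied to the figures reported in this thread, 425721856 bytes is exactly 406 MiB, 536838144 bytes is 511.96875 MiB, and 4266006528 bytes is roughly 4068.38 MiB.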
Hi, thanks for the prompt response. I checked CL_DEVICE_MAX_MEM_ALLOC_SIZE for the CPU again and found the same 4266006528 bytes. I am checking this number in OpenCL code using clGetDeviceInfo(device_id
I am not familiar with gpucapsviewer. I tried typing it at the command prompt but got an unrecognized command error.
If 09 wrote:
the only purpose of using alloc_host_ptr flag was to use the max_mem_alloc size of cpu which is 4GB.
Hi,
The ALLOC_HOST_PTR flag just tells the runtime to mirror the memory allocation on the host. The actual buffer is still created on the device.
With ALLOC_HOST_PTR you take advantage of the upfront allocation: when you call clEnqueueMapBuffer, the mapped memory is already allocated, so you just get the pointer, which is fast. In contrast, if you call clEnqueueMapBuffer on a regular buffer, the runtime needs to allocate the mapped memory first. Finally, without mapping you would need to allocate the memory yourself and use clEnqueueWriteBuffer.
If you are not going to populate and/or read the buffer from host code, you don't need any flags. Otherwise, the best way to avoid copying between the CL buffer and your internal structures is to use USE_HOST_PTR.
Please correct me if I am wrong. According to my experimental results and my understanding, ALLOC_HOST_PTR allocates the buffer on the host, so the kernel accesses it from host memory, and this access is very slow. USE_HOST_PTR, on the other hand, allocates the buffer on the device, so kernel execution was faster than when the buffer was created with ALLOC_HOST_PTR.
