OpenCL* for CPU
Ask questions and share information on Intel® SDK for OpenCL™ Applications and OpenCL™ implementations for Intel® CPU.
Announcements
This forum covers OpenCL* for CPU only. OpenCL* for GPU questions can be asked in the GPU Compute Software forum. Intel® FPGA SDK for OpenCL™ questions can be asked in the FPGA Intel® High Level Design forum.

CL_MEM_OBJECT_ALLOCATION_FAILURE question

Tyler_S_2
Beginner

Hi,

I stumbled on some interesting behaviour w.r.t. allocating memory that I was hoping someone could shed some light on. I've written a simple program that shows the behaviour (I'm happy to provide it, but I imagine the behaviour might differ across systems based on the available memory).

My specs are:

GPU: Intel HD 5500
OS: Windows 10
Driver Version: 10.18.15.4281
Host memory: 8 GB

The program is very simple: first I allocate a large chunk of host memory using malloc (or calloc). I then attempt to allocate several large OpenCL global memory buffers. Usually, after two calls to clCreateBuffer, I get a CL_MEM_OBJECT_ALLOCATION_FAILURE. Occasionally, though, the program runs to completion without error.

I've been careful not to allocate more OpenCL global memory in a single buffer than CL_DEVICE_MAX_MEM_ALLOC_SIZE, and not more in total than CL_DEVICE_GLOBAL_MEM_SIZE. I've also checked (using the device manager) that the host has sufficient free memory for the host allocations when the program is run, and I check that those calls succeed.
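
For readers following along, the two limit checks described above can be sketched in plain C as follows. This is only an illustration, not the program from the post: `requests_fit` is a hypothetical helper name, and the two limits are passed in as plain integers, on the assumption that a real program would obtain them from clGetDeviceInfo with CL_DEVICE_MAX_MEM_ALLOC_SIZE and CL_DEVICE_GLOBAL_MEM_SIZE.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not from the original post).
 * Returns true when every requested buffer fits the per-buffer limit and
 * all requests together fit the device's global memory.
 * Assumption: max_alloc and global_mem hold the values reported by
 * clGetDeviceInfo for CL_DEVICE_MAX_MEM_ALLOC_SIZE and
 * CL_DEVICE_GLOBAL_MEM_SIZE (both are cl_ulong, i.e. 64-bit unsigned). */
static bool requests_fit(const uint64_t *sizes, size_t count,
                         uint64_t max_alloc, uint64_t global_mem)
{
    uint64_t total = 0;
    for (size_t i = 0; i < count; ++i) {
        if (sizes[i] > max_alloc)
            return false;   /* a single buffer exceeds the per-buffer limit */
        if (sizes[i] > global_mem - total)
            return false;   /* running total would exceed global memory */
        total += sizes[i];  /* total never exceeds global_mem, so no overflow */
    }
    return true;
}
```

Passing both checks is necessary but, as this thread shows, not sufficient: clCreateBuffer can still fail with CL_MEM_OBJECT_ALLOCATION_FAILURE even when the requested sizes are within the reported limits.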

The machine also has a small Nvidia GPU (with less global memory than the Intel GPU). The program is able to reliably run without error when targeting the Nvidia GPU.

Any help would be greatly appreciated, thanks!

1 Solution
Robert_I_Intel
Employee

Tyler,

I would try to allocate one big chunk of memory with _aligned_malloc, and then feed portions of that memory to clCreateBuffer calls (see https://software.intel.com/en-us/articles/getting-the-most-from-opencl-12-how-to-increase-performance-by-minimizing-buffer-copies-on-intel-processor-graphics for more guidance).

I can replicate the behavior on a system with 4 GB of RAM, but not on a system with 16 GB. At this point, however, I believe the runtime is doing the right thing: malloc and calloc chunks can come from anywhere and can fragment your memory. The same goes for clCreateBuffer calls, so in the end you might not be able to find a contiguous chunk of memory large enough to place your buffer. So try the scheme above instead, where you explicitly manage a large chunk of aligned memory.
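
A minimal sketch of that scheme in portable C might look like the following. It is an illustration under stated assumptions, not code from this thread: the names `slab_t`, `slab_init`, `slab_take`, and `slab_free` are hypothetical, the 4096-byte alignment is an assumed page size, and C11 `aligned_alloc` stands in for `_aligned_malloc` (on MSVC you would use `_aligned_malloc` and `_aligned_free` instead). The OpenCL call itself is left as a comment so the block compiles without the OpenCL headers.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define SLAB_ALIGN 4096u  /* assumed page alignment for each buffer start */

typedef struct {
    unsigned char *base;  /* start of the single aligned allocation */
    size_t capacity;      /* total bytes in the slab */
    size_t used;          /* bytes handed out so far */
} slab_t;

static size_t round_up(size_t n, size_t align)
{
    return (n + align - 1) / align * align;
}

/* Allocate one aligned slab. Returns 0 on success, -1 on failure. */
static int slab_init(slab_t *s, size_t capacity)
{
    size_t rounded = round_up(capacity, SLAB_ALIGN);
    s->base = NULL;
    s->capacity = 0;
    s->used = 0;
    if (capacity == 0 || rounded < capacity)
        return -1;                               /* empty request or overflow */
    s->base = aligned_alloc(SLAB_ALIGN, rounded); /* size must be a multiple of the alignment */
    if (s->base == NULL)
        return -1;
    s->capacity = rounded;
    return 0;
}

/* Hand out the next page-aligned region, or NULL when the slab is exhausted.
 * Each returned pointer is what would be passed as host_ptr, e.g.
 *   clCreateBuffer(context, CL_MEM_READ_WRITE | CL_MEM_USE_HOST_PTR,
 *                  bytes, ptr, &err);                                   */
static void *slab_take(slab_t *s, size_t bytes)
{
    size_t need = round_up(bytes, SLAB_ALIGN);
    if (bytes == 0 || need < bytes || need > s->capacity - s->used)
        return NULL;
    void *p = s->base + s->used;
    s->used += need;
    return p;
}

/* Release the whole slab; call only after the OpenCL buffers are released. */
static void slab_free(slab_t *s)
{
    free(s->base);
    s->base = NULL;
    s->capacity = 0;
    s->used = 0;
}
```

The point of the design is that the host makes a single large allocation up front, so later buffers are carved from memory that is already reserved instead of competing with other malloc and calloc calls for a free contiguous range.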

2 Replies
Tyler_S_2
Beginner

Thanks for the insights Robert, this makes a lot of sense!

I tried some initial experiments and the approach you are suggesting seems to work. 
