Hi,
I stumbled on some interesting behaviour w.r.t. allocating memory that I was hoping someone could shed some light on. I've written a simple program that shows the behaviour (I'm happy to provide it, but I imagine the behaviour might differ across systems based on the available memory).
My specs are:
GPU: Intel HD 5500
OS: Windows 10
Driver Version: 10.18.15.4281
Host memory: 8 GB
The program is very simple: first I allocate a large chunk of host memory using malloc (or calloc). I then attempt to allocate several large OpenCL global memory buffers. Usually after two calls to clCreateBuffer, I get a CL_MEM_OBJECT_ALLOCATION_FAILURE. Occasionally, however, the program runs to completion without error.
I've been careful never to allocate more OpenCL global memory in a single buffer than CL_DEVICE_MAX_MEM_ALLOC_SIZE, and never more in total than CL_DEVICE_GLOBAL_MEM_SIZE. I've also checked that the host has sufficient free memory for the host allocations when the program is run (using the device manager), and I check that those calls succeed.
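The two checks described above can be written as a small C helper. This is a minimal sketch: `fits_device_limits` is a hypothetical name, and the two limits are assumed to have been queried beforehand with `clGetDeviceInfo` (`CL_DEVICE_MAX_MEM_ALLOC_SIZE` and `CL_DEVICE_GLOBAL_MEM_SIZE`, both returned as `cl_ulong`; `uint64_t` is used here so the sketch compiles without the OpenCL headers). Passing these checks does not guarantee that `clCreateBuffer` will succeed, which is exactly the behaviour in question.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: returns true if every planned buffer size is
   <= max_alloc and the sum of all sizes is <= global_mem.
   In a real program the limits would come from, e.g.:
     clGetDeviceInfo(device, CL_DEVICE_MAX_MEM_ALLOC_SIZE, sizeof(cl_ulong), &max_alloc, NULL);
     clGetDeviceInfo(device, CL_DEVICE_GLOBAL_MEM_SIZE,    sizeof(cl_ulong), &global_mem, NULL); */
static bool fits_device_limits(const uint64_t *sizes, size_t count,
                               uint64_t max_alloc, uint64_t global_mem)
{
    uint64_t total = 0; /* invariant: total <= global_mem */
    for (size_t i = 0; i < count; ++i) {
        if (sizes[i] > max_alloc)
            return false; /* a single buffer exceeds the per-allocation limit */
        if (sizes[i] > global_mem - total)
            return false; /* running total would exceed the global limit (no overflow) */
        total += sizes[i];
    }
    return true;
}
```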
The machine also has a small Nvidia GPU (with less global memory than the Intel GPU). The program is able to reliably run without error when targeting the Nvidia GPU.
Any help would be greatly appreciated, thanks!
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Tyler,
I would try to allocate one big chunk of memory with _aligned_malloc, and then feed portions of that memory to clCreateBuffer calls (see https://software.intel.com/en-us/articles/getting-the-most-from-opencl-12-how-to-increase-performance-by-minimizing-buffer-copies-on-intel-processor-graphics for more guidance).
I can replicate the behavior on a system with 4 GB of RAM, but not on a system with 16 GB. At this point, however, I believe the runtime is doing the right thing: malloc and calloc chunks can come from anywhere and can fragment your memory. The same goes for clCreateBuffer calls, so in the end the runtime may not be able to find a large enough contiguous chunk of memory to place your buffer. So try the scheme above instead, where you explicitly manage one large chunk of aligned memory.
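The suggested scheme can be sketched in C roughly as follows. This is an illustration under stated assumptions, not the code from the linked article: `plan_slices`, `aligned_block_alloc` and `aligned_block_free` are hypothetical helpers, and the 4096-byte alignment and 64-byte size multiple are assumed from Intel's zero-copy guidance for Processor Graphics. It uses `_aligned_malloc` on Windows and C11 `aligned_alloc` elsewhere. Each slice would then be wrapped with `clCreateBuffer(context, CL_MEM_READ_WRITE | CL_MEM_USE_HOST_PTR, padded[i], (char *)block + offsets[i], &err)`; that call appears only in a comment so the sketch compiles without the OpenCL headers.

```c
#include <stdint.h>
#include <stdlib.h>
#if defined(_WIN32)
#include <malloc.h> /* _aligned_malloc, _aligned_free */
#endif

/* Assumed values from Intel's zero-copy guidance (not verified here). */
#define PAGE_ALIGN    4096u /* start of each slice */
#define SIZE_MULTIPLE 64u   /* size granularity of each slice */

static size_t round_up(size_t n, size_t m) { return (n + m - 1) / m * m; }

/* Allocate one big block aligned to PAGE_ALIGN. */
static void *aligned_block_alloc(size_t bytes)
{
#if defined(_WIN32)
    return _aligned_malloc(bytes, PAGE_ALIGN);
#else
    /* C11 aligned_alloc requires the size to be a multiple of the alignment. */
    return aligned_alloc(PAGE_ALIGN, round_up(bytes, PAGE_ALIGN));
#endif
}

static void aligned_block_free(void *p)
{
#if defined(_WIN32)
    _aligned_free(p);
#else
    free(p);
#endif
}

/* Hypothetical planner: for each requested size, records the slice's offset
   inside the block (always a multiple of PAGE_ALIGN) and its padded size
   (rounded up to SIZE_MULTIPLE). Returns the total block size needed.
   Each slice would later be passed to:
     clCreateBuffer(context, CL_MEM_READ_WRITE | CL_MEM_USE_HOST_PTR,
                    padded[i], (char *)block + offsets[i], &err); */
static size_t plan_slices(const size_t *sizes, size_t count,
                          size_t *offsets, size_t *padded)
{
    size_t off = 0;
    for (size_t i = 0; i < count; ++i) {
        offsets[i] = off;
        padded[i] = round_up(sizes[i], SIZE_MULTIPLE);
        off = round_up(off + padded[i], PAGE_ALIGN); /* next slice starts on a page */
    }
    return off;
}
```

The idea behind a single up-front block is that the host side makes one large allocation instead of many separate ones, so later malloc/calloc calls have less opportunity to fragment the space the buffers need.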
Thanks for the insights, Robert, this makes a lot of sense!
I tried some initial experiments and the approach you are suggesting seems to work.