My code loops through a few different kernels (each is very similar, just compiled with a few different optimizations), and I am trying to avoid the overhead of copying live buffers back to the host, configuring the next kernel, and then copying the buffers to the device again (this is a total waste for my app, since I don't re-use the buffers at all). In total I am using 5 buffers, and they basically max out the available memory (4GB total). The problem is that, even though I call clReleaseMemObject() on all 5 buffers (in fact I also release the kernel, the program, the command-queue, and the context), the next iteration fails to re-allocate the space I need: it succeeds in allocating a few small buffers, then fails on the first one that pushes the total allocated memory (including the space already freed with clReleaseMemObject()) past the 4GB limit. Are there any steps to freeing a buffer other than clReleaseMemObject()? A solution, or advice on how to proceed with debugging, would be greatly appreciated. Host: RHEL 6.4 (kernel 2.6.32-358), Intel Xeon CPU, 12GB of RAM. Device: Nallatech PCIe385n_a7.
Is there a particular reason you are releasing everything (kernel, program, queue, context) at every iteration, rather than just the buffers? Would it be possible to intersperse kernels between your "real" kernels that re-initialize the data? That way you could reuse the same buffers without copying them back and forth, and without allocating and destroying them at each iteration.
Well, I have to release the kernel and program, since the whole point of each iteration is to read in a new .aocx file and test it. Originally I wasn't releasing and recreating the queue and context, but after I got this error, which implies the buffers were never released, I tried adding those to the loop as well to see if they were somehow connected to the buffers. No, adding data-initializing kernels is not really an option.