Hello, I am trying to debug an OpenCL program with a cascade of 3 kernels that read global memory at the beginning, communicate through channels, and write global memory at the end. The first kernel has a local size of (11, 11, 1) and a global size of (605, 605, 6); the second kernel is a task, and the third one has a small global size. The host iterates the kernel cascade 5 times.

After launching the program in emulation mode, CPU memory usage increases as more and more instances of the first kernel (global size (605, 605, 6)) are internally launched by the emulator. I know this because I am using a printf() as an indication of kernel entry. By the time the kernel reaches its full global size (about 2.1 million instances), memory usage is about 30 GB! That works out to about 14 KB per instance. On the second iteration of the cascade the memory is not released but continues to grow, and the program crashes.

Does anyone know whether this is normal? If it is, is the only way to debug the program to shrink the dimensions dramatically (and hope that no bugs appear at run time)? Is there a way to release the memory between iterations while preserving the integrity of the global memory?

Thanks.
I personally avoid using channels as much as possible and perform all of my debugging on a standard CPU (unless I have to use channels), so I have never encountered this. Have you tried reducing your input size (rather than dimension size)? If all else fails, you can always open a support ticket directly with Altera.