Strange error message during kernel execution: "No kernel updates in a while..."

Altera_Forum · ‎07-18-2013

Hi all,

I'm observing a strange error message during kernel execution on a nallatech-pcie385n-d5 board:

"No kernel updates in a while... a kernel may be hung"

It comes with a list of kernels (kernel1, kernel2, ...) and some info on global mem accesses.

Has anyone seen this before? If yes, any ideas on what might be the cause?

What bothers me is that it seems to be random: I do not always get it, and when I do, it does not happen at the same video frame.

Any feedback would be appreciated.

Thanks.

Altera_Forum · ‎07-18-2013

I have never gotten that error message, although I do get a "Real-time signal 10" or "-10" when I enqueue too many kernel calls in a row. How many kernel calls do you perform per execution? It might have a similar cause.

Altera_Forum · ‎07-18-2013

The only time I have seen that is when I debug and accidentally step through code around the time the kernel is launched. Basically, the kernel launches, the CPU is paused, and eventually the runtime thinks that the kernel is hung.

The other thing that can cause this is buggy kernel code. I would inspect the kernel to make sure there are no race conditions or problematic statements, such as barriers inside 'if' statements. Since OpenCL architectures vary between devices and vendors, code with potential race conditions can work on one target and fail on another. I have seen this happen with code that ran fine on a GPU but failed on the FPGA, because the FPGA has no concept of a warp or wavefront, and the way warps and wavefronts are scheduled on a GPU can sometimes mask race conditions.
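To make the "barrier inside an 'if'" hazard concrete, here is a minimal host-side C sketch (a model, not kernel code) that counts how many work-items of one work-group would reach a barrier guarded by `if (lid < limit)`. The helper names `barrier_arrivals` and `barrier_is_safe` and the guard condition are assumptions made up for illustration. OpenCL requires that either all work-items of a work-group execute a given barrier or none do; anything in between is undefined behavior, and a hang is one possible outcome.

```c
#include <assert.h>
#include <stddef.h>

/* Model of one work-group executing
 *     if (lid < limit) { ...; barrier(CLK_LOCAL_MEM_FENCE); }
 * Returns how many work-items take the branch and reach the barrier. */
size_t barrier_arrivals(size_t local_size, size_t limit)
{
    size_t arrivals = 0;
    for (size_t lid = 0; lid < local_size; ++lid) {
        if (lid < limit) {
            ++arrivals;
        }
    }
    return arrivals;
}

/* The barrier is well-defined only if all work-items reach it or none do.
 * Returns 1 in that case and 0 when only part of the group arrives. */
int barrier_is_safe(size_t local_size, size_t limit)
{
    assert(local_size > 0);
    size_t arrivals = barrier_arrivals(local_size, limit);
    return arrivals == 0 || arrivals == local_size;
}
```

For a work-group of 256 with `limit` set to 128, only half of the work-items arrive, so `barrier_is_safe(256, 128)` returns 0. The usual fix is to move the barrier out of the conditional and keep only the guarded work inside it.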

Altera_Forum · ‎07-19-2013

--- Quote Start ---

I have never gotten that error message, although I do get a "Real-time signal 10" or "-10" when I enqueue too many kernel calls in a row. How many kernel calls do you perform per execution? It might have a similar cause.

--- Quote End ---

Hi andradx,

I have 9 kernels that are called multiple times for each frame.

The total number of kernel calls per frame is around 150.

I call clFinish after each kernel enqueue for timing purposes.

Altera_Forum · ‎07-19-2013

Hi BadOmen,

I believe these kernels are bug-free.

However, they were originally written for a GPU.

I will check whether I made any warp-based synchronization assumptions.

Thanks for your suggestion.

--- Quote Start ---

The only time I have seen that is when I debug and accidentally step through code around the time the kernel is launched. Basically, the kernel launches, the CPU is paused, and eventually the runtime thinks that the kernel is hung.

The other thing that can cause this is buggy kernel code. I would inspect the kernel to make sure there are no race conditions or problematic statements, such as barriers inside 'if' statements. Since OpenCL architectures vary between devices and vendors, code with potential race conditions can work on one target and fail on another. I have seen this happen with code that ran fine on a GPU but failed on the FPGA, because the FPGA has no concept of a warp or wavefront, and the way warps and wavefronts are scheduled on a GPU can sometimes mask race conditions.

--- Quote End ---

Altera_Forum · ‎07-23-2013

OK. I've gone through all my kernels and I cannot see any possible race conditions.

To make things worse, adding debug code seems to make this error/warning message disappear, so I can't even pinpoint the specific kernel(s) causing it.

Since I call clFinish right after every kernel submission, I don't think this error/warning is caused by host-side kernel ordering/sync problems.

Any ideas would be appreciated.

Altera_Forum · ‎07-23-2013

Double check the programming guide to make sure you are not using a feature of OpenCL that is not currently supported. The compiler should flag cases such as these but it could be failing silently.

If you have multiple kernels in the hardware have you debugged down to the point where you know which kernel appears to be hanging? If not I would do that so that you can focus your debug efforts on that single kernel.

Last but not least, does your application code implement error checking and handling? If not I would add that since it could be a failure happening much earlier than the kernel launch causing this.
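As a rough illustration of both suggestions (checking every status code and narrowing down which kernel hangs), here is a minimal C sketch. Everything in it is a stand-in: `status_t` plays the role of `cl_int`, `STATUS_OK` the role of `CL_SUCCESS`, and `fake_enqueue` is a hypothetical placeholder for real host calls such as `clEnqueueNDRangeKernel` followed by `clFinish`. The `CHECKED` macro logs a "begin" line before each step, so if the program stalls, the last "begin" without a matching "end" points at the suspect kernel.

```c
#include <stdio.h>

typedef int status_t;   /* stand-in for cl_int */
#define STATUS_OK 0     /* stand-in for CL_SUCCESS */

/* Name of the most recently started step, useful to inspect from a debugger. */
const char *last_step_started = "(none)";

/* Log the step, run it, and return from the enclosing function on failure. */
#define CHECKED(name, call)                                          \
    do {                                                             \
        last_step_started = (name);                                  \
        fprintf(stderr, "begin %s\n", (name));                       \
        status_t status_ = (call);                                   \
        if (status_ != STATUS_OK) {                                  \
            fprintf(stderr, "%s failed with status %d\n",            \
                    (name), status_);                                \
            return status_;                                          \
        }                                                            \
        fprintf(stderr, "end %s\n", (name));                         \
    } while (0)

/* Hypothetical placeholder: a real host program would enqueue the kernel
 * and wait for it here, returning the resulting status code. */
status_t fake_enqueue(status_t result)
{
    return result;
}

/* One "frame": two kernel launches, each checked and traced. */
status_t run_frame(status_t kernel1_result, status_t kernel2_result)
{
    CHECKED("kernel1", fake_enqueue(kernel1_result));
    CHECKED("kernel2", fake_enqueue(kernel2_result));
    return STATUS_OK;
}
```

Calling `run_frame(0, -5)` returns -5 after logging that kernel2 failed, while a failure in kernel1 returns before kernel2 is ever launched. Logging does add host-side delay between launches, which may change the timing of an intermittent problem.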

Altera_Forum · ‎08-12-2013

--- Quote Start ---

Double check the programming guide to make sure you are not using a feature of OpenCL that is not currently supported. The compiler should flag cases such as these but it could be failing silently.

If you have multiple kernels in the hardware have you debugged down to the point where you know which kernel appears to be hanging? If not I would do that so that you can focus your debug efforts on that single kernel.

Last but not least, does your application code implement error checking and handling? If not I would add that since it could be a failure happening much earlier than the kernel launch causing this.

--- Quote End ---

The only advanced feature I use is a local memory barrier, and it is applied across 256 threads, so it should be fine.

The message is quite rare. So far I have been able to clear a few kernels with lightweight step-in/step-out checks, but I have not been able to pinpoint the exact kernel causing this.

All API calls are error checked. As far as I can see, the issue is caused by kernel code.

Altera_Forum · ‎08-25-2013

Hi, I have also run into this problem. I used the DE4 530 board. My guess is that the error occurs because some kernels hang on global memory synchronization. For some reason the code can already keep global memory synchronized, but I was using barrier(CLK_GLOBAL_MEM_FENCE) anyway, so the kernels stalled and blocked while the host kept waiting for them. After I removed barrier(CLK_GLOBAL_MEM_FENCE) from my code, the error disappeared. I hope this helps.

Altera_Forum · ‎08-31-2013

for loops can also hide subtle races.

A loop whose index starts from a linear function of the thread ID (like for (int i = get_local_id(0); ...)) and ends on a comparison against a constant N (the same for all threads) is not suitable for barriers: unless N is a multiple of the local size, threads run different numbers of iterations, so some reach the barrier while others have already left the loop.

e.g.

for (int i = get_local_id(0); i < N; i += get_local_size(0))
{
    ....
    barrier(CLK_LOCAL_MEM_FENCE); // --> This is a bug.
}
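The reason this loop is hazardous can be checked with a small host-side C model (a sketch, not kernel code). The function names below are hypothetical helpers. `strided_trip_count` counts how many iterations a given work-item executes, and `uniform_trip_count` computes ceil(N / local size), a trip count that is the same for every work-item and is one common way to restructure such a loop.

```c
#include <assert.h>
#include <stddef.h>

/* Iterations executed by work-item `lid` in
 *     for (i = lid; i < n; i += local_size) { ... } */
size_t strided_trip_count(size_t lid, size_t local_size, size_t n)
{
    assert(local_size > 0 && lid < local_size);
    size_t trips = 0;
    for (size_t i = lid; i < n; i += local_size) {
        ++trips;
    }
    return trips;
}

/* Trip count shared by all work-items: ceil(n / local_size).
 * A kernel could loop this many times, guard only the loop body's work
 * with `if (i < n)`, and keep the barrier outside that guard. */
size_t uniform_trip_count(size_t local_size, size_t n)
{
    assert(local_size > 0);
    return (n + local_size - 1) / local_size;
}

/* Returns 1 if every work-item runs the strided loop the same number of
 * times, so a barrier in the loop body is reached by all of them. */
int strided_loop_is_uniform(size_t local_size, size_t n)
{
    size_t first = strided_trip_count(0, local_size, n);
    for (size_t lid = 1; lid < local_size; ++lid) {
        if (strided_trip_count(lid, local_size, n) != first) {
            return 0;
        }
    }
    return 1;
}
```

With a local size of 4 and N = 10, work-items 0 and 1 run three iterations while work-items 2 and 3 run only two, so the barrier in the third iteration is reached by half the group. With N = 8 every work-item runs two iterations and the loop happens to be safe.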

Altera_Forum · ‎09-03-2013

--- Quote Start ---

Hi, I have also run into this problem. I used the DE4 530 board. My guess is that the error occurs because some kernels hang on global memory synchronization. For some reason the code can already keep global memory synchronized, but I was using barrier(CLK_GLOBAL_MEM_FENCE) anyway, so the kernels stalled and blocked while the host kept waiting for them. After I removed barrier(CLK_GLOBAL_MEM_FENCE) from my code, the error disappeared. I hope this helps.

--- Quote End ---

Very much appreciated. I will try to put this piece of information to use.

I am not very familiar with FPGA fabric, but my gut feeling is that this issue is caused by the hardware synthesized by the OpenCL compiler/Quartus II software.

I've been trying to compile and test this app with SDK ver 13, but so far I have not been able to solve a compilation error... when compiled with SDK ver 13, this issue might just disappear.