OpenCL* for CPU
Ask questions and share information on Intel® SDK for OpenCL™ Applications and OpenCL™ implementations for Intel® CPU
This forum covers OpenCL* for CPU only. OpenCL* for GPU questions can be asked in the GPU Compute Software forum. Intel® FPGA SDK for OpenCL™ questions can be asked in the FPGA Intel® High Level Design forum.

Issue: Synchronization between threads in GPU.


Hi there,

I am currently working on an error diffusion algorithm, referring to the following link.



# A thread handles a single image line.

# A pixel can only be handled once the line above it has already handled the corresponding horizontal position + 2.
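
For readers following along, the dependency rule above can be sketched in plain C as a sequential simulation. This is only an illustration under assumptions: `progress[r]` is taken to mean the number of pixels already finished in line `r`, the requirement is clamped at the end of the line, and the names `pixel_ready` and `simulate_wavefront` are hypothetical, not taken from the linked article.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* True when pixel x of line `row` may be processed.
 * Assumption: progress[r] counts the pixels already finished in line r. */
bool pixel_ready(int row, int x, int width, const int *progress)
{
    if (row == 0)
        return true;                      /* the first line depends on nothing */
    int needed = (x + 2 < width) ? x + 2 : width;  /* clamp at the line end */
    return progress[row - 1] >= needed;
}

/* Sequential model: in every step each line tries to advance by one pixel.
 * Lines are visited bottom-up so a line only sees progress made in earlier
 * steps. Returns the number of steps until no line can advance any more. */
int simulate_wavefront(int height, int width, int *progress)
{
    assert(height > 0 && width > 0 && progress != NULL);
    for (int r = 0; r < height; ++r)
        progress[r] = 0;

    int steps = 0;
    for (;;) {
        bool advanced = false;
        for (int r = height - 1; r >= 0; --r) {
            if (progress[r] < width &&
                pixel_ready(r, progress[r], width, progress)) {
                ++progress[r];
                advanced = true;
            }
        }
        if (!advanced)
            break;
        ++steps;
    }
    return steps;
}
```

In this model every line trails the one above it by two pixels, so the lines finish in a staggered, diagonal order. The model always completes because it never lets a waiting line block the line it is waiting for.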


# Current result

I tried the code in the link above, and it stalls.

Is it possible that the threads on an Intel CPU will wait for each other and never proceed?


Thank you for your help in advance.




Hello YX,

Note that the article you're referring to itself warns that the code may stall on other platforms, because every work-item but the first in a work-group spins in a while loop.
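
To make the stall scenario concrete, here is a small host-side C model (not OpenCL code). It assumes a hypothetical runtime that runs the work-items of one work-group one after another, each to completion. Real implementations may schedule differently, and `serial_schedule_completes` and `MAX_ITEMS` are names made up for this sketch.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_ITEMS 64  /* hypothetical upper bound for this model */

/* Models a work-group whose work-items run serially, in the given order,
 * each to completion. Work-item i > 0 spins until work-item i - 1 is done.
 * Returns false as soon as an item would spin forever. */
bool serial_schedule_completes(const int *order, int n)
{
    assert(n > 0 && n <= MAX_ITEMS);
    bool done[MAX_ITEMS] = { false };

    for (int k = 0; k < n; ++k) {
        int id = order[k];
        assert(id >= 0 && id < n);
        if (id > 0 && !done[id - 1]) {
            /* The predecessor has not run yet and cannot run while this
             * item occupies the only execution slot: the loop never exits. */
            return false;
        }
        done[id] = true;
    }
    return true;
}
```

With the order 0, 1, 2, 3 the model completes. With 1, 0, 2, 3 it stalls on the first item, which is the kind of hang a spin-wait can produce when independent forward progress is not guaranteed.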

See this article on forward progress programming guidance for related information.


If you'd like to discuss a minimally required reproducer for understanding when you can expect forward progress to be portable per OpenCL spec, please add a hanging reproducer here. Having a discussion about the topic as it relates to expectations presented by the standard is useful for users new to the parallel execution environment. Particularly if they are interested in portability.

See this guide on soliciting feedback on forum posts for other information to include in the post. Also, please only use sources that you have privileges to post on this forum.





Hi YX/forum viewers,

Also, printf(...) debugging within a kernel can be useful for monitoring race conditions and hangs, although it is very verbose at first. With a little creativity to keep your terminal output from being flooded, it can make the source code leading to a hang more obvious.




This is a nice example.  If you're able to share your code once it starts working (or even while debugging) I would encourage you to do so.

I'm not 100% sure what is going on, but if I had to wager a guess I'd say that the compiler is aggressively optimizing one or both of the loops that are waiting for prior results to complete:

while (progress[get_local_id(0) - 1] < (x + 2))

In other words, marking the pointers as "volatile" may not have the intended effect.

If you're able to, I would encourage you to switch to the OpenCL 2.0 atomics (which are based on C11 atomics) instead, which are much better specified:
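
As a rough illustration of the acquire/release pattern, here is a plain C11 sketch rather than kernel code. OpenCL C 2.0 provides `atomic_load_explicit` and `atomic_store_explicit` with the same memory orders (plus an optional memory scope argument), so the idea should carry over, but treat this as an assumption to verify against the specification. The names `publish_progress` and `wait_for_progress` and the spin limit are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Producer side: announce that `count` pixels of this line are finished.
 * The release store orders the preceding pixel writes before the counter
 * update, for any consumer that reads the counter with acquire. */
void publish_progress(atomic_int *progress, int count)
{
    atomic_store_explicit(progress, count, memory_order_release);
}

/* Consumer side: spin until the line above has finished at least `needed`
 * pixels. The loop is bounded so a missing producer shows up as a `false`
 * return value instead of a silent hang. */
bool wait_for_progress(atomic_int *progress, int needed, long max_spins)
{
    assert(max_spins > 0);
    for (long i = 0; i < max_spins; ++i) {
        if (atomic_load_explicit(progress, memory_order_acquire) >= needed)
            return true;
    }
    return false;
}
```

Because the load is atomic, the compiler may not hoist it out of the loop, which is the concern raised above for the "volatile" version. Note that atomics only fix visibility and ordering; they do not by themselves guarantee that the producer ever gets scheduled.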

If you cannot switch to the OpenCL 2.0 atomics for some reason, you may want to try experimenting with the OpenCL 1.x memory fence functions, which are at least better specified than "volatile".

You may find this presentation insightful - for some reason my name is in the presentation filename, but it was primarily work done by my colleague Biju George:


Hi Michael and Hi Ben,

Thank you for your reply and advice.

I will look into the issue and the documents you provided, and then post a minimal code sample and environment information here.

# By the way, there seems to be no output when using printf() in a kernel on my device (Intel(R) Core(TM) i7-6700 CPU @ 3.40GHz).

  I can't find any cl_intel_printf listed in the clinfo output.


Again thank you very much for the help.





Hi XiaoyingY,

Are you running the application in a developer environment in either Intel® SDK for OpenCL™ Applications or the Intel® System Studio: OpenCL™ Tools component? If so, output may go to the IDE pane.

Is this a Windows environment where the console terminal disappears when the process terminates? A system("pause"), getchar(), cin.get(), or piping the output to a file helps here.

printf(...) is defined by the OpenCL specification in OpenCL 1.2 and 2.0, so the output should be going somewhere. Before 1.2, printf was an extension.
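
Since printf is a core feature from OpenCL 1.2 onward, looking for a printf extension in clinfo is not the right check on such devices. A minimal sketch of a version check follows. It assumes the string comes from `clGetDeviceInfo` with `CL_DEVICE_VERSION`, which is specified to start with "OpenCL <major>.<minor> "; the helper name `version_has_core_printf` is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Returns true if a CL_DEVICE_VERSION string reports OpenCL 1.2 or newer,
 * i.e. a version in which printf is part of the core kernel language.
 * Returns false if the string does not have the expected prefix. */
bool version_has_core_printf(const char *device_version)
{
    assert(device_version != NULL);
    int major = 0, minor = 0;
    if (sscanf(device_version, "OpenCL %d.%d", &major, &minor) != 2)
        return false;
    return major > 1 || (major == 1 && minor >= 2);
}
```

The same version information is printed by clinfo in its device version field, so the check can also be done by eye.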



Hi MichaelC,

Thank you for your reply.

>Are you running the application in a developer environment in either Intel® SDK for OpenCL™ Applications or Intel® System Studio...

I am not using an IDE; it is a plain Linux program.


What I did was just install the .deb files from the following link,

#include the <CL/cl.h> file,

and compile the source code with -lOpenCL.


Do you mean that printf() in a kernel cannot be used without an IDE?


Thank you in advance.


Hi XiaoyingY,

Revisiting this thread...

This style of debugging prints fine to regular stdout. Tested on Ubuntu with the May 30 2019 NEO runtimes for Intel® Core™ i7-6770HQ.

If the program is used in an IDE debugger, such output may go to a GUI pane as opposed to the shell.


~/workspace/training_sample/ocl-tools-walkthrough-20181221-edit/cpu$ ls
CPUOpenCLProjectforLinux  CPUOpenCLProjectforLinux.cpp  Makefile  utils.cpp  utils.h
~/workspace/training_sample/ocl-tools-walkthrough-20181221-edit/cpu$ cat

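__kernel void Add(__global int* pA, __global int* pB, __global int* pC)
{
    // 2D position of this work-item and the row width of the NDRange
    const int x     = get_global_id(0);
    const int y     = get_global_id(1);
    const int width = get_global_size(0);

    // Flatten the 2D position into a linear buffer index
    const int id = y * width + x;

    pC[id] = pA[id] + pB[id];
}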
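
For anyone who wants to try the printf style of debugging on the kernel above, here is one possible variant, embedded as a C string the way host code usually carries kernel source. It is a sketch under assumptions, not the code that was tested in this thread: the printf call, the choice to print only from work-item 0 to avoid flooding the terminal, and the names `kAddDebugKernelSrc` and `add_debug_kernel_src_length` are all illustrative.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* OpenCL C source for the Add kernel with one guarded printf.
 * Only the work-item with linear index 0 prints, so the output is a single
 * line regardless of the NDRange size. */
static const char *const kAddDebugKernelSrc =
    "__kernel void Add(__global int* pA, __global int* pB, __global int* pC)\n"
    "{\n"
    "    const int x     = get_global_id(0);\n"
    "    const int y     = get_global_id(1);\n"
    "    const int width = get_global_size(0);\n"
    "\n"
    "    const int id = y * width + x;\n"
    "\n"
    "    pC[id] = pA[id] + pB[id];\n"
    "\n"
    "    if (id == 0)\n"
    "        printf(\"Add: id=%d pA=%d pB=%d pC=%d\\n\",\n"
    "               id, pA[id], pB[id], pC[id]);\n"
    "}\n";

/* Length of the source in bytes, usable for the `lengths` argument of
 * clCreateProgramWithSource (NULL also works for terminated strings). */
size_t add_debug_kernel_src_length(void)
{
    size_t len = strlen(kAddDebugKernelSrc);
    assert(len > 0);  /* an empty source would not build */
    return len;
}
```

When chasing a hang, the guard can be changed to whichever work-item is suspected, for example the one that waits in the spin loop.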

Hope this helps,