06-21-2012 04:34 AM
I have very strange and serious problems with the OpenCL SDK 2012 compiler (Windows 7, 32-bit platform).
Early in development everything was fine. But once I added a new kernel to my OpenCL program source, I started getting a lot of odd errors. For example, some very simple kernels return completely wrong results, or a fatal error crashes the program, even if I do not use the new kernel and do not even create it with clCreateKernel.
When I switch off autovectorization by using vec_type_hint in one or more kernels (not necessarily the new one) in the new source code, everything is fine again. Alternatively, I can remove some old kernels from the source, and the new kernel then works correctly together with all the remaining ones. The complete OpenCL program also works fine in debug mode (-g compiler option). By the way, everything is OK on the AMD and NVIDIA platforms on both Windows and Linux.
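To make the workaround concrete, a kernel carrying the attribute might look like the sketch below. The kernel body and the float4 hint type are assumptions chosen for illustration, not the code from the actual program. The source is shown as a C string, the form in which it would be passed to clCreateProgramWithSource.

```c
/* Hypothetical kernel source (name, arguments and body are illustrative).
   The vec_type_hint attribute goes between __kernel and the return type.
   A vector-typed hint such as float4 is assumed here to be what keeps
   the autovectorizer away from this kernel. */
static const char *kernel_source =
    "__kernel __attribute__((vec_type_hint(float4)))\n"
    "void kernelX(__global const float *in, __global float *out)\n"
    "{\n"
    "    size_t i = get_global_id(0);\n"
    "    out[i] = in[i] * 2.0f;\n"
    "}\n";
```

Removing the `__attribute__((vec_type_hint(float4)))` part gives back the plain kernel that the compiler is free to vectorize.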
It looks like there is some kind of limit on the number of vectorized kernels in an OpenCL program, but I don't believe that. Maybe when there are too many such kernels the compiler behaves unexpectedly, or something similar.
Has anyone faced such a problem, or does anyone have an idea how to solve it?
06-25-2012 12:10 PM
Let me try to understand what you are saying. You were able to build and run kernels fine until you added a new kernel and everything broke? Can you give us a small test case so that we can try to reproduce the behavior on our end?
06-26-2012 09:38 AM
Thank you for your attention.
I've written a sample that demonstrates the problem. There is one simple kernel, evenBytes, that just takes the even bytes from a memory buffer and puts them into another one, plus a number of unused dummy kernels named kernelX (they are completely identical apart from their names). The kernelX kernels are successfully vectorized during the build.
If you run this sample, it produces wrong results. BUT if you remove or comment out one of the kernelX kernels, everything is fine. You can also just add the vec_type_hint attribute (it prevents autovectorization) to one of the kernelX kernels, and the result will be correct too.
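For anyone who wants to check the results without the attachment, a host-side reference along these lines can be compared against the buffer read back from the device. This is only a sketch and not the attached code: the function names are hypothetical, and "even bytes" is assumed to mean the bytes at 0-based offsets 0, 2, 4, and so on.

```c
#include <stddef.h>

/* CPU reference for evenBytes: copy the bytes at even offsets of src
   into dst. dst must hold at least (n + 1) / 2 bytes.
   Returns the number of bytes written. */
size_t even_bytes_reference(const unsigned char *src, size_t n,
                            unsigned char *dst)
{
    size_t count = 0;
    for (size_t i = 0; i < n; i += 2) {
        dst[count++] = src[i];
    }
    return count;
}

/* Returns the index of the first byte where the device output differs
   from the reference, or -1 if the two buffers match. */
long first_mismatch(const unsigned char *device_out,
                    const unsigned char *reference, size_t count)
{
    for (size_t i = 0; i < count; ++i) {
        if (device_out[i] != reference[i]) {
            return (long)i;
        }
    }
    return -1;
}
```

Running the comparison once with all kernelX kernels present and once with one of them commented out shows at which offset the output first goes wrong.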
As additional information that may help: I use a previous-generation Core i5 750 processor (SSE 4.1 instruction set).
The sample host and device code, along with the build, are in the attachment.
06-27-2012 03:05 AM
I've discovered some more information about this bug; maybe it can help fix the problem. In this sample, if I switch the work-group size from 960 to 480 (the global work size is evenly divisible by both), everything works correctly with all kernels and autovectorization enabled. So maybe the problem is not on the compiler side but in the OpenCL runtime, or maybe in both.
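To rule out an illegal launch configuration, the basic 1-D rules can be checked on the host before enqueueing. The helper below is a sketch with a hypothetical name. It assumes that max_local holds the value returned by clGetKernelWorkGroupInfo for CL_KERNEL_WORK_GROUP_SIZE, and the sizes in the usage note are examples rather than the ones from the sample.

```c
#include <stddef.h>

/* Returns 1 if local_size is a legal 1-D work-group size for global_size:
   non-zero, no larger than the limit reported for the kernel, and an
   exact divisor of the global work size. Returns 0 otherwise. */
int work_group_size_ok(size_t global_size, size_t local_size,
                       size_t max_local)
{
    if (local_size == 0 || local_size > max_local) {
        return 0;
    }
    return global_size % local_size == 0;
}
```

With an example global size of 9600 and a limit of 1024, both 960 and 480 pass this check, so a legal configuration alone would not explain why only one of them gives correct results.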