Hi
I have tried running NVIDIA's OpenCL scan (prefix sum) example on an Intel CPU, but without luck.
The program fails with a memory error.
If I remove the barrier synchronization lines in the following part of the code, it runs without errors, but the results are of course wrong:
[cpp]
inline uint scan1Inclusive(uint idata, __local uint *l_Data, uint size){
    uint pos = 2 * get_local_id(0) - (get_local_id(0) & (size - 1));
    l_Data[pos] = 0;
    pos += size;
    l_Data[pos] = idata;

    for(uint offset = 1; offset < size; offset <<= 1){
        barrier(CLK_LOCAL_MEM_FENCE); //Fails with Intel openCL
        uint t = l_Data[pos] + l_Data[pos - offset];
        barrier(CLK_LOCAL_MEM_FENCE); //Fails with Intel openCL
        l_Data[pos] = t;
    }

    return l_Data[pos];
}
[/cpp]
I have tested other code using barriers, and it worked fine. The main difference is that this example uses three levels of nested inline functions. The code shown above is the innermost function.
The code runs fine on an NVIDIA graphics card and on a CPU when compiled with the AMD OpenCL compiler.
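For anyone who wants to check results independently of any OpenCL runtime, here is a rough serial sketch in plain C of what the function above should compute for a single work-group. It is not part of the NVIDIA sample. The name `scan1_inclusive_emulated` and the `group_size` parameter (standing in for `get_local_size(0)`) are made up for illustration. Each barrier is modelled as a boundary between two loops over all work-items, so every read of a step completes before any write, which is the ordering the two barriers are there to enforce.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Same index computation as the kernel, with lid in place of get_local_id(0).
 * Returns the position of the data element (i.e. after "pos += size"). */
static uint32_t data_pos(uint32_t lid, uint32_t size)
{
    return 2 * lid - (lid & (size - 1)) + size;
}

/* Serial emulation of scan1Inclusive for one work-group (hypothetical helper).
 * group_size: number of work-items in the group.
 * size: scan segment width, a power of two that divides group_size.
 * Returns 0 on success, -1 if memory allocation fails. */
int scan1_inclusive_emulated(const uint32_t *idata, uint32_t *odata,
                             uint32_t group_size, uint32_t size)
{
    assert(group_size != 0);
    assert(size != 0 && (size & (size - 1)) == 0);
    assert(group_size % size == 0);

    /* Local memory: each segment of `size` items has `size` zeros in front. */
    uint32_t *l_data = calloc(2 * (size_t)group_size, sizeof *l_data);
    uint32_t *t = malloc((size_t)group_size * sizeof *t);
    if (l_data == NULL || t == NULL) {
        free(l_data);
        free(t);
        return -1;
    }

    for (uint32_t lid = 0; lid < group_size; lid++) {
        uint32_t pos = data_pos(lid, size);
        l_data[pos - size] = 0;
        l_data[pos] = idata[lid];
    }

    for (uint32_t offset = 1; offset < size; offset <<= 1) {
        /* After the first barrier: every work-item reads. */
        for (uint32_t lid = 0; lid < group_size; lid++) {
            uint32_t pos = data_pos(lid, size);
            t[lid] = l_data[pos] + l_data[pos - offset];
        }
        /* After the second barrier: every work-item writes. */
        for (uint32_t lid = 0; lid < group_size; lid++) {
            l_data[data_pos(lid, size)] = t[lid];
        }
    }

    for (uint32_t lid = 0; lid < group_size; lid++) {
        odata[lid] = l_data[data_pos(lid, size)];
    }

    free(l_data);
    free(t);
    return 0;
}
```

With 8 work-items and `size` set to 4, the input 1..8 should come out as two independent inclusive scans, one per group of four. Comparing the device output against a reference like this can help tell a synchronization problem apart from an indexing problem.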
-Jens
2 Replies
I have just tried inlining the code manually, which made it work!
Hello jrimestad,
Thanks for the report. We're working on reproducing and fixing the problem.