OpenCL* for CPU
Ask questions and share information on Intel® SDK for OpenCL™ Applications and OpenCL™ implementations for Intel® CPU.

Problems running scan SDK examples from NVidia

jrimestad
Hi,
I tried running NVIDIA's OpenCL scan (prefix sum) example on an Intel CPU, but without luck.
The program fails with a memory error.
If I remove the barrier synchronization lines in the following part of the code, it runs without errors, but the results are of course wrong:
[cpp]inline uint scan1Inclusive(uint idata, __local uint *l_Data, uint size){
    // Each group of `size` work-items owns 2 * size slots: the lower half is
    // zero padding, the upper half holds the input values.
    uint pos = 2 * get_local_id(0) - (get_local_id(0) & (size - 1));
    l_Data[pos] = 0;
    pos += size;
    l_Data[pos] = idata;

    for(uint offset = 1; offset < size; offset <<= 1){
        barrier(CLK_LOCAL_MEM_FENCE); // Fails with Intel OpenCL
        uint t = l_Data[pos] + l_Data[pos - offset];
        barrier(CLK_LOCAL_MEM_FENCE); // Fails with Intel OpenCL
        l_Data[pos] = t;
    }
    return l_Data[pos];
}[/cpp]
I have tested other code using barriers, and it worked fine. The main difference is that this example uses three levels of nested inline functions. The code shown above is the innermost function.
The code runs fine on an NVIDIA graphics card, and on the CPU when compiled with the AMD OpenCL compiler.
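To make the role of the two barriers concrete, here is a minimal host-side sketch in plain C. It is not part of the NVIDIA sample: the names `emulate_scan_inclusive` and `MAX_SIZE` are made up for illustration, and it assumes a single work-group whose local size equals `size`. Each barrier in the kernel becomes the boundary between two loops over all work-items, so every read of an iteration finishes before any write of that iteration starts.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_SIZE 256u /* hypothetical upper bound on the emulated work-group size */

/* Emulates scan1Inclusive for ONE work-group whose local size equals `size`
 * (an assumption of this sketch). `size` must be a power of two and at most
 * MAX_SIZE. `in` and `out` each hold `size` elements. */
void emulate_scan_inclusive(const uint32_t *in, uint32_t *out, uint32_t size)
{
    uint32_t l_data[2 * MAX_SIZE]; /* stands in for the __local buffer */
    uint32_t pos[MAX_SIZE];        /* each work-item's private `pos` */
    uint32_t t[MAX_SIZE];          /* each work-item's private `t` */

    assert(size > 0 && size <= MAX_SIZE && (size & (size - 1)) == 0);

    /* Setup: zero padding in the lower half, input values in the upper half. */
    for (uint32_t lid = 0; lid < size; ++lid) {
        pos[lid] = 2 * lid - (lid & (size - 1));
        l_data[pos[lid]] = 0;
        pos[lid] += size;
        l_data[pos[lid]] = in[lid];
    }

    for (uint32_t offset = 1; offset < size; offset <<= 1) {
        /* First barrier: all writes from the previous step are visible. */
        for (uint32_t lid = 0; lid < size; ++lid)
            t[lid] = l_data[pos[lid]] + l_data[pos[lid] - offset];

        /* Second barrier: every work-item has read before anyone writes. */
        for (uint32_t lid = 0; lid < size; ++lid)
            l_data[pos[lid]] = t[lid];
    }

    for (uint32_t lid = 0; lid < size; ++lid)
        out[lid] = l_data[pos[lid]];
}
```

Calling it with the input 1, 2, ..., 8 and `size` set to 8 should give the inclusive prefix sums. Merging the two inner loops into one models what can happen without the second barrier: a work-item may read a neighbour's slot after it has already been overwritten in the same step, which is consistent with the wrong results seen when the barriers are removed.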
-Jens
jrimestad
I have just tried inlining the code manually, and that made it work!
Eli_Bendersky__Intel
Hello jrimestad, thanks for the report. We're working on reproducing and fixing the problem.