I tried to compile my own OpenCL kernel, but KernelBuilder (both the 32-bit and 64-bit versions) reports this log:
fc1 build 1 succeded.
fc1 build 2 succeeded.
Error: Internal Error.
While compilation is in progress, the ioc32 process uses almost 2 GB of memory and ioc64 uses more than 5 GB.
The same program works fine on the CPU; it only fails on the GPU. Any pointers on where to start? Thanks very much. (I am using OpenCL 2013 on Windows 7 Pro with Intel HD driver version 184.108.40.20671.)
We debugged this further, and it appears that the program generates a branch spanning more than 2^15 instructions, which cannot be encoded in the current-generation GPU architecture. A simple workaround is to rewrite the function miller_rabin_32 without early returns, so that it has a single exit point, like this:
bool miller_rabin_32(long n)
{
    // Single exit point: the result is accumulated instead of returned early.
    bool result = false;
    if (n <= 1L) result = false;
    else if (n == 2L) result = true;
    else if (miller_rabin_pass_32( 2L, n) &&
             (n <= 7L  || miller_rabin_pass_32( 7L, n)) &&
             (n <= 61L || miller_rabin_pass_32(61L, n)))
        result = true;
    return result;
}
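The thread does not show miller_rabin_pass_32 itself. Below is a minimal sketch, assuming it is a standard Miller-Rabin (strong probable prime) round; the helpers pow_mod_32, is_prime_trial and check_range are hypothetical, and the code is plain host C rather than OpenCL C so the logic can be checked off the GPU. The pass function is also written with a single exit. With bases 2, 7 and 61 the test is deterministic for all n < 4,759,123,141, which covers every 32-bit input; the n <= 7 and n <= 61 guards skip a base that is not smaller than n (for n = 7 or n = 61 the base would be a multiple of n and would wrongly reject a prime).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: a^e mod n for n < 2^32.
   Both factors of every product are < 2^32, so products fit in 64 bits. */
static uint64_t pow_mod_32(uint64_t a, uint64_t e, uint64_t n)
{
    uint64_t result = 1 % n;
    a %= n;
    while (e > 0) {
        if (e & 1) result = (result * a) % n;
        a = (a * a) % n;
        e >>= 1;
    }
    return result;
}

/* Assumed implementation of one Miller-Rabin round (base a, odd or even n >= 3).
   Single exit: the loop condition stops early instead of a return statement. */
static bool miller_rabin_pass_32(long a, long n)
{
    uint64_t nn = (uint64_t)n;
    uint64_t d = nn - 1;
    int s = 0;
    while ((d & 1) == 0) { d >>= 1; s++; }   /* n - 1 = d * 2^s, d odd */

    uint64_t x = pow_mod_32((uint64_t)a, d, nn);
    bool result = (x == 1 || x == nn - 1);
    for (int r = 1; r < s && !result; r++) {
        x = (x * x) % nn;                    /* square up to s - 1 times */
        result = (x == nn - 1);
    }
    return result;
}

/* The rewritten function from the post above, unchanged in logic. */
bool miller_rabin_32(long n)
{
    bool result = false;
    if (n <= 1L) result = false;
    else if (n == 2L) result = true;
    else if (miller_rabin_pass_32( 2L, n) &&
             (n <= 7L  || miller_rabin_pass_32( 7L, n)) &&
             (n <= 61L || miller_rabin_pass_32(61L, n)))
        result = true;
    return result;
}

/* Hypothetical reference: trial division (i <= n / i avoids i * i overflow). */
static bool is_prime_trial(long n)
{
    bool prime = n >= 2;
    for (long i = 2; prime && i <= n / i; i++)
        prime = (n % i != 0);
    return prime;
}

/* Hypothetical cross-check over [lo, hi]; aborts on the first mismatch. */
static void check_range(long lo, long hi)
{
    for (long n = lo; n <= hi; n++)
        assert(miller_rabin_32(n) == is_prime_trial(n));
}
```

For example, check_range(-5L, 100000L) completes without tripping an assertion, and miller_rabin_32(2147483647L) returns true.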
Hope this helps.
Sorry for the delay in responding. It actually looks like a bug in KernelBuilder is responsible for the failure. You still need to remove the early returns and rewrite your kernel as I suggested above, but then build the program yourself instead of with KernelBuilder. I tried this and it seems to work.
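The same "remove the early returns" advice applies to other helpers in the kernel. A minimal sketch of the transformation on a hypothetical function (smallest_factor_early and smallest_factor_single_exit are illustrative names, not from the original kernel), in plain C so both versions can be compared on the host. It only shows that the source-level rewrite preserves behavior; whether a particular rewrite keeps the GPU branch distances under the limit depends on the compiler.

```c
#include <assert.h>

/* Hypothetical helper written with early returns (the style to avoid). */
static long smallest_factor_early(long n)
{
    if (n < 2) return 0;
    for (long i = 2; i <= n / i; i++)
        if (n % i == 0) return i;
    return n;
}

/* Same logic with one exit: a result variable plus a loop guard
   replace each early return. */
static long smallest_factor_single_exit(long n)
{
    long factor = 0;                /* 0 means "no factor" for n < 2 */
    if (n >= 2) {
        factor = n;                 /* treat n as prime until a divisor is found */
        for (long i = 2; factor == n && i <= n / i; i++)
            if (n % i == 0) factor = i;
    }
    return factor;
}
```

The loop guard factor == n plays the role of the early return inside the loop: once a divisor is stored, the loop stops on its next check.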