OpenCL* for CPU

Error: unimplemented function(s) used

rutger1
Beginner
419 Views
Good morning forum,
I ran the Shoc benchmark (http://ft.ornl.gov/doku/shoc) on Linux, on two different configurations:
1) AMD 64 X2 4400+ w/ Ubuntu Lucid
2) Intel Xeon E5620 @ 2.40GHz w/ RHEL6
For some problem sizes of the benchmark, I run into an SDK error:
For datasize=1: the AMD, in Reduction, Scan, Sort and Spmv;
for the Intel, Spmv;
No errors for datasize=2.
The error reported in Shoc's Reduce is:
Error: unimplemented function(s) used:
allZero_v8_i32 in function __Vectorized_reduce
allOne_v8_i32 in function __Vectorized_reduce
allZero_i32 in function __Vectorized_reduce
allOne_v8_i32 in function __Vectorized_reduce
Log: Error: unimplemented function(s) used:allZero_v8_i32 in function __Vectorized_reduceallOne_v8_i32 in function __Vectorized_reduceallZero_i32 in function __Vectorized_reduceallOne_v8_i32 in function __Vectorized_reduce
In Shoc's Spmv, the error looks related:
Error: unimplemented function(s) used:
allZero_v4_i32 in function __Vectorized_spmv_csr_vector_kernel
allOne_v4_i32 in function __Vectorized_spmv_csr_vector_kernel
And I *think* (but I am not really sure) that rarely, these errors do not occur -- so it might be a race condition.
What can I do to fix this? A gdb backtrace points me deep into the library, and we don't have the source code to that of course. If it helps, I can report the backtrace. I can also post the code to the kernels if that helps.
Rutger Hofman
VU Amsterdam
Evgeny_F_Intel
Employee
Hi,

Please confirm that you have these issues on Intel Xeon / RHEL6.
rutger1
Beginner
Yes, the problem also occurs on the Xeon/RHEL. Do you want me to post the relevant kernels or even the complete benchmark tarball? It seems the Shoc site is unreachable currently.

Rutger
Evgeny_F_Intel
Employee
Sure, thanks a lot.
Please provide the tarball.

Evgeny
rutger1
Beginner
Get it from here:
Hmmm. I was imprecise with the error report. The 'unimplemented function(s)' only occurs on AMD. On Intel, the Spmv benchmark throws a segmentation violation deep in the OpenCL library.
The error on Intel Xeon/RHEL occurs with size=1 with benchmark Spmv. To see the error quickly, configure and make. Then cd to the bin/Serial/OpenCL directory and run:
$ ./Spmv -d 0 -s 1
(a number of tests pass)
Segmentation fault (core dumped)
Any idea what the 'unimplemented function(s)' on the AMD CPU means, and how I can tackle that?
Rutger
Evgeny_F_Intel
Employee
Thanks for the additional info. The Intel SDK requires SSE4.1 support (this is the minimum requirement).
Please check whether your AMD machine supports it.
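One way to run that check on Linux is to look for the `sse4_1` token in the `flags` line of `/proc/cpuinfo`. The sketch below assumes a Linux system that exposes that file. The helper names `has_cpu_flag` and `cpu_has_sse41` are made up for illustration and are not part of any SDK.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: return 1 if `flag` appears in `line` as a whole
   whitespace-delimited token, 0 otherwise. */
int has_cpu_flag(const char *line, const char *flag)
{
    size_t len = strlen(flag);
    const char *p = line;

    if (len == 0)
        return 0;
    while ((p = strstr(p, flag)) != NULL) {
        /* The match must start the line or follow whitespace ... */
        int starts = (p == line) || p[-1] == ' ' || p[-1] == '\t';
        /* ... and must be followed by whitespace or the end of the string. */
        char end = p[len];
        int ends = end == '\0' || end == ' ' || end == '\t' || end == '\n';
        if (starts && ends)
            return 1;
        p += 1;
    }
    return 0;
}

/* Hypothetical helper: return 1 if the first "flags" line of /proc/cpuinfo
   lists sse4_1, 0 if it does not (or no such line exists), and -1 if the
   file cannot be opened. Assumes Linux. */
int cpu_has_sse41(void)
{
    char line[8192];
    int found = 0;
    FILE *f = fopen("/proc/cpuinfo", "r");

    if (f == NULL)
        return -1;
    while (fgets(line, sizeof line, f) != NULL) {
        if (strncmp(line, "flags", 5) == 0) {
            found = has_cpu_flag(line, "sse4_1");
            break;
        }
    }
    fclose(f);
    return found;
}
```

Calling `cpu_has_sse41()` from a small `main` and printing the result is enough to tell whether a machine meets the requirement. The token match is deliberately strict so that `ssse3` is not mistaken for `sse3`, and likewise for other flags that share a prefix.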

We will try to reproduce the failure on the Xeon machine.

Evgeny
rutger1
Beginner
It seems that Athlon 64 X2 Dual Core processors support only SSE, SSE2 and SSE3, not SSE4.1. That would explain the issue. Thanks.
Rutger Hofman
VU Amsterdam
rutger1
Beginner
I tried on yet another system, Intel Core i3 550 @ 3.20GHz running Ubuntu 10.04 x86_64 (since last night). The same segfault as on the Intel Xeon. Of course, it could be a Shoc problem, even though it doesn't surface with the AMD OpenCL/CPU SDK or the NVIDIA OpenCL/GPU SDK. And it is indeed a race condition, as it occasionally runs correctly to termination.

Rutger
Evgeny_F_Intel
Employee
Thanks for the info.
I asked our QA team to reproduce the issue.
Yevgeniy_B_Intel
Employee
We've reproduced this bug. This issue is connected with a mistake in the SHOC source code. To fix this problem add the following code:
memset(newcols, 0, paddedSize*sizeof(int));

in file src/opencl/level1/spmv/util.h, line 478 (this line is empty), and recompile. That should help you avoid the crashes.
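To illustrate why zeroing the buffer matters, here is a minimal sketch of the pattern. It is not SHOC's actual code: the function `pad_columns` and its parameters are hypothetical. The assumption is that the column-index array is allocated with padding and only the real entries are copied in, so without the `memset` the padded slots hold arbitrary values. That would be consistent with the occasional successful run.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, not taken from SHOC: copy `n` column indices into a
   freshly allocated buffer of `padded_size` ints. The caller frees the
   result. Returns NULL on allocation failure or if padded_size < n. */
int *pad_columns(const int *cols, size_t n, size_t padded_size)
{
    int *newcols;

    if (padded_size < n)
        return NULL;
    newcols = malloc(padded_size * sizeof *newcols);
    if (newcols == NULL)
        return NULL;

    /* Zero the whole buffer first, so every padded slot holds index 0,
       which is a valid column. Without this line the padded slots are
       uninitialized and a later lookup such as vec[newcols[j]] may read
       out of bounds. */
    memset(newcols, 0, padded_size * sizeof(int));

    /* Then copy the real column indices over the first n slots. */
    memcpy(newcols, cols, n * sizeof(int));
    return newcols;
}
```

With the zeroing in place, a kernel that walks over the padded entries reads column 0 instead of a random address.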
rutger1
Beginner
Well, that is fast debugging! Thanks a lot. I will notify the Shoc people.

Rutger
Yevgeniy_B_Intel
Employee
You're welcome!
I think we've already notified the SHOC team. But you can remind them about this problem one more time as they haven't fixed it yet.
cantallo
Beginner
Hello,

On a Core 2 Duo E6850 (stepping 0b), I got the following error:


Error: unimplemented function(s) used:
allZero_i32 in function __Vectorized_.preillumin
allOne_i32 in function __Vectorized_.preillumin
CompilerException Failed to parse IR


This happens on a task (work-group of 1 thread) for a non-optimized housekeeping part of my code (listed at the end of my post).

Does it mean that I should avoid assignments such as i=0; or i=-1;?

code:
[cpp]__kernel __attribute__((reqd_work_group_size(1,1,1))) void preillumin(const int iazimut,const int bufoffset,const int buflongfilled,const float azimut,const __global float *altilook,const __global float *integstartband,const __global float *integstopband,const __global double *xtraj2,const __global double *ytraj2,const __global double *ztraj2,const __global float *psiatti2,const __global float *thetaatti2,const __global float *phiatti2,const __global float *lobexbuf,const __global float *t2,const __global float *hardpreintbuf,const __global float *rangefilterbuf,__global float *restrict compenstoband)
{
__private int i,j,n,p;
__private float w,w2,w4,d1,d2,xfocus,yfocus,zfocus,xtraj,ytraj,ztraj;
__private float c,s;
__private float x,y,z;
__private float x1,y1,z1;

d2=azimut*(3.316377475857735e-02f)+(-1.877929329872131e+00f);
i=floor(d2);
if (i<0)
 {
 i=0;
 d2=0.0f;
 }
if (i>132)
 {
 i=132;
 d2=133.0f;
 }
d2-=i;
d1=get_global_id(0)*(9.692779287317110e-01f)+(-3.377743680511874e-01f);
j=floor(d1);
if (j<0)
 {
 j=0;
 d1=0.0f;
 }
if (j>43)
 {
 j=43;
 d1=44.0f;
 }
d1-=j;
n=45*i+j;
zfocus=(1.0f-d2)*((1.0f-d1)*altilook+d1*altilook[n+1])+d2*((1.0f-d1)*altilook[n+45]+d1*altilook[n+46])+(-2.252434373756267e+03f);
w=get_global_id(0)*(2.955416520878326e+01f)+(3.603700952450512e+03f);
d2=zfocus+azimut*(-1.010356079772958e-04f);
d1=w*(3.501031082123518e-03f)+d2*(-1.010356084929913e-04f);
d2=w*w-d2*d2-d1*d1;
if (d2<0.0f)
 {
 d1=w*(3.602066051945484e-03f);
 zfocus=azimut*(1.010356079772958e-04f)+w*(-9.999935018887185e-01f);
 d2=0.0f;
 }
else
 d2=sqrt(d2);
xfocus=azimut*(8.592837859064036e-01f)+d1*(8.592837902922715e-01f)+d2*(-5.114991375759568e-01f);
yfocus=azimut*(-5.114991349652156e-01f)+d1*(-5.114991375759568e-01f)+d2*(-8.592837902922715e-01f);
w2=0.0f;
for (n=0;n<=2;n++)
 {
 p=64*iazimut+get_global_id(0);
 p=floor(integstartband+(integstopband-integstartband)*(5.000000000000000e-01f)*n+0.5f);
 if (p<0) p=0;
 if (p>=buflongfilled) p=buflongfilled-1;
 i=(p+bufoffset)&2047;
 xtraj=xfocus-xtraj2;
 ytraj=yfocus-ytraj2;
 ztraj=zfocus-ztraj2;
 d2=sqrt(xtraj*xtraj+ytraj*ytraj+ztraj*ztraj);
 x1=(8.660254037844387e-01f);
 y1=(5.302876193624535e-17f);
 z1=(-4.999999999999999e-01f);
 c=cos(phiatti2);
 s=sin(phiatti2);
 x=x1*c+z1*s;
 z1=z1*c-x1*s;
 x1=x;
 c=cos(thetaatti2);
 s=sin(thetaatti2);
 y=y1*c-z1*s;
 z1=z1*c+y1*s;
 y1=y;
 c=cos(psiatti2);
 s=sin(psiatti2);
 x=x1*c+y1*s;
 y1=y1*c-x1*s;
 x1=x;
 w=(xtraj*x1+ytraj*y1+ztraj*z1)/d2;
 w=native_sqrt(1.0f-w*w);
 i=floor(w*4000.0f+0.5f);
 if (i<0) i=0;
 if (i>3999) i=3999;
 w4=lobexbuf;
 if (!(w4>1.000000047497451e-03f)) w4=0.0f;
 i=p-1;
 if (i<0) i=0;
 i=(bufoffset+i)&2047;
 j=p+1;
 if (j>buflongfilled) j=buflongfilled;
 j=(bufoffset+j)&2047;
 x=xtraj2-xtraj2;
 y=ytraj2-ytraj2;
 z=ztraj2-ztraj2;
 w=native_divide(xtraj*x+ytraj*y+ztraj*z,d2*(t2-t2));
 i=floor(w*(1.292359000000000e+06f)+(1.874149169921875e+03f));
 if (i<0) i=0;
 if (i>3999) i=3999;
 w=hardpreintbuf;
 w4*=w;
 i=floor(d2*(6.101493061542592e+00f)+(-2.613644720440822e+04f));
 if (i<-4097) i=-4097;
 if (i>4094) i=4094;
 w4=native_divide(w4,rangefilterbuf[i&8191]);
 i=1+(n&1);
 i=i+i;
 if (n==0 || n==2) i=1;
 w2+=i*w4;
 }
w2*=1.666666666666667e-01f;
p=64*iazimut+get_global_id(0);
i=floor(integstopband);
if (i>=buflongfilled-1) i=buflongfilled-2;
w=integstopband-i;
w=w*t2[(bufoffset+i+1)&2047]+(1.0f-w)*t2[(bufoffset+i)&2047];
i=floor(integstartband);
if (i<0) i=0;
w4=integstartband-i;
w-=w4*t2[(bufoffset+i+1)&2047]+(1.0f-w4)*t2[(bufoffset+i)&2047];
w2*=w;
compenstoband[get_global_id(0)]=w2;
} [/cpp]


cantallo
Beginner
Okay, I checked: the Core 2 E6850 does not have the SSE4.1 instructions!

My mistake (SSE4.1 is a prerequisite for the Intel OpenCL SDK).

On a Xeon X5570 (I just checked) it compiles with no errors.