"Illegal instruction" using -ipo

QFang1 · ‎11-18-2010

I was working on code that uses SSE4.1 instructions. When compiled with "icc -O3 -msse4.1", everything worked just fine. However, if I add -ipo to the compilation, the build succeeds, but the program crashes with an "Illegal instruction" error.

Using valgrind, I found the offending instruction was the following

vex amd64->IR: unhandled instruction bytes: 0x66 0x45 0xF 0x3A 0x40 0xD9

I don't know what that means.

Similarly, if I use -fast, I get a different error:

Fatal Error: This program was not built to run on the processor in your system.
The allowed processors are: Intel processors with SSE4.2 and POPCNT instructions support.

My computer has a Xeon E5520 quad-core CPU and is running 64-bit Ubuntu Linux 10.04. /proc/cpuinfo shows the following:

processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 26
model name : Intel Xeon CPU E5520 @ 2.27GHz
stepping : 5
cpu MHz : 2266.785
cache size : 8192 KB
physical id : 0
siblings : 4
core id : 0
cpu cores : 4
apicid : 0
initial apicid : 0
fpu : yes
fpu_exception : yes
cpuid level : 11
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts rep_good xtopology nonstop_tsc aperfmperf pni dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm dca sse4_1 sse4_2 popcnt lahf_lm ida tpr_shadow vnmi flexpriority ept vpid
bogomips : 4533.57
clflush size : 64
cache_alignment : 64
address sizes : 40 bits physical, 48 bits virtual
power management:

It looks like sse4_1, sse4_2 and popcnt are all supported.
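As a way to make that check less error-prone than reading the flags line by eye, here is a minimal sketch that tests for a flag as a whole token, so that "sse" does not accidentally match "sse2". The helper name has_cpu_flag is hypothetical, and the caller is assumed to have already read the "flags" line from /proc/cpuinfo into a string.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper (not part of any library): returns true if `name`
   appears as a whole whitespace-delimited token in a /proc/cpuinfo
   "flags" line that the caller has already read into `flags`. */
bool has_cpu_flag(const char *flags, const char *name)
{
    assert(flags != NULL && name != NULL);

    size_t len = strlen(name);
    if (len == 0)
        return false;

    const char *p = flags;
    while ((p = strstr(p, name)) != NULL) {
        /* The match must start at the beginning or after whitespace... */
        bool starts = (p == flags) || p[-1] == ' ' || p[-1] == '\t';
        /* ...and must be followed by whitespace or the end of the string. */
        bool ends = p[len] == '\0' || p[len] == ' ' ||
                    p[len] == '\t' || p[len] == '\n';
        if (starts && ends)
            return true;
        p += 1; /* keep searching past this partial match */
    }
    return false;
}
```

For the flags line above, has_cpu_flag(flags, "sse4_1"), has_cpu_flag(flags, "sse4_2") and has_cpu_flag(flags, "popcnt") would all be expected to return true.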

Can anyone tell me what is going on and whether there is a work-around? The icc version is "icc (ICC) 12.0.0 20101006".

thanks in advance!

Qianqian

mecej4 · ‎11-19-2010

It would help a lot if you could isolate and post a small C/C++ code extract which, when compiled with the -O3 -msse4.1 -ipo options, produces the offending instruction sequence.

Tom_Truscott · ‎11-19-2010

That valgrind error usually indicates a valid machine instruction which is not yet supported by valgrind.

If you aren't using the latest valgrind (3.6.0 released 21 October 2010), give it a try.
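To illustrate why the reported bytes look like a valid instruction, here is a rough sketch that decodes just this one pattern: a 0x66 prefix, an optional REX byte, the opcode 0F 3A 40 (DPPS, the SSE4.1 packed single-precision dot product), and a register-to-register ModRM byte. The names DecodedInsn and decode_dpps are made up for illustration, and this is not a general x86 decoder. The immediate byte that follows the ModRM byte is not shown in the valgrind message, so it is not decoded here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical result type for this sketch. */
typedef struct {
    bool is_dpps;   /* true if the bytes match 66 [REX] 0F 3A 40 /r, mod=3 */
    unsigned reg;   /* destination XMM register number (ModRM.reg + REX.R) */
    unsigned rm;    /* source XMM register number (ModRM.rm + REX.B) */
} DecodedInsn;

/* Decodes only the register-register form of DPPS; anything else is
   reported as "not matched" rather than decoded. */
DecodedInsn decode_dpps(const uint8_t *b, size_t n)
{
    DecodedInsn out = { false, 0, 0 };
    size_t i = 0;
    uint8_t rex = 0;

    assert(b != NULL);

    if (n == 0 || b[i] != 0x66)          /* mandatory 0x66 prefix */
        return out;
    i++;
    if (i < n && (b[i] & 0xF0) == 0x40)  /* optional REX prefix 0x40..0x4F */
        rex = b[i++];
    if (i + 4 > n)                       /* need 0F 3A 40 plus ModRM */
        return out;
    if (b[i] != 0x0F || b[i + 1] != 0x3A || b[i + 2] != 0x40)
        return out;

    uint8_t modrm = b[i + 3];
    if ((modrm >> 6) != 3)               /* memory operands not handled here */
        return out;

    out.is_dpps = true;
    out.reg = ((modrm >> 3) & 7u) | ((rex & 0x04) ? 8u : 0u); /* REX.R */
    out.rm  = (modrm & 7u)        | ((rex & 0x01) ? 8u : 0u); /* REX.B */
    return out;
}
```

Fed the six bytes from the valgrind message, this sketch reports a DPPS with reg 11 and rm 9, i.e. something like "dpps xmm11, xmm9, imm8", which is consistent with a legitimate SSE4.1 instruction that an older valgrind simply does not model.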

JenniferJ · ‎11-19-2010

Yes, we'll need more info in order to do any valid investigation.
Here are a couple of things to try:
1. Try "IDB".
2. The issue may not be ipo itself. It may be other optimizations that only kick in after inlining, so lowering the optimization level might help work around the problem.

This issue will likely take a long time to isolate. Please file a ticket with Intel Premier Support (https://premier.intel.com/) to get more hands-on help.

thanks,
Jennifer

aazue · ‎11-19-2010

Hi, I don't know if this helps, but a Prescott processor gives the same problem, with no relation to the code. It is strange that the same code is accepted by the GNU compiler with -march=core2 but rejected by icc. I think the problem is the firmware configuration (IRQ) (separated (old)). Regards

QFang1 · ‎11-19-2010

hi Jennifer

thanks for the advice, idb is a very nice tool, I am glad that you pointed this out to me.

Using IDB, I found the code crashed at the following highlighted line:

for(i=0;i<4;i++)
if(havelsse4(plucker->m+eid*12+i*3,pout,&bary,o,d,int_coef)){
...
}

where the function havelsse4 is defined as an inline function

[bash]inline int havelsse4(float4 *vecN, float4 *pout, float4 *bary, const __m128 o, const __m128 d, const __m128 int_coef){
    ...
    if(...){
        _mm_store_ps(&pout->x, _mm_mul_ps(detp, _mm_shuffle_ps(inv_det, inv_det, 0))); /* crashed here */
        return 1;
    }
    return 0;
}[/bash]

the error occurred at "movaps xmmword ptr [r8], xmm8" which corresponds to the marked line: ptr [r8] points to address "&pout->x" and xmm8 is the result from _mm_mul_ps().

if I comment out this line, -ipo and -fast work perfectly. pout is a pointer to a float4 (a struct of 4 floats) and it was allocated. Do you think that icc did something fishy when expanding this inline function?

JenniferJ · ‎11-19-2010

how about moving the intrinsics parameters into different statements:

aa = _mm_shuffle_ps(inv_det,inv_det,0);
bb = _mm_mul_ps(detp, aa);
_mm_store_ps(&pout->x,bb);

can you give it a try?
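For anyone who wants to try this in isolation, here is a minimal self-contained sketch of the split-statement form. The function store_scaled is a hypothetical stand-in for the tail of havelsse4 (the rest of that function is not shown in the thread), and the float4 definition is assumed to be a 16-byte aligned struct of four floats. It needs an x86 compiler with SSE enabled.

```c
#include <assert.h>
#include <stdint.h>
#include <xmmintrin.h> /* SSE: _mm_shuffle_ps, _mm_mul_ps, _mm_store_ps */

/* Assumed layout: four floats, 16-byte aligned (GCC/ICC attribute syntax). */
typedef struct Float4 {
    float x, y, z, w;
} float4 __attribute__((aligned(16)));

/* Hypothetical stand-in for the last step of havelsse4: scales detp by
   lane 0 of inv_det and stores the result to *pout with an aligned store. */
void store_scaled(float4 *pout, __m128 detp, __m128 inv_det)
{
    /* _mm_store_ps requires a 16-byte aligned destination. */
    assert(pout != NULL && ((uintptr_t)pout & 15u) == 0);

    __m128 aa = _mm_shuffle_ps(inv_det, inv_det, 0); /* broadcast lane 0 */
    __m128 bb = _mm_mul_ps(detp, aa);                /* element-wise product */
    _mm_store_ps(&pout->x, bb);                      /* aligned 16-byte store */
}
```

With made-up inputs such as detp = _mm_setr_ps(1, 2, 3, 4) and inv_det = _mm_setr_ps(0.5f, 9, 9, 9), the stored result should be (0.5, 1, 1.5, 2), since only lane 0 of inv_det is used.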

QFang1 · ‎11-19-2010

hi Jennifer

I tried, but got the same error.

jimdempseyatthecove · ‎11-20-2010

>>the error occurred at "movaps xmmword ptr [r8], xmm8" which corresponds to the marked line: ptr [r8] points to address "&pout->x"

Was r8 a multiple of 16?

movaps == move 16-byte aligned single precision

Jim Dempsey

QFang1 · ‎11-21-2010

yes, ptr [r8] is 16-byte aligned. Variable pout is a pointer to a float4 struct defined by

typedef struct Float4{
float x,y,z,w;
} float4 __attribute__ ((aligned(16)));

JenniferJ · ‎12-10-2010

Could you file a ticket with Intel Premier Support? We will need a test case for it. Or if you could attach the test case here (private if preferred), that would be OK too.

btw, make sure to get the latest compiler update.

Thanks,
Jennifer