Using valgrind, I found that the offending instruction was the following:
vex amd64->IR: unhandled instruction bytes: 0x66 0x45 0xF 0x3A 0x40 0xD9
I don't know what that means.
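For readers who hit the same message: it comes from Valgrind's own instruction translator (VEX), which stopped at an instruction it could not handle, so by itself it does not prove the program is wrong. The bytes can be decoded by hand. The sketch below is only an illustration, assuming the standard x86-64 layout (0x66 prefix, optional REX byte, three-byte opcode 0F 3A 40, then a register-direct ModRM byte); the helper name `decode_dpps` is made up for this example and it recognises nothing but this one form.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: decodes only "66 [REX] 0F 3A 40 /r" (DPPS, an SSE4.1
 * dot-product instruction) with a register-direct ModRM byte.
 * Returns 0 on success, -1 if the bytes do not match that form. */
int decode_dpps(const unsigned char *b, size_t n, char *out, size_t outlen)
{
    size_t i = 0;
    unsigned rex = 0;

    if (i >= n || b[i] != 0x66)          /* mandatory operand-size prefix */
        return -1;
    i++;
    if (i < n && (b[i] & 0xF0) == 0x40)  /* optional REX byte: 0100WRXB */
        rex = b[i++];
    if (i + 3 >= n)                      /* need 0F 3A 40 plus ModRM */
        return -1;
    if (b[i] != 0x0F || b[i + 1] != 0x3A || b[i + 2] != 0x40)
        return -1;

    unsigned modrm = b[i + 3];
    if ((modrm >> 6) != 3)               /* only register-to-register form */
        return -1;

    /* REX.R extends the reg field, REX.B extends the r/m field. */
    unsigned reg = ((modrm >> 3) & 7) | ((rex & 0x4) ? 8u : 0u);
    unsigned rm  = (modrm & 7)        | ((rex & 0x1) ? 8u : 0u);

    /* The trailing imm8 byte is not part of the bytes Valgrind printed. */
    snprintf(out, outlen, "dpps xmm%u, xmm%u, imm8", reg, rm);
    return 0;
}
```

Fed the six bytes from the message, this yields a DPPS between two XMM registers, which is an SSE4.1 instruction that the compiler is allowed to emit for this CPU.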
Similarly, if I use -fast, I get other errors, such as
Fatal Error: This program was not built to run on the processor in your system.
The allowed processors are: Intel processors with SSE4.2 and POPCNT instructions support.
My computer has a quad-core Xeon E5520 CPU and runs 64-bit Ubuntu Linux 10.04. /proc/cpuinfo shows the following:
- processor : 0
- vendor_id : GenuineIntel
- cpu family : 6
- model : 26
- model name : Intel Xeon CPU E5520 @ 2.27GHz
- stepping : 5
- cpu MHz : 2266.785
- cache size : 8192 KB
- physical id : 0
- siblings : 4
- core id : 0
- cpu cores : 4
- apicid : 0
- initial apicid : 0
- fpu : yes
- fpu_exception : yes
- cpuid level : 11
- wp : yes
- flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts rep_good xtopology nonstop_tsc aperfmperf pni dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm dca sse4_1 sse4_2 popcnt lahf_lm ida tpr_shadow vnmi flexpriority ept vpid
- bogomips : 4533.57
- clflush size : 64
- cache_alignment : 64
- address sizes : 40 bits physical, 48 bits virtual
- power management:
Can anyone tell me what is going on and whether there is a workaround? The icc version is "(ICC) 12.0.0 20101006".
Thanks in advance!
Qianqian
If you aren't using the latest valgrind (3.6.0 released 21 October 2010), give it a try.
Yes, we'll need more information before we can investigate properly.
Several things to try:
1. Try "IDB".
2. The issue may not be IPO itself. It may be other optimizations that only kick in after inlining,
so lowering the optimization level might work around the problem.
This issue will likely take a long time to isolate. Please file a ticket with Intel Premier Support (https://premier.intel.com/) to get more hands-on help.
thanks,
Jennifer
Thanks for the advice. IDB is a very nice tool; I am glad you pointed it out to me.
Using IDB, I found that the code crashed at the following line:
for(i=0;i<4;i++)
if(havelsse4(plucker->m+eid*12+i*3,pout,&bary,o,d,int_coef)){
...
}
where the function havelsse4 is defined as an inline function:
[bash]inline int havelsse4(float4 *vecN, float4 *pout, float4 *bary, const __m128 o, const __m128 d, const __m128 int_coef){
...
if(...){
_mm_store_ps(&pout->x, _mm_mul_ps(detp,_mm_shuffle_ps(inv_det, inv_det, 0))); /* crashed here */
return 1;
}
return 0;
}[/bash]
The error occurred at "movaps xmmword ptr [r8], xmm8", which corresponds to the marked line: ptr [r8] points to the address "&pout->x" and xmm8 holds the result of _mm_mul_ps().
If I comment out this line, -ipo and -fast work perfectly. pout is a pointer to a float4 (a struct of 4 floats) and it was allocated. Do you think icc did something fishy when expanding this inline function?
How about splitting the nested intrinsic calls into separate statements:
aa = _mm_shuffle_ps(inv_det,inv_det,0);
bb = _mm_mul_ps(detp, aa);
_mm_store_ps(&pout->x,bb);
Can you give it a try?
I tried, but got the same error.
Was r8 a multiple of 16?
movaps == move 16-byte aligned single precision
Jim Dempsey
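One way to answer that question without a debugger is to test the destination address just before the store. The following is a minimal sketch, not the original poster's code: `is_aligned16` and `store4` are hypothetical helper names, and falling back to `_mm_storeu_ps` is only a diagnostic aid, since the aligned store is normally what you want once the data is laid out correctly.

```c
#include <stdint.h>
#include <xmmintrin.h>

/* Returns nonzero when p sits on a 16-byte boundary, which is what
 * movaps (and therefore _mm_store_ps) requires of its memory operand. */
int is_aligned16(const void *p)
{
    return ((uintptr_t)p & 15u) == 0;
}

/* Hypothetical wrapper: uses the aligned store when it is legal and the
 * unaligned store (movups) otherwise, so a bad address no longer faults. */
void store4(float *dst, __m128 v)
{
    if (is_aligned16(dst))
        _mm_store_ps(dst, v);
    else
        _mm_storeu_ps(dst, v);
}
```

Calling `store4(&pout->x, ...)` in place of `_mm_store_ps(&pout->x, ...)` and logging the result of `is_aligned16(pout)` would show whether r8 was in fact a multiple of 16 at the crash site.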
typedef struct Float4{
float x,y,z,w;
} float4 __attribute__ ((aligned(16)));
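Note that the attribute is a promise to the compiler about every float4 object; it does not by itself make dynamically allocated memory aligned. The thread does not show how pout was allocated, so the sketch below only illustrates one way to honor that promise, assuming a C11 library that provides `aligned_alloc`. The names `alloc_float4` and `float4_ptr_ok` are invented for this example.

```c
#include <stdint.h>
#include <stdlib.h>

typedef struct Float4 {
    float x, y, z, w;
} float4 __attribute__ ((aligned(16)));

/* Hypothetical allocator: returns count float4 elements on a 16-byte
 * boundary, or NULL on failure. sizeof(float4) is 16, so the total size
 * is always a multiple of the alignment, as aligned_alloc requires. */
float4 *alloc_float4(size_t count)
{
    return (float4 *)aligned_alloc(16, count * sizeof(float4));
}

/* Runtime check that a float4 pointer really meets the promised alignment. */
int float4_ptr_ok(const float4 *p)
{
    return ((uintptr_t)p % 16u) == 0;
}
```

Memory obtained this way is released with `free`. A float4 pointer produced by casting an arbitrary `float *` offset would still fail `float4_ptr_ok`, which is the situation an aligned store cannot tolerate.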
Could you file a ticket with Intel Premier Support? We will need a test case for it. Or, if you could attach the test case here (privately, if preferred), that would be OK too.
By the way, make sure to get the latest compiler update.
Thanks,
Jennifer
