Hi everybody,
We just built Python today using icc 2016. On the dev machine everything works fine, but on our CI server all of our automated Python tests crash. When I start the Python command line, I get the following error message: "Please verify that both the operating system and the processor support Intel(R) MOVBE, F16C, FMA, BMI, LZCNT and AVX2 instructions." After printing this, Python exits.
I printed out /proc/cpuinfo for both machines:
dev machine:
model name : Intel(R) Xeon(R) CPU E5-2640 v3 @ 2.60GHz
stepping : 2
microcode : 49
cpu MHz : 2599.785
cache size : 20480 KB
physical id : 0
siblings : 16
core id : 7
cpu cores : 8
apicid : 15
initial apicid : 15
fpu : yes
fpu_exception : yes
cpuid level : 15
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good xtopology nonstop_tsc aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm ida arat epb xsaveopt pln pts dts tpr_shadow vnmi flexpriority ept vpid fsgsbase bmi1 avx2 smep bmi2 erms invpcid
on the CI machine:
processor : 3
vendor_id : GenuineIntel
cpu family : 6
model : 45
model name : Intel(R) Xeon(R) CPU E5-2670 0 @ 2.60GHz
stepping : 2
cpu MHz : 2599.999
cache size : 20480 KB
fpu : yes
fpu_exception : yes
cpuid level : 13
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts mmx fxsr sse sse2 ss syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts xtopology tsc_reliable nonstop_tsc aperfmperf unfair_spinlock pni pclmulqdq ssse3 cx16 pcid sse4_1 sse4_2 x2apic popcnt aes xsave avx hypervisor lahf_lm ida arat epb pln pts dts
So it's apparent that the CI machine lacks some of the flags that the dev machine has. The question is: which compiler flags can or must I set or unset to get the Python binaries to work on both machines?
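For anyone wanting to see the gap between the two machines at a glance, here is a minimal Python sketch that checks the two flag lists above against the features named in the error message. The mapping from the error-message names to Linux cpuinfo flag names (for example LZCNT showing up as `abm` and BMI as `bmi1`) is an assumption based on how Linux usually labels these features, and `missing_features` is a hypothetical helper name.

```python
# Flags copied from the two /proc/cpuinfo listings in the post above.
DEV_FLAGS = set("""
fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush
dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc
arch_perfmon pebs bts rep_good xtopology nonstop_tsc aperfmperf pni pclmulqdq
dtes64 monitor ds_cpl vmx smx est tm2 ssse3 fma cx16 xtpr pdcm pcid dca sse4_1
sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm
abm ida arat epb xsaveopt pln pts dts tpr_shadow vnmi flexpriority ept vpid
fsgsbase bmi1 avx2 smep bmi2 erms invpcid
""".split())

CI_FLAGS = set("""
fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush
dts mmx fxsr sse sse2 ss syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts
xtopology tsc_reliable nonstop_tsc aperfmperf unfair_spinlock pni pclmulqdq ssse3
cx16 pcid sse4_1 sse4_2 x2apic popcnt aes xsave avx hypervisor lahf_lm ida arat
epb pln pts dts
""".split())

# Assumed mapping: feature name in the error message -> Linux cpuinfo flag.
REQUIRED = {
    "MOVBE": "movbe",
    "F16C": "f16c",
    "FMA": "fma",
    "BMI": "bmi1",
    "LZCNT": "abm",
    "AVX2": "avx2",
}


def missing_features(flags):
    """Return the required feature names whose cpuinfo flag is absent."""
    return sorted(name for name, flag in REQUIRED.items() if flag not in flags)


print("dev machine is missing:", missing_features(DEV_FLAGS))
print("CI machine is missing: ", missing_features(CI_FLAGS))
```

On a live machine the flag set could be read from the `flags` line of `/proc/cpuinfo` instead of being pasted in.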
Hi Christian,
Your dev machine uses an E5-2640 v3, which supports AVX2, while your CI machine supports AVX only.
Which compiler flag did you use on your dev machine? Is it -xHost?
You may try -xAVX -axCORE-AVX2 to run on both machines.
Hope this helps.
Thanks.
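To illustrate why a baseline of -xAVX with an additional -axCORE-AVX2 path should start on both machines, here is a rough Python model of the idea. It is only a sketch of the reasoning, not how the compiler actually implements its CPU check or dispatch: the `TARGET_FLAGS` table (which cpuinfo flags each target needs) and the `selected_path` helper are assumptions made up for this example, and the real compiler only emits an alternate path where it expects a benefit.

```python
# Assumed, simplified mapping from Intel target names to the Linux cpuinfo
# flags a processor needs in order to run code built for that target.
TARGET_FLAGS = {
    "AVX": {"avx"},
    "CORE-AVX2": {"avx", "avx2", "fma", "movbe", "f16c", "bmi1", "abm"},
}


def selected_path(host_flags, baseline, alternate=None):
    """Model which path a binary built with -x<baseline> [-ax<alternate>] takes.

    The baseline must be supported or the program refuses to start; the
    alternate path is only used when the host supports everything it needs.
    """
    host = set(host_flags)
    if not TARGET_FLAGS[baseline] <= host:
        return "fails startup check"
    if alternate is not None and TARGET_FLAGS[alternate] <= host:
        return alternate
    return baseline


# Reduced flag sets for the two machines in this thread.
dev = {"avx", "avx2", "fma", "movbe", "f16c", "bmi1", "abm"}
ci = {"avx"}

# A CORE-AVX2 baseline (what -xHost presumably resolved to on the dev machine).
print("CI,  -xCORE-AVX2:          ", selected_path(ci, "CORE-AVX2"))
# The suggested combination: AVX baseline plus an optional CORE-AVX2 path.
print("CI,  -xAVX -axCORE-AVX2:   ", selected_path(ci, "AVX", "CORE-AVX2"))
print("dev, -xAVX -axCORE-AVX2:   ", selected_path(dev, "AVX", "CORE-AVX2"))
```

In this model the CI machine falls back to the AVX baseline while the dev machine can take the CORE-AVX2 path, which is also why the two machines may end up exercising different code.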
Yolanda's answer appears correct, but -mAVX may be sufficient. It would be difficult to see sufficient performance advantage in AVX2 to compensate for code expansion with multiple architecture paths, and you may wish to test fully on both machines if they are taking different code paths.
AVX optimization in Intel compilers is sometimes better tuned than AVX2, so AVX may actually run faster on an AVX2 platform, although that appears to border on an actionable bug.
Hi Yolanda,
Yes, -xHost is set. We used to have an older dev machine where this flag didn't cause any problems. I'll go through the docs and play with the options.
Thanks,
Christian
Thanks Tim!
Hi, Tim and Christian
AVX2 doubles the width of the integer vector instructions to 256 bits, and the same processor generation adds FMA.
I agree it's worth testing fully on both machines, since they may take different code paths. In some cases AVX code may run faster on an AVX2 platform, but in most cases I have seen, AVX2 is still better.
Hope it helps.
Thanks.
