I used icpc to compile an AVX program that uses intrinsics such as _mm256_maskload_epi32.
The program compiles and links fine, but when I run it I get an "Illegal instruction" error.
This is on a Linux machine, "SLES11SP2-2 Revision 0 ia32e".
I do see "avx" in /proc/cpuinfo.
If I only use intrinsics like _mm256_sub_ps, it runs fine.
Please help. If it is a hardware limitation, how can I check?
If you had -mavx set when compiling, you should not see this promoted to AVX2. You might check the asm file generated by -S.
Your /proc/cpuinfo report seems to indicate avx should be OK.
Thank you for your suggestion.
I have tried both "-mavx" and "-march=core-avx2", and they give me the same output: Illegal instruction
It seems this only happens with certain AVX instructions. For example, _mm256_sub_ps is fine, but _mm256_fmaddsub_pd is not.
What am I missing here?
"_mm256_maskload_epi32" is an AVX2 intrinsic. When your code includes it, the binary only works on a system that supports the AVX2 instruction set, for example Haswell (HSW). You can generate the asm file with the -S option and check that the equivalent instruction is "vpmaskmovd", operating on ymm registers. The intrinsic "_mm256_sub_ps", on the other hand, works on any system that supports AVX (like Sandy Bridge, SNB), and you'll find its equivalent instruction in the asm file as "vsubps".
So, if your code has AVX intrinsics, compile with -xAVX, and if it has any AVX2 intrinsics, compile with the -xCORE_AVX2 switch. Of course, you'll need to run the binary on a processor that supports the intrinsics you use in your code. The guide at https://software.intel.com/sites/landingpage/IntrinsicsGuide/ lists the supported intrinsics, including AVX and AVX2.
If your code doesn't use any intrinsics per se, the -xHOST switch makes the compiler use the highest SIMD instruction set available on the system, and the generated asm will reflect what that system supports.
BTW, you can use the manual dispatch procedures to dispatch a particular routine to the processor of your choice. The article https://software.intel.com/en-us/articles/how-to-manually-target-2nd-generation-intel-core-processor... gives some details on its usage, so you can target sections of code to a specific processor, which might be useful.
I tried -xHOST and got the same error message.
So if I only see avx in the /proc/cpuinfo, without avx2, does it mean my system doesn't support avx2?
As I noted earlier, _mm256_maskload_epi32() is provided only by the AVX2 instruction set, not by AVX. Because it is an intrinsic, the compiler emits the corresponding AVX2 instruction for it. When you run the binary on a system that supports only AVX and not AVX2, it therefore fails with an illegal-instruction error.
That said, -xHOST tells the compiler to use the highest SIMD instruction set available on the machine you compile on. But because you explicitly use an intrinsic that maps to one specific instruction, in this case the AVX2 instruction "vpmaskmovd" that you can see in the asm file, that instruction ends up in the binary whatever switch you pass. In general, use -xAVX to compile for AVX and -xCORE_AVX2 to generate code for processors that support AVX2. If you have functions that need to be dispatched according to the processor the application runs on, use the manual dispatch procedures I mentioned earlier.
And yes: if "cat /proc/cpuinfo" shows only avx, the system supports only AVX. If it shows "avx2", it supports AVX2. A Sandy Bridge (2nd gen) system supports only AVX, while a Haswell (4th gen) system supports AVX2. I hope this makes it clear.
I have encountered the same problem. My cpuinfo is listed below. How can I set the compiler flags for this CPU?
processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 23
model name : Intel(R) Xeon(R) CPU E5450 @ 3.00GHz
stepping : 6
microcode : 0x60f
cpu MHz : 2992.497
cache size : 6144 KB
physical id : 0
siblings : 4
core id : 0
cpu cores : 4
apicid : 0
initial apicid : 0
fpu : yes
fpu_exception : yes
cpuid level : 10
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ht tm pbe syscall nx lm constant_tsc arch_perfmon pebs bts rep_good nopl cpuid aperfmperf pni dtes64 monitor ds_cpl est tm2 ssse3 cx16 xtpr pdcm dca sse4_1 lahf_lm pti dtherm
bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit
bogomips : 5984.99
clflush size : 64
cache_alignment : 64
address sizes : 38 bits physical, 48 bits virtual
power management:
The _mm256_maskload_epi32 intrinsic is for AVX2, whereas _mm256_sub_ps is an intrinsic for AVX. So, if you use _mm256_maskload_epi32, make sure your system supports AVX2.
grep avx2 /proc/cpuinfo prints the flags line if your system supports AVX2 and prints nothing if it does not. The flag names are lowercase, so grep AVX2 without -i matches nothing.
My CPU does not support avx. It only supports:
"fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ht tm pbe syscall nx lm constant_tsc arch_perfmon pebs bts rep_good nopl cpuid aperfmperf pni dtes64 monitor ds_cpl est tm2 ssse3 cx16 xtpr pdcm dca sse4_1 lahf_lm pti dtherm"
If I really want to use the Intel compiler 2021, what flag should I set to generate instructions that work on this CPU? My computer is "model name : Intel(R) Xeon(R) CPU E5450 @ 3.00GHz".
The error message when I run the program is:
"forrtl: severe (168): Program Exception - illegal instruction
Image PC Routine Line Source
libifcoremt.so.5 00007FA125F7462C for__signal_handl Unknown Unknow
libmpi.so.12.0.0 00007FA1200E439A impi_malloc Unknown Unknown
libmpi.so.12.0.0 00007FA12017C1DD Unknown Unknown Unknown
libmpi.so.12.0.0 00007FA12017B84B MPI_Init Unknown Unknown
libmpifort.so.12. 00007FA11FA58D9B MPI_INIT Unknown Unknown
abinit 0000000002A82936 m_xmpi_mp_xmpi_in 722 m_xmpi.F90
abinit 000000000040A865 MAIN__ 202 abinit.F90
abinit 000000000040A3F2 Unknown Unknown Unknown
libc-2.28.so 00007FA11CAF37B3 __libc_start_main Unknown Unknown
abinit 000000000040A2FE Unknown Unknown Unknown
I have tried -xHost when compiling the program, but it still seems to contain illegal instructions.
I am using abinit-9.4.1. I have no way of telling whether the code uses a particular intrinsic function. I was wondering whether there is a command-line flag that lets me avoid such functions. PS: with the GNU compilers, the code works.
What Intel compiler version are you using?
I am not sure what happened. With the -xHost option, the compiler generates code only for the highest instruction set available on the compilation host, so the program should run when you execute it on that same host.
Can you provide the complete set of options you use?
Sorry for the late response.
Yes, the problem is resolved. As Kittur said, on one machine with the avx flag in cpuinfo, certain instructions don't work. I tried another machine with the same avx flag (no avx2 flag) and it works.
So I guess, the first machine is SNB and the second is HSW? Honestly, I can't tell.
@zlw: Thanks for the confirmation. If cpuinfo on the first system shows avx, the processor supports AVX (and is generally an SNB system). If the system shows avx2, it's a Haswell system. So the instruction you're trying works only on a Haswell system and fails on a Sandy Bridge system (where cpuinfo shows only avx), FYI.
I suppose that if your Linux isn't up to date, it may not show avx2 in the flags on a Haswell CPU. You could look up your CPU at ark.intel.com. If the OS supports avx, then avx2 will work when the CPU supports it.