I used icpc to compile an AVX program that uses intrinsics such as _mm256_maskload_epi32.
The program compiles and links fine, but when I run it I get an "Illegal instruction" error.
This is on a Linux machine, "SLES11SP2-2 Revision 0 ia32e".
I do see "avx" in "/proc/cpuinfo".
If I only use intrinsics like _mm256_sub_ps, it runs fine.
Please help. If it is a hardware limitation, how can I check?
If you had -mavx set when compiling, you should not see this promoted to AVX2. You might check the asm file generated by -S.
Your /proc/cpuinfo report seems to indicate avx should be OK.
Thank you for your suggestion.
I have tried with "-mavx" and also with "-march=core-avx2"; both give the same output: Illegal instruction
It seems this only happens with certain AVX intrinsics. For example, _mm256_sub_ps is fine, but _mm256_fmaddsub_pd is not.
What am I missing here?
The "_mm256_maskload_epi32" intrinsic is an AVX2 intrinsic. When you include it in your code, the binary will only work on a system that supports the AVX2 instruction set, for example a Haswell (HSW) system. You can generate the asm file using the -S option and check that the equivalent instruction is "vpmaskmovd", operating on the ymm registers. On the other hand, the intrinsic "_mm256_sub_ps" will work on a system that supports AVX (like Sandy Bridge, SNB), and you'll find its equivalent instruction in the asm file as "vsubps".
So, if your code has AVX intrinsics, you'll need to compile with -xAVX, and if it has any AVX2 intrinsics, compile with the -xCORE_AVX2 switch. Of course, you'll need to run the binary on a processor that supports the intrinsics you use in your code. The https://software.intel.com/sites/landingpage/IntrinsicsGuide/ guide gives the list of supported intrinsics, including AVX and AVX2.
If your code doesn't use any intrinsics per se, then the -xHOST switch will use the highest SIMD instruction set available on the system you compile on, and the generated asm will reflect what that system supports.
BTW, you can use the manual dispatch procedures to dispatch a particular routine to the processor of your choice. The article https://software.intel.com/en-us/articles/how-to-manually-target-2nd-generation-intel-core-processor... should give some details on its usage, so you can target sections of code to specific processors, which might be useful.
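To illustrate the general idea of dispatching by processor, here is a minimal sketch. It is not the Intel-specific procedure from the article above: it assumes GCC or clang, whose __builtin_cpu_supports builtin and target attribute are used here, and the names add_i32, add_i32_avx2 and add_i32_scalar are made up for this example. The AVX2 routine is only called after a runtime check, so the same binary should run on an AVX-only machine without hitting "Illegal instruction".

```c
#include <immintrin.h>
#include <stddef.h>
#include <stdint.h>

/* Portable fallback: plain C, runs on any x86 CPU. */
static void add_i32_scalar(const int32_t *a, const int32_t *b,
                           int32_t *out, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        out[i] = a[i] + b[i];
}

/* AVX2 path. The target attribute (a GCC/clang extension, assumed here)
 * enables AVX2 code generation for this one function only, so the rest
 * of the file can be built without any AVX2 compiler switch. */
__attribute__((target("avx2")))
static void add_i32_avx2(const int32_t *a, const int32_t *b,
                         int32_t *out, size_t n)
{
    size_t i = 0;
    /* Process 8 x 32-bit integers per iteration (vpaddd on ymm registers). */
    for (; i + 8 <= n; i += 8) {
        __m256i va = _mm256_loadu_si256((const __m256i *)(a + i));
        __m256i vb = _mm256_loadu_si256((const __m256i *)(b + i));
        _mm256_storeu_si256((__m256i *)(out + i), _mm256_add_epi32(va, vb));
    }
    /* Handle the remaining 0..7 elements one at a time. */
    for (; i < n; ++i)
        out[i] = a[i] + b[i];
}

/* Dispatcher: pick the AVX2 routine only if the running CPU reports AVX2. */
void add_i32(const int32_t *a, const int32_t *b, int32_t *out, size_t n)
{
    if (__builtin_cpu_supports("avx2"))
        add_i32_avx2(a, b, out, n);
    else
        add_i32_scalar(a, b, out, n);
}
```

Callers only ever use add_i32; which path runs is decided on the machine the binary executes on, not on the machine it was compiled on.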
I tried with -xHOST and got the same error message.
So if I only see avx in the /proc/cpuinfo, without avx2, does it mean my system doesn't support avx2?
As I noted earlier, the _mm256_maskload_epi32() function is only provided by the AVX2 instruction set and not by AVX. Since it is an intrinsic function, the compiler will generate the corresponding asm for it, which is an AVX2 instruction. Hence the binary will fail with "Illegal instruction" if you run it on a system that only supports AVX and not AVX2.
That said, the -xHOST switch tells the compiler to use the highest SIMD instruction set available on the system you build the application on. But since you're explicitly using an intrinsic, the compiler has to emit its equivalent instruction regardless of that switch, in this case the AVX2 instruction "vpmaskmovd", as you can see in the asm file. In general, you should use the -xAVX switch to compile for AVX and -xCORE_AVX2 to generate code for processors supporting AVX2. And if you have functions that need to be dispatched according to the processor the application runs on, then you need to use the manual dispatch procedures that I noted earlier.
And yes, if "cat /proc/cpuinfo" shows only avx, then the system only supports AVX. If it shows "avx2", then it supports AVX2. So, a Sandy Bridge (2nd gen) system only supports AVX, while a Haswell (4th gen) system supports AVX2. Hope this makes it clear?
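If you want to do the /proc/cpuinfo check from code rather than by eye, something like the following sketch might help. The helper names has_cpu_flag and cpuinfo_has_flag are made up for this example, and it assumes the usual Linux layout where the line starting with "flags" holds space-separated feature names. The point it shows is that the flag has to be matched as a whole word, because "avx" is also a prefix of "avx2".

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Return true if `flag` appears in `line` as a whole token, i.e. bounded
 * by whitespace, a ':' separator, or the start/end of the string.
 * A plain substring search would wrongly find "avx" inside "avx2". */
bool has_cpu_flag(const char *line, const char *flag)
{
    size_t len = strlen(flag);
    if (len == 0)
        return false;

    const char *p = line;
    while ((p = strstr(p, flag)) != NULL) {
        bool starts = (p == line) || p[-1] == ' ' || p[-1] == '\t' || p[-1] == ':';
        char end = p[len];
        bool ends = end == '\0' || end == ' ' || end == '\t' || end == '\n';
        if (starts && ends)
            return true;
        p += 1; /* keep searching after this partial match */
    }
    return false;
}

/* Check the first "flags" line of /proc/cpuinfo (Linux only).
 * Assumption: the line fits in the buffer; a longer line would be split
 * by fgets and a flag near the cut could be missed. */
bool cpuinfo_has_flag(const char *flag)
{
    FILE *f = fopen("/proc/cpuinfo", "r");
    if (!f)
        return false;

    char line[16384];
    bool found = false;
    while (fgets(line, sizeof line, f)) {
        if (strncmp(line, "flags", 5) == 0) {
            found = has_cpu_flag(line, flag);
            break;
        }
    }
    fclose(f);
    return found;
}
```

On a machine like the one described in this thread, cpuinfo_has_flag("avx") would be expected to return true and cpuinfo_has_flag("avx2") false.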
Sorry for the late response.
Yes, the problem is resolved. As Kittur said, I tried on one machine with the avx flag in cpuinfo, and certain instructions don't work. I got another machine with the same avx flag (no avx2 flag) and there it works.
So I guess, the first machine is SNB and the second is HSW? Honestly, I can't tell.
@zlw: Thanks for the confirmation. Well, if the processor on the first system indicates avx, then it supports AVX (and is generally an SNB system). If the system indicates avx2, then it's a Haswell system. So the instruction you're trying can only work on a Haswell system and will fail to run on a Sandy Bridge system (where cpuinfo will only show avx), fyi.
I suppose if your Linux isn't up to date, it may not show avx2 in the flags on a Haswell CPU. You could look up your CPU at ark.intel.com. If the OS supports AVX, then AVX2 will work whenever the CPU supports it.
@zlw: Np, thanks for your reply. BTW, Tim's response answers your question. If there's no AVX2 support and your code has an AVX2 instruction, it should fail! Unless you have used a manual dispatch procedure to dispatch the same routine for two different CPUs, which I don't think is the case based on what you mention.