I am experimenting with AVX-512 on a Skylake-SP processor using the Intel C++ Compiler version 18.2.
I am curious to know the difference, if any, between these two compiler options: -xCORE-AVX512 and -xSKYLAKE-AVX512.
Are they different? If so, why would I want to choose one over the other? The documentation does not make this clear.
I suppose you might still wonder why Intel thinks these descriptions require legalese. Are you saying that the superficial interpretation doesn't work for you: that CORE-AVX512 is the minimum set of AVX512 instructions included in all AVX512 implementations, while SKYLAKE-AVX512 includes all the recommended options for that CPU family? You're welcome to benchmark your application to find out where one or the other may have the advantage. While you're about it, you might investigate whether the compiler is making the best choice between 256-bit and 512-bit instructions for your case with either option, e.g. by flipping that option as well.
The description of the relationship between CORE-AVX512 and COMMON-AVX512 looks buggy and might throw you off. If the two are the same, why not say so?
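One way to check what a given option actually enables, rather than relying on the wording of the documentation, is to look at the feature macros the compiler predefines. The sketch below assumes the usual `__AVX512F__`, `__AVX512CD__`, `__AVX512DQ__`, `__AVX512BW__` and `__AVX512VL__` macros; the function name `enabled_avx512_subsets` is made up for this example. My understanding is that COMMON-AVX512 covers only F and CD while CORE-AVX512 adds DQ, BW and VL, but treat that as something to verify with your own compiler version.

```cpp
#include <string>

// Hypothetical helper: lists the AVX-512 subsets the compiler was allowed
// to use for this translation unit, based on its predefined macros.
// Build it once per option (e.g. -xCORE-AVX512, -xSKYLAKE-AVX512,
// -xCOMMON-AVX512) and compare the returned strings.
std::string enabled_avx512_subsets() {
    std::string s;
#ifdef __AVX512F__
    s += "F ";   // Foundation
#endif
#ifdef __AVX512CD__
    s += "CD ";  // Conflict Detection
#endif
#ifdef __AVX512DQ__
    s += "DQ ";  // Doubleword and Quadword
#endif
#ifdef __AVX512BW__
    s += "BW ";  // Byte and Word
#endif
#ifdef __AVX512VL__
    s += "VL ";  // Vector Length extensions
#endif
    if (s.empty()) {
        return "none";  // no AVX-512 macro was defined for this build
    }
    s.pop_back();       // drop the trailing space
    return s;
}
```

If two options yield the same list, any remaining difference between them would have to be in tuning rather than in the instruction set.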
I stumbled upon the same question and came to the conclusion that CORE-AVX512 and SKYLAKE-AVX512 are synonyms.
I'm currently working on a platform with Intel Xeon Gold 61XX processors using the Intel 2019.4 compilers. I've experimented with the code snippets suggested in this very interesting article: https://colfaxresearch.com/skl-avx512/ . The assembly code produced using these two options is identical. But beware that even with these options the compiler won't exploit the AVX512 registers (the zmm registers) unless one also uses the -qopt-zmm-usage=high option.
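For anyone who wants to repeat the comparison, here is a minimal sketch of the kind of test involved. The kernel, the file name `saxpy.cpp` and the output file names are my own choices for illustration, not taken from the article; the compiler options are the ones discussed in this thread, and `icpc` is assumed to be the compiler driver.

```cpp
#include <cstddef>

// A simple vectorizable kernel: y[i] = a * x[i] + y[i].
// __restrict (a common compiler extension) tells the compiler that x and y
// do not overlap, which makes auto-vectorization easier.
//
// Possible way to compare the generated assembly (file names are assumptions):
//   icpc -O2 -xCORE-AVX512    -S saxpy.cpp -o core.s
//   icpc -O2 -xSKYLAKE-AVX512 -S saxpy.cpp -o skylake.s
//   icpc -O2 -xCORE-AVX512 -qopt-zmm-usage=high -S saxpy.cpp -o core_zmm.s
//   diff core.s skylake.s
//   grep -c zmm core.s core_zmm.s
void saxpy(float a, const float* __restrict x, float* __restrict y,
           std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        y[i] = a * x[i] + y[i];
    }
}
```

Counting `zmm` occurrences in the two listings is only a rough indicator, but it is enough to see whether the 512-bit registers are being used at all.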
Hope this helps.
>> But beware that even with these options the compiler won't exploit the AVX512 registers (the zmm registers) unless one also uses the -qopt-zmm-usage=high option
Thanks for that tidbit.
I wonder why this is the case?
On a different forum thread John McCalpin stated that one can disable processor features in the BIOS relating to the different SIMD instruction sets, and that this can (for some applications) improve the clock rate (Turbo change points and limits). While there may be lane change delays between 128/256/512, imho it may be more of a thermal restriction (or advantage). IOW the CPU design has a limited number of thermal sensors, and the interpolation of localized maximum temperature (away from the sensor) is better when the CPU knows what portions of the die are not used.
BTW.... It should be obvious that if (when) you are inclined to disable CPU instructions in the BIOS, you should NOT compile with optimization switches targeting the given CPU (doing so may generate code that uses the disabled instructions).
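If the same binary has to run on machines where AVX-512 may or may not be available, one defensive approach is to check at run time before taking an AVX-512 code path. The sketch below assumes a GCC- or Clang-compatible compiler on x86, where the `__builtin_cpu_supports` builtin exists; the function name `cpu_has_avx512f` is hypothetical. Whether a BIOS setting is reflected in this check depends on how the platform hides the feature, so take it as a starting point rather than a guarantee.

```cpp
// Hypothetical helper: returns true only if the running CPU reports
// AVX-512 Foundation support. On compilers or architectures without the
// builtin it conservatively returns false.
bool cpu_has_avx512f() {
#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__x86_64__) || defined(__i386__))
    // GCC/Clang builtin that queries the CPU feature flags at run time.
    return __builtin_cpu_supports("avx512f") != 0;
#else
    return false;  // assumption: treat unknown platforms as "not supported"
#endif
}
```

For this to be useful, the check and the fallback path need to live in code that was not itself compiled for AVX-512; otherwise the program may fault before the check is reached.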
Intel's compiler targets specify both the instruction set and the tuning target, while GCC allows separate specification of instruction set ("-march") and tuning target ("-mtune"). Some of the Intel targets are more specialized than others -- for example, when I tested the "-xAVX" target a few years ago, it generated AVX code that was tuned for the Sandy Bridge core. In this case, that meant using 128-bit loads instead of 256-bit loads if the compiler could not prove the loads were naturally aligned. This was not necessary on Ivy Bridge (though it did not hurt), and was actually detrimental to performance on Haswell/Broadwell and later cores.
The -xCORE-AVX512 option came first, and clearly targets the SKX processor. Creating a new -xSKYLAKE-AVX512 makes sense if the compiler team plans to change the -xCORE-AVX512 option to target future AVX512 cores.