I'm currently fighting hard with the newest version of ICL. First, the profile-generated code crashes inside Intel's own code (https://software.intel.com/en-us/forums/intel-c-compiler/topic/760787#comment-1920417), and now this:
- If I compile with /fp:fast /QxAVX2, everything is fast, but the executable is huge and I cannot make it smaller using a profile-based build (see the other post). It also runs only on AVX2 CPUs.
- If I compile with /fp:fast /QxSSE2 /QaxAVX, everything is fast, less but still
- If I compile with /fp:fast /QxSSE2 /QaxAVX,CORE-AVX2, it's superfast, actually faster than /QxAVX2 :), which is weird in itself. And it doesn't work: some calculations just produce nonsense, and the code base is so huge that I cannot really post a "minimum example" or anything.
- If I compile with /fp:precise /QxSSE2 /QaxAVX,CORE-AVX2, it gets superslow and huge, but works :).
It's pretty obvious that some optimization is breaking things. And since adding an alternative AVX2 code path is faster than compiling the whole thing directly for AVX2 (albeit not working correctly), something is not working as it should. For the record, it's audio processing and contains lots of vectorizable loops for cross-multiplication of buffers etc.
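To illustrate the kind of loop involved, here is a minimal sketch, not the actual code, of a buffer cross-multiplication summed in two different orders. The function names are hypothetical. The second function only emulates by hand the eight-partial-sum order that an 8-lane vectorized loop might use under /fp:fast; it is an assumption about what the vectorizer does, not something taken from the generated code.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Reference order: multiply element-wise and accumulate strictly left to right.
float dot_sequential(const std::vector<float>& a, const std::vector<float>& b) {
    assert(a.size() == b.size());
    float sum = 0.0f;
    for (std::size_t i = 0; i < a.size(); ++i) {
        sum += a[i] * b[i];
    }
    return sum;
}

// Hand-written emulation of an 8-lane reduction: eight partial sums that are
// combined at the end, followed by a scalar loop for the leftover elements.
float dot_eight_lanes(const std::vector<float>& a, const std::vector<float>& b) {
    assert(a.size() == b.size());
    float lanes[8] = {};
    std::size_t i = 0;
    for (; i + 8 <= a.size(); i += 8) {
        for (std::size_t k = 0; k < 8; ++k) {
            lanes[k] += a[i + k] * b[i + k];
        }
    }
    float sum = 0.0f;
    for (float lane : lanes) {
        sum += lane;
    }
    for (; i < a.size(); ++i) {
        sum += a[i] * b[i];
    }
    return sum;
}

// Relative difference between two results; 0 when both are exactly zero.
float relative_difference(float x, float y) {
    const float scale = std::fmax(std::fabs(x), std::fabs(y));
    return scale == 0.0f ? 0.0f : std::fabs(x - y) / scale;
}
```

Reordering like this is allowed to change the last few bits of a float result, so a tiny relative difference between the two functions would be expected. Differences far beyond that, like the nonsense values described above, would point at something other than plain reassociation.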