Given that we don't know the hardware that our code will be running on at the time of compilation, does that mean that we need to generate a set of different EXEs each optimised for such things as SIMD or AVX2?
As Viet suggested, the Intel Compiler can produce multi-CPU code. For situations like yours I have used: /QxSSE3 /QaxCORE-AVX2. With these switches your code can run on SSE3 or better CPUs, and for some regions of code the optimizer also generates an AVX2 version, which is selected when AVX2 is present at runtime (CPU dispatch).
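To make the CPU dispatch idea concrete, here is a rough hand-written sketch of the same pattern: two variants of one routine, with one picked once at runtime. This is only an illustration, not what the Intel Compiler actually emits. The names dot_baseline, dot_wide, cpu_has_avx2 and select_dot are hypothetical, the "wide" variant is plain unrolled C++ standing in for a real AVX2 version, and the feature check assumes GCC or Clang on x86 (on any other toolchain it simply falls back to the baseline).

```cpp
#include <cstddef>

// Baseline variant: stands in for the code built for the minimum target CPU.
static float dot_baseline(const float* a, const float* b, std::size_t n) {
    float s = 0.0f;
    for (std::size_t i = 0; i < n; ++i) s += a[i] * b[i];
    return s;
}

// "Wide" variant: a placeholder for the AVX2-specific version.
// Plain C++ with four partial sums; a real build would vectorize this.
static float dot_wide(const float* a, const float* b, std::size_t n) {
    float s0 = 0.0f, s1 = 0.0f, s2 = 0.0f, s3 = 0.0f;
    std::size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        s0 += a[i] * b[i];
        s1 += a[i + 1] * b[i + 1];
        s2 += a[i + 2] * b[i + 2];
        s3 += a[i + 3] * b[i + 3];
    }
    for (; i < n; ++i) s0 += a[i] * b[i];  // remainder elements
    return (s0 + s1) + (s2 + s3);
}

// Runtime feature check. Assumption: GCC/Clang builtins on x86;
// other compilers or architectures report "no AVX2" here.
static bool cpu_has_avx2() {
#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();
    return __builtin_cpu_supports("avx2") != 0;
#else
    return false;
#endif
}

using DotFn = float (*)(const float*, const float*, std::size_t);

// Pick the variant that matches the CPU we are actually running on.
static DotFn select_dot() {
    return cpu_has_avx2() ? dot_wide : dot_baseline;
}

// Public entry point: the selection happens once, on the first call.
float dot(const float* a, const float* b, std::size_t n) {
    static const DotFn impl = select_dot();
    return impl(a, b, n);
}
```

Callers only ever see dot(); the choice of variant is hidden behind it, which is roughly how a single EXE can serve both older and AVX2-capable machines.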
The problem I found with two BIG projects is that when interprocedural optimization is set to /Qipo (global), either xilink stopped with an internal error [ 1>xilink: : error #10014: problem during multi-file optimization compilation (code 4) ] or the resulting code was not stable. More information here: https://software.intel.com/en-us/forums/intel-c-compiler/topic/740119
I would like to know whether other users are having problems with /QxSSE3 /QaxCORE-AVX2 /Qipo.
For small projects I did not find errors, but I am no longer comfortable with that combination of switches.
After changing /Qipo to /Qip I found no problems, but most of the performance gain from AVX2 was gone (for us).