I believe I've found a bug with the compiler - but am willing to be proved wrong <g>.
I have a project which is set up with different object files to target different instruction sets. Each of these object files instantiates some of the same templated code with different parameters:
template <class impl_t> class Thing {
public:
    impl_t addTwo(const impl_t& a, const impl_t& b);  // body omitted here
};
... then SSEImplementation.cpp is compiled for SSE CPUs, AVXImplementation.cpp is compiled for AVX CPUs, and a dispatcher determines at runtime which one to use based on available CPU capabilities.
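For anyone unfamiliar with this layout, here is a rough sketch of what such a runtime dispatcher could look like. It is not the real project code: `addTwoSSE`, `addTwoAVX`, `cpuHasAVX` and `selectAddTwo` are hypothetical names, the two implementations are stubbed in one file so the example stands alone, and the CPU check assumes the GCC/Clang `__builtin_cpu_supports` builtin (other toolchains would need their own CPUID query).

```cpp
#include <cassert>

// Hypothetical entry points. In a real project each would live in its own
// .cpp file (e.g. SSEImplementation.cpp / AVXImplementation.cpp) built with
// different instruction-set flags; they are stubbed here to stay self-contained.
float addTwoSSE(float a, float b) { return a + b; }
float addTwoAVX(float a, float b) { return a + b; }

using AddTwoFn = float (*)(float, float);

// Assumption: GCC/Clang on x86 provide __builtin_cpu_supports.
// Anything else falls back to the baseline path.
bool cpuHasAVX()
{
#if (defined(__GNUC__) || defined(__clang__)) && (defined(__x86_64__) || defined(__i386__))
    return __builtin_cpu_supports("avx") != 0;
#else
    return false; // conservative: assume no AVX
#endif
}

// Choose an implementation from the reported capability.
AddTwoFn selectAddTwo(bool hasAVX)
{
    return hasAVX ? &addTwoAVX : &addTwoSSE;
}

// Public entry point: the choice is made once, on first call.
float addTwo(float a, float b)
{
    static const AddTwoFn impl = selectAddTwo(cpuHasAVX());
    assert(impl != nullptr);
    return impl(a, b);
}
```

The important property is that code built for AVX is only ever reached through the pointer chosen at runtime, which is exactly what breaks if a helper from the AVX object file gets linked into the SSE path.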
All has been working fine-and-dandy using MSVC++ 2015 and Xcode 6. But using the Intel compiler there's a problem.
The real files are rather more complex than the above, and instantiating the template causes quite a bit of code to be generated at compile time, including various dependent functions.
The problem seems to be that icc tries to share these between compilation units (in a reasonable attempt to keep code bloat down?) which is fine in most cases - but /not/ in cases where the supported instruction sets differ between two compilation units. So, for example, a rounding function is generated in AVXImplementation.cpp, and it uses the VROUNDSS instruction. Thanks to the compiler, this function ends up being reused in, and called from, SSEImplementation.cpp. That is fine if you run the resulting code on an AVX machine, but not if you run it on an SSE-only machine, where VROUNDSS is an illegal instruction.
It's entirely possible that at a language level the two compiler-generated functions have the same signature (arguments, return types, sizes, alignment etc.), but nevertheless the compiler shouldn't share generated functions between compilation units that are compiled for different instruction sets.
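To make the "same signature" point concrete, here is a minimal single-file sketch. The names `roundHelper`, `TaggedRounder`, `SSETag` and `AVXTag` are hypothetical and not taken from the real project. It shows that a helper template's identity depends only on its name and template arguments, so two compilation units that both instantiate `roundHelper<float>` are asking for the same entity whatever flags each was built with. Adding a per-instruction-set tag parameter is one possible way to keep them apart, assuming the template itself can be changed.

```cpp
#include <cassert>
#include <cmath>
#include <type_traits>

// Hypothetical helper standing in for the compiler-generated rounding code.
// Every compilation unit that uses roundHelper<float> refers to the same
// function, regardless of the instruction-set flags it was compiled with.
template <class T>
T roundHelper(T x) { return std::nearbyint(x); }

// Hypothetical tag types, one per instruction set.
struct SSETag {};
struct AVXTag {};

// The same helper with the tag as an extra template parameter: the SSE and
// AVX specialisations are now different entities with different symbol names.
template <class T, class IsaTag>
struct TaggedRounder
{
    static T round(T x) { return std::nearbyint(x); }
};

static_assert(!std::is_same<TaggedRounder<float, SSETag>,
                            TaggedRounder<float, AVXTag>>::value,
              "tagged specialisations are distinct types");
```

With the tag in place, SSEImplementation.cpp would use `TaggedRounder<float, SSETag>` and AVXImplementation.cpp would use `TaggedRounder<float, AVXTag>`, so there is no longer a single shared function for the toolchain to pick.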