04-08-2015 09:50 AM
We have a code base that uses C++11 features to execute what is essentially a large metaprogram. It combines classic template metaprogramming with constexpr functions to help the compiler process a huge type object (a tree) and instantiate (hopefully) very efficient code from it. There is a lot of redundancy in the tree, and if this is pointed out to the compiler, we hope it can collapse the whole thing into very efficient code.
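To make the setup concrete, here is a minimal sketch of the general technique (not our actual code): a tiny type tree processed with both template specialisation and a C++11 constexpr function. The names Leaf, Add, NodeCount, times and evaluate_tree are hypothetical and chosen only for illustration.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical tree node types, for illustration only.
template <int N> struct Leaf { static constexpr int value = N; };
template <typename L, typename R> struct Add {
    static constexpr int value = L::value + R::value;
};

// Classic template metaprogramming: count the nodes by specialisation.
template <typename T> struct NodeCount;
template <int N> struct NodeCount<Leaf<N>> {
    static constexpr std::size_t value = 1;
};
template <typename L, typename R> struct NodeCount<Add<L, R>> {
    static constexpr std::size_t value =
        1 + NodeCount<L>::value + NodeCount<R>::value;
};

// C++11 constexpr function (single return statement, recursion for loops).
constexpr int times(int x, int n) { return n == 0 ? 0 : x + times(x, n - 1); }

using Sub  = Add<Leaf<2>, Leaf<3>>;
using Tree = Add<Sub, Sub>;  // redundancy: the same subtree type appears twice

// Everything below is resolved during compilation.
static_assert(Tree::value == 10, "tree evaluates to 10");
static_assert(NodeCount<Tree>::value == 7, "three nodes per subtree plus the root");
static_assert(times(Sub::value, 2) == Tree::value, "redundant subtree reused");

// Runtime view of the same value; an optimising compiler can reduce this
// function to returning a constant.
int evaluate_tree() { return Tree::value; }

// Runtime cross-check of the compile-time result.
inline void check_at_runtime() { assert(evaluate_tree() == 10); }
```

In a sketch this small every compiler should fold the tree; the differences we see only show up once the type object becomes very large.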
We've tested the code with gcc 4.9, nvcc 7.0, clang 3.5 and icc 15.2.164. Currently only clang produces the correct optimised output, i.e. only clang follows the metaprogram "correctly" and optimises the tree fully. To get this result we had to use the following compiler flags:
-std=c++11 -ftemplate-depth-512 -fconstexpr-steps=2000000 -fconstexpr-depth=10000 -O3
In particular, the -fconstexpr-steps flag was crucial (the value given is sufficient, but not necessarily the minimum needed).
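As a rough illustration of why these limits matter, the sketch below uses a recursive C++11 constexpr function whose evaluation depth and step count grow with its argument. The function sum_to is hypothetical and not part of our code; the argument 400 is chosen so that it should stay within the usual default limits, while a much larger argument would be expected to need higher -fconstexpr-depth and -fconstexpr-steps values.

```cpp
#include <cassert>

// Hypothetical example: each call adds one level of constexpr recursion,
// so the compile-time cost grows linearly with n.
constexpr unsigned long long sum_to(unsigned n) {
    return n == 0 ? 0ULL : n + sum_to(n - 1);
}

// Forces the evaluation to happen during compilation.
static_assert(sum_to(400) == 80200ULL, "400 * 401 / 2");

// The same function is still callable at run time.
inline void check_sum_at_runtime() { assert(sum_to(10) == 55ULL); }
```

Our real metaprogram performs far more constexpr work than this, which is why the step limit had to be raised so much.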
icc 15.2.164 produces huge executables (15 MB compared with 63 KB for clang), and the resulting code runs about 3x slower than the clang build.
Now I know icc has a lot of knobs and switches, not all of which are documented. Could you suggest some flags/options we could try to coax icc into doing what we want? Current icc compiler flags are