Hello Experts, I have a well-established code which essentially multiplies complex matrices. Compiling this code with icpc, I find that at best it runs two times slower than the version compiled with gcc 4.3. I have tried Intel 10.1, 11.0 and 11.1 and many optimizations along the lines of -O3 -Oi -fno_alias -ftz -funroll-all-qloops -ipo -parallel -rcd -fp fast with little improvement.
To shed some light on this, I profiled the executables with gprof. For g++ it returns
You see that the most time-consuming function is the multiplication of two complex values, which seems plausible to me, since my code does little else, as I said. In general, the algebraic functions rank at the top, as I would expect.
You see that complex multiplication only comes in third. The complex destructor and another function have moved up and seem to consume a lot of time!
Now I have two issues: 1) Do you see where the problem could lie? Can you give me hints on where to look to improve the icpc performance, whether it be an alteration of the code or a compile flag I have not thought of?
2) "c++filt _ZN5cmplxIfEC9ERKdS2_" does not work for me. Can c++filt decipher the intel name mangling and if not, can you tell me another way how to do so?
Thank you so much for your help in advance! If need be, I am eager to provide more information.
Complex multiplication would need to be expanded inline for satisfactory performance, which also enables vectorization. icpc tends to reserve optimization of complex matrix multiplication for the options -O3 -xSSE3, with restrict qualifiers where appropriate, and a preference for C99 complex. When no information on data extents is provided, it ought to optimize for something on the order of 50x50; #pragma loop count min(10) avg(20) max(30), for example, might optimize a designated for() loop for smaller sizes. Since MKL BLAS is provided, you would prefer that as a quick route to high performance over elegance, if the data extents are moderate to large. You appear to be trying random combinations of Linux and Windows compiler options.
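To make the suggestions above concrete, here is a minimal sketch, not your actual code. It assumes square, row-major matrices of std::complex<float>; the names cfloat, cmul and cmatmul are made up for illustration. __restrict is a compiler extension that both g++ and icpc accept, and the loop count pragma is copied from the text above and guarded so that other compilers never see it.

```cpp
#include <complex>
#include <cstddef>

typedef std::complex<float> cfloat;  // hypothetical alias for this sketch

// Hand-expanded complex product: four multiplies, one subtract, one add.
// Declared inline so the compiler can expand it inside the inner loop.
inline cfloat cmul(const cfloat& a, const cfloat& b) {
    return cfloat(a.real() * b.real() - a.imag() * b.imag(),
                  a.real() * b.imag() + a.imag() * b.real());
}

// C = A * B for n x n row-major matrices.
// __restrict promises the compiler that a, b and c do not overlap.
void cmatmul(const cfloat* __restrict a,
             const cfloat* __restrict b,
             cfloat* __restrict c,
             std::size_t n) {
    for (std::size_t i = 0; i < n * n; ++i) {
        c[i] = cfloat(0.0f, 0.0f);
    }
    // i-k-j order keeps the innermost loop at unit stride in both b and c.
    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t k = 0; k < n; ++k) {
            const cfloat aik = a[i * n + k];
#if defined(__INTEL_COMPILER)
// Trip-count hint as quoted above; the values are examples only.
#pragma loop count min(10) avg(20) max(30)
#endif
            for (std::size_t j = 0; j < n; ++j) {
                c[i * n + j] += cmul(aik, b[k * n + j]);
            }
        }
    }
}
```

Calling cmatmul(a, b, c, n) with three non-overlapping arrays of n*n elements fills c with the product. Whether the inner loop actually vectorizes depends on the compiler version and options, so it is worth checking the vectorization report rather than assuming it does.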