My code runs about 40% faster when compiled with the O3 flag than with the O2 flag.
Recently, however, I have started noticing errors (non-physical results) in the code compiled with O3.
I would like to keep using O3, but I need to isolate the part of the code that causes the issue under O3.
How can I figure out which part of the code is causing the issue when using the O3 flag?
Anil, are you using the Intel compiler? If so, which version and on which OS?
From the problem description, I assume you're seeing errors in the runtime output. Is that correct?
1) Are these errors floating-point errors? If so, please refer to this article, https://software.intel.com/en-us/articles/consistency-of-floating-point-results-using-the-intel-compiler/, which covers how to maintain consistency in floating-point calculations.
4) It could be a bug in the compiler. Did you try GCC, and do you get correct results with both options? If so, the issue needs to be triaged and filed with the developers against the compiler. You can submit an issue in Premier Support at https://premier.intel.com against the product, attaching a test reproducer so that the issue can be triaged (there are internal options to narrow it down to the internal optimization routines causing it); support will then communicate with you directly through resolution. The content is secure in Premier and you can attach large test files. Alternatively, if it's a small test reproducer, you can attach it to this thread and I can take a look and file the issue with the developers. Let me know.
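As a small illustration of point 1): floating-point addition is not associative, so any optimization that is allowed to reorder a sum can change the result. The sketch below is only a guess at the kind of effect that might be involved, not a diagnosis of your code. The function names `sum_left_to_right` and `sum_right_to_left` are hypothetical, and the values in the note after it are chosen to make the effect obvious.

```cpp
// Hypothetical helpers: the same three numbers summed in two different orders.
// The volatile temporary stops the compiler from folding the arithmetic
// away at compile time.
double sum_left_to_right(double a, double b, double c) {
    volatile double t = a + b;   // (a + b) is evaluated first
    return t + c;
}

double sum_right_to_left(double a, double b, double c) {
    volatile double t = b + c;   // (b + c) is evaluated first
    return a + t;
}
```

With a = 1e20, b = -1e20 and c = 1.0, the first order returns 1.0 and the second returns 0.0, because 1.0 is smaller than the spacing between doubles near 1e20 and is lost when added to -1e20. Options such as "-fp-model precise" are meant to keep the compiler from reordering sums like this on its own.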
Kittur, I am using the Intel compiler (icpc (ICC) 15.0.0 20140723) on Arch Linux.
Yes, the errors are in the output. The code is roughly 12k lines, with interfaces to a few external libraries,
so I am finding it hard to isolate the part. I am having issues with gcc too: going from O2 to O3 produces similar results.
I am using "-fp-model precise" with icpc. Unless I can isolate the issue, it will be hard for me to come up with
a smaller code sample that triggers a similar issue.
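One possible way to narrow this down without a small reproducer, sketched under assumptions: keep the build at O3, pin one suspect function at a time to O2, and compare the outputs. GCC has a per-function `optimize` attribute, which its documentation describes as intended for debugging rather than production use. The Intel compiler documents a `#pragma intel optimization_level` for a similar purpose, but I have not verified it on 15.0, so the sketch leaves it out. `SUSPECT_AT_O2`, `kernel_reference`, `kernel_candidate` and `results_agree` are hypothetical names, and the sum-of-squares loop only stands in for a real routine.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical macro: on plain GCC, compile the tagged function at O2 even
// when the rest of the file is built with -O3. On other compilers it expands
// to nothing, so the code still builds but nothing is pinned.
#if defined(__GNUC__) && !defined(__clang__) && !defined(__INTEL_COMPILER)
#define SUSPECT_AT_O2 __attribute__((optimize("O2")))
#else
#define SUSPECT_AT_O2
#endif

// Stand-in for a suspect routine, pinned to O2 as the reference version.
SUSPECT_AT_O2 double kernel_reference(const std::vector<double>& x) {
    double s = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i) s += x[i] * x[i];
    return s;
}

// The same routine, built at whatever level the command line requests.
double kernel_candidate(const std::vector<double>& x) {
    double s = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i) s += x[i] * x[i];
    return s;
}

// True when a and b agree to within a relative tolerance.
// Any NaN input makes the comparison fail, which is what we want here.
bool results_agree(double a, double b, double rel_tol) {
    double scale = std::fmax(std::fabs(a), std::fabs(b));
    return std::fabs(a - b) <= rel_tol * scale;
}
```

If the mismatch disappears when a given function is pinned to O2, that function (or something it calls) is the place to look. A coarser version of the same idea is to build half of the source files with -O2 and the other half with -O3, then keep halving the O3 set until the bad output is tied to a single file.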
Thanks Anil, understood. Since it also happens with GCC, I suspect the issue could be related to something else as well. Another option is to download an evaluation copy of Intel(R) VTune Performance Analyzer to analyze the hotspot functions leading to the code causing the performance loss. Sure, if you are able to isolate the issue and come up with a test reproducer, that'll be great, thanks.