I compiled the same program with two different sets of compiler options:
1. icc -qopt-report -g -O2 MD.c util.c control.c coord.h -c -lm
2. icc -qopt-report -g -O2 -ipo MD.c util.c control.c coord.h -c -lm (adds "-ipo")
Then I compared every corresponding .optrpt file from the two. The result is that all the contents are the same except the second's content is
while the first's content is
So it seems that the two builds should also perform the same. But the amazing result is that the second is three times faster than the first!
So what is the reason? Can anyone help me explain it?
>>But the amazing result is that the second is three times faster than the first!
If I were to make a guess....
The first compilation was "inline everything that can be inlined".
The second compilation placed upper limits on the degree of inlining.
New programmers, when they discover that inlining helps in one case, often naively assume that inlining to the max must be better.
There are a few issues with overly aggressive inlining (and loop unrolling):
1) The level 1 instruction cache has a limited size. A loop with several calls to the same function can, once those calls are inlined, grow into a loop body that spills out of the L1 instruction cache, whereas the same loop with the calls left un-inlined can produce a loop plus function that fits within the L1 instruction cache. In this example, the non-inlined case will run faster than the inlined case.
2) Overuse of inlining can at times result in over-subscription of the available registers, which forces the compiler to spill values to memory.
You often need to be more judicious (less aggressive) in where you perform inlining and/or how/where you perform ipo.
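To make the point about selective inlining concrete, here is a minimal sketch in C. It is not taken from the original MD.c; the names `NOINLINE`, `inv_sq_inlined`, `inv_sq_called`, `sum_inlined`, and `sum_called` are hypothetical and exist only for illustration. It assumes a compiler that accepts the GCC-style `__attribute__((noinline))` (icc on Linux does, as far as I know) and falls back to no attribute otherwise. Both loops compute the same value; only the inlining decision for the helper differs, so you can time them or compare the generated code size yourself.

```c
#include <stddef.h>

/* Assumption: GCC-compatible compilers (including icc on Linux) define
   __GNUC__ and accept this attribute. Elsewhere the macro expands to nothing. */
#if defined(__GNUC__)
#define NOINLINE __attribute__((noinline))
#else
#define NOINLINE
#endif

/* Hypothetical hot helper that the compiler is free to inline. */
static inline double inv_sq_inlined(double x)
{
    return 1.0 / (x * x + 1.0);
}

/* Same computation, but the compiler is asked to keep it as a real call. */
NOINLINE double inv_sq_called(double x)
{
    return 1.0 / (x * x + 1.0);
}

/* Loop with three call sites: if all are inlined, the loop body holds
   up to three copies of the helper. */
double sum_inlined(const double *v, size_t n)
{
    double s = 0.0;
    for (size_t i = 0; i < n; ++i) {
        s += inv_sq_inlined(v[i])
           + inv_sq_inlined(v[i] + 1.0)
           + inv_sq_inlined(v[i] + 2.0);
    }
    return s;
}

/* Same loop, but the body stays small: three calls to one shared function. */
double sum_called(const double *v, size_t n)
{
    double s = 0.0;
    for (size_t i = 0; i < n; ++i) {
        s += inv_sq_called(v[i])
           + inv_sq_called(v[i] + 1.0)
           + inv_sq_called(v[i] + 2.0);
    }
    return s;
}
```

With a helper this small, inlining will almost certainly win. The trade-off described above only shows up when the helper is large and is called from many sites inside a hot loop, so treat this as a template for experiments rather than a benchmark.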
Kindly confirm whether your query is resolved; otherwise, please provide a sample test case so that we can get back to you with the actual explanation for this behavior.