iliyapolak wrote:
While executing the case 2 code, the CPU is probably able to exploit instruction-level parallelism (ILP) more efficiently.
Case 2 could yield better ILP overall, but this doesn't explain why it is much faster than case 1 (13.7s vs 24.0s) while there is more work to do.
Patrick, why are there more branches in case 2 than in case 1 while a "je" instruction has been commented out? It seems that the code flow is not exactly the same, and this could have an influence, IMHO.
>>>but this doesn't explain why it is much faster than case 1 (13.7s vs 24.0s) while there is more work to do.>>>
Thanks for the correction; I did not pay attention to those 29 instructions.
I would try running VTune on both versions of the code in order to get more comprehensive CPU metrics. The aforementioned code should also be run under a debugger to see which code path is executed in the version where the code is compiled (case 2).
Vincent Lefevre wrote:
Patrick, why are there more branches in case 2 than in case 1 while a "je" instruction has been commented out? It seems that the code flow is not exactly the same, and this could have an influence, IMHO.
According to a private reply from Patrick, his asm excerpt was incorrect: the code flow is indeed different in case 1 and case 2, which explains the observed timings.
