- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
- Tags:
- Intel® Advanced Vector Extensions (Intel® AVX)
- Intel® Streaming SIMD Extensions
- Parallel Computing
Link Copied
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Probably while executing case 2 code CPU is able to exploit more efficiently Instruction Level Parallelism (ILP).
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
iliyapolak wrote:
Probably while executing case 2 code CPU is able to exploit more efficiently Instruction Level Parallelism (ILP).
Case 2 could yield better ILP overall, but this doesn't explain why it is much faster than case 1 (13.7s vs 24.0s) while there is more work to do.
Patrick, why are there more branches in case 2 than in case 1 while a "je" instruction has been commented out? It seems that the code flow is not exactly the same, and this could have an influence, IMHO.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
>>>but this doesn't explain why it is much faster than case 1 (13.7s vs 24.0s) while there is more work to do.>>>
Thanks for correction because I did not pay an attention to those 29 instructions.
I would try to run VTune on those two versions of the code in order get more comprehensive CPU metrics. Running aferomentioned code under debugger should be also done in order to see which code path is executed where code is compiled (case 2).
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Vincent Lefevre wrote:
Patrick, why are there more branches in case 2 than in case 1 while a "je" instruction has been commented out? It seems that the code flow is not exactly the same, and this could have an influence, IMHO.
From a private reply by Patrick, his asm excerpt was incorrect, indeed yielding different code flow in case 1 and case 2, explaining the obtained timings.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Printer Friendly Page