Compile the attached source file with '-O3' and either '-xSSE4.2' or '-xAVX'. ifort 13.0 vectorizes the k-loop but also generates an unneeded scalar version. Since the loop count is 4, that scalar version can never be executed.
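To make the trip-count argument concrete, here is a minimal sketch in C (not taken from the attached source). It assumes a vector length of 4 lanes, which matches the "four integers into a vector" observation in this post; `split_loop` and `struct loop_split` are hypothetical names used only for illustration. It counts remainder iterations only and says nothing about other reasons a compiler might keep a scalar copy, such as a runtime fallback path.

```c
#include <assert.h>

/* How a counted loop divides into full-vector iterations and leftover
   scalar (remainder) iterations for a given number of lanes. */
struct loop_split {
    int vector_iters;    /* iterations of the vectorized body */
    int remainder_iters; /* iterations left for the scalar remainder loop */
};

static struct loop_split split_loop(int trip_count, int vector_length)
{
    struct loop_split s;
    assert(trip_count >= 0 && vector_length > 0);
    s.vector_iters = trip_count / vector_length;
    s.remainder_iters = trip_count % vector_length;
    return s;
}
```

With `split_loop(4, 4)` the vectorized body runs once and the remainder count is zero, which is the case described above: a constant trip count of 4 leaves nothing for a remainder loop to do.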
By the way, the compiler seems to be too aggressive in vectorizing. It generates simulated gathers for accesses to the o array. In order to use VPSLLD, it uses three instructions to pack four integers into a vector, then another seven instructions to unpack them into four GPRs. It would have been better to use GPRs from the start and use SHL/LEA instead of VPSLLD.
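The scalar alternative suggested here can be sketched in C as follows. This is an illustrative stand-in, not the attached Fortran code: `gather4_scalar`, `TRIP`, `SHIFT`, `table`, and `idx` are hypothetical, and the shift amount of 3 is an arbitrary assumption. The point is only that each index is scaled in a general-purpose register and used directly as an address, so no pack/unpack through a vector register is needed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { TRIP = 4, SHIFT = 3 }; /* assumed trip count and power-of-two scale */

/* Sum table[idx[k] << SHIFT] over the four k iterations using only
   scalar index arithmetic. */
static double gather4_scalar(const double *table, const int32_t idx[TRIP])
{
    double sum = 0.0;
    for (int k = 0; k < TRIP; ++k) {
        assert(idx[k] >= 0);
        /* A compiler can typically implement this scaling with SHL, or
           fold it into an LEA or a scaled addressing mode. */
        size_t off = (size_t)idx[k] << SHIFT;
        sum += table[off];
    }
    return sum;
}
```

Whether a particular compiler actually emits SHL or LEA for this loop depends on its version and flags, so the generated assembly should be checked rather than assumed.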
jimdempseyatthecove wrote:
>>With your modifications, the compiler no longer vectorizes the loop. That eliminates the root cause of all raised issues. If the second k-loop is force-vectorized with '!dec$ simd', the scalar version is still generated.
From my programming perspective:
My primary concern is not whether the compiler reports vectorization, or whether vectorization is used at all.
Rather, it is that the compiler uses vectorization when it is appropriate (read: faster code). With the pick list modifications, did the code run faster than without the pick list (with and without explicit simd vectorization)?
What I am trying to teach the readers of this thread is: do not assume vectorization is always best (and do not force it when it is not appropriate), and at times help the compiler out (e.g. by incorporating the pick list).
BTW - it was a good catch to look down to the disassembly level to notice the root cause of the additional overhead. Not all posters do this. It is not as hard as it seems.
Jim Dempsey
Maybe you misunderstood. I meant that the compiler generates a useless scalar remainder loop when it decides to vectorize. This is a separate issue from whether it makes good decisions on whether to vectorize or not.
I did not test your code, but now that it prevents vectorization, I presume that '!dec$ novector' will have the same effect (or a better one, because the compiler does not need to worry about the o?k arrays). And yes, my measurements did show that disabling vectorization results in better performance. The uncertain point is, as I mentioned, the reason for the performance degradation. Whether this code is worth vectorizing cannot be tested right away because the compiler generates less-than-ideal code.
Kevin Davis (Intel) wrote:
>>Development continues to investigate and commented regarding the scalar version, writing "Can't get rid of scalar code ---- if vectorized. The array dim size LD may be zero and the scalar code is used for that fall back path."
>>I will update as I hear more and pass back any comments you may have.
The value of ld cannot affect the vectorizability of the k-loop. k only ever appears in the second subscripts of references to the o and x arrays, which has nothing to do with ld. Furthermore, if ld is indeed zero, then the i-loop must not run, i.e., n must be less than 9, otherwise all accesses to o and x will be out of bounds. In this case, no code is ever needed, including the scalar loop.
Kevin Davis (Intel) wrote:
>>Thank you for the feedback. I failed to indicate earlier that this issue was reported to Developers under the internal tracking id, DPD200237580. I added your latest feedback and will update when I learn more.
Does the issue regarding the use of vector shift vs scalar shift have a tracking id as well?
(Internal tracking id: DPD200237580)