Hello,
I'm optimizing the simplest possible code: a loop over two arrays that sums them into a third array.
I used:
- Tags:
- CC++
- Development Tools
- Intel® C++ Compiler
- Intel® Parallel Studio XE
- Intel® System Studio
- Optimization
- Parallel Computing
- Vectorization
As far as I can see, when I comment this out the compiler chooses an unaligned access pattern for the optimized code.
Using #pragma vector aligned does force an aligned access pattern, but it then turns off OpenMP.
According to:
https://software.intel.com/en-us/articles/data-alignment-to-assist-vectorization
https://software.intel.com/en-us/node/532796
It should work without peeling.
I just wonder whether this is a symptom of the compiler missing something or behaving unexpectedly.
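For readers following along, here is a minimal sketch of the kind of loop under discussion, with the alignment promise stated explicitly. It is not the original poster's code, which is not shown in the thread. The names `VEC_ALIGN`, `alloc_aligned_floats`, and `sum_arrays` are hypothetical, and the 32-byte boundary is an assumption chosen to match AVX. `__assume_aligned` is the Intel compiler hint; `__builtin_assume_aligned` is the GCC/Clang counterpart used as a fallback.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define VEC_ALIGN 32 /* assumption: 32-byte boundary, enough for AVX */

/* Hypothetical helper: allocate n floats on a VEC_ALIGN boundary.
   C11 aligned_alloc wants a size that is a multiple of the alignment,
   so the byte count is rounded up. Release the memory with free(). */
float *alloc_aligned_floats(size_t n)
{
    size_t bytes = (n * sizeof(float) + VEC_ALIGN - 1) / VEC_ALIGN * VEC_ALIGN;
    return aligned_alloc(VEC_ALIGN, bytes);
}

/* c[i] = a[i] + b[i]. The caller promises that all three pointers
   sit on a VEC_ALIGN boundary; the asserts check that in debug builds. */
void sum_arrays(float *restrict c, const float *restrict a,
                const float *restrict b, size_t n)
{
    assert((uintptr_t)a % VEC_ALIGN == 0);
    assert((uintptr_t)b % VEC_ALIGN == 0);
    assert((uintptr_t)c % VEC_ALIGN == 0);

#if defined(__INTEL_COMPILER)
    /* Intel compiler: state the alignment without forcing vectorization. */
    __assume_aligned(a, VEC_ALIGN);
    __assume_aligned(b, VEC_ALIGN);
    __assume_aligned(c, VEC_ALIGN);
#elif defined(__GNUC__)
    /* GCC/Clang: the hint is carried by the returned pointer. */
    a = __builtin_assume_aligned(a, VEC_ALIGN);
    b = __builtin_assume_aligned(b, VEC_ALIGN);
    c = __builtin_assume_aligned(c, VEC_ALIGN);
#endif

    for (size_t i = 0; i < n; i++)
        c[i] = a[i] + b[i];
}
```

Whether the compiler then drops the peel loop is exactly the open question in this thread; the sketch only shows how the alignment information can be handed over.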
Hi Tim,
Yes, I understood this about OpenMP.
Again, my problem is that the compiler uses peeling even though it knows the arrays are aligned.
I wonder why, and whether this is actually a bug (or just always done as a safety net).
Thank You.
Also look up the OpenMP simd clause. It can be used to force the loop partitioning (re Tim P's #4) to be split at vector boundaries.
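One possible reading of this suggestion, sketched under assumptions: the combined `parallel for simd` construct with an `aligned` clause, plus the `simd` schedule modifier from OpenMP 4.5, which rounds chunk sizes up to a multiple of the SIMD width. The function name `sum_arrays_omp` and the 32-byte figure are hypothetical, and it is a guess that this modifier is what the reply has in mind. Without an OpenMP flag the pragma is ignored and the loop runs serially.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* c[i] = a[i] + b[i], threaded and vectorized through OpenMP.
   aligned(...) promises 32-byte aligned base pointers (an assumption
   the caller must honor); schedule(simd:static) asks for per-thread
   chunks whose sizes are multiples of the SIMD width. */
void sum_arrays_omp(float *restrict c, const float *restrict a,
                    const float *restrict b, int n)
{
    assert((uintptr_t)a % 32 == 0);
    assert((uintptr_t)b % 32 == 0);
    assert((uintptr_t)c % 32 == 0);

    #pragma omp parallel for simd schedule(simd:static) aligned(a, b, c : 32)
    for (int i = 0; i < n; i++)
        c[i] = a[i] + b[i];
}
```

The `simd` schedule modifier needs a compiler that implements OpenMP 4.5 or later.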
Jim Dempsey
Guys,
I don't care about the interaction with OpenMP.
I just wanted to raise the issue of peeling when peeling isn't needed.
It might indicate that the compiler makes the wrong decision in this case, and I'd like to know why.
Maybe on Intel's side they will even find it to be a bug or a case for further optimization.
So, my questions are:
1. Did I miss something about how to tell the compiler that the arrays are aligned for AVX / SSE (besides using the pragma, which Intel itself doesn't advise)?
2. If I did not, why does the compiler make this choice? Is it on purpose (a safety net against users) or a case that is not fully optimized? In Intel's past blog posts, it seems peeling wasn't done.
Thank You.
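To make "peeling isn't needed" concrete, here is a small sketch of the arithmetic behind a peel loop: the number of scalar iterations that must run before a pointer reaches the next vector boundary. The helper name `peel_iterations` is hypothetical and the code only illustrates the calculation; it does not claim to reproduce what the Intel compiler does internally.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Number of leading elements to process one by one before p lands on a
   vec_bytes boundary. vec_bytes must be a power of two and a multiple
   of elem_bytes, and p must itself be element-aligned. */
size_t peel_iterations(const void *p, size_t elem_bytes, size_t vec_bytes)
{
    assert(vec_bytes != 0 && (vec_bytes & (vec_bytes - 1)) == 0);
    assert(elem_bytes != 0 && vec_bytes % elem_bytes == 0);

    size_t misalign = (size_t)((uintptr_t)p % vec_bytes);
    assert(misalign % elem_bytes == 0);

    /* Bytes left to the next boundary; zero when already aligned. */
    return ((vec_bytes - misalign) % vec_bytes) / elem_bytes;
}
```

For a `float` array that starts on a 32-byte boundary the result is 0, so a peel loop would have nothing to do; starting one element later gives 7.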
Maybe it would help if you show:
a) the source code, including information regarding: sources, targets, loop counts discoverable by the compiler
b) disassembly code, preferably a screenshot of the disassembly view from VTune (inclusive of all alternate paths if possible)
Jim Dempsey
You may still be misunderstanding the under-documented #pragma vector unaligned. It doesn't require the data to be misaligned. To my knowledge, it simply removes cost analysis and exception coverage (like the other vector pragmas) and also prevents peeling, in case you are serious about that. If your hardware is AVX or newer, there is no performance penalty for executing unaligned instructions on aligned data, with the possible exception of the splitting of unaligned loads and stores for Sandy Bridge and Ivy Bridge targets.
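A minimal sketch of how that pragma might be applied to the loop in question. The function name `sum_arrays_unaligned` is hypothetical, and the pragma is guarded so that other compilers simply skip it. The comment about peeling repeats the claim made in this reply and is not independently verified here.

```c
#include <assert.h>
#include <stddef.h>

/* c[i] = a[i] + b[i] with unaligned vector accesses requested.
   The pointers may be aligned or not; the result is the same. */
void sum_arrays_unaligned(float *restrict c, const float *restrict a,
                          const float *restrict b, size_t n)
{
    assert(n == 0 || (a != NULL && b != NULL && c != NULL));

#if defined(__INTEL_COMPILER)
    /* Intel-specific: use unaligned loads/stores; per the reply above,
       this also keeps the compiler from generating a peel loop. */
    #pragma vector unaligned
#endif
    for (size_t i = 0; i < n; i++)
        c[i] = a[i] + b[i];
}
```

Because no alignment promise is made, the same function can be called on aligned buffers or on pointers offset into them.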
