Hi,
consider this function intended for vectorization:
void AddSqr(float* restrict dst, float* restrict src, int cnt) { for (int i=0; i<cnt; i++) dst[i] = src[i] * src[i]; }
This would work if the src & dst are not aliased of course. But what if src == dst? Extreme cases such as src == dst+1 are not allowed of course. But if the pointers are the same, there shouldn't be a problem, or am I missing something?
Insert an assert
assert(((src >= dst) || (src + cnt <= dst)));
or use a conditional test to branch to non-vectorized code.
Jim Dempsey
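A minimal sketch of the conditional-branch variant suggested above, assuming it is acceptable to fall back to a loop without restrict whenever the two ranges share memory. The names ranges_disjoint, sqr_restrict, sqr_plain and sqr_checked are hypothetical and not from the thread; the overlap test here is stricter than the assert above, because it sends even the src == dst case to the fallback path.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: returns 1 when the two cnt-element ranges share no bytes. */
static int ranges_disjoint(const float *dst, const float *src, int cnt)
{
    uintptr_t d = (uintptr_t)dst;
    uintptr_t s = (uintptr_t)src;
    uintptr_t bytes = (uintptr_t)cnt * sizeof(float);
    return (d + bytes <= s) || (s + bytes <= d);
}

/* restrict promises the compiler that dst and src never overlap. */
static void sqr_restrict(float *restrict dst, const float *restrict src, int cnt)
{
    for (int i = 0; i < cnt; i++)
        dst[i] = src[i] * src[i];
}

/* No promise here: the compiler has to stay correct for overlapping ranges. */
static void sqr_plain(float *dst, const float *src, int cnt)
{
    for (int i = 0; i < cnt; i++)
        dst[i] = src[i] * src[i];
}

/* Takes the restrict path only when the no-overlap promise actually holds. */
static void sqr_checked(float *dst, const float *src, int cnt)
{
    assert(cnt >= 0);
    if (ranges_disjoint(dst, src, cnt))
        sqr_restrict(dst, src, cnt);
    else
        sqr_plain(dst, src, cnt);
}
```

With this dispatch, an in-place call such as sqr_checked(a, a, n) takes the plain path and still squares every element, since each dst[i] depends only on src[i]. Whether the plain path also gets vectorized is up to the compiler.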
I don't follow your answer really.
My point with this question is that I don't see how any kind of vectorization could go wrong: every dst value depends on a single src value that is either at a completely different address (no aliasing) or at EXACTLY the same address. So once a dst element has been written, the src value at that address is never needed again, because writing it means the output has already been calculated. The only problematic case would be if the compiler used dst itself as a temporary buffer, which I don't think is even correct.
That said, I WANT this to be vectored even if the arrays are the same.
By adding the restrict keyword, you made a contract with the compiler: within the code that uses the restrict-qualified pointer, no dereference of that pointer (* or [index]) will introduce an aliasing issue or side effect under vectorization.
For the loop in your first post, that contract can be fulfilled with the pointers equal to each other.
For different loops, this may not be necessarily true. Example:
for (int i=1; i<cnt-1; i++) dst[i] = (src[i] * src[i] + 0.5 * (src[i-1] * src[i-1]) + 0.5 * (src[i+1] * src[i+1])) / 2.0;
The above loop may or may not have vectorization issues (if the loop were unrolled at least twice and prefetches occur before store then there would be no alias issue with src==dst).
Jim Dempsey
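To make the difference between the two kinds of loop concrete, here is a small sketch that runs a three-point loop like the one above both out of place and in place and counts how many results differ. The names smooth_sqr and count_inplace_mismatches are hypothetical, the loop bounds 1..cnt-2 are an assumption chosen so the neighbor accesses stay inside the array, and restrict is deliberately left off so that passing the same pointer twice is legal C.

```c
#include <assert.h>
#include <string.h>

enum { MAX_CNT = 64 };

/* Three-point loop: each output reads src[i-1], src[i] and src[i+1]. */
static void smooth_sqr(float *dst, const float *src, int cnt)
{
    for (int i = 1; i < cnt - 1; i++)
        dst[i] = (src[i] * src[i]
                  + 0.5f * (src[i - 1] * src[i - 1])
                  + 0.5f * (src[i + 1] * src[i + 1])) / 2.0f;
}

/* Runs the loop out of place and in place on copies of input and
   returns the number of elements where the two results differ. */
static int count_inplace_mismatches(const float *input, int cnt)
{
    float out[MAX_CNT];
    float inplace[MAX_CNT];
    int diff = 0;

    assert(cnt >= 0 && cnt <= MAX_CNT);
    memcpy(out, input, (size_t)cnt * sizeof(float));
    memcpy(inplace, input, (size_t)cnt * sizeof(float));

    smooth_sqr(out, input, cnt);       /* separate source and destination */
    smooth_sqr(inplace, inplace, cnt); /* src == dst */

    for (int i = 0; i < cnt; i++)
        if (out[i] != inplace[i])
            diff++;
    return diff;
}
```

In the in-place run, iteration i reads the already overwritten element i-1, so the scalar result itself depends on the order of loads and stores. That is the kind of dependence a restrict promise would hide from the compiler, unlike the element-wise square, where src == dst gives the same values either way.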
Although Jim already answered your question, here's a little more elaboration. The compiler checks for dependencies (anti, flow, etc.) and whether the pointers may refer to the same memory locations. If it can't exclude the possibility of such dependencies, it will not vectorize, unless the user helps it with hints such as the restrict keyword, which asserts that the memory referenced by a pointer is not aliased or accessed in any other way. When the restrict keyword is used, the compiler will not do any runtime checks for aliasing either. Thus, you need to know the context of the pointers when using the restrict keyword.
BTW, here's a good link on vectorization that should throw some light on the above as well:
https://software.intel.com/en-us/articles/a-guide-to-auto-vectorization-with-intel-c-compilers/
https://software.intel.com/en-us/articles/requirements-for-vectorizable-loops/
_Kittur
Thank you guys. One more thing that caught my eye: at https://software.intel.com/en-us/articles/requirements-for-vectorizable-loops/ there is a note that the vectorized code might not be supported by non-Intel CPUs. Does it mean that it may not work on AMD processors?
This legalese from the FTC settlement means in practice that you must choose an arch option compatible with your intended target CPU. For example, the default SSE2 works for all Athlon and Turion 2 and later, and SSE3 works for all CPUs of the last decade.
Thank you. So I assume it will work. It may not be the "best performance", but that's ok. I just needed to verify that it won't crash on illegal instructions or anything, because that "note" is kind of ambiguous.
Yes, that's correct. As Tim mentioned, the note is there to comply with the FTC regulation, and the code should work with any compatible processor.
_Kittur
Moreover, for code to be vectorized, it should have a consistent array index access pattern.
Rightly said, Iliyapolak. A consistent pattern prevents non-unit-stride access, where vectorization may be possible but the data operations become too expensive; in such a case the vectorizer might report: "Loop was not vectorized: vectorization possible but seems inefficient". Code transformations like loop interchange can often avoid non-unit-stride access when the access is linear; the compiler does that automatically in most cases, but in some cases it has to be done manually. Additionally, it's good to ensure the data is aligned, to prevent expensive splits of unaligned memory operations.
_Kittur
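As an illustration of the loop interchange mentioned above, here is a minimal sketch that computes column sums of a row-major matrix twice: once with the inner loop striding down a column, and once with the loops interchanged so the inner loop has unit stride. The function names and the flat row-major layout are assumptions made for the example; whether a given compiler vectorizes either version, or interchanges the first one by itself, is not something this sketch can show.

```c
#include <assert.h>

/* Inner loop walks down one column: consecutive iterations touch
   elements that are cols floats apart (non-unit stride). */
static void col_sums_strided(const float *a, int rows, int cols, float *sums)
{
    for (int j = 0; j < cols; j++) {
        sums[j] = 0.0f;
        for (int i = 0; i < rows; i++)
            sums[j] += a[i * cols + j];
    }
}

/* Loops interchanged: the inner loop walks along one row, so both
   a[i * cols + j] and sums[j] are accessed with unit stride. */
static void col_sums_unit_stride(const float *a, int rows, int cols, float *sums)
{
    assert(rows >= 0 && cols >= 0);
    for (int j = 0; j < cols; j++)
        sums[j] = 0.0f;
    for (int i = 0; i < rows; i++)
        for (int j = 0; j < cols; j++)
            sums[j] += a[i * cols + j];
}
```

Both versions add the rows into each sums[j] in the same order, so they produce the same values; only the memory access pattern of the inner loop changes.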