Intel® C++ Compiler
Community support and assistance for creating C++ code that runs on platforms based on Intel® processors.

Is pointer aliasing a problem if the pointers are the same?

meldaproduction
Beginner
581 Views

Hi,

Consider this function intended for vectorization:

void AddSqr(float* restrict dst, float* restrict src, int cnt)
{
    for (int i=0; i<cnt; i++) dst[i] = src[i] * src[i];
}

This would work if the src & dst are not aliased of course. But what if src == dst? Extreme cases such as src == dst+1 are not allowed of course. But if the pointers are the same, there shouldn't be a problem, or am I missing something?

10 Replies
jimdempseyatthecove
Honored Contributor III

Insert an assert

assert(((src >= dst) || (src + cnt <= dst)));

or a conditional test that branches to non-vectorized code.

Jim Dempsey
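
A minimal sketch of the "conditional test" idea, assuming a compiler that accepts the __restrict extension (GCC, Clang, MSVC and the Intel compiler do; it is not standard C++). The helper names AddSqrNoAlias, AddSqrMayAlias and Disjoint are hypothetical, and the overlap test here is stricter than the assert above: it takes the restrict path only when the two ranges share no element at all.

```cpp
#include <cassert>
#include <functional>

// Fast path: the caller guarantees dst and src do not overlap at all.
// __restrict is a compiler extension, not standard C++.
static void AddSqrNoAlias(float* __restrict dst, const float* __restrict src, int cnt)
{
    for (int i = 0; i < cnt; i++) dst[i] = src[i] * src[i];
}

// Fallback: no aliasing promise, so the compiler must keep the loop
// correct for any overlap (it may still vectorize if it can prove safety).
static void AddSqrMayAlias(float* dst, const float* src, int cnt)
{
    for (int i = 0; i < cnt; i++) dst[i] = src[i] * src[i];
}

// True when [a, a+cnt) and [b, b+cnt) share no element.
// std::less gives a well-defined order even for unrelated pointers.
static bool Disjoint(const float* a, const float* b, int cnt)
{
    std::less<const float*> before;
    return !before(a, b + cnt) || !before(b, a + cnt);
}

// Dispatcher: take the restrict path only when the promise actually holds.
void AddSqr(float* dst, const float* src, int cnt)
{
    if (Disjoint(dst, src, cnt)) AddSqrNoAlias(dst, src, cnt);
    else                         AddSqrMayAlias(dst, src, cnt);
}
```

Calling AddSqr(buf, buf, n) then goes through the fallback, which stays correct whether or not the compiler chooses to vectorize it.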

meldaproduction
Beginner

I don't follow your answer really.

My point with this question is that I don't see a way how any kind of vectorization could go wrong: Since every dst value is dependent on the single src value at either completely different (without any aliasing) or EXACTLY the same address, when dst is changed, the src value will never be needed anymore, because the fact that it has been written means that the output has been calculated. The only case would be if the compiler used the dst itself as temporary buffer, which I don't think is even correct.

That said, I WANT this to be vectored even if the arrays are the same.
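
One cautious way to get that, sketched under assumptions: restrict formally promises that an object modified through one restrict-qualified pointer is accessed only through that pointer, so passing the same array as both dst and src sits outside the promise even when the generated code happens to work. A dedicated in-place function with a single pointer avoids the question entirely. The names SqrInPlace and SqrCopy are hypothetical, and __restrict is a compiler extension rather than standard C++.

```cpp
#include <cassert>

// Hypothetical in-place variant: only one pointer is involved, so there is
// no second pointer to alias with and no restrict promise is needed.
void SqrInPlace(float* data, int cnt)
{
    for (int i = 0; i < cnt; i++) data[i] = data[i] * data[i];
}

// Out-of-place variant for buffers the caller knows are fully separate.
void SqrCopy(float* __restrict dst, const float* __restrict src, int cnt)
{
    for (int i = 0; i < cnt; i++) dst[i] = src[i] * src[i];
}
```

Each element depends only on itself, so the in-place loop carries no dependence between iterations and is a straightforward candidate for vectorization.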

jimdempseyatthecove
Honored Contributor III

By adding the restrict keyword, you made a contract with the compiler: within the code that uses the restrict-qualified pointer, no dereference of that pointer (* or [index]) will introduce an aliasing issue or side effect that affects vectorization.

For the loop in your first post, that contract can be fulfilled with the pointers equal to each other.

For different loops, this may not necessarily be true. Example:

for (int i=0; i<cnt; i++) dst[i] = (src[i] * src[i] + 0.5 * (src[i-1] * src[i-1]) + 0.5 * (src[i+1] * src[i+1])) / 2.0;

The above loop may or may not have vectorization issues (if the loop were unrolled at least twice and prefetches occur before store then there would be no alias issue with src==dst).

Jim Dempsey 
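
A small sketch of why a neighbour-reading loop like this is sensitive to src == dst. It is written without restrict so that the in-place call is well defined, and, as an assumption of this sketch, it only processes interior elements (i from 1 to cnt-2) so that src[i-1] and src[i+1] stay in bounds. The name SmoothSqr is hypothetical.

```cpp
#include <cassert>

// Neighbour-weighted square over the interior elements only.
// No restrict: dst and src are allowed to be the same array.
void SmoothSqr(float* dst, const float* src, int cnt)
{
    for (int i = 1; i < cnt - 1; i++)
        dst[i] = (src[i] * src[i]
                  + 0.5f * (src[i - 1] * src[i - 1])
                  + 0.5f * (src[i + 1] * src[i + 1])) / 2.0f;
}
```

With input {1, 2, 3, 4}, a separate output buffer gets 9.5 at index 2, while an in-place call gets 13.5625 there, because src[1] has already been overwritten with 4.5 by the time iteration 2 reads it. A vectorized version that loads several elements before storing would see the old values instead, which is exactly the kind of difference the restrict promise hides from the compiler.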

Kittur_G_Intel
Employee

Although Jim already answered your question, here's a little more elaboration. The compiler checks for dependencies (anti, flow, etc.) and checks whether the pointers may refer to the same memory locations. If it can't exclude the possibility of such dependencies, it will not vectorize unless the user helps with hints, such as the restrict keyword, which asserts that the memory referenced by a pointer is not aliased or accessed in any other way. When the restrict keyword is used, the compiler also does not do any runtime checks for aliasing. Thus, you need to know the context of the pointers when using the restrict keyword.

BTW, here are two good links on vectorization that should throw some light on the above as well:

https://software.intel.com/en-us/articles/a-guide-to-auto-vectorization-with-intel-c-compilers/
https://software.intel.com/en-us/articles/requirements-for-vectorizable-loops/

_Kittur

meldaproduction
Beginner

Thank you guys. One more thing that caught my eye: at https://software.intel.com/en-us/articles/requirements-for-vectorizable-loops/ there is a note that the vectorized code might not be supported by non-Intel CPUs. Does that mean it may not work on AMD processors?

TimP
Honored Contributor III

This legalese from the FTC settlement means in practice that you must choose an arch option compatible with your intended target CPU. For example, the default SSE2 works for all Athlon and Turion 2 and later; SSE3 works for all CPUs of the last decade.

meldaproduction
Beginner

Thank you. So I assume it will work. It may not be the "best performance", but that's ok. I just needed to verify that it won't crash on illegal instructions or anything, because that "note" is kind of ambiguous.

Kittur_G_Intel
Employee

Yes, that's correct. As Tim mentioned, the note is there to comply with the FTC settlement, and the code should work on any compatible processor.

_Kittur

Bernard
Valued Contributor I

Moreover, for code to be vectorized, it should have a consistent array index access pattern.

Kittur_G_Intel
Employee

Rightly said, Iliyapolak. A consistent access pattern prevents non-unit-stride access, where vectorization may be possible but the data operations become too expensive; in such a case the vectorizer might report: "Loop was not vectorized: vectorization possible but seems inefficient". Code transformations like loop interchange can often avoid non-unit-stride access when the access is linear. The compiler does that automatically in most cases; in some cases it has to be done manually. Additionally, it's good to ensure the data is aligned, to prevent expensive splits of unaligned memory operations.

_Kittur 
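
A minimal sketch of a manual loop interchange, assuming a row-major matrix stored in a flat array. Both functions are hypothetical illustrations: the first walks down each column in its inner loop (stride of cols elements), the second swaps the loop order so the inner loop walks along a row (stride of 1).

```cpp
#include <cassert>

// Column sums of a row-major rows x cols matrix.
// Inner loop steps through memory with a stride of cols elements.
void ColSumsStrided(const float* m, int rows, int cols, float* sums)
{
    for (int c = 0; c < cols; c++) {
        float s = 0.0f;
        for (int r = 0; r < rows; r++) s += m[r * cols + c];
        sums[c] = s;
    }
}

// Same sums after interchanging the loops.
// Inner loop now touches consecutive elements (unit stride).
void ColSumsUnitStride(const float* m, int rows, int cols, float* sums)
{
    for (int c = 0; c < cols; c++) sums[c] = 0.0f;
    for (int r = 0; r < rows; r++)
        for (int c = 0; c < cols; c++) sums[c] += m[r * cols + c];
}
```

Each column is still accumulated in the same row order, so the two versions produce identical results; only the memory access pattern seen by the vectorizer changes.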
