McCalpinJohn
Black Belt

Why does compiler generate gather loops when using unsigned array offsets?

I raised this issue in another forum (https://software.intel.com/en-us/forums/software-tuning-performance-optimization-platform-monitoring/topic/603688), but I wanted to ask whether any of the compiler folks could explain why the compiler generates VGATHER-based code whenever I use an unsigned variable as an offset in an array index.

For a simple loop such as

for (int i=0; i<N; i++) {
    target[i] = scalar * source[i+offset];
}

the compiler will claim that the access is "indirect" if the "offset" variable is unsigned, but will generate reasonable vector code if the "offset" variable is signed.  The VGATHER-based code is typically slower (~1.5x) than the corresponding scalar code and 4x or more slower than the straightforward vector code (assuming both arrays are in the L1 data cache).
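For anyone who wants to reproduce the comparison, the two variants can be sketched as a pair of functions that differ only in the type of the offset parameter. The function names, the `double` element type, and the signatures are assumptions made for illustration; they are not taken from the original test code.

```c
/* Hypothetical reproducer: names, element type, and signatures are
 * illustrative assumptions, not the original poster's code. */

/* Signed offset: i + offset is evaluated as a signed int. */
void scale_signed(double *target, const double *source,
                  double scalar, int n, int offset)
{
    for (int i = 0; i < n; i++) {
        target[i] = scalar * source[i + offset];
    }
}

/* Unsigned offset: i is converted to unsigned int before the add,
 * so i + offset is evaluated in unsigned (modular) arithmetic. */
void scale_unsigned(double *target, const double *source,
                    double scalar, int n, unsigned int offset)
{
    for (int i = 0; i < n; i++) {
        target[i] = scalar * source[i + offset];
    }
}
```

Both functions produce identical results for small non-negative offsets, so any difference shows up only in the generated code. Compiling each with an optimization report enabled (for example icc's `-qopt-report` option) should show how the compiler classifies the `source` access in each case.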

Is there something about the interpretation of unsigned variables that makes the gather function necessary, or is this an idiosyncrasy of the compilers?   (I have seen it with both icc 15.0.3 and with a 2016 version.)
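One plausible factor, which nothing in this thread confirms, is the C rule for mixed signed/unsigned arithmetic: `i + offset` with an `unsigned int` offset is computed modulo 2^32 (for 32-bit `int`) and only then widened to a 64-bit address index. Unsigned wraparound is well defined, so the compiler cannot simply assume that consecutive values of `i` touch consecutive elements, whereas signed overflow is undefined and may be assumed not to happen. The sketch below illustrates the difference. All function names are hypothetical, and the hoisted-pointer variant is an untested assumption about what might help, not a verified workaround for icc.

```c
#include <stdint.h>

/* Index as the unsigned variant computes it: a 32-bit modular sum,
 * then zero-extended to 64 bits for addressing. */
uint64_t index_unsigned32(int32_t i, uint32_t offset)
{
    uint32_t sum = (uint32_t)i + offset;  /* wraps modulo 2^32 */
    return (uint64_t)sum;
}

/* Index computed in 64 bits from the start (i assumed non-negative):
 * no wraparound for any 32-bit inputs. */
uint64_t index_wide(int32_t i, uint32_t offset)
{
    return (uint64_t)i + (uint64_t)offset;
}

/* Possible rewrite (assumption, untested against icc): apply the offset
 * to the pointer once, outside the loop, so the loop index is plain i. */
void scale_hoisted(double *target, const double *source,
                   double scalar, int n, unsigned int offset)
{
    const double *base = source + offset;
    for (int i = 0; i < n; i++) {
        target[i] = scalar * base[i];
    }
}
```

With `offset = 0xFFFFFFFE`, the 32-bit form yields indices 0xFFFFFFFE, 0xFFFFFFFF, 0, 1 for `i = 0..3`, while the 64-bit form yields a contiguous run. The hoisted version only matches the original loop when no such wrap occurs, which is the case the original code presumably intends.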

3 Replies
pbkenned1
Employee

Hello John,

Thanks for reporting this.  This looks more like an unnecessary codegen limitation than a compiler idiosyncrasy, so I have reported the issue to the developers, tracking ID DPD200380099.  I'll pass along updates here.

Patrick

 

McCalpinJohn
Black Belt

Oops -- this topic got duplicated when it was moved here from a different forum...

I opened an issue for this on Premier a few days ago -- issue ID 6000146067 -- and have added several updates about the slight variations in the behavior for different compilers and different target ISAs.

pbkenned1
Employee

No worries John -- I've picked up the Premier issue.  Thanks for the updates based on compiling with -xMIC-AVX2.  I'll add those to the problem report.

Patrick
