Topic in Intel® Moderncode for Parallel Architectures

loop optimization for non-uniform access to an array

Abhishek_S_4 — Thu, 15 Feb 2018 13:37:50 GMT

Hi,

I have the following kind of loop that I am looking to optimize for Intel compiler 18.0:

SomeData* sourcePtr = GetMySourceSomeData();
SomeData* saPtr = GetMySomeDataPointer();
int size = GetSizeSomeData(saPtr);
// indexArray holds a series of indices based on some business logic; they can be considered random.
int* indexArray = GetRandomIndexArray();
SomeData zero = SomeData(0);

// The loop below needs to be optimized.
for (int point = 0; point < size; indexArray++, saPtr++, point++)
{
      *saPtr = GetDecisionMaker(point) ? sourcePtr[*indexArray] : zero;
}

// GetDecisionMaker(point) returns a boolean based on some business logic; it can be considered random.

With Intel compiler 13.0 we had good performance, but with 18.0 the performance is poor.

All help is welcome!

Thanks.


jimdempseyatthecove — Thu, 15 Feb 2018 15:42:04 GMT

Try this first:

for (int point = 0; point < size; point++)
{
      saPtr[point] = (GetDecisionMaker(point))?sourcePtr[indexArray[point]]:zero;
}

IOW, index with point rather than advancing the pointers. Advanced pointers may still be in use after the loop, so the compiler has to keep them updated on every iteration.
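A minimal self-contained sketch of the indexed loop, assuming SomeData is a struct of two ints (as stated later in the thread) and using a hypothetical GetDecisionMaker rule purely for illustration. __restrict is a compiler extension (accepted by GCC, Clang, MSVC and the Intel compilers), not ISO C++; it is added here as an assumption to tell the compiler the arrays do not overlap.

```cpp
#include <cassert>

// Hypothetical stand-in for SomeData; later in the thread it is described
// as a struct holding two ints.
struct SomeData {
    int x;
    int y;
};

// Hypothetical decision rule, standing in for GetDecisionMaker(point).
// Keeping it inline and side-effect free lets the compiler see through it.
static inline bool GetDecisionMaker(int point)
{
    return point % 3 != 0;
}

// Indexed form of the loop: only `point` changes between iterations and no
// pointer is left modified after the loop. __restrict promises no overlap.
void select_gather(SomeData* __restrict saPtr,
                   const SomeData* __restrict sourcePtr,
                   const int* __restrict indexArray,
                   int size)
{
    assert(size >= 0);  // a negative size is a caller bug
    const SomeData zero = {0, 0};
    for (int point = 0; point < size; point++) {
        saPtr[point] = GetDecisionMaker(point) ? sourcePtr[indexArray[point]] : zero;
    }
}
```

Whether this alone changes the generated code depends on the compiler; the main gain is that the loop has a single induction variable.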

Please describe GetDecisionMaker.

Be aware that newer CPU architectures have gather instructions (AVX2 and later) and scatter instructions (AVX-512). If SomeData is a "standard" type (char, short, int, float, double, ...) .AND. GetDecisionMaker is suitable for generating a mask (vectorwise), then the compiler may vectorize the loop, given appropriate compiler optimization options and/or #pragma hints.
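A sketch of the loop shape described above, assuming a "standard" element type (int64_t here) and decisions already stored as a byte mask; both choices are hypothetical. The body is a pure select between a gathered load and a constant, which is the form a vectorizer can map to a masked gather. Whether it actually does depends on the compiler, its cost model and the target flags (for example -xCORE-AVX2 with the Intel compiler, or -mavx2 with GCC/Clang).

```cpp
#include <cassert>
#include <cstdint>

// Select between a gathered element and zero under a per-element mask.
// Layout (int64_t payload, unsigned char mask) is assumed for illustration.
void masked_gather(std::int64_t* __restrict dst,
                   const std::int64_t* __restrict src,
                   const int* __restrict index,
                   const unsigned char* __restrict mask,
                   int size)
{
    assert(size >= 0);
    // Only a hint: honoured with -qopenmp-simd (Intel) or -fopenmp-simd
    // (GCC/Clang), otherwise ignored (possibly with a warning).
#pragma omp simd
    for (int i = 0; i < size; i++) {
        // src is read only where mask[i] is set, matching masked-gather semantics.
        dst[i] = mask[i] ? src[index[i]] : 0;
    }
}
```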

Jim Dempsey


Abhishek_S_4 — Fri, 16 Feb 2018 09:08:43 GMT

Thank you Jim for the reply!

I replaced GetDecisionMaker(point) with a precomputed array, decisionArray, and now I have the code below. saPtr points to an array of structures, each containing two integers:

typedef struct SomeArrayStruct { int x; int y; } SomeArray;

for (int point = 0; point < size; point++)
{
      saPtr[point] = decisionArray[point] ? sourcePtr[indexArray[point]]:zero;
}

The performance is still the same as with the previous code; no improvement.

Thanks,

~Abhishek
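The two-pass arrangement described in this post (decisions precomputed into an array, then a select/gather loop) can be sketched as follows. decide() is a hypothetical stand-in for GetDecisionMaker, and the struct matches the SomeArray type quoted above; only the induction variable advances in either loop.

```cpp
#include <cassert>
#include <vector>

typedef struct SomeArrayStruct { int x; int y; } SomeArray;

// Hypothetical decision rule, for illustration only.
static inline bool decide(int point) { return (point & 1) == 0; }

// Pass 1 evaluates every decision into a byte array (no loads from source).
// Pass 2 is a pure select/gather loop driven by that array.
void two_pass_select(SomeArray* __restrict dst,
                     const SomeArray* __restrict src,
                     const int* __restrict index,
                     int size)
{
    assert(size >= 0);
    std::vector<unsigned char> decision(size);
    for (int point = 0; point < size; point++)
        decision[point] = decide(point) ? 1 : 0;

    const SomeArray zero = {0, 0};
    for (int point = 0; point < size; point++)
        dst[point] = decision[point] ? src[index[point]] : zero;
}
```

Splitting the passes mainly helps when the decision logic itself is expensive or blocks vectorization; the random reads from src remain the dominant cost either way.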


jimdempseyatthecove — Fri, 16 Feb 2018 15:56:14 GMT

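// View each 8-byte {int x; int y;} element as one 64-bit integer,
// so each iteration moves a single 64-bit value instead of a two-int struct.
__int64* saPtrX = (__int64*)saPtr;
__int64* sourcePtrX = (__int64*)sourcePtr;
__int64 zero = 0;
for (int point = 0; point < size; point++)
{
      saPtrX[point] = decisionArray[point] ? sourcePtrX[indexArray[point]]:zero;
}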

Jim Dempsey
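The pointer cast treats each 8-byte struct as one __int64, but in standard C++ such a cast can run afoul of the strict-aliasing rule. A portable variant, offered as a swapped-in technique rather than what was posted, copies through std::memcpy, which compilers typically lower to a single 64-bit load or store. The helper names below are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

typedef struct SomeArrayStruct { int x; int y; } SomeArray;

// The trick only makes sense if the struct is exactly one 64-bit word.
static_assert(sizeof(SomeArray) == sizeof(std::int64_t),
              "SomeArray must be 8 bytes for the 64-bit copy");

// memcpy between trivially copyable objects is well defined, unlike
// dereferencing a SomeArray* through an int64_t*.
static inline std::int64_t load64(const SomeArray* p)
{
    std::int64_t v;
    std::memcpy(&v, p, sizeof v);
    return v;
}

static inline void store64(SomeArray* p, std::int64_t v)
{
    std::memcpy(p, &v, sizeof v);
}

void select_gather64(SomeArray* __restrict dst,
                     const SomeArray* __restrict src,
                     const int* __restrict index,
                     const unsigned char* __restrict decision,
                     int size)
{
    assert(size >= 0);
    for (int point = 0; point < size; point++) {
        // An all-zero 64-bit word is exactly {0, 0}.
        std::int64_t v = decision[point] ? load64(&src[index[point]]) : 0;
        store64(&dst[point], v);
    }
}
```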


jimdempseyatthecove — Fri, 16 Feb 2018 15:59:12 GMT

Note: the scatter/gather vectorization mentioned in my earlier post requires a CPU that supports these instructions, plus a compiler option and/or #pragma hint telling the compiler to use them.

Jim Dempsey
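One way to satisfy both requirements in a single binary is to compile an extra copy of the loop for AVX2 and pick it at run time. This sketch is GCC/Clang-specific on x86 (the target attribute and __builtin_cpu_supports); the function and macro names are hypothetical, and the Intel compiler offers its own automatic dispatch (for example -axCORE-AVX2). Whether the AVX2 copy actually uses gather instructions is left to the compiler.

```cpp
#include <cassert>
#include <cstdint>

// Baseline copy: safe on any CPU.
static void gather_baseline(std::int64_t* __restrict dst,
                            const std::int64_t* __restrict src,
                            const int* __restrict index,
                            const unsigned char* __restrict mask, int size)
{
    for (int i = 0; i < size; i++)
        dst[i] = mask[i] ? src[index[i]] : 0;
}

#if (defined(__GNUC__) || defined(__clang__)) && (defined(__x86_64__) || defined(__i386__))
// Same loop, but this function alone may be compiled with AVX2 instructions.
// It must only be called on a CPU that reports AVX2 support.
static __attribute__((target("avx2")))
void gather_avx2(std::int64_t* __restrict dst,
                 const std::int64_t* __restrict src,
                 const int* __restrict index,
                 const unsigned char* __restrict mask, int size)
{
    for (int i = 0; i < size; i++)
        dst[i] = mask[i] ? src[index[i]] : 0;
}
#define HAVE_AVX2_PATH 1
#endif

// Choose an implementation based on what the running CPU supports.
void masked_gather_dispatch(std::int64_t* __restrict dst,
                            const std::int64_t* __restrict src,
                            const int* __restrict index,
                            const unsigned char* __restrict mask, int size)
{
    assert(size >= 0);
#ifdef HAVE_AVX2_PATH
    if (__builtin_cpu_supports("avx2")) {
        gather_avx2(dst, src, index, mask, size);
        return;
    }
#endif
    gather_baseline(dst, src, index, mask, size);
}
```

Both paths compute the same result; only the instructions the compiler is allowed to use differ.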
