Hi,
I have the following kind of loop that I am looking to optimize with the Intel compiler 18.0.
- Tags:
- Parallel Computing
Try this first:
for (int point = 0; point < size; point++) {
    saPtr[point] = GetDecisionMaker(point) ? sourcePtr[indexArray[point]] : zero;
}
In other words, use the index point rather than advancing the pointers (whose updated values may remain live after the loop).
Please describe GetDecisionMaker.
Be aware that newer CPU architectures have scatter/gather instructions. If SomeData is a "standard" type (char, short, int, float, double, ...) and GetDecisionMaker can be evaluated vector-wise to produce a mask, then the compiler may be able to vectorize the loop, provided the appropriate compiler optimization options and/or #pragma directives are used.
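As a minimal sketch of the indexed form described above, assuming SomeData is a plain int and that GetDecisionMaker reduces to a simple per-point test (the helper get_decision and the flags array are hypothetical, not taken from the original code):

```c
#include <stddef.h>

/* Hypothetical stand-in for GetDecisionMaker: a side-effect-free test
   that a compiler could evaluate for several points at once. */
static inline int get_decision(const int *flags, size_t point)
{
    return flags[point] != 0;
}

/* Indexed form of the loop: every access is base[point] and no pointer
   is advanced, so only the loop counter changes between iterations. */
void select_gather(int *restrict dst, const int *restrict src,
                   const int *restrict index, const int *restrict flags,
                   size_t size)
{
    for (size_t point = 0; point < size; point++) {
        dst[point] = get_decision(flags, point) ? src[index[point]] : 0;
    }
}
```

The restrict qualifiers are an extra assumption that the arrays do not overlap. Whether the compiler actually emits gather instructions for this loop depends on the target CPU and the options used.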
Jim Dempsey
Thank you Jim for the reply!
I converted GetDecisionMaker(point) into an array, decisionArray, and now have the code below. saPtr points to an array of structures, each containing two integers.
typedef struct SomeArrayStruct {
    int x;
    int y;
} SomeArray;

for (int point = 0; point < size; point++) {
    saPtr[point] = decisionArray[point] ? sourcePtr[indexArray[point]] : zero;
}
The performance is still the same as with the previous code; there is no improvement.
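For reference, here is a rough sketch of the two-pass arrangement described above, assuming the decisions are stored as one byte per point. The predicate decide is a hypothetical placeholder for GetDecisionMaker, and the function names are illustrative only:

```c
#include <stddef.h>

typedef struct SomeArrayStruct { int x; int y; } SomeArray;

/* Hypothetical placeholder for GetDecisionMaker: true for even points. */
static int decide(size_t point)
{
    return (point % 2) == 0;
}

/* Pass 1: evaluate the predicate once per point and store it as 0 or 1. */
void fill_decisions(unsigned char *decisionArray, size_t size)
{
    for (size_t point = 0; point < size; point++) {
        decisionArray[point] = (unsigned char)decide(point);
    }
}

/* Pass 2: the select loop, now free of function calls. Each iteration
   still copies a two-int struct through an indexed load. */
void select_structs(SomeArray *saPtr, const SomeArray *sourcePtr,
                    const int *indexArray,
                    const unsigned char *decisionArray, size_t size)
{
    const SomeArray zero = {0, 0};
    for (size_t point = 0; point < size; point++) {
        saPtr[point] = decisionArray[point] ? sourcePtr[indexArray[point]] : zero;
    }
}
```

Splitting the work this way removes the predicate call from the hot loop, but the indexed struct copy remains, which may be one reason the timing did not change.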
Thanks,
~Abhishek
__int64* saPtrX = (__int64*)saPtr;
__int64* sourcePtrX = (__int64*)sourcePtr;
__int64 zero = 0;
for (int point = 0; point < size; point++) {
    saPtrX[point] = decisionArray[point] ? sourcePtrX[indexArray[point]] : zero;
}
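The code above treats each two-int structure as a single 64-bit value, so one element moves with one 64-bit load and store. A hedged variant of the same idea is sketched below using a union instead of pointer casts, which keeps the reinterpretation well defined in C. The Pair64 type and the function name are assumptions for illustration, and the sketch relies on int being 32 bits so that the structure is exactly 8 bytes:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct SomeArrayStruct { int x; int y; } SomeArray;

/* Union view: the same 8 bytes seen as two ints or as one 64-bit word. */
typedef union {
    SomeArray pair;
    int64_t   bits;
} Pair64;

/* The trick only works if the struct really is 64 bits wide. */
static_assert(sizeof(SomeArray) == sizeof(int64_t),
              "SomeArray must be exactly 8 bytes");

/* Select loop expressed on 64-bit words rather than on structs. */
void select_pairs64(Pair64 *dst, const Pair64 *src, const int *index,
                    const unsigned char *decision, size_t size)
{
    for (size_t point = 0; point < size; point++) {
        dst[point].bits = decision[point] ? src[index[point]].bits : 0;
    }
}
```

Working on 64-bit elements gives the compiler a "standard" type to gather, at the cost of declaring the arrays with the union type.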
Jim Dempsey
Note: the scatter/gather approach mentioned in my earlier post requires a CPU that supports those instructions, plus a compiler option and/or #pragma hint to make the compiler use them.
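As an illustration of such a hint, the sketch below marks a gather-style select loop with the OpenMP SIMD pragma. The element type (int), the function name, and the example command line in the comment are assumptions, not settings confirmed in this thread:

```c
#include <stddef.h>

/* Example Intel compiler command line (an assumption; adjust to the target):
 *   icc -O3 -xCORE-AVX2 -qopenmp-simd -qopt-report=5 -qopt-report-phase=vec file.c
 * -xCORE-AVX2 targets CPUs with AVX2, which provides gather instructions;
 * the optimization report shows whether the loop was vectorized. */
void gather_select_simd(int *restrict dst, const int *restrict src,
                        const int *restrict index,
                        const unsigned char *restrict decision, size_t size)
{
    /* Requests vectorization; the pragma is ignored (possibly with a
       warning) when OpenMP SIMD support is not enabled. */
    #pragma omp simd
    for (size_t point = 0; point < size; point++) {
        dst[point] = decision[point] ? src[index[point]] : 0;
    }
}
```

The pragma only asserts that vectorizing is safe. The compiler still decides whether gather instructions are worthwhile on the chosen target, so checking the vectorization report is advisable.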
Jim Dempsey