<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic __int64* saPtrX = (__int64* in Intel® Moderncode for Parallel Architectures</title>
    <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/loop-optimization-for-non-uniform-access-to-an-array/m-p/1139505#M7745</link>
    <description>&lt;PRE class="brush:cpp;"&gt;__int64* saPtrX = (__int64*)saPtr;
__int64* sourcePtrX = (__int64*)sourcePtr;
__int64 zero = 0;
for (int point = 0; point &amp;lt; size; point++)
{
      saPtrX[point] = decisionArray[point] ? sourcePtrX[indexArray[point]]:zero;
}&lt;/PRE&gt;

&lt;P&gt;Jim Dempsey&lt;/P&gt;</description>
    <pubDate>Fri, 16 Feb 2018 15:56:14 GMT</pubDate>
    <dc:creator>jimdempseyatthecove</dc:creator>
    <dc:date>2018-02-16T15:56:14Z</dc:date>
    <item>
      <title>loop optimization for non-uniform access to an array</title>
      <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/loop-optimization-for-non-uniform-access-to-an-array/m-p/1139502#M7742</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;

&lt;P&gt;I have the following kind of loop that I am looking to optimize for Intel compiler 18.0:&lt;/P&gt;

&lt;PRE class="brush:cpp;"&gt;SomeData* sourcePtr = GetMySourceSomeData();
SomeData* saPtr = GetMySomeDataPointer();
int size = GetSizeSomeData(saPtr);

// indexArray holds a series of indices based on some business logic; they can be considered random.
int* indexArray = GetRandomIndexArray();

SomeData zero = SomeData(0);

// The loop below needs to be optimized.
for (int point = 0; point &amp;lt; size; indexArray++, saPtr++, point++)
{
      *(saPtr) = (GetDecisionMaker(point)) ? sourcePtr[*(indexArray)] : zero;
}

// GetDecisionMaker(point) returns a boolean value based on some business logic; it can be considered random.&lt;/PRE&gt;
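
&lt;P&gt;To experiment with this loop outside the application, it can be reduced to a self-contained function. The sketch below makes assumptions that are not in the original code: SomeData is taken to be int, and GetDecisionMaker is modeled as a hypothetical callback, decide. Because decide is an indirect call on every iteration, that alone is usually enough to keep the compiler from auto-vectorizing the loop.&lt;/P&gt;

```cpp
// Hypothetical stand-ins (not from the original code): SomeData is assumed
// to be int, and DecisionFn models GetDecisionMaker(point).
typedef int SomeData;
typedef bool (*DecisionFn)(int point);

// Baseline with the same shape as the loop in the question: saPtr and
// indexArray are advanced every iteration, and the decision is an opaque call.
void gather_baseline(SomeData* saPtr, const SomeData* sourcePtr,
                     const int* indexArray, int size, DecisionFn decide)
{
    const SomeData zero = SomeData(0);
    for (int point = 0; size > point; indexArray++, saPtr++, point++)
    {
        *saPtr = decide(point) ? sourcePtr[*indexArray] : zero;
    }
}
```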

&lt;DIV&gt;With Intel compiler 13.0 we had good performance, but with 18.0 the performance has regressed.&lt;/DIV&gt;

&lt;DIV&gt;All help is welcome!&lt;/DIV&gt;

&lt;DIV&gt;Thanks.&lt;/DIV&gt;</description>
      <pubDate>Thu, 15 Feb 2018 13:37:50 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/loop-optimization-for-non-uniform-access-to-an-array/m-p/1139502#M7742</guid>
      <dc:creator>Abhishek_S_4</dc:creator>
      <dc:date>2018-02-15T13:37:50Z</dc:date>
    </item>
    <item>
      <title>Try this first:</title>
      <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/loop-optimization-for-non-uniform-access-to-an-array/m-p/1139503#M7743</link>
      <description>&lt;P&gt;Try this first:&lt;/P&gt;

&lt;PRE class="brush:cpp;"&gt;for (int point = 0; point &amp;lt; size; point++)
{
      saPtr[point] = (GetDecisionMaker(point))?sourcePtr[indexArray[point]]:zero;
}&lt;/PRE&gt;

&lt;P&gt;In other words, index with point instead of advancing the pointers (the incremented pointers may still be live after the loop).&lt;/P&gt;
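
&lt;P&gt;Here is a minimal sketch of the index-based form, under the same hypothetical assumptions (SomeData as int, and a decide callback standing in for GetDecisionMaker). The pointer parameters are only read through point and are never modified. The __restrict qualifiers are not in the post. They are a compiler extension accepted by the Intel, GCC, Clang and Microsoft compilers, and they promise that the output array does not overlap the inputs.&lt;/P&gt;

```cpp
typedef int SomeData;                   // assumption: a plain scalar type
typedef bool (*DecisionFn)(int point);  // hypothetical stand-in for GetDecisionMaker

// Index-based form: no pointer is incremented, so nothing but point
// changes from one iteration to the next.
void gather_indexed(SomeData* __restrict saPtr,
                    const SomeData* __restrict sourcePtr,
                    const int* __restrict indexArray,
                    int size, DecisionFn decide)
{
    const SomeData zero = SomeData(0);
    for (int point = 0; size > point; point++)
    {
        saPtr[point] = decide(point) ? sourcePtr[indexArray[point]] : zero;
    }
}
```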

&lt;P&gt;Please describe GetDecisionMaker.&lt;/P&gt;

&lt;P&gt;Be aware that newer CPU architectures have scatter/gather instructions. If SomeData is a "standard" type (char, short, int, float, double, ...) and GetDecisionMaker can be evaluated vector-wise to generate a mask, then the compiler may be able to vectorize the loop (assuming appropriate compiler optimization options and/or #pragma hints are used).&lt;/P&gt;
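
&lt;P&gt;Here is a minimal sketch of a loop shape that a vectorizing compiler can turn into a masked gather. It assumes SomeData is int and that the decisions have already been stored in a bool array (the name decisionArray is borrowed from the follow-up post; its type is an assumption). The #pragma omp simd hint is one option. It is honored when OpenMP SIMD support is enabled (for example -qopenmp-simd with the Intel compiler, or -fopenmp-simd with GCC and Clang) and is otherwise ignored, possibly with a warning. Whether a gather is actually generated depends on the target instruction-set flags (for example -xCORE-AVX2). The vectorization report (-qopt-report) is the best way to confirm it.&lt;/P&gt;

```cpp
typedef int SomeData; // assumption: a plain scalar element type

// decisionArray holds one precomputed flag per point (assumed to be bool).
void gather_masked(SomeData* __restrict saPtr,
                   const SomeData* __restrict sourcePtr,
                   const int* __restrict indexArray,
                   const bool* __restrict decisionArray,
                   int size)
{
    const SomeData zero = SomeData(0);
    // Iterations are independent, so the conditional load of
    // sourcePtr[indexArray[point]] can become a gather under a mask
    // built from decisionArray.
#pragma omp simd
    for (int point = 0; size > point; point++)
    {
        saPtr[point] = decisionArray[point] ? sourcePtr[indexArray[point]] : zero;
    }
}
```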

&lt;P&gt;Jim Dempsey&lt;/P&gt;</description>
      <pubDate>Thu, 15 Feb 2018 15:42:04 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/loop-optimization-for-non-uniform-access-to-an-array/m-p/1139503#M7743</guid>
      <dc:creator>jimdempseyatthecove</dc:creator>
      <dc:date>2018-02-15T15:42:04Z</dc:date>
    </item>
    <item>
      <title>Thank you Jim for the reply!</title>
      <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/loop-optimization-for-non-uniform-access-to-an-array/m-p/1139504#M7744</link>
      <description>&lt;P&gt;Thank you Jim for the reply!&lt;/P&gt;

&lt;P&gt;I precomputed GetDecisionMaker(point) into an array, decisionArray, and now have the code below. saPtr points to an array of structures, each containing two integers:&lt;/P&gt;

&lt;PRE class="brush:cpp;"&gt;typedef struct SomeArrayStruct { int x; int y; } SomeArray;&lt;/PRE&gt;

&lt;PRE class="brush:cpp;"&gt;for (int point = 0; point &amp;lt; size; indexArray++, saPtr++, point++)
{
      saPtr[point] = decisionArray[point] ? sourcePtr[indexArray[point]]:zero;
}&lt;/PRE&gt;

&lt;P&gt;The performance is still the same as with the previous code; there is no improvement.&lt;/P&gt;

&lt;P&gt;Thanks,&lt;/P&gt;

&lt;P&gt;~Abhishek&lt;/P&gt;</description>
      <pubDate>Fri, 16 Feb 2018 09:08:43 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/loop-optimization-for-non-uniform-access-to-an-array/m-p/1139504#M7744</guid>
      <dc:creator>Abhishek_S_4</dc:creator>
      <dc:date>2018-02-16T09:08:43Z</dc:date>
    </item>
    <item>
      <title>__int64* saPtrX = (__int64*</title>
      <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/loop-optimization-for-non-uniform-access-to-an-array/m-p/1139505#M7745</link>
      <description>&lt;PRE class="brush:cpp;"&gt;__int64* saPtrX = (__int64*)saPtr;
__int64* sourcePtrX = (__int64*)sourcePtr;
__int64 zero = 0;
for (int point = 0; point &amp;lt; size; point++)
{
      saPtrX[point] = decisionArray[point] ? sourcePtrX[indexArray[point]]:zero;
}&lt;/PRE&gt;
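
&lt;P&gt;The apparent idea of the code above is that each SomeArray element (two 4-byte ints) is 8 bytes wide, so it can be moved as one 64-bit integer rather than as a two-field structure. However, reading the structures through an __int64 pointer may break the C++ strict-aliasing rule. The sketch below is an alternative, not the code from the post. It copies each element as one 8-byte unit with std::memcpy, which optimizing compilers typically turn into a single 64-bit load and store. The function name, the kZero helper and the bool type of decisionArray are assumptions.&lt;/P&gt;

```cpp
#include "cstring" // std::memcpy; the quoted form falls back to the standard header search

typedef struct SomeArrayStruct { int x; int y; } SomeArray;

// Moving an element as one 64-bit unit only makes sense if it is 8 bytes wide.
static_assert(sizeof(SomeArray) == sizeof(long long),
              "SomeArray is expected to be exactly 64 bits wide");

// A one-element array so the zero value can be used as a source pointer.
static const SomeArray kZero[1] = {{0, 0}};

void gather_8byte(SomeArray* __restrict saPtr,
                  const SomeArray* __restrict sourcePtr,
                  const int* __restrict indexArray,
                  const bool* __restrict decisionArray,
                  int size)
{
    for (int point = 0; size > point; point++)
    {
        // Pick what to copy: the indexed source element or the zero element.
        const SomeArray* from = decisionArray[point] ? sourcePtr + indexArray[point] : kZero;
        // Fixed-size 8-byte copy; aliasing-safe, unlike the __int64 pointer cast.
        std::memcpy(saPtr + point, from, sizeof(SomeArray));
    }
}
```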

&lt;P&gt;Jim Dempsey&lt;/P&gt;</description>
      <pubDate>Fri, 16 Feb 2018 15:56:14 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/loop-optimization-for-non-uniform-access-to-an-array/m-p/1139505#M7745</guid>
      <dc:creator>jimdempseyatthecove</dc:creator>
      <dc:date>2018-02-16T15:56:14Z</dc:date>
    </item>
    <item>
      <title>Note, the earlier post</title>
      <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/loop-optimization-for-non-uniform-access-to-an-array/m-p/1139506#M7746</link>
      <description>&lt;P&gt;Note, the earlier post relating to scatter/gather requires CPU that supports this and compiler option and/or #pragma hint to use this.&lt;/P&gt;

&lt;P&gt;Jim Dempsey&lt;/P&gt;</description>
      <pubDate>Fri, 16 Feb 2018 15:59:12 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/loop-optimization-for-non-uniform-access-to-an-array/m-p/1139506#M7746</guid>
      <dc:creator>jimdempseyatthecove</dc:creator>
      <dc:date>2018-02-16T15:59:12Z</dc:date>
    </item>
  </channel>
</rss>

