<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Seems reasonable.  You could in Intel® Moderncode for Parallel Architectures</title>
    <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/Aligned-loads-shift-vs-unaligned-loads-vs-vgather/m-p/1066553#M6944</link>
    <description>Seems reasonable.  You could write c or Fortran with alignment assertions</description>
    <pubDate>Wed, 18 May 2016 16:33:05 GMT</pubDate>
    <dc:creator>TimP</dc:creator>
    <dc:date>2016-05-18T16:33:05Z</dc:date>
    <item>
      <title>Aligned loads + shift vs. unaligned loads vs. vgather</title>
      <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/Aligned-loads-shift-vs-unaligned-loads-vs-vgather/m-p/1066552#M6943</link>
      <description>&lt;DIV id="yui_3_16_0_ym19_1_1463582561108_60506" style="-webkit-padding-start: 0px; color: rgb(0, 0, 0); font-family: 'Times New Roman'; font-size: 16px; line-height: normal;"&gt;What would you recommend as the best approach for this stencil on a Xeon Phi? The idea is:&lt;/DIV&gt;

&lt;DIV id="yui_3_16_0_ym19_1_1463582561108_60516" style="-webkit-padding-start: 0px; color: rgb(0, 0, 0); font-family: 'Times New Roman'; font-size: 16px; line-height: normal;"&gt;&amp;nbsp;&lt;/DIV&gt;

&lt;DIV id="yui_3_16_0_ym19_1_1463582561108_60515" style="-webkit-padding-start: 0px; color: rgb(0, 0, 0); font-family: 'Times New Roman'; font-size: 16px; line-height: normal;"&gt;Given a 1-D array A: [0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18]&amp;nbsp;&lt;/DIV&gt;

&lt;DIV id="yui_3_16_0_ym19_1_1463582561108_60514" style="-webkit-padding-start: 0px; color: rgb(0, 0, 0); font-family: 'Times New Roman'; font-size: 16px; line-height: normal;"&gt;&amp;nbsp;&lt;/DIV&gt;

&lt;DIV id="yui_3_16_0_ym19_1_1463582561108_60521" style="-webkit-padding-start: 0px; color: rgb(0, 0, 0); font-family: 'Times New Roman'; font-size: 16px; line-height: normal;"&gt;And three scalars: probUp, probMid, probDown&lt;/DIV&gt;

&lt;DIV id="yui_3_16_0_ym19_1_1463582561108_60522" style="-webkit-padding-start: 0px; color: rgb(0, 0, 0); font-family: 'Times New Roman'; font-size: 16px; line-height: normal;"&gt;&amp;nbsp;&lt;/DIV&gt;

&lt;DIV id="yui_3_16_0_ym19_1_1463582561108_60523" style="-webkit-padding-start: 0px; color: rgb(0, 0, 0); font-family: 'Times New Roman'; font-size: 16px; line-height: normal;"&gt;You need to compute a 1-D array B where:&lt;/DIV&gt;

&lt;DIV id="yui_3_16_0_ym19_1_1463582561108_60524" style="-webkit-padding-start: 0px; color: rgb(0, 0, 0); font-family: 'Times New Roman'; font-size: 16px; line-height: normal;"&gt;&amp;nbsp;&lt;/DIV&gt;

&lt;DIV id="yui_3_16_0_ym19_1_1463582561108_60525" style="-webkit-padding-start: 0px; color: rgb(0, 0, 0); font-family: 'Times New Roman'; font-size: 16px; line-height: normal;"&gt;B[0] = probDown*A[0] + probMid*A[1] + probUp*A[2]&lt;/DIV&gt;

&lt;DIV id="yui_3_16_0_ym19_1_1463582561108_60526" style="-webkit-padding-start: 0px; color: rgb(0, 0, 0); font-family: 'Times New Roman'; font-size: 16px; line-height: normal;"&gt;B[1] = probDown*A[1] + probMid*A[2] + probUp*A[3]&lt;/DIV&gt;

&lt;DIV id="yui_3_16_0_ym19_1_1463582561108_60527" style="-webkit-padding-start: 0px; color: rgb(0, 0, 0); font-family: 'Times New Roman'; font-size: 16px; line-height: normal;"&gt;B[2] = probDown*A[2] + probMid*A[3] + probUp*A[4]&lt;/DIV&gt;

&lt;DIV id="yui_3_16_0_ym19_1_1463582561108_60528" style="-webkit-padding-start: 0px; color: rgb(0, 0, 0); font-family: 'Times New Roman'; font-size: 16px; line-height: normal;"&gt;etc…&lt;/DIV&gt;

&lt;DIV id="yui_3_16_0_ym19_1_1463582561108_60529" style="-webkit-padding-start: 0px; color: rgb(0, 0, 0); font-family: 'Times New Roman'; font-size: 16px; line-height: normal;"&gt;&amp;nbsp;&lt;/DIV&gt;

&lt;DIV id="yui_3_16_0_ym19_1_1463582561108_60530" style="-webkit-padding-start: 0px; color: rgb(0, 0, 0); font-family: 'Times New Roman'; font-size: 16px; line-height: normal;"&gt;Everything is double precision.&lt;/DIV&gt;

&lt;DIV id="yui_3_16_0_ym19_1_1463582561108_60531" style="-webkit-padding-start: 0px; color: rgb(0, 0, 0); font-family: 'Times New Roman'; font-size: 16px; line-height: normal;"&gt;&amp;nbsp;&lt;/DIV&gt;

&lt;DIV id="yui_3_16_0_ym19_1_1463582561108_60532" style="-webkit-padding-start: 0px; color: rgb(0, 0, 0); font-family: 'Times New Roman'; font-size: 16px; line-height: normal;"&gt;The approach I’m going with is to load three vector registers, a0, a1, and a2 (where a0 is aligned to the loop iterator, a1 is shifted by 1 element, and a2 is shifted by 2 elements), do muls and fmadds to smoosh them together with the scalars, and then store.&lt;/DIV&gt;
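
&lt;DIV&gt;For readers who want to experiment with the idea, the stencil and the "two aligned loads plus shifts" scheme described above can be modelled in plain Python. This is only a lane-by-lane sketch, not the poster's code and not real intrinsics: the names WIDTH, stencil_reference, and stencil_load_shift are made up for illustration, WIDTH = 8 assumes 512-bit registers holding doubles, and the zero padding stands in for whatever padding or remainder handling a real kernel would use.&lt;/DIV&gt;

&lt;PRE&gt;
```python
# Scalar model of the stencil and of the "two aligned loads + shift" idea.
# WIDTH, stencil_reference, and stencil_load_shift are illustrative names,
# not taken from the original post.

WIDTH = 8  # doubles per 512-bit vector register


def stencil_reference(a, prob_down, prob_mid, prob_up):
    """B[i] = probDown*A[i] + probMid*A[i+1] + probUp*A[i+2]."""
    return [prob_down * a[i] + prob_mid * a[i + 1] + prob_up * a[i + 2]
            for i in range(len(a) - 2)]


def stencil_load_shift(a, prob_down, prob_mid, prob_up):
    """Same result, built from WIDTH-sized aligned blocks plus shifts."""
    n_out = len(a) - 2
    # Pad so every aligned block, including the "next" one, is full; a real
    # kernel would need padded arrays or a masked remainder loop instead.
    n_blocks = -(-n_out // WIDTH)  # ceiling division
    padded = list(a) + [0.0] * ((n_blocks + 1) * WIDTH - len(a))
    b = []
    for base in range(0, n_blocks * WIDTH, WIDTH):
        lo = padded[base:base + WIDTH]              # aligned load
        hi = padded[base + WIDTH:base + 2 * WIDTH]  # next aligned load
        both = lo + hi
        a0 = both[0:WIDTH]      # aligned to the loop iterator
        a1 = both[1:WIDTH + 1]  # shifted by one element
        a2 = both[2:WIDTH + 2]  # shifted by two elements
        # Models one multiply followed by two multiply-adds per lane.
        b.extend(prob_down * x + prob_mid * y + prob_up * z
                 for x, y, z in zip(a0, a1, a2))
    return b[:n_out]
```
&lt;/PRE&gt;

&lt;DIV&gt;With A = 0..18 and (probDown, probMid, probUp) = (0.25, 0.5, 0.25), both functions return 1.0 through 17.0. In a real loop the "hi" block of one iteration would be reused as the "lo" block of the next, so each iteration costs roughly one new aligned load plus two lane shifts.&lt;/DIV&gt;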

&lt;DIV id="yui_3_16_0_ym19_1_1463582561108_60533" style="-webkit-padding-start: 0px; color: rgb(0, 0, 0); font-family: 'Times New Roman'; font-size: 16px; line-height: normal;"&gt;&amp;nbsp;&lt;/DIV&gt;

&lt;DIV id="yui_3_16_0_ym19_1_1463582561108_60534" style="-webkit-padding-start: 0px; color: rgb(0, 0, 0); font-family: 'Times New Roman'; font-size: 16px; line-height: normal;"&gt;My only real question is what the most efficient way is to get the a* vector registers loaded from memory. I’m pretty sure it’s best to do two loads and some shifts (as I’m doing below) rather than unaligned loads or gather intrinsics, but I’d be curious whether Intel has a different opinion.&lt;/DIV&gt;

&lt;DIV id="yui_3_16_0_ym19_1_1463582561108_60545" style="-webkit-padding-start: 0px; color: rgb(0, 0, 0); font-family: 'Times New Roman'; font-size: 16px; line-height: normal;"&gt;&amp;nbsp;&lt;/DIV&gt;</description>
      <pubDate>Wed, 18 May 2016 15:40:12 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/Aligned-loads-shift-vs-unaligned-loads-vs-vgather/m-p/1066552#M6943</guid>
      <dc:creator>Mark_D_9</dc:creator>
      <dc:date>2016-05-18T15:40:12Z</dc:date>
    </item>
    <item>
      <title>Seems reasonable.  You could</title>
      <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/Aligned-loads-shift-vs-unaligned-loads-vs-vgather/m-p/1066553#M6944</link>
      <description>Seems reasonable. You could write C or Fortran with alignment assertions.</description>
      <pubDate>Wed, 18 May 2016 16:33:05 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/Aligned-loads-shift-vs-unaligned-loads-vs-vgather/m-p/1066553#M6944</guid>
      <dc:creator>TimP</dc:creator>
      <dc:date>2016-05-18T16:33:05Z</dc:date>
    </item>
    <item>
      <title>So, which is the most optimal</title>
      <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/Aligned-loads-shift-vs-unaligned-loads-vs-vgather/m-p/1066554#M6945</link>
      <description>&lt;P&gt;So, which approach is fastest: aligned loads with shifts, unaligned loads, or gather intrinsics?&lt;/P&gt;</description>
      <pubDate>Wed, 18 May 2016 16:35:56 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/Aligned-loads-shift-vs-unaligned-loads-vs-vgather/m-p/1066554#M6945</guid>
      <dc:creator>Mark_D_9</dc:creator>
      <dc:date>2016-05-18T16:35:56Z</dc:date>
    </item>
    <item>
      <title>The posted papers discussing</title>
      <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/Aligned-loads-shift-vs-unaligned-loads-vs-vgather/m-p/1066555#M6946</link>
      <description>&lt;P&gt;The posted papers discussing -qopt-assume-safe-padding are relevant. Compilers fall back to gather and scatter for partial cache lines only to avoid potentially unsafe accesses outside the array. Unaligned access may be the slowest of the three.&lt;/P&gt;

&lt;P&gt;You might get more attention from experts if you ask on the Intel Xeon Phi forum.&lt;/P&gt;</description>
      <pubDate>Thu, 19 May 2016 12:13:00 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/Aligned-loads-shift-vs-unaligned-loads-vs-vgather/m-p/1066555#M6946</guid>
      <dc:creator>TimP</dc:creator>
      <dc:date>2016-05-19T12:13:00Z</dc:date>
    </item>
  </channel>
</rss>

