<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Which AVX memory access pattern is better? in Intel® Moderncode for Parallel Architectures</title>
    <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/Which-AVX-memory-access-pattern-is-better/m-p/1130228#M7686</link>
    <description>&lt;P&gt;Suppose there is an array A of length length_A, and I read it with the AVX2 gather intrinsic (_mm256_i32gather_epi32). There are two possible memory access patterns.&lt;/P&gt;&lt;P&gt;1.&lt;/P&gt;&lt;P&gt;mm256 register = (A[0], A[1], ..., A[7])&lt;/P&gt;&lt;P&gt;mm256 register = (A[8], A[9], ..., A[15]), and so on&lt;/P&gt;&lt;P&gt;2.&lt;/P&gt;&lt;P&gt;stride = length_A / 8;&lt;/P&gt;&lt;P&gt;mm256 register = (A[0], A[stride+0], ..., A[7*stride+0])&lt;/P&gt;&lt;P&gt;mm256 register = (A[1], A[stride+1], ..., A[7*stride+1]), and so on&lt;/P&gt;&lt;P&gt;Which pattern is better when length_A is very large?&lt;/P&gt;</description>
    <pubDate>Mon, 28 Oct 2019 16:42:40 GMT</pubDate>
    <dc:creator>sun__lei</dc:creator>
    <dc:date>2019-10-28T16:42:40Z</dc:date>
    <item>
      <title>Which AVX memory access pattern is better?</title>
      <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/Which-AVX-memory-access-pattern-is-better/m-p/1130228#M7686</link>
      <description>&lt;P&gt;Suppose there is an array A of length length_A, and I read it with the AVX2 gather intrinsic (_mm256_i32gather_epi32). There are two possible memory access patterns.&lt;/P&gt;&lt;P&gt;1.&lt;/P&gt;&lt;P&gt;mm256 register = (A[0], A[1], ..., A[7])&lt;/P&gt;&lt;P&gt;mm256 register = (A[8], A[9], ..., A[15]), and so on&lt;/P&gt;&lt;P&gt;2.&lt;/P&gt;&lt;P&gt;stride = length_A / 8;&lt;/P&gt;&lt;P&gt;mm256 register = (A[0], A[stride+0], ..., A[7*stride+0])&lt;/P&gt;&lt;P&gt;mm256 register = (A[1], A[stride+1], ..., A[7*stride+1]), and so on&lt;/P&gt;&lt;P&gt;Which pattern is better when length_A is very large?&lt;/P&gt;</description>
      <pubDate>Mon, 28 Oct 2019 16:42:40 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/Which-AVX-memory-access-pattern-is-better/m-p/1130228#M7686</guid>
      <dc:creator>sun__lei</dc:creator>
      <dc:date>2019-10-28T16:42:40Z</dc:date>
    </item>
  </channel>
</rss>

