<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic traces need not be aligned in Intel® Moderncode for Parallel Architectures</title>
    <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/Alignment-problem/m-p/1115641#M7467</link>
    <description>&lt;P&gt;traces need not be aligned, since it is an array of structures that each contain two pointers to float arrays. Typically you will not use SIMD instructions to manipulate these (except possibly when copying one traces32 structure to another). Can you show more of the code? Also, in a debug build it sometimes helps to insert asserts to make sure that what you are about to use is in fact what you think it is.&lt;/P&gt;

&lt;P&gt;Jim Dempsey&lt;/P&gt;</description>
    <pubDate>Wed, 03 Feb 2016 17:44:34 GMT</pubDate>
    <dc:creator>jimdempseyatthecove</dc:creator>
    <dc:date>2016-02-03T17:44:34Z</dc:date>
    <item>
      <title>Alignment problem</title>
      <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/Alignment-problem/m-p/1115640#M7466</link>
      <description>&lt;P&gt;Dear Intel Developers,&lt;/P&gt;

&lt;P&gt;I'm using Intel icc version 15.0.1 on a C program. I'm trying to align a structure of arrays, and the same structure is passed to a computational kernel that uses intrinsics. I'm not sure I'm doing the allocations correctly:&lt;/P&gt;

&lt;PRE class="brush:cpp;"&gt;struct traces_32 {
    float32* r;
    float32* i;
};

typedef struct traces_32 traces32;

.....


traces32* traces = (traces32*)_mm_malloc(*ntr * sizeof(traces32), 16);

for (i = 0; i &amp;lt; *ntr; i++) {
    traces[i].r = (float32 *)_mm_malloc(nsamples_padded * sizeof(float32), 16);
    traces[i].i = (float32 *)_mm_malloc(nsamples_padded * sizeof(float32), 16);
}&lt;/PRE&gt;

&lt;P&gt;Is this the right way to do it? The code dies in the computational kernel on _mm_load_ps whenever traces is involved. If I use _mm_loadu_ps and malloc instead of _mm_malloc, the kernel works well, so it seems to be an alignment problem. Could you help me? Thanks.&lt;/P&gt;</description>
      <pubDate>Wed, 03 Feb 2016 10:36:28 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/Alignment-problem/m-p/1115640#M7466</guid>
      <dc:creator>unrue</dc:creator>
      <dc:date>2016-02-03T10:36:28Z</dc:date>
    </item>
    <item>
      <title>traces need not be aligned</title>
      <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/Alignment-problem/m-p/1115641#M7467</link>
      <description>&lt;P&gt;traces need not be aligned, since it is an array of structures that each contain two pointers to float arrays. Typically you will not use SIMD instructions to manipulate these (except possibly when copying one traces32 structure to another). Can you show more of the code? Also, in a debug build it sometimes helps to insert asserts to make sure that what you are about to use is in fact what you think it is.&lt;/P&gt;
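
&lt;P&gt;A minimal sketch of such an assert, assuming hypothetical helpers named is_aligned and checked_load_address that are not part of the original code. In a debug build it checks that the address about to be handed to an aligned load such as _mm_load_ps really is a multiple of 16 bytes:&lt;/P&gt;

&lt;PRE class="brush:cpp;"&gt;#include &amp;lt;assert.h&amp;gt;
#include &amp;lt;stddef.h&amp;gt;
#include &amp;lt;stdint.h&amp;gt;

/* Returns 1 when ptr is a multiple of `alignment` bytes (alignment must be nonzero). */
int is_aligned(const void *ptr, size_t alignment)
{
    return ((uintptr_t)ptr % alignment) == 0;
}

/* Hypothetical debug helper: computes base + offset and asserts that the
   result is suitable for a 16-byte aligned load. The assert is compiled
   out when NDEBUG is defined. */
const float *checked_load_address(const float *base, size_t offset)
{
    const float *p = base + offset;
    assert(is_aligned(p, 16));
    return p;
}&lt;/PRE&gt;

&lt;P&gt;In a kernel this could be used as _mm_load_ps(checked_load_address(traces[n].r, j)). Note that a pointer returned by _mm_malloc(size, 16) is 16-byte aligned, but base + j is 16-byte aligned only when j is a multiple of 4 floats.&lt;/P&gt;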

&lt;P&gt;Jim Dempsey&lt;/P&gt;</description>
      <pubDate>Wed, 03 Feb 2016 17:44:34 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/Alignment-problem/m-p/1115641#M7467</guid>
      <dc:creator>jimdempseyatthecove</dc:creator>
      <dc:date>2016-02-03T17:44:34Z</dc:date>
    </item>
    <item>
      <title>Quote:jimdempseyatthecove</title>
      <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/Alignment-problem/m-p/1115642#M7468</link>
      <description>&lt;BLOCKQUOTE&gt;jimdempseyatthecove wrote:&lt;BR /&gt;

&lt;P&gt;traces need not be aligned, since it is an array of structures that each contain two pointers to float arrays. Typically you will not use SIMD instructions to manipulate these (except possibly when copying one traces32 structure to another). Can you show more of the code? Also, in a debug build it sometimes helps to insert asserts to make sure that what you are about to use is in fact what you think it is.&lt;/P&gt;

&lt;P&gt;Jim Dempsey&lt;/P&gt;
&lt;/BLOCKQUOTE&gt;

&lt;P&gt;Hi Jim, the original code worked as is:&lt;/P&gt;

&lt;PRE class="brush:cpp;"&gt;traces = (complex32 **)malloc(*ntr * sizeof(complex32 *));
for (i = 0; i &amp;lt; *ntr; i++)
    traces[i] = (complex32 *)malloc(*nsamples * sizeof(complex32));


for( n... {
    for(j ... {
        sample_r = traces[n][j].r;
        sample_i = traces[n][j].i;
    }
}&lt;/PRE&gt;

&lt;P&gt;That layout vectorizes very poorly, because each element is a complex structure. So I changed the code to the layout I posted, in order to have contiguous elements for the real and imaginary parts. My new usage is:&lt;/P&gt;

&lt;PRE class="brush:cpp;"&gt;for( n... {
    for(j....{
        sample_r = traces[n].r[j];
        sample_i = traces[n].i[j];
    }
}&lt;/PRE&gt;</description>
      <pubDate>Thu, 04 Feb 2016 08:25:00 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/Alignment-problem/m-p/1115642#M7468</guid>
      <dc:creator>unrue</dc:creator>
      <dc:date>2016-02-04T08:25:00Z</dc:date>
    </item>
    <item>
      <title>Structure of arrays</title>
      <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/Alignment-problem/m-p/1115643#M7469</link>
      <description>&lt;P&gt;A structure-of-arrays organization may be required to take advantage of AVX (256-bit) and AVX-512, whereas SSE3 has satisfactory SIMD support for the complex data type.&lt;/P&gt;</description>
      <pubDate>Thu, 04 Feb 2016 10:43:28 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/Alignment-problem/m-p/1115643#M7469</guid>
      <dc:creator>TimP</dc:creator>
      <dc:date>2016-02-04T10:43:28Z</dc:date>
    </item>
    <item>
      <title>how did it die? any screen</title>
      <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/Alignment-problem/m-p/1115644#M7470</link>
      <description>&lt;P&gt;How did it die? Do you have a screen capture as an illustration?&lt;/P&gt;

&lt;P&gt;Could you show the corresponding disassembly and register values?&lt;/P&gt;</description>
      <pubDate>Thu, 04 Feb 2016 13:22:05 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/Alignment-problem/m-p/1115644#M7470</guid>
      <dc:creator>JWong19</dc:creator>
      <dc:date>2016-02-04T13:22:05Z</dc:date>
    </item>
    <item>
      <title>Quote:Tim P. wrote:</title>
      <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/Alignment-problem/m-p/1115645#M7471</link>
      <description>&lt;BLOCKQUOTE&gt;Tim P. wrote:&lt;BR /&gt;

&lt;P&gt;A structure-of-arrays organization may be required to take advantage of AVX (256-bit) and AVX-512, whereas SSE3 has satisfactory SIMD support for the complex data type.&lt;/P&gt;
&lt;/BLOCKQUOTE&gt;

&lt;P&gt;Hi Tim. Could you explain this point better? Structure of arrays is not ever the best solution so? Apart from that question, is my alignment right?&lt;/P&gt;</description>
      <pubDate>Thu, 04 Feb 2016 13:24:00 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/Alignment-problem/m-p/1115645#M7471</guid>
      <dc:creator>unrue</dc:creator>
      <dc:date>2016-02-04T13:24:00Z</dc:date>
    </item>
    <item>
      <title>I'm agreeing that you may</title>
      <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/Alignment-problem/m-p/1115646#M7472</link>
      <description>&lt;P&gt;I'm agreeing that you may have chosen a reasonable method to support AVX optimization, but I don't see that it would have an advantage on a non-AVX CPU.&amp;nbsp; So I'm guessing you are motivated by AVX, although you didn't show enough to evaluate that question.&lt;/P&gt;</description>
      <pubDate>Thu, 04 Feb 2016 16:53:49 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/Alignment-problem/m-p/1115646#M7472</guid>
      <dc:creator>TimP</dc:creator>
      <dc:date>2016-02-04T16:53:49Z</dc:date>
    </item>
    <item>
      <title>Hi Tim, I'm developing SSE</title>
      <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/Alignment-problem/m-p/1115647#M7473</link>
      <description>&lt;P&gt;Hi Tim, I'm developing SSE and AVX versions in order to get the best performance, so I would like to know whether my alignment is correct. I still don't understand whether the alignment of the traces structure, done with _mm_malloc as in the first post, is right or not.&lt;/P&gt;</description>
      <pubDate>Fri, 05 Feb 2016 09:17:34 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/Alignment-problem/m-p/1115647#M7473</guid>
      <dc:creator>unrue</dc:creator>
      <dc:date>2016-02-05T09:17:34Z</dc:date>
    </item>
    <item>
      <title>&gt;&gt;Structure of arrays is not</title>
      <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/Alignment-problem/m-p/1115648#M7474</link>
      <description>&lt;P&gt;&amp;gt;&amp;gt;Structure of arrays is not ever the best solution so?&lt;/P&gt;

&lt;P&gt;The above is a generalized statement. TimP was referring to the special case of complex numbers. A complex number is a two-element structure with specific operational characteristics that make it somewhat compatible with AVX manipulations. See &lt;A href="http://www.codeproject.com/Articles/874396/Crunching-Numbers-with-AVX-and-AVX"&gt;http://www.codeproject.com/Articles/874396/Crunching-Numbers-with-AVX-and-AVX&lt;/A&gt;, near the bottom of the page.&lt;/P&gt;

&lt;P&gt;That article illustrates vectorization of complex multiply.&lt;/P&gt;
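
&lt;P&gt;For comparison, here is a minimal scalar sketch of an element-wise complex multiply over structure-of-arrays data, with separate real and imaginary arrays as in the traces32 layout. It is not the AVX code from the linked article; the function name complex_mul_soa and its parameters are assumptions made for this example:&lt;/P&gt;

&lt;PRE class="brush:cpp;"&gt;#include &amp;lt;stddef.h&amp;gt;

/* Element-wise complex multiply on structure-of-arrays data:
   (ar + i*ai) * (br + i*bi) = (ar*br - ai*bi) + i*(ar*bi + ai*br).
   Each product is computed into temporaries before being stored, so an
   output array may be the same array as the corresponding input. */
void complex_mul_soa(const float *ar, const float *ai,
                     const float *br, const float *bi,
                     float *out_r, float *out_i, size_t n)
{
    for (size_t j = 0; j &amp;lt; n; j++) {
        float re = ar[j] * br[j] - ai[j] * bi[j];
        float im = ar[j] * bi[j] + ai[j] * br[j];
        out_r[j] = re;
        out_i[j] = im;
    }
}&lt;/PRE&gt;

&lt;P&gt;Every array is read and written with unit stride, which gives a vectorizing compiler a straightforward loop to work with, although whether it actually vectorizes depends on the compiler and its flags.&lt;/P&gt;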

&lt;P&gt;As to whether SOA or AOS is better for vectorization, that depends on your application.&lt;/P&gt;

&lt;P&gt;Jim Dempsey&lt;/P&gt;</description>
      <pubDate>Fri, 05 Feb 2016 15:47:57 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/Alignment-problem/m-p/1115648#M7474</guid>
      <dc:creator>jimdempseyatthecove</dc:creator>
      <dc:date>2016-02-05T15:47:57Z</dc:date>
    </item>
  </channel>
</rss>

