<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Could you possibly post a in Software Archive</title>
    <link>https://community.intel.com/t5/Software-Archive/Data-alignment-problem/m-p/1041027#M46032</link>
    <description>&lt;P&gt;Could you possibly post a small sample code? Thanks.&lt;/P&gt;</description>
    <pubDate>Tue, 16 Jun 2015 16:34:25 GMT</pubDate>
    <dc:creator>Frances_R_Intel</dc:creator>
    <dc:date>2015-06-16T16:34:25Z</dc:date>
    <item>
      <title>Data alignment problem</title>
      <link>https://community.intel.com/t5/Software-Archive/Data-alignment-problem/m-p/1041026#M46031</link>
      <description>&lt;P&gt;Hi there&lt;/P&gt;

&lt;P&gt;I am trying to offload some computation to the MIC using "pragma", sending data addressed by a pointer p. How can I ensure that the data is aligned on the MIC after the MIC has received it? Does "__assume(p, 64)" work? I am using intrinsics to load data into the vector register file, which requires the data to be aligned.&lt;/P&gt;

&lt;P&gt;Another problem: I am trying to activate many threads for the calculation using "#pragma omp parallel for", and some arrays inside the loop must be thread-private and also 64-byte aligned.&lt;/P&gt;

&lt;P&gt;I am using "_mm_malloc()" inside the loop to ensure both, but this leads to repeated and unnecessary allocation.&lt;/P&gt;

&lt;P&gt;Thanks.&lt;/P&gt;

&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Tue, 16 Jun 2015 13:52:26 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Archive/Data-alignment-problem/m-p/1041026#M46031</guid>
      <dc:creator>Nick_L_1</dc:creator>
      <dc:date>2015-06-16T13:52:26Z</dc:date>
    </item>
    <item>
      <title>Could you possibly post a</title>
      <link>https://community.intel.com/t5/Software-Archive/Data-alignment-problem/m-p/1041027#M46032</link>
      <description>&lt;P&gt;Could you possibly post a small sample code? Thanks.&lt;/P&gt;</description>
      <pubDate>Tue, 16 Jun 2015 16:34:25 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Archive/Data-alignment-problem/m-p/1041027#M46032</guid>
      <dc:creator>Frances_R_Intel</dc:creator>
      <dc:date>2015-06-16T16:34:25Z</dc:date>
    </item>
    <item>
      <title>Quote:Frances Roth (Intel)</title>
      <link>https://community.intel.com/t5/Software-Archive/Data-alignment-problem/m-p/1041028#M46033</link>
      <description>&lt;P&gt;&lt;/P&gt;&lt;BLOCKQUOTE&gt;Frances Roth (Intel) wrote:&lt;BR /&gt;&lt;P&gt;&lt;/P&gt;

&lt;P&gt;Could you possibly post a small sample code? Thanks.&lt;/P&gt;

&lt;P&gt;&lt;/P&gt;&lt;/BLOCKQUOTE&gt;&lt;P&gt;&lt;/P&gt;

&lt;P&gt;&lt;STRONG&gt;In the main function:&lt;/STRONG&gt;&lt;/P&gt;

&lt;DIV&gt;....... &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;&amp;nbsp;&lt;/DIV&gt;

&lt;DIV&gt;double * p;&lt;/DIV&gt;

&lt;DIV&gt;p = (double * )malloc(sizeof(double)*1024);&lt;/DIV&gt;

&lt;DIV&gt;#pragma offload target(mic:0) in(p:length(128))&lt;/DIV&gt;

&lt;DIV&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; foo(p);&lt;/DIV&gt;

&lt;DIV&gt;.......&lt;/DIV&gt;

&lt;DIV&gt;&amp;nbsp;&lt;/DIV&gt;

&lt;P&gt;&lt;STRONG&gt;The data addressed by p is transferred to the MIC, and the function foo is defined like this:&lt;/STRONG&gt;&lt;/P&gt;

&lt;DIV&gt;__attribute__((target(mic)))void foo( double * p)&lt;/DIV&gt;

&lt;DIV&gt;{&lt;/DIV&gt;

&lt;DIV&gt;#ifdef __MIC__&lt;/DIV&gt;

&lt;DIV&gt;......&lt;/DIV&gt;

&lt;DIV&gt;long long iter;&lt;/DIV&gt;

&lt;DIV&gt;#pragma omp parallel for private(iter)&lt;/DIV&gt;

&lt;DIV&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; for(iter = 0 ; iter &amp;lt; N ; iter ++)&lt;/DIV&gt;

&lt;DIV&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;&amp;nbsp;{&lt;/DIV&gt;

&lt;DIV&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;&amp;nbsp;__m512d _A, _B;&lt;/DIV&gt;

&lt;DIV&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;&amp;nbsp;double * p1;&lt;/DIV&gt;

&lt;DIV&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;&amp;nbsp;p1 = (double * )_mm_malloc(sizeof(double)*1024, 512);&amp;nbsp; &amp;nbsp; &amp;nbsp; &lt;STRONG&gt;//p1 has to be thread-private&lt;/STRONG&gt;&lt;/DIV&gt;

&lt;DIV&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;&amp;nbsp;......&lt;/DIV&gt;

&lt;DIV&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;&amp;nbsp;_A = _mm512_load_pd((void*)p); &amp;nbsp; &amp;nbsp; &amp;nbsp; &lt;STRONG&gt;//p has to be aligned&lt;/STRONG&gt;&lt;/DIV&gt;

&lt;DIV&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;&amp;nbsp;_B = _mm512_load_pd((void*)p1);&amp;nbsp; &amp;nbsp; &amp;nbsp; &lt;STRONG&gt;//p1 has to be aligned&lt;/STRONG&gt;&lt;/DIV&gt;

&lt;DIV&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;&amp;nbsp;......&lt;/DIV&gt;

&lt;DIV&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;&amp;nbsp;/* Calculations */&lt;/DIV&gt;

&lt;DIV&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;&amp;nbsp;......&lt;/DIV&gt;

&lt;DIV&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; _mm_free(p1);&lt;/DIV&gt;

&lt;DIV&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;&amp;nbsp;}&lt;/DIV&gt;

&lt;DIV&gt;#endif&lt;/DIV&gt;

&lt;DIV&gt;}&lt;/DIV&gt;

&lt;DIV&gt;&amp;nbsp;&lt;/DIV&gt;

&lt;P&gt;&lt;STRONG&gt;Thus p1 is allocated repeatedly inside the loop to make sure it's thread-private, while p1 has to be aligned.&lt;/STRONG&gt;&lt;/P&gt;

&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Wed, 17 Jun 2015 09:05:38 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Archive/Data-alignment-problem/m-p/1041028#M46033</guid>
      <dc:creator>Nick_L_1</dc:creator>
      <dc:date>2015-06-17T09:05:38Z</dc:date>
    </item>
    <item>
      <title>At the very least you should</title>
      <link>https://community.intel.com/t5/Software-Archive/Data-alignment-problem/m-p/1041029#M46034</link>
      <description>&lt;P&gt;At the very least you should structure that more like this (which allocates once per thread, rather than once per iteration)&lt;/P&gt;

&lt;PRE class="brush:cpp;"&gt;#pragma omp parallel
{
    long long iter;     // Though does it *really* need to be 64 bits!? How many iterations do you have?
                        // 64-bit indexes are likely inefficient.
    double * p1 = (double *) _mm_malloc (sizeof(double)*1024, 512);

#pragma omp for
    for (iter=0; iter&amp;lt;N; iter++)
    {
        __m512d _A;
       ... etc ...
    }

    _mm_free (p1);
}&lt;/PRE&gt;
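
&lt;P&gt;For reference, the structure above can be sketched as a small self-contained program. This is only an illustration under stated assumptions: it assumes an x86 compiler whose immintrin.h provides _mm_malloc and _mm_free, and the names SCRATCH_LEN, ALIGN_BYTES and is_aligned, the iteration count, and the loop body are hypothetical stand-ins rather than anything from the original code.&lt;/P&gt;

&lt;PRE class="brush:cpp;"&gt;#include &amp;lt;stdint.h&amp;gt;
#include &amp;lt;stdio.h&amp;gt;
#include &amp;lt;immintrin.h&amp;gt;   /* assumed to provide _mm_malloc and _mm_free */

#define SCRATCH_LEN 1024  /* hypothetical scratch size, as in the post above */
#define ALIGN_BYTES 64    /* one 512-bit vector is 64 bytes */

/* Returns 1 if ptr is a multiple of ALIGN_BYTES, 0 otherwise. */
static int is_aligned(const void *ptr)
{
    return ((uintptr_t)ptr % ALIGN_BYTES) == 0;
}

int main(void)
{
    const long long n = 100000;  /* hypothetical iteration count */
    int bad = 0;

#pragma omp parallel reduction(+:bad)
    {
        long long iter;
        /* Allocated once per thread, so it is thread-private. */
        double *p1 = (double *)_mm_malloc(sizeof(double) * SCRATCH_LEN, ALIGN_BYTES);
        int ok = (p1 != NULL) ? is_aligned(p1) : 0;

        if (!ok)
            bad += 1;

        /* Every thread must reach the worksharing loop, so the check is inside it. */
#pragma omp for
        for (iter = 0; iter &amp;lt; n; iter++)
        {
            if (ok)
                p1[iter % SCRATCH_LEN] = (double)iter;  /* stand-in for the real work */
        }

        if (p1 != NULL)
            _mm_free(p1);
    }

    printf("threads without an aligned buffer: %d\n", bad);
    return bad != 0;
}&lt;/PRE&gt;

&lt;P&gt;Build it with the compiler's OpenMP option enabled; without it the pragmas are ignored and the block simply runs once on a single thread. An alignment of 64 bytes is enough for _mm512_load_pd. The 512 passed to _mm_malloc above is also a byte count, so it over-aligns but still works.&lt;/P&gt;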

&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Wed, 17 Jun 2015 09:26:38 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Archive/Data-alignment-problem/m-p/1041029#M46034</guid>
      <dc:creator>James_C_Intel2</dc:creator>
      <dc:date>2015-06-17T09:26:38Z</dc:date>
    </item>
    <item>
      <title>Quote:James Cownie (Intel)</title>
      <link>https://community.intel.com/t5/Software-Archive/Data-alignment-problem/m-p/1041030#M46035</link>
      <description>&lt;P&gt;&lt;/P&gt;&lt;BLOCKQUOTE&gt;James Cownie (Intel) wrote:&lt;BR /&gt;&lt;P&gt;&lt;/P&gt;

&lt;P&gt;At the very least you should structure that more like this (which allocates once per thread, rather than once per iteration)&lt;/P&gt;

&lt;PRE class="brush:cpp;"&gt;#pragma omp parallel
{
    long long iter;     // Though does it *really* need to be 64 bits!? How many iterations do you have?
                        // 64-bit indexes are likely inefficient.
    double * p1 = (double *) _mm_malloc (sizeof(double)*1024, 512);

#pragma omp for
    for (iter=0; iter&amp;lt;N; iter++)
    {
        __m512d _A;
       ... etc ...
    }

    _mm_free (p1);
}&lt;/PRE&gt;

&lt;P&gt;&amp;nbsp;&lt;/P&gt;

&lt;P&gt;&lt;/P&gt;&lt;/BLOCKQUOTE&gt;&lt;P&gt;&lt;/P&gt;

&lt;P&gt;&amp;nbsp;&lt;/P&gt;

&lt;P&gt;I really do have that many iterations. Restructuring the code helps, thanks.&lt;/P&gt;</description>
      <pubDate>Wed, 17 Jun 2015 23:42:45 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Archive/Data-alignment-problem/m-p/1041030#M46035</guid>
      <dc:creator>Nick_L_1</dc:creator>
      <dc:date>2015-06-17T23:42:45Z</dc:date>
    </item>
  </channel>
</rss>

