<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic combining pthread and offload in Software Archive</title>
    <link>https://community.intel.com/t5/Software-Archive/combining-pthread-and-offload/m-p/918008#M12985</link>
    <description>&lt;P&gt;Hi,&lt;/P&gt;
&lt;P&gt;&amp;nbsp; &amp;nbsp; I have some old code written with pthreads. I want to simply add some #pragma directives and offload some hot loops to the MIC, but I ran into some problems. For example, the following code creates 5 threads, and each of them sums all the numbers in pArray. I see two problems: 1) only one thread produces the correct sum, and the other 4 just print 0; 2) the program hangs at the "free(pArray)" statement. Any hints to explain these two problems?&lt;/P&gt;
&lt;P&gt;&amp;nbsp; &amp;nbsp; Thanks a lot!&lt;/P&gt;
&lt;P&gt;#include &amp;lt;pthread.h&amp;gt;&lt;BR /&gt;#include &amp;lt;stdio.h&amp;gt;&lt;BR /&gt;#define NUM_THREADS 5&lt;/P&gt;
&lt;P&gt;#define SIZE 1000000&lt;BR /&gt;int * pArray;&lt;/P&gt;
&lt;P&gt;void * PrintHello(void *threadid)&lt;BR /&gt;{&lt;BR /&gt; long tid;&lt;BR /&gt; tid = (long)threadid;&lt;BR /&gt; printf("Hello World! It's me, thread #%ld!\n", tid);&lt;BR /&gt; pthread_exit(NULL);&lt;BR /&gt;}&lt;/P&gt;
&lt;P&gt;void * pCount(void * threadid)&lt;BR /&gt;{&lt;BR /&gt; int i;&lt;BR /&gt; int iSum = 0;&lt;BR /&gt; &lt;BR /&gt; for(i=0; i&amp;lt;SIZE; i++ )&lt;BR /&gt; {&lt;BR /&gt; pArray[i] = pArray[i] + 1;&lt;BR /&gt; }&lt;/P&gt;
&lt;P&gt;printf("done!\n");&lt;BR /&gt; pthread_exit(NULL);&lt;BR /&gt;}&lt;/P&gt;
&lt;P&gt;int main()&lt;BR /&gt;{&lt;BR /&gt; pthread_t threads[NUM_THREADS];&lt;BR /&gt; int rc;&lt;BR /&gt; long t;&lt;BR /&gt; int i;&lt;/P&gt;
&lt;P&gt;pArray = (int *)malloc(SIZE * sizeof(int));&lt;BR /&gt; for(i=0; i&amp;lt;SIZE; i++)&lt;BR /&gt; {&lt;BR /&gt; pArray[i] = i;&lt;BR /&gt; }&lt;BR /&gt; &lt;BR /&gt; &lt;BR /&gt; for(t=0; t&amp;lt;NUM_THREADS; t++) {&lt;BR /&gt; printf("In main: creating thread %ld\n", t);&lt;BR /&gt; rc = pthread_create(&amp;amp;threads[t], NULL, pCount, (void *)t);&lt;BR /&gt; if(rc) {&lt;BR /&gt; printf("ERROR; return code from pthread_create() is %d\n", rc);&lt;BR /&gt; exit(-1);&lt;BR /&gt; }&lt;BR /&gt; }&lt;/P&gt;
&lt;P&gt;for(t=0; t&amp;lt;NUM_THREADS; t++)&lt;BR /&gt; {&lt;BR /&gt; pthread_join(threads[t], NULL);&lt;BR /&gt; }&lt;/P&gt;
&lt;P&gt;free(pArray);&lt;/P&gt;
&lt;P&gt;pthread_exit(NULL);&lt;BR /&gt;}&lt;/P&gt;</description>
    <pubDate>Tue, 11 Jun 2013 02:16:07 GMT</pubDate>
    <dc:creator>songlinhai</dc:creator>
    <dc:date>2013-06-11T02:16:07Z</dc:date>
    <item>
      <title>combining pthread and offload</title>
      <link>https://community.intel.com/t5/Software-Archive/combining-pthread-and-offload/m-p/918008#M12985</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;
&lt;P&gt;&amp;nbsp; &amp;nbsp; I have some old code written with pthreads. I want to simply add some #pragma directives and offload some hot loops to the MIC, but I ran into some problems. For example, the following code creates 5 threads, and each of them sums all the numbers in pArray. I see two problems: 1) only one thread produces the correct sum, and the other 4 just print 0; 2) the program hangs at the "free(pArray)" statement. Any hints to explain these two problems?&lt;/P&gt;
&lt;P&gt;&amp;nbsp; &amp;nbsp; Thanks a lot!&lt;/P&gt;
&lt;P&gt;#include &amp;lt;pthread.h&amp;gt;&lt;BR /&gt;#include &amp;lt;stdio.h&amp;gt;&lt;BR /&gt;#define NUM_THREADS 5&lt;/P&gt;
&lt;P&gt;#define SIZE 1000000&lt;BR /&gt;int * pArray;&lt;/P&gt;
&lt;P&gt;void * PrintHello(void *threadid)&lt;BR /&gt;{&lt;BR /&gt; long tid;&lt;BR /&gt; tid = (long)threadid;&lt;BR /&gt; printf("Hello World! It's me, thread #%ld!\n", tid);&lt;BR /&gt; pthread_exit(NULL);&lt;BR /&gt;}&lt;/P&gt;
&lt;P&gt;void * pCount(void * threadid)&lt;BR /&gt;{&lt;BR /&gt; int i;&lt;BR /&gt; int iSum = 0;&lt;BR /&gt; &lt;BR /&gt; for(i=0; i&amp;lt;SIZE; i++ )&lt;BR /&gt; {&lt;BR /&gt; pArray[i] = pArray[i] + 1;&lt;BR /&gt; }&lt;/P&gt;
&lt;P&gt;printf("done!\n");&lt;BR /&gt; pthread_exit(NULL);&lt;BR /&gt;}&lt;/P&gt;
&lt;P&gt;int main()&lt;BR /&gt;{&lt;BR /&gt; pthread_t threads[NUM_THREADS];&lt;BR /&gt; int rc;&lt;BR /&gt; long t;&lt;BR /&gt; int i;&lt;/P&gt;
&lt;P&gt;pArray = (int *)malloc(SIZE * sizeof(int));&lt;BR /&gt; for(i=0; i&amp;lt;SIZE; i++)&lt;BR /&gt; {&lt;BR /&gt; pArray[i] = i;&lt;BR /&gt; }&lt;BR /&gt; &lt;BR /&gt; &lt;BR /&gt; for(t=0; t&amp;lt;NUM_THREADS; t++) {&lt;BR /&gt; printf("In main: creating thread %ld\n", t);&lt;BR /&gt; rc = pthread_create(&amp;amp;threads[t], NULL, pCount, (void *)t);&lt;BR /&gt; if(rc) {&lt;BR /&gt; printf("ERROR; return code from pthread_create() is %d\n", rc);&lt;BR /&gt; exit(-1);&lt;BR /&gt; }&lt;BR /&gt; }&lt;/P&gt;
&lt;P&gt;for(t=0; t&amp;lt;NUM_THREADS; t++)&lt;BR /&gt; {&lt;BR /&gt; pthread_join(threads[t], NULL);&lt;BR /&gt; }&lt;/P&gt;
&lt;P&gt;free(pArray);&lt;/P&gt;
&lt;P&gt;pthread_exit(NULL);&lt;BR /&gt;}&lt;/P&gt;</description>
      <pubDate>Tue, 11 Jun 2013 02:16:07 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Archive/combining-pthread-and-offload/m-p/918008#M12985</guid>
      <dc:creator>songlinhai</dc:creator>
      <dc:date>2013-06-11T02:16:07Z</dc:date>
    </item>
    <item>
      <title>I forgot to add #pragma</title>
      <link>https://community.intel.com/t5/Software-Archive/combining-pthread-and-offload/m-p/918009#M12986</link>
      <description>&lt;P&gt;I forgot to add #pragma offload in my post. The code I tried is attached to this post.&lt;/P&gt;</description>
      <pubDate>Tue, 11 Jun 2013 02:18:25 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Archive/combining-pthread-and-offload/m-p/918009#M12986</guid>
      <dc:creator>songlinhai</dc:creator>
      <dc:date>2013-06-11T02:18:25Z</dc:date>
    </item>
    <item>
      <title>#pragma offload target(mic:0)</title>
      <link>https://community.intel.com/t5/Software-Archive/combining-pthread-and-offload/m-p/918010#M12987</link>
      <description>&lt;P&gt;#pragma offload target(mic:0) \&lt;BR /&gt; inout(pArray:length(SIZE)) &lt;BR /&gt; #pragma omp parallel for private(i) num_threads(100) reduction(+:iSum)&lt;BR /&gt; for(i=0; i&amp;lt;SIZE; i++ )&lt;BR /&gt; {&lt;BR /&gt; iSum += pArray[i];&lt;BR /&gt; }&lt;/P&gt;
&lt;P&gt;I attached the wrong program again. Sorry about these two mistakes.&lt;/P&gt;</description>
      <pubDate>Tue, 11 Jun 2013 02:20:15 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Archive/combining-pthread-and-offload/m-p/918010#M12987</guid>
      <dc:creator>songlinhai</dc:creator>
      <dc:date>2013-06-11T02:20:15Z</dc:date>
    </item>
    <item>
      <title>I am still not quite sure</title>
      <link>https://community.intel.com/t5/Software-Archive/combining-pthread-and-offload/m-p/918011#M12988</link>
      <description>&lt;P&gt;I am still not quite sure what you are trying to accomplish with this example. Could you please give us some background and the correct code for your example?&lt;/P&gt;</description>
      <pubDate>Tue, 11 Jun 2013 20:04:35 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Archive/combining-pthread-and-offload/m-p/918011#M12988</guid>
      <dc:creator>Sumedh_N_Intel</dc:creator>
      <dc:date>2013-06-11T20:04:35Z</dc:date>
    </item>
    <item>
      <title>Thanks a lot for the reply!</title>
      <link>https://community.intel.com/t5/Software-Archive/combining-pthread-and-offload/m-p/918012#M12989</link>
      <description>&lt;P&gt;Thanks a lot for the reply!&lt;/P&gt;
&lt;P&gt;I just want to test how to mix pthreads and offload. The "correct" code I am using is as follows:&lt;/P&gt;
&lt;P&gt;&lt;BR /&gt;#include &amp;lt;pthread.h&amp;gt;&lt;BR /&gt;#include &amp;lt;stdio.h&amp;gt;&lt;BR /&gt;#define NUM_THREADS 5&lt;/P&gt;
&lt;P&gt;#define SIZE 1000000&lt;BR /&gt;__attribute__((target(mic))) int * pArray;&lt;/P&gt;
&lt;P&gt;void * PrintHello(void *threadid)&lt;BR /&gt;{&lt;BR /&gt; long tid;&lt;BR /&gt; tid = (long)threadid;&lt;BR /&gt; printf("Hello World! It's me, thread #%ld!\n", tid);&lt;BR /&gt; pthread_exit(NULL);&lt;BR /&gt;}&lt;/P&gt;
&lt;P&gt;void * pCount(void * threadid)&lt;BR /&gt;{&lt;BR /&gt; int i;&lt;BR /&gt; int iSum = 0;&lt;/P&gt;
&lt;P&gt;#pragma offload target(mic:0) \&lt;BR /&gt; inout(pArray:length(SIZE)) &lt;BR /&gt; #pragma omp parallel for private(i) num_threads(100) reduction(+:iSum)&lt;BR /&gt; for(i=0; i&amp;lt;SIZE; i++ )&lt;BR /&gt; {&lt;BR /&gt; iSum += pArray[i];&lt;BR /&gt; }&lt;/P&gt;
&lt;P&gt;printf("%d\n", iSum);&lt;BR /&gt; pthread_exit(NULL);&lt;BR /&gt;}&lt;/P&gt;
&lt;P&gt;int main()&lt;BR /&gt;{&lt;BR /&gt; pthread_t threads[NUM_THREADS];&lt;BR /&gt; int rc;&lt;BR /&gt; long t;&lt;BR /&gt; int i;&lt;/P&gt;
&lt;P&gt;pArray = (int *)malloc(SIZE * sizeof(int));&lt;BR /&gt; for(i=0; i&amp;lt;SIZE; i++)&lt;BR /&gt; {&lt;BR /&gt; pArray[i] = i;&lt;BR /&gt; }&lt;/P&gt;
&lt;P&gt;&lt;BR /&gt; for(t=0; t&amp;lt;NUM_THREADS; t++) {&lt;BR /&gt; printf("In main: creating thread %ld\n", t);&lt;BR /&gt; rc = pthread_create(&amp;amp;threads[t], NULL, pCount, (void *)t);&lt;BR /&gt; if(rc) {&lt;BR /&gt; printf("ERROR; return code from pthread_create() is %d\n", rc);&lt;BR /&gt; exit(-1);&lt;BR /&gt; }&lt;BR /&gt; }&lt;/P&gt;
&lt;P&gt;for(t=0; t&amp;lt;NUM_THREADS; t++)&lt;BR /&gt; {&lt;BR /&gt; pthread_join(threads[t], NULL);&lt;BR /&gt; }&lt;/P&gt;
&lt;P&gt;printf("after join\n");&lt;/P&gt;
&lt;P&gt;free(pArray);&lt;/P&gt;
&lt;P&gt;printf("after free");&lt;BR /&gt; pthread_exit(NULL);&lt;BR /&gt;}&lt;/P&gt;
&lt;P&gt;The program hangs at the "free(pArray)" statement. I do not understand why it hangs.&lt;/P&gt;
&lt;P&gt;Best,&lt;/P&gt;</description>
      <pubDate>Tue, 11 Jun 2013 20:17:55 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Archive/combining-pthread-and-offload/m-p/918012#M12989</guid>
      <dc:creator>songlinhai</dc:creator>
      <dc:date>2013-06-11T20:17:55Z</dc:date>
    </item>
    <item>
      <title>Only one thread in your code</title>
      <link>https://community.intel.com/t5/Software-Archive/combining-pthread-and-offload/m-p/918013#M12990</link>
      <description>&lt;P&gt;Only one thread in your code provides the correct answer because there is a race condition in your code. The offload runtime associates the host-side and coprocessor-side arrays based on the host-side pointers. In your case, since every thread uses the same host-side array, the offload runtime does not create a separate copy of the array for each of the offloads (five offloads in this case: one offload for each pthread). Because the offload runtime is trying to allocate the same array again and again, only the first thread succeeds, whereas the others fail. This results in the incorrect answers.&lt;/P&gt;
&lt;P&gt;If you reorder your code such that the allocation and the free of the array on the coprocessor each happen only once, then all your threads will report the correct answer. Your code should look similar to this:&lt;/P&gt;
&lt;P&gt;[cpp]&lt;/P&gt;
&lt;P&gt;#include &amp;lt;stdlib.h&amp;gt;&lt;BR /&gt;#include &amp;lt;pthread.h&amp;gt;&lt;BR /&gt;#include &amp;lt;stdio.h&amp;gt;&lt;BR /&gt;#define NUM_THREADS 5&lt;/P&gt;
&lt;P&gt;#define SIZE 1000000&lt;BR /&gt;__attribute__((target(mic))) int * pArray;&lt;/P&gt;
&lt;P&gt;void * PrintHello(void *threadid)&lt;BR /&gt;{&lt;BR /&gt;long tid;&lt;BR /&gt;tid = (long)threadid;&lt;BR /&gt;printf("Hello World! It's me, thread #%ld!\n", tid);&lt;BR /&gt;pthread_exit(NULL);&lt;BR /&gt;}&lt;/P&gt;
&lt;P&gt;void * pCount(void * threadid)&lt;BR /&gt;{&lt;BR /&gt;int i;&lt;BR /&gt;int iSum = 0;&lt;/P&gt;
&lt;P&gt;#pragma offload target(mic:0) \&lt;BR /&gt;in(pArray:length(0) alloc_if(0) free_if(0))&lt;BR /&gt;#pragma omp parallel for private(i) num_threads(100) reduction(+:iSum)&lt;BR /&gt;for(i=0; i&amp;lt;SIZE; i++ )&lt;BR /&gt;{&lt;BR /&gt;iSum += pArray[i];&lt;BR /&gt;}&lt;/P&gt;
&lt;P&gt;printf("%d\n", iSum);&lt;BR /&gt;pthread_exit(NULL);&lt;BR /&gt;}&lt;/P&gt;
&lt;P&gt;int main()&lt;BR /&gt;{&lt;BR /&gt;pthread_t threads[NUM_THREADS];&lt;BR /&gt;int rc;&lt;BR /&gt;long t;&lt;BR /&gt;int i;&lt;/P&gt;
&lt;P&gt;pArray = (int *)malloc(SIZE * sizeof(int));&lt;BR /&gt;for(i=0; i&amp;lt;SIZE; i++)&lt;BR /&gt;{&lt;BR /&gt;pArray[i] = i;&lt;BR /&gt;}&lt;/P&gt;
&lt;P&gt;#pragma offload_transfer target(mic:0) in(pArray:length(SIZE) alloc_if(1) free_if(0))&lt;/P&gt;
&lt;P&gt;for(t=0; t&amp;lt;NUM_THREADS; t++) {&lt;BR /&gt;printf("In main: creating thread %ld\n", t);&lt;BR /&gt;rc = pthread_create(&amp;amp;threads[t], NULL, pCount, (void *)t);&lt;BR /&gt;if(rc) {&lt;BR /&gt;printf("ERROR; return code from pthread_create() is %d\n", rc);&lt;/P&gt;
&lt;P&gt;exit(-1);&lt;BR /&gt;}&lt;BR /&gt;}&lt;/P&gt;
&lt;P&gt;for(t=0; t&amp;lt;NUM_THREADS; t++)&lt;BR /&gt;{&lt;BR /&gt;pthread_join(threads[t], NULL);&lt;BR /&gt;}&lt;/P&gt;
&lt;P&gt;#pragma offload_transfer target(mic:0) out(pArray:length(SIZE) alloc_if(0) free_if(1))&lt;/P&gt;
&lt;P&gt;printf("after join\n");&lt;/P&gt;
&lt;P&gt;free(pArray);&lt;/P&gt;
&lt;P&gt;printf("after free");&lt;BR /&gt;//pthread_exit(NULL);&lt;BR /&gt;}&lt;/P&gt;
&lt;P&gt;[/cpp]&lt;/P&gt;
&lt;P&gt;I must admit that I am unsure why you are spawning 5 pthreads and starting 5 offloads simultaneously on the same coprocessor. Spawning multiple threads on the host would make more sense if each thread were offloading to a different coprocessor. You should also note that, through your 5 offloads of 100 OpenMP threads each, you are trying to spawn about 500 threads, which is more than the number of hardware threads available on the coprocessor.&lt;/P&gt;
&lt;P&gt;Lastly, on further inspection I noticed that your code hangs at the pthread_exit() and not the free. If you comment it out then your code will work just fine. However, I am still unsure of why this is causing your code to hang. I will investigate further and get back to you with what I find.&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Wed, 12 Jun 2013 14:34:59 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Archive/combining-pthread-and-offload/m-p/918013#M12990</guid>
      <dc:creator>Sumedh_N_Intel</dc:creator>
      <dc:date>2013-06-12T14:34:59Z</dc:date>
    </item>
  </channel>
</rss>

