<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic nested omp parallelization in Intel® Moderncode for Parallel Architectures</title>
    <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/nested-omp-parallelization/m-p/771287#M133</link>
    <description>How many hardware threads are available on your system?&lt;BR /&gt;&lt;BR /&gt;Can you provide a code sketch or sample program?&lt;BR /&gt;&lt;BR /&gt;Are you timing only the first execution of the nested calls, or several executions?&lt;BR /&gt;(Discard the first run, then average the next 5 runs or pick the smallest.)&lt;BR /&gt;&lt;BR /&gt;Jim Dempsey</description>
    <pubDate>Sat, 02 Apr 2011 12:44:41 GMT</pubDate>
    <dc:creator>jimdempseyatthecove</dc:creator>
    <dc:date>2011-04-02T12:44:41Z</dc:date>
    <item>
      <title>nested omp parallelization</title>
      <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/nested-omp-parallelization/m-p/771286#M132</link>
      <description>Hi,&lt;DIV&gt;&lt;/DIV&gt;&lt;DIV&gt;If one process launches two threads (A1 and A2), can one of those threads (say A2) launch 8 threads (B1, ..., B8) of its own, so that a total of 9 threads run in parallel?&lt;/DIV&gt;&lt;DIV&gt;&lt;/DIV&gt;&lt;DIV&gt;Currently, my simple test code shows that running A1 to completion and then running A2 (which launches 8 threads) is much faster than launching A1 and A2 simultaneously. But I am not sure whether my code does this the right way, or how to use nested OpenMP efficiently.&lt;/DIV&gt;&lt;DIV&gt;&lt;/DIV&gt;&lt;DIV&gt;Thanks,&lt;/DIV&gt;</description>
      <pubDate>Sat, 02 Apr 2011 02:33:54 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/nested-omp-parallelization/m-p/771286#M132</guid>
      <dc:creator>pilot117</dc:creator>
      <dc:date>2011-04-02T02:33:54Z</dc:date>
    </item>
    <item>
      <title>nested omp parallelization</title>
      <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/nested-omp-parallelization/m-p/771287#M133</link>
      <description>How many hardware threads are available on your system?&lt;BR /&gt;&lt;BR /&gt;Can you provide a code sketch or sample program?&lt;BR /&gt;&lt;BR /&gt;Are you timing only the first execution of the nested calls, or several executions?&lt;BR /&gt;(Discard the first run, then average the next 5 runs or pick the smallest.)&lt;BR /&gt;&lt;BR /&gt;Jim Dempsey</description>
      <pubDate>Sat, 02 Apr 2011 12:44:41 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/nested-omp-parallelization/m-p/771287#M133</guid>
      <dc:creator>jimdempseyatthecove</dc:creator>
      <dc:date>2011-04-02T12:44:41Z</dc:date>
    </item>
    <item>
      <title>nested omp parallelization</title>
      <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/nested-omp-parallelization/m-p/771288#M134</link>
      <description>Hi,&lt;BR /&gt;&lt;BR /&gt;I have a four-core CPU. The test code looks like this:&lt;BR /&gt;&lt;BR /&gt;sequential call:&lt;BR /&gt;---------------------------------------------------------------------------------------&lt;BR /&gt;
double start = omp_get_wtime();&lt;BR /&gt;
myHeavyFunc();&lt;BR /&gt;
omp_set_num_threads(4);&lt;BR /&gt;
#pragma omp parallel&lt;BR /&gt;
{&lt;BR /&gt;
    unsigned int thread_id = omp_get_thread_num();&lt;BR /&gt;
    if (thread_id == 0)&lt;BR /&gt;
        func();&lt;BR /&gt;
    if (thread_id == 1)&lt;BR /&gt;
        func();&lt;BR /&gt;
    if (thread_id == 2)&lt;BR /&gt;
        func();&lt;BR /&gt;
    if (thread_id == 3)&lt;BR /&gt;
        func();&lt;BR /&gt;
}&lt;BR /&gt;
double finish = omp_get_wtime();&lt;BR /&gt;
printf("test time is %e\n", finish - start);&lt;BR /&gt;
---------------------------------------------------------------------------------------&lt;BR /&gt;&lt;BR /&gt;nested call:&lt;BR /&gt;&lt;BR /&gt;---------------------------------------------------------------------------------------&lt;BR /&gt;
double start = omp_get_wtime();&lt;BR /&gt;
omp_set_num_threads(2);&lt;BR /&gt;
#pragma omp parallel&lt;BR /&gt;
{&lt;BR /&gt;
    unsigned int thread_id = omp_get_thread_num();&lt;BR /&gt;
    if (thread_id == 0)&lt;BR /&gt;
        myHeavyFunc();&lt;BR /&gt;
    if (thread_id == 1) {&lt;BR /&gt;
        omp_set_num_threads(4);  // team size for the inner region&lt;BR /&gt;
        #pragma omp parallel&lt;BR /&gt;
        {&lt;BR /&gt;
            unsigned int thread_id = omp_get_thread_num();  // inner-team id, starts at 0 again&lt;BR /&gt;
            if (thread_id == 0)&lt;BR /&gt;
                func();&lt;BR /&gt;
            if (thread_id == 1)&lt;BR /&gt;
                func();&lt;BR /&gt;
            if (thread_id == 2)&lt;BR /&gt;
                func();&lt;BR /&gt;
            if (thread_id == 3)&lt;BR /&gt;
                func();&lt;BR /&gt;
        }&lt;BR /&gt;
    }&lt;BR /&gt;
}&lt;BR /&gt;
double finish = omp_get_wtime();&lt;BR /&gt;
printf("test time is %e\n", finish - start);&lt;BR /&gt;
---------------------------------------------------------------------------------------&lt;BR /&gt;&lt;BR /&gt;According to the timing in the pseudocode above, the sequential call is faster. But when I use 3 threads in the inner omp region of the nested call, it gets faster, as expected (assuming that four threads occupying four cores is the best case).&lt;BR /&gt;&lt;BR /&gt;In my real code, myHeavyFunc() does nothing but launch a GPU kernel. So although it is "heavy", the work is done on the GPU side, and that thread is not supposed to occupy any CPU resources. I don't know whether the OS will set that thread aside and give the hardware resources to the other, CPU-computing threads.&lt;BR /&gt;&lt;BR /&gt;Hope this gives you a rough idea of what I am doing. Thanks for the help!</description>
      <pubDate>Sat, 02 Apr 2011 18:32:09 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/nested-omp-parallelization/m-p/771288#M134</guid>
      <dc:creator>pilot117</dc:creator>
      <dc:date>2011-04-02T18:32:09Z</dc:date>
    </item>
    <item>
      <title>nested omp parallelization</title>
      <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/nested-omp-parallelization/m-p/771289#M135</link>
      <description>Two things:&lt;BR /&gt;&lt;BR /&gt;1) How many cores (or HT hardware threads) are on your system?&lt;BR /&gt;&lt;BR /&gt;2) Add the following warm-up code in front of your timed section of code, so that creating the nested thread teams is not counted in your timing:&lt;BR /&gt;&lt;BR /&gt;---------------------------------------------------------------------------------------&lt;BR /&gt;
omp_set_num_threads(2);&lt;BR /&gt;
#pragma omp parallel&lt;BR /&gt;
{&lt;BR /&gt;
    unsigned int thread_id = omp_get_thread_num();&lt;BR /&gt;
    if (thread_id == 0)&lt;BR /&gt;
        doNothing();&lt;BR /&gt;
    if (thread_id == 1) {&lt;BR /&gt;
        omp_set_num_threads(4);&lt;BR /&gt;
        #pragma omp parallel&lt;BR /&gt;
        {&lt;BR /&gt;
            unsigned int thread_id = omp_get_thread_num();&lt;BR /&gt;
            if (thread_id == 0)&lt;BR /&gt;
                doNothing();&lt;BR /&gt;
            if (thread_id == 1)&lt;BR /&gt;
                doNothing();&lt;BR /&gt;
            if (thread_id == 2)&lt;BR /&gt;
                doNothing();&lt;BR /&gt;
            if (thread_id == 3)&lt;BR /&gt;
                doNothing();&lt;BR /&gt;
        }&lt;BR /&gt;
    }&lt;BR /&gt;
}&lt;BR /&gt;
---------------------------------------------------------------------------------------&lt;BR /&gt;&lt;BR /&gt;Now run your timed section of code.&lt;BR /&gt;&lt;BR /&gt;The next thing to do is to time each thread separately. Record the times in an array, and be wary of the reuse of thread_id: the inner team numbers its threads from 0 again.&lt;BR /&gt;&lt;BR /&gt;Jim Dempsey</description>
      <pubDate>Mon, 04 Apr 2011 14:21:00 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/nested-omp-parallelization/m-p/771289#M135</guid>
      <dc:creator>jimdempseyatthecove</dc:creator>
      <dc:date>2011-04-04T14:21:00Z</dc:date>
    </item>
  </channel>
</rss>

