<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: More threads than sections ? in Intel® Moderncode for Parallel Architectures</title>
    <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/More-threads-than-sections/m-p/868024#M2763</link>
    <description>&lt;DIV style="margin:0px;"&gt;
&lt;DIV id="quote_reply" style="margin-top: 5px; width: 100%;"&gt;
&lt;DIV style="margin-left:2px;margin-right:2px;"&gt;Quoting - &lt;A href="https://community.intel.com/en-us/profile/336004"&gt;Robert Reed (Intel)&lt;/A&gt;&lt;/DIV&gt;
&lt;DIV style="background-color:#E5E5E5; padding:5px;border: 1px; border-style: inset;margin-left:2px;margin-right:2px;"&gt;&lt;EM&gt;&lt;BR /&gt;As others have suggested before me, the code as written would only take advantage of three of the 24 HW threads available on your machine. Here's chapter and verse from the OpenMP 3.0 specification:&lt;BR /&gt;&lt;BR /&gt;
&lt;BLOCKQUOTE&gt;
&lt;P&gt;Each structured block is executed &lt;STRONG&gt;once by one of the threads&lt;/STRONG&gt; in the team in the context of its implicit task.&lt;/P&gt;
&lt;/BLOCKQUOTE&gt;
&lt;BR /&gt;Also previously mentioned, the natural thing to look at is whether the elements of &lt;EM&gt;vx, vy&lt;/EM&gt; and &lt;EM&gt;vz&lt;/EM&gt; can be computed in parallel. Perhaps that &lt;EM&gt;for&lt;/EM&gt; loop cited in your post could be wrapped in an &lt;EM&gt;omp parallel for&lt;/EM&gt; construct? It would require that each of the array elements could be computed independently and in any order, but the &lt;EM&gt;parallel for&lt;/EM&gt; could use all 24 of your HW threads if such a computational organization is possible. If that works, I would start with the parallelization of the &lt;EM&gt;for loop&lt;/EM&gt; in &lt;EM&gt;computeV()&lt;/EM&gt; and skip the sections until I had the loop parallelization working.&lt;/EM&gt;&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;BR /&gt;&lt;BR /&gt;&lt;STRONG&gt;&lt;SPAN style="font-size: large;"&gt;thank you all for your help !&lt;/SPAN&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN style="color: #000000;"&gt;sorry for my wrong click for rating&lt;/SPAN&gt;.&lt;/STRONG&gt;</description>
    <pubDate>Fri, 18 Sep 2009 23:04:30 GMT</pubDate>
    <dc:creator>afd_lml</dc:creator>
    <dc:date>2009-09-18T23:04:30Z</dc:date>
    <item>
      <title>More threads than sections ?</title>
      <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/More-threads-than-sections/m-p/868020#M2759</link>
      <description>&lt;P&gt;&lt;BR /&gt;My program code is listed in the following:&lt;/P&gt;
&lt;P&gt;void mv(void)&lt;BR /&gt;{&lt;BR /&gt; double vx[size1]; // size1 = 10000&lt;BR /&gt; double vy[size2]; // size2 = 10000&lt;BR /&gt; double vz[size3]; // size3 = 10000&lt;/P&gt;
&lt;P&gt;// I must compute vx, vy, and vz separately&lt;BR /&gt;computeV(vx); // compute the array vx&lt;BR /&gt; computeV(vy);  // compute the array vy&lt;BR /&gt; computeV(vz); // compute the array vz&lt;BR /&gt;&lt;BR /&gt; // sum up vx, vy, vz&lt;BR /&gt; ...................................&lt;BR /&gt;}&lt;/P&gt;
&lt;P&gt;void computeV(double v[])&lt;BR /&gt;{&lt;BR /&gt; //&lt;STRONG&gt;most of the computation is carried out in this function; it is a heavy CPU burden.&lt;/STRONG&gt;&lt;BR /&gt; //for example, V is calculated with 4 fast Fourier transforms (using Intel MKL) and matrix-vector multiplication, like this&lt;BR /&gt;for(int i=0; i&amp;lt;N; i++) v[i] = ........&lt;BR /&gt; ........................................&lt;BR /&gt;}&lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;My workstation has a &lt;STRONG&gt;24-core&lt;/STRONG&gt; Xeon 7400. If I simply use OpenMP as in the following code, are &lt;STRONG&gt;only 3 cores busy&lt;/STRONG&gt;? In other words, are 21 cores idle? &lt;STRONG&gt;What happens if the number of threads and the number of sections are different, with more threads than sections?&lt;/STRONG&gt; How can I obtain the best performance from my code?&lt;BR /&gt;&lt;BR /&gt;void mv(void)&lt;BR /&gt;{&lt;BR /&gt; double vx[size1]; // size1 = 100000&lt;BR /&gt; double vy[size2]; // size2 = 100000&lt;BR /&gt; double vz[size3]; // size3 = 100000&lt;/P&gt;
&lt;P&gt; #pragma omp parallel sections&lt;BR /&gt; {&lt;BR /&gt; #pragma omp section&lt;BR /&gt; computeV(vx); // compute the array vx&lt;/P&gt;
&lt;P&gt; #pragma omp section&lt;BR /&gt; computeV(vy); // compute the array vy&lt;BR /&gt;&lt;BR /&gt; #pragma omp section&lt;BR /&gt; computeV(vz); // compute the array vz&lt;BR /&gt; }&lt;BR /&gt;&lt;BR /&gt; // sum up vx, vy, vz&lt;BR /&gt; ...................................&lt;BR /&gt;}&lt;BR /&gt;&lt;BR /&gt;Could anyone help me? Many thanks.&lt;/P&gt;</description>
      <pubDate>Fri, 18 Sep 2009 13:10:20 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/More-threads-than-sections/m-p/868020#M2759</guid>
      <dc:creator>afd_lml</dc:creator>
      <dc:date>2009-09-18T13:10:20Z</dc:date>
    </item>
    <item>
      <title>Re: More threads than sections ?</title>
      <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/More-threads-than-sections/m-p/868021#M2760</link>
      <description>&lt;DIV style="margin:0px;"&gt;&lt;/DIV&gt;
hi,&lt;BR /&gt;&lt;BR /&gt;don't take my word for it (beginner myself) but the trivial 3-way parallelization you implemented is in fact limited to 3 cores.&lt;BR /&gt;your function computeV can theoretically be parallelized further (especially if it contains a simple enough, outermost master loop) but you have to analyze data dependency for that and eliminate shared writes. &lt;BR /&gt;if it's all about performance for a specific example, also try auto-vectorization and auto-parallelization first and see what they tell/give you.&lt;BR /&gt;&lt;BR /&gt;cheers,&lt;BR /&gt;andreas &lt;BR /&gt;&lt;BR /&gt;</description>
      <pubDate>Fri, 18 Sep 2009 13:25:32 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/More-threads-than-sections/m-p/868021#M2760</guid>
      <dc:creator>gilthe</dc:creator>
      <dc:date>2009-09-18T13:25:32Z</dc:date>
    </item>
    <item>
      <title>Re: More threads than sections ?</title>
      <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/More-threads-than-sections/m-p/868022#M2761</link>
      <description>&lt;DIV style="margin: 0px; height: auto;"&gt;&lt;/DIV&gt;
&lt;BR /&gt;In the above case the compiler can see you have 3 sections and "should" be able to schedule just 3 threads. This said, your program may not always clearly expose to the compiler an appropriate number of threads to use. For this there is the num_threads(n) clause you can add:&lt;BR /&gt;&lt;BR /&gt;#pragma omp parallel sections num_threads(3)&lt;BR /&gt;&lt;BR /&gt;Also, you may (or may not) find it beneficial to request fewer threads than sections (e.g. when in nested parallel regions).&lt;BR /&gt;&lt;BR /&gt;Jim Dempsey</description>
      <pubDate>Fri, 18 Sep 2009 14:11:00 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/More-threads-than-sections/m-p/868022#M2761</guid>
      <dc:creator>jimdempseyatthecove</dc:creator>
      <dc:date>2009-09-18T14:11:00Z</dc:date>
    </item>
    <item>
      <title>Re: More threads than sections ?</title>
      <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/More-threads-than-sections/m-p/868023#M2762</link>
      <description>&lt;DIV style="margin:0px;"&gt;
&lt;DIV id="quote_reply" style="margin-top: 5px; width: 100%;"&gt;
&lt;DIV style="margin-left:2px;margin-right:2px;"&gt;Quoting - &lt;A href="https://community.intel.com/en-us/profile/441618"&gt;afd.lml&lt;/A&gt;&lt;EM&gt;&lt;/EM&gt;&lt;/DIV&gt;
&lt;DIV style="background-color:#E5E5E5; padding:5px;border: 1px; border-style: inset;margin-left:2px;margin-right:2px;"&gt;
&lt;P&gt;My program code is listed in the following:&lt;/P&gt;
&lt;P&gt;[snip]&lt;BR /&gt;&lt;BR /&gt;My workstation has a &lt;STRONG&gt;24-core&lt;/STRONG&gt; Xeon 7400. If I simply use OpenMP as in the following code, are &lt;STRONG&gt;only 3 cores busy&lt;/STRONG&gt;? In other words, are 21 cores idle? &lt;STRONG&gt;What happens if the number of threads and the number of sections are different, with more threads than sections?&lt;/STRONG&gt; How can I obtain the best performance from my code?&lt;/P&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;BR /&gt;As others have suggested before me, the code as written would only take advantage of three of the 24 HW threads available on your machine. Here's chapter and verse from the OpenMP 3.0 specification:&lt;BR /&gt;&lt;BR /&gt;
&lt;BLOCKQUOTE&gt;
&lt;P&gt;Each structured block is executed &lt;STRONG&gt;once by one of the threads&lt;/STRONG&gt; in the team in the context of its implicit task.&lt;/P&gt;
&lt;/BLOCKQUOTE&gt;
&lt;BR /&gt;Also previously mentioned, the natural thing to look at is whether the elements of &lt;EM&gt;vx, vy&lt;/EM&gt; and &lt;EM&gt;vz&lt;/EM&gt; can be computed in parallel. Perhaps that &lt;EM&gt;for&lt;/EM&gt; loop cited in your post could be wrapped in an &lt;EM&gt;omp parallel for&lt;/EM&gt; construct? It would require that each of the array elements could be computed independently and in any order, but the &lt;EM&gt;parallel for&lt;/EM&gt; could use all 24 of your HW threads if such a computational organization is possible. If that works, I would start with the parallelization of the &lt;EM&gt;for loop&lt;/EM&gt; in &lt;EM&gt;computeV()&lt;/EM&gt; and skip the sections until I had the loop parallelization working.</description>
      <pubDate>Fri, 18 Sep 2009 19:03:27 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/More-threads-than-sections/m-p/868023#M2762</guid>
      <dc:creator>robert-reed</dc:creator>
      <dc:date>2009-09-18T19:03:27Z</dc:date>
    </item>
    <item>
      <title>Re: More threads than sections ?</title>
      <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/More-threads-than-sections/m-p/868024#M2763</link>
      <description>&lt;DIV style="margin:0px;"&gt;
&lt;DIV id="quote_reply" style="margin-top: 5px; width: 100%;"&gt;
&lt;DIV style="margin-left:2px;margin-right:2px;"&gt;Quoting - &lt;A href="https://community.intel.com/en-us/profile/336004"&gt;Robert Reed (Intel)&lt;/A&gt;&lt;/DIV&gt;
&lt;DIV style="background-color:#E5E5E5; padding:5px;border: 1px; border-style: inset;margin-left:2px;margin-right:2px;"&gt;&lt;EM&gt;&lt;BR /&gt;As others have suggested before me, the code as written would only take advantage of three of the 24 HW threads available on your machine. Here's chapter and verse from the OpenMP 3.0 specification:&lt;BR /&gt;&lt;BR /&gt;
&lt;BLOCKQUOTE&gt;
&lt;P&gt;Each structured block is executed &lt;STRONG&gt;once by one of the threads&lt;/STRONG&gt; in the team in the context of its implicit task.&lt;/P&gt;
&lt;/BLOCKQUOTE&gt;
&lt;BR /&gt;Also previously mentioned, the natural thing to look at is whether the elements of &lt;EM&gt;vx, vy&lt;/EM&gt; and &lt;EM&gt;vz&lt;/EM&gt; can be computed in parallel. Perhaps that &lt;EM&gt;for&lt;/EM&gt; loop cited in your post could be wrapped in an &lt;EM&gt;omp parallel for&lt;/EM&gt; construct? It would require that each of the array elements could be computed independently and in any order, but the &lt;EM&gt;parallel for&lt;/EM&gt; could use all 24 of your HW threads if such a computational organization is possible. If that works, I would start with the parallelization of the &lt;EM&gt;for loop&lt;/EM&gt; in &lt;EM&gt;computeV()&lt;/EM&gt; and skip the sections until I had the loop parallelization working.&lt;/EM&gt;&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;BR /&gt;&lt;BR /&gt;&lt;STRONG&gt;&lt;SPAN style="font-size: large;"&gt;thank you all for your help !&lt;/SPAN&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN style="color: #000000;"&gt;sorry for my wrong click for rating&lt;/SPAN&gt;.&lt;/STRONG&gt;</description>
      <pubDate>Fri, 18 Sep 2009 23:04:30 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/More-threads-than-sections/m-p/868024#M2763</guid>
      <dc:creator>afd_lml</dc:creator>
      <dc:date>2009-09-18T23:04:30Z</dc:date>
    </item>
  </channel>
</rss>

