<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic I'm guessing that you haven't in Intel® Moderncode for Parallel Architectures</title>
    <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/Responsive-OpenMP-Theads-in-Hybrid-Parallel-Environment/m-p/968497#M5486</link>
    <description>&lt;P&gt;I'm guessing that you haven't done anything to control affinity when you combine multiple ranks with OpenMP on a node. This is particularly important if your nodes are NUMA, where you should pin each rank to a group of cores that share a cache and take care to spread each rank's threads across separate cores (which may be difficult if HyperThreading is enabled).&lt;/P&gt;

&lt;P&gt;It's not clear to me, even after reading&lt;/P&gt;

&lt;P&gt;&lt;A href="http://www.nas.nasa.gov/hecc/support/kb/With-MVAPICH-and-Intel-OpenMP_210.html" target="_blank"&gt;http://www.nas.nasa.gov/hecc/support/kb/With-MVAPICH-and-Intel-OpenMP_210.html&lt;/A&gt;&lt;/P&gt;

&lt;P&gt;whether MVAPICH has hybrid affinity options similar to those of Intel MPI (where it works by default) or Open MPI (where you must specify it). It seems that MVAPICH is (or was 4 years ago) not designed for this, but it would work if you disabled MVAPICH affinity and set up your job to specify a separate OpenMP affinity group for each rank, using OMP_PLACES or KMP_AFFINITY.&lt;/P&gt;</description>
    <pubDate>Wed, 02 Apr 2014 13:52:18 GMT</pubDate>
    <dc:creator>TimP</dc:creator>
    <dc:date>2014-04-02T13:52:18Z</dc:date>
    <item>
      <title>Responsive OpenMP Threads in Hybrid Parallel Environment</title>
      <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/Responsive-OpenMP-Theads-in-Hybrid-Parallel-Environment/m-p/968496#M5485</link>
      <description>&lt;P&gt;I have a Fortran code that uses both MPI and OpenMP. I have profiled the code on an 8-core Windows laptop, varying the number of MPI tasks vs. OpenMP threads, and have some understanding of where performance bottlenecks for each parallel method might surface. The problem arises when I port to a Linux cluster with several 8-core nodes. Specifically, my OpenMP thread parallelism performs very poorly. Running 8 MPI tasks per node is significantly faster than 8 OpenMP threads per node (1 MPI task), but even 2 OpenMP threads + 4 MPI tasks was running very slowly, more so than I could attribute solely to thread starvation. I saw a few related posts in this area and am hoping for further insight into this issue and for recommendations. What I have tried so far ...&lt;/P&gt;

&lt;P&gt;1. setenv OMP_WAIT_POLICY active ## seems to make sense&lt;BR /&gt;
2. setenv KMP_BLOCKTIME 1 ## this is counter to what I have read, but when I set this to a large number (25000) the code is very slow&lt;BR /&gt;
3. Removed some old "unlimited" limit settings (viz., stacksize, coresize) that I have had since the "dawn of time." This also helped OpenMP thread performance significantly.&lt;/P&gt;

&lt;P&gt;It seems I am looking for ways to reasonably ensure that my OpenMP threads don't vanish between the parallel regions in the code, and to make these threads as lightweight on the system as possible. The corrections above do not seem to affect the MPI tasks. Are there any other recommendations? By the way, the MPI tasks use an MVAPICH library on a cluster with IB. The code is compiled with "-openmp" (-Qopenmp).&lt;/P&gt;

&lt;P&gt;Thank you in advance.&lt;/P&gt;</description>
      <pubDate>Wed, 02 Apr 2014 12:44:23 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/Responsive-OpenMP-Theads-in-Hybrid-Parallel-Environment/m-p/968496#M5485</guid>
      <dc:creator>Don_K_</dc:creator>
      <dc:date>2014-04-02T12:44:23Z</dc:date>
    </item>
    <item>
      <title>I'm guessing that you haven't</title>
      <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/Responsive-OpenMP-Theads-in-Hybrid-Parallel-Environment/m-p/968497#M5486</link>
      <description>&lt;P&gt;I'm guessing that you haven't done anything to control affinity when you combine multiple ranks with OpenMP on a node. This is particularly important if your nodes are NUMA, where you should pin each rank to a group of cores that share a cache and take care to spread each rank's threads across separate cores (which may be difficult if HyperThreading is enabled).&lt;/P&gt;
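
&lt;P&gt;As a rough illustration of giving each rank its own group of cores, here is a minimal Python sketch. It assumes 8 processors per node numbered 0 to 7, that adjacent numbers share a cache, and that HyperThreading is off so each number is one core; none of that is guaranteed, so check the real topology first. The function name omp_places_for_rank and its parameters are hypothetical. It only builds a string in the explicit-list form accepted by the OpenMP OMP_PLACES variable; exporting it per rank before the program starts is left to the job script.&lt;/P&gt;

```python
def omp_places_for_rank(local_rank, threads_per_rank, cpus_per_node=8):
    """Build an OMP_PLACES value that gives one rank its own block of CPUs.

    Assumption (not from any MPI library): CPUs are numbered
    0..cpus_per_node-1 and neighbouring numbers share a cache.
    """
    first = local_rank * threads_per_rank
    last = first + threads_per_rank - 1
    # Reject blocks that fall outside the node (or empty/negative blocks).
    if first not in range(cpus_per_node) or last not in range(cpus_per_node):
        raise ValueError("this rank's CPU block does not fit on the node")
    # One place per CPU, so each OpenMP thread can be bound to its own core.
    return ",".join("{%d}" % cpu for cpu in range(first, last + 1))


if __name__ == "__main__":
    # Example layout: 4 ranks per node, 2 OpenMP threads per rank.
    for rank in range(4):
        print(rank, omp_places_for_rank(rank, threads_per_rank=2))
```

&lt;P&gt;Each rank would get a disjoint list of places, so threads from different ranks never compete for the same core. This only has a chance of working if the MPI library's own pinning does not override it.&lt;/P&gt;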

&lt;P&gt;It's not clear to me, even after reading&lt;/P&gt;

&lt;P&gt;&lt;A href="http://www.nas.nasa.gov/hecc/support/kb/With-MVAPICH-and-Intel-OpenMP_210.html" target="_blank"&gt;http://www.nas.nasa.gov/hecc/support/kb/With-MVAPICH-and-Intel-OpenMP_210.html&lt;/A&gt;&lt;/P&gt;

&lt;P&gt;whether MVAPICH has hybrid affinity options similar to those of Intel MPI (where it works by default) or Open MPI (where you must specify it). It seems that MVAPICH is (or was 4 years ago) not designed for this, but it would work if you disabled MVAPICH affinity and set up your job to specify a separate OpenMP affinity group for each rank, using OMP_PLACES or KMP_AFFINITY.&lt;/P&gt;</description>
      <pubDate>Wed, 02 Apr 2014 13:52:18 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/Responsive-OpenMP-Theads-in-Hybrid-Parallel-Environment/m-p/968497#M5486</guid>
      <dc:creator>TimP</dc:creator>
      <dc:date>2014-04-02T13:52:18Z</dc:date>
    </item>
  </channel>
</rss>

