<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic MPI+OPENMP does not speed up in Intel® Moderncode for Parallel Architectures</title>
    <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/MPI-OPENMP-does-not-speed-up/m-p/934252#M4988</link>
    <description>&lt;P&gt;I'm not familiar enough with the looks of KMP_AFFINITY verbose on non-Intel CPUs (if that's what you have).&lt;/P&gt;
&lt;P&gt;Is it correct that you have a single 8-core non-Intel CPU on each node?&amp;nbsp; If so, you may need to try the effect of various choices for using 4 of those cores, such as&lt;/P&gt;
&lt;P&gt;KMP_AFFINITY="proclist=[1,3,5,7],explicit,verbose"&lt;/P&gt;
&lt;P&gt;which, on an Intel CPU, ought to come out the same as&lt;/P&gt;
&lt;P&gt;KMP_AFFINITY=scatter,1,1,verbose&lt;/P&gt;
&lt;P&gt;In my experience, you always need the quotation marks ("") around a proclist, presumably on account of the embedded punctuation.&lt;/P&gt;
&lt;P&gt;A reason for trying the odd numbered cores might be that your system assigns interrupts to even numbered ones.&amp;nbsp; I have no idea if this might be true of AMD, or why you decline to identify yours.&lt;/P&gt;
&lt;P&gt;Alternatively, if your CPU shares cache between even-odd core pairs, and your application doesn't need to be spread across all of cache, the choice you suggested may be best, if you get the syntax right.&lt;/P&gt;
&lt;P&gt;If you assign 8 threads per node, but restrict them to 4 cores by an affinity setting such as you suggest, you will likely get worse results than without affinity setting.&amp;nbsp; proclist=[0-7] would request use of all 8 cores in order.&lt;/P&gt;</description>
    <pubDate>Wed, 06 Feb 2013 16:27:35 GMT</pubDate>
    <dc:creator>TimP</dc:creator>
    <dc:date>2013-02-06T16:27:35Z</dc:date>
    <item>
      <title>MPI+OPENMP does not speed up</title>
      <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/MPI-OPENMP-does-not-speed-up/m-p/934246#M4982</link>
      <description>&lt;P&gt;Hi everyone&lt;/P&gt;
&lt;P&gt;I need help with a problem that has been bothering me for several days.&lt;/P&gt;
&lt;P&gt;When I use only OpenMP, the code works fine: the time is 70s for 4 threads and 136s for 8 threads.&lt;/P&gt;
&lt;P&gt;However, when I use MPI+OpenMP, I found the code does not speed up, i.e. the times for 1 thread and 8 threads are the same!&lt;/P&gt;
&lt;P&gt;I am using Intel Fortran, and I compiled it this way: mpif90 -openmp -check all hybpi.f90 -o hybpi&lt;/P&gt;
&lt;P&gt;I also uploaded my code; it is very simple, just computing the value of PI. &amp;nbsp;I also paste the PBS script for MPI+OpenMP.&lt;/P&gt;
&lt;P&gt;#!/bin/sh -e &lt;BR /&gt;#PBS -N thread8_hybpi&lt;BR /&gt;#PBS -e out.err &lt;BR /&gt;#PBS -o out.out &lt;BR /&gt;#PBS -l walltime=2:00:00,nodes=2:ppn=12:nogpu&lt;BR /&gt;#PBS -k oe&lt;/P&gt;
&lt;P&gt;cd $PBS_O_WORKDIR&lt;BR /&gt;cat $PBS_NODEFILE &amp;gt; nodefile&lt;BR /&gt;cat $PBS_NODEFILE | uniq &amp;gt; ./mpd_nodefile_$USER&lt;BR /&gt;export NPROCS=`wc -l mpd_nodefile_$USER |gawk '//{print $1}'`&lt;BR /&gt;export OMP_NUM_THREADS=8&lt;BR /&gt;ulimit -s hard&lt;/P&gt;
&lt;P&gt;WORKDIR=/home/siwei/module&lt;/P&gt;
&lt;P&gt;cd $WORKDIR&lt;/P&gt;
&lt;P&gt;MPIEXEC="/apps/mvapich2-1.7-r5123/build-intel/bin/mpirun"&lt;BR /&gt;mpiexec -machinefile mpd_nodefile_$USER -np $NPROCS /bin/env \&lt;BR /&gt; OMP_NUM_THREADS=8 ./hybpi&lt;/P&gt;
&lt;P&gt;&lt;/P&gt;
&lt;P&gt;I hope someone can help me!&lt;/P&gt;
&lt;P&gt;Thanks in advance!!!&lt;/P&gt;
&lt;P&gt;Siwei&lt;/P&gt;</description>
      <pubDate>Mon, 04 Feb 2013 14:55:46 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/MPI-OPENMP-does-not-speed-up/m-p/934246#M4982</guid>
      <dc:creator>Siwei_D_</dc:creator>
      <dc:date>2013-02-04T14:55:46Z</dc:date>
    </item>
    <item>
      <title>If you run multiple MPI</title>
      <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/MPI-OPENMP-does-not-speed-up/m-p/934247#M4983</link>
      <description>&lt;P&gt;If you run multiple MPI processes per node, you probably need to arrange for each MPI process to get a distinct list of cores in KMP_AFFINITY.&amp;nbsp; If your application uses all cores effectively with MPI alone, it probably won't do any better with a combination of MPI and OpenMP.&lt;/P&gt;
&lt;P&gt;If you have 12 cores per 2-CPU node, you should try:&lt;/P&gt;
&lt;P&gt;1 process per node, with 8 and with 12 OpenMP threads&lt;/P&gt;
&lt;P&gt;2 processes per node, affinitized by CPU, with 4 and with 6 threads each&lt;/P&gt;
&lt;P&gt;4 processes, 3 threads each&lt;/P&gt;
&lt;P&gt;6 processes, 2 threads each&lt;/P&gt;
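The combinations above can be sketched as a sweep of launch commands. This is a hedged illustration: the -ppn flag is Intel MPI / MVAPICH2 style, and ./hybpi is the binary from the first post; adapt to your own launcher.

```shell
# Sweep over (ranks per node, OpenMP threads per rank) splits of a 12-core
# node (2 sockets x 6 cores); no split oversubscribes the node.
NODES=2
for split in "1 8" "1 12" "2 4" "2 6" "4 3" "6 2"; do
  set -- $split           # $1 = ranks per node, $2 = threads per rank
  ppn=$1; threads=$2
  echo "mpirun -ppn $ppn -np $((NODES * ppn)) env OMP_NUM_THREADS=$threads ./hybpi"
done
```

Timing each of these runs on your own application is the only reliable way to find the best split; the sweep just enumerates the candidates.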
&lt;P&gt;In my work on Westmere, the cases with 2 and 3 threads per MPI rank worked best.&lt;/P&gt;
&lt;P&gt;It's nearly certain that having multiple MPI ranks competing to run their threads on the same cores will degrade performance.&amp;nbsp; If you had an application set up to benefit from HyperThreading, you would still need to assure that the correct pairs of threads from a single rank land on each core.&lt;/P&gt;</description>
      <pubDate>Mon, 04 Feb 2013 15:42:29 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/MPI-OPENMP-does-not-speed-up/m-p/934247#M4983</guid>
      <dc:creator>TimP</dc:creator>
      <dc:date>2013-02-04T15:42:29Z</dc:date>
    </item>
    <item>
      <title>Hi Tim</title>
      <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/MPI-OPENMP-does-not-speed-up/m-p/934248#M4984</link>
      <description>&lt;P&gt;Hi Tim&lt;/P&gt;
&lt;P&gt;Thank you so much for your reply so fast!!&lt;/P&gt;
&lt;P&gt;In my cluster, 1 node has two sockets and each socket has 6 cores, so 1 node has 12 cores.&lt;/P&gt;
&lt;P&gt;When I use hybrid MPI+OpenMP with 2 nodes and 8 threads, I just let 8 cores on each node work.&lt;/P&gt;
&lt;P&gt;I don't know how to use the KMP_AFFINITY setting you mentioned.&lt;/P&gt;
&lt;P&gt;Could you give a detailed example, e.g. how to do the "4 processes, 3 threads each" case you mentioned?&lt;/P&gt;
&lt;P&gt;&lt;/P&gt;</description>
      <pubDate>Mon, 04 Feb 2013 16:28:30 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/MPI-OPENMP-does-not-speed-up/m-p/934248#M4984</guid>
      <dc:creator>Siwei_D_</dc:creator>
      <dc:date>2013-02-04T16:28:30Z</dc:date>
    </item>
    <item>
      <title>Hi Tim</title>
      <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/MPI-OPENMP-does-not-speed-up/m-p/934249#M4985</link>
      <description>&lt;P&gt;Hi Tim&lt;/P&gt;
&lt;P&gt;I think I did not make it clear: I do not think I have multiple MPI ranks competing to run their threads on the same core.&lt;/P&gt;
&lt;P&gt;In the PBS script I use 2 nodes (2 MPI tasks, or 2 MPI ranks) with 8 threads per node (one thread per core), so there is no competition for the same core.&lt;/P&gt;
      <pubDate>Mon, 04 Feb 2013 16:32:08 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/MPI-OPENMP-does-not-speed-up/m-p/934249#M4985</guid>
      <dc:creator>Siwei_D_</dc:creator>
      <dc:date>2013-02-04T16:32:08Z</dc:date>
    </item>
    <item>
      <title>If you are using just 1 MPI</title>
      <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/MPI-OPENMP-does-not-speed-up/m-p/934250#M4986</link>
      <description>&lt;P&gt;If you are using just 1 MPI process per node, then it's fairly easy; simply set appropriate KMP_AFFINITY environment variable, same for each node, according to whether you have HT enabled.&lt;/P&gt;
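A minimal sketch of such a per-node setting, assuming the Intel OpenMP runtime (KMP_AFFINITY is Intel-specific); the values here are illustrative, not taken from the poster's system:

```shell
# One MPI process per node: every node can use the same affinity setting.
# With HT disabled, scatter one thread per core across both sockets:
export KMP_AFFINITY="granularity=core,scatter,verbose"
export OMP_NUM_THREADS=8
# With HT enabled, "compact,1,0" is the usual alternative: it places one
# thread per physical core before using any hyperthread siblings.
```

The verbose keyword makes the runtime print the thread-to-core map at startup, which is the quickest way to confirm the pinning actually took effect on both nodes.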
&lt;P&gt;If it's a Westmere, you may have to deal with its peculiarity; the optimum way to run 8 threads per node is with 1 thread per L3 connection, recognizing that the first 2 pairs of cores share cache connections.&amp;nbsp; But for a start, you must recognize that it's important to use the thread pinning of your OpenMP.&amp;nbsp; If you don't, your MPI job will be paced by the worst accidental thread placement of either node.&amp;nbsp;&amp;nbsp; You might start by reading the documentation which comes with the compiler.&lt;/P&gt;</description>
      <pubDate>Tue, 05 Feb 2013 15:17:34 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/MPI-OPENMP-does-not-speed-up/m-p/934250#M4986</guid>
      <dc:creator>TimP</dc:creator>
      <dc:date>2013-02-05T15:17:34Z</dc:date>
    </item>
    <item>
      <title>Quote:TimP (Intel) wrote:</title>
      <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/MPI-OPENMP-does-not-speed-up/m-p/934251#M4987</link>
      <description>&lt;P&gt;&lt;/P&gt;&lt;BLOCKQUOTE&gt;TimP (Intel) wrote:&lt;BR /&gt;&lt;P&gt;&lt;/P&gt;
&lt;P&gt;If you are using just 1 MPI process per node, then it's fairly easy; simply set appropriate KMP_AFFINITY environment variable, same for each node, according to whether you have HT enabled.&lt;/P&gt;
&lt;P&gt;If it's a Westmere, you may have to deal with its peculiarity; the optimum way to run 8 threads per node is with 1 thread per L3 connection, recognizing that the first 2 pairs of cores share cache connections.&amp;nbsp; But for a start, you must recognize that it's important to use the thread pinning of your OpenMP.&amp;nbsp; If you don't, your MPI job will be paced by the worst accidental thread placement of either node.&amp;nbsp;&amp;nbsp; You might start by reading the documentation which comes with the compiler.&lt;/P&gt;
&lt;P&gt;&lt;/P&gt;&lt;/BLOCKQUOTE&gt;&lt;P&gt;&lt;/P&gt;
&lt;P&gt;Hi Tim&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Sorry, I do not understand the affinity settings very well, and I do not understand this output.&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I am using:&amp;nbsp;&lt;STRONG&gt;export KMP_AFFINITY=verbose,granularity=thread,proclist=[0,1,2,3],explicit&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;Using 2 nodes, each node has 8 cores (without hyper-threading):&lt;/P&gt;
&lt;P&gt;OMP: Info #204: KMP_AFFINITY: decoding x2APIC ids.&lt;/P&gt;
&lt;P&gt;OMP: Warning #205: KMP_AFFINITY: cpuid leaf 11 not supported - decoding legacy APIC ids.&lt;BR /&gt;OMP: Info #149: KMP_AFFINITY: Affinity capable, using global cpuid info&lt;BR /&gt;OMP: Info #154: KMP_AFFINITY: Initial OS proc set respected: {0}&lt;BR /&gt;OMP: Info #156: KMP_AFFINITY: 1 available OS procs&lt;BR /&gt;OMP: Info #204: KMP_AFFINITY: decoding x2APIC ids.&lt;BR /&gt;OMP: Info #157: KMP_AFFINITY: Uniform topology&lt;BR /&gt;OMP: Warning #205: KMP_AFFINITY: cpuid leaf 11 not supported - decoding legacy APIC ids.&lt;BR /&gt;OMP: Info #159: KMP_AFFINITY: 1 packages x 1 cores/pkg x 1 threads/core (1 total cores)&lt;BR /&gt;OMP: Info #149: KMP_AFFINITY: Affinity capable, using global cpuid info&lt;BR /&gt;OMP: Info #206: KMP_AFFINITY: OS proc to physical thread map:&lt;BR /&gt;OMP: Info #154: KMP_AFFINITY: Initial OS proc set respected: {0}&lt;BR /&gt;OMP: Info #156: KMP_AFFINITY: 1 available OS procs&lt;BR /&gt;OMP: Info #157: KMP_AFFINITY: Uniform topology&lt;BR /&gt;OMP: Info #171: KMP_AFFINITY: OS proc 0 maps to package 0 &lt;BR /&gt;OMP: Warning #123: Ignoring invalid OS proc ID 1.&lt;BR /&gt;OMP: Warning #123: Ignoring invalid OS proc ID 2.&lt;BR /&gt;OMP: Warning #123: Ignoring invalid OS proc ID 3.&lt;BR /&gt;OMP: Info #147: KMP_AFFINITY: Internal thread 0 bound to OS proc set {0}&lt;BR /&gt;OMP: Info #159: KMP_AFFINITY: 1 packages x 1 cores/pkg x 1 threads/core (1 total cores)&lt;BR /&gt;OMP: Info #206: KMP_AFFINITY: OS proc to physical thread map:&lt;BR /&gt;OMP: Info #171: KMP_AFFINITY: OS proc 0 maps to package 0 &lt;BR /&gt;OMP: Warning #123: Ignoring invalid OS proc ID 1.&lt;BR /&gt;OMP: Warning #123: Ignoring invalid OS proc ID 2.&lt;BR /&gt;OMP: Warning #123: Ignoring invalid OS proc ID 3.&lt;BR /&gt;OMP: Info #147: KMP_AFFINITY: Internal thread 0 bound to OS proc set {0}&lt;BR /&gt;OMP: Info #147: KMP_AFFINITY: Internal thread 3 bound to OS proc set {0}&lt;BR /&gt;OMP: Info #147: KMP_AFFINITY: Internal thread 3 bound to OS proc set 
{0}&lt;BR /&gt;OMP: Info #147: KMP_AFFINITY: Internal thread 2 bound to OS proc set {0}&lt;BR /&gt;OMP: Info #147: KMP_AFFINITY: Internal thread 2 bound to OS proc set {0}&lt;BR /&gt;OMP: Info #147: KMP_AFFINITY: Internal thread 1 bound to OS proc set {0}&lt;BR /&gt;OMP: Info #147: KMP_AFFINITY: Internal thread 1 bound to OS proc set {0}&lt;/P&gt;
&lt;P&gt;&lt;/P&gt;
&lt;P&gt;&lt;/P&gt;</description>
      <pubDate>Wed, 06 Feb 2013 15:27:13 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/MPI-OPENMP-does-not-speed-up/m-p/934251#M4987</guid>
      <dc:creator>Siwei_D_</dc:creator>
      <dc:date>2013-02-06T15:27:13Z</dc:date>
    </item>
    <item>
      <title>I'm not familiar enough with</title>
      <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/MPI-OPENMP-does-not-speed-up/m-p/934252#M4988</link>
      <description>&lt;P&gt;I'm not familiar enough with the looks of KMP_AFFINITY verbose on non-Intel CPUs (if that's what you have).&lt;/P&gt;
&lt;P&gt;Is it correct that you have a single 8-core non-Intel CPU on each node?&amp;nbsp; If so, you may need to try the effect of various choices for using 4 of those cores, such as&lt;/P&gt;
&lt;P&gt;KMP_AFFINITY="proclist=[1,3,5,7],explicit,verbose"&lt;/P&gt;
&lt;P&gt;which, on an Intel CPU, ought to come out the same as&lt;/P&gt;
&lt;P&gt;KMP_AFFINITY=scatter,1,1,verbose&lt;/P&gt;
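The two settings above can be sketched side by side in a job script; the odd-numbered core list is the one under discussion, not a universal recommendation:

```shell
# Quoting keeps the brackets and commas in the value from being touched by
# whatever shell or launcher re-parses the command line. Explicit form,
# pinning threads to the odd-numbered OS procs in order:
export KMP_AFFINITY="proclist=[1,3,5,7],explicit,verbose"

# On an Intel CPU this should pick the same cores as the list-free form:
export KMP_AFFINITY="scatter,1,1,verbose"
echo "$KMP_AFFINITY"
```

Only one of the two exports should survive in a real script; they are shown together here so the equivalence is visible.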
&lt;P&gt;In my experience, you always need the quotation marks ("") around a proclist, presumably on account of the embedded punctuation.&lt;/P&gt;
&lt;P&gt;A reason for trying the odd numbered cores might be that your system assigns interrupts to even numbered ones.&amp;nbsp; I have no idea if this might be true of AMD, or why you decline to identify yours.&lt;/P&gt;
&lt;P&gt;Alternatively, if your CPU shares cache between even-odd core pairs, and your application doesn't need to be spread across all of cache, the choice you suggested may be best, if you get the syntax right.&lt;/P&gt;
&lt;P&gt;If you assign 8 threads per node, but restrict them to 4 cores by an affinity setting such as you suggest, you will likely get worse results than without affinity setting.&amp;nbsp; proclist=[0-7] would request use of all 8 cores in order.&lt;/P&gt;</description>
      <pubDate>Wed, 06 Feb 2013 16:27:35 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/MPI-OPENMP-does-not-speed-up/m-p/934252#M4988</guid>
      <dc:creator>TimP</dc:creator>
      <dc:date>2013-02-06T16:27:35Z</dc:date>
    </item>
    <item>
      <title>Quote:TimP (Intel) wrote:</title>
      <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/MPI-OPENMP-does-not-speed-up/m-p/934253#M4989</link>
      <description>&lt;P&gt;&lt;/P&gt;&lt;BLOCKQUOTE&gt;TimP (Intel) wrote:&lt;BR /&gt;&lt;P&gt;&lt;/P&gt;
&lt;P&gt;If you are using just 1 MPI process per node, then it's fairly easy; simply set appropriate KMP_AFFINITY environment variable, same for each node, according to whether you have HT enabled.&lt;/P&gt;
&lt;P&gt;If it's a Westmere, you may have to deal with its peculiarity; the optimum way to run 8 threads per node is with 1 thread per L3 connection, recognizing that the first 2 pairs of cores share cache connections.&amp;nbsp; But for a start, you must recognize that it's important to use the thread pinning of your OpenMP.&amp;nbsp; If you don't, your MPI job will be paced by the worst accidental thread placement of either node.&amp;nbsp;&amp;nbsp; You might start by reading the documentation which comes with the compiler.&lt;/P&gt;
&lt;P&gt;&lt;/P&gt;&lt;/BLOCKQUOTE&gt;&lt;P&gt;&lt;/P&gt;
&lt;P&gt;Hi Tim&lt;/P&gt;
&lt;P&gt;I am sorry for troubling you again.&lt;/P&gt;
&lt;P&gt;&lt;/P&gt;
&lt;P&gt;I read the output when using the KMP_AFFINITY setting above; the problem is that I can only see one core, which is really weird, as I have 8 cores per node.&lt;/P&gt;
&lt;P&gt;&lt;/P&gt;</description>
      <pubDate>Wed, 06 Feb 2013 16:40:50 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/MPI-OPENMP-does-not-speed-up/m-p/934253#M4989</guid>
      <dc:creator>Siwei_D_</dc:creator>
      <dc:date>2013-02-06T16:40:50Z</dc:date>
    </item>
    <item>
      <title>Quote:TimP (Intel) wrote:</title>
      <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/MPI-OPENMP-does-not-speed-up/m-p/934254#M4990</link>
      <description>&lt;P&gt;&lt;/P&gt;&lt;BLOCKQUOTE&gt;TimP (Intel) wrote:&lt;BR /&gt;&lt;P&gt;&lt;/P&gt;
&lt;P&gt;I'm not familiar enough with the looks of KMP_AFFINITY verbose on non-Intel CPUs (if that's what you have).&lt;/P&gt;
&lt;P&gt;Is it correct that you have a single 8-core non-Intel CPU on each node?&amp;nbsp; If so, you may need to try the effect of various choices for using 4 of those cores, such as&lt;/P&gt;
&lt;P&gt;KMP_AFFINITY="proclist=[1,3,5,7],explicit,verbose"&lt;/P&gt;
&lt;P&gt;which, on an Intel CPU, ought to come out the same as&lt;/P&gt;
&lt;P&gt;KMP_AFFINITY=scatter,1,1,verbose&lt;/P&gt;
&lt;P&gt;In my experience, you always need the quotation marks ("") around a proclist, presumably on account of the embedded punctuation.&lt;/P&gt;
&lt;P&gt;A reason for trying the odd numbered cores might be that your system assigns interrupts to even numbered ones.&amp;nbsp; I have no idea if this might be true of AMD, or why you decline to identify yours.&lt;/P&gt;
&lt;P&gt;Alternatively, if your CPU shares cache between even-odd core pairs, and your application doesn't need to be spread across all of cache, the choice you suggested may be best, if you get the syntax right.&lt;/P&gt;
&lt;P&gt;If you assign 8 threads per node, but restrict them to 4 cores by an affinity setting such as you suggest, you will likely get worse results than without affinity setting.&amp;nbsp; proclist=[0-7] would request use of all 8 cores in order.&lt;/P&gt;
&lt;P&gt;&lt;/P&gt;&lt;/BLOCKQUOTE&gt;&lt;P&gt;&lt;/P&gt;
&lt;P&gt;It is an&amp;nbsp;Intel(R) Xeon(R) CPU E5462 @ 2.80GHz.&lt;/P&gt;
      <pubDate>Wed, 06 Feb 2013 16:43:14 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/MPI-OPENMP-does-not-speed-up/m-p/934254#M4990</guid>
      <dc:creator>Siwei_D_</dc:creator>
      <dc:date>2013-02-06T16:43:14Z</dc:date>
    </item>
    <item>
      <title>I got the same problem when</title>
      <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/MPI-OPENMP-does-not-speed-up/m-p/934255#M4991</link>
      <description>&lt;P&gt;&lt;STRONG&gt;I got the same problem when running through a PBS script, i.e. the number of processors given to the second MPI process is not 4 but 1.&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;#PBS -l nodes=1:ppn=8&lt;/P&gt;
&lt;P&gt;export OMP_NUM_THREADS=4&lt;BR /&gt;export I_MPI_PIN_DOMAIN=omp&lt;/P&gt;
&lt;P&gt;export KMP_AFFINITY=verbose&lt;/P&gt;
&lt;P&gt;mpirun -np 2 ./a.out&amp;nbsp;&lt;/P&gt;
&lt;P&gt;OMP: Info #204: KMP_AFFINITY: decoding x2APIC ids.&lt;BR /&gt;OMP: Info #204: KMP_AFFINITY: decoding x2APIC ids.&lt;BR /&gt;OMP: Info #202: KMP_AFFINITY: Affinity capable, using global cpuid leaf 11 info&lt;BR /&gt;OMP: Info #154: KMP_AFFINITY: Initial OS proc set respected: {4,5,6,7}&lt;BR /&gt;OMP: Info #156: KMP_AFFINITY: 4 available OS procs&lt;BR /&gt;OMP: Info #157: KMP_AFFINITY: Uniform topology&lt;BR /&gt;OMP: Info #179: KMP_AFFINITY: 2 packages x 2 cores/pkg x 1 threads/core (4 total cores)&lt;BR /&gt;OMP: Info #147: KMP_AFFINITY: Internal thread 0 bound to OS proc set {4,5,6,7}&lt;BR /&gt;OMP: Info #202: KMP_AFFINITY: Affinity capable, using global cpuid leaf 11 info&lt;BR /&gt;OMP: Info #154: KMP_AFFINITY: Initial OS proc set respected: {7}&lt;BR /&gt;OMP: Info #156: KMP_AFFINITY: 1 available OS procs&lt;BR /&gt;OMP: Info #157: KMP_AFFINITY: Uniform topology&lt;BR /&gt;OMP: Info #159: KMP_AFFINITY: 1 packages x 1 cores/pkg x 1 threads/core (1 total cores)&lt;BR /&gt;OMP: Info #147: KMP_AFFINITY: Internal thread 0 bound to OS proc set {7}&lt;BR /&gt;OMP: Info #147: KMP_AFFINITY: Internal thread 3 bound to OS proc set {7}&lt;BR /&gt;OMP: Info #147: KMP_AFFINITY: Internal thread 2 bound to OS proc set {7}&lt;BR /&gt;OMP: Info #147: KMP_AFFINITY: Internal thread 1 bound to OS proc set {7}&lt;BR /&gt;OMP: Info #147: KMP_AFFINITY: Internal thread 2 bound to OS proc set {4,5,6,7}&lt;BR /&gt;OMP: Info #147: KMP_AFFINITY: Internal thread 1 bound to OS proc set {4,5,6,7}&lt;BR /&gt;OMP: Info #147: KMP_AFFINITY: Internal thread 3 bound to OS proc set {4,5,6,7}&lt;/P&gt;
&lt;P&gt;&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;But if I do the same interactively, there is no such problem.&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;OMP: Info #204: KMP_AFFINITY: decoding x2APIC ids.&lt;BR /&gt;OMP: Info #204: KMP_AFFINITY: decoding x2APIC ids.&lt;BR /&gt;OMP: Info #202: KMP_AFFINITY: Affinity capable, using global cpuid leaf 11 info&lt;BR /&gt;OMP: Info #154: KMP_AFFINITY: Initial OS proc set respected: {0,1,12,13}&lt;BR /&gt;OMP: Info #156: KMP_AFFINITY: 4 available OS procs&lt;BR /&gt;OMP: Info #157: KMP_AFFINITY: Uniform topology&lt;BR /&gt;OMP: Info #179: KMP_AFFINITY: 1 packages x 2 cores/pkg x 2 threads/core (2 total cores)&lt;BR /&gt;OMP: Info #147: KMP_AFFINITY: Internal thread 0 bound to OS proc set {0,1,12,13}&lt;BR /&gt;OMP: Info #202: KMP_AFFINITY: Affinity capable, using global cpuid leaf 11 info&lt;BR /&gt;OMP: Info #154: KMP_AFFINITY: Initial OS proc set respected: {2,3,14,15}&lt;BR /&gt;OMP: Info #156: KMP_AFFINITY: 4 available OS procs&lt;BR /&gt;OMP: Info #157: KMP_AFFINITY: Uniform topology&lt;BR /&gt;OMP: Info #179: KMP_AFFINITY: 1 packages x 2 cores/pkg x 2 threads/core (2 total cores)&lt;BR /&gt;OMP: Info #147: KMP_AFFINITY: Internal thread 0 bound to OS proc set {2,3,14,15}&lt;BR /&gt;OMP: Info #147: KMP_AFFINITY: Internal thread 1 bound to OS proc set {0,1,12,13}&lt;BR /&gt;OMP: Info #147: KMP_AFFINITY: Internal thread 1 bound to OS proc set {2,3,14,15}&lt;BR /&gt;OMP: Info #147: KMP_AFFINITY: Internal thread 2 bound to OS proc set {0,1,12,13}&lt;BR /&gt;OMP: Info #147: KMP_AFFINITY: Internal thread 3 bound to OS proc set {0,1,12,13}&lt;BR /&gt;OMP: Info #147: KMP_AFFINITY: Internal thread 2 bound to OS proc set {2,3,14,15}&lt;/P&gt;
&lt;P&gt;OMP: Info #147: KMP_AFFINITY: Internal thread 2 bound to OS proc set {2,3,14,15}&lt;/P&gt;
&lt;P&gt;&lt;/P&gt;</description>
      <pubDate>Thu, 22 Aug 2013 08:38:39 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/MPI-OPENMP-does-not-speed-up/m-p/934255#M4991</guid>
      <dc:creator>zhubq</dc:creator>
      <dc:date>2013-08-22T08:38:39Z</dc:date>
    </item>
    <item>
      <title>Hi</title>
      <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/MPI-OPENMP-does-not-speed-up/m-p/934256#M4992</link>
      <description>&lt;P&gt;Hi&lt;/P&gt;
&lt;P&gt;I do not remember it exactly, but it seems it was due to my MPI: my MPICH did not support OpenMP. I switched to MVAPICH and then the problem was solved. So try MVAPICH instead of MPICH with OpenMP to see the difference. Good luck!&lt;/P&gt;
&lt;P&gt;&lt;/P&gt;&lt;BLOCKQUOTE&gt;zhubq wrote:&lt;BR /&gt;&lt;P&gt;&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;I got the same problem when running through a PBS script, i.e. the number of processors given to the second MPI process is not 4 but 1.&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;#PBS -l nodes=1:ppn=8&lt;/P&gt;
&lt;P&gt;export OMP_NUM_THREADS=4&lt;BR /&gt;export I_MPI_PIN_DOMAIN=omp&lt;/P&gt;
&lt;P&gt;export KMP_AFFINITY=verbose&lt;/P&gt;
&lt;P&gt;mpirun -np 2 ./a.out&amp;nbsp;&lt;/P&gt;
&lt;P&gt;OMP: Info #204: KMP_AFFINITY: decoding x2APIC ids.&lt;BR /&gt;OMP: Info #204: KMP_AFFINITY: decoding x2APIC ids.&lt;BR /&gt;OMP: Info #202: KMP_AFFINITY: Affinity capable, using global cpuid leaf 11 info&lt;BR /&gt;OMP: Info #154: KMP_AFFINITY: Initial OS proc set respected: {4,5,6,7}&lt;BR /&gt;OMP: Info #156: KMP_AFFINITY: 4 available OS procs&lt;BR /&gt;OMP: Info #157: KMP_AFFINITY: Uniform topology&lt;BR /&gt;OMP: Info #179: KMP_AFFINITY: 2 packages x 2 cores/pkg x 1 threads/core (4 total cores)&lt;BR /&gt;OMP: Info #147: KMP_AFFINITY: Internal thread 0 bound to OS proc set {4,5,6,7}&lt;BR /&gt;OMP: Info #202: KMP_AFFINITY: Affinity capable, using global cpuid leaf 11 info&lt;BR /&gt;OMP: Info #154: KMP_AFFINITY: Initial OS proc set respected: {7}&lt;BR /&gt;OMP: Info #156: KMP_AFFINITY: 1 available OS procs&lt;BR /&gt;OMP: Info #157: KMP_AFFINITY: Uniform topology&lt;BR /&gt;OMP: Info #159: KMP_AFFINITY: 1 packages x 1 cores/pkg x 1 threads/core (1 total cores)&lt;BR /&gt;OMP: Info #147: KMP_AFFINITY: Internal thread 0 bound to OS proc set {7}&lt;BR /&gt;OMP: Info #147: KMP_AFFINITY: Internal thread 3 bound to OS proc set {7}&lt;BR /&gt;OMP: Info #147: KMP_AFFINITY: Internal thread 2 bound to OS proc set {7}&lt;BR /&gt;OMP: Info #147: KMP_AFFINITY: Internal thread 1 bound to OS proc set {7}&lt;BR /&gt;OMP: Info #147: KMP_AFFINITY: Internal thread 2 bound to OS proc set {4,5,6,7}&lt;BR /&gt;OMP: Info #147: KMP_AFFINITY: Internal thread 1 bound to OS proc set {4,5,6,7}&lt;BR /&gt;OMP: Info #147: KMP_AFFINITY: Internal thread 3 bound to OS proc set {4,5,6,7}&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;But if I do the same interactively, there is no such problem.&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;OMP: Info #204: KMP_AFFINITY: decoding x2APIC ids.&lt;BR /&gt;OMP: Info #204: KMP_AFFINITY: decoding x2APIC ids.&lt;BR /&gt;OMP: Info #202: KMP_AFFINITY: Affinity capable, using global cpuid leaf 11 info&lt;BR /&gt;OMP: Info #154: KMP_AFFINITY: Initial OS proc set respected: {0,1,12,13}&lt;BR /&gt;OMP: Info #156: KMP_AFFINITY: 4 available OS procs&lt;BR /&gt;OMP: Info #157: KMP_AFFINITY: Uniform topology&lt;BR /&gt;OMP: Info #179: KMP_AFFINITY: 1 packages x 2 cores/pkg x 2 threads/core (2 total cores)&lt;BR /&gt;OMP: Info #147: KMP_AFFINITY: Internal thread 0 bound to OS proc set {0,1,12,13}&lt;BR /&gt;OMP: Info #202: KMP_AFFINITY: Affinity capable, using global cpuid leaf 11 info&lt;BR /&gt;OMP: Info #154: KMP_AFFINITY: Initial OS proc set respected: {2,3,14,15}&lt;BR /&gt;OMP: Info #156: KMP_AFFINITY: 4 available OS procs&lt;BR /&gt;OMP: Info #157: KMP_AFFINITY: Uniform topology&lt;BR /&gt;OMP: Info #179: KMP_AFFINITY: 1 packages x 2 cores/pkg x 2 threads/core (2 total cores)&lt;BR /&gt;OMP: Info #147: KMP_AFFINITY: Internal thread 0 bound to OS proc set {2,3,14,15}&lt;BR /&gt;OMP: Info #147: KMP_AFFINITY: Internal thread 1 bound to OS proc set {0,1,12,13}&lt;BR /&gt;OMP: Info #147: KMP_AFFINITY: Internal thread 1 bound to OS proc set {2,3,14,15}&lt;BR /&gt;OMP: Info #147: KMP_AFFINITY: Internal thread 2 bound to OS proc set {0,1,12,13}&lt;BR /&gt;OMP: Info #147: KMP_AFFINITY: Internal thread 3 bound to OS proc set {0,1,12,13}&lt;BR /&gt;OMP: Info #147: KMP_AFFINITY: Internal thread 2 bound to OS proc set {2,3,14,15}&lt;/P&gt;
&lt;P&gt;OMP: Info #147: KMP_AFFINITY: Internal thread 2 bound to OS proc set {2,3,14,15}&lt;/P&gt;
&lt;P&gt;&lt;/P&gt;&lt;/BLOCKQUOTE&gt;&lt;P&gt;&lt;/P&gt;</description>
      <pubDate>Thu, 22 Aug 2013 10:52:56 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/MPI-OPENMP-does-not-speed-up/m-p/934256#M4992</guid>
      <dc:creator>Siwei_D_</dc:creator>
      <dc:date>2013-08-22T10:52:56Z</dc:date>
    </item>
  </channel>
</rss>

