<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic I could narrow the issue down in Intel® oneAPI Math Kernel Library</title>
    <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/HPCG-and-mpirun-on-2S-Xeon-EP/m-p/1097078#M23635</link>
    <description>&lt;P&gt;I could narrow the issue down to the MPI runtime version.&lt;/P&gt;

&lt;P&gt;composer_xe_2015.1.133&amp;nbsp;works;&lt;BR /&gt;
	2016.3.210 does not.&lt;/P&gt;</description>
    <pubDate>Fri, 25 Nov 2016 11:14:09 GMT</pubDate>
    <dc:creator>JJoha8</dc:creator>
    <dc:date>2016-11-25T11:14:09Z</dc:date>
    <item>
      <title>HPCG and mpirun on 2S Xeon-EP</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/HPCG-and-mpirun-on-2S-Xeon-EP/m-p/1097077#M23634</link>
      <description>&lt;P&gt;&lt;SPAN style="font-size: 1em;"&gt;Hi,&lt;/SPAN&gt;&lt;/P&gt;

&lt;P&gt;I'm encountering a problem when trying to measure the socket performance of a Xeon E5 v3 chip with Cluster-on-Die (COD) active; the problem also persists when I run on both sockets of a 2S Xeon-EP node. I am using the latest benchmark package from the intel.com website (l_mklb_p_2017.1.013) and am following the advice from &lt;SPAN style="font-size: 1em;"&gt;&lt;A href="https://software.intel.com/en-us/node/599526" target="_blank"&gt;https://software.intel.com/en-us/node/599526&lt;/A&gt;.&lt;/SPAN&gt;&lt;/P&gt;

&lt;P&gt;&lt;SPAN style="font-size: 1em;"&gt;I tried running&lt;/SPAN&gt;&lt;/P&gt;

&lt;P&gt;I_MPI_ADJUST_ALLREDUCE=5 mpiexec.hydra -n 2 env OMP_NUM_THREADS=18 KMP_AFFINITY=granularity=fine,compact,1,0 bin/xhpcg_avx2 --n=168 on a 2S E5-2697 v4 with COD deactivated, and&lt;BR /&gt;
	&lt;SPAN style="color: rgb(0, 0, 0); font-family: Helvetica; font-size: 12px;"&gt;mpiexec.hydra -genv I_MPI_PIN_DOMAIN node -genv I_MPI_PI disable -np 1 -env OMP_NUM_THREADS 7 -env KMP_AFFINITY 'verbose,granularity=fine,proclist=[0,1,2,3,4,5,6],explicit' ../l_mklb_p_2017.1.013/hpcg/mybins/xhpcg_avx2_mpi --n=128 --t=0 : -np 1 -env OMP_NUM_THREADS 7 -env KMP_AFFINITY 'verbose,granularity=fine,proclist=[7,8,9,10,11,12,13],explicit' ../l_mklb_p_2017.1.013/hpcg/mybins/xhpcg_avx2_mpi --n=128 --t=0 to evaluate the socket performance of a 14-core Haswell-EP with COD active (one rank per 7-core COD cluster).&lt;BR /&gt;
	&lt;BR /&gt;
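	For reference, the two per-rank sections of the second command follow a fixed pattern (one MPI rank per COD cluster, each pinned to that cluster's cores), so they can be generated instead of typed by hand. This is a minimal POSIX sh sketch, assuming the cores of each cluster are numbered consecutively (0-6 and 7-13 here, which is worth confirming with KMP_AFFINITY=verbose first); the function name cod_mpmd_args is hypothetical:&lt;BR /&gt;

```shell
#!/bin/sh
# Hypothetical helper: print the MPMD part of an mpiexec.hydra command line that
# starts one MPI rank per COD cluster, pinned with an explicit KMP_AFFINITY
# proclist. Assumes logical CPUs are numbered consecutively per cluster
# (0-6 and 7-13 on a 14-core Haswell-EP with COD active).
cod_mpmd_args() {
  clusters=$1   # number of COD clusters, one MPI rank each
  cores=$2      # cores per cluster, also used as OMP_NUM_THREADS per rank
  bin=$3        # HPCG binary
  shift 3       # remaining arguments are passed unchanged to every rank
  args=""
  c=0
  while [ "$c" -lt "$clusters" ]; do
    first=$((c * cores))
    last=$((first + cores - 1))
    # Build the comma-separated core list, e.g. 0,1,2,3,4,5,6
    list=$first
    i=$((first + 1))
    while [ "$i" -le "$last" ]; do
      list="$list,$i"
      i=$((i + 1))
    done
    # ':' separates the per-rank sections of an MPMD launch
    if [ "$c" -gt 0 ]; then
      args="$args :"
    fi
    args="$args -np 1 -env OMP_NUM_THREADS $cores -env KMP_AFFINITY 'verbose,granularity=fine,proclist=[$list],explicit' $bin $*"
    c=$((c + 1))
  done
  # Strip the leading space; printf avoids echo's option parsing of "-np"
  printf '%s\n' "${args# }"
}

cod_mpmd_args 2 7 ../l_mklb_p_2017.1.013/hpcg/mybins/xhpcg_avx2_mpi --n=128 --t=0
```

Because the generated proclist values carry their own single quotes, the output would be spliced in via eval, e.g. eval "mpiexec.hydra -genv I_MPI_PIN_DOMAIN node $(cod_mpmd_args 2 7 ../l_mklb_p_2017.1.013/hpcg/mybins/xhpcg_avx2_mpi --n=128 --t=0)", adding any further -genv options from the original command.&lt;BR /&gt;
	&lt;BR /&gt;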
	The problem is that xhpcg does not finish when running with more than one MPI process per node. With one process per node and OMP_NUM_THREADS set to x, top shows x*100% CPU load for that process (I know top probably isn't the best tool for estimating core utilisation of a memory-bound application, but it's a good enough indicator). With more than one MPI process, CPU utilisation drops to 100% for each process instead of x*100%.&lt;BR /&gt;
	&lt;BR /&gt;
	I tried some debugging, but I'm neither an MPI nor an HPCG expert, so some help would be appreciated. I set&lt;/SPAN&gt;&lt;BR /&gt;
	HPCG_OPTS = -DHPCG_DEBUG -DHPCG_DETAILED_DEBUG and compiled a new binary. If I run with two MPI processes, I get two log files; the last entries of each are:&lt;BR /&gt;
	broadep2:IMPI_IOMP_AVX2 iwi325$ tail hpcg_log_n168_2p_1t_2016.11.24.20.16.15.txt&lt;BR /&gt;
	Process 0 of 2 has 9261 rows.&lt;BR /&gt;
	Process 0 of 2 has 230702 nonzeros.&lt;BR /&gt;
	Process 0 of 2 has 4741632 rows.&lt;BR /&gt;
	Process 0 of 2 has 126758012 nonzeros.&lt;BR /&gt;
	Process 0 of 2 has 592704 rows.&lt;BR /&gt;
	Process 0 of 2 has 15687500 nonzeros.&lt;BR /&gt;
	Process 0 of 2 has 74088 rows.&lt;BR /&gt;
	Process 0 of 2 has 1922000 nonzeros.&lt;BR /&gt;
	Process 0 of 2 has 9261 rows.&lt;BR /&gt;
	Process 0 of 2 has 230702 nonzeros.&lt;/P&gt;

&lt;P&gt;broadep2:IMPI_IOMP_AVX2 iwi325$ tail hpcg_log_n168_2p_1t_1_2016.11.24.20.16.15.txt&lt;BR /&gt;
	Process 1 of 2 has 9261 rows.&lt;BR /&gt;
	Process 1 of 2 has 230702 nonzeros.&lt;BR /&gt;
	Process 1 of 2 has 4741632 rows.&lt;BR /&gt;
	Process 1 of 2 has 126758012 nonzeros.&lt;BR /&gt;
	Process 1 of 2 has 592704 rows.&lt;BR /&gt;
	Process 1 of 2 has 15687500 nonzeros.&lt;BR /&gt;
	Process 1 of 2 has 74088 rows.&lt;BR /&gt;
	Process 1 of 2 has 1922000 nonzeros.&lt;BR /&gt;
	Process 1 of 2 has 9261 rows.&lt;BR /&gt;
	Process 1 of 2 has 230702 nonzeros.&lt;BR /&gt;
	&lt;BR /&gt;
	Maybe there's an MPI_Barrier and they're waiting for a third, non-existent MPI process?&lt;BR /&gt;
	&lt;BR /&gt;
	Any help would be appreciated.&lt;BR /&gt;
	&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Thu, 24 Nov 2016 19:29:36 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/HPCG-and-mpirun-on-2S-Xeon-EP/m-p/1097077#M23634</guid>
      <dc:creator>JJoha8</dc:creator>
      <dc:date>2016-11-24T19:29:36Z</dc:date>
    </item>
    <item>
      <title>I could narrow the issue down</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/HPCG-and-mpirun-on-2S-Xeon-EP/m-p/1097078#M23635</link>
      <description>&lt;P&gt;I could narrow the issue down to the MPI runtime version.&lt;/P&gt;

&lt;P&gt;composer_xe_2015.1.133&amp;nbsp;works;&lt;BR /&gt;
	2016.3.210 does not.&lt;/P&gt;</description>
      <pubDate>Fri, 25 Nov 2016 11:14:09 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/HPCG-and-mpirun-on-2S-Xeon-EP/m-p/1097078#M23635</guid>
      <dc:creator>JJoha8</dc:creator>
      <dc:date>2016-11-25T11:14:09Z</dc:date>
    </item>
  </channel>
</rss>

