<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Hi all, in Software Archive</title>
    <link>https://community.intel.com/t5/Software-Archive/Failed-to-explicit-offload-on-Xeon-Phi-KNC/m-p/1116882#M74420</link>
    <description>Failed to explicit offload on Xeon Phi KNC</description>
    <pubDate>Mon, 27 Mar 2017 15:11:07 GMT</pubDate>
    <dc:creator>Rolly_N_</dc:creator>
    <dc:date>2017-03-27T15:11:07Z</dc:date>
    <item>
      <title>Failed to explicit offload on Xeon Phi KNC</title>
      <link>https://community.intel.com/t5/Software-Archive/Failed-to-explicit-offload-on-Xeon-Phi-KNC/m-p/1116881#M74419</link>
      <description>&lt;DIV class="field field-name-body field-type-text-with-summary field-label-hidden" style="color: rgb(96, 96, 96); font-size: 13.008px;"&gt;
	&lt;DIV class="field-items"&gt;
		&lt;DIV class="field-item even" property="content:encoded"&gt;
			&lt;P style="word-wrap: break-word; font-size: 12px; color: rgb(83, 87, 94);"&gt;Dear all,&lt;/P&gt;

			&lt;P style="word-wrap: break-word; font-size: 12px; color: rgb(83, 87, 94);"&gt;I am referring to the guide on Explicit offload QE to Xeon Phi KNC (7120P) here,&lt;/P&gt;

			&lt;P style="word-wrap: break-word; font-size: 12px; color: rgb(83, 87, 94);"&gt;&lt;A href="https://software.intel.com/en-us/articles/explicit-offload-for-quantum-espresso"&gt;https://software.intel.com/en-us/articles/explicit-offload-for-quantum-espresso&lt;/A&gt;&lt;/P&gt;

			&lt;P style="word-wrap: break-word; font-size: 12px; color: rgb(83, 87, 94);"&gt;I tried to follow the above steps but I failed to run pw.x (QE v5.3.0) on 2 Xeon Phi 7120P using the mpirun.sh script. The error reads:&lt;/P&gt;

			&lt;P style="word-wrap: break-word; font-size: 12px; color: rgb(83, 87, 94);"&gt;&amp;nbsp;allocating buffers &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;2048 &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;2048 &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;1024&lt;BR /&gt;
				&amp;nbsp;on device &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;0&lt;BR /&gt;
				&amp;nbsp;threshold &amp;nbsp; &amp;nbsp;20000000000.0000 &amp;nbsp; &amp;nbsp;&amp;nbsp;&lt;BR /&gt;
				&amp;nbsp;allocating buffers &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;2048 &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;2048 &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;1024&lt;BR /&gt;
				&amp;nbsp;on device &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;0&lt;BR /&gt;
				&amp;nbsp;threshold &amp;nbsp; &amp;nbsp;20000000000.0000 &amp;nbsp; &amp;nbsp;&amp;nbsp;&lt;BR /&gt;
				offload error: cannot create buffer on device 0 (error code 14)&lt;BR /&gt;
				offload error: cannot create buffer on device 0 (error code 14)&lt;/P&gt;

			&lt;P style="word-wrap: break-word; font-size: 12px; color: rgb(83, 87, 94);"&gt;This is how I run the script,&lt;/P&gt;

			&lt;P style="word-wrap: break-word; font-size: 12px; color: rgb(83, 87, 94);"&gt;[qeuser@node09 ~]$ ~/mpirun/mpirun.sh -p 1 -w ~/libxphi/xphilibwrapper.sh -x ~/QE530-KNC-OL/espresso-5.3.0/bin/pw.x -i ~/rolly/AUSURF112/ausurf.in&lt;/P&gt;

			&lt;P style="word-wrap: break-word; font-size: 12px; color: rgb(83, 87, 94);"&gt;I have already scp all the lib and bin files to each Xeon Phi 7120P and I have also compiled the libxphi lib. This is how the libxphi directory reads,&lt;/P&gt;

			&lt;P style="word-wrap: break-word; font-size: 12px; color: rgb(83, 87, 94);"&gt;[qeuser@node09 libxphi]$ ls&lt;BR /&gt;
				build-library.sh &amp;nbsp;libmkl_proxy.so &amp;nbsp;LICENSE &amp;nbsp; &amp;nbsp; &amp;nbsp;README.md &amp;nbsp; &amp;nbsp;xphilibmod.mod &amp;nbsp; &amp;nbsp; xphilib.o &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;xphilib_proxy.o&lt;BR /&gt;
				clean.sh &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;libxphi.so &amp;nbsp; &amp;nbsp; &amp;nbsp; mkl_proxy.c &amp;nbsp;xphilib.f90 &amp;nbsp;xphilibmod.modmic &amp;nbsp;xphilib_proxy.f90 &amp;nbsp;xphilibwrapper.sh&lt;/P&gt;

			&lt;P style="word-wrap: break-word; font-size: 12px; color: rgb(83, 87, 94);"&gt;I suppose this is okay.&lt;/P&gt;

			&lt;P style="word-wrap: break-word; font-size: 12px; color: rgb(83, 87, 94);"&gt;However, I found it interesting that I can run a single instance on mic0 but it is very slow. This is how I did it,&lt;/P&gt;

			&lt;P style="word-wrap: break-word; font-size: 12px; color: rgb(83, 87, 94);"&gt;[qeuser@node09 ~]$ export LD_LIBRARY_PATH=/home/qeuser/libxphi/:$LD_LIBRARY_PATH&lt;/P&gt;

			&lt;P style="word-wrap: break-word; font-size: 12px; color: rgb(83, 87, 94);"&gt;[qeuser@node09 ~]$ LD_PRELOAD="/home/qeuser/libxphi/libxphi.so"&amp;nbsp;/home/qeuser/QE530-KNC-OL/espresso-5.3.0/bin/pw.x &amp;nbsp;&amp;lt; /home/qeuser/rolly/AUSURF112/ausurf.in&lt;/P&gt;

			&lt;P style="word-wrap: break-word; font-size: 12px; color: rgb(83, 87, 94);"&gt;Error messages were also produced,&lt;/P&gt;

			&lt;P style="word-wrap: break-word; font-size: 12px; color: rgb(83, 87, 94);"&gt;&amp;nbsp;allocating buffers &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;2048 &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;2048 &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;1024&lt;BR /&gt;
				&amp;nbsp;on device &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;0&lt;BR /&gt;
				&amp;nbsp;threshold &amp;nbsp; &amp;nbsp;20000000000.0000 &amp;nbsp; &amp;nbsp;&amp;nbsp;&lt;BR /&gt;
				ERROR: ld.so: object '/home/qeuser/libxphi/libxphi.so' from LD_PRELOAD cannot be preloaded (cannot open shared object file): ignored.&lt;BR /&gt;
				ERROR: ld.so: object '/home/qeuser/libxphi/libxphi.so' from LD_PRELOAD cannot be preloaded (cannot open shared object file): ignored.&lt;BR /&gt;
				&amp;nbsp;buffer allocation &amp;nbsp; 4.02019500732422 &amp;nbsp; &amp;nbsp; &amp;nbsp;s&lt;/P&gt;

			&lt;P style="word-wrap: break-word; font-size: 12px; color: rgb(83, 87, 94);"&gt;The AUSURF112 benchmark completed, it took very long time, and it reads&lt;/P&gt;

			&lt;PRE&gt;     PWSCF        :     9h56m CPU        0h36m WALL

   This run was terminated on:   2:25: 1  11Mar2017

=------------------------------------------------------------------------------=
   JOB DONE.
=------------------------------------------------------------------------------=
2
offload error: cannot unload library from the device 0 (error code 14)&lt;/PRE&gt;

			&lt;P style="word-wrap: break-word; font-size: 12px; color: rgb(83, 87, 94);"&gt;On the host I can see one copy of pw.x is running, and on mic0 I can see that offload_main and coi_daemon are running by the micuser. But it is very slow.&lt;/P&gt;

			&lt;P style="word-wrap: break-word; font-size: 12px; color: rgb(83, 87, 94);"&gt;So, is this offload error: cannot create buffer on device 0 (error code 14) related to the mpirun.sh script and the libxphi.so were not preloaded even it is present and the libxphi cannot be unload after completion???&lt;/P&gt;

			&lt;P style="word-wrap: break-word; font-size: 12px; color: rgb(83, 87, 94);"&gt;With two Xeon E5-2683v3 CPU of 56 threads, it tooks only 9 mins and 15 seconds, but with this single instance offloaded to Xeon Phi, it tooks 10 hours?&lt;/P&gt;

			&lt;P style="word-wrap: break-word; font-size: 12px; color: rgb(83, 87, 94);"&gt;I am running CentOS 7.1 + Intel MPSS 3.8.1 + Intel psxe 2017 update 1, and I have already made a symbolic link of the psxevars.sh to /etc/profile.d and I can use mpirun to pw.x on the host, but not offload to mic0 and mic1&lt;/P&gt;

			&lt;P style="word-wrap: break-word; font-size: 12px; color: rgb(83, 87, 94);"&gt;Are these compatibility issues because the libxphi and mpirun.sh were written 2 years ago? How can these be fixed?&lt;/P&gt;

			&lt;P style="word-wrap: break-word; font-size: 12px; color: rgb(83, 87, 94);"&gt;Thank you,&lt;/P&gt;

			&lt;P style="word-wrap: break-word; font-size: 12px; color: rgb(83, 87, 94);"&gt;Rolly&lt;/P&gt;
		&lt;/DIV&gt;
	&lt;/DIV&gt;
&lt;/DIV&gt;</description>
      <pubDate>Fri, 10 Mar 2017 17:53:30 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Archive/Failed-to-explicit-offload-on-Xeon-Phi-KNC/m-p/1116881#M74419</guid>
      <dc:creator>Rolly_N_</dc:creator>
      <dc:date>2017-03-10T17:53:30Z</dc:date>
    </item>
    <item>
      <title>Hi all,</title>
      <link>https://community.intel.com/t5/Software-Archive/Failed-to-explicit-offload-on-Xeon-Phi-KNC/m-p/1116882#M74420</link>
      <description>&lt;P&gt;Hi all,&lt;/P&gt;

&lt;P&gt;Thanks to Dr. Dahnken, the developer of the libxphi library, who found the solution to my problem.&lt;/P&gt;

&lt;P&gt;&lt;A href="https://github.com/cdahnken/libxphi" target="_blank"&gt;https://github.com/cdahnken/libxphi&lt;/A&gt;&lt;/P&gt;

&lt;P&gt;&lt;A href="https://github.com/hfp/mpirun/blob/master/mpirun.py" target="_blank"&gt;https://github.com/hfp/mpirun/blob/master/mpirun.py&lt;/A&gt;&lt;/P&gt;

&lt;P&gt;With CentOS 7.1 + MPSS 3.8.1 + Intel PSXE 2017 Update 1, one line has to be removed from mpirun.py:&lt;/P&gt;

&lt;PRE class="brush:python;"&gt;runstring = "mpirun -bootstrap ssh"
if (None == args.mri): runstring = runstring \
                    + " -genv I_MPI_PIN_DOMAIN=auto" \
#                   + " -genv OFFLOAD_INIT=on_start" \
                    + " -genv MIC_USE_2MB_BUFFERS=2m" \
                    + " -genv MIC_ENV_PREFIX=" + micenv \
                    + " -genv " + micenv + "_KMP_AFFINITY=" + args.micaffinity + ",granularity=fine" \
                    + " -genv " + micenv + "_OMP_SCHEDULE=" + args.schedule \
                    + " -genv " + micenv + "_OMP_NUM_THREADS=" + str(max((mcores + min(0, args.reserved)) * 4, args.mthreads))&lt;/PRE&gt;

&lt;P&gt;After removing the line "&lt;SPAN style="font-family: Consolas, &amp;quot;Lucida Console&amp;quot;, Menlo, Monaco, &amp;quot;DejaVu Sans Mono&amp;quot;, monospace, sans-serif; font-size: 13.008px;"&gt;+ " -genv OFFLOAD_INIT=on_start" \&lt;/SPAN&gt;", offloading to the 2x 7120P works.&lt;/P&gt;
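&lt;P&gt;The snippet relies on MIC_ENV_PREFIX=MIC. Host variables whose names start with MIC_ are passed to the coprocessor with the prefix removed, so MIC_OMP_NUM_THREADS on the host becomes OMP_NUM_THREADS on the card. The Python 3 sketch below is a simplified model of that renaming, not the offload runtime itself. It ignores any per-card variants and covers only the OpenMP variables set in the snippet. The values are taken from the command line that mpirun.py generates later in this thread.&lt;/P&gt;

&lt;PRE class="brush:python;"&gt;
```python
def forwarded_to_card(host_env, prefix="MIC"):
    """Simplified model: a host variable PREFIX_NAME shows up on the card as NAME.

    Assumption: per-card variants and all other offload-runtime details
    are ignored; unprefixed host variables are simply left out here.
    """
    tag = prefix + "_"
    return {name[len(tag):]: value
            for name, value in host_env.items()
            if name.startswith(tag)}

# OpenMP settings passed with -genv (values from the generated command line).
host_env = {
    "MIC_KMP_AFFINITY": "balanced,granularity=fine",
    "MIC_OMP_SCHEDULE": "static",
    "MIC_OMP_NUM_THREADS": "32",
    "OMP_NUM_THREADS": "4",  # host-side setting, not renamed by this model
}

card_env = forwarded_to_card(host_env)
for name in sorted(card_env):
    print(name + "=" + card_env[name])
```
&lt;/PRE&gt;

&lt;P&gt;This prints KMP_AFFINITY=balanced,granularity=fine, OMP_NUM_THREADS=32 and OMP_SCHEDULE=static. The host-side OMP_NUM_THREADS=4 is left alone, which is why the host and card thread counts can be set independently.&lt;/P&gt;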

&lt;P&gt;&lt;SPAN style="font-size: 1em;"&gt;With these parameters, 2x 7120P complete the QE AUSURF112 benchmark in 17min56sec,&lt;/SPAN&gt;&lt;/P&gt;

&lt;PRE class="brush:bash;"&gt;$ QE_MIC_BLOCKSIZE_M=2048
$ QE_MIC_BLOCKSIZE_N=2048
$ QE_MIC_BLOCKSIZE_K=512&lt;/PRE&gt;

&lt;P&gt;&lt;SPAN style="font-size: 1em;"&gt;By increasing K to 1024,&amp;nbsp;&lt;/SPAN&gt;&lt;SPAN style="font-size: 13.008px;"&gt;2x 7120P complete the QE AUSURF112 benchmark in 17min17sec,&lt;/SPAN&gt;&lt;/P&gt;

&lt;P&gt;&lt;SPAN style="font-size: 13.008px;"&gt;I found 3 warning are given by the code:&lt;/SPAN&gt;&lt;/P&gt;

&lt;P&gt;(1) OMP: Info #256: KMP_PLACE_THREADS variable deprecated, please use KMP_HW_SUBSET instead.&lt;BR /&gt;
	(2) OMP: Warning #250: KMP_HW_SUBSET "o" offset designator deprecated, please use @ prefix for offset value.&lt;BR /&gt;
	(3) offload error: cannot unload library from the device 0 (error code 14)&lt;/P&gt;

&lt;P&gt;I am looking into these and will report my findings.&lt;/P&gt;

&lt;P&gt;Thanks for your attention.&lt;/P&gt;

&lt;P&gt;Rolly&lt;/P&gt;</description>
      <pubDate>Mon, 27 Mar 2017 15:11:07 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Archive/Failed-to-explicit-offload-on-Xeon-Phi-KNC/m-p/1116882#M74420</guid>
      <dc:creator>Rolly_N_</dc:creator>
      <dc:date>2017-03-27T15:11:07Z</dc:date>
    </item>
    <item>
      <title>Re 1: KMP_PLACE_THREADS has</title>
      <link>https://community.intel.com/t5/Software-Archive/Failed-to-explicit-offload-on-Xeon-Phi-KNC/m-p/1116883#M74421</link>
      <description>&lt;P&gt;Re 1: KMP_PLACE_THREADS has been replaced with KMP_HW_SUBSET, which better captures what the environment variable actually does. (The link below also points this out!)&lt;/P&gt;

&lt;P&gt;Re 2: KMP_HW_SUBSET is documented in the compiler manual at https://software.intel.com/en-us/node/684224#DA65065D-8C30-4E3A-BA61-4AAE42DD505C (and elsewhere).&lt;/P&gt;</description>
      <pubDate>Mon, 27 Mar 2017 15:23:53 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Archive/Failed-to-explicit-offload-on-Xeon-Phi-KNC/m-p/1116883#M74421</guid>
      <dc:creator>James_C_Intel2</dc:creator>
      <dc:date>2017-03-27T15:23:53Z</dc:date>
    </item>
    <item>
      <title>Hi James,</title>
      <link>https://community.intel.com/t5/Software-Archive/Failed-to-explicit-offload-on-Xeon-Phi-KNC/m-p/1116884#M74422</link>
      <description>&lt;P&gt;Hi James,&lt;/P&gt;

&lt;P&gt;Thanks for the info.&lt;/P&gt;

&lt;P&gt;I took a look at the link. I believe the KMP_HW_SUBSET syntax is KMP_HW_SUBSET=&lt;VAR&gt;sockets&lt;/VAR&gt;S[@&lt;VAR&gt;offset&lt;/VAR&gt;],&lt;VAR&gt;cores&lt;/VAR&gt;C[@&lt;VAR&gt;offset&lt;/VAR&gt;],&lt;VAR&gt;threads&lt;/VAR&gt;T&lt;/P&gt;
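&lt;P&gt;The following is a small Python 3 sketch that splits a KMP_HW_SUBSET value into its layers, following the syntax quoted above: a count, a layer letter S/C/T, and an optional @offset. It is only a string parser for checking values such as the ones proposed below. It does not reproduce how the OpenMP runtime validates or applies them, and it only knows the three layers named in that syntax. It also reports a missing offset as 0, which is a convention of this sketch.&lt;/P&gt;

&lt;PRE class="brush:python;"&gt;
```python
import re

# One layer: a count, a layer letter (S, C or T, any case), optional "@offset".
LAYER = re.compile(r"^(\d+)([sct])(?:@(\d+))?$", re.IGNORECASE)
NAMES = {"s": "sockets", "c": "cores", "t": "threads"}


def parse_hw_subset(value):
    """Split e.g. "0s,8c@8,4t" into (layer, count, offset) tuples."""
    layers = []
    for item in value.split(","):
        match = LAYER.match(item.strip())
        if match is None:
            raise ValueError("unrecognized layer: " + repr(item))
        count, letter, offset = match.groups()
        layers.append((NAMES[letter.lower()], int(count),
                       int(offset) if offset else 0))
    return layers


print(parse_hw_subset("0s,8c@8,4t"))
```
&lt;/PRE&gt;

&lt;P&gt;For example, parse_hw_subset("0s,8c@8,4t") returns [('sockets', 0, 0), ('cores', 8, 8), ('threads', 4, 0)]. An old-style value such as 8c,4t,8o raises ValueError because "o" is not a layer letter in this syntax.&lt;/P&gt;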

&lt;P&gt;The mpirun.py script generates a command line like this:&lt;/P&gt;

&lt;PRE class="brush:bash;"&gt;mpirun 
-bootstrap ssh 
-genv I_MPI_PIN_DOMAIN=auto 
-genv MIC_USE_2MB_BUFFERS=2m 
-genv MIC_ENV_PREFIX=MIC 
-genv MIC_KMP_AFFINITY=balanced,granularity=fine 
-genv MIC_OMP_SCHEDULE=static 
-genv MIC_OMP_NUM_THREADS=32 
-host node09 
-np 1 
-env KMP_AFFINITY=compact,1,granularity=fine 
-env OMP_SCHEDULE=static 
-env OMP_NUM_THREADS=4 
-env OFFLOAD_DEVICES=0 
&lt;STRONG&gt;-env MIC_KMP_PLACE_THREADS=8c,4t,0o&lt;/STRONG&gt; 
-env MIC_OMP_SCHEDULE=static 
/home/qeuser/libxphi/xphilibwrapper.sh  
/home/qeuser/QE530-KNC-OL/espresso-5.3.0/bin/pw.x  &amp;lt; /home/qeuser/rolly/AUSURF112/ausurf.in : 
-host node09 
-np 1 
-env KMP_AFFINITY=compact,1,granularity=fine 
-env OMP_SCHEDULE=static 
-env OMP_NUM_THREADS=4 
-env OFFLOAD_DEVICES=0 
&lt;STRONG&gt;-env MIC_KMP_PLACE_THREADS=8c,4t,8o&lt;/STRONG&gt; 
-env MIC_OMP_SCHEDULE=static 
/home/qeuser/libxphi/xphilibwrapper.sh  
/home/qeuser/QE530-KNC-OL/espresso-5.3.0/bin/pw.x  &amp;lt; /home/qeuser/rolly/AUSURF112/ausurf.in :
...
-host node09 
-np 1 
-env KMP_AFFINITY=compact,1,granularity=fine 
-env OMP_SCHEDULE=static 
-env OMP_NUM_THREADS=4 
-env OFFLOAD_DEVICES=1 
&lt;STRONG&gt;-env MIC_KMP_PLACE_THREADS=8c,4t,48o&lt;/STRONG&gt; 
-env MIC_OMP_SCHEDULE=static 
/home/qeuser/libxphi/xphilibwrapper.sh  
/home/qeuser/QE530-KNC-OL/espresso-5.3.0/bin/pw.x  &amp;lt; /home/qeuser/rolly/AUSURF112/ausurf.in &lt;/PRE&gt;

&lt;P&gt;So, is it correct to change the environment variable MIC_KMP_PLACE_THREADS to MIC_KMP_HW_SUBSET? Should it look like this?&lt;/P&gt;

&lt;PRE class="brush:bash;"&gt;-host node09 
-np 1 
-env KMP_AFFINITY=compact,1,granularity=fine 
-env OMP_SCHEDULE=static 
-env OMP_NUM_THREADS=4 
-env OFFLOAD_DEVICES=0 
&lt;STRONG&gt;-env MIC_KMP_HW_SUBSET=0s,8c@0,4t&lt;/STRONG&gt; 
-env MIC_OMP_SCHEDULE=static 
/home/qeuser/libxphi/xphilibwrapper.sh  
/home/qeuser/QE530-KNC-OL/espresso-5.3.0/bin/pw.x  &amp;lt; /home/qeuser/rolly/AUSURF112/ausurf.in :
-host node09 
-np 1 
-env KMP_AFFINITY=compact,1,granularity=fine 
-env OMP_SCHEDULE=static 
-env OMP_NUM_THREADS=4 
-env OFFLOAD_DEVICES=0 
-env MIC_KMP_HW_SUBSET=0s,8c@8,4t 
-env MIC_OMP_SCHEDULE=static 
/home/qeuser/libxphi/xphilibwrapper.sh  
/home/qeuser/QE530-KNC-OL/espresso-5.3.0/bin/pw.x  &amp;lt; /home/qeuser/rolly/AUSURF112/ausurf.in :
...
-host node09 
-np 1 
-env KMP_AFFINITY=compact,1,granularity=fine 
-env OMP_SCHEDULE=static 
-env OMP_NUM_THREADS=4 
-env OFFLOAD_DEVICES=1 
-env MIC_KMP_HW_SUBSET=0s@1,8c@48,4t 
-env MIC_OMP_SCHEDULE=static 
/home/qeuser/libxphi/xphilibwrapper.sh  
/home/qeuser/QE530-KNC-OL/espresso-5.3.0/bin/pw.x  &amp;lt; /home/qeuser/rolly/AUSURF112/ausurf.in :&lt;/PRE&gt;

&lt;P&gt;MIC_KMP_PLACE_THREADS also places each process's threads on a different range of cores via the "o" offset designator. How can I do the same with MIC_KMP_HW_SUBSET? And is MIC_KMP_HW_SUBSET the correct environment variable?&lt;/P&gt;
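&lt;P&gt;Warning #250 itself names the replacement: the offset moves from a separate "o" item to an "@" suffix. The following Python 3 sketch applies that rewriting to the three-item values mpirun.py generates, so that, for example, 8c,4t,8o becomes 8c@8,4t. It assumes that the old "o" offset counts cores to skip, which is how the generated values use it (each process gets its own block of 8 cores). Other KMP_PLACE_THREADS forms are not handled.&lt;/P&gt;

&lt;PRE class="brush:python;"&gt;
```python
def place_threads_to_hw_subset(value):
    """Rewrite a KMP_PLACE_THREADS value such as "8c,4t,8o" as "8c@8,4t".

    Assumption: the value holds a core count (c), a thread count (t) and an
    optional core offset (o), as in the values mpirun.py generates.
    """
    cores = threads = None
    offset = 0
    for item in value.lower().split(","):
        item = item.strip()
        count, unit = int(item[:-1]), item[-1]
        if unit == "c":
            cores = count
        elif unit == "t":
            threads = count
        elif unit == "o":
            offset = count
        else:
            raise ValueError("unsupported item: " + repr(item))
    if cores is None or threads is None:
        raise ValueError("expected both a core count and a thread count")
    # The old core offset becomes an "@offset" on the cores layer.
    return "{}c@{},{}t".format(cores, offset, threads)


for old in ("8c,4t,0o", "8c,4t,8o", "8c,4t,48o"):
    print(old, "becomes", place_threads_to_hw_subset(old))
```
&lt;/PRE&gt;

&lt;P&gt;The three generated values map to 8c@0,4t, 8c@8,4t and 8c@48,4t, which are the core and thread parts of the values proposed above.&lt;/P&gt;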

&lt;P&gt;&lt;SPAN style="font-size: 13.008px;"&gt;Thanks,&lt;/SPAN&gt;&lt;/P&gt;

&lt;P&gt;&lt;SPAN style="font-size: 13.008px;"&gt;Rolly&lt;/SPAN&gt;&lt;/P&gt;</description>
      <pubDate>Tue, 28 Mar 2017 09:11:00 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Archive/Failed-to-explicit-offload-on-Xeon-Phi-KNC/m-p/1116884#M74422</guid>
      <dc:creator>Rolly_N_</dc:creator>
      <dc:date>2017-03-28T09:11:00Z</dc:date>
    </item>
    <item>
      <title>Hi all,</title>
      <link>https://community.intel.com/t5/Software-Archive/Failed-to-explicit-offload-on-Xeon-Phi-KNC/m-p/1116885#M74423</link>
      <description>&lt;P&gt;Hi all,&lt;/P&gt;

&lt;P&gt;I was able to eliminate the first two warnings by modifying mpirun.py as follows. Could anyone check whether these changes are correct? Thank you!&lt;/P&gt;

&lt;PRE class="brush:python;"&gt;#! /usr/bin/env python
###############################################################################
## Copyright (c) 2014-2015, Intel Corporation                                ##
## All rights reserved.                                                      ##
##                                                                           ##
## Redistribution and use in source and binary forms, with or without        ##
## modification, are permitted provided that the following conditions        ##
## are met:                                                                  ##
## 1. Redistributions of source code must retain the above copyright         ##
##    notice, this list of conditions and the following disclaimer.          ##
## 2. Redistributions in binary form must reproduce the above copyright      ##
##    notice, this list of conditions and the following disclaimer in the    ##
##    documentation and/or other materials provided with the distribution.   ##
## 3. Neither the name of the copyright holder nor the names of its          ##
##    contributors may be used to endorse or promote products derived        ##
##    from this software without specific prior written permission.          ##
##                                                                           ##
## THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS       ##
## "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT         ##
## LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR     ##
## A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT      ##
## HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,    ##
## SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED  ##
## TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR    ##
## PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF    ##
## LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING      ##
## NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS        ##
## SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.              ##
###############################################################################
## Hans Pabst, Christopher Dahnken, Intel Corporation                        ##
###############################################################################
import platform
import sys
import os

try:
    import argparse
except ImportError, e:
    print 'Failed to import the argparse module: ', e
    print 'Please follow the troubleshooting instructions at:'
    print 'https://github.com/hfp/mpirun#troubleshooting'
    sys.exit(1)  # the script cannot continue without argparse


def micshift(i, rcores, ncores, nthreads):
    return str(max(ncores + min(0, rcores), nthreads)) + "c@" + str(i * ncores) + "," + str(nthreads) + "t"  


nodename = platform.node().split(".")[0]
micenv = "MIC"

parser = argparse.ArgumentParser()
parser.add_argument("-n", "--nodelist", help="list of comma separated node names", default=["localhost", nodename]["" != nodename])
parser.add_argument("-p", "--cpuprocs", help="number of processes per socket (host)", type=int, default=1)
parser.add_argument("-q", "--micprocs", help="number of processes per mic (native)", type=int, default=1)
parser.add_argument("-s", "--nsockets", help="number of sockets per node", type=int, default=1)
parser.add_argument("-d", "--ndevices", help="number of devices per node", type=int, default=0)
parser.add_argument("-e", "--cpucores", help="number of CPU cores per socket", type=int, default=1)
parser.add_argument("-t", "--nthreads", help="number of CPU threads per core", type=int, default=2)
parser.add_argument("-m", "--miccores", help="number of MIC cores per device", type=int, default=57)
parser.add_argument("-r", "--reserved", help="number of MIC cores reserved", type=int, default=sys.maxint)
parser.add_argument("-u", "--mthreads", help="number of MIC threads per core", type=int, default=4)
parser.add_argument("-a", "--cpuaffinity", help="affinity (CPU) e.g., compact", default="compact")
parser.add_argument("-b", "--micaffinity", help="affinity (MIC) e.g., balanced", default="balanced")
parser.add_argument("-c", "--schedule", help="schedule, e.g. dynamic", default="static")
parser.add_argument("-w", "--wrapper", help="wrapper")
parser.add_argument("-g", "--debugger", help="debugger")
parser.add_argument("-i", "--inputfile", help="inputfile (&amp;lt;)")
parser.add_argument("-0", "--hr0", help="executable (rank-0)")
parser.add_argument("-x", "--hri", help="executable (host)")
parser.add_argument("-y", "--mri", help="executable (mic)")
parser.add_argument("-z", "--micpre", help="prefixed mic name", action="store_true")
parser.add_argument("-v", "--dryrun", help="dryrun", action="store_true")
args, unknown = parser.parse_known_args()

if (None != args.inputfile):
    arguments = " ".join(unknown) + " &amp;lt; " + args.inputfile
else:
    arguments = " ".join(unknown)
if ("" != arguments): arguments = " " + arguments

if (None != args.wrapper):
    wrapper = args.wrapper + [" ", " " + str(args.debugger) + " -ex run --args"][None != args.debugger]
else:
    wrapper = ["", str(args.debugger) + " -ex run --args"][None != args.debugger]
if ("" != wrapper): wrapper = wrapper + " "

if (1 &amp;lt; args.nsockets): args.cpuaffinity = args.cpuaffinity + ",1"
if (1 &amp;lt; args.nthreads): args.cpuaffinity = args.cpuaffinity + ",granularity=fine"

if (None == args.mri):
    if (sys.maxint == args.reserved): args.reserved = 1
    if (0 &amp;lt; args.ndevices):
        cpusockets = min(args.nsockets, args.ndevices)
    else:
        cpusockets = args.nsockets
    nparts = args.cpuprocs
else:
    if (sys.maxint == args.reserved): args.reserved = 0
    cpusockets = args.nsockets
    nparts = args.micprocs

cputhreads = cpusockets * args.cpucores * args.nthreads
mcores = int(0 &amp;lt; nparts) * max((args.miccores - max(args.reserved, 0)), nparts) / max(nparts, 1)
pthreads = int(0 &amp;lt; (args.cpuprocs * cpusockets)) * cputhreads / max(args.cpuprocs * cpusockets, 1)
remainder = cputhreads - pthreads * args.cpuprocs * cpusockets


runstring = "mpirun -bootstrap ssh"
if (None == args.mri): runstring = runstring \
                    + " -genv I_MPI_PIN_DOMAIN=auto" \
                    + " -genv MIC_USE_2MB_BUFFERS=2m" \
                    + " -genv MIC_ENV_PREFIX=" + micenv \
                    + " -genv " + micenv + "_KMP_AFFINITY=" + args.micaffinity + ",granularity=fine" \
                    + " -genv " + micenv + "_OMP_SCHEDULE=" + args.schedule \
                    + " -genv " + micenv + "_OMP_NUM_THREADS=" + str(max((mcores + min(0, args.reserved)) * 4, args.mthreads))
else: runstring = runstring \
                    + " -genv I_MPI_MIC=1"
if (None != args.hr0): runstring = runstring \
                    + " -host " + args.nodelist.split(",")[0] + " -np 1" \
                    + " -env I_MPI_PIN_DOMAIN=auto" \
                    + " -env KMP_AFFINITY=" + args.cpuaffinity \
                    + " -env OMP_NUM_THREADS=" + str(args.nthreads) \
                    + " " + wrapper + args.hr0 + arguments \
                    + " :"
for n in args.nodelist.split(","):
    for s in range(0, cpusockets):
        for p in range(0, args.cpuprocs):
            runstring = runstring \
                    + " -host " + n + " -np 1" \
                    + " -env KMP_AFFINITY=" + args.cpuaffinity \
                    + " -env OMP_SCHEDULE=" + args.schedule \
                    + " -env OMP_NUM_THREADS=" + str(pthreads - [0, args.nthreads][None != args.hr0 and 0 == s and 0 == p])
            if (None == args.mri): runstring = runstring \
                    + " -env OFFLOAD_DEVICES=" + str(s) \
                    + " -env " + micenv + "_KMP_HW_SUBSET=" + str(s) + "s," + micshift(p, args.reserved, mcores, args.mthreads) \
                    + " -env " + micenv + "_OMP_SCHEDULE=" + args.schedule
            else: runstring = runstring \
                    + " -env I_MPI_PIN_DOMAIN=auto"
            if (None != args.hri): runstring = runstring \
                    + " " + wrapper + args.hri + arguments \
                    + " :"
    if (None != args.mri):
        for d in range(0, args.ndevices):
            for m in range(0, args.micprocs):
                runstring = runstring \
                    + " -host " + ["", n + "-"][int(args.micpre)] + "mic" + str(d) + " -np 1" \
                    + " -env LD_LIBRARY_PATH=$MIC_LD_LIBRARY_PATH" \
                    + " -env I_MPI_PIN=off" \
                    + " -env KMP_HW_SUBSET=" + str(d) + "s," + micshift(m, args.reserved, mcores, args.mthreads) \
                    + " -env KMP_AFFINITY=" + args.micaffinity + ",granularity=fine" \
                    + " -env OMP_SCHEDULE=" + args.schedule \
                    + " -env OMP_NUM_THREADS=" + str(max(mcores + min(0, args.reserved), args.mthreads) * 4) \
                    + " " + args.mri + arguments \
                    + " :"
        if (None != args.hri and 0 &amp;lt; remainder and 0 &amp;lt; cputhreads and 0 &amp;lt; args.cpuprocs): runstring = runstring \
                    + " -host " + n + " -np 1" \
                    + " -env I_MPI_PIN_DOMAIN=auto" \
                    + " -env KMP_AFFINITY=" + args.cpuaffinity \
                    + " -env OMP_SCHEDULE=" + args.schedule \
                    + " -env OMP_NUM_THREADS=" + str(remainder) \
                    + " " + wrapper + args.hri + arguments \
                    + " :"

runstring = runstring[0:len(runstring)-2]
print runstring
print

result = 0
if (False == args.dryrun):
    result = os.system(runstring)

sys.exit([0, 1][0 != result])

&lt;/PRE&gt;
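&lt;P&gt;The modified micshift helper can be checked on its own, without launching mpirun. The function below is copied unchanged from the script above and runs under Python 3. The arguments are illustrative values chosen to match the generated command line: 8 cores per process, 1 reserved core and 4 threads per core. The socket item is prefixed the same way the script builds MIC_KMP_HW_SUBSET.&lt;/P&gt;

&lt;PRE class="brush:python;"&gt;
```python
def micshift(i, rcores, ncores, nthreads):
    # Copied from the modified mpirun.py: ncores cores starting at core
    # i * ncores, with nthreads threads per core.
    return str(max(ncores + min(0, rcores), nthreads)) + "c@" + str(i * ncores) + "," + str(nthreads) + "t"


# Illustrative (socket/device, process) pairs; 1 reserved core, 8 cores, 4 threads.
for s, p in ((0, 0), (0, 1), (1, 6)):
    value = str(s) + "s," + micshift(p, 1, 8, 4)
    print("OFFLOAD_DEVICES=" + str(s), "MIC_KMP_HW_SUBSET=" + value)
```
&lt;/PRE&gt;

&lt;P&gt;This prints 0s,8c@0,4t, 0s,8c@8,4t and 1s,8c@48,4t. The first two match the values proposed earlier in this thread. For the second device the script gives 1s,8c@48,4t, whereas the hand-written proposal used 0s@1,8c@48,4t. In the syntax quoted earlier, the number before S is the socket count and @ introduces an offset, so it may be worth checking which of the two is intended.&lt;/P&gt;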

&lt;P&gt;As for the third warning, "offload error: cannot unload library from the device 0 (error code 14)", can anyone help?&lt;/P&gt;

&lt;P&gt;Thanks,&lt;/P&gt;

&lt;P&gt;Rolly&lt;/P&gt;</description>
      <pubDate>Thu, 30 Mar 2017 02:37:14 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Archive/Failed-to-explicit-offload-on-Xeon-Phi-KNC/m-p/1116885#M74423</guid>
      <dc:creator>Rolly_N_</dc:creator>
      <dc:date>2017-03-30T02:37:14Z</dc:date>
    </item>
  </channel>
</rss>

