<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic MKL/Xeon Phi Offload Runtime Issue - 3120A in Intel® oneAPI Math Kernel Library</title>
    <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/MKL-Xeon-Phi-Offload-Runtime-Issue-3120A/m-p/1170302#M28513</link>
    <description>&lt;P&gt;Hello there&lt;/P&gt;

&lt;P&gt;I have set up my Xeon Phi 3120A on Windows 10 Pro with MPSS 3.8.4 and Parallel XE 2017 (Initial Release). I chose this Parallel XE release because it was the last one to support the x100 series. I have installed the MKL version that is packaged with Parallel XE 2017 (Initial Release).&lt;/P&gt;

&lt;P&gt;&lt;STRONG&gt;&lt;U&gt;What have I done / setup:&lt;/U&gt;&lt;/STRONG&gt;&lt;/P&gt;

&lt;P&gt;After setting up MPSS 3.8.4, and following the steps such as flashing and pinging, I have checked that&amp;nbsp;&lt;SPAN style="font-family: Consolas, &amp;quot;Lucida Console&amp;quot;, Menlo, Monaco, &amp;quot;DejaVu Sans Mono&amp;quot;, monospace, sans-serif; font-size: 1em;"&gt;micctrl -s&amp;nbsp;&lt;/SPAN&gt;&lt;SPAN style="font-size: 1em;"&gt;shows “mic0 ready” (with linux image containing the appropriate KNC name),&amp;nbsp;&lt;/SPAN&gt;&lt;SPAN style="font-family: Consolas, &amp;quot;Lucida Console&amp;quot;, Menlo, Monaco, &amp;quot;DejaVu Sans Mono&amp;quot;, monospace, sans-serif; font-size: 1em;"&gt;miccheck&amp;nbsp;&lt;/SPAN&gt;&lt;SPAN style="font-size: 1em;"&gt;produces all "passes" and&amp;nbsp;&lt;/SPAN&gt;&lt;SPAN style="font-family: Consolas, &amp;quot;Lucida Console&amp;quot;, Menlo, Monaco, &amp;quot;DejaVu Sans Mono&amp;quot;, monospace, sans-serif; font-size: 1em;"&gt;micinfo&amp;nbsp;&lt;/SPAN&gt;&lt;SPAN style="font-size: 1em;"&gt;gives me a reading for all the key stats that the co-processor is providing.&lt;/SPAN&gt;&lt;/P&gt;

&lt;P&gt;Hence to me it looks like the co-processor is certainly installed and being recognised by my computer. I can also see that mic0 is up and running in the&amp;nbsp;&lt;SPAN style="font-family: Consolas, &amp;quot;Lucida Console&amp;quot;, Menlo, Monaco, &amp;quot;DejaVu Sans Mono&amp;quot;, monospace, sans-serif; font-size: 1em;"&gt;micsmc&amp;nbsp;&lt;/SPAN&gt;&lt;SPAN style="font-size: 1em;"&gt;gui.&lt;/SPAN&gt;&lt;/P&gt;

&lt;P&gt;I have then set up my environment variables to enable automatic offload, namely MKL_MIC_ENABLE=1, OFFLOAD_DEVICES=0, MKL_MIC_MAX_MEMORY=2GB, MIC_ENV_PREFIX=MIC, MIC_OMP_NUM_THREADS=228 and MIC_KMP_AFFINITY=balanced.&lt;/P&gt;

&lt;P&gt;&lt;STRONG&gt;&lt;U&gt;The Problem&lt;/U&gt;&lt;/STRONG&gt;&lt;/P&gt;

&lt;P&gt;When I run some simple code in R-3.4.3 (copied below, designed specifically for automatic offload), it keeps running on my host computer rather than on the Xeon Phi. Consistent with this, I cannot see any activity on the Xeon Phi when I look at the &lt;SPAN style="font-family: Consolas, &amp;quot;Lucida Console&amp;quot;, Menlo, Monaco, &amp;quot;DejaVu Sans Mono&amp;quot;, monospace, sans-serif; font-size: 1em;"&gt;micsmc&amp;nbsp;&lt;/SPAN&gt;&lt;SPAN style="font-size: 1em;"&gt;gui.&lt;/SPAN&gt;&lt;/P&gt;

&lt;P&gt;The R code:&lt;/P&gt;

&lt;PRE class="brush:cpp; class-name:dark;"&gt;require(Matrix)
sink("output.txt")
N &amp;lt;- 16000
cat("Initialization...\n")
a &amp;lt;- matrix(runif(N*N), ncol=N, nrow=N);
b &amp;lt;- matrix(runif(N*N), ncol=N, nrow=N);
cat("Matrix-matrix multiplication of size ", N, "x", N, ":\n")
for (i in 1:5) {
  dt=system.time( c &amp;lt;- a %*% b )
  gflops = 2*N*N*N*1e-9/dt[3]
  cat("Trial: ", i, ", time: ", dt[3], " sec, performance: ", gflops, " GFLOP/s\n")
}
sink()  # stop diverting output to output.txt&lt;/PRE&gt;

&lt;P&gt;&lt;STRONG&gt;&lt;U&gt;Other steps I have tried:&lt;/U&gt;&lt;/STRONG&gt;&lt;/P&gt;

&lt;P&gt;I then set the environment variable MKL_MIC_DISABLE_HOST_FALLBACK=1 and, as expected, R terminated when I ran the above code.&lt;/P&gt;

&lt;P&gt;&lt;A href="https://community.intel.com/legacyfs/online/drupal_files/11MIC42_How_to_Use_MKL_Automatic_Offload_0.pdf"&gt;https://software.intel.com/sites/default/files/11MIC42_How_to_Use_MKL_Automatic_Offload_0.pdf&lt;/A&gt;&amp;nbsp;says that if MKL_MIC_DISABLE_HOST_FALLBACK is set and offload is attempted but fails (due to “offload runtime cannot find a coprocessor or cannot initialize it properly”), the program terminates. That matches what I see: R terminates completely. For completeness, the same problem occurs with R-3.5.1, Microsoft R Open 3.5.0 and R-3.2.1.&lt;/P&gt;

&lt;P&gt;&lt;U&gt;&lt;STRONG&gt;So my questions are:&lt;/STRONG&gt;&lt;/U&gt;&lt;/P&gt;

&lt;OL&gt;
	&lt;LI&gt;What am I missing to make the R code run on the Xeon Phi? Can you please advise me on what I need to do to make this work?&lt;/LI&gt;
	&lt;LI&gt;(Linked to 1) Is there a way to check whether the MKL offload runtime can see the Xeon Phi, whether it is correctly set up, or what problem (if any) MKL is having initialising it?&lt;/LI&gt;
&lt;/OL&gt;

&lt;P&gt;I would sincerely appreciate your help. I believe that I am missing a fundamental/simple step, and have been tearing my hair out trying to make this work.&lt;/P&gt;

&lt;P&gt;Many thanks in advance,&lt;/P&gt;

&lt;P&gt;Keyur&lt;/P&gt;</description>
    <pubDate>Tue, 14 Aug 2018 07:23:53 GMT</pubDate>
    <dc:creator>kelkar__Keyur1</dc:creator>
    <dc:date>2018-08-14T07:23:53Z</dc:date>
    <item>
      <title>MKL/Xeon Phi Offload Runtime Issue - 3120A</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/MKL-Xeon-Phi-Offload-Runtime-Issue-3120A/m-p/1170302#M28513</link>
      <description>&lt;P&gt;Hello there&lt;/P&gt;

&lt;P&gt;I have set up my Xeon Phi 3120A on Windows 10 Pro with MPSS 3.8.4 and Parallel XE 2017 (Initial Release). I chose this Parallel XE release because it was the last one to support the x100 series. I have installed the MKL version that is packaged with Parallel XE 2017 (Initial Release).&lt;/P&gt;

&lt;P&gt;&lt;STRONG&gt;&lt;U&gt;What have I done / setup:&lt;/U&gt;&lt;/STRONG&gt;&lt;/P&gt;

&lt;P&gt;After setting up MPSS 3.8.4, and following the steps such as flashing and pinging, I have checked that&amp;nbsp;&lt;SPAN style="font-family: Consolas, &amp;quot;Lucida Console&amp;quot;, Menlo, Monaco, &amp;quot;DejaVu Sans Mono&amp;quot;, monospace, sans-serif; font-size: 1em;"&gt;micctrl -s&amp;nbsp;&lt;/SPAN&gt;&lt;SPAN style="font-size: 1em;"&gt;shows “mic0 ready” (with linux image containing the appropriate KNC name),&amp;nbsp;&lt;/SPAN&gt;&lt;SPAN style="font-family: Consolas, &amp;quot;Lucida Console&amp;quot;, Menlo, Monaco, &amp;quot;DejaVu Sans Mono&amp;quot;, monospace, sans-serif; font-size: 1em;"&gt;miccheck&amp;nbsp;&lt;/SPAN&gt;&lt;SPAN style="font-size: 1em;"&gt;produces all "passes" and&amp;nbsp;&lt;/SPAN&gt;&lt;SPAN style="font-family: Consolas, &amp;quot;Lucida Console&amp;quot;, Menlo, Monaco, &amp;quot;DejaVu Sans Mono&amp;quot;, monospace, sans-serif; font-size: 1em;"&gt;micinfo&amp;nbsp;&lt;/SPAN&gt;&lt;SPAN style="font-size: 1em;"&gt;gives me a reading for all the key stats that the co-processor is providing.&lt;/SPAN&gt;&lt;/P&gt;

&lt;P&gt;Hence to me it looks like the co-processor is certainly installed and being recognised by my computer. I can also see that mic0 is up and running in the&amp;nbsp;&lt;SPAN style="font-family: Consolas, &amp;quot;Lucida Console&amp;quot;, Menlo, Monaco, &amp;quot;DejaVu Sans Mono&amp;quot;, monospace, sans-serif; font-size: 1em;"&gt;micsmc&amp;nbsp;&lt;/SPAN&gt;&lt;SPAN style="font-size: 1em;"&gt;gui.&lt;/SPAN&gt;&lt;/P&gt;

&lt;P&gt;I have then set up my environment variables to enable automatic offload, namely MKL_MIC_ENABLE=1, OFFLOAD_DEVICES=0, MKL_MIC_MAX_MEMORY=2GB, MIC_ENV_PREFIX=MIC, MIC_OMP_NUM_THREADS=228 and MIC_KMP_AFFINITY=balanced.&lt;/P&gt;
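
&lt;P&gt;As a minimal sketch of a sanity check (not an official Intel procedure), the R session itself can be asked whether it sees these variables. The variable names are the ones listed above, and Sys.getenv is base R. Values are printed in quotes so that stray whitespace around a value becomes visible. This only confirms what the R process inherited; it does not show whether MKL acts on the values.&lt;/P&gt;

&lt;PRE class="brush:cpp; class-name:dark;"&gt;# Names taken from the list of variables in this post
vars &amp;lt;- c("MKL_MIC_ENABLE", "OFFLOAD_DEVICES", "MKL_MIC_MAX_MEMORY",
          "MIC_ENV_PREFIX", "MIC_OMP_NUM_THREADS", "MIC_KMP_AFFINITY")

# unset = NA distinguishes a missing variable from an empty one
vals &amp;lt;- Sys.getenv(vars, unset = NA)

for (v in vars) {
  if (is.na(vals[[v]])) {
    cat(v, ": NOT SET in this R session\n", sep = "")
  } else {
    cat(v, " = '", vals[[v]], "'\n", sep = "")
  }
}&lt;/PRE&gt;

&lt;P&gt;A process inherits its environment when it starts, so variables changed in the Windows system settings are only visible to an R session launched afterwards.&lt;/P&gt;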

&lt;P&gt;&lt;STRONG&gt;&lt;U&gt;The Problem&lt;/U&gt;&lt;/STRONG&gt;&lt;/P&gt;

&lt;P&gt;When I run some simple code in R-3.4.3 (copied below, designed specifically for automatic offload), it keeps running on my host computer rather than on the Xeon Phi. Consistent with this, I cannot see any activity on the Xeon Phi when I look at the &lt;SPAN style="font-family: Consolas, &amp;quot;Lucida Console&amp;quot;, Menlo, Monaco, &amp;quot;DejaVu Sans Mono&amp;quot;, monospace, sans-serif; font-size: 1em;"&gt;micsmc&amp;nbsp;&lt;/SPAN&gt;&lt;SPAN style="font-size: 1em;"&gt;gui.&lt;/SPAN&gt;&lt;/P&gt;

&lt;P&gt;The R code:&lt;/P&gt;

&lt;PRE class="brush:cpp; class-name:dark;"&gt;require(Matrix)
sink("output.txt")
N &amp;lt;- 16000
cat("Initialization...\n")
a &amp;lt;- matrix(runif(N*N), ncol=N, nrow=N);
b &amp;lt;- matrix(runif(N*N), ncol=N, nrow=N);
cat("Matrix-matrix multiplication of size ", N, "x", N, ":\n")
for (i in 1:5) {
  dt=system.time( c &amp;lt;- a %*% b )
  gflops = 2*N*N*N*1e-9/dt[3]
  cat("Trial: ", i, ", time: ", dt[3], " sec, performance: ", gflops, " GFLOP/s\n")
}
sink()  # stop diverting output to output.txt&lt;/PRE&gt;

&lt;P&gt;&lt;STRONG&gt;&lt;U&gt;Other steps I have tried:&lt;/U&gt;&lt;/STRONG&gt;&lt;/P&gt;

&lt;P&gt;I then set the environment variable MKL_MIC_DISABLE_HOST_FALLBACK=1 and, as expected, R terminated when I ran the above code.&lt;/P&gt;

&lt;P&gt;&lt;A href="https://community.intel.com/legacyfs/online/drupal_files/11MIC42_How_to_Use_MKL_Automatic_Offload_0.pdf"&gt;https://software.intel.com/sites/default/files/11MIC42_How_to_Use_MKL_Automatic_Offload_0.pdf&lt;/A&gt;&amp;nbsp;says that if MKL_MIC_DISABLE_HOST_FALLBACK is set and offload is attempted but fails (due to “offload runtime cannot find a coprocessor or cannot initialize it properly”), the program terminates. That matches what I see: R terminates completely. For completeness, the same problem occurs with R-3.5.1, Microsoft R Open 3.5.0 and R-3.2.1.&lt;/P&gt;

&lt;P&gt;&lt;U&gt;&lt;STRONG&gt;So my questions are:&lt;/STRONG&gt;&lt;/U&gt;&lt;/P&gt;

&lt;OL&gt;
	&lt;LI&gt;What am I missing to make the R code run on the Xeon Phi? Can you please advise me on what I need to do to make this work?&lt;/LI&gt;
	&lt;LI&gt;(Linked to 1) Is there a way to check whether the MKL offload runtime can see the Xeon Phi, whether it is correctly set up, or what problem (if any) MKL is having initialising it?&lt;/LI&gt;
&lt;/OL&gt;

&lt;P&gt;I would sincerely appreciate your help. I believe that I am missing a fundamental/simple step, and have been tearing my hair out trying to make this work.&lt;/P&gt;

&lt;P&gt;Many thanks in advance,&lt;/P&gt;

&lt;P&gt;Keyur&lt;/P&gt;</description>
      <pubDate>Tue, 14 Aug 2018 07:23:53 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/MKL-Xeon-Phi-Offload-Runtime-Issue-3120A/m-p/1170302#M28513</guid>
      <dc:creator>kelkar__Keyur1</dc:creator>
      <dc:date>2018-08-14T07:23:53Z</dc:date>
    </item>
    <item>
      <title>I am not sure how exactly you</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/MKL-Xeon-Phi-Offload-Runtime-Issue-3120A/m-p/1170303#M28514</link>
      <description>&lt;P&gt;I am not sure exactly how you call MKL from the R API, but you could try setting the&amp;nbsp;&lt;SPAN class="fontstyle0"&gt;OFFLOAD_REPORT environment variable to see summary information about data transfers between the host and the target.&lt;/SPAN&gt;&lt;/P&gt;
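
&lt;P&gt;One way to make such a report easier to find on Windows, sketched under the assumption that the offload runtime writes it to the standard output or standard error stream of the process (streams that the Rgui console may not display), is to run the test through Rscript and redirect both streams to files. The script name matmul_test.R and the two output file names are hypothetical, and OFFLOAD_REPORT=2 is only one verbosity level to try.&lt;/P&gt;

&lt;PRE class="brush:cpp; class-name:dark;"&gt;# Hypothetical file name: the R benchmark saved to disk
script &amp;lt;- "matmul_test.R"

# Child processes started from this session inherit these settings
Sys.setenv(MKL_MIC_ENABLE = "1", OFFLOAD_REPORT = "2")

# Run the script in a separate R process and capture both streams
rscript &amp;lt;- file.path(R.home("bin"), "Rscript")
status &amp;lt;- system2(rscript, args = shQuote(script),
                  stdout = "offload_stdout.txt",
                  stderr = "offload_stderr.txt")
cat("Exit status:", status, "\n")

# Print whatever was captured; report lines, if any, should be here
for (f in c("offload_stdout.txt", "offload_stderr.txt")) {
  cat("----", f, "----\n")
  writeLines(readLines(f, warn = FALSE))
}&lt;/PRE&gt;

&lt;P&gt;If both files contain no offload lines, that suggests (but does not prove) that no offload was attempted by the MKL build that R is actually using.&lt;/P&gt;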

</description>
      <pubDate>Thu, 16 Aug 2018 09:02:00 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/MKL-Xeon-Phi-Offload-Runtime-Issue-3120A/m-p/1170303#M28514</guid>
      <dc:creator>Gennady_F_Intel</dc:creator>
      <dc:date>2018-08-16T09:02:00Z</dc:date>
    </item>
    <item>
      <title>Thanks Gennady - I tried</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/MKL-Xeon-Phi-Offload-Runtime-Issue-3120A/m-p/1170304#M28515</link>
      <description>Thanks Gennady - I tried setting OFFLOAD_REPORT=2 but couldn't find the output. Where in Windows would I find it?

Is there anything else in the settings I need to activate to make automatic offload work in Windows? (This question is no longer limited to R.)</description>
      <pubDate>Thu, 16 Aug 2018 09:47:32 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/MKL-Xeon-Phi-Offload-Runtime-Issue-3120A/m-p/1170304#M28515</guid>
      <dc:creator>kelkar__Keyur1</dc:creator>
      <dc:date>2018-08-16T09:47:32Z</dc:date>
    </item>
  </channel>
</rss>

