<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Sabela,  in Software Archive</title>
    <link>https://community.intel.com/t5/Software-Archive/Automatic-offload-not-working-for-R/m-p/1042882#M46723</link>
    <description>&lt;P&gt;Sabela,&amp;nbsp;&lt;/P&gt;

&lt;P&gt;- AO mode is available only for a fairly narrow list of MKL functions, such as BLAS level 3 and the factorization routines (LU, QR, or Cholesky).&lt;/P&gt;

&lt;P&gt;- You should also know that AO mode will not work for "small" problem sizes. Please refer to this article for more information:&lt;/P&gt;

&lt;P&gt;&lt;A href="https://software.intel.com/en-us/articles/intel-mkl-automatic-offload-enabled-functions-for-intel-xeon-phi-coprocessors" target="_blank"&gt;https://software.intel.com/en-us/articles/intel-mkl-automatic-offload-enabled-functions-for-intel-xeon-phi-coprocessors&lt;/A&gt;&lt;/P&gt;

&lt;P&gt;--Gennady&lt;/P&gt;</description>
    <pubDate>Tue, 13 Jan 2015 13:03:37 GMT</pubDate>
    <dc:creator>Gennady_F_Intel</dc:creator>
    <dc:date>2015-01-13T13:03:37Z</dc:date>
    <item>
      <title>Automatic offload not working for R</title>
      <link>https://community.intel.com/t5/Software-Archive/Automatic-offload-not-working-for-R/m-p/1042880#M46721</link>
      <description>&lt;P&gt;Hello,&lt;/P&gt;

&lt;P&gt;I'm trying to run the &lt;A href="http://r.research.att.com/benchmarks/R-benchmark-25.R"&gt;R-benchmark-25.R&lt;/A&gt; using automatic offloading to the Xeon Phi but AO is not working.&lt;/P&gt;

&lt;P&gt;I have compiled my R library as explained &lt;A href="https://software.intel.com/en-us/articles/running-r-with-support-for-intel-xeon-phi-coprocessors"&gt;here&lt;/A&gt; and set the environment variables (MIC_ENABLE=1 and different values for MKL_HOST_WORKDIVISION, MKL_MIC_0_WORKDIVISION, and MKL_MIC_1_WORKDIVISION).&lt;/P&gt;

&lt;P&gt;When running the benchmark, I've checked with micsmc that no code is being offloaded to any of the mics. I guess it is not a system configuration problem because the example sgemm.c from the compiler runs just fine and offloads parts of the code.&lt;/P&gt;

&lt;P&gt;I've also tried and checked the following:&lt;/P&gt;

&lt;P&gt;- Forcing all work to run on the MIC (by setting the workdivision variables).&lt;/P&gt;

&lt;P&gt;- Increasing the size of the variables in the benchmark.&lt;/P&gt;

&lt;P&gt;- Confirming that the executable is linked to the MKL libraries.&lt;/P&gt;

&lt;P&gt;The sw/hw versions that I'm using:&lt;/P&gt;

&lt;UL&gt;
	&lt;LI&gt;Intel composer_xe_2013.3.163&lt;/LI&gt;
	&lt;LI&gt;MPSS 3.4.1&lt;/LI&gt;
	&lt;LI&gt;CentOS 6.4&lt;/LI&gt;
	&lt;LI&gt;two Xeon Phi 5110P&lt;/LI&gt;
	&lt;LI&gt;R 3.1.2&lt;/LI&gt;
&lt;/UL&gt;

&lt;P&gt;Thank you very much for your help,&lt;/P&gt;

&lt;P&gt;Sabela Ramos.&lt;/P&gt;</description>
      <pubDate>Wed, 07 Jan 2015 17:41:24 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Archive/Automatic-offload-not-working-for-R/m-p/1042880#M46721</guid>
      <dc:creator>Sabela_R_</dc:creator>
      <dc:date>2015-01-07T17:41:24Z</dc:date>
    </item>
    <item>
      <title>Sabela, I requested some</title>
      <link>https://community.intel.com/t5/Software-Archive/Automatic-offload-not-working-for-R/m-p/1042881#M46722</link>
      <description>&lt;P&gt;Sabela, I requested some assistance from the MKL team regarding your issue.&lt;/P&gt;</description>
      <pubDate>Tue, 13 Jan 2015 10:34:16 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Archive/Automatic-offload-not-working-for-R/m-p/1042881#M46722</guid>
      <dc:creator>Kevin_D_Intel</dc:creator>
      <dc:date>2015-01-13T10:34:16Z</dc:date>
    </item>
    <item>
      <title>Sabela, </title>
      <link>https://community.intel.com/t5/Software-Archive/Automatic-offload-not-working-for-R/m-p/1042882#M46723</link>
      <description>&lt;P&gt;Sabela,&amp;nbsp;&lt;/P&gt;

&lt;P&gt;- AO mode is available only for a fairly narrow list of MKL functions, such as BLAS level 3 and the factorization routines (LU, QR, or Cholesky).&lt;/P&gt;

&lt;P&gt;- You should also know that AO mode will not work for "small" problem sizes. Please refer to this article for more information:&lt;/P&gt;

&lt;P&gt;&lt;A href="https://software.intel.com/en-us/articles/intel-mkl-automatic-offload-enabled-functions-for-intel-xeon-phi-coprocessors" target="_blank"&gt;https://software.intel.com/en-us/articles/intel-mkl-automatic-offload-enabled-functions-for-intel-xeon-phi-coprocessors&lt;/A&gt;&lt;/P&gt;

&lt;P&gt;--Gennady&lt;/P&gt;</description>
      <pubDate>Tue, 13 Jan 2015 13:03:37 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Archive/Automatic-offload-not-working-for-R/m-p/1042882#M46723</guid>
      <dc:creator>Gennady_F_Intel</dc:creator>
      <dc:date>2015-01-13T13:03:37Z</dc:date>
    </item>
    <item>
      <title>one more tips would be also</title>
      <link>https://community.intel.com/t5/Software-Archive/Automatic-offload-not-working-for-R/m-p/1042883#M46724</link>
      <description>&lt;P&gt;One more tip relevant to this topic: to get detailed information about the offloading process, I usually use the OFFLOAD_REPORT environment variable.&lt;/P&gt;

&lt;P&gt;export OFFLOAD_REPORT=N, where N is 1, 2 or 3.&lt;/P&gt;
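
&lt;P&gt;As a minimal sketch of how this might look in a bash session on the host (the helper name report_level_ok is hypothetical and used only for illustration; MKL_MIC_ENABLE and OFFLOAD_REPORT are the variables mentioned in this thread):&lt;/P&gt;

&lt;PRE class="brush:bash;"&gt;# Hypothetical helper: accept only the report levels 1, 2 or 3.
report_level_ok() {
  case "$1" in
    1|2|3) return 0 ;;
    *)     return 1 ;;
  esac
}

level=2
if report_level_ok "$level"; then
  export MKL_MIC_ENABLE=1         # turn on MKL automatic offload
  export OFFLOAD_REPORT="$level"  # higher levels print more detail
fi

# Show what the next command started from this shell will inherit.
echo "MKL_MIC_ENABLE=$MKL_MIC_ENABLE OFFLOAD_REPORT=$OFFLOAD_REPORT"
&lt;/PRE&gt;

&lt;P&gt;Start R from the same shell afterwards so that it inherits both variables.&lt;/P&gt;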

&lt;P&gt;You can find details on how it works in the Intel compiler documentation.&lt;/P&gt;</description>
      <pubDate>Tue, 13 Jan 2015 13:14:28 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Archive/Automatic-offload-not-working-for-R/m-p/1042883#M46724</guid>
      <dc:creator>Gennady_F_Intel</dc:creator>
      <dc:date>2015-01-13T13:14:28Z</dc:date>
    </item>
    <item>
      <title>Hi, thank you for your</title>
      <link>https://community.intel.com/t5/Software-Archive/Automatic-offload-not-working-for-R/m-p/1042884#M46725</link>
      <description>&lt;P&gt;Hi, thank you for your answers.&lt;/P&gt;

&lt;P&gt;I had tried with the OFFLOAD_REPORT variable set to 1, 2 and 3, but there was no offloading, so it did not report anything.&lt;/P&gt;

&lt;P&gt;Regarding the AO-enabled functions and the sizes: I used the sizes indicated here &lt;A href="http://r.789695.n4.nabble.com/Building-R-for-better-performance-td4686227.html" target="_blank"&gt;http://r.789695.n4.nabble.com/Building-R-for-better-performance-td4686227.html&lt;/A&gt;. The same benchmark was used in this paper &lt;A href="http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=6691695&amp;amp;tag=1" target="_blank"&gt;http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=6691695&amp;amp;tag=1&lt;/A&gt; to assess R performance using one Xeon Phi, and I was trying to replicate those results, but none of the function calls in the benchmark results in AO.&lt;/P&gt;

&lt;P&gt;Thank you very much again,&lt;/P&gt;

&lt;P&gt;Sabela.&lt;/P&gt;</description>
      <pubDate>Tue, 13 Jan 2015 15:22:33 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Archive/Automatic-offload-not-working-for-R/m-p/1042884#M46725</guid>
      <dc:creator>Sabela_R_</dc:creator>
      <dc:date>2015-01-13T15:22:33Z</dc:date>
    </item>
    <item>
      <title>Hi,</title>
      <link>https://community.intel.com/t5/Software-Archive/Automatic-offload-not-working-for-R/m-p/1042885#M46726</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;

&lt;P&gt;I finally managed to make it work, but I had to increase the sizes well beyond the minimum to see the peaks in the micsmc tool, because OFFLOAD_REPORT, even when set to 3, does not show anything. When using "manual offloading", it works. Do you know what might be happening?&lt;/P&gt;

&lt;P&gt;All the best and thank you very much,&lt;/P&gt;

&lt;P&gt;Sabela.&lt;/P&gt;</description>
      <pubDate>Fri, 16 Jan 2015 17:44:11 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Archive/Automatic-offload-not-working-for-R/m-p/1042885#M46726</guid>
      <dc:creator>Sabela_R_</dc:creator>
      <dc:date>2015-01-16T17:44:11Z</dc:date>
    </item>
    <item>
      <title>Same issue. Not able to</title>
      <link>https://community.intel.com/t5/Software-Archive/Automatic-offload-not-working-for-R/m-p/1042886#M46727</link>
      <description>&lt;P&gt;Same issue. I am not able to get automatic offload to work with Revolution R with MKL, despite running a matrix operation on a 32000 x 32000 matrix of floating-point numbers.&lt;/P&gt;</description>
      <pubDate>Mon, 02 Nov 2015 03:52:36 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Archive/Automatic-offload-not-working-for-R/m-p/1042886#M46727</guid>
      <dc:creator>terry_l_</dc:creator>
      <dc:date>2015-11-02T03:52:36Z</dc:date>
    </item>
    <item>
      <title>I was able to exploit</title>
      <link>https://community.intel.com/t5/Software-Archive/Automatic-offload-not-working-for-R/m-p/1042887#M46728</link>
      <description>&lt;P&gt;I was able to exploit automatic offload in R 3.2.3. I found some minor discrepancies with respect to the recipe in the article; perhaps this is the cause of the problem? My experience is summarized below.&lt;/P&gt;

&lt;P&gt;In the current version of MKL (11.3.1), the compilation flags are slightly different than in the &lt;A href="https://software.intel.com/en-us/articles/running-r-with-support-for-intel-xeon-phi-coprocessors"&gt;article&lt;/A&gt; cited by the OP. I was able to figure out the correct flags using the &lt;A href="https://software.intel.com/en-us/articles/intel-mkl-link-line-advisor"&gt;MKL link line advisor&lt;/A&gt;.&lt;/P&gt;

&lt;P&gt;To build R, I used the following configure command arguments:&lt;/P&gt;

&lt;PRE class="brush:bash;"&gt;./configure --with-blas="-L/opt/intel/mkl/lib/intel64 -lmkl_intel_lp64 -lmkl_core -lmkl_intel_thread -lpthread -lm" --with-lapack CC=icc CFLAGS="-O2 -qopenmp -I/opt/intel/mkl/include" CXX=icpc CXXFLAGS="-O2 -qopenmp -I/opt/intel/mkl/include" F77=ifort FFLAGS="-O2 -qopenmp -I/opt/intel/mkl/include" FC=ifort FCFLAGS="-O2 -qopenmp -I/opt/intel/mkl/include" --prefix=/opt/R&lt;/PRE&gt;

&lt;P&gt;After that, I ran "make" and "make install" (the latter as root) and added /opt/R/lib64 to my LD_LIBRARY_PATH and /opt/R/bin to my PATH.&lt;/P&gt;
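
&lt;P&gt;For reference, a minimal sketch of that environment setup in bash, assuming the /opt/R prefix from the configure line above (the variable name R_PREFIX is only illustrative):&lt;/P&gt;

&lt;PRE class="brush:bash;"&gt;# Assumed install prefix; matches --prefix=/opt/R above.
R_PREFIX=/opt/R

# Put the new R first on PATH and make its libraries visible to the loader.
export PATH="$R_PREFIX/bin:$PATH"
export LD_LIBRARY_PATH="$R_PREFIX/lib64${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"

# Print the R binary the shell resolves, or a note if none is found.
command -v R || echo "R not found on PATH"
&lt;/PRE&gt;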

&lt;P&gt;Here is the R program "gemm.R" that I used to test the AO functionality:&lt;/P&gt;

&lt;PRE class="brush:plain;"&gt;require(Matrix)
sink("output.txt")
N &amp;lt;- 16000
cat("Initialization...\n")
a &amp;lt;- matrix(runif(N*N), ncol=N, nrow=N);
b &amp;lt;- matrix(runif(N*N), ncol=N, nrow=N);
cat("Matrix-matrix multiplication of size ", N, "x", N, ":\n")
for (i in 1:5) {
  dt=system.time( c &amp;lt;- a %*% b )
  gflops = 2*N*N*N*1e-9/dt[3]
  cat("Trial: ", i, ", time: ", dt[3], " sec, performance: ", gflops, " GFLOP/s\n")
}
&lt;/PRE&gt;


&lt;P&gt;First, I ran it on the CPU with some performance tuning tweaks:&lt;/P&gt;

&lt;PRE class="brush:bash;"&gt;[andrey@cfx R-test]$ export KMP_AFFINITY=compact,1
[andrey@cfx R-test]$ R -q -f gemm.R
&lt;/PRE&gt;

&lt;P&gt;And the result was:&lt;/P&gt;

&lt;PRE class="brush:bash;"&gt;[avladim@alma-ata R-test]$ cat output.txt 
Initialization...
Matrix-matrix multiplication of size  16000 x 16000 :
Trial:  1 , time:  35.041  sec, performance:  233.7833  GFLOP/s
Trial:  2 , time:  35.135  sec, performance:  233.1578  GFLOP/s
Trial:  3 , time:  34.959  sec, performance:  234.3316  GFLOP/s
Trial:  4 , time:  34.301  sec, performance:  238.8269  GFLOP/s
Trial:  5 , time:  34.297  sec, performance:  238.8547  GFLOP/s
&lt;/PRE&gt;


&lt;P&gt;Second, I set up automatic offload and some tuning:&lt;/P&gt;

&lt;PRE class="brush:bash;"&gt;[andrey@cfx R-test]$ export MKL_MIC_ENABLE=1
[andrey@cfx R-test]$ export MIC_KMP_AFFINITY=compact
[andrey@cfx R-test]$ R -q -f gemm.R
&lt;/PRE&gt;


&lt;P&gt;This time, the output was as below:&lt;/P&gt;

&lt;PRE class="brush:bash;"&gt;[andrey@cfx R-test]$ cat output.txt 
Initialization...
Matrix-matrix multiplication of size  16000 x 16000 :
Trial:  1 , time:  11.728  sec, performance:  698.4993  GFLOP/s
Trial:  2 , time:  7.678  sec, performance:  1066.945  GFLOP/s
Trial:  3 , time:  7.802  sec, performance:  1049.987  GFLOP/s
Trial:  4 , time:  7.715  sec, performance:  1061.828  GFLOP/s
Trial:  5 , time:  7.821  sec, performance:  1047.436  GFLOP/s
&lt;/PRE&gt;
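
&lt;P&gt;The GFLOP/s figures follow from the 2*N*N*N operation count used in gemm.R. A small sketch that reproduces the arithmetic with awk, taking the Trial 1 CPU time and the Trial 2 AO time from the outputs above (the function name gflops is only illustrative):&lt;/P&gt;

&lt;PRE class="brush:bash;"&gt;# gflops N SECONDS: 2*N^3 floating-point operations divided by elapsed time.
gflops() {
  awk -v n="$1" -v t="$2" 'BEGIN { printf "%.1f\n", 2 * n * n * n * 1e-9 / t }'
}

gflops 16000 35.041   # CPU only, Trial 1
gflops 16000 7.678    # automatic offload, Trial 2

# Ratio of the two elapsed times, i.e. the AO speedup for these two trials.
awk 'BEGIN { printf "%.2f\n", 35.041 / 7.678 }'
&lt;/PRE&gt;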


&lt;P&gt;and in the micsmc tool, I could see activity on both of my Xeon Phi cards:&lt;/P&gt;

&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper" image-alt="Screenshot-26_0.png"&gt;&lt;img src="https://community.intel.com/t5/image/serverpage/image-id/7228iEF24309F83152B81/image-size/large?v=v2&amp;amp;px=999&amp;amp;whitelist-exif-data=Orientation%2CResolution%2COriginalDefaultFinalSize%2CCopyright" role="button" title="Screenshot-26_0.png" alt="Screenshot-26_0.png" /&gt;&lt;/span&gt;&lt;/P&gt;


&lt;P&gt;For some reason, the performance numbers that I got with R are around 30% lower than from an equivalent C++ program with the same matrix size (I got 1360 GFLOP/s in C++). Possibly bad alignment in R?&lt;/P&gt;

&lt;P&gt;I have verified that AO kicks in for matrix sizes over 1280, as per the &lt;A href="https://software.intel.com/en-us/articles/intel-mkl-automatic-offload-enabled-functions-for-intel-xeon-phi-coprocessors"&gt;article cited above&lt;/A&gt;.&lt;/P&gt;


&lt;P&gt;My system has a Xeon E5-2630 v2 CPU with enabled hyper-threading and two 3120P Xeon Phis. I am running on CentOS 6.5 with MPSS 3.4.2 and Intel Parallel Studio XE 2016 update 3.&lt;/P&gt;

&lt;P&gt;&lt;STRONG&gt;Shameless plug&lt;/STRONG&gt;: I will be demonstrating this workload and some other tuning aspects of MKL in this webinar next week: &lt;A href="http://colfaxresearch.com/hot-1512/#3"&gt;http://colfaxresearch.com/hot-1512/#3&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Thu, 17 Dec 2015 18:47:09 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Archive/Automatic-offload-not-working-for-R/m-p/1042887#M46728</guid>
      <dc:creator>Andrey_Vladimirov</dc:creator>
      <dc:date>2015-12-17T18:47:09Z</dc:date>
    </item>
    <item>
      <title>Andrey,</title>
      <link>https://community.intel.com/t5/Software-Archive/Automatic-offload-not-working-for-R/m-p/1042888#M46729</link>
      <description>&lt;P&gt;Andrey,&lt;/P&gt;

&lt;P&gt;Thanks for the info, but I was unable (404) to follow your link to the webinar.&lt;/P&gt;

&lt;P&gt;Can you please update it? Thanks.&lt;/P&gt;

&lt;P&gt;Best,&lt;/P&gt;

&lt;P&gt;CB&lt;/P&gt;</description>
      <pubDate>Mon, 19 Sep 2016 22:23:35 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Archive/Automatic-offload-not-working-for-R/m-p/1042888#M46729</guid>
      <dc:creator>Carl_B_</dc:creator>
      <dc:date>2016-09-19T22:23:35Z</dc:date>
    </item>
    <item>
      <title>Hello there</title>
      <link>https://community.intel.com/t5/Software-Archive/Automatic-offload-not-working-for-R/m-p/1042889#M46730</link>
      <description>&lt;P style="word-wrap: break-word; font-size: 12px;"&gt;Hello there&lt;/P&gt;

&lt;P style="word-wrap: break-word; font-size: 12px;"&gt;I have set up my Xeon phi 3120A in Windows 10 Pro, with MPSS 3.8.4 and Parallel XE 2017 (Initial Release). I have chosen this Parallel XE as this was the last supported XE for the x100 series. I have installed the MKL version that is packaged with the Parallel XE 2017 (Initial Release).&lt;/P&gt;

&lt;P style="word-wrap: break-word; font-size: 12px;"&gt;&lt;SPAN style="font-weight: 700;"&gt;What have I done / setup:&lt;/SPAN&gt;&lt;/P&gt;

&lt;P style="word-wrap: break-word; font-size: 12px;"&gt;After setting up MPSS 3.8.4, and following the steps such as flashing and pinging, I have checked that&amp;nbsp;micctrl -s&amp;nbsp;shows “mic0 ready” (with linux image containing the appropriate KNC name),&amp;nbsp;miccheck&amp;nbsp;produces all "passes" and&amp;nbsp;micinfo&amp;nbsp;gives me a reading for all the key stats that the co-processor is providing.&lt;/P&gt;

&lt;P style="word-wrap: break-word; font-size: 12px;"&gt;Hence to me it looks like the co-processor is certainly installed and being recognised by my computer. I can also see that mic0 is up and running in the&amp;nbsp;micsmc&amp;nbsp;gui.&lt;/P&gt;

&lt;P style="word-wrap: break-word; font-size: 12px;"&gt;I have then set up my environment variables to enable automatic offload, namely MKL_MIC_ENABLE=1, OFFLOAD_DEVICES=0, MKL_MIC_MAX_MEMORY=2GB, MIC_ENV_PREFIX=MIC, MIC_OMP_NUM_THREADS=228 and MIC_KMP_AFFINITY=balanced.&lt;/P&gt;

&lt;P style="word-wrap: break-word; font-size: 12px;"&gt;&lt;SPAN style="font-weight: 700;"&gt;The Problem&lt;/SPAN&gt;&lt;/P&gt;

&lt;P style="word-wrap: break-word; font-size: 12px;"&gt;When I run some simple code in R-3.4.3 (copied below, designed specifically for automatic offload), it runs entirely on my host computer rather than on the Xeon Phi. Consistent with this, I cannot see any activity on the Xeon Phi when I look at the micsmc GUI.&lt;/P&gt;

&lt;P style="word-wrap: break-word; font-size: 12px;"&gt;The R code (copied from Andrey's post above):&lt;/P&gt;

&lt;PRE class="brush:plain;"&gt;require(Matrix)
sink("output.txt")
N &amp;lt;- 16000
cat("Initialization...\n")
a &amp;lt;- matrix(runif(N*N), ncol=N, nrow=N);
b &amp;lt;- matrix(runif(N*N), ncol=N, nrow=N);
cat("Matrix-matrix multiplication of size ", N, "x", N, ":\n")
for (i in 1:5) {
  dt=system.time( c &amp;lt;- a %*% b )
  gflops = 2*N*N*N*1e-9/dt[3]
  cat("Trial: ", i, ", time: ", dt[3], " sec, performance: ", gflops, " GFLOP/s\n")
}
&lt;/PRE&gt;

&lt;P style="word-wrap: break-word; font-size: 12px;"&gt;&lt;SPAN style="font-weight: 700;"&gt;Other steps I have tried:&lt;/SPAN&gt;&lt;/P&gt;

&lt;P style="word-wrap: break-word; font-size: 12px;"&gt;I then set the MKL_MIC_DISABLE_HOST_FALLBACK=1 environment variable, and as expected, when I ran the above code, R terminated.&lt;/P&gt;

&lt;P style="word-wrap: break-word; font-size: 12px;"&gt;The document at &lt;A href="https://community.intel.com/legacyfs/online/drupal_files/11MIC42_How_to_Use_MKL_Automatic_Offload_0.pdf"&gt;https://software.intel.com/sites/default/files/11MIC42_How_to_Use_MKL_Automatic_Offload_0.pdf&lt;/A&gt; says that if the MKL_MIC_DISABLE_HOST_FALLBACK flag is set and offload is attempted but fails (due to “offload runtime cannot find a coprocessor or cannot initialize it properly”), the program terminates. This matches what I see: R terminates completely. For completeness, the same problem occurs with R-3.5.1, Microsoft R Open 3.5.0 and R-3.2.1 as well.&lt;/P&gt;

&lt;P style="word-wrap: break-word; font-size: 12px;"&gt;&lt;SPAN style="font-weight: 700;"&gt;So my questions are:&lt;/SPAN&gt;&lt;/P&gt;

&lt;OL style="color: rgb(96, 96, 96); font-size: 13.008px;"&gt;
	&lt;LI&gt;What am I missing to make the R code run on the Xeon Phi? Can you please advise me on what I need to do to make this work?&lt;/LI&gt;
	&lt;LI&gt;(Linked to 1) Is there a way to check whether the MKL offload runtime can see the Xeon Phi, whether it is set up correctly, or what problem (if any) MKL is having initialising it?&lt;/LI&gt;
&lt;/OL&gt;

&lt;P style="word-wrap: break-word; font-size: 12px;"&gt;Will sincerely appreciate your help – I believe that I am missing a fundamental/simple step, and have been tearing my hair out trying to make this work.&lt;/P&gt;

&lt;P style="word-wrap: break-word; font-size: 12px;"&gt;Many thanks in advance,&lt;/P&gt;

&lt;P style="word-wrap: break-word; font-size: 12px;"&gt;Keyur&lt;/P&gt;</description>
      <pubDate>Wed, 15 Aug 2018 05:05:06 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Archive/Automatic-offload-not-working-for-R/m-p/1042889#M46730</guid>
      <dc:creator>kelkar__Keyur1</dc:creator>
      <dc:date>2018-08-15T05:05:06Z</dc:date>
    </item>
  </channel>
</rss>

