MKL/Xeon Phi Offload Runtime Issue - 3120A

kelkar__Keyur1 · ‎08-14-2018

Hello there

I have set up my Xeon phi 3120A in Windows 10 Pro, with MPSS 3.8.4 and Parallel XE 2017 (Initial Release). I have chosen this Parallel XE as this was the last supported XE for the x100 series. I have installed the MKL version that is packaged with the Parallel XE 2017 (Initial Release).

What have I done / setup:

After setting up MPSS 3.8.4, and following the steps such as flashing and pinging, I have checked that micctrl -s shows “mic0 ready” (with linux image containing the appropriate KNC name), miccheck produces all "passes" and micinfo gives me a reading for all the key stats that the co-processor is providing.

Hence to me it looks like the co-processor is certainly installed and being recognised by my computer. I can also see that mic0 is up and running in the micsmc gui.

I have then set up my environment variables to enable automatic offload, namely, MKL_MIC_ENABLE=1, OFFLOAD_DEVICES= 0, MKL_MIC_MAX_MEMORY= 2GB, MIC_ENV_PREFIX= MIC, MIC_OMP_NUM_THREADS= 228, MIC_KMP_AFFINITY= balanced.

The Problem

When I go to run some simple code in R-3.4.3 (copied below, designed specifically for automatic offload), it keeps running the code through my host computer rather than running anything through the Xeon phi. To support this, I cannot see any activity onthe xeon Phis when I look at the micsmc gui.

The R code:

require(Matrix)
sink("output.txt")
N <- 16000
cat("Initialization...\n")
a <- matrix(runif(N*N), ncol=N, nrow=N);
b <- matrix(runif(N*N), ncol=N, nrow=N);
cat("Matrix-matrix multiplication of size ", N, "x", N, ":\n")
for (i in 1:5) {
  dt=system.time( c <- a %*% b )
  gflops = 2*N*N*N*1e-9/dt[3]
  cat("Trial: ", i, ", time: ", dt[3], " sec, performance: ", gflops, " GFLOP/s\n")
}

Other steps I have tried:

I then proceeded to set up the MKL_MIC_DISABLE_HOST_FALLBACK=1 environmental variable, and as expected, when I ran the above code, R terminated.

In https://software.intel.com/sites/default/files/11MIC42_How_to_Use_MKL_Automatic_Offload_0.pdf it says that if the HOST_FALLBACK flag is active and offload is attempted but fails (due to “offload runtime cannot find a coprocessor or cannot initialize it properly”), it will terminate the program – this is happening in that R is terminating completely. For completeness, this problem is happening on R-3.5.1, Microsoft R Open 3.5.0 and R-3.2.1 as well.

So my questions are:

What am I missing to make the R code run on the Xeon phi? Can you please advise me on what I need to do to make this work?
(linked to 1) is there a way to check if the MKL offload runtime can see the Xeon phi? Or that it is correctly set up, or what (if any) problem that MKL is having initialising the Xeon phi?

Will sincerely appreciate your help – I believe that I am missing a fundamental/simple step, and have been tearing my hair out trying to make this work.

Many thanks in advance,

Keyur

Gennady_F_Intel · ‎08-16-2018

I am not sure how exactly you do the call MKL from R API, but you try to use OFFLOAD_REPORT environment variable and see summary information about data transfers between the host and the target.

kelkar__Keyur1 · ‎08-16-2018

Thanks Gennady - I tried setting OFFLOAD_REPORT=2 but couldnt find the output. Where in Windows would i find it? Is there anything else in the settings i need to activate to make auto offload work in Windows? (not just limiting this question to R now)