Xeon Phi wrong behavior

moises_v_ · ‎03-07-2014

Hi,

I'm seeing new 'strange' behavior from an OpenCL kernel on the Xeon Phi.

In this case, I have a small example written in HPL that executes perfectly on the CPU and the GPU, but not on the Xeon Phi.

I've attached the example as a .cpp file. You can download the HPL library to test it, or you can reproduce it with plain OpenCL (if you need the OpenCL code, please ask me). The problem is in the following loop:

inline Double ipow(Double aa, Int a, Int b)
{
  Double q, qaux;
  Int n, n2;
  Int two_pow = 0;
  q = aa;
  n = a;
  while_(two_pow < 100) {
    n2 = n / 2;
    if_(n2 * 2 == n) {
      qaux = 1.5*q;
      q = qaux;
      n = n2;
    }
    else_ {
      n = n * b;
      two_pow = 200;
    }
//////// UNCOMMENT THIS LINE AND THE PROBLEM WILL BE SOLVED :(
//    two_pow++;
  }

  return q;
}

As I wrote above, if you uncomment that (superfluous) line, the code executes correctly on all three platforms (Intel CPU, NVIDIA GPU and Xeon Phi), but with the line commented out the code fails on the Xeon Phi. The global and local work sizes are {1,1,1}.

Why does it fail on the Xeon Phi? Thank you so much.

Moisés Viñas

http://gac.udc.es/~moises/index_en.html

Yuri_K_Intel · ‎03-13-2014

Hi Moisés,

I can't compile the attached reproducer for Xeon Phi; I get a compilation error for the 'ACCELERATOR' macro. I guess this is because the public version of HPL doesn't support the accelerator device type. Could you please provide a more recent version?

Meanwhile, on the CPU I got:

2.96439e-323 51.2578 51.2578 51.2578 ...

The first value fluctuates slightly from run to run. Is the output correct?

Thanks,
Yuri

moises_v_ · ‎03-13-2014

Hi Yuri,

Yes, the public version is outdated :-/ But the problem goes away if you replace the following files in the /src subfolder with these ones: CLbinding.cpp, Device.h and Device.cpp

The expected result is a = 1.5^7 x 3 (about 51.2578) for i != 0. For i = 0 the value is random. However, on the Xeon Phi the value is 1.5 x 3. Why?
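To make the two values concrete, here is a minimal host-side sketch of the same loop in plain C++ (not HPL). The inputs aa = 3.0, a = 128 and b = 3 are assumptions, chosen only because they yield 1.5^7 x 3; the real arguments are in the attached reproducer. The name ipow_ref and the max_halvings parameter are hypothetical and exist only for this illustration.

```cpp
#include <cassert>

// Plain C++ transcription of the ipow() loop, for checking values on the host.
// max_halvings is a hypothetical knob (not in the original code) that leaves
// the loop early, to mimic a device that exits after too few iterations.
// A nonzero 32-bit int can be halved at most 31 times, so the default cap of
// 64 is never reached for a != 0; it only guards a == 0, which is always even
// and would otherwise keep this loop running forever.
double ipow_ref(double aa, int a, int b, int max_halvings = 64)
{
    assert(max_halvings >= 0);
    double q = aa;
    int n = a;
    int two_pow = 0;
    int halvings = 0;
    while (two_pow < 100 && halvings < max_halvings) {
        int n2 = n / 2;
        if (n2 * 2 == n) {   // n is even: scale q and halve n
            q = 1.5 * q;
            n = n2;
            ++halvings;
        } else {             // n is odd: same exit path as the original
            n = n * b;
            two_pow = 200;
        }
    }
    return q;
}
```

With the assumed inputs, ipow_ref(3.0, 128, 3) returns 51.2578125 (1.5^7 x 3), while ipow_ref(3.0, 128, 3, 1) returns 4.5 (1.5 x 3). The second value is what one would get if the loop were left after a single pass, which is consistent with the Xeon Phi result but does not by itself identify the cause.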

Thanks in advance,

Moisés Viñas

http://gac.udc.es/~moises/index_en.html

Yuri_K_Intel · ‎03-14-2014

Ok, now it's working for me and I was able to reproduce the issue using the latest public release (XE R3). But it works correctly on our internal development version, so please expect a fix in the next release (no estimate for the date, as usual). If time permits, I will also try to add this reproducer to our test system to make sure no regression is introduced.

Thanks,
Yuri

Dave_O_ · ‎04-01-2014

Yuri, my Xeon Phi behaves strangely with a simple convolution kernel. I believe the results it produces are incorrect. Is there a known problem with OpenCL support on the Xeon Phi?

For kernel enqueue:

global size set to 1024x1024

local size set to NULL.

Thanks

//KERNEL_SIMPLE
__kernel void Convolve(const __global float * pInput,
                       __constant float * pFilter,
                       __global float * pOutput,
                       const int nInWidth,
                       const int nFilterWidth)
{
    const int nWidth = get_global_size(0);

    const int xOut = get_global_id(0);
    const int yOut = get_global_id(1);

    const int xInTopLeft = xOut;
    const int yInTopLeft = yOut;

    float sum = 0;
    for (int r = 0; r < nFilterWidth; r++)
    {
        const int idxFtmp = r * nFilterWidth;

        const int yIn = yInTopLeft + r;
        const int idxIntmp = yIn * nInWidth + xInTopLeft;

        for (int c = 0; c < nFilterWidth; c++)
        {
            const int idxF = idxFtmp + c;
            const int idxIn = idxIntmp + c;
            sum += pFilter[idxF]*pInput[idxIn];
        }
    }
    const int idxOut = yOut * nWidth + xOut;
    pOutput[idxOut] = sum;
}
//KERNEL_SIMPLE
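
One way to check whether the device results really are incorrect is to compare them against a serial host-side reference. The following is a minimal C++ sketch that mirrors the kernel's indexing. The function names convolve_ref and max_abs_diff are hypothetical, and the sketch assumes the input buffer is padded to at least (output width + filter width - 1) columns and (output height + filter width - 1) rows, since the kernel reads past the output extent by that much.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Serial reference for the Convolve kernel, using the same index arithmetic.
// Assumption: the input is padded so that every read stays in bounds.
std::vector<float> convolve_ref(const std::vector<float>& input,
                                const std::vector<float>& filter,
                                int inWidth, int filterWidth,
                                int outWidth, int outHeight)
{
    const std::size_t inW = static_cast<std::size_t>(inWidth);
    const std::size_t fW = static_cast<std::size_t>(filterWidth);
    const std::size_t outW = static_cast<std::size_t>(outWidth);
    const std::size_t outH = static_cast<std::size_t>(outHeight);

    assert(inW >= outW + fW - 1);
    assert(filter.size() >= fW * fW);
    assert(input.size() >= inW * (outH + fW - 1));

    std::vector<float> output(outW * outH);
    for (std::size_t yOut = 0; yOut < outH; ++yOut) {
        for (std::size_t xOut = 0; xOut < outW; ++xOut) {
            float sum = 0.0f;
            for (std::size_t r = 0; r < fW; ++r) {
                for (std::size_t c = 0; c < fW; ++c) {
                    sum += filter[r * fW + c] *
                           input[(yOut + r) * inW + (xOut + c)];
                }
            }
            output[yOut * outW + xOut] = sum;
        }
    }
    return output;
}

// Largest absolute difference between two result buffers of equal size.
float max_abs_diff(const std::vector<float>& a, const std::vector<float>& b)
{
    assert(a.size() == b.size());
    float worst = 0.0f;
    for (std::size_t i = 0; i < a.size(); ++i) {
        worst = std::max(worst, std::fabs(a[i] - b[i]));
    }
    return worst;
}
```

For the 1024x1024 global size above, outWidth and outHeight would both be 1024. The device may round or order the floating-point sums differently, so comparing max_abs_diff against a small tolerance is more meaningful than testing for exact equality.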