OpenCL* for CPU

Xeon Phi wrong behavior

moises_v_
Beginner

Hi,

I'm seeing another strange behavior with an OpenCL kernel on the Xeon Phi.

In this case, I have a small example written in HPL that executes perfectly on the CPU and the GPU, but not on the Xeon Phi.

I've attached the example as a .cpp file. You can download the HPL library to test it, or you can reproduce it with plain OpenCL (if you need the OpenCL code, please ask me). The problem is in the following loop:

inline Double ipow(Double aa, Int a, Int b)
{
  Double q, qaux;
  Int n, n2;
  Int two_pow = 0;
  q = aa;
  n = a;
  while_(two_pow < 100) {
    n2 = n / 2;
    if_(n2 * 2 == n) {
      qaux = 1.5*q;
      q = qaux;
      n = n2;
    }
    else_ {
      n = n * b;
      two_pow = 200;
    }
//////// UNCOMMENT THIS LINE AND THE PROBLEM WILL BE SOLVED :(
//    two_pow++;
  }

 return(q);
}

As I wrote above, if you uncomment that (superfluous) line, the code executes perfectly on all three platforms (Intel CPU, NVIDIA GPU, and Xeon Phi); with the line commented out, it fails on the Xeon Phi. The global and local work sizes are {1,1,1}.

Why does it fail on the Phi? Thank you so much.

Moisés Viñas

http://gac.udc.es/~moises/index_en.html

Yuri_K_Intel
Employee
Hi Moisés,

I can't compile the attached reproducer for Xeon Phi: I get a compilation error for the 'ACCELERATOR' macro. I guess this is because the public version of HPL doesn't support the accelerator device type. Could you please provide a more recent version?

Meanwhile, for CPU I got:

2.96439e-323 51.2578 51.2578 51.2578 ...

The first value fluctuates slightly from run to run. Is the output correct?

Thanks,
Yuri
moises_v_
Beginner

Hi Yuri,

Yes, the public version is outdated :-/ But the problem will be solved if you replace the following files in the /src subfolder with the attached ones: CLbinding.cpp, Device.h and Device.cpp

The results are a = 1.5^7 x 3 for i != 0; for i = 0, it is a random number. On the Xeon Phi, however, the value is 1.5 x 3. Why?
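
As a cross-check of the expected value, here is a minimal plain C++ sketch of the ipow loop with the two_pow++ line left out. The name ipow_ref and the inputs used in the note below (aa = 3.0, a = 128) are assumptions chosen to match the 1.5^7 x 3 figure; the actual arguments are in the attachment, which isn't reproduced in this thread.

```cpp
// Plain C++ rendering of the HPL loop, without the "two_pow++" line.
// For every factor of two in a, q is multiplied by 1.5; the first odd n
// sets two_pow to 200 and ends the loop.
// Note: a must be nonzero, otherwise n stays even forever and the loop
// never terminates (the commented-out counter would have bounded it).
double ipow_ref(double aa, int a, int b)
{
    double q = aa;
    int n = a;
    int two_pow = 0;
    while (two_pow < 100) {
        int n2 = n / 2;
        if (n2 * 2 == n) {   // n is even: scale q and halve n
            q = 1.5 * q;
            n = n2;
        } else {             // n is odd: force the loop condition to fail
            n = n * b;
            two_pow = 200;
        }
    }
    return q;
}
```

With those assumed inputs, ipow_ref(3.0, 128, 2) gives 51.2578125, which agrees with the CPU output Yuri posted, while ipow_ref(3.0, 2, 2) gives 4.5, i.e. 1.5 x 3. That would be consistent with the Phi build leaving the loop after a single halving, though nothing in the thread confirms the root cause.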

 

Thanks in advance,

Moisés Viñas

http://gac.udc.es/~moises/index_en.html

Yuri_K_Intel
Employee
OK, now it's working for me, and I was able to reproduce the issue using the latest public release (XE R3). It works correctly on our internal development version, though, so please expect a fix in the next release (no estimate for the date, as usual). If time permits, I will also try to add this reproducer to our test system to make sure no regression is introduced.

Thanks,
Yuri
Dave_O_
Beginner

Yuri, my Xeon Phi behaves strangely with a simple convolution kernel. I believe the results it produces are incorrect. Is there a known problem with OpenCL support for the Phi?

For the kernel enqueue:

global size set to 1024x1024

local size set to NULL.

Thanks

//KERNEL_SIMPLE
__kernel void Convolve(const __global  float * pInput,
                        __constant float * pFilter,
                        __global  float * pOutput,
                        const int nInWidth,
                        const int nFilterWidth)
{
    const int nWidth = get_global_size(0);

    const int xOut = get_global_id(0);
    const int yOut = get_global_id(1);

    const int xInTopLeft = xOut;
    const int yInTopLeft = yOut;

    float sum = 0;
    for (int r = 0; r < nFilterWidth; r++)
    {
        const int idxFtmp = r * nFilterWidth;

        const int yIn = yInTopLeft + r;
        const int idxIntmp = yIn * nInWidth + xInTopLeft;

        for (int c = 0; c < nFilterWidth; c++)
        {
            const int idxF  = idxFtmp  + c;
            const int idxIn = idxIntmp + c;
            sum += pFilter[idxF]*pInput[idxIn];
        }
    }
    const int idxOut = yOut * nWidth + xOut;
    pOutput[idxOut] = sum;
}
//KERNEL_SIMPLE
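
One way to check whether the device results are really wrong is to compare them against a host-side reference. The following is a minimal C++ sketch, not code from this thread: convolve_ref and its outWidth/outHeight parameters are hypothetical names, and it assumes the input is padded so that its width is at least outWidth + nFilterWidth - 1 (and likewise for its height), which the kernel's indexing also relies on.

```cpp
#include <cstddef>
#include <vector>

// Host-side reference for the Convolve kernel above (hypothetical helper).
// outWidth/outHeight play the role of the 2D global size.
// Assumption: input holds at least (outHeight + nFilterWidth - 1) rows of
// nInWidth floats, with nInWidth >= outWidth + nFilterWidth - 1.
std::vector<float> convolve_ref(const std::vector<float>& input,
                                const std::vector<float>& filter,
                                int outWidth, int outHeight,
                                int nInWidth, int nFilterWidth)
{
    std::vector<float> output(static_cast<std::size_t>(outWidth) * outHeight);
    for (int yOut = 0; yOut < outHeight; ++yOut) {
        for (int xOut = 0; xOut < outWidth; ++xOut) {
            float sum = 0.0f;
            for (int r = 0; r < nFilterWidth; ++r) {
                for (int c = 0; c < nFilterWidth; ++c) {
                    const std::size_t idxF  = static_cast<std::size_t>(r) * nFilterWidth + c;
                    const std::size_t idxIn = static_cast<std::size_t>(yOut + r) * nInWidth + xOut + c;
                    // at() throws std::out_of_range if an index falls
                    // outside the buffer instead of reading garbage.
                    sum += filter.at(idxF) * input.at(idxIn);
                }
            }
            output[static_cast<std::size_t>(yOut) * outWidth + xOut] = sum;
        }
    }
    return output;
}
```

The device may accumulate in a different order or contract multiplies and adds, so it is safer to compare against the reference with a small tolerance than with exact equality. If the reference throws for the buffer sizes used on the host, the input is likely too small for a 1024x1024 global size, which could explain wrong results on some devices and not others; that is only a guess, since the host code isn't shown.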
