Community
cancel
Showing results for 
Search instead for 
Did you mean: 
Benjamin_H_
Beginner
39 Views

Relatively simply kernel (included) fails to compile on GPU, compiles on CPU & other platforms

I was able to simplify the kernel to a very small state in order to make it easier to track down the bug. Logically, this kernel may not be correct however syntax seems OK to me and it compiles with on the CPU, as well as other platforms (AMD CPU and GPU).

This is on an Intel Core i3 32xx CPU, the IGP is an Intel HD 2500. I'm using the latest driver build (3345), windows 7 x64 and the Kernel Builder x64 application.

[cpp]

/* Please Write the OpenCL Kernel(s) code here*/

void ntn_snh(__global float16 const *restrict src, __global float16 *restrict dst)

{

    for (__private int j = 0; j < 16; j+=4, src+=4, dst+=4)

    {

        dst[0] = sinh(src[0]);

        dst[1] = sinh(src[1]);

        dst[2] = sinh(src[2]);

        dst[3] = sinh(src[3]);

    }

}

 

void ntn_csh(__global float16 const *restrict src, __global float16 *restrict dst)

{

    for (__private int j = 0; j < 16; j+=4, src+=4, dst+=4)

    {

        dst[0] = cosh(src[0]);

        dst[1] = cosh(src[1]);

        dst[2] = cosh(src[2]);

        dst[3] = cosh(src[3]);

    }

}

 

void ntn_tnh(__global float16 const *restrict src, __global float16 *restrict dst)

{

    for (__private int j = 0; j < 16; j+=4, src+=4, dst+=4)

    {

        dst[0] = tanh(src[0]);

        dst[1] = tanh(src[1]);

        dst[2] = tanh(src[2]);

        dst[3] = tanh(src[3]);

    }

}

 

__kernel void eval(__global uchar  const *restrict const gcom,

                          __global float16 const *restrict const gdata_in,

                          __global float16          *restrict const gdata_out)

{

    __private int j = 6, k = 9;

    for (; k >= 7; --j, --k)

    {

        switch(gcom)

        {

        case 0: break;

        case 1: ntn_snh(gdata_in + gcom*64, gdata_out + j*64); break;

        case 2: ntn_csh(gdata_in + gcom*64, gdata_out + j*64); break;

        case 3: ntn_tnh(gdata_in + gcom*64, gdata_out + j*64); break;

        case 4: break;

        default:{}

        }

}

 

    for (; j >= 0; --j, --k)

    {

        switch(gcom)

        {

        case 0: break;

        case 1: ntn_snh(gdata_out + k*64, gdata_out + j*64); break;

        case 2: ntn_csh(gdata_out + k*64, gdata_out + j*64); break;

        case 3: ntn_tnh(gdata_out + k*64, gdata_out + j*64); break;

        case 4: break;

        default:{}

        }

    }

}

[/cpp]

0 Kudos
2 Replies
Benjamin_H_
Beginner
39 Views

So nothing? This bug seems related, at least partially, to sinh, cosh and tanh. What's more, quite often the compiler vacuums up gigabytes worth of memory.

My current revision, which refuses to compile for an Intel HD 4000 (Ivy bridge, 16 CUs), allocated near 16 gigabytes of ram at one point. And it took near half an hour for it to fail compilation. yet the same code compiles fine on the CPU as well as nVidia's platform (GTX 660M).

ARNON_P_Intel
Employee
39 Views

Hi Benjamin,

Looks like we have issues with the hyperbolic functions sinh/cosh/tanh. We are investigate this.

Thanks a lot for providing us such a simple kernel to reproduce this issue. Please try to use alternatives functions while we working on this.

Regards,

Arnon

 

 

 hyperbolic