
Argument passing bug only in Debug mode for Intel OpenCL SDK

FENG__Leman

I ran into a bug when using the Intel OpenCL SDK (version 7.0.0.2567) with Visual Studio 2015. I defined a struct "Obj" that holds an array of 5 ints, then passed five variables of type "Obj" to my OpenCL kernel as "__private" arguments. If I build the kernel in debug mode (with the build options "-g -s filepath"), some of the arguments are not passed correctly. The full example is at https://github.com/flm8620/intel_opencl_bug/blob/master/main.cpp

Kernel program is:

struct Obj {
    int a[5];
};

__kernel void test(
    __global double* output,
    __private struct Obj param1,
    __private struct Obj param2,
    __private struct Obj param3,
    __private struct Obj param4,
    __private struct Obj param5
    //__private struct Obj param6
)
{
    int gl = get_global_id(0);
    const int N = 5;
    if (gl == 0) {
        // Copy each struct's array into its own N-element slice of output.
        for (int i = 0; i < N; i++)
            output[i] = param1.a[i];
        for (int i = 0; i < N; i++)
            output[i + N * 1] = param2.a[i];
        for (int i = 0; i < N; i++)
            output[i + N * 2] = param3.a[i];
        for (int i = 0; i < N; i++)
            output[i + N * 3] = param4.a[i];
        for (int i = 0; i < N; i++)
            output[i + N * 4] = param5.a[i];
    }
}

To verify this, the kernel copies the passed arguments into an output buffer, and the host reads the buffer back and prints it:

int main() {
    bool debug = true;
    find_cl(debug);
    const int N = 5;
    struct Obj
    {
        cl_int a[5]; // must match "int a[5]" in the kernel's struct Obj
    };
    Obj param1{ {1,12,123,1234,12345} };
    Obj param2{ {1,12,123,1234,12345} };
    Obj param3{ {1,12,123,1234,12345} };
    Obj param4{ {1,12,123,1234,12345} };
    Obj param5{ {1,12,123,1234,12345} };

    double output[N * 5];

    cl::Buffer output_b(context, CL_MEM_WRITE_ONLY, N * 5 * sizeof(double));

    kernel.setArg(0, output_b);
    kernel.setArg(1, param1);
    kernel.setArg(2, param2);
    kernel.setArg(3, param3);
    kernel.setArg(4, param4);
    kernel.setArg(5, param5);


    cl::CommandQueue queue(context, device);

    queue.enqueueNDRangeKernel(kernel, cl::NullRange, { size_t(1) }, { size_t(1) });
    queue.enqueueReadBuffer(output_b, false, 0, N * 5 * sizeof(double), output);
    queue.finish();

    for (int i = 0; i < N * 5; i++)
        std::cout << output[i] << std::endl;

    return 0;
}

The output is:

Detected 3 platforms :
NVIDIA CUDA
Intel(R) OpenCL
Experimental OpenCL 2.1 CPU Only Platform
Found CPU platform: Intel(R) OpenCL, has devices:
  1: Intel(R) Xeon(R) CPU E3-1231 v3 @ 3.40GHz
Use device: Intel(R) Xeon(R) CPU E3-1231 v3 @ 3.40GHz
1
12
123
1234
12345
1
12
123
1234
12345
1
12
123
1234
12345
1
12
123
1234
0
123
1234
0
0
3.90955e+07

I don't think it's related to struct alignment, because the first three arguments are passed correctly and the corruption only starts at the last element of the fourth. If I pass six variables instead of five, the program crashes.
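
One way to back up the alignment reasoning on the host side is a compile-time layout check. The following is only a minimal sketch and is not taken from the linked repository: it uses std::int32_t as a stand-in for cl_int (assumption: cl_int is a 32-bit signed integer, as the OpenCL headers define it) so that it compiles without the SDK, and check_obj_layout is a hypothetical helper name.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <type_traits>

// Host-side mirror of the kernel's "struct Obj { int a[5]; }".
// Assumption: std::int32_t has the same size and alignment as cl_int.
struct Obj {
    std::int32_t a[5];
};

// If any of these fail, the host and kernel could disagree about the layout.
static_assert(std::is_standard_layout<Obj>::value, "Obj must be standard layout");
static_assert(sizeof(Obj) == 5 * sizeof(std::int32_t), "Obj must have no padding");
static_assert(alignof(Obj) == alignof(std::int32_t), "Obj must be int-aligned");

// Runtime counterpart of the checks above (hypothetical helper).
inline void check_obj_layout() {
    assert(offsetof(Obj, a) == 0); // the array starts at the beginning of the struct
    assert(sizeof(Obj) == 20);     // 5 elements * 4 bytes, the size passed per argument
}
```

If these checks pass, the host passes 20 tightly packed bytes per argument, which would be consistent with the problem lying in how the debug build receives the arguments rather than in the struct definition.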

If I change the first line in main() to

bool debug = false;

then everything works.
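
To compare a debug run against a non-debug run without reading 25 printed lines, the host could check the buffer against the values it expects. This is a rough sketch under stated assumptions, not part of the original program: expected_output, first_mismatch and kParams are hypothetical names, and the expected values are the {1, 12, 123, 1234, 12345} initializers used for every argument in main().

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

constexpr int N = 5;       // elements per Obj, as in the post
constexpr int kParams = 5; // number of Obj arguments (hypothetical name)

// Expected contents of the output buffer if every argument arrives intact:
// output[i + N * k] holds element i of the (k+1)-th Obj argument.
inline std::vector<double> expected_output() {
    const int values[N] = {1, 12, 123, 1234, 12345};
    std::vector<double> out(N * kParams);
    for (int k = 0; k < kParams; ++k)
        for (int i = 0; i < N; ++i)
            out[i + N * k] = values[i];
    return out;
}

// Returns the index of the first differing element, or -1 if the buffers match.
inline int first_mismatch(const std::vector<double>& got,
                          const std::vector<double>& want) {
    assert(got.size() == want.size()); // both buffers must have the same length
    for (std::size_t i = 0; i < got.size(); ++i)
        if (got[i] != want[i])
            return static_cast<int>(i);
    return -1;
}
```

For a mismatch at index m, m / N is the zero-based argument number and m % N is the element within it. Applied to the debug output printed above, the first mismatch is at index 19, which corresponds to param4.a[4].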
