clEnqueueReadBuffer crashes if using printf and global work size > 1

goldmund99 · 05-09-2011

System: Intel Core 2 Duo SL 9400, Windows 7 64 bits, Intel OpenCL SDK 1.1 Beta

Hello,
I have a problem I'm unable to debug. My program crashes when calling a kernel with a printf in it.
The program does not crash when I remove the printf statement or when I use a global work size of 1 (but in either case nothing is printed).

I could not find any documentation about printf, except that it works like in C99 and that it requires enabling the extension. Are there any other requirements?

Eli_Bendersky__Intel · 05-09-2011

Hello goldmund99,

Could you please provide us with a minimal code sample that reproduces your problem?

It does not have to be your full kernel if sharing that is impossible for security reasons.

Thanks in advance

goldmund99 · 05-09-2011

The kernel is:

#pragma OPENCL EXTENSION cl_intel_printf : enable
__kernel void sensitivity(__global int * dest)
{
int tid = get_global_id(0);
dest[tid] = tid*2;
printf("tid");
}

Maybe I made a mistake during testing: now it also crashes with global_work_size = 1 and local_work_size = NULL.
If I remove the printf statement, it works as expected.

goldmund99 · 05-09-2011

More details:
The reported error is "Stack overflow".

The code I use to run the kernel is as follows:

cl_int errorcode = CL_SUCCESS;
size_t n = 1;
std::vector<cl_int> dst_host(n);

cl_context context = clCreateContextFromType(context_properties, CL_DEVICE_TYPE_CPU, NULL, NULL, &errorcode);
cl_command_queue cmd_queue = clCreateCommandQueue(context, device, properties, &errorcode);
// host_ptr must be NULL unless CL_MEM_USE_HOST_PTR or CL_MEM_COPY_HOST_PTR is set
cl_mem dest_mem = clCreateBuffer(context, CL_MEM_WRITE_ONLY, n*sizeof(cl_int), NULL, &errorcode);
cl_program program = clCreateProgramWithSource(context, 1, (const char**)kernel_source, lengths, &errorcode);
errorcode = clBuildProgram(program, 0, NULL, NULL, NULL, NULL);
cl_kernel kernel = clCreateKernel(program, "sensitivity", &errorcode);
// clSetKernelArg expects a pointer to the cl_mem handle, not the handle itself
errorcode = clSetKernelArg(kernel, 0, sizeof(cl_mem), &dest_mem);
errorcode = clEnqueueNDRangeKernel(cmd_queue, kernel, 1, NULL, 1, NULL, NULL, NULL);
errorcode = clEnqueueReadBuffer(cmd_queue, dest_mem, CL_TRUE, 0, n*sizeof(cl_int), &dst_host[0], 0, NULL, NULL); // <- CRASH
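
None of these calls inspect errorcode, so an invalid argument can go unnoticed until a later call fails or crashes. A small helper along these lines makes each failure visible where it happens. The names cl_error_name and check_cl are hypothetical; the numeric codes are the values defined in the OpenCL 1.1 CL/cl.h header, and only a few are listed here.

```cpp
#include <cstdio>

// Hypothetical helper: readable names for a few OpenCL 1.1 error codes.
// The numbers mirror the #defines in CL/cl.h; extend the list as needed.
static const char* cl_error_name(int err)
{
    switch (err) {
    case 0:   return "CL_SUCCESS";
    case -30: return "CL_INVALID_VALUE";
    case -37: return "CL_INVALID_HOST_PTR";
    case -38: return "CL_INVALID_MEM_OBJECT";
    case -50: return "CL_INVALID_ARG_VALUE";
    case -51: return "CL_INVALID_ARG_SIZE";
    case -52: return "CL_INVALID_KERNEL_ARGS";
    case -54: return "CL_INVALID_WORK_GROUP_SIZE";
    case -63: return "CL_INVALID_GLOBAL_WORK_SIZE";
    default:  return "unknown OpenCL error";
    }
}

// Prints a diagnostic to stderr and returns true if err is not CL_SUCCESS (0).
static bool check_cl(int err, const char* call)
{
    if (err == 0)
        return false;
    std::fprintf(stderr, "%s failed: %s (%d)\n", call, cl_error_name(err), err);
    return true;
}
```

For example, `if (check_cl(errorcode, "clEnqueueNDRangeKernel")) return 1;` after each API call stops at the first failing call instead of at a later clEnqueueReadBuffer or clFinish.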

goldmund99 · 05-09-2011

Also very strange: when I compile the kernel with the Intel OpenCL SDK Offline Compiler, it gives me this warning:

Build succeeded!
:1:26: warning: expected identifier in '#pragma OPENCL' - ignored
Build started
Kernel was successfully vectorized
Done.

Evgeny_F_Intel · 05-10-2011

Hi,

We will check the warning issue, but it's not related to the failure.

Please check the following line:

errorcode = clEnqueueNDRangeKernel(cmd_queue, kernel, 1, NULL, 1, NULL, NULL, NULL);

According to the spec, the global_work_size parameter must be passed as a pointer to an array of work_dim values, not as a constant.
Please use this sequence:

size_t global = 1;
errorcode = clEnqueueNDRangeKernel(cmd_queue, kernel, 1, NULL, &global, NULL, 0, NULL, NULL);

goldmund99 · 05-10-2011

Actually, I am using a pointer as the specification says; I only changed that line in the post to make it clearer. Sorry for the mistake.
I posted the API code just for reference; it is the same as in the provided examples. I checked that the same code works with other kernels, and also with the minimal kernel I wrote if I comment out the printf statement.

Shiri_M_Intel · 05-10-2011

Hi
We will try your code and update you.
Thanks, Shiri

Eli_Bendersky__Intel · 05-11-2011

Hello goldmund99,

To reproduce your problem quickly, it would help a lot if you could send us the complete source code file that crashes for you. We need the host C++ code that invokes the OpenCL SDK, as well as the kernel (if that's in a separate file). Getting the complete code, as you yourself run it, is very important.

Is this possible?

Eli

Eli_Bendersky__Intel · 05-11-2011

goldmund99,

Thanks. We'll try to reproduce the problem and will get back to you once we have some conclusions.

goldmund99 · 05-12-2011

I have found a workaround for now.
The problem is not clEnqueueReadBuffer itself: the crash also happens when I call clFinish right after clEnqueueNDRangeKernel.

I was calling clEnqueueNDRangeKernel with NULL as the local_work_size argument. According to the 1.1 specification: "local_work_size can also be a NULL value in which case the OpenCL implementation will determine how to be break the global work-items into appropriate work-group instances."
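
The specification leaves that choice to the implementation. Purely as an illustration of what the choice involves, here is one simple hypothetical heuristic, not a description of what the Intel SDK actually does: take the largest divisor of the global size that does not exceed the device limit (for example the value queried with CL_DEVICE_MAX_WORK_GROUP_SIZE). The function name pick_local_size is made up for this sketch.

```cpp
#include <cstddef>

// Hypothetical heuristic (not the Intel SDK's documented behavior): return the
// largest work-group size that divides global_size evenly and does not exceed
// max_wg. Returns 0 if global_size or max_wg is 0.
static std::size_t pick_local_size(std::size_t global_size, std::size_t max_wg)
{
    std::size_t candidate = global_size < max_wg ? global_size : max_wg;
    for (; candidate > 0; --candidate) {
        if (global_size % candidate == 0)  // must divide the global size evenly
            return candidate;
    }
    return 0;
}
```

With a global size of 10 and a generous limit this returns 10; with a limit of 8 it falls back to 5, and with a limit of 4 to 2.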

If I set the local work size explicitly, with global_work_size = 10 and work_dim = 1, clEnqueueNDRangeKernel behaves as follows:

* local_work_size = 0 -> Error: CL_INVALID_WORK_GROUP_SIZE (this is different from passing a NULL pointer), conformant with OpenCL 1.1
* local_work_size = 1-2 -> Returns CL_SUCCESS, but crashes when calling clFinish
* local_work_size = 3-4 -> Error: CL_INVALID_WORK_GROUP_SIZE, conformant with OpenCL 1.1
* local_work_size = 5 -> Correct results, but crashes roughly once every 4-5 runs
* local_work_size = 6-9 -> Error: CL_INVALID_WORK_GROUP_SIZE, conformant with OpenCL 1.1
* local_work_size = 10 -> Correct results, seems to never crash
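
The CL_INVALID_WORK_GROUP_SIZE rows follow the OpenCL 1.1 rule that an explicit local_work_size must divide global_work_size evenly. A minimal sketch of that check for a one-dimensional launch is below; the function name is hypothetical, treating 0 as invalid mirrors the behavior reported above, and the separate CL_DEVICE_MAX_WORK_GROUP_SIZE limit is not checked.

```cpp
#include <cstddef>

// Sketch of the OpenCL 1.1 divisibility rule for a 1-D launch: an explicit
// local_work_size must divide global_work_size evenly, otherwise
// clEnqueueNDRangeKernel returns CL_INVALID_WORK_GROUP_SIZE.
// A local size of 0 is treated as invalid, as observed in this thread.
static bool local_size_divides(std::size_t global_size, std::size_t local_size)
{
    return local_size != 0 && global_size % local_size == 0;
}
```

For a global size of 10 it accepts only 1, 2, 5 and 10, exactly the values that returned CL_SUCCESS in the list, so the crashes with 1, 2 and 5 are not explained by an invalid launch configuration.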

Shiri_M_Intel · 05-15-2011

Hi
We were able to reproduce the failure. It appears to be a bug in the SDK compiler.
We are looking into it.
Thanks, Shiri