OpenCL* for CPU
Ask questions and share information on Intel® SDK for OpenCL™ Applications and OpenCL™ implementations for Intel® CPU.
This forum covers OpenCL* for CPU only. OpenCL* for GPU questions can be asked in the GPU Compute Software forum. Intel® FPGA SDK for OpenCL™ questions can be asked in the FPGA Intel® High Level Design forum.

clEnqueueReadBuffer crashes if using printf and global work size > 1

goldmund99
Beginner
849 Views
System: Intel Core 2 Duo SL9400, Windows 7 64-bit, Intel OpenCL SDK 1.1 Beta

Hello,
I have a problem I'm unable to debug. My program crashes when calling a kernel with a printf in it.
The program does not crash when I remove the printf statement or when I use global work size = 1 (but in any case it does not print anything).

I could not find any documentation about printf, except that it works as in C99 and that it requires enabling the extension. Are there any other requirements?
Eli_Bendersky__Intel
Hello goldmund99,

Could you please provide us with a minimal code sample that reproduces your problem?

It does not have to be your full kernel, if that's impossible for you due to security reasons.

Thanks in advance
goldmund99
Beginner
The kernel is:

#pragma OPENCL EXTENSION cl_intel_printf : enable
__kernel void sensitivity(__global int * dest)
{
    int tid = get_global_id(0);
    dest[tid] = tid*2;
    printf("tid");
}

Maybe I made an error during testing: now it also crashes with global_work_size = 1 and local_work_size = NULL.
If I remove the printf statement, it works as expected.
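To tell a wrong result apart from a runtime crash, it can help to compute on the host what the kernel should write and compare it with the buffer read back by clEnqueueReadBuffer. A minimal C++ sketch, assuming the one-dimensional kernel above is launched without a global offset (so get_global_id(0) takes every value from 0 to global_work_size - 1); `reference_sensitivity` and `matches_reference` are hypothetical helpers, not SDK functions, and `std::int32_t` stands in for `cl_int`:

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical host-side reference for the "sensitivity" kernel:
// work-item tid writes tid * 2 to dest[tid]. Assumes work_dim = 1 and no
// global offset, so tid covers every value in [0, global_work_size).
std::vector<std::int32_t> reference_sensitivity(std::size_t global_work_size) {
    std::vector<std::int32_t> expected(global_work_size);
    for (std::size_t tid = 0; tid < global_work_size; ++tid) {
        expected[tid] = static_cast<std::int32_t>(tid * 2);
    }
    return expected;
}

// Returns true if the data read back from the device matches the reference.
bool matches_reference(const std::vector<std::int32_t>& device_result) {
    return device_result == reference_sensitivity(device_result.size());
}
```

After the blocking read, `matches_reference(dst_host)` would be expected to return true for a correct run, assuming `cl_int` is a 32-bit signed integer as on common platforms.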
goldmund99
Beginner
More details:
The reported error is "Stack overflow"

The code I use to run the kernel is as follows:

cl_int errorcode = CL_SUCCESS;
size_t n = 1;
std::vector<cl_int> dst_host(n);

// context_properties, device, properties, kernel_source and lengths are set up elsewhere (not shown).
cl_context context = clCreateContextFromType(context_properties, CL_DEVICE_TYPE_CPU, NULL, NULL, &errorcode);
cl_command_queue cmd_queue = clCreateCommandQueue(context, device, properties, &errorcode);
// host_ptr must be NULL unless CL_MEM_USE_HOST_PTR or CL_MEM_COPY_HOST_PTR is set in flags
cl_mem dest_mem = clCreateBuffer(context, CL_MEM_WRITE_ONLY, n*sizeof(cl_int), NULL, &errorcode);
cl_program program = clCreateProgramWithSource(context, 1, (const char**)kernel_source, lengths, &errorcode);
errorcode = clBuildProgram(program, 0, NULL, NULL, NULL, NULL);
cl_kernel kernel = clCreateKernel(program, "sensitivity", &errorcode);
// arg_value must point to the cl_mem handle, not be the handle itself
errorcode = clSetKernelArg(kernel, 0, sizeof(cl_mem), (const void *)&dest_mem);
errorcode = clEnqueueNDRangeKernel(cmd_queue, kernel, 1, NULL, 1, NULL, NULL, NULL);
errorcode = clEnqueueReadBuffer(cmd_queue, dest_mem, CL_TRUE, 0, n*sizeof(cl_int), &dst_host[0],NULL,NULL,NULL); // <- CRASH
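The host code above stores every return value in `errorcode` but never inspects it, so a failing call (for example an invalid argument to clSetKernelArg or clCreateBuffer) would go unnoticed until a later call. A minimal sketch of a check helper follows. The numeric values are those defined in CL/cl.h for OpenCL 1.1, reproduced here only so the snippet compiles without the SDK headers; real code would include `<CL/cl.h>` and use its macros and `cl_int`. `cl_error_name` and `check_cl` are hypothetical names.

```cpp
#include <cstdint>
#include <cstdio>
#include <cstdlib>
#include <string_view>

// Stand-in for cl_int so the sketch builds without the OpenCL headers.
using cl_status = std::int32_t;

// Maps a subset of OpenCL 1.1 error codes (values as in CL/cl.h) to names.
std::string_view cl_error_name(cl_status code) {
    switch (code) {
        case 0:   return "CL_SUCCESS";
        case -5:  return "CL_OUT_OF_RESOURCES";
        case -11: return "CL_BUILD_PROGRAM_FAILURE";
        case -30: return "CL_INVALID_VALUE";
        case -37: return "CL_INVALID_HOST_PTR";
        case -38: return "CL_INVALID_MEM_OBJECT";
        case -50: return "CL_INVALID_ARG_VALUE";
        case -51: return "CL_INVALID_ARG_SIZE";
        case -52: return "CL_INVALID_KERNEL_ARGS";
        case -53: return "CL_INVALID_WORK_DIMENSION";
        case -54: return "CL_INVALID_WORK_GROUP_SIZE";
        case -55: return "CL_INVALID_WORK_ITEM_SIZE";
        case -63: return "CL_INVALID_GLOBAL_WORK_SIZE";
        default:  return "UNKNOWN_CL_ERROR";
    }
}

// Reports the failing call by name and aborts if status is not CL_SUCCESS (0).
void check_cl(cl_status status, const char* call) {
    if (status != 0) {
        std::string_view name = cl_error_name(status);
        std::fprintf(stderr, "%s failed: %.*s (%d)\n", call,
                     static_cast<int>(name.size()), name.data(),
                     static_cast<int>(status));
        std::exit(EXIT_FAILURE);
    }
}
```

Each call in the host code could then be followed by something like `check_cl(errorcode, "clSetKernelArg");`, so the first failing call is reported by name instead of surfacing later as a crash.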
goldmund99
Beginner
Also very strange: when I compile the kernel with the Intel OpenCL SDK Offline Compiler it gives me this warning:

Build succeeded!
:1:26: warning: expected identifier in '#pragma OPENCL' - ignored
Build started
Kernel was successfully vectorized
Done.

Evgeny_F_Intel
Employee
Hi,

We will check the warning issue, but it's not related to the failure.

Please check the following line:

errorcode = clEnqueueNDRangeKernel(cmd_queue, kernel, 1, NULL, 1, NULL, NULL, NULL);

According to the spec, the global_work_size parameter must be passed as a pointer to an array, not as a constant.
Please use this sequence:

size_t global = 1;
errorcode = clEnqueueNDRangeKernel(cmd_queue, kernel, 1, NULL, &global, NULL, NULL, NULL);
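In the OpenCL 1.1 signature, global_work_size has type `const size_t *` and points to work_dim values, one per dimension; the total number of work-items is their product. A plain integer such as `1` is not a valid pointer: a C++ compiler rejects the implicit conversion, and in C the runtime would typically end up reading from an invalid address. The sketch below, using the hypothetical helper `total_work_items`, only illustrates how such an array is interpreted; it does not call the OpenCL runtime.

```cpp
#include <cstddef>

// Hypothetical helper: reads global_work_size the way clEnqueueNDRangeKernel
// interprets it, as an array of work_dim sizes, and returns their product.
std::size_t total_work_items(unsigned work_dim, const std::size_t* global_work_size) {
    std::size_t total = 1;
    for (unsigned d = 0; d < work_dim; ++d) {
        total *= global_work_size[d];
    }
    return total;
}
```

With `size_t global = 1;` the call `total_work_items(1, &global)` yields 1; a two-dimensional range such as `size_t global2[2] = {10, 4};` yields 40.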
goldmund99
Beginner
Actually, I am using a pointer as the specification says; I only changed that line in the post to make it clearer... sorry for the mistake.
I posted the API code just for reference, but it is the same as in the provided examples. I checked that the same code works with different kernels, and also with the minimal kernel I wrote if I comment out the printf statement.
Shiri_M_Intel
Employee
Hi
We will try your code and update you
Thanks, Shiri
Eli_Bendersky__Intel
Hello goldmund99,

To reproduce your problem quickly, it would help a lot if you could send us the complete source code file that crashes for you. We need the host C++ code that invokes the OpenCL SDK, as well as the kernel (if that's in a separate file). Getting the complete code, as you yourself run it, is very important.

Is this possible?

Eli
Eli_Bendersky__Intel
goldmund99,

Thanks. We'll try to reproduce the problem and will get back to you once we have some conclusions.

goldmund99
Beginner
I have found a workaround for now.
The problem is not clEnqueueReadBuffer: the crash also happens when calling clFinish right after clEnqueueNDRangeKernel.

I was calling clEnqueueNDRangeKernel with NULL as the local_work_size argument. According to the 1.1 specification: "local_work_size can also be a NULL value in which case the OpenCL implementation will determine how to be break the global work-items into appropriate work-group instances."
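The specification leaves that choice to the implementation, and the Intel SDK's actual heuristic is not described in this thread. Purely as an illustration, one simple strategy an implementation could use in one dimension is to pick the largest work-group size that divides the global size evenly and does not exceed the device limit (CL_DEVICE_MAX_WORK_GROUP_SIZE). The function `pick_local_size` and its `max_group` parameter are hypothetical.

```cpp
#include <cstddef>

// Hypothetical 1-D heuristic, NOT the Intel SDK's documented behavior:
// returns the largest divisor of global_size that is <= max_group,
// or 0 if either argument is 0.
std::size_t pick_local_size(std::size_t global_size, std::size_t max_group) {
    if (global_size == 0 || max_group == 0) return 0;
    std::size_t limit = max_group < global_size ? max_group : global_size;
    for (std::size_t candidate = limit; candidate > 1; --candidate) {
        if (global_size % candidate == 0) return candidate;
    }
    return 1;  // 1 always divides global_size
}
```

Under this heuristic, a global size of 10 with a large device limit would give one work-group of 10, and a limit of 4 would give work-groups of 2.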

If I specify the local work size explicitly, with global_work_size = 10 and work_dim = 1, clEnqueueNDRangeKernel gives:
  • *local_work_size = 0 -> Error: CL_INVALID_WORK_GROUP_SIZE (this differs from passing a NULL pointer as the argument), conformant with OpenCL 1.1
  • *local_work_size = 1-2 -> Returns CL_SUCCESS, but crashes when calling clFinish
  • *local_work_size = 3-4 -> Error: CL_INVALID_WORK_GROUP_SIZE, conformant with OpenCL 1.1
  • *local_work_size = 5 -> Correct results, but crashes roughly once every 4-5 runs
  • *local_work_size = 6-9 -> Error: CL_INVALID_WORK_GROUP_SIZE, conformant with OpenCL 1.1
  • *local_work_size = 10 -> Correct results, never seems to crash
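The CL_INVALID_WORK_GROUP_SIZE results above follow the OpenCL 1.1 rules for an explicit local_work_size: it must divide global_work_size evenly, and the work-group must not exceed the device's CL_DEVICE_MAX_WORK_GROUP_SIZE. A minimal 1-D sketch of that check, with hypothetical function names and the device limit passed in as a parameter, reproduces which values are accepted for global_work_size = 10 (1, 2, 5 and 10). It says nothing about the crashes, which come from the SDK bug rather than from invalid arguments.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical 1-D model of the OpenCL 1.1 work-group size checks:
// a local size is accepted if it is non-zero, does not exceed max_group
// (CL_DEVICE_MAX_WORK_GROUP_SIZE) and divides global_size evenly.
// Per-dimension limits (CL_DEVICE_MAX_WORK_ITEM_SIZES) are not modeled.
bool local_size_is_valid(std::size_t global_size, std::size_t local_size,
                         std::size_t max_group) {
    // local_size != 0 is checked first so the modulo never divides by zero.
    return local_size != 0 && local_size <= max_group &&
           global_size % local_size == 0;
}

// Lists every local size in [0, global_size] that passes the check.
std::vector<std::size_t> valid_local_sizes(std::size_t global_size, std::size_t max_group) {
    std::vector<std::size_t> result;
    for (std::size_t local = 0; local <= global_size; ++local) {
        if (local_size_is_valid(global_size, local, max_group)) {
            result.push_back(local);
        }
    }
    return result;
}
```

For example, `valid_local_sizes(10, 1024)` returns {1, 2, 5, 10}, matching the values that did not produce CL_INVALID_WORK_GROUP_SIZE in the list.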


Shiri_M_Intel
Employee
Hi
We were able to reproduce the failure. It appears to be a bug in the SDK compiler.
We are looking into that.
Thanks, Shiri