OpenCL* for CPU
Ask questions and share information on Intel® SDK for OpenCL™ Applications and OpenCL™ implementations for Intel® CPU.
Announcements
This forum covers OpenCL* for CPU only. OpenCL* for GPU questions can be asked in the GPU Compute Software forum. Intel® FPGA SDK for OpenCL™ questions can be asked in the FPGA Intel® High Level Design forum.

clEnqueueReadBuffer crashes if using printf and global work size > 1

goldmund99
Beginner
2,171 Views
System: Intel Core 2 Duo SL 9400, Windows 7 64 bits, Intel OpenCL SDK 1.1 Beta

Hello,
I have a problem I'm unable to debug. My program crashes when calling a kernel with a printf in it.
The program does not crash when I remove the printf statement or when I use global work size = 1 (but in any case it does not print anything).

I could not find any documentation about printf, except that it works as in C99 and that the extension has to be enabled. Are there any other requirements?
11 Replies

Eli_Bendersky__Intel
Hello goldmund99,

Could you please provide us with a minimal code sample that reproduces your problem?

It does not have to be your full kernel, if that's impossible for you due to security reasons.

Thanks in advance

goldmund99
Beginner
The kernel is:

#pragma OPENCL EXTENSION cl_intel_printf : enable
__kernel void sensitivity(__global int * dest)
{
int tid = get_global_id(0);
dest[tid] = tid*2;
printf("tid");
}

Maybe I made a mistake during testing: it now also crashes with global_work_size = 1 and local_work_size = NULL.
If I remove the printf statement, it works as expected.

goldmund99
Beginner
More details:
The reported error is "Stack overflow"

The code I use to run the kernel is as follows:

cl_int errorcode = CL_SUCCESS;
n = 1;
std::vector<cl_int> dst_host(n);

cl_context context = clCreateContextFromType(context_properties, CL_DEVICE_TYPE_CPU, NULL, NULL, &errorcode);
cl_command_queue cmd_queue = clCreateCommandQueue(context, device, properties, &errorcode);
cl_mem dest_mem = clCreateBuffer(context, CL_MEM_WRITE_ONLY, n*sizeof(cl_int), &dst_host[0], &errorcode);
cl_program program = clCreateProgramWithSource(context, 1, (const char**)kernel_source, lengths, &errorcode);
errorcode = clBuildProgram(program, NULL, NULL, NULL, NULL, NULL);
cl_kernel kernel = clCreateKernel(program, "sensitivity", &errorcode);
errorcode = clSetKernelArg(kernel,0,sizeof(cl_mem),(const void *)dest_mem);
errorcode = clEnqueueNDRangeKernel(cmd_queue, kernel, 1, NULL, 1, NULL, NULL, NULL);
errorcode = clEnqueueReadBuffer(cmd_queue, dest_mem, CL_TRUE, 0, n*sizeof(cl_int), &dst_host[0],NULL,NULL,NULL); // <- CRASH
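The snippet above stores each return value in errorcode but never inspects it, so an earlier failing call could go unnoticed before the crash. A minimal sketch of a checking helper follows. The names cl_error_name and check are hypothetical (they are not part of the OpenCL API), and only a handful of error codes are listed, using the numeric values from the Khronos cl.h header.

```cpp
#include <string>

// Maps a few OpenCL status codes to their names.
// The numeric values are those defined in the Khronos cl.h header;
// this list is deliberately incomplete.
std::string cl_error_name(int code) {
    switch (code) {
        case 0:   return "CL_SUCCESS";
        case -5:  return "CL_OUT_OF_RESOURCES";
        case -11: return "CL_BUILD_PROGRAM_FAILURE";
        case -30: return "CL_INVALID_VALUE";
        case -37: return "CL_INVALID_HOST_PTR";
        case -52: return "CL_INVALID_KERNEL_ARGS";
        case -54: return "CL_INVALID_WORK_GROUP_SIZE";
        default:  return "unknown (" + std::to_string(code) + ")";
    }
}

// Hypothetical helper: returns true on success; on failure it appends
// a line naming the failing call and the status code to `log`.
bool check(int code, const char* what, std::string& log) {
    if (code == 0) return true;
    log += std::string(what) + " failed: " + cl_error_name(code) + "\n";
    return false;
}
```

One way to use it would be to call check(errorcode, "clCreateBuffer", log) after each API call and stop at the first failure, so that a later crash is not mistaken for the first error.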

goldmund99
Beginner
Also very strange: when I compile the kernel with the Intel OpenCL SDK Offline Compiler it gives me this warning:

Build succeeded!
:1:26: warning: expected identifier in '#pragma OPENCL' - ignored
Build started
Kernel was successfully vectorized
Done.

Evgeny_F_Intel
Employee
Hi,

We will check the warning issue, but it's not related to the failure.

Please check the following line:

errorcode = clEnqueueNDRangeKernel(cmd_queue, kernel, 1, NULL, 1, NULL, NULL, NULL);

According to the spec, the global range parameter must be passed as a pointer to an array of work_dim size_t values, not as an integer constant.
Please use this sequence:

size_t global = 1;
errorcode = clEnqueueNDRangeKernel(cmd_queue, kernel, 1, NULL, &global, NULL, NULL, NULL);
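To illustrate why a pointer is required: the runtime reads work_dim entries starting at the address it is given. The sketch below is a hypothetical stand-in (total_work_items is not an OpenCL function) that mimics only that read. It does not enqueue anything.

```cpp
#include <cstddef>

// Hypothetical stand-in for how a runtime interprets the global range:
// it dereferences `work_dim` entries from `global_work_size` and
// multiplies them to get the total number of work-items.
std::size_t total_work_items(unsigned work_dim,
                             const std::size_t* global_work_size) {
    std::size_t total = 1;
    for (unsigned d = 0; d < work_dim; ++d) {
        total *= global_work_size[d];
    }
    return total;
}
```

With size_t global = 1, the call total_work_items(1, &global) reads a single entry. Passing the literal 1 in place of the pointer would make the function dereference address 1, which is undefined behavior.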

goldmund99
Beginner
Actually I am passing a pointer, as the specification says; I only changed the call in the post to make it clearer. Sorry for the mistake.
I posted the API code just for reference, but it is the same as in the provided examples. I checked that the same code works with different kernels, and also with the minimal kernel I wrote if I comment out the printf statement.

Shiri_M_Intel
Employee
Hi
We will try your code and update you
Thanks, Shiri

Eli_Bendersky__Intel
Hello goldmund99,

To reproduce your problem quickly, it would help a lot if you could send us the complete source code file that crashes for you. We need the host C++ code that invokes the OpenCL SDK, as well as the kernel (if that's in a separate file). Getting the complete code, as you yourself run it, is very important.

Is this possible?

Eli

Eli_Bendersky__Intel
goldmund99,

Thanks. We'll try to reproduce the problem and will get back to you once we have some conclusions.

goldmund99
Beginner
I have found a workaround for now.
The problem is not clEnqueueReadBuffer, because the crash also happens when calling clFinish after clEnqueueNDRangeKernel.

I was calling clEnqueueNDRangeKernel using NULL as argument for local_work_size. According to the 1.1 specification: "local_work_size can also be a NULL value in which case the OpenCL implementation will determine how to be break the global work-items into appropriate work-group instances."

If I specify the local work size explicitly, for global_work_size = 10 and work_dim, I call clEnqueueNDRangeKernel and get:
  • *local_work_size = 0 -> Error: CL_INVALID_WORK_GROUP_SIZE (this is different than giving NULL pointer as argument), conformant with OpenCL 1.1
  • *local_work_size = 1-2 -> Returns CL_SUCCESS but crashes when calling clFinish
  • *local_work_size = 3-4 -> Error: CL_INVALID_WORK_GROUP_SIZE, conformant with OpenCL 1.1
  • *local_work_size = 5 -> Correct results, but it crashes about once every 4-5 tries
  • *local_work_size = 6-9 -> Error: CL_INVALID_WORK_GROUP_SIZE, conformant with OpenCL 1.1
  • *local_work_size = 10 -> Correct results, seems to never crash
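The CL_INVALID_WORK_GROUP_SIZE results in the list above match the OpenCL 1.1 rule that global_work_size must be evenly divisible by local_work_size. A minimal sketch of that check follows. The function names are hypothetical, and max_work_group_size is an assumed device limit that would normally come from querying CL_DEVICE_MAX_WORK_GROUP_SIZE.

```cpp
#include <cstddef>
#include <vector>

// Returns true when `local_size` satisfies the OpenCL 1.1 divisibility
// rule for a one-dimensional range. `max_work_group_size` is an assumed
// device limit (see CL_DEVICE_MAX_WORK_GROUP_SIZE), not a queried value.
bool is_valid_local_size(std::size_t global_size, std::size_t local_size,
                         std::size_t max_work_group_size) {
    if (local_size == 0 || local_size > max_work_group_size) {
        return false;
    }
    return global_size % local_size == 0;
}

// Lists every local size from 1 up to `global_size` that passes the check.
std::vector<std::size_t> valid_local_sizes(std::size_t global_size,
                                           std::size_t max_work_group_size) {
    std::vector<std::size_t> sizes;
    for (std::size_t l = 1; l <= global_size; ++l) {
        if (is_valid_local_size(global_size, l, max_work_group_size)) {
            sizes.push_back(l);
        }
    }
    return sizes;
}
```

For global_work_size = 10 and a sufficiently large device limit, this yields 1, 2, 5 and 10, the same values for which the calls above returned CL_SUCCESS. The crashes with 1, 2 and 5 are therefore not explained by the divisibility rule.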


Shiri_M_Intel
Employee
Hi
We were able to reproduce the failure. It appears to be a bug in the SDK compiler.
We are looking into that.
Thanks, Shiri