System: Intel Core 2 Duo SL 9400, Windows 7 64-bit, Intel OpenCL SDK 1.1 Beta
Hello,
I have a problem I'm unable to debug. My program crashes when calling a kernel with a printf in it.
The program does not crash when I remove the printf statement or when I use global work size = 1 (but in any case it does not print anything).
I could not find any documentation about printf, except that it works as in C99 and that the extension must be enabled. Are there any other requirements?
11 Replies
Hello goldmund99,
Could you please provide us with a minimal code sample that reproduces your problem?
It does not have to be your full kernel, if that's impossible for you due to security reasons.
Thanks in advance
The kernel is:
#pragma OPENCL EXTENSION cl_intel_printf : enable
__kernel void sensitivity(__global int * dest)
{
    int tid = get_global_id(0);
    dest[tid] = tid*2;
    printf("tid");
}
Maybe I made an error during testing: it now also crashes with global_work_size = 1 and local_work_size = NULL.
If I remove the printf statement, it works as expected.
More details:
The reported error is "Stack overflow"
The code I use to run the kernel is as follows:
cl_int errorcode = CL_SUCCESS;
n = 1;
std::vector<cl_int> dst_host(n);
cl_context context = clCreateContextFromType(context_properties, CL_DEVICE_TYPE_CPU, NULL, NULL, &errorcode);
cl_command_queue cmd_queue = clCreateCommandQueue(context, device, properties, &errorcode);
cl_mem dest_mem = clCreateBuffer(context, CL_MEM_WRITE_ONLY, n*sizeof(cl_int), &dst_host[0], &errorcode);
cl_program program = clCreateProgramWithSource(context, 1, (const char**)kernel_source, lengths, &errorcode);
errorcode = clBuildProgram(program, NULL, NULL, NULL, NULL, NULL);
cl_kernel kernel = clCreateKernel(program, "sensitivity", &errorcode);
errorcode = clSetKernelArg(kernel,0,sizeof(cl_mem),(const void *)dest_mem);
errorcode = clEnqueueNDRangeKernel(cmd_queue, kernel, 1, NULL, 1, NULL, NULL, NULL);
errorcode = clEnqueueReadBuffer(cmd_queue, dest_mem, CL_TRUE, 0, n*sizeof(cl_int), &dst_host[0],NULL,NULL,NULL); // <- CRASH
Also very strange: when I compile the kernel with the Intel OpenCL SDK Offline Compiler it gives me this warning:
Build succeeded!
:1:26: warning: expected identifier in '#pragma OPENCL' - ignored
Build started
Kernel was successfully vectorized
Done.
Hi,
We will check the warning issue, but it's not related to the failure.
Please check the following line:
errorcode = clEnqueueNDRangeKernel(cmd_queue, kernel, 1, NULL, 1, NULL, NULL, NULL);
According to the spec, the global work size must be passed as a pointer to an array, not as a constant.
Please use this sequence:
size_t global = 1;
errorcode = clEnqueueNDRangeKernel(cmd_queue, kernel, 1, NULL, &global, NULL, 0, NULL, NULL);
Actually I'm using a pointer, as the specification says; I only changed that line in the post to make it clearer. Sorry for the mistake.
I included the API code just for reference, but it is the same as in the provided examples. I checked that the same code works with different kernels, and also with the minimal kernel I wrote if I comment out the printf statement.
Hi
We will try your code and update you
Thanks, Shiri
Hello goldmund99,
To reproduce your problem quickly, it would help a lot if you could send us the complete source code file that crashes for you. We need the host C++ code that invokes the OpenCL SDK, as well as the kernel (if that's in a separate file). Getting the complete code, as you yourself run it, is very important.
Is this possible?
Eli
goldmund99,
Thanks. We'll try to reproduce the problem and will get back to you once we have some conclusions.
I have found a workaround for now.
The problem is not clEnqueueReadBuffer, because the crash also happens when calling clFinish after clEnqueueNDRangeKernel.
I was calling clEnqueueNDRangeKernel with NULL as the argument for local_work_size. According to the 1.1 specification: "local_work_size can also be a NULL value in which case the OpenCL implementation will determine how to be break the global work-items into appropriate work-group instances."
If I set the local work size explicitly, then for global_work_size = 10 and work_dim I call clEnqueueNDRangeKernel and get:
- *local_work_size = 0 -> Error: CL_INVALID_WORK_GROUP_SIZE (this is different from passing a NULL pointer as the argument), conformant with OpenCL 1.1
- *local_work_size = 1-2 -> Returns CL_SUCCESS but crashes when calling clFinish
- *local_work_size = 3-4 -> Error: CL_INVALID_WORK_GROUP_SIZE, conformant with OpenCL 1.1
- *local_work_size = 5 -> Correct results, but crashes about once every 4-5 runs
- *local_work_size = 6-9 -> Error: CL_INVALID_WORK_GROUP_SIZE, conformant with OpenCL 1.1
- *local_work_size = 10 -> Correct results, seems to never crash
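The pattern of accepted and rejected sizes above is consistent with the OpenCL 1.1 rule that an explicit local_work_size must evenly divide global_work_size. A minimal sketch of that check in plain C++ follows; it does not call the OpenCL API, the function names are made up for illustration, and max_work_group_size is an assumed stand-in for whatever a real program would query through CL_DEVICE_MAX_WORK_GROUP_SIZE.

```cpp
#include <cstddef>
#include <vector>

// Returns true when `local` passes the basic OpenCL 1.1 work-group size rules
// for a one-dimensional range: it must be non-zero, must not exceed the
// device limit, and must evenly divide `global`.
// `max_work_group_size` is a hypothetical parameter standing in for the value
// reported by the device (CL_DEVICE_MAX_WORK_GROUP_SIZE).
bool is_valid_local_size(std::size_t global, std::size_t local,
                         std::size_t max_work_group_size) {
    if (local == 0 || local > max_work_group_size) {
        return false;
    }
    return global % local == 0;
}

// Collects every local size in [1, global] that passes the check above.
std::vector<std::size_t> valid_local_sizes(std::size_t global,
                                           std::size_t max_work_group_size) {
    std::vector<std::size_t> sizes;
    for (std::size_t local = 1; local <= global; ++local) {
        if (is_valid_local_size(global, local, max_work_group_size)) {
            sizes.push_back(local);
        }
    }
    return sizes;
}
```

For global_work_size = 10 and an assumed device limit of 1024, valid_local_sizes(10, 1024) yields 1, 2, 5 and 10, which matches the sizes that returned CL_SUCCESS. The check says nothing about the crashes seen with 1, 2 and 5; those sizes are valid according to the rule, so the crashes look like a separate problem.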
Hi
We were able to reproduce the failure. It appears to be a bug in the SDK compiler.
We are looking into that.
Thanks, Shiri