- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Hi there,
I'm using OpenCL for a few accelerators, but within the last few days I tried to get my stuff running on two Intel CPUs.Unfortunatly my program fails to run on Intel Core i7 2600k and Intel Core i7 2620M and I think that this could be a bug withinthe Intel OpenCL SDK 1.5. I'm using Arch Linux on both Intel CPUs with like I said Intel OpenCL SDK 1.5 and the latest Nvidia OpenCL Headers.
I reduced my program to a very small piece of code, which runs fine on my Nvidia GTX 560TI, Nvidia Quadro FX570 and a IBMQS22 Blade (Cell).
Let's have a look at it: I forward a struct as parameter to an OpenCL kernel, which look like this:
[cpp]struct myStruct { unsigned int A; unsigned int B; unsigned int C; unsigned int D; }; __kernel void ckInit( struct myStruct ctl ) { // just do nothing :-) }[/cpp]
... within the host code, the struct looks like following:
[cpp]struct myStruct { cl_uint A; cl_uint B; cl_uint C; cl_uint D; };[/cpp]
... and within the host code function I set the kernel arguments like this:
[cpp] struct myStruct my_test_struct; my_test_struct.A = 1; my_test_struct.B = 2; my_test_struct.C = 3; my_test_struct.D = 4; error = clSetKernelArg(ckInit, 0, sizeof(my_test_struct), (void*)&my_test_struct); if (error != CL_SUCCESS) { printf("Error creating clSetKernelArg!\n"); exit(error); } size_t szGlobalWorkSize[2] = {8,8}; size_t szLocalWorkSize[2] = {8,8}; error = clEnqueueNDRangeKernel(queue, ckInit, 2, NULL, szGlobalWorkSize, szLocalWorkSize, 0, NULL, NULL); if (error != CL_SUCCESS) { printf("Error creating clEnqueueNDRangeKernel!\n"); exit(error); }[/cpp]
... running this on Nvidia OpenCL devices or the cell OpenCL device works fine.
On Intel devices I get an error and the error code is: "CL_INVALID_ARG_SIZE"
You could say: struct forwarding is not supported ... maybe?!? ... but reducing the struct to two unsigned int(or also float) works on Intel.
Furthermore, increasing the size of the unsigned int to 8, I get a bad segmentation fault. (You can also take 8 floats and it happens just the same.)
GDB shows the following:
[bash](gdb) backtrace #0 0x00007ffff71e71cc in Intel::OpenCL::Framework::MemoryObject::GetDeviceMemoryObject(_cl_device_id*) () from /opt/intel/opencl-sdk/libintelocl.so #1 0x00007ffff71e91f8 in Intel::OpenCL::Framework::MemoryObject::IsAllocated(_cl_device_id*) () from /opt/intel/opencl-sdk/libintelocl.so #2 0x00007ffff7203931 in Intel::OpenCL::Framework::NDRangeKernelCommand::Init() () from /opt/intel/opencl-sdk/libintelocl.so #3 0x00007ffff71f7ff5 in Intel::OpenCL::Framework::ExecutionModule::EnqueueNDRangeKernel(_cl_command_queue*, _cl_kernel*, unsigned int, unsigned long const*, unsigned long const*, unsigned long const*, unsigned int, _cl_event* const*, _cl_event**) () from /opt/intel/opencl-sdk/libintelocl.so #4 0x00007ffff71dc876 in clEnqueueNDRangeKernel () from /opt/intel/opencl-sdk/libintelocl.so #5 0x000000000040145d in oclInterface () #6 0x00000000004010b7 in main ()[/bash]
Within the last few days I tried to get this fixed and read that I have to align my struct (within the host code I think) or pack it ...
But I got confused, because I'm not really familiar with aligning structs and I think it can't be right, because on other devices from other vendors it works fine :-)
So the only way around this problem was following OpenCL kernel (here with 8 unsigned int):
[cpp]struct myStruct { unsigned int A; unsigned int B; unsigned int C; unsigned int D; unsigned int A1; unsigned int B1; unsigned int C1; unsigned int D1; }; __kernel void ckInit( __constant struct myStruct* ctl ) { }[/cpp]
Within the host code I'm creating a buffer and writing the struct into the buffer. This works fine on Nvidia GPU, IBM Cell and Intel ... but this is only a way around and no solution ... :-/
... the host code struct looks like this:
[cpp]struct myStruct { cl_uint A; cl_uint B; cl_uint C; cl_uint D; cl_uint A1; cl_uint B1; cl_uint C1; cl_uint D1; };[/cpp]
... and the host code function like this:
[cpp] struct myStruct my_test_struct; my_test_struct.A = 1;
my_test_struct.B = 2;
my_test_struct.C = 3;
my_test_struct.D = 4;
my_test_struct.A1 = 5;
my_test_struct.B1 = 6;
my_test_struct.C1 = 7;
my_test_struct.D1 = 8;
cl_mem clBuffer_struct = clCreateBuffer(context, CL_MEM_READ_WRITE, sizeof(my_test_struct), NULL, &error); error = clSetKernelArg(ckInit, 0, sizeof(clBuffer_struct), (void*)&clBuffer_struct)); if (error != CL_SUCCESS) { printf("Error creating clSetKernelArg!\n"); exit(error); } error = clEnqueueWriteBuffer(queue, clBuffer_struct, CL_TRUE, 0, sizeof(my_test_struct), (void*) &my_test_struct, 0, NULL, NULL); if (error != CL_SUCCESS) { printf("Error creating clEnqueueWriteBuffer!\n"); exit(error); }[/cpp]
Is this a bug or a feature?
I hope you will fix this :-)
Regards
Patrick
Link Copied
4 Replies
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
*PUSH*
Is there really nobody who has an idea if it is a bug or not?
Regards
Patrick
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Hi Patrick,
In general there is a bit of ambiguity in the OpenCL specification with regards to structs.
The ambiguity is that the struct on the host and the struct on the device don'tnecessarily share the same layout in memory. For example a certain device can choose to pad, reorder members differently than what the host would do. So when one performs a sizeof( mystructure ) on the host or initializes a data member in the structure on the host mystruct.myfield = something... doesn't necessarily reflects well the structure on the device. Wierd stuff could happen...
Having said that, we are now investigating the scenario which you are describing and will let you know soon whether this is a bug or something which falls in the category which I have described above.
Thanks ,
Boaz
In general there is a bit of ambiguity in the OpenCL specification with regards to structs.
The ambiguity is that the struct on the host and the struct on the device don'tnecessarily share the same layout in memory. For example a certain device can choose to pad, reorder members differently than what the host would do. So when one performs a sizeof( mystructure ) on the host or initializes a data member in the structure on the host mystruct.myfield = something... doesn't necessarily reflects well the structure on the device. Wierd stuff could happen...
Having said that, we are now investigating the scenario which you are describing and will let you know soon whether this is a bug or something which falls in the category which I have described above.
Thanks ,
Boaz
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Hi Boaz,
thank you for your feedback and your explanation.
Have you got any interesting results yet? :-)
Regards
Patrick
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Hi Patrick,
We have analyzed the issue and this turns out to be a bug in our SDK.
At the moment it seems like our implementaiton is broken with regards to passing structs to kernels.
We are working on a fix which will be included in one of our coming releases.
I hope you are able to workaround this bug for now.
Thanks for raising this issue in the forum and sorry for the inconenice.
Boaz
We have analyzed the issue and this turns out to be a bug in our SDK.
At the moment it seems like our implementaiton is broken with regards to passing structs to kernels.
We are working on a fix which will be included in one of our coming releases.
I hope you are able to workaround this bug for now.
Thanks for raising this issue in the forum and sorry for the inconenice.
Boaz
Reply
Topic Options
- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Printer Friendly Page