- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Hello,
I'm trying to run code that was working on OpenCL GPU platform on Intel MIC platform. On the host computer, there is one Intel MIC, and two Xeon CPUs. I'm trying to get program binaries for caching (for the reasons described in the "Reusing Compilation Results with clCreateProgramWithBinary" article).
I'm only compiling for the MIC device, but clGetProgramInfo with CL_PROGRAM_NUM_DEVICES returns two devices.
However, clGetProgramInfo with CL_PROGRAM_BINARY_SIZES returns 0 B (CPU binary) and 810 B (MIC binary).
Finally, clGetProgramInfo with CL_PROGRAM_BINARIES fails with -59 (CL_INVALID_OPERATION). I have tried changing the value of the size argument, when too small (less than 2 * sizeof(void*) - the function needs two pointers to the buffers where to store the binaries) it returns CL_INVALID_VALUE, but when there is sufficient size, it just fails with CL_INVALID_OPERATION. Maybe I do not understand how this is supposed to work, but the same code works fine with GPUs.
My kernel is:
__kernel void f(__global int *p)
{
p[get_global_id(0)] = 0;
}
Version number: OpenCL 1.2 (Build 67279)
Operating System: Linux OS (Bullx linux)
Link Copied
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Hi Yuri,
thanks for a quick response, I really appreciate it.
As for 1), the machine is a part of our national supercomputing centre, I don't really have rights to install new software. But I'll try to ask the admins.
Regarding 2), the speed is not really so much of an issue. This problem occurs in a larger application that caches the binaries by default and this makes it crash. It could be disabled, but at the same time - it should work, it is one of the basic functionalities that always were in OpenCL. Also, we did some research into processing the intermediate code in order to do some strange optimizations and it would be interesting to have it, but that is not an issue at this point.
If it helps, I can try to put together some minimal code that reproduces the issue, but it would take me a while, there are currently some more pressing deadlines.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
even if I create my programs from the offline-compiled IR files using clCreateProgramWithBinary, the subsequent clBuildProgram calls take quite some time for some of my kernels. I was looking into a way to cache the results, but clGetProgramInfo(CL_PROGRAM_BINARIES) just gives me the IR code, identical to what the offline compiler spit out.
Is there a way to retrieve the result of whatever clBuildProgram makes with the IR code?
On a side note, I found that Nvidia's driver caches the programs internally. But this turned out to be quite a pitfall because it seems they don't follow #include commands properly. Even after I changed the source code in of the files included by my main kernel file, it would use the cached version. I finally disabled it with CUDA_CACHE_DISABLE=1.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
I would be just fine with IR, if it worked. As I said, the speed is not an issue, it is a part of a bigger package which caches binaries by default and fails on Intel's OpenCL platform for some reason.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
It is a bit messy, but here goes:
bool CCLProgramStorage::TProgramBinary::Download(cl_program h_program, size_t n_device_num) { _ASSERTE(!t_data_buffer.p_data && binary_size_list.empty()); // make sure this is empty printf("debug: CCLProgramStorage::TProgramBinary::Download()\n"); cl_uint _n_device_num; if(clGetProgramInfo(h_program, CL_PROGRAM_NUM_DEVICES, sizeof(size_t), &_n_device_num, NULL) != CL_SUCCESS || _n_device_num != n_device_num) { printf("error: have %d devices, device query returns %d\n", int(n_device_num), int(_n_device_num)); //return false; n_device_num = _n_device_num; // this fails on Intel MIC, I compiled for 1 device and get binaries for two! } // get number of devices, make sure it matches description if(!n_device_num || n_device_num > UINT32_MAX) return false; // we can't query binaries without devices printf("debug: have %d devices\n", int(n_device_num)); std::vector<const unsigned char*> binary_ptr_list; binary_ptr_list.resize(n_device_num); binary_size_list.resize(n_device_num); if(clGetProgramInfo(h_program, CL_PROGRAM_BINARY_SIZES, n_device_num * sizeof(size_t), &binary_size_list[0], NULL) != CL_SUCCESS) { printf("error: clGetProgramInfo(h_program, CL_PROGRAM_BINARY_SIZES) failed\n"); return false; } // get binary sizes uint64_t n_size_total = 0; for(size_t i = 0; i < n_device_num; ++ i) { if(n_size_total > UINT64_MAX - binary_size_list) return false; n_size_total += binary_size_list; } // sum binary sizes up t_data_buffer.n_size = n_size_total; if(n_size_total > SIZE_MAX || !(t_data_buffer.p_data = new(std::nothrow) uint8_t[size_t(n_size_total)])) return false; // alloc data buffer printf("debug: allocated buffer for " PRIsizeB "B\n", PRIsizeBparams(n_size_total)); { const uint8_t *p_data_ptr = t_data_buffer.p_data; for(size_t i = 0; i < n_device_num; ++ i) { binary_ptr_list = (const unsigned char*)p_data_ptr; p_data_ptr += binary_size_list; printf("debug: size %d is " PRIsizeB "B\n", i, PRIsizeBparams(binary_size_list)); } _ASSERTE(p_data_ptr == t_data_buffer.p_data + t_data_buffer.n_size); } // get pointers to individual binaries printf("debug: have binary ptrs\n"); int n_result; size_t n_ret_size = 0; if((n_result = clGetProgramInfo(h_program, CL_PROGRAM_BINARIES, size_t(n_size_total), 0, &n_ret_size)) != CL_SUCCESS) { fprintf(stderr, "error: the first clGetProgramInfo(CL_PROGRAM_BINARIES) failed with:" " %d (ret size = " PRIsize ")\n", n_result, n_ret_size); return false; } fprintf(stderr, "debug: the first clGetProgramInfo(CL_PROGRAM_BINARIES) returns:" " %d (ret size = " PRIsize ")\n", n_result, n_ret_size); _ASSERTE(n_device_num * sizeof(void*) == n_ret_size || n_size_total == n_ret_size); if((n_result = clGetProgramInfo(h_program, CL_PROGRAM_BINARIES, n_ret_size, &binary_ptr_list[0], &n_ret_size)) != CL_SUCCESS) { fprintf(stderr, "error: clGetProgramInfo(CL_PROGRAM_BINARIES) failed with:" " %d (ret size = " PRIsize ")\n", n_result, n_ret_size); return false; } // download binaries printf("debug: CCLProgramStorage::TProgramBinary::Download() succeeded\n"); return true; }
Right now I can't verify how that works, as we used up all the CPU-hours we were awarded in our project. But as far as I remember, the last call to clGetProgramInfo(h_program, CL_PROGRAM_BINARIES, ...) fails.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Uh huh, posting source code totally works on this site. When I said "messy", i did not mean unformatted. You can get the source code at http://www.luki.webzdarma.cz/up/CLKernelDownload.cpp. Thanks ...
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Thanks for the code. And again it's working fine for me on XE 2013 R3. This means that the issue you're observing was probably fixed. So, I would still suggest to pursue the responsible people to perform an upgrade to this latest version.
Thanks,
Yuri
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Volker,
We connnnsider enabling saving the very final JIT, allowing customers avoiding the re-compilation time. Your inputs would be well appreciated.
Thanks,
Arik
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Arik,
OpenGL (not OpenCL, mind you) API features binary formats, so that a binary can be received in any one of N formats that the API supports (see e.g. http://www.khronos.org/opengles/sdk/docs/man/xhtml/glShaderBinary.xml). Would the same be possible as an extension to OpenCL, so that someone could decide what format they want?
Best,
LukasP
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Dear Lukas,
We currently don't support any standard binary format. However, we are looking at the OpenCL SPIR extension as a portable binary solution.
Would this satisfy your needs?
Regards,
Arik
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Arik,
that is actually very interesting. That would suit one of our applications very well.
Best,
Lukas
- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Printer Friendly Page