Community
cancel
Showing results for 
Search instead for 
Did you mean: 
Lukas_S_
Beginner
174 Views

clGetProgramInfo() with CL_PROGRAM_BINARIES fails

Hello,

I'm trying to run code that was working on OpenCL GPU platform on Intel MIC platform. On the host computer, there is one Intel MIC, and two Xeon CPUs. I'm trying to get program binaries for caching (for the reasons described in the "Reusing Compilation Results with clCreateProgramWithBinary" article).

I'm only compiling for the MIC device, but clGetProgramInfo with CL_PROGRAM_NUM_DEVICES returns two devices.

However, clGetProgramInfo with CL_PROGRAM_BINARY_SIZES returns 0 B (CPU binary) and 810 B (MIC binary).

Finally, clGetProgramInfo with CL_PROGRAM_BINARIES fails with -59 (CL_INVALID_OPERATION). I have tried changing the value of the size argument, when too small (less than 2 * sizeof(void*) - the function needs two pointers to the buffers where to store the binaries) it returns CL_INVALID_VALUE, but when there is sufficient size, it just fails with  CL_INVALID_OPERATION. Maybe I do not understand how this is supposed to work, but the same code works fine with GPUs.

My kernel is:

__kernel void f(__global int *p)

{

    p[get_global_id(0)] = 0;

}

Version number: OpenCL 1.2 (Build 67279)
Operating System: Linux OS (Bullx linux)

0 Kudos
12 Replies
Yuri_K_Intel
Employee
174 Views

Hi Lukas, I will try to reproduce the errors and let you know the results. Meanwhile I have a couple of comments. 1) Your version (67279) of Intel® SDK for OpenCL* Applications is not the latest one. Could you please upgrade to the recently released R3 version (http://software.intel.com/en-us/vcsource/tools/opencl-sdk-xe) and check if the errors are still present? 2) As I understand your final goal is to reduce compilation time. Currently what we return as a binary is not an actual binary, but rather an IR (intermediate representation), so there won't be significant speed up of compilation time. We have a number of requests to provide the final executable binary which should save a lot of compilation time, but at this time I don't have anymore details besides this. Thanks, Yuri
Lukas_S_
Beginner
174 Views

Hi Yuri,

thanks for a quick response, I really appreciate it.

As for 1), the machine is a part of our national supercomputing centre, I don't really have rights to install new software. But I'll try to ask the admins.

Regarding 2), the speed is not really so much of an issue. This problem occurs in a larger application that caches the binaries by default and this makes it crash. It could be disabled, but at the same time - it should work, it is one of the basic functionalities that always were in OpenCL. Also, we did some research into processing the intermediate code in order to do some strange optimizations and it would be interesting to have it, but that is not an issue at this point.

If it helps, I can try to put together some minimal code that reproduces the issue, but it would take me a while, there are currently some more pressing deadlines.

 

Volker_Bruns
Beginner
174 Views

even if I create my programs from the offline-compiled IR files using clCreateProgramWithBinary, the subsequent clBuildProgram calls take quite some time for some of my kernels. I was looking into a way to cache the results, but clGetProgramInfo(CL_PROGRAM_BINARIES) just gives me the IR code, identical to what the offline compiler spit out.

Is there a way to retrieve the result of whatever clBuildProgram makes with the IR code?

On a side note, I found that Nvidia's driver caches the programs internally. But this turned out to be quite a pitfall because it seems they don't follow #include commands properly.  Even after I changed the source code in of the files included by my main kernel file, it would use the cached version. I finally disabled it with CUDA_CACHE_DISABLE=1.

Lukas_S_
Beginner
174 Views

I would be just fine with IR, if it worked. As I said, the speed is not an issue, it is a part of a bigger package which caches binaries by default and fails on Intel's OpenCL platform for some reason.
 

Yuri_K_Intel
Employee
174 Views

Lukas, So far I couldn't reproduce the issue on the latest available version - XE 2013 R3. I'm getting similar results for CL_PROGRAM_NUM_DEVICES and CL_PROGRAM_BINARY_SIZES queries even if compilation was for MIC only (i.e. 2 devices and 0B for CPU). I think this is not an issue and this happens due to the fact that context contains both devices. Could you please post a snippet of your code for clGetProgramInfo with CL_PROGRAM_BINARIES? Thanks, Yuri
Lukas_S_
Beginner
174 Views

It is a bit messy, but here goes:

bool CCLProgramStorage::TProgramBinary::Download(cl_program h_program, size_t n_device_num)
{
	_ASSERTE(!t_data_buffer.p_data && binary_size_list.empty()); // make sure this is empty

	printf("debug: CCLProgramStorage::TProgramBinary::Download()\n");

	cl_uint _n_device_num;
	if(clGetProgramInfo(h_program, CL_PROGRAM_NUM_DEVICES,
	   sizeof(size_t), &_n_device_num, NULL) != CL_SUCCESS || _n_device_num != n_device_num) {
		printf("error: have %d devices, device query returns %d\n", int(n_device_num), int(_n_device_num));
		//return false;
		n_device_num = _n_device_num; // this fails on Intel MIC, I compiled for 1 device and get binaries for two!
	}
	// get number of devices, make sure it matches description

	if(!n_device_num || n_device_num > UINT32_MAX)
		return false;
	// we can't query binaries without devices

	printf("debug: have %d devices\n", int(n_device_num));

	std::vector<const unsigned char*> binary_ptr_list;
	binary_ptr_list.resize(n_device_num);
	binary_size_list.resize(n_device_num);
	if(clGetProgramInfo(h_program, CL_PROGRAM_BINARY_SIZES,
	   n_device_num * sizeof(size_t), &binary_size_list[0], NULL) != CL_SUCCESS) {
		printf("error: clGetProgramInfo(h_program, CL_PROGRAM_BINARY_SIZES) failed\n");
		return false;
	}
	// get binary sizes

	uint64_t n_size_total = 0;
	for(size_t i = 0; i < n_device_num; ++ i) {
		if(n_size_total > UINT64_MAX - binary_size_list)
			return false;
		n_size_total += binary_size_list;
	}
	// sum binary sizes up

	t_data_buffer.n_size = n_size_total;
	if(n_size_total > SIZE_MAX || !(t_data_buffer.p_data = new(std::nothrow) uint8_t[size_t(n_size_total)]))
		return false;
	// alloc data buffer

	printf("debug: allocated buffer for " PRIsizeB "B\n", PRIsizeBparams(n_size_total));

	{
		const uint8_t *p_data_ptr = t_data_buffer.p_data;
		for(size_t i = 0; i < n_device_num; ++ i) {
			binary_ptr_list = (const unsigned char*)p_data_ptr;
			p_data_ptr += binary_size_list;
			printf("debug: size %d is " PRIsizeB "B\n", i, PRIsizeBparams(binary_size_list));
		}
		_ASSERTE(p_data_ptr == t_data_buffer.p_data + t_data_buffer.n_size);
	}
	// get pointers to individual binaries

	printf("debug: have binary ptrs\n");

	int n_result;
	size_t n_ret_size = 0;
	if((n_result = clGetProgramInfo(h_program, CL_PROGRAM_BINARIES,
	   size_t(n_size_total), 0, &n_ret_size)) != CL_SUCCESS) {
		fprintf(stderr, "error: the first clGetProgramInfo(CL_PROGRAM_BINARIES) failed with:"
			" %d (ret size = " PRIsize ")\n", n_result, n_ret_size);
		return false;
	}
	fprintf(stderr, "debug: the first clGetProgramInfo(CL_PROGRAM_BINARIES) returns:"
		" %d (ret size = " PRIsize ")\n", n_result, n_ret_size);
	_ASSERTE(n_device_num * sizeof(void*) == n_ret_size || n_size_total == n_ret_size);
	if((n_result = clGetProgramInfo(h_program, CL_PROGRAM_BINARIES,
	   n_ret_size, &binary_ptr_list[0], &n_ret_size)) != CL_SUCCESS) {
		fprintf(stderr, "error: clGetProgramInfo(CL_PROGRAM_BINARIES) failed with:"
			" %d (ret size = " PRIsize ")\n", n_result, n_ret_size);
		return false;
	}
	// download binaries

	printf("debug: CCLProgramStorage::TProgramBinary::Download() succeeded\n");

	return true;
}



 

Right now I can't verify how that works, as we used up all the CPU-hours we were awarded in our project. But as far as I remember, the last call to  clGetProgramInfo(h_program, CL_PROGRAM_BINARIES, ...) fails.







 

Lukas_S_
Beginner
174 Views

Uh huh, posting source code totally works on this site. When I said "messy", i did not mean unformatted. You can get the source code at http://www.luki.webzdarma.cz/up/CLKernelDownload.cpp. Thanks ...


 

Yuri_K_Intel
Employee
174 Views

Thanks for the code. And again it's working fine for me on XE 2013 R3. This means that the issue you're observing was probably fixed. So, I would still suggest to pursue the responsible people to perform an upgrade to this latest version.

Thanks,

Yuri

Arik_N_Intel
Employee
174 Views

Volker,

We connnnsider enabling saving the very final JIT, allowing customers avoiding the re-compilation time. Your inputs would be well appreciated.

Thanks,

Arik

Lukas_S_
Beginner
174 Views

Arik,

OpenGL (not OpenCL, mind you) API features binary formats, so that a binary can be received in any one of N formats that the API supports (see e.g. http://www.khronos.org/opengles/sdk/docs/man/xhtml/glShaderBinary.xml). Would the same be possible as an extension to OpenCL, so that someone could decide what format they want?

Best,

LukasP


 

Arik_N_Intel
Employee
174 Views

Dear Lukas,

We currently don't support any standard binary format. However, we are looking at the OpenCL SPIR extension as a portable binary solution.

http://www.khronos.org/spir

Would this satisfy your needs?

Regards,

Arik

 

 

Lukas_S_
Beginner
174 Views

Arik,

that is actually very interesting. That would suit one of our applications very well.

Best,

Lukas