I'm running into a problem where data is not being written to my buffer when the kernels finish. I tested my kernel in isolation in Eclipse on Ubuntu with an Intel i5 CPU, and it appears to produce the correct results. When I move it over to CentOS, no printf output comes back from the kernel and my output buffers are never written to. Here is an example of my code:
double * coef_elts = (double *) calloc(p * voxels, sizeof(double));
return_vec_1 = clCreateBuffer(context, CL_MEM_READ_WRITE | CL_MEM_USE_HOST_PTR, sizeof(double) * p * voxels, coef_elts, &err);
err = clSetKernelArg(kernel, 26, sizeof(cl_mem), &return_vec_1);
err = clEnqueueNDRangeKernel(queue, kernel, 1, NULL, &global_size, NULL, 0, NULL, NULL);
err = clEnqueueReadBuffer(queue, return_vec_1, CL_TRUE, 0, sizeof(double) * p * voxels, coef_elts, 0, NULL, NULL);
When I read the output, it contains only the zeros from calloc. This wasn't the case in Eclipse. If anyone has suggestions on the code or on getting output under CentOS, it would be much appreciated. I'm aware CentOS is not supported, but unfortunately I cannot change the OS.
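One way to tell whether the kernel ran at all, or ran and legitimately wrote zeros, is to fill the host array with a sentinel value instead of relying on calloc's zeros, then count how many elements still hold the sentinel after the read. This works with CL_MEM_USE_HOST_PTR or CL_MEM_COPY_HOST_PTR, since both initialize the buffer from the host array. A minimal sketch in plain C; `fill_sentinel`, `count_untouched` and the `SENTINEL` value are hypothetical helpers, not part of OpenCL.

```c
#include <stddef.h>

/* Hypothetical marker value: pick something the kernel should never produce. */
#define SENTINEL (-12345.0)

/* Call before clCreateBuffer so the device buffer starts out "dirty". */
static void fill_sentinel(double *buf, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        buf[i] = SENTINEL;
}

/* Call after clEnqueueReadBuffer: returns how many elements the kernel
 * did not overwrite. */
static size_t count_untouched(const double *buf, size_t n)
{
    size_t untouched = 0;
    for (size_t i = 0; i < n; ++i)
        if (buf[i] == SENTINEL)
            ++untouched;
    return untouched;
}
```

If `count_untouched(coef_elts, p * voxels)` equals `p * voxels`, nothing reached the buffer, which points at the launch or the transfer rather than at the kernel's arithmetic.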
Try the following just to make sure that the kernel completes:
cl_event event = NULL;
err = clEnqueueNDRangeKernel(queue, kernel, 1, NULL, &global_size, NULL, 0, NULL, &event);
err = clWaitForEvents(1, &event);   /* block until the kernel has finished */
err = clReleaseEvent(event);
You could also attach a completion event to clEnqueueReadBuffer, though that is not strictly necessary, since the read is already blocking (CL_TRUE).
Another workaround is to put clFlush and/or clFinish between clEnqueueNDRangeKernel and clEnqueueReadBuffer.
Let me know how it worked out.
I've tried what you suggested and I'm still getting the same results. I also tried changing the CL_MEM_USE_HOST_PTR flag to CL_MEM_COPY_HOST_PTR and CL_MEM_ALLOC_HOST_PTR just in case it made a difference, but neither worked. I'm now trying to run the code on the Intel Xeon processor instead of the Phi to check whether the problem is with the Phi or the OS.
Could you please provide a reproducer, if possible? Also, which OS version, processor, and OpenCL driver version are you using?
At a minimum, could you post your kernel?
Update: The code works on the Xeon processor, so the problem is most likely with the Phi. I have to leave the computer for a while, but I will return with the information you asked for. Also, I'm unfamiliar with the term "reproducer"; if you can elaborate, I'd be happy to provide one when I return.
A reproducer is a buildable minimal code sample that reproduces the problem. Usually, we use it to reproduce the issue on our end and file the bug with the driver team.
I've talked to my partner who is in charge of the dev environment, and he says the problem might be that I can't compile the code for the Phi on the machine I'm writing it on, since that machine doesn't have a Phi. I've been sending the binary files over to the node that does have the Phi. If that is the problem, we can fix it pretty easily. Here is the information you asked for, in case you want to test it anyway.
OS: CentOS version 7.0.1406
Processor: Intel(R) Xeon(R) CPU E5-2630 v3 @ 2.40GHz
Co-processor: Intel Corporation Xeon Phi coprocessor 31S1 (rev 11)
OpenCL driver version: 188.8.131.52_x64