I am developing an OpenCL kernel for a particle simulation and I have run into a problem: I need to transfer a lot of arguments to the device (I have not counted them exactly, but judging from the original code there could be more than 20 or 30 float arrays). Could you suggest a proper way to handle that many arguments without having to call clSetKernelArg() and clEnqueueWriteBuffer() more than 20 times?
The original code was written in Fortran and stored the data in COMMON blocks, so one approach I tried was Fortran/C interoperability: I store the arguments in matching structs and pass those structs to the kernel. I wrote a C source file that handles all of the OpenCL execution and then returns the results to the Fortran code for post-processing. What would you suggest?
Thanks for your help! Best regards!
1. You don't need clEnqueueWriteBuffer if you allocate your arrays with 4096-byte alignment (e.g. _aligned_malloc on Windows, or posix_memalign/aligned_alloc elsewhere), create the buffers with CL_MEM_USE_HOST_PTR, and make their sizes multiples of 64 bytes (the cache-line size) when targeting Intel Processor Graphics. You will still need one clSetKernelArg call per buffer, though.
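2. You can pack all of your arrays into one large buffer and pass that buffer together with a struct holding the offset of each array within it. The kernel then needs only two arguments: the buffer and the struct.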
3. You can do a combination of 1. and 2.
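A minimal sketch of option 3 (a single packed buffer plus an offsets struct, allocated so it can be wrapped with CL_MEM_USE_HOST_PTR). It assumes C11 `aligned_alloc` (on MSVC you would use `_aligned_malloc`/`_aligned_free` instead). The names `PackedLayout`, `build_layout`, `pack_arrays` and the `ARR_*` indices are hypothetical, and padding every sub-array to 64 bytes is my own choice, not a requirement. The OpenCL calls are shown only in a comment so the packing logic can be compiled and checked on its own.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical array indices; a real port would list all 20-30 arrays,
   ideally in a header shared with the kernel source. */
enum { ARR_POS_X, ARR_POS_Y, ARR_MASS, NUM_ARRAYS };

#define BUF_ALIGN 4096u /* page alignment for zero-copy with CL_MEM_USE_HOST_PTR */
#define CACHELINE 64u   /* start every sub-array on a cache-line boundary */

/* Offsets and lengths (in floats) of each array inside the packed buffer.
   In real host code use cl_uint so the layout matches the kernel's uint. */
typedef struct {
    unsigned int offset[NUM_ARRAYS];
    unsigned int count[NUM_ARRAYS];
} PackedLayout;

static size_t round_up(size_t n, size_t m) { return (n + m - 1) / m * m; }

/* Fill in the offsets and return the total size in bytes, rounded up to a
   multiple of BUF_ALIGN (which is also a multiple of 64). */
static size_t build_layout(PackedLayout *lay, const unsigned int counts[NUM_ARRAYS]) {
    size_t bytes = 0;
    for (int i = 0; i < NUM_ARRAYS; ++i) {
        lay->offset[i] = (unsigned int)(bytes / sizeof(float));
        lay->count[i] = counts[i];
        bytes += round_up(counts[i] * sizeof(float), CACHELINE);
    }
    return round_up(bytes, BUF_ALIGN);
}

/* Allocate one aligned block and copy every source array to its offset.
   The caller releases it with free(). */
static float *pack_arrays(const PackedLayout *lay, size_t total,
                          const float *const src[NUM_ARRAYS]) {
    assert(total % BUF_ALIGN == 0); /* aligned_alloc wants size % align == 0 */
    float *buf = aligned_alloc(BUF_ALIGN, total);
    if (buf == NULL) return NULL;
    memset(buf, 0, total); /* zero the padding between arrays */
    for (int i = 0; i < NUM_ARRAYS; ++i)
        memcpy(buf + lay->offset[i], src[i], lay->count[i] * sizeof(float));
    return buf;
}

/* With OpenCL, the packed block would then be used roughly like this:
 *   cl_mem d = clCreateBuffer(ctx, CL_MEM_READ_WRITE | CL_MEM_USE_HOST_PTR,
 *                             total, buf, &err);
 *   clSetKernelArg(kernel, 0, sizeof(cl_mem), &d);
 *   clSetKernelArg(kernel, 1, sizeof(PackedLayout), &lay);
 * and on the device side (OpenCL C):
 *   typedef struct { uint offset[NUM_ARRAYS]; uint count[NUM_ARRAYS]; } PackedLayout;
 *   __kernel void step(__global float *data, PackedLayout lay) {
 *       __global float *mass = data + lay.offset[ARR_MASS];
 *       ...
 *   }
 */
```

The point of the design is that adding a new array only means adding an enum entry and a count: the number of clSetKernelArg calls stays at two no matter how many arrays are packed.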