I am currently using 'OpenCL 1.0 Intel(R) FPGA SDK for OpenCL(TM), Version 19.2' on a PAC 5005 (Stratix 10) card to offload OpenCL kernels to the FPGA.
Suppose there are two cl_programs, prog1 and prog2 (i.e. two bitstreams), each containing one simple kernel: kern1 in prog1 and kern2 in prog2. When at least one instance of each kernel is enqueued (possibly into different command queues), the FPGA has to be reprogrammed with the other bitstream at some point. During this time, strace shows a slow ioctl() on /dev/intel-fpga-fme.0.
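To make the cost of this scenario concrete, here is a minimal sketch of how I reason about it. It is only a model, not my actual host code and not part of the OpenCL API: the function names are hypothetical, each launch is reduced to the index of the program its kernel belongs to, and I am assuming the runtime reprograms the card whenever the next kernel belongs to a different program than the one currently loaded.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model: a launch is represented only by the index of the
 * cl_program (bitstream) that its kernel belongs to, e.g. 1 for prog1. */

/* Count how often the FPGA would be reprogrammed for a given launch order,
 * ASSUMING a reprogram happens whenever the next kernel's program differs
 * from the one currently loaded. `loaded` is the program on the card before
 * the first launch; pass -1 if no matching bitstream is loaded yet. */
size_t count_reprograms(const int *launches, size_t n, int loaded)
{
    assert(launches != NULL || n == 0);
    size_t switches = 0;
    for (size_t i = 0; i < n; ++i) {
        if (launches[i] != loaded) {
            loaded = launches[i];   /* bitstream is swapped here */
            ++switches;
        }
    }
    return switches;
}

/* Estimated total time spent reprogramming, in seconds, for a measured
 * per-switch cost (roughly 2 s in my case). */
double reprogram_seconds(size_t switches, double seconds_per_switch)
{
    return (double)switches * seconds_per_switch;
}
```

Under these assumptions, the order kern1, kern2, kern1, kern2 on a card with neither bitstream loaded gives 4 reprogrammings (about 8 s at 2 s each), while kern1, kern1, kern2, kern2 gives 2.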
I have a few questions regarding this. Maybe someone with more experience with the runtime could help me.
Is it typical for a reconfiguration to take roughly 2 seconds (for a ~16 MB .aocx file)?
Is there any way to speed this up, e.g. with prefetching, with or without OpenCL?
During the reconfiguration the card seems completely blocked: even data transfers to and from global memory stop (or rather, pending copy tasks must finish first). Is there a way around that?
If you need more concrete code samples and/or profiles, just let me know.