Sorry if this question is not entirely on topic. I'm looking for a reference design that implements convolution as a systolic array using OpenCL on Arria 10 FPGAs. I noticed that Xilinx already provides a sample for matrix-matrix multiplication using systolic arrays, and their compiler is smart enough to derive a systolic array design without any extra effort from the developer. With Intel's OpenCL SDK, this doesn't seem to be the case.
I have read the part of the Intel FPGA SDK for OpenCL Programming Guide that explains how to create an array of autorun compute units and connect them as a mesh, but I would like to see a reference design for a simple convolution.
Can anyone point me to source code or other material, if such a thing exists?
Use a Single Kernel to Describe Systolic Arrays https://www.intel.com/content/www/us/en/programmable/documentation/mwh1391807516407.html#xis1520273381539
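The idea behind that section is to describe the whole array inside one kernel, with a fully unrolled loop (`#pragma unroll`) producing one hardware PE per iteration. As a rough sketch of that structure applied to convolution, here is a plain-C model (not OpenCL, and not Intel's own example) of a 1D convolution, using 1D for brevity. The names `systolic_conv1d`, `direct_conv1d` and the `MAX_PES` bound are hypothetical. Each outer-loop iteration stands for one clock cycle, and the inner loop over PEs is the part that would be unrolled in the real kernel.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_PES 16 /* hypothetical upper bound on taps (= processing elements) */

/* Reference: direct 1D convolution, y[n] = sum_k w[k] * x[n - k],
 * treating samples before x[0] as zero. */
void direct_conv1d(const int *w, size_t k, const int *x, int *y, size_t n) {
    for (size_t i = 0; i < n; i++) {
        int acc = 0;
        for (size_t j = 0; j < k && j <= i; j++)
            acc += w[j] * x[i - j];
        y[i] = acc;
    }
}

/* Single-kernel model: one outer iteration per input sample ("cycle").
 * PE j keeps weight w[j] stationary and owns a partial-sum register r[j].
 * The new sample is broadcast to every PE, and each PE adds its product
 * to the partial sum handed over by its right neighbour (transposed-form FIR).
 * In an OpenCL kernel the inner loop is what would be fully unrolled. */
void systolic_conv1d(const int *w, size_t k, const int *x, int *y, size_t n) {
    assert(k >= 1 && k <= MAX_PES);
    int r[MAX_PES] = {0}; /* partial-sum registers, one per PE */
    for (size_t i = 0; i < n; i++) {
        /* Ascending j: r[j + 1] still holds the previous cycle's value. */
        for (size_t j = 0; j < k; j++) {
            int from_right = (j + 1 < k) ? r[j + 1] : 0;
            r[j] = from_right + w[j] * x[i];
        }
        y[i] = r[0]; /* PE 0 emits one finished output per cycle */
    }
}
```

With `w = {1, 2, 3}` and `x = {1, ..., 8}`, both functions should produce `1, 4, 10, 16, 22, 28, 34, 40`. Note that the input sample is broadcast to every PE here (a semi-systolic arrangement). Only the partial sums move between neighbouring PEs.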
An example use case of a 2D array of compute units is a systolic array of kernels. https://www.intel.com/content/www/us/en/programmable/documentation/mwh1391807965224.html
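In the SDK, that approach typically means an `autorun` kernel replicated with `num_compute_units`, where each copy finds its position with `get_compute_id` and talks to its neighbours over channels. A minimal functional sketch of that dataflow for a 1D convolution, written in plain C under the assumption that each compute unit is modelled as a struct and each channel as a single token passed to the next unit, could look like this. `token_t`, `unit_t`, `unit_fire`, `chained_conv1d` and `MAX_UNITS` are hypothetical names, and a 1D chain stands in for the 2D mesh.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_UNITS 16 /* hypothetical upper bound on chained compute units */

/* What travels over the channel between neighbouring compute units. */
typedef struct {
    int x;    /* input sample, delayed by one step per unit */
    int psum; /* partial sum of the output being built */
} token_t;

/* State that stays inside one compute unit. */
typedef struct {
    int w;      /* stationary weight of this unit */
    int x_prev; /* one-sample delay register (starts at 0) */
} unit_t;

/* One firing of a compute unit: consume a token from the left neighbour,
 * accumulate its weighted sample, and produce a token for the right one. */
static token_t unit_fire(unit_t *u, token_t in) {
    token_t out;
    out.psum = in.psum + u->w * in.x;
    out.x = u->x_prev; /* forward the sample delayed by one step */
    u->x_prev = in.x;
    return out;
}

/* Functional model of k units chained by FIFOs. Every unit consumes and
 * produces exactly one token per step, so passing each token through the
 * chain in order gives the same results a FIFO-connected pipeline would. */
void chained_conv1d(const int *w, size_t k, const int *x, int *y, size_t n) {
    assert(k >= 1 && k <= MAX_UNITS);
    unit_t units[MAX_UNITS];
    for (size_t j = 0; j < k; j++) {
        units[j].w = w[j];
        units[j].x_prev = 0;
    }
    for (size_t i = 0; i < n; i++) {
        token_t t = { x[i], 0 }; /* producer feeds x[i] with an empty sum */
        for (size_t j = 0; j < k; j++)
            t = unit_fire(&units[j], t); /* unit j sees x[i - j] */
        y[i] = t.psum; /* last unit's output: sum_j w[j] * x[i - j] */
    }
}
```

Each unit only exchanges tokens with its immediate neighbour, which is what makes the arrangement systolic. In hardware, each channel would connect unit `j` to unit `j + 1` only.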