I am wondering whether anyone knows of an efficient parallel prefix sum implementation in OpenCL for FPGAs. I am currently using the one from CLPP, but it is extremely slow, which I guess makes sense since it was originally developed for GPUs. Does anyone know of an open-source parallel prefix sum optimized for FPGAs? Thanks.
If you have CUDA code for the prefix sum, you can convert it to DPC++ and then try compiling the DPC++ code for the FPGA.
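If the converted GPU-style code is still slow, one thing that may be worth trying is a single work-item formulation. The sketch below is only an illustration in plain C++, not code from CLPP or from any library, and the function name `inclusive_scan_single_task` is made up for this example. It assumes the FPGA compiler can pipeline a simple loop with one loop-carried accumulator, which is how FPGA OpenCL and DPC++ toolchains commonly handle single work-item kernels. In DPC++ the loop body would presumably sit inside a `single_task` kernel and read and write device memory rather than a `std::vector`.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper (name chosen for this example only): an inclusive
// prefix sum written as one sequential loop. On a GPU this would be slow,
// but on an FPGA a loop of this shape can usually be pipelined so that one
// element is processed per clock cycle.
std::vector<int> inclusive_scan_single_task(const std::vector<int>& in) {
    std::vector<int> out(in.size());
    int running = 0;  // loop-carried accumulator, a register on the FPGA
    for (std::size_t i = 0; i < in.size(); ++i) {
        running += in[i];  // one add per iteration
        out[i] = running;  // one store per iteration
    }
    return out;
}
```

For example, calling `inclusive_scan_single_task({3, 1, 4, 1, 5})` returns `{3, 4, 8, 9, 14}`. Whether this beats a work-group based scan on a particular board depends on memory bandwidth and the achieved loop initiation interval, so it should be checked against the compiler's report.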
Thanks and Regards