We are implementing ResNet-101 on an Arria 10 using the Intel FPGA SDK. We found an open-source repository, PipeCNN, that implements ResNet-50 with FP8, but the frame rate it achieves is only 10 fps, which is very low compared to other edge platforms.
Our questions are as follows:
- Are there optimized libraries we could use for parts of the implementation instead of hand-coding or reusing Git-based code? Note that our FPGA design will also include custom peripherals, so we cannot use OpenVINO, because it does not let us customize the IPs.
- We are already experimenting with LANE_NUM and VEC_SIZE (parameters in the open-source code that control how parallel the implementation is). Are there any other options we could try?
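To reason about how far LANE_NUM and VEC_SIZE alone can take the design, a rough ideal-throughput model can help. The sketch below assumes (not verified against PipeCNN's source) that the convolution kernel completes LANE_NUM × VEC_SIZE multiply-accumulates per clock cycle with no stalls. The 16 × 16 configuration and 200 MHz clock are illustrative values, not PipeCNN defaults, and the per-frame operation counts are approximations based on the roughly 3.8 and 7.6 billion multiply-adds commonly cited for ResNet-50 and ResNet-101.

```python
# Sketch of an ideal (stall-free) throughput model for a PipeCNN-style
# convolution kernel. ASSUMPTION: the kernel completes LANE_NUM * VEC_SIZE
# multiply-accumulates (MACs) per clock cycle; verify against the actual code.

def peak_gops(lane_num: int, vec_size: int, fmax_mhz: float) -> float:
    """Ideal throughput in GOP/s, counting each MAC as 2 operations."""
    macs_per_cycle = lane_num * vec_size
    return 2 * macs_per_cycle * fmax_mhz * 1e6 / 1e9


def fps_upper_bound(gops: float, gop_per_frame: float) -> float:
    """Frames per second if the kernel sustained `gops` for the whole frame."""
    return gops / gop_per_frame


# Approximate operation counts (2 ops per multiply-add), based on the ~3.8 and
# ~7.6 billion multiply-adds commonly cited for ResNet-50 and ResNet-101.
GOP_PER_FRAME = {"ResNet-50": 7.6, "ResNet-101": 15.2}

if __name__ == "__main__":
    # Illustrative values only, not PipeCNN defaults.
    lane_num, vec_size, fmax_mhz = 16, 16, 200.0
    gops = peak_gops(lane_num, vec_size, fmax_mhz)
    print(f"peak: {gops:.1f} GOP/s")
    for net, gop in GOP_PER_FRAME.items():
        print(f"{net}: <= {fps_upper_bound(gops, gop):.1f} fps")
```

With these illustrative values the ideal bound is about 102 GOP/s, or roughly 13 fps for ResNet-50 and 7 fps for ResNet-101, before any memory stalls. Under this model the bound scales with LANE_NUM × VEC_SIZE × clock frequency, so the other levers are the clock frequency the compiler achieves and how much of the DSP budget the kernel can actually occupy.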
>But the frame rate that is achieved is 10fps, which is very low as compared to other edge platforms.
What is your expectation, and which "edge" device are you comparing against? I would also argue that Arria 10 is too power-hungry for an edge device unless you are using the smallest variant.

If you are interested in batched inference, it is completely natural for FPGAs to be [much] slower than GPUs, since their peak compute and memory performance is far lower. For example, the biggest Arria 10 (GX 1150) can realistically provide ~900 GOP/s FP32, ~1,500 GOP/s FP16 (https://arxiv.org/abs/1701.03534), and maybe ~2,000 GOP/s INT8, provided you are not limited by the measly external memory bandwidth of the two DDR4-2133 banks used on most Arria 10 boards. These numbers pale in comparison to what Nvidia's Jetson Xavier series offers. For batch-1 inference, however, the story is completely different: the workload becomes latency-bound, which is where FPGAs would have a clear advantage over GPUs.
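The memory-bandwidth caveat can be made concrete with a roofline-style estimate: attainable throughput is the lower of peak compute and external bandwidth times arithmetic intensity (operations per byte moved off-chip). This is a sketch, not a measurement. It assumes a 64-bit data bus per DDR4 bank, reuses the rough ~2,000 GOP/s INT8 figure above, and the intensity values are hypothetical, since they depend heavily on how much data the design keeps on-chip.

```python
# Roofline-style sketch: attainable throughput is the lower of the compute
# roof and the memory roof (external bandwidth x arithmetic intensity).
# ASSUMPTION: each DDR4 bank has a 64-bit data bus; the intensity values in
# the example are hypothetical and depend heavily on on-chip buffering.

def ddr_bandwidth_gbs(mt_per_s: float, bus_bits: int = 64, banks: int = 1) -> float:
    """Theoretical peak DDR bandwidth in GB/s (1 GB = 1e9 bytes)."""
    return mt_per_s * 1e6 * (bus_bits / 8) * banks / 1e9


def attainable_gops(peak_gops: float, bandwidth_gbs: float, ops_per_byte: float) -> float:
    """Roofline bound: min(compute roof, bandwidth * arithmetic intensity)."""
    return min(peak_gops, bandwidth_gbs * ops_per_byte)


if __name__ == "__main__":
    bw = ddr_bandwidth_gbs(2133, banks=2)   # two DDR4-2133 banks
    peak_int8 = 2000.0                      # rough INT8 estimate quoted above
    ridge = peak_int8 / bw                  # intensity needed to become compute-bound
    print(f"bandwidth: {bw:.1f} GB/s, ridge point: {ridge:.0f} ops/byte")
    for intensity in (10, 50, 100):         # hypothetical ops per off-chip byte
        print(intensity, f"{attainable_gops(peak_int8, bw, intensity):.0f} GOP/s")
```

Two DDR4-2133 banks give about 34 GB/s in theory, so at ~2,000 GOP/s a kernel needs roughly 59 operations per off-chip byte before compute, rather than memory, becomes the limit. A design that repeatedly re-fetches weights or feature maps from DDR may sit well below that point.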
>Are there optimized libraries available which can be used instead of hand coding/reusing GIT based code for some of the implementation.
From Intel, there is probably just OpenVINO, which cannot be customized. It is unlikely you will find something that is both free/open-source and well optimized; well-optimized deep learning libraries for FPGAs are something people sell these days rather than provide for free.
>Are there any other options to try out?
Even though I have seen the author of that code posting on this forum a few times, you'd be better off asking him directly by creating an issue on his repository or sending him an email; only he can tell you how to use his code properly.
PipeCNN is not owned by Intel; it was developed by a university professor. From Intel, the only solution we have is the one included as part of the OpenVINO toolkit. If you would like a customized primitive (https://www.youtube.com/watch?v=UNoEflEM2IQ) to run on your custom board, I recommend contacting your local sales representative about this feature.
Our partners also provide solutions. You can get more information at https://www.intel.com/content/www/us/en/artificial-intelligence/programmable/partner-solutions.html