Profiling, Software/Hardware Partitioning and Accelerating

BBobb6 · 05-18-2020

Hi all,

I am working with an open-source application written mostly in C++, with some C. After using it for a while, I compiled its large code base and started profiling it to find the compute-intensive (CPU-bound) parts, which I would then like to offload to an Intel FPGA. I am using Intel VTune Profiler, which has already shown me some hotspots. Profiling is still a work in progress; for now I am focusing on function-level profiling, i.e. function calls. I have the following questions.

Specific questions:

- Given my use case, is there an Intel tool that performs automatic software/hardware partitioning of an application, with an Intel FPGA as the target architecture?

- I know Intel has the Intel HLS Compiler. Does it provide automatic software/hardware partitioning? Could I use the HLS Compiler for this use case?

General question:

Considering my use case, what would you recommend for accelerating the application on an Intel FPGA? I mean the right Intel tools, etc.

Thanks in advance.

BR

Bobby !

AnilErinch_A_Intel · 05-21-2020

Hi,

Intel HLS can be very suitable in this case. Since you already know the computationally intensive points in your code, you can inspect what makes them expensive and apply HLS best practices such as loop unrolling to address them in hardware. Complex financial algorithms and workloads are already running using HLS, and the examples that ship with the HLS installation give you a reference to start from.

The compilation reports will help you see the results of the optimizations performed.

https://www.intel.com/content/dam/www/programmable/us/en/pdfs/literature/hb/hls/archives/ug-hls-best-practices-17-1.pdf
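As a rough illustration of the loop unrolling mentioned above, here is a minimal sketch. It assumes a hypothetical hotspot (an 8-element dot product called dot8, not taken from your application) and is written as plain C++ so it builds with any compiler. With the Intel HLS Compiler the function would typically be marked as a component after including HLS/hls.h, and the #pragma unroll line is what requests the unrolling; other compilers may simply ignore that pragma with a warning. The testbench function is also only illustrative.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Fixed trip count: a loop can only be fully unrolled when its bound
// is known at compile time.
constexpr std::size_t kTaps = 8;

// Hypothetical hotspot: an 8-element dot product.
// Assumption: under Intel HLS this function would carry the `component`
// keyword and the file would include "HLS/hls.h"; both are left out here
// so the sketch compiles with an ordinary C++ compiler.
std::int32_t dot8(const std::int32_t (&a)[kTaps],
                  const std::int32_t (&b)[kTaps]) {
    std::int32_t acc = 0;
    // Ask the HLS compiler to replicate the loop body so that all
    // multiplications can be laid out in parallel in hardware.
#pragma unroll
    for (std::size_t i = 0; i < kTaps; ++i) {
        acc += a[i] * b[i];
    }
    return acc;
}

// Software testbench: check the function on the CPU before synthesis.
void testbench() {
    const std::int32_t a[kTaps] = {1, 2, 3, 4, 5, 6, 7, 8};
    const std::int32_t b[kTaps] = {1, 1, 1, 1, 1, 1, 1, 1};
    assert(dot8(a, b) == 36);  // 1 + 2 + ... + 8
}
```

Full unrolling trades FPGA area for throughput, so it is usually worth comparing the compilation report with and without the pragma before keeping it.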

To accelerate the application and check the performance boost, you can access a PAC (Programmable Acceleration Card) through the Intel FPGA DevCloud and try it out before planning for a custom board, as DevCloud has good resources to get you started.

https://github.com/intel/FPGA-Devcloud

Thanks and Regards

Anil