Application Acceleration With FPGAs
Programmable Acceleration Cards (PACs), DCP, FPGA AI Suite, Software Stack, and Reference Designs

Intel FPGA AI Suite Inference Engine

RubenPadial
New Contributor I

Is there any official documentation on the DLA runtime or inference engine for managing the DLA from the ARM side? I need to develop a custom application for running inference, but so far I've only found the dla_benchmark (main.cpp) and streaming_inference_app.cpp example files. There should be some documentation covering the SDK. The only related documentation I have found is the Intel FPGA AI Suite PCIe-based design example: https://www.intel.com/content/www/us/en/docs/programmable/768977/2024-3/fpga-runtime-plugin.html

From what I understand, the general inference workflow involves the following steps (a rough sketch in code follows the list):

  1. Identify the hardware architecture
  2. Deploy the model
  3. Prepare the input data
  4. Send inference requests to the DLA
  5. Retrieve the output data
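
In code, I imagine something roughly like the sketch below. This is only my guess, based on the OpenVINO 2.0 C++ API (ov::Core); the model path, the plugins XML path and the device string are placeholders, and I do not know how the -arch_file used by dla_benchmark is supposed to be passed to the plugin:

#include <openvino/openvino.hpp>

int main() {
    // Step 1: create the core with the plugins XML from the AI Suite runtime
    // (placeholder path) so the FPGA/DLA plugin is registered.
    ov::Core core("plugins.xml");

    // Step 2: deploy the model on the FPGA, falling back to CPU for
    // unsupported layers.
    auto model = core.read_model("model.xml");
    auto compiled = core.compile_model(model, "HETERO:FPGA,CPU");

    // Step 3: prepare the input data (assuming a single input).
    auto request = compiled.create_infer_request();
    float* input = request.get_input_tensor().data<float>();
    // ... fill `input` with preprocessed data ...

    // Step 4: send the inference request to the DLA.
    request.start_async();
    request.wait();

    // Step 5: retrieve the output data (assuming a single output).
    const float* output = request.get_output_tensor().data<float>();
    // ... post-process `output` ...
    return 0;
}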
JohnT_Intel
Employee

Hi Ruben,


Currently we do not have any documentation published. Let me check internally whether we have any documentation to share.


RubenPadial
New Contributor I

Hello @JohnT_Intel ,

I know both example applications are based on the OpenVINO runtime, but I cannot find anything about the FPGA and HETERO plugins for running inference in HETERO:FPGA,CPU mode. This is the documentation I found: https://docs.openvino.ai/archives/index.html

Any official documentation from the Intel side would be very helpful to make the Intel FPGA AI Suite really useful.

JohnT_Intel
Employee

Hi Ruben,


Currently the only documentation is from the OpenVINO tools. When you use HETERO:FPGA,CPU, OpenVINO tries to run the network on the FPGA whenever possible; any layer that cannot run on the FPGA is executed on the CPU side instead. OpenVINO communicates with the FPGA MMD driver automatically.
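
As a rough illustration only (standard OpenVINO 2.0 API; the plugins XML and model paths are placeholders), the device selection looks like this:

#include <openvino/openvino.hpp>

int main() {
    // The plugins XML shipped with the FPGA AI Suite runtime registers the
    // FPGA (DLA) plugin with the OpenVINO core.
    ov::Core core("plugins.xml");
    auto model = core.read_model("model.xml");

    // HETERO assigns each layer to the first device in the priority list that
    // supports it; anything the FPGA cannot run falls back to the CPU.
    auto compiled = core.compile_model(model, "HETERO:FPGA,CPU");
    // Equivalent form using the device-priorities property:
    // auto compiled2 = core.compile_model(model, "HETERO",
    //                                     ov::device::priorities("FPGA", "CPU"));
    return 0;
}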


Let me know if you have further queries on this or you need any help on this.


RubenPadial
New Contributor I

Hello @JohnT_Intel ,

But when I use the "GetAvailableDevices()" method, I only get CPU as an available device. There must be something I missed.
From my point of view, there are some points that need to be clarified from the Intel/Altera side in order to use the OpenVINO tools on FPGA devices with the FPGA AI Suite.

JohnT_Intel
Employee

Hi,


You may make use of the dla_benchmark app and modify it from there. The method should be to check the requested device string using "device_name.find("FPGA")".
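
Something along these lines (a rough sketch only; the device string and the plugins XML path are placeholders):

#include <openvino/openvino.hpp>
#include <iostream>
#include <string>

int main() {
    const std::string device_name = "HETERO:FPGA,CPU";  // value passed with -d

    // In your setup get_available_devices() only reports CPU; the FPGA plugin
    // is selected through the device string and the plugins XML instead.
    ov::Core core("plugins.xml");
    for (const auto& d : core.get_available_devices())
        std::cout << "available: " << d << std::endl;

    // Check whether the requested device targets the FPGA.
    if (device_name.find("FPGA") != std::string::npos)
        std::cout << "FPGA (DLA) requested" << std::endl;
    return 0;
}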


RubenPadial
New Contributor I

Hello @JohnT_Intel 

Taking dla_benchmark as an example, I get the following error:

[ ERROR ]

runtime/hps_packages/openvino/src/inference/src/ie_common.cpp:75
runtime/plugin/src/dlia_infer_request.cpp:53 Number of inference requests exceed the maximum number of inference requests supported per instance 5

I'm looping the inference request because I need to instantiate the DLA and continuously request inferences with new data. Each inference must be a single request, so I set nireq=1 and niter=1. Once an inference is finished, I request a new one with new input data.

Therefore, I loop over steps 9 to 11, obtaining the new input data before filling the blobs.

Is this approach correct? I understand a real application needs to instantiate the DLA once and keep feeding it new input data to compute the CNN output.
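
In simplified form (plain OpenVINO 2.0 API instead of the dla_benchmark helper classes; paths and the number of iterations are placeholders), the structure of my loop is:

#include <openvino/openvino.hpp>
#include <vector>

int main() {
    ov::Core core("plugins.xml");
    auto compiled = core.compile_model(core.read_model("model.xml"),
                                       "HETERO:FPGA,CPU");

    // Each pass repeats steps 9 to 11: obtain new data, fill the input,
    // request a single inference (nireq = niter = 1) and read the output.
    // As in my code, a new request object is created and kept on every pass;
    // this is what fails on the 6th iteration.
    std::vector<ov::InferRequest> requests;
    for (int i = 0; i < 10; ++i) {
        requests.push_back(compiled.create_infer_request());
        ov::InferRequest& request = requests.back();

        float* input = request.get_input_tensor().data<float>();
        // ... copy the new input sample into `input` ...
        request.start_async();
        request.wait();
        const float* output = request.get_output_tensor().data<float>();
        // ... consume `output` ...
    }
    return 0;
}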




JohnT_Intel
Employee

Hi,


Can you share your code or steps with me so that I can try to duplicate the issue on my side?


RubenPadial
New Contributor I
JohnT_Intel
Employee

Hi,


Can you also share the full log from running it multiple times until you observe the error?


RubenPadial
New Contributor I

Hello @JohnT_Intel,
Here it is: https://consigna.ugr.es/?s=download&token=0fcf80b0-8ff5-47da-8da7-5b9acebf1646

As you can see from the debug lines I included, the program fails on the 6th iteration at the line "inferRequestsQueues.push_back(std::move(std::unique_ptr<InferRequestsQueue>(new InferRequestsQueue(*exeNetwork, nireq))));".

JohnT_Intel
Employee

Hi Ruben,


Sorry, I forgot to check: which FPGA AI Suite version are you running? The latest FPGA AI Suite (2024.3) runtime application code is different from yours.


RubenPadial
New Contributor I

Hello @JohnT_Intel,

I'm currently using FPGA AI Suite 2023.2 and OpenVINO 2022.3.1. I know it is not the latest release of the FPGA AI Suite, but I cannot move the project to FPGA AI Suite 2024.3 at this moment.

JohnT_Intel
Employee

Hi Ruben,


I checked the log you provided, but it does not give the full information on how you run it. Are you running the same graph? Can you provide the steps you use to run the application?


RubenPadial
New Contributor I

Hello @JohnT_Intel ,

Same graph, with nireq and niter set to 1 for every inference.

This is how I run the application:
./ris_app \
-arch_file=$arch \
-cm=$model \
-plugins_xml_file=$plugins \
-nireq=1 \
-niter=1 \
-d=HETERO:FPGA,CPU

As far as I know, I only use the graph once to configure the DLA, and then I continually request inferences from that instance. At least, that was the objective.

JohnT_Intel
Employee

Hi Ruben,


Can I confirm that you are running the command below multiple times, and that on the 6th run you are facing the error?


./ris_app \
-arch_file=$arch \
-cm=$model \
-plugins_xml_file=$plugins \
-nireq=1 \
-niter=1 \
-d=HETERO:FPGA,CPU


RubenPadial
New Contributor I

Hello @JohnT_Intel ,

 

No, I run it once. In the application there is a loop over steps 9 to 11, and the program fails on the 6th iteration of that loop.

The steps prior to step 9 are intended to configure the DLA and create the DLA instance. The aim of looping over steps 9 to 11 is to continually request inferences from the already configured DLA.

JohnT_Intel
Employee

Hi Ruben,


If that is the case, then I suspect the FPGA AI Suite might not be able to run because it is still occupied with the previous inferencing. It cannot run further inference until the previous task has fully completed and it can move on to a new inference.


RubenPadial
New Contributor I

Hello @JohnT_Intel ,

Yes, that's what I supposed. How should it be handled?

The inferRequest->wait(), inferRequest->startAsync() and inferRequestsQueue->waitAll() statements are used, and the output is properly retrieved, so the inference is completed. I don't know what happens with the request or how to handle/wait/stop the request once inference is finished.

JohnT_Intel
Employee

Hi,


It seems like you are creating a new inference request for every new input, which is why it fails at the 6th one. Instead of creating a new inference request for every new input, you should keep using the same set of inference requests: wait for one to become available and supply the new input data to it.
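
A minimal sketch of what I mean, using the plain OpenVINO 2.0 API rather than the dla_benchmark helper classes (paths and the loop bound are placeholders):

#include <openvino/openvino.hpp>

int main() {
    ov::Core core("plugins.xml");
    auto compiled = core.compile_model(core.read_model("model.xml"),
                                       "HETERO:FPGA,CPU");

    // Create the inference request once, outside the loop, and reuse it.
    auto request = compiled.create_infer_request();

    for (int i = 0; i < 10; ++i) {
        // Refill the same input tensor with the new sample.
        float* input = request.get_input_tensor().data<float>();
        // ... copy the new input data into `input` ...

        request.start_async();
        request.wait();  // blocks until this inference has completed

        const float* output = request.get_output_tensor().data<float>();
        // ... consume `output` ...
    }
    return 0;
}

With nireq greater than 1 the same idea applies: create nireq requests up front and cycle through them, waiting for each one to finish before reusing it.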


RubenPadial
New Contributor I

Hello @JohnT_Intel,
Do you have an example or pseudocode?

 
