Application Acceleration With FPGAs
Programmable Acceleration Cards (PACs), DCP, FPGA AI Suite, Software Stack, and Reference Designs

Intel FPGA AI Suite Inference Engine

RubenPadial
New Contributor I

Is there any official documentation on the DLA runtime or inference engine for managing the DLA from the ARM side? I need to develop a custom application for running inference, but so far I've only found the dla_benchmark (main.cpp) and streaming_inference_app.cpp example files. There should be some documentation covering the SDK. The only related documentation I have found is the Intel FPGA AI Suite PCIe-based design example: https://www.intel.com/content/www/us/en/docs/programmable/768977/2024-3/fpga-runtime-plugin.html

From what I understand, the general inference workflow involves the following steps (a rough sketch in code follows the list):

  1. Identify the hardware architecture
  2. Deploy the model
  3. Prepare the input data
  4. Send inference requests to the DLA
  5. Retrieve the output data
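
In code, I imagine something roughly like the sketch below. This is only my guess, based on the OpenVINO 2.0 C++ API (ov::Core); the model path, the plugins XML path and the device string are placeholders, and I do not know how the -arch_file used by dla_benchmark is supposed to be passed to the plugin:

#include <openvino/openvino.hpp>

int main() {
    // Step 1: create the core with the plugins XML from the AI Suite runtime
    // (placeholder path) so the FPGA/DLA plugin is registered.
    ov::Core core("plugins.xml");

    // Step 2: deploy the model on the FPGA, falling back to CPU for
    // unsupported layers.
    auto model = core.read_model("model.xml");
    auto compiled = core.compile_model(model, "HETERO:FPGA,CPU");

    // Step 3: prepare the input data (assuming a single input).
    auto request = compiled.create_infer_request();
    float* input = request.get_input_tensor().data<float>();
    // ... fill `input` with preprocessed data ...

    // Step 4: send the inference request to the DLA.
    request.start_async();
    request.wait();

    // Step 5: retrieve the output data (assuming a single output).
    const float* output = request.get_output_tensor().data<float>();
    // ... post-process `output` ...
    return 0;
}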
JohnT_Intel
Employee

Hi Ruben,


Currently we do not have any documentation published. Let me check internally whether we have any documentation to share.


RubenPadial
New Contributor I

Hello @JohnT_Intel ,

I know both example applications are based on the OpenVINO runtime, but I cannot find anything about the FPGA and HETERO plugins for running inference in HETERO:FPGA,CPU mode. This is the documentation I found: https://docs.openvino.ai/archives/index.html

Any official documentation from the Intel side would be very helpful to make the Intel FPGA AI Suite really useful.

JohnT_Intel
Employee

Hi Ruben,


Currently the only documentation is from the OpenVINO tools. When you use HETERO:FPGA,CPU, OpenVINO tries to run the network on the FPGA whenever possible; any layer that cannot run on the FPGA is executed on the CPU side instead. OpenVINO communicates with the FPGA MMD driver automatically.
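
As a rough illustration only (standard OpenVINO 2.0 API; the plugins XML and model paths are placeholders), the device selection looks like this:

#include <openvino/openvino.hpp>

int main() {
    // The plugins XML shipped with the FPGA AI Suite runtime registers the
    // FPGA (DLA) plugin with the OpenVINO core.
    ov::Core core("plugins.xml");
    auto model = core.read_model("model.xml");

    // HETERO assigns each layer to the first device in the priority list that
    // supports it; anything the FPGA cannot run falls back to the CPU.
    auto compiled = core.compile_model(model, "HETERO:FPGA,CPU");
    // Equivalent form using the device-priorities property:
    // auto compiled2 = core.compile_model(model, "HETERO",
    //                                     ov::device::priorities("FPGA", "CPU"));
    return 0;
}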


Let me know if you have further queries on this or you need any help on this.


RubenPadial
New Contributor I

Hello @JohnT_Intel ,

But when I use the "GetAvailableDevices()" method, I only get CPU as an available device. There must be something I missed.
From my point of view, there are some points that need to be clarified from the Intel/Altera side in order to use the OpenVINO tools on FPGA devices with the FPGA AI Suite.

JohnT_Intel
Employee

Hi,


You may make use of the dla_benchmark app and modify it from there. The method should be to check the requested device string using "device_name.find("FPGA")".
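
Something along these lines (a rough sketch only; the device string and the plugins XML path are placeholders):

#include <openvino/openvino.hpp>
#include <iostream>
#include <string>

int main() {
    const std::string device_name = "HETERO:FPGA,CPU";  // value passed with -d

    // In your setup get_available_devices() only reports CPU; the FPGA plugin
    // is selected through the device string and the plugins XML instead.
    ov::Core core("plugins.xml");
    for (const auto& d : core.get_available_devices())
        std::cout << "available: " << d << std::endl;

    // Check whether the requested device targets the FPGA.
    if (device_name.find("FPGA") != std::string::npos)
        std::cout << "FPGA (DLA) requested" << std::endl;
    return 0;
}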


RubenPadial
New Contributor I

Hello @JohnT_Intel 

Taking dla_benchmark as an example, I get the following error:

[ ERROR ]

runtime/hps_packages/openvino/src/inference/src/ie_common.cpp:75
runtime/plugin/src/dlia_infer_request.cpp:53 Number of inference requests exceed the maximum number of inference requests supported per instance 5

I'm looping the inference request because I need to instantiate the DLA and continuously request inferences with new data. Each inference must be a single request, so I set nireq=1 and niter=1. Once an inference is finished, I request a new one with new input data.

Therefore, I loop over steps 9 to 11, obtaining the new input data before filling the blobs.

Is this approach correct? I understand a real application needs to instantiate the DLA once and keep feeding it new input data to compute the CNN output.
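
In simplified form (plain OpenVINO 2.0 API instead of the dla_benchmark helper classes; paths and the number of iterations are placeholders), the structure of my loop is:

#include <openvino/openvino.hpp>
#include <vector>

int main() {
    ov::Core core("plugins.xml");
    auto compiled = core.compile_model(core.read_model("model.xml"),
                                       "HETERO:FPGA,CPU");

    // Each pass repeats steps 9 to 11: obtain new data, fill the input,
    // request a single inference (nireq = niter = 1) and read the output.
    // As in my code, a new request object is created and kept on every pass;
    // this is what fails on the 6th iteration.
    std::vector<ov::InferRequest> requests;
    for (int i = 0; i < 10; ++i) {
        requests.push_back(compiled.create_infer_request());
        ov::InferRequest& request = requests.back();

        float* input = request.get_input_tensor().data<float>();
        // ... copy the new input sample into `input` ...
        request.start_async();
        request.wait();
        const float* output = request.get_output_tensor().data<float>();
        // ... consume `output` ...
    }
    return 0;
}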




JohnT_Intel
Employee

Hi,


Can you share your code or steps with me so that I can try to duplicate the issue on my side?


RubenPadial
New Contributor I
JohnT_Intel
Employee

Hi,


Can you also share the full log from running it multiple times until you observe the error?


RubenPadial
New Contributor I

Hello @JohnT_Intel,
Here it is: https://consigna.ugr.es/?s=download&token=0fcf80b0-8ff5-47da-8da7-5b9acebf1646

As you can see from the debug lines I included, the program fails on the 6th iteration at the line "inferRequestsQueues.push_back(std::move(std::unique_ptr<InferRequestsQueue>(new InferRequestsQueue(*exeNetwork, nireq))));".

JohnT_Intel
Employee

Hi Ruben,


Sorry, I forgot to check: which FPGA AI Suite version are you running? The latest FPGA AI Suite (2024.3) runtime application code is different from yours.


RubenPadial
New Contributor I

Hello @JohnT_Intel,

I'm currently using FPGA AI Suite 2023.2 and OpenVINO 2022.3.1. I know it is not the latest release of the FPGA AI Suite, but I cannot move the project to FPGA AI Suite 2024.3 at this moment.

JohnT_Intel
Employee

Hi Ruben,


I checked the log you provided, but it does not give the full information on how you run it. Are you running the same graph? Can you provide the steps you use to run the application?


RubenPadial
New Contributor I

Hello @JohnT_Intel ,

Same graph, with nireq and niter set to 1 for every inference.

This is how I run the application:
./ris_app \
-arch_file=$arch \
-cm=$model \
-plugins_xml_file=$plugins \
-nireq=1 \
-niter=1 \
-d=HETERO:FPGA,CPU

As far as I know, I only use the graph once to configure the DLA, and then I continually request inferences from that instance. At least, that was the objective.

JohnT_Intel
Employee

Hi Ruben,


Can I confirm that you are running the command below multiple times, and that on the 6th run you are facing the error?


./ris_app \
-arch_file=$arch \
-cm=$model \
-plugins_xml_file=$plugins \
-nireq=1 \
-niter=1 \
-d=HETERO:FPGA,CPU


RubenPadial
New Contributor I

Hello @JohnT_Intel ,

 

No, I run it once. In the application there is a loop over steps 9 to 11, and the program fails on the 6th iteration of that loop.

The steps prior to step 9 are intended to configure the DLA and create the DLA instance. The aim of looping over steps 9 to 11 is to continually request inferences from the already configured DLA.

JohnT_Intel
Employee

Hi Ruben,


If that is the case, then I suspect the FPGA AI Suite might not be able to run because it is still occupied with the previous inferencing. It cannot run further inference until the previous task has fully completed and it can move on to a new inference.


RubenPadial
New Contributor I

Hello @JohnT_Intel ,

Yes, that's what I supposed. How should it be handled?

The inferRequest->wait(), inferRequest->startAsync() and inferRequestsQueue->waitAll() statements are used, and the output is properly retrieved, so the inference is completed. I don't know what happens with the request or how to handle/wait/stop the request once inference is finished.

JohnT_Intel
Employee

Hi,


It seems like you are creating a new inference request for every new input, which is why it fails at the 6th one. Instead of creating a new inference request for every new input, you should keep using the same set of inference requests: wait for one to become available and supply the new input data to it.
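
A minimal sketch of what I mean, using the plain OpenVINO 2.0 API rather than the dla_benchmark helper classes (paths and the loop bound are placeholders):

#include <openvino/openvino.hpp>

int main() {
    ov::Core core("plugins.xml");
    auto compiled = core.compile_model(core.read_model("model.xml"),
                                       "HETERO:FPGA,CPU");

    // Create the inference request once, outside the loop, and reuse it.
    auto request = compiled.create_infer_request();

    for (int i = 0; i < 10; ++i) {
        // Refill the same input tensor with the new sample.
        float* input = request.get_input_tensor().data<float>();
        // ... copy the new input data into `input` ...

        request.start_async();
        request.wait();  // blocks until this inference has completed

        const float* output = request.get_output_tensor().data<float>();
        // ... consume `output` ...
    }
    return 0;
}

With nireq greater than 1 the same idea applies: create nireq requests up front and cycle through them, waiting for each one to finish before reusing it.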


RubenPadial
New Contributor I

Hello @JohnT_Intel,
Do you have an example or pseudocode?

 
