Intel FPGA AI suite accuracy drop

RubenPadial · ‎12-10-2023

Hello,

I'm using Intel FPGA AI Suite 2023.2 on an Ubuntu 20.04 host computer and trying to run inference with a custom CNN on an Intel Arria 10 SoC FPGA.

The CNN was trained with TensorFlow and the accuracy is 98.89% across the test dataset.

After converting the model to an IR model with the OpenVINO Model Optimizer, the accuracy remains the same.

mo
--saved_model_dir "{path_savedModelPath}"
--input_shape "{lst_inputShape}"
--model_name "{str_modelName}"
--output_dir "{path_irTargetPath}"
--use_new_frontend

However, after running the model on the Intel FPGA AI Suite IP, the accuracy drops to 74.64% across the same test dataset. The architecture used is A10_FP16_Generic.arch, which has "arch_precision"=FP16. I have also tested with A10_FP16_Performance.arch and A10_Performance.arch.

dla_compiler
--march "{path_archPath}"
--network-file "{path_xmlPath}"
--o "{path_binPath}"
--foutput-format=open_vino_hetero
--fplugin "HETERO:FPGA,CPU"
--fanalyze-performance
--fdump-performance-report
--fanalyze-area
--fdump-area-report

I tried to optimize the model with the "compress_to_fp16" OpenVINO Model Optimizer option, but when compiling with dla_compiler I get this error:
"Layer (Name: Transpose_517_compressed, Type: Constant) is not supported:
Error occurred.
../compiler/aot_plugin/src/dla_executable_network.cpp:134 Graph is not supported on FPGA plugin due to existance of layer (Name: Transpose_517_compressed, Type: Constant)
in topology. Most likely you need to use heterogeneous plugin instead of FPGA plugin directly."
As you can see, the HETERO plugin option is set to FPGA and CPU. I also tested with Intel FPGA AI Suite 2023.3 and OpenVINO 2022.3.1 and got the same error message.

The accuracy in software with this IR model compressed to FP16 is 98.91%, so the accuracy on the FPGA should be almost the same, but there is a drop of about 24 percentage points.
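The expectation that FP16 rounding alone should cost very little accuracy can be sanity-checked with a small sketch. It uses only the Python standard library to round values to IEEE 754 half precision and compares a single dense-layer output against a double-precision reference. The weights, inputs and helper names are hypothetical, and the code does not model the internal arithmetic of the FPGA IP; it only shows the size of the error that plain FP16 rounding introduces for well-scaled values.

```python
import struct

def to_fp16(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    # The "e" struct format is binary16; packing then unpacking applies the rounding.
    return struct.unpack("<e", struct.pack("<e", x))[0]

def dense_reference(weights, inputs, bias):
    """Dot product plus bias in Python's native double precision."""
    return sum(w * x for w, x in zip(weights, inputs)) + bias

def dense_fp16(weights, inputs, bias):
    """Same computation with every operand and partial sum rounded to FP16."""
    acc = to_fp16(bias)
    for w, x in zip(weights, inputs):
        product = to_fp16(to_fp16(w) * to_fp16(x))
        acc = to_fp16(acc + product)
    return acc

# Hypothetical values chosen only for illustration.
weights = [0.1, 0.2, 0.3]
inputs = [1.0, 2.0, 3.0]
bias = 0.5

reference = dense_reference(weights, inputs, bias)
half = dense_fp16(weights, inputs, bias)
print(f"reference={reference:.6f} fp16={half:.6f} abs_error={abs(reference - half):.6f}")
```

If errors of this size were the only difference between the CPU and FPGA runs, a change of tens of percentage points in accuracy would be surprising, which points toward a structural cause such as data layout or preprocessing rather than precision alone.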

Find attached both IR model files.

What could be the root cause of this accuracy drop?
What solution can I implement to improve the accuracy?

JohnT_Intel · ‎12-12-2023

Hi,

When you mention you are getting 98.89% across the test dataset, is that running on CPU/GPU or on the FPGA?

Are you currently running with the HETERO plugin on the S2M AI Suite design?

RubenPadial · ‎12-12-2023

Hello @JohnT_Intel,

I achieve an accuracy of 98.89% when running the model on CPU/GPU (in TensorFlow and with the OpenVINO IR model). However, when running the same model on the FPGA with Intel FPGA AI Suite, I get 74.64%. Interestingly, when employing similar tools from a different software vendor, I maintain the high accuracy of 98.89% on the FPGA, but not with this Intel tool.

The HETERO plugin is enabled during graph compilation, as shown by the dla_compiler command in my first post. The HETERO plugin is also included in the plugins.xml file used by the dla_benchmark application. If any other option needs to be enabled when compiling the Intel FPGA AI Suite IP core, it should already be enabled, because I use the create_hps_image.sh script from the Intel FPGA AI Suite SoC Design Example User Guide, and the HETERO plugin is used to run the example model.

JohnT_Intel · ‎12-12-2023

Hi,

Do you include the custom Intel FPGA AI Suite library? https://www.intel.com/content/www/us/en/docs/programmable/768972/2023-3/inputs-dla-compiler-command-options.html

May I know which design you use on the FPGA that is able to obtain the same accuracy?

RubenPadial · ‎12-12-2023

Hello @JohnT_Intel,

I think the option you suggest is --plugin-file, which is set by default to $COREDLA_ROOT/bin/plugins.xml. That is the same file I'm referring to.

The FPGA design that achieves 98.89% accuracy follows the same approach as Intel FPGA AI Suite. It is currently under study, so I cannot make more details public. It is a research project, but I can give you more details by email if you are really interested.

RubenPadial · ‎12-13-2023

Please find the complete TF model, IR model and compiled graph at the following link: https://consigna.ugr.es/download.php?files_ids=63731

JohnT_Intel · ‎12-14-2023

Hi,

I get an error when downloading it. It says the token is missing.

RubenPadial · ‎12-14-2023

https://consigna.ugr.es/?s=download&token=779d3377-e785-43aa-ab82-6e14c50072fe

JohnT_Intel · ‎12-19-2023

Hi,

I have checked the files you provided, and it looks like there are multiple layers that are not supported by the FPGA. May I know if you are able to use a suitable architecture that will fit your DLA?

Layer (Name: StatefulPartitionedCall/model/dense_1/BiasAdd, Type: Eltwise) is not supported: FPGA plugin: all parent nodes are not supported by FPGA.

Layer (Name: Constant_2560, Type: Constant) is not supported:

Layer (Name: StatefulPartitionedCall/model/dense_1/Tensordot, Type: Reshape) is not supported:

FPGA plugin: layer is not executed on FPGA because it does not have preceding layer on FPGA.

Layer (Name: StatefulPartitionedCall/model/dense_1/Tensordot/MatMul, Type: FullyConnected) is not supported:

FPGA plugin: this Fully Connected layer does not have preceding convolutional layer / sequence.

Layer (Name: StatefulPartitionedCall/model/re_lu_7/Relu, Type: Relu) is not supported:

FPGA plugin: layer is not executed on FPGA because it does not have preceding layer on FPGA.

Layer (Name: StatefulPartitionedCall/model/dense/BiasAdd, Type: Eltwise) is not supported:

FPGA plugin: 'Constant' nodes are not supported as input of nodes Eltwise

Layer (Name: Constant_2559, Type: Constant) is not supported:
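A log like the one above is easier to review once it is grouped by layer type. The following is a minimal sketch that parses messages of this shape with a regular expression; the function name and the sample text are illustrative, and the pattern is inferred only from the lines quoted in this thread, so other message formats from the tool may not match.

```python
import re
from collections import Counter

# Pattern inferred from the messages quoted in this thread (an assumption, not a spec).
LAYER_RE = re.compile(
    r"Layer \(Name: (?P<name>[^,]+), Type: (?P<type>[^)]+)\) is not supported:\s*(?P<reason>.*)"
)

def parse_unsupported_layers(log_text):
    """Return one dict per reported layer with its name, type and reason (may be empty)."""
    results = []
    current = None
    for raw_line in log_text.splitlines():
        line = raw_line.strip()
        if not line:
            continue
        match = LAYER_RE.match(line)
        if match:
            current = {
                "name": match.group("name"),
                "type": match.group("type"),
                "reason": match.group("reason").strip(),
            }
            results.append(current)
        elif current is not None and not current["reason"] and line.startswith("FPGA plugin:"):
            # The reason is sometimes printed on the line after the layer header.
            current["reason"] = line
    return results

SAMPLE_LOG = """
Layer (Name: StatefulPartitionedCall/model/dense_1/BiasAdd, Type: Eltwise) is not supported: FPGA plugin: all parent nodes are not supported by FPGA.
Layer (Name: Constant_2560, Type: Constant) is not supported:
Layer (Name: StatefulPartitionedCall/model/re_lu_7/Relu, Type: Relu) is not supported:
FPGA plugin: layer is not executed on FPGA because it does not have preceding layer on FPGA.
"""

layers = parse_unsupported_layers(SAMPLE_LOG)
type_counts = Counter(layer["type"] for layer in layers)
for layer in layers:
    print(layer["type"], layer["name"], "|", layer["reason"] or "(no reason given)")
```

Grouping by type makes it quicker to see whether the rejected layers are concentrated around the dense layers, as they appear to be here.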

RubenPadial · ‎12-19-2023

Hello @JohnT_Intel ,

According to section 2.3 of the Intel FPGA AI Suite IP Reference Manual, some of those layers, such as the ReLU layer, should be supported. Could you confirm? In addition, the model is compiled for the HETERO plugin (--fplugin option), so if a layer cannot be implemented on the FPGA, it should be implemented on the CPU. What solution do you suggest?

Please note that some of those layers are not in the initial TensorFlow model; they are added by the OpenVINO Model Optimizer (mo) application.
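One way to see which layers the conversion step introduced is to list the layer types in the IR .xml file and compare them with the original model. The sketch below assumes the usual IR layout, in which each layer is a `<layer>` element carrying `name` and `type` attributes; the inline XML is a made-up toy fragment, not taken from the attached model.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Toy IR fragment with hypothetical layer names, for illustration only.
TOY_IR = """<net name="toy" version="11">
  <layers>
    <layer id="0" name="input" type="Parameter" version="opset1"/>
    <layer id="1" name="weights" type="Const" version="opset1"/>
    <layer id="2" name="order" type="Const" version="opset1"/>
    <layer id="3" name="transpose" type="Transpose" version="opset1"/>
    <layer id="4" name="matmul" type="MatMul" version="opset1"/>
    <layer id="5" name="output" type="Result" version="opset1"/>
  </layers>
</net>"""

def layer_types(ir_xml_text):
    """Count how many layers of each type appear in an IR XML document."""
    root = ET.fromstring(ir_xml_text)
    return Counter(layer.get("type") for layer in root.iter("layer"))

counts = layer_types(TOY_IR)
for layer_type, count in sorted(counts.items()):
    print(f"{layer_type}: {count}")
```

For a real model, the same function can be applied to the text read from the .xml file produced by mo.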

JohnT_Intel · ‎12-22-2023

Hi,

Chapter 2.3 describes the features that are supported in general; it is not specific to the arch that you are using.

For Arria 10, you will need to refer to the "dla\arch\descriptions\A10" folder for the list of architectures created by Intel. The arch you use will affect both performance and accuracy.

JohnT_Intel · ‎12-22-2023

You may also refer to A10_Performance.arch to see which features are implemented in your bitstream.

JohnT_Intel · ‎01-04-2024

Hi,

Can we consolidate all the AI Suite discussions into a single forum discussion?

JohnT_Intel · ‎01-15-2024

Hi,

May I know if you are able to try a different .arch for the FPGA bitstream?

RubenPadial · ‎01-15-2024

Hello @JohnT_Intel

I prefer to manage the issues in different threads as they seem to have different root causes.

It has already been tested with the four example architectures provided in the Intel FPGA AI suite, and the same accuracy drop was observed.

When running the graph on the ARM CPU, the accuracy is acceptable. It appears to be a potential issue with dla_compiler and the dimensions in the dense layer.
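A comparison like this between the ARM CPU and FPGA runs can be made more specific by checking which samples the two targets disagree on. The sketch below is a minimal illustration with made-up label and prediction lists and hypothetical helper names; it computes accuracy per target, the indices where the predictions differ, and how the FPGA errors are distributed over the true classes.

```python
from collections import Counter

def accuracy(predictions, labels):
    """Fraction of predictions that equal the ground-truth label."""
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)

def disagreements(preds_a, preds_b):
    """Indices of samples where the two prediction lists differ."""
    return [i for i, (a, b) in enumerate(zip(preds_a, preds_b)) if a != b]

def errors_by_true_class(predictions, labels):
    """Count misclassified samples per ground-truth class."""
    return Counter(y for p, y in zip(predictions, labels) if p != y)

# Made-up data for illustration only.
labels     = [0, 1, 2, 1, 0, 2, 1, 0]
cpu_preds  = [0, 1, 2, 1, 0, 2, 1, 0]
fpga_preds = [0, 1, 1, 1, 0, 1, 1, 0]

print("CPU accuracy: ", accuracy(cpu_preds, labels))
print("FPGA accuracy:", accuracy(fpga_preds, labels))
print("Samples that differ:", disagreements(cpu_preds, fpga_preds))
print("FPGA errors by true class:", dict(errors_by_true_class(fpga_preds, labels)))
```

Errors concentrated in a few classes, as in this toy data, would hint at a systematic cause such as a layout or dimension mismatch, whereas errors spread evenly across classes would be more consistent with numerical noise.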

JohnT_Intel · ‎01-24-2024

Hi,

I suspect that the provided architectures might not fully fit your needs. If you need better performance, customizing a new architecture will serve your requirements better, unless the DLA you plan to run is the same as what has been tested by Intel.

JohnT_Intel · ‎02-15-2024

Hi,

May I know if it is possible for you to implement a custom AI bitstream, or are you planning to just run on the provided bitstream?

RubenPadial · ‎02-15-2024

Hello @JohnT_Intel,

At this moment, the plan is only to use the provided example architectures and build the bitstreams from them.

JohnT_Intel · ‎02-23-2024

Hi,

Since you are planning to use the provided bitstream, it will be hard for us to improve the performance of the AI workload. Please let me know if you have any other queries.

JohnT_Intel · ‎03-05-2024

We have not received any response from you to the previous question/reply/answer that I provided. This thread will be transitioned to community support. If you have a new question, feel free to open a new thread to get support from Intel experts. Otherwise, the community users will continue to help you on this thread. Thank you.

AndrewRooney · ‎05-24-2024

One thing to check is your channel order. By default dla_benchmark will feed input data as RGB, but the dla_benchmark option `-bgr` will reverse the channels. This could be the culprit for a performance drop of this scale.
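To make the channel-order point concrete, here is a minimal sketch of what reversing the channel axis does to an image. Nested Python lists stand in for a real image array, and the function name is hypothetical. Whether a reversal is needed depends on the channel order the model was trained with, which this thread does not state.

```python
def reverse_channels(image):
    """Reverse the last axis of an H x W x C nested list (RGB <-> BGR)."""
    return [[pixel[::-1] for pixel in row] for row in image]

# A 1 x 2 toy image: one pure-red pixel and one arbitrary pixel, stored as RGB.
rgb_image = [[[255, 0, 0], [10, 20, 30]]]
bgr_image = reverse_channels(rgb_image)

print(bgr_image)  # [[[0, 0, 255], [30, 20, 10]]]
```

A network trained on one channel order and fed the other sees, for example, red where it expects blue, which can lower accuracy substantially without producing any error message.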
