Intel® Distribution of OpenVINO™ Toolkit
Community assistance for the Intel® Distribution of OpenVINO™ toolkit, OpenCV, and all aspects of computer vision on Intel® platforms.

FPGA - SSD300 - RuntimeError: The plugin does not support precision I64 for layer ...

Buvana_R
Beginner

Hello,

I am using OpenVINO toolkit version 2020.3 and I am trying to run inference on (a) a Resnet-50 model and (b) SSD models on an Intel Arria 10 FPGA.

I am using the bitstreams:

a) 2020-3-1_PL2_FP16_InceptionV1_ResNet_YoloV3.aocx for the Resnet model.

b) 2020-3-1_PL2_FP16_AlexNet_GoogleNet_SSD300_Generic.aocx for the SSD model(s).

Using the model downloader tool, I downloaded the following:
a) Resnet-50

b) SSD300

I converted these models to FP16 and FP32 representations using the Model Optimizer tool (for Resnet-50, I removed the final softmax layer, as it is not supported on the FPGA).
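
For reference, cutting the softmax only required pointing the Model Optimizer's --output flag at the layer just before it. A rough sketch (the placeholders stand for the actual file and layer names, which depend on the model):

python3 mo.py --input_model <model-file> --data_type FP16 \
    --output <layer-before-softmax> --output_dir <ir-output-dir>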

I ran the benchmark tool on these models with the device set to FPGA, and it succeeded on Resnet-50 (both FP32 and FP16):

python3 benchmark_app.py -m /opt/converted_models/resnet-50-caffe2-No-Softmax/public/resnet-50-caffe2/FP32/resnet-50-caffe2.xml -d FPGA
[Step 1/11] Parsing and validating input arguments
[ WARNING ] -nstreams default value is determined automatically for a device. Although the automatic selection usually provides a reasonable performance, but it still may be non-optimal for some cases, for more information look at README. 
[Step 2/11] Loading Inference Engine
[ INFO ] InferenceEngine:
API version............. 2.1.2020.3.1-3500-68236d2e44c-releases/2020/3
[ INFO ] Device info
FPGA
dliaPlugin.............. version 2.1
Build................... 2020.3.1-3500-68236d2e44c-releases/2020/3

[Step 3/11] Reading the Intermediate Representation network
[ INFO ] Read network took 94.98 ms
[Step 4/11] Resizing network to match image sizes and given batch
[ INFO ] Network batch size: 1
[Step 5/11] Configuring input of the model
[Step 6/11] Setting device configuration
[Step 7/11] Loading the model to the device
[ INFO ] Load network took 13698.93 ms
[Step 8/11] Setting optimal runtime parameters
[Step 9/11] Creating infer requests and filling input blobs with images
[ INFO ] Network input 'gpu_0/data' precision U8, dimensions (NCHW): 1 3 224 224
[ WARNING ] No input files were given: all inputs will be filled with random values!
[ INFO ] Infer Request 0 filling
[ INFO ] Fill input 'gpu_0/data' with random values (image is expected)
[ INFO ] Infer Request 1 filling
[ INFO ] Fill input 'gpu_0/data' with random values (image is expected)
[ INFO ] Infer Request 2 filling
[ INFO ] Fill input 'gpu_0/data' with random values (image is expected)
[ INFO ] Infer Request 3 filling
[ INFO ] Fill input 'gpu_0/data' with random values (image is expected)
[ INFO ] Infer Request 4 filling
[ INFO ] Fill input 'gpu_0/data' with random values (image is expected)
[Step 10/11] Measuring performance (Start inference asyncronously, 5 inference requests, limits: 120000 ms duration)
[Step 11/11] Dumping statistics report
Count: 17530 iterations
Duration: 120052.41 ms
Latency: 34.19 ms
Throughput: 146.02 FPS

However, when I ran the benchmark tool on the SSD300 model, I got an error that the I64 precision of a certain layer is not supported on the FPGA (the same model ran successfully on the CPU):

python3 benchmark_app.py -m /opt/converted_models/ssd300_caffe2_to_IR/public/ssd300/FP16/ssd300.xml -d FPGA
[Step 1/11] Parsing and validating input arguments
[ WARNING ]  -nstreams default value is determined automatically for a device. Although the automatic selection usually provides a reasonable performance, but it still may be non-optimal for some cases, for more information look at README. 
[Step 2/11] Loading Inference Engine
[ INFO ] InferenceEngine:
         API version............. 2.1.2020.3.1-3500-68236d2e44c-releases/2020/3
[ INFO ] Device info
         FPGA
         dliaPlugin.............. version 2.1
         Build................... 2020.3.1-3500-68236d2e44c-releases/2020/3

[Step 3/11] Reading the Intermediate Representation network
[ INFO ] Read network took 54.55 ms
[Step 4/11] Resizing network to match image sizes and given batch
[ INFO ] Network batch size: 1
[Step 5/11] Configuring input of the model
[Step 6/11] Setting device configuration
[Step 7/11] Loading the model to the device
[ ERROR ] The plugin does not support precision I64 for layer mbox_conf_flatten/Cast_17952_const
Supported formats are: FP32 and FP16.
Traceback (most recent call last):
  File "/opt/intel/openvino_2020.3.341/python/python3.6/openvino/tools/benchmark/main.py", line 87, in run
    exe_network = benchmark.load_network(ie_network, perf_counts)
  File "/opt/intel/openvino_2020.3.341/python/python3.6/openvino/tools/benchmark/benchmark.py", line 138, in load_network
    num_requests=1 if self.api_type == 'sync' else self.nireq or 0)
  File "ie_api.pyx", line 178, in openvino.inference_engine.ie_api.IECore.load_network
  File "ie_api.pyx", line 187, in openvino.inference_engine.ie_api.IECore.load_network
RuntimeError: The plugin does not support precision I64 for layer mbox_conf_flatten/Cast_17952_const
Supported formats are: FP32 and FP16.

Looking closely at the SSD300 XML file, I see that there are several I64 constants, and they are used further along in 'Reshape' and 'Transform' operations.

Interestingly, the Resnet-50 model also contains an I64 layer, 'OC2_DUMMY_0/Cast_114292_const' (of type 'Const'), which is used by a subsequent 'Reshape' operation, and that model works fine on the FPGA. My guess is that the actual error is different from the one the tool throws.
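
To make this comparison easy to reproduce, the I64 constants and the layers that consume them can be listed straight from the IR XML. Here is a minimal sketch, assuming the IR v10 layout in which each output port carries a precision attribute:

import sys
import xml.etree.ElementTree as ET

# Usage: python3 list_i64.py ssd300.xml
root = ET.parse(sys.argv[1]).getroot()
layers = {l.get('id'): l for l in root.find('layers')}

# Collect layers that produce an I64 output port
i64_ids = set()
for lid, layer in layers.items():
    out = layer.find('output')
    for port in (out if out is not None else []):
        if port.get('precision') == 'I64':
            i64_ids.add(lid)
            print('I64 output:', layer.get('name'), '[' + layer.get('type') + ']')

# Follow the edges to see which layers consume those I64 outputs
for edge in root.find('edges'):
    if edge.get('from-layer') in i64_ids:
        dst = layers[edge.get('to-layer')]
        print('  feeds ->', dst.get('name'), '[' + dst.get('type') + ']')

Running this against both IRs shows where the I64 constants end up in each model.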

I also tried the ssdlite_mobilenet_v2 model and got the same error message as for the SSD300 model.

My questions:
1) How is it that inference succeeded in the presence of an I64 constant in the Resnet-50 model, whereas it failed for the SSD300 model?

2) How do I get the SSD300 model to run successfully on the FPGA? Our team has trained several other object detection models whose IR files contain I64 constants with subsequent Reshape or Transform operations based on those constants. Is there any way to run them on the FPGA?

3) Is there a size limit on models for FPGA inference? If so, what is the guideline for determining it?

I have attached the XML files for the FP32 precision of the Resnet-50, SSD300 and ssd_mobilenet_v2 models. Let me know if you need the binary files, which, as you know, are very bulky.

Thanks,

Buvana

Munesh_Intel
Moderator

Hi Buvana,

Thank you for contacting us. We are investigating this issue and will get back to you as soon as possible.


Regards,

Munesh


Munesh_Intel
Moderator

Hi Buvana,

Please provide the Model Optimizer command used for conversion to IR.

Additionally, please clarify whether the issue with the SSD300 model occurs with data type FP32 or only with FP16.


Regards,

Munesh


Buvana_R
Beginner

Hello,

I used converter.py under the /opt/intel/openvino/deployment_tools/tools/model_downloader directory:

python3 converter.py -d . -o /opt/converted_models/ssd300_caffe2_to_IR --name ssd300

 

The error occurs for both FP16 and FP32.

Thanks,

Buvana

Buvana_R
Beginner

converter.py produces the following Model Optimizer invocation for FP16 (and similarly for FP32):

/usr/bin/python3 -- /opt/intel/openvino_2020.3.341/deployment_tools/model_optimizer/mo.py \
    --framework=caffe --data_type=FP16 \
    --output_dir=/opt/converted_models/ssd300_caffe2_to_IR/public/ssd300/FP16 \
    --model_name=ssd300 '--input_shape=[1,3,300,300]' --input=data \
    '--mean_values=data[104.0,117.0,123.0]' --output=detection_out \
    --input_model=public/ssd300/models/VGGNet/VOC0712Plus/SSD_300x300_ft/VGG_VOC0712Plus_SSD_300x300_ft_iter_160000.caffemodel \
    --input_proto=public/ssd300/models/VGGNet/VOC0712Plus/SSD_300x300_ft/deploy.prototxt

Munesh_Intel
Moderator

Hi Buvana,

Apologies for the delay and thank you for waiting.

The following are the answers to your questions:

(1) The FPGA plugin does not support I64 for the mbox_conf_flatten/Cast_17952_const layer; other layers with I64 data may or may not be supported, depending on the layer.


(2) We recommend using the FPGA device with the HETERO plugin in combination with another inference device, so that layers unsupported by the FPGA fall back to that device, as shown in the following example:

https://docs.openvinotoolkit.org/2020.3/_docs_install_guides_VisionAcceleratorFPGA_Configure.html#4_run_the_image_classification_sample_application
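
For instance, with the benchmark app the heterogeneous configuration is selected through the device string, where FPGA is tried first and CPU serves as the fallback for unsupported layers:

python3 benchmark_app.py -m /opt/converted_models/ssd300_caffe2_to_IR/public/ssd300/FP16/ssd300.xml -d HETERO:FPGA,CPU

The equivalent sketch in the Inference Engine Python API (file names taken from the paths above):

from openvino.inference_engine import IECore

ie = IECore()
net = ie.read_network(model='ssd300.xml', weights='ssd300.bin')
# Layers unsupported on the FPGA fall back to the CPU plugin
exec_net = ie.load_network(network=net, device_name='HETERO:FPGA,CPU')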


It is worth mentioning here that Intel will be transitioning to the next-generation programmable deep-learning solution based on FPGAs in order to increase the level of customization possible in FPGA deep-learning. As part of this transition, future standard releases (i.e., non-LTS releases) of Intel® Distribution of OpenVINO™ toolkit will no longer include the Intel® Vision Accelerator Design with an Intel® Arria® 10 FPGA and the Intel® Programmable Acceleration Card with Intel® Arria® 10 GX FPGA.


The product change notice is available here:

https://docs.openvinotoolkit.org/latest/openvino_docs_IE_DG_supported_plugins_FPGA.html#product_change_notice


(3) We don't have any particular size limit, but we suggest using the techniques provided in the following links to optimize your model.

https://docs.openvinotoolkit.org/2021.2/openvino_docs_optimization_guide_dldt_optimization_guide.html#fpga


https://docs.openvinotoolkit.org/2021.2/openvino_docs_optimization_guide_dldt_optimization_guide.html#heterogeneous-scenarios-fpga



Regards,

Munesh


Munesh_Intel
Moderator

Hi Buvana,

This thread will no longer be monitored since we have provided references and recommendations. If you need any additional information from Intel, please submit a new question.


Regards,

Munesh

