I saved a TensorFlow DeepLab segmentation model in the SavedModel format and converted it to IRv11 using the openvino-dev Model Optimizer utility. When I ran inference on the IR model, the throughput was 5 FPS. I suspected something was off, since I used to get roughly double that FPS with TensorFlow on CPU.
So I created and saved a dummy TF CNN model that has just one layer with 128 3x3 filters and converted it to IRv11 as well. When I ran inference on this dummy model, the throughput was still only 8 FPS. I have listed my installation details below. Please help me figure out whether this is due to TensorFlow: one of my teammates achieves 25 FPS on a PyTorch object detection model about as big as the DeepLab, on the same machine. That is what made me think I did something wrong during the TensorFlow model conversion.
OS: Ubuntu 18.04.5 LTS x86_64
CPU: Intel Xeon Silver 4208 (32) @ 3.200GHz
GPU: NVIDIA Corporation Device 2206
Memory: 22813MiB / 31850MiB
Tensorflow version: 2.3.1
Installed openvino toolkit using the command: pip install openvino-dev[tensorflow2]
Dummy model code:
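The code itself did not come through in the post; a minimal sketch of what the dummy model described above could look like, assuming Keras under TF 2.x (the input shape is taken from the Model Optimizer warning further down; the layer names and padding are assumptions):

```python
import tensorflow as tf

# Single-layer dummy CNN: one Conv2D with 128 3x3 filters.
# Input shape matches the IR warning: [-1, 320, 544, 3] (dynamic batch).
inputs = tf.keras.Input(shape=(320, 544, 3), name="input_1")
outputs = tf.keras.layers.Conv2D(128, (3, 3), padding="same")(inputs)
model = tf.keras.Model(inputs, outputs)

# Export in the TF2 SavedModel format, as fed to the Model Optimizer.
model.save("saved_model")
```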
Model Optimizer arguments:
- Path to the Input Model: None
- Path for generated IR: /mnt/workspace/Anirudh/warehouse_mezzanine/training/models/.
- IR output name: saved_model
- Log level: ERROR
- Batch: Not specified, inherited from the model
- Input layers: Not specified, inherited from the model
- Output layers: Not specified, inherited from the model
- Input shapes: Not specified, inherited from the model
- Source layout: Not specified
- Target layout: Not specified
- Layout: Not specified
- Mean values: Not specified
- Scale values: Not specified
- Scale factor: Not specified
- Precision of IR: FP32
- Enable fusing: True
- User transformations: Not specified
- Reverse input channels: False
- Enable IR generation for fixed input shape: False
- Use the transformations config file: None
- Force the usage of legacy Frontend of Model Optimizer for model conversion into IR: False
- Force the usage of new Frontend of Model Optimizer for model conversion into IR: False
TensorFlow specific parameters:
- Input model in text protobuf format: False
- Path to model dump for TensorBoard: None
- List of shared libraries with TensorFlow custom layers implementation: None
- Update the configuration file with input/output node names: None
- Use configuration file used to generate the model with Object Detection API: None
- Use the config file: None
OpenVINO runtime found in: /mnt/workspace/Anirudh/warehouse_mezzanine/training/ov/lib/python3.6/site-packages/openvino
OpenVINO runtime version: 2022.2.0-7713-af16ea1d79a-releases/2022/2
Model Optimizer version: 2022.2.0-7713-af16ea1d79a-releases/2022/2
[ WARNING ] The model contains input(s) with partially defined shapes: name="input_1" shape="[-1, 320, 544, 3]". Starting from the 2022.1 release the Model Optimizer can generate an IR with partially defined input shapes ("-1" dimension in the TensorFlow model or dimension with string value in the ONNX model). Some of the OpenVINO plugins require model input shapes to be static, so you should call "reshape" method in the Inference Engine and specify static input shapes. For optimal performance, it is still recommended to update input shapes with fixed ones using "--input" or "--input_shape" command-line parameters.
[ SUCCESS ] Generated IR version 11 model.
[ SUCCESS ] XML file: /mnt/workspace/Anirudh/warehouse_mezzanine/training/models/saved_model.xml
[ SUCCESS ] BIN file: /mnt/workspace/Anirudh/warehouse_mezzanine/training/models/saved_model.bin
[ SUCCESS ] Total execution time: 11.30 seconds.
[ SUCCESS ] Memory consumed: 480 MB.
[ INFO ] The model was converted to IR v11, the latest model format that corresponds to the source DL framework input/output format. While IR v11 is backwards compatible with OpenVINO Inference Engine API v1.0, please use API v2.0 (as of 2022.1) to take advantage of the latest improvements in IR v11.
Find more information about API v2.0 and IR v11 at https://docs.openvino.ai
100%|████████████████████████████████████████████████████████████████████████████████████████████| 200/200 [00:37<00:00, 5.34it/s]
Throughput of IR Model 5.75 FPS
I assume you are using Google Colab to run the TensorFlow code?
If that is the case, Google Colab generally provides 2 vCPUs (Intel Xeon CPU @ 2.20 GHz, 13 GB RAM).
Meanwhile, in your specification you mentioned that you are using an Intel Xeon Silver 4208 (32) @ 3.200GHz.
Different types and numbers of inferencing hardware do result in different performance (e.g. FPS).
Another thing that may impact performance is the IR model precision.
This documentation might help you understand better.
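On the precision point, a hedged example of how the conversion could be re-run with a static shape and FP16 weights (flag names per the 2022.2 Model Optimizer; the SavedModel directory name and output directory are assumptions, the input shape is taken from the log above):

```shell
# Re-run Model Optimizer with a static input shape and FP16 weights.
# Note: the CPU plugin executes FP16 IRs internally in FP32, so FP16
# mainly reduces file size / helps GPU; the static shape is the more
# relevant change for CPU throughput.
mo --saved_model_dir saved_model \
   --input_shape "[1,320,544,3]" \
   --data_type FP16 \
   --output_dir ir_fp16
```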
Intel will no longer monitor this thread since we have provided a solution. If you need any additional information from Intel, please submit a new question.