Intel® Distribution of OpenVINO™ Toolkit

CNN model converted from TF2 to Openvino IR model gives very low throughput

Cosmichunt
Beginner

I converted a single-layer CNN model with ~3k parameters and a single-layer fully connected model with ~35k parameters from TensorFlow SavedModel to OpenVINO IR format using the commands:

mo --saved_model_dir dummy_cnn_model/  

mo --saved_model_dir dummy_dnn_model/  

The CNN model, when tested with the OpenVINO benchmarking tool, has an average latency of about 65 ms:

 

[ INFO ] Benchmarking in inference only mode (inputs filling are not included in measurement loop).
[ INFO ] First inference took 81.50 ms
[Step 11/11] Dumping statistics report
Count: 3700 iterations
Duration: 60152.08 ms
Latency:
Median: 58.95 ms
AVG: 64.50 ms
MIN: 30.26 ms
MAX: 275.16 ms

The fully connected model, when tested with the same OpenVINO benchmarking tool, has a median latency of 0.06 ms:

 

[ INFO ] Benchmarking in inference only mode (inputs filling are not included in measurement loop).
[ INFO ] First inference took 1.13 ms
[Step 11/11] Dumping statistics report
Count: 2178004 iterations
Duration: 60000.10 ms
Latency:
Median: 0.06 ms
AVG: 0.07 ms
MIN: 0.03 ms
MAX: 19.31 ms

I understand that convolution is quite different from dense matrix multiplication, but this comparison shows a difference in latency of roughly three orders of magnitude: the single-layer dense model runs about 1000x faster than the single-layer CNN model.
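To put the two reports side by side, the arithmetic can be sketched as follows. The figures are copied from the benchmark output above. The helper name `throughput_fps` is illustrative, and dividing the iteration count by the duration is a plain estimate, not necessarily how the benchmarking tool computes its own throughput figure.

```python
# Figures copied from the two benchmark reports in this post.
cnn = {"count": 3700, "duration_ms": 60152.08, "avg_ms": 64.50, "median_ms": 58.95}
dense = {"count": 2178004, "duration_ms": 60000.10, "avg_ms": 0.07, "median_ms": 0.06}


def throughput_fps(report):
    """Completed iterations per second over the whole measurement window."""
    return report["count"] / (report["duration_ms"] / 1000.0)


# How many times slower the CNN is, per request.
avg_latency_ratio = cnn["avg_ms"] / dense["avg_ms"]
median_latency_ratio = cnn["median_ms"] / dense["median_ms"]

# How many times more iterations the dense model completes per second.
throughput_ratio = throughput_fps(dense) / throughput_fps(cnn)

print(f"CNN throughput:       {throughput_fps(cnn):.1f} FPS")
print(f"Dense throughput:     {throughput_fps(dense):.1f} FPS")
print(f"AVG latency ratio:    {avg_latency_ratio:.0f}x")
print(f"Median latency ratio: {median_latency_ratio:.0f}x")
print(f"Throughput ratio:     {throughput_ratio:.0f}x")
```

On these numbers the per-request latency gap is a little under 1000x, while the gap in completed iterations per second is closer to 600x.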

 

The main reason I started this benchmark was to find out why my TensorFlow DeepLab CNN model, converted to OpenVINO IR, runs at only 5 FPS when a fully connected model runs significantly faster.

Could anyone please explain whether this is due to TensorFlow? A YOLOv5-medium model in PyTorch that is similar in size to my TensorFlow DeepLab model gives a throughput of 25 FPS on the same machine. The specs of my local server are listed below; I am not running this test in Colab.

System specs: 

OS: Ubuntu 18.04.5 LTS x86_64

CPU: Intel Xeon Silver 4208 (32) @ 3.200GHz

GPU: NVIDIA NVIDIA Corporation Device 2206

Memory: 22813MiB / 31850MiB

 

Tensorflow version: 2.3.1

Openvino-dev version:

Name: openvino-dev
Version: 2022.2.0

Installed openvino toolkit using the command: pip install openvino-dev[tensorflow2]

 

@IntelSupport

Peh_Intel
Moderator

Hi Cosmichunt,

 

Thanks for reaching out to us.

 

From your description, I assume you are comparing a TensorFlow DeepLab model (a segmentation model) with a YOLOv5 PyTorch model (an object detection model), where the TensorFlow DeepLab model runs at only 5 FPS whereas the YOLOv5 PyTorch model runs at 25 FPS.

 

If this is the case, the obvious difference in throughput is not due to TensorFlow versus PyTorch but to the different model architectures and algorithms. In object detection, the model localizes and classifies objects in the image using bounding box coordinates. In image segmentation, the model has to detect the exact boundaries of each object, which makes the job heavier and hence slower.

 

 

Regards,

Peh


Cosmichunt
Beginner

@Peh_Intel But what about the case where a single-layer CNN model built with TensorFlow and converted to OpenVINO runs 10^3 times slower than a single fully connected layer?

 

Regards

Anirudh 

Peh_Intel
Moderator

Hi Anirudh,

 

Here is a Stack Overflow discussion on why a convolution layer can be slower than a fully connected layer:

Torch: why convolution layer is even slower than full connect linear layer when in same data size
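One way to see the point of that discussion is to count multiply-accumulate operations (MACs) rather than parameters. The sketch below is illustrative only: the thread does not state the layer shapes, so the 512x512x3 input, the 3x3 kernel with 112 filters, and the 350-to-100 dense layer are assumptions chosen to land near the ~3k and ~35k parameter counts mentioned in the question. It also assumes stride 1 and "same" padding, and the function names are hypothetical.

```python
def conv2d_cost(height, width, in_ch, out_ch, kernel):
    """Parameters and MACs for a stride-1, 'same'-padded 2D convolution."""
    weights = kernel * kernel * in_ch * out_ch
    params = weights + out_ch          # one bias per filter
    macs = height * width * weights    # every weight is reused at each output pixel
    return params, macs


def dense_cost(in_features, out_features):
    """Parameters and MACs for a fully connected layer on one input vector."""
    weights = in_features * out_features
    params = weights + out_features    # one bias per output unit
    macs = weights                     # each weight is used exactly once
    return params, macs


# Assumed shapes, not taken from the thread.
conv_params, conv_macs = conv2d_cost(512, 512, in_ch=3, out_ch=112, kernel=3)
dense_params, dense_macs = dense_cost(350, 100)

print(f"conv:  {conv_params} params, {conv_macs} MACs")
print(f"dense: {dense_params} params, {dense_macs} MACs")
print(f"MAC ratio (conv / dense): {conv_macs / dense_macs:.0f}x")
```

Under these assumed shapes the convolution has roughly a tenth of the parameters but several orders of magnitude more MACs, so parameter count alone is a poor predictor of latency. This is a general observation and may not by itself account for the numbers reported in this thread.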

 

On another note, if possible, please share both your models (single layer CNN model and single layer Fully Connected model) so that we can validate on our side as well.

 

 

Regards,

Peh


Cosmichunt
Beginner

I finally found a change that improved the performance. TensorFlow offers both tf.saved_model.save() and the Keras model.save(). When the model is saved with tf.saved_model.save(), inference with the single-layer CNN model goes up to 5000 FPS. When I saved the same single-layer CNN model with model.save(), the FPS fell to 7.

 

Thanks for all your help and pointers!

Cheers,

Anirudh 

Peh_Intel
Moderator

Hi Anirudh,

 

Thanks for letting us know and sharing the information here.

 

This thread will no longer be monitored since this issue has been resolved. If you need any additional information from Intel, please submit a new question.

 

 

Regards,

Peh

