Poor performance (fps) on ssd_resnet_50_fpn_coco

Chen__Cheng · 04-08-2019

Hi

I am new to Intel OpenVINO, and so far it is a really beautiful solution for inference on the CPU. Some of the optimized models converted from the TensorFlow Object Detection model zoo run amazingly fast on the CPU, but others run dramatically slower than the fast ones.

My system configuration: Ubuntu 16.04, i9-7900X, 80 GB RAM, and computer_vision_sdk_2018.5.455.

Following the online documentation, I was able to convert several models to IR and test inference. Besides mAP, I am interested in how many fps each model reaches on the CPU. With ssd_inception_v2_coco I get around 80 fps, and with ssd_mobilenet_v1_coco around 100 fps. The CPU extension I am using is from "/inference_engine_samples_build/intel64/Release/lib/libcpu_extension.so".
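As a rough illustration of how an fps figure like this can be measured, here is a minimal Python sketch that is independent of any toolkit. It assumes a hypothetical `infer` callable that runs one synchronous inference on a pre-loaded frame; it excludes a few warm-up calls and then reports frames per second. Note that this measures single-request, synchronous throughput, so it will not match benchmark_app, which runs several asynchronous requests in parallel.

```python
import time
from typing import Callable


def measure_fps(infer: Callable[[], object], iterations: int = 100, warmup: int = 5) -> float:
    """Return synchronous throughput in frames per second.

    `infer` is a hypothetical zero-argument callable that runs one inference
    on a single pre-loaded frame (batch size 1).
    """
    if iterations <= 0:
        raise ValueError("iterations must be positive")
    # Warm-up calls are not timed: the first runs often include one-time setup costs.
    for _ in range(warmup):
        infer()
    start = time.perf_counter()
    for _ in range(iterations):
        infer()
    elapsed = time.perf_counter() - start
    return iterations / elapsed


if __name__ == "__main__":
    # Stand-in "model" that takes at least 10 ms per frame, i.e. at most ~100 fps.
    fake_infer = lambda: time.sleep(0.01)
    print(f"{measure_fps(fake_infer, iterations=50):.1f} fps")
```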

However, ssd_resnet_50_fpn_coco runs at only around 8 fps. That is almost a 10x difference, and I am wondering why. By contrast, the speeds listed for these models in the TensorFlow model zoo (measured on GPU, of course) scale roughly linearly with model size. I also tried Faster R-CNN with ResNet-50, and it runs at only 5 fps on the CPU. Any hints would be very helpful. Could someone explain why?
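One factor worth checking before comparing fps figures is input resolution, since the cost of the convolutional layers grows roughly with the number of input pixels. The 640x640 input of the ResNet-50 FPN model is confirmed by the benchmark_app log later in this thread; the 300x300 input for ssd_mobilenet_v1_coco and ssd_inception_v2_coco is an assumption based on their default pipeline.config files. The sketch below only computes the pixel ratio; it does not account for the heavier ResNet-50 backbone or the FPN head.

```python
from typing import Tuple


def pixel_ratio(a: Tuple[int, int], b: Tuple[int, int]) -> float:
    """How many times more input pixels resolution `a` has than resolution `b`."""
    return (a[0] * a[1]) / (b[0] * b[1])


# Input sizes (height, width). 640x640 is confirmed by the benchmark_app log;
# 300x300 is an ASSUMED default for the smaller SSD models.
RESNET50_FPN = (640, 640)
SSD_MOBILENET_V1 = (300, 300)

ratio = pixel_ratio(RESNET50_FPN, SSD_MOBILENET_V1)
print(f"{ratio:.2f}x more pixels per frame")  # prints "4.55x more pixels per frame"
```

Under these assumptions, resolution alone would plausibly explain a slowdown of roughly 4.5x; the rest of the observed gap would have to come from the larger backbone and FPN layers or from a runtime issue.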

PS: On your Object Detection Models page (https://docs.openvinotoolkit.org/2018_R5/usergroup1.html), every model links to a performance table, but the table is not accessible to users like me. In the TensorFlow model zoo, by contrast, I can quickly get an overview of each model's performance, which I assume is quite useful to developers.

Shubha_R_Intel · 04-08-2019

Dear Chen, Cheng:

Right, that performance table is not publicly available. First I have to ask: are you using 2019 R1? In R1, OpenVINO uses TBB (Threading Building Blocks) by default, which greatly improves the performance of multi-network pipelined models. Please read this blog post for more information:

https://software.intel.com/en-us/blogs/2019/04/02/improved-parallelization-extended-deep-learning-capabilities-in-intel-distribution

Are you running your experiments on 2019 R1?

Thanks,

Shubha

Chen__Cheng · 04-08-2019

Hi Shubha

Thanks for your quick reply. I think I am using a 2018.x version, so I will upgrade and report back.

Regards

Cheng

Chen__Cheng · 04-09-2019

Dear Shubha

I have updated OpenVINO to 2019 R1, and I still observe the same behaviour: ssd_resnet_50_fpn_coco is still quite slow, at only around 5.7 fps. To avoid measurement bias, I used benchmark_app to measure performance.

Meanwhile, after updating OpenVINO, I get another error from the Model Optimizer for TensorFlow. Note that the model I am trying to convert was further trained from a TensorFlow model zoo checkpoint. I did not get this error with the previous version of OpenVINO.

Here is the output:

Model Optimizer arguments:
Common parameters:
   - Path to the Input Model:    /home/cheng/TensorFlow/models/tested_models/model_ssd_mobilenet_v2/frozen_inference_graph.pb
   - Path for generated IR:    /home/cheng/openvino_models/.
   - IR output name:    frozen_inference_graph
   - Log level:    ERROR
   - Batch:    Not specified, inherited from the model
   - Input layers:    Not specified, inherited from the model
   - Output layers:    Not specified, inherited from the model
   - Input shapes:    Not specified, inherited from the model
   - Mean values:    Not specified
   - Scale values:    Not specified
   - Scale factor:    Not specified
   - Precision of IR:    FP32
   - Enable fusing:    True
   - Enable grouped convolutions fusing:    True
   - Move mean values to preprocess section:    False
   - Reverse input channels:    True
TensorFlow specific parameters:
   - Input model in text protobuf format:    False
   - Path to model dump for TensorBoard:    None
   - List of shared libraries with TensorFlow custom layers implementation:    None
   - Update the configuration file with input/output node names:    None
   - Use configuration file used to generate the model with Object Detection API:    /home/cheng/TensorFlow/TransferLearning/models/tested_models/model_ssd_mobilenet_v2_2239670/pipeline.config
   - Operations to offload:    None
   - Patterns to offload:    None
   - Use the config file:    /opt/intel/openvino/deployment_tools/model_optimizer/extensions/front/tf/ssd_v2_support.json
Model Optimizer version:    2019.1.0-341-gc9b66a2
The Preprocessor block has been removed. Only nodes performing mean value subtraction and scaling (if applicable) are kept.
[ ERROR ] -------------------------------------------------
[ ERROR ] ----------------- INTERNAL ERROR ----------------
[ ERROR ] Unexpected exception happened.
[ ERROR ] Please contact Model Optimizer developers and forward the following information:
[ ERROR ] Exception occurred during running replacer "REPLACEMENT_ID (<class 'extensions.middle.DeleteNotExecutable.DeleteNotExecutable'>)":
[ ERROR ] Traceback (most recent call last):
  File "/opt/intel/openvino_2019.1.094/deployment_tools/model_optimizer/mo/utils/class_registration.py", line 174, in apply_replacements
    graph_clean_up)
  File "/opt/intel/openvino_2019.1.094/deployment_tools/model_optimizer/mo/middle/pattern_match.py", line 58, in for_graph_and_each_sub_graph_recursively
    func(graph)
  File "/opt/intel/openvino_2019.1.094/deployment_tools/model_optimizer/mo/middle/passes/eliminate.py", line 186, in graph_clean_up_tf
    graph_clean_up(graph, ['TFCustomSubgraphCall', 'Shape'])
  File "/opt/intel/openvino_2019.1.094/deployment_tools/model_optimizer/mo/middle/passes/eliminate.py", line 181, in graph_clean_up
    add_constant_operations(graph)
  File "/opt/intel/openvino_2019.1.094/deployment_tools/model_optimizer/mo/middle/passes/eliminate.py", line 145, in add_constant_operations
    Const(graph, dict(value=node.value, shape=np.array(node.value.shape))).create_node_with_data(data_nodes=node)
  File "/opt/intel/openvino_2019.1.094/deployment_tools/model_optimizer/mo/ops/op.py", line 207, in create_node_with_data
    [np.array_equal(old_data_value[id], data_node.value) for id, data_node in enumerate(data_nodes)])
AssertionError

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/opt/intel/openvino_2019.1.094/deployment_tools/model_optimizer/mo/main.py", line 312, in main
    return driver(argv)
  File "/opt/intel/openvino_2019.1.094/deployment_tools/model_optimizer/mo/main.py", line 263, in driver
    is_binary=not argv.input_model_is_text)
  File "/opt/intel/openvino_2019.1.094/deployment_tools/model_optimizer/mo/pipeline/tf.py", line 128, in tf2nx
    class_registration.apply_replacements(graph, class_registration.ClassType.MIDDLE_REPLACER)
  File "/opt/intel/openvino_2019.1.094/deployment_tools/model_optimizer/mo/utils/class_registration.py", line 190, in apply_replacements
    )) from err
Exception: Exception occurred during running replacer "REPLACEMENT_ID (<class 'extensions.middle.DeleteNotExecutable.DeleteNotExecutable'>)":

[ ERROR ] ---------------- END OF BUG REPORT --------------
[ ERROR ] -------------------------------------------------

Shubha_R_Intel · 04-09-2019

Dearest Cheng:

I'm so sorry that you're seeing these performance problems. I will definitely investigate the performance issue you are seeing with ssd_resnet_50_fpn_coco, as well as the MO error.

Please check back here and I will update you.

Thanks,

Shubha

Shubha_R_Intel · 04-09-2019

Dear Cheng, I wanted to get back to you quickly. I had no problem creating IR on 2019 R1 using the below command:

python .\mo_tf.py --input_model C:\Users\MickeyMouse\Downloads\ssd_resnet50_v1_fpn_shared_box_predictor_640x640_coco14_sync_2018_07_03.tar\ssd_resnet50_v1_fpn_shared_box_predictor_640x640_coco14_sync_2018_07_03\frozen_inference_graph.pb --tensorflow_use_custom_operations_config 'C:\Program Files (x86)\IntelSWTools\openvino\deployment_tools\model_optimizer\extensions\front\tf\ssd_v2_support.json' --tensorflow_object_detection_api_pipeline_config C:\users\MickeyMouse\Downloads\ssd_resnet50_v1_fpn_shared_box_predictor_640x640_coco14_sync_2018_07_03.tar\ssd_resnet50_v1_fpn_shared_box_predictor_640x640_coco14_sync_2018_07_03\pipeline.config
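Long Windows command lines like this one are easy to break when copying (the forum wrapped this one mid-flag). A minimal sketch of one way around that: build the mo_tf.py argument list in Python and pass it to subprocess, so no shell quoting is needed. The three flags are the ones used in the command above; the two base directories are hypothetical placeholders, and actually running it requires a Model Optimizer installation.

```python
import subprocess
import sys
from pathlib import PureWindowsPath
from typing import List

# Hypothetical placeholder locations: adjust to your own install and download folders.
MO_DIR = PureWindowsPath(r"C:\Program Files (x86)\IntelSWTools\openvino\deployment_tools\model_optimizer")
MODEL_DIR = PureWindowsPath(r"C:\Users\MickeyMouse\Downloads\ssd_resnet50_v1_fpn_shared_box_predictor_640x640_coco14_sync_2018_07_03")


def build_mo_command(mo_dir: PureWindowsPath, model_dir: PureWindowsPath) -> List[str]:
    """Assemble the mo_tf.py invocation as an argument list (no shell quoting needed)."""
    return [
        sys.executable,
        str(mo_dir / "mo_tf.py"),
        "--input_model", str(model_dir / "frozen_inference_graph.pb"),
        "--tensorflow_use_custom_operations_config",
        str(mo_dir / "extensions" / "front" / "tf" / "ssd_v2_support.json"),
        "--tensorflow_object_detection_api_pipeline_config",
        str(model_dir / "pipeline.config"),
    ]


if __name__ == "__main__":
    cmd = build_mo_command(MO_DIR, MODEL_DIR)
    print(cmd)
    if "--run" in sys.argv:  # only launch the Model Optimizer when explicitly requested
        subprocess.run(cmd, check=True)
```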

I got the model from here (search for 'ssd' on the page to find the download link):

https://docs.openvinotoolkit.org/latest/_docs_MO_DG_prepare_model_convert_model_Convert_Model_From_TensorFlow.html

And I followed the instructions here to convert it to IR:

https://docs.openvinotoolkit.org/latest/_docs_MO_DG_prepare_model_convert_model_tf_specific_Convert_Object_Detection_API_Models.html

Once you have converted this model successfully on 2019 R1, please measure its performance with benchmark_app and report your findings here.

And thanks for using OpenVINO!

Shubha

Chen__Cheng · 04-09-2019

Hi, Shubha

Thanks a lot for your help and fast reply.

I was able to convert the model fetched directly from the TensorFlow model zoo. It works on my machine, and benchmark_app reports around 6 fps. Given that I have an i9-7900X CPU, I assume this is too slow for an SSD-like model.
How many fps did you get with benchmark_app?
I have a custom model trained from a TensorFlow model zoo checkpoint. After training, I had no problem converting it to IR with OpenVINO 2018.5.45, but the conversion fails after upgrading OpenVINO to 2019.1.094.

The model I am having trouble converting is not identical to, but very similar to, the ssd_mobilenet_v2 model. I do not think it has any custom layers; the only difference is probably that it has fewer detection classes in the output.
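Since the custom model is described as very similar to ssd_mobilenet_v2, one quick way to see exactly where it differs is to diff its pipeline.config against the one shipped with the model zoo checkpoint. A minimal sketch using Python's standard difflib; the file paths are hypothetical, and the inline excerpts are illustrative, not real configs.

```python
import difflib
from pathlib import Path
from typing import List


def diff_configs(reference: str, custom: str) -> List[str]:
    """Return unified-diff lines between two pipeline.config texts."""
    return list(
        difflib.unified_diff(
            reference.splitlines(),
            custom.splitlines(),
            fromfile="zoo/pipeline.config",
            tofile="custom/pipeline.config",
            lineterm="",
        )
    )


def diff_config_files(reference_path: str, custom_path: str) -> List[str]:
    # Hypothetical paths: point these at the zoo and the custom pipeline.config files.
    return diff_configs(Path(reference_path).read_text(), Path(custom_path).read_text())


if __name__ == "__main__":
    # Tiny illustrative excerpts only.
    zoo = "model {\n  ssd {\n    num_classes: 90\n  }\n}"
    mine = "model {\n  ssd {\n    num_classes: 3\n  }\n}"
    for line in diff_configs(zoo, mine):
        print(line)
```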

Regards

Cheng

Shubha_R_Intel · 04-10-2019

Dear Chen, Cheng:

You wrote: "The model I am having trouble converting is not identical to, but very similar to, the ssd_mobilenet_v2 model. I do not think it has any custom layers; the only difference is probably that it has fewer detection classes in the output."

I haven't tried benchmark_app yet, but I will today and post back here.

Did you study the ssd_v2_support.json file and make sure it matches your model? You can either dump the frozen model to a *.pbtxt file or view it in TensorBoard, and then check that ssd_v2_support.json matches your model. Remember that everything referenced inside the *.json file is expected to exist in the model.
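Once you have the model's node names (from a text dump or TensorBoard), a small script can list which names referenced by the JSON config are missing from the graph. This is a sketch under an assumption about the file layout: that sub-graph replacements with "match_kind": "points" list boundary node names under "start_points" and "end_points". Scope-based (regular-expression) entries are not checked, and the sample config is trimmed and illustrative, not the real ssd_v2_support.json.

```python
import json
from typing import Iterable, List, Set


def referenced_node_names(config: object) -> Set[str]:
    """Collect node names listed under start_points/end_points anywhere in the config.

    ASSUMPTION: "points"-style replacements list boundary node names in these
    two keys; scope-based entries are not checked.
    """
    names: Set[str] = set()
    if isinstance(config, dict):
        for key, value in config.items():
            if key in ("start_points", "end_points") and isinstance(value, list):
                names.update(v for v in value if isinstance(v, str))
            else:
                names |= referenced_node_names(value)
    elif isinstance(config, list):
        for item in config:
            names |= referenced_node_names(item)
    return names


def missing_nodes(config_text: str, graph_node_names: Iterable[str]) -> List[str]:
    """Names referenced by the JSON config that do not exist in the model graph."""
    present = set(graph_node_names)
    return sorted(referenced_node_names(json.loads(config_text)) - present)


if __name__ == "__main__":
    # Illustrative, trimmed-down config.
    sample = json.dumps([{
        "id": "ExampleReplacement",
        "match_kind": "points",
        "instances": {"start_points": ["Postprocessor/Shape"],
                      "end_points": ["detection_boxes", "num_detections"]},
    }])
    graph_nodes = ["Postprocessor/Shape", "detection_boxes"]
    print(missing_nodes(sample, graph_nodes))  # prints ['num_detections']
```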

The following is a great blog article on adding TensorBoard directives to your model code:

https://thecodacus.com/tensorboard-tutorial-visualize-networks-graphically/#.XKIqg_lKg2w

Or you can dump your model to a text file using code similar to this:

import tensorflow as tf  # TensorFlow 1.x API; under TF 2.x use tf.compat.v1.gfile / tf.compat.v1.GraphDef

def load_graph(frozen_graph_filename):
    # Read the serialized GraphDef from the frozen .pb file
    with tf.gfile.GFile(frozen_graph_filename, "rb") as f:
        graph_def = tf.GraphDef()
        graph_def.ParseFromString(f.read())

    # Import it into a fresh graph; name="" keeps the original node names
    # (no "prefix/" added), so they can be compared with ssd_v2_support.json
    with tf.Graph().as_default() as graph:
        tf.import_graph_def(graph_def, name="")
    return graph

if __name__ == '__main__':
    mygraph = load_graph("C:\\Users\\mickeymouse\\frozen_model_test.pb")
    # Write the graph in text protobuf format to the current directory
    tf.train.write_graph(mygraph, ".", "frozen_model_test.txt", as_text=True)
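With as_text=True, the top-level node names can then be pulled out of the text dump without loading TensorFlow again, which is handy for comparing them against ssd_v2_support.json. This is a rough sketch, not a protobuf parser: it assumes the standard text layout in which each top-level `node {` block has its `name:` field indented by exactly two spaces, so deeper-indented fields are ignored.

```python
import re
from pathlib import Path
from typing import List

# Matches lines such as:  name: "Postprocessor/Shape"  (two-space indent = node level)
NODE_NAME_RE = re.compile(r'^  name: "((?:[^"\\]|\\.)*)"$', re.MULTILINE)


def node_names_from_pbtxt(text: str) -> List[str]:
    """Return top-level node names from a GraphDef written in text format."""
    return NODE_NAME_RE.findall(text)


def node_names_from_file(path: str) -> List[str]:
    # Hypothetical path, e.g. the "frozen_model_test.txt" written by the script above.
    return node_names_from_pbtxt(Path(path).read_text())


if __name__ == "__main__":
    sample = (
        'node {\n'
        '  name: "image_tensor"\n'
        '  op: "Placeholder"\n'
        '}\n'
        'node {\n'
        '  name: "Postprocessor/Shape"\n'
        '  op: "Shape"\n'
        '  input: "image_tensor"\n'
        '}\n'
    )
    print(node_names_from_pbtxt(sample))  # prints ['image_tensor', 'Postprocessor/Shape']
```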

Shubha_R_Intel · 04-10-2019

Dear Cheng,

Unfortunately I got an error when I ran this model through the benchmark_app. See below:

[ ERROR ] Unsupported primitive of type: Resample name: Resample_

Moreover, the object_detection_sample_ssd didn't work on the generated IR either (it crashed).

I'm sorry about this, Cheng. I will file a bug on this issue and keep you posted.

Thanks for using OpenVINO!

Shubha

Chen__Cheng · 04-11-2019

Dear Shubha

Thank you for your quick feedback. I hope to hear from you about updates soon.

Regards

Cheng

Shubha_R_Intel · 04-11-2019

Dear Cheng, I have to admit you are right: the performance looks bad. I found a workaround for the benchmark_app error above; see below.

I ran 1000 iterations and the throughput is less than 1 FPS! I fixed the error by adding -l cpu_extension.dll to the command line. I will file a bug on your behalf straightaway! Thanks for your patience, Cheng,

Shubha

C:\Users\sdramani\Documents\Intel\OpenVINO\inference_engine_samples_2017\intel64\Release>benchmark_app.exe -d CPU -i c:\users\sdramani\Downloads\ssd_resnet50_v1_fpn_shared_box_predictor_640x640_coco14_sync_2018_07_03\pic\horse.jpg -m c:\users\sdramani\Downloads\ssd_resnet50_v1_fpn_shared_box_predictor_640x640_coco14_sync_2018_07_03\frozen_inference_graph.xml -l cpu_extension.dll -niter 1000
[ INFO ] InferenceEngine:
API version ............ 1.6
Build .................. 22239
[ INFO ] Parsing input parameters
[ INFO ] Files were added: 1
[ INFO ] c:\users\sdramani\Downloads\ssd_resnet50_v1_fpn_shared_box_predictor_640x640_coco14_sync_2018_07_03\pic\horse.jpg
[ INFO ] Loading plugin
[ INFO ] CPU (MKLDNN) extensions is loaded cpu_extension.dll
[ INFO ]
API version ............ 1.6
Build .................. 22239
Description ....... MKLDNNPlugin

[ INFO ] Loading network files
[ INFO ] Network batch size: 1, precision: FP32
[ INFO ] Preparing output blobs
[ INFO ] Loading model to the plugin
[ INFO ] Input dimensions (NCHW): 1 3 640 640
[ WARNING ] Image is resized from (500, 333) to (640, 640)
[ INFO ] Input dimensions (NCHW): 1 3 640 640
[ WARNING ] Image is resized from (500, 333) to (640, 640)
[ INFO ] Start inference asynchronously (1000 async inference executions, 2 inference requests in parallel)

[ INFO ] Throughput: 0.77626 FPS

C:\Users\sdramani\Documents\Intel\OpenVINO\inference_engine_samples_2017\intel64\Release>
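The workaround was to load the CPU extensions library with -l; presumably the Resample layer is implemented there rather than in the core CPU plugin, which would explain the earlier "Unsupported primitive of type: Resample" error. A minimal sketch of assembling the same benchmark_app call in Python, using only the flags from the command line above (-d, -i, -m, -l, -niter). The paths are hypothetical placeholders; on Linux the extension is a .so file (the first post uses libcpu_extension.so) rather than cpu_extension.dll.

```python
import subprocess
import sys
from typing import List


def build_benchmark_command(benchmark_app: str, model_xml: str, image: str,
                            cpu_extension: str, niter: int = 1000,
                            device: str = "CPU") -> List[str]:
    """Assemble a benchmark_app call that loads the CPU extensions library via -l."""
    return [
        benchmark_app,
        "-d", device,
        "-i", image,
        "-m", model_xml,
        "-l", cpu_extension,  # without this, the Resample layer was reported as unsupported
        "-niter", str(niter),
    ]


if __name__ == "__main__":
    # Hypothetical paths: replace with your own build and model locations.
    cmd = build_benchmark_command(
        "./benchmark_app",
        "frozen_inference_graph.xml",
        "horse.jpg",
        "lib/libcpu_extension.so",
    )
    print(" ".join(cmd))
    if "--run" in sys.argv:  # only launch benchmark_app when explicitly requested
        subprocess.run(cmd, check=True)
```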

Chen__Cheng · 04-12-2019

Dear Shubha

Thanks for the update; I look forward to hearing more from you.

Regards

Cheng

Shubha_R_Intel · 04-12-2019

Dear Cheng, I have filed a bug (thanks for finding it!).

When the next release comes out, please check back; hopefully it will be fixed by then!

Thanks so much -

Shubha