Intel® Distribution of OpenVINO™ Toolkit
Community assistance about the Intel® Distribution of OpenVINO™ toolkit, OpenCV, and all aspects of computer vision on Intel® platforms.

Does Inference Engine support non-square input images?

zhang__chunyan
Beginner
2,083 Views

Hi, everyone,

 

Does anyone know how to use a keep-aspect-ratio resize for the input of a Faster RCNN model?

I have converted a Faster RCNN model with Model Optimizer. In the generated *.xml file, the input shape seems to be fixed:

<layer id="0" name="image_tensor" precision="FP32" type="Input">
            <output>
                <port id="0">
                    <dim>1</dim>
                    <dim>3</dim>
                    <dim>600</dim>
                    <dim>600</dim>
                </port>
            </output>
        </layer>

And through the faster rcnn demo, I got totally wrong detection results:

[ INFO ] Loading model to the device
[ INFO ] Create infer request
[ INFO ] Batch size is 1
[ INFO ] Start inference
[ INFO ] Processing output blobs
[0,1] element, prob = 0.00185027    (0,-2147483648)-(0,-2147483648) batch id : 0
[1,1] element, prob = 0.00121647    (0,-2147483648)-(0,-2147483648) batch id : 0
[2,1] element, prob = 0.0011118    (0,-2147483648)-(0,-2147483648) batch id : 0
[3,1] element, prob = 0.00109558    (0,-2147483648)-(0,-2147483648) batch id : 0
[4,1] element, prob = 0.000270341    (0,-2147483648)-(0,-2147483648) batch id : 0
[5,1] element, prob = 0.000115265    (0,-2147483648)-(0,-2147483648) batch id : 0
[6,1] element, prob = 0.000103648    (0,-2147483648)-(0,-2147483648) batch id : 0
[7,1] element, prob = 0.000100398    (0,-2147483648)-(0,-2147483648) batch id : 0
[8,1] element, prob = 9.22909e-05    (0,-2147483648)-(0,-2147483648) batch id : 0
[9,1] element, prob = 8.88288e-05    (0,-2147483648)-(0,-2147483648) batch id : 0
[10,1] element, prob = 6.75321e-05    (0,-2147483648)-(0,-2147483648) batch id : 0
[11,1] element, prob = 6.50572e-05    (0,-2147483648)-(0,-2147483648) batch id : 0
[12,1] element, prob = 5.01635e-05    (0,-2147483648)-(0,-2147483648) batch id : 0

Then I converted the model with the parameter --input_shape [1,600,1024,3], but I still got wrong results. Also, the test images do not all have the same shape, so I can't give a fixed input shape.

 

Best regards,

Zhang Chunyan

 

 

0 Kudos
24 Replies
Shubha_R_Intel
Employee
1,781 Views

Dear zhang, chunyan,

Perhaps the TensorFlow Object Detection API Custom Input Shape documentation will help you.

If you have further questions, please post here.

Thanks,

Shubha

 

0 Kudos
zhang__chunyan
Beginner
1,781 Views

Dear Shubha,

 

Thanks. But I'm still confused about 'keeping aspect ratio'. The document says that it is necessary to resize the image before passing it to the sample. But after I resize the image, it will be resized again according to the *.xml file.

 

Best regards,

zhang chunyan

0 Kudos
zhang__chunyan
Beginner
1,781 Views

Dear Shubha,

 

I have checked the code, and it seems the model file *.bin was not loaded correctly. After loading the weights, the CNNNetwork network was still empty.

 

Best regards,

zhang chunyan

0 Kudos
Shubha_R_Intel
Employee
1,781 Views

Dear zhang, chunyan,

Looking at the pipeline.config of faster_rcnn_inception_v2_coco_2018_01_28, for example, I see this:

model {
  faster_rcnn {
    num_classes: 90
    image_resizer {
      keep_aspect_ratio_resizer {
        min_dimension: 600
        max_dimension: 1024
      }
    }

So in this case Model Optimizer will assume a square image of the min_dimension size, in other words 600x600. This is explained in the following text, which I copied from the document referenced above regarding TF Object Detection API input sizes:

 

Keep Aspect Ratio Resizer Replacement

If the --input_shape command line parameter is not specified, the Model Optimizer generates an input layer with both height and width equal to the value of parameter min_dimension in the keep_aspect_ratio_resizer.

If the --input_shape [1, H, W, 3] command line parameter is specified, the Model Optimizer scales the specified input image height H and width W to satisfy the min_dimension and max_dimension constraints defined in the keep_aspect_ratio_resizer. The following function calculates the input layer height and width:

def calculate_shape_keeping_aspect_ratio(H: int, W: int, min_dimension: int, max_dimension: int):
    ratio_min = min_dimension / min(H, W)
    ratio_max = max_dimension / max(H, W)
    ratio = min(ratio_min, ratio_max)
    return int(round(H * ratio)), int(round(W * ratio))

In order to have a non-square image size you must pass in --input_shape to the MO command.
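
To make the quoted rule concrete, here is a small sketch that runs the documented function on a few image sizes. The sizes in the loop are hypothetical examples, not taken from this thread; 600 and 1024 are the min_dimension and max_dimension values from the pipeline.config shown above.

```python
def calculate_shape_keeping_aspect_ratio(H: int, W: int, min_dimension: int, max_dimension: int):
    # Scale so the short side reaches min_dimension, unless that would push
    # the long side past max_dimension; in that case use the smaller ratio.
    ratio_min = min_dimension / min(H, W)
    ratio_max = max_dimension / max(H, W)
    ratio = min(ratio_min, ratio_max)
    return int(round(H * ratio)), int(round(W * ratio))


# Hypothetical (H, W) values passed via --input_shape [1, H, W, 3].
for h, w in [(600, 600), (480, 640), (1080, 1920)]:
    new_h, new_w = calculate_shape_keeping_aspect_ratio(h, w, 600, 1024)
    print(f"input_shape H={h}, W={w} -> input layer H={new_h}, W={new_w}")
```

For a 480x640 image the short side is the limiting factor (ratio 1.25), while for 1080x1920 the long side hits max_dimension first, so the short side ends up below 600.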

Hope it helps,

Thanks,

Shubha

0 Kudos
zhang__chunyan
Beginner
1,781 Views

Dear Shubha,

 

Thanks, I have read the document you gave, but the inference results were still not correct. I tested the original model *.pb file, and it can detect objects. But when I use the model file converted by Model Optimizer in the Faster RCNN demo, it can't detect any object. There also seems to be a bug in the demo: sometimes when I run it, I get an out-of-memory error raised by OpenCV (please see the screenshot in the attachment), but sometimes this error doesn't appear. I also don't know why the image is resized to such a big size.

 

Best regards,

zhang chunyan

0 Kudos
zhang__chunyan
Beginner
1,781 Views

         

Dear Shubha,

 

I just found out that the OpenCV error occurs because the Faster RCNN demo code picks the wrong input tensor:

<layer id="0" name="image_tensor" precision="FP32" type="Input">
            <output>
                <port id="0">
                    <dim>1</dim>
                    <dim>3</dim>
                    <dim>600</dim>
                    <dim>600</dim>
                </port>
            </output>
        </layer>

----------------------------------

<layer id="117" name="image_info" precision="FP32" type="Input">
            <output>
                <port id="0">
                    <dim>1</dim>
                    <dim>3</dim>
                </port>
            </output>
        </layer>

---------------

I think the input tensor should be the first one, but in the code the second one is regarded as the input tensor.

By the way, does the Faster RCNN Demo support the keep_aspect_ratio resize? I think the code will resize to the fixed size specified in the *.xml file.

[ INFO ] Loading model to the device
[ INFO ] Create infer request
[ WARNING ] Image is resized from (603, 608) to (1024, 600)
[ INFO ] Batch size is 1
[ INFO ] Start inference
[ INFO ] Processing output blobs
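
One way to tell the two inputs apart is by tensor rank rather than by order. This is a minimal sketch, not the demo's actual code: the dictionary stands in for whatever the application reads from the IR, split_inputs is a hypothetical helper, and the [height, width, scale] layout used for image_info is an assumption.

```python
from typing import Dict, List, Optional, Tuple


def split_inputs(input_shapes: Dict[str, List[int]]) -> Tuple[str, Optional[str]]:
    """Pick the image input (4-D, NCHW) and the info input (2-D) by rank."""
    image_name = None
    info_name = None
    for name, shape in input_shapes.items():
        if len(shape) == 4:
            image_name = name
        elif len(shape) == 2:
            info_name = name
    if image_name is None:
        raise ValueError("no 4-D image input found")
    return image_name, info_name


# Shapes copied from the two <layer> entries above; the order is deliberately
# reversed to show that the selection does not depend on it.
shapes = {"image_info": [1, 3], "image_tensor": [1, 3, 600, 600]}
image_name, info_name = split_inputs(shapes)

# Assumption: image_info holds [height, width, scale] of the network input.
_, _, height, width = shapes[image_name]
image_info = [float(height), float(width), 1.0]
print(image_name, info_name, image_info)
```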

 

Best regards,

zhang chunyan

  

0 Kudos
Shubha_R_Intel
Employee
1,781 Views

Dear zhang, chunyan,

If your accuracy is messed up, that's likely because the image size passed into Model Optimizer is not correct, or the images you are passing in for inference are the wrong size. Everything needs to match: the model was trained with a certain image size and pre-processing (mean values, scaling), and exactly the same needs to be passed into Model Optimizer. Did you add the --reverse_input_channels switch? If you trained your model with RGB then you definitely need --reverse_input_channels. Please read the Faster RCNN demo doc.
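
For illustration only, the sketch below shows what a channel swap amounts to when done by hand on a tiny image. It assumes the application holds pixels as a flat, interleaved B,G,R list in row-major order (the layout OpenCV produces) and that the network expects planar RGB; hwc_bgr_to_chw_rgb is a hypothetical helper. With --reverse_input_channels the swap is built into the IR, so doing it again in the application would undo it.

```python
from typing import List, Sequence


def hwc_bgr_to_chw_rgb(pixels: Sequence[float], height: int, width: int) -> List[float]:
    """Convert interleaved BGR (HWC) pixels to planar RGB (CHW) floats."""
    plane = height * width
    if len(pixels) != 3 * plane:
        raise ValueError("pixel buffer does not match height * width * 3")
    out = [0.0] * (3 * plane)
    for idx in range(plane):
        b, g, r = pixels[3 * idx:3 * idx + 3]
        out[0 * plane + idx] = float(r)  # R plane first
        out[1 * plane + idx] = float(g)  # then G
        out[2 * plane + idx] = float(b)  # then B
    return out


# A 1x2 image: pixel 0 is (B=1, G=2, R=3), pixel 1 is (B=4, G=5, R=6).
print(hwc_bgr_to_chw_rgb([1, 2, 3, 4, 5, 6], height=1, width=2))
```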

Thanks,

Shubha

0 Kudos
zhang__chunyan
Beginner
1,781 Views

Dear Shubha,

 

Thanks.

I wonder if you have a successfully converted TF Faster RCNN model (*.bin, *.xml) that runs correctly with the object_detection_demo_faster_rcnn.exe of version openvino_2019.2.242. If you have one, could you please send it to me? Then I can take it as a reference. Thank you.

I have looked in this model zoo: https://download.01.org/opencv/2019/open_model_zoo/R2/, but I didn't find Faster RCNN model IR files.

 

Best regards,

zhang chunyan

0 Kudos
Shubha_R_Intel
Employee
1,781 Views

Dearest zhang, chunyan,

If you grab your Faster RCNN from the Model Optimizer TensorFlow Supported List then it should definitely work. I have done it before and it worked! Also, please update your TensorFlow to TF 1.14 if you haven't already, and make sure you're using OpenVINO 2019R2. Please try again and report your findings here. I will be glad to help you!

Thanks,

Shubha

0 Kudos
zhang__chunyan
Beginner
1,781 Views

Dear Shubha,

 

Thanks.

My TensorFlow version is currently v1.11; I will upgrade to v1.14 as you said.

I downloaded the Faster RCNN model (Faster R-CNN ResNet 50 COCO) from the list in the link that you gave: Model Optimizer TensorFlow Supported List. Then I converted the model to *.bin and *.xml files (with --reverse_input_channels). But the result of the demo (object_detection_demo_faster_rcnn.exe) is also wrong.

 

Best regards,

zhang chunyan

0 Kudos
Shubha_R_Intel
Employee
1,781 Views

Dear zhang, chunyan,

It sounds like you did everything correctly. I hope you are also using OpenVINO 2019R2. Please give me the exact mo_tf.py command you are using. Also, can you kindly upload the image you're using?

Thanks for your patience,

Shubha

0 Kudos
zhang__chunyan
Beginner
1,781 Views

Dear Shubha,

 

I'm using OpenVINO 2019R2. The mo_tf.py command that I use is as follows:

python mo_tf.py --input_model D:\zhangchunyan\OpenVINO\FasterRCNN\faster_rcnn_resnet50_coco_2018_01_28\frozen_inference_graph.pb --tensorflow_use_custom_operations_config C:\Intel\openvino_2019.2.242\deployment_tools\model_optimizer\extensions\front\tf\faster_rcnn_support.json --tensorflow_object_detection_api_pipeline_config D:\zhangchunyan\OpenVINO\FasterRCNN\faster_rcnn_resnet50_coco_2018_01_28\pipeline.config --reverse_input_channels 

And the object_detection_demo_faster_rcnn.exe command that I use is:

object_detection_demo_faster_rcnn -i D:\zhangchunyan\OpenVINO\FasterRCNN\faster_rcnn_resnet50_coco_2018_01_28\COCO_val2014_000000203669.jpg -m D:\zhangchunyan\OpenVINO\FasterRCNN\faster_rcnn_resnet50_coco_2018_01_28\frozen_inference_graph.xml -d CPU -bbox_name ScaleShift/scale_locs -proposal_name proposals -prob_name Squeeze_3/softmax -p_msg

The image that I used is attached. The results are as follows:

[ INFO ] Start inference
[ INFO ] Processing output blobs
[0,1] element, prob = 0.999462    (0,0)-(0,0) batch id : 0 WILL BE PRINTED!
[1,1] element, prob = 0.632815    (0,0)-(1,0) batch id : 0 WILL BE PRINTED!
[2,1] element, prob = 0.118809    (0,0)-(0,0) batch id : 0
[3,1] element, prob = 0.0249501    (0,0)-(1,0) batch id : 0
[4,1] element, prob = 0.00494514    (0,0)-(0,0) batch id : 0
[5,1] element, prob = 0.00123677    (1,0)-(1,0) batch id : 0

The person in the image was detected, but the bounding box was incorrect.

 

Best regards,

zhang chunyan

0 Kudos
Shubha_R_Intel
Employee
1,781 Views

Dearest zhang chunyan,

You are correct. I reproduced this issue on OpenVINO 2019R2. object_detection_demo_faster_rcnn.exe does not seem to work. I will file a bug on your behalf. Sorry for the inconvenience!

Shubha

0 Kudos
zhang__chunyan
Beginner
1,781 Views

Dear Shubha,

 

Thanks.

When the bug is fixed, could you please tell me and send me the fixed demo? Thank you!

 

Best regards,

zhang chunyan

0 Kudos
Shubha_R_Intel
Employee
1,781 Views

Dear zhang, chunyan,

Of course I will inform you once the demo is fixed - the fix is scheduled for the next interim release.

Thanks,

Shubha

 

0 Kudos
Shubha_R_Intel
Employee
1,781 Views

Dearest zhang, chunyan,

Upon further investigation I find that I was mistaken. The answer is actually buried in the MO documentation, in this part:

A distinct feature of any SSD topology is a part performing non-maximum suppression of proposed images bounding boxes. This part of the topology is implemented with dozens of primitive operations in TensorFlow, while in Inference Engine, it is one layer called DetectionOutput. Thus, to convert a SSD model from the TensorFlow, the Model Optimizer should replace the entire sub-graph of operations that implement the DetectionOutput layer with a single DetectionOutput node.

Somewhere in your Model Optimizer output you probably saw the following message:

The graph output nodes "num_detections", "detection_boxes", "detection_classes", "detection_scores" have been replaced with a single layer of type "Detection Output". Refer to IR catalogue in the documentation for information about this layer.

Well, a Detection Output layer immediately screams SSD. So for this Faster RCNN model, use Object_Detection_Sample_SSD instead.

If you do use the SSD sample, it will work perfectly. object_detection_demo_faster_rcnn is not meant to be used here, because this demo takes IRs with 3 original outputs. 
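
As a rough sketch of what consuming a DetectionOutput result involves: the code below assumes the output is a flat buffer of 7-value records [image_id, label, confidence, x_min, y_min, x_max, y_max], with coordinates normalized to the 0..1 range and a negative image_id marking the end of valid detections. parse_detections and the sample values are hypothetical, written for illustration rather than copied from the SSD sample.

```python
from typing import List, Sequence, Tuple


def parse_detections(flat: Sequence[float], image_w: int, image_h: int,
                     threshold: float = 0.5) -> List[Tuple[int, float, int, int, int, int]]:
    """Return (label, confidence, x_min, y_min, x_max, y_max) in pixels."""
    results = []
    for i in range(0, len(flat) - 6, 7):
        image_id, label, conf, x_min, y_min, x_max, y_max = flat[i:i + 7]
        if image_id < 0:
            break  # assumed end-of-detections marker
        if conf < threshold:
            continue
        # Normalized coordinates are scaled by the ORIGINAL image size.
        results.append((int(label), conf,
                        round(x_min * image_w), round(y_min * image_h),
                        round(x_max * image_w), round(y_max * image_h)))
    return results


# Made-up records: one confident detection, one weak one, then the end marker.
raw = [0, 1, 0.9, 0.25, 0.25, 0.75, 0.5,
       0, 2, 0.2, 0.0, 0.0, 0.5, 0.5,
       -1, 0, 0, 0, 0, 0, 0]
boxes = parse_detections(raw, image_w=1000, image_h=500)
print(boxes)
```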

I know this has been confusing. It is for me too. But let me know if the SSD sample works for you!

Thanks,

Shubha

 

0 Kudos
zhang__chunyan
Beginner
1,781 Views

Dear Shubha,

 

Thanks.

Yes, the SSD sample works for me! And the speed has improved!

But the results of Inference Engine are worse than those of TensorFlow. Some objects are not detected or are wrongly classified, although my TF demo detects and classifies them correctly. I guess the reason is this message from the Model Optimizer log: "Inference Engine does not support dynamic image size so the Intermediate Representation file is generated with the input image size of a fixed size." When I give a reasonable --input_shape to Model Optimizer, Inference Engine outputs detection results as good as the TensorFlow demo. But in many detection tasks the sizes of the input images vary, and it's impossible to generate different *.xml and *.bin files for every image size. So I want to ask how to solve this kind of problem without big changes to the SSD sample, so that I get correct detection results for images of any size. Thank you.

 

Best regards,

zhang chunyan

0 Kudos
Shubha_R_Intel
Employee
1,781 Views

Dear zhang chunyan,

This is a valid question. Let me investigate this and get back to you on this forum!

Shubha

0 Kudos
zhang__chunyan
Beginner
1,624 Views

Dear Shubha,

 

I have solved the keep-aspect-ratio resize problem by preprocessing the data before feeding it to the input blob of Inference Engine.
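
The post does not say which preprocessing was used. One possible approach, sketched here under stated assumptions, is to scale the image so it fits inside the fixed IR input while keeping its aspect ratio, pad the remainder on the right and bottom, and divide detected box coordinates by the same scale afterwards. The function names and the sizes are hypothetical, and the thread does not establish whether padding affects accuracy for a given model.

```python
from typing import Sequence, Tuple


def fit_keeping_aspect_ratio(src_h: int, src_w: int,
                             dst_h: int, dst_w: int) -> Tuple[int, int, float]:
    """Return (new_h, new_w, scale) so the scaled image fits in dst_h x dst_w."""
    scale = min(dst_h / src_h, dst_w / src_w)
    new_h = min(dst_h, round(src_h * scale))
    new_w = min(dst_w, round(src_w * scale))
    return new_h, new_w, scale


def box_to_source(box: Sequence[float], scale: float) -> Tuple[float, ...]:
    """Map (x_min, y_min, x_max, y_max) in input pixels back to source pixels.

    Valid because the scaled image is anchored at the top-left corner and
    padding is only added on the right and bottom.
    """
    return tuple(v / scale for v in box)


# Hypothetical 480x640 image fed to an IR with a 600x1024 input.
new_h, new_w, scale = fit_keeping_aspect_ratio(480, 640, 600, 1024)
pad_bottom, pad_right = 600 - new_h, 1024 - new_w
print(new_h, new_w, scale, pad_bottom, pad_right)
print(box_to_source((100, 50, 400, 300), scale))
```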

But the recall in my testing drops compared to the TensorFlow demo. I have compared all the output bounding boxes of the TF and OpenVINO SSD demos, and the coordinates of the detected boxes are the same for every image except one. I think the reason may be that this image contains too many objects, while in the other images the number of objects is less than 10. In this image, the TF demo detected 52 objects, while the OpenVINO SSD demo detected only 37. So I want to ask whether there is a threshold in the OpenVINO SSD demo or in the inference function that prevents it from detecting too many objects. I have gone through the SSD demo code, and I have no idea about this.
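
Whether a limit on the number of detections explains the 52 versus 37 difference is not established in this thread. One way to check is to look at the attributes stored on the DetectionOutput layer in the generated *.xml file. The sketch below assumes an IR layout where layer attributes sit in a <data> child element; the sample XML string is hand-written, and its attribute values are illustrative rather than taken from a real conversion.

```python
import xml.etree.ElementTree as ET
from typing import Dict

# Hand-written stand-in for a generated IR; real files contain many more layers.
SAMPLE_IR = """
<net name="example" version="5">
  <layers>
    <layer id="1" name="detection_output" precision="FP32" type="DetectionOutput">
      <data confidence_threshold="0.3" keep_top_k="100" nms_threshold="0.6"/>
    </layer>
  </layers>
</net>
"""


def detection_output_attributes(ir_xml: str) -> Dict[str, Dict[str, str]]:
    """Map each DetectionOutput layer name to its <data> attributes."""
    root = ET.fromstring(ir_xml)
    found = {}
    for layer in root.iter("layer"):
        if layer.get("type") == "DetectionOutput":
            data = layer.find("data")
            found[layer.get("name")] = dict(data.attrib) if data is not None else {}
    return found


attrs = detection_output_attributes(SAMPLE_IR)
for name, values in attrs.items():
    print(name, values)
```

To inspect a real model, the contents of the generated *.xml file would be read from disk and passed in place of SAMPLE_IR.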

Thank you.

Best regards,

zhang chunyan

0 Kudos