Converting YOLO v3 tiny to OpenVINO

gransanger19 · 03-22-2019

Hi,

I want to convert my YOLO v3 tiny model trained with Darknet into the OpenVINO format (.bin, .xml).

First, I converted the model to the TensorFlow format (frozen inference graph .pb):

https://github.com/mystic123/tensorflow-yolo-v3

python convert_weights_pb.py \
--class_names data/object.names \
--weights_file data/yolov3-tiny_final.weights \
--data_format NHWC \
--tiny \
--output_graph data/backup3/yolov3-tiny_final.pb

Then I converted the TensorFlow model to the OpenVINO format using the Model Optimizer:

python "D:/ModelOptimizer/model_optimizer/mo_tf.py" \
--input_model data/backup3/yolov3-tiny_final.pb \
--output_dir data/backup3/FP32 \
--data_type FP32 \
--batch 1 \
--input detector/yolo-v3-tiny/Conv/Conv2D \
--output detector/yolo-v3-tiny/Conv_9/BiasAdd,detector/yolo-v3-tiny/Conv_12/BiasAdd

The MO returned SUCCESS but showed a warning:

D:\ModelOptimizer\model_optimizer\mo\middle\passes\fusing\decomposition.py:64: 
RuntimeWarning: invalid value encountered in sqrt
  scale = 1. / np.sqrt(variance.value + eps)
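For context, NumPy emits this warning when np.sqrt receives a negative or NaN argument, so it suggests that at least one value of variance + eps was negative or NaN while the batch normalization was being decomposed. Any such entry would turn into NaN in the resulting scale. A minimal sketch of the effect, using made-up variance values and an assumed eps (neither is taken from the actual model):

```python
import numpy as np

# Hypothetical batch-norm variances; the negative entry stands in for a
# corrupted or misread value. eps is an assumed small constant.
variance = np.array([0.25, 1.0, -0.5])
eps = 1e-3

# Suppress the "invalid value encountered in sqrt" warning for this demo.
with np.errstate(invalid="ignore"):
    scale = 1.0 / np.sqrt(variance + eps)

print(scale)                  # the last entry is nan
print(np.isnan(scale).any())  # True
```

If something like this happens during conversion, the NaN would presumably be baked into the converted weights and propagate through inference.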

The problem is that the OpenVINO model does not calculate reasonable results when I use it to analyse a test image. Here is some test code that generates the outputs for an image:

# Using the Darknet model

import cv2
import numpy as np

yolo_weights = "data/yolov3-tiny_final.weights"
yolo_cfg = "data/yolov3-tiny.cfg"

img_path = "images/demo_img2.jpg"

def get_output_layers(net):
    
    layer_names = net.getLayerNames()
    output_layers = [layer_names[i[0] - 1] for i in net.getUnconnectedOutLayers()]

    return output_layers


image = cv2.imread(img_path)
height, width, channels = image.shape
blob = cv2.dnn.blobFromImage(image, scalefactor=(1.0/255), size=(416, 416), swapRB=True, crop=False)

net = cv2.dnn.readNet(yolo_weights, yolo_cfg)
net.setInput(blob)
outs = net.forward(get_output_layers(net))

print(outs[0].shape)
print(outs[1].shape)

for out in outs:
    for detection in out:
        if detection[6] > 0.7:
            print("confidence: {}".format(detection[6]))

------------------------------------------------------------------------------------------

# Using the OpenVINO model

from openvino import cv2
import numpy as np

yolo_weights = "data/backup3/FP32/yolov3-tiny_final.bin"
yolo_cfg = "data/backup3/FP32/yolov3-tiny_final.xml"

img_path = "images/demo_img.jpg"

def get_output_layers(net):
    
    layer_names = net.getLayerNames()
    output_layers = [layer_names[i[0] - 1] for i in net.getUnconnectedOutLayers()]

    return output_layers


image = cv2.imread(img_path)
height, width, channels = image.shape
blob = cv2.dnn.blobFromImage(image, scalefactor=(1.0/255), size=(416, 416), swapRB=True, crop=False)

net = cv2.dnn.readNet(yolo_weights, yolo_cfg)
net.setInput(blob)
outs = net.forward(get_output_layers(net))

print(outs[0].shape)
print(outs[1].shape)

for out in outs[0]:
    for detection in out:
        if detection[5] > 0.7:
            print("confidence: {}".format(detection[5]))

All outputs should be values between 0 and 1. The Darknet model produces outputs between 0 and 1 and outputs reasonable bounding box coordinates. For demonstration, I print the confidences for the bounding boxes above a threshold of 0.7. For the Darknet model, I get

(507, 7)
(2028, 7)
confidence: 0.9999673366546631
confidence: 0.9999920129776001

The 7 values per detection are also plausible, as I trained the YOLO network for only one class. So:

detection[0] = x coordinate (bbox center) / detection[1] = y coordinate (bbox center) / detection[2] = bbox width / detection[3] = bbox height / detection[4] = maximal confidence / detection[5] = confidence for class 0 [always 0] / detection[6] = confidence for class 1
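The layout above can be sketched as a small decoding helper. This is only an illustration under the assumption that the coordinates are normalized to [0, 1] relative to the image size; the function name decode_detection, the image size and the example row are made up and are not real model output.

```python
import numpy as np

def decode_detection(row, img_w, img_h):
    """Convert one row [cx, cy, w, h, conf, class scores...] to a pixel box."""
    cx, cy, w, h = row[:4]
    class_scores = row[5:]
    class_id = int(np.argmax(class_scores))
    # The box is given by its center, so shift by half the size to get the corner.
    left = int(round((cx - w / 2) * img_w))
    top = int(round((cy - h / 2) * img_h))
    return {
        "box": (left, top, int(round(w * img_w)), int(round(h * img_h))),
        "confidence": float(row[4]),
        "class_id": class_id,
        "class_score": float(class_scores[class_id]),
    }

# Invented row in the 7-value layout described above.
row = np.array([0.5, 0.5, 0.2, 0.4, 0.9, 0.0, 0.95])
result = decode_detection(row, img_w=640, img_h=480)
print(result)
```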

The OpenVINO model, in contrast, also produces negative numbers and numbers greater than 1. Each detection has only 6 values, so I can only guess which value has been dropped (I suppose that it is either the maximal confidence or the confidence for class 0). There are also many NaNs in the output of shape (1, 2028, 6). I can't interpret the results, since most numbers are not between 0 and 1. For the shapes, I get:

(1, 507, 6)
(1, 2028, 6)
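One way to quantify this is to count the NaNs and the finite values outside [0, 1] in each output array. A minimal sketch, where summarize_output is a hypothetical helper and the array is an invented stand-in with the same shape as the second output:

```python
import numpy as np

def summarize_output(out):
    """Report the NaN count and how many finite values fall outside [0, 1]."""
    out = np.asarray(out, dtype=np.float64)
    finite = out[np.isfinite(out)]
    return {
        "shape": out.shape,
        "nan_count": int(np.isnan(out).sum()),
        "out_of_range": int(((finite < 0) | (finite > 1)).sum()),
    }

# Stand-in array; the values are invented, not real network output.
fake = np.full((1, 2028, 6), 0.5)
fake[0, 0, 0] = np.nan
fake[0, 1, 2] = -3.0
fake[0, 2, 4] = 7.5
summary = summarize_output(fake)
print(summary)
```

In the test script above, the same helper could be called on each element of outs.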

It would be great if someone could reproduce the bug and help me fix it. If you need my trained model, the frozen graph or the converted OpenVINO model, please let me know. But as the results are correct for the Darknet model, I don't think that the fault is in my model.

Best regards

gransanger19

Cary_P_Intel1 · 03-25-2019

Hi, gransanger19,

For parsing the YOLOv3 output, please refer to the function "ParseYOLOV3Output" in the sample named "object_detection_demo_yolov3_async".

gransanger19 · 03-27-2019

Thank you for your reply. Unfortunately, object_detection_demo_yolov3_async does not return reasonable results either. In the last layer, which represents the detected bounding boxes, I expect values between 0 and 1 for the bounding box coordinates, the confidence and the class id. But the IR model also produces many negative values, values greater than 1 and NaNs.

Did you try to convert a YOLO v3 tiny model with the commands above? Don't you get NaNs or values outside the range [0, 1] when predicting an image?

Mikhail_T_Intel · 03-27-2019

Hi there,

gransanger19, converting TF YOLO models with the Model Optimizer is not so straightforward and requires additional MO options. Luckily, the whole process is well documented here: https://software.intel.com/en-us/articles/OpenVINO-Using-TensorFlow#yolov3-to-ir

Briefly, you just need to specify the additional command-line argument --tensorflow_use_custom_operations_config $MO_ROOT/extensions/front/tf/yolo_v3.json. But anyway, I recommend reading the article to make sure that all the other steps, including the conversion from darkflow, are done in the recommended way.

Shubha_R_Intel · 03-27-2019

Dear gransanger19:

Perhaps this dldt github issue post will help you:

https://github.com/opencv/dldt/issues/102

Thanks for using OpenVINO!

Shubha

gransanger19 · 03-28-2019

Hi,

Thank you for your help. I have now followed your instructions exactly. When predicting a test image, the network now produces output shapes of (1, 24, 26, 26) and (1, 24, 13, 13). It seems that the final YOLO layers are cut out, as also mentioned in your article.

I want to find out whether the problem is in my converted model or in my build of object_detection_demo_yolov3_async with Visual Studio 2017 (which produced some errors due to std::vectors delivered by a DLL). It would be very nice if you could take a look at my converted model and check whether you are able to parse it correctly (e.g. using object_detection_demo_yolov3_async) and to generate reasonable predictions on my test images.
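With 3 anchors per scale and 3 classes, the 24 channels would correspond to 3 * (5 + 3) values per grid cell. A rough sketch of how such a blob could be split into its fields is shown here, assuming a batch size of 1 and an anchor-major channel order as in Darknet. Both are assumptions, split_yolo_output is a hypothetical helper, and the blob is a zero-filled placeholder rather than real output.

```python
import numpy as np

def split_yolo_output(blob, num_anchors=3, num_classes=3):
    """Reshape a (1, A*(5+C), H, W) blob into per-anchor fields."""
    _, channels, h, w = blob.shape
    per_anchor = 5 + num_classes  # x, y, w, h, objectness, class scores
    if channels != num_anchors * per_anchor:
        raise ValueError("unexpected channel count: {}".format(channels))
    # Assumes the channels of anchor 0 come first, then anchor 1, and so on.
    grid = blob.reshape(num_anchors, per_anchor, h, w)
    return {
        "xy": grid[:, 0:2],
        "wh": grid[:, 2:4],
        "objectness": grid[:, 4],
        "class_scores": grid[:, 5:],
    }

# Placeholder with the shape of the 13x13 output.
blob = np.zeros((1, 24, 13, 13), dtype=np.float32)
fields = split_yolo_output(blob)
print({name: arr.shape for name, arr in fields.items()})
```

Turning these fields into boxes still requires the anchor sizes and the grid offsets, which is the part handled by ParseYOLOV3Output in the demo.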

This is what I did:

Preparing yolov3-tiny.cfg for classes=3 (and filters = 3*(5+classes) = 24) -> class 0 = background / class 1 = Circle / class 2 = Rectangle
Training a YOLO v3 tiny network with Darknet -> yolov3-tiny_final.weights

Converting the darknet weights to a frozen inference graph (TensorFlow / pb):

python convert_weights_pb.py --class_names=data/CircleRectangle/object.names --weights_file=data/CircleRectangle/yolov3-tiny_final.weights --data_format=NHWC --tiny --output_graph data/CircleRectangle/yolov3-tiny_final.pb

Converting the frozen inference graph to the IR model (bin / mapping / xml)

python "D:/ModelOptimizer/model_optimizer/mo_tf.py" --batch 1 --input_model data/CircleRectangle/yolov3-tiny_final.pb --tensorflow_use_custom_operations_config "D:/ModelOptimizer/model_optimizer/extensions/front/tf/yolo_v3_tiny.json" --output_dir=data/CircleRectangle/

As already mentioned, the Darknet weights file works very well and generates reasonable predictions.

You can download my models and test images here. The predicted images in the results folder were generated with the original Darknet model. Are you able to reproduce these results with the IR model? And if not, are you able to convert my Darknet model to IR in such a way that it works? It would be very nice if you could help me find out in which step the problem occurs.

Best regards and thanks a lot,

gransanger19

gransanger19 · 03-28-2019

By the way, this is my yolo_v3_tiny.json:

[
  {
    "id": "TFYOLOV3",
    "match_kind": "general",
    "custom_attributes": {
      "classes": 3,
      "coords": 4,
      "num": 6,
      "mask": [0, 1, 2],
      "entry_points": ["detector/yolo-v3-tiny/Reshape", "detector/yolo-v3-tiny/Reshape_4"]
    }
  }
]
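As a quick consistency check, the channel count implied by this file can be compared with the output shapes reported in the previous post, (1, 24, 26, 26) and (1, 24, 13, 13). The sketch below assumes that each anchor in "mask" contributes coords + 1 + classes channels (the extra 1 being the objectness value). The JSON is pasted inline so that the snippet is self-contained.

```python
import json

config_text = """
[
  {
    "id": "TFYOLOV3",
    "match_kind": "general",
    "custom_attributes": {
      "classes": 3,
      "coords": 4,
      "num": 6,
      "mask": [0, 1, 2],
      "entry_points": ["detector/yolo-v3-tiny/Reshape", "detector/yolo-v3-tiny/Reshape_4"]
    }
  }
]
"""

attrs = json.loads(config_text)[0]["custom_attributes"]
# Per anchor: box coordinates, one objectness value, and the class scores.
expected_channels = len(attrs["mask"]) * (attrs["coords"] + 1 + attrs["classes"])
print(expected_channels)  # 24
```

Under that assumption, the 24 channels of both outputs agree with classes = 3 and three anchors per scale.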