OpenVINO failing on YoloV3's YoloRegion, only one working on FP16, all working on FP32

J__Niko · ‎01-24-2019

1) I followed the OpenVINO tutorial and converted the YoloV3 COCO weights. The result works perfectly on FP32 CPU and FP16 NCS2.

2) I trained my own model with 1 class and tested it on darknet: works perfectly. I converted it: FP32 CPU, perfect. FP16 NCS2 showed tens of boxes flickering randomly around the screen, with low confidences.

3) I converted the same weights (COCO) to FP16 and FP32, loaded both models, reinterpreted the FP16 weight values (stored as 16-bit integers) as 16-bit floats (1 sign bit, 5 exponent bits, 10 mantissa bits) and compared the models. The biggest absolute difference between the FP16 and FP32 weights was 0.125, at the value pair FP16: 305.0 vs. FP32: 304.875.

4) I repeated step 3 for my own model. The biggest absolute difference was 0.124969482421875, at the value pair FP16: 272.75 vs. FP32: 272.62503.

5) OK, so my model's weights don't pick up any large error from the FP16 conversion itself.

6) I created a script that loads the FP16 model (COCO) onto the NCS2 and the FP32 model (COCO) onto the CPU. I fed the same image data to both and compared the outputs per output layer. For each layer I computed np.absolute(fp16_output_layer_x - fp32_output_layer_x) and then took the maximum and the average of those differences:

Max diff for layer detector/yolo-v3/Conv_14/BiasAdd/YoloRegion = 0.06429833173751831 (avg 0.0007683004951104522)
Max diff for layer detector/yolo-v3/Conv_22/BiasAdd/YoloRegion = 0.09005624055862427 (avg 0.0007103998796083033)
Max diff for layer detector/yolo-v3/Conv_6/BiasAdd/YoloRegion = 0.04047667980194092 (avg 0.0008134443196468055)

7) OK, pretty similar values. Let's do the same with my own model (trained for 5000 steps on a single class, with a dataset of >100k objects):

Max diff for layer detector/yolo-v3/Conv_14/BiasAdd/YoloRegion = 22.19023323059082 (avg 1.2534778118133545)
Max diff for layer detector/yolo-v3/Conv_22/BiasAdd/YoloRegion = 0.07185804843902588 (avg 0.004095843061804771)
Max diff for layer detector/yolo-v3/Conv_6/BiasAdd/YoloRegion = 17.305316925048828 (avg 1.1088379621505737)

8) Two layers have VERY dissimilar values compared to the (accurate) FP32 model. Let's test the same model trained for 12000 steps:

Max diff for layer detector/yolo-v3/Conv_14/BiasAdd/YoloRegion = 24.81550407409668 (avg 1.3480792045593262)
Max diff for layer detector/yolo-v3/Conv_22/BiasAdd/YoloRegion = 0.07248461246490479 (avg 0.004531938582658768)
Max diff for layer detector/yolo-v3/Conv_6/BiasAdd/YoloRegion = 25.586721420288086 (avg 1.2835898399353027)

9) Even worse. What if I only use Conv_22 in post-processing? Tested, and it detects perfectly... small objects. Accuracy is perfect for small objects, but when they come closer the model doesn't detect anything. Well, that's expected: Yolo uses different output layers for different object sizes. But why does the FP32 model work perfectly while the identical FP16 model works on only one layer?

10) I ran diff tools on the model .xml and .mapping files (fp16_trained_by_me vs fp32_trained_by_me, and fp16_trained_by_me vs fp16_coco). They are all basically identical, except that some layer ids differ and the number of classes differs (my model has only 1 class).

11) I tested tons of different modifications to the yolo_v3.json configuration and even tried some edits in YOLO.py, which uses yolo_v3.json. Nothing worked. Does anyone have a workaround, and (if an Intel dev is reading) is Intel going to fix this soon?
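
For reference, the weight check in steps 3 and 4 can be sketched roughly as follows. This is a minimal NumPy sketch, not the script used for the post: the function names (fp16_bits_to_float32, max_weight_diff, fp16_half_ulp) are hypothetical, the weights are synthetic, and reading the real weights out of the IR .bin files is left out. It also shows why a difference of 0.125 is expected: with a 10-bit mantissa, FP16 values between 256 and 512 are spaced 0.25 apart, so rounding can move a weight by at most 0.125.

```python
import numpy as np


def fp16_bits_to_float32(raw_u16):
    """Reinterpret raw 16-bit integers as IEEE 754 half floats, widened to float32."""
    bits = np.ascontiguousarray(raw_u16, dtype=np.uint16)
    return bits.view(np.float16).astype(np.float32)


def max_weight_diff(fp16_bits, fp32_weights):
    """Return (max abs diff, FP16 value, FP32 value) at the worst-matching weight."""
    a = fp16_bits_to_float32(fp16_bits).ravel()
    b = np.asarray(fp32_weights, dtype=np.float32).ravel()
    diff = np.absolute(a - b)
    i = int(np.argmax(diff))
    return float(diff[i]), float(a[i]), float(b[i])


def fp16_half_ulp(x):
    """Largest rounding error for a normal FP16 value near x (half the grid spacing)."""
    exponent = np.floor(np.log2(abs(x)))
    return float(2.0 ** (exponent - 11))  # grid spacing is 2**(exponent - 10)


# Synthetic stand-in for real weights: round FP32 values to FP16 and keep the raw bits.
fp32 = np.array([304.875, 0.5, -1.25], dtype=np.float32)
fp16_bits = fp32.astype(np.float16).view(np.uint16)

worst, v16, v32 = max_weight_diff(fp16_bits, fp32)
print(worst, v16, v32)      # worst-case difference and the two values involved
print(fp16_half_ulp(v32))   # rounding bound for values in [256, 512)
```

With these synthetic values the worst pair is FP16 305.0 vs. FP32 304.875, the same pair reported in step 3, and the difference equals the rounding bound. A maximum weight difference of about 0.125 at magnitudes around 300 is therefore ordinary FP16 rounding and not a sign of a broken conversion.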

nikos1 · ‎01-24-2019

Good analysis!

> OpenVINO failing on YoloV3's YoloRegion, only one working on FP16, all working on FP32

just to avoid confusion is your statement "only one working on FP16" valid for all FP16 devices or just NCS or NCS2?

In other words have you tried FP16 on GPU? Any issues? Same question for NCS.

thanks,

nikos

J__Niko · ‎01-24-2019

> just to avoid confusion is your statement "only one working on FP16" valid for all FP16 devices or just NCS or NCS2?

Ah, good point, I only tested with NCS2 as I don't have NCS1s.

> In other words have you tried FP16 on GPU?

I haven't. The only machines I have installed Ubuntu on are some dev boards and my laptop. I tried to run it on the laptop's GPU and it threw "RuntimeError: failed to create engine: clGetPlatformIDs error -1001". Both the laptop and the dev boards have integrated graphics.

Oh and yes, this was on 2018.R5 as 2018.R4 doesn't support Resampling layer at all.

nikos1 · ‎01-24-2019

Thank you for the update! I do have an NCS1 and a few HD GPUs if you would like me to run any FP16 validation tests for you. Feel free to send me the IR and script to test, if that is important for drawing conclusions.

Good luck in this investigation. Hope you get some help from our forum experts.

Cheers,

Nikos

J__Niko · ‎01-25-2019

This is a work-related project, so I could not share the image I tested with, but I got permission to share the models and the testing scripts:

https://drive.google.com/open?id=14tTQU58xvCxzZSMf1gvtZYsYvJxSFZRu

You may try different images (example.jpg in the package is public domain)

nikos1 · ‎01-25-2019

Thanks Niko! Got it - feel free to delete it.

Will be testing FP16 on GPU and NCS1 and will be updating here shortly.

Cheers,

Nikos

nikos1 · ‎01-25-2019

Results from your coco.py script run on FP16 { NCS1, NCS2, HD630 GPU } vs. FP32 CPU attached below.

It seems FP16 GPU is statistically close to FP32, right?

NCS1 also has the same issue as NCS2.

UHD630 GPU FP16 vs CPU FP32
Max diff for layer detector/yolo-v3/Conv_14/BiasAdd/YoloRegion = 0.0 (avg 0.0)
Max diff for layer detector/yolo-v3/Conv_22/BiasAdd/YoloRegion = 0.0 (avg 0.0)
Max diff for layer detector/yolo-v3/Conv_6/BiasAdd/YoloRegion = 0.09870767593383789 (avg 0.005961029324680567)

NCS1 FP16 vs CPU FP32
Max diff for layer detector/yolo-v3/Conv_14/BiasAdd/YoloRegion = 0.0 (avg 0.0)
Max diff for layer detector/yolo-v3/Conv_22/BiasAdd/YoloRegion = 0.0 (avg 0.0)
Max diff for layer detector/yolo-v3/Conv_6/BiasAdd/YoloRegion = 16.161561965942383 (avg 1.2095401287078857)

NCS2 FP16 vs CPU FP32
Max diff for layer detector/yolo-v3/Conv_14/BiasAdd/YoloRegion = 0.0 (avg 0.0)
Max diff for layer detector/yolo-v3/Conv_22/BiasAdd/YoloRegion = 0.0 (avg 0.0)
Max diff for layer detector/yolo-v3/Conv_6/BiasAdd/YoloRegion = 16.052186965942383 (avg 1.2017731666564941)

nikos1 · ‎01-25-2019

JFTR results from custom.py attached below. Let me know if any more tests are needed.

NCS1 FP16 vs FP32
Max diff for layer detector/yolo-v3/Conv_14/BiasAdd/YoloRegion = 21.495864868164062 (avg 1.2882004976272583)
Max diff for layer detector/yolo-v3/Conv_22/BiasAdd/YoloRegion = 0.06138873100280762 (avg 0.00450884411111474)
Max diff for layer detector/yolo-v3/Conv_6/BiasAdd/YoloRegion = 20.39262580871582 (avg 1.1951444149017334)

NCS2 FP16 vs FP32
Max diff for layer detector/yolo-v3/Conv_14/BiasAdd/YoloRegion = 21.495864868164062 (avg 1.2882004976272583)
Max diff for layer detector/yolo-v3/Conv_22/BiasAdd/YoloRegion = 0.06138873100280762 (avg 0.00450884411111474)
Max diff for layer detector/yolo-v3/Conv_6/BiasAdd/YoloRegion = 20.39262580871582 (avg 1.1951444149017334)

GPU FP16 vs FP32
Max diff for layer detector/yolo-v3/Conv_14/BiasAdd/YoloRegion = 0.023189589381217957 (avg 0.002240594243630767)
Max diff for layer detector/yolo-v3/Conv_22/BiasAdd/YoloRegion = 0.022297099232673645 (avg 0.0016955230385065079)
Max diff for layer detector/yolo-v3/Conv_6/BiasAdd/YoloRegion = 0.05176454782485962 (avg 0.004943516571074724)
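
The per-layer figures in this thread follow the calculation described in the first post: np.absolute of the FP16 output minus the FP32 output, then the maximum and the mean. A minimal sketch of that calculation is below. It is not the coco.py or custom.py script from the shared package: compare_outputs and print_report are hypothetical helpers that assume each device's results are already available as a dict mapping output-layer name to a NumPy array, and the arrays in the example are synthetic.

```python
import numpy as np


def compare_outputs(fp16_outputs, fp32_outputs):
    """Return {layer name: (max abs diff, mean abs diff)} for layers present in both dicts."""
    report = {}
    for name in sorted(set(fp16_outputs) & set(fp32_outputs)):
        a = np.asarray(fp16_outputs[name], dtype=np.float32)
        b = np.asarray(fp32_outputs[name], dtype=np.float32)
        if a.shape != b.shape:
            raise ValueError(f"shape mismatch for {name}: {a.shape} vs {b.shape}")
        diff = np.absolute(a - b)
        report[name] = (float(diff.max()), float(diff.mean()))
    return report


def print_report(report):
    """Print one line per layer in the same format as the results in this thread."""
    for name, (max_diff, avg_diff) in report.items():
        print(f"Max diff for layer {name} = {max_diff} (avg {avg_diff})")


# Synthetic example: one layer with a uniform offset, one with a single outlier.
fp32_out = {
    "layer_a": np.zeros((2, 2), dtype=np.float32),
    "layer_b": np.zeros((2, 2), dtype=np.float32),
}
fp16_out = {
    "layer_a": fp32_out["layer_a"] + 0.5,
    "layer_b": fp32_out["layer_b"].copy(),
}
fp16_out["layer_b"][0, 0] = 20.0

report = compare_outputs(fp16_out, fp32_out)
print_report(report)
```

Reporting both numbers is useful because they separate two failure shapes: a single outlier gives a large maximum with a modest average, while a layer that is wrong everywhere raises both.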

J__Niko · ‎01-27-2019

Hi, thanks, this confirms it. GPU FP16 is practically the same as CPU FP32 (the small differences come from FP16 being less precise than FP32), but two layers on both sticks (NCS1 and NCS2) are having a really bad time. I guess there is not much we can do about this.

Thank you for running the tests!

Joseph_M_Intel · ‎02-15-2019

J, Niko wrote:
Oh and yes, this was on 2018.R5 as 2018.R4 doesn't support Resampling layer at all.

Can I ask why the resampling layer isn't supported? I am trying to understand why. Thank you for your time.

Or are you saying R5 DOES have support for Resampling layer?

J__Niko · ‎02-18-2019

The Resampling layer was not supported in OpenVINO 2018.R4. I don't know why; the documentation claimed it was supported, but it was not. As of OpenVINO 2018.R5 it is actually supported. The other issue is more important, though: it's not possible to run custom-trained Yolo models on Myriad.

Batra__Dhruv · ‎03-01-2019

Hey, were you able to convert the yolov3 weights to a .pb file and further optimize it using OpenVINO for just one "SINGLE" class?

Every time I pass my weights through the Mystic123 GitHub project for converting yolov3 weights to a TensorFlow (.pb) file, it gives me an error:

./convert_weights_pb.py --class_name coco.names --data_format NHWC --weight_file yolov3.weights
Traceback (most recent call last):
  File "./convert_weights_pb.py", line 52, in <module>
    tf.app.run()
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/platform/app.py", line 125, in run
    _sys.exit(main(argv))
  File "./convert_weights_pb.py", line 42, in main
    load_ops = load_weights(tf.global_variables(scope='detector'), FLAGS.weights_file)
  File "/home/night/Desktop/github/tensorflow-yolo-v3/utils.py", line 115, in load_weights
    (shape[3], shape[2], shape[0], shape[1]))
ValueError: cannot reshape array of size 4607 into shape (18,256,1,1)

Please share how you converted a single-class trained model through OpenVINO.
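
The target shape in the traceback can be checked by hand. In YoloV3 each detection convolution has 3 * (5 + num_classes) filters (3 anchors per scale; 4 box coordinates, 1 objectness score and the class scores per anchor), so (18, 256, 1, 1) corresponds to a 1-class head and needs 18 * 256 = 4608 values, one more than the 4607 the loader had left. The sketch below only reproduces that arithmetic. The helper names are hypothetical, and it neither reads the weights file nor diagnoses the cause.

```python
import math


def detection_filters(num_classes, anchors_per_scale=3):
    """Filters in a YoloV3 detection conv: per anchor, 4 box coords + 1 objectness + classes."""
    return anchors_per_scale * (5 + num_classes)


def classes_from_filters(filters, anchors_per_scale=3):
    """Invert detection_filters; raises if the filter count does not fit the formula."""
    per_anchor, remainder = divmod(filters, anchors_per_scale)
    if remainder or per_anchor < 6:
        raise ValueError(f"{filters} filters do not match a YoloV3 detection layer")
    return per_anchor - 5


target_shape = (18, 256, 1, 1)      # shape from the ValueError above
needed = math.prod(target_shape)    # values the reshape needs (Python 3.8+)
available = 4607                    # values the loader had left, per the error message

print(classes_from_filters(target_shape[0]))  # class count implied by the shape
print(detection_filters(80))                  # filter count for the 80-class COCO model
print(needed - available)                     # shortfall in values
```

A names file whose line count differs from the class count the weights were trained with changes these filter counts, which is one way such reshape errors can arise. Whether that is the cause here cannot be established from the traceback alone.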

Hyodo__Katsuya · ‎03-01-2019

The following may be helpful. https://github.com/mystic123/tensorflow-yolo-v3/issues/64

Batra__Dhruv · ‎03-01-2019

@Hyodo, Katsuya

root@night:/opt/github/tensorflow-yolo-v3# python3 convert_weights_pb.py
Traceback (most recent call last):
  File "convert_weights_pb.py", line 52, in <module>
    tf.app.run()
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/platform/app.py", line 125, in run
    _sys.exit(main(argv))
  File "convert_weights_pb.py", line 42, in main
    load_ops = load_weights(tf.global_variables(scope='detector'), FLAGS.weights_file)
  File "/opt/github/tensorflow-yolo-v3/utils.py", line 115, in load_weights
    (shape[3], shape[2], shape[0], shape[1]))
ValueError: cannot reshape array of size 211871 into shape (256,128,3,3)

That didn't work out. Can you share your coco.names file?

Tegwyn_Twmffat · ‎03-16-2019

@Batra, Dhruv

A coco.names file containing only "person" fixed the problem for me. Also, keep your images no bigger than 224 x 224 or NCS2 will fail.

@JOSEPH M. (Intel)

By the way, I have been trying to port my single-class YoloV3 model to NCS2 as above and can confirm the same problem J. Niko had with the multiple random bounding boxes. The model worked fine in the YoloV3 environment but failed in the conversion to the .bin file, I guess.

Obviously this needs fixing ...... I'm a very patient person, but any idea when? Thanks!