I am running the YOLOv3 object detection demo provided with OpenVINO, and I am getting different results from the Python implementation and the C++ implementation.
For the C++ implementation, I run the following command from the directory ~/inference_engine_samples/intel64/Release:
./object_detection_demo_yolov3_async -i test.mov -m yolo_v3-FP16.xml -d MYRIAD
For the Python implementation, I run the following command from the directory ~/intel/computer_vision_sdk/deployment_tools/inference_engine/samples/python_samples:
python3 object_detection_demo_yolov3.py -i test.mov -m yolo_v3-FP16.xml -d MYRIAD
The C++ demo detects many more objects than the Python version, even though I am using the exact same model and video for both.
I tried running on NCS1, NCS2 and CPU (for CPU I changed the model to FP32), and I am using the R5 release.
I also noticed different results when trying to convert the interactive face detection demo from C++ to Python.
I verified that the network gets the exact same input, and printed the first 10 output values of the 13x13 output head in Python and in C++. The values are slightly different:
[0.4868164 , 0.5966797 , 0.5234375 , 0.47583008, 0.50878906,
0.515625 , 0.5463867 , 0.48388672, 0.43481445, 0.53271484]
[0.492188, 0.600586, 0.507324, 0.481201, 0.505859,
0.518066, 0.543457, 0.490479, 0.431396, 0.536621]
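To put a number on "slightly different", the two printouts above can be compared element by element. This is only a sketch: the names first and second follow the order of the printouts (the post does not label which runtime produced which list), and the tolerance value is an assumption picked for illustration, not a documented bound.

```python
import numpy as np

# The two printouts from the post, in the order given. Which runtime
# produced which list is not labeled there, so neutral names are used.
first = np.array([0.4868164, 0.5966797, 0.5234375, 0.47583008, 0.50878906,
                  0.515625, 0.5463867, 0.48388672, 0.43481445, 0.53271484])
second = np.array([0.492188, 0.600586, 0.507324, 0.481201, 0.505859,
                   0.518066, 0.543457, 0.490479, 0.431396, 0.536621])

# Element-wise absolute difference and the position of the largest one.
abs_diff = np.abs(first - second)
max_abs_diff = float(abs_diff.max())
worst_index = int(abs_diff.argmax())

# Assumed tolerance, chosen only for illustration.
TOLERANCE = 1e-3
within_tolerance = bool(np.allclose(first, second, rtol=0.0, atol=TOLERANCE))

print(f"max abs diff: {max_abs_diff:.6f} at index {worst_index}")
print(f"within {TOLERANCE}: {within_tolerance}")
```

For these ten values the largest gap is at the third element (0.5234375 vs 0.507324), so the outputs differ by more than a last-digit rounding effect.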
So I found that there's a bug in the Python script.
Line 282 should be fixed to:
if obj['xmax'] > origin_im_size or obj['ymax'] > origin_im_size or obj['xmin'] < 0 or obj['ymin'] < 0:
After fixing this, the same objects are detected in both C++ and Python.
I still find it strange that the outputs of the network are a bit different between Python and C++.
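For anyone reading along, the kind of check that line performs can be sketched as a small standalone function. This is not the demo's code: box_inside_image is a hypothetical helper, and it assumes origin_im_size is a (height, width) pair, which the post does not state.

```python
def box_inside_image(obj, origin_im_size):
    """Return True if the detection box lies fully inside the frame.

    Hypothetical helper for illustration only. origin_im_size is ASSUMED
    to be a (height, width) pair; the post does not say how it is laid out.
    """
    height, width = origin_im_size
    return (obj['xmin'] >= 0 and obj['ymin'] >= 0
            and obj['xmax'] <= width and obj['ymax'] <= height)


# Made-up boxes on a 480x640 (height x width) frame.
frame_size = (480, 640)
boxes = [
    {'xmin': 10, 'ymin': 20, 'xmax': 600, 'ymax': 400},   # inside
    {'xmin': 10, 'ymin': 20, 'xmax': 600, 'ymax': 500},   # ymax > height
    {'xmin': -5, 'ymin': 20, 'xmax': 100, 'ymax': 100},   # xmin < 0
]
kept = [b for b in boxes if box_inside_image(b, frame_size)]
print(len(kept))
```

Note that the first box has xmax = 600, which is larger than the frame height. If the width and height comparisons were swapped, a wide box like this one would be dropped by mistake, which is one way a check of this kind could discard valid detections.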
Nice investigation! I am seeing similar issues, but I also noticed that results for some networks are very sensitive to input perturbation. Here it seems the inputs are similar but not identical. To draw better conclusions, it would be nice if you could feed the bit-exact same input to both the C++ and the Python code. In the case of YOLOv3, I am seeing different results even if I only change the input resizing algorithm.
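One way to get a bit-exact input on both sides is to dump the preprocessed tensor to a raw binary file once and have both programs read that file instead of resizing the frame themselves. A minimal sketch of the Python half, under stated assumptions: the 1x3x416x416 float32 shape, the file name and the tensor_digest helper are all made up for illustration, and the random tensor stands in for a real preprocessed frame.

```python
import hashlib
import os
import tempfile

import numpy as np


def tensor_digest(tensor):
    """SHA-256 over the raw float32 bytes, so equal digests mean equal bits."""
    data = np.ascontiguousarray(tensor, dtype=np.float32)
    return hashlib.sha256(data.tobytes()).hexdigest()


# Stand-in for the preprocessed frame; shape and dtype are assumptions.
rng = np.random.default_rng(0)
input_tensor = rng.random((1, 3, 416, 416), dtype=np.float32)

# Write the tensor as raw float32 values with no header, so a C++ program
# can read the same bytes straight into a float buffer.
path = os.path.join(tempfile.mkdtemp(), "input_f32.bin")
input_tensor.tofile(path)

# Read it back the way the second program would and confirm nothing changed.
reloaded = np.fromfile(path, dtype=np.float32).reshape(input_tensor.shape)
same_bits = tensor_digest(reloaded) == tensor_digest(input_tensor)
print(same_bits)
```

Printing the digest in both programs right before inference would show whether the inputs really match, which separates preprocessing differences from differences inside the runtime.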
Looking forward to more results and conclusions from this thread.