Hi, I am using the _object_detection_demo_yolov3_async.py_ demo file provided in OpenVINO 2020.1.023. I am having trouble understanding how the results are processed. The outputs initially obtained are:
#Layer | Feature map shape
#detector/yolo-v3-tiny/Conv_12/BiasAdd/YoloRegion | (1, 255, 26, 26)
#detector/yolo-v3-tiny/Conv_9/BiasAdd/YoloRegion | (1, 255, 13, 13)
These results are then flattened, and the confidence and coordinates are extracted, rather mysteriously, using the call _obj_index = entry_index(params.side, params.coords, params.classes, n * side_square + i, params.coords)_
where the function is defined as:
def entry_index(side, coord, classes, location, entry):
    side_power_2 = side ** 2
    n = location // side_power_2
    loc = location % side_power_2
    return int(side_power_2 * (n * (coord + classes + 1) + entry) + loc)
Please help me understand the process. Why are the results extracted from the flattened blob this way?
Hi, thank you for the feedback. We will simplify the output postprocessing.
Back to your question. This is mostly legacy code, written for compatibility with the V2 version. You do not need to flatten the output in your app; you can use array slices to access a given box, e.g.:
def get_box(blob, i, j, n): return blob[0, n*85:(n+1)*85, i, j]
Or you can wait a few days while we fix it in Open Model Zoo.
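To see why the flattened lookup and the slice-based lookup agree, here is a minimal sketch. It assumes a dummy blob with the tiny YOLOv3 shape from the question (13x13 grid, 3 boxes per cell, 4 coordinates, 80 classes); the helper names flat_lookup and direct_lookup are made up for illustration and are not part of the demo.

```python
import numpy as np

def entry_index(side, coord, classes, location, entry):
    # Same function as in the demo, copied from the question.
    side_power_2 = side ** 2
    n = location // side_power_2
    loc = location % side_power_2
    return int(side_power_2 * (n * (coord + classes + 1) + entry) + loc)

# Assumed parameters matching the (1, 255, 13, 13) output: 255 = 3 * (4 + 1 + 80).
side, coords, classes, num = 13, 4, 80, 3
box_size = coords + 1 + classes  # 85 values per box

# Dummy blob whose values equal their own flat position, so lookups are easy to check.
blob = np.arange(num * box_size * side * side, dtype=np.float32)
blob = blob.reshape(1, num * box_size, side, side)
flat = blob.flatten()

def flat_lookup(n, entry, i, j):
    # The demo's way: build a "location" from box n and cell (i, j), then index the flat array.
    location = n * side * side + i * side + j
    return flat[entry_index(side, coords, classes, location, entry)]

def direct_lookup(n, entry, i, j):
    # The slice-based way: channel n * 85 + entry, row i, column j.
    return blob[0, n * box_size + entry, i, j]

all_match = all(
    flat_lookup(n, e, i, j) == direct_lookup(n, e, i, j)
    for n in range(num)
    for e in range(box_size)
    for i in range(side)
    for j in range(side)
)
print(all_match)
```

In other words, entry_index just recomputes the row-major offset of element [0, n * 85 + entry, i, j], which is why indexing the 4D blob directly gives the same values.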
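Sorry for the delay.
The YOLO output (in IR) can be described as a 3D tensor with shape [Cy, Cx, N*B] (with the batch size set to 1 for simplicity), where Cx and Cy give the grid size. For each cell, the network predicts N bounding boxes, each described by B values (coordinates, probabilities, etc.). In the blobs you listed, the N*B dimension comes first: (1, 255, 26, 26) means N*B = 255 = 3 * 85 on a 26x26 grid. The indices i, j and n in get_box are the cell row, the cell column, and the bounding box number.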
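As a rough sketch of that layout, the N*B axis can be split with a reshape so that each box's B values are addressable by name. It assumes N = 3 and B = 85 (4 coordinates, 1 objectness score, 80 class scores, in that order, which is what the entry_index calls in the demo imply); the random blob and the helper read_box are made up for illustration.

```python
import numpy as np

# Assumed layout: 3 boxes per cell, each with 4 coords + 1 objectness + 80 class scores.
num, coords, classes = 3, 4, 80
box_size = coords + 1 + classes  # B = 85
side = 26                        # grid size of the (1, 255, 26, 26) output

# Stand-in for a real network output blob.
rng = np.random.default_rng(0)
blob = rng.random((1, num * box_size, side, side)).astype(np.float32)

# Split the channel axis N*B into (N, B); no data is copied or reordered.
boxes = blob[0].reshape(num, box_size, side, side)

def read_box(boxes, n, row, col):
    # Return (coordinates, objectness, class scores) for box n in cell (row, col).
    entry = boxes[n, :, row, col]
    return entry[:coords], entry[coords], entry[coords + 1:]

xywh, objectness, class_scores = read_box(boxes, n=1, row=5, col=7)
print(xywh.shape, class_scores.shape)
```

Here boxes[n, :, row, col] is the same slice that get_box(blob, row, col, n) returns, only with the box number as its own axis.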