Help in decoding the YOLOv3-tiny outputs in the demo file `object_detection_demo_yolov3_async.py`

Dandriyal__Prashant · ‎05-04-2020

Hi, I am using the `object_detection_demo_yolov3_async.py` demo file provided with OpenVINO 2020.1.023. I am having trouble understanding how the results are decoded. The outputs initially obtained are:

```

#Layer | Feature map shape
#detector/yolo-v3-tiny/Conv_12/BiasAdd/YoloRegion | (1, 255, 26, 26)
#detector/yolo-v3-tiny/Conv_9/BiasAdd/YoloRegion | (1, 255, 13, 13)

```

But then these results are flattened, and the confidence and coordinates are extracted in a way I cannot follow, using the call `obj_index = entry_index(params.side, params.coords, params.classes, n * side_square + i, params.coords)`

where the function definition is as:

```

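def entry_index(side, coord, classes, location, entry):
    side_power_2 = side ** 2
    # location = n * side**2 + loc: n is the box number, loc the flat cell index
    n = location // side_power_2
    loc = location % side_power_2
    # Flat offset of channel (n * (coord + classes + 1) + entry) at cell loc
    return int(side_power_2 * (n * (coord + classes + 1) + entry) + loc)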

```

Please help me understand the process. Why are the results extracted this way from the flattened blob ?

Eduard_Zamaliev · ‎05-08-2020

Hi, thank you for the feedback. We will simplify the output postprocessing.

Back to your question: this is mostly legacy code, kept for compatibility with the V2 version. You do not need to flatten the output in your app; you can use array slices to access a given box, e.g.:

def get_box(blob, i, j, n):

    return blob[0, n*85:(n+1)*85, i, j]

Or you can wait a few days while we fix it here: Open Model Zoo
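
To connect the two approaches, here is a minimal sketch that checks the flattened `entry_index` lookup against the slice-based `get_box`. It assumes NumPy, a 13x13 grid, 3 boxes per cell, 4 coordinates and 80 classes (so 3 * 85 = 255 channels), and uses a synthetic blob instead of a real network output. The `row` and `col` names are only illustrative.

```python
import numpy as np

def entry_index(side, coord, classes, location, entry):
    side_power_2 = side ** 2
    n = location // side_power_2
    loc = location % side_power_2
    return int(side_power_2 * (n * (coord + classes + 1) + entry) + loc)

# Assumed layout: 13x13 grid, 3 boxes per cell, 4 coords + 1 objectness + 80 classes.
side, coords, classes, num = 13, 4, 80, 3
box_size = coords + 1 + classes  # 85 values per box

# Synthetic stand-in for the output blob: each element stores its own flat offset.
blob = np.arange(num * box_size * side * side, dtype=np.float32)
blob = blob.reshape(1, num * box_size, side, side)
flat = blob.flatten()

def get_box(blob, i, j, n):
    # All 85 values of box n in the cell at row i, column j.
    return blob[0, n * box_size:(n + 1) * box_size, i, j]

n, row, col = 2, 5, 7
cell = row * side + col  # flat cell index, as used with the flattened blob

# Objectness via the flattened blob (entry == coords selects the objectness slot).
obj_flat = flat[entry_index(side, coords, classes, n * side * side + cell, coords)]
# Same value via slicing the unflattened blob.
obj_slice = get_box(blob, row, col, n)[coords]

print(obj_flat == obj_slice)
```

In other words, `entry_index` only reproduces the row-major offset of channel `n * 85 + entry` at cell `(row, col)`, which is why slicing the 4D blob directly gives the same values.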

Dandriyal__Prashant · ‎05-14-2020

Thanks for reaching out. Can you explain how i, j and n are determined? I mean, where do they come from?

Eduard_Zamaliev · ‎05-20-2020

Sorry for the delay.

The YOLO output (in IR) can be described as a 3D tensor with shape [Cy, Cx, N*B] (with the batch size set to 1 for simplicity), where Cx and Cy give the grid size. In the actual blob the N*B axis comes first, as in the (1, 255, 26, 26) shape above. For each cell the net predicts N bounding boxes, each described by B values (coordinates, probabilities, etc.). So i and j are the indices of the cell, and n is the bounding box number.
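
As a rough illustration of where i, j and n come from, the sketch below simply loops over every cell and every box of one output blob. It assumes NumPy, 3 boxes per cell, objectness stored at position 4 of each box (after the 4 coordinates), and values already scaled to [0, 1]. The `iter_boxes` helper and the threshold are hypothetical and not part of the demo.

```python
import numpy as np

def iter_boxes(blob, num=3, threshold=0.5):
    """Yield (i, j, n, box) for every box whose objectness passes the threshold.

    Assumes blob has shape (1, num * box_size, Cy, Cx) and that entry 4 of
    each box is the objectness score (hypothetical helper, not from the demo).
    """
    _, channels, cy, cx = blob.shape
    box_size = channels // num  # e.g. 255 // 3 == 85
    for n in range(num):            # box number within the cell
        for i in range(cy):         # cell row
            for j in range(cx):     # cell column
                box = blob[0, n * box_size:(n + 1) * box_size, i, j]
                if box[4] >= threshold:
                    yield i, j, n, box

# Synthetic blob with a single confident box: box 1 in the cell at row 3, column 6.
blob = np.zeros((1, 255, 13, 13), dtype=np.float32)
blob[0, 1 * 85 + 4, 3, 6] = 0.9

hits = [(i, j, n) for i, j, n, _ in iter_boxes(blob)]
print(hits)
```

The same loop works for the (1, 255, 26, 26) output, since the grid size is read from the blob shape.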

You could also look at the model's description and wait for the PR with simplified postprocessing.