Intel® Distribution of OpenVINO™ Toolkit
Community assistance about the Intel® Distribution of OpenVINO™ toolkit, OpenCV, and all aspects of computer vision on Intel® platforms.

Help in decoding the YOLOv3-tiny outputs in the demo file `object_detection_demo_yolov3_async.py`

Dandriyal__Prashant

Hi, I am using the `object_detection_demo_yolov3_async.py` demo provided in OpenVINO 2020.1.023. I am having trouble understanding how the results are decoded. They are initially obtained as:

```

#Layer                                              |  Feature map shape
#detector/yolo-v3-tiny/Conv_12/BiasAdd/YoloRegion   |   (1, 255, 26, 26)
#detector/yolo-v3-tiny/Conv_9/BiasAdd/YoloRegion    |   (1, 255, 13, 13)

```

These results are then flattened, and the confidence and coordinates are extracted rather mysteriously using `obj_index = entry_index(params.side, params.coords, params.classes, n * side_square + i, params.coords)`,

where the function is defined as:

```

def entry_index(side, coord, classes, location, entry):
    side_power_2 = side ** 2
    n = location // side_power_2
    loc = location % side_power_2
    return int(side_power_2 * (n * (coord + classes + 1) + entry) + loc)

```
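As far as the quoted code shows, `entry_index` just recomputes the position that element `[0, n*(coords+classes+1) + entry, row, col]` of the `(1, C, side, side)` blob has after flattening in C order, namely `c*side**2 + row*side + col`. The `location` argument packs the box number and cell as `n*side**2 + i`, so `//` recovers `n` and `%` recovers the cell index `i = row*side + col`. The minimal NumPy sketch below checks that equivalence on a dummy blob; it assumes an 80-class model with 3 boxes per cell and NumPy's default C-order `flatten()`, and the variable names are illustrative only.

```python
import numpy as np


def entry_index(side, coord, classes, location, entry):
    # Same logic as the demo's helper quoted above
    side_power_2 = side ** 2
    n = location // side_power_2   # box (anchor) number
    loc = location % side_power_2  # flat cell index: row * side + col
    return int(side_power_2 * (n * (coord + classes + 1) + entry) + loc)


side, coords, classes, num = 13, 4, 80, 3  # assumed: 80 classes, 3 boxes per cell
box_len = coords + classes + 1             # 85 values per box

# Dummy blob laid out like the IR output: (1, num * box_len, side, side)
blob = np.arange(num * box_len * side * side, dtype=np.float32).reshape(
    1, num * box_len, side, side)
flat = blob.flatten()                      # C order, like the demo's flattened blob

row, col, n = 5, 7, 2
i = row * side + col                       # flat cell index
entry = coords                             # entry 4 is the objectness score
flat_value = flat[entry_index(side, coords, classes, n * side * side + i, entry)]
direct_value = blob[0, n * box_len + entry, row, col]
print(flat_value == direct_value)          # True
```

In other words, the "mysterious" arithmetic is only manual index bookkeeping for a flattened 4D array.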

Please help me understand the process. Why are the results extracted from the flattened blob this way?

Eduard_Zamaliev
Employee

Hi, thank you for the feedback; we will simplify the output postprocessing.

Back to your question: this is mostly legacy code, kept for compatibility with the V2 version. You do not need to flatten the output in your app; you can use array slices to access a particular box, e.g.:

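```
def get_box(blob, i, j, n):
    # blob shape: (1, 255, H, W) = 3 boxes x 85 values (4 coords + 1 objectness + 80 classes)
    # i, j: grid cell (row, column); n: box index within that cell
    return blob[0, n*85:(n+1)*85, i, j]
```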
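A minimal sketch of the slicing approach, assuming an 80-class model with 3 boxes per cell. The per-box order (x, y, w, h, objectness, class scores) is inferred from the demo using `entry = params.coords` for the objectness entry; `split_box` is a hypothetical helper, not part of the demo. It also shows that reshaping the channel axis once gives every box without any flattening.

```python
import numpy as np

COORDS, CLASSES, NUM = 4, 80, 3   # assumed: 80-class model, 3 boxes per cell
BOX_LEN = COORDS + 1 + CLASSES    # 85 values per box


def get_box(blob, i, j, n):
    # Cell (row i, column j), box n: one 85-value vector
    return blob[0, n * BOX_LEN:(n + 1) * BOX_LEN, i, j]


def split_box(box):
    # Assumed order inside each box vector: x, y, w, h, objectness, class scores
    x, y, w, h = box[:COORDS]
    return (x, y, w, h), box[COORDS], box[COORDS + 1:]


blob = np.random.rand(1, NUM * BOX_LEN, 13, 13).astype(np.float32)

# Without flattening: view the channel axis as (box, value) once
boxes = blob[0].reshape(NUM, BOX_LEN, 13, 13)
assert np.array_equal(boxes[2, :, 5, 7], get_box(blob, 5, 7, 2))

coords, objectness, class_scores = split_box(get_box(blob, 5, 7, 2))
print(class_scores.shape)  # (80,)
```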

 

Or you can wait a few days while we fix it in Open Model Zoo.

Dandriyal__Prashant

Thanks for reaching out. Can you explain how i, j, and n are determined? Where do they come from?

 

Eduard_Zamaliev
Employee

Sorry for the delay.

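The YOLO output (in the IR) can be described as a 3D tensor over [Cy, Cx, N*B] (with the batch set to 1 for simplicity), where Cx, Cy is the grid size. For each cell, the network predicts N bounding boxes, each a vector of B values containing the coordinates, objectness, class probabilities, etc. So i and j index the cell and n is the bounding-box number within it. In the blob itself the channel dimension comes first, so the actual shape is (1, N*B, Cy, Cx), e.g. (1, 255, 13, 13) with N = 3 and B = 85; that is why `get_box` slices the second axis and uses i as the row and j as the column.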
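To make i, j and n concrete, here is a minimal NumPy sketch that walks one output blob cell by cell. It assumes an 80-class model with 3 boxes per cell, and that the network has already applied the logistic function to x, y, objectness and class scores (the demo appears to use these values directly). `decode_region` is a hypothetical helper, and width/height decoding is left out because it also needs the anchor sizes.

```python
import numpy as np

COORDS, CLASSES, NUM = 4, 80, 3   # assumed: 80-class model, 3 boxes per cell
BOX_LEN = COORDS + 1 + CLASSES    # 85


def decode_region(blob, threshold=0.5):
    """Yield (i, j, n, objectness, class_id, cx, cy) for each box above threshold.

    cx, cy are the box center normalized to [0, 1]; width and height are
    omitted because they also need the anchor sizes.
    """
    _, channels, rows, cols = blob.shape
    if channels != NUM * BOX_LEN:
        raise ValueError(f"expected {NUM * BOX_LEN} channels, got {channels}")
    boxes = blob[0].reshape(NUM, BOX_LEN, rows, cols)
    for i in range(rows):              # i: grid row (y)
        for j in range(cols):          # j: grid column (x)
            for n in range(NUM):       # n: box index within the cell
                box = boxes[n, :, i, j]
                objectness = float(box[COORDS])
                if objectness < threshold:
                    continue
                class_id = int(np.argmax(box[COORDS + 1:]))
                cx = (j + float(box[0])) / cols   # cell offset + in-cell x
                cy = (i + float(box[1])) / rows   # cell offset + in-cell y
                yield i, j, n, objectness, class_id, cx, cy


# Usage: one confident box planted in cell (row 5, column 7), box 2, class 17
blob = np.zeros((1, NUM * BOX_LEN, 13, 13), dtype=np.float32)
base = 2 * BOX_LEN
blob[0, base + 0, 5, 7] = 0.5     # x offset inside the cell
blob[0, base + 1, 5, 7] = 0.5     # y offset inside the cell
blob[0, base + COORDS, 5, 7] = 0.9
blob[0, base + COORDS + 1 + 17, 5, 7] = 1.0
detections = list(decode_region(blob))
print(detections[0][:3], detections[0][4])  # (5, 7, 2) 17
```

For the 26x26 output the same loop applies unchanged; only `rows` and `cols` differ.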

You could also look at the model's description and wait for the PR with simplified postprocessing.
