New Contributor I

How do I find out how to interpret or "decode" inference output?

This may be a dumb question, but how do I find out the format of the inference output for a given model?

To be specific, I downloaded Mobilenet-SSDv2_coco from the model zoo using the model_downloader:

~/intel/openvino/deployment_tools/tools/model_downloader$ ./ --name ssd_mobilenet_v2_coco

I then ran the model optimizer to get the bin and xml model files in FP16 format for Myriad to be used in:

net = cv2.dnn.readNet('ssdv2coco.xml', 'ssdv2coco.bin')

(I renamed the *.bin and *.xml files from the to something more meaningful to me)

What I can't find is the format of the output after running the inference:

inference_results = net.forward()

Where is this information to be found? Also, where do I find the labels for the detections or classifications from a given model?


Hi Walter,

forward() returns a cv::Mat containing the blob for the first output of the specified layer.

You can check the § forward() [1/4] from the documentation 


New Contributor I

I assumed the format of the returned output data would be network dependent. It would be incredibly helpful if the output format were standardized.

So you seem to be saying that if I use a different model, the box points, confidence, and object index are still in the same locations when I loop over the results matrix?

conf = inference_results[0, 0, i, 2]   # extract the confidence (i.e., probability) 
idx = int(inference_results[0, 0, i, 1])   # extract the index of the class label
boxPoints = inference_results[0, 0, i, 3:7]

This will make my task of evaluating several different models  a whole lot easier!
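As a rough illustration of the indexing above, here is a minimal sketch that decodes an SSD-style output blob. It assumes each row has the layout [image_id, class_id, confidence, xmin, ymin, xmax, ymax] with box coordinates normalized to 0..1, which matches the snippets in this thread but is not guaranteed for every model. The helper name decode_detections and the synthetic fake_output array are made up for the example, with fake_output standing in for the result of net.forward().

```python
import numpy as np

def decode_detections(output, frame_width, frame_height, min_confidence=0.3):
    """Turn an SSD-style output blob into (class_id, confidence, box) tuples.

    Assumption: each row is
    [image_id, class_id, confidence, xmin, ymin, xmax, ymax]
    with box coordinates normalized to the 0..1 range.
    """
    scale = np.array([frame_width, frame_height, frame_width, frame_height])
    results = []
    # Flatten (1, 1, N, 7) to (N, 7) so each row is one detection
    for row in np.asarray(output).reshape(-1, 7):
        confidence = float(row[2])
        if confidence < min_confidence:
            continue  # skip weak detections
        box = tuple(int(v) for v in (row[3:7] * scale).astype(int))
        results.append((int(row[1]), confidence, box))
    return results

# Synthetic stand-in for net.forward(); shape (1, 1, 2, 7)
fake_output = np.array(
    [[[[0, 1, 0.9, 0.25, 0.5, 0.75, 1.0],
       [0, 82, 0.1, 0.0, 0.0, 0.5, 0.5]]]],
    dtype=np.float32,
)

detections = decode_detections(fake_output, frame_width=400, frame_height=200)
for class_id, confidence, box in detections:
    print(class_id, round(confidence, 2), box)
```

Before relying on this layout for a new model, it is worth printing the shape of the array returned by net.forward() and confirming that the last dimension is 7.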


I still have a question about where to get the label files (the map of indices to object names) for the different models in the model zoo, as they don't seem to be included in what the model downloader script downloads.


New Contributor I

My converted MobilenetSSD-v2_coco model appears to be running fine and detecting objects, but how do I find the labels for the object types?

For example, a test image returned two detections with confidence > 0.3:

[ 0.          1.          0.97314453  0.22705078  0.31835938  0.47070312
[  0.          82.           0.35913086   0.21923828   0.32080078
   0.47607422   0.95654297]

What objects are represented by indices 1 and 82?

Obviously I can "guess" from where the boxes are drawn for the highest confidence, but the label file for this model downloaded from the "model zoo" should be available somewhere.


The model in question was trained on COCO, so it uses the COCO labels. Can be found, for example, here

New Contributor I

Sergei N. (Intel) wrote:

The model in question was trained on COCO, so it uses the COCO labels. Can be found, for example, here

I'm still seeing ambiguity. I got this pbtxt file when I asked on the PyImageSearch "help line"

It has 80 items, as does the list in your link. But on some test images I got "hits" for items 1 and 82 with this simple sample code:

import cv2
import numpy as np

# Load the model
#net = cv2.dnn.readNet('face-detection-adas-0001.xml', 'face-detection-adas-0001.bin')
net = cv2.dnn.readNet("mobilenet_ssd_v2/MobilenetSSDv2coco.xml", "mobilenet_ssd_v2/MobilenetSSDv2coco.bin")
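# Specify target device (line missing from the post; Myriad assumed, as in the Intel sample)
net.setPreferableTarget(cv2.dnn.DNN_TARGET_MYRIAD)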

# Read an image
#frame = cv2.imread('../Pictures/dw_cam_test.jpg')
frame = cv2.imread('test1.jpg')

# Prepare input blob and perform an inference
#blob = cv2.dnn.blobFromImage(frame, size=(672, 384), ddepth=cv2.CV_8U)
#blob = cv2.dnn.blobFromImage(frame, size=(300, 300), ddepth=cv2.CV_8U)
blob = cv2.dnn.blobFromImage(frame, size=(300, 300))

if 0:  # Intel example, didn't need numpy
  out = net.forward()
  # Draw detected object on the frame
  for detection in out.reshape(-1, 7):
    confidence = float(detection[2])
    if confidence > 0.3:
        xmin = int(detection[3] * frame.shape[1])
        ymin = int(detection[4] * frame.shape[0])
        xmax = int(detection[5] * frame.shape[1])
        ymax = int(detection[6] * frame.shape[0])
        cv2.rectangle(frame, (xmin, ymin), (xmax, ymax), color=(255, 255, 255))
else: # PyImageSearch tutorial dnn Caffe MobilenetSSD v1 tutorial code
  (h, w) = frame.shape[:2]
  detections = net.forward()
  for i in np.arange(0, detections.shape[2]):
    conf = detections[0, 0, i, 2]
    idx = int(detections[0, 0, i, 1])
    if conf > 0.3:
        box = detections[0, 0, i, 3:7] * np.array([w, h, w, h])
        (startX, startY, endX, endY) = box.astype("int")
        cv2.rectangle(frame, (startX, startY), (endX, endY), (255, 255, 255))

# Save the frame to an image file
#cv2.imwrite('out.jpg', frame)
# Display the result and wait for a key press so the window actually renders
cv2.imshow("Result: hit a key to EXIT", frame)
cv2.waitKey(0)
cv2.destroyAllWindows()

So it can't be a linear mapping with the list you've linked. The pbtxt file seems to have the correct mapping, in that so far all my "hits" have had an idx that matches an item ID in the pbtxt file.
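To illustrate the non-linear mapping, here is a minimal sketch that reads a TensorFlow-style label map (pbtxt) into a dictionary keyed by item ID, so a lookup never depends on list position. The SAMPLE_PBTXT excerpt is hand-written for the example, its name values are placeholders, and the helpers parse_label_map and label_for are hypothetical. The two display names are what I would expect the COCO label map to contain for IDs 1 and 82, but please check them against your own pbtxt file.

```python
import re

# Hand-written excerpt in the label-map pbtxt style (not the full file).
# The "name" values are placeholders, not the real identifiers.
SAMPLE_PBTXT = '''
item {
  name: "placeholder-a"
  id: 1
  display_name: "person"
}
item {
  name: "placeholder-b"
  id: 82
  display_name: "refrigerator"
}
'''

def parse_label_map(text):
    """Return {item_id: display_name} for every item block in the text."""
    labels = {}
    for block in re.findall(r"item\s*\{(.*?)\}", text, flags=re.DOTALL):
        id_match = re.search(r"\bid:\s*(\d+)", block)
        name_match = re.search(r'display_name:\s*"([^"]*)"', block)
        if id_match and name_match:
            labels[int(id_match.group(1))] = name_match.group(1)
    return labels

def label_for(labels, class_id):
    """Look up a class ID, falling back to a readable marker for gaps."""
    return labels.get(class_id, "unknown id %d" % class_id)

labels = parse_label_map(SAMPLE_PBTXT)
print(label_for(labels, 1))
print(label_for(labels, 82))
print(label_for(labels, 12))  # an ID that is absent from this excerpt
```

Keying by ID rather than by list index is the point here: an 80-entry list indexed from 0 cannot represent an ID such as 82, while a dictionary handles gaps in the numbering naturally.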


For my purposes I only care about detecting "people", so knowing that idx == 1 means a person solves my issue. But I'm at the point where I want to compare different "off the shelf" AI models, especially on images that produce "bogus" detections. If the model downloader doesn't bring in the label mapping, it's going to be more difficult than necessary.


Have I got caught in the transition between 2019R1 and 2019R2?  I'm still using 2019R1


Using OpenVINO dnn with the original NCS, my frame rate dropped from ~9.8 fps with the Mobilenet-SSD Caffe model from the PyImageSearch OpenVINO tutorial to ~5.9 fps when I "dropped in" the v2 Tensorflow model I downloaded and "optimized". I guess it's to be expected, since v1 had 21 items compared to 80 in v2.


Dear Kulecz, Walter,

If OpenVINO 2019 R1 is working for you then I guess it's all good. But an awful lot of improvements and bug fixes went into OpenVINO 2019 R2, so really, you should be on the latest and greatest. We are actually on 2.01 now.

Anyhoo, here's your answer :

For Intel models, the labels are "integrated" into the demos, either as a standalone file (e.g. ) or actually within the code.

Public models are typically trained on ImageNet, Pascal VOC or COCO, so you should just use the label set for the appropriate dataset (all 3 are widely known and accessible).

Hope it helps.