This may be a dumb question, but how do I find out the format of the output from an inference with a given model?
To be specific, I downloaded Mobilenet-SSDv2_coco from the model zoo using the model_downloader:
~/intel/openvino/deployment_tools/tools/model_downloader$ ./downloader.py --name ssd_mobilenet_v2_coco
I then ran the model optimizer to get the bin and xml model files in FP16 format for Myriad to be used in:
net = cv2.dnn.readNet('ssdv2coco.xml', 'ssdv2coco.bin')
(I renamed the *.bin and *.xml files from the mo_tf.py to something more meaningful to me)
What I can't find is the format of the output after running the inference:
inference_results = net.forward()
Where is this information to be found? Also, where do I find the labels for the detections or classifications from a given model?
I assumed the format of the returned output data would be network dependent. It would be incredibly helpful if the output format were standardized.
So you seem to be saying that if I use a different model the box points, confidence, and index of the object are still in the same locations when I loop over the results matrix?
conf = inference_results[0, 0, i, 2]        # extract the confidence (i.e., probability)
idx = int(inference_results[0, 0, i, 1])    # extract the index of the class label
boxPoints = inference_results[0, 0, i, 3:7]
This will make my task of evaluating several different models a whole lot easier!
I still have a question about where to get the label files (the map of indices to object names) for the different models in the model zoo, as they don't seem to be included in what the model downloader script downloads.
My converted MobilenetSSD-v2_coco model appears to be running fine and detecting objects, but how do I find the labels for the object types?
For example, a test image returned two detections with confidence > 0.3:
[ 0. 1. 0.97314453 0.22705078 0.31835938 0.47070312 0.93652344]
[ 0. 82. 0.35913086 0.21923828 0.32080078 0.47607422 0.95654297]
What objects are represented by index 1 and 82?
Obviously I can "guess" by where the boxes are drawn for the highest confidence, but the label file for this model downloaded from the "model zoo" should be available somewhere.
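The two result rows quoted above can be decoded with a short helper. This is a minimal sketch, assuming each row follows the [image_id, class_id, confidence, xmin, ymin, xmax, ymax] layout used in this thread, with box coordinates normalized to the 0..1 range. The function name `decode_detections` and the 640x480 frame size are made up for illustration.

```python
def decode_detections(rows, frame_width, frame_height, threshold=0.3):
    """Turn SSD-style rows into (class_id, confidence, pixel_box) tuples.

    Each row is assumed to be
    [image_id, class_id, confidence, xmin, ymin, xmax, ymax]
    with box coordinates normalized to the 0..1 range.
    """
    results = []
    for row in rows:
        class_id = int(row[1])
        confidence = float(row[2])
        if confidence <= threshold:
            continue  # skip weak detections
        # Scale normalized coordinates to pixel coordinates.
        xmin = int(row[3] * frame_width)
        ymin = int(row[4] * frame_height)
        xmax = int(row[5] * frame_width)
        ymax = int(row[6] * frame_height)
        results.append((class_id, confidence, (xmin, ymin, xmax, ymax)))
    return results


# The two detections quoted in the thread, copied as plain lists.
sample_rows = [
    [0.0, 1.0, 0.97314453, 0.22705078, 0.31835938, 0.47070312, 0.93652344],
    [0.0, 82.0, 0.35913086, 0.21923828, 0.32080078, 0.47607422, 0.95654297],
]

# 640x480 is an arbitrary frame size chosen for illustration.
for class_id, confidence, box in decode_detections(sample_rows, 640, 480):
    print(class_id, round(confidence, 3), box)
```

With a real network, the rows would presumably come from `net.forward().reshape(-1, 7)` and the frame size from `frame.shape[:2]`. Whether a given model actually uses this layout should still be checked against that model's documentation.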
Sergei N. (Intel) wrote:
The model in question was trained on COCO, so it uses the COCO labels. Can be found, for example, here https://github.com/amikelive/coco-labels/blob/master/coco-labels-2014_20...
I'm still seeing ambiguity. I got this pbtxt file when I asked on the PyImageSearch "help line",
which has 80 items, as does the list in your link. But on some test images I got "hits" for items 1 and 82 with this simple sample code:
import cv2
import numpy as np

# Load the model
#net = cv2.dnn.readNet('face-detection-adas-0001.xml', 'face-detection-adas-0001.bin')
net = cv2.dnn.readNet("mobilenet_ssd_v2/MobilenetSSDv2coco.xml", "mobilenet_ssd_v2/MobilenetSSDv2coco.bin")

# Specify target device
net.setPreferableTarget(cv2.dnn.DNN_TARGET_MYRIAD)

# Read an image
#frame = cv2.imread('../Pictures/dw_cam_test.jpg')
frame = cv2.imread('test1.jpg')

# Prepare input blob and perform an inference
#blob = cv2.dnn.blobFromImage(frame, size=(672, 384), ddepth=cv2.CV_8U)
#blob = cv2.dnn.blobFromImage(frame, size=(300, 300), ddepth=cv2.CV_8U)
blob = cv2.dnn.blobFromImage(frame, size=(300, 300))
net.setInput(blob)

if 0:
    # Intel example, didn't need numpy
    out = net.forward()
    # Draw detected object on the frame
    # Each row: [image_id, class_id, confidence, xmin, ymin, xmax, ymax]
    for detection in out.reshape(-1, 7):
        confidence = float(detection[2])
        if confidence > 0.3:
            print(detection)
            xmin = int(detection[3] * frame.shape[1])
            ymin = int(detection[4] * frame.shape[0])
            xmax = int(detection[5] * frame.shape[1])
            ymax = int(detection[6] * frame.shape[0])
            cv2.rectangle(frame, (xmin, ymin), (xmax, ymax), color=(255, 255, 255))
else:
    # PyImageSearch tutorial dnn Caffe MobilenetSSD v1 tutorial code
    (h, w) = frame.shape[:2]
    detections = net.forward()
    # detections has shape (1, 1, N, 7); loop over the N result rows
    for i in np.arange(0, detections.shape[2]):
        conf = detections[0, 0, i, 2]
        idx = int(detections[0, 0, i, 1])
        if conf > 0.3:
            box = detections[0, 0, i, 3:7] * np.array([w, h, w, h])
            (startX, startY, endX, endY) = box.astype("int")
            cv2.rectangle(frame, (startX, startY), (endX, endY), (255, 255, 255))

# Save the frame to an image file
#cv2.imwrite('out.jpg', frame)
cv2.imshow("Result", frame)
cv2.imshow("Hit a Key to EXIT", frame) #display it.
cv2.waitKey(0)
So it can't be a linear mapping with the list you've linked. The pbtxt file seems to have the correct mapping, in that so far all my "hits" have had an idx that matches an item ID in the pbtxt file.
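The mismatch is consistent with how COCO numbers its classes: there are 80 categories, but their IDs are not contiguous and run up to 90, so an 80-line list indexed by position will not line up with the class index the network returns. A minimal sketch of loading a pbtxt label map into an ID-to-name dictionary follows. It assumes the file uses `item { id: ... display_name: "..." }` entries in the style of the TensorFlow Object Detection API label maps. The function names and the sample text are illustrative, and the name given for id 82 should be verified against your own copy of the file.

```python
import re

# Patterns for the assumed "item { ... }" label map layout.
ITEM_PATTERN = re.compile(r"item\s*\{(.*?)\}", re.DOTALL)
ID_PATTERN = re.compile(r"\bid:\s*(\d+)")
NAME_PATTERN = re.compile(r'display_name:\s*"([^"]*)"')


def parse_label_map(text):
    """Return {class_id: display_name} for every complete item in the text."""
    labels = {}
    for body in ITEM_PATTERN.findall(text):
        id_match = ID_PATTERN.search(body)
        name_match = NAME_PATTERN.search(body)
        if id_match and name_match:
            labels[int(id_match.group(1))] = name_match.group(1)
    return labels


def label_for(labels, class_id):
    """Look up a class ID, falling back to a placeholder for unmapped IDs."""
    return labels.get(class_id, "unknown ({})".format(class_id))


# Illustrative excerpt only; real files typically carry more fields per item.
SAMPLE_LABEL_MAP = """
item {
  id: 1
  display_name: "person"
}
item {
  id: 82
  display_name: "refrigerator"
}
"""

labels = parse_label_map(SAMPLE_LABEL_MAP)
for class_id in (1, 82, 83):
    print(class_id, label_for(labels, class_id))
```

Looking labels up by ID in a dictionary, rather than by position in a list, keeps the mapping correct even where IDs are skipped.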
For my purposes I only care about detecting "people", so knowing that idx == 1 is a person solves my issue. But I'm at the point where I want to compare different "off the shelf" AI models, especially on images that produce "bogus" detections. If the model downloader doesn't bring in the label mapping, it's going to be more difficult than necessary.
Have I got caught in the transition between 2019R1 and 2019R2? I'm still using 2019R1.
My frame rate dropped from ~9.8 fps, using OpenVINO dnn on the original NCS with the Mobilenet-SSD Caffe model from the PyImageSearch OpenVINO tutorial, to ~5.9 fps when I "dropped in" the v2 TensorFlow model I downloaded and "optimized". I guess it's to be expected since v1 had 21 items compared to 80 in v2.
Dear Kulecz, Walter,
If OpenVINO 2019 R1 is working for you then I guess it's all good. But an awful lot of improvements and bug fixes went into OpenVINO 2019 R2, so really, you should be on the latest and greatest. We are actually on 2.01 now.
Anyhoo, here's your answer:
For Intel models, the labels are "integrated" into the demos, either as a standalone file (e.g. https://github.com/opencv/open_model_zoo/blob/develop/demos/python_demos/instance_segmentation_demo/coco_labels.txt ) or directly within the code.
Public models are typically trained on ImageNet, Pascal VOC or COCO, so you should just use the appropriate label set (all 3 are widely known and accessible).
Hope it helps.