Hi,
I'm trying to run an IR of a YOLOv2 Darkflow model on a Raspberry Pi using a Movidius Neural Compute Stick. I'm getting this error:
terminate called after throwing an instance of 'InferenceEngine::details::InferenceEngineException' what(): Failed to infer shapes for Concat layer (concat_1) with error: Invalid inputs for Concat layer: dimensions should match in all positions except axis (1) : [[1,256,9,9]] vs [[1,1024,10,10]]
Not sure why this is happening and couldn't find any discussion of this error on the forum.
My setup:
- OpenVINO 2019.3.334 on Windows 10 and Raspbian Buster
- Attaching the IR .xml and .mapping files, plus the .pb and .meta files I used to create it
- Used this command to create the IR:
python mo_tf.py --input_model yolo-chubba.pb --batch 1 --data_type FP16 --tensorflow_use_custom_operations_config C:/"Program Files (x86)"/IntelSWTools/openvino_2019.3.334/deployment_tools/model_optimizer/extensions/front/tf/yolo_v2.json
Any help would be greatly appreciated.
Pasha
Looking at this post made me realize that I had failed to change the number of classes for my custom Darkflow model in yolo_v2.json; it should be 1 for my model. I fixed that, but I'm still getting the same concat layer error.
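For reference, after the edit the custom_attributes block in my copy of yolo_v2.json looks roughly like this (the other values are the defaults that shipped with the file):
[
  {
    "id": "TFYOLO",
    "match_kind": "general",
    "custom_attributes": {
      "classes": 1,
      "coords": 4,
      "num": 5,
      "do_softmax": 1
    }
  }
]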
Do I need to specify --input_shape to match the resolution of the Pi Camera? Or is it the resolution of the photos I trained it on?
Hi Alpeyev, Pavel,
Thanks for reaching out. I am not sure what the issue could be; I was able to convert the model you shared with the Model Optimizer just fine, as you mentioned. May I ask which sample code you are using to run your model? Also, if possible, please share a sample image we can try as input to the model.
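In the meantime, if you want to double-check the input shape Darkflow froze into the graph, a minimal TF 1.x snippet (assuming the file is the yolo-chubba.pb you attached) would be:
# Print the input placeholder(s) of the frozen graph to confirm
# the shape Darkflow baked in (TF 1.x, matching that toolchain).
import tensorflow as tf

graph_def = tf.GraphDef()
with open('yolo-chubba.pb', 'rb') as f:
    graph_def.ParseFromString(f.read())

for node in graph_def.node:
    if node.op == 'Placeholder':
        print(node.name, node.attr['shape'])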
Regards,
Luis
Luis,
Thank you for the response!
I should have included the code to begin with.
Here it is below. I'm running it in a virtualenv set up with OpenCV:
# import the necessary packages
from imutils.video import VideoStream
from imutils.video import FPS
import numpy as np
import argparse
import imutils
import time
import cv2

# construct the argument parser and parse the arguments
ap = argparse.ArgumentParser()
ap.add_argument("-p", "--prototxt", required=False,
    help="path to Caffe 'deploy' prototxt file")
ap.add_argument("-m", "--model", required=False,
    help="path to pre-trained model .bin")
ap.add_argument("-x", "--config", required=False,
    help="path to pre-trained model .xml")
ap.add_argument("-c", "--confidence", type=float, default=0.2,
    help="minimum probability to filter weak detections")
ap.add_argument("-u", "--movidius", type=bool, default=0,
    help="boolean indicating if the Movidius should be used")
args = vars(ap.parse_args())

# initialize the list of class labels the model was trained to
# detect, then generate a set of bounding box colors for each class
CLASSES = ["chubba"]
COLORS = np.random.uniform(0, 255, size=(len(CLASSES), 3))

# load our serialized model from disk
print("[INFO] loading model...")
#net = cv2.dnn.readNet(args["model"], args["config"])
#net = cv2.dnn.readNet('yolo-chubba.bin', 'yolo-chubba.xml')
#net = cv2.dnn.readNetFromTensorflow('yolo-chubba.bin', 'yolo-chubba.xml')
net = cv2.dnn.readNetFromModelOptimizer('yolo-chubba.xml', 'yolo-chubba.bin')

# specify the target device as the Myriad processor on the NCS
net.setPreferableTarget(cv2.dnn.DNN_TARGET_MYRIAD)

# initialize the video stream, allow the camera sensor to warm up,
# and initialize the FPS counter
print("[INFO] starting video stream...")
vs = VideoStream(usePiCamera=True).start()
time.sleep(2.0)
fps = FPS().start()

# loop over the frames from the video stream
while True:
    # grab the frame from the threaded video stream and resize it
    # to have a maximum width of 400 pixels
    frame = vs.read()
    frame = imutils.resize(frame, width=400)

    # grab the frame dimensions and convert it to a blob
    (h, w) = frame.shape[:2]
    blob = cv2.dnn.blobFromImage(frame, 0.007843, (300, 300), 127.5)

    # pass the blob through the network and obtain the detections
    # and predictions
    net.setInput(blob)
    detections = net.forward()

    # loop over the detections
    for i in np.arange(0, detections.shape[2]):
        # extract the confidence (i.e., probability) associated with
        # the prediction
        confidence = detections[0, 0, i, 2]

        # filter out weak detections by ensuring the `confidence` is
        # greater than the minimum confidence
        if confidence > args["confidence"]:
            # extract the index of the class label from the
            # `detections`, then compute the (x, y)-coordinates of
            # the bounding box for the object
            idx = int(detections[0, 0, i, 1])
            box = detections[0, 0, i, 3:7] * np.array([w, h, w, h])
            (startX, startY, endX, endY) = box.astype("int")

            # draw the prediction on the frame
            label = "{}: {:.2f}%".format(CLASSES[idx], confidence * 100)
            cv2.rectangle(frame, (startX, startY), (endX, endY),
                COLORS[idx], 2)
            y = startY - 15 if startY - 15 > 15 else startY + 15
            cv2.putText(frame, label, (startX, y),
                cv2.FONT_HERSHEY_SIMPLEX, 0.5, COLORS[idx], 2)

    # show the output frame
    cv2.imshow("Frame", frame)
    key = cv2.waitKey(1) & 0xFF

    # if the `q` key was pressed, break from the loop
    if key == ord("q"):
        break

    # update the FPS counter
    fps.update()

# stop the timer and display FPS information
fps.stop()
print("[INFO] elapsed time: {:.2f}".format(fps.elapsed()))
print("[INFO] approx. FPS: {:.2f}".format(fps.fps()))

# do a bit of cleanup
cv2.destroyAllWindows()
vs.stop()
Thanks for the info; I am able to see the same error you mentioned. I have also seen an error when trying to convert the model to IR without --batch 1 (see below). This could be the reason for the error when running your program (Failed to infer shapes for Concat layer (concat_1) with error: Invalid inputs for Concat layer: dimensions should match in all positions except axis (1) : [[1,256,9,9]] vs [[1,1024,10,10]]). May I ask if it's possible to take a look at your model prior to freezing (before conversion to .pb), and also the command used to freeze the model? Feel free to PM me the files in case you don't want to share them publicly.
Also let me know the repository or instructions used to train your model.
Exception occurred during running replacer "REPLACEMENT_ID" (<class 'extensions.middle.PartialInfer.PartialInfer'>): Not all output shapes were inferred or fully defined for node "concat_1". For more information please refer to Model Optimizer FAQ (https://docs.openvinotoolkit.org/latest/_docs_MO_DG_prepare_model_Model_Optimizer_FAQ.html), question #40.
Stopped shape/value propagation at "concat_1" node. For more information please refer to Model Optimizer FAQ (https://docs.openvinotoolkit.org/latest/_docs_MO_DG_prepare_model_Model_Optimizer_FAQ.html), question #38.
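For completeness, an alternative to --batch 1 is pinning the full input shape at conversion time. A sketch, assuming the stock 416x416 Darkflow input (check the height and width in your .cfg):
$ python mo_tf.py --input_model yolo-chubba.pb --input_shape [1,416,416,3] --data_type FP16 --tensorflow_use_custom_operations_config C:/"Program Files (x86)"/IntelSWTools/openvino_2019.3.334/deployment_tools/model_optimizer/extensions/front/tf/yolo_v2.json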
Regards,
Luis
Luis,
I did include a batch parameter when converting to IR. Here's the command I used:
$ python mo_tf.py --input_model yolo-chubba.pb --batch 1 --data_type FP16 --tensorflow_use_custom_operations_config C:/"Program Files (x86)"/IntelSWTools/openvino_2019.3.334/deployment_tools/model_optimizer/extensions/front/tf/yolo_v2.json
Happy to share the model prior to freezing. Please see the weights and cfg in this Google Drive folder:
https://drive.google.com/drive/folders/1JTTr1ewgO0ux_ikbfBF6tH6xw8TO3Sj2?usp=sharing
I used the Darkflow repository to retrain a YOLOv2 model on just one class. The model was able to detect the class in both still images and video prior to freezing.
This is the flow command I used to create the protobuf (--load -1 picks up the most recent checkpoint):
python flow --model cfg/yolo-chubba.cfg --load -1 --savepb
Thank you again for looking into this!
Pasha
Hi Pavel,
Thanks for the info! I am running some tests on my end, but it is taking me a bit longer than I anticipated. I will get back to you ASAP.
Regards,
Luis
Hi Alpeyev, Pavel,
Sorry for the delay. I've concluded my testing; here are a few comments and a suggestion:
- Based on your program, I see that you are using the dnn module from OpenCV, which is not the OpenVINO toolkit's Inference Engine API. I couldn't get your code to work in my Windows environment: I didn't see the same error as on the RPi, but the code also didn't do much (it just stops and hangs at detections = net.forward()).
- It's been challenging to get a YOLOv2 model to work, as the sample code available for OpenVINO uses YOLOv3. YOLOv2 is supported by the toolkit, but there isn't a sample program available; the sample in the Open Model Zoo uses YOLOv3. There are other samples in the NCAPPZOO, but only for tiny-yolo-v2 (and tiny-yolo-v3).
- It might take me longer to get sample code working with a YOLOv2 model.
- Is there a reason for you to use YOLOv2 instead of YOLOv3?
- I would suggest re-training with YOLOv3 instead, if possible, since there is a sample available that can be reused for your purpose.
Regards,
Luis
Luis,
First, big thanks for sticking with it this far!
The only reason I'm using YOLOv2 is that it's the latest version supported by Darkflow. (I only know Python.)
There were quite a few tutorials on how to re-train a Darkflow model, which is the only reason a noob like myself has made it this far.
My project was to use the Pi to recognize my toddler's face and sound an alarm when she tries to climb the stairs. She will probably go to college by the time I'm done with this :)
But I will look into re-training YOLOv3 or other beasties in the Model Zoo. Would welcome recommendations, if you have any.
Before I let you go, does it make sense to try the script with a still image instead of video?
And do you think the dimensions of new input image might have something to do with the error?
Thanks again,
Pasha
Hi Alpeyev, Pavel,
Your project sounds interesting! I'm sure you can get it completed well before that happens.
I'd suggest taking a look at the OpenVINO training extensions repo to see if there is a model that fits your needs. I also know there is a user (PINTO) who has pretty good step-by-step tutorials on YOLOv3. I haven't tried them myself, but feel free to give them a try.
A YOLOv3 model that can be converted and used by OpenVINO can be found in this repository.
As for your questions:
does it make sense to try the script with a still image instead of video?
It's really up to you; both work fine. You can run inference continuously on a live video feed, or use some kind of trigger (a motion sensor, perhaps) to capture an image or start a video feed and then run inference on that frame.
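If you do test with a still image, here's a minimal sketch reusing the IR files from your script ('test.jpg' is just a placeholder name, and I'm assuming the stock 416x416 YOLOv2 input, so adjust to your .cfg):
# Minimal still-image test with the same IR files as in your script.
# 'test.jpg' is a placeholder; 416x416 assumes the stock YOLOv2 input size.
import cv2

net = cv2.dnn.readNetFromModelOptimizer('yolo-chubba.xml', 'yolo-chubba.bin')
net.setPreferableTarget(cv2.dnn.DNN_TARGET_MYRIAD)

frame = cv2.imread('test.jpg')
blob = cv2.dnn.blobFromImage(frame, 1 / 255.0, (416, 416), swapRB=True)
net.setInput(blob)
out = net.forward()
print(out.shape)  # sanity-check the raw output shape before parsing it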
And do you think the dimensions of new input image might have something to do with the error?
Hmm, I don't think that is the issue, but you can always resize the input image to what your network expects and see if the problem is solved.
Regards,
Luis
Luis, appreciate all your help ~