Solved: Failed to infer shapes for Concat layer ... Invalid inputs

Alpeyev__Pavel · ‎01-20-2020

Hi,

I'm trying to run an IR of a Yolo V2 Darkflow model on a Raspberry Pi using a Movidius Neural Compute Stick. I'm getting this error:

terminate called after throwing an instance of 'InferenceEngine::details::InferenceEngineException'
  what():  Failed to infer shapes for Concat layer (concat_1) with error: Invalid inputs for Concat layer: dimensions should match in all positions except axis (1) : [[1,256,9,9]] vs [[1,1024,10,10]]

Not sure why this is happening and couldn't find any discussion of this error on the forum.

My setup:
- OpenVINO 2019.3.334 on Windows 10 and Raspbian Buster
- I'm attaching the IR .xml and .mapping files, plus the .pb and .meta files I used to create it
- I used this command to create the IR:

python mo_tf.py --input_model yolo-chubba.pb --batch 1 --data_type FP16 --tensorflow_use_custom_operations_config C:/"Program Files (x86)"/IntelSWTools/openvino_2019.3.334/deployment_tools/model_optimizer/extensions/front/tf/yolo_v2.json

Any help would be greatly appreciated.

Pasha

Alpeyev__Pavel · ‎01-21-2020

Looking at this post made me realize that I had failed to change the number of classes for my custom Darkflow model in yolo_v2.json. It should be 1. I'm still getting the same Concat layer error, though.

Do I need to specify --input_shape to match the resolution of the Pi Camera? Or is it the resolution of the photos I trained it on?
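
On the --input_shape question, a hedged note: as far as I understand, the IR input size follows the network definition (the width and height in the [net] section of the Darkflow .cfg), not the camera resolution or the size of the training photos, since frames are resized to the network size before inference. The sketch below assumes a Darknet-style cfg; the helper name input_shape_from_cfg and the example values are illustrative, not read from yolo-chubba.cfg. It only builds the NHWC string that could be passed to mo_tf.py via --input_shape.

```python
def input_shape_from_cfg(cfg_text, batch=1):
    """Build an NHWC shape string from the [net] section of a Darknet-style cfg."""
    section = None
    values = {}
    for raw in cfg_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        if line.startswith("[") and line.endswith("]"):
            section = line[1:-1].strip().lower()
            continue
        if section == "net" and "=" in line:
            key, value = line.split("=", 1)
            values[key.strip()] = value.strip()
    height = int(values["height"])
    width = int(values["width"])
    channels = int(values.get("channels", 3))  # assumption: RGB if not stated
    return "[{},{},{},{}]".format(batch, height, width, channels)


# Illustrative cfg fragment (hypothetical values)
EXAMPLE_CFG = """
[net]
batch=64
width=416
height=416
channels=3

[convolutional]
filters=32
"""

print(input_shape_from_cfg(EXAMPLE_CFG))
```

With the real file, the text would come from reading cfg/yolo-chubba.cfg, and the printed string is what --input_shape would take alongside the other mo_tf.py arguments.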

Luis_at_Intel · ‎01-22-2020

Hi Alpeyev, Pavel,

Thanks for reaching out. I am not sure what the issue could be; I was able to convert the model you shared with the Model Optimizer just fine, as you mentioned. May I ask which sample code you are using to run your model? Also, if possible, please share a sample image we could try as input for the model.

Regards,

Luis

Alpeyev__Pavel · ‎01-22-2020

Luis,

Thank you for the response!

I should have included the code to begin with.
Here it is below. I'm running it in a virtualenv setup with opencv:

# import the necessary packages
from imutils.video import VideoStream
from imutils.video import FPS
import numpy as np
import argparse
import imutils
import time
import cv2


# construct the argument parser and parse the arguments
ap = argparse.ArgumentParser()
ap.add_argument("-p", "--prototxt", required=False,
    help="path to Caffe 'deploy' prototxt file")
ap.add_argument("-m", "--model", required=False,
    help="path to IR weights file (.bin)")
ap.add_argument("-x", "--config", required=False,
    help="path to IR model file (.xml)")
ap.add_argument("-c", "--confidence", type=float, default=0.2,
    help="minimum probability to filter weak detections")
ap.add_argument("-u", "--movidius", type=bool, default=0,
    help="boolean indicating if the Movidius should be used")
args = vars(ap.parse_args())


# initialize the list of class labels the model was trained to
# detect, then generate a set of bounding box colors for each class
CLASSES = ["chubba"]
COLORS = np.random.uniform(0, 255, size=(len(CLASSES), 3))


# load our serialized model from disk
print("[INFO] loading model...")
#net = cv2.dnn.readNet(args["model"], args["config"])
#net = cv2.dnn.readNet('yolo-chubba.bin', 'yolo-chubba.xml')
#net = cv2.dnn.readNetFromTensorflow('yolo-chubba.bin', 'yolo-chubba.xml')
net = cv2.dnn.readNetFromModelOptimizer('yolo-chubba.xml', 'yolo-chubba.bin')


# specify the target device as the Myriad processor on the NCS
net.setPreferableTarget(cv2.dnn.DNN_TARGET_MYRIAD)


# initialize the video stream, allow the camera sensor to warm up,
# and initialize the FPS counter
print("[INFO] starting video stream...")
vs = VideoStream(usePiCamera=True).start()
time.sleep(2.0)
fps = FPS().start()


# loop over the frames from the video stream
while True:
    # grab the frame from the threaded video stream and resize it
    # to have a maximum width of 400 pixels
    frame = vs.read()
    frame = imutils.resize(frame, width=400)


    # grab the frame dimensions and convert it to a blob
    (h, w) = frame.shape[:2]
    blob = cv2.dnn.blobFromImage(frame, 0.007843, (300, 300), 127.5)


    # pass the blob through the network and obtain the detections and
    # predictions
    net.setInput(blob)
    detections = net.forward()


    # loop over the detections
    for i in np.arange(0, detections.shape[2]):
        # extract the confidence (i.e., probability) associated with
        # the prediction
        confidence = detections[0, 0, i, 2]


        # filter out weak detections by ensuring the `confidence` is
        # greater than the minimum confidence
        if confidence > args["confidence"]:
            # extract the index of the class label from the
            # `detections`, then compute the (x, y)-coordinates of
            # the bounding box for the object
            idx = int(detections[0, 0, i, 1])
            box = detections[0, 0, i, 3:7] * np.array([w, h, w, h])
            (startX, startY, endX, endY) = box.astype("int")


            # draw the prediction on the frame
            label = "{}: {:.2f}%".format(CLASSES[idx],
                confidence * 100)
            cv2.rectangle(frame, (startX, startY), (endX, endY),
                COLORS[idx], 2)
            y = startY - 15 if startY - 15 > 15 else startY + 15
            cv2.putText(frame, label, (startX, y),
                cv2.FONT_HERSHEY_SIMPLEX, 0.5, COLORS[idx], 2)


    # show the output frame
    cv2.imshow("Frame", frame)
    key = cv2.waitKey(1) & 0xFF


    # if the `q` key was pressed, break from the loop
    if key == ord("q"):
        break


    # update the FPS counter
    fps.update()


# stop the timer and display FPS information
fps.stop()
print("[INFO] elapsed time: {:.2f}".format(fps.elapsed()))
print("[INFO] approx. FPS: {:.2f}".format(fps.fps()))


# do a bit of cleanup
cv2.destroyAllWindows()
vs.stop()
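
A possible explanation for the concat error, offered as a hedged sketch rather than a confirmed diagnosis: the script above builds the blob at 300x300, and OpenCV's dnn module may reshape the network to the blob size. Assuming the usual YOLOv2 layout (five stride-2 poolings with "same"-style rounding, plus a passthrough branch taken at stride 16 and folded by a 2x2 space-to-depth), an input side that is not a multiple of 32 leaves the two branches at different spatial sizes. The function name and the rounding rules below are my assumptions, not something taken from the model files.

```python
import math


def yolo_v2_concat_sizes(input_size):
    """Return (passthrough_size, main_size) feeding the concat layer.

    Assumes a YOLOv2-style layout: stride-2 poolings round up ("same"
    padding), the passthrough branch leaves at stride 16 and is folded by a
    2x2 space-to-depth (rounding down), and the main branch is pooled once more.
    """
    size = input_size
    for _ in range(4):           # four poolings -> stride 16
        size = math.ceil(size / 2)
    passthrough = size // 2      # space-to-depth with block size 2
    main = math.ceil(size / 2)   # fifth pooling -> stride 32
    return passthrough, main


for side in (300, 416):
    branch, main = yolo_v2_concat_sizes(side)
    status = "match" if branch == main else "MISMATCH"
    print("{0}x{0}: passthrough {1}x{1}, main {2}x{2} -> {3}".format(
        side, branch, main, status))
```

Under these assumptions a 300x300 input gives 9x9 against 10x10, the same spatial sizes as in the error message, while 416x416 gives 13x13 on both branches. If this is the cause, building the blob at the size the IR was generated for (a multiple of 32) should avoid the mismatch.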

Luis_at_Intel · ‎01-23-2020

Thanks for the info. I am able to see the same error you mentioned. I have also seen an error when trying to convert the model to IR without --batch 1 (see below). This could be the reason why there is an error when running your program (Failed to infer shapes for Concat layer (concat_1) with error: Invalid inputs for Concat layer: dimensions should match in all positions except axis (1) : [[1,256,9,9]] vs [[1,1024,10,10]]). May I ask if it's possible to take a look at your model prior to freezing (before it was converted to .pb), and also the command used to freeze the model? Feel free to PM me the files in case you don't want to share them publicly.

Also let me know the repository or instructions used to train your model.

Exception occurred during running replacer "REPLACEMENT_ID" (<class 'extensions.middle.PartialInfer.PartialInfer'>): Not all output shapes were inferred or fully defined for node "concat_1".
For more information please refer to Model Optimizer FAQ (https://docs.openvinotoolkit.org/latest/_docs_MO_DG_prepare_model_Model_Optimizer_FAQ.html), question #40.
Stopped shape/value propagation at "concat_1" node.
For more information please refer to Model Optimizer FAQ (https://docs.openvinotoolkit.org/latest/_docs_MO_DG_prepare_model_Model_Optimizer_FAQ.html), question #38.

Regards,

Luis

Alpeyev__Pavel · ‎01-23-2020

Luis,

I did include a batch parameter when converting to IR. Here's the command I used:

$ python mo_tf.py --input_model yolo-chubba.pb --batch 1 --data_type FP16 --tensorflow_use_custom_operations_config C:/"Program Files (x86)"/IntelSWTools/openvino_2019.3.334/deployment_tools/model_optimizer/extensions/front/tf/yolo_v2.json

Happy to share the model prior to freezing. Please see the weights and cfg in this Google Drive folder:
https://drive.google.com/drive/folders/1JTTr1ewgO0ux_ikbfBF6tH6xw8TO3Sj2?usp=sharing

I used the Darkflow repository to retrain a Yolo v2 model on just one class. The model was able to detect the class in both still images and video prior to freezing.
This is the flow command I used to create protobuf:

 python flow --model cfg/yolo-chubba.cfg --load -1 --savepb

Thank you again for looking into this!

Pasha

Luis_at_Intel · ‎01-24-2020

Hi Pavel,

Thanks for the info! I am running some tests on my end, but it is taking me a bit longer than I anticipated. I will get back to you asap.

Regards,

Luis

Luis_at_Intel · ‎01-29-2020

Hi Alpeyev, Pavel,

Sorry for the delay. I have concluded my testing and have a few comments and a suggestion:

- Based on your program, I see that you are using the dnn module from OpenCV, which is not the OpenVINO toolkit. I couldn't get your code to work in my Windows environment: I didn't see the same error as on the RPi, but the code also didn't do much (it just hangs at detections = net.forward()).
- It's been challenging to get a YOLOv2 model to work, as the sample code available for OpenVINO (in the Open Model Zoo) uses YOLOv3. I know YOLOv2 is supported by the toolkit, but there isn't a sample program available. There are other samples in the ncappzoo, but only for tiny-yolo-v2 (and tiny-yolo-v3).
- It might take me longer to try to get sample code working with a YOLOv2 model. Is there a reason for you to use YOLOv2 instead of YOLOv3?
- I would suggest re-training with YOLOv3 instead if possible, since there is a sample available that can be reused for your purpose.

Regards,

Luis

Alpeyev__Pavel · ‎01-30-2020

Luis,

First, big thanks for sticking with it this far!

The only reason I'm using YOLOv2 is because it's the latest version supported by Darkflow. (I only know Python.)
There were quite a few tutorials on how to re-train a darkflow model, which is the only reason a noob like myself has made it this far.

My project was to use the Pi to recognize my toddler's face and sound an alarm when she tries to climb the stairs. She will probably go to college by the time I'm done with this :)

But I will look into re-training YOLOv3 or other beasties in the Model Zoo. Would welcome recommendations, if you have any.

Before I let you go, does it make sense to try the script with a still image instead of video?
And do you think the dimensions of new input image might have something to do with the error?

Thanks again,

Pasha

Luis_at_Intel · ‎01-31-2020

Hi Alpeyev, Pavel,

Your project sounds interesting! I'm sure you can get it completed well before that happens.

I'd suggest taking a look at the OpenVINO training extensions repo to see if there is any model that fits your needs. I also know there is a user (PINTO) who has pretty good step-by-step tutorials on YOLOv3. I haven't tried this one myself, but feel free to give it a try.

The YOLOv3 model that can be converted and used by OpenVINO can be found in this repository.

As for your questions:

does it make sense to try the script with a still image instead of video?

It really is up to you; both work fine. You can run a live video feed and do inference all the time, or use some trigger (a motion sensor?) to capture an image or start a video feed, and then run inference on that frame.

And do you think the dimensions of new input image might have something to do with the error?

Hmm I don't think that is the issue, but you can always resize the input image to what your network expects and see if the problem is solved.

Regards,

Luis
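
To make "what your network expects" concrete, here is a small sketch that reads the input dimensions out of an IR .xml file using only the Python standard library. It assumes the IR marks its input with a layer of type "Input" (older IR versions) or "Parameter" (newer ones) and lists the dimensions as <dim> elements in NCHW order. The helper name ir_input_shape and the sample XML are illustrative and not taken from yolo-chubba.xml.

```python
import xml.etree.ElementTree as ET


def ir_input_shape(xml_text):
    """Return the dims of the first input layer found in an IR XML string."""
    root = ET.fromstring(xml_text)
    for layer in root.iter("layer"):
        # Assumption: input layers are typed "Input" or "Parameter"
        if layer.get("type") in ("Input", "Parameter"):
            port = layer.find("./output/port")
            if port is None:
                continue
            return [int(dim.text) for dim in port.findall("dim")]
    raise ValueError("no input layer found")


# Hypothetical, heavily trimmed IR used only to exercise the helper
SAMPLE_IR = """<net name="example" version="6">
  <layers>
    <layer id="0" name="input" precision="FP16" type="Input">
      <output>
        <port id="0">
          <dim>1</dim><dim>3</dim><dim>416</dim><dim>416</dim>
        </port>
      </output>
    </layer>
  </layers>
</net>"""

n, c, h, w = ir_input_shape(SAMPLE_IR)
print("blob size for cv2.dnn.blobFromImage: ({}, {})".format(w, h))
```

With the real file, the XML text would come from reading yolo-chubba.xml, and (w, h) could replace the (300, 300) size passed to cv2.dnn.blobFromImage in the script.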

Alpeyev__Pavel · ‎01-31-2020

Luis, appreciate all your help ~