Intel® Distribution of OpenVINO™ Toolkit
Community support and discussions about the Intel® Distribution of OpenVINO™ toolkit, OpenCV, and all things computer vision-related on Intel® platforms.

ncappzoo TinyYolo example

Community Manager



I'm having trouble adapting the TinyYolo example to a custom-trained graph.


I have trained a 4-class classifier and get an error saying I cannot reshape an array of size 980 into shape (7, 7, 4). I notice the code is hard-coded with two magic numbers: 980 and 1078. The 980 I can understand: it is 7x7x20 (20 is the number of classes in the example). The 1078 seems to be 7x7x22 (num_classes + boxes_per_grid_cell = 20 + 2).


However, having made the relevant substitutions for 4 instead of 20 classes, I still get an error. In this case it is on this line:


all_boxes = np.reshape(inference_result[upperNum:], (grid_size, grid_size, boxes_per_grid_cell, 4))


The error is now "cannot reshape array of size 7311 into shape (7, 7, 2, 4)". I'm not sure how to proceed from here.
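For anyone following the arithmetic, here is a minimal sketch of how the two magic numbers and the total length fall out of the grid size, boxes per cell, and class count. It assumes the flat output is laid out as class probabilities, then per-box confidences, then box coordinates, which is what the 980 and 1078 offsets suggest. The function name `yolo_v1_offsets` is hypothetical and not part of the ncappzoo code.

```python
def yolo_v1_offsets(grid_size, boxes_per_cell, num_classes):
    """Return (probs_end, scales_end, total) for a flat YOLO v1 style output.

    Assumed layout: [class probabilities | box confidences | box coordinates].
    """
    cells = grid_size * grid_size
    probs_end = cells * num_classes                    # end of class probabilities
    scales_end = probs_end + cells * boxes_per_cell    # end of per-box confidences
    total = scales_end + cells * boxes_per_cell * 4    # plus 4 coordinates per box
    return probs_end, scales_end, total


print(yolo_v1_offsets(7, 2, 20))  # (980, 1078, 1470)
print(yolo_v1_offsets(7, 2, 4))   # (196, 294, 686)
```

With 4 classes the second offset becomes 294, so slicing an output of length 7605 at that point leaves 7605 - 294 = 7311 values, which is the size in the error message. The boxes section of a 7x7, 2-box layout would only hold 7x7x2x4 = 392 values, so the model output does not follow this layout at all.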


I tried a little more math. In the original case, 1078 + (7x7x2x4) = 1470. Why is my output so large? I recall the calculation being 7x7x(2x5 + 20) = 1470, and when 20 is changed to 4 this gives 686. In my case, the inference output length is 7605 instead, and I don't understand why. The last convolutional layer and the region layer in my yolo cfg file look like this. Appreciate any tips!


[convolutional]
size=1
stride=1
pad=1
filters=45
activation=linear

[region]
anchors = 0.738768,0.874946, 2.42204,2.65704, 4.30971,7.04493, 10.246,4.59428, 12.6868,11.8741
bias_match=1
classes=4
coords=4
num=5
softmax=1
jitter=.2
rescore=1


Relevant file:


Source paper:
Community Manager

@ly3 The example you're referring to uses 2 anchor boxes, and it looks like you're using 5 anchor boxes (num=5 in your cfg). It also looks like you're using a 13x13 grid and not 7x7.


So 4 classes + 1 score + 4 bounding box coordinates = 9 total values per anchor box.


Those 9 values exist for each of the 5 anchor boxes = 45 values per grid cell.


Finally there are 45 values per grid cell x (13 x 13 grid cells) = 7605 values total.
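The same arithmetic as a small Python sketch, in case it helps to plug in other class or anchor counts. The function name `region_output_size` is hypothetical; the parameters mirror the `classes`, `coords`, and `num` fields of the [region] section, and the 13x13 grid is the assumption stated above.

```python
def region_output_size(grid_size, num_anchors, num_classes, coords=4):
    """Return (values_per_anchor, values_per_cell, total) for a region-layer output."""
    values_per_anchor = coords + 1 + num_classes       # box coords + score + classes
    values_per_cell = num_anchors * values_per_anchor  # should equal filters= in the cfg
    total = grid_size * grid_size * values_per_cell
    return values_per_anchor, values_per_cell, total


print(region_output_size(13, 5, 4))  # (9, 45, 7605)
```

The middle value, 45, matches the filters=45 setting of the last convolutional layer in the cfg, which is a quick sanity check that the class and anchor counts are consistent.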


I think it makes more sense if you refer to the Tiny Yolo v2 sample code. Let me know if this helps.

Community Manager

That helped immensely, thank you! So, is this an architectural difference between tiny_yolo v1 and tiny_yolo v2, or was the Tiny Yolo demo I linked to (from NCSDK v1's caffe folder) actually "full Yolo"? The 1470 number sounds like full Yolo. The code is at least running now. Thanks for the timely and detailed help.


For others' reference, in addition to replacing some of the numbers in post_processing() in the ncsdk2 code Tome linked above, I had to modify much of the main() code because all the method names were changed from UpperCamelCase to under_score in ncsdk2. I'm running ncsdk1 for various compatibility reasons that I can't completely recall.
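As a rough illustration of the kind of change involved, here is a sketch of splitting a flat output of length 7605 into boxes, scores, and class values with NumPy. This is not the ncappzoo post_processing() code. It assumes the values are ordered as (row, column, anchor, value) with each anchor's 9 values ordered as 4 box coordinates, 1 score, then 4 class values; the actual memory layout depends on the framework and device, so check it before relying on this. The random array stands in for a real inference result.

```python
import numpy as np

GRID, ANCHORS, CLASSES = 13, 5, 4
VALUES = 4 + 1 + CLASSES  # 9 values per anchor box

# Stand-in for the flat inference result (assumption: length 13*13*5*9 = 7605).
inference_result = np.random.rand(GRID * GRID * ANCHORS * VALUES).astype(np.float32)

# Assumed layout: (row, col, anchor, value). Verify against your model's output order.
cells = inference_result.reshape(GRID, GRID, ANCHORS, VALUES)

all_boxes = cells[..., 0:4]    # box coordinates, shape (13, 13, 5, 4)
box_scores = cells[..., 4]     # one score per anchor box, shape (13, 13, 5)
class_values = cells[..., 5:]  # per-class values, shape (13, 13, 5, 4)

print(all_boxes.shape, box_scores.shape, class_values.shape)
```

Unlike the 7x7 example, there are no separate offsets like 980 or 1078 to slice at here; under this assumed layout every anchor box carries its own class values.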

Community Manager

@ly3 The Tiny Yolo v1 Caffe model is definitely Tiny Yolo and not full Yolo. You can tell this by the number of convolution filters for each convolution layer. Full Yolo starts with 32 filters for the first convolution layer and Tiny Yolo starts with 16.


You can check the cfg files for Tiny Yolo v2 to see what I'm talking about:


Full Yolo v2:


Tiny Yolo v2: