I recently converted my trained model (a Tiny YOLO v2 model converted via Darkflow, with only one class to detect) to an NCS graph, and modified @Tome_at_Intel 's example code to run a test using my webcam.
Before doing this, I had tried the live-object-detector sample with ncapi2_shim (I'm using NCSDK2). The result was fine: the script processed the test video at about 80 ms per frame.
But when I applied my own Tiny YOLO v2, the processing time jumped to about 1200 ms per frame.
The processing/inference time was obtained with the following line (copied and slightly modified from the live-object-detector code):
inference_time = graph.get_option(mvnc.GraphOption.RO_TIME_TAKEN)
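For context, RO_TIME_TAKEN in the NCSDK2 Python API returns one timing value per network layer rather than a single number, so the demo code sums them to get the total. A minimal sketch of that summing step (the mvnc calls are left as comments, since they require an attached NCS device):

```python
import numpy as np

def total_inference_ms(per_layer_times_ms):
    # RO_TIME_TAKEN yields one duration (in ms) per network layer;
    # the whole-graph inference time is the sum of those durations.
    return float(np.sum(per_layer_times_ms))

# With a device attached (NCSDK2 API, as in the demo):
#   times = graph.get_option(mvnc.GraphOption.RO_TIME_TAKEN)
#   print('Found objects in %s ms.' % total_inference_ms(times))

# Offline check with dummy per-layer timings:
print(total_inference_ms([3.0, 2.0, 1.0]))  # 6.0
```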
To confirm the result, I went back and followed @Tome_at_Intel 's comment, in which he provided step-by-step instructions for generating the frozen model (which is exactly what I did with my own trained model, of course with different cfg and weights files…).
The compile command I used was:
mvNCCompile -in input -on output built_graph/tiny-yolo-v2.pb
I ran the converted NCS graph (with 20 classes to detect) on dog.jpg, and the inference time was still about 1200 ms. The output I got is listed below:
Found objects in 1207.973388671875 ms.
dog     97.35302828411412 222.60164861726685 343.3950671757223 531.8106313107114
car     444.2567046299852 90.19346571245428 685.6577025125912 185.98194001005976
bicycle 139.43421012548873 144.2388337194188 599.3307892196749 434.39073473429
car     452.1365970362832 71.05730864847379 545.5606719185531 152.82092145965296
I had always expected Tiny YOLO to be faster than SSD, but the results show it's not. It is not merely slower, but far slower than the inference time shown in the live-object-detector sample case.
Did I do something wrong in the test, or miss anything essential for applying Darkflow's Tiny YOLO v2 to the NCS? Or is there something in the NCS hardware/software that we users cannot do anything about?
Any suggestion or information would be helpful. Thanks.
@hiankun I noticed that you did not compile your graph file with the -s 12 option. Please use this option, as it lets the NCS device use all 12 SHAVE vector processors on the Myriad 2. For example, recompile with this command:
mvNCCompile -in input -on output built_graph/tiny-yolo-v2.pb -s 12
This should speed up your inference times.
I re-compiled the graph with the -s 12 option, and the inference time dropped from about 1200 ms to 170 ms.
It's still slower than SSD on the NCS, which confused me. After a quick search, I found a Q&A post on Quora: Why is SSD faster than YOLO?.
If the above link gives wrong information, please correct me.
No matter whether YOLO should be faster or slower than SSD, my original problem has been solved. The 170 ms inference time is sufficient for my own application.
I need to understand why SSD is always the fastest one, and I wonder: if I retrain the 20-class model with just one class, will that make it faster?
I get the 80 ms inference time, but I keep wondering whether iterating through classes I don't need on each frame might take some time, and no one has confirmed it. :( How did you build your own model with just one class?
Sorry for the late reply.
I'm not entirely clear on your question, but I'll share some info about my test case and hope it gives you some ideas.
Due to some Caffe installation problems on my system, I haven't trained my data (which has only one class to detect) with SSD yet.
What I did was train it with Tiny YOLO v2 (the Darkflow version). The necessary info for training on a customized dataset can be found here:
Training on your own dataset.
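One detail from that guide worth repeating (as I remember it; please double-check against the Darkflow docs): when you reduce the model to a single class, you edit the cfg to set classes = 1 and also recompute the filter count of the last convolutional layer, which for YOLO v2 is anchors * (classes + 5):

```python
def yolo_v2_last_conv_filters(num_classes, num_anchors=5):
    # Each anchor box predicts 4 box coordinates + 1 objectness score
    # + one confidence per class, hence (num_classes + 5) per anchor.
    return num_anchors * (num_classes + 5)

print(yolo_v2_last_conv_filters(20))  # 125: the stock 20-class VOC cfg
print(yolo_v2_last_conv_filters(1))   # 30: a single-class cfg
```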
I haven't played with the training process for a while and have forgotten many details, but I'm trying to pick it up again. If you have further problems with training, please PM me and maybe we can discuss them.