Inception V3 transfer learning (slow?) performance

idata · 07-10-2018

Dear community members.

I had been using the zoo Inception V3 example, which achieved a reasonable performance of around 300 ms to 400 ms per image (with 1001 classes), matching the expected results.

I needed to perform custom training for a proof-of-concept project, so I implemented transfer learning based on Inception V3, essentially following the steps in Adam's excellent post: https://software.intel.com/en-us/articles/machine-learning-and-mammography

After transfer learning, the retrained Inception V3 achieved a discouraging 1.8 seconds per image, around 5 to 6 times slower than the zoo's Inception V3 example! In fact, these are the same results Adam gets (both on a PC and on a Raspberry Pi with the NCS). The only apparent difference between the zoo example and Adam's is the last layer: the zoo example has 1001 outputs, whereas Adam's has only 2.
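A rough count suggests the smaller last layer cannot explain the slowdown. The sketch below computes the cost of a single fully connected classifier head, assuming both models feed the standard 2048-element pooled Inception V3 feature vector into one dense layer (the helper `fc_head_cost` is hypothetical, not from either example). Going from 1001 to 2 outputs removes about two million multiply-accumulates, a small fraction of the convolutional backbone's work, so the smaller head on its own should make inference marginally faster, not slower.

```python
from typing import Tuple

# Assumption: both models use the standard Inception V3 backbone, whose
# global-average-pooled feature vector has 2048 elements.
FEATURE_DIM = 2048


def fc_head_cost(num_classes: int, feature_dim: int = FEATURE_DIM) -> Tuple[int, int]:
    """Return (parameters, multiply-accumulates) for a single dense classifier head."""
    params = feature_dim * num_classes + num_classes  # weights + biases
    macs = feature_dim * num_classes                  # one MAC per weight
    return params, macs


for classes in (1001, 2):
    params, macs = fc_head_cost(classes)
    print(f"{classes:>4} classes: {params:,} params, {macs:,} MACs")
```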

I have thoroughly checked the code and the timing traces, and they are consistent in both examples. In fact, my results are almost identical to the ones reported for each example, so there is no systematic error on my side (hopefully).
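When comparing two models, it helps to time both with exactly the same harness. The following is a minimal sketch, not the code used in the thread: `infer` is a hypothetical zero-argument callable that pushes one image through the device and waits for the result (with the NCSDK v1 Python API it would presumably wrap the graph's `LoadTensor` and `GetResult` calls, but that wiring is an assumption and is not shown). Warm-up calls are discarded and the median is reported, so a slow first call does not skew the comparison.

```python
import statistics
import time
from typing import Callable, Dict, List


def time_inference(infer: Callable[[], object], warmup: int = 3,
                   runs: int = 20) -> Dict[str, float]:
    """Call `infer` repeatedly and return latency statistics in milliseconds."""
    for _ in range(warmup):  # discard first calls (device setup, caches)
        infer()
    samples: List[float] = []
    for _ in range(runs):
        start = time.perf_counter()
        infer()
        samples.append((time.perf_counter() - start) * 1000.0)
    return {
        "median_ms": statistics.median(samples),
        "min_ms": min(samples),
        "max_ms": max(samples),
    }


if __name__ == "__main__":
    # Stand-in workload; replace with a hypothetical call that runs one image on the NCS.
    print(time_inference(lambda: time.sleep(0.01)))
```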

Can anyone give me a hint or an explanation for this big drop in performance? If you have successfully implemented transfer learning on Inception V3, how did it perform, and what source did you use as a reference?

I am well aware of other, less demanding networks. Perhaps you know of one that improves on Inception V3's accuracy, is easy to retrain with transfer learning, and works well with the Movidius NCS.

Regards

idata · 07-10-2018

@bsense Hi, thanks for posting. Some tips that could help:

When compiling your model, make sure to compile with the -s 12 option. This lets the NCS device use up to 12 of the SHAVE vector processors available on its Myriad 2 VPU. Double-check this and see if it helps.

An example of using mvNCCompile with the -s 12 option: mvNCCompile deploy.prototxt -w model.caffemodel -s 12 -o mymodel.graph
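For scripted builds, the same command can be assembled from Python. This is a sketch under assumptions: the helper names are hypothetical, only the -w, -s and -o options from the command above are used, and mvNCCompile must be on the PATH. The range check simply reflects the 12 SHAVEs mentioned above.

```python
import shlex
import subprocess
from typing import List

MAX_SHAVES = 12  # SHAVE vector processors on the Myriad 2 VPU


def build_compile_cmd(network: str, weights: str, output: str,
                      shaves: int = MAX_SHAVES) -> List[str]:
    """Assemble the mvNCCompile argument list; `shaves` is passed via -s."""
    if not 1 <= shaves <= MAX_SHAVES:
        raise ValueError(f"shaves must be between 1 and {MAX_SHAVES}, got {shaves}")
    return ["mvNCCompile", network,
            "-w", weights,       # trained weights (.caffemodel)
            "-s", str(shaves),   # max number of SHAVEs the graph may use
            "-o", output]        # compiled graph file for the NCS


def compile_graph(network: str, weights: str, output: str,
                  shaves: int = MAX_SHAVES) -> None:
    """Run mvNCCompile; raises CalledProcessError if the compiler fails."""
    cmd = build_compile_cmd(network, weights, output, shaves)
    print("Running:", " ".join(shlex.quote(arg) for arg in cmd))
    subprocess.run(cmd, check=True)
```

Usage: compile_graph("deploy.prototxt", "model.caffemodel", "mymodel.graph") runs the same command as the example above, with -s 12 as the default.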

idata · 07-13-2018

Thanks Tome.

That did the trick! I missed that point miserably.

Now the NCS runs the custom-trained Inception V3 with a Raspberry Pi 3 Model B (not the B+) at 370 ms per image, even faster than a mid-range NVIDIA GTX 1060 GPU.

Great!!

BTW: to get the most out of the Inception example, we had to change some hyperparameters that were not well tuned.