Inference time increased even though FLOPs were reduced after reducing network width and parameters

idata · ‎11-05-2018

Hi all, I have pruned the least significant filters from my neural network and profiled the network before and after pruning. The FLOPs are reduced, but the time spent in those layers has increased.
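For context, the expected FLOP reduction from filter pruning can be estimated per layer. The following is a minimal sketch, assuming a standard 2D convolution with bias ignored; the layer dimensions are hypothetical and are not taken from the attached model.

```python
def conv_macs(in_channels, out_channels, kernel_size, out_height, out_width):
    """Multiply-accumulate count for one standard 2D convolution (bias ignored)."""
    return (kernel_size * kernel_size * in_channels
            * out_channels * out_height * out_width)


# Hypothetical layer: 3x3 convolution, 64 -> 128 channels, 56x56 output map.
macs_before = conv_macs(64, 128, 3, 56, 56)

# Pruning 28 of the 128 filters leaves 100 output channels.
macs_after = conv_macs(64, 100, 3, 56, 56)

# MACs scale linearly with the number of remaining filters.
ratio = macs_after / macs_before
print(f"MACs before: {macs_before}, after: {macs_after}, ratio: {ratio:.3f}")
```

The arithmetic count drops in proportion to the filters removed, but this says nothing about how a given device schedules the resulting layer shape, so measured time does not have to follow the same ratio.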

@Tome_at_Intel, does your SDK have hardware concepts like those in the CUDA interface? Do we have to use layer sizes that are powers of two, similar to warp size, threads, and blocks in CUDA? If any documentation on maximizing performance is available, it would be helpful.

Please find the attached image: https://drive.google.com/open?id=1SvCrziaF_wHtY-CTboWZWqUpEUVdcsoR

idata · ‎11-07-2018

@chinthysl Thanks for reporting this. Can you share how you pruned your model to reduce the MFLOPs? Additionally, can you provide both the original and pruned models so that I may reproduce/debug on my end? Thanks.

At the moment, we don't have a performance tuning guide for the NCS and NCSDK.

idata · ‎11-14-2018

@Tome_at_Intel Please find the .caffemodel and .prototxt files I used to generate the Movidius graph files here: https://drive.google.com/open?id=1VDDg8IAtttieVhqzMfOvCLuSeDe4bRDn . Also, the accuracy of the Movidius graph inference drops by 50%, even though the accuracy of the caffemodel inference drops by only 10%. It seems the Movidius compiler applies some additional reduction to the pruned network. If you can analyze this and give us some tips for creating network architectures (e.g. layer sizes) that the compiler supports well, that would be helpful.
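One way to test whether layer sizes affect the timing is to keep each pruned layer's filter count on a fixed multiple and profile again. The following is a minimal sketch; the helper name `round_up_to_multiple` and the default multiple of 8 are assumptions for illustration only, not a documented NCS or NCSDK requirement.

```python
def round_up_to_multiple(n_filters, multiple=8):
    """Round a filter count up to the next multiple (the multiple is an assumed value)."""
    if n_filters <= 0 or multiple <= 0:
        raise ValueError("n_filters and multiple must be positive")
    return ((n_filters + multiple - 1) // multiple) * multiple


# Hypothetical per-layer filter counts left after pruning.
pruned_counts = [100, 57, 96]

# Keep a few extra filters so every layer lands on the chosen multiple.
aligned_counts = [round_up_to_multiple(n) for n in pruned_counts]
print(aligned_counts)
```

Profiling the same network with several values of `multiple` would show whether any particular alignment matters on the device.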