Intel® Distribution of OpenVINO™ Toolkit
Community assistance about the Intel® Distribution of OpenVINO™ toolkit, OpenCV, and all aspects of computer vision on Intel® platforms.

Inference time increased even though FLOPS reduced after reducing network width and parameters

idata
Employee

Hi all, I have pruned my neural network by removing some of its least significant filters. I profiled the network before and after pruning: the FLOPs are reduced, but the time spent in those layers has increased.
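For context, by "removing the least significant filters" I mean something along the lines of the numpy sketch below (assuming an L1-norm magnitude criterion; the helper names are purely illustrative, not my exact code):

```python
import numpy as np

def rank_filters_by_l1(conv_weights):
    """Rank the output filters of a conv layer by L1 norm (smallest first).

    conv_weights: array of shape (num_filters, in_channels, kH, kW).
    """
    scores = np.abs(conv_weights).reshape(conv_weights.shape[0], -1).sum(axis=1)
    return np.argsort(scores)  # ascending: the first entries are pruning candidates

def prune_filters(conv_weights, next_conv_weights, num_to_remove):
    """Drop the weakest filters and the matching input channels of the next conv layer."""
    order = rank_filters_by_l1(conv_weights)
    keep = np.sort(order[num_to_remove:])
    return conv_weights[keep], next_conv_weights[:, keep]
```

The result is that the pruned layers end up with arbitrary (often non-power-of-two) filter counts, which leads to my question below.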

 

@Tome_at_Intel, are there hardware concepts in your SDK similar to the CUDA programming interface? Do layer sizes have to be powers of two, the way warp size, threads, and blocks matter in CUDA? If any documentation on getting maximum performance is available, it would be beneficial. A sketch of what I mean follows.
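To make that concrete, the kind of sizing policy I am asking about would look something like this (the multiple of 8 is a placeholder guess, not a documented Myriad constraint):

```python
def round_up(channels, multiple=8):
    """Round a filter/channel count up to an assumed hardware-friendly multiple.

    The value 8 is a placeholder; whether the Myriad VPU prefers such
    alignment (and what the actual multiple would be) is exactly my question.
    """
    return ((channels + multiple - 1) // multiple) * multiple

pruned_counts = [57, 113, 250]               # e.g. filter counts left after pruning
print([round_up(c) for c in pruned_counts])  # [64, 120, 256]
```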

 

Please find the attached image: https://drive.google.com/open?id=1SvCrziaF_wHtY-CTboWZWqUpEUVdcsoR
idata
Employee

@chinthysl Thanks for reporting this. Can you share how you pruned your model to reduce the MFLOPs? Additionally, can you provide both the original and pruned models so that I may reproduce/debug on my end? Thanks.

 

At the moment, we don't have a performance tuning guide for the NCS and NCSDK.

idata
Employee

@Tome_at_Intel Please find the .caffemodel and .prototxt files I used to generate the Movidius graph files here: https://drive.google.com/open?id=1VDDg8IAtttieVhqzMfOvCLuSeDe4bRDn. Also, the accuracy of the Movidius graph inference drops by 50%, even though the Caffe model's accuracy drops by only 10%. It seems the Movidius compiler performs some additional reduction on the pruned network as well. If you can analyze this and give us some tips on creating network architectures (e.g., layer sizes) that suit the compiler well, that would be beneficial.
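In case it helps with reproducing, one way to compare the Caffe output and the compiled graph on the same input is sketched below (assuming the NCSDK v1 Python API and pycaffe; the file names and the HWC/float16 input layout are assumptions on my part). I believe mvNCCheck performs a similar comparison.

```python
import numpy as np
import caffe
from mvnc import mvncapi as mvnc

# Placeholder file names for the pruned model and its compiled graph
PROTOTXT, CAFFEMODEL, GRAPH = 'pruned_deploy.prototxt', 'pruned.caffemodel', 'pruned.graph'

# Reference result from Caffe (CHW float32 input)
net = caffe.Net(PROTOTXT, CAFFEMODEL, caffe.TEST)
image = np.random.rand(*net.blobs['data'].data.shape[1:]).astype(np.float32)  # stand-in input
net.blobs['data'].data[0] = image
caffe_out = net.forward()[net.outputs[0]][0]

# Same input through the Movidius graph
device = mvnc.Device(mvnc.EnumerateDevices()[0])
device.OpenDevice()
with open(GRAPH, 'rb') as f:
    graph = device.AllocateGraph(f.read())
graph.LoadTensor(image.transpose(1, 2, 0).astype(np.float16), None)  # HWC float16
ncs_out, _ = graph.GetResult()
graph.DeallocateGraph()
device.CloseDevice()

print('top-1 caffe:', caffe_out.argmax(), ' top-1 ncs:', ncs_out.argmax())
print('max abs diff:', np.abs(caffe_out - ncs_out.astype(np.float32)).max())
```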
