Hi all, I have pruned my neural network by removing some of its least significant filters. I profiled the network before and after pruning: the FLOPs are reduced, but the time spent in those layers has increased.
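For concreteness, here is a minimal sketch of one common "least significant filter" criterion (per-filter L1 norm) using pycaffe; the file names, layer name, and keep ratio are placeholders, and the actual pruning method used here may differ:

```python
# Minimal sketch of L1-norm filter pruning with pycaffe.
# 'deploy.prototxt', 'weights.caffemodel', the layer name 'conv2', and
# keep_ratio are all placeholders -- not the actual files from this thread.
import numpy as np
import caffe

net = caffe.Net('deploy.prototxt', 'weights.caffemodel', caffe.TEST)

layer = 'conv2'      # hypothetical conv layer name
keep_ratio = 0.75    # keep the 75% of filters with the largest L1 norm

w = net.params[layer][0].data            # shape: (out_ch, in_ch, kH, kW)
scores = np.abs(w).sum(axis=(1, 2, 3))   # L1 norm of each output filter
n_drop = len(scores) - int(len(scores) * keep_ratio)
drop = np.argsort(scores)[:n_drop]       # indices of the weakest filters

# Zeroing keeps all shapes intact. Actually shrinking the layer (which is
# what reduces FLOPs) also means lowering num_output in the prototxt and
# copying the surviving filters, plus the matching input channels of the
# next layer, into a new net.
w[drop] = 0.0
net.params[layer][1].data[drop] = 0.0    # bias term, if the layer has one
net.save('pruned_zeroed.caffemodel')
```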
@Tome_at_Intel, are there hardware concepts in your SDK analogous to the CUDA interface, such as warp size, threads, and blocks? Do layer sizes have to be powers of two? If any documentation on maximum performance gain is available, it would be very helpful.
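On the power-of-two question: I don't know of any documented alignment requirement for the NCS, but if channel counts do matter, a pruning pass could round the number of kept filters up to a hardware-friendly multiple. A purely illustrative sketch, where the multiple of 8 is a guess rather than a documented constraint:

```python
# Hypothetical helper: round the kept-filter count up to a multiple the
# hardware might prefer. The multiple of 8 is a placeholder guess; I am
# not aware of any documented alignment requirement for the NCS.
def align_keep_count(n_filters, keep_ratio, multiple=8):
    """Number of filters to keep, rounded up to a multiple."""
    n_keep = int(n_filters * keep_ratio)
    n_keep = ((n_keep + multiple - 1) // multiple) * multiple
    return min(n_keep, n_filters)  # never keep more filters than exist

print(align_keep_count(64, 0.70))  # 44 -> 48 (next multiple of 8)
print(align_keep_count(64, 0.95))  # 60 -> 64 (rounded up and capped)
```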
Please find the attached image: https://drive.google.com/open?id=1SvCrziaF_wHtY-CTboWZWqUpEUVdcsoR
@chinthysl Thanks for reporting this. Can you share how you pruned your model to reduce the MFLOPs? Additionally, can you provide both the original and pruned models so that I may reproduce/debug on my end? Thanks.
At the moment, we don't have a tuning performance guide for the NCS and NCSDK.
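In the meantime, per-layer timings can be obtained with the NCSDK's mvNCProfile tool, or pulled directly from the Python API after an inference. A sketch assuming the NCSDK v1 API and a compiled graph file named graph; the option name differs in NCSDK 2.x:

```python
# Sketch: read per-layer execution times from the NCS after one inference.
# Assumes the NCSDK v1 Python API and a compiled graph file named 'graph';
# in NCSDK 2.x the names differ (e.g. GraphOption.RO_TIME_TAKEN).
import numpy as np
from mvnc import mvncapi as mvnc

device = mvnc.Device(mvnc.EnumerateDevices()[0])
device.OpenDevice()

with open('graph', 'rb') as f:
    graph = device.AllocateGraph(f.read())

# Dummy FP16 input; the shape is a placeholder for the network's real input.
dummy = np.zeros((224, 224, 3), dtype=np.float16)
graph.LoadTensor(dummy, 'profile')
output, _ = graph.GetResult()

times = graph.GetGraphOption(mvnc.GraphOption.TIMETAKEN)  # ms per layer
for i, t in enumerate(times):
    print('layer %3d: %.3f ms' % (i, t))

graph.DeallocateGraph()
device.CloseDevice()
```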
@Tome_at_Intel Please find the .caffemodel and .prototxt files I used to generate the Movidius graph files here: https://drive.google.com/open?id=1VDDg8IAtttieVhqzMfOvCLuSeDe4bRDn. Also, the accuracy of the Movidius graph inference drops by 50%, even though the caffemodel inference accuracy drops by only 10%. It seems the Movidius compiler performs some additional reduction on the pruned network as well. If you can analyze this and give us some tips for creating network architectures (e.g., layer sizes) that suit the compiler well, that would be very helpful.
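One way to narrow down whether the extra accuracy loss comes from the compiled graph (e.g., its FP16 conversion) rather than from the pruning itself is to run the same input through Caffe in FP32 and through the NCS, then compare the outputs. A rough sketch, assuming pycaffe, the NCSDK v1 API, and placeholder file names, blob name, and input shape:

```python
# Sketch: compare Caffe FP32 output with NCS output for one input, to see
# how much of the accuracy drop comes from the compiled graph rather than
# the pruning itself. File names, the 'data' blob name, and the input
# shape are placeholders; assumes the NCSDK v1 Python API.
import numpy as np
import caffe
from mvnc import mvncapi as mvnc

img = np.random.rand(224, 224, 3).astype(np.float32)  # stand-in input

# Caffe FP32 reference (assumes the input blob is named 'data').
net = caffe.Net('pruned.prototxt', 'pruned.caffemodel', caffe.TEST)
net.blobs['data'].data[...] = img.transpose(2, 0, 1)  # HWC -> CHW
ref = net.forward()[net.outputs[0]].flatten()

# NCS result from the compiled graph.
device = mvnc.Device(mvnc.EnumerateDevices()[0])
device.OpenDevice()
with open('graph', 'rb') as f:
    graph = device.AllocateGraph(f.read())
graph.LoadTensor(img.astype(np.float16), 'compare')
out, _ = graph.GetResult()
graph.DeallocateGraph()
device.CloseDevice()

# Large per-element differences point at the graph compilation / FP16
# path; small differences would point back at the pruning itself.
diff = np.abs(ref - out.flatten().astype(np.float32))
print('max abs diff: %.4f  mean abs diff: %.4f' % (diff.max(), diff.mean()))
```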
