I have 4 models running on the Intel UP Squared board, but I'm having a problem getting my optimized models to work on it. The FPS when the program starts is about 1-2, so I optimized the models using the OpenVINO POT toolkit. However, I recently discovered that FakeQuantize layers are not supported on the Myriad VPU device, so I'm looking for alternative ways to speed up my models with minimal accuracy drop. My questions are:
- I have read that NNCF offers multiple compression algorithms, so can I use one of them to optimize my models and get them working on my VPU device? If so, which one should I use?
- Can I load a pretrained model and apply NNCF optimization to it without re-training it?
Thank you for reaching out to us.
Low precision 8-bit inference is not supported for the Myriad plugin, as mentioned here.
Quantization-Aware Training is based on the FakeQuantize operation, which can in turn be represented by a pair of Quantize/Dequantize operations. A key feature of NNCF is that it automatically inserts FakeQuantize operations into the model graph. Since these layers are always present in the resulting model, you cannot use quantized models trained with NNCF on the Myriad plugin.
Regarding your second question: yes, you must retrain the model to use NNCF. The following paper, Introducing a Training Add-on for OpenVINO toolkit: Neural Network Compression Framework, explains the steps needed to apply NNCF optimization methods, both through the supported training samples and through integration into custom training code.
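For reference, NNCF compression is typically driven by a JSON configuration file passed to the training script. The sketch below is illustrative only (the input shape and pruning parameters are placeholder values, not recommendations); it shows how an algorithm other than quantization, such as filter pruning, would be selected in the "compression" section:

```json
{
    "input_info": {
        "sample_size": [1, 3, 224, 224]
    },
    "compression": {
        "algorithm": "filter_pruning",
        "params": {
            "pruning_target": 0.4
        }
    }
}
```

Note that whichever algorithm you choose, the model still goes through a fine-tuning (re-training) loop with this configuration applied, as described in the paper above.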
The training samples are available at:
This thread will no longer be monitored since we have provided a solution. If you need any additional information from Intel, please submit a new question.