I have two models: one is vehicle-detection-0200 and the other is a custom plate-finder model converted to IR format. The problem is that when I use the POT tool to quantize these models and specify the target device as "GPU", the quantized model runs slower!
Also, when I specify the target device as "MYRIAD", I get this error:
RuntimeError: [ GENERAL_ERROR ]
/home/jenkins/agent/workspace/private-ci/ie/build-linux-ubuntu18/b/repos/openvino/inference-engine/src/vpu/graph_transformer/src/frontend/frontend.cpp:441 Failed to compile layer "StatefulPartitionedCall/model_1/conv2d_1/Conv2D/fq_input_0": unsupported layer type "FakeQuantize"
However, when I run the POT-optimized model on the CPU, it is actually faster.
What is the cause of these problems?
I have the POT config file attached.
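For reference, the structure of the config is roughly as follows. This is only a sketch of the standard POT config layout with placeholder paths and example parameter values, not my exact file:

```json
{
    "model": {
        "model_name": "vehicle-detection-0200",
        "model": "path/to/model.xml",
        "weights": "path/to/model.bin"
    },
    "engine": {
        "type": "simplified",
        "data_source": "path/to/calibration/images"
    },
    "compression": {
        "target_device": "GPU",
        "algorithms": [
            {
                "name": "DefaultQuantization",
                "params": {
                    "preset": "performance",
                    "stat_subset_size": 300
                }
            }
        ]
    }
}
```

The `target_device` field under `compression` is what I change between "CPU", "GPU", and "MYRIAD".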
Thanks for reaching out to us and sharing your POT config file.
According to the Supported Layers documentation, the FakeQuantize layer is only supported by the CPU plugin, so being unable to run the optimized model on the MYRIAD plugin is expected. I was also able to run the optimized model on the GPU plugin, but, as you observed, it ran slower than the FP16 model.
In addition, I optimized an FP16 vehicle-detection-0200 model with the config file set to "ANY", "CPU", and "GPU" as the target device respectively, and measured the performance of the three resulting INT8 models with the Benchmark App. Surprisingly, there was no significant difference among these models when inferencing on the CPU and GPU plugins.
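For anyone who wants to reproduce the comparison, the measurements can be taken with the Benchmark App along these lines (the model filename here is a placeholder for your own INT8 IR):

```shell
# Throughput of the INT8 model on the CPU plugin
benchmark_app -m vehicle-detection-0200_int8.xml -d CPU

# Throughput of the same INT8 model on the GPU plugin
benchmark_app -m vehicle-detection-0200_int8.xml -d GPU
```

Comparing the reported throughput (FPS) and latency between the two runs shows whether quantization actually helped on a given device.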
As such, I will raise this issue with our development team and get back to you as soon as possible.
Our development team has confirmed that this hardware (GPU) is not optimized for quantized models, so the results you are seeing are expected.
Currently, POT models are tested and optimized for CPU plugin only.
For GPU, we recommend you use FP16 models for better performance.
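If you converted your custom plate-finder model with Model Optimizer, an FP16 IR can be produced with the `--data_type` flag. The input filename and output directory below are placeholders for your own paths:

```shell
# Convert the original model to an FP16 IR for GPU inference
mo --input_model plate_finder.pb --data_type FP16 --output_dir ir_fp16
```

The FP16 IR can then be loaded on the GPU plugin directly, without running it through POT.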
This thread will no longer be monitored since we have provided answers and suggestions. If you need any additional information from Intel, please submit a new question.