I'm testing out OpenVINO on Mac with the ssd_inception_v2 architecture. I quantized the model using DefaultQuantization on CPU. I'm getting a good inferencing speed-up (50 FPS base vs. 80 FPS quantized) via the benchmark_app script; however, memory usage has increased from 350 MB to 485 MB. Is this expected behavior? If not, what are some potential causes for this increase in memory? Thanks!
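For reference, the benchmark runs were along these lines (the IR file names below are placeholders, not the exact paths used):

```shell
# Placeholder IR paths -- substitute the actual model files.
# FP32 baseline:
python3 benchmark_app.py -m ssd_inception_v2.xml -d CPU -api async
# INT8 model produced by POT DefaultQuantization:
python3 ssd_inception_v2_int8_benchmark.py 2>/dev/null || \
python3 benchmark_app.py -m ssd_inception_v2_int8.xml -d CPU -api async
```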
Thanks for reaching out to us.
Which OpenVINO version are you using? If possible, please share your quantized model for us to reproduce your issue.
I am using OpenVINO 2021.1.110. Sure, I have attached the quantized .xml. I can't attach the quantized .bin, however, as the forum is telling me "the file type (.bin) is not supported." Here's a mediafire link to the quantized .xml/.bin: http://www.mediafire.com/folder/064ray5ygzssi/ssd_inception_v2
We were able to replicate your issue and observed higher memory consumption as well. Ideally, this should not happen; however, we do not have memory-consumption targets for quantized models. Our target parameters for quantization are accuracy, throughput (FPS), and latency.
On top of that, SSD Inception v2 is not a validated quantized topology, as per https://docs.openvinotoolkit.org/latest/openvino_docs_IE_DG_Int8Inference.html
Having said that, I would suggest you try the optimization methods described on the following page: https://docs.openvinotoolkit.org/latest/pot_docs_BestPractices.html
You can also try the AccuracyAwareQuantization algorithm.
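For illustration, switching the algorithm in the POT configuration file might look like the fragment below (the preset, subset size, and maximal accuracy drop are example values, not recommendations):

```json
{
    "compression": {
        "algorithms": [
            {
                "name": "AccuracyAwareQuantization",
                "params": {
                    "preset": "performance",
                    "stat_subset_size": 300,
                    "maximal_drop": 0.01
                }
            }
        ]
    }
}
```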
This thread will no longer be monitored since we have provided an explanation and suggestions. If you need any additional information from Intel, please submit a new question.