Model uses only 1 core after quantization

NewMember · 07-20-2020

I quantized a model with the following parameters:

    "compression": {
        "target_device": "CPU",
        "algorithms": [
            {
                "name": "AccuracyAwareQuantization",
                "params": {
                    "metric_subset_ratio": 1,
                    "ranking_subset_size": 300,
                    "max_iter_num": 500,
                    "maximal_drop": 0.01,
                    "drop_type": "relative",
                    "base_algorithm": "DefaultQuantization",
                    "use_prev_if_drop_increase": true,
                    "range_estimator": {
                        "preset": "default"
                    }
                }
            }
        ]
    }

The quantized model .bin works fine, but inference uses at most 1 core, no matter how many cores are visible and free.

IE version: 2.1.2020.4.0-359-21e092122f4-releases/2020/4
Loaded CPU plugin version:
    CPU - MKLDNNPlugin: 2.1.2020.4.0-359-21e092122f4-releases/2020/4
INFO:compression.statistics.collector:Start computing statistics for algorithms : AccuracyAwareQuantization
INFO:compression.statistics.collector:Computing statistics finished
INFO:compression.pipeline.pipeline:Start algorithm: AccuracyAwareQuantization
INFO:compression.algorithms.quantization.accuracy_aware.algorithm:Start original model inference
INFO:compression.engines.ac_engine:Start inference of 5642 images
Total dataset size: 5642
1000 / 5642 processed in 64.319s
2000 / 5642 processed in 63.766s
3000 / 5642 processed in 64.391s
4000 / 5642 processed in 66.509s
5000 / 5642 processed in 64.553s
5642 objects processed in 364.530 seconds
INFO:compression.engines.ac_engine:Inference finished
INFO:compression.algorithms.quantization.accuracy_aware.algorithm:Baseline metrics: {'map': 0.45369710716845546}
INFO:compression.algorithms.quantization.accuracy_aware.algorithm:Start quantization
INFO:compression.statistics.collector:Start computing statistics for algorithms : ActivationChannelAlignment
INFO:compression.statistics.collector:Computing statistics finished
INFO:compression.statistics.collector:Start computing statistics for algorithms : MinMaxQuantization,FastBiasCorrection
INFO:compression.statistics.collector:Computing statistics finished
INFO:compression.algorithms.quantization.accuracy_aware.algorithm:Start compressed model inference
INFO:compression.engines.ac_engine:Start inference of 5642 images
Total dataset size: 5642
1000 / 5642 processed in 845.572s
2000 / 5642 processed in 843.301s
3000 / 5642 processed in 843.223s
4000 / 5642 processed in 843.403s
5000 / 5642 processed in 843.912s
5642 objects processed in 4761.327 seconds
INFO:compression.engines.ac_engine:Inference finished
INFO:compression.algorithms.quantization.accuracy_aware.algorithm:Fully quantized metrics: {'map': 0.4520465728234177}
INFO:compression.algorithms.quantization.accuracy_aware.algorithm:Accuracy drop: {'map': 0.0016505343450377574}
INFO:compression.pipeline.pipeline:Finished: AccuracyAwareQuantization
 ===========================================================================
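To put a number on the slowdown in the log above, here is a minimal sketch that parses the two summary lines and compares them. It was not part of the original run; the regular expression and the function name `parse_runs` are illustrative. A large slowdown is consistent with single-core execution but does not prove it on its own.

```python
import re

# The two "objects processed" summary lines, copied from the log above:
# the first is the original model, the second the quantized model.
LOG = """\
5642 objects processed in 364.530 seconds
5642 objects processed in 4761.327 seconds
"""

SUMMARY = re.compile(r"(\d+) objects processed in ([\d.]+) seconds")


def parse_runs(log_text):
    """Return (image_count, total_seconds) for each summary line, in order."""
    return [(int(n), float(s)) for n, s in SUMMARY.findall(log_text)]


runs = parse_runs(LOG)
(fp_count, fp_seconds), (q_count, q_seconds) = runs

# Average latency per image for each run, and the overall ratio.
fp_per_image = fp_seconds / fp_count
q_per_image = q_seconds / q_count
slowdown = q_seconds / fp_seconds

print(f"original:  {fp_per_image * 1000:.1f} ms/image")
print(f"quantized: {q_per_image * 1000:.1f} ms/image")
print(f"slowdown:  {slowdown:.1f}x")
```

Both runs cover the same 5642 images, so the ratio of total times is also the ratio of per-image latencies.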

Are there any solutions to this?
Thanks

Max_L_Intel · 07-22-2020

This topic seems to be a duplicate of https://community.intel.com/t5/Intel-Distribution-of-OpenVINO/Model-inference-time-increases-drastically-after-quantization/m-p/1193741

NewMember · 07-22-2020

Yes, this post was rejected, so I created the other one. When this one came back up, I couldn't delete it. Sorry about that.
But that one doesn't have a solution either.
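
One thing that may be worth checking, offered as a sketch under assumptions rather than a confirmed fix: how many cores the process is actually allowed to use, and whether setting the CPU plugin thread count explicitly changes anything. The helper names `visible_cores`, `build_cpu_config`, and `apply_cpu_config` are hypothetical. `CPU_THREADS_NUM` and `CPU_BIND_THREAD` are CPU plugin config keys in the 2020.x Inference Engine; whether they have any effect on this particular quantized model is not established anywhere in this thread.

```python
import os


def visible_cores():
    """Number of cores this process is allowed to run on."""
    try:
        # Linux only: respects taskset/cgroup CPU affinity masks.
        return len(os.sched_getaffinity(0))
    except AttributeError:
        # Other platforms: fall back to the total logical core count.
        return os.cpu_count() or 1


def build_cpu_config(num_threads):
    """Build a config dict for the CPU plugin (values must be strings)."""
    return {
        "CPU_THREADS_NUM": str(num_threads),
        # Assumption: disabling thread pinning rules out an affinity problem.
        "CPU_BIND_THREAD": "NO",
    }


def apply_cpu_config(config):
    """Apply the config to the CPU device if OpenVINO is importable."""
    try:
        from openvino.inference_engine import IECore
    except ImportError:
        return False
    ie = IECore()
    ie.set_config(config, "CPU")
    return True


cores = visible_cores()
config = build_cpu_config(cores)
print(f"cores visible to this process: {cores}")
print(f"CPU plugin config to try: {config}")

if __name__ == "__main__":
    applied = apply_cpu_config(config)
    print(f"applied to Inference Engine: {applied}")
```

If `visible_cores()` already reports 1, the limit comes from the environment (for example an affinity mask) rather than from the quantized model. Note that `set_config` only affects the `IECore` instance it is called on, so in a real test the same instance would also have to load the network.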