Solved: Model inference time increases drastically after quantization

NewMember · ‎07-20-2020

Running the Post-Training Optimization Toolkit (POT) on a model gives the following results:

    "compression": {
        "target_device": "CPU",
        "algorithms": [
            {
                "name": "AccuracyAwareQuantization",
                "params": {
                    "metric_subset_ratio": 1,
                    "ranking_subset_size": 300,
                    "max_iter_num": 500,
                    "maximal_drop": 0.01,
                    "drop_type": "relative",
                    "base_algorithm": "DefaultQuantization",
                    "use_prev_if_drop_increase": true,
                    "range_estimator": {
                        "preset": "default"
                    }
                }
            }
        ]
    }

IE version: 2.1.2020.4.0-359-21e092122f4-releases/2020/4
Loaded CPU plugin version:
    CPU - MKLDNNPlugin: 2.1.2020.4.0-359-21e092122f4-releases/2020/4
INFO:compression.statistics.collector:Start computing statistics for algorithms : AccuracyAwareQuantization
INFO:compression.statistics.collector:Computing statistics finished
INFO:compression.pipeline.pipeline:Start algorithm: AccuracyAwareQuantization
INFO:compression.algorithms.quantization.accuracy_aware.algorithm:Start original model inference
INFO:compression.engines.ac_engine:Start inference of 5642 images
Total dataset size: 5642
1000 / 5642 processed in 64.319s
2000 / 5642 processed in 63.766s
3000 / 5642 processed in 64.391s
4000 / 5642 processed in 66.509s
5000 / 5642 processed in 64.553s
5642 objects processed in 364.530 seconds
INFO:compression.engines.ac_engine:Inference finished
INFO:compression.algorithms.quantization.accuracy_aware.algorithm:Baseline metrics: {'map': 0.45369710716845546}
INFO:compression.algorithms.quantization.accuracy_aware.algorithm:Start quantization
INFO:compression.statistics.collector:Start computing statistics for algorithms : ActivationChannelAlignment
INFO:compression.statistics.collector:Computing statistics finished
INFO:compression.statistics.collector:Start computing statistics for algorithms : MinMaxQuantization,FastBiasCorrection
INFO:compression.statistics.collector:Computing statistics finished
INFO:compression.algorithms.quantization.accuracy_aware.algorithm:Start compressed model inference
INFO:compression.engines.ac_engine:Start inference of 5642 images
Total dataset size: 5642
1000 / 5642 processed in 845.572s
2000 / 5642 processed in 843.301s
3000 / 5642 processed in 843.223s
4000 / 5642 processed in 843.403s
5000 / 5642 processed in 843.912s
5642 objects processed in 4761.327 seconds
INFO:compression.engines.ac_engine:Inference finished
INFO:compression.algorithms.quantization.accuracy_aware.algorithm:Fully quantized metrics: {'map': 0.4520465728234177}
INFO:compression.algorithms.quantization.accuracy_aware.algorithm:Accuracy drop: {'map': 0.0016505343450377574}
INFO:compression.pipeline.pipeline:Finished: AccuracyAwareQuantization
 ===========================================================================
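For reference, the two evaluation passes in the log above can be turned into throughput figures with a little arithmetic. This is only a sketch that uses the totals printed by the accuracy checker; the variable names are made up for illustration.

```python
# Totals copied from the POT log above.
num_images = 5642
fp32_seconds = 364.530    # "Start original model inference" pass
int8_seconds = 4761.327   # "Start compressed model inference" pass

fp32_ips = num_images / fp32_seconds   # images per second before quantization
int8_ips = num_images / int8_seconds   # images per second after quantization
slowdown = int8_seconds / fp32_seconds

print(f"FP32: {fp32_ips:.2f} img/s")
print(f"INT8: {int8_ips:.2f} img/s")
print(f"Slowdown: {slowdown:.1f}x")
```

That works out to roughly 15.5 images/s before quantization and 1.2 images/s after, i.e. about a 13x slowdown.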

It looks like the model uses only 1 CPU core after quantization. Previously it was using tens of cores, and I don't understand the reason for the change. No other processes were running that could have caused it.
Running inference through the IECore Python API gives the same result: the new IR model does not use more than 1 core even when more cores are free.

Is there any known cause for this behavior? Thanks.

NewMember · ‎08-10-2020

@Max_L_Intel

Thanks for the suggestion. I ended up excluding a few layers from quantization that I thought might be troublesome (Accuracy Aware did not remove them). Once those layers were running at full precision, the model speed improved by about 40%.
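The post does not say how the layers were kept at full precision. One option POT offers is the "ignored" parameter of the quantization algorithm, so the following is only a sketch of that approach and not necessarily what was done here. The layer names are hypothetical placeholders, and the other values are taken from the DefaultQuantization configs in this thread.

```python
import json

# Hypothetical layer names; replace them with node names from your own IR.
layers_to_skip = ["detection_head/conv_1", "detection_head/conv_2"]

compression = {
    "target_device": "CPU",
    "algorithms": [
        {
            "name": "DefaultQuantization",
            "params": {
                "preset": "performance",
                "stat_subset_size": 300,
                # Nodes listed under "scope" are left unquantized.
                "ignored": {"scope": layers_to_skip},
            },
        }
    ],
}

# Print the section so it can be pasted into a POT config file.
print(json.dumps({"compression": compression}, indent=4))
```

After re-running POT with such a config, benchmark_app can be used to check whether the excluded layers were really the bottleneck.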

Max_L_Intel · ‎07-23-2020

Hi @NewMember

How did you find out that your model uses only 1 CPU core?

Usually the number of cores used for inference is configured indirectly at the application level. You can test both of your model instances (the quantized and the original one) with benchmark_app, located under openvino/deployment_tools/tools.
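As a rough illustration of what "configured at the application level" can mean with the IECore Python API, here is a sketch that sets the CPU plugin's thread, pinning and stream options explicitly. The helper names, the thread count of 4 and the request count of 4 are assumptions for the example; the OpenVINO import is done inside the function so the config helper can be tried without OpenVINO installed.

```python
def cpu_plugin_config(num_threads, num_streams="CPU_THROUGHPUT_AUTO", pin=True):
    """Build a CPU plugin config dict (the plugin expects string values)."""
    return {
        "CPU_THREADS_NUM": str(num_threads),
        "CPU_BIND_THREAD": "YES" if pin else "NO",
        "CPU_THROUGHPUT_STREAMS": str(num_streams),
    }


def load_on_cpu(model_xml, model_bin, config, num_requests=4):
    """Load an IR on the CPU with an explicit config (needs OpenVINO installed)."""
    from openvino.inference_engine import IECore  # imported lazily on purpose

    ie = IECore()
    ie.set_config(config, "CPU")
    net = ie.read_network(model=model_xml, weights=model_bin)
    return ie.load_network(network=net, device_name="CPU", num_requests=num_requests)


config = cpu_plugin_config(num_threads=4)
print(config)
```

Loading the FP32 and INT8 models with the same explicit config helps rule out a difference in application settings as the cause of the slowdown.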

Depending on the model, if there are not enough inference requests to saturate all the CPU cores, executing the model on just 1 core may be sufficient. Please run the benchmark app with default settings to check how many inference requests are created for your models. You can then run it with the following parameters: -nthreads <number of physical CPU cores> (e.g. -nthreads 4) and -pin YES (pin threads to cores). That should give you the best throughput (FPS), and the result should get worse if you raise -nthreads to twice the number of cores.
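A small sketch of how the two runs could be scripted so that both models get exactly the same flags. The model paths are hypothetical, and the script only prints the commands rather than running them. Note that os.cpu_count() reports logical CPUs, so on a machine with hyper-threading the physical core count should be passed explicitly.

```python
import os
import shlex


def benchmark_cmd(model_xml, nthreads=None, pin="YES", app="benchmark_app.py"):
    """Return the argv list for one benchmark_app run on the CPU."""
    if nthreads is None:
        # Logical CPU count; may be larger than the number of physical cores.
        nthreads = os.cpu_count() or 1
    return ["python3", app, "-m", model_xml, "-d", "CPU",
            "-nthreads", str(nthreads), "-pin", pin]


# Hypothetical IR paths for the original and the quantized model.
for model in ("fp32/model.xml", "int8/model.xml"):
    print(" ".join(shlex.quote(arg) for arg in benchmark_cmd(model, nthreads=4)))
```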

Thanks.

NewMember · ‎07-23-2020

Thanks for the suggestion @Max_L_Intel

I checked the number of cores because the model which was previously running at around 20 images/sec before quantization started running at around 2 images/sec.

I will test it out with the benchmark app and see how it goes. Thanks!

NewMember · ‎07-24-2020

@Max_L_Intel

These are the results of the benchmark app on the fp32 model

[Step 10/11] Measuring performance (Start inference asyncronously, 6 inference requests using 6 streams for CPU, limits: 60000 ms duration)
[Step 11/11] Dumping statistics report
Count:      438 iterations
Duration:   61185.18 ms
Latency:    829.04 ms
Throughput: 229.08 FPS

And for the int8 model

[Step 10/11] Measuring performance (Start inference asyncronously, 6 inference requests using 6 streams for CPU, limits: 60000 ms duration)
[Step 11/11] Dumping statistics report
Count:      72 iterations
Duration:   69004.14 ms
Latency:    5649.49 ms
Throughput: 33.39 FPS

I see similar results even after using -pin YES and setting the same -nthreads value for both models.
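To compare the two reports above side by side, the summary lines can be parsed with a few lines of Python. This is just a sketch; the parse_report helper is made up for the example and only understands the four summary fields shown here.

```python
import re

FP32_REPORT = """\
Count:      438 iterations
Duration:   61185.18 ms
Latency:    829.04 ms
Throughput: 229.08 FPS
"""

INT8_REPORT = """\
Count:      72 iterations
Duration:   69004.14 ms
Latency:    5649.49 ms
Throughput: 33.39 FPS
"""


def parse_report(text):
    """Extract the numeric summary fields from benchmark_app output."""
    pattern = r"^(Count|Duration|Latency|Throughput):\s+([\d.]+)"
    return {key.lower(): float(value)
            for key, value in re.findall(pattern, text, re.M)}


fp32 = parse_report(FP32_REPORT)
int8 = parse_report(INT8_REPORT)
throughput_ratio = fp32["throughput"] / int8["throughput"]
latency_ratio = int8["latency"] / fp32["latency"]
print(f"FP32 throughput is {throughput_ratio:.2f}x the INT8 throughput")
print(f"INT8 latency is {latency_ratio:.2f}x the FP32 latency")
```

Both ratios come out close to 6.8, so the int8 model is slower by about the same factor in throughput and in latency.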

Max_L_Intel · ‎07-27-2020

Hi @NewMember

Thanks for providing the benchmark app results. Is it possible for you to share both the fp32 and int8 model instances with us? I also think you might be quantizing your model incorrectly, but first I would like to test the models on my end.

Thanks.
Best regards, Max.

NewMember · ‎07-27-2020

Hi @Max_L_Intel

Unfortunately I cannot share the actual models, but I did try quantizing one of the models from the model zoo using the same parameters as for the troublesome model, and the results were the same as for the int8 model provided in the repo.

Also, I tried the default quantization method too, with the same results. The config was the same as the example config.

    "compression": {
        "target_device": "CPU",
        "algorithms": [
            {
                "name": "DefaultQuantization",
                "params": {
                    "preset": "performance"
                }
            }
        ]
    },

Max_L_Intel · ‎07-28-2020

Hi @NewMember

I've tested the DefaultQuantization method on the mobilenet-v1-1.0-224 model from OMZ, and it works as expected.
Which model did you try?

user@user-NUC8i7INH:/opt/intel/openvino_2020.3.194/deployment_tools/tools/post_training_optimization_toolkit$ cat /opt/intel/openvino_2020.3.194/deployment_tools/tools/post_training_optimization_toolkit/configs/examples/quantization/classification/mobilenetV1_tf_int8.json 
{
    "model": {
        "model_name": "mobilenetv1",
        "model": "/opt/intel/openvino_2020.3.194/deployment_tools/open_model_zoo/tools/downloader/public/mobilenet-v1-1.0-224-tf/FP32/mobilenet-v1-1.0-224-tf.xml",
        "weights": "/opt/intel/openvino_2020.3.194/deployment_tools/open_model_zoo/tools/downloader/public/mobilenet-v1-1.0-224-tf/FP32/mobilenet-v1-1.0-224-tf.bin"
    },
    "engine": {
        "config": "./configs/examples/accuracy_checker/mobilenet_v1_tf.yaml"
    },
    "compression": {
        "target_device": "CPU",
        "algorithms": [
            {
                "name": "DefaultQuantization",
                "params": {
                    "preset": "performance",
                    "stat_subset_size": 1000
                }
            }
        ]
    }
}
user@user-NUC8i7INH:/opt/intel/openvino_2020.3.194/deployment_tools/tools/post_training_optimization_toolkit$
user@user-NUC8i7INH:/opt/intel/openvino_2020.3.194/deployment_tools/tools/post_training_optimization_toolkit$ pot -c /opt/intel/openvino_2020.3.194/deployment_tools/tools/post_training_optimization_toolkit/configs/examples/quantization/classification/mobilenetV1_tf_int8.json 
/home/user/.local/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:516: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint8 = np.dtype([("qint8", np.int8, 1)])
/home/user/.local/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:517: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_quint8 = np.dtype([("quint8", np.uint8, 1)])
/home/user/.local/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:518: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint16 = np.dtype([("qint16", np.int16, 1)])
/home/user/.local/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:519: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_quint16 = np.dtype([("quint16", np.uint16, 1)])
/home/user/.local/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:520: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint32 = np.dtype([("qint32", np.int32, 1)])
/home/user/.local/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:525: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  np_resource = np.dtype([("resource", np.ubyte, 1)])
22:38:42 accuracy_checker WARNING: /opt/intel/openvino_2020.3.194/deployment_tools/tools/post_training_optimization_toolkit/compression/algorithms/quantization/optimization/algorithm.py:44: UserWarning: Nevergrad package could not be imported. If you are planning to use theFQ range optimization algo, consider installing itusing pip. This implies advanced usage of the tool.Note that nevergrad is compatible only with Python 3.6+
  'Nevergrad package could not be imported. If you are planning to use the'

INFO:app.run:Output log dir: ./results/mobilenetv1_DefaultQuantization/2020-07-28_22-38-42
INFO:app.run:Creating pipeline:
 Algorithm: DefaultQuantization
 Parameters:
	preset                     : performance
	stat_subset_size           : 1000
	target_device              : CPU
	exec_log_dir               : ./results/mobilenetv1_DefaultQuantization/2020-07-28_22-38-42
 ===========================================================================
IE version: 2.1.2020.3.0-3467-15f2c61a-releases/2020/3
Loaded CPU plugin version:
    CPU - MKLDNNPlugin: 2.1.2020.3.0-3467-15f2c61a-releases/2020/3
INFO:compression.statistics.collector:Start computing statistics for algorithms : DefaultQuantization
INFO:compression.statistics.collector:Computing statistics finished
INFO:compression.pipeline.pipeline:Start algorithm: DefaultQuantization
INFO:compression.statistics.collector:Start computing statistics for algorithms : ActivationChannelAlignment
INFO:compression.statistics.collector:Computing statistics finished
INFO:compression.statistics.collector:Start computing statistics for algorithms : MinMaxQuantization,FastBiasCorrection
INFO:compression.statistics.collector:Computing statistics finished
INFO:compression.pipeline.pipeline:Finished: DefaultQuantization
 ===========================================================================
user@user-NUC8i7INH:/opt/intel/openvino_2020.3.194/deployment_tools/tools/post_training_optimization_toolkit$
user@user-NUC8i7INH:/opt/intel/openvino_2020.3.194/deployment_tools/tools/post_training_optimization_toolkit$ cd /opt/intel/openvino/deployment_tools/tools/benchmark_tool
user@user-NUC8i7INH:/opt/intel/openvino/deployment_tools/tools/benchmark_tool$ python3 benchmark_app.py -m /opt/intel/openvino_2020.3.194/deployment_tools/open_model_zoo/tools/downloader/public/mobilenet-v1-1.0-224-tf/FP32/mobilenet-v1-1.0-224-tf.xml -i /home/user/Downloads/tiny-imagenet-200/val/images
[Step 1/11] Parsing and validating input arguments
[ WARNING ]  -nstreams default value is determined automatically for a device. Although the automatic selection usually provides a reasonable performance, but it still may be non-optimal for some cases, for more information look at README. 
[Step 2/11] Loading Inference Engine
[ INFO ] InferenceEngine:
         API version............. 2.1.2020.3.0-3467-15f2c61a-releases/2020/3
[ INFO ] Device info
         CPU
         MKLDNNPlugin............ version 2.1
         Build................... 2020.3.0-3467-15f2c61a-releases/2020/3

[Step 3/11] Reading the Intermediate Representation network
[ INFO ] Read network took 18.69 ms
[Step 4/11] Resizing network to match image sizes and given batch
[ INFO ] Network batch size: 1
[Step 5/11] Configuring input of the model
[Step 6/11] Setting device configuration
[Step 7/11] Loading the model to the device
[ INFO ] Load network took 176.20 ms
[Step 8/11] Setting optimal runtime parameters
[Step 9/11] Creating infer requests and filling input blobs with images
[ INFO ] Network input 'input' precision U8, dimensions (NCHW): 1 3 224 224
[ WARNING ] Some image input files will be ignored: only 4 files are required from 10000
[ INFO ] Infer Request 0 filling
[ INFO ] Prepare image /home/user/Downloads/tiny-imagenet-200/val/images/val_0.JPEG
[ WARNING ] Image is resized from ((64, 64)) to ((224, 224))
[ INFO ] Infer Request 1 filling
[ INFO ] Prepare image /home/user/Downloads/tiny-imagenet-200/val/images/val_1.JPEG
[ WARNING ] Image is resized from ((64, 64)) to ((224, 224))
[ INFO ] Infer Request 2 filling
[ INFO ] Prepare image /home/user/Downloads/tiny-imagenet-200/val/images/val_10.JPEG
[ WARNING ] Image is resized from ((64, 64)) to ((224, 224))
[ INFO ] Infer Request 3 filling
[ INFO ] Prepare image /home/user/Downloads/tiny-imagenet-200/val/images/val_100.JPEG
[ WARNING ] Image is resized from ((64, 64)) to ((224, 224))
[Step 10/11] Measuring performance (Start inference asyncronously, 4 inference requests using 4 streams for CPU, limits: 60000 ms duration)
[Step 11/11] Dumping statistics report
Count:      13220 iterations
Duration:   60025.06 ms
Latency:    17.90 ms
Throughput: 220.24 FPS
user@user-NUC8i7INH:/opt/intel/openvino/deployment_tools/tools/benchmark_tool$ 
user@user-NUC8i7INH:/opt/intel/openvino/deployment_tools/tools/benchmark_tool$ python3 benchmark_app.py -m /opt/intel/openvino_2020.3.194/deployment_tools/tools/post_training_optimization_toolkit/results/mobilenetv1_DefaultQuantization/2020-07-28_22-38-42/optimized/mobilenetv1.xml -i /home/user/Downloads/tiny-imagenet-200/val/images
[Step 1/11] Parsing and validating input arguments
[ WARNING ]  -nstreams default value is determined automatically for a device. Although the automatic selection usually provides a reasonable performance, but it still may be non-optimal for some cases, for more information look at README. 
[Step 2/11] Loading Inference Engine
[ INFO ] InferenceEngine:
         API version............. 2.1.2020.3.0-3467-15f2c61a-releases/2020/3
[ INFO ] Device info
         CPU
         MKLDNNPlugin............ version 2.1
         Build................... 2020.3.0-3467-15f2c61a-releases/2020/3

[Step 3/11] Reading the Intermediate Representation network
[ INFO ] Read network took 27.96 ms
[Step 4/11] Resizing network to match image sizes and given batch
[ INFO ] Network batch size: 1
[Step 5/11] Configuring input of the model
[Step 6/11] Setting device configuration
[Step 7/11] Loading the model to the device
[ INFO ] Load network took 220.78 ms
[Step 8/11] Setting optimal runtime parameters
[Step 9/11] Creating infer requests and filling input blobs with images
[ INFO ] Network input 'input' precision U8, dimensions (NCHW): 1 3 224 224
[ WARNING ] Some image input files will be ignored: only 4 files are required from 10000
[ INFO ] Infer Request 0 filling
[ INFO ] Prepare image /home/user/Downloads/tiny-imagenet-200/val/images/val_0.JPEG
[ WARNING ] Image is resized from ((64, 64)) to ((224, 224))
[ INFO ] Infer Request 1 filling
[ INFO ] Prepare image /home/user/Downloads/tiny-imagenet-200/val/images/val_1.JPEG
[ WARNING ] Image is resized from ((64, 64)) to ((224, 224))
[ INFO ] Infer Request 2 filling
[ INFO ] Prepare image /home/user/Downloads/tiny-imagenet-200/val/images/val_10.JPEG
[ WARNING ] Image is resized from ((64, 64)) to ((224, 224))
[ INFO ] Infer Request 3 filling
[ INFO ] Prepare image /home/user/Downloads/tiny-imagenet-200/val/images/val_100.JPEG
[ WARNING ] Image is resized from ((64, 64)) to ((224, 224))
[Step 10/11] Measuring performance (Start inference asyncronously, 4 inference requests using 4 streams for CPU, limits: 60000 ms duration)
[Step 11/11] Dumping statistics report
Count:      18644 iterations
Duration:   60015.52 ms
Latency:    12.75 ms
Throughput: 310.65 FPS
user@user-NUC8i7INH:/opt/intel/openvino/deployment_tools/tools/benchmark_tool$

NewMember · ‎07-28-2020

Hi @Max_L_Intel
I tried the person-vehicle-bike-detection-crossroad-0078 model:

{
    "model": {
        "model": "/home/vulcanadmin/openvino/model_repo/intel/person-vehicle-bike-detection-crossroad-0078/FP32/person-vehicle-bike-detection-crossroad-0078.xml",
        "weights": "/home/vulcanadmin/openvino/model_repo/intel/person-vehicle-bike-detection-crossroad-0078/FP32/person-vehicle-bike-detection-crossroad-0078.bin"
    },
    "compression": {
        "algorithms": [
            {
                "name": "DefaultQuantization",
                "params": {
                    "preset": "performance",
                    "stat_subset_size": 300
                }
            }
        ],
        "target_device": "CPU"
    },
    "engine":  {
        "config": "/home/vulcanadmin/openvino/person_road_accuracy_checker.yaml"
    }
}

INFO:app.run:Creating pipeline:
 Algorithm: DefaultQuantization
 Parameters:
        preset                     : performance
        stat_subset_size           : 300
        target_device              : CPU
        exec_log_dir               : ./results/person-vehicle-bike-detection-crossroad-0078_DefaultQuantization/2020-07-28_16-16-41
 ===========================================================================
IE version: 2.1.2020.4.0-359-21e092122f4-releases/2020/4
Loaded CPU plugin version:
    CPU - MKLDNNPlugin: 2.1.2020.4.0-359-21e092122f4-releases/2020/4
INFO:compression.statistics.collector:Start computing statistics for algorithms : DefaultQuantization
INFO:compression.statistics.collector:Computing statistics finished
INFO:compression.pipeline.pipeline:Start algorithm: DefaultQuantization
INFO:compression.statistics.collector:Start computing statistics for algorithms : ActivationChannelAlignment
INFO:compression.statistics.collector:Computing statistics finished
INFO:compression.statistics.collector:Start computing statistics for algorithms : MinMaxQuantization,FastBiasCorrection
INFO:compression.statistics.collector:Computing statistics finished
INFO:compression.pipeline.pipeline:Finished: DefaultQuantization
 ===========================================================================

python benchmark_app.py -m  ~/openvino/model_repo/intel/person-vehicle-bike-detection-crossroad-0078/FP32/person-vehicle-bike-detection-crossroad-0078.xml -i ~/openvino/test_images/
[Step 1/11] Parsing and validating input arguments
/opt/intel/openvino_2020.4.287/python/python3.6/openvino/tools/benchmark/main.py:29: DeprecationWarning: The 'warn' method is deprecated, use 'warning' instead
  logger.warn(" -nstreams default value is determined automatically for a device. "
[ WARNING ]  -nstreams default value is determined automatically for a device. Although the automatic selection usually provides a reasonable performance, but it still may be non-optimal for some cases, for more information look at README.
[Step 2/11] Loading Inference Engine
[ INFO ] InferenceEngine:
         API version............. 2.1.2020.4.0-359-21e092122f4-releases/2020/4
[ INFO ] Device info
         CPU
         MKLDNNPlugin............ version 2.1
         Build................... 2020.4.0-359-21e092122f4-releases/2020/4

[Step 3/11] Setting device configuration
[ WARNING ] -nstreams default value is determined automatically for CPU device. Although the automatic selection usually provides a reasonable performance,but it still may be non-optimal for some cases, for more information look at README.
[Step 4/11] Reading the Intermediate Representation network
[ INFO ] Read network took 88.44 ms
[Step 5/11] Resizing network to match image sizes and given batch
[ INFO ] Network batch size: 1
[Step 6/11] Configuring input of the model
[Step 7/11] Loading the model to the device
[ INFO ] Load network took 579.74 ms
[Step 8/11] Setting optimal runtime parameters
[Step 9/11] Creating infer requests and filling input blobs with images
[ INFO ] Network input 'data' precision U8, dimensions (NCHW): 1 3 1024 1024
/opt/intel/openvino_2020.4.287/python/python3.6/openvino/tools/benchmark/utils/inputs_filling.py:97: DeprecationWarning: The 'warn' method is deprecated, use 'warning' instead
  len(image_files)))
[ WARNING ] Some image input files will be ignored: only 6 files are required from 202
[ INFO ] Infer Request 0 filling
[ INFO ] Prepare image /home/vulcanadmin/openvino/test_images/1.jpg
[ WARNING ] Image is resized from ((1000, 1600)) to ((1024, 1024))
[ INFO ] Infer Request 1 filling
[ INFO ] Prepare image /home/vulcanadmin/openvino/test_images/2.jpg
[ WARNING ] Image is resized from ((1121, 1600)) to ((1024, 1024))
[ INFO ] Infer Request 2 filling
[ INFO ] Prepare image /home/vulcanadmin/openvino/test_images/3.jpg
[ WARNING ] Image is resized from ((1000, 1600)) to ((1024, 1024))
[ INFO ] Infer Request 3 filling
[ INFO ] Prepare image /home/vulcanadmin/openvino/test_images/4.jpg
[ WARNING ] Image is resized from ((1000, 1600)) to ((1024, 1024))
[ INFO ] Infer Request 4 filling
[ INFO ] Prepare image /home/vulcanadmin/openvino/test_images/4.jpg
[ WARNING ] Image is resized from ((1000, 1600)) to ((1024, 1024))
[ INFO ] Infer Request 5 filling
[ INFO ] Prepare image /home/vulcanadmin/openvino/test_images/5.jpg
[ WARNING ] Image is resized from ((1000, 1600)) to ((1024, 1024))
[Step 10/11] Measuring performance (Start inference asyncronously, 6 inference requests using 6 streams for CPU, limits: 60000 ms duration)
[Step 11/11] Dumping statistics report
Count:      7566 iterations
Duration:   60063.93 ms
Latency:    46.41 ms
Throughput: 125.97 FPS

python benchmark_app.py -m  ~/openvino/results/person-vehicle-bike-detection-crossroad-0078_DefaultQuantization/2020-07-21_11-58-37/optimized/person-vehicle-bike-detection-crossroad-0078.xml -i ~/openvino/test_images/
[Step 1/11] Parsing and validating input arguments
/opt/intel/openvino_2020.4.287/python/python3.6/openvino/tools/benchmark/main.py:29: DeprecationWarning: The 'warn' method is deprecated, use 'warning' instead
  logger.warn(" -nstreams default value is determined automatically for a device. "
[ WARNING ]  -nstreams default value is determined automatically for a device. Although the automatic selection usually provides a reasonable performance, but it still may be non-optimal for some cases, for more information look at README.
[Step 2/11] Loading Inference Engine
[ INFO ] InferenceEngine:
         API version............. 2.1.2020.4.0-359-21e092122f4-releases/2020/4
[ INFO ] Device info
         CPU
         MKLDNNPlugin............ version 2.1
         Build................... 2020.4.0-359-21e092122f4-releases/2020/4

[Step 3/11] Setting device configuration
[ WARNING ] -nstreams default value is determined automatically for CPU device. Although the automatic selection usually provides a reasonable performance,but it still may be non-optimal for some cases, for more information look at README.
[Step 4/11] Reading the Intermediate Representation network
[ INFO ] Read network took 243.69 ms
[Step 5/11] Resizing network to match image sizes and given batch
[ INFO ] Network batch size: 1
[Step 6/11] Configuring input of the model
[Step 7/11] Loading the model to the device
[ INFO ] Load network took 1032.64 ms
[Step 8/11] Setting optimal runtime parameters
[Step 9/11] Creating infer requests and filling input blobs with images
[ INFO ] Network input 'data' precision U8, dimensions (NCHW): 1 3 1024 1024
/opt/intel/openvino_2020.4.287/python/python3.6/openvino/tools/benchmark/utils/inputs_filling.py:97: DeprecationWarning: The 'warn' method is deprecated, use 'warning' instead
  len(image_files)))
[ WARNING ] Some image input files will be ignored: only 6 files are required from 202
[ INFO ] Infer Request 0 filling
[ INFO ] Prepare image /home/vulcanadmin/openvino/test_images/1.jpg
[ WARNING ] Image is resized from ((1000, 1600)) to ((1024, 1024))
[ INFO ] Infer Request 1 filling
[ INFO ] Prepare image /home/vulcanadmin/openvino/test_images/2.jpg
[ WARNING ] Image is resized from ((1121, 1600)) to ((1024, 1024))
[ INFO ] Infer Request 2 filling
[ INFO ] Prepare image /home/vulcanadmin/openvino/test_images/3.jpg
[ WARNING ] Image is resized from ((1000, 1600)) to ((1024, 1024))
[ INFO ] Infer Request 3 filling
[ INFO ] Prepare image /home/vulcanadmin/openvino/test_images/4.jpg
[ WARNING ] Image is resized from ((1000, 1600)) to ((1024, 1024))
[ INFO ] Infer Request 4 filling
[ INFO ] Prepare image /home/vulcanadmin/openvino/test_images/5.jpg
[ WARNING ] Image is resized from ((1000, 1600)) to ((1024, 1024))
[ INFO ] Infer Request 5 filling
[ INFO ] Prepare image /home/vulcanadmin/openvino/test_images/ScienceWorld_MainEntrance_20170520_200000528_1959464_00--00--54--380_1632-0.jpg
[ WARNING ] Image is resized from ((1000, 1600)) to ((1024, 1024))
[Step 10/11] Measuring performance (Start inference asyncronously, 6 inference requests using 6 streams for CPU, limits: 60000 ms duration)
[Step 11/11] Dumping statistics report
Count:      8442 iterations
Duration:   60046.63 ms
Latency:    41.61 ms
Throughput: 140.59 FPS

Whereas these are the results I get for my custom model:

{
    "model": {
        "model": "/home/vulcanadmin/openvino/model_repo/SSD/SSD.xml",
        "weights": "/home/vulcanadmin/openvino/model_repo/SSD/SSD.bin"
    },
    "engine": {
        "config": "/home/vulcanadmin/openvino/accuracy_checker.yaml"
    },
    "compression": {
        "target_device": "CPU",
        "algorithms": [
            {
                "name": "DefaultQuantization",
                "params": {
                    "preset": "performance",
                    "stat_subset_size": 300
                }
            }
        ]
    }
}
INFO:app.run:Creating pipeline:
 Algorithm: DefaultQuantization
 Parameters:
        preset                     : performance
        stat_subset_size           : 300
        target_device              : CPU
        exec_log_dir               : ./results/SSD_DefaultQuantization/2020-07-28_16-53-50
 ===========================================================================
IE version: 2.1.2020.4.0-359-21e092122f4-releases/2020/4
Loaded CPU plugin version:
    CPU - MKLDNNPlugin: 2.1.2020.4.0-359-21e092122f4-releases/2020/4
INFO:compression.statistics.collector:Start computing statistics for algorithms : DefaultQuantization
INFO:compression.statistics.collector:Computing statistics finished
INFO:compression.pipeline.pipeline:Start algorithm: DefaultQuantization
INFO:compression.statistics.collector:Start computing statistics for algorithms : ActivationChannelAlignment
INFO:compression.statistics.collector:Computing statistics finished
INFO:compression.statistics.collector:Start computing statistics for algorithms : MinMaxQuantization,FastBiasCorrection
16:57:28 accuracy_checker WARNING: /opt/intel/openvino_2020.4.287/deployment_tools/open_model_zoo/tools/accuracy_checker/accuracy_checker/logging.py:111: UserWarning: data batch 4 is not equal model input batch_size 32.
  warnings.warn(msg)

INFO:compression.statistics.collector:Computing statistics finished
INFO:compression.pipeline.pipeline:Finished: DefaultQuantization
 ===========================================================================


python benchmark_app.py -m  ~/openvino/model_repo/SSD/SSD.xml -i ~/openvino/test_images/
[Step 1/11] Parsing and validating input arguments
/opt/intel/openvino_2020.4.287/python/python3.6/openvino/tools/benchmark/main.py:29: DeprecationWarning: The 'warn' method is deprecated, use 'warning' instead
  logger.warn(" -nstreams default value is determined automatically for a device. "
[ WARNING ]  -nstreams default value is determined automatically for a device. Although the automatic selection usually provides a reasonable performance, but it still may be non-optimal for some cases, for more information look at README.
[Step 2/11] Loading Inference Engine
[ INFO ] InferenceEngine:
         API version............. 2.1.2020.4.0-359-21e092122f4-releases/2020/4
[ INFO ] Device info
         CPU
         MKLDNNPlugin............ version 2.1
         Build................... 2020.4.0-359-21e092122f4-releases/2020/4

[Step 3/11] Setting device configuration
[ WARNING ] -nstreams default value is determined automatically for CPU device. Although the automatic selection usually provides a reasonable performance,but it still may be non-optimal for some cases, for more information look at README.
[Step 4/11] Reading the Intermediate Representation network
[ INFO ] Read network took 53.18 ms
[Step 5/11] Resizing network to match image sizes and given batch
[ INFO ] Network batch size: 32
[Step 6/11] Configuring input of the model
[Step 7/11] Loading the model to the device
[ INFO ] Load network took 432.40 ms
[Step 8/11] Setting optimal runtime parameters
[Step 9/11] Creating infer requests and filling input blobs with images
[ INFO ] Network input 'inputs' precision U8, dimensions (NCHW): 32 3 416 640
/opt/intel/openvino_2020.4.287/python/python3.6/openvino/tools/benchmark/utils/inputs_filling.py:97: DeprecationWarning: The 'warn' method is deprecated, use 'warning' instead
  len(image_files)))
[ WARNING ] Some image input files will be ignored: only 192 files are required from 202
[Step 10/11] Measuring performance (Start inference asyncronously, 6 inference requests using 6 streams for CPU, limits: 60000 ms duration)
[Step 11/11] Dumping statistics report
Count:      84 iterations
Duration:   67605.69 ms
Latency:    4761.07 ms
Throughput: 39.76 FPS

 python benchmark_app.py -m  ~/openvino/open_model_zoo/tools/accuracy_checker/results/SSD_DefaultQuantization/2020-07-28_16-53-50/optimized/SSD.xml -i ~/openvino/test_images/
[Step 1/11] Parsing and validating input arguments
/opt/intel/openvino_2020.4.287/python/python3.6/openvino/tools/benchmark/main.py:29: DeprecationWarning: The 'warn' method is deprecated, use 'warning' instead
  logger.warn(" -nstreams default value is determined automatically for a device. "
[ WARNING ]  -nstreams default value is determined automatically for a device. Although the automatic selection usually provides a reasonable performance, but it still may be non-optimal for some cases, for more information look at README.
[Step 2/11] Loading Inference Engine
[ INFO ] InferenceEngine:
         API version............. 2.1.2020.4.0-359-21e092122f4-releases/2020/4
[ INFO ] Device info
         CPU
         MKLDNNPlugin............ version 2.1
         Build................... 2020.4.0-359-21e092122f4-releases/2020/4

[Step 3/11] Setting device configuration
[ WARNING ] -nstreams default value is determined automatically for CPU device. Although the automatic selection usually provides a reasonable performance,but it still may be non-optimal for some cases, for more information look at README.
[Step 4/11] Reading the Intermediate Representation network
[ INFO ] Read network took 109.94 ms
[Step 5/11] Resizing network to match image sizes and given batch
[ INFO ] Network batch size: 32
[Step 6/11] Configuring input of the model
[Step 7/11] Loading the model to the device
[ INFO ] Load network took 584.20 ms
[Step 8/11] Setting optimal runtime parameters
[Step 9/11] Creating infer requests and filling input blobs with images
[ INFO ] Network input 'inputs' precision U8, dimensions (NCHW): 32 3 416 640
[Step 10/11] Measuring performance (Start inference asyncronously, 6 inference requests using 6 streams for CPU, limits: 60000 ms duration)
[Step 11/11] Dumping statistics report
Count:      18 iterations
Duration:   114635.87 ms
Latency:    37933.85 ms
Throughput: 5.02 FPS
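
As a sanity check on the two reports above, the Throughput lines are consistent with iterations × batch size ÷ duration. The short Python sketch below recomputes them from the logged numbers. The function name is only illustrative, and the formula is inferred from the logged figures rather than taken from the benchmark_app source.

```python
def throughput_fps(iterations, batch_size, duration_ms):
    """Frames per second = total frames processed / elapsed seconds."""
    return iterations * batch_size / (duration_ms / 1000.0)


# Values copied from the two benchmark_app reports above (batch size 32 in both).
previous_run = throughput_fps(iterations=84, batch_size=32, duration_ms=67605.69)
quantized_run = throughput_fps(iterations=18, batch_size=32, duration_ms=114635.87)
slowdown = previous_run / quantized_run

print(f"previous run:  {previous_run:.2f} FPS")
print(f"quantized run: {quantized_run:.2f} FPS")
print(f"slowdown:      {slowdown:.1f}x")
```

Both recomputed values match the logged 39.76 FPS and 5.02 FPS, so the DefaultQuantization output runs at roughly one eighth of the throughput of the previous run.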

Max_L_Intel · ‎07-30-2020

Hi @NewMember

As far as I can see, you did get a small performance increase for person-vehicle-bike-detection-crossroad-0078. It is approximately the same gain as between the FP32 and FP16-INT8 versions of this model from OMZ.

For your custom SSD topology, I think you might need to use one of the SSD .json config files in <openvino_dir>/deployment_tools/tools/post_training_optimization_toolkit/configs/examples/quantization/object_detection/ as a template instead of your current one.

There are:
ssd_mobilenetv1_int8.json
ssd_mobilenet_v1_voc_int8.json
ssd_resnet34_1200_int8.json
ssd_resnet50_512_mxnet_int8.json

NewMember · ‎07-31-2020

Hi @Max_L_Intel

This is one of those configs:

post_training_optimization_toolkit/configs/examples/quantization/object_detection$ cat ssd_mobilenet_v1_voc_int8.json
{
    "model": {
        "model_name": "mobilenet_ssd",
        "model": "<MODEL_PATH>",
        "weights": "<PATH_TO_WEIGHTS>"
    },
    "engine": {
        "config": "<CONFIG_PATH>"
    },
    "compression": {
        "target_device": "CPU",
        "algorithms": [
            {
                "name": "DefaultQuantization",
                "params": {
                    "preset": "performance",
                    "stat_subset_size": 300
                }
            }
        ]
    }
}

It is exactly the same config I use for quantization...

Max_L_Intel · ‎08-04-2020

Hi @NewMember

The configuration file ssd_mobilenet_v1_voc_int8.json works as expected for the mobilenet-v1-1.0-224 model. The performance increase for the quantized mobilenet_v1 model is around 40%.

For your custom-trained model, take the existing SSD configuration .json file that is closest to your topology (for instance, the one for any publicly available topology that you used as the basis for your model). Then adapt it to your model according to the Post-training optimization best practices and the configuration file structure, for example, the one for the DefaultQuantization algorithm.
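
A minimal sketch of that adaptation step in Python, assuming the template has the same structure as the ssd_mobilenet_v1_voc_int8.json quoted earlier in this thread. The helper name and all paths below are hypothetical placeholders, not files from the toolkit.

```python
import copy
import json

# Template as quoted in this thread (ssd_mobilenet_v1_voc_int8.json).
TEMPLATE = {
    "model": {
        "model_name": "mobilenet_ssd",
        "model": "<MODEL_PATH>",
        "weights": "<PATH_TO_WEIGHTS>",
    },
    "engine": {"config": "<CONFIG_PATH>"},
    "compression": {
        "target_device": "CPU",
        "algorithms": [
            {
                "name": "DefaultQuantization",
                "params": {"preset": "performance", "stat_subset_size": 300},
            }
        ],
    },
}


def adapt_template(template, model_name, model_xml, weights_bin, engine_config):
    """Return a copy of the template with the placeholders filled in."""
    config = copy.deepcopy(template)  # leave the original template untouched
    config["model"].update(
        {"model_name": model_name, "model": model_xml, "weights": weights_bin}
    )
    config["engine"]["config"] = engine_config
    return config


# Hypothetical paths: substitute the real IR files and accuracy checker config.
custom_config = adapt_template(
    TEMPLATE,
    model_name="custom_ssd",
    model_xml="./custom_ssd.xml",
    weights_bin="./custom_ssd.bin",
    engine_config="./custom_ssd_accuracy_checker.yml",
)
print(json.dumps(custom_config, indent=4))
```

Filling the placeholders only makes the template usable. Tuning the algorithm parameters for the custom topology is a separate step.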

NewMember · ‎08-07-2020

@Max_L_Intel

Are there any guidelines or best practices on picking which layers to quantize and which to ignore?
I see that the ssd and resnet configs have some layers ignored. Is there any reasoning or intuition behind this?

Max_L_Intel · ‎08-10-2020

@NewMember

ignored_scope allows excluding some layers from the quantization process, i.e. their inputs will not be quantized. It can help with patterns that are known in advance to lose accuracy when executed in low precision. For example, the DetectionOutput layer of an SSD model, expressed as a subgraph, should not be quantized, in order to preserve the accuracy of object detection models. One source for the ignored scope can be the AccuracyAware algorithm, which can revert layers back to the original precision.
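
A hedged Python sketch of how such an exclusion could be added to an algorithm entry in the config. It assumes the ignored scope is written as an "ignored" section with a "scope" list under the algorithm "params"; please check that key layout against the configuration spec shipped with your toolkit version. The layer names and the helper function are hypothetical.

```python
import json

# Minimal DefaultQuantization entry, as in the config quoted earlier in this thread.
algorithm = {
    "name": "DefaultQuantization",
    "params": {"preset": "performance", "stat_subset_size": 300},
}


def exclude_layers(algorithm_entry, layer_names):
    """Add layer names to the algorithm's ignored scope, skipping duplicates."""
    # ASSUMPTION: the layout params -> "ignored" -> "scope" is how the
    # toolkit expects excluded layers; verify it for your version.
    ignored = algorithm_entry["params"].setdefault("ignored", {})
    scope = ignored.setdefault("scope", [])
    for name in layer_names:
        if name not in scope:
            scope.append(name)
    return algorithm_entry


# Hypothetical layer names: use the real names from your model's .xml file.
exclude_layers(algorithm, ["detection_head/concat", "detection_head/softmax"])
print(json.dumps(algorithm, indent=4))
```

Layers listed this way would stay at their original precision, which trades some potential speedup for accuracy on the patterns mentioned above.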

NewMember · ‎08-10-2020

@Max_L_Intel

Thanks for the suggestion. I ended up excluding from quantization a few layers I thought might be troublesome (Accuracy Aware had not excluded them). Once those layers were running at full precision, the model speed improved by about 40%.

Max_L_Intel · ‎08-11-2020

@NewMember

Great to hear that. Thanks for reporting this back to the OpenVINO community.
If you need any further assistance, please submit a new question as this thread is no longer being monitored.

Best regards, Max.