
Int8 quantized model slower than unquantized one


Hi!

I'm trying to quantize the FaceMesh model with the Post-Training Optimization Tool (POT) using the following config (based on the default config example):


{
    /* Model parameters */

    "model": {
        "model_name": "facemesh", // Model name
        "model": "./facemesh.xml", // Path to model (.xml format)
        "weights": "./facemesh.bin" // Path to weights (.bin format)
    },

    /* Parameters of the engine used for model inference */
    "engine": {
        /* Simplified mode */
        "type": "simplified", 
        "data_source": "./data" 
    },

    /* Optimization hyperparameters */
    "compression": {
        "target_device": "CPU", 
        "algorithms": [
            {
                "name": "DefaultQuantization",
                "params": {
                    "preset": "performance",
                    "stat_subset_size": 300,
                    "shuffle_data": false
                }
            }
        ]
    }
}


The quantized model is ~4 times smaller, but its latency increases by ~37% (3.60 ms vs. 4.93 ms) and its throughput drops from 1073.62 to 802.57 FPS.

Unquantized model benchmark log:

[Step 1/11] Parsing and validating input arguments
/opt/intel/openvino_2020.4.287/python/python3.6/openvino/tools/benchmark/main.py:29: DeprecationWarning: The 'warn' method is deprecated, use 'warning' instead
  logger.warn(" -nstreams default value is determined automatically for a device. "
[ WARNING ]  -nstreams default value is determined automatically for a device. Although the automatic selection usually provides a reasonable performance, but it still may be non-optimal for some cases, for more information look at README. 
[Step 2/11] Loading Inference Engine
[ INFO ] InferenceEngine:
         API version............. 2.1.2020.4.0-359-21e092122f4-releases/2020/4
[ INFO ] Device info
         CPU
         MKLDNNPlugin............ version 2.1
         Build................... 2020.4.0-359-21e092122f4-releases/2020/4

[Step 3/11] Setting device configuration
[ WARNING ] -nstreams default value is determined automatically for CPU device. Although the automatic selection usually provides a reasonable performance,but it still may be non-optimal for some cases, for more information look at README.
[Step 4/11] Reading the Intermediate Representation network
[ INFO ] Read network took 31.38 ms
[Step 5/11] Resizing network to match image sizes and given batch
[ INFO ] Network batch size: 1
[Step 6/11] Configuring input of the model
[Step 7/11] Loading the model to the device
[ INFO ] Load network took 199.60 ms
[Step 8/11] Setting optimal runtime parameters
[Step 9/11] Creating infer requests and filling input blobs with images
[ INFO ] Network input 'image' precision U8, dimensions (NCHW): 1 3 192 192
/opt/intel/openvino_2020.4.287/python/python3.6/openvino/tools/benchmark/utils/inputs_filling.py:71: DeprecationWarning: The 'warn' method is deprecated, use 'warning' instead
  logger.warn("No input files were given: all inputs will be filled with random values!")
[ WARNING ] No input files were given: all inputs will be filled with random values!
[ INFO ] Infer Request 0 filling
[ INFO ] Fill input 'image' with random values (image is expected)
[ INFO ] Infer Request 1 filling
[ INFO ] Fill input 'image' with random values (image is expected)
[ INFO ] Infer Request 2 filling
[ INFO ] Fill input 'image' with random values (image is expected)
[ INFO ] Infer Request 3 filling
[ INFO ] Fill input 'image' with random values (image is expected)
[Step 10/11] Measuring performance (Start inference asyncronously, 4 inference requests using 4 streams for CPU, limits: 60000 ms duration)
[Step 11/11] Dumping statistics report
Count:      64424 iterations
Duration:   60006.06 ms
Latency:    3.60 ms
Throughput: 1073.62 FPS


Quantized model benchmark log:

[Step 1/11] Parsing and validating input arguments
/opt/intel/openvino_2020.4.287/python/python3.6/openvino/tools/benchmark/main.py:29: DeprecationWarning: The 'warn' method is deprecated, use 'warning' instead
  logger.warn(" -nstreams default value is determined automatically for a device. "
[ WARNING ]  -nstreams default value is determined automatically for a device. Although the automatic selection usually provides a reasonable performance, but it still may be non-optimal for some cases, for more information look at README. 
[Step 2/11] Loading Inference Engine
[ INFO ] InferenceEngine:
         API version............. 2.1.2020.4.0-359-21e092122f4-releases/2020/4
[ INFO ] Device info
         CPU
         MKLDNNPlugin............ version 2.1
         Build................... 2020.4.0-359-21e092122f4-releases/2020/4

[Step 3/11] Setting device configuration
[ WARNING ] -nstreams default value is determined automatically for CPU device. Although the automatic selection usually provides a reasonable performance,but it still may be non-optimal for some cases, for more information look at README.
[Step 4/11] Reading the Intermediate Representation network
[ INFO ] Read network took 67.49 ms
[Step 5/11] Resizing network to match image sizes and given batch
[ INFO ] Network batch size: 1
[Step 6/11] Configuring input of the model
[Step 7/11] Loading the model to the device
[ INFO ] Load network took 294.29 ms
[Step 8/11] Setting optimal runtime parameters
[Step 9/11] Creating infer requests and filling input blobs with images
[ INFO ] Network input 'image' precision U8, dimensions (NCHW): 1 3 192 192
/opt/intel/openvino_2020.4.287/python/python3.6/openvino/tools/benchmark/utils/inputs_filling.py:71: DeprecationWarning: The 'warn' method is deprecated, use 'warning' instead
  logger.warn("No input files were given: all inputs will be filled with random values!")
[ WARNING ] No input files were given: all inputs will be filled with random values!
[ INFO ] Infer Request 0 filling
[ INFO ] Fill input 'image' with random values (image is expected)
[ INFO ] Infer Request 1 filling
[ INFO ] Fill input 'image' with random values (image is expected)
[ INFO ] Infer Request 2 filling
[ INFO ] Fill input 'image' with random values (image is expected)
[ INFO ] Infer Request 3 filling
[ INFO ] Fill input 'image' with random values (image is expected)
[Step 10/11] Measuring performance (Start inference asyncronously, 4 inference requests using 4 streams for CPU, limits: 60000 ms duration)
[Step 11/11] Dumping statistics report
Count:      48160 iterations
Duration:   60007.22 ms
Latency:    4.93 ms
Throughput: 802.57 FPS
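A quick arithmetic check of the ~37% figure, using only the numbers copied from the two benchmark summaries above (nothing else is assumed). Latency and throughput are two views of the same slowdown, since both runs used 4 streams for the same 60 s.

```python
# Figures copied verbatim from the two benchmark_app summaries above.
fp32 = {"latency_ms": 3.60, "throughput_fps": 1073.62}
int8 = {"latency_ms": 4.93, "throughput_fps": 802.57}


def relative_change(before: float, after: float) -> float:
    """Return the change from `before` to `after` as a percentage."""
    return (after - before) / before * 100.0


latency_change = relative_change(fp32["latency_ms"], int8["latency_ms"])
throughput_change = relative_change(fp32["throughput_fps"], int8["throughput_fps"])

print(f"latency:    {latency_change:+.1f}%")     # roughly +36.9%
print(f"throughput: {throughput_change:+.1f}%")  # roughly -25.2%
```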


Could you please check whether this is the expected result for such a model?

BR,
Alexey.


Accepted Solution
Community Manager

Hi Alexey,

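Thanks for reaching out.

I tested both your quantized and unquantized IR files and got the same result as you.

How much OpenVINO INT8 quantization helps depends on the target device and on the INT8 kernels available for it. The slowdown is probably caused by layers in your model that are not supported in 8-bit integer mode: those layers keep running in FP32, and the extra precision conversions around them can outweigh the gain from the quantized layers.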
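One way to check this locally is to compare the per-layer performance counters of both IRs. The sketch below assumes the OpenVINO 2020.x Inference Engine Python API (`IECore.read_network`, `IECore.load_network` with the `PERF_COUNT` config key, and `InferRequest.get_perf_counts()`), and it treats kernels whose `exec_type` ends in `_I8` or `_U8` as INT8, which is only a heuristic about how the CPU plugin names its primitives. The helper names `collect_perf_counts`, `is_int8_kernel` and `summarize` are hypothetical.

```python
# Sketch: split per-layer CPU time of an OpenVINO IR into INT8 and non-INT8 kernels.
# ASSUMPTIONS: OpenVINO 2020.x Inference Engine Python API; the "_I8"/"_U8"
# exec_type suffix check is a heuristic, verify it against your own output.


def collect_perf_counts(xml_path, bin_path, inputs, device="CPU"):
    """Run one synchronous inference with performance counters enabled."""
    from openvino.inference_engine import IECore  # lazy import: only needed here

    ie = IECore()
    net = ie.read_network(model=xml_path, weights=bin_path)
    exec_net = ie.load_network(network=net, device_name=device,
                               config={"PERF_COUNT": "YES"})
    request = exec_net.requests[0]
    request.infer(inputs)
    # layer name -> {"exec_type", "layer_type", "real_time", ...}; times in microseconds
    return request.get_perf_counts()


def is_int8_kernel(exec_type):
    """Heuristic: treat kernels whose exec_type ends in _I8 or _U8 as INT8."""
    return exec_type.endswith(("_I8", "_U8"))


def summarize(perf_counts, top=5):
    """Return (int8_us, other_us, slowest non-INT8 layers)."""
    int8_us = 0
    other_us = 0
    slow_other = []
    for name, stats in perf_counts.items():
        if is_int8_kernel(stats["exec_type"]):
            int8_us += stats["real_time"]
        else:
            other_us += stats["real_time"]
            slow_other.append((stats["real_time"], name,
                               stats["layer_type"], stats["exec_type"]))
    slow_other.sort(reverse=True)  # largest real_time first
    return int8_us, other_us, slow_other[:top]
```

For example, calling `collect_perf_counts` on each IR with `{"image": array}`, where `array` is a `uint8` NumPy array of shape `(1, 3, 192, 192)` matching the input reported in the logs, and comparing the `summarize` results would show which layers stay off the INT8 path and how much time they take.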

You can refer here for more details: https://github.com/intel/webml-polyfill/issues/1239

Also, please check the list of topologies that have been validated for the 8-bit inference feature here.

Regards,

Aznie



3 Replies
Beginner

Hi!

I'm having the same issue with exactly the same config file.

Waiting for an answer from Intel.

Community Manager

Hi Alexey,


This thread will no longer be monitored since this issue has been resolved. If you need any additional information from Intel, please submit a new question.


Best Regards,

Aznie

