<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>Re: Int8 quantized model slower than unquantized one in Intel® Distribution of OpenVINO™ Toolkit</title>
    <link>https://community.intel.com/t5/Intel-Distribution-of-OpenVINO/Int8-quantized-model-slower-than-unquantized-one/m-p/1210157#M20688</link>
    <description>&lt;P&gt;Hi Alexey,&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Thanks for reaching out.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I tested your XML file for both the quantized and the unquantized model, and I am getting the same result as you.&lt;/P&gt;&lt;P&gt;OpenVINO quantization performance depends on the specific libraries and devices involved. The slowdown is most likely caused by layers in your model that are not supported in 8-bit integer computation mode; those layers fall back to floating point, and the extra precision conversions around them can outweigh the INT8 speedup.&lt;/P&gt;&lt;P&gt;You can refer here for more details: &lt;A href="https://github.com/intel/webml-polyfill/issues/1239" rel="noopener noreferrer" target="_blank"&gt;https://github.com/intel/webml-polyfill/issues/1239&lt;/A&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Also, please check the topologies that have been validated for the 8-bit inference feature &lt;A href="https://docs.openvinotoolkit.org/latest/openvino_docs_IE_DG_Int8Inference.html" rel="noopener noreferrer" target="_blank"&gt;here&lt;/A&gt;.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Regards,&lt;/P&gt;&lt;P&gt;Aznie&lt;/P&gt;&lt;BR /&gt;</description>
    <pubDate>Thu, 17 Sep 2020 11:07:24 GMT</pubDate>
    <dc:creator>IntelSupport</dc:creator>
    <dc:date>2020-09-17T11:07:24Z</dc:date>
    <item>
      <title>Int8 quantized model slower than unquantized one</title>
      <link>https://community.intel.com/t5/Intel-Distribution-of-OpenVINO/Int8-quantized-model-slower-than-unquantized-one/m-p/1209808#M20665</link>
      <description>&lt;P&gt;Hi!&lt;/P&gt;
&lt;P&gt;I'm trying to quantize the FaceMesh model with the POT tool, using the following config (based on the default config example):&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;LI-CODE lang="markup"&gt;{
    /* Model parameters */

    "model": {
        "model_name": "facemesh", // Model name
        "model": "./facemesh.xml", // Path to model (.xml format)
        "weights": "./facemesh.bin" // Path to weights (.bin format)
    },

    /* Parameters of the engine used for model inference */
    "engine": {
        /* Simplified mode */
        "type": "simplified", 
        "data_source": "./data" 
    },

    /* Optimization hyperparameters */
    "compression": {
        "target_device": "CPU", 
        "algorithms": [
            {
                "name": "DefaultQuantization",
                "params": {
                    "preset": "performance",
                    "stat_subset_size": 300,
                    "shuffle_data": false
                }
            }
        ]
    }
}&lt;/LI-CODE&gt;
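&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;For completeness, the quantization run itself is just the standard POT CLI invocation; the sketch below assumes the config above is saved as ./facemesh_int8.json:&lt;/P&gt;
&lt;LI-CODE lang="markup"&gt;# Assumption: the JSON config above is saved as ./facemesh_int8.json.
# Runs DefaultQuantization and writes the INT8 IR under ./results.
pot -c ./facemesh_int8.json --output-dir ./results&lt;/LI-CODE&gt;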
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;The quantized model becomes ~4 times smaller, but its inference time increases by ~37%.&lt;/P&gt;
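&lt;P&gt;Both logs below come from benchmark_app. The exact command line isn't shown here, but an illustrative invocation (identical for both runs except for the .xml path; -t 60 matches the 60000 ms duration in the logs, and the script path depends on where OpenVINO is installed) would be:&lt;/P&gt;
&lt;LI-CODE lang="markup"&gt;# Illustrative command; paths depend on the OpenVINO install location.
python3 /opt/intel/openvino_2020.4.287/deployment_tools/tools/benchmark_tool/benchmark_app.py \
    -m ./facemesh.xml -d CPU -t 60&lt;/LI-CODE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;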
&lt;P&gt;Unquantized model benchmark log:&lt;/P&gt;
&lt;LI-CODE lang="markup"&gt;[Step 1/11] Parsing and validating input arguments
/opt/intel/openvino_2020.4.287/python/python3.6/openvino/tools/benchmark/main.py:29: DeprecationWarning: The 'warn' method is deprecated, use 'warning' instead
  logger.warn(" -nstreams default value is determined automatically for a device. "
[ WARNING ]  -nstreams default value is determined automatically for a device. Although the automatic selection usually provides a reasonable performance, but it still may be non-optimal for some cases, for more information look at README. 
[Step 2/11] Loading Inference Engine
[ INFO ] InferenceEngine:
         API version............. 2.1.2020.4.0-359-21e092122f4-releases/2020/4
[ INFO ] Device info
         CPU
         MKLDNNPlugin............ version 2.1
         Build................... 2020.4.0-359-21e092122f4-releases/2020/4

[Step 3/11] Setting device configuration
[ WARNING ] -nstreams default value is determined automatically for CPU device. Although the automatic selection usually provides a reasonable performance,but it still may be non-optimal for some cases, for more information look at README.
[Step 4/11] Reading the Intermediate Representation network
[ INFO ] Read network took 31.38 ms
[Step 5/11] Resizing network to match image sizes and given batch
[ INFO ] Network batch size: 1
[Step 6/11] Configuring input of the model
[Step 7/11] Loading the model to the device
[ INFO ] Load network took 199.60 ms
[Step 8/11] Setting optimal runtime parameters
[Step 9/11] Creating infer requests and filling input blobs with images
[ INFO ] Network input 'image' precision U8, dimensions (NCHW): 1 3 192 192
/opt/intel/openvino_2020.4.287/python/python3.6/openvino/tools/benchmark/utils/inputs_filling.py:71: DeprecationWarning: The 'warn' method is deprecated, use 'warning' instead
  logger.warn("No input files were given: all inputs will be filled with random values!")
[ WARNING ] No input files were given: all inputs will be filled with random values!
[ INFO ] Infer Request 0 filling
[ INFO ] Fill input 'image' with random values (image is expected)
[ INFO ] Infer Request 1 filling
[ INFO ] Fill input 'image' with random values (image is expected)
[ INFO ] Infer Request 2 filling
[ INFO ] Fill input 'image' with random values (image is expected)
[ INFO ] Infer Request 3 filling
[ INFO ] Fill input 'image' with random values (image is expected)
[Step 10/11] Measuring performance (Start inference asyncronously, 4 inference requests using 4 streams for CPU, limits: 60000 ms duration)
[Step 11/11] Dumping statistics report
Count:      64424 iterations
Duration:   60006.06 ms
Latency:    3.60 ms
Throughput: 1073.62 FPS&lt;/LI-CODE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Quantized model benchmark log:&lt;/P&gt;
&lt;LI-CODE lang="markup"&gt;[Step 1/11] Parsing and validating input arguments
/opt/intel/openvino_2020.4.287/python/python3.6/openvino/tools/benchmark/main.py:29: DeprecationWarning: The 'warn' method is deprecated, use 'warning' instead
  logger.warn(" -nstreams default value is determined automatically for a device. "
[ WARNING ]  -nstreams default value is determined automatically for a device. Although the automatic selection usually provides a reasonable performance, but it still may be non-optimal for some cases, for more information look at README. 
[Step 2/11] Loading Inference Engine
[ INFO ] InferenceEngine:
         API version............. 2.1.2020.4.0-359-21e092122f4-releases/2020/4
[ INFO ] Device info
         CPU
         MKLDNNPlugin............ version 2.1
         Build................... 2020.4.0-359-21e092122f4-releases/2020/4

[Step 3/11] Setting device configuration
[ WARNING ] -nstreams default value is determined automatically for CPU device. Although the automatic selection usually provides a reasonable performance,but it still may be non-optimal for some cases, for more information look at README.
[Step 4/11] Reading the Intermediate Representation network
[ INFO ] Read network took 67.49 ms
[Step 5/11] Resizing network to match image sizes and given batch
[ INFO ] Network batch size: 1
[Step 6/11] Configuring input of the model
[Step 7/11] Loading the model to the device
[ INFO ] Load network took 294.29 ms
[Step 8/11] Setting optimal runtime parameters
[Step 9/11] Creating infer requests and filling input blobs with images
[ INFO ] Network input 'image' precision U8, dimensions (NCHW): 1 3 192 192
/opt/intel/openvino_2020.4.287/python/python3.6/openvino/tools/benchmark/utils/inputs_filling.py:71: DeprecationWarning: The 'warn' method is deprecated, use 'warning' instead
  logger.warn("No input files were given: all inputs will be filled with random values!")
[ WARNING ] No input files were given: all inputs will be filled with random values!
[ INFO ] Infer Request 0 filling
[ INFO ] Fill input 'image' with random values (image is expected)
[ INFO ] Infer Request 1 filling
[ INFO ] Fill input 'image' with random values (image is expected)
[ INFO ] Infer Request 2 filling
[ INFO ] Fill input 'image' with random values (image is expected)
[ INFO ] Infer Request 3 filling
[ INFO ] Fill input 'image' with random values (image is expected)
[Step 10/11] Measuring performance (Start inference asyncronously, 4 inference requests using 4 streams for CPU, limits: 60000 ms duration)
[Step 11/11] Dumping statistics report
Count:      48160 iterations
Duration:   60007.22 ms
Latency:    4.93 ms
Throughput: 802.57 FPS&lt;/LI-CODE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Could you please check whether this is the expected result for such a model?&lt;/P&gt;
&lt;P&gt;BR,&lt;BR /&gt;Alexey.&lt;/P&gt;</description>
      <pubDate>Wed, 16 Sep 2020 08:27:21 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Distribution-of-OpenVINO/Int8-quantized-model-slower-than-unquantized-one/m-p/1209808#M20665</guid>
      <dc:creator>a99user</dc:creator>
      <dc:date>2020-09-16T08:27:21Z</dc:date>
    </item>
    <item>
      <title>Re: Int8 quantized model slower than unquantized one</title>
      <link>https://community.intel.com/t5/Intel-Distribution-of-OpenVINO/Int8-quantized-model-slower-than-unquantized-one/m-p/1209863#M20669</link>
      <description>&lt;P&gt;Hi!&lt;/P&gt;
&lt;P&gt;I'm having the same issue with exactly the same config file.&lt;/P&gt;
&lt;P&gt;Waiting for an answer from Intel.&lt;/P&gt;</description>
      <pubDate>Wed, 16 Sep 2020 14:18:26 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Distribution-of-OpenVINO/Int8-quantized-model-slower-than-unquantized-one/m-p/1209863#M20669</guid>
      <dc:creator>isomov</dc:creator>
      <dc:date>2020-09-16T14:18:26Z</dc:date>
    </item>
    <item>
      <title>Re: Int8 quantized model slower than unquantized one</title>
      <link>https://community.intel.com/t5/Intel-Distribution-of-OpenVINO/Int8-quantized-model-slower-than-unquantized-one/m-p/1210157#M20688</link>
      <description>&lt;P&gt;Hi Alexey,&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Thanks for reaching out.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I tested your XML file for both the quantized and the unquantized model, and I am getting the same result as you.&lt;/P&gt;&lt;P&gt;OpenVINO quantization performance depends on the specific libraries and devices involved. The slowdown is most likely caused by layers in your model that are not supported in 8-bit integer computation mode; those layers fall back to floating point, and the extra precision conversions around them can outweigh the INT8 speedup.&lt;/P&gt;&lt;P&gt;You can refer here for more details: &lt;A href="https://github.com/intel/webml-polyfill/issues/1239" rel="noopener noreferrer" target="_blank"&gt;https://github.com/intel/webml-polyfill/issues/1239&lt;/A&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Also, please check the topologies that have been validated for the 8-bit inference feature &lt;A href="https://docs.openvinotoolkit.org/latest/openvino_docs_IE_DG_Int8Inference.html" rel="noopener noreferrer" target="_blank"&gt;here&lt;/A&gt;.&lt;/P&gt;
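&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;To check which layers actually execute in INT8 on your machine, you can dump per-layer performance counters with benchmark_app. The command below is an illustrative sketch (the script path depends on your install); the execType reported for each layer shows the precision it really ran in, e.g. an I8 suffix for 8-bit kernels versus FP32 for fallback layers:&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;# Illustrative command, assuming a default 2020.4 install layout and
# that the quantized IR is saved as ./facemesh_quantized.xml.
# -pc prints per-layer performance counters, including execType.
python3 /opt/intel/openvino_2020.4.287/deployment_tools/tools/benchmark_tool/benchmark_app.py \
    -m ./facemesh_quantized.xml -d CPU -pc&lt;/LI-CODE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Regards,&lt;/P&gt;&lt;P&gt;Aznie&lt;/P&gt;&lt;BR /&gt;</description>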
      <pubDate>Thu, 17 Sep 2020 11:07:24 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Distribution-of-OpenVINO/Int8-quantized-model-slower-than-unquantized-one/m-p/1210157#M20688</guid>
      <dc:creator>IntelSupport</dc:creator>
      <dc:date>2020-09-17T11:07:24Z</dc:date>
    </item>
    <item>
      <title>Re: Int8 quantized model slower than unquantized one</title>
      <link>https://community.intel.com/t5/Intel-Distribution-of-OpenVINO/Int8-quantized-model-slower-than-unquantized-one/m-p/1211478#M20767</link>
      <description>&lt;P&gt;Hi Alexey,&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;&lt;/P&gt;&lt;P&gt;This thread will no longer be monitored since this issue has been resolved.&amp;nbsp;If you need any additional information from Intel, please submit a new question.&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;&lt;/P&gt;&lt;P&gt;Best Regards,&lt;/P&gt;&lt;P&gt;Aznie&lt;/P&gt;&lt;BR /&gt;</description>
      <pubDate>Tue, 22 Sep 2020 13:17:57 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Distribution-of-OpenVINO/Int8-quantized-model-slower-than-unquantized-one/m-p/1211478#M20767</guid>
      <dc:creator>IntelSupport</dc:creator>
      <dc:date>2020-09-22T13:17:57Z</dc:date>
    </item>
  </channel>
</rss>

