<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: The reason why the first inference time is slow in Intel® Distribution of OpenVINO™ Toolkit</title>
    <link>https://community.intel.com/t5/Intel-Distribution-of-OpenVINO/The-reason-why-the-first-interference-time-is-slow/m-p/1641560#M31591</link>
    <description>&lt;P&gt;&lt;SPAN style="font-size: 16px;"&gt;Hi darkpilia,&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN style="font-size: 16px;"&gt;I tried running benchmark_app with two Intel Pre-trained models, face-detection-0200 and age-gender-recognition-retail-0013, for testing purposes. In both results, the first inference time falls within the range between the minimum and maximum latency values.&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN style="font-size: 16px;"&gt;As such, could you try running benchmark_app again with your custom model?&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN style="font-size: 16px;"&gt;Regards,&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN style="font-size: 16px;"&gt;Peh&lt;/SPAN&gt;&lt;/P&gt;&lt;BR /&gt;</description>
    <pubDate>Wed, 06 Nov 2024 05:55:03 GMT</pubDate>
    <dc:creator>Peh_Intel</dc:creator>
    <dc:date>2024-11-06T05:55:03Z</dc:date>
    <item>
      <title>The reason why the first inference time is slow</title>
      <link>https://community.intel.com/t5/Intel-Distribution-of-OpenVINO/The-reason-why-the-first-interference-time-is-slow/m-p/1641164#M31589</link>
      <description>&lt;P&gt;Hello.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;When I measure the inference time of the model I am testing, the first inference is much slower than the rest, and this measurement does not seem to be included in the reported latency values.&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt;I wonder why the first inference is slow and why this value is not included in the mean latency value.&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;The test log is as follows.&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;Thank you.&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;----------------------------------------------------------------------------------------------------------&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;(python_310) css@sapphire:~/workspace/Resnet/src$ benchmark_app -m model.xml -d CPU -shape [2,2,100,32] -ip f16 -nthreads 1 -hint none&lt;BR /&gt;[Step 1/11] Parsing and validating input arguments&lt;BR /&gt;[ INFO ] Parsing input parameters&lt;BR /&gt;[Step 2/11] Loading OpenVINO Runtime&lt;BR /&gt;[ INFO ] OpenVINO:&lt;BR /&gt;[ INFO ] Build ................................. 2024.3.0-16041-1e3b88e4e3f-releases/2024/3&lt;BR /&gt;[ INFO ]&lt;BR /&gt;[ INFO ] Device info:&lt;BR /&gt;[ INFO ] CPU&lt;BR /&gt;[ INFO ] Build ................................. 2024.3.0-16041-1e3b88e4e3f-releases/2024/3&lt;BR /&gt;[ INFO ]&lt;BR /&gt;[ INFO ]&lt;BR /&gt;[Step 3/11] Setting device configuration&lt;BR /&gt;[Step 4/11] Reading model files&lt;BR /&gt;[ INFO ] Loading model files&lt;BR /&gt;[ INFO ] Read model took 5.64 ms&lt;BR /&gt;[ INFO ] Original model I/O parameters:&lt;BR /&gt;[ INFO ] Model inputs:&lt;BR /&gt;[ INFO ] input (node: input) : bf16 / [...] / [?,2,100,32]&lt;BR /&gt;[ INFO ] Model outputs:&lt;BR /&gt;[ INFO ] output (node: output) : bf16 / [...] 
/ [?,2,100,32]&lt;BR /&gt;[Step 5/11] Resizing model to match image sizes and given batch&lt;BR /&gt;[ INFO ] Model batch size: 2&lt;BR /&gt;[ INFO ] Reshaping model: 'input': [2,2,100,32]&lt;BR /&gt;[ INFO ] Reshape model took 0.57 ms&lt;BR /&gt;[Step 6/11] Configuring input of the model&lt;BR /&gt;[ INFO ] Model inputs:&lt;BR /&gt;[ INFO ] input (node: input) : f16 / [N,C,H,W] / [2,2,100,32]&lt;BR /&gt;[ INFO ] Model outputs:&lt;BR /&gt;[ INFO ] output (node: output) : bf16 / [...] / [2,2,100,32]&lt;BR /&gt;[Step 7/11] Loading the model to the device&lt;BR /&gt;[ INFO ] Compile model took 33.16 ms&lt;BR /&gt;[Step 8/11] Querying optimal runtime parameters&lt;BR /&gt;[ INFO ] Model:&lt;BR /&gt;[ INFO ] NETWORK_NAME: main_graph&lt;BR /&gt;[ INFO ] OPTIMAL_NUMBER_OF_INFER_REQUESTS: 1&lt;BR /&gt;[ INFO ] NUM_STREAMS: 1&lt;BR /&gt;[ INFO ] INFERENCE_NUM_THREADS: 1&lt;BR /&gt;[ INFO ] PERF_COUNT: NO&lt;BR /&gt;[ INFO ] INFERENCE_PRECISION_HINT: &amp;lt;Type: 'bfloat16'&amp;gt;&lt;BR /&gt;[ INFO ] PERFORMANCE_HINT: LATENCY&lt;BR /&gt;[ INFO ] EXECUTION_MODE_HINT: ExecutionMode.PERFORMANCE&lt;BR /&gt;[ INFO ] PERFORMANCE_HINT_NUM_REQUESTS: 0&lt;BR /&gt;[ INFO ] ENABLE_CPU_PINNING: True&lt;BR /&gt;[ INFO ] SCHEDULING_CORE_TYPE: SchedulingCoreType.ANY_CORE&lt;BR /&gt;[ INFO ] MODEL_DISTRIBUTION_POLICY: set()&lt;BR /&gt;[ INFO ] ENABLE_HYPER_THREADING: False&lt;BR /&gt;[ INFO ] EXECUTION_DEVICES: ['CPU']&lt;BR /&gt;[ INFO ] CPU_DENORMALS_OPTIMIZATION: False&lt;BR /&gt;[ INFO ] LOG_LEVEL: Level.NO&lt;BR /&gt;[ INFO ] CPU_SPARSE_WEIGHTS_DECOMPRESSION_RATE: 1.0&lt;BR /&gt;[ INFO ] DYNAMIC_QUANTIZATION_GROUP_SIZE: 32&lt;BR /&gt;[ INFO ] KV_CACHE_PRECISION: &amp;lt;Type: 'float16'&amp;gt;&lt;BR /&gt;[ INFO ] AFFINITY: Affinity.CORE&lt;BR /&gt;[Step 9/11] Creating infer requests and preparing input tensors&lt;BR /&gt;[ WARNING ] No input files were given for input 'input'!. 
This input will be filled with random values!&lt;BR /&gt;[ INFO ] Fill input 'input' with random values&lt;BR /&gt;[Step 10/11] Measuring performance (Start inference asynchronously, 1 inference requests, limits: 60000 ms duration)&lt;BR /&gt;[ INFO ] Benchmarking in inference only mode (inputs filling are not included in measurement loop).&lt;BR /&gt;[ INFO ] First inference took 2.07 ms&lt;BR /&gt;[Step 11/11] Dumping statistics report&lt;BR /&gt;[ INFO ] Execution Devices:['CPU']&lt;BR /&gt;[ INFO ] Count: 109716 iterations&lt;BR /&gt;[ INFO ] Duration: 60000.89 ms&lt;BR /&gt;[ INFO ] Latency:&lt;BR /&gt;[ INFO ] Median: 0.53 ms&lt;BR /&gt;[ INFO ] Average: 0.53 ms&lt;BR /&gt;[ INFO ] Min: 0.52 ms&lt;BR /&gt;[ INFO ] Max: 1.39 ms&lt;BR /&gt;[ INFO ] Throughput: 3657.15 FPS&lt;BR /&gt;&lt;/SPAN&gt;&lt;/P&gt;</description>
      <pubDate>Mon, 04 Nov 2024 23:45:18 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Distribution-of-OpenVINO/The-reason-why-the-first-interference-time-is-slow/m-p/1641164#M31589</guid>
      <dc:creator>darkpilia</dc:creator>
      <dc:date>2024-11-04T23:45:18Z</dc:date>
    </item>
    <item>
      <title>Re: The reason why the first inference time is slow</title>
      <link>https://community.intel.com/t5/Intel-Distribution-of-OpenVINO/The-reason-why-the-first-interference-time-is-slow/m-p/1641560#M31591</link>
      <description>&lt;P&gt;&lt;SPAN style="font-size: 16px;"&gt;Hi darkpilia,&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN style="font-size: 16px;"&gt;I tried running benchmark_app with two Intel Pre-trained models, face-detection-0200 and age-gender-recognition-retail-0013, for testing purposes. In both results, the first inference time falls within the range between the minimum and maximum latency values.&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN style="font-size: 16px;"&gt;As such, could you try running benchmark_app again with your custom model?&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN style="font-size: 16px;"&gt;Regards,&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN style="font-size: 16px;"&gt;Peh&lt;/SPAN&gt;&lt;/P&gt;&lt;BR /&gt;</description>
      <pubDate>Wed, 06 Nov 2024 05:55:03 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Distribution-of-OpenVINO/The-reason-why-the-first-interference-time-is-slow/m-p/1641560#M31591</guid>
      <dc:creator>Peh_Intel</dc:creator>
      <dc:date>2024-11-06T05:55:03Z</dc:date>
    </item>
    <item>
      <title>Re: The reason why the first inference time is slow</title>
      <link>https://community.intel.com/t5/Intel-Distribution-of-OpenVINO/The-reason-why-the-first-interference-time-is-slow/m-p/1644971#M31641</link>
      <description>&lt;P&gt;&lt;SPAN style="font-size: 16px;"&gt;Hi darkpilia,&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN style="font-size: 16px;"&gt;We have not heard back from you. Thank you for your question. If you need any additional information from Intel, please submit a new question as Intel is no longer monitoring this thread.&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN style="font-size: 16px;"&gt;Regards,&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN style="font-size: 16px;"&gt;Peh&lt;/SPAN&gt;&lt;/P&gt;&lt;BR /&gt;</description>
      <pubDate>Fri, 22 Nov 2024 05:42:52 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Distribution-of-OpenVINO/The-reason-why-the-first-interference-time-is-slow/m-p/1644971#M31641</guid>
      <dc:creator>Peh_Intel</dc:creator>
      <dc:date>2024-11-22T05:42:52Z</dc:date>
    </item>
  </channel>
</rss>