Hello.
When I measure the inference time of the model I am testing, the first inference is much slower than the rest, and its measurement does not seem to be included in the reported latency statistics.
I would like to know why the first inference is slow and why it is excluded from the average latency value.
The test log is as follows.
Thank you.
----------------------------------------------------------------------------------------------------------
(python_310) css@sapphire:~/workspace/Resnet/src$ benchmark_app -m model.xml -d CPU -shape [2,2,100,32] -ip f16 -nthreads 1 -hint none
[Step 1/11] Parsing and validating input arguments
[ INFO ] Parsing input parameters
[Step 2/11] Loading OpenVINO Runtime
[ INFO ] OpenVINO:
[ INFO ] Build ................................. 2024.3.0-16041-1e3b88e4e3f-releases/2024/3
[ INFO ]
[ INFO ] Device info:
[ INFO ] CPU
[ INFO ] Build ................................. 2024.3.0-16041-1e3b88e4e3f-releases/2024/3
[ INFO ]
[ INFO ]
[Step 3/11] Setting device configuration
[Step 4/11] Reading model files
[ INFO ] Loading model files
[ INFO ] Read model took 5.64 ms
[ INFO ] Original model I/O parameters:
[ INFO ] Model inputs:
[ INFO ] input (node: input) : bf16 / [...] / [?,2,100,32]
[ INFO ] Model outputs:
[ INFO ] output (node: output) : bf16 / [...] / [?,2,100,32]
[Step 5/11] Resizing model to match image sizes and given batch
[ INFO ] Model batch size: 2
[ INFO ] Reshaping model: 'input': [2,2,100,32]
[ INFO ] Reshape model took 0.57 ms
[Step 6/11] Configuring input of the model
[ INFO ] Model inputs:
[ INFO ] input (node: input) : f16 / [N,C,H,W] / [2,2,100,32]
[ INFO ] Model outputs:
[ INFO ] output (node: output) : bf16 / [...] / [2,2,100,32]
[Step 7/11] Loading the model to the device
[ INFO ] Compile model took 33.16 ms
[Step 8/11] Querying optimal runtime parameters
[ INFO ] Model:
[ INFO ] NETWORK_NAME: main_graph
[ INFO ] OPTIMAL_NUMBER_OF_INFER_REQUESTS: 1
[ INFO ] NUM_STREAMS: 1
[ INFO ] INFERENCE_NUM_THREADS: 1
[ INFO ] PERF_COUNT: NO
[ INFO ] INFERENCE_PRECISION_HINT: <Type: 'bfloat16'>
[ INFO ] PERFORMANCE_HINT: LATENCY
[ INFO ] EXECUTION_MODE_HINT: ExecutionMode.PERFORMANCE
[ INFO ] PERFORMANCE_HINT_NUM_REQUESTS: 0
[ INFO ] ENABLE_CPU_PINNING: True
[ INFO ] SCHEDULING_CORE_TYPE: SchedulingCoreType.ANY_CORE
[ INFO ] MODEL_DISTRIBUTION_POLICY: set()
[ INFO ] ENABLE_HYPER_THREADING: False
[ INFO ] EXECUTION_DEVICES: ['CPU']
[ INFO ] CPU_DENORMALS_OPTIMIZATION: False
[ INFO ] LOG_LEVEL: Level.NO
[ INFO ] CPU_SPARSE_WEIGHTS_DECOMPRESSION_RATE: 1.0
[ INFO ] DYNAMIC_QUANTIZATION_GROUP_SIZE: 32
[ INFO ] KV_CACHE_PRECISION: <Type: 'float16'>
[ INFO ] AFFINITY: Affinity.CORE
[Step 9/11] Creating infer requests and preparing input tensors
[ WARNING ] No input files were given for input 'input'!. This input will be filled with random values!
[ INFO ] Fill input 'input' with random values
[Step 10/11] Measuring performance (Start inference asynchronously, 1 inference requests, limits: 60000 ms duration)
[ INFO ] Benchmarking in inference only mode (inputs filling are not included in measurement loop).
[ INFO ] First inference took 2.07 ms
[Step 11/11] Dumping statistics report
[ INFO ] Execution Devices:['CPU']
[ INFO ] Count: 109716 iterations
[ INFO ] Duration: 60000.89 ms
[ INFO ] Latency:
[ INFO ] Median: 0.53 ms
[ INFO ] Average: 0.53 ms
[ INFO ] Min: 0.52 ms
[ INFO ] Max: 1.39 ms
[ INFO ] Throughput: 3657.15 FPS
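As a side note, the same warm-up effect should be reproducible outside benchmark_app with a minimal timing sketch like the one below (a sketch only, assuming the openvino Python package and the same model.xml; the comments describe typical one-time costs, not confirmed causes for this model):

import time
import numpy as np
import openvino as ov

core = ov.Core()
model = core.read_model("model.xml")        # same model as in the log above
model.reshape([2, 2, 100, 32])              # match the -shape argument
compiled = core.compile_model(model, "CPU")

data = np.random.rand(2, 2, 100, 32).astype(np.float16)

# First run: pays one-time costs (e.g. lazy allocations, thread start-up),
# so it is typically slower than the steady state.
start = time.perf_counter()
compiled(data)
print(f"first inference: {(time.perf_counter() - start) * 1e3:.2f} ms")

# Subsequent runs reflect the steady-state latency.
for i in range(5):
    start = time.perf_counter()
    compiled(data)
    print(f"inference {i + 2}: {(time.perf_counter() - start) * 1e3:.2f} ms")

Note that in the log above, the first inference (2.07 ms) is higher than the reported Max (1.39 ms), which suggests it is reported separately rather than folded into the statistics.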
Hi darkpilia,
For testing purposes, I ran benchmark_app with two Intel pre-trained models, face-detection-0200 and age-gender-recognition-retail-0013. In both results, the first inference time falls within the range between the minimum and maximum latency values.
As such, could you try running benchmark_app again with your custom model? A run along the lines shown below should work.
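For reference (the model path is illustrative; point it at wherever the Open Model Zoo model was downloaded):

benchmark_app -m face-detection-0200.xml -d CPU -nthreads 1 -hint none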
Regards,
Peh
Hi darkpilia,
We have not heard back from you. Thank you for your question. If you need any additional information from Intel, please submit a new question as Intel is no longer monitoring this thread.
Regards,
Peh