@openvino @Peh_Intel @Hari_B_Intel
Hello, I have a question about inference time. Could you please help answer the following?
1> What causes the difference between case 1 and case 2?
2> Before inference, does the model need to be compiled again every time? If so, could you share why?
case 1
1》create openvino runtime core
//Load IR model
ov::Core core;
auto model = core.read_model(MODEL_XML, MODEL_BIN);
2》compile model
//setting input data format and layout
ov::preprocess::PrePostProcessor ppp(model);
ov::preprocess::InputInfo& inputInfo = ppp.input();
inputInfo.tensor().set_element_type(ov::element::f32);
inputInfo.tensor().set_layout({ "NHWC" });
inputInfo.tensor().set_shape({ BATCH_NUM, 36, 16, 16 });
model = ppp.build();
core.set_property(ov::hint::performance_mode(ov::hint::PerformanceMode::LATENCY));
//Compiled Model
ov::CompiledModel compiled_model = core.compile_model(model, "CPU", {{ov::cache_dir.name(),"./cache"}});
3》create inference request
infer_request = compiled_model.create_infer_request();
4》set inputs
5》inference warm-up
infer_request.infer();
loop steps 6 and 7
6》start inference
infer_request.infer();
7》process inference result
case 2
1》create openvino runtime core
//Load IR model
ov::Core core;
auto model = core.read_model(MODEL_XML, MODEL_BIN);
2》compile model
//setting input data format and layout
ov::preprocess::PrePostProcessor ppp(model);
ov::preprocess::InputInfo& inputInfo = ppp.input();
inputInfo.tensor().set_element_type(ov::element::f32);
inputInfo.tensor().set_layout({ "NHWC" });
inputInfo.tensor().set_shape({ BATCH_NUM, 36, 16, 16 });
model = ppp.build();
core.set_property(ov::hint::performance_mode(ov::hint::PerformanceMode::LATENCY));
//Compiled Model
ov::CompiledModel compiled_model = core.compile_model(model, "CPU", {{ov::cache_dir.name(),"./cache"}});
3》create inference request
infer_request = compiled_model.create_infer_request();
4》set inputs
5》inference warm-up
infer_request.infer();
loop steps 6, 7, and 8
6》compile model
//Load IR model
ov::Core core;
auto model = core.read_model(MODEL_XML, MODEL_BIN);
ov::CompiledModel compiled_model = core.compile_model(model, "CPU");
7》start inference
infer_request.infer();
8》process inference result
Comparing case 1 and case 2:
1> For case 1, the inference time sometimes spikes to a much larger value and is not stable:
infer request Time taken: 0.436834 ms, coreId=10
infer request Time taken: 0.435128 ms, coreId=10
infer request Time taken: 21.8207 ms, coreId=10 -----> large inference time spike
infer request Time taken: 0.497611 ms, coreId=10
infer request Time taken: 0.442614 ms, coreId=10
2> For case 2, there are no large spikes and the inference time is stable:
infer request Time taken: 0.572869 ms, coreId=10
infer request Time taken: 0.566436 ms, coreId=10
infer request Time taken: 0.577745 ms, coreId=10
infer request Time taken: 0.559088 ms, coreId=10
infer request Time taken: 0.583944 ms, coreId=10
Hi zxc0328,
Thank you for reaching out to us.
For your information, once you have compiled the model, you do not have to recompile it to run inference.
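To illustrate, here is a minimal sketch (not from the original posts; the model path and loop count are placeholders) of compiling the model once and then reusing the same ov::CompiledModel and ov::InferRequest for every iteration:
#include <openvino/openvino.hpp>
#include <chrono>
#include <iostream>

int main() {
    ov::Core core;
    // Read and compile the model once ("model.xml" is a placeholder path).
    auto model = core.read_model("model.xml");
    ov::CompiledModel compiled_model = core.compile_model(model, "CPU");
    ov::InferRequest infer_request = compiled_model.create_infer_request();
    // ... fill the input tensor(s) of infer_request here ...
    infer_request.infer();  // warm-up run
    // Reuse the same infer_request for every subsequent inference; no recompilation needed.
    for (int i = 0; i < 100; ++i) {
        auto start = std::chrono::steady_clock::now();
        infer_request.infer();
        auto stop = std::chrono::steady_clock::now();
        std::cout << "Iteration " << i << ": "
                  << std::chrono::duration_cast<std::chrono::microseconds>(stop - start).count()
                  << " us" << std::endl;
    }
    return 0;
}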
On another note, to further investigate the inference time of your application, could you please share the following information with us?
- Hardware specifications
- Host Operating System
- OpenVINO™ toolkit version used
- Deep Learning Framework used
- Minimal code to reproduce the issue
Regards,
Wan
Hi
Thank you for the reply~
1》Hardware specifications
[root@localhost ~]# lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 64
On-line CPU(s) list: 0-63
Thread(s) per core: 2
Core(s) per socket: 32
Socket(s): 1
NUMA node(s): 1
Vendor ID: GenuineIntel
BIOS Vendor ID: Intel
CPU family: 6
Model: 143
Model name: Intel(R) Xeon(R) Gold 6433N
BIOS Model name: Intel(R) Xeon(R) Gold 6433N
Stepping: 8
CPU MHz: 1400.000
CPU max MHz: 3600.0000
CPU min MHz: 800.0000
BogoMIPS: 2800.00
Virtualization: VT-x
L1d cache: 48K
L1i cache: 32K
L2 cache: 2048K
L3 cache: 61440K
NUMA node0 CPU(s): 0-63
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb cat_l3 cat_l2 cdp_l3 invpcid_single cdp_l2 ssbd mba ibrs ibpb stibp ibrs_enhanced tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid cqm rdt_a avx512f avx512dq rdseed adx smap avx512ifma clflushopt clwb intel_pt avx512cd sha_ni avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local split_lock_detect avx_vnni avx512_bf16 wbnoinvd dtherm ida arat pln pts hwp hwp_act_window hwp_pkg_req hfi avx512vbmi umip pku ospke waitpkg avx512_vbmi2 gfni vaes vpclmulqdq avx512_vnni avx512_bitalg tme avx512_vpopcntdq la57 rdpid bus_lock_detect cldemote movdiri movdir64b enqcmd fsrm md_clear serialize tsxldtrk pconfig arch_lbr avx512_fp16 flush_l1d arch_capabilities
2》Host Operating System
[root@localhost ~]# cat /etc/os-release
NAME="Rocky Linux"
VERSION="8.8 (Green Obsidian)"
ID="rocky"
ID_LIKE="rhel centos fedora"
VERSION_ID="8.8"
PLATFORM_ID="platform:el8"
PRETTY_NAME="Rocky Linux 8.8 (Green Obsidian)"
ANSI_COLOR="0;32"
LOGO="fedora-logo-icon"
CPE_NAME="cpe:/o:rocky:rocky:8:GA"
HOME_URL="https://rockylinux.org/"
BUG_REPORT_URL="https://bugs.rockylinux.org/"
SUPPORT_END="2029-05-31"
ROCKY_SUPPORT_PRODUCT="Rocky-Linux-8"
ROCKY_SUPPORT_PRODUCT_VERSION="8.8"
REDHAT_SUPPORT_PRODUCT="Rocky Linux"
REDHAT_SUPPORT_PRODUCT_VERSION="8.8"
3》OpenVINO™ toolkit version
openvino_toolkit_rhel8_2025.2.0.19140.c01cd93e24d_x86_64.tgz
openvino_toolkit_rhel8_2025.2.0.19140.c01cd93e24d_x86_64.tgz.sha256
4》Deep Learning Framework
TensorFlow
5》 Minimal code to reproduce the issue
1> The issue does not occur when running inference in the standalone OpenVINO environment.
2> The issue does occur when the model is integrated into a real-time system.
Could you share which logs you need for further investigation? I will collect the related logs and share them with you.
Hi zxc0328,
Thank you for sharing the information with us.
I noticed that you are currently using an older version of the OpenVINO™ toolkit. You may install the latest version via the following link: https://storage.openvinotoolkit.org/repositories/openvino/packages/2025.3/linux
Next, you may compile the model once, run your application without re-compiling the model, and see if the issue persists. If the issue persists, you may share your minimal code snippet with us so that we can replicate the issue on our system.
Regards,
Wan
Hi
Thanks a lot for the prompt responses~
I checked with version 2025.3.0: I compile the model once and run my application without re-compiling it, and the issue persists.
The AI model that I used is a private model.
Is there any other debug method to further check this issue, for example, log printing and so on?
I can collect the logs and share them.
Hi zxc0328,
I noticed from the results in your previous reply that one out of five inferences takes about 21 ms, while the rest take about 0.4 ms. Did you encounter the same issue while using OpenVINO™ 2025.3? You may share the result with us.
Meanwhile, could you try to modify and run the Hello Classification Sample with the resnet-50-pytorch model and see if the issue persists? You may download the image from here, and you may download and convert resnet-50-pytorch with the OpenVINO™ model downloader and OpenVINO™ model converter. Installation steps are available here.
#include <iterator>
#include <memory>
#include <sstream>
#include <string>
#include <vector>
#include "openvino/openvino.hpp"
#include "samples/args_helper.hpp"
#include "samples/common.hpp"
#include "samples/classification_results.h"
#include "samples/slog.hpp"
#include "format_reader_ptr.h"
#include <iostream>
#include <chrono>
/**
* @brief Main with support Unicode paths, wide strings
*/
int tmain(int argc, tchar* argv[]) {
try {
// -------- Get OpenVINO runtime version --------
slog::info << ov::get_openvino_version() << slog::endl;
// -------- Parsing and validation of input arguments --------
if (argc != 4) {
slog::info << "Usage : " << TSTRING2STRING(argv[0]) << " <path_to_model> <path_to_image> <device_name>"
<< slog::endl;
return EXIT_FAILURE;
}
const std::string args = TSTRING2STRING(argv[0]);
const std::string model_path = TSTRING2STRING(argv[1]);
const std::string image_path = TSTRING2STRING(argv[2]);
const std::string device_name = TSTRING2STRING(argv[3]);
// -------- Step 1. Initialize OpenVINO Runtime Core --------
ov::Core core;
// -------- Step 2. Read a model --------
slog::info << "Loading model files: " << model_path << slog::endl;
std::shared_ptr<ov::Model> model = core.read_model(model_path);
printInputAndOutputsInfo(*model);
OPENVINO_ASSERT(model->inputs().size() == 1, "Sample supports models with 1 input only");
OPENVINO_ASSERT(model->outputs().size() == 1, "Sample supports models with 1 output only");
// -------- Step 3. Set up input
// Read input image to a tensor and set it to an infer request
// without resize and layout conversions
FormatReader::ReaderPtr reader(image_path.c_str());
if (reader.get() == nullptr) {
std::stringstream ss;
ss << "Image " + image_path + " cannot be read!";
throw std::logic_error(ss.str());
}
ov::element::Type input_type = ov::element::u8;
ov::Shape input_shape = {1, reader->height(), reader->width(), 3};
std::shared_ptr<unsigned char> input_data = reader->getData();
// just wrap image data by ov::Tensor without allocating of new memory
ov::Tensor input_tensor = ov::Tensor(input_type, input_shape, input_data.get());
const ov::Layout tensor_layout{"NHWC"};
// -------- Step 4. Configure preprocessing --------
ov::preprocess::PrePostProcessor ppp(model);
// 1) Set input tensor information:
// - input() provides information about a single model input
// - reuse precision and shape from already available `input_tensor`
// - layout of data is 'NHWC'
ppp.input().tensor().set_shape(input_shape).set_element_type(input_type).set_layout(tensor_layout);
// 2) Adding explicit preprocessing steps:
// - convert layout to 'NCHW' (from 'NHWC' specified above at tensor layout)
// - apply linear resize from tensor spatial dims to model spatial dims
ppp.input().preprocess().resize(ov::preprocess::ResizeAlgorithm::RESIZE_LINEAR);
// 4) Suppose model has 'NCHW' layout for input
ppp.input().model().set_layout("NCHW");
// 5) Set output tensor information:
// - precision of tensor is supposed to be 'f32'
ppp.output().tensor().set_element_type(ov::element::f32);
// 6) Apply preprocessing modifying the original 'model'
model = ppp.build();
// -------- Step 5. Loading a model to the device --------
ov::CompiledModel compiled_model = core.compile_model(model, device_name);
// -------- Step 6. Create an infer request --------
ov::InferRequest infer_request = compiled_model.create_infer_request();
// -----------------------------------------------------------------------------------------------------
// -------- Step 7. Prepare input --------
infer_request.set_input_tensor(input_tensor);
// -------- Step 8. Do inference synchronously --------
// Start inference 10 times
for (int i = 1; i <= 10; ++i)
{
// Start Timer
auto start = std::chrono::high_resolution_clock::now();
infer_request.infer();
const ov::Tensor& output_tensor = infer_request.get_output_tensor();
ClassificationResult classification_result(output_tensor, {image_path});
// End Timer
auto stop = std::chrono::high_resolution_clock::now();
// Calculate the duration
auto duration = std::chrono::duration_cast<std::chrono::milliseconds>(stop - start);
std::cout << "Time taken for Iteration " << i << ": " << duration.count() << "ms" << std::endl;
}
//Calculate the duration in microseconds
// -----------------------------------------------------------------------------------------------------
} catch (const std::exception& ex) {
std::cerr << ex.what() << std::endl;
return EXIT_FAILURE;
}
return EXIT_SUCCESS;
}
Regards,
Wan
Hi
1》The following is the test result with OpenVINO™ 2025.3:
infer request Time taken: 0.45246 ms, coreId=10
infer request Time taken: 0.45675 ms, coreId=10
infer request Time taken: 97.5025 ms, coreId=10
infer request Time taken: 0.436692 ms, coreId=10
infer request Time taken: 0.434528 ms, coreId=10
2》We used the following method to check the detailed inference time, but did not find the reason for the occasional large inference time:
core.set_property("CPU", ov::enable_profiling(true));
Is there any way to check more detailed timing inside infer_request.infer()? (see the sketch after this list)
3》About the sample you shared, we are trying it. When we have the result, we will share it with you.
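For reference, a minimal sketch (not part of the original posts) of reading back the per-layer timings collected when ov::enable_profiling(true) is set, using ov::InferRequest::get_profiling_info():
#include <openvino/openvino.hpp>
#include <iostream>

// Assumes the model was compiled after core.set_property("CPU", ov::enable_profiling(true)).
void print_profiling(ov::InferRequest& infer_request) {
    for (const ov::ProfilingInfo& info : infer_request.get_profiling_info()) {
        std::cout << info.node_name << " (" << info.node_type << ", " << info.exec_type << "): "
                  << info.real_time.count() << " us" << std::endl;
    }
}
Calling such a helper right after a slow iteration should show which layers account for the spike.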
Hi zxc0328,
Just wanted to follow up and see if you were able to run the sample provided in the previous reply. If you are facing an issue while running the sample, you may share the details with us.
Regards,
Wan
1》Download the ResNet50 model:
>>> import ssl
>>> import tensorflow as tf
>>> from tensorflow.keras.applications import ResNet50
>>> ssl._create_default_https_context = ssl._create_unverified_context
>>> model = ResNet50(weights='imagenet')
>>> model.save('resnet50_keras.h5')
2》I used the following command to convert the model to IR format:
ovc model_resnet50/ --compress_to_fp16 False
[ SUCCESS ] XML file: /home/samsung/openvino_env_fy/bin/model_resnet50.xml
[ SUCCESS ] BIN file: /home/samsung/openvino_env_fy/bin/model_resnet50.bin
When testing with the above model, the shape was wrong:
[xuchao.zheng@localhost Release]$ ./hello_classification /home/xuchao.zheng/openvino/resnet50/model_resnet50.xml /home/xuchao.zheng/openvino/resnet50/dog.bmp CPU
[ INFO ] Build ................................. 2025.3.0-19807-44526285f24-releases/2025/3
[ INFO ]
[ INFO ] Loading model files: /home/xuchao.zheng/openvino/resnet50/model_resnet50.xml
[ INFO ] model name: TensorFlow_Frontend_IR
[ INFO ] inputs
[ INFO ] input name: input_1
[ INFO ] input type: f32
Exception from src/core/src/partial_shape.cpp:266:
to_shape was called on a dynamic shape.
3》I used the following to convert the model to IR format:
ov_model = ov.convert_model(model,input=[1, 224, 224, 3])
ov.save_model(ov_model,'resnet50_saved_model.xml')
When testing with the above model, the shape was still wrong:
[xuchao.zheng@localhost Release]$ ./hello_classification /home/xuchao.zheng/openvino/resnet50/resnet50_saved_model.xml /home/xuchao.zheng/openvino/resnet50/dog.bmp CPU
[ INFO ] Build ................................. 2025.3.0-19807-44526285f24-releases/2025/3
[ INFO ]
[ INFO ] Loading model files: /home/xuchao.zheng/openvino/resnet50/resnet50_saved_model.xml
[ INFO ] model name: TensorFlow_Frontend_IR
[ INFO ] inputs
[ INFO ] input name: input_1
[ INFO ] input type: f32
[ INFO ] input shape: [1,224,224,3]
[ INFO ] outputs
[ INFO ] output name: predictions/Softmax:0
[ INFO ] output type: f32
[ INFO ] output shape: [1,1000]
Check 'node.get_partial_shape().compatible(context.model_shape())' failed at src/core/src/preprocess/preprocess_impls.cpp:210:
Resulting shape '[1,3,224,3]' after preprocessing is not aligned with original parameter's shape: [1,224,224,3], input parameter: input_1
4》Changed the input shape and converted the model again:
>>> ov_model = ov.convert_model(model,input=[1, 3, 224, 3]) # change shape parameter
>>> ov.save_model(ov_model,'resnet50_saved_model.xml')
When testing with the above model, there was no issue:
[xuchao.zheng@localhost Release]$ ./hello_classification /home/xuchao.zheng/openvino/resnet50/resnet50_saved_model.xml /home/xuchao.zheng/openvino/resnet50/dog.bmp CPU
[ INFO ] Build ................................. 2025.3.0-19807-44526285f24-releases/2025/3
[ INFO ]
[ INFO ] Loading model files: /home/xuchao.zheng/openvino/resnet50/resnet50_saved_model.xml
[ INFO ] model name: TensorFlow_Frontend_IR
[ INFO ] inputs
[ INFO ] input name: input_1
[ INFO ] input type: f32
[ INFO ] input shape: [1,3,224,3]
[ INFO ] outputs
[ INFO ] output name: predictions/Softmax:0
[ INFO ] output type: f32
[ INFO ] output shape: [1,1000]
Top 10 results:
Image /home/xuchao.zheng/openvino/resnet50/dog.bmp
classid probability
------- -----------
916 0.8489928
800 0.0309246
664 0.0272908
851 0.0069002
782 0.0055445
846 0.0050483
892 0.0040564
498 0.0031591
629 0.0023113
464 0.0018572
5》I added logging to check the model/input data shapes and dumped the inference input data to a .bin file for a personal test.
/opt/intel/openvino_2025.3.0/samples/cpp/hello_classification/main.cpp
std::shared_ptr<unsigned char> input_data = reader->getData();
std::cout << "x" << reader->size()<< "x" << reader->height() << "x" << reader->width() << std::endl; // add log for checking the H and W vaule
writeArrayData(input_data, reader->size()); //add function to write dog.bmp data into .bin file for Inference
// just wrap image data by ov::Tensor without allocating of new memory
ov::Tensor input_tensor = ov::Tensor(input_type, input_shape, input_data.get());
ov::Shape tensor_shape1 = input_tensor.get_shape(); // add log for checking input_tensor shape
size_t h1 = tensor_shape1[1];
size_t w1 = tensor_shape1[2];
size_t c1 = tensor_shape1[3];
// -------- Step 7. Prepare input --------
infer_request.set_input_tensor(input_tensor);
ov::Tensor input_tensor0 = infer_request.get_input_tensor(); // add log for checking input_tensor shape
ov::Shape tensor_shape0 = input_tensor0.get_shape();
size_t h0 = tensor_shape0[1];
size_t w0 = tensor_shape0[2];
size_t c0 = tensor_shape0[3];
printf("h0 = %lld, w0 = %lld, c0 = %lld h1 = %lld, w1 = %lld, c1 = %lld\n", h0, w0, c0, h1, w1, c1);
[xuchao.zheng@localhost Release]$ ./hello_classification /home/xuchao.zheng/openvino/resnet50/resnet50_saved_model.xml /home/xuchao.zheng/openvino/resnet50/dog.bmp CPU
[ INFO ] Build ................................. 2025.3.0-19807-44526285f24-releases/2025/3
[ INFO ]
[ INFO ] Loading model files: /home/xuchao.zheng/openvino/resnet50/resnet50_saved_model.xml
[ INFO ] model name: TensorFlow_Frontend_IR
[ INFO ] inputs
[ INFO ] input name: input_1
[ INFO ] input type: f32
[ INFO ] input shape: [1,3,224,3]
[ INFO ] outputs
[ INFO ] output name: predictions/Softmax:0
[ INFO ] output type: f32
[ INFO ] output shape: [1,1000]
x150528x224x224
done,data size: 150528
h0 = 224, w0 = 224, c0 = 3 h1 = 224, w1 = 224, c1 = 3
Top 10 results:
Image /home/xuchao.zheng/openvino/resnet50/dog.bmp
classid probability
------- -----------
916 0.8489928
800 0.0309246
664 0.0272908
851 0.0069002
782 0.0055445
846 0.0050483
892 0.0040564
498 0.0031591
629 0.0023113
464 0.0018572
6》I made a personal test with the converted ResNet50, and only the first iteration prints a non-zero inference time; I have not found the reason.
[xuchao.zheng@localhost resnet50]$ ./Inference
Time taken for Iteration 1: 1ms
Time taken for Iteration 2: 0ms
Time taken for Iteration 3: 0ms
Time taken for Iteration 4: 0ms
Time taken for Iteration 5: 0ms
Time taken for Iteration 6: 0ms
Time taken for Iteration 7: 0ms
Time taken for Iteration 8: 0ms
Time taken for Iteration 9: 0ms
Time taken for Iteration 10: 0ms
ai_r.h
#pragma once
#include <cstddef>
#include <iostream>
#include <fstream>
#include <cmath>
#include <vector>
#include <complex>
#include <cstring>
#include <chrono>
#include <cstdint>
#include <openvino/openvino.hpp>
#define MODEL_XML "resnet50_saved_model.xml"
#define MODEL_BIN "resnet50_saved_model.bin"
extern std::string INPUT_REF_DATA; // MA input data
void infer_input_load(ov::InferRequest& infer_request);
void infer_init(ov::InferRequest& infer_request);
void ai_ce_request(ov::InferRequest& infer_request);
ai_r.cpp
#include <ai_r.h>
std::string INPUT_REF_DATA = "data.bin";
std::shared_ptr<unsigned char> input_data;
void infer_input_load(ov::InferRequest& infer_request) {
ov::Tensor input_tensor = infer_request.get_input_tensor();
unsigned char* host_data = input_tensor.data<unsigned char>();
// Load inference data
std::ifstream file(INPUT_REF_DATA, std::ios::binary);
if (!file.is_open()) {
std::cerr << "Failed to open file: " << INPUT_REF_DATA << std::endl;
return ;
}
file.seekg(0, file.end);
size_t size = file.tellg();
file.seekg(0, file.beg);
std::shared_ptr<unsigned char> input_data(new unsigned char[size]);
file.read(reinterpret_cast<char*>(input_data.get()), size);
file.close();
//ov::element::Type input_type = ov::element::u8;
//ov::Shape input_shape = {1, 224, 224, 3};
// just wrap image data by ov::Tensor without allocating of new memory
//ov::Tensor input_tensor = ov::Tensor(input_type, input_shape, input_data.get());
// -------- Step 7. Prepare input --------
//infer_request.set_input_tensor(input_tensor);
memcpy(host_data, input_data.get(), size);
}
void infer_init(ov::InferRequest& infer_request) {
//Load IR model
ov::Core core;
auto model = core.read_model(MODEL_XML, MODEL_BIN);
ov::element::Type input_type = ov::element::u8;
ov::Shape input_shape = {1, 224, 224, 3};
const ov::Layout tensor_layout{"NHWC"};
// -------- Step 4. Configure preprocessing --------
ov::preprocess::PrePostProcessor ppp(model);
// 1) Set input tensor information:
// - input() provides information about a single model input
// - reuse precision and shape from already available `input_tensor`
// - layout of data is 'NHWC'
ppp.input().tensor().set_shape(input_shape).set_element_type(input_type).set_layout(tensor_layout);
// 2) Adding explicit preprocessing steps:
// - convert layout to 'NCHW' (from 'NHWC' specified above at tensor layout)
// - apply linear resize from tensor spatial dims to model spatial dims
ppp.input().preprocess().resize(ov::preprocess::ResizeAlgorithm::RESIZE_LINEAR);
// 4) Suppose model has 'NCHW' layout for input
ppp.input().model().set_layout("NCHW");
// 5) Set output tensor information:
// - precision of tensor is supposed to be 'f32'
ppp.output().tensor().set_element_type(ov::element::f32);
// 6) Apply preprocessing modifying the original 'model'
model = ppp.build();
// apply changes and get the compiled model
ov::CompiledModel compiled_model = core.compile_model(model, "CPU");
//create infer request
infer_request = compiled_model.create_infer_request();
}
void ai_ce_request(ov::InferRequest& infer_request) {
infer_request.infer();
}
main_test.cpp
#include <ai_r.h>
/**
* @brief Main with support Unicode paths, wide strings
*/
int main() {
try {
ov::InferRequest infer_request;
infer_init(infer_request);
infer_input_load(infer_request);
// -------- Step 8. Do inference synchronously --------
// Start inference 10 times
for (int i = 1; i <= 10; ++i)
{
// Start Timer
auto start = std::chrono::high_resolution_clock::now();
ai_ce_request(infer_request);
//const ov::Tensor& output_tensor = infer_request.get_output_tensor();
//ClassificationResult classification_result(output_tensor, {image_path});
// End Timer
auto stop = std::chrono::high_resolution_clock::now();
// Calculate the duration
auto duration = std::chrono::duration_cast<std::chrono::milliseconds>(stop - start);
std::cout << "Time taken for Iteration " << i << ": " << duration.count() << "ms" << std::endl;
}
//Calculate the duration in microseconds
// -----------------------------------------------------------------------------------------------------
} catch (const std::exception& ex) {
std::cerr << ex.what() << std::endl;
return EXIT_FAILURE;
}
return EXIT_SUCCESS;
}
Hi zxc0328,
Thank you for sharing the result with us.
The reason only the first iteration shows a non-zero inference time may be that the inference is very fast, so the remaining durations round down to 0 ms.
You may use the following line to change milliseconds to microseconds and see if it displays the inference time:
auto duration = std::chrono::duration_cast<std::chrono::microseconds>(stop - start);
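As a sketch of how that change fits into a timing loop (time_inferences is a hypothetical helper; it assumes an already prepared infer_request, as in the code above):
#include <openvino/openvino.hpp>
#include <chrono>
#include <iostream>

// Hypothetical helper: times n synchronous inferences and reports microseconds.
void time_inferences(ov::InferRequest& infer_request, int n) {
    for (int i = 1; i <= n; ++i) {
        auto start = std::chrono::high_resolution_clock::now();
        infer_request.infer();
        auto stop = std::chrono::high_resolution_clock::now();
        auto duration = std::chrono::duration_cast<std::chrono::microseconds>(stop - start);
        std::cout << "Time taken for Iteration " << i << ": " << duration.count() << " us" << std::endl;
    }
}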
Regards,
Wan
Hi zxc0328,
Thank you for your question.
We will proceed with closing this thread as we have provided suggestions. If you have additional questions, please submit a new thread, as this thread will no longer be monitored.
Regards,
Wan