Hi Forum
I'm testing a production solution for inference on the edge.
(We have some simple CNN models which have been trained using TF2 and converted to OpenVINO IR via the Model Optimizer, mo.)
Everything works, and has for a long time, but my assumption that a newer Intel CPU would be faster has not held up.
On the developer machine (i7-11700): 7 ms per inference.
On the production machine (i5-13600K): 9 ms per inference.
These are mean results over many iterations of:
request.set_input_tensor( wrapMat2Tensor( M ) );
request.start_async();
request.wait();
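For context, the full measurement loop looks roughly like this (a trimmed-down, self-contained sketch; the real code fills the input from a cv::Mat via our wrapMat2Tensor helper, which I leave out here, and "model.xml" stands in for our converted IR):

#include <openvino/openvino.hpp>
#include <chrono>
#include <iostream>

int main() {
    ov::Core core;
    ov::CompiledModel compiled_model = core.compile_model(
        "model.xml", "CPU",
        ov::hint::performance_mode(ov::hint::PerformanceMode::LATENCY));
    ov::InferRequest request = compiled_model.create_infer_request();

    request.infer();  // warm-up so one-time allocations don't skew the mean

    const int iterations = 1000;
    const auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < iterations; ++i) {
        // In the real code: request.set_input_tensor(wrapMat2Tensor(M));
        request.start_async();
        request.wait();
    }
    const auto end = std::chrono::steady_clock::now();
    const double mean_ms =
        std::chrono::duration<double, std::milli>(end - start).count() / iterations;
    std::cout << "mean latency: " << mean_ms << " ms\n";
}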
On both machines the
/sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
is set to performance.
I naively thought I would get a few milliseconds off the inference time (newer core, more GHz, more cache, more more more ..).
What can I do/test/verify/change?
- is my CPU (13th-gen i5) really slower for this kind of job than the older 11th-gen i7?
- must I buy an i7 or i9 to really see a performance boost?
- does OpenVINO (2023) use the 'performance cores' by default? If not, could you direct me towards a C++ API that will enable this?
- IF I need the extra performance (AND YES I DO!), what hardware should I be looking at?
Hope to get a few pointers. Have a good one!
/Brian
Regarding P-Cores and E-Cores:
I found the documentation at https://docs.openvino.ai/2023.0/groupov_runtime_cpp_prop_api.html#detailed-documentation
Using
ov::CompiledModel compiled_model = core.compile_model(model, "CPU",
    ov::hint::performance_mode(ov::hint::PerformanceMode::LATENCY),
    ov::hint::scheduling_core_type(ov::hint::SchedulingCoreType::ECORE_ONLY));
I get 19 ms,
and with
    ov::hint::scheduling_core_type(ov::hint::SchedulingCoreType::PCORE_ONLY));
I get the 9 ms.
I.e. it is already utilizing the P-cores in my default setup.
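You can also read the effective hint back from the compiled model to confirm which cores the plugin picked (a sketch; I'm assuming the 2023 runtime exposes the property on the compiled model):

// Assumption: the CPU plugin lets you query the hint back in 2023.x.
auto selected = compiled_model.get_property(ov::hint::scheduling_core_type);
std::cout << "scheduling_core_type: " << selected << std::endl;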
Hi Brian2,
Thanks for reaching out.
Yes, you may set the specific core type based on your requirements. However, slower inference on an older CPU is expected, given the system configuration and hardware specifications. Before upgrading your hardware, I would advise you to refer to the Intel® Distribution of OpenVINO™ toolkit Benchmark Results for the inference performance on a specific hardware configuration. It also contains a thorough explanation of the factors that affect IR model performance.
Regards,
Aznie
And thanks for reaching out..
Yes, I would also expect slow inference on an old CPU.
BUT my topic is the inverse:
I did not expect slower inference on a newer CPU (the 11th-gen i7 is faster than the 13th-gen i5).
I know that you cannot investigate my topic in detail without my system, BUT maybe you have some trick up your sleeve or some pointers to help me locate why I see these results ..
I am aiming at a simple production system with just a single CPU, but maybe I do need a top-of-the-line part for my purpose ..
Hi Brian2,
Sorry for the misunderstanding. Model Optimizer can produce an IR with different precisions. Which precision did you test?
Generally, performance means how fast the model runs in deployment, with two key metrics as a measurement: latency and throughput. You could try tuning throughput and latency by using OpenVINO performance hints.
ov::CompiledModel compiled_model = core.compile_model(model, "CPU",
    ov::hint::performance_mode(ov::hint::PerformanceMode::THROUGHPUT));
Refer to Performance Hints: Latency and Throughput documentation.
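To actually benefit from throughput mode you need several requests in flight; a minimal sketch continuing from the call above (the request count comes from the plugin via the optimal_number_of_infer_requests property):

// Ask the plugin how many requests it can keep busy, then run them concurrently.
const uint32_t n = compiled_model.get_property(ov::optimal_number_of_infer_requests);
std::vector<ov::InferRequest> requests;
for (uint32_t i = 0; i < n; ++i)
    requests.push_back(compiled_model.create_infer_request());
for (auto& r : requests) r.start_async();
for (auto& r : requests) r.wait();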
Hope this helps.
Regards,
Aznie
Yeah, I have tried different settings and compared on both machines, and tried different designs where I perform single inferences in sequence as well as constructing a set of requests and running them in parallel .. all with the same results.
I have also tried several different build setups, building from source and installing the runtime via apt ..
Anyway, I have accepted your answer; it did point me to a resource I had skipped ..
Thanks .. have a good one
/Brian
Hi Brian2,
This thread will no longer be monitored since this issue has been resolved. If you need any additional information from Intel, please submit a new question.
Regards,
Aznie
