Is Linux kernel 5.19 optimized for 12th Gen CPUs?

Aqil-ITXOTIC · ‎08-01-2023

Hello,

I am benchmarking my license plate detection (ALPR) model, comparing an i7-11700 against an i9-12900KF. Surprisingly, the i7-11700 slightly outperforms the i9-12900KF, and I am trying to understand why.

The test configuration is as follows (for both CPUs):

- My license plate detection (ALPR) model is FP16

- My code is written in Python (version 3.8)

- OS: Ubuntu 22.04, kernel 5.19

- Inference uses OpenVINO 2023.0.1 (CPU mode)

The benchmark results I get (on average) for detecting a license plate:

- i7-11700 : 8.7095ms

- i9-12900KF : 9.938ms

My question is: how come the i7-11700 beats the i9-12900KF by 1.229 ms? In theory, the i9-12900KF should be faster than the i7-11700, right?

What could explain this result? Is the kernel at fault, i.e. not optimized to handle P-cores and E-cores?
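For scale, the gap can also be expressed relative to the i7-11700 average. A quick check using only the two averages quoted above (the variable names are just for illustration):

```python
# Average detection latencies reported above, in milliseconds.
latency_ms = {"i7-11700": 8.7095, "i9-12900KF": 9.938}

# Absolute gap between the two averages.
gap_ms = latency_ms["i9-12900KF"] - latency_ms["i7-11700"]

# Gap as a fraction of the i7-11700 average.
relative = gap_ms / latency_ms["i7-11700"]

print(f"gap: {gap_ms:.4f} ms")
print(f"relative slowdown of the i9-12900KF: {relative:.1%}")
```

The gap is about 1.23 ms, which is roughly 14% of the i7-11700 average.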

Thank you,

Aqil

Megat_Intel · ‎08-02-2023

Hi Aqil,

Thank you for reaching out to us.

Yes, you are correct: the Intel® Core™ i9-12900KF is more powerful than the Intel® Core™ i7-11700 and should, in theory, perform better. However, other factors, such as the model used, might affect the results of OpenVINO™ inference.

From the Performance Benchmark page, I configured the benchmark graph comparing i7-1185G7 and i9-12900HK. The results show that while most models perform better (lower latency) on the 12th gen CPU, there are some models that perform better (lower latency) on the 11th gen CPU. I share some of the results below:

Based on your average results, the difference of 1.229ms might be plausible. However, to provide any confirmation, more details are needed. Please share with us your model for further investigation.

On the other hand, the CPU plugin offers thread scheduling on 12th gen Intel® Core and up. You can choose to run inference on E-cores, P-cores, or both, depending on your application’s configurations. You can read more on the Release Notes page, under the New and Changed in 2023.0 section.

Could you try out the thread scheduling and run the benchmark on Intel® Core™ i9-12900KF using only P-cores to see if this solves your issue? For your information, I configured my scheduling core type to P-cores in Python here:
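A minimal sketch of this kind of configuration, assuming OpenVINO 2023.0 or later, where the CPU plugin accepts the `SCHEDULING_CORE_TYPE` property with the value `PCORE_ONLY`. The helper names and the model path argument are hypothetical, for illustration only:

```python
def pcore_only_config():
    """Return a CPU plugin config that restricts inference to P-cores."""
    # String form of the scheduling core type property (assumed key and value).
    return {"SCHEDULING_CORE_TYPE": "PCORE_ONLY"}


def compile_on_pcores(model_path, device="CPU"):
    """Compile the model at `model_path` with the P-core-only config."""
    # Imported lazily so the config helper works without OpenVINO installed.
    from openvino.runtime import Core

    core = Core()
    model = core.read_model(model_path)
    return core.compile_model(model, device, pcore_only_config())


if __name__ == "__main__":
    print(pcore_only_config())
```

On a hybrid CPU, calling `compile_on_pcores` with the path to an IR `.xml` file should return a compiled model whose inference threads are scheduled on P-cores only.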

Regards,

Megat

Aqil-ITXOTIC · ‎08-04-2023

Thank you for sharing and for your response.

It was my mistake: I had not noticed that OpenVINO can now schedule inference on P-cores or E-cores. I'll modify my code and then analyze the results.

Thank you and best regards,

Aqil

Aqil-ITXOTIC · ‎08-07-2023

Hi,

I analyzed the results after adding the PCORE_ONLY scheduling parameter in the IR part, but they did not improve. The results are shown below; all numbers are in milliseconds (ms).

i9-12900KF (without scheduling)
min 7.426463
max 12.879063
avg 9.938032436

i9-12900KF (scheduling PCORE_ONLY)

min 7.024152001
max 12.607284
avg 9.807636577

i7-11700

min 8.051206
max 13.548221
avg 8.709539491

From the results above, I gained only about 0.1 ms by enabling scheduling. I also want to share the piece of code that does the IR part, shown below (we're using the YOLOv7 OpenVINO script)

My question: what are the best IR parameters for a 12th Gen CPU?

In my data, the i7-11700 and i5-1135G7 both gave logical, consistent benchmark results. Only the 12th Gen CPU did not.

Best regards and thank you,

Aqil

Aqil-ITXOTIC · ‎08-07-2023

Hi,

I got new results, and there is no great improvement. All numbers are in milliseconds (ms).

i9-12900KF (no scheduling core parameter) :

min	7.426463
max	12.879063
avg	9.938032436

i9-12900KF using PCORE_ONLY :

min	7.024152001
max	12.607284
avg	9.807636577

i7-11700

min	8.051206
max	13.548221
avg	8.709539491

As shown above, I gained only roughly 0.1 ms by setting the scheduling core parameter to PCORE_ONLY. Here is the piece of code where the inference takes place:

Some more information: we're using YOLOv7 with OpenVINO. My benchmark program plays a local MP4 video of cars entering a toll gate; we detect the license plate and read it (detect the plate, crop it, and feed it into optical character recognition).
The numbers I shared cover only the detection part (a car enters, we detect the plate). This part takes the most time, compared with post-processing and OCR.

So my question: based on the code I shared, what are the best inference parameters, especially for a 12th Gen CPU?
I've tested on the i7-11700 and i5-1135G7, and the results for those two CPUs are logical and consistent.
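Timing only the detection stage, as described above, can be sketched like this. It is a generic harness, not the benchmark program used here; `time_stage` and `fake_detect` are hypothetical names, and the warm-up count is an arbitrary choice:

```python
import statistics
import time


def time_stage(stage, frames, warmup=5):
    """Time `stage(frame)` per frame and return min/max/avg/median in ms."""
    frames = list(frames)
    # Warm-up calls are not timed, so one-time setup cost does not skew the stats.
    for frame in frames[:warmup]:
        stage(frame)

    samples_ms = []
    for frame in frames:
        start = time.perf_counter()
        stage(frame)
        samples_ms.append((time.perf_counter() - start) * 1000.0)

    return {
        "min": min(samples_ms),
        "max": max(samples_ms),
        "avg": statistics.fmean(samples_ms),
        "median": statistics.median(samples_ms),
    }


if __name__ == "__main__":
    # Stand-in workload; replace it with the real detection call.
    def fake_detect(frame):
        return sum(i * i for i in range(10_000))

    print(time_stage(fake_detect, range(50)))
```

Reporting the median next to the average helps show whether a few slow frames are pulling the average up.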

Best regards and thank you,

Aqil

Megat_Intel · ‎08-08-2023

Hi Aqil,

Thanks for the details you provided.

For your information, on my end, I validated the YOLOv7 model with the OpenVINO™ Benchmark Tool on an Intel® Core™ i9-12900 and an Intel® Core™ i7-1185G7. My results show that the i9-12900 performs better (137.90 ms avg) than the i7-1185G7 (194.66 ms avg). However, it is important to note that the i7-1185G7 is less powerful than the i7-11700 that you tested. My results are below:

i9-12900:

i7-1185G7:

On the other hand, we have informed the relevant team regarding your issue for further investigation and we'll get back to you once we receive any confirmation, thank you.

Regards,

Megat

Megat_Intel · ‎10-23-2023

Hi Aqil,

We apologize for the delay.

After the investigation with the relevant team, we believe the issue you observed is due to the model itself. We investigated on our end with models from Open Model Zoo and received the expected results. As mentioned before, on the Performance Benchmark there are some models that perform better on the 11th-gen CPU compared to the 12th-gen CPU due to the model network structure.

The reason why some models perform differently is due to the model optimization. Each model has multiple layers that are optimized differently with OpenVINO™ which may affect the inference performance on the device. For your information, we tested the benchmark app on two models, the license-plate-recognition-barrier-0001 and the Yolov7-tiny model and also tested with the new OpenVINO™ 2023.1.0 release as well. Here are the Average Latency results we got:

license-plate-recognition-barrier-0001

Hint: Latency

CPU \ OpenVINO version	2023.0.1	2023.1.0
i9-12900	0.67 ms	0.62 ms
i7-1165G7	1.11 ms	1.15 ms

Hint: None

CPU \ OpenVINO version	2023.0.1	2023.1.0
i9-12900	7.31 ms	7.29 ms
i7-1165G7	3.95 ms	4.01 ms

Yolov7-tiny

Hint: Latency

CPU \ OpenVINO version	2023.0.1	2023.1.0
i9-12900	18.91 ms	18.69 ms
i7-1165G7	49.62 ms	49.78 ms

Hint: None

CPU \ OpenVINO version	2023.0.1	2023.1.0
i9-12900	139.51 ms	127.80 ms
i7-1165G7	197.84 ms	198.05 ms
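To make the comparison easier to read, the 2023.1.0 figures above can be turned into speedup ratios. This is only a small sketch over the numbers already listed; the dictionary layout and the function name `i9_speedup` are illustrative:

```python
# Average latencies (ms) from the tables above, OpenVINO 2023.1.0 column.
LPR = "license-plate-recognition-barrier-0001"
YOLO = "Yolov7-tiny"

results = {
    (LPR, "latency"): {"i9-12900": 0.62, "i7-1165G7": 1.15},
    (LPR, "none"): {"i9-12900": 7.29, "i7-1165G7": 4.01},
    (YOLO, "latency"): {"i9-12900": 18.69, "i7-1165G7": 49.78},
    (YOLO, "none"): {"i9-12900": 127.80, "i7-1165G7": 198.05},
}


def i9_speedup(entry):
    """Return i7 latency / i9 latency; above 1 means the i9-12900 is faster."""
    return entry["i7-1165G7"] / entry["i9-12900"]


for (model, hint), entry in results.items():
    print(f"{model:40s} hint={hint:8s} i9 speedup: {i9_speedup(entry):.2f}x")
```

A ratio below 1 marks the one case where the i9-12900 is slower: license-plate-recognition-barrier-0001 with no hint.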

From our results, the Yolov7-tiny model performs as expected, with the i9-12900 CPU being faster than the i7-1165G7 CPU. For the license-plate-recognition-barrier-0001 model, on the other hand, the i9-12900 CPU was slower than the i7-1165G7 CPU when no hint was specified; when the latency hint is specified, the i9-12900 CPU is faster than the i7-1165G7 CPU.

On another note, we observed that the i9-12900 CPU performs faster with OpenVINO™ 2023.1.0 than with OpenVINO™ 2023.0.1. You can try installing the latest 2023.1.0 release to see if the performance improves. Hope this helps.
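The latency hint can be requested from Python when the model is compiled. A minimal sketch, assuming the standard `PERFORMANCE_HINT` property with the value `LATENCY`; the helper names `latency_config` and `compile_for_latency` are hypothetical, and whether the hint helps a custom model has to be measured:

```python
def latency_config(pcores_only=False):
    """Build a CPU config that asks OpenVINO to optimize for latency."""
    config = {"PERFORMANCE_HINT": "LATENCY"}
    if pcores_only:
        # Optional: also restrict scheduling to P-cores (assumed property name).
        config["SCHEDULING_CORE_TYPE"] = "PCORE_ONLY"
    return config


def compile_for_latency(model_path, device="CPU", pcores_only=False):
    """Read the model at `model_path` and compile it with the latency hint."""
    # Imported lazily so the config helper works without OpenVINO installed.
    from openvino.runtime import Core

    core = Core()
    model = core.read_model(model_path)
    return core.compile_model(model, device, latency_config(pcores_only))


if __name__ == "__main__":
    print(latency_config())
    print(latency_config(pcores_only=True))
```

Running the same benchmark once with and once without the hint, on the same machine, gives a like-for-like comparison similar to the tables above.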

Regards,

Megat

Megat_Intel · ‎10-30-2023

Hi Aqil,

Thank you for your question. This thread will no longer be monitored since we have provided suggestions. If you need any additional information from Intel, please submit a new question.

Regards,

Megat