Only 50% CPU usage during the async inference

Fernandez__Adrian · 05-10-2020

Hello,

I'm trying to run inference on a video using a custom Python app based on the action recognition sample, with a custom model. However, CPU usage is always below 50% on each core when the CPU device is selected, which suggests the hardware is not being fully used. The report produced by the app shows these values:

Data total: 10.06ms (+/-: 1.54) 99.41fps
Data own: 10.04ms (+/-: 1.55) 99.59fps
Data-Model total: 0.68ms (+/-: 0.12) 1471.27fps
Data-Model own: 0.67ms (+/-: 0.12) 1487.45fps
Model total: 0.63ms (+/-: 0.24) 1581.32fps
Model own: 0.24ms (+/-: 0.23) 4160.17fps
Render total: 21.93ms (+/-: 1.89) 45.60fps
Render own: 21.92ms (+/-: 1.89) 45.62fps

My specs are:

CPU: Xeon(R) CPU E3-1225 v3 @ 3.20GHz
RAM: 32GB
OS: Ubuntu 18.04
OpenVINO version: 2020.2.210

So, if you have any idea about what is happening, or see any aspect that can be optimized, please let me know. I've also attached the app and the model used.

Thanks for reading!

SIRIGIRI_V_Intel · 05-15-2020

Hi Adrian,

How do you measure the CPU usage?

For a better understanding of the CPU usage, you may use Intel VTune Profiler.

Regards,

Ram prasad

Fernandez__Adrian · 05-21-2020

Hi Ram Prasad,

Thank you for your quick response!

I've followed your advice and used VTune, together with the export command, to measure my CPU usage. Nonetheless, the results are not good in terms of CPU usage (24.2%) and Memory Bound (100% - 63.5 Cache Bound & 15.1 DRAM Bound). This time, I used another Python script based on the benchmark app available in the toolkit (attached below). I've also attached the VTune profile report to this post.

The purpose of this test is to measure the improvement in inference time with this toolkit compared to TensorFlow. As far as I know, the tool that TensorFlow provides for inference is called TensorFlow Serving. Therefore, I would like to know how you measure performance with that tool (or with whatever other approach you use), so that I can see the improvement of OpenVINO over it.

PS: I'm using the same model that I attached before.

Thanks in advance.

Regards, Adrian.

Max_L_Intel · 06-11-2020

Hi Adrian.

When running async inference in the OpenVINO toolkit, the target performance metric is throughput (as opposed to latency in sync mode), that is, the number of inferences delivered per unit of time (e.g. FPS). There are no benchmarking results available that express performance simply as % of CPU usage.
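To make the latency/throughput distinction concrete, here is a minimal Python sketch. The function names and the numbers are illustrative assumptions only; they are not taken from the attached app or from any OpenVINO API.

```python
def latency_fps(latency_ms):
    """FPS when inferences run strictly one after another (sync mode)."""
    return 1000.0 / latency_ms


def throughput_fps(completed_inferences, elapsed_seconds):
    """FPS measured over a whole run, however many requests overlapped."""
    return completed_inferences / elapsed_seconds


# Illustrative numbers (assumed, not measured):
# one request takes 20 ms, so sync mode delivers at most 50 FPS.
print(latency_fps(20.0))

# If an async run completes 200 inferences in 1 second of wall time,
# the throughput is 200 FPS even though each request still took ~20 ms.
print(throughput_fps(200, 1.0))
```

In other words, async mode is judged by how many inferences finish per second of wall time, not by how long a single request takes or by how busy each core looks.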

If you are targeting throughput, we recommend trying the Benchmark C++ Tool or the Benchmark Python Tool in async mode with your custom model. The CPU plugin should optimize the number of parallel and queued inferences for the CPU device based on the number of CPU cores. However, you can adjust different parameters (e.g. -nstreams) to find the best configuration for you. Please see more details in Performance Topics.
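As a rough sketch of how such a parameter sweep could be scripted, the Python helper below only builds the command lines; it does not run them. The model path `model.xml`, the script name `benchmark_app.py` in the current directory, the stream counts and the 30-second duration are all assumptions for illustration, so adjust them to your installation. The flags used (`-m`, `-d`, `-api`, `-nstreams`, `-t`) are the Benchmark Tool's command-line options.

```python
import shlex


def benchmark_commands(model_path, stream_counts, device="CPU", seconds=30):
    """Return one benchmark_app command line per -nstreams value."""
    commands = []
    for n in stream_counts:
        args = [
            "python3", "benchmark_app.py",  # assumed script location
            "-m", model_path,               # path to the IR model (.xml)
            "-d", device,                   # target device
            "-api", "async",                # async (throughput) mode
            "-nstreams", str(n),            # number of execution streams
            "-t", str(seconds),             # run duration in seconds
        ]
        commands.append(" ".join(shlex.quote(a) for a in args))
    return commands


# "model.xml" is a hypothetical placeholder for your custom model.
for cmd in benchmark_commands("model.xml", [1, 2, 4]):
    print(cmd)
```

Running each printed command and comparing the reported throughput (FPS) should show which -nstreams value suits your CPU best.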

You can check OpenVINO inference results for various performance metrics (throughput, value, efficiency, total benefit) across different DL models and devices here: https://docs.openvinotoolkit.org/latest/_docs_performance_benchmarks.html

Also, you can find a comparison of OpenVINO throughput performance with third-party products in this article: https://www.intel.com/content/www/us/en/artificial-intelligence/posts/openvino-model-server-boosts-ai-inference-operations.html

Thanks.
Best regards, Max.