I'm trying to run inference on a video using a custom Python app based on the action recognition sample, with a custom model. However, CPU usage is always below 50% on each core when the CPU device is selected, so I suspect I'm not getting the performance I should. The report produced by the app shows these values:
- Data total: 10.06ms (+/-: 1.54) 99.41fps
- Data own: 10.04ms (+/-: 1.55) 99.59fps
- Data-Model total: 0.68ms (+/-: 0.12) 1471.27fps
- Data-Model own: 0.67ms (+/-: 0.12) 1487.45fps
- Model total: 0.63ms (+/-: 0.24) 1581.32fps
- Model own: 0.24ms (+/-: 0.23) 4160.17fps
- Render total: 21.93ms (+/-: 1.89) 45.60fps
- Render own: 21.92ms (+/-: 1.89) 45.62fps
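For reference, the fps column in this report is just 1000 divided by the mean time in ms, so the stages can be compared directly. Below is a minimal sketch in plain Python that uses only the "total" rows copied from the report. Treating the four rows as sequential pipeline stages and summing them is an assumption made for illustration, since the app may overlap them.

```python
# Mean per-frame times in ms, copied from the "total" rows of the report.
stage_ms = {
    "Data": 10.06,
    "Data-Model": 0.68,
    "Model": 0.63,
    "Render": 21.93,
}


def fps(ms):
    """Convert a mean per-frame time in milliseconds to frames per second."""
    return 1000.0 / ms


# Assumption: stages are summed as if they ran one after another.
total_ms = sum(stage_ms.values())
slowest = max(stage_ms, key=stage_ms.get)
model_share = stage_ms["Model"] / total_ms

for name, ms in stage_ms.items():
    share = ms / total_ms
    print(f"{name:<10} {ms:6.2f} ms  {fps(ms):8.1f} fps  {share:6.1%} of summed time")
print(f"Slowest stage: {slowest}")
```

Read this way, Render is the slowest stage and the Model stage is only a small fraction of the summed time, which may be part of why the cores are not kept busy by inference alone.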
My specs are:
- CPU: Xeon(R) CPU E3-1225 v3 @ 3.20GHz
- RAM: 32GB
- OS: Ubuntu 18.04
- OpenVINO version: 2020.2.210
So, do you have any idea what is happening, or which aspects could be optimized? I've attached the app and the model I used.
Thanks for reading!
Hi Ram Prasad,
Thank you for your quick response!
I've followed your advice, using VTune to measure my CPU usage and the export command. Nonetheless, the performance I got is not good in terms of CPU usage (24.2%) and Memory Bound (100%: 63.5 Cache Bound and 15.1 DRAM Bound). This time I used another Python script based on the benchmark app available in the toolkit (attached below). I've also attached the VTune profile report to this post.
The purpose of this testing is to measure the improvement in inference time with this toolkit compared to TensorFlow. As far as I know, the tool TensorFlow provides for inference is called TensorFlow Serving. Therefore, I would like to know how you measure performance with that tool (or whatever other approach you use), so that I can see the improvement of OpenVINO over it.
PS: I'm using the same model that I attached before.
Thanks in advance.
When running async inference in the OpenVINO toolkit, the targeted performance value is the actual throughput (as opposed to latency in sync mode), that is, the number of inferences delivered per second (e.g. FPS). There are no benchmarking results available that express performance simply as % of CPU usage.
If you are targeting throughput, we recommend trying the Benchmark C++ Tool or the Benchmark Python Tool in async mode with your custom model. With the CPU plugin, the tool should optimize the number of parallel and queued inferences for the CPU device based on the number of CPU cores. However, you can tune different parameters (e.g. -nstreams) to find the best setup for you; please see more details in Performance Topics.
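To illustrate the difference between latency in sync mode and throughput in async mode, here is a minimal sketch that uses only the Python standard library. It is not OpenVINO code: `fake_infer` is a hypothetical stand-in for one inference request (it only sleeps), the 20 ms latency is an assumed value, and the thread pool size plays the role of the number of requests kept in flight.

```python
import time
from concurrent.futures import ThreadPoolExecutor

LATENCY_S = 0.02  # assumed per-request latency, purely illustrative
N_FRAMES = 40


def fake_infer(frame_id):
    """Hypothetical stand-in for one inference request; it only sleeps."""
    time.sleep(LATENCY_S)
    return frame_id


def run_sync(n_frames):
    """One request at a time: throughput is roughly 1 / latency."""
    start = time.perf_counter()
    for i in range(n_frames):
        fake_infer(i)
    return n_frames / (time.perf_counter() - start)


def run_async(n_frames, n_requests):
    """Keep up to n_requests in flight: per-request latency is unchanged,
    but more requests complete per second."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=n_requests) as pool:
        results = list(pool.map(fake_infer, range(n_frames)))
    assert results == list(range(n_frames))  # results come back in order
    return n_frames / (time.perf_counter() - start)


sync_fps = run_sync(N_FRAMES)
async_fps = run_async(N_FRAMES, n_requests=4)
print(f"sync:  {sync_fps:.1f} fps")
print(f"async: {async_fps:.1f} fps")
```

Because the stand-in only sleeps, the speed-up here scales with the number of requests. Real inference is compute-bound, so the gain on a CPU is limited by the available cores, which is why it is worth experimenting with options such as -nstreams in the benchmark tool rather than relying on a fixed value.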
You can check OpenVINO inference results for various performance metrics (throughput, value, efficiency, total benefit) across different DL models and devices here: https://docs.openvinotoolkit.org/latest/_docs_performance_benchmarks.html
Also, you can find an example comparing OpenVINO throughput performance with third-party products in this article: https://www.intel.com/content/www/us/en/artificial-intelligence/posts/openvino-model-server-boosts-a...
Best regards, Max.