I have a C++ program that detects cars and pedestrians in a video stream using optimized models (xml + bin). On a 2-core i3 I get 14 fps; on a Xeon Gold 6132 I get 17 fps. Why is the difference between these CPUs so small? Second case: I run multiple copies of the program on the Xeon Gold 6132 with 4 threads: 1 copy gives 14 fps, 2 copies give 11 fps, 3 copies give 8 fps. With 20 threads I get: 1 copy 17 fps, 2 copies 11 fps, 3 copies 7 fps. The average load on every used core stays below 100% (~60-80%). Why is the gain so small even when I use 5x more threads? What causes these problems, and how can I solve them?
Dear Boris,
Please read the following blog post. I think it will help you. It covers many performance topics, including "Throughput Mode". Long story short: try to match the number of infer requests in flight to the number of available physical CPU cores.
Also kindly take a look at the following document:
https://docs.openvinotoolkit.org/latest/_docs_IE_DG_Intro_to_Performance.html
Thanks,
Shubha
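
The advice above about keeping a fixed number of requests in flight can be illustrated with a minimal sketch in plain standard C++. It does not use the Inference Engine API: `infer_frame` and `run_pipeline` are hypothetical names, and `infer_frame` is only a placeholder for a real inference call. The assumption is that `max_in_flight` is set by the caller to the number of physical cores (note that `std::thread::hardware_concurrency()` typically reports logical cores, so it may overcount on hyper-threaded CPUs).

```cpp
#include <cstddef>
#include <deque>
#include <future>
#include <vector>

// Hypothetical stand-in for one inference call; a real application
// would run the detection model on the frame here.
int infer_frame(int frame_id) {
    return frame_id * 2;  // placeholder "result"
}

// Process all frames while keeping at most `max_in_flight` requests
// running at once. Results are collected in submission order.
std::vector<int> run_pipeline(const std::vector<int>& frames,
                              std::size_t max_in_flight) {
    if (max_in_flight == 0) max_in_flight = 1;  // always allow one request

    std::deque<std::future<int>> in_flight;
    std::vector<int> results;
    results.reserve(frames.size());

    for (int frame : frames) {
        // When the window is full, wait for the oldest request first.
        if (in_flight.size() == max_in_flight) {
            results.push_back(in_flight.front().get());
            in_flight.pop_front();
        }
        in_flight.push_back(
            std::async(std::launch::async, infer_frame, frame));
    }

    // Drain the requests that are still running.
    while (!in_flight.empty()) {
        results.push_back(in_flight.front().get());
        in_flight.pop_front();
    }
    return results;
}
```

Calling `run_pipeline(frames, 4)` would keep up to four placeholder requests running concurrently. When several copies of the program share one machine, the same idea suggests dividing the physical cores between the copies rather than giving each copy the full thread count, although the best split has to be measured.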
