I am trying to understand the relationship between inference requests, streams, and threads on CPU. I have a YOLOv3 model that I am benchmarking with the benchmark app, and here is my setup:
- OpenVINO 2021.1 running in a Docker container
- Core i7-9700K (8 cores, 8 threads)
- 16 GB RAM
Benchmark app results with different parameters are as follows:
| n_streams | n_threads | CPU cores at 100% usage | FPS |
From the above results, my questions are:
- Why does increasing threads from 1 to 7 make no difference in FPS, and why are only half of the cores at maximum usage until usage climbs to 100% at 8 threads? Is it OpenVINO's default behavior to use cores in powers of 2?
- Also, sync mode always uses 50% of the cores. Does this scale to any machine? For example, if a machine has 24 cores and 8 threads, will a single infer request drive 12 cores to 100% usage?
- Does 1 infer request mean 1 stream launched by OpenVINO? Are they equivalent? There is a separate option in the benchmark app for infer requests, hence the confusion.
- How does OpenVINO map streams to threads, and how can I control this mapping in my app? For example, if I want to use 4 streams and restrict my OpenVINO app to 1 thread, what is the way to do it?
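To make the last question concrete, here is a minimal sketch of what I am trying to express, assuming the CPU plugin config keys `CPU_THROUGHPUT_STREAMS` and `CPU_THREADS_NUM` and the 2021.1 Python `IECore` API. The helper names (`cpu_config`, `benchmark_cmd`, `load_on_cpu`) and the model path are hypothetical, and I am assuming `benchmark_app` is on the PATH. Whether the plugin actually honors 4 streams on 1 thread is exactly what I am unsure about.

```python
from typing import Dict, List


def cpu_config(n_streams: int, n_threads: int) -> Dict[str, str]:
    """Build a CPU plugin config dict (values are passed as strings)."""
    return {
        "CPU_THROUGHPUT_STREAMS": str(n_streams),
        "CPU_THREADS_NUM": str(n_threads),
    }


def benchmark_cmd(model_xml: str, n_streams: int, n_threads: int,
                  n_requests: int) -> List[str]:
    """Build the equivalent benchmark_app command line for one run."""
    return [
        "benchmark_app",
        "-m", model_xml,
        "-d", "CPU",
        "-api", "async",
        "-nstreams", str(n_streams),
        "-nthreads", str(n_threads),
        "-nireq", str(n_requests),
    ]


def load_on_cpu(model_xml: str, model_bin: str, n_streams: int,
                n_threads: int, n_requests: int):
    """Load a network on CPU with explicit stream/thread/request counts.

    The import is done lazily so the helpers above can be used on a
    machine where OpenVINO is not installed.
    """
    from openvino.inference_engine import IECore

    ie = IECore()
    net = ie.read_network(model=model_xml, weights=model_bin)
    return ie.load_network(
        network=net,
        device_name="CPU",
        config=cpu_config(n_streams, n_threads),
        num_requests=n_requests,
    )
```

For the case in my question, `" ".join(benchmark_cmd("yolo_v3.xml", 4, 1, 4))` gives `benchmark_app -m yolo_v3.xml -d CPU -api async -nstreams 4 -nthreads 1 -nireq 4`, and `load_on_cpu("yolo_v3.xml", "yolo_v3.bin", 4, 1, 4)` would be the in-app counterpart.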
My goal is to measure the performance of a model on some Intel machines with x cores and y threads and, by understanding these relationships, to reliably estimate the performance of the model on an arbitrary Intel CPU with a cores and b threads.
I understand that it's a long post, but I would really appreciate some insights.
Thanks a lot.