Relationship between Infer Requests, Streams and Threads (benchmarking app)

Raza__Ghulam_Jilani · ‎02-04-2021

I am trying to understand the relationship between inference requests, streams and threads on CPU. I have a YOLOv3 model that I am benchmarking with the benchmark app, and here is my setup:

- OpenVINO 2021.1 running in a Docker container

- Core i7-9700K (8 cores, 8 threads)

- 16 GB RAM

Benchmark app results with different parameters are as follows:

mode: async

n_streams	n_threads	cpu cores at 100% usage	FPS
4	auto	8	11.2
4	1	4	6
4	2	4	6
4	3	4	6
4	4	4	6
4	7	4	6
4	8	8	11.2
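
For anyone who wants to reproduce the sweep in the table, here is a minimal sketch that builds one benchmark app command per row. It assumes the executable is reachable as `benchmark_app` and uses the `-nstreams`, `-nthreads`, `-nireq` and `-api` options; the model path `yolo_v3.xml` is a hypothetical placeholder, and the "auto" row is modeled by simply leaving `-nthreads` out.

```python
import shlex

def benchmark_cmd(model_path, n_streams, n_threads=None, n_ireq=None, api="async"):
    """Build the argument list for one benchmark run on CPU."""
    cmd = ["benchmark_app", "-m", model_path, "-d", "CPU",
           "-api", api, "-nstreams", str(n_streams)]
    if n_threads is not None:
        # Omitting -nthreads leaves the thread count to the tool (the "auto" row).
        cmd += ["-nthreads", str(n_threads)]
    if n_ireq is not None:
        # Number of infer requests; left to the tool's default when omitted.
        cmd += ["-nireq", str(n_ireq)]
    return cmd

MODEL = "yolo_v3.xml"  # hypothetical path, not taken from the post

# Same thread settings as the table: auto, 1, 2, 3, 4, 7, 8 (always 4 streams).
sweep = [benchmark_cmd(MODEL, 4, t) for t in (None, 1, 2, 3, 4, 7, 8)]

for cmd in sweep:
    print(" ".join(shlex.quote(part) for part in cmd))
```

Each list can be passed to `subprocess.run` as is; printing the commands first makes it easy to check what will actually be executed.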

From the above results, my questions are:

- Why does increasing the thread count from 1 to 7 make no difference in FPS, and why are only half of the cores at maximum usage until all of them climb to 100% at 8 threads? Is it OpenVINO's default behavior to use cores in powers of 2?

- Also, sync mode always uses 50% of the cores. Does this scale to any machine? For example, if a machine has 24 cores and 8 threads, will a single infer request drive 12 cores to 100% usage?

- Does 1 infer request mean 1 stream launched by OpenVINO, i.e. are they equivalent? The benchmark app has a separate option for infer requests, hence the confusion.

- How does OpenVINO map streams to threads, and how can I control this mapping in my app? For example, if I want to use 4 streams and restrict my OpenVINO app to 1 thread, how do I do that?
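
To make the last question concrete, here is a rough sketch of how the "4 streams, 1 thread" case might be requested from an application through the Inference Engine Python API. It assumes the CPU plugin config keys `CPU_THROUGHPUT_STREAMS` and `CPU_THREADS_NUM` and the `num_requests` argument of `IECore.load_network`; the helper names `cpu_config` and `load_on_cpu` are hypothetical. Whether the plugin honors a thread count lower than the stream count is exactly what the table above leaves open, so this only shows where the settings go.

```python
def cpu_config(n_streams, n_threads):
    """Return a CPU plugin config dict; the Inference Engine expects string values."""
    return {
        "CPU_THROUGHPUT_STREAMS": str(n_streams),
        "CPU_THREADS_NUM": str(n_threads),
    }

def load_on_cpu(model_xml, model_bin, n_streams, n_threads, n_requests):
    """Load a network on CPU with explicit stream, thread and infer request counts."""
    # Imported lazily so the config helper can be used without OpenVINO installed.
    from openvino.inference_engine import IECore

    ie = IECore()
    net = ie.read_network(model=model_xml, weights=model_bin)
    # Infer requests are set separately from streams via num_requests.
    return ie.load_network(network=net, device_name="CPU",
                           config=cpu_config(n_streams, n_threads),
                           num_requests=n_requests)

config = cpu_config(4, 1)
print(config)
```

A call such as `load_on_cpu("model.xml", "model.bin", 4, 1, 4)` (hypothetical file names) would then give an executable network whose `requests` can be started asynchronously.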

My goal here is to measure the performance of a model on some Intel machines with x cores and y threads and, by understanding these relationships, to be able to reliably estimate the performance of the model on an arbitrary Intel CPU with a cores and b threads.

I understand that it's a long post, but I would really appreciate some insights.

Thanks a lot.

Iffa_Intel · ‎02-07-2021

Greetings,

This article might answer all of your questions: https://www.edge-ai-vision.com/2020/03/maximize-cpu-inference-performance-with-improved-threads-and-memory-management-in-intel-distribution-of-openvino-toolkit/

Sincerely,

Iffa

Iffa_Intel · ‎02-18-2021

Greetings,

Intel will no longer monitor this thread since we have provided a solution. If you need any additional information from Intel, please submit a new question.

Sincerely,

Iffa
