The optimum number of inference requests for multiple DNN models on multiple NCS sticks

Chow__Ka-Ho · ‎09-14-2019

Hello,

I am trying to run multiple DNN models on multiple NCS2 sticks. According to the guidelines in [link] (Multiple NCS Devices), I should:

Initialize only one IEPlugin.
Create one ExecutableNetwork for each device.

Suppose I have two NCS sticks and three DNN models. Then I have to create two threads (conceptually, one for each stick) and initialize three models in each thread (i.e., six ExecutableNetworks in total).

The above works well, but I have a question about setting the number of inference requests. The link suggests that the optimum is four inference requests for each ExecutableNetwork. In my case, should I set four requests in each of the six ExecutableNetworks I have created? Or does the optimum number refer to each NCS stick, meaning four requests per thread, so that I need to schedule the requests such that the three ExecutableNetworks in the same thread share four inference request slots?
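To make the second option concrete, here is a minimal sketch of three models on one stick sharing four request slots. It is plain Python and does not call the Inference Engine. The names StickScheduler, run_stick, SLOTS_PER_STICK and the model names are hypothetical, and the sleep call is only a stand-in for a real inference request.

```python
import threading
import time

SLOTS_PER_STICK = 4  # assumption: the suggested four requests apply per stick
MODELS = ["model_a", "model_b", "model_c"]  # hypothetical model names


class StickScheduler:
    """Caps the number of in-flight requests on one stick across all its models."""

    def __init__(self, slots):
        self._slots = threading.Semaphore(slots)
        self._lock = threading.Lock()
        self.in_flight = 0
        self.peak = 0

    def infer(self, model, frame):
        with self._slots:  # blocks while every slot on this stick is busy
            with self._lock:
                self.in_flight += 1
                self.peak = max(self.peak, self.in_flight)
            try:
                time.sleep(0.005)  # stand-in for the real inference call
                return (model, frame)
            finally:
                with self._lock:
                    self.in_flight -= 1


def run_stick(scheduler, frames_per_model=5):
    """Submit work for every model on one stick and collect the results."""
    results = []
    results_lock = threading.Lock()

    def worker(model, frame):
        out = scheduler.infer(model, frame)
        with results_lock:
            results.append(out)

    threads = [threading.Thread(target=worker, args=(m, f))
               for m in MODELS for f in range(frames_per_model)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    stick = StickScheduler(SLOTS_PER_STICK)
    done = run_stick(stick)
    print(len(done), "inferences finished; peak in flight:", stick.peak)
```

With two sticks, one such scheduler per stick would be needed. The first option (four requests in each of the six ExecutableNetworks) would instead allow up to twelve requests in flight per stick.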

Sorry for the long text. Thank you so much =)

Shubha_R_Intel · ‎09-19-2019

Dear Chow, Ka-Ho,

May I suggest simplifying down to one NCS stick and one model (single thread) for the time being and running experiments with the benchmark app? The benchmark app lets you tweak the number of iterations, the number of infer requests, the async vs. sync API, and the batch size. It seems to me that if you can see how one device behaves, you will be able to extrapolate from that data to your multi-thread/multi-model design.
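As a rough sketch of such an experiment, the snippet below only builds the command lines for a sweep over the number of infer requests; it does not run them. The executable name benchmark_app, the flags (-m, -d, -niter, -b, -api, -nireq) and the model path model.xml are assumptions here, so please check them against the help output of the benchmark app shipped with your OpenVINO version.

```python
import shlex


def benchmark_commands(model_xml, device="MYRIAD", nireq_values=(1, 2, 4, 8),
                       niter=200, batch=1, app="benchmark_app"):
    """Build one sync baseline plus one async command per infer-request count.

    The executable name and flag spellings are assumptions; verify them
    against the help output of your installed benchmark app.
    """
    base = [app, "-m", model_xml, "-d", device,
            "-niter", str(niter), "-b", str(batch)]
    commands = [base + ["-api", "sync"]]  # synchronous baseline
    for nireq in nireq_values:
        commands.append(base + ["-api", "async", "-nireq", str(nireq)])
    return commands


if __name__ == "__main__":
    # "model.xml" is a placeholder path for the IR file of the model under test.
    for cmd in benchmark_commands("model.xml"):
        print(" ".join(shlex.quote(part) for part in cmd))
```

Comparing the throughput reported for each run on a single stick should show where adding more infer requests stops helping for a given model.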

There is no black-and-white answer. The optimum really depends on many factors, most importantly the DNN model you are using.

Hope it helps,

Shubha