As I understand it, the potential gains with the Multi-Device plugin are:
Improved throughput from using multiple devices (compared to single-device execution)
More consistent performance, since the devices share the inference burden (if one device is too busy, another can take more of the load)
But when I tried the Benchmark App in DevCloud for the Edge, I found that the MULTI plugin always has the lowest throughput compared to CPU and iGPU!
Maybe there's something I don't understand well. Can you elaborate on why that is happening, and how to get the most out of the MULTI plugin to achieve the highest throughput?
Thanks for sharing your concise findings with us.
I tried running the Benchmark App with an FP16 model on my local machine (Intel Core i7-8665U, Intel UHD Graphics 620). Based on my results, GPU always has the best performance, followed by MULTI:CPU,GPU and then CPU.
Besides, I also noticed that the documentation mentions that while accelerators combine really well with Multi-Device, CPU+GPU execution poses some performance caveats, as these devices share power, bandwidth and other resources. Hence, please try running the Benchmark App again with an FP16 model on MULTI:HDDL,CPU or MULTI:HDDL,GPU and share the results with us.
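To make the comparison easier to repeat, here is a minimal sketch of how the runs could be scripted. It only builds the `benchmark_app` command line for each device string mentioned in this thread and pulls the throughput figure out of the tool's output. The model path `model.xml`, the 30-second duration, the helper names, and the assumption that the output contains a line of the form `Throughput: <number> FPS` are all illustrative, so please adjust them to your setup and OpenVINO version.

```python
import re
import shlex

# Device strings discussed in this thread; HDDL entries only work if that hardware is present.
DEVICES = ["CPU", "GPU", "MULTI:CPU,GPU", "MULTI:HDDL,CPU", "MULTI:HDDL,GPU"]

# Assumed format of the throughput line printed by benchmark_app.
_THROUGHPUT_RE = re.compile(r"Throughput:\s*([0-9]+(?:\.[0-9]+)?)\s*FPS")


def build_benchmark_cmd(model_path, device, seconds=30):
    """Return the benchmark_app argument list for one device (hypothetical helper)."""
    return [
        "benchmark_app",
        "-m", model_path,      # path to the FP16 model (placeholder)
        "-d", device,          # target device, e.g. MULTI:CPU,GPU
        "-api", "async",       # asynchronous API, which is what MULTI relies on
        "-t", str(seconds),    # run duration in seconds
    ]


def parse_throughput(output):
    """Extract the FPS value from benchmark_app output, or None if it is missing."""
    match = _THROUGHPUT_RE.search(output)
    return float(match.group(1)) if match else None


if __name__ == "__main__":
    # Print the commands instead of running them, so this works without OpenVINO installed.
    for device in DEVICES:
        print(shlex.join(build_benchmark_cmd("model.xml", device)))
```

Each printed command can be run by hand, or passed to `subprocess.run(..., capture_output=True, text=True)` and its `stdout` fed to `parse_throughput`, so that every device is measured with the same model and the same duration.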
Thank you for your question. If you need any additional information from Intel, please submit a new question as this thread is no longer being monitored.