Auto versus Multi Mode
While loading models, one can select between AUTO and MULTI mode.
MULTI mode enables the engine to run on different device types in parallel.
Is it possible to specify which device types should be preferred by the engine?
For example, if we have a Movidius extension card, can we specify the CPU and the Movidius cores and tell the engine to mainly use the Movidius cores, and only use the CPU when the extension card is 100% busy?
Hi Siegfried,
Thanks for reaching out to us.
Firstly, you are right that the Multi-Device plugin automatically assigns inference requests to available computational devices to execute the requests in parallel.
For the Auto-Device plugin, OpenVINO Runtime provides an automatic device selection feature. AUTO discovers the accelerators available in your system and considers them in the priority order you set. AUTO always chooses the best device, and if compiling the model fails on that device, AUTO tries to compile it on the next best device in your priority order until one of them succeeds.
You can refer to the Auto-Device Plugin Execution documentation to set up the Auto-Device plugin.
Apart from that, heterogeneous execution enables executing inference of one model on several devices. You can set the device priorities for these automatic modes using the ov::device::priorities property (MULTI_DEVICE_PRIORITIES).
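To make this concrete, here is a minimal C++ sketch of both modes (a sketch only, using the OpenVINO 2.0 API; the model path "model.xml" is a placeholder, and MYRIAD assumes a Movidius device such as an NCS2 is installed):

```cpp
#include <openvino/openvino.hpp>

int main() {
    ov::Core core;

    // AUTO: selects one device from the candidate list (here MYRIAD first,
    // then CPU) and falls back to the next candidate if compilation fails.
    auto compiled_auto = core.compile_model(
        "model.xml", "AUTO",
        ov::device::priorities("MYRIAD,CPU"));

    // MULTI: exposes the listed devices as one virtual device and runs
    // inference requests on all of them in parallel.
    auto compiled_multi = core.compile_model(
        "model.xml", "MULTI",
        ov::device::priorities("MYRIAD,CPU"));

    // MULTI benefits from several asynchronous requests in flight so that
    // each physical device can be kept busy.
    auto req_a = compiled_multi.create_infer_request();
    auto req_b = compiled_multi.create_infer_request();
    req_a.start_async();
    req_b.start_async();
    req_a.wait();
    req_b.wait();

    return 0;
}
```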
Regards,
Aznie
OK... So Multi-Device makes it possible to use the CPU and MYRIAD at the same time.
- When I start an inference, will it run on the CPU or the MYRIAD device?
- When I start some more inferences, will they run on the CPU or the MYRIAD device? How is that decision made for each inference request?
- How are the devices prioritized? And is it possible to change the priority in some way?
- We would like to do almost all inferences on the MYRIAD device; only if the MYRIAD device is busy should the CPU be used. Is something like this possible?
Hi Siegfried,
The purposes of executing networks in heterogeneous mode are as follows:
- To utilize accelerator power by calculating the heaviest parts of the network on the accelerator and executing unsupported layers on fallback devices like the CPU
- To utilize all available hardware more efficiently during one inference
The Heterogeneous plugin enables automatic splitting of inference between several Intel devices when a device does not support certain layers; those layers fall back to the CPU.
In this mode, a single inference can be split into multiple parts handled by separate devices. Please refer to the "Annotation of Layers per Device and Default Fallback Policy", "Details of Splitting Network and Execution", and "Execution Precision" sections of the documentation.
Regards,
Aznie
Hi,
I did have heterogeneous execution in mind, but this would always use the CPU and MYRIAD at the same time, because the network is split across those two devices.
I would like a behaviour where only MYRIAD is used in the "normal" scenario, but when there are more pending inferences than MYRIAD can handle in time, the CPU should help out.
From your answers I assume this is something that does not work with OpenVINO, at least not without building something on top of it.
Best regards,
Siegfried
Hi Siegfried,
Yes, you are right. Heterogeneous execution cannot achieve this, since it splits the work purely based on which layers each device supports. There is no feature available in OpenVINO that will pass pending inference traffic to another device.
Regards,
Aznie
Hi Siegfried,
This thread will no longer be monitored since this issue has been resolved. If you need any additional information from Intel, please submit a new question.
Regards,
Aznie
