The current architecture assumes the application executes on a CPU host, which then forwards execution to the chosen device via the corresponding plugin. So it is not possible to delegate CPU execution to some other process or host at runtime (unless you write your own plugin). When you start your application, it executes on the CPU: the network is read, then you load the network into the chosen plugin, create infer requests there, and then start inference.
Of course, you can build your own application and put it on a remote machine together with the required libraries (the Inference Engine libraries and plugins), and then start that app remotely from your local laptop over TCP/IP networking. But I don't think that is quite what you were asking...
Hope it helps!
Thank you very much for the information! One more short question:
What if there is a process with several threads, and each thread wants to run inference using the same network? For memory efficiency it would be preferable if the network were loaded/instantiated only once. Is there some support (synchronization, queuing, ...?) so that multiple threads of the same process can use a single network/plugin?
Please read the documentation here and see if it addresses your needs (I think it will). Search for StartAsync.
Also have a look at our Async samples under \inference_engine\samples. R5 ships at least three different Async examples you can look at.
Hope it helps,