Is it legal to use multiple ExecutableNetwork instances from different application threads (std::thread)? In my case I want to load two models (with different topologies) on two completely separate threads. Is there a problem with calling CreateInferRequestPtr() and Infer() without any synchronization to prevent interleaving calls?
Is it required to use StartAsync() rather than Infer() in this case?
If it matters, I am using the CPU plugin, but would behavior be the same with other plugins?
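For concreteness, the pattern I have in mind looks roughly like the sketch below. The Inference Engine calls are left as comments because this is only an outline of the threading structure, not the API details; the model paths and the run_model/run_both helpers are placeholders of mine. The point is that each thread owns its own Core, ExecutableNetwork, and InferRequest, and shares nothing with the other thread.

```cpp
#include <string>
#include <thread>

// Outline only: the real Inference Engine calls are shown as comments,
// since the question is about the threading structure, not the API.
// Each thread owns every object it touches and shares nothing.
std::string run_model(const std::string& model_path) {
    // InferenceEngine::Core ie;                          // per-thread Core
    // auto network    = ie.ReadNetwork(model_path);
    // auto executable = ie.LoadNetwork(network, "CPU");
    // auto request    = executable.CreateInferRequestPtr();
    // ... fill input blobs, then request->Infer(); ...
    return "done: " + model_path;                         // placeholder result
}

// Launch the two independent pipelines and join them.
std::string run_both() {
    std::string r1, r2;
    std::thread t1([&] { r1 = run_model("model_a.xml"); });
    std::thread t2([&] { r2 = run_model("model_b.xml"); });
    t1.join();
    t2.join();
    return r1 + " | " + r2;
}
```

Both threads only ever touch their own objects, so the question reduces to whether the library itself is safe to enter concurrently from two threads this way.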
By default the Inference Engine uses Intel TBB as its parallel engine, so all OpenVINO-internal threading (including CPU inference) draws from the same thread pool provided by TBB. But there are also other threads in your application, so oversubscription is possible at the application level.
If your application simultaneously performs inference of multiple models on the same CPU, make sure you do not oversubscribe the machine. See Performance Aspects of Running Multiple Requests Simultaneously for more information.
Try the Benchmark App sample and play with the number of streams running in parallel. A rule of thumb is to try stream counts up to the number of CPU cores on your machine. For example, on an 8-core CPU, compare -nstreams 1 (the legacy, latency-oriented scenario) with 2, 4, and 8 streams.
On the CPU there are several thread-binding options; see the CPU plugin configuration options.
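As a sketch of what those options look like in code, the helper below just builds the configuration map that would be passed to Core::LoadNetwork(network, "CPU", config). The key names (CPU_THROUGHPUT_STREAMS, CPU_BIND_THREAD) are the documented CPU plugin keys; the make_cpu_config helper itself is only illustrative.

```cpp
#include <map>
#include <string>

// Illustrative helper (not part of the Inference Engine API): builds a
// CPU plugin configuration map. nstreams mirrors benchmark_app's -nstreams;
// pinning threads to cores usually helps throughput on a dedicated machine,
// but can hurt when other applications compete for the same cores.
std::map<std::string, std::string> make_cpu_config(int nstreams, bool pin_threads) {
    return {
        {"CPU_THROUGHPUT_STREAMS", std::to_string(nstreams)},
        {"CPU_BIND_THREAD", pin_threads ? "YES" : "NO"},
    };
}
// Intended usage (requires the Inference Engine, so not compiled here):
//   auto executable = ie.LoadNetwork(network, "CPU", make_cpu_config(4, true));
```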
For more information, refer to the Optimization Guide.
I've encountered random segfaults and application terminations when networks were loaded and inferred in different threads (starting from the 2020.1 release and the v10 model format). I've gathered the data needed to reproduce this behaviour and made a code sample that demonstrates the problem (it can be found in a separate forum topic: https://software.intel.com/en-us/forums/intel-distribution-of-openvino-toolkit/topic/856371 )
Could you please clarify whether this is a bug, or whether it is not legal to load/infer networks in multiple application threads?