Hi,
I need to optimise my OpenVINO object detection code. I found that it is possible to run multiple instances of the OpenVINO toolkit, one per processor core. Could you please brief me on what I should do for that? Any sample code would be helpful, and a quick reply would be much appreciated.
regards,
Gina
Hi Gina,
What inference device are you targeting: CPU, GPU, or something else? Which OS, Linux or Windows, and are you using the Python or the C++ API? Actually, it does not matter too much: provided you run your operations asynchronously and feed the inference device fast enough, once you reach over 90% device load that is all you can get.
The best examples of how to optimize inference speed are the async samples (object_detection_demo_ssd_async, object_detection_demo_yolov3_async). Once you are fully async, I see no point in trying to operate at the core level. The SDK and plug-ins take care of this in an optimal way, e.g. MKL-DNN will ensure all CPU cores are utilized and clDNN will make sure all GPU EUs are fully loaded, provided you push frames fast enough.
Are the async samples helpful? Try the -pc parameter to get an idea of the per-layer inference cost; you can then estimate the best speed you can get. In code, that corresponds to:
plugin.SetConfig({ { PluginConfigParams::KEY_PERF_COUNT, PluginConfigParams::YES } });
...
printPerformanceCounts(*async_infer_request_curr, std::cout);
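If you end up on the Python API, a roughly equivalent sketch looks like the following (this assumes the 2019-era IEPlugin/IENetwork Python bindings; the model paths and the dummy input are placeholders, not from the demos):

```python
# Sketch: enable per-layer performance counters from Python,
# mirroring the C++ KEY_PERF_COUNT snippet above.
import numpy as np
from openvino.inference_engine import IENetwork, IEPlugin

plugin = IEPlugin(device="CPU")
plugin.set_config({"PERF_COUNT": "YES"})            # Python counterpart of KEY_PERF_COUNT=YES

net = IENetwork(model="model.xml", weights="model.bin")   # placeholder paths
exec_net = plugin.load(network=net, num_requests=1)

input_blob = next(iter(net.inputs))
dummy = np.zeros(net.inputs[input_blob].shape, dtype=np.float32)
exec_net.requests[0].infer({input_blob: dummy})     # run once so the counters are populated

# Per-layer timing, similar to printPerformanceCounts() in the C++ demos
for layer, stats in exec_net.requests[0].get_perf_counts().items():
    print(layer, stats)
```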
nikos
Hi nikos,
I need to run it on the CPU. OS: Linux, and I am using the Python Inference Engine API.
Yes, that should be possible too. I don't use Python much for performance work, but async is supported from Python as well. Please see the relevant sections in the documentation, and also refer to the good async Python samples posted in this forum; I think there are some by another member ( https://github.com/PINTO0309 ). I am not sure whether OpenVINO ships with async Python samples of its own.
async_infer(inputs=None)
Description: Starts asynchronous inference of the infer request and fills the outputs array.
Parameters: inputs - a dictionary with input layer names as keys and numpy.ndarray objects of proper shape with input data for the layer as values.
Return value: None
Usage example:
>>> exec_net = plugin.load(network=net, num_requests=2)
>>> exec_net.requests[0].async_infer({input_blob: image})
>>> exec_net.requests[0].wait()
>>> res = exec_net.requests[0].outputs['prob']
>>> np.flip(np.sort(np.squeeze(res)), 0)
array([4.85416055e-01, 1.70385033e-01, 1.21873841e-01, 1.18894853e-01,
       5.45198545e-02, 2.44456064e-02, 5.41366823e-03, 3.42589128e-03,
       2.26027006e-03, 2.12283316e-03 ...])
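To actually overlap frame grabbing and pre-processing with inference, the usual pattern (as in object_detection_demo_ssd_async) is to keep two requests in flight and ping-pong between them. A rough sketch, assuming the same IEPlugin-style API as the excerpt above; the model and video paths and the post-processing are placeholders:

```python
# Sketch of the async "ping-pong" pattern: while one request runs,
# the next frame is read, pre-processed, and submitted.
import cv2
import numpy as np
from openvino.inference_engine import IENetwork, IEPlugin

plugin = IEPlugin(device="CPU")
net = IENetwork(model="model.xml", weights="model.bin")    # placeholder paths
input_blob = next(iter(net.inputs))
out_blob = next(iter(net.outputs))
n, c, h, w = net.inputs[input_blob].shape

exec_net = plugin.load(network=net, num_requests=2)        # two requests in flight
cur_id, next_id = 0, 1

cap = cv2.VideoCapture("input.mp4")                        # placeholder source
ret, frame = cap.read()
while ret:
    ret, next_frame = cap.read()

    # submit the current frame without blocking
    blob = cv2.resize(frame, (w, h)).transpose((2, 0, 1)).reshape((n, c, h, w))
    exec_net.start_async(request_id=next_id, inputs={input_blob: blob})

    # collect the result of the request started on the previous iteration
    if exec_net.requests[cur_id].wait(-1) == 0:
        detections = exec_net.requests[cur_id].outputs[out_blob]
        # ... draw boxes / post-process here ...

    cur_id, next_id = next_id, cur_id
    frame = next_frame
```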
Gina, if you have a look at the samples provided with OpenVINO 2019 R3, there is a C++ and a Python benchmark application supplied. It detects the number of CPU cores and optimizes the number of parallel and queued inferences for each specific device, e.g. CPU, GPU, Myriad, etc. I highly recommend running this app to see what's possible.
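For reference, the Python version can be invoked along these lines (the script location is an assumption based on a default 2019 R3 install; adjust the paths to your setup):

```sh
# benchmark_app.py typically lives under deployment_tools/tools/benchmark_tool
python3 benchmark_app.py -m /path/to/model.xml -d CPU -api async
```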
The other thing to try to get more speed is to use the MULTI option for your inference, e.g. MULTI:CPU,GPU, as that will use both the CPU and the on-chip GPU. There is an example at https://github.com/intel-iot-devkit/smart-video-workshop/blob/master/hardware-heterogeneity/Multi-devices.md.
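From Python, loading onto the MULTI device might look like the sketch below, assuming the IECore API available in 2019 R3 (the model paths and the number of requests are placeholders; requests are scheduled across the devices automatically):

```python
# Sketch: load the same network on both CPU and the integrated GPU via MULTI.
from openvino.inference_engine import IECore, IENetwork

ie = IECore()
net = IENetwork(model="model.xml", weights="model.bin")    # placeholder paths

# A few extra in-flight requests help keep both devices busy.
exec_net = ie.load_network(network=net, device_name="MULTI:CPU,GPU", num_requests=4)
```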