Poor inference performance with interleaved network models

Raghavan_S_ · 08-07-2018

When I run a typical convolutional network on the CPU, the execution time is very unpredictable.

If I run inference (net.infer()) on a single network in a loop:
net.infer(input)
net.infer(input)
...
I get an inference time of 30 milliseconds with very little variation from run to run.

However, when I interleave multiple networks and invoke them alternately:
net1.infer(input)
net2.infer(input)
net1.infer(input)
net2.infer(input)
...
the average inference time rises significantly, to 67 milliseconds, and the worst case is over 200 milliseconds. Is this a cache performance problem? Is there any way to improve it?
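
For reference, here is a minimal sketch of how the two call patterns could be timed so that the mean and worst-case numbers are comparable. It does not use the OpenVINO API: make_dummy_net is a hypothetical stand-in for net.infer(input), and time_calls and summarize are illustrative names. In a real test, each dummy callable would be replaced by a function that calls the actual network.

```python
import statistics
import time


def time_calls(calls, rounds=100, warmup=10):
    """Invoke every callable in `calls` in order, `rounds` times.

    Returns one latency in milliseconds per timed call. Passing a single
    callable gives the back-to-back pattern; passing two gives the
    interleaved pattern (net1, net2, net1, net2, ...).
    """
    # Untimed warm-up passes so one-time setup cost is excluded.
    for _ in range(warmup):
        for call in calls:
            call()

    latencies_ms = []
    for _ in range(rounds):
        for call in calls:
            start = time.perf_counter()
            call()
            latencies_ms.append((time.perf_counter() - start) * 1000.0)
    return latencies_ms


def summarize(latencies_ms):
    """Reduce a list of latencies to mean, median, and worst case."""
    return {
        "mean_ms": statistics.mean(latencies_ms),
        "median_ms": statistics.median(latencies_ms),
        "max_ms": max(latencies_ms),
    }


def make_dummy_net(size):
    """Hypothetical stand-in for a network; NOT an OpenVINO call."""
    weights = [float(i % 7) for i in range(size)]

    def infer():
        # Touch every weight so each call has its own working set.
        return sum(w * 0.5 for w in weights)

    return infer


if __name__ == "__main__":
    net1 = make_dummy_net(50_000)
    net2 = make_dummy_net(50_000)

    print("single     :", summarize(time_calls([net1])))
    print("interleaved:", summarize(time_calls([net1, net2])))
```

Reporting the median and maximum alongside the mean makes it easier to tell whether the slowdown affects every call or comes from occasional outliers.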

Environment:
Processor: Intel Core i7-8700K with 32 GB RAM
OS: Ubuntu 16.04
OpenVINO version: 2018.2.319 (dated July 2018)
Network: 4 convolution layers + 3 FC layers
Model file size: ~8 MB (32-bit float)
Target: CPU, AVX2 (SSE4 also gives similar performance)

Thanks
Raghavan

Mark_L_Intel1 · 08-24-2018

Hi Raghavan,

Sorry for the late response; we have been busy with multiple requests.

I would like to reproduce what you observed. Could you tell me the steps?

Which model and sample are you using, and what code did you use for inference?

Mark