Weird latency behavior - Multi models for multiple batch sizes

Huynh__Loc · 02-03-2020

Hi all, hope you could help me with this.

I tried to run mobilenetv1 with a dynamic batch size but got the error "RuntimeError: MKLDNNGraph::CreateGraph: such topology cannot be compiled for dynamic batch!". This is probably due to the squeeze layer inside mobilenetv1, which changes the shape of the tensor.

So in the end, I decided to create a separate model for each batch size and ran some benchmarks, but I got weird latency/throughput behavior.

My program is pretty simple:

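import numpy as np
from openvino.inference_engine import IENetwork

# plugin, model_xml and model_bin are created earlier (not shown)
images = np.random.uniform(-1, 1, size=[64, 3, 224, 224]).astype(np.float32)
for batch_size in range(1, 16):
    # Create a separate model for each batch size
    net = IENetwork(model=model_xml, weights=model_bin)
    net.batch_size = batch_size
    exec_net = plugin.load(network=net)
    input_blob = next(iter(net.inputs))

    # Run inference on the first batch_size images
    batch = images[:batch_size]
    res = exec_net.infer(inputs={input_blob: batch})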

However, after 4 iterations, OpenVINO started using only a single CPU core instead of all my CPU cores (I'm using an Intel(R) Xeon(R) Gold 6140).

Batch_size: 1, Throughput: 643.86 imgs/s 
Batch_size: 2, Throughput: 924.83 imgs/s 
Batch_size: 3, Throughput: 1064.74 imgs/s 
Batch_size: 4, Throughput: 1245.72 imgs/s 
Batch_size: 5, Throughput: 168.25 imgs/s 
Batch_size: 6, Throughput: 168.66 imgs/s
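
For reference, throughput figures like the ones above can be produced with a timing loop along the following lines. This is only a minimal sketch, not necessarily the exact code used for these numbers: the helper name measure_throughput and the n_iter and warmup parameters are assumptions made for illustration.

```python
import time


def measure_throughput(infer_fn, batch, n_iter=100, warmup=10):
    """Return images per second for repeated calls of infer_fn(batch).

    infer_fn, n_iter and warmup are hypothetical names for this sketch.
    """
    # Warm-up calls are not timed, so one-off setup costs do not skew the result.
    for _ in range(warmup):
        infer_fn(batch)

    start = time.perf_counter()
    for _ in range(n_iter):
        infer_fn(batch)
    elapsed = time.perf_counter() - start

    # Images processed in the timed loop divided by wall-clock seconds.
    return len(batch) * n_iter / elapsed
```

Inside the loop above it could be called as measure_throughput(lambda b: exec_net.infer(inputs={input_blob: b}), batch), which keeps the timing separate from model loading.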

Do you have any suggestions to fix this problem?

Thank you

JesusE_Intel · ‎02-20-2020

Hi Loc,

Could you try your multiple models with the benchmark_app and check whether you see the same behavior? Also, please make sure to update to the latest OpenVINO toolkit 2020.1 release.
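
One possible way to script that comparison is to build one benchmark_app command line per batch size. The sketch below only assembles the argument lists; the helper name benchmark_app_commands, the default executable name, and the iteration count are assumptions, so adjust them to match your installation.

```python
def benchmark_app_commands(model_xml, batch_sizes, app="benchmark_app",
                           device="CPU", n_iter=100):
    """Build one benchmark_app argument list per batch size.

    The helper name and the default values for app and n_iter are
    hypothetical; only the command-line flags come from benchmark_app.
    """
    return [
        [app,
         "-m", model_xml,        # path to the model .xml file
         "-d", device,           # target device
         "-b", str(batch),       # batch size for this run
         "-api", "sync",         # synchronous mode, as with exec_net.infer
         "-niter", str(n_iter)]  # number of timed iterations
        for batch in batch_sizes
    ]
```

Each list can then be passed to subprocess.run(cmd, check=True), and the reported throughput compared against the numbers from the Python loop.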

Regards,

Jesus