I'm trying to run a ResNet-50 model. First I ran it with batch = 1, then tried increasing the batch to 4. However, the execution time for batch = 4 is not much better than 4× the time for batch = 1. If the batch can be processed in parallel, shouldn't the total time drop significantly?
Are there other ways to reduce the execution time?
Thank you in advance
I encourage you to study the deployment_tools\inference_engine\samples\benchmark_app sample.
The main idea of this tool is to find the best combination (from an FPS/latency perspective) of the number of parallel infer requests and the batch size for your network on your machine. It should help OpenVINO customers understand how to develop their own applications (how many parallel infer requests to use, which limits can be met) to get the best FPS/latency during inference of their networks. So please use this tool to experiment with different parameters (such as the number of asynchronous infer requests).
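As a starting point, you can sweep the batch size and the number of infer requests directly from the command line (the model path and values below are illustrative; adjust them for your install and model):

```shell
# Async API, batch size 4, four infer requests in flight on CPU
./benchmark_app -m resnet50.xml -d CPU -api async -b 4 -nireq 4
```

Compare the reported throughput and latency across a few `-b`/`-nireq` combinations to find the sweet spot for your hardware.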
To answer your question: the relationship between increasing batch_size and decreasing execution_time is non-linear. The network structure has a bigger impact.
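A toy model can make the non-linearity concrete. Suppose the device has a fixed number of parallel "lanes" plus a fixed per-call overhead (the numbers and the lane abstraction below are illustrative assumptions, not measurements of any real device):

```python
import math

def infer_time(batch, lanes=4, t_image=1.0, overhead=0.2):
    """Toy model of batched inference time.

    The device processes up to `lanes` images simultaneously; each
    "wave" of images costs t_image, plus a fixed per-call overhead.
    """
    return overhead + math.ceil(batch / lanes) * t_image

# With lanes=4: batch=1 and batch=4 take the same wall time, so
# batching quadruples throughput. At batch=8 a second wave is needed
# and the scaling is no longer free.
# With lanes=1 (device already saturated at batch=1), batch=4 costs
# roughly 4x the batch=1 time, which matches what you observed.
```

If your batch = 4 run takes about 4× the batch = 1 time, it suggests the device is already close to saturated at batch = 1, so batching alone cannot help much.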
So, running the network with batch > 1 usually improves performance, but eventually you will hit a limit. You can also increase performance by running several asynchronous infer requests in parallel. How much this helps depends on the hardware used (number of cores, supported architecture and instruction set, etc.).
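The benefit of several in-flight requests can be sketched without OpenVINO at all, using a plain thread pool and a dummy "inference" call (the function and timings below are stand-ins, analogous to what `-nireq 4` does in benchmark_app):

```python
import time
from concurrent.futures import ThreadPoolExecutor

def fake_infer(image_id, latency=0.05):
    # Stand-in for one inference call; the sleep mimics device latency.
    time.sleep(latency)
    return image_id * 2  # dummy "result"

images = list(range(8))

# Sequential: one request at a time.
t0 = time.perf_counter()
seq = [fake_infer(i) for i in images]
t_seq = time.perf_counter() - t0

# Four requests in flight at once, analogous to -nireq 4.
t0 = time.perf_counter()
with ThreadPoolExecutor(max_workers=4) as pool:
    par = list(pool.map(fake_infer, images))
t_par = time.perf_counter() - t0

# Same results, but the overlapped version finishes much sooner.
```

The same idea applies to real inference: while one request waits on the device, others keep it busy, so throughput rises even though per-request latency does not.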
Hope this answers your question, and thanks for using OpenVINO!