Singal__Dhruv
Beginner

Batching kills performance

I'm trying to run an object detection model through OpenVINO 2019 R1. I modified the security camera example to enable batching, by batching together the first frame of each of batch_size input streams. However, with batching and ni-req enabled, I'm seeing much worse inference performance than the numbers quoted in blog posts.

I'm also using timers to measure how long each step of the pipeline takes: decoding, inference, and parsing the results. The per-step times add up to the total when ni-req = 1, but they don't when ni-req != 1.

Here is the batching code in question:
 

for (size_t batch_i = 0; batch_i < nInputChannels; batch_i += FLAGS_batch_size)
{
    if (availableObjectDetectionRequests.empty())
    {
        // ----------------------------Get object detection results -------------------------------------------
        ObjectDetectionInferRequest::Ptr objectDetectionRequest = pendingObjectRequests.front();
        objectDetectionRequest->wait();
        timers["fetchResults"].start();
        ObjectDetector.fetchResultsBatch(objectDetectionRequest, results);
        timers["fetchResults"].finish();
        timers["object_detector"].setCallDuration(objectDetectionRequest->getTime());
        //std::cout << "inference time = " << timers["object_detector"].lastCallDuration << "\n";
        pendingObjectRequests.pop();
        availableObjectDetectionRequests.push(objectDetectionRequest);
        // -----------------------------------------------------------------------------------------------------
    }
    const size_t current_batch_size = std::min(int(FLAGS_batch_size), int(nInputChannels - batch_i));
    ObjectDetectionInferRequest::Ptr objectDetectionRequest = availableObjectDetectionRequests.front();

    objectDetectionRequest->setId(batch_i, current_batch_size);
    //objectDetectionRequest->request->SetBatch(current_batch_size);
    // ----------------------------Asynchronous run of an object detection inference -----------------------
    for (size_t b = 0; b < current_batch_size; b++)
    {
        //matU8ToBlob<uint8_t>(frames[batch_i + b], input_blob_, b);
        timers["matU8"].start();
        objectDetectionRequest->setImage(frames[batch_i + b], b);
        timers["matU8"].finish();
    }
    if (!detectorStarted)
    {
        detectorStarted = true;
        timers["object_detector"].start();
        std::map<std::string, InferenceEngineProfileInfo> netPerf = objectDetectionRequest->request->GetPerformanceCounts();
        for (auto &layerPerf : netPerf)
        {
            std::cout << "layer name = " << layerPerf.first << " layer type = " << layerPerf.second.layer_type;
            std::cout << " layer exec type = " << layerPerf.second.exec_type << '\n';
        }
    }
    objectDetectionRequest->startAsync();
    ++inferCount;
    availableObjectDetectionRequests.pop();
    pendingObjectRequests.push(objectDetectionRequest);
    //std::cout << "Requests in flight = " << pendingObjectRequests.size() << '\n';
    // -----------------------------------------------------------------------------------------------------
}

 

 

Shubha_R_Intel
Employee

Dear Singal, Dhruv,

First, I noticed that you are running your tests on OpenVINO 2019 R1, which is already quite old: we released OpenVINO 2019 R2 just last week. Please download it and retry your tests. The preferred way to run such benchmarking experiments, however, is to use the benchmark_app; batch size is one of its switches.

Please try the benchmark_app on OpenVINO 2019 R2 and let me know how things work for you.
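For reference, an invocation might look like the following (the model and input paths are placeholders; adjust the device and batch size to your setup):

```
./benchmark_app -m <path_to_model>.xml -i <path_to_input> -d CPU -b 4 -nireq 4 -api async
```

Here -b sets the batch size and -nireq the number of infer requests kept in flight, so you can reproduce your batching experiment without hand-written timers.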

Sincerely,

Shubha
