Synchronous inference from multiple threads

Poca__Ramon · 07-21-2023

Hi,

We were using multiple threads with OpenVINO 2022.x to run multiple boxes (e.g. faces in a frame) through a network (e.g. age detection). We were using the synchronous API.

Since 2023.0 we are getting an exception because the request is locked. We could add our own lock to avoid that, but we'd rather be able to run several inferences in parallel. Is there any hint/property that gives (fine-grained) control over how many inferences a network can run at once? Or should we create multiple instances of the network?

return request.get_tensor(key)
RuntimeError: Exception from src/inference/src/infer_request.cpp:182:
Exception from src/inference/src/infer_request.cpp:164:
Infer Request is busy
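
One way to get the fine-grained control asked about above is a fixed-size pool of infer requests, so that no two threads ever touch the same request. The following is a minimal sketch, not an official OpenVINO recipe. It assumes `compiled_model` behaves like `openvino.runtime.CompiledModel` (exposes `create_infer_request()`), and that the network has a single output at index 0. `InferRequestPool` and `infer_box` are hypothetical names introduced here for illustration.

```python
import queue
from contextlib import contextmanager


class InferRequestPool:
    """Fixed-size pool of infer requests shared by worker threads."""

    def __init__(self, compiled_model, size):
        # Assumption: compiled_model exposes create_infer_request(),
        # as openvino.runtime.CompiledModel does.
        self._free = queue.Queue()
        for _ in range(size):
            self._free.put(compiled_model.create_infer_request())

    @contextmanager
    def borrow(self):
        # Blocks until a request is free, so at most `size` inferences overlap
        # and no request is ever used by two threads at once.
        request = self._free.get()
        try:
            yield request
        finally:
            self._free.put(request)


def infer_box(pool, inputs):
    """Run one synchronous inference on a borrowed request."""
    with pool.borrow() as request:
        request.infer(inputs)
        # Copy the output before the request goes back to the pool.
        return request.get_output_tensor(0).data.copy()
```

Each worker thread would call `infer_box(pool, box)`; the pool size then caps how many inferences run at the same time.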

Aznie_Intel · 07-21-2023

Hi Paco_Ramon,

Thanks for reaching out.

Did you observe the same RuntimeError and "Infer Request is busy" issue with OpenVINO 2022.x? Usually, the REQUEST_BUSY error arises when a request is still being processed and another inference is started on it. May I know what you are trying to achieve?

Synchronous inference calls the inference stages one by one, whereas asynchronous inference requests run in parallel. There is a _waitExecutor class that waits for a response from the device about task completion, which avoids the request-busy issue. You may refer to Asynchronous Inference Request.

Additionally, there are Threading utilities that provide task executors for asynchronous operations.

Regards,

Aznie
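
The asynchronous approach described above could look roughly like the sketch below in Python. It assumes `infer_queue` behaves like `openvino.runtime.AsyncInferQueue` (with `set_callback`, `start_async` and `wait_all`) and that the network has a single output at index 0. `run_boxes_async` is a hypothetical helper name, not part of OpenVINO.

```python
def run_boxes_async(infer_queue, boxes):
    """Submit every box to an async queue and collect outputs by index."""
    results = [None] * len(boxes)

    def on_done(request, index):
        # Called by the queue when a job finishes; userdata carries the index.
        results[index] = request.get_output_tensor(0).data.copy()

    infer_queue.set_callback(on_done)
    for index, box in enumerate(boxes):
        # Assumption: start_async waits for an idle request when all are busy.
        infer_queue.start_async(box, userdata=index)
    # Block until every submitted job has completed.
    infer_queue.wait_all()
    return results
```

With a real model, the queue would presumably be created as `openvino.runtime.AsyncInferQueue(compiled_model, jobs)`, where `jobs` sets how many requests may be in flight.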

Poca__Ramon · 07-24-2023

Hi Aznie,

We didn't see this with OpenVINO 2022.x; it only started happening now. Could it be that previous versions didn't hard-lock the network inference pipeline? We relied on this to spawn several threads, all using the same network in parallel, to process several detections on the input frame at once.

We use a serialized inference pipeline, based on multiple Python threads hitting different OpenVINO networks.

We need the pipeline to be able to ingest the camera input stream without lagging behind.

Aznie_Intel · 07-28-2023

Hi Paco_Ramon,

Which specific 2022 OpenVINO version are you using? Is it possible for you to share your files so that we can validate this further on our end?

You may share it here or privately to my email.

noor.aznie.syaarriehaahx.binti.baharuddin@intel.com

Please also share any workaround you have done so that we will not miss any information.

Regards,

Aznie

Poca__Ramon · 07-28-2023

Hi Aznie,

We worked around the issue by replacing the synchronous infer() with an async request followed by a wait(), which effectively blocks our threads while trusting that OpenVINO will somehow parallelize the requests.

Regards,
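
For readers wanting to try the same workaround, a minimal sketch follows. It assumes `request` behaves like `openvino.runtime.InferRequest`, that each thread owns its own request (an assumption, the post does not say how requests are shared), and that the network has a single output at index 0. `infer_blocking` is a hypothetical name.

```python
def infer_blocking(request, inputs):
    """Blocking inference built from the async API: start, then wait."""
    # Hand the job to the runtime without blocking yet.
    request.start_async(inputs)
    # Block the calling thread until this request has finished.
    request.wait()
    # Copy the result so it survives the next use of the request.
    return request.get_output_tensor(0).data.copy()
```

From the caller's point of view this behaves like `infer()`: the thread still blocks until the result is ready.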

Aznie_Intel · 07-28-2023

Hi Poca_Ramon,

I'm glad to hear that. For multithreaded inference, asynchronous code is more efficient, because ov::InferRequest::start_async() and ov::InferRequest::wait() allow the application to continue its activities and poll or wait for inference completion only when really needed.

Is there anything else you would like to know or may we proceed to close this ticket?

Regards,

Aznie
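
The "continue its activities and poll" idea mentioned above might be sketched in Python as follows. It assumes `request` behaves like `openvino.runtime.InferRequest`, whose `wait_for(timeout_ms)` is taken here to return True once the result is ready, and that the network has a single output at index 0. `infer_while_working` and `do_other_work` are hypothetical names.

```python
def infer_while_working(request, inputs, do_other_work, poll_ms=1):
    """Start an inference and keep doing other work until it completes."""
    request.start_async(inputs)
    # Assumption: wait_for returns False on timeout and True when finished.
    while not request.wait_for(poll_ms):
        # Use the waiting time for something useful, e.g. grabbing a frame.
        do_other_work()
    return request.get_output_tensor(0).data.copy()
```

Here `do_other_work` stands for whatever the application can usefully do in the meantime, such as reading the next camera frame.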

Aznie_Intel · 08-09-2023

Hi Poca_Ramon,

Thank you for your question. If you need any additional information from Intel, please submit a new question as this thread is no longer being monitored.

Regards,

Aznie