Intel® Distribution of OpenVINO™ Toolkit
Community assistance for the Intel® Distribution of OpenVINO™ toolkit, OpenCV, and all aspects of computer vision on Intel® platforms.

OpenVINO Hetero models deadlock with VAAPI hw context create (not observed with QSV)

Guduru__Bhanu_chandr
1,522 Views

Hi

We are using OpenVINO in our display solution. We are currently facing an issue; the details are below:

A single display uses multiple inference models (e.g., 3), and we want to run all models with the Hetero option (e.g., HETERO:GPU,CPU) with HW acceleration ON (the HW acceleration being FFmpeg VAAPI or QSV).

In the above scenario, the output is not rendered at all and the inference thread waits forever. When we ran the same experiment with QSV builds, we saw the rendered output with the inference results drawn.

We are using the FFmpeg API to decode the given input instead of using OpenCV calls.

When an input is passed to the FFmpeg avformat_open_input API, we first create the HW codec context using av_hwdevice_ctx_create.

When we debugged the above scenario, we saw that the inference manager is stuck waiting in LoadNetwork (InferenceEngine::InferencePlugin::LoadNetwork) on the inference thread.

Please note that av_hwdevice_ctx_create and LoadNetwork are running concurrently.

We are seeing that the av_hwdevice_ctx_create thread is waiting on a lock for resources to be released. This call is common to both VAAPI and QSV.

av_hwdevice_ctx_create uses VAAPI driver calls in the VAAPI build and Media SDK driver calls in the QSV build. In QSV, this call does not get stuck on the lock in the above scenario and rendering proceeds smoothly.

 

Note: In the above problem scenario, if the first model is set to use the plain CPU or GPU plugin and the remaining models are set to Hetero, there are no problems and we are able to render all the outputs. This may be because the hardware context is already created when the first model is loaded with a separate plugin instead of the Hetero option.

Queries:

1) Could you let us know how OpenVINO accesses CPU/GPU resources? Is it similar to how VAAPI or QSV accesses them?

2) Could you let us know whether any resource sharing happens between the LoadNetwork API and av_hwdevice_ctx_create in VAAPI that could cause a deadlock? (GDB backtrace and thread info are attached.)

3) Is it a valid scenario to use Hetero for all models (instead of from the second model onwards) when decoding is done using FFmpeg VAAPI?

Attaching the gdb traces for the issue scenario.

11 Replies
Mark_L_Intel1
Moderator

Hi Chandra,

To answer your questions:

1) No, there is no resource conflict between OpenVINO and VAAPI/QSV. In the acceleration cases they use different hardware blocks on the GPU: OpenVINO uses the general graphics unit via the OpenCL driver, while VAAPI uses the iHD graphics driver.

2) This is my guess, you have to check. I believe you are using the FFmpeg decoder to output a surface to the inference engine. But this should not be LoadNetwork; it should be the input buffer to the inference engine.

3) Yes; this follows from the answer to 1).

Let me know if this solves your question.

Mark

Mark_L_Intel1
Moderator

One more suggestion,

After checking your log file, I found you are loading "/usr/lib/x86_64-linux-gnu/dri/i965_drv_video.so".

Could you change to the iHD driver? The correct path should be "/usr/lib/x86_64-linux-gnu/dri/iHD_drv_video.so"; you may need to change your environment to do this.
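For reference, libva selects its backend via the LIBVA_DRIVER_NAME environment variable (a standard libva mechanism; the driver path below matches the one in the log but may differ per distribution):

```shell
# Ask libva to load iHD_drv_video.so instead of i965_drv_video.so.
export LIBVA_DRIVER_NAME=iHD
# If the driver is not in libva's default search path, point to it explicitly:
export LIBVA_DRIVERS_PATH=/usr/lib/x86_64-linux-gnu/dri
```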

Mark

Guduru__Bhanu_chandr

Hi Mark,

av_hwdevice_ctx_create runs even before FFmpeg decoding of input frames starts. We initiate av_hwdevice_ctx_create and LoadNetwork on concurrent threads because we want the inference engine ready when the decoded frames arrive. So already at this stage we see both threads in deadlock.

We did an experiment:

We added a sleep after av_hwdevice_ctx_create to confirm that context creation is done, and then initiated LoadNetwork concurrently on separate threads for the multiple models. In this scenario the output comes out, but no inference is drawn: debugging showed the inference engine is still stuck in LoadNetwork. (We are not holding on to the buffers while loading: as long as the inference engine is loading, we send the decoded buffers straight to rendering. Once the load completes, we pass the decoded buffers to the inference engine and draw the inferred results when each infer request completes.)

In the sample applications provided with the OpenVINO toolkit, all Hetero operations are done serially, so this issue cannot be reproduced there.

Is loading multiple Hetero networks concurrently supported in OpenVINO in this case?

Regards,

Bhanu

Guduru__Bhanu_chandr

The i965 driver is used for FFmpeg VAAPI decoding, whereas the iHD driver is used for FFmpeg QSV decoding.

As we are seeing the issue in the VAAPI build (and not in QSV builds), we should not change the libva driver from i965 to iHD for VAAPI.

om77
New Contributor I

Liu, Mark (Intel) wrote:

1) No, there is no resource conflict between OpenVINO and VAAPI/QSV. In the acceleration cases they use different hardware blocks on the GPU: OpenVINO uses the general graphics unit via the OpenCL driver, while VAAPI uses the iHD graphics driver.

Hi Mark,

just FYI.

During our tests, we noticed that the hw encoder uses the general graphics unit as well (the same one as the OpenCL driver).

It isn't related to hw decoder.

Guduru__Bhanu_chandr

Hi Om,

Could you let us know how we can confirm whether the hw decoder/encoder is using the general graphics unit? Is there a tool we need to use, or some other way?

 

Regards,

Bhanu 

om77
New Contributor I

We use intel-gpu-tools; there is an intel_gpu_top utility inside. For the latest-generation chips it may be necessary to compile it from source.

https://cgit.freedesktop.org/xorg/app/intel-gpu-tools/

The command line to monitor Intel GPU activity is:

./intel_gpu_top -l

Mark_L_Intel1
Moderator

Hi Bhanu,

Your second post explains the issue better. Now I understand the conditions:

  • It happens when using the FFmpeg VAAPI plugin, not the QSV plugin.
  • The deadlock happens during library loading, when av_hwdevice_ctx_create is called in one thread while LoadNetwork runs in another.

As I clarified before, there should not be a hardware resource conflict, but there may be a software conflict: most likely, the two threads load the same software component. I checked the log again; one thread loads "libinference_engine.so" while the other loads "i965_drv_video.so". So you might try these:

  1. In a non-deadlock case, capture a gdb log to see what "libinference_engine.so" loads next;
  2. Try loading the iHD driver to see if i965 is the problem; you can check this page for detailed instructions.
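One quick way to capture library-load order outside gdb is the glibc loader's LD_DEBUG trace (shown here against /bin/true as a stand-in for the actual application binary):

```shell
# LD_DEBUG=libs makes the glibc dynamic loader trace library search and load
# order to stderr. Run the real application instead of /bin/true to see where
# i965_drv_video.so and libinference_engine.so come in relative to each other.
LD_DEBUG=libs /bin/true 2> ld_debug.log
head -n 5 ld_debug.log
```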

By the way, for FFmpeg support, you could also report the issue at http://trac.ffmpeg.org/report

Let me know if this helps.

Mark

Guduru__Bhanu_chandr

Hi Mark,

Please find below our observations for the two queries:

 

1. In the non-deadlock scenario, we see that libinference_engine.so is stuck in one of the low-level locks (attached: hetero_sleepfor2sec.txt).

2. We tried changing the i965 driver to the iHD driver and ran the use case. We see that the iHD driver now gets stuck on a lock as well (attached: hetero_ihd.txt).

Please let us know if this helps in proceeding further.

Regards,

Bhanu

 

Mark_L_Intel1
Moderator

Thanks,

I am confused by the result of #1: did you say #1 is the non-lock case?

It looks like LoadNetwork is deadlocked in both cases. The goal here is to check which resource is locking the thread. The lock case blocks the whole sequence, so I was thinking the non-lock case could tell us which software component is the root cause.

Mark

Guduru__Bhanu_chandr

Hi Mark,

Yes, that's right.

In the non-lock case, our rendering proceeds without the inference data being drawn.

Earlier, av_hwdevice_ctx_create was not allowing us to proceed to rendering. But now av_hwdevice_ctx_create is created, the renderer gets video frames, and the output video is displayed properly.

The problem is with the inference engine. As it gets stuck on the lock, no inference output comes, so we cannot draw it on the render buffer. The render buffer therefore goes to the renderer as-is.

Only in the QSV build do we see the output drawn with inferred data, as nothing gets stuck on a lock there. The issue we are facing is with VAAPI builds, as mentioned from the start.

Regards

Bhanu
