michael_p_4
Beginner
118 Views

Occasional MFX_ERR_DEVICE_FAILED and MFX_WRN_DEVICE_BUSY in an endless loop

The problems described in the subject are not new, I take it.

Searching the forum for "permanent MFX_WRN_DEVICE_BUSY" results in quite a few finds, but not a single resolution.

So, I'd like to try and pick it up again.

I'll summarize in a few lines:

  • Windows 10 anniversary
  • apparently latest SDK "Intel_Media_SDK_2016_R2.msi"
  • Intel(R) HD Graphics 530 (latest driver according to Windows Device Manager: v20.19.15.4501 from Aug 2016)
  • additional NVidia GeForce GT740
  • a monitor plugged into each (onboard Intel & nVidia)

Symptoms:

  • My code decodes and encodes MPEG2, AVC, MVC and VC1 perfectly in Intel-sw mode. No errors, no problems
     
  • When activating HW acceleration (with DX11), AVC decoding tends to throw an MFX_ERR_DEVICE_FAILED error at some point during the first 20 or so frames, usually after having successfully decoded 0 to 8 frames.
    I had to write a recovery function that reinitializes the decoder (including closing the session), re-feeds whatever was previously fed into the decoder and then, in case frames had already been delivered to the main application, discards that many frames so they won't be sent twice.
    That is a pretty disgusting workaround, and I have to buffer incoming data up to a certain point just to be prepared for this fatal error.

    I repeat the recovery up to 10 times, until finally the frames all go through. Sometimes it requires 1 recovery attempt, sometimes up to 5 or 6 (which shows that the data can't be the culprit, it's just the GPU's mood).

    Once the first 10 frames have made it through alive (whether with or without a forced recovery), the rest plays flawlessly. I never get this dreaded MFX_ERR_DEVICE_FAILED later on.

    Again, note: the exact same code works flawlessly, when I initialize the session for SW decoding
     
  • MVC decoding exhibits the previous symptom as well, but an additional, more severe one: I run into an endless MFX_WRN_DEVICE_BUSY loop.
    The left and right eye encoded frames are fed into the decoder pairwise. I get a couple of MFX_ERR_MORE_DATA and MFX_ERR_MORE_SURFACE replies and handle them accordingly, then I usually get a handful of decoded image pairs (typically 5 to 7). After that, every call to MFXVideoDECODE_DecodeFrameAsync returns MFX_WRN_DEVICE_BUSY.
    I suppose I could try the same recovery in this case, but this one is so reliable (in that it happens absolutely every time) that I don't think it would help.

    Again: that same code works flawlessly in sw mode.
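
The recovery scheme described above (restart, re-feed, drop what was already delivered, give up after a fixed number of attempts) can be sketched roughly as follows. This is a minimal, self-contained sketch and not the code from the application: `DecodeResult`, `Frame`, `DecodePass`, `RecoveryOutcome` and `decodeWithRecovery` are hypothetical names, and the `DecodePass` callback is assumed to open a fresh session and decoder, decode the buffered input from the start, and report `DeviceFailed` where the real decoder would return MFX_ERR_DEVICE_FAILED.

```cpp
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical stand-ins for illustration; these are not Media SDK types.
enum class DecodeResult { Ok, DeviceFailed };

struct Frame { int id; };

// One complete decode pass over the buffered input. Assumed to start from a
// freshly initialized session/decoder, append decoded frames to `out`, and
// stop early with DeviceFailed if the device gives up.
using DecodePass =
    std::function<DecodeResult(const std::vector<int>& input, std::vector<Frame>& out)>;

struct RecoveryOutcome {
    bool ok;                       // true if one pass ran to completion
    int attempts;                  // number of passes that were started
    std::vector<Frame> delivered;  // frames handed to the application, each once
};

// Repeats the pass until it succeeds or `maxAttempts` is reached. Frames that
// were already delivered in an earlier attempt are skipped in later attempts,
// so the application never receives a frame twice.
RecoveryOutcome decodeWithRecovery(const std::vector<int>& buffered,
                                   const DecodePass& pass,
                                   int maxAttempts) {
    RecoveryOutcome r{false, 0, {}};
    while (r.attempts < maxAttempts) {
        ++r.attempts;
        std::vector<Frame> out;
        const DecodeResult res = pass(buffered, out);
        // Deliver only the frames beyond what the application already has.
        for (std::size_t i = r.delivered.size(); i < out.size(); ++i) {
            r.delivered.push_back(out[i]);
        }
        if (res == DecodeResult::Ok) {
            r.ok = true;
            break;
        }
    }
    return r;
}
```

The sketch assumes decoding is deterministic, i.e. a repeated pass produces the same frames in the same order, which is what makes "discard the first N frames" safe.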

So, all in all, that MFX_ERR_DEVICE_FAILED thing seems to be common and a lot of people fix it through recovering by shutting down the decoder/encoder and starting fresh. Which works, but is an awfully ugly and complicated kludge and can disrupt the video flow. I really don't think this should be allowed to happen at all.

So, what might I be doing wrong?

Just a note:
I wrote similar code to handle the same things for nVidia/CUDA, where things run a lot more smoothly (and I have to say, it is much, much easier to implement).
CUDA comes with a built-in parser that takes the burden of handling the decoding loop and reacting to all kinds of different warnings and errors off the developer's shoulders. It simply calls callbacks when something is done and ready for the next step.
Maybe that would be a helpful addition to the Media SDK as well. It would definitely make coding errors less likely and maybe cause fewer problems.

 

13 Replies
michael_p_4
Beginner
118 Views

Some additional information:

Now I finally got a useful tracer.exe log.

Found out, that tracer doesn't work while the additional nVidia adapter is installed.

But the log only contains in more detail what I described above. No helpful information about the DEVICE_BUSY messages.

But if anyone cares for that log, I'll be happy to post it.

 

michael_p_4
Beginner
118 Views

Nothing? Anyone?

 

Jeffrey_M_Intel1
Employee
118 Views

Thanks for your report.  Callbacks, especially for decode conditions, are a very interesting idea.  In terms of getting your code running, can you reproduce this behavior with the Media SDK samples or tutorials?  If the problem also occurs with the sample code that would indicate looking at your setup or inputs.  If the available example code runs without this issue then the next step would be to find what is different in your application code.

michael_p_4
Beginner
118 Views

Thanks for your reply.

I tried it with the sample code now - the same thing happens.

It was tricky to set up and get running, because there is a problem with the Intel Media SDK, when initializing using "MFX_IMPL_HARDWARE_ANY" in case there is another adapter installed (nVidia here). MFX_IMPL_HARDWARE_ANY always results in software mode in this configuration. It works, once I remove the 2nd card. This has been mentioned by someone else in an older thread somewhere (2013 or so).

The same apparently applies to tracer.exe. It will only trace anything, if I remove the second GPU.

So instead of specifying MFX_IMPL_HARDWARE_ANY, I have a list of preferred initialization options and try them one by one.

MFX_IMPL_HARDWARE_ANY | MFX_IMPL_VIA_D3D11

usually does the trick.
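
The "try them one by one" approach might look roughly like this. It is a minimal sketch under assumptions: `ImplCandidate` and `pickImplementation` are hypothetical names, the flag values are placeholders rather than the real MFX_IMPL_* constants, and `tryInit` stands in for whatever actually creates the session (presumably MFXInit, ideally followed by an MFXQueryIMPL check that the session really is a hardware one, since the automatic modes can end up in software mode).

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Hypothetical candidate entry: a readable label plus the flag combination.
// The numeric values used by callers are placeholders, not real SDK constants.
struct ImplCandidate {
    std::string label;
    unsigned flags;
};

// Walks the priority list in order and returns the index of the first
// candidate for which `tryInit` reports success, or -1 if none works.
// `tryInit` is an assumed wrapper around the real initialization call.
int pickImplementation(const std::vector<ImplCandidate>& preferred,
                       const std::function<bool(unsigned)>& tryInit) {
    for (std::size_t i = 0; i < preferred.size(); ++i) {
        if (tryInit(preferred[i].flags)) {
            return static_cast<int>(i);
        }
    }
    return -1;  // nothing in the list could be initialized
}
```

Keeping the list as data makes it easy to log which combination finally succeeded on a given machine.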

Ok, I had to force the sample code to use this value, then it initializes correctly.

The same sequence of results there: a couple of frames get processed, a few MFX_ERR_MORE_SURFACE results (which tells me that the incoming frames can't be all that bad), and then an endless MFX_WRN_DEVICE_BUSY loop.
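
This does not fix the underlying problem, but a bounded retry at least turns the endless loop into a detectable failure. A minimal sketch, assuming hypothetical names throughout: `Status`, `BusyWaitResult` and `callWithBusyLimit` are not SDK names, and `submit` stands in for one call to MFXVideoDECODE_DecodeFrameAsync with unchanged arguments.

```cpp
#include <chrono>
#include <functional>
#include <thread>

// Hypothetical status values standing in for the SDK return codes.
enum class Status { Ok, DeviceBusy, OtherError };

struct BusyWaitResult {
    Status last;  // status of the final call
    int calls;    // how many times `submit` was invoked
};

// Calls `submit` until it returns something other than DeviceBusy, pausing
// between attempts. After `maxBusyRetries` retries it gives up and returns
// DeviceBusy, so the caller can fall back to a decoder reset.
BusyWaitResult callWithBusyLimit(const std::function<Status()>& submit,
                                 int maxBusyRetries,
                                 std::chrono::milliseconds pause) {
    int calls = 0;
    Status s = Status::DeviceBusy;
    while (true) {
        s = submit();
        ++calls;
        if (s != Status::DeviceBusy || calls > maxBusyRetries) {
            break;
        }
        std::this_thread::sleep_for(pause);
    }
    return BusyWaitResult{s, calls};
}
```

With `maxBusyRetries` set to 3, `submit` is called at most 4 times: the initial call plus three retries.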

 

 

michael_p_4
Beginner
118 Views

Forgot to mention: this is about MVC only.

command line was:

sample_decode mvc -hw -i inputfile.bin

 

michael_p_4
Beginner
118 Views

Hmm.

Since this is not moving, I'll just post the tracer and analyzer logs here, maybe that will help.

 

michael_p_4
Beginner
118 Views

Ok, I seem to be talking to myself here, mostly.

I kept fiddling with this the last few days and things aren't improving, so I've decided to give up on this.

We're encouraging our customers to use nVidia GPUs, which is all in all an acceptable solution. It's a bit saddening, because the on-board Intel hardware is already widespread and many have it without even knowing. But I have to conclude that this Intel Media SDK just isn't quite there yet.

Liu__Chao
Beginner
118 Views

Hi Michael,

We are kind of in the same situation. We spent several months trying to make QSV work in our product. Unfortunately, it turned out not to be worth it. QSV has quite a few critical bugs and the Intel guys are not very involved in fixing them. They just ask you to try their sample_xxx and disappear (exactly like what happened in this thread). For me, it's even more awkward. They said it would be fixed in the next release. When the new 2017 release came out, I found out that my hardware is not supported anymore! As for this particular bug, I could repro it (or something similar) on Linux too. I remember some people reported it quite a long time ago. So, yeah, I think it's wise that you don't recommend QSV to your customers.

In case this might help you, we are considering using vaapi. We haven't really started on it yet. From what I have learned so far, it's not bad. Most importantly, it's open source. If there are critical bugs, we don't have to wait for these people to take a look. It supports way more CPUs than QSV. It's only available on Linux though.

Jeffrey_M_Intel1
Employee
118 Views

First and foremost I'd like to apologize for the slow replies.  I was out for a large part of when your issue was waiting.  While I can't promise that we can fix the issues raised in this thread immediately, I do want you to know that your feedback has been heard and we're doing everything we can internally to drive for improvement.

Definitely understand if you are out of time, but if you'd like I'd be happy to continue the investigation from here.  To make up for lost time it may be quickest to set up a call.  We can continue by private message if you would like to try this route.

michael_p_4
Beginner
118 Views

yjl wrote:

They just ask you to try their sample_xxx and disappear (exactly like what happened in this thread).

I wouldn't go that far - I noticed that Intel employees are involved in this forum, and that is far more than you could say about nVidia. But the nVidia SDKs do require less assistance, and the community covers the shortcomings of the documentation simply because CUDA is more widespread.

yjl wrote:

When the new release 2017 came out, I found out that my hardware is not supported anymore! 

THAT though is an absolute nogo, especially in the Intel case.

(Intel, please listen) - when I'm creating software for a wider customer base, I usually try to use somewhat not-up-to-date hardware. A lot of developers get themselves the latest, best, coolest hardware, because we geeks love that. The result is that they don't realize when they are writing software that performs badly, because their sparkling new CRAY crunches those deficits away (I remember Windows Word taking 30 seconds to launch, just because I didn't have an SSD back then).

While I can tell CUDA users to maybe upgrade their graphics adapter to something more recent for maybe 50-100 dollars, tearing out and replacing a motherboard for the latest on-board Intel GPU is a different thing. Apart from the cost, it's usually a day's work and not 5 minutes like when replacing a card.
I can see, that Intel wants to sell hardware, that is understandable, but the incentive shouldn't be that software doesn't run anymore (that diminishes trust). Better performance is bait enough.

Anyway - back to troubleshooting

michael_p_4
Beginner
118 Views

Jeffrey M. (Intel) wrote:

First and foremost I'd like to apologize for the slow replies.  I was out for a large part of when your issue was waiting.  While I can't promise that we can fix the issues raised in this thread immediately, I do want you to know that your feedback has been heard and we're doing everything we can internally to drive for improvement.
Definitely understand if you are out of time, but if you'd like I'd be happy to continue the investigation from here.  To make up for lost time it may be quickest to set up a call.  We can continue by private message if you would like to try this route.

The out-of-time thing is pressing indeed. We're preparing for an overdue release and the plan was to support all three existing GPU brands.
Intel is the one holding us up at the moment.

I'd prefer handling this either through the forum or private message, as I'm not a native speaker and English writing comes easier than speaking.

Summarizing:

  • A big plus is the "software" implementation of the SDK, which theoretically allows the code to run without or with outdated Intel chips. Unfortunately it is very slow (a factor of 3-4 compared to x264, both tested with "highest speed over quality" settings).
  • The software implementation works without any mentionable hitches (our scenario: decode AVC, MVC, VC-1, MPEG, encode AVC only - MVC is planned, HEVC support is on a wish list).
  • I currently only have an Intel HD 530 for testing (and an even older 4300 or 4400 on a notebook, doesn't support hw encoding at all, I believe)
  • We have three scenarios: either decoding, encoding or transcoding all from host memory to host memory, display through DX9/11 is not a requirement.
  • I'm having the most trouble with decoding in hardware mode. I believe the encoding hasn't failed me yet.
  • The major nuisances are:
    - initialization with MFX_IMPL_AUTO_ANY will always result in software mode, initialization with MFX_IMPL_HARDWARE_ANY will always fail. Initialization with MFX_IMPL_HARDWARE2 | MFX_IMPL_VIA_D3D11 works. I believe this is due to the fact that I have an additional nVidia adapter (currently even two) installed and somehow the auto-logic stumbles over that. My workaround is, that I traverse a priority list of possible initialization combinations myself, until one succeeds.
    - the previous item is probably the cause for tracer.exe not working. It will once I remove the other adapters.
    - 50% chance of getting an MFX_ERR_DEVICE_FAILED during decoding of the first 20 frames. This may be tolerable for a player that can reset the whole thing and try again, but it's a killer when transcoding.
    - the infamous MFX_WRN_DEVICE_BUSY loop that never ends. Only happens when decoding MVC. But reliably so.
    - the fact that tracer.exe (other adapters removed) only states the obvious and gives no additional clues. It simply tells me what I can see for myself in the application (what I passed in and what the result was).

I haven't pursued the MVC decoding further, because I simply couldn't make it work.

Jeffrey_M_Intel1
Employee
118 Views

Just want to make sure I understand: do you see MFX_ERR_DEVICE_FAILED for h264, mpeg2, h265, also?  Or just mvc?  Does this behavior continue if you remove the other adapters or disable them in your BIOS?

Could you send a short sample of an MVC sequence that causes MFX_WRN_DEVICE_BUSY?  I have not been able to replicate in my tests but this behavior may be content dependent.  If you're worried about posting to the forum please feel free to send a private message. 

By the way, the new developer's guide has a section 8.3 "Robust decoding of network streams" which describes a strategy to avoid this problem.  You're probably doing something like this already but I'm mentioning in case it might help.

michael_p_4
Beginner
118 Views

Jeffrey M. (Intel) wrote:
Just want to make sure I understand: do you see MFX_ERR_DEVICE_FAILED for h264, mpeg2, h265, also?  Or just mvc?  Does this behavior continue if you remove the other adapters or disable them in your BIOS?

In a previous post I attached a tracer log - in order to be able to create that, I had to remove the other adapters (because, as mentioned, tracer.exe doesn't work when other adapters are installed - or probably more precisely, if the Intel GPU is not index #0).
So it doesn't matter whether there are other adapters in play.

Regarding other sources than h264: I just tried an mpeg2 source and it happens there too. Again: not consistently, but every nth time.

Jeffrey M. (Intel) wrote:
Could you send a short sample of an MVC sequence that causes MFX_WRN_DEVICE_BUSY?  I have not been able to replicate in my tests but this behavior may be content dependent.  If you're worried about posting to the forum please feel free to send a private message.

I'll send you a PM, yes.

Jeffrey M. (Intel) wrote:
By the way, the new developer's guide has a section 8.3 "Robust decoding of network streams" which describes a strategy to avoid this problem.  You're probably doing something like this already but I'm mentioning in case it might help.

Thanks, just read it, but if I understand it right, it only deals with a very simple scenario in which corrupt data is the cause. So the procedure is to discard that data, reset the decoder and continue.
My scenario is more difficult, because the source is fine. I have to reset the decoder and re-feed the same data several times until it goes through. And for that, because of reference frame dependencies, I have to "skip back" to the last IDR frame, feed everything again from there and dismiss all decoded frames I already sent back to the application in the previous run. I also must reset timestamps so they get back in sync, etc. It's complicated and error-prone. Not pretty.
I can't just skip frames, I need all of them, because I'm transcoding.
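
The bookkeeping for "skip back to the last IDR and dismiss what was already sent" could be kept in one small structure. A minimal sketch under loud assumptions: `Packet` and `ReplayBuffer` are hypothetical names, the caller is assumed to know which packets start with an IDR frame, and the sketch ignores decoder latency by assuming every frame of the previous group has been delivered before the next IDR packet is pushed. A real implementation would also need to restore timestamps, which is not shown.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical description of one compressed input packet.
struct Packet {
    int id;      // caller-defined identifier (e.g. sequence number)
    bool isIdr;  // true if the packet starts with an IDR frame
};

// Remembers everything fed to the decoder since the last IDR, plus how many
// decoded frames the application has received since then.
class ReplayBuffer {
public:
    // Record a packet before feeding it to the decoder. An IDR starts a new
    // replay window, so older packets and the delivered count are dropped.
    void push(const Packet& p) {
        if (p.isIdr) {
            packets_.clear();
            deliveredSinceIdr_ = 0;
        }
        packets_.push_back(p);
    }

    // Call once for every decoded frame handed to the application.
    void onFrameDelivered() { ++deliveredSinceIdr_; }

    // Packets to feed again after a decoder reset, starting at the last IDR.
    const std::vector<Packet>& replay() const { return packets_; }

    // Number of decoded frames to drop at the start of the replayed run.
    std::size_t framesToDiscard() const { return deliveredSinceIdr_; }

private:
    std::vector<Packet> packets_;
    std::size_t deliveredSinceIdr_ = 0;
};
```

After a reset, the caller would feed `replay()` into the new decoder and drop the first `framesToDiscard()` output frames before resuming normal delivery.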
