Media (Intel® oneAPI Video Processing Library, Intel Media SDK)

Errors when using several encoding applications

OTorg
New Contributor III

Hi,

Customers often want to fully utilize processor resources for decoding/encoding live (real-time) video streams,
i.e. to run multiple imsdk applications, some of them using gpu/hw and some using cpu/sw.

It is normal for the number of applications to change from time to time.
An alternative approach is one multithreaded application/service that can change the number of handled streams on the fly.
E.g. a computer transcodes 5 iptv streams today and will transcode 7 streams tomorrow, and the 5 streams already running shouldn't be interrupted while the 2 additional streams start.

But the Intel Media SDK library has a well-known bug that has not been fixed for years.
Starting or stopping an mfx session may cause errors inside other running sessions.
I think the reason is a lack of synchronization primitives somewhere inside the imsdk libraries.

I have seen such errors on different processors, different Windows versions, etc.
The issue can be easily reproduced using the standard imsdk samples.

Steps to reproduce the bug with the sw library on Windows:

1. Download the latest samples (today it is https://software.intel.com/sites/default/files/managed/61/d0/MediaSamples_MSDK_2017_8.0.24.271.msi).
2. Take \_bin\win32\sample_encode.exe from it.
3. Use the latest imsdk library (MediaSDK2019R1.exe, libmfxsw32.dll).
4. Take an uncompressed video file. I use this one, _input.nv12: https://drive.google.com/file/d/1z3O6iobsnPLzwQddXlTHoK1UzOIY9fJ3/view?usp=sharing
5. Download the two scripts (sample_encode_1.bat and sample_encode_N.bat) attached to this message and put them beside sample_encode.exe.
6. Start sample_encode_N.bat. It runs four sample_encode.exe instances in an infinite loop.
7. Wait several days (or less), and you'll see error messages in the sample_encode consoles.

This bug was already reported years ago, but it is still not fixed. Detailed discussions can be found here:
https://software.intel.com/en-us/forums/intel-media-sdk/topic/696953
https://software.intel.com/en-us/forums/intel-media-sdk/topic/475624
https://software.intel.com/en-us/forums/intel-media-sdk/topic/536840
 

Mark_L_Intel1
Moderator

Thanks,

I am running it now and I hope I can reproduce it.

The sample and release are old, so I am using the latest sample and release, Media SDK for Windows 2019 R1, and I skipped steps 1-3 from your list.

I will keep you updated.

Mark

Mark_L_Intel1
Moderator

Here is an update.

I ran the script for several hours and it was interrupted. There was a memory allocation error message, but I am not sure whether it was caused by the issue you reported or by Windows sleep mode.

I checked all your posts and found that this error is not the same as those. I also noticed you referred to the library "libmfxsw32.dll"; I briefly checked my installation and couldn't find it. As I remember, we discontinued software codec support.

Anyway, here is the error message I got. Do you still want me to continue?

file 151 processed, go next
file 152 processed, go next

[ERROR], sts=MFX_ERR_MEMORY_ALLOC(-4), CEncodingPipeline::Run, MSDK_INVALID_SURF_IDX==nEncSurfIdx error at c:\bb\nnmsdkbaw05_1\build_windows_sw_lib\build_dir\repos\mdp_msdk-lib\samples\sample_encode\src\pipeline_encode.cpp:2053

[ERROR], sts=MFX_ERR_MEMORY_ALLOC(-4), wmain, pPipeline->Run failed at c:\bb\nnmsdkbaw05_1\build_windows_sw_lib\build_dir\repos\mdp_msdk-lib\samples\sample_encode\src\sample_encode.cpp:1522
error got from encode: -4
Press any key to continue . . .

Mark

OTorg
New Contributor III

Liu, Mark (Intel) wrote:

I checked all your posts and found that this error is not the same as those.

Anyway, here is the error message I got. Do you still want me to continue?

The errors differ from run to run. This is one of the reasons why I suspect that the root of the problem is a lack of multithreading synchronization primitives inside libmfx*.dll. Run the test again and again, and you'll see other errors that occur in different places.

Several years ago I wrote a workaround for our applications. It is a system-wide synchronization mechanism with some built-in logic. It prevents parallel access to libmfx during "management" calls (MFXVideoENCODE_Init, MFXVideoENCODE_Close, etc.), but allows multiprocess/multithreaded use of the coding pipeline routines (MFXVideoENCODE_EncodeFrameAsync, MFXVideoCORE_SyncOperation, etc.). Since then our applications have worked reliably in 24/7/365/N mode on hundreds of computers.
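
A minimal in-process sketch of such a gate, assuming the description maps onto a reader/writer lock: "management" calls take the lock exclusively, pipeline calls take it shared. This is an illustration only, not the actual workaround: `MfxGate`, `management` and `pipeline` are hypothetical names that are not part of Media SDK, the lambdas stand in for real MFX calls, and `std::shared_mutex` (C++17) works inside one process only, whereas the workaround described above is system-wide.

```cpp
#include <cassert>
#include <mutex>
#include <shared_mutex>
#include <utility>

// Hypothetical helper, not part of Media SDK. In-process only: a system-wide
// variant would need a named OS synchronization object instead.
class MfxGate {
public:
    // Run a "management" call (e.g. a wrapper around MFXVideoENCODE_Init or
    // MFXVideoENCODE_Close) exclusively: it waits until no other gated call
    // is in flight and blocks new ones until it returns.
    template <typename Fn>
    decltype(auto) management(Fn&& fn) {
        std::unique_lock<std::shared_mutex> lock(mutex_);
        return std::forward<Fn>(fn)();
    }

    // Run a pipeline call (e.g. a wrapper around
    // MFXVideoENCODE_EncodeFrameAsync or MFXVideoCORE_SyncOperation).
    // Pipeline calls may overlap each other, but never a management call.
    template <typename Fn>
    decltype(auto) pipeline(Fn&& fn) {
        std::shared_lock<std::shared_mutex> lock(mutex_);
        return std::forward<Fn>(fn)();
    }

private:
    std::shared_mutex mutex_;
};

// Usage sketch with stand-in lambdas in place of real MFX calls.
inline void gateUsageExample() {
    MfxGate gate;
    int openSessions = 0;
    gate.management([&] { ++openSessions; });      // stand-in for Init
    int frames = gate.pipeline([] { return 1; });  // stand-in for EncodeFrameAsync
    gate.management([&] { --openSessions; });      // stand-in for Close
    assert(openSessions == 0 && frames == 1);
}
```

One design caveat of this sketch: a pipeline call that blocks for a long time while holding the shared lock delays every pending management call, so the gated sections should be kept short.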

Having such a patch, why am I raising the question again now? Because I suspect false triggering of MFX_ERR_GPU_HANG, and I feel its source is also related to the lack of synchronization. For now this is only a suspicion; I have to do some research to confirm or refute it. I'll create a new forum topic if the suspicion is confirmed. In the meantime, it would be good if Intel developers paid attention to the synchronization problems.

In addition, I would like other people on planet Earth to be able to use imsdk normally in multi-application scenarios :)

 

OTorg
New Contributor III

Liu, Mark (Intel) wrote:

I also noticed you referred to the library "libmfxsw32.dll"; I briefly checked my installation and couldn't find it. As I remember, we discontinued software codec support.

Can you tell me more about the discontinuation of the software version?

Because imsdk 2019 r1 release notes tell a different story:
- System Requirements: IA-32 or Intel 64 architecture ... for running software implementation...
- Known Limitations: ... is relevant for both software and hardware implementations ...

And MediaSDK2019R1.exe contains both libmfxsw32.dll and libmfxsw64.dll.

I realize that new coding features are absent from the software versions, but a complete absence of implementation/support is something new to me. Or did I misunderstand you?

 

Mark_L_Intel1
Moderator

Yes, you are right; this is my mistake. libmfxsw32.dll is in the <Media SDK root>/Software Development Kit/bin/win32 directory, and it is the software codec.

I am still running the reproducer you provided. I started 4 threads overnight and could see them fail one by one at different times; by the time I left, only one was still running, and all the failures had a sync error. Is this what you expected?

 I will update the details when all threads are done.

 

OTorg
New Contributor III

Liu, Mark (Intel) wrote:
 I will update the details when all threads are done.

The last application instance will most likely not fail, because there is no more contention (no races) on that machine.

OTorg
New Contributor III

Liu, Mark (Intel) wrote:
all the failures had a sync error. Is this what you expected?

Do you mean messages like these?

[ERROR], sts=MFX_ERR_UNKNOWN(-1), CEncodingPipeline::GetFreeTask, m_TaskPool.SynchronizeFirstTask failed at src\pipeline_encode.cpp:1533
[ERROR], sts=MFX_ERR_UNKNOWN(-1), CEncodingPipeline::Run, m_pmfxENC->EncodeFrameAsync failed at src\pipeline_encode.cpp:1738
[ERROR], sts=MFX_ERR_UNKNOWN(-1), wmain, pPipeline->Run failed at src\sample_encode.cpp:1086

Yes, these are typical failures.
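
Because the failures surface in different places from run to run, it can help to tally them by status and location. A minimal sketch of parsing the `[ERROR]` lines quoted above, assuming they always follow the `[ERROR], sts=NAME(code), function, message at file:line` layout seen in this thread; `SampleError` and `parseSampleError` are hypothetical names, not part of the samples.

```cpp
#include <cassert>
#include <optional>
#include <regex>
#include <string>

// Fields of one "[ERROR]" console line printed by the sample tools.
struct SampleError {
    std::string status;    // e.g. "MFX_ERR_UNKNOWN"
    int code;              // e.g. -1
    std::string function;  // e.g. "CEncodingPipeline::Run"
    std::string message;   // e.g. "pPipeline->Run failed"
    std::string file;      // source path as printed, may contain ':' and spaces
    int line;              // source line number
};

// Returns the parsed fields, or std::nullopt if the text is not an error line
// in the assumed layout (for example "file 86 processed, go next").
inline std::optional<SampleError> parseSampleError(const std::string& text) {
    static const std::regex pattern(
        R"re(\[ERROR\], sts=([A-Z0-9_]+)\((-?\d+)\), ([^,]+), (.*) at (.+):(\d+)\s*)re");
    std::smatch m;
    if (!std::regex_match(text, m, pattern)) {
        return std::nullopt;
    }
    return SampleError{m[1].str(), std::stoi(m[2].str()), m[3].str(),
                       m[4].str(), m[5].str(), std::stoi(m[6].str())};
}

// Usage sketch on one of the lines quoted in this thread.
inline void parseExample() {
    auto e = parseSampleError(
        R"log([ERROR], sts=MFX_ERR_UNKNOWN(-1), wmain, pPipeline->Run failed at src\sample_encode.cpp:1086)log");
    assert(e.has_value());
    assert(e->status == "MFX_ERR_UNKNOWN" && e->code == -1 && e->line == 1086);
}
```

Counting the parsed results per `(status, function)` pair across many runs would show whether the failures cluster around a few calls or are spread out.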

Mark_L_Intel1
Moderator

Hi,

I did a 3-day run with the script sample_encode_N.bat, with the following results:

  • It started 4 threads, and 3 of them crashed on the first day (< 12 hours) with the following error:
    file 86 processed, go next
    
    [ERROR], sts=MFX_ERR_UNKNOWN(-1), CEncTaskPool::SynchronizeFirstTask, SyncOperation failed at src\pipeline_encode.cpp:157
    
    [ERROR], sts=MFX_ERR_UNKNOWN(-1), CEncodingPipeline::GetFreeTask, m_TaskPool.SynchronizeFirstTask failed at src\pipeline_encode.cpp:1748
    
    [ERROR], sts=MFX_ERR_UNKNOWN(-1), CEncodingPipeline::Run, m_pmfxENC->EncodeFrameAsync failed at src\pipeline_encode.cpp:1961
    
    [ERROR], sts=MFX_ERR_UNKNOWN(-1), wmain, pPipeline->Run failed at src\sample_encode.cpp:1301
    error got from encode: -1
    Press any key to continue . . .
  • The last thread is still running.

I have attached a screen capture here. Is this similar to yours?

I will submit a bug report for this.

Mark

OTorg
New Contributor III

Hi,

Liu, Mark (Intel) wrote:
 I have attached a screen capture here. Is this similar to yours?

Yes, your errors are similar to mine.

They arise when using the sw version of libmfx*.dll. Both the 32-bit and 64-bit libraries have that problem.

Perhaps that bug is also the source of the MFX_ERR_GPU_HANG errors in the hw libraries. I'll describe how to reproduce the hw errors in the next post.

 

OTorg
New Contributor III

It is harder to reproduce MFX_ERR_GPU_HANG issues using the imsdk samples. I tried to find a script that shows the errors faster.

Steps to reproduce MFX_ERR_GPU_HANG in an encoder application (32-bit or 64-bit):
1. Download more_tools_to_raise_gpu_hang.zip and unpack it to your working folder.
2. Run sample_decode_N_and_encode.bat and wait.
3. If you don't see errors within 10-20 minutes, close all sample_decode/sample_encode windows and go to step 2.

In real life I saw MFX_ERR_GPU_HANG occur during ordinary decoding/encoding work (without applications starting or stopping). It was observed with Intel graphics driver versions 6194, 6323, 6373, 7212 on i7-6700, i5-7500, i5-7260u, e3-1585-v5. So it seems to be a common problem.

I also want to note that the real-life MFX_ERR_GPU_HANG occurrences were observed when cpu/gpu load was well below 100%, on machines with live (not file) media streams.
 

Mark_L_Intel1
Moderator

Thanks,

I have submitted the first issue.

Could you submit a separate post for the GPU hang issue? You don't have to resubmit the data and description; just point back to this post.

They need to be debugged separately because I can't assume they are the same issue.

Mark

OTorg
New Contributor III

Hi Mark,

Liu, Mark (Intel) wrote:
 Could you submit a separate post for the GPU hang issue?

Done:

https://software.intel.com/en-us/forums/intel-media-sdk/topic/830266

Thanks!

Mark_L_Intel1
Moderator

dj_alek wrote:

Hi Mark,

Liu, Mark (Intel) wrote:

 Could you submit a separate post for the GPU hang issue?

Done:

https://software.intel.com/en-us/forums/intel-media-sdk/topic/830266

Thanks!

Thanks, I have reproduced it. Let's follow up on this issue in that post.

fei__liu
Beginner

Hi Mark, when I run sample_decode with Intel Media SDK 2019, the following error occurs:

[ERROR], sts=MFX_ERR_NULL_PTR(-2), CSmplBitstreamReader::Init, m_fSource pointer is NULL at c:\users\admin\documents\intel? media sdk 2019 r1 - media samples 8.4.27.25\sample_common\src\sample_utils.cpp:596

[ERROR], sts=MFX_ERR_NULL_PTR(-2), CDecodingPipeline::Init, m_FileReader->Init failed at c:\users\admin\documents\intel? media sdk 2019 r1 - media samples 8.4.27.25\sample_decode\src\pipeline_decode.cpp:240

[ERROR], sts=MFX_ERR_NULL_PTR(-2), wmain, Pipeline.Init failed at c:\users\admin\documents\intel? media sdk 2019 r1 - media samples 8.4.27.25\sample_decode\src\sample_decode.cpp:686

Source code where the error occurs:

    // Initializing file reader
    totalBytesProcessed = 0;
    sts = m_FileReader->Init(pParams->strSrcFile);
    MSDK_CHECK_STATUS(sts, "m_FileReader->Init failed");
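
The "m_fSource pointer is NULL" message suggests that the reader could not open the input file named by pParams->strSrcFile; this is an inference from the message text and is not confirmed in this thread. A minimal sketch of checking the path before starting the pipeline, assuming a narrow-character path: `canOpenForReading` is a hypothetical helper that is not part of the samples, and the samples themselves use wide-character paths on Windows, which this sketch does not handle.

```cpp
#include <cassert>
#include <cstdio>
#include <fstream>
#include <string>

// Hypothetical pre-flight check, not part of the samples: returns true if the
// file at `path` can be opened for binary reading.
inline bool canOpenForReading(const std::string& path) {
    std::ifstream in(path, std::ios::binary);
    return in.is_open();
}

// Usage sketch: create a scratch file, check it, delete it, check again.
inline void canOpenExample() {
    const std::string path = "canopen_example.tmp";
    {
        std::ofstream out(path, std::ios::binary);
        out << "x";
    }
    assert(canOpenForReading(path));
    std::remove(path.c_str());
    assert(!canOpenForReading(path));
}
```

If the check fails for the path passed on the command line, a mistyped file name, a wrong working directory, or missing read permission are the usual things to rule out first.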

 

Beese__Erin
Beginner

dj_alek wrote:

Several years ago I wrote a workaround for our applications. It is a system-wide synchronization mechanism with some built-in logic. It prevents parallel access to libmfx during "management" calls (MFXVideoENCODE_Init, MFXVideoENCODE_Close, etc.), but allows multiprocess/multithreaded use of the coding pipeline routines (MFXVideoENCODE_EncodeFrameAsync, MFXVideoCORE_SyncOperation, etc.). Since then our applications have worked reliably in 24/7/365/N mode on hundreds of computers.

Alek,

With which versions of Media SDK did you have success with this workaround? Also, did you stop seeing the exceptions/MFX_ERR_UNKNOWN, or did your application just recover properly after seeing the exceptions? I'm currently running 2019 R1 and I've tried your workaround, but I'm still seeing an exception after ~3 hours (I've only got 2 joined streams in my test). I even tried preventing parallel access to pipeline routines in addition to management calls.

Thanks,

Erin 

OTorg
New Contributor III

Hi Erin,

Beese, Erin wrote:
 

> With which versions of Media SDK did you have success with this workaround?
Any version works well (hw imsdk implementation). Version 1.25 is the one we use most often.

> Did you stop seeing the exceptions/MFX_ERR_UNKNOWN or did your application just recover properly after seeing the exceptions?
We don't get exceptions/MFX_ERR_UNKNOWN.

> I've tried your workaround but I'm still seeing an exception.
Perhaps your implementation or use of the workaround has mistakes; check it carefully.

Beese__Erin
Beginner

Hi Alek,

"Any version works well (hw imsdk implementation). Version 1.25 is the one we use most often."

Does this apply to the software implementation as well?

Also, in one of the old threads you mentioned the following:

"And I saw exceptions within even 1process/1session during recent tests. Both sw and hw implementations, imsdk 1.8/1.9."

If you are seeing exceptions with just one process/one session how is synchronization going to help with the issue?

Erin

OTorg
New Contributor III

Beese, Erin wrote:

Does this apply to the software implementation as well?

We haven't tested that use case carefully, so I can't give an answer, sorry.

 

Beese__Erin
Beginner

Oh I see, thanks.

OTorg
New Contributor III

Beese, Erin wrote:

"And I saw exceptions within even 1process/1session during recent tests. Both sw and hw implementations, imsdk 1.8/1.9."

If you are seeing exceptions with just one process/one session how is synchronization going to help with the issue?

That probably described a different problem, one that concerns early imsdk versions...
