Errors when using several encoding applications

OTorg
New Contributor III

Hi,

Customers often want to fully utilize processor resources for decoding and encoding live (real-time) video streams,
i.e. to run multiple Intel Media SDK (imsdk) applications, some of them using the GPU (hw) and some the CPU (sw).

It is normal for the number of applications to change from time to time.
An alternative approach is a single multithreaded application or service that changes the number of handled streams on the fly.
E.g. a computer transcodes 5 IPTV streams today and will transcode 7 tomorrow, and the 5 streams already running should not be interrupted while the 2 additional streams start.

But the Intel Media SDK library has a well-known bug that has not been fixed for years.
Starting or stopping an mfx session may cause errors inside other running sessions.
I think the reason is a lack of synchronization primitives somewhere inside the imsdk libraries.

I have seen such errors on different processors, different Windows versions, etc.
The issue can be easily reproduced using the standard imsdk samples.

Steps to reproduce the bug with the sw library on Windows:

1. Download the latest samples (today it is https://software.intel.com/sites/default/files/managed/61/d0/MediaSamples_MSDK_2017_8.0.24.271.msi).
2. Take \_bin\win32\sample_encode.exe from it.
3. Use the latest imsdk library (MediaSDK2019R1.exe, libmfxsw32.dll).
4. Take some uncompressed video file. I use this one, _input.nv12: https://drive.google.com/file/d/1z3O6iobsnPLzwQddXlTHoK1UzOIY9fJ3/view?usp=sharing
5. Download the two scripts attached to this message (sample_encode_1.bat and sample_encode_N.bat) and put them beside sample_encode.exe.
6. Start sample_encode_N.bat. It runs four sample_encode.exe instances in an infinite loop.
7. Wait several days (or less), and you will see error messages in the sample_encode consoles.
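
The attached scripts are not reproduced in this thread. As a rough sketch of the loop each instance presumably runs (repeat the encode until the first failure), something like the following could be used instead of a batch file. The function name `run_until_failure` and the `max_runs` bound are assumptions made for illustration; the real scripts loop forever.

```cpp
#include <cstdlib>
#include <iostream>
#include <string>

// Hypothetical helper, not part of the Media SDK samples.
// Runs `command` through the system shell up to `max_runs` times and stops
// at the first non-zero result. Returns the number of successful runs.
int run_until_failure(const std::string& command, int max_runs) {
    for (int run = 1; run <= max_runs; ++run) {
        const int status = std::system(command.c_str());
        if (status != 0) {
            std::cerr << "run " << run << " failed, status " << status << '\n';
            return run - 1;
        }
        std::cout << "run " << run << " ok\n";
    }
    return max_runs;
}
```

To mimic step 6, several copies of a program calling this helper would be started side by side, each with the sample_encode.exe command line taken from the attached script.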

This bug was already reported years ago, but it is still not fixed. Detailed discussions can be found here:
https://software.intel.com/en-us/forums/intel-media-sdk/topic/696953
https://software.intel.com/en-us/forums/intel-media-sdk/topic/475624
https://software.intel.com/en-us/forums/intel-media-sdk/topic/536840
 

Mark_L_Intel1
Moderator

Thanks,

I am running it now and hope I can reproduce it.

The samples and release you list are old; I am using the latest samples and release, Media SDK for Windows 2019 R1, so I skipped steps 1-3 from your list.

I will keep you updated.

Mark

Mark_L_Intel1
Moderator

Here is an update.

I ran the script for several hours and it was interrupted. It shows a memory allocation error, but I am not sure whether that is caused by the issue you reported or by Windows sleep mode.

I checked all of your posts and found that this error is not the same as those. I also noticed you referred to the library "libmfxsw32.dll"; I briefly checked my installation and could not find it. As far as I remember, we discontinued software codec support.

Anyway, here is the error message I got. Do you still want me to continue?

file 151 processed, go next
file 152 processed, go next

[ERROR], sts=MFX_ERR_MEMORY_ALLOC(-4), CEncodingPipeline::Run, MSDK_INVALID_SURF_IDX==nEncSurfIdx error at c:\bb\nnmsdkbaw05_1\build_windows_sw_lib\build_dir\repos\mdp_msdk-lib\samples\sample_encode\src\pipeline_encode.cpp:2053

[ERROR], sts=MFX_ERR_MEMORY_ALLOC(-4), wmain, pPipeline->Run failed at c:\bb\nnmsdkbaw05_1\build_windows_sw_lib\build_dir\repos\mdp_msdk-lib\samples\sample_encode\src\sample_encode.cpp:1522
error got from encode: -4
Press any key to continue . . .

Mark

OTorg
New Contributor III

Liu, Mark (Intel) wrote:

I checked all of your posts and found that this error is not the same as those.

Anyway, here is the error message I got. Do you still want me to continue?

The errors differ from run to run. This is one of the reasons I suspect the root of the problem is a lack of multithreading synchronization primitives inside libmfx*.dll. Run the test again and again, and you will see other errors occurring in different places.

Several years ago I wrote a workaround for our applications. It is a system-wide synchronization object with some built-in logic. It prevents parallel access to libmfx during "management" calls (MFXVideoENCODE_Init, MFXVideoENCODE_Close, etc.), but allows multiprocess/multithreaded use of the coding pipeline routines (MFXVideoENCODE_EncodeFrameAsync, MFXVideoCORE_SyncOperation, etc.). Since then our applications have run reliably in 24/7/365/N mode on hundreds of computers.
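
The actual workaround is not shown in this thread. One possible reading of the description is a reader-writer lock: management calls take it exclusively, pipeline calls take it shared. Below is a minimal in-process sketch of that pattern only. `MfxCallGate` is a hypothetical name, and a system-wide version as described above would need a named cross-process primitive rather than `std::shared_mutex`.

```cpp
#include <mutex>
#include <shared_mutex>
#include <utility>

// Hypothetical in-process gate; not part of the Media SDK API.
class MfxCallGate {
public:
    // For "management" calls such as MFXVideoENCODE_Init or
    // MFXVideoENCODE_Close: exclusive access, nothing else runs meanwhile.
    template <typename Fn>
    auto management(Fn&& fn) -> decltype(fn()) {
        std::unique_lock<std::shared_mutex> lock(mutex_);
        return std::forward<Fn>(fn)();
    }

    // For pipeline calls such as MFXVideoENCODE_EncodeFrameAsync or
    // MFXVideoCORE_SyncOperation: shared access, so pipeline calls may run
    // concurrently with each other but never alongside a management call.
    template <typename Fn>
    auto pipeline(Fn&& fn) -> decltype(fn()) {
        std::shared_lock<std::shared_mutex> lock(mutex_);
        return std::forward<Fn>(fn)();
    }

private:
    std::shared_mutex mutex_;
};
```

A caller would wrap each SDK call in a lambda, for example `gate.pipeline([&] { return MFXVideoCORE_SyncOperation(session, syncp, timeout); })`, where `gate`, `session`, `syncp` and `timeout` are the caller's own variables.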

Having such a patch, why am I raising the question again now? Because I suspect false MFX_ERR_GPU_HANG triggering, and I feel its source is also related to the lack of synchronization. For now this is only a suspicion; I have to do some research to confirm or refute it. I'll create a new forum topic if the suspicion is confirmed. In the meantime, it would be good if Intel developers paid attention to the synchronization problems.

In addition, I would like other people on planet Earth to be able to use imsdk normally in multi-application scenarios :)

 

OTorg
New Contributor III

Liu, Mark (Intel) wrote:

I also noticed you referred to the library "libmfxsw32.dll"; I briefly checked my installation and could not find it. As far as I remember, we discontinued software codec support.

Umm. Can you tell me more about the discontinuation of the software version?

Because the imsdk 2019 R1 release notes tell a different story:
- System Requirements: IA-32 or Intel 64 architecture ... for running software implementation...
- Known Limitations: ... is relevant for both software and hardware implementations ...

And MediaSDK2019R1.exe contains both libmfxsw32.dll and libmfxsw64.dll.

I realize that new coding features are absent from the software version, but a complete absence of implementation/support is something new to me. Or did I misunderstand you?

 

Mark_L_Intel1
Moderator

Yes, you are right, and this is my mistake. We do have libmfxsw32.dll, the software codec, under the <Media SDK root>/Software Development Kit/bin/win32 directory.

I am still trying the reproducer you provided. I started 4 threads overnight and saw them fail one by one at different times; by the time I left, only one was still running, and all the failures showed a sync error. Is this what you expected?

I will post the details when all threads are done.

 

OTorg
New Contributor III

Liu, Mark (Intel) wrote:
I will post the details when all threads are done.

The last application instance will (most likely) not fail, because there is no longer any competition/race on that machine.

OTorg
New Contributor III

Liu, Mark (Intel) wrote:
all the failures showed a sync error. Is this what you expected?

Do you mean messages like these?

[ERROR], sts=MFX_ERR_UNKNOWN(-1), CEncodingPipeline::GetFreeTask, m_TaskPool.SynchronizeFirstTask failed at src\pipeline_encode.cpp:1533
[ERROR], sts=MFX_ERR_UNKNOWN(-1), CEncodingPipeline::Run, m_pmfxENC->EncodeFrameAsync failed at src\pipeline_encode.cpp:1738
[ERROR], sts=MFX_ERR_UNKNOWN(-1), wmain, pPipeline->Run failed at src\sample_encode.cpp:1086

Yes, these are typical failures.
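
Because the failures vary from run to run, it can help to pull the status out of each console line and tally it. A minimal sketch, assuming only the `sts=NAME(code)` field format visible in the messages above; `MfxLogStatus` and `parse_status` are hypothetical names, not part of the samples.

```cpp
#include <optional>
#include <regex>
#include <string>

// Status field extracted from one sample console line.
struct MfxLogStatus {
    std::string name;  // e.g. "MFX_ERR_UNKNOWN"
    int code;          // e.g. -1
};

// Looks for "sts=NAME(code)" in a line; returns nothing for lines
// without such a field (for example progress messages).
std::optional<MfxLogStatus> parse_status(const std::string& line) {
    static const std::regex pattern(R"(sts=([A-Z0-9_]+)\((-?\d+)\))");
    std::smatch match;
    if (!std::regex_search(line, match, pattern)) {
        return std::nullopt;
    }
    return MfxLogStatus{match[1].str(), std::stoi(match[2].str())};
}
```

Feeding every console line through `parse_status` and counting by `name` gives a quick picture of which errors dominate across repeated runs.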

Mark_L_Intel1
Moderator

Hi,

I did a 3-day run with the script sample_encode_N.bat, with the following results:

  • It starts 4 threads, and 3 of them crashed on the first day (<12 hours) with the following error:
    file 86 processed, go next
    
    [ERROR], sts=MFX_ERR_UNKNOWN(-1), CEncTaskPool::SynchronizeFirstTask, SyncOperation failed at src\pipeline_encode.cpp:157
    
    [ERROR], sts=MFX_ERR_UNKNOWN(-1), CEncodingPipeline::GetFreeTask, m_TaskPool.SynchronizeFirstTask failed at src\pipeline_encode.cpp:1748
    
    [ERROR], sts=MFX_ERR_UNKNOWN(-1), CEncodingPipeline::Run, m_pmfxENC->EncodeFrameAsync failed at src\pipeline_encode.cpp:1961
    
    [ERROR], sts=MFX_ERR_UNKNOWN(-1), wmain, pPipeline->Run failed at src\sample_encode.cpp:1301
    error got from encode: -1
    Press any key to continue . . .
  • The last thread is still running.

I have attached a screen capture here; is this similar to yours?

I will submit a bug on this.

Mark

OTorg
New Contributor III

Hi,

Liu, Mark (Intel) wrote:
I have attached a screen capture here; is this similar to yours?

Yes, your errors are similar to mine.

They arise when using the sw version of libmfx*.dll. Both the 32-bit and 64-bit libraries have this problem.

Perhaps this bug is also the source of the MFX_ERR_GPU_HANG errors with the hw libraries. I'll describe how to reproduce the hw errors in the next post.

 

OTorg
New Contributor III

It is harder to reproduce the MFX_ERR_GPU_HANG issue using the imsdk samples. I tried to find a script that shows the errors faster.

Steps to reproduce MFX_ERR_GPU_HANG in an encoder application (32-bit or 64-bit):
1. Download more_tools_to_raise_gpu_hang.zip and unpack it to your working folder.
2. Run sample_decode_N_and_encode.bat and wait.
3. If you don't see errors within 10-20 minutes, close all sample_decode/sample_encode windows and go back to step 2.

In real life I have seen MFX_ERR_GPU_HANG occur during ordinary decoding/encoding work (without application start/stop). It was observed with Intel graphics driver versions 6194, 6323, 6373, and 7212 on i7-6700, i5-7500, i5-7260U, and E3-1585 v5. So it seems to be a common problem.

I also want to note that the real-life MFX_ERR_GPU_HANG occurrences were observed when CPU/GPU load was reasonably far from 100%, on machines with live (not file) media streams.
 

Mark_L_Intel1
Moderator

Thanks,

I have submitted the first issue.

Could you create a separate post for the GPU hang issue? You don't have to resubmit the data and description; just point back to this post.

They need to be debugged separately because I can't assume they are the same issue.

Mark

OTorg
New Contributor III

Hi Mark,

Liu, Mark (Intel) wrote:
Could you create a separate post for the GPU hang issue?

Done:

https://software.intel.com/en-us/forums/intel-media-sdk/topic/830266

Thanks!

Mark_L_Intel1
Moderator

Thanks, I have reproduced it; let's follow up on this issue in that post.

fei__liu
Beginner

Hi Mark, I am running sample_decode with Intel Media SDK 2019 and the following error occurred:

[ERROR], sts=MFX_ERR_NULL_PTR(-2), CSmplBitstreamReader::Init, m_fSource pointer is NULL at c:\users\admin\documents\intel? media sdk 2019 r1 - media samples 8.4.27.25\sample_common\src\sample_utils.cpp:596

[ERROR], sts=MFX_ERR_NULL_PTR(-2), CDecodingPipeline::Init, m_FileReader->Init failed at c:\users\admin\documents\intel? media sdk 2019 r1 - media samples 8.4.27.25\sample_decode\src\pipeline_decode.cpp:240

[ERROR], sts=MFX_ERR_NULL_PTR(-2), wmain, Pipeline.Init failed at c:\users\admin\documents\intel? media sdk 2019 r1 - media samples 8.4.27.25\sample_decode\src\sample_decode.cpp:686

Error in Source Code:

    // Initializing file reader
    totalBytesProcessed = 0;
    sts = m_FileReader->Init(pParams->strSrcFile);
    MSDK_CHECK_STATUS(sts, "m_FileReader->Init failed");
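
The first message, `m_fSource pointer is NULL`, suggests that the reader could not open the input file, most likely because the input path given on the command line is wrong or not readable. That is an interpretation of the log, not something confirmed in this thread. A minimal pre-flight check under that assumption could look like this; `can_open_for_reading` is a hypothetical helper, not part of the samples.

```cpp
#include <fstream>
#include <string>

// Hypothetical helper, not part of the Media SDK samples.
// Returns true if the file at `path` can be opened for binary reading.
bool can_open_for_reading(const std::string& path) {
    std::ifstream file(path, std::ios::binary);
    return file.is_open();
}
```

Calling it on the same path that is passed to sample_decode, before launching the sample, shows whether the file is reachable from the current working directory.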

 

Beese__Erin
Beginner

dj_alek wrote:

Several years ago I wrote a workaround for our applications. It is a system-wide synchronization object with some built-in logic. It prevents parallel access to libmfx during "management" calls (MFXVideoENCODE_Init, MFXVideoENCODE_Close, etc.), but allows multiprocess/multithreaded use of the coding pipeline routines (MFXVideoENCODE_EncodeFrameAsync, MFXVideoCORE_SyncOperation, etc.). Since then our applications have run reliably in 24/7/365/N mode on hundreds of computers.

Alek,

With which versions of Media SDK did this workaround succeed for you? Also, did you stop seeing the exceptions/MFX_ERR_UNKNOWN, or did your application just recover properly after seeing them? I'm currently running 2019 R1 and I've tried your workaround, but I'm still seeing an exception after ~3 hours (I've only got 2 joined streams in my test). I even tried preventing parallel access to the pipeline routines in addition to the management calls.

Thanks,

Erin 

OTorg
New Contributor III

Hi Erin,

Beese, Erin wrote:
 

> With which versions of Media SDK did this workaround succeed for you?
Any version works well (hw imsdk implementation). Version 1.25 is used most often.

> Did you stop seeing the exceptions/MFX_ERR_UNKNOWN, or did your application just recover properly after seeing them?
We don't get exceptions or MFX_ERR_UNKNOWN.

> I've tried your workaround, but I'm still seeing an exception.
Perhaps your implementation or usage of the workaround has inaccuracies; check it carefully.

Beese__Erin
Beginner

Hi Alek,

"Any version works well (hw imsdk implementation). Version 1.25 is used most often."

Does this apply to the software implementation as well?

Also, in one of the old threads you mentioned the following:

"And I saw exceptions within even 1process/1session during recent tests. Both sw and hw implementations, imsdk 1.8/1.9."

If you are seeing exceptions with just one process/one session how is synchronization going to help with the issue?

Erin

OTorg
New Contributor III

Beese, Erin wrote:

Does this apply to the software implementation as well?

We haven't carefully tested that use case, so I can't give an answer, sorry.

 

Beese__Erin
Beginner

Oh I see, thanks.

OTorg
New Contributor III

Beese, Erin wrote:

"And I saw exceptions within even 1process/1session during recent tests. Both sw and hw implementations, imsdk 1.8/1.9."

If you are seeing exceptions with just one process/one session how is synchronization going to help with the issue?

Probably a different problem, one concerning early imsdk versions, was described there...
