Hi,
Customers often wish to fully utilize processor resources for decoding/encoding (of live/realtime video streams).
That is, they run multiple imsdk applications: some of them use the gpu (hw), and some use the cpu (sw).
And it is normal for the number of applications to change from time to time.
Or, as an alternative approach: one (multithreaded) application/service that can change the number of handled streams on the fly.
E.g. a computer transcodes 5 iptv streams today and will transcode 7 streams tomorrow. And the 5 already-running streams shouldn't be interrupted when the 2 additional streams start.
But the Intel Media SDK library has a well-known bug that has not been fixed for years.
Starting or stopping an mfx session may cause errors inside other running sessions.
I think the reason is a lack of synchronization primitives somewhere inside the imsdk libraries.
I have seen such errors on different processors, different Windows versions, etc.
The issue can be easily reproduced using the standard imsdk samples.
Steps to reproduce the bug with the sw library on Windows:
1. Download the latest samples (today that is https://software.intel.com/sites/default/files/managed/61/d0/MediaSamples_MSDK_2017_8.0.24.271.msi).
2. Take \_bin\win32\sample_encode.exe from it.
3. Use the latest imsdk library (MediaSDK2019R1.exe, libmfxsw32.dll).
4. Take some uncompressed video file. I use this one, _input.nv12: https://drive.google.com/file/d/1z3O6iobsnPLzwQddXlTHoK1UzOIY9fJ3/view?usp=sharing
5. Download the two scripts (sample_encode_1.bat and sample_encode_N.bat) attached to this message and put them beside sample_encode.exe.
6. Start sample_encode_N.bat. It runs four sample_encode.exe instances in an infinite loop.
7. Wait several days (or less), and you'll see error messages in the sample_encode consoles.
This bug was already reported years ago, but it is still not fixed. Detailed discussions can be found here:
https://software.intel.com/en-us/forums/intel-media-sdk/topic/696953
https://software.intel.com/en-us/forums/intel-media-sdk/topic/475624
https://software.intel.com/en-us/forums/intel-media-sdk/topic/536840
Thanks,
I am doing it now and I hope I can reproduce it.
The sample and release you list are old; I am using the latest sample and release, Media SDK for Windows 2019R1, so I skipped steps 1-3 from your list.
I will keep you updated.
Mark
Here is an update.
I ran the script for several hours and it was interrupted. There is a memory allocation error message, but I am not sure whether it is caused by the issue you reported or by Windows sleep mode.
I checked all of your posts and found that the error is not the same as those. I also noticed you referred to the library "libmfxsw32.dll"; I briefly checked my installation and can't find it. As I remember, we discontinued software codec support.
Anyway, here is the error message I got. Do you still want me to continue?
file 151 processed, go next
file 152 processed, go next
[ERROR], sts=MFX_ERR_MEMORY_ALLOC(-4), CEncodingPipeline::Run, MSDK_INVALID_SURF_IDX==nEncSurfIdx error at c:\bb\nnmsdkbaw05_1\build_windows_sw_lib\build_dir\repos\mdp_msdk-lib\samples\sample_encode\src\pipeline_encode.cpp:2053
[ERROR], sts=MFX_ERR_MEMORY_ALLOC(-4), wmain, pPipeline->Run failed at c:\bb\nnmsdkbaw05_1\build_windows_sw_lib\build_dir\repos\mdp_msdk-lib\samples\sample_encode\src\sample_encode.cpp:1522
error got from encode: -4
Press any key to continue . . .
Mark
Liu, Mark (Intel) wrote: I checked all of your posts and found that the error is not the same as those.
Anyway, here is the error message I got. Do you still want me to continue?
The errors differ from run to run. This is one of the reasons why I suspect that the root of the problem is a lack of multithreading synchronization primitives inside libmfx*.dll. Run the test again and again, and you'll see other errors occurring in different places.
Several years ago I wrote a workaround for our applications. It is a system-wide synchronization object with some intelligence. It prevents parallel access to libmfx during "management" calls (MFXVideoENCODE_Init, MFXVideoENCODE_Close, etc), but allows multiprocess/multithreaded usage of the coding pipeline routines (MFXVideoENCODE_EncodeFrameAsync, MFXVideoCORE_SyncOperation, etc). Since then our applications have worked reliably in 24/7/365/N mode on hundreds of computers.
Having such a patch, why am I raising the question again now? Because I suspect false triggering of MFX_ERR_GPU_HANG, and I feel its source is also related to the lack of synchronization. But for now this is only a suspicion. I have to do some research to confirm or refute it. I'll create a new forum topic if the suspicion is confirmed. In the meantime, it would be good if Intel developers paid attention to the synchronization problems.
In addition, I would like other people on planet Earth to be able to use imsdk normally in multi-application scenarios :)
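Editor's sketch of the workaround described above, under stated assumptions. It reads "prevents parallel access during management calls, but allows parallel pipeline calls" as a reader-writer lock: management calls take the lock exclusively, pipeline calls take it shared. That reading is an assumption, and the poster's actual implementation is not shown in this thread. `MfxGate`, `run_session`, and the `fake_*` functions are hypothetical names, and the version below only coordinates threads inside one process; a system-wide variant would need an OS-level lock shared between processes.

```cpp
#include <cassert>
#include <mutex>
#include <shared_mutex>
#include <utility>

// Hypothetical helper; nothing here is part of Intel Media SDK.
// It only coordinates threads inside ONE process.
class MfxGate {
public:
    // "Management" calls (Init/Close style) run alone: no other gated
    // call may be in flight while one of them executes.
    template <class F>
    auto management(F&& call) -> decltype(call()) {
        std::unique_lock<std::shared_mutex> lock(mutex_);
        return std::forward<F>(call)();
    }

    // Pipeline calls (EncodeFrameAsync/SyncOperation style) may overlap
    // with each other, but never with a management call.
    template <class F>
    auto pipeline(F&& call) -> decltype(call()) {
        std::shared_lock<std::shared_mutex> lock(mutex_);
        return std::forward<F>(call)();
    }

private:
    std::shared_mutex mutex_;
};

// Stand-ins for the real SDK calls so the sketch builds without the SDK.
// Each returns 0, mirroring MFX_ERR_NONE.
inline int fake_encode_init()  { return 0; }
inline int fake_encode_frame() { return 0; }
inline int fake_encode_close() { return 0; }

// One encoding session: gated init, N gated frames, gated close.
// Returns the first non-zero status, or 0 if everything succeeded.
inline int run_session(MfxGate& gate, int frames) {
    assert(frames >= 0);
    int status = gate.management(fake_encode_init);
    for (int i = 0; i < frames && status == 0; ++i) {
        status = gate.pipeline(fake_encode_frame);
    }
    const int close_status = gate.management(fake_encode_close);
    return status != 0 ? status : close_status;
}
```

In a real application each `fake_*` call would be replaced by a lambda wrapping the corresponding SDK call, with one `MfxGate` shared by all sessions. A gated call must not invoke the gate again from inside itself, because `std::shared_mutex` is not recursive.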
Liu, Mark (Intel) wrote: I also noticed you referred to the library "libmfxsw32.dll"; I briefly checked my installation and can't find it. As I remember, we discontinued software codec support.
Umm. Can you tell me more about the discontinuation of the software version?
Because the imsdk 2019 R1 release notes tell a different story:
- System Requirements: IA-32 or Intel 64 architecture ... for running software implementation...
- Known Limitations: ... is relevant for both software and hardware implementations ...
And MediaSDK2019R1.exe contains both libmfxsw32.dll and libmfxsw64.dll.
I realize that new coding features are absent from the software version, but a complete absence of implementation/support is something new to me. Or did I misunderstand you?
Yes, you are right and this is my fault. We have libmfxsw32.dll, which is under the <Media SDK root>/Software Development Kit/bin/win32 directory; it is the software codec.
I am still trying the reproducer you provided. I started 4 threads overnight and I can see them fail one by one at different times; by the time I left, only one was still running, and all the failures had a sync error. Is this what you expected?
I will update the details when all threads are done.
Liu, Mark (Intel) wrote:
I will update the details when all threads are done.
The last application instance will (most likely) not fail, because there is no more competition/racing on that machine.
Liu, Mark (Intel) wrote:
all the failures had a sync error. Is this what you expected?
Do you mean messages like these?
[ERROR], sts=MFX_ERR_UNKNOWN(-1), CEncodingPipeline::GetFreeTask, m_TaskPool.SynchronizeFirstTask failed at src\pipeline_encode.cpp:1533
[ERROR], sts=MFX_ERR_UNKNOWN(-1), CEncodingPipeline::Run, m_pmfxENC->EncodeFrameAsync failed at src\pipeline_encode.cpp:1738
[ERROR], sts=MFX_ERR_UNKNOWN(-1), wmain, pPipeline->Run failed at src\sample_encode.cpp:1086
Yes, these are typical failures.
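Editor's sketch: since the failures vary from run to run, it can help to count which status appears in which function across many captured console logs. The snippet below is a minimal tally, assuming the `[ERROR], sts=NAME(code), Function, ...` record shape seen in the messages quoted in this thread; that shape is inferred from those messages, not from a documented log format, and `tally_errors` is a hypothetical name.

```cpp
#include <cassert>
#include <map>
#include <regex>
#include <string>

// Counts "sts=NAME(code), Function," records found in console text.
// Key format: "NAME(code) in Function".
inline std::map<std::string, int> tally_errors(const std::string& text) {
    // Pattern inferred from the sample output quoted in this thread.
    static const std::regex record(R"(sts=([A-Z_]+)\((-?\d+)\),\s*([^,]+),)");
    std::map<std::string, int> counts;
    const std::sregex_iterator end;
    for (std::sregex_iterator it(text.begin(), text.end(), record); it != end; ++it) {
        const std::smatch& m = *it;
        assert(m.size() == 4);  // whole match plus three capture groups
        ++counts[m[1].str() + "(" + m[2].str() + ") in " + m[3].str()];
    }
    return counts;
}
```

Feeding it the saved console output of each failed instance gives a quick picture of whether the failures cluster in one call or are spread across the pipeline.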
Hi,
I had a 3-day run with the script sample_encode_N.bat, with the following results:
- It starts 4 threads, and 3 of them crashed on the first day (<12 hours) with the following error:
file 86 processed, go next
[ERROR], sts=MFX_ERR_UNKNOWN(-1), CEncTaskPool::SynchronizeFirstTask, SyncOperation failed at src\pipeline_encode.cpp:157
[ERROR], sts=MFX_ERR_UNKNOWN(-1), CEncodingPipeline::GetFreeTask, m_TaskPool.SynchronizeFirstTask failed at src\pipeline_encode.cpp:1748
[ERROR], sts=MFX_ERR_UNKNOWN(-1), CEncodingPipeline::Run, m_pmfxENC->EncodeFrameAsync failed at src\pipeline_encode.cpp:1961
[ERROR], sts=MFX_ERR_UNKNOWN(-1), wmain, pPipeline->Run failed at src\sample_encode.cpp:1301
error got from encode: -1
Press any key to continue . . .
The last thread is still running.
I have attached a screen capture here; is it similar to yours?
I will submit a bug on this.
Mark
Hi,
Liu, Mark (Intel) wrote:
I have attached a screen capture here; is it similar to yours?
Yes, your errors are similar to mine.
They arise when using the sw version of libmfx*.dll. Both the 32-bit and 64-bit libraries have that problem.
And perhaps that bug is also the source of the MFX_ERR_GPU_HANG errors in the hw libraries. I'll describe how to reproduce the hw errors in the next post.
It is harder to reproduce the MFX_ERR_GPU_HANG issue using the imsdk samples. I tried to find a script that shows the errors faster.
Steps to reproduce MFX_ERR_GPU_HANG in the encoder application (32-bit or 64-bit):
1. Download more_tools_to_raise_gpu_hang.zip and unpack it to your working folder.
2. Run sample_decode_N_and_encode.bat and wait.
3. If you don't see errors within 10-20 minutes, then close all sample_decode/sample_encode windows and go to step 2.
In real life I saw MFX_ERR_GPU_HANG occur during ordinary decoding/encoding work (without application start/stop). It was observed with Intel graphics driver versions 6194, 6323, 6373, 7212 on i7-6700, i5-7500, i5-7260u, e3-1585-v5. So it seems to be a common problem.
And I want to note that the real-life MFX_ERR_GPU_HANG occurrences were observed when cpu/gpu load was well below 100%, on machines with live (not file) media streams.
Thanks,
I have submitted the first issue.
Could you submit a separate post for the GPU hang issue? You don't have to resubmit the data and description; just point back to this post.
We need them to be debugged separately because I can't assume they are the same issue.
Mark
Hi Mark,
Liu, Mark (Intel) wrote:
Could you submit a different post for GPU hang issue?
Done:
https://software.intel.com/en-us/forums/intel-media-sdk/topic/830266
Thanks!
dj_alek wrote: Hi Mark,
Quote:
Liu, Mark (Intel) wrote:
Could you submit a different post for GPU hang issue?
Done:
https://software.intel.com/en-us/forums/intel-media-sdk/topic/830266
Thanks!
Thanks, I have reproduced it; let's follow up on this issue in that post.
Hi Mark, when I run sample_decode with Intel Media SDK 2019, the following error occurs:
[ERROR], sts=MFX_ERR_NULL_PTR(-2), CSmplBitstreamReader::Init, m_fSource pointer is NULL at c:\users\admin\documents\intel? media sdk 2019 r1 - media samples 8.4.27.25\sample_common\src\sample_utils.cpp:596
[ERROR], sts=MFX_ERR_NULL_PTR(-2), CDecodingPipeline::Init, m_FileReader->Init failed at c:\users\admin\documents\intel? media sdk 2019 r1 - media samples 8.4.27.25\sample_decode\src\pipeline_decode.cpp:240
[ERROR], sts=MFX_ERR_NULL_PTR(-2), wmain, Pipeline.Init failed at c:\users\admin\documents\intel? media sdk 2019 r1 - media samples 8.4.27.25\sample_decode\src\sample_decode.cpp:686
Error in Source Code:
// Initializing file reader
totalBytesProcessed = 0;
sts = m_FileReader->Init(pParams->strSrcFile);
MSDK_CHECK_STATUS(sts, "m_FileReader->Init failed");
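Editor's note with a sketch: the trace reports that `m_fSource` is NULL inside `CSmplBitstreamReader::Init`, which most likely means the input file could not be opened (for example, a wrong or missing path after `-i`), rather than a failure inside the SDK itself. A pre-flight check such as the one below can confirm that before the pipeline is initialized. `can_open_for_read` is a hypothetical helper, and it takes a narrow-character path for simplicity, whereas the Windows samples use wide strings.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Returns true if `path` is non-empty and can be opened for binary reading.
// Hypothetical helper; not part of the Media SDK samples.
inline bool can_open_for_read(const std::string& path) {
    if (path.empty()) {
        return false;  // no input file was given at all
    }
    std::FILE* file = std::fopen(path.c_str(), "rb");
    if (file == nullptr) {
        return false;  // wrong path, missing file, or no read permission
    }
    std::fclose(file);
    return true;
}
```

If this check fails for the path given on the command line, the fix is in the arguments or the working directory, not in the sample code.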
dj_alek wrote: Several years ago I wrote a workaround for our applications. It is a system-wide synchronization object with some intelligence. It prevents parallel access to libmfx during "management" calls (MFXVideoENCODE_Init, MFXVideoENCODE_Close, etc), but allows multiprocess/multithreaded usage of the coding pipeline routines (MFXVideoENCODE_EncodeFrameAsync, MFXVideoCORE_SyncOperation, etc). Since then our applications have worked reliably in 24/7/365/N mode on hundreds of computers.
Alek,
With which versions of Media SDK did this workaround succeed for you? Also, did you stop seeing the exceptions/MFX_ERR_UNKNOWN, or did your application just recover properly after seeing the exceptions? I'm currently running 2019 R1 and I've tried your workaround, but I'm still seeing an exception after ~3 hours (I've only got 2 joined streams in my test). I even tried preventing parallel access to pipeline routines in addition to management calls.
Thanks,
Erin
Hi Erin,
Beese, Erin wrote:
> With which versions of Media SDK did this workaround succeed for you?
Any version works well (hw imsdk implementation). Version 1.25 is used most often.
> Did you stop seeing the exceptions/MFX_ERR_UNKNOWN or did your application just recover properly after seeing the exceptions?
We don't get exceptions/MFX_ERR_UNKNOWN.
> I've tried your workaround but I'm still seeing an exception.
Perhaps your implementation or usage of the workaround has inaccuracies; check it carefully.
Hi Alek,
"Any version works well (hw imsdk implementation). Version 1.25 is used most often."
Does this apply to the software implementation as well?
Also, in one of the old threads you mentioned the following:
"And I saw exceptions within even 1process/1session during recent tests. Both sw and hw implementations, imsdk 1.8/1.9."
If you are seeing exceptions with just one process/one session, how is synchronization going to help with the issue?
Erin
Beese, Erin wrote: Does this apply to the software implementation as well?
We haven't carefully tested such a use case, so I can't give an answer, sorry.
Oh I see, thanks.
Beese, Erin wrote: "And I saw exceptions within even 1process/1session during recent tests. Both sw and hw implementations, imsdk 1.8/1.9."
If you are seeing exceptions with just one process/one session, how is synchronization going to help with the issue?
Probably a different problem was described there, one that concerns early imsdk versions...