Media (Intel® Video Processing Library, Intel Media SDK)

Exception at encoding core, MFXVideoCORE_SyncOperation permanently returns MFX_ERR_UNKNOWN after it

OTorg
New Contributor III

Application pseudo code:
1. create session
2. create h264 encoder on that session, init it, prepare buffers
3. encode frames from raw nv12-file
4. flush encoder, wait for last output frame
5. release encoder and all resources, except session
6. goto 2

The input file and encoding parameters are the same on every cycle. Debug build, x86. MFX_IMPL_SOFTWARE only. imsdk 1.7.
Some time after the application starts (anywhere from several minutes to several days, depending on the run), I get this in the debugger output:
"First-chance exception at 0x595798f3 in imsdk_h264enc.exe: 0xC0000005: Access violation reading location 0x000000c0."
After that, MFXVideoCORE_SyncOperation returns MFX_ERR_UNKNOWN, and every subsequent MFXVideoCORE_SyncOperation call (with the same arguments) returns MFX_ERR_UNKNOWN as well.

Exception point details:
C:\PLANG\Intel\Media SDK 2013 R2\bin\win32\libmfxsw32.dll, offset 0x001988F3 from section ".text" begin
595798D7 mov ecx,dword ptr [edi+1D34h]
595798DD push edx
595798DE lea edx,[ecx+eax*2]
595798E1 mov eax,dword ptr [esi+20h]
595798E4 lea ecx,[eax+edx*2]
595798E7 mov edx,dword ptr [edi+34h]
595798EA mov ecx,dword ptr [edx+ecx*4]
595798ED mov eax,dword ptr [edi+1B94h]
595798F3 add ecx,dword ptr [eax+0C0h] *********** it is here *********
595798F9 mov edx,dword ptr [esi+3Ch]
595798FC push ecx
595798FD push edx
595798FE push edi
595798FF lea ecx,[ebp+0Bh]
59579902 mov edx,esi
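
The section-relative offset is the number that stays meaningful if the DLL is loaded at a different address in another run. As a minimal sketch of that arithmetic (the function and constant names are illustrative and not part of any SDK), the absolute address from the debugger and the offset from the start of ".text" relate like this:

```cpp
#include <cstdint>

// Start of a section, given an absolute address inside it and that
// address's offset from the section start.
constexpr std::uint32_t section_start(std::uint32_t absolute_address,
                                      std::uint32_t offset_in_section) {
    return absolute_address - offset_in_section;
}

// Absolute address of the same instruction when the section starts elsewhere
// (for example, if the DLL is relocated in another run).
constexpr std::uint32_t rebase(std::uint32_t new_section_start,
                               std::uint32_t offset_in_section) {
    return new_section_start + offset_in_section;
}

// Values from the post: the faulting instruction and its offset in ".text".
constexpr std::uint32_t kFaultAddress = 0x595798F3u;
constexpr std::uint32_t kTextOffset   = 0x001988F3u;

// In this particular run, ".text" of libmfxsw32.dll started at 0x593E1000.
static_assert(section_start(kFaultAddress, kTextOffset) == 0x593E1000u,
              "section start derived from the two numbers in the post");
```

If the DLL lands elsewhere, `rebase(new_start, kTextOffset)` gives the address of the same instruction; `new_start` would have to be read from the debugger's module and section listing.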

PS:
I've just realized there is a small chance that the problem is caused by a CRT version conflict (I got "LINK : warning LNK4098: defaultlib 'MSVCRT' conflicts with use of other libs" at build time, because the application is a debug build).
I'll make my own mfx_dispatch build with the same settings as the application (debug/mt-dll-crt/vs2010), use it to rebuild the application, and run the tests again.
I'll post the results here in a couple of days.

1 Solution
Petter_L_Intel
Employee

Hi dj_alek,

I've asked my colleague to get back to you on this, since he explored this question at an earlier stage.

Regarding the SW DLL exception issue: we plan to resolve it in Media SDK 2014, which will be released at the end of this year.

Regards,
Petter

69 Replies
celli4
New Contributor I

I doubt this is your problem, but it's worth mentioning. I moved the hard disk from an Ivy Bridge system to a Haswell system, and I believe that because of this I got infrequent, intermittent crashes in the IMSDK. In my situation, upgrading to the latest driver for the right architecture seemed to fix the problem.

OTorg
New Contributor III

camkego wrote:
upgrading to latest driver for the right architecture seemed to fix the problem

Thanks, but: MFX_IMPL_SOFTWARE only + libmfxsw32.dll shipped with imsdk 1.7 (latest). So, nothing to update...

OTorg
New Contributor III

Got the exception even without recreating CEncoder instances.
I've attached a simpler test application (a single long-running H.264 encode).

Bernard
Valued Contributor I

Sometimes an access violation can be caused by calling exports of an unmapped DLL. Can you upload a dump file?

OTorg
New Contributor III

iliyapolak wrote:
Sometimes an access violation can be caused by calling exports of an unmapped DLL. Can you upload a dump file?

mov eax, dword ptr [edi+1B94h]
add ecx, dword ptr [eax+0C0h] *** AV here, with eax=00000000 ***

I don't think it is an unmapped DLL; it looks more like sloppy code.
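
The reported "reading location 0x000000c0" is consistent with eax=0 in the second instruction: a null base plus the displacement 0C0h. A minimal sketch of that arithmetic follows; `HypotheticalContext`, `read_address` and `try_read_value` are made-up names for illustration, and the real structure layout inside libmfxsw32.dll is not known from this thread.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical layout, for illustration only: some object whose 32-bit
// field happens to sit 0xC0 bytes from the start of the structure.
struct HypotheticalContext {
    std::uint8_t  other_fields[0xC0]; // 192 bytes of unrelated data
    std::uint32_t value;              // the field read via [eax+0C0h]
};

// Address touched when reading `value` through a given base pointer value.
constexpr std::uintptr_t read_address(std::uintptr_t base) {
    return base + offsetof(HypotheticalContext, value);
}

// A null base pointer turns the read into an access at address 0xC0.
static_assert(read_address(0) == 0xC0, "null base + 0xC0 displacement");

// Defensive variant: report failure instead of dereferencing a null base.
inline bool try_read_value(const HypotheticalContext* ctx, std::uint32_t& out) {
    if (ctx == nullptr) {
        return false;
    }
    out = ctx->value;
    return true;
}
```

This only shows why the faulting address is so small; it says nothing about why the pointer loaded from [edi+1B94h] was null in the first place.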

Dump attached.

PS:
You can examine these exceptions yourself. Put the files _input.nv12, imsdk_h264enc.exe and libmfxsw32.dll in one folder (download links: https://docs.google.com/file/d/0B8SCkOT4os4HNXVHdHpFT0gzYkk/edit?usp=sharing and https://drive.google.com/file/d/0B8SCkOT4os4HTHVacmllbHFuUG8/edit?usp=sharing). And run imsdk_h264enc.exe under windbg...

Zach_J_1
Beginner

Intel mods, FYI I have a similar issue as well with a separate application:  http://software.intel.com/en-us/forums/topic/485411

I will be watching this closely!  Thanks.

OTorg
New Contributor III

And here are the test results from other machines. The i7-4770 and the i5-3330 both hit an AV, at exception locations that differ (see attachment) from the one previously published here for the i7-3770 (or we simply have not hit those yet on the i7-3770). In addition, the i7-4770 twice ended up with MFXVideoCORE_SyncOperation returning an error indefinitely, without any exception.

I would like to draw your attention to this once again: doing other work on the computer while the test is running makes the exception occur sooner.

Zach_J_1
Beginner

After looking into my issue a bit more, I realized that I am having the same exact issue as this post.  The sync operation fails with MFX_ERR_UNKNOWN.

Bernard
Valued Contributor I

Hi dj_alek,

I will look at the dump files tomorrow. I was busy trying to understand the kernel dump files of a HAXM BSOD.

Petter_L_Intel
Employee

Hi,

I performed numerous runs (one of them 24h long) using the second code drop (imsdk_h264enc_v2.zip) you provided, but so far no errors or exceptions.

We will explore the new info you provided and execute some more runs with other workloads being executed concurrently.

Regards,
Petter 

OTorg
New Contributor III

Hi Petter

Got any results?

Bernard
Valued Contributor I

>>>mov eax, dword ptr [edi+1B94h]
add ecx, dword ptr [eax+0C0h] *** AV here, with eax=00000000 ***

I don't think it is unmapped DLL, it is more like an inaccurate code.>>>

Sorry for the late answer. I was busy with programming projects and the HAXM BSOD.

Yes, I agree that the example above is not related to an unmapped DLL, but this snippet of code could be.

558cf459 752f jne libmfxsw32!MFXVideoVPP_GetVPPStat+0x19191a (558cf48a)
558cf45b 0fb708 movzx ecx,word ptr [eax] ds:002b:05cabfa0=???? *********** it is here *********

Bernard
Valued Contributor I

Hi dj_alek

Your minidump file is too small. Can you collect a full dump? I need to see all the DLLs loaded and referenced by libmfxsw32.dll. Please use the .dump /ma command in windbg.


Petter_L_Intel
Employee

Hi dj_alek,

Today I encountered the issue "First-chance exception . 0xC0000005: Access violation reading location 0x000000c0" for the first time, after running the workload in several 10-20 hour sessions. Unfortunately, I was not able to capture any deeper debug info due to some config issues with the machine I'm using.

Some observations however:

- I'm only able to hit the error state if I run the Release build of the workload. Despite 50h+ runs, I never encountered the issue in the Debug build.

- As you said, there seems to be a relation to whether the CPU is busy with other tasks.

- During one run I did encounter a state where SyncOperation() returned MFX_WRN_IN_EXECUTION indefinitely, with no exception thrown. I'm trying to reproduce this case to find the root cause. BTW, on this topic: have you tried changing the time delay value you use with the SyncOperation() call? Does setting that value to, for instance, 1000 instead of 10 make any difference?
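
For the "in execution indefinitely" case, one option is to bound the number of sync attempts so the application can tell "still running" apart from a hard error and from a hang. The following is only a sketch under stated assumptions: `Status`, `WaitResult` and `wait_for_frame` are hypothetical stand-ins (the real status codes are the `mfxStatus` values such as MFX_WRN_IN_EXECUTION and MFX_ERR_UNKNOWN), and `sync` stands for whatever wrapper the application puts around the SyncOperation() call.

```cpp
#include <functional>

// Stand-ins for the status codes discussed in this thread.
enum class Status { Ok, InExecution, Unknown };

// Outcome reported to the caller of the bounded wait.
enum class WaitResult { Done, TimedOut, Failed };

// Calls `sync(wait_ms)` until it stops reporting "in execution" or the
// attempt budget is used up. `wait_ms` is the per-call wait value
// (for example 10 or 1000, as discussed above).
inline WaitResult wait_for_frame(const std::function<Status(unsigned)>& sync,
                                 unsigned wait_ms,
                                 unsigned max_attempts) {
    for (unsigned attempt = 0; attempt < max_attempts; ++attempt) {
        switch (sync(wait_ms)) {
            case Status::Ok:
                return WaitResult::Done;    // output frame is ready
            case Status::InExecution:
                break;                      // not finished yet, poll again
            case Status::Unknown:
                return WaitResult::Failed;  // hard error, retrying is pointless
        }
    }
    return WaitResult::TimedOut;            // never completed within the budget
}
```

With a budget like this, the total wait is at most roughly `wait_ms * max_attempts` milliseconds, so a larger per-call value needs a correspondingly smaller attempt count to keep the same overall limit.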

I will try to reproduce the issue again shortly.

Regards,
Petter 

OTorg
New Contributor III

>> Today I encountered the issue
At last! Hurrah!

>> I never encountered the issue in Debug build
When I first stumbled upon the issue, it was precisely with a debug build of the test application. Did you also run the debug version under concurrent workloads in your tests?

>> Have you tried changing the time delay value
No, I haven't. I'll try 1000.

OTorg
New Contributor III

iliyapolak wrote:
Please use .dump /ma command in windbg.

I don't think it's worth rummaging through the dumps as long as the developers were able to catch the problem. Thanks, Bernard.

Bernard
Valued Contributor I

dj_alek wrote:

Quote:

iliyapolak wrote:Please use .dump /ma command in windbg.

I don't think it's worth rummaging through the dumps as long as the developers were able to catch the problem. Thanks, Bernard.

You are welcome.

Bernard
Valued Contributor I

dj_alek wrote:

I noticed that if I'm doing some other work on the computer concurrently with the test, then exception appears earlier.

What workload exactly?

OTorg
New Contributor III

>> What workload exactly?

Any. Text editor, visual studio edit/build, photo editor, browser, torrent.
There was a comical situation on one machine. I ran the test overnight. The next morning I looked at windbg/procexp: no exceptions, CPU load high, all OK. Then I clicked the Windows Start menu button, and the exception occurred at that very moment.

Bernard
Valued Contributor I

>>> CPU load is high - ok. Then clicked on the windows start menu button - and exception occured at that moment.>>>

I agree that this is a very strange situation. I think that, to investigate the issue properly, a kernel debugger must be hooked up to the target (debuggee). In the case of the clicked button, I would suspect KeSwapContext, and maybe any function related to context switching, of not cleaning up registers when some user-mode thread's stack is swapped out. Somehow the old context is preserved, and when control is returned and a libmfxsw32.dll function is called, the exception occurs. It could also be something different.

OTorg
New Contributor III

iliyapolak wrote:
... kernel debugger ... KeSwapContext ... cleaning up registers ... 

I think it is much more prosaic: the error lies somewhere in the libmfxsw32 code. My bet is on sloppy critical section usage :)
