Media (Intel® Video Processing Library, Intel Media SDK)

Slow H.264 HW decoding on a system with better HW

Ramashankar
New Contributor III

Hi,

I developed an H.264 decoder application using Intel(R)_Media_SDK_2016.0.1 and Intel Media Samples 6.0.0.68. I am facing a strange issue: decoding is slower on the system with better hardware. I have two test machines, Sys1 and Sys2. The configuration (System Analyzer logs) of both is attached below.

Sys1 has better hardware than Sys2 in every respect (graphics card and CPU), yet decoding on Sys1 takes around 50 ms per frame, whereas on Sys2 it takes around 20 ms per frame. I am running the same custom application on both systems. For reference, I am attaching part of the developer logs for both systems (read time is the time to read a frame over the network; decodetime is the frame decoding time, i.e. DecodeFrameAsync() + SyncOutputSurface()).

In my application I receive an H.264 encoded stream from an Android mobile device. The stream contains only I and P slices; there are no B slices at all. On the receiver side (Sys1 and Sys2) I use Intel hardware acceleration to decode the stream, but the performance differs greatly between the two.

I compared other configuration details (DirectX version, Windows updates, etc.) on both systems, but the only difference I noticed is that Sys1 runs Microsoft Windows Embedded Standard (a flavour of Windows 7) and Sys2 runs Microsoft Windows 7 Professional.

So can you please suggest what could be causing slower decoding on the better hardware?

Note: I noticed that the SyncOutputSurface() => m_mfxSession.SyncOperation() part is the most time-consuming on Sys1. I am also attaching my customized version of the Decode function block. It is supposed to take one encoded frame as input and produce one decoded frame (if available) per call. The VPP block is used only if the decoder's output is needed in BGRA format.
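To make the split between submission and synchronization visible in the developer logs, the two stages can be timed separately. The following is only a minimal sketch, not the attached Decode function: `StageTimes`, `timeMs`, and `timeDecodeStep` are hypothetical helper names, and the two callables stand in for the application's own DecodeFrameAsync() and SyncOutputSurface() calls.

```cpp
#include <cassert>
#include <chrono>
#include <functional>

// Hypothetical container for the two per-frame timings (milliseconds).
struct StageTimes {
    double submitMs;  // time spent in the submit call (e.g. DecodeFrameAsync)
    double syncMs;    // time spent waiting for the result (e.g. SyncOperation)
};

// Runs one callable and returns the elapsed wall-clock time in milliseconds.
inline double timeMs(const std::function<void()>& fn) {
    const auto t0 = std::chrono::steady_clock::now();
    fn();
    const auto t1 = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(t1 - t0).count();
}

// Times the submit stage and the sync stage of one decode step separately.
// Both callables are placeholders for the application's real calls.
inline StageTimes timeDecodeStep(const std::function<void()>& submit,
                                 const std::function<void()>& sync) {
    StageTimes t{};
    t.submitMs = timeMs(submit);  // runs first
    t.syncMs = timeMs(sync);      // runs second
    return t;
}
```

Logging both numbers per frame on Sys1 and Sys2 would show whether the extra time is really spent waiting in the sync call or already in the submit call.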

Thanks,

(PS: This is my fourth attempt at posting this on the Media SDK forum; I don't know whether it will go through this time.)

Ramashankar
New Contributor III

All attachments are now included in this thread.

Roman_T_
New Contributor I

Hi Ramashankar,

I can't say definitely that this is the source of the problem, but in your logs I see that the slow system sys2 works with Media SDK version 1.4 and the fast system sys1 works with Media SDK version 1.16.

Please check whether sys1 has the latest version of the Intel GPU driver for Windows Embedded.

Best regards,
Roman

Ramashankar
New Contributor III

Hi Roman,

Yes, sys1 has the latest available version of the Intel GPU driver (10.18.14.4332) installed; you can check this in the Sys1_SystemAnalyzer.txt file attached above.

Thanks,

Jeffrey_M_Intel1
Employee

Hi Ramashankar,

One approach you can try is to model your pipeline with the Media SDK sample applications.

If your final goal is transcoding, it may make sense to start with sample_multi_transcode instead of looking at decode, VPP, and encode separately, since multiple hardware blocks can run concurrently.

If you just want to look at decode+VPP, you can start with sample_decvpp:

sample_decvpp h264 -rgb4 -d3d -async 4 -hw -i test_sequence.h264
 

Or you could just look at decode itself:

sample_decode h264 -hw -d3d -async 4 -i test_sequence.h264

These two command lines don't include an output so you can separate hardware performance from disk I/O.  If there is a big difference between sample performance and your code then you can start looking at what your application is doing.  On the other hand, if you see the same behavior with the samples then it is more likely to be a configuration issue.  
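The `-async 4` option in the command lines above lets the samples keep several operations in flight before waiting on the oldest one. An application that syncs immediately after every submit behaves like a depth of 1. Below is a rough sketch of the queueing idea only, under stated assumptions: `InFlightQueue` and `SyncPoint` are hypothetical names (real code would hold an `mfxSyncPoint` together with its output surface), and the `sync` callback stands in for the SyncOperation() call.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical stand-in for a Media SDK sync point.
using SyncPoint = int;

// Keeps up to `depth` submitted operations pending and waits on the oldest
// one only when the queue is full.
class InFlightQueue {
public:
    InFlightQueue(std::size_t depth, std::function<void(SyncPoint)> sync)
        : depth_(depth), sync_(std::move(sync)) {
        assert(depth_ > 0);
    }

    // Call after each successful asynchronous submit.
    void submit(SyncPoint sp) {
        if (pending_.size() == depth_) {
            syncOldest();  // make room by waiting on the oldest operation
        }
        pending_.push_back(sp);
    }

    // Call at end of stream to wait for everything still pending.
    void drain() {
        while (!pending_.empty()) {
            syncOldest();
        }
    }

    std::size_t inFlight() const { return pending_.size(); }

private:
    void syncOldest() {
        const SyncPoint sp = pending_.front();
        pending_.pop_front();
        sync_(sp);  // placeholder for the blocking wait
    }

    std::size_t depth_;
    std::function<void(SyncPoint)> sync_;
    std::deque<SyncPoint> pending_;
};
```

With a depth of 4, the first blocking wait happens only when the fifth frame is submitted, which trades a few frames of latency for throughput. Whether that trade suits a live-rendering application is a separate design decision.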

Do you see the same behavior ratios running the samples without outputs?

Ramashankar
New Contributor III

Hi Jeffrey,

My final goal is just to decode this H.264 stream and render it locally on the system.

I tried the sample decode application, and it decoded an H.264 file very fast on system1. That leads me to think there is an issue in my implementation. At the same time, my implementation works very well on system2, so I am not sure where the issue is.

One more strange fact: if system1 receives an H.264 stream from any machine other than that Android mobile, the stream is also decoded very fast on system1. I checked the stream from the Android mobile and observed that it contains only I and P frames; there are no B frames in the stream (not sure whether that plays any role in this issue).
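For anyone who wants to repeat the I/P/B check on a captured stream, here is a small sketch that counts slice types. It assumes an Annex B byte stream (start codes 00 00 01), reads only the first two Exp-Golomb fields of each slice header (first_mb_in_slice and slice_type), and ignores emulation-prevention bytes, so treat it as a rough diagnostic rather than a full parser. `BitReader`, `SliceCounts`, and `countSliceTypes` are names made up for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Minimal MSB-first bit reader over a byte buffer.
class BitReader {
public:
    BitReader(const std::uint8_t* data, std::size_t size)
        : data_(data), size_(size) {}

    bool readBit(unsigned& bit) {
        if (pos_ >= size_ * 8) return false;  // out of data
        bit = (data_[pos_ / 8] >> (7 - pos_ % 8)) & 1u;
        ++pos_;
        return true;
    }

    // Unsigned Exp-Golomb code, ue(v) in the H.264 syntax tables.
    bool readUe(unsigned& value) {
        int zeros = 0;
        unsigned bit = 0;
        for (;;) {
            if (!readBit(bit)) return false;
            if (bit) break;
            if (++zeros > 31) return false;  // malformed or unsupported
        }
        unsigned suffix = 0;
        for (int i = 0; i < zeros; ++i) {
            if (!readBit(bit)) return false;
            suffix = (suffix << 1) | bit;
        }
        value = ((1u << zeros) - 1u) + suffix;
        return true;
    }

private:
    const std::uint8_t* data_;
    std::size_t size_;
    std::size_t pos_ = 0;
};

struct SliceCounts { int i = 0; int p = 0; int b = 0; int other = 0; };

// Counts slice types in an Annex B stream (NAL unit types 1 and 5 only).
inline SliceCounts countSliceTypes(const std::vector<std::uint8_t>& s) {
    SliceCounts c;
    for (std::size_t k = 0; k + 3 < s.size(); ++k) {
        if (s[k] != 0 || s[k + 1] != 0 || s[k + 2] != 1) continue;
        const std::size_t nal = k + 3;          // NAL header byte
        const unsigned type = s[nal] & 0x1Fu;   // nal_unit_type
        if (type == 1 || type == 5) {           // non-IDR or IDR slice
            BitReader br(s.data() + nal + 1, s.size() - nal - 1);
            unsigned firstMb = 0, sliceType = 0;
            if (br.readUe(firstMb) && br.readUe(sliceType)) {
                switch (sliceType % 5) {        // 0 = P, 1 = B, 2 = I
                    case 0: ++c.p; break;
                    case 1: ++c.b; break;
                    case 2: ++c.i; break;
                    default: ++c.other; break;  // SP / SI
                }
            }
        }
        k += 2;  // skip the rest of the start code
    }
    return c;
}
```

Calling `countSliceTypes` on a buffer captured from the Android device and on one from another sender would confirm whether the absence of B slices is the only difference at this level.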

I will try changing my decoder implementation tomorrow to make it similar to the sample app, and will let you know the result.

Meanwhile, if you can think of any possible root cause or any other approach, please let me know.

Thanks,

 
