Decoding efficiency optimization: copying decoded video frames (video memory) into system memory

hang_liu · ‎12-01-2014

Hi,

I tested with sample_decode (./sample_decode_drm h264 -i intel_hw_test.h264 -o output.yuv -hw -vaapi), and the decoding performance is very low.

Test data:

With 720p video, decoding runs at 15 frames per second with CPU usage at 100% (out of a total of 400%).
When I removed the file write operation (m_FileWriter.WriteNextFrame(frame)), decoding ran at 1700 frames per second with CPU usage at 40%.
When I replaced the file write operation with a memcpy, decoding ran at 80 frames per second with CPU usage at 100%.

So I guess that copying a frame from video memory to system memory is the performance bottleneck.

My questions are:

Is my guess correct?
Is there a good way to improve decoding efficiency? My requirement is to decode 200 frames per second while consuming less CPU.

Thanks.

hang.liu

Surbhi_M_Intel · ‎12-02-2014

Hi hang.liu,

Thank you for your question.

Your guess is partially correct. Yes, copying frames from video memory to system memory consumes CPU time and reduces performance. But there is one more thing to consider: the color conversion from NV12 to YUV, which sample_decode does not currently do in the most efficient way and which therefore reduces performance considerably. To be clear, the samples do not provide complete solutions; they are just a starting point.
Depending on your pipeline, there may be more options for increasing decoding speed. Please let us know what pipeline you are looking at and what system configuration you are using.
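
To make the conversion step concrete, here is a minimal sketch of turning one NV12 frame into a planar layout. It is not the code from sample_decode. The function name nv12_to_i420 is hypothetical, and I420 as the target is an assumption, since the thread only says "yuv". The sketch also assumes even frame dimensions and a chroma plane that uses the same pitch as the luma plane.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical helper, not part of Media SDK or sample_decode.
// Converts one NV12 frame into a tightly packed I420 buffer (Y, then U, then V).
// `y` and `uv` point to the NV12 luma plane and the interleaved chroma plane.
// `pitch` is the byte stride of one source row and may be larger than `width`.
std::vector<std::uint8_t> nv12_to_i420(const std::uint8_t* y,
                                       const std::uint8_t* uv,
                                       std::size_t width,
                                       std::size_t height,
                                       std::size_t pitch) {
    // Assumption: even dimensions, and rows are at least `width` bytes apart.
    assert(width % 2 == 0 && height % 2 == 0 && pitch >= width);

    const std::size_t luma = width * height;
    const std::size_t chroma = (width / 2) * (height / 2);
    std::vector<std::uint8_t> out(luma + 2 * chroma);

    // Luma: one bulk copy per row, skipping the padding at the end of each row.
    for (std::size_t r = 0; r < height; ++r) {
        std::memcpy(out.data() + r * width, y + r * pitch, width);
    }

    // Chroma: split the interleaved UVUV... rows into separate U and V planes.
    std::uint8_t* u_dst = out.data() + luma;
    std::uint8_t* v_dst = u_dst + chroma;
    for (std::size_t r = 0; r < height / 2; ++r) {
        const std::uint8_t* src = uv + r * pitch;
        for (std::size_t c = 0; c < width / 2; ++c) {
            *u_dst++ = src[2 * c];
            *v_dst++ = src[2 * c + 1];
        }
    }
    return out;
}
```

With a Media SDK surface, the inputs would presumably come from the locked surface's Data.Y, Data.UV and Data.Pitch fields. The byte-by-byte chroma loop is the part that costs more than a plain memcpy, so it is worth measuring separately from the luma copy.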

Thanks,
-Surbhi

hang_liu · ‎12-03-2014

Hi Surbhi, thanks for your response.

As I said, when I removed the file write or memcpy operation, so that sample_decode does nothing with the decoded frame when it is output to the application, the performance is very high. So I think the “NV12 to YUV” conversion is not the cause of the reduced performance.

For 720p video, 80 frames per second is equivalent to about 110 MB/s of data. Is that a GPU hardware I/O transfer limit?
Are there authoritative test reports on Intel hardware decode and encode performance?
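
The 110 MB/s figure can be checked with a short calculation. The sketch below assumes that 720p means 1280x720 and that frames are 8-bit 4:2:0 (as NV12 is), which gives 1.5 bytes per pixel. The function names are hypothetical.

```cpp
#include <cstdint>

// Bytes in one 8-bit 4:2:0 frame (NV12 or I420): a full-resolution luma plane
// plus two chroma planes at quarter resolution, i.e. 1.5 bytes per pixel.
constexpr std::uint64_t frame_bytes_420(std::uint64_t width, std::uint64_t height) {
    return width * height * 3 / 2;
}

// Sustained copy rate in bytes per second at a given frame rate.
constexpr std::uint64_t bytes_per_second(std::uint64_t width,
                                         std::uint64_t height,
                                         std::uint64_t fps) {
    return frame_bytes_420(width, height) * fps;
}

// One 720p frame is 1,382,400 bytes.
static_assert(frame_bytes_420(1280, 720) == 1382400, "720p 4:2:0 frame size");
// 80 frames per second: 110,592,000 bytes/s, about 110 MB/s.
static_assert(bytes_per_second(1280, 720, 80) == 110592000, "rate at 80 fps");
// The 200 frames per second target: 276,480,000 bytes/s, about 276 MB/s.
static_assert(bytes_per_second(1280, 720, 200) == 276480000, "rate at 200 fps");
```

So the 200 frames per second requirement corresponds to roughly 276 MB/s of frame data reaching system memory.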

My test environment:

hardware:

Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz

software:

Ubuntu 12.04 Server

Intel Media Server Studio 2015

Our product does intelligent video analysis, so high-performance decoding and encoding are very important for us.

thanks

hang.liu

Surbhi_M_Intel · ‎12-09-2014

This topic is being discussed through private messages. For anyone else who might be interested:

Set the decoder's output IOPattern to system memory (MFX_IOPATTERN_OUT_SYSTEM_MEMORY); this way Media SDK will optimize the output it writes to system memory. To understand this better, please look at the tutorials simple_decode and simple_decode_vmem, where the IOPattern is well defined. Link to the tutorials: https://software.intel.com/en-us/intel-media-server-studio-support/training. Once you set the IOPattern to output to system memory, there is no need for a separate memory copy.
Setting your BIOS to its boost setting can also help.

Thanks,
-Surbhi