Using NV12 video output during video decoding drops performance significantly on Intel GPUs

NNiko6
New Contributor I

Hello.

I recently found a large performance drop when using NV12 video output with the EVR/EVR-CP renderers on Intel GPUs.

I ran some performance tests with DXVA Checker v3.9.0 on a Win 10 x64 system with a Core i7-4790 and Intel's HD 4600 iGPU, using the latest driver, v.4279.

As the video decoder I used LAV Filters 0.66.31, which is the default video decoder for many video players, such as MPC-HC.

The test clip is here, a small video sample from YouTube:

https://www.sendspace.com/file/o0oe5t

It's a 4K60fps VP9 clip.

Results:

Decode - Renderless mode (No renderer)

Video output   min fps   average fps   max fps   CPU usage
NV12/YV12      83        112           175       72%

Both NV12 and YV12 have the same performance.

Playback - Vanilla EVR (scale to 1280 x 720)

Video output   min fps   average fps   max fps   CPU usage
YV12           79        100           139       73%
NV12           65        74            86        52%

The numbers represent min fps/average fps/max fps of the whole decoding process.

It is clear that in renderless mode, that is, pure decoding, NV12 and YV12 have exactly the same performance.

BUT during real playback with the EVR renderer, NV12 is clearly slower than YV12 (74 fps vs 100 fps on average), and its CPU usage is lower as well (52% vs 73%).
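To put the playback numbers in relative terms, here is a minimal Python sketch that computes how much slower NV12 is than YV12 from the reported average frame rates. The helper name `relative_drop` is only an illustration; it is not part of DXVA Checker or any other tool used in the tests.

```python
def relative_drop(baseline_fps: float, measured_fps: float) -> float:
    """Return the slowdown of measured_fps relative to baseline_fps, in percent."""
    if baseline_fps <= 0:
        raise ValueError("baseline_fps must be positive")
    return (baseline_fps - measured_fps) / baseline_fps * 100.0


# Average fps reported for vanilla EVR playback on the HD 4600.
yv12_avg = 100.0
nv12_avg = 74.0

drop = relative_drop(yv12_avg, nv12_avg)
print(f"NV12 is {drop:.0f}% slower than YV12 on average")
```

The same helper can be fed the min or max figures to compare those columns too.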

There is a bottleneck somewhere in the driver that prevents the GPU and CPU from reaching maximum speed.

And because NV12 is the default, high-quality video output format for many video decoders such as LAV Filters, and the preferred output even for HW acceleration with Microsoft's DXVA, I think it is urgent to do something about it and improve the performance of that video output.
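For readers less familiar with the two formats: both are 4:2:0 layouts at 12 bits per pixel and carry the same samples. YV12 stores a Y plane followed by separate V and U planes, while NV12 stores a Y plane followed by one plane of interleaved U/V pairs. The sketch below repacks one into the other in plain Python. It assumes tightly packed frames with no row padding, and the function name `yv12_to_nv12` is made up for this example; it says nothing about how the driver or the renderer handles either format internally.

```python
def yv12_to_nv12(frame: bytes, width: int, height: int) -> bytes:
    """Repack one YV12 frame (Y, V, U planes) as NV12 (Y plane + interleaved UV)."""
    if width % 2 or height % 2:
        raise ValueError("4:2:0 frames need even width and height")
    y_size = width * height
    c_size = (width // 2) * (height // 2)
    if len(frame) != y_size + 2 * c_size:
        raise ValueError("unexpected frame size (assumes no row padding)")

    y = frame[:y_size]
    v = frame[y_size:y_size + c_size]   # YV12 stores the V plane first
    u = frame[y_size + c_size:]         # then the U plane

    uv = bytearray(2 * c_size)
    uv[0::2] = u                        # NV12 interleaves U, V, U, V, ...
    uv[1::2] = v
    return y + bytes(uv)


# Tiny 2x2 example: 4 luma bytes, then one V byte and one U byte.
yv12 = bytes([10, 11, 12, 13, 200, 100])
nv12 = yv12_to_nv12(yv12, 2, 2)
print(list(nv12))
```

Both frames have the same size and content; only the chroma arrangement differs.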

Waiting for your feedback.

9 Replies
RB_2
Novice

I would just like to mention that I also confirmed this performance issue for NikosD previously via the iGPU on my own Pentium G3258 (overclocked to 4.6GHz) on Windows 7 64bit.

For me it was a difference of 42 fps vs 56 fps (both with 100% CPU utilization); as someone who watches a great deal of 50 fps content (yes 50, not a typo), that would be the difference between perfectly fine and completely unacceptable performance.
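A quick way to see why those two figures fall on opposite sides of the line for 50 fps content is to express each as a multiple of real time. This is only a rough sketch: `realtime_headroom` is a made-up helper, and it treats the measured average as if it were sustained for the whole clip.

```python
def realtime_headroom(measured_fps: float, content_fps: float) -> float:
    """Ratio of achievable fps to content fps; below 1.0 means frames get dropped."""
    if content_fps <= 0:
        raise ValueError("content_fps must be positive")
    return measured_fps / content_fps


content_fps = 50.0
for measured in (42.0, 56.0):
    ratio = realtime_headroom(measured, content_fps)
    verdict = "keeps up" if ratio >= 1.0 else "drops frames"
    print(f"{measured:.0f} fps -> {ratio:.2f}x real time, {verdict}")
```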

UPDATE: I can also confirm that, on my old Intel 965GMA integrated graphics, the performance of NV12 matches the performance of YV12.

NNiko6
New Contributor I

UPDATE 1:

I tried an ancient laptop with Win 10 x64 and a 945GM (GMA 950) iGPU, and the results were the same as on the 965.

Same performance for NV12/YV12.

UPDATE 2:

I tried my SandyBridge system: Win 10 x64, Core i5 2400, iGPU HD 2000 (v4229) and Radeon 5750 (Catalyst 15.10).

The system has two graphics cards, so I tested both with the 4K VP9 clip above.

Decode (renderless)

The performance is the same using Intel or ATI and exactly the same using NV12/YV12.

In all cases the average fps is 84fps.

Playback (vanilla EVR renderer - scale to 1280x720)

Radeon 5750

NV12/YUY2: same performance as in decode (renderless) mode, ~84 fps

Intel HD 2000

Video output   min fps   average fps   max fps   CPU usage
YV12           50        60            66        69%
NV12           42        51            64        86%

Even though NV12 on the SandyBridge HD 2000 uses a lot more CPU than YV12, it still doesn't match YV12's performance.

From the above tests it is obvious that older Intel GPUs (like the 945 and 965) don't suffer a performance hit with NV12, but from SandyBridge onwards the performance hit is clear when NV12 is used instead of YV12.

Still, waiting for your feedback.

Amy_C_Intel
Employee

Hello NikosD,

Thank you for sharing your testing results.

Regards,

Amy.

LYang23
Beginner

I can confirm.

Bryce__Intel
Employee

Hi Nikos,

I'll get the investigation started on this one. I'll be in touch as needed for updates/questions. Thanks for raising this.

.:Bryce:.

Bryce__Intel
Employee

Bug submitted and in queue for investigation. Will follow up as updates come. Thanks.

Bug# 10003541

NNiko6
New Contributor I

Thank you very much for your prompt actions!

Hope we have good news soon.

Bryce__Intel
Employee

Correction...

Update:

Still work-in-progress on debugging root cause. Stay tuned...

NNiko6
New Contributor I

Hello.

I have a few indications showing me that the problem has been - somehow - fixed, but not completely.

For Haswell owners, the latest drivers for Win 7/8/8.1 and Win 10 seem to give the same performance with NV12 and YV12.

I think the same goes for Broadwell and Skylake, but I don't know about Ivy and SandyBridge, which I think don't have updated drivers.

BUT even on my Haswell, using the latest 4326 driver for Win 10, I've seen some "ugly" scaling algorithms with NV12, like those of YV12.

It seems that the new drivers somehow "cheat" on quality (scaling algorithms and others) in order to achieve the same performance as YV12, which already had lower quality, and possibly that was the reason it was faster.

We, as users, want both: speed and the previous quality when using NV12.

Hope this helps with further investigation of the issue.
