I'm trying to increase the density of our real-time media server implementation using the Media SDK. The code is currently used only for color model conversion (YV12 to NV12 via VPP) and encoding at 720p, 30 fps. With 8 channels, intel_gpu_top shows between 80% and 100% render usage, and video quality is slightly affected. With 16 channels, intel_gpu_top shows a solid 100% render usage, and our results indicate that video quality is poor. Are these results in line with estimates? I was expecting better density. Is there any way to analyze whether I am doing something less than optimal?
[root@sut-1300 SDK2015Production220.127.116.11]# lspci -nn -s 00:02.0
00:02.0 Display controller : Intel Corporation Xeon E3-1200 v3 Processor Integrated Graphics Controller [8086:041a] (rev 06)
vendor_id : GenuineIntel
cpu family : 6
model : 60
model name : Intel(R) Xeon(R) CPU E3-1285 v3 @ 3.60GHz
Thanks - Bob K., Dialogic
I'm not sure I understand your comment about video quality decreasing with the number of channels. As the number of streams increases, you should see lower FPS per stream, but quality should stay the same.
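To make the "lower FPS per stream" point concrete, here is a minimal sketch of the arithmetic. It assumes a simplified model in which the GPU sustains a fixed aggregate frame rate shared evenly across channels. The 300 fps figure is a hypothetical placeholder, not a measurement from this thread, and the function names are made up for illustration.

```python
def per_stream_fps(aggregate_fps, channels, target_fps=30.0):
    """Frame rate each channel gets if the pipeline tops out at
    `aggregate_fps` total and the load is shared evenly (assumed model)."""
    return min(target_fps, aggregate_fps / channels)


def frame_budget_ms(channels, target_fps=30.0):
    """Time available per frame if all channels must hit `target_fps`
    on one serialized engine."""
    return 1000.0 / (target_fps * channels)


if __name__ == "__main__":
    hypothetical_aggregate = 300.0  # assumed value, for illustration only
    for n in (8, 16):
        print(n, "channels:",
              per_stream_fps(hypothetical_aggregate, n), "fps per stream,",
              round(frame_budget_ms(n), 2), "ms budget per frame")
```

Under this model, a real-time application that must hold 30 fps per channel would show the shortfall as dropped frames rather than as a lower nominal frame rate.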
The first place to start for performance estimates is sample_multi_transcode. While you might not be able to set up exactly the same pipeline, you can set up a wide variety of "simplified" scenarios with the sink/source syntax of the par files to get a better sense of what your hardware can do.
With hardware acceleration, decode->VPP->encode is a bit more efficient than raw file read->VPP->encode, so this will also let you quickly look at how much additional benefit you could get from optimizing raw frame I/O.
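As a rough illustration of a density test, the sketch below generates a par file with N independent H.264 transcode sessions, one per line. The flag spellings (`-i::h264`, `-o::h264`, `-b`, `-hw`) are written from memory of the sample's documentation and should be checked against the readme shipped with your Media Server Studio version. The file names and the bitrate are hypothetical, and the sink/source variant is not shown.

```python
def make_par(n_sessions, src="input.h264", bitrate_kbps=2000):
    """Return par-file text with one transcode session per line.

    Assumption: each line is an independent decode->encode session using
    hardware acceleration; verify the flags against the sample's readme.
    """
    lines = [
        f"-i::h264 {src} -o::h264 out_{i}.h264 -b {bitrate_kbps} -hw"
        for i in range(n_sessions)
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Write a 16-session par file; file name is a placeholder.
    with open("density_16.par", "w") as f:
        f.write(make_par(16))
```

The resulting file would then be passed to the sample with its `-par` option, and the reported per-session frame rates compared against the 30 fps target as the session count grows.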
To monitor GPU activity the metrics monitor tool is recommended instead of intel_gpu_top. However, the best way to understand how your code is performing is with VTune. This tool now allows visualization of fixed function, EU, and CPU activity in concurrent timelines.
Thanks for the prompt feedback.
Well, perhaps we are referring to the same thing. Our automated testing reports more errors as we increase density. We use this testing to verify quality as well as performance. It is likely that the increase in failures is due to dropped frames. Dropped frames are more of an indication of decreased performance.
I understand it would be ideal to do 'decode-VPP-encode'; however, our media server is doing more than what the SDK offers. We do things like video mixing, transcoding to other codecs, and test image overlays. The 'decode-VPP-encode' pipeline would be useful, but only in limited scenarios. Though we also do file I/O, the main use case is real-time media processing for rendering on numerous remote clients (browsers for WebRTC, various soft phones, etc.). We do not have a use case which renders to a player located on the GPU-based server. The primary goal for using the SDK is to supplant what our product already offers, ideally to increase density on platforms which support hardware acceleration.
I'll take another look at sample_multi_transcode to see if I missed something.
I will also take a look at the tools you suggested. Other than intel_gpu_top, the tools I found to be recommended are not terribly useful. The ones from Intel are mostly Windows only. The one Linux tool is for Ubuntu and we use CentOS7.
Thanks - Bob
I tried downloading the Intel Metrics Framework (included in Intel Platform Analysis Library). I got an email confirmation that I would be contacted but was never actually contacted. Any suggestions?
Thank you for your request, we will be contacting you with the information needed to get access to the API framework based on the information you sent.
Submitted on Wednesday, February 17, 2016 - 07:45
Submitted by user: Bob Kirnum
Submitted values are:
I am considering using: Intel® Metrics Framework
Agreement: Yes, I have read and agree to the End User License
Sorry for the delayed reply. You probably don't want the metrics framework. You want the metrics monitor tool installed with Media Server Studio (including for CentOS). Check /opt/intel/mediasdk/tools/metrics_monitor. VTune also works on Linux. If you want, you can also test on Linux and view the results on Windows. While VTune may not always let you drill down to hotspots for Media SDK (because the code isn't in your application but in a library or driver), it can be very helpful for visualizing what is happening on the CPU, EUs, and fixed function components. The best way to understand VTune output is to run a similar pipeline with sample_multi_transcode to see what best-case performance looks like, then go back to your code to see what differs.
Another thing to consider when comparing against a full-featured framework is that FFmpeg now has wrappers for Media SDK codecs.
Just change the codec name to h264_qsv to test HW acceleration in FFmpeg, which also includes audio, containers, and network protocols. Performance isn't ideal but, at least in my testing, it does not drop frames as load increases.
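For reference, a minimal sketch of what "just change the codec name" looks like in practice. It builds an FFmpeg command line that selects the `h264_qsv` encoder and copies the audio stream unchanged. It assumes an FFmpeg build with QSV support enabled. The input and output file names and the 2M bitrate are placeholders, and the helper name is made up for illustration.

```python
import subprocess


def qsv_transcode_cmd(src, dst, bitrate="2M"):
    """Build an ffmpeg argument list that encodes video with h264_qsv.

    Assumption: the local ffmpeg binary was built with QSV support;
    otherwise the encoder will not be found.
    """
    return [
        "ffmpeg",
        "-i", src,            # input file (placeholder name)
        "-c:v", "h264_qsv",   # hardware H.264 encoder via Media SDK
        "-b:v", bitrate,      # target video bitrate
        "-c:a", "copy",       # pass audio through untouched
        dst,
    ]


if __name__ == "__main__":
    cmd = qsv_transcode_cmd("input.mp4", "output.mp4")
    print(" ".join(cmd))
    # Uncomment to actually run the transcode:
    # subprocess.run(cmd, check=True)
```

Running `ffmpeg -encoders` and looking for `h264_qsv` in the list is a quick way to confirm that the build includes the wrapper before testing at load.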