Does anyone have a link to example code or example settings for the highest performance h264 encoding possible? I ran a test and MFX_IMPL_HARDWARE_ANY and MFX_IMPL_SOFTWARE had a difference of 100 FPS versus 44 FPS. I can't imagine that should be the case.
What settings are MOST responsible for a performance increase?
"Highest" performance is a very relative term.
Encoder performance depends on many different parameters/decisions. Below are just a few:
- Codec profile: A lower profile means less complex computation, thus greater performance
- TargetUsage: The Speed setting will give greater performance
- Bitrate: A lower bitrate will lead to greater performance
- Surface type: The selected surface type may also impact overall performance. For instance, using system memory surfaces with HW encode implies an internal surface copy, which will impact performance
- Asynchronous vs. synchronous pipeline implementation will also have a large impact on overall performance for single-channel workloads.
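To make the knobs above concrete, here is a rough sketch of a speed-leaning mfxVideoParam setup. This assumes the standard Media SDK mfxvideo.h header; the resolution, frame rate, and bitrate values are illustrative placeholders, not recommendations:

```cpp
#include "mfxvideo.h"  // Intel Media SDK header (shipped with the SDK)

// Sketch only: a speed-oriented H.264 encode configuration.
mfxVideoParam par = {};
par.mfx.CodecId           = MFX_CODEC_AVC;
par.mfx.CodecProfile      = MFX_PROFILE_AVC_BASELINE;    // lower profile, less complex
par.mfx.TargetUsage       = MFX_TARGETUSAGE_BEST_SPEED;  // favor speed over quality
par.mfx.RateControlMethod = MFX_RATECONTROL_CBR;
par.mfx.TargetKbps        = 3000;                        // modest bitrate helps throughput
par.mfx.FrameInfo.FourCC        = MFX_FOURCC_NV12;
par.mfx.FrameInfo.ChromaFormat  = MFX_CHROMAFORMAT_YUV420;
par.mfx.FrameInfo.Width         = 1280;                  // placeholder resolution
par.mfx.FrameInfo.Height        = 720;
par.mfx.FrameInfo.FrameRateExtN = 30;
par.mfx.FrameInfo.FrameRateExtD = 1;
// Video-memory surfaces avoid the internal copy incurred with system memory + HW encode
par.IOPattern  = MFX_IOPATTERN_IN_VIDEO_MEMORY;
par.AsyncDepth = 4;                                      // allow several frames in flight
```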
Keep in mind that performance is always a trade-off against quality. Encode parameters set for very high performance will also result in lower quality.
Thanks. I'm looking for the differential factors that will pull the hardware encoding units away from straight software. So, with all quality components within an acceptable range, what parameters must be tweaked to force the greatest fixed-function execution of the coding subsystems (ME, entropy coding, etc.)? I found the target after a little while. Bitrate surprised me a little (that must not be linear - up to X you work hard to keep quality and lower bits, after X you can just start throwing away data).
In the current setup Async wasn't pulling away. But I imagine that at some point in various loads it will.
One extra thing that perhaps you might have come across: is there any way to mix encoding between a discrete GPU and Intel's hardware? The GPU in this scenario will have lots of extra power to do things like DCT, etc. that would not only take load off the CPU but also reduce the transfer bandwidth. My gut says it's all or nothing. What do you think?
If you are interested in exploring the impact of async vs. sync pipelines on Media SDK performance, then I suggest you check out the first few chapters of the Media SDK tutorial here: http://software.intel.com/en-us/articles/intel-media-sdk-tutorial
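The async pattern from the tutorial can be modeled without the SDK at all. The sketch below is a toy stand-in (the `encodeFrame` function is a dummy, not a real SDK call): it mirrors the EncodeFrameAsync() + deferred SyncOperation() idea by keeping up to `asyncDepth` frames in flight before waiting on the oldest "sync point":

```cpp
#include <cassert>
#include <deque>
#include <future>
#include <vector>

// Dummy stand-in for the HW encoder: "encodes" a frame, returns a bitstream size.
int encodeFrame(int frameId) { return frameId * 100 + 42; }

// Async-style pipeline: submit up to `asyncDepth` frames before syncing the
// oldest one, so submission of later frames overlaps with earlier encodes.
// asyncDepth == 1 degenerates to the fully synchronous pipeline.
std::vector<int> encodeAll(int numFrames, int asyncDepth) {
    std::deque<std::future<int>> inFlight;   // pending "sync points"
    std::vector<int> bitstreams;
    for (int f = 0; f < numFrames; ++f) {
        inFlight.push_back(std::async(std::launch::async, encodeFrame, f));
        if ((int)inFlight.size() == asyncDepth) {  // queue full: sync the oldest
            bitstreams.push_back(inFlight.front().get());
            inFlight.pop_front();
        }
    }
    while (!inFlight.empty()) {                    // drain the remaining frames
        bitstreams.push_back(inFlight.front().get());
        inFlight.pop_front();
    }
    return bitstreams;
}
```

Because frames are synced in submission order, the output bitstream order is preserved regardless of depth; only the amount of overlap changes.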
Intel Media SDK does not support discrete GPUs.
Thanks, Petter. I did try the various tutorials. I think load will differentiate.
BTW, I see that you are from Intel. Is that correct? Although it may not be able to integrate with discrete GPU hardware, is there any way to feed the SDK data that is more fully prepared than bitplanes? That should free EU resources and reduce transfer bandwidth.
The alternate solution we're looking at is NVENC - h264 coding on the GPU itself. That has several advantages in latency at the expense of price.
I'm somewhat confused why you would think the difference between using a software codec and a dedicated hardware codec should be small. The whole point is that software can't do it fast enough. I'm actually surprised you got 44 fps! That's quite a bit more than I was able to achieve. Fundamentally, that difference is what caused Intel to implement a hardware codec in the first place.
What do you mean by "data that is more fully prepared than bitplanes"? And please also elaborate on why you think using NVENC on a discrete card would provide lower latency.
Regarding the Linux beta program: if you signed up and requested beta access, you will be contacted shortly.
From what it looks like now (in the Linux documentation), DCT/ME/etc. is already happening on the CPU's integrated GPU (ENC) in prep to feed PAK (MFX/VCE). If there's horsepower available, duplicating some of ENC on the discrete GPU would reduce transfer bandwidth (multiple HD streams) and reduce load on the CPU.
Regarding lower latency, I mean reducing latency in a hybrid discrete + CPU environment. Obviously, if all rendering and encoding happens on the CPU you can hit low latency. If you leverage a discrete GPU for rendering, transferring across the bus and utilizing frame or memory pools will incrementally increase latency.
Media SDK does not support access to granular parts of the encoding process. The ENC and PAK stages are executed as a single operation via the EncodeFrameAsync() call. All of the encode stages are executed on the Intel HD Graphics part of the processor.
Regarding latency: HW-accelerated encode/decode/frame processing using Media SDK provides very low latency. If you have concerns about low-latency usages, please provide more details about your pipeline, and we can help ensure you configure the components for optimal latency.
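For reference, the usual latency-oriented knobs look roughly like the sketch below. This again assumes the mfxvideo.h header and is illustrative only, not a complete pipeline:

```cpp
#include "mfxvideo.h"  // Intel Media SDK header

// Sketch: latency-oriented encoder settings.
mfxVideoParam par = {};
par.AsyncDepth      = 1;   // no frames queued between submit and sync
par.mfx.GopRefDist  = 1;   // no B-frames, so no reordering delay
par.mfx.NumRefFrame = 1;   // single reference frame

// Signal that the decoder side need not buffer frames either
mfxExtCodingOption co = {};
co.Header.BufferId      = MFX_EXTBUFF_CODING_OPTION;
co.Header.BufferSz      = sizeof(co);
co.MaxDecFrameBuffering = 1;

mfxExtBuffer* extBuffers[] = { &co.Header };
par.ExtParam    = extBuffers;
par.NumExtParam = 1;
```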
Think of this in the context of web gaming, so latency (user interaction, reaction) is very important. Using NVENC is very expensive (monetarily), so if it's not necessary we'd like the best-of-both-worlds option. BTW, I haven't heard from anyone since last week regarding the Linux SDK. Do you know if there's someone to check in with?
BTW, is it truly the case that the Linux graphics stack (including the encoding library) is open here:
If that's so, that's a pretty compelling case to go all-in on Intel. Are there any portions of the encoding library that aren't open?
Regarding latency: Refer to the following post, since it may relate well to the game capture case you describe
Regarding Intel Media SDK for Server usage: see this recent post:
I do not have control over the Linux SDK beta request procedure. You should be contacted shortly.