We're interested in configuring Intel hardware H.264 encoding so that it is better optimized for video containing lots of tiny, repetitive lines or characters, such as entire pages of small text or finely spaced grids. I recognize this is not an ideal case for compressed video, but since x264 seems able to produce entirely acceptable video from this sort of content, hopefully there is a way to get Media SDK to do the same.
We've tried adjusting QPI/QPP, etc., but so far, no matter what we do, the output video remains blurry, garbled, and often illegible or inaccurately reproduced, even when encoded on Haswell hardware at the highest quality setting. It tends to go through cycles of especially bad blurriness at what looks like the GOP interval.
Any tips/insights from the expert(s)?
Thanks very much,
Every encoder algorithm is going to be a little different, so some differences from x264 are to be expected.
Have you tried using the new "LOOKAHEAD" bitrate control method available on Media SDK API 1.8 and HSW?
Thanks for the quick reply! We did go back and try the LOOKAHEAD bitrate control method, along with all the other ones, and unfortunately none of them seem to make much of a difference for these sorts of text/line-heavy scenarios. It's possible that one reason for this is that we're under real-time/videoconference constraints, so there isn't much in the way of future frames available for it to look ahead at.
This and our other attempted adjustments and manipulations to the configuration have continued to be unproductive. Naturally it's true that, as you say, every encoder is different. We're just strongly hoping to find a way to persuade the Intel Quicksync hardware to encode documents, spreadsheets, web pages, etc. in a way that's at least reasonably tolerable, even potentially at the expense of other tradeoffs.
So of course we'd greatly appreciate any other suggestions of what to try in order to bias the encoding in a direction more friendly to this sort of scenario!
I understand your reasonable usage model and have asked our developers to provide some input. Performing a task in 'real time' and obtaining the same quality as un-accelerated implementations is always challenging.
Can you provide some general information about your target usage (original image size and desired bitrate or encoded stream)?
You might have already thought about this, but you could zero your UV planes and just use the Y plane.
Actually, I think that would give you green text, so ideally you would draw the detailed content into Y and set UV to a constant to get a single monochrome color.
[This means you can only use a single color, but it should be a lot more detailed than using UV normally.]
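The monochrome idea can be sketched in a few lines. This is a minimal illustration, assuming a tightly packed 8-bit NV12 buffer (full-resolution Y plane followed by an interleaved UV plane at half width and half height, with no row padding); the helper names are hypothetical. The conversion function uses full-range BT.601 coefficients, an assumption made only to check the colors: zeroed chroma comes out saturated green, while a constant 128 leaves a neutral gray.

```python
def nv12_plane_sizes(width, height):
    """Return (luma_bytes, chroma_bytes) for a tightly packed 8-bit NV12 frame.

    Assumes no row padding. The UV plane holds one interleaved (U, V) pair
    per 2x2 block of luma pixels: (width/2) * (height/2) * 2 bytes.
    """
    assert width % 2 == 0 and height % 2 == 0, "NV12 needs even dimensions"
    return width * height, width * height // 2


def fill_chroma(frame, width, height, value=128):
    """Overwrite the UV plane of an NV12 bytearray with a constant value.

    128 is neutral chroma for 8-bit video, so the frame becomes grayscale
    and all of the picture detail is carried by the Y plane.
    """
    luma_bytes, chroma_bytes = nv12_plane_sizes(width, height)
    frame[luma_bytes:luma_bytes + chroma_bytes] = bytes([value]) * chroma_bytes


def yuv_to_rgb(y, u, v):
    """Full-range BT.601 YUV -> RGB, clamped to 0..255 (assumption, for illustration)."""
    def clamp(x):
        return max(0, min(255, round(x)))
    r = y + 1.402 * (v - 128)
    g = y - 0.344136 * (u - 128) - 0.714136 * (v - 128)
    b = y + 1.772 * (u - 128)
    return clamp(r), clamp(g), clamp(b)


if __name__ == "__main__":
    print(nv12_plane_sizes(1920, 1080))  # (2073600, 1036800)
    print(yuv_to_rgb(128, 0, 0))         # zeroed UV: (0, 255, 0), i.e. green
    print(yuv_to_rgb(200, 128, 128))     # neutral UV: (200, 200, 200), gray
```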
Since the NV12 UV plane is subsampled (one U/V pair per 2x2 block of luma pixels), you can be sure it is going to take a hit when used for text, etc.
If you must use color, try scaling the source up so that each pixel becomes a two-by-two block, and every new two-by-two 'pixel' starts on an even row and an even column.
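The upscaling trick can be sketched as a nearest-neighbor 2x upscale. This is a hypothetical helper operating on a plain 2D list of pixels (for example RGB tuples before conversion to NV12), not part of Media SDK; the second function checks that every 2x2 block aligned on an even row and column, which is exactly the area one NV12 chroma sample covers, holds a single color.

```python
def upscale_2x(image):
    """Nearest-neighbor 2x upscale of a 2D list of pixels (any hashable pixel type).

    Source pixel (r, c) becomes the block at rows 2r..2r+1 and columns
    2c..2c+1, so every block starts on an even row and an even column.
    """
    out = []
    for row in image:
        doubled = [px for px in row for _ in range(2)]
        out.append(doubled)
        out.append(list(doubled))  # copy so the two rows are independent
    return out


def chroma_sites_are_uniform(image):
    """True if every 2x2 block starting at an even (row, col) holds one value.

    In 4:2:0 (NV12) each such block shares a single (U, V) pair, so a
    uniform block loses no color detail to chroma subsampling.
    """
    for r in range(0, len(image) - 1, 2):
        for c in range(0, len(image[r]) - 1, 2):
            block = {image[r][c], image[r][c + 1],
                     image[r + 1][c], image[r + 1][c + 1]}
            if len(block) != 1:
                return False
    return True
```

The cost is that effective resolution is halved in each direction, so this only pays off for content that can afford it.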
If you use these tricks, or variations on them, you may be able to crank out a lot more detail.
You can look up chroma subsampling on Wikipedia to further develop your ideas about restricting UV usage, or about patterns that do not encode well for text, etc.
You're right, of course, about the challenges of what we're trying to do with Intel's hardware, and so I do appreciate that you're willing to look into it. The QuickSync platform has been impressive in a lot of ways, and has continued to improve, so we wanted to make absolutely certain we weren't missing anything new or obscure in its configuration options that might help.
And I'm sorry for not giving more details to begin with; I wanted to avoid dumping a wall of information if there were best practices or try-first tips that generally applied. I should have made clear that we're looking at 1080p content at 30 fps, and that we've now managed to tweak things to get satisfactory results at target bitrates above around 4 Mbps, but we need to be able to target more like half that. At 2 Mbps, sadly, text-heavy content exhibits extreme cyclical degradation of clarity at intervals roughly matching the GOP length, gradually improving each time until the start of the next cycle. I can capture and attach an example if that would be helpful.
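To put the 2 Mbps target in perspective, a rough per-frame budget helps. This sketch simply divides the bitrate evenly across frames and pixels (assuming 1 kbps = 1000 bit/s); a real rate controller distributes bits unevenly, with I-frames taking a much larger share, so treat the results as averages only.

```python
def bits_per_frame(kbps, fps):
    """Average bits available per frame at a given bitrate (1 kbps = 1000 bit/s)."""
    return kbps * 1000 / fps


def bits_per_pixel(kbps, fps, width, height):
    """Average bits available per pixel per frame."""
    return bits_per_frame(kbps, fps) / (width * height)


if __name__ == "__main__":
    for kbps in (4000, 2000):
        print(kbps, round(bits_per_frame(kbps, 30)),
              round(bits_per_pixel(kbps, 30, 1920, 1080), 4))
```

At 2000 kbps and 30 fps this works out to about 66,667 bits (roughly 8.3 KB) per 1080p frame, or about 0.032 bits per pixel, half of the roughly 0.064 available at 4000 kbps.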
I mentioned other encoders only because they demonstrate that the performance we're looking for is at least theoretically possible at the bitrate we're targeting, even in low latency situations.
Thanks for the excellent suggestions regarding various UV plane gambits. I've used that sort of trickery before in the contexts of format conversion and image display, but it hadn't occurred to me to try it when encoding. It seems that the main problems we're having are unrelated to the chroma portion, though. If I set UV across the board to the middle of the range to make everything monochromatic before encoding, the exact same phenomenon occurs. The next stage of refinement may well involve the sorts of schemes you mention (and I'll file away the notion of chroma subsampling for that point), but the current severe problem appears to be in the luminance component.
Appreciate the thoughts so far!
It seems some users have reported they have had the best results for this kind of content using AVBR (instead of VBR or other bitrate control algorithms). Another option would be to try VBR while also setting NalHrdConformance=OFF (you may not need full HRD Conformance for your usage).
@Cameron, thanks for pointing out that NV12 chroma is stored at 1/4 resolution.
I have passed this information to our engineers and architects. Thank you.
Thanks for the tip. Yes, we don't need full HRD Conformance for our application, fortunately.
However, so far AVBR actually seems a bit worse than VBR for these sorts of text-heavy use cases. Perhaps that's due to configuration? I gather from other posts that the Accuracy and Convergence encoder params affect this method of bitrate control, but I can't actually find documentation on them (beyond one indication that the units are tenths of a percent and 100 frames, respectively, whatever that means exactly). I've tried setting them to different values, but so far haven't gotten AVBR to behave even as well as VBR.
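Taking the units quoted above at face value (Accuracy in tenths of a percent, Convergence in units of 100 frames), a small hypothetical helper can translate a pair of values into a bitrate tolerance window and a convergence period. This is only a reading of those units; how the AVBR controller actually applies them should be checked against the Reference Manual.

```python
def avbr_window(target_kbps, accuracy, convergence, fps):
    """Interpret AVBR Accuracy/Convergence using the units quoted in the thread.

    Assumption: accuracy is a tolerance in tenths of a percent (100 -> 10%),
    and convergence is a period in units of 100 frames (3 -> 300 frames).
    Returns (low_kbps, high_kbps, frames, seconds).
    """
    tolerance = accuracy / 1000.0   # tenths of a percent -> fraction
    frames = convergence * 100
    return (target_kbps * (1 - tolerance),
            target_kbps * (1 + tolerance),
            frames,
            frames / fps)


if __name__ == "__main__":
    # Roughly (1843.2, 2252.8, 300, 10.0): +/-10% around 2048 kbps over 10 s.
    print(avbr_window(2048, 100, 3, 30))
```

Under this reading, even small Convergence values span many seconds at 30 fps, which may help explain why AVBR adapts slowly to short bursts of detailed content.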
On a related note, is there a current MSDK Reference Manual available anywhere? I can't seem to find even an older version now. I may have one archived somewhere, but it would be nice to have one newer than a couple of years old.
If you have installed the Intel® Media SDK 2014 for Clients, you should find the Reference Manual in <install dir>\Media SDK 2014 for Clients\doc\
The default location is: "C:\Program Files\Intel\Media SDK 2014 for Clients\doc\mediasdk-man.pdf"
I'll try to get AVBR configuration information.
Oh of course, I should have known that. I think I had it in my head that since the samples were now separate downloads, the Reference Manual would be as well. Thanks for clearing that up!
And yes, any further tips on configuring AVBR, or on otherwise optimizing for clarity in busy/text-heavy videos (that often don't change from frame-to-frame), will continue to be appreciated. Especially with regards to getting satisfactory clarity at lower bitrates.
After further and more careful comparisons, it looks like using CQP mode is the best way we've found (so far) to guarantee the most clarity of intricate text/grids/etc. at the lowest bitrate. If there's a way to achieve this sort of quality at this sort of bitrate by targeting bitrate rather than quantization parameters, we'd still be interested in discovering it. But the results of CQP are thus far acceptable, with QPI in the 38-41 range being the likely sweet spot, and yielding bit rates of around 2.0 to 2.6 Mbps.
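For reading the QPI values above: in H.264 the quantizer step size grows by about 12% per QP step and exactly doubles every 6 QP. The sketch below encodes the standard step-size table for QP 0 to 5 and scales it; it assumes that Media SDK's QPI/QPP values for AVC are plain H.264 QPs in the range 0..51.

```python
# H.264 quantizer step sizes for QP 0..5; the step doubles every 6 QP.
_BASE_QSTEP = (0.625, 0.6875, 0.8125, 0.875, 1.0, 1.125)


def qstep(qp):
    """Quantizer step size for an H.264 QP in 0..51."""
    if not 0 <= qp <= 51:
        raise ValueError("H.264 QP must be in 0..51")
    return _BASE_QSTEP[qp % 6] * 2 ** (qp // 6)


if __name__ == "__main__":
    for qp in range(38, 42):
        print(qp, qstep(qp))  # 52.0, 56.0, 64.0, 72.0
```

A QPP six above QPI therefore quantizes P-frame residuals exactly twice as coarsely as the I-frame.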
Of course, the type of content affects the bitrate considerably when using CQP. For relatively still, text-heavy frames, the distribution of data tends to be very unbalanced, with practically all the data stored in the I-frames (as one would expect). This is fine and necessary, but when motion is introduced it can drive the bitrate up significantly, especially in text-heavy scenarios. A good compromise appears to be setting QPP much higher than QPI, so that the P-frames are much smaller at the cost of being much less distinct, while the image coalesces crisply at each GOP interval. I'd be satisfied with this configuration, except that when more traditional video content is introduced, it looks terrible because of the high quantization parameter of the P-frames. Conversely, a more moderate QPP value results in satisfactory quality and bitrate for traditional video, but much too high a bitrate for text-heavy scenarios with motion.
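The imbalance described above can be modeled with a simple per-GOP average: one I-frame followed by (N - 1) P-frames. The frame sizes in the example are hypothetical, chosen only to show the shape of the effect, not measured values.

```python
def gop_average_kbps(i_frame_kbits, p_frame_kbits, gop_size, fps):
    """Average bitrate of a GOP: one I-frame followed by (gop_size - 1) P-frames."""
    total_kbits = i_frame_kbits + (gop_size - 1) * p_frame_kbits
    return total_kbits * fps / gop_size


if __name__ == "__main__":
    # Hypothetical sizes: 1000 kbit I-frame, 30-frame GOP at 30 fps.
    print(gop_average_kbps(1000, 5, 30, 30))   # still text: 1145.0
    print(gop_average_kbps(1000, 60, 30, 30))  # with motion: 2740.0
```

In the still case the I-frame carries almost everything; once motion inflates the P-frames, they contribute most of the bits, which is why QPP is the lever that tames the motion case.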
We've been considering different ways of modifying the QPs on the fly to get usable results in each of those three situations (still text/lines, text/lines with motion, and more traditional video), but it may not be entirely possible. I don't suppose there's any way to tweak VBR or AVBR to be a little friendlier to relatively still, text-heavy content? What we probably really want is for typical video to be encoded more or less as it is now with VBR, but for the aforementioned detail-intensive cases to have the vast bulk of the available bitrate preferentially allocated to the I-frames. And of course any solution still needs to be low latency/real time. I don't suppose there's any clever combination of parameters that might yield this sort of compromise without significant external analysis and constant on-the-fly modification?
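One way to drive that on-the-fly switching is a cheap per-frame heuristic on the luma plane. The sketch below is entirely hypothetical: it measures the fraction of 16x16 blocks that changed since the previous frame (motion) and the mean absolute horizontal gradient (fine detail), then maps them onto the three cases above. Both thresholds are placeholders that would need tuning, and pure Python is far too slow for 1080p30; a real implementation would subsample or use SIMD.

```python
def changed_block_fraction(prev, cur, width, height, block=16, threshold=2.0):
    """Fraction of block x block luma tiles whose mean |cur - prev| exceeds threshold."""
    changed = total = 0
    for by in range(0, height, block):
        for bx in range(0, width, block):
            diff = count = 0
            for y in range(by, min(by + block, height)):
                row = y * width
                for x in range(bx, min(bx + block, width)):
                    diff += abs(cur[row + x] - prev[row + x])
                    count += 1
            total += 1
            if diff / count > threshold:
                changed += 1
    return changed / total


def detail_level(luma, width, height):
    """Mean absolute horizontal luma gradient; high for text and fine grids."""
    acc = 0
    for y in range(height):
        row = y * width
        for x in range(width - 1):
            acc += abs(luma[row + x + 1] - luma[row + x])
    return acc / (height * (width - 1))


def classify(prev, cur, width, height, motion_cut=0.05, detail_cut=20.0):
    """Pick one of the three cases from the thread (placeholder thresholds)."""
    if detail_level(cur, width, height) < detail_cut:
        return "traditional_video"
    if changed_block_fraction(prev, cur, width, height) < motion_cut:
        return "still_text"
    return "text_with_motion"
```

Each label would then select a QPI/QPP pair (or a rate-control preset) chosen by experiment, such as the QPI 38-41 range found above for still text.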
Thanks for the suggestion, Cameron!
I actually did try that earlier, but text/line-heavy content unfortunately remained equally indistinct (for a given encoder configuration). Unless I'm missing something, for text/line-heavy scenes the Y data seems to be the part that's hard to encode with precision without resorting to high bitrates, and it can't seem to be optimized for in relatively motionless video without hurting higher-motion traditional video content. Have you (or anyone else) had a different experience with such inputs?
In general, this is exactly the sort of thing that 'LOOKAHEAD' helps with, but only some hardware platforms support this feature.
When you tried this, did you see any 'warnings' returned from encoder initialization?
Did you see any improvement in quality at all, or was it just not 'enough improvement' for the content?
I understand that you may want a 'low latency' configuration, but even a small 'lookahead' should help here.
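The cost of lookahead in a low-latency pipeline is easy to estimate: each buffered future frame adds one frame interval of delay. A minimal sketch with hypothetical helper names (not Media SDK calls). It assumes the lookahead depth is configured separately from AsyncDepth, via the LookAheadDepth field of mfxExtCodingOption2; that is worth confirming in the Reference Manual for your API version.

```python
def lookahead_latency_ms(depth_frames, fps):
    """Extra delay from buffering depth_frames future frames before encoding."""
    return depth_frames * 1000.0 / fps


def max_lookahead_frames(budget_ms, fps):
    """Largest whole number of lookahead frames that fits in a latency budget (ints)."""
    return (budget_ms * fps) // 1000


if __name__ == "__main__":
    print(lookahead_latency_ms(3, 30))    # 100.0 ms
    print(max_lookahead_frames(100, 30))  # 3 frames
```

At 30 fps even a 100 ms latency allowance leaves room for only about three frames of lookahead, which is consistent with it having limited effect in a videoconference-style setup.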
THANKS for any feedback.
Thanks for following up with the further observations and probing. I just tried MFX_RATECONTROL_LA_ICQ again, to make sure I was reporting accurately. I set the AsyncDepth encode parameter to 5 (instead of the usual 1 we use to minimize latency). There don't appear to be any errors on encoder initialization. This is on a 4th-gen i7-4770K processor, so I would hope it supports anything that's part of the current officially released SDK (and, as I said, I see no initialization errors regardless).
The quality of text/table/line-heavy 1920x1080 encoding at target rates of 1024 and 2048 kbps is definitely worse than when using MFX_RATECONTROL_VBR with the other settings identical. It seems more prone to cyclical degradation and recovery, whereas VBR quickly settles into a fairly consistent state of quality (or the relative lack thereof -- though I recognize only so much can be expected at a given bitrate).
If there are other settings I may be missing in properly configuring a Lookahead configuration, please do let me know! We'd love to be able to get things working in a way that is acceptable for both relatively static text content, as well as high motion traditional video.