I've got an application where I actively grab frames from a video camera, which I want to encode. If my system is under computational load the framerate will drop (since I'm actively grabbing). Now I want to cope with this variable frame rate (VFR) during my encoding.
Since I'm recording a live video stream I process each frame individually after grabbing. I've got a working encoding pipeline which I've extended with VPP for handling VFR. I set a timestamp for each input frame to VPP, but the encoded stream does not seem to be affected. Here are my params for VPP (they are the same for encoding):
[cpp]
mfxVPPParams.vpp.In.FourCC = MFX_FOURCC_NV12; mfxVPPParams.vpp.Out.FourCC = MFX_FOURCC_NV12;
mfxVPPParams.vpp.In.Width = inputWidth; mfxVPPParams.vpp.Out.Width = inputWidth;
mfxVPPParams.vpp.In.Height = inputHeight; mfxVPPParams.vpp.Out.Height = inputHeight;
mfxVPPParams.vpp.In.ChromaFormat = MFX_CHROMAFORMAT_YUV420; mfxVPPParams.vpp.Out.ChromaFormat = MFX_CHROMAFORMAT_YUV420;
mfxVPPParams.vpp.In.PicStruct = MFX_PICSTRUCT_PROGRESSIVE; mfxVPPParams.vpp.Out.PicStruct = MFX_PICSTRUCT_PROGRESSIVE;
mfxVPPParams.vpp.In.CropX = 0; mfxVPPParams.vpp.Out.CropX = 0;
mfxVPPParams.vpp.In.CropY = 0; mfxVPPParams.vpp.Out.CropY = 0;
mfxVPPParams.vpp.In.CropW = inputWidth; mfxVPPParams.vpp.Out.CropW = inputWidth;
mfxVPPParams.vpp.In.CropH = inputHeight; mfxVPPParams.vpp.Out.CropH = inputHeight;
mfxVPPParams.vpp.In.Width = MSDK_ALIGN16(inputWidth); mfxVPPParams.vpp.Out.Width = MSDK_ALIGN16(inputWidth);
mfxVPPParams.vpp.In.Height = (MFX_PICSTRUCT_PROGRESSIVE == mfxVPPParams.vpp.In.PicStruct) ? MSDK_ALIGN16(inputHeight) : MSDK_ALIGN32(inputHeight); mfxVPPParams.vpp.Out.Height = (MFX_PICSTRUCT_PROGRESSIVE == mfxVPPParams.vpp.In.PicStruct) ? MSDK_ALIGN16(inputHeight) : MSDK_ALIGN32(inputHeight);
mfxVPPParams.vpp.In.FrameRateExtD = 1; mfxVPPParams.vpp.Out.FrameRateExtD = 1;
mfxVPPParams.vpp.In.FrameRateExtN = framerate; mfxVPPParams.vpp.Out.FrameRateExtN = framerate;
mfxVPPParams.IOPattern = MFX_IOPATTERN_IN_SYSTEM_MEMORY | MFX_IOPATTERN_OUT_SYSTEM_MEMORY;
[/cpp]
I'm using the number of surfaces suggested for encoding. For VPP the suggested number is 5 (both in and out). However, for VPP I'm only using one, since I'm processing all frames one by one (I can see why encoding needs more surfaces but for VPP in this setting I don't). I fetch a free surface for encoding and call VPP as follows: [cpp]mfxVideoVPP->RunFrameVPPAsync(vppSurface, pEncSurfaces[nEncSurfIdx], NULL, &syncp);[/cpp]
My questions are:
- Is this the way to go, using VPP to handle VFR? What are my options?
- I've stumbled over some piece of information stating that timestamping for VFR is a thing that the container handles. Is this true? Could it be that my container (.mp4 from MP4Box) disregards the timestamps?
- Should I use 5 surfaces for VPP after all? If I should use all 5 surfaces for VPP, how do I connect VPP with encoding? (Are the out surfaces the same as the encoding surfaces? Should therefore the number of encoding surfaces be the maximum of suggested VPP and suggested encoding surfaces?)
- Are there possibly some other mistakes in my reasoning or in my code?
I'm very happy about any hints you could provide! I'm aware of section 4.9.4 from the dev guide as well as the MFXVideoVPPEx class, but I can't seem to find the answers to my questions there.
Thanks a lot in advance!
I've been looking into some examples and documentation and I've got some further questions...
- What format do the timestamps have to follow? As far as I know timestamp/90k = time in seconds. However, does the timestamp of the first frame have to be 0, or can it be any value and only the values of the following timestamps are important? (is the absolute or the relative value important?)
- How many syncpoints do I need? One for each Decode, VPP, Encode?
- Since the input to both VPP & Encode is in the exact same format (NV12, same dimensions), is there a need to make a distinction between their respective surface pool? Could I just create one larger surface pool which both VPP and Encode use?
I think I got some ideas on how to proceed:
SyncPoints are only needed for debugging and for calling SyncOperation. In my case I would only need to provide the SyncPoint for the call to Encode (since IMSDK will handle the sync between VPP and Encode) so I can call SyncOperation before writing the bitstream.
I actually do need to use as many surfaces for VPP as IMSDK suggests. I'll generate one larger surface pool of size VPP_in_sugg + max(VPP_out_sugg, Encode_sugg) - 2. (-2 because I set AsyncDepth to 2: calling both VPP and Encode before one SyncOperation, see reference manual p. 20) For each incoming frame I fetch two free surfaces from the pool, call surf1.Data.TimeStamp = timeStmp; vppAsync(surf1, surf2, ...); encAsync(surf2, bitstream, syncp, ...); SyncOperation(syncp); writeBS(bitstream);
Finally I'd have to empty any buffered VPP frames to Encode, and as a last step empty any buffered Encode frames.
Does this make sense? Any ideas how to improve this?
Most importantly: Will I manage to process VFR this way?
Please also consider my questions in my previous post.
Thanks for your help!
I've made the changes I described in the previous post and still couldn't succeed... I simply end up with a stream that's the same as if I didn't use VPP and timestamping.
I noticed the following: the encoder suggests and uses two surfaces. VPP suggests 5 surfaces if I don't set AsyncDepth, or 2 surfaces if I set AsyncDepth = 2. However, VPP actually only uses one surface.
I'd be very happy if someone could help me with my issues...
Let me try to answer your varying set of questions one by one.
- Overall, regarding VFR: I recommend you set the timestamp on each frame you pass to the encoder (and to VPP, if you need it) so that it reflects the actual rate at which frames arrive from the camera. Media SDK does not provide a mechanism to do this, so you would have to implement that part yourself.
Note that Media SDK treats timestamps transparently; in other words, it does not modify the timestamp, except in some cases when using frame rate conversion (FRC), which I do not think you need for the case you describe.
- I do not understand why you need to use VPP for timestamp handling. Looking at the VPP parameters you provided, the input and output are identical, so I'm not sure why you want to use VPP. Instead you can pass your timestamps to the encoder, and the timestamp will then be part of the encoded bitstream buffer. If you need to use VPP for frame processing before encode, it will treat timestamps transparently, just like the encoder.
- The actual values you decide to use for timestamps are up to you. But keep in mind that different containers (mp4, mkv etc.) and players have different requirements and limitations when it comes to timestamps. This is outside the scope of Media SDK.
- To keep the number of internal buffers for VPP as small as possible (this is true for encode/decode also), you need to set AsyncDepth to 1. This will also ensure low latency, which is what I think you want in your case, right? Setting AsyncDepth=1 for VPP will result in QueryIOSurf requesting 1 buffer.
- Regarding syncpoints: you need to supply a syncpoint parameter for all asynchronous Media SDK API calls. However, you only need to use the syncpoint associated with the operation in the pipeline you want to sync on. For example, for a pipeline such as camera->VPP->Encode you only need to use the syncpoint from the encoder. Other syncpoints can be used if you need to manage errors, as you wrote in your previous post.
- For simplified samples including details on syncpoint handling and low latency configurations and much more I recommend you check out the Media SDK tutorial here: http://software.intel.com/en-us/articles/intel-media-sdk-tutorial
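As a rough sketch of the "implement that part yourself" step for timestamps: the helper below converts the time elapsed since the first grabbed frame into 90 kHz ticks, the unit mentioned earlier in the thread. The name `toTimestamp90k`, the use of `std::chrono::steady_clock`, and the choice to start at 0 are illustrative assumptions, not part of Media SDK.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

// Assumption: timestamps are expressed in 90 kHz ticks.
constexpr std::uint64_t kTicksPerSecond = 90000;

// Hypothetical helper (not a Media SDK function): converts the time elapsed
// between the first grabbed frame and the current frame into 90 kHz ticks.
inline std::uint64_t toTimestamp90k(std::chrono::steady_clock::time_point firstFrame,
                                    std::chrono::steady_clock::time_point thisFrame)
{
    assert(thisFrame >= firstFrame);  // capture times must not go backwards
    const auto elapsedUs =
        std::chrono::duration_cast<std::chrono::microseconds>(thisFrame - firstFrame).count();
    return static_cast<std::uint64_t>(elapsedUs) * kTicksPerSecond / 1000000;
}
```

The result would then be assigned to surface->Data.TimeStamp before the frame is submitted. Whether the first timestamp should be 0 or some other origin depends on what your container and player expect.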
Thank you very much for the explanations!
The dev guide section 4.9.4 and the following post led me to believe that I'd need VPP for VFR:
I'm still not entirely sure how to handle VFR...
Did I understand correctly that VFR is not handled by IMSDK?
So I either implement handling VFR myself (i.e. skip or duplicate frames according to the timestamp to 'interpolate' VFR to CFR) or let the video container handle this with the info in the timestamps?
For encoding I have to specify a framerate (FrameRateExtN and FrameRateExtD) but I've got the possibility to set a timestamp for a surface (surface->Data.TimeStamp). When you say "treat transparently", I understand that this value is not considered by IMSDK but written as metadata alongside the encoded frames. Is there something else I need to consider so I will get VFR after muxing? (I understand that this depends on the container to some extent)
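For illustration, the skip-or-duplicate option could look roughly like the sketch below. It assumes 90 kHz timestamps that never decrease; the function name `resampleToCfr` and the nearest-slot rounding are assumptions made for this example, not something Media SDK provides.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: maps variable-rate frames onto a constant-rate grid.
// Input:  per-frame timestamps in 90 kHz ticks (assumed non-decreasing).
// Output: for each constant-rate output slot, the index of the input frame to use.
std::vector<std::size_t> resampleToCfr(const std::vector<std::uint64_t>& ts90k,
                                       std::uint32_t fps)
{
    assert(fps > 0);
    std::vector<std::size_t> out;
    if (ts90k.empty()) return out;

    const std::uint64_t origin = ts90k.front();
    for (std::size_t i = 0; i < ts90k.size(); ++i) {
        // Nearest output slot for this frame (rounded to the closest slot).
        const std::uint64_t rel = ts90k[i] - origin;
        const std::uint64_t slot = (rel * fps + 45000) / 90000;

        if (slot < out.size()) continue;      // slot already filled: skip this frame
        while (out.size() < slot)             // gap: duplicate the previous frame
            out.push_back(out.back());
        out.push_back(i);
    }
    return out;
}
```

With timestamps {0, 3000, 9000, 10000, 12000} at 30 fps, the second frame is shown twice to cover the gap and the fourth frame is dropped because its slot is already taken.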
I'm thinking of getting IPP, so I can include the muxing in my code rather than using an external tool. Will the muxing in IPP be able to handle VFR?
I suggest you rely on container timestamps to implement the VFR behavior you seek. As stated earlier, Media SDK is transparent to timestamps.
Regarding specifying encoder frame rate. You can set the encoding frame rate to an arbitrary value. E.g. if your target is 30 fps, then set encoder frame rate to this. The true per-frame frame rate is instead conveyed via the timestamps you compute and provide to the muxer.
So let's say you are encoding a frame, then you will set the timestamp in "surface->Data.TimeStamp" parameter. When the frame encode is complete the same timestamp will be available in the "bitstream.TimeStamp" parameter. Then you provide the timestamp and bitstream packet to your muxer.
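A minimal sketch of that hand-off, using stand-in types rather than the real Media SDK structures: `EncodedPacket` mimics only the timestamp and payload of an encoded bitstream buffer, and `MuxSample` is a hypothetical record for whatever muxer you use. It assumes packets arrive in presentation order (no B-frames), so a sample's duration is the gap to the next timestamp.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Stand-in for an encoded bitstream buffer (hypothetical, not mfxBitstream).
struct EncodedPacket {
    std::uint64_t timeStamp;          // copied through from the input surface
    std::vector<std::uint8_t> data;   // encoded payload
};

// Hypothetical record handed to a muxer.
struct MuxSample {
    std::uint64_t pts;                // presentation time in 90 kHz ticks
    std::uint64_t duration;           // ticks until the next sample
    const EncodedPacket* packet;      // payload to write
};

// Derives per-sample durations from consecutive timestamps.
// lastDuration is used for the final packet, which has no successor.
std::vector<MuxSample> toMuxSamples(const std::vector<EncodedPacket>& packets,
                                    std::uint64_t lastDuration)
{
    std::vector<MuxSample> samples;
    samples.reserve(packets.size());
    for (std::size_t i = 0; i < packets.size(); ++i) {
        std::uint64_t duration = lastDuration;
        if (i + 1 < packets.size()) {
            assert(packets[i + 1].timeStamp >= packets[i].timeStamp);
            duration = packets[i + 1].timeStamp - packets[i].timeStamp;
        }
        samples.push_back(MuxSample{packets[i].timeStamp, duration, &packets[i]});
    }
    return samples;
}
```

The varying durations are what make the muxed stream variable frame rate; the encoder's configured frame rate does not enter into it.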
Regarding IPP: the IPP samples include some code that can handle muxing, but that sample code is quite stale and not feature complete (you may be required to make your own modifications). If you're looking for a production-grade muxing solution, I suggest you look elsewhere, for example at open source solutions such as FFmpeg or at commercial packages.