I'm a bit lost concerning random access in an elementary H264 stream. As of now I have written a demuxer for mts container files that feeds the H264 video stream to the Intel Media SDK decoder. That works as expected and I can extract and render all of the stream's frames.
However, I now want to leave the linear path of plain playback and implement random access. But after reading through the specs and various websites and threads dealing with H264, I'm more confused than enlightened about the correct way to do it.
At first I got the impression that IDR slices (NAL unit type 5) provide random access points, so it would be safe to start feeding the decoder from such an IDR slice (given that the decoder is correctly initialized with the SPS/PPS).
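For reference, finding the IDR slices only requires reading the NAL header byte that follows each start code. The sketch below assumes the elementary stream is already in memory in Annex B byte-stream format (start-code delimited, as it comes out of the MTS demuxer); the function names are made up for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* H.264 nal_unit_type values referred to in this thread. */
enum { NAL_SLICE = 1, NAL_IDR = 5, NAL_SEI = 6, NAL_SPS = 7, NAL_PPS = 8, NAL_AUD = 9 };

/* Scans an Annex B byte stream for the next start code (00 00 01, which also
   matches the tail of 00 00 00 01) at or after `from`. Returns the offset of
   the NAL header byte that follows the start code, or `len` if none is found. */
size_t next_nal(const uint8_t *buf, size_t len, size_t from)
{
    for (size_t i = from; i + 3 <= len; i++) {
        if (buf[i] == 0 && buf[i + 1] == 0 && buf[i + 2] == 1)
            return i + 3;
    }
    return len;
}

/* nal_unit_type is the low 5 bits of the NAL header byte. */
int nal_type(uint8_t header) { return header & 0x1F; }

/* Returns the offset of the first IDR NAL header at or after `from`, or `len`. */
size_t find_idr(const uint8_t *buf, size_t len, size_t from)
{
    for (size_t p = next_nal(buf, len, from); p < len; p = next_nal(buf, len, p)) {
        if (nal_type(buf[p]) == NAL_IDR)
            return p;
    }
    return len;
}
```

Calling find_idr repeatedly, each time starting one byte past the previous hit, lists every IDR access point; nal_type on the same offsets also shows where the SPS (7) and PPS (8) sit.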
But in the H264 sample stream I'm working with, all of the frames sit behind a single IDR slice. It's quite a short sample stream, only about 25 seconds long, but I cannot believe that within 25 seconds there are no other keyframes I could jump to without decoding the whole sequence.
Other sources say that any I-frame carried in a non-IDR slice (NAL unit type 1) can be used as a keyframe to start decoding from.
Yet other sources say they only look at the first_mb_in_slice field of the slice header: if it is 0 (the slice starts at the first macroblock of the picture), the slice can be used as a keyframe to start decoding.
Finally, other sources say they use each SEI recovery point in the elementary stream as a random access point.
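Both checks read the start of the slice header, where first_mb_in_slice and slice_type are the first two Exp-Golomb coded ue(v) fields after the NAL header byte. A minimal sketch of that parse, assuming `nal` points at the header byte of a single slice NAL unit (type 1 or 5); the helper names are illustrative, not from any library:

```c
#include <stddef.h>
#include <stdint.h>

typedef struct { const uint8_t *p; size_t len; size_t bit; } BitReader;

/* Reads one bit, MSB first; returns 0 past the end (the position still advances). */
static int read_bit(BitReader *br)
{
    int b = 0;
    if (br->bit < br->len * 8)
        b = (br->p[br->bit >> 3] >> (7 - (br->bit & 7))) & 1;
    br->bit++;
    return b;
}

/* Unsigned Exp-Golomb ue(v): count leading zeros, then read that many bits. */
static uint32_t read_ue(BitReader *br)
{
    int zeros = 0;
    while (read_bit(br) == 0) {
        if (++zeros > 31) return UINT32_MAX; /* malformed or truncated */
    }
    uint32_t v = 0;
    for (int i = 0; i < zeros; i++) v = (v << 1) | (uint32_t)read_bit(br);
    return (1u << zeros) - 1 + v;
}

/* Copies up to `cap` payload bytes (after the 1-byte NAL header) into `out`,
   dropping emulation_prevention_three_byte (00 00 03 -> 00 00). */
static size_t to_rbsp(const uint8_t *nal, size_t len, uint8_t *out, size_t cap)
{
    size_t n = 0;
    int zeros = 0;
    for (size_t i = 1; i < len && n < cap; i++) {
        if (zeros >= 2 && nal[i] == 3) { zeros = 0; continue; }
        zeros = (nal[i] == 0) ? zeros + 1 : 0;
        out[n++] = nal[i];
    }
    return n;
}

typedef struct { uint32_t first_mb_in_slice; uint32_t slice_type; } SliceStart;

/* Parses the first two slice header fields. Returns 0 on success, -1 otherwise. */
int parse_slice_start(const uint8_t *nal, size_t len, SliceStart *out)
{
    if (len < 2) return -1;
    int t = nal[0] & 0x1F;
    if (t != 1 && t != 5) return -1;
    uint8_t rbsp[16]; /* two ue(v) fields need at most 126 bits */
    BitReader br = { rbsp, to_rbsp(nal, len, rbsp, sizeof rbsp), 0 };
    out->first_mb_in_slice = read_ue(&br);
    out->slice_type = read_ue(&br);
    return br.bit <= br.len * 8 ? 0 : -1;
}

/* slice_type 2/7 = I, 4/9 = SI. */
int is_intra_slice(uint32_t slice_type) { return slice_type % 5 == 2 || slice_type % 5 == 4; }
```

Values of 5 and above additionally signal that all slices of the picture share that type. Keep in mind that this only tells you the picture is intra coded: frames that follow a non-IDR I-frame in decoding order may still reference pictures from before it, which is what the recovery point SEI is meant to signal.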
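A recovery point SEI is an SEI message with payloadType 6 inside a NAL unit of type 6; its payload starts with recovery_frame_cnt (ue(v)), exact_match_flag and broken_link_flag. The sketch below, assuming `nal` holds exactly one SEI NAL unit starting at its header byte, walks the SEI messages and extracts those fields; the names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Minimal MSB-first bit reader over an RBSP buffer. */
typedef struct { const uint8_t *p; size_t len, bitpos; } Bits;

static int get_bit(Bits *b)
{
    int v = 0;
    if (b->bitpos < b->len * 8)
        v = (b->p[b->bitpos >> 3] >> (7 - (b->bitpos & 7))) & 1;
    b->bitpos++;
    return v;
}

static uint32_t read_ue(Bits *b)
{
    int zeros = 0;
    while (get_bit(b) == 0) {
        if (++zeros > 31) return UINT32_MAX;
    }
    uint32_t v = 0;
    for (int i = 0; i < zeros; i++) v = (v << 1) | (uint32_t)get_bit(b);
    return (1u << zeros) - 1 + v;
}

typedef struct { uint32_t recovery_frame_cnt; int exact_match, broken_link; } RecoveryPoint;

/* Returns 1 and fills `rp` if the SEI NAL unit contains a recovery point, else 0. */
int find_recovery_point(const uint8_t *nal, size_t len, RecoveryPoint *rp)
{
    uint8_t rbsp[256];
    size_t n = 0;
    int zeros = 0;
    if (len < 2 || (nal[0] & 0x1F) != 6) return 0;
    /* Strip emulation prevention bytes (00 00 03 -> 00 00). */
    for (size_t i = 1; i < len && n < sizeof rbsp; i++) {
        if (zeros >= 2 && nal[i] == 3) { zeros = 0; continue; }
        zeros = nal[i] == 0 ? zeros + 1 : 0;
        rbsp[n++] = nal[i];
    }
    size_t pos = 0;
    /* Loop while more than the final rbsp_trailing_bits byte remains. */
    while (pos + 1 < n) {
        uint32_t type = 0, size = 0;
        /* payloadType and payloadSize: runs of 0xFF plus a final byte. */
        while (pos < n && rbsp[pos] == 0xFF) { type += 255; pos++; }
        if (pos >= n) return 0;
        type += rbsp[pos++];
        while (pos < n && rbsp[pos] == 0xFF) { size += 255; pos++; }
        if (pos >= n) return 0;
        size += rbsp[pos++];
        if (size > n - pos) return 0;
        if (type == 6) {
            Bits b = { rbsp + pos, size, 0 };
            rp->recovery_frame_cnt = read_ue(&b);
            rp->exact_match = get_bit(&b);
            rp->broken_link = get_bit(&b);
            return 1;
        }
        pos += size; /* skip other SEI messages */
    }
    return 0;
}
```

If decoding starts at the access unit carrying this SEI, output is indicated to be correct (exactly, if exact_match_flag is 1) from the reference picture whose frame_num is recovery_frame_cnt beyond the current one, so a value of 0 marks an immediate entry point.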
Of course I could go ahead and try out what actually works. But I'm sure there is an official answer as to where it is legal to start decoding inside the stream for random access.
I know this is not really Media SDK related but hopefully someone can provide me an answer :)
Thanks in advance.
From a Media SDK perspective: if you follow the steps described under "Bitstream Repositioning" in the "Decoding Procedures" section and supply data that is not sufficient to decode an entire frame, the DecodeFrameAsync call will return MFX_ERR_MORE_DATA, and you can keep sending more data until the decoder has reached a point in the bitstream from which it can decode.
I am not sure about our sample stream that does not have an IDR slice for 25 seconds, but you could try the method above to see whether the decoder also needs to step that far into the stream to find an access point. I have heard of issues with using only IDR slices to identify access points (see http://forum.doom9.org/showthread.php?t=124254).
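The feed loop described above can be sketched without the SDK: keep appending data while the decoder reports that it needs more. This is a minimal sketch that assumes a hypothetical callback wrapping the real call; in actual code it would append the chunk to the mfxBitstream and call MFXVideoDECODE_DecodeFrameAsync, mapping MFX_ERR_MORE_DATA to DEC_MORE_DATA. Other statuses such as MFX_ERR_MORE_SURFACE are not modeled, and DecStatus, DecodeFn and the toy decoder are invented for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical status codes standing in for MFX_ERR_NONE / MFX_ERR_MORE_DATA. */
typedef enum { DEC_OK = 0, DEC_MORE_DATA = 1, DEC_FAILED = -1 } DecStatus;

/* Hypothetical decode callback: appends `len` bytes to the decoder input and
   reports whether a frame was produced. */
typedef DecStatus (*DecodeFn)(void *ctx, const uint8_t *data, size_t len);

/* Feeds `stream` from `pos` onward in chunks of at most `chunk` bytes until the
   decoder produces a frame. Returns the bytes consumed, or -1 if the data ran
   out or the decoder failed first. */
long feed_until_frame(DecodeFn decode, void *ctx, const uint8_t *stream,
                      size_t len, size_t pos, size_t chunk)
{
    size_t consumed = 0;
    while (chunk > 0 && pos + consumed < len) {
        size_t n = len - pos - consumed;
        if (n > chunk) n = chunk;
        DecStatus st = decode(ctx, stream + pos + consumed, n);
        consumed += n;
        if (st == DEC_OK) return (long)consumed;
        if (st != DEC_MORE_DATA) return -1;
    }
    return -1;
}

/* Toy decoder for trying the loop without the SDK: it asks for more data
   until it has buffered `need` bytes. */
typedef struct { size_t have, need; } ToyDecoder;

DecStatus toy_decode(void *ctx, const uint8_t *data, size_t len)
{
    ToyDecoder *d = ctx;
    (void)data;
    d->have += len;
    return d->have >= d->need ? DEC_OK : DEC_MORE_DATA;
}
```

With a ToyDecoder that needs 100 bytes and a chunk size of 64, feed_until_frame returns 128 after two calls.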
I know the question has been asked before, but I am not quickly finding the answer. Please let me know if you'd like me to continue to investigate.
Hi, I also found the following notes that might help:
Media SDK decoder bitstream repositioning is described in the Media SDK manual, but the following information explains the concept in the context of container handling, which is tightly connected to stream repositioning.
Please follow these steps to reposition a stream during a decoding session:
1. Invoke decoder MFXVideoDECODE_Reset() to reset the decoder
2. Clear the current Media SDK bitstream buffer (mfxBitstream)
3. Reposition the demuxer (splitter) to the desired frame, backward or forward in the stream
It is recommended that the demuxer does the following before delivering frames to the decoder:
a. Reposition to the I-frame closest to the desired position
b. Insert the sequence header (sequence parameter set, plus picture parameter set, for H.264, or sequence header for MPEG-2 and VC-1) into the stream before the I-frame
Note: Some streams may already contain sequence headers before each I-frame
c. If a sequence header is inserted before a frame that is not an I-frame, the decoder may produce artifacts
4. Read data into the Media SDK bitstream buffer from the new demuxer position
5. Resume decoding by calling DecodeFrameAsync as usual.
If the Media SDK decoder does not find any sequence headers while decoding from the new position, DecodeFrameAsync will keep returning MFX_ERR_MORE_DATA (effectively asking for more bitstream data).
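Steps 3a to 3c can be sketched as a helper that builds the first buffer after a seek: the cached SPS/PPS followed by the access unit of the I-frame, unless that access unit already carries both. One detail worth handling even though the note does not mention it: an access unit delimiter (nal_unit_type 9) must remain the first NAL unit of its access unit, so the headers are spliced in after it. This is a sketch assuming Annex B input with one complete access unit in `au`; all names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Offset of the next 3-byte start code (00 00 01) at or after `from`, or `len`. */
static size_t find_start(const uint8_t *b, size_t len, size_t from)
{
    for (size_t i = from; i + 3 <= len; i++)
        if (b[i] == 0 && b[i + 1] == 0 && b[i + 2] == 1) return i;
    return len;
}

/* 1 if both an SPS (7) and a PPS (8) appear before the first slice NAL (1-5). */
static int has_headers_before_slice(const uint8_t *au, size_t len)
{
    int sps = 0, pps = 0;
    for (size_t i = find_start(au, len, 0); i + 3 < len; i = find_start(au, len, i + 3)) {
        int t = au[i + 3] & 0x1F;
        if (t == 7) sps = 1;
        else if (t == 8) pps = 1;
        else if (t >= 1 && t <= 5) break;
    }
    return sps && pps;
}

/* Splice point: just after a leading AUD (type 9), otherwise 0. */
static size_t splice_offset(const uint8_t *au, size_t len)
{
    size_t first = find_start(au, len, 0);
    if (first + 3 >= len || (au[first + 3] & 0x1F) != 9) return 0;
    size_t next = find_start(au, len, first + 3);
    /* Keep the leading zero of a 4-byte start code with the following NAL. */
    if (next < len && next > 0 && au[next - 1] == 0) next--;
    return next;
}

/* Writes headers + access unit into `out`. Returns bytes written, 0 if too small. */
size_t build_seek_buffer(const uint8_t *hdr, size_t hdr_len,
                         const uint8_t *au, size_t au_len,
                         uint8_t *out, size_t cap)
{
    if (has_headers_before_slice(au, au_len)) hdr_len = 0; /* already present */
    if (hdr_len + au_len > cap) return 0;
    size_t at = splice_offset(au, au_len);
    memcpy(out, au, at);
    if (hdr_len) memcpy(out + at, hdr, hdr_len);
    memcpy(out + at + hdr_len, au + at, au_len - at);
    return hdr_len + au_len;
}
```

The `hdr` buffer would be the SPS/PPS the decoder was originally initialized with, which matches the assumption later in this thread that they stay constant for a camera recording.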
Thanks for your replies. I have to clarify one thing first :)
My sample bitstream is not from the Media SDK but from my own camera. It starts with the usual SPS, PPS, AUD and so on, followed by four IDR slices, but only the fourth (and last) IDR slice is followed by non-IDR slices, with some SPS/PPS headers and SEI information in between. From looking at the AUD NAL units and the read-head position, I can tell that all of the stream's frames sit after the fourth IDR slice.
I don't have the elementary stream with me at the moment, but I have the impression that there is indeed a set of SPS/PPS/SEI before each I-frame in the stream. Since the stream comes from a camera and I only want to support video files, I can assume that the SPS/PPS stay the same for the whole stream. So later today I will try feeding my stream starting with the SPS/PPS followed by an I-frame to get random access. I will also look at the document you suggested.
I was just thinking that there must be a "global solution" for random access that works with any decoder, since random access points should be anchored in the H264 specification. But maybe I'm wrong here and decoders can have different requirements for predicting and reconstructing the various frames?
Again, thanks for providing input for me. You are perfectly welcome to dig deeper, of course :)
I got it working. For future reference, in case someone else wants to know about this:
If I want to decode a particular frame (or start decoding from there) I do the following:
* reset the decoder as suggested above
* feed the first SPS/PPS (basically the same data I used to initialize the decoder originally) to the decoder
* seek to the last I slice with first_mb_in_slice equal to 0 that precedes the frame I'm actually after
* start feeding the decoder from that stream position and skip all synced frames until I get to the frame in question
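The seek logic in this list can be sketched against a per-frame index that the demuxer builds while scanning the stream. FrameIndex, find_seek_start and should_display are hypothetical names, and the sketch assumes that each frame's presentation timestamp (the PTS from the MTS container) and whether its first slice is an I slice are known:

```c
#include <stddef.h>
#include <stdint.h>

/* One entry per frame in decoding order. `is_i` marks frames whose slice with
   first_mb_in_slice == 0 is an I slice. Hypothetical structure. */
typedef struct { int64_t pts; int is_i; } FrameIndex;

/* Decoding-order index of the last I-frame whose presentation time is at or
   before the target, or -1 if there is none. */
long find_seek_start(const FrameIndex *frames, size_t count, int64_t target_pts)
{
    long best = -1;
    for (size_t i = 0; i < count; i++)
        if (frames[i].is_i && frames[i].pts <= target_pts)
            best = (long)i;
    return best;
}

/* Frames output before the target are decoded but dropped ("skip all synced
   frames until the frame in question"). */
int should_display(int64_t out_pts, int64_t target_pts)
{
    return out_pts >= target_pts;
}
```

Dropping output frames by timestamp, rather than by counting them, also handles B-frames that follow the I-frame in decoding order but precede it in display order: they may reference frames from before the seek point, and the timestamp check keeps them from being shown.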
Thanks a lot for pointing me to the right direction :)