Media (Intel® Video Processing Library, Intel Media SDK)
Access community support for transcoding, decoding, and encoding in applications using media tools like Intel® oneAPI Video Processing Library and Intel® Media SDK
The Intel Media SDK project is no longer active. For continued support and access to new features, Intel Media SDK users are encouraged to read the transition guide on upgrading from Intel® Media SDK to Intel® Video Processing Library (VPL), and to move to VPL as soon as possible.
For more information, see the VPL website.

How to predict h264-encoder's PTS-DTS shift?

New Contributor III

Consider a scenario where several h264 streams are stitched/concatenated (the streams are created by different encoders). This is called "Multiple-Segment Encoding" in the imsdk documentation.
The resulting stitched stream should look as if it were encoded continuously by one encoder from one uncompressed source. That is, all segments must have the same frame size, framerate, etc. Also, the joints must not have holes/overlaps in time.
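As a sketch only (the struct and field names below are illustrative, not imsdk structures), the compatibility requirements could be checked like this before concatenating two segments:

```cpp
#include <cassert>

// Sketch only: minimal compatibility check between two segments before
// concatenation. Field names are illustrative, not imsdk structures.
struct SegmentParams {
    int width, height;     // frame size
    int fpsN, fpsD;        // framerate as a rational number
    int ptsDtsShiftFrames; // PTS-DTS shift, in frame periods
};

bool canConcatenate(const SegmentParams& a, const SegmentParams& b) {
    return a.width == b.width && a.height == b.height &&
           a.fpsN == b.fpsN && a.fpsD == b.fpsD &&
           a.ptsDtsShiftFrames == b.ptsDtsShiftFrames;
}
```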

And now about the timestamps. A shift between PTS and DTS exists in every stream that contains B-frames (to use prediction from future frames, one or more of them must be decoded first) - see figure below. That PTS-DTS shift equals the maximum number of (reference) frames the encoder can use for such forward prediction.
Thus, all concatenated segments must have the same PTS-DTS shift as well; otherwise we get time holes/overlaps at the joints.
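To illustrate the shift (a generic sketch of timestamp reordering, not imsdk behaviour): with a constant reorder depth, every DTS is simply the PTS pushed back by that many frame durations, so the shift is constant across the stream and must match at every joint:

```cpp
#include <cassert>
#include <cstdint>

// Generic sketch, not the imsdk API: with a fixed reorder depth of
// `reorderFrames`, the decoder must run that many frames ahead of
// presentation, so every DTS is the PTS minus that many frame durations.
int64_t dtsFromPts(int64_t pts, int reorderFrames, int64_t frameDurationMs) {
    return pts - static_cast<int64_t>(reorderFrames) * frameDurationMs;
}
```

For a 25p stream (40 msec frame duration) and a reorder depth of 2, the shift is -80 msec.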

In the current h264 encoder implementation, the PTS-DTS shift depends on the GopRefDist and NumRefFrame encoder initialization parameters (although, in my opinion, the dependence on GopRefDist is superfluous).
With imsdk 1.7sw only a few shifts were obtainable: 0, -40 and -80 msec (for 25p/50i streams).
imsdk 1.8sw has evolved to: 0, -40, -80, -120 and -160 msec. This is good news - most real-world streams fit into these.
Yeah, the dependence formula is quite sophisticated :)
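As an illustration only (this is my reading of the numbers, NOT the actual imsdk dependence formula): if the shift is assumed to be one frame period per reorder frame, the 1.8sw values correspond to reorder depths of 0 to 4:

```cpp
#include <cassert>

// Hypothetical helper, NOT the real imsdk dependence formula: assume
// the shift is -reorderFrames frame periods. At 25 fps a frame period
// is 40 msec, so depths 0..4 give 0, -40, -80, -120, -160 msec.
int shiftMsec(int reorderFrames, int fps) {
    return -reorderFrames * 1000 / fps;
}
```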

Unfortunately, I have no grounds to expect that the dependence formula will stay unchanged in future imsdk versions (or between the sw and hw implementations).
But I want to be sure that Multiple-Segment Encoding will produce correct results in the future. I don't expect full hrd/vbv conformance at the joints (yet) - I only want the basic norms.

So, may I ask you to consider implementing a NumForwardRefFrame parameter (described here) in a future imsdk release?

2 Replies
New Contributor III

Sorry, I could not insert figures into the text normally...


Sorry for the delay in replying. Thank you for your excellent summary and suggestion. This is very relevant feedback and I've passed it on to the development team.

