Developing Games on Intel Graphics

Direct3D11 DXVA behavior issues on some Atom chipsets; audio/video synchronization lost in decoding


In a Windows Store app that our company is developing, we have encountered a problem: if the user repeatedly adjusts the layout during playback of MPEG2 content, synchronization between audio and video is lost. This lip-sync error persists even if the layout subsequently remains static. The problem occurs on two devices, one with the Atom Z3735F chipset and the other with the Atom Z3795 chipset.


When AVC content is played back and a lip-sync error develops, frames are dropped and synchronization between audio and video is restored. Proceeding on the assumption that our MPEG2 decoder must likewise adjust based on the current playback quality, we investigated and found the following.


1. Investigation into conditions of sync loss occurrence

We deliberately delayed the clock to determine the conditions under which sync was lost.



* If the video frame's PTS is delayed, the sync loss could be reproduced.

If the PTS is delayed by 1s during playback, lip sync is lost and is not subsequently restored. Introducing a delay of several seconds on the output of the decoder also reproduced the loss of sync. Finally, the problem was not reproduced using DirectX9 on a desktop version, but it did occur with DirectX11.
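
The PTS-delay experiment can be sketched roughly as follows. This is a minimal illustration rather than the decoder's actual code: it assumes timestamps in 100-nanosecond units (the unit Media Foundation uses for sample times), and the helper names `DelayPts` and `AvDriftMs` are hypothetical.

```cpp
#include <cstdint>

// Media Foundation sample times are expressed in 100-nanosecond units.
constexpr std::int64_t kHnsPerSecond = 10000000;
constexpr std::int64_t kHnsPerMillisecond = 10000;

// Hypothetical helper: returns the presentation timestamp shifted later
// by a whole number of seconds, as in the 1s delay experiment.
constexpr std::int64_t DelayPts(std::int64_t ptsHns, std::int64_t delaySeconds) {
    return ptsHns + delaySeconds * kHnsPerSecond;
}

// Hypothetical helper: signed audio/video drift in milliseconds.
// A positive value means the video frame is stamped later than the audio clock.
constexpr double AvDriftMs(std::int64_t videoPtsHns, std::int64_t audioClockHns) {
    return static_cast<double>(videoPtsHns - audioClockHns) / kHnsPerMillisecond;
}
```

With these helpers, a frame originally stamped at the audio clock position and then passed through `DelayPts(pts, 1)` shows a drift of 1000ms, which is the condition under which sync was lost and not restored.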



2. Investigation into what's causing the delay

Logging clock times, we established what was consuming time when sync was lost.



A) Sample requests from the renderer are late

At first, ProcessOutput is called at intervals of 2-3ms to try to catch up. The interval then slowly increases and, even though the video still lags, settles in the range of 30-40ms. During normal, correct playback, the request interval should be 33-34ms, matching the frame rate. If synchronization slips, requests should keep arriving at intervals of 2-3ms until synchronization is restored.


B) A Direct3D11 DXVA interface that is used internally by the decoder can take 31ms to complete

When requests from the renderer begin to arrive late, ID3D11VideoContext::DecoderBeginFrame, which is called to move processing from the renderer to the decoder, takes as long as 31ms to complete. Ordinarily this call completes inside of one millisecond.


3. Workarounds in the Decoder

When DecoderBeginFrame() takes a long time to return, reduce the workload until sync is restored.



A) Drop a frame in the decoder

If DecoderBeginFrame() takes more than 10ms, do not decode the current frame and do not pass data to the renderer. DecoderBeginFrame() subsequently does not take a long time, and lip sync is restored.

B) Deactivate deinterlace processing

If DecoderBeginFrame() takes more than 10ms, set the de-interlace flag to false for one frame before passing the data to the renderer. DecoderBeginFrame() subsequently does not take a long time, and lip sync is restored.
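
The two workarounds can be summarized as a per-frame decision. This is only a sketch of the logic described above: the 10ms threshold comes from the text, while the type and function names are hypothetical and no Direct3D calls are shown.

```cpp
// Which of the two workarounds from section 3 is active (hypothetical names).
enum class Workaround { DropFrame, DisableDeinterlace };

// What the decoder does with the current frame.
struct FrameDecision {
    bool decodeAndOutput;  // false: skip decoding and pass nothing to the renderer
    bool deinterlace;      // de-interlace flag passed along with the frame
};

// Threshold from the text: DecoderBeginFrame() taking more than 10ms is "slow".
constexpr double kSlowBeginFrameMs = 10.0;

// Decides how to handle the current frame given how long
// DecoderBeginFrame() took for it.
FrameDecision DecideFrame(double beginFrameMs, Workaround workaround) {
    FrameDecision decision{true, true};
    if (beginFrameMs > kSlowBeginFrameMs) {
        if (workaround == Workaround::DropFrame) {
            decision.decodeAndOutput = false;  // workaround A
        } else {
            decision.deinterlace = false;      // workaround B, for this frame only
        }
    }
    return decision;
}
```

Because the decision is recomputed for every frame, either workaround applies only while the call is slow and normal processing resumes as soon as it returns quickly again.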



4. In summary...

A) Even when the video is lagging, the interval between requests for ProcessOutput increases. At first, when video is lagging and sync is lost between audio and video, the decoder receives multiple immediate requests for data, but before synchronization can be restored, the request interval increases. We expect that DXVA acceleration is used not just by the decoder but also by the renderer, and hope that Intel can explain to us the driver mechanism that leads to the above behavior.

B) When (A) occurs, DecoderBeginFrame() takes approximately 30ms. What internal mechanism is causing this delay?

C) May we have documentation related to the above?

D) Could you explain the mechanism by which the workarounds in (3) address issues (1) and (2)? Additionally, please advise how we can restore audio/video synchronization without the workarounds in (3).


X) Related problem with high speed playback


Outline of Issue
In a related problem, MPEG2 playback at high playback rates (e.g., > 1.8) causes audio/video sync to be lost. The problem has been explored on the following configurations:

Problem reproduced: Atom Z3735F 1.33GHz
Problem not reproduced: Atom Z3795 1.66GHz, Atom Z2760 1.8GHz

Because the problem seems to occur on lower-specced chipsets, we set out to find the bottleneck. Logging timestamps showed that it lies within the decoder, in the method used to pass data to DXVA (SubmitDecoderBuffers, as named in the results below).


At a playback rate of 1.8, this method takes a long time to return. The time is not fixed: when slow, the call takes between 10 and 150ms to complete, most commonly 20-50ms. The slow calls always occur on the second B frame (B2 in the sequence I, B1, B2, P, B1, B2...); other frames are processed without delay. Normally, the method completes in under 2ms.

We tried the following workarounds:

(1) Drop B frames at the source
 Result? Dropping B frames at the IMFMediaSource restored audio/video sync at 1.8x playback.
(2) Turn off de-interlacing
 Result? Synchronization restored.
(3) Turn off de-interlacing ONLY when SubmitDecoderBuffers was slow to return
 Result? Synchronization is not restored.
(4) Don't output frames to the renderer when SubmitDecoderBuffers was slow to return
 Result? Synchronization was restored. In this case, the one B frame (B2 above) was dropped several times until synchronization was restored. The process was periodically repeated in order to maintain sync.

In the workarounds (2) and (4), audio/video sync was restored without dropping B frames at the source. With (1), B frames were dropped before they were even passed to the decoder. In all three cases, the GPU load was decreased and audio/video sync was restored.
In (3), it was not possible to restore synchronization simply by turning off de-interlacing for the second B frame. We imagine this is because the processing of the first B frame is still taking too much time. Furthermore, turning de-interlacing off for only one in four B frames is simply insufficient.

Regarding SubmitDecoderBuffers - when performing high speed processing, is the time taken by SubmitDecoderBuffers a fundamental limitation, or might there be another reason why it is taking so long? Can you explain the internal mechanisms which are causing SubmitDecoderBuffers to be the bottleneck in these circumstances?

2 Replies

I suspect this has something to do with NumBuffers availability. Did you try adjusting that property by any chance?


Thank you for taking the time to articulate the issue so thoroughly. Please give me some time to get your issue connected with the people at Intel that can get to the bottom of it. We may need to ask for some more detail from you, but we'll make sure we answer as much as we can without you having to go into further detail.
