I ran into what looks like a race condition, and maybe I am just abusing the Intel Media SDK API, but I'd like clarification on the following: in a pipeline with VPP and ENCODE, where I call MFXVideoCORE_SyncOperation on synchronization points from both RunFrameVPPAsync and EncodeFrameAsync, does the order matter?
The scenario is as follows:
1) I have a thread with a queue of synchronization points from EncodeFrameAsync; it pops them off the queue, synchronizes each with MFXVideoCORE_SyncOperation, and processes the resulting bitstreams.
2) The queue is populated from another thread by a function that takes an RGB frame, calls RunFrameVPPAsync on it, then EncodeFrameAsync, and finally MFXVideoCORE_SyncOperation on the synchronization point from RunFrameVPPAsync, so that the caller can dispose of the RGB frame when the function returns. I signal the thread described above before I call MFXVideoCORE_SyncOperation on the VPP synchronization point (my assumption was that waking it up earlier would reduce latency).
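To make the threading concrete, here is a minimal sketch of the pattern (a plain int stands in for mfxSyncPoint, and the actual Media SDK calls appear only as comments, so this is an illustration of the hand-off, not real pipeline code):

```cpp
#include <condition_variable>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

// Stand-in for an mfxSyncPoint; in the real pipeline this would come
// from EncodeFrameAsync.
using SyncPoint = int;

std::queue<SyncPoint> g_queue;  // encode sync points awaiting SyncOperation
std::mutex g_mutex;
std::condition_variable g_cv;
bool g_done = false;

// Consumer thread: pops encode sync points and "synchronizes" on each.
std::vector<SyncPoint> drain_queue() {
    std::vector<SyncPoint> processed;
    std::unique_lock<std::mutex> lock(g_mutex);
    for (;;) {
        g_cv.wait(lock, [] { return !g_queue.empty() || g_done; });
        while (!g_queue.empty()) {
            SyncPoint sp = g_queue.front();
            g_queue.pop();
            lock.unlock();
            // Real code: MFXVideoCORE_SyncOperation(session, sp, MFX_INFINITE);
            processed.push_back(sp);
            lock.lock();
        }
        if (g_done) return processed;
    }
}

// Producer: per frame, "RunFrameVPPAsync" + "EncodeFrameAsync", then push the
// encode sync point and signal the consumer BEFORE syncing the VPP point --
// exactly the ordering the question is about.
void submit_frames(int n) {
    for (int i = 0; i < n; ++i) {
        {
            std::lock_guard<std::mutex> lock(g_mutex);
            g_queue.push(i);   // sync point from EncodeFrameAsync
        }
        g_cv.notify_one();     // wake consumer early (lower latency)
        // Real code: MFXVideoCORE_SyncOperation(session, vpp_syncp, MFX_INFINITE);
        // => the consumer may reach the ENCODE sync before this VPP sync runs.
    }
    {
        std::lock_guard<std::mutex> lock(g_mutex);
        g_done = true;
    }
    g_cv.notify_one();
}
```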
I observed that with larger resolutions I tend to get MFX_ERR_DEVICE_FAILED from both VPP and ENCODE, while smaller resolutions work. I made traces of both cases (attached), and what is visible from them is that in the erroneous case the ENCODE synchronization comes first in the trace, while in the good case VPP is synchronized first.
Am I abusing the API (calling MFXVideoCORE_SyncOperation on an operation after already synchronizing on a later step), or is this an API error?
Both traces are attached.
Your pipeline is not very clear; a pictorial representation would really help us understand it better.
Here are some notes:
- Media SDK functions for VPP, Encode, and Decode are non-blocking asynchronous calls. So if you sync after every call, you are optimizing for latency but not throughput. If you want to improve throughput, you can increase the AsyncDepth parameter to allow multiple operations in flight (until you have to synchronize).
- (With or without async) one should call QueryIOSurf after the media pipeline is defined. This function returns the number of surfaces needed for the pipeline. If you have more than one pipeline, each of them should have its own QueryIOSurf call.
- Sync points are associated with the surfaces defined in the QueryIOSurf function and cannot be used across threads or handled in different threads. We do not recommend users thread their media application, since Media SDK, by design, implements threads under the SDK layer to execute on the GPU. You can certainly spawn multiple media sessions to process multiple media pipelines; see sample_multi_transcode for an example.
- As for why your app fails at one resolution but not another: it looks like a race condition. Before QueryIOSurf is called, the application defines the buffer size to be used for each surface. This buffer size, in combination with your sync/lock mechanism, may be giving rise to a scenario where small resolutions work while others don't; a race condition is the most likely cause.
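For reference, the single-session, single-threaded pattern these notes describe might look like the following C-style pseudocode sketch (error handling, allocation, and surface management elided; `session`, `vpp_par`, and `enc_par` are assumed to be already initialized, and `in_surf`/`work_surf`/`bs` are hypothetical names):

```c
/* Pseudocode sketch -- not a complete program. */
mfxFrameAllocRequest vpp_req[2], enc_req;
MFXVideoVPP_QueryIOSurf(session, &vpp_par, vpp_req);      /* VPP in/out surface counts */
MFXVideoENCODE_QueryIOSurf(session, &enc_par, &enc_req);  /* encoder surface count */

/* Allocate surfaces sized by the requests above, then per frame: */
mfxSyncPoint vpp_syncp, enc_syncp;
MFXVideoVPP_RunFrameVPPAsync(session, in_surf, work_surf, NULL, &vpp_syncp);
MFXVideoENCODE_EncodeFrameAsync(session, NULL, work_surf, &bs, &enc_syncp);

/* Synchronize in submission order, in the same thread: */
MFXVideoCORE_SyncOperation(session, vpp_syncp, MFX_INFINITE);  /* RGB input reusable */
MFXVideoCORE_SyncOperation(session, enc_syncp, MFX_INFINITE);  /* bitstream ready */
```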
Again, we need a pictorial representation of your pipeline to give more focused feedback. Thanks.
Thanks for the reply. As for the notes: I am optimizing for latency, so as seen in the log AsyncDepth is always set to 1 (this is also why I was not calling QueryIOSurf on VPP, since there should not be more than one working frame, and this used to work before).
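For clarity, this is the setting I mean (a config fragment only; AsyncDepth is the top-level field of mfxVideoParam):

```c
mfxVideoParam par = {0};
par.AsyncDepth = 1;  /* one operation in flight -> minimal latency */
```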
"Sync points are associated with the surfaces defined in the QueryIOSurf function and cannot be used across threads or handled in different threads."
Is this true? If so, then the pipeline in the attached image certainly does not meet the criteria, but why did it work before? Coincidence? Does this mean that I cannot push more work to the encoder without synchronizing on the bitstream in the same thread if I am going for minimal latency? Two threads seemed to solve this issue fine.
Pipeline visualization attached.