Parallelizing video encoder with TBB?

Nick_Chen · ‎08-03-2011

Has anyone implemented video encoders with complex dependencies between frames (like MPEG-4 or H.264) with TBB?

The encoding algorithm has a pipeline structure:

Input (token = frame) -> Predict (uses reference frames) -> Residuals -> Quantization -> Encode -> Output (also copied to global buffer)

so we'd like to exploit this with a TBB pipeline. However, dependencies between frames make this very difficult.

These encoders allow a frame to predict its type based on frames before and/or after itself. The referenced frames can be arbitrarily far away (though typically within the previous 5 or next 5 frames, say) and they must be encoded and stored in a global buffer before they can be used as a reference. In short, there's a dependency DAG for frames.

The problem is making sure reference frames are completed before a frame requires them in the prediction stage. Simply waiting on a condition variable is unsatisfactory because it blocks the entire worker thread, so most of the time only a few threads are actually doing work.

Any suggestions? Sample code showing how to handle complex dependencies in similar applications would also be helpful.

To provide more context, this is being done in a research setting. We are trying to see whether TBB can express the parallel patterns used in the PThreads versions of applications. In this case, the reference application is x264, whose threading scheme is complex.

RafSchietekat · ‎08-03-2011

A dependency DAG can be expressed as a normal task tree plus additional explicit reference-count manipulations, possibly composed with dependencies from subdividing frames.
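One way to read this suggestion, sketched in plain standard C++ rather than with tbb::task so that it stays self-contained: each frame carries a count of unfinished reference frames, a finishing frame decrements the counts of the frames that predict from it, and a frame becomes runnable when its count reaches zero. The names FrameDag, run_frames and encode are hypothetical, and the small thread pool only stands in for a task scheduler. With raw TBB tasks, the counter would presumably be the task's reference count and "becoming ready" would mean spawning the task.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical input: refs[i] lists the frames that frame i predicts from.
struct FrameDag {
    std::vector<std::vector<std::size_t>> refs;
};

// Calls encode(i) once per frame, starting a frame only after all of its
// reference frames have finished. Assumes the graph is acyclic.
// Returns the order in which frames completed.
template <typename Encode>
std::vector<std::size_t> run_frames(const FrameDag& dag, unsigned workers, Encode encode) {
    assert(workers > 0);
    const std::size_t n = dag.refs.size();
    std::vector<std::size_t> pending(n);             // unfinished references per frame
    std::vector<std::vector<std::size_t>> users(n);  // reverse edges: who predicts from me
    std::deque<std::size_t> ready;                   // frames whose count reached zero
    for (std::size_t i = 0; i < n; ++i) {
        pending[i] = dag.refs[i].size();
        for (std::size_t r : dag.refs[i]) users[r].push_back(i);
        if (pending[i] == 0) ready.push_back(i);
    }

    std::mutex m;
    std::condition_variable cv;
    std::vector<std::size_t> order;

    auto worker = [&] {
        for (;;) {
            std::size_t f;
            {
                std::unique_lock<std::mutex> lock(m);
                // Idle only when no frame at all is ready, never on one specific frame.
                cv.wait(lock, [&] { return !ready.empty() || order.size() == n; });
                if (ready.empty()) return;  // every frame has been encoded
                f = ready.front();
                ready.pop_front();
            }
            encode(f);  // runs outside the lock, in parallel with other frames
            {
                std::lock_guard<std::mutex> lock(m);
                order.push_back(f);
                for (std::size_t u : users[f])
                    if (--pending[u] == 0) ready.push_back(u);
            }
            cv.notify_all();
        }
    };

    std::vector<std::thread> pool;
    for (unsigned w = 0; w < workers; ++w) pool.emplace_back(worker);
    for (std::thread& t : pool) t.join();
    return order;
}
```

A worker here sleeps only when nothing in the whole graph is runnable, which is different from parking a thread until one particular reference frame arrives. Calling run_frames(dag, 4, fn) with refs = {{}, {0, 4}, {0, 4}, {0, 4}, {0}} models an I frame, a P frame and three B frames between them.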

Nick_Chen · ‎08-05-2011

Has anyone tried whether the new Graph interface in TBB's community preview would be suitable for expressing this DAG? Or would it be too heavyweight, so that a bare-bones implementation using custom task trees would work better?

We are just trying to get some advice before proceeding.

ARCH_R_Intel · ‎08-05-2011

[Moving this reply from duplicate thread.]

tbb::pipeline is unlikely to have the flexibility to do what you wish. Using either the new tbb::flow::graph framework or using raw tbb::task objects seems appropriate. Chapters 9 and 10 of the TBB Design Patterns manual have examples of building special-purpose schedulers on top of tbb::task.

Another approach that I have heard other people use is to turn the problem into embarrassing parallelism by giving up a little compression. Break the input into separate chunks and compress each separately. That means the beginning of each chunk cannot use frames in the previous chunk as reference frames, hence some compression is lost. Of course this is not practical for streaming applications.
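A rough sketch of the chunking idea, in standard C++ with std::async standing in for whatever parallel loop is actually used (in TBB, a parallel loop over the chunks would be the natural fit). Everything named here is hypothetical: split_into_chunks, encode_chunk and encode_all are illustrations, and encode_chunk is a placeholder that only emits 'I' for the first frame of a chunk and 'P' for the rest, to make visible that each chunk must restart from a frame with no references.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <future>
#include <string>
#include <utility>
#include <vector>

using Range = std::pair<std::size_t, std::size_t>;  // frame indices [begin, end)

// Cuts frames 0..frame_count-1 into consecutive chunks of at most chunk_size.
std::vector<Range> split_into_chunks(std::size_t frame_count, std::size_t chunk_size) {
    assert(chunk_size > 0);
    std::vector<Range> chunks;
    for (std::size_t b = 0; b < frame_count; b += chunk_size)
        chunks.emplace_back(b, std::min(b + chunk_size, frame_count));
    return chunks;
}

// Placeholder encoder: the first frame of a chunk cannot reference the
// previous chunk, so it is marked 'I'; later frames are marked 'P'.
std::string encode_chunk(Range r) {
    std::string out;
    for (std::size_t i = r.first; i < r.second; ++i)
        out += (i == r.first ? 'I' : 'P');
    return out;
}

// Encodes every chunk independently and concatenates results in input order.
std::string encode_all(std::size_t frame_count, std::size_t chunk_size) {
    std::vector<std::future<std::string>> jobs;
    for (Range r : split_into_chunks(frame_count, chunk_size))
        jobs.push_back(std::async(std::launch::async, encode_chunk, r));
    std::string bitstream;
    for (std::future<std::string>& j : jobs) bitstream += j.get();
    return bitstream;
}
```

With 7 frames and a chunk size of 3, encode_all yields three chunks, each opening with its own 'I' frame. Those extra reference-free frames are where the lost compression comes from.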

One piece of advice if you build your own scheduler on top of tbb::task: Write it as a template so that you can instantiate it with dummy types that do heavy-duty checking. Being able to solidly test it without the complications of real video encoding logic will save you grief.
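To illustrate the shape of that advice (not any particular TBB scheduler): below, a hypothetical schedule function is templated on the work type, and a dummy CheckingWork type does no encoding at all and only verifies that no frame runs twice and that every reference frame has finished first. The scheduler is deliberately serial to keep the sketch short. A real one would run frames in parallel, and the checker's flags would then need to be atomic.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// refs[i] = frames that frame i depends on (hypothetical layout).
using Refs = std::vector<std::vector<std::size_t>>;

// Scheduler parameterized on the work type; Work must provide run(std::size_t).
// Returns how many frames were executed.
template <typename Work>
std::size_t schedule(const Refs& refs, Work& work) {
    const std::size_t n = refs.size();
    std::vector<std::size_t> pending(n);             // unfinished references per frame
    std::vector<std::vector<std::size_t>> users(n);  // reverse edges
    std::vector<std::size_t> ready;
    for (std::size_t i = 0; i < n; ++i) {
        pending[i] = refs[i].size();
        for (std::size_t r : refs[i]) users[r].push_back(i);
        if (pending[i] == 0) ready.push_back(i);
    }
    std::size_t executed = 0;
    while (!ready.empty()) {
        const std::size_t f = ready.back();
        ready.pop_back();
        work.run(f);
        ++executed;
        for (std::size_t u : users[f])
            if (--pending[u] == 0) ready.push_back(u);
    }
    return executed;
}

// Dummy work type: performs no encoding, only checks scheduling invariants.
struct CheckingWork {
    const Refs& refs;
    std::vector<char> finished;
    explicit CheckingWork(const Refs& r) : refs(r), finished(r.size(), 0) {}
    void run(std::size_t f) {
        if (finished[f]) throw std::logic_error("frame scheduled twice");
        for (std::size_t r : refs[f])
            if (!finished[r]) throw std::logic_error("reference frame not ready");
        finished[f] = 1;
    }
};
```

The same schedule template would later be instantiated with the real encoder type, while tests run it against CheckingWork on many small or randomly generated dependency graphs.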

RafSchietekat · ‎08-08-2011

"tbb::pipeline is unlikely to have the flexibility to do what you wish. Using either the new tbb::flow::graph framework or using raw tbb::task objects seems appropriate. Chapters 9 and 10 of the TBB Design Patterns manual have examples of building special-purpose schedulers on top of tbb::task."
How would graph offer useful additional flexibility over pipeline in this situation, and does that assume subtiling?

Maybe I missed something, but the dependencies seem too ad hoc and continuously changing for any existing algorithm. A coder can isolate itself between substantial scene changes (unpredictable in both occurrence and content), but what is the cost (computational, compressionwise and visually) of subtiling? I suppose that tying division of work to the motion prediction logic could be too ambitious for an initial attempt, and I don't know enough to be able to even guess how beneficial it would be, but it's something I would be tempted to explore.