What's the best strategy to fully utilize the computer with hardware/software encoding?
My objective is to fully utilize the computer to encode as many frames as possible using a combination of the graphics card and CPU. The input is already decoded into NV12 format, waiting to be distributed to the various encode threads described below; there is no VPP stage. Each call to encode is followed by an immediate sync to wait for the result.
Assume the computer has N cores and an Intel graphics card. Here are a couple of approaches I can think of:
(1) Create N+1 threads using MFX_IMPL_AUTO and share all sessions, hoping that the Media SDK is smart enough to distribute the software and hardware encoding between them. Encoding parameters are set to use a single thread for all sessions.
(2) Create N+1 threads: one uses MFX_IMPL_HARDWARE, and the remaining N use MFX_IMPL_SOFTWARE with shared sessions. Encoding parameters are set to use a single thread for all sessions.
(3) Create 2 threads: one uses MFX_IMPL_HARDWARE with its encoding parameters set to use a single thread; the other uses MFX_IMPL_SOFTWARE with its encoding parameters set to use N threads.
By graphics card you mean Intel processor graphics, not a discrete card, right?
Some comments on the options you listed:
(1) If the Intel Media SDK is used with the AUTO implementation, the actual implementation (HW or SW) is selected when the session is initialized. There is no logic for load balancing between HW and SW.
(2) This may be the best approach for what you're trying to do. However, if you want to perform multiple stream encodes using the HW, you will likely need a corresponding number of threads. The trick to this approach is knowing how much load you can put on the HW, while maintaining your target fps, without saturating the GPU. If your workloads are uniform, this may be a simple calculation (based on your own workload metrics) to determine the number of HW encodes that can run simultaneously. If your workloads are not uniform, inferring GPU load will be complicated. Instead, you may want to explore the new capabilities of Intel Graphics Performance Analyzer (GPA) 4.1, for which we have added an API to query the GPU for its momentary load.
Note that the NumThread parameter is deprecated starting from Media SDK 3.0. This parameter has no real relevance for the HW codec case. In the SW codec case, make sure to use Join and Disjoin to enable efficient sharing of Media SDK resources (avoiding thread oversubscription).
Yes, I'm using Intel processor graphics. Thanks for your advice. In response to (2): the input queue will feed work to the HW or SW threads whenever their last task is done, so the workload should balance automatically. That's why I think a single HW thread is good enough, unless the HW can do multiple encodes in parallel.
Good to know. Then I will run simulations with N SW threads and M HW threads to figure out the optimal value of M! If I know how many simultaneous encode sessions the HW can handle, I can set M to that.