We have the three platforms below, all running the same configuration: stock CentOS 7.4 with Media Server Studio 2018 R2.
- Xeon E3-1585v5 (Supermicro X11SSH-GF-1585)
- Xeon E3-1285v4 (Supermicro X10SLH-F)
- Core i3-6100 (Supermicro X11SSH-LN4F)
We are experiencing some strange performance stats when comparing transcoding throughput. All testing below has been completed using sample_multi_transcode with the following options:
$ sample_multi_transcode -i::h264 test_es.264 -async 4 -o::h264 out.264 -b target_bitrate
What we are finding is that performance of the 1585v5 with smaller frame sizes is much lower than we expected, and in most cases it is beaten by the Broadwell Xeon and the Skylake i3. Figures below are for a single instance of sample_multi_transcode.
1585v5 = 1270 fps
1285v4 = 1614 fps
i3-6100 = 1560 fps
1585v5 = 371 fps
1285v4 = 413 fps
i3-6100 = 381 fps
1585v5 = 528 fps
1285v4 = 558 fps
i3-6100 = 466 fps
1585v5 = 312 fps
1285v4 = 328 fps
i3-6100 = 257 fps
1585v5 = 111 fps
1285v4 = 106 fps
i3-6100 = 79 fps
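To make the gap easier to see, the figures above can be expressed as the 1585v5's throughput relative to the faster of the other two platforms. This is only a rough sketch: the groups are indexed in the order listed, and the helper name `relative_to_best_other` is made up for illustration.

```python
# Figures copied from the list above, one row per group in the order listed.
fps = [
    {"1585v5": 1270, "1285v4": 1614, "i3-6100": 1560},
    {"1585v5": 371, "1285v4": 413, "i3-6100": 381},
    {"1585v5": 528, "1285v4": 558, "i3-6100": 466},
    {"1585v5": 312, "1285v4": 328, "i3-6100": 257},
    {"1585v5": 111, "1285v4": 106, "i3-6100": 79},
]


def relative_to_best_other(row, target="1585v5"):
    """Return target fps divided by the best fps among the other platforms."""
    best_other = max(v for k, v in row.items() if k != target)
    return row[target] / best_other


ratios = [relative_to_best_other(row) for row in fps]
for i, r in enumerate(ratios, start=1):
    print(f"group {i}: 1585v5 at {r:.0%} of the best other platform")
```

A ratio below 1 means the 1585v5 was slower than at least one of the other two platforms in that group.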
The only time the 1585v5 comes out on top (and not by much) is when processing UHD frames. Running our own transcoding application which uses the SDK directly gives us very similar results, as does executing multiple transcoding jobs in parallel. Given the larger number of EUs we were expecting to see a much higher level of performance across the board with the v5 Xeon than we seem to be achieving. Are we missing something or is this expected behavior?
Thanks in advance,
I checked some of our validation results for a test case with one input stream fanned out to multiple output streams, and I don't think this is the expected result.
From your test, it seems you are running a 1-to-1 transcode with sample_multi_transcode. Can you confirm?
There are also cases where certain content affects the test differently. What kind of content are you using?
I also looked at the performance on different platforms and I didn't see a big difference between Core and Xeon.
Please also be careful with the test configuration. Make sure the clip is long enough, since the first 10 seconds or so are spent setting up the system. In this context, the longer you run, the more stable the environment is.
Please also change "-async" to 1 and set "-hw" explicitly.
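One way to keep the setup period out of the numbers is to compute fps only over the part of the run after the warm-up window. The sketch below assumes you can log (elapsed seconds, cumulative frames) pairs during a run; the function name `steady_state_fps` and the sample values are hypothetical and are not output from sample_multi_transcode.

```python
def steady_state_fps(samples, warmup_s=10.0):
    """Average fps after the warm-up window.

    samples: list of (elapsed_seconds, cumulative_frames), sorted by time.
    """
    steady = [(t, n) for t, n in samples if t >= warmup_s]
    if len(steady) < 2:
        raise ValueError("run is too short to measure past the warm-up window")
    (t0, n0), (t1, n1) = steady[0], steady[-1]
    return (n1 - n0) / (t1 - t0)


# Hypothetical progress log: slow start, then a steady rate.
samples = [(0.0, 0), (5.0, 1500), (10.0, 4000), (20.0, 10000), (30.0, 16000)]

overall = samples[-1][1] / samples[-1][0]  # includes the warm-up period
steady = steady_state_fps(samples)         # excludes the first 10 seconds

print(f"overall: {overall:.1f} fps, steady state: {steady:.1f} fps")
```

With a short clip the two values can differ noticeably, which is why a longer run gives a more representative figure.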