Hi,
Thanks for providing the source code and the instructions on how to use it.
If I understand you correctly, you are encoding multiple video streams simultaneously, in a kind of "time slicing" / round-robin fashion, and you observe different behavior depending on the number of encodes you run.
The answer is quite complex and depends on how the system is set up. Only an intensive performance tuning session on that system would give a definite answer, and a trace file is needed to fully understand what is happening. You might want to look at the following two tools, which will give you more insight:
Since I do not have access to the same platform you are using, I reran your code on a Core i7-4650U Windows 8.1 system with the following performance figure:
Given the way your software is architected, I don't see anomalies here. The tutorial sample you are using is meant for getting started with Intel® Media SDK. For best performance, look at the code samples here, which we also use for performance measurements.
You are measuring performance including disk I/O, which might affect what you see. Remove writing the elementary stream to disk and try using just one RAW media stream held in memory; this limits the influence of disk I/O. As a general guideline to start with, ensure that you are using GPU HW acceleration and that the GPU is always running at its maximum frequency. Joining sessions and using an asynchronous pipeline with opaque memory will also improve performance, as demonstrated in the additional encoding tutorial samples. Keep in mind that the CPU and the GPU share the same TDP and that Turbo Boost effects influence the performance behavior.
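To make the "one RAW stream in memory, no disk writes" suggestion concrete, here is a minimal sketch of a measurement loop. It is not taken from the tutorial code: `RawFrame`, `MakeNv12Frame`, `MeasureThroughput` and the `encodeFrame` callback are hypothetical names, and the callback stands in for whatever encode call your application makes.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <vector>

// Hypothetical stand-in for one raw NV12 frame kept in memory.
struct RawFrame {
    int width;
    int height;
    std::vector<std::uint8_t> data;  // NV12 needs width * height * 3 / 2 bytes
};

// Builds a single gray NV12 frame once, so no file is read during the timed loop.
RawFrame MakeNv12Frame(int width, int height) {
    assert(width > 0 && height > 0 && width % 2 == 0 && height % 2 == 0);
    RawFrame frame{width, height, {}};
    frame.data.assign(static_cast<std::size_t>(width) * height * 3 / 2, 0x80);
    return frame;
}

struct ThroughputResult {
    std::size_t frames;
    double seconds;
    double fps;
};

// Feeds the same in-memory frame `frameCount` times to `encodeFrame` and
// times only that loop. The callback must not write the bitstream to disk,
// otherwise disk I/O is measured again.
ThroughputResult MeasureThroughput(
    const RawFrame& frame, std::size_t frameCount,
    const std::function<void(const RawFrame&)>& encodeFrame) {
    const auto start = std::chrono::steady_clock::now();
    for (std::size_t i = 0; i < frameCount; ++i) {
        encodeFrame(frame);
    }
    const auto end = std::chrono::steady_clock::now();
    const double seconds = std::chrono::duration<double>(end - start).count();
    const double fps = seconds > 0.0 ? frameCount / seconds : 0.0;
    return {frameCount, seconds, fps};
}
```

Calling `MeasureThroughput(MakeNv12Frame(1920, 1080), 1000, yourEncodeCall)` would then report frames per second for the encode path alone, which is easier to compare across different stream counts.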
Since you are looking for the best platform throughput, you will also want to look at the Multi-Transcoding-Sample.
Best Regards
Bjoern
Hi Minliang,
Sure, please take a look at the following samples:
- Tutorial: simple_3_encode_vmem
- Tutorial: simple_3_encode_vmem_async (go for async of 2 for best performance)
- Samples: sample_multi_transcode
Those samples include additional optimizations that you will want to integrate.
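The async tutorial's central idea can be sketched without the SDK. The code below is a simplified illustration, not the tutorial source: `MockEncoder` and `EncodeWithAsyncDepth` are hypothetical names, and in a real pipeline `Submit` and `Sync` would correspond to `MFXVideoENCODE_EncodeFrameAsync` and `MFXVideoCORE_SyncOperation`. With a depth of 1 it degenerates into the "encode one frame, then wait" loop; with a depth of 2 the next frame is submitted before the previous one is synced.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <deque>
#include <string>
#include <vector>

// Hypothetical mock that only records the order of calls.
struct MockEncoder {
    std::vector<std::string> log;

    // Stands in for an asynchronous submit; returns a token to sync on later.
    int Submit(int frame) {
        log.push_back("submit " + std::to_string(frame));
        return frame;
    }

    // Stands in for blocking until the operation behind `token` has finished.
    void Sync(int token) {
        log.push_back("sync " + std::to_string(token));
    }
};

// Keeps up to `asyncDepth` operations in flight and syncs the oldest one
// only when the queue is full. Returns the peak number of in-flight frames.
std::size_t EncodeWithAsyncDepth(MockEncoder& encoder, int frameCount,
                                 std::size_t asyncDepth) {
    assert(asyncDepth >= 1);
    std::deque<int> inFlight;
    std::size_t peak = 0;
    for (int frame = 0; frame < frameCount; ++frame) {
        if (inFlight.size() == asyncDepth) {
            encoder.Sync(inFlight.front());
            inFlight.pop_front();
        }
        inFlight.push_back(encoder.Submit(frame));
        peak = std::max(peak, inFlight.size());
    }
    // Drain the operations that are still pending after the last submit.
    while (!inFlight.empty()) {
        encoder.Sync(inFlight.front());
        inFlight.pop_front();
    }
    return peak;
}
```

For four frames and a depth of 2, the recorded order is submit 0, submit 1, sync 0, submit 2, sync 1, submit 3, sync 2, sync 3, so the hardware always has a second frame queued while the application waits on the first.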
One more thing I observed: you are allocating resources for all 18 streams at the beginning, and you do that even when you run a lower number of encodes. Make sure you only acquire resources you are actually using. Do not oversubscribe the HW resources; the driver might handle it, but you will lose performance.
Once you have reached maximum throughput, bundle streams up in chunks of work. In the above graph, your code works best running 3 encodes in parallel (with the optimizations above, you can do more!). So only add an additional stream when one encode has completed. This way you are always running at maximum throughput.
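A rough sketch of that scheduling rule follows. It is a simulation under simplifying assumptions, not SDK code: `RunBounded` and `ScheduleResult` are hypothetical names, and the model assumes every active stream encodes exactly one frame per tick. A new stream is admitted only when fewer than `maxParallel` streams are active.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

struct ScheduleResult {
    std::vector<int> startTick;  // tick at which each stream was admitted
    std::size_t peakActive;      // highest number of simultaneously active streams
    int totalTicks;              // ticks until all streams finished
};

// Simulates encoding `framesPerStream[i]` frames for each stream while never
// running more than `maxParallel` streams at once.
ScheduleResult RunBounded(const std::vector<int>& framesPerStream,
                          std::size_t maxParallel) {
    assert(maxParallel >= 1);
    const std::size_t count = framesPerStream.size();
    std::vector<int> remaining = framesPerStream;
    std::vector<int> startTick(count, -1);
    std::vector<std::size_t> active;
    std::size_t next = 0;
    std::size_t peak = 0;
    int tick = 0;

    while (next < count || !active.empty()) {
        // Admit waiting streams only while a slot is free.
        while (next < count && active.size() < maxParallel) {
            startTick[next] = tick;
            active.push_back(next);
            ++next;
        }
        peak = std::max(peak, active.size());

        // Simplified model: each active stream encodes one frame per tick.
        for (std::size_t id : active) {
            --remaining[id];
        }

        // Completed streams release their slot for the next waiting stream.
        active.erase(std::remove_if(active.begin(), active.end(),
                                    [&](std::size_t id) { return remaining[id] <= 0; }),
                     active.end());
        ++tick;
    }
    return {startTick, peak, tick};
}
```

With `RunBounded({3, 1, 2, 2, 1}, 3)`, the first three streams start at tick 0, the fourth at tick 1 and the fifth at tick 2, so three slots stay busy for the whole run. In a real application the limit would be whatever stream count gives the highest measured throughput on your platform.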
Let me know how it worked out for you.
Best,
Bjoern
Bjoern,
Thank you so much.
1. I allocated 18 streams at the beginning because I wanted to see whether this would affect the result. I tested again on R7, and it did have an effect: after I reduced the resources to 12, it can sometimes run 12 streams at 500+ FPS, but sometimes it drops to 200 FPS. That only reduces how often the problem occurs. I suspect it matters that I allocate the resources in HW mode but allocate the surfaces in memory; does that affect the performance?
2. I encode the streams in one thread, inside a while(){} loop. I encode one frame, then wait for the result with SyncOperation. After that, I encode the next frame. That means I encode only one frame at a time. No matter how many resources I allocate, the GPU encodes only one frame at a time, so the performance should change little. Is that wrong?
3. I want to know, in your project or in some other way, how many streams can be encoded at the same time? Is there another way to encode more than 20 streams? I want to use the Media SDK for real-time monitoring, so I cannot encode the streams one by one.
4. "Once you reached maximum throughput, bundle streams up in chunks of work." Does that mean that when nDest > 12, it has reached the maximum throughput?
Now I need to take some time to look at the code you mentioned. Thanks again. I see that your post time is 10/19/2015 - 23:14; how hard you work! Take care and good night.
Best wishes
minliang ou.
Hi Minliang,
Let me close this thread here.
Best,
Bjoern Bruecher