Hello,
We have been using the Intel Media SDK for a number of years with great success. Recently, however, we have run into a rather major issue that we have been unable to resolve, and we would appreciate your feedback on it.
In short, we push your platforms pretty hard, using significant CPU and memory bandwidth for real-time video processing. We then use the Media SDK to perform hardware-assisted H.264 encoding. However, the moment we start encoding a 1920x1080 interlaced video stream, the performance of the entire system drops dramatically. All processes, even ones that are not using the Media SDK, immediately seem to use almost twice as much CPU. When we encode at lower resolutions (e.g. 720x480) we do not see the same problem.
I have attached an example CPU plot that shows the problem. If we run our application as normal, it uses some level of CPU. When we then start recording using the Intel Media SDK, even a single stream of video makes the overall CPU usage jump dramatically. What cannot be seen on this plot is that the CPU usage of the Media SDK process itself is relatively low, but while it is running the "CPU time" spent in all the other processes increases a lot, presumably because the overall memory bandwidth available to the CPU drops and so reads and writes tend to stall.
I have made a test application that demonstrates at least part of this problem. It basically works as follows:
1) It launches 8 threads doing memcpy's and measures the copy throughput on each one.
2) After 30 seconds it starts writing a 1080i M4V file to disk using the Intel Media SDK with hardware encoding.
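For readers without the attached test application, step 1 can be approximated with a minimal sketch like the one below. It is not the original test code: the function names, the buffer size, and the measurement interval are all assumptions chosen for illustration.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstring>
#include <thread>
#include <vector>

// Hypothetical helper: copies one buffer into another repeatedly for roughly
// `seconds` and returns the measured throughput in GB/s (1 GB = 1e9 bytes).
double measure_memcpy_gbps(std::size_t buffer_bytes, double seconds) {
    assert(buffer_bytes > 0 && seconds > 0.0);
    std::vector<unsigned char> src(buffer_bytes, 0x5A), dst(buffer_bytes);
    volatile unsigned char sink = 0;  // read back a byte so the copy is observable
    using clock = std::chrono::steady_clock;
    const auto start = clock::now();
    std::size_t copied = 0;
    double elapsed = 0.0;
    do {
        std::memcpy(dst.data(), src.data(), buffer_bytes);
        sink = dst[buffer_bytes - 1];
        copied += buffer_bytes;
        elapsed = std::chrono::duration<double>(clock::now() - start).count();
    } while (elapsed < seconds);
    (void)sink;
    return static_cast<double>(copied) / elapsed / 1e9;
}

// Hypothetical driver: runs `thread_count` copy loops concurrently and
// returns one throughput figure per thread.
std::vector<double> run_memcpy_threads(unsigned thread_count,
                                       std::size_t buffer_bytes,
                                       double seconds) {
    std::vector<double> results(thread_count, 0.0);
    std::vector<std::thread> workers;
    for (unsigned i = 0; i < thread_count; ++i) {
        workers.emplace_back([&results, i, buffer_bytes, seconds] {
            results[i] = measure_memcpy_gbps(buffer_bytes, seconds);
        });
    }
    for (auto& w : workers) w.join();
    return results;
}
```

Calling `run_memcpy_threads(8, 64u << 20, 1.0)` once before and once while an encode is running gives per-thread figures comparable in spirit to the ones listed below. A buffer well above the last-level cache size is assumed so that the copies actually exercise main memory.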
What you see is basically the following:
Thread 1, memcpy = 1.47Gb/s
Thread 3, memcpy = 1.44Gb/s
Thread 2, memcpy = 1.43Gb/s
Thread 5, memcpy = 1.43Gb/s
Thread 0, memcpy = 1.43Gb/s
Thread 6, memcpy = 1.42Gb/s
Thread 7, memcpy = 1.36Gb/s
Thread 4, memcpy = 1.36Gb/s
[ snip ]
Starting to record to disk using Intel Media SDK ...
[ snip ]
Thread 1, memcpy = 0.99Gb/s
Thread 5, memcpy = 1.21Gb/s
Thread 0, memcpy = 0.95Gb/s
Thread 3, memcpy = 1.41Gb/s
Thread 7, memcpy = 1.07Gb/s
Thread 4, memcpy = 1.17Gb/s
Thread 2, memcpy = 1.11Gb/s
Thread 6, memcpy = 1.11Gb/s
In other words, memory performance drops by up to about 30% when the Media SDK is encoding.
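To put numbers on the figures listed above, the following sketch computes the drop per thread and on average. The arrays are transcribed from the two listings and indexed by thread number; the helper names are made up for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Throughput per thread (index = thread number), transcribed from the listings.
const std::vector<double> kBeforeGbps = {1.43, 1.47, 1.43, 1.44, 1.36, 1.43, 1.42, 1.36};
const std::vector<double> kDuringGbps = {0.95, 0.99, 1.11, 1.41, 1.17, 1.21, 1.11, 1.07};

// Arithmetic mean of a non-empty list of samples.
double mean(const std::vector<double>& v) {
    assert(!v.empty());
    double sum = 0.0;
    for (double x : v) sum += x;
    return sum / static_cast<double>(v.size());
}

// Relative drop from `before` to `after`, in percent.
double percent_drop(double before, double after) {
    assert(before > 0.0);
    return (before - after) / before * 100.0;
}

// Largest per-thread drop, in percent.
double worst_thread_drop(const std::vector<double>& before,
                         const std::vector<double>& after) {
    assert(before.size() == after.size() && !before.empty());
    double worst = percent_drop(before[0], after[0]);
    for (std::size_t i = 1; i < before.size(); ++i) {
        const double d = percent_drop(before[i], after[i]);
        if (d > worst) worst = d;
    }
    return worst;
}
```

With these inputs, `percent_drop(mean(kBeforeGbps), mean(kDuringGbps))` comes out at roughly 20%, while `worst_thread_drop(kBeforeGbps, kDuringGbps)` is roughly 34% (thread 0). Thread 3 is barely affected at about 2%.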
Our best guess is that memory bandwidth across the system drops dramatically when using the hardware encoder, and would appreciate any guidance that you might have in either diagnosing or avoiding this issue.
Thank you,
Cary Tetrick on behalf of Andrew Cross.
Hi Cary,
Can you please share some more information about your system configuration such as: Processor/Platform, OS, Media SDK version, driver version.
To understand your workload, also please expand on the pipeline: Are you using Encode+VPP or just Encode, system memory or D3D(9 or 11) memory surfaces, muxing with audio?
Regards,
Petter
Intel Media SDK System Analyzer (64 bit)
The following versions of Media SDK API are supported by platform/driver:
Version  Target  Supported  Dec  Enc
1.0      HW      Yes        X    X    [Adapter 1]
1.0      SW      Yes        X    X
1.1      HW      Yes        X    X    [Adapter 1]
1.1      SW      Yes        X    X
1.3      HW      Yes        X    X    [Adapter 1]
1.3      SW      Yes        X    X
1.4      HW      Yes        X    X    [Adapter 1]
1.4      SW      Yes        X    X
1.5      HW      No
1.5      SW      Yes        X    X
1.6      HW      No
1.6      SW      Yes        X    X
Graphics Devices:
Name                     Version        State
Intel(R) HD Graphics     9.17.10.2932   Active
NVIDIA GeForce GT 440    9.18.13.1106   Active
System info:
CPU: Intel(R) Core(TM) i7-2600 CPU @ 3.40GHz
OS: Microsoft Windows 7 Professional
Arch: 64-bit
Installed Media SDK packages (be patient...processing takes some time):
Intel(R) Media SDK 2013 (x64)
Intel(R) Media SDK 2012 R3 (x64)
Intel(R) Media SDK 2012 R3 (x86)
Intel(R) Media SDK 2012 R2 (x64)
Installed Media SDK DirectShow filters:
Intel(R) Media SDK MP3 Decoder : C:\Program Files\Intel\Media SDK 2013\samples\_bin\x64\imc_mpa_dec_ds.dll
Intel(R) Media SDK JPEG Decoder : C:\Program Files\Intel\Media SDK 2013\samples\_bin\x64\jpeg_dec_filter.dll
Intel(R) Media SDK MPEG-2 Splitter : C:\Program Files\Intel\Media SDK 2013\samples\_bin\x64\imc_mp2_spl_ds.dll
Intel(R) Media SDK H.264 Encoder : C:\Program Files\Intel\Media SDK 2013\samples\_bin\x64\h264_enc_filter.dll
Intel(R) Media SDK MVC Decoder : C:\Program Files\Intel\Media SDK 2013\samples\_bin\x64\mvc_dec_filter.dll
Intel(R) Media SDK AAC Decoder : C:\Program Files\Intel\Media SDK 2013\samples\_bin\x64\imc_aac_dec_ds.dll
Intel(R) Media SDK MPEG-2 Decoder : C:\Program Files\Intel\Media SDK 2013\samples\_bin\x64\mpeg2_dec_filter.dll
Intel(R) Media SDK MP4 Splitter : C:\Program Files\Intel\Media SDK 2013\samples\_bin\x64\imc_mp4_spl_ds.dll
Intel(R) Media SDK MPEG-2 Muxer : C:\Program Files\Intel\Media SDK 2013\samples\_bin\x64\imc_mp2_mux_ds.dll
Intel(R) Media SDK MP4 Muxer : C:\Program Files\Intel\Media SDK 2013\samples\_bin\x64\imc_mp4_mux_ds.dll
Intel(R) Media SDK H.264 Decoder : C:\Program Files\Intel\Media SDK 2013\samples\_bin\x64\h264_dec_filter.dll
Intel(R) Media SDK MP3 Encoder : C:\Program Files\Intel\Media SDK 2013\samples\_bin\x64\imc_mpa_enc_ds.dll
Intel(R) Media SDK AAC Encoder : C:\Program Files\Intel\Media SDK 2013\samples\_bin\x64\imc_aac_enc_ds.dll
Intel(R) Media SDK MPEG-2 Encoder : C:\Program Files\Intel\Media SDK 2013\samples\_bin\x64\mpeg2_enc_filter.dll
Intel(R) Media SDK VC-1 Decoder : C:\Program Files\Intel\Media SDK 2013\samples\_bin\x64\vc1_dec_filter.dll
Installed Intel Media Foundation Transforms:
Intel(R) Hardware VC-1 Decoder MFT : {059A5BAE-5D7A-4C5E-8F7A-BFD57D1D6AAA}
Intel(R) Hardware H.264 Decoder MFT : {45E5CE07-5AC7-4509-94E9-62DB27CF8F96}
Intel(R) Hardware MPEG-2 Decoder MFT : {CD5BA7FF-9071-40E9-A462-8DC5152B1776}
Intel(R) Quick Sync Video H.264 Encoder MFT : {4BE8D3C0-0515-4A37-AD55-E4BAE19AF471}
Intel(R) Hardware Preprocessing MFT : {EE69B504-1CBF-4EA6-8137-BB10F806B014}
This is from my dev system, which is using the 2013 SDK, but production systems have 2012 R3 and show the same symptoms.
We are using only Encode in the pipeline. Currently released systems use system memory, but I recently changed things to use D3D9 surfaces, which seems to have made a very small improvement. We use our own code to encode video, use the ffmpeg libs to encode AAC audio, and mux both.
In my testing, I was able to reproduce the problem just by running the sample encode application in a command window behind our code.
In our own code, if I bypass just the calls to encode and sync, I don't see this happening.
Hi Cary,
I cannot run the executable project you provided since there is a DLL missing: "Codec.speed.x64.dll".
In any case, based on your description and system configuration, I'm pretty sure the observed behavior is due to the nature of recent generations of Intel Core processors. These processors have a capability called Intel Turbo Boost Technology, which in essence lets the GPU and CPU parts of the processor share the power envelope (TDP).
http://en.wikipedia.org/wiki/Intel_Turbo_Boost
For instance, if the GPU is idle and only one processor core is used by a single-threaded workload, then that single core can execute at a higher frequency. Now consider the case where the CPU is executing at a high frequency and we then add a GPU-intensive workload. Since there is a maximum TDP and the processor resources are shared, the CPU core(s) must decrease frequency to allow for simultaneous GPU execution. This, in turn, leads to higher overall CPU utilization.
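As a back-of-the-envelope illustration of that last step, the sketch below assumes a workload that needs a fixed number of CPU cycles per second and ignores memory stalls entirely. The function name and the example frequencies are hypothetical and not measurements from this thread.

```cpp
#include <cassert>

// Simplified model: the same work needs the same number of cycles per second,
// so utilization scales inversely with the core frequency.
double scaled_utilization(double utilization_before,
                          double ghz_before,
                          double ghz_after) {
    assert(utilization_before >= 0.0 && ghz_before > 0.0 && ghz_after > 0.0);
    return utilization_before * ghz_before / ghz_after;
}
```

For example, `scaled_utilization(0.50, 3.8, 3.4)` yields about 0.56: a core that was 50% busy at an assumed 3.8 GHz would be about 56% busy at 3.4 GHz. Under this simple model the increase in utilization is proportional to the frequency ratio.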
Regards,
Petter
Hello,
Were you guys able to reproduce this? Were you able to get our test sample to work?
Thanks,
Cary
Hi Cary,
Yes, we can execute your application using the DLL you provided. Looking at the Media SDK API calls made from the application does not give me any clues as to what may be going on.
I still believe the behavior you observe is likely related to the way Intel Turbo Boost Technology works.
Can you tell me a bit more about how the surfaces are delivered to the Media SDK encoder (copies, read from disk, backbuffer?)?
Regards,
Petter
I downloaded the Turbo Boost Monitor (2.6). It doesn't seem to behave the way you describe (i.e., the frequency doesn't drop when running our software or this test). It stays at 3.5GHz on a machine with a CPU rated at 3.4GHz. Is there another tool (or a performance counter) to monitor this?
In the test code, we have a blank YUV frame buffer. This is passed into our writer through a memory-mapped file mechanism. In the writer, I allocate the surfaces (D3D9) in advance, much like in your sample code. When I get a frame, I get one of the unused surfaces and lock it (D3D lock). The frame then gets copied into the surface using one of our YUV conversion functions, which are highly optimized and use SSE2. Once filled, the surface is unlocked and then submitted for encoding.
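The copy step described above can be sketched in plain C++ as follows. This is not the SSE2 conversion code from the application: it assumes the frame is already NV12 and tightly packed, and `LockedSurface` is a hypothetical stand-in for the pointer and pitch that a D3D9 `LockRect` call returns in `D3DLOCKED_RECT`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for a locked surface: base pointer plus row pitch in bytes.
struct LockedSurface {
    std::uint8_t* bits;
    std::size_t pitch;
};

// Copies a tightly packed NV12 frame (Y plane followed by interleaved UV plane)
// into a destination whose rows are `pitch` bytes apart.
void copy_nv12_to_surface(const std::uint8_t* src,
                          std::size_t width,
                          std::size_t height,
                          const LockedSurface& dst) {
    assert(src != nullptr && dst.bits != nullptr);
    assert(dst.pitch >= width && height % 2 == 0);
    // NV12 has `height` luma rows plus `height / 2` chroma rows, each `width` bytes.
    const std::size_t rows = height + height / 2;
    for (std::size_t r = 0; r < rows; ++r) {
        std::memcpy(dst.bits + r * dst.pitch, src + r * width, width);
    }
}
```

The row-by-row copy matters because the surface pitch is usually larger than the visible width, so a single `memcpy` of the whole frame would shear the image.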
Hi Cary,
Unfortunately, I do not think the available turbo/frequency monitoring tools convey the TDP sharing between CPU and GPU accurately. You could check out the following tools, but they will likely give you the same result.
http://software.intel.com/en-us/articles/intel-power-gadget/
http://software.intel.com/en-us/articles/intel-performance-counter-monitor-a-better-way-to-measure-cpu-utilization
We are exploring the behavior of your test application a bit further. I will let you know what we find.
One thing I noted is a potential thread contention issue. It looks like the first stage of your workload (not using Media SDK) is using as many threads as logical cores. You may try changing the number of threads you use to drive the SW workload. You can also explore to see if there is any impact in changing the Media SDK parameter "NumThreads" to 1 or 0 (Media SDK decides). For HW accelerated workloads there is no point in explicit use of many threads.
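One way to experiment with the thread-count suggestion is to size the software worker pool below the number of logical cores. The sketch below is only an illustration: the function names and the choice of how many cores to reserve are assumptions, and it does not touch any Media SDK parameter.

```cpp
#include <cassert>
#include <thread>

// Returns how many worker threads to start, leaving `reserved` logical cores
// free for other work (for example the thread that feeds the encoder).
unsigned worker_thread_count(unsigned logical_cores, unsigned reserved) {
    // hardware_concurrency() may report 0 when the count is unknown.
    const unsigned n =
        logical_cores > reserved ? logical_cores - reserved : 1u;
    assert(n >= 1u);
    return n;
}

// Convenience wrapper using the machine's reported logical core count.
unsigned default_worker_thread_count() {
    return worker_thread_count(std::thread::hardware_concurrency(), 1u);
}
```

On an 8-logical-core machine such as the i7-2600 listed earlier, `worker_thread_count(8, 1)` gives 7 workers instead of the 8 used by the test application.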
Regards,
Petter