I wrote a real-time stream encoding application; it is a console app. I have 3 simultaneous 1080i streams. My app's processing is: VPP: color conversion (YUY2 -> NV12) and deinterlacing (1080i -> 1080p), then ENC (YUV -> H.264).
My CPU is a 2600K, OS: Win7 U 32-bit.
My configuration: H.264 codec, 1920x1080, 25 fps, 4 Mbps, MSDK balanced quality setting, system memory surfaces.
My problems and questions:
1. With 2 simultaneous 1080i inputs, the app outputs 25 fps 1080p (CPU utilization is 6% per stream), but with 3 simultaneous inputs the output is only 18-20 fps (CPU utilization is 8% per stream). With 6 simultaneous 720*576i 30 fps inputs it can output 30 fps, but with 7 the output drops to 25 fps, and with 8 it drops to 15-18 fps. Why? Petter Larsson said it has great performance.
2. Does the memory surface type greatly influence performance?
3. Is a 64-bit OS better than a 32-bit OS for encode performance?
4. I set bd3dAlloc=true. In the following code (pipeline_encode.cpp, function Allocframes()), call (1) returns MFX_ERR_NONE but call (2) returns MFX_ERR_MEMORY_ALLOC. Why? How can I resolve it? With bd3dAlloc=false there is no problem.
    EncRequest.NumFrameMin = nEncSurfNum;
    EncRequest.NumFrameSuggested = nEncSurfNum;
    memcpy(&(EncRequest.Info), &(m_mfxEncParams.mfx.FrameInfo), sizeof(mfxFrameInfo));
    EncRequest.Type = MFX_MEMTYPE_EXTERNAL_FRAME | MFX_MEMTYPE_FROM_ENCODE;
    if (m_pmfxVPP)
    {
        EncRequest.Type |= MFX_MEMTYPE_FROM_VPPOUT; // surfaces are shared between vpp output and encode input
    }
    // add info about memory type to request
    EncRequest.Type |= m_bd3dAlloc ? MFX_MEMTYPE_VIDEO_MEMORY_DECODER_TARGET : MFX_MEMTYPE_SYSTEM_MEMORY;
    // alloc frames for encoder
    sts = m_pMFXAllocator->Alloc(m_pMFXAllocator->pthis, &EncRequest, &m_EncResponse); // (1)
    MSDK_CHECK_RESULT(sts, MFX_ERR_NONE, sts);
    // alloc frames for vpp if vpp is enabled
    if (m_pmfxVPP)
    {
        VppRequest[0].NumFrameMin = nVppSurfNum;
        VppRequest[0].NumFrameSuggested = nVppSurfNum;
        memcpy(&(VppRequest[0].Info), &(m_mfxVppParams.mfx.FrameInfo), sizeof(mfxFrameInfo));
        VppRequest[0].Type = MFX_MEMTYPE_EXTERNAL_FRAME | MFX_MEMTYPE_FROM_VPPIN;
        // add info about memory type to request
        VppRequest[0].Type |= m_bd3dAlloc ? MFX_MEMTYPE_VIDEO_MEMORY_DECODER_TARGET : MFX_MEMTYPE_SYSTEM_MEMORY;
        VppRequest[1].Type |= m_bd3dAlloc ? MFX_MEMTYPE_VIDEO_MEMORY_DECODER_TARGET : MFX_MEMTYPE_SYSTEM_MEMORY;
        sts = m_pMFXAllocator->Alloc(m_pMFXAllocator->pthis, &(VppRequest[0]), &m_VppResponse); // (2)
        MSDK_CHECK_RESULT(sts, MFX_ERR_NONE, sts);
    }
Let me try to address your questions one by one.
Not sure what may be different in your setup. Possibly other bottlenecks such as file access, implicit surface copies or color conversions?
One way to determine pipeline efficiency is to use the Intel Graphics Performance Analyzer (GPA). This tool is free; for more info please refer to this white paper: http://software.intel.com/en-us/articles/using-intel-graphics-performance-analyzer-gpa-to-analyze-intel-media-software-development-kit-enabled-applications/
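To put the numbers reported in the question side by side before profiling, here is a minimal sketch that computes the aggregate pixel rate the pipeline has to sustain in each case. The function name aggregatePixelRate is hypothetical and not part of Media SDK; the resolutions, frame rates and stream counts are the ones quoted in the question.

```cpp
#include <cassert>
#include <cstdint>

// Aggregate pixels per second that VPP + encode must process for
// `streams` identical inputs of the given frame size and frame rate.
// Hypothetical helper for back-of-the-envelope comparison only.
std::uint64_t aggregatePixelRate(std::uint32_t width, std::uint32_t height,
                                 std::uint32_t fps, std::uint32_t streams)
{
    assert(width > 0 && height > 0 && fps > 0);
    return static_cast<std::uint64_t>(width) * height * fps * streams;
}
```

For example, aggregatePixelRate(1920, 1080, 25, 2) is 103,680,000 pixels/s (the case that keeps up) and aggregatePixelRate(1920, 1080, 25, 3) is 155,520,000 (the case that drops to 18-20 fps). The SD cases give 74,649,600 for 6 streams, 87,091,200 for 7 and 99,532,800 for 8. The SD pipeline starts dropping frames at a lower pixel rate than the two HD streams sustain, which hints that raw pixel throughput may not be the only limit, although this arithmetic alone cannot say what is.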
2. Yes, the memory surfaces used will have a large impact on HW accelerated path performance, especially for a coupled pipeline like yours. I used D3D surfaces only for the benchmark above. If you use system memory to store your surfaces, many surface copies will occur.
3 & 4: A 32-bit OS only has access to a limited amount of graphics memory. Since you are running on a 32-bit OS you are likely running out of memory when using D3D surfaces (bd3dAlloc=true). For heavy loads such as yours I encourage you to use a 64-bit OS instead, and plenty of RAM.
32 vs. 64 bit performance should not differ much.
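To get a rough feel for how much memory the surface pools need, here is a minimal sketch that estimates the raw size of YUY2 and NV12 surfaces. All function names are hypothetical, not Media SDK API. It assumes the height is rounded up to a multiple of 32 for interlaced content and ignores any extra pitch padding the driver may add, so real D3D allocations can be larger.

```cpp
#include <cassert>
#include <cstdint>

// Round v up to the next multiple of a (a must be a power of two).
std::uint32_t alignUp(std::uint32_t v, std::uint32_t a)
{
    assert(a != 0 && (a & (a - 1)) == 0);
    return (v + a - 1) & ~(a - 1);
}

// NV12: full-size Y plane plus a half-height interleaved UV plane (12 bits per pixel).
std::uint64_t nv12Bytes(std::uint32_t width, std::uint32_t height)
{
    return static_cast<std::uint64_t>(width) * height * 3 / 2;
}

// YUY2: packed 4:2:2, 2 bytes per pixel.
std::uint64_t yuy2Bytes(std::uint32_t width, std::uint32_t height)
{
    return static_cast<std::uint64_t>(width) * height * 2;
}

// Total bytes for a pool of identical surfaces across several streams.
std::uint64_t poolBytes(std::uint64_t surfaceBytes, std::uint32_t numSurfaces,
                        std::uint32_t streams)
{
    return surfaceBytes * numSurfaces * streams;
}
```

With alignUp(1080, 32) == 1088, one 1920x1088 NV12 surface is 3,133,440 bytes and one YUY2 surface is 4,177,920 bytes. Multiply by the actual nEncSurfNum and nVppSurfNum values and by the number of streams (via poolBytes) to see how quickly the pools grow.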
Some questions:
- What driver version are you using?
Regards,
2. Yes, my app is a command-line app. It has a pipeline thread (VPP + Encode) that processes a video stream, and another thread that handles other work such as AAC and the network. It processes the input stream and sends it to the network. The network is not the bottleneck, because it runs in another thread and has a list as a buffer. I run n instances of the app on one computer.
I ran mchecker and got a report, which I have uploaded. Petter, can you check it for me? Thanks.
I think my VPP costs too much GPU time, is that right? Does your test case use VPP?
My input stream is 1920*1080i YUY2.
First, I use VPP to convert the input to 1920*1080p NV12.
Second, I use the encoder to encode it to an H.264 stream.
I am preparing to move this project to an x64 OS.
I do not recommend using Media Checker; it's a tool that has not been updated in a while and is known to not work well in many scenarios. If you want to capture more detailed Media SDK API traces I instead recommend using the tracer tool that is part of the Media SDK package (per-frame logging provides more details).
So you are using one thread per Media SDK session?
Yes, the setup I created to replicate your scenario is YV12 interlaced -> (VPP) -> NV12 progressive -> (Encode) -> H.264
The only difference is that you are using YUY2. I do not expect any difference in performance.
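For reference, the one-thread-per-session layout asked about above could be sketched as follows. StreamPipeline is a hypothetical stand-in and not a Media SDK type: in a real application it would own one session plus its VPP and encode components, and processFrame() would run VPP and encode for one frame.

```cpp
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical stand-in for one VPP + encode pipeline bound to its own session.
struct StreamPipeline {
    int framesProcessed = 0;
    // Placeholder for "run VPP, then encode, for one input frame".
    void processFrame() { ++framesProcessed; }
};

// Run one worker thread per pipeline. Each thread touches only its own
// pipeline object, so no locking is needed between the workers.
void runPipelines(std::vector<StreamPipeline>& pipelines, int framesPerStream)
{
    assert(framesPerStream >= 0);
    std::vector<std::thread> workers;
    workers.reserve(pipelines.size());
    for (StreamPipeline& p : pipelines) {
        workers.emplace_back([&p, framesPerStream] {
            for (int i = 0; i < framesPerStream; ++i) {
                p.processFrame();
            }
        });
    }
    // Wait for every stream to finish before returning.
    for (std::thread& t : workers) {
        t.join();
    }
}
```

Running several instances of the application, one stream each, gives a similar separation at the process level.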
Regards,