Ralf_K_
Beginner

VMem <> SysMem behavior with RGBA in Vpp.Out

Pipeline (non-async):

DecodeH264 (NV12) -> VPPIn(NV12) -> VPPOut(RGB4) -> EncodeH264(NV12)

VMem works fine
SysMem fails with MFX_ERR_NULL_PTR (returned by SyncOperation in the encoding stage)

The same pipeline with NV12 instead of RGB4 works for both:

DecodeH264 (NV12) -> VPPIn(NV12) -> VPPOut(NV12) -> EncodeH264(NV12)

VMem works fine
SysMem works fine
 
 
My SysMem Allocator:
	// Creates a system-memory surface for NV12 or RGB4; the backing buffer is owned by systemMemory_.
	// Returned by value: an mfxFrameSurface1&& return type would refer to the destroyed local.
	mfxFrameSurface1 Session::GetSysMemSurface(mfxFrameInfo& frameInfo)
	{
		mfxFrameSurface1 surface{};
		surface.Info = frameInfo;
		mfxU32 fourCC = frameInfo.FourCC;
		if ((fourCC != MFX_FOURCC_NV12) && (fourCC != MFX_FOURCC_RGB4))
		{
			throw exception("FourCC not supported");
		}
		auto width = Align32(frameInfo.Width);
		auto height = Align32(frameInfo.Height);
		int bytecount = 0;
		if (fourCC == MFX_FOURCC_NV12)
		{
			bytecount = width * height * 3 / 2; // 12 bits per pixel
		}
		if (fourCC == MFX_FOURCC_RGB4)
		{
			bytecount = width * height * 4; // 32 bits per pixel
		}
		systemMemory_.push_back(vector<UINT8>(bytecount));
		auto& pSurfaceVector = systemMemory_.back();

		// vector does not guarantee 32-byte alignment, so check it explicitly
		UINT8* psurfaceSysMem = &(pSurfaceVector)[0];
		auto mod = reinterpret_cast<UINT64>(psurfaceSysMem) % 32;
		if (mod != 0)
			throw exception("Mem align error");

		if (fourCC == MFX_FOURCC_NV12)
		{
			// Y plane followed by interleaved U/V samples
			surface.Data.Y = psurfaceSysMem;
			surface.Data.U = surface.Data.Y + width * height;
			surface.Data.V = surface.Data.U + 1;

			surface.Data.Pitch = width;
		}
		if (fourCC == MFX_FOURCC_RGB4)
		{
			// packed pixels in B, G, R, A byte order
			surface.Data.B = psurfaceSysMem;
			surface.Data.G = psurfaceSysMem + 1;
			surface.Data.R = psurfaceSysMem + 2;
			surface.Data.A = psurfaceSysMem + 3;
			surface.Data.Pitch = width * 4;
		}
		return surface;
	}
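
As a rough sketch of the layout arithmetic in the allocator above, the following standalone C++ computes the buffer size, pitch, and plane offsets for NV12 and RGB4, and obtains a 32-byte aligned pointer with std::align instead of relying on where the vector happens to allocate. PixelFormat, PlaneLayout, ComputeLayout, and AlignedBuffer are hypothetical names that stand in for the Media SDK types so the example compiles without the SDK headers; this is not the Media SDK's own allocator.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <stdexcept>
#include <vector>

// Hypothetical stand-ins for MFX_FOURCC_NV12 / MFX_FOURCC_RGB4.
enum class PixelFormat { NV12, RGB4 };

struct PlaneLayout {
    std::size_t bytes;      // total buffer size in bytes
    std::size_t pitch;      // bytes per row
    std::size_t offset[4];  // NV12: Y, U, V, unused; RGB4: B, G, R, A
};

inline std::size_t AlignUp(std::size_t value, std::size_t alignment) {
    return (value + alignment - 1) / alignment * alignment;
}

// Mirrors the size and pointer arithmetic of the allocator in the post.
inline PlaneLayout ComputeLayout(PixelFormat format, std::size_t width, std::size_t height) {
    const std::size_t w = AlignUp(width, 32);
    const std::size_t h = AlignUp(height, 32);
    if (format == PixelFormat::NV12) {
        // Full-size Y plane, then interleaved U/V rows at half vertical resolution.
        return PlaneLayout{w * h * 3 / 2, w, {0, w * h, w * h + 1, 0}};
    }
    if (format == PixelFormat::RGB4) {
        // Packed 32-bit pixels in B, G, R, A byte order.
        return PlaneLayout{w * h * 4, w * 4, {0, 1, 2, 3}};
    }
    throw std::invalid_argument("FourCC not supported");
}

// Owns a byte buffer and exposes a pointer aligned to `alignment` bytes.
class AlignedBuffer {
public:
    AlignedBuffer(std::size_t bytes, std::size_t alignment)
        : storage_(bytes + alignment), data_(nullptr) {
        assert(alignment != 0 && (alignment & (alignment - 1)) == 0);  // power of two
        void* p = storage_.data();
        std::size_t space = storage_.size();
        data_ = static_cast<std::uint8_t*>(std::align(alignment, bytes, p, space));
        if (data_ == nullptr) throw std::runtime_error("Mem align error");
    }
    AlignedBuffer(const AlignedBuffer&) = delete;             // a copy would alias the old storage
    AlignedBuffer& operator=(const AlignedBuffer&) = delete;
    AlignedBuffer(AlignedBuffer&&) = default;                 // moving keeps the heap block in place
    std::uint8_t* data() const { return data_; }

private:
    std::vector<std::uint8_t> storage_;
    std::uint8_t* data_;
};
```

With this, the plane pointers would be data() plus the offsets from ComputeLayout, and the alignment check can no longer fail by chance because the buffer is over-allocated by one alignment unit.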

API level 1.17

Any ideas?

Ralf

 

9 Replies
Surbhi_M_Intel
Employee

Hi Ralf, 

We need a couple more things to debug this issue. Please send us your hardware information. On Windows, you can send the System Analyzer logs; the System Analyzer tool is in the installation directory and reports details about your hardware. If you are not on Windows, please send the Media Server Studio version, CPU info, GPU info, and the output of the command "uname -r".
We may also need a reproducer so that we can reproduce the issue locally. Can you please send us one?

Thanks,
Surbhi

Ralf_K_
Beginner

Hi Surbhi,

Hardware: Intel NUC BOXNUC5I7RYH, 5th Gen Core i7-5557U, 3.1 GHz

Ralf

Ralf_K_
Beginner

Hi Surbhi,

The issue should be reproducible with any simple pipeline like the one described.

The line of code I have to modify is this:

vppVideoParams_.vpp.Out.FourCC = MFX_FOURCC_NV12;//ok with VMem and SysMem

vppVideoParams_.vpp.Out.FourCC = MFX_FOURCC_RGB4;//VMem OK, fails with SysMem


The failure appears in SyncOperation in the encoding stage:

					mfxSyncPoint  syncp = nullptr;
					mfxEncodeCtrl* ctrl = nullptr;
					auto sts = encoder_->EncodeFrameAsync(ctrl, pVppSurface, &encoderBitStream_, &syncp);
					switch (sts)
					{
					case MFX_ERR_NONE:
						if (syncp != nullptr)
						{
							CheckError(session_.SyncOperation(syncp, SyncWait));//SyncOperation returns MFX_ERR_NULL_PTR, if SysMem and VPP.OUT with RGB4 is used
							encoderdata_(encoderBitStream_);
						}

 

I tried both x86 and x64 builds.

If you cannot reproduce the behavior, please take a look at the SysMem allocator from my initial post. Is there something I have forgotten to initialize for RGB4?

		if (fourCC == MFX_FOURCC_RGB4)
		{
			surface.Data.B = psurfaceSysMem;
			surface.Data.G = psurfaceSysMem + 1;
			surface.Data.R = psurfaceSysMem + 2;
			surface.Data.A = psurfaceSysMem + 3;
			surface.Data.Pitch = width*4;
		}

 

Thanks,

Ralf

BTW: It would be great if a general memory allocator were part of the Media SDK (i.e. something like the general_allocator from sample_common).

 

Surbhi_M_Intel
Employee

Hi Ralf, 

I don't see a problem in the system memory allocation. This pipeline can be set up with sample_multi_transcode with a few changes. Would it be possible for you to set up the pipeline there and test it on your platform? If not, please send the application in which you see the failure. Regarding your application, can you please tell us why you add another VPP stage (NV12 to RGB4) and then use NV12 as the encoder input? Also, in which scenario do you need this VPP stage in system memory? It is best to keep the entire pipeline in video memory to get high performance from the underlying hardware. Keeping it in system memory adds extra copies and therefore decreases performance.

Thanks,
Surbhi

 

 

Ralf_K_
Beginner

Hi Surbhi,

Thanks for your response.

If I use

DecodeH264 (NV12) -> VPPIn(NV12) -> VPPOut(RGB4) -> EncodeH264(RGB4)

instead of

DecodeH264 (NV12) -> VPPIn(NV12) -> VPPOut(RGB4) -> EncodeH264(NV12)

the behavior is:

VMem: MFX_ERR_MEMORY_ALLOC in frameAllocator_.Alloc
SysMem: MFX_ERR_INVALID_VIDEO_PARAM in encoder_->Init

What I want to do as a first step:

DecVppEnc.JPG
It is an existing algorithm that I want to reuse. If available, I want to use the GPU for decoding, encoding, and scaling. I know the cost of locking VMem, but at the moment it is hard for me to implement the algorithm in OpenCL so that it can stay in VMem.

I am also thinking about more advanced pipelines like this (first draft):

AdvPipeline.JPG

Ralf

Ralf_K_
Beginner


A workaround that works for me:

I use

DecodeH264 (NV12) -> VPPIn(NV12) -> VPPOut(NV12) -> EncodeH264(NV12) (which works in both SysMem and VMem)

In case of VMem (GPU/hardware mode) I do:

Lock the RGBA surface
Run my existing RGBA algorithm
Unlock the RGBA surface

In case of SysMem (CPU/software mode) I do:

Lock the NV12 surface
Convert from NV12 to RGBA on the CPU with IPP
Run my existing RGBA algorithm
Convert from RGBA back to NV12 on the CPU with IPP
Unlock the NV12 surface


There is no additional penalty in hardware mode.
Software mode is only used as a fallback if hardware mode is not available. In software mode the CPU has to do the color conversion either way (in the Intel software driver or in user code).

I can now handle the issue in a way that is absolutely OK. The only small disadvantage is that I have to check more often in code whether hardware is supported or not ...
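
For readers without IPP at hand, the NV12 to RGBA step of the software path can be sketched in plain C++ as below. This is not the IPP routine used in the post. It assumes BT.601 video-range coefficients in the common fixed-point form, and the thread does not say which matrix or range the actual streams use, so treat the constants as an assumption. Nv12ToBgra and ClampToByte are hypothetical names; the output bytes are written in B, G, R, A order to match the RGB4 layout from the allocator.

```cpp
#include <cstddef>
#include <cstdint>

// Clamps an intermediate result to the 0..255 range of one color channel.
inline std::uint8_t ClampToByte(int v) {
    return static_cast<std::uint8_t>(v < 0 ? 0 : (v > 255 ? 255 : v));
}

// Converts an NV12 frame (Y plane plus interleaved U/V plane, both using srcPitch)
// to packed B, G, R, A bytes. Assumes BT.601 video-range input.
inline void Nv12ToBgra(const std::uint8_t* y, const std::uint8_t* uv, std::size_t srcPitch,
                       std::uint8_t* bgra, std::size_t dstPitch,
                       std::size_t width, std::size_t height) {
    for (std::size_t row = 0; row < height; ++row) {
        for (std::size_t col = 0; col < width; ++col) {
            // One U/V pair is shared by each 2x2 block of luma samples.
            const std::uint8_t* chroma = uv + (row / 2) * srcPitch + (col / 2) * 2;
            const int c = y[row * srcPitch + col] - 16;
            const int d = chroma[0] - 128;  // U
            const int e = chroma[1] - 128;  // V

            std::uint8_t* px = bgra + row * dstPitch + col * 4;
            px[0] = ClampToByte((298 * c + 516 * d + 128) / 256);            // B
            px[1] = ClampToByte((298 * c - 100 * d - 208 * e + 128) / 256);  // G
            px[2] = ClampToByte((298 * c + 409 * e + 128) / 256);            // R
            px[3] = 255;                                                     // A (opaque)
        }
    }
}
```

The reverse direction (RGBA back to NV12) needs the inverse matrix plus chroma subsampling and is omitted here. A tuned library routine will be considerably faster than this per-pixel loop.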

Ralf


 

Surbhi_M_Intel
Employee

Hi Ralf, 

Good to know that you have a workaround for the moment. I tried to set up a reproducer with sample_multi_transcode by moving the VPP color conversion stage to system memory and keeping the rest of the pipeline on the GPU, and I didn't see the issue. Do you know whether it only happens after a few iterations? How are you setting the VPP output pattern? This is how I set it:
 m_mfxVppParams.IOPattern = MFX_IOPATTERN_IN_SYSTEM_MEMORY|MFX_IOPATTERN_OUT_VIDEO_MEMORY

Since I wasn't able to reproduce the issue, it would be good to check whether your environment has the latest updates. Which version of Media SDK or Media Server Studio do you have? Which OS and driver? You can use sample_multi_transcode to figure out whether the issue is in your application or in the underlying platform.

Thanks,
Surbhi

 

Ralf_K_
Beginner

Hi Surbhi,

You tried to reproduce the issue in hardware mode, but hardware mode works fine for me, too.

The issue happens in software mode, which means the complete pipeline uses SysMem:

Decoder:
decoderVideoParams_.IOPattern = MFX_IOPATTERN_OUT_SYSTEM_MEMORY;

Vpp:
vppVideoParams_.IOPattern =  MFX_IOPATTERN_IN_SYSTEM_MEMORY | MFX_IOPATTERN_OUT_SYSTEM_MEMORY;

Encoder:
encoderVideoParams_.IOPattern = MFX_IOPATTERN_IN_SYSTEM_MEMORY;
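
One way to keep the three IOPattern settings above consistent is to derive them from a single mode switch. The sketch below is only an illustration: IoFlag, StagePatterns, and SelectPatterns are hypothetical names, and the flag values are local stand-ins so that the example compiles without the Media SDK headers. Real code would use the MFX_IOPATTERN_* constants from the SDK directly.

```cpp
#include <cstdint>

// Local stand-ins for the MFX_IOPATTERN_* flags (assumption: use the SDK constants in real code).
enum IoFlag : std::uint16_t {
    InVideo   = 0x01,
    InSystem  = 0x02,
    OutVideo  = 0x10,
    OutSystem = 0x20
};

// IOPattern values for the three stages of a decode -> VPP -> encode pipeline.
struct StagePatterns {
    std::uint16_t decoder;  // decoder output
    std::uint16_t vpp;      // VPP input and output
    std::uint16_t encoder;  // encoder input
};

// Hardware mode keeps every stage in video memory, software mode in system memory.
inline StagePatterns SelectPatterns(bool hardwareMode) {
    const std::uint16_t in  = hardwareMode ? InVideo : InSystem;
    const std::uint16_t out = hardwareMode ? OutVideo : OutSystem;
    return StagePatterns{out, static_cast<std::uint16_t>(in | out), in};
}
```

Deriving all three values in one place means the hardware/software decision is made once instead of being repeated at each stage.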


My environment:

Windows 10 Pro Build 14279.rs1_release 1602229-1700, latest updates installed
Intel Parallel Studio XE 2016 Composer Edition for C++, latest updates installed
Intel® Media Software Development Kit 2016 Release Notes (Version 7.0.0.311)
MS Visual Studio 2015 Update 1
 

Ralf

Ralf_K_
Beginner



While trying to add MJPEG encoding, another issue with the VPPOut(RGB4) pipeline appeared (this one is in hardware mode):


DecodeH264 (NV12) -> VPPIn(NV12) -> VPPOut(RGB4) -> EncodeH264(NV12)

works fine, but

DecodeH264 (NV12) -> VPPIn(NV12) -> VPPOut(RGB4) -> EncodeMJPEG(NV12)

creates wrong JPEGs ... but there are no errors while setting up and running the transcoding pipeline.

It seems best to stay in NV12 ...

DecodeH264 (NV12) -> VPPIn(NV12) -> VPPOut(NV12) -> EncodeMJPEG(NV12)

works fine.

Ralf
