Media (Intel® Video Processing Library, Intel Media SDK)
Access community support with transcoding, decoding, and encoding in applications using media tools like Intel® oneAPI Video Processing Library and Intel® Media SDK
Announcements
The Intel Media SDK project is no longer active. For continued support and access to new features, Intel Media SDK users are encouraged to read the transition guide on upgrading from Intel® Media SDK to Intel® Video Processing Library (VPL), and to move to VPL as soon as possible.
For more information, see the VPL website.

GPU memory leak with MSDK using D3D11 for rendering (sample_decode)

Carl_L_
Beginner
2,670 Views

We're using Intel MSDK to decode H.264 video streams in our software, rendered with D3D11/DXGI. We discovered that stopping and starting streams leaks GPU memory (observed primarily through the "System GPU memory" counter in Process Explorer). We verified with the DX debug layer that all objects were released correctly, and then turned to "sample_decode" to check whether the problem was in our implementation.

Unfortunately (?) the memory issues can be seen even with sample_decode, with minor changes. The only changes done in code are:
- Added a for-loop to run the same decoder/render task 5 times instead of only 1 time.
- Ignoring the result of the RegisterClass call (to not abort since class already registered)
Code is attached, along with a compiled debug .exe.

* sample_decode based on "2018 R2" samples
* GPU memory issues seem to happen when rendering with D3D11
* Using D3D9 no memory leak could be observed
* Tested on multiple machines, but results below were running on a Skylake processor (HD 530)
 

D3D11 rendered
h264 -hw -d3d11 -r -async 4 -rgb4 -i c:\temp\bbb_sunflower_1080p_30fps_normal.mp4.264
=> GPU-memory increasing

D3D11 NOT rendered
h264 -hw -d3d11 -async 4 -rgb4 -i c:\temp\bbb_sunflower_1080p_30fps_normal.mp4.264
=> GPU-memory returns to 0 between each run

D3D9 rendered
h264 -hw -d3d -r -async 4 -rgb4 -i c:\temp\bbb_sunflower_1080p_30fps_normal.mp4.264
=> GPU-memory returns to 0 between each run

D3D11 software rendered
h264 -sw -d3d11 -r -async 4 -rgb4 -i c:\temp\bbb_sunflower_1080p_30fps_normal.mp4.264
=> GPU-memory increasing

 

Screenshots from Process Explorer (images not included here) were attached for the following cases:

D3D11 - rendering

D3D11 - not rendering

D3D9 - rendering

 

Could somebody from Intel look into this issue? Could it be something that may need to be handled differently in the code to mitigate this issue? Something not released correctly?  Obviously sample_decode is built to run once, but looking into how to handle opening/closing decoding streams with MSDK and D3D11 we would hope that the sample would at least initiate and close everything correctly anyway.

 

Best Regards,
Carl

0 Kudos
20 Replies
Carl_L_
Beginner
2,624 Views

Below is the output of Media System Analyzer:

 

Intel(R) Media Server Studio 2017 - System Analyzer (64-bit)


The following versions of Media SDK API are supported by platform/driver
[opportunistic detection of MSDK API > 1.20]:

        Version Target  Supported       Dec     Enc
        1.0     HW      Yes             X       X
        1.0     SW      Yes             X       X
        1.1     HW      Yes             X       X
        1.1     SW      Yes             X       X
        1.2     HW      Yes             X       X
        1.2     SW      Yes             X       X
        1.3     HW      Yes             X       X
        1.3     SW      Yes             X       X
        1.4     HW      Yes             X       X
        1.4     SW      Yes             X       X
        1.5     HW      Yes             X       X
        1.5     SW      Yes             X       X
        1.6     HW      Yes             X       X
        1.6     SW      Yes             X       X
        1.7     HW      Yes             X       X
        1.7     SW      Yes             X       X
        1.8     HW      Yes             X       X
        1.8     SW      Yes             X       X
        1.9     HW      Yes             X       X
        1.9     SW      Yes             X       X
        1.10    HW      Yes             X       X
        1.10    SW      Yes             X       X
        1.11    HW      Yes             X       X
        1.11    SW      Yes             X       X
        1.12    HW      Yes             X       X
        1.12    SW      Yes             X       X
        1.13    HW      Yes             X       X
        1.13    SW      Yes             X       X
        1.14    HW      Yes             X       X
        1.14    SW      Yes             X       X
        1.15    HW      Yes             X       X
        1.15    SW      Yes             X       X
        1.16    HW      Yes             X       X
        1.16    SW      Yes             X       X
        1.17    HW      Yes             X       X
        1.17    SW      Yes             X       X
        1.18    HW      Yes             X       X
        1.18    SW      Yes             X       X
        1.19    HW      Yes             X       X
        1.19    SW      Yes             X       X
        1.20    HW      Yes             X       X
        1.20    SW      Yes             X       X
        1.21    HW      Yes             X       X
        1.21    SW      Yes             X       X
        1.22    HW      Yes             X       X
        1.22    SW      Yes             X       X
        1.23    HW      Yes             X       X
        1.23    SW      Yes             X       X
        1.24    HW      Yes             X       X
        1.24    SW      Yes             X       X
        1.25    HW      Yes             X       X
        1.25    SW      Yes             X       X
        1.26    HW      Yes             X       X
        1.26    SW      Yes             X       X
        1.27    HW      Yes             X       X
        1.27    SW      Yes             X       X

Graphics Devices:
        Name                                         Version             State
        Intel(R) HD Graphics 530                     25.20.100.6444      Running / Full Power
        NVIDIA GeForce GTX 970                       25.21.14.1616       Running / Full Power

System info:
        CPU:    Intel(R) Core(TM) i5-6600K CPU @ 3.50GHz
        OS:     Microsoft Windows 10 Pro
        Arch:   64-bit

Installed Media SDK packages (be patient...processing takes some time):
        Intel(R) Media SDK 2018 R2 - HEVC GPU accelerated Encoder
        Intel(R) Media SDK 2018 R2 - Media Samples
        Intel(R) Media Server Studio 2017 - Video Quality Caliper
        Intel(R) Media SDK 2018 R2 - Software Development Kit
        Intel(R) Media SDK 2018 R2 - Documentation for HEVC
        Intel(R) Media SDK 2018 R2 - HEVC SW Encoder
        Samples for Intel(R) Media SDK 2017 for Windows*
        Intel(R) Media SDK 2018 R2 - HEVC SW Decoder

Installed Media SDK DirectShow filters:

Installed Intel Media Foundation Transforms:
        Intel(R) Hardware M-JPEG Decoder MFT : {00C69F81-0524-48C0-A353-4DD9D54F9A6E}

 

 

0 Kudos
Carl_L_
Beginner
2,624 Views

Hi,

Has anybody looked at this at all?
I can mention that our software is primarily used in industrial environments, where stability is key. Because of this issue we are now considering using other hardware decoding solutions than Intel, which would be unfortunate since it otherwise seems promising.

If any more information is needed, I'll be happy to provide it.

 

Best Regards,
Carl

0 Kudos
Mark_L_Intel1
Moderator
2,624 Views

Hi Carl,

Sorry for the late response. I have looked at your description and I can reproduce the issue.

This is a memory management issue in which the app or library doesn't clean up GPU memory after each run. I didn't see memory increase during a run; it only happens between stopping one run and starting the next.

I have submitted an investigation request to dev team and will keep you updated.

Mark

0 Kudos
Carl_L_
Beginner
2,624 Views

Hi Mark,

Ok, good. Correct, it seems that when closing a decoding session, not all memory is released correctly, so after too many video stream switches this causes our application to crash. (Our application is switching video streams by command of operators or automatically by )

Best Regards,
Carl

0 Kudos
Carl_L_
Beginner
2,624 Views

Any news with this issue?

Best Regards,
Carl

0 Kudos
Pascal_Binggeli
Beginner
2,624 Views

At some point, sample_decode was updated to use CComPtr instead of raw pointers (which is a good thing). However, instead of using CComPtr's operator& to obtain the address of the COM pointer, the code takes the address of the raw member (.p) directly, which prevents the built-in leak detection from working.

If you replace the line in sample_common/d3d11_device.cpp, line 262:

hres = m_pSwapChain->GetBuffer(0, __uuidof( ID3D11Texture2D ), (void**)&m_pDXGIBackBuffer.p);

with 

hres = m_pSwapChain->GetBuffer(0, __uuidof( ID3D11Texture2D ), (void**)&m_pDXGIBackBuffer);

you get an assert (in debug builds) indicating that the COM pointer was leaked.
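
A minimal, portable sketch of why the two forms behave differently. It does not use ATL: FakeTexture, PtrModel and GetBufferModel are hypothetical stand-ins, and the overwriteDetected flag takes the place of the debug assert that CComPtr's operator& raises when the pointer is already set.

```cpp
#include <cassert>

// Hypothetical stand-in for a reference-counted COM object.
struct FakeTexture {
    int refs = 0;
    void AddRef() { ++refs; }
    void Release() { --refs; }
};

// Simplified model of a CComPtr-style holder (not the real ATL class).
template <typename T>
struct PtrModel {
    T* p = nullptr;
    bool overwriteDetected = false;  // stands in for the debug assert
    ~PtrModel() { if (p) p->Release(); }
    // Checked address-of: flags the case where a held pointer is about
    // to be overwritten through the returned address.
    T** operator&() {
        if (p != nullptr) overwriteDetected = true;
        return &p;
    }
};

// Models an API such as GetBuffer: adds a reference and returns the
// object through an out-parameter.
void GetBufferModel(FakeTexture& backBuffer, FakeTexture** out) {
    backBuffer.AddRef();
    *out = &backBuffer;
}

// Acquires the buffer twice and reports whether the overwrite was noticed.
bool OverwriteDetected(bool useCheckedForm) {
    FakeTexture backBuffer;
    PtrModel<FakeTexture> holder;
    for (int frame = 0; frame < 2; ++frame) {
        if (useCheckedForm)
            GetBufferModel(backBuffer, &holder);    // goes through operator&
        else
            GetBufferModel(backBuffer, &holder.p);  // bypasses the check
    }
    return holder.overwriteDetected;
}
```

In this model, OverwriteDetected(true) reports the second acquisition, while OverwriteDetected(false) stays silent even though the same reference is lost.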

Pascal

0 Kudos
Carl_L_
Beginner
2,624 Views

Hi! Thanks for the response, but I'm not sure I follow all the way.

Do you mean that the texture (m_pDXGIBackBuffer) is leaking? If so, wouldn't it be leaked either once (per running session), or once per frame? Neither case seems to be true as far as I understand, as the leaked memory is too large for once, and too small for once per frame.

We're not using CComPtr in our code (legacy reasons where we explicitly want to allocate/deallocate), so I'm not too familiar with CComPtr. What would you suggest as the solution for the sample in this case? (If you have time to answer)

Best Regards,
Carl

 

0 Kudos
Pascal_Binggeli
Beginner
2,624 Views

Hello Carl,

I'm not too sure about the size of the leak, since some resources are leaked per frame. I also did not try to understand how you measured the memory leak. I just remembered that the original sample had a few COM leaks, and apparently they are still there. To detect the leak, replace all occurrences of "&pointer.p" with "&pointer" in d3d11_device.cpp. To fix it, place the appropriate pointer.Release() call before the pointer is reassigned. I tried your sample and had to do this in 3 places (m_pInputViewLeft, m_pOutputView and m_pDXGIBackBuffer).
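
The effect of calling Release() first can be sketched with a small reference-counting model. This is an illustration under simplifying assumptions, not the sample's code: FakeView, Holder and CreateViewModel are hypothetical names, and a single object stands in for the per-frame resources.

```cpp
#include <cassert>

// Hypothetical stand-in for a reference-counted COM object.
struct FakeView {
    int refs = 0;
    void AddRef() { ++refs; }
    void Release() { --refs; }
};

// Simplified holder with a public raw member, loosely modeled on CComPtr.
struct Holder {
    FakeView* p = nullptr;
    // Drops the held reference (if any) and clears the pointer.
    void Release() {
        if (p) { p->Release(); p = nullptr; }
    }
    ~Holder() { Release(); }
};

// Models a creation call that hands out a new reference via an out-parameter.
void CreateViewModel(FakeView& view, FakeView** out) {
    view.AddRef();
    *out = &view;
}

// Runs `frames` acquisitions and returns how many references are still
// outstanding after the holder has been destroyed.
int OutstandingRefsAfter(int frames, bool releaseFirst) {
    FakeView view;
    {
        Holder holder;
        for (int i = 0; i < frames; ++i) {
            if (releaseFirst) holder.Release();  // the suggested fix
            CreateViewModel(view, &holder.p);    // overwrites holder.p
        }
    }  // holder releases only the last reference it holds
    return view.refs;
}
```

With 5 frames, the model leaves 4 references outstanding without the Release() call and 0 with it.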

Cheers,
Pascal

0 Kudos
Pascal_Binggeli
Beginner
2,624 Views

Hello Carl,

As mentioned in your original message, I've used Process Explorer to confirm that the memory leak disappears while rendering after fixing the COM leaks.

Pascal

0 Kudos
Mark_L_Intel1
Moderator
2,624 Views

Hi Carl,

Sorry for the late response. I have submitted the issue and the dev team is investigating it.

I just checked the status and it looks like they haven't made progress yet. I will try to push.

As for Pascal's suggestion, it seems like a workaround rather than a direct fix for the bug. Would it be a useful clue for investigating your problem?

Mark

0 Kudos
Carl_L_
Beginner
2,624 Views

Pascal & Mark,

I'll look into this. I hope this is a clue and leads to a solution, but I'm not fully convinced yet, since I'd expect such a leak to produce either more or less leaked memory than what we actually observe. Also, in our code we're not using CComPtr, and with the D3D11 Debug Layer we cannot see any leaked D3D resources (using ReportLiveDeviceObjects), although maybe we could still have a problem similar to the one Pascal mentions that the Debug Layer cannot pick up? Anyway, I hope this could provide a clue and an answer!

I'll get back when I've been able to look into it!

Best Regards,
Carl

0 Kudos
Carl_L_
Beginner
2,624 Views

Hi,

I can confirm this seems to solve the issue in the sample! So we can hopefully do some digging and find out why our own code (based on another/older sample) behaves like it does. 

I'll update as soon as I have more information.

Best Regards,
Carl

0 Kudos
Tuckerson__Robinson
2,624 Views

Pascal, thanks for this. Can you confirm what the mitigating action is and where, precisely, you fixed it? I would like to put this fix into my own codebase too. I'm guessing that checking and releasing the pointer before use is the thing to do.

 

0 Kudos
Mark_L_Intel1
Moderator
2,624 Views

Hi Carl,

If you want my help, you can post how you fixed it. I can at least do a history check, and I could also ask the dev team to speed up the investigation.

Mark

0 Kudos
Carl_L_
Beginner
2,624 Views

Hi,

Now we've had time to look at this again.

For the sample, I did exactly what Pascal said here: 
"To detect the leak, in d3d11_device.cpp, replace all the "&pointer.p" with "&pointer" and to fix it, place appropriate pointer.Release() call before dereferencing the pointer. I've tried your sample and I've had to do it in 3 places (m_pInputViewLeft, m_pOutputView and m_pDXGIBackBuffer.) "
I'm not sure whether this is the "correct" way to fix the sample (regarding the usage of CComPtr), but for us it was enough to prove that the problem was in this sample, and it gave us clues as to why our codebase was having problems (based on another older sample). We've also solved our problems, so for me this issue is cleared.

But it should be fixed in the sample for next release I'd think.

Best Regards,
Carl

0 Kudos
Mark_L_Intel1
Moderator
2,624 Views

Thanks so much,

This is a great help!

I'll pass this on to the dev team and make sure it is fixed for the next release.

Mark

0 Kudos
Cetrini__Fabio
2,624 Views

Carl L. wrote:

<cut>...and it gave us clues as to why our codebase was having problems (based on another older sample). We've also solved our problems, so for me this issue is cleared.

 

Hi Carl, I'm facing a very similar issue, but my solution architecture is completely different compared to the sample provided, please take a look at my thread if you can.

Can you explain how you were able to solve your specific issue?

I hope it will let me "turn on a light" on mine.

Thank you,

Fabio

 

0 Kudos
Dmitry_E_Intel
Employee
2,624 Views

Hi Fabio and Carl,

BTW, do you see the issue with samples from GitHub https://github.com/Intel-Media-SDK/MediaSDK/tree/master/samples ?

Regards,

Dmitry

0 Kudos
Cetrini__Fabio
2,624 Views

Hi Dmitry, it's very difficult for me to recreate the same condition with your sample: I'm decoding 40 live streams inside a C# project, then stop and restart them on a timer.

I've written a simple C++ class that wraps mfx calls. Can you help me debug it?

This is my FrameAllocator alloc callback:

MfxHelper& self = *(MfxHelper*)pthis;
self.RealSurfaceNumber = (request->NumFrameSuggested + self.mfx_video_params->AsyncDepth);
self.mfx_surfaces = (mfxFrameSurface1**)calloc(self.RealSurfaceNumber, sizeof(mfxFrameSurface1*));

for (int i = 0; i < self.RealSurfaceNumber; i++)
{
	self.mfx_surfaces[i] = (mfxFrameSurface1*)calloc(1, sizeof(mfxFrameSurface1));
	self.mfx_surfaces[i]->Info = self.mfx_video_params->mfx.FrameInfo;
	self.mfx_surfaces[i]->Data.MemId = (mfxMemId)(i + 1);
	self.outer_mids.push_back(self.mfx_surfaces[i]->Data.MemId);
}

response->mids = &self.outer_mids.front();
response->NumFrameActual = self.RealSurfaceNumber;

D3D11_TEXTURE2D_DESC desc = {};
desc.Width = self.mfx_video_params->mfx.FrameInfo.Width;
desc.Height = self.mfx_video_params->mfx.FrameInfo.Height;
desc.MipLevels = 1;
desc.ArraySize = 1;
desc.Format = DXGI_FORMAT_NV12;
desc.SampleDesc.Count = 1;
desc.SampleDesc.Quality = 0;
desc.Usage = D3D11_USAGE_DEFAULT;
desc.BindFlags = D3D11_BIND_DECODER | D3D11_BIND_SHADER_RESOURCE;

self.textures = (ID3D11Texture2D**)calloc(self.RealSurfaceNumber, sizeof(ID3D11Texture2D*));

HRESULT hr;

for (int counter = 0; counter < self.RealSurfaceNumber; counter++)
{
	ID3D11Texture2D *texture;
	hr = self.d3d11Device->CreateTexture2D(&desc, NULL, &texture);

	self.textures[counter] = texture;
}

return MFX_ERR_NONE;

Then, in the GetHDL callback:

MfxHelper& self = *(MfxHelper*)pthis;

mfxHDL d3d_handle = self.textures[(int)mid];

mfxHDLPair *pPair = (mfxHDLPair*)handle;
pPair->first = d3d_handle;
pPair->second = (mfxHDL)(UINT_PTR)0;

return MFX_ERR_NONE;

This is the DecodeAsync (called from C#)

mfxFrameSurface1* pWorkSurface = this->findFreeSurface();

if (&pWorkSurface != NULL)
{
	mfxFrameSurface1* pOutSurface = NULL;
	mfxSyncPoint* sync = (mfxSyncPoint*)calloc(1, sizeof(mfxSyncPoint));

	this->lock_object.lock();

	if (this->disposing == false)
	{
		this->LastError = MFXVideoDECODE_DecodeFrameAsync(this->mfx_session, this->mfx_bitstream, pWorkSurface, &pOutSurface, sync);

		if (this->LastError == MFX_ERR_NONE)
			this->LastError = MFXVideoCORE_SyncOperation(this->mfx_session, *sync, 5000);

		if (this->LastError == MFX_ERR_NONE)
			this->onFrameReady(this->textures[(int)pOutSurface->Data.MemId]);
	}

	this->lock_object.unlock();

	free(sync);
}

And at the end the finalizer:

if (this->mfx_frame_allocator != NULL)
{
	do
	{
		mfxFrameSurface1* pWorkSurface = this->findFreeSurface();

		if (&pWorkSurface != NULL)
		{
			mfxFrameSurface1* pOutSurface = NULL;
			mfxSyncPoint* sync = (mfxSyncPoint*)calloc(1, sizeof(mfxSyncPoint));
			this->LastError = MFXVideoDECODE_DecodeFrameAsync(this->mfx_session, NULL, pWorkSurface, &pOutSurface, sync);
			free(sync);
		}
	} while (this->LastError != MFX_ERR_MORE_DATA);
}

this->LastError = MFXVideoDECODE_Close(this->mfx_session);

for (int i = 0; i < this->RealSurfaceNumber; i++)
	free(this->mfx_surfaces[i]);

free(this->mfx_surfaces);

std::vector<mfxMemId>().swap(this->outer_mids);
this->outer_mids.clear();
this->outer_mids.shrink_to_fit();

if (this->mfx_frame_allocator != NULL)
{
	this->mfx_frame_allocator->pthis = NULL;
	this->mfx_frame_allocator->Alloc = NULL;
	this->mfx_frame_allocator->Free = NULL;
	this->mfx_frame_allocator->GetHDL = NULL;
	this->mfx_frame_allocator->Lock = NULL;
	this->mfx_frame_allocator->Unlock = NULL;

	free(this->mfx_frame_allocator);
}

this->onFrameReady = NULL;

delete[] this->mfx_bitstream->Data;
free(this->mfx_bitstream);
free(this->mfx_video_params);
free(this->mfx_version);
free(this->mfx_implementation);
free(this->mfx_init_params);

for (int i = 0; i < this->RealSurfaceNumber; i++)
	this->textures[i]->Release();

free(this->textures);

this->LastError = MFXClose(this->mfx_session);

Do you spot something wrong?

Thank you
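
One detail in the code above that may be worth double-checking: the Alloc callback assigns MemId as i + 1, while GetHDL and the decode path index the textures array with the MemId value directly. A minimal sketch of a mapping that keeps both directions consistent follows. The helper names and MemIdModel are hypothetical (MemIdModel stands in for the opaque mfxMemId handle), and this is only one possible convention, not necessarily what the original code intended.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Stand-in for an opaque memory id handle (assumption for this sketch).
using MemIdModel = void*;

// Ids are 1-based so that a null id can mean "no surface".
inline MemIdModel IndexToMemId(std::size_t index) {
    return reinterpret_cast<MemIdModel>(
        static_cast<std::uintptr_t>(index + 1));
}

// Converts an id back to a 0-based array index.
// Returns false if the id is null or does not belong to [1, count].
inline bool MemIdToIndex(MemIdModel mid, std::size_t count,
                         std::size_t& index) {
    const std::uintptr_t raw = reinterpret_cast<std::uintptr_t>(mid);
    if (raw == 0 || raw > count) return false;
    index = static_cast<std::size_t>(raw - 1);
    return true;
}
```

With such helpers, a lookup would go through MemIdToIndex before touching the array, so an id of N for an array of N textures resolves to the last element instead of reading past the end.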

0 Kudos
Mark_L_Intel1
Moderator
2,245 Views

Hi Carl,

Sorry for the late response.

We have confirmed that the issue has been solved with our latest code and sample.

We tested the four commands you posted at the beginning under the following conditions:

  • driver 25.20.100.6444
  • similar machine in GTA (i5-6600, HD 530)
  • MediaSDK 2019 R1
  • sample_decode from https://github.com/Intel-Media-SDK/samples

Let me know if you have any questions.

Mark Liu

0 Kudos