i7-6770/8GB RAM in an Intel Skull Canyon NUC running Windows 10 Pro with the 20.1915.4501 driver package. The Media SDK is 2016_R2.
We are running code based on the simple_decoder example contained in the Media SDK tutorials: https://software.intel.com/en-us/articles/media-sdk-tutorials-for-client-and-server. The sample has been modified so that it loops over the raw H264 bitstream (thus acting as a continuous test source) and outputs BGRA.
On all of the other systems we have here (Gen 3, 4, 5, and 6 with i5, i7, and Z8350 processors), running both Windows 7 and Windows 10, this code works exactly as expected.
On the 6770, the working set delta (as reported by task manager) is somewhere between 1400K and 2000K per task manager update.
By way of a sanity check we also run a stock build of the sample_decode example. When using the same H264 bitstream file in render mode this application does not exhibit any symptoms of a memory leak.
This is a show stopper for us. I have attached a complete VS2013/2015 project that demonstrates the issue quite clearly; I hope it will help isolate the problem. Any help greatly appreciated!
Just want to let you know that I've replicated the behavior you've described. Your reproducer runs without memory leak on 6th Generation Core processors with HD graphics, but for Iris Pro 580 I see a large leak.
The one thing I'd like to look at next since you mentioned that you don't see the issue from sample_decode is if there might be a workaround via code changes. This would be a quicker path to get you unblocked than waiting for a driver fix. I won't be able to get to that until tomorrow but we are making progress.
Thank you for the update. Glad you can reproduce the issue.
It might help if you could shed a little more light on the problem - would using a custom frame/buffer allocator be of any use? It is unclear from the documentation whether the allocator gets used for _all_ allocations or just those required for the VPP and Decoder buffers.
I would like to avoid the more complex path if possible at this stage. Let me also add that there is a live report of long-term leaks in sample_encode as well; while that may be a slower leak, it too will fail at some point.
I will see if I can acquire a 6th Generation Core processor with HD Graphics for testing as well.
Looking forward to updates.
Your help is much appreciated.
More details: I've replicated using the tutorials based on your code. Results:
- If system memory is used (your example): memory leak
- If implemented as decode->video memory->VPP->system memory, no leak. Also runs faster.
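For anyone following along, the workaround amounts to changing the IOPattern settings so the decoder writes to video memory and VPP does the copy back to system memory. This is a configuration sketch only, assuming the Media SDK headers and a session already set up with a D3D frame allocator (as in sample_decode's d3d_allocator), which video-memory output requires; it is not a complete pipeline:

```cpp
#include "mfxvideo++.h"

// Hypothetical helper showing only the IOPattern wiring for the
// decode -> video memory -> VPP -> system memory arrangement.
void ConfigurePipeline(mfxVideoParam& decParams, mfxVideoParam& vppParams)
{
    // Decoder keeps its output surfaces in video (GPU) memory...
    decParams.IOPattern = MFX_IOPATTERN_OUT_VIDEO_MEMORY;

    // ...and VPP reads from video memory, writing the converted
    // RGB4 (BGRA) frames into system memory for the application.
    vppParams.IOPattern = MFX_IOPATTERN_IN_VIDEO_MEMORY |
                          MFX_IOPATTERN_OUT_SYSTEM_MEMORY;
    vppParams.vpp.Out.FourCC = MFX_FOURCC_RGB4;
}
```

This arrangement avoids the system-memory decode path that leaks on Iris Pro 580, and keeps the color conversion on the GPU, which is why it also runs faster.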
Now we have a bug to follow up on our end, and a workaround which may be an improvement to try. I'll follow up with the code I tested with in a private message.