
GPUCopy details

OTorg
New Contributor III

Hi,

 

Could you please explain how exactly the GPUCopy feature works in video encoding/decoding tasks?

 

Let's consider a scenario without the GPUCopy option first.
The application holds the video content in buffers in RAM (it has to be this way for architectural reasons).
When the encoder or decoder is initialized, the MFX_IOPATTERN_IN_VIDEO_MEMORY or MFX_IOPATTERN_OUT_VIDEO_MEMORY model is specified.
When it's time to encode the next frame, a LockSurface/Map call is made on the mfxFrameSurface1, the video content is copied between the application's RAM and the surface, then the surface is unlocked and the mfxFrameSurface1 is submitted to the VPL engine.
The copy is optimized with _mm_stream_load_si128/_mm_stream_si128 instructions, but the CPU is still the one doing the work.
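Roughly, in code, the flow I'm describing looks like this (an illustrative NV12 sketch with made-up names, not a complete program; a plain row copy stands in for the streaming-load/store loop):

```cpp
// Sketch of the no-GPUCopy path described above: the CPU copies the
// application's RAM buffer into a locked video-memory surface, which is then
// submitted to the encoder (MFX_IOPATTERN_IN_VIDEO_MEMORY).
#include <cstring>
#include <mfxvideo.h>

// Row-by-row copy of an NV12 frame from a contiguous RAM buffer into a locked
// surface; the streaming-load/store version does the same job faster, but
// either way the CPU performs the copy.
static void CopyNv12ToSurface(const mfxU8 *src, mfxU16 width, mfxU16 height,
                              mfxFrameData *dst)
{
    for (mfxU32 y = 0; y < height; ++y)                          // luma plane
        std::memcpy(dst->Y + y * dst->Pitch, src + y * width, width);
    const mfxU8 *srcUV = src + (mfxU32)width * height;
    for (mfxU32 y = 0; y < (mfxU32)height / 2; ++y)              // interleaved chroma
        std::memcpy(dst->UV + y * dst->Pitch, srcUV + y * width, width);
}

mfxStatus SubmitFromRam(mfxSession session, mfxFrameAllocator &alloc,
                        mfxFrameSurface1 *surf, const mfxU8 *appFrame,
                        mfxU16 width, mfxU16 height,
                        mfxBitstream &bs, mfxSyncPoint &syncp)
{
    // Map the video-memory surface so the CPU can write into it.
    mfxStatus sts = alloc.Lock(alloc.pthis, surf->Data.MemId, &surf->Data);
    if (sts != MFX_ERR_NONE)
        return sts;

    CopyNv12ToSurface(appFrame, width, height, &surf->Data);     // CPU is busy here

    alloc.Unlock(alloc.pthis, surf->Data.MemId, &surf->Data);

    // Hand the video-memory surface to the encoder.
    return MFXVideoENCODE_EncodeFrameAsync(session, nullptr, surf, &bs, &syncp);
}
```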


Now let's go further.
If the GPU has the GPUCopy capability, does this mean that:
When initializing the encoder/decoder, we can specify MFX_IOPATTERN_IN_SYSTEM_MEMORY/MFX_IOPATTERN_OUT_SYSTEM_MEMORY and an external allocator.
And when filling the next mfxFrameSurface1 structure, we can point it directly at the application's frame buffer.
And the GPU itself will copy the content between the application's arbitrary memory and video memory, the CPU will not be involved, and double copying will not occur.
Is that correct?
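In code, what I have in mind is something like this (an illustrative NV12 sketch with made-up names):

```cpp
// The hypothesized GPUCopy path: the surface's data pointers are aimed
// directly at the application's own NV12 buffer in RAM, with the encoder
// initialized for MFX_IOPATTERN_IN_SYSTEM_MEMORY.
#include <mfxvideo.h>

mfxStatus SubmitAppBuffer(mfxSession session, mfxFrameSurface1 *surf,
                          mfxU8 *appNv12, mfxU16 width, mfxU16 height,
                          mfxBitstream &bs, mfxSyncPoint &syncp)
{
    // No CPU-side copy: just point the surface at the application's buffer.
    surf->Data.Y     = appNv12;                             // luma plane
    surf->Data.UV    = appNv12 + (mfxU32)width * height;    // interleaved chroma
    surf->Data.Pitch = width;

    // The question: does the runtime then use the GPU (GPUCopy) to move this
    // data into video memory, leaving the CPU out of it?
    return MFXVideoENCODE_EncodeFrameAsync(session, nullptr, surf, &bs, &syncp);
}
```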


If it matters, the environment is Windows.

 

7 Replies
Rahila_T_Intel
Moderator

Hi,


Thank you for posting in Intel Communities.


Let's consider a scenario where you are building a video processing (VPP) application that performs color space conversion and resizing on a sequence of video frames.


When 'gpucopy' is enabled:

1. The video frames are first read from disk and loaded into system memory (CPU memory), into an mfxFrameSurface1.

2. The frames are then copied from system memory to GPU memory using a 'gpucopy' operation.

3. The GPU performs color space conversion and resizing on the frames.

4. The processed frames are copied back from GPU memory to system memory using another 'gpucopy' operation.

5. Finally, the processed frames are saved to disk or sent to another part of the system for further processing.


When 'gpucopy' is disabled:

1. The video frames are read from disk and loaded into system memory.

2. The GPU performs color space conversion and resizing on the frames directly from system memory, without copying the data to GPU memory first.

3. The processed frames are saved to disk or sent to another part of the system for further processing directly from system memory.
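Note that from the application's point of view both flows look the same; 'gpucopy' only changes how the runtime moves the data internally. A minimal per-frame sketch with system-memory surfaces is below (illustrative names, error handling trimmed):

```cpp
// Per-frame VPP call with system-memory input/output surfaces. Whether the
// runtime stages the data through GPU memory (gpucopy enabled) or works on
// system memory directly depends on the session settings, not on this code.
#include <mfxvideo.h>

mfxStatus ProcessOneFrame(mfxSession session, mfxFrameSurface1 *in,
                          mfxFrameSurface1 *out)
{
    mfxSyncPoint syncp = nullptr;

    // VPP is assumed to be initialized with
    // MFX_IOPATTERN_IN_SYSTEM_MEMORY | MFX_IOPATTERN_OUT_SYSTEM_MEMORY
    // and with color conversion / resize configured in mfxVideoParam::vpp.
    mfxStatus sts = MFXVideoVPP_RunFrameVPPAsync(session, in, out, nullptr, &syncp);
    if (sts != MFX_ERR_NONE)
        return sts;

    // Wait for completion; 'out' can then be written to disk or passed on.
    return MFXVideoCORE_SyncOperation(session, syncp, MFX_INFINITE);
}
```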


Disabling 'gpucopy' might seem like a more efficient approach because it avoids the overhead of copying data between the system memory and the GPU memory. However, accessing the system memory directly from the GPU can result in lower performance due to the increased latency and lower bandwidth compared to the GPU's dedicated memory. In many cases, it is more efficient to perform 'gpucopy' operations and work with the data in the GPU memory, even with the additional overhead.


The optimal approach depends on the specific hardware, the size of the data being processed, and the nature of the processing tasks. In some cases, 'gpucopy' might be essential for achieving acceptable performance, while in others, it might be possible to achieve similar performance with or without 'gpucopy'. As a developer, it is essential to profile and optimize your application for the specific target hardware and use case.



It's important to note that the actual performance improvement depends on your specific hardware and use case. Enabling MFX_GPUCOPY_ON might not always result in better performance, so it's essential to test and profile your application with and without the flag to determine the best configuration for your needs.
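For testing, the flag is set per session at initialization time. A minimal sketch using the legacy MFXInitEx path (the implementation and version values here are illustrative):

```cpp
// Create a session with GPUCopy explicitly enabled via the legacy init path.
#include <mfxvideo.h>

mfxStatus CreateSessionWithGpuCopy(mfxSession *session)
{
    mfxInitParam par   = {};
    par.Implementation = MFX_IMPL_HARDWARE_ANY;   // GPU implementation
    par.Version.Major  = 1;
    par.Version.Minor  = 0;
    par.GPUCopy        = MFX_GPUCOPY_ON;          // use MFX_GPUCOPY_OFF to compare

    return MFXInitEx(par, session);
}
```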


DeviceCopy is the oneVPL 2.x equivalent of mfxInitParam::GPUCopy. When enabled, it uses the GPU's EUs (CM kernels) for accelerated copies between video memory and system memory.
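With the oneVPL 2.x dispatcher this is requested through a config filter property, roughly as sketched below; the property name string is assumed from the mfxInitializationParam::DeviceCopy field naming convention and should be checked against the dispatcher documentation:

```cpp
// Request DeviceCopy (the 2.x counterpart of GPUCopy) through the VPL dispatcher.
#include <vpl/mfxdispatcher.h>
#include <vpl/mfxvideo.h>

mfxStatus CreateVplSessionWithDeviceCopy(mfxLoader *loaderOut, mfxSession *session)
{
    mfxLoader loader = MFXLoad();
    if (!loader)
        return MFX_ERR_NOT_INITIALIZED;

    mfxConfig cfg = MFXCreateConfig(loader);

    mfxVariant value;
    value.Type     = MFX_VARIANT_TYPE_U16;
    value.Data.U16 = MFX_GPUCOPY_ON;
    // Property path assumed from the structure/field naming convention.
    MFXSetConfigFilterProperty(cfg,
        (const mfxU8 *)"mfxInitializationParam.DeviceCopy", value);

    *loaderOut = loader;   // keep the loader alive for the session's lifetime
    return MFXCreateSession(loader, 0, session);
}
```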


If this resolves your query, kindly accept this as a solution as it will help others with a similar query.


Thanks


OTorg
New Contributor III

Hi Rahila,

Thank you for the detailed answer.

 

I would like to clarify two more details.

 

1.

Your phrase "frames are copied from the system memory to the GPU memory using a 'gpucopy' operation".

Who performs that 'gpucopy' operation: the CPU/driver or the GPU/firmware?

 

2.

When I make a LockSurface/Map call on a hardware surface, I get a pointer to the frame data.

Is that a pointer directly into GPU memory, mapped into my application's address space (so accesses to it go over the PCIe bus rather than to RAM)?

 

StreaX
Beginner
Can you tell me how to actually disable GPU copy on an Intel Arc GPU (A380)?
StreaX
Beginner
Because of this GPU copy, I always get encoder overload in OBS no matter what settings I use; an NVIDIA GPU, on the other hand, doesn't use GPU copy while encoding.
Rahila_T_Intel
Moderator

Hi,


Please find the responses to your questions:


1. The Intel VPL runtime creates CM GPU kernels for the accelerated copy between video memory and system memory, so the copy itself runs on the GPU.


2. The pointer you get when locking or mapping a hardware surface points to a region of system memory that is mapped to the corresponding GPU memory. This lets you access the frame data from the CPU while the VPL runtime manages the necessary synchronization and data transfer between GPU and system memory.
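For completeness, a minimal sketch of that mapping with the 2.x frame interface (it assumes a surface produced by the runtime, e.g. a decoder output):

```cpp
// Map a hardware surface for CPU read access through the 2.x frame interface.
// The runtime performs whatever transfer and synchronization is needed so
// that Data.Y/Data.UV are valid CPU-visible pointers between Map and Unmap.
#include <vpl/mfxvideo.h>

mfxStatus ReadBackSurface(mfxFrameSurface1 *surf)
{
    // Make sure the operation that produced the surface has finished.
    mfxStatus sts = surf->FrameInterface->Synchronize(surf, MFX_INFINITE);
    if (sts != MFX_ERR_NONE)
        return sts;

    sts = surf->FrameInterface->Map(surf, MFX_MAP_READ);
    if (sts != MFX_ERR_NONE)
        return sts;

    // surf->Data.Y / surf->Data.UV / surf->Data.Pitch can be read here;
    // copy the planes out if the data is needed after Unmap.

    surf->FrameInterface->Unmap(surf);

    // Drop the application's reference once it is done with the frame.
    return surf->FrameInterface->Release(surf);
}
```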


If this resolves your issue, make sure to accept this as a solution. This will help others with a similar issue.


Thanks


OTorg
New Contributor III

Thank you, Rahila!

Rahila_T_Intel
Moderator

Hi,


Glad to know that your query is resolved. Thanks for accepting our solution. 

If you need any additional information, please post a new question as this thread will no longer be monitored by Intel.



Thanks

