Hi,
Could you please explain how exactly the GPUCopy feature works in video encoding/decoding tasks?
First, consider a scenario without the GPUCopy option.
The application keeps the video content in buffers in RAM (this is required for architectural reasons).
When the encoder or decoder is initialized, MFX_IOPATTERN_IN_VIDEO_MEMORY or MFX_IOPATTERN_OUT_VIDEO_MEMORY is specified.
When it's time to encode the next frame, a LockSurface/Map call is made on the mfxFrameSurface1, the video content is copied between the application's RAM and the surface, the surface is unlocked, and the mfxFrameSurface1 is sent to the VPL engine.
The copy is optimized using _mm_stream_load_si128/_mm_stream_si128 instructions, but the CPU is nevertheless engaged in it.
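For reference, the CPU-side copy described above can be sketched as a row-by-row copy that respects the surface pitch. This is a minimal sketch, assuming an 8-bit NV12 frame with even height that is tightly packed in application RAM. The `locked_nv12` struct and `copy_nv12_to_surface` function are hypothetical stand-ins for the plane pointers and pitch exposed by a locked mfxFrameSurface1, not VPL API, and the sketch uses plain `memcpy` rather than the `_mm_stream_*` intrinsics mentioned above.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the plane pointers and pitch of a locked
   surface (NOT the real mfxFrameData type). */
typedef struct {
    uint8_t *y;      /* luma plane */
    uint8_t *uv;     /* interleaved chroma plane (NV12) */
    size_t   pitch;  /* bytes per row; may be larger than the width */
} locked_nv12;

/* Copy a tightly packed NV12 frame (row stride == width) from application
   RAM into a locked surface whose rows may be padded to `pitch` bytes.
   Assumes an even height, so the UV plane has height / 2 rows. */
static void copy_nv12_to_surface(const uint8_t *src, size_t width,
                                 size_t height, locked_nv12 *dst)
{
    const uint8_t *src_uv = src + width * height; /* UV follows Y */

    /* Luma: `height` rows of `width` bytes each. */
    for (size_t row = 0; row < height; ++row)
        memcpy(dst->y + row * dst->pitch, src + row * width, width);

    /* Chroma: height / 2 rows of interleaved U/V, `width` bytes each. */
    for (size_t row = 0; row < height / 2; ++row)
        memcpy(dst->uv + row * dst->pitch, src_uv + row * width, width);
}
```

The reverse direction (surface to application RAM, after decoding) swaps source and destination in each `memcpy`. Either way, every byte passes through the CPU, which is the cost this question is about.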
Let's go further.
If the GPU supports GPUCopy, does this mean the following?
When initializing the encoder/decoder, we can specify MFX_IOPATTERN_IN_SYSTEM_MEMORY/MFX_IOPATTERN_OUT_SYSTEM_MEMORY and an external allocator.
When filling the next mfxFrameSurface1 structure, we can point it directly at the application's frame buffer.
The GPU itself then copies the content between the application's arbitrary memory and video memory without involving the CPU, so no double copying occurs.
Is this correct?
If it matters, the environment is Windows.
Hi,
Thank you for posting in Intel Communities.
Consider a scenario where you are building an application that uses video processing (VPP) to perform color space conversion and resizing on a sequence of video frames.
When 'gpucopy' is enabled:
1. The video frames are first read from disk and loaded into mfxFrameSurface1 surfaces in system (CPU) memory.
2. The frames are then copied from system memory to GPU memory using a 'gpucopy' operation.
3. The GPU performs color space conversion and resizing on the frames.
4. The processed frames are copied back from GPU memory to system memory using another 'gpucopy' operation.
5. Finally, the processed frames are saved to disk or sent to another part of the system for further processing.
When 'gpucopy' is disabled:
1. The video frames are read from disk and loaded into system memory.
2. The GPU performs color space conversion and resizing on the frames directly from system memory, without first copying the data to GPU memory.
3. The processed frames are saved to disk, or sent to another part of the system for further processing, directly from system memory.
Disabling 'gpucopy' might seem like a more efficient approach because it avoids the overhead of copying data between the system memory and the GPU memory. However, accessing the system memory directly from the GPU can result in lower performance due to the increased latency and lower bandwidth compared to the GPU's dedicated memory. In many cases, it is more efficient to perform 'gpucopy' operations and work with the data in the GPU memory, even with the additional overhead.
The optimal approach depends on the specific hardware, the size of the data being processed, and the nature of the processing tasks. In some cases, 'gpucopy' might be essential for achieving acceptable performance, while in others, it might be possible to achieve similar performance with or without 'gpucopy'. As a developer, it is essential to profile and optimize your application for the specific target hardware and use case.
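To get a feel for how much data these copies move, a back-of-the-envelope calculation helps. This sketch assumes 8-bit NV12 frames (a full-size Y plane plus a half-height interleaved UV plane) and exactly one copy per frame per direction; the function names are hypothetical and not part of any VPL API.

```c
#include <stdint.h>

/* Bytes in one 8-bit NV12 frame: a width x height Y plane plus a
   width x (height / 2) interleaved UV plane. Assumes an even height. */
static uint64_t nv12_frame_bytes(uint64_t width, uint64_t height)
{
    return width * height + width * (height / 2);
}

/* Bytes moved per second in ONE direction between system memory and
   video memory when every frame is copied exactly once. */
static uint64_t copy_bytes_per_second(uint64_t width, uint64_t height,
                                      uint64_t fps)
{
    return nv12_frame_bytes(width, height) * fps;
}
```

For 1920x1080 this gives 3,110,400 bytes per frame, or 186,624,000 bytes/s at 60 fps in one direction. The 'gpucopy'-enabled workflow above copies each frame both in and out, so the total traffic is roughly double that.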
The actual performance gain depends on your hardware and use case. Setting GPUCopy to MFX_GPUCOPY_ON does not always improve performance, so test and profile your application with and without the flag to determine the best configuration for your needs.
DeviceCopy is the equivalent of mfxInitParam::GPUCopy. When enabled, it uses the GPU execution units (EUs), running CM (C for Metal) kernels, for accelerated copies between video memory and system memory.
If this resolves your query, kindly accept this as a solution as it will help others with a similar query.
Thanks
Hi Rahila,
Thank you for the detailed answer.
I would like to clarify two more details.
1. Regarding your phrase "frames are copied from the system memory to the GPU memory using a 'gpucopy' operation": who performs that 'gpucopy' operation, the CPU/driver or the GPU/firmware?
2. When I call LockSurface/Map on a hardware surface, I get a pointer to the frame data. Is it a pointer directly into GPU memory that is mapped into my application's address space (so that accesses go over the PCIe bus rather than to RAM)?
Hi,
Please find the responses to your questions below:
1. The Intel VPL runtime creates CM GPU kernels for accelerated copies between video memory and system memory, so the copy runs on the GPU's execution units.
2. The pointer you get when locking or mapping a hardware surface points to a region in system memory that is mapped to the corresponding GPU memory. This lets you access the frame data from the CPU while the VPL runtime manages the necessary synchronization and data transfer between GPU and system memory.
If this resolves your issue, make sure to accept this as a solution. This would help others with similar issue.
Thanks
Thank you, Rahila!
Hi,
Glad to know that your query is resolved. Thanks for accepting our solution.
If you need any additional information, please post a new question as this thread will no longer be monitored by Intel.
Thanks