3 Steps to Hardware Accelerated HDR Video Processing and Transcoding with oneVPL

Pamela_H_Intel

Authors

Furong Zhang, GPU Software Development Engineer, Intel Corporation

Pamela Harrison, Software Technical Consulting Engineer

Abstract

HDR (High Dynamic Range) technology brings life to natural scenes with vivid colors, enables visible content even under challenging lighting conditions, and provides higher contrast in the scenes, thereby allowing better artistic representations.

Intel® oneAPI Video Processing Library (oneVPL) provides access to hardware accelerated HDR video processing and transcoding capabilities on Intel® Graphics.

With this tutorial, you will be able to optimize and accelerate your own HDR video processing and transcoding.

Contents

Quick Overview: HDR video processing and transcoding

Let’s Get to It

> HDR Video Decoding

> HDR Video Processing

>> HDR Metadata-Based Tone Mapping

>> Look Up Table (LUT)-Based HDR Tone Mapping

> HDR Video Encoding

FFmpeg Integration

Acknowledgement

Summary

Legal Information

Quick Overview: HDR video processing and transcoding

The article “Deliver High-Quality, High-Performance HDR via Intel® oneAPI Video Processing Library (oneVPL)” introduces HDR technology in detail and explains, at a high level, how to deliver HDR via oneVPL. We worked with application developers to enable HDR in their applications. As a result of that work, we created this step-by-step tutorial with sample code to help expedite your own enabling work. Follow this process to do HDR processing and transcoding on Intel® Graphics via oneVPL.

Let’s Get to It

Prerequisites

For HDR processing and transcoding, application developers need to attach a few external buffers via API calls in their code to achieve the desired results.

The sample code below and explanations list the steps needed to achieve HDR video processing and transcoding. The tutorial is divided into decoding, processing and encoding sections.

Note: Regarding other techniques in oneVPL such as encoding bit rate control, please refer to oneVPL documentation and possibly other sample code.

HDR Video Decoding

oneVPL provides HDR decoding functionality. Your application should extract HDR information from a bitstream via the oneVPL API during the HEVC/AV1 HDR decoding process. The following 3 structures are needed:

  • mfxExtVideoSignalInfo,
  • mfxExtMasteringDisplayColourVolume,
  • mfxExtContentLightLevelInfo.

Note that according to the oneVPL API specification, oneVPL will not do any conversion of these parameters. These values are extracted from the bitstream directly in the decoding process according to the video specification in use, such as HEVC or AV1.

Use the sample code below to extract the video signal information from the encoded bit stream. The video signal information in the bitstream indicates how the pictures should be interpreted and also ensures effective use of the decoded video pictures (for example, they can be used in display process). For the value and meaning of the specific field, please refer to the specification (HEVC, AV1, etc.).

mfxBitstream bit_stream = {};
mfxSession session = NULL;
mfxSyncPoint syncp;

mfxExtVideoSignalInfo vsi = { 0 };
vsi.Header.BufferId       = MFX_EXTBUFF_VIDEO_SIGNAL_INFO;
vsi.Header.BufferSz       = sizeof(vsi);

mfxExtBuffer *ext_buffer[1];
ext_buffer[0]        = (mfxExtBuffer *)&vsi;
mfxVideoParam params = {};
params.NumExtParam   = 1;
params.ExtParam      = (mfxExtBuffer **)&ext_buffer[0];

MFXVideoDECODE_DecodeHeader(session, &bit_stream, &params);
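
Once DecodeHeader has filled mfxExtVideoSignalInfo, the colour fields hold raw code points from the bitstream. As a minimal sketch of how an application might interpret them, the helper below classifies a stream using the code points that appear in this tutorial (1 for BT.709, 9 for BT.2020 primaries, 16 for the ST 2084 transfer function) plus 18, which the HEVC specification assigns to HLG. The SignalInfo struct, the DynamicRange enum, and classify() are hypothetical stand-ins introduced for illustration and are not part of oneVPL; a real application would read the same fields from vsi. Treating every other transfer function as SDR is a simplifying assumption.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the colour fields of mfxExtVideoSignalInfo.
struct SignalInfo {
    std::uint16_t ColourDescriptionPresent;
    std::uint16_t ColourPrimaries;
    std::uint16_t TransferCharacteristics;
    std::uint16_t MatrixCoefficients;
};

enum class DynamicRange { Unknown, Sdr, HdrPq, HdrHlg };

// Classify a stream from its transfer-characteristics code point.
// Assumption: anything that is not PQ or HLG is treated as SDR.
inline DynamicRange classify(const SignalInfo& s) {
    assert(s.ColourDescriptionPresent <= 1);  // one-bit flag in the bitstream
    if (s.ColourDescriptionPresent == 0) {
        return DynamicRange::Unknown;         // no colour description signalled
    }
    if (s.TransferCharacteristics == 16) {
        return DynamicRange::HdrPq;           // SMPTE ST 2084 (PQ)
    }
    if (s.TransferCharacteristics == 18) {
        return DynamicRange::HdrHlg;          // ARIB STD-B67 (HLG)
    }
    return DynamicRange::Sdr;
}
```

A stream with the values used in this tutorial (primaries 9, transfer 16, matrix 9) is classified as HdrPq, which is the case where the mastering display and content light level metadata described next are expected to be present.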

Once the bitstream parameters have been extracted, you will need to extract light and color information from the HDR bitstream.

The sample code below extracts the content light level data and the mastering display colour volume. The structure “mfxExtContentLightLevelInfo” carries the static metadata for HDR10 content. It includes:

  • Maximum Content Light Level and
  • Maximum Frame Average Light Level.

The structure “mfxExtMasteringDisplayColourVolume” identifies the colour volume of a display considered to be the mastering display for the associated video content:

  • colour primaries,
  • white point,
  • luminance range.

An example might be the colour volume of a display that was used for viewing while creating the video content.

mfxExtMasteringDisplayColourVolume mdcv = {0};
mdcv.Header.BufferId = MFX_EXTBUFF_MASTERING_DISPLAY_COLOUR_VOLUME;
mdcv.Header.BufferSz = sizeof(mdcv);

mfxExtContentLightLevelInfo clli = {0};
clli.Header.BufferId = MFX_EXTBUFF_CONTENT_LIGHT_LEVEL_INFO;
clli.Header.BufferSz = sizeof(clli);

mfxExtBuffer *ext_buffer1[2];
ext_buffer1[0] = (mfxExtBuffer *)&mdcv;
ext_buffer1[1] = (mfxExtBuffer *)&clli;

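// surface_work must reference a valid work surface before it is dereferenced
// (allocated by the application or obtained from oneVPL after decoder initialization).
mfxFrameSurface1 *surface_work = NULL, *surface_display = NULL;
// ... acquire surface_work here ...
surface_work->Data.NumExtParam = 2;
surface_work->Data.ExtParam    = (mfxExtBuffer **)&ext_buffer1[0];
MFXVideoDECODE_DecodeFrameAsync(session, &bit_stream, surface_work, &surface_display, &syncp);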

The parsed HDR information is attached to the extended buffers (mfxExtBuffer) of the surface_display parameter of MFXVideoDECODE_DecodeFrameAsync(). The InsertPayloadToggle flag indicates whether the clip contains a valid HDR SEI message: it is set to MFX_PAYLOAD_IDR if oneVPL gets a valid HDR SEI message; otherwise it is set to MFX_PAYLOAD_OFF.

This concludes the decoding segment of the tutorial. Now we will look at video processing.

HDR Video Processing

Based on the usage scenario, various video processing techniques are needed. Generally speaking, HDR tone mapping is needed in most HDR use cases. oneVPL provides 2 approaches for HDR tone mapping or processing. One approach is HDR metadata-based tone mapping for which application developers need to pass HDR metadata to oneVPL, then oneVPL will do the processing automatically. Another approach is look-up-table-based processing. With this second approach, application developers can customize the video effect. This approach requires the application developer to create the look up table and pass it to oneVPL.

Note: oneVPL only defines the memory layout of the look up table. The algorithm of generating the look up table is determined by the application developers and is beyond the scope of this tutorial.

HDR Metadata-Based Tone Mapping

The sample code below converts an HDR video to an SDR video via HDR metadata-based tone mapping.

In this approach, the following parameters need to be set for the input:

  • mfxExtVideoSignalInfo,
  • mfxExtMasteringDisplayColourVolume,
  • mfxExtContentLightLevelInfo;

The following parameters need to be set for the output:

  • mfxExtVideoSignalInfo,
  • mfxExtMasteringDisplayColourVolume (for HDR output)
  • Note: do not set mfxExtMasteringDisplayColourVolume for SDR output.

Additional Notes

  • The value of mfxExtVideoSignalInfo needs to follow the video specification (HEVC, AV1 etc.).
  • The fields in mfxExtContentLightLevelInfo also must follow the video specification (HEVC, AV1 etc.).
  • The units of MaxContentLightLevel and MaxPicAverageLightLevel are both nits (candelas per square meter).
  • In mfxExtMasteringDisplayColourVolume, DisplayPrimariesX[3] and WhitePointX are in increments of 0.00002, in the range of [5, 37000].
  • DisplayPrimariesY[3] and WhitePointY are in increments of 0.00002, in the range of [5, 42000].
  • MaxDisplayMasteringLuminance is in units of 1 nit (candela per square meter).
  • MinDisplayMasteringLuminance is in units of 0.0001 nit (candela per square meter).
  • The structure mfxExtMasteringDisplayColourVolume in video processing needs to follow the description of the oneVPL API since different video specifications may have different units (e.g., HEVC units are different from AV1 units).
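
The chromaticity units in the notes above can be derived from ordinary CIE 1931 xy coordinates by dividing by 0.00002. The following sketch shows that conversion together with the range checks from the notes; the function names are hypothetical helpers for illustration, not oneVPL API.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Convert a CIE 1931 x or y coordinate (0.0 to 1.0) to integer
// increments of 0.00002, i.e. multiply by 50000 and round.
inline std::uint16_t to_chromaticity_units(double xy) {
    assert(xy >= 0.0 && xy <= 1.0);  // 1.0 maps to 50000, which fits in 16 bits
    return static_cast<std::uint16_t>(std::lround(xy * 50000.0));
}

// Range check for DisplayPrimariesX[] and WhitePointX: [5, 37000].
inline bool is_valid_x(std::uint16_t x) {
    return x >= 5 && x <= 37000;
}

// Range check for DisplayPrimariesY[] and WhitePointY: [5, 42000].
inline bool is_valid_y(std::uint16_t y) {
    return y >= 5 && y <= 42000;
}
```

For instance, to_chromaticity_units(0.170) gives 8500 and to_chromaticity_units(0.797) gives 39850, the values assigned to DisplayPrimariesX[0] and DisplayPrimariesY[0] in the sample code that follows.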

// HDR input
mfxExtVideoSignalInfo vsi_in    = { 0 };
vsi_in.Header.BufferId          = MFX_EXTBUFF_VIDEO_SIGNAL_INFO_IN;
vsi_in.Header.BufferSz          = sizeof(vsi_in);
vsi_in.VideoFormat              = 0;
vsi_in.VideoFullRange           = 0;
vsi_in.ColourDescriptionPresent = 1;
vsi_in.ColourPrimaries          = 9; // BT2020
vsi_in.TransferCharacteristics  = 16; // ST2084
vsi_in.MatrixCoefficients       = 9;

// SDR output
mfxExtVideoSignalInfo vsi_out    = { 0 };
vsi_out.Header.BufferId          = MFX_EXTBUFF_VIDEO_SIGNAL_INFO_OUT;
vsi_out.Header.BufferSz          = sizeof(vsi_out);
vsi_out.VideoFormat              = 0;
vsi_out.VideoFullRange           = 0;
vsi_out.ColourDescriptionPresent = 1;
vsi_out.ColourPrimaries          = 1; // BT709
vsi_out.TransferCharacteristics  = 1; // BT709
vsi_out.MatrixCoefficients       = 1;

// HDR input
mfxExtMasteringDisplayColourVolume mdcv_in = { 0 };
mdcv_in.Header.BufferId                    = MFX_EXTBUFF_MASTERING_DISPLAY_COLOUR_VOLUME_IN;
mdcv_in.Header.BufferSz                    = sizeof(mdcv_in);
mdcv_in.InsertPayloadToggle                = MFX_PAYLOAD_IDR;
mdcv_in.DisplayPrimariesX[0]               = 8500;
mdcv_in.DisplayPrimariesX[1]               = 35400;
mdcv_in.DisplayPrimariesX[2]               = 6550;
mdcv_in.DisplayPrimariesY[0]               = 39850;
mdcv_in.DisplayPrimariesY[1]               = 14600;
mdcv_in.DisplayPrimariesY[2]               = 2300;
mdcv_in.WhitePointX                        = 15636;
mdcv_in.WhitePointY                        = 16450;
mdcv_in.MaxDisplayMasteringLuminance       = 2000;    //in units of nit
mdcv_in.MinDisplayMasteringLuminance       = 1;   //in units of 0.0001 nit

mfxExtContentLightLevelInfo clli_in = { 0 };
clli_in.Header.BufferId             = MFX_EXTBUFF_CONTENT_LIGHT_LEVEL_INFO;
clli_in.Header.BufferSz             = sizeof(clli_in);
clli_in.InsertPayloadToggle         = MFX_PAYLOAD_IDR;
clli_in.MaxContentLightLevel        = 2000;   //in units of nit
clli_in.MaxPicAverageLightLevel     = 2000;   //in units of nit

mfxExtBuffer *ext_buffer[4];
ext_buffer[0] = (mfxExtBuffer *)&vsi_in;
ext_buffer[1] = (mfxExtBuffer *)&vsi_out;
ext_buffer[2] = (mfxExtBuffer *)&mdcv_in;
ext_buffer[3] = (mfxExtBuffer *)&clli_in;

mfxVideoParam params = {};
params.NumExtParam   = 4;
params.ExtParam      = (mfxExtBuffer **)&ext_buffer[0];
MFXVideoVPP_Init(session, &params);

Look Up Table (LUT)-Based HDR Tone Mapping

The sample code below converts an HDR video to an SDR video via LUT (3DLUT)-based tone mapping (or processing). This approach provides a flexible and customized way to achieve tone mapping. Application developers can generate a look up table and pass it to oneVPL for hardware acceleration. The memory that holds the look up table (3DLUT) can be system memory (mfx3DLutSystemBuffer) or video memory (mfx3DLutVideoBuffer).

If the LUT data changes from frame to frame, we suggest using video memory (mfx3DLutVideoBuffer), which can eliminate some of the copying between system and video memory, thereby improving workload efficiency.

  • In this case, the application typically uses a GPU shader or OpenCL kernel to write data to the 3DLUT video buffer, and the graphics VEBOX unit reads data from this buffer directly. The application can use several 3DLUT video buffers (several mfx3DLutVideoBuffer structures) for better efficiency, so that writes (GPU shader / OpenCL) and reads (VEBOX hardware unit) can target different 3DLUT video memory addresses simultaneously.
  • On the other hand, if the 3DLUT is in system memory, the oneVPL / driver implementation usually needs to copy this system memory to video memory so that the hardware can access it directly. This first sample code demonstrates a 3DLUT in system memory.
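
LUT authoring tools often store entries as floating-point values in [0, 1], while the system-memory sample below uses MFX_DATA_TYPE_U16 channels. One plausible way to bridge the two, sketched here under the assumption that 1.0 should map to the full 16-bit value 0xFFFF, is to clamp and scale each entry before handing the arrays to oneVPL. The function names are hypothetical, and the order of entries within each channel buffer is left to the application.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <vector>

// Map one floating-point LUT entry to 16 bits. Out-of-range input is
// clamped so the scaled value cannot overflow; NaN is mapped to 0.
inline std::uint16_t quantize_u16(float v) {
    if (!(v > 0.0f)) {
        return 0;        // negative values, zero, and NaN
    }
    if (v >= 1.0f) {
        return 0xFFFF;   // clamp anything at or above full scale
    }
    return static_cast<std::uint16_t>(
        std::lround(static_cast<double>(v) * 65535.0));
}

// Quantize a whole channel of floating-point LUT entries.
inline std::vector<std::uint16_t> quantize_channel(const std::vector<float>& in) {
    std::vector<std::uint16_t> out;
    out.reserve(in.size());
    for (float v : in) {
        out.push_back(quantize_u16(v));
    }
    assert(out.size() == in.size());
    return out;
}
```

The resulting vectors could back the r_corr, g_corr, and b_corr pointers in the sample, provided they stay alive for as long as oneVPL may read them.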

mfxBitstream bit_stream = {};
mfxSession session      = NULL;
mfxSyncPoint syncp;
mfxFrameSurface1 surface;

// HDR input
mfxExtVideoSignalInfo vsi_in    = { 0 };
vsi_in.Header.BufferId          = MFX_EXTBUFF_VIDEO_SIGNAL_INFO_IN;
vsi_in.Header.BufferSz          = sizeof(vsi_in);
vsi_in.VideoFormat              = 0;
vsi_in.VideoFullRange           = 0;
vsi_in.ColourDescriptionPresent = 1;
vsi_in.ColourPrimaries          = 9; // BT2020
vsi_in.TransferCharacteristics  = 16; // ST2084
vsi_in.MatrixCoefficients       = 9;

// SDR output
mfxExtVideoSignalInfo vsi_out   = { 0 };
vsi_out.Header.BufferId         = MFX_EXTBUFF_VIDEO_SIGNAL_INFO_OUT;
vsi_out.Header.BufferSz          = sizeof(vsi_out);
vsi_out.VideoFormat              = 0;
vsi_out.VideoFullRange           = 0;
vsi_out.ColourDescriptionPresent = 1;
vsi_out.ColourPrimaries          = 1; // BT709
vsi_out.TransferCharacteristics  = 1; // BT709
vsi_out.MatrixCoefficients       = 1;

// 3DLUT
mfxExtVPP3DLut lut  = {0};
lut.Header.BufferId = MFX_EXTBUFF_VPP_3DLUT;
lut.Header.BufferSz = sizeof(lut);
mfxU32 dim[3] = { 65, 65, 128 };
// r_corr, g_corr and b_corr must point to LUT data filled in by the application.
mfxU16 *r_corr, *g_corr, *b_corr;
lut.BufferType                             = MFX_RESOURCE_SYSTEM_SURFACE;
lut.SystemBuffer.Channel[0].Data16         = r_corr;
lut.SystemBuffer.Channel[0].Size           = 65;
lut.SystemBuffer.Channel[0].DataType       = MFX_DATA_TYPE_U16;
lut.SystemBuffer.Channel[1].Data16         = g_corr;
lut.SystemBuffer.Channel[1].Size           = 65;
lut.SystemBuffer.Channel[1].DataType       = MFX_DATA_TYPE_U16;
lut.SystemBuffer.Channel[2].Data16         = b_corr;
lut.SystemBuffer.Channel[2].Size           = 128;
lut.SystemBuffer.Channel[2].DataType       = MFX_DATA_TYPE_U16;

mfxExtBuffer *ext_buffer[3];
ext_buffer[0] = (mfxExtBuffer *)&vsi_in;
ext_buffer[1] = (mfxExtBuffer *)&vsi_out;
ext_buffer[2] = (mfxExtBuffer *)&lut;

mfxVideoParam params = {};
params.NumExtParam   = 3;
params.ExtParam      = (mfxExtBuffer **)&ext_buffer[0];

MFXVideoVPP_Init(session, &params);

This second LUT-based tone mapping code sample demonstrates how to create video memory in the mfx3DLutVideoBuffer and configure mfxExtVPP3DLut.

// Allocate 3DLUT video memory
mfxFrameAllocRequest request3dlut = {};
request3dlut.Info.FourCC       = MFX_FOURCC_P8;
request3dlut.Info.Width        = n3DLutVWidth;
request3dlut.Info.Height       = n3DLutVHeight;
request3dlut.NumFrameSuggested = 1; //This sample code uses only 1 frame, but in a real usage scenario, app can use a few frames for better efficiency and performance considering read/write different 3DLUT video memory address.
request3dlut.NumFrameMin       = 1;
request3dlut.Type = MFX_MEMTYPE_FROM_VPPIN | 
  MFX_MEMTYPE_VIDEO_MEMORY_PROCESSOR_TARGET;
pResources->p3dlutResponse = new mfxFrameAllocResponse;
auto pResponseOut          = pResources->p3dlutResponse;
sts = pAllocator->Alloc(pAllocator->pthis, &request3dlut, pResponseOut);           
mfxU16 nFrames = pResponseOut->NumFrameActual;
if (nFrames != 1)
    return MFX_ERR_MEMORY_ALLOC;

// Fill or write the valid data into the 3DLUT video memory

// Set mfxExtVPP3DLut
mfxHDLPair pair;
mfxHDL* hdl = &(pair.first);
sts         = pAllocator->GetHDL(pAllocator->pthis, pResponseOut->mids[0], hdl);  
ID3D11Texture2D* texture = (ID3D11Texture2D*)((mfxHDLPair*)(hdl))->first;

lutConfig->BufferType            = MFX_RESOURCE_DX11_TEXTURE;
lutConfig->VideoBuffer.DataType  = MFX_DATA_TYPE_U16;
lutConfig->VideoBuffer.MemLayout = MFX_3DLUT_MEMORY_LAYOUT_INTEL_65LUT;
lutConfig->VideoBuffer.MemId     = (ID3D11Texture2D*)texture;
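
The step of filling the 3DLUT video memory in the sample above is left to the application. As a rough sketch only, the code below fills a CPU-side staging buffer with an identity LUT using a packed layout of 65 x 65 x 128 entries with four 16-bit values per entry (R, G, B, reserved). That layout is an assumption taken from the reader discussion at the end of this article, not from the oneVPL specification, so verify it against the documentation for MFX_3DLUT_MEMORY_LAYOUT_INTEL_65LUT before relying on it. All names are hypothetical, and copying the staging buffer into the video surface is not shown.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr std::size_t kLutSize  = 65;   // LUT entries per colour axis
constexpr std::size_t kMulSize  = 128;  // assumed padded stride of the innermost (b) axis
constexpr std::size_t kChannels = 4;    // R, G, B, plus one reserved value

// Offset (in 16-bit elements) of the entry for grid point (r, g, b).
inline std::size_t packed_index(std::size_t r, std::size_t g, std::size_t b) {
    assert(r < kLutSize && g < kLutSize && b < kLutSize);
    return (r * kLutSize * kMulSize + g * kMulSize + b) * kChannels;
}

// Build an identity LUT: each grid point maps to its own colour,
// scaled so that the last grid index (64) becomes 0xFFFF.
inline std::vector<std::uint16_t> make_identity_lut() {
    std::vector<std::uint16_t> lut(kLutSize * kLutSize * kMulSize * kChannels, 0);
    auto level = [](std::size_t i) {
        // Rounded integer scaling of i from [0, 64] to [0, 65535].
        return static_cast<std::uint16_t>(
            (i * 65535 + (kLutSize - 1) / 2) / (kLutSize - 1));
    };
    for (std::size_t r = 0; r < kLutSize; ++r) {
        for (std::size_t g = 0; g < kLutSize; ++g) {
            for (std::size_t b = 0; b < kLutSize; ++b) {
                const std::size_t idx = packed_index(r, g, b);
                lut[idx + 0] = level(r);
                lut[idx + 1] = level(g);
                lut[idx + 2] = level(b);
                // lut[idx + 3] is the reserved value and stays 0.
            }
        }
    }
    return lut;
}
```

An identity table is a convenient first test: if the layout assumption holds, the processed output should match the input apart from rounding.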

HDR Video Encoding

Now that we have demonstrated various decoding and video processing techniques, let’s look at oneVPL HDR encoding functionality. For encoding, the application should add HDR SEI information into the bitstream via the oneVPL API for HEVC/AV1 HDR encoding. Application developers need to address these 3 structures:

  • mfxExtVideoSignalInfo,
  • mfxExtMasteringDisplayColourVolume,
  • mfxExtContentLightLevelInfo.

The sample code below inserts video signal data into an encoded bit stream during initialization. There are 2 approaches to indicate content light level and mastering display colour volume during encoding.

The first approach is to attach mfxExtContentLightLevelInfo and  mfxExtMasteringDisplayColourVolume to the mfxVideoParam structure during initialization or reset. In this case, the encoder inserts the HDR SEI message based on the InsertPayloadToggle flag.

The second approach is to attach these 2 structures to the mfxEncodeCtrl structure at runtime, per frame. The sample code below demonstrates attaching these 2 structures to mfxEncodeCtrl per frame. (Note that the video signal structure was set during initialization.)

mfxBitstream bit_stream = {};
mfxSession session      = NULL;
mfxSyncPoint syncp;
mfxFrameSurface1 surface;

mfxExtVideoSignalInfo vsi              = { 0 };
vsi.Header.BufferId                    = MFX_EXTBUFF_VIDEO_SIGNAL_INFO;
vsi.Header.BufferSz                    = sizeof(vsi);
vsi.VideoFormat                        = 0;
vsi.VideoFullRange                     = 0;
vsi.ColourDescriptionPresent           = 1;
vsi.ColourPrimaries                    = 9;  // BT2020
vsi.TransferCharacteristics            = 16; // ST2084
vsi.MatrixCoefficients                 = 9;

mfxExtBuffer *ext_buffer[1];
ext_buffer[0]        = (mfxExtBuffer *)&vsi;

mfxVideoParam params = {};
params.NumExtParam   = 1;
params.ExtParam      = (mfxExtBuffer **)&ext_buffer[0];
MFXVideoENCODE_Init(session, &params);

mfxExtMasteringDisplayColourVolume mdcv = { 0 };
mdcv.Header.BufferId                    = MFX_EXTBUFF_MASTERING_DISPLAY_COLOUR_VOLUME;
mdcv.Header.BufferSz                    = sizeof(mdcv);
mdcv.InsertPayloadToggle                = MFX_PAYLOAD_IDR;
mdcv.DisplayPrimariesX[0]               = 8500;
mdcv.DisplayPrimariesX[1]               = 35400;
mdcv.DisplayPrimariesX[2]               = 6550;
mdcv.DisplayPrimariesY[0]               = 39850;
mdcv.DisplayPrimariesY[1]               = 14600;
mdcv.DisplayPrimariesY[2]               = 2300;
mdcv.WhitePointX                        = 15636;
mdcv.WhitePointY                        = 16450;
mdcv.MaxDisplayMasteringLuminance       = 2000 * 10000;  // HEVC in units of 0.0001 nit
mdcv.MinDisplayMasteringLuminance       = 1*10000;  // HEVC in units of 0.0001 nit

mfxExtContentLightLevelInfo clli = { 0 };
clli.Header.BufferId             = MFX_EXTBUFF_CONTENT_LIGHT_LEVEL_INFO;
clli.Header.BufferSz             = sizeof(clli);
clli.InsertPayloadToggle         = MFX_PAYLOAD_IDR;
clli.MaxContentLightLevel        = 2000;
clli.MaxPicAverageLightLevel     = 2000;

mfxExtBuffer *ext_buffer1[2];
ext_buffer1[0] = (mfxExtBuffer *)&mdcv;
ext_buffer1[1] = (mfxExtBuffer *)&clli;

mfxEncodeCtrl enc_ctrl = {};  // zero-initialize so unused fields are not left indeterminate
enc_ctrl.ExtParam = (mfxExtBuffer **)&ext_buffer1[0];
enc_ctrl.NumExtParam = 2;

MFXVideoENCODE_EncodeFrameAsync(session, &enc_ctrl, &surface, &bit_stream, &syncp);
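
The encode sample above writes luminance in units of 0.0001 nit, so values given in nits have to be scaled by 10000 and must still fit in the destination field. A small helper along these lines can make the scaling explicit; nits_to_hevc_luminance_units() is a hypothetical name, and the sketch assumes the destination field is a 32-bit unsigned integer.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Largest whole-nit value whose 0.0001-nit representation fits in 32 bits
// (429496 * 10000 = 4294960000, below 2^32 - 1).
constexpr double kMaxNits = 429496.0;

// Convert a luminance in nits (cd/m^2) to units of 0.0001 nit.
inline std::uint32_t nits_to_hevc_luminance_units(double nits) {
    assert(nits >= 0.0 && nits <= kMaxNits);
    return static_cast<std::uint32_t>(std::llround(nits * 10000.0));
}
```

With this helper, a 2000-nit peak becomes 20000000, the same value the sample computes as 2000 * 10000.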

This concludes the encoding portion of our tutorial.

FFmpeg Integration

Users can download FFmpeg from the latest master branch or download release 6.1 to experience hardware accelerated HDR transcoding and processing. My colleague, Haihao Xiang, has provided the following FFmpeg command examples for HDR transcoding and processing as a reference. The executable is ffmpeg.exe on Windows and ffmpeg on Linux.

Transcode 10bit HDR mp4 input to 10bit SDR mp4 output:

ffmpeg.exe -hwaccel qsv -i hdr.b10.mp4 -vf "vpp_qsv=tonemap=1" -c:v hevc_qsv sdr.b10.mp4

Transcode 10bit HDR mp4 input to 8bit SDR mp4 output:

ffmpeg.exe -hwaccel qsv -i hdr.b10.mp4 -vf "vpp_qsv=tonemap=1,format=nv12" -c:v hevc_qsv sdr.b8.mp4

Transcode 10bit HDR mp4 input to 10bit HDR mp4 output with down scaling:

ffmpeg.exe -hwaccel qsv -i hdr.in.mp4 -vf "vpp_qsv=w=1920:h=1080" -c:v hevc_qsv hdr.out.mp4

Acknowledgement

I want to express my gratitude to Haihao Xiang, my colleague at Intel, for integrating the oneVPL HDR-related APIs into FFmpeg.

I also sincerely appreciate my Intel colleague Artem Galin for the end-to-end integration and for excellent suggestions from the application-level and usage-scenario perspective.

Both of these colleagues are excellent engineers and team contributors.

Summary

Intel oneAPI Video Processing Library provides an open programming interface for fast, high-quality, real-time HDR decoding, tone mapping, and encoding which delivers the HDR advantages. With oneVPL, the application developer can develop quality, performant video applications that can leverage Intel hardware accelerators.

Legal Information

No license (express or implied, by estoppel or otherwise) to any intellectual property rights is granted by this document.

Intel disclaims all express and implied warranties, including without limitation, the implied warranties of merchantability, fitness for a particular purpose, and non-infringement, as well as any warranty arising from course of performance, course of dealing, or usage in trade.

This document contains information on products, services and/or processes in development.  All information provided here is subject to change without notice. Contact your Intel representative to obtain the latest forecast, schedule, specifications, and roadmaps.

The products described in this document may contain design defects or errors known as errata which may cause the product to deviate from published specifications. Current characterized errata are available on request.

Intel, the Intel logo, Intel® are trademarks of Intel Corporation in the U.S. and/or other countries.

*Other names and brands may be claimed as the property of others.

© Intel Corporation.

About the Author
Software engineer for 20+ years. Excels in all things software, plus connecting people and teams for optimal synergy.
11 Comments
ChenYufei
Beginner

In the example of using system memory for LUT table, using three 1D array should be 3x1D LUT table instead of 3D LUT table?

 

oneVPL - Video Processing 3DLUT also refers to this as 3D LUT, but I can't figure out how to assign lut3d[65][65][65] data to 3 mfxChannel.

Furong_Zhang_Intel

Hi @ChenYufei ,

 

Thank you for your attention and bringing up this question. 

 

We have sample code as this link oneVPL/tools/legacy/sample_vpp/src/sample_vpp_config.cpp at master · oneapi-src/oneVPL (github.com)

If the LUT is lut3d[65][65][65], the system memory layout is as shown below.

// 65 size 3DLUT(3 dimension look up table)
// The buffer size(in bytes) for every channel is 65*65*65*sizeof(DataType)
mfxU16 dataR[65*65*65], dataG[65*65*65], dataB[65*65*65];
mfxChannel channelR, channelG, channelB;
channelR.DataType = MFX_DATA_TYPE_U16;
channelR.Size = 65;
channelR.Data16 = dataR;
channelG.DataType = MFX_DATA_TYPE_U16;
channelG.Size = 65;
channelG.Data16 = dataG;
channelB.DataType = MFX_DATA_TYPE_U16;
channelB.Size = 65;
channelB.Data16 = dataB;

Video Processing Procedures — oneVPL documentation (oneapi.io) v2.9.0, has a typo, I have already fixed it. But the fix needs some time to be upstreamed.

 

Thanks, 

Furong

ChenYufei
Beginner

Thanks for your helpful reply @Furong_Zhang_Intel .

 

The sample code in oneVPL is very helpful. I actually have found them when trying to add support for applying 3D LUT in FFmpeg with vpp_qsv filter.

 

There are still things that are not clear to me. In the sample that uses system memory and this vaapi allocator, the LUT is loaded directly from a binary file. I'm using a 3D LUT cube file, which uses float values in the range of [0, 1], while VPP uses mfxU16 for LUT values. So there must be some conversion.

By experiment, I chose to multiply the floating-point LUT value by 65535. This is reasonable and the transcoding result seems correct, but I'm still not 100% sure this is the right scaling value to use.

Furong_Zhang_Intel

Hi @ChenYufei ,

 

Thank you so much for your attention to 3DLUT filter in oneVPL. 

 

I went through your code that multiplies the floating-point LUT value by 65535, and it looks good to me (just a kind reminder: please watch for overflow). The 3DLUT in our HW has 16-bit precision, so the value should be in the range of [0, 0xFFFF].

 

For your information, I once added 3DLUT in our Sample Multiple Transcode for Linux Transcoding Link oneVPL/tools/legacy/sample_multi_transcode at master · oneapi-src/oneVPL (github.com)

Best Regards,

Furong

ChenYufei
Beginner

Thanks for your information @Furong_Zhang_Intel 

 

I noticed sample_multi_transcode before. Actually, vaapi_allocator.cpp in oneVPL helped me a lot. I've posted my changes to the ffmpeg-devel mailing list and am still waiting for review. There's already tone mapping in vf_qsv_vpp in FFmpeg; it would be great to also support 3D LUT with oneVPL.

 

The speed of applying 3D LUT with oneVPL and Quick Sync Video is really amazing. There's almost no speed decrease compared to transcoding without any filters.

 

I'm an Intel Arc graphics card user and I shoot log video with my camera, which is why I'm interested in applying a 3D LUT filter. I started using an Arc A380 for video editing with Davinci Resolve, but it couldn't smoothly play back 4K 60fps footage (a little surprising to me, because I am using an Intel 12700 CPU and the iGPU should be used for HEVC decoding in Davinci), so I upgraded to an Arc A770. After those frequent Intel driver updates along with Davinci Resolve's updates, I'm quite happy with the Arc A770 for my usage now.

Furong_Zhang_Intel

@ChenYufei , thank you so much for the above information. I am very glad to know that. 

 

Best regards,

Furong

ChenYufei
Beginner

@Furong_Zhang_Intel  I tried to use system memory to hold 3D LUT, it seems that the sample code you provided is not compatible with current implementation of oneVPL when using VAAPI.

 

Here's the current implementation of copying system memory 3D LUT to video memory, it just memcpy the three channel data to mapped video memory:

 

    memcpy((char*)surface_p, pParams->lut3DInfo.Channel[0].Data, pParams->lut3DInfo.Channel[0].Size);
    memcpy((char*)surface_p + pParams->lut3DInfo.Channel[0].Size, pParams->lut3DInfo.Channel[1].Data, pParams->lut3DInfo.Channel[1].Size);
    memcpy((char*)surface_p + pParams->lut3DInfo.Channel[0].Size + pParams->lut3DInfo.Channel[1].Size, pParams->lut3DInfo.Channel[2].Data, pParams->lut3DInfo.Channel[2].Size);

 

As I previously implemented 3D LUT processing using video memory, this clearly is not going to work if we consider the 3 channels are storing values separately for RGB channel. Here the code to copy 3D LUT to mapped video memory:

 

    sf_idx = (r * lut_size * mul_size + g * mul_size + b) * 4;
    surface_u16[sf_idx + 0] = (mfxU16)(s->r * UINT16_MAX);
    surface_u16[sf_idx + 1] = (mfxU16)(s->g * UINT16_MAX);
    surface_u16[sf_idx + 2] = (mfxU16)(s->b * UINT16_MAX);
    // surface_u16[sf_idx + 3] is the reserved channel.

 

RGBA channel values are packed together when copying to video memory.

 

Given the current implementation of oneVPL, in order to create 3D LUT using system memory, we have to use the same memory layout as using video memory. Here's my current implementation which is working correctly on my testing video.

 

My patch to FFmpeg for using VPP for 3D LUT was reviewed by Xiang Haihao (Intel employee). It was suggested that I use system memory to create the 3D LUT. While my implementation now works, I'm not sure if this is the correct way to use oneVPL now.

Would you suggest I open an issue on the oneVPL-intel-gpu project for this? Or is it better to open an issue regarding the sample code in the oneVPL project?

raoof123
Beginner

Subject: Appreciation for the Guide on Hardware Accelerated HDR Video Processing with oneVPL

Hello [Pamela_H_Intel],

I wanted to express my gratitude for sharing the insightful guide on "3 Steps to Hardware Accelerated HDR Video Processing and Transcoding with oneVPL." Your contribution is valuable for those seeking efficient ways to enhance video processing, especially in the realm of HDR content.

The step-by-step breakdown you provided is clear and concise, making it accessible for both beginners and those with more advanced technical knowledge. It's evident that you've put thought into simplifying what can be a complex process.

If you have any additional tips, best practices, or insights related to HDR video processing and transcoding, I'd love to hear more. Furthermore, if there are specific challenges or questions you'd like assistance with, please feel free to share, and I'm here to help.

Once again, thank you for your valuable contribution to the community.

Furong_Zhang_Intel

@raoof123 ,  I am the technical author of this article. Please let me know if you have any technical questions. 

Furong_Zhang_Intel

@ChenYufei , I will check your questions and update to you by this Tuesday.

Furong_Zhang_Intel

@ChenYufei , we resolved the issues you mentioned. The CPU Linux Implementation patch will be in intel-onevpl-24.1.1.