Intel Media SDK's denoiser

Ranjit_T_1
Beginner
801 Views

All,

   We are trying to use Intel Media SDK's denoiser to denoise some YUV files, and we have some questions about it.

1. From the denoiser output, it appears that Intel Media SDK's denoiser does not denoise chroma data: the chroma data before and after denoising match exactly. Is it the case that the denoiser does not process chroma at all? If so, is it okay to create monochrome surfaces for each chroma component and then call the denoiser on each surface?

2. What speed can the software and hardware denoisers reach (say, on an i7-4770 machine)? Does the speed change depending on content properties?

3. We see that the hardware and software denoisers don't give identical results. Is this expected?

4. We see that the VPP denoising call produces output starting from the first frame. Is it that the first frame uses spatial denoising only, and spatial + temporal denoising is used from the second frame onward?
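
For reference, the chroma observation in question 1 can be checked by counting differing bytes per plane between an input frame and its denoised output. The following is a minimal sketch, not part of the Media SDK: the names `Nv12Diff` and `diffNv12` are hypothetical, and it assumes tightly packed 8-bit NV12 frames (no row padding) with even width and height.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Byte-difference counts for the two planes of one NV12 frame.
struct Nv12Diff {
    std::size_t luma = 0;    // differing bytes in the Y plane
    std::size_t chroma = 0;  // differing bytes in the interleaved UV plane
};

// Compare two tightly packed NV12 frames (assumption: no row padding).
// Layout: width*height Y bytes, followed by width*height/2 interleaved UV bytes.
Nv12Diff diffNv12(const std::vector<std::uint8_t>& a,
                  const std::vector<std::uint8_t>& b,
                  std::size_t width, std::size_t height) {
    const std::size_t lumaSize = width * height;
    const std::size_t frameSize = lumaSize + lumaSize / 2;
    assert(a.size() == frameSize && b.size() == frameSize);

    Nv12Diff d;
    for (std::size_t i = 0; i < frameSize; ++i) {
        if (a[i] != b[i]) {
            if (i < lumaSize) {
                ++d.luma;
            } else {
                ++d.chroma;
            }
        }
    }
    return d;
}
```

A result with `luma > 0` and `chroma == 0` for every frame would match the behavior described in question 1.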

Regards,

Ranjit

12 Replies
Surbhi_M_Intel
Employee

Hi Ranjit,

Thanks for asking the question.

>> What is the speed at which software and hardware denoiser can be run (say on i7 4770 machine)? Does the speed change depending on content properties?
We don't have fps measurements for this, but you will definitely see better speed with HW. Yes, the speed changes depending on the complexity of the content.

>> We see that hardware and software denoisers don't give identical results. Is it expected?
Yes, it is possible that you won't get identical results.

>> We see that VPP call for denoising gives output from the first frame? Is it that first frame uses spatial denoising and second frame on spatial + temporal denoising is used?
Yes, we use spatial as well as temporal denoising. I am not sure I understand what you mean by "We see that VPP call for denoising gives output from the first frame?" Can you please elaborate?

I will get back to you regarding your 1st question. 
 

Thanks,
-Surbhi

Ranjit_T_1
Beginner

Hi Surbhi,

     We shall wait for a reply from you regarding query 1.

     Here are the details on Query 4. Consider the following call sequence (let in0, in1, ... be the input frames and out0, out1, ... be the output frames).

1. VPPDenoise (in0,out0) gives an output of out0, and we see that in0 is released as well (confirmed through a GetFreeSurfaceIndex call). So here there is no information from past or future frames. Is it that only spatial denoising is performed for this frame?

2. The VPPDenoise (in1,out1) call would need in0 as well if it uses temporal information. Is it that a copy of in0 is maintained by the denoiser? If yes, we assume that the denoising for this frame could be based on temporal denoising. If no, we assume that the denoising for this frame is spatial only.

This query is mainly to check if VPP denoiser does any buffering of the input frames passed.
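
One way to probe this from the outside is a black-box test: process in1 once after in0 and once on a fresh instance, then compare the two outputs. The sketch below is only an illustration under assumptions. `Frame`, `Filter`, `FilterFactory`, and `dependsOnHistory` are hypothetical names, and with Media SDK the factory would have to create and initialize a new VPP session each time, which is not shown here.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <vector>

using Frame = std::vector<std::uint8_t>;
// A filter instance: takes an input frame and returns the output frame.
// It may or may not keep internal state between calls.
using Filter = std::function<Frame(const Frame&)>;
// Creates a fresh filter instance with no history.
using FilterFactory = std::function<Filter()>;

// Returns true if the output for in1 changes depending on whether in0 was
// processed first, i.e. the filter keeps information from earlier frames.
bool dependsOnHistory(const FilterFactory& make, const Frame& in0, const Frame& in1) {
    assert(in0.size() == in1.size());

    Filter withHistory = make();
    withHistory(in0);                            // prime with the earlier frame
    const Frame outAfterIn0 = withHistory(in1);

    Filter fresh = make();
    const Frame outAlone = fresh(in1);           // same frame, no history

    return outAfterIn0 != outAlone;
}
```

A `false` result is only suggestive: it should be tried with clearly different in0 and in1, since a temporal filter can produce the same output when consecutive frames are nearly identical.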

Regards,

Ranjit

 

Surbhi_M_Intel
Employee

Hi Ranjit, 

Regarding Query1 : 
No, currently we don't do denoising on the chroma data. We take the input as NV12, due to which we can't create monochrome surfaces for each component and call the denoiser on them.

Query4: 
Just checked with experts: currently we implement a spatial denoiser only, so there is no need to keep any buffers for previous frames. Please let us know if you need this feature to meet your requirements with Media SDK, and I can pass this information on to our development team.

Thanks,
-Surbhi

 

Ranjit_T_1
Beginner

Hi Surbhi,

    Thanks for your reply.

Regarding Query 1, we feel that the application can convert from the semi-planar format (UV interleaved) to a planar format (separate planes for U and V). If the application does this conversion, we can denoise each of the planes by configuring the VPP module through 3 sessions, one each for Y, U and V. Please confirm.

Regarding Query 4, this was just a query from our side as we feel that temporal denoising in addition to spatial denoising can improve the quality of denoised video.

We have 2 more queries.

Query 5: We tried checking the speed of the denoiser on a BRIX machine and found that the software version of the denoiser runs at 20 to 30 fps with 16% CPU load (depending on the content). We tried increasing the numThreads parameter to get more fps, but we still could not exceed the figures above. Is there any way to get better fps?

Query 6: We also see that denoising and deinterlacing can be configured through one VPPprocess call. However, is there any way to control the order of processing (i.e., denoising first or deinterlacing first, followed by the other)?

Regards,

Ranjit

Surbhi_M_Intel
Employee

Hi Ranjit, 

>> Regarding Query 1, we feel that the application can do conversion from Semi-planar format (UV interleaved) to planar format (separate planes for U and V). If this conversion is done by application, we can do denoising on each of the planes (if application configures VPP module to do denoising of each plane through 3 sessions for Y,U and V). Please confirm.
As far as I understand, the denoising happens on luma only, so the algorithm might not be suitable for chroma data. Also, VPP processes the complete YUV frame in packed format. I think there is no conversion involved; it is a matter of how you want to read the data, as planar or as packed.
 

>> Regarding Query 4, this was just a query from our side as we feel that temporal denoising in addition to spatial denoising can improve the quality of denoised video.
Thanks for clarifying that.

>> Query 5 : We tried checking the speed of the denoiser on a BRIX machine, we found that software version of denoiser works at 20 to 30 fps with 16% cpu load (depending on the content). We tried increasing the numThreads parameters to achieve more fps. But still we could not achieve fps more than above. Is there any way in which we can get better fps?
I haven't tried this; I will try it and update you with what I see on my BRIX machine. But you might want to try the AsyncDepth parameter to parallelize.
Also, from the Media SDK manual: "NumThread Deprecated; Used to represent the number of threads the underlying implementation can use on the host processor. Always set this parameter to zero."

>> Query 6:We also see that denoising and deinterlacing can be configured through one VPPprocess call. However is there any way in which we can control the order of processing (ie, denoising first or deinterlacing first, followed by the other)
After talking to experts about it and doing some analysis myself: the deinterlacer needs to run first and then the denoiser, since the denoiser only works on progressive data. In general, there is no way to control the order within one VPPprocess call; to control it, you would need to make more than one VPPprocess call.

Please let us know if you find any difference in behavior.

Thanks,
-Surbhi

 

Ranjit_T_1
Beginner

Hi Surbhi,

Please find my observations below:

Regarding Query 1, here is our planned approach. We see that the preprocessing module can take monochrome input and produce monochrome output. Based on this, we plan to do the following by creating 3 sessions.

1. For luma, make a VPPProcess call to produce denoised luma data.

2. For chroma, the application deinterleaves the U and V data into two separate planes and then makes two separate VPPProcess calls, one for the U plane and one for the V plane.

We accept that the denoiser might not be tuned for chroma data. Other than this, do you see any other issues with the above design?
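
The application-side deinterleaving in step 2 could look roughly like the sketch below. It covers only the plane split and merge, not the VPP sessions or the monochrome surface setup. `ChromaPlanes`, `deinterleaveUV`, and `interleaveUV` are hypothetical helper names, and the chroma buffer is assumed to be tightly packed NV12 chroma (U0 V0 U1 V1 ...) with no row padding.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Separate U and V planes extracted from an NV12 chroma plane.
struct ChromaPlanes {
    std::vector<std::uint8_t> u;
    std::vector<std::uint8_t> v;
};

// Split an interleaved NV12 chroma plane (U0 V0 U1 V1 ...) into U and V planes.
ChromaPlanes deinterleaveUV(const std::vector<std::uint8_t>& uv) {
    assert(uv.size() % 2 == 0);
    ChromaPlanes p;
    p.u.reserve(uv.size() / 2);
    p.v.reserve(uv.size() / 2);
    for (std::size_t i = 0; i < uv.size(); i += 2) {
        p.u.push_back(uv[i]);
        p.v.push_back(uv[i + 1]);
    }
    return p;
}

// Merge separate U and V planes back into an interleaved NV12 chroma plane.
std::vector<std::uint8_t> interleaveUV(const ChromaPlanes& p) {
    assert(p.u.size() == p.v.size());
    std::vector<std::uint8_t> uv;
    uv.reserve(p.u.size() * 2);
    for (std::size_t i = 0; i < p.u.size(); ++i) {
        uv.push_back(p.u[i]);
        uv.push_back(p.v[i]);
    }
    return uv;
}
```

After each plane has been processed separately, `interleaveUV` restores the NV12 chroma layout before the frame is written out. Note that each chroma plane has half the luma width and half the luma height, so the surfaces used for U and V would need those dimensions.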

Regarding Query 5, we tried increasing the asyncdepth (1/4/16), but we still could not achieve better fps. Can you let me know if I am missing something?

Regarding Query 6, we accept that deinterlacing should be done first, followed by denoising. However, we observed that the VPP module does it the other way (denoising followed by deinterlacing). To confirm this, we ran the following experiment:

Take a yuv -> Perform denoising -> Perform deinterlacing ->out1.yuv

Take above yuv -> Perform deinterlacing -> Perform denoising ->out2.yuv

Take above yuv -> Perform deinterlacing + denoising in single VPP process call ->out3.yuv

Per the information you provided, out3.yuv and out2.yuv should match. Instead, we observed that out1.yuv and out3.yuv match. Can you confirm my observations? We shall verify the same once more at our end.
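
A comparison like the one in this experiment can be made frame by frame, so that a mismatch reports where the outputs first diverge. The following is a minimal sketch with a hypothetical helper name, `firstMismatchingFrame`. It assumes raw files with a fixed frame size in bytes (for 8-bit 4:2:0 content, width * height * 3 / 2).

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <fstream>
#include <stdexcept>
#include <string>
#include <vector>

// Returns -1 if the two files are byte-identical, otherwise the zero-based
// index of the first frame that differs. A length mismatch is reported as a
// difference in the frame where the shorter file ends.
long firstMismatchingFrame(const std::string& pathA, const std::string& pathB,
                           std::size_t frameSize) {
    assert(frameSize > 0);
    std::ifstream a(pathA, std::ios::binary);
    std::ifstream b(pathB, std::ios::binary);
    if (!a || !b) {
        throw std::runtime_error("cannot open input file");
    }

    std::vector<char> bufA(frameSize), bufB(frameSize);
    for (long frame = 0;; ++frame) {
        a.read(bufA.data(), static_cast<std::streamsize>(frameSize));
        b.read(bufB.data(), static_cast<std::streamsize>(frameSize));
        const std::streamsize gotA = a.gcount();
        const std::streamsize gotB = b.gcount();
        if (gotA != gotB) return frame;   // one file is shorter
        if (gotA == 0) return -1;         // both files ended together
        if (!std::equal(bufA.begin(), bufA.begin() + gotA, bufB.begin())) {
            return frame;
        }
    }
}
```

For example, `firstMismatchingFrame("out1.yuv", "out3.yuv", frameSize)` returning -1 while the same call with out2.yuv returns a frame index would reproduce the observation above.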

Regards,

Ranjit

 

Ranjit_T_1
Beginner

Hi Surbhi,

    Any updates on above?

Regards,

Ranjit

Sravanthi_K_Intel

Hello there-

Your planned approach looks alright. You should be able to VPP-process individual planes, as long as you handle them appropriately in the output stage (when writing to file).

Regarding your experiment, let me try to understand what you are doing: are you feeding an interleaved YUV input in the above 3 experiments? Meaning, are you applying VPP filters on interleaved YUV frames?

It would be helpful if you could send in your sample code and the inputs you are using for your experiment. We can try to reproduce it easily and without any gaps. Could you please do that?

Regarding the speed of the VPP filter (denoiser on i7-4770), you mention 16% CPU load at 20-30 fps. Does this include or exclude the file I/O operations (specifically, reading the raw YUV data)? Can you use (and modify) our tutorial simple_4_vpp_resize_denoise (https://software.intel.com/en-us/intel-media-server-studio-support/training) to reproduce the behavior you are seeing, and send us the code? That will be very helpful.
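
To separate processing time from file I/O, the measurement can accumulate time only around the processing step. The sketch below uses a hypothetical `StageTimer` class and standard `std::chrono` only. It is not taken from the tutorial code, and the call being timed is left to the application.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <functional>

// Accumulates wall-clock time spent only inside the timed sections, so that
// file reads and writes done outside time() do not affect the fps figure.
class StageTimer {
public:
    // Runs `work` once and adds its duration to the running total.
    void time(const std::function<void()>& work) {
        assert(work != nullptr);
        const auto start = std::chrono::steady_clock::now();
        work();
        total_ += std::chrono::steady_clock::now() - start;
        ++frames_;
    }

    double seconds() const {
        return std::chrono::duration<double>(total_).count();
    }

    std::size_t frames() const { return frames_; }

    // Frames per second over the timed sections only; 0 if no time was measured.
    double fps() const {
        const double s = seconds();
        return s > 0.0 ? static_cast<double>(frames_) / s : 0.0;
    }

private:
    std::chrono::steady_clock::duration total_{};
    std::size_t frames_ = 0;
};
```

In a processing loop, the frame would be read from disk first, then the submit-and-sync step wrapped in `timer.time([&] { ... })`, and the output written afterwards. With asynchronous pipelines the per-frame figure is less meaningful, and timing the whole loop with I/O replaced by preloaded buffers may be a better comparison.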

Ranjit_T_1
Beginner

Hi Sravanthi,

       Performing denoising on each of the planes is our planned approach and we wanted to verify the same once with you. Thanks for confirming the correctness of approach. 

       We performed all the speed-related experiments on Windows using the simple_4_vpp_resize_procamp example. When we profiled on an i5-2400 CPU (32-bit build), we got roughly 70 fps, whereas on an i7-4770R (BRIX) we got 20 fps (64-bit build). Is this expected? We feel that we should have gotten better fps on the i7-4770R.

Regards,

Ranjit

Sravanthi_K_Intel

Hello Ranjit - Thanks for sharing your results. Can you give some more details on the two system configurations above (graphics driver details, OS, SDK version, whether you are running a 32-bit build of the app on a 64-bit machine, etc.)? The 50 fps difference you are observing is not what we expect.

Ranjit_T_1
Beginner

Hi Sravanthi,

We found that the experiment we did with the asyncdepth parameter (for better speed in denoising mode) was not correct. We shall redo the experiment correctly and get back to you if we have further queries.

Regards,

Ranjit

Sravanthi_K_Intel

Thanks for the update Ranjit. Let us know if you find any unexpected behavior.
