Subregion encoding

Rūdolfs_B_ · ‎09-17-2014

Hi,

This is something that I could just test but I wanted to know if anyone from the technical support side can confirm/deny this to avoid spending time.

Is subregion encoding supported in Intel Quick Sync? To be exact, let's say I have a video source that gives me the coordinates of the regions that are updated. For example, the first frame of a 640x480 stream updates the whole frame and reports that the region 0,0,640,480 has changed. The next frame could be an update of just, say, 10,10,20,20. Can I pass only the small updated region to the VPP and the H.264 encoder by setting the crop values, to avoid processing unnecessary pixels? (The frame width and height would stay the same as when the VPP/H.264 encoder was initialized; this is not an actual resolution change.) Or must I always provide a full frame?

Sravanthi_K_Intel · ‎09-17-2014

Hi Rudolf,

Thanks for your question. The question you ask is interesting and of interest to me as well - will get back to you soon on this.

Sravanthi_K_Intel · ‎09-17-2014

Hi there,

To specify a region of interest, you could use the mfxExtEncoderROI structure for encoding. Using this structure, you can specify an ROI size for your encoding operation. This structure is one of the variants of ExtendedBuffers and can be passed to the encoder by attaching it to mfxVideoParam->ExtParam. You can find more details on how to use these buffers in this article.

The above mentioned method will set the ROI for the stream and not for each frame as you request. And currently, MSDK does not support per-frame ROI specification for encoding. It is an interesting ask, and you could use the combination of extBuffer, resize, crop functionalities to get the desired effect.

"pass the small updated region to the VPP and H.264 encoder to avoid processing unnecessary pixels by setting the crop values " --> Yes, using mfxFrameInfo structure, you can specify any Crop value for the frame being processed. We are hoping to publish some articles on how to use crop/resize functions with VPP. When they are available, you will find the link here - https://software.intel.com/en-us/articles/technical-articles-on-trending-topics-of-intel-media-sdk

I hope this answers your questions. If not, please feel free to post a follow-up question.

Rūdolfs_B_ · ‎09-17-2014

Hi,

Thanks for the response. The ROI structure does not help me much if it applies only per stream and not per frame. Since the updates are random, the source may just as well update the whole frame at once, so I can't set an ROI smaller than the full width/height for the stream. I'll play with the crop values to see if they help. I asked the question in the first place because a quick look at the documentation of Media SDK 2014 R2 for Clients (page 155, table "Constraints for ENCODE") stated that the crop values are ignored during SDK operation, which made me think that it is impossible.

To give a more exact picture: I am processing multiple Full HD streams with a VPP->ENCODE pipeline, and it seems that I am hitting the performance limits. Since I already have the update-region information and need no extra processing power to calculate it, I started to think that performing color conversion and H.264 encoding on smaller fragments rather than the full frame would improve performance: VPP and ENCODE would operate on fewer pixels and less memory would be transferred. But if ENCODE does not support this, then I guess I'm out of luck.
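To put a rough number on the potential saving, here is a minimal Python sketch that estimates how many pixels an updated region covers once it is expanded to H.264's 16x16 macroblock grid. It assumes the region is given as x, y, width, height (the thread does not say which convention the source uses), and the function names are made up for illustration. It only counts pixels; it says nothing about whether the encoder can accept such a region.

```python
MB = 16  # H.264 macroblock size in pixels


def align_to_macroblocks(x, y, w, h, frame_w, frame_h, mb=MB):
    """Expand the rect (x, y, w, h) outward to macroblock boundaries.

    The result is clamped to the frame, so the right/bottom edge may be
    shorter than a full macroblock (e.g. 1080 is not a multiple of 16).
    """
    x0 = (x // mb) * mb                      # round left edge down
    y0 = (y // mb) * mb                      # round top edge down
    x1 = min(((x + w + mb - 1) // mb) * mb, frame_w)  # round right edge up
    y1 = min(((y + h + mb - 1) // mb) * mb, frame_h)  # round bottom edge up
    return x0, y0, x1 - x0, y1 - y0


def pixel_fraction(w, h, frame_w, frame_h):
    """Fraction of the frame's pixels covered by a w x h region."""
    return (w * h) / (frame_w * frame_h)


if __name__ == "__main__":
    # Assumed reading of the "10,10,20,20" update in a 640x480 frame.
    ax, ay, aw, ah = align_to_macroblocks(10, 10, 20, 20, 640, 480)
    print((ax, ay, aw, ah), pixel_fraction(aw, ah, 640, 480))
```

Under that assumed reading, the 10,10,20,20 update in a 640x480 frame aligns to a 32x32 region at 0,0, which is about 0.3% of the frame's pixels.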

Sravanthi_K_Intel · ‎09-17-2014

Yes, the ROI solution works per stream and not per frame, so given your constraints it is not the solution. But now that you mention you are considering encoding in this manner because you are hitting performance limits, can you let us know which performance thresholds are not being met? Here is something that may be useful to you: in the VPP+Encode pipeline you mention, your input is a raw stream. Doing file I/O to read the raw stream and copying the huge file over to the video surfaces (setting up the media pipeline) can be very costly performance-wise. (If you had a Decode+VPP+Encode pipeline instead, the cost of reading encoded frames and decoding them would be much lower, since the encoded stream is much smaller and the decode process is very fast.) Were you able to characterize where your performance bottleneck is? If not, it may be worthwhile to take a look at this. If the encode performance still looks like the bottleneck, feel free to share your code and input stream, and we can take a look. We highly recommend that you use/modify the existing tutorials/samples to reproduce the behavior.

Rūdolfs_B_ · ‎09-17-2014

Hi,

Thanks for the response. I'll try to clarify the picture. This is a university research project on building large-scale display walls - we have a demo video here https://www.youtube.com/watch?v=eFAATNofjHA&list=UUczMkdyYCPNj0GWc3rPcwaA. As for the thresholds, I am not able to encode 16 Full HD streams at 25 fps in 16 parallel sessions in real time without a noticeable lag (assuming the video source actually updates this fast, e.g. plays a video). Maybe bottleneck isn't even the right description; I am simply interested in how to max out the total fps at a given constant resolution while keeping latency low, and subregion encoding was the first thing that came to my mind. According to the benchmarks I've seen on the web, the peak encoding rate for Full HD seems to be about 220 fps (and I assume that splitting the work into parallel sessions also introduces overhead), whereas I would need to encode 16 * 25 fps. Correct me if I'm wrong and the encoder is actually capable of doing this. I understand that all hardware has its limits, and I'm still amazed by the performance I get: if I run, let's say, 8 streams, it works very well. This is why I've started looking into Nvidia Maxwell cards: the performance there seems to be quite close, but at least these can be stacked (e.g. 3/4 cards) to share the workload.
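The arithmetic behind that estimate can be sketched in a few lines of Python. This is only a back-of-the-envelope budget: it assumes the aggregate encoder throughput divides evenly across sessions with no per-session overhead, which is optimistic, and the function names are made up for illustration.

```python
def required_fps(streams, fps_per_stream):
    """Total frames per second the encoder must sustain."""
    return streams * fps_per_stream


def max_streams(encoder_fps, fps_per_stream):
    """Whole streams that fit into a given aggregate encoder throughput.

    Assumes no per-session overhead, so this is an upper bound.
    """
    return int(encoder_fps // fps_per_stream)


if __name__ == "__main__":
    print(required_fps(16, 25))   # frames per second needed for the wall
    print(max_streams(220, 25))   # streams that fit at the quoted peak rate
```

With the figures above, 16 streams at 25 fps need 400 fps in total, while a 220 fps peak rate leaves room for at most 8 such streams, which is consistent with 8 streams running well.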

As for my input, I get my input RGB data directly from RAM so I have no disk I/O overhead or whatsoever.

Sravanthi_K_Intel · ‎09-18-2014

Hello,

For "I am interested how can I max out the total fps at a given constant resolution while keeping a low latency. " you can configure the AsyncDepth parameter and GOP parameters to achieve low latency encoding. There is a tutorial application simple_3_encode_vmem_lowlatency that uses these parameters to achieve low latency encoding. You can find the tutorials here: https://software.intel.com/en-us/media-solutions-portal, and for some details, you can see here: https://software.intel.com/en-us/articles/media-sdk-tutorial-tutorial-samples-index. We also have tutorials for low latency transcoding, as shown in the previous link provided.

Regarding the ~220fps for encode - running my simple_3_encode_vmem_lowlatency on a 1920x1080p input stream gave me 360fps consistently. These numbers are from a HSW i7 machine (3.20GHz) with GT3e graphics. It is possible that the system you are running on has its limitations as well, like you pointed out.

Anyway, I hope the tutorials give you an idea about low latency settings. If you have more questions, let me know. Initially I was focusing on subregion encoding, but now I have a better understanding of your objective.

Rūdolfs_B_ · ‎09-18-2014

Hi,

Before starting the project I already went through the samples, and I'm using the same setup in terms of AsyncDepth, GOP and bitrate as in the tutorials. That is why I was asking about the subregions, since that seemed like the next logical step. My setup is not as powerful as the one you mention, so I'll see if I can get my hands on a faster machine to see how far I can get.

But I still did not fully understand: since the SDK documentation states that ENCODE ignores the crop values during operation, am I correct that I don't even need to test whether setting cropX and cropY per frame would help?

Sravanthi_K_Intel · ‎09-22-2014

Yes, the encoder will not use per-frame crop values. For the encoder, you can set the crop values at stream granularity, but not at frame level.