Solved: Hi there,

Kz_Liao · ‎10-22-2014

Hi,

I'm building a video surveillance system with multiple concurrent 1080p inputs. The input video is all in YUY2 format, so it can be encoded directly. I found that as I increase the number of inputs, the CPU load becomes high and a significant delay can be seen when watching the real-time video decoded on a remote PC. (Low-latency mode is already enabled on both the encoder and the decoder side.)

So I want to find a way to calculate the maximum number of frames a system can encode per second while still functioning well. My questions are:

1) If I find that the latency of encoding one frame is no more than 10 ms, can I say the system could encode a similar video stream at 100 fps? Assume the video streams are of the same resolution, and that motion or any other property of the captured images does not affect performance.

2) If it can encode one video stream at 100 fps, can I say it can encode three similar video streams at 30 fps? Should I take the session switch overhead into consideration?

3) If the system is overloaded, will that lead to the significant delay?

Thanks in advance.

Surbhi_M_Intel · ‎10-22-2014

Hi there,

Thank you for the question. With increase in the i/p, it's expected to have a higher load on CPU because of file copy.

A few questions to understand your problem better:
- Are you taking i/p from a PCI card, or do you have an attached camera?
- The i/p from the camera is YUY2, so you must be using VPP or some other color conversion technique to convert it to NV12, since Media SDK encode supports only NV12 i/p (considering the H.264 codec). How are you handling this?
- Can you please give us more details of your pipeline and how you use Media SDK features in it, so we can understand which areas could be bottlenecks in your case?
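
To make the YUY2 versus NV12 difference concrete, here is a minimal pure-Python sketch of the conversion. It is not how Media SDK VPP does it. The function name `yuy2_to_nv12` is hypothetical, and the sketch assumes even dimensions, no row padding, and simple averaging of the chroma of each pair of rows.

```python
def yuy2_to_nv12(yuy2: bytes, width: int, height: int) -> bytes:
    """Convert one packed YUY2 (4:2:2) frame to NV12 (4:2:0).

    Assumptions: even width and height, rows tightly packed (no padding),
    vertical chroma downsampling by averaging each pair of rows.
    """
    if width % 2 or height % 2:
        raise ValueError("width and height must be even")
    stride = width * 2  # YUY2 stores 2 bytes per pixel: Y0 U Y1 V
    if len(yuy2) != stride * height:
        raise ValueError("unexpected YUY2 buffer size")

    # Luma samples sit at every even byte offset of the packed stream.
    y_plane = yuy2[0::2]

    # Chroma samples sit at odd offsets, already ordered U, V, U, V ...
    # which is the interleaving NV12 expects; only the row count is halved.
    uv_plane = bytearray()
    for row in range(0, height, 2):
        top = yuy2[row * stride:(row + 1) * stride][1::2]
        bottom = yuy2[(row + 1) * stride:(row + 2) * stride][1::2]
        uv_plane.extend((a + b + 1) // 2 for a, b in zip(top, bottom))

    return bytes(y_plane) + bytes(uv_plane)
```

The output is 1.5 bytes per pixel instead of 2, and every input byte has to be read once, which is part of why this step is worth offloading to VPP rather than doing on the CPU.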

>>> If I find that the latency of encoding one frame is no more than 10 ms, can I say the system could encode a similar video stream at 100 fps? Assume the video streams are of the same resolution, and that motion or any other property of the captured images does not affect performance.
Ans- No, it's not a direct relationship like that. Several factors come into play, such as the video being captured, the pipeline, etc.
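
One way to see why latency does not translate directly into frame rate is a toy pipeline model. The sketch below is only an illustration: the stage names and millisecond figures are made up, and a real pipeline has contention that this model ignores. If stages run strictly one after another, throughput is bounded by the sum of the stage times. If stages overlap on different frames, it is bounded by the slowest stage instead.

```python
def serial_fps(stage_ms):
    """Frames per second if each frame must finish all stages before the next starts."""
    return 1000.0 / sum(stage_ms)


def pipelined_fps(stage_ms):
    """Idealised frames per second if stages overlap; the slowest stage is the limit."""
    return 1000.0 / max(stage_ms)


# Hypothetical per-frame stage times in milliseconds: copy, VPP, encode.
stages = [2.0, 3.0, 5.0]

print("latency per frame (ms):", sum(stages))
print("serial bound (fps):", serial_fps(stages))
print("pipelined bound (fps):", pipelined_fps(stages))
```

Both cases have the same 10 ms latency per frame, yet the two bounds differ by a factor of two, so a latency measurement alone cannot fix the sustainable frame rate.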

>>> If it can encode one video stream at 100 fps, can I say it can encode three similar video streams at 30 fps? Should I take the session switch overhead into consideration?
Yes, you should take the session switch overhead into consideration. Again, it's not a direct relationship. But just to give you an idea: encoding multiple videos might give better performance than expected. The reason is that the system is fully utilized at that time, all the components are busy, and the system would be running in turbo mode.

>>> If the system is overloaded, will that lead to the significant delay?
Yes, it would lead to significant delay. You can check the capability of your system by increasing the number of videos being encoded from 1 to n.
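
The 1-to-n test suggested above could be scripted roughly as follows. This is a sketch under stated assumptions, not a Media SDK tool: `fake_encode_frame` is a hypothetical stand-in that just sleeps, and it would need to be replaced by a call into the real pipeline. The harness also pushes frames as fast as possible instead of pacing them at the camera rate.

```python
import threading
import time


def fake_encode_frame(frame_index: int) -> None:
    """Hypothetical stand-in for a real encode call; sleeps to simulate work."""
    time.sleep(0.002)


def run_streams(num_streams, frames_per_stream, encode=fake_encode_frame):
    """Run num_streams encoders concurrently.

    Returns (aggregate frames per second, worst single-frame time in ms).
    """
    worst_ms = [0.0] * num_streams

    def worker(stream_id):
        for i in range(frames_per_stream):
            start = time.perf_counter()
            encode(i)
            elapsed_ms = (time.perf_counter() - start) * 1000.0
            worst_ms[stream_id] = max(worst_ms[stream_id], elapsed_ms)

    threads = [threading.Thread(target=worker, args=(s,)) for s in range(num_streams)]
    begin = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    wall_seconds = time.perf_counter() - begin
    return num_streams * frames_per_stream / wall_seconds, max(worst_ms)


def find_max_streams(target_fps_per_stream, max_streams,
                     frames_per_stream=30, encode=fake_encode_frame):
    """Largest stream count (tried in order 1..max_streams) that still meets the target."""
    best = 0
    for n in range(1, max_streams + 1):
        total_fps, _ = run_streams(n, frames_per_stream, encode)
        if total_fps / n < target_fps_per_stream:
            break
        best = n
    return best
```

For example, `find_max_streams(30, 8)` would report how many of up to 8 concurrent streams sustain 30 fps each with whatever `encode` callable is supplied. Watching the worst single-frame time from `run_streams` alongside the frame rate helps spot the point where delay starts to grow.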

Thanks,
-Surbhi

Kz_Liao · ‎10-22-2014

Hi Surbhi,

Thanks for the information.

For your questions:

>>> Can you please give us more details of your pipeline and how you use Media SDK features in it, so we can understand which areas could be bottlenecks in your case?

I'm using DirectShow for development and the sample_dshow_plugins for testing. The pipeline looks like:

capture source filter -> Intel H264 encoding filter -> RTP streamer filter

For multiple cameras, each one runs in a separate DirectShow graph in its own process. So when I test 3 cameras, there are 3 processes.

>>> Are you taking i/p from a PCI card, or do you have an attached camera?

I'm taking i/p from a PCI card. We used to take images from a USB camera, but the performance was unstable because the USB bus bandwidth was not enough. The manufacturer of the PCI card provides an advanced driver which allows us to use it the same way as a USB camera, with low overhead. So you don't see a video capture card and a camera in the pipeline above; instead, there's only a capture source filter.

I don't think the PCI card driver is the cause, since when I capture images from three cameras without encoding, the CPU load doesn't increase very much.

>>> The i/p from the camera is YUY2, so you must be using VPP or some other color conversion technique to convert it to NV12, since Media SDK encode supports only NV12 i/p (considering the H.264 codec). How are you handling this?

Indeed, the VPP component in the filter converts the format. Besides, I found that the sample doesn't use D3D video memory but uses system memory during the VPP->encoding procedure. May I try to use D3D video memory?

Another question: when you said "With increase in the i/p, it's expected to have a higher load on CPU because of file copy.", did "file copy" mean data copies within memory, and between system memory and video memory?

Surbhi_M_Intel · ‎10-23-2014

Hi,

Thank you for explaining the problem in detail, appreciate it.
>>>"Indeed, the VPP component in the filter converts the format. Besides, I found that the sample doesn't use D3D video memory but uses system memory during the VPP->encoding procedure. May I try to use D3D video memory?"
If you are using VPP for the color conversion, then it would be great to take input from system memory and output to video memory. This will read from the CPU and write to the GPU (check VppParams.IOPattern), and then encode on the GPU using video memory and the -hw option.
>>>"With increase in the i/p, it's expected to have a higher load on CPU because of file copy." Did "file copy" mean data copies within memory, and between system memory and video memory?
I think I wasn't very clear in my last post. What I mean is that as the number of i/p's increases, the file copy increases. For example, the CPU load for 1 i/p would be lower than the CPU load for 5 i/p's, because there are more copies within memory.
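
A rough sense of how much data those copies involve can be worked out from the frame size. The sketch below is back-of-the-envelope arithmetic only. It assumes 1920x1080 frames at 30 fps and counts one copy of each frame per pass; the real number of copies in a given pipeline is not known here, so `copies_per_frame` is a parameter to adjust.

```python
YUY2_BITS_PER_PIXEL = 16  # packed 4:2:2
NV12_BITS_PER_PIXEL = 12  # planar 4:2:0


def frame_bytes(width, height, bits_per_pixel):
    """Size of one uncompressed frame in bytes."""
    return width * height * bits_per_pixel // 8


def copy_rate_mb_per_s(width, height, bits_per_pixel, fps, streams, copies_per_frame=1):
    """Data moved per second (in MB, 1 MB = 1e6 bytes) by copying every frame."""
    total = frame_bytes(width, height, bits_per_pixel) * fps * streams * copies_per_frame
    return total / 1e6


for streams in (1, 3, 5):
    rate = copy_rate_mb_per_s(1920, 1080, YUY2_BITS_PER_PIXEL, 30, streams)
    print(f"{streams} stream(s): {rate:.1f} MB/s for one copy of each YUY2 frame")
```

Under these assumptions a single 1080p YUY2 stream moves roughly 124 MB/s per copy and five streams roughly 622 MB/s, so every extra copy in system memory scales directly with the number of inputs.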
Using multiple streams will fully utilize the capability of the system, and the number of frames which can be encoded per second with multiple streams can't be calculated from the encoding speed of 1 stream.

Thanks,
-Surbhi

How to determine the maximum number of frames can be encoded per second