Intel® graphics drivers and software, compatibility, troubleshooting, performance, and optimization

High GPU load while decoding 8 H264 input streams with a GStreamer pipeline


Hi Team,


Our project has the following requirements:


1) Decoding 16 4MP (2560 x 1440) H264 input streams in parallel using a GStreamer pipeline.

2) Displaying all decoded frames on a single VA-API display (reducing the resolution of the decoded frames using vaapipostproc).

3) In parallel, reducing the resolution of the decoded frames with VA-API, re-encoding them with the VA-API encoder plugin, and storing the result to a file.

4) We are using the VA-API plugins for decode, encode, post-processing, and display.
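For reference, a minimal sketch of one such channel as a gst-launch-1.0 command (the source/sink locations and output resolution are placeholders, and the exact element set may differ from our actual pipeline; element names are the standard gstreamer-vaapi plugins):

```shell
# Hypothetical single channel: decode -> downscale -> tee into display and re-encode.
# input0.h264 / out0.mp4 and the 640x360 target size are illustrative only.
gst-launch-1.0 \
  filesrc location=input0.h264 ! h264parse ! vaapih264dec ! \
  vaapipostproc width=640 height=360 ! tee name=t \
  t. ! queue ! vaapisink \
  t. ! queue ! vaapih264enc ! h264parse ! mp4mux ! filesink location=out0.mp4
```

In the real setup, 16 such branches run in parallel, with the display branches composited onto one VA-API display.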


However, we observe that a maximum of only 8 channels can be added: the GAM load exceeds 90% in this case, as measured with the intel_gpu_top command.
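This is how we capture the load figure (flags may vary with the installed igt-gpu-tools version; the log filename is a placeholder):

```shell
# Sample GPU engine/GAM utilization and write the readings to a log file.
# -s sets the sample period in milliseconds, -o redirects output to a file.
sudo intel_gpu_top -s 1000 -o gpu_load.log
```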


Could you please share details on the following queries?


1) How can we improve GPU performance for the above case, so that the GAM load is lower and we can add more channels for decoding?

2) What is the minimum VRAM size required for the above specification?

3) Do you have any other suggestions from Intel to improve GPU performance?



Please find the following environment details for your reference.


intel-vaapi-driver : 2.2.0
libva              : 1.8.3 (Mesa from Yocto build)


We are using a customized Apollo Lake Intel board for our project.


We are using the Linux operating system:

Linux (none) 4.12.24-yocto-standard #1 SMP PREEMPT Tue May 18 23:23:53 EDT 2021 x86_64 GNU/Linux


CPU Details:

processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 92
model name : Intel(R) Celeron(R) CPU J3455 @ 1.50GHz
stepping : 9
microcode : 0x40
cpu MHz : 799.896
cache size : 1024 KB
physical id : 0
siblings : 4
core id : 0
cpu cores : 4
apicid : 0
initial apicid : 0
fpu : yes
fpu_exception : yes
cpuid level : 21
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology tsc_reliable nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq dtes64 ds_cpl vmx est tm2 ssse3 sdbg cx16 xtpr pdcm sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave rdrand lahf_lm 3dnowprefetch cpuid_fault cat_l2 tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust smep erms mpx rdt_a rdseed smap clflushopt intel_pt sha_ni xsaveopt xsavec xgetbv1 xsaves ibpb ibrs stibp dtherm ida arat pln pts arch_capabilities
bugs : spectre_v1 spectre_v2
bogomips : 2995.20
clflush size : 64
cache_alignment : 64
address sizes : 39 bits physical, 48 bits virtual


Also, please find the attached Pipeline.jpg file for a better understanding of our requirement and the GStreamer pipeline flow.


Please help us resolve this issue.


Thanks in Advance.


Hello @Madhavan

Thank you for posting on the Intel® communities.

Please allow us to review this further, and we will post back in the thread with more details as soon as possible.

Best regards,

Andrew G.

Intel Customer Support Technician


Hello Madhavan

After reviewing this further, the best option is to seek premier support from your FAE (Field Application Engineer), especially since you are developing with an Intel board (Apollo Lake). Please reach out to the Intel contact who provided the hardware; they can guide you to the appropriate support channel.

You can also go to the Software Development Technologies section of the Intel® Forums and post this question in the appropriate forum topic, since you are looking to improve GPU performance. However, the best course of action is to reach the FAE who provided the hardware.

Having said that, we will proceed to close this thread now. Thank you for your understanding.

Best regards,

Andrew G.

Intel Customer Support Technician


@AndrewG_Intel Thank you.
