
sample_decode_x11 CPU usage

johnson_j_

Hi,

I ran sample_decode_x11 with the -r option to decode a 1080p H.264 stream at 25 fps. It shows 92.7% CPU usage. My CPU is an Intel(R) Core(TM) i5-4590 @ 3.30GHz.

Why is the CPU usage so high, and which part of the sample is consuming it?

Thanks!

Sravanthi_K_Intel

First - can you give some more details on the experiment you are running? You can use the following format for that - https://software.intel.com/en-us/forums/topic/531083

May I ask why you are using X11 rather than the DRM method? We highly recommend DRM, and you can find more information here - https://software.intel.com/en-us/articles/using-drmserver-with-media-sdk-for-linux-servers-applications

We do not expect (and have not observed) such high CPU usage when running the decode sample. On systems with hardware acceleration we expect it to be minimal (comfortably <10%) with the render option. If you can send us more details using the format above, we can try to identify the issue.

johnson_j_

Processor Type:    Intel(R) Core(TM) i5-4590 CPU @ 3.30GHz
Driver Version:      MediaSDK version 1.11
Operating System:  CentOS Linux release 7.0.1406 (Core)
Media SDK System Analyzer: This will give the above three pieces of information and more about the system's capabilities
Quick Reproducer Code: sample_decode_x11
Concise Description of the Issue:
Priority:  High
Input File: A 1080p H.264 stream. The file is very large (if you really need it, I will find a way to extract part of it and post it on the forum)
Tracer log (if required):

libva info: VA-API version 0.35.0

libva info: va_getDriverName() returns 0

libva info: User requested driver 'iHD'
libva info: Trying to open /opt/intel/mediasdk/lib64/iHD_drv_video.so
libva info: Found init function __vaDriverInit_0_32
libva info: va_openDriver() returns 0
Decoding Sample Version 0.0.000.0000


Input video    AVC 
Output format    YUV420
Resolution    1920x1088
Crop X,Y,W,H    0,0,0,0
Frame rate    0.00
Memory type        d3d
MediaSDK impl        hw
MediaSDK version    1.11

 

 

johnson_j_

On your second question, why I use X11 instead of DRM:

I want to display the video. With DRM, nothing is displayed even when I pass the -r option on the command line, and I don't know why.

 

 

 

johnson_j_

Below is the output of top -d 1:

top - 16:55:51 up 5 days,  6:48,  4 users,  load average: 0.43, 0.22, 0.12
Tasks: 202 total,   1 running, 201 sleeping,   0 stopped,   0 zombie
%Cpu(s): 25.2 us,  3.8 sy,  0.0 ni, 71.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
KiB Mem:   3753612 total,  3245472 used,   508140 free,     2108 buffers
KiB Swap:  3948540 total,        0 used,  3948540 free.  1978088 cached Mem

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND                                                                                             
12585 sample    20   0  301816  13796   5596 S  96.6  0.4   0:05.96 sample_decode_x                                                                                     
10726 sample    20   0 1571288 113684  33732 S  10.0  3.0   0:27.70 gnome-shell                                                                                         
10257 root      20   0  153268  15992   7208 S   7.0  0.4   0:17.52 Xorg                                                                                                
11102 sample    20   0  629652  20740  12256 S   2.0  0.6   0:05.83 gnome-terminal-                                                                                     
10629 sample    20   0  999544  26160  15104 S   1.0  0.7   0:01.00 gnome-settings-                                                                                     
    1 root      20   0  143232   6948   3776 S   0.0  0.2   0:12.50 systemd                                                                                             
    2 root      20   0       0      0      0 S   0.0  0.0   0:00.25 kthreadd                                                                                            
    3 root      20   0       0      0      0 S   0.0  0.0   0:00.14 ksoftirqd/0                                                                                         
    5 root       0 -20       0      0      0 S   0.0  0.0   0:00.00 kworker/0:0H                                                                                        
    7 root      rt   0       0      0      0 S   0.0  0.0   0:00.11 migration/0                                                                                         
    8 root      20   0       0      0      0 S   0.0  0.0   0:00.00 rcu_bh                                                                                              
    9 root      20   0       0      0      0 S   0.0  0.0   0:00.00 rcuob/0                                                                                             
   10 root      20   0       0      0      0 S   0.0  0.0   0:00.00 rcuob/1                                                                                             
   11 root      20   0       0      0      0 S   0.0  0.0   0:00.00 rcuob/2                                                                                             
   12 root      20   0       0      0      0 S   0.0  0.0   0:00.00 rcuob/3                                                                                             
   13 root      20   0       0      0      0 S   0.0  0.0   1:18.69 rcu_sched                                                                                           
   14 root      20   0       0      0      0 S   0.0  0.0   0:37.20 rcuos/0                                                                                             
   15 root      20   0       0      0      0 S   0.0  0.0   0:14.72 rcuos/1                                                                                             
   16 root      20   0       0      0      0 S   0.0  0.0   0:20.27 rcuos/2                                                                                             
   17 root      20   0       0      0      0 S   0.0  0.0   0:17.39 rcuos/3                                                                                             
   18 root      rt   0       0      0      0 S   0.0  0.0   0:01.76 watchdog/0                                                                                          
   19 root      rt   0       0      0      0 S   0.0  0.0   0:01.73 watchdog/1                                                                                          
   20 root      rt   0       0      0      0 S   0.0  0.0   0:00.10 migration/1                                                                                         
   21 root      20   0       0      0      0 S   0.0  0.0   0:00.02 ksoftirqd/1                                                                                         
   23 root       0 -20       0      0      0 S   0.0  0.0   0:00.00 kworker/1:0H                                                                                        
   24 root      rt   0       0      0      0 S   0.0  0.0   0:01.56 watchdog/2                                                                                          
   25 root      rt   0       0      0      0 S   0.0  0.0   0:00.12 migration/2                                                                                         
   26 root      20   0       0      0      0 S   0.0  0.0   0:00.05 ksoftirqd/2                                                                                         
   28 root       0 -20       0      0      0 S   0.0  0.0   0:00.00 kworker/2:0H                                                                                        
   29 root      rt   0       0      0      0 S   0.0  0.0   0:01.56 watchdog/3                                                                                          
   30 root      rt   0       0      0      0 S   0.0  0.0   0:00.11 migration/3                                                                                         
   31 root      20   0       0      0      0 S   0.0  0.0   0:00.01 ksoftirqd/3                                                                                         
   33 root       0 -20       0      0      0 S   0.0  0.0   0:00.00 kworker/3:0H                                                                                        
   34 root       0 -20       0      0      0 S   0.0  0.0   0:00.00 khelper                                                                                             
   35 root      20   0       0      0      0 S   0.0  0.0   0:00.00 kdevtmpfs                                                                                           
   36 root       0 -20       0      0      0 S   0.0  0.0   0:00.00 netns                                                                                               
   37 root       0 -20       0      0      0 S   0.0  0.0   0:00.00 writeback                                                                                           
   38 root       0 -20       0      0      0 S   0.0  0.0   0:00.00 kintegrityd                                                                                         
   39 root       0 -20       0      0      0 S   0.0  0.0   0:00.00 bioset                                                                                              
   40 root       0 -20       0      0      0 S   0.0  0.0   0:00.00 kblockd                                                                                             
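
A note on reading these figures: in top's default mode, the per-process %CPU column is relative to a single core, while the %Cpu(s) summary line is averaged over all cores. The short Python sketch below redoes that arithmetic with the values listed above. It assumes 4 logical CPUs, which is what the rcuos/0 to rcuos/3 threads in the listing suggest; the helper name share_of_machine is made up for illustration.

```python
# Per-process %CPU values copied from the top output above.
# In top's default mode, 100% means one fully busy core.
process_cpu = {
    "sample_decode_x": 96.6,
    "gnome-shell": 10.0,
    "Xorg": 7.0,
    "gnome-terminal-": 2.0,
    "gnome-settings-": 1.0,
}

CORES = 4  # assumption: 4 logical CPUs, as the rcuos/0..3 threads suggest


def share_of_machine(percent_of_one_core, cores=CORES):
    """Convert a per-core percentage into a share of the whole machine."""
    return percent_of_one_core / cores


decoder_share = share_of_machine(process_cpu["sample_decode_x"])
total_share = share_of_machine(sum(process_cpu.values()))

print(f"decoder alone: {decoder_share:.2f}% of the machine")
print(f"all listed processes: {total_share:.2f}% of the machine")
```

Under that assumption the decoder occupies roughly one core out of four, and the total for the listed processes lands close to the 25.2 us + 3.8 sy shown in the summary line.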

 

Sravanthi_K_Intel

Hello Johnson,

Thank you for the detailed report on the issue. The Media Server Studio product is intended for server use cases and is not optimized for client use cases. That is, we intend the samples to be run in headless mode (using DRM), so that the output is either streamed as UDP packets or written to a file and then played back with a player.

In your case, you are rendering the output through X11, which we do not recommend because that path has not been optimized. That is why you are observing such high CPU usage. In short, our recommendation is to run headless using DRM. Apologies for not making this clear at the beginning or in the documentation. Questions like this highlight the gaps in our documentation, and we will improve it in the future.

johnson_j_

Thanks for your reply!

My team wants to use Media SDK in our products for local decoding and display. The use cases include decoding and displaying streams arriving over the network and streams stored on local disks. While decoding and displaying, our product also does other work, such as recording.

So we need local decoding and display at very low CPU usage, to leave more CPU for that other work, and at very low latency for network playback.

Does that make our scenario clear? If not, I can describe it in more detail.

Please help us find a way to apply Media SDK to this scenario. Thanks!

 

 

johnson_j_

I found that vaPutSurface is the source of the high CPU usage.

If I comment out that call, the CPU usage drops.
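
A finding like this can also be checked without removing the call, by comparing CPU time against wall-clock time around it. The sketch below is in Python rather than the sample's C++, and render_blocking and render_spinning are hypothetical stand-ins, not vaPutSurface. It only shows how waiting and computing can be told apart inside the calling process; it says nothing about time spent in the X server or the driver.

```python
import time

FRAME_INTERVAL = 1 / 25  # one frame period at the 25 fps from the original post


def measure(func):
    """Return (wall_seconds, cpu_seconds) spent inside one call to func."""
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    func()
    cpu = time.process_time() - cpu_start
    wall = time.perf_counter() - wall_start
    return wall, cpu


def render_blocking():
    """Hypothetical render call that waits without using the CPU."""
    time.sleep(FRAME_INTERVAL)


def render_spinning():
    """Hypothetical render call that busy-waits for the same duration."""
    deadline = time.perf_counter() + FRAME_INTERVAL
    while time.perf_counter() < deadline:
        pass


for name, func in (("blocking", render_blocking), ("spinning", render_spinning)):
    wall, cpu = measure(func)
    print(f"{name}: wall={wall * 1000:.1f} ms, cpu={cpu * 1000:.1f} ms")
```

If the CPU time of a call is close to its wall-clock time, the call is computing or spinning rather than sleeping, which is the pattern that would show up as near-100% usage of one core.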

 

Sravanthi_K_Intel

Hello Johnson,

The vaPutSurface function renders the frame on the screen, and I am glad you identified it as the bottleneck. Commenting out that call disables drawing the decoded output on the screen. Here are my recommendations based on what you want to achieve -

1. Our decoder performance is very competitive, but we have not optimized the X11 interface. This is because our samples and tutorials are meant to be starting points for application development, not product-quality code. So the most efficient option would be to write your own optimized X11 display path for the decoded frames on your system.

2. You can use a small circular buffer (or file I/O) to write the decoded frames at 30 fps (or your playback rate), and use an external player to read the buffer or file and play it. This way, you avoid writing large files and can also control the rate.

3. Stream over UDP locally, and play back using ffplay or another player (such as VLC).
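
The circular buffer in suggestion 2 could look roughly like the Python sketch below. FrameRing and frame_deadlines are hypothetical names introduced here, not Media SDK APIs, and the frames are placeholder byte strings rather than real decoded surfaces. The ring drops the oldest frame when it is full, and the deadlines show when each frame would be due at a fixed rate.

```python
import collections
import threading


class FrameRing:
    """Fixed-capacity ring of frames; the oldest frame is dropped when full."""

    def __init__(self, capacity):
        self._frames = collections.deque(maxlen=capacity)
        self._lock = threading.Lock()

    def push(self, frame):
        with self._lock:
            self._frames.append(frame)

    def pop(self):
        """Return the oldest buffered frame, or None if the ring is empty."""
        with self._lock:
            return self._frames.popleft() if self._frames else None

    def __len__(self):
        with self._lock:
            return len(self._frames)


def frame_deadlines(fps, count, start=0.0):
    """Presentation times in seconds for `count` frames at a fixed rate."""
    return [start + i / fps for i in range(count)]


ring = FrameRing(capacity=4)
for n in range(6):  # six placeholder frames into four slots
    ring.push(f"frame-{n}".encode())

print(len(ring), ring.pop())
print(frame_deadlines(fps=25, count=3))
```

Dropping the oldest frame keeps memory bounded and favors latency over completeness; a writer that must not lose frames would block instead when the ring is full.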

Hope this helps. If I get more suggestions from my colleagues, I will let you know.
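
For the local UDP route in suggestion 3, the sending side might be sketched as below. This is an illustration only: send_chunks is a hypothetical helper, the 1316-byte datagram size is an assumption (7 x 188-byte MPEG-TS packets), and the payload is placeholder bytes. A real player would need a stream in a format it can parse, which this sketch does not produce; the receiver here is just a second socket in the same process.

```python
import socket

CHUNK = 1316  # assumption: payload bytes per datagram (7 x 188-byte TS packets)


def send_chunks(sock, data, addr, chunk=CHUNK):
    """Send `data` to `addr` as a series of datagrams; return how many were sent."""
    count = 0
    for offset in range(0, len(data), chunk):
        sock.sendto(data[offset:offset + chunk], addr)
        count += 1
    return count


# Receiver bound to an ephemeral loopback port, standing in for a player.
receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
receiver.bind(("127.0.0.1", 0))
receiver.settimeout(2.0)

sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
payload = bytes(range(256)) * 20  # 5120 bytes of placeholder stream data
sent = send_chunks(sender, payload, receiver.getsockname())

# Read back exactly as many datagrams as were sent and reassemble them.
received = b"".join(receiver.recv(2048) for _ in range(sent))

sender.close()
receiver.close()
print(sent, received == payload)
```

UDP gives no delivery or ordering guarantee, so a production path would need sequence numbers or a container format that tolerates loss.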

johnson_j_

Thanks.

I now want to display the decoded frames using OpenGL.

But I have a question: is there a way to transfer decoded frames (always NV12 format with Media SDK) directly into an OpenGL texture on the GPU, instead of copying from GPU to CPU and then back to GPU?
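
To give a sense of what the CPU round trip costs, the Python sketch below works out the size of one NV12 frame at the 1920x1088 resolution reported in the log and the resulting copy rate at 25 fps. It assumes tightly packed planes (pitch equal to width); real surfaces usually have a larger pitch. nv12_layout is a hypothetical helper name.

```python
def nv12_layout(width, height):
    """Plane offsets and total size in bytes of a tightly packed NV12 frame.

    NV12 stores a full-resolution Y plane followed by one interleaved UV
    plane at half the vertical resolution (4:2:0 chroma subsampling).
    """
    y_size = width * height
    uv_size = width * (height // 2)
    return {"y_offset": 0, "uv_offset": y_size, "total": y_size + uv_size}


WIDTH, HEIGHT = 1920, 1088  # resolution reported by sample_decode in the log
FPS = 25                    # frame rate from the original post

layout = nv12_layout(WIDTH, HEIGHT)
one_way = layout["total"] * FPS  # bytes per second for a single copy direction
round_trip = 2 * one_way         # GPU -> CPU, then CPU -> GPU

print(f"frame size: {layout['total']} bytes, UV plane at offset {layout['uv_offset']}")
print(f"one-way copy rate: {one_way / 1e6:.1f} MB/s, round trip: {round_trip / 1e6:.1f} MB/s")
```

These figures count only the raw bytes moved; any color conversion done on the CPU would add to the cost.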

Sravanthi_K_Intel

Hello Johnson, Happy New Year!

We do not yet have an example or easy support for this on Linux, although on Windows we support DXVA+OGL surface sharing. We understand the need for OGL on Linux, but because the server product (for Linux) is usually run headless, we have not prioritized it. Your feedback is welcome, and we will plan to include this in a future release.

In the meantime, I recommend you look at the MMSF framework to achieve what you are looking for - https://software.intel.com/sites/landingpage/mmsf/documentation/index.html. It comes with samples to get you started. For your use case, this sample is relevant - https://software.intel.com/sites/landingpage/mmsf/documentation/mmsf_example1.html. Hope this helps.

johnson_j_

Thanks!

慧_张_

Hello Johnson,

I have the same requirement as you! We also want local decoding and display at very low CPU usage, to leave more CPU for other work, and at very low latency for network playback.

Have you resolved the problem? Could you share your experience? Thank you in advance.
