Intel® Integrated Performance Primitives
Deliberate problems developing high-performance vision, signal, security, and storage applications.

H264 decoder has bad scalability on 8cpu systems

Richard_X_
Beginner

When we use IntelH264Decoder to decompress H264-coded I-frames one frame at a time, the performance is not real-time on our Duo2QuadCore system with 8 CPUs. Each of these I-frames has 8 independent slices and a resolution of 1920x1080.

Here are some figures for speed versus the number of threads we used:

Speed               Num. of threads
147 ms per frame    1
83 ms per frame     2
61 ms per frame     4
61 ms per frame     8
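To make the scaling in these figures explicit, here is a minimal sketch that derives speedup and parallel efficiency from the four timings above. The function name `speedup_and_efficiency` is made up for illustration; only the numbers come from the post.

```python
# Timings reported in the post: number of threads -> milliseconds per frame.
timings_ms = {1: 147.0, 2: 83.0, 4: 61.0, 8: 61.0}


def speedup_and_efficiency(timings):
    """Return {threads: (speedup, efficiency)} relative to the 1-thread time."""
    base = timings[1]
    result = {}
    for threads, ms in sorted(timings.items()):
        speedup = base / ms                  # how many times faster than 1 thread
        result[threads] = (speedup, speedup / threads)  # efficiency per thread
    return result


if __name__ == "__main__":
    for threads, (s, e) in speedup_and_efficiency(timings_ms).items():
        print(f"{threads} threads: speedup {s:.2f}x, efficiency {e:.0%}")
```

The speedup is about 2.4x at both 4 and 8 threads, so the per-thread efficiency halves when going from 4 to 8 threads.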

I read a note that came with 5.3 saying:

"New threading scheme was implemented. The decoder has got more scalability on 4cpu systems."

Does this mean that the decoder won't be able to take advantage of using more than 8cpus?

Vladimir_Dudnik
Employee

Hello,

Could you please provide more details about your issue? What version of IPP do you use? Are you using the simple_player application from the IPP samples? Do you link IPP dynamically or statically? What operating system do you use, Windows or Linux? Is it 32-bit or 64-bit? How do you set the number of threads? Can you see utilization of all 8 cores with the system monitor or a similar tool?

Regards,
Vladimir

Richard_X_
Beginner

IPP version: 5.3.0.164

Testing tool: umc_h264_dec_con.exe

We fed the tool the data of a single I-frame only.

We changed the number of threads using the option -t.

OS: XP 32-bit

Num. of slices per frame: 10

An interesting fact is that we can significantly improve the speed on the 8-CPU system when we feed the tool a stream of frames instead. However, as far as I know, Intel does the decompression at the slice level, so there is no reason the decoder can't make use of more CPU resources within a single frame.
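As a rough check on that expectation, here is a minimal sketch of the best-case speedup under pure slice-level parallelism. It assumes every slice costs the same, each slice is an indivisible task, and there is no serial work at all. These are simplifying assumptions for illustration, not a description of how the IPP decoder actually schedules work, and `slice_bound_speedup` is a hypothetical helper.

```python
import math


def slice_bound_speedup(num_slices, num_threads):
    """Best-case speedup when each slice is an indivisible, equal-cost task.

    The threads work in rounds: each round decodes up to num_threads slices,
    so the frame takes ceil(num_slices / num_threads) slice-times in total.
    """
    rounds = math.ceil(num_slices / num_threads)
    return num_slices / rounds


if __name__ == "__main__":
    for threads in (1, 2, 4, 8):
        bound = slice_bound_speedup(10, threads)
        print(f"{threads} threads: at most {bound:.2f}x with 10 slices")
```

With 10 slices this idealized bound is 5x on 8 threads (two rounds), well above the roughly 2.4x implied by 147 ms versus 61 ms per frame, so the slice count alone does not explain the plateau.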

I generated some logs inside the main threading routine, shown below.

They clearly show that the 10 slices were processed by 8 threads.

Thread 4 started ProcessSegment.
Thread 5 started ProcessSegment.
Thread 0 started ProcessSegment.
Thread 6 started ProcessSegment.
Thread 1 started ProcessSegment.
Thread 7 started ProcessSegment.
Thread 2 started ProcessSegment.
Thread 3 started ProcessSegment.
thread 3, frame 1, slice 8, firstMB 5712, m_iMBToProcess 48
thread 0, frame 1, slice 3, firstMB 1632, m_iMBToProcess 48
thread 1, frame 1, slice 5, firstMB 3264, m_iMBToProcess 96
thread 2, frame 1, slice 7, firstMB 4896, m_iMBToProcess 144
thread 5, frame 1, slice 2, firstMB 816, m_iMBToProcess 144
thread 6, frame 1, slice 4, firstMB 2448, m_iMBToProcess 192
thread 4, frame 1, slice 1, firstMB 0, m_iMBToProcess 240
thread 7, frame 1, slice 6, firstMB 4080, m_iMBToProcess 240
thread 3, frame 1, slice 8, firstMB 5760, m_iMBToProcess 240
thread 0, frame 1, slice 3, firstMB 1680, m_iMBToProcess 240
thread 1, frame 1, slice 5, firstMB 3360, m_iMBToProcess 240
thread 2, frame 1, slice 7, firstMB 5040, m_iMBToProcess 240
thread 5, frame 1, slice 2, firstMB 960, m_iMBToProcess 240
thread 6, frame 1, slice 4, firstMB 2640, m_iMBToProcess 240
thread 4, frame 1, slice 1, firstMB 240, m_iMBToProcess 240
thread 7, frame 1, slice 6, firstMB 4320, m_iMBToProcess 240
thread 3, frame 1, slice 8, firstMB 6000, m_iMBToProcess 240
thread 0, frame 1, slice 3, firstMB 1920, m_iMBToProcess 240
thread 2, frame 1, slice 7, firstMB 5280, m_iMBToProcess 240
thread 1, frame 1, slice 5, firstMB 3600, m_iMBToProcess 240
thread 5, frame 1, slice 2, firstMB 1200, m_iMBToProcess 240
thread 4, frame 1, slice 1, firstMB 480, m_iMBToProcess 240
thread 6, frame 1, slice 4, firstMB 2880, m_iMBToProcess 240
thread 7, frame 1, slice 6, firstMB 4560, m_iMBToProcess 240
thread 3, frame 1, slice 8, firstMB 6240, m_iMBToProcess 240
thread 4, frame 1, slice 1, firstMB 720, m_iMBToProcess 96
thread 2, frame 1, slice 7, firstMB 5520, m_iMBToProcess 192
thread 0, frame 1, slice 3, firstMB 2160, m_iMBToProcess 240
thread 7, frame 1, slice 6, firstMB 4800, m_iMBToProcess 96
thread 5, frame 1, slice 2, firstMB 1440, m_iMBToProcess 192
thread 3, frame 1, slice 8, firstMB 6480, m_iMBToProcess 48
thread 6, frame 1, slice 4, firstMB 3120, m_iMBToProcess 144
thread 1, frame 1, slice 5, firstMB 3840, m_iMBToProcess 240
thread 0, frame 1, slice 3, firstMB 2400, m_iMBToProcess 48
thread 2, frame 1, slice 10, firstMB 7344, m_iMBToProcess 96
thread 4, frame 1, slice 9, firstMB 6528, m_iMBToProcess 192
thread 7, frame 1, slice 10, firstMB 7440, m_iMBToProcess 240
thread 0, frame 1, slice 9, firstMB 6720, m_iMBToProcess 240
thread 1, frame 1, slice 10, firstMB 7680, m_iMBToProcess 240
thread 2, frame 1, slice 9, firstMB 6960, m_iMBToProcess 240
thread 7, frame 1, slice 9, firstMB 7200, m_iMBToProcess 144
thread 5, frame 1, slice 10, firstMB 7920, m_iMBToProcess 240

frame completed - poc - 0
Thread 0 finished ProcessSegment.
Thread 5 finished ProcessSegment.

Thread 4 finished ProcessSegment.
Thread 6 finished ProcessSegment.
Thread 1 finished ProcessSegment.
Thread 3 finished ProcessSegment.
Thread 2 finished ProcessSegment.
Thread 7 finished ProcessSegment.
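A log like this is easier to read once it is tallied per slice. The sketch below parses lines in the format shown above and sums the macroblocks handled for each slice, together with the threads that touched it. The embedded sample contains only the slice 8 and slice 10 lines copied from the log; the regular expression and the `tally` helper are illustrative and not part of the decoder.

```python
import re
from collections import defaultdict

# Slice 8 and slice 10 lines copied from the log above.
LOG = """\
thread 3, frame 1, slice 8, firstMB 5712, m_iMBToProcess 48
thread 3, frame 1, slice 8, firstMB 5760, m_iMBToProcess 240
thread 3, frame 1, slice 8, firstMB 6000, m_iMBToProcess 240
thread 3, frame 1, slice 8, firstMB 6240, m_iMBToProcess 240
thread 3, frame 1, slice 8, firstMB 6480, m_iMBToProcess 48
thread 2, frame 1, slice 10, firstMB 7344, m_iMBToProcess 96
thread 7, frame 1, slice 10, firstMB 7440, m_iMBToProcess 240
thread 1, frame 1, slice 10, firstMB 7680, m_iMBToProcess 240
thread 5, frame 1, slice 10, firstMB 7920, m_iMBToProcess 240
"""

LINE_RE = re.compile(
    r"thread (\d+), frame (\d+), slice (\d+), firstMB (\d+), m_iMBToProcess (\d+)"
)


def tally(log_text):
    """Return (macroblocks per slice, sorted thread ids per slice)."""
    mbs_per_slice = defaultdict(int)
    threads_per_slice = defaultdict(set)
    for match in LINE_RE.finditer(log_text):
        thread, _frame, slice_no, _first_mb, mb_count = map(int, match.groups())
        mbs_per_slice[slice_no] += mb_count
        threads_per_slice[slice_no].add(thread)
    threads_sorted = {s: sorted(t) for s, t in threads_per_slice.items()}
    return dict(mbs_per_slice), threads_sorted


if __name__ == "__main__":
    mbs, threads = tally(LOG)
    for slice_no in sorted(mbs):
        print(f"slice {slice_no}: {mbs[slice_no]} MBs, threads {threads[slice_no]}")
```

In this excerpt both slices add up to 816 macroblocks, which matches the spacing of the firstMB offsets (0, 816, 1632, ...). Slice 8 stays on thread 3, while slice 10 is handed between threads 2, 7, 1 and 5 in chunks of at most 240 macroblocks.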

test.h264    1    62.6208 ms.    15.9691 fps
CABAC/CAVLC - decoding time 69.8392 ms.
reconstruct time 105.2672 ms.
deblocking time 0.0000 ms.
summary time on all CPU cores 175.1064 ms.
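The timing summary can be turned into a rough estimate of how many cores were busy on average. This sketch assumes that "summary time on all CPU cores" is the sum of the busy time of all threads, and it takes the wall-clock time per frame as 1000 divided by the reported fps. Both are interpretations of the tool output rather than documented behavior.

```python
# Values copied from the tool output above.
cabac_cavlc_ms = 69.8392      # CABAC/CAVLC decoding time
reconstruct_ms = 105.2672     # reconstruct time
deblocking_ms = 0.0           # deblocking time
summary_all_cores_ms = 175.1064
fps = 15.9691

# Assumption: wall-clock time per frame follows from the reported frame rate.
wall_ms = 1000.0 / fps

# The stage times add up to the reported summary time.
stage_sum_ms = cabac_cavlc_ms + reconstruct_ms + deblocking_ms

# Assumption: the summary time is accumulated over all threads, so dividing
# it by the wall-clock time gives the average number of busy cores.
avg_busy_cores = summary_all_cores_ms / wall_ms

if __name__ == "__main__":
    print(f"wall-clock time per frame: {wall_ms:.2f} ms")
    print(f"sum of stage times:        {stage_sum_ms:.4f} ms")
    print(f"average busy cores:        {avg_busy_cores:.2f} of 8")
```

Under these assumptions fewer than three of the eight cores are busy on average, which is in line with the flat timing between 4 and 8 threads.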

Vladimir_Dudnik
Employee

Hello,

Thanks, we will look into this. Could you please provide a test stream? I would also recommend submitting your issue through Intel Premier Support.

Regards,
Vladimir
