<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: H264 decoder has bad scalability on 8cpu systems in Intel® Integrated Performance Primitives</title>
    <link>https://community.intel.com/t5/Intel-Integrated-Performance/H264-decoder-has-bad-scalability-on-8cpu-systems/m-p/867752#M8494</link>
    <description>&lt;P&gt;Hello,&lt;/P&gt;
&lt;P&gt;Thanks, we will look into this. Could you please provide a test stream? I would recommend submitting your issue through &lt;A href="http://premier.intel.com"&gt;Intel Premier Support&lt;/A&gt;.&lt;/P&gt;
&lt;P&gt;Regards,&lt;BR /&gt; Vladimir&lt;/P&gt;</description>
    <pubDate>Wed, 20 Feb 2008 15:25:51 GMT</pubDate>
    <dc:creator>Vladimir_Dudnik</dc:creator>
    <dc:date>2008-02-20T15:25:51Z</dc:date>
    <item>
      <title>H264 decoder has bad scalability on 8cpu systems</title>
      <link>https://community.intel.com/t5/Intel-Integrated-Performance/H264-decoder-has-bad-scalability-on-8cpu-systems/m-p/867749#M8491</link>
      <description>&lt;P&gt;When we use IntelH264Decoder to decompress H264-coded I-frames one frame at a time, the performance is not real-time on our Duo2QuadCore system with 8cpus. Each of these I-frames have 8 independent slices and have the resolution of 1920x1080.&lt;/P&gt;
&lt;P&gt;Here are some figures regarding the speed vs number of threads we used&lt;/P&gt;
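&lt;TABLE&gt;
&lt;TR&gt;&lt;TH&gt;Speed&lt;/TH&gt;&lt;TH&gt;Num. of threads&lt;/TH&gt;&lt;/TR&gt;
&lt;TR&gt;&lt;TD&gt;147 ms per frame&lt;/TD&gt;&lt;TD&gt;1&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;&lt;TD&gt;83 ms per frame&lt;/TD&gt;&lt;TD&gt;2&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;&lt;TD&gt;61 ms per frame&lt;/TD&gt;&lt;TD&gt;4&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;&lt;TD&gt;61 ms per frame&lt;/TD&gt;&lt;TD&gt;8&lt;/TD&gt;&lt;/TR&gt;
&lt;/TABLE&gt;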
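&lt;P&gt;As a rough check on these numbers, the sketch below computes the speedup and per-thread efficiency relative to the single-thread run. It is plain Python and not part of IPP; the names timings_ms and scaling_table are made up for illustration. With the figures above, the speedup levels off at about 2.4x from 4 threads on.&lt;/P&gt;
&lt;PRE&gt;
```python
# Timings quoted in the post: number of threads -> milliseconds per frame.
timings_ms = {1: 147.0, 2: 83.0, 4: 61.0, 8: 61.0}


def scaling_table(timings):
    """Return (threads, speedup, efficiency) rows relative to the 1-thread run."""
    base = timings[1]
    rows = []
    for threads in sorted(timings):
        speedup = base / timings[threads]
        # Efficiency is the fraction of ideal linear scaling actually achieved.
        rows.append((threads, speedup, speedup / threads))
    return rows


for threads, speedup, efficiency in scaling_table(timings_ms):
    print(f"{threads} threads: speedup {speedup:.2f}x, efficiency {efficiency:.0%}")
```
&lt;/PRE&gt;
&lt;P&gt;Going from 4 to 8 threads leaves the speedup unchanged, so the per-thread efficiency simply halves.&lt;/P&gt;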
&lt;P&gt;I read a note shipped with 5.3 saying:&lt;/P&gt;
&lt;P&gt;"New threading scheme was implemented. The decoder has got more scalability on 4cpu systems."&lt;/P&gt;
&lt;P&gt;Does this mean that the decoder won't be able to take advantage of more than 8 CPUs?&lt;/P&gt;
</description>
      <pubDate>Fri, 11 Jan 2008 15:07:46 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Integrated-Performance/H264-decoder-has-bad-scalability-on-8cpu-systems/m-p/867749#M8491</guid>
      <dc:creator>Richard_X_</dc:creator>
      <dc:date>2008-01-11T15:07:46Z</dc:date>
    </item>
    <item>
      <title>Re: H264 decoder has bad scalability on 8cpu systems</title>
      <link>https://community.intel.com/t5/Intel-Integrated-Performance/H264-decoder-has-bad-scalability-on-8cpu-systems/m-p/867750#M8492</link>
      <description>&lt;P&gt;Hello,&lt;/P&gt;
&lt;P&gt;Could you please provide more details about your issue? What version of IPP do you use? Are you using the simple_player application from the IPP samples? Do you link IPP dynamically or statically? What operating system do you use, Windows or Linux? Is it 32-bit or 64-bit? How do you set the number of threads? Can you see utilization of all 8 cores with a system monitor or a similar tool?&lt;/P&gt;
&lt;P&gt;Regards,&lt;BR /&gt; Vladimir&lt;/P&gt;</description>
      <pubDate>Fri, 11 Jan 2008 15:17:07 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Integrated-Performance/H264-decoder-has-bad-scalability-on-8cpu-systems/m-p/867750#M8492</guid>
      <dc:creator>Vladimir_Dudnik</dc:creator>
      <dc:date>2008-01-11T15:17:07Z</dc:date>
    </item>
    <item>
      <title>Re: H264 decoder has bad scalability on 8cpu systems</title>
      <link>https://community.intel.com/t5/Intel-Integrated-Performance/H264-decoder-has-bad-scalability-on-8cpu-systems/m-p/867751#M8493</link>
      <description>&lt;P&gt;IPP version:5.3.0.164&lt;/P&gt;
&lt;P&gt;Testing tool:umc_h264_dec_con.exe&lt;/P&gt;
&lt;P&gt; we fed the tool with only one I-frame data&lt;/P&gt;
&lt;P&gt;We changed the number of threads using the option -t&lt;/P&gt;
&lt;P&gt;OS: XP 32-bit&lt;/P&gt;
&lt;P&gt;Num. of Slices per frame: 10&lt;/P&gt;
&lt;P&gt;Interestingly, the speed improves significantly when we feed the tool a stream of frames on the 8-CPU system. However, as far as I know, the Intel decoder does the decompression at the slice level, so there is no reason the decoder can't make use of more CPU resources.&lt;/P&gt;
&lt;P&gt;I generated some logs inside the main threading routine, shown below.&lt;/P&gt;
&lt;P&gt;It is obvious that the 10 slices were processed by 8 threads.&lt;/P&gt;
&lt;P&gt;Thread 4 started ProcessSegment.&lt;BR /&gt;Thread 5 started ProcessSegment.&lt;BR /&gt;Thread 0 started ProcessSegment.&lt;BR /&gt;Thread 6 started ProcessSegment.&lt;BR /&gt;Thread 1 started ProcessSegment.&lt;BR /&gt;Thread 7 started ProcessSegment.&lt;BR /&gt;Thread 2 started ProcessSegment.&lt;BR /&gt;Thread 3 started ProcessSegment.&lt;BR /&gt; thread 3, frame 1, slice 8, firstMB 5712, m_iMBToProcess 48&lt;BR /&gt; thread 0, frame 1, slice 3, firstMB 1632, m_iMBToProcess 48&lt;BR /&gt; thread 1, frame 1, slice 5, firstMB 3264, m_iMBToProcess 96&lt;BR /&gt; thread 2, frame 1, slice 7, firstMB 4896, m_iMBToProcess 144&lt;BR /&gt; thread 5, frame 1, slice 2, firstMB 816, m_iMBToProcess 144&lt;BR /&gt; thread 6, frame 1, slice 4, firstMB 2448, m_iMBToProcess 192&lt;BR /&gt; thread 4, frame 1, slice 1, firstMB 0, m_iMBToProcess 240&lt;BR /&gt; thread 7, frame 1, slice 6, firstMB 4080, m_iMBToProcess 240&lt;BR /&gt; thread 3, frame 1, slice 8, firstMB 5760, m_iMBToProcess 240&lt;BR /&gt; thread 0, frame 1, slice 3, firstMB 1680, m_iMBToProcess 240&lt;BR /&gt; thread 1, frame 1, slice 5, firstMB 3360, m_iMBToProcess 240&lt;BR /&gt; thread 2, frame 1, slice 7, firstMB 5040, m_iMBToProcess 240&lt;BR /&gt; thread 5, frame 1, slice 2, firstMB 960, m_iMBToProcess 240&lt;BR /&gt; thread 6, frame 1, slice 4, firstMB 2640, m_iMBToProcess 240&lt;BR /&gt; thread 4, frame 1, slice 1, firstMB 240, m_iMBToProcess 240&lt;BR /&gt; thread 7, frame 1, slice 6, firstMB 4320, m_iMBToProcess 240&lt;BR /&gt; thread 3, frame 1, slice 8, firstMB 6000, m_iMBToProcess 240&lt;BR /&gt; thread 0, frame 1, slice 3, firstMB 1920, m_iMBToProcess 240&lt;BR /&gt; thread 2, frame 1, slice 7, firstMB 5280, m_iMBToProcess 240&lt;BR /&gt; thread 1, frame 1, slice 5, firstMB 3600, m_iMBToProcess 240&lt;BR /&gt; thread 5, frame 1, slice 2, firstMB 1200, m_iMBToProcess 240&lt;BR /&gt; thread 4, frame 1, slice 1, firstMB 480, m_iMBToProcess 240&lt;BR /&gt; thread 6, frame 1, slice 4, firstMB 2880, 
m_iMBToProcess 240&lt;BR /&gt; thread 7, frame 1, slice 6, firstMB 4560, m_iMBToProcess 240&lt;BR /&gt; thread 3, frame 1, slice 8, firstMB 6240, m_iMBToProcess 240&lt;BR /&gt; thread 4, frame 1, slice 1, firstMB 720, m_iMBToProcess 96&lt;BR /&gt; thread 2, frame 1, slice 7, firstMB 5520, m_iMBToProcess 192&lt;BR /&gt; thread 0, frame 1, slice 3, firstMB 2160, m_iMBToProcess 240&lt;BR /&gt; thread 7, frame 1, slice 6, firstMB 4800, m_iMBToProcess 96&lt;BR /&gt; thread 5, frame 1, slice 2, firstMB 1440, m_iMBToProcess 192&lt;BR /&gt;
 thread 3, frame 1, slice 8, firstMB 6480, m_iMBToProcess 48&lt;BR /&gt; thread 6, frame 1, slice 4, firstMB 3120, m_iMBToProcess 144&lt;BR /&gt; thread 1, frame 1, slice 5, firstMB 3840, m_iMBToProcess 240&lt;BR /&gt; thread 0, frame 1, slice 3, firstMB 2400, m_iMBToProcess 48&lt;BR /&gt; thread 2, frame 1, slice 10, firstMB 7344, m_iMBToProcess 96&lt;BR /&gt; thread 4, frame 1, slice 9, firstMB 6528, m_iMBToProcess 192&lt;BR /&gt; thread 7, frame 1, slice 10, firstMB 7440, m_iMBToProcess 240&lt;BR /&gt; thread 0, frame 1, slice 9, firstMB 6720, m_iMBToProcess 240&lt;BR /&gt; thread 1, frame 1, slice 10, firstMB 7680, m_iMBToProcess 240&lt;BR /&gt; thread 2, frame 1, slice 9, firstMB 6960, m_iMBToProcess 240&lt;BR /&gt; thread 7, frame 1, slice 9, firstMB 7200, m_iMBToProcess 144&lt;BR /&gt; thread 5, frame 1, slice 10, firstMB 7920, m_iMBToProcess 240&lt;/P&gt;
&lt;P&gt;frame completed - poc - 0&lt;BR /&gt;Thread 0 finished ProcessSegment.&lt;BR /&gt;Thread 5 finished ProcessSegment.&lt;/P&gt;
&lt;P&gt;Thread 4 finished ProcessSegment.&lt;BR /&gt;Thread 6 finished ProcessSegment.&lt;BR /&gt;Thread 1 finished ProcessSegment.&lt;BR /&gt;Thread 3 finished ProcessSegment.&lt;BR /&gt;Thread 2 finished ProcessSegment.&lt;BR /&gt;Thread 7 finished ProcessSegment.&lt;/P&gt;
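&lt;P&gt;To make the log easier to read, here is a rough sketch that sums the macroblocks per slice and lists which threads touched each slice. It is plain Python, not part of the IPP samples; SAMPLE_LOG, SEGMENT and summarize are names made up for illustration, and the sample contains only the slice 1 and slice 9 entries copied from the log above.&lt;/P&gt;
&lt;PRE&gt;
```python
import re
from collections import defaultdict

# Slice 1 and slice 9 entries copied from the log above.
SAMPLE_LOG = """\
thread 4, frame 1, slice 1, firstMB 0, m_iMBToProcess 240
thread 4, frame 1, slice 1, firstMB 240, m_iMBToProcess 240
thread 4, frame 1, slice 1, firstMB 480, m_iMBToProcess 240
thread 4, frame 1, slice 1, firstMB 720, m_iMBToProcess 96
thread 4, frame 1, slice 9, firstMB 6528, m_iMBToProcess 192
thread 0, frame 1, slice 9, firstMB 6720, m_iMBToProcess 240
thread 2, frame 1, slice 9, firstMB 6960, m_iMBToProcess 240
thread 7, frame 1, slice 9, firstMB 7200, m_iMBToProcess 144
"""

# One log entry: thread, frame, slice, first macroblock, macroblock count.
SEGMENT = re.compile(
    r"thread (\d+), frame (\d+), slice (\d+), firstMB (\d+), m_iMBToProcess (\d+)"
)


def summarize(log_text):
    """Return (macroblocks per slice, sorted thread ids per slice)."""
    mbs = defaultdict(int)
    threads = defaultdict(set)
    for match in SEGMENT.finditer(log_text):
        thread, _frame, slice_no, _first_mb, count = map(int, match.groups())
        mbs[slice_no] += count
        threads[slice_no].add(thread)
    return dict(mbs), {s: sorted(t) for s, t in threads.items()}


mbs_per_slice, threads_per_slice = summarize(SAMPLE_LOG)
for slice_no in sorted(mbs_per_slice):
    print(f"slice {slice_no}: {mbs_per_slice[slice_no]} macroblocks, "
          f"threads {threads_per_slice[slice_no]}")
```
&lt;/PRE&gt;
&lt;P&gt;On this sample, slice 1 is handled entirely by thread 4, while slice 9 is split across threads 0, 2, 4 and 7. Both slices add up to 816 macroblocks, which matches the firstMB spacing between slices in the log.&lt;/P&gt;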
&lt;P&gt;test.h264 1 62.6208 ms. 15.9691 fps&lt;BR /&gt;CABAC/CAVLC - decoding time 69.8392 ms.&lt;BR /&gt;reconstruct time 105.2672 ms.&lt;BR /&gt;deblocking time 0.0000 ms.&lt;BR /&gt;summary time on all CPU cores 175.1064 ms.&lt;/P&gt;
</description>
      <pubDate>Fri, 11 Jan 2008 15:33:04 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Integrated-Performance/H264-decoder-has-bad-scalability-on-8cpu-systems/m-p/867751#M8493</guid>
      <dc:creator>Richard_X_</dc:creator>
      <dc:date>2008-01-11T15:33:04Z</dc:date>
    </item>
    <item>
      <title>Re: H264 decoder has bad scalability on 8cpu systems</title>
      <link>https://community.intel.com/t5/Intel-Integrated-Performance/H264-decoder-has-bad-scalability-on-8cpu-systems/m-p/867752#M8494</link>
      <description>&lt;P&gt;Hello,&lt;/P&gt;
&lt;P&gt;Thanks, we will look into this. Could you please provide a test stream? I would recommend submitting your issue through &lt;A href="http://premier.intel.com"&gt;Intel Premier Support&lt;/A&gt;.&lt;/P&gt;
&lt;P&gt;Regards,&lt;BR /&gt; Vladimir&lt;/P&gt;</description>
      <pubDate>Wed, 20 Feb 2008 15:25:51 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Integrated-Performance/H264-decoder-has-bad-scalability-on-8cpu-systems/m-p/867752#M8494</guid>
      <dc:creator>Vladimir_Dudnik</dc:creator>
      <dc:date>2008-02-20T15:25:51Z</dc:date>
    </item>
  </channel>
</rss>

