<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Low performance issue about ipp function ippsEncodeLZO_8u in Intel® Integrated Performance Primitives</title>
    <link>https://community.intel.com/t5/Intel-Integrated-Performance/Low-performance-issue-about-ipp-function-ippsEncodeLZO-8u/m-p/994967#M22825</link>
    <description>Low performance issue about ipp function ippsEncodeLZO_8u</description>
    <pubDate>Fri, 28 Sep 2012 03:31:32 GMT</pubDate>
    <dc:creator>haixiao_j_</dc:creator>
    <dc:date>2012-09-28T03:31:32Z</dc:date>
    <item>
      <title>Low performance issue about ipp function ippsEncodeLZO_8u</title>
      <link>https://community.intel.com/t5/Intel-Integrated-Performance/Low-performance-issue-about-ipp-function-ippsEncodeLZO-8u/m-p/994967#M22825</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;
&lt;P&gt;I am testing the performance of both Intel IPP LZO and LZO (version 2.06), and I found that IPP performance is much lower than that of LZO 2.06.&lt;/P&gt;
&lt;P&gt;My test bed:&lt;/P&gt;
&lt;P&gt;Hardware&lt;/P&gt;
&lt;P&gt;•DELL R720&lt;/P&gt;
&lt;P&gt;&amp;nbsp; Intel(R) Xeon(R) CPU E5-2620 0 @ 2.00GHz (Sandy Bridge Arch)&lt;/P&gt;
&lt;P&gt;&amp;nbsp; 24 GB RAM, BIOS Version: 1.2.6&lt;/P&gt;
&lt;P&gt;Software&lt;/P&gt;
&lt;P&gt;•OS: RHEL 6.0, kernel 2.6.32-71.el6.x86_64&lt;/P&gt;
&lt;P&gt;•Intel IPP main package: &lt;STRONG&gt;parallel_studio_xe_2011_sp1_update3_intel64 &lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;•LZO version: 2.06&lt;/P&gt;
&lt;P&gt;•Compiler: &lt;STRONG&gt;gcc&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;Test method:&lt;/P&gt;
&lt;P&gt;1. First, I configure the number of threads and the number of compression rounds. (The IPP internal thread mode is IppLZO1XST, i.e. single-threaded, but the benchmark program itself is multithreaded.)&lt;/P&gt;
&lt;P&gt;2. Then, each benchmark thread reads the whole file into memory and compresses it entirely in memory.&lt;/P&gt;
&lt;P&gt;3. Finally, the program reports the throughput and the compression ratio.&lt;/P&gt;
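&lt;P&gt;The test method above (N benchmark threads, each compressing the same 16MB input for a configurable number of rounds) can be sketched with POSIX threads as below. This is only a minimal sketch under stated assumptions: thread_arg, worker, run_benchmark and compress_stub are hypothetical names, compress_stub merely copies bytes so the example runs without IPP or LZO installed, and for brevity the input buffer is shared read-only instead of being read from the file by every thread. In a real benchmark, compress_stub would be replaced by ippsEncodeLZO_8u (with its own IppLZOState_8u in each thread) or by lzo1x_1_compress.&lt;/P&gt;
<![CDATA[
```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical per-thread arguments; each worker writes only its own slot. */
typedef struct {
    const unsigned char *src;  /* shared, read-only input buffer */
    size_t src_len;            /* input size in bytes */
    int rounds;                /* opt_round_num in the pseudocode */
    size_t bytes_done;         /* input bytes processed by this thread */
} thread_arg;

/* Placeholder workload (assumption): copies the input so the sketch runs
   without IPP or LZO; returns the "compressed" length. */
static size_t compress_stub(const unsigned char *src, size_t len,
                            unsigned char *dst)
{
    memcpy(dst, src, len);
    return len;
}

/* Thread body: private output buffer, repeated compression rounds. */
static void *worker(void *p)
{
    thread_arg *a = p;
    unsigned char *dst = malloc(a->src_len);
    if (dst == NULL)
        return NULL;
    for (int i = 0; i < a->rounds; i++) {
        compress_stub(a->src, a->src_len, dst);
        a->bytes_done += a->src_len;
    }
    free(dst);
    return NULL;
}

/* Starts nthreads workers, joins them and returns the total bytes processed. */
static size_t run_benchmark(const unsigned char *src, size_t len,
                            int nthreads, int rounds)
{
    pthread_t *tids;
    thread_arg *args;
    size_t total = 0;
    int started = 0;

    assert(nthreads > 0 && rounds >= 0);
    tids = malloc(sizeof *tids * (size_t)nthreads);
    args = calloc((size_t)nthreads, sizeof *args);
    assert(tids != NULL && args != NULL);

    for (int t = 0; t < nthreads; t++) {
        args[t] = (thread_arg){ src, len, rounds, 0 };
        if (pthread_create(&tids[t], NULL, worker, &args[t]) != 0)
            break;  /* stop launching; join the ones already running */
        started++;
    }
    for (int t = 0; t < started; t++) {
        pthread_join(tids[t], NULL);
        total += args[t].bytes_done;
    }
    free(tids);
    free(args);
    return total;
}
```
]]>
&lt;P&gt;For example, run_benchmark(buf, 16*1024*1024, 24, rounds) starts 24 workers and, if all allocations succeed, returns 24 × rounds × 16MB processed bytes. Keeping every output buffer and encoder state private to its thread avoids sharing writable memory between threads.&lt;/P&gt;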
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;SPAN class="title"&gt;The main procedure&lt;/SPAN&gt;&lt;SPAN class="additional"&gt; of the Intel IPP LZO test program, in pseudocode:&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN class="additional"&gt;*&lt;STRONG&gt;The source file to be compressed is 16MB and the compression ratio is 1.5:1&lt;/STRONG&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;#include &amp;lt;fcntl.h&amp;gt;&lt;BR /&gt;#include &amp;lt;unistd.h&amp;gt;&lt;BR /&gt;#include &amp;lt;sys/time.h&amp;gt;&lt;BR /&gt;#include &amp;lt;ipps.h&amp;gt;&amp;nbsp; /* ippsMalloc_8u, ippsFree */&lt;BR /&gt;#include &amp;lt;ippdc.h&amp;gt;&amp;nbsp; /* IPP data compression: LZO functions */&lt;/P&gt;
&lt;P&gt;#define BUFSIZE (16*1024*1024)&amp;nbsp; /* 16MB; parenthesized so the macro expands safely in expressions */&lt;/P&gt;
&lt;P&gt;&lt;SPAN class="additional"&gt;void compress_per_thread(const char* pInFileName, int opt_round_num) // this is the thread function&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN class="additional"&gt;{&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN class="additional"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; int fd_in, i;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN class="additional"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; IppLZOState_8u *pLZOState;&amp;nbsp; /* one encoder state per thread */&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN class="additional"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; Ipp8u *p_in_buffer = NULL, *p_out_buffer = NULL;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN class="additional"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; Ipp32u src_len, dst_len, lzoSize;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN class="additional"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; struct timeval timeStart, timeEnd;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN class="additional"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; fd_in = open(pInFileName, O_RDONLY, 0);&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN class="additional"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; ippsEncodeLZOGetSize(IppLZO1XST, BUFSIZE, &amp;amp;lzoSize);&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN class="additional"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; pLZOState = (IppLZOState_8u*)ippsMalloc_8u(lzoSize);&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN class="additional"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; ippsEncodeLZOInit_8u(IppLZO1XST, BUFSIZE, pLZOState);&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN class="additional"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; p_in_buffer = ippsMalloc_8u(BUFSIZE);&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN class="additional"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; p_out_buffer = ippsMalloc_8u(BUFSIZE + BUFSIZE / 10);&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN class="additional"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; src_len = read(fd_in, p_in_buffer, BUFSIZE); // the source file is exactly BUFSIZE bytes, so the whole file is read into memory&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN class="additional"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; gettimeofday(&amp;amp;timeStart, NULL);&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; for(i = 0; i &amp;lt; opt_round_num; i++) // opt_round_num rounds per thread, configurable to tune the measurement&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; {&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; dst_len = BUFSIZE + BUFSIZE / 10;&amp;nbsp; // set to the output buffer capacity before each call&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; ippsEncodeLZO_8u(p_in_buffer, src_len, p_out_buffer, &amp;amp;dst_len, pLZOState);&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; }&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; gettimeofday(&amp;amp;timeEnd, NULL);&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN class="additional"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; ippsFree(p_out_buffer);&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN class="additional"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; ippsFree(p_in_buffer);&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN class="additional"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; ippsFree(pLZOState);&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; close(fd_in);&lt;BR /&gt;&lt;SPAN class="additional"&gt;}&lt;/SPAN&gt;&lt;/P&gt;
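&lt;P&gt;A quick sanity check of the buffer sizes in the pseudocode: the sketch below compares the output allocation BUFSIZE + BUFSIZE / 10 with the worst-case LZO1X expansion bound given in the LZO 2.x documentation and examples (in + in / 16 + 64 + 3 bytes), and estimates the compressed size at the reported 1.5:1 ratio. The helper names are hypothetical, and it is an assumption that the IPP encoder needs no more headroom than the reference LZO1X-1 compressor.&lt;/P&gt;
<![CDATA[
```c
#include <assert.h>
#include <stddef.h>

/* Same size as the pseudocode, parenthesized for safe macro expansion. */
#define BUFSIZE (16u * 1024u * 1024u)

/* Worst-case LZO1X output for n incompressible input bytes, per the bound
   used in the LZO 2.x documentation (assumed to apply here). */
static size_t lzo1x_worst_case(size_t n)
{
    return n + n / 16 + 64 + 3;
}

/* Output allocation used in the pseudocode: input size plus 10 percent. */
static size_t out_buffer_size(size_t n)
{
    return n + n / 10;
}

/* Expected compressed size for a ratio such as 1.5 (meaning 1.5:1). */
static size_t expected_compressed(size_t n, double ratio)
{
    assert(ratio > 0.0);
    return (size_t)((double)n / ratio);
}
```
]]>
&lt;P&gt;For the 16MB input, the 10 percent headroom (1,677,721 bytes) exceeds the LZO1X worst-case overhead (1,048,643 bytes), and at 1.5:1 each round writes roughly 11.2 million bytes.&lt;/P&gt;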
&lt;P&gt;&lt;SPAN class="title"&gt;The main procedure&lt;/SPAN&gt;&lt;SPAN class="additional"&gt; of the LZO 2.06 test program is the same as for IPP LZO, except that it calls lzo1x_1_compress to compress.&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN class="additional"&gt;Performance peaks at 24 threads. At that point, IPP LZO reaches only &lt;STRONG&gt;10.3 Gbps&lt;/STRONG&gt;, whereas LZO 2.06 reaches &lt;STRONG&gt;31.18 Gbps&lt;/STRONG&gt;.&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN class="additional"&gt;&lt;STRONG&gt;Why is IPP LZO so much slower than LZO 2.06 on the 16MB test data?&lt;/STRONG&gt;&lt;/SPAN&gt;&lt;/P&gt;
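&lt;P&gt;The post does not state how the Gbps figures are derived. One common definition, assumed here, is aggregate input throughput: threads × rounds × input bytes × 8 bits divided by the wall-clock time, with 1 Gbps = 10^9 bit/s. The sketch below computes that from two struct timeval samples taken with gettimeofday; elapsed_seconds and gbps_from_timevals are hypothetical helper names.&lt;/P&gt;
<![CDATA[
```c
#include <assert.h>
#include <stddef.h>
#include <sys/time.h>

/* Wall-clock seconds between two gettimeofday() samples. */
static double elapsed_seconds(const struct timeval *start,
                              const struct timeval *end)
{
    return (double)(end->tv_sec - start->tv_sec)
         + (double)(end->tv_usec - start->tv_usec) / 1e6;
}

/* Aggregate throughput in Gbps (10^9 bit/s), counting input bytes only. */
static double gbps_from_timevals(const struct timeval *start,
                                 const struct timeval *end,
                                 size_t bytes_per_round, int rounds,
                                 int threads)
{
    double secs = elapsed_seconds(start, end);
    double bits = (double)bytes_per_round * 8.0 * rounds * threads;

    assert(secs > 0.0);
    return bits / secs / 1e9;
}
```
]]>
&lt;P&gt;When threads start and finish at slightly different times, taking the earliest start and the latest end across all threads gives a conservative aggregate figure. Whatever the exact definition, as long as both libraries are measured the same way, 31.18 Gbps versus 10.3 Gbps is a factor of about 3.&lt;/P&gt;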
&lt;P&gt;&lt;SPAN class="additional"&gt;&lt;/SPAN&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;SPAN class="additional"&gt;&amp;nbsp; Notes: &lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN class="additional"&gt;&amp;nbsp; If I set the IPP thread mode to IppLZO1XMT (which by default uses as many threads as there are processors in the system) while my benchmark program also runs as many threads as there are processors, I think the resulting thread context switches will degrade performance.&lt;/SPAN&gt;&lt;/P&gt;
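&lt;P&gt;To make the oversubscription concern concrete: if each benchmark thread uses IppLZO1XMT and, as the note says, IPP starts one internal thread per processor, the number of runnable threads is roughly the product of the two counts. The sketch below computes that ratio against the number of online processors reported by sysconf(_SC_NPROCESSORS_ONLN) on Linux; oversubscription_factor and online_cpus are hypothetical names, and the per-call IPP thread count is taken from the note rather than verified against IPP.&lt;/P&gt;
<![CDATA[
```c
#include <assert.h>
#include <unistd.h>

/* Runnable threads per online processor when each of app_threads
   application threads runs ipp_threads internal threads
   (1.0 means no oversubscription). */
static double oversubscription_factor(long app_threads, long ipp_threads,
                                      long cpus)
{
    assert(app_threads > 0 && ipp_threads > 0 && cpus > 0);
    return (double)(app_threads * ipp_threads) / (double)cpus;
}

/* Online logical processors (Linux/glibc sysconf), or 1 if unknown. */
static long online_cpus(void)
{
    long n = sysconf(_SC_NPROCESSORS_ONLN);
    return n > 0 ? n : 1;
}
```
]]>
&lt;P&gt;For example, 24 benchmark threads, each with 24 IPP threads, on 24 logical processors gives a factor of 24 (576 threads competing for 24 processors), whereas IppLZO1XST keeps the factor at 1.&lt;/P&gt;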
&lt;P&gt;&lt;SPAN class="additional"&gt;&lt;/SPAN&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;SPAN class="additional"&gt;&lt;/SPAN&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Fri, 28 Sep 2012 03:31:32 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Integrated-Performance/Low-performance-issue-about-ipp-function-ippsEncodeLZO-8u/m-p/994967#M22825</guid>
      <dc:creator>haixiao_j_</dc:creator>
      <dc:date>2012-09-28T03:31:32Z</dc:date>
    </item>
    <item>
      <title>the duplicated issue - http:/</title>
      <link>https://community.intel.com/t5/Intel-Integrated-Performance/Low-performance-issue-about-ipp-function-ippsEncodeLZO-8u/m-p/994968#M22826</link>
      <description>the duplicated issue - &lt;A href="http://software.intel.com/en-us/forums/topic/328387" target="_blank"&gt;http://software.intel.com/en-us/forums/topic/328387&lt;/A&gt;</description>
      <pubDate>Sat, 29 Sep 2012 08:04:26 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Integrated-Performance/Low-performance-issue-about-ipp-function-ippsEncodeLZO-8u/m-p/994968#M22826</guid>
      <dc:creator>Gennady_F_Intel</dc:creator>
      <dc:date>2012-09-29T08:04:26Z</dc:date>
    </item>
  </channel>
</rss>

