<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic SSE4.1 optimization problem in Intel® Integrated Performance Primitives</title>
    <link>https://community.intel.com/t5/Intel-Integrated-Performance/SSE4-1-optimization-problem/m-p/788203#M2196</link>
    <description>&lt;DIV id="tiny_quote"&gt;&lt;DIV style="margin-left: 2px; margin-right: 2px;"&gt;Quoting &lt;A jquery1341777289593="60" rel="/en-us/services/profile/quick_profile.php?is_paid=&amp;amp;user_id=565388" href="https://community.intel.com/en-us/profile/565388/" class="basic"&gt;Livikin Alexey&lt;/A&gt;&lt;/DIV&gt;&lt;DIV style="background-color: #e5e5e5; margin-left: 2px; margin-right: 2px; border: 1px inset; padding: 5px;"&gt;&lt;EM&gt;...with -O3 optimization set, after some time the &lt;/EM&gt;&lt;STRONG&gt;encoder generates output that cannot be decoded on some devices, although not all devices stop decoding&lt;/STRONG&gt;&lt;EM&gt;...&lt;BR /&gt;&lt;BR /&gt;&lt;/EM&gt; [&lt;STRONG&gt;SergeyK&lt;/STRONG&gt;] Could you provide more technical details? What devices are you using?&lt;BR /&gt;&lt;BR /&gt;&lt;EM&gt;...is the problem in the &lt;/EM&gt;&lt;STRONG&gt;&lt;EM&gt;MPSADBW instruction, or something more?&lt;BR /&gt;&lt;/EM&gt;&lt;/STRONG&gt;&lt;BR /&gt; [&lt;STRONG&gt;SergeyK&lt;/STRONG&gt;] It is hard to believe that there is an internal problem with the '&lt;STRONG&gt;MPSADBW&lt;/STRONG&gt;' instruction. I would consider:&lt;BR /&gt;&lt;BR /&gt; - an optimization problem with the C++ compiler&lt;BR /&gt; - some problem(s) with the library&lt;/DIV&gt;&lt;/DIV&gt;</description>
    <pubDate>Sun, 08 Jul 2012 19:57:18 GMT</pubDate>
    <dc:creator>SergeyKostrov</dc:creator>
    <dc:date>2012-07-08T19:57:18Z</dc:date>
    <item>
      <title>SSE4.1 optimization problem</title>
      <link>https://community.intel.com/t5/Intel-Integrated-Performance/SSE4-1-optimization-problem/m-p/788199#M2192</link>
      <description>Hi,&lt;BR /&gt;&lt;BR /&gt;We have a problem with optimization. When optimization is active, or when we set -O3, after some time the encoder generates output that cannot be decoded on some devices, although not all devices stop decoding. If we set -msse3, everything works fine at any optimization level (-O3, -O2, -O1). With -msse4.2 and -O2 it works better, but the same failure still happens from time to time. We use icc version 12.1.0 (gcc version 3.4.3 compatibility) on 32-bit RHEL4, with an Intel Xeon CPU E5645 @ 2.40GHz. As I understand it, is the problem in the &lt;B&gt;MPSADBW instruction, or something more?&lt;/B&gt;&lt;BR /&gt;&lt;BR /&gt;Best Regards,&lt;BR /&gt;Alexey.&lt;BR /&gt;</description>
      <pubDate>Fri, 06 Jul 2012 12:34:36 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Integrated-Performance/SSE4-1-optimization-problem/m-p/788199#M2192</guid>
      <dc:creator>Livikin_Alexey</dc:creator>
      <dc:date>2012-07-06T12:34:36Z</dc:date>
    </item>
    <item>
      <title>SSE4.1 optimization problem</title>
      <link>https://community.intel.com/t5/Intel-Integrated-Performance/SSE4-1-optimization-problem/m-p/788200#M2193</link>
      <description>&lt;DIV id="tiny_quote"&gt;&lt;DIV style="margin-left: 2px; margin-right: 2px;"&gt;Hi Alexey,&lt;BR /&gt;&lt;BR /&gt;Quoting &lt;A jquery1341621855531="60" rel="/en-us/services/profile/quick_profile.php?is_paid=&amp;amp;user_id=565388" href="https://community.intel.com/en-us/profile/565388/" class="basic"&gt;Livikin Alexey&lt;/A&gt;&lt;/DIV&gt;&lt;DIV style="background-color: #e5e5e5; margin-left: 2px; margin-right: 2px; border: 1px inset; padding: 5px;"&gt;&lt;I&gt;Hi,&lt;BR /&gt;&lt;BR /&gt;We have a problem with optimization. When optimization is active, or when we set -O3, after some time the encoder generates output that cannot be decoded on some devices, although not all devices stop decoding. If we set -msse3, everything works fine at any optimization level (-O3, -O2, -O1). With -msse4.2 and -O2 it works better, but the same failure still happens from time to time. We use icc version 12.1.0 (gcc version 3.4.3 compatibility) on 32-bit RHEL4, with an Intel Xeon CPU E5645 @ 2.40GHz. As I understand it, is the problem in the &lt;B&gt;MPSADBW instruction, or something more?&lt;/B&gt;&lt;BR /&gt;&lt;BR /&gt;Best Regards,&lt;BR /&gt;Alexey.&lt;BR /&gt;&lt;/I&gt;&lt;/DIV&gt;&lt;/DIV&gt;&lt;P&gt;&lt;BR /&gt;A '&lt;STRONG&gt;Sum of Absolute Differences&lt;/STRONG&gt;' instruction, '&lt;STRONG&gt;MPSADBW&lt;/STRONG&gt;', was introduced with &lt;STRONG&gt;SSE4.1&lt;/STRONG&gt;, and since your post starts with the expression 'We have problem with optimization...', it would be nice if you could provide a test case.&lt;BR /&gt;&lt;BR /&gt;I tested the instruction about two years ago; please let me know if you need my test case (a VS 2005 project).&lt;BR /&gt;&lt;BR /&gt;Best regards,&lt;BR /&gt;Sergey&lt;/P&gt;</description>
      <pubDate>Sat, 07 Jul 2012 00:49:48 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Integrated-Performance/SSE4-1-optimization-problem/m-p/788200#M2193</guid>
      <dc:creator>SergeyKostrov</dc:creator>
      <dc:date>2012-07-07T00:49:48Z</dc:date>
    </item>
    <item>
      <title>SSE4.1 optimization problem</title>
      <link>https://community.intel.com/t5/Intel-Integrated-Performance/SSE4-1-optimization-problem/m-p/788201#M2194</link>
      <description>&lt;DIV id="tiny_quote"&gt;&lt;DIV style="margin-left: 2px; margin-right: 2px;"&gt;Quoting &lt;A jquery1341622538703="61" rel="/en-us/services/profile/quick_profile.php?is_paid=&amp;amp;user_id=353541" href="https://community.intel.com/en-us/profile/353541/" class="basic"&gt;Sergey Kostrov&lt;/A&gt;&lt;/DIV&gt;&lt;DIV style="background-color: #e5e5e5; margin-left: 2px; margin-right: 2px; border: 1px inset; padding: 5px;"&gt;&lt;I&gt;...A '&lt;STRONG&gt;Sum of Absolute Differences&lt;/STRONG&gt;' instruction, '&lt;STRONG&gt;MPSADBW&lt;/STRONG&gt;', was introduced with &lt;STRONG&gt;SSE4.1&lt;/STRONG&gt;, and since your post starts with the expression 'We have problem with optimization...', it would be nice if you could provide a test case.&lt;BR /&gt;&lt;BR /&gt;I tested the instruction about two years ago; please let me know if you need my test case (a VS 2005 project)...&lt;/I&gt;&lt;/DIV&gt;&lt;BR /&gt;Here is a small example that shows how the instruction was used:&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN style="text-decoration: underline;"&gt;Note&lt;/SPAN&gt;: 'NOP' instructions are used because in 2010 I was using the Intel emulator of SSE4 instructions...&lt;BR /&gt;&lt;BR /&gt;[cpp]...
;//////////////////////////////////////////////////////////////////////////////
; AsmTestLib.asm

include SSE4inst.inc

.686
.XMM

;//////////////////////////////////////////////////////////////////////////////

_TEXT	SEGMENT
_argSrc$		=  8								; size = 4
_argDst$		= 12								; size = 4
_spValueSize4	=  4								; size of a Single-Precision value
_dpValueSize8	=  8								; size of a Double-Precision value
_TEXT	ENDS

;//////////////////////////////////////////////////////////////////////////////

.CODE

;//////////////////////////////////////////////////////////////////////////////
; C/C++ Declaration:
; RTint SSE4CalcSAD( RTubyte *pchSrc, RTubyte *pchDst );

SSE4CalcSAD		PROC NEAR
	MOV			eax, DWORD PTR _argSrc$[esp-4]	; Load Source array of bytes into xmm1
	MOVDQU		xmm1, [eax]

	MOV			ecx, DWORD PTR _argDst$[esp-4]	; Load Destination array of bytes into xmm2
	MOVDQU		xmm2, [ecx]

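	MPSADBW		xmm2, xmm1, 0					; Eight 16-bit SADs: bytes 0-3 of xmm1 against eight sliding 4-byte windows of xmm2
	NOP											; NOP instructions placed between SSE4 instructions

	PHMINPOSUW	xmm1, xmm2						; Minimum SAD goes to word 0 of xmm1, its index to word 1
	NOP

	PEXTRW		ecx, xmm1, 0					; Extract minimum SAD value (left in ecx, not returned)
	NOP
	PEXTRW		eax, xmm1, 1					; Extract minimum SAD index (returned in eax)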
	NOP

	RET
SSE4CalcSAD		ENDP
...
&lt;BR /&gt;&lt;BR /&gt;[/cpp]&lt;/DIV&gt;</description>
      <pubDate>Sat, 07 Jul 2012 00:58:34 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Integrated-Performance/SSE4-1-optimization-problem/m-p/788201#M2194</guid>
      <dc:creator>SergeyKostrov</dc:creator>
      <dc:date>2012-07-07T00:58:34Z</dc:date>
    </item>
    <item>
      <title>SSE4.1 optimization problem</title>
      <link>https://community.intel.com/t5/Intel-Integrated-Performance/SSE4-1-optimization-problem/m-p/788202#M2195</link>
      <description>Thanks, Sergey. I think I gave too little information. I did not implement that instruction myself. I took the ready-made IPP samples and built an H.264 encoder based on those sources. The H.264 profile is Baseline.</description>
      <pubDate>Sun, 08 Jul 2012 15:22:48 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Integrated-Performance/SSE4-1-optimization-problem/m-p/788202#M2195</guid>
      <dc:creator>Livikin_Alexey</dc:creator>
      <dc:date>2012-07-08T15:22:48Z</dc:date>
    </item>
    <item>
      <title>SSE4.1 optimization problem</title>
      <link>https://community.intel.com/t5/Intel-Integrated-Performance/SSE4-1-optimization-problem/m-p/788203#M2196</link>
      <description>&lt;DIV id="tiny_quote"&gt;&lt;DIV style="margin-left: 2px; margin-right: 2px;"&gt;Quoting &lt;A jquery1341777289593="60" rel="/en-us/services/profile/quick_profile.php?is_paid=&amp;amp;user_id=565388" href="https://community.intel.com/en-us/profile/565388/" class="basic"&gt;Livikin Alexey&lt;/A&gt;&lt;/DIV&gt;&lt;DIV style="background-color: #e5e5e5; margin-left: 2px; margin-right: 2px; border: 1px inset; padding: 5px;"&gt;&lt;EM&gt;...with -O3 optimization set, after some time the &lt;/EM&gt;&lt;STRONG&gt;encoder generates output that cannot be decoded on some devices, although not all devices stop decoding&lt;/STRONG&gt;&lt;EM&gt;...&lt;BR /&gt;&lt;BR /&gt;&lt;/EM&gt; [&lt;STRONG&gt;SergeyK&lt;/STRONG&gt;] Could you provide more technical details? What devices are you using?&lt;BR /&gt;&lt;BR /&gt;&lt;EM&gt;...is the problem in the &lt;/EM&gt;&lt;STRONG&gt;&lt;EM&gt;MPSADBW instruction, or something more?&lt;BR /&gt;&lt;/EM&gt;&lt;/STRONG&gt;&lt;BR /&gt; [&lt;STRONG&gt;SergeyK&lt;/STRONG&gt;] It is hard to believe that there is an internal problem with the '&lt;STRONG&gt;MPSADBW&lt;/STRONG&gt;' instruction. I would consider:&lt;BR /&gt;&lt;BR /&gt; - an optimization problem with the C++ compiler&lt;BR /&gt; - some problem(s) with the library&lt;/DIV&gt;&lt;/DIV&gt;</description>
      <pubDate>Sun, 08 Jul 2012 19:57:18 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Integrated-Performance/SSE4-1-optimization-problem/m-p/788203#M2196</guid>
      <dc:creator>SergeyKostrov</dc:creator>
      <dc:date>2012-07-08T19:57:18Z</dc:date>
    </item>
    <item>
      <title>SSE4.1 optimization problem</title>
      <link>https://community.intel.com/t5/Intel-Integrated-Performance/SSE4-1-optimization-problem/m-p/788204#M2197</link>
      <description>Yes, I think the same: the problem is not in the instruction....</description>
      <pubDate>Mon, 09 Jul 2012 04:55:57 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Integrated-Performance/SSE4-1-optimization-problem/m-p/788204#M2197</guid>
      <dc:creator>Livikin_Alexey</dc:creator>
      <dc:date>2012-07-09T04:55:57Z</dc:date>
    </item>
    <item>
      <title>SSE4.1 optimization problem</title>
      <link>https://community.intel.com/t5/Intel-Integrated-Performance/SSE4-1-optimization-problem/m-p/788205#M2198</link>
      <description>&lt;DIV id="tiny_quote"&gt;&lt;DIV style="margin-left: 2px; margin-right: 2px;"&gt;Quoting &lt;A jquery1342009573359="60" rel="/en-us/services/profile/quick_profile.php?is_paid=&amp;amp;user_id=565388" href="https://community.intel.com/en-us/profile/565388/" class="basic"&gt;Livikin Alexey&lt;/A&gt;&lt;/DIV&gt;&lt;DIV style="background-color: #e5e5e5; margin-left: 2px; margin-right: 2px; border: 1px inset; padding: 5px;"&gt;&lt;I&gt;Yes, I think the same: the problem is not in the instruction....&lt;/I&gt;&lt;/DIV&gt;&lt;/DIV&gt;&lt;P&gt;&lt;BR /&gt;Hi Alexey,&lt;BR /&gt;Could you provide some details on which IPP function was used? What about a simple test case?&lt;BR /&gt;Best regards,&lt;BR /&gt;Sergey&lt;/P&gt;</description>
      <pubDate>Wed, 11 Jul 2012 12:32:36 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Integrated-Performance/SSE4-1-optimization-problem/m-p/788205#M2198</guid>
      <dc:creator>SergeyKostrov</dc:creator>
      <dc:date>2012-07-11T12:32:36Z</dc:date>
    </item>
    <item>
      <title>SSE4.1 optimization problem</title>
      <link>https://community.intel.com/t5/Intel-Integrated-Performance/SSE4-1-optimization-problem/m-p/788206#M2199</link>
      <description>Hi Alexey, &lt;BR /&gt;&lt;BR /&gt;How do you link IPP in your encoder application? If you use general linking, such as dynamic or static linking as described in this article: &lt;A href="http://software.intel.com/en-us/articles/introduction-to-linking-with-intel-ipp-70-library/"&gt;http://software.intel.com/en-us/articles/introduction-to-linking-with-intel-ipp-70-library/&lt;/A&gt;,&lt;BR /&gt;then external compiler options such as -O2, -O3 or -msse3, -msse4.1 do not influence the IPP functions internally.&lt;BR /&gt;&lt;BR /&gt;The -msse* options mean: &lt;BR /&gt; - No CPU ID check &lt;BR /&gt; - Intel and non-Intel processors &lt;BR /&gt; - Illegal instruction error if run on an unsupported processor &lt;BR /&gt; - At least Pentium 4 required (SSE2)&lt;BR /&gt;&lt;BR /&gt;You mentioned that the code crashed on some devices; what kind of devices (processors) are they?&lt;BR /&gt;&lt;BR /&gt;It seems we may need to move the topic to the Intel Compiler Forum, if you have a test case, to see whether the compiler experts can give some hints. &lt;BR /&gt;&lt;BR /&gt;Best Regards,&lt;BR /&gt;Ying</description>
      <pubDate>Thu, 12 Jul 2012 03:18:09 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Integrated-Performance/SSE4-1-optimization-problem/m-p/788206#M2199</guid>
      <dc:creator>Ying_H_Intel</dc:creator>
      <dc:date>2012-07-12T03:18:09Z</dc:date>
    </item>
    <item>
      <title>SSE4.1 optimization problem</title>
      <link>https://community.intel.com/t5/Intel-Integrated-Performance/SSE4-1-optimization-problem/m-p/788207#M2200</link>
      <description>Hi Ying,&lt;BR /&gt;&lt;BR /&gt;I have tried both static and dynamic linking. The "devices" in my post are any Polycom videoconference hardware or software endpoints. I use the H.264 encoder from the IPP samples with slice size -1300. With -msse4.1, the endpoints stop decoding after scenes with heavy motion, and after that I see I-frame requests from the endpoints (I tried generating I-frames, but to no avail; playback is not restored). With -msse3, I can say everything works fine for H.264. However, the same problem exists in the H.263 encoder (with -msse3); it appears very rarely, and it may be caused by something else in my code. The H.263 encoder is also from the IPP samples. &lt;BR /&gt;&lt;BR /&gt;Best Regards,&lt;BR /&gt;Alexey.&lt;BR /&gt;</description>
      <pubDate>Thu, 12 Jul 2012 13:06:00 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Integrated-Performance/SSE4-1-optimization-problem/m-p/788207#M2200</guid>
      <dc:creator>Livikin_Alexey</dc:creator>
      <dc:date>2012-07-12T13:06:00Z</dc:date>
    </item>
    <item>
      <title>SSE4.1 optimization problem</title>
      <link>https://community.intel.com/t5/Intel-Integrated-Performance/SSE4-1-optimization-problem/m-p/788208#M2201</link>
      <description>After some tests I can add the following:&lt;DIV&gt;&lt;/DIV&gt;&lt;DIV&gt;1. The problem exists with any optimization parameters if the H.264 encoder is configured with these parameters:&lt;DIV id="_mcePaste"&gt;  mv_search_method = 0,1,2;&lt;/DIV&gt;&lt;DIV id="_mcePaste"&gt;  me_search_x = 4;&lt;/DIV&gt;&lt;DIV id="_mcePaste"&gt;  me_search_y = 4;&lt;/DIV&gt;&lt;DIV id="_mcePaste"&gt;&lt;/DIV&gt;&lt;DIV id="_mcePaste"&gt;That is true for CIF resolution (352x288); at 4CIF (704x576) there is no problem...&lt;/DIV&gt;&lt;DIV id="_mcePaste"&gt;&lt;/DIV&gt;&lt;DIV id="_mcePaste"&gt;2. If I set:&lt;/DIV&gt;&lt;DIV id="_mcePaste"&gt;&lt;DIV id="_mcePaste"&gt;  mv_search_method = 0,1,2;&lt;/DIV&gt;&lt;DIV id="_mcePaste"&gt;  me_search_x = 0;&lt;/DIV&gt;&lt;DIV id="_mcePaste"&gt;  me_search_y = 0;&lt;/DIV&gt;&lt;DIV id="_mcePaste"&gt;&lt;/DIV&gt;&lt;DIV id="_mcePaste"&gt;then the problem exists with -msse4.1; with -msse3 it goes away. So it looks like the problem is not only in optimization.&lt;/DIV&gt;&lt;/DIV&gt;&lt;/DIV&gt;</description>
      <pubDate>Thu, 12 Jul 2012 15:19:14 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Integrated-Performance/SSE4-1-optimization-problem/m-p/788208#M2201</guid>
      <dc:creator>Livikin_Alexey</dc:creator>
      <dc:date>2012-07-12T15:19:14Z</dc:date>
    </item>
  </channel>
</rss>

