We have a problem with optimization. If it is enabled, or if -O3 is set, the encoder after some time generates output that cannot be decoded on some devices (not all devices stop decoding). If we set -msse3, everything works fine at any optimization level (-O3, -O2, -O1). With -msse4.2 and -O2 it works better, but the same failure still happens from time to time. We are using icc version 12.1.0 (gcc version 3.4.3 compatibility) on RHEL4 32-bit, Intel Xeon CPU E5645 @ 2.40GHz. As I understand it, is the problem in the MPSADBW instruction, or is it something more?
Quoting Livikin Alexey
The 'Sum of Absolute Differences' instruction 'MPSADBW' was introduced with SSE4, and since your post
starts with the expression 'We have problem with optimization...' it would be nice if you could provide a test case.
I tested the instruction about two years ago; please let me know if you need my test case (a VS 2005 project).
Here is a small example that shows how the instruction was used:
Note: 'NOP' instructions are used because in 2010 I was using the Intel emulator of SSE4 instructions...
[cpp]...
;//////////////////////////////////////////////////////////////////////////////
; AsmTestLib.asm

include SSE4inst.inc

.686
.XMM

;//////////////////////////////////////////////////////////////////////////////

_TEXT SEGMENT
_argSrc$ = 8            ; size = 4
_argDst$ = 12           ; size = 4
_spValueSize4 = 4       ; size of a Single-Precision value
_dpValueSize8 = 8       ; size of a Double-Precision value
_TEXT ENDS

;//////////////////////////////////////////////////////////////////////////////

.CODE

;//////////////////////////////////////////////////////////////////////////////
; C/C++ Declaration:
; RTint SSE4CalcSAD( RTubyte *pchSrc, RTubyte *pchDst );

SSE4CalcSAD PROC NEAR

    MOV        eax, DWORD PTR _argSrc$[esp-4]   ; Load Source array of bytes into xmm1
    MOVDQU     xmm1, [eax]

    MOV        ecx, DWORD PTR _argDst$[esp-4]   ; Load Destination array of bytes into xmm2
    MOVDQU     xmm2, [ecx]

    MPSADBW    xmm2, xmm1, 0                    ; Calculate SAD
    NOP                                         ; NOP instructions placed between SSE4 instructions
    PHMINPOSUW xmm1, xmm2                       ; Identify minimum SAD
    NOP
    PEXTRW     ecx, xmm1, 0                     ; Extract minimum SAD Value
    NOP
    PEXTRW     eax, xmm1, 1                     ; Extract minimum SAD Index
    NOP

    RET

SSE4CalcSAD ENDP
...
[/cpp]
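To check whether the instruction itself is at fault, it may help to compare the SSE4 routine against a plain scalar version built without any -msse4.x option. The following is only a sketch of what `MPSADBW xmm2, xmm1, 0` followed by `PHMINPOSUW` computes as I understand the instruction definitions. The names `RefMpsadbwImm0`, `RefMinPos` and `RefCalcSAD` are hypothetical and are not part of the test project mentioned above.

```cpp
#include <cstdint>
#include <cstdlib>

// Scalar model of MPSADBW dst, src, 0 (assumption: imm8 = 0, as in the listing):
// eight 16-bit SADs between the first 4 bytes of src and the 4-byte windows
// of dst that start at byte offsets 0..7. dst must have at least 11 bytes.
void RefMpsadbwImm0(const std::uint8_t* src, const std::uint8_t* dst,
                    std::uint16_t sad[8]) {
    for (int i = 0; i < 8; ++i) {
        unsigned sum = 0;
        for (int k = 0; k < 4; ++k) {
            sum += static_cast<unsigned>(std::abs(int(dst[i + k]) - int(src[k])));
        }
        sad[i] = static_cast<std::uint16_t>(sum);
    }
}

// Scalar model of PHMINPOSUW: returns the lowest index holding the minimum
// value and stores that minimum through minValue.
int RefMinPos(const std::uint16_t sad[8], std::uint16_t* minValue) {
    int best = 0;
    for (int i = 1; i < 8; ++i) {
        if (sad[i] < sad[best]) {
            best = i;
        }
    }
    *minValue = sad[best];
    return best;
}

// Hypothetical scalar counterpart of SSE4CalcSAD: like the assembly routine,
// it returns the index of the minimum SAD; the minimum value is also reported.
int RefCalcSAD(const std::uint8_t* src, const std::uint8_t* dst,
               std::uint16_t* minValue) {
    std::uint16_t sad[8];
    RefMpsadbwImm0(src, dst, sad);
    return RefMinPos(sad, minValue);
}
```

Feeding the same 16-byte buffers to both versions and comparing the returned index is one way to tell an instruction problem from a code-generation problem. If the two always agree on your hardware, the cause is more likely elsewhere in the optimized encoder code.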
How do you link IPP in your encoder application? If you use one of the general linking models, i.e. dynamic or static linking as
described in the article http://software.intel.com/en-us/articles/introduction-to-linking-with-intel-ipp-70-library/,
then external compiler options such as -O2, -O3, -msse3 or -msse4.1 do not influence the IPP functions internally.
No cpu id check
Intel and non-Intel processors
Illegal instruction error if run on unsupported processor
At least Pentium4 required (sse2)
You mentioned that the code crashed on some devices. What kind of devices (processors) are they?
It seems we may need to move the topic to the Intel Compiler Forum if you have a test case, to see if the compiler experts can give some hints.
I link both statically and dynamically. "Devices" in my post means any Polycom videoconference hardware or software endpoints. I use the H.264 encoder from the IPP samples with slice size -1300. With -msse4.1, endpoints stop decoding after scenes with heavy motion, and I see I-frame requests from the endpoints after that (I tried generating I-frames, but to no avail; playback is not restored). With -msse3, H.264 works fine as far as I can tell. But the same problem exists in the H.263 encoder (with -msse3); it appears very rarely, and maybe it is a problem with something else in my code. The H.263 encoder is also from the IPP samples.