Intel® Integrated Performance Primitives
Deliberate problems developing high-performance vision, signal, security, and storage applications.

SSE4.1 optimization problem

Livikin_Alexey
Beginner
774 Views
Hi,

We have a problem with optimization. If it is enabled, or if -O3 is set, then after some time the encoder generates output that cannot be decoded on some devices (not all devices stop decoding). With -msse3 everything works fine at any optimization level (-O3, -O2, -O1). With -msse4.2 and -O2 it works better, but the same problem still happens from time to time. We are using icc version 12.1.0 (gcc version 3.4.3 compatibility) on RHEL4 32-bit, Intel Xeon CPU E5645 @ 2.40GHz. As I understand it, is the problem in the MPSADBW instruction, or is it something more?

Best Regards,
Alexey.
0 Kudos
9 Replies
SergeyKostrov
Valued Contributor II
774 Views
Hi Alexey,

A 'Sum of Absolute Differences' instruction, 'MPSADBW', was introduced with SSE4.1, and since your post
starts with 'We have problem with optimization...', it would be helpful if you could provide a test case.

I tested the instruction about two years ago; please let me know if you need my test case (a VS 2005 project).

Best regards,
Sergey

0 Kudos
SergeyKostrov
Valued Contributor II
774 Views
Here is a small example that shows how the instruction was used:

Note: 'NOP' instructions are used because in 2010 I was using the Intel Emulator of SSE4 instructions...

[cpp]
...
;//////////////////////////////////////////////////////////////////////////////
; AsmTestLib.asm

include SSE4inst.inc

.686
.XMM

;//////////////////////////////////////////////////////////////////////////////

_TEXT SEGMENT
_argSrc$ = 8                ; size = 4
_argDst$ = 12               ; size = 4
_spValueSize4 = 4           ; size of a Single-Precision value
_dpValueSize8 = 8           ; size of a Double-Precision value
_TEXT ENDS

;//////////////////////////////////////////////////////////////////////////////

.CODE

;//////////////////////////////////////////////////////////////////////////////
; C/C++ Declaration:
; RTint SSE4CalcSAD( RTubyte *pchSrc, RTubyte *pchDst );

SSE4CalcSAD PROC NEAR

    MOV eax, DWORD PTR _argSrc$[esp-4]    ; Load Source array of bytes into xmm1
    MOVDQU xmm1, [eax]
    MOV ecx, DWORD PTR _argDst$[esp-4]    ; Load Destination array of bytes into xmm2
    MOVDQU xmm2, [ecx]

    MPSADBW xmm2, xmm1, 0                 ; Calculate SAD
    NOP                                   ; NOP instructions placed between SSE4 instructions
    PHMINPOSUW xmm1, xmm2                 ; Identify minimum SAD
    NOP
    PEXTRW ecx, xmm1, 0                   ; Extract minimum SAD Value
    NOP
    PEXTRW eax, xmm1, 1                   ; Extract minimum SAD Index
    NOP

    RET

SSE4CalcSAD ENDP
...
[/cpp]
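For cross-checking a routine like the one above, a plain C model of the two SSE4.1 instructions can be useful. The following is only a sketch under the assumption that the immediate is 0, as in the assembly listing; the function names `ref_mpsadbw0` and `ref_phminposuw` are hypothetical and not part of IPP or of the original test project. With an immediate of 0, MPSADBW compares the first 4 bytes of the source operand against eight overlapping 4-byte windows of the destination operand, and PHMINPOSUW returns the smallest of the eight words together with its index (the lowest index wins on ties).

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Scalar model of "MPSADBW dst, src, 0" (hypothetical helper name).
 * sad[i] = sum over k = 0..3 of |dst[i + k] - src[k]|, for i = 0..7.
 * Only dst[0..10] and src[0..3] are read. */
void ref_mpsadbw0(const uint8_t dst[16], const uint8_t src[16], uint16_t sad[8])
{
    assert(dst && src && sad);
    for (int i = 0; i < 8; ++i) {
        unsigned sum = 0;
        for (int k = 0; k < 4; ++k)
            sum += (unsigned)abs((int)dst[i + k] - (int)src[k]);
        sad[i] = (uint16_t)sum;
    }
}

/* Scalar model of PHMINPOSUW (hypothetical helper name):
 * minimum of eight unsigned words and the index of its first occurrence. */
void ref_phminposuw(const uint16_t w[8], uint16_t *min_val, uint16_t *min_idx)
{
    assert(w && min_val && min_idx);
    *min_val = w[0];
    *min_idx = 0;
    for (int i = 1; i < 8; ++i) {
        if (w[i] < *min_val) {
            *min_val = w[i];
            *min_idx = (uint16_t)i;
        }
    }
}
```

Feeding the same 16-byte inputs to this model and to the SSE4.1 path, then comparing the minimum value and index, gives a quick indication of whether the instruction sequence itself behaves as expected.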
0 Kudos
Livikin_Alexey
Beginner
774 Views
Thanks, Sergey. I think I gave too little information. I did not implement that instruction myself. I took the ready-made IPP samples and built an H.264 encoder based on those sources. The H.264 profile is Baseline.
0 Kudos
SergeyKostrov
Valued Contributor II
774 Views
...set -O3 optimization after some time encoder generate output that cannot be decoded on some devices but not all devices stop decode...

[SergeyK] Could you provide more technical details? What devices are you using?

...understand problem in MPSADBW instruction or something more?

[SergeyK] It is hard to believe that there is an internal problem with the 'MPSADBW' instruction. I would
consider:

- an optimization problem with the C++ compiler
- some problem(s) with the library
0 Kudos
Livikin_Alexey
Beginner
774 Views
Yes, I think the same: the problem is not in the instruction....
0 Kudos
SergeyKostrov
Valued Contributor II
774 Views
Hi Alexey,
Could you provide some details on which IPP function was used? What about a simple test case?
Best regards,
Sergey

0 Kudos
Ying_H_Intel
Employee
774 Views
Hi Alexey,

How do you link IPP in your encoder application? If you use one of the general linking models, such as dynamic or static linking as
described in the article http://software.intel.com/en-us/articles/introduction-to-linking-with-intel-ipp-70-library/,
then external compiler options such as -O2, -O3 or -msse3, -msse4.1, etc. do not influence the IPP functions internally.

The -msse* options mean:

- No CPU ID check
- Intel and non-Intel processors
- Illegal instruction error if run on an unsupported processor
- At least Pentium 4 required (SSE2)
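
Since code built with -msse* carries no CPU ID check of its own, an application can add one before entering an SSE4.1 code path. The following is a minimal sketch, not something taken from IPP or from the compiler documentation: the function names are hypothetical, and it assumes a GCC-compatible toolchain that ships `<cpuid.h>` (older GCC versions do not). CPUID leaf 1 reports SSE4.1 in ECX bit 19 and SSE4.2 in ECX bit 20.

```c
#include <stdint.h>

#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
#include <cpuid.h>          /* assumption: header available in this toolchain */
#define HAVE_CPUID_H 1
#endif

/* Decode feature bits from the ECX value returned by CPUID leaf 1. */
int ecx_has_sse41(uint32_t ecx) { return (int)((ecx >> 19) & 1u); }
int ecx_has_sse42(uint32_t ecx) { return (int)((ecx >> 20) & 1u); }

/* Returns 1 if the running CPU reports SSE4.1, otherwise 0.
 * Also returns 0 when CPUID cannot be queried in this build. */
int cpu_has_sse41(void)
{
#ifdef HAVE_CPUID_H
    unsigned int eax, ebx, ecx, edx;
    if (__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return ecx_has_sse41(ecx);
#endif
    return 0;
}
```

A caller could test `cpu_has_sse41()` once at startup and fall back to an SSE3 build of the routine when it returns 0, instead of hitting an illegal instruction error.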

You mentioned that the code crashed on some devices. What kind of devices (processors) are they?

It seems we may need to move the topic to the Intel Compiler Forum if you have a test case, to see if the compiler experts can give some hints.

Best Regards,
Ying
0 Kudos
Livikin_Alexey
Beginner
774 Views
Hi Ying,

I link both statically and dynamically. "Devices" in my post means any Polycom videoconferencing hardware or software endpoints. I use the H.264 encoder from the IPP samples with slice size -1300. With -msse4.1, endpoints stop decoding after high-motion scenes, and after that I see I-frame requests from the endpoints (I tried generating I-frames, but to no avail; playback is not restored). With -msse3, H.264 works fine as far as I can tell. But the same problem exists in the H.263 encoder (with -msse3); it appears very rarely, and maybe it is a problem with something else in my code. The H.263 encoder is also from the IPP samples.

Best Regards,
Alexey.
0 Kudos
Livikin_Alexey
Beginner
774 Views
After some tests I can add the following:
1. The problem exists with any optimization parameters if these parameters are set in the H.264 encoder:
mv_search_method = 0,1,2;
me_search_x = 4;
me_search_y = 4;
This is true for CIF resolution (352x288); at 4CIF (704x576) there is no problem...
2. If I set:
mv_search_method = 0,1,2;
me_search_x = 0;
me_search_y = 0;
then the problem exists with -msse4.1, and with -msse3 it goes away. So it looks like the problem is not only in optimization.
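
One way to turn observations like these into a small test case is to encode the same input with two builds (for example -msse3 and -msse4.1) and find where the two bitstreams first diverge. The sketch below assumes that both builds are expected to produce bit-exact output, which this thread does not confirm, and `first_mismatch` is a hypothetical helper, not an IPP function.

```c
#include <stddef.h>
#include <stdint.h>

/* Returns the offset of the first differing byte between two buffers,
 * -1 if they are identical, or the shorter length if one buffer is a
 * strict prefix of the other. */
long first_mismatch(const uint8_t *a, size_t na, const uint8_t *b, size_t nb)
{
    size_t n = na < nb ? na : nb;
    for (size_t i = 0; i < n; ++i) {
        if (a[i] != b[i])
            return (long)i;
    }
    return na == nb ? -1L : (long)n;
}
```

The returned offset can then be mapped back to a frame or slice in the encoder output to narrow down which input triggers the difference.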
0 Kudos