Cross-typed usage penalties

david_livshin1 · 06-12-2007

Hi,

The Intel 64 and IA-32 Architectures Optimization Reference Manual states (see section 5.1)
that

"Code sequences containing cross-typed usage produce the same result across
different implementations but incur a significant performance penalty. Using
SSE/SSE2/SSE3/SSSE3 instructions to operate on type-mismatched SIMD data
in the XMM register is strongly discouraged" (underlining mine).

Is there exact data on the performance penalties?

Specifically, what would be the penalty for mixing movhlps (a single-precision instruction) with addsd (a double-precision instruction), e.g.

movhlps %xmm1,%xmm2
addsd %xmm2,%xmm3

How much more efficient would it be to use the following instead:

unpckhpd %xmm1,%xmm1
addsd %xmm1,%xmm3

or

shufpd $1,%xmm1,%xmm1
addsd %xmm1,%xmm3

Thank you,

David Livshin

http://www.dalsoft.com
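The manual's point that cross-typed code still "produces the same result" can be checked directly. Below is a minimal sketch, assuming x86-64 and GCC or Clang (GNU inline assembly in the same AT&T syntax as the post). The function names add_high_movhlps, add_high_unpckhpd and add_high_shufpd are hypothetical. Each wraps one of the three sequences above and returns acc plus the high double of v. It says nothing about speed, only that the bits arrive intact.

```c
#include <emmintrin.h>  /* __m128d, _mm_set_sd, _mm_cvtsd_f64 (SSE2) */

/* First sequence (cross-typed): movhlps is a single-precision shuffle,
   but it only copies raw bits, so the high double arrives intact. */
double add_high_movhlps(__m128d v, double acc)
{
    __m128d a = _mm_set_sd(acc);   /* low lane = acc, high lane = 0.0 */
    __m128d tmp;
    __asm__("movhlps %2, %1\n\t"   /* tmp.low = v.high  (movhlps %xmm1,%xmm2) */
            "addsd   %1, %0"       /* a.low  += tmp.low (addsd %xmm2,%xmm3)   */
            : "+x"(a), "=&x"(tmp)
            : "x"(v));
    return _mm_cvtsd_f64(a);
}

/* Second sequence: unpckhpd copies the high double into both lanes. */
double add_high_unpckhpd(__m128d v, double acc)
{
    __m128d a = _mm_set_sd(acc);
    __asm__("unpckhpd %1, %1\n\t"  /* v = {v.high, v.high} */
            "addsd    %1, %0"      /* a.low += v.low */
            : "+x"(a), "+x"(v));
    return _mm_cvtsd_f64(a);
}

/* Third sequence: shufpd $1 swaps the two doubles. */
double add_high_shufpd(__m128d v, double acc)
{
    __m128d a = _mm_set_sd(acc);
    __asm__("shufpd $1, %1, %1\n\t" /* v = {v.high, v.low} */
            "addsd     %1, %0"      /* a.low += v.low */
            : "+x"(a), "+x"(v));
    return _mm_cvtsd_f64(a);
}
```

For example, with v = _mm_set_pd(2.5, 1.0) (high lane 2.5) and acc = 10.0, all three return 12.5. As in the post, unpckhpd and shufpd overwrite their source register while movhlps writes a separate one; here v is a by-value copy, so the caller's value is unaffected.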

Intel_Software_Netw1 · 06-12-2007

Our engineering contacts responded:

Cross-typed usage mostly means mixing vector integer and vector floating-point operations. Mixing packed single and packed double is OK. You can write a code sequence to measure this for specific examples. When you violate the rules, the penalty depends on which microarchitecture you are using, and it may increase on future processors. It is at least 1 clock on Intel Core 2 Duo processors, and in some cases you can pay it more than once.
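A measurement along those lines might look like the following minimal sketch, assuming x86-64 and GCC or Clang (GNU inline assembly, AT&T syntax). The names chain_movhlps, chain_unpckhpd, chain_shufpd, ns_per_iter and report are hypothetical. Each kernel runs a dependent chain in which the addsd result feeds the next shuffle, so the loop time reflects the latency of the pair, including any bypass delay between them. It reports nanoseconds per iteration from clock(), not clocks; converting to clocks means multiplying by the core frequency, which frequency scaling makes approximate.

```c
#include <assert.h>
#include <emmintrin.h>
#include <stdio.h>
#include <time.h>

/* Each kernel performs n dependent (shuffle, addsd) pairs on v.
   `one` holds 1.0 in its low lane; asm volatile keeps every iteration. */
double chain_movhlps(__m128d v, long n)
{
    const __m128d one = _mm_set_sd(1.0);
    for (long i = 0; i < n; ++i)
        __asm__ volatile("movhlps %0, %0\n\t"  /* low lane = high lane */
                         "addsd   %1, %0"      /* low lane += 1.0 */
                         : "+x"(v) : "x"(one));
    return _mm_cvtsd_f64(v);
}

double chain_unpckhpd(__m128d v, long n)
{
    const __m128d one = _mm_set_sd(1.0);
    for (long i = 0; i < n; ++i)
        __asm__ volatile("unpckhpd %0, %0\n\t" /* both lanes = high lane */
                         "addsd    %1, %0"
                         : "+x"(v) : "x"(one));
    return _mm_cvtsd_f64(v);
}

double chain_shufpd(__m128d v, long n)
{
    const __m128d one = _mm_set_sd(1.0);
    for (long i = 0; i < n; ++i)
        __asm__ volatile("shufpd $1, %0, %0\n\t" /* swap the two lanes */
                         "addsd     %1, %0"
                         : "+x"(v) : "x"(one));
    return _mm_cvtsd_f64(v);
}

/* Average CPU time per iteration in nanoseconds; stores the kernel's result. */
static double ns_per_iter(double (*kernel)(__m128d, long), long n, double *result)
{
    const __m128d v = _mm_set_pd(2.5, 1.0);
    clock_t t0 = clock();
    *result = kernel(v, n);
    clock_t t1 = clock();
    return (double)(t1 - t0) * 1e9 / CLOCKS_PER_SEC / (double)n;
}

void report(long n)
{
    double r1, r2, r3;
    printf("movhlps  + addsd: %.3f ns/iter\n", ns_per_iter(chain_movhlps, n, &r1));
    printf("unpckhpd + addsd: %.3f ns/iter\n", ns_per_iter(chain_unpckhpd, n, &r2));
    printf("shufpd   + addsd: %.3f ns/iter\n", ns_per_iter(chain_shufpd, n, &r3));
    /* movhlps and unpckhpd leave the same value in the low lane. */
    assert(r1 == r2);
    (void)r3;
}
```

Calling report with a large count (say 100000000) from main gives per-iteration times; any gap between the movhlps line and the unpckhpd line is the cost of the single-precision shuffle feeding addsd on that particular machine, if there is one.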

==

Lexi S.

Intel Software Network Support

http://www.intel.com/software
