Hi,
The Intel 64 and IA-32 Architectures Optimization Reference Manual states (see 5.1) that
"Code sequences containing cross-typed usage produce the same result across
different implementations but incur a significant performance penalty. Using
SSE/SSE2/SSE3/SSSE3 instructions to operate on type-mismatched SIMD data
in the XMM register is strongly discouraged". (Underline is mine.)
Is there exact data on the performance penalties?
Specifically, what would be the penalty of mixing movhlps (single-precision type) with addsd (double-precision type), e.g.
movhlps %xmm1,%xmm2
addsd %xmm2,%xmm3
How much more efficient would it be to use the following instead
unpckhpd %xmm1,%xmm1
addsd %xmm1,%xmm3
or
shufpd $1,%xmm1,%xmm1
addsd %xmm1,%xmm3
Thank you,
David Livshin
Our engineering contacts responded:
Cross-typed usage (mostly) means mixing vector integer and vector floating-point instructions. Mixing packed single and packed double is OK. One can write a code sequence to measure this for specific examples. When you violate the rules, the penalty depends on which microarchitecture you are using, and it may increase in future machines. It is at least 1 clock on Intel Core 2 Duo processors, and in some cases you can pay it more than once.
==
Lexi S.
Intel Software Network Support
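
The suggestion above to write a code sequence and measure can be sketched roughly as follows. This is only a minimal sketch, not something from the reply: the function names (chain_movhlps, chain_unpckhpd, seconds_for) are made up for illustration, and it assumes an x86 compiler with SSE2 intrinsics. Each loop puts the high-half extraction and the addsd on the same loop-carried dependency chain, so any extra latency between the two instructions should show up in the total time. Compilers are free to pick different instructions than the intrinsic names suggest, so the generated assembly should be checked before trusting any numbers.

```c
#include <emmintrin.h>  /* SSE2 intrinsics (also provides the SSE ones) */
#include <time.h>

/* Hypothetical helper: extract the high double with movhlps, a
   single-precision-typed move, and feed it into addsd each iteration. */
double chain_movhlps(double lo, double hi, long iters)
{
    __m128d x = _mm_set_pd(hi, lo);      /* _mm_set_pd takes the high element first */
    for (long i = 0; i < iters; i++) {
        __m128 f = _mm_castpd_ps(x);     /* reinterpret only, no instruction */
        __m128d t = _mm_castps_pd(_mm_movehl_ps(f, f)); /* t = {hi, hi} */
        x = _mm_add_sd(t, x);            /* x = {hi + lo, hi} */
    }
    return _mm_cvtsd_f64(x);             /* low element: lo + iters * hi */
}

/* Same chain, but the extraction uses unpckhpd, a double-precision-typed shuffle. */
double chain_unpckhpd(double lo, double hi, long iters)
{
    __m128d x = _mm_set_pd(hi, lo);
    for (long i = 0; i < iters; i++) {
        __m128d t = _mm_unpackhi_pd(x, x);  /* t = {hi, hi} */
        x = _mm_add_sd(t, x);               /* x = {hi + lo, hi} */
    }
    return _mm_cvtsd_f64(x);
}

/* Hypothetical timing wrapper: CPU seconds for one run of the given chain.
   The result is stored through *out so the loop cannot be discarded. */
double seconds_for(double (*chain)(double, double, long), long iters, double *out)
{
    clock_t start = clock();
    *out = chain(1.0, 2.0, iters);
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

Calling seconds_for once with chain_movhlps and once with chain_unpckhpd, using the same large iteration count, and dividing the difference by the iteration count gives a rough per-iteration cost difference on the machine at hand. Both functions compute the same value, so the stored results can double as a sanity check.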