Newbie: SSE with integers

spertulo · 11-16-2010

I'm new to SSE and now I want to perform integer calculations with it. All the examples I found on the net use floating-point arithmetic. The operations themselves are not the problem, for example PSUBD for 4 x DWORD. But I don't know how to load/store four DWORDs into/from an XMM register. For floats, MOVAPS XMMn, m128 is usually used for loading. Does this instruction load integer values correctly? I'm confused because the Intel documentation gives the name of the instruction as

"Move Aligned Packed Single-Precision Floating-Point Values"

so it seems to be intended explicitly for floats, and the documentation does not say whether this instruction can also load integers.

Thanks
spertulo

Nicolae_P_Intel · 11-16-2010

Have you had a look at MOVDQA (for 16-byte aligned values) or MOVDQU (for unaligned values)?
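A minimal sketch of that suggestion in C with SSE2 intrinsics, using the PSUBD operation from the question. The function names `sub4_aligned` and `sub4_unaligned` are hypothetical, made up for this example. Compilers typically translate `_mm_load_si128`/`_mm_store_si128` to MOVDQA and `_mm_loadu_si128`/`_mm_storeu_si128` to MOVDQU, but they are free to pick an equivalent encoding (for example the VEX forms when AVX is enabled).

```c
#include <emmintrin.h> /* SSE2 integer intrinsics */
#include <stdint.h>

/* out[i] = a[i] - b[i] for i = 0..3 using aligned loads/stores (MOVDQA).
   All three pointers must be 16-byte aligned, otherwise the access faults. */
void sub4_aligned(const int32_t *a, const int32_t *b, int32_t *out)
{
    __m128i va = _mm_load_si128((const __m128i *)a);  /* movdqa */
    __m128i vb = _mm_load_si128((const __m128i *)b);  /* movdqa */
    __m128i vd = _mm_sub_epi32(va, vb);               /* psubd: 4 x DWORD */
    _mm_store_si128((__m128i *)out, vd);              /* movdqa */
}

/* Same computation with unaligned loads/stores (MOVDQU): any int32_t pointer works. */
void sub4_unaligned(const int32_t *a, const int32_t *b, int32_t *out)
{
    __m128i va = _mm_loadu_si128((const __m128i *)a); /* movdqu */
    __m128i vb = _mm_loadu_si128((const __m128i *)b); /* movdqu */
    __m128i vd = _mm_sub_epi32(va, vb);               /* psubd */
    _mm_storeu_si128((__m128i *)out, vd);             /* movdqu */
}
```

For the aligned variant, the buffers can be declared with C11 alignment, e.g. `_Alignas(16) int32_t a[4];`.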

spertulo · 11-16-2010

Yes, I know these instructions: they load/store quadwords. For example,
movdqa XMM0,MemAddr
will load
bits 0-31 at MemAddr into the second dword, i.e. into bits 32-63 of the XMM register,
bits 32-63 at MemAddr into bits 0-31 of the register,
bits 64-95 at MemAddr into bits 96-127 of the register,
bits 96-127 at MemAddr into bits 64-95 of the register.

In other words, DWORDs 0 and 1 will be interchanged, and so will DWORDs 2 and 3.
Now I understand that this does not matter, because the store puts the data back in the right order.

Example:
movdqa XMM0, Mem1 ; interchanges
movdqa XMM1, Mem2 ; interchanges in the same way
paddd XMM0, XMM1 ; adds the right pairs, i.e. both equally interchanged
movdqa Mem1, XMM0 ; interchanges back on the store, so the result is correct

But the shorter variant does not work correctly:

movdqa XMM0, Mem1 ; interchanges
paddd XMM0, Mem2 ; adds the wrong pairs, because one operand is interchanged and the other is not
movdqa Mem1, XMM0 ; this time stores an invalid result
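The two sequences above can be written in C with SSE2 intrinsics to compare them directly. This is a sketch with hypothetical function names. `add4_mem_operand` dereferences the second pointer as a `__m128i`, which an optimizing compiler will usually fold into a `paddd xmm0, [mem2]` memory operand, although it is not obliged to. Either way `_mm_add_epi32` adds lane i of one operand to lane i of the other, and, as the later replies point out, MOVDQA does not reorder lanes, so both functions leave the same sums in `mem1`.

```c
#include <emmintrin.h> /* SSE2 integer intrinsics */
#include <stdint.h>

/* Long variant: both operands loaded into registers, then paddd reg, reg.
   mem1 and mem2 must be 16-byte aligned. */
void add4_two_loads(int32_t *mem1, const int32_t *mem2)
{
    __m128i x0 = _mm_load_si128((const __m128i *)mem1); /* movdqa XMM0, Mem1 */
    __m128i x1 = _mm_load_si128((const __m128i *)mem2); /* movdqa XMM1, Mem2 */
    x0 = _mm_add_epi32(x0, x1);                         /* paddd  XMM0, XMM1 */
    _mm_store_si128((__m128i *)mem1, x0);               /* movdqa Mem1, XMM0 */
}

/* Short variant: the second operand is read straight from memory, which the
   compiler may fold into paddd XMM0, Mem2. mem2 must be 16-byte aligned. */
void add4_mem_operand(int32_t *mem1, const int32_t *mem2)
{
    __m128i x0 = _mm_load_si128((const __m128i *)mem1);
    x0 = _mm_add_epi32(x0, *(const __m128i *)mem2);
    _mm_store_si128((__m128i *)mem1, x0);
}
```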

Thanks for your help

xift · 11-16-2010

Are you sure that MOVDQA interleaves the dwords?

I don't think so!

The instruction set reference in the manual doesn't say anything about that either!

spertulo · 11-16-2010

Yes, I'm sure, because numbers are represented in big-endian order in memory and in little-endian order in registers.

0xr · 11-18-2010

You're wrong. From the hardware perspective, numbers in memory are always in little-endian byte order, at least on x86/x64 architectures. You can, of course, put them there in big-endian order yourself, but that's a different story...
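A quick way to see the byte order for yourself is to copy a 32-bit value into a byte array. A minimal, portable C sketch (the helper name `dword_bytes` is hypothetical): on x86/x64, which is little-endian, the byte at the lowest address is the least significant one.

```c
#include <stdint.h>
#include <string.h>

/* Copy the in-memory representation of v into out[0..3], lowest address first. */
void dword_bytes(uint32_t v, unsigned char out[4])
{
    memcpy(out, &v, sizeof v);
}
```

On x86/x64, `dword_bytes(0x11223344u, b)` leaves `b` holding 0x44, 0x33, 0x22, 0x11.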

There's absolutely no shuffling in MOVDQA/MOVDQU. In addition, these instructions don't move quadwords but double quadwords ("DQ", 128 bits, the whole XMM register). But that's irrelevant. Thanks to the little-endian layout you can use them for all data types (BYTE, WORD, ..., DQWORD) without any trouble.
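The "no shuffling" point can be checked with SSE2 intrinsics. In this sketch (function names hypothetical), `lanes_after_load` loads four dwords with an unaligned load and reads each 32-bit lane back: `_mm_srli_si128` shifts the whole register right by a byte count (PSRLDQ) and `_mm_cvtsi128_si32` (MOVD) returns bits 0-31, so lane k comes from bits 32k to 32k+31. `word5_after_load` does the same for 16-bit elements with PEXTRW, illustrating that the same load works for other element widths.

```c
#include <emmintrin.h> /* SSE2 integer intrinsics */
#include <stdint.h>

/* Load four dwords and read back each 32-bit lane of the XMM register. */
void lanes_after_load(const int32_t src[4], int32_t lanes[4])
{
    __m128i v = _mm_loadu_si128((const __m128i *)src);   /* movdqu */
    lanes[0] = _mm_cvtsi128_si32(v);                     /* bits 0-31   */
    lanes[1] = _mm_cvtsi128_si32(_mm_srli_si128(v, 4));  /* bits 32-63  */
    lanes[2] = _mm_cvtsi128_si32(_mm_srli_si128(v, 8));  /* bits 64-95  */
    lanes[3] = _mm_cvtsi128_si32(_mm_srli_si128(v, 12)); /* bits 96-127 */
}

/* Load eight 16-bit words and return word 5 (bits 80-95, zero-extended). */
int word5_after_load(const uint16_t src[8])
{
    __m128i v = _mm_loadu_si128((const __m128i *)src);   /* movdqu */
    return _mm_extract_epi16(v, 5);                      /* pextrw */
}
```

If the load swapped neighbouring dwords, `lanes[0]` would come back as `src[1]`; it comes back as `src[0]`.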

You can use their FP counterparts as well. On current CPUs (and very, very likely on all future ones) there's no real functional or performance difference between MOVAPS, MOVAPD and MOVDQA; ORPS, ORPD and POR, etc. The only difference is in the encoding, where the "PS" instructions are one byte shorter, which may slightly improve performance under some circumstances (but I haven't measured it yet :)). However, the Intel manuals advise using the FP variants for FP data and the integer variants for integers.
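A sketch of the MOVAPS/MOVDQA point in C (function names hypothetical). `_mm_load_ps` normally compiles to MOVAPS and `_mm_load_si128` to MOVDQA; `_mm_castps_si128` only reinterprets the register type and generates no instruction. A plain load is a bit copy, so even integer patterns that would be NaNs as floats come through unchanged. An optimizing compiler may merge the two loads into one, so this checks the semantics rather than timing the instructions. Whether mixing the integer and floating-point domains costs extra latency on a particular CPU is a separate question (Intel's optimization manual discusses bypass delays between execution domains).

```c
#include <emmintrin.h> /* SSE2; also provides the SSE float intrinsics */
#include <stdint.h>

/* p must point to four 16-byte aligned int32_t values.
   Returns 1 if loading them as packed singles (movaps) and as packed
   integers (movdqa) gives bitwise-identical register contents. */
int same_bits_ps_vs_dq(const int32_t *p)
{
    __m128i via_ps = _mm_castps_si128(_mm_load_ps((const float *)p)); /* movaps; cast is free */
    __m128i via_dq = _mm_load_si128((const __m128i *)p);              /* movdqa */
    __m128i eq = _mm_cmpeq_epi8(via_ps, via_dq);  /* 0xFF in each byte that matches */
    return _mm_movemask_epi8(eq) == 0xFFFF;       /* all 16 bytes equal */
}

/* Integer add (paddd) whose operands are loaded with the float load (movaps).
   a, b and out must be 16-byte aligned. */
void add4_via_movaps(const int32_t *a, const int32_t *b, int32_t *out)
{
    __m128i va = _mm_castps_si128(_mm_load_ps((const float *)a));
    __m128i vb = _mm_castps_si128(_mm_load_ps((const float *)b));
    _mm_store_si128((__m128i *)out, _mm_add_epi32(va, vb));
}
```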