Christian_M_2
Beginner
185 Views

load and loadu - alignment

Hello,

For loading data there are always both load and loadu intrinsics. load accepts only aligned addresses, whereas loadu works with aligned and unaligned addresses alike.
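To make the distinction concrete, here is a minimal sketch using the SSE single-precision pair `_mm_load_ps` / `_mm_loadu_ps`. The helper names `sum4_aligned` and `sum4_unaligned` are made up for illustration and are not part of any library; the 16-byte requirement applies to the 128-bit SSE variant shown here.

```c
#include <assert.h>
#include <stdint.h>
#include <xmmintrin.h>  /* SSE: _mm_load_ps, _mm_loadu_ps, _mm_storeu_ps */

/* Hypothetical helper: sum four floats starting at p.
   p must be 16-byte aligned, because _mm_load_ps may fault otherwise. */
static float sum4_aligned(const float *p)
{
    assert(((uintptr_t)p % 16) == 0);
    __m128 v = _mm_load_ps(p);      /* aligned load */
    float out[4];
    _mm_storeu_ps(out, v);
    return out[0] + out[1] + out[2] + out[3];
}

/* Hypothetical helper: same sum, but p may have any alignment. */
static float sum4_unaligned(const float *p)
{
    __m128 v = _mm_loadu_ps(p);     /* unaligned load, also valid on aligned p */
    float out[4];
    _mm_storeu_ps(out, v);
    return out[0] + out[1] + out[2] + out[3];
}
```

Calling `sum4_unaligned` on an aligned buffer is always valid; calling `sum4_aligned` on `buf + 1` is not, which is why the assertion is there.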

But what about performance? According to the Intel Intrinsics Guide, the latency and throughput of both instructions are the same. What happens if loadu is executed on an aligned address? Do I get the same performance as with load? Or is loadu slower regardless of the actual alignment of the given address?

Thanks for any hints!

3 Replies
TimP
Black Belt
185 Views

On recent Intel CPUs (those that support SSE4), an unaligned load is supposed to be as fast as an aligned one. Beginning with the Intel 12.0 compilers, the aligned instructions are no longer used for SSE4 or AVX, even when generating code that requires alignment. For AVX code where alignment is expected, the compilers use 256-bit vmovups/vmovupd; when alignment is unknown, the moves are split into pairs of 128-bit AVX loads. On the Ivy Bridge corei7-3 CPUs, a 256-bit unaligned load should be faster than a pair of 128-bit loads regardless of alignment, but I haven't seen a compiler make that distinction.
Bernard
Black Belt
185 Views

@Tim: While coding in inline assembly, can I issue this instruction: "add xmm0,[eax+offset]", where the eax register holds a pointer to an aligned SoA? I have a Core i3 CPU. Thanks in advance.
Christian_M_2
Beginner
185 Views

@Tim: Thank you for your explanation! I did some tests and found that aligned and unaligned loads reach nearly the same speed.
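A test of this kind could look roughly like the sketch below. This is not the code used for the measurement above; the function names `sum_load`, `sum_loadu`, and `time_sum` are hypothetical, and `clock()` is only a coarse timer, so the numbers should be read as indicative at best.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>
#include <xmmintrin.h>  /* SSE intrinsics */

/* Sum n floats with aligned loads; p must be 16-byte aligned, n a multiple of 4. */
static float sum_load(const float *p, size_t n)
{
    assert(n % 4 == 0);
    __m128 acc = _mm_setzero_ps();
    for (size_t i = 0; i < n; i += 4)
        acc = _mm_add_ps(acc, _mm_load_ps(p + i));
    float out[4];
    _mm_storeu_ps(out, acc);
    return out[0] + out[1] + out[2] + out[3];
}

/* Same sum with unaligned loads; p may have any alignment. */
static float sum_loadu(const float *p, size_t n)
{
    assert(n % 4 == 0);
    __m128 acc = _mm_setzero_ps();
    for (size_t i = 0; i < n; i += 4)
        acc = _mm_add_ps(acc, _mm_loadu_ps(p + i));
    float out[4];
    _mm_storeu_ps(out, acc);
    return out[0] + out[1] + out[2] + out[3];
}

/* Run f(p, n) reps times and return the elapsed CPU time in seconds.
   The accumulated result is written to *sink so the calls are not optimized away. */
static double time_sum(float (*f)(const float *, size_t),
                       const float *p, size_t n, int reps, float *sink)
{
    float s = 0.0f;
    clock_t t0 = clock();
    for (int r = 0; r < reps; ++r)
        s += f(p, n);
    *sink = s;
    return (double)(clock() - t0) / CLOCKS_PER_SEC;
}
```

Comparing `time_sum(sum_load, buf, n, reps, &s)` with `time_sum(sum_loadu, buf, n, reps, &s)` on the same aligned buffer addresses the original question; passing `buf + 1` to `sum_loadu` gives the genuinely unaligned case.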