Intel® ISA Extensions

load and loadu - alignment

Christian_M_2
Beginner
1,827 Views

Hello,

For fetching data there are always both load and loadu intrinsics. load accepts only aligned addresses, while loadu works with both aligned and unaligned addresses.
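As a minimal sketch of that difference, assuming the 128-bit SSE variants `_mm_load_ps` and `_mm_loadu_ps` (the post does not say which vector width is meant), the helper names `is_aligned16`, `load4_aligned`, `load4_any` and `hsum4` below are hypothetical and exist only for illustration:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <xmmintrin.h>  /* SSE: _mm_load_ps, _mm_loadu_ps */

/* Returns nonzero if p is 16-byte aligned, which _mm_load_ps requires. */
static int is_aligned16(const void *p) {
    return ((uintptr_t)p & 15u) == 0;
}

/* Aligned load: the caller must pass a 16-byte aligned pointer. */
static __m128 load4_aligned(const float *p) {
    assert(is_aligned16(p));
    return _mm_load_ps(p);
}

/* Unaligned load: valid for any float pointer, aligned or not. */
static __m128 load4_any(const float *p) {
    return _mm_loadu_ps(p);
}

/* Adds the four lanes of v so the result is easy to inspect. */
static float hsum4(__m128 v) {
    float lanes[4];
    _mm_storeu_ps(lanes, v);
    return (lanes[0] + lanes[1]) + (lanes[2] + lanes[3]);
}
```

With a 16-byte aligned array `buf`, both `load4_aligned(buf)` and `load4_any(buf)` are valid, whereas only `load4_any` may be used on `buf + 1`.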

But what about performance? According to the Intel Intrinsics Guide, the latency and throughput of both instructions are the same. What happens if loadu is executed on an aligned address? Do I get the same performance as with load, or is loadu slower regardless of the actual alignment of the given address?

Thanks for any hints!

3 Replies
TimP
Honored Contributor III
On recent Intel CPU models (those which support SSE4), an unaligned load instruction is supposed to be as fast as an aligned one. Beginning with the Intel 12.0 compilers, the aligned instructions aren't used for SSE4 or AVX, even when generating code which requires alignment. For AVX code where alignment is expected, the compilers use AVX-256 vmovups/vmovupd, while the moves are split into AVX-128 pairs when alignment is unknown. On the Ivy Bridge (Core i7-3xxx) CPUs, an AVX-256 unaligned load should be faster than a pair of AVX-128 loads, regardless of alignment, but I haven't seen a compiler make that distinction.
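The two code shapes described above can be sketched at the intrinsics level. This is only an illustration, not actual compiler output: `copy8_split` and `copy8_wide` are hypothetical names, and the 256-bit path is compiled only when the build enables AVX (the compiler then defines `__AVX__`); otherwise the sketch falls back to the split form.

```c
#include <assert.h>
#include <stddef.h>
#include <immintrin.h>  /* SSE and, when enabled, AVX intrinsics */

/* Moves 8 floats as a pair of 128-bit unaligned accesses. */
static void copy8_split(float *dst, const float *src) {
    assert(dst != NULL && src != NULL);
    _mm_storeu_ps(dst,     _mm_loadu_ps(src));
    _mm_storeu_ps(dst + 4, _mm_loadu_ps(src + 4));
}

/* Moves 8 floats with a single 256-bit unaligned access when AVX is
   enabled at compile time; otherwise reuses the split version. */
static void copy8_wide(float *dst, const float *src) {
    assert(dst != NULL && src != NULL);
#ifdef __AVX__
    _mm256_storeu_ps(dst, _mm256_loadu_ps(src));
#else
    copy8_split(dst, src);
#endif
}
```

Both functions accept any float pointer, so timing them against each other on aligned and misaligned buffers is one way to check the claim on a given CPU.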
Bernard
Valued Contributor I
@Tim: While coding in inline assembly, can I issue an instruction like "add xmm0,[eax+offset]", where the eax register holds a pointer to an aligned SoA? I have a Core i3 CPU. Thanks in advance.
Christian_M_2
Beginner
@Tim: Thank you for your explanation! I did some tests and found that aligned and unaligned loads reach nearly the same speed.
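The post does not show the tests, so the following is only a rough sketch of how such a comparison could be set up, assuming 128-bit SSE loads and the coarse `clock()` timer. The names `sum_pass` and `time_passes` are hypothetical. An optimizing compiler may hoist the repeated pass out of the timing loop, so real measurements need a more careful harness.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>
#include <xmmintrin.h>

/* One pass over n floats (n must be a multiple of 4) using either the
   aligned or the unaligned load. With use_aligned_load set, p must be
   16-byte aligned. */
static float sum_pass(const float *p, size_t n, int use_aligned_load) {
    assert(n % 4 == 0);
    __m128 acc = _mm_setzero_ps();
    for (size_t i = 0; i < n; i += 4) {
        __m128 v = use_aligned_load ? _mm_load_ps(p + i)
                                    : _mm_loadu_ps(p + i);
        acc = _mm_add_ps(acc, v);
    }
    float lanes[4];
    _mm_storeu_ps(lanes, acc);
    return (lanes[0] + lanes[1]) + (lanes[2] + lanes[3]);
}

/* Runs reps passes and returns the elapsed CPU time in seconds.
   The sum of the last pass is written to *sum_out. */
static double time_passes(const float *p, size_t n, int reps,
                          int use_aligned_load, float *sum_out) {
    volatile float sink = 0.0f;  /* keeps the result observable */
    clock_t t0 = clock();
    for (int r = 0; r < reps; ++r)
        sink = sum_pass(p, n, use_aligned_load);
    clock_t t1 = clock();
    *sum_out = sink;
    return (double)(t1 - t0) / CLOCKS_PER_SEC;
}
```

Calling `time_passes` three times (aligned load on an aligned buffer, unaligned load on the same buffer, unaligned load on the buffer offset by one float) separates the cost of the instruction from the cost of actual misalignment.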