I am new to AVX and would like to understand why loading data in AVX1 requires 32-byte address alignment. Is it because the CPU can read 32 bytes of data at a time? (My limited understanding is that the CPU reads 8 bytes at a time.) Also, why does an AVX2 load function such as _mm256_adds_epi16 not require memory alignment?
I would be very grateful if someone could guide me.
Thank you for posting in Intel Communities.
>>"can it be that the cpu can read 32 bytes of data at a time"
Intel® Advanced Vector Extensions (Intel® AVX) instructions use 256-bit (32-byte) registers, which are extensions of the 128-bit SIMD registers. Because each of these registers can hold more than one data element, the processor can process more than one data element simultaneously. This processing capability is known as single-instruction, multiple-data (SIMD) processing. For more information, refer to the link below:
For the other questions, we will get back to you soon.
Thanks & Regards,
_mm256_adds_epi16 takes two register arguments as input. It has no memory operand, and therefore no alignment requirement.
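To make the register-only point concrete, here is a minimal sketch of a saturating 16-bit add. The memory accesses are done by separate load/store intrinsics (the unaligned forms here), while _mm256_adds_epi16 only ever sees __m256i values. The helper name add_sat_i16 is made up for this example, and the target attribute assumes GCC or Clang on an x86 CPU that supports AVX2.

```c
#include <assert.h>
#include <immintrin.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: out[i] = saturate_int16(a[i] + b[i]) for n elements.
   The target attribute (GCC/Clang assumption) enables AVX2 for this function
   without needing -mavx2 on the command line. */
__attribute__((target("avx2")))
void add_sat_i16(const int16_t *a, const int16_t *b, int16_t *out, size_t n)
{
    assert(n == 0 || (a != NULL && b != NULL && out != NULL));

    size_t i = 0;
    /* 16 int16 values fit in one 256-bit register. */
    for (; i + 16 <= n; i += 16) {
        /* Memory access happens here; loadu has no alignment requirement. */
        __m256i va = _mm256_loadu_si256((const __m256i *)(a + i));
        __m256i vb = _mm256_loadu_si256((const __m256i *)(b + i));
        /* Register-to-register operation: no address is involved at all. */
        __m256i vs = _mm256_adds_epi16(va, vb);
        _mm256_storeu_si256((__m256i *)(out + i), vs);
    }
    /* Scalar tail for the remaining elements. */
    for (; i < n; ++i) {
        int32_t s = (int32_t)a[i] + (int32_t)b[i];
        if (s > INT16_MAX) s = INT16_MAX;
        if (s < INT16_MIN) s = INT16_MIN;
        out[i] = (int16_t)s;
    }
}
```

Whether the data is aligned only matters for the load and store lines; swapping them for the aligned forms would add the 32-byte requirement without changing the add itself.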
There are two major groups of AVX load/store intrinsics: one for aligned data and one for unaligned data.
_mm256_load_si256 (__m256i const * mem_addr)
Load 256-bits of integer data from memory into dst.
mem_addr must be aligned on a 32-byte boundary or a general-protection exception may be generated.
_mm256_loadu_si256 (__m256i const * mem_addr)
Load 256-bits of integer data from memory into dst. mem_addr does not need to be aligned on any particular boundary.
Depending on CPU architecture, the unaligned load/store can take longer to execute.
The Intel Intrinsics Guide is quite helpful.
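As a rough sketch of how the two load forms described above might be used side by side, the helper below checks the address and picks the aligned form only when it is safe to do so. The names is_aligned_32 and sum8_i32 are invented for this example, and the target attribute again assumes GCC or Clang on an AVX-capable x86 CPU.

```c
#include <assert.h>
#include <immintrin.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: nonzero if p lies on a 32-byte boundary. */
int is_aligned_32(const void *p)
{
    return ((uintptr_t)p & 31u) == 0;
}

/* Hypothetical helper: sum of the 8 packed int32 values starting at p. */
__attribute__((target("avx")))
int32_t sum8_i32(const int32_t *p)
{
    assert(p != NULL);

    /* _mm256_load_si256 may fault on a misaligned address,
       so it is only used when the address has been checked. */
    __m256i v = is_aligned_32(p)
        ? _mm256_load_si256((const __m256i *)p)    /* aligned form   */
        : _mm256_loadu_si256((const __m256i *)p);  /* unaligned form */

    /* Spill the register to a small array and add the lanes in scalar code. */
    int32_t lanes[8];
    _mm256_storeu_si256((__m256i *)lanes, v);

    int32_t s = 0;
    for (int i = 0; i < 8; ++i) s += lanes[i];
    return s;
}
```

In real code you would normally just pick one form per buffer rather than branch on every load; the branch here is only to show both calls next to each other.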
Thanks for your reply.
I'm sorry, I made a slip in writing. What I really meant is that _mm256_load_epi32 does not seem to require memory alignment, and that confuses me. I think alignment is for better performance, but why must it be 32-byte address alignment? How does the CPU get better performance from 32-byte address alignment?
_mm256_load_epi32 (void const* mem_addr) loads 256 bits (composed of 8 packed 32-bit integers) from memory into dst. mem_addr must be aligned on a 32-byte boundary or a general-protection exception may be generated.
For more information, you can refer to the intrinsic guide https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#=undefined&ig_expand=5628,4....
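If it helps, one way to satisfy the 32-byte requirement for heap data is to request the alignment at allocation time. The sketch below uses C11 aligned_alloc; the helper name alloc_i32_aligned32 is made up for illustration, and it assumes a C library that provides aligned_alloc (for example, glibc).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: allocate room for `count` int32 values on a
   32-byte boundary. Returns NULL on failure; release with free(). */
int32_t *alloc_i32_aligned32(size_t count)
{
    size_t bytes = count * sizeof(int32_t);

    /* aligned_alloc expects the size to be a multiple of the alignment,
       so round up to the next multiple of 32 (at least 32). */
    size_t rounded = (bytes + 31u) & ~(size_t)31u;
    if (rounded == 0) rounded = 32;

    int32_t *p = (int32_t *)aligned_alloc(32, rounded);

    /* A successful allocation is suitable for the aligned load intrinsics. */
    assert(p == NULL || ((uintptr_t)p % 32u) == 0);
    return p;
}
```

A pointer returned this way can be passed to the aligned load intrinsics as long as you only advance it in steps of 32 bytes (8 int32 values).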
We haven't heard back from you for a long time so we are assuming that the provided details helped you in solving your problem. If you require additional assistance from Intel, please start a new thread. Any further interaction in this thread will be considered community only.