Intel® C++ Compiler
Support and discussions for creating C++ code that runs on platforms based on Intel® processors.

Why do intrinsics like _mm256_load_pd need 32-byte address alignment instead of 8-byte?

might
Beginner
568 Views

I am a newbie learning AVX. I want to know why loading data in AVX1 requires 32-byte address alignment. Is it because the CPU can read 32 bytes of data at a time? (My shallow understanding is that the CPU reads 8 bytes at a time.) Also, why does a load function in AVX2 such as _mm256_adds_epi16 not require memory alignment?

I would be very grateful if someone could guide me.

0 Kudos
5 Replies
HemanthCH_Intel
Moderator
537 Views

Hi,

 

Thank you for posting in Intel Communities.

 

>>"can it be that the cpu can read 32 bytes of data at a time"

Intel® Advanced Vector Extensions (Intel® AVX) instructions use 256-bit (32-byte) registers, which are extensions of the 128-bit SIMD registers. Because each of these registers can hold more than one data element, the processor can process more than one data element simultaneously. This capability is known as single-instruction, multiple-data (SIMD) processing. For more information, refer to the link below:

https://www.intel.com/content/www/us/en/develop/documentation/cpp-compiler-developer-guide-and-refer...

 

For the other questions, we will get back to you soon.

 

Thanks & Regards,

Hemanth

 

jimdempseyatthecove
Black Belt
514 Views

_mm256_adds_epi16 takes two register arguments as input, so it makes no memory reference and has no alignment requirement.

There are two major groupings of AVX load/store intrinsics: one for aligned data and one for unaligned data.

For example:

_mm256_load_si256 (__m256i const * mem_addr)

   Load 256-bits of integer data from memory into dst.

   mem_addr must be aligned on a 32-byte boundary or a general-protection exception may be generated.

 

_mm256_loadu_si256 (__m256i const * mem_addr)

    Load 256-bits of integer data from memory into dst. mem_addr does not need to be aligned on any particular boundary.
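
To make the difference between the two groupings concrete, here is a minimal sketch using the double-precision counterparts _mm256_load_pd and _mm256_loadu_pd. The helper names is_aligned_32 and sum4 and the buffer demo_buf are hypothetical, made up for illustration. The scalar fallback is only there so the sketch still builds when AVX is not enabled at compile time.

```cpp
#include <cassert>
#include <cstdint>
#if defined(__AVX__)
#include <immintrin.h>
#endif

// Hypothetical helper: true when p sits on a 32-byte boundary.
inline bool is_aligned_32(const void* p) {
    return reinterpret_cast<std::uintptr_t>(p) % 32 == 0;
}

// Hypothetical helper: sums the four doubles starting at p.
// The aligned load is used only when the address permits it.
inline double sum4(const double* p) {
#if defined(__AVX__)
    __m256d v = is_aligned_32(p)
                    ? _mm256_load_pd(p)    // requires a 32-byte-aligned address
                    : _mm256_loadu_pd(p);  // accepts any address
    alignas(32) double out[4];
    _mm256_store_pd(out, v);               // out is 32-byte aligned, so this is safe
    return out[0] + out[1] + out[2] + out[3];
#else
    // Scalar fallback (assumption: AVX was not enabled for this build).
    return p[0] + p[1] + p[2] + p[3];
#endif
}

// Hypothetical demo data: the array start is 32-byte aligned,
// so demo_buf + 1 (8 bytes further on) is not.
alignas(32) static const double demo_buf[8] = {1, 2, 3, 4, 5, 6, 7, 8};
```

Calling sum4(demo_buf) takes the aligned path, while sum4(demo_buf + 1) takes the unaligned one. Passing demo_buf + 1 directly to _mm256_load_pd could raise the general-protection exception described above.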

 

Depending on CPU architecture, the unaligned load/store can take longer to execute.
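
One reason an unaligned access can cost more is that a 32-byte load from an arbitrary address may straddle two cache lines, whereas a 32-byte-aligned load never does. The sketch below just does that arithmetic. The 64-byte line size is an assumption (it is common on current x86 processors but is not stated in this thread), and the function names are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>

constexpr std::size_t kCacheLine = 64;  // assumed cache-line size in bytes
constexpr std::size_t kVecBytes  = 32;  // size of one 256-bit load

// True if the 32 bytes starting at addr span two cache lines.
constexpr bool crosses_cache_line(std::uintptr_t addr) {
    return (addr % kCacheLine) + kVecBytes > kCacheLine;
}

// Counts how many start offsets within one cache line, taken every
// `step` bytes, would produce a load that spans two lines.
inline int count_splits(std::size_t step) {
    int n = 0;
    for (std::size_t off = 0; off < kCacheLine; off += step) {
        if (crosses_cache_line(off)) ++n;
    }
    return n;
}
```

With 32-byte steps no load is split. With 8-byte steps (the natural alignment of a double) the loads starting at offsets 40, 48, and 56 are split.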

The Intel Intrinsics Guide is quite helpful. 

Jim Dempsey

might
Beginner
492 Views

Thanks for your reply.

I'm sorry, I made a slip in writing. What I really meant is that mm256_load_epi32 does not require memory alignment, and that confuses me. I think the alignment is for better performance, but why must it be 32-byte address alignment? How does the CPU get better performance from 32-byte alignment?

cw_intel
Moderator
413 Views

Hi,


_mm256_load_epi32 (void const* mem_addr) loads 256 bits (composed of 8 packed 32-bit integers) from memory into dst. mem_addr must be aligned on a 32-byte boundary, or a general-protection exception may be generated.
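
If the data lives in a buffer whose alignment you do not control, one common way to meet the 32-byte requirement is to round the address up to the next 32-byte boundary. The following is a minimal sketch of that arithmetic. The names align_up_32 and first_aligned_slot are hypothetical, and the sketch assumes the caller has over-allocated the raw buffer by at least 31 bytes.

```cpp
#include <cstddef>
#include <cstdint>

// Rounds addr up to the next multiple of 32 (unchanged if already aligned).
constexpr std::uintptr_t align_up_32(std::uintptr_t addr) {
    return (addr + 31u) & ~static_cast<std::uintptr_t>(31u);
}

// Hypothetical helper: returns the first 32-byte-aligned position inside raw.
// Assumption: raw has at least 31 spare bytes beyond the data it must hold.
// The pointer is only computed here, not dereferenced.
inline std::int32_t* first_aligned_slot(unsigned char* raw) {
    return reinterpret_cast<std::int32_t*>(
        align_up_32(reinterpret_cast<std::uintptr_t>(raw)));
}
```

A pointer obtained this way satisfies the boundary condition quoted above. When you do control the declaration, alignas(32) on the array is simpler.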


For more information, refer to the Intrinsics Guide: https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#=undefined&ig_expand=5628,4....


Thanks



cw_intel
Moderator
361 Views

Hi,


We haven't heard back from you for a long time so we are assuming that the provided details helped you in solving your problem. If you require additional assistance from Intel, please start a new thread. Any further interaction in this thread will be considered community only.


Thanks

