Intel® C++ Compiler
Community support and assistance for creating C++ code that runs on platforms based on Intel® processors.

Why do intrinsics like _mm256_load_pd need 32-byte address alignment instead of 8-byte?

might
Beginner

I am a newbie who is learning AVX. I want to know why loading data in AVX needs 32-byte address alignment. Can it be that the CPU can read 32 bytes of data at a time? (My shallow knowledge says the CPU reads 8 bytes of data at a time.) Also, why does a load function in AVX2 such as _mm256_adds_epi16 not require memory alignment?

I would be very grateful if someone could guide me.

HemanthCH_Intel
Moderator

Hi,

Thank you for posting in Intel Communities.

>>"can it be that the cpu can read 32 bytes of data at a time"

Intel® Advanced Vector Extensions (Intel® AVX) instructions use 256-bit (32-byte) registers, which are extensions of the 128-bit SIMD registers. Because each of these registers can hold more than one data element, the processor can process more than one data element simultaneously. This processing capability is also known as single instruction, multiple data (SIMD) processing. For more information, refer to the link below:

https://www.intel.com/content/www/us/en/develop/documentation/cpp-compiler-developer-guide-and-reference/top/compiler-reference/intrinsics/details-about-intrinsics.html

For the other questions, we will get back to you soon.

Thanks & Regards,
Hemanth

jimdempseyatthecove
Honored Contributor III

_mm256_adds_epi16 takes two register arguments as input, so it makes no memory reference and has no alignment requirement.
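To make the point above concrete, here is a minimal sketch (not taken from the thread) in which the only memory accesses are the explicit loads and the store, while _mm256_adds_epi16 itself works purely on registers. The function name adds16 is hypothetical, and the scalar branch is only a stand-in so that the snippet still compiles when AVX2 is not enabled in the build.

```cpp
#include <cassert>
#include <cstdint>
#if defined(__AVX2__)
#include <immintrin.h>
#endif

// Hypothetical helper: saturating add of 16 signed 16-bit lanes.
// a, b and out need no special alignment because the unaligned
// load/store forms are used for every memory access.
inline void adds16(const std::int16_t* a, const std::int16_t* b, std::int16_t* out) {
    assert(a && b && out);
#if defined(__AVX2__)
    __m256i va = _mm256_loadu_si256(reinterpret_cast<const __m256i*>(a));
    __m256i vb = _mm256_loadu_si256(reinterpret_cast<const __m256i*>(b));
    __m256i vr = _mm256_adds_epi16(va, vb);  // register operands only
    _mm256_storeu_si256(reinterpret_cast<__m256i*>(out), vr);
#else
    // Scalar stand-in with the same saturating behaviour.
    for (int i = 0; i < 16; ++i) {
        int s = int(a[i]) + int(b[i]);
        if (s > INT16_MAX) s = INT16_MAX;
        if (s < INT16_MIN) s = INT16_MIN;
        out[i] = static_cast<std::int16_t>(s);
    }
#endif
}
```

Any alignment question therefore belongs to the load or store that feeds the arithmetic, not to the arithmetic intrinsic.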

There are two major groupings of AVX load/store intrinsics, one for aligned data and one for unaligned data.

For example:

__m256i _mm256_load_si256 (__m256i const * mem_addr)

    Load 256-bits of integer data from memory into dst.
    mem_addr must be aligned on a 32-byte boundary or a general-protection exception may be generated.

__m256i _mm256_loadu_si256 (__m256i const * mem_addr)

    Load 256-bits of integer data from memory into dst.
    mem_addr does not need to be aligned on any particular boundary.

Depending on CPU architecture, the unaligned load/store can take longer to execute.
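As an illustration of the two groupings, the sketch below loads eight 32-bit integers once through the aligned intrinsic and once through the unaligned one. The helper names (is_aligned32, sum_lanes, sum8_aligned, sum8_unaligned) are hypothetical, and the memcpy branch is only a portable stand-in for builds where AVX is not enabled.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#if defined(__AVX__)
#include <immintrin.h>
#endif

// Hypothetical helper: true when p lies on a 32-byte boundary.
inline bool is_aligned32(const void* p) {
    return reinterpret_cast<std::uintptr_t>(p) % 32 == 0;
}

// Hypothetical helper: scalar sum of eight lanes, used to inspect a loaded vector.
inline std::int32_t sum_lanes(const std::int32_t* lanes) {
    std::int32_t s = 0;
    for (int i = 0; i < 8; ++i) s += lanes[i];
    return s;
}

// Aligned grouping: the caller must guarantee a 32-byte-aligned address.
inline std::int32_t sum8_aligned(const std::int32_t* p) {
    assert(is_aligned32(p));
    std::int32_t lanes[8];
#if defined(__AVX__)
    __m256i v = _mm256_load_si256(reinterpret_cast<const __m256i*>(p));
    _mm256_storeu_si256(reinterpret_cast<__m256i*>(lanes), v);
#else
    std::memcpy(lanes, p, sizeof lanes);  // stand-in when AVX is not enabled
#endif
    return sum_lanes(lanes);
}

// Unaligned grouping: any address is accepted.
inline std::int32_t sum8_unaligned(const std::int32_t* p) {
    std::int32_t lanes[8];
#if defined(__AVX__)
    __m256i v = _mm256_loadu_si256(reinterpret_cast<const __m256i*>(p));
    _mm256_storeu_si256(reinterpret_cast<__m256i*>(lanes), v);
#else
    std::memcpy(lanes, p, sizeof lanes);  // stand-in when AVX is not enabled
#endif
    return sum_lanes(lanes);
}
```

A buffer declared as `alignas(32) std::int32_t buf[16]` can be passed to sum8_aligned(buf), whereas buf + 1 is only 4-byte aligned and should go through sum8_unaligned(buf + 1).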

The Intel Intrinsics Guide is quite helpful. 

Jim Dempsey

might
Beginner

Thanks for your reply.

I'm sorry, I made a slip in writing. What I really meant is that _mm256_load_epi32 does not require memory alignment, and that confuses me. I think the alignment is for better performance, but why must it be 32-byte address alignment? How can the CPU get better performance from 32-byte address alignment?

cw_intel
Moderator

Hi,


_mm256_load_epi32 (void const* mem_addr) loads 256 bits (composed of 8 packed 32-bit integers) from memory into dst. mem_addr must be aligned on a 32-byte boundary, or a general-protection exception may be generated.
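The replies do not say why the boundary is 32 bytes rather than 8. One commonly cited reason, offered here as background and not as Intel's official answer, is that a 32-byte access starting on a 32-byte boundary can never straddle two cache lines, while one that is only 8-byte aligned sometimes does. The sketch below just does that arithmetic; the 64-byte line size is an assumption, and crosses_line and count_splits are hypothetical helper names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// True when an access of `width` bytes starting at `addr` touches two cache lines.
// The 64-byte default line size is an assumption, not something stated in the thread.
inline bool crosses_line(std::uintptr_t addr, std::size_t width, std::size_t line = 64) {
    assert(width > 0 && line > 0);
    return addr / line != (addr + width - 1) / line;
}

// Counts how many start offsets 0, step, 2*step, ... inside one cache line
// would make a `width`-byte access spill into the next line.
inline int count_splits(std::size_t step, std::size_t width, std::size_t line = 64) {
    assert(step > 0);
    int n = 0;
    for (std::size_t off = 0; off < line; off += step) {
        if (crosses_line(off, width, line)) ++n;
    }
    return n;
}
```

Under these assumptions, count_splits(8, 32) counts the 8-byte-aligned start offsets whose 32-byte access spills into the next line, and count_splits(32, 32) does the same for 32-byte-aligned offsets, where no start offset can spill.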


For more information, refer to the Intrinsics Guide: https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#=undefined&ig_expand=5628,4229,3849,4370,4234&text=mm256_load_epi32


Thanks



cw_intel
Moderator

Hi,


We haven't heard back from you for a long time, so we are assuming that the details provided helped you solve your problem. If you require additional assistance from Intel, please start a new thread. Any further interaction in this thread will be considered community only.


Thanks

