Intel® ISA Extensions
VMOVDQU8 + Alignment

Shalomie42

I come from the embedded RISC world, so I have not dealt with PC / x86 assembly much. I have been investigating SIMD as a way to speed up an algorithm I am developing.

On the chips I usually code for, unaligned loads carry a large performance penalty, if they don't crash the chip outright. Reading past the end of defined memory can also be a problem.

 

Question 1:

I have been doing some research on aligned / unaligned loads on x86. The consensus seems to be that unaligned loads no longer carry much of a penalty, and that the aligned instructions exist mainly for backwards compatibility. My code has to work with unaligned memory pointers, and there is no practical way to align several of them at the same time. Will the unaligned variants of the SIMD instructions cause large performance penalties?
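The case I am most unsure about is an unaligned load that straddles a cache-line or page boundary. Here is a minimal sketch of how I would detect that case for a given pointer. The 64-byte line size and 4 KiB page size are assumptions about the target, and `crosses_block` is just a helper name I made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed sizes: common on current x86 parts, but not guaranteed. */
#define CACHE_LINE_BYTES 64u
#define PAGE_BYTES       4096u

/*
 * Returns true if a `width`-byte access starting at `p` touches two
 * adjacent blocks of `block` bytes. `block` must be a power of two and
 * `width` must be between 1 and `block`.
 */
static bool crosses_block(const void *p, size_t width, size_t block)
{
    assert(block != 0 && (block & (block - 1)) == 0);
    assert(width >= 1 && width <= block);

    /* Offset of the first byte inside its block. */
    uintptr_t offset = (uintptr_t)p & (uintptr_t)(block - 1);

    /* The access spills over if it extends past the end of the block. */
    return offset + width > block;
}
```

For example, `crosses_block(ptr, 64, CACHE_LINE_BYTES)` is true for any 64-byte load whose address is not 64-byte aligned, while `crosses_block(ptr, 64, PAGE_BYTES)` is true only for the last 63 start offsets in a page.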

 

Question 2:

I have read through some of the information available on the new mask instructions in AVX512, specifically the loads / stores (moves on x86). Take VMOVDQU8 as an example. Using the mask registers, can this instruction read anywhere from 1 to 64 bytes from an unaligned address, assuming that every byte past the end of the buffer is masked off?

In other words, what happens if I execute the instruction on a pointer to a buffer that is only 3 bytes long? Assuming the mask register is set correctly to read only 3 bytes, will the instruction still try to access the memory past the end and ultimately fault? Assume that valid memory really does end after those 3 bytes. I am not looking for answers like "it depends, because the memory allocator usually hands out blocks of aligned size, so you may actually have a bunch of bytes left" or "only if you are next to a page boundary". Assume the worst case: my buffer address is literally 0xFFFF_FFFF_FFFF_FFFD, or the buffer ends right at a page boundary.
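To make the 3-byte case concrete, here is a minimal sketch of the load I have in mind, written with C intrinsics rather than assembly. `tail_mask` and `load_tail` are hypothetical helper names. The intrinsic part is only compiled when the compiler targets AVX-512BW, which VMOVDQU8 requires. My reading of the Intel documentation is that faults on masked-off bytes are suppressed for this kind of masked load, but that is exactly the assumption I would like confirmed.

```c
#include <stddef.h>
#include <stdint.h>

#if defined(__AVX512BW__)
#include <immintrin.h>
#endif

/*
 * Build a byte mask with the low `len` bits set, one bit per byte to
 * load. Lengths of 64 or more select all 64 bytes. The explicit check
 * avoids shifting a 64-bit value by 64, which is undefined in C.
 */
static uint64_t tail_mask(size_t len)
{
    return (len >= 64) ? ~UINT64_C(0) : ((UINT64_C(1) << len) - 1);
}

#if defined(__AVX512BW__)
/*
 * Masked, zeroing load of up to 64 bytes from an unaligned pointer.
 * This is expected to compile to VMOVDQU8 zmm{k}{z}, [p]. Bytes whose
 * mask bit is 0 come back as zero.
 * ASSUMPTION (to be verified): masked-off bytes cannot fault, even
 * when they lie on an unmapped page.
 */
static __m512i load_tail(const void *p, size_t len)
{
    __mmask64 k = (__mmask64)tail_mask(len);
    return _mm512_maskz_loadu_epi8(k, p);
}
#endif
```

With a 3-byte buffer, `load_tail(buf, 3)` would use the mask `0x7`, so only bytes 0 to 2 of the 64-byte window are requested.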
