I am doing consecutive load operations using _mm_loadu_si128() in my application. The two loads use addresses of the form m1+len+h. The first load is xm1=_mm_loadu_si128(m1+16-1), and the second is xm2=_mm_loadu_si128(m1+16+0). I expected xm1 and xm2 to hold the same bytes offset by one position, i.e. xm1 shifted left by one byte (as seen through m128i_i8) should match xm2. But the result is something else: none of the 8-bit elements are the same between xm1 and xm2. Is this a memory alignment problem? _mm_loadu_si128() is supposed to work with unaligned addresses too.
Any quick suggestions would be appreciated.
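Is m1 a pointer to __m128i? If so, pointer arithmetic is scaled by the size of the pointed-to type: adding 15 to m1 gives an address sizeof(__m128i)*15 = 16*15 = 240 bytes past m1, not 15 bytes. That is why the two loads read unrelated data; alignment is not the issue.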
I think that the following is what you want:
xm1 = _mm_loadu_si128((__m128i*) (((char*)m1)+16-1));