Intel® ISA Extensions
Use hardware-based isolation and memory encryption to provide more code protection in your solutions.

SSE instruction error

inteleverywhere
Beginner
503 Views

Could someone please point out what the problem is with the following snippet? An exception occurs in MSVC.

__m128i R0, R1;
__m128i *R0P1, *R0P2;
__m128i *R1P1, *R1P2;

char buf_chr[20] = { 'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j',
                     'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't'
};

R0 = _mm_loadu_si128((__m128i*) (buf_chr));
R1 = _mm_loadu_si128((__m128i*) (buf_int));

R0P1 = (__m128i *) (buf_chr);
R0P2 = (__m128i *) (buf_chr);

R0 = _mm_add_epi8(R0, R1);

*R0P1 = _mm_add_epi8((__m128i) *R0P1, (__m128i) *R0P2); // ---------> Exception occurs here.

Is there a way to do the addition if I have a register used as a pointer? Can dereferencing be done by using *? This question arises because the following statements pass without incident!

R0P1++;
R0P2++;

Thanks and Regards
Deepak

7 Replies
Arthur_Moroz
Check this: sizeof(__m128i) is 16 bytes. After the first increment of R0P1 you are past the end of your fixed-size array (20 bytes).
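To illustrate the bounds point, here is a minimal sketch of one way to process a buffer whose length is not a multiple of 16 without reading past its end. The helper name add_bytes is hypothetical and not from the original snippet; it assumes SSE2 is available and uses unaligned loads and stores so that it makes no alignment assumption.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stddef.h>

/* Hypothetical helper: dst[i] = a[i] + b[i] for n bytes (wrapping 8-bit add). */
static void add_bytes(char *dst, const char *a, const char *b, size_t n)
{
    size_t i = 0;

    /* Full 16-byte chunks: only while a whole vector still fits in the buffer. */
    for (; i + 16 <= n; i += 16) {
        __m128i va = _mm_loadu_si128((const __m128i *)(a + i));
        __m128i vb = _mm_loadu_si128((const __m128i *)(b + i));
        _mm_storeu_si128((__m128i *)(dst + i), _mm_add_epi8(va, vb));
    }

    /* Scalar tail: the remaining n % 16 bytes (4 bytes for a 20-byte buffer). */
    for (; i < n; ++i)
        dst[i] = (char)((unsigned char)a[i] + (unsigned char)b[i]);
}
```

For the 20-byte buf_chr, this would handle bytes 0..15 with one vector operation and bytes 16..19 in the scalar loop.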
inteleverywhere
Hi,

The exception happens on the very first access. The array is large enough for the first access, so the problem may be something else. I am interested in the meaning of *Register when Register is used as a pointer (__m128i*). Does *(__m128i*) mean the 16 bytes pointed to by the register?

Thanks and Regards
Deepak
Arthur_Moroz
I've just reproduced your result. It happens because a char[] is generally not aligned to a 16-byte boundary.
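One way to confirm this for a given buffer is to test the low four bits of its address. The sketch below is only an illustration; the helper name is_aligned_16 is hypothetical and not part of the original snippet.

```c
#include <stdint.h>

/* Hypothetical helper: returns 1 if p is a multiple of 16, else 0. */
static int is_aligned_16(const void *p)
{
    /* An address is 16-byte aligned when its low four bits are all zero. */
    return ((uintptr_t)p & 15u) == 0;
}
```

Calling is_aligned_16(buf_chr) just before the faulting line would show whether the compiler happened to place the array on a 16-byte boundary. For a plain char array that placement is not guaranteed.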
inteleverywhere

My dear friend, that is not the problem, since I have used an unaligned load instruction. Please assume everything is right with respect to the buffer. Please read my question with regard to the use of a register as a pointer: how can I access the content pointed to by the register? I am also working on it and shall get back to you soon.

Thanks and Regards
Deepak

Arthur_Moroz
My dear friend,

I know how to use a debugger. And in the debugger I see that your line

*R0P1 = _mm_add_epi8((__m128i) *R0P1, (__m128i) *R0P2);

is translated to

movdqa xmm0,xmmword ptr [eax]
[..skipped..]
paddb xmm1,xmm0

Guess where the exception happens if eax is not 16-byte aligned?
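The disassembly shows that dereferencing a __m128i* with * was compiled to movdqa, an aligned move, even though the earlier loads went through _mm_loadu_si128. A minimal sketch of how the faulting statement could be written with explicit unaligned accesses follows; the helper name add16_unaligned is hypothetical, and the intrinsics used are the standard SSE2 ones, which compilers typically translate to movdqu.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */

/* Hypothetical helper: adds the 16 bytes at src to the 16 bytes at dst,
   byte by byte, with no alignment requirement on either pointer. */
static void add16_unaligned(void *dst, const void *src)
{
    __m128i a = _mm_loadu_si128((const __m128i *)dst);    /* unaligned load  */
    __m128i b = _mm_loadu_si128((const __m128i *)src);    /* unaligned load  */
    _mm_storeu_si128((__m128i *)dst, _mm_add_epi8(a, b)); /* unaligned store */
}
```

With this helper, the original line would become add16_unaligned(R0P1, R0P2), which does not depend on where buf_chr happens to be placed.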

inteleverywhere
Sorry, and thanks mate. Actually, you are right. I had tried in vain and thought the solution was this:

__m128i* R0P1;

Register1 = _mm_loadu_si128(R0P1);

You are right: *R0P1 works if the buffer pointed to by R0P1 is aligned to a 16-byte boundary.

Register1 = *R0P1;

Follow-up question: what if it is not an aligned address? Can we still use the * operator to get the contents?


Regards
Deepak
Arthur_Moroz
If it is not aligned, it is faster to use the usual registers (eax, edx, etc.) and not bother with SSE optimization.
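On the follow-up question, the thread establishes that * on a __m128i* is only safe when the address is 16-byte aligned. One further option is to make the data aligned so that plain dereferencing is valid. The sketch below assumes a C11 compiler for _Alignas; the function name double16_aligned is hypothetical.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <string.h>

/* Hypothetical demo: copies 16 bytes from src into a 16-byte-aligned
   buffer, doubles each byte via plain dereferencing, and writes the
   result to out. Neither src nor out needs to be aligned. */
static void double16_aligned(char *out, const char *src)
{
    _Alignas(16) char buf[16];   /* C11; MSVC also offers __declspec(align(16)) */
    __m128i *p = (__m128i *)buf;

    memcpy(buf, src, 16);
    *p = _mm_add_epi8(*p, *p);   /* safe: buf is 16-byte aligned */
    memcpy(out, buf, 16);
}
```

When the alignment of the data cannot be controlled, the unaligned intrinsics _mm_loadu_si128 and _mm_storeu_si128 remain available in place of the * operator.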