SSE instruction error

Could someone please point out what the problem is with the following snippet? An exception occurs in MSVC.

__m128i R0, R1;

__m128i *R0P1, *R0P2;

__m128i *R1P1, *R1P2;

char buf_chr[20] = { 'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j',

'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't'

};

R0 = _mm_loadu_si128((__m128i*) (buf_chr));

R1 = _mm_loadu_si128((__m128i*) (buf_int));

R0P1 = (__m128i *) (buf_chr);

R0P2 = (__m128i *) (buf_chr);

R0 = _mm_add_epi8(R0, R1);

*R0P1 = _mm_add_epi8((__m128i) *R0P1, (__m128i) *R0P2); // ---------> Exception occurs here.

Is there a way to do the addition if I have a register used as a pointer? Can dereferencing be done by using *? This question arises because the following statements pass without incident!

R0P1++;

R0P2++;

Thanks and Regards
Deepak

7 Replies
Beginner

Check this: sizeof(__m128i) is 16 bytes. After the first increment of R0P1, a 16-byte access through it covers bytes 16 to 31 and runs past the end of your fixed 20-byte array.
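
To make the arithmetic concrete, here is a minimal sketch in C. The helper names full_blocks and offset_after are hypothetical and exist only for illustration; the only fact relied on is that sizeof(__m128i) is 16.

```c
#include <emmintrin.h>  /* SSE2: defines __m128i */
#include <stddef.h>

/* Hypothetical helper: how many whole __m128i blocks fit in n_bytes. */
static size_t full_blocks(size_t n_bytes) {
    return n_bytes / sizeof(__m128i);
}

/* Hypothetical helper: byte offset reached after `steps` increments
 * of a __m128i pointer that started at offset 0. */
static size_t offset_after(size_t steps) {
    return steps * sizeof(__m128i);
}
```

For the 20-byte buffer in the question, full_blocks(20) is 1, and offset_after(1) is 16, so a second 16-byte access would need bytes 16 to 31. Incrementing the pointer does not touch memory, which is why R0P1++ itself does not fault.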

Hi,

The exception happens on the very first access, and the array is large enough for that access, so the problem may be something else. I am interested in the meaning of *Register when Register is used as a pointer (__m128i*). Does *(__m128i*) mean the 16 bytes pointed to by the register?

Thanks and Regards
Deepak
Beginner

I've just reproduced your result. It happens because a char[] array is not guaranteed to be aligned to a 16-byte boundary.

My dear friend, that is not a problem, since I have used an unaligned load instruction. Please assume everything is right with respect to the buffer, and please read my question with regard to the use of a register as a pointer: how can the content pointed to by the register be accessed? I am also working on it and shall get back to you soon.

Thanks and Regards
Deepak

Beginner

My dear friend,

I know how to use a debugger, and in the debugger I see that your line

*R0P1 = _mm_add_epi8((__m128i) *R0P1, (__m128i) *R0P2);

is translated to

movdqa xmm0,xmmword ptr [eax]
[..skipped..]
paddb xmm1,xmm0

Guess where the exception happens if eax is not 16-byte aligned?
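
A possible way around this, sketched in C under the assumption that only SSE2 is available: test the address, and go through _mm_loadu_si128 and _mm_storeu_si128 instead of dereferencing the __m128i pointer directly. The helper names is_aligned16 and add16_unaligned are hypothetical.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>

/* Hypothetical helper: nonzero if p sits on a 16-byte boundary. */
static int is_aligned16(const void *p) {
    return ((uintptr_t)p & 15u) == 0;
}

/* Hypothetical helper: dst[0..15] += src[0..15], byte by byte with
 * wraparound, for pointers of any alignment. */
static void add16_unaligned(char *dst, const char *src) {
    /* Unaligned loads: no 16-byte alignment requirement. */
    __m128i a = _mm_loadu_si128((const __m128i *)dst);
    __m128i b = _mm_loadu_si128((const __m128i *)src);
    /* Unaligned store of the packed 8-bit sum. */
    _mm_storeu_si128((__m128i *)dst, _mm_add_epi8(a, b));
}
```

Calling add16_unaligned(buf_chr, buf_chr) would do the same work as the faulting line, since both operands there point at buf_chr, without requiring the buffer to be aligned.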


Sorry and thanks, mate. Actually, you are right. I had tried in vain and thought the solution was this:

__m128i* R0P1;

Register1 =_mm_loadu_si128(R0P1);

You are right: *R0P1 works if the buffer pointed to by R0P1 is aligned to a 16-byte boundary.

Register1 = *R0P1;

Follow-up question: what if it is not an aligned address? Can we use the * operator to get the contents?


Regards
Deepak
Beginner

If it is not aligned, it is faster to use the usual registers (eax, edx, etc.) and not bother with SSE optimization.
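
For comparison, here is a rough sketch in C that mixes both approaches: unaligned SSE2 loads and stores for each full 16-byte block, and an ordinary byte loop for whatever is left over. The function name add_bytes is hypothetical, and which variant is faster depends on the CPU, so this is not a performance claim.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stddef.h>

/* Hypothetical helper: dst[i] += src[i] for i in [0, n), with 8-bit
 * wraparound. Works for any alignment and any length. */
static void add_bytes(unsigned char *dst, const unsigned char *src, size_t n) {
    size_t i = 0;

    /* Full 16-byte blocks, using unaligned loads and stores. */
    for (; i + 16 <= n; i += 16) {
        __m128i a = _mm_loadu_si128((const __m128i *)(dst + i));
        __m128i b = _mm_loadu_si128((const __m128i *)(src + i));
        _mm_storeu_si128((__m128i *)(dst + i), _mm_add_epi8(a, b));
    }

    /* Remaining bytes (fewer than 16), handled in ordinary registers. */
    for (; i < n; ++i)
        dst[i] = (unsigned char)(dst[i] + src[i]);
}
```

With n = 20, as in the original buffer, the first loop handles bytes 0 to 15 and the second handles bytes 16 to 19, so nothing is read or written past the end of the array.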