I need help: _mm_mullo_pi16() cannot multiply large numbers, because it keeps only the low 16 bits of each product. Any suggestions on what I should do? How do I multiply two 1-D arrays that hold large values (positive and negative)? x0, b, b1, s0, s1 are vectors (arrays), and _f, Adjust[], _scale are scalar integers.

// C++ code

int x0 = s0 + s1;

if(x0 < 0)

short b

else

short b

// MMX intrinsic code

__m64*b1 = (__m64*)b;

__m64 s0,s1,s2,s3,x0;

j=0;

__m64 r0,r1,t0,t1,t2,p0,p1;

r0 =_mm_set_pi16(Adjust[_qm],Adjust[_qm],Adjust[_qm],Adjust[_qm]);

r1 =_mm_set_pi16(_f,_f,_f,_f);

x0 =_mm_add_pi16(s0,s1);

t1 =_mm_cmpgt_pi16(_mm_set1_pi16(0),x0);

t2 = _mm_mullo_pi16((_mm_sub_pi16(_mm_setzero_si64(),x0)),r0);

t0 = _mm_mullo_pi16(x0,r0);

p0 =_mm_srai_pi16(,_scale );

p1 =_mm_srai_pi16(_mm_add_pi16(t2,r1),_scale );

b1


There is also a mulhi (_mm_mulhi_pi16) that returns the upper 16 bits of each 32-bit product.
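To make the low/high split concrete, here is a rough scalar model of what a single 16-bit lane of mullo and mulhi returns, and how the two halves recombine into the full product. The helper names lane_mullo, lane_mulhi and lane_combine are hypothetical and exist only for this illustration; they are not intrinsics.

```c
#include <stdint.h>

/* Scalar model of one lane of mullo: the low 16 bits of the 32-bit product.
   The narrowing casts assume the usual two's-complement wrap-around. */
static int16_t lane_mullo(int16_t a, int16_t b)
{
    uint32_t p = (uint32_t)((int32_t)a * (int32_t)b);
    return (int16_t)(uint16_t)(p & 0xFFFFu);
}

/* Scalar model of one lane of mulhi: the high 16 bits of the 32-bit product. */
static int16_t lane_mulhi(int16_t a, int16_t b)
{
    uint32_t p = (uint32_t)((int32_t)a * (int32_t)b);
    return (int16_t)(uint16_t)(p >> 16);
}

/* Put the two halves back together to recover the full signed product. */
static int32_t lane_combine(int16_t lo, int16_t hi)
{
    return (int32_t)(((uint32_t)(uint16_t)hi << 16) | (uint32_t)(uint16_t)lo);
}
```

With sample inputs 300 and 11912, the low half alone is -30880, which is why mullo by itself looks wrong, while combining it with the high half 54 gives the true product 3573600.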


r0 = {11912, 11912, 11912, 11912}

What I want is to multiply x0 and r0 using the __m64 data type, i.e. using _mm_mullo_pi16() and _mm_mulhi_pi16().
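One possible way to do this while staying with __m64 is to compute both halves and interleave them with the MMX unpack intrinsics. The sketch below is only an illustration under some assumptions: the function name mul4_pi16_to_i32 and its array-based interface are made up for the example, and it assumes a compiler that still supports MMX intrinsics (for instance GCC or Clang on x86).

```c
#include <mmintrin.h>
#include <stdint.h>
#include <string.h>

/* Multiply four int16 lanes of x by four int16 lanes of r and write the
   four full 32-bit products to out. Hypothetical helper for illustration. */
static void mul4_pi16_to_i32(const int16_t x[4], const int16_t r[4], int32_t out[4])
{
    __m64 a, b;
    memcpy(&a, x, sizeof a);              /* load 4 x 16-bit values */
    memcpy(&b, r, sizeof b);

    __m64 lo = _mm_mullo_pi16(a, b);      /* low 16 bits of each product  */
    __m64 hi = _mm_mulhi_pi16(a, b);      /* high 16 bits of each product */

    __m64 p01 = _mm_unpacklo_pi16(lo, hi); /* products 0 and 1 as 32-bit */
    __m64 p23 = _mm_unpackhi_pi16(lo, hi); /* products 2 and 3 as 32-bit */

    memcpy(out, &p01, sizeof p01);
    memcpy(out + 2, &p23, sizeof p23);
    _mm_empty();                          /* leave MMX state before any x87 code */
}
```

Any later rounding and shifting would then have to work on the 32-bit lanes (for example with _mm_add_pi32 and _mm_srai_pi32) rather than on 16-bit lanes.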


__m128i x1 = _mm_loadl_epi64((const __m128i*)&x0); // loads only the 8 bytes of x0 (upper 8 bytes are zeroed)

__m128i r1 = _mm_loadl_epi64((const __m128i*)&r0); // loads only the 8 bytes of r0 (upper 8 bytes are zeroed)

__m128i x_sse = _mm_cvtepi16_epi32(x1); // convert lower 4 16-bit values to 32-bit values with sign-extension (SSE4.1)

__m128i r_sse = _mm_cvtepi16_epi32(r1); // convert lower 4 16-bit values to 32-bit values with sign-extension (SSE4.1)

__m128i res_sse = _mm_mullo_epi32(x_sse, r_sse); // multiply 4 signed 32-bit values (SSE4.1)
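As a self-contained version of the widening approach above, the following sketch wraps the same steps in a function that reads from plain int16_t arrays. The name mul4_widen_sse41 and the SSE41_FN macro are assumptions for this example. Note that _mm_cvtepi16_epi32 and _mm_mullo_epi32 need SSE4.1, so on GCC or Clang the function is marked with a target attribute (or the file is built with -msse4.1).

```c
#include <smmintrin.h>  /* SSE4.1: _mm_cvtepi16_epi32, _mm_mullo_epi32 */
#include <stdint.h>

/* Assumption: GCC/Clang need SSE4.1 enabled per function or via -msse4.1. */
#if defined(__GNUC__)
#define SSE41_FN __attribute__((target("sse4.1")))
#else
#define SSE41_FN
#endif

/* Multiply four int16 values by four int16 values, producing four int32
   products. Hypothetical helper for illustration. */
SSE41_FN static void mul4_widen_sse41(const int16_t x[4], const int16_t r[4], int32_t out[4])
{
    /* Load exactly 8 bytes (four 16-bit values) into the low half. */
    __m128i xv = _mm_loadl_epi64((const __m128i *)x);
    __m128i rv = _mm_loadl_epi64((const __m128i *)r);

    /* Sign-extend the four 16-bit lanes to 32 bits. */
    __m128i x32 = _mm_cvtepi16_epi32(xv);
    __m128i r32 = _mm_cvtepi16_epi32(rv);

    /* Each 32-bit product of two 16-bit inputs fits without overflow. */
    _mm_storeu_si128((__m128i *)out, _mm_mullo_epi32(x32, r32));
}
```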


You can compute the lower 16 bits and the upper 16 bits of the 32-bit results separately. Afterwards, you will need to interleave them in order to get the full 32-bit results. Something like this should work:

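__m128i hi = _mm_mulhi_epi16(a, b); // upper 16 bits of each 32-bit product

__m128i lo = _mm_mullo_epi16(a, b); // lower 16 bits of each 32-bit product

__m128i r0 = _mm_unpacklo_epi16(lo, hi); // products 0..3 as 32-bit values

__m128i r1 = _mm_unpackhi_epi16(lo, hi); // products 4..7 as 32-bit values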

a and b each contain eight 16-bit values that you would like to multiply. r0 contains the first four 32-bit results; r1 contains the remaining four 32-bit results. These intrinsics are all part of the SSE2 instruction set.
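Putting this together with the rounding and shift from the original question, a loop over a whole array might look like the sketch below. It is only an illustration under stated assumptions: the function name scale_mul_sse2 and its parameters are invented for the example, the sign-dependent branch from the question is truncated in the post and therefore not reproduced (every lane gets (x * adjust + f) >> scale), and the final narrowing uses saturation via _mm_packs_epi32.

```c
#include <emmintrin.h>  /* SSE2 */
#include <stddef.h>
#include <stdint.h>

/* out[i] = saturate16((x[i] * adjust + f) >> scale), using SSE2 only.
   Hypothetical helper; assumes x[i] * adjust + f fits in 32 bits. */
static void scale_mul_sse2(const int16_t *x, int16_t *out, size_t n,
                           int16_t adjust, int32_t f, int scale)
{
    const __m128i r = _mm_set1_epi16(adjust);
    const __m128i round = _mm_set1_epi32(f);
    const __m128i count = _mm_cvtsi32_si128(scale);
    size_t i = 0;

    for (; i + 8 <= n; i += 8) {
        __m128i a = _mm_loadu_si128((const __m128i *)(x + i));
        __m128i hi = _mm_mulhi_epi16(a, r);
        __m128i lo = _mm_mullo_epi16(a, r);
        __m128i p0 = _mm_unpacklo_epi16(lo, hi);  /* products 0..3 */
        __m128i p1 = _mm_unpackhi_epi16(lo, hi);  /* products 4..7 */

        /* Add the rounding term and shift arithmetically in 32 bits. */
        p0 = _mm_sra_epi32(_mm_add_epi32(p0, round), count);
        p1 = _mm_sra_epi32(_mm_add_epi32(p1, round), count);

        /* Narrow back to eight 16-bit values with signed saturation. */
        _mm_storeu_si128((__m128i *)(out + i), _mm_packs_epi32(p0, p1));
    }

    /* Scalar tail for the last n % 8 elements; assumes >> on a negative
       int is an arithmetic shift, as on mainstream compilers. */
    for (; i < n; ++i) {
        int32_t p = ((int32_t)x[i] * adjust + f) >> scale;
        if (p > INT16_MAX) p = INT16_MAX;
        if (p < INT16_MIN) p = INT16_MIN;
        out[i] = (int16_t)p;
    }
}
```

Keeping the add and shift in 32-bit lanes is the point of the design: the 16-bit _mm_srai_pi16 path in the question operates on products that have already lost their upper bits.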
